From: Son Luong Ngoc <sluongng@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
Derrick Stolee <dstolee@microsoft.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/2] Sparse checkout status
Date: Wed, 17 Jun 2020 19:58:50 +0200 [thread overview]
Message-ID: <20200617175850.GA57254@C02YX140LVDN.corpad.adbkng.com> (raw)
In-Reply-To: <CABPp-BHyc=aYqY+YuvNRsFsrMPL6+O=CX37jzXx38_-SXw5gLA@mail.gmail.com>
Hi Elijah,
On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
>
> Well, there is `git sparse-checkout list`, assuming users know they
> are in a sparse-checkout, but the whole point of my suggested change
> is that they sometimes don't.
Ah thats true.
This was added recently and definitely slipped my mind often.
>
> This surprises me; I considered performance while writing it and kept
> it simple on that basis. In particular:
> * This does not cause any reading or writing of any extra files; it
> is done solely with information that is already loaded.
> * If users aren't in a sparse-checkout, their performance overhead
> is a single if-check, which I doubt anyone can measure.
> * If they are in a sparse-checkout, then they'd get one extra loop
> over files in the index to check the SKIP_WORKTREE bit.
>
> In which cases would performance implications be a concern? For a
> very simple point of reference, in a sparse-checkout of the linux
> kernel (using --cone mode and only selecting the drivers/ directory),
> I see the following timings for 'git status' in a clean checkout:
>
> Without my change:
> [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
> Time (mean ± σ): 78.8 ms ± 2.8 ms [User: 48.9 ms, System: 76.9 ms]
> Range (min … max): 74.0 ms … 84.0 ms 38 runs
>
> With my change:
> [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
> Time (mean ± σ): 79.8 ms ± 2.7 ms [User: 49.3 ms, System: 77.7 ms]
> Range (min … max): 74.8 ms … 84.5 ms 37 runs
>
> I know the linux kernel is tiny compared to repos like Windows or
> Office, but the relative scaling considerations are identical: it's
> one extra loop through the cached entries checking a bit for each
> entry. If people are worried about the "extra loop", I could find an
> existing loop to modify and just add an extra if-block in it so that
> we have the same number of loops. (I'm doubtful that'd actually help,
> but if the concern is an extra loop, it'd certainly avoid that.)
> Anyway, if you've got more information about it being too costly, I'm
> happy to listen. Otherwise, the overhead seems pretty small to me and
> it's only paid by those who would benefit from the information.
>
> However, all that said, I have good news: Peff already implemented the
> flag users can use to avoid this extra output, and did so back in
> September of 2009. It's called "--porcelain". Automated commands
> should already be using it, and if they aren't, they are what needs
> fixing -- not the long form status output.
When I wrote my initial reaction, the idea of having more than just a
percentage reported back stuck in my mind, specifically with using the
in-tree checkout that I mentioned.
But yeah, that's something down the line to address, you are absolutely
correct that the current patch has no performance impact. Thanks for the
reminder about '--porcelain'.
>
> I think having a 'git sparse-checkout status' would be a fine
> subcommand, and output like the above -- possibly also including other
> bits Stolee or I mentioned elsewhere in this thread -- would be cool
> and would be helpful; it'd complement what I'm doing here quite
> nicely.
>
> But you're solving a related problem rather than the one I was
> focusing on, and you have left the issue I was focusing on
> unaddressed. In particular, if users forgot that they sparsified in
> the first place, how are they going to know to run `git
> sparse-checkout status [--all]`?
>
> I think having a simple line of output in `git status` would remind
> them. With that reminder, they could today then go run 'git
> sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
> internally) or './sparsify --info' (as I use internally) to get more
> info. In the future we could provide additional things for them as
> well, such as your 'git sparse-checkout status'.
>
I do concede that this point could be a separate problem set and addressed
separately in the future.
>
> An aside, though, since you linked to the in-tree sparse-checkout
> definitions: When I reviewed that series, the possibility of merge
> conflicts and not knowing what sparse-checkout should have checked out
> when the in-tree defintions themselves were in a conflicted state
> seemed to me to be a pretty tough sticking point. I'm hoping someone
> has a clever solution, but I still don't yet. Do you?
I am no clever person, but I often take great pleasure in reading up
works of smarter people. One of which is the Google's and Facebook's Mercurial
extension sets that they opensourced a while ago to support large repos.
The test suite for FB's 'sparse' extension[1] may address your concerns?
The 'sparse' extension defines the sparse checkout definition of a
working repository. It supports '--enable-profile' which take in definition
files ('.sparse'). These profiles are often checked into the root dir
of the repo.
>
> Thanks,
> Elijah
Regards,
Son Luong.
[1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115
next prev parent reply other threads:[~2020-06-17 17:59 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-17 7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc
2020-06-17 16:48 ` Elijah Newren
2020-06-17 17:58 ` Son Luong Ngoc [this message]
2020-06-17 22:36 ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
-- strict thread matches above, loose matches on Subject: below --
2020-06-16 23:33 [PATCH 0/2] Sparse checkout status Elijah Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200617175850.GA57254@C02YX140LVDN.corpad.adbkng.com \
--to=sluongng@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).