git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Sparse checkout status
@ 2020-06-16 23:33 Elijah Newren via GitGitGadget
  0 siblings, 0 replies; 5+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-06-16 23:33 UTC (permalink / raw)
  To: git; +Cc: dstolee, Elijah Newren

Some of the feedback of folks trying out sparse-checkouts at $dayjob is that
sparse checkouts can sometimes be disorienting; users can forget that they
had a sparse-checkout and then wonder where files went. This series adds
some output to 'git status' and modifies git-prompt slightly as an attempt
to help.

For reference, I suspect that in repositories that are large enough that
people always use sparse-checkouts (e.g. windows or office repos), that this
isn't a problem. But when the repository is approximately
linux-kernel-sized, then it is reasonable for some folks to have a full
checkout. sparse-checkouts, however, can provide various build system and
IDE performance improvements, so we have a split of users who have
sparse-checkouts and those who have full checkouts. It's easy for users who
are bridging in between the two worlds or just trying out sparse-checkouts
for the first time to get confused.

Elijah Newren (2):
  [RFC] wt-status: show sparse checkout status as well
  [RFC] git-prompt: include sparsity state as well

 contrib/completion/git-prompt.sh |  7 ++++++-
 wt-status.c                      | 35 ++++++++++++++++++++++++++++++++
 wt-status.h                      |  1 +
 3 files changed, 42 insertions(+), 1 deletion(-)


base-commit: eebb51ba8cab97c0b3f3f18eaab7796803b8494b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-808%2Fnewren%2Fsparse-checkout-status-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-808/newren/sparse-checkout-status-v1
Pull-Request: https://github.com/git/git/pull/808
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] Sparse checkout status
@ 2020-06-17  7:40 Son Luong Ngoc
  2020-06-17 16:48 ` Elijah Newren
  0 siblings, 1 reply; 5+ messages in thread
From: Son Luong Ngoc @ 2020-06-17  7:40 UTC (permalink / raw)
  To: gitgitgadget; +Cc: dstolee, git, newren

Hi Elijah,

> Some of the feedback of folks trying out sparse-checkouts at $dayjob is that
> sparse checkouts can sometimes be disorienting; users can forget that they
> had a sparse-checkout and then wonder where files went.

I agree with this observation: that the current 'git sparse-checkout' experience
could be a bit 'lost' for end users, who may or may not be familiar
with git's 'arcane magic'.

Currently the only way to verify what's going on is to either run
'tree <repo-root-dir>'
or 'cat .git/info/sparse-checkout' (human-readable but not easy).

> This series adds some output to 'git status' and modifies git-prompt slightly as an attempt
> to help.

This is a great idea but I suggest to put a config/flag to let users
enable/disable this.

Git status is often utilized in automated commands (IDE, shell prompt,
etc...) and there may be
performance implications down the line not being able to skip this bit
of information out.

> For reference, I suspect that in repositories that are large enough that
> people always use sparse-checkouts (e.g. windows or office repos), that this
> isn't a problem. But when the repository is approximately
> linux-kernel-sized, then it is reasonable for some folks to have a full
> checkout. sparse-checkouts, however, can provide various build system and
> IDE performance improvements, so we have a split of users who have
> sparse-checkouts and those who have full checkouts. It's easy for users who
> are bridging in between the two worlds or just trying out sparse-checkouts
> for the first time to get confused.

One of our users noted that the experience is improved when combining
'git worktree' with sparse-checkout.
That way you get the correct sparsity for the topic that you are working on.

In a way, the current sparse-checkout experience is similar to a user
running 'git checkout <rev>' directly
instead of checking out a branch.
It does not feel tangible and reproducible.

I was hoping that these concerns will be addressed once the In-Tree
Sparse-Checkout Definition RFC[1] patch landed.
We should then be able to print out which Definition File(s) (we often
call it manifests) were used,
and ideally, only the top most file(s) in the inheritance tree.

So the ideal experience, in my mind, is something of this sort:

    git sc init --cone

    # assuming a inherited from b and c
    git sc add --in-tree manifest-dir/module-a.manifest
    git sc add --in-tree manifest-dir/module-d.manifest

    git sc status
        Your sparse checkout includes following definition(s):
        (1) manifest-dir/module-a.manifest
        (2) manifest-dir/module-d.manifest

    git sc status --all
        Your sparse checkout includes following definition(s):
        (1) manifest-dir/module-a.manifest
        (2) manifest-dir/module-d.manifest
        (3) manifest-dir/module-b.manifest (included by 1)
        (4) manifest-dir/module-c.manifest (included by 1)

I have a feeling that the current file skipped percentage prompt is
not that useful or actionable to end-users,
and they would still end up feeling lost/disoriented at the end.

Thanks,
Son Luong.

[1]: https://lore.kernel.org/git/pull.627.git.1588857462.gitgitgadget@gmail.com/T/#u

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] Sparse checkout status
  2020-06-17  7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc
@ 2020-06-17 16:48 ` Elijah Newren
  2020-06-17 17:58   ` Son Luong Ngoc
  0 siblings, 1 reply; 5+ messages in thread
From: Elijah Newren @ 2020-06-17 16:48 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee,
	Git Mailing List

Hi Son,

Thanks for the feedback.

On Wed, Jun 17, 2020 at 12:40 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Elijah,
>
> > Some of the feedback of folks trying out sparse-checkouts at $dayjob is that
> > sparse checkouts can sometimes be disorienting; users can forget that they
> > had a sparse-checkout and then wonder where files went.
>
> I agree with this observation: that the current 'git sparse-checkout' experience
> could be a bit 'lost' for end users, who may or may not be familiar
> with git's 'arcane magic'.
>
> Currently the only way to verify what's going on is to either run
> 'tree <repo-root-dir>'
> or 'cat .git/info/sparse-checkout' (human-readable but not easy).

Well, there is `git sparse-checkout list`, assuming users know they
are in a sparse-checkout, but the whole point of my suggested change
is that they sometimes don't.

> > This series adds some output to 'git status' and modifies git-prompt slightly as an attempt
> > to help.
>
> This is a great idea but I suggest to put a config/flag to let users
> enable/disable this.
>
> Git status is often utilized in automated commands (IDE, shell prompt,
> etc...) and there may be
> performance implications down the line not being able to skip this bit
> of information out.

This surprises me; I considered performance while writing it and kept
it simple on that basis.  In particular:
  * This does not cause any reading or writing of any extra files; it
is done solely with information that is already loaded.
  * If users aren't in a sparse-checkout, their performance overhead
is a single if-check, which I doubt anyone can measure.
  * If they are in a sparse-checkout, then they'd get one extra loop
over files in the index to check the SKIP_WORKTREE bit.

In which cases would performance implications be a concern?  For a
very simple point of reference, in a sparse-checkout of the linux
kernel (using --cone mode and only selecting the drivers/ directory),
I see the following timings for 'git status' in a clean checkout:

Without my change:
[newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
1 'git status'
Benchmark #1: git status
  Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
  Range (min … max):    74.0 ms …  84.0 ms    38 runs

With my change:
[newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
1 'git status'
Benchmark #1: git status
  Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
  Range (min … max):    74.8 ms …  84.5 ms    37 runs

I know the linux kernel is tiny compared to repos like Windows or
Office, but the relative scaling considerations are identical: it's
one extra loop through the cached entries checking a bit for each
entry.  If people are worried about the "extra loop", I could find an
existing loop to modify and just add an extra if-block in it so that
we have the same number of loops.  (I'm doubtful that'd actually help,
but if the concern is an extra loop, it'd certainly avoid that.)
Anyway, if you've got more information about it being too costly, I'm
happy to listen.  Otherwise, the overhead seems pretty small to me and
it's only paid by those who would benefit from the information.

However, all that said, I have good news: Peff already implemented the
flag users can use to avoid this extra output, and did so back in
September of 2009.  It's called "--porcelain".  Automated commands
should already be using it, and if they aren't, they are what needs
fixing -- not the long form status output.

> > For reference, I suspect that in repositories that are large enough that
> > people always use sparse-checkouts (e.g. windows or office repos), that this
> > isn't a problem. But when the repository is approximately
> > linux-kernel-sized, then it is reasonable for some folks to have a full
> > checkout. sparse-checkouts, however, can provide various build system and
> > IDE performance improvements, so we have a split of users who have
> > sparse-checkouts and those who have full checkouts. It's easy for users who
> > are bridging in between the two worlds or just trying out sparse-checkouts
> > for the first time to get confused.
>
> One of our users noted that the experience is improved when combining
> 'git worktree' with sparse-checkout.
> That way you get the correct sparsity for the topic that you are working on.
>
> In a way, the current sparse-checkout experience is similar to a user
> running 'git checkout <rev>' directly
> instead of checking out a branch.
> It does not feel tangible and reproducible.
>
> I was hoping that these concerns will be addressed once the In-Tree
> Sparse-Checkout Definition RFC[1] patch landed.
> We should then be able to print out which Definition File(s) (we often
> call it manifests) were used,
> and ideally, only the top most file(s) in the inheritance tree.
>
> So the ideal experience, in my mind, is something of this sort:
>
>     git sc init --cone
>
>     # assuming a inherited from b and c
>     git sc add --in-tree manifest-dir/module-a.manifest
>     git sc add --in-tree manifest-dir/module-d.manifest
>
>     git sc status
>         Your sparse checkout includes following definition(s):
>         (1) manifest-dir/module-a.manifest
>         (2) manifest-dir/module-d.manifest
>
>     git sc status --all
>         Your sparse checkout includes following definition(s):
>         (1) manifest-dir/module-a.manifest
>         (2) manifest-dir/module-d.manifest
>         (3) manifest-dir/module-b.manifest (included by 1)
>         (4) manifest-dir/module-c.manifest (included by 1)

I think having a 'git sparse-checkout status' would be a fine
subcommand, and output like the above -- possibly also including other
bits Stolee or I mentioned elsewhere in this thread -- would be cool
and would be helpful; it'd complement what I'm doing here quite
nicely.

But you're solving a related problem rather than the one I was
focusing on, and you have left the issue I was focusing on
unaddressed.  In particular, if users forgot that they sparsified in
the first place, how are they going to know to run `git
sparse-checkout status [--all]`?

I think having a simple line of output in `git status` would remind
them.  With that reminder, they could today then go run 'git
sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
internally) or './sparsify --info' (as I use internally) to get more
info.  In the future we could provide additional things for them as
well, such as your 'git sparse-checkout status'.


An aside, though, since you linked to the in-tree sparse-checkout
definitions: When I reviewed that series, the possibility of merge
conflicts and not knowing what sparse-checkout should have checked out
when the in-tree defintions themselves were in a conflicted state
seemed to me to be a pretty tough sticking point.  I'm hoping someone
has a clever solution, but I still don't yet.  Do you?

Thanks,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] Sparse checkout status
  2020-06-17 16:48 ` Elijah Newren
@ 2020-06-17 17:58   ` Son Luong Ngoc
  2020-06-17 22:36     ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
  0 siblings, 1 reply; 5+ messages in thread
From: Son Luong Ngoc @ 2020-06-17 17:58 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee,
	Git Mailing List

Hi Elijah,

On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> 
> Well, there is `git sparse-checkout list`, assuming users know they
> are in a sparse-checkout, but the whole point of my suggested change
> is that they sometimes don't.

Ah thats true.
This was added recently and definitely slipped my mind often.

> 
> This surprises me; I considered performance while writing it and kept
> it simple on that basis.  In particular:
>   * This does not cause any reading or writing of any extra files; it
> is done solely with information that is already loaded.
>   * If users aren't in a sparse-checkout, their performance overhead
> is a single if-check, which I doubt anyone can measure.
>   * If they are in a sparse-checkout, then they'd get one extra loop
> over files in the index to check the SKIP_WORKTREE bit.
> 
> In which cases would performance implications be a concern?  For a
> very simple point of reference, in a sparse-checkout of the linux
> kernel (using --cone mode and only selecting the drivers/ directory),
> I see the following timings for 'git status' in a clean checkout:
> 
> Without my change:
> [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
>   Range (min … max):    74.0 ms …  84.0 ms    38 runs
> 
> With my change:
> [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
>   Range (min … max):    74.8 ms …  84.5 ms    37 runs
> 
> I know the linux kernel is tiny compared to repos like Windows or
> Office, but the relative scaling considerations are identical: it's
> one extra loop through the cached entries checking a bit for each
> entry.  If people are worried about the "extra loop", I could find an
> existing loop to modify and just add an extra if-block in it so that
> we have the same number of loops.  (I'm doubtful that'd actually help,
> but if the concern is an extra loop, it'd certainly avoid that.)
> Anyway, if you've got more information about it being too costly, I'm
> happy to listen.  Otherwise, the overhead seems pretty small to me and
> it's only paid by those who would benefit from the information.
> 
> However, all that said, I have good news: Peff already implemented the
> flag users can use to avoid this extra output, and did so back in
> September of 2009.  It's called "--porcelain".  Automated commands
> should already be using it, and if they aren't, they are what needs
> fixing -- not the long form status output.

When I wrote my initial reaction, the idea of having more than just a
percentage reported back stuck in my mind, specifically with using the
in-tree checkout that I mentioned.

But yeah, that's something down the line to address, you are absolutely
correct that the current patch has no performance impact. Thanks for the
reminder about '--porcelain'.

> 
> I think having a 'git sparse-checkout status' would be a fine
> subcommand, and output like the above -- possibly also including other
> bits Stolee or I mentioned elsewhere in this thread -- would be cool
> and would be helpful; it'd complement what I'm doing here quite
> nicely.
> 
> But you're solving a related problem rather than the one I was
> focusing on, and you have left the issue I was focusing on
> unaddressed.  In particular, if users forgot that they sparsified in
> the first place, how are they going to know to run `git
> sparse-checkout status [--all]`?
> 
> I think having a simple line of output in `git status` would remind
> them.  With that reminder, they could today then go run 'git
> sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
> internally) or './sparsify --info' (as I use internally) to get more
> info.  In the future we could provide additional things for them as
> well, such as your 'git sparse-checkout status'.
> 

I do concede that this point could be a separate problem set and addressed
separately in the future.

> 
> An aside, though, since you linked to the in-tree sparse-checkout
> definitions: When I reviewed that series, the possibility of merge
> conflicts and not knowing what sparse-checkout should have checked out
> when the in-tree defintions themselves were in a conflicted state
> seemed to me to be a pretty tough sticking point.  I'm hoping someone
> has a clever solution, but I still don't yet.  Do you?

I am no clever person, but I often take great pleasure in reading up
works of smarter people. One of which is the Google's and Facebook's Mercurial
extension sets that they opensourced a while ago to support large repos.

The test suite for FB's 'sparse' extension[1] may address your concerns?

The 'sparse' extension defines the sparse checkout definition of a
working repository. It supports '--enable-profile' which take in definition
files ('.sparse'). These profiles are often checked into the root dir 
of the repo.

> 
> Thanks,
> Elijah

Regards,
Son Luong.

[1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status)
  2020-06-17 17:58   ` Son Luong Ngoc
@ 2020-06-17 22:36     ` Elijah Newren
  0 siblings, 0 replies; 5+ messages in thread
From: Elijah Newren @ 2020-06-17 22:36 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Derrick Stolee, Git Mailing List

Hi Son,

On Wed, Jun 17, 2020 at 10:58 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Elijah,
>
> On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> >
> > An aside, though, since you linked to the in-tree sparse-checkout
> > definitions: When I reviewed that series, the possibility of merge
> > conflicts and not knowing what sparse-checkout should have checked out
> > when the in-tree defintions themselves were in a conflicted state
> > seemed to me to be a pretty tough sticking point.  I'm hoping someone
> > has a clever solution, but I still don't yet.  Do you?
>
> I am no clever person, but I often take great pleasure in reading up
> works of smarter people. One of which is the Google's and Facebook's Mercurial
> extension sets that they opensourced a while ago to support large repos.
>
> The test suite for FB's 'sparse' extension[1] may address your concerns?
>
> The 'sparse' extension defines the sparse checkout definition of a
> working repository. It supports '--enable-profile' which take in definition
> files ('.sparse'). These profiles are often checked into the root dir
> of the repo.
>
> [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115

Ooh, interesting; thanks for the link.  It provides an idea, though
I'm not completely sure how it maps to our implementation.  The test
file says that during a merge you get "unioned files".  It's not fully
clear what union means, especially when the files have both includes
and excludes.  For example, does the union of matches mean a union of
includes and an intersection of excludes?  Also, digging a bit
further, it appears mercurial requires all includes to be before all
excludes[2].  But git's pattern specification used in
.git/info/sparse-checkout (taken from .gitignore rules) allows
includes and excludes to be arbitrarily interspersed, so what is an
appropriate union in our case?  (Can we sidestep this question by
limiting the in-tree sparsity definitions to cone mode only, which
then only have includes in the form of directory names, since that'd
allow easy "unioning"?)

A little more digging suggests that mercurial also only allows sparse
definitions to be read from commits, not from the working tree[3].
That seems bad to me; it's too much of a pain for users who want to
edit and test changes.  Sure, if their first commit is bad they could
`git commit --amend` after the fact, but I don't like forcing them
through that workflow.  (This is perhaps especially true if they're
trying to fix the definition during a rebase; they shouldn't have to
commit first to get a corrected sparsity definition, especially as
that can easily mess up rebase state.)

However, although I don't like reading sparsity definition from
commits rather than the working tree, it probably did have an
advantage in that it made it easier for mercurial folks to notice the
union idea: since they only get sparsity patterns from revisions, they
are kind of forced into thinking about getting them from both parents
and then "doing a union".  Anyway, following that logic, it'd be
tempting to say that we limit the in-tree definitions to cone mode,
and then if any of the definitions have conflicts then we just load
stages 2 and 3 of the file and union them.  But...what if stages 2 and
3 also have conflict markers in them (either because of recursive
merges or the more involved rename/rename(2to1) cases)?  How do we
ensure a well defined "union" of values?

I guess a similar question is what if users, while editing, fill the
sparse definition file with syntax errors -- and maybe even commit it.
Do we sparsify down to nothing? Expand out to everything? Ignore the
lines that don't otherwise parse and just use the rest?  Something
else?

The one other thing I noticed of interest from mercurial's sparsify
was that it apparently suffers from the same problems we used to in
git < 2.27.0: inability to update sparsity definitions when there are
any dirty changes[4].  That was a huge pain point; I'm glad we're not
stuck with that anymore.


Anyway, the mercurial link certainly provides some ideas even if it
doesn't answer all the questions.  Thanks for pointing it out.


Elijah


[2] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_59
[3] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_123
[4] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_485
     https://fossies.org/linux/mercurial/mercurial/sparse.py#l_526

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-06-17 22:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-17  7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc
2020-06-17 16:48 ` Elijah Newren
2020-06-17 17:58   ` Son Luong Ngoc
2020-06-17 22:36     ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-06-16 23:33 [PATCH 0/2] Sparse checkout status Elijah Newren via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).