git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: Son Luong Ngoc <sluongng@gmail.com>
Cc: Derrick Stolee <dstolee@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status)
Date: Wed, 17 Jun 2020 15:36:10 -0700
Message-ID: <CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com>
In-Reply-To: <20200617175850.GA57254@C02YX140LVDN.corpad.adbkng.com>

Hi Son,

On Wed, Jun 17, 2020 at 10:58 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Elijah,
>
> On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> >
> > An aside, though, since you linked to the in-tree sparse-checkout
> > definitions: When I reviewed that series, the possibility of merge
> > conflicts and not knowing what sparse-checkout should have checked out
> > when the in-tree definitions themselves were in a conflicted state
> > seemed to me to be a pretty tough sticking point.  I'm hoping someone
> > has a clever solution, but I still don't yet.  Do you?
>
> I am no clever person, but I often take great pleasure in reading the
> work of smarter people. One example is Google's and Facebook's Mercurial
> extension sets, which they open-sourced a while ago to support large repos.
>
> The test suite for FB's 'sparse' extension[1] may address your concerns?
>
> The 'sparse' extension defines the sparse checkout definition of a
> working repository. It supports '--enable-profile', which takes definition
> files ('.sparse'). These profiles are often checked into the root dir
> of the repo.
>
> [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115

Ooh, interesting; thanks for the link.  It provides an idea, though
I'm not completely sure how it maps to our implementation.  The test
file says that during a merge you get "unioned files".  It's not fully
clear what union means, especially when the files have both includes
and excludes.  For example, does the union of matches mean a union of
includes and an intersection of excludes?  Also, digging a bit
further, it appears Mercurial requires all includes to come before all
excludes[2].  But Git's pattern specification used in
.git/info/sparse-checkout (taken from .gitignore rules) allows
includes and excludes to be arbitrarily interspersed, so what is an
appropriate union in our case?  (Can we sidestep this question by
limiting the in-tree sparsity definitions to cone mode only, which
would then only have includes in the form of directory names, since
that'd allow easy "unioning"?)
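
To make the cone-mode "unioning" concrete, here is a rough Python
sketch.  It is only an illustration, not git code: the file format
(one directory name per line, '#' for comments) and the function names
parse_cone_dirs and union_cone_dirs are assumptions made up for this
example.

```python
from pathlib import PurePosixPath


def parse_cone_dirs(text):
    """Return the set of directories named in a cone-style definition.

    Assumed (hypothetical) format: one directory per line; blank lines
    and lines starting with '#' are ignored.
    """
    dirs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        d = line.strip("/")
        if d:
            dirs.add(d)
    return dirs


def union_cone_dirs(*definitions):
    """Union several definitions, dropping any directory whose parent
    directory is already included."""
    all_dirs = set().union(*(parse_cone_dirs(d) for d in definitions))
    result = set()
    for d in all_dirs:
        parents = {str(p) for p in PurePosixPath(d).parents}
        if not (parents & all_dirs):
            result.add(d)
    return sorted(result)
```

With only directory includes, the union is order-independent, which is
exactly what arbitrarily interspersed includes and excludes do not give
us.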

A little more digging suggests that Mercurial also only allows sparse
definitions to be read from commits, not from the working tree[3].
That seems bad to me; it's too much of a pain for users who want to
edit and test changes.  Sure, if their first commit is bad they could
`git commit --amend` after the fact, but I don't like forcing them
through that workflow.  (This is perhaps especially true if they're
trying to fix the definition during a rebase; they shouldn't have to
commit first to get a corrected sparsity definition, especially as
that can easily mess up rebase state.)

However, although I don't like reading sparsity definitions from
commits rather than the working tree, it probably did have an
advantage in that it made it easier for the Mercurial folks to notice
the union idea: since they only get sparsity patterns from revisions,
they are kind of forced into thinking about getting them from both
parents and then "doing a union".  Anyway, following that logic, it'd be
tempting to say that we limit the in-tree definitions to cone mode,
and then if any of the definitions have conflicts then we just load
stages 2 and 3 of the file and union them.  But...what if stages 2 and
3 also have conflict markers in them (either because of recursive
merges or the more involved rename/rename(2to1) cases)?  How do we
ensure a well defined "union" of values?
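
As a rough Python illustration of the "load stages 2 and 3 and union
them" idea, including the awkward case: `git show :2:<path>` and
`git show :3:<path>` are the real way to read those index stages, but
everything else here (the function names, the one-directory-per-line
format, and bailing out with None when markers are found) is an
assumption for the sake of the sketch, not a proposal for the actual
behavior.

```python
import subprocess

# Line prefixes that indicate leftover conflict markers (including diff3 style).
CONFLICT_MARKERS = ("<<<<<<<", "|||||||", "=======", ">>>>>>>")


def read_stage(path, stage):
    """Read one index stage of a path via `git show :<stage>:<path>`."""
    return subprocess.run(
        ["git", "show", f":{stage}:{path}"],
        check=True, capture_output=True, text=True,
    ).stdout


def has_conflict_markers(text):
    """True if any line starts with a conflict-marker prefix."""
    return any(line.startswith(CONFLICT_MARKERS) for line in text.splitlines())


def union_of_stages(ours, theirs):
    """Return the sorted union of directory lines from both sides, or
    None if either side itself contains conflict markers."""
    if has_conflict_markers(ours) or has_conflict_markers(theirs):
        return None
    dirs = set()
    for text in (ours, theirs):
        for line in text.splitlines():
            line = line.strip()
            if line:
                dirs.add(line)
    return sorted(dirs)
```

Inside a repository with a conflicted definition file, this would be
called as union_of_stages(read_stage(p, 2), read_stage(p, 3)) for some
hypothetical in-tree path p.  The None case is precisely the open
question: the sketch detects the problem but does not answer it.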

I guess a similar question is what if users, while editing, fill the
sparse definition file with syntax errors -- and maybe even commit it.
Do we sparsify down to nothing? Expand out to everything? Ignore the
lines that don't otherwise parse and just use the rest?  Something
else?
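
To spell out the three options, here is a small Python sketch.  The
notion of a "valid" line (a plain relative directory path with no glob
characters and no leading '!'), the function name load_definition, and
the policy names are all assumptions invented for this example.

```python
import re

# Assumed notion of a valid line: a plain directory path, no glob
# characters, no backslashes, and no leading '!'.
VALID_DIR = re.compile(r"^[^!*?\[\]\\][^*?\[\]\\]*$")

POLICIES = ("skip", "nothing", "everything")


def load_definition(text, on_error="skip"):
    """Return (include_everything, dirs) for a definition file.

    on_error selects what happens when some line does not parse:
      "skip"       - ignore the bad lines and use the rest
      "nothing"    - sparsify down to nothing
      "everything" - expand out to everything
    """
    if on_error not in POLICIES:
        raise ValueError(f"unknown policy: {on_error}")
    dirs, bad = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        (dirs if VALID_DIR.match(line) else bad).append(line)
    if bad and on_error == "nothing":
        return False, []
    if bad and on_error == "everything":
        return True, []
    return False, dirs
```

None of the three is obviously right, which is why this is a question
rather than a suggestion.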

The one other thing of interest I noticed in Mercurial's sparse
implementation is that it apparently suffers from the same problem we
used to have in git < 2.27.0: inability to update sparsity definitions
when there are any dirty changes[4].  That was a huge pain point; I'm
glad we're not stuck with that anymore.


Anyway, the Mercurial link certainly provides some ideas even if it
doesn't answer all the questions.  Thanks for pointing it out.


Elijah


[2] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_59
[3] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_123
[4] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_485
     https://fossies.org/linux/mercurial/mercurial/sparse.py#l_526


Thread overview: 4+ messages
2020-06-17  7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc
2020-06-17 16:48 ` Elijah Newren
2020-06-17 17:58   ` Son Luong Ngoc
2020-06-17 22:36     ` Elijah Newren [this message]
