From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, Stefan Beller <sbeller@google.com>,
Derrick Stolee <stolee@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
Date: Fri, 08 Jun 2018 13:59:56 +0200 [thread overview]
Message-ID: <867en9pjyb.fsf@gmail.com> (raw)
In-Reply-To: <20180531174024.124488-1-dstolee@microsoft.com> (Derrick Stolee's message of "Thu, 31 May 2018 17:40:39 +0000")
Derrick Stolee <dstolee@microsoft.com> writes:
> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.
I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace quite literally.
I will be calling all those features "history-altering features", for
short.
> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.
What I would like to see here in cover letter is a generic description
of _strategy_ to deal with those history-altering features.
From what I understand you have the following options for each of
replacements, shallow clones and grafts:
- turn off serialized commit-graph if given history-altering feature is
present, as if core.commitGraph was set to false
- invalidate and optionally refresh commit-graph file if given
history-altering feature is present (maybe only if it changed the
view of the history, is such check is possible)
For automatic invalidation you would need to have either:
- cover all possible ways by which given history-altering feature can
change the view of history, or
- keep state of history-altering change for which commit-graph was
created (e.g. in proposed VAL4 chunk) and compare with current view
of history if it changed
For automatic turning off you would need only to check if
history-altering feature is present.
Let us examine each of those history-altering features that Git
supports:
* shallow clone
- shallow clone usually means having shorter history, so serialized
commit-graph is not needed as much; also changing the depth
screws-up assumptions about generation numbers
- there are only a few entry points changing the view of history,
namely fetch and push with shallow options (--depth=<depth>,
--deepen=<depth>, --shallow-since=<date>, --update-shallow,
--shallow-exclude=<revision>, --unshallow)
- it is easy to check for presence of shallow clone feature by
chacking of $GIT_DIR/shallow exists and is not empty
- different contents of $GIT_DIR/shallow means different view of
history
- NOTE: internally uses grafts mechanism (emulated grafts)
* replacements (replace objects, git-replace)
- git-replace can be used to join current development repository with
historical repository, in which case we would certainly want to make
use of serialied commit graph; on the other hand the replacements do
not necessary alter the view of the history
- theoretically you could create replacement refs by hand, but in
practice there are TWO ways of getting them:
- using git-replace command to create, edit/change and delete
replacement objects'
'
- fetching or having pushed-to refs in refs/replace/* namespace
- you need to check if there are any refs in refs/replace/* namespace
to check if the feature is used (but it doesn't necessarily mean
that it altered project history)
- changed list of refs in refs/replace/* namespace (which you can get
with for-each-ref command/API) does not necessarily mean that the
view of the history changed: you can replace non-commit object, you
can replace commits and not change their parents; it is not as easy
as checking if file exists
- NOTE: the feature can be turned off manually with
GIT_NO_REPLACE_OBJECTS environment variable and with
--no-replace-objects option to git wrapper.
Also when pushing, fetching and fsck-ing this feature is turned off
and refs in refs/replace/* namespace are treated as ordinary refs.
This may mean that we would want to create commit-graph with
replacements for ordinary non-bare repository, and without
replacements for server-side bare repository.
* grafts
- subset of uses of git-replace, older and *obsolete* feature (because
it is unsafe to use; that is you need to be careful with it).
- edited by hand, no automatic entry points
- if $GIT_DIR/info/grafts file is present, then feature is enabled
(barring some corner cases, like empty file or file consisting only
of comments)
- changed contents of this file means changed view of history (well,
except for reordering lines, or removing/adding empty lines and
comments)
>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.
That's clever way of increasing coverage.
> Running the full test suite after these DO NO MERGE commits results in
> the following test failures, which I categorize by type of failure.
>
> The following tests are red herrings. Most work by deleting a commit
> from the object database and trying to demonstrate a failure. Others
> rely on how certain objects are loaded. These are not bugs, but will
> add noise if you run the tests suite with this patch.
>
> t0410-partial-clone.sh Failed tests: 5, 8
> t5307-pack-missing-commit.sh Failed tests: 3-4
> t6011-rev-list-with-bad-commit.sh Failed test: 4
> t6024-recursive-merge.sh Failed test: 4
Does this means that those tests should be in the end (i.e. when
core.commitGraph is turned on by defult) be simply run with
core.commitGraph explicitly disabled for the test?
> The following tests are fixed in "commit-graph: enable replace-object
> and grafts".
Would it make sense to split this commit into two dealing separately
with replace objects and with grafts? Or do they use the same
underlying API?
> t6001-rev-list-graft.sh Failed tests: 3, 5, 7, 9, 11, 13
O.K. It might be good idea to add separate test that does the same, but
with git-replace instead of grafts, though.
> t6050-replace.sh Failed tests: 11-15, 23-24, 29
The t6050-replace.sh does not test changing the DAG of revision
(excluding changing the SHA-1 of commit), if I read it correctly. It
would be good to test using git-replace to change committer date, and to
change parents: shortening or lengthening history (e.g. emulating
joining two independent lines of development in the latter case).
See also comment about t6001 above.
>
> The following tests involve shallow clones.
>
> t5500-fetch-pack.sh Failed tests: 30-31, 34, 348-349
> t5537-fetch-shallow.sh Failed tests: 4-7, 9
> t5802-connect-helper.sh Failed test: 3
>
> These tests are mostly fixed by three commits:
>
> * commit-graph: avoid writing when repo is shallow
> * fetch: destroy commit graph on shallow parameters
Seems O.K. from the subject, but I'd have to check the details.
> * commit-graph: revert to odb on missing parents
I wonder if reverting to using object database is a good solution, and
if it wouldn't be better to invalidate / delete commit-graph file
instead...
>
> Each of these cases cover a different interaction that occurs with
> shallow clones. Some are due to a commit becoming shallow, while others
> are due to a commit becoming unshallow (and hence invalidating its
> generation number).
Why do not simply check if repository is shallow?
>
> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh Failed test: 348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected. After struggling
> with it for a while, I figured I should just put this series up for
> discussion. Maybe someone with more experience in shallow clones can
> point out the obvious issues I'm having.
>
> Thanks,
> -Stolee
>
> Derrick Stolee (6):
> DO NOT MERGE: compute commit-graph on every commit
> DO NOT MERGE: write commit-graph on every fetch
> commit-graph: enable replace-object and grafts
> commit-graph: avoid writing when repo is shallow
> fetch: destroy commit graph on shallow parameters
> commit-graph: revert to odb on missing parents
>
> builtin/commit.c | 5 +++++
> builtin/fetch.c | 10 +++++++++
> builtin/gc.c | 2 +-
> builtin/replace.c | 3 +++
> commit-graph.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
> commit-graph.h | 9 ++++++++
> commit.c | 5 +++++
> environment.c | 2 +-
> 8 files changed, 95 insertions(+), 6 deletions(-)
prev parent reply other threads:[~2018-06-08 12:00 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
2018-05-31 19:39 ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
2018-06-09 15:47 ` Jakub Narebski
2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
2018-05-31 19:07 ` Stefan Beller
2018-06-01 2:30 ` Junio C Hamano
2018-06-01 11:46 ` Derrick Stolee
2018-06-02 18:39 ` Jakub Narebski
2018-06-04 2:19 ` Junio C Hamano
2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
2018-05-31 19:29 ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 6/6] commit-graph: revert to odb on missing parents Derrick Stolee
2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
2018-06-01 1:09 ` Derrick Stolee
2018-06-08 11:59 ` Jakub Narebski [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=867en9pjyb.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=sbeller@google.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).