* [PATCH 0/1] read-cache: update index format default to v4
@ 2018-09-24 21:15 Derrick Stolee via GitGitGadget
2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-24 21:15 UTC (permalink / raw)
To: git; +Cc: pclouds, peartben, git, Junio C Hamano
After discussing this with several people off-list, I thought I would open
the question up to the list:
Should we update the default index version to v4?
The .git/index file stores the list of every path tracked by Git in the
working directory, including merge information, staged paths, and
information about the file system contents (based on modified time). The
only major update in v4 is that the paths are prefix-compressed. This
compression works best in repos with a lot of paths, especially deep paths.
For this reason, we set the index to v4 in VFS for Git.
Among VFS for Git contributors, we were talking about how the v4 format is
not covered by the test suite by default. We are working to increase the
number of CI builds that set extra GIT_TEST_* variables that we need.
However, I thought it worth having a discussion of whether this is a good
thing to recommend for all users of Git.
Personally, I'm not an expert here, but I am happy to start the
conversation.
Thanks, -Stolee
Derrick Stolee (1):
read-cache: update index format default to v4
read-cache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-41%2Fderrickstolee%2Findex-v4-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-41/derrickstolee/index-v4-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/41
--
gitgitgadget
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/1] read-cache: update index format default to v4
2018-09-24 21:15 [PATCH 0/1] read-cache: update index format default to v4 Derrick Stolee via GitGitGadget
@ 2018-09-24 21:15 ` Derrick Stolee via GitGitGadget
2018-09-24 21:32 ` SZEDER Gábor
0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-24 21:15 UTC (permalink / raw)
To: git; +Cc: pclouds, peartben, git, Junio C Hamano, Derrick Stolee
From: Derrick Stolee <dstolee@microsoft.com>
The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.
Despite being stable for a long time, this index version was never
adopted as the default. This prefix-compressed version of the format
can get significant space savings on repos with large working
directories (which naturally tend to have deep nesting). This version
is set as the default for some external tools, such as VFS for Git.
Because of this external use, the format has had a lot of "testing in
production" and also is subject to continuous integration in these
environments.
Previously, to test version 4 indexes, we needed to run the test
suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
One potential, but short-term, downside is that we lose coverage of
the version 3 indexes. The trade-off is that we may want to cover
that version using GIT_TEST_INDEX_VERSION=3.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
read-cache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/read-cache.c b/read-cache.c
index 372588260e..af6c8f2a67 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
* Index File I/O
*****************************************************************/
-#define INDEX_FORMAT_DEFAULT 3
+#define INDEX_FORMAT_DEFAULT 4
static unsigned int get_index_format_default(void)
{
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
@ 2018-09-24 21:32 ` SZEDER Gábor
2018-09-24 22:29 ` Junio C Hamano
2018-09-25 7:06 ` Patrick Steinhardt
0 siblings, 2 replies; 9+ messages in thread
From: SZEDER Gábor @ 2018-09-24 21:32 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, pclouds, peartben, git, Junio C Hamano, Derrick Stolee
On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The index v4 format has been available since 2012 with 9d22778
> "reach-cache.c: write prefix-compressed names in the index". Since
> the format has been stable for so long, almost all versions of Git
> in use today understand version 4, removing one barrier to upgrade
> -- that someone may want to downgrade and needs a working repo.
What about alternative implementations, like JGit, libgit2, etc.?
> Despite being stable for a long time, this index version was never
> adopted as the default. This prefix-compressed version of the format
> can get significant space savings on repos with large working
> directories (which naturally tend to have deep nesting). This version
> is set as the default for some external tools, such as VFS for Git.
> Because of this external use, the format has had a lot of "testing in
> production" and also is subject to continuous integration in these
> environments.
>
> Previously, to test version 4 indexes, we needed to run the test
> suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
>
> One potential, but short-term, downside is that we lose coverage of
> the version 3 indexes. The trade-off is that we may want to cover
> that version using GIT_TEST_INDEX_VERSION=3.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> read-cache.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 372588260e..af6c8f2a67 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
> * Index File I/O
> *****************************************************************/
>
> -#define INDEX_FORMAT_DEFAULT 3
> +#define INDEX_FORMAT_DEFAULT 4
>
> static unsigned int get_index_format_default(void)
> {
> --
> gitgitgadget
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-24 21:32 ` SZEDER Gábor
@ 2018-09-24 22:29 ` Junio C Hamano
2018-09-25 7:06 ` Patrick Steinhardt
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2018-09-24 22:29 UTC (permalink / raw)
To: SZEDER Gábor
Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
Derrick Stolee
SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The index v4 format has been available since 2012 with 9d22778
>> "reach-cache.c: write prefix-compressed names in the index". Since
>> the format has been stable for so long, almost all versions of Git
>> in use today understand version 4, removing one barrier to upgrade
>> -- that someone may want to downgrade and needs a working repo.
>
> What about alternative implementations, like JGit, libgit2, etc.?
Good question.
Because the index-version of an index file is designed to be sticky,
repos that need to be accessed by other implementations can keep
whatever current version. A new repo that need to be accessed by
them can be (forcibly) written in v2 and keep its v2ness, I would
think.
And that would serve as an incentive for the implementations to
catch up ;-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-24 21:32 ` SZEDER Gábor
2018-09-24 22:29 ` Junio C Hamano
@ 2018-09-25 7:06 ` Patrick Steinhardt
2018-09-25 14:29 ` Derrick Stolee
1 sibling, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2018-09-25 7:06 UTC (permalink / raw)
To: SZEDER Gábor
Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
Junio C Hamano, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 2704 bytes --]
On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > The index v4 format has been available since 2012 with 9d22778
> > "reach-cache.c: write prefix-compressed names in the index". Since
> > the format has been stable for so long, almost all versions of Git
> > in use today understand version 4, removing one barrier to upgrade
> > -- that someone may want to downgrade and needs a working repo.
>
> What about alternative implementations, like JGit, libgit2, etc.?
Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93 (Merge pull request #3837 from
novalis/dturner/indexv4, 2016-08-17), released with v0.25 in
December 2016. Due to insufficient tests, our read support was
initially broken, which got fixed with commit 3bc95cfe3 (Merge
pull request #4236 from pks-t/pks/index-v4-fixes, 2017-06-07) and
released with v0.26 in June 2017.
Right now I'm not aware of additional bugs in our index v4
support, so we'd be fine with changing the default.
Patrick
> > Despite being stable for a long time, this index version was never
> > adopted as the default. This prefix-compressed version of the format
> > can get significant space savings on repos with large working
> > directories (which naturally tend to have deep nesting). This version
> > is set as the default for some external tools, such as VFS for Git.
> > Because of this external use, the format has had a lot of "testing in
> > production" and also is subject to continuous integration in these
> > environments.
> >
> > Previously, to test version 4 indexes, we needed to run the test
> > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> >
> > One potential, but short-term, downside is that we lose coverage of
> > the version 3 indexes. The trade-off is that we may want to cover
> > that version using GIT_TEST_INDEX_VERSION=3.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> > read-cache.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index 372588260e..af6c8f2a67 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
> > * Index File I/O
> > *****************************************************************/
> >
> > -#define INDEX_FORMAT_DEFAULT 3
> > +#define INDEX_FORMAT_DEFAULT 4
> >
> > static unsigned int get_index_format_default(void)
> > {
> > --
> > gitgitgadget
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-25 7:06 ` Patrick Steinhardt
@ 2018-09-25 14:29 ` Derrick Stolee
2018-09-25 18:01 ` Stefan Beller
0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2018-09-25 14:29 UTC (permalink / raw)
To: Patrick Steinhardt, SZEDER Gábor
Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
Junio C Hamano, Derrick Stolee, Jonathan Nieder
On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> The index v4 format has been available since 2012 with 9d22778
>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>> the format has been stable for so long, almost all versions of Git
>>> in use today understand version 4, removing one barrier to upgrade
>>> -- that someone may want to downgrade and needs a working repo.
>> What about alternative implementations, like JGit, libgit2, etc.?
> Speaking of libgit2, we are able to read and write index v4 since
> commit c1b370e93
This is a good point, Szeder.
Patrick: I'm glad LibGit2 is up-to-date with index formats.
Unfortunately, taking a look (for the first time) at the JGit code
reveals that they don't appear to have v4 support. In
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
DirCache.readFrom() method: lines 488-494, I see the following snippet:
final int ver = NB.decodeInt32(hdr, 4);
boolean extended = false;
if (ver == 3)
extended = true;
else if (ver != 2)
throw new
CorruptObjectException(MessageFormat.format(
JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
It looks like this will immediately throw with versions other than 2 or 3.
I'm adding Jonathan Nieder to CC so he can check with JGit people about
the impact of this change.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-25 14:29 ` Derrick Stolee
@ 2018-09-25 18:01 ` Stefan Beller
2018-09-25 21:22 ` Ben Peart
2018-09-26 21:03 ` Matthias Sohn
0 siblings, 2 replies; 9+ messages in thread
From: Stefan Beller @ 2018-09-25 18:01 UTC (permalink / raw)
To: Derrick Stolee, Matthias Sohn
Cc: Patrick Steinhardt, SZEDER Gábor, gitgitgadget, git,
Duy Nguyen, Ben Peart, Jeff Hostetler, Junio C Hamano,
Derrick Stolee, Jonathan Nieder
On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
> >>> The index v4 format has been available since 2012 with 9d22778
> >>> "reach-cache.c: write prefix-compressed names in the index". Since
> >>> the format has been stable for so long, almost all versions of Git
> >>> in use today understand version 4, removing one barrier to upgrade
> >>> -- that someone may want to downgrade and needs a working repo.
> >> What about alternative implementations, like JGit, libgit2, etc.?
> > Speaking of libgit2, we are able to read and write index v4 since
> > commit c1b370e93
>
> This is a good point, Szeder.
>
> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>
> Unfortunately, taking a look (for the first time) at the JGit code
> reveals that they don't appear to have v4 support. In
> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>
> final int ver = NB.decodeInt32(hdr, 4);
> boolean extended = false;
> if (ver == 3)
> extended = true;
> else if (ver != 2)
> throw new
> CorruptObjectException(MessageFormat.format(
> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>
> It looks like this will immediately throw with versions other than 2 or 3.
>
> I'm adding Jonathan Nieder to CC so he can check with JGit people about
> the impact of this change.
JGit is used both on the server (which doesn't use index/staging area)
as well as client side as e.g. an Eclipse integration, which would
very much like to use the index.
Adding Matthias Sohn as well, who is active in JGit and cares
more about the client side than Googlers who only make use
of the server side part of JGit.
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-25 18:01 ` Stefan Beller
@ 2018-09-25 21:22 ` Ben Peart
2018-09-26 21:03 ` Matthias Sohn
1 sibling, 0 replies; 9+ messages in thread
From: Ben Peart @ 2018-09-25 21:22 UTC (permalink / raw)
To: Stefan Beller, Derrick Stolee, Matthias Sohn
Cc: Patrick Steinhardt, SZEDER Gábor, gitgitgadget, git,
Duy Nguyen, Jeff Hostetler, Junio C Hamano, Derrick Stolee,
Jonathan Nieder
On 9/25/2018 2:01 PM, Stefan Beller wrote:
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
>>> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>>>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>>
>>>>> The index v4 format has been available since 2012 with 9d22778
>>>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>>>> the format has been stable for so long, almost all versions of Git
>>>>> in use today understand version 4, removing one barrier to upgrade
>>>>> -- that someone may want to downgrade and needs a working repo.
>>>> What about alternative implementations, like JGit, libgit2, etc.?
>>> Speaking of libgit2, we are able to read and write index v4 since
>>> commit c1b370e93
>>
>> This is a good point, Szeder.
>>
>> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>>
>> Unfortunately, taking a look (for the first time) at the JGit code
>> reveals that they don't appear to have v4 support. In
>> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
>> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>>
>> final int ver = NB.decodeInt32(hdr, 4);
>> boolean extended = false;
>> if (ver == 3)
>> extended = true;
>> else if (ver != 2)
>> throw new
>> CorruptObjectException(MessageFormat.format(
>> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>>
>> It looks like this will immediately throw with versions other than 2 or 3.
>>
>> I'm adding Jonathan Nieder to CC so he can check with JGit people about
>> the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.
>
> Stefan
>
In addition to the compatibility with other implementations of git
question, I think we should include Duy's patch [1] to "optimize reading
index format v4" before flipping the default to V4.
In my testing, this reduced the index load time of V4 by 10%-15% making
the CPU cost only ~2% more expensive than V2 or V3 indexes. This
increase in CPU cost is more than offset by the significant reduction in
file IO due to the reduced index file size.
[1] https://public-inbox.org/git/20180825064458.28484-1-pclouds@gmail.com/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] read-cache: update index format default to v4
2018-09-25 18:01 ` Stefan Beller
2018-09-25 21:22 ` Ben Peart
@ 2018-09-26 21:03 ` Matthias Sohn
1 sibling, 0 replies; 9+ messages in thread
From: Matthias Sohn @ 2018-09-26 21:03 UTC (permalink / raw)
To: Stefan Beller
Cc: stolee, ps, szeder.dev, gitgitgadget, git, pclouds, peartben, git,
Junio C Hamano, dstolee, jrnieder
On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > >>> From: Derrick Stolee <dstolee@microsoft.com>
> > >>>
> > >>> The index v4 format has been available since 2012 with 9d22778
> > >>> "reach-cache.c: write prefix-compressed names in the index". Since
> > >>> the format has been stable for so long, almost all versions of Git
> > >>> in use today understand version 4, removing one barrier to upgrade
> > >>> -- that someone may want to downgrade and needs a working repo.
> > >> What about alternative implementations, like JGit, libgit2, etc.?
> > > Speaking of libgit2, we are able to read and write index v4 since
> > > commit c1b370e93
> >
> > This is a good point, Szeder.
> >
> > Patrick: I'm glad LibGit2 is up-to-date with index formats.
> >
> > Unfortunately, taking a look (for the first time) at the JGit code
> > reveals that they don't appear to have v4 support. In
> > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> > DirCache.readFrom() method: lines 488-494, I see the following snippet:
> >
> > final int ver = NB.decodeInt32(hdr, 4);
> > boolean extended = false;
> > if (ver == 3)
> > extended = true;
> > else if (ver != 2)
> > throw new
> > CorruptObjectException(MessageFormat.format(
> > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
> >
> > It looks like this will immediately throw with versions other than 2 or 3.
> >
> > I'm adding Jonathan Nieder to CC so he can check with JGit people about
> > the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.
thanks for the heads up, in fact JGit does not yet support index version 4.
I will look into this.
-Matthias
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-09-26 21:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-24 21:15 [PATCH 0/1] read-cache: update index format default to v4 Derrick Stolee via GitGitGadget
2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2018-09-24 21:32 ` SZEDER Gábor
2018-09-24 22:29 ` Junio C Hamano
2018-09-25 7:06 ` Patrick Steinhardt
2018-09-25 14:29 ` Derrick Stolee
2018-09-25 18:01 ` Stefan Beller
2018-09-25 21:22 ` Ben Peart
2018-09-26 21:03 ` Matthias Sohn
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).