* git svn clone/fetch hits issues with gc --auto [not found] <CACPiFCJZ83sqE7Gaj2pa12APkBF5tau-C6t4_GrXBWDwcMnJHg@mail.gmail.com> @ 2018-10-09 22:51 ` Martin Langhoff 2018-10-09 23:45 ` Eric Wong 2018-10-10 8:04 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 26+ messages in thread From: Martin Langhoff @ 2018-10-09 22:51 UTC (permalink / raw) To: Git Mailing List, Eric Wong Hi folks, Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo I hit the gc error: warning: There are too many unreachable loose objects; run 'git prune' to remove them. gc --auto: command returned error: 255 I don't seem to be the only one -- https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn Looking at code history, it dropped the ability to pass options to git repack when it was converted it to using git gc. Experimentally I find that tweaking it to run git gc --auto --prune=5.minutes.ago works well, while --prune=now breaks it. Attempts to commit fail with missing objects. - Why does --prune=now break it? Perhaps "gc" runs in the background, and races with the commit being prepared? - Would it be safe, sane to apply --prune=some.value on _clone_? - During _fetch_, --prune=some.value seems risky. In a checkout being actively used for development or merging it'd risk pruning objects users expect to be there for recovery. Would there be a safe, sane way? - Is there a safer, saner value than 5 minutes? cheers, m -- martin.langhoff@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted ~ http://github.com/martin-langhoff by shiny stuff -- martin.langhoff@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted ~ http://github.com/martin-langhoff by shiny stuff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-09 22:51 ` git svn clone/fetch hits issues with gc --auto Martin Langhoff @ 2018-10-09 23:45 ` Eric Wong 2018-10-10 2:49 ` Junio C Hamano 2018-10-10 8:04 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 26+ messages in thread From: Eric Wong @ 2018-10-09 23:45 UTC (permalink / raw) To: Martin Langhoff; +Cc: List Martin Langhoff <martin.langhoff@gmail.com> wrote: > Hi folks, > > Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo > I hit the gc error: > > warning: There are too many unreachable loose objects; run 'git prune' > to remove them. > gc --auto: command returned error: 255 GC can be annoying when that happens... For git-svn, perhaps this can be appropriate to at least allow the import to continue: diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm index 76b2965905..9b0caa3d47 100644 --- a/perl/Git/SVN.pm +++ b/perl/Git/SVN.pm @@ -999,7 +999,7 @@ sub restore_commit_header_env { } sub gc { - command_noisy('gc', '--auto'); + eval { command_noisy('gc', '--auto') }; }; sub do_git_commit { But yeah, somebody else who works on git regularly could probably stop repack from writing thousands of loose objects (and instead write a self-contained pack with those objects, instead). I haven't followed git closely lately, myself. ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-09 23:45 ` Eric Wong @ 2018-10-10 2:49 ` Junio C Hamano 2018-10-10 11:01 ` Martin Langhoff 0 siblings, 1 reply; 26+ messages in thread From: Junio C Hamano @ 2018-10-10 2:49 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Eric Wong, Martin Langhoff, List Forwarding to Jonathan, as I think this is an interesting supporting vote for the topic that we were stuck on. Eric Wong <e@80x24.org> writes: > Martin Langhoff <martin.langhoff@gmail.com> wrote: >> Hi folks, >> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo >> I hit the gc error: >> >> warning: There are too many unreachable loose objects; run 'git prune' >> to remove them. >> gc --auto: command returned error: 255 > > GC can be annoying when that happens... For git-svn, perhaps > this can be appropriate to at least allow the import to continue: > > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm > index 76b2965905..9b0caa3d47 100644 > --- a/perl/Git/SVN.pm > +++ b/perl/Git/SVN.pm > @@ -999,7 +999,7 @@ sub restore_commit_header_env { > } > > sub gc { > - command_noisy('gc', '--auto'); > + eval { command_noisy('gc', '--auto') }; > }; > > sub do_git_commit { > > > But yeah, somebody else who works on git regularly could > probably stop repack from writing thousands of loose > objects (and instead write a self-contained pack with > those objects, instead). I haven't followed git closely > lately, myself. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 2:49 ` Junio C Hamano @ 2018-10-10 11:01 ` Martin Langhoff 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 26+ messages in thread From: Martin Langhoff @ 2018-10-10 11:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, e, Git Mailing List Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many loose objects" makes sense to me. - remove unactionable warning - as the warning is gone, no gc.log is produced - subsequent gc runs don't exit due to gc.log My very humble +1 on that. As for downsides... if we have truly tons of _recent_ loose objects, it'll ... take disk space? I'm fine with that. For more aggressive gc options, thoughts: - Do we always consider git gc --prune=now "safe" in a "won't delete stuff the user is likely to want" sense? For example -- are the references from reflogs enough safety? - Even if we don't, for some commands it should be safe to run git gc --prune=now at the end of the process, for example an import that generates a new git repo (git svn clone). cheers, m On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano <gitster@pobox.com> wrote: > > Forwarding to Jonathan, as I think this is an interesting supporting > vote for the topic that we were stuck on. > > Eric Wong <e@80x24.org> writes: > > > Martin Langhoff <martin.langhoff@gmail.com> wrote: > >> Hi folks, > >> > >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo > >> I hit the gc error: > >> > >> warning: There are too many unreachable loose objects; run 'git prune' > >> to remove them. > >> gc --auto: command returned error: 255 > > > > GC can be annoying when that happens... For git-svn, perhaps > > this can be appropriate to at least allow the import to continue: > > > > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm > > index 76b2965905..9b0caa3d47 100644 > > --- a/perl/Git/SVN.pm > > +++ b/perl/Git/SVN.pm > > @@ -999,7 +999,7 @@ sub restore_commit_header_env { > > } > > > > sub gc { > > - command_noisy('gc', '--auto'); > > + eval { command_noisy('gc', '--auto') }; > > }; > > > > sub do_git_commit { > > > > > > But yeah, somebody else who works on git regularly could > > probably stop repack from writing thousands of loose > > objects (and instead write a self-contained pack with > > those objects, instead). I haven't followed git closely > > lately, myself. -- martin.langhoff@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted ~ http://github.com/martin-langhoff by shiny stuff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:01 ` Martin Langhoff @ 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason 2018-10-10 11:41 ` Martin Langhoff ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 11:27 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10 2018, Martin Langhoff wrote: > Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many > loose objects" makes sense to me. > > - remove unactionable warning > - as the warning is gone, no gc.log is produced > - subsequent gc runs don't exit due to gc.log > > My very humble +1 on that. > > As for downsides... if we have truly tons of _recent_ loose objects, > it'll ... take disk space? I'm fine with that. As Jeff's https://public-inbox.org/git/20180716175103.GB18636@sigill.intra.peff.net/ and my https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ note it's a bit more complex than that. I.e.: - The warning is actionable, you can decide to up your expiration policy. - We use this warning as a proxy for "let's not run for a day", otherwise we'll just grind on gc --auto trying to consolidate possibly many hundreds of K of loose objects only to find none of them can be pruned because the run into the expiry policy. With the warning we retry that once per day, which sucks less. - This conflation of the user-visible warning and the policy is an emergent effect of how the different gc pieces interact, which as I note in the linked thread(s) sucks. But we can't just yank one piece away (as Jonathan's patch does) without throwing the baby out with the bathwater. It will mean that e.g. if you have 10k loose objects in your git.git, and created them just now, that every time you run anything that runs "gc --auto" we'll fork to the background, peg a core at 100% CPU for 2-3 minutes or whatever it is, only do get nowhere and do the same thing again in ~3 minutes when you run your next command. - I think you may be underestimating some of the cases where this ends up taking a huge amount of disk space (and now we'll issue at least *some*) warning. See my https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/ where a repo's .git went from 2.5G to 30G due to being stuck in this mode. > For more aggressive gc options, thoughts: > > - Do we always consider git gc --prune=now "safe" in a "won't delete > stuff the user is likely to want" sense? For example -- are the > references from reflogs enough safety? The --prune=now command is not generally safe for the reasons noted in the "NOTES" section in "git help gc". > - Even if we don't, for some commands it should be safe to run git gc > --prune=now at the end of the process, for example an import that > generates a new git repo (git svn clone). Yeah I don't see a problem with that, I didn't know about this interesting use-case, i.e. that "git svn clone" will create a lot of loose objects. As seen in my https://public-inbox.org/git/87tvm3go42.fsf@evledraar.gmail.com/ I'm working on making "gc --auto" run at the end of clone for unrelated reasons, i.e. so we generate the commit-graph, seems like "git svn clone" could do something similar. So it's creating a lot of garbage during its cloning process that can just be immediately thrown away? What is it doing? Using the object store as a scratch pad for its own temporary state? > m > On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Forwarding to Jonathan, as I think this is an interesting supporting >> vote for the topic that we were stuck on. >> >> Eric Wong <e@80x24.org> writes: >> >> > Martin Langhoff <martin.langhoff@gmail.com> wrote: >> >> Hi folks, >> >> >> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo >> >> I hit the gc error: >> >> >> >> warning: There are too many unreachable loose objects; run 'git prune' >> >> to remove them. >> >> gc --auto: command returned error: 255 >> > >> > GC can be annoying when that happens... For git-svn, perhaps >> > this can be appropriate to at least allow the import to continue: >> > >> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm >> > index 76b2965905..9b0caa3d47 100644 >> > --- a/perl/Git/SVN.pm >> > +++ b/perl/Git/SVN.pm >> > @@ -999,7 +999,7 @@ sub restore_commit_header_env { >> > } >> > >> > sub gc { >> > - command_noisy('gc', '--auto'); >> > + eval { command_noisy('gc', '--auto') }; >> > }; >> > >> > sub do_git_commit { >> > >> > >> > But yeah, somebody else who works on git regularly could >> > probably stop repack from writing thousands of loose >> > objects (and instead write a self-contained pack with >> > those objects, instead). I haven't followed git closely >> > lately, myself. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 11:41 ` Martin Langhoff 2018-10-10 11:48 ` Ævar Arnfjörð Bjarmason 2018-10-10 11:43 ` Ævar Arnfjörð Bjarmason 2018-10-10 12:21 ` Junio C Hamano 2 siblings, 1 reply; 26+ messages in thread From: Martin Langhoff @ 2018-10-10 11:41 UTC (permalink / raw) To: Ævar Arnfjörð Cc: Junio C Hamano, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > As Jeff's > https://public-inbox.org/git/20180716175103.GB18636@sigill.intra.peff.net/ > and my https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ > note it's a bit more complex than that. Ok, my bad for not reading the whole thread :-) thanks for the kind explanation. > - The warning is actionable, you can decide to up your expiration > policy. A newbie-ish user shouldn't need to know git's internal store model _and the nuances of its special cases_ got get through. > - We use this warning as a proxy for "let's not run for a day" Oh, so _that's_ the trick with creating gc.log? I then understand the idea of changing to exit 0. But it's far from clear, and a clear _flag_, and not spitting again the same warning, or differently-worded warning would be better. "We won't try running gc, a recent run was deemed pointless until some time passes. Nothing to worry about." > - This conflation of the user-visible warning and the policy is an > emergent effect of how the different gc pieces interact, which as I > note in the linked thread(s) sucks. It sure does, and that aspect should be easy to fix...(?) > So it's creating a lot of garbage during its cloning process that can > just be immediately thrown away? What is it doing? Using the object > store as a scratch pad for its own temporary state? Yeah, thats suspicious and I don't know why. I've worked on other importers and while those needed 'gc' to generate packs, they didn't generate garbage objects. After gc, the repo was "clean". cheers, m -- martin.langhoff@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted ~ http://github.com/martin-langhoff by shiny stuff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:41 ` Martin Langhoff @ 2018-10-10 11:48 ` Ævar Arnfjörð Bjarmason 2018-10-10 16:51 ` Jonathan Nieder 0 siblings, 1 reply; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 11:48 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10 2018, Martin Langhoff wrote: > On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> As Jeff's >> https://public-inbox.org/git/20180716175103.GB18636@sigill.intra.peff.net/ >> and my https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ >> note it's a bit more complex than that. > > Ok, my bad for not reading the whole thread :-) thanks for the kind explanation. > >> - The warning is actionable, you can decide to up your expiration >> policy. > > A newbie-ish user shouldn't need to know git's internal store model > _and the nuances of its special cases_ got get through. Oh yeah, don't get me wrong. I think this whole thing sucks, and as the linked threads show I've run into various sucky edge cases of this. I'm just saying it's hard in this case to remove one piece without the whole Jenga tower collapsing, and it's probably a good idea in some of these cases to pester the user about what he wants, but probably not via gc --auto emitting the same warning every time, e.g. in one of these threads I suggested maybe "git status" should emit this. > >> - We use this warning as a proxy for "let's not run for a day" > > Oh, so _that's_ the trick with creating gc.log? I then understand the > idea of changing to exit 0. > > But it's far from clear, and a clear _flag_, and not spitting again > the same warning, or differently-worded warning would be better. > > "We won't try running gc, a recent run was deemed pointless until some > time passes. Nothing to worry about." Yup. That would be better. Right now we don't write anything machine-readable to the log, and we'd need to start doing that. E.g. we could just as well be reporting that gc --auto is segfaulting and that's why you have all this garbage. We just "cat" it. >> - This conflation of the user-visible warning and the policy is an >> emergent effect of how the different gc pieces interact, which as I >> note in the linked thread(s) sucks. > > It sure does, and that aspect should be easy to fix...(?) > >> So it's creating a lot of garbage during its cloning process that can >> just be immediately thrown away? What is it doing? Using the object >> store as a scratch pad for its own temporary state? > > Yeah, thats suspicious and I don't know why. I've worked on other > importers and while those needed 'gc' to generate packs, they didn't > generate garbage objects. After gc, the repo was "clean". I tried to find this out in my reply-to-myself in https://public-inbox.org/git/877eiqf2nk.fsf@evledraar.gmail.com/ But as noted just looked at this briefly, and I don't use git-svn for years now, so I don't know and might be missing something. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:48 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 16:51 ` Jonathan Nieder 2018-10-10 17:46 ` Jeff King 2018-10-10 18:38 ` git svn clone/fetch hits issues with gc --auto Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 26+ messages in thread From: Jonathan Nieder @ 2018-10-10 16:51 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Martin Langhoff, Junio C Hamano, e, Git Mailing List Hi, Ævar Arnfjörð Bjarmason wrote: > I'm just saying it's hard in this case to remove one piece without the > whole Jenga tower collapsing, and it's probably a good idea in some of > these cases to pester the user about what he wants, but probably not via > gc --auto emitting the same warning every time, e.g. in one of these > threads I suggested maybe "git status" should emit this. I have to say, I don't have a lot of sympathy for this. I've been running with the patches I sent before for a while now, and the behavior that they create is great. I think we can make further refinements on top. To put it another way, I haven't actually experienced any bad knock-on effects, and I think other feature requests can be addressed separately. I do have sympathy for some wishes for changes to "git gc --auto" behavior (I think it should be synchronous regardless of config and the asynchrony should move to being requested explicitly through a command line option by the callers within Git) but I don't understand why this holds up a change that IMHO is wholly positive for users. To put it another way, I am getting the feeling that the objections to that series were theoretical, while the practical benefits of the patch are pretty immediate and real. I'm happy to help anyone who wants to polish it but time has shown no one is working on that, so... Thanks, Jonathan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 16:51 ` Jonathan Nieder @ 2018-10-10 17:46 ` Jeff King 2018-10-10 19:27 ` [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 Ævar Arnfjörð Bjarmason 2018-10-10 18:38 ` git svn clone/fetch hits issues with gc --auto Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 26+ messages in thread From: Jeff King @ 2018-10-10 17:46 UTC (permalink / raw) To: Jonathan Nieder Cc: Ævar Arnfjörð Bjarmason, Martin Langhoff, Junio C Hamano, e, Git Mailing List On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote: > Ævar Arnfjörð Bjarmason wrote: > > > I'm just saying it's hard in this case to remove one piece without the > > whole Jenga tower collapsing, and it's probably a good idea in some of > > these cases to pester the user about what he wants, but probably not via > > gc --auto emitting the same warning every time, e.g. in one of these > > threads I suggested maybe "git status" should emit this. > > I have to say, I don't have a lot of sympathy for this. > > I've been running with the patches I sent before for a while now, and > the behavior that they create is great. I think we can make further > refinements on top. To put it another way, I haven't actually > experienced any bad knock-on effects, and I think other feature > requests can be addressed separately. I think there may be some miscommunication here. The Jenga tower above is referring (I think) to Jonathan Tan's original patch to drop the warning entirely, which does have some unwanted side effects. Your patches are much less controversial, I think, and are in next and marked as "will merge to master" in the last "what's cooking". -Peff ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 17:46 ` Jeff King @ 2018-10-10 19:27 ` Ævar Arnfjörð Bjarmason 2018-10-10 20:35 ` Jeff King 2018-10-10 20:56 ` Jonathan Nieder 0 siblings, 2 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 19:27 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Jonathan Nieder, Ævar Arnfjörð Bjarmason Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode' configuration option to optionally bring back the 'git gc --auto' exit code behavior as it existed between 2.6.3..2.19.0 (inclusive). This was changed in 3029970275 ("gc: do not return error for prior errors in daemonized mode", 2018-07-16). The motivation for that patch was to appease 3rd party tools whose treatment of the 'git gc --auto' exit code is different from that of git core where it has always been ignored. That means that out of the three modes gc --auto will operate in: 1. gc --auto has nothing to do 2. gc --auto has something to do, will fork and try to do it 3. gc --auto has something to do, but notices that gc has been failing before when forked and can't do anything now. We started exiting with zero in the case of #3, instead of non-zero (see [1] for more details). As noted by the docs being added here the #3 case is relatively rare, so I think it's fine to change this as the default with the assumption that the use-case for tools like the "repo" tool noted in commit 3029970275 above are more common than not. But it left us without any option to have "git gc --auto" tell us about this failure except by starting to either parse its output, or for the caller to start breaking the encapsulation and starting to check for .git/gc.log themselves. Let's instead provide an option to exit with non-zero when we have errors to tell the user about, and provide a configuration option so that it can be dropped in-place in anticipation of upgrading to Git version 2.20 without having to make using --auto-exit-code conditional on the git version in use. 1. https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- > On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote: > >> Ævar Arnfjörð Bjarmason wrote: >> >> > I'm just saying it's hard in this case to remove one piece without the >> > whole Jenga tower collapsing, and it's probably a good idea in some of >> > these cases to pester the user about what he wants, but probably not via >> > gc --auto emitting the same warning every time, e.g. in one of these >> > threads I suggested maybe "git status" should emit this. >> >> I have to say, I don't have a lot of sympathy for this. >> >> I've been running with the patches I sent before for a while now, and >> the behavior that they create is great. I think we can make further >> refinements on top. To put it another way, I haven't actually >> experienced any bad knock-on effects, and I think other feature >> requests can be addressed separately. > > I think there may be some miscommunication here. The Jenga tower above > is referring (I think) to Jonathan Tan's original patch to drop the > warning entirely, which does have some unwanted side effects. > > Your patches are much less controversial, I think, and are in next and > marked as "will merge to master" in the last "what's cooking". [Junio: This goes on top of gitster/jn/gc-auto] I thought the jn/gc-auto topic was still in "pu", my fault for not paying attention. It seems the general consensus is against my notion of what should be the default (which is fine), but since (as noted in the patch) it seems yucky to start breaking the encapsulation of gc.log, especially as it's looking more and more likely that it'll be an implementation detail we might drop, here's a patch on top of jn/gc-auto to optionally restore the old behavior. Documentation/config.txt | 28 ++++++++++++++++++++++++++++ Documentation/git-gc.txt | 7 +++++++ builtin/gc.c | 7 ++++++- t/t6500-gc.sh | 2 ++ 4 files changed, 43 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 5b72684999..e37a463bf8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1635,6 +1635,34 @@ gc.autoDetach:: Make `git gc --auto` return immediately and run in background if the system supports it. Default is true. +gc.autoExitCode:: + Make `git gc --auto` return non-zero if it would have + demonized itself (see `gc.autoDetach`) due to a needed gc, but + a 'gc.log' is found within the `gc.logExpiry` with an error + from a previous run. ++ +When 'git gc' is run with the default of `gc.autoDetach=true` a +failure might have been noted in the 'gc.log' from a previously +detached `--auto` run. If the failure is within the time configured in +`gc.logExpiry` the `--auto` run will abort early and report the error +in the 'gc.log'. ++ +From version 2.6.3 to version 2.19 of Git encountering this error +would cause 'git gc' to exit with non-zero, but this was deemed to be +a hassle for third-party tools to handle since it rarely happens, and +they usually don't assume that 'git gc --auto' can fail. Therefore +since version 2.20 of Git 'git gc --auto' will always exit with zero +if it would have demonized itself, even when encountering such an +error. ++ +Supplying this option will make 'git gc' exit with non-zero in that +case, which allows for detecting cases where a repository is in a +state where git 'gc --auto' is refusing to demonize due to previously +encountered errors. ++ +This option can also be turned on as a one-off with the +`--auto-exit-code` option, see linkgit:git-gc[1]. + gc.bigPackThreshold:: If non-zero, all packs larger than this limit are kept when `git gc` is run. This is very similar to `--keep-base-pack` diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index 24b2dd44fe..adf53fc959 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -71,6 +71,13 @@ If houskeeping is required due to many loose objects or packs, all other housekeeping tasks (e.g. rerere, working trees, reflog...) will be performed as well. +--auto-exit-code:: + Make `git gc --auto` return non-zero if it would have + demonized itself (see `gc.autoDetach`) due to a needed gc, but + a 'gc.log' is found within the time period of `gc.logExpiry` + with an error from a previous run. See linkgit:git-config[1] + for more details about this option which can also be + configured with the `gc.autoExitCode` boolean variable. --prune=<date>:: Prune loose objects older than date (default is 2 weeks ago, diff --git a/builtin/gc.c b/builtin/gc.c index ce8a663a01..69c0dedd8c 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -41,6 +41,7 @@ static int aggressive_window = 250; static int gc_auto_threshold = 6700; static int gc_auto_pack_limit = 50; static int detach_auto = 1; +static int gc_auto_exit_code = 0; static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; static const char *prune_expire = "2.weeks.ago"; @@ -130,6 +131,7 @@ static void gc_config(void) git_config_get_int("gc.auto", &gc_auto_threshold); git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit); git_config_get_bool("gc.autodetach", &detach_auto); + git_config_get_bool("gc.autoexitcode", &gc_auto_exit_code); git_config_get_expiry("gc.pruneexpire", &prune_expire); git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire); git_config_get_expiry("gc.logexpiry", &gc_log_expire); @@ -518,6 +520,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")), OPT_BOOL_F(0, "auto", &auto_gc, N_("enable auto-gc mode"), PARSE_OPT_NOCOMPLETE), + OPT_BOOL_F(0, "auto-exit-code", &gc_auto_exit_code, + N_("exit with non-zero if an error is encountered before gc is daemonized"), + PARSE_OPT_NOCOMPLETE), OPT_BOOL_F(0, "force", &force, N_("force running gc even if there may be another gc running"), PARSE_OPT_NOCOMPLETE), @@ -582,7 +587,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix) exit(128); if (ret == 1) /* Last gc --auto failed. Skip this one. */ - return 0; + return gc_auto_exit_code; if (lock_repo_for_gc(force, &pid)) return 0; diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh index a222efdbe1..25bce70316 100755 --- a/t/t6500-gc.sh +++ b/t/t6500-gc.sh @@ -121,6 +121,8 @@ test_expect_success 'background auto gc does not run if gc.log is present and re test_config gc.logexpiry 5.days && test-tool chmtime =-345600 .git/gc.log && git gc --auto && + test_must_fail git gc --auto --auto-exit-code && + test_must_fail git -c gc.autoExitCode=true gc --auto && test_config gc.logexpiry 2.days && run_and_wait_for_auto_gc && ls .git/objects/pack/pack-*.pack >packs && -- 2.19.1.390.gf3a00b506f ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 19:27 ` [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 Ævar Arnfjörð Bjarmason @ 2018-10-10 20:35 ` Jeff King 2018-10-10 20:59 ` Ævar Arnfjörð Bjarmason 2018-10-10 20:56 ` Jonathan Nieder 1 sibling, 1 reply; 26+ messages in thread From: Jeff King @ 2018-10-10 20:35 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jonathan Nieder On Wed, Oct 10, 2018 at 07:27:32PM +0000, Ævar Arnfjörð Bjarmason wrote: > Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode' > configuration option to optionally bring back the 'git gc --auto' exit > code behavior as it existed between 2.6.3..2.19.0 (inclusive). > > This was changed in 3029970275 ("gc: do not return error for prior > errors in daemonized mode", 2018-07-16). The motivation for that patch > was to appease 3rd party tools whose treatment of the 'git gc --auto' > exit code is different from that of git core where it has always been > ignored. OK. I wouldn't want to use this myself, but I think you've made clear why you find it useful. So I don't mind making it an optional behavior (and it probably beats you trying to poke at the logfile yourself). I'm not sure if the config is going to actually help that much, though. The callers within Git will generally ignore the exit code anyway. So for those cases, setting it will at best do nothing, and at worst it may confuse the few stragglers (e.g., the git-svn one under recent discussion). Callers who _are_ prepared to act on the exit code probably ought to just use --auto-exit-code in their invocation. That said, I'm not entirely opposed to the matching config. There's enough history here that somebody might want a sledgehammer setting to go back to the old behavior. -Peff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 20:35 ` Jeff King @ 2018-10-10 20:59 ` Ævar Arnfjörð Bjarmason 2018-10-11 0:38 ` Jeff King 0 siblings, 1 reply; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 20:59 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, Jonathan Nieder On Wed, Oct 10 2018, Jeff King wrote: > On Wed, Oct 10, 2018 at 07:27:32PM +0000, Ævar Arnfjörð Bjarmason wrote: > >> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode' >> configuration option to optionally bring back the 'git gc --auto' exit >> code behavior as it existed between 2.6.3..2.19.0 (inclusive). >> >> This was changed in 3029970275 ("gc: do not return error for prior >> errors in daemonized mode", 2018-07-16). The motivation for that patch >> was to appease 3rd party tools whose treatment of the 'git gc --auto' >> exit code is different from that of git core where it has always been >> ignored. > > OK. I wouldn't want to use this myself, but I think you've made clear > why you find it useful. So I don't mind making it an optional behavior > (and it probably beats you trying to poke at the logfile yourself). [...] > I'm not sure if the config is going to actually help that much, though. > The callers within Git will generally ignore the exit code anyway. So > for those cases, setting it will at best do nothing, and at worst it may > confuse the few stragglers (e.g., the git-svn one under recent > discussion). Yeah git internals don't care, but we've never advertised the combination of --auto and gc.autoDetach=true as being something internal-only, so e.g. I wrote stuff expecting errors, and one might run "git gc --auto" in a repo whose .git/objects state is uncertain to see if it needed repack (and have a shell integration that reports failures...). > Callers who _are_ prepared to act on the exit code probably ought to > just use --auto-exit-code in their invocation. > > That said, I'm not entirely opposed to the matching config. There's > enough history here that somebody might want a sledgehammer setting to > go back to the old behavior. If it's not a config option then as git is upgraded I'll need to change my across-server invocation to be some variant of checking git version, then etiher using the --auto-exit-code option or not (which'll error on older gits). Easier to be able to just drop in a config setting before the upgrade. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 20:59 ` Ævar Arnfjörð Bjarmason @ 2018-10-11 0:38 ` Jeff King 0 siblings, 0 replies; 26+ messages in thread From: Jeff King @ 2018-10-11 0:38 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jonathan Nieder On Wed, Oct 10, 2018 at 10:59:45PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Callers who _are_ prepared to act on the exit code probably ought to > > just use --auto-exit-code in their invocation. > > > > That said, I'm not entirely opposed to the matching config. There's > > enough history here that somebody might want a sledgehammer setting to > > go back to the old behavior. > > If it's not a config option then as git is upgraded I'll need to change > my across-server invocation to be some variant of checking git version, > then etiher using the --auto-exit-code option or not (which'll error on > older gits). Easier to be able to just drop in a config setting before > the upgrade. Yeah, that's the "there's enough history here" that I was referring to, but I hadn't quite thought through a concrete example. That makes sense. (Though I also think the other part of the thread is reasonable, too, where we'd just have a command to abstract away "cat .git/gc.log" into "git gc --show-detached-log" or something). -Peff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 19:27 ` [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 Ævar Arnfjörð Bjarmason 2018-10-10 20:35 ` Jeff King @ 2018-10-10 20:56 ` Jonathan Nieder 2018-10-10 21:05 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 26+ messages in thread From: Jonathan Nieder @ 2018-10-10 20:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Jeff King Hi, Ævar Arnfjörð Bjarmason wrote: > Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode' > configuration option to optionally bring back the 'git gc --auto' exit > code behavior as it existed between 2.6.3..2.19.0 (inclusive). Hm. Can you tell me more about the use case where this would be helpful to you? That would help us come up with a better name for it. Thanks, Jonathan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 20:56 ` Jonathan Nieder @ 2018-10-10 21:05 ` Ævar Arnfjörð Bjarmason 2018-10-10 21:14 ` Jonathan Nieder 0 siblings, 1 reply; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 21:05 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Junio C Hamano, Jeff King On Wed, Oct 10 2018, Jonathan Nieder wrote: > Hi, > > Ævar Arnfjörð Bjarmason wrote: > >> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode' >> configuration option to optionally bring back the 'git gc --auto' exit >> code behavior as it existed between 2.6.3..2.19.0 (inclusive). > > Hm. Can you tell me more about the use case where this would be > helpful to you? That would help us come up with a better name for it. From the E-Mail linked from the commit message[1] (I opted not to put this in, because it was getting a bit long: Right. I know. What I mean is now I can (and do) use it to run 'git gc --auto' across our server fleet and see whether I have any of #3, or whether it's all #1 or #2. If there's nothing to do in #1 that's fine, and it just so happens that I'll run gc due to #2 that's also fine, but I'd like to see if gc really is stuck. This of course relies on them having other users / scripts doing normal git commands which would trigger previous 'git gc --auto' runs. I.e. with your change that command: git gc --auto Would change to something like: git gc --auto && ! test -e .git/gc.log Which, as noted is a bit of a nasty breaker of the encapsulation, so now: git gc --auto --auto-exit-code Or just a variant of that which will have dropped the config in-place in /etc/gitconfig, and then as before: git gc --auto 1. https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 21:05 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 21:14 ` Jonathan Nieder 2018-10-10 21:36 ` Junio C Hamano 0 siblings, 1 reply; 26+ messages in thread From: Jonathan Nieder @ 2018-10-10 21:14 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Jeff King Ævar Arnfjörð Bjarmason wrote: > Right. I know. What I mean is now I can (and do) use it to run 'git gc > --auto' across our server fleet and see whether I have any of #3, or > whether it's all #1 or #2. If there's nothing to do in #1 that's fine, > and it just so happens that I'll run gc due to #2 that's also fine, but > I'd like to see if gc really is stuck. > > This of course relies on them having other users / scripts doing normal > git commands which would trigger previous 'git gc --auto' runs. > > I.e. with your change that command: > > git gc --auto > > Would change to something like: > > git gc --auto && ! test -e .git/gc.log > > Which, as noted is a bit of a nasty breaker of the encapsulation That helps. What if we package up the "test -e .git/gc.log" bit *without* having the side effect of running git gc --auto, so that you can run if ! git gc --detached-exit-code then ... handle the error ... fi git gc --auto; # perhaps also with --detach ? I'm not great at naming options, so the --detached-exit-code name is bikesheddable. What I really mean to ask about is: what if the status reporting goes in a separate command from running gc --auto? Perhaps this reporting could also print the message from a previous run, so you could write: git gc --detached-status || exit git gc --auto; # perhaps also passing --detach (Names still open for bikeshedding.) Thanks and hope that helps, Jonathan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 21:14 ` Jonathan Nieder @ 2018-10-10 21:36 ` Junio C Hamano 2018-10-10 21:51 ` Jonathan Nieder 0 siblings, 1 reply; 26+ messages in thread From: Junio C Hamano @ 2018-10-10 21:36 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Ævar Arnfjörð Bjarmason, git, Jeff King Jonathan Nieder <jrnieder@gmail.com> writes: > Perhaps this reporting could also print the message from a previous > run, so you could write: > > git gc --detached-status || exit > git gc --auto; # perhaps also passing --detach > > (Names still open for bikeshedding.) When the command is given --detached-exit-code/status option, what does it do? Does it perform the "did an earlier run left gc.log?" and report the result and nothing else? In other words, is it a pure replacement for "test -e .git/gc.log"? Or does it do some of the "auto-gc" prep logic like guestimating loose object count and have that also in its exit status (e.g. "from the gc.log left behind, we know that we failed to reduce loose object count down sufficiently after finding there are more than 6700 earlier, but now we do not have that many loose object, so there is nothing to complain about the presence of gc.log")? I am bad at naming myself, but worse at guessing what others meant with a new thing that was given a new name whose name is fuzzy, so... ;-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 21:36 ` Junio C Hamano @ 2018-10-10 21:51 ` Jonathan Nieder 2018-10-10 22:16 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 26+ messages in thread From: Jonathan Nieder @ 2018-10-10 21:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git, Jeff King Junio C Hamano wrote: > Jonathan Nieder <jrnieder@gmail.com> writes: >> Perhaps this reporting could also print the message from a previous >> run, so you could write: >> >> git gc --detached-status || exit >> git gc --auto; # perhaps also passing --detach >> >> (Names still open for bikeshedding.) > > When the command is given --detached-exit-code/status option, what > does it do? Does it perform the "did an earlier run left gc.log?" > and report the result and nothing else? In other words, is it a > pure replacement for "test -e .git/gc.log"? My intent was the latter. In other words, in the idiom do_something_async & ... a lot of time passes ... wait it is something like the replacement for "wait". More precisely, git gc --detached-status || exit would mean something like if test -e .git/gc.log # Error from previous gc --detach? then cat >&2 .git/gc.log # Report the error. exit 1 fi > Or does it do some of > the "auto-gc" prep logic like guestimating loose object count and > have that also in its exit status (e.g. "from the gc.log left > behind, we know that we failed to reduce loose object count down > sufficiently after finding there are more than 6700 earlier, but now > we do not have that many loose object, so there is nothing to > complain about the presence of gc.log")? Depending on the use case, a user might want to avoid losing information about the results of a previous "git gc --detach" run, even if they no longer apply. For example, a user might want to collect the error message for monitoring or later log analysis, to track down intermittent gc errors that go away on their own. A separate possible use case might be a git gc --needs-auto-gc command that detects whether an auto gc is needed. With that, a caller that only wants to learn about errors if auto gc is needed could run if git gc --needs-auto-gc then git gc --detached-status || exit fi > I am bad at naming myself, but worse at guessing what others meant > with a new thing that was given a new name whose name is fuzzy, > so... ;-) No problem. I'm mostly trying to tease out more details about the use case. Thanks, Jonathan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 21:51 ` Jonathan Nieder @ 2018-10-10 22:16 ` Ævar Arnfjörð Bjarmason 2018-10-10 22:25 ` Jonathan Nieder 0 siblings, 1 reply; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 22:16 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Junio C Hamano, git, Jeff King On Wed, Oct 10 2018, Jonathan Nieder wrote: > Junio C Hamano wrote: >> Jonathan Nieder <jrnieder@gmail.com> writes: > >>> Perhaps this reporting could also print the message from a previous >>> run, so you could write: >>> >>> git gc --detached-status || exit >>> git gc --auto; # perhaps also passing --detach >>> >>> (Names still open for bikeshedding.) >> >> When the command is given --detached-exit-code/status option, what >> does it do? Does it perform the "did an earlier run left gc.log?" >> and report the result and nothing else? In other words, is it a >> pure replacement for "test -e .git/gc.log"? > > My intent was the latter. In other words, in the idiom > > do_something_async & > ... a lot of time passes ... > wait > > it is something like the replacement for "wait". > > More precisely, > > git gc --detached-status || exit > > would mean something like > > if test -e .git/gc.log # Error from previous gc --detach? > then > cat >&2 .git/gc.log # Report the error. > exit 1 > fi > >> Or does it do some of >> the "auto-gc" prep logic like guestimating loose object count and >> have that also in its exit status (e.g. "from the gc.log left >> behind, we know that we failed to reduce loose object count down >> sufficiently after finding there are more than 6700 earlier, but now >> we do not have that many loose object, so there is nothing to >> complain about the presence of gc.log")? > > Depending on the use case, a user might want to avoid losing > information about the results of a previous "git gc --detach" run, > even if they no longer apply. For example, a user might want to > collect the error message for monitoring or later log analysis, to > track down intermittent gc errors that go away on their own. > > A separate possible use case might be a > > git gc --needs-auto-gc > > command that detects whether an auto gc is needed. With that, a > caller that only wants to learn about errors if auto gc is needed > could run > > if git gc --needs-auto-gc > then > git gc --detached-status || exit > fi > >> I am bad at naming myself, but worse at guessing what others meant >> with a new thing that was given a new name whose name is fuzzy, >> so... ;-) > > No problem. I'm mostly trying to tease out more details about the use > case. Likewise, so don't take the following as an assertion of fact, but more of a fact-finding mission: We could add something like this --detached-status / --needs-auto-gc, but I don't need it, and frankly I can't think of a reason for why anyone would want to use these. The entire point of having gc --auto in the first place is that you don't care when exactly GC happens, you're happy with whenever git decides it's needed. So why would anyone need a --needs-auto-gc? If your criteria for doing GC exactly matches that of gc --auto then ... you just run gc --auto, if it isn't (e.g. if you're using Microsoft's Windows repo) you're not using gc --auto in the first place, and neither --needs-auto-gc nor --auto is useful to you. So maybe I'm missing something here, but a --needs-auto-gc just seems like a gratuitous exposure of an internal implementation detail whose only actionable result is doing what we're doing with "gc --auto" now, i.e. just run gc. Which is what I'm doing by running "gc --auto" across a set of servers and looking at the exit code. If it's been failing I get an error, if there's no need to gc nothing happens, and if it hasn't been failing and it just so happens that it's time to GC then fine, now was as good a time as any. So if we assume that for the sake of argument there's no point in a --detached-status either. My only reason for ever caring about that status is when I run "gc --auto" and it says it can't fork() itself so it fails. Since I'm using "gc --auto" I have zero reason to even ask that question unless I'm OK with kicking off a gc run as a side-effect, so why split up the two? It just introduces a race condition for no benefit. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 2018-10-10 22:16 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 22:25 ` Jonathan Nieder 0 siblings, 0 replies; 26+ messages in thread From: Jonathan Nieder @ 2018-10-10 22:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, Jeff King Ævar Arnfjörð Bjarmason wrote: > Which is what I'm doing by running "gc --auto" across a set of servers > and looking at the exit code. If it's been failing I get an error, if > there's no need to gc nothing happens, and if it hasn't been failing and > it just so happens that it's time to GC then fine, now was as good a > time as any. For this, a simple "git gc --detached-status" would work. It sounds like the bonus gc --auto run was a side effect instead of being a requirement. > So if we assume that for the sake of argument there's no point in a > --detached-status either. My only reason for ever caring about that > status is when I run "gc --auto" and it says it can't fork() itself so > it fails. Since I'm using "gc --auto" I have zero reason to even ask > that question unless I'm OK with kicking off a gc run as a side-effect, > so why split up the two? It just introduces a race condition for no > benefit. What I am trying to do is design an interface which is simple to explain in the manual. The existing "git gc --auto" interface since detaching was introduced is super confusing to me, so I'm going through the thought exercise of "If we were starting over, what would we build instead?" Part of the answer to that question might include a --report-from-the-last-time-you-detached option. I'm still failing to come up of a case where the answer to that question would include a --report-from-the-last-time-you-detached-and-if-it-went-okay\ -then-run-another-detached-gc option. In other words, I think our disconnect is that you are describing things in terms of "I have been happy with the existing git gc --auto detaching behavior, so how can we maintain something as close as possible to that"? And I am trying to describe things in terms of "What is the simplest, most maintainable, and easiest to explain way to keep Ævar's servers working well"? Jonathan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 16:51 ` Jonathan Nieder 2018-10-10 17:46 ` Jeff King @ 2018-10-10 18:38 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 18:38 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Martin Langhoff, Junio C Hamano, e, Git Mailing List On Wed, Oct 10 2018, Jonathan Nieder wrote: > Hi, > > Ævar Arnfjörð Bjarmason wrote: > >> I'm just saying it's hard in this case to remove one piece without the >> whole Jenga tower collapsing, and it's probably a good idea in some of >> these cases to pester the user about what he wants, but probably not via >> gc --auto emitting the same warning every time, e.g. in one of these >> threads I suggested maybe "git status" should emit this. > > I have to say, I don't have a lot of sympathy for this. > > I've been running with the patches I sent before for a while now, and > the behavior that they create is great. I think we can make further > refinements on top. To put it another way, I haven't actually > experienced any bad knock-on effects, and I think other feature > requests can be addressed separately. > > I do have sympathy for some wishes for changes to "git gc --auto" > behavior (I think it should be synchronous regardless of config and > the asynchrony should move to being requested explicitly through a > command line option by the callers within Git) but I don't understand > why this holds up a change that IMHO is wholly positive for users. > > To put it another way, I am getting the feeling that the objections to > that series were theoretical, while the practical benefits of the > patch are pretty immediate and real. I'm happy to help anyone who > wants to polish it but time has shown no one is working on that, so... [I wrote this before seeing Jeff's reply, but just to bo clear...] Yes, like Jeff says I'm not referring to your gitster/jn/gc-auto with this "Jenga tower" comment. Re that patch: I've said what I think about tools printing error messages saying "I can't do stuff" while not returning a non-zero exit code, so I won't repeat that here. But whatever anyone thinks of that it's ultimately a rather trivial detail, and doesn't have any knock-on effects on the rest of git-gc behavior. I'm talking about the "gc: do not warn about too many loose objects" patch and similar approaches. FWIW what I'm describing in <878t36f3ed.fsf@evledraar.gmail.com> isn't some theoretical concern. In some large repositories at work that experience a lot of branch churn and have fetch.prune / fetch.pruneTags turned on active checkouts very quickly get to the default 6700 limit. I've currently found that gc.pruneExpire=4.days.ago is close to a sweet spot of avoiding that issue for now, while not e.g. gc-ing a loose object someone committed on Friday before the same time on Monday, but before I tweaked that, but with the default of 2.weeks we'd much more regularly see the problem described in [1]. But as noted in the various GC threads linked from this one that sort of solution within the confines of the current implementation and configuration promises we've made, which lead to all sorts of stupidity. 1. https://public-inbox.org/git/87inc89j38.fsf@evledraar.gmail.com/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason 2018-10-10 11:41 ` Martin Langhoff @ 2018-10-10 11:43 ` Ævar Arnfjörð Bjarmason 2018-10-10 12:21 ` Junio C Hamano 2 siblings, 0 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 11:43 UTC (permalink / raw) To: Martin Langhoff; +Cc: Junio C Hamano, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote: > On Wed, Oct 10 2018, Martin Langhoff wrote: > >> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many >> loose objects" makes sense to me. >> >> - remove unactionable warning >> - as the warning is gone, no gc.log is produced >> - subsequent gc runs don't exit due to gc.log >> >> My very humble +1 on that. >> >> As for downsides... if we have truly tons of _recent_ loose objects, >> it'll ... take disk space? I'm fine with that. > > As Jeff's > https://public-inbox.org/git/20180716175103.GB18636@sigill.intra.peff.net/ > and my https://public-inbox.org/git/878t69dgvx.fsf@evledraar.gmail.com/ > note it's a bit more complex than that. > > I.e.: > > - The warning is actionable, you can decide to up your expiration > policy. > > - We use this warning as a proxy for "let's not run for a day", > otherwise we'll just grind on gc --auto trying to consolidate > possibly many hundreds of K of loose objects only to find none of > them can be pruned because the run into the expiry policy. With the > warning we retry that once per day, which sucks less. > > - This conflation of the user-visible warning and the policy is an > emergent effect of how the different gc pieces interact, which as I > note in the linked thread(s) sucks. > > But we can't just yank one piece away (as Jonathan's patch does) > without throwing the baby out with the bathwater. > > It will mean that e.g. if you have 10k loose objects in your git.git, > and created them just now, that every time you run anything that runs > "gc --auto" we'll fork to the background, peg a core at 100% CPU for > 2-3 minutes or whatever it is, only do get nowhere and do the same > thing again in ~3 minutes when you run your next command. > > - I think you may be underestimating some of the cases where this ends > up taking a huge amount of disk space (and now we'll issue at least > *some*) warning. See my > https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/ > where a repo's .git went from 2.5G to 30G due to being stuck in this > mode. > >> For more aggressive gc options, thoughts: >> >> - Do we always consider git gc --prune=now "safe" in a "won't delete >> stuff the user is likely to want" sense? For example -- are the >> references from reflogs enough safety? > > The --prune=now command is not generally safe for the reasons noted in > the "NOTES" section in "git help gc". > >> - Even if we don't, for some commands it should be safe to run git gc >> --prune=now at the end of the process, for example an import that >> generates a new git repo (git svn clone). > > Yeah I don't see a problem with that, I didn't know about this > interesting use-case, i.e. that "git svn clone" will create a lot of > loose objects. > > As seen in my > https://public-inbox.org/git/87tvm3go42.fsf@evledraar.gmail.com/ I'm > working on making "gc --auto" run at the end of clone for unrelated > reasons, i.e. so we generate the commit-graph, seems like "git svn > clone" could do something similar. > > So it's creating a lot of garbage during its cloning process that can > just be immediately thrown away? What is it doing? Using the object > store as a scratch pad for its own temporary state? To answer my own question (which was based on a thinko) it's continually creating loose objects during import, i.e. packs are not involved (don't know why I thought that), so yeah, because all of those have <2wks expiry we end up warning as gc --auto is run. But I actually think the git-svn import is revealing an entirely different problem. I.e. when I clone I seem to be getting a refs/remotes/git-svn branch that's kept up-to-date, and when I "gc" everything's consolidated into a pack, we don't have any loose objects that are meant for expiry. But the reason git-svn is whining is because we're doing this in gc (simplified for the sake af discussion): if (too_many_loose()) { expire(); repack(); if (too_many_loose()) die("oh noes too many loose that don't match our expiry policy!"); } But they don't fall under our expiry policy at all, we're just assuming that a crapload of loose objects haven't been added in the interim from when we ran expire() + repack() until when we check too_many_loose() again. That's a logic error which we could just solve at some expense by seeing *which* objects are loose and candidates for expiry at the beginning, and not warning if at the end we have *different* loose objects that should be consolidated, that just means we genuinely should run gc again. Or is this just wrong? I don't really know. If the above is true I'm missing how tweaking gc.pruneExpire=5.minutes.ago is helping. Surely we'd either just end up with the same set of loose objects (since the clone is still running), or alternatively if git-svn hadn't gotten around to updating refs create a corrupt repo. >> m >> On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano <gitster@pobox.com> wrote: >>> >>> Forwarding to Jonathan, as I think this is an interesting supporting >>> vote for the topic that we were stuck on. >>> >>> Eric Wong <e@80x24.org> writes: >>> >>> > Martin Langhoff <martin.langhoff@gmail.com> wrote: >>> >> Hi folks, >>> >> >>> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo >>> >> I hit the gc error: >>> >> >>> >> warning: There are too many unreachable loose objects; run 'git prune' >>> >> to remove them. >>> >> gc --auto: command returned error: 255 >>> > >>> > GC can be annoying when that happens... For git-svn, perhaps >>> > this can be appropriate to at least allow the import to continue: >>> > >>> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm >>> > index 76b2965905..9b0caa3d47 100644 >>> > --- a/perl/Git/SVN.pm >>> > +++ b/perl/Git/SVN.pm >>> > @@ -999,7 +999,7 @@ sub restore_commit_header_env { >>> > } >>> > >>> > sub gc { >>> > - command_noisy('gc', '--auto'); >>> > + eval { command_noisy('gc', '--auto') }; >>> > }; >>> > >>> > sub do_git_commit { >>> > >>> > >>> > But yeah, somebody else who works on git regularly could >>> > probably stop repack from writing thousands of loose >>> > objects (and instead write a self-contained pack with >>> > those objects, instead). I haven't followed git closely >>> > lately, myself. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason 2018-10-10 11:41 ` Martin Langhoff 2018-10-10 11:43 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 12:21 ` Junio C Hamano 2018-10-10 12:37 ` Ævar Arnfjörð Bjarmason 2018-10-10 16:38 ` Martin Langhoff 2 siblings, 2 replies; 26+ messages in thread From: Junio C Hamano @ 2018-10-10 12:21 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Martin Langhoff, Jonathan Nieder, e, Git Mailing List Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > - We use this warning as a proxy for "let's not run for a day", > otherwise we'll just grind on gc --auto trying to consolidate > possibly many hundreds of K of loose objects only to find none of > them can be pruned because the run into the expiry policy. With the > warning we retry that once per day, which sucks less. > > - This conflation of the user-visible warning and the policy is an > emergent effect of how the different gc pieces interact, which as I > note in the linked thread(s) sucks. > > But we can't just yank one piece away (as Jonathan's patch does) > without throwing the baby out with the bathwater. > > It will mean that e.g. if you have 10k loose objects in your git.git, > and created them just now, that every time you run anything that runs > "gc --auto" we'll fork to the background, peg a core at 100% CPU for > 2-3 minutes or whatever it is, only do get nowhere and do the same > thing again in ~3 minutes when you run your next command. We probably can keep the "let's not run for a day" safety while pretending that "git gc -auto" succeeded for callers like "git svn" so that these callers do not hae to do "eval { ... }" to hide our exit code, no? I think that is what Jonathan's patch (jn/gc-auto) does. From: Jonathan Nieder <jrnieder@gmail.com> Date: Mon, 16 Jul 2018 23:57:40 -0700 Subject: [PATCH] gc: do not return error for prior errors in daemonized mode diff --git a/builtin/gc.c b/builtin/gc.c index 95c8afd07b..ce8a663a01 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid) return NULL; } -static void report_last_gc_error(void) +/* + * Returns 0 if there was no previous error and gc can proceed, 1 if + * gc should not proceed due to an error in the last run. Prints a + * message and returns -1 if an error occured while reading gc.log + */ +static int report_last_gc_error(void) { struct strbuf sb = STRBUF_INIT; + int ret = 0; ... if (len < 0) + ret = error_errno(_("cannot read '%s'"), gc_log_path); + else if (len > 0) { + /* + * A previous gc failed. Report the error, and don't + * bother with an automatic gc run since it is likely + * to fail in the same way. + */ + warning(_("The last gc run reported the following. " "Please correct the root cause\n" "and remove %s.\n" "Automatic cleanup will not be performed " "until the file is removed.\n\n" "%s"), gc_log_path, sb.buf); + ret = 1; + } strbuf_release(&sb); done: free(gc_log_path); + return ret; } I.e. report_last_gc_error() returns 1 when finds that the previous attempt to "gc --auto" failed. And then @@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix) fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n")); } if (detach_auto) { - report_last_gc_error(); /* dies on error */ + int ret = report_last_gc_error(); + if (ret < 0) + /* an I/O error occured, already reported */ + exit(128); + if (ret == 1) + /* Last gc --auto failed. Skip this one. */ + return 0; ... it exits with 0 without bothering to rerun "gc". So it won't get stuck for 3 minutes; the repository after "gc --auto" punts will stay to be suboptimal for a day, and the user kill not get an "actionable" error notice (due to this hiding of previous error), hence cannot make changes that may help like shortening expiry period, though. ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 12:21 ` Junio C Hamano @ 2018-10-10 12:37 ` Ævar Arnfjörð Bjarmason 2018-10-10 16:38 ` Martin Langhoff 1 sibling, 0 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 12:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Martin Langhoff, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> - We use this warning as a proxy for "let's not run for a day", >> otherwise we'll just grind on gc --auto trying to consolidate >> possibly many hundreds of K of loose objects only to find none of >> them can be pruned because the run into the expiry policy. With the >> warning we retry that once per day, which sucks less. >> >> - This conflation of the user-visible warning and the policy is an >> emergent effect of how the different gc pieces interact, which as I >> note in the linked thread(s) sucks. >> >> But we can't just yank one piece away (as Jonathan's patch does) >> without throwing the baby out with the bathwater. >> >> It will mean that e.g. if you have 10k loose objects in your git.git, >> and created them just now, that every time you run anything that runs >> "gc --auto" we'll fork to the background, peg a core at 100% CPU for >> 2-3 minutes or whatever it is, only do get nowhere and do the same >> thing again in ~3 minutes when you run your next command. > > We probably can keep the "let's not run for a day" safety while > pretending that "git gc -auto" succeeded for callers like "git svn" > so that these callers do not hae to do "eval { ... }" to hide our > exit code, no? > > I think that is what Jonathan's patch (jn/gc-auto) does. Yeah we could take that patch to skip the eval {} suggested upthread. As noted when it was discussed I'm *mildly* negative on hiding a IMO meaningful exit code like that, but maybe sprinkling eval {} or other "run but ignore exit code" in stuff running "gc --auto" is worth it, and we could just document that you may want to check gc.log. > From: Jonathan Nieder <jrnieder@gmail.com> > Date: Mon, 16 Jul 2018 23:57:40 -0700 > Subject: [PATCH] gc: do not return error for prior errors in daemonized mode > > diff --git a/builtin/gc.c b/builtin/gc.c > index 95c8afd07b..ce8a663a01 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid) > return NULL; > } > > -static void report_last_gc_error(void) > +/* > + * Returns 0 if there was no previous error and gc can proceed, 1 if > + * gc should not proceed due to an error in the last run. Prints a > + * message and returns -1 if an error occured while reading gc.log > + */ > +static int report_last_gc_error(void) > { > struct strbuf sb = STRBUF_INIT; > + int ret = 0; > ... > if (len < 0) > + ret = error_errno(_("cannot read '%s'"), gc_log_path); > + else if (len > 0) { > + /* > + * A previous gc failed. Report the error, and don't > + * bother with an automatic gc run since it is likely > + * to fail in the same way. > + */ > + warning(_("The last gc run reported the following. " > "Please correct the root cause\n" > "and remove %s.\n" > "Automatic cleanup will not be performed " > "until the file is removed.\n\n" > "%s"), > gc_log_path, sb.buf); > + ret = 1; > + } > strbuf_release(&sb); > done: > free(gc_log_path); > + return ret; > } > > I.e. report_last_gc_error() returns 1 when finds that the previous > attempt to "gc --auto" failed. And then > > @@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n")); > } > if (detach_auto) { > - report_last_gc_error(); /* dies on error */ > + int ret = report_last_gc_error(); > + if (ret < 0) > + /* an I/O error occured, already reported */ > + exit(128); > + if (ret == 1) > + /* Last gc --auto failed. Skip this one. */ > + return 0; > > ... it exits with 0 without bothering to rerun "gc". > > So it won't get stuck for 3 minutes; the repository after "gc > --auto" punts will stay to be suboptimal for a day, and the user > kill not get an "actionable" error notice (due to this hiding of > previous error), hence cannot make changes that may help like > shortening expiry period, though. Right, because it still writes the gc.log, but we'll still be yelling at the user on every commit/fetch etc. that we discovered such-and-such an issue on the last gc for that full day. That 3 minute comment was in reference to if we'd apply Jonathan Tan's "[PATCH] gc: do not warn about too many loose objects without any other changes. Then we'd just keep returning true on too_many_loose_objects() even though gc wouldn't help to resolve it. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-10 12:21 ` Junio C Hamano 2018-10-10 12:37 ` Ævar Arnfjörð Bjarmason @ 2018-10-10 16:38 ` Martin Langhoff 1 sibling, 0 replies; 26+ messages in thread From: Martin Langhoff @ 2018-10-10 16:38 UTC (permalink / raw) To: Junio C Hamano Cc: Ævar Arnfjörð, Jonathan Nieder, e, Git Mailing List On Wed, Oct 10, 2018 at 8:21 AM Junio C Hamano <gitster@pobox.com> wrote: > We probably can keep the "let's not run for a day" safety while > pretending that "git gc -auto" succeeded for callers like "git svn" > so that these callers do not hae to do "eval { ... }" to hide our > exit code, no? > > I think that is what Jonathan's patch (jn/gc-auto) does. +1 `--auto` means "DTRT, but remember you're running as part of a larger process; don't error out unless it's critical". cheers, m -- martin.langhoff@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted ~ http://github.com/martin-langhoff by shiny stuff ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: git svn clone/fetch hits issues with gc --auto 2018-10-09 22:51 ` git svn clone/fetch hits issues with gc --auto Martin Langhoff 2018-10-09 23:45 ` Eric Wong @ 2018-10-10 8:04 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 26+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-10-10 8:04 UTC (permalink / raw) To: Martin Langhoff; +Cc: Git Mailing List, Eric Wong On Tue, Oct 09 2018, Martin Langhoff wrote: > Hi folks, > > Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo > I hit the gc error: > > warning: There are too many unreachable loose objects; run 'git prune' > to remove them. > gc --auto: command returned error: 255 > > I don't seem to be the only one -- > https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn > > Looking at code history, it dropped the ability to pass options to git > repack when it was converted it to using git gc. > > Experimentally I find that tweaking it to run git gc --auto > --prune=5.minutes.ago works well, while --prune=now breaks it. > Attempts to commit fail with missing objects. > > - Why does --prune=now break it? Perhaps "gc" runs in the background, > and races with the commit being prepared? > > - Would it be safe, sane to apply --prune=some.value on _clone_? > > - During _fetch_, --prune=some.value seems risky. In a checkout being > actively used for development or merging it'd risk pruning objects > users expect to be there for recovery. Would there be a safe, sane > way? > > - Is there a safer, saner value than 5 minutes? What you've found is the least sucky way to work around this right now, but see my https://public-inbox.org/git/87inc89j38.fsf@evledraar.gmail.com/ and https://public-inbox.org/git/87d0vmck55.fsf@evledraar.gmail.com/ for some prior (and recent) discussion of this problem on-list. FWIW this has nothing to do with git-svn per-se, and also e.g. happens to me when I do a 'git fetch --all' sometimes on git.git. ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2018-10-11 0:38 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CACPiFCJZ83sqE7Gaj2pa12APkBF5tau-C6t4_GrXBWDwcMnJHg@mail.gmail.com> 2018-10-09 22:51 ` git svn clone/fetch hits issues with gc --auto Martin Langhoff 2018-10-09 23:45 ` Eric Wong 2018-10-10 2:49 ` Junio C Hamano 2018-10-10 11:01 ` Martin Langhoff 2018-10-10 11:27 ` Ævar Arnfjörð Bjarmason 2018-10-10 11:41 ` Martin Langhoff 2018-10-10 11:48 ` Ævar Arnfjörð Bjarmason 2018-10-10 16:51 ` Jonathan Nieder 2018-10-10 17:46 ` Jeff King 2018-10-10 19:27 ` [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275 Ævar Arnfjörð Bjarmason 2018-10-10 20:35 ` Jeff King 2018-10-10 20:59 ` Ævar Arnfjörð Bjarmason 2018-10-11 0:38 ` Jeff King 2018-10-10 20:56 ` Jonathan Nieder 2018-10-10 21:05 ` Ævar Arnfjörð Bjarmason 2018-10-10 21:14 ` Jonathan Nieder 2018-10-10 21:36 ` Junio C Hamano 2018-10-10 21:51 ` Jonathan Nieder 2018-10-10 22:16 ` Ævar Arnfjörð Bjarmason 2018-10-10 22:25 ` Jonathan Nieder 2018-10-10 18:38 ` git svn clone/fetch hits issues with gc --auto Ævar Arnfjörð Bjarmason 2018-10-10 11:43 ` Ævar Arnfjörð Bjarmason 2018-10-10 12:21 ` Junio C Hamano 2018-10-10 12:37 ` Ævar Arnfjörð Bjarmason 2018-10-10 16:38 ` Martin Langhoff 2018-10-10 8:04 ` Ævar Arnfjörð Bjarmason
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).