git@vger.kernel.org mailing list mirror (one of many)
* Keep reflogs for deleted (remote tracking) branches?
@ 2022-03-08 11:27 Tao Klerks
  2022-03-08 11:54 ` Han-Wen Nienhuys
  2022-03-08 14:57 ` Jeff Hostetler
  0 siblings, 2 replies; 8+ messages in thread
From: Tao Klerks @ 2022-03-08 11:27 UTC (permalink / raw)
  To: git

Hi folks,

I have a practical question in case I missed something.

Imagine a small team (10 people) working on a single centralized repo, in
GitHub for example. They regularly create new branches, and typically
delete them eventually - after merging, or at other times when
branches were a dead end or whatever. The members of this team all
have a "simple" git remote configuration, the result of a "git clone"
with no special configuration. One exception is that they have set
"fetch.prune" to "true", because otherwise remote branches that have
been deleted (in the context of completed merges, or arbitrarily by
other team members) accumulate locally and having to explicitly prune
them from time to time is a pain. Every time someone says "why do I
still see these branches in my repo?", someone else replies "oh, just
run 'git config fetch.prune true'".

Now, one day someone deletes a branch accidentally from the server,
and the sole author of that branch has gone on vacation (or has an IT
failure, or has left the company, or whatever). Other team members
have seen this branch go by, it's appeared in their "fetch" output,
but no-one remembers checking it out, so it's not in their main
"HEAD" reflogs.

Even though the ref was at one point on every team member's computer,
and they still undoubtedly have a dangling commit in their repos,
they're going to have a hard time finding it - there are many dangling
commits in any given repo.
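
As a rough illustration of that hunt, here is a minimal sketch that lists
every dangling commit with its date, author and subject so the lost branch
tip can be picked out by eye. The function names are hypothetical; the
sketch assumes it is run inside the affected clone and relies only on
"git fsck" printing lines of the form "dangling <type> <oid>".

```sh
#!/bin/sh
# Keep only the object IDs of dangling *commits* from `git fsck` output,
# which prints lines of the form "dangling <type> <oid>".
dangling_commits() {
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Print one summary line per dangling commit (hypothetical helper name).
# --no-reflogs also reports commits that are only reachable from reflogs,
# which widens the search; drop it to see fewer candidates.
list_dangling_commits() {
    git fsck --no-reflogs --no-progress 2>/dev/null |
        dangling_commits |
        while read -r oid; do
            git log -1 --format='%h %ci %an: %s' "$oid"
        done
}

# Usage, from inside the repository:
#   list_dangling_commits | sort -k2,3
```

This only helps while the objects have not yet been removed by "git gc".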

Now my question: is there any way to (temporarily) keep a reflog for
that deleted/pruned branch, in those team members' repos?

As far as I can tell, even "core.logAllRefUpdates=always" does *not*
keep any reflog entries around, even temporarily (until reflog
expiry), once a ref is deleted - do I understand that correctly? Is
this behavior intentional / reasoned, or just a consequence of the
fact that it's *hard* to keep "managing" per-branch reflogs for
branches that don't exist?

I am planning a workaround using server hooks to "back up" refs that
are being deleted from specific namespaces, in my specific case, and I
imagine that a system like github keeps track of deleted stuff itself
for a while, but I find this "per-ref reflog disappearance" behavior
puzzling / out-of-character, so wanted to make sure I'm not missing
something.

Thanks,
Tao


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-08 11:27 Keep reflogs for deleted (remote tracking) branches? Tao Klerks
@ 2022-03-08 11:54 ` Han-Wen Nienhuys
  2022-03-08 12:59   ` Ævar Arnfjörð Bjarmason
  2022-03-08 14:57 ` Jeff Hostetler
  1 sibling, 1 reply; 8+ messages in thread
From: Han-Wen Nienhuys @ 2022-03-08 11:54 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

On Tue, Mar 8, 2022 at 12:28 PM Tao Klerks <tao@klerks.biz> wrote:
> As far as I can tell, even "core.logAllRefUpdates=always" does *not*
> keep any reflog entries around, even temporarily (until reflog
> expiry), once a ref is deleted - do I understand that correctly? Is
> this behavior intentional / reasoned, or just a consequence of the
> fact that it's *hard* to keep "managing" per-branch reflogs for
> branches that don't exist?
>
> I am planning a workaround using server hooks to "back up" refs that
> are being deleted from specific namespaces, in my specific case, and I
> imagine that a system like github keeps track of deleted stuff itself
> for a while, but I find this "per-ref reflog disappearance" behavior
> puzzling / out-of-character, so wanted to make sure I'm not missing
> something.

I think this behavior is motivated by directory/file conflicts. If you
have a reflog file in logs/refs/foo, you can't create a reflog for
refs/foo/bar, because that would live in logs/refs/foo/bar.

At Google, we keep reflogs in a completely different storage system
altogether, which avoids this problem, and I wouldn't be surprised if
other large hosting providers do something similar.
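
To make the conflict concrete: with the "files" backend, each reflog is a
file at a path that mirrors the ref name, so a leftover reflog for a
deleted "topic" would block the directory needed for "topic/nested". A
minimal sketch of that check follows; the function name is hypothetical
and it only compares names, without touching a repository.

```sh
#!/bin/sh
# Succeed (exit 0) if one ref name is a "directory" prefix of the other,
# i.e. the two names cannot both exist as files in the same tree.
refs_df_conflict() {
    case "$2" in "$1"/*) return 0 ;; esac
    case "$1" in "$2"/*) return 0 ;; esac
    return 1
}

# Usage:
#   refs_df_conflict refs/heads/topic refs/heads/topic/nested && echo conflict
```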

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-08 11:54 ` Han-Wen Nienhuys
@ 2022-03-08 12:59   ` Ævar Arnfjörð Bjarmason
  2022-03-14  8:25     ` Tao Klerks
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-08 12:59 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Tao Klerks, git


On Tue, Mar 08 2022, Han-Wen Nienhuys wrote:

> On Tue, Mar 8, 2022 at 12:28 PM Tao Klerks <tao@klerks.biz> wrote:
>> As far as I can tell, even "core.logAllRefUpdates=always" does *not*
>> keep any reflog entries around, even temporarily (until reflog
>> expiry), once a ref is deleted - do I understand that correctly? Is
>> this behavior intentional / reasoned, or just a consequence of the
>> fact that it's *hard* to keep "managing" per-branch reflogs for
>> branches that don't exist?
>>
>> I am planning a workaround using server hooks to "back up" refs that
>> are being deleted from specific namespaces, in my specific case, and I
>> imagine that a system like github keeps track of deleted stuff itself
>> for a while, but I find this "per-ref reflog disappearance" behavior
>> puzzling / out-of-character, so wanted to make sure I'm not missing
>> something.
>
> I think this behavior is motivated by directory/file conflicts. If you
> have a reflog file in logs/refs/foo, you can't create a reflog for
> refs/foo/bar, because that would live in logs/refs/foo/bar.
>
> At Google, we keep reflogs in a completely different storage system
> altogether, which avoids this problem, and I wouldn't be surprised if
> other large hosting providers do something similar.

I once worked on a system where:

 * References would be "archived", i.e. just a backup system that would
   run "git fetch" without pruning.

 * You were only allowed to push to either existing branches like
   "master", or names with exactly one slash in them, e.g. "avar/topic",
   not "avar/topic/nested", for that you'd need "avar/topic-nested" or
   whatever.

The second item neatly avoids D/F conflicts, at the cost of some
grumbling from people who can't use their preferred branch name.

And you can easily implement backups without that constraint by fetching
refs/* to refs/YYYYMMDD-HHMMSS/* or whatever, and have some manual
pruning process in place for those "secondly refs".
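
A minimal sketch of such a snapshot fetch, meant to run in a dedicated
backup clone. The function names and the default remote name "origin" are
assumptions; the refspec layout follows the refs/YYYYMMDD-HHMMSS/* idea
above.

```sh
#!/bin/sh
# Build a refspec that copies every remote ref into a namespace named
# after the given timestamp, e.g. +refs/*:refs/20220308-125900/*
snapshot_refspec() {
    printf '+refs/*:refs/%s/*\n' "$1"
}

# Fetch all refs of a remote into a fresh timestamped namespace.
# --no-tags stops tag auto-following from also writing into refs/tags/;
# the remote's tags are still captured under refs/<stamp>/tags/.
snapshot_refs() {
    remote=${1:-origin}
    stamp=$(date -u +%Y%m%d-%H%M%S)
    git fetch --no-tags "$remote" "$(snapshot_refspec "$stamp")"
}

# Usage, e.g. from cron inside the backup repository:
#   snapshot_refs origin
```

Because every run writes to a new namespace, nothing is overwritten or
pruned, which is why a separate cleanup of old snapshots is needed.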

More generally I have not really run into this as a practical
problem.

I.e. if a co-worker created a branch, AND nobody else used it, AND
nothing was based on it, AND someone (presumably they) thought it was OK
to delete it, it was probably something nobody cared all that much about
to begin with :)

Another way to solve a similar problem is to have
pre-receive/post-receive hooks log attempted/successful pushes, which
along with an appropriate "gc" policy will allow you to manually look up
these older branches (or even to fetch them, if you publish the log and
set uploadpack.allowAnySHA1InWant=true).
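
A minimal sketch of the logging half, as it might be called from a
post-receive hook. The function name and the log file name are
hypothetical; the only thing taken from git is the hook input format, one
"<old> <new> <ref>" line per updated ref on standard input.

```sh
#!/bin/sh
# Append one timestamped line per updated ref to the given log file.
# Reads the standard hook input: "<old-oid> <new-oid> <refname>".
log_pushes() {
    logfile=$1
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    while read -r old new ref; do
        printf '%s %s %s %s\n' "$now" "$old" "$new" "$ref" >>"$logfile"
    done
}

# Usage in hooks/post-receive (push-log.txt is a made-up file name):
#   log_pushes push-log.txt
# To let clients fetch a logged commit by its ID afterwards:
#   git config uploadpack.allowAnySHA1InWant true
```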


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-08 11:27 Keep reflogs for deleted (remote tracking) branches? Tao Klerks
  2022-03-08 11:54 ` Han-Wen Nienhuys
@ 2022-03-08 14:57 ` Jeff Hostetler
  2022-03-14  8:09   ` Tao Klerks
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Hostetler @ 2022-03-08 14:57 UTC (permalink / raw)
  To: Tao Klerks, git



On 3/8/22 6:27 AM, Tao Klerks wrote:
> Hi folks,
> 
> I have a practical question in case I missed something.
> 
> Imagine a small team (10ppl) working on a single centralized repo, in
> github for example. They regularly create new branches, and typically
> delete them eventually - after merging, or at other times when
> branches were a dead end or whatever. The members of this team all
> have a "simple" git remote configuration, the result of a "git clone"
> with no special configuration. One exception is that they have set
> "fetch.prune" to "true", because otherwise remote branches that have
> been deleted (in the context of completed merges, or arbitrarily by
> other team members) accumulate locally and having to explicitly prune
> them from time to time is a pain. Every time someone says "why do I
> still see these branches in my repo?", someone else replies "oh, just
> run 'git config fetch.prune true'".
> 
> Now, one day someone deletes a branch accidentally from the server,
> and the sole author of that branch has gone on vacation (or has an IT
> failure, or has left the company, or whatever). Other team members
> have seen this branch go by, it's appeared in their "fetch" output,
> but no-one remembers checking it out, so it's not in their main
> "HEAD" reflogs.
> 
> Even though the ref was at one point on every team member's computer,
> and they still undoubtedly have a dangling commit in their repos,
> they're going to have a hard time finding it - there are many dangling
> commits in any given repo.
> 
> Now my question: is there any way to (temporarily) keep a reflog for
> that deleted/pruned branch, in those team members' repos?
> 
> As far as I can tell, even "core.logAllRefUpdates=always" does *not*
> keep any reflog entries around, even temporarily (until reflog
> expiry), once a ref is deleted - do I understand that correctly? Is
> this behavior intentional / reasoned, or just a consequence of the
> fact that it's *hard* to keep "managing" per-branch reflogs for
> branches that don't exist?
> 
> I am planning a workaround using server hooks to "back up" refs that
> are being deleted from specific namespaces, in my specific case, and I
> imagine that a system like github keeps track of deleted stuff itself
> for a while, but I find this "per-ref reflog disappearance" behavior
> puzzling / out-of-character, so wanted to make sure I'm not missing
> something.
> 
> Thanks,
> Tao
> 

Have you considered having each team member have their own
private fork of the repo?  Then their branches are theirs
alone and no one else needs to see or collide with them.

Then when their work is ready for review/publishing, do
cross-fork PRs into the company's main fork.

Jeff


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-08 14:57 ` Jeff Hostetler
@ 2022-03-14  8:09   ` Tao Klerks
  0 siblings, 0 replies; 8+ messages in thread
From: Tao Klerks @ 2022-03-14  8:09 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git

On Tue, Mar 8, 2022 at 3:57 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
> On 3/8/22 6:27 AM, Tao Klerks wrote:
> >
> > I have a practical question in case I missed something.
> >
>
> Have you considered having each team member have their own
> private fork of the repo?  Then their branches are theirs
> alone and no one else needs to see or collide with them.

Yes, this is a scheme that I've certainly considered - it is the
public norm after all, at least in open-source development.

In the case I'm describing, however, teams often prefer to work in
communal spaces, seeing work appear and disappear in their group
environment.

Of course most teams wish to be isolated from each other, and of
course individuals want and have the option to work in isolation from
their team for any given period of time - and by "isolation" I don't
necessarily mean secret, but rather "not pushing refs into a space
that others will automatically fetch".

The case I am describing is a specific subset of an ecosystem - the
case where a team normally works in a communal central refspace.

Anyway, thanks - it looks like no-one considers git's behavior very
surprising here, I guess I'll just implement a server-hook-based
workaround.


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-08 12:59   ` Ævar Arnfjörð Bjarmason
@ 2022-03-14  8:25     ` Tao Klerks
  2022-03-14 10:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 8+ messages in thread
From: Tao Klerks @ 2022-03-14  8:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Han-Wen Nienhuys, git

On Tue, Mar 8, 2022 at 2:05 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Tue, Mar 08 2022, Han-Wen Nienhuys wrote:
>
> > On Tue, Mar 8, 2022 at 12:28 PM Tao Klerks <tao@klerks.biz> wrote:
> >> As far as I can tell, even "core.logAllRefUpdates=always" does *not*
> >> keep any reflog entries around, even temporarily (until reflog
> >> expiry), once a ref is deleted - do I understand that correctly? Is
> >> this behavior intentional / reasoned, or just a consequence of the
> >> fact that it's *hard* to keep "managing" per-branch reflogs for
> >> branches that don't exist?
> >>
> >> I am planning a workaround using server hooks to "back up" refs that
> >> are being deleted from specific namespaces, in my specific case, and I
> >> imagine that a system like github keeps track of deleted stuff itself
> >> for a while, but I find this "per-ref reflog disappearance" behavior
> >> puzzling / out-of-character, so wanted to make sure I'm not missing
> >> something.
> >
> > I think this behavior is motivated by directory/file conflicts. If you
> > have a reflog file in logs/refs/foo, you can't create a reflog for
> > refs/foo/bar, because that would live in logs/refs/foo/bar.
> >
> > At Google, we keep reflogs in a completely different storage system
> > altogether, which avoids this problem, and I wouldn't be surprised if
> > other large hosting providers do something similar.

This is interesting - so at google is the assumption that the storage
system, whatever it looks like, *does* keep reflogs for deleted
branches? Or at least backs up states that get force-pushed out of
existence?

>
> I once worked on a system where:
>
>  * References would be "archived", i.e. just a backup system that would
>    run "git fetch" without pruning.
>
>  * You were only allowed to push to either existing branches like
>    "master", or names with exactly one slash in them, e.g. "avar/topic",
>    not "avar/topic/nested", for that you'd need "avar/topic-nested" or
>    whatever.
>
> The second item neatly avoids D/F conflicts, at the cost of some
> grumbling from people who can't use their preferred branch name.
>
> And you can easily implement backups without that constraint by fetching
> refs/* to refs/YYYYMMDD-HHMMSS/* or whatever, and have some manual
> pruning process in place for those "secondly refs".

Ah right, backing up into another system - I guess we could...

>
> More generally I have not really run into this as a practical
> problem.

That's fair, nor have I - but I *have* come reasonably close: one
person accidentally deletes a branch that someone else had prepared
*without even realizing*, and the initial author is not available, and
I only find out about it a few hours later. Dangling commit hunt, here
we come. (the original author became available and re-pushed before it
came to that)

>
> Another way to solve a similar problem is to have
> pre-receive/post-receive hooks log attempted/successful pushes, which
> along with an appropriate "gc" policy will allow you to manually look up
> these older branches (or even to fetch them, if you publish the log and
> set uploadpack.allowAnySHA1InWant=true).

Yep, that's closer to my expected plan, thanks - my intent is to back
up, on force-push and/or deletion, into a specific refspace with a
cleanup policy, using a server hook. So after something is "deleted"
(or force-pushed away), it can be easily recovered for a period of e.g.
3 months in that refspace, e.g.
"refs/force-push-backups/YYYY-MM-DD-<BRANCHNAME>-<HASHPREFIX>".
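
Roughly, that could look like the sketch below. It is only a sketch under
some assumptions: it runs from post-receive rather than pre-receive (ref
updates are refused inside the pre-receive quarantine), slashes in branch
names are flattened to dashes to avoid nested backup refs, the hash prefix
is 10 characters, and all function names are made up. The cleanup policy
is not shown.

```sh
#!/bin/sh
# True if the argument is an all-zero object ID (ref created or deleted).
is_zero_oid() {
    case "$1" in
        ''|*[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Build refs/force-push-backups/YYYY-MM-DD-<BRANCHNAME>-<HASHPREFIX>.
# Arguments: date, branch name, old object ID.
backup_ref_name() {
    branch=$(printf '%s\n' "$2" | tr '/' '-')   # assumption: flatten slashes
    prefix=$(printf '%s\n' "$3" | cut -c1-10)   # assumption: 10-char prefix
    printf 'refs/force-push-backups/%s-%s-%s\n' "$1" "$branch" "$prefix"
}

# Read "<old> <new> <ref>" lines and save the old tip of every branch
# that was deleted or updated in a non-fast-forward way.
backup_rewound_refs() {
    today=$(date -u +%Y-%m-%d)
    while read -r old new ref; do
        case "$ref" in refs/heads/*) ;; *) continue ;; esac
        is_zero_oid "$old" && continue          # new branch: nothing to save
        if is_zero_oid "$new" ||
           ! git merge-base --is-ancestor "$old" "$new"; then
            git update-ref \
                "$(backup_ref_name "$today" "${ref#refs/heads/}" "$old")" "$old"
        fi
    done
}

# Usage in hooks/post-receive:
#   backup_rewound_refs
```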

My question is specifically about the, in my opinion, very surprising
behavior of deleting reflogs along with deleted branches - I mainly
provided the example use-case for context.


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-14  8:25     ` Tao Klerks
@ 2022-03-14 10:44       ` Ævar Arnfjörð Bjarmason
  2022-03-14 12:10         ` Tao Klerks
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-03-14 10:44 UTC (permalink / raw)
  To: Tao Klerks; +Cc: Han-Wen Nienhuys, git


On Mon, Mar 14 2022, Tao Klerks wrote:

> On Tue, Mar 8, 2022 at 2:05 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> More generally I have not really run into this as a practical
>> problem.
>
> That's fair, nor have I - but I *have* come reasonably close: one
> person accidentally deletes a branch that someone else had prepared
> *without even realizing*, and the initial author is not available, and
> I only find out about it a few hours later. Dangling commit hunt, here
> we come. (the original author became available and re-pushed before it
> came to that)

I think you might find it interesting to have pre-receive hooks
e.g. reject pushes if you're deleting a topic whose commits aren't
entirely <your author> i.e. just something like:

    git push -o ireallymeanit=1 --delete topic

I.e. it's an easy to implement extra safety check that people can always
opt-out of, print a scary message and most people will think twice :)
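
A simplified sketch of such a pre-receive check follows. It refuses every
branch deletion unless the push option is given, and leaves out the
"commits aren't entirely <your author>" test, which would need a way to
identify the pusher. The function names are hypothetical; the
GIT_PUSH_OPTION_COUNT and GIT_PUSH_OPTION_<n> variables are how git hands
push options to hooks, and the server needs receive.advertisePushOptions
enabled for "-o" to be accepted.

```sh
#!/bin/sh
# True if the given push option was passed with `git push -o <option>`.
has_push_option() {
    wanted=$1
    i=0
    while [ "$i" -lt "${GIT_PUSH_OPTION_COUNT:-0}" ]; do
        eval "opt=\${GIT_PUSH_OPTION_$i}"
        [ "$opt" = "$wanted" ] && return 0
        i=$((i + 1))
    done
    return 1
}

# Read "<old> <new> <ref>" lines; fail if any ref is being deleted
# (new ID all zeros) without the opt-out push option.
reject_unconfirmed_deletes() {
    status=0
    while read -r old new ref; do
        case "$new" in ''|*[!0]*) continue ;; esac   # not a deletion
        if ! has_push_option 'ireallymeanit=1'; then
            echo "refusing to delete $ref; to override, use:" >&2
            echo "  git push -o ireallymeanit=1 --delete <branch>" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage in hooks/pre-receive:
#   reject_unconfirmed_deletes || exit 1
```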

>> Another way to solve a similar problem is to have
>> pre-receive/post-receive hooks log attempted/successful pushes, which
>> along with an appropriate "gc" policy will allow you to manually look up
>> these older branches (or even to fetch them, if you publish the log and
>> set uploadpack.allowAnySHA1InWant=true).
>
> Yep, that's closer to my expected plan, thanks - my intent is to back
> up, on force-push and/or deletion, into a specific refspace with a
> cleanup policy, using a server hook. So after something is "deleted"
> (or force-pushed away), it can be easily recovered for a period of eg
> 3 months in that refspace, eg
> "refs/force-push-backups/YYYY-MM-DD-<BRANCHNAME>-<HASHPREFIX>".
>
> My question is specifically about the, in my opinion, very surprising
> behavior of deleting reflogs along with deleted branches - I mainly
> provided the example use-case for context.

Yes it's quite a mess, e.g. if you follow the rabbit hole at the
recent[1].

One fundamental problem (discussed in various places around the reftable
backend) is that we carry N meanings for an empty reflog:

A. "This is an active branch, but we have expired the entries".

B. "I manually created this, knowing that the various core.* configs
   around reflog will say 'oh, a reflog exists, let's log to it'" (in
   some cases).

C. "This is a 'stale' log", i.e. no branch exists, but the log
   is there.

Which is one reason[2] we'd delete them on branch deletion, because
otherwise we'd start logging again when a branch is re-created, which
possibly isn't what we wanted.

1. https://lore.kernel.org/git/de5e2b0e290791d0a4f58a893d8571b5fc8c4f1a.1646952843.git.avarab@gmail.com
2. I'm not saying this was intended, and haven't looked into this case,
   just that it's an emergent effect of how these files are treated
   now.


* Re: Keep reflogs for deleted (remote tracking) branches?
  2022-03-14 10:44       ` Ævar Arnfjörð Bjarmason
@ 2022-03-14 12:10         ` Tao Klerks
  0 siblings, 0 replies; 8+ messages in thread
From: Tao Klerks @ 2022-03-14 12:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Han-Wen Nienhuys, git

On Mon, Mar 14, 2022 at 11:52 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> I think you might find it interesting to have pre-receive hooks
> e.g. reject pushes if you're deleting a topic whose commits aren't
> entirely <your author> i.e. just something like:
>
>     git push -o ireallymeanit=1 --delete topic
>
> I.e. it's an easy to implement extra safety check that people can always
> opt-out of, print a scary message and most people will think twice :)

That is indeed very interesting, thanks! I need to think about exactly
when this is the right thing to do, but it's a tool in the box that I
was not aware of!


> > My question is specifically about the, in my opinion, very surprising
> > behavior of deleting reflogs along with deleted branches - I mainly
> > provided the example use-case for context.
>
> Yes it's quite a mess, e.g. if you follow the rabbit hole at the
> recent[1].
>
> One fundamental problem (discussed in various places around the reftable
> backend) is that we carry N meanings for an empty reflog:
>
> A. "This is an active branch, but we have expired the entries".
>
> B. "I manually created this, knowing that the various core.* configs
>    around reflog will say 'oh, a reflog exists, let's log to it'" (in
>    some cases).
>
> C. "This is a 'stale' log", i.e. no branch exists, but the log
>    is there.
>
> Which is one reason[2] we'd delete them on branch deletion, because
> otherwise we'd start logging again when a branch is re-created, which
> possibly isn't what we wanted.
>
> 1. https://lore.kernel.org/git/de5e2b0e290791d0a4f58a893d8571b5fc8c4f1a.1646952843.git.avarab@gmail.com
> 2. I'm not saying this was intended, and haven't looked into this case,
>    just that it's an emergent effect of how these files are treated
>    now.

Very interesting, thx. Fwiw I would argue that resuming full logging
when a new branch appears with the same name (within the period of
time where the reflog is not empty yet) is a very reasonable thing to
end up doing, but I guess Han-Wen's note about potential path
conflicts on branches *after* a deletion make this a hard thing to
change, even if "accidental logging resuming" were accepted as a
sensible outcome here.

