* Distinguishing FF vs non-FF updates in the reflog? @ 2021-03-17 20:06 Han-Wen Nienhuys 2021-03-17 21:21 ` Martin Fick ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-17 20:06 UTC (permalink / raw) To: git Hi there, I'm working on some extensions to Gerrit for which it would be very beneficial if we could tell from the reflog if an update is a fast-forward or not: if we find a SHA1 in the reflog, and see there were only FF updates since, we can be sure that the SHA1 is reachable from the branch, without having to open packfiles and decode commits. For the reftable format, I think we could store this easily by introducing more record types. Today we have 0 = deletion, 1 = update, and we could add 2 = FF update, 3 = non-FF update. However, the textual reflog format doesn't easily allow for this. However, we might add a convention, eg. have the message start with 'FF' or 'NFF' depending on the nature of the update. Does this make sense, and if yes is it worth proposing a change? thanks, -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-17 20:06 Distinguishing FF vs non-FF updates in the reflog? Han-Wen Nienhuys @ 2021-03-17 21:21 ` Martin Fick 2021-03-18 8:58 ` Han-Wen Nienhuys 2021-03-18 19:47 ` Jeff King 2021-03-22 13:26 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 19+ messages in thread From: Martin Fick @ 2021-03-17 21:21 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Wednesday, March 17, 2021 9:06:06 PM MDT Han-Wen Nienhuys wrote: > I'm working on some extensions to Gerrit for which it would be very > beneficial if we could tell from the reflog if an update is a > fast-forward or not: if we find a SHA1 in the reflog, and see there > were only FF updates since, we can be sure that the SHA1 is reachable > from the branch, without having to open packfiles and decode commits. I don't think this would be reliable. 1) Not all updates make it to the reflogs 2) Reflogs can be edited or mucked with 3) On NFS reflogs can outright be wrong even when used properly as their are caching issues. We specifically have seen entries that appear to be FFs that were not. I believe that today git can do very fast reachability checks without opening pack files by using some of its indexes (bitmap code or https://git-scm.com/ docs/commit-graph ?). It probably makes sense to add this ability to jgit if that is what you need? -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-17 21:21 ` Martin Fick @ 2021-03-18 8:58 ` Han-Wen Nienhuys 2021-03-18 19:35 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-18 8:58 UTC (permalink / raw) To: Martin Fick; +Cc: git On Wed, Mar 17, 2021 at 10:22 PM Martin Fick <mfick@codeaurora.org> wrote: > > On Wednesday, March 17, 2021 9:06:06 PM MDT Han-Wen Nienhuys wrote: > > I'm working on some extensions to Gerrit for which it would be very > > beneficial if we could tell from the reflog if an update is a > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > were only FF updates since, we can be sure that the SHA1 is reachable > > from the branch, without having to open packfiles and decode commits. > > I don't think this would be reliable. > > 1) Not all updates make it to the reflogs > 2) Reflogs can be edited or mucked with > 3) On NFS reflogs can outright be wrong even when used properly as their are > caching issues. We specifically have seen entries that appear to be FFs that > were not. Can you tell a little more about 3) ? SInce we don't annotate non-FF vs FF today, what does "appear to be FFs" mean? But you are right: since the reflog for a branch is in a different file from the branch head, there is no way to do an update to both of them at the same time. I guess this will have to be a reftable-only feature. > I believe that today git can do very fast reachability checks without opening > pack files by using some of its indexes (bitmap code or https://git-scm.com/ > docs/commit-graph ?). It probably makes sense to add this ability to jgit if > that is what you need? The bitmaps are generated by GC, and you can't GC all the time. JGit has support for bitmaps, and its support actually predates C-Git's support for it. (It was added to JGit by Colby Ranger who worked in Shawn's team). I expect that the commit graph doesn't work for my intended use-case. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 8:58 ` Han-Wen Nienhuys @ 2021-03-18 19:35 ` Jeff King 2021-03-18 22:24 ` Martin Fick 2021-03-18 22:31 ` Martin Fick 2 siblings, 0 replies; 19+ messages in thread From: Jeff King @ 2021-03-18 19:35 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: Martin Fick, git On Thu, Mar 18, 2021 at 09:58:56AM +0100, Han-Wen Nienhuys wrote: > > 1) Not all updates make it to the reflogs > > 2) Reflogs can be edited or mucked with > > 3) On NFS reflogs can outright be wrong even when used properly as their are > > caching issues. We specifically have seen entries that appear to be FFs that > > were not. > > Can you tell a little more about 3) ? SInce we don't annotate non-FF > vs FF today, what does "appear to be FFs" mean? > > But you are right: since the reflog for a branch is in a different > file from the branch head, there is no way to do an update to both of > them at the same time. I guess this will have to be a reftable-only > feature. Each individual reflog entry (in the branch reflog and the HEAD reflog) should still be consistent, though. They give the "before" and "after" object ids, and the ff-ness is an immutable property of those commit ids. > > I believe that today git can do very fast reachability checks without opening > > pack files by using some of its indexes (bitmap code or https://git-scm.com/ > > docs/commit-graph ?). It probably makes sense to add this ability to jgit if > > that is what you need? > > The bitmaps are generated by GC, and you can't GC all the time. JGit > has support for bitmaps, and its support actually predates C-Git's > support for it. (It was added to JGit by Colby Ranger who worked in > Shawn's team). Bitmaps can help with these checks, but we don't actually look at them in most of the algorithms one might use for computing ancestry. One of the reasons for that is that they often backfire as an optimization, because: - as you note, they are often not up to date because they require a repack. So they won't help when asking about very recently added commits (which people tend to ask about more than ancient ones). - the bitmap file format doesn't have any index. So a reader has to scan the whole thing upon opening to decide which commits have bitmaps. For several years we had a patch at GitHub that checked for bitmaps during "--contains" traversals. Even though it did sometimes backfire, it was enough of a net win to be worth keeping, compared to actually opening commit objects to follow their parent pointers. But with commit-graphs, it was a strict loss, and we stopped using it entirely last year. (We do still look at bitmaps for our branch ahead/behind checks using a custom patch; I'm suspicious of its performance for the same reasons, but we haven't dug carefully into it). But... > I expect that the commit graph doesn't work for my intended use-case. ...I think commit-graphs are a big win here. They are more often kept up to date, because they can be generated incrementally with effort proportional to the number of new commits. And they make a big difference if the traversal has to cover a lot of commits. E.g., here's the most extreme case in git.git, checking ancestry of the oldest commit: $ time git merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $? real 0m0.014s user 0m0.008s sys 0m0.005s 0 $ time git -c core.commitgraph=false merge-base --is-ancestor e83c5163316f89bfbde7d9ab23ca2e25604af290 HEAD; echo $? real 0m0.398s user 0m0.369s sys 0m0.028s 0 Of course most results won't be so dramatic, because they wouldn't have to traverse many commits in the first place (so they are already pretty fast with or without the commit-graph). But that 14ms should be an upper bound for this repo. And naturally that scales with the number of commits; in linux.git it's 43ms, compared to 8.7s without commit-graphs). -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 8:58 ` Han-Wen Nienhuys 2021-03-18 19:35 ` Jeff King @ 2021-03-18 22:24 ` Martin Fick 2021-03-22 12:31 ` Han-Wen Nienhuys 2021-03-18 22:31 ` Martin Fick 2 siblings, 1 reply; 19+ messages in thread From: Martin Fick @ 2021-03-18 22:24 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Thursday, March 18, 2021 9:58:56 AM MDT Han-Wen Nienhuys wrote: > On Wed, Mar 17, 2021 at 10:22 PM Martin Fick <mfick@codeaurora.org> wrote: > > On Wednesday, March 17, 2021 9:06:06 PM MDT Han-Wen Nienhuys wrote: > > > I'm working on some extensions to Gerrit for which it would be very > > > beneficial if we could tell from the reflog if an update is a > > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > > were only FF updates since, we can be sure that the SHA1 is reachable > > > from the branch, without having to open packfiles and decode commits. > > > > I don't think this would be reliable. > > > > 1) Not all updates make it to the reflogs > > 2) Reflogs can be edited or mucked with > > 3) On NFS reflogs can outright be wrong even when used properly as their > > are caching issues. We specifically have seen entries that appear to be > > FFs that were not. > > Can you tell a little more about 3) ? SInce we don't annotate non-FF > vs FF today, what does "appear to be FFs" mean? To be honest I don't recall for sure, but I will describe what I think has happened. I think that we have seen a server(A) update a branch from C1 to C2A, and then later another server(B) update the same branch from C1 to C2B. Obviously the move from C2A to C2B is not a FF, but that move is not what is recorded. Each of those updates was a FF when viewed as separate entries, but if you look at both lines you can see that the second entry does not start where the first one left off. This would be detectable, it would appear as if there was a missed entry that did a rewind from C2A to C1, but that rewind presumably actually came in as part of the second update from server (B) as it had a cached version of the branch and believed it still pointed to C1 when it made its update, -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 22:24 ` Martin Fick @ 2021-03-22 12:31 ` Han-Wen Nienhuys 2021-03-22 17:45 ` Martin Fick 0 siblings, 1 reply; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-22 12:31 UTC (permalink / raw) To: Martin Fick; +Cc: git On Thu, Mar 18, 2021 at 11:24 PM Martin Fick <mfick@codeaurora.org> wrote: > > On Thursday, March 18, 2021 9:58:56 AM MDT Han-Wen Nienhuys wrote: > > On Wed, Mar 17, 2021 at 10:22 PM Martin Fick <mfick@codeaurora.org> wrote: > > > On Wednesday, March 17, 2021 9:06:06 PM MDT Han-Wen Nienhuys wrote: > > > > I'm working on some extensions to Gerrit for which it would be very > > > > beneficial if we could tell from the reflog if an update is a > > > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > > > were only FF updates since, we can be sure that the SHA1 is reachable > > > > from the branch, without having to open packfiles and decode commits. > > > > > > I don't think this would be reliable. > > > > > > 1) Not all updates make it to the reflogs > > > 2) Reflogs can be edited or mucked with > > > 3) On NFS reflogs can outright be wrong even when used properly as their > > > are caching issues. We specifically have seen entries that appear to be > > > FFs that were not. > > > > Can you tell a little more about 3) ? SInce we don't annotate non-FF > > vs FF today, what does "appear to be FFs" mean? > > To be honest I don't recall for sure, but I will describe what I think has > happened. I think that we have seen a server(A) update a branch from > C1 to C2A, and then later another server(B) update the same branch from C1 to > C2B. Obviously the move from C2A to C2B is not a FF, but that move is not what > is recorded. Each of those updates was a FF when viewed as separate entries, I think those would fail with the way that Gerrit uses JGit, because C1 -> C2B would fail with LOCK_ERROR. I guess there are code paths in Git (?) that will execute force-push without checking if the update is FF or not. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 12:31 ` Han-Wen Nienhuys @ 2021-03-22 17:45 ` Martin Fick 0 siblings, 0 replies; 19+ messages in thread From: Martin Fick @ 2021-03-22 17:45 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Monday, March 22, 2021 1:31:25 PM MDT Han-Wen Nienhuys wrote: > On Thu, Mar 18, 2021 at 11:24 PM Martin Fick <mfick@codeaurora.org> wrote: > > On Thursday, March 18, 2021 9:58:56 AM MDT Han-Wen Nienhuys wrote: > > > On Wed, Mar 17, 2021 at 10:22 PM Martin Fick <mfick@codeaurora.org> wrote: > > > > On Wednesday, March 17, 2021 9:06:06 PM MDT Han-Wen Nienhuys wrote: > > > > > I'm working on some extensions to Gerrit for which it would be very > > > > > beneficial if we could tell from the reflog if an update is a > > > > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > > > > were only FF updates since, we can be sure that the SHA1 is > > > > > reachable > > > > > from the branch, without having to open packfiles and decode > > > > > commits. > > > > > > > > I don't think this would be reliable. > > > > > > > > 1) Not all updates make it to the reflogs > > > > 2) Reflogs can be edited or mucked with > > > > 3) On NFS reflogs can outright be wrong even when used properly as > > > > their > > > > are caching issues. We specifically have seen entries that appear to > > > > be > > > > FFs that were not. > > > > > > Can you tell a little more about 3) ? SInce we don't annotate non-FF > > > vs FF today, what does "appear to be FFs" mean? > > > > To be honest I don't recall for sure, but I will describe what I think has > > happened. I think that we have seen a server(A) update a branch from > > C1 to C2A, and then later another server(B) update the same branch from C1 > > to C2B. Obviously the move from C2A to C2B is not a FF, but that move is > > not what is recorded. Each of those updates was a FF when viewed as > > separate entries, > I think those would fail with the way that Gerrit uses JGit, because > C1 -> C2B would fail with LOCK_ERROR. If jgit knew that the branch no longer pointed to C1, then yes it should fail with LOCK_ERROR. However this situation is believed to arise due to jgit's caching which could make it think that the branch still points to C1 even thought it has already been advanced to C2A! :( -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 8:58 ` Han-Wen Nienhuys 2021-03-18 19:35 ` Jeff King 2021-03-18 22:24 ` Martin Fick @ 2021-03-18 22:31 ` Martin Fick 2021-03-18 22:54 ` Jeff King 2 siblings, 1 reply; 19+ messages in thread From: Martin Fick @ 2021-03-18 22:31 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Thursday, March 18, 2021 9:58:56 AM MDT Han-Wen Nienhuys wrote: > The bitmaps are generated by GC, and you can't GC all the time. I believe that I recently saw an effort to make this incremental, perhaps related to the geometric repacking series? If that were the case, you could gc much more often cheaply. Perhaps it could be something done on every upload at some point the way that reflog effectively does on every update? As Peff pointed out though, commit-graphs would still be a better way to go anyway since that is closer to their intent, -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 22:31 ` Martin Fick @ 2021-03-18 22:54 ` Jeff King 0 siblings, 0 replies; 19+ messages in thread From: Jeff King @ 2021-03-18 22:54 UTC (permalink / raw) To: Martin Fick; +Cc: Han-Wen Nienhuys, git On Thu, Mar 18, 2021 at 04:31:24PM -0600, Martin Fick wrote: > On Thursday, March 18, 2021 9:58:56 AM MDT Han-Wen Nienhuys wrote: > > The bitmaps are generated by GC, and you can't GC all the time. > > I believe that I recently saw an effort to make this incremental, perhaps > related to the geometric repacking series? If that were the case, you could gc > much more often cheaply. Perhaps it could be something done on every upload at > some point the way that reflog effectively does on every update? That geometric repacking work is leading up to having a bitmap for a multi-pack-index. Which will make them _cheaper_, but still not especially cheap (because we've reordered the objects corresponding to each bit, and also because our writing process still does a lot of O(nr_commits) work). In the very long run, I think the way out would be to stop using pack or midx ordering as the basis of the bitmap, and instead have a stable object ordering that can be appended to. That would allow true incremental generation of the bitmaps (leaving old ones in place, and just adding a new ones to represent new commits). But that's such a big departure from the status quo that having a midx bitmap seemed like a more attainable middle ground in the meantime. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-17 20:06 Distinguishing FF vs non-FF updates in the reflog? Han-Wen Nienhuys 2021-03-17 21:21 ` Martin Fick @ 2021-03-18 19:47 ` Jeff King 2021-03-22 14:40 ` Han-Wen Nienhuys 2021-03-22 13:26 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 19+ messages in thread From: Jeff King @ 2021-03-18 19:47 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Wed, Mar 17, 2021 at 09:06:06PM +0100, Han-Wen Nienhuys wrote: > I'm working on some extensions to Gerrit for which it would be very > beneficial if we could tell from the reflog if an update is a > fast-forward or not: if we find a SHA1 in the reflog, and see there > were only FF updates since, we can be sure that the SHA1 is reachable > from the branch, without having to open packfiles and decode commits. I left some numbers in another part of the thread, but IMHO performance isn't that compelling a reason to do this these days, if you are using commit-graphs. Just walking the reflog might be _slightly_ faster, though not necessarily (it depends on whether the depth of the object graph or the depth of the reflog chain is deeper). It might matter more if you are using a more exotic storage scheme, where switching from accessing reflogs to objects implies extra round-trips to a server (e.g., custom storage backends with JGit; I don't know the state of the art in what Google is doing there). > For the reftable format, I think we could store this easily by > introducing more record types. Today we have 0 = deletion, 1 = update, > and we could add 2 = FF update, 3 = non-FF update. > > However, the textual reflog format doesn't easily allow for this. > However, we might add a convention, eg. have the message start with > 'FF' or 'NFF' depending on the nature of the update. > > Does this make sense, and if yes is it worth proposing a change? At GitHub we do something similar. We don't generally use reflogs much at all, but we keep a custom "audit log": a single append-only file that records every ref update in the repository. And its format just happens to be one reflog entry per line, prefixed by the updated ref. And there we do generally annotate the FF-ness of an update by stuffing it into the free-form message field (in fact, we shove in a small JSON object, so we record multiple fields like the pushing id, IP, etc). But the main goal there isn't performance (and in fact we don't generally consult it for anything outside of debugging). The reason we record FF-ness is for later debugging or analysis. We don't prune from the audit log, and we don't consider it for reachability when we prune objects (since otherwise you'd never be able to prune anything!). So the objects sometimes aren't available later to compute, but we still want to know if the user did a force-push, etc. I don't think that really applies to regular reflogs, because they do imply reachability (and they are not great for later analysis, because we may selectively expire unreachable entries). -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-18 19:47 ` Jeff King @ 2021-03-22 14:40 ` Han-Wen Nienhuys 2021-03-26 7:43 ` Jeff King 0 siblings, 1 reply; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-22 14:40 UTC (permalink / raw) To: Jeff King; +Cc: git On Thu, Mar 18, 2021 at 8:47 PM Jeff King <peff@peff.net> wrote: > > I'm working on some extensions to Gerrit for which it would be very > > beneficial if we could tell from the reflog if an update is a > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > were only FF updates since, we can be sure that the SHA1 is reachable > > from the branch, without having to open packfiles and decode commits. > > I left some numbers in another part of the thread, but IMHO performance > isn't that compelling a reason to do this these days, if you are using > commit-graphs. > > Just walking the reflog might be _slightly_ faster, though not > necessarily (it depends on whether the depth of the object graph or the > depth of the reflog chain is deeper). It might matter more if you are > using a more exotic storage scheme, where switching from accessing > reflogs to objects implies extra round-trips to a server (e.g., custom > storage backends with JGit; I don't know the state of the art in what > Google is doing there). JGit doesn't currently support commit-graph, so it's hard to predict what performance will be like, but isn't commit-graph is keyed by SHA1? That makes it hard to do caching, especially when considering large repositories. AFAIU, commit-graph would help speed up reachability checks, by being able to shortcut cases where the commit number proves that some commit is not ancestor of the other, but you still have to do a revwalk to conclusively prove reachability. In our storage system, the revwalk runs on top of packfile data that must be faulted-in (slow!) from datacenter-wide storage. It's made worse because we don't support midx yet. The application that I'm thinking of providing a way for automation to deal with lagging replicas. This could be done by specifying a X-Need-GitRef: $repositoryname~$refname~$SHA1 header on Gerrit requests, that specify that the given $SHA1 must have been in a recent ref update, and be reachable from $refname. The reflog has this information organized in a form that suited very well to answering these questions quickly (assuming the reflog is annotated such that we can distinguish FF and non-FF updates) > > Does this make sense, and if yes is it worth proposing a change? > > At GitHub we do something similar. We don't generally use reflogs much > at all, but we keep a custom "audit log": a single append-only file that > records every ref update in the repository. And its format just happens > to be one reflog entry per line, prefixed by the updated ref. The interest of having a standard/convention in Git would be to not require reftable for folks that want to use this feature. > And there we do generally annotate the FF-ness of an update by stuffing > it into the free-form message field (in fact, we shove in a small JSON > object, so we record multiple fields like the pushing id, IP, etc). > > But the main goal there isn't performance (and in fact we don't > generally consult it for anything outside of debugging). The reason we > record FF-ness is for later debugging or analysis. We don't prune from > the audit log, and we don't consider it for reachability when we prune > objects (since otherwise you'd never be able to prune anything!). So the > objects sometimes aren't available later to compute, but we still want > to know if the user did a force-push, etc. We store reflogs in a global database table, which has this kind of information, but the Google-specific format is harder to make work with Gerrit, which is open source. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 14:40 ` Han-Wen Nienhuys @ 2021-03-26 7:43 ` Jeff King 0 siblings, 0 replies; 19+ messages in thread From: Jeff King @ 2021-03-26 7:43 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git On Mon, Mar 22, 2021 at 03:40:46PM +0100, Han-Wen Nienhuys wrote: > > I left some numbers in another part of the thread, but IMHO performance > > isn't that compelling a reason to do this these days, if you are using > > commit-graphs. > > > > Just walking the reflog might be _slightly_ faster, though not > > necessarily (it depends on whether the depth of the object graph or the > > depth of the reflog chain is deeper). It might matter more if you are > > using a more exotic storage scheme, where switching from accessing > > reflogs to objects implies extra round-trips to a server (e.g., custom > > storage backends with JGit; I don't know the state of the art in what > > Google is doing there). > > JGit doesn't currently support commit-graph, so it's hard to predict > what performance will be like, but isn't commit-graph is keyed by > SHA1? That makes it hard to do caching, especially when considering > large repositories. Yes, it's keyed by sha1. It's essentially replacing "inflate the commit object and parse it" with "here are the parsed values as mmap-able 32-bit integer fields" (there's some other stuff with generation numbers, too, but the main speedup is simply that accessing each commit is orders of magnitude cheaper). It caches well, because those properties of the commit are immutable. But if you meant "when pulling data from the commit-graph file, is it friendly to block cache", then no, it's not linear. You'd binary search within it to find each commit, just as you would a pack .idx (and just like a .idx, I'd expect a system that is pulling data from a network source to want to grab the whole commit-graph file. They tend to be much smaller than the main .idx for a given repo). > AFAIU, commit-graph would help speed up reachability checks, by being > able to shortcut cases where the commit number proves that some commit > is not ancestor of the other, but you still have to do a revwalk to > conclusively prove reachability. Right. You'll still walk a lot of the commits, but you'll do so much faster (the generation numbers can also help prune some uninteresting side paths, but again, I think the main value for this operation is just getting the parent info much faster). -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-17 20:06 Distinguishing FF vs non-FF updates in the reflog? Han-Wen Nienhuys 2021-03-17 21:21 ` Martin Fick 2021-03-18 19:47 ` Jeff King @ 2021-03-22 13:26 ` Ævar Arnfjörð Bjarmason 2021-03-22 14:59 ` Han-Wen Nienhuys 2 siblings, 1 reply; 19+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-22 13:26 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git, Jeff King, Martin Fick On Wed, Mar 17 2021, Han-Wen Nienhuys wrote: > Hi there, > > I'm working on some extensions to Gerrit for which it would be very > beneficial if we could tell from the reflog if an update is a > fast-forward or not: if we find a SHA1 in the reflog, and see there > were only FF updates since, we can be sure that the SHA1 is reachable > from the branch, without having to open packfiles and decode commits. > > For the reftable format, I think we could store this easily by > introducing more record types. [snip]. Aside from what others have mentioned here, you're talking about the log_type field are you not? I.e.: https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md#log-block-format Has that "log_type = 0x0" tombstone proven to be a worthwhile optimization past the stash case mention there (which is presumably not relevant to the vast majority of Google's use-cases). I.e. it's redundant to looking at the record and seeing if new_id = ZERO_OID. Similarly can't ff v.s. non-ff be deduced unambiguously by looking ahead to the next record, and seeing if the current record's "old_id" matches that of the last record's "new_id". If it does it's a FF, if not it's a non-FF (or a create/delete). I'm not arguing that a quicker lookup isn't needed, I'm just trying to dig at what "beneficial" here is. The format is ordered, and the common case is that the page we have in memory has the last record. What sort of case are we talking about where not unpacking the log_data segment is making a difference? > However, the textual reflog format doesn't easily allow for this. > However, we might add a convention, eg. have the message start with > 'FF' or 'NFF' depending on the nature of the update. Maybe a bit ugly, but a ".." and "..." prefix would at least be consistent with "fetch" output. Or e.g. "commit:" and "+commit:" for ff and non-ff (and we could make it "\t commit:" v.s. "\t+commit:" v.s. current "\tcommit:" to distinguish all three in the current text-based format. Per "OUTPUT" in git-fetch(1). > [Ævar: snipped from earlier] Today we have 0 = deletion, 1 = update, > and we could add 2 = FF update, 3 = non-FF update. I've written log table implementations (a site table in a RDBMS) for git (one table for refs) which had: create, ff, non-ff, delete I wonder if that quad-state would be useful for reftable too, with this proposed change you'd still need to unpack the record and see if the old_id is ZERO_OID to check if it's a creation, would you not? I also wonder if it couldn't be: 0 = deletion, 1 = non-ff-update, 2 = ff-update, 4 = creation So the format wouldn't forever carry the historical wart of this not having been considered from the beginning. It would mean that the few current reftable users (just Google?) would have to look at the record to see if it's *really* a non-ff-update, but presumably they need to do so now for ff v.s. non-ff, so they're no worse off than they are now. Then when those users know they're on a version that distinguishes these they can hard rely on 1 not being a "ff for sure", not a "maybe" status for new updates. Presumably they either don't care about ancient reflog records, or a one-off migration of rewriting the records for older entries could be done. Also between my [1] and this proposal we have at least a reftable v1.01 in the wild (the filename locking behavior change discussed in [1]), and this would make it v1.02, but the only up-to-date spec is for v1.00 (and maybe JGit has other changes I haven't tracked). That [1] change is minor, but still, a spec change. So just a *poke* that having some version where the spec is kept up-to-date with that and this change if it happens would be very useful, especially if the reftable-in-git.git lands one of these days. 1. https://lore.kernel.org/git/87k0tzulf1.fsf@evledraar.gmail.com/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 13:26 ` Ævar Arnfjörð Bjarmason @ 2021-03-22 14:59 ` Han-Wen Nienhuys 2021-03-22 15:39 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-22 14:59 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Martin Fick On Mon, Mar 22, 2021 at 2:26 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > I'm working on some extensions to Gerrit for which it would be very > > beneficial if we could tell from the reflog if an update is a > > fast-forward or not: if we find a SHA1 in the reflog, and see there > > were only FF updates since, we can be sure that the SHA1 is reachable > > from the branch, without having to open packfiles and decode commits. > > > > For the reftable format, I think we could store this easily by > > introducing more record types. [snip]. > > Aside from what others have mentioned here, you're talking about the > log_type field are you not? I.e.: > https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md#log-block-format Correct. > Has that "log_type = 0x0" tombstone proven to be a worthwhile > optimization past the stash case mention there (which is presumably not > relevant to the vast majority of Google's use-cases). I've never really understood the log_type=0x0 use case. I think it was added solely to cater for a use case in CGit's stash command. > I.e. it's redundant to looking at the record and seeing if new_id = > ZERO_OID. > > Similarly can't ff v.s. non-ff be deduced unambiguously by looking ahead > to the next record, and seeing if the current record's "old_id" matches > that of the last record's "new_id". If it does it's a FF, if not it's a > non-FF (or a create/delete). I don't see how that will tell you FF vs non-FF-ness. Both an FF update and a non-FF update look like 'new_oid = 20-random-bytes'. Barring further info, you have to lookup the commit object for those bytes, and then walk back to see if you pass old_oid. AFAICT, a correct sequence of ref updates (FF or not) always has prev.new_oid = current.old_oid. > > [Ævar: snipped from earlier] Today we have 0 = deletion, 1 = update, > > and we could add 2 = FF update, 3 = non-FF update. > > I've written log table implementations (a site table in a RDBMS) for git > (one table for refs) which had: > > create, ff, non-ff, delete > > I wonder if that quad-state would be useful for reftable too, with this > proposed change you'd still need to unpack the record and see if the > old_id is ZERO_OID to check if it's a creation, would you not? Delete & create are handled with ZERO_OID. The reftable format makes it so that you have to decode a record in order to read past it (there is no size framing the table entry level), so there is no big performance advantage in encoding this information in the log_type. You merely use a log_type bit rather than a 20 byte raw ID. Since log records are zlib compressed anyway, it probably also makes no space difference. > I also wonder if it couldn't be: > > 0 = deletion, 1 = non-ff-update, 2 = ff-update, 4 = creation > > So the format wouldn't forever carry the historical wart of this not > having been considered from the beginning. If you do it like this, you will force that all implementations to have to compute whether a (forced) update is a FF or not. I don't know if that is a problem. A 'maybe non-FF' value would be useful. Perhaps we could even do simply 0 = deletion, 1 = maybe-ff-update, 2 = guaranteed-ff-update > It would mean that the few current reftable users (just Google?) would > have to look at the record to see if it's *really* a non-ff-update, but > presumably they need to do so now for ff v.s. non-ff, so they're no > worse off than they are now. At Google, we currently don't record log records in reftable yet. From our perspective, we could probably change the standard 'in place'. JGit has supported reftable since Nov 2019, but I'm unaware of users; I did hear about GerritForge wanting to try it out in production this year. > Then when those users know they're on a version that distinguishes these > they can hard rely on 1 not being a "ff for sure", not a "maybe" status > for new updates. Presumably they either don't care about ancient reflog > records, or a one-off migration of rewriting the records for older > entries could be done. > > Also between my [1] and this proposal we have at least a reftable v1.01 > in the wild (the filename locking behavior change discussed in [1]), and > this would make it v1.02, but the only up-to-date spec is for v1.00 (and > maybe JGit has other changes I haven't tracked). The file locking update has been added to the standard, https://github.com/git/git/pull/951. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 14:59 ` Han-Wen Nienhuys @ 2021-03-22 15:39 ` Ævar Arnfjörð Bjarmason 2021-03-22 15:56 ` Han-Wen Nienhuys 0 siblings, 1 reply; 19+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-22 15:39 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git, Jeff King, Martin Fick On Mon, Mar 22 2021, Han-Wen Nienhuys wrote: > On Mon, Mar 22, 2021 at 2:26 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> > I'm working on some extensions to Gerrit for which it would be very >> > beneficial if we could tell from the reflog if an update is a >> > fast-forward or not: if we find a SHA1 in the reflog, and see there >> > were only FF updates since, we can be sure that the SHA1 is reachable >> > from the branch, without having to open packfiles and decode commits. >> > >> > For the reftable format, I think we could store this easily by >> > introducing more record types. [snip]. >> >> Aside from what others have mentioned here, you're talking about the >> log_type field are you not? I.e.: >> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md#log-block-format > > Correct. > >> Has that "log_type = 0x0" tombstone proven to be a worthwhile >> optimization past the stash case mention there (which is presumably not >> relevant to the vast majority of Google's use-cases). > > I've never really understood the log_type=0x0 use case. I think it was > added solely to cater for a use case in CGit's stash command. > >> I.e. it's redundant to looking at the record and seeing if new_id = >> ZERO_OID. >> >> Similarly can't ff v.s. non-ff be deduced unambiguously by looking ahead >> to the next record, and seeing if the current record's "old_id" matches >> that of the last record's "new_id". If it does it's a FF, if not it's a >> non-FF (or a create/delete). > > I don't see how that will tell you FF vs non-FF-ness. Both an FF > update and a non-FF update look like 'new_oid = 20-random-bytes'. > Barring further info, you have to lookup the commit object for those > bytes, and then walk back to see if you pass old_oid. Because both the next reflog format and reftable's have the update-ref <newvalue> <oldvalue> for each update. So: $ cut -d ' ' -f1-2 .git/logs/refs/remotes/origin/master | head -n 2 1c52ecf4ba0f4f7af72775695fee653f50737c71 ba2aa15129e59f248d8cdd30404bc78b5178f61d ba2aa15129e59f248d8cdd30404bc78b5178f61d 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707 We can know with !strcmp(rows[0][1], rows[1][0]) whether the latest update is a ff or non-ff (it is). As opposed to: $ cut -d ' ' -f1-2 .git/logs/refs/remotes/origin/seen | head -n 2 8b26a41f4bf7e7c0f097cb91012d08fe8ae30e7f 0f3a981cbd5be5f97e9504ab770cd88f988fe820 0f3a981cbd5be5f97e9504ab770cd88f988fe820 fdd019edfe6bc60d0100d5751c41e4f6ad28a2ef Where the rows[0][1] value is not the same as rows[1][0]. What you can't do is find whether any given commit(s) in those ranges are part of the FF or not, since you just have the start/end points. But I'm vaguely paranoid that we're talking past one another here and I've misunderstood what you want... > AFAICT, a correct sequence of ref updates (FF or not) always has > prev.new_oid = current.old_oid. > >> > [Ævar: snipped from earlier] Today we have 0 = deletion, 1 = update, >> > and we could add 2 = FF update, 3 = non-FF update. >> >> I've written log table implementations (a site table in a RDBMS) for git >> (one table for refs) which had: >> >> create, ff, non-ff, delete >> >> I wonder if that quad-state would be useful for reftable too, with this >> proposed change you'd still need to unpack the record and see if the >> old_id is ZERO_OID to check if it's a creation, would you not? > > Delete & create are handled with ZERO_OID. > > The reftable format makes it so that you have to decode a record in > order to read past it (there is no size framing the table entry > level), so there is no big performance advantage in encoding this > information in the log_type. You merely use a log_type bit rather than > a 20 byte raw ID. Since log records are zlib compressed anyway, it > probably also makes no space difference. > >> I also wonder if it couldn't be: >> >> 0 = deletion, 1 = non-ff-update, 2 = ff-update, 4 = creation >> >> So the format wouldn't forever carry the historical wart of this not >> having been considered from the beginning. > > If you do it like this, you will force that all implementations to > have to compute whether a (forced) update is a FF or not. I don't know > if that is a problem. A 'maybe non-FF' value would be useful. Perhaps > we could even do simply > > 0 = deletion, 1 = maybe-ff-update, 2 = guaranteed-ff-update The point is that nobody relies on this now, but also that you'd of course want 1 = i-know-for-sure-non-ff-update, 2 = i-know-for-sure-ff-update. But since logs are transitory (aren't they also expired in reftable, can't see that from skimming hte spec) and this is just a helpful side-index I think a one-time migration for the tiny minority of reftable users who'd care would be preferrable to the vast majority of future reftable users (if it ever lands in git.git) having to deal with this (albeit small) special-case forever. >> It would mean that the few current reftable users (just Google?) would >> have to look at the record to see if it's *really* a non-ff-update, but >> presumably they need to do so now for ff v.s. non-ff, so they're no >> worse off than they are now. > > At Google, we currently don't record log records in reftable yet. From > our perspective, we could probably change the standard 'in place'. > JGit has supported reftable since Nov 2019, but I'm unaware of users; > I did hear about GerritForge wanting to try it out in production this > year. Ah, so not even Google's using it, the backcompat is just for good measure... >> Then when those users know they're on a version that distinguishes these >> they can hard rely on 1 not being a "ff for sure", not a "maybe" status >> for new updates. Presumably they either don't care about ancient reflog >> records, or a one-off migration of rewriting the records for older >> entries could be done. >> >> Also between my [1] and this proposal we have at least a reftable v1.01 >> in the wild (the filename locking behavior change discussed in [1]), and >> this would make it v1.02, but the only up-to-date spec is for v1.00 (and >> maybe JGit has other changes I haven't tracked). > > The file locking update has been added to the standard, > https://github.com/git/git/pull/951. Ah yes. Now I remember. I managed to miss/not recall that. Thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 15:39 ` Ævar Arnfjörð Bjarmason @ 2021-03-22 15:56 ` Han-Wen Nienhuys 2021-03-22 16:40 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-22 15:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Martin Fick On Mon, Mar 22, 2021 at 4:39 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > We can know with !strcmp(rows[0][1], rows[1][0]) whether the latest > update is a ff or non-ff (it is). As opposed to: > > $ cut -d ' ' -f1-2 .git/logs/refs/remotes/origin/seen | head -n 2 > 8b26a41f4bf7e7c0f097cb91012d08fe8ae30e7f 0f3a981cbd5be5f97e9504ab770cd88f988fe820 > 0f3a981cbd5be5f97e9504ab770cd88f988fe820 fdd019edfe6bc60d0100d5751c41e4f6ad28a2ef > > Where the rows[0][1] value is not the same as rows[1][0]. I'm confused. rows[0][1] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" rows[1][0] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" they are the same. I don't understand your argument. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 15:56 ` Han-Wen Nienhuys @ 2021-03-22 16:40 ` Ævar Arnfjörð Bjarmason 2021-03-22 17:12 ` Han-Wen Nienhuys 2021-03-22 18:36 ` Junio C Hamano 0 siblings, 2 replies; 19+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-22 16:40 UTC (permalink / raw) To: Han-Wen Nienhuys; +Cc: git, Jeff King, Martin Fick On Mon, Mar 22 2021, Han-Wen Nienhuys wrote: > On Mon, Mar 22, 2021 at 4:39 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> We can know with !strcmp(rows[0][1], rows[1][0]) whether the latest >> update is a ff or non-ff (it is). As opposed to: >> >> $ cut -d ' ' -f1-2 .git/logs/refs/remotes/origin/seen | head -n 2 >> 8b26a41f4bf7e7c0f097cb91012d08fe8ae30e7f 0f3a981cbd5be5f97e9504ab770cd88f988fe820 >> 0f3a981cbd5be5f97e9504ab770cd88f988fe820 fdd019edfe6bc60d0100d5751c41e4f6ad28a2ef >> >> Where the rows[0][1] value is not the same as rows[1][0]. > > I'm confused. > > rows[0][1] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" > rows[1][0] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" > > they are the same. I don't understand your argument. Sorry, I mean same = ff update, not the same = non-ff. So I flipped those around in describing it. But in any case, the point is that you can reliably tell from a log of updates whether individual updates are ff or non-ff, as long as you: 1. Look at two records but not one, or rather the ff-ness of the update you're looking at now depends on <oldvalue> of that update matching the <newvalue> of the update before that. 2. The log is guaranteed to be in the same order as the update-ref calls, and not to be pruned in the middle. 3. Don't care about the FF-ness of the first entry in the reflog (e.g. prune it at N entries or 2 weeks, you can't see if the oldest entry you have is a FF or not) So as I noted I've materialized this in a RDBMS before as something like ENUM('delete', 'create', 'ff', 'non-ff'), but that was: A. In the delete/create case as much about the convenience of selecting/grouping as anything else, easier to select operation = "delete" than {old,new} = "<copy paste or calefully type exactly 40 x '0'>" B. In SQL it's a PITA both typing and index-wise to get information about a record based on a "previous" record when the relation between the two is a a UNIQUE INDEX and AUTO-INCREMENT id. I.e. for an AUTO_INCREMENT UNIQUE INDEX of repo,refname,id to get the "last" record you need to select where repo && refname is the same, and id < current_id LIMIT 1. C. You're accessing the data ad-hoc & manually via SQL, not some Git-specific query language/wrapper. But in the case of reftable I'm wondering what the use-case is, since presumably whatever API now serves up the reflog from it can just as well look at one more record and fill in a matrialized "ff" or "non-ff" field for the current record based on that. Also: If you're adding more values a genuinely useful value to log (that I've seen custom logging for) is "was this a forced push? || update-ref without the <oldvalue>?", which is *not* the same as "not a fast-forward". I.e. it's the difference between "update this with this <oldvalue>" (ff or non-ff) v.s. "I don't care what the oldvalue is, just update it" (f orce or not). (To complicate matters and make them even more harder to explain, the whole "force with lease" thing is not actually a "force push" in that sense, it's just an update-ref with an oldvalue you expect to result in a non-ff update). ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 16:40 ` Ævar Arnfjörð Bjarmason @ 2021-03-22 17:12 ` Han-Wen Nienhuys 2021-03-22 18:36 ` Junio C Hamano 1 sibling, 0 replies; 19+ messages in thread From: Han-Wen Nienhuys @ 2021-03-22 17:12 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Martin Fick On Mon, Mar 22, 2021 at 5:40 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Mon, Mar 22 2021, Han-Wen Nienhuys wrote: > > > On Mon, Mar 22, 2021 at 4:39 PM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> We can know with !strcmp(rows[0][1], rows[1][0]) whether the latest > >> update is a ff or non-ff (it is). As opposed to: > >> > >> $ cut -d ' ' -f1-2 .git/logs/refs/remotes/origin/seen | head -n 2 > >> 8b26a41f4bf7e7c0f097cb91012d08fe8ae30e7f 0f3a981cbd5be5f97e9504ab770cd88f988fe820 > >> 0f3a981cbd5be5f97e9504ab770cd88f988fe820 fdd019edfe6bc60d0100d5751c41e4f6ad28a2ef > >> > >> Where the rows[0][1] value is not the same as rows[1][0]. > > > > I'm confused. > > > > rows[0][1] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" > > rows[1][0] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" > > > > they are the same. I don't understand your argument. > > Sorry, I mean same = ff update, not the same = non-ff. So I flipped > those around in describing it. > > But in any case, the point is that you can reliably tell from a log of > updates whether individual updates are ff or non-ff, as long as you: ? $ git checkout -b non-ff origin/master Previous HEAD position was 59f4325222 Add "test-tool dump-reftable" command. Branch 'non-ff' set up to track remote branch 'master' from 'origin'. Switched to a new branch 'non-ff' $ git reset --hard HEAD^ HEAD is now at 56a57652ef Sync with Git 2.30.2 for CVE-2021-21300 $ cat .git/logs/refs/heads/non-ff 0000000000000000000000000000000000000000 13d7ab6b5d7929825b626f050b62a11241ea4945 Han-Wen Nienhuys <hanwen@google.com> 1616432938 +0100 branch: Created from origin/master 13d7ab6b5d7929825b626f050b62a11241ea4945 56a57652ef8e4ca2f108a8719b8caeed5e153c95 Han-Wen Nienhuys <hanwen@google.com> 1616432948 +0100 reset: moving to HEAD^ How can I tell from this reflog that "moving to HEAD^" is a non-FF update? -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Distinguishing FF vs non-FF updates in the reflog? 2021-03-22 16:40 ` Ævar Arnfjörð Bjarmason 2021-03-22 17:12 ` Han-Wen Nienhuys @ 2021-03-22 18:36 ` Junio C Hamano 1 sibling, 0 replies; 19+ messages in thread From: Junio C Hamano @ 2021-03-22 18:36 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Han-Wen Nienhuys, git, Jeff King, Martin Fick Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> I'm confused. >> >> rows[0][1] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" >> rows[1][0] == "0f3a981cbd5be5f97e9504ab770cd88f988fe820" >> >> they are the same. I don't understand your argument. > > Sorry, I mean same = ff update, not the same = non-ff. So I flipped > those around in describing it. I am confused too. Are you tacking something else, a gap in a run of reflog entries? If I go from commit A to B to C, the first log entry would record the transtion from A->B, and the second entry would record the transition from B->C, and the lack of gap does not say anything about the relationship between A and B, or B and C. A can be, and does not have to be, an ancestor of B, and B can be, and does not have to be, an ancestor of C. Hopping from A to B to C would leave the same pair of reflog records and I do not think you can tell the reachability among A and B and C from them. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2021-03-26 7:44 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-17 20:06 Distinguishing FF vs non-FF updates in the reflog? Han-Wen Nienhuys 2021-03-17 21:21 ` Martin Fick 2021-03-18 8:58 ` Han-Wen Nienhuys 2021-03-18 19:35 ` Jeff King 2021-03-18 22:24 ` Martin Fick 2021-03-22 12:31 ` Han-Wen Nienhuys 2021-03-22 17:45 ` Martin Fick 2021-03-18 22:31 ` Martin Fick 2021-03-18 22:54 ` Jeff King 2021-03-18 19:47 ` Jeff King 2021-03-22 14:40 ` Han-Wen Nienhuys 2021-03-26 7:43 ` Jeff King 2021-03-22 13:26 ` Ævar Arnfjörð Bjarmason 2021-03-22 14:59 ` Han-Wen Nienhuys 2021-03-22 15:39 ` Ævar Arnfjörð Bjarmason 2021-03-22 15:56 ` Han-Wen Nienhuys 2021-03-22 16:40 ` Ævar Arnfjörð Bjarmason 2021-03-22 17:12 ` Han-Wen Nienhuys 2021-03-22 18:36 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).