git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
@ 2023-01-03  3:22 Andrew Hlynskyi
  2023-01-03  9:04 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Hlynskyi @ 2023-01-03  3:22 UTC (permalink / raw)
  To: git

Steps to reproduce:
1. Create some notes with the `git notes add` command.
2. Run `git notes` to list stored notes.
3. Run any of `git gc  --prune=now` or `git pack-refs --all` commands.
4. Run `git notes` again and see that there are no more seen notes.

The reason is that all commands like `git notes` or `git log --notes`
expect to find notes head as unpacked ref in .git/refs/notes/commits.
But the gc or the pack-refs command packs .git/refs/notes/commits ref
into .git/packed-refs file.

It's possible to restore the notes ref with a shell script like:
```
fix-notes-unpack-commits-ref(){
    local hash path
    grep refs/notes/commits .git/packed-refs |\
    while read hash path; do
        path=".git/$path"
        mkdir -p "$(dirname "$path")"
        echo -n "$hash" > "$path"
    done
}
```
Or, fortunately, from another backup place: .git/logs/refs/notes/commits.

The expected behavior is that `git notes` and `git log` should work
also with notes ref stored in the .git/packed-refs file and also
respect that the `git notes` command can store notes with a custom ref
by using the `--ref` option.

Yours sincerely,
Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
  2023-01-03  3:22 [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command Andrew Hlynskyi
@ 2023-01-03  9:04 ` Jeff King
  2023-01-05 16:59   ` Andrew Hlynskyi
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-01-03  9:04 UTC (permalink / raw)
  To: Andrew Hlynskyi; +Cc: git

On Tue, Jan 03, 2023 at 05:22:01AM +0200, Andrew Hlynskyi wrote:

> Steps to reproduce:
> 1. Create some notes with the `git notes add` command.
> 2. Run `git notes` to list stored notes.
> 3. Run any of `git gc  --prune=now` or `git pack-refs --all` commands.
> 4. Run `git notes` again and see that there are no more seen notes.

This doesn't reproduce for me at all. Running:

  git init repo
  cd repo

  git commit --allow-empty -m "this is the commit message"
  git notes add -m "this is the note"

  git notes
  git pack-refs --all
  git notes

Shows the same output before and after packing the refs.

> The reason is that all commands like `git notes` or `git log --notes`
> expect to find notes head as unpacked ref in .git/refs/notes/commits.
> But the gc or the pack-refs command packs .git/refs/notes/commits ref
> into .git/packed-refs file.

That would be very surprising. The lookup of the ref in the notes code
uses the same generic ref code that the rest of Git uses, which
understands packed-refs and so on.

Can you share the .git directory of a repository that exhibits this
behavior? It's possible there's a bug or something in the packed-refs
code, though I find it pretty unlikely, as it's fairly well exercised in
normal use.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
  2023-01-03  9:04 ` Jeff King
@ 2023-01-05 16:59   ` Andrew Hlynskyi
  2023-01-06  8:57     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Hlynskyi @ 2023-01-05 16:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Tue, Jan 3, 2023 at 11:04 AM Jeff King <peff@peff.net> wrote:

> That would be very surprising. The lookup of the ref in the notes code
> uses the same generic ref code that the rest of Git uses, which
> understands packed-refs and so on.

Thank you Jeff for looking at this also.

I tried to debug the issue by myself and found that the
`find_reference_location` function can't find the record for
`refs/notes/commits` in a repo.
I recall that a year ago I had trouble with a repo and it was damaged,
I restored it with the `git fsck` command and took some info from the
`git reflog`.
But somehow there was an issue in the `.git/packed-refs` file that was
there unexplored all past year.
The core of the issue I can demonstrate with two `.git/packed-refs` excerpts.

The issue excerpt:
```
dca1abdb5b870d46deae3717e87d0085b62d7242 refs/heads/tmp
d5ec01136b79a59a9e39d3ab02d92e1d91558579 refs/heads/tdf
c60f96a3f76e99e1b42f07204ad830c1483647b0 refs/heads/trunk
93119b80d4c7676a0ebe1bd6ae46574f96dfc7ed refs/notes/commits
452a23af3cfac6eef3bb1045894521fac8e51e2a refs/heads/cargo-config-toml
f5599d73873c451041fde857a09b5f12662357c1 refs/heads/custom
64076392359c4d4989320bde5712a8b389056f49 refs/heads/main
2119ff71ca66bec5117534907a127cee32680788 refs/heads/next
7e0016059ee7a87e3d04f7e59154529a9616a238 refs/recovery/1
de2876aa45022af815e717eb92becd61564efe32 refs/recovery/10
```

The correct excerpt:
```
dca1abdb5b870d46deae3717e87d0085b62d7242 refs/heads/tmp
d5ec01136b79a59a9e39d3ab02d92e1d91558579 refs/heads/tdf
c60f96a3f76e99e1b42f07204ad830c1483647b0 refs/heads/trunk
452a23af3cfac6eef3bb1045894521fac8e51e2a refs/heads/cargo-config-toml
f5599d73873c451041fde857a09b5f12662357c1 refs/heads/custom
64076392359c4d4989320bde5712a8b389056f49 refs/heads/main
2119ff71ca66bec5117534907a127cee32680788 refs/heads/next
93119b80d4c7676a0ebe1bd6ae46574f96dfc7ed refs/notes/commits
7e0016059ee7a87e3d04f7e59154529a9616a238 refs/recovery/1
de2876aa45022af815e717eb92becd61564efe32 refs/recovery/10
```

As you may notice that on the first except there is an issue with
sorting and after `ref/notes/*` there are several `ref/heads/*`.
So I suspect that the issue was introduced by myself a year ago when I
was recovering the repo.

> Can you share the .git directory of a repository that exhibits this
> behavior? It's possible there's a bug or something in the packed-refs
> code, though I find it pretty unlikely, as it's fairly well exercised in
> normal use.

The above excerpts completely describe the issue and there is no more
special in the repo.

I was surprised that such an issue can remain in the
`.git/packed-refs` for more than a year.
I thought that commands like `git fsck` would report such unordering
problems and `git gc` or `git pack-refs`
make ordering checks and make full resorting of the `.git/packed-refs`
file in case of issues with ordering.
I understand that the `.git/packed-refs` file is for machines and not
for humans but sometimes
it's the fastest way to make several simple corrections in it manually.

P.S: What was most surprising is that `git pack-refs` was able to
correctly update a hash even for an unordered record.

-Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
  2023-01-05 16:59   ` Andrew Hlynskyi
@ 2023-01-06  8:57     ` Jeff King
  2023-01-06 12:40       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2023-01-06  8:57 UTC (permalink / raw)
  To: Andrew Hlynskyi; +Cc: git

On Thu, Jan 05, 2023 at 06:59:36PM +0200, Andrew Hlynskyi wrote:

> > Can you share the .git directory of a repository that exhibits this
> > behavior? It's possible there's a bug or something in the packed-refs
> > code, though I find it pretty unlikely, as it's fairly well exercised in
> > normal use.
> 
> The above excerpts completely describe the issue and there is no more
> special in the repo.

Thanks for digging. Your explanation makes sense.

> I was surprised that such an issue can remain in the
> `.git/packed-refs` for more than a year.
> I thought that commands like `git fsck` would report such unordering
> problems and `git gc` or `git pack-refs`
> make ordering checks and make full resorting of the `.git/packed-refs`
> file in case of issues with ordering.

I don't think we have any fsck checks that the packed-refs file is in
sorted order. It might be reasonable to have them. Likewise, when
pack-refs rewrites the file, it should be able to cheaply double-check
that the input is sorted by comparing each entry against its previous.

I'm not _too_ surprised that the repo could persist for a while with an
out-of-order packed-refs file, especially if you are not doing writes
that would trigger a gc. Lookups are done by binary-search (and thus
require sorting), but many operations will just iterate over the whole
list and don't care.  For an infrequently used ref, you might get
unlucky and fall on it during a binary search, but otherwise it wouldn't
affect most results.

> I understand that the `.git/packed-refs` file is for machines and not
> for humans but sometimes
> it's the fastest way to make several simple corrections in it manually.

Yes, I have certainly done this myself. There are is a header line at
the top of the file, though, which tells what is guaranteed:

  $ head -1 .git/packed-refs
  # pack-refs with: peeled fully-peeled sorted

if you are going to muck with the file, deleting that line should be
sufficient; the reading code will fall back to the older routines which
don't assume it's sorted.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
  2023-01-06  8:57     ` Jeff King
@ 2023-01-06 12:40       ` Junio C Hamano
  2023-01-06 13:04         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2023-01-06 12:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Andrew Hlynskyi, git

Jeff King <peff@peff.net> writes:

> On Thu, Jan 05, 2023 at 06:59:36PM +0200, Andrew Hlynskyi wrote:
>
>> > Can you share the .git directory of a repository that exhibits this
>> > behavior? It's possible there's a bug or something in the packed-refs
>> > code, though I find it pretty unlikely, as it's fairly well exercised in
>> > normal use.
>> 
>> The above excerpts completely describe the issue and there is no more
>> special in the repo.
>
> Thanks for digging. Your explanation makes sense.
> ...
> I don't think we have any fsck checks that the packed-refs file is in
> sorted order. It might be reasonable to have them. Likewise, when
> pack-refs rewrites the file, it should be able to cheaply double-check
> that the input is sorted by comparing each entry against its previous.

True.  I would not mind a patch to make us do so in the code path
where we rewrite the file and add "sorted" trait to the file.
refs/packed-backend.c::sort_snapshot() seems to be already equipped
to do this?

>> I understand that the `.git/packed-refs` file is for machines and not
>> for humans but sometimes
>> it's the fastest way to make several simple corrections in it manually.
>
> Yes, I have certainly done this myself. There are is a header line at
> the top of the file, though, which tells what is guaranteed:
>
>   $ head -1 .git/packed-refs
>   # pack-refs with: peeled fully-peeled sorted
>
> if you are going to muck with the file, deleting that line should be
> sufficient; the reading code will fall back to the older routines which
> don't assume it's sorted.

Thanks, both.

So we can conclude that this discussion thread has an
incorrect Subject: and the symptom was caused by human error?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command
  2023-01-06 12:40       ` Junio C Hamano
@ 2023-01-06 13:04         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2023-01-06 13:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andrew Hlynskyi, git

On Fri, Jan 06, 2023 at 09:40:22PM +0900, Junio C Hamano wrote:

> > I don't think we have any fsck checks that the packed-refs file is in
> > sorted order. It might be reasonable to have them. Likewise, when
> > pack-refs rewrites the file, it should be able to cheaply double-check
> > that the input is sorted by comparing each entry against its previous.
> 
> True.  I would not mind a patch to make us do so in the code path
> where we rewrite the file and add "sorted" trait to the file.
> refs/packed-backend.c::sort_snapshot() seems to be already equipped
> to do this?

I think it may be a little trickier than that. Yes, sort_snapshot()
knows about sorting, but it only kicks in when the file isn't already
marked as sorted (and we sort on the fly so that the rest of the code
can use the same lookup routines).

And it's possible that we can just sort_snapshot() before writing, even
if the original claims to be sorted. But I'm not sure what performance
impact that might have on the normal case that everything is already in
good order. Maybe it's not a big deal; the write is already O(n), so
adding an O(n log n) might not be the end of the world.

But I was thinking more that write_with_updates() would, while iterating
through the existing entries, check that the values it gets from the
ref_iterator are indeed in sorted order. And if not, I think it needs to
actually bail, since we might already have written a partially-confused
result. And there "bail" may mean "write a warning to the user, abort
the current write, call sort_snapshot(), and then try again".

All of which is to say I don't think it's conceptually _too_ hard, but
it was not simple enough that I was comfortable dashing off a one-liner
and saying "probably something like this". ;)

> So we can conclude that this discussion thread has an
> incorrect Subject: and the symptom was caused by human error?

That's my read on it.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-06 13:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-03  3:22 [BUG] `git gc` or `git pack-refs` wipes all notes for `git notes` command Andrew Hlynskyi
2023-01-03  9:04 ` Jeff King
2023-01-05 16:59   ` Andrew Hlynskyi
2023-01-06  8:57     ` Jeff King
2023-01-06 12:40       ` Junio C Hamano
2023-01-06 13:04         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).