git@vger.kernel.org mailing list mirror (one of many)
* Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-25 19:02 UTC
  To: Git Users

I'm working with some users trying to reconcile an odd mismatch
observed in some Git output.

Running an ls-tree for a branch and path, limited to a single pattern
within, shows this:
/usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file

If we then run cat-file --batch-check, though, we see this:
echo 'refs/heads/develop
refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

There's a newline after the branch name, inside the single quotes,
followed by the same branch name plus the full path. In this output,
it comes back as a commit, though. Both commands were run with
refs/heads/develop at the same commit. I've checked for a .gitmodules
file and while they _do_ have submodules, they're at different,
non-intersecting paths to the one in question here.

I can't share the actual repository (I don't have access to it
myself), but I'm hoping someone might have some ideas. I've never seen
this sort of mismatch before; for every path in the repositories I do
have access to that I've tried this for, the cat-file --batch-check
always shows "commit" (or "tag") for the ref, and then "blob" for the
ref+path. Submodules were the only thing I could think of, but that
doesn't appear to be the case. Could it be a subtree instead? How
would I check?

Thanks in advance for any ideas; I appreciate any help.
Bryan Turner


* Re: Unexpected cat-file --batch-check output
From: Jeff King @ 2021-10-25 19:18 UTC
  To: Bryan Turner; +Cc: Git Users

On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:

> I'm working with some users trying to reconcile an odd mismatch
> observed in some Git output.
> 
> Running an ls-tree for a branch and path, limited to a single pattern
> within, shows this:
> /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file
> 
> If we then run cat-file --batch-check, though, we see this:
> echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

That's definitely odd. Some things I'd try:

  - do other versions of cat-file behave differently (i.e., is it a
    regression)?

  - what does "git rev-parse refs/heads/develop:path/to/parent/file"
    say? If it comes up with 4c8d566ed80, then the problem is cat-file
    specific. If not, then it's a problem in the name resolution
    routines.

  - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
    it to really be a commit (a bug in batch-check's formatting routines
    could show the wrong object, but I'd expect the oid to at least
    match what ls-tree showed).

  - Is there anything odd about the tree? E.g., duplicate entries, out
    of order entries, etc? Examining "ls-tree" output might help, but
    "git fsck" should also note any irregularities.
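
A minimal sketch of the checks listed above, run against a throwaway
repository rather than the one in question. The temporary repository,
the commit identity, and the use of HEAD in place of refs/heads/develop
are assumptions made purely for illustration. In a healthy repository
the ID recorded in the tree and the ID returned by name resolution
agree, and the resolved object is a blob:

```shell
#!/bin/sh
# Sketch only: builds a scratch repository and runs the suggested checks.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
mkdir -p path/to/parent
echo hello > path/to/parent/file
git add .
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"

rev=HEAD                          # the thread uses refs/heads/develop
spec="$rev:path/to/parent/file"

# What the tree entry itself records (third column is the object ID).
tree_oid=$(git ls-tree "$rev:path/to/parent" -- file | awk '{print $3}')

# What name resolution returns for the same path.
resolved_oid=$(git rev-parse "$spec")

# The type of the object that name resolution landed on.
resolved_type=$(git cat-file -t "$resolved_oid")

echo "ls-tree:   $tree_oid"
echo "rev-parse: $resolved_oid ($resolved_type)"

# Repository-wide consistency check.
git fsck --no-dangling
```

If the two IDs differ, as they do in the report, the discrepancy is
already present at the rev-parse step and is not specific to cat-file.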

After that, I'd probably start running "cat-file --batch-check" through
a debugger. I know you said you don't have access to the repository, but
perhaps whoever does might be willing to run it through "fast-export
--anonymize" and see if the bug persists?

-Peff


* Re: Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-25 21:48 UTC
  To: Jeff King; +Cc: Git Users

On Mon, Oct 25, 2021 at 12:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:
>
> > I'm working with some users trying to reconcile an odd mismatch
> > observed in some Git output.
> >
> > Running an ls-tree for a branch and path, limited to a single pattern
> > within, shows this:
> > /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> > 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file
> >
> > If we then run cat-file --batch-check, though, we see this:
> > echo 'refs/heads/develop
> > refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> > 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> > cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> That's definitely odd. Some things I'd try:
>
>   - do other versions of cat-file behave differently (i.e., is it a
>     regression)?
>
>   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
>     say? If it comes up with 4c8d566ed80, then the problem is cat-file
>     specific. If not, then it's a problem in the name resolution
>     routines.
>
>   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
>     it to really be a commit (a bug in batch-check's formatting routines
>     could show the wrong object, but I'd expect the oid to at least
>     match what ls-tree showed).

I don't have that specific data, but one thing I do know is that
cat-file -p prints commit contents:

/usr/bin/git cat-file -p refs/heads/develop:path/to/parent/file
tree c378146c918c05794e5fb1d1f6986c81ca866326
parent 6cb6016c78c4c963311ca82fc53764141b0d3bdd
author ...
committer ...

<Commit message starts here>

One other observation. I threw
cc10f4b278086325aab2f95df97c807c7c6cd75e into GitHub's search, on a
lark, since so much open source is there, and it actually finds that
commit in multiple repositories[1][2][3].

[1] https://github.com/bitcoin-sv/bitcoin-sv/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e
[2] https://github.com/fakecoinbase/bitcoin-svslashbitcoin-sv/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e
[3] https://github.com/TuringBitchain/TuringBitchain/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e

None of those repositories has a branch named "develop", and the
"file" I've obscured here is not present in any of them, so while
there is clearly some ancestry in this repository with open source
roots, it's evolved since then. Experimenting with some of the
"nearby" files that are present in those public repositories, I have
not been able to reproduce the issue in any of them.

>
>   - Is there anything odd about the tree? E.g., duplicate entries, out
>     of order entries, etc? Examining "ls-tree" output might help, but
>     "git fsck" should also note any irregularities.

I've sent some further commands, based on your suggestions, to the users.

>
> After that, I'd probably start running "cat-file --batch-check" through
> a debugger. I know you said you don't have access to the repository, but
> perhaps whoever does might be willing to run it through "fast-export
> --anonymize" and see if the bug persists?

fast-export --anonymize might be a way forward. Thanks for suggesting
it (I always forget about it); I've mentioned it to the users.
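
For reference, a rough sketch of how an anonymized copy could be
produced. The scratch source repository, its single commit, and the
variable names are assumptions for illustration; on the real repository
only the fast-export/fast-import pipeline would be needed:

```shell
#!/bin/sh
# Sketch only: anonymize a scratch repository via fast-export/fast-import.
set -eu

src=$(mktemp -d)
anon=$(mktemp -d)

# Stand-in for the repository that cannot be shared.
git init -q "$src"
mkdir -p "$src/path/to/parent"
echo hello > "$src/path/to/parent/file"
git -C "$src" add .
git -C "$src" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"

# Export all refs with names, paths, and contents anonymized,
# then import the stream into a fresh repository.
git init -q "$anon"
git -C "$src" fast-export --anonymize --all |
    git -C "$anon" fast-import --quiet

# The history shape is preserved, so the commit counts should match.
src_count=$(git -C "$src" rev-list --count --all)
anon_count=$(git -C "$anon" rev-list --count --all)
echo "commits: source=$src_count anonymized=$anon_count"

# File names are replaced with anonymized tokens, so the path to test
# has to be looked up again in the new repository.
ref=$(git -C "$anon" for-each-ref --format='%(refname)' | head -n 1)
git -C "$anon" ls-tree -r --name-only "$ref"
```

Because object contents change, the object IDs in the anonymized copy
will differ from the originals; the question is only whether the
ref:path lookup still resolves to a commit there.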

Thanks Jeff; I appreciate your time/insights!

Best regards,
Bryan Turner


* Re: Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-26 23:58 UTC
  To: Jeff King; +Cc: Git Users

A few quick updates to some of the questions:

On Mon, Oct 25, 2021 at 12:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:
>
> > I'm working with some users trying to reconcile an odd mismatch
> > observed in some Git output.
> >
> > Running an ls-tree for a branch and path, limited to a single pattern
> > within, shows this:
> > /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> > 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file
> >
> > If we then run cat-file --batch-check, though, we see this:
> > echo 'refs/heads/develop
> > refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> > 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> > cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> That's definitely odd. Some things I'd try:
>
>   - do other versions of cat-file behave differently (i.e., is it a
>     regression)?

They're using Git 2.32 built from source on Ubuntu 20.04. I may see if
they can reinstall the 2.25.1 from focal's standard repositories and
see if it reproduces the issue. That said, they may not be
able/willing to do it.

>
>   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
>     say? If it comes up with 4c8d566ed80, then the problem is cat-file
>     specific. If not, then it's a problem in the name resolution
>     routines.

$ /usr/bin/git rev-parse refs/heads/develop
28a05ce2e3079afcb32e4f1777b42971d7933a91
$ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
cc10f4b278086325aab2f95df97c807c7c6cd75e

So it looks like rev-parse and cat-file --batch-check both exhibit the
same behavior.

I also had them expand their cat-file --batch-check to include another
file in the same "path/to/parent" directory:
$ echo 'refs/heads/develop
refs/heads/develop:path/to/parent/sibling
refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

So the "sibling" file in the same directory comes out as a "blob", as expected.

They also ran an ls-tree for the directory without any globs:
# /usr/bin/git ls-tree refs/heads/develop:path/to/parent
100644 blob 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda    sibling
100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file

For "sibling" the blob's ID matches what cat-file --batch-check shows,
as I'd expect. There are several other tree entries, one "tree" and
the rest "blob", that I've omitted for brevity. All of their modes
look normal.

I also had them check ls-tree for some parent levels:
$ /usr/bin/git ls-tree refs/heads/develop:path -- to
040000 tree 5244cd18e3d9de9002bdfcd18e173ca55c035084    to
$ /usr/bin/git ls-tree refs/heads/develop:path/to -- parent
040000 tree 2847dc49d79e8d66040047a9dd61376115bf8829    parent

Nothing out of the ordinary to my eye.

>
>   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
>     it to really be a commit (a bug in batch-check's formatting routines
>     could show the wrong object, but I'd expect the oid to at least
>     match what ls-tree showed).

$ /usr/bin/git cat-file -t cc10f4b278086325aab2f95df97c807c7c6cd75e
commit

>
>   - Is there anything odd about the tree? E.g., duplicate entries, out
>     of order entries, etc? Examining "ls-tree" output might help, but
>     "git fsck" should also note any irregularities.

$ /usr/bin/git fsck --no-dangling
Checking object directories: 100% (256/256), done.
Checking object directories: 100% (256/256), done.
Checking objects: 100% (122888/122888), done.

There's one alternate. No warnings, though.

>
> After that, I'd probably start running "cat-file --batch-check" through
> a debugger. I know you said you don't have access to the repository, but
> perhaps whoever does might be willing to run it through "fast-export
> --anonymize" and see if the bug persists?

I've asked them to double-check whether they can provide me with the
repository, or with an anonymized copy. At this point, it feels like
there's not a lot more I can do/check without access to data that
reproduces the issue so I can attach a debugger.

Thanks again,
Bryan


* Re: Unexpected cat-file --batch-check output
From: Jeff King @ 2021-10-27  1:28 UTC
  To: Bryan Turner; +Cc: Git Users

On Tue, Oct 26, 2021 at 04:58:49PM -0700, Bryan Turner wrote:

> >   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
> >     say? If it comes up with 4c8d566ed80, then the problem is cat-file
> >     specific. If not, then it's a problem in the name resolution
> >     routines.
> 
> $ /usr/bin/git rev-parse refs/heads/develop
> 28a05ce2e3079afcb32e4f1777b42971d7933a91
> $ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
> cc10f4b278086325aab2f95df97c807c7c6cd75e
> 
> So it looks like rev-parse and cat-file --batch-check both exhibit the
> same behavior.

OK, that's not too surprising, since they're using the same routines
under the hood. But that does imply that the problem is in the get_oid()
family, which is what's doing that name to oid lookup.

I don't recall us ever having a bug of this nature in the history of
Git, nor do I think this code would have changed recently. But of course
there's a first time for everything.

The parser there isn't exactly left-to-right, so perhaps this particular
name is stimulating some corner case. I imagine the answer is "no", or
you'd have said so already, but are there any unusual characters in the
filename path? Colons, curly braces, etc?

> I also had them expand their cat-file --batch-check to include another
> file in the same "path/to/parent" directory:
> $ echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/sibling
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
> 
> So the "sibling" file in the same directory comes out as a "blob", as expected.

Interesting. That again points to there being something funny either
with this filename, or perhaps with the tree that contains it.

> >   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
> >     it to really be a commit (a bug in batch-check's formatting routines
> >     could show the wrong object, but I'd expect the oid to at least
> >     match what ls-tree showed).
> 
> $ /usr/bin/git cat-file -t cc10f4b278086325aab2f95df97c807c7c6cd75e
> commit

That's not too surprising. I did wonder if refs/replace or something
could be at work here, but I think in that case we'd still report the
expected oid. At any rate, we can probably rule that out as rev-parse is
returning the same unexpected oid, which means the problem is during the
name resolution (and we shouldn't respect refs/replace there at all; we
would respect it while reading the outer tree, but then so would your
ls-tree, etc).

> I've asked them to double-check whether they can provide me with the
> repository, or with an anonymized copy. At this point, it feels like
> there's not a lot more I can do/check without access to data that
> reproduces the issue so I can attach a debugger.

Another possibility, if they would run a custom Git on their end, is to
provide them with a patch that cranks up the debugging output from
get_oid_with_context_1(). Though I feel like it's hard to know where to
sprinkle printf()s until we know where things go wrong. Is it
misinterpreting the name, and not realizing it's a tree:path name? Or is
get_tree_entry() at fault? That kind of thing is much easier to figure
out interactively in a debugger.

-Peff


* Re: Unexpected cat-file --batch-check output
From: Johannes Sixt @ 2021-10-27  8:08 UTC
  To: Bryan Turner; +Cc: Git Users, Jeff King

Am 27.10.21 um 01:58 schrieb Bryan Turner:
> $ /usr/bin/git rev-parse refs/heads/develop
> 28a05ce2e3079afcb32e4f1777b42971d7933a91
> $ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
> cc10f4b278086325aab2f95df97c807c7c6cd75e
> 
> So it looks like rev-parse and cat-file --batch-check both exhibit the
> same behavior.
> 
> I also had them expand their cat-file --batch-check to include another
> file in the same "path/to/parent" directory:
> $ echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/sibling
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
> 
> So the "sibling" file in the same directory comes out as a "blob", as expected.
> 
> They also ran an ls-tree for the directory without any globs:
> # /usr/bin/git ls-tree refs/heads/develop:path/to/parent
> 100644 blob 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda    sibling
> 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8    file

Just a shot in the dark: what happens when you use /usr/bin/git
--no-replace-objects?
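
A small sketch of how that comparison could be made. Whether replace
refs are involved in the reported repository is not known; the scratch
repository and the two blobs below are assumptions that only show how
the option changes what is read:

```shell
#!/bin/sh
# Sketch only: show the effect of --no-replace-objects in a scratch repo.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Two unrelated blobs; the first is replaced by the second.
orig=$(echo original | git hash-object -w --stdin)
other=$(echo replacement | git hash-object -w --stdin)
git replace "$orig" "$other"

# Any replace refs present in the repository are listed here.
git replace -l

# Reading the object honors the replacement by default...
with_replace=$(git cat-file -p "$orig")
# ...and ignores it when replace refs are disabled.
without_replace=$(git --no-replace-objects cat-file -p "$orig")

echo "default:              $with_replace"
echo "--no-replace-objects: $without_replace"
```

On the affected repository, an empty "git replace -l" listing and
identical results with and without the option would rule replace refs
out.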

-- Hannes
