* Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-25 19:02 UTC
To: Git Users

I'm working with some users trying to reconcile an odd mismatch
observed in some Git output.

Running an ls-tree for a branch and path, limited to a single pattern
within, shows this:
/usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file

If we then run cat-file --batch-check, though, we see this:
echo 'refs/heads/develop
refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

There's a newline after the branch name, inside the single quotes,
followed by the same branch name plus the full path. In this output,
it comes back as a commit, though. Both commands were run with
refs/heads/develop at the same commit.

I've checked for a .gitmodules file and while they _do_ have
submodules, they're at different, non-intersecting paths to the one in
question here.

I can't share the actual repository (I don't have access to it
myself), but I'm hoping someone might have some ideas. I've never seen
this sort of mismatch before; for every path in the repositories I do
have access to that I've tried this for, the cat-file --batch-check
always shows "commit" (or "tag") for the ref, and then "blob" for the
ref+path. Submodules were the only thing I could think of, but that
doesn't appear to be the case. Could it be a subtree instead? How
would I check?

Thanks in advance for any ideas; I appreciate any help.
Bryan Turner

* Re: Unexpected cat-file --batch-check output
From: Jeff King @ 2021-10-25 19:18 UTC
To: Bryan Turner; +Cc: Git Users

On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:

> I'm working with some users trying to reconcile an odd mismatch
> observed in some Git output.
>
> Running an ls-tree for a branch and path, limited to a single pattern
> within, shows this:
> /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file
>
> If we then run cat-file --batch-check, though, we see this:
> echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

That's definitely odd. Some things I'd try:

  - do other versions of cat-file behave differently (i.e., is it a
    regression)?

  - what does "git rev-parse refs/heads/develop:path/to/parent/file"
    say? If it comes up with 4c8d566ed80, then the problem is cat-file
    specific. If not, then it's a problem in the name resolution
    routines.

  - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
    it to really be a commit (a bug in batch-check's formatting routines
    could show the wrong object, but I'd expect the oid to at least
    match what ls-tree showed).

  - Is there anything odd about the tree? E.g., duplicate entries, out
    of order entries, etc? Examining "ls-tree" output might help, but
    "git fsck" should also note any irregularities.

After that, I'd probably start running "cat-file --batch-check" through
a debugger. I know you said you don't have access to the repository, but
perhaps whoever does might be willing to run it through "fast-export
--anonymize" and see if the bug persists?

-Peff

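For anyone following along, the rev-parse and cat-file -t checks suggested above can be bundled into one small shell function. This is only a sketch: check_path is a made-up helper name, not a Git command, and it assumes it is run from the top level of the repository with a path that contains no unusual characters.

```shell
#!/bin/sh
# check_path is a hypothetical helper (not part of Git).
# usage: check_path <rev> <path>
# Compares the object ID recorded in the tree entry (ls-tree) with the
# object ID that name resolution returns for "<rev>:<path>" (rev-parse),
# and reports the type of the resolved object.
check_path() {
    rev=$1
    path=$2
    # ls-tree prints "<mode> <type> <oid>TAB<name>"; field 3 is the oid.
    listed=$(git ls-tree "$rev" -- "$path" | awk '{print $3}')
    # Name resolution for the same entry.
    resolved=$(git rev-parse --verify "$rev:$path") || return 2
    # Type of whatever object name resolution landed on.
    type=$(git cat-file -t "$resolved")
    printf 'ls-tree:   %s\nrev-parse: %s\ntype: %s\n' \
        "$listed" "$resolved" "$type"
    # Exit status 0 only when both lookups agree.
    [ "$listed" = "$resolved" ]
}
```

In the case reported here one would expect a non-zero exit status for "check_path refs/heads/develop path/to/parent/file", since ls-tree and cat-file disagree about that entry.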
* Re: Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-25 21:48 UTC
To: Jeff King; +Cc: Git Users

On Mon, Oct 25, 2021 at 12:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:
>
> > I'm working with some users trying to reconcile an odd mismatch
> > observed in some Git output.
> >
> > Running an ls-tree for a branch and path, limited to a single pattern
> > within, shows this:
> > /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> > 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file
> >
> > If we then run cat-file --batch-check, though, we see this:
> > echo 'refs/heads/develop
> > refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> > 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> > cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> That's definitely odd. Some things I'd try:
>
>   - do other versions of cat-file behave differently (i.e., is it a
>     regression)?
>
>   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
>     say? If it comes up with 4c8d566ed80, then the problem is cat-file
>     specific. If not, then it's a problem in the name resolution
>     routines.
>
>   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
>     it to really be a commit (a bug in batch-check's formatting routines
>     could show the wrong object, but I'd expect the oid to at least
>     match what ls-tree showed).

I don't have that specific data, but one thing I do know is that
cat-file -p prints commit contents:
/usr/bin/git cat-file -p refs/heads/develop:path/to/parent/file
tree c378146c918c05794e5fb1d1f6986c81ca866326
parent 6cb6016c78c4c963311ca82fc53764141b0d3bdd
author ...
committer ...

<Commit message starts here>

One other observation. I threw
cc10f4b278086325aab2f95df97c807c7c6cd75e into Github's search, on a
lark, since so much open source is there, and it actually finds that
commit in multiple repositories[1][2][3]

[1] https://github.com/bitcoin-sv/bitcoin-sv/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e
[2] https://github.com/fakecoinbase/bitcoin-svslashbitcoin-sv/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e
[3] https://github.com/TuringBitchain/TuringBitchain/commit/cc10f4b278086325aab2f95df97c807c7c6cd75e

None of those repositories has a branch named "develop", and the
"file" I've obscured here is not present in any of them, so while
there is clearly some ancestry in this repository with open source
roots, it's evolved since then. Experimenting with some of the
"nearby" files that are present in those public repositories, I have
not been able to reproduce the issue in any of them.

>
>   - Is there anything odd about the tree? E.g., duplicate entries, out
>     of order entries, etc? Examining "ls-tree" output might help, but
>     "git fsck" should also note any irregularities.

I've sent some further commands, based on your suggestions, to the users.

>
> After that, I'd probably start running "cat-file --batch-check" through
> a debugger. I know you said you don't have access to the repository, but
> perhaps whoever does might be willing to run it through "fast-export
> --anonymize" and see if the bug persists?

fast-export --anonymize might be a way forward. Thanks for suggesting
it (I always forget about it); I've mentioned it to the users.

Thanks Jeff; I appreciate your time/insights!

Best regards,
Bryan Turner

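For reference, producing an anonymized copy with fast-export could look roughly like the sketch below. make_anon_copy and its two directory arguments are hypothetical names chosen for illustration. Because anonymizing rewrites path names and file contents, the path in question will have a different name in the copy, and there is no guarantee that the mismatch survives the round trip.

```shell
#!/bin/sh
# make_anon_copy is a hypothetical helper (not part of Git).
# usage: make_anon_copy <source-repo> <new-repo-dir>
# Streams an anonymized export of every ref in the source repository
# into a freshly initialized repository.
make_anon_copy() {
    src=$1
    dest=$2
    git init -q "$dest" || return 1
    # --anonymize replaces paths, blob contents, messages and identities
    # with placeholder values; --all exports every ref.
    git -C "$src" fast-export --anonymize --all |
        git -C "$dest" fast-import --quiet
}
```

The people with access to the repository could then rerun the ls-tree and cat-file --batch-check comparison inside the copy before deciding whether to share it.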
* Re: Unexpected cat-file --batch-check output
From: Bryan Turner @ 2021-10-26 23:58 UTC
To: Jeff King; +Cc: Git Users

A few quick updates to some of the questions:

On Mon, Oct 25, 2021 at 12:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Oct 25, 2021 at 12:02:38PM -0700, Bryan Turner wrote:
>
> > I'm working with some users trying to reconcile an odd mismatch
> > observed in some Git output.
> >
> > Running an ls-tree for a branch and path, limited to a single pattern
> > within, shows this:
> > /usr/bin/git ls-tree -z refs/heads/develop:path/to/parent -- file
> > 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file
> >
> > If we then run cat-file --batch-check, though, we see this:
> > echo 'refs/heads/develop
> > refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> > 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> > cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> That's definitely odd. Some things I'd try:
>
>   - do other versions of cat-file behave differently (i.e., is it a
>     regression)?

They're using Git 2.32 built from source on Ubuntu 20.04. I may see if
they can reinstall the 2.25.1 from focal's standard repositories and
see if it reproduces the issue. That said, they may not be
able/willing to do it.

>
>   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
>     say? If it comes up with 4c8d566ed80, then the problem is cat-file
>     specific. If not, then it's a problem in the name resolution
>     routines.

$ /usr/bin/git rev-parse refs/heads/develop
28a05ce2e3079afcb32e4f1777b42971d7933a91
$ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
cc10f4b278086325aab2f95df97c807c7c6cd75e

So it looks like rev-parse and cat-file --batch-check both exhibit the
same behavior.

I also had them expand their cat-file --batch-check to include another
file in the same "path/to/parent" directory:
$ echo 'refs/heads/develop
refs/heads/develop:path/to/parent/sibling
refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330

So the "sibling" file in the same directory comes out as a "blob", as expected.

They also ran an ls-tree for the directory without any globs:
# /usr/bin/git ls-tree refs/heads/develop:path/to/parent
100644 blob 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda sibling
100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file

For "sibling" the blob's ID matches what cat-file --batch-check shows,
as I'd expect. There are several other tree entries, one "tree" and
the rest "blob", that I've omitted for brevity. All of their modes
look normal.

I also had them check ls-tree for some parent levels:
$ /usr/bin/git ls-tree refs/heads/develop:path -- to
040000 tree 5244cd18e3d9de9002bdfcd18e173ca55c035084 to
$ /usr/bin/git ls-tree refs/heads/develop:path/to -- parent
040000 tree 2847dc49d79e8d66040047a9dd61376115bf8829 parent

Nothing out of the ordinary to my eye.

>
>   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
>     it to really be a commit (a bug in batch-check's formatting routines
>     could show the wrong object, but I'd expect the oid to at least
>     match what ls-tree showed).

$ /usr/bin/git cat-file -t cc10f4b278086325aab2f95df97c807c7c6cd75e
commit

>
>   - Is there anything odd about the tree? E.g., duplicate entries, out
>     of order entries, etc? Examining "ls-tree" output might help, but
>     "git fsck" should also note any irregularities.

$ /usr/bin/git fsck --no-dangling
Checking object directories: 100% (256/256), done.
Checking object directories: 100% (256/256), done.
Checking objects: 100% (122888/122888), done.

There's one alternate. No warnings, though.

>
> After that, I'd probably start running "cat-file --batch-check" through
> a debugger. I know you said you don't have access to the repository, but
> perhaps whoever does might be willing to run it through "fast-export
> --anonymize" and see if the bug persists?

I've asked them to double-check whether they can provide me with the
repository, or with an anonymized copy. At this point, it feels like
there's not a lot more I can do/check without access to data that
reproduces the issue so I can attach a debugger.

Thanks again,
Bryan

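The sibling comparison above can be extended to every entry of the directory, to see whether "file" is the only name that resolves to something other than what ls-tree lists. A rough sketch follows. scan_tree is a made-up helper name, it expects a non-empty directory argument, and it assumes entry names are plain enough that ls-tree prints them unquoted.

```shell
#!/bin/sh
# scan_tree is a hypothetical helper (not part of Git).
# usage: scan_tree <rev> <dir>
# For each entry of <rev>:<dir>, compares the object ID that ls-tree
# lists with the one rev-parse resolves for "<rev>:<dir>/<name>".
scan_tree() {
    rev=$1
    dir=$2
    git ls-tree "$rev:$dir" | {
        checked=0
        # Each line is "<mode> <type> <oid>TAB<name>"; read splits on
        # spaces and tabs, and the last variable takes the rest of the line.
        while read -r mode type listed name; do
            checked=$((checked + 1))
            resolved=$(git rev-parse --verify --quiet "$rev:$dir/$name") ||
                resolved=unresolved
            [ "$listed" = "$resolved" ] ||
                printf 'MISMATCH %s: ls-tree %s, rev-parse %s\n' \
                    "$name" "$listed" "$resolved"
        done
        printf 'checked %d entries\n' "$checked"
    }
}
```

Running "scan_tree refs/heads/develop path/to/parent" would list each mismatching entry followed by a count of the entries checked.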
* Re: Unexpected cat-file --batch-check output
From: Jeff King @ 2021-10-27 1:28 UTC
To: Bryan Turner; +Cc: Git Users

On Tue, Oct 26, 2021 at 04:58:49PM -0700, Bryan Turner wrote:

> >   - what does "git rev-parse refs/heads/develop:path/to/parent/file"
> >     say? If it comes up with 4c8d566ed80, then the problem is cat-file
> >     specific. If not, then it's a problem in the name resolution
> >     routines.
>
> $ /usr/bin/git rev-parse refs/heads/develop
> 28a05ce2e3079afcb32e4f1777b42971d7933a91
> $ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
> cc10f4b278086325aab2f95df97c807c7c6cd75e
>
> So it looks like rev-parse and cat-file --batch-check both exhibit the
> same behavior.

OK, that's not too surprising, since they're using the same routines
under the hood. But that does imply that the problem is in the get_oid()
family, which is what's doing that name to oid lookup.

I don't recall us ever having a bug of this nature in the history of
Git, nor do I think this code would have changed recently. But of course
there's a first time for everything.

The parser there isn't exactly left-to-right, so perhaps this particular
name is stimulating some corner case. I imagine the answer is "no", or
you'd have said so already, but are there any unusual characters in the
filename path? Colons, curly braces, etc?

> I also had them expand their cat-file --batch-check to include another
> file in the same "path/to/parent" directory:
> $ echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/sibling
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> So the "sibling" file in the same directory comes out as a "blob", as expected.

Interesting. That again points to there being something funny either
with this filename, or perhaps with the tree that contains it.

> >   - likewise, what does "git cat-file -t cc10f4b27808" say? I'd expect
> >     it to really be a commit (a bug in batch-check's formatting routines
> >     could show the wrong object, but I'd expect the oid to at least
> >     match what ls-tree showed).
>
> $ /usr/bin/git cat-file -t cc10f4b278086325aab2f95df97c807c7c6cd75e
> commit

That's not too surprising. I did wonder if refs/replace or something
could be at work here, but I think in that case we'd still report the
expected oid.

At any rate, we can probably rule that out as rev-parse is returning the
same unexpected oid, which means the problem is during the name
resolution (and we shouldn't respect refs/replace there at all; we would
respect it while reading the outer tree, but then so would your ls-tree,
etc).

> I've asked them to double-check whether they can provide me with the
> repository, or with an anonymized copy. At this point, it feels like
> there's not a lot more I can do/check without access to data that
> reproduces the issue so I can attach a debugger.

Another possibility, if they would run a custom Git on their end, is to
provide them with a patch that cranks up the debugging output from
get_oid_with_context_1(). Though I feel like it's hard to know where to
sprinkle printf()s until we know where things go wrong. Is it
misinterpreting the name, and not realizing it's a tree:path name? Or is
get_tree_entry() at fault? That kind of thing is much easier to figure
out interactively in a debugger.

-Peff

* Re: Unexpected cat-file --batch-check output
From: Johannes Sixt @ 2021-10-27 8:08 UTC
To: Bryan Turner; +Cc: Git Users, Jeff King

On 27.10.21 at 01:58, Bryan Turner wrote:
> $ /usr/bin/git rev-parse refs/heads/develop
> 28a05ce2e3079afcb32e4f1777b42971d7933a91
> $ /usr/bin/git rev-parse refs/heads/develop:path/to/parent/file
> cc10f4b278086325aab2f95df97c807c7c6cd75e
>
> So it looks like rev-parse and cat-file --batch-check both exhibit the
> same behavior.
>
> I also had them expand their cat-file --batch-check to include another
> file in the same "path/to/parent" directory:
> $ echo 'refs/heads/develop
> refs/heads/develop:path/to/parent/sibling
> refs/heads/develop:path/to/parent/file' | /usr/bin/git cat-file --batch-check
> 28a05ce2e3079afcb32e4f1777b42971d7933a91 commit 259
> 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda blob 897
> cc10f4b278086325aab2f95df97c807c7c6cd75e commit 330
>
> So the "sibling" file in the same directory comes out as a "blob", as expected.
>
> They also ran an ls-tree for the directory without any globs:
> # /usr/bin/git ls-tree refs/heads/develop:path/to/parent
> 100644 blob 2bfe7b4b7c7cdeb9653801d99b65dfefe5780dda sibling
> 100644 blob 4c8d566ed80a1554a059b97f7cd533a55bbd2ea8 file

Just a shot in the dark: what happens when you use /usr/bin/git
--no-replace-objects?

-- Hannes

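One way to try the --no-replace-objects suggestion is to resolve the same name with and without replace refs and compare the results. The sketch below does that and also lists anything stored under refs/replace. compare_replace is a hypothetical helper name, and nothing in the thread so far establishes whether replace refs are actually involved.

```shell
#!/bin/sh
# compare_replace is a hypothetical helper (not part of Git).
# usage: compare_replace <rev>:<path>
# Resolves the name twice, once normally and once with replace refs
# disabled, then lists any replace refs defined in the repository.
compare_replace() {
    spec=$1
    with=$(git rev-parse --verify "$spec") || return 2
    without=$(git --no-replace-objects rev-parse --verify "$spec") || return 2
    printf 'with replace refs:    %s\n' "$with"
    printf 'without replace refs: %s\n' "$without"
    # Prints nothing when no replace refs exist.
    git for-each-ref refs/replace
    # Exit status 0 only when both resolutions agree.
    [ "$with" = "$without" ]
}
```

For example, "compare_replace refs/heads/develop:path/to/parent/file" prints both object IDs, so a difference between them, or any ref listed under refs/replace, would be visible at a glance.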