From: Jeff King <peff@peff.net>
To: "Baumann, Moritz" <moritz.baumann@sap.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Feature Request: Option to make "git rev-list --objects" output duplicate objects
Date: Fri, 24 Mar 2023 15:28:48 -0400
Message-ID: <20230324192848.GC536967@coredump.intra.peff.net>
In-Reply-To: <AS1PR02MB8185A45DB63216699AFB2C5494849@AS1PR02MB8185.eurprd02.prod.outlook.com>
On Fri, Mar 24, 2023 at 03:51:21PM +0000, Baumann, Moritz wrote:
> …and then used the resulting list for all subsequent checks. After writing some
> unit tests, I noticed that the returned list is not sufficient: If you generate
> the exact same file twice, once with a "bad" name and once with a "good" name,
> you will only see one of those names and therefore the hook will mistakenly
> allow the push.
>
> So, what I would want/need is an option that forces "git rev-list --objects"
> to output the object multiple times if it has multiple names in the commit
> range. Admittedly, such an option would likely only be useful for hooks that
> validate file names.
Another problem you might not have run into yet: the names given by
rev-list are not quoted in any way, and will just omit newlines. So if
your hook is trying to avoid malicious garbage like "foo\nbar", it won't
work.
Those names are really just intended as hints for pack-objects. I
suspect the documentation could be more clear about these limitations.
> Would it be feasible to implement such an option? If so, does it sound like a
> good or bad idea?
>
> Is there any alternative for my use case that doesn't involve walking the
> commits one-by-one? (That's what we previously did and what turned out to be
> quite slow on our repository.)
I'm not sure what you mean by "one by one", since that is inherently
what rev-list is doing under the hood. If you mean "running a separate
process for each commit", then yes, that will be slow. But if you want
to know all of the names touched in a set of commits, I have used
something like this before:

  git rev-list $new --not --all |
  git diff-tree --stdin --format= -r -c --name-only

A few notes:

  - the names may be quoted if they have metacharacters; use "-z" if
    your reading side can handle it to make things simpler

  - merges are always tricky. I think "-c" will give you what you want
    (showing names which differed from any parent), but I didn't think
    too hard. Using "-m" definitely would work, but may produce extra
    names (ones where the merge just brought together two lines of
    history, even though the commit where one of those lines touched the
    file may have been excluded via "--not --all").

  - if you are assuming the existing names are good, then probably
    --diff-filter=A would be useful, as it would show only
    newly-introduced names.
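
A rough sketch of how those pieces might fit together in a hook, in
plain POSIX sh. The function names and the policy are made up for
illustration: here a name is rejected whenever git had to quote it,
which relies on the default (non "-z") output wrapping such names in
double quotes. With the default core.quotePath setting that also
catches non-ASCII names, which may or may not be what you want. Treat
it as a starting point, not a tested hook.

```shell
#!/bin/sh
# Sketch only: the policy and the function names are illustrative.

# Hypothetical policy: reject any name that git had to quote. Without -z,
# diff-tree wraps such names in double quotes, so a leading '"' marks a
# name containing control characters, quotes, or backslashes (and, with
# the default core.quotePath, non-ASCII bytes).
is_bad_name() {
	case "$1" in
	'"'*) return 0 ;;
	*) return 1 ;;
	esac
}

# Read one name per line on stdin, report offenders on stderr, and
# return non-zero if any were found.
check_names() {
	rc=0
	while IFS= read -r name; do
		if is_bad_name "$name"; then
			printf 'rejected name: %s\n' "$name" >&2
			rc=1
		fi
	done
	return "$rc"
}

# Names added by commits reachable from $1 but from no existing ref.
new_names() {
	git rev-list "$1" --not --all |
	git diff-tree --stdin --format= -r -c --name-only --diff-filter=A
}

# In a hook, for each pushed tip "$new":
#   new_names "$new" | check_names || exit 1
```

If the reading side can cope with NUL-terminated input, adding "-z" to
the diff-tree call and splitting on NUL avoids depending on the quoting
rules at all.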
-Peff