git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "Baumann, Moritz" <moritz.baumann@sap.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Feature Request: Option to make "git rev-list --objects" output duplicate objects
Date: Fri, 24 Mar 2023 15:28:48 -0400
Message-ID: <20230324192848.GC536967@coredump.intra.peff.net>
In-Reply-To: <AS1PR02MB8185A45DB63216699AFB2C5494849@AS1PR02MB8185.eurprd02.prod.outlook.com>

On Fri, Mar 24, 2023 at 03:51:21PM +0000, Baumann, Moritz wrote:

> …and then used the resulting list for all subsequent checks. After writing some
> unit tests, I noticed that the returned list is not sufficient: If you generate
> the exact same file twice, once with a "bad" name and once with a "good" name,
> you will only see one of those names and therefore the hook will mistakenly
> allow the push.
> 
> So, what I would want/need is an option that forces "git rev-list --objects"
> to output the object multiple times if it has multiple names in the commit
> range. Admittedly, such an option would likely only be useful for hooks that
> validate file names.

Another problem you might not have run into yet: the names given by
rev-list are not quoted in any way, and are cut off at the first
newline. So if your hook is trying to avoid malicious garbage like
"foo\nbar", it won't work: it would only ever see "foo".

Those names are really just intended as hints for pack-objects. I
suspect the documentation could be more clear about these limitations.

> Would it be feasible to implement such an option? If so, does it sound like a
> good or bad idea?
> 
> Is there any alternative for my use case that doesn't involve walking the
> commits one-by-one? (That's what we previously did and what turned out to be
> quite slow on our repository.)

I'm not sure what you mean by "one by one", since that is inherently
what rev-list is doing under the hood. If you mean "running a separate
process for each commit", then yes, that will be slow. But if you want
to know all of the names touched in a set of commits, I have used
something like this before:

  git rev-list $new --not --all |
  git diff-tree --stdin --format= -r -c --name-only

A few notes:

  - the names may be quoted if they have metacharacters; use "-z" if
    your reading side can handle it to make things simpler

  - merges are always tricky. I think "-c" will give you what you want
    (showing names which differed from any parent), but I didn't think
    too hard.  Using "-m" definitely would work, but may produce extra
    names (ones where the merge just brought together two lines of
    history, even though the commit where one of those lines touched the
    file may have been excluded via "--not --all").

  - if you are assuming the existing names are good, then probably
    --diff-filter=A would be useful, as it would show only
    newly-introduced names.
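
Putting the pipeline and the notes above together, a minimal sketch
might look like the following. The function names are made up for
illustration, and the newline check is only one example of a name
policy. It assumes the new commits are not yet reachable from any ref
(as in a pre-receive hook), and it has not been exercised against
merges.

```shell
# List every path name added by commits reachable from "$1" but not
# from any existing ref. Each name is terminated by a NUL byte ("-z"),
# so names are never quoted and may safely contain newlines.
list_added_names() {
    git rev-list "$1" --not --all |
    git diff-tree --stdin --format= -r -c --name-only -z --diff-filter=A
}

# Example policy (an assumption, not part of git): succeed if any
# added name contains a newline. With "-z" the records are separated
# by NUL, so any newline byte in the stream must be part of a name.
added_names_contain_newline() {
    count=$(list_added_names "$1" | LC_ALL=C tr -cd '\n' | wc -c | tr -d ' ')
    [ "$count" -gt 0 ]
}
```

In a pre-receive hook, the argument would be the new value read from
each "<old> <new> <refname>" line on stdin. Because the names come from
per-commit diffs rather than from the object walk, a blob added twice
under two different names shows up once per name.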

-Peff
