git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Dun Peal <dunpealer@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Efficiently detecting paths that differ from each other only in case
Date: Sun, 10 Oct 2010 23:07:55 -0400
Message-ID: <20101011030755.GB6523@sigill.intra.peff.net>
In-Reply-To: <AANLkTi=YQOVYsK6Brq5pMiAdrH3Un7RgrWvYf_pymT=d@mail.gmail.com>

On Fri, Oct 08, 2010 at 05:57:16PM -0500, Dun Peal wrote:

> On Fri, Oct 8, 2010 at 3:06 PM, Jeff King <peff@peff.net> wrote:
> > Re-reading your original message, I have a few more thoughts.
> >
> > One is that you don't need to do this per-commit. You probably want to
> > do it per-updated-ref, each of which may be pushing many commits. And
> > then you either reject the new ref value or not.
> 
> I think I do, actually, because let's say the developer pushes two
> commits, 1<-2. Suppose commit 1 violates the rule, but commit 2
> reverts the violation. One might think that we don't care, since the
> head will now be on 2, which is a correct state. But in fact we do,
> because this is Git, and anyone may branch off from 1 in the future,
> and voila we have a head in an incorrect state.

Yeah, though it is not an especially likely state to branch from, since
you have to specify it manually. However, a much more likely scenario is
checking out a past commit for testing, especially in bisection. So yes,
if you want to be thorough, you need to check every commit.

> Yeah, that's a pretty good idea, if not for the many ls-tree calls.
> With their overhead, I strongly suspect it may be slower than the
> solution you seem to propose, which is:
> 
> git ls-tree -r <commit>
> 
> which should give the full list of all paths in a commit, upon which I
> can decide to accept or reject.

Yeah, that is what I am proposing.
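
As a rough illustration of the accept-or-reject decision, here is a minimal
Python sketch. It assumes the path list has already been read from
"git ls-tree -r --name-only <commit>"; the function name
find_case_collisions is hypothetical, and str.casefold() is only an
approximation of what a given case-insensitive filesystem does.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths or directory prefixes that differ only in case.

    `paths` is a list of slash-separated paths for one commit.
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Record every leading directory as well as the full path, because
        # "Doc/a" and "doc/b" already collide on the directory name.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    # Any folded name with more than one spelling is a collision.
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)
```

A commit would be rejected whenever the returned list is non-empty.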

One other thing you could try is to "ls-tree -r" the known-good state of
the current HEAD at the beginning of the push, and then run "git log
--diff-filter=AD --name-status $old..$new". For each commit in the log
output, look for new entries that are in case-insensitive conflict with
the existing tree, and then update your tree state appropriately with
added and removed files. You only invoke two git commands, which saves
on invocation overhead, and you only ls-tree once per push, not per
commit. Git's internal diff shouldn't look at parts of the tree that
aren't relevant.
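
The bookkeeping described above might look roughly like the following
Python sketch. It is only an illustration under several assumptions: the
log is produced oldest-first with something like "git log --reverse
--format='commit %H' --diff-filter=AD --name-status $old..$new", paths
are plain (not quoted by git), and only full paths are compared, not
directory names. The names parse_name_status and check_push are
hypothetical.

```python
def parse_name_status(log_text):
    """Parse 'commit <id>' headers followed by '<status>TAB<path>' lines."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commits.append((line.split()[1], []))
        elif "\t" in line and commits:
            status, path = line.split("\t", 1)
            commits[-1][1].append((status, path))
    return commits


def check_push(initial_paths, commits):
    """Walk commits oldest first; return (commit, new_path, existing_path) conflicts."""
    # Tree state: case-folded path -> spelling currently present in the tree.
    tree = {p.casefold(): p for p in initial_paths}
    conflicts = []
    for commit_id, changes in commits:
        # Apply deletions first so that a case-only rename inside a single
        # commit (delete "foo.txt", add "FOO.TXT") is not flagged.
        for status, path in changes:
            if status == "D" and tree.get(path.casefold()) == path:
                del tree[path.casefold()]
        for status, path in changes:
            if status == "A":
                existing = tree.get(path.casefold())
                if existing is not None and existing != path:
                    conflicts.append((commit_id, path, existing))
                else:
                    tree[path.casefold()] = path
    return conflicts
```

Here initial_paths would be the single "ls-tree -r" of the known-good
state, and a non-empty result means the push should be rejected.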

The downside is that the tree state you are keeping internally is not
entirely accurate. For example, when receiving a merge between two
parallel lines of development, you would process them linearly, when in
fact there are two simultaneous different states. So there is a case
where branch X removes "foo.txt" and branch Y adds "FOO.TXT", and then
they merge. It looks OK because linearly, they did not both exist at the
same time. But pre-merge, the commit in branch Y is broken.
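
To make that failure mode concrete, here is a toy Python model of the
scenario. The sets stand in for trees, the branch names follow the text
above, and the helper has_case_conflict is hypothetical.

```python
def has_case_conflict(paths):
    """True if two paths in the set differ only in case."""
    folded = [p.casefold() for p in paths]
    return len(set(folded)) != len(folded)


base = {"foo.txt"}

# The real per-branch trees before the merge.
tree_x = base - {"foo.txt"}   # branch X removes foo.txt
tree_y = base | {"FOO.TXT"}   # branch Y adds FOO.TXT, foo.txt still present

# The linear walk applies X's change and then Y's to one shared state.
linear = set(base)
linear_flags = []
linear -= {"foo.txt"}
linear_flags.append(has_case_conflict(linear))
linear |= {"FOO.TXT"}
linear_flags.append(has_case_conflict(linear))

print(linear_flags)               # the linear walk never sees both names
print(has_case_conflict(tree_y))  # but branch Y's own tree has both
```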

So really the straightforward approach of checking the tree state for
each commit is probably simplest. If it's really too slow, you could try
jgit or linking against git itself, which would eliminate the external
process overhead.
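
For reference, the straightforward per-commit check could be driven from
a hook along these lines. This is a sketch, not a tuned implementation:
it spawns one "git ls-tree" per commit, compares full paths only, and the
helper names (rev_list, ls_tree, bad_commits) are made up for the
example.

```python
import subprocess
from collections import Counter


def rev_list(old, new):
    """Commits reachable from `new` but not `old`, oldest first."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{old}..{new}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def ls_tree(commit):
    """All paths in `commit`, read NUL-separated to avoid path quoting."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-z", "--name-only", commit],
        capture_output=True, check=True,
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def bad_commits(commits, list_paths=ls_tree):
    """Return the commits whose tree holds paths differing only in case."""
    bad = []
    for commit in commits:
        counts = Counter(p.casefold() for p in list_paths(commit))
        if any(n > 1 for n in counts.values()):
            bad.append(commit)
    return bad
```

A hook would call bad_commits(rev_list(old, new)) for each updated ref
and reject the ref if the result is non-empty. The list_paths parameter
exists so the per-commit lookup can be swapped for something cheaper
than a subprocess.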

-Peff

