git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Thomas Rast <trast@student.ethz.ch>
Cc: <git@vger.kernel.org>, Bo Yang <struggleyb.nku@gmail.com>
Subject: Re: [PATCH v7 0/5] git log -L, all new and shiny
Date: Fri, 15 Jun 2012 23:01:13 -0700	[thread overview]
Message-ID: <7v1ulf94nq.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <7v1ulgd2f5.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Fri, 15 Jun 2012 08:23:42 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>> This would be the first backwards coupling between the revision-walk and
>> the diff generation parts, at least that I know of.
>
> I am not convinced if you need to have any unusual back-coupling to
> begin with, by the way.
>
> If you say "git log -p [--options] -- pathspec", the revision
> machinery does filter commits that do not touch any paths that patch
> pathspec with the TREESAME logic, but that does not necessarily mean
> you will see _all_ the commits that are not TREESAME.

Let's clarify this part a bit.

Imagine you have three commits on a single strand of pearls.

	A---B---C

Between A and B, you only corrected indentation errors in file F as
a clean-up step, and then between B and C, you did a real change.
"git diff A B -- F" gives changes, "git diff -w A B -- F" does not.
Both "git diff B C -- F" and "git diff -w B C -- F" give you some
output.

Now, "git log --no-root -p -w C -- F" will give you C but not B.

Let's see how that happens.

The revision machinery looks at C and finds its parent B.  It runs
object level tree comparison and finds that their trees are
different at path F.  It makes a mental note that it may need to
show the log message of C, and asks the diff machinery to run
diff-tree between B and C.  The diff machinery finds that it needs
to show something even in the presense of -w option by actual
comparison, and just before showing the very first line of patch
output, it shows the log message of C (due to the earlier "mental
note").

Then the revision machinery looks at B.  It does the same between B
and A, but this time around, the diff machinery finds that, even
though A and B were _not_ TREESAME at the revision traversal level,
there is nothing to be shown after filtering with the -w option.
Hence no patch is shown and log message for B is not shown, either.

The interesting part of the above is that we did not bother using
the -w comparison to affect TREESAME check at the revision traversal
level.

I consider your "-L bottom,top" essentially the same "two blobs may
be different at the object name level, but after filtering the diff,
there may be no output" filter just like the "-w" option is.
Instead of filtering hunks that have changes only in whitespaces, it
filters hunks that do not overlap the given line range.  So if you
replace "-w" in the above example with "-L bottom,top", exactly the
same thing should happen.

The current code structure does not consider the content level
filtering done by the "-w" when computing TREESAME.  This DOES
affect how the merges are simplified.  If you and I start from a
commit X with full of indentation mistakes, you fix the earlier half
of these mistakes and make commit T while I fix the later half of
these mistakes and make commit J, and we merge our results into
merge M:

      J---M
     /   /
    X---T

none of "diff -w J M", "diff -w T M", "diff -w X J", "diff -w X T"
will output anything.  "git diff -p -w M" will however traverse both
branches, even though in this example nothing will be shown in the
output, because no two trees in these four commits are TREESAME.  

In the longer term, this _may_ be something we want to fix, but
teaching the revision machinery about only "-L bottom,top" is not
the way to do it.  Instead, we should devise a new mechanism to call
into the diff_patch machinery from the place where the revision
machinery computes TREESAME, and let it ask the filtered content
level differences if any of these options (these flags flip the
DIFF_FROM_CONTENTS diff option) are in use.  As a part of the
implementation of that new mechanism, we may want to cache the patch
such an early comparison produces, and reuse it when we actually
produce output, as an optimization.  But I think that is not limited
to the "-L bottom,top" option and an orthogonal issue, as it will
work equally well for existing "-w" filter (there may be others).

The above is where my recommendations are coming from.  The first
step for "-L bottom,top" should be to hook it as a new kind of
DIFF_FROM_CONTENTS option that compares two non-identical blobs but
possibly yield empty result into the diff machinery at the same
level as "-w".  Once that is solidly done, we can think about
updating the merge simplification logic to take DIFF_FROM_CONTENTS
changes into account when it computes TREESAME.

  reply	other threads:[~2012-06-16  6:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-07 10:23 [PATCH v7 0/5] git log -L, all new and shiny Thomas Rast
2012-06-07 10:23 ` [PATCH v7 1/5] Refactor parse_loc Thomas Rast
2012-06-07 10:23 ` [PATCH v7 2/5] blame: introduce $ as "end of file" in -L syntax Thomas Rast
2012-06-07 17:23   ` Junio C Hamano
2012-06-07 17:44     ` Thomas Rast
2012-06-07 10:23 ` [PATCH v7 3/5] Export three functions from diff.c Thomas Rast
2012-06-07 17:44   ` Junio C Hamano
2012-06-07 10:23 ` [PATCH v7 4/5] Export rewrite_parents() for 'log -L' Thomas Rast
2012-06-07 10:23 ` [PATCH v7 5/5] Implement line-history search (git log -L) Thomas Rast
2012-06-07 17:42   ` Junio C Hamano
2012-06-07 17:52     ` Thomas Rast
2012-06-10  9:38   ` Zbigniew Jędrzejewski-Szmek
2012-06-15  4:40 ` [PATCH v7 0/5] git log -L, all new and shiny Junio C Hamano
2012-06-15 13:29   ` Thomas Rast
2012-06-15 15:23     ` Junio C Hamano
2012-06-16  6:01       ` Junio C Hamano [this message]
2012-06-19 10:11         ` Thomas Rast
2012-06-19 10:33           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v1ulf94nq.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=struggleyb.nku@gmail.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).