git@vger.kernel.org mailing list mirror (one of many)
* RFE: git-patch-id should handle patches without leading "diff"
@ 2018-12-07 18:19 Konstantin Ryabitsev
  2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
  2018-12-07 22:01 ` Jonathan Nieder
  0 siblings, 2 replies; 6+ messages in thread
From: Konstantin Ryabitsev @ 2018-12-07 18:19 UTC (permalink / raw)
  To: git

Hi, all:

Every now and again I come across a patch sent to LKML without a leading
"diff a/foo b/foo" -- usually produced by quilt. E.g.:

https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/

I am guessing quilt does not bother including the leading "diff a/foo
b/foo" because it's redundant with the next two lines, however this
remains a valid patch recognized by git-am.

If you pipe that patch via git-patch-id, it produces nothing, but if I
put in the leading "diff", like so:

diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c

then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Can we please teach git-patch-id to work without the leading diff a/foo
b/foo, same as git-am?

Best,
-K



* Re: RFE: git-patch-id should handle patches without leading "diff"
  2018-12-07 18:19 RFE: git-patch-id should handle patches without leading "diff" Konstantin Ryabitsev
@ 2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
  2018-12-07 22:01 ` Jonathan Nieder
  1 sibling, 0 replies; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-12-07 19:25 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git


On Fri, Dec 07 2018, Konstantin Ryabitsev wrote:

> Hi, all:
>
> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Can we please teach git-patch-id to work without the leading diff a/foo
> b/foo, same as git-am?
>
> Best,
> -K

The state machine is sensitive to there being a "diff" line, then
"index" etc.

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 970d0d30b4..b99e4455fd 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -97,7 +97,9 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		}

 		/* Ignore commit comments */
-		if (!patchlen && !starts_with(line, "diff "))
+		if (!patchlen && starts_with(line, "--- a/"))
+			;
+		else if (!patchlen && !starts_with(line, "diff "))
 			continue;

 		/* Parsing diff header?  */

This would make it produce a patch-id for that input. Note, however,
that I've matched "--- a/" there; with just "--- " (which is legit) we'd
get confused and start too early, before the diffstat.

So if you're interested in having this, I leave it to you to run with
it and write tests for it. More convincing still would be to run it on
the git & LKML archives and check that the output is the same (or only
gains entries in cases where we now find patches) with --stable etc.
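
To make the "--- a/" caveat concrete, here is a minimal sketch of the
start-of-patch test that the diff above implies. It is not git's code:
has_prefix() and starts_patch() are hypothetical names used only for
illustration, and the sketch covers only the case where no patch text
has been seen yet (!patchlen).

```c
#include <string.h>

/* Hypothetical helper: nonzero if `line` begins with `prefix`. */
static int has_prefix(const char *line, const char *prefix)
{
	return strncmp(line, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical predicate mirroring the condition in the diff above:
 * before any patch text has been seen, only a "diff " line or a
 * "--- a/" line is treated as the start of a patch. Everything else
 * is skipped as commit-message text.
 */
static int starts_patch(const char *line)
{
	return has_prefix(line, "diff ") || has_prefix(line, "--- a/");
}
```

Under this sketch a line such as "--- /dev/null" is not treated as the
start of a patch, which is the price of not matching the bare "--- "
form.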


* Re: RFE: git-patch-id should handle patches without leading "diff"
  2018-12-07 18:19 RFE: git-patch-id should handle patches without leading "diff" Konstantin Ryabitsev
  2018-12-07 19:25 ` Ævar Arnfjörð Bjarmason
@ 2018-12-07 22:01 ` Jonathan Nieder
  2018-12-07 22:23   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2018-12-07 22:01 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git, Ævar Arnfjörð Bjarmason

Hi,

Konstantin Ryabitsev wrote:

> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Interesting.  As Ævar mentioned, the relevant code is

		/* Ignore commit comments */
		if (!patchlen && !starts_with(line, "diff "))
			continue;

which is trying to handle a case where a line that is special to the
parser appears before the diff begins.

The patch-id appears to only care about the diff text, so it should be
able to handle this.  So if we have a better heuristic for where the
diff starts, it would be good to use it.

"git apply" uses apply.c::find_header, which is more permissive.
Maybe it would be possible to unify these somehow.  (I haven't looked
closely enough to tell how painful that would be.)

Thanks and hope that helps,
Jonathan


* Re: RFE: git-patch-id should handle patches without leading "diff"
  2018-12-07 22:01 ` Jonathan Nieder
@ 2018-12-07 22:23   ` Ævar Arnfjörð Bjarmason
  2018-12-07 22:34     ` Jonathan Nieder
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-12-07 22:23 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Konstantin Ryabitsev, git


On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
> 		/* Ignore commit comments */
> 		if (!patchlen && !starts_with(line, "diff "))
> 			continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    index 9183875dad..dd8abe9185 100644
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Observe that the diff --git line matters, we hash it:

    $ git diff-tree -p HEAD~.. | git patch-id
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

The thing it doesn't care about is the "index" between the "diff" and
patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

We also care about the +++ and --- lines:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
    56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000

Then we normalize the @@ line, e.g.:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000


There are other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:
    
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
                    break;
            case REF_TYPE_OTHER_PSEUDOREF:
            case REF_TYPE_MAIN_PSEUDOREF:
    -               return files_reflog_path_other_worktrees(refs, sb, refname);
    +               files_reflog_path_other_worktrees(refs, sb, refname);
    +               break;
            case REF_TYPE_NORMAL:
                    strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
                    break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.
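
The "strip space" caveat can be pictured with a small sketch. This is
an assumption about the general technique (dropping all whitespace from
a line before it is fed to the hash), written with a hypothetical
function name; builtin/patch-id.c is the authority on what git really
does.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: remove every whitespace character from `line`
 * in place, keep it NUL-terminated, and return the new length.
 */
static size_t strip_space(char *line)
{
	char *src = line;
	char *dst = line;
	unsigned char c;

	while ((c = (unsigned char)*src++) != '\0') {
		if (!isspace(c))
			*dst++ = (char)c;
	}
	*dst = '\0';
	return (size_t)(dst - line);
}
```

With this helper, two lines that differ only in indentation reduce to
the same bytes, so a re-indented copy of a patch would feed identical
input to the hash in this sketch.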

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

To be:

    diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff. OTOH the "diff --git" line was never there, and it *does*
matter, so should we be faking it up? Maybe not, bah!
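
As a rough sketch of what "faking up" the header could mean, the
function below builds a "diff --git" line from a "---"/"+++" pair. The
name fake_diff_header() and its interface are hypothetical, and the
sketch assumes unquoted paths that end at the first tab or newline; it
does not settle whether faking the line is the right thing to do.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given a "--- <old>" line and a "+++ <new>" line,
 * write "diff --git <old> <new>\n" into `out`. Returns 0 on success,
 * or -1 if the input does not look like such a pair or the result does
 * not fit in `outlen` bytes.
 */
static int fake_diff_header(const char *minus, const char *plus,
			    char *out, size_t outlen)
{
	size_t oldlen, newlen;
	int n;

	if (strncmp(minus, "--- ", 4) != 0 || strncmp(plus, "+++ ", 4) != 0)
		return -1;
	minus += 4;
	plus += 4;

	/* Assumption: the path ends at the first tab or newline. */
	oldlen = strcspn(minus, "\t\n");
	newlen = strcspn(plus, "\t\n");
	if (oldlen == 0 || newlen == 0)
		return -1;

	n = snprintf(out, outlen, "diff --git %.*s %.*s\n",
		     (int)oldlen, minus, (int)newlen, plus);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Fed the two process.c lines above, this produces the "diff --git
a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c" line shown in
the "To be" example.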

> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow.  (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan


* Re: RFE: git-patch-id should handle patches without leading "diff"
  2018-12-07 22:23   ` Ævar Arnfjörð Bjarmason
@ 2018-12-07 22:34     ` Jonathan Nieder
  2018-12-08  6:01       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2018-12-07 22:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Konstantin Ryabitsev, git

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 07 2018, Jonathan Nieder wrote:

>> The patch-id appears to only care about the diff text, so it should be
>> able to handle this.  So if we have a better heuristic for where the
>> diff starts, it would be good to use it.
>
> No, the patch-id doesn't just care about the diff, it cares about the
> context before the diff too.

Sorry, I did a bad job of communicating.  When I said "diff text", I was
including context.

[...]
> Observe that the diff --git line matters, we hash it:
>
>     $ git diff-tree -p HEAD~.. | git patch-id
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
>     4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

Oh, hm.  That's unfortunate.

[...]
> So it seems most sensible to me if this is going to be supported that we
> go a bit beyond the call of duty and fake up the start of it, namely:
>
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c
>
> To be:
>
>     diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c

Right.  We may want to handle diff.mnemonicPrefix as well.

Jonathan


* Re: RFE: git-patch-id should handle patches without leading "diff"
  2018-12-07 22:34     ` Jonathan Nieder
@ 2018-12-08  6:01       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2018-12-08  6:01 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, Konstantin Ryabitsev, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>> So it seems most sensible to me if this is going to be supported that we
>> go a bit beyond the call of duty and fake up the start of it, namely:
>>
>>     --- a/arch/x86/kernel/process.c
>>     +++ b/arch/x86/kernel/process.c
>>
>> To be:
>>
>>     diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>     --- a/arch/x86/kernel/process.c
>>     +++ b/arch/x86/kernel/process.c
>
> Right.  We may want to handle diff.mnemonicPrefix as well.

I definitely think under the --stable option, we should pretend as
if the canonical a/ vs b/ prefixes were given with the "diff --git"
header, just like we try to reverse the effect of diff-orderfile,
etc.

I am unsure what the right behaviour under --unstable is, though.




end of thread, other threads:[~2018-12-08  6:02 UTC | newest]