* RFE: git-patch-id should handle patches without leading "diff"
From: Konstantin Ryabitsev @ 2018-12-07 18:19 UTC
To: git

Hi, all:

Every now and again I come across a patch sent to LKML without a leading
"diff a/foo b/foo" -- usually produced by quilt. E.g.:

https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/

I am guessing quilt does not bother including the leading "diff a/foo
b/foo" because it's redundant with the next two lines, however this
remains a valid patch recognized by git-am.

If you pipe that patch via git-patch-id, it produces nothing, but if I
put in the leading "diff", like so:

diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c

then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Can we please teach git-patch-id to work without the leading diff a/foo
b/foo, same as git-am?

Best,
-K

* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Ævar Arnfjörð Bjarmason @ 2018-12-07 19:25 UTC
To: Konstantin Ryabitsev; +Cc: git

On Fri, Dec 07 2018, Konstantin Ryabitsev wrote:

> Hi, all:
>
> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Can we please teach git-patch-id to work without the leading diff a/foo
> b/foo, same as git-am?
>
> Best,
> -K

The state machine is sensitive to there being a "diff" line, then "index"
etc.

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 970d0d30b4..b99e4455fd 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -97,7 +97,9 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		}
 
 		/* Ignore commit comments */
-		if (!patchlen && !starts_with(line, "diff "))
+		if (!patchlen && starts_with(line, "--- a/"))
+			;
+		else if (!patchlen && !starts_with(line, "diff "))
 			continue;
 
 		/* Parsing diff header? */

This would make it produce a patch-id for that input. Note, however, that
I've matched on "--- a/" there; with just "--- " (which is legit) we'd get
confused and start too early, before the diffstat.

So if you're interested in having this I leave it to you to run with
this & write tests for it, but more convincingly run it on the git &
LKML archives and see that the output is the same (or just extra in
cases where we now find patches) with --stable etc.

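For readers following along, the condition in the diff above can be pulled out into a standalone predicate. This is only a sketch of the logic in that one hunk, not the code in builtin/patch-id.c: has_prefix() is a local stand-in for git's starts_with(), and starts_patch() is a hypothetical name introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for git's starts_with(): 1 if str begins with prefix. */
static int has_prefix(const char *str, const char *prefix)
{
	assert(str != NULL && prefix != NULL);
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: may `line` be treated as patch text?
 * patchlen is the amount of patch text consumed so far, as in the
 * loop that the diff above modifies.
 */
static int starts_patch(const char *line, size_t patchlen)
{
	if (patchlen)
		return 1;	/* already inside a patch */
	if (has_prefix(line, "diff "))
		return 1;	/* the usual header line */
	if (has_prefix(line, "--- a/"))
		return 1;	/* headerless patch, e.g. from quilt */
	return 0;		/* commit message, "---" separator, diffstat */
}
```

Matching on "--- a/" rather than "--- " is what keeps the bare "---" separator before a diffstat from being mistaken for the start of the patch.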
* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Jonathan Nieder @ 2018-12-07 22:01 UTC
To: Konstantin Ryabitsev; +Cc: git, Ævar Arnfjörð Bjarmason

Hi,

Konstantin Ryabitsev wrote:

> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Interesting. As Ævar mentioned, the relevant code is

		/* Ignore commit comments */
		if (!patchlen && !starts_with(line, "diff "))
			continue;

which is trying to handle a case where a line that is special to the
parser appears before the diff begins.

The patch-id appears to only care about the diff text, so it should be
able to handle this. So if we have a better heuristic for where the
diff starts, it would be good to use it.

"git apply" uses apply.c::find_header, which is more permissive.
Maybe it would be possible to unify these somehow. (I haven't looked
closely enough to tell how painful that would be.)

Thanks and hope that helps,
Jonathan

* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Ævar Arnfjörð Bjarmason @ 2018-12-07 22:23 UTC
To: Jonathan Nieder; +Cc: Konstantin Ryabitsev, git

On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077005@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting. As Ævar mentioned, the relevant code is
>
> 		/* Ignore commit comments */
> 		if (!patchlen && !starts_with(line, "diff "))
> 			continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this. So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too. See this patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
    diff --git x/refs/files-backend.c y/refs/files-backend.c
    index 9183875dad..dd8abe9185 100644
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
     		break;
     	case REF_TYPE_OTHER_PSEUDOREF:
     	case REF_TYPE_MAIN_PSEUDOREF:
    -		return files_reflog_path_other_worktrees(refs, sb, refname);
    +		files_reflog_path_other_worktrees(refs, sb, refname);
    +		break;
     	case REF_TYPE_NORMAL:
     		strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
     		break;

Observe that the diff --git line matters, we hash it:

    $ git diff-tree -p HEAD~.. | git patch-id
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
    5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

The thing it doesn't care about is the "index" between the "diff" and
patch:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

We also care about the +++ and --- lines:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
    56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000

Then we normalize the @@ line, e.g.:

    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000
    $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
    4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

There are other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:

    diff --git x/refs/files-backend.c y/refs/files-backend.c
    --- x/refs/files-backend.c
    +++ y/refs/files-backend.c
    @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
     		break;
     	case REF_TYPE_OTHER_PSEUDOREF:
     	case REF_TYPE_MAIN_PSEUDOREF:
    -		return files_reflog_path_other_worktrees(refs, sb, refname);
    +		files_reflog_path_other_worktrees(refs, sb, refname);
    +		break;
     	case REF_TYPE_NORMAL:
     		strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
     		break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

To be:

    diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
    --- a/arch/x86/kernel/process.c
    +++ b/arch/x86/kernel/process.c

It'll make the state machine a bit more complex, but IMO it would suck
more if we generate a different hash depending on the tool generating
the diff.

OTOH the "diff --git" line was never there, and it *does* matter, so
should we be faking it up? Maybe not, bah!

> "git apply" uses apply.c::find_header, which is more permissive.
> Maybe it would be possible to unify these somehow. (I haven't looked
> closely enough to tell how painful that would be.)
>
> Thanks and hope that helps,
> Jonathan

* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Jonathan Nieder @ 2018-12-07 22:34 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: Konstantin Ryabitsev, git

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 07 2018, Jonathan Nieder wrote:

>> The patch-id appears to only care about the diff text, so it should be
>> able to handle this. So if we have a better heuristic for where the
>> diff starts, it would be good to use it.
>
> No, the patch-id doesn't just care about the diff, it cares about the
> context before the diff too.

Sorry, I did a bad job of communicating. When I said "diff text", I
was including context.

[...]
> Observe that the diff --git line matters, we hash it:
>
>     $ git diff-tree -p HEAD~.. | git patch-id
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
>     5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000
>     $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
>     4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000

Oh, hm. That's unfortunate.

[...]
> So it seems most sensible to me if this is going to be supported that we
> go a bit beyond the call of duty and fake up the start of it, namely:
>
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c
>
> To be:
>
>     diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>     --- a/arch/x86/kernel/process.c
>     +++ b/arch/x86/kernel/process.c

Right. We may want to handle diff.mnemonicPrefix as well.

Jonathan

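The header synthesis proposed in the quoted text could look roughly like the sketch below. It is an illustration under stated assumptions, not a patch against builtin/patch-id.c: fake_diff_header() is a hypothetical name, paths are assumed to end at a tab or newline (unified diffs may carry a tab-separated timestamp after the path), and file creation or deletion via /dev/null is not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the "--- <path>" and "+++ <path>" lines of a
 * headerless patch, write the matching "diff --git" line into out.
 * Returns 0 on success, -1 if the lines do not have the expected shape
 * or out is too small.
 */
static int fake_diff_header(const char *minus, const char *plus,
			    char *out, size_t outlen)
{
	size_t mlen, plen;
	int n;

	assert(minus != NULL && plus != NULL && out != NULL);
	if (strncmp(minus, "--- ", 4) != 0 || strncmp(plus, "+++ ", 4) != 0)
		return -1;
	minus += 4;
	plus += 4;

	/* The path stops at a tab (optional timestamp) or at the newline. */
	mlen = strcspn(minus, "\t\n");
	plen = strcspn(plus, "\t\n");
	if (mlen == 0 || plen == 0)
		return -1;

	n = snprintf(out, outlen, "diff --git %.*s %.*s\n",
		     (int)mlen, minus, (int)plen, plus);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Feeding it "--- a/arch/x86/kernel/process.c" and "+++ b/arch/x86/kernel/process.c" yields the "diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c" line from the example in the thread, which could then be hashed ahead of the two original lines.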
* Re: RFE: git-patch-id should handle patches without leading "diff"
From: Junio C Hamano @ 2018-12-08 6:01 UTC
To: Jonathan Nieder; +Cc: Ævar Arnfjörð Bjarmason, Konstantin Ryabitsev, git

Jonathan Nieder <jrnieder@gmail.com> writes:

>> So it seems most sensible to me if this is going to be supported that we
>> go a bit beyond the call of duty and fake up the start of it, namely:
>>
>>     --- a/arch/x86/kernel/process.c
>>     +++ b/arch/x86/kernel/process.c
>>
>> To be:
>>
>>     diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>     --- a/arch/x86/kernel/process.c
>>     +++ b/arch/x86/kernel/process.c
>
> Right. We may want to handle diff.mnemonicPrefix as well.

I definitely think under the --stable option, we should pretend as
if the canonical a/ vs b/ prefixes were given with the "diff --git"
header, just like we try to reverse the effect of diff-orderfile,
etc.

I am unsure what the right behaviour under --unstable is, though.

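One way to read "pretend as if the canonical a/ vs b/ prefixes were given" is to rewrite the leading path component before hashing. The sketch below shows that idea only; canonicalize_prefix() is a hypothetical name, and it assumes the prefix is exactly one leading path component. That holds for x/ and y/ and for the single-letter prefixes diff.mnemonicPrefix produces, but not for an arbitrary or empty --src-prefix, so a real implementation would need a better rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "<prefix>/<rest>" as "<canon><rest>",
 * where canon is "a/" or "b/". Returns 0 on success, -1 if path has no
 * leading component to replace or out is too small.
 */
static int canonicalize_prefix(const char *path, const char *canon,
			       char *out, size_t outlen)
{
	const char *slash;
	int n;

	assert(path != NULL && canon != NULL && out != NULL);
	slash = strchr(path, '/');
	if (slash == NULL || slash == path)
		return -1;	/* nothing that looks like a prefix */

	n = snprintf(out, outlen, "%s%s", canon, slash + 1);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Applied to both sides of the earlier example, x/refs/files-backend.c and y/refs/files-backend.c become a/refs/files-backend.c and b/refs/files-backend.c. Whether --unstable should do the same is the question left open in the message above.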