git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] Make git rebase -r's label generation more resilient
@ 2019-09-02 14:01 Johannes Schindelin via GitGitGadget
  2019-09-02 14:01 ` [PATCH 1/1] rebase -r: let `label` generate safer labels Matt R via GitGitGadget
  2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 22+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-09-02 14:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Those labels must be valid ref names, and therefore valid file names.

This just came in via Git for Windows.

Matt R (1):
  rebase -r: let `label` generate safer labels

 sequencer.c              | 12 +++++++++++-
 t/t3430-rebase-merges.sh |  5 +++++
 2 files changed, 16 insertions(+), 1 deletion(-)


base-commit: 5fa0f5238b0cd46cfe7f6fa76c3f526ea98148d9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-327%2Fdscho%2Ffix-rebase-r-labels-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-327/dscho/fix-rebase-r-labels-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/327
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 14:01 [PATCH 0/1] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
@ 2019-09-02 14:01 ` Matt R via GitGitGadget
  2019-09-02 17:57   ` Phillip Wood
  2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 22+ messages in thread
From: Matt R via GitGitGadget @ 2019-09-02 14:01 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Matt R

From: Matt R <mattr94@gmail.com>

The `label` todo command in interactive rebases creates temporary refs
in the `refs/rewritten/` namespace. These refs are stored as loose refs,
i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
with file name limitations on the current filesystem.

This poses a problem in particular on NTFS/FAT, where e.g. the colon
character is not a valid part of a file name.

Let's safeguard against this by replacing not only white-space
characters by dashes, but all non-alpha-numeric ones.

However, we exempt non-ASCII UTF-8 characters from that, as it should be
quite possible to reflect branch names such as `↯↯↯` in refs/file names.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c              | 12 +++++++++++-
 t/t3430-rebase-merges.sh |  5 +++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 34ebf8ed94..23f4a0876a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		else
 			strbuf_addbuf(&label, &oneline);
 
+		/*
+		 * Sanitize labels by replacing non-alpha-numeric characters
+		 * (including white-space ones) by dashes, as they might be
+		 * illegal in file names (and hence in ref names).
+		 *
+		 * Note that we retain non-ASCII UTF-8 characters (identified
+		 * via the most significant bit). They should be all acceptable
+		 * in file names. We do not validate the UTF-8 here, that's not
+		 * the job of this function.
+		 */
 		for (p1 = label.buf; *p1; p1++)
-			if (isspace(*p1))
+			if (!(*p1 & 0x80) && !isalnum(*p1))
 				*(char *)p1 = '-';
 
 		strbuf_reset(&buf);
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 7b6c4847ad..737396f944 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -441,4 +441,9 @@ test_expect_success '--continue after resolving conflicts after a merge' '
 	test_path_is_missing .git/MERGE_HEAD
 '
 
+test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
+	git checkout -b colon-in-label E &&
+	git merge -m "colon: this should work" G &&
+	git rebase --rebase-merges --force-rebase E
+'
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 14:01 ` [PATCH 1/1] rebase -r: let `label` generate safer labels Matt R via GitGitGadget
@ 2019-09-02 17:57   ` Phillip Wood
  2019-09-02 18:29     ` Junio C Hamano
  2019-09-02 19:42     ` Johannes Schindelin
  0 siblings, 2 replies; 22+ messages in thread
From: Phillip Wood @ 2019-09-02 17:57 UTC (permalink / raw)
  To: Matt R via GitGitGadget, git; +Cc: Junio C Hamano, Matt R

Hi Matt

This is definitely worth fixing, I've got a couple of comments below

On 02/09/2019 15:01, Matt R via GitGitGadget wrote:
> From: Matt R <mattr94@gmail.com>
> 
> The `label` todo command in interactive rebases creates temporary refs
> in the `refs/rewritten/` namespace. These refs are stored as loose refs,
> i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
> with file name limitations on the current filesystem.
> 
> This poses a problem in particular on NTFS/FAT, where e.g. the colon
> character is not a valid part of a file name.

Being picking I'll point out that ':' is not a valid in refs either. 
Looking at 
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I 
think only " and | are not allowed on NTFS/FAT but are valid in refs 
(see the man page for git check-ref-format for all the details). So the 
main limitation is actually what git allows in refs.

> Let's safeguard against this by replacing not only white-space
> characters by dashes, but all non-alpha-numeric ones.
> 
> However, we exempt non-ASCII UTF-8 characters from that, as it should be
> quite possible to reflect branch names such as `↯↯↯` in refs/file names.
> 
> Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   sequencer.c              | 12 +++++++++++-
>   t/t3430-rebase-merges.sh |  5 +++++
>   2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 34ebf8ed94..23f4a0876a 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>   		else
>   			strbuf_addbuf(&label, &oneline);
>   
> +		/*
> +		 * Sanitize labels by replacing non-alpha-numeric characters
> +		 * (including white-space ones) by dashes, as they might be
> +		 * illegal in file names (and hence in ref names).
> +		 *
> +		 * Note that we retain non-ASCII UTF-8 characters (identified
> +		 * via the most significant bit). They should be all acceptable
> +		 * in file names. We do not validate the UTF-8 here, that's not
> +		 * the job of this function.
> +		 */
>   		for (p1 = label.buf; *p1; p1++)
> -			if (isspace(*p1))
> +			if (!(*p1 & 0x80) && !isalnum(*p1))
>   				*(char *)p1 = '-';

I'm sightly concerned that this opens the possibility for unexpected 
effects if two different labels get sanitized to the same string. I 
suspect it's unlikely to happen in practice but doing something like 
percent encoding non-alphanumeric characters would avoid the problem 
entirely.

Best Wishes

Phillip

>   		strbuf_reset(&buf);
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index 7b6c4847ad..737396f944 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -441,4 +441,9 @@ test_expect_success '--continue after resolving conflicts after a merge' '
>   	test_path_is_missing .git/MERGE_HEAD
>   '
>   
> +test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
> +	git checkout -b colon-in-label E &&
> +	git merge -m "colon: this should work" G &&
> +	git rebase --rebase-merges --force-rebase E
> +'
>   test_done
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 17:57   ` Phillip Wood
@ 2019-09-02 18:29     ` Junio C Hamano
  2019-09-02 20:12       ` brian m. carlson
                         ` (2 more replies)
  2019-09-02 19:42     ` Johannes Schindelin
  1 sibling, 3 replies; 22+ messages in thread
From: Junio C Hamano @ 2019-09-02 18:29 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Matt R via GitGitGadget, git, Matt R

Phillip Wood <phillip.wood123@gmail.com> writes:

> Being picking I'll point out that ':' is not a valid in refs
> either. Looking at
> https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I
> think only " and | are not allowed on NTFS/FAT but are valid in refs
> (see the man page for git check-ref-format for all the details). So
> the main limitation is actually what git allows in refs.

Yeah, trying to use the contents of the log message without
sufficient sanitization is looking for trouble.

>>   		for (p1 = label.buf; *p1; p1++)
>> -			if (isspace(*p1))
>> +			if (!(*p1 & 0x80) && !isalnum(*p1))
>>   				*(char *)p1 = '-';
>
> I'm sightly concerned that this opens the possibility for unexpected
> effects if two different labels get sanitized to the same string. I
> suspect it's unlikely to happen in practice but doing something like
> percent encoding non-alphanumeric characters would avoid the problem
> entirely.

I'd rather see 'x' used instead of '-' (double-or-more dashes and
leading dash in refnames may currently be allowed but double-or-more
exes and leading ex would be much more likely to stay valid) if we
just want to redact invalid characters.

I see there are "lets make sure it is unique by suffixing "-%d" in
other codepaths; would that help if this piece of code yields a
label that is not unique?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 17:57   ` Phillip Wood
  2019-09-02 18:29     ` Junio C Hamano
@ 2019-09-02 19:42     ` Johannes Schindelin
  2019-09-03 18:10       ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2019-09-02 19:42 UTC (permalink / raw)
  To: phillip.wood; +Cc: Matt R via GitGitGadget, git, Junio C Hamano, Matt R

[-- Attachment #1: Type: text/plain, Size: 3631 bytes --]

Hi Phillip,

On Mon, 2 Sep 2019, Phillip Wood wrote:

> This is definitely worth fixing, I've got a couple of comments below
>
> On 02/09/2019 15:01, Matt R via GitGitGadget wrote:
> > From: Matt R <mattr94@gmail.com>

I just noticed that the surname is abbreviated. The full name of the
author is "Matt Rogers". (Matt, Git uses the Signed-off-by: lines as
some sort of legally-binding assurance that you are free to submit these
changes under the GPLv2, so your full name is kinda required.)

> > The `label` todo command in interactive rebases creates temporary refs
> > in the `refs/rewritten/` namespace. These refs are stored as loose refs,
> > i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
> > with file name limitations on the current filesystem.
> >
> > This poses a problem in particular on NTFS/FAT, where e.g. the colon
> > character is not a valid part of a file name.
>
> Being picking I'll point out that ':' is not a valid in refs either.

True, but that was not the primary concern here ;-)

> Looking at
> https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I
> think only " and | are not allowed on NTFS/FAT but are valid in refs
> (see the man page for git check-ref-format for all the details). So
> the main limitation is actually what git allows in refs.

Right. And this example shows that we really need to be a bit more
conservative than just disallowing characters that would not be allowed
in refs.

I think it is more important to keep in mind the vagaries of various
current and future filesystems to justify the change in this patch.

> > Let's safeguard against this by replacing not only white-space
> > characters by dashes, but all non-alpha-numeric ones.
> >
> > However, we exempt non-ASCII UTF-8 characters from that, as it should be
> > quite possible to reflect branch names such as `↯↯↯` in refs/file names.
> >
> > Signed-off-by: Matthew Rogers <mattr94@gmail.com>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >   sequencer.c              | 12 +++++++++++-
> >   t/t3430-rebase-merges.sh |  5 +++++
> >   2 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/sequencer.c b/sequencer.c
> > index 34ebf8ed94..23f4a0876a 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -4635,8 +4635,18 @@ static int make_script_with_merges(struct
> > pretty_print_context *pp,
> >     else
> >      strbuf_addbuf(&label, &oneline);
> >   +		/*
> > +		 * Sanitize labels by replacing non-alpha-numeric characters
> > +		 * (including white-space ones) by dashes, as they might be
> > +		 * illegal in file names (and hence in ref names).
> > +		 *
> > +		 * Note that we retain non-ASCII UTF-8 characters (identified
> > +		 * via the most significant bit). They should be all
> > acceptable
> > +		 * in file names. We do not validate the UTF-8 here, that's
> > not
> > +		 * the job of this function.
> > +		 */
> >   		for (p1 = label.buf; *p1; p1++)
> > -			if (isspace(*p1))
> > +			if (!(*p1 & 0x80) && !isalnum(*p1))
> >       *(char *)p1 = '-';
>
> I'm sightly concerned that this opens the possibility for unexpected effects
> if two different labels get sanitized to the same string. I suspect it's
> unlikely to happen in practice but doing something like percent encoding
> non-alphanumeric characters would avoid the problem entirely.

Oh, but we make sure that the labels are unique, via the `label_oid()`
function! Otherwise, we would not be able to label more than one merge
parent ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 18:29     ` Junio C Hamano
@ 2019-09-02 20:12       ` brian m. carlson
  2019-09-02 21:24       ` Philip Oakley
  2019-09-03 11:19       ` Johannes Schindelin
  2 siblings, 0 replies; 22+ messages in thread
From: brian m. carlson @ 2019-09-02 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Matt R via GitGitGadget, git, Matt R

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

On 2019-09-02 at 18:29:56, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
> > Being picking I'll point out that ':' is not a valid in refs
> > either. Looking at
> > https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file I
> > think only " and | are not allowed on NTFS/FAT but are valid in refs
> > (see the man page for git check-ref-format for all the details). So
> > the main limitation is actually what git allows in refs.
> 
> Yeah, trying to use the contents of the log message without
> sufficient sanitization is looking for trouble.
> 
> >>   		for (p1 = label.buf; *p1; p1++)
> >> -			if (isspace(*p1))
> >> +			if (!(*p1 & 0x80) && !isalnum(*p1))
> >>   				*(char *)p1 = '-';

While we're thinking of things that could go wrong, note that it's also
possible for the commit message to contain non-UTF-8 characters (if the
user is using a non-UTF-8 encoding), which will cause sadness on Windows
and macOS.  Non-Mac Unix systems aren't a problem here, but then again,
they aren't the reason for this patch.

> > I'm sightly concerned that this opens the possibility for unexpected
> > effects if two different labels get sanitized to the same string. I
> > suspect it's unlikely to happen in practice but doing something like
> > percent encoding non-alphanumeric characters would avoid the problem
> > entirely.
> 
> I see there are "lets make sure it is unique by suffixing "-%d" in
> other codepaths; would that help if this piece of code yields a
> label that is not unique?

I was thinking the same thing.  Since we're being much less lenient on
what's allowed (which is fine), we're at increased risk for collision.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 18:29     ` Junio C Hamano
  2019-09-02 20:12       ` brian m. carlson
@ 2019-09-02 21:24       ` Philip Oakley
       [not found]         ` <CAOjrSZtw+wYHxFRQCfb80xzm9OsGDh2rW8uD+AYYdmDPxk5DFQ@mail.gmail.com>
  2019-09-03 11:19       ` Johannes Schindelin
  2 siblings, 1 reply; 22+ messages in thread
From: Philip Oakley @ 2019-09-02 21:24 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood; +Cc: Matt R via GitGitGadget, git, Matt R

On 02/09/2019 19:29, Junio C Hamano wrote:
> I see there are "lets make sure it is unique by suffixing "-%d" in
> other codepaths; would that help if this piece of code yields a
> label that is not unique?
maybe use a trailing 4 characters  of the oid to get a reasonably unique 
label?

Oh, just seen dscho's "we make sure that the labels are unique, via the 
`label_oid()` function!", maybe needs mentioning in the commit message 
if re-rolled.

Philip

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
       [not found]         ` <CAOjrSZtw+wYHxFRQCfb80xzm9OsGDh2rW8uD+AYYdmDPxk5DFQ@mail.gmail.com>
@ 2019-09-02 22:13           ` Philip Oakley
  0 siblings, 0 replies; 22+ messages in thread
From: Philip Oakley @ 2019-09-02 22:13 UTC (permalink / raw)
  To: Matt Rogers; +Cc: Junio C Hamano, Phillip Wood, Matt R via GitGitGadget, git

On 02/09/2019 22:47, Matt Rogers wrote:
> I can redo the commit, I had thought that I had previously fixed the 
> author but I guess I was mistaken.
>
> As for issues with non utf-8 encodings, I don't know of any simple way 
> to check for those except for restricting to asci alphanumeric characters

If I read the Wikipedia article [1] about the utf-8 design choices it is 
pretty reasonable and robust most of the time, though that maybe a part 
one way trapdoor.

> Also the code given doesn't resolve onelines that consist only of 
> restricted file names (e.g. COM, NUL, etc. On windows)

It maybe that the rebase doc may need (if it happens) a short comment 
warning of that.

Also need to check if the `label_oid()` function actually makes the 
label distinct, hence prevents such labels from being used as such a 
restricted file name - i.e. does it include the oid element.

Ultimately the label could be tweaked to have say the 4char prefix to 
fool the Windows 'starts with' name detection - which assumes I 
understand how some of those bad filenames are detected...
>
>
>
> On Mon, Sep 2, 2019, 5:24 PM Philip Oakley <philipoakley@iee.email> wrote:
>
>     On 02/09/2019 19:29, Junio C Hamano wrote:
>     > I see there are "lets make sure it is unique by suffixing "-%d" in
>     > other codepaths; would that help if this piece of code yields a
>     > label that is not unique?
>     maybe use a trailing 4 characters  of the oid to get a reasonably
>     unique
>     label?
>
>     Oh, just seen dscho's "we make sure that the labels are unique,
>     via the
>     `label_oid()` function!", maybe needs mentioning in the commit
>     message
>     if re-rolled.
>
>     Philip
>

[1] https://en.wikipedia.org/wiki/UTF-8#Description

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 18:29     ` Junio C Hamano
  2019-09-02 20:12       ` brian m. carlson
  2019-09-02 21:24       ` Philip Oakley
@ 2019-09-03 11:19       ` Johannes Schindelin
  2019-09-03 19:51         ` Junio C Hamano
  2 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2019-09-03 11:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Matt R via GitGitGadget, git, Matt Rogers

Hi Junio,

On Mon, 2 Sep 2019, Junio C Hamano wrote:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
> >>   		for (p1 = label.buf; *p1; p1++)
> >> -			if (isspace(*p1))
> >> +			if (!(*p1 & 0x80) && !isalnum(*p1))
> >>   				*(char *)p1 = '-';
> >
> > I'm sightly concerned that this opens the possibility for unexpected
> > effects if two different labels get sanitized to the same string. I
> > suspect it's unlikely to happen in practice but doing something like
> > percent encoding non-alphanumeric characters would avoid the problem
> > entirely.
>
> I'd rather see 'x' used instead of '-' (double-or-more dashes and
> leading dash in refnames may currently be allowed but double-or-more
> exes and leading ex would be much more likely to stay valid) if we
> just want to redact invalid characters.

Hmm. Let's take a concrete example from the VFS for Git fork:

	Merge pull request #160: trace2:gvfs:experiment Add experimental regions and data events to help diagnose checkout and reset perf problems

(Yes, we have some quite verbose merge commits, with very, very long
onelines. Not a good practice, we stopped doing it, but it was well
within what Git allows.)

And now use dashes to encode all white-space:

	Merge-pull-request-#160:-trace2:gvfs:experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems

Pretty long, but looks okay. Of course, it does not work, because colons. So here is the label with Matt's patch:

	Merge-pull-request--160--trace2-gvfs-experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems

And here is the label with your proposed xs.

	Mergexpullxrequestxx160xxtrace2xgvfsxexperimentxAddxexperimentalxregionsxandxdataxeventsxtoxhelpxdiagnosexcheckoutxandxresetxperfxproblems

I cannot speak for you, of course, but I can speak for myself: this is
not only way too reminiscent of xoxoxothxbye, but it is also really,
totally unreadable.

If you care deeply about double dashes and leading dashes, how about
this instead?

		char *from, *to;

   		for (from = to = label.buf; *from; from++)
			if ((*from & 0x80) || isalnum(*from))
				*(to++) = *from;
			/* avoid leading dash and double-dashes */
			else if (to != label.buf && to[-1] != '-')
   				*(to++) = '-';
		strbuf_setlen(&label, to - label.buf);

That would result in

	Merge-pull-request-160-trace2-gvfs-experiment-Add-experimental-regions-and-data-events-to-help-diagnose-checkout-and-reset-perf-problems

> I see there are "lets make sure it is unique by suffixing "-%d" in
> other codepaths; would that help if this piece of code yields a
> label that is not unique?

I'm way ahead of you. The sequencer already goes out of its way to
guarantee the uniqueness of the labels (it's part of the design, as
applied in 1644c73c6d4 (rebase-helper --make-script: introduce a flag to
rebase merges, 2018-04-25)).

The patch you are looking at in this thread is only concerned about the
initial phase, `label_oid()` does a lot more. Not only does it make the
label unique (case-insensitively!), it also prevents it from looking
like a full 40-hex digit SHA-1, so that we can guarantee that unique
abbreviations of commit hashes will work as labels, too.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-02 19:42     ` Johannes Schindelin
@ 2019-09-03 18:10       ` Junio C Hamano
  2019-11-18 20:51         ` Johannes Schindelin
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2019-09-03 18:10 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: phillip.wood, Matt R via GitGitGadget, git, Matt R

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> I'm sightly concerned that this opens the possibility for unexpected effects
>> if two different labels get sanitized to the same string. I suspect it's
>> unlikely to happen in practice but doing something like percent encoding
>> non-alphanumeric characters would avoid the problem entirely.
>
> Oh, but we make sure that the labels are unique, via the `label_oid()`
> function! Otherwise, we would not be able to label more than one merge
> parent ;-)

It somewhat feels suboptimal, from code followability's point of
view, to have this "pre-sanitization" to replace isspace() to a
dash, which is being extended to "all non-alnums", and the uniquefy
of labels in label_oid(), in two separate places.  I wonder if the
resulting code becomes easier to follow and harder to introduce new
bugs, if this part is made to just yield label.buf it obtained form
the log message as-is and leave the munging to label_oid()?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-03 11:19       ` Johannes Schindelin
@ 2019-09-03 19:51         ` Junio C Hamano
  2019-09-03 22:40           ` Matt Rogers
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2019-09-03 19:51 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Phillip Wood, Matt R via GitGitGadget, git, Matt Rogers

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> If you care deeply about double dashes and leading dashes, how about
> this instead?
>
> 		char *from, *to;
>
>    		for (from = to = label.buf; *from; from++)
> 			if ((*from & 0x80) || isalnum(*from))
> 				*(to++) = *from;
> 			/* avoid leading dash and double-dashes */
> 			else if (to != label.buf && to[-1] != '-')
>    				*(to++) = '-';
> 		strbuf_setlen(&label, to - label.buf);

Simple enough and is a good change when judged locally.

It still would cause readers to wonder if label_oid() later append
'-%d' to end up with double-dash near the end, etc., which made me
wonder if the resulting code becomes better if sanitization and
uniquefying are done at the same single place in the other message.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-03 19:51         ` Junio C Hamano
@ 2019-09-03 22:40           ` Matt Rogers
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Rogers @ 2019-09-03 22:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Phillip Wood, Matt R via GitGitGadget, git

I agree that the code locally was simple enough.

Ultimately I feel that sanitizing and uniqueifying the label should
probably be done closer together/at the same place.  I'm just not
familiar enough with the codebase to know a good place (if any) to move
that to.  Eventually though this would still need to be expanded further
to protect against reserved filenames (e.g. NUL on windows).  Although
the behavior around these (espescially with file extensions like
NUL.txt) become less reliable, and although they are much more unlikely
to be encountered in practice, are still allowed by git as oneliners.


On Tue, Sep 3, 2019 at 3:51 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > If you care deeply about double dashes and leading dashes, how about
> > this instead?
> >
> >               char *from, *to;
> >
> >               for (from = to = label.buf; *from; from++)
> >                       if ((*from & 0x80) || isalnum(*from))
> >                               *(to++) = *from;
> >                       /* avoid leading dash and double-dashes */
> >                       else if (to != label.buf && to[-1] != '-')
> >                               *(to++) = '-';
> >               strbuf_setlen(&label, to - label.buf);
>
> Simple enough and is a good change when judged locally.
>
> It still would cause readers to wonder if label_oid() later append
> '-%d' to end up with double-dash near the end, etc., which made me
> wonder if the resulting code becomes better if sanitization and
> uniquefying are done at the same single place in the other message.



-- 
Matthew Rogers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/2] Make git rebase -r's label generation more resilient
  2019-09-02 14:01 [PATCH 0/1] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
  2019-09-02 14:01 ` [PATCH 1/1] rebase -r: let `label` generate safer labels Matt R via GitGitGadget
@ 2019-11-17 23:16 ` Johannes Schindelin via GitGitGadget
  2019-11-17 23:16   ` [PATCH v2 1/2] rebase-merges: move labels' whitespace mangling into `label_oid()` Johannes Schindelin via GitGitGadget
                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-11-17 23:16 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano

Those labels must be valid ref names, and therefore valid file names. The
initial patch came in via Git for Windows.

Change since v1:

 * moved the entire sanitizing logic to label_oid(), as a preparatory step.

Johannes Schindelin (1):
  rebase-merges: move labels' whitespace mangling into `label_oid()`

Matthew Rogers (1):
  rebase -r: let `label` generate safer labels

 sequencer.c              | 72 +++++++++++++++++++++++++---------------
 t/t3430-rebase-merges.sh |  6 ++++
 2 files changed, 51 insertions(+), 27 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-327%2Fdscho%2Ffix-rebase-r-labels-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-327/dscho/fix-rebase-r-labels-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/327

Range-diff vs v1:

 -:  ---------- > 1:  3f4d086e80 rebase-merges: move labels' whitespace mangling into `label_oid()`
 1:  4a02c38442 ! 2:  adc22c8ba2 rebase -r: let `label` generate safer labels
     @@ -1,14 +1,15 @@
     -Author: Matt R <mattr94@gmail.com>
     +Author: Matthew Rogers <mattr94@gmail.com>
      
          rebase -r: let `label` generate safer labels
      
          The `label` todo command in interactive rebases creates temporary refs
          in the `refs/rewritten/` namespace. These refs are stored as loose refs,
          i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
     -    with file name limitations on the current filesystem.
     +    with file name limitations on the current filesystem in addition to the
     +    accepted ref format.
      
     -    This poses a problem in particular on NTFS/FAT, where e.g. the colon
     -    character is not a valid part of a file name.
     +    This poses a problem in particular on NTFS/FAT, where e.g. the colon,
     +    double-quote and pipe characters are disallowed as part of a file name.
      
          Let's safeguard against this by replacing not only white-space
          characters by dashes, but all non-alpha-numeric ones.
     @@ -23,8 +24,8 @@
       --- a/sequencer.c
       +++ b/sequencer.c
      @@
     - 		else
     - 			strbuf_addbuf(&label, &oneline);
     + 	} else {
     + 		struct strbuf *buf = &state->buf;
       
      +		/*
      +		 * Sanitize labels by replacing non-alpha-numeric characters
     @@ -36,18 +37,26 @@
      +		 * in file names. We do not validate the UTF-8 here, that's not
      +		 * the job of this function.
      +		 */
     - 		for (p1 = label.buf; *p1; p1++)
     --			if (isspace(*p1))
     -+			if (!(*p1 & 0x80) && !isalnum(*p1))
     - 				*(char *)p1 = '-';
     + 		for (; *label; label++)
     +-			strbuf_addch(buf, isspace(*label) ? '-' : *label);
     ++			if ((*label & 0x80) || isalnum(*label))
     ++				strbuf_addch(buf, *label);
     ++			/* avoid leading dash and double-dashes */
     ++			else if (buf->len && buf->buf[buf->len - 1] != '-')
     ++				strbuf_addch(buf, '-');
     ++		if (!buf->len) {
     ++			strbuf_addstr(buf, "rev-");
     ++			strbuf_add_unique_abbrev(buf, oid, default_abbrev);
     ++		}
     + 		label = buf->buf;
       
     - 		strbuf_reset(&buf);
     + 		if ((buf->len == the_hash_algo->hexsz &&
      
       diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
       --- a/t/t3430-rebase-merges.sh
       +++ b/t/t3430-rebase-merges.sh
      @@
     - 	test_path_is_missing .git/MERGE_HEAD
     + 	test_cmp expect G.t
       '
       
      +test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
     @@ -55,4 +64,5 @@
      +	git merge -m "colon: this should work" G &&
      +	git rebase --rebase-merges --force-rebase E
      +'
     ++
       test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/2] rebase-merges: move labels' whitespace mangling into `label_oid()`
  2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
@ 2019-11-17 23:16   ` Johannes Schindelin via GitGitGadget
  2019-11-17 23:16   ` [PATCH v2 2/2] rebase -r: let `label` generate safer labels Matthew Rogers via GitGitGadget
  2019-11-18  3:42   ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Junio C Hamano
  2 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-11-17 23:16 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

One of the trickier aspects of the design of `git rebase
--rebase-merges` is the way labels are generated for the initial todo
list: those labels are supposed to be intuitive and first and foremost
unique.

To that end, `label_oid()` appends a unique suffix when necessary.

Those labels not only need to be unique, but they also need to be valid
refs. To make sure of that, `make_script_with_merges()` replaces
whitespace by dashes.

That would appear to be the wrong layer for that sanitizing step,
though: all callers of `label_oid()` should get that same benefit.

Even if it does not make a difference currently (the only called of
`label_oid()` that passes a label that might need to be sanitized _is_
`make_script_with_merges()`), let's move the responsibility for
sanitizing labels into the `label_oid()` function.

This commit is best viewed with `-w` because it unfortunately needs to
change the indentation of a large block of code in `label_oid()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 56 ++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 8952cfa89b..85c66f489f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4423,7 +4423,6 @@ static const char *label_oid(struct object_id *oid, const char *label,
 	struct labels_entry *labels_entry;
 	struct string_entry *string_entry;
 	struct object_id dummy;
-	size_t len;
 	int i;
 
 	string_entry = oidmap_get(&state->commit2label, oid);
@@ -4443,10 +4442,10 @@ static const char *label_oid(struct object_id *oid, const char *label,
 	 * abbreviation for any uninteresting commit's names that does not
 	 * clash with any other label.
 	 */
+	strbuf_reset(&state->buf);
 	if (!label) {
 		char *p;
 
-		strbuf_reset(&state->buf);
 		strbuf_grow(&state->buf, GIT_MAX_HEXSZ);
 		label = p = state->buf.buf;
 
@@ -4469,32 +4468,37 @@ static const char *label_oid(struct object_id *oid, const char *label,
 				p[i] = save;
 			}
 		}
-	} else if (((len = strlen(label)) == the_hash_algo->hexsz &&
-		    !get_oid_hex(label, &dummy)) ||
-		   (len == 1 && *label == '#') ||
-		   hashmap_get_from_hash(&state->labels,
-					 strihash(label), label)) {
-		/*
-		 * If the label already exists, or if the label is a valid full
-		 * OID, or the label is a '#' (which we use as a separator
-		 * between merge heads and oneline), we append a dash and a
-		 * number to make it unique.
-		 */
+	} else {
 		struct strbuf *buf = &state->buf;
 
-		strbuf_reset(buf);
-		strbuf_add(buf, label, len);
+		for (; *label; label++)
+			strbuf_addch(buf, isspace(*label) ? '-' : *label);
+		label = buf->buf;
 
-		for (i = 2; ; i++) {
-			strbuf_setlen(buf, len);
-			strbuf_addf(buf, "-%d", i);
-			if (!hashmap_get_from_hash(&state->labels,
-						   strihash(buf->buf),
-						   buf->buf))
-				break;
-		}
+		if ((buf->len == the_hash_algo->hexsz &&
+		     !get_oid_hex(label, &dummy)) ||
+		    (buf->len == 1 && *label == '#') ||
+		    hashmap_get_from_hash(&state->labels,
+					  strihash(label), label)) {
+			/*
+			 * If the label already exists, or if the label is a
+			 * valid full OID, or the label is a '#' (which we use
+			 * as a separator between merge heads and oneline), we
+			 * append a dash and a number to make it unique.
+			 */
+			size_t len = buf->len;
 
-		label = buf->buf;
+			for (i = 2; ; i++) {
+				strbuf_setlen(buf, len);
+				strbuf_addf(buf, "-%d", i);
+				if (!hashmap_get_from_hash(&state->labels,
+							   strihash(buf->buf),
+							   buf->buf))
+					break;
+			}
+
+			label = buf->buf;
+		}
 	}
 
 	FLEX_ALLOC_STR(labels_entry, label, label);
@@ -4596,10 +4600,6 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		else
 			strbuf_addbuf(&label, &oneline);
 
-		for (p1 = label.buf; *p1; p1++)
-			if (isspace(*p1))
-				*(char *)p1 = '-';
-
 		strbuf_reset(&buf);
 		strbuf_addf(&buf, "%s -C %s",
 			    cmd_merge, oid_to_hex(&commit->object.oid));
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] rebase -r: let `label` generate safer labels
  2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
  2019-11-17 23:16   ` [PATCH v2 1/2] rebase-merges: move labels' whitespace mangling into `label_oid()` Johannes Schindelin via GitGitGadget
@ 2019-11-17 23:16   ` Matthew Rogers via GitGitGadget
  2019-11-18  3:42   ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Junio C Hamano
  2 siblings, 0 replies; 22+ messages in thread
From: Matthew Rogers via GitGitGadget @ 2019-11-17 23:16 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Matthew Rogers

From: Matthew Rogers <mattr94@gmail.com>

The `label` todo command in interactive rebases creates temporary refs
in the `refs/rewritten/` namespace. These refs are stored as loose refs,
i.e. as files in `.git/refs/rewritten/`, therefore they have to conform
with file name limitations on the current filesystem in addition to the
accepted ref format.

This poses a problem in particular on NTFS/FAT, where e.g. the colon,
double-quote and pipe characters are disallowed as part of a file name.

Let's safeguard against this by replacing not only white-space
characters by dashes, but all non-alpha-numeric ones.

However, we exempt non-ASCII UTF-8 characters from that, as it should be
quite possible to reflect branch names such as `↯↯↯` in refs/file names.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c              | 20 +++++++++++++++++++-
 t/t3430-rebase-merges.sh |  6 ++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 85c66f489f..fece07b680 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4471,8 +4471,26 @@ static const char *label_oid(struct object_id *oid, const char *label,
 	} else {
 		struct strbuf *buf = &state->buf;
 
+		/*
+		 * Sanitize labels by replacing non-alpha-numeric characters
+		 * (including white-space ones) by dashes, as they might be
+		 * illegal in file names (and hence in ref names).
+		 *
+		 * Note that we retain non-ASCII UTF-8 characters (identified
+		 * via the most significant bit). They should be all acceptable
+		 * in file names. We do not validate the UTF-8 here, that's not
+		 * the job of this function.
+		 */
 		for (; *label; label++)
-			strbuf_addch(buf, isspace(*label) ? '-' : *label);
+			if ((*label & 0x80) || isalnum(*label))
+				strbuf_addch(buf, *label);
+			/* avoid leading dash and double-dashes */
+			else if (buf->len && buf->buf[buf->len - 1] != '-')
+				strbuf_addch(buf, '-');
+		if (!buf->len) {
+			strbuf_addstr(buf, "rev-");
+			strbuf_add_unique_abbrev(buf, oid, default_abbrev);
+		}
 		label = buf->buf;
 
 		if ((buf->len == the_hash_algo->hexsz &&
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 9efcf4808a..f728aba995 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -468,4 +468,10 @@ test_expect_success '--rebase-merges with strategies' '
 	test_cmp expect G.t
 '
 
+test_expect_success '--rebase-merges with commit that can generate bad characters for filename' '
+	git checkout -b colon-in-label E &&
+	git merge -m "colon: this should work" G &&
+	git rebase --rebase-merges --force-rebase E
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/2] Make git rebase -r's label generation more resilient
  2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
  2019-11-17 23:16   ` [PATCH v2 1/2] rebase-merges: move labels' whitespace mangling into `label_oid()` Johannes Schindelin via GitGitGadget
  2019-11-17 23:16   ` [PATCH v2 2/2] rebase -r: let `label` generate safer labels Matthew Rogers via GitGitGadget
@ 2019-11-18  3:42   ` Junio C Hamano
  2019-11-18 11:23     ` [PATCH] sequencer: handle rebase-merge for "onto" message Danh Doan
  2019-11-18 20:52     ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin
  2 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2019-11-18  3:42 UTC (permalink / raw)
  To: Doan Tran Cong Danh
  Cc: git, Johannes Schindelin, Johannes Schindelin via GitGitGadget

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Those labels must be valid ref names, and therefore valid file names. The
> initial patch came in via Git for Windows.
>
> Change since v1:
>
>  * moved the entire sanitizing logic to label_oid(), as a preparatory step.
>
> Johannes Schindelin (1):
>   rebase-merges: move labels' whitespace mangling into `label_oid()`
>
> Matthew Rogers (1):
>   rebase -r: let `label` generate safer labels
>
>  sequencer.c              | 72 +++++++++++++++++++++++++---------------
>  t/t3430-rebase-merges.sh |  6 ++++
>  2 files changed, 51 insertions(+), 27 deletions(-)

I think Dscho meant to Cc you as these two patches are meant to be a
more complete solution to supersede your [*1*].


[Reference]

*1* <860dee65f49ea7eacf5a0c7c8ffe59095a51b1ce.1573205699.git.congdanhqx@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] sequencer: handle rebase-merge for "onto" message
  2019-11-18  3:42   ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Junio C Hamano
@ 2019-11-18 11:23     ` Danh Doan
  2019-11-18 11:57       ` [PATCH v2] sequencer: handle rebase-merges " Doan Tran Cong Danh
  2019-11-21  0:16       ` [PATCH] sequencer: handle rebase-merge " Junio C Hamano
  2019-11-18 20:52     ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin
  1 sibling, 2 replies; 22+ messages in thread
From: Danh Doan @ 2019-11-18 11:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Johannes Schindelin via GitGitGadget

On 2019-11-18 12:42:41+0900, Junio C Hamano <gitster@pobox.com> wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > Those labels must be valid ref names, and therefore valid file names. The
> > initial patch came in via Git for Windows.
> >
> > Change since v1:
> >
> >  * moved the entire sanitizing logic to label_oid(), as a preparatory step.
> >
> > Johannes Schindelin (1):
> >   rebase-merges: move labels' whitespace mangling into `label_oid()`
> >
> > Matthew Rogers (1):
> >   rebase -r: let `label` generate safer labels
> >
> >  sequencer.c              | 72 +++++++++++++++++++++++++---------------
> >  t/t3430-rebase-merges.sh |  6 ++++
> >  2 files changed, 51 insertions(+), 27 deletions(-)
> 
> I think Dscho meant to Cc you as these two patches are meant to be a
> more complete solution to supersede your [*1*].

I've applied their patches over my branches,
my introduced test_expect_failure was flipped.
That's nice.

Anyway, when reading their patch, I discovered a problem with
git rebase --rebase-merges when its message is onto.

Here is a patch to fix it.

I applied Dscho's patches over my dd/sequencer-utf8 then writing this
patch, in case you have problem applying it.

The context for the diff is coming from Dscho's patches.

-------8<--------------------
From 48205889b404b82baa4b30c2eedd52243c349e3e Mon Sep 17 00:00:00 2001
From: Doan Tran Cong Danh <congdanhqx@gmail.com>
Date: Mon, 18 Nov 2019 18:02:05 +0700
Subject: [PATCH] sequencer: handle rebase-merge for "onto" message

In order to work correctly, git-rebase --rebase-merges needs to make
initial todo list with unique labels.

Those unique labels is being handled by employing a hashmap and
suffixing an unique number if any duplicate is found.

But we forgat that beside of those labels for side branches,
we also make a special label `onto' for our so-called new-base.

In a special case that any of those labels for side branches named
`onto', git will run into trouble.

Correct it.

Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
---
 sequencer.c              |  5 +++++
 t/t3430-rebase-merges.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 350045b1b4..fc81e43f0f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4569,10 +4569,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 	strbuf_init(&state.buf, 32);
 
 	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
+		struct labels_entry *onto_label_entry;
 		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
 		FLEX_ALLOC_STR(entry, string, "onto");
 		oidcpy(&entry->entry.oid, oid);
 		oidmap_put(&state.commit2label, entry);
+
+		FLEX_ALLOC_STR(onto_label_entry, label, "onto");
+		hashmap_entry_init(&onto_label_entry->entry, strihash("onto"));
+		hashmap_add(&state.labels, &onto_label_entry->entry);
 	}
 
 	/*
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index f728aba995..4e2c0ede51 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -474,4 +474,25 @@ test_expect_success '--rebase-merges with commit that can generate bad character
 	git rebase --rebase-merges --force-rebase E
 '
 
+test_expect_success '--rebase-merges with message matched with onto label' '
+	git checkout -b onto-label E &&
+	git merge -m onto G &&
+	git rebase --rebase-merges --force-rebase E &&
+	test_cmp_graph <<-\EOF
+	*   onto
+	|\
+	| * G
+	| * F
+	* |   E
+	|\ \
+	| * | B
+	* | | D
+	| |/
+	|/|
+	* | C
+	|/
+	* A
+	EOF
+'
+
 test_done
-- 
2.24.0.10.gc61c3b979f.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2] sequencer: handle rebase-merges for "onto" message
  2019-11-18 11:23     ` [PATCH] sequencer: handle rebase-merge for "onto" message Danh Doan
@ 2019-11-18 11:57       ` Doan Tran Cong Danh
  2019-11-18 20:15         ` Johannes Schindelin
  2019-11-21  0:16       ` [PATCH] sequencer: handle rebase-merge " Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Doan Tran Cong Danh @ 2019-11-18 11:57 UTC (permalink / raw)
  To: git; +Cc: gitster, johannes.schindelin, Doan Tran Cong Danh

In order to work correctly, git-rebase --rebase-merges needs to make
initial todo list with unique labels.

Those unique labels is being handled by employing a hashmap and
appending an unique number if any duplicate is found.

But, we forget that beside those labels for side branches,
we also have a special label `onto' for our so-called new-base.

In a special case that any of those labels for side branches named
`onto', git will run into trouble.

Correct it.

Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
---
Sorry for the noise, I forgot to check spelling for v1

And I forgot to delete From line when append to my MUA.

Range-diff against v1:
1:  48205889b4 ! 1:  9246beacf2 sequencer: handle rebase-merge for "onto" message
    @@ Metadata
     Author: Doan Tran Cong Danh <congdanhqx@gmail.com>
     
      ## Commit message ##
    -    sequencer: handle rebase-merge for "onto" message
    +    sequencer: handle rebase-merges for "onto" message
     
         In order to work correctly, git-rebase --rebase-merges needs to make
         initial todo list with unique labels.
     
         Those unique labels is being handled by employing a hashmap and
    -    suffixing an unique number if any duplicate is found.
    +    appending an unique number if any duplicate is found.
     
    -    But we forgat that beside of those labels for side branches,
    -    we also make a special label `onto' for our so-called new-base.
    +    But, we forget that beside those labels for side branches,
    +    we also have a special label `onto' for our so-called new-base.
     
         In a special case that any of those labels for side branches named
         `onto', git will run into trouble.

 sequencer.c              |  5 +++++
 t/t3430-rebase-merges.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 350045b1b4..fc81e43f0f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4569,10 +4569,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 	strbuf_init(&state.buf, 32);
 
 	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
+		struct labels_entry *onto_label_entry;
 		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
 		FLEX_ALLOC_STR(entry, string, "onto");
 		oidcpy(&entry->entry.oid, oid);
 		oidmap_put(&state.commit2label, entry);
+
+		FLEX_ALLOC_STR(onto_label_entry, label, "onto");
+		hashmap_entry_init(&onto_label_entry->entry, strihash("onto"));
+		hashmap_add(&state.labels, &onto_label_entry->entry);
 	}
 
 	/*
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index f728aba995..4e2c0ede51 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -474,4 +474,25 @@ test_expect_success '--rebase-merges with commit that can generate bad character
 	git rebase --rebase-merges --force-rebase E
 '
 
+test_expect_success '--rebase-merges with message matched with onto label' '
+	git checkout -b onto-label E &&
+	git merge -m onto G &&
+	git rebase --rebase-merges --force-rebase E &&
+	test_cmp_graph <<-\EOF
+	*   onto
+	|\
+	| * G
+	| * F
+	* |   E
+	|\ \
+	| * | B
+	* | | D
+	| |/
+	|/|
+	* | C
+	|/
+	* A
+	EOF
+'
+
 test_done
-- 
2.24.0.10.gc61c3b979f.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] sequencer: handle rebase-merges for "onto" message
  2019-11-18 11:57       ` [PATCH v2] sequencer: handle rebase-merges " Doan Tran Cong Danh
@ 2019-11-18 20:15         ` Johannes Schindelin
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2019-11-18 20:15 UTC (permalink / raw)
  To: Doan Tran Cong Danh; +Cc: git, gitster

Hi,

On Mon, 18 Nov 2019, Doan Tran Cong Danh wrote:

> In order to work correctly, git-rebase --rebase-merges needs to make
> initial todo list with unique labels.
>
> Those unique labels is being handled by employing a hashmap and
> appending an unique number if any duplicate is found.
>
> But, we forget that beside those labels for side branches,
> we also have a special label `onto' for our so-called new-base.
>
> In a special case that any of those labels for side branches named
> `onto', git will run into trouble.
>
> Correct it.
>
> Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
> ---

Looks obviously correct to me. ACK!

Thank you,
Dscho

> Sorry for the noise, I forgot to check spelling for v1
>
> And I forgot to delete From line when append to my MUA.
>
> Range-diff against v1:
> 1:  48205889b4 ! 1:  9246beacf2 sequencer: handle rebase-merge for "onto" message
>     @@ Metadata
>      Author: Doan Tran Cong Danh <congdanhqx@gmail.com>
>
>       ## Commit message ##
>     -    sequencer: handle rebase-merge for "onto" message
>     +    sequencer: handle rebase-merges for "onto" message
>
>          In order to work correctly, git-rebase --rebase-merges needs to make
>          initial todo list with unique labels.
>
>          Those unique labels is being handled by employing a hashmap and
>     -    suffixing an unique number if any duplicate is found.
>     +    appending an unique number if any duplicate is found.
>
>     -    But we forgat that beside of those labels for side branches,
>     -    we also make a special label `onto' for our so-called new-base.
>     +    But, we forget that beside those labels for side branches,
>     +    we also have a special label `onto' for our so-called new-base.
>
>          In a special case that any of those labels for side branches named
>          `onto', git will run into trouble.
>
>  sequencer.c              |  5 +++++
>  t/t3430-rebase-merges.sh | 21 +++++++++++++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index 350045b1b4..fc81e43f0f 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4569,10 +4569,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  	strbuf_init(&state.buf, 32);
>
>  	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
> +		struct labels_entry *onto_label_entry;
>  		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
>  		FLEX_ALLOC_STR(entry, string, "onto");
>  		oidcpy(&entry->entry.oid, oid);
>  		oidmap_put(&state.commit2label, entry);
> +
> +		FLEX_ALLOC_STR(onto_label_entry, label, "onto");
> +		hashmap_entry_init(&onto_label_entry->entry, strihash("onto"));
> +		hashmap_add(&state.labels, &onto_label_entry->entry);
>  	}
>
>  	/*
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index f728aba995..4e2c0ede51 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -474,4 +474,25 @@ test_expect_success '--rebase-merges with commit that can generate bad character
>  	git rebase --rebase-merges --force-rebase E
>  '
>
> +test_expect_success '--rebase-merges with message matched with onto label' '
> +	git checkout -b onto-label E &&
> +	git merge -m onto G &&
> +	git rebase --rebase-merges --force-rebase E &&
> +	test_cmp_graph <<-\EOF
> +	*   onto
> +	|\
> +	| * G
> +	| * F
> +	* |   E
> +	|\ \
> +	| * | B
> +	* | | D
> +	| |/
> +	|/|
> +	* | C
> +	|/
> +	* A
> +	EOF
> +'
> +
>  test_done
> --
> 2.24.0.10.gc61c3b979f.dirty
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] rebase -r: let `label` generate safer labels
  2019-09-03 18:10       ` Junio C Hamano
@ 2019-11-18 20:51         ` Johannes Schindelin
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2019-11-18 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: phillip.wood, Matt R via GitGitGadget, git, Matt R

Hi Junio,

On Tue, 3 Sep 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> I'm sightly concerned that this opens the possibility for unexpected effects
> >> if two different labels get sanitized to the same string. I suspect it's
> >> unlikely to happen in practice but doing something like percent encoding
> >> non-alphanumeric characters would avoid the problem entirely.
> >
> > Oh, but we make sure that the labels are unique, via the `label_oid()`
> > function! Otherwise, we would not be able to label more than one merge
> > parent ;-)
>
> It somewhat feels suboptimal, from code followability's point of
> view, to have this "pre-sanitization" to replace isspace() to a
> dash, which is being extended to "all non-alnums", and the uniquefy
> of labels in label_oid(), in two separate places.  I wonder if the
> resulting code becomes easier to follow and harder to introduce new
> bugs, if this part is made to just yield label.buf it obtained form
> the log message as-is and leave the munging to label_oid()?

While the patch looks somewhat bulky without `-w` (due to the
indentation change), I agree that it is conceptually a good idea,
therefore I made it so.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/2] Make git rebase -r's label generation more resilient
  2019-11-18  3:42   ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Junio C Hamano
  2019-11-18 11:23     ` [PATCH] sequencer: handle rebase-merge for "onto" message Danh Doan
@ 2019-11-18 20:52     ` Johannes Schindelin
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2019-11-18 20:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Doan Tran Cong Danh, git, Johannes Schindelin via GitGitGadget

Hi Junio,

On Mon, 18 Nov 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > Those labels must be valid ref names, and therefore valid file names. The
> > initial patch came in via Git for Windows.
> >
> > Change since v1:
> >
> >  * moved the entire sanitizing logic to label_oid(), as a preparatory step.
> >
> > Johannes Schindelin (1):
> >   rebase-merges: move labels' whitespace mangling into `label_oid()`
> >
> > Matthew Rogers (1):
> >   rebase -r: let `label` generate safer labels
> >
> >  sequencer.c              | 72 +++++++++++++++++++++++++---------------
> >  t/t3430-rebase-merges.sh |  6 ++++
> >  2 files changed, 51 insertions(+), 27 deletions(-)
>
> I think Dscho meant to Cc you as these two patches are meant to be a
> more complete solution to supersede your [*1*].

Whoops, totally forgot. Thanks!
Dscho

>
>
> [Reference]
>
> *1* <860dee65f49ea7eacf5a0c7c8ffe59095a51b1ce.1573205699.git.congdanhqx@gmail.com>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] sequencer: handle rebase-merge for "onto" message
  2019-11-18 11:23     ` [PATCH] sequencer: handle rebase-merge for "onto" message Danh Doan
  2019-11-18 11:57       ` [PATCH v2] sequencer: handle rebase-merges " Doan Tran Cong Danh
@ 2019-11-21  0:16       ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2019-11-21  0:16 UTC (permalink / raw)
  To: Danh Doan; +Cc: git, Johannes Schindelin, Johannes Schindelin via GitGitGadget

Danh Doan <congdanhqx@gmail.com> writes:

> Anyway, when reading their patch, I discovered a problem with
> git rebase --rebase-merges when its message is onto.
>
> Here is a patch to fix it.
>
> I applied Dscho's patches over my dd/sequencer-utf8 then writing this
> patch, in case you have problem applying it.
>
> The context for the diff is coming from Dscho's patches.

Thanks.  While technically this is independent from the "safer
rebase-merges labels" topic (specifically its preparation step),
in the larger picture, this too is to ensure we do not use a wrong
string as a label ;-), so I'll queue it on top of those two patches,
just like how you developed.

Thanks.

> -------8<--------------------
> From 48205889b404b82baa4b30c2eedd52243c349e3e Mon Sep 17 00:00:00 2001
> From: Doan Tran Cong Danh <congdanhqx@gmail.com>
> Date: Mon, 18 Nov 2019 18:02:05 +0700
> Subject: [PATCH] sequencer: handle rebase-merge for "onto" message
>
> In order to work correctly, git-rebase --rebase-merges needs to make
> initial todo list with unique labels.
>
> Those unique labels is being handled by employing a hashmap and
> suffixing an unique number if any duplicate is found.
>
> But we forgat that beside of those labels for side branches,
> we also make a special label `onto' for our so-called new-base.
>
> In a special case that any of those labels for side branches named
> `onto', git will run into trouble.
>
> Correct it.
>
> Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
> ---
>  sequencer.c              |  5 +++++
>  t/t3430-rebase-merges.sh | 21 +++++++++++++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index 350045b1b4..fc81e43f0f 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4569,10 +4569,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  	strbuf_init(&state.buf, 32);
>  
>  	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
> +		struct labels_entry *onto_label_entry;
>  		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
>  		FLEX_ALLOC_STR(entry, string, "onto");
>  		oidcpy(&entry->entry.oid, oid);
>  		oidmap_put(&state.commit2label, entry);
> +
> +		FLEX_ALLOC_STR(onto_label_entry, label, "onto");
> +		hashmap_entry_init(&onto_label_entry->entry, strihash("onto"));
> +		hashmap_add(&state.labels, &onto_label_entry->entry);
>  	}
>  
>  	/*
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index f728aba995..4e2c0ede51 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -474,4 +474,25 @@ test_expect_success '--rebase-merges with commit that can generate bad character
>  	git rebase --rebase-merges --force-rebase E
>  '
>  
> +test_expect_success '--rebase-merges with message matched with onto label' '
> +	git checkout -b onto-label E &&
> +	git merge -m onto G &&
> +	git rebase --rebase-merges --force-rebase E &&
> +	test_cmp_graph <<-\EOF
> +	*   onto
> +	|\
> +	| * G
> +	| * F
> +	* |   E
> +	|\ \
> +	| * | B
> +	* | | D
> +	| |/
> +	|/|
> +	* | C
> +	|/
> +	* A
> +	EOF
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-11-21  0:16 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-02 14:01 [PATCH 0/1] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
2019-09-02 14:01 ` [PATCH 1/1] rebase -r: let `label` generate safer labels Matt R via GitGitGadget
2019-09-02 17:57   ` Phillip Wood
2019-09-02 18:29     ` Junio C Hamano
2019-09-02 20:12       ` brian m. carlson
2019-09-02 21:24       ` Philip Oakley
     [not found]         ` <CAOjrSZtw+wYHxFRQCfb80xzm9OsGDh2rW8uD+AYYdmDPxk5DFQ@mail.gmail.com>
2019-09-02 22:13           ` Philip Oakley
2019-09-03 11:19       ` Johannes Schindelin
2019-09-03 19:51         ` Junio C Hamano
2019-09-03 22:40           ` Matt Rogers
2019-09-02 19:42     ` Johannes Schindelin
2019-09-03 18:10       ` Junio C Hamano
2019-11-18 20:51         ` Johannes Schindelin
2019-11-17 23:16 ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin via GitGitGadget
2019-11-17 23:16   ` [PATCH v2 1/2] rebase-merges: move labels' whitespace mangling into `label_oid()` Johannes Schindelin via GitGitGadget
2019-11-17 23:16   ` [PATCH v2 2/2] rebase -r: let `label` generate safer labels Matthew Rogers via GitGitGadget
2019-11-18  3:42   ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Junio C Hamano
2019-11-18 11:23     ` [PATCH] sequencer: handle rebase-merge for "onto" message Danh Doan
2019-11-18 11:57       ` [PATCH v2] sequencer: handle rebase-merges " Doan Tran Cong Danh
2019-11-18 20:15         ` Johannes Schindelin
2019-11-21  0:16       ` [PATCH] sequencer: handle rebase-merge " Junio C Hamano
2019-11-18 20:52     ` [PATCH v2 0/2] Make git rebase -r's label generation more resilient Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).