git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] git-remote-mediawiki: escape double quotes and LF in file names
@ 2012-11-29 12:33 Matthieu Moy
  2012-11-29 16:25 ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 12:33 UTC (permalink / raw
  To: git, gitster; +Cc: Matthieu Moy

A mediawiki page can contain, and even start with a " character, we have
to escape it when generating the fast-export stream. While we're there,
also escape newlines, but I don't think we can get them from MediaWiki
pages.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 contrib/mw-to-git/git-remote-mediawiki | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index 68555d4..e7a0e7b 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -711,6 +711,13 @@ sub fetch_mw_revisions {
 	return ($n, @revisions);
 }
 
+sub fe_escape_path {
+    my $path = shift;
+    $path =~ s/"/\\"/g;
+    $path =~ s/\n/\\n/g;
+    return $path;
+}
+
 sub import_file_revision {
 	my $commit = shift;
 	my %commit = %{$commit};
@@ -738,15 +745,17 @@ sub import_file_revision {
 		print STDOUT "from refs/mediawiki/$remotename/master^0\n";
 	}
 	if ($content ne DELETED_CONTENT) {
-		print STDOUT "M 644 inline $title.mw\n";
+		print STDOUT "M 644 inline " .
+		    fe_escape_path($title . ".mw") . "\n";
 		literal_data($content);
 		if (%mediafile) {
-			print STDOUT "M 644 inline $mediafile{title}\n";
+			print STDOUT "M 644 inline "
+			    . fe_escape_path($mediafile{title}) . "\n";
 			literal_data_raw($mediafile{content});
 		}
 		print STDOUT "\n\n";
 	} else {
-		print STDOUT "D $title.mw\n";
+		print STDOUT "D " . fe_escape_path($title . ".mw") . "\n";
 	}
 
 	# mediawiki revision number in the git note
-- 
1.8.0.319.g8abfee4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] git-remote-mediawiki: escape double quotes and LF in file names
  2012-11-29 12:33 [PATCH] git-remote-mediawiki: escape double quotes and LF in file names Matthieu Moy
@ 2012-11-29 16:25 ` Junio C Hamano
  2012-11-29 16:57   ` Matthieu Moy
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-11-29 16:25 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> A mediawiki page can contain, and even start with a " character, we have
> to escape it when generating the fast-export stream. While we're there,
> also escape newlines, but I don't think we can get them from MediaWiki
> pages.
>
> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
> ---
>  contrib/mw-to-git/git-remote-mediawiki | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
> index 68555d4..e7a0e7b 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
> @@ -711,6 +711,13 @@ sub fetch_mw_revisions {
>  	return ($n, @revisions);
>  }
>  
> +sub fe_escape_path {
> +    my $path = shift;
> +    $path =~ s/"/\\"/g;
> +    $path =~ s/\n/\\n/g;
> +    return $path;
> +}

Is this sufficient?

My reading of the big comment at the beginning of fast-import.c is
that you would also want to quote each backslash; otherwise a
character (or an octal) after it will be taken as a C-style quoted
special letter, no?

>  sub import_file_revision {
>  	my $commit = shift;
>  	my %commit = %{$commit};
> @@ -738,15 +745,17 @@ sub import_file_revision {
>  		print STDOUT "from refs/mediawiki/$remotename/master^0\n";
>  	}
>  	if ($content ne DELETED_CONTENT) {
> -		print STDOUT "M 644 inline $title.mw\n";
> +		print STDOUT "M 644 inline " .
> +		    fe_escape_path($title . ".mw") . "\n";
>  		literal_data($content);
>  		if (%mediafile) {
> -			print STDOUT "M 644 inline $mediafile{title}\n";
> +			print STDOUT "M 644 inline "
> +			    . fe_escape_path($mediafile{title}) . "\n";
>  			literal_data_raw($mediafile{content});
>  		}
>  		print STDOUT "\n\n";
>  	} else {
> -		print STDOUT "D $title.mw\n";
> +		print STDOUT "D " . fe_escape_path($title . ".mw") . "\n";
>  	}
>  
>  	# mediawiki revision number in the git note

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] git-remote-mediawiki: escape double quotes and LF in file names
  2012-11-29 16:25 ` Junio C Hamano
@ 2012-11-29 16:57   ` Matthieu Moy
  2012-11-29 17:00     ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Matthieu Moy
  0 siblings, 1 reply; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 16:57 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>> +sub fe_escape_path {
>> +    my $path = shift;
>> +    $path =~ s/"/\\"/g;
>> +    $path =~ s/\n/\\n/g;
>> +    return $path;
>> +}
>
> Is this sufficient?
>
> My reading of the big comment at the beginning of fast-import.c is
> that you would also want to quote each backslash;

Nice catch, thanks.

Also, I was lacking the double-quotes around $path. In my defense, I had
only read the doc, not the code, and git-fast-import.txt was less
complete than fast-import.c. New patch follows, fixing the doc too.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 16:57   ` Matthieu Moy
@ 2012-11-29 17:00     ` Matthieu Moy
  2012-11-29 17:00       ` [PATCH 2/2 v2] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
  2012-11-29 18:11       ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 17:00 UTC (permalink / raw
  To: git, gitster; +Cc: Matthieu Moy

The documentation mentionned only newlines and double quotes as
characters needing escaping, but the backslash also needs it. Also, the
documentation was not clearly saying that double quotes around the file
name were required (double quotes in the examples could be interpreted as
part of the sentence, not part of the actual string).

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 Documentation/git-fast-import.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 6603a7a..35b909c 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -558,8 +558,9 @@ A `<path>` string must use UNIX-style directory separators (forward
 slash `/`), may contain any byte other than `LF`, and must not
 start with double quote (`"`).
 
-If an `LF` or double quote must be encoded into `<path>` shell-style
-quoting should be used, e.g. `"path/with\n and \" in it"`.
+If an `LF`, backslash or double quote must be encoded into `<path>`
+shell-style quoting should be used, and the complete name should be
+surrounded with double quotes e.g. `"path/with\n, \\ and \" in it"`.
 
 The value of `<path>` must be in canonical form. That is it must not:
 
-- 
1.8.0.319.g8abfee4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2 v2] git-remote-mediawiki: escape ", \, and LF in file names
  2012-11-29 17:00     ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Matthieu Moy
@ 2012-11-29 17:00       ` Matthieu Moy
  2012-11-29 18:11       ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 17:00 UTC (permalink / raw
  To: git, gitster; +Cc: Matthieu Moy

A mediawiki page can contain, and even start with a " character, we have
to escape it when generating the fast-export stream, as well as \
character. While we're there, also escape newlines, but I don't think we
can get them from MediaWiki pages.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 contrib/mw-to-git/git-remote-mediawiki      | 16 +++++++++++++---
 contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index 68555d4..094129d 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -711,6 +711,14 @@ sub fetch_mw_revisions {
 	return ($n, @revisions);
 }
 
+sub fe_escape_path {
+    my $path = shift;
+    $path =~ s/\\/\\\\/g;
+    $path =~ s/"/\\"/g;
+    $path =~ s/\n/\\n/g;
+    return '"' . $path . '"';
+}
+
 sub import_file_revision {
 	my $commit = shift;
 	my %commit = %{$commit};
@@ -738,15 +746,17 @@ sub import_file_revision {
 		print STDOUT "from refs/mediawiki/$remotename/master^0\n";
 	}
 	if ($content ne DELETED_CONTENT) {
-		print STDOUT "M 644 inline $title.mw\n";
+		print STDOUT "M 644 inline " .
+		    fe_escape_path($title . ".mw") . "\n";
 		literal_data($content);
 		if (%mediafile) {
-			print STDOUT "M 644 inline $mediafile{title}\n";
+			print STDOUT "M 644 inline "
+			    . fe_escape_path($mediafile{title}) . "\n";
 			literal_data_raw($mediafile{content});
 		}
 		print STDOUT "\n\n";
 	} else {
-		print STDOUT "D $title.mw\n";
+		print STDOUT "D " . fe_escape_path($title . ".mw") . "\n";
 	}
 
 	# mediawiki revision number in the git note
diff --git a/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh b/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
index 246d47d..b6405ce 100755
--- a/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
+++ b/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
@@ -318,4 +318,30 @@ test_expect_success 'git push with \ in format control' '
 '
 
 
+test_expect_success 'fast-import meta-characters in page name (mw -> git)' '
+	wiki_reset &&
+	wiki_editpage \"file\"_\\_foo "expect to be called \"file\"_\\_foo" false &&
+	git clone mediawiki::'"$WIKI_URL"' mw_dir_21 &&
+	test_path_is_file mw_dir_21/\"file\"_\\_foo.mw &&
+	wiki_getallpage ref_page_21 &&
+	test_diff_directories mw_dir_21 ref_page_21
+'
+
+
+test_expect_success 'fast-import meta-characters in page name (git -> mw) ' '
+	wiki_reset &&
+	git clone mediawiki::'"$WIKI_URL"' mw_dir_22 &&
+	(
+		cd mw_dir_22 &&
+		echo "this file is called \"file\"_\\_foo.mw" >\"file\"_\\_foo &&
+		git add . &&
+		git commit -am "file \"file\"_\\_foo" &&
+		git pull &&
+		git push
+	) &&
+	wiki_getallpage ref_page_22 &&
+	test_diff_directories mw_dir_22 ref_page_22
+'
+
+
 test_done
-- 
1.8.0.319.g8abfee4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 17:00     ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Matthieu Moy
  2012-11-29 17:00       ` [PATCH 2/2 v2] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
@ 2012-11-29 18:11       ` Jeff King
  2012-11-29 18:47         ` Matthieu Moy
  2012-11-29 19:15         ` [PATCH 1/2] " Junio C Hamano
  1 sibling, 2 replies; 16+ messages in thread
From: Jeff King @ 2012-11-29 18:11 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git, gitster

On Thu, Nov 29, 2012 at 06:00:54PM +0100, Matthieu Moy wrote:

> The documentation mentionned only newlines and double quotes as

s/nn/n/

> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 6603a7a..35b909c 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -558,8 +558,9 @@ A `<path>` string must use UNIX-style directory separators (forward
>  slash `/`), may contain any byte other than `LF`, and must not
>  start with double quote (`"`).
>  
> -If an `LF` or double quote must be encoded into `<path>` shell-style
> -quoting should be used, e.g. `"path/with\n and \" in it"`.
> +If an `LF`, backslash or double quote must be encoded into `<path>`
> +shell-style quoting should be used, and the complete name should be
> +surrounded with double quotes e.g. `"path/with\n, \\ and \" in it"`.

I think the point of the original is that you do not _need_ to quote
unless you have those two characters. IOW, you can do:

  M 100644 :1 file \with \backslashes

and it will stay in the "literal to the end of line" code path because
the path does not begin with a double-quote. It is only when you trigger
the "shell-style quoting" code path is in effect that you must then
follow the rules of that quoting (which includes escaping backslashes).
So technically, your modification to the beginning of the sentence is
not correct.

That being said, I think what you have written is more helpful to an end
user. There is no harm in quoting when we do not have to, as fast-import
implementations must know how to unquote anyway (and we over-quote in
fast-export in this case). And while the example above does work (and
was always designed to), it is sort of an unintuitive area that I would
not be surprised to see other fast-import implementations get wrong. As
a writer of a stream, it probably pays to be defensive and err on the
side of quoting more.

As for the text itself, a few minor punctuation suggestions:

> If an `LF`, backslash or double quote must be encoded
                       ^
                       missing comma as list delimiter

> into `<path>` shell-style quoting should be used, and the complete
               ^
               missing comma in if/then clause

This one was in the original as well, but it makes it harder to read and
is worth fixing.

> surrounded with double quotes e.g. `"path/with\n, \\ and \" in it"`.

Should the parenthetical be in parentheses (or a separate sentence)?

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 18:11       ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Jeff King
@ 2012-11-29 18:47         ` Matthieu Moy
  2012-11-29 18:54           ` Jeff King
  2012-11-29 19:15         ` [PATCH 1/2] " Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 18:47 UTC (permalink / raw
  To: Jeff King; +Cc: git, gitster

Jeff King <peff@peff.net> writes:

> So technically, your modification to the beginning of the sentence is
> not correct.

I'd say the resulting sentence is somehow incorrect, but not more than
the previous one (both say "if ..." without really telling what the
condition was).

>> If an `LF`, backslash or double quote must be encoded
>                        ^
>                        missing comma as list delimiter

Google tells me that my version was UK-correct but not US-correct. As
french, I have no opinion on the subject, I take yours ;-).

How about this:

A path can use the C-style string quoting (this is accepted in all
cases and mandatory if the filename starts with double quote or
contains `LF`). In C-style quoting, `LF`, backslash, and double quote
characters must be escaped by preceding them with a backslash. Also,
the complete name should be surrounded with double quotes (e.g.
`"path/with\n, \\ and \" in it"`).

This should be technically correct, and "this is accepted in all cases"
should encourrage people to use it.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 18:47         ` Matthieu Moy
@ 2012-11-29 18:54           ` Jeff King
  2012-11-29 19:11             ` [PATCH 1/2 v3] " Matthieu Moy
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2012-11-29 18:54 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git, gitster

On Thu, Nov 29, 2012 at 07:47:32PM +0100, Matthieu Moy wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So technically, your modification to the beginning of the sentence is
> > not correct.
> 
> I'd say the resulting sentence is somehow incorrect, but not more than
> the previous one (both say "if ..." without really telling what the
> condition was).

That's fair. There is a lot left unsaid in the original. :)

> >> If an `LF`, backslash or double quote must be encoded
> >                        ^
> >                        missing comma as list delimiter
> 
> Google tells me that my version was UK-correct but not US-correct. As
> french, I have no opinion on the subject, I take yours ;-).

Thanks, I had a vague recollection that it might be regional. I probably
should have looked it up myself.

> How about this:
> 
> A path can use the C-style string quoting (this is accepted in all
> cases and mandatory if the filename starts with double quote or
> contains `LF`). In C-style quoting, `LF`, backslash, and double quote
> characters must be escaped by preceding them with a backslash. Also,
> the complete name should be surrounded with double quotes (e.g.
> `"path/with\n, \\ and \" in it"`).
> 
> This should be technically correct, and "this is accepted in all cases"
> should encourrage people to use it.

I think that is much better, but it reads a little more easily to me if
we rearrange the second sentence. To complete my English bikeshedding,
here is how I would have written the whole paragraph:

  A path can use C-style string quoting; this is accepted in all cases
  and mandatory if the filename starts with double quote or contains
  `LF`. In C-style quoting, the complete name should be surrounded with
  double quotes, and any `LF`, backslash, or double quote characters
  must be escaped by preceding them with a backslash (e.g.,
  `"path/with\n, \\ and \" in it"`).

Feel free to incorporate or ignore any of my tweaks from that version.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 18:54           ` Jeff King
@ 2012-11-29 19:11             ` Matthieu Moy
  2012-11-29 19:11               ` [PATCH 2/2 v3] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
  2012-11-29 19:33               ` [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 19:11 UTC (permalink / raw
  To: git, gitster; +Cc: Matthieu Moy

The documentation mentionned only newlines and double quotes as
characters needing escaping, but the backslash also needs it. Also, the
documentation was not clearly saying that double quotes around the file
name were required (double quotes in the examples could be interpreted as
part of the sentence, not part of the actual string).

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
cut-and-paste of Peff's version, adapted from mine.

 Documentation/git-fast-import.txt | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 959e4d3..d1844ea 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -562,8 +562,12 @@ A `<path>` string must use UNIX-style directory separators (forward
 slash `/`), may contain any byte other than `LF`, and must not
 start with double quote (`"`).
 
-If an `LF` or double quote must be encoded into `<path>` shell-style
-quoting should be used, e.g. `"path/with\n and \" in it"`.
+A path can use C-style string quoting; this is accepted in all cases
+and mandatory if the filename starts with double quote or contains
+`LF`. In C-style quoting, the complete name should be surrounded with
+double quotes, and any `LF`, backslash, or double quote characters
+must be escaped by preceding them with a backslash (e.g.,
+`"path/with\n, \\ and \" in it"`).
 
 The value of `<path>` must be in canonical form. That is it must not:
 
-- 
1.8.0.319.g8abfee4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2 v3] git-remote-mediawiki: escape ", \, and LF in file names
  2012-11-29 19:11             ` [PATCH 1/2 v3] " Matthieu Moy
@ 2012-11-29 19:11               ` Matthieu Moy
  2012-11-29 19:33               ` [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Matthieu Moy @ 2012-11-29 19:11 UTC (permalink / raw
  To: git, gitster; +Cc: Matthieu Moy

A mediawiki page can contain, and even start with a " character, we have
to escape it when generating the fast-export stream, as well as \
character. While we're there, also escape newlines, but I don't think we
can get them from MediaWiki pages.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
 contrib/mw-to-git/git-remote-mediawiki      | 16 +++++++++++++---
 contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index 68555d4..094129d 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -711,6 +711,14 @@ sub fetch_mw_revisions {
 	return ($n, @revisions);
 }
 
+sub fe_escape_path {
+    my $path = shift;
+    $path =~ s/\\/\\\\/g;
+    $path =~ s/"/\\"/g;
+    $path =~ s/\n/\\n/g;
+    return '"' . $path . '"';
+}
+
 sub import_file_revision {
 	my $commit = shift;
 	my %commit = %{$commit};
@@ -738,15 +746,17 @@ sub import_file_revision {
 		print STDOUT "from refs/mediawiki/$remotename/master^0\n";
 	}
 	if ($content ne DELETED_CONTENT) {
-		print STDOUT "M 644 inline $title.mw\n";
+		print STDOUT "M 644 inline " .
+		    fe_escape_path($title . ".mw") . "\n";
 		literal_data($content);
 		if (%mediafile) {
-			print STDOUT "M 644 inline $mediafile{title}\n";
+			print STDOUT "M 644 inline "
+			    . fe_escape_path($mediafile{title}) . "\n";
 			literal_data_raw($mediafile{content});
 		}
 		print STDOUT "\n\n";
 	} else {
-		print STDOUT "D $title.mw\n";
+		print STDOUT "D " . fe_escape_path($title . ".mw") . "\n";
 	}
 
 	# mediawiki revision number in the git note
diff --git a/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh b/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
index 246d47d..b6405ce 100755
--- a/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
+++ b/contrib/mw-to-git/t/t9362-mw-to-git-utf8.sh
@@ -318,4 +318,30 @@ test_expect_success 'git push with \ in format control' '
 '
 
 
+test_expect_success 'fast-import meta-characters in page name (mw -> git)' '
+	wiki_reset &&
+	wiki_editpage \"file\"_\\_foo "expect to be called \"file\"_\\_foo" false &&
+	git clone mediawiki::'"$WIKI_URL"' mw_dir_21 &&
+	test_path_is_file mw_dir_21/\"file\"_\\_foo.mw &&
+	wiki_getallpage ref_page_21 &&
+	test_diff_directories mw_dir_21 ref_page_21
+'
+
+
+test_expect_success 'fast-import meta-characters in page name (git -> mw) ' '
+	wiki_reset &&
+	git clone mediawiki::'"$WIKI_URL"' mw_dir_22 &&
+	(
+		cd mw_dir_22 &&
+		echo "this file is called \"file\"_\\_foo.mw" >\"file\"_\\_foo &&
+		git add . &&
+		git commit -am "file \"file\"_\\_foo" &&
+		git pull &&
+		git push
+	) &&
+	wiki_getallpage ref_page_22 &&
+	test_diff_directories mw_dir_22 ref_page_22
+'
+
+
 test_done
-- 
1.8.0.319.g8abfee4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 18:11       ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Jeff King
  2012-11-29 18:47         ` Matthieu Moy
@ 2012-11-29 19:15         ` Junio C Hamano
  2012-11-29 19:19           ` Jeff King
  2012-11-30  9:39           ` Matthieu Moy
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2012-11-29 19:15 UTC (permalink / raw
  To: Jeff King; +Cc: Matthieu Moy, git

Jeff King <peff@peff.net> writes:

> On Thu, Nov 29, 2012 at 06:00:54PM +0100, Matthieu Moy wrote:
>
>> The documentation mentionned only newlines and double quotes as
>
> s/nn/n/
>
>> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
>> index 6603a7a..35b909c 100644
>> --- a/Documentation/git-fast-import.txt
>> +++ b/Documentation/git-fast-import.txt
>> @@ -558,8 +558,9 @@ A `<path>` string must use UNIX-style directory separators (forward
>>  slash `/`), may contain any byte other than `LF`, and must not
>>  start with double quote (`"`).
>>  
>> -If an `LF` or double quote must be encoded into `<path>` shell-style
>> -quoting should be used, e.g. `"path/with\n and \" in it"`.
>> +If an `LF`, backslash or double quote must be encoded into `<path>`
>> +shell-style quoting should be used, and the complete name should be
>> +surrounded with double quotes e.g. `"path/with\n, \\ and \" in it"`.

That "shell-style" contradicts with what fast-import.c says, though.
It claims to grok \octal and described as C-style.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 19:15         ` [PATCH 1/2] " Junio C Hamano
@ 2012-11-29 19:19           ` Jeff King
  2012-11-30  9:39           ` Matthieu Moy
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2012-11-29 19:19 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Matthieu Moy, git

On Thu, Nov 29, 2012 at 11:15:56AM -0800, Junio C Hamano wrote:

> >> -If an `LF` or double quote must be encoded into `<path>` shell-style
> >> -quoting should be used, e.g. `"path/with\n and \" in it"`.
> >> +If an `LF`, backslash or double quote must be encoded into `<path>`
> >> +shell-style quoting should be used, and the complete name should be
> >> +surrounded with double quotes e.g. `"path/with\n, \\ and \" in it"`.
> 
> That "shell-style" contradicts with what fast-import.c says, though.
> It claims to grok \octal and described as C-style.

Yeah, I think it was just laziness by the original author to use
"shell-style" to mean "quotes and backslash escaping" without thinking
too hard about which escape sequences are available. Saying C-style is
more accurate (and Matthieu's more recent update does that).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 19:11             ` [PATCH 1/2 v3] " Matthieu Moy
  2012-11-29 19:11               ` [PATCH 2/2 v3] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
@ 2012-11-29 19:33               ` Junio C Hamano
  2012-11-29 19:46                 ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2012-11-29 19:33 UTC (permalink / raw
  To: Matthieu Moy; +Cc: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> The documentation mentionned only newlines and double quotes as
> characters needing escaping, but the backslash also needs it. Also, the
> documentation was not clearly saying that double quotes around the file
> name were required (double quotes in the examples could be interpreted as
> part of the sentence, not part of the actual string).
>
> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
> ---
> cut-and-paste of Peff's version, adapted from mine.
>
>  Documentation/git-fast-import.txt | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 959e4d3..d1844ea 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -562,8 +562,12 @@ A `<path>` string must use UNIX-style directory separators (forward
>  slash `/`), may contain any byte other than `LF`, and must not
>  start with double quote (`"`).
>  
> -If an `LF` or double quote must be encoded into `<path>` shell-style
> -quoting should be used, e.g. `"path/with\n and \" in it"`.
> +A path can use C-style string quoting; this is accepted in all cases
> +and mandatory if the filename starts with double quote or contains
> +`LF`.

... or backslash?

> +In C-style quoting, the complete name should be surrounded with
> +double quotes, and any `LF`, backslash, or double quote characters
> +must be escaped by preceding them with a backslash (e.g.,
> +`"path/with\n, \\ and \" in it"`).
>  
>  The value of `<path>` must be in canonical form. That is it must not:

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 19:33               ` [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths Junio C Hamano
@ 2012-11-29 19:46                 ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2012-11-29 19:46 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Matthieu Moy, git

On Thu, Nov 29, 2012 at 11:33:19AM -0800, Junio C Hamano wrote:

> > diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> > index 959e4d3..d1844ea 100644
> > --- a/Documentation/git-fast-import.txt
> > +++ b/Documentation/git-fast-import.txt
> > @@ -562,8 +562,12 @@ A `<path>` string must use UNIX-style directory separators (forward
> >  slash `/`), may contain any byte other than `LF`, and must not
> >  start with double quote (`"`).
> >  
> > -If an `LF` or double quote must be encoded into `<path>` shell-style
> > -quoting should be used, e.g. `"path/with\n and \" in it"`.
> > +A path can use C-style string quoting; this is accepted in all cases
> > +and mandatory if the filename starts with double quote or contains
> > +`LF`.
> 
> ... or backslash?

No, that was what we discussed elsewhere in the thread. It is OK to say:

  M 100644 :1 file \with \backslashes

as de-quoting is triggered by the first character being double-quote.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-29 19:15         ` [PATCH 1/2] " Junio C Hamano
  2012-11-29 19:19           ` Jeff King
@ 2012-11-30  9:39           ` Matthieu Moy
  2012-12-02  2:27             ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Matthieu Moy @ 2012-11-30  9:39 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Jeff King, git

Junio C Hamano <gitster@pobox.com> writes:

> That "shell-style" contradicts with what fast-import.c says, though.
> It claims to grok \octal and described as C-style.

As Peff mentionned, my last version is better, although still a bit
incomplete. My new version documents things that _must_ be escaped, but
not what _can_.

I'm reluctant to try being exhaustive in the .txt documentation if there
is an exhaustive comment in the code. One option would be to actually
turn the comment into an official specification by moving it to the .txt
file, but that goes far beyond the scope of my patch.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths
  2012-11-30  9:39           ` Matthieu Moy
@ 2012-12-02  2:27             ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2012-12-02  2:27 UTC (permalink / raw
  To: Matthieu Moy; +Cc: Jeff King, git

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> That "shell-style" contradicts with what fast-import.c says, though.
>> It claims to grok \octal and described as C-style.
>
> As Peff mentionned, my last version is better, although still a bit
> incomplete. My new version documents things that _must_ be escaped, but
> not what _can_.

Yeah, I agree.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-12-02  2:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-29 12:33 [PATCH] git-remote-mediawiki: escape double quotes and LF in file names Matthieu Moy
2012-11-29 16:25 ` Junio C Hamano
2012-11-29 16:57   ` Matthieu Moy
2012-11-29 17:00     ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Matthieu Moy
2012-11-29 17:00       ` [PATCH 2/2 v2] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
2012-11-29 18:11       ` [PATCH 1/2] git-fast-import.txt: improve documentation for quoted paths Jeff King
2012-11-29 18:47         ` Matthieu Moy
2012-11-29 18:54           ` Jeff King
2012-11-29 19:11             ` [PATCH 1/2 v3] " Matthieu Moy
2012-11-29 19:11               ` [PATCH 2/2 v3] git-remote-mediawiki: escape ", \, and LF in file names Matthieu Moy
2012-11-29 19:33               ` [PATCH 1/2 v3] git-fast-import.txt: improve documentation for quoted paths Junio C Hamano
2012-11-29 19:46                 ` Jeff King
2012-11-29 19:15         ` [PATCH 1/2] " Junio C Hamano
2012-11-29 19:19           ` Jeff King
2012-11-30  9:39           ` Matthieu Moy
2012-12-02  2:27             ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).