git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] remote-mediawiki: limit filenames to legal
@ 2017-10-29 16:37 Antoine Beaupré
  2017-10-29 18:10 ` [PATCH v2] " Antoine Beaupré
  2017-10-29 18:15 ` Antoine Beaupré
  0 siblings, 2 replies; 6+ messages in thread
From: Antoine Beaupré @ 2017-10-29 16:37 UTC (permalink / raw)
  To: git; +Cc: Antoine Beaupré

MediaWiki page names can be longer than NAME_MAX (generally 255)
characters, which makes checkout fail. Simply truncate the extra
characters, which may mean one page's content overwrites another's
(the last edit wins).

Ideally, we would use a smarter scheme to generate unique names, but
that would be more difficult and error-prone for a situation that
should rarely happen in the first place.
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..58870d197 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -18,6 +18,7 @@ use Git::Mediawiki qw(clean_filename smudge_filename connect_maybe
 					EMPTY HTTP_CODE_OK);
 use DateTime::Format::ISO8601;
 use warnings;
+use POSIX;
 
 # By default, use UTF-8 to communicate with Git and the user
 binmode STDERR, ':encoding(UTF-8)';
@@ -703,7 +704,7 @@ sub import_file_revision {
 		%mediafile = %{$mediafile};
 	}
 
-	my $title = $commit{title};
+	my $title = substr($commit{title}, 0, NAME_MAX);
 	my $comment = $commit{comment};
 	my $content = $commit{content};
 	my $author = $commit{author};
-- 
2.11.0



* [PATCH v2] remote-mediawiki: limit filenames to legal
  2017-10-29 16:37 [PATCH] remote-mediawiki: limit filenames to legal Antoine Beaupré
@ 2017-10-29 18:10 ` Antoine Beaupré
  2017-10-30 10:34   ` Matthieu Moy
  2017-10-29 18:15 ` Antoine Beaupré
  1 sibling, 1 reply; 6+ messages in thread
From: Antoine Beaupré @ 2017-10-29 18:10 UTC (permalink / raw)
  To: git; +Cc: Antoine Beaupré

MediaWiki page names can be longer than NAME_MAX (generally 255)
characters, which makes checkout fail. Simply truncate the extra
characters, which may mean one page's content overwrites another's
(the last edit wins).

Ideally, we would use a smarter scheme to generate unique names, but
that would be more difficult and error-prone for a situation that
should rarely happen in the first place.

Signed-off-by: Antoine Beaupré <anarcat@debian.org>
---
 contrib/mw-to-git/Git/Mediawiki.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm b/contrib/mw-to-git/Git/Mediawiki.pm
index d13c4dfa7..c9f22680a 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -2,6 +2,7 @@ package Git::Mediawiki;
 
 use 5.008;
 use strict;
+use POSIX;
 use Git;
 
 BEGIN {
@@ -52,7 +53,7 @@ sub smudge_filename {
 	$filename =~ s/ /_/g;
 	# Decode forbidden characters encoded in clean_filename
 	$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
-	return $filename;
+	return substr($filename, 0, NAME_MAX-3);
 }
 
 sub connect_maybe {
-- 
2.11.0



* (no subject)
  2017-10-29 16:37 [PATCH] remote-mediawiki: limit filenames to legal Antoine Beaupré
  2017-10-29 18:10 ` [PATCH v2] " Antoine Beaupré
@ 2017-10-29 18:15 ` Antoine Beaupré
  2017-10-29 18:15   ` [PATCH v3] remote-mediawiki: limit filenames to legal Antoine Beaupré
  1 sibling, 1 reply; 6+ messages in thread
From: Antoine Beaupré @ 2017-10-29 18:15 UTC (permalink / raw)
  To: git


Sorry for the noise here, but the original patch didn't fix the length
in the right place. v2 fixed it in the library properly, but I forgot
to also include the length of the suffix. This should be good to go...
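
A minimal sketch of the arithmetic behind the suffix adjustment, assuming the
'.mw' suffix is appended to the result of smudge_filename() by the caller. The
helper name truncate_for_suffix is hypothetical and is not part of
Git::Mediawiki; it only illustrates why the suffix length has to be reserved
before truncating.

```perl
use strict;
use warnings;
use POSIX qw(NAME_MAX);

# Hypothetical helper (not in Git::Mediawiki): cut the title so that the
# title plus the suffix together stay within NAME_MAX characters.
sub truncate_for_suffix {
	my ($title, $suffix) = @_;
	# Reserve room for the suffix before truncating the title.
	my $room = NAME_MAX() - length($suffix);
	return substr($title, 0, $room) . $suffix;
}

# A 300-character title is cut down; a short one is left untouched.
my $long_name  = truncate_for_suffix('a' x 300, '.mw');
my $short_name = truncate_for_suffix('Main_Page', '.mw');
printf "%d %s\n", length($long_name), $short_name;
```

Note that substr() counts characters, while NAME_MAX is a limit in bytes, so
a title with multi-byte UTF-8 characters could presumably still exceed the
limit after this truncation.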


* [PATCH v3] remote-mediawiki: limit filenames to legal
  2017-10-29 18:15 ` Antoine Beaupré
@ 2017-10-29 18:15   ` Antoine Beaupré
  0 siblings, 0 replies; 6+ messages in thread
From: Antoine Beaupré @ 2017-10-29 18:15 UTC (permalink / raw)
  To: git; +Cc: Antoine Beaupré

MediaWiki page names can be longer than NAME_MAX (generally 255)
characters, which makes checkout fail. Simply truncate the extra
characters, which may mean one page's content overwrites another's
(the last edit wins).

Ideally, we would use a smarter scheme to generate unique names, but
that would be more difficult and error-prone for a situation that
should rarely happen in the first place.

Signed-off-by: Antoine Beaupré <anarcat@debian.org>
---
 contrib/mw-to-git/Git/Mediawiki.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm b/contrib/mw-to-git/Git/Mediawiki.pm
index d13c4dfa7..917d9e2d3 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -2,6 +2,7 @@ package Git::Mediawiki;
 
 use 5.008;
 use strict;
+use POSIX;
 use Git;
 
 BEGIN {
@@ -52,7 +53,7 @@ sub smudge_filename {
 	$filename =~ s/ /_/g;
 	# Decode forbidden characters encoded in clean_filename
 	$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
-	return $filename;
+	return substr($filename, 0, NAME_MAX-length('.mw'));
 }
 
 sub connect_maybe {
-- 
2.11.0



* Re: [PATCH v2] remote-mediawiki: limit filenames to legal
  2017-10-29 18:10 ` [PATCH v2] " Antoine Beaupré
@ 2017-10-30 10:34   ` Matthieu Moy
  2017-10-30 12:31     ` Antoine Beaupré
  0 siblings, 1 reply; 6+ messages in thread
From: Matthieu Moy @ 2017-10-30 10:34 UTC (permalink / raw)
  To: Antoine Beaupré; +Cc: git

Antoine Beaupré <anarcat@debian.org> writes:

> @@ -52,7 +53,7 @@ sub smudge_filename {
>  	$filename =~ s/ /_/g;
>  	# Decode forbidden characters encoded in clean_filename
>  	$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
> -	return $filename;
> +	return substr($filename, 0, NAME_MAX-3);

There's a request to allow a configurable extension (.mediawiki would
help importing in some wikis, see
https://github.com/Git-Mediawiki/Git-Mediawiki/issues/42). You should at
least make this something like length(".mw") so that the next
search&replace for ".mw" finds this.

Also, note that your solution works for using Git-Mediawiki in a
read-only way, but if you start modifying and pushing such files, you'll
get into trouble. It probably makes sense to issue a warning in such a
case.

Regards,

-- 
Matthieu Moy
https://matthieu-moy.fr/
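
One way the suggested warning could look, as a rough sketch only. The helper
truncate_with_warning and the %seen hash are hypothetical and do not exist in
Git::Mediawiki, and the wording of the messages is an assumption. The sketch
warns whenever a title is truncated, and warns again when two different titles
end up with the same truncated name.

```perl
use strict;
use warnings;
use POSIX qw(NAME_MAX);

# Hypothetical bookkeeping: truncated name -> last full title that produced it.
my %seen;

# Hypothetical helper (not in Git::Mediawiki): truncate as in the patch, but
# tell the user when it happens and when two pages collide.
sub truncate_with_warning {
	my ($title, $suffix) = @_;
	my $room = NAME_MAX() - length($suffix);

	# Titles that already fit are returned unchanged, with no warning.
	return $title if length($title) <= $room;

	my $short = substr($title, 0, $room);
	warn "page title truncated to $room characters; pushing it back may fail\n";
	if (exists $seen{$short} && $seen{$short} ne $title) {
		warn "truncated name collides with another page; the last import wins\n";
	}
	$seen{$short} = $title;
	return $short;
}
```

Keeping the check in one helper would also keep the suffix length in a single
place if the extension later becomes configurable.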


* Re: [PATCH v2] remote-mediawiki: limit filenames to legal
  2017-10-30 10:34   ` Matthieu Moy
@ 2017-10-30 12:31     ` Antoine Beaupré
  0 siblings, 0 replies; 6+ messages in thread
From: Antoine Beaupré @ 2017-10-30 12:31 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On 2017-10-30 11:34:11, Matthieu Moy wrote:
> Antoine Beaupré <anarcat@debian.org> writes:
>
>> @@ -52,7 +53,7 @@ sub smudge_filename {
>>  	$filename =~ s/ /_/g;
>>  	# Decode forbidden characters encoded in clean_filename
>>  	$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
>> -	return $filename;
>> +	return substr($filename, 0, NAME_MAX-3);
>
> There's a request to allow a configurable extension (.mediawiki would
> help importing in some wikis, see
> https://github.com/Git-Mediawiki/Git-Mediawiki/issues/42). You should at
> least make this something like length(".mw") so that the next
> search&replace for ".mw" finds this.

I believe I did that in v3.

> Also, note that your solution works for using Git-Mediawiki in a
> read-only way, but if you start modifying and pushing such files, you'll
> get into trouble. It probably makes sense to issue a warning in such a
> case.

True. I didn't consider that, but then again the patch is not a
regression: you couldn't have pushed those repos in the first place
anyways...

A.

-- 
The history of any one part of the earth, like the life of a soldier,
consists of long periods of boredom and short periods of terror.
                       - British geologist Derek V. Ager

