user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 25/24] tighten up digit matches to ASCII for git output
Date: Wed,  5 Jun 2019 02:18:48 +0000
Message-ID: <20190605021848.29258-1-e@80x24.org>
In-Reply-To: <20190604112748.23598-1-e@80x24.org>

While I don't expect git to suddenly start spewing non-ASCII
digits in places I'd expect ASCII, using [0-9] instead of \d
(which Perl lets match any Unicode decimal digit) makes the
intent explicit for future hackers and reviewers.
---
 lib/PublicInbox/Git.pm      |  4 ++--
 lib/PublicInbox/Import.pm   | 10 +++++-----
 script/public-inbox-convert |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
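
For reviewers unfamiliar with the distinction: in Perl, \d matches
any Unicode decimal digit (e.g. U+0663 ARABIC-INDIC DIGIT THREE),
not only ASCII 0 through 9, and Perl's string-to-number conversion
only understands ASCII digits, so such a capture would numify to 0.
A minimal sketch contrasting \d, [0-9], and \d under the /a modifier
(perl 5.14+); the sample string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit that is not ASCII
my $s = "size \x{0663}";

my $unicode = $s =~ /^size \d+$/ ? 1 : 0;      # \d accepts any Unicode digit
my $ascii   = $s =~ /^size [0-9]+$/ ? 1 : 0;   # [0-9] accepts only ASCII 0..9
my $amod    = $s =~ /^size \d+$/a ? 1 : 0;     # /a restricts \d to ASCII

print "\\d: $unicode, [0-9]: $ascii, \\d with /a: $amod\n";
# prints "\d: 1, [0-9]: 0, \d with /a: 0"
```

Spelling out [0-9] keeps each regex self-explanatory without relying
on a modifier being present.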

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 9014e02..68445b3 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -141,7 +141,7 @@ again:
 		}
 		return;
 	}
-	$head =~ /^[0-9a-f]{40} \S+ (\d+)$/ or
+	$head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or
 		fail($self, "Unexpected result from git cat-file: $head");
 
 	my $size = $1;
@@ -319,7 +319,7 @@ sub modified ($) {
 	foreach my $oid (<$fh>) {
 		chomp $oid;
 		my $buf = cat_file($self, $oid) or next;
-		$$buf =~ /^committer .*?> (\d+) [\+\-]?\d+/sm or next;
+		$$buf =~ /^committer .*?> ([0-9]+) [\+\-]?[0-9]+/sm or next;
 		my $cmt_time = $1;
 		$modified = $cmt_time if $cmt_time > $modified;
 	}
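
The Git.pm hunks parse two kinds of git output: the
"<oid> <type> <size>" header printed by `git cat-file --batch`, and
the committer line of a raw commit object. A small sketch exercising
the tightened patterns; batch_size and committer_time are hypothetical
helpers written for illustration, not functions in PublicInbox::Git:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: object size from a `git cat-file --batch` header,
# or undef if the line does not match
sub batch_size {
	my ($head) = @_;
	$head =~ /^[0-9a-f]{40} \S+ ([0-9]+)$/ or return;
	return $1;
}

# hypothetical helper: committer time (seconds since the epoch) from a
# raw commit object, or undef
sub committer_time {
	my ($buf) = @_;
	$buf =~ /^committer .*?> ([0-9]+) [\+\-]?[0-9]+/sm or return;
	return $1;
}

my $oid = 'f' x 40;
my $size = batch_size("$oid blob 1234");
my $bad = batch_size("$oid blob \x{0663}"); # non-ASCII digit: rejected

my $commit = "tree $oid\n" .
	"author A U Thor <a\@example.com> 1559701128 +0000\n" .
	"committer A U Thor <a\@example.com> 1559701128 +0000\n\nmsg\n";
my $t = committer_time($commit);

printf "size=%s bad=%s time=%s\n", $size,
	defined($bad) ? $bad : 'undef', $t;
# prints "size=1234 bad=undef time=1559701128"
```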
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 81a38fb..2c4bad9 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -106,7 +106,7 @@ sub _cat_blob ($$$) {
 	local $/ = "\n";
 	my $info = <$r>;
 	defined $info or die "EOF from fast-import / cat-blob: $!";
-	$info =~ /\A[a-f0-9]{40} blob (\d+)\n\z/ or return;
+	$info =~ /\A[a-f0-9]{40} blob ([0-9]+)\n\z/ or return;
 	my $left = $1;
 	my $offset = 0;
 	my $buf = '';
@@ -493,9 +493,9 @@ sub clean_purge_buffer {
 
 	foreach my $i (0..$#$buf) {
 		my $l = $buf->[$i];
-		if ($l =~ /^author .* (\d+ [\+-]?\d+)$/) {
+		if ($l =~ /^author .* ([0-9]+ [\+-]?[0-9]+)$/) {
 			$buf->[$i] = "author <> $1\n";
-		} elsif ($l =~ /^data (\d+)/) {
+		} elsif ($l =~ /^data ([0-9]+)/) {
 			$buf->[$i++] = "data " . length($cmt_msg) . "\n";
 			$buf->[$i] = $cmt_msg;
 			last;
@@ -525,7 +525,7 @@ sub purge_oids {
 				@buf = ();
 			}
 			push @buf, "commit $tmp\n";
-		} elsif (/^data (\d+)/) {
+		} elsif (/^data ([0-9]+)/) {
 			# only commit message, so $len is small:
 			my $len = $1; # + 1 for trailing "\n"
 			push @buf, $_;
@@ -557,7 +557,7 @@ sub purge_oids {
 			@buf = ();
 		} elsif ($_ eq "done\n") {
 			$done = 1;
-		} elsif (/^mark :(\d+)$/) {
+		} elsif (/^mark :([0-9]+)$/) {
 			push @buf, $_;
 			$mark = $1;
 		} else {
diff --git a/script/public-inbox-convert b/script/public-inbox-convert
index bd8fb98..99480c3 100755
--- a/script/public-inbox-convert
+++ b/script/public-inbox-convert
@@ -103,7 +103,7 @@ while (<$rd>) {
 		$state = 'blob';
 	} elsif (/^commit /) {
 		$state = 'commit';
-	} elsif (/^data (\d+)/) {
+	} elsif (/^data ([0-9]+)/) {
 		my $len = $1;
 		$w->print($_) or $im->wfail;
 		while ($len) {
@@ -114,7 +114,7 @@ while (<$rd>) {
 		}
 		next;
 	} elsif ($state eq 'commit') {
-		if (m{^M 100644 :(\d+) (${h}{2}/${h}{38})}o) {
+		if (m{^M 100644 :([0-9]+) (${h}{2}/${h}{38})}o) {
 			my ($mark, $path) = ($1, $2);
 			$D{$path} = $mark;
 			if ($last && $last ne 'm') {
@@ -134,7 +134,7 @@ while (<$rd>) {
 			$last = 'd';
 			next;
 		}
-		if (m{^from (:\d+)}) {
+		if (m{^from (:[0-9]+)}) {
 			$prev = $from;
 			$from = $1;
 			# no next
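
Import.pm and public-inbox-convert parse git-fast-import streams,
where "data <count>" is followed by exactly that many bytes and
"mark :<idnum>" names an object for later "M" and "from" references.
A rough sketch of such a parsing loop over a hand-written stream; the
variable names and sample stream are illustrative assumptions, not
code from public-inbox-convert:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hand-written sample stream; "data 6" covers "hello\n" exactly
my $stream = <<'EOF';
blob
mark :1
data 6
hello
commit refs/heads/master
mark :2
committer A U Thor <a@example.com> 1559701128 +0000
data 4
msg
M 100644 :1 ab/cdef
EOF

open(my $rd, '<', \$stream) or die "open: $!";
my (%blob_len, %path_mark);
my ($state, $mark) = ('', 0);
while (my $line = <$rd>) {
	if ($line =~ /^blob$/) {
		$state = 'blob';
	} elsif ($line =~ /^commit /) {
		$state = 'commit';
	} elsif ($line =~ /^mark :([0-9]+)$/) {
		$mark = $1;
	} elsif ($line =~ /^data ([0-9]+)$/) {
		# exactly $1 raw bytes follow; loop since pipes may short-read
		my ($want, $left, $buf) = ($1, $1, '');
		while ($left > 0) {
			my $n = read($rd, $buf, $left, length($buf));
			defined $n or die "read: $!";
			$n or die "unexpected EOF inside data";
			$left -= $n;
		}
		$blob_len{$mark} = $want if $state eq 'blob';
	} elsif ($line =~ /^M 100644 :([0-9]+) (.+)$/) {
		$path_mark{$2} = $1;
	}
}
close $rd;
printf "blob :1 is %d bytes; ab/cdef -> :%d\n",
	$blob_len{1}, $path_mark{'ab/cdef'};
# prints "blob :1 is 6 bytes; ab/cdef -> :1"
```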
-- 
EW



Thread overview: 26+ messages
2019-06-04 11:27 [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
2019-06-04 11:27 ` [PATCH 01/24] linkify: support Internationalized Domain Names in URLs Eric Wong
2019-06-04 11:27 ` [PATCH 02/24] nntp: be explicit about ASCII digit matches Eric Wong
2019-06-04 11:27 ` [PATCH 03/24] nntp: ensure we only handle ASCII whitespace Eric Wong
2019-06-04 11:27 ` [PATCH 04/24] mid: id_compress requires ASCII-clean words Eric Wong
2019-06-04 11:27 ` [PATCH 05/24] feed: only accept ASCII digits for ref~$N Eric Wong
2019-06-04 11:27 ` [PATCH 06/24] http: require SERVER_PORT to be ASCII digit Eric Wong
2019-06-04 11:27 ` [PATCH 07/24] wwwlisting: require ASCII digit for port number Eric Wong
2019-06-04 11:27 ` [PATCH 08/24] wwwattach: only pass the charset through if ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
2019-06-04 11:27 ` [PATCH 10/24] www: require ASCII filenames in git blob downloads Eric Wong
2019-06-04 11:27 ` [PATCH 11/24] config: do not accept non-ASCII digits in cgitrc params Eric Wong
2019-06-04 11:27 ` [PATCH 12/24] newswww: only accept ASCII digits as article numbers Eric Wong
2019-06-04 11:27 ` [PATCH 13/24] view: require YYYYmmDD(HHMMSS) timestamps to be ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 14/24] githttpbackend: require Range:, Status: to be ASCII digits Eric Wong
2019-06-04 11:27 ` [PATCH 15/24] searchview: do not allow non-ASCII offsets and limits Eric Wong
2019-06-04 11:27 ` [PATCH 16/24] msgtime: require ASCII digits for parsing dates Eric Wong
2019-06-04 11:27 ` [PATCH 17/24] filter/rubylang: require ASCII digit for mailcount Eric Wong
2019-06-04 11:27 ` [PATCH 18/24] inbox: require ASCII digits for feedmax var Eric Wong
2019-06-04 11:27 ` [PATCH 19/24] solver|viewdiff: restrict digit matches to ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 20/24] www: require ASCII digit for git epoch Eric Wong
2019-06-04 11:27 ` [PATCH 21/24] require ASCII digits for local FS items Eric Wong
2019-06-04 11:27 ` [PATCH 22/24] githttpbackend: require ASCII in path Eric Wong
2019-06-04 11:27 ` [PATCH 23/24] www: require ASCII range for mbox downloads Eric Wong
2019-06-04 11:27 ` [PATCH 24/24] www: require ASCII word characters for CSS filenames Eric Wong
2019-06-05  2:18 ` Eric Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190605021848.29258-1-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git