From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 102291F5AE; Fri, 3 Jul 2020 23:30:33 +0000 (UTC) Date: Fri, 3 Jul 2020 23:30:32 +0000 From: Eric Wong To: "Eric W. Biederman" Cc: meta@public-inbox.org Subject: Re: [PATCH] Import: Be more careful with names in email Message-ID: <20200703233032.GA5810@dcvr> References: <87imf4qn87.fsf@x220.int.ebiederm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <87imf4qn87.fsf@x220.int.ebiederm.org> List-Id: "Eric W. Biederman" wrote: > > I recent encountered a nasty (in several sense of the word) from line: > > From: =?UTF-8?B?DQolLg0KSVwnbSBhIGhvcm55IGJyYXppbGlhbiBhbmQgSVwnbSBuZXcgaGVyZS4gSVwnbSBzZWFyY2hpbmcgZm9yIGEgZ3V5IHRvIGZ1Y2sgbWUgZnJvbSBiZWhpbmQgdGhpcyBldmVuaW5nLiBUZXh0IG1lIHNvb24gaHR0cDovL2NlbnRwZWFkb3RvdGFiLmNmL3JBbmdpZQ0KDQogJHB4Mm9ybTVuam1lDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0=?= > > After extract_cmt_info was extracted and decoded the ``name'', > passing that decoded ``name'' to git fast-import caused it > to choke. Something about a missing '>'. > > To prevent this happening in the future transform a few more characters > into spaces, and don't use string interpolation, use comma separated > variables instead. > > Signed-off-by: "Eric W. Biederman" > --- > > I honestly don't know if I have closed all of the holes > when implementing the code this way. But this changes works for me(tm). > Feel free to treat this as a bug report instead of a patch to apply. > > lib/PublicInbox/Import.pm | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm > index 66b359023bd2..df6c78368845 100644 > --- a/lib/PublicInbox/Import.pm > +++ b/lib/PublicInbox/Import.pm > @@ -310,7 +310,7 @@ sub extract_cmt_info ($;$) { > # ref: > # > if (defined $name) { > - $name =~ tr/<>//d; > + $name =~ tr/\n\r<>$/ /d; Is getting rid of '$' an effort to avoid double interpolation by Perl? Perl won't recursively expand variables AFAIK. I agree with dropping \r and \n, though. > utf8::encode($name); > } else { > $name = ''; > @@ -425,9 +425,10 @@ sub add { > print $w "reset $ref\n" or wfail; > } > > - print $w "commit $ref\nmark :$commit\n", > - "author $name <$email> $at\n", > - "committer $self->{ident} $ct\n" or wfail; > + # Be very careful with the strings from the email > + print $w "commit ", $ref, "\nmark :", $commit, "\n", > + "author ", $name, " <", $email, "> ", $at, "\n", > + "committer ", $self->{ident}, " ", $ct, "\n" or wfail; I haven't tested, but I'm not seeing this hunk as necessary once \r\n<> are removed. Thanks.