From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id BC1E31F452 for ; Tue, 25 Apr 2023 05:24:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232390AbjDYFYE (ORCPT ); Tue, 25 Apr 2023 01:24:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229637AbjDYFYC (ORCPT ); Tue, 25 Apr 2023 01:24:02 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E0D524C0F for ; Mon, 24 Apr 2023 22:24:00 -0700 (PDT) Received: (qmail 21584 invoked by uid 109); 25 Apr 2023 05:24:00 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Tue, 25 Apr 2023 05:24:00 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 466 invoked by uid 111); 25 Apr 2023 05:23:59 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Tue, 25 Apr 2023 01:23:59 -0400 Authentication-Results: peff.net; auth=none Date: Tue, 25 Apr 2023 01:23:59 -0400 From: Jeff King To: Junio C Hamano Cc: Thomas Bock , Derrick Stolee , git@vger.kernel.org Subject: Re: [PATCH 2/3] parse_commit(): parse timestamp from end of line Message-ID: <20230425052359.GA4007491@coredump.intra.peff.net> References: <20230422134150.GA3516940@coredump.intra.peff.net> <20230422134703.GB3942326@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Apr 24, 2023 at 10:05:42AM -0700, Junio C Hamano wrote: > > + /* > > + * parse to end-of-line and then walk backwards, which > > + * handles some malformed cases. > > + */ > > I would say "parse to" -> "jump to", but technically moving forward > looking for a LF byte is still "parsing". "some" malformed cases > being "most plausible" ones (due to how ident.c::fmt_ident() is what > writes '>' after the string end-user gave as e-mail) may be worth > mentioning. I'll expand this to: /* * Jump to end-of-line so that we can walk backwards to find the * end-of-email ">". This is more forgiving of malformed cases * because unexpected characters tend to be in the name and email * fields. */ > > - dateptr = buf; > > - while (buf < tail && *buf++ != '\n') > > + for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--) > > /* nada */; > > OK. Just a style thing, but I found that "; /* nada */" is easier > to spot that there is an empty statement there. I found it ugly, too, but it's from earlier. Since the point is to advance "dateptr", it may be better to turn it into a while loop anyway, which side-steps the empty statement altogether: dateptr = eol; while (dateptr > buf && dateptr[-1] != '>') dateptr--; > > - if (buf >= tail) > > + if (dateptr == buf || dateptr == eol) > > return 0; > > Curious when dateptr that wanted to scan back from eol is still at > eol after the loop. It is when the ident line ends with ">" without > any timestamp/tz info. And the reason why we need to check that > here is ... Yeah, though as you saw in the next patch, it is not really sufficient. :) I think it may be redundant, though. In the original we already bailed earlier if we didn't find a ">" (because "buf >= tail" after we advanced it looking for ">"). Here "dateptr == buf" is checking the same thing (because we walked backwards). In the original we'd bail if we failed to find a newline as part of this loop (because "buf >= tail" after looking for a newline). But that can't happen here; we've already bailed after memchr() failed to find a newline). So "dateptr == eol" only triggers if there were no characters in "buf" to look at. So I added it only to keep this comment trivially true: > > - /* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */ > > + > > + /* dateptr < eol && *eol == '\n', so parsing will stop at eol */ > > return parse_timestamp(dateptr, NULL, 10); > > ... because parse_timestamp() is merely strtoumax() and would > happily skip over arbitrary number of leading "whitespace" without > stopping if (dateptr == eol && *eol == '\n'). OK, sad but correct. But it could also read "dateptr <= eol" and still be true (which is to say it is mostly accurate, but not quite because of the "soaking up whitespace" problem fixed by the next patch. I'll leave the extra condition in this patch, since it's orthogonal to what this patch is fixing. But in the next one I'll remove it and expand the comment to explain a bit more. -Peff