Message-Id: <31af383fd439c3c0a5003598961acfecfae4018c.1667934510.git.gitgitgadget@gmail.com>
From: "Eric Sunshine via GitGitGadget"
Date: Tue, 08 Nov 2022 19:08:28 +0000
Subject: [PATCH 2/4] chainlint: tighten accuracy when consuming input stream
To: git@vger.kernel.org
Cc: Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine

From: Eric Sunshine

To extract the next token in the input stream,
Lexer::scan_token() finds the start of the token by skipping whitespace,
then consumes characters belonging to the token until it encounters a
non-token character, such as an operator, punctuation, or whitespace. In
the case of an operator or punctuation which ends a token, before
returning the just-scanned token, it pushes that operator or punctuation
character back onto the input stream to ensure that it will be the first
character consumed by the next call to scan_token().

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the input
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.

Signed-off-by: Eric Sunshine
---
 t/chainlint.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/chainlint.pl b/t/chainlint.pl
index 9908de6c758..1f66c03c593 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -179,7 +179,7 @@ RESTART:
 		# handle special characters
 		last unless $$b =~ /\G(.)/sgc;
 		my $c = $1;
-		last if $c =~ /^[ \t]$/; # whitespace ends token
+		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
 		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
 		$token .= $self->scan_sqstring(), next if $c eq "'";
 		$token .= $self->scan_dqstring(), next if $c eq '"';
-- 
gitgitgadget
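
The effect on token end positions can be illustrated with a stripped-down
scanner. This is only a sketch under stated assumptions, not the code in
t/chainlint.pl: scan_token() here is a simplified free function rather than
the Lexer method, it ignores quoting, comments, and line counting, and the
$pushback flag and tokenize() helper are hypothetical names introduced for
illustration. With $pushback false it mimics the old lax behavior; with
$pushback true it mimics the tightened behavior.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified, hypothetical stand-in for Lexer::scan_token().
# $buf is a reference to the input; $pushback selects whether the
# whitespace character that ended a token is pushed back onto the input.
sub scan_token {
    my ($buf, $pushback) = @_;
    $$buf =~ /\G[ \t]+/gc;                  # skip leading whitespace
    my $start = pos($$buf) // 0;
    my $token = '';
    # slurp up ordinary (non-special) characters
    $token = $1 if $$buf =~ /\G([^ \t;&|<>(){}\n]+)/gc;
    if ($$buf =~ /\G(.)/sgc) {              # consume the following character
        my $c = $1;
        if ($c =~ /^[ \t]$/) {
            pos($$buf)-- if $pushback;      # the behavior the patch changes
        } elsif (length($token)) {
            pos($$buf)--;                   # operator/punctuation: always pushed back
        } else {
            $token = $c;                    # a lone operator is its own token
        }
    }
    return ($token, $start, pos($$buf) // 0);
}

# Hypothetical helper: scan all tokens, formatted as "token@start-end".
sub tokenize {
    my ($text, $pushback) = @_;
    my @out;
    while ((pos($text) // 0) < length($text)) {
        my ($tok, $s, $e) = scan_token(\$text, $pushback);
        last unless length($tok);
        push @out, sprintf('%s@%d-%d', $tok, $s, $e);
    }
    return @out;
}

print join(' ', tokenize("foo bar;", 0)), "\n";  # foo@0-4 bar@4-7 ;@7-8
print join(' ', tokenize("foo bar;", 1)), "\n";  # foo@0-3 bar@4-7 ;@7-8
```

Both modes yield the same tokens; only the reported end of "foo" differs.
Without the pushback the end offset (4) includes the consumed space, whereas
with it the end offset (3) lands exactly at the token's last character.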