git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "René Scharfe" <l.s.r@web.de>, git@vger.kernel.org
Subject: Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
Date: Wed, 03 Jan 2024 08:37:59 -0800	[thread overview]
Message-ID: <xmqq8r56bcew.fsf@gitster.g> (raw)
In-Reply-To: <20240103090152.GB1866508@coredump.intra.peff.net> (Jeff King's message of "Wed, 3 Jan 2024 04:01:52 -0500")

Jeff King <peff@peff.net> writes:

> It looks like this is the original version. I posted a v2 that took
> René's suggestion to swap out the awk for shell, but it got overlooked.
> I'm happy enough either way, but if we want to salvage that effort,
> here's a patch which could go on top:

Thanks.  I was happy enough with the old one and placed the updated
one on backburner.

A commit message that explains why this incremental update (i.e.,
rewrite from awk to a shell loop) is a good idea below does make it
worthwhile ;-)

> -- >8 --
> From: René Scharfe <l.s.r@web.de>
> Subject: [PATCH] t1006: prefer shell loop to awk for packed object sizes
>
> To compute the expected on-disk size of packed objects, we sort the
> output of show-index by pack offset and then compute the difference
> between adjacent entries using awk. This works but has a few readability
> problems:
>
>   1. Reading the index in pack order means don't find out the size of an
>      oid's entry until we see the _next_ entry. So we have to save it to
>      print later.
>
>      We can instead iterate in reverse order, so we compute each oid's
>      size as we see it.

If you go forward, you need "the end of the previous round" (which
is "the beginning of the current round") to be subtracted from "the
end of the current round".  If you go forward, you have to have "the
beginning of the previous round" (which is "the end of the current
round") from which you subtract "the beginning of the current round".

So from that point of view, the only difference is that you would
not be ready to emit in the first round, and you would need to emit
for the last entry after the loop.  Because we happen to have the
end of the last entry outside the loop, we can omit the awkwardness.

OK.  But iterating over a list backwards is a bit awkward ;-).

>   2. Since the awk invocation is inside a text_expect block, we can't
>      easily use single-quotes to hold the script. So we use
>      double-quotes, but then have to escape the dollar signs in the awk
>      script.

Yup.  The joy of shell quoting rules ;-)

> I gave René authorship since this was his cleverness, but obviously I
> wrote the commit message. Giving an explicit signoff would be nice,
> though.

Indeed.

>  t/t1006-cat-file.sh | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 0c2eafae65..5ea3326128 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1117,14 +1117,16 @@ test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
>  		while read idx
>  		do
>  			git show-index <"$idx" >idx.raw &&
> -			sort -n <idx.raw >idx.sorted &&
> +			sort -nr <idx.raw >idx.sorted &&
>  			packsz=$(test_file_size "${idx%.idx}.pack") &&
>  			end=$((packsz - rawsz)) &&
> -			awk -v end="$end" "
> -			  NR > 1 { print oid, \$1 - start }
> -			  { start = \$1; oid = \$2 }
> -			  END { print oid, end - start }
> -			" idx.sorted ||
> +			while read start oid rest
> +			do
> +				size=$((end - start)) &&
> +				end=$start &&
> +				echo "$oid $size" ||
> +				return 1
> +			done <idx.sorted ||
>  			return 1
>  		done
>  	} >expect.raw &&

This is totally unrelated tangent, but the way "show-index" gets
invoked in the above loop makes readers wonder how the caller found
out which $idx file to read.

Of course, the above loop sits downstream of a pipe

    find .git/objects/pack -type f -name \*.idx

which means that any user of "git show-index" must be intimately
familiar with how the object database is structured.  I wonder if we
want an extra layer of abstraction, similar to how the reference
database can have different backend implementation.

Anyway, will queue.  Thanks.


  reply	other threads:[~2024-01-03 16:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-03  1:02 What's cooking in git.git (Jan 2024, #01; Tue, 2) Junio C Hamano
2024-01-03  5:53 ` ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2)) Patrick Steinhardt
2024-01-03  9:01 ` What's cooking in git.git (Jan 2024, #01; Tue, 2) Jeff King
2024-01-03 16:37   ` Junio C Hamano [this message]
2024-01-05  8:59     ` Jeff King
2024-01-05 16:34       ` Junio C Hamano
2024-01-03 17:14   ` René Scharfe
2024-01-03 16:43 ` Taylor Blau
2024-01-03 18:08   ` Junio C Hamano
2024-01-13 18:35     ` SZEDER Gábor
2024-01-13 22:06       ` Taylor Blau
2024-01-13 23:41         ` SZEDER Gábor
2024-01-16 20:37           ` Taylor Blau
2024-02-25 22:59             ` SZEDER Gábor
2024-02-26 14:44               ` Taylor Blau
2024-01-13 22:51       ` SZEDER Gábor
2024-01-16 20:49         ` Taylor Blau
2024-01-16 22:45           ` Junio C Hamano
2024-01-16 23:31             ` Taylor Blau
2024-01-16 23:42               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8r56bcew.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).