git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: "René Scharfe" <l.s.r@web.de>,
	"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: szeder.dev@gmail.com, me@ttaylorr.com,
	Derrick Stolee <derrickstolee@github.com>,
	Abhishek Kumar <abhishekkumar8222@gmail.com>
Subject: Re: [PATCH 00/15] Refactor chunk-format into an API
Date: Fri, 4 Dec 2020 08:57:34 -0500	[thread overview]
Message-ID: <3e3387b6-d2ab-7d86-0535-2ef182fbc469@gmail.com> (raw)
In-Reply-To: <ee0b73f7-8f59-a1dc-0a21-bf796bf9f2e2@web.de>

On 12/4/2020 7:48 AM, René Scharfe wrote:
> Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget:
...
>>  Documentation/technical/chunk-format.txt      |  54 ++
>>  .../technical/commit-graph-format.txt         |   3 +
>>  Documentation/technical/pack-format.txt       |   3 +
>>  Makefile                                      |   1 +
>>  chunk-format.c                                | 105 ++++
>>  chunk-format.h                                |  69 +++
>>  commit-graph.c                                | 298 ++++++-----
>>  midx.c                                        | 466 ++++++++----------
>>  t/t5318-commit-graph.sh                       |   2 +-
>>  t/t5319-multi-pack-index.sh                   |   6 +-
>>  10 files changed, 623 insertions(+), 384 deletions(-)
> 
> 623-384-54-3-3-1-69-2-6 = 101
> 
> So if we ignore changes to documentation, headers, tests and build
> script this spends ca. 100 more lines of code than the current version.
> That's roughly the size of the new file chunk-format.c -- from this
> bird's-eye-view the new API seems to be pure overhead.

Overhead in terms of lines of code, but many of those are function
prototypes and single lines containing only "{" and "}". So yes,
the code files are a bit longer, but the amount of executed code is
not meaningfully different.

Extra lines of code is an expected cost of refactoring. The remaining
question is, "is it worth the cost?" I believe it is.
 
> In the new code I see several magic numbers, use of void pointers and
> casting as well as repetition -- is this really going in the right
> direction?  I get the feeling that YAGNI.

void pointers are a cost of abstraction in C that we use all over the
codebase.

You (and Junio) are right to point out my magic numbers. Those should
be replaced with something better when possible.

As far as YAGNI, I doubt that very much. First, we have already seen
extensions to the commit-graph that added several new chunks, and
plugging into this (documented) API should be easier than the previous
ad-hoc mechanism.

I've CC'd Abhishek to get his opinion, since he's recently added chunks
to the commit-graph file. Outside of the fact that this series conflicts
with his series (which I will fix), it would be good to see if he
appreciates this model.

>> I was thinking about file formats recently and realized that the "chunks"
>> that are common to the commit-graph and multi-pack-index could inform future
>> file formats. To make that process easier, let's combine the process of
>> writing and reading chunks into a common API that both of these existing
>> formats use.

And another point on YAGNI: I'm literally prototyping a new file format and
want to use this API to build it instead of repeating myself. Specifically,
I noticed that the commit-graph and multi-pack-index were inconsistent in
how they protected the file format in different ways during writes and reads.
This leads to...

>> There is some extra benefit immediately: the writing and reading code for
>> each gets a bit cleaner. Also, there were different checks in each that made
>> the process more robust. Now, these share a common set of checks.

...my point that combining these checks make both codepaths slightly more
robust. I didn't even include the potential extension of storing the size
of each chunk in "struct commit_graph" and "struct multi_pack_index" for
run-time bound checks during lookups. That seemed like too much new
behavior for a series that intends to only refactor.

Thanks,
-Stolee

  reply	other threads:[~2020-12-04 14:00 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 16:16 [PATCH 00/15] Refactor chunk-format into an API Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 01/15] commit-graph: anonymize data in chunk_write_fn Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 02/15] chunk-format: add API for writing table of contents Derrick Stolee via GitGitGadget
2020-12-08 17:56   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 03/15] midx: rename pack_info to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 04/15] midx: use context in write_midx_pack_names() Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 05/15] midx: add entries to write_midx_context Derrick Stolee via GitGitGadget
2020-12-03 21:42   ` Junio C Hamano
2020-12-04 13:39     ` Derrick Stolee
2020-12-08 18:00   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 06/15] midx: add pack_perm " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 07/15] midx: add num_large_offsets " Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 08/15] midx: convert chunk write methods to return int Derrick Stolee via GitGitGadget
2020-12-03 21:50   ` Junio C Hamano
2020-12-04 13:40     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 09/15] midx: drop chunk progress during write Derrick Stolee via GitGitGadget
2020-12-03 16:16 ` [PATCH 10/15] midx: use chunk-format API in write_midx_internal() Derrick Stolee via GitGitGadget
2020-12-08 18:42   ` Taylor Blau
2020-12-10 14:36     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 11/15] midx: use 64-bit multiplication for chunk sizes Derrick Stolee via GitGitGadget
2020-12-03 22:00   ` Junio C Hamano
2020-12-08 18:43     ` Taylor Blau
2020-12-03 16:16 ` [PATCH 12/15] chunk-format: create write_chunks() Derrick Stolee via GitGitGadget
2020-12-08 18:45   ` Taylor Blau
2020-12-03 16:16 ` [PATCH 13/15] chunk-format: create chunk reading API Derrick Stolee via GitGitGadget
2020-12-03 22:17   ` Junio C Hamano
2020-12-04 13:47     ` Derrick Stolee
2020-12-04 20:17       ` Junio C Hamano
2020-12-03 22:43   ` Junio C Hamano
2020-12-04 13:45     ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 14/15] commit-graph: restore duplicate chunk checks Derrick Stolee via GitGitGadget
2020-12-07 13:43   ` Derrick Stolee
2020-12-03 16:16 ` [PATCH 15/15] chunk-format: add technical docs Derrick Stolee via GitGitGadget
2020-12-04 12:48 ` [PATCH 00/15] Refactor chunk-format into an API René Scharfe
2020-12-04 13:57   ` Derrick Stolee [this message]
2020-12-04 19:42   ` Junio C Hamano
2020-12-08 18:49   ` Taylor Blau
2020-12-09 17:13     ` René Scharfe
2020-12-10  0:50       ` Taylor Blau
2020-12-10 14:30         ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e3387b6-d2ab-7d86-0535-2ef182fbc469@gmail.com \
    --to=stolee@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).