git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos
@ 2023-10-12 21:09 Taylor Blau
  2023-10-12 21:09 ` [PATCH 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-12 21:09 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Jeff King

These short couple of patches are some fallout from a larger series[^1]
that I'm working on. I noticed a couple of minor issues in the MIDX
format's documentation, and figured that sending these patches sooner
rather than later would be worthwhile.

The first patch swaps "null" for "NUL" when describing the NUL
character. The second patch corrects a piece of the documentation which
claims the PNAM chunk is not aligned (it is externally padded and thus
guaranteed to be well-aligned).

Thanks in advance for your review.

[^1]: The broader goal of that series is to allow verbatim pack reuse
  from multiple packs (instead of just the bitmapped one, or the
  preferred pack of a MIDX). This is only possible in certain
  circumstances, which require some tweaks to the MIDX format for
  bookkeeping reasons, which is how I noticed these issues in the first
  place.

Taylor Blau (2):
  Documentation/gitformat-pack.txt: fix typo
  Documentation/gitformat-pack.txt: fix incorrect MIDX documentation

 Documentation/gitformat-pack.txt | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

-- 
2.42.0.349.gf0c1128f8b.dirty


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] Documentation/gitformat-pack.txt: fix typo
  2023-10-12 21:09 [PATCH 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
@ 2023-10-12 21:09 ` Taylor Blau
  2023-10-12 21:09 ` [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
  2023-10-31 19:24 ` [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
  2 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-12 21:09 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Jeff King

e0d1bcf825 (multi-pack-index: add format details, 2018-07-12) describes
the MIDX's "PNAM" chunk as having entries which are "null-terminated
strings".

This is a typo, as strings are terminated with a NUL character, which is
a distinct concept from "NULL" or "null", which we typically reserve for
the void pointer to address 0.

Correct the documentation accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index 870e00f298..d7153962d4 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -390,7 +390,7 @@ CHUNK LOOKUP:
 CHUNK DATA:
 
 	Packfile Names (ID: {'P', 'N', 'A', 'M'})
-	    Stores the packfile names as concatenated, null-terminated strings.
+	    Stores the packfile names as concatenated, NUL-terminated strings.
 	    Packfiles must be listed in lexicographic order for fast lookups by
 	    name. This is the only chunk not guaranteed to be a multiple of four
 	    bytes in length, so should be the last chunk for alignment reasons.
-- 
2.42.0.349.gf0c1128f8b.dirty



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-12 21:09 [PATCH 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
  2023-10-12 21:09 ` [PATCH 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
@ 2023-10-12 21:09 ` Taylor Blau
  2023-10-12 21:54   ` Junio C Hamano
  2023-10-31 19:24 ` [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
  2 siblings, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2023-10-12 21:09 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, Jeff King

Back in 32f3c541e3 (multi-pack-index: write pack names in chunk, 2018-07-12)
the MIDX's "Packfile Names" (or "PNAM", for short) chunk was described
as containing an array of string entries. e0d1bcf825 notes that this is
the only chunk in the MIDX format's specification that is not guaranteed
to be 4-byte aligned, and so should be placed last.

This isn't quite accurate: the entries within the PNAM chunk are not
guaranteed to be aligned since they are arbitrary strings, but the
chunk itself is aligned since the ending is padded with NUL bytes.

That external padding has always been there since 32f3c541e3 via
midx.c::write_midx_pack_names(), which ended with:

    i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT)
    if (i < MIDX_CHUNK_ALIGNMENT) {
      unsigned char padding[MIDX_CHUNK_ALIGNMENT];
      memset(padding, 0, sizeof(padding))
      hashwrite(f, padding, i);
      written += i;
    }

In fact, 32f3c541e3's log message itself describes the chunk in its
first paragraph with:

    Since filenames are not well structured, add padding to keep good
    alignment in later chunks.

So these have always been externally aligned. Correct the corresponding
part of our documentation to reflect that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index d7153962d4..54000c9412 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -392,8 +392,9 @@ CHUNK DATA:
 	Packfile Names (ID: {'P', 'N', 'A', 'M'})
 	    Stores the packfile names as concatenated, NUL-terminated strings.
 	    Packfiles must be listed in lexicographic order for fast lookups by
-	    name. This is the only chunk not guaranteed to be a multiple of four
-	    bytes in length, so should be the last chunk for alignment reasons.
+	    name. Individual entries in this chunk are not guarenteed to be
+	    aligned. The chunk is externally padded with zeros to align
+	    remaining chunks.
 
 	OID Fanout (ID: {'O', 'I', 'D', 'F'})
 	    The ith entry, F[i], stores the number of OIDs with first
-- 
2.42.0.349.gf0c1128f8b.dirty


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-12 21:09 ` [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
@ 2023-10-12 21:54   ` Junio C Hamano
  2023-10-30 21:55     ` Taylor Blau
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2023-10-12 21:54 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> index d7153962d4..54000c9412 100644
> --- a/Documentation/gitformat-pack.txt
> +++ b/Documentation/gitformat-pack.txt
> @@ -392,8 +392,9 @@ CHUNK DATA:
>  	Packfile Names (ID: {'P', 'N', 'A', 'M'})
>  	    Stores the packfile names as concatenated, NUL-terminated strings.

Not a problem this series (neither this or the previous step)
introduces, but I had to read the implementation of
write_midx_pack_names() to find out what "concatenated
NUL-terminated string" really means.  The code has a list of
strings, writes each of them as a NUL-terminated string in sequence,
and to align the beginning of the next chunk, NULs are added to make
the whole thing multiple of MIDX_CHUNK_ALIGNMENT bytes.

A naive reader code might implement a loop like so:

	while (ptr[0] != '\0') {
		endp = strlen(ptr);
		... ptr[0..endp] is one pathname ...
		ptr = endp + 1;
	}
		
expecting that the terminating NUL of the last entry would be
followed by a NUL, but that is buggy.  The sum of the pathname
strings (with one NUL after each) may happen to be multiple of
MIDX_CHUNK_ALIGNMENT bytes, in which case no extra padding NUL bytes
will be there.  So the reader also needs to pay attention to the
chunk size to notice when to stop reading.  It feels somewhat
suboptimal.

>  	    Packfiles must be listed in lexicographic order for fast lookups by
> -	    name. This is the only chunk not guaranteed to be a multiple of four
> -	    bytes in length, so should be the last chunk for alignment reasons.
> +	    name. Individual entries in this chunk are not guarenteed to be
> +	    aligned. The chunk is externally padded with zeros to align
> +	    remaining chunks.

I am not sure what "externally padded" means.

Before write_midx_pack_names() returns, there is a code to pad the
space after it writes those names, which does not look any external
than the bytes used to record the pathnames or NUL bytes used to
terminate these pathnames.

To me, "externally padded" is an appropriate phrase to use if this
function does not bother with the alignment after what it needs to
record, but the caller, i.e., write_chunkfile(), has code to notice
that the number of bytes written by cf->chunks[i].write_fn() it just
called is not a multiple of some required alignment and adds padding
bytes after .write_fn() returned.  I do not think that is what is
going on here.

My observations.

 * The packfile names chunk is used to record N pathnames; N is not
   recorded anywhere explicitly.  To record N pathnames, a single
   chunk with N names in it is used.  It is not like N chunks, each
   with one name is used.

 * Each of these pathname is recorded literally, followed by a NUL,
   in some order, without any extra padding.

 * To keep the size of the chunk multiple of alignment requirement,
   there may be extra NUL bytes after the names.

This data structure does not allow you to binary search or hash
lookup without an extra table of pointers into this table of strings
at runtime, and once you build such a table, the source being sorted
does not help all that much.  So I am not sure how strict the
"lexicographic" requirement needs to be, but of course, starting
strict and loosening later is much easier than starting loose, so
documenting "must be listed" and following that rule is fine.

Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-12 21:54   ` Junio C Hamano
@ 2023-10-30 21:55     ` Taylor Blau
  2023-10-31  0:42       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2023-10-30 21:55 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Jeff King

On Thu, Oct 12, 2023 at 02:54:17PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> > index d7153962d4..54000c9412 100644
> > --- a/Documentation/gitformat-pack.txt
> > +++ b/Documentation/gitformat-pack.txt
> > @@ -392,8 +392,9 @@ CHUNK DATA:
> >  	Packfile Names (ID: {'P', 'N', 'A', 'M'})
> >  	    Stores the packfile names as concatenated, NUL-terminated strings.
>
> Not a problem this series (neither this or the previous step)
> introduces, but I had to read the implementation of
> write_midx_pack_names() to find out what "concatenated
> NUL-terminated string" really means.  The code has a list of
> strings, writes each of them as a NUL-terminated string in sequence,
> and to align the beginning of the next chunk, NULs are added to make
> the whole thing multiple of MIDX_CHUNK_ALIGNMENT bytes.
>
> A naive reader code might implement a loop like so:
>
> 	while (ptr[0] != '\0') {
> 		endp = strlen(ptr);
> 		... ptr[0..endp] is one pathname ...
> 		ptr = endp + 1;
> 	}
>
> expecting that the terminating NUL of the last entry would be
> followed by a NUL, but that is buggy.  The sum of the pathname
> strings (with one NUL after each) may happen to be multiple of
> MIDX_CHUNK_ALIGNMENT bytes, in which case no extra padding NUL bytes
> will be there.  So the reader also needs to pay attention to the
> chunk size to notice when to stop reading.  It feels somewhat
> suboptimal.

I agree.

> >  	    Packfiles must be listed in lexicographic order for fast lookups by
> > -	    name. This is the only chunk not guaranteed to be a multiple of four
> > -	    bytes in length, so should be the last chunk for alignment reasons.
> > +	    name. Individual entries in this chunk are not guarenteed to be
> > +	    aligned. The chunk is externally padded with zeros to align
> > +	    remaining chunks.
>
> I am not sure what "externally padded" means.

How about something like this, instead?

--- 8< ---
diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index 0bc80f0d46..229490f82f 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -392,9 +392,10 @@ CHUNK DATA:
 	Packfile Names (ID: {'P', 'N', 'A', 'M'})
 	    Stores the packfile names as concatenated, NUL-terminated strings.
 	    Packfiles must be listed in lexicographic order for fast lookups by
-	    name. Individual entries in this chunk are not guarenteed to be
-	    aligned. The chunk is externally padded with zeros to align
-	    remaining chunks.
+	    name. Individual entries in this chunk are not guaranteed to be
+	    aligned, since the packfile names can be of arbitrary length. The
+	    chunk itself is padded at the end with NUL bytes in order to align
+	    the remaining chunks.

 	OID Fanout (ID: {'O', 'I', 'D', 'F'})
 	    The ith entry, F[i], stores the number of OIDs with first
--- >8 ---

Thanks,
Taylor


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-30 21:55     ` Taylor Blau
@ 2023-10-31  0:42       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2023-10-31  0:42 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

>> >  	    Packfiles must be listed in lexicographic order for fast lookups by
>> > -	    name. This is the only chunk not guaranteed to be a multiple of four
>> > -	    bytes in length, so should be the last chunk for alignment reasons.
>> > +	    name. Individual entries in this chunk are not guarenteed to be
>> > +	    aligned. The chunk is externally padded with zeros to align
>> > +	    remaining chunks.
>>
>> I am not sure what "externally padded" means.
>
> How about something like this, instead?
>
> --- 8< ---
> diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> index 0bc80f0d46..229490f82f 100644
> --- a/Documentation/gitformat-pack.txt
> +++ b/Documentation/gitformat-pack.txt
> @@ -392,9 +392,10 @@ CHUNK DATA:
>  	Packfile Names (ID: {'P', 'N', 'A', 'M'})
>  	    Stores the packfile names as concatenated, NUL-terminated strings.
>  	    Packfiles must be listed in lexicographic order for fast lookups by
> -	    name. Individual entries in this chunk are not guarenteed to be
> -	    aligned. The chunk is externally padded with zeros to align
> -	    remaining chunks.
> +	    name. Individual entries in this chunk are not guaranteed to be
> +	    aligned, since the packfile names can be of arbitrary length. The
> +	    chunk itself is padded at the end with NUL bytes in order to align
> +	    the remaining chunks.

There is no alignment requirement described, so "not guaranteed" and
"in order to align" sound hollow.  These are always byte-aligned ;-)

How about something along this line to simplify it a bit?

	Store the names of packfiles as a sequence of NUL-terminated
	strings.  There is no extra padding between the filenames,
	and they are listed in lexicographic order.  The chunk
	itself is padded at the end with NUL bytes to make the chunk
	size a multiple of 4 bytes.

I did not ccheck if the chunks need to be 4-byte aligned, so if the
number is wrong, please adjust accordingly.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos
  2023-10-12 21:09 [PATCH 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
  2023-10-12 21:09 ` [PATCH 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
  2023-10-12 21:09 ` [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
@ 2023-10-31 19:24 ` Taylor Blau
  2023-10-31 19:24   ` [PATCH v2 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
  2023-10-31 19:24   ` [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
  2 siblings, 2 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-31 19:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

A minor reroll to adjust the text of the second patch to read more
clearly, thanks to input from Junio.

This has been rebased onto 692be87cbb (Merge branch
'jm/bisect-run-synopsis-fix', 2023-10-31). Thanks in advance for your
review!

Taylor Blau (2):
  Documentation/gitformat-pack.txt: fix typo
  Documentation/gitformat-pack.txt: fix incorrect MIDX documentation

 Documentation/gitformat-pack.txt | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Range-diff against v1:
1:  8c5fa1ff4f = 1:  92e9bee4ad Documentation/gitformat-pack.txt: fix typo
2:  af2742e05d ! 2:  c149be35a1 Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
    @@ Metadata
      ## Commit message ##
         Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
     
    -    Back in 32f3c541e3 (multi-pack-index: write pack names in chunk, 2018-07-12)
    -    the MIDX's "Packfile Names" (or "PNAM", for short) chunk was described
    -    as containing an array of string entries. e0d1bcf825 notes that this is
    -    the only chunk in the MIDX format's specification that is not guaranteed
    -    to be 4-byte aligned, and so should be placed last.
    +    Back in 32f3c541e3 (multi-pack-index: write pack names in chunk,
    +    2018-07-12) the MIDX's "Packfile Names" (or "PNAM", for short) chunk was
    +    described as containing an array of string entries. e0d1bcf825 notes
    +    that this is the only chunk in the MIDX format's specification that is
    +    not guaranteed to be 4-byte aligned, and so should be placed last.
     
         This isn't quite accurate: the entries within the PNAM chunk are not
    -    guaranteed to be aligned since they are arbitrary strings, but the
    -    chunk itself is aligned since the ending is padded with NUL bytes.
    +    guaranteed to be 4-byte aligned since they are arbitrary strings, but
    +    the chunk itself is 4-byte aligned since the ending is padded with NUL
    +    bytes.
     
    -    That external padding has always been there since 32f3c541e3 via
    +    That padding has always been there since 32f3c541e3 via
         midx.c::write_midx_pack_names(), which ended with:
     
             i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT)
    @@ Commit message
         So these have always been externally aligned. Correct the corresponding
         part of our documentation to reflect that.
     
    +    Helped-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/gitformat-pack.txt ##
    -@@ Documentation/gitformat-pack.txt: CHUNK DATA:
    +@@ Documentation/gitformat-pack.txt: CHUNK LOOKUP:
    + CHUNK DATA:
    + 
      	Packfile Names (ID: {'P', 'N', 'A', 'M'})
    - 	    Stores the packfile names as concatenated, NUL-terminated strings.
    - 	    Packfiles must be listed in lexicographic order for fast lookups by
    +-	    Stores the packfile names as concatenated, NUL-terminated strings.
    +-	    Packfiles must be listed in lexicographic order for fast lookups by
     -	    name. This is the only chunk not guaranteed to be a multiple of four
     -	    bytes in length, so should be the last chunk for alignment reasons.
    -+	    name. Individual entries in this chunk are not guarenteed to be
    -+	    aligned. The chunk is externally padded with zeros to align
    -+	    remaining chunks.
    ++	    Store the names of packfiles as a sequence of NUL-terminated
    ++	    strings. There is no extra padding between the filenames,
    ++	    and they are listed in lexicographic order. The chunk itself
    ++	    is padded at the end with between 0 and 3 NUL bytes to make the
    ++	    chunk size a multiple of 4 bytes.
      
      	OID Fanout (ID: {'O', 'I', 'D', 'F'})
      	    The ith entry, F[i], stores the number of OIDs with first

base-commit: 692be87cbba55e8488f805d236f2ad50483bd7d5
-- 
2.42.0.527.ge89c67d052


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] Documentation/gitformat-pack.txt: fix typo
  2023-10-31 19:24 ` [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
@ 2023-10-31 19:24   ` Taylor Blau
  2023-10-31 19:24   ` [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
  1 sibling, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-31 19:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

e0d1bcf825 (multi-pack-index: add format details, 2018-07-12) describes
the MIDX's "PNAM" chunk as having entries which are "null-terminated
strings".

This is a typo, as strings are terminated with a NUL character, which is
a distinct concept from "NULL" or "null", which we typically reserve for
the void pointer to address 0.

Correct the documentation accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index 4a4d87e7db..c4eb09d52a 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -390,7 +390,7 @@ CHUNK LOOKUP:
 CHUNK DATA:
 
 	Packfile Names (ID: {'P', 'N', 'A', 'M'})
-	    Stores the packfile names as concatenated, null-terminated strings.
+	    Stores the packfile names as concatenated, NUL-terminated strings.
 	    Packfiles must be listed in lexicographic order for fast lookups by
 	    name. This is the only chunk not guaranteed to be a multiple of four
 	    bytes in length, so should be the last chunk for alignment reasons.
-- 
2.42.0.527.ge89c67d052



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-31 19:24 ` [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
  2023-10-31 19:24   ` [PATCH v2 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
@ 2023-10-31 19:24   ` Taylor Blau
  2023-10-31 20:00     ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2023-10-31 19:24 UTC (permalink / raw
  To: git; +Cc: Jeff King, Junio C Hamano

Back in 32f3c541e3 (multi-pack-index: write pack names in chunk,
2018-07-12) the MIDX's "Packfile Names" (or "PNAM", for short) chunk was
described as containing an array of string entries. e0d1bcf825 notes
that this is the only chunk in the MIDX format's specification that is
not guaranteed to be 4-byte aligned, and so should be placed last.

This isn't quite accurate: the entries within the PNAM chunk are not
guaranteed to be 4-byte aligned since they are arbitrary strings, but
the chunk itself is 4-byte aligned since the ending is padded with NUL
bytes.

That padding has always been there since 32f3c541e3 via
midx.c::write_midx_pack_names(), which ended with:

    i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT)
    if (i < MIDX_CHUNK_ALIGNMENT) {
      unsigned char padding[MIDX_CHUNK_ALIGNMENT];
      memset(padding, 0, sizeof(padding))
      hashwrite(f, padding, i);
      written += i;
    }

In fact, 32f3c541e3's log message itself describes the chunk in its
first paragraph with:

    Since filenames are not well structured, add padding to keep good
    alignment in later chunks.

So these have always been externally aligned. Correct the corresponding
part of our documentation to reflect that.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-pack.txt | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
index c4eb09d52a..9fcb29a9c8 100644
--- a/Documentation/gitformat-pack.txt
+++ b/Documentation/gitformat-pack.txt
@@ -390,10 +390,11 @@ CHUNK LOOKUP:
 CHUNK DATA:
 
 	Packfile Names (ID: {'P', 'N', 'A', 'M'})
-	    Stores the packfile names as concatenated, NUL-terminated strings.
-	    Packfiles must be listed in lexicographic order for fast lookups by
-	    name. This is the only chunk not guaranteed to be a multiple of four
-	    bytes in length, so should be the last chunk for alignment reasons.
+	    Store the names of packfiles as a sequence of NUL-terminated
+	    strings. There is no extra padding between the filenames,
+	    and they are listed in lexicographic order. The chunk itself
+	    is padded at the end with between 0 and 3 NUL bytes to make the
+	    chunk size a multiple of 4 bytes.
 
 	OID Fanout (ID: {'O', 'I', 'D', 'F'})
 	    The ith entry, F[i], stores the number of OIDs with first
-- 
2.42.0.527.ge89c67d052


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation
  2023-10-31 19:24   ` [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
@ 2023-10-31 20:00     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-10-31 20:00 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, Junio C Hamano

On Tue, Oct 31, 2023 at 03:24:11PM -0400, Taylor Blau wrote:

> Back in 32f3c541e3 (multi-pack-index: write pack names in chunk,
> 2018-07-12) the MIDX's "Packfile Names" (or "PNAM", for short) chunk was
> described as containing an array of string entries. e0d1bcf825 notes
> that this is the only chunk in the MIDX format's specification that is
> not guaranteed to be 4-byte aligned, and so should be placed last.
> 
> This isn't quite accurate: the entries within the PNAM chunk are not
> guaranteed to be 4-byte aligned since they are arbitrary strings, but
> the chunk itself is 4-byte aligned since the ending is padded with NUL
> bytes.

We also don't place it last! :) So the alignment is very important, as I
found out in the recent chunk-corruption series.

> So these have always been externally aligned. Correct the corresponding
> part of our documentation to reflect that.

Both this and the previous patch look good to me.

-Peff


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-10-31 20:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-12 21:09 [PATCH 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
2023-10-12 21:09 ` [PATCH 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
2023-10-12 21:09 ` [PATCH 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
2023-10-12 21:54   ` Junio C Hamano
2023-10-30 21:55     ` Taylor Blau
2023-10-31  0:42       ` Junio C Hamano
2023-10-31 19:24 ` [PATCH v2 0/2] Documentation/gitformat-pack.txt: correct a few issues/typos Taylor Blau
2023-10-31 19:24   ` [PATCH v2 1/2] Documentation/gitformat-pack.txt: fix typo Taylor Blau
2023-10-31 19:24   ` [PATCH v2 2/2] Documentation/gitformat-pack.txt: fix incorrect MIDX documentation Taylor Blau
2023-10-31 20:00     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).