[PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-02-29 20:57 [PATCH 0/1] Documentation/user-manual.txt: try to clarify on object hashes Dirk Gouders
@ 2024-02-29 13:05 ` Dirk Gouders
  2024-02-29 21:37   ` Junio C Hamano
  2024-03-12 10:41 ` [PATCH v2 0/1] Documentation/user-manual.txt: try to clarify on " Dirk Gouders
  1 sibling, 1 reply; 10+ messages in thread
From: Dirk Gouders @ 2024-02-29 13:05 UTC
  To: git list

If someone spends the time to work through the documentation, the
subject "hashes" can lead to contradictions:

The README of the initial commit states hashes are generated from
compressed data (which changed very soon), whereas
Documentation/user-manual.txt says they are generated from original
data.

Don't give doubts a chance: clarify this and present a simple example
on how object hashes can be generated manually.

Signed-off-by: Dirk Gouders <dirk@gouders.net>
---
 Documentation/user-manual.txt | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 6433903491..8dfb81e045 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -4095,6 +4095,39 @@ that is used to name the object is the hash of the original data
 plus this header, so `sha1sum` 'file' does not match the object name
 for 'file'.
 
+Starting with the initial commit, hashing was done on the compressed
+data and the file README of that commit explicitely states this:
+
+"The SHA1 hash is always the hash of the _compressed_ object, not the
+original one."
+
+This changed soon after that with commit
+d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
+commit message doesn't provide the detailed reasoning.
+
+The following is a short example that demonstrates how hashes can be
+generated manually:
+
+Let's asume a small text file with the content "Hello git.\n"
+-------------------------------------------------
+$ cat > hello.txt <<EOF
+Hello git.
+EOF
+-------------------------------------------------
+
+We can now manually generate the hash `git` would use for this file:
+
+- The object we want the hash for is of type "blob" and its size is
+  11 bytes.
+
+- Prepend the object header to the file content and feed this to
+  sha1sum(1):
+
+-------------------------------------------------
+$ printf "blob 11\0" | cat - hello.txt | sha1sum
+7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
+-------------------------------------------------
+
 As a result, the general consistency of an object can always be tested
 independently of the contents or the type of the object: all objects can
 be validated by verifying that (a) their hashes match the content of the
-- 
2.43.0




* [PATCH 0/1] Documentation/user-manual.txt: try to clarify on object hashes
@ 2024-02-29 20:57 Dirk Gouders
  2024-02-29 13:05 ` [PATCH 1/1] Documentation/user-manual.txt: example for generating " Dirk Gouders
  2024-03-12 10:41 ` [PATCH v2 0/1] Documentation/user-manual.txt: try to clarify on " Dirk Gouders
  0 siblings, 2 replies; 10+ messages in thread
From: Dirk Gouders @ 2024-02-29 20:57 UTC
  To: git list

I'm not sure whether such patches are welcome -- they could cause more
work than they are worth.  On the other hand, the contradiction
irritated me, and I had to dig a bit and check it by example; writing
this down could perhaps help others who also stumble over this.

If someone knows the exact reasoning why the hashing changed from
_after_ compression to _before_ compression, we could perhaps add that,
too?

Dirk Gouders (1):
  Documentation/user-manual.txt: example for generating object hashes

 Documentation/user-manual.txt | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

-- 
2.43.0


* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-02-29 13:05 ` [PATCH 1/1] Documentation/user-manual.txt: example for generating " Dirk Gouders
@ 2024-02-29 21:37   ` Junio C Hamano
  2024-02-29 22:35     ` Dirk Gouders
  2024-03-08  6:45     ` Dirk Gouders
  0 siblings, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2024-02-29 21:37 UTC
  To: Dirk Gouders; +Cc: git list

Dirk Gouders <dirk@gouders.net> writes:

> If someone spends the time to work through the documentation, the
> subject "hashes" can lead to contradictions:
>
> The README of the initial commit states hashes are generated from
> compressed data (which changed very soon), whereas
> Documentation/user-manual.txt says they are generated from original
> data.
>
> Don't give doubts a chance: clarify this and present a simple example
> on how object hashes can be generated manually.

I'd rather not waste readers' attention on a historical wart.

> @@ -4095,6 +4095,39 @@ that is used to name the object is the hash of the original data
>  plus this header, so `sha1sum` 'file' does not match the object name
>  for 'file'.

The paragraph above (part of it is hidden before the hunk) clearly
states what the naming rules are.  We hash the original and then
compress.  If I use an implementation of Git that drives the zlib at
compression level 1, and if you clone from my repository with
another implementation of Git whose zlib is driven at compression
level 9, our .git/objects/01/2345...90 files may not be identical,
but when uncompressed they should store the same contents, so "hash
then compress" is the only sensible choice that is not affected by
the compression to give stable names to objects.
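
The point about compression levels can be checked with a small sketch.
It uses `gzip` at levels 1 and 9 as a stand-in for the zlib settings
mentioned above (an assumption for illustration; Git's own object files
are zlib streams, not gzip files).  The helper `blob_name` and all file
names are hypothetical and not part of Git.

```shell
# Hypothetical helper: compute a name the way the thread describes, i.e.
# SHA-1 over "blob <size>\0" followed by the uncompressed bytes.
blob_name () {
	size=$(wc -c <"$1" | tr -d ' ')
	{ printf "blob %s\0" "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Build some sample content that is large enough to be worth compressing.
i=0
while [ "$i" -lt 2000 ]
do
	echo "line $i"
	i=$((i + 1))
done >sample.txt

# Compress the same content at two different levels.
gzip -1 -c sample.txt >fast.gz
gzip -9 -c sample.txt >best.gz

# The compressed files may or may not be byte-identical ...
cmp -s fast.gz best.gz || echo "compressed files differ"

# ... but the uncompressed contents, and so the names, are the same.
gzip -dc fast.gz >fast.txt
gzip -dc best.gz >best.txt
blob_name sample.txt
blob_name fast.txt
blob_name best.txt
```

The three printed names should be identical, whether or not the two
compressed files differ.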

> +Starting with the initial commit, hashing was done on the compressed
> +data and the file README of that commit explicitely states this:
> +
> +"The SHA1 hash is always the hash of the _compressed_ object, not the
> +original one."
> +
> +This changed soon after that with commit
> +d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
> +commit message doesn't provide the detailed reasoning.

These three paragraphs are about Git development history, which by
itself may be of interest to some people, but the main target audience
of the user-manual is probably different.  Those readers may be
interested in learning how Git works, but only far enough to feel that
they understand how the "magic" things Git does, like "a cryptographic
hash of contents is enough to uniquely identify the contents being
tracked", work well enough to trust it with their precious contents [*].

    Side note: 
    https://lore.kernel.org/git/Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org/
    explains the reason behind the change to those who did not find
    it obvious.

FYI, another "breaking" change we did earlier in the history of the
project was to update the sort order of paths in tree objects.  We
do not need to confuse readers by talking about the original and
updated sort order.  The only thing they need, when they want to get
the feeling that they understand how things work, is the description
of how things work in the version of Git they have ready access to.
Historical mistakes we made, corrections we made and why, are
certainly of interest but not for the target audience of this
document.

On the other hand, ...

> +The following is a short example that demonstrates how hashes can be
> +generated manually:
> +
> +Let's asume a small text file with the content "Hello git.\n"
> +-------------------------------------------------
> +$ cat > hello.txt <<EOF
> +Hello git.
> +EOF
> +-------------------------------------------------
> +
> +We can now manually generate the hash `git` would use for this file:
> +
> +- The object we want the hash for is of type "blob" and its size is
> +  11 bytes.
> +
> +- Prepend the object header to the file content and feed this to
> +  sha1sum(1):
> +
> +-------------------------------------------------
> +$ printf "blob 11\0" | cat - hello.txt | sha1sum
> +7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
> +-------------------------------------------------
> +

... something like the above (modulo coding style) would be a useful
addition to help those who want to convince themselves they
understand how (some parts of) Git works under the hood, and I think
it would be a welcome addition to some subset of such readers (the
rest of the world may feel it is way too much detail, though).

I would draw the line between this one and a similar description and
demonstration of historical mistakes, which is not as relevant as
how things work in the current system.  In other words, to me, it is
OK to dig a bit deep to show how the current scheme works but it is
way too much to do the same for versions of the system that do not
exist anymore.

But others may draw the line differently and consider even the above
a bit too much detail, which is a position I would also accept.

Thanks.


* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-02-29 21:37   ` Junio C Hamano
@ 2024-02-29 22:35     ` Dirk Gouders
  2024-02-29 22:57       ` Junio C Hamano
  2024-03-08  6:45     ` Dirk Gouders
  1 sibling, 1 reply; 10+ messages in thread
From: Dirk Gouders @ 2024-02-29 22:35 UTC
  To: Junio C Hamano; +Cc: git list

Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> If someone spends the time to work through the documentation, the
>> subject "hashes" can lead to contradictions:
>>
>> The README of the initial commit states hashes are generated from
>> compressed data (which changed very soon), whereas
>> Documentation/user-manual.txt says they are generated from original
>> data.
>>
>> Don't give doubts a chance: clarify this and present a simple example
>> on how object hashes can be generated manually.
>
> I'd rather not waste readers' attention on a historical wart.

Yes, but -- I should have mentioned it -- the document itself suggests
to read the initial commit.

But I don't mean to argue about that; perhaps I dug too deep into the
details.

>> @@ -4095,6 +4095,39 @@ that is used to name the object is the hash of the original data
>>  plus this header, so `sha1sum` 'file' does not match the object name
>>  for 'file'.
>
> The paragraph above (part of it is hidden before the hunk) clearly
> states what the naming rules are.  We hash the original and then
> compress.  If I use an implementation of Git that drives the zlib at
> compression level 1, and if you clone from my repository with
> another implementation of Git whose zlib is driven at compression
> level 9, our .git/objects/01/2345...90 files may not be identical,
> but when uncompressed they should store the same contents, so "hash
> then compress" is the only sensible choice that is not affected by
> the compression to give stable names to objects.

Thank you for that detail.

>> +Starting with the initial commit, hashing was done on the compressed
>> +data and the file README of that commit explicitely states this:
>> +
>> +"The SHA1 hash is always the hash of the _compressed_ object, not the
>> +original one."
>> +
>> +This changed soon after that with commit
>> +d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
>> +commit message doesn't provide the detailed reasoning.
>
> These three are about Git development history, which by itself may
> be of interest for some people, but the main target audience of the
> user-manual is probably different from them.  They may be interested
> to learn how Git works, but it is only to feel that they understand
> how the "magic" things Git does, like "a cryptographic hash of
> contents is enough to uniquely identify the contents being tracked",
> works well to trust their precious contents [*].
>
>     Side note: 
>     https://lore.kernel.org/git/Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org/
>     explains the reason behind the change to those who did not find
>     it obvious.
>
> FYI, another "breaking" change we did earlier in the history of the
> project was to update the sort order of paths in tree objects.  We
> do not need to confuse readers by talking about the original and
> updated sort order.  The only thing they need, when they want to get
> the feeling that they understand how things work, is the description
> of how things work in the version of Git they have ready access to.
> Historical mistakes we made, corrections we made and why, are
> certainly of interest but not for the target audience of this
> document.

Again thank you, very interesting reading.

> On the other hand, ...
>
>> +The following is a short example that demonstrates how hashes can be
>> +generated manually:
>> +
>> +Let's asume a small text file with the content "Hello git.\n"
>> +-------------------------------------------------
>> +$ cat > hello.txt <<EOF
>> +Hello git.
>> +EOF
>> +-------------------------------------------------
>> +
>> +We can now manually generate the hash `git` would use for this file:
>> +
>> +- The object we want the hash for is of type "blob" and its size is
>> +  11 bytes.
>> +
>> +- Prepend the object header to the file content and feed this to
>> +  sha1sum(1):
>> +
>> +-------------------------------------------------
>> +$ printf "blob 11\0" | cat - hello.txt | sha1sum
>> +7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
>> +-------------------------------------------------
>> +
>
> ... something like the above (modulo coding style) would be a useful
> addition to help those who want to convince themselves they
> understand how (some parts of) Git works under the hood, and I think
> it would be a welcome addition to some subset of such readers (the
> rest of the world may feel it is way too much detail, though).
>
> I would draw the line between this one and a similar description and
> demonstration of historical mistakes, which is not as relevant as
> how things work in the current system.  In other words, to me, it is
> OK to dig a bit deep to show how the current scheme works but it is
> way too much to do the same for versions of the system that do not
> exist anymore.
>
> But others may draw the line differently and consider even the above
> a bit too much detail, which is a position I would also accept.
>
> Thanks.



* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-02-29 22:35     ` Dirk Gouders
@ 2024-02-29 22:57       ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2024-02-29 22:57 UTC
  To: Dirk Gouders; +Cc: git list

Dirk Gouders <dirk@gouders.net> writes:

>> I'd rather not waste readers' attention on a historical wart.
>
> Yes, but -- I should have mentioned it -- the document itself suggests
> to read the initial commit.

Ahh, yes, we'd need to hedge that part.  Good thinking.

I am still not sure whether the first hunk below is a good idea or
too much detail.  The second hunk may be worth doing.

Thanks.

 Documentation/user-manual.txt | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git c/Documentation/user-manual.txt w/Documentation/user-manual.txt
index 6433903491..1027055784 100644
--- c/Documentation/user-manual.txt
+++ w/Documentation/user-manual.txt
@@ -4093,7 +4093,8 @@ that not only specifies their type, but also provides size information
 about the data in the object.  It's worth noting that the SHA-1 hash
 that is used to name the object is the hash of the original data
 plus this header, so `sha1sum` 'file' does not match the object name
-for 'file'.
+for 'file' (the earliest versions of Git hashed slightly differently
+but the conclusion is still the same).
 
 As a result, the general consistency of an object can always be tested
 independently of the contents or the type of the object: all objects can
@@ -4123,7 +4124,8 @@ $ git switch --detach e83c5163
 ----------------------------------------------------
 
 The initial revision lays the foundation for almost everything Git has
-today, but is small enough to read in one sitting.
+today (even though details may differ in a few places), but is small
+enough to read in one sitting.
 
 Note that terminology has changed since that revision.  For example, the
 README in that revision uses the word "changeset" to describe what we



* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-02-29 21:37   ` Junio C Hamano
  2024-02-29 22:35     ` Dirk Gouders
@ 2024-03-08  6:45     ` Dirk Gouders
  2024-03-08 15:24       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Dirk Gouders @ 2024-03-08  6:45 UTC
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:

>> +The following is a short example that demonstrates how hashes can be
>> +generated manually:
>> +
>> +Let's asume a small text file with the content "Hello git.\n"
>> +-------------------------------------------------
>> +$ cat > hello.txt <<EOF
>> +Hello git.
>> +EOF
>> +-------------------------------------------------
>> +
>> +We can now manually generate the hash `git` would use for this file:
>> +
>> +- The object we want the hash for is of type "blob" and its size is
>> +  11 bytes.
>> +
>> +- Prepend the object header to the file content and feed this to
>> +  sha1sum(1):
>> +
>> +-------------------------------------------------
>> +$ printf "blob 11\0" | cat - hello.txt | sha1sum
>> +7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
>> +-------------------------------------------------
>> +
>
> ... something like the above (modulo coding style) would be a useful
> addition to help those who want to convince themselves they
> understand how (some parts of) Git works under the hood, and I think
> it would be a welcome addition to some subset of such readers (the
> rest of the world may feel it is way too much detail, though).

May I ask what you meant by "modulo coding style", e.g. where I should
look at to make the code of similar style?

I would also add that git-hash-object(1) could be used to verify the
result if you think that is OK.

In addition to a suggestion in another mail, the commit would
consist of substantial content you suggested and perhaps, you could tell
me how to express this; would a Helped-By be correct?

Dirk



* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-03-08  6:45     ` Dirk Gouders
@ 2024-03-08 15:24       ` Junio C Hamano
  2024-03-08 22:11         ` Dirk Gouders
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2024-03-08 15:24 UTC
  To: Dirk Gouders; +Cc: git

Dirk Gouders <dirk@gouders.net> writes:

> May I ask what you meant by "modulo coding style", e.g. where I should
> look at to make the code of similar style?

Documentation/CodingGuidelines would be a good start, but

 * A here-doc for a single liner is probably an overkill.  Why not

    echo "Hello, world" >file

   In either way, in our codebase a redirection operator '>' (or
   '<') has one whitespace before it, and no whitespace after it
   before the file.

 * printf piped to "cat - file" whose output feeds another pipe
   looked unusual.  More usual way novices write may be

    { printf ... ; cat file; } | sha1sum

were the two things I noticed.
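
Put together, the two style points might look like the following
sketch.  The file name `hello.txt` and the `size` variable are
illustrative choices, and computing the size with `wc -c` instead of
typing it in by hand is an addition that the mail above does not
suggest.

```shell
# One-line content; the redirection is written as ' >file'.
echo "Hello world" >hello.txt

# Byte count of the content; tr strips the padding some wc versions emit.
size=$(wc -c <hello.txt | tr -d ' ')

# Group the header and the content in braces, then hash the combined stream.
{ printf "blob %s\0" "$size"; cat hello.txt; } | sha1sum
```

The braces run both commands in the current shell and send their
combined output into a single pipe, so no extra `cat -` stage is
needed.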

> I would also add that git-hash-object(1) could be used to verify the
> result if you think that is OK.

git hash-object can be used to replace that whole thing ;-)
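
As a sketch of that replacement, assuming `git` is installed and using
the illustrative file name `hello.txt`: `git hash-object` computes the
object name directly, and its `--stdin` option reads the content from
standard input instead of a named file.

```shell
echo "Hello world" >hello.txt

# Prints the blob name for the file; nothing is stored in a repository
# because -w is not given.
git hash-object hello.txt

# Same content via standard input; this should print the same name.
echo "Hello world" | git hash-object --stdin
```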

> In addition to a suggestion in another mail, the commit would
> consist of substantial content you suggested and perhaps, you could tell
> me how to express this; would a Helped-By be correct?

I think many may prefer to downcase B in By, but if it is
"substantial", probably.  I do not think I gave much in this
discussion to become a substantial addition to the original, though.


* Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-03-08 15:24       ` Junio C Hamano
@ 2024-03-08 22:11         ` Dirk Gouders
  0 siblings, 0 replies; 10+ messages in thread
From: Dirk Gouders @ 2024-03-08 22:11 UTC
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Dirk Gouders <dirk@gouders.net> writes:
>
>> May I ask what you meant by "modulo coding style", e.g. where I should
>> look at to make the code of similar style?
>
> Documentation/CodingGuidelines would be a good start, but
>
>  * A here-doc for a single liner is probably an overkill.  Why not
>
>     echo "Hello, world" >file
>
>    In either way, in our codebase a redirection operator '>' (or
>    '<') has one whitespace before it, and no whitespace after it
>    before the file.
>
>  * printf piped to "cat - file" whose output feeds another pipe
>    looked unusual.  More usual way novices write may be
>
>     { printf ... ; cat file; } | sha1sum
>
> were the two things I noticed.
>
>> I would also add that git-hash-object(1) could be used to verify the
>> result if you think that is OK.
>
> git hash-object can be used to replace that whole thing ;-)
>
>> In addition to a suggestion in another mail, the commit would
>> consist of substantial content you suggested and perhaps, you could tell
>> me how to express this; would a Helped-By be correct?
>
> I think many may prefer to downcase B in By, but if it is
> "substantial", probably.  I do not think I gave much in this
> discussion to become a substantial addition to the original, though.

Thank you for the explanation (some of which I should have found by
myself).

I will send the prepared v2 once I have resolved my struggles with
range-diffs; that concept is new to me and I have a slow brain -- if
one at all.

Dirk



* [PATCH v2 0/1] Documentation/user-manual.txt: try to clarify on object hashes
  2024-02-29 20:57 [PATCH 0/1] Documentation/user-manual.txt: try to clarify on object hashes Dirk Gouders
  2024-02-29 13:05 ` [PATCH 1/1] Documentation/user-manual.txt: example for generating " Dirk Gouders
@ 2024-03-12 10:41 ` Dirk Gouders
  2024-03-12 10:41   ` [PATCH v2 1/1] Documentation/user-manual.txt: example for generating " Dirk Gouders
  1 sibling, 1 reply; 10+ messages in thread
From: Dirk Gouders @ 2024-03-12 10:41 UTC
  To: git; +Cc: Junio C Hamano, Dirk Gouders

This is the second round of adding a hashing example to user-manual.txt.
---
Changes in v2:
- Do not go into detail about hashing in the history.
- Change code according to coding guidelines.
- Fix a typo (s/asume/assume/) and change the wording of that sentence.
- Write Git instead of `git`.
- To fit the whole document, change sample content to "Hello world", length 12.
- Add verification of hash using `git hash-object`.
- Provide for empty lines around code blocks.
---
Dirk Gouders (1):
  Documentation/user-manual.txt: example for generating object hashes

 Documentation/user-manual.txt | 36 +++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

Range-diff against v1:
1:  6995f866e7 ! 1:  568c59d69f Documentation/user-manual.txt: example for generating object hashes
    @@ Metadata
      ## Commit message ##
         Documentation/user-manual.txt: example for generating object hashes
     
    -    If someone spends the time to work through the documentation, the
    -    subject "hashes" can lead to contradictions:
    +    Add a simple example on how object hashes can be generated manually.
     
    -    The README of the initial commit states hashes are generated from
    -    compressed data (which changed very soon), whereas
    -    Documentation/user-manual.txt says they are generated from original
    -    data.
    -
    -    Don't give doubts a chance: clarify this and present a simple example
    -    on how object hashes can be generated manually.
    +    Further, because the document suggests to have a look at the initial
    +    commit, clarify that some details changed since that time.
     
         Signed-off-by: Dirk Gouders <dirk@gouders.net>
     
      ## Documentation/user-manual.txt ##
    -@@ Documentation/user-manual.txt: that is used to name the object is the hash of the original data
    +@@ Documentation/user-manual.txt: that not only specifies their type, but also provides size information
    + about the data in the object.  It's worth noting that the SHA-1 hash
    + that is used to name the object is the hash of the original data
      plus this header, so `sha1sum` 'file' does not match the object name
    - for 'file'.
    - 
    -+Starting with the initial commit, hashing was done on the compressed
    -+data and the file README of that commit explicitely states this:
    -+
    -+"The SHA1 hash is always the hash of the _compressed_ object, not the
    -+original one."
    +-for 'file'.
    ++for 'file' (the earliest versions of Git hashed slightly differently
    ++but the conclusion is still the same).
     +
    -+This changed soon after that with commit
    -+d98b46f8d9a3 (Do SHA1 hash _before_ compression.).  Unfortunately, the
    -+commit message doesn't provide the detailed reasoning.
    ++The following is a short example that demonstrates how these hashes
    ++can be generated manually:
     +
    -+The following is a short example that demonstrates how hashes can be
    -+generated manually:
    ++Let's assume a small text file with some simple content:
     +
    -+Let's asume a small text file with the content "Hello git.\n"
     +-------------------------------------------------
    -+$ cat > hello.txt <<EOF
    -+Hello git.
    -+EOF
    ++$ echo "Hello world" >hello.txt
     +-------------------------------------------------
     +
    -+We can now manually generate the hash `git` would use for this file:
    ++We can now manually generate the hash Git would use for this file:
     +
     +- The object we want the hash for is of type "blob" and its size is
    -+  11 bytes.
    ++  12 bytes.
     +
     +- Prepend the object header to the file content and feed this to
    -+  sha1sum(1):
    ++  `sha1sum`:
     +
     +-------------------------------------------------
    -+$ printf "blob 11\0" | cat - hello.txt | sha1sum
    -+7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57 .
    ++$ { printf "blob 12\0"; cat hello.txt; } | sha1sum
    ++802992c4220de19a90767f3000a79a31b98d0df7  -
     +-------------------------------------------------
     +
    ++This manually constructed hash can be verified using `git hash-object`
    ++which of course hides the addition of the header:
    ++
    ++-------------------------------------------------
    ++$ git hash-object hello.txt
    ++802992c4220de19a90767f3000a79a31b98d0df7
    ++-------------------------------------------------
    + 
      As a result, the general consistency of an object can always be tested
      independently of the contents or the type of the object: all objects can
    - be validated by verifying that (a) their hashes match the content of the
    +@@ Documentation/user-manual.txt: $ git switch --detach e83c5163
    + ----------------------------------------------------
    + 
    + The initial revision lays the foundation for almost everything Git has
    +-today, but is small enough to read in one sitting.
    ++today (even though details may differ in a few places), but is small
    ++enough to read in one sitting.
    + 
    + Note that terminology has changed since that revision.  For example, the
    + README in that revision uses the word "changeset" to describe what we
-- 
2.43.0




* [PATCH v2 1/1] Documentation/user-manual.txt: example for generating object hashes
  2024-03-12 10:41 ` [PATCH v2 0/1] Documentation/user-manual.txt: try to clarify on " Dirk Gouders
@ 2024-03-12 10:41   ` Dirk Gouders
  0 siblings, 0 replies; 10+ messages in thread
From: Dirk Gouders @ 2024-03-12 10:41 UTC
  To: git; +Cc: Junio C Hamano, Dirk Gouders

Add a simple example on how object hashes can be generated manually.

Further, because the document suggests to have a look at the initial
commit, clarify that some details changed since that time.

Signed-off-by: Dirk Gouders <dirk@gouders.net>
---
 Documentation/user-manual.txt | 36 +++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 6433903491..90a4189358 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -4093,7 +4093,38 @@ that not only specifies their type, but also provides size information
 about the data in the object.  It's worth noting that the SHA-1 hash
 that is used to name the object is the hash of the original data
 plus this header, so `sha1sum` 'file' does not match the object name
-for 'file'.
+for 'file' (the earliest versions of Git hashed slightly differently
+but the conclusion is still the same).
+
+The following is a short example that demonstrates how these hashes
+can be generated manually:
+
+Let's assume a small text file with some simple content:
+
+-------------------------------------------------
+$ echo "Hello world" >hello.txt
+-------------------------------------------------
+
+We can now manually generate the hash Git would use for this file:
+
+- The object we want the hash for is of type "blob" and its size is
+  12 bytes.
+
+- Prepend the object header to the file content and feed this to
+  `sha1sum`:
+
+-------------------------------------------------
+$ { printf "blob 12\0"; cat hello.txt; } | sha1sum
+802992c4220de19a90767f3000a79a31b98d0df7  -
+-------------------------------------------------
+
+This manually constructed hash can be verified using `git hash-object`
+which of course hides the addition of the header:
+
+-------------------------------------------------
+$ git hash-object hello.txt
+802992c4220de19a90767f3000a79a31b98d0df7
+-------------------------------------------------
 
 As a result, the general consistency of an object can always be tested
 independently of the contents or the type of the object: all objects can
@@ -4123,7 +4154,8 @@ $ git switch --detach e83c5163
 ----------------------------------------------------
 
 The initial revision lays the foundation for almost everything Git has
-today, but is small enough to read in one sitting.
+today (even though details may differ in a few places), but is small
+enough to read in one sitting.
 
 Note that terminology has changed since that revision.  For example, the
 README in that revision uses the word "changeset" to describe what we
-- 
2.43.0



