git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Eric W. Biederman" <ebiederm@gmail.com>
To: Linus Arver <linusa@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	 git@vger.kernel.org,
	 "brian m. carlson" <sandals@crustytoothpaste.net>,
	 Eric Sunshine <sunshine@sunshineco.com>,
	 "Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
Date: Thu, 15 Feb 2024 00:22:44 -0600	[thread overview]
Message-ID: <8734tumekr.fsf@gmail.froward.int.ebiederm.org> (raw)
In-Reply-To: <owlya5o4dbj1.fsf@fine.c.googlers.com> (Linus Arver's message of "Tue, 13 Feb 2024 00:16:34 -0800")

Linus Arver <linusa@google.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> While looking at how to handle input of both SHA-1 and SHA-256 oids in
>> get_oid_with_context, I realized that the oid_array in
>> repo_for_each_abbrev might have more than one kind of oid stored in it
>> simultaneously.
>>
>> Update to oid_array_append to ensure that oids added to an oid array
>
> s/Update to/Update
>
>> always have an algorithm set.
>>
>> Update void_hashcmp to first verify two oids use the same hash algorithm
>> before comparing them to each other.
>>
>> With that oid-array should be safe to use with different kinds of
>
> s/oid-array/oid_array
>
>> oids simultaneously.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  oid-array.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/oid-array.c b/oid-array.c
>> index 8e4717746c31..1f36651754ed 100644
>> --- a/oid-array.c
>> +++ b/oid-array.c
>> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>>  {
>>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>>  	oidcpy(&array->oid[array->nr++], oid);
>> +	if (!oid->algo)
>> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);
>
> How come we can't set oid->algo _before_ we call oidcpy()? It seems odd
> that we do the copy first and then modify what we just copied after the
> fact, instead of making sure that the thing we want to copy is correct
> before doing the copy.
>
> But also, if we are going to make the oid object "correct" before
> invoking oidcpy(), we might as well do it when the oid is first
> created/used (in the caller(s) of this function). I don't demand that
> you find/demonstrate where all these places are in this series (maybe
> that's a hairy problem to tackle?), but it seems cleaner in principle to
> fix the creation of oid objects instead of having to make oid users
> clean up their act like this after using them.

There is a hairy problem here.

I believe for reasons of simplicity when the algo field was added to
struct object_id it was allowed to be zero for users that don't
particularly care about the hash algorithm, and are happy to use the git
default hash algorithm.

Me experience working on this set of change set showed that there
are oids without their algo set in all kinds of places in the tree.

I could not think of any sure way to go through the entire tree
and find those users, so I just made certain that oid array handled
that case.

I need algo to be set properly in the oids in the oid array so I
could extend oid_array to hold multiple kinds of oids at the same
time.  To allow multiple kinds of oids at the same time void_hashcmp
needs a simple and reliable way to tell what the algorithm is of
any given oid.

>
>>  	array->sorted = 0;
>>  }
>>  
>> -static int void_hashcmp(const void *a, const void *b)
>> +static int void_hashcmp(const void *va, const void *vb)
>>  {
>> -	return oidcmp(a, b);
>> +	const struct object_id *a = va, *b = vb;
>> +	int ret;
>> +	if (a->algo == b->algo)
>> +		ret = oidcmp(a, b);
>
> This makes sense (per the commit message description) ...
>
>> +	else
>> +		ret = a->algo > b->algo ? 1 : -1;
>
> ... but this seems to go against it? I thought you wanted to only ever
> compare hashes if they were of the same algo? It would be good to add a
> comment explaining why this is OK (we are no longer doing a byte-by-byte
> comparison of these oids any more here like we do for oidcmp() above
> which boils down to calling memcmp()).

So the goal of this change is for oid_array to be able to hold hashes
from multiple algorithms at the same time.

A key part of oid_array is oid_array_sort that allows functions such
as oid_array_lookup and oid_array_for_each_unique.

To that end there needs to be a total ordering of oids.

The function oidcmp is only defined when two oids are of the same
algorithm, it does not even test to detect the case of comparing
mismatched algorithms.

Therefore to get a total ordering of oids.  I must use oidcmp
when the algorithm is the same (the common case) or simply order
the oids by algorithm when the algorithms are different.



All of this is relevant to get_oid_with_context as get_oid_with_context
and it's helper functions contain the logic that determines what
we do when a hex string that is ambiguous is specified.

In the ambiguous case all of the possible candidates are placed in
an oid_array, sorted and then displayed.


With a repository that can knows both the sha1 and the sha256 oid
of it's objects it is possible for a short oid to match both
some sha1 oids and some sha256 oids.

>> +	return ret;
>
> Also, in terms of style I think the "early return for errors" style
> would be simpler to read. I.e.
>
>     if (a->algo > b->algo)
>         return 1;
>
>     if (a->algo < b->algo)
>         return -1;
>
>     return oidcmd(a, b);
>

I can see doing:
	if (a->algo == b->algo)
        	return oidcmp(a,b);

	if (a->algo > b->algo)
        	return 1;
        else
        	return -1;

Or even:
	if (a->algo == b->algo)
        	return oidcmp(a,b);

	return a->algo - b->algo;

Although I suspect using subtraction is a bit too clever.

Comparing for less than, and greater than, and then assuming
the values are equal hides what is important before calling
oidcmp which is that the algo values are equal.


>>  }
>>  
>>  void oid_array_sort(struct oid_array *array)
>> -- 
>> 2.41.0

Eric


  reply	other threads:[~2024-02-15  6:23 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
2023-09-27 20:42   ` Eric Sunshine
2023-10-02  1:22     ` Eric W. Biederman
2023-10-02  2:27       ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids Eric W. Biederman
2023-09-27 23:20   ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 03/30] object-names: Support input of oids in any supported hash Eric W. Biederman
2023-09-27 23:29   ` Eric Sunshine
2023-10-02  1:54     ` Eric W. Biederman
2023-09-27 19:55 ` [PATCH 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2023-09-28  7:14   ` Eric Sunshine
2023-10-02  2:11     ` Eric W. Biederman
2023-10-02  2:36       ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 06/30] loose: Compatibilty short name support Eric W. Biederman
2023-09-27 19:55 ` [PATCH 07/30] object-file: Update the loose object map when writing loose objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 08/30] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 09/30] commit: write commits for both hashes Eric W. Biederman
2023-09-27 19:55 ` [PATCH 10/30] commit: Convert mergetag before computing the signature of a commit Eric W. Biederman
2023-09-27 19:55 ` [PATCH 11/30] commit: Export add_header_signature to support handling signatures on tags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 12/30] tag: sign both hashes Eric W. Biederman
2023-09-27 19:55 ` [PATCH 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 14/30] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-09-27 19:55 ` [PATCH 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-09-27 19:55 ` [PATCH 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
2023-09-27 19:55 ` [PATCH 17/30] object-file-convert: Don't leak when converting tag objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-09-27 19:55 ` [PATCH 19/30] object-file-convert: Convert commits that embed signed tags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 20/30] object-file: Update object_info_extended to reencode objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 21/30] repository: Implement extensions.compatObjectFormat Eric W. Biederman
2023-09-27 21:39   ` Junio C Hamano
2023-09-28 20:18     ` Junio C Hamano
2023-09-29  0:50       ` Eric Biederman
2023-09-29 16:59       ` Eric W. Biederman
2023-09-29 18:48         ` Junio C Hamano
2023-10-02  0:48           ` Eric W. Biederman
2023-10-02  1:31     ` Eric W. Biederman
2023-09-27 19:55 ` [PATCH 22/30] rev-parse: Add an --output-object-format parameter Eric W. Biederman
2023-09-27 19:55 ` [PATCH 23/30] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 25/30] object-file: Handle compat objects in check_object_signature Eric W. Biederman
2023-09-27 19:55 ` [PATCH 26/30] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 27/30] test-lib: Compute the compatibility hash so tests may use it Eric W. Biederman
2023-09-27 19:55 ` [PATCH 28/30] t1006: Rename sha1 to oid Eric W. Biederman
2023-09-27 19:55 ` [PATCH 29/30] t1006: Test oid compatibility with cat-file Eric W. Biederman
2023-09-27 19:55 ` [PATCH 30/30] t1016-compatObjectFormat: Add tests to verify the conversion between objects Eric W. Biederman
2023-09-27 21:31 ` [PATCH 00/30] Initial support for multiple hash functions Junio C Hamano
2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
2024-02-08  8:23     ` Linus Arver
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
2024-02-13  8:16     ` Linus Arver
2024-02-15  6:22       ` Eric W. Biederman [this message]
2024-02-16  0:16         ` Linus Arver
2024-02-16  4:48           ` Eric W. Biederman
2024-02-17  1:59             ` Linus Arver
2024-02-13  8:31     ` Kristoffer Haugsbakk
2024-02-15  6:24       ` Eric W. Biederman
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
2024-02-13  9:33     ` Linus Arver
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
2024-02-13 10:02     ` Linus Arver
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2024-02-14  7:20     ` Linus Arver
2024-02-15  5:33       ` Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 06/30] loose: compatibilty short name support Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 07/30] object-file: update the loose object map when writing loose objects Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 08/30] object-file: add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 09/30] commit: write commits for both hashes Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 10/30] commit: convert mergetag before computing the signature of a commit Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 11/30] commit: export add_header_signature to support handling signatures on tags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 12/30] tag: sign both hashes Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 14/30] object: factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 17/30] object-file-convert: don't leak when converting tag objects Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 19/30] object-file-convert: convert commits that embed signed tags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 20/30] object-file: update object_info_extended to reencode objects Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 21/30] repository: implement extensions.compatObjectFormat Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 22/30] rev-parse: add an --output-object-format parameter Eric W. Biederman
2024-02-08 16:25     ` Jean-Noël Avila
2023-10-02  2:40   ` [PATCH v2 23/30] builtin/cat-file: let the oid determine the output algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 25/30] object-file: handle compat objects in check_object_signature Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 26/30] builtin/ls-tree: let the oid determine the output algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 27/30] test-lib: compute the compatibility hash so tests may use it Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 28/30] t1006: rename sha1 to oid Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 29/30] t1006: test oid compatibility with cat-file Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 30/30] t1016-compatObjectFormat: add tests to verify the conversion between objects Eric W. Biederman
2024-02-07 22:18   ` [PATCH v2 00/30] initial support for multiple hash functions Junio C Hamano
2024-02-08  0:24     ` Linus Arver
2024-02-08  6:11       ` Patrick Steinhardt
2024-02-14  7:36       ` Linus Arver
2024-02-15 11:27   ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8734tumekr.fsf@gmail.froward.int.ebiederm.org \
    --to=ebiederm@gmail.com \
    --cc=ebiederm@xmission.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=linusa@google.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).