git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Stefan Beller <sbeller@google.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [RFC PATCH 0/6] Hash Abstraction
Date: Mon, 21 Aug 2017 13:48:05 -0700
Message-ID: <CAGZ79kZ1CTza=PEqP=3gTT=81HRAAUfPWQyWKxNXKf5G0e+nNg@mail.gmail.com> (raw)
In-Reply-To: <20170821000022.26729-1-sandals@crustytoothpaste.net>

On Sun, Aug 20, 2017 at 5:00 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> = Overview
>
> This is an RFC series proposing a basic abstraction for hash functions.
>
> As we get closer to converting the remainder of the codebase to use
> struct object_id, we should think about the design we want our hash
> function abstraction to take.  This series is a proposal for one idea to
> start discussion.  Input on any aspect of this proposal is welcome.
>
> This series exposes a struct git_hash_algo that contains basic
> information about a given hash algorithm that distinguishes it from
> other algorithms: name, lengths, implementing functions, and empty tree
> and blob constants.  It also exposes an array of hash algorithms, and a
> constant for indexing them.
>
> The series also demonstrates a simple conversion using the abstraction
> over empty blob and tree values.
>
> In order to avoid conflicting with the struct repository work and with
> the goal of avoiding global variables as much as possible, I've pushed
> the hash algorithm into struct repository and exposed it via a #define.
> This necessitiates pulling repository.h into cache.h, which I don't
> think is fatal.  Doing that, in turn, necessitated some work on the
> Subversion code to avoid conflicts.
>
> It should be fine for Junio to pick up the first two patches from this
> series, as they're relatively independent and valuable without the rest
> of the series.  The rest should not be applied immediately, although
> they do pass the testsuite.
>
> I proposed this series now as it will inform the way we go about
> converting other parts of the codebase, especially some of the pack
> algorithms.  Because we share some hash computation code between pack
> checksums and object hashing, we need to decide whether to expose pack
> checksums as struct object_id, even though they are technically not
> object IDs.  Furthermore, if we end up needing to stuff an algorithm
> value into struct object_id, we'll no longer be able to directly
> reference object IDs in a pack without a copy.

Note that the checksum hash could be different from actual object ids,
so it may be easiest to introduce a "new pack format", which is just
s/sha1/<new hash>/g for any object ids, though I guess the work needed
is just the same for converting checksums to the new hash, too.


> This series is available from the usual places as branch hash-struct,
> based against master.
>
> = Naming
>
> The length names are similar to the current constant names
> intentionally.  I've used the "hash_algo" name for both the integer
> constant and the pointer to struct, although we could change the latter
> to "hash_impl" or such as people like.
>
> I chose to name the define "current_hash" and expose no other defines.
> The name is relatively short since we're going to be typing it a lot.
> However, if people like, we can capitalize it or expose other defines
> (say, a GIT_HASH_RAWSZ or GIT_HASH_HEXSZ) instead of or in addition to
> current_hash, which would make this name less interesting.
>
> Feel free to propose alternatives to the naming of anything in this
> series.
>
> = Open Issues
>
> I originally decided to convert hex.c as an example, but quickly found
> out that this caused segfaults.  As part of setup, we call
> is_git_directory, which calls validate_headref, which ends up calling
> get_sha1_hex.  Obviously, we don't have a repository, so the hash
> algorithm isn't set up yet.  This is an area we'll need to consider
> making hash function agnostic, and we may also need to consider
> inserting a hash constant integer into struct object_id if we're going
> to do that.  Alternatively, we could just paper over this issue as a
> special case.

We could introduce new files .sha1 or .sha256 in the config dir, whose
existence can be checked prior to validating the HEAD.

Maybe the HEAD validation can be non-strict and could check
if we'd have at least 'n' hex chars and at most 'm' hex chars. 'n' and 'm'
depend on the hash functions supported of that version of git. (Maybe
we'd validate later again to have the correct hash function length,
or we could also allow abbreviated hash names for HEAD)

> Clearly we're going to want to expose some sort of lookup functionality
> for hash algorithms.  We'll need to expose lookup by name (for the
> .git/config file and any command-line options), but we may want other
> functions as well.  What functions should those be?  Should we expose
> the structure or the constant for those lookup functions?  If the
> structure, we'll probably need to expose the constant in the structure
> as well for easy use.
>
> Should we avoid exposing the array of structure altogether and wrap this
> in a function?
>
> We could expose a union of hash context structures and take that as the
> pointer type for the API calls.  That would probably obviate the need
> for ctxsz.
>
> We could expose hex versions of the blob constants if desired.  This
> might make converting the remaining pieces of code that use them easier.
>
> There are probably dozens of other things I haven't thought of yet as
> well.
>
> brian m. carlson (6):
>   vcs-svn: remove unused prototypes
>   vcs-svn: rename repo functions to "svn_repo"
>   setup: expose enumerated repo info
>   Add structure representing hash algorithm
>   Integrate hash algorithm support with repo setup
>   Switch empty tree and blob lookups to use hash abstraction
>
>  builtin/am.c        |  2 +-
>  builtin/checkout.c  |  2 +-
>  builtin/diff.c      |  2 +-
>  builtin/pull.c      |  2 +-
>  cache.h             | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  diff-lib.c          |  2 +-
>  merge-recursive.c   |  2 +-
>  notes-merge.c       |  2 +-
>  repository.c        |  7 +++++++
>  repository.h        |  5 +++++
>  sequencer.c         |  6 +++---
>  setup.c             | 48 +++++++++++++++++++++++++++---------------------
>  sha1_file.c         | 29 +++++++++++++++++++++++++++++
>  submodule.c         |  2 +-
>  vcs-svn/repo_tree.c |  6 +++---
>  vcs-svn/repo_tree.h | 13 +++----------
>  vcs-svn/svndump.c   |  8 ++++----
>  17 files changed, 133 insertions(+), 53 deletions(-)
>

      parent reply index

Thread overview: 11+ messages in thread (expand / mbox.gz / Atom feed / [top])
2017-08-21  0:00 brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo" brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 3/6] setup: expose enumerated repo info brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
2017-08-21 21:08   ` Stefan Beller
2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
2017-08-21 21:16   ` Stefan Beller
2017-08-21  0:00 ` [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
2017-08-21  0:18 ` [RFC PATCH 0/6] Hash Abstraction Junio C Hamano
2017-08-21 20:48 ` Stefan Beller [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply to all the recipients using the --to, --cc,
  and --in-reply-to switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZ1CTza=PEqP=3gTT=81HRAAUfPWQyWKxNXKf5G0e+nNg@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox