git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [RFC PATCH 4/6] Add structure representing hash algorithm
Date: Mon, 21 Aug 2017 14:08:40 -0700	[thread overview]
Message-ID: <CAGZ79ka1X5zJwEzH8Pe2vNX8FaKSP_3m6D0Fva4cpO=QUEaeRw@mail.gmail.com> (raw)
In-Reply-To: <20170821000022.26729-5-sandals@crustytoothpaste.net>

On Sun, Aug 20, 2017 at 5:00 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Since in the future we want to support an additional hash algorithm, add
> a structure that represents a hash algorithm and all the data that must
> go along with it.  Add a constant to allow easy enumeration of hash
> algorithms.  Implement function typedefs to create an abstract API that
> can be used by any hash algorithm, and wrappers for the existing SHA1
> functions that conform to this API.
>
> Don't include an entry in the hash algorithm structure for the null
> object ID.  As this value is all zeros, any suitably sized all-zero
> object ID can be used, and there's no need to store a given one on a
> per-hash basis.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  cache.h     | 36 ++++++++++++++++++++++++++++++++++++
>  sha1_file.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 1c69d2a05a..375a7fb15e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -76,6 +76,42 @@ struct object_id {
>         unsigned char hash[GIT_MAX_RAWSZ];
>  };
>
> +#define GIT_HASH_SHA1 0
> +
> +typedef void (*git_hash_init_fn)(void *ctx);
> +typedef void (*git_hash_update_fn)(void *ctx, const void *in, size_t len);
> +typedef void (*git_hash_final_fn)(unsigned char *hash, void *ctx);
> +
> +struct git_hash_algo {
> +       /* The name of the algorithm, as appears in the config file. */
> +       const char *name;

and potentially in error messages?

> +
> +       /* The size of a hash context (e.g. git_SHA_CTX). */
> +       size_t ctxsz;
> +
> +       /* The length of the hash in bytes. */
> +       size_t rawsz;

[in binary, as opposed to the next entry]

> +
> +       /* The length of the hash in hex characters. */
> +       size_t hexsz;

By having two entries for binary and hex size, we current
choice of needing twice as much choice for the human
representation (as is inherent from the binary <-> hex
conversion, one char stores 8 or 4 bit); so this is good to
have. But is it misnamed? (This abstraction only makes
sense if we ever plan to have human readable representation
not factor 2, e.g. base64 for the non-binary representation.
But then the comment is wrong!)

Maybe s/hex characters/string representation suited for humans/ ?
(Bad naming proposal, but still)

> +       /* The hash initialization function. */
> +       git_hash_init_fn init_fn;
> +
> +       /* The hash update function. */
> +       git_hash_update_fn update_fn;
> +
> +       /* The hash finalization function. */
> +       git_hash_final_fn final_fn;

I shortly wondered if we want to have just one
pointer to a struct that contains these 3 functions,
but that seems overly complex.

> +       /* The OID of the empty tree. */
> +       const struct object_id *empty_tree;
> +
> +       /* The OID of the empty blob. */
> +       const struct object_id *empty_blob;

In a more object oriented world, this would
not be as micro-optimized(?), but rather be:

    object o = new object()
    o.getHash()

or such, but we tend to access empty_{blob,tree}
quite often in the code base. So it seems like a good
idea to keep this shortcut around.


> +};
> +extern const struct git_hash_algo hash_algos[1];
> +
>  #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
>  #define DTYPE(de)      ((de)->d_type)
>  #else
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..bd9ff02802 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -41,6 +41,35 @@ const struct object_id empty_blob_oid = {
>         EMPTY_BLOB_SHA1_BIN_LITERAL
>  };
>
> +static inline void git_hash_sha1_init(void *ctx)
> +{
> +       git_SHA1_Init((git_SHA_CTX *)ctx);
> +}
> +
> +static inline void git_hash_sha1_update(void *ctx, const void *data, size_t len)
> +{
> +       git_SHA1_Update((git_SHA_CTX *)ctx, data, len);
> +}
> +
> +static inline void git_hash_sha1_final(unsigned char *hash, void *ctx)
> +{
> +       git_SHA1_Final(hash, (git_SHA_CTX *)ctx);
> +}
> +
> +const struct git_hash_algo hash_algos[1] = {
> +       [GIT_HASH_SHA1] = {
> +               .name = "sha-1",
> +               .ctxsz = sizeof(git_SHA_CTX),
> +               .rawsz = GIT_SHA1_RAWSZ,
> +               .hexsz = GIT_SHA1_HEXSZ,
> +               .init_fn = git_hash_sha1_init,
> +               .update_fn = git_hash_sha1_update,
> +               .final_fn = git_hash_sha1_final,
> +               .empty_tree = &empty_tree_oid,
> +               .empty_blob = &empty_blob_oid,
> +       },
> +};

This is the same new pattern as in 512f41cfac (clean.c: use
designated initializer, 2017-07-14)

Maybe we want to keep just one test balloon and not mix
this one in there, too?

> +
>  /*
>   * This is meant to hold a *small* number of objects that you would
>   * want read_sha1_file() to be able to return, but yet you do not want

  reply	other threads:[~2017-08-21 21:08 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-21  0:00 [RFC PATCH 0/6] Hash Abstraction brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 1/6] vcs-svn: remove unused prototypes brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 2/6] vcs-svn: rename repo functions to "svn_repo" brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 3/6] setup: expose enumerated repo info brian m. carlson
2017-08-21  0:00 ` [RFC PATCH 4/6] Add structure representing hash algorithm brian m. carlson
2017-08-21 21:08   ` Stefan Beller [this message]
2017-08-21  0:00 ` [RFC PATCH 5/6] Integrate hash algorithm support with repo setup brian m. carlson
2017-08-21 21:16   ` Stefan Beller
2017-08-21  0:00 ` [RFC PATCH 6/6] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
2017-08-21  0:18 ` [RFC PATCH 0/6] Hash Abstraction Junio C Hamano
2017-08-21 20:48 ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79ka1X5zJwEzH8Pe2vNX8FaKSP_3m6D0Fva4cpO=QUEaeRw@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).