git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	git@vger.kernel.org, "Martin Ågren" <martin.agren@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
Date: Fri, 11 Jan 2019 09:51:06 -0500	[thread overview]
Message-ID: <20190111145106.GB16754@sigill.intra.peff.net> (raw)
In-Reply-To: <20190110235551.GM423984@genre.crustytoothpaste.net>

On Thu, Jan 10, 2019 at 11:55:51PM +0000, brian m. carlson wrote:

> > I think the only reason they are "struct object_id" is because that's
> > what tree_entry_extract() returns. Should we be providing another
> > function to allow more byte-oriented access?
> 
> The reason is we recursively call splice_tree, passing rewrite_here as
> the first argument. That argument is then used for read_object_file,
> which requires a struct object_id * (because it is, logically, an object
> ID).
> 
> I *think* we could fix this by copying an unsigned char *rewrite_here
> into a temporary struct object_id before we recurse, but it's not
> obvious to me if that's safe to do.

I think rewrite_here needs to be a direct pointer into the buffer, since
we plan to modify it.

I think rewrite_with is correct to be an object_id. It's either the oid
passed in from the caller, or the subtree we generated (in which case
it's the result from write_object_file).

So the "most correct" thing is probably something like this:

diff --git a/match-trees.c b/match-trees.c
index 03e81b56e1..129b13a970 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1;
 			break;
 		}
 		update_tree_entry(&desc);
@@ -224,7 +231,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	} else {
 		rewrite_with = oid2;
 	}
-	hashcpy(rewrite_here->hash, rewrite_with->hash);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

I think if I were trying to write this in a less-subtle way, I'd
probably stop trying to do it in-place, and have a copy loop more like:

  for entry in src_tree
    if match_entry(entry, prefix)
      entry = rewrite_entry(entry) /* either oid2 or subtree */
    push_entry(dst_tree)

We may even have to go that way eventually if we might ever be rewriting
to a tree with a different hash size (i.e., there is a hidden assumption
here that rewrite_here points to exactly the_hash_algo->rawsz bytes of
hash). But I think for now it's not necessary, and it's way outside the
scope of what you're trying to do here.

-Peff

  reply	other threads:[~2019-01-11 14:51 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
2019-01-08 17:51   ` Junio C Hamano
2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin
2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan
2019-01-08 21:35   ` Junio C Hamano
2019-01-09 21:28     ` Stefan Beller
2019-01-09  7:37 ` Martin Ågren
2019-01-09 21:06   ` Martin Ågren
2019-01-10  1:02     ` brian m. carlson
2019-01-10 18:55       ` Junio C Hamano
2019-01-10 19:03       ` Martin Ågren
2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
2019-01-10  4:25       ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson
2019-01-10  4:25       ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
2019-01-10  4:25       ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson
2019-01-10  6:45         ` Jeff King
2019-01-10 23:55           ` brian m. carlson
2019-01-11 14:51             ` Jeff King [this message]
2019-01-11 14:54               ` Jeff King
2019-01-14  1:30                 ` brian m. carlson
2019-01-14 15:40                   ` Jeff King
2019-01-10  4:25       ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson
2019-01-10  6:49         ` Jeff King
2019-01-10 23:57           ` brian m. carlson
2019-01-10  4:25       ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
2019-01-10  6:50         ` Jeff King
2019-01-10  6:40       ` [PATCH 0/5] tree-walk object_id refactor Jeff King
2019-01-11  0:17         ` brian m. carlson
2019-01-11 14:17           ` Jeff King
2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
2019-01-15  0:39       ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson
2019-01-15  0:39       ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
2019-01-15  0:39       ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson
2019-01-15  0:39       ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson
2019-01-15  0:39       ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
2019-01-15 17:51       ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano
2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
2019-01-10 19:05   ` Junio C Hamano
2019-01-10 19:46   ` Junio C Hamano
2019-01-10 18:02 ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190111145106.GB16754@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martin.agren@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).