[PATCH 0/5] Modernize read_graft

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/5] Modernize read_graft_line implementation
@ 2017-08-15 11:49 Patryk Obara
  2017-08-15 11:49 ` [PATCH 1/5] cache: extend object_id size to sha3-256 Patryk Obara
                   ` (6 more replies)
  0 siblings, 7 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

I experimented with using a different hash algorithm (I am aware of
existing "Git hash function transition plan", I just want to push
things forward a bit) - and immediately hit a small issue - changing
the size of object_id hash buffer leads to compilation issues and
breaks graft-related tests.

I am sending patch 1 only to show a modification, that I did to
increase buffer size - it's not intended to be merged.

Patch 2 fixes trivial compilation issue.

Patches 3, 4, and 5 touch graft implementation to remove calculations
using GIT_SHA1_*, that lead to broken tests. I replaced FLEX_ARRAY of
object_id's representing parents with oid_array. New implementation
should be more future-proof, I think.

New implementation has tiny behaviour change: previously parents in
graft line needed to be separated with single space - now any number
of whitespace characters will do.

Alternative implementation approaches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Strbuf could be replaced with string_list with
string_list_split_in_place instead of while loop in read_graft_line.
I didn't implement it this way because I learned
about string_list_split_in_place after finishing this implementation
draft. Right now I'm not sure which approach is better.

Another possibility is dropping graft feature altogether - that would
mean removing code for parsing grafts and 'parent' field in the struct,
but preserving the struct itself as a shallow clone marker. Grafts are
a little-known feature with modern replacement, but this seems like
bigger task and rather out of the scope of transition to the new
hashing algorithm.

I considered making function read_graft_line a static one and
read_graft_file non-static, but read_graft_line is used in
'builtin/blame.c' in function read_ancestry, which is almost a copy of
read_graft_file (difference of single boolean flag passed to
register_commit_graft). Removal of this duplication may be worthwhile,
but I think it's out of scope.

Patryk Obara (5):
  cache: extend object_id size to sha3-256
  sha1_file: fix hardcoded size in null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: implement free_commit_graft
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 cache.h         |  8 ++++++--
 commit.c        | 55 ++++++++++++++++++++++++++++++++-----------------------
 commit.h        |  5 +++--
 sha1_file.c     |  2 +-
 shallow.c       |  1 +
 6 files changed, 44 insertions(+), 29 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/5] cache: extend object_id size to sha3-256
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
@ 2017-08-15 11:49 ` Patryk Obara
  2017-08-15 11:49 ` [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

This commit is not intended to be merged - it serves only as context for
next patches in this thread.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 cache.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 1c69d2a..ad8a57c 100644
--- a/cache.h
+++ b/cache.h
@@ -68,9 +68,13 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
 
+/* The length in bytes and in hex digits of an object name (SHA3-256 value). */
+#define GIT_SHA3_256_RAWSZ 32
+#define GIT_SHA3_256_HEXSZ (2 * GIT_SHA3_256_RAWSZ)
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA3_256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA3_256_HEXSZ
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
  2017-08-15 11:49 ` [PATCH 1/5] cache: extend object_id size to sha3-256 Patryk Obara
@ 2017-08-15 11:49 ` Patryk Obara
  2017-08-15 18:23   ` Junio C Hamano
  2017-08-15 11:49 ` [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

This prevents compilation error if GIT_MAX_RAWSZ is different than 20.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
  2017-08-15 11:49 ` [PATCH 1/5] cache: extend object_id size to sha3-256 Patryk Obara
  2017-08-15 11:49 ` [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
@ 2017-08-15 11:49 ` Patryk Obara
  2017-08-15 17:02   ` Stefan Beller
  2017-08-15 18:25   ` Junio C Hamano
  2017-08-15 11:49 ` [PATCH 4/5] commit: implement free_commit_graft Patryk Obara
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 builtin/blame.c |  2 +-
 commit.c        | 11 ++++++-----
 commit.h        |  2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (graft)
 			register_commit_graft(graft, 0);
 	}
diff --git a/commit.c b/commit.c
index 8b28415..499fb14 100644
--- a/commit.c
+++ b/commit.c
@@ -134,15 +134,16 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i;
+	int i, len;
+	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
 	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-	while (len && isspace(buf[len-1]))
-		buf[--len] = '\0';
+	strbuf_rtrim(line);
+	len = line->len;
 	if (buf[0] == '#' || buf[0] == '\0')
 		return NULL;
 	if ((len + 1) % entry_size)
@@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (!graft)
 			continue;
 		if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 4/5] commit: implement free_commit_graft
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
                   ` (2 preceding siblings ...)
  2017-08-15 11:49 ` [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-15 11:49 ` Patryk Obara
  2017-08-15 17:04   ` Stefan Beller
  2017-08-15 18:26   ` Junio C Hamano
  2017-08-15 11:49 ` [PATCH 5/5] commit: rewrite read_graft_line Patryk Obara
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 11 ++++++++---
 commit.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 499fb14..6a145f1 100644
--- a/commit.c
+++ b/commit.c
@@ -109,15 +109,20 @@ static int commit_graft_pos(const unsigned char *sha1)
 			commit_graft_sha1_access);
 }
 
+void free_commit_graft(struct commit_graft *graft)
+{
+	free(graft);
+}
+
 int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 {
 	int pos = commit_graft_pos(graft->oid.hash);
 
 	if (0 <= pos) {
 		if (ignore_dups)
-			free(graft);
+			free_commit_graft(graft);
 		else {
-			free(commit_graft[pos]);
+			free_commit_graft(commit_graft[pos]);
 			commit_graft[pos] = graft;
 		}
 		return 1;
@@ -163,7 +168,7 @@ struct commit_graft *read_graft_line(struct strbuf *line)
 
 bad_graft_data:
 	error("bad graft data: %s", buf);
-	free(graft);
+	free_commit_graft(graft);
 	return NULL;
 }
 
diff --git a/commit.h b/commit.h
index baecc0a..c1b319f 100644
--- a/commit.h
+++ b/commit.h
@@ -247,6 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
+void free_commit_graft(struct commit_graft *);
 struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 5/5] commit: rewrite read_graft_line
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
                   ` (3 preceding siblings ...)
  2017-08-15 11:49 ` [PATCH 4/5] commit: implement free_commit_graft Patryk Obara
@ 2017-08-15 11:49 ` Patryk Obara
  2017-08-15 17:11   ` Stefan Beller
  2017-08-15 18:30   ` Junio C Hamano
  2017-08-15 17:19 ` [PATCH 0/5] Modernize read_graft_line implementation Stefan Beller
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
  6 siblings, 2 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-15 11:49 UTC (permalink / raw)
  To: git, brian m . carlson, Junio C Hamano; +Cc: Patryk Obara

The previous implementation of read_graft_line used calculations based
on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
ids in a single graft line.  New implementation does not depend on these
constants, so it adapts to any object_id buffer size.

To make this possible, FLEX_ARRAY of object_id in struct was replaced
by an oid_array.

Code allocating graft now needs to use memset to zero the memory before
use to start with oid_array in a consistent state.

Updates free_graft function implemented in the previous patch to
properly cleanup an oid_array storing parents.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c  | 39 +++++++++++++++++++++------------------
 commit.h  |  2 +-
 shallow.c |  1 +
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index 6a145f1..75dd45d 100644
--- a/commit.c
+++ b/commit.c
@@ -111,6 +111,7 @@ static int commit_graft_pos(const unsigned char *sha1)
 
 void free_commit_graft(struct commit_graft *graft)
 {
+	oid_array_clear(&graft->parents);
 	free(graft);
 }
 
@@ -139,35 +140,37 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
+static int parse_next_oid_hex(const char *buf, struct object_id *oid, const char **end)
+{
+	while (isspace(buf[0]))
+		buf++;
+	return parse_oid_hex(buf, oid, end);
+}
+
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i, len;
-	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
-	const int entry_size = GIT_SHA1_HEXSZ + 1;
+	struct object_id oid;
+	const char *tail = NULL;
 
 	strbuf_rtrim(line);
-	len = line->len;
-	if (buf[0] == '#' || buf[0] == '\0')
+	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
-	if ((len + 1) % entry_size)
+	graft = xmalloc(sizeof(*graft));
+	memset(graft, 0, sizeof(*graft));
+	if (parse_oid_hex(line->buf, &graft->oid, &tail))
 		goto bad_graft_data;
-	i = (len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
-	graft->nr_parent = i;
-	if (get_oid_hex(buf, &graft->oid))
+	while (!parse_next_oid_hex(tail, &oid, &tail))
+		oid_array_append(&graft->parents, &oid);
+	if (tail[0] != '\0')
 		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-		if (buf[i] != ' ')
-			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
-			goto bad_graft_data;
-	}
+	graft->nr_parent = graft->parents.nr;
+
 	return graft;
 
 bad_graft_data:
-	error("bad graft data: %s", buf);
+	error("bad graft data: %s", line->buf);
 	free_commit_graft(graft);
 	return NULL;
 }
@@ -363,7 +366,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		int i;
 		struct commit *new_parent;
 		for (i = 0; i < graft->nr_parent; i++) {
-			new_parent = lookup_commit(&graft->parent[i]);
+			new_parent = lookup_commit(&graft->parents.oid[i]);
 			if (!new_parent)
 				continue;
 			pptr = &commit_list_insert(new_parent, pptr)->next;
diff --git a/commit.h b/commit.h
index c1b319f..070d45d 100644
--- a/commit.h
+++ b/commit.h
@@ -243,7 +243,7 @@ void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
 struct commit_graft {
 	struct object_id oid;
 	int nr_parent; /* < 0 if shallow commit */
-	struct object_id parent[FLEX_ARRAY]; /* more */
+	struct oid_array parents;
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
diff --git a/shallow.c b/shallow.c
index f5591e5..892cd90 100644
--- a/shallow.c
+++ b/shallow.c
@@ -33,6 +33,7 @@ int register_shallow(const struct object_id *oid)
 		xmalloc(sizeof(struct commit_graft));
 	struct commit *commit = lookup_commit(oid);
 
+	memset(graft, 0, sizeof(*graft));
 	oidcpy(&graft->oid, oid);
 	graft->nr_parent = -1;
 	if (commit && commit->object.parsed)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-15 11:49 ` [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-15 17:02   ` Stefan Beller
  2017-08-16 12:24     ` Patryk Obara
  2017-08-15 18:25   ` Junio C Hamano
  1 sibling, 1 reply; 59+ messages in thread
From: Stefan Beller @ 2017-08-15 17:02 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git@vger.kernel.org, brian m . carlson, Junio C Hamano

On Tue, Aug 15, 2017 at 4:49 AM, Patryk Obara <patryk.obara@gmail.com> wrote:
> This simplifies function declaration and allows for use of strbuf_rtrim
> instead of modifying buffer directly.
>
> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  builtin/blame.c |  2 +-
>  commit.c        | 11 ++++++-----
>  commit.h        |  2 +-
>  3 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index bda1a78..d4472e9 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
>                 return -1;
>         while (!strbuf_getwholeline(&buf, fp, '\n')) {
>                 /* The format is just "Commit Parent1 Parent2 ...\n" */
> -               struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
> +               struct commit_graft *graft = read_graft_line(&buf);
>                 if (graft)
>                         register_commit_graft(graft, 0);
>         }
> diff --git a/commit.c b/commit.c
> index 8b28415..499fb14 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -134,15 +134,16 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>         return 0;
>  }
>
> -struct commit_graft *read_graft_line(char *buf, int len)
> +struct commit_graft *read_graft_line(struct strbuf *line)
>  {
>         /* The format is just "Commit Parent1 Parent2 ...\n" */
> -       int i;
> +       int i, len;
> +       char *buf = line->buf;
>         struct commit_graft *graft = NULL;
>         const int entry_size = GIT_SHA1_HEXSZ + 1;

outside the scope of this patch:
Is GIT_SHA1_HEXSZ or GIT_MAX_HEXSZ the right call here?

>
> -       while (len && isspace(buf[len-1]))
> -               buf[--len] = '\0';
> +       strbuf_rtrim(line);
> +       len = line->len;
>         if (buf[0] == '#' || buf[0] == '\0')
>                 return NULL;
>         if ((len + 1) % entry_size)
> @@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
>                 return -1;
>         while (!strbuf_getwholeline(&buf, fp, '\n')) {
>                 /* The format is just "Commit Parent1 Parent2 ...\n" */
> -               struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
> +               struct commit_graft *graft = read_graft_line(&buf);
>                 if (!graft)
>                         continue;
>                 if (register_commit_graft(graft, 1))
> diff --git a/commit.h b/commit.h
> index 6d857f0..baecc0a 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -247,7 +247,7 @@ struct commit_graft {
>  };
>  typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>
> -struct commit_graft *read_graft_line(char *buf, int len);
> +struct commit_graft *read_graft_line(struct strbuf *line);
>  int register_commit_graft(struct commit_graft *, int);
>  struct commit_graft *lookup_commit_graft(const struct object_id *oid);
>
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/5] commit: implement free_commit_graft
  2017-08-15 11:49 ` [PATCH 4/5] commit: implement free_commit_graft Patryk Obara
@ 2017-08-15 17:04   ` Stefan Beller
  2017-08-15 18:26   ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Stefan Beller @ 2017-08-15 17:04 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git@vger.kernel.org, brian m . carlson, Junio C Hamano

On Tue, Aug 15, 2017 at 4:49 AM, Patryk Obara <patryk.obara@gmail.com> wrote:

Here is a good place to explain why this is a good patch,
(which is not immediately obvious to me at least).

> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  commit.c | 11 ++++++++---
>  commit.h |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 499fb14..6a145f1 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -109,15 +109,20 @@ static int commit_graft_pos(const unsigned char *sha1)
>                         commit_graft_sha1_access);
>  }
>
> +void free_commit_graft(struct commit_graft *graft)
> +{
> +       free(graft);
> +}
> +
>  int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>  {
>         int pos = commit_graft_pos(graft->oid.hash);
>
>         if (0 <= pos) {
>                 if (ignore_dups)
> -                       free(graft);
> +                       free_commit_graft(graft);
>                 else {
> -                       free(commit_graft[pos]);
> +                       free_commit_graft(commit_graft[pos]);
>                         commit_graft[pos] = graft;
>                 }
>                 return 1;
> @@ -163,7 +168,7 @@ struct commit_graft *read_graft_line(struct strbuf *line)
>
>  bad_graft_data:
>         error("bad graft data: %s", buf);
> -       free(graft);
> +       free_commit_graft(graft);
>         return NULL;
>  }
>
> diff --git a/commit.h b/commit.h
> index baecc0a..c1b319f 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -247,6 +247,7 @@ struct commit_graft {
>  };
>  typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>
> +void free_commit_graft(struct commit_graft *);
>  struct commit_graft *read_graft_line(struct strbuf *line);
>  int register_commit_graft(struct commit_graft *, int);
>  struct commit_graft *lookup_commit_graft(const struct object_id *oid);
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/5] commit: rewrite read_graft_line
  2017-08-15 11:49 ` [PATCH 5/5] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-15 17:11   ` Stefan Beller
  2017-08-15 18:30   ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Stefan Beller @ 2017-08-15 17:11 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git@vger.kernel.org, brian m . carlson, Junio C Hamano

On Tue, Aug 15, 2017 at 4:49 AM, Patryk Obara <patryk.obara@gmail.com> wrote:
> The previous implementation of read_graft_line used calculations based
> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
> ids in a single graft line.  New implementation does not depend on these
> constants, so it adapts to any object_id buffer size.
>
> To make this possible, FLEX_ARRAY of object_id in struct was replaced
> by an oid_array.
>
> Code allocating graft now needs to use memset to zero the memory before
> use to start with oid_array in a consistent state.
>
> Updates free_graft function implemented in the previous patch to
> properly cleanup an oid_array storing parents.
>
> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  commit.c  | 39 +++++++++++++++++++++------------------
>  commit.h  |  2 +-
>  shallow.c |  1 +
>  3 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 6a145f1..75dd45d 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -111,6 +111,7 @@ static int commit_graft_pos(const unsigned char *sha1)
>
>  void free_commit_graft(struct commit_graft *graft)
>  {
> +       oid_array_clear(&graft->parents);

Oh patch 4/5 could just say it's in preparation of this patch.
Nevermind the comment on patch 4.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/5] Modernize read_graft_line implementation
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
                   ` (4 preceding siblings ...)
  2017-08-15 11:49 ` [PATCH 5/5] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-15 17:19 ` Stefan Beller
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
  6 siblings, 0 replies; 59+ messages in thread
From: Stefan Beller @ 2017-08-15 17:19 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git@vger.kernel.org, brian m . carlson, Junio C Hamano

On Tue, Aug 15, 2017 at 4:49 AM, Patryk Obara <patryk.obara@gmail.com> wrote:

Welcome (back?) to the git mailing list!

> I experimented with using a different hash algorithm (I am aware of
> existing "Git hash function transition plan", I just want to push
> things forward a bit) - and immediately hit a small issue - changing
> the size of object_id hash buffer leads to compilation issues and
> breaks graft-related tests.

Thanks for advancing this frontier. :)

>
> I am sending patch 1 only to show a modification, that I did to
> increase buffer size - it's not intended to be merged.
>
> Patch 2 fixes trivial compilation issue.
>
> Patches 3, 4, and 5 touch graft implementation to remove calculations
> using GIT_SHA1_*, that lead to broken tests. I replaced FLEX_ARRAY of
> object_id's representing parents with oid_array. New implementation
> should be more future-proof, I think.

I would think so, too.

parse_oid_hex currently only reads sha1, but once it can read a new
hash (or both old and new hash), it would solve the graft problems.

> New implementation has tiny behaviour change: previously parents in
> graft line needed to be separated with single space - now any number
> of whitespace characters will do.

Yeah that is because of parse_next_oid_hex in patch 5 is pretty smart
(and if we'd want to preserve behavior we'd need to just skip one SP
and in case of more SP "goto bad_graft_data" that is in the
caller function.

>
> Alternative implementation approaches
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Strbuf could be replaced with string_list with
> string_list_split_in_place instead of while loop in read_graft_line.
> I didn't implement it this way because I learned
> about string_list_split_in_place after finishing this implementation
> draft. Right now I'm not sure which approach is better.
>
> Another possibility is dropping graft feature altogether - that would
> mean removing code for parsing grafts and 'parent' field in the struct,
> but preserving the struct itself as a shallow clone marker. Grafts are
> a little-known feature with modern replacement, but this seems like
> bigger task and rather out of the scope of transition to the new
> hashing algorithm.
>
> I considered making function read_graft_line a static one and
> read_graft_file non-static, but read_graft_line is used in
> 'builtin/blame.c' in function read_ancestry, which is almost a copy of
> read_graft_file (difference of single boolean flag passed to
> register_commit_graft). Removal of this duplication may be worthwhile,
> but I think it's out of scope.

I think the grafts may be still in use in Linux, to fault in the
history before git was used, which cannot be replaced by the
shallow mechanism.

Thanks for the patches 2-5!

Stefan

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-15 11:49 ` [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
@ 2017-08-15 18:23   ` Junio C Hamano
  2017-08-15 18:29     ` Stefan Beller
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-15 18:23 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> This prevents compilation error if GIT_MAX_RAWSZ is different than 20.
>
> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  sha1_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I think this is OK for "null" thing, but in general I feel
ambivalent when I see the use of "MAX" thing.

Use of "MAX" here implies that we wish to support multiple hashes at
the same time in a single binary, so that longer or shorter hashes
fit there.  But in such a case, the callers would want a union of
some sort so that they can say "I am using SHA2, give me the null
thing"?  

I said this is OK for "null" because we assume we will use ^\0{len}$
for any hash function we choose as the "impossible" value, and for
that particular use pattern, we do not need such a union.  Just
letting the caller peek at an appropriate number of bytes at the
beginning of that NUL buffer for hash the caller wants to use is
sufficient.

Thanks.

>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15..f5b5bec 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -32,7 +32,7 @@
>  #define SZ_FMT PRIuMAX
>  static inline uintmax_t sz_fmt(size_t s) { return s; }
>  
> -const unsigned char null_sha1[20];
> +const unsigned char null_sha1[GIT_MAX_RAWSZ];
>  const struct object_id null_oid;
>  const struct object_id empty_tree_oid = {
>  	EMPTY_TREE_SHA1_BIN_LITERAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-15 11:49 ` [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
  2017-08-15 17:02   ` Stefan Beller
@ 2017-08-15 18:25   ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-15 18:25 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> This simplifies function declaration and allows for use of strbuf_rtrim
> instead of modifying buffer directly.
>
> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  builtin/blame.c |  2 +-
>  commit.c        | 11 ++++++-----
>  commit.h        |  2 +-
>  3 files changed, 8 insertions(+), 7 deletions(-)

Looks good; both existing callers already have strbuf, and we do not
expect we will gain a lot more callers to this ancient facility, so
there is no point to have a more accomodating API that separately
takes <ptr,len>.

> diff --git a/builtin/blame.c b/builtin/blame.c
> index bda1a78..d4472e9 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
>  		return -1;
>  	while (!strbuf_getwholeline(&buf, fp, '\n')) {
>  		/* The format is just "Commit Parent1 Parent2 ...\n" */
> -		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
> +		struct commit_graft *graft = read_graft_line(&buf);
>  		if (graft)
>  			register_commit_graft(graft, 0);
>  	}
> diff --git a/commit.c b/commit.c
> index 8b28415..499fb14 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -134,15 +134,16 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>  	return 0;
>  }
>  
> -struct commit_graft *read_graft_line(char *buf, int len)
> +struct commit_graft *read_graft_line(struct strbuf *line)
>  {
>  	/* The format is just "Commit Parent1 Parent2 ...\n" */
> -	int i;
> +	int i, len;
> +	char *buf = line->buf;
>  	struct commit_graft *graft = NULL;
>  	const int entry_size = GIT_SHA1_HEXSZ + 1;
>  
> -	while (len && isspace(buf[len-1]))
> -		buf[--len] = '\0';
> +	strbuf_rtrim(line);
> +	len = line->len;
>  	if (buf[0] == '#' || buf[0] == '\0')
>  		return NULL;
>  	if ((len + 1) % entry_size)
> @@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
>  		return -1;
>  	while (!strbuf_getwholeline(&buf, fp, '\n')) {
>  		/* The format is just "Commit Parent1 Parent2 ...\n" */
> -		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
> +		struct commit_graft *graft = read_graft_line(&buf);
>  		if (!graft)
>  			continue;
>  		if (register_commit_graft(graft, 1))
> diff --git a/commit.h b/commit.h
> index 6d857f0..baecc0a 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -247,7 +247,7 @@ struct commit_graft {
>  };
>  typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>  
> -struct commit_graft *read_graft_line(char *buf, int len);
> +struct commit_graft *read_graft_line(struct strbuf *line);
>  int register_commit_graft(struct commit_graft *, int);
>  struct commit_graft *lookup_commit_graft(const struct object_id *oid);

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/5] commit: implement free_commit_graft
  2017-08-15 11:49 ` [PATCH 4/5] commit: implement free_commit_graft Patryk Obara
  2017-08-15 17:04   ` Stefan Beller
@ 2017-08-15 18:26   ` Junio C Hamano
  2017-08-16 12:28     ` Patryk Obara
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-15 18:26 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  commit.c | 11 ++++++++---
>  commit.h |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)

I do not see a need to make this new function extern.  Shouldn't it
be made "static" and revert the change to commit.h?

> diff --git a/commit.c b/commit.c
> index 499fb14..6a145f1 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -109,15 +109,20 @@ static int commit_graft_pos(const unsigned char *sha1)
>  			commit_graft_sha1_access);
>  }
>  
> +void free_commit_graft(struct commit_graft *graft)
> +{
> +	free(graft);
> +}
> +
>  int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>  {
>  	int pos = commit_graft_pos(graft->oid.hash);
>  
>  	if (0 <= pos) {
>  		if (ignore_dups)
> -			free(graft);
> +			free_commit_graft(graft);
>  		else {
> -			free(commit_graft[pos]);
> +			free_commit_graft(commit_graft[pos]);
>  			commit_graft[pos] = graft;
>  		}
>  		return 1;
> @@ -163,7 +168,7 @@ struct commit_graft *read_graft_line(struct strbuf *line)
>  
>  bad_graft_data:
>  	error("bad graft data: %s", buf);
> -	free(graft);
> +	free_commit_graft(graft);
>  	return NULL;
>  }
>  
> diff --git a/commit.h b/commit.h
> index baecc0a..c1b319f 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -247,6 +247,7 @@ struct commit_graft {
>  };
>  typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
>  
> +void free_commit_graft(struct commit_graft *);
>  struct commit_graft *read_graft_line(struct strbuf *line);
>  int register_commit_graft(struct commit_graft *, int);
>  struct commit_graft *lookup_commit_graft(const struct object_id *oid);

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-15 18:23   ` Junio C Hamano
@ 2017-08-15 18:29     ` Stefan Beller
  2017-08-15 19:02       ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Stefan Beller @ 2017-08-15 18:29 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Nieder
  Cc: Patryk Obara, git@vger.kernel.org, brian m . carlson

On Tue, Aug 15, 2017 at 11:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Patryk Obara <patryk.obara@gmail.com> writes:
>
>> This prevents compilation error if GIT_MAX_RAWSZ is different than 20.
>>
>> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
>> ---
>>  sha1_file.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I think this is OK for "null" thing, but in general I feel
> ambivalent when I see the use of "MAX" thing.
>
> Use of "MAX" here implies that we wish to support multiple hashes at
> the same time in a single binary, so that longer or shorter hashes
> fit there.  But in such a case, the callers would want a union of
> some sort so that they can say "I am using SHA2, give me the null
> thing"?
>
> I said this is OK for "null" because we assume we will use ^\0{len}$
> for any hash function we choose as the "impossible" value, and for
> that particular use pattern, we do not need such a union.  Just
> letting the caller peek at an appropriate number of bytes at the
> beginning of that NUL buffer for hash the caller wants to use is
> sufficient.
>
> Thanks.

Once we have 2 hash functions usable in a local Git installation,
this would be wasteful for the smaller hash function (and the
related grafts).

I think Jonathan once envisioned an 'optimized' version as a
second step, maybe this is a good time to discuss how we'd
get the right size for e.g. allocating memory, as _MAX_ seems
to be not the correct solution long term?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/5] commit: rewrite read_graft_line
  2017-08-15 11:49 ` [PATCH 5/5] commit: rewrite read_graft_line Patryk Obara
  2017-08-15 17:11   ` Stefan Beller
@ 2017-08-15 18:30   ` Junio C Hamano
  2017-08-16 13:07     ` Patryk Obara
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-15 18:30 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> The previous implementation of read_graft_line used calculations based
> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
> ids in a single graft line.  New implementation does not depend on these
> constants, so it adapts to any object_id buffer size.

I am not sure if this is a good approach.  Just like in 2/5 you can
use the MAX thing instead of 20, instead of having each graft entry
allocate a separate oid_array.oid[].

Is this because you expect more than one _kind_ of hashes are used
at the same time?


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-15 18:29     ` Stefan Beller
@ 2017-08-15 19:02       ` Junio C Hamano
  2017-08-16 12:11         ` Patryk Obara
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-15 19:02 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Nieder, Patryk Obara, git@vger.kernel.org,
	brian m . carlson

Stefan Beller <sbeller@google.com> writes:

> Once we have 2 hash functions usable in a local Git installation,
> this would be wasteful for the smaller hash function (and the
> related grafts).
>
> I think Jonathan once envisioned an 'optimized' version as a
> second step, maybe this is a good time to discuss how we'd
> get the right size for e.g. allocating memory, as _MAX_ seems
> to be not the correct solution long term?

MAX is inevitable only if we envision that we have to handle objects
named using two or more hashing schemes at the same time, with the
same binary and during the same run inside a single process.

Using MAX may be nicer even if we use only one hashing scheme at a
time, though.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-15 19:02       ` Junio C Hamano
@ 2017-08-16 12:11         ` Patryk Obara
  2017-08-16 19:32           ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 12:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jonathan Nieder, git@vger.kernel.org,
	brian m . carlson

Junio C Hamano <gitster@pobox.com> wrote:

> I said this is OK for "null" because we assume we will use ^\0{len}$
> for any hash function we choose as the "impossible" value, and for
> that particular use pattern, we do not need such a union.  Just
> letting the caller peek at an appropriate number of bytes at the
> beginning of that NUL buffer for hash the caller wants to use is
> sufficient.

Do you think I should record this explanation as either commit message
or comment in sha1_file.c?

> MAX is inevitable only if we envision that we have to handle objects
> named using two or more hashing schemes at the same time, with the
> same binary and during the same run inside a single process.

I think this will be the case if "transition one local repository at
a time" from Jonathan Nieder's transition plan will be followed.
This plan assumes object_id translation happening e.g. during fetch
operation.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-15 17:02   ` Stefan Beller
@ 2017-08-16 12:24     ` Patryk Obara
  2017-08-16 22:59       ` brian m. carlson
  0 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 12:24 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, brian m . carlson, Junio C Hamano

On Tue, Aug 15, 2017 at 7:02 PM, Stefan Beller <sbeller@google.com> wrote:
>>         const int entry_size = GIT_SHA1_HEXSZ + 1;
>
> outside the scope of this patch:
> Is GIT_SHA1_HEXSZ or GIT_MAX_HEXSZ the right call here?

I think neither one. In my opinion, this code should not be so closely
coupled to hash parsing code - it should be tasked with parsing
whitespace separated list of commit ids without relying on specific
commit id length or format.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/5] commit: implement free_commit_graft
  2017-08-15 18:26   ` Junio C Hamano
@ 2017-08-16 12:28     ` Patryk Obara
  0 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 12:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, brian m . carlson

Junio C Hamano <gitster@pobox.com> wrote:
> I do not see a need to make this new function extern.  Shouldn't it
> be made "static" and revert the change to commit.h?

Ah, of course :) I was anticipating to find free(graft); in more places
throughout code but forgot to get back to it once there was none
left.

I will change it in v2.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/5] commit: rewrite read_graft_line
  2017-08-15 18:30   ` Junio C Hamano
@ 2017-08-16 13:07     ` Patryk Obara
  2017-08-16 19:39       ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 13:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, brian m . carlson

Junio C Hamano <gitster@pobox.com> wrote:
> I am not sure if this is a good approach.  Just like in 2/5 you can
> use the MAX thing instead of 20, instead of having each graft entry
> allocate a separate oid_array.oid[].

Once MAX values were increased memory corruption was caused exactly by
this line:

> - graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));

I could've replaced it by:

    graft = xmalloc(st_add(sizeof(*graft), st_mult(sizeof(struct
object_id), i)));

But it seemed to me like short-sighted solution (code might be broken if
object_id will be modified again in future).

> Is this because you expect more than one _kind_ of hashes are used
> at the same time?

Yes - I think more than one kind of hashes will be used at the same time
(meaning in single git binary, not necessarily in a single process), at
the very least to allow read access to repositories in "old" format(s), with
sha1-based grafts. Using MAX for parsing here would prevent this,
requiring either modifying grafts code again in future or forcing right now
some unwelcome design decisions down the line.

My goal here was to make this code compatible with whatever hash
function transition plan will be chosen.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 0/4] Modernize read_graft_line implementation
  2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
                   ` (5 preceding siblings ...)
  2017-08-15 17:19 ` [PATCH 0/5] Modernize read_graft_line implementation Stefan Beller
@ 2017-08-16 17:58 ` Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
                     ` (4 more replies)
  6 siblings, 5 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 17:58 UTC (permalink / raw)
  To: git, sandals, Junio C Hamano, Stefan Beller, Jonathan Nieder; +Cc: Patryk Obara

Compared to v1:
- the first patch is dropped to make it easier to merge
- free_graft is now static function in commit.c

I don't know, what are exact rules about adding Reviewed-by footer, so
I didn't add any.


Patryk Obara (4):
  sha1_file: fix hardcoded size in null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: implement free_commit_graft
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c        | 55 ++++++++++++++++++++++++++++++++-----------------------
 commit.h        |  4 ++--
 sha1_file.c     |  2 +-
 shallow.c       |  1 +
 5 files changed, 37 insertions(+), 27 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
@ 2017-08-16 17:58   ` Patryk Obara
  2017-08-16 19:46     ` Junio C Hamano
  2017-08-16 17:58   ` [PATCH v2 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 17:58 UTC (permalink / raw)
  To: git, sandals, Junio C Hamano, Stefan Beller, Jonathan Nieder; +Cc: Patryk Obara

This prevents compilation error if GIT_MAX_RAWSZ is different than 20.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
@ 2017-08-16 17:58   ` Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 3/4] commit: implement free_commit_graft Patryk Obara
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 17:58 UTC (permalink / raw)
  To: git, sandals, Junio C Hamano, Stefan Beller, Jonathan Nieder; +Cc: Patryk Obara

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 builtin/blame.c |  2 +-
 commit.c        | 11 ++++++-----
 commit.h        |  2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (graft)
 			register_commit_graft(graft, 0);
 	}
diff --git a/commit.c b/commit.c
index 8b28415..499fb14 100644
--- a/commit.c
+++ b/commit.c
@@ -134,15 +134,16 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i;
+	int i, len;
+	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
 	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-	while (len && isspace(buf[len-1]))
-		buf[--len] = '\0';
+	strbuf_rtrim(line);
+	len = line->len;
 	if (buf[0] == '#' || buf[0] == '\0')
 		return NULL;
 	if ((len + 1) % entry_size)
@@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (!graft)
 			continue;
 		if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 3/4] commit: implement free_commit_graft
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-16 17:58   ` Patryk Obara
  2017-08-16 17:58   ` [PATCH v2 4/4] commit: rewrite read_graft_line Patryk Obara
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
  4 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 17:58 UTC (permalink / raw)
  To: git, sandals, Junio C Hamano, Stefan Beller, Jonathan Nieder; +Cc: Patryk Obara

In preparation for new graft struct version introduced in next commit.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 499fb14..4d23e72 100644
--- a/commit.c
+++ b/commit.c
@@ -109,15 +109,20 @@ static int commit_graft_pos(const unsigned char *sha1)
 			commit_graft_sha1_access);
 }
 
+static void free_commit_graft(struct commit_graft *graft)
+{
+	free(graft);
+}
+
 int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 {
 	int pos = commit_graft_pos(graft->oid.hash);
 
 	if (0 <= pos) {
 		if (ignore_dups)
-			free(graft);
+			free_commit_graft(graft);
 		else {
-			free(commit_graft[pos]);
+			free_commit_graft(commit_graft[pos]);
 			commit_graft[pos] = graft;
 		}
 		return 1;
@@ -163,7 +168,7 @@ struct commit_graft *read_graft_line(struct strbuf *line)
 
 bad_graft_data:
 	error("bad graft data: %s", buf);
-	free(graft);
+	free_commit_graft(graft);
 	return NULL;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v2 4/4] commit: rewrite read_graft_line
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
                     ` (2 preceding siblings ...)
  2017-08-16 17:58   ` [PATCH v2 3/4] commit: implement free_commit_graft Patryk Obara
@ 2017-08-16 17:58   ` Patryk Obara
  2017-08-17 21:20     ` Junio C Hamano
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
  4 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-16 17:58 UTC (permalink / raw)
  To: git, sandals, Junio C Hamano, Stefan Beller, Jonathan Nieder; +Cc: Patryk Obara

The previous implementation of read_graft_line used calculations based
on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
ids in a single graft line.  New implementation does not depend on these
constants, so it adapts to any object_id buffer size.

To make this possible, FLEX_ARRAY of object_id in struct was replaced
by an oid_array.

Code allocating graft now needs to use memset to zero the memory before
use to start with oid_array in a consistent state.

Updates free_graft function implemented in the previous patch to
properly cleanup an oid_array storing parents.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c  | 39 +++++++++++++++++++++------------------
 commit.h  |  2 +-
 shallow.c |  1 +
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index 4d23e72..8bdce36 100644
--- a/commit.c
+++ b/commit.c
@@ -111,6 +111,7 @@ static int commit_graft_pos(const unsigned char *sha1)
 
 static void free_commit_graft(struct commit_graft *graft)
 {
+	oid_array_clear(&graft->parents);
 	free(graft);
 }
 
@@ -139,35 +140,37 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
+static int parse_next_oid_hex(const char *buf, struct object_id *oid, const char **end)
+{
+	while (isspace(buf[0]))
+		buf++;
+	return parse_oid_hex(buf, oid, end);
+}
+
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i, len;
-	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
-	const int entry_size = GIT_SHA1_HEXSZ + 1;
+	struct object_id oid;
+	const char *tail = NULL;
 
 	strbuf_rtrim(line);
-	len = line->len;
-	if (buf[0] == '#' || buf[0] == '\0')
+	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
-	if ((len + 1) % entry_size)
+	graft = xmalloc(sizeof(*graft));
+	memset(graft, 0, sizeof(*graft));
+	if (parse_oid_hex(line->buf, &graft->oid, &tail))
 		goto bad_graft_data;
-	i = (len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
-	graft->nr_parent = i;
-	if (get_oid_hex(buf, &graft->oid))
+	while (!parse_next_oid_hex(tail, &oid, &tail))
+		oid_array_append(&graft->parents, &oid);
+	if (tail[0] != '\0')
 		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-		if (buf[i] != ' ')
-			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
-			goto bad_graft_data;
-	}
+	graft->nr_parent = graft->parents.nr;
+
 	return graft;
 
 bad_graft_data:
-	error("bad graft data: %s", buf);
+	error("bad graft data: %s", line->buf);
 	free_commit_graft(graft);
 	return NULL;
 }
@@ -363,7 +366,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		int i;
 		struct commit *new_parent;
 		for (i = 0; i < graft->nr_parent; i++) {
-			new_parent = lookup_commit(&graft->parent[i]);
+			new_parent = lookup_commit(&graft->parents.oid[i]);
 			if (!new_parent)
 				continue;
 			pptr = &commit_list_insert(new_parent, pptr)->next;
diff --git a/commit.h b/commit.h
index baecc0a..96ff375 100644
--- a/commit.h
+++ b/commit.h
@@ -243,7 +243,7 @@ void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
 struct commit_graft {
 	struct object_id oid;
 	int nr_parent; /* < 0 if shallow commit */
-	struct object_id parent[FLEX_ARRAY]; /* more */
+	struct oid_array parents;
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
diff --git a/shallow.c b/shallow.c
index f5591e5..892cd90 100644
--- a/shallow.c
+++ b/shallow.c
@@ -33,6 +33,7 @@ int register_shallow(const struct object_id *oid)
 		xmalloc(sizeof(struct commit_graft));
 	struct commit *commit = lookup_commit(oid);
 
+	memset(graft, 0, sizeof(*graft));
 	oidcpy(&graft->oid, oid);
 	graft->nr_parent = -1;
 	if (commit && commit->object.parsed)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1
  2017-08-16 12:11         ` Patryk Obara
@ 2017-08-16 19:32           ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-16 19:32 UTC (permalink / raw)
  To: Patryk Obara
  Cc: Stefan Beller, Jonathan Nieder, git@vger.kernel.org,
	brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>
>> I said this is OK for "null" because we assume we will use ^\0{len}$
>> for any hash function we choose as the "impossible" value, and for
>> that particular use pattern, we do not need such a union.  Just
>> letting the caller peek at an appropriate number of bytes at the
>> beginning of that NUL buffer for hash the caller wants to use is
>> sufficient.
>
> Do you think I should record this explanation as either commit message
> or comment in sha1_file.c?
>
>> MAX is inevitable only if we envision that we have to handle objects
>> named using two or more hashing schemes at the same time, with the
>> same binary and during the same run inside a single process.
>
> I think this will be the case if "transition one local repository at
> a time" from Jonathan Nieder's transition plan will be followed.
> This plan assumes object_id translation happening e.g. during fetch
> operation.

It would be good if that assumption is made explicit.

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/5] commit: rewrite read_graft_line
  2017-08-16 13:07     ` Patryk Obara
@ 2017-08-16 19:39       ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-16 19:39 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Git List, brian m . carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> I am not sure if this is a good approach.  Just like in 2/5 you can
>> use the MAX thing instead of 20, instead of having each graft entry
>> allocate a separate oid_array.oid[].
>
> Once MAX values were increased memory corruption was caused exactly by
> this line:
>
>> - graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
>
> I could've replaced it by:
>
>     graft = xmalloc(st_add(sizeof(*graft), st_mult(sizeof(struct
> object_id), i)));
>
> But it seemed to me like short-sighted solution (code might be broken if
> object_id will be modified again in future).

Why?  

Assuming that "struct object_id" would have to be prepared to handle
more than one types of hash functions at the same time, it would be
sufficiently large to hold any type of hash, plus possibly another
member that tells which kind.

The decision to choose between a flex array and a separately
allocatable structure primarily should come from how the enclosing
struct (i.e. the commti_graft structure) is meant to be used.  It is
not meant to be tweaked by adding more parents or removing parents
once it is constructed, which argues for having a flex array to hold
the known and fixed number of parents once it is constructed.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1
  2017-08-16 17:58   ` [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
@ 2017-08-16 19:46     ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-16 19:46 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, sandals, Stefan Beller, Jonathan Nieder

Patryk Obara <patryk.obara@gmail.com> writes:

> This prevents compilation error if GIT_MAX_RAWSZ is different than 20.

The above made me scratch my head wondering why because it does not
say what the root cause of the issue is.  It would have avoided a
few strand of lost hair if it were more like this, perhaps:

	The array is declared in cache.h as

	    extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

	The definition of it we have in sha1_file.c must match.


>
> Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
> ---
>  sha1_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15..f5b5bec 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -32,7 +32,7 @@
>  #define SZ_FMT PRIuMAX
>  static inline uintmax_t sz_fmt(size_t s) { return s; }
>  
> -const unsigned char null_sha1[20];
> +const unsigned char null_sha1[GIT_MAX_RAWSZ];
>  const struct object_id null_oid;
>  const struct object_id empty_tree_oid = {
>  	EMPTY_TREE_SHA1_BIN_LITERAL

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-16 12:24     ` Patryk Obara
@ 2017-08-16 22:59       ` brian m. carlson
  2017-08-17  5:55         ` Jeff King
  0 siblings, 1 reply; 59+ messages in thread
From: brian m. carlson @ 2017-08-16 22:59 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Stefan Beller, git@vger.kernel.org, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On Wed, Aug 16, 2017 at 02:24:27PM +0200, Patryk Obara wrote:
> On Tue, Aug 15, 2017 at 7:02 PM, Stefan Beller <sbeller@google.com> wrote:
> >>         const int entry_size = GIT_SHA1_HEXSZ + 1;
> >
> > outside the scope of this patch:
> > Is GIT_SHA1_HEXSZ or GIT_MAX_HEXSZ the right call here?
> 
> I think neither one. In my opinion, this code should not be so closely
> coupled to hash parsing code - it should be tasked with parsing
> whitespace separated list of commit ids without relying on specific
> commit id length or format.

What I had intended, although maybe I have not explained this well, was
that we would have one binary that set up hash functionality as part of
early setup.  GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ would turn into
something like current_hash->rawsz and current_hash->hexsz at that
point.  The reason I introduced the GIT_MAX constants was to allocate
memory suitable for whatever hash we picked.

However, this is only what I had considered for design, and others might
have different views going forward.  I have, however, based my patches
on that assumption, and responded to others' comments with those
statements.

I agree that ideally we should make as much of the code as possible
ignorant of the hash size, because that will generally result in more
robust, less brittle code.  I've noticed in this series the use of
parse_oid_hex, and I agree that's one tool we can use to accomplish that
goal.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-16 22:59       ` brian m. carlson
@ 2017-08-17  5:55         ` Jeff King
  2017-08-17 21:17           ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Jeff King @ 2017-08-17  5:55 UTC (permalink / raw)
  To: brian m. carlson, Patryk Obara, Stefan Beller,
	git@vger.kernel.org, Junio C Hamano

On Wed, Aug 16, 2017 at 10:59:02PM +0000, brian m. carlson wrote:

> On Wed, Aug 16, 2017 at 02:24:27PM +0200, Patryk Obara wrote:
> > On Tue, Aug 15, 2017 at 7:02 PM, Stefan Beller <sbeller@google.com> wrote:
> > >>         const int entry_size = GIT_SHA1_HEXSZ + 1;
> > >
> > > outside the scope of this patch:
> > > Is GIT_SHA1_HEXSZ or GIT_MAX_HEXSZ the right call here?
> > 
> > I think neither one. In my opinion, this code should not be so closely
> > coupled to hash parsing code - it should be tasked with parsing
> > whitespace separated list of commit ids without relying on specific
> > commit id length or format.
> 
> What I had intended, although maybe I have not explained this well, was
> that we would have one binary that set up hash functionality as part of
> early setup.  GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ would turn into
> something like current_hash->rawsz and current_hash->hexsz at that
> point.  The reason I introduced the GIT_MAX constants was to allocate
> memory suitable for whatever hash we picked.
> 
> However, this is only what I had considered for design, and others might
> have different views going forward.  I have, however, based my patches
> on that assumption, and responded to others' comments with those
> statements.

What you wrote here matches my understanding of the general plan. IOW,
we'd expect to "waste" 12 bytes when dealing with a 160-bit sha1 in a
Git binary that's aware of 256-bit hashes. But that seems like a small
price to pay to be able to continue using automatic allocations, versus
rewriting each site to call xmalloc(current_hash->rawsz).

I'd expect most of the GIT_MAX constants to eventually go away in favor
of "struct object_id", but that will still be using the same "big enough
to hold any hash" size under the hood.

> I agree that ideally we should make as much of the code as possible
> ignorant of the hash size, because that will generally result in more
> robust, less brittle code.  I've noticed in this series the use of
> parse_oid_hex, and I agree that's one tool we can use to accomplish that
> goal.

Agreed. Most code should be dealing with the abstract concept of a hash
and shouldn't have to care about the size. I really like parse_oid_hex()
for that reason (and I think parsing is the main place we've found that
needs to care).

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-17  5:55         ` Jeff King
@ 2017-08-17 21:17           ` Junio C Hamano
  2017-08-17 21:38             ` Patryk Obara
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-17 21:17 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, Patryk Obara, Stefan Beller,
	git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> I'd expect most of the GIT_MAX constants to eventually go away in favor
> of "struct object_id", but that will still be using the same "big enough
> to hold any hash" size under the hood.

Indeed.  It is good to see major contributors are in agreement ;-)
I'd expect that an array of "struct object_id" would be how a fixed
number of object names would be represented, i.e.

	struct object_id thing[num_elements];

instead of an array of uchar that is MAX bytes long, i.e.

	unsigned char name[GIT_MAX_RAWSZ][num_elements];

In fact, the former is already how we represent the list of fake
parents in the commit_graft structure, so I think patch 5/5 in this
series does two unrelated things, one of which is bad (i.e. use of
parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
oid_array that requires a separate allocation of the array is bad).

> Agreed. Most code should be dealing with the abstract concept of a hash
> and shouldn't have to care about the size. I really like parse_oid_hex()
> for that reason (and I think parsing is the main place we've found that
> needs to care).

Yes.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 4/4] commit: rewrite read_graft_line
  2017-08-16 17:58   ` [PATCH v2 4/4] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-17 21:20     ` Junio C Hamano
  2017-08-17 21:42       ` Patryk Obara
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-17 21:20 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, sandals, Stefan Beller, Jonathan Nieder

Patryk Obara <patryk.obara@gmail.com> writes:

> The previous implementation of read_graft_line used calculations based
> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
> ids in a single graft line.  New implementation does not depend on these
> constants, so it adapts to any object_id buffer size.
>
> To make this possible, FLEX_ARRAY of object_id in struct was replaced
> by an oid_array.

There is a leap in logic between the two paragraphs.  Your use of
parse_oid_hex() is good.  But I do not think moving the array body
to outside commit_graft structure and forcing it to be separately
allocated is necessary or beneficial.  When we got a single line, we
know how many fake parents a child described by that graft line has,
and you can still use of FLEX_ARRAY to avoid separate allocation
and need for separate freeing of it.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-17 21:17           ` Junio C Hamano
@ 2017-08-17 21:38             ` Patryk Obara
  0 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-17 21:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, brian m. carlson, Stefan Beller, git@vger.kernel.org

> In fact, the former is already how we represent the list of fake
> parents in the commit_graft structure, so I think patch 5/5 in this
> series does two unrelated things, one of which is bad (i.e. use of
> parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
> oid_array that requires a separate allocation of the array is bad).

Agreed; I already split patch 5 into two separate changes (one fixing
memory allocation issue, one parsing object_ids into FLEX_ARRAY,
without modifying graft struct). In result patch 4 (free_graft) can
be dropped. I will send these changes as v3.


On Thu, Aug 17, 2017 at 11:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> I'd expect most of the GIT_MAX constants to eventually go away in favor
>> of "struct object_id", but that will still be using the same "big enough
>> to hold any hash" size under the hood.
>
> Indeed.  It is good to see major contributors are in agreement ;-)
> I'd expect that an array of "struct object_id" would be how a fixed
> number of object names would be represented, i.e.
>
>         struct object_id thing[num_elements];
>
> instead of an array of uchar that is MAX bytes long, i.e.
>
>         unsigned char name[GIT_MAX_RAWSZ][num_elements];
>
> In fact, the former is already how we represent the list of fake
> parents in the commit_graft structure, so I think patch 5/5 in this
> series does two unrelated things, one of which is bad (i.e. use of
> parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
> oid_array that requires a separate allocation of the array is bad).
>
>> Agreed. Most code should be dealing with the abstract concept of a hash
>> and shouldn't have to care about the size. I really like parse_oid_hex()
>> for that reason (and I think parsing is the main place we've found that
>> needs to care).
>
> Yes.



-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 4/4] commit: rewrite read_graft_line
  2017-08-17 21:20     ` Junio C Hamano
@ 2017-08-17 21:42       ` Patryk Obara
  0 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-17 21:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git List, brian m . carlson, Stefan Beller, Jonathan Nieder

Understood :) - yes, oid_array is not completely necessary here - and it
gives the wrong impression about usage of this struct.

FLEX_ARRAY will be brought back in v3.

On Thu, Aug 17, 2017 at 11:20 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Patryk Obara <patryk.obara@gmail.com> writes:
>
>> The previous implementation of read_graft_line used calculations based
>> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
>> ids in a single graft line.  New implementation does not depend on these
>> constants, so it adapts to any object_id buffer size.
>>
>> To make this possible, FLEX_ARRAY of object_id in struct was replaced
>> by an oid_array.
>
> There is a leap in logic between the two paragraphs.  Your use of
> parse_oid_hex() is good.  But I do not think moving the array body
> to outside commit_graft structure and forcing it to be separately
> allocated is necessary or beneficial.  When we got a single line, we
> know how many fake parents a child described by that graft line has,
> and you can still use of FLEX_ARRAY to avoid separate allocation
> and need for separate freeing of it.



-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 0/4] Modernize read_graft_line implementation
  2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
                     ` (3 preceding siblings ...)
  2017-08-16 17:58   ` [PATCH v2 4/4] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-18  1:59   ` Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
                       ` (5 more replies)
  4 siblings, 6 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18  1:59 UTC (permalink / raw)
  To: git, Junio C Hamano, Stefan Beller; +Cc: brian m. carlson, Jeff King

Changes since v2:
- commit implementing free_graft dropped (no longer needed)
- several more lines moved from last commit to commit replacing
  raw buffer with strbuf
- fix for memory allocation separated from change in hash
  parsing
- commit_graft struct uses FLEX_ARRAY again, meaning free_graft,
  memset, nor oid_array were not needed after all

Patryk Obara (4):
  sha1_file: fix definition of null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: allocate array using object_id size
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c        | 38 +++++++++++++++++++-------------------
 commit.h        |  2 +-
 sha1_file.c     |  2 +-
 4 files changed, 22 insertions(+), 22 deletions(-)

-- 
2.9.5

base-commit: b3622a4ee94e4916cd05e6d96e41eeb36b941182

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 1/4] sha1_file: fix definition of null_sha1
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
@ 2017-08-18  1:59     ` Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18  1:59 UTC (permalink / raw)
  To: git, Junio C Hamano, Stefan Beller; +Cc: brian m. carlson, Jeff King

The array is declared in cache.h as:

  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

Definition in sha1_file.c must match.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
@ 2017-08-18  1:59     ` Patryk Obara
  2017-08-18  6:29       ` Jeff King
  2017-08-18  1:59     ` [PATCH v3 3/4] commit: allocate array using object_id size Patryk Obara
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-18  1:59 UTC (permalink / raw)
  To: git, Junio C Hamano, Stefan Beller; +Cc: brian m. carlson, Jeff King

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 builtin/blame.c |  2 +-
 commit.c        | 15 ++++++++-------
 commit.h        |  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (graft)
 			register_commit_graft(graft, 0);
 	}
diff --git a/commit.c b/commit.c
index 8b28415..019e733 100644
--- a/commit.c
+++ b/commit.c
@@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i;
+	int i, len;
+	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
 	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-	while (len && isspace(buf[len-1]))
-		buf[--len] = '\0';
-	if (buf[0] == '#' || buf[0] == '\0')
+	strbuf_rtrim(line);
+	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
+	len = line->len;
 	if ((len + 1) % entry_size)
 		goto bad_graft_data;
 	i = (len + 1) / entry_size - 1;
@@ -161,7 +162,7 @@ struct commit_graft *read_graft_line(char *buf, int len)
 	return graft;
 
 bad_graft_data:
-	error("bad graft data: %s", buf);
+	error("bad graft data: %s", line->buf);
 	free(graft);
 	return NULL;
 }
@@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (!graft)
 			continue;
 		if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v3 3/4] commit: allocate array using object_id size
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-18  1:59     ` Patryk Obara
  2017-08-18  1:59     ` [PATCH v3 4/4] commit: rewrite read_graft_line Patryk Obara
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18  1:59 UTC (permalink / raw)
  To: git, Junio C Hamano, Stefan Beller; +Cc: brian m. carlson, Jeff King

struct commit_graft aggregates an array of object_id's, which have
size >= GIT_MAX_RAWSZ bytes. This change prevents memory allocation
error when size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 019e733..61528a5 100644
--- a/commit.c
+++ b/commit.c
@@ -149,7 +149,8 @@ struct commit_graft *read_graft_line(struct strbuf *line)
 	if ((len + 1) % entry_size)
 		goto bad_graft_data;
 	i = (len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
+	graft = xmalloc(st_add(sizeof(*graft),
+	                       st_mult(sizeof(struct object_id), i)));
 	graft->nr_parent = i;
 	if (get_oid_hex(buf, &graft->oid))
 		goto bad_graft_data;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
                       ` (2 preceding siblings ...)
  2017-08-18  1:59     ` [PATCH v3 3/4] commit: allocate array using object_id size Patryk Obara
@ 2017-08-18  1:59     ` Patryk Obara
  2017-08-18  6:43       ` Jeff King
  2017-08-18  2:20     ` [PATCH v3 0/4] Modernize read_graft_line implementation Junio C Hamano
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
  5 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-18  1:59 UTC (permalink / raw)
  To: git, Junio C Hamano, Stefan Beller; +Cc: brian m. carlson, Jeff King

Determine the number of object_id's to parse in a single graft line by
counting separators (whitespace characters) instead of dividing by
length of hash representation.

This way graft parsing code can support different sizes of hashes
without any further code adaptations.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 61528a5..46ee2db 100644
--- a/commit.c
+++ b/commit.c
@@ -137,29 +137,27 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i, len;
-	char *buf = line->buf;
+	int i, n;
+	const char *tail = NULL;
 	struct commit_graft *graft = NULL;
-	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
 	strbuf_rtrim(line);
 	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
-	len = line->len;
-	if ((len + 1) % entry_size)
-		goto bad_graft_data;
-	i = (len + 1) / entry_size - 1;
+	/* count number of blanks to determine size of array to allocate */
+	for (i = 0, n = 0; i < line->len; i++)
+		if (isspace(line->buf[i]))
+			n++;
 	graft = xmalloc(st_add(sizeof(*graft),
-	                       st_mult(sizeof(struct object_id), i)));
-	graft->nr_parent = i;
-	if (get_oid_hex(buf, &graft->oid))
+	                       st_mult(sizeof(struct object_id), n)));
+	graft->nr_parent = n;
+	if (parse_oid_hex(line->buf, &graft->oid, &tail))
 		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-		if (buf[i] != ' ')
-			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
+	for (i = 0; i < graft->nr_parent; i++)
+		if (!isspace(*tail++) || parse_oid_hex(tail, &graft->parent[i], &tail))
 			goto bad_graft_data;
-	}
+	if (tail[0] != '\0')
+		goto bad_graft_data;
 	return graft;
 
 bad_graft_data:
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 0/4] Modernize read_graft_line implementation
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
                       ` (3 preceding siblings ...)
  2017-08-18  1:59     ` [PATCH v3 4/4] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-18  2:20     ` Junio C Hamano
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
  5 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18  2:20 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, Stefan Beller, brian m. carlson, Jeff King

Thanks, will queue.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-18  1:59     ` [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-18  6:29       ` Jeff King
  2017-08-18 10:12         ` Patryk Obara
  0 siblings, 1 reply; 59+ messages in thread
From: Jeff King @ 2017-08-18  6:29 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, Junio C Hamano, Stefan Beller, brian m. carlson

On Fri, Aug 18, 2017 at 03:59:36AM +0200, Patryk Obara wrote:

> diff --git a/commit.c b/commit.c
> index 8b28415..019e733 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
>  	return 0;
>  }
>  
> -struct commit_graft *read_graft_line(char *buf, int len)
> +struct commit_graft *read_graft_line(struct strbuf *line)
>  {
>  	/* The format is just "Commit Parent1 Parent2 ...\n" */
> -	int i;
> +	int i, len;
> +	char *buf = line->buf;

Copying a pointer to a strbuf's buffer is a dangerous habit. The strbuf
is free to re-allocate the buffer under the hood during any operation it
likes, potentially leaving you pointing to freed memory.

In this case it's OK because the only function you call is
strbuf_rtrim(), which never reallocates. But I feel like this is setting
up a maintenance trap for the next person to touch the function.

AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
of the function. But I think we should just make that change (you
already did in some of the spots). And IMHO we should do the same for
line->len. When there are two names for the same value, it increases the
chances of a bug where the two end up diverging.

> -	while (len && isspace(buf[len-1]))
> -		buf[--len] = '\0';
> -	if (buf[0] == '#' || buf[0] == '\0')
> +	strbuf_rtrim(line);
> +	if (line->buf[0] == '#' || line->len == 0)
>  		return NULL;

I find it funny to look at line->buf[0] before line->len, because it
means we're reading pas the end of the buffer. It's OK here because we
know there's a NUL terminator, but I think short-circuiting like:

  if (!line->len || line->buf[0] == '#')

is better (I also think "!" instead of "== 0" is our usual style, but
that's much less important).

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18  1:59     ` [PATCH v3 4/4] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-18  6:43       ` Jeff King
  2017-08-18  7:44         ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Jeff King @ 2017-08-18  6:43 UTC (permalink / raw)
  To: Patryk Obara; +Cc: git, Junio C Hamano, Stefan Beller, brian m. carlson

On Fri, Aug 18, 2017 at 03:59:38AM +0200, Patryk Obara wrote:

> Determine the number of object_id's to parse in a single graft line by
> counting separators (whitespace characters) instead of dividing by
> length of hash representation.
> 
> This way graft parsing code can support different sizes of hashes
> without any further code adaptations.

Sounds like a reasonable approach, though I wonder what happens if our
counting pass differs in its behavior from the actual parse.

E.g., here:

> +	/* count number of blanks to determine size of array to allocate */
> +	for (i = 0, n = 0; i < line->len; i++)
> +		if (isspace(line->buf[i]))
> +			n++;

If we see multiple spaces like "1234abcd     5678abcd" we'll allocate a
slot for each. I think that's OK because here:

> +	for (i = 0; i < graft->nr_parent; i++)
> +		if (!isspace(*tail++) || parse_oid_hex(tail, &graft->parent[i], &tail))
>  			goto bad_graft_data;

We'd reject such an input totally (though as an interesting side effect,
you can convince the parser to allocate 20x as much RAM as you send it;
one oid for each space).

So we're probably fine. The two parsing passes are right next to each
other and are sufficiently simple and strict that we don't have to
worry about them diverging.

The single-pass alternative would probably be to read into a dynamic
structure like an oid_array, and then copy the result into the flex
structure.

Or of course to stop using a flex structure, as your original pass did.
I agree with Junio that the use of object_id's is orthogonal to using a
FLEX_ARRAY. But I could also see an argument that the complexity the
flex array adds here isn't worth the savings. The main benefits of a
flex array are:

  1. Less memory used. But we don't expect to see a large enough number
     of grafts for this to matter.

  2. A more compact memory representation, which can be faster. But
     accessing the parent list of a graft isn't going to be the hot code
     path. It probably only happens once in a program run when we
     rewrite the parents (versus checking the grafted commit, which is
     looked up once per commit we access).

  3. It's easier to free the struct and its associated resources in a
     single free(). But we never free the graft list.

Reading your original, my thought was "why _not_ keep doing it as a
FLEX_ARRAY, as it saves a little memory, which can't hurt". But seeing
this pre-counting phase, I think it does make the code a little more
complicated. But I'd be OK with doing it either way.

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18  6:43       ` Jeff King
@ 2017-08-18  7:44         ` Junio C Hamano
  2017-08-18 11:30           ` Patryk Obara
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18  7:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Patryk Obara, git, Stefan Beller, brian m. carlson

Jeff King <peff@peff.net> writes:

> So we're probably fine. The two parsing passes are right next to each
> other and are sufficiently simple and strict that we don't have to
> worry about them diverging.

If I were doing the two-pass thing, I'd probably write a for loop
that runs exactly twice, where the first iteration parses into a
single throw-away oid struct only to count, and the second iteration
parses the same input into the allocated array of oid struct.  That
way, you do not have to worry about two phrases going out of sync.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-18  6:29       ` Jeff King
@ 2017-08-18 10:12         ` Patryk Obara
  2017-08-18 11:50           ` Jeff King
  0 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 10:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano, Stefan Beller, brian m. carlson

Jeff King <peff@peff.net> wrote:

> AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
> of the function. But I think we should just make that change (you
> already did in some of the spots). And IMHO we should do the same for
> line->len. When there are two names for the same value, it increases the
> chances of a bug where the two end up diverging.

My motivation was rather to keep patch(es) as small as possible because every
line using buf will be replaced in a later patch in series. But it will make
commit better (it will stand on its own), so why not to do it? :)

> (…) I think short-circuiting like:
>
>   if (!line->len || line->buf[0] == '#')
>
> is better (I also think "!" instead of "== 0" is our usual style, but
> that's much less important).

Ah, I only replaced comparison to NULL terminator with length check because
I thought it better shows intention of the code and I didn't notice, that
reversing order will result in better code overall.

I will include both changes in v4.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18  7:44         ` Junio C Hamano
@ 2017-08-18 11:30           ` Patryk Obara
  2017-08-18 11:45             ` Jeff King
  2017-08-18 16:44             ` Junio C Hamano
  0 siblings, 2 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 11:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git List, Stefan Beller, brian m. carlson

Jeff King <peff@peff.net> wrote:
>
> So we're probably fine. The two parsing passes are right next to each
> other and are sufficiently simple and strict that we don't have to
> worry about them diverging.

That was my conclusion as well. I added comment before the first pass and
avoided any "cleverness" to make it perfectly clear to a reader.

> We'd reject such an input totally (though as an interesting side effect,
> you can convince the parser to allocate 20x as much RAM as you send it;
> one oid for each space).

Grafts are not populated during clone operation, so it really would be user
making his life miserable. I could allocate FLEXI_ARRAY of size
min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even
worth the cost of making the code more complicated (and I don't want
to reintroduce these size macros in here.

We _could_ put an artificial limit on graft parents, though (e.g. 10) and
display an error message urging the user to stop using grafts?

> The single-pass alternative would probably be to read into a dynamic
> structure like an oid_array, and then copy the result into the flex
> structure.

Before sending v3 I tried two other alternative implementations (perhaps I
should've listed them in the v3 cover letter):

  1. Using string_list_split_in_place. I resigned from this approach as soon
     as I noticed, that line->buf needs to be preserved for possible
     error message. string_list_split would have no benefits over using
     oid_array.

  2. Parsing into temporary oid_array and then copying memory to FLEXI_ARRAY.
     Throw-away oid_array still needs to be cleaned, which means we have
     new/different return path (one before xmalloc and one after xmalloc),
     which means "bad_graft_data" label needs to be changed into "cleanup"
     label (or removed), which means error description needs be conditionally
     put in earlier code… and at this point, I decided these changes are not
     making code cleaner nor more readable at all :)

Junio C Hamano <gitster@pobox.com> wrote:
>
> If I were doing the two-pass thing, I'd probably write a for loop
> that runs exactly twice, where the first iteration parses into a
> single throw-away oid struct only to count, and the second iteration
> parses the same input into the allocated array of oid struct.  That
> way, you do not have to worry about two phrases going out of sync.

Two passes would still differ in error handling due to xmalloc between them…

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18 11:30           ` Patryk Obara
@ 2017-08-18 11:45             ` Jeff King
  2017-08-18 16:44             ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Jeff King @ 2017-08-18 11:45 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Junio C Hamano, Git List, Stefan Beller, brian m. carlson

On Fri, Aug 18, 2017 at 01:30:23PM +0200, Patryk Obara wrote:

> > We'd reject such an input totally (though as an interesting side effect,
> > you can convince the parser to allocate 20x as much RAM as you send it;
> > one oid for each space).
> 
> Grafts are not populated during clone operation, so it really would be user
> making his life miserable. I could allocate FLEXI_ARRAY of size
> min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even
> worth the cost of making the code more complicated (and I don't want
> to reintroduce these size macros in here.
> 
> We _could_ put an artificial limit on graft parents, though (e.g. 10) and
> display an error message urging the user to stop using grafts?

Yeah, sorry, I should have made more clear that this is fine. I always
try to read parsing code with my paranoid hat on, but I agree that
grafts aren't really exposed to untrusted entities.

In general I'd prefer to avoid artificial limits unless there's a need
for them. There are already spots (like receive-pack) where you can ask
Git to store bytes in RAM as fast as you can send them. What I found
interesting about this one was the 20:1 amplification. :)

> Before sending v3 I tried two other alternative implementations (perhaps I
> should've listed them in the v3 cover letter):

It might even be worth listing them in the commit message. Somebody
finding your commit 3 years from via "git log -S" or "git blame" might
say "yes, but why didn't they just do it like...". You can respond to
them preemptively. :)

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-18 10:12         ` Patryk Obara
@ 2017-08-18 11:50           ` Jeff King
  0 siblings, 0 replies; 59+ messages in thread
From: Jeff King @ 2017-08-18 11:50 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Git List, Junio C Hamano, Stefan Beller, brian m. carlson

On Fri, Aug 18, 2017 at 12:12:37PM +0200, Patryk Obara wrote:

> Jeff King <peff@peff.net> wrote:
> 
> > AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
> > of the function. But I think we should just make that change (you
> > already did in some of the spots). And IMHO we should do the same for
> > line->len. When there are two names for the same value, it increases the
> > chances of a bug where the two end up diverging.
> 
> My motivation was rather to keep patch(es) as small as possible because every
> line using buf will be replaced in a later patch in series. But it will make
> commit better (it will stand on its own), so why not to do it? :)

Ah, I didn't notice those lines went away. That does make it less bad,
but I do think it's easier to review if each commit stands on its own.

In some cases, if it's really painful to do the intermediate cleanup, I
might say something in the commit message like "this leaves X that is
not ideal, but we'll be getting rid of it soon anyway". But in this case
I think just creating that intermediate state is simple enough.

> Ah, I only replaced comparison to NULL terminator with length check because
> I thought it better shows intention of the code and I didn't notice, that
> reversing order will result in better code overall.
> 
> I will include both changes in v4.

Thanks.

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18 11:30           ` Patryk Obara
  2017-08-18 11:45             ` Jeff King
@ 2017-08-18 16:44             ` Junio C Hamano
  2017-08-18 17:05               ` Patryk Obara
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18 16:44 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Jeff King, Git List, Stefan Beller, brian m. carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>>
>> If I were doing the two-pass thing, I'd probably write a for loop
>> that runs exactly twice, where the first iteration parses into a
>> single throw-away oid struct only to count, and the second iteration
>> parses the same input into the allocated array of oid struct.  That
>> way, you do not have to worry about two phrases going out of sync.
>
> Two passes would still differ in error handling due to xmalloc between them…

I am not sure if I follow.  What I meant was something along these
lines:

	struct commit_graft *graft = NULL;
        char *line = ... what you read from the file ...;
        int phase; /* phase #0 counts, phase #1 fills */

	for (phase = 0; phase < 2; phase++) {
		int count;
		char *scan;
		strucut object_id dummy_oid, *oid;

		for (scan = line, count = 0;
                     *scan;
		     count++) {
                	oid = graft ? &graft->parent[count] : &dummy_oid;
			if (parse_oid_hex(scan, oid, &scan))
				return error(...);
			switch (*scan) {
			case ' ':
				scan++;
				continue; /* there are more */
			case '\0':
				break; /* we are done */
			default:
				return error(...);
			}
		}

		if (!graft)
			graft = xmalloc(... with 'count' parentes ...);
	}

        /* now we have graft with parent[count] all filled */
	return graft;

The inner for() loop will do the same parsing for both passes,
leaving little chance for programming errors to make the two passes
decide there are different number of fake parents.  I suspect I may
have botched some details in that loop, but both passes will even
share the same buggy counting when the code is structured that
way ;-)

That is what I meant by "not have to worry about two phases going
out of sync".

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18 16:44             ` Junio C Hamano
@ 2017-08-18 17:05               ` Patryk Obara
  2017-08-18 18:29                 ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 17:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git List, Stefan Beller, brian m. carlson

Ah! I presumed two separate loops, one throwing away oids and second
one actually filling a table - this makes more sense. I was just about
to send v4, but will rewrite the last patch and we'll see how it looks
like.
-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 4/4] commit: rewrite read_graft_line
  2017-08-18 17:05               ` Patryk Obara
@ 2017-08-18 18:29                 ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18 18:29 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Jeff King, Git List, Stefan Beller, brian m. carlson

Patryk Obara <patryk.obara@gmail.com> writes:

> Ah! I presumed two separate loops, one throwing away oids and second
> one actually filling a table - this makes more sense. I was just about
> to send v4, but will rewrite the last patch and we'll see how it looks
> like.

Yeah, it is understandable if you missed my "a loop that runs
exactly twice", as that pattern, while we do use it in a few places
in our codebase, is of limited applicability in general---the cost
of discarded computation in the first pass need to be low enough for
the improved maintainability to make sense.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v4 0/4] Modernize read_graft_line implementation
  2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
                       ` (4 preceding siblings ...)
  2017-08-18  2:20     ` [PATCH v3 0/4] Modernize read_graft_line implementation Junio C Hamano
@ 2017-08-18 18:33     ` Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
                         ` (3 more replies)
  5 siblings, 4 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:33 UTC (permalink / raw)
  To: git, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

Changes since v3:

- Commit replacing raw buffer does not store temporary pointer to
  strbuf internals any more.

- Commit message of patch 4 explains all alternative approaches
  considered so far.

- Patch 4 uses two-phases to parse graft line, without code repetition.

I have my reservations about patch 4 from readability standpoint
(it's not immediately clear why parsing code can skip freeing of graft
in phase 2), but this implementation seems to address every issue raised
in review so far. If you'll prefer me to go back to impementation
from v3, I have it prepared ;)


Patryk Obara (4):
  sha1_file: fix definition of null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: allocate array using object_id size
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c        | 45 +++++++++++++++++++++++++--------------------
 commit.h        |  2 +-
 sha1_file.c     |  2 +-
 4 files changed, 28 insertions(+), 23 deletions(-)

-- 
2.9.5


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v4 1/4] sha1_file: fix definition of null_sha1
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
@ 2017-08-18 18:33       ` Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:33 UTC (permalink / raw)
  To: git, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

The array is declared in cache.h as:

  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

Definition in sha1_file.c must match.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
@ 2017-08-18 18:33       ` Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 3/4] commit: allocate array using object_id size Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 4/4] commit: rewrite read_graft_line Patryk Obara
  3 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:33 UTC (permalink / raw)
  To: git, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 builtin/blame.c |  2 +-
 commit.c        | 23 +++++++++++------------
 commit.h        |  2 +-
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (graft)
 			register_commit_graft(graft, 0);
 	}
diff --git a/commit.c b/commit.c
index 8b28415..1a0a9f2 100644
--- a/commit.c
+++ b/commit.c
@@ -134,34 +134,33 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
 	int i;
 	struct commit_graft *graft = NULL;
 	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-	while (len && isspace(buf[len-1]))
-		buf[--len] = '\0';
-	if (buf[0] == '#' || buf[0] == '\0')
+	strbuf_rtrim(line);
+	if (!line->len || line->buf[0] == '#')
 		return NULL;
-	if ((len + 1) % entry_size)
+	if ((line->len + 1) % entry_size)
 		goto bad_graft_data;
-	i = (len + 1) / entry_size - 1;
+	i = (line->len + 1) / entry_size - 1;
 	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
 	graft->nr_parent = i;
-	if (get_oid_hex(buf, &graft->oid))
+	if (get_oid_hex(line->buf, &graft->oid))
 		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-		if (buf[i] != ' ')
+	for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) {
+		if (line->buf[i] != ' ')
 			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
+		if (get_sha1_hex(line->buf + i + 1, graft->parent[i/entry_size].hash))
 			goto bad_graft_data;
 	}
 	return graft;
 
 bad_graft_data:
-	error("bad graft data: %s", buf);
+	error("bad graft data: %s", line->buf);
 	free(graft);
 	return NULL;
 }
@@ -174,7 +173,7 @@ static int read_graft_file(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (!graft)
 			continue;
 		if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v4 3/4] commit: allocate array using object_id size
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
@ 2017-08-18 18:33       ` Patryk Obara
  2017-08-18 18:33       ` [PATCH v4 4/4] commit: rewrite read_graft_line Patryk Obara
  3 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:33 UTC (permalink / raw)
  To: git, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

struct commit_graft aggregates an array of object_id's, which have
size >= GIT_MAX_RAWSZ bytes. This change prevents memory allocation
error when size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 1a0a9f2..436eb34 100644
--- a/commit.c
+++ b/commit.c
@@ -147,7 +147,8 @@ struct commit_graft *read_graft_line(struct strbuf *line)
 	if ((line->len + 1) % entry_size)
 		goto bad_graft_data;
 	i = (line->len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
+	graft = xmalloc(st_add(sizeof(*graft),
+	                       st_mult(sizeof(struct object_id), i)));
 	graft->nr_parent = i;
 	if (get_oid_hex(line->buf, &graft->oid))
 		goto bad_graft_data;
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v4 4/4] commit: rewrite read_graft_line
  2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
                         ` (2 preceding siblings ...)
  2017-08-18 18:33       ` [PATCH v4 3/4] commit: allocate array using object_id size Patryk Obara
@ 2017-08-18 18:33       ` Patryk Obara
  2017-08-18 18:38         ` Patryk Obara
  3 siblings, 1 reply; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:33 UTC (permalink / raw)
  To: git, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

Old implementation determined number of hashes by dividing length of
line by length of hash, which works only if all hash representations
have same length.

New graft line parser works in two phases:

  1. In first phase line is scanned to verify correctness and compute
     number of hashes, then graft struct is allocated.

  2. In second phase line is scanned again to fill up already allocated
     graft struct.

This way graft parsing code can support different sizes of hashes
without any further code adaptations.

A number of alternative implementations were considered and discarded:

  - Modifying graft structure to store oid_array instead of FLEXI_ARRAY
    indicates undesirable usage of struct to readers.

  - Parsing into temporary string_list or oid_array complicates code
    by adding more return paths, as these structures needs to be
    cleared before returning from function.

  - Determining number of hashes by counting separators might cause
    maintenance issues, if this function needs to be modified in future
    again.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
---
 commit.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 436eb34..3eefd9d 100644
--- a/commit.c
+++ b/commit.c
@@ -137,32 +137,37 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i;
+	int i, phase;
+	const char *tail = NULL;
 	struct commit_graft *graft = NULL;
-	const int entry_size = GIT_SHA1_HEXSZ + 1;
+	struct object_id dummy_oid, *oid;
 
 	strbuf_rtrim(line);
 	if (!line->len || line->buf[0] == '#')
 		return NULL;
-	if ((line->len + 1) % entry_size)
-		goto bad_graft_data;
-	i = (line->len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft),
-	                       st_mult(sizeof(struct object_id), i)));
-	graft->nr_parent = i;
-	if (get_oid_hex(line->buf, &graft->oid))
-		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) {
-		if (line->buf[i] != ' ')
-			goto bad_graft_data;
-		if (get_sha1_hex(line->buf + i + 1, graft->parent[i/entry_size].hash))
+	/*
+	 * phase 0 verifies line, counts hashes in line and allocates graft
+	 * phase 1 fills graft
+	 */
+	for (phase = 0; phase < 2; phase++) {
+		oid = graft ? &graft->oid : &dummy_oid;
+		if (parse_oid_hex(line->buf, oid, &tail))
 			goto bad_graft_data;
+		for (i = 0; *tail != '\0'; i++) {
+			oid = graft ? &graft->parent[i] : &dummy_oid;
+			if (!isspace(*tail++) || parse_oid_hex(tail, oid, &tail))
+				goto bad_graft_data;
+		}
+		if (!graft) {
+			graft = xmalloc(st_add(sizeof(*graft),
+			                       st_mult(sizeof(struct object_id), i)));
+			graft->nr_parent = i;
+		}
 	}
 	return graft;
 
 bad_graft_data:
 	error("bad graft data: %s", line->buf);
-	free(graft);
 	return NULL;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v4 4/4] commit: rewrite read_graft_line
  2017-08-18 18:33       ` [PATCH v4 4/4] commit: rewrite read_graft_line Patryk Obara
@ 2017-08-18 18:38         ` Patryk Obara
  2017-08-18 19:12           ` Junio C Hamano
  2017-08-18 19:47           ` Junio C Hamano
  0 siblings, 2 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 18:38 UTC (permalink / raw)
  To: Git List, Junio C Hamano, Jeff King; +Cc: brian m . carlson, Stefan Beller

Actually, I don't think I needed to remove free(graft) line, but I don't
know if freeing NULL is considered ok in git code. Let me know if I
should bring it back, please.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v4 4/4] commit: rewrite read_graft_line
  2017-08-18 18:38         ` Patryk Obara
@ 2017-08-18 19:12           ` Junio C Hamano
  2017-08-18 19:33             ` Patryk Obara
  2017-08-18 19:47           ` Junio C Hamano
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18 19:12 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Git List, Jeff King, brian m . carlson, Stefan Beller

Patryk Obara <patryk.obara@gmail.com> writes:

> Actually, I don't think I needed to remove free(graft) line, but I don't
> know if freeing NULL is considered ok in git code. Let me know if I
> should bring it back, please.

Calling free(var) when var may or may not be NULL is perfectly fine.

We even discourage people from writing:

	if (var)
		free(var);

because an unconditional call to free(var) is sufficient.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v4 4/4] commit: rewrite read_graft_line
  2017-08-18 19:12           ` Junio C Hamano
@ 2017-08-18 19:33             ` Patryk Obara
  0 siblings, 0 replies; 59+ messages in thread
From: Patryk Obara @ 2017-08-18 19:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jeff King, brian m . carlson, Stefan Beller

Ok, so that's an option - in this instance free is not actually needed
because it can be triggered only in phase 0, but it would add a bit of
robustness.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v4 4/4] commit: rewrite read_graft_line
  2017-08-18 18:38         ` Patryk Obara
  2017-08-18 19:12           ` Junio C Hamano
@ 2017-08-18 19:47           ` Junio C Hamano
  1 sibling, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2017-08-18 19:47 UTC (permalink / raw)
  To: Patryk Obara; +Cc: Git List, Jeff King, brian m . carlson, Stefan Beller

Patryk Obara <patryk.obara@gmail.com> writes:

> Actually, I don't think I needed to remove free(graft) line, but I don't
> know if freeing NULL is considered ok in git code. Let me know if I
> should bring it back, please.

Either free(graft) or assert(!graft) is fine, but we should have one
of them there.  I'll add assert(!graft) there while queuing, at
least for now.

In the current code, when the control reaches the bad_graft_data
label, 'graft' must be NULL, or there is a bug in our code.  Because
we are parsing exactly the same input using the same helper routines
in both passes, we should see failure during the first pass before
'graft' points to an allocated piece of memory.  So it may be a good
idea to have assert(!graft) there than free(graft); the latter would
sweep a potential bug under the carpet.

If this were a part of the system whose design is still fluid (it is
not), it is not implausible that we would later want to add new test
that jumps to the bad_graft_data label.  For example, after the
"runs exactly twice" loop, we may add a new test that iterates over
the graft->parents[] to ensure that there is no duplicate and jumps
to bad_graft_data when we find one.

If we add assert(!graft) there today, and if such an enhancement
forgets to replace it with free(graft), the assert() will catch the
mistake.  If we have neither, it makes it more likely that such an
enhancement leaves a possible memory leak in its error codepath.

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2017-08-18 19:47 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-15 11:49 [PATCH 0/5] Modernize read_graft_line implementation Patryk Obara
2017-08-15 11:49 ` [PATCH 1/5] cache: extend object_id size to sha3-256 Patryk Obara
2017-08-15 11:49 ` [PATCH 2/5] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
2017-08-15 18:23   ` Junio C Hamano
2017-08-15 18:29     ` Stefan Beller
2017-08-15 19:02       ` Junio C Hamano
2017-08-16 12:11         ` Patryk Obara
2017-08-16 19:32           ` Junio C Hamano
2017-08-15 11:49 ` [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
2017-08-15 17:02   ` Stefan Beller
2017-08-16 12:24     ` Patryk Obara
2017-08-16 22:59       ` brian m. carlson
2017-08-17  5:55         ` Jeff King
2017-08-17 21:17           ` Junio C Hamano
2017-08-17 21:38             ` Patryk Obara
2017-08-15 18:25   ` Junio C Hamano
2017-08-15 11:49 ` [PATCH 4/5] commit: implement free_commit_graft Patryk Obara
2017-08-15 17:04   ` Stefan Beller
2017-08-15 18:26   ` Junio C Hamano
2017-08-16 12:28     ` Patryk Obara
2017-08-15 11:49 ` [PATCH 5/5] commit: rewrite read_graft_line Patryk Obara
2017-08-15 17:11   ` Stefan Beller
2017-08-15 18:30   ` Junio C Hamano
2017-08-16 13:07     ` Patryk Obara
2017-08-16 19:39       ` Junio C Hamano
2017-08-15 17:19 ` [PATCH 0/5] Modernize read_graft_line implementation Stefan Beller
2017-08-16 17:58 ` [PATCH v2 0/4] " Patryk Obara
2017-08-16 17:58   ` [PATCH v2 1/4] sha1_file: fix hardcoded size in null_sha1 Patryk Obara
2017-08-16 19:46     ` Junio C Hamano
2017-08-16 17:58   ` [PATCH v2 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
2017-08-16 17:58   ` [PATCH v2 3/4] commit: implement free_commit_graft Patryk Obara
2017-08-16 17:58   ` [PATCH v2 4/4] commit: rewrite read_graft_line Patryk Obara
2017-08-17 21:20     ` Junio C Hamano
2017-08-17 21:42       ` Patryk Obara
2017-08-18  1:59   ` [PATCH v3 0/4] Modernize read_graft_line implementation Patryk Obara
2017-08-18  1:59     ` [PATCH v3 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
2017-08-18  1:59     ` [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
2017-08-18  6:29       ` Jeff King
2017-08-18 10:12         ` Patryk Obara
2017-08-18 11:50           ` Jeff King
2017-08-18  1:59     ` [PATCH v3 3/4] commit: allocate array using object_id size Patryk Obara
2017-08-18  1:59     ` [PATCH v3 4/4] commit: rewrite read_graft_line Patryk Obara
2017-08-18  6:43       ` Jeff King
2017-08-18  7:44         ` Junio C Hamano
2017-08-18 11:30           ` Patryk Obara
2017-08-18 11:45             ` Jeff King
2017-08-18 16:44             ` Junio C Hamano
2017-08-18 17:05               ` Patryk Obara
2017-08-18 18:29                 ` Junio C Hamano
2017-08-18  2:20     ` [PATCH v3 0/4] Modernize read_graft_line implementation Junio C Hamano
2017-08-18 18:33     ` [PATCH v4 " Patryk Obara
2017-08-18 18:33       ` [PATCH v4 1/4] sha1_file: fix definition of null_sha1 Patryk Obara
2017-08-18 18:33       ` [PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line Patryk Obara
2017-08-18 18:33       ` [PATCH v4 3/4] commit: allocate array using object_id size Patryk Obara
2017-08-18 18:33       ` [PATCH v4 4/4] commit: rewrite read_graft_line Patryk Obara
2017-08-18 18:38         ` Patryk Obara
2017-08-18 19:12           ` Junio C Hamano
2017-08-18 19:33             ` Patryk Obara
2017-08-18 19:47           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).