git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, vdye@github.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 7/8] ahead-behind: implement ahead_behind() logic
Date: Mon, 6 Mar 2023 20:05:39 -0500	[thread overview]
Message-ID: <ZAaN4w4kltPIeYlt@nand.local> (raw)
In-Reply-To: <b8c55ecf88d6229f13e05e8369adaf9e70ae1de0.1678111599.git.gitgitgadget@gmail.com>

On Mon, Mar 06, 2023 at 02:06:37PM +0000, Derrick Stolee via GitGitGadget wrote:
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>

Having read and worked with this code before, I don't have a ton of
substance to add here. But it was interesting to reread, and I left a
few sprinklings here and there of some thing that we may want to
consider for v2.

Before that, though, IIRC we wrote most of this together, so I would be
happy to have my:

    Co-authored-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>

above your S-o-b here. But you've done so much work since we originally
wrote this together that I don't mind being dropped here. Up to you :-).

> @@ -71,5 +76,23 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
>  	if (!tips.nr)
>  		return 0;
>
> +	ALLOC_ARRAY(commits, tips.nr + 1);
> +	ALLOC_ARRAY(counts, tips.nr);
> +
> +	for (i = 0; i < tips.nr; i++) {
> +		commits[i] = tips.items[i].util;
> +		counts[i].tip_index = i;
> +		counts[i].base_index = tips.nr;
> +	}
> +	commits[tips.nr] = base;
> +
> +	ahead_behind(commits, tips.nr + 1, counts, tips.nr);
> +
> +	for (i = 0; i < tips.nr; i++)
> +		printf("%s %d %d\n", tips.items[i].string,
> +		       counts[i].ahead, counts[i].behind);
> +
> +	free(counts);
> +	free(commits);
>  	return 0;
>  }

I have to say, the interface looks particularly well designed when you
see the patches come together in this fashion. The builtin is doing
basically no work except collating the user's input, passing it off to
ahead_behind(), and then spitting out the results.

Very nice ;-).

> diff --git a/commit-reach.c b/commit-reach.c
> index 2e33c599a82..87ccc2cd4f5 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -8,6 +8,7 @@
>  #include "revision.h"
>  #include "tag.h"
>  #include "commit-reach.h"
> +#include "ewah/ewok.h"
>
>  /* Remember to update object flag allocation in object.h */

There is a new use of PARENT2 (which we hardcode here as bit 17) below,
but it is already covered as part of the object flag allocation table in
object.h. So this comment has done its job over the years ;-).

>  #define PARENT1		(1u<<16)
> @@ -941,3 +942,97 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>
>  	return found_commits;
>  }
> +
> +define_commit_slab(bit_arrays, struct bitmap *);
> +static struct bit_arrays bit_arrays;
> +
> +static void insert_no_dup(struct prio_queue *queue, struct commit *c)
> +{
> +	if (c->object.flags & PARENT2)
> +		return;
> +	prio_queue_put(queue, c);
> +	c->object.flags |= PARENT2;
> +}

You mentioned this in the patch message, but:

It may be worth noting here (or in the call to repo_clear_commit_marks()
below) that the PARENT2 flag is used to detect and avoid duplicates in
this list.

> +static struct bitmap *init_bit_array(struct commit *c, int width)
> +{
> +	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
> +	if (!*bitmap)
> +		*bitmap = bitmap_word_alloc(width);
> +	return *bitmap;
> +}
> +
> +static void free_bit_array(struct commit *c)
> +{
> +	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
> +	if (!*bitmap)
> +		return;
> +	bitmap_free(*bitmap);
> +	*bitmap = NULL;
> +}
> +
> +void ahead_behind(struct commit **commits, size_t commits_nr,
> +		  struct ahead_behind_count *counts, size_t counts_nr)
> +{
> +	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
> +	size_t i;
> +
> +	if (!commits_nr || !counts_nr)
> +		return;
> +
> +	for (i = 0; i < counts_nr; i++) {
> +		counts[i].ahead = 0;
> +		counts[i].behind = 0;
> +	}
> +
> +	ensure_generations_valid(commits, commits_nr);
> +
> +	init_bit_arrays(&bit_arrays);
> +
> +	for (i = 0; i < commits_nr; i++) {
> +		struct commit *c = commits[i];
> +		struct bitmap *bitmap = init_bit_array(c, width);
> +
> +		bitmap_set(bitmap, i);
> +		insert_no_dup(&queue, c);
> +	}
> +
> +	while (queue_has_nonstale(&queue)) {
> +		struct commit *c = prio_queue_get(&queue);
> +		struct commit_list *p;
> +		struct bitmap *bitmap_c = init_bit_array(c, width);
> +
> +		for (i = 0; i < counts_nr; i++) {
> +			int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index);
> +			int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index);

Since we're XORing these, I'd hate to get bit by bitmap_get returning
something other than 0 or 1. It doesn't, since the return value (for any
"pos" for which it holds that `EWAH_BLOCKI(pos) < self->word_alloc`) is:

    (self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)) != 0

so we'll always be guaranteed to zero or one. But if we retuned instead:

    self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)

...this code would break in a very annoying and hard-to-debug way ;-).

I wonder if we might do a little of belt-and-suspenders here by calling
these like:

    int reach_from_tip  = !!(bitmap_get(bitmap_c, counts[i].tip_index));
    int reach_from_base = !!(bitmap_get(bitmap_c, counts[i].base_index));

where the "!!(...)" is new.

> +			if (reach_from_tip ^ reach_from_base) {
> +				if (reach_from_base)
> +					counts[i].behind++;
> +				else
> +					counts[i].ahead++;
> +			}
> +		}

I have gone back and forth so many times on this code :-). I think the
XORs are fine, though.

> +		for (p = c->parents; p; p = p->next) {
> +			struct bitmap *bitmap_p;
> +
> +			parse_commit(p->item);
> +
> +			bitmap_p = init_bit_array(p->item, width);
> +			bitmap_or(bitmap_p, bitmap_c);
> +
> +			if (bitmap_popcount(bitmap_p) == commits_nr)
> +				p->item->object.flags |= STALE;
> +
> +			insert_no_dup(&queue, p->item);

Do we care about inserting p->item when the above condition is met? IOW,
would it be OK to instead write:

    if (bitmap_popcount(bitmap_p) == commits_nr)
      p->item->object.flags |= STALE;
    else
      insert_no_dup(&queue, p->item);

> diff --git a/commit-reach.h b/commit-reach.h
> index 148b56fea50..1780f9317bf 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>  					 struct commit **to, int nr_to,
>  					 unsigned int reachable_flag);
>
> +struct ahead_behind_count {
> +	/**
> +	 * As input, the *_index members indicate which positions in
> +	 * the 'tips' array correspond to the tip and base of this
> +	 * comparison.
> +	 */
> +	size_t tip_index;
> +	size_t base_index;
> +
> +	/**
> +	 * These values store the computed counts for each side of the
> +	 * symmetric difference:
> +	 *
> +	 * 'ahead' stores the number of commits reachable from the tip
> +	 * and not reachable from the base.
> +	 *
> +	 * 'behind' stores the number of commits reachable from the base
> +	 * and not reachable from the tip.
> +	 */
> +	int ahead;
> +	int behind;
> +};

Should these be unsigned values? I don't think we have a sensible
interpretation for what a negative "ahead" or "behind" could would mean.
I guess behind "behind" by "N" means you're "ahead" by "-N", but I don't
think it's practical ;-).

> +
> +/**

Here and elsewhere, these kind of doc-comments are a little
non-standard, and IIRC the opening should instead be "/*" (with one
asterisk instead of two).

Thanks,
Taylor

  reply	other threads:[~2023-03-07  1:05 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
2023-03-06 18:48   ` Junio C Hamano
2023-03-07  0:40     ` Taylor Blau
2023-03-08 22:14       ` Derrick Stolee
2023-03-08 22:56         ` Junio C Hamano
2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
2023-03-07  0:43   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
2023-03-07  0:46   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-06 18:52   ` Junio C Hamano
2023-03-07  0:50     ` Taylor Blau
2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-07  1:05   ` Taylor Blau [this message]
2023-03-09 17:32     ` Derrick Stolee
2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
2023-03-06 20:18   ` Derrick Stolee
2023-03-06 22:24     ` Junio C Hamano
2023-03-07  0:36   ` Taylor Blau
2023-03-09  9:20     ` Jeff King
2023-03-09 21:51       ` Junio C Hamano
2023-03-07  0:33 ` Taylor Blau
2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-10 18:08     ` Junio C Hamano
2023-03-13 10:31     ` Phillip Wood
2023-03-13 13:33       ` Derrick Stolee
2023-03-13 21:10         ` Taylor Blau
2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
2023-03-15 17:17       ` Jeff King
2023-03-15 17:49     ` Jeff King
2023-03-15 19:24       ` Junio C Hamano
2023-03-15 19:44         ` Jeff King
2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:03       ` Junio C Hamano
2023-03-15 16:13         ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-10 19:09     ` Junio C Hamano
2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:01       ` Junio C Hamano
2023-03-15 16:12         ` Derrick Stolee
2023-03-15 16:11       ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:17       ` Derrick Stolee
2023-03-15 16:18         ` Derrick Stolee
2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
2023-03-10 19:25     ` Derrick Stolee
2023-03-15 17:31       ` Jeff King
2023-03-15 17:44         ` Derrick Stolee
2023-03-15 19:34         ` Junio C Hamano
2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
2023-03-15 13:54     ` Derrick Stolee
2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-15 18:06       ` Jeff King
2023-03-15 19:14         ` Junio C Hamano
2023-03-15 22:41       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-15 22:49       ` Jonathan Tan
2023-03-17 18:30         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-15 22:58       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 23:28       ` Jonathan Tan
2023-03-17 18:44         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-20 20:40         ` Jonathan Tan
2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZAaN4w4kltPIeYlt@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).