git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/1] ls-refs.c: minimize number of refs visited
Date: Tue, 19 Jan 2021 11:12:45 -0500	[thread overview]
Message-ID: <YAcE/dTuOB9PbGQU@nand.local> (raw)
In-Reply-To: <20210119144251.27924-2-jacob@gitlab.com>

Hi Jacob,

On Tue, Jan 19, 2021 at 03:42:51PM +0100, Jacob Vosmaer wrote:
> The previous implementation of ls-refs would perform exactly one ref
> walk, matching each ref against the prefixes (if any) provided by the
> user. This can be expensive if there are a lot of refs and the user
> only cares about a small subset of them.
>
> In this patch we analyze the prefixes provided by the user and build a
> minimal set of disjoint prefixes that contains all of them. We then do
> a ref walk for each of these minimal prefixes.

This reminds me of b31e2680c4 (ref-filter.c: find disjoint pattern
prefixes, 2019-06-26), where we solved a very similar problem for 'git
for-each-ref'. The difference here is that we are operating on a set of
prefixes, not a set of refs.

But, I think that we could get pretty far by treating the prefixes as
refs so that we can call ref-filter.c:find_longest_prefixes(). For its
purposes, it doesn't really care about whether or not the arguments
actually are references. It simply returns the longest common prefix
among all of its arguments (delimited by '/' characters).

> It is tempting to have just one strvec for the prefixes and use it
> both for matching and for iterating. But every time I tried that, it
> made things more complicated. I settled on leaving the existing ref
> matching (using &data.prefixes) alone, and I added a new layer around
> it for the ref walk optimization (using &iter_prefixes).

I think the implementation in b31e2680c4 deals with this nicely: it
takes a pointer to a strvec and dumps prefixes in there.

> This commit also fixes a bug in ls-refs.c that was not triggered
> before: we were using a strvec set to zero, which is not how you are
> supposed to initialize a strvec. We now call strvec_init after zeroing.

Good.

> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  ls-refs.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index a1e0b473e4..6d5f0c769a 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -84,12 +84,44 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  	return parse_hide_refs_config(var, value, "uploadpack");
>  }
>
> +static int cmp_prefix(const void *a_, const void *b_){
> +	const char *a = *(const char **)a_;
> +	const char *b = *(const char **)b_;
> +	return strcmp(a, b);
> +}
> +
> +static void deduplicate_prefixes(struct strvec *prefixes) {
> +	int i;
> +
> +	QSORT(prefixes->v, prefixes->nr, cmp_prefix);
> +
> +	for (i = 1; i < prefixes->nr;) {
> +		const char *p = prefixes->v[i];
> +
> +		/*
> +		 * If p is "refs/foobar" and its predecessor is "refs/foo" then we should
> +		 * drop p, both to avoid sending duplicate refs to the user, and to avoid
> +		 * doing unnecessary work.
> +		 */
> +		if (starts_with(p, prefixes->v[i - 1])) {
> +			MOVE_ARRAY(&prefixes->v[i], &prefixes->v[i + 1], prefixes->nr - (i + 1));
> +			prefixes->v[prefixes->nr - 1] = p;
> +			strvec_pop(prefixes);
> +		} else {
> +			i++;
> +		}
> +	}
> +}
> +

Indeed, this and the below code are very reminiscent of b31e2680c4. So,
I wonder if it's possible to use the existing implementation rather than
implement what is roughly the same thing twice.

Below is a completely untested patch to try and reuse the code from
b31e2680c4. (It compiles, but that's the extent of my guarantees about
it ;-).) It's all smashed into one huge patch, so if you're happy with
the direction I'll take care of cleaning it up. The new function in
ref-filter.h really belongs in refs.h, but I left the implementation in
ref-filter.c to avoid creating more noise in the diff.

Let me know what you think.

Thanks,
Taylor

--- >8 ---

Subject: [PATCH] ls-refs: iterate longest common refs prefix

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ls-refs.c    |  4 +++-
 ref-filter.c | 46 ++++++++++++++++++++++++++++++++--------------
 ref-filter.h | 10 ++++++++++
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..6a3e11d45c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -6,6 +6,7 @@
 #include "ls-refs.h"
 #include "pkt-line.h"
 #include "config.h"
+#include "ref-filter.h"

 /*
  * Check if one of the prefixes is a prefix of the ref.
@@ -109,7 +110,8 @@ int ls_refs(struct repository *r, struct strvec *keys,
 		die(_("expected flush after ls-refs arguments"));

 	head_ref_namespaced(send_ref, &data);
-	for_each_namespaced_ref(send_ref, &data);
+	for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v,
+				     send_ref, &data, 0);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index aa260bfd09..c34bf34d06 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1987,6 +1987,36 @@ static void find_longest_prefixes(struct string_list *out,
 	strbuf_release(&prefix);
 }

+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken)
+{
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
+	struct string_list_item *prefix;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, namespace_len;
+
+	find_longest_prefixes(&prefixes, patterns);
+
+	if (namespace)
+		strbuf_addstr(&buf, namespace);
+	namespace_len = buf.len;
+
+	for_each_string_list_item(prefix, &prefixes) {
+		strbuf_addf(&buf, prefix->string);
+		ret = for_each_fullref_in(buf.buf, cb, cb_data, broken);
+		if (ret)
+			break;
+		strbuf_setlen(&buf, namespace_len);
+	}
+
+	string_list_clear(&prefixes, 0);
+	strbuf_release(&buf);
+	return ret;
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -1997,10 +2027,6 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       void *cb_data,
 				       int broken)
 {
-	struct string_list prefixes = STRING_LIST_INIT_DUP;
-	struct string_list_item *prefix;
-	int ret;
-
 	if (!filter->match_as_path) {
 		/*
 		 * in this case, the patterns are applied after
@@ -2024,16 +2050,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		return for_each_fullref_in("", cb, cb_data, broken);
 	}

-	find_longest_prefixes(&prefixes, filter->name_patterns);
-
-	for_each_string_list_item(prefix, &prefixes) {
-		ret = for_each_fullref_in(prefix->string, cb, cb_data, broken);
-		if (ret)
-			break;
-	}
-
-	string_list_clear(&prefixes, 0);
-	return ret;
+	return for_each_fullref_in_prefixes(NULL, filter->name_patterns,
+					    cb, cb_data, broken);
 }

 /*
diff --git a/ref-filter.h b/ref-filter.h
index feaef4a8fd..f666a0fb49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -146,4 +146,14 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);

+/**
+ * iterate all refs which descend from the longest common prefix among
+ * "patterns".
+ */
+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken);
+
 #endif /*  REF_FILTER_H  */
--
2.30.0.138.g6d7191ea01


  reply	other threads:[~2021-01-19 16:22 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:42 [PATCH 0/1] ls-refs.c: minimize number of refs visited Jacob Vosmaer
2021-01-19 14:42 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 16:12   ` Taylor Blau [this message]
2021-01-19 17:42     ` Jacob Vosmaer
2021-01-19 18:19       ` [PATCH 0/2] ls-refs: only traverse through longest common ref prefix Taylor Blau
2021-01-19 18:19         ` [PATCH 1/2] refs: expose 'for_each_fullref_in_prefixes' Taylor Blau
2021-01-19 18:19         ` [PATCH 2/2] ls-refs.c: traverse longest common ref prefix Taylor Blau
2021-01-19 23:09           ` Jeff King
2021-01-19 23:52             ` Taylor Blau
2021-01-20  0:08               ` Jeff King
2021-01-20 11:00           ` Jacob Vosmaer
2021-01-20 16:04         ` [PATCH v2 0/3] ls-refs: traverse prefixes of disjoint "ref-prefix" sets Taylor Blau
2021-01-20 16:04           ` [PATCH v2 1/3] refs: expose 'for_each_fullref_in_prefixes' Taylor Blau
2021-01-20 19:56             ` Jeff King
2021-01-20 20:12               ` Taylor Blau
2021-01-23  2:59             ` Junio C Hamano
2021-01-25  1:35               ` Taylor Blau
2021-01-20 16:04           ` [PATCH v2 2/3] ls-refs.c: initialize 'prefixes' before using it Taylor Blau
2021-01-20 19:58             ` Jeff King
2021-01-20 20:13               ` Taylor Blau
2021-01-20 21:50             ` Jacob Vosmaer
2021-01-20 16:04           ` [PATCH v2 3/3] ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets Taylor Blau
2021-01-23 17:55           ` [PATCH v2 0/3] ls-refs: " Junio C Hamano
2021-01-19 19:09       ` [PATCH 1/1] ls-refs.c: minimize number of refs visited Taylor Blau
2021-01-19 21:59         ` Jeff King
2021-01-19 22:15           ` Jeff King
2021-01-19 22:23             ` Taylor Blau
2021-01-19 22:52               ` Jeff King
2021-01-19 22:59                 ` Jeff King
2021-01-19 23:02                   ` Taylor Blau
2021-01-19 22:53   ` Jeff King
2021-01-19 23:00     ` Taylor Blau
2021-01-19 23:11       ` Jeff King
2021-01-20 10:40         ` Jacob Vosmaer
2021-01-20 10:44           ` Jacob Vosmaer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YAcE/dTuOB9PbGQU@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).