git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: newren@gmaill.com, peff@peff.net, me@ttaylorr.com,
	jrnieder@gmail.com, Derrick Stolee <dstolee@microsoft.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs
Date: Thu, 07 May 2020 13:17:38 +0000	[thread overview]
Message-ID: <fcf948bda7aebcc5f88c17f5b308b2ce0cc285f5.1588857462.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.627.git.1588857462.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

As we parse the in-tree config files that store the sparse.dir values
used to create an in-tree sparse-checkout definition, we can easily
avoid parsing the same file multiple times by using an oidset on those
blobs. We only parse if the oid is new to the oidset.

This is unlikely to have a major performance benefit right now, but will
be extremely important when we introduce the sparse.inherit options to
link multiple files in a directed graph. This oidset will prevent
infinite loops when cycles exist in that digraph, or exponential blowups
even in the case of a directed acyclic graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-checkout.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/sparse-checkout.c b/sparse-checkout.c
index 6c58fda9722..d01f4d7b525 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -9,6 +9,7 @@
 #include "string-list.h"
 #include "unpack-trees.h"
 #include "object-store.h"
+#include "oidset.h"
 
 char *get_sparse_checkout_filename(void)
 {
@@ -77,9 +78,12 @@ int load_in_tree_pattern_list(struct repository *r,
 			      struct string_list *sl,
 			      struct pattern_list *pl)
 {
+	int result = 0;
 	struct index_state *istate = r->index;
 	struct string_list_item *item;
 	struct strbuf path = STRBUF_INIT;
+	struct oidset set;
+	oidset_init(&set, 16);
 
 	pl->use_cone_patterns = 1;
 
@@ -96,24 +100,34 @@ int load_in_tree_pattern_list(struct repository *r,
 		 * Use -1 return to ensure populate_from_existing_patterns()
 		 * skips the sparse-checkout updates.
 		 */
-		if (pos < 0)
-			return -1;
+		if (pos < 0) {
+			result = -1;
+			goto cleanup;
+		}
 
 		oid = &istate->cache[pos]->oid;
+
+		if (oidset_contains(&set, oid))
+			continue;
+
+		oidset_insert(&set, oid);
+
 		type = oid_object_info(r, oid, NULL);
 
 		if (type != OBJ_BLOB) {
 			warning(_("expected a file at '%s'; not updating sparse-checkout"),
 				oid_to_hex(oid));
-			return 1;
+			result = 1;
+			goto cleanup;
 		}
 
 		load_in_tree_from_blob(pl, oid);
 	}
 
+cleanup:
 	strbuf_release(&path);
-
-	return 0;
+	oidset_clear(&set);
+	return result;
 }
 
 int populate_sparse_checkout_patterns(struct pattern_list *pl)
-- 
gitgitgadget


  parent reply	other threads:[~2020-05-07 13:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
2020-05-07 22:27   ` Junio C Hamano
2020-05-08 12:19     ` Derrick Stolee
2020-05-08 15:09       ` Junio C Hamano
2020-05-20 16:32     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58   ` Junio C Hamano
2020-05-08 15:40     ` Derrick Stolee
2020-05-20 17:52       ` Elijah Newren
2020-06-17 23:07         ` Elijah Newren
2020-06-18  8:18           ` Son Luong Ngoc
2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
2020-05-20 16:28   ` Elijah Newren
2020-05-07 13:17 ` Derrick Stolee via GitGitGadget [this message]
2020-05-20 16:40   ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Elijah Newren
2020-05-21  3:49     ` Elijah Newren
2020-05-21 17:54       ` Derrick Stolee
2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
2020-05-20 18:10   ` Elijah Newren
2020-05-30 17:26     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
2020-06-17 23:14 ` Elijah Newren
2020-06-18  1:42   ` Derrick Stolee
2020-06-18  1:59     ` Elijah Newren
2020-06-18  3:01       ` Derrick Stolee
2020-06-18  5:03         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fcf948bda7aebcc5f88c17f5b308b2ce0cc285f5.1588857462.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmaill.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).