git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <junkio@cox.net>
To: Sergey Vlasov <vsu@altlinux.ru>
Cc: Alexandre Julliard <julliard@winehq.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@gmail.com>,
	git@vger.kernel.org
Subject: Re: Shallow clone
Date: Sun, 12 Nov 2006 13:59:15 -0800
Message-ID: <7vd57scong.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <20061112205909.f8951300.vsu@altlinux.ru> (Sergey Vlasov's message of "Sun, 12 Nov 2006 20:59:09 +0300")

Sergey Vlasov <vsu@altlinux.ru> writes:

> This is due to optimization in builtin-pack-objects.c:try_delta():
>
> 	/*
> 	 * We do not bother to try a delta that we discarded
> 	 * on an earlier try, but only when reusing delta data.
> 	 */
> 	if (!no_reuse_delta && trg_entry->in_pack &&
> 	    trg_entry->in_pack == src_entry->in_pack)
> 		return 0;
>
> After removing this part the shallow pack after clone is 2.6M, as it
> should be.
>
> The problem with this optimization is that it is only valid if we are
> repacking either the same set of objects as we did earlier, or its
> superset.  But if we are packing a subset of objects, there will be some
> objects in that subset which were delta-compressed in the original pack,
> but base objects for those deltas are not included in our subset -
> therefore we will be unable to reuse existing deltas, and with that
> optimization we will never try to use delta compression for these
> objects.
> ...
> So any partial fetch (shallow or not) from a mostly packed repository
> currently results in a suboptimal pack.

That is correct.  How about something like this?

I think the determination of "repacking_superset" may need to be
tweaked because existing packs may have overlaps, and the patch
counts an object once for every pack that contains it.


diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 69e5dd3..fb25124 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -64,6 +64,7 @@ struct object_entry {
 static unsigned char object_list_sha1[20];
 static int non_empty;
 static int no_reuse_delta;
+static int repacking_superset;
 static int local;
 static int incremental;
 static int allow_ofs_delta;
@@ -1172,10 +1173,13 @@ static int try_delta(struct unpacked *tr
 		return -1;
 
 	/*
-	 * We do not bother to try a delta that we discarded
-	 * on an earlier try, but only when reusing delta data.
+	 * When we are packing the superset of objects we have already
+	 * packed, we do not bother to try a delta that we discarded
+	 * on an earlier try.  This heuristic of course should not
+	 * kick in when we are not reusing delta, or we know we are
+	 * sending a subset of objects from a repository.
 	 */
-	if (!no_reuse_delta && trg_entry->in_pack &&
+	if (!no_reuse_delta && repacking_superset && trg_entry->in_pack &&
 	    trg_entry->in_pack == src_entry->in_pack)
 		return 0;
 
@@ -1493,6 +1497,16 @@ static void get_object_list(int ac, cons
 	traverse_commit_list(&revs, show_commit, show_object);
 }
 
+static int count_packed_objects(void)
+{
+	struct packed_git *p;
+	int cnt = 0;
+
+	for (p = packed_git; p; p = p->next)
+		cnt += num_packed_objects(p);
+	return cnt;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	SHA_CTX ctx;
@@ -1631,6 +1645,8 @@ int cmd_pack_objects(int argc, const cha
 	if (non_empty && !nr_result)
 		return 0;
 
+	repacking_superset = count_packed_objects() < nr_result;
+
 	SHA1_Init(&ctx);
 	list = sorted_by_sha;
 	for (i = 0; i < nr_result; i++) {
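
The effect of the patched condition can be sketched in isolation. The structs below are hypothetical stand-ins (`struct pack_stub`, `struct entry_stub`) for `struct packed_git` and `struct object_entry`, reduced to the fields the check reads; `count_packed_stub()` and `skip_delta_attempt()` are illustrative helpers, not functions in builtin-pack-objects.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct packed_git: only the object count. */
struct pack_stub {
	int num_objects;		/* objects listed in this pack's index */
	struct pack_stub *next;
};

/* Hypothetical stand-in for struct object_entry: only in_pack. */
struct entry_stub {
	const struct pack_stub *in_pack;	/* pack holding the object, or NULL */
};

/*
 * Mirrors count_packed_objects() from the patch: sums per-pack counts,
 * so an object present in two packs is counted twice.
 */
static int count_packed_stub(const struct pack_stub *packs)
{
	int cnt = 0;

	for (; packs; packs = packs->next)
		cnt += packs->num_objects;
	return cnt;
}

/*
 * Returns 1 when the patched try_delta() would give up on the pair
 * (trg, src) without attempting a delta, 0 when it would try one.
 */
static int skip_delta_attempt(int no_reuse_delta, int repacking_superset,
			      const struct entry_stub *trg,
			      const struct entry_stub *src)
{
	assert(trg && src);
	return !no_reuse_delta && repacking_superset && trg->in_pack &&
		trg->in_pack == src->in_pack;
}
```

For a shallow fetch of 200 objects out of a single pack holding 1000, `count_packed_stub(&p) < 200` is false, so `skip_delta_attempt()` returns 0 and the delta is attempted; for a repack of 1200 objects (the packed ones plus new loose ones) it returns 1. Because the patch compares with a strict `<`, the heuristic also stays off when `nr_result` exactly equals the packed count.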


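The overlap caveat mentioned above could, for instance, be handled by counting distinct object names across all packs instead of summing per-pack totals. The sketch below is not part of the patch and does not claim to be how git resolved it; `struct pack_names` and `count_distinct_packed()` are hypothetical, and each pack's names are assumed to be available as a flat array of 20-byte SHA-1s.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: nr object names, 20 bytes each, stored back to back. */
struct pack_names {
	const unsigned char *sha1;
	int nr;
};

static int cmp_sha1(const void *a, const void *b)
{
	return memcmp(a, b, 20);
}

/*
 * Counts object names that appear in at least one pack, so an object
 * stored in several packs is counted once.  Returns -1 if allocation fails.
 */
static int count_distinct_packed(const struct pack_names *packs, int nr_packs)
{
	int total = 0, distinct = 0, i;
	size_t off = 0;
	unsigned char *all;

	assert(packs || nr_packs == 0);
	for (i = 0; i < nr_packs; i++)
		total += packs[i].nr;
	if (!total)
		return 0;

	/* Gather every name into one buffer, then sort so duplicates are adjacent. */
	all = malloc((size_t)total * 20);
	if (!all)
		return -1;
	for (i = 0; i < nr_packs; i++) {
		if (packs[i].nr) {
			memcpy(all + off, packs[i].sha1, (size_t)packs[i].nr * 20);
			off += (size_t)packs[i].nr * 20;
		}
	}
	qsort(all, (size_t)total, 20, cmp_sha1);

	/* Count each run of identical names once. */
	for (i = 0; i < total; i++)
		if (i == 0 || memcmp(all + (size_t)i * 20,
				     all + (size_t)(i - 1) * 20, 20))
			distinct++;
	free(all);
	return distinct;
}
```

With two packs that share one object, the per-pack sum is 4 while the distinct count is 3. Since summing can only overcount, the patch as posted errs toward switching the superset heuristic off (more delta attempts, not worse packs); the sort-based count trades that for extra memory proportional to the total number of packed objects.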