From: Junio C Hamano <gitster@pobox.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/2] pack-objects: optimize "recency order"
Date: Fri, 08 Jul 2011 10:45:51 -0700 [thread overview]
Message-ID: <7voc14fyy8.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <1310084657-16790-3-git-send-email-gitster@pobox.com> (Junio C. Hamano's message of "Thu, 7 Jul 2011 17:24:17 -0700")
I <gitster@pobox.com> wrote:
> - In the original recency order, trees and blobs are intermixed. Write
> trees together before blobs, in the hope that this will improve
> locality when running pathspec-limited revision traversal, i.e.
> "git log paths...";
>
> - When writing blob objects out, write the whole family of blobs that use
> the same delta base object together, by starting from the root of the
> delta chain, and writing its immediate children in a width-first
> manner, in the hope that this will again improve locality when reading
> blobs that belong to the same path, which are likely to be deltified
> against each other.
An interesting tangent.
With a single-liner change to apply the same "write the entire family as a
group" logic also to tree objects, we get a superb improvement to the
"index-pack" workload, while "log pathspec" becomes measurably worse.
> * Pathspec-limited log.
> $ git log drivers/net >/dev/null
> v1.7.6 with patch
> 60.0% percentile : 688,671 443,578
> 70.0% percentile : 319,574,732 110,370,100
> 99.0% percentile : 412,942,470 133,078,115
> 99.5% percentile : 413,172,266 133,163,349
> 99.9% percentile : 413,354,356 133,240,445
> Less than 2MiB seek: 61.71% 62.87%
60.00% percentile : 2,240,841
70.00% percentile : 115,729,538
99.00% percentile : 132,912,877
99.90% percentile : 133,129,574
Less than 2MiB seek : 59.77%
This of course is because trees tend to delta well against each other and
writing out everything in the same family tends to coalesce ancient and
recent trees close together, while the history traversal wants to see the
trees at the same path in adjacent commits, and the process to jump over
to the "next tree" in the same commit need to seek a lot more.
> * Blame.
> $ git blame drivers/net/ne.c >/dev/null
> v1.7.6 with patch
> 50.0% percentile : 11,248 7,334
> 95.0% percentile : 408,230,673 132,174,308
> Less than 2MiB seek: 56.47% 56.83%
50.00% percentile : 79,182
95.00% percentile : 132,547,959
Less than 2MiB seek : 56.16%
This also is harmed by trees in adjacent commits being further apart, but
blame needs to read a lot of blobs as well, so the effect is not as grave
as tree-only "log pathspec" case.
> * Index pack.
> $ git index-pack -v .git/objects/pack/pack*.pack
> v1.7.6 with patch
> 80.0% percentile : 1,317,862 4,971
> 90.0% percentile : 11,926,385 119,398
> 99.9% percentile : 382,919,785 33,155,191
> Less than 2MiB seek: 81.73% 96.92%
80.00% percentile : 745
90.00% percentile : 1,891
99.90% percentile : 80,001
Less than 2MiB seek : 100.00%
It is so obvious that the in pack object ordering with "the whole family
at once" tweak is extremely tuned for this workload that the result is not
even funny.
The moral of the story is that index-pack is not a workload to optimize
for ;-) and the following is a wrong patch to apply. In general, we have
to be extremely careful not to tune for a particular workload while
unknowingly sacrificing other workloads.
I wonder if we would get a better locality for "blame" by not writing the
whole family at once for blobs, but only a smaller subset.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 27132bb..adcb509 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -553,7 +553,7 @@ static struct object_entry **compute_write_order(void)
for (i = 0; i < nr_objects; i++) {
if (objects[i].type != OBJ_TREE)
continue;
- add_to_write_order(wo, &wo_end, &objects[i]);
+ add_family_to_write_order(wo, &wo_end, &objects[i]);
}
/*
next prev parent reply other threads:[~2011-07-08 17:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-08 0:24 [PATCH 0/2] For improved pack locality Junio C Hamano
2011-07-08 0:24 ` [PATCH 1/2] core: log offset pack data accesses happened Junio C Hamano
2011-07-08 0:24 ` [PATCH 2/2] pack-objects: optimize "recency order" Junio C Hamano
2011-07-08 2:08 ` Shawn Pearce
2011-07-08 17:45 ` Junio C Hamano [this message]
2011-07-11 22:49 ` Nicolas Pitre
2011-07-08 22:47 ` Junio C Hamano
2011-07-09 0:42 ` Shawn Pearce
2011-10-27 21:01 ` Ævar Arnfjörð Bjarmason
2011-10-27 21:49 ` Ævar Arnfjörð Bjarmason
2011-10-27 22:02 ` Ævar Arnfjörð Bjarmason
2011-10-27 22:32 ` Jakub Narebski
2011-10-27 22:05 ` Junio C Hamano
2011-10-28 9:17 ` Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7voc14fyy8.fsf@alter.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).