git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <dstolee@microsoft.com>
To: git@vger.kernel.org
Cc: johannes.schindelin@gmx.de, git@jeffhostetler.com,
	kewillf@microsoft.com, Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 0/3] Improve abbreviation disambiguation
Date: Fri, 15 Sep 2017 12:57:47 -0400
Message-ID: <20170915165750.198201-1-dstolee@microsoft.com>

Hello,

My name is Derrick Stolee and I just switched teams at Microsoft from
the VSTS Git Server to work on performance improvements in core Git.

This is my first patch submission, and I look forward to your feedback.

Thanks,
 Stolee


When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() loops over
abbreviation lengths, checking whether each length is unambiguous until
it finds one that works. This causes multiple round-trips to the disk
when it starts with the default abbreviation length (usually 7) but
needs up to 12 characters for an unambiguous short SHA-1. In very large
repos this effect is pronounced and slows down several commands, from
obvious consumers such as `status` and `log` to less obvious ones such
as `fetch` and `push`.
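To make the cost concrete, here is a minimal sketch of the loop-per-length strategy. It assumes the object database is just an in-memory array of full 40-digit hex ids and that the id being abbreviated exists; the names `objects`, `count_matches()` and `abbrev_by_loop()` are hypothetical and are not Git's code. In Git, each pass is instead a lookup across the loose objects and every pack index, which is where the repeated disk round-trips come from.

```c
#include <assert.h>
#include <string.h>

#define HEXSZ 40

/* Hypothetical stand-in for the object database: full hex object ids. */
static const char *objects[] = {
	"1234567abcdef0123456789abcdef0123456789a",
	"1234567abcd00000000000000000000000000000",
	"89abcdef00000000000000000000000000000000",
};
#define NR_OBJECTS (sizeof(objects) / sizeof(objects[0]))

/* One full scan: how many ids start with the first len digits of hex? */
static int count_matches(const char *hex, int len)
{
	int n = 0;

	for (size_t i = 0; i < NR_OBJECTS; i++)
		if (!strncmp(objects[i], hex, len))
			n++;
	return n;
}

/*
 * Loop-per-length sketch: every candidate length costs another full scan.
 * Assumes hex names an existing object, so a count of 1 means "only itself".
 */
static int abbrev_by_loop(const char *hex, int start_len, int *scans)
{
	assert(start_len > 0 && start_len <= HEXSZ);
	*scans = 0;
	for (int len = start_len; len <= HEXSZ; len++) {
		(*scans)++;
		if (count_matches(hex, len) == 1)
			return len;
	}
	return HEXSZ;
}
```

On this sample data, `abbrev_by_loop(objects[0], 7, &scans)` returns 12 after 6 scans, because the first two ids share their first 11 hex digits.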

This series improves performance by iterating only once over the
objects that match the initial short abbreviation, inspecting each
object id, and reporting the minimum length of an unambiguous
abbreviation.
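The single-pass idea can be sketched in the same hypothetical setting (an in-memory array of hex ids and invented names, not the code in these patches): visit each id that shares the initial prefix once, measure how many leading hex digits it has in common with the target, and answer one more than the longest such match. A real implementation would find the candidates with a prefix search in each pack index rather than the linear filter used here.

```c
#include <assert.h>
#include <string.h>

#define HEXSZ 40

/* Hypothetical stand-in for the object database: full hex object ids. */
static const char *objects[] = {
	"1234567abcdef0123456789abcdef0123456789a",
	"1234567abcd00000000000000000000000000000",
	"89abcdef00000000000000000000000000000000",
};
#define NR_OBJECTS (sizeof(objects) / sizeof(objects[0]))

/* Number of leading hex digits two ids have in common. */
static int common_prefix_len(const char *a, const char *b)
{
	int i = 0;

	while (i < HEXSZ && a[i] && a[i] == b[i])
		i++;
	return i;
}

/*
 * Single pass: look at each id sharing the first start_len digits with hex
 * and keep the shortest length that distinguishes hex from all of them.
 */
static int abbrev_single_pass(const char *hex, int start_len)
{
	int len = start_len;

	assert(start_len > 0 && start_len <= HEXSZ);
	for (size_t i = 0; i < NR_OBJECTS; i++) {
		const char *oid = objects[i];
		int common;

		if (strncmp(oid, hex, start_len))
			continue;	/* does not share the initial prefix */
		if (!strcmp(oid, hex))
			continue;	/* the object itself */
		common = common_prefix_len(oid, hex);
		if (common + 1 > len)
			len = common + 1;
	}
	return len;
}
```

Here `abbrev_single_pass(objects[0], 7)` returns 12 from one pass over the candidates, the same answer the loop-per-length approach reaches after six passes, while an id with no other object under its 7-digit prefix keeps the initial length of 7.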

A performance helper `test-abbrev` and a performance test
`p0008-abbrev.sh` are added to demonstrate this improvement. Here are
performance results for the commits in this series, using
GIT_PERF_REPEAT_COUNT=10 because the first run is frequently an outlier
while the file cache is cold. The Patch 2 and Patch 3 columns give the
change in running time relative to the Baseline column.

Running git on a Linux VM, we see the following gains.

| Repo    | Pack-Files | Loose Objs | Baseline | Patch 2 | Patch 3 |
|---------|------------|------------|----------|---------|---------|
| Git.git | 1          | 0          | 0.46 s   | -87.0%  | -87.0%  |
| Git.git | 5          | 0          | 1.04 s   | -84.6%  | -85.6%  |
| Git.git | 4          | 75852      | 0.88 s   | -86.4%  | -86.4%  |
| Linux   | 1          | 0          | 0.63 s   | -38.1%  | -69.8%  |
| Linux   | 24         | 0          | 5.41 s   | -69.3%  | -71.5%  |
| Linux   | 23         | 323441     | 5.41 s   | -70.6%  | -73.4%  |

Running a similar patch on Git for Windows, we see the following gains.

| Repo          | Pack-Files | Loose | Baseline | Patch 2 | Patch 3 |
|---------------|------------|-------|----------|---------|---------|
| GitForWindows | 6          | 319   | 7.19 s   | -91.1%  | -91.5%  |
| VSTS          | 3          | 38    | 7.83 s   | -88.9%  | -90.9%  |
| Linux         | 3          | 0     | 7.92 s   | -87.9%  | -90.2%  |
| Windows       | 50         | 219   | 17.8 s   | -98.6%  | -98.6%  |
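Because the Patch 2 and Patch 3 columns are relative changes, an absolute running time can be recovered as baseline × (1 + change / 100). The small helper below (hypothetical name `patched_seconds()`) only makes that arithmetic explicit; for example, the Windows row's 17.8 s baseline with a -98.6% change comes to roughly 0.25 s.

```c
#include <assert.h>

/* Convert a baseline time and a percentage change into an absolute time. */
static double patched_seconds(double baseline_s, double change_pct)
{
	assert(baseline_s >= 0.0);
	return baseline_s * (1.0 + change_pct / 100.0);
}
```

Applied to the Linux VM row with 24 pack-files, `patched_seconds(5.41, -71.5)` gives about 1.54 s for Patch 3.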

Note that performance improves in all cases, but the gain is larger
when there are multiple large pack-files. The extra gain comes from no
longer repeating the inspection of pack index files, which are not
cached in memory once they have been inspected.


Derrick Stolee (3):
  sha1_name: Create perf test for find_unique_abbrev()
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix

 Makefile               |  1 +
 sha1_name.c            | 66 ++++++++++++++++++++++++++++++++++++++------------
 t/helper/.gitignore    |  1 +
 t/helper/test-abbrev.c | 22 +++++++++++++++++
 t/perf/p0008-abbrev.sh | 12 +++++++++
 5 files changed, 87 insertions(+), 15 deletions(-)
 create mode 100644 t/helper/test-abbrev.c
 create mode 100755 t/perf/p0008-abbrev.sh

-- 
2.14.1.538.g56ec8fc98.dirty


Thread overview: 8+ messages
2017-09-15 16:57 Derrick Stolee [this message]
2017-09-15 16:57 ` [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev() Derrick Stolee
2017-09-18  0:51   ` Junio C Hamano
2017-09-18 11:36     ` Derrick Stolee
2017-09-19  0:51       ` Junio C Hamano
2017-09-15 16:57 ` [PATCH 2/3] sha1_name: Unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-09-15 16:57 ` [PATCH 3/3] sha1_name: Parse less while finding common prefix Derrick Stolee
2017-09-15 17:08 ` [PATCH 0/3] Improve abbreviation disambiguation Jonathan Nieder
