git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Derrick Stolee <dstolee@microsoft.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, stolee@gmail.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, sbeller@google.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v4 0/4] Improve abbreviation disambiguation
Date: Sun,  8 Oct 2017 14:49:38 -0400
Message-ID: <20171008184942.69444-1-dstolee@microsoft.com> (raw)

Changes since previous version:

* Fixed an overflow error in the binary search. I sent a separate patch
  to fix this error in existing searches; that patch should be applied
  before this one.

* Removed test-list-objects and test-abbrev in favor of a new git log
  test in p4211-line-log.sh. Limited perf numbers for Linux repo are
  given in cover letter and commit 4/4.

* Silently skip packfiles that fail to open with open_pack_index()

Thanks for all the comments from Jeff, Junio, Ramsey, and Stefan!

Thanks,
 Stolee

---

When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time
computing diffs. Below we report performance statistics for perf test
4211.6 from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | Base Time | New Time | Rel%  |
|-------|--------|-----------|----------|-------|
|  1    |      0 |   41.27 s |  38.93 s | -4.8% |
| 24    |      0 |   98.04 s |  91.35 s | -5.7% |
| 23    | 323952 |  117.78 s | 112.18 s | -4.8% |

Derrick Stolee (4):
  p4211-line-log.sh: add log --online --raw --parents perf test
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix
  sha1_name: Minimize OID comparisons during disambiguation

 sha1_name.c              | 129 +++++++++++++++++++++++++++++++++++++++++------
 t/perf/p4211-line-log.sh |   4 ++
 2 files changed, 118 insertions(+), 15 deletions(-)

-- 
2.14.1.538.g56ec8fc98.dirty


             reply index

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-08 18:49 Derrick Stolee [this message]
2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171008184942.69444-1-dstolee@microsoft.com \
    --to=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox