git@vger.kernel.org mailing list mirror (one of many)
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"David Turner" <novalis@novalis.org>, "Jeff King" <peff@peff.net>,
	git@vger.kernel.org, "Michael Haggerty" <mhagger@alum.mit.edu>
Subject: [PATCH 00/20] Separate `ref_cache` into a separate module
Date: Mon, 20 Mar 2017 17:33:05 +0100
Message-ID: <cover.1490026594.git.mhagger@alum.mit.edu>

I had a window of opportunity last week to hack intensely on Git, with
the following goals:

* Separate `ref_cache` out of `files_ref_cache`.

* Separate a new `packed_ref_cache` class out of `files_ref_cache`.
  Change the latter to use an instance of the former for all of its
  interactions with the `packed-refs` file.

* Mmap `packed-refs` files rather than reading-and-parsing.

* Use the mmapped version of the `packed-refs` file as the "cache"
  rather than using a separate `ref_cache`.

* (And the main goal): Avoid reading and parsing the *whole
  `packed-refs` file* (as we do now) every time any part of it is
  needed. Instead, use binary search to find the reference and/or
  range of references that we want, and parse the info out of the
  mmapped image on the fly.
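
To make the last goal concrete, here is a minimal sketch of binary-searching a sorted, in-memory image of a `packed-refs`-like file. It is an illustration under simplifying assumptions, not the code from the `wip/mmap-packed-refs` branch: every record is assumed to be exactly `<40 hex digits> SP <refname> LF`, sorted bytewise by refname, with no header line and no peeled (`^...`) lines. The names `find_record()`, `cmp_record()`, `line_start()` and `line_end()` are hypothetical. The buffer could come from `mmap()`, but the sketch only needs a pointer and a length.

```c
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40 /* assumed length of the hex object name */

/* Back up from p to the start of its line; lo is known to be a line start. */
static const char *line_start(const char *lo, const char *p)
{
	while (p > lo && p[-1] != '\n')
		p--;
	return p;
}

/* Return a pointer just past the LF of the line starting at rec. */
static const char *line_end(const char *rec, const char *hi)
{
	const char *nl = memchr(rec, '\n', (size_t)(hi - rec));
	return nl ? nl + 1 : hi;
}

/* Compare the refname stored in the record [rec, eol) with refname. */
static int cmp_record(const char *rec, const char *eol, const char *refname)
{
	const char *name_end = (eol > rec && eol[-1] == '\n') ? eol - 1 : eol;
	const char *name = rec + HEX_LEN + 1;
	size_t reclen, len = strlen(refname), n;
	int c;

	if (name > name_end)
		name = name_end; /* malformed short line: treat name as empty */
	reclen = (size_t)(name_end - name);
	n = reclen < len ? reclen : len;
	c = memcmp(name, refname, n);
	if (c)
		return c;
	return (reclen > len) - (reclen < len);
}

/*
 * Return a pointer to the record for refname, or NULL if it is absent.
 * Invariant: lo and hi always point at a line start (or at the end).
 */
const char *find_record(const char *buf, size_t len, const char *refname)
{
	const char *lo = buf, *hi = buf + len;

	while (lo < hi) {
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = line_start(lo, mid);
		const char *eol = line_end(rec, hi);
		int c = cmp_record(rec, eol, refname);

		if (c < 0)
			lo = eol; /* record sorts before refname */
		else if (c > 0)
			hi = rec; /* record sorts after refname */
		else
			return rec;
	}
	return NULL;
}
```

Because records have variable length, the search bisects byte offsets and then backs up to the enclosing line start, so only the lines it lands on are ever parsed.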

I've completed a draft of an epic 48-patch series implementing all of
the above points; it is available from my GitHub fork [1] as branch
`wip/mmap-packed-refs`. It dramatically improves performance and
reduces memory usage for some tasks in repositories with very many
packed references.

But the later parts of that series aren't completely polished yet, and
such a large patch series would be indigestible anyway, so here I
submit the first part...

This patch series extracts a `ref_cache` module out of
`files_ref_cache`, and goes some way to disentangling those two
modules, which until now were overly intimate with each other:

* Remove `verify_refname_available()` from the refs VTABLE, instead
  implementing it in a generic way that uses only the usual refs API
  to talk to the `ref_store`.

* Split `ref_cache`-related code into a new module,
  `refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
  class, `struct ref_cache`.

* Change the lazy-filling mechanism of `ref_cache` to call back to its
  backing `ref_store` via a callback function rather than calling
  `read_loose_refs()` directly.

* Move the special handling of `refs/bisect/` from `ref_cache` to
  `files_ref_store`.

* Make `cache_ref_iterator_begin()` smarter, and change external users
  to iterate via this interface instead of using
  `do_for_each_entry_in_dir()`.
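
The callback-based lazy filling described in the list above can be sketched roughly as follows. This is a toy model, not the `struct ref_cache` from `refs/ref-cache.h`: the names `toy_cache`, `fill_fn`, `toy_cache_init()`, `toy_cache_add()`, `toy_cache_count()` and `example_fill()` are all hypothetical, and the fixed-size array stands in for the real directory tree. The point is only that the cache knows nothing about how entries are read; it stores an opaque pointer to its backing store and a function to call the first time its contents are needed.

```c
#include <stddef.h>

#define TOY_CACHE_MAX 8 /* arbitrary capacity for the sketch */

struct toy_cache;

/* Callback that populates the cache from its backing store. */
typedef void fill_fn(struct toy_cache *cache, void *backing_store);

struct toy_cache {
	void *backing_store; /* opaque to the cache */
	fill_fn *fill;       /* how to read entries on demand */
	int filled;          /* has fill() been called yet? */
	const char *names[TOY_CACHE_MAX];
	size_t nr;
};

void toy_cache_init(struct toy_cache *cache, void *store, fill_fn *fill)
{
	cache->backing_store = store;
	cache->fill = fill;
	cache->filled = 0;
	cache->nr = 0;
}

/* Called by fill callbacks to add one entry; extra entries are dropped. */
void toy_cache_add(struct toy_cache *cache, const char *name)
{
	if (cache->nr < TOY_CACHE_MAX)
		cache->names[cache->nr++] = name;
}

/* Any accessor triggers the fill callback at most once. */
size_t toy_cache_count(struct toy_cache *cache)
{
	if (!cache->filled) {
		cache->filled = 1;
		cache->fill(cache, cache->backing_store);
	}
	return cache->nr;
}

/*
 * Example backend: the "store" is just an int counting how often the
 * callback ran, and the entries are hard-coded.
 */
void example_fill(struct toy_cache *cache, void *backing_store)
{
	int *calls = backing_store;

	(*calls)++;
	toy_cache_add(cache, "refs/heads/master");
	toy_cache_add(cache, "refs/tags/v1");
}
```

With this shape, a different backend only has to supply a different callback; the cache code itself does not change.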

Even after this patch series, the modules are still too intimate for
my taste, but I think this is a big step forward, and it is enough to
allow the other changes that I've been working on.

These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
They are also available from my GitHub fork [1] as branch
`separate-ref-cache`.

Happily, this patch series actually removes a few more lines than it
adds, mostly thanks to the simpler `verify_refname_available()`
implementation.
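
For readers unfamiliar with what such a check involves: a new refname must not create a directory/file conflict with an existing one, so `refs/heads/foo` and `refs/heads/foo/bar` cannot coexist. The sketch below shows that core rule over a plain array of names. It is a simplified illustration, not the implementation in this series; the real function also handles `extras` and `skip` lists and talks to the `ref_store` rather than an array. The names `is_dir_prefix()` and `find_conflict()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if dir, followed by '/', is a proper prefix of name. */
static int is_dir_prefix(const char *dir, const char *name)
{
	size_t len = strlen(dir);

	/* strncmp() == 0 guarantees name has at least len characters. */
	return !strncmp(dir, name, len) && name[len] == '/';
}

/*
 * Return the first existing refname that conflicts with refname,
 * or NULL if refname is available. An identical name is not treated
 * as a conflict in this sketch.
 */
const char *find_conflict(const char *refname,
			  const char *const *existing, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		/* An existing ref is a "directory" above refname ... */
		if (is_dir_prefix(existing[i], refname))
			return existing[i];
		/* ... or refname would be a "directory" above it. */
		if (is_dir_prefix(refname, existing[i]))
			return existing[i];
	}
	return NULL;
}
```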

Michael

[1] https://github.com/mhagger/git
[2] http://public-inbox.org/git/20170318020337.22767-1-pclouds@gmail.com/

Michael Haggerty (20):
  get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
  refs_read_raw_ref(): new function
  refs_ref_iterator_begin(): new function
  refs_verify_refname_available(): implement once for all backends
  refs_verify_refname_available(): use function in more places
  Rename `add_ref()` to `add_ref_entry()`
  Rename `find_ref()` to `find_ref_entry()`
  Rename `remove_entry()` to `remove_entry_from_dir()`
  refs: split `ref_cache` code into separate files
  ref-cache: introduce a new type, ref_cache
  refs: record the ref_store in ref_cache, not ref_dir
  ref-cache: use a callback function to fill the cache
  refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
  do_for_each_entry_in_dir(): eliminate `offset` argument
  get_loose_ref_dir(): function renamed from get_loose_refs()
  get_loose_ref_cache(): new function
  cache_ref_iterator_begin(): make function smarter
  commit_packed_refs(): use reference iteration
  files_pack_refs(): use reference iteration
  do_for_each_entry_in_dir(): delete function

 Makefile             |    1 +
 refs.c               |  111 ++++-
 refs.h               |    2 +-
 refs/files-backend.c | 1229 +++++++-------------------------------------------
 refs/ref-cache.c     |  523 +++++++++++++++++++++
 refs/ref-cache.h     |  267 +++++++++++
 refs/refs-internal.h |   22 +-
 7 files changed, 1066 insertions(+), 1089 deletions(-)
 create mode 100644 refs/ref-cache.c
 create mode 100644 refs/ref-cache.h

-- 
2.11.0

