git@vger.kernel.org mailing list mirror (one of many)
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>,
	Johannes Sixt <j6t@kdbg.org>,
	Git Mailing List <git@vger.kernel.org>,
	David Turner <dturner@twopensource.com>
Subject: Re: git gc and worktrees
Date: Thu, 2 Jun 2016 06:08:06 +0200
Message-ID: <574FB126.4090805@alum.mit.edu>
In-Reply-To: <xmqqmvn4y9zq.fsf@gitster.mtv.corp.google.com>

On 06/01/2016 09:39 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I argue that the fundamental concept in terms of the implementation
>> should be the individual physical reference stores, and these should be
>> compounded together to form the logical reference collections and the
>> sets of reachability roots that are interesting at the UI level.
> 
> That is very good in principle.  How does that principle translate
> to the current setup (with possible enhancement with pluggable ref
> backends) and multiple worktrees?  Let me try thinking it through
> aloud.
> 
>  * Without pluggable ref backend or worktrees, we start from two
>    "physical reference stores"; the packed-refs file lists refs that
>    can be covered (overridden) by loose refs in .git/refs/.
>    That symbolic refs are always loose falls out naturally from the
>    fact that the packed-refs file does not record symrefs.
> 
>  * Throw in multiple worktrees to the mix.  How?  Do we consider
>    selected refs/ hierarchies (like refs/bisect/*) as separate
>    physical store (even though it might be backed by the files in
>    the same .git/refs/ filesystem hierarchy) and represent the
>    "logical" view as an overlay across the traditional two types of
>    physical reference stores?  That is:
> 
>    - loose refs in .git/HEAD, .git/refs/{bisect,...} for the
>      per-worktree part form one physical store.  If a ref is found
>      here, that is what we use as part of the logical view.
> 
>    - loose refs in .git/refs/{heads,tags,notes,...} for the common
>      part form another physical store.  A ref that is not found
>      above but is found here becomes part of the logical view.
> 
>    - packed refs in .git/packed-refs are another physical store.  A
>      ref that is found in neither of the above but is found here
>      becomes part of the logical view.
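The overlay described in the quoted list amounts to a priority-ordered lookup across physical stores. Below is a minimal C sketch, assuming each physical store can be modeled as a plain in-memory array of name/value pairs; struct phys_store, phys_lookup() and overlay_lookup() are hypothetical names, not git's refs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one physical store: an array of name/value pairs. */
struct ref_entry {
	const char *name;
	const char *value;	/* object name, or "ref: <target>" for a symref */
};

struct phys_store {
	const struct ref_entry *entries;
	size_t nr;
};

/* Return the value of 'refname' in a single store, or NULL if absent. */
static const char *phys_lookup(const struct phys_store *s, const char *refname)
{
	assert(s && refname);
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->entries[i].name, refname))
			return s->entries[i].value;
	return NULL;
}

/*
 * Overlay lookup: consult the stores in priority order (per-worktree
 * loose, common loose, packed) and return the first hit, so a higher
 * store covers a ref of the same name in any lower one.
 */
static const char *overlay_lookup(const struct phys_store *const *stores,
				  size_t nr_stores, const char *refname)
{
	for (size_t i = 0; i < nr_stores; i++) {
		const char *value = phys_lookup(stores[i], refname);
		if (value)
			return value;
	}
	return NULL;
}
```

For example, with per-worktree loose = {HEAD}, common loose = {refs/heads/main -> 1111} and packed = {refs/heads/main -> 0000, refs/tags/v1 -> 2222}, looking up refs/heads/main yields 1111, because the loose copy covers the packed one.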

I think I would represent the logical store of a worktree repo as
follows. First, I would implement a cached_ref_store that introduces a
layer of caching around another ref_store. Then

    def get_files_ref_store(dir) {
        loose = create_cached_ref_store(get_loose_ref_store(dir))
        packed = create_cached_ref_store(get_packed_ref_store(dir))
        return create_files_ref_store(loose, packed)
    }

    common_ref_store = get_files_ref_store(common_dir)

    /*
     * I think we only allow loose refs in worktrees; otherwise
     * this could be an overlay_ref_store too. Actually, we might
     * want to omit the caching here.
     */
    local_ref_store = create_cached_ref_store(
            get_loose_ref_store(git_dir))

    logical_ref_store = create_worktree_ref_store(
        local_ref_store, common_ref_store)

Where worktree_ref_store does something like

    if (is_per_worktree_ref(refname))
        lookup in local_ref_store
    else
        lookup in common_ref_store

for reading, and uses a merge_ref_iterator with a select function that
does something similar for iterating.
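The read-side dispatch and the merge-with-select iteration could be sketched roughly as follows. This is a simplified C model, not git's iterator API: iterators are replaced by sorted arrays of refnames, and is_per_worktree_ref() here treats only HEAD and refs under refs/bisect/ as per-worktree (an assumption for illustration).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum which_store { LOCAL_STORE, COMMON_STORE };

/* Simplified, hypothetical rule: HEAD and refs under refs/bisect/ are per-worktree. */
static int is_per_worktree_ref(const char *refname)
{
	return !strcmp(refname, "HEAD") ||
	       !strncmp(refname, "refs/bisect/", strlen("refs/bisect/"));
}

/* Reading: every refname is owned by exactly one component store. */
static enum which_store route_ref(const char *refname)
{
	return is_per_worktree_ref(refname) ? LOCAL_STORE : COMMON_STORE;
}

/* Select function: keep a name only if it came from the store that owns it. */
static int select_ref(const char *refname, enum which_store from)
{
	return route_ref(refname) == from;
}

/*
 * Iterating: merge two sorted refname lists, consulting select_ref() for
 * each candidate. Writes at most nl + nc names to 'out'; returns the count.
 */
static size_t merge_refs(const char *const *local, size_t nl,
			 const char *const *common, size_t nc,
			 const char **out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);
	while (i < nl || j < nc) {
		int cmp;

		if (i == nl)
			cmp = 1;	/* only common names remain */
		else if (j == nc)
			cmp = -1;	/* only local names remain */
		else
			cmp = strcmp(local[i], common[j]);

		if (cmp < 0) {
			if (select_ref(local[i], LOCAL_STORE))
				out[n++] = local[i];
			i++;
		} else if (cmp > 0) {
			if (select_ref(common[j], COMMON_STORE))
				out[n++] = common[j];
			j++;
		} else {
			/* Same name in both stores: emit it once, from its owner. */
			out[n++] = route_ref(local[i]) == LOCAL_STORE ? local[i] : common[j];
			i++;
			j++;
		}
	}
	return n;
}
```

Given local refs {HEAD, refs/bisect/bad, refs/heads/stray} and common refs {HEAD, refs/heads/main, refs/tags/v1}, the merged listing is HEAD, refs/bisect/bad, refs/heads/main, refs/tags/v1; the stray branch in the worktree store is skipped because branches belong to the common store.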

The files_ref_store would do lookups by looking first in the
loose_ref_store then in the packed_ref_store, would use an
overlay_ref_iterator for iteration, and would know to do all writes in
the loose_ref_store (except for deletes, which also have to go to
packed_ref_store). It would have a special "pack-refs" operation,
specific to files_ref_store, that shuffles references between its two
backends.
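A toy version of that files_ref_store composition, assuming each backend is a small in-memory table (all names here are hypothetical). It illustrates the rules in the paragraph above: reads check loose before packed, writes go to loose, deletes must reach both so that a stale packed value cannot reappear, and a pack-refs step moves refs from one backend to the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 16

/* Hypothetical in-memory stand-in for one physical store. */
struct simple_store {
	const char *name[MAX_REFS];
	const char *value[MAX_REFS];
	size_t nr;
};

static int store_find(const struct simple_store *s, const char *refname)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->name[i], refname))
			return (int)i;
	return -1;
}

static void store_set(struct simple_store *s, const char *refname,
		      const char *value)
{
	int i = store_find(s, refname);

	if (i >= 0) {
		s->value[i] = value;
		return;
	}
	assert(s->nr < MAX_REFS);
	s->name[s->nr] = refname;
	s->value[s->nr] = value;
	s->nr++;
}

static void store_delete(struct simple_store *s, const char *refname)
{
	int i = store_find(s, refname);

	if (i < 0)
		return;
	s->nr--;			/* fill the hole with the last entry */
	s->name[i] = s->name[s->nr];
	s->value[i] = s->value[s->nr];
}

struct files_store {
	struct simple_store loose;
	struct simple_store packed;
};

/* Reads: a loose ref covers a packed ref of the same name. */
static const char *files_read(const struct files_store *fs, const char *refname)
{
	int i = store_find(&fs->loose, refname);

	if (i >= 0)
		return fs->loose.value[i];
	i = store_find(&fs->packed, refname);
	return i >= 0 ? fs->packed.value[i] : NULL;
}

/* Writes always go to the loose store. */
static void files_write(struct files_store *fs, const char *refname,
			const char *value)
{
	store_set(&fs->loose, refname, value);
}

/* Deletes must reach both stores, or the packed value would reappear. */
static void files_delete(struct files_store *fs, const char *refname)
{
	store_delete(&fs->loose, refname);
	store_delete(&fs->packed, refname);
}

/* "pack-refs": move every loose ref into the packed store. */
static void files_pack_refs(struct files_store *fs)
{
	for (size_t i = 0; i < fs->loose.nr; i++)
		store_set(&fs->packed, fs->loose.name[i], fs->loose.value[i]);
	fs->loose.nr = 0;
}
```

files_pack_refs() moves everything unconditionally to keep the sketch short; real `git pack-refs` is more selective (for example, symbolic refs are never packed).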

Writing to a worktree_ref_store is a bit trickier, because we want to
allow ref_transactions to span worktree and common refs (though we
probably need to give up atomicity for any such transaction). The
worktree_ref_transaction_commit() method has to split the main
transaction into two sub-transactions, one for each of its component
ref_stores. I planned for this when designing split_under_lock and think
it is possible, though I admit I haven't implemented it yet.
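Splitting a transaction per component store could look roughly like this. The types are a hypothetical, much-reduced stand-in for a ref transaction (no locking, no old-value checks), and the per-worktree rule (HEAD and refs under refs/bisect/) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UPDATES 8

/* Hypothetical, heavily simplified transaction: just a list of updates. */
struct ref_update {
	const char *refname;
	const char *new_value;
};

struct transaction {
	struct ref_update updates[MAX_UPDATES];
	size_t nr;
};

static void transaction_add(struct transaction *t, const char *refname,
			    const char *new_value)
{
	assert(t->nr < MAX_UPDATES);
	t->updates[t->nr].refname = refname;
	t->updates[t->nr].new_value = new_value;
	t->nr++;
}

/* Simplified, hypothetical rule: HEAD and refs under refs/bisect/ are per-worktree. */
static int is_per_worktree_ref(const char *refname)
{
	return !strcmp(refname, "HEAD") ||
	       !strncmp(refname, "refs/bisect/", strlen("refs/bisect/"));
}

/*
 * Route each update of 'main_tx' into the sub-transaction for the store
 * that owns the ref. Each sub-transaction would then be committed by its
 * own backend, one after the other.
 */
static void split_transaction(const struct transaction *main_tx,
			      struct transaction *local_tx,
			      struct transaction *common_tx)
{
	for (size_t i = 0; i < main_tx->nr; i++) {
		const struct ref_update *u = &main_tx->updates[i];

		transaction_add(is_per_worktree_ref(u->refname) ? local_tx : common_tx,
				u->refname, u->new_value);
	}
}
```

Because the two sub-transactions are committed separately, a failure in the second leaves the first applied, which is the loss of atomicity mentioned above. A real split would also have to cope with symbolic refs: updating a worktree's HEAD usually means updating the branch it points to, which lives in the common store.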

One nice thing about this design is that you can skip the
worktree_ref_store layer and its overhead entirely for repositories that
are not linked. The decision can be made once, at instantiation time,
rather than every time a reference is looked up. See the pseudocode below.

> Up to this point, I am all for your "separate physical stores are
> composited to give a logical view".  I can see how multi-worktree
> world view fits within that framework.
> 
>  * With pluggable ref backend, we may gain yet another "physical
>    reference store" possibility, e.g. one backed by lmdb.  If it
>    supports symrefs, a repository may use an lmdb-backed reference
>    store without the traditional two.
> 
>    But it is unclear how it would interact with the multi-worktree
>    world order.

Since you could plug-and-play different ref_stores in the above scheme,
I don't see any problem here.

    def get_logical_ref_store() {
        if (is_linked_repo) {
            local_ref_store = get_local_ref_store(git_dir)
            common_ref_store = get_ref_store(common_dir)
            return create_worktree_ref_store(local_ref_store,
                                             common_ref_store)
        } else {
            return get_ref_store(git_dir)
        }
    }

get_ref_store() would read the git config to decide which ref store to
use for the specified repository; that store might itself be an
lmdb_ref_store or an overlay_ref_store(loose_ref_store, packed_ref_store).
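To make the "decide once, at instantiation time" point concrete, here is a C sketch using a vtable-style struct ref_store, mirroring the get_logical_ref_store() pseudocode. The function-pointer layout, the single-ref toy backend, and the per-worktree rule are assumptions for illustration; config parsing is omitted, so the caller passes in already-constructed stores. A files, lmdb, or any other backend would plug in by filling in the same read_ref pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vtable-style interface shared by all backends. */
struct ref_store {
	const char *(*read_ref)(struct ref_store *store, const char *refname);
};

/* Toy backend holding a single ref; stands in for files, lmdb, etc. */
struct toy_store {
	struct ref_store base;	/* first member, so the casts below are valid */
	const char *refname;
	const char *value;
};

static const char *toy_read(struct ref_store *store, const char *refname)
{
	struct toy_store *t = (struct toy_store *)store;

	return strcmp(t->refname, refname) ? NULL : t->value;
}

/* Dispatch layer, instantiated only for linked worktrees. */
struct worktree_store {
	struct ref_store base;
	struct ref_store *local;
	struct ref_store *common;
};

/* Simplified, hypothetical rule: HEAD and refs under refs/bisect/ are per-worktree. */
static int is_per_worktree_ref(const char *refname)
{
	return !strcmp(refname, "HEAD") ||
	       !strncmp(refname, "refs/bisect/", strlen("refs/bisect/"));
}

static const char *worktree_read(struct ref_store *store, const char *refname)
{
	struct worktree_store *w = (struct worktree_store *)store;
	struct ref_store *target = is_per_worktree_ref(refname) ? w->local : w->common;

	return target->read_ref(target, refname);
}

/*
 * Decide once whether the dispatch layer is needed. 'git_dir_store' is the
 * full store for a plain repository, or only the per-worktree refs for a
 * linked one. Returns NULL on allocation failure.
 */
static struct ref_store *get_logical_ref_store(struct ref_store *git_dir_store,
					       struct ref_store *common_store,
					       int is_linked)
{
	struct worktree_store *w;

	if (!is_linked)
		return git_dir_store;	/* no per-lookup dispatch at all */

	w = malloc(sizeof(*w));
	if (!w)
		return NULL;
	w->base.read_ref = worktree_read;
	w->local = git_dir_store;
	w->common = common_store;
	return &w->base;
}
```

Callers only ever hold a struct ref_store *, so the is_linked test runs once when the store is created rather than on every lookup.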

Michael
