git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Gummerer <t.gummerer@gmail.com>
To: Brandon Williams <bmwill@google.com>
Cc: git@vger.kernel.org, "Christian Couder" <chriscool@tuxfamily.org>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 2/3] prune: fix pruning with multiple worktrees and split index
Date: Mon, 11 Dec 2017 21:39:48 +0000	[thread overview]
Message-ID: <20171211213948.GC25616@hank> (raw)
In-Reply-To: <20171211190946.GB177995@google.com>

On 12/11, Brandon Williams wrote:
> On 12/10, Thomas Gummerer wrote:
> > be489d02d2 ("revision.c: --indexed-objects add objects from all
> > worktrees", 2017-08-23) made sure that pruning takes objects from all
> > worktrees into account.
> > 
> > It did that by reading the index of every worktree and adding the
> > necessary index objects to the set of pending objects.  The index is
> > read by read_index_from.  As mentioned in the previous commit,
> > read_index_from depends on the CWD for the location of the split index,
> > and add_index_objects_to_pending doesn't set that before using
> > read_index_from.
> > 
> > Instead of using read_index_from, use repo_read_index, which is aware of
> > the proper paths for the worktree.
> > 
> > This fixes t5304-prune when ran with GIT_TEST_SPLIT_INDEX set.
> > 
> 
> I'm on the fence about this change.  I understand that this will ensure
> that the proper objects aren't pruned when using a split index in the
> presence of worktrees but I think the solution needs to be thought
> through a bit more.
> 
> My big concern right now is the interaction of 'struct worktree's and
> 'struct repository'.  I'll try to highlight my concerns below.

Thanks for the review!  My thoughts on the concerns are below.

> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> > 
> > This also fixes t7009 when ran with GIT_TEST_SPLIT_INDEX.  I'm not
> > quite sure why it is fixed by this.  Either way I tracked the failure
> > down to f767178a5a ("Merge branch 'jk/no-null-sha1-in-cache-tree'",
> > 2017-05-16).  Maybe Peff has an idea why this fixes that test?
> > 
> >  repository.c | 11 +++++++++++
> >  repository.h |  2 ++
> >  revision.c   | 13 ++++++++-----
> >  3 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/repository.c b/repository.c
> > index 928b1f553d..3c9bfbd1b8 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -2,6 +2,7 @@
> >  #include "repository.h"
> >  #include "config.h"
> >  #include "submodule-config.h"
> > +#include "worktree.h"
> >  
> >  /* The main repository */
> >  static struct repository the_repo = {
> > @@ -146,6 +147,16 @@ int repo_init(struct repository *repo, const char *gitdir, const char *worktree)
> >  	return -1;
> >  }
> >  
> > +/*
> > + * Initialize 'repo' based on the provided worktree
> > + * Return 0 upon success and a non-zero value upon failure.
> > + */
> > +int repo_worktree_init(struct repository *repo, struct worktree *worktree)
> > +{
> > +	return repo_init(repo, get_worktree_git_dir(worktree),
> > +			 worktree->path);
> > +}
> 
> My first concern is the use of 'get_worktree_git_dir()'.  Under the hood
> it calls 'get_git_dir()', 'get_git_common_dir()', and
> 'git_common_path()' which rely on global state as stored in
> 'the_repository'.  So how does one initialize a repository struct (using
> this initializer) using a worktree from a repository other than the
> global 'the_repository' struct?  I'm not sure I have an answer right
> now, but its an issue that needs to be thought through before we head
> down this road.

The main reason for just using 'get_worktree_git_dir()' was because it
exists and it does exactly what I needed.  If we ever have the need to
initialize a repository struct for a worktree from another repository
struct, I would imagine we introduce a new function
'repo_get_worktree_git_dir()', which takes a struct repository as
input and returns the worktree git dir based on that.

And then we'd have to add a different initializer
'repo_worktree_init_from_repo()' or something like that, where we
could initialize a worktree from a repository struct different from
'the_repository'.  But maybe there are some complications there that I
didn't think about?

> Just thinking to myself, Does it make sense to have worktree's as a
> separate struct or to have them stored in 'struct repository' in some
> way?

I'm not sure I have a good answer for this.  Looking at the libgit2
source, which has a repository struct for (I assume at least, I'm not
very familiar with the libgit2 source :)), they have a repository
struct for each worktree, with a flag specifying whether the struct is
for a worktree, or a "normal" git repository/the main worktree.

Obviously that doesn't mean we should do it the same way, but it
probably also doesn't hurt to look at prior art.

> Shouldn't a repository struct have a way to interact with all of
> its worktrees?

It could go either way I guess depending on how we want to design it.
If we want the repository struct to interact with all worktrees, it
seems to me like we would almost need some kind of 'struct
super_repository', which then contains a list of 'struct repository's
representing each worktree, probably including the main worktree.

But then again in normal operation we probably only want the
repository struct for the current repository we're working with, not
all of the worktrees.  I suspect it's only in some special cases where
we want to have all worktrees in memory at once (such as in the prune
case).

Or how would you haven envisioned this?

> How would initializing a repository struct for every
> worktree work once we migrate the object store to be stored in 'struct
> repoisotry'?  Shouldn't every worktree share the same object store
> in-memory like they do on-disk?

I haven't thought this far.  They probably want to share the same
object store, if they all want the object store to be loaded.  But
we can do that either way, whether we have one 'struct repository'
that has links to the worktrees, or we have multiple 'struct
repository's, which each have a ref-counted pointer to the object
store or something like that.

> > +
> >  /*
> >   * Initialize 'submodule' as the submodule given by 'path' in parent repository
> >   * 'superproject'.
> > diff --git a/repository.h b/repository.h
> > index 7f5e24a0a2..2adeb05bf4 100644
> > --- a/repository.h
> > +++ b/repository.h
> > @@ -4,6 +4,7 @@
> >  struct config_set;
> >  struct index_state;
> >  struct submodule_cache;
> > +struct worktree;
> >  
> >  struct repository {
> >  	/* Environment */
> > @@ -87,6 +88,7 @@ extern struct repository *the_repository;
> >  extern void repo_set_gitdir(struct repository *repo, const char *path);
> >  extern void repo_set_worktree(struct repository *repo, const char *path);
> >  extern int repo_init(struct repository *repo, const char *gitdir, const char *worktree);
> > +extern int repo_worktree_init(struct repository *repo, struct worktree *worktree);
> >  extern int repo_submodule_init(struct repository *submodule,
> >  			       struct repository *superproject,
> >  			       const char *path);
> > diff --git a/revision.c b/revision.c
> > index e2e691dd5a..9d8d9b96d1 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -22,6 +22,7 @@
> >  #include "packfile.h"
> >  #include "worktree.h"
> >  #include "argv-array.h"
> > +#include "repository.h"
> >  
> >  volatile show_early_output_fn_t show_early_output;
> >  
> > @@ -1346,15 +1347,17 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned int flags)
> >  	worktrees = get_worktrees(0);
> >  	for (p = worktrees; *p; p++) {
> >  		struct worktree *wt = *p;
> > -		struct index_state istate = { NULL };
> > +		struct repository *repo;
> >  
> > +		repo = xmalloc(sizeof(struct repository));
> 
> This was allocated but never freed, was that intentional?

No, that was an oversight, will fix, thanks!

> >  		if (wt->is_current)
> >  			continue; /* current index already taken care of */
> > +		if (repo_worktree_init(repo, wt))
> > +			BUG("couldn't initialize repository object from worktree");
> >  
> > -		if (read_index_from(&istate,
> > -				    worktree_git_path(wt, "index")) > 0)
> > -			do_add_index_objects_to_pending(revs, &istate);
> > -		discard_index(&istate);
> > +		if (repo_read_index(repo) > 0)
> > +			do_add_index_objects_to_pending(revs, repo->index);
> > +		discard_index(repo->index);
> 
> One we have separate object stores per-repository how would we handle
> this since this pruning should only work on a single repository's object
> store?

That depends a lot on how we design the object store and how it's held
by the repository struct.  If we only have one in-memory
representation of it, the answer is obvious.  If we have multiple
in-memory representations of it at the same time, things will get a
bit more hairy.  But until we have at least a design of how that would
look like it's quite hard to tell I'd say.

> >  	}
> >  	free_worktrees(worktrees);
> >  }
> > -- 
> > 2.15.1.504.g5279b80103
> > 
> 
> -- 
> Brandon Williams

  reply	other threads:[~2017-12-11 21:38 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-10 21:21 [PATCH 0/3] fixes for split index mode Thomas Gummerer
2017-12-10 21:22 ` [PATCH 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
2017-12-11 18:54   ` Brandon Williams
2017-12-11 20:37     ` Thomas Gummerer
2017-12-10 21:22 ` [PATCH 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
2017-12-11 19:09   ` Brandon Williams
2017-12-11 21:39     ` Thomas Gummerer [this message]
2017-12-10 21:22 ` [PATCH 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2017-12-10 23:37   ` Eric Sunshine
2017-12-11 21:09   ` SZEDER Gábor
2017-12-11 21:42     ` Thomas Gummerer
2017-12-12 15:54       ` Lars Schneider
2017-12-12 19:15         ` Junio C Hamano
2017-12-12 20:15           ` Thomas Gummerer
2017-12-12 20:51             ` Junio C Hamano
2017-12-13 23:21               ` Thomas Gummerer
2017-12-13 17:21           ` Lars Schneider
2017-12-13 17:38             ` Junio C Hamano
2017-12-13 17:46               ` Lars Schneider
2017-12-13 23:28                 ` Thomas Gummerer
2017-12-17 22:51 ` [PATCH v2 0/3] fixes for split index mode Thomas Gummerer
2017-12-17 22:51   ` [PATCH v2 1/3] repository: fix repo_read_index with submodules Thomas Gummerer
2017-12-18 18:01     ` Brandon Williams
2017-12-18 23:05       ` Thomas Gummerer
2017-12-18 23:05         ` Brandon Williams
2017-12-17 22:51   ` [PATCH v2 2/3] prune: fix pruning with multiple worktrees and split index Thomas Gummerer
2017-12-18 18:19     ` Brandon Williams
2018-01-03 22:18       ` Thomas Gummerer
2018-01-04 19:12         ` Junio C Hamano
2017-12-17 22:51   ` [PATCH v2 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2017-12-18 18:16     ` Lars Schneider
2018-01-04 20:13       ` Thomas Gummerer
2018-01-05 11:03         ` Lars Schneider
2018-01-07 20:02         ` Thomas Gummerer
2018-01-07 22:30   ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
2018-01-07 22:30     ` [PATCH v3 1/3] read-cache: fix reading the shared index for other repos Thomas Gummerer
2018-01-08 10:41       ` Duy Nguyen
2018-01-08 22:41         ` Thomas Gummerer
2018-01-13 22:33           ` Thomas Gummerer
2018-01-08 23:38         ` Brandon Williams
2018-01-09  1:24           ` Duy Nguyen
2018-01-16 21:42       ` Brandon Williams
2018-01-17  0:16         ` Duy Nguyen
2018-01-17  0:32           ` Brandon Williams
2018-01-17 18:16           ` Jonathan Nieder
2018-01-18 10:19             ` Duy Nguyen
2018-01-19 21:57       ` Junio C Hamano
2018-01-20 11:58         ` Thomas Gummerer
2018-01-22  6:14           ` Junio C Hamano
2018-01-27 12:18             ` Thomas Gummerer
2018-02-07 22:41             ` Junio C Hamano
2018-01-07 22:30     ` [PATCH v3 2/3] split-index: don't write cache tree with null oid entries Thomas Gummerer
2018-01-07 22:30     ` [PATCH v3 3/3] travis: run tests with GIT_TEST_SPLIT_INDEX Thomas Gummerer
2018-01-13 22:37     ` [PATCH v3 4/3] read-cache: don't try to write index if we can't write shared index Thomas Gummerer
2018-01-14  9:36       ` Duy Nguyen
2018-01-14 10:18         ` [PATCH 1/3] read-cache.c: change type of "temp" in write_shared_index() Nguyễn Thái Ngọc Duy
2018-01-14 10:18           ` [PATCH 2/3] read-cache.c: move tempfile creation/cleanup out of write_shared_index Nguyễn Thái Ngọc Duy
2018-01-14 10:18           ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index Nguyễn Thái Ngọc Duy
2018-01-18 11:36             ` SZEDER Gábor
2018-01-18 12:47               ` Duy Nguyen
2018-01-18 13:29                 ` Jeff King
2018-01-18 13:36                   ` Duy Nguyen
2018-01-18 15:00                     ` Duy Nguyen
2018-01-18 21:37                       ` Jeff King
2018-01-18 22:32                         ` SZEDER Gábor
2018-01-19  0:30                           ` Duy Nguyen
2018-01-22 13:32                           ` [PATCH 0/5] Travis CI: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 1/5] travis-ci: use 'set -x' for the commands under 'su' " SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 2/5] travis-ci: use 'set -e' in the 32 bit Linux build job SZEDER Gábor
2018-01-23 16:26                               ` Jeff King
2018-01-23 16:32                                 ` Jeff King
2018-01-24 12:12                                 ` SZEDER Gábor
2018-01-24 15:49                                   ` Jeff King
2018-01-22 13:32                             ` [PATCH 3/5] travis-ci: don't repeat the path of the cache directory SZEDER Gábor
2018-01-23 16:30                               ` Jeff King
2018-01-24 13:14                                 ` SZEDER Gábor
2018-01-22 13:32                             ` [PATCH 4/5] travis-ci: don't run the test suite as root in the 32 bit Linux build SZEDER Gábor
2018-01-23 16:43                               ` Jeff King
2018-01-24 13:45                                 ` SZEDER Gábor
2018-01-24 15:56                                   ` Jeff King
2018-01-24 18:01                                     ` Jeff King
2018-01-24 19:51                                       ` Jeff King
2018-01-22 13:32                             ` [PATCH 5/5] travis-ci: don't fail if user already exists on 32 bit Linux build job SZEDER Gábor
2018-01-23 16:46                               ` Jeff King
2018-01-24  0:32                                 ` Duy Nguyen
2018-01-24 19:39                                 ` SZEDER Gábor
2018-01-22 18:27                 ` [PATCH 3/3] read-cache: don't write index twice if we can't write shared index SZEDER Gábor
2018-01-22 19:46                   ` Eric Sunshine
2018-01-22 22:10                     ` SZEDER Gábor
2018-01-24  9:11                   ` Duy Nguyen
2018-01-26 22:44                   ` Lars Schneider
2018-01-14 14:29         ` [PATCH v3 4/3] read-cache: don't try to write index " Thomas Gummerer
2018-01-18 21:53     ` [PATCH v3 0/3] fixes for split index mode Thomas Gummerer
2018-01-19 18:34       ` Junio C Hamano
2018-01-19 21:11         ` Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171211213948.GC25616@hank \
    --to=t.gummerer@gmail.com \
    --cc=bmwill@google.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).