git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>
Subject: Re: [PATCH 5.5/22] Add documentation for the index api
Date: Wed, 10 Jul 2013 12:28:24 +0700
Message-ID: <CACsJy8BRw6jqB1XBzDcCr3UXNGG1wRPjwnMrh+EksFf7VsQysg@mail.gmail.com> (raw)
In-Reply-To: <87wqozpk9s.fsf@gmail.com>

On Wed, Jul 10, 2013 at 3:10 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> If you happen to know that certain entries match the given pathspec,
>> you could help the caller avoid match_pathspec'ing again by set a bit
>> in ce_flags.
>
> I currently don't know which entries do match the pathspec from just
> reading the index file, additional calls would be needed.  I don't think
> that would be worth the overhead.

Yeah I now see that you select what to load in v5 with the adjusted
pathspec, not the input pathspec. Originally I thought you match the
input pathspec against every file entry in the index :P Your adjusted
pathspec looks like what common_prefix is for. It's cheaper than
creating adjusted_pathspec from match_pathspec and reduces loading in
major cases, where glob is not used.

Still, creating an adjusted pathspec this way looks iffy. You need to
understand pathspec in order to strip the filename part out to match
the directory match only. An alternative is use
tree_entry_interesting. It goes along well with tree traversal and can
be used to match directories with original pathspec. Once you see it
matches an entry in a directory, you could skip matching the rest of
the files and load the whole directory. read_index_filtered_v5 and
read_entries may need some tweaking though. I'll try it and post a
patch later if I succeed.

>> To know which entry exists in the index and which is
>> new, use another flag. Most reader code won't change if we do it this
>> way, all match_pathspec() remain where they are.
>
> Hrm you mean to know which cache entries are added (or changed) in the
> in-memory index and will have to be written later?  I'm not sure I
> understand correctly what you mean here.

Oh.. The "to know.." sentence was nonsense. We probably don't need to
know. We may track changed entries for partial writing, but let's
leave that out for now.

>>> +`index_change_filter_opts(opts)`::
>>> +       This function again has a slightly different functionality for
>>> +       index-v2 and index-v5.
>>> +
>>> +       For index-v2 it simply changes the filter_opts, so
>>> +       for_each_index_entry uses the changed index_opts, to iterate
>>> +       over a different set of cache entries.
>>> +
>>> +       For index-v5 it refreshes the index if the filter_opts have
>>> +       changed and sets the new filter_opts in the index state, again
>>> +       to iterate over a different set of cache entries as with
>>> +       index-v2.
>>> +
>>> +       This has some optimization potential, in the case that the
>>> +       opts get stricter (less of the index should be read) it
>>> +       doesn't have to reload anything, but currently does.
>>
>> The only use case I see so far is converting a partial index_state
>> back to a full one. Apart from doing so in order to write the new
>> index, I think some operation (like rename tracking in diff or
>> unpack-trees) may expect full index. I think we should support that. I
>> doubt we need to change pathspec to something different than the one
>> we used to load the index. When a user passes a pathspec to a command,
>> the user expects the command to operate on that set only, not outside.
>
> One application was in ls-files, where we strip the trailing slash from
> the pathspecs for submodules.  But when we let the caller filter the
> rest out it's not needed anymore.  We load all entries without the
> trailing slash anyway.

That submodule trailing slash stripping code will be moved away soon
(I've been working on it for some time now). There's similar code in
pathspec.c. I hope by the time this series becomes a candidate for
'next', those pathspec manipulation is already gone. For
strip_trailing_slash_from_submodules, peeking in index file for a few
entries is probably ok. For check_path_for_gitlink, full index is
loaded until we figure out a clever way.

>> Some thoughts about the writing api.
>>
>> In think we should avoid automatically converting partial index into a
>> full one before writing. Push that back to the caller and die() when
>> asked to update partial index. They know at what point the index may
>> be updated and even what part of it may be updated. I think all
>> commands fall into two categories, tree-wide updates (merge,
>> checkout...) and limited by the user-given pathspec. "what part to be
>> updated" is not so hard to determine.
>
> Hrm this is only true if index entries are added or removed, not if they
> are only changed.  If they are only changed we can write a partially
> read index once we have partial writing.

Yep. We can detect if changes are updates only, no additions nor
removals. If so do partial write, else full write. These little
details are hidden from the user, as long as they keep their promise
about read/write regions.

> For now it would make sense to just die() though, until we have that in place.

Agreed.
--
Duy

  reply	other threads:[~2013-07-10  5:29 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-07  8:11 [PATCH 00/22] Index v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 01/22] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-07  8:11 ` [PATCH 02/22] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-07  8:11 ` [PATCH 03/22] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-07  8:11 ` [PATCH 04/22] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-07  8:11 ` [PATCH 05/22] read-cache: add index reading api Thomas Gummerer
2013-07-08  2:01   ` Duy Nguyen
2013-07-08 11:40     ` Thomas Gummerer
2013-07-08  2:19   ` Duy Nguyen
2013-07-08 11:20     ` Thomas Gummerer
2013-07-08 12:45       ` Duy Nguyen
2013-07-08 13:37         ` Thomas Gummerer
2013-07-08 20:54         ` [PATCH 5.5/22] Add documentation for the index api Thomas Gummerer
2013-07-09 15:42           ` Duy Nguyen
2013-07-09 20:10             ` Thomas Gummerer
2013-07-10  5:28               ` Duy Nguyen [this message]
2013-07-11 11:30                 ` Thomas Gummerer
2013-07-11 11:42                   ` Duy Nguyen
2013-07-11 12:27                     ` Duy Nguyen
2013-07-08 16:36   ` [PATCH 05/22] read-cache: add index reading api Junio C Hamano
2013-07-08 20:10     ` Thomas Gummerer
2013-07-08 23:09       ` Junio C Hamano
2013-07-09 20:13         ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 06/22] make sure partially read index is not changed Thomas Gummerer
2013-07-08 16:31   ` Junio C Hamano
2013-07-08 18:33     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 07/22] dir.c: use index api Thomas Gummerer
2013-07-07  8:11 ` [PATCH 08/22] tree.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 09/22] name-hash.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 10/22] grep.c: Use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 11/22] ls-files.c: use the " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 12/22] read-cache: make read_blob_data_from_index use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 13/22] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-11 10:39   ` Duy Nguyen
2013-07-11 11:39     ` Thomas Gummerer
2013-07-11 11:47       ` Duy Nguyen
2013-07-11 12:26         ` Thomas Gummerer
2013-07-11 12:50           ` Duy Nguyen
2013-07-07  8:11 ` [PATCH 14/22] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-07  8:11 ` [PATCH 15/22] read-cache: read index-v5 Thomas Gummerer
2013-07-07 20:18   ` Eric Sunshine
2013-07-08 11:40     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 16/22] read-cache: read resolve-undo data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 17/22] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-07 20:41   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 18/22] read-cache: write index-v5 Thomas Gummerer
2013-07-07 20:43   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 19/22] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 20/22] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 21/22] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-07  8:12 ` [PATCH 22/22] p0003-index.sh: add perf test for the index formats Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8BRw6jqB1XBzDcCr3UXNGG1wRPjwnMrh+EksFf7VsQysg@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=robin.rosenberg@dewire.com \
    --cc=t.gummerer@gmail.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git