git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>
Subject: Re: [PATCH 5.5/22] Add documentation for the index api
Date: Thu, 11 Jul 2013 13:30:10 +0200
Message-ID: <87ppupcp19.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8BRw6jqB1XBzDcCr3UXNGG1wRPjwnMrh+EksFf7VsQysg@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Jul 10, 2013 at 3:10 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>>> If you happen to know that certain entries match the given pathspec,
>>> you could help the caller avoid match_pathspec'ing again by set a bit
>>> in ce_flags.
>>
>> I currently don't know which entries do match the pathspec from just
>> reading the index file, additional calls would be needed.  I don't think
>> that would be worth the overhead.
>
> Yeah I now see that you select what to load in v5 with the adjusted
> pathspec, not the input pathspec. Originally I thought you match the
> input pathspec against every file entry in the index :P Your adjusted
> pathspec looks like what common_prefix is for. It's cheaper than
> creating adjusted_pathspec from match_pathspec and reduces loading in
> major cases, where glob is not used.
>
> Still, creating an adjusted pathspec this way looks iffy. You need to
> understand pathspec in order to strip the filename part out to match
> the directory match only. An alternative is use
> tree_entry_interesting. It goes along well with tree traversal and can
> be used to match directories with original pathspec. Once you see it
> matches an entry in a directory, you could skip matching the rest of
> the files and load the whole directory. read_index_filtered_v5 and
> read_entries may need some tweaking though. I'll try it and post a
> patch later if I succeed.

Hrm, I played around a bit with this idea, but I couldn't figure out how
to make it work.  For it to work we would still have to load some
entries in a directory at least?  Or is there a way to match the
directories, which I just haven't figured out yet?

>>> To know which entry exists in the index and which is
>>> new, use another flag. Most reader code won't change if we do it this
>>> way, all match_pathspec() remain where they are.
>>
>> Hrm you mean to know which cache entries are added (or changed) in the
>> in-memory index and will have to be written later?  I'm not sure I
>> understand correctly what you mean here.
>
> Oh.. The "to know.." sentence was nonsense. We probably don't need to
> know. We may track changed entries for partial writing, but let's
> leave that out for now.

Ok, makes sense.

>>>> +`index_change_filter_opts(opts)`::
>>>> +       This function again has a slightly different functionality for
>>>> +       index-v2 and index-v5.
>>>> +
>>>> +       For index-v2 it simply changes the filter_opts, so
>>>> +       for_each_index_entry uses the changed index_opts, to iterate
>>>> +       over a different set of cache entries.
>>>> +
>>>> +       For index-v5 it refreshes the index if the filter_opts have
>>>> +       changed and sets the new filter_opts in the index state, again
>>>> +       to iterate over a different set of cache entries as with
>>>> +       index-v2.
>>>> +
>>>> +       This has some optimization potential, in the case that the
>>>> +       opts get stricter (less of the index should be read) it
>>>> +       doesn't have to reload anything, but currently does.
>>>
>>> The only use case I see so far is converting a partial index_state
>>> back to a full one. Apart from doing so in order to write the new
>>> index, I think some operation (like rename tracking in diff or
>>> unpack-trees) may expect full index. I think we should support that. I
>>> doubt we need to change pathspec to something different than the one
>>> we used to load the index. When a user passes a pathspec to a command,
>>> the user expects the command to operate on that set only, not outside.
>>
>> One application was in ls-files, where we strip the trailing slash from
>> the pathspecs for submodules.  But when we let the caller filter the
>> rest out it's not needed anymore.  We load all entries without the
>> trailing slash anyway.
>
> That submodule trailing slash stripping code will be moved away soon
> (I've been working on it for some time now). There's similar code in
> pathspec.c. I hope by the time this series becomes a candidate for
> 'next', those pathspec manipulation is already gone. For
> strip_trailing_slash_from_submodules, peeking in index file for a few
> entries is probably ok. For check_path_for_gitlink, full index is
> loaded until we figure out a clever way.

Ah great, for now I'll just not use the for_each_index_entry function in
ls-files, and then change the code later once the stripping code is
moved away.

>>> Some thoughts about the writing api.
>>>
>>> In think we should avoid automatically converting partial index into a
>>> full one before writing. Push that back to the caller and die() when
>>> asked to update partial index. They know at what point the index may
>>> be updated and even what part of it may be updated. I think all
>>> commands fall into two categories, tree-wide updates (merge,
>>> checkout...) and limited by the user-given pathspec. "what part to be
>>> updated" is not so hard to determine.
>>
>> Hrm this is only true if index entries are added or removed, not if they
>> are only changed.  If they are only changed we can write a partially
>> read index once we have partial writing.
>
> Yep. We can detect if changes are updates only, no additions nor
> removals. If so do partial write, else full write. These little
> details are hidden from the user, as long as they keep their promise
> about read/write regions.
>
>> For now it would make sense to just die() though, until we have that in place.
>
> Agreed.
> --
> Duy

  reply	other threads:[~2013-07-11 11:30 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-07  8:11 [PATCH 00/22] Index v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 01/22] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-07  8:11 ` [PATCH 02/22] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-07  8:11 ` [PATCH 03/22] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-07  8:11 ` [PATCH 04/22] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-07  8:11 ` [PATCH 05/22] read-cache: add index reading api Thomas Gummerer
2013-07-08  2:01   ` Duy Nguyen
2013-07-08 11:40     ` Thomas Gummerer
2013-07-08  2:19   ` Duy Nguyen
2013-07-08 11:20     ` Thomas Gummerer
2013-07-08 12:45       ` Duy Nguyen
2013-07-08 13:37         ` Thomas Gummerer
2013-07-08 20:54         ` [PATCH 5.5/22] Add documentation for the index api Thomas Gummerer
2013-07-09 15:42           ` Duy Nguyen
2013-07-09 20:10             ` Thomas Gummerer
2013-07-10  5:28               ` Duy Nguyen
2013-07-11 11:30                 ` Thomas Gummerer [this message]
2013-07-11 11:42                   ` Duy Nguyen
2013-07-11 12:27                     ` Duy Nguyen
2013-07-08 16:36   ` [PATCH 05/22] read-cache: add index reading api Junio C Hamano
2013-07-08 20:10     ` Thomas Gummerer
2013-07-08 23:09       ` Junio C Hamano
2013-07-09 20:13         ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 06/22] make sure partially read index is not changed Thomas Gummerer
2013-07-08 16:31   ` Junio C Hamano
2013-07-08 18:33     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 07/22] dir.c: use index api Thomas Gummerer
2013-07-07  8:11 ` [PATCH 08/22] tree.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 09/22] name-hash.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 10/22] grep.c: Use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 11/22] ls-files.c: use the " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 12/22] read-cache: make read_blob_data_from_index use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 13/22] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-11 10:39   ` Duy Nguyen
2013-07-11 11:39     ` Thomas Gummerer
2013-07-11 11:47       ` Duy Nguyen
2013-07-11 12:26         ` Thomas Gummerer
2013-07-11 12:50           ` Duy Nguyen
2013-07-07  8:11 ` [PATCH 14/22] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-07  8:11 ` [PATCH 15/22] read-cache: read index-v5 Thomas Gummerer
2013-07-07 20:18   ` Eric Sunshine
2013-07-08 11:40     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 16/22] read-cache: read resolve-undo data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 17/22] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-07 20:41   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 18/22] read-cache: write index-v5 Thomas Gummerer
2013-07-07 20:43   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 19/22] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 20/22] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 21/22] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-07  8:12 ` [PATCH 22/22] p0003-index.sh: add perf test for the index formats Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ppupcp19.fsf@gmail.com \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git