From: Thomas Gummerer <firstname.lastname@example.org>
To: Duy Nguyen <email@example.com>
Cc: Git Mailing List <firstname.lastname@example.org>,
	Thomas Rast <email@example.com>,
	Michael Haggerty <firstname.lastname@example.org>,
	Junio C Hamano <email@example.com>,
	Robin Rosenberg <firstname.lastname@example.org>
Subject: Re: [PATCH 5.5/22] Add documentation for the index api
Date: Thu, 11 Jul 2013 13:30:10 +0200
Message-ID: <email@example.com>
In-Reply-To: <CACsJy8BRw6jqB1XBzDcCr3UXNGG1wRPjwnMrh+EksFf7VsQysg@mail.gmail.com>

Duy Nguyen <firstname.lastname@example.org> writes:

> On Wed, Jul 10, 2013 at 3:10 AM, Thomas Gummerer <email@example.com> wrote:
>>> If you happen to know that certain entries match the given pathspec,
>>> you could help the caller avoid match_pathspec'ing again by setting a
>>> bit in ce_flags.
>>
>> I currently don't know which entries do match the pathspec from just
>> reading the index file, additional calls would be needed. I don't think
>> that would be worth the overhead.
>
> Yeah I now see that you select what to load in v5 with the adjusted
> pathspec, not the input pathspec. Originally I thought you match the
> input pathspec against every file entry in the index :P Your adjusted
> pathspec looks like what common_prefix is for. It's cheaper than
> creating adjusted_pathspec from match_pathspec and reduces loading in
> major cases, where glob is not used.
>
> Still, creating an adjusted pathspec this way looks iffy. You need to
> understand pathspec in order to strip the filename part out to match
> the directory match only. An alternative is to use
> tree_entry_interesting. It goes along well with tree traversal and can
> be used to match directories with original pathspec. Once you see it
> matches an entry in a directory, you could skip matching the rest of
> the files and load the whole directory. read_index_filtered_v5 and
> read_entries may need some tweaking though. I'll try it and post a
> patch later if I succeed.

Hrm, I played around a bit with this idea, but I couldn't figure out how
to make it work. For it to work we would still have to load some
entries in a directory at least? Or is there a way to match the
directories, which I just haven't figured out yet?

>>> To know which entry exists in the index and which is
>>> new, use another flag. Most reader code won't change if we do it this
>>> way, all match_pathspec() remain where they are.
>>
>> Hrm you mean to know which cache entries are added (or changed) in the
>> in-memory index and will have to be written later? I'm not sure I
>> understand correctly what you mean here.
>
> Oh.. The "to know.." sentence was nonsense. We probably don't need to
> know. We may track changed entries for partial writing, but let's
> leave that out for now.

Ok, makes sense.

>>>> +`index_change_filter_opts(opts)`::
>>>> +	This function again has a slightly different functionality for
>>>> +	index-v2 and index-v5.
>>>> +
>>>> +	For index-v2 it simply changes the filter_opts, so
>>>> +	for_each_index_entry uses the changed index_opts, to iterate
>>>> +	over a different set of cache entries.
>>>> +
>>>> +	For index-v5 it refreshes the index if the filter_opts have
>>>> +	changed and sets the new filter_opts in the index state, again
>>>> +	to iterate over a different set of cache entries as with
>>>> +	index-v2.
>>>> +
>>>> +	This has some optimization potential, in the case that the
>>>> +	opts get stricter (less of the index should be read) it
>>>> +	doesn't have to reload anything, but currently does.
>>>
>>> The only use case I see so far is converting a partial index_state
>>> back to a full one. Apart from doing so in order to write the new
>>> index, I think some operation (like rename tracking in diff or
>>> unpack-trees) may expect full index. I think we should support that. I
>>> doubt we need to change pathspec to something different than the one
>>> we used to load the index. When a user passes a pathspec to a command,
>>> the user expects the command to operate on that set only, not outside.
>>
>> One application was in ls-files, where we strip the trailing slash from
>> the pathspecs for submodules. But when we let the caller filter the
>> rest out it's not needed anymore. We load all entries without the
>> trailing slash anyway.
>
> That submodule trailing slash stripping code will be moved away soon
> (I've been working on it for some time now). There's similar code in
> pathspec.c. I hope by the time this series becomes a candidate for
> 'next', those pathspec manipulations are already gone. For
> strip_trailing_slash_from_submodules, peeking in index file for a few
> entries is probably ok. For check_path_for_gitlink, full index is
> loaded until we figure out a clever way.

Ah great, for now I'll just not use the for_each_index_entry function in
ls-files, and then change the code later once the stripping code is
moved away.

>>> Some thoughts about the writing api.
>>>
>>> I think we should avoid automatically converting partial index into a
>>> full one before writing. Push that back to the caller and die() when
>>> asked to update partial index. They know at what point the index may
>>> be updated and even what part of it may be updated. I think all
>>> commands fall into two categories, tree-wide updates (merge,
>>> checkout...) and limited by the user-given pathspec. "what part to be
>>> updated" is not so hard to determine.
>>
>> Hrm this is only true if index entries are added or removed, not if they
>> are only changed. If they are only changed we can write a partially
>> read index once we have partial writing.
>
> Yep. We can detect if changes are updates only, no additions nor
> removals. If so do partial write, else full write. These little
> details are hidden from the user, as long as they keep their promise
> about read/write regions.
>
>> For now it would make sense to just die() though, until we have that in place.
>
> Agreed.
> --
> Duy
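
The write-side rule agreed on at the end of this message (updates only: a
partial write is possible; additions or removals: full write; and die()
on a partially read index until partial writing exists) could be
sketched in C roughly as below. This is only an illustration under
assumptions, not code from the series: `toy_index`, the `CHANGE_*` flags,
`choose_write_mode()` and the `have_partial_write` switch are
hypothetical names that do not exist in git, and WRITE_REFUSE stands in
for the place where the real code would call die().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical change flags; not taken from git's cache.h. */
enum change_kind {
	CHANGE_UPDATE = 1 << 0, /* an existing entry was modified in place */
	CHANGE_ADD    = 1 << 1, /* an entry was added */
	CHANGE_REMOVE = 1 << 2  /* an entry was removed */
};

enum write_mode {
	WRITE_NONE,    /* nothing changed, nothing to write */
	WRITE_PARTIAL, /* rewrite only the changed entries */
	WRITE_FULL,    /* rewrite the whole index file */
	WRITE_REFUSE   /* caller broke its promise; real code would die() */
};

/* Hypothetical stand-in for the relevant bits of struct index_state. */
struct toy_index {
	int partially_read; /* non-zero if loaded through a filter */
	unsigned changes;   /* OR of enum change_kind values */
};

static enum write_mode choose_write_mode(const struct toy_index *idx,
					 int have_partial_write)
{
	assert(idx != NULL);

	if (!idx->changes)
		return WRITE_NONE;

	/* Additions or removals need the full index in memory. */
	if (idx->changes & (CHANGE_ADD | CHANGE_REMOVE))
		return idx->partially_read ? WRITE_REFUSE : WRITE_FULL;

	/* Updates only: a partial write is enough once it is implemented. */
	if (have_partial_write)
		return WRITE_PARTIAL;

	/* Until then, a partially read index cannot be written back. */
	return idx->partially_read ? WRITE_REFUSE : WRITE_FULL;
}
```

With `have_partial_write` set to 0 this mirrors the "just die() for now"
position: any change to a partially read index is refused, while a fully
read index is always written in full.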