From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Thomas Rast <trast@inf.ethz.ch>,
Michael Haggerty <mhagger@alum.mit.edu>,
Junio C Hamano <gitster@pobox.com>,
Robin Rosenberg <robin.rosenberg@dewire.com>,
Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2 12/19] read-cache: read index-v5
Date: Fri, 09 Aug 2013 15:10:31 +0200
Message-ID: <87eha3qabc.fsf@gmail.com>
In-Reply-To: <CACsJy8C4r=5K3ZdzNnNJ95xS3_xnsTSH2bUyLJ+rwRv4Jxo2zw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Jul 17, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>> [..]
>>
>>>> +static int read_entries(struct index_state *istate, struct directory_entry **de,
>>>> +			unsigned int *entry_offset, void **mmap,
>>>> +			unsigned long mmap_size, unsigned int *nr,
>>>> +			unsigned int *foffsetblock)
>>>> +{
>>>> +	struct cache_entry *head = NULL, *tail = NULL;
>>>> +	struct conflict_entry *conflict_queue;
>>>> +	struct cache_entry *ce;
>>>> +	int i;
>>>> +
>>>> +	conflict_queue = NULL;
>>>> +	if (read_conflicts(&conflict_queue, *de, mmap, mmap_size) < 0)
>>>> +		return -1;
>>>> +	for (i = 0; i < (*de)->de_nfiles; i++) {
>>>> +		if (read_entry(&ce,
>>>> +			       *de,
>>>> +			       entry_offset,
>>>> +			       mmap,
>>>> +			       mmap_size,
>>>> +			       foffsetblock) < 0)
>>>> +			return -1;
>>>> +		ce_queue_push(&head, &tail, ce);
>>>> +		*foffsetblock += 4;
>>>> +
>>>> +		/*
>>>> +		 * Add the conflicted entries at the end of the index file
>>>> +		 * to the in memory format
>>>> +		 */
>>>> +		if (conflict_queue &&
>>>> +		    (conflict_queue->entries->flags & CONFLICT_CONFLICTED) != 0 &&
>>>> +		    !cache_name_compare(conflict_queue->name, conflict_queue->namelen,
>>>> +					ce->name, ce_namelen(ce))) {
>>>> +			struct conflict_part *cp;
>>>> +			cp = conflict_queue->entries;
>>>> +			cp = cp->next;
>>>> +			while (cp) {
>>>> +				ce = convert_conflict_part(cp,
>>>> +						conflict_queue->name,
>>>> +						conflict_queue->namelen);
>>>> +				ce_queue_push(&head, &tail, ce);
>>>> +				conflict_part_head_remove(&cp);
>>>> +			}
>>>> +			conflict_entry_head_remove(&conflict_queue);
>>>> +		}
>>>
>>> I start to wonder if separating staged entries is a good idea. It
>>> seems to make the code more complicated. The good point about conflict
>>> section at the end of the file is you can just truncate() it out.
>>> Another way is putting staged entries in fileentries, sorted
>>> alphabetically then by stage number, and a flag indicating if the
>>> entry is valid. When you resolve an entry, just set the flag to
>>> invalid (partial write), so that read code will skip it.
>>>
>>> I think this approach is reasonably cheap (unless there are a lot of
>>> conflicts) and it simplifies this piece of code. truncate() may be
>>> overrated anyway. In my experience, I "git add <path>" as soon as I
>>> resolve <path> (so that "git diff" shrinks). One entry at a time, one
>>> index write at a time. I don't think I ever resolve everything then
>>> "git add -u .", which is where truncate() shines because staged
>>> entries are removed all at once. We should optimize for one file
>>> resolution at a time, imo.
>>
>> Thanks for your comments. I'll address the other ones once we've decided
>> what to do with the conflicts.
>>
>> It does make the code quite a bit more complicated, but also has one
>> advantage that you overlooked.
>
> I did overlook, although my goal is to keep the code simpler, not more
> complicated. The thinking is if we can find everything in fileentries
> table, the code here is simplified, so..
>
>> We wouldn't truncate() when resolving
>> the conflicts. The resolve undo data is stored with the conflicts and
>> therefore we could just flip a bit and set the stage of the cache-entry
>> in the main index to 0 (always once we got partial writing). This can
>> be fast both in case we resolve one entry at a time and when we resolve
>> a lot of entries. The advantage is even bigger when we resolve one
>> entry at a time, when we otherwise would have to re-write the index for
>> each conflict resolution.
>
> If I understand it correctly, fileentries can only contain stage-0 or
> stage-1 entries, "stage > 0" entries are stored in conflict data. Once
> a conflict is solved, you update the stage-1 entry in fileentries,
> turning it to stage-0 and recalculate the entry checksum. Conflict
> data remains there to function as the old REUC extension. Correct?
>
> First of all, if that's true, we only need 1 bit for stage in fileentries table.
>
> Secondly, you may get away with looking up to conflict data in this
> function by storing all stages in fileentries (now we need 2-bit
> stage), replicated in conflict data for reuc function. When you
> resolve conflict, you flip stage-1 to stage-0, and flip (a new bit) to
> mark stage-2 entry invalid so the code knows to skip it. Next time the
> index is rewritten, invalid entries are removed, but we still have old
> stage entries in conflict data. The flipping business is pretty much
> what you plan anyway, but the reading code does not need to look at
> both fileentries and conflict data at the same time.
>
> What do you think?

I've now tried it out on my synthetic repository, after creating ~115,000
conflicts in it. It's only a little slower even for that large number
of conflicts (which I don't think will ever happen in practice), but the
code is definitely simpler, so I will go with this. The times for the
old and new versions are below.

Test                                        HEAD~1            this tree
-----------------------------------------------------------------------------------
0003.2: v[23]: update-index                 3.50(2.87+0.61)   3.52(2.89+0.60) +0.6%
0003.3: v[23]: grep nonexistent -- subdir   1.80(1.43+0.35)   1.86(1.50+0.34) +3.3%
0003.4: v[23]: ls-files -- subdir           1.67(1.29+0.36)   1.70(1.31+0.37) +1.8%
0003.6: v4: update-index                    2.97(2.44+0.51)   3.00(2.50+0.48) +1.0%
0003.7: v4: grep nonexistent -- subdir      1.45(1.12+0.31)   1.48(1.06+0.40) +2.1%
0003.8: v4: ls-files -- subdir              1.33(0.99+0.33)   1.35(1.02+0.32) +1.5%
0003.10: v5: update-index                   2.42(1.87+0.54)   2.50(1.84+0.63) +3.3%
0003.11: v5: ls-files                       1.75(1.38+0.35)   1.80(1.37+0.41) +2.9%
0003.12: v5: grep nonexistent -- subdir     0.07(0.05+0.01)   0.07(0.05+0.01) +0.0%
0003.13: v5: ls-files -- subdir             0.07(0.05+0.01)   0.07(0.06+0.00) +0.0%
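
For readers following the thread, here is a rough sketch in C of the scheme
discussed above: all stages live in the fileentries table, sorted by name and
then by stage, and resolving a conflict only flips bits so the reader can skip
stale entries. It is not code from the patch series. The struct, the flag and
every sketch_* name are made up for illustration, and promoting the first
staged entry to stage 0 is a simplifying assumption about what "resolve"
does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "entry is invalid, skip it" bit; not a real index-v5 flag. */
#define SKETCH_ENTRY_INVALID 0x1u

/* Hypothetical, heavily simplified stand-in for one fileentries record. */
struct sketch_entry {
	const char *name;
	unsigned int stage;	/* 0..3, so two bits would be enough on disk */
	unsigned int flags;
};

/* Sort order from the discussion: by name first, then by stage number. */
static int sketch_entry_cmp(const struct sketch_entry *a,
			    const struct sketch_entry *b)
{
	int c;

	assert(a && b);
	c = strcmp(a->name, b->name);
	if (c)
		return c;
	return (a->stage > b->stage) - (a->stage < b->stage);
}

/*
 * Resolve the conflict for `name` in place: the first valid staged entry
 * becomes stage 0, every other staged entry for that name is marked
 * invalid. Nothing is moved or removed, which is what would make a
 * partial write possible. Returns the number of entries touched.
 */
static size_t sketch_resolve(struct sketch_entry *e, size_t nr,
			     const char *name)
{
	size_t i, touched = 0;
	int promoted = 0;

	assert(e || !nr);
	for (i = 0; i < nr; i++) {
		if (strcmp(e[i].name, name) != 0 || e[i].stage == 0 ||
		    (e[i].flags & SKETCH_ENTRY_INVALID))
			continue;
		if (!promoted) {
			e[i].stage = 0;
			promoted = 1;
		} else {
			e[i].flags |= SKETCH_ENTRY_INVALID;
		}
		touched++;
	}
	return touched;
}

/*
 * Reader side: one pass over a single table, skipping invalid entries.
 * `out` must have room for `nr` pointers. Returns the number collected.
 */
static size_t sketch_read(const struct sketch_entry *e, size_t nr,
			  const struct sketch_entry **out)
{
	size_t i, n = 0;

	assert(e || !nr);
	for (i = 0; i < nr; i++)
		if (!(e[i].flags & SKETCH_ENTRY_INVALID))
			out[n++] = &e[i];
	return n;
}
```

With a table holding a.c at stage 0, b.c at stages 1 to 3 and c.c at stage 0,
calling sketch_resolve() for "b.c" touches three entries, and a subsequent
sketch_read() returns three entries instead of five. The reader never has to
consult a separate conflict section, which is the simplification being
argued for.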