Re: [PATCH v2 12/19] read-cache: read index-v5

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2 12/19] read-cache: read index-v5
Date: Thu, 08 Aug 2013 15:28:14 +0200	[thread overview]
Message-ID: <87haf0qpld.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8C4r=5K3ZdzNnNJ95xS3_xnsTSH2bUyLJ+rwRv4Jxo2zw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Jul 17, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>> [..]
>>
>>>> +static int read_entries(struct index_state *istate, struct directory_entry **de,
>>>> +                       unsigned int *entry_offset, void **mmap,
>>>> +                       unsigned long mmap_size, unsigned int *nr,
>>>> +                       unsigned int *foffsetblock)
>>>> +{
>>>> +       struct cache_entry *head = NULL, *tail = NULL;
>>>> +       struct conflict_entry *conflict_queue;
>>>> +       struct cache_entry *ce;
>>>> +       int i;
>>>> +
>>>> +       conflict_queue = NULL;
>>>> +       if (read_conflicts(&conflict_queue, *de, mmap, mmap_size) < 0)
>>>> +               return -1;
>>>> +       for (i = 0; i < (*de)->de_nfiles; i++) {
>>>> +               if (read_entry(&ce,
>>>> +                              *de,
>>>> +                              entry_offset,
>>>> +                              mmap,
>>>> +                              mmap_size,
>>>> +                              foffsetblock) < 0)
>>>> +                       return -1;
>>>> +               ce_queue_push(&head, &tail, ce);
>>>> +               *foffsetblock += 4;
>>>> +
>>>> +               /*
>>>> +                * Add the conflicted entries at the end of the index file
>>>> +                * to the in memory format
>>>> +                */
>>>> +               if (conflict_queue &&
>>>> +                   (conflict_queue->entries->flags & CONFLICT_CONFLICTED) != 0 &&
>>>> +                   !cache_name_compare(conflict_queue->name, conflict_queue->namelen,
>>>> +                                       ce->name, ce_namelen(ce))) {
>>>> +                       struct conflict_part *cp;
>>>> +                       cp = conflict_queue->entries;
>>>> +                       cp = cp->next;
>>>> +                       while (cp) {
>>>> +                               ce = convert_conflict_part(cp,
>>>> +                                               conflict_queue->name,
>>>> +                                               conflict_queue->namelen);
>>>> +                               ce_queue_push(&head, &tail, ce);
>>>> +                               conflict_part_head_remove(&cp);
>>>> +                       }
>>>> +                       conflict_entry_head_remove(&conflict_queue);
>>>> +               }
>>>
>>> I start to wonder if separating staged entries is a good idea. It
>>> seems to make the code more complicated. The good point about conflict
>>> section at the end of the file is you can just truncate() it out.
>>> Another way is putting staged entries in fileentries, sorted
>>> alphabetically then by stage number, and a flag indicating if the
>>> entry is valid. When you remove resolve an entry, just set the flag to
>>> invalid (partial write), so that read code will skip it.
>>>
>>> I think this approach is reasonably cheap (unless there are a lot of
>>> conflicts) and it simplifies this piece of code. truncate() may be
>>> overrated anyway. In my experience, I "git add <path>" as soon as I
>>> resolve <path> (so that "git diff" shrinks). One entry at a time, one
>>> index write at a time. I don't think I ever resolve everything then
>>> "git add -u .", which is where truncate() shines because staged
>>> entries are removed all at once. We should optimize for one file
>>> resolution at a time, imo.
>>
>> Thanks for your comments.  I'll address the other ones once we decided
>> to do with the conflicts.
>>
>> It does make the code quite a bit more complicated, but also has one
>> advantage that you overlooked.
>
> I did overlook, although my goal is to keep the code simpler, not more
> comlicated. The thinking is if we can find everything in fileentries
> table, the code here is simplified, so..
>
>> We wouldn't truncate() when resolving
>> the conflicts.  The resolve undo data is stored with the conflicts and
>> therefore we could just flip a bit and set the stage of the cache-entry
>> in the main index to 0 (always once we got partial writing).  This can
>> be fast both in case we resolve one entry at a time and when we resolve
>> a lot of entries.  The advantage is even bigger when we resolve one
>> entry at a time, when we otherwise would have to re-write the index for
>> each conflict resolution.
>
> If I understand it correctly, filentries can only contain stage-0 or
> stage-1 entries, "stage > 0" entries are stored in conflict data. Once
> a conflict is solved, you update the stage-1 entry in fileentries,
> turning it to stage-0 and recalculate the entry checksum. Conflict
> data remains there to function as the old REUC extension. Correct?

Correct.

> First of all, if that's true, we only need 1 bit for stage in fileentries table.
>
> Secondly, you may get away with looking up to conflict data in this
> function by storing all stages in fileentries (now we need 2-bit
> stage), replicated in conflict data for reuc function. When you
> resolve conflict, you flip stage-1 to stage-0, and flip (a new bit) to
> mark stage-2 entry invalid so the code knows to skip it. Next time the
> index is rewritten, invalid entries are removed, but we still have old
> stage entries in conflict data. The flipping business is pretty much
> what you plan anyway, but the reading code does not need to look at
> both fileentries and conflict data at the same time.
>
> What do you think?

I think that's a good idea, thanks.  It may slow the index down a bit
when there are a lot of conflicts, but I think once we get to the point
that it slows the index down noticeably, the user has much bigger
problems anyway and will probably not try to resolve them all.

The invalid entry flag could also be used when entries are only removed
from the index, so we can just set that bit instead of re-writing the
whole index file.

I'll try to implement this idea to see if the experiments match the
expectations and then send a re-roll.

next prev parent reply	other threads:[~2013-08-08 13:28 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-12 17:26 [PATCH v2 00/19] Index-v5 Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 01/19] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 02/19] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 03/19] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-14  3:10   ` Duy Nguyen
2013-07-19 14:53     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 04/19] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 05/19] Add documentation for the index api Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 06/19] read-cache: add index reading api Thomas Gummerer
2013-07-14  3:21   ` Duy Nguyen
2013-07-12 17:26 ` [PATCH v2 07/19] make sure partially read index is not changed Thomas Gummerer
2013-07-14  3:29   ` Duy Nguyen
2013-07-17 12:56     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 08/19] grep.c: Use index api Thomas Gummerer
2013-07-14  3:32   ` Duy Nguyen
2013-07-15  9:51     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 09/19] ls-files.c: use " Thomas Gummerer
2013-07-14  3:39   ` Duy Nguyen
2013-07-17  8:07     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 10/19] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-14  3:59   ` Duy Nguyen
2013-07-17  8:09     ` Thomas Gummerer
2013-08-04 11:26   ` Duy Nguyen
2013-08-04 17:58     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 11/19] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 12/19] read-cache: read index-v5 Thomas Gummerer
2013-07-14  4:42   ` Duy Nguyen
2013-08-07  8:13     ` Thomas Gummerer
2013-07-15 10:12   ` Duy Nguyen
2013-07-17  8:11     ` Thomas Gummerer
2013-08-08  2:00       ` Duy Nguyen
2013-08-08 13:28         ` Thomas Gummerer [this message]
2013-08-09 13:10         ` Thomas Gummerer
2013-08-07  8:23     ` Thomas Gummerer
2013-08-08  2:09       ` Duy Nguyen
2013-07-12 17:26 ` [PATCH v2 13/19] read-cache: read resolve-undo data Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 14/19] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 15/19] read-cache: write index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 16/19] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 17/19] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 18/19] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 19/19] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-07-14  2:59 ` [PATCH v2 00/19] Index-v5 Duy Nguyen
2013-07-15  9:30   ` Thomas Gummerer
2013-07-15  9:38     ` Duy Nguyen
2013-07-17  8:12       ` Thomas Gummerer
2013-07-17 23:58         ` Junio C Hamano
2013-07-19 17:37           ` Thomas Gummerer
2013-07-19 18:25             ` Junio C Hamano
2013-07-16 21:03 ` Ramsay Jones
2013-07-17  8:04   ` Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87haf0qpld.fsf@gmail.com \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    --cc=sunshine@sunshineco.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).