Re: [PATCH 0/2] A new merge algorithm, take 3

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Chuck Lever <cel@citi.umich.edu>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/2] A new merge algorithm, take 3
Date: Thu, 08 Sep 2005 11:06:21 -0400	[thread overview]
Message-ID: <4320536D.2010706@citi.umich.edu> (raw)
In-Reply-To: <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

Junio C Hamano wrote:
> Chuck Lever <cel@citi.umich.edu> writes:
> 
> 
>>for the past two weeks i've been attempting to replace the active_cache 
>>array with an abstract data type (linked list just as a prototype) in 
>>order to eliminate the need to use memmove() during insertions and 
>>deletions.
>>
>>one big win is that the merge algorithm no longer needs to overwrite the 
>>active_cache.  you can keep two lists and simply move elements from one 
>>to the other as you do the merge, and then hook the new, merged list 
>>back to the cache head when you're done.
>>
>>anyway, i'm at a point where i could use some help and advice if this is 
>>something you think could be useful in the long run.
> 
> 
> This is interesting.
> 
> For something like a kernel tree that has around 18k entries, a
> single memmove will at the worst case move that many pointers,
> so naturally a list based implementation would perform well for
> many insertions and deletions.   On the other hand, the list
> based implementation would make it very expensive to find an
> entry read-only, I suspect, unlike the array based
> implementation which lets us do binary search.

using a list was an easy prototype to see what issues there are when 
converting from an array into an abstract data type.  i've created a few 
helpers that allow me to limit the list implementation changes to 
cache.h, read-cache.c, and read-tree.c; but having these helpers means 
it should be fairly simple to switch to any reasonable data structure. 
(the helper parts are already working and i can post them to the list if 
folks are interested).

once the list implementation is working well, i plan to go back and 
replace it with something more sophisticated which can perform 
insertion, deletion, and searching efficiently.

in the long run, i see using a balanced tree.  that would make 
insertions and searches quick, and prevent the tree from degenerating 
into a linked list due to a series of in-order insertions.  a splay tree 
might be even more entertaining, as it always remains balanced, and a 
search operation places the found node or insertion point right at the 
root of the tree.

each node in the tree would represent a unique path, and have references 
to the entries of all four stages, contained in an array.  that would 
simplify several routines in read-cache.c and probably make the merge 
logic even easier to understand and more efficient.

> I haven't timed it like you did, but my gut feeling is that the
> most wastage during the merge is coming from having to move
> entries because we "insert into" or "remove from" the same
> active-cache array.

the merge actually replaces entries in place, so it is already fairly 
efficient.  the wastage in the merge case arises from the many list 
insertions done by read_cache, all of which involve moving some of the 
active cache array.

> If we allocate a new array and populate it
> from scratch as we iterate through the paths being merged and
> replace the active_cache pointer at the very end, we would do
> much better without going to a linked list implementation, which
> would penalize the normal read-only case.

i can already tell you that using a separate source and destination data 
structure during the merge simplifies the logic and makes it easier to 
understand.  i imagine it will also be easier to maintain and enhance in 
the long run.

> I think what Daniel
> did to read-tree recently, still in the proposed updates branch,
> would make this kind of change far easier to implement.

as i'm new to git development, i wasn't aware of the proposed branch.  i 
will see if i can take a look (or send me a pointer to the list archives 
if he has posted them here).

> I wanted to cc this message to Daniel but your message to me was
> somehow private so I did not.  Please feel free to bring this
> issue up either with him directly or on the list.

i wanted to assess whether this was a reasonable idea or if there was 
already work in progress in this area.  thanks for your response!

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard

next prev parent reply	other threads:[~2005-09-08 15:06 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
2005-09-11 21:05     ` Fredrik Kuivinen
2005-09-12  1:23       ` Junio C Hamano
2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
2005-09-14 16:11       ` Daniel Barkalow
2005-09-14 16:30         ` Junio C Hamano
2005-09-14 17:42       ` Fredrik Kuivinen
2005-09-14 17:51         ` Junio C Hamano
2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
2005-09-19 16:13         ` Fredrik Kuivinen
2005-09-20  1:53           ` Junio C Hamano
2005-09-20  5:50             ` Fredrik Kuivinen
2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
2005-09-08  6:06   ` Fredrik Kuivinen
2005-09-08 15:27     ` Daniel Barkalow
2005-09-08 20:05       ` Fredrik Kuivinen
2005-09-08 21:27         ` Daniel Barkalow
2005-09-07 18:36 ` Junio C Hamano
     [not found]   ` <431F34FF.5050301@citi.umich.edu>
     [not found]     ` <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
2005-09-08 15:06       ` Chuck Lever [this message]
2005-09-08 16:51         ` Junio C Hamano
2005-09-08 17:19           ` Linus Torvalds
2005-09-08 17:51             ` Junio C Hamano
2005-09-08 18:16             ` Chuck Lever
2005-09-08 18:35               ` Linus Torvalds
2005-09-08 18:58                 ` Chuck Lever
2005-09-08 20:59                   ` Linus Torvalds
2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
2005-09-09 16:05                     ` Daniel Barkalow
2005-09-09 16:43                       ` Junio C Hamano
2005-09-09 17:25                         ` Daniel Barkalow
2005-09-11  4:58                     ` Junio C Hamano
2005-09-12 21:08                     ` Fredrik Kuivinen
2005-09-12 21:16                       ` Junio C Hamano
2005-09-13 20:33                         ` Fredrik Kuivinen
2005-09-13 20:46                           ` Junio C Hamano
2005-09-08 20:54   ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
2005-09-08 21:23     ` A Large Angry SCM

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4320536D.2010706@citi.umich.edu \
    --to=cel@citi.umich.edu \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).