From: Chuck Lever <cel@citi.umich.edu>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/2] A new merge algorithm, take 3
Date: Thu, 08 Sep 2005 11:06:21 -0400 [thread overview]
Message-ID: <4320536D.2010706@citi.umich.edu> (raw)
In-Reply-To: <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]
Junio C Hamano wrote:
> Chuck Lever <cel@citi.umich.edu> writes:
>
>
>>for the past two weeks i've been attempting to replace the active_cache
>>array with an abstract data type (linked list just as a prototype) in
>>order to eliminate the need to use memmove() during insertions and
>>deletions.
>>
>>one big win is that the merge algorithm no longer needs to overwrite the
>>active_cache. you can keep two lists and simply move elements from one
>>to the other as you do the merge, and then hook the new, merged list
>>back to the cache head when you're done.
>>
>>anyway, i'm at a point where i could use some help and advice if this is
>>something you think could be useful in the long run.
>
>
> This is interesting.
>
> For something like a kernel tree that has around 18k entries, a
> single memmove will at the worst case move that many pointers,
> so naturally a list based implementation would perform well for
> many insertions and deletions. On the other hand, the list
> based implementation would make it very expensive to find an
> entry read-only, I suspect, unlike the array based
> implementation which lets us do binary search.
using a list was an easy prototype to see what issues there are when
converting from an array into an abstract data type. i've created a few
helpers that allow me to limit the list implementation changes to
cache.h, read-cache.c, and read-tree.c; but having these helpers means
it should be fairly simple to switch to any reasonable data structure.
(the helper parts are already working and i can post them to the list if
folks are interested).
once the list implementation is working well, i plan to go back and
replace it with something more sophisticated which can perform
insertion, deletion, and searching efficiently.
in the long run, i see using a balanced tree. that would make
insertions and searches quick, and prevent the tree from degenerating
into a linked list due to a series of in-order insertions. a splay tree
might be even more entertaining, as it always remains balanced, and a
search operation places the found node or insertion point right at the
root of the tree.
each node in the tree would represent a unique path, and have references
to the entries of all four stages, contained in an array. that would
simplify several routines in read-cache.c and probably make the merge
logic even easier to understand and more efficient.
> I haven't timed it like you did, but my gut feeling is that the
> most wastage during the merge is coming from having to move
> entries because we "insert into" or "remove from" the same
> active-cache array.
the merge actually replaces entries in place, so it is already fairly
efficient. the wastage in the merge case arises from the many list
insertions done by read_cache, all of which involve moving some of the
active cache array.
> If we allocate a new array and populate it
> from scratch as we iterate through the paths being merged and
> replace the active_cache pointer at the very end, we would do
> much better without going to a linked list implementation, which
> would penalize the normal read-only case.
i can already tell you that using a separate source and destination data
structure during the merge simplifies the logic and makes it easier to
understand. i imagine it will also be easier to maintain and enhance in
the long run.
> I think what Daniel
> did to read-tree recently, still in the proposed updates branch,
> would make this kind of change far easier to implement.
as i'm new to git development, i wasn't aware of the proposed branch. i
will see if i can take a look (or send me a pointer to the list archives
if he has posted them here).
> I wanted to cc this message to Daniel but your message to me was
> somehow private so I did not. Please feel free to bring this
> issue up either with him directly or on the list.
i wanted to assess whether this was a reasonable idea or if there was
already work in progress in this area. thanks for your response!
[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard
next prev parent reply other threads:[~2005-09-08 15:06 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
2005-09-11 2:54 ` Unified merge driver pushed out to "master" branch Junio C Hamano
2005-09-11 21:05 ` Fredrik Kuivinen
2005-09-12 1:23 ` Junio C Hamano
2005-09-14 5:56 ` Another merge test case from the kernel tree Junio C Hamano
2005-09-14 16:11 ` Daniel Barkalow
2005-09-14 16:30 ` Junio C Hamano
2005-09-14 17:42 ` Fredrik Kuivinen
2005-09-14 17:51 ` Junio C Hamano
2005-09-15 0:47 ` Yet another set of merge test cases " Junio C Hamano
2005-09-19 16:13 ` Fredrik Kuivinen
2005-09-20 1:53 ` Junio C Hamano
2005-09-20 5:50 ` Fredrik Kuivinen
2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
2005-09-08 6:06 ` Fredrik Kuivinen
2005-09-08 15:27 ` Daniel Barkalow
2005-09-08 20:05 ` Fredrik Kuivinen
2005-09-08 21:27 ` Daniel Barkalow
2005-09-07 18:36 ` Junio C Hamano
[not found] ` <431F34FF.5050301@citi.umich.edu>
[not found] ` <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
2005-09-08 15:06 ` Chuck Lever [this message]
2005-09-08 16:51 ` Junio C Hamano
2005-09-08 17:19 ` Linus Torvalds
2005-09-08 17:51 ` Junio C Hamano
2005-09-08 18:16 ` Chuck Lever
2005-09-08 18:35 ` Linus Torvalds
2005-09-08 18:58 ` Chuck Lever
2005-09-08 20:59 ` Linus Torvalds
2005-09-09 7:44 ` [RFH] Merge driver Junio C Hamano
2005-09-09 16:05 ` Daniel Barkalow
2005-09-09 16:43 ` Junio C Hamano
2005-09-09 17:25 ` Daniel Barkalow
2005-09-11 4:58 ` Junio C Hamano
2005-09-12 21:08 ` Fredrik Kuivinen
2005-09-12 21:16 ` Junio C Hamano
2005-09-13 20:33 ` Fredrik Kuivinen
2005-09-13 20:46 ` Junio C Hamano
2005-09-08 20:54 ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
2005-09-08 21:23 ` A Large Angry SCM
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4320536D.2010706@citi.umich.edu \
--to=cel@citi.umich.edu \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).