From: Junio C Hamano <junkio@cox.net>
To: Geert Bosch <bosch@gnat.com>
Cc: git@vger.kernel.org
Subject: Re: PATCH: New diff-delta.c implementation (updated)
Date: Thu, 27 Apr 2006 20:16:05 -0700
Message-ID: <7v1wvigzka.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <Pine.GSO.4.60.0604272132170.9650@nile.gnat.com> (Geert Bosch's message of "Thu, 27 Apr 2006 21:59:53 -0400 (EDT)")

Geert Bosch <bosch@gnat.com> writes:

> Even though the previous version did really well on large files
> with many changes, performance was lacking for the many small
> files with very few changes that are so common for a VCS.
>...
> The result has been only a slight increase in delta size for
> very large test cases (but with better performance), and
> both smaller deltas and faster execution speed for repacking
> git.git. I had trouble cloning the Linux kernel repository,
> but am now reasonably confident this will outperform the
> existing algorithm pretty consistently.

Interesting.

Initial impression, using the same test as before (a full packing
of the git.git repository that does not have _any_ pack -- all 18k
objects are loose).

First, the incumbent, with the "reusing delta-index" patch applied.

Total 17724, written 17724 (delta 12002), reused 0 (delta 0)
34.02user 6.48system 0:42.87elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+434478minor)pagefaults 0swaps

 6188418 pack-nico-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Then diff-delta.c replaced with your version.

Total 17724, written 17724 (delta 12012), reused 0 (delta 0)
44.87user 6.54system 0:54.01elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+441124minor)pagefaults 0swaps

 6099183 pack-geert-f1fac077a093ffdaf094aab2b7f11859ec0c18f1.pack

Second impression, in a recent kernel tree which is mostly
packed.  Packing 41k objects (v2.6.16..v2.6.17-rc3), with
"git-pack-objects --no-reuse-delta".

(Nico)
Total 41591, written 41591 (delta 29285), reused 8563 (delta 0)
169.08user 12.60system 3:27.68elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+1099928minor)pagefaults 0swaps

37363966 pack-nico-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack

(Geert)
Total 41591, written 41591 (delta 29347), reused 8427 (delta 0)
243.71user 12.32system 4:28.11elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1077843minor)pagefaults 0swaps

37165890 pack-geert-b9e4339c482cb7d787a2117e6da6eb2114053abc.pack
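
To make the trade-off in the four runs above easier to read, here is a
minimal sketch that turns the quoted figures into relative changes. The
helper names percent_change() and report() are hypothetical and are not
part of git; only the input numbers come from the output above.

```c
#include <stdio.h>

/*
 * Relative change of `candidate` against `baseline`, in percent.
 * A negative result means the candidate is smaller (or faster).
 */
double percent_change(double baseline, double candidate)
{
	return (candidate - baseline) / baseline * 100.0;
}

/*
 * Print pack size and user time changes for one benchmark.
 * Example calls, using the figures quoted above:
 *   report("git.git", 6188418, 6099183, 34.02, 44.87);
 *   report("kernel", 37363966, 37165890, 169.08, 243.71);
 */
void report(const char *label,
	    double base_bytes, double cand_bytes,
	    double base_user, double cand_user)
{
	printf("%-8s pack size %+.2f%%, user time %+.2f%%\n", label,
	       percent_change(base_bytes, cand_bytes),
	       percent_change(base_user, cand_user));
}
```

With the numbers above, the git.git pack comes out about 1.4% smaller
for about 32% more user time, and the kernel pack about 0.5% smaller
for about 44% more user time.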


Of course, the absolute numbers do not matter, but for the
record these are on my Duron 750, with 760MB or so of RAM and
relatively slow disks.

In the kernel repository (the checkout is near the tip of the
source tree), the largest files are fs/nls/nls_cp949.c (900kB,
Korean character encoding), drivers/usb/misc/emi62_fw_s.h
(800kB, Emagic firmware blob), and arch/m68k/ifpsp060/src/fpsp.S
(750kB, floating point emulation?), none of which is anywhere
near the size where your algorithm really should shine.

We would probably want some internal logic that says "if we see
that blobs larger than X MB are involved in the packing, we
should use this version of diff-delta, otherwise the other one."
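
A minimal sketch of what such a switch might look like, assuming the
blob sizes are known before delta computation starts. The enum, the
function name choose_delta_impl() and the threshold parameter are all
hypothetical; nothing here is existing git code, and the value of "X"
is left to the caller.

```c
#include <stddef.h>

/* Hypothetical labels for the two diff-delta implementations compared above. */
enum delta_impl {
	DELTA_IMPL_INCUMBENT,	/* the existing diff-delta.c */
	DELTA_IMPL_LARGE_BLOB	/* the proposed replacement */
};

/*
 * Pick one implementation for a whole packing run.
 *
 * sizes[] holds the blob sizes in bytes and nr is the number of
 * entries.  threshold is the "X MB" cut-off expressed in bytes; it
 * is an assumed tunable, not a value taken from the thread.
 */
enum delta_impl choose_delta_impl(const unsigned long *sizes, size_t nr,
				  unsigned long threshold)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (sizes[i] > threshold)
			return DELTA_IMPL_LARGE_BLOB;
	return DELTA_IMPL_INCUMBENT;
}
```

With a cut-off of 1MB, for example, the three largest kernel files
listed above (900kB, 800kB, 750kB) would all stay on the incumbent
implementation. Deciding per run rather than per object pair is just
one possible design; a per-pair check would be equally easy to sketch.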
