git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: Re: diffcore-rename performance mode
Date: Tue, 25 Sep 2007 12:38:43 -0400
Message-ID: <20070925163843.GA22987@coredump.intra.peff.net>
In-Reply-To: <20070918085413.GA11751@coredump.intra.peff.net>

On Tue, Sep 18, 2007 at 04:54:13AM -0400, Jeff King wrote:

> > > However, keeping around _just_ the
> > > cnt_data caused only about 100M of extra memory consumption (and gave
> > > the same performance boost).
> > 
> > That would be an interesting and relatively low-hanging optimization.
> 
> I can produce memory usage numbers for the kernel, too.

And here are some kernel numbers. I measured performance of this script
in the linux-2.6 repository:

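#!/bin/sh
# Diff each pair of consecutive release tags with rename detection on.

last=
# Skip tags containing "-" (e.g. the -rc tags).
git tag | grep -v -- - | while read -r tag; do
  if test -n "$last"; then
    echo "Diffing $last..$tag"
    # -M detects renames; -l0 lifts the default rename limit.
    git diff --raw -M -l0 "$last" "$tag" >/dev/null
  fi
  last=$tag
done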

on the assumption that diffs between major revisions would be a good
middle ground: large enough to show the n^2 rename behavior, but still
small enough to be close to "everyday" usage.

I measured three different approaches:
  1. stock 'next' (stock)
  2. removing entirely the calls to diff_free_filespec_data (nofree)
  3. changing those free calls to free everything except cnt_data (somefree)

And I measured two things:
  1. user CPU time to complete
  2. peak memory usage

All numbers are warm-cache, and typical cases after multiple runs.

                 | stock | nofree | somefree
-----------------|-------|--------|---------
user time (s)    | 76.78 | 16.96  | 46.26
peak memory (KB) | 52300 | 66796  | 59156

The raw 'time' output is below:

stock:
76.78user 3.35system 1:20.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166733minor)pagefaults 0swaps

nofree:
16.96user 1.46system 0:18.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+185353minor)pagefaults 0swaps

somefree:
46.26user 1.54system 0:47.94elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+178819minor)pagefaults 0swaps
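
For anyone repeating the measurement, the fields of interest can be pulled out of reports like the ones above with a little sed. This is only a sketch: the function names are made up for illustration, and it assumes the two-line default format shown here. Note that maxresident reads 0 in all three reports, so the peak-memory row in the table must come from some other measurement that this message does not name.

```sh
#!/bin/sh
# user_seconds (hypothetical helper): read a 'time' report on stdin and
# print the user CPU seconds from its first line.
user_seconds() {
  sed -n 's/^\([0-9.][0-9.]*\)user .*/\1/p'
}

# minor_faults (hypothetical helper): print the minor page-fault count
# from the second line of the report.
minor_faults() {
  sed -n 's/.*+\([0-9][0-9]*\)minor)pagefaults.*/\1/p'
}

# The 'stock' report, copied verbatim from above.
report='76.78user 3.35system 1:20.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166733minor)pagefaults 0swaps'

printf '%s\n' "$report" | user_seconds
printf '%s\n' "$report" | minor_faults
```

Both helpers ignore lines that do not match, so the whole report can be piped through either one.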

So this is definitely worth pursuing, as it yields massive speedups even
for regular repositories. And even the 'nofree' case only costs us 14M
of extra memory (although it is a 27% increase, this just isn't that
memory-hungry an endeavour for the sizes of changes we're talking
about). And as Linus noted, now that we have a default rename limit,
you're not likely to hit an explosion of memory usage.
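
The speedup and memory figures quoted above can be rechecked from the table with a few lines of shell and awk. This is a sketch, not part of the original measurement; the helper names speedup and mem_overhead are invented here, and the inputs are simply the table values.

```sh
#!/bin/sh
# Values copied from the table: user time (s) and peak memory (KB).
stock_time=76.78;    stock_mem=52300
nofree_time=16.96;   nofree_mem=66796
somefree_time=46.26; somefree_mem=59156

# speedup BASE NEW (hypothetical helper): ratio of BASE time to NEW time.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# mem_overhead BASE NEW (hypothetical helper): extra memory in KB and
# as a percentage of BASE.
mem_overhead() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%d KB (%.1f%%)\n", new - base, (new - base) * 100 / base }'
}

echo "nofree:   $(speedup $stock_time $nofree_time)x faster, +$(mem_overhead $stock_mem $nofree_mem)"
echo "somefree: $(speedup $stock_time $somefree_time)x faster, +$(mem_overhead $stock_mem $somefree_mem)"
```

On these numbers 'nofree' comes out about 4.5 times faster than stock for roughly 14M more memory, while 'somefree' is only about 1.7 times faster.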

What is most confusing is why the 'somefree' case performs so badly,
since we should just be using the cnt_data. I'll see if gprof can shed
any light on that. It would be nice to use it instead, since it will
have much better memory usage in the face of large blobs (e.g., my
pathological case that started this whole thread).

-Peff
