From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: trast@inf.ethz.ch, mhagger@alum.mit.edu, gitster@pobox.com,
	pclouds@gmail.com, robin.rosenberg@dewire.com,
	t.gummerer@gmail.com
Subject: [PATCH 00/22] Index v5
Date: Sun,  7 Jul 2013 10:11:38 +0200
Message-ID: <1373184720-29767-1-git-send-email-t.gummerer@gmail.com>

Hi,

This is a follow-up to last year's Google Summer of Code project
(late, I know :-) ), which wasn't merged back then.  The previous
rounds of the series are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517.

Since then I have added an index reading API, which allows certain
parts of Git to take advantage of the partial reading capability of
the new index file format.  In this series, grep, ls-files, and the
code paths they use are switched to the new API.

Another goal for the API is to hide the open-coded loops and accesses
to the in-memory format, which makes it simpler to change the
in-memory format to a version that fits the new on-disk format better.
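
To illustrate the idea of hiding open-coded loops behind an API, here
is a minimal, self-contained sketch.  It is not the API from this
series: the names toy_entry, toy_index, toy_for_each_entry and
toy_count_cb are hypothetical, and the toy implementation simply walks
an in-memory array.  The point is only that callers pass a path prefix
and a callback and never touch the entry array themselves, so the
storage behind the function could change, or be read only partially,
without changing the callers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the structures used in the series. */
struct toy_entry {
	const char *path;
};

struct toy_index {
	const struct toy_entry *entries;
	size_t nr;
};

typedef int (*toy_each_entry_fn)(const struct toy_entry *ce, void *cb_data);

/*
 * Call fn for every entry whose path starts with prefix (NULL or ""
 * selects all entries).  If fn returns non-zero, stop and return
 * that value; otherwise return 0.
 */
static int toy_for_each_entry(const struct toy_index *idx, const char *prefix,
			      toy_each_entry_fn fn, void *cb_data)
{
	size_t len = prefix ? strlen(prefix) : 0;
	size_t i;

	for (i = 0; i < idx->nr; i++) {
		const struct toy_entry *ce = &idx->entries[i];
		int ret;

		if (len && strncmp(ce->path, prefix, len))
			continue;
		ret = fn(ce, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the entries that were visited. */
static int toy_count_cb(const struct toy_entry *ce, void *cb_data)
{
	(void)ce;
	++*(size_t *)cb_data;
	return 0;
}
```

A caller that wants to count the entries below "sub/" would pass that
prefix, toy_count_cb and a pointer to a size_t counter, without ever
indexing into the entries array itself.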

Apart from the new patches, the main change is in the "read-cache:
read index-v5" patch, which gained the ability to read the index
partially.

The first patch for t2104 makes sense without the rest of the series,
as it fixes running the test-suite with index-v4 as the default index
format.

Below are the timings for the WebKit repository.  c4b2d88 is the
revision before anything in this series was applied, while HEAD is
the last patch in the series.  The slower times in update-index come
from the update-index patch, so they are not a problem (in c4b2d88 the
index is only read, while in HEAD it is read and written).  The
increase in time in the ls-files tests comes from not having the
prune_cache function in the index API.

I have not added this function as it only seems to be of use in
ls-files, but it can still be added if this increase is a problem.
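
For readers unfamiliar with the kind of narrowing a prune_cache-style
function performs, here is a rough sketch of the general technique:
because index entries are sorted by path, the entries below a
directory prefix form one contiguous range that can be located with
binary search instead of a linear scan.  This is not the code from
builtin/ls-files.c; toy_prefix_range and its parameters are
hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * paths must be sorted bytewise (strcmp order).  Store in *begin and
 * *end the half-open range [begin, end) of entries that start with
 * prefix.  The range is empty (begin == end) if nothing matches.
 */
static void toy_prefix_range(const char *const *paths, size_t nr,
			     const char *prefix, size_t *begin, size_t *end)
{
	size_t len = strlen(prefix);
	size_t lo = 0, hi = nr;

	/* Find the first entry that does not sort before the prefix. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strncmp(paths[mid], prefix, len) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	*begin = lo;

	/* From there, find the first entry that sorts after the prefix. */
	hi = nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strncmp(paths[mid], prefix, len) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	*end = lo;
}
```

With the sorted paths "Makefile", "sub/a.c", "sub/b.c" and "t/x.sh",
the prefix "sub/" yields the range [1, 3), so only two entries need to
be looked at afterwards.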

Test                                        c4b2d88           HEAD                   
-------------------------------------------------------------------------------------
0003.2: v[23]: update-index                 0.11(0.06+0.04)   0.22(0.15+0.05) +100.0%
0003.3: v[23]: grep nonexistent -- subdir   0.12(0.08+0.03)   0.12(0.09+0.02) +0.0%  
0003.4: v[23]: ls-files -- subdir           0.11(0.08+0.01)   0.12(0.08+0.03) +9.1%  
0003.6: v4: update-index                    0.09(0.06+0.02)   0.18(0.14+0.03) +100.0%
0003.7: v4: grep nonexistent -- subdir      0.10(0.08+0.02)   0.10(0.07+0.02) +0.0%  
0003.8: v4: ls-files -- subdir              0.09(0.07+0.01)   0.10(0.08+0.01) +11.1% 
0003.10: v5: update-index                   <missing>         0.15(0.10+0.03)        
0003.11: v5: grep nonexistent -- subdir     <missing>         0.01(0.00+0.00)        
0003.12: v5: ls-files -- subdir             <missing>         0.01(0.01+0.00)        

And for reference, here are the times for a synthetic repository with
a 470MB index file, to demonstrate the improvements in large
repositories.

Test                                        c4b2d88           HEAD                   
-------------------------------------------------------------------------------------
0003.2: v[23]: update-index                 1.50(1.18+0.30)   3.18(2.55+0.60) +112.0%
0003.3: v[23]: grep nonexistent -- subdir   1.62(1.28+0.32)   1.66(1.28+0.36) +2.5%  
0003.4: v[23]: ls-files -- subdir           1.49(1.21+0.26)   1.62(1.28+0.32) +8.7%  
0003.6: v4: update-index                    1.18(0.89+0.28)   2.68(2.22+0.44) +127.1%
0003.7: v4: grep nonexistent -- subdir      1.29(1.00+0.28)   1.30(1.04+0.24) +0.8%  
0003.8: v4: ls-files -- subdir              1.20(0.95+0.23)   1.30(0.98+0.30) +8.3%  
0003.10: v5: update-index                   <missing>         2.12(1.63+0.48)        
0003.11: v5: grep nonexistent -- subdir     <missing>         0.08(0.04+0.02)        
0003.12: v5: ls-files -- subdir             <missing>         0.07(0.05+0.01)        


Thomas Gummerer (21):
  t2104: Don't fail for index versions other than [23]
  read-cache: split index file version specific functionality
  read-cache: move index v2 specific functions to their own file
  read-cache: Re-read index if index file changed
  read-cache: add index reading api
  make sure partially read index is not changed
  dir.c: use index api
  tree.c: use index api
  name-hash.c: use index api
  grep.c: Use index api
  ls-files.c: use the index api
  read-cache: make read_blob_data_from_index use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/technical/index-file-format-v5.txt |  296 +++++
 Makefile                                         |    3 +
 builtin/grep.c                                   |   71 +-
 builtin/ls-files.c                               |  213 ++-
 builtin/update-index.c                           |    8 +-
 cache-tree.c                                     |    2 +-
 cache-tree.h                                     |    6 +
 cache.h                                          |  158 ++-
 dir.c                                            |   33 +-
 name-hash.c                                      |   11 +-
 read-cache-v2.c                                  |  651 +++++++++
 read-cache-v5.c                                  | 1536 ++++++++++++++++++++++
 read-cache.c                                     |  752 ++++-------
 read-cache.h                                     |   69 +
 t/perf/p0003-index.sh                            |   59 +
 t/t2104-update-index-skip-worktree.sh            |    1 +
 test-index-version.c                             |    7 +-
 tree.c                                           |   38 +-
 18 files changed, 3183 insertions(+), 731 deletions(-)
 create mode 100644 Documentation/technical/index-file-format-v5.txt
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache-v5.c
 create mode 100644 read-cache.h
 create mode 100755 t/perf/p0003-index.sh

-- 
1.8.3.453.g1dfc63d

Thread overview: 51+ messages
2013-07-07  8:11 Thomas Gummerer [this message]
2013-07-07  8:11 ` [PATCH 01/22] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-07  8:11 ` [PATCH 02/22] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-07  8:11 ` [PATCH 03/22] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-07  8:11 ` [PATCH 04/22] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-07  8:11 ` [PATCH 05/22] read-cache: add index reading api Thomas Gummerer
2013-07-08  2:01   ` Duy Nguyen
2013-07-08 11:40     ` Thomas Gummerer
2013-07-08  2:19   ` Duy Nguyen
2013-07-08 11:20     ` Thomas Gummerer
2013-07-08 12:45       ` Duy Nguyen
2013-07-08 13:37         ` Thomas Gummerer
2013-07-08 20:54         ` [PATCH 5.5/22] Add documentation for the index api Thomas Gummerer
2013-07-09 15:42           ` Duy Nguyen
2013-07-09 20:10             ` Thomas Gummerer
2013-07-10  5:28               ` Duy Nguyen
2013-07-11 11:30                 ` Thomas Gummerer
2013-07-11 11:42                   ` Duy Nguyen
2013-07-11 12:27                     ` Duy Nguyen
2013-07-08 16:36   ` [PATCH 05/22] read-cache: add index reading api Junio C Hamano
2013-07-08 20:10     ` Thomas Gummerer
2013-07-08 23:09       ` Junio C Hamano
2013-07-09 20:13         ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 06/22] make sure partially read index is not changed Thomas Gummerer
2013-07-08 16:31   ` Junio C Hamano
2013-07-08 18:33     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 07/22] dir.c: use index api Thomas Gummerer
2013-07-07  8:11 ` [PATCH 08/22] tree.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 09/22] name-hash.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 10/22] grep.c: Use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 11/22] ls-files.c: use the " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 12/22] read-cache: make read_blob_data_from_index use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 13/22] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-11 10:39   ` Duy Nguyen
2013-07-11 11:39     ` Thomas Gummerer
2013-07-11 11:47       ` Duy Nguyen
2013-07-11 12:26         ` Thomas Gummerer
2013-07-11 12:50           ` Duy Nguyen
2013-07-07  8:11 ` [PATCH 14/22] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-07  8:11 ` [PATCH 15/22] read-cache: read index-v5 Thomas Gummerer
2013-07-07 20:18   ` Eric Sunshine
2013-07-08 11:40     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 16/22] read-cache: read resolve-undo data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 17/22] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-07 20:41   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 18/22] read-cache: write index-v5 Thomas Gummerer
2013-07-07 20:43   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 19/22] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 20/22] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 21/22] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-07  8:12 ` [PATCH 22/22] p0003-index.sh: add perf test for the index formats Thomas Gummerer
