git@vger.kernel.org mailing list mirror (one of many)
From: "Nipunn Koorapati via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>, Utsav Shah <utsav@dropbox.com>,
	Nipunn Koorapati <nipunn1313@gmail.com>,
	Nipunn Koorapati <nipunn@dropbox.com>,
	Taylor Blau <me@ttaylorr.com>,
	Nipunn Koorapati <nipunn1313@gmail.com>
Subject: [PATCH v4 0/7] use fsmonitor data in git diff eliminating O(num_files) calls to lstat
Date: Tue, 20 Oct 2020 13:40:57 +0000
Message-ID: <pull.756.v4.git.1603201264.gitgitgadget@gmail.com>
In-Reply-To: <pull.756.v3.git.1603147657.gitgitgadget@gmail.com>

Credit to alexmv, who wrote this commit back in December 2017 when he was at
Dropbox (dbx). I've rebased it and am submitting it now.

With fsmonitor enabled, git diff currently calls lstat() on every file in
the repo. This series uses the fsmonitor extension to skip the lstat() calls
for files that fsmonitor has judged unmodified.
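
To illustrate the idea, here is a minimal Python sketch. It is not the
change made to diff-lib.c; it only assumes that each cached index entry
carries a flag meaning "fsmonitor considers this path unmodified". The
names IndexEntry, fsmonitor_valid and find_changed are hypothetical, and
only size and mtime are compared to keep it short.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical stand-in for a cached index entry."""
    path: str
    size: int
    mtime_ns: int
    fsmonitor_valid: bool = False  # assumed: set when the monitor saw no change


def find_changed(entries, use_fsmonitor):
    """Return (changed_paths, number_of_lstat_calls)."""
    changed, lstat_calls = [], 0
    for entry in entries:
        if use_fsmonitor and entry.fsmonitor_valid:
            # Trust the monitor: treat the entry as unchanged, no disk access.
            continue
        lstat_calls += 1
        try:
            st = os.lstat(entry.path)
        except FileNotFoundError:
            changed.append(entry.path)  # file was removed
            continue
        if (st.st_size, st.st_mtime_ns) != (entry.size, entry.mtime_ns):
            changed.append(entry.path)
    return changed, lstat_calls


def demo():
    """Build five files, modify one, and compare both strategies."""
    with tempfile.TemporaryDirectory() as root:
        entries = []
        for i in range(5):
            path = os.path.join(root, f"file{i}.txt")
            with open(path, "w") as fh:
                fh.write("original\n")
            st = os.lstat(path)
            entries.append(IndexEntry(path, st.st_size, st.st_mtime_ns, True))

        # Modify one file; a real monitor would then clear its valid flag.
        with open(entries[2].path, "w") as fh:
            fh.write("modified contents\n")
        entries[2].fsmonitor_valid = False

        without = find_changed(entries, use_fsmonitor=False)
        with_fsm = find_changed(entries, use_fsmonitor=True)
        return without, with_fsm, entries[2].path
```

Both strategies report the same modified file, but the second one only
stats the single entry whose flag was cleared, which is the effect the
strace numbers below show at repository scale.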

I tested with and without this change on a large in-house repo
(~400k files).

-----------------------------------------
(1) With fsmonitor enabled - on master of git (2.29.0)
-----------------------------------------
../git/bin-wrappers/git checkout HEAD~200
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.64    4.358994          10    446257         3 lstat
  0.12    0.005353           7       764       360 open

(A subsequent call)
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.84    4.380955          10    444904         3 lstat
  0.06    0.002564         135        19           munmap
...

-----------------------------------------
(2) With fsmonitor enabled - with my patch
-----------------------------------------
../git/bin-wrappers/git checkout HEAD~200
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.72    0.003090         163        19           munmap
 19.63    0.001196         598         2           futex
...
  0.00    0.000000           0         4         3 lstat


-----------------------------------------
(3) With fsmonitor disabled entirely
-----------------------------------------

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.52    0.277085       92362         3           futex
  0.27    0.000752           4       191        63 open
...
  0.14    0.000397           3       158         3 lstat

I encoded this comparison as a perf test in one of the commits
("t/perf: add fsmonitor perf test for git diff").
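
For anyone repeating the measurement by hand, the strace summaries above
can also be compared programmatically. The following Python sketch is only
an illustration and is not part of the series; syscall_counts is a
hypothetical helper, and the sample text is copied from scenarios (1) and
(2) above.

```python
def syscall_counts(strace_summary):
    """Map syscall name -> number of calls from `strace -c` summary text."""
    counts = {}
    for line in strace_summary.splitlines():
        fields = line.split()
        # Data rows look like: %time seconds usecs/call calls [errors] syscall
        if len(fields) < 5:
            continue
        try:
            float(fields[0])  # header and separator rows fail here
            calls = int(fields[3])
        except ValueError:
            continue
        counts[fields[-1]] = calls
    return counts


BEFORE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.64    4.358994          10    446257         3 lstat
  0.12    0.005353           7       764       360 open
"""

AFTER = """\
...
 50.72    0.003090         163        19           munmap
 19.63    0.001196         598         2           futex
  0.00    0.000000           0         4         3 lstat
"""

before = syscall_counts(BEFORE)
after = syscall_counts(AFTER)
print(before["lstat"], "->", after["lstat"])  # prints: 446257 -> 4
```

The parser relies on the syscall name being the last column, so rows with
and without an "errors" value are handled the same way.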

Changes since Patch Series V1

 * Add git diff -- <pathspec> to perf tests
 * Improve readability of bitwise ops

Changes since Patch Series V2

 * Add git add to perf tests
 * Refactor perf fsmonitor to simplify / remove redundancy
 * Add linting to perf tests
 * Add git diff -- <pathspec> for pathspecs of various sizes
 * Confirm that refresh_fsmonitor is always called, and note this in the
   commit message

Changes since Patch Series V3

 * Move perf test linting to Makefile in perf/ directory

Alex Vandiver (1):
  fsmonitor: use fsmonitor data in `git diff`

Nipunn Koorapati (6):
  t/perf/README: elaborate on output format
  t/perf/p7519-fsmonitor.sh: warm cache on first git status
  t/perf: add fsmonitor perf test for git diff
  perf lint: add make test-lint to perf tests
  p7519-fsmonitor: refactor to avoid code duplication
  p7519-fsmonitor: add a git add benchmark

 diff-lib.c                | 15 +++++-
 t/Makefile                |  7 +--
 t/perf/Makefile           |  5 +-
 t/perf/README             |  2 +
 t/perf/p3400-rebase.sh    |  6 +--
 t/perf/p7519-fsmonitor.sh | 96 ++++++++++++++++++++++-----------------
 6 files changed, 81 insertions(+), 50 deletions(-)


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-756%2Fnipunn1313%2Fdiff_fsmon-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-756/nipunn1313/diff_fsmon-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/756

Range-diff vs v3:

 1:  cba03dd40b = 1:  cba03dd40b fsmonitor: use fsmonitor data in `git diff`
 2:  1c7876166f = 2:  1c7876166f t/perf/README: elaborate on output format
 3:  401f696c81 = 3:  401f696c81 t/perf/p7519-fsmonitor.sh: warm cache on first git status
 4:  b3ad8faac4 = 4:  b3ad8faac4 t/perf: add fsmonitor perf test for git diff
 5:  28c1e488bf ! 5:  b534cd137a perf lint: check test-lint-shell-syntax in perf tests
     @@ Metadata
      Author: Nipunn Koorapati <nipunn@dropbox.com>
      
       ## Commit message ##
     -    perf lint: check test-lint-shell-syntax in perf tests
     +    perf lint: add make test-lint to perf tests
      
     -    Perf tests have some seq instead of test_seq. This
     -    runs the existing tests on the perf tests as well.
     +    Perf tests have not been linted for some time.
     +    They've grown some seq instead of test_seq. This
     +    runs the existing lints on the perf tests as well.
      
          Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
      
     @@ t/Makefile: CHAINLINTTMP_SQ = $(subst ','\'',$(CHAINLINTTMP))
       CHAINLINTTESTS = $(sort $(patsubst chainlint/%.test,%,$(wildcard chainlint/*.test)))
       CHAINLINT = sed -f chainlint.sed
       
     -@@ t/Makefile: test-lint-executable:
     +@@ t/Makefile: test-lint: test-lint-duplicates test-lint-executable test-lint-shell-syntax \
     + 	test-lint-filenames
     + 
     + test-lint-duplicates:
     +-	@dups=`echo $(T) | tr ' ' '\n' | sed 's/-.*//' | sort | uniq -d` && \
     ++	@dups=`echo $(T) $(TPERF) | tr ' ' '\n' | sed 's/-.*//' | sort | uniq -d` && \
     + 		test -z "$$dups" || { \
     + 		echo >&2 "duplicate test numbers:" $$dups; exit 1; }
     + 
     + test-lint-executable:
     +-	@bad=`for i in $(T); do test -x "$$i" || echo $$i; done` && \
     ++	@bad=`for i in $(T) $(TPERF); do test -x "$$i" || echo $$i; done` && \
     + 		test -z "$$bad" || { \
       		echo >&2 "non-executable tests:" $$bad; exit 1; }
       
       test-lint-shell-syntax:
     @@ t/Makefile: test-lint-executable:
       test-lint-filenames:
       	@# We do *not* pass a glob to ls-files but use grep instead, to catch
      
     + ## t/perf/Makefile ##
     +@@
     + -include ../../config.mak
     + export GIT_TEST_OPTIONS
     + 
     +-all: perf
     ++all: test-lint perf
     + 
     + perf: pre-clean
     + 	./run
     +@@ t/perf/Makefile: pre-clean:
     + clean:
     + 	rm -rf build "trash directory".* test-results
     + 
     ++test-lint:
     ++	$(MAKE) -C .. test-lint
     ++
     + .PHONY: all perf pre-clean clean
     +
       ## t/perf/p3400-rebase.sh ##
      @@ t/perf/p3400-rebase.sh: test_expect_success 'setup rebasing on top of a lot of changes' '
       	git checkout -f -B base &&
 6:  b38f2984f9 = 6:  3b20f4c76e p7519-fsmonitor: refactor to avoid code duplication
 7:  d392a523f2 = 7:  6f97439936 p7519-fsmonitor: add a git add benchmark
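
As a reading aid for the test-lint-duplicates change in patch 5, the
pipeline `sed 's/-.*//' | sort | uniq -d` reports script prefixes that
occur more than once. A rough Python equivalent, with made-up file names
and a hypothetical helper name, could look like this:

```python
import re
from collections import Counter


def duplicate_test_numbers(scripts):
    """Return sorted prefixes (text before the first '-') seen more than once."""
    # Mirrors sed 's/-.*//': drop everything from the first '-' onward.
    prefixes = [re.sub(r"-.*", "", name, count=1) for name in scripts]
    return sorted(p for p, n in Counter(prefixes).items() if n > 1)


# Hypothetical script names, for illustration only.
names = ["t0001-init.sh", "t0001-other.sh", "perf/p7519-fsmonitor.sh"]
print(duplicate_test_numbers(names))  # prints: ['t0001']
```

Because the perf scripts keep their "perf/" directory prefix in this
sketch, a perf test number does not collide with a regular test of the
same number.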

-- 
gitgitgadget
