git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: "Christian Couder" <christian.couder@gmail.com>,
	"Olga Telezhnaya" <olyatelezhnaya@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	kernel-usp@googlegroups.com
Subject: [GSoC][RFC WIP PATCH 0/3] grep: allow parallelism in zlib inflation
Date: Fri, 19 Jul 2019 20:08:51 -0300	[thread overview]
Message-ID: <cover.1563570204.git.matheus.bernardino@usp.br> (raw)

Threads were disabled in git-grep for non-worktree case at 53b8d93
("grep: disable threading in non-worktree case", 12-12-2011), due to
perfomance drops. 

To regain performance, this series work on the object reading code so
that zlib inflation may be performed in parallel. This is a good hotspot
for parallelism as, in some test cases[1], it accounts for up to 48% of
execution time. And besides that, inflation tasks are already
independent from one another.

Grep'ing 'abcd[02]' ("Regex 1") and '(static|extern) (int|double) \*'
("Regex 2") at chromium's repository[2], I got:

     Threads |   Regex 1  |  Regex 2
    ---------|------------|-----------
        1    |  17.5815s  | 21.7217s
        2    |   9.7983s  | 11.3965s
        8    |   6.3097s  |  6.9667s
    
These are all means of 30 executions after 2 warmup runs. All tests were
executed on a i7-7700HQ with 16GB of RAM and SSD. The output was also
validated against current git-grep.

I still want to repeat the test in an HDD machine and in a repo with
mainly loose objects.

There're still some open issues thought:

1) To allow parallel inflation, the `obj_read_mutex` is released at
`unpack_compressed_entry()` and `get_size_from_delta()` right before
calling `git_inflate()` (and re-acquired right after). For git-grep it
seems to be OK. But, besides git-grep, if it's wanted to read objects
and perform other operations in parallel, we probably cannot guarantee
that the lock will be held by the current thread when it gets to those
two functions. Also, as the lock does only protect this call tree, if
those other operations access the same global states through other
paths, we may also have race conditions.

2) git-grep operations on textconv and submodules are still
unprotected and should result in race conditions. I'm not sure if
having two separate mutexes for them would fix it as the latter also has
to access the object store.

Any comments on the series and open issues will be highly appreciated.

[1]: https://matheustavares.gitlab.io/posts/week-6-working-at-zlib-inflation#multithreading-zlib-inflation 
[2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
     04-06-2019), after a 'git gc' execution.

travis build: https://travis-ci.org/matheustavares/git/builds/561200398


Matheus Tavares (3):
  object-store: make read_object_file_extended() thread-safe
  grep: remove locks on object reading operations
  grep: re-enable threads in non-worktree case

 builtin/grep.c | 20 ++++--------------
 grep.c         |  3 ---
 object-store.h |  4 ++++
 packfile.c     |  7 +++++++
 sha1-file.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 66 insertions(+), 24 deletions(-)

-- 
2.22.0


             reply	other threads:[~2019-07-19 23:09 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-19 23:08 Matheus Tavares [this message]
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 1/3] object-store: make read_object_file_extended() thread-safe Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 2/3] grep: remove locks on object reading operations Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 3/3] grep: re-enable threads in non-worktree case Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1563570204.git.matheus.bernardino@usp.br \
    --to=matheus.bernardino@usp.br \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=kernel-usp@googlegroups.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).