* [GSoC][RFC WIP PATCH 1/3] object-store: make read_object_file_extended() thread-safe
2019-07-19 23:08 [GSoC][RFC WIP PATCH 0/3] grep: allow parallelism in zlib inflation Matheus Tavares
@ 2019-07-19 23:08 ` Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 2/3] grep: remove locks on object reading operations Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 3/3] grep: re-enable threads in non-worktree case Matheus Tavares
2 siblings, 0 replies; 4+ messages in thread
From: Matheus Tavares @ 2019-07-19 23:08 UTC (permalink / raw)
To: git
Cc: Christian Couder, Olga Telezhnaya,
Nguyễn Thái Ngọc Duy, kernel-usp, Stefan Beller,
Jeff King, Denton Liu, brian m. carlson, Junio C Hamano
Allow read_object_file_extended() to be called by multiple threads
protecting it with a lock. The lock usage can be toggled with
enable_obj_read_lock() and disable_obj_read_lock().
Probably there are many spots in read_object_file_extended()'s call
chain that could be executed unlocked (and thus, in parallel). But, just
to make sure, let's protect everthing for now, and go refining the lock
step-by-step in the future.
The only (and perhaps most important) exception is git_inflate. Since it
is already thread-safe and takes a significant amount of time, the lock
is released when entering this function, so that it can be performed in
parallel. This should already bring good performance, because of
inflation's time cost.
Note that add_delta_base_cache() was also modified to skip adding
already present cache entries. This wouldn't happen in the past, but now
it's possible as phase I and phase III of unpack_entry() may execute
concurrently.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
object-store.h | 4 ++++
packfile.c | 7 +++++++
sha1-file.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 62 insertions(+), 5 deletions(-)
diff --git a/object-store.h b/object-store.h
index 49f56ab8d9..8330ff3988 100644
--- a/object-store.h
+++ b/object-store.h
@@ -157,6 +157,10 @@ const char *loose_object_path(struct repository *r, struct strbuf *buf,
void *map_loose_object(struct repository *r, const struct object_id *oid,
unsigned long *size);
+void enable_obj_read_lock(void);
+void disable_obj_read_lock(void);
+void obj_read_lock(void);
+void obj_read_unlock(void);
void *read_object_file_extended(struct repository *r,
const struct object_id *oid,
enum object_type *type,
diff --git a/packfile.c b/packfile.c
index c0d83fdfed..a50be4e5e5 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1087,7 +1087,9 @@ unsigned long get_size_from_delta(struct packed_git *p,
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
+ obj_read_unlock();
st = git_inflate(&stream, Z_FINISH);
+ obj_read_lock();
curpos += stream.next_in - in;
} while ((st == Z_OK || st == Z_BUF_ERROR) &&
stream.total_out < sizeof(delta_head));
@@ -1440,6 +1442,9 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
struct list_head *lru, *tmp;
+ if (get_delta_base_cache_entry(p, base_offset))
+ return;
+
delta_base_cached += base_size;
list_for_each_safe(lru, tmp, &delta_base_cache_lru) {
@@ -1569,7 +1574,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
+ obj_read_unlock();
st = git_inflate(&stream, Z_FINISH);
+ obj_read_lock();
if (!stream.avail_out)
break; /* the payload is larger than it should be */
curpos += stream.next_in - in;
diff --git a/sha1-file.c b/sha1-file.c
index 888b6024d5..37cde4a494 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1463,16 +1463,49 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type,
return 0;
}
+static pthread_mutex_t obj_read_mutex;
+static int obj_read_use_lock = 0;
+
+void enable_obj_read_lock(void)
+{
+ if (obj_read_use_lock)
+ return;
+
+ obj_read_use_lock = 1;
+ pthread_mutex_init(&obj_read_mutex, NULL);
+}
+
+void disable_obj_read_lock(void)
+{
+ if (!obj_read_use_lock)
+ return;
+
+ obj_read_use_lock = 0;
+ pthread_mutex_destroy(&obj_read_mutex);
+}
+
+void obj_read_lock(void)
+{
+ if(obj_read_use_lock)
+ pthread_mutex_lock(&obj_read_mutex);
+}
+
+void obj_read_unlock(void)
+{
+ if(obj_read_use_lock)
+ pthread_mutex_unlock(&obj_read_mutex);
+}
+
/*
* This function dies on corrupt objects; the callers who want to
* deal with them should arrange to call read_object() and give error
* messages themselves.
*/
-void *read_object_file_extended(struct repository *r,
- const struct object_id *oid,
- enum object_type *type,
- unsigned long *size,
- int lookup_replace)
+static void *do_read_object_file_extended(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size,
+ int lookup_replace)
{
void *data;
const struct packed_git *p;
@@ -1505,6 +1538,19 @@ void *read_object_file_extended(struct repository *r,
return NULL;
}
+void *read_object_file_extended(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size,
+ int lookup_replace)
+{
+ void *data;
+ obj_read_lock();
+ data = do_read_object_file_extended(r, oid, type, size, lookup_replace);
+ obj_read_unlock();
+ return data;
+}
+
void *read_object_with_reference(const struct object_id *oid,
const char *required_type_name,
unsigned long *size,
--
2.22.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [GSoC][RFC WIP PATCH 2/3] grep: remove locks on object reading operations
2019-07-19 23:08 [GSoC][RFC WIP PATCH 0/3] grep: allow parallelism in zlib inflation Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 1/3] object-store: make read_object_file_extended() thread-safe Matheus Tavares
@ 2019-07-19 23:08 ` Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 3/3] grep: re-enable threads in non-worktree case Matheus Tavares
2 siblings, 0 replies; 4+ messages in thread
From: Matheus Tavares @ 2019-07-19 23:08 UTC (permalink / raw)
To: git
Cc: Christian Couder, Olga Telezhnaya,
Nguyễn Thái Ngọc Duy, kernel-usp, Junio C Hamano,
Stefan Beller, brian m. carlson, Brandon Williams, Jeff King
git-grep has locks around the object reading operations it performs. But
these have an internal lock now, which can be enabled through
enable_obj_read_lock(). So let's use it and drop git-grep's ones.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/grep.c | 18 +++---------------
grep.c | 3 ---
2 files changed, 3 insertions(+), 18 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 580fd38f41..682e2461d0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -206,6 +206,7 @@ static void start_threads(struct grep_opt *opt)
pthread_cond_init(&cond_write, NULL);
pthread_cond_init(&cond_result, NULL);
grep_use_locks = 1;
+ enable_obj_read_lock();
for (i = 0; i < ARRAY_SIZE(todo); i++) {
strbuf_init(&todo[i].out, 0);
@@ -263,6 +264,7 @@ static int wait_all(void)
pthread_cond_destroy(&cond_write);
pthread_cond_destroy(&cond_result);
grep_use_locks = 0;
+ disable_obj_read_lock();
return hit;
}
@@ -295,16 +297,6 @@ static int grep_cmd_config(const char *var, const char *value, void *cb)
return st;
}
-static void *lock_and_read_oid_file(const struct object_id *oid, enum object_type *type, unsigned long *size)
-{
- void *data;
-
- grep_read_lock();
- data = read_object_file(oid, type, size);
- grep_read_unlock();
- return data;
-}
-
static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
const char *filename, int tree_name_len,
const char *path)
@@ -457,10 +449,8 @@ static int grep_submodule(struct grep_opt *opt,
object = parse_object_or_die(oid, oid_to_hex(oid));
- grep_read_lock();
data = read_object_with_reference(&object->oid, tree_type,
&size, NULL);
- grep_read_unlock();
if (!data)
die(_("unable to read tree (%s)"), oid_to_hex(&object->oid));
@@ -585,7 +575,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
void *data;
unsigned long size;
- data = lock_and_read_oid_file(&entry.oid, &type, &size);
+ data = read_object_file(&entry.oid, &type, &size);
if (!data)
die(_("unable to read tree (%s)"),
oid_to_hex(&entry.oid));
@@ -622,10 +612,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
struct strbuf base;
int hit, len;
- grep_read_lock();
data = read_object_with_reference(&obj->oid, tree_type,
&size, NULL);
- grep_read_unlock();
if (!data)
die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
diff --git a/grep.c b/grep.c
index f7c3a5803e..984a9e99cf 100644
--- a/grep.c
+++ b/grep.c
@@ -2109,10 +2109,7 @@ static int grep_source_load_oid(struct grep_source *gs)
{
enum object_type type;
- grep_read_lock();
gs->buf = read_object_file(gs->identifier, &type, &gs->size);
- grep_read_unlock();
-
if (!gs->buf)
return error(_("'%s': unable to read %s"),
gs->name,
--
2.22.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [GSoC][RFC WIP PATCH 3/3] grep: re-enable threads in non-worktree case
2019-07-19 23:08 [GSoC][RFC WIP PATCH 0/3] grep: allow parallelism in zlib inflation Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 1/3] object-store: make read_object_file_extended() thread-safe Matheus Tavares
2019-07-19 23:08 ` [GSoC][RFC WIP PATCH 2/3] grep: remove locks on object reading operations Matheus Tavares
@ 2019-07-19 23:08 ` Matheus Tavares
2 siblings, 0 replies; 4+ messages in thread
From: Matheus Tavares @ 2019-07-19 23:08 UTC (permalink / raw)
To: git
Cc: Christian Couder, Olga Telezhnaya,
Nguyễn Thái Ngọc Duy, kernel-usp, Jeff King,
Junio C Hamano, Brandon Williams
They were disabled at 53b8d93 ("grep: disable threading in non-worktree
case", 12-12-2011), due to observable perfomance drops. But now that
zlib inflation can be performed in parallel, we can regain the speedup.
Grep'ing 'abcd[02]' ("Regex 1") and '(static|extern) (int|double) \*'
("Regex 2") at chromium's repository[1] I got:
Threads | Regex 1 | Regex 2
---------|------------|-----------
1 | 17.5815s | 21.7217s
2 | 9.7983s | 11.3965s
8 | 6.3097s | 6.9667s
These are all means of 30 executions after 2 warmup runs. All tests were
executed on a i7-7700HQ with 16GB of RAM and SSD.
[1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
04-06-2019), after a 'git gc' execution.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
builtin/grep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 682e2461d0..9309dea833 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1046,7 +1046,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
pathspec.recursive = 1;
pathspec.recurse_submodules = !!recurse_submodules;
- if (list.nr || cached || show_in_pager) {
+ if (show_in_pager) {
if (num_threads > 1)
warning(_("invalid option combination, ignoring --threads"));
num_threads = 1;
--
2.22.0
^ permalink raw reply related [flat|nested] 4+ messages in thread