* [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' @ 2020-06-30 17:17 Taylor Blau 2020-06-30 17:17 ` [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places Taylor Blau ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Taylor Blau @ 2020-06-30 17:17 UTC (permalink / raw) To: git; +Cc: peff, dstolee Hi, Here are some patches that we have been using at GitHub to control whether or not Bloom filters stored in commit-graphs are read during normal operation. We're planning on using these patches as part of a two-phase roll-out of changed-path Bloom filters, where the first phase conditions whether or not repositories *write* Bloom filters, and the second phase (controlled via the new 'core.useBloomFilters') controls whether repositories *read* their Bloom filters. This can also be handy for debugging purposes, say, for e.g., if Bloom filters are suspected to be corrupt, they can be softly disabled without dropping the rest of the data in the commit-graph. Thanks in advance for your review. -Taylor Taylor Blau (3): commit-graph: pass a 'struct repository *' in more places t4216: fix broken '&&'-chain commit-graph: respect 'core.useBloomFilters' Documentation/config/core.txt | 5 +++++ builtin/commit-graph.c | 2 +- commit-graph.c | 17 ++++++++++------- commit-graph.h | 4 +++- fuzz-commit-graph.c | 5 +++-- repo-settings.c | 3 +++ repository.h | 1 + t/helper/test-read-graph.c | 3 ++- t/t4216-log-bloom.sh | 6 ++++-- 9 files changed, 32 insertions(+), 14 deletions(-) -- 2.27.0.224.g4cfa086e50 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places 2020-06-30 17:17 [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau @ 2020-06-30 17:17 ` Taylor Blau 2020-06-30 20:52 ` Derrick Stolee 2020-06-30 17:17 ` [PATCH 2/3] t4216: fix broken '&&'-chain Taylor Blau ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 17:17 UTC (permalink / raw) To: git; +Cc: peff, dstolee In a future commit, some commit-graph internals will want access to 'r->settings', but we only have the 'struct object_directory *' corresponding to that repository. Add an additional parameter to pass the repository around in more places. In the next patch, we will remove the object directory (and instead reference it with 'r->odb'). Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/commit-graph.c | 2 +- commit-graph.c | 13 ++++++++----- commit-graph.h | 4 +++- fuzz-commit-graph.c | 5 +++-- 4 files changed, 15 insertions(+), 9 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 75455da138..7c0f98531d 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -106,7 +106,7 @@ static int graph_verify(int argc, const char **argv) FREE_AND_NULL(graph_name); if (open_ok) - graph = load_commit_graph_one_fd_st(fd, &st, odb); + graph = load_commit_graph_one_fd_st(fd, &st, the_repository, odb); else graph = read_commit_graph_one(the_repository, odb); diff --git a/commit-graph.c b/commit-graph.c index 2ff042fbf4..fdfb0888f0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -170,6 +170,7 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st) } struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, + struct repository *r, struct object_directory *odb) { void *graph_map; @@ -185,7 +186,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, } graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - ret = parse_commit_graph(graph_map, graph_size); + ret = parse_commit_graph(r, graph_map, graph_size); if (ret) ret->odb = odb; @@ -225,7 +226,8 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size) { const unsigned char *data, *chunk_lookup; uint32_t i; @@ -396,6 +398,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size) } static struct commit_graph *load_commit_graph_one(const char *graph_file, + struct repository *r, struct object_directory *odb) { @@ -407,7 +410,7 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file, if (!open_ok) return NULL; - g = load_commit_graph_one_fd_st(fd, &st, odb); + g = load_commit_graph_one_fd_st(fd, &st, r, odb); if (g) g->filename = xstrdup(graph_file); @@ -419,7 +422,7 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, struct object_directory *odb) { char *graph_name = get_commit_graph_filename(odb); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(graph_name, r, odb); free(graph_name); return g; @@ -500,7 +503,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, valid = 0; for (odb = r->objects->odb; odb; odb = odb->next) { char *graph_name = get_split_graph_filename(odb, line.buf); - struct commit_graph *g = load_commit_graph_one(graph_name, odb); + struct commit_graph *g = load_commit_graph_one(graph_name, r, odb); free(graph_name); diff --git a/commit-graph.h b/commit-graph.h index 3ba0da1e5f..03d848e168 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -76,10 +76,12 @@ struct commit_graph { }; struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, + struct repository *r, struct object_directory *odb); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index 430817214d..e7cf6d5b0f 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -1,7 +1,8 @@ #include "commit-graph.h" #include "repository.h" -struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph(struct repository *r, + void *graph_map, size_t graph_size); int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size); @@ -10,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph((void *)data, size); + g = parse_commit_graph(the_repository, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); -- 2.27.0.224.g4cfa086e50 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places 2020-06-30 17:17 ` [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places Taylor Blau @ 2020-06-30 20:52 ` Derrick Stolee 0 siblings, 0 replies; 16+ messages in thread From: Derrick Stolee @ 2020-06-30 20:52 UTC (permalink / raw) To: Taylor Blau, git; +Cc: peff, dstolee On 6/30/2020 1:17 PM, Taylor Blau wrote: > In a future commit, some commit-graph internals will want access to > 'r->settings', but we only have the 'struct object_directory *' > corresponding to that repository. It is good to use "struct repository *" more. > Add an additional parameter to pass the repository around in more > places. In the next patch, we will remove the object directory (and > instead reference it with 'r->odb'). And this is a good reason why we need the repository here: we will need the settings AND odb. > diff --git a/commit-graph.h b/commit-graph.h > index 3ba0da1e5f..03d848e168 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -76,10 +76,12 @@ struct commit_graph { > }; > > struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st, > + struct repository *r, > struct object_directory *odb); I suppose my only nit is that the struct repository pointer is not always the first parameter. I understand that you have grouped it where the 'odb' parameter is here, so perhaps that is fine. Feel free to ignore this nit. Thanks, -Stolee ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 17:17 [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau 2020-06-30 17:17 ` [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places Taylor Blau @ 2020-06-30 17:17 ` Taylor Blau 2020-06-30 17:50 ` Eric Sunshine 2020-06-30 17:17 ` [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' Taylor Blau 2020-08-03 19:02 ` [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau 3 siblings, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 17:17 UTC (permalink / raw) To: git; +Cc: peff, dstolee In a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06), a 'rm' invocation was added without a corresponding '&&' chain. This ends up working fine when the file already exists, in which case 'rm' exits cleanly and the rest of the function executes normally. When the file does _not_ exist, however, 'rm' returns an unclean exit code, causing the function to terminate. Fix this by making the test use an '&&'-chain, and passing '-f' to ignore missing files (as can be the case when specifying which tests are '--run'). Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c855bcd3e7..0b4cc4f8d1 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } -- 2.27.0.224.g4cfa086e50 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 17:17 ` [PATCH 2/3] t4216: fix broken '&&'-chain Taylor Blau @ 2020-06-30 17:50 ` Eric Sunshine 2020-06-30 18:39 ` Taylor Blau 2020-06-30 18:55 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: Eric Sunshine @ 2020-06-30 17:50 UTC (permalink / raw) To: Taylor Blau; +Cc: Git List, Jeff King, Derrick Stolee On Tue, Jun 30, 2020 at 1:17 PM Taylor Blau <me@ttaylorr.com> wrote: > In a759bfa9ee (t4216: add end to end tests for git log with Bloom > filters, 2020-04-06), a 'rm' invocation was added without a > corresponding '&&' chain. > > This ends up working fine when the file already exists, in which case > 'rm' exits cleanly and the rest of the function executes normally. When > the file does _not_ exist, however, 'rm' returns an unclean exit code, > causing the function to terminate. This explanation makes no sense. Since this command was not part of the &&-chain, its failure would not cause the function to terminate prematurely nor would it affect the return value of the function. This explanation would make sense, however, if you're talking about the behavior _after_ fixing the broken &&-chain. > Fix this by making the test use an '&&'-chain, and passing '-f' to > ignore missing files (as can be the case when specifying which tests are > '--run'). The entire commit message is talking about implementation details and merely repeats what the subject and patch itself already say quite clearly; anyone familiar with 'rm' understands implicitly that '-f' must be added to incorporate it into the &&-chain if the file's presence is not guaranteed. Thus, you could drop the entire body of the commit message without losing clarity... With one minor exception: What is much more interesting for the reader to know is whether the file being removed is guaranteed to exist (in which case '-f' is unnecessary) or may be missing (requiring '-f'), and under what conditions it might be missing. The very last part of the last sentence of the current commit message gives a good hint about the latter, thus would be a good bit to retain. > diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh > @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF > sane_unset GIT_TRACE2_CONFIG_PARAMS Not related to this patch, but 'sane_unset' is pointless outside of a test since there is no &&-chain to maintain. Plain 'unset' would work just as well and be less misleading. > setup () { > - rm "$TRASH_DIRECTORY/trace.perf" > + rm -f "$TRASH_DIRECTORY/trace.perf" && ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 17:50 ` Eric Sunshine @ 2020-06-30 18:39 ` Taylor Blau 2020-06-30 19:03 ` Jeff King 2020-06-30 18:55 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 18:39 UTC (permalink / raw) To: Eric Sunshine; +Cc: Taylor Blau, Git List, Jeff King, Derrick Stolee On Tue, Jun 30, 2020 at 01:50:22PM -0400, Eric Sunshine wrote: > On Tue, Jun 30, 2020 at 1:17 PM Taylor Blau <me@ttaylorr.com> wrote: > > In a759bfa9ee (t4216: add end to end tests for git log with Bloom > > filters, 2020-04-06), a 'rm' invocation was added without a > > corresponding '&&' chain. > > > > This ends up working fine when the file already exists, in which case > > 'rm' exits cleanly and the rest of the function executes normally. When > > the file does _not_ exist, however, 'rm' returns an unclean exit code, > > causing the function to terminate. > > This explanation makes no sense. Since this command was not part of > the &&-chain, its failure would not cause the function to terminate > prematurely nor would it affect the return value of the function. This > explanation would make sense, however, if you're talking about the > behavior _after_ fixing the broken &&-chain. Fair enough. For what it's worth, this explanation *does* make sense if you 'set -e' beforehand, which I am accustomed to (and had incorrectly assumed that tests in 't' also have 'set -e', when they do not). I've corrected the patch and shortened it to account for your suggestions. Mind taking a look at the updated version and telling me what you think? --- >8 --- Subject: [PATCH] t4216: fix broken '&&'-chain The 'rm' added in a759bfa9ee (t4216: add end to end tests for git log with Bloom filters, 2020-04-06) should be placed within the function's '&&'-chain. The file being removed may not exist (for eg., in the case of '--run', in which case it may not be generated beforehand by a skipped test), and so add '-f' to account for the file's optional existence. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/t4216-log-bloom.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c855bcd3e7..0b4cc4f8d1 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS setup () { - rm "$TRASH_DIRECTORY/trace.perf" + rm -f "$TRASH_DIRECTORY/trace.perf" && git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom && GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom } -- 2.27.0.224.g4cfa086e50 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 18:39 ` Taylor Blau @ 2020-06-30 19:03 ` Jeff King 2020-06-30 19:12 ` Taylor Blau 2020-06-30 19:48 ` Eric Sunshine 0 siblings, 2 replies; 16+ messages in thread From: Jeff King @ 2020-06-30 19:03 UTC (permalink / raw) To: Taylor Blau; +Cc: Eric Sunshine, Git List, Derrick Stolee On Tue, Jun 30, 2020 at 02:39:28PM -0400, Taylor Blau wrote: > > > This ends up working fine when the file already exists, in which case > > > 'rm' exits cleanly and the rest of the function executes normally. When > > > the file does _not_ exist, however, 'rm' returns an unclean exit code, > > > causing the function to terminate. > > > > This explanation makes no sense. Since this command was not part of > > the &&-chain, its failure would not cause the function to terminate > > prematurely nor would it affect the return value of the function. This > > explanation would make sense, however, if you're talking about the > > behavior _after_ fixing the broken &&-chain. > > Fair enough. For what it's worth, this explanation *does* make sense if > you 'set -e' beforehand, which I am accustomed to (and had incorrectly > assumed that tests in 't' also have 'set -e', when they do not). If we _really_ want to nitpick, it probably wouldn't terminate under "set -e" because the call to "setup" is itself part of an &&-chain, which suppresses "-e" handling (which is one of the many confusing "set -e" behaviors that led us to avoid it in the first place). But definitely your revised commit message below is more accurate. However... > --- >8 --- > > Subject: [PATCH] t4216: fix broken '&&'-chain > > The 'rm' added in a759bfa9ee (t4216: add end to end tests for git log > with Bloom filters, 2020-04-06) should be placed within the function's > '&&'-chain. > > The file being removed may not exist (for eg., in the case of '--run', > in which case it may not be generated beforehand by a skipped test), and > so add '-f' to account for the file's optional existence. Is the &&-chain really broken, or is the first command simply not part of that chain? Perhaps a question for philosophers, but the more applied question here is: what are we improving, and why? The original code handled the fact that the file might not exist by not including its exit code in the &&-chain which leads to the function's return value. Your new code does so by putting it in the &&-chain but asking "rm" to ignore errors. Is one better than the other? I think so, but my argument would be more along the lines of: - without "-f", "rm" will complain about a missing file, which is distracting noise in the test log - once "-f" is added in to suppress that, we might as well add the command to the &&-chain. That's our normal style, so readers don't have to wonder if it's important or not. Plus it would help avoid a broken chain if more commands are added at the beginning of the function. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 19:03 ` Jeff King @ 2020-06-30 19:12 ` Taylor Blau 2020-06-30 19:19 ` Jeff King 2020-06-30 19:48 ` Eric Sunshine 1 sibling, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 19:12 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, Eric Sunshine, Git List, Derrick Stolee On Tue, Jun 30, 2020 at 03:03:25PM -0400, Jeff King wrote: > On Tue, Jun 30, 2020 at 02:39:28PM -0400, Taylor Blau wrote: > > > > > This ends up working fine when the file already exists, in which case > > > > 'rm' exits cleanly and the rest of the function executes normally. When > > > > the file does _not_ exist, however, 'rm' returns an unclean exit code, > > > > causing the function to terminate. > > > > > > This explanation makes no sense. Since this command was not part of > > > the &&-chain, its failure would not cause the function to terminate > > > prematurely nor would it affect the return value of the function. This > > > explanation would make sense, however, if you're talking about the > > > behavior _after_ fixing the broken &&-chain. > > > > Fair enough. For what it's worth, this explanation *does* make sense if > > you 'set -e' beforehand, which I am accustomed to (and had incorrectly > > assumed that tests in 't' also have 'set -e', when they do not). > > If we _really_ want to nitpick, it probably wouldn't terminate under > "set -e" because the call to "setup" is itself part of an &&-chain, > which suppresses "-e" handling (which is one of the many confusing "set > -e" behaviors that led us to avoid it in the first place). I learned something new about 'set -e'! I don't mind nitpicking at all, it's useful information to know... > But definitely your revised commit message below is more accurate. > > However... > > > --- >8 --- > > > > Subject: [PATCH] t4216: fix broken '&&'-chain > > > > The 'rm' added in a759bfa9ee (t4216: add end to end tests for git log > > with Bloom filters, 2020-04-06) should be placed within the function's > > '&&'-chain. > > > > The file being removed may not exist (for eg., in the case of '--run', > > in which case it may not be generated beforehand by a skipped test), and > > so add '-f' to account for the file's optional existence. > > Is the &&-chain really broken, or is the first command simply not part > of that chain? Perhaps a question for philosophers, but the more applied > question here is: what are we improving, and why? > > The original code handled the fact that the file might not exist by not > including its exit code in the &&-chain which leads to the function's > return value. Your new code does so by putting it in the &&-chain but > asking "rm" to ignore errors. Is one better than the other? > > I think so, but my argument would be more along the lines of: > > - without "-f", "rm" will complain about a missing file, which is > distracting noise in the test log > > - once "-f" is added in to suppress that, we might as well add the > command to the &&-chain. That's our normal style, so readers don't > have to wonder if it's important or not. Plus it would help avoid a > broken chain if more commands are added at the beginning of the > function. I made the change for basically these reasons, but mostly to bring this function into good style as with the rest of our test suite (there are a handful of other minor nits that we could look at, such as some odd spacing, etc.). Whether or not all of this needs to go into the commit message... I don't know. On the one hand, I think that your explanation here is clearer than what I wrote in the commit message, but on the other hand, I think that amending it again may be belaboring an otherwise simple change. If you feel strongly, though, I'm happy to send a revised patch. > -Peff Thanks, Taylor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 19:12 ` Taylor Blau @ 2020-06-30 19:19 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2020-06-30 19:19 UTC (permalink / raw) To: Taylor Blau; +Cc: Eric Sunshine, Git List, Derrick Stolee On Tue, Jun 30, 2020 at 03:12:31PM -0400, Taylor Blau wrote: > > I think so, but my argument would be more along the lines of: > > > > - without "-f", "rm" will complain about a missing file, which is > > distracting noise in the test log > > > > - once "-f" is added in to suppress that, we might as well add the > > command to the &&-chain. That's our normal style, so readers don't > > have to wonder if it's important or not. Plus it would help avoid a > > broken chain if more commands are added at the beginning of the > > function. > > I made the change for basically these reasons, but mostly to bring this > function into good style as with the rest of our test suite (there are a > handful of other minor nits that we could look at, such as some odd > spacing, etc.). > > Whether or not all of this needs to go into the commit message... I > don't know. On the one hand, I think that your explanation here is > clearer than what I wrote in the commit message, but on the other hand, > I think that amending it again may be belaboring an otherwise simple > change. > > If you feel strongly, though, I'm happy to send a revised patch. I agree it's a pretty trivial patch, but I think if it's worth applying at all, then it's worth justifying it appropriately. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 19:03 ` Jeff King 2020-06-30 19:12 ` Taylor Blau @ 2020-06-30 19:48 ` Eric Sunshine 1 sibling, 0 replies; 16+ messages in thread From: Eric Sunshine @ 2020-06-30 19:48 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, Git List, Derrick Stolee On Tue, Jun 30, 2020 at 3:03 PM Jeff King <peff@peff.net> wrote: > [...] what are we improving, and why? > > The original code handled the fact that the file might not exist by not > including its exit code in the &&-chain which leads to the function's > return value. Your new code does so by putting it in the &&-chain but > asking "rm" to ignore errors. Is one better than the other? > > I think so, but my argument would be more along the lines of: > > - without "-f", "rm" will complain about a missing file, which is > distracting noise in the test log Indeed, a nice detail when reading verbose test output; one less thing to distract the attention from the real/important problems. > - once "-f" is added in to suppress that, we might as well add the > command to the &&-chain. That's our normal style, so readers don't > have to wonder if it's important or not. Plus it would help avoid a > broken chain if more commands are added at the beginning of the > function. The bit about commands possibly being added at the beginning of the function probably deserves its own bullet point. I often (relatively speaking) cite that reason when asking people to &&-chain variable assignments at the beginning of a function. func () { a=1 && b=2 && ... } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] t4216: fix broken '&&'-chain 2020-06-30 17:50 ` Eric Sunshine 2020-06-30 18:39 ` Taylor Blau @ 2020-06-30 18:55 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: Jeff King @ 2020-06-30 18:55 UTC (permalink / raw) To: Eric Sunshine; +Cc: Taylor Blau, Git List, Derrick Stolee On Tue, Jun 30, 2020 at 01:50:22PM -0400, Eric Sunshine wrote: > > diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh > > @@ -53,7 +53,7 @@ sane_unset GIT_TRACE2_PERF_BRIEF > > sane_unset GIT_TRACE2_CONFIG_PARAMS > > Not related to this patch, but 'sane_unset' is pointless outside of a > test since there is no &&-chain to maintain. Plain 'unset' would work > just as well and be less misleading. We do this in lots of other places, too. Try: git grep sane_unset | grep -v '&&' Though seeing the variables they cover, I think many of them may come from the same few authors. :) I wonder if it is worth keeping these as sane_unset, though. Your comment is based on the knowledge that the "sane" part of the function is ignoring the return value. But we could conceivably have other portability fixes (it's not impossible that a shell wants one "unset" per variable, for example), in which case we'd want it in more places. That's hypothetical, of course, but saying "just use our portable unset wrapper that behaves sanely" seems like exactly what these call sites want, now and in a hypothetical future. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' 2020-06-30 17:17 [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau 2020-06-30 17:17 ` [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places Taylor Blau 2020-06-30 17:17 ` [PATCH 2/3] t4216: fix broken '&&'-chain Taylor Blau @ 2020-06-30 17:17 ` Taylor Blau 2020-06-30 19:18 ` Jeff King 2020-08-03 19:02 ` [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau 3 siblings, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 17:17 UTC (permalink / raw) To: git; +Cc: peff, dstolee Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'core.useBloomFilters' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/core.txt | 5 +++++ commit-graph.c | 4 ++-- repo-settings.c | 3 +++ repository.h | 1 + t/helper/test-read-graph.c | 3 ++- t/t4216-log-bloom.sh | 4 +++- 6 files changed, 16 insertions(+), 4 deletions(-) diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt index 74619a9c03..b146bf8d34 100644 --- a/Documentation/config/core.txt +++ b/Documentation/config/core.txt @@ -599,6 +599,11 @@ core.commitGraph:: to parse the graph structure of commits. Defaults to true. See linkgit:git-commit-graph[1] for more information. +core.useBloomFilters:: + If true, then git will use the changed-path Bloom filters in the + commit-graph file (if it exists, and they are present). Defaults to + true. See linkgit:git-commit-graph[1] for more information. + core.useReplaceRefs:: If set to `false`, behave as if the `--no-replace-objects` option was given on the command line. See linkgit:git[1] and diff --git a/commit-graph.c b/commit-graph.c index fdfb0888f0..03c00415c4 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -337,14 +337,14 @@ struct commit_graph *parse_commit_graph(struct repository *r, case GRAPH_CHUNKID_BLOOMINDEXES: if (graph->chunk_bloom_indexes) chunk_repeated = 1; - else + else if (r->settings.core_use_bloom_filters) graph->chunk_bloom_indexes = data + chunk_offset; break; case GRAPH_CHUNKID_BLOOMDATA: if (graph->chunk_bloom_data) chunk_repeated = 1; - else { + else if (r->settings.core_use_bloom_filters) { uint32_t hash_version; graph->chunk_bloom_data = data + chunk_offset; hash_version = get_be32(data + chunk_offset); diff --git a/repo-settings.c b/repo-settings.c index dc6817daa9..d8e3b1c61e 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,12 @@ void prepare_repo_settings(struct repository *r) if (!repo_config_get_bool(r, "core.commitgraph", &value)) r->settings.core_commit_graph = value; + if (!repo_config_get_bool(r, "core.usebloomfilters", &value)) + r->settings.core_use_bloom_filters = value; if (!repo_config_get_bool(r, "gc.writecommitgraph", &value)) r->settings.gc_write_commit_graph = value; UPDATE_DEFAULT_BOOL(r->settings.core_commit_graph, 1); + UPDATE_DEFAULT_BOOL(r->settings.core_use_bloom_filters, 1); UPDATE_DEFAULT_BOOL(r->settings.gc_write_commit_graph, 1); if (!repo_config_get_int(r, "index.version", &value)) diff --git a/repository.h b/repository.h index 3c1f7d54bd..cc61533122 100644 --- a/repository.h +++ b/repository.h @@ -29,6 +29,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int core_use_bloom_filters; int gc_write_commit_graph; int fetch_write_commit_graph; diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c index 6d0c962438..5f585a1725 100644 --- a/t/helper/test-read-graph.c +++ b/t/helper/test-read-graph.c @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) setup_git_directory(); odb = the_repository->objects->odb; + prepare_repo_settings(the_repository); + graph = read_commit_graph_one(the_repository, odb); if (!graph) return 1; - printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), *(unsigned char*)(graph->data + 4), diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 0b4cc4f8d1..b1a247477e 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -90,7 +90,9 @@ do "--ancestry-path side..master" do test_expect_success "git log option: $option for path: $path" ' - test_bloom_filters_used "$option -- $path" + test_bloom_filters_used "$option -- $path" && + test_config core.useBloomFilters false && + test_bloom_filters_not_used "$option -- $path" ' done done -- 2.27.0.224.g4cfa086e50 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' 2020-06-30 17:17 ` [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' Taylor Blau @ 2020-06-30 19:18 ` Jeff King 2020-06-30 19:27 ` Taylor Blau 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2020-06-30 19:18 UTC (permalink / raw) To: Taylor Blau; +Cc: git, dstolee On Tue, Jun 30, 2020 at 01:17:48PM -0400, Taylor Blau wrote: > Git uses the 'core.commitGraph' configuration value to control whether > or not the commit graph is used when parsing commits or performing a > traversal. I think this is a good thing to have, and the patch itself makes sense to me (this is actually my first time reviewing it, despite its intended use within GitHub :) ). If I may bikeshed for a moment: > Introduce 'core.useBloomFilters' to control whether or not Bloom filters > are read. Note that this configuration is independent from both: > > - 'core.commitGraph', to allow flexibility in using all parts of a > commit-graph _except_ for its Bloom filters. > > - The '--changed-paths' option for 'git commit-graph write', to allow > reading and writing Bloom filters to be controlled independently. Should we avoid exposing the user to the words "Bloom filter"? The command-line option for writing them was genericized to "changed-paths", which I think is good. The use of Bloom filters is an implementation detail. What the user cares about is whether we can optimize queries of which paths changed in a commit. When we introduced reachability bitmaps long ago, we made the mistake of just calling them "bitmaps". That jargon is well understood by people who work with that code, but it's confusing outside of that (even within other parts of Git) because bitmaps are just a generic data structure. You can have a bitmap of just about anything (and indeed we do use other bitmaps these days). Consistently calling them "reachability bitmaps", especially in the user facing bits, would have reduced confusion over the years. Similarly, Bloom filters are a generic structure we might use elsewhere. I don't really care if we use the word "Bloom" internally to refer to this feature, but we'll be stuck with this config option for all time. I think it's worth picking something more clear. It might even be worth considering whether "changed paths" needs more context (or would if we add new features in the future). On a "git commit-graph write" command-line it is perfectly clear, but would core.commitGraphChangedPaths be worth it? It's definitely more specific, but it's also way more ugly. ;) > diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c > index 6d0c962438..5f585a1725 100644 > --- a/t/helper/test-read-graph.c > +++ b/t/helper/test-read-graph.c > @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) > setup_git_directory(); > odb = the_repository->objects->odb; > > + prepare_repo_settings(the_repository); > + > graph = read_commit_graph_one(the_repository, odb); I wondered why we would need this prepare_repo_settings() now, when it should have been needed already to cover core.commitGraph already. I strongly suspect the answer is: "test-tool read-graph" never properly respected core.commitGraph in the first place. And now presumably it would. If true, I don't think any tests need adjusted because the only places we set it are: - on a "git -c" command line, which wouldn't run a test-tool helper - when we do set it, it is always to "true", which is the default anyway > if (!graph) > return 1; > > - > printf("header: %08x %d %d %d %d\n", > ntohl(*(uint32_t*)graph->data), > *(unsigned char*)(graph->data + 4), Oh good, I happened to be looking at this code earlier today for an unrelated reason and was bothered by this extra newline. :) -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' 2020-06-30 19:18 ` Jeff King @ 2020-06-30 19:27 ` Taylor Blau 2020-06-30 19:33 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Taylor Blau @ 2020-06-30 19:27 UTC (permalink / raw) To: Jeff King; +Cc: Taylor Blau, git, dstolee On Tue, Jun 30, 2020 at 03:18:34PM -0400, Jeff King wrote: > On Tue, Jun 30, 2020 at 01:17:48PM -0400, Taylor Blau wrote: > > > Git uses the 'core.commitGraph' configuration value to control whether > > or not the commit graph is used when parsing commits or performing a > > traversal. > > I think this is a good thing to have, and the patch itself makes sense > to me (this is actually my first time reviewing it, despite its intended > use within GitHub :) ). > > If I may bikeshed for a moment: > > > Introduce 'core.useBloomFilters' to control whether or not Bloom filters > > are read. Note that this configuration is independent from both: > > > > - 'core.commitGraph', to allow flexibility in using all parts of a > > commit-graph _except_ for its Bloom filters. > > > > - The '--changed-paths' option for 'git commit-graph write', to allow > > reading and writing Bloom filters to be controlled independently. > > Should we avoid exposing the user to the words "Bloom filter"? > > The command-line option for writing them was genericized to > "changed-paths", which I think is good. The use of Bloom filters is an > implementation detail. What the user cares about is whether we can > optimize queries of which paths changed in a commit. > > When we introduced reachability bitmaps long ago, we made the mistake of > just calling them "bitmaps". That jargon is well understood by people > who work with that code, but it's confusing outside of that (even within > other parts of Git) because bitmaps are just a generic data structure. > You can have a bitmap of just about anything (and indeed we do use other > bitmaps these days). Consistently calling them "reachability bitmaps", > especially in the user facing bits, would have reduced confusion over > the years. > > Similarly, Bloom filters are a generic structure we might use elsewhere. > I don't really care if we use the word "Bloom" internally to refer to > this feature, but we'll be stuck with this config option for all time. I > think it's worth picking something more clear. All good thoughts. I wondered about this, too, when writing the patch, but ultimately decided to expose the name since this is the only usage of Bloom filters within Git to date. Whether that will continue to be true, I'm not sure, so it probably isn't a great idea to lock ourselves into that decision within the 'core' namespace. So, I'm certainly open to changing it, although I'm not sure that I'm as worried about exposing the implementation detail as I am about squatting on Bloom filters within Git in general. I don't think that this configuration will end up getting used by folks other than server administrators and for debugging purposes, so those populations are already likely to be aware of changed-path Bloom filters beforehand. But, hiding the implementation detail seems like sane advice either way. > It might even be worth considering whether "changed paths" needs more > context (or would if we add new features in the future). On a "git > commit-graph write" command-line it is perfectly clear, but would > core.commitGraphChangedPaths be worth it? It's definitely more specific, > but it's also way more ugly. ;) Here's a third option what about 'graph.readChangedPaths'. I think that Stolee and I discussed a new top-level 'graph' section, since we now have a few commit-graph-related configuration variables in 'core'. That's a little shorter, and it adds the verb 'read', which is more descriptive than 'use' (I touch on this in the third patch, where I say that this configuration variable _doesn't_ affect the '--changed-path' option when writing). Either way, I'd love to hear your thoughts and others', too, to figure out what we think the most agreeable configuration name is. > > diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c > > index 6d0c962438..5f585a1725 100644 > > --- a/t/helper/test-read-graph.c > > +++ b/t/helper/test-read-graph.c > > @@ -12,11 +12,12 @@ int cmd__read_graph(int argc, const char **argv) > > setup_git_directory(); > > odb = the_repository->objects->odb; > > > > + prepare_repo_settings(the_repository); > > + > > graph = read_commit_graph_one(the_repository, odb); > > I wondered why we would need this prepare_repo_settings() now, when it > should have been needed already to cover core.commitGraph already. I > strongly suspect the answer is: "test-tool read-graph" never properly > respected core.commitGraph in the first place. Yep. Could probably be broken out into a separate patch (or mentioned as an aside in this one), but you're right: this helper did not respect any configuration that 'prepare_repo_settings' picks up. > And now presumably it would. If true, I don't think any tests need > adjusted because the only places we set it are: > > - on a "git -c" command line, which wouldn't run a test-tool helper > > - when we do set it, it is always to "true", which is the default > anyway > > > if (!graph) > > return 1; > > > > - > > printf("header: %08x %d %d %d %d\n", > > ntohl(*(uint32_t*)graph->data), > > *(unsigned char*)(graph->data + 4), > > Oh good, I happened to be looking at this code earlier today for an > unrelated reason and was bothered by this extra newline. :) I hoped that nobody would mine me sneaking this in ;-). > > -Peff Thanks, Taylor ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' 2020-06-30 19:27 ` Taylor Blau @ 2020-06-30 19:33 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2020-06-30 19:33 UTC (permalink / raw) To: Taylor Blau; +Cc: git, dstolee On Tue, Jun 30, 2020 at 03:27:18PM -0400, Taylor Blau wrote: > So, I'm certainly open to changing it, although I'm not sure that I'm as > worried about exposing the implementation detail as I am about squatting > on Bloom filters within Git in general. I don't think that this > configuration will end up getting used by folks other than server > administrators and for debugging purposes, so those populations are > already likely to be aware of changed-path Bloom filters beforehand. Yeah, the squatting thing is definitely my bigger concern (having been through the "bitmaps" version of the same thing). > > It might even be worth considering whether "changed paths" needs more > > context (or would if we add new features in the future). On a "git > > commit-graph write" command-line it is perfectly clear, but would > > core.commitGraphChangedPaths be worth it? It's definitely more specific, > > but it's also way more ugly. ;) > > Here's a third option what about 'graph.readChangedPaths'. I think that > Stolee and I discussed a new top-level 'graph' section, since we now > have a few commit-graph-related configuration variables in 'core'. Yes, I like that even better. Probably "graph" is sufficiently specific within Git's context, though I guess it _could_ bring to mind "git log --graph". So many overloaded terms. :) > That's a little shorter, and it adds the verb 'read', which is more > descriptive than 'use' (I touch on this in the third patch, where I say > that this configuration variable _doesn't_ affect the '--changed-path' > option when writing). Yeah, saying "read" specifically is much nicer. > > > + prepare_repo_settings(the_repository); > > > + > > > graph = read_commit_graph_one(the_repository, odb); > > > > I wondered why we would need this prepare_repo_settings() now, when it > > should have been needed already to cover core.commitGraph already. I > > strongly suspect the answer is: "test-tool read-graph" never properly > > respected core.commitGraph in the first place. > > Yep. Could probably be broken out into a separate patch (or mentioned as > an aside in this one), but you're right: this helper did not respect > any configuration that 'prepare_repo_settings' picks up. I'd probably just note it in the commit message, but I'd be fine with that or with a separate patch. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' 2020-06-30 17:17 [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau ` (2 preceding siblings ...) 2020-06-30 17:17 ` [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' Taylor Blau @ 2020-08-03 19:02 ` Taylor Blau 3 siblings, 0 replies; 16+ messages in thread From: Taylor Blau @ 2020-08-03 19:02 UTC (permalink / raw) To: Taylor Blau; +Cc: git, peff, dstolee On Tue, Jun 30, 2020 at 01:17:36PM -0400, Taylor Blau wrote: > Hi, > > Here are some patches that we have been using at GitHub to control > whether or not Bloom filters stored in commit-graphs are read during > normal operation. I took the suggestions raised here and incorporated them into a much-larger series that contains updated versions of these patches here: https://lore.kernel.org/git/20200803185947.GA67482@syl.lan/ This thread should be discarded, and future discussion redirected above. Thanks. > > -Taylor > > Taylor Blau (3): > commit-graph: pass a 'struct repository *' in more places > t4216: fix broken '&&'-chain > commit-graph: respect 'core.useBloomFilters' > > Documentation/config/core.txt | 5 +++++ > builtin/commit-graph.c | 2 +- > commit-graph.c | 17 ++++++++++------- > commit-graph.h | 4 +++- > fuzz-commit-graph.c | 5 +++-- > repo-settings.c | 3 +++ > repository.h | 1 + > t/helper/test-read-graph.c | 3 ++- > t/t4216-log-bloom.sh | 6 ++++-- > 9 files changed, 32 insertions(+), 14 deletions(-) > > -- > 2.27.0.224.g4cfa086e50 Thanks, Taylor ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-08-03 19:02 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-30 17:17 [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau 2020-06-30 17:17 ` [PATCH 1/3] commit-graph: pass a 'struct repository *' in more places Taylor Blau 2020-06-30 20:52 ` Derrick Stolee 2020-06-30 17:17 ` [PATCH 2/3] t4216: fix broken '&&'-chain Taylor Blau 2020-06-30 17:50 ` Eric Sunshine 2020-06-30 18:39 ` Taylor Blau 2020-06-30 19:03 ` Jeff King 2020-06-30 19:12 ` Taylor Blau 2020-06-30 19:19 ` Jeff King 2020-06-30 19:48 ` Eric Sunshine 2020-06-30 18:55 ` Jeff King 2020-06-30 17:17 ` [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' Taylor Blau 2020-06-30 19:18 ` Jeff King 2020-06-30 19:27 ` Taylor Blau 2020-06-30 19:33 ` Jeff King 2020-08-03 19:02 ` [PATCH 0/3] commit-graph: introduce 'core.useBloomFilters' Taylor Blau
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).