* [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() @ 2017-08-27 7:37 Martin Ågren 2017-08-27 15:41 ` Jeff King 2017-08-27 18:21 ` Lars Schneider 0 siblings, 2 replies; 43+ messages in thread From: Martin Ågren @ 2017-08-27 7:37 UTC (permalink / raw) To: git; +Cc: Lars Schneider, Jeff King The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add packet_write_fmt_gently()", 2016-10-16). As a result, for each call to packet_write_fmt_1, we allocate and leak a buffer. We could keep the strbuf non-static and instead make sure we always release it before returning (but not before we die, so that we don't touch errno). That would also prepare us for threaded use. But until that needs to happen, let's just restore the static-ness so that we get back to a situation where we (eventually) do not continuosly keep allocating memory. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- I waffled between "fixing the memory leak" by releasing the buffer and "fixing the static-ness" as in this patch. I can see arguments both ways and don't have any strong opinion. pkt-line.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pkt-line.c b/pkt-line.c index 7db911957..f364944b9 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -136,9 +136,10 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args) static int packet_write_fmt_1(int fd, int gently, const char *fmt, va_list args) { - struct strbuf buf = STRBUF_INIT; + static struct strbuf buf = STRBUF_INIT; ssize_t count; + strbuf_reset(&buf); format_packet(&buf, fmt, args); count = write_in_full(fd, buf.buf, buf.len); if (count == buf.len) -- 2.14.1.151.g45c1275a3.dirty ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 7:37 [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren @ 2017-08-27 15:41 ` Jeff King 2017-08-27 18:25 ` Jeff King 2017-08-27 18:21 ` Lars Schneider 1 sibling, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-27 15:41 UTC (permalink / raw) To: Martin Ågren; +Cc: git, Lars Schneider On Sun, Aug 27, 2017 at 09:37:32AM +0200, Martin Ågren wrote: > The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add > packet_write_fmt_gently()", 2016-10-16). As a result, for each call to > packet_write_fmt_1, we allocate and leak a buffer. > > We could keep the strbuf non-static and instead make sure we always > release it before returning (but not before we die, so that we don't > touch errno). That would also prepare us for threaded use. But until > that needs to happen, let's just restore the static-ness so that we get > back to a situation where we (eventually) do not continuosly keep > allocating memory. Ouch. So this means that git since v2.11 is basically leaking every non-byte pack sent by upload-pack (so all of the ref advertisement and want/have negotiation). I'm surprised nobody noticed the extra memory use, but I guess those things aren't usually _too_ big. > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > I waffled between "fixing the memory leak" by releasing the buffer and > "fixing the static-ness" as in this patch. I can see arguments both ways > and don't have any strong opinion. I think this is a good fix for now as it takes as back to the pre-bug state. The only downside with the static buffer is that it's not reentrant. Since the function is just inherently writing out the result and then forgetting it, in a single thread there's no opportunity for a sub-function to try writing another packet. And I don't think we have any code paths that write packets from multiple threads. That may be something we do eventually, but we can deal with it then (and until then, it's nice to avoid the malloc/free overhead). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 15:41 ` Jeff King @ 2017-08-27 18:25 ` Jeff King 0 siblings, 0 replies; 43+ messages in thread From: Jeff King @ 2017-08-27 18:25 UTC (permalink / raw) To: Martin Ågren; +Cc: git, Lars Schneider On Sun, Aug 27, 2017 at 11:41:52AM -0400, Jeff King wrote: > Ouch. So this means that git since v2.11 is basically leaking every > non-byte pack sent by upload-pack (so all of the ref advertisement and > want/have negotiation). Determined experimentally that the answer is: yes. The increase in ref advertisement is roughly linear with the size of your packed-refs, so for most people this wasn't a big deal (and even if you have insanely large packed-refs, it's "only" 2-3x the RAM; potentially a problem, but only if you're close to the tipping point already). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 7:37 [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren 2017-08-27 15:41 ` Jeff King @ 2017-08-27 18:21 ` Lars Schneider 2017-08-27 19:09 ` Martin Ågren 1 sibling, 1 reply; 43+ messages in thread From: Lars Schneider @ 2017-08-27 18:21 UTC (permalink / raw) To: Martin Ågren; +Cc: Git Users, Jeff King > On 27 Aug 2017, at 09:37, Martin Ågren <martin.agren@gmail.com> wrote: > > The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add > packet_write_fmt_gently()", 2016-10-16). As a result, for each call to > packet_write_fmt_1, we allocate and leak a buffer. Oh :-( Thanks for detecting and fixing the leak. How did you actually find it? Reading the code or did you use some tool? > We could keep the strbuf non-static and instead make sure we always > release it before returning (but not before we die, so that we don't > touch errno). That would also prepare us for threaded use. But until > that needs to happen, let's just restore the static-ness so that we get > back to a situation where we (eventually) do not continuosly keep > allocating memory. > > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > I waffled between "fixing the memory leak" by releasing the buffer and > "fixing the static-ness" as in this patch. I can see arguments both ways > and don't have any strong opinion. > > pkt-line.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/pkt-line.c b/pkt-line.c > index 7db911957..f364944b9 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -136,9 +136,10 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args) > static int packet_write_fmt_1(int fd, int gently, > const char *fmt, va_list args) > { > - struct strbuf buf = STRBUF_INIT; > + static struct strbuf buf = STRBUF_INIT; > ssize_t count; > > + strbuf_reset(&buf); > format_packet(&buf, fmt, args); > count = write_in_full(fd, buf.buf, buf.len); > if (count == buf.len) > -- > 2.14.1.151.g45c1275a3.dirty Looks good to me! - Lars ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 18:21 ` Lars Schneider @ 2017-08-27 19:09 ` Martin Ågren 2017-08-27 19:15 ` Jeff King 2017-08-27 20:04 ` Lars Schneider 0 siblings, 2 replies; 43+ messages in thread From: Martin Ågren @ 2017-08-27 19:09 UTC (permalink / raw) To: Lars Schneider; +Cc: Git Users, Jeff King On 27 August 2017 at 20:21, Lars Schneider <larsxschneider@gmail.com> wrote: > >> On 27 Aug 2017, at 09:37, Martin Ågren <martin.agren@gmail.com> wrote: >> >> The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add >> packet_write_fmt_gently()", 2016-10-16). As a result, for each call to >> packet_write_fmt_1, we allocate and leak a buffer. > > Oh :-( Thanks for detecting and fixing the leak. > > How did you actually find it? Reading the code or did you use some > tool? Valgrind found it for me. Most leaks it reports are "ok" since we're about to exit, it just takes more or less effort to realize that... Martin ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 19:09 ` Martin Ågren @ 2017-08-27 19:15 ` Jeff King 2017-08-27 20:04 ` Lars Schneider 1 sibling, 0 replies; 43+ messages in thread From: Jeff King @ 2017-08-27 19:15 UTC (permalink / raw) To: Martin Ågren; +Cc: Lars Schneider, Git Users On Sun, Aug 27, 2017 at 09:09:15PM +0200, Martin Ågren wrote: > On 27 August 2017 at 20:21, Lars Schneider <larsxschneider@gmail.com> wrote: > > > >> On 27 Aug 2017, at 09:37, Martin Ågren <martin.agren@gmail.com> wrote: > >> > >> The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add > >> packet_write_fmt_gently()", 2016-10-16). As a result, for each call to > >> packet_write_fmt_1, we allocate and leak a buffer. > > > > Oh :-( Thanks for detecting and fixing the leak. > > > > How did you actually find it? Reading the code or did you use some > > tool? > > Valgrind found it for me. Most leaks it reports are "ok" since we're > about to exit, it just takes more or less effort to realize that... This is one more thing it would be nice to have as part of our perf-testing framework. It will run two versions of Git across a battery of tests and report on the runtime differences for each. It would be great if it did the same for peak heap. The tricky thing is that you sometimes need repositories that are exaggerated in size in one dimension to notice the differences as significant. So I don't know if we would need new tests for this, or if existing "this repo has a lot of refs" tests would have caught this. Another approach, of course, is to get valgrind (or asan) to a zero-leaks-reported steady state, and then even small leak reports would be worth investigating. I'm not sure how easy it is to get there, and whether it's less work to actually plug free-on-exit leaks or somehow suppress the reports. I think most free-on-exit false positives would show up as "leaked but still reachable", whereas leaks like this that can grow without bound mean would not be reachable (we throw away the pointer to the memory at the end of the function). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 19:09 ` Martin Ågren 2017-08-27 19:15 ` Jeff King @ 2017-08-27 20:04 ` Lars Schneider 2017-08-27 23:23 ` Jeff King 1 sibling, 1 reply; 43+ messages in thread From: Lars Schneider @ 2017-08-27 20:04 UTC (permalink / raw) To: Martin Ågren; +Cc: Git Users, Jeff King > On 27 Aug 2017, at 21:09, Martin Ågren <martin.agren@gmail.com> wrote: > > On 27 August 2017 at 20:21, Lars Schneider <larsxschneider@gmail.com> wrote: >> >>> On 27 Aug 2017, at 09:37, Martin Ågren <martin.agren@gmail.com> wrote: >>> >>> The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add >>> packet_write_fmt_gently()", 2016-10-16). As a result, for each call to >>> packet_write_fmt_1, we allocate and leak a buffer. >> >> Oh :-( Thanks for detecting and fixing the leak. >> >> How did you actually find it? Reading the code or did you use some >> tool? > > Valgrind found it for me. Most leaks it reports are "ok" since we're > about to exit, it just takes more or less effort to realize that... I am sorry if I am bugging you here, but I would really like to understand how you found it. I did run all tests under valgrind using "make valgrind" and I found the following files with potential issues: cat valgrind.out | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c 7102 2 clone.c 33 common-main.c 6 connect.c 64 git.c 4 ls-remote.c 126 run-command.c 12 transport.c 7 worktree.c No mention of "pkt-line.c". Did you run Git with valgrind on one of your repositories to find it? Thanks, Lars ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 20:04 ` Lars Schneider @ 2017-08-27 23:23 ` Jeff King 2017-08-28 4:11 ` Martin Ågren 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-27 23:23 UTC (permalink / raw) To: Lars Schneider; +Cc: Martin Ågren, Git Users On Sun, Aug 27, 2017 at 10:04:55PM +0200, Lars Schneider wrote: > I did run all tests under valgrind using "make valgrind" and I found > the following files with potential issues: > > cat valgrind.out | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c > 7102 > 2 clone.c > 33 common-main.c > 6 connect.c > 64 git.c > 4 ls-remote.c > 126 run-command.c > 12 transport.c > 7 worktree.c I'm not sure where valgrind.out comes from. The individual test-results/*.out files may have valgrind output, but I don't think they usually contain leak output. Doing "valgrind ./git-upload-pack . </dev/null >/dev/null" mentions leaked memory but not the locations. Adding --leak-check=full shows that most of it comes from format_packet(). And applying Martin's patch drops the "definitely lost" category down to 0 bytes (there's still 550k in "still reachable", but those are in the "exit will free them for us" category). > No mention of "pkt-line.c". Did you run Git with valgrind on one of > your repositories to find it? I'm curious, too. I don't think the valgrind setup in our test suite is great for finding leaks right now. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-27 23:23 ` Jeff King @ 2017-08-28 4:11 ` Martin Ågren 2017-08-28 17:52 ` Stefan Beller 2017-08-29 17:51 ` Lars Schneider 0 siblings, 2 replies; 43+ messages in thread From: Martin Ågren @ 2017-08-28 4:11 UTC (permalink / raw) To: Jeff King; +Cc: Lars Schneider, Git Users On 28 August 2017 at 01:23, Jeff King <peff@peff.net> wrote: > On Sun, Aug 27, 2017 at 10:04:55PM +0200, Lars Schneider wrote: > >> I did run all tests under valgrind using "make valgrind" and I found >> the following files with potential issues: >> >> cat valgrind.out | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c >> 7102 >> 2 clone.c >> 33 common-main.c >> 6 connect.c >> 64 git.c >> 4 ls-remote.c >> 126 run-command.c >> 12 transport.c >> 7 worktree.c > > I'm not sure where valgrind.out comes from. The individual > test-results/*.out files may have valgrind output, but I don't think > they usually contain leak output. > > Doing "valgrind ./git-upload-pack . </dev/null >/dev/null" mentions > leaked memory but not the locations. Adding --leak-check=full shows that > most of it comes from format_packet(). > > And applying Martin's patch drops the "definitely lost" category down to > 0 bytes (there's still 550k in "still reachable", but those are in the > "exit will free them for us" category). > >> No mention of "pkt-line.c". Did you run Git with valgrind on one of >> your repositories to find it? > > I'm curious, too. I don't think the valgrind setup in our test suite is > great for finding leaks right now. Sorry for being brief. I've patched t/valgrind/valgrind.sh to say "--leak-check=yes". Then I run "./t0000 --valgrind", simply because running the complete suite gives more reports than I could possibly process. Then I check the first few leaks, verify that they're "ok" and add them to a suppressions-list. Lather, rinse, repeat. A couple of very targeted and well-motivated suppressions in git.git could perhaps be motivated, but there are many many reported leaks. My suppressions-list is getting gross. I started with t0000 and t0001 because I figure, once I have those basic suppressions in place (or leaks fixed), I can run some other more interesting tests. Of course, the concept of "this leak is ok" is a bit subjective. For example, we might do "return !!create_object(...);" (function name invented on the fly), which is a leak, and unreachable. But if this is only done once in builtin/foo.c and the object created is small, then this could be deemed "ok", since in practice this leak will never bring anyone over the cliff. If clean-ups in such code would not just be code churn, then I can of course adjust my definition of "ok" accordingly. This is not an attempt to find and fix a huge number of leaks, it's more to have a good reason to go through call-stacks, convince myself I know what the code wants to do and how it does it. Looking at only "unreachable" leaks seems like it should be an improvement for finding the interesting cases. I'll have less time for Git this week, but can try it out as time permits. Thanks for your feedback, both of you. Martin ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-28 4:11 ` Martin Ågren @ 2017-08-28 17:52 ` Stefan Beller 2017-08-28 17:58 ` Jeff King 2017-08-29 17:51 ` Lars Schneider 1 sibling, 1 reply; 43+ messages in thread From: Stefan Beller @ 2017-08-28 17:52 UTC (permalink / raw) To: Martin Ågren; +Cc: Jeff King, Lars Schneider, Git Users >>> No mention of "pkt-line.c". Did you run Git with valgrind on one of >>> your repositories to find it? >> >> I'm curious, too. I don't think the valgrind setup in our test suite is >> great for finding leaks right now. > This discussion comes up every once in a while, the last time I was involved in this discussion I proposed to have an "optional_free(void *)", which only frees memory in e.g. the developer build/debug build. That way we can have a strict "no leaks in developer build" policy (as it is easy to detect!), but it doesn't slow down the widely distributed version of Git. IIRC, then the discussion focused on "what if we misuse optional_free and the real free", such that the widely distributed version just leaks large amounts of memory and the developers do not notice, but quite the opposite(!) the developers would feel safe to ignore any memory complaints, because the developer build is clean. Stefan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-28 17:52 ` Stefan Beller @ 2017-08-28 17:58 ` Jeff King 2017-09-05 10:03 ` Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-28 17:58 UTC (permalink / raw) To: Stefan Beller; +Cc: Martin Ågren, Lars Schneider, Git Users On Mon, Aug 28, 2017 at 10:52:51AM -0700, Stefan Beller wrote: > >> I'm curious, too. I don't think the valgrind setup in our test suite is > >> great for finding leaks right now. > > This discussion comes up every once in a while, > the last time I was involved in this discussion I proposed > to have an "optional_free(void *)", which only frees memory > in e.g. the developer build/debug build. > > That way we can have a strict "no leaks in developer build" > policy (as it is easy to detect!), but it doesn't slow down the > widely distributed version of Git. Personally I am not that worried about slowing down program-exit with some free() calls (though I would reserve judgement to see how slow it actually is). It is that I do not want to deal with the complexity and maintenance headache of dropping values which are program-lifetime caches. If we introduce a free-all-object-structs function, now everybody needs to remember to call it (even if they didn't realize they were allocating object structs in the first place, as it may have happened under the hood of a sub-function). And we have to wonder what might happen when somebody calls that function _not_ at program exit. If we can declare "reachable things are not leaks" (and certainly valgrind is aware of that distinction), then all of that goes away. And you're just left with local variables in main() (or our cmd_* equivalents) that appear as leaks. And then we can solve that either by freeing them, or just calling them program-lifetime and telling the compiler that it's so by declaring them "static". -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-28 17:58 ` Jeff King @ 2017-09-05 10:03 ` Junio C Hamano 0 siblings, 0 replies; 43+ messages in thread From: Junio C Hamano @ 2017-09-05 10:03 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, Martin Ågren, Lars Schneider, Git Users Jeff King <peff@peff.net> writes: > On Mon, Aug 28, 2017 at 10:52:51AM -0700, Stefan Beller wrote: > >> >> I'm curious, too. I don't think the valgrind setup in our test suite is >> >> great for finding leaks right now. >> >> This discussion comes up every once in a while, >> the last time I was involved in this discussion I proposed >> to have an "optional_free(void *)", which only frees memory >> in e.g. the developer build/debug build. >> >> That way we can have a strict "no leaks in developer build" >> policy (as it is easy to detect!), but it doesn't slow down the >> widely distributed version of Git. > > Personally I am not that worried about slowing down program-exit with > some free() calls (though I would reserve judgement to see how slow it > actually is). > > It is that I do not want to deal with the complexity and maintenance > headache of dropping values which are program-lifetime caches. If we > introduce a free-all-object-structs function, now everybody needs to > remember to call it (even if they didn't realize they were allocating > object structs in the first place, as it may have happened under the > hood of a sub-function). And we have to wonder what might happen when > somebody calls that function _not_ at program exit. In addition, the code earlier may have a variable point at a compiled in literal string or an allocated string depending on the situation and it would have been perfectly fine to rely on exit() to clean it up. "We free all ourselves before exit()" would mean these codepaths now need to be updated to keep track of what needs to be and what must not be freed, or just duplicate everything to make the life of that "free everything" part easier, which somehow feels like a tail wagging a dog. > If we can declare "reachable things are not leaks" (and certainly > valgrind is aware of that distinction), then all of that goes away. And > you're just left with local variables in main() (or our cmd_* > equivalents) that appear as leaks. And then we can solve that either by > freeing them, or just calling them program-lifetime and telling the > compiler that it's so by declaring them "static". Yup. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-28 4:11 ` Martin Ågren 2017-08-28 17:52 ` Stefan Beller @ 2017-08-29 17:51 ` Lars Schneider 2017-08-29 18:53 ` Jeff King 1 sibling, 1 reply; 43+ messages in thread From: Lars Schneider @ 2017-08-29 17:51 UTC (permalink / raw) To: Martin Ågren; +Cc: Jeff King, Git Users > On 28 Aug 2017, at 06:11, Martin Ågren <martin.agren@gmail.com> wrote: > > On 28 August 2017 at 01:23, Jeff King <peff@peff.net> wrote: >> On Sun, Aug 27, 2017 at 10:04:55PM +0200, Lars Schneider wrote: >> >>> I did run all tests under valgrind using "make valgrind" and I found >>> the following files with potential issues: >>> >>> cat valgrind.out | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c >>> 7102 >>> 2 clone.c >>> 33 common-main.c >>> 6 connect.c >>> 64 git.c >>> 4 ls-remote.c >>> 126 run-command.c >>> 12 transport.c >>> 7 worktree.c >> >> I'm not sure where valgrind.out comes from. The individual >> test-results/*.out files may have valgrind output, but I don't think >> they usually contain leak output. >> >> Doing "valgrind ./git-upload-pack . </dev/null >/dev/null" mentions >> leaked memory but not the locations. Adding --leak-check=full shows that >> most of it comes from format_packet(). >> >> And applying Martin's patch drops the "definitely lost" category down to >> 0 bytes (there's still 550k in "still reachable", but those are in the >> "exit will free them for us" category). >> >>> No mention of "pkt-line.c". Did you run Git with valgrind on one of >>> your repositories to find it? >> >> I'm curious, too. I don't think the valgrind setup in our test suite is >> great for finding leaks right now. > > Sorry for being brief. I've patched t/valgrind/valgrind.sh to say > "--leak-check=yes". Then I run "./t0000 --valgrind", simply because > running the complete suite gives more reports than I could possibly > process. I set $TOOL_OPTIONS in valgrind.sh: to '--leak-check=full --errors-for-leak-kinds=definite' ... but I also had to adjust t/test-lib-functions.sh:test_create_repo as I ran into the error "cannot run git init -- have you built things yet?". With these changes I was able to see the leak running valgrind with t5110. --- What if we run a few selected tests with valgrind and count all files that valgrind mentions (a single leak has multiple file mentions because of the stack trace and other leak indicators). We record these counts and let TravisCI scream if one of the numbers increases. I wonder how stable/fragile such a metric would be as a simple refactoring could easily change these numbers. Below I ran valgrind on t5510 before and after Martin's patch. The diff below clearly shows the pkt-line leak. Would it make sense to pursue something like this in TravisCI to avoid "pkt-line" kind of leaks in the future? ## Valgrind run with leak $ ./t5510-fetch.sh --valgrind | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c 15529 39 abspath.c 30 add.c 34 branch.c 10 bundle.c 268 clone.c 13 commit.c 471 common-main.c 52 config.c 50 connect.c 4 connected.c 1 diff-lib.c 102 dir.c 120 environment.c 4 fetch-pack.c 47 fetch.c 14 files-backend.c 1 git-compat-util.h 871 git.c 96 init-db.c 26 iterator.c 1 list-objects.c 4 log-tree.c 2 object.c 1 pack-objects.c 6 parse-options.c 10 pathspec.c 83 pkt-line.c 6 precompose_utf8.c 2 push.c 67 refs.c 96 remote.c 90 repository.c 4 rev-list.c 10 revision.c 165 run-command.c 42 setup.c 288 strbuf.c 51 strbuf.h 2 tag.c 107 transport.c 10 tree-diff.c 85 upload-pack.c 1 usage.c 455 wrapper.c ## Valgrind run with Matrin's patch to fix the leak $ ./t5510-fetch.sh --valgrind | perl -nE 'say /^==.+\((.+)\:.+\)$/' | sort | uniq -c 14931 39 abspath.c 30 add.c 34 branch.c 10 bundle.c 268 clone.c 13 commit.c 433 common-main.c 52 config.c 50 connect.c 6 connected.c 1 diff-lib.c 102 dir.c 120 environment.c 4 fetch-pack.c 53 fetch.c 14 files-backend.c 1 git-compat-util.h 879 git.c 96 init-db.c 1 iterator.c 1 list-objects.c 4 log-tree.c 2 object.c 1 pack-objects.c 6 parse-options.c 10 pathspec.c 6 precompose_utf8.c 2 push.c 26 refs.c 96 remote.c 90 repository.c 6 rev-list.c 14 revision.c 171 run-command.c 42 setup.c 246 strbuf.c 50 strbuf.h 2 tag.c 107 transport.c 10 tree-diff.c 2 upload-pack.c 1 usage.c 415 wrapper.c ## Diff 15529 | 14931 39 abspath.c 39 abspath.c 30 add.c 30 add.c 34 branch.c 34 branch.c 10 bundle.c 10 bundle.c 268 clone.c 268 clone.c 13 commit.c 13 commit.c 471 common-main.c | 433 common-main.c 52 config.c 52 config.c 50 connect.c 50 connect.c 4 connected.c | 6 connected.c 1 diff-lib.c 1 diff-lib.c 102 dir.c 102 dir.c 120 environment.c 120 environment.c 4 fetch-pack.c 4 fetch-pack.c 47 fetch.c | 53 fetch.c 14 files-backend.c 14 files-backend.c 1 git-compat-util.h 1 git-compat-util.h 871 git.c | 879 git.c 96 init-db.c 96 init-db.c 26 iterator.c | 1 iterator.c 1 list-objects.c 1 list-objects.c 4 log-tree.c 4 log-tree.c 2 object.c 2 object.c 1 pack-objects.c 1 pack-objects.c 6 parse-options.c 6 parse-options.c 10 pathspec.c 10 pathspec.c 83 pkt-line.c < 6 precompose_utf8.c 6 precompose_utf8.c 2 push.c 2 push.c 67 refs.c | 26 refs.c 96 remote.c 96 remote.c 90 repository.c 90 repository.c 4 rev-list.c | 6 rev-list.c 10 revision.c | 14 revision.c 165 run-command.c | 171 run-command.c 42 setup.c 42 setup.c 288 strbuf.c | 246 strbuf.c 51 strbuf.h | 50 strbuf.h 2 tag.c 2 tag.c 107 transport.c 107 transport.c 10 tree-diff.c 10 tree-diff.c 85 upload-pack.c | 2 upload-pack.c 1 usage.c 1 usage.c 455 wrapper.c | 415 wrapp Thanks, Lars ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-29 17:51 ` Lars Schneider @ 2017-08-29 18:53 ` Jeff King 2017-08-29 18:58 ` [PATCH] config: use a static lock_file struct Jeff King 2017-08-29 19:22 ` [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren 0 siblings, 2 replies; 43+ messages in thread From: Jeff King @ 2017-08-29 18:53 UTC (permalink / raw) To: Lars Schneider; +Cc: Martin Ågren, Git Users On Tue, Aug 29, 2017 at 06:51:52PM +0100, Lars Schneider wrote: > I set $TOOL_OPTIONS in valgrind.sh: to > '--leak-check=full --errors-for-leak-kinds=definite' > > ... but I also had to adjust t/test-lib-functions.sh:test_create_repo > as I ran into the error "cannot run git init -- have you built things yet?". Yeah, this is a general problem with run-time analyzers. If we're not mostly error free than the setup code often breaks before you even get to the interesting part of the test. It looks like the config code has a minor-ish leak. Patch to follow. > What if we run a few selected tests with valgrind and count all files that > valgrind mentions (a single leak has multiple file mentions because of > the stack trace and other leak indicators). We record these counts and let > TravisCI scream if one of the numbers increases. > > I wonder how stable/fragile such a metric would be as a simple refactoring > could easily change these numbers. Below I ran valgrind on t5510 before and > after Martin's patch. The diff below clearly shows the pkt-line leak. > > Would it make sense to pursue something like this in TravisCI to avoid > "pkt-line" kind of leaks in the future? I don't think that would work, because simply adding new tests would bump the leak count, without the code actually growing worse. But think about your strategy for a moment: what you're really trying to do is say "these existing leaks are OK because they are too many for us to count, but we want to make sure we don't add _new_ ones". We already have two techniques for distinguishing old ones from new ones: 1. Mark existing ones with valgrind suppressions so they do not warn at all. 2. Fix them, so that the "existing" count drops to zero. Option (2), of course, has the added side effect that it's actually fixing potential problems. :) -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH] config: use a static lock_file struct 2017-08-29 18:53 ` Jeff King @ 2017-08-29 18:58 ` Jeff King 2017-08-29 19:09 ` Brandon Williams 2017-09-06 3:59 ` Junio C Hamano 2017-08-29 19:22 ` [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren 1 sibling, 2 replies; 43+ messages in thread From: Jeff King @ 2017-08-29 18:58 UTC (permalink / raw) To: Lars Schneider; +Cc: Martin Ågren, Git Users On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > It looks like the config code has a minor-ish leak. Patch to follow. Here it is. -- >8 -- Subject: [PATCH] config: use a static lock_file struct When modifying git config, we xcalloc() a struct lock_file but never free it. This is necessary because the tempfile code (upon which the locking code is built) requires that the resulting struct remain valid through the life of the program. However, it also confuses leak-checkers like valgrind because only the inner "struct tempfile" is still reachable; no pointer to the outer lock_file is kept. Other code paths solve this by using a single static lock struct. We can do the same here, because we know that we'll only lock and modify one config file at a time (and assertions within the lockfile code will ensure that this remains the case). That removes a real leak (when we fail to free the struct after locking fails) as well as removes the valgrind false positive. It also means that doing N sequential config-writes will use a constant amount of memory, rather than leaving stale lock_files for each. Note that since "lock" is no longer a pointer, it can't be NULL anymore. But that's OK. We used that feature only to avoid calling rollback_lock_file() on an already-committed lock. Since the lockfile code keeps its own "active" flag, it's a noop to rollback an inactive lock, and we don't have to worry about this ourselves. Signed-off-by: Jeff King <peff@peff.net> --- In the long run we may want to drop the "tempfiles must remain forever" rule. This is certainly not the first time it has caused confusion or leaks. And I don't think it's a fundamental issue, just the way the code is written. But in the interim, this fix is probably worth doing. config.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/config.c b/config.c index d0d8ce823a..1603f96e40 100644 --- a/config.c +++ b/config.c @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, { int fd = -1, in_fd = -1; int ret; - struct lock_file *lock = NULL; + static struct lock_file lock; char *filename_buf = NULL; char *contents = NULL; size_t contents_sz; @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, * The lock serves a purpose in addition to locking: the new * contents of .git/config will be written into it. */ - lock = xcalloc(1, sizeof(struct lock_file)); - fd = hold_lock_file_for_update(lock, config_filename, 0); + fd = hold_lock_file_for_update(&lock, config_filename, 0); if (fd < 0) { error_errno("could not lock config file %s", config_filename); free(store.key); @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, close(in_fd); in_fd = -1; - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { - error_errno("chmod on %s failed", get_lock_file_path(lock)); + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { + error_errno("chmod on %s failed", get_lock_file_path(&lock)); ret = CONFIG_NO_WRITE; goto out_free; } @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, contents = NULL; } - if (commit_lock_file(lock) < 0) { + if (commit_lock_file(&lock) < 0) { error_errno("could not write config file %s", config_filename); ret = CONFIG_NO_WRITE; - lock = NULL; goto out_free; } - /* - * lock is committed, so don't try to roll it back below. - * NOTE: Since lockfile.c keeps a linked list of all created - * lock_file structures, it isn't safe to free(lock). It's - * better to just leave it hanging around. - */ - lock = NULL; ret = 0; /* Invalidate the config cache */ git_config_clear(); out_free: - if (lock) - rollback_lock_file(lock); + rollback_lock_file(&lock); free(filename_buf); if (contents) munmap(contents, contents_sz); @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, return ret; write_err_out: - ret = write_error(get_lock_file_path(lock)); + ret = write_error(get_lock_file_path(&lock)); goto out_free; } -- 2.14.1.721.gc5bc1565f1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-29 18:58 ` [PATCH] config: use a static lock_file struct Jeff King @ 2017-08-29 19:09 ` Brandon Williams 2017-08-29 19:10 ` Brandon Williams 2017-08-29 19:12 ` Jeff King 2017-09-06 3:59 ` Junio C Hamano 1 sibling, 2 replies; 43+ messages in thread From: Brandon Williams @ 2017-08-29 19:09 UTC (permalink / raw) To: Jeff King; +Cc: Lars Schneider, Martin Ågren, Git Users On 08/29, Jeff King wrote: > On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > > > It looks like the config code has a minor-ish leak. Patch to follow. > > Here it is. > > -- >8 -- > Subject: [PATCH] config: use a static lock_file struct > > When modifying git config, we xcalloc() a struct lock_file > but never free it. This is necessary because the tempfile > code (upon which the locking code is built) requires that > the resulting struct remain valid through the life of the > program. However, it also confuses leak-checkers like > valgrind because only the inner "struct tempfile" is still > reachable; no pointer to the outer lock_file is kept. Is this just due to a limitation in the tempfile code? Would it be possible to improve the tempfile code such that we don't need to require that a tempfile, once created, is required to exist for the remaining life of the program? > > Other code paths solve this by using a single static lock > struct. We can do the same here, because we know that we'll > only lock and modify one config file at a time (and > assertions within the lockfile code will ensure that this > remains the case). > > That removes a real leak (when we fail to free the struct > after locking fails) as well as removes the valgrind false > positive. It also means that doing N sequential > config-writes will use a constant amount of memory, rather > than leaving stale lock_files for each. > > Note that since "lock" is no longer a pointer, it can't be > NULL anymore. But that's OK. We used that feature only to > avoid calling rollback_lock_file() on an already-committed > lock. Since the lockfile code keeps its own "active" flag, > it's a noop to rollback an inactive lock, and we don't have > to worry about this ourselves. > > Signed-off-by: Jeff King <peff@peff.net> > --- > In the long run we may want to drop the "tempfiles must remain forever" > rule. This is certainly not the first time it has caused confusion or > leaks. And I don't think it's a fundamental issue, just the way the code > is written. But in the interim, this fix is probably worth doing. > > config.c | 24 +++++++----------------- > 1 file changed, 7 insertions(+), 17 deletions(-) > > diff --git a/config.c b/config.c > index d0d8ce823a..1603f96e40 100644 > --- a/config.c > +++ b/config.c > @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > { > int fd = -1, in_fd = -1; > int ret; > - struct lock_file *lock = NULL; > + static struct lock_file lock; > char *filename_buf = NULL; > char *contents = NULL; > size_t contents_sz; > @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > * The lock serves a purpose in addition to locking: the new > * contents of .git/config will be written into it. > */ > - lock = xcalloc(1, sizeof(struct lock_file)); > - fd = hold_lock_file_for_update(lock, config_filename, 0); > + fd = hold_lock_file_for_update(&lock, config_filename, 0); > if (fd < 0) { > error_errno("could not lock config file %s", config_filename); > free(store.key); > @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > close(in_fd); > in_fd = -1; > > - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { > - error_errno("chmod on %s failed", get_lock_file_path(lock)); > + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { > + error_errno("chmod on %s failed", get_lock_file_path(&lock)); > ret = CONFIG_NO_WRITE; > goto out_free; > } > @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > contents = NULL; > } > > - if (commit_lock_file(lock) < 0) { > + if (commit_lock_file(&lock) < 0) { > error_errno("could not write config file %s", config_filename); > ret = CONFIG_NO_WRITE; > - lock = NULL; > goto out_free; > } > > - /* > - * lock is committed, so don't try to roll it back below. > - * NOTE: Since lockfile.c keeps a linked list of all created > - * lock_file structures, it isn't safe to free(lock). It's > - * better to just leave it hanging around. > - */ > - lock = NULL; > ret = 0; > > /* Invalidate the config cache */ > git_config_clear(); > > out_free: > - if (lock) > - rollback_lock_file(lock); > + rollback_lock_file(&lock); > free(filename_buf); > if (contents) > munmap(contents, contents_sz); > @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > return ret; > > write_err_out: > - ret = write_error(get_lock_file_path(lock)); > + ret = write_error(get_lock_file_path(&lock)); > goto out_free; > > } > -- > 2.14.1.721.gc5bc1565f1 > -- Brandon Williams ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-29 19:09 ` Brandon Williams @ 2017-08-29 19:10 ` Brandon Williams 2017-08-29 19:12 ` Jeff King 1 sibling, 0 replies; 43+ messages in thread From: Brandon Williams @ 2017-08-29 19:10 UTC (permalink / raw) To: Jeff King; +Cc: Lars Schneider, Martin Ågren, Git Users On 08/29, Brandon Williams wrote: > On 08/29, Jeff King wrote: > > On Tue, Aug 29, 2017 at 02:53:41PM -0400, Jeff King wrote: > > > > > It looks like the config code has a minor-ish leak. Patch to follow. > > > > Here it is. > > > > -- >8 -- > > Subject: [PATCH] config: use a static lock_file struct > > > > When modifying git config, we xcalloc() a struct lock_file > > but never free it. This is necessary because the tempfile > > code (upon which the locking code is built) requires that > > the resulting struct remain valid through the life of the > > program. However, it also confuses leak-checkers like > > valgrind because only the inner "struct tempfile" is still > > reachable; no pointer to the outer lock_file is kept. > > Is this just due to a limitation in the tempfile code? Would it be > possible to improve the tempfile code such that we don't need to require > that a tempfile, once created, is required to exist for the remaining > life of the program? > > > > > Other code paths solve this by using a single static lock > > struct. We can do the same here, because we know that we'll > > only lock and modify one config file at a time (and > > assertions within the lockfile code will ensure that this > > remains the case). > > > > That removes a real leak (when we fail to free the struct > > after locking fails) as well as removes the valgrind false > > positive. It also means that doing N sequential > > config-writes will use a constant amount of memory, rather > > than leaving stale lock_files for each. > > > > Note that since "lock" is no longer a pointer, it can't be > > NULL anymore. But that's OK. We used that feature only to > > avoid calling rollback_lock_file() on an already-committed > > lock. Since the lockfile code keeps its own "active" flag, > > it's a noop to rollback an inactive lock, and we don't have > > to worry about this ourselves. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > In the long run we may want to drop the "tempfiles must remain forever" > > rule. This is certainly not the first time it has caused confusion or > > leaks. And I don't think it's a fundamental issue, just the way the code > > is written. But in the interim, this fix is probably worth doing. I didn't read far enough apparently :) I took a look at this once and found that the in order to do this we would just need to be careful in modifying a list of tempfiles. > > > > config.c | 24 +++++++----------------- > > 1 file changed, 7 insertions(+), 17 deletions(-) > > > > diff --git a/config.c b/config.c > > index d0d8ce823a..1603f96e40 100644 > > --- a/config.c > > +++ b/config.c > > @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > > { > > int fd = -1, in_fd = -1; > > int ret; > > - struct lock_file *lock = NULL; > > + static struct lock_file lock; > > char *filename_buf = NULL; > > char *contents = NULL; > > size_t contents_sz; > > @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > > * The lock serves a purpose in addition to locking: the new > > * contents of .git/config will be written into it. > > */ > > - lock = xcalloc(1, sizeof(struct lock_file)); > > - fd = hold_lock_file_for_update(lock, config_filename, 0); > > + fd = hold_lock_file_for_update(&lock, config_filename, 0); > > if (fd < 0) { > > error_errno("could not lock config file %s", config_filename); > > free(store.key); > > @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > > close(in_fd); > > in_fd = -1; > > > > - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { > > - error_errno("chmod on %s failed", get_lock_file_path(lock)); > > + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { > > + error_errno("chmod on %s failed", get_lock_file_path(&lock)); > > ret = CONFIG_NO_WRITE; > > goto out_free; > > } > > @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > > contents = NULL; > > } > > > > - if (commit_lock_file(lock) < 0) { > > + if (commit_lock_file(&lock) < 0) { > > error_errno("could not write config file %s", config_filename); > > ret = CONFIG_NO_WRITE; > > - lock = NULL; > > goto out_free; > > } > > > > - /* > > - * lock is committed, so don't try to roll it back below. > > - * NOTE: Since lockfile.c keeps a linked list of all created > > - * lock_file structures, it isn't safe to free(lock). It's > > - * better to just leave it hanging around. > > - */ > > - lock = NULL; > > ret = 0; > > > > /* Invalidate the config cache */ > > git_config_clear(); > > > > out_free: > > - if (lock) > > - rollback_lock_file(lock); > > + rollback_lock_file(&lock); > > free(filename_buf); > > if (contents) > > munmap(contents, contents_sz); > > @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, > > return ret; > > > > write_err_out: > > - ret = write_error(get_lock_file_path(lock)); > > + ret = write_error(get_lock_file_path(&lock)); > > goto out_free; > > > > } > > -- > > 2.14.1.721.gc5bc1565f1 > > > > -- > Brandon Williams -- Brandon Williams ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-29 19:09 ` Brandon Williams 2017-08-29 19:10 ` Brandon Williams @ 2017-08-29 19:12 ` Jeff King 2017-08-30 3:25 ` Michael Haggerty 1 sibling, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-29 19:12 UTC (permalink / raw) To: Brandon Williams Cc: Michael Haggerty, Lars Schneider, Martin Ågren, Git Users On Tue, Aug 29, 2017 at 12:09:28PM -0700, Brandon Williams wrote: > > -- >8 -- > > Subject: [PATCH] config: use a static lock_file struct > > > > When modifying git config, we xcalloc() a struct lock_file > > but never free it. This is necessary because the tempfile > > code (upon which the locking code is built) requires that > > the resulting struct remain valid through the life of the > > program. However, it also confuses leak-checkers like > > valgrind because only the inner "struct tempfile" is still > > reachable; no pointer to the outer lock_file is kept. > > Is this just due to a limitation in the tempfile code? Would it be > possible to improve the tempfile code such that we don't need to require > that a tempfile, once created, is required to exist for the remaining > life of the program? Yes. Like I wrote below: > > --- > > In the long run we may want to drop the "tempfiles must remain forever" > > rule. This is certainly not the first time it has caused confusion or > > leaks. And I don't think it's a fundamental issue, just the way the code > > is written. But in the interim, this fix is probably worth doing. The main issue is that the caller needs to make sure they're removed from the list (via commit or rollback) before being freed. As far as I know anyway. This restriction dates back to the very early days of the lockfile code and has been carried through the various tempfile-cleanup refactorings over the years (mostly because each was afraid to make functional changes). +cc Michael, who did most comprehensive cleanup of that code. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-29 19:12 ` Jeff King @ 2017-08-30 3:25 ` Michael Haggerty 2017-08-30 4:31 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Michael Haggerty @ 2017-08-30 3:25 UTC (permalink / raw) To: Jeff King, Brandon Williams; +Cc: Lars Schneider, Martin Ågren, Git Users On 08/29/2017 09:12 PM, Jeff King wrote: > On Tue, Aug 29, 2017 at 12:09:28PM -0700, Brandon Williams wrote: > >>> -- >8 -- >>> Subject: [PATCH] config: use a static lock_file struct >>> >>> When modifying git config, we xcalloc() a struct lock_file >>> but never free it. This is necessary because the tempfile >>> code (upon which the locking code is built) requires that >>> the resulting struct remain valid through the life of the >>> program. However, it also confuses leak-checkers like >>> valgrind because only the inner "struct tempfile" is still >>> reachable; no pointer to the outer lock_file is kept. >> >> Is this just due to a limitation in the tempfile code? Would it be >> possible to improve the tempfile code such that we don't need to require >> that a tempfile, once created, is required to exist for the remaining >> life of the program? > > Yes. Like I wrote below: > >>> --- >>> In the long run we may want to drop the "tempfiles must remain forever" >>> rule. This is certainly not the first time it has caused confusion or >>> leaks. And I don't think it's a fundamental issue, just the way the code >>> is written. But in the interim, this fix is probably worth doing. > > The main issue is that the caller needs to make sure they're removed > from the list (via commit or rollback) before being freed. > > As far as I know anyway. This restriction dates back to the very early > days of the lockfile code and has been carried through the various > tempfile-cleanup refactorings over the years (mostly because each was > afraid to make functional changes). > > +cc Michael, who did most comprehensive cleanup of that code. It was surprisingly hard trying to get that code to do the right thing, non-racily, in every error path. Since `remove_tempfiles()` can be called any time (even from a signal handler), the linked list of `tempfile` objects has to be kept valid at all times but can't use mutexes. I didn't have the energy to keep going and make the lock objects freeable. I suppose the task could be made a bit easier by using `sigprocmask(2)` or `pthread_sigmask(3)` to make its running environment a little bit less hostile. Michael ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 3:25 ` Michael Haggerty @ 2017-08-30 4:31 ` Jeff King 2017-08-30 4:55 ` Michael Haggerty 2017-08-30 4:55 ` Jeff King 0 siblings, 2 replies; 43+ messages in thread From: Jeff King @ 2017-08-30 4:31 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 05:25:18AM +0200, Michael Haggerty wrote: > >>> In the long run we may want to drop the "tempfiles must remain forever" > >>> rule. This is certainly not the first time it has caused confusion or > >>> leaks. And I don't think it's a fundamental issue, just the way the code > >>> is written. But in the interim, this fix is probably worth doing. > > > > The main issue is that the caller needs to make sure they're removed > > from the list (via commit or rollback) before being freed. > > > > As far as I know anyway. This restriction dates back to the very early > > days of the lockfile code and has been carried through the various > > tempfile-cleanup refactorings over the years (mostly because each was > > afraid to make functional changes). > > > > +cc Michael, who did most comprehensive cleanup of that code. > > It was surprisingly hard trying to get that code to do the right thing, > non-racily, in every error path. Since `remove_tempfiles()` can be > called any time (even from a signal handler), the linked list of > `tempfile` objects has to be kept valid at all times but can't use > mutexes. I didn't have the energy to keep going and make the lock > objects freeable. > > I suppose the task could be made a bit easier by using `sigprocmask(2)` > or `pthread_sigmask(3)` to make its running environment a little bit > less hostile. I think there are really two levels of carefulness here: 1. Avoiding complicated things during a signal handler that may rely on having a sane state from the rest of the program (e.g., half-formed entries, stdio locks, etc). 2. Being truly race-free in the face of a signal arriving while we're running arbitrary code that might have a tempfile struct in a funny state. I feel like right now we meet (1) and not (2). But I think if we keep to that lower bar of (1), it might not be that bad. We're assuming now that there's no race on the tempfile->active flag, for instance. We could probably make a similar assumption about putting items onto or taking them off of a linked list (it's not really atomic, but a single pointer assignment is probably "atomic enough" for our purposes). Or I dunno. There's a lot of "volatile" modifiers sprinkled around. Maybe those are enough to give us (2) as well (though in that case, I think we'd still be as OK with the list manipulation as we are with the active flag manipulation). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 4:31 ` Jeff King @ 2017-08-30 4:55 ` Michael Haggerty 2017-08-30 4:55 ` Jeff King 1 sibling, 0 replies; 43+ messages in thread From: Michael Haggerty @ 2017-08-30 4:55 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 6:31 AM, Jeff King <peff@peff.net> wrote: > On Wed, Aug 30, 2017 at 05:25:18AM +0200, Michael Haggerty wrote: >> It was surprisingly hard trying to get that code to do the right thing, >> non-racily, in every error path. Since `remove_tempfiles()` can be >> called any time (even from a signal handler), the linked list of >> `tempfile` objects has to be kept valid at all times but can't use >> mutexes. I didn't have the energy to keep going and make the lock >> objects freeable. >> >> I suppose the task could be made a bit easier by using `sigprocmask(2)` >> or `pthread_sigmask(3)` to make its running environment a little bit >> less hostile. > > I think there are really two levels of carefulness here: > > 1. Avoiding complicated things during a signal handler that may rely > on having a sane state from the rest of the program (e.g., > half-formed entries, stdio locks, etc). > > 2. Being truly race-free in the face of a signal arriving while we're > running arbitrary code that might have a tempfile struct in a funny > state. > > I feel like right now we meet (1) and not (2). But I think if we keep to > that lower bar of (1), it might not be that bad. We're assuming now that > there's no race on the tempfile->active flag, for instance. We could > probably make a similar assumption about putting items onto or taking > them off of a linked list (it's not really atomic, but a single pointer > assignment is probably "atomic enough" for our purposes). > > Or I dunno. There's a lot of "volatile" modifiers sprinkled around. > Maybe those are enough to give us (2) as well (though in that case, I > think we'd still be as OK with the list manipulation as we are with the > active flag manipulation). I did in fact strive for both (1) and (2), though I know that there are a couple of short races remaining in the code (plus who knows how many bugs). Actually, I think you're right that a single "probably-atomic" pointer assignment would be enough to remove an entry from the list safely. I was thinking about what it would take to make the list thread-safe (which it currently is not, but probably doesn't need to be(?)). Modifying the linked list is not that difficult if you assume that there is only a single thread modifying the list at any given time. Michael ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 4:31 ` Jeff King 2017-08-30 4:55 ` Michael Haggerty @ 2017-08-30 4:55 ` Jeff King 2017-08-30 5:55 ` Jeff King 2017-08-30 6:55 ` Michael Haggerty 1 sibling, 2 replies; 43+ messages in thread From: Jeff King @ 2017-08-30 4:55 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 12:31:47AM -0400, Jeff King wrote: > > It was surprisingly hard trying to get that code to do the right thing, > > non-racily, in every error path. Since `remove_tempfiles()` can be > > called any time (even from a signal handler), the linked list of > > `tempfile` objects has to be kept valid at all times but can't use > > mutexes. I didn't have the energy to keep going and make the lock > > objects freeable. > > > > I suppose the task could be made a bit easier by using `sigprocmask(2)` > > or `pthread_sigmask(3)` to make its running environment a little bit > > less hostile. > > I think there are really two levels of carefulness here: > > 1. Avoiding complicated things during a signal handler that may rely > on having a sane state from the rest of the program (e.g., > half-formed entries, stdio locks, etc). > > 2. Being truly race-free in the face of a signal arriving while we're > running arbitrary code that might have a tempfile struct in a funny > state. > > I feel like right now we meet (1) and not (2). But I think if we keep to > that lower bar of (1), it might not be that bad. We're assuming now that > there's no race on the tempfile->active flag, for instance. We could > probably make a similar assumption about putting items onto or taking > them off of a linked list (it's not really atomic, but a single pointer > assignment is probably "atomic enough" for our purposes). Something like this, which AFAICT is about as safe as the existing code in its list manipulation. It retains the "active" flag as an additional check which I think isn't strictly necessary, but potentially catches some logic errors. The strbuf_reset() calls become strbuf_release(), since we're promising the caller that they could now free the struct (or let it go out of scope) if they chose. Probably during a signal handler we should skip that (we know the struct is off the list and non-active at that point, but we could possibly hit libc's free() mutex). diff --git a/tempfile.c b/tempfile.c index 6843710670..a7d964ebf8 100644 --- a/tempfile.c +++ b/tempfile.c @@ -69,7 +69,6 @@ static void remove_tempfiles(int skip_fclose) tempfile_list->fp = NULL; delete_tempfile(tempfile_list); } - tempfile_list = tempfile_list->next; } } @@ -85,39 +84,54 @@ static void remove_tempfiles_on_signal(int signo) raise(signo); } -/* - * Initialize *tempfile if necessary and add it to tempfile_list. - */ -static void prepare_tempfile_object(struct tempfile *tempfile) +static void init_tempfile(struct tempfile *tempfile) +{ + tempfile->fd = -1; + tempfile->fp = NULL; + tempfile->active = 0; + tempfile->owner = 0; + tempfile->next = NULL; + strbuf_init(&tempfile->filename, 0); +} + +static void activate_tempfile(struct tempfile *tempfile) { - if (!tempfile_list) { - /* One-time initialization */ + static volatile int initialized; + + if (!initialized) { sigchain_push_common(remove_tempfiles_on_signal); atexit(remove_tempfiles_on_exit); + initialized = 1; } if (tempfile->active) - die("BUG: prepare_tempfile_object called for active object"); - if (!tempfile->on_list) { - /* Initialize *tempfile and add it to tempfile_list: */ - tempfile->fd = -1; - tempfile->fp = NULL; - tempfile->active = 0; - tempfile->owner = 0; - strbuf_init(&tempfile->filename, 0); - tempfile->next = tempfile_list; - tempfile_list = tempfile; - tempfile->on_list = 1; - } else if (tempfile->filename.len) { - /* This shouldn't happen, but better safe than sorry. */ - die("BUG: prepare_tempfile_object called for improperly-reset object"); + die("BUG: activate_tempfile called for active object"); + + tempfile->next = tempfile_list; + tempfile_list = tempfile; + tempfile->active = 1; +} + +static void deactivate_tempfile(struct tempfile *tempfile) +{ + struct tempfile *volatile *p; + + if (!tempfile->active) + return; + + tempfile->active = 0; + for (p = &tempfile_list; *p; p = &(*p)->next) { + if (*p == tempfile) { + *p = tempfile->next; + break; + } } } /* Make sure errno contains a meaningful value on error */ int create_tempfile(struct tempfile *tempfile, const char *path) { - prepare_tempfile_object(tempfile); + init_tempfile(tempfile); strbuf_add_absolute_path(&tempfile->filename, path); tempfile->fd = open(tempfile->filename.buf, @@ -127,11 +141,11 @@ int create_tempfile(struct tempfile *tempfile, const char *path) tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL, 0666); if (tempfile->fd < 0) { - strbuf_reset(&tempfile->filename); + strbuf_release(&tempfile->filename); return -1; } tempfile->owner = getpid(); - tempfile->active = 1; + activate_tempfile(tempfile); if (adjust_shared_perm(tempfile->filename.buf)) { int save_errno = errno; error("cannot fix permission bits on %s", tempfile->filename.buf); @@ -144,25 +158,25 @@ int create_tempfile(struct tempfile *tempfile, const char *path) void register_tempfile(struct tempfile *tempfile, const char *path) { - prepare_tempfile_object(tempfile); + init_tempfile(tempfile); strbuf_add_absolute_path(&tempfile->filename, path); tempfile->owner = getpid(); - tempfile->active = 1; + activate_tempfile(tempfile); } int mks_tempfile_sm(struct tempfile *tempfile, const char *template, int suffixlen, int mode) { - prepare_tempfile_object(tempfile); + init_tempfile(tempfile); strbuf_add_absolute_path(&tempfile->filename, template); tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, mode); if (tempfile->fd < 0) { - strbuf_reset(&tempfile->filename); + strbuf_release(&tempfile->filename); return -1; } tempfile->owner = getpid(); - tempfile->active = 1; + activate_tempfile(tempfile); return tempfile->fd; } @@ -171,7 +185,7 @@ int mks_tempfile_tsm(struct tempfile *tempfile, { const char *tmpdir; - prepare_tempfile_object(tempfile); + init_tempfile(tempfile); tmpdir = getenv("TMPDIR"); if (!tmpdir) @@ -180,11 +194,11 @@ int mks_tempfile_tsm(struct tempfile *tempfile, strbuf_addf(&tempfile->filename, "%s/%s", tmpdir, template); tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, mode); if (tempfile->fd < 0) { - strbuf_reset(&tempfile->filename); + strbuf_release(&tempfile->filename); return -1; } tempfile->owner = getpid(); - tempfile->active = 1; + activate_tempfile(tempfile); return tempfile->fd; } @@ -293,8 +307,8 @@ int rename_tempfile(struct tempfile *tempfile, const char *path) return -1; } - tempfile->active = 0; - strbuf_reset(&tempfile->filename); + deactivate_tempfile(tempfile); + strbuf_release(&tempfile->filename); return 0; } @@ -305,7 +319,7 @@ void delete_tempfile(struct tempfile *tempfile) if (!close_tempfile(tempfile)) { unlink_or_warn(tempfile->filename.buf); - tempfile->active = 0; - strbuf_reset(&tempfile->filename); + deactivate_tempfile(tempfile); + strbuf_release(&tempfile->filename); } } diff --git a/tempfile.h b/tempfile.h index 2f0038decd..df96f82e84 100644 --- a/tempfile.h +++ b/tempfile.h @@ -85,7 +85,6 @@ struct tempfile { volatile int fd; FILE *volatile fp; volatile pid_t owner; - char on_list; struct strbuf filename; }; ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 4:55 ` Jeff King @ 2017-08-30 5:55 ` Jeff King 2017-08-30 7:07 ` Michael Haggerty 2017-08-30 6:55 ` Michael Haggerty 1 sibling, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-30 5:55 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 12:55:55AM -0400, Jeff King wrote: > > I feel like right now we meet (1) and not (2). But I think if we keep to > > that lower bar of (1), it might not be that bad. We're assuming now that > > there's no race on the tempfile->active flag, for instance. We could > > probably make a similar assumption about putting items onto or taking > > them off of a linked list (it's not really atomic, but a single pointer > > assignment is probably "atomic enough" for our purposes). > > Something like this, which AFAICT is about as safe as the existing code > in its list manipulation. It retains the "active" flag as an additional > check which I think isn't strictly necessary, but potentially catches > some logic errors. The patch below demonstrates how this could be used to turn some "xcalloc" lock_files into stack variables that are allowed to go out of scope after we commit or rollback. This solves the three lock-related leaks reported by valgrind when running t0000. _But_ it also demonstrates an interesting downside of this approach. Some functions are lazy in their error paths. For instance, look at write_index_as_tree(). We take the lock early in the function, but may return before a commit or rollback if we hit an error. With the current code this "leaks" the tempfile, which is wrong. But nobody notices because in practice the program exits soon after and we clean up the tempfile then. But with a stack variable lock_file, that minor problem becomes a major one: our stack variable got added to a global linked list and the atexit() handler will try to read it. But now of course it will contain garbage, since the variable went out of scope. So it's probably safer to just let tempfile.c handle the whole lifetime, and have it put all live tempfiles on the heap, and free them when they're deactivated. That means our creation signature becomes more like: struct tempfile *create_tempfile(const char *path); and delete_tempfile() actually calls free() on it (though we'd probably want to skip the free() from a signal handler for the usual reasons). I don't have a patch for that, but just food for thought while exploring the implications of my earlier suggestion. --- diff --git a/builtin/update-index.c b/builtin/update-index.c index d955cd56b3..bf7420b808 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -917,7 +917,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) struct refresh_params refresh_args = {0, &has_errors}; int lock_error = 0; int split_index = -1; - struct lock_file *lock_file; + struct lock_file lock_file = LOCK_INIT; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; int parseopt_state = PARSE_OPT_UNKNOWN; @@ -1016,11 +1016,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); - /* We can't free this memory, it becomes part of a linked list parsed atexit() */ - lock_file = xcalloc(1, sizeof(struct lock_file)); - /* we will diagnose later if it turns out that we need to update it */ - newfd = hold_locked_index(lock_file, 0); + newfd = hold_locked_index(&lock_file, 0); if (newfd < 0) lock_error = errno; @@ -1155,11 +1152,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) exit(128); unable_to_lock_die(get_index_file(), lock_error); } - if (write_locked_index(&the_index, lock_file, COMMIT_LOCK)) + if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die("Unable to write new index file"); } - rollback_lock_file(lock_file); + rollback_lock_file(&lock_file); return has_errors ? 1 : 0; } diff --git a/cache-tree.c b/cache-tree.c index 2440d1dc89..440f92aeca 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -603,19 +603,15 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, const char *index_path, int flags, const char *prefix) { int entries, was_valid, newfd; - struct lock_file *lock_file; + struct lock_file lock_file = LOCK_INIT; - /* - * We can't free this memory, it becomes part of a linked list - * parsed atexit() - */ - lock_file = xcalloc(1, sizeof(struct lock_file)); - - newfd = hold_lock_file_for_update(lock_file, index_path, LOCK_DIE_ON_ERROR); + newfd = hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR); entries = read_index_from(index_state, index_path); - if (entries < 0) + if (entries < 0) { + rollback_lock_file(&lock_file); return WRITE_TREE_UNREADABLE_INDEX; + } if (flags & WRITE_TREE_IGNORE_CACHE_TREE) cache_tree_free(&index_state->cache_tree); @@ -624,10 +620,12 @@ int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, co was_valid = cache_tree_fully_valid(index_state->cache_tree); if (!was_valid) { - if (cache_tree_update(index_state, flags) < 0) + if (cache_tree_update(index_state, flags) < 0) { + rollback_lock_file(&lock_file); return WRITE_TREE_UNMERGED_INDEX; + } if (0 <= newfd) { - if (!write_locked_index(index_state, lock_file, COMMIT_LOCK)) + if (!write_locked_index(index_state, &lock_file, COMMIT_LOCK)) newfd = -1; } /* Not being able to write is fine -- we are only interested @@ -641,15 +639,17 @@ int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, co if (prefix) { struct cache_tree *subtree; subtree = cache_tree_find(index_state->cache_tree, prefix); - if (!subtree) + if (!subtree) { + rollback_lock_file(&lock_file); return WRITE_TREE_PREFIX_ERROR; + } hashcpy(sha1, subtree->oid.hash); } else hashcpy(sha1, index_state->cache_tree->oid.hash); if (0 <= newfd) - rollback_lock_file(lock_file); + rollback_lock_file(&lock_file); return 0; } diff --git a/config.c b/config.c index d0d8ce823a..7931182a54 100644 --- a/config.c +++ b/config.c @@ -2450,7 +2450,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, { int fd = -1, in_fd = -1; int ret; - struct lock_file *lock = NULL; + struct lock_file lock = LOCK_INIT; char *filename_buf = NULL; char *contents = NULL; size_t contents_sz; @@ -2469,8 +2469,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, * The lock serves a purpose in addition to locking: the new * contents of .git/config will be written into it. */ - lock = xcalloc(1, sizeof(struct lock_file)); - fd = hold_lock_file_for_update(lock, config_filename, 0); + fd = hold_lock_file_for_update(&lock, config_filename, 0); if (fd < 0) { error_errno("could not lock config file %s", config_filename); free(store.key); @@ -2583,8 +2582,8 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, close(in_fd); in_fd = -1; - if (chmod(get_lock_file_path(lock), st.st_mode & 07777) < 0) { - error_errno("chmod on %s failed", get_lock_file_path(lock)); + if (chmod(get_lock_file_path(&lock), st.st_mode & 07777) < 0) { + error_errno("chmod on %s failed", get_lock_file_path(&lock)); ret = CONFIG_NO_WRITE; goto out_free; } @@ -2639,28 +2638,19 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, contents = NULL; } - if (commit_lock_file(lock) < 0) { + if (commit_lock_file(&lock) < 0) { error_errno("could not write config file %s", config_filename); ret = CONFIG_NO_WRITE; - lock = NULL; goto out_free; } - /* - * lock is committed, so don't try to roll it back below. - * NOTE: Since lockfile.c keeps a linked list of all created - * lock_file structures, it isn't safe to free(lock). It's - * better to just leave it hanging around. - */ - lock = NULL; ret = 0; /* Invalidate the config cache */ git_config_clear(); out_free: - if (lock) - rollback_lock_file(lock); + rollback_lock_file(&lock); free(filename_buf); if (contents) munmap(contents, contents_sz); @@ -2669,7 +2659,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename, return ret; write_err_out: - ret = write_error(get_lock_file_path(lock)); + ret = write_error(get_lock_file_path(&lock)); goto out_free; } diff --git a/lockfile.h b/lockfile.h index 572064939c..a7ba42bdf2 100644 --- a/lockfile.h +++ b/lockfile.h @@ -114,6 +114,8 @@ struct lock_file { struct tempfile tempfile; }; +#define LOCK_INIT { { NULL } } + /* String appended to a filename to derive the lockfile name: */ #define LOCK_SUFFIX ".lock" #define LOCK_SUFFIX_LEN 5 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 5:55 ` Jeff King @ 2017-08-30 7:07 ` Michael Haggerty 2017-09-02 8:44 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Michael Haggerty @ 2017-08-30 7:07 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On 08/30/2017 07:55 AM, Jeff King wrote: > On Wed, Aug 30, 2017 at 12:55:55AM -0400, Jeff King wrote: > [...] > The patch below demonstrates how this could be used to turn some > "xcalloc" lock_files into stack variables that are allowed to go out of > scope after we commit or rollback. This solves the three lock-related > leaks reported by valgrind when running t0000. > > _But_ it also demonstrates an interesting downside of this approach. > Some functions are lazy in their error paths. For instance, look at > write_index_as_tree(). We take the lock early in the function, but may > return before a commit or rollback if we hit an error. With the current > code this "leaks" the tempfile, which is wrong. But nobody notices > because in practice the program exits soon after and we clean up the > tempfile then. > > But with a stack variable lock_file, that minor problem becomes a major > one: our stack variable got added to a global linked list and the > atexit() handler will try to read it. But now of course it will contain > garbage, since the variable went out of scope. > > So it's probably safer to just let tempfile.c handle the whole lifetime, > and have it put all live tempfiles on the heap, and free them when > they're deactivated. That means our creation signature becomes more > like: > > struct tempfile *create_tempfile(const char *path); > > and delete_tempfile() actually calls free() on it (though we'd probably > want to skip the free() from a signal handler for the usual reasons). I agree that the latter would be a nice, and relatively safe, design. It would involve some fairly intrusive changes to client code, though. I think it would be possible to implement the new API while leaving the old one intact, to avoid having to rewrite all clients at once, and potentially to allow clients to avoid a malloc if they already have a convenient place to embed a `struct tempfile` (except that now they'd be able to free it when done). For example, `create_tempfile(tempfile, path)` and its friends could accept NULL as the first argument, in which case it would malloc a `struct tempfile` itself, and mark it as being owned by the tempfile module. Such objects would be freed when deactivated. But if the caller passes in a non-NULL `tempfile` argument, the old behavior would be retained. Michael ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 7:07 ` Michael Haggerty @ 2017-09-02 8:44 ` Jeff King 2017-09-02 13:50 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-09-02 8:44 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 09:07:53AM +0200, Michael Haggerty wrote: > > So it's probably safer to just let tempfile.c handle the whole lifetime, > > and have it put all live tempfiles on the heap, and free them when > > they're deactivated. That means our creation signature becomes more > > like: > > > > struct tempfile *create_tempfile(const char *path); > > > > and delete_tempfile() actually calls free() on it (though we'd probably > > want to skip the free() from a signal handler for the usual reasons). > > I agree that the latter would be a nice, and relatively safe, design. It > would involve some fairly intrusive changes to client code, though. > > I think it would be possible to implement the new API while leaving the > old one intact, to avoid having to rewrite all clients at once, and > potentially to allow clients to avoid a malloc if they already have a > convenient place to embed a `struct tempfile` (except that now they'd be > able to free it when done). For example, `create_tempfile(tempfile, > path)` and its friends could accept NULL as the first argument, in which > case it would malloc a `struct tempfile` itself, and mark it as being > owned by the tempfile module. Such objects would be freed when > deactivated. But if the caller passes in a non-NULL `tempfile` argument, > the old behavior would be retained. There are actually very few callers of create_tempfile (I count two). Everybody just uses lock_file(). So I think we could get away with just modifying the internals of the lock struct to hold a pointer, and then teaching commit_lock_file() et al to reset it to NULL after performing an operation that frees the tempfile. We could even modify rename_tempfile() and friends to take a pointer-to-pointer and NULL it itself. That would make it harder to get wrong. In the long term I don't think there's a huge value to being able to place a "struct tempfile" inside another struct, as opposed to holding a pointer. It saves a malloc, but at the cost of opening up confusion about lifetimes. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-09-02 8:44 ` Jeff King @ 2017-09-02 13:50 ` Jeff King 0 siblings, 0 replies; 43+ messages in thread From: Jeff King @ 2017-09-02 13:50 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Sat, Sep 02, 2017 at 04:44:51AM -0400, Jeff King wrote: > There are actually very few callers of create_tempfile (I count two). > Everybody just uses lock_file(). So I think we could get away with just > modifying the internals of the lock struct to hold a pointer, and then > teaching commit_lock_file() et al to reset it to NULL after performing > an operation that frees the tempfile. We could even modify > rename_tempfile() and friends to take a pointer-to-pointer and NULL it > itself. That would make it harder to get wrong. I take it back. It's true that there are very few create_tempfile() callers, but we'd want to have a similar change for mks_tempfile_*, and there are a lot more of those. One solution would be to add an extra level of indirection, like: struct tempfile { struct the_part_that_we_put_on_our_global_list *data; }; That lets us keep normal value semantics for "struct tempfile", which is nice. And existing callers wouldn't have to be modified at all for the creation/deletion bits. But I suspect it would also result in a lot of replacements like s/temp->/temp.data->/ for any callers that look at the underlying fields that have moved into the sub-struct. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 4:55 ` Jeff King 2017-08-30 5:55 ` Jeff King @ 2017-08-30 6:55 ` Michael Haggerty 2017-08-30 19:53 ` Jeff King 1 sibling, 1 reply; 43+ messages in thread From: Michael Haggerty @ 2017-08-30 6:55 UTC (permalink / raw) To: Jeff King; +Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On 08/30/2017 06:55 AM, Jeff King wrote: > [...] > Something like this, which AFAICT is about as safe as the existing code > in its list manipulation. It retains the "active" flag as an additional > check which I think isn't strictly necessary, but potentially catches > some logic errors. > > The strbuf_reset() calls become strbuf_release(), since we're promising > the caller that they could now free the struct (or let it go out of > scope) if they chose. Probably during a signal handler we should skip > that (we know the struct is off the list and non-active at that point, > but we could possibly hit libc's free() mutex). This looks OK to me aside from the one caveat below. > diff --git a/tempfile.c b/tempfile.c > index 6843710670..a7d964ebf8 100644 > --- a/tempfile.c > +++ b/tempfile.c > [...] > +static void deactivate_tempfile(struct tempfile *tempfile) > +{ > + struct tempfile *volatile *p; > + > + if (!tempfile->active) > + return; > + > + tempfile->active = 0; > + for (p = &tempfile_list; *p; p = &(*p)->next) { > + if (*p == tempfile) { > + *p = tempfile->next; > + break; > + } > } > } `deactivate_tempfile()` is O(N) in the number of active tempfiles. This could get noticeable for, say, updating 30k references, which involves obtaining 30k reference locks. I think that code adds the locks in alphabetical order and also removes them in alphabetical order, so the overall effort would go like O(N²). I'm guessing that this would be measurable but not fatal for realistic numbers of references, but it should at least be benchmarked. There are three obvious ways to make this O(1) again: * Make the list doubly-linked. * Make sure that all significant callers remove items in the *reverse* order that they were added, in which case the item to be removed would always be found near the head of the list. * Change this class to keep track of a tail pointer and add new items at the end of the list, and ensure that all significant callers remove locks in the *same* order that they were added. The second and third options might be easy or difficult (depend on how much current callers need to be changed) and certainly would be easy to break again as new callers are added in the future. > [...] Michael ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 6:55 ` Michael Haggerty @ 2017-08-30 19:53 ` Jeff King 2017-08-30 19:57 ` Brandon Williams 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-30 19:53 UTC (permalink / raw) To: Michael Haggerty Cc: Brandon Williams, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 08:55:01AM +0200, Michael Haggerty wrote: > > + tempfile->active = 0; > > + for (p = &tempfile_list; *p; p = &(*p)->next) { > > + if (*p == tempfile) { > > + *p = tempfile->next; > > + break; > > + } > > } > > } > > `deactivate_tempfile()` is O(N) in the number of active tempfiles. This > could get noticeable for, say, updating 30k references, which involves > obtaining 30k reference locks. I think that code adds the locks in > alphabetical order and also removes them in alphabetical order, so the > overall effort would go like O(N²). I'm guessing that this would be > measurable but not fatal for realistic numbers of references, but it > should at least be benchmarked. > > There are three obvious ways to make this O(1) again: > > * Make the list doubly-linked. Yeah, I noticed this when writing it, and the doubly-linked list was my first thought. It's much more straight-forward than making guesses about traversal order, and we already have a nice implementation in list.h. What made me hesitate for this demonstration was that it turns the removal into _two_ pointer updates. That made me more nervous about the racy case where we get a single handler while already deleting something. Specifically when we have "a <-> b <-> c" and we've updated "a->next" to point to "c" but "c->prev" still points to "b". But even with the singly-linked list we're not fully race-proof (somebody could set *p to NULL between the time we look at it and dereference it). What we really care about is not two versions of the function running simultaneously, but one version getting interrupted by another and having the second one see a sane state (because we'll never return to the first signal handler; the second one will raise() and die). And I think we're fine there even with a doubly-linked list. It's still the single update of the "next" pointer that controls that second traversal. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 19:53 ` Jeff King @ 2017-08-30 19:57 ` Brandon Williams 2017-08-30 20:11 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Brandon Williams @ 2017-08-30 19:57 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Lars Schneider, Martin Ågren, Git Users On 08/30, Jeff King wrote: > On Wed, Aug 30, 2017 at 08:55:01AM +0200, Michael Haggerty wrote: > > > > + tempfile->active = 0; > > > + for (p = &tempfile_list; *p; p = &(*p)->next) { > > > + if (*p == tempfile) { > > > + *p = tempfile->next; > > > + break; > > > + } > > > } > > > } > > > > `deactivate_tempfile()` is O(N) in the number of active tempfiles. This > > could get noticeable for, say, updating 30k references, which involves > > obtaining 30k reference locks. I think that code adds the locks in > > alphabetical order and also removes them in alphabetical order, so the > > overall effort would go like O(N²). I'm guessing that this would be > > measurable but not fatal for realistic numbers of references, but it > > should at least be benchmarked. > > > > There are three obvious ways to make this O(1) again: > > > > * Make the list doubly-linked. > > Yeah, I noticed this when writing it, and the doubly-linked list was my > first thought. It's much more straight-forward than making guesses about > traversal order, and we already have a nice implementation in list.h. I agree that a doubly-linked list is probably the best way to go in order to solve the performance issue. > > What made me hesitate for this demonstration was that it turns the > removal into _two_ pointer updates. That made me more nervous about the > racy case where we get a single handler while already deleting > something. Specifically when we have "a <-> b <-> c" and we've updated > "a->next" to point to "c" but "c->prev" still points to "b". > > But even with the singly-linked list we're not fully race-proof > (somebody could set *p to NULL between the time we look at it and > dereference it). What we really care about is not two versions of the > function running simultaneously, but one version getting interrupted by > another and having the second one see a sane state (because we'll never > return to the first signal handler; the second one will raise() and > die). > > And I think we're fine there even with a doubly-linked list. It's still > the single update of the "next" pointer that controls that second > traversal. > > -Peff I know it was mentioned earlier but if this is a critical section, and it would be bad if it was interrupted, then couldn't we turn off interrupts before attempting to remove an item from the list? -- Brandon Williams ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 19:57 ` Brandon Williams @ 2017-08-30 20:11 ` Jeff King 2017-08-30 21:06 ` Brandon Williams 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-30 20:11 UTC (permalink / raw) To: Brandon Williams Cc: Michael Haggerty, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 12:57:31PM -0700, Brandon Williams wrote: > > And I think we're fine there even with a doubly-linked list. It's still > > the single update of the "next" pointer that controls that second > > traversal. > > I know it was mentioned earlier but if this is a critical section, and > it would be bad if it was interrupted, then couldn't we turn off > interrupts before attempting to remove an item from the list? If we have to, that's an option. I mostly didn't want to get into the portability headaches of signal-handling if we can avoid it. Right now sigchain uses plain old signal(). We do use sigaction() in a few places, but in a vanilla way. So the portability questions are: - can we rely on using more interesting bits of sigaction like sa_mask? Probably. - which flags should we be setting in sa_flags to get the same behavior we've been getting with signal()? Or alternatively, which behavior do we want? On Linux and BSD systems, at least, signal() defers repeat instances of the handled signal. So getting two quick SIGTERMs will not interrupt the handler to deliver the second. But getting SIGTERM followed by SIGINT would. We could extend that protection by having sigchain_push_common() set sa_mask to cover all of the related signals. On Linux and BSD the current code using signal() also implies SA_RESTART. We could add that to our flags, though I suspect in practice it doesn't matter. Whenever we establish a handler like this our intent is to never return from it. That just protects us from calling the _same_ handler from itself. But that's probably enough in practice, and means handlers don't have to worry about "critical sections". The other alternative is sigprocmask() to block signals entirely during a section. I'm not sure if there are portability questions there (it looks like we have a mingw wrapper there, but it's a complete noop). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 20:11 ` Jeff King @ 2017-08-30 21:06 ` Brandon Williams 2017-08-31 4:09 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Brandon Williams @ 2017-08-30 21:06 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, Lars Schneider, Martin Ågren, Git Users On 08/30, Jeff King wrote: > On Wed, Aug 30, 2017 at 12:57:31PM -0700, Brandon Williams wrote: > > > > And I think we're fine there even with a doubly-linked list. It's still > > > the single update of the "next" pointer that controls that second > > > traversal. > > > > I know it was mentioned earlier but if this is a critical section, and > > it would be bad if it was interrupted, then couldn't we turn off > > interrupts before attempting to remove an item from the list? > > If we have to, that's an option. I mostly didn't want to get into the > portability headaches of signal-handling if we can avoid it. > > Right now sigchain uses plain old signal(). We do use sigaction() in a > few places, but in a vanilla way. So the portability questions are: > > - can we rely on using more interesting bits of sigaction like > sa_mask? Probably. > > - which flags should we be setting in sa_flags to get the same > behavior we've been getting with signal()? Or alternatively, which > behavior do we want? > > On Linux and BSD systems, at least, signal() defers repeat instances of > the handled signal. So getting two quick SIGTERMs will not interrupt the > handler to deliver the second. But getting SIGTERM followed by SIGINT would. > > We could extend that protection by having sigchain_push_common() set > sa_mask to cover all of the related signals. On Linux and BSD the > current code using signal() also implies SA_RESTART. We could add that > to our flags, though I suspect in practice it doesn't matter. Whenever > we establish a handler like this our intent is to never return from it. > > That just protects us from calling the _same_ handler from itself. But > that's probably enough in practice, and means handlers don't have to > worry about "critical sections". The other alternative is sigprocmask() > to block signals entirely during a section. I'm not sure if there are > portability questions there (it looks like we have a mingw wrapper > there, but it's a complete noop). Yeah there's a lot about signals that I'm not very clear on. I do know that Eric helped me out on the fork-exec series I worked on earlier in the year and I believe it was to turn on/off signals during process launches in 45afb1ca9 (run-command: block signals between fork and execve, 2017-04-19). Though that bit of code is strictly for unix so I wouldn't know how that would work on windows machines. Portability does seem to always be a challenging problem. -- Brandon Williams ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-30 21:06 ` Brandon Williams @ 2017-08-31 4:09 ` Jeff King 0 siblings, 0 replies; 43+ messages in thread From: Jeff King @ 2017-08-31 4:09 UTC (permalink / raw) To: Brandon Williams Cc: Michael Haggerty, Lars Schneider, Martin Ågren, Git Users On Wed, Aug 30, 2017 at 02:06:02PM -0700, Brandon Williams wrote: > > We could extend that protection by having sigchain_push_common() set > > sa_mask to cover all of the related signals. On Linux and BSD the > > current code using signal() also implies SA_RESTART. We could add that > > to our flags, though I suspect in practice it doesn't matter. Whenever > > we establish a handler like this our intent is to never return from it. > > > > That just protects us from calling the _same_ handler from itself. But > > that's probably enough in practice, and means handlers don't have to > > worry about "critical sections". The other alternative is sigprocmask() > > to block signals entirely during a section. I'm not sure if there are > > portability questions there (it looks like we have a mingw wrapper > > there, but it's a complete noop). > > Yeah there's a lot about signals that I'm not very clear on. I do know > that Eric helped me out on the fork-exec series I worked on earlier in > the year and I believe it was to turn on/off signals during process > launches in 45afb1ca9 (run-command: block signals between fork and > execve, 2017-04-19). Though that bit of code is strictly for unix so I > wouldn't know how that would work on windows machines. Portability does > seem to always be a challenging problem. Based on the sketch I wrote above, I figured it would be pretty easy to convert sigchain to sigaction. But after taking a look at compat/mingw.c, I don't think Windows would be on board. sigaction() there is is a stub implementation that _only_ handles SIGALRM and nothing else. So I think the best we could do is put big #ifdefs around it to use sigaction on other platforms, and fall back to signal() on Windows. That's do-able, but my enthusiasm is waning as the complexity increases. Getting two SIGINTs in a row seems plausible, but we already handle that well. Getting SIGINT and then a _different_ signal while we're in the handler seems less likely in practice. The only combination I can think that would be common is TERM+KILL, but of course we're not catching KILL in the first place. So this seems like adding complexity for very little benefit. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-08-29 18:58 ` [PATCH] config: use a static lock_file struct Jeff King 2017-08-29 19:09 ` Brandon Williams @ 2017-09-06 3:59 ` Junio C Hamano 2017-09-06 12:41 ` Jeff King 1 sibling, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2017-09-06 3:59 UTC (permalink / raw) To: Jeff King; +Cc: Lars Schneider, Martin Ågren, Git Users Jeff King <peff@peff.net> writes: > In the long run we may want to drop the "tempfiles must remain forever" > rule. This is certainly not the first time it has caused confusion or > leaks. And I don't think it's a fundamental issue, just the way the code > is written. But in the interim, this fix is probably worth doing. I notice you also have a series to rework the "must remain forever" on the list, but nevertheless let's take this one first. Will queue. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] config: use a static lock_file struct 2017-09-06 3:59 ` Junio C Hamano @ 2017-09-06 12:41 ` Jeff King 0 siblings, 0 replies; 43+ messages in thread From: Jeff King @ 2017-09-06 12:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Lars Schneider, Martin Ågren, Git Users On Wed, Sep 06, 2017 at 12:59:46PM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > In the long run we may want to drop the "tempfiles must remain forever" > > rule. This is certainly not the first time it has caused confusion or > > leaks. And I don't think it's a fundamental issue, just the way the code > > is written. But in the interim, this fix is probably worth doing. > > I notice you also have a series to rework the "must remain forever" > on the list, but nevertheless let's take this one first. > > Will queue. Yes, this will conflict with that series (but the resolution should just be to throw this one away). I'm fine with taking this in the meantime, though it's certainly not urgent (my main goal was getting the pre-test bits of test-lib.sh to run with leak detection on; I don't think this was actually affecting anybody in the real world). -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-29 18:53 ` Jeff King 2017-08-29 18:58 ` [PATCH] config: use a static lock_file struct Jeff King @ 2017-08-29 19:22 ` Martin Ågren 2017-08-29 21:48 ` Jeff King 1 sibling, 1 reply; 43+ messages in thread From: Martin Ågren @ 2017-08-29 19:22 UTC (permalink / raw) To: Jeff King; +Cc: Lars Schneider, Git Users On 29 August 2017 at 20:53, Jeff King <peff@peff.net> wrote: > On Tue, Aug 29, 2017 at 06:51:52PM +0100, Lars Schneider wrote: >> What if we run a few selected tests with valgrind and count all files that >> valgrind mentions (a single leak has multiple file mentions because of >> the stack trace and other leak indicators). We record these counts and let >> TravisCI scream if one of the numbers increases. >> >> I wonder how stable/fragile such a metric would be as a simple refactoring >> could easily change these numbers. Below I ran valgrind on t5510 before and >> after Martin's patch. The diff below clearly shows the pkt-line leak. >> >> Would it make sense to pursue something like this in TravisCI to avoid >> "pkt-line" kind of leaks in the future? > > I don't think that would work, because simply adding new tests would > bump the leak count, without the code actually growing worse. > > But think about your strategy for a moment: what you're really trying to > do is say "these existing leaks are OK because they are too many for us > to count, but we want to make sure we don't add _new_ ones". We already > have two techniques for distinguishing old ones from new ones: > > 1. Mark existing ones with valgrind suppressions so they do not warn > at all. > > 2. Fix them, so that the "existing" count drops to zero. > > Option (2), of course, has the added side effect that it's actually > fixing potential problems. :) One idea would be to report that "t1234 used to complete without leaks, but now there is a leak". Of course, the leak could be triggered by some "irrelevant" setup step that was just added and not by the functionality that is actually tested. But maybe then the reaction might be "oh, let's fix this one leak so that this test is leak-free again". If that then also reduces the leakiness of some other tests, good. That might help against the "there are too many leaks, where should I start?"-effect. Or similarly "This stack-trace was reported. I haven't seen it before." It could be a refactoring, it could be an old leak that hasn't been triggered in that way before, or it could be a new leak. But to be honest, I think that only works -- in a psychological sense -- once the number of leaks is small enough, however much that is. One take-away for me from this thread is that memory-leak-plugging seems to be appreciated, i.e., it's not just an academic problem. I think I'll look into it some more, i.e., slowly pursue option 2. :-) That might help give some insights on how to identify interesting leaks. Martin ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-29 19:22 ` [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren @ 2017-08-29 21:48 ` Jeff King 2017-08-30 5:31 ` Jeff King 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-29 21:48 UTC (permalink / raw) To: Martin Ågren; +Cc: Lars Schneider, Git Users On Tue, Aug 29, 2017 at 09:22:08PM +0200, Martin Ågren wrote: > One take-away for me from this thread is that memory-leak-plugging seems > to be appreciated, i.e., it's not just an academic problem. I think I'll > look into it some more, i.e., slowly pursue option 2. :-) That might help > give some insights on how to identify interesting leaks. Yes, I think if it's tractable that's the best path forward. It's not clear to me how many of the things valgrind reports are false positives (and in what way). Certainly the lock_file patch I just posted seems like something worth addressing, even if we _tend_ to only leak a small number of such structs per program invocation. It's not a problem now, but it's a sign of a bad pattern. -Peff ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-29 21:48 ` Jeff King @ 2017-08-30 5:31 ` Jeff King 2017-09-05 10:03 ` Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: Jeff King @ 2017-08-30 5:31 UTC (permalink / raw) To: Martin Ågren; +Cc: Lars Schneider, Git Users On Tue, Aug 29, 2017 at 05:48:09PM -0400, Jeff King wrote: > On Tue, Aug 29, 2017 at 09:22:08PM +0200, Martin Ågren wrote: > > > One take-away for me from this thread is that memory-leak-plugging seems > > to be appreciated, i.e., it's not just an academic problem. I think I'll > > look into it some more, i.e., slowly pursue option 2. :-) That might help > > give some insights on how to identify interesting leaks. > > Yes, I think if it's tractable that's the best path forward. It's not > clear to me how many of the things valgrind reports are false positives > (and in what way). Certainly the lock_file patch I just posted seems > like something worth addressing, even if we _tend_ to only leak a small > number of such structs per program invocation. It's not a problem now, > but it's a sign of a bad pattern. I wanted to see what it would take to get t0000 to pass with "definite" leak-checking on, as in: diff --git a/t/valgrind/valgrind.sh b/t/valgrind/valgrind.sh index 669ebaf68b..5f01c06280 100755 --- a/t/valgrind/valgrind.sh +++ b/t/valgrind/valgrind.sh @@ -10,7 +10,7 @@ test-*) ;; esac -TOOL_OPTIONS='--leak-check=no' +TOOL_OPTIONS='--leak-check=full --errors-for-leak-kinds=definite' test -z "$GIT_VALGRIND_ENABLED" && exec "$program" "$@" The results gave an interesting overview of the problems I'd expect to face for the rest of the tests. I started with the config lock_file fix I sent earlier, but we needed a few more. This one in cmd_update_index() can become static in the same way: diff --git a/builtin/update-index.c b/builtin/update-index.c index d955cd56b3..3bff7de720 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -908,6 +908,7 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx, int cmd_update_index(int argc, const char **argv, const char *prefix) { + static struct lock_file lock_file; int newfd, entries, has_errors = 0, nul_term_line = 0; enum uc_mode untracked_cache = UC_UNSPECIFIED; int read_from_stdin = 0; @@ -917,7 +918,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) struct refresh_params refresh_args = {0, &has_errors}; int lock_error = 0; int split_index = -1; - struct lock_file *lock_file; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; int parseopt_state = PARSE_OPT_UNKNOWN; @@ -1016,11 +1016,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); - /* We can't free this memory, it becomes part of a linked list parsed atexit() */ - lock_file = xcalloc(1, sizeof(struct lock_file)); - /* we will diagnose later if it turns out that we need to update it */ - newfd = hold_locked_index(lock_file, 0); + newfd = hold_locked_index(&lock_file, 0); if (newfd < 0) lock_error = errno; @@ -1155,11 +1152,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) exit(128); unable_to_lock_die(get_index_file(), lock_error); } - if (write_locked_index(&the_index, lock_file, COMMIT_LOCK)) + if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die("Unable to write new index file"); } - rollback_lock_file(lock_file); + rollback_lock_file(&lock_file); return has_errors ? 1 : 0; } But there's another one in write_index_as_tree(), which is less obviously a candidate for turning into a single static variable. We might plausibly have multiple indices. I hacked around it with: diff --git a/cache-tree.c b/cache-tree.c index 2440d1dc89..f2a5aa138f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -603,7 +603,7 @@ static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *pat int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, const char *index_path, int flags, const char *prefix) { int entries, was_valid, newfd; - struct lock_file *lock_file; + static struct lock_file *lock_file; /* * We can't free this memory, it becomes part of a linked list which means that if we call this function once per program we'll keep a single static pointer, and valgrind will consider it "reachable". But if we call multiple, they really will leak as before. So it's definitely a hack, though it was enough for t0000. I think I'd favor loosening the "you can never free locks" requirement instead, which makes this problem just go away (we could even declare it as a stack variable then). Next up is the fact that setup_git_env() happily overwrites its own pointers without freeing them if it's called twice. I expect this will be a reasonably common problem for things like config or option parsing where we might see multiple values. In the simple cases, we just need to make sure we do: free(foo); foo = xstrdup(...); instead of just calling the latter. But we can't always do that, because we sometimes point to string literals like: static const char *foo = "default_value"; ... foo = xstrfmt("computed_value: %s", value_from_user); If we assign to "foo" again, we need to free only when the previous value was from xstrfmt, not when it points to the string literal. But we can't duplicate the string literal, because that default assignment is outside any function. We could solve that by using a struct instead, with a few helpers. Or possibly we could just use strbufs, if we taught it to handle "unowned" strings (i.e., where "alloc" is zero but "len" isn't necessarily, and we don't just point to the slopbuf). And then you'd have something like: #define STRBUF_INIT_STR(x) { 0, strlen(x), (char *)(x) } ... static struct strbuf foo = STRBUF_INIT_STR("default_value"); ... strbuf_addf(&foo, "computed_value..."); And then strbuf_grow() would have to learn to copy the existing bytes when it moves from the "alloc==0" case. Anyway, that actually ended up _not_ being how I solved this one. I was puzzled why we were calling setup_git_env() twice in the first place. And the answer is that setup_work_tree() frequently does a noop set_git_dir(). And we can just get rid of that: diff --git a/setup.c b/setup.c index 23950173fc..9104740e2d 100644 --- a/setup.c +++ b/setup.c @@ -399,12 +399,9 @@ void setup_work_tree(void) if (getenv(GIT_WORK_TREE_ENVIRONMENT)) setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1); - /* - * NEEDSWORK: this call can essentially be set_git_dir(get_git_dir()) - * which can cause some problems when trying to free the old value of - * gitdir. - */ - set_git_dir(remove_leading_path(git_dir, work_tree)); + git_dir = remove_leading_path(git_dir, work_tree); + if (strcmp(get_git_dir(), git_dir)) + set_git_dir(git_dir); initialized = 1; } I think this is really papering over underlying problems, though in this case they are fairly simple "just call free()" cases that don't need the strbuf magic. Next up is update-index's add_one_path. This one is a real leak: diff --git a/builtin/update-index.c b/builtin/update-index.c index d562f2ec69..d955cd56b3 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -287,8 +287,10 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len } option = allow_add ? ADD_CACHE_OK_TO_ADD : 0; option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0; - if (add_cache_entry(ce, option)) + if (add_cache_entry(ce, option)) { + free(ce); return error("%s: cannot add to the index - missing --add option?", path); + } return 0; } It's not a big deal in practice since we exit soon, but since it's in a sub-function I think it's worth addressing (and the earlier error code path already does the same free, so I think this really is just a bug). And the final case I hit is in ls-files. Deep inside dir.c we allocate some structures into the dir_struct. But there's no cleanup function, so of course we just let it go out of scope at the end of cmd_ls_files() and let the OS deal with it. Obviously one solution is to add a cleanup function and call it at the end of cmd_ls_files(). But here's another trick: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index e1339e6d17..5f2396392e 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -504,7 +504,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) { int require_work_tree = 0, show_tag = 0, i; const char *max_prefix; - struct dir_struct dir; + static struct dir_struct dir; struct exclude_list *el; struct string_list exclude_list = STRING_LIST_INIT_NODUP; struct option builtin_ls_files_options[] = { By marking it as static, it remains reachable even after the function exits, and valgrind doesn't complain about the leak. In a sense, this is accomplishing exactly what we want to say: there should be one of these per program run, and its lifetime is the same as that of the whole program. But it does make me feel just a little dirty. If for some reason we called cmd_ls_files() multiple times in a single process, the current behavior would be to have a small leak for each call. But the behavior with the hunk above would be that the second run sees whatever cruft was left in "dir" from the last run. Overall I suspect that running our cmd_*() functions multiple times in a single process is already going to cause chaos, because they often are touching static globals, etc. So it's probably not that big a deal in practice to add one more variable to the list, and declare that calling cmd_ls_files() anywhere except as the main function of ls-files is forbidden. -Peff ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() 2017-08-30 5:31 ` Jeff King @ 2017-09-05 10:03 ` Junio C Hamano 2017-10-10 4:06 ` [PATCH 0/2] Do not call cmd_*() as a subroutine Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2017-09-05 10:03 UTC (permalink / raw) To: Jeff King; +Cc: Martin Ågren, Lars Schneider, Git Users Jeff King <peff@peff.net> writes: > Overall I suspect that running our cmd_*() functions multiple times in a > single process is already going to cause chaos, because they often are > touching static globals, etc. So it's probably not that big a deal in > practice to add one more variable to the list, and declare that calling > cmd_ls_files() anywhere except as the main function of ls-files is > forbidden. IIRC there were a few existing violators of this rule around "merge" implementation. It may be a reasonably low hanging fruit to find and fix them. #leftoverbits ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 0/2] Do not call cmd_*() as a subroutine 2017-09-05 10:03 ` Junio C Hamano @ 2017-10-10 4:06 ` Junio C Hamano 2017-10-10 4:06 ` [PATCH 1/2] describe: do not use " Junio C Hamano 2017-10-10 4:06 ` [PATCH 2/2] merge-ours: " Junio C Hamano 0 siblings, 2 replies; 43+ messages in thread From: Junio C Hamano @ 2017-10-10 4:06 UTC (permalink / raw) To: git These two patches are for a recent #leftoverbits topic. https://public-inbox.org/git/xmqqr2vlbgyk.fsf@gitster.mtv.corp.google.com/ The cmd_foo() function is a moral equivalent of 'main' for a Git subcommand 'git foo', and as such, it is allowed to do many things that make it unsuitable to be called as a subroutine, including - call exit(3) to terminate the process; - allocate resource held and used throughout its lifetime, without releasing it upon return/exit; - rely on global variables being initialized at program startup, and update them as needed, making another clean invocation of the function impossible. Correcting two callers by using helper API calls is not so hard. Junio C Hamano (2): describe: do not use cmd_*() as a subroutine merge-ours: do not use cmd_*() as a subroutine builtin/describe.c | 15 +++++++++++---- builtin/merge-ours.c | 16 +++++++--------- 2 files changed, 18 insertions(+), 13 deletions(-) -- 2.15.0-rc0-203-g4c8d0e28b1 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 1/2] describe: do not use cmd_*() as a subroutine 2017-10-10 4:06 ` [PATCH 0/2] Do not call cmd_*() as a subroutine Junio C Hamano @ 2017-10-10 4:06 ` Junio C Hamano 2017-10-10 13:43 ` SZEDER Gábor 2017-10-10 4:06 ` [PATCH 2/2] merge-ours: " Junio C Hamano 1 sibling, 1 reply; 43+ messages in thread From: Junio C Hamano @ 2017-10-10 4:06 UTC (permalink / raw) To: git The cmd_foo() function is a moral equivalent of 'main' for a Git subcommand 'git foo', and as such, it is allowed to do many things that make it unsuitable to be called as a subroutine, including - call exit(3) to terminate the process; - allocate resource held and used throughout its lifetime, without releasing it upon return/exit; - rely on global variables being initialized at program startup, and update them as needed, making another clean invocation of the function impossible. The call to cmd_diff_index() "git describe" makes has been working by accident that the function did not call exit(3); it sets a bad precedent for people to cut and paste. We could invoke it via the run_command() interface, but the diff family of commands have helper functions in diff-lib.c that are meant to be usable as subroutines, and using the latter does not make the resulting code all that longer. Use it. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/describe.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/builtin/describe.c b/builtin/describe.c index 89ea1cdd60..50263759cb 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -7,12 +7,12 @@ #include "builtin.h" #include "exec_cmd.h" #include "parse-options.h" +#include "revision.h" #include "diff.h" #include "hashmap.h" #include "argv-array.h" #include "run-command.h" -#define SEEN (1u << 0) #define MAX_TAGS (FLAG_BITS - 1) static const char * const describe_usage[] = { @@ -532,7 +532,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix) } } else if (dirty) { static struct lock_file index_lock; - int fd; + struct rev_info revs; + struct argv_array args = ARGV_ARRAY_INIT; + int fd, result; read_cache_preload(NULL); refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, @@ -541,8 +543,13 @@ int cmd_describe(int argc, const char **argv, const char *prefix) if (0 <= fd) update_index_if_able(&the_index, &index_lock); - if (!cmd_diff_index(ARRAY_SIZE(diff_index_args) - 1, - diff_index_args, prefix)) + init_revisions(&revs, prefix); + argv_array_pushv(&args, diff_index_args); + if (setup_revisions(args.argc, args.argv, &revs, NULL) != 1) + BUG("malformed internal diff-index command line"); + result = run_diff_index(&revs, 0); + + if (!diff_result_code(&revs.diffopt, result)) suffix = NULL; else suffix = dirty; -- 2.15.0-rc0-203-g4c8d0e28b1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 1/2] describe: do not use cmd_*() as a subroutine 2017-10-10 4:06 ` [PATCH 1/2] describe: do not use " Junio C Hamano @ 2017-10-10 13:43 ` SZEDER Gábor 2017-10-11 6:00 ` Junio C Hamano 0 siblings, 1 reply; 43+ messages in thread From: SZEDER Gábor @ 2017-10-10 13:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: SZEDER Gábor, git > The cmd_foo() function is a moral equivalent of 'main' for a Git > subcommand 'git foo', and as such, it is allowed to do many things > that make it unsuitable to be called as a subroutine, including > > - call exit(3) to terminate the process; > > - allocate resource held and used throughout its lifetime, without > releasing it upon return/exit; > > - rely on global variables being initialized at program startup, > and update them as needed, making another clean invocation of the > function impossible. > > The call to cmd_diff_index() "git describe" makes has been working > by accident that the function did not call exit(3); it sets a bad > precedent for people to cut and paste. The subject implies to me that this patch eliminates all cmd_*() calls in builtin/describe.c, but a call to cmd_name_rev() still remains. However, that call is special in the sense that cmd_describe() returns immediately thereafter, so none of the above three points are an issue there. Someone might argue that it still sets a bad precedent, but I won't :) To avoid the direct cmd_name_rev() call we would have to use run_command(), because there are no libified helper functions available to do the job, adding the overhead of a fork()+exec(), though only once or nonce, depending on cmdline options. Maybe you already considered all this WRT that cmd_name_rev() call, I don't know. In any case, I think at least the subject line should spell out cmd_diff_index(). Gábor > We could invoke it via the run_command() interface, but the diff > family of commands have helper functions in diff-lib.c that are > meant to be usable as subroutines, and using the latter does not > make the resulting code all that longer. Use it. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > builtin/describe.c | 15 +++++++++++---- > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/builtin/describe.c b/builtin/describe.c > index 89ea1cdd60..50263759cb 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > @@ -7,12 +7,12 @@ > #include "builtin.h" > #include "exec_cmd.h" > #include "parse-options.h" > +#include "revision.h" > #include "diff.h" > #include "hashmap.h" > #include "argv-array.h" > #include "run-command.h" > > -#define SEEN (1u << 0) > #define MAX_TAGS (FLAG_BITS - 1) > > static const char * const describe_usage[] = { > @@ -532,7 +532,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix) > } > } else if (dirty) { > static struct lock_file index_lock; > - int fd; > + struct rev_info revs; > + struct argv_array args = ARGV_ARRAY_INIT; > + int fd, result; > > read_cache_preload(NULL); > refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, > @@ -541,8 +543,13 @@ int cmd_describe(int argc, const char **argv, const char *prefix) > if (0 <= fd) > update_index_if_able(&the_index, &index_lock); > > - if (!cmd_diff_index(ARRAY_SIZE(diff_index_args) - 1, > - diff_index_args, prefix)) > + init_revisions(&revs, prefix); > + argv_array_pushv(&args, diff_index_args); > + if (setup_revisions(args.argc, args.argv, &revs, NULL) != 1) > + BUG("malformed internal diff-index command line"); > + result = run_diff_index(&revs, 0); > + > + if (!diff_result_code(&revs.diffopt, result)) > suffix = NULL; > else > suffix = dirty; > -- > 2.15.0-rc0-203-g4c8d0e28b1 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/2] describe: do not use cmd_*() as a subroutine 2017-10-10 13:43 ` SZEDER Gábor @ 2017-10-11 6:00 ` Junio C Hamano 0 siblings, 0 replies; 43+ messages in thread From: Junio C Hamano @ 2017-10-11 6:00 UTC (permalink / raw) To: SZEDER Gábor; +Cc: git SZEDER Gábor <szeder.dev@gmail.com> writes: > Maybe you already considered all this WRT that cmd_name_rev() call, I > don't know. In any case, I think at least the subject line should > spell out cmd_diff_index(). Perhaps. The intent was to eradicate cmd_*() used as a subroutine, so I'd rather opt to explain why name_rev() that does not return is OK in this case in the log message without changing anything else. Thanks for looking it over. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 2/2] merge-ours: do not use cmd_*() as a subroutine 2017-10-10 4:06 ` [PATCH 0/2] Do not call cmd_*() as a subroutine Junio C Hamano 2017-10-10 4:06 ` [PATCH 1/2] describe: do not use " Junio C Hamano @ 2017-10-10 4:06 ` Junio C Hamano 1 sibling, 0 replies; 43+ messages in thread From: Junio C Hamano @ 2017-10-10 4:06 UTC (permalink / raw) To: git The call to cmd_diff_index() "git merge-ours" makes has been working by accident that the function did not call exit(3), and the caller exited almost immediately after making a call, but it sets a bad precedent for people to cut and paste. For finding out if the index exactly matches the HEAD (or a given tree-ish), there is index_differs_from() which is exactly written for that purpose. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/merge-ours.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/builtin/merge-ours.c b/builtin/merge-ours.c index 684411694f..beb0623d56 100644 --- a/builtin/merge-ours.c +++ b/builtin/merge-ours.c @@ -9,26 +9,24 @@ */ #include "git-compat-util.h" #include "builtin.h" +#include "diff.h" static const char builtin_merge_ours_usage[] = "git merge-ours <base>... -- HEAD <remote>..."; -static const char *diff_index_args[] = { - "diff-index", "--quiet", "--cached", "HEAD", "--", NULL -}; -#define NARGS (ARRAY_SIZE(diff_index_args) - 1) - int cmd_merge_ours(int argc, const char **argv, const char *prefix) { if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_merge_ours_usage); /* - * We need to exit with 2 if the index does not match our HEAD tree, - * because the current index is what we will be committing as the - * merge result. + * The contents of the current index becomes the tree we + * commit. The index must match HEAD, or this merge cannot go + * through. */ - if (cmd_diff_index(NARGS, diff_index_args, prefix)) + if (read_cache() < 0) + die_errno("read_cache failed"); + if (index_differs_from("HEAD", 0, 0)) exit(2); exit(0); } -- 2.15.0-rc0-203-g4c8d0e28b1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
end of thread, other threads:[~2017-10-11 6:00 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-27 7:37 [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren 2017-08-27 15:41 ` Jeff King 2017-08-27 18:25 ` Jeff King 2017-08-27 18:21 ` Lars Schneider 2017-08-27 19:09 ` Martin Ågren 2017-08-27 19:15 ` Jeff King 2017-08-27 20:04 ` Lars Schneider 2017-08-27 23:23 ` Jeff King 2017-08-28 4:11 ` Martin Ågren 2017-08-28 17:52 ` Stefan Beller 2017-08-28 17:58 ` Jeff King 2017-09-05 10:03 ` Junio C Hamano 2017-08-29 17:51 ` Lars Schneider 2017-08-29 18:53 ` Jeff King 2017-08-29 18:58 ` [PATCH] config: use a static lock_file struct Jeff King 2017-08-29 19:09 ` Brandon Williams 2017-08-29 19:10 ` Brandon Williams 2017-08-29 19:12 ` Jeff King 2017-08-30 3:25 ` Michael Haggerty 2017-08-30 4:31 ` Jeff King 2017-08-30 4:55 ` Michael Haggerty 2017-08-30 4:55 ` Jeff King 2017-08-30 5:55 ` Jeff King 2017-08-30 7:07 ` Michael Haggerty 2017-09-02 8:44 ` Jeff King 2017-09-02 13:50 ` Jeff King 2017-08-30 6:55 ` Michael Haggerty 2017-08-30 19:53 ` Jeff King 2017-08-30 19:57 ` Brandon Williams 2017-08-30 20:11 ` Jeff King 2017-08-30 21:06 ` Brandon Williams 2017-08-31 4:09 ` Jeff King 2017-09-06 3:59 ` Junio C Hamano 2017-09-06 12:41 ` Jeff King 2017-08-29 19:22 ` [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1() Martin Ågren 2017-08-29 21:48 ` Jeff King 2017-08-30 5:31 ` Jeff King 2017-09-05 10:03 ` Junio C Hamano 2017-10-10 4:06 ` [PATCH 0/2] Do not call cmd_*() as a subroutine Junio C Hamano 2017-10-10 4:06 ` [PATCH 1/2] describe: do not use " Junio C Hamano 2017-10-10 13:43 ` SZEDER Gábor 2017-10-11 6:00 ` Junio C Hamano 2017-10-10 4:06 ` [PATCH 2/2] merge-ours: " Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).