Re: [PATCH] http.c: clear the 'finished' member once we are done with it

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Michael J Gruber <git@grubix.eu>
Subject: Re: [PATCH] http.c: clear the 'finished' member once we are done with it
Date: Tue, 24 May 2022 10:15:57 -0700	[thread overview]
Message-ID: <xmqqleuqj1gy.fsf@gitster.g> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.2205241258510.352@tvgsbejvaqbjf.bet> (Johannes Schindelin's message of "Tue, 24 May 2022 13:03:41 +0200 (CEST)")

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Not really.  An outer run_active_slot() and an inner
>> run_active_slot() have a pointer to the same slot object.
>
> How is that possible? One of the first things that function does is to
> assign `slot->finished = &finished`, and then run that `while (!finished)`
> loop.
>
> How would the outer `run_active_slot()` ever get signaled via `finished`
> when the inner `run_active_slot()` would overwrite `slot->finished`? I am
> puzzled why we do not see infinite loops in such outer calls all the time,
> then.

The idea in http subsystem goes like this.

 * Generally, we have multiple curl requests in flight.  A curlm
   passed to curl_multi_perform() call knows about them and attempts
   to make as much progress without blocking.

 * After calling curl_multi_perform(), we call process_curl_messages()
   to collect the response that corresponds to the request.  This is
   done using the slot data structure.  Once we read the response,
   we may process it further by making a callback.

 * A slot, when finished, can be reused.  THe reuse is controlled by
   its in_use member.

So, let's trace a code flow, http-walker.c::fetch_object() is used
as a sample starting point.

 * http-walker.c::fetch_object()
   - pushes the object name to object_request queue.
   - calls step_active_slots() to make progress.  This function in turn
     - calls curl_multi_perform() repeatedly to make progress
     - calls process_curl_messages() to possibly complete some active slots
     - calls fill_active_slots() to fill more requests.  This function
       calls the "fill" function repeatedly to make more requests,
       which is http-walker.c::fill_active_slot() in this code path.  It
       - repeatedly calls start_object_request()
         * start_object_request() does these:
           - calls new_http_object_request(), which prepares object-request
             structure, in which there is a slot member that was
	     obtained by calling get_active_slot().
	     * get_active_slot() does many things, but all we need to know	
	       here is that it does "in_use = 1".
           - sets callback for the slot to process_object_response()
           - calls start_active_slot(),
             which adds the slot to curlm and calls curl_multi_perform()
             to make progress on the active slots.
     - calls run_active_slots() repeatedly.

Now run_active_slots() we know about.  Before baa7b67d (HTTP slot
reuse fixes, 2006-03-10), we used to loop on slot->in_use but to fix
a bug we updated it to use slot->finished.

 * run_active_slot()
   - takes a slot
   - clears finished on its stack
   - makes slot->finished point at &finished on its stack
   - loops until "finished" is set
     - calls step_active_slots(); what it does can be seen above,
       but here, we need to know what process_curl_messages() it
       calls does, in order to complete some requests.
       * process_curl_messages() 
         - reads the response from curl
         - finds the slot with request that resulted in the response
         - sets its result member
         - calls finish_active_slot() on it, which in turn does these:
	   - calls closedown_active_slot(), slot->in_use becomes 0
           - sets (*slot->finished) = 1
	   - calls slot->callback_func

The callback_func was set to process_object_response() earlier in
this code flow.

 * http-walker.c::process_object_response()
   - calls process_http_object_request(), which dissociates the slot
     from the http_object_request object.
   - may call fetch_alternates() when the object is not found,
     otherwise calls finish_object_request().

Let's see what happens when fetch_alternates() gets called here.

 * http-walker.c::fetch_alternates()
   - calls step_active_slots() to make progress
   - calls get_active_slot() 
   - calls start_active_slot()
   - calls run_active_slot()

Now we can see how the "slot" we used in the "outer" run_active_slot()
can be reused for a different request.  We received response to the
request, and in process_curl_messages(), we called finish_active_slot()
on the slot, which did three things: (1) slot is now not-in-use, (2) the
"finished" on the stack of the outer run_active_slot() is set to 1, and
(3) called the process_object_response() callback.

The callback then asked for an unused slot, and got the slot we just
used, because we no longer need it (the necessary information in the
response have been copied away to http_object_request object before
the slot was dissociated from it, and the only one bit of
information the outer run_active_slot() needs has already been sent
there on its on-stack "finished" variable).  The reused slot goes
through the usual start_active_slot() call to add it to curlm, and
then the "inner" run_active_slot() is started on it.  Until the
inner run_active_slot() returns, fetch_alternates() would not
return, but once it does, the control goes back to the outer
run_active_slot(), where it finds that its "finished" is now set to
1.

This incidentally is a good illustration why the thread-starter
patch that did

	if (&finished == slot->finished)
		slot->finished = NULL;

would be sufficient, and the "clear only ours" guard is not
necessary, I think.  If the inner run_active_slot() did not trigger
a callback that adds more reuse of the slot, it will clear
slot->finished to NULL itself, with or without the guard.  And the
outer run_active_slot() may fail to clear if the guard is there, but
slot->finished is NULL in that case, so there is no point in clearing
it again.

And if the inner run_active_slot() did trigger a callback that ended
up reusing the slot, then eventually the innermost one would have
cleared slot->finished to NULL, with or without the guard, before it
returned the control to inner run_active_slot().  The inference goes
the same way to show that the guard is not necessary but is not
hurting.

I _think_ we can even get away by not doing anything to
slot->finished at the end of run_active_slot(), as we are not
multi-threaded and the callee only returns to the caller, but if it
helps pleasing the warning compiler, I'd prefer the simplest
workaround, perhaps with an unconditional clearing there?

What did I miss?  I must be missing something, as I can explain how
the current "(*slot->finished) = 1" with "while (finished)"
correctly works, but I cannot quite explain why the original "while
(slot->in_use)" would not, which is annoying.

In other words, why we needed baa7b67d (HTTP slot reuse fixes,
2006-03-10) in the first place?  It is possible that we had some
code paths that forgot to drop in_use before the inner run_active
returned that have been fixed in the 16 years and this fix was
hiding that bug, but I dunno.

next prev parent reply	other threads:[~2022-05-24 17:17 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-06 18:04 [PATCH 0/2] quell a few gcc warnings Michael J Gruber
2022-05-06 18:04 ` [PATCH 1/2] dir.c: avoid gcc warning Michael J Gruber
2022-05-06 20:21   ` Junio C Hamano
2022-05-09 15:58     ` Taylor Blau
2022-05-07  6:14   ` Carlo Marcelo Arenas Belón
2022-05-06 18:04 ` [PATCH 2/2] http.c: " Michael J Gruber
2022-05-06 20:22   ` Junio C Hamano
2022-05-06 21:17     ` [PATCH] http.c: clear the 'finished' member once we are done with it Junio C Hamano
2022-05-07  5:40       ` Carlo Marcelo Arenas Belón
2022-05-07 18:42         ` Junio C Hamano
2022-05-07 19:11           ` Carlo Arenas
2022-05-23 21:58       ` Johannes Schindelin
2022-05-23 22:58         ` Junio C Hamano
2022-05-23 23:36           ` Junio C Hamano
2022-05-23 23:41           ` Johannes Schindelin
2022-05-24  0:02             ` Junio C Hamano
2022-05-24  6:31               ` Daniel Stenberg
2022-05-24 10:57                 ` Johannes Schindelin
2022-05-24 17:45                 ` Junio C Hamano
2022-05-26 14:15                   ` Daniel Stenberg
2022-05-24 11:03               ` Johannes Schindelin
2022-05-24 17:15                 ` Junio C Hamano [this message]
2022-05-24 20:16                   ` Carlo Marcelo Arenas Belón
2022-05-24 20:45                     ` Ævar Arnfjörð Bjarmason
2022-05-24 22:34                       ` Junio C Hamano
2022-05-25  9:08                         ` Michael J Gruber
2022-05-25 13:27                         ` Ævar Arnfjörð Bjarmason
2022-05-24 22:16                     ` Junio C Hamano
2022-05-24 23:19                     ` Junio C Hamano
2022-05-25  2:02                       ` Carlo Arenas
2022-05-24 20:38                   ` Ævar Arnfjörð Bjarmason
2022-05-24 22:28                     ` Junio C Hamano
2022-05-25 10:07                       ` Johannes Schindelin
2022-05-25 16:40                         ` Junio C Hamano
2022-05-06 20:41   ` [PATCH 2/2] http.c: avoid gcc warning Carlo Marcelo Arenas Belón
2022-05-09 11:22 ` [PATCH] detect-compiler: make detection independent of locale Michael J Gruber
2022-05-09 15:52   ` Junio C Hamano
2022-05-09 15:59     ` rsbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqleuqj1gy.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@grubix.eu \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).