From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Jonathan Nieder <jrnieder@gmail.com>,
Johannes Sixt <j6t@kdbg.org>, Jeff King <peff@peff.net>,
Heiko Voigt <hvoigt@hvoigt.net>,
Jens Lehmann <jens.lehmann@web.de>
Subject: Re: [RFC PATCH 2/3] run-commands: add an async queue processor
Date: Fri, 21 Aug 2015 12:45:28 -0700 [thread overview]
Message-ID: <CAGZ79kZrcPAAt+miHDGQp=052S-q=JaKvvLgKHaPG+G6cDjBtg@mail.gmail.com> (raw)
In-Reply-To: <xmqqegiw5uom.fsf@gitster.dls.corp.google.com>
On Fri, Aug 21, 2015 at 12:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This adds functionality to do work in parallel.
>>
>> The whole life cycle of such a thread pool would look like
>>
>> struct task_queue * tq = create_task_queue(32); // no of threads
>> for (...)
>> add_task(tq, process_one_item_function, item); // non blocking
>> ...
>> int ret = finish_task_queue(tq); // blocks until all tasks are done
>> if (!tq)
>> die ("Not all items were be processed");
>>
>> The caller must take care of handling the output.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>
>> I sent this a while ago to the list, no comments on it :(
>
> The primary reason I suspect is because you sent to a wrong set of
> people. Submodule folks have largely been working in the scripted
> ones, and may not necessarily be the ones who are most familiar with
> the run-command infrastructure.
>
> "shortlog --no-merges" tells me that the obvious suspects are j6t
> and peff.
noted.
>
>> The core functionality stayed the same, but I hope to improved naming and
>> location of the code.
>>
>> The WIP is only for the NO_PTHREADS case.
>
>> run-command.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> run-command.h | 30 +++++++++
>> 2 files changed, 230 insertions(+), 12 deletions(-)
>>
>> diff --git a/run-command.c b/run-command.c
>> index 28e1d55..4029011 100644
>> --- a/run-command.c
>> +++ b/run-command.c
>> @@ -4,6 +4,21 @@
>> #include "sigchain.h"
>> #include "argv-array.h"
>>
>> +#ifdef NO_PTHREADS
>> +
>> +#else
>> +
>> +#include "thread-utils.h"
>> +
>> +#include <pthread.h>
>> +#include <semaphore.h>
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +
>> +#endif
>> +
>> +#include "git-compat-util.h"
>> +
>
> This goes against the way we have been organizing the header files.
>
> http://thread.gmane.org/gmane.comp.version-control.git/276241/focus=276265
ok
>
>> @@ -668,6 +683,22 @@ int git_atexit(void (*handler)(void))
>>
>> #endif
>>
>> +void setup_main_thread()
>
> void setup_main_thread(void)
>
>> @@ -852,3 +872,171 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
>> close(cmd->out);
>> return finish_command(cmd);
>> }
>> +
>> +#ifndef NO_PTHREADS
>> +struct job_list {
>> + int (*fct)(struct task_queue *aq, void *task);
>> + void *task;
>> + struct job_list *next;
>> +};
>> +#endif
>> +
>> +struct task_queue {
>> +#ifndef NO_PTHREADS
>> + /*
>> + * To avoid deadlocks always aquire the semaphores with lowest priority
>
> acquire.
>
>> + * first, priorites are in descending order as listed.
>> + *
>> + * The `mutex` is a general purpose lock for modifying data in the async
>> + * queue, such as adding a new task or adding a return value from
>> + * an already run task.
>> + *
>> + * `workingcount` and `freecount` are opposing semaphores, the sum of
>> + * their values should equal `max_threads` at any time while the `mutex`
>> + * is available.
>> + */
>> + sem_t mutex;
>> + sem_t workingcount;
>> + sem_t freecount;
>> +
>> + pthread_t *threads;
>> + unsigned max_threads;
>> +
>> + struct job_list *first;
>> + struct job_list *last;
>> +#endif
>> + int early_return;
>> +};
>> +
>> +#ifndef NO_PTHREADS
>> +
>> +static void get_task(struct task_queue *aq,
>> + int (**fct)(struct task_queue *aq, void *task),
>> + void **task,
>> + int *early_return)
>> +{
>> + struct job_list *job;
>> +
>> + sem_wait(&aq->workingcount);
>> + sem_wait(&aq->mutex);
>> +
>> + if (!aq->first)
>> + die("BUG: internal error with dequeuing jobs for threads");
>> + job = aq->first;
>> + *fct = job->fct;
>> + *task = job->task;
>> + aq->early_return |= *early_return;
>> + *early_return = aq->early_return;
>> + aq->first = job->next;
>> + if (!aq->first)
>> + aq->last = NULL;
>> +
>> + sem_post(&aq->freecount);
>> + sem_post(&aq->mutex);
>> +
>> + free(job);
>> +}
>> +
>> +static void* dispatcher(void *args)
>
> static void *dispatcher(....)
>
>> +{
>> + void *task;
>> + int (*fct)(struct task_queue *aq, void *data);
>
> s/data/task/?
>
>> + int early_return = 0;
>> + struct task_queue *aq = args;
>> +
>> + get_task(aq, &fct, &task, &early_return);
>> + while (fct || early_return != 0) {
>> + early_return = fct(aq, task);
>> + get_task(aq, &fct, &task, &early_return);
>> + }
>
> If the func said "we are done, you may stop dispatching now", do you
> still want to do another get_task()?
This question shows me I messed up readability of this. `get_task` both gets
a new task as well as taking the value from early_return writing it to
aq->early_return,
such that other threads are notified.
Maybe I should do that explicitely here and not get the new task.
>
>> + pthread_exit(0);
>> +}
>> +#endif
>> +
>> +struct task_queue *create_task_queue(unsigned max_threads)
>> +{
>> + struct task_queue *aq = xmalloc(sizeof(*aq));
>> +
>> +#ifndef NO_PTHREADS
>> + int i;
>> + if (!max_threads)
>> + aq->max_threads = online_cpus();
>> + else
>> + aq->max_threads = max_threads;
>> +
>> + sem_init(&aq->mutex, 0, 1);
>> + sem_init(&aq->workingcount, 0, 0);
>> + sem_init(&aq->freecount, 0, aq->max_threads);
>> + aq->threads = xmalloc(aq->max_threads * sizeof(pthread_t));
>> +
>> + for (i = 0; i < aq->max_threads; i++)
>> + pthread_create(&aq->threads[i], 0, &dispatcher, aq);
>> +
>> + aq->first = NULL;
>> + aq->last = NULL;
>
>
> Shouldn't these be initialized before letting threads call into
> dispatcher? The workingcount semaphore that is initialized to 0 may
> prevent them from peeking into these pointers and barfing, but still...
They are initialized to NULL as the empty queue doesn't need a
container element.
Do we do queues in another way usually?
>
>> +
>> + setup_main_thread();
>> +#endif
>> + aq->early_return = 0;
>> +
>> + return aq;
>> +}
next prev parent reply other threads:[~2015-08-21 19:45 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-21 1:40 [PATCH 1/3] submodule: implement `module_clone` as a builtin helper Stefan Beller
2015-08-21 1:40 ` [RFC PATCH 2/3] run-commands: add an async queue processor Stefan Beller
2015-08-21 19:05 ` Junio C Hamano
2015-08-21 19:44 ` Jeff King
2015-08-21 19:48 ` Stefan Beller
2015-08-21 19:51 ` Jeff King
2015-08-21 20:12 ` Stefan Beller
2015-08-21 20:41 ` Junio C Hamano
2015-08-21 23:40 ` Stefan Beller
2015-08-24 21:22 ` Junio C Hamano
2015-08-21 19:45 ` Stefan Beller [this message]
2015-08-21 20:47 ` Junio C Hamano
2015-08-21 20:56 ` Stefan Beller
2015-08-21 1:40 ` [WIP/PATCH 3/3] submodule: helper to run foreach in parallel Stefan Beller
2015-08-21 19:23 ` Junio C Hamano
2015-08-21 20:21 ` Stefan Beller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGZ79kZrcPAAt+miHDGQp=052S-q=JaKvvLgKHaPG+G6cDjBtg@mail.gmail.com' \
--to=sbeller@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=hvoigt@hvoigt.net \
--cc=j6t@kdbg.org \
--cc=jens.lehmann@web.de \
--cc=jrnieder@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).