git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH] attr: Document a new possible thread safe API
@ 2016-10-04 22:14 Stefan Beller
  2016-10-04 23:13 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2016-10-04 22:14 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, Stefan Beller

This is what we want to see at the end of the refactoring session
to enable the attr subsystem to be thread safe.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Junio wrote:  
>> So how would we go about git_all_attrs then?
>
> I think you arrived the same conclusion, but ...
  
I think the changes as proposed are minimal to be able to use attrs in a
multithreaded environment. 
  
This patch applies on top of jc/attr-more.
A complete diff between origin/master..HEAD for the documentation is
appended below.
  
Thanks,
Stefan

 Documentation/technical/api-gitattributes.txt | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
index 92fc32a..940617e 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -59,7 +59,10 @@ Querying Specific Attributes
   empty `struct git_attr_check` can be prepared by calling
   `git_attr_check_alloc()` function and then attributes you want to
   ask about can be added to it with `git_attr_check_append()`
-  function.
+  function. git_attr_check_initl is thread safe, i.e. you can call it
+  from different threads at the same time; internally however only one
+  call at a time is processed. If the calls from different threads have
+  the same arguments, the returned `git_attr_check` may be the same.
 
 * Call `git_check_attr()` to check the attributes for the path.
 
@@ -89,15 +92,21 @@ static void setup_check(void)
 
 ------------
 	const char *path;
+	struct git_attr_check *result;
 
 	setup_check();
-	git_check_attr(path, check);
+	result = git_check_attr(path, check);
 ------------
 
-. Act on `.value` member of the result, left in `check->check[]`:
+The result may be the same as `check` (in a single threaded application),
+but generally assume it is not. The `result` must not be free'd as it is
+owned by the attr subsystem. It is reused by the same thread, so a subsequent
+call to git_check_attr in the same thread will overwrite the result.
+
+. Act on `.value` member of the `result->check[]`:
 
 ------------
-	const char *value = check->check[0].value;
+	const char *value = result->check[0].value;
 
 	if (ATTR_TRUE(value)) {
 		The attribute is Set, by listing only the name of the
@@ -138,10 +147,7 @@ Querying All Attributes
 
 To get the values of all attributes associated with a file:
 
-* Prepare an empty `git_attr_check` structure by calling
-  `git_attr_check_alloc()`.
-
-* Call `git_all_attrs()`, which populates the `git_attr_check`
+* Call `git_all_attrs()`, which returns a `git_attr_check`
   with the attributes attached to the path.
 
 * Iterate over the `git_attr_check.check[]` array to examine
-- 
2.10.0.129.g35f6318






The following is `git diff origin/master Documentation/technical/api-gitattributes.txt`:

diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
index 2602668..940617e 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,10 +16,15 @@ Data Structure
 	of no interest to the calling programs.  The name of the
 	attribute can be retrieved by calling `git_attr_name()`.
 
+`struct git_attr_check_elem`::
+
+	This structure represents one attribute and its value.
+
 `struct git_attr_check`::
 
-	This structure represents a set of attributes to check in a call
-	to `git_check_attr()` function, and receives the results.
+	This structure represents a collection of `git_attr_check_elem`.
+	It is passed to `git_check_attr()` function, specifying the
+	attributes to check, and receives their values.
 
 
 Attribute Values
@@ -48,49 +53,60 @@ value of the attribute for the path.
 Querying Specific Attributes
 ----------------------------
 
-* Prepare an array of `struct git_attr_check` to define the list of
-  attributes you would want to check.  To populate this array, you would
-  need to define necessary attributes by calling `git_attr()` function.
+* Prepare `struct git_attr_check` using git_attr_check_initl()
+  function, enumerating the names of attributes whose values you are
+  interested in, terminated with a NULL pointer.  Alternatively, an
+  empty `struct git_attr_check` can be prepared by calling
+  `git_attr_check_alloc()` function and then attributes you want to
+  ask about can be added to it with `git_attr_check_append()`
+  function. git_attr_check_initl is thread safe, i.e. you can call it
+  from different threads at the same time; internally however only one
+  call at a time is processed. If the calls from different threads have
+  the same arguments, the returned `git_attr_check` may be the same.
 
 * Call `git_check_attr()` to check the attributes for the path.
 
-* Inspect `git_attr_check` structure to see how each of the attribute in
-  the array is defined for the path.
+* Inspect `git_attr_check` structure to see how each of the
+  attribute in the array is defined for the path.
 
 
 Example
 -------
 
-To see how attributes "crlf" and "indent" are set for different paths.
+To see how attributes "crlf" and "ident" are set for different paths.
 
-. Prepare an array of `struct git_attr_check` with two elements (because
-  we are checking two attributes).  Initialize their `attr` member with
-  pointers to `struct git_attr` obtained by calling `git_attr()`:
+. Prepare a `struct git_attr_check` with two elements (because
+  we are checking two attributes):
 
 ------------
-static struct git_attr_check check[2];
+static struct git_attr_check *check;
 static void setup_check(void)
 {
-	if (check[0].attr)
+	if (check)
 		return; /* already done */
-	check[0].attr = git_attr("crlf");
-	check[1].attr = git_attr("ident");
+	check = git_attr_check_initl("crlf", "ident", NULL);
 }
 ------------
 
-. Call `git_check_attr()` with the prepared array of `struct git_attr_check`:
+. Call `git_check_attr()` with the prepared `struct git_attr_check`:
 
 ------------
 	const char *path;
+	struct git_attr_check *result;
 
 	setup_check();
-	git_check_attr(path, ARRAY_SIZE(check), check);
+	result = git_check_attr(path, check);
 ------------
 
-. Act on `.value` member of the result, left in `check[]`:
+The result may be the same as `check` (in a single threaded application),
+but generally assume it is not. The `result` must not be free'd as it is
+owned by the attr subsystem. It is reused by the same thread, so a subsequent
+call to git_check_attr in the same thread will overwrite the result.
+
+. Act on `.value` member of the `result->check[]`:
 
 ------------
-	const char *value = check[0].value;
+	const char *value = result->check[0].value;
 
 	if (ATTR_TRUE(value)) {
 		The attribute is Set, by listing only the name of the
@@ -109,20 +125,36 @@ static void setup_check(void)
 	}
 ------------
 
+To see how attributes in argv[] are set for different paths, only
+the first step in the above would be different.
+
+------------
+static struct git_attr_check *check;
+static void setup_check(const char **argv)
+{
+	check = git_attr_check_alloc();
+	while (*argv) {
+		struct git_attr *attr = git_attr(*argv);
+		git_attr_check_append(check, attr);
+		argv++;
+	}
+}
+------------
+
 
 Querying All Attributes
 -----------------------
 
 To get the values of all attributes associated with a file:
 
-* Call `git_all_attrs()`, which returns an array of `git_attr_check`
-  structures.
+* Call `git_all_attrs()`, which returns a `git_attr_check`
+  with the attributes attached to the path.
 
-* Iterate over the `git_attr_check` array to examine the attribute
-  names and values.  The name of the attribute described by a
-  `git_attr_check` object can be retrieved via
-  `git_attr_name(check[i].attr)`.  (Please note that no items will be
-  returned for unset attributes, so `ATTR_UNSET()` will return false
-  for all returned `git_array_check` objects.)
+* Iterate over the `git_attr_check.check[]` array to examine
+  the attribute names and values.  The name of the attribute
+  described by a  `git_attr_check.check[]` object can be retrieved via
+  `git_attr_name(check->check[i].attr)`.  (Please note that no items
+  will be returned for unset attributes, so `ATTR_UNSET()` will return
+  false for all returned `git_array_check` objects.)
 
-* Free the `git_array_check` array.
+* Free the `git_array_check` by calling `git_attr_check_free()`.





^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC/PATCH] attr: Document a new possible thread safe API
  2016-10-04 22:14 [RFC/PATCH] attr: Document a new possible thread safe API Stefan Beller
@ 2016-10-04 23:13 ` Junio C Hamano
  2016-10-04 23:25   ` Stefan Beller
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2016-10-04 23:13 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, bmwill

Stefan Beller <sbeller@google.com> writes:

> diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
> index 92fc32a..940617e 100644
> --- a/Documentation/technical/api-gitattributes.txt
> +++ b/Documentation/technical/api-gitattributes.txt
> @@ -59,7 +59,10 @@ Querying Specific Attributes
>    empty `struct git_attr_check` can be prepared by calling
>    `git_attr_check_alloc()` function and then attributes you want to
>    ask about can be added to it with `git_attr_check_append()`
> -  function.
> +  function. git_attr_check_initl is thread safe, i.e. you can call it
> +  from different threads at the same time; internally however only one
> +  call at a time is processed. If the calls from different threads have
> +  the same arguments, the returned `git_attr_check` may be the same.

I do not think this is enough.  Look at the example for _initl() and
notice that it keeps the "singleton static check that is initialized
by the very first caller if the caller notices it is NULL" pattern.

One way to hide that may be to pass the address of that singleton
pointer to _initl(), so that it can do the "has it been initialized?
If not, let's prepare the thing" under lock.

> @@ -89,15 +92,21 @@ static void setup_check(void)
>  
>  ------------
>  	const char *path;
> +	struct git_attr_check *result;
>  
>  	setup_check();
> -	git_check_attr(path, check);
> +	result = git_check_attr(path, check);

I haven't formed a firm opinion, but I suspect your thinking might
be clouded by too much staring at the current implementation that
has <attr>,<value> pairs inside git_attr_check.  Traditionally, the
attr subsystem used the same type for the query question and the
query answer the same type, but it does not have to stay to be the
case at all.  Have you considered that we are allowed to make these
two types distinct?  A caller can share the same question instance
(i.e. the set of interned <attr>, in git_attr_check) with other
threads as that is a read-only thing, but each of the callers would
want to have the result array on its own stack if possible
(e.g. when asking for a known set of attributes, which is the
majority of the case) to avoid allocation cost.  I'd expect the most
typical caller to be

	static struct git_attr_check *check;
	struct git_attr_result result[2]; /* we want two */
	
	git_attr_check_initl(&check, "crlf", "ident", NULL);
	git_check_attr(path, check, result);
	/* result[0] has "crlf", result[1] has "ident" */

or something like that.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/PATCH] attr: Document a new possible thread safe API
  2016-10-04 23:13 ` Junio C Hamano
@ 2016-10-04 23:25   ` Stefan Beller
  2016-10-05 17:00     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2016-10-04 23:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Brandon Williams

On Tue, Oct 4, 2016 at 4:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
>> index 92fc32a..940617e 100644
>> --- a/Documentation/technical/api-gitattributes.txt
>> +++ b/Documentation/technical/api-gitattributes.txt
>> @@ -59,7 +59,10 @@ Querying Specific Attributes
>>    empty `struct git_attr_check` can be prepared by calling
>>    `git_attr_check_alloc()` function and then attributes you want to
>>    ask about can be added to it with `git_attr_check_append()`
>> -  function.
>> +  function. git_attr_check_initl is thread safe, i.e. you can call it
>> +  from different threads at the same time; internally however only one
>> +  call at a time is processed. If the calls from different threads have
>> +  the same arguments, the returned `git_attr_check` may be the same.
>
> I do not think this is enough.  Look at the example for _initl() and
> notice that it keeps the "singleton static check that is initialized
> by the very first caller if the caller notices it is NULL" pattern.
>
> One way to hide that may be to pass the address of that singleton
> pointer to _initl(), so that it can do the "has it been initialized?
> If not, let's prepare the thing" under lock.

Oh, I see. Yeah that makes sense.

>
>> @@ -89,15 +92,21 @@ static void setup_check(void)
>>
>>  ------------
>>       const char *path;
>> +     struct git_attr_check *result;
>>
>>       setup_check();
>> -     git_check_attr(path, check);
>> +     result = git_check_attr(path, check);
>
> I haven't formed a firm opinion, but I suspect your thinking might
> be clouded by too much staring at the current implementation that
> has <attr>,<value> pairs inside git_attr_check.  Traditionally, the
> attr subsystem used the same type for the query question and the
> query answer the same type, but it does not have to stay to be the
> case at all.  Have you considered that we are allowed to make these
> two types distinct?

I thought about that, but as I concluded that the get_all_attrs doesn't need
conversion to a threading environment, we can keep it as is.

When keeping the get_all_attrs as is, we need to keep the data structures
as is, (i.e. key,value pair inside git_check_attr), so introducing a new
data type seemed not useful for the threaded part.

>  A caller can share the same question instance
> (i.e. the set of interned <attr>, in git_attr_check) with other
> threads as that is a read-only thing, but each of the callers would
> want to have the result array on its own stack if possible
> (e.g. when asking for a known set of attributes, which is the
> majority of the case) to avoid allocation cost.  I'd expect the most
> typical caller to be
>
>         static struct git_attr_check *check;
>         struct git_attr_result result[2]; /* we want two */
>
>         git_attr_check_initl(&check, "crlf", "ident", NULL);
>         git_check_attr(path, check, result);
>         /* result[0] has "crlf", result[1] has "ident" */
>
> or something like that.

I see, that seems to be a clean API. So git_attr_check_initl
will lock the mutex and once it got the mutex it can either
* return early as someone else did the work
* needs to do the actual work
and then unlock. In any case the work was done.

git_check_attr, which runs in all threads points to the same check,
but gets the different results.

Ok, I'll go in that direction then.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/PATCH] attr: Document a new possible thread safe API
  2016-10-04 23:25   ` Stefan Beller
@ 2016-10-05 17:00     ` Junio C Hamano
  2016-10-05 17:04       ` Stefan Beller
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2016-10-05 17:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

> I thought about that, but as I concluded that the get_all_attrs doesn't need
> conversion to a threading environment, we can keep it as is.

I agree that it is OK for get_all_attrs() to use its own way to ask
a question and receive an answer to it, that is different from how
git_check_attr() asks its question.  The threading-support for it is
an unrelated issue, though (not that I think it needs to be run from
a multi-threaded caller).

>> ...  I'd expect the most
>> typical caller to be
>>
>>         static struct git_attr_check *check;
>>         struct git_attr_result result[2]; /* we want two */
>>
>>         git_attr_check_initl(&check, "crlf", "ident", NULL);
>>         git_check_attr(path, check, result);
>>         /* result[0] has "crlf", result[1] has "ident" */
>>
>> or something like that.
>
> I see, that seems to be a clean API. So git_attr_check_initl
> will lock the mutex and once it got the mutex it can either
> * return early as someone else did the work
> * needs to do the actual work
> and then unlock. In any case the work was done.
>
> git_check_attr, which runs in all threads points to the same check,
> but gets the different results.

Yeah, something along that line.  It seems that we are now on the
same page?

Thanks.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/PATCH] attr: Document a new possible thread safe API
  2016-10-05 17:00     ` Junio C Hamano
@ 2016-10-05 17:04       ` Stefan Beller
  2016-10-05 22:09         ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2016-10-05 17:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Brandon Williams

On Wed, Oct 5, 2016 at 10:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> I thought about that, but as I concluded that the get_all_attrs doesn't need
>> conversion to a threading environment, we can keep it as is.
>
> I agree that it is OK for get_all_attrs() to use its own way to ask
> a question and receive an answer to it, that is different from how
> git_check_attr() asks its question.  The threading-support for it is
> an unrelated issue, though (not that I think it needs to be run from
> a multi-threaded caller).
>
>>> ...  I'd expect the most
>>> typical caller to be
>>>
>>>         static struct git_attr_check *check;
>>>         struct git_attr_result result[2]; /* we want two */
>>>
>>>         git_attr_check_initl(&check, "crlf", "ident", NULL);
>>>         git_check_attr(path, check, result);
>>>         /* result[0] has "crlf", result[1] has "ident" */
>>>
>>> or something like that.
>>
>> I see, that seems to be a clean API. So git_attr_check_initl
>> will lock the mutex and once it got the mutex it can either
>> * return early as someone else did the work
>> * needs to do the actual work
>> and then unlock. In any case the work was done.
>>
>> git_check_attr, which runs in all threads points to the same check,
>> but gets the different results.
>
> Yeah, something along that line.  It seems that we are now on the
> same page?
>

I think so, instead of resending the documentation, maybe the header
file shows that we're on the same page, I converted everything except
attr.c to follow this header attr.h:

---8<---
#ifndef ATTR_H
#define ATTR_H

/* An attribute is a pointer to this opaque structure */
struct git_attr;

/*
 * Given a string, return the gitattribute object that
 * corresponds to it.
 */
extern struct git_attr *git_attr(const char *);

/*
 * Return the name of the attribute represented by the argument.  The
 * return value is a pointer to a null-delimited string that is part
 * of the internal data structure; it should not be modified or freed.
 */
extern const char *git_attr_name(const struct git_attr *);

extern int attr_name_valid(const char *name, size_t namelen);
extern void invalid_attr_name_message(struct strbuf *, const char *, int);

/* Internal use */
extern const char git_attr__true[];
extern const char git_attr__false[];

/* For public to check git_attr_check results */
#define ATTR_TRUE(v) ((v) == git_attr__true)
#define ATTR_FALSE(v) ((v) == git_attr__false)
#define ATTR_UNSET(v) ((v) == NULL)

struct git_attr_check {
        int finalized;
        int check_nr;
        int check_alloc;
        struct git_attr **attr;
};

struct git_attr_result {
        int check_nr;
        int check_alloc;
        const char **value;
};

/*
 * Initialize the `git_attr_check` via one of the following three functions:
 *
 * git_attr_check_alloc allocates an empty check, add more attributes via
 *                      git_attr_check_append.
 * git_all_attrs        allocates a check and fills in all attributes that
 *                      are set for the given path.
 * git_attr_check_initl takes a pointer to where the check will be initialized,
 *                      followed by all attributes that are to be checked.
 *                      This makes it potentially thread safe as it could
 *                      internally have a mutex for that memory location.
 *                      Currently it is not thread safe!
 */
extern struct git_attr_check *git_attr_check_alloc(void);
extern void git_attr_check_append(struct git_attr_check *, const
struct git_attr *);
extern struct git_attr_check *git_all_attrs(const char *path);
extern void git_attr_check_initl(struct git_attr_check **, const char *, ...);

/* Query a path for its attributes */
extern struct git_attr_result *git_check_attr(const char *path,
                                             struct git_attr_check *);

/* internal? */
extern void git_attr_check_clear(struct git_attr_check *);

/* After use free the check */
extern void git_attr_check_free(struct git_attr_check *);

enum git_attr_direction {
        GIT_ATTR_CHECKIN,
        GIT_ATTR_CHECKOUT,
        GIT_ATTR_INDEX
};
void git_attr_set_direction(enum git_attr_direction, struct index_state *);

#endif /* ATTR_H */

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC/PATCH] attr: Document a new possible thread safe API
  2016-10-05 17:04       ` Stefan Beller
@ 2016-10-05 22:09         ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2016-10-05 22:09 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

> I think so, instead of resending the documentation, maybe the header
> file shows that we're on the same page, I converted everything except
> attr.c to follow this header attr.h:

OK.  The function signature of git_check_attr() looks suspect (it
does not match the "typical case" illustration in the message you
are responding to), but other than that I think this matches my
expectation.

Thanks for taking this over.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-10-05 22:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-04 22:14 [RFC/PATCH] attr: Document a new possible thread safe API Stefan Beller
2016-10-04 23:13 ` Junio C Hamano
2016-10-04 23:25   ` Stefan Beller
2016-10-05 17:00     ` Junio C Hamano
2016-10-05 17:04       ` Stefan Beller
2016-10-05 22:09         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).