git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
@ 2017-07-08  8:59 René Scharfe
  2017-07-09 11:00 ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2017-07-08  8:59 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Jeff King

Add the slash between loose object subdirectory and file name just once
outside the loop instead of overwriting it with each readdir call.
Redefine baselen as the length with that slash, and add dirlen for the
length without it.  The result is slightly less wasteful and can use the
the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 sha1_file.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5862386cd0..6a9deb9e68 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3749,7 +3749,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 				each_loose_subdir_fn subdir_cb,
 				void *data)
 {
-	size_t origlen, baselen;
+	size_t origlen, dirlen, baselen;
 	DIR *dir;
 	struct dirent *de;
 	int r = 0;
@@ -3760,7 +3760,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	origlen = path->len;
 	strbuf_complete(path, '/');
 	strbuf_addf(path, "%02x", subdir_nr);
-	baselen = path->len;
+	dirlen = path->len;
 
 	dir = opendir(path->buf);
 	if (!dir) {
@@ -3770,12 +3770,15 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 		return r;
 	}
 
+	strbuf_addch(path, '/');
+	baselen = path->len;
+
 	while ((de = readdir(dir))) {
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
 		strbuf_setlen(path, baselen);
-		strbuf_addf(path, "/%s", de->d_name);
+		strbuf_addstr(path, de->d_name);
 
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2)  {
 			char hex[GIT_MAX_HEXSZ+1];
@@ -3801,7 +3804,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	}
 	closedir(dir);
 
-	strbuf_setlen(path, baselen);
+	strbuf_setlen(path, dirlen);
 	if (!r && subdir_cb)
 		r = subdir_cb(subdir_nr, path->buf, data);
 
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-08  8:59 [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir() René Scharfe
@ 2017-07-09 11:00 ` Jeff King
  2017-07-09 13:26   ` René Scharfe
  2017-07-09 16:41   ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2017-07-09 11:00 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano

On Sat, Jul 08, 2017 at 10:59:06AM +0200, René Scharfe wrote:

> Add the slash between loose object subdirectory and file name just once
> outside the loop instead of overwriting it with each readdir call.
> Redefine baselen as the length with that slash, and add dirlen for the
> length without it.  The result is slightly less wasteful and can use the
> the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.

This patch looks correct to me.

I'm a little lukewarm on it overall, though. I'd be shocked if the
efficiency change is measurable. What I really care about is whether the
result is easier to read or not.

On the plus side, this moves an invariant out of the loop. On the minus
side, it has to introduce an extra variable for "length we add on to"
versus "dir length to pass to the subdir_cb". That's not rocket science,
but it does slightly complicate things (though I note we already have
"origlen", so this is bumping us from 2 to 3 length variables, not 1 to
2).

So I dunno. It's fine with me if we take it, and fine if we leave it.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-09 11:00 ` Jeff King
@ 2017-07-09 13:26   ` René Scharfe
  2017-07-09 16:41   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: René Scharfe @ 2017-07-09 13:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano

Am 09.07.2017 um 13:00 schrieb Jeff King:
> On Sat, Jul 08, 2017 at 10:59:06AM +0200, René Scharfe wrote:
> 
>> Add the slash between loose object subdirectory and file name just once
>> outside the loop instead of overwriting it with each readdir call.
>> Redefine baselen as the length with that slash, and add dirlen for the
>> length without it.  The result is slightly less wasteful and can use the
>> the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.
> 
> This patch looks correct to me.
> 
> I'm a little lukewarm on it overall, though. I'd be shocked if the
> efficiency change is measurable. What I really care about is whether the
> result is easier to read or not.
> 
> On the plus side, this moves an invariant out of the loop. On the minus
> side, it has to introduce an extra variable for "length we add on to"
> versus "dir length to pass to the subdir_cb". That's not rocket science,
> but it does slightly complicate things (though I note we already have
> "origlen", so this is bumping us from 2 to 3 length variables, not 1 to
> 2).

Admittedly this is more of an OCD thing.  Overwriting a character 200
times or parsing a format string don't very long compared to the rest
of the function, but it's a bit itchy.

René

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-09 11:00 ` Jeff King
  2017-07-09 13:26   ` René Scharfe
@ 2017-07-09 16:41   ` Junio C Hamano
  2017-07-10  0:06     ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2017-07-09 16:41 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git List

Jeff King <peff@peff.net> writes:

> On Sat, Jul 08, 2017 at 10:59:06AM +0200, René Scharfe wrote:
>
>> Add the slash between loose object subdirectory and file name just once
>> outside the loop instead of overwriting it with each readdir call.
>> Redefine baselen as the length with that slash, and add dirlen for the
>> length without it.  The result is slightly less wasteful and can use the
>> the cheaper strbuf_addstr instead of strbuf_addf without losing clarity.
>
> This patch looks correct to me.
>
> I'm a little lukewarm on it overall, though. I'd be shocked if the
> efficiency change is measurable. What I really care about is whether the
> result is easier to read or not.
>
> On the plus side, this moves an invariant out of the loop. On the minus
> side, it has to introduce an extra variable for "length we add on to"
> versus "dir length to pass to the subdir_cb". That's not rocket science,
> but it does slightly complicate things (though I note we already have
> "origlen", so this is bumping us from 2 to 3 length variables, not 1 to
> 2).
>
> So I dunno. It's fine with me if we take it, and fine if we leave it.

Unlike origlen, base vs dir lengths are not strictly needed; we
prepare the base including '/', and we know we always have just one
'/' at the end, so anybody that uses dirlen to truncate it back to
the original before passing it down can truncate to (baselen-1), no?

In other words, something like this (not an incremental but a
replacement) to keep calling "baselen" the length of the leading
constant part we append to?

 sha1_file.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5862386cd0..d277b32bf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3760,7 +3760,6 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	origlen = path->len;
 	strbuf_complete(path, '/');
 	strbuf_addf(path, "%02x", subdir_nr);
-	baselen = path->len;
 
 	dir = opendir(path->buf);
 	if (!dir) {
@@ -3770,12 +3769,15 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 		return r;
 	}
 
+	strbuf_addch(path, '/');
+	baselen = path->len;
+
 	while ((de = readdir(dir))) {
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
 		strbuf_setlen(path, baselen);
-		strbuf_addf(path, "/%s", de->d_name);
+		strbuf_addstr(path, de->d_name);
 
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2)  {
 			char hex[GIT_MAX_HEXSZ+1];
@@ -3801,7 +3803,7 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	}
 	closedir(dir);
 
-	strbuf_setlen(path, baselen);
+	strbuf_setlen(path, baselen - 1); /* chomp the '/' that we added */
 	if (!r && subdir_cb)
 		r = subdir_cb(subdir_nr, path->buf, data);
 


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-09 16:41   ` Junio C Hamano
@ 2017-07-10  0:06     ` Jeff King
  2017-07-10  2:10       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2017-07-10  0:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Git List

On Sun, Jul 09, 2017 at 09:41:51AM -0700, Junio C Hamano wrote:

> > On the plus side, this moves an invariant out of the loop. On the minus
> > side, it has to introduce an extra variable for "length we add on to"
> > versus "dir length to pass to the subdir_cb". That's not rocket science,
> > but it does slightly complicate things (though I note we already have
> > "origlen", so this is bumping us from 2 to 3 length variables, not 1 to
> > 2).
> >
> > So I dunno. It's fine with me if we take it, and fine if we leave it.
> 
> Unlike origlen, base vs dir lengths are not strictly needed; we
> prepare the base including '/', and we know we always have just one
> '/' at the end, so anybody that uses dirlen to truncate it back to
> the original before passing it down can truncate to (baselen-1), no?
> 
> In other words, something like this (not an incremental but a
> replacement) to keep calling "baselen" the length of the leading
> constant part we append to?

Yeah, I think that is correct. And you could even drop origlen by
replacing it with "baselen - 3" at the end. But somehow doing the
computation on the fly actually seems more complicated to me (from the
perspective of a reader who is trying to make sure all is correct).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-10  0:06     ` Jeff King
@ 2017-07-10  2:10       ` Junio C Hamano
  2017-07-12 17:58         ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2017-07-10  2:10 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git List

Jeff King <peff@peff.net> writes:

> ... And you could even drop origlen by
> replacing it with "baselen - 3" at the end. But somehow doing the
> computation on the fly actually seems more complicated to me (from the
> perspective of a reader who is trying to make sure all is correct).

True enough.  I personally do not mind any of the three variants
(including the original) and would rather not spend too much time
micro-optimizing this codepath, lest the next clever person starts
to turn the loop in for_each_loose_file_in_objdir() to a nested loop
that runs 16 times each to avoid copying the same leading byte again
and again ;-)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir()
  2017-07-10  2:10       ` Junio C Hamano
@ 2017-07-12 17:58         ` René Scharfe
  0 siblings, 0 replies; 7+ messages in thread
From: René Scharfe @ 2017-07-12 17:58 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Git List

Am 10.07.2017 um 04:10 schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
> 
>> ... And you could even drop origlen by
>> replacing it with "baselen - 3" at the end. But somehow doing the
>> computation on the fly actually seems more complicated to me (from the
>> perspective of a reader who is trying to make sure all is correct).
> 
> True enough.  I personally do not mind any of the three variants
> (including the original) and would rather not spend too much time
> micro-optimizing this codepath, lest the next clever person starts
> to turn the loop in for_each_loose_file_in_objdir() to a nested loop
> that runs 16 times each to avoid copying the same leading byte again
> and again ;-)

*evil grin*

Anyway, I agree that this patch is not worth spending much time on.  I
still think the result is cleaner (perhaps due to my strbuf_addf
allergy), though it's longer and perhaps adding a twelfth variable makes
the function too complicated.  The magic chomping variant is a clever
compromise -- but needs a comment to explain itself.  So if you are
indifferent please just drop it.

Thanks,
René

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-12 17:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-08  8:59 [PATCH] sha1_file: add slash once in for_each_file_in_obj_subdir() René Scharfe
2017-07-09 11:00 ` Jeff King
2017-07-09 13:26   ` René Scharfe
2017-07-09 16:41   ` Junio C Hamano
2017-07-10  0:06     ` Jeff King
2017-07-10  2:10       ` Junio C Hamano
2017-07-12 17:58         ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).