git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal
@ 2019-02-13 20:55 Matheus Tavares
  2019-02-14 21:16 ` Christian Couder
  0 siblings, 1 reply; 4+ messages in thread
From: Matheus Tavares @ 2019-02-13 20:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Replace usage of opendir/readdir/closedir API to traverse directories
recursively, at copy_or_link_directory function, by the dir-iterator
API.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
This is my microproject for GSoC 2019. It's still a RFC because I have  
some questions. Any help will be much appreciated.                      
                                                                        
There're three places inside copy_or_link_directory's loop where        
die_errno() is called. Should I call dir_iterator_abort, at these       
places, before die_errno() is called (to free resources)?               
                                                                        
And if so, should I check dir_iterator_abort's return code? It's said at
dir-iterator.h that dir_iterator_abort returns ITER_ERROR on error, but 
by the code, we can see that it only possibly returns ITER_DONE.        
                                                                        
Finally, if this call and check both needs to be done, there'll be code 
replication in this three places. Should I add a goto and do all this    
stuff at the function end? (But then, I would have to store die_errno's 
error messages and errno code in temporary variables. Is this approach  
good?) 

 builtin/clone.c | 65 +++++++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..f5208ad9ca 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -392,50 +394,55 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
-static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+static void mkdir_if_missing(const char *pathname, mode_t mode)
 {
-	struct dirent *de;
+	/*
+	 * Tries to create a dir at pathname. If pathname already exists and
+	 * is a dir, do nothing.
+	 */
 	struct stat buf;
-	int src_len, dest_len;
-	DIR *dir;
 
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
-
-	if (mkdir(dest->buf, 0777)) {
+	if (mkdir(pathname, mode)) {
 		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
+			die_errno(_("failed to create directory '%s'"),
+				  pathname);
+		else if (stat(pathname, &buf))
+			die_errno(_("failed to stat '%s'"), pathname);
 		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
+			die(_("%s exists and is not a directory"), pathname);
 	}
+}
+
+static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
+				   const char *src_repo, int src_baselen)
+{
+	int src_len, dest_len;
+	struct dir_iterator *iter;
+	int iter_status;
+
+	mkdir_if_missing(dest->buf, 0777);
+
+	iter = dir_iterator_begin(src->buf);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			if (iter->basename[0] != '.')
+				mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -452,7 +459,11 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-13 20:55 [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
@ 2019-02-14 21:16 ` Christian Couder
  2019-02-14 22:03   ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Couder @ 2019-02-14 21:16 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, Junio C Hamano

On Thu, Feb 14, 2019 at 1:16 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Replace usage of opendir/readdir/closedir API to traverse directories
> recursively, at copy_or_link_directory function, by the dir-iterator
> API.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> This is my microproject for GSoC 2019. It's still a RFC because I have
> some questions. Any help will be much appreciated.

Thanks for working on a microproject!

> There're three places inside copy_or_link_directory's loop where
> die_errno() is called. Should I call dir_iterator_abort, at these
> places, before die_errno() is called (to free resources)?

I don't think it's necessary. We care about freeing resources when we
report errors (for example by returning an error code from inside a
function), but not when we are exiting.

> -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> -                                  const char *src_repo, int src_baselen)
> +static void mkdir_if_missing(const char *pathname, mode_t mode)

It looks like your patch is both splitting copy_or_link_directory()
into 2 functions and converting it to use the dir-iterator API. Maybe
it would be better to have 2 patches instead, one for splitting it and
one for converting it.

>  {
> -       struct dirent *de;
> +       /*
> +        * Tries to create a dir at pathname. If pathname already exists and
> +        * is a dir, do nothing.
> +        */

I think we usually put such comments just before the function. And
maybe it could be shortened to "Create a dir at pathname unless
there's already one"

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-14 21:16 ` Christian Couder
@ 2019-02-14 22:03   ` Matheus Tavares Bernardino
  2019-02-15  8:59     ` Christian Couder
  0 siblings, 1 reply; 4+ messages in thread
From: Matheus Tavares Bernardino @ 2019-02-14 22:03 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano

On Thu, Feb 14, 2019 at 7:16 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 1:16 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Replace usage of opendir/readdir/closedir API to traverse directories
> > recursively, at copy_or_link_directory function, by the dir-iterator
> > API.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> > This is my microproject for GSoC 2019. It's still a RFC because I have
> > some questions. Any help will be much appreciated.
>
> Thanks for working on a microproject!
>

Hi, Christian. Thank you for the review and comments.

> > There're three places inside copy_or_link_directory's loop where
> > die_errno() is called. Should I call dir_iterator_abort, at these
> > places, before die_errno() is called (to free resources)?
>
> I don't think it's necessary. We care about freeing resources when we
> report errors (for example by returning an error code from inside a
> function), but not when we are exiting.
>

Ok, thanks!

> > -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > -                                  const char *src_repo, int src_baselen)
> > +static void mkdir_if_missing(const char *pathname, mode_t mode)
>
> It looks like your patch is both splitting copy_or_link_directory()
> into 2 functions and converting it to use the dir-iterator API. Maybe
> it would be better to have 2 patches instead, one for splitting it and
> one for converting it.
>

Got it. As the justification for splitting the function was to use the
extracted part in the section that was previously recursive, I thought
both changes should be in the same patch. But I really was in doubt
about that. Should I split it into two patches and mention that
justification at the first one? Or just split?

> >  {
> > -       struct dirent *de;
> > +       /*
> > +        * Tries to create a dir at pathname. If pathname already exists and
> > +        * is a dir, do nothing.
> > +        */
>
> I think we usually put such comments just before the function. And
> maybe it could be shortened to "Create a dir at pathname unless
> there's already one"

Right, the shortened version does sounds much better, thanks! About
the comment placement, I followed what I saw in other functions from
the same file ("copy_alternates", for example). But also, I couldn't
find any instruction about that at Documentation/CodingGuidelines. So
should I move it as you suggested?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal
  2019-02-14 22:03   ` Matheus Tavares Bernardino
@ 2019-02-15  8:59     ` Christian Couder
  0 siblings, 0 replies; 4+ messages in thread
From: Christian Couder @ 2019-02-15  8:59 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git, Junio C Hamano

On Thu, Feb 14, 2019 at 11:04 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, Feb 14, 2019 at 7:16 PM Christian Couder
> <christian.couder@gmail.com> wrote:

> > > -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > > -                                  const char *src_repo, int src_baselen)
> > > +static void mkdir_if_missing(const char *pathname, mode_t mode)
> >
> > It looks like your patch is both splitting copy_or_link_directory()
> > into 2 functions and converting it to use the dir-iterator API. Maybe
> > it would be better to have 2 patches instead, one for splitting it and
> > one for converting it.
> >
>
> Got it. As the justification for splitting the function was to use the
> extracted part in the section that was previously recursive, I thought
> both changes should be in the same patch. But I really was in doubt
> about that. Should I split it into two patches and mention that
> justification at the first one? Or just split?

If 2 patches instead of 1 makes it easier to review and understand
what's going on, then I think you should indeed send 2 patches and
mention the justification for splitting the function in the commit
message of the first patch.

> > I think we usually put such comments just before the function. And
> > maybe it could be shortened to "Create a dir at pathname unless
> > there's already one"
>
> Right, the shortened version does sounds much better, thanks! About
> the comment placement, I followed what I saw in other functions from
> the same file ("copy_alternates", for example).

Then it's ok to place it like you did. Sorry about the noise.

> But also, I couldn't
> find any instruction about that at Documentation/CodingGuidelines. So
> should I move it as you suggested?

I think we have not been very consistent over the years. Recently I
think we have tried to add or move API documentation inside header
files, and in general before functions in the code, but yeah it might
not have been documented.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-02-15  9:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13 20:55 [GSoC][RFC PATCH] clone: use dir-iterator to avoid explicit dir traversal Matheus Tavares
2019-02-14 21:16 ` Christian Couder
2019-02-14 22:03   ` Matheus Tavares Bernardino
2019-02-15  8:59     ` Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).