git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git archive and glob pathspecs
@ 2014-09-02 22:17 Peter Wu
  2014-09-03  6:21 ` Duy Nguyen
  2014-09-04 13:37 ` [PATCH] archive: support filtering paths with glob Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 9+ messages in thread
From: Peter Wu @ 2014-09-02 22:17 UTC (permalink / raw)
  To: git

Hi,

The `git archive` seems to accept a pathspec judging from the error message (git
version 2.1.0):

    git archive HEAD -- :x
    fatal: pathspec 'x' did not match any files

When I try to use deeper glob specs however, it throws an error (this also
happens if I use `:(glob)**/Makefile`, tested in the git source tree):

    $ git archive HEAD -- ':(glob)*/Makefile'
    fatal: pathspec '*/Makefile' did not match any files

Strange enough, command `git log -- ':(glob)*/Makefile'` works. Any idea what is
wrong?

Kind regards,
Peter
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git archive and glob pathspecs
  2014-09-02 22:17 git archive and glob pathspecs Peter Wu
@ 2014-09-03  6:21 ` Duy Nguyen
  2014-09-13 10:36   ` Peter Wu
  2014-09-04 13:37 ` [PATCH] archive: support filtering paths with glob Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2014-09-03  6:21 UTC (permalink / raw)
  To: Peter Wu; +Cc: Git Mailing List

On Wed, Sep 3, 2014 at 5:17 AM, Peter Wu <peter@lekensteyn.nl> wrote:
> Hi,
>
> The `git archive` seems to accept a pathspec judging from the error message (git
> version 2.1.0):
>
>     git archive HEAD -- :x
>     fatal: pathspec 'x' did not match any files
>
> When I try to use deeper glob specs however, it throws an error (this also
> happens if I use `:(glob)**/Makefile`, tested in the git source tree):
>
>     $ git archive HEAD -- ':(glob)*/Makefile'
>     fatal: pathspec '*/Makefile' did not match any files
>
> Strange enough, command `git log -- ':(glob)*/Makefile'` works. Any idea what is
> wrong?

There may be something wrong. This patch seems to make it work for me,
but it includes lots of empty directories. I'll have a closer look
later (btw it's surprising that negative pathspec works too..)

diff --git a/archive.c b/archive.c
index 3fc0fb2..a5be58d 100644
--- a/archive.c
+++ b/archive.c
@@ -221,6 +221,7 @@ static int path_exists(struct tree *tree, const char *path)
  int ret;

  parse_pathspec(&pathspec, 0, 0, "", paths);
+ pathspec.recursive = 1;
  ret = read_tree_recursive(tree, "", 0, 0, &pathspec, reject_entry, NULL);
  free_pathspec(&pathspec);
  return ret != 0;
@@ -237,6 +238,7 @@ static void parse_pathspec_arg(const char **pathspec,
  parse_pathspec(&ar_args->pathspec, 0,
        PATHSPEC_PREFER_FULL,
        "", pathspec);
+ ar_args->pathspec.recursive = 1;
  if (pathspec) {
  while (*pathspec) {
  if (**pathspec && !path_exists(ar_args->tree, *pathspec))

-- 
Duy

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] archive: support filtering paths with glob
  2014-09-02 22:17 git archive and glob pathspecs Peter Wu
  2014-09-03  6:21 ` Duy Nguyen
@ 2014-09-04 13:37 ` Nguyễn Thái Ngọc Duy
  2014-09-13 10:52   ` Peter Wu
  1 sibling, 1 reply; 9+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-09-04 13:37 UTC (permalink / raw)
  To: git; +Cc: peter, Nguyễn Thái Ngọc Duy

This patch fixes two problems with using :(glob) (or even "*.c"
without ":(glob)").

The first one is we forgot to turn on the 'recursive' flag in struct
pathspec. Without that, tree_entry_interesting() will not mark
potential directories "interesting" so that it can confirm whether
those directories have anything matching the pathspec.

The marking directories interesting has a side effect that we need to
walk inside a directory to realize that there's nothing interested in
there. By that time, 'archive' code has already written the (empty)
directory down. That means lots of empty directories in the result
archive.

This problem is fixed by lazily writing directories down when we know
they are actually needed. There is a theoretical bug in this
implementation: we can't write empty trees/directories that match that
pathspec.

Noticed-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Similar approach could probably be used for teaching ls-tree about pathspec..

 archive.c           | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 t/t5000-tar-tree.sh | 10 +++++++
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/archive.c b/archive.c
index 3fc0fb2..9e62bf4 100644
--- a/archive.c
+++ b/archive.c
@@ -98,9 +98,19 @@ static void setup_archive_check(struct git_attr_check *check)
 	check[1].attr = attr_export_subst;
 }
 
+struct directory {
+	struct directory *up;
+	unsigned char sha1[20];
+	int baselen, len;
+	unsigned mode;
+	int stage;
+	char path[FLEX_ARRAY];
+};
+
 struct archiver_context {
 	struct archiver_args *args;
 	write_archive_entry_fn_t write_entry;
+	struct directory *bottom;
 };
 
 static int write_archive_entry(const unsigned char *sha1, const char *base,
@@ -146,6 +156,65 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
 	return write_entry(args, sha1, path.buf, path.len, mode);
 }
 
+static void queue_directory(const unsigned char *sha1,
+		const char *base, int baselen, const char *filename,
+		unsigned mode, int stage, struct archiver_context *c)
+{
+	struct directory *d;
+	d = xmallocz(sizeof(*d) + baselen + 1 + strlen(filename));
+	d->up	   = c->bottom;
+	d->baselen = baselen;
+	d->mode	   = mode;
+	d->stage   = stage;
+	c->bottom  = d;
+	d->len = sprintf(d->path, "%.*s%s/", baselen, base, filename);
+	hashcpy(d->sha1, sha1);
+}
+
+static int write_directory(struct archiver_context *c)
+{
+	struct directory *d = c->bottom;
+	int ret;
+
+	if (!d)
+		return 0;
+	c->bottom = d->up;
+	d->path[d->len - 1] = '\0'; /* no trailing slash */
+	ret =
+		write_directory(c) ||
+		write_archive_entry(d->sha1, d->path, d->baselen,
+				    d->path + d->baselen, d->mode,
+				    d->stage, c) != READ_TREE_RECURSIVE;
+	free(d);
+	return ret ? -1 : 0;
+}
+
+static int queue_or_write_archive_entry(const unsigned char *sha1,
+		const char *base, int baselen, const char *filename,
+		unsigned mode, int stage, void *context)
+{
+	struct archiver_context *c = context;
+
+	while (c->bottom &&
+	       !(baselen >= c->bottom->len &&
+		 !strncmp(base, c->bottom->path, c->bottom->len))) {
+		struct directory *next = c->bottom->up;
+		free(c->bottom);
+		c->bottom = next;
+	}
+
+	if (S_ISDIR(mode)) {
+		queue_directory(sha1, base, baselen, filename,
+				mode, stage, c);
+		return READ_TREE_RECURSIVE;
+	}
+
+	if (write_directory(c))
+		return -1;
+	return write_archive_entry(sha1, base, baselen, filename, mode,
+				   stage, context);
+}
+
 int write_archive_entries(struct archiver_args *args,
 		write_archive_entry_fn_t write_entry)
 {
@@ -167,6 +236,7 @@ int write_archive_entries(struct archiver_args *args,
 			return err;
 	}
 
+	memset(&context, 0, sizeof(context));
 	context.args = args;
 	context.write_entry = write_entry;
 
@@ -187,9 +257,17 @@ int write_archive_entries(struct archiver_args *args,
 	}
 
 	err = read_tree_recursive(args->tree, "", 0, 0, &args->pathspec,
-				  write_archive_entry, &context);
+				  args->pathspec.has_wildcard ?
+				  queue_or_write_archive_entry :
+				  write_archive_entry,
+				  &context);
 	if (err == READ_TREE_RECURSIVE)
 		err = 0;
+	while (context.bottom) {
+		struct directory *next = context.bottom->up;
+		free(context.bottom);
+		context.bottom = next;
+	}
 	return err;
 }
 
@@ -221,6 +299,7 @@ static int path_exists(struct tree *tree, const char *path)
 	int ret;
 
 	parse_pathspec(&pathspec, 0, 0, "", paths);
+	pathspec.recursive = 1;
 	ret = read_tree_recursive(tree, "", 0, 0, &pathspec, reject_entry, NULL);
 	free_pathspec(&pathspec);
 	return ret != 0;
@@ -237,6 +316,7 @@ static void parse_pathspec_arg(const char **pathspec,
 	parse_pathspec(&ar_args->pathspec, 0,
 		       PATHSPEC_PREFER_FULL,
 		       "", pathspec);
+	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
 			if (**pathspec && !path_exists(ar_args->tree, *pathspec))
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 7b8babd..ade7a7f 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -305,4 +305,14 @@ test_expect_success GZIP 'remote tar.gz can be disabled' '
 		>remote.tar.gz
 '
 
+test_expect_success 'archive and :(glob)' '
+	git archive -v HEAD -- ":(glob)**/sh" >/dev/null 2>actual &&
+	cat >expect <<EOF &&
+a/
+a/bin/
+a/bin/sh
+EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: git archive and glob pathspecs
  2014-09-03  6:21 ` Duy Nguyen
@ 2014-09-13 10:36   ` Peter Wu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Wu @ 2014-09-13 10:36 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wednesday 03 September 2014 13:21:06 Duy Nguyen wrote:
> On Wed, Sep 3, 2014 at 5:17 AM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Hi,
> >
> > The `git archive` seems to accept a pathspec judging from the error message (git
> > version 2.1.0):
> >
> >     git archive HEAD -- :x
> >     fatal: pathspec 'x' did not match any files
> >
> > When I try to use deeper glob specs however, it throws an error (this also
> > happens if I use `:(glob)**/Makefile`, tested in the git source tree):
> >
> >     $ git archive HEAD -- ':(glob)*/Makefile'
> >     fatal: pathspec '*/Makefile' did not match any files
> >
> > Strange enough, command `git log -- ':(glob)*/Makefile'` works. Any idea what is
> > wrong?
> 
> There may be something wrong. This patch seems to make it work for me,
> but it includes lots of empty directories. I'll have a closer look
> later (btw it's surprising that negative pathspec works too..)

I can confirm that this patch shows Makefile's, but also includes a lot of empty
directories.

As for why this happens, my guess is that write_archive_entries() recurses the
full tree and adds every encountered directory (via read_tree_1, via
write_archive_entry()).

To fix this, write_archive (write_tar_archive, etc.) should be taught to handle
glob patterns, or parse_pathspec should expand globs (and then
parse_pathspec_arg might have to validate the remaining patterns).

Kind regards,
Peter

> diff --git a/archive.c b/archive.c
> index 3fc0fb2..a5be58d 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -221,6 +221,7 @@ static int path_exists(struct tree *tree, const char *path)
>   int ret;
> 
>   parse_pathspec(&pathspec, 0, 0, "", paths);
> + pathspec.recursive = 1;
>   ret = read_tree_recursive(tree, "", 0, 0, &pathspec, reject_entry, NULL);
>   free_pathspec(&pathspec);
>   return ret != 0;
> @@ -237,6 +238,7 @@ static void parse_pathspec_arg(const char **pathspec,
>   parse_pathspec(&ar_args->pathspec, 0,
>         PATHSPEC_PREFER_FULL,
>         "", pathspec);
> + ar_args->pathspec.recursive = 1;
>   if (pathspec) {
>   while (*pathspec) {
>   if (**pathspec && !path_exists(ar_args->tree, *pathspec))

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] archive: support filtering paths with glob
  2014-09-04 13:37 ` [PATCH] archive: support filtering paths with glob Nguyễn Thái Ngọc Duy
@ 2014-09-13 10:52   ` Peter Wu
  2014-09-21  3:55     ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Wu @ 2014-09-13 10:52 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Thursday 04 September 2014 20:37:52 Nguyễn Thái Ngọc Duy wrote:
> This patch fixes two problems with using :(glob) (or even "*.c"
> without ":(glob)").
> 
> The first one is we forgot to turn on the 'recursive' flag in struct
> pathspec. Without that, tree_entry_interesting() will not mark
> potential directories "interesting" so that it can confirm whether
> those directories have anything matching the pathspec.
> 
> The marking directories interesting has a side effect that we need to
> walk inside a directory to realize that there's nothing interested in
> there. By that time, 'archive' code has already written the (empty)
> directory down. That means lots of empty directories in the result
> archive.
> 
> This problem is fixed by lazily writing directories down when we know
> they are actually needed. There is a theoretical bug in this
> implementation: we can't write empty trees/directories that match that
> pathspec.
> 
> Noticed-by: Peter Wu <peter@lekensteyn.nl>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Ah, ignore my last mail, I just noticed this one. This patch works fine. By the
way, the glob pattern is treated as a 'nullglob'. If you specify a glob pattern
that matches nothing, an empty archive is created. Perhaps an error message
should be raised for that as it is unlikely that a user wants that?

Tested-by: Peter Wu <peter@lekensteyn.nl>

> ---
>  Similar approach could probably be used for teaching ls-tree about pathspec..
> 
>  archive.c           | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  t/t5000-tar-tree.sh | 10 +++++++
>  2 files changed, 91 insertions(+), 1 deletion(-)
> 
> diff --git a/archive.c b/archive.c
> index 3fc0fb2..9e62bf4 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -98,9 +98,19 @@ static void setup_archive_check(struct git_attr_check *check)
>  	check[1].attr = attr_export_subst;
>  }
>  
> +struct directory {
> +	struct directory *up;
> +	unsigned char sha1[20];
> +	int baselen, len;
> +	unsigned mode;
> +	int stage;
> +	char path[FLEX_ARRAY];
> +};
> +
>  struct archiver_context {
>  	struct archiver_args *args;
>  	write_archive_entry_fn_t write_entry;
> +	struct directory *bottom;
>  };
>  
>  static int write_archive_entry(const unsigned char *sha1, const char *base,
> @@ -146,6 +156,65 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>  	return write_entry(args, sha1, path.buf, path.len, mode);
>  }
>  
> +static void queue_directory(const unsigned char *sha1,
> +		const char *base, int baselen, const char *filename,
> +		unsigned mode, int stage, struct archiver_context *c)
> +{
> +	struct directory *d;
> +	d = xmallocz(sizeof(*d) + baselen + 1 + strlen(filename));
> +	d->up	   = c->bottom;
> +	d->baselen = baselen;
> +	d->mode	   = mode;
> +	d->stage   = stage;
> +	c->bottom  = d;
> +	d->len = sprintf(d->path, "%.*s%s/", baselen, base, filename);
> +	hashcpy(d->sha1, sha1);
> +}
> +
> +static int write_directory(struct archiver_context *c)
> +{
> +	struct directory *d = c->bottom;
> +	int ret;
> +
> +	if (!d)
> +		return 0;
> +	c->bottom = d->up;
> +	d->path[d->len - 1] = '\0'; /* no trailing slash */
> +	ret =
> +		write_directory(c) ||
> +		write_archive_entry(d->sha1, d->path, d->baselen,
> +				    d->path + d->baselen, d->mode,
> +				    d->stage, c) != READ_TREE_RECURSIVE;
> +	free(d);
> +	return ret ? -1 : 0;
> +}
> +
> +static int queue_or_write_archive_entry(const unsigned char *sha1,
> +		const char *base, int baselen, const char *filename,
> +		unsigned mode, int stage, void *context)
> +{
> +	struct archiver_context *c = context;
> +
> +	while (c->bottom &&
> +	       !(baselen >= c->bottom->len &&
> +		 !strncmp(base, c->bottom->path, c->bottom->len))) {
> +		struct directory *next = c->bottom->up;
> +		free(c->bottom);
> +		c->bottom = next;
> +	}
> +
> +	if (S_ISDIR(mode)) {
> +		queue_directory(sha1, base, baselen, filename,
> +				mode, stage, c);
> +		return READ_TREE_RECURSIVE;
> +	}
> +
> +	if (write_directory(c))
> +		return -1;
> +	return write_archive_entry(sha1, base, baselen, filename, mode,
> +				   stage, context);
> +}
> +
>  int write_archive_entries(struct archiver_args *args,
>  		write_archive_entry_fn_t write_entry)
>  {
> @@ -167,6 +236,7 @@ int write_archive_entries(struct archiver_args *args,
>  			return err;
>  	}
>  
> +	memset(&context, 0, sizeof(context));
>  	context.args = args;
>  	context.write_entry = write_entry;
>  
> @@ -187,9 +257,17 @@ int write_archive_entries(struct archiver_args *args,
>  	}
>  
>  	err = read_tree_recursive(args->tree, "", 0, 0, &args->pathspec,
> -				  write_archive_entry, &context);
> +				  args->pathspec.has_wildcard ?
> +				  queue_or_write_archive_entry :
> +				  write_archive_entry,
> +				  &context);
>  	if (err == READ_TREE_RECURSIVE)
>  		err = 0;
> +	while (context.bottom) {
> +		struct directory *next = context.bottom->up;
> +		free(context.bottom);
> +		context.bottom = next;
> +	}
>  	return err;
>  }
>  
> @@ -221,6 +299,7 @@ static int path_exists(struct tree *tree, const char *path)
>  	int ret;
>  
>  	parse_pathspec(&pathspec, 0, 0, "", paths);
> +	pathspec.recursive = 1;
>  	ret = read_tree_recursive(tree, "", 0, 0, &pathspec, reject_entry, NULL);
>  	free_pathspec(&pathspec);
>  	return ret != 0;
> @@ -237,6 +316,7 @@ static void parse_pathspec_arg(const char **pathspec,
>  	parse_pathspec(&ar_args->pathspec, 0,
>  		       PATHSPEC_PREFER_FULL,
>  		       "", pathspec);
> +	ar_args->pathspec.recursive = 1;
>  	if (pathspec) {
>  		while (*pathspec) {
>  			if (**pathspec && !path_exists(ar_args->tree, *pathspec))
> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 7b8babd..ade7a7f 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -305,4 +305,14 @@ test_expect_success GZIP 'remote tar.gz can be disabled' '
>  		>remote.tar.gz
>  '
>  
> +test_expect_success 'archive and :(glob)' '
> +	git archive -v HEAD -- ":(glob)**/sh" >/dev/null 2>actual &&
> +	cat >expect <<EOF &&
> +a/
> +a/bin/
> +a/bin/sh
> +EOF
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] archive: support filtering paths with glob
  2014-09-13 10:52   ` Peter Wu
@ 2014-09-21  3:55     ` Nguyễn Thái Ngọc Duy
  2014-09-22 19:15       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-09-21  3:55 UTC (permalink / raw)
  To: git; +Cc: peter, Nguyễn Thái Ngọc Duy

This patch fixes two problems with using :(glob) (or even "*.c"
without ":(glob)").

The first one is we forgot to turn on the 'recursive' flag in struct
pathspec. Without that, tree_entry_interesting() will not mark
potential directories "interesting" so that it can confirm whether
those directories have anything matching the pathspec.

The marking directories interesting has a side effect that we need to
walk inside a directory to realize that there's nothing interested in
there. By that time, 'archive' code has already written the (empty)
directory down. That means lots of empty directories in the result
archive.

This problem is fixed by lazily writing directories down when we know
they are actually needed. There is a theoretical bug in this
implementation: we can't write empty trees/directories that match that
pathspec.

path_exists() is also made stricter in order to detect non-matching
pathspec because when this 'recursive' flag is on, we most likely
match some directories. The easiest way is not consider any
directories "matched".

Noticed-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 On Sat, Sep 13, 2014 at 5:52 PM, Peter Wu <peter@lekensteyn.nl> wrote:
 > By the
 > way, the glob pattern is treated as a 'nullglob'. If you specify a glob pattern
 > that matches nothing, an empty archive is created. Perhaps an error message
 > should be raised for that as it is unlikely that a user wants that?

 If you do "git archive HEAD non-existent-path", you get an error.
 There's no reason why glob is an exception. Fixed in this version.

 archive.c           | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 t/t5000-tar-tree.sh | 14 ++++++++
 2 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/archive.c b/archive.c
index 952a659..94a9981 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
 #include "archive.h"
 #include "parse-options.h"
 #include "unpack-trees.h"
+#include "dir.h"
 
 static char const * const archive_usage[] = {
 	N_("git archive [options] <tree-ish> [<path>...]"),
@@ -98,9 +99,19 @@ static void setup_archive_check(struct git_attr_check *check)
 	check[1].attr = attr_export_subst;
 }
 
+struct directory {
+	struct directory *up;
+	unsigned char sha1[20];
+	int baselen, len;
+	unsigned mode;
+	int stage;
+	char path[FLEX_ARRAY];
+};
+
 struct archiver_context {
 	struct archiver_args *args;
 	write_archive_entry_fn_t write_entry;
+	struct directory *bottom;
 };
 
 static int write_archive_entry(const unsigned char *sha1, const char *base,
@@ -146,6 +157,65 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
 	return write_entry(args, sha1, path.buf, path.len, mode);
 }
 
+static void queue_directory(const unsigned char *sha1,
+		const char *base, int baselen, const char *filename,
+		unsigned mode, int stage, struct archiver_context *c)
+{
+	struct directory *d;
+	d = xmallocz(sizeof(*d) + baselen + 1 + strlen(filename));
+	d->up	   = c->bottom;
+	d->baselen = baselen;
+	d->mode	   = mode;
+	d->stage   = stage;
+	c->bottom  = d;
+	d->len = sprintf(d->path, "%.*s%s/", baselen, base, filename);
+	hashcpy(d->sha1, sha1);
+}
+
+static int write_directory(struct archiver_context *c)
+{
+	struct directory *d = c->bottom;
+	int ret;
+
+	if (!d)
+		return 0;
+	c->bottom = d->up;
+	d->path[d->len - 1] = '\0'; /* no trailing slash */
+	ret =
+		write_directory(c) ||
+		write_archive_entry(d->sha1, d->path, d->baselen,
+				    d->path + d->baselen, d->mode,
+				    d->stage, c) != READ_TREE_RECURSIVE;
+	free(d);
+	return ret ? -1 : 0;
+}
+
+static int queue_or_write_archive_entry(const unsigned char *sha1,
+		const char *base, int baselen, const char *filename,
+		unsigned mode, int stage, void *context)
+{
+	struct archiver_context *c = context;
+
+	while (c->bottom &&
+	       !(baselen >= c->bottom->len &&
+		 !strncmp(base, c->bottom->path, c->bottom->len))) {
+		struct directory *next = c->bottom->up;
+		free(c->bottom);
+		c->bottom = next;
+	}
+
+	if (S_ISDIR(mode)) {
+		queue_directory(sha1, base, baselen, filename,
+				mode, stage, c);
+		return READ_TREE_RECURSIVE;
+	}
+
+	if (write_directory(c))
+		return -1;
+	return write_archive_entry(sha1, base, baselen, filename, mode,
+				   stage, context);
+}
+
 int write_archive_entries(struct archiver_args *args,
 		write_archive_entry_fn_t write_entry)
 {
@@ -167,6 +237,7 @@ int write_archive_entries(struct archiver_args *args,
 			return err;
 	}
 
+	memset(&context, 0, sizeof(context));
 	context.args = args;
 	context.write_entry = write_entry;
 
@@ -187,9 +258,17 @@ int write_archive_entries(struct archiver_args *args,
 	}
 
 	err = read_tree_recursive(args->tree, "", 0, 0, &args->pathspec,
-				  write_archive_entry, &context);
+				  args->pathspec.has_wildcard ?
+				  queue_or_write_archive_entry :
+				  write_archive_entry,
+				  &context);
 	if (err == READ_TREE_RECURSIVE)
 		err = 0;
+	while (context.bottom) {
+		struct directory *next = context.bottom->up;
+		free(context.bottom);
+		context.bottom = next;
+	}
 	return err;
 }
 
@@ -211,7 +290,16 @@ static int reject_entry(const unsigned char *sha1, const char *base,
 			int baselen, const char *filename, unsigned mode,
 			int stage, void *context)
 {
-	return -1;
+	int ret = -1;
+	if (S_ISDIR(mode)) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, base);
+		strbuf_addstr(&sb, filename);
+		if (!match_pathspec(context, sb.buf, sb.len, 0, NULL, 1))
+			ret = READ_TREE_RECURSIVE;
+		strbuf_release(&sb);
+	}
+	return ret;
 }
 
 static int path_exists(struct tree *tree, const char *path)
@@ -221,7 +309,9 @@ static int path_exists(struct tree *tree, const char *path)
 	int ret;
 
 	parse_pathspec(&pathspec, 0, 0, "", paths);
-	ret = read_tree_recursive(tree, "", 0, 0, &pathspec, reject_entry, NULL);
+	pathspec.recursive = 1;
+	ret = read_tree_recursive(tree, "", 0, 0, &pathspec,
+				  reject_entry, &pathspec);
 	free_pathspec(&pathspec);
 	return ret != 0;
 }
@@ -237,6 +327,7 @@ static void parse_pathspec_arg(const char **pathspec,
 	parse_pathspec(&ar_args->pathspec, 0,
 		       PATHSPEC_PREFER_FULL,
 		       "", pathspec);
+	ar_args->pathspec.recursive = 1;
 	if (pathspec) {
 		while (*pathspec) {
 			if (**pathspec && !path_exists(ar_args->tree, *pathspec))
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 7b8babd..d01bbdc 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -305,4 +305,18 @@ test_expect_success GZIP 'remote tar.gz can be disabled' '
 		>remote.tar.gz
 '
 
+test_expect_success 'archive and :(glob)' '
+	git archive -v HEAD -- ":(glob)**/sh" >/dev/null 2>actual &&
+	cat >expect <<EOF &&
+a/
+a/bin/
+a/bin/sh
+EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'catch non-matching pathspec' '
+	test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
+'
+
 test_done
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] archive: support filtering paths with glob
  2014-09-21  3:55     ` [PATCH v2] " Nguyễn Thái Ngọc Duy
@ 2014-09-22 19:15       ` Junio C Hamano
  2014-09-22 23:04         ` Duy Nguyen
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2014-09-22 19:15 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, peter

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> +static void queue_directory(const unsigned char *sha1,
> +		const char *base, int baselen, const char *filename,
> +		unsigned mode, int stage, struct archiver_context *c)
> +{
> +	struct directory *d;
> +	d = xmallocz(sizeof(*d) + baselen + 1 + strlen(filename));
> +	d->up	   = c->bottom;
> +	d->baselen = baselen;
> +	d->mode	   = mode;
> +	d->stage   = stage;
> +	c->bottom  = d;
> +	d->len = sprintf(d->path, "%.*s%s/", baselen, base, filename);
> +	hashcpy(d->sha1, sha1);
> +}
> +
> +static int write_directory(struct archiver_context *c)
> +{
> +	struct directory *d = c->bottom;
> +	int ret;
> +
> +	if (!d)
> +		return 0;
> +	c->bottom = d->up;
> +	d->path[d->len - 1] = '\0'; /* no trailing slash */
> +	ret =
> +		write_directory(c) ||

By recursing into itself using c->bottom set to one level higher, we
first flush all the leading directories.

> +		write_archive_entry(d->sha1, d->path, d->baselen,
> +				    d->path + d->baselen, d->mode,
> +				    d->stage, c) != READ_TREE_RECURSIVE;

And after the entries for the leading directories, we write an entry
for this directory.

Which makes sense; presumably the caller will then write an entry
that hangs this directory after calling this.

When we have a/b/c and a/d/e to be written, the first round would
write a/ and then a/b/ with the above, and presumably elsewhere
somebody will write a/b/c; next time around we do need to write a/d/
but we wouldn't want to write a/ itself.  How is this code
preventing the recursion going all the way up every time to avoid
repeating a/?

Puzzled...

> +	free(d);
> +	return ret ? -1 : 0;
> +}
> +
> +static int queue_or_write_archive_entry(const unsigned char *sha1,
> +		const char *base, int baselen, const char *filename,
> +		unsigned mode, int stage, void *context)
> +{
> +	struct archiver_context *c = context;
> +
> +	while (c->bottom &&
> +	       !(baselen >= c->bottom->len &&
> +		 !strncmp(base, c->bottom->path, c->bottom->len))) {

Is this comparison, which does not see if the substring matches from
the beginning but without making sure that the tail of the shorter
substring coincides with the directory boundary in the longer
string, correctly done?  Or does it consider abcd/ef is inside abc/?

Ah, we should be OK as I think <base, baselen> is passed with a
trailing slash appended in read_tree_1() when it recurses, so this
comparison is correct.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] archive: support filtering paths with glob
  2014-09-22 19:15       ` Junio C Hamano
@ 2014-09-22 23:04         ` Duy Nguyen
  2014-09-23 16:57           ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2014-09-22 23:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Peter Wu

On Tue, Sep 23, 2014 at 2:15 AM, Junio C Hamano <gitster@pobox.com> wrote:
> When we have a/b/c and a/d/e to be written, the first round would
> write a/ and then a/b/ with the above, and presumably elsewhere
> somebody will write a/b/c; next time around we do need to write a/d/
> but we wouldn't want to write a/ itself.  How is this code
> preventing the recursion going all the way up every time to avoid
> repeating a/?
>
> Puzzled...

We never traverse 'a' (or any directory) twice and we only push a
directory to the stack when we examine it. After a/b and a are written
down and we examine 'd', 'a/d' is pushed to the stack. When we hit
'a/d/e', we only have 'a/d' in the stack, not 'a'.
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] archive: support filtering paths with glob
  2014-09-22 23:04         ` Duy Nguyen
@ 2014-09-23 16:57           ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2014-09-23 16:57 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Peter Wu

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Sep 23, 2014 at 2:15 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> When we have a/b/c and a/d/e to be written, the first round would
>> write a/ and then a/b/ with the above, and presumably elsewhere
>> somebody will write a/b/c; next time around we do need to write a/d/
>> but we wouldn't want to write a/ itself.  How is this code
>> preventing the recursion going all the way up every time to avoid
>> repeating a/?
>>
>> Puzzled...
>
> We never traverse 'a' (or any directory) twice and we only push a
> directory to the stack when we examine it. After a/b and a are written
> down and we examine 'd', 'a/d' is pushed to the stack. When we hit
> 'a/d/e', we only have 'a/d' in the stack, not 'a'.

So the traverser will feed you

	a/
        a/b/
        a/b/c
        a/d/
        a/d/e

in that order, and you keep a/ and a/b/ until you see a/b/c at which
time you flush the two queued ones and likewise for a/d/ and a/d/e.
When you queue a/d/ you do not decompose that to a/ and a/d/ because
you know that the caller of the callback would have made sure all
the leading path components have been given you by the time it gave
you a/d/, and it all works out fine.

Thanks for clarifying.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-09-23 16:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02 22:17 git archive and glob pathspecs Peter Wu
2014-09-03  6:21 ` Duy Nguyen
2014-09-13 10:36   ` Peter Wu
2014-09-04 13:37 ` [PATCH] archive: support filtering paths with glob Nguyễn Thái Ngọc Duy
2014-09-13 10:52   ` Peter Wu
2014-09-21  3:55     ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2014-09-22 19:15       ` Junio C Hamano
2014-09-22 23:04         ` Duy Nguyen
2014-09-23 16:57           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).