git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu)
@ 2016-07-11 22:45 Joey Hess
  2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

Back from vacation with a reroll of jh/clean-smudge-annex.

Deals with conflicting changes from cc/apply-am in pu.

Since tb/convert-peek-in-index is not currently in pu, this reroll isn't
based on it, and will conflict if that topic gets added back into pu.
Not sure what the status of tb/convert-peek-in-index is at this point?

Improvements from Junio's review:

	fix build with DEVELOPER=1
	style fixes
	use test_cmp in test cases
	improve robustness of a test case
	clean up some confusing code
	small performance tweak

Joey Hess (8):
  clarify %f documentation
  add smudgeToFile and cleanFromFile filter configs
  use cleanFromFile in git add
  use smudgeToFile in git checkout etc
  warn on unusable smudgeToFile/cleanFromFile config
  better recovery from failure of smudgeToFile filter
  use smudgeToFile filter in git am
  use smudgeToFile filter in recursive merge

 Documentation/config.txt        |  18 ++++-
 Documentation/gitattributes.txt |  42 ++++++++++++
 apply.c                         |  16 +++++
 convert.c                       | 148 ++++++++++++++++++++++++++++++++++++----
 convert.h                       |  10 +++
 entry.c                         |  59 ++++++++++++----
 merge-recursive.c               |  53 +++++++++++---
 sha1_file.c                     |  42 ++++++++++--
 t/t0021-conversion.sh           | 117 +++++++++++++++++++++++++++++++
 9 files changed, 459 insertions(+), 46 deletions(-)

-- 
2.8.1



* [PATCH v5 1/8] clarify %f documentation
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
 Documentation/gitattributes.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index f2afdb6..197ece8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -379,6 +379,11 @@ substitution.  For example:
 	smudge = git-p4-filter --smudge %f
 ------------------------
 
+Note that "%f" is the name of the path that is being worked on. Depending
+on the version that is being filtered, the corresponding file on disk may
+not exist, or may have different contents. So, smudge and clean commands
+should not try to access the file on disk, but only act as filters on the
+content provided to them on standard input.
 
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- 
2.8.1



* [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
  2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-13 16:02   ` Lars Schneider
  2016-07-11 22:45 ` [PATCH v5 3/8] use cleanFromFile in git add Joey Hess
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

This adds new smudgeToFile and cleanFromFile filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like cleanFromFile commands that avoid
reading the whole content of the file, and for smudgeToFile commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for, e.g., a diff.

So, a user who wants to use smudgeToFile should also provide a smudge
command to be used in cases where smudgeToFile is not used. And ditto
with cleanFromFile and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
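
For illustration only (not part of this series), a driver pair along the
lines described above might look like the following minimal sketch. The
function names clean_from_file and smudge_to_file, the STORE directory and
the cksum-based key are assumptions made for the example; a real driver
would want a collision-resistant hash, and would still need plain clean
and smudge commands configured alongside these.

```sh
#!/bin/sh
# Hypothetical content store used only by this sketch.
STORE="${BIGFILE_STORE:-$HOME/.bigfile-store}"

# cleanFromFile side: git appends the worktree path as the last argument.
# The file is copied into the store and only a short key is printed on
# stdout, so the content is never piped through git.
clean_from_file() {
	src=$1
	# Toy key built from the POSIX cksum output "CRC SIZE" (assumption).
	key=$(cksum <"$src" | tr ' ' '-') || return 1
	mkdir -p "$STORE" &&
	cp "$src" "$STORE/$key" &&
	printf '%s\n' "$key"
}

# smudgeToFile side: the blob (here, the key) arrives on stdin and the
# destination path is the last argument. The destination already exists
# as an empty file and may be overwritten or replaced.
smudge_to_file() {
	dest=$1
	read -r key || return 1
	cp "$STORE/$key" "$dest"
}
```

On a filesystem with Copy On Write support, the cp in smudge_to_file is
the place where a reflink copy could be substituted.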
 Documentation/config.txt        |  18 ++++++-
 Documentation/gitattributes.txt |  37 ++++++++++++++
 convert.c                       | 111 +++++++++++++++++++++++++++++++++++-----
 convert.h                       |  10 ++++
 4 files changed, 160 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 19493aa..a55bed8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1325,15 +1325,29 @@ format.useAutoBase::
 	format-patch by default.
 
 filter.<driver>.clean::
-	The command which is used to convert the content of a worktree
+	The command which is used as a filter to convert the content of a worktree
 	file to a blob upon checkin.  See linkgit:gitattributes[5] for
 	details.
 
 filter.<driver>.smudge::
-	The command which is used to convert the content of a blob
+	The command which is used as a filter to convert the content of a blob
 	object to a worktree file upon checkout.  See
 	linkgit:gitattributes[5] for details.
 
+filter.<driver>.cleanFromFile::
+	Similar to filter.<driver>.clean but the specified command
+	directly accesses a worktree file on disk, rather than
+	receiving the file content from standard input.
+	Only used when filter.<driver>.clean is also configured.
+	See linkgit:gitattributes[5] for details.
+
+filter.<driver>.smudgeToFile::
+	Similar to filter.<driver>.smudge but the specified command
+	writes the content of a blob directly to a worktree file,
+	rather than to standard output.
+	Only used when filter.<driver>.smudge is also configured.
+	See linkgit:gitattributes[5] for details.
+
 fsck.<msg-id>::
 	Allows overriding the message type (error, warn or ignore) of a
 	specific message ID such as `missingEmail`.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 197ece8..a58aafc 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+There are two extra commands "cleanFromFile" and "smudgeToFile", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead read/write files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Both "cleanFromFile" and "smudgeToFile" are provided a path as an
+added parameter after the configured command line.
+
+The "cleanFromFile" command is provided the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+The "smudgeToFile" command is provided a path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudgeToFile"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters cannot use "cleanFromFile"
+and "smudgeToFile", since the files are not present on disk. So, to avoid
+inconsistent behavior, "cleanFromFile" will only be used if "clean" is
+also configured, and "smudgeToFile" will only be used if "smudge" is also
+configured.
+
+An example large file storage filter driver using cleanFromFile and
+smudgeToFile follows:
+
+------------------------
+[filter "bigfiles"]
+	clean = store-bigfile --from-stdin
+	cleanFromFile = store-bigfile --from-file
+	smudge = retrieve-bigfile --to-stdout
+	smudgeToFile = retrieve-bigfile --to-file
+	required
+------------------------
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/convert.c b/convert.c
index 214c99f..eb7774f 100644
--- a/convert.c
+++ b/convert.c
@@ -358,7 +358,8 @@ struct filter_params {
 	unsigned long size;
 	int fd;
 	const char *cmd;
-	const char *path;
+	const char *path; /* Path within the git repository */
+	const char *fspath; /* Path to file on disk */
 };
 
 static int filter_buffer_or_fd(int in, int out, void *data)
@@ -387,6 +388,15 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
 	strbuf_release(&path);
 
+	/* append fspath to the command if it's set, separated with a space */
+	if (params->fspath) {
+		struct strbuf fspath = STRBUF_INIT;
+		sq_quote_buf(&fspath, params->fspath);
+		strbuf_addstr(&cmd, " ");
+		strbuf_addbuf(&cmd, &fspath);
+		strbuf_release(&fspath);
+	}
+
 	argv[0] = cmd.buf;
 
 	child_process.argv = argv;
@@ -425,7 +435,8 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len, int fd,
+static int apply_filter(const char *path, const char *fspath,
+			const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -454,6 +465,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	params.fd = fd;
 	params.cmd = cmd;
 	params.path = path;
+	params.fspath = fspath;
 
 	fflush(NULL);
 	if (start_async(&async))
@@ -484,6 +496,8 @@ static struct convert_driver {
 	struct convert_driver *next;
 	const char *smudge;
 	const char *clean;
+	const char *smudge_to_file;
+	const char *clean_from_file;
 	int required;
 } *user_convert, **user_convert_tail;
 
@@ -510,8 +524,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	}
 
 	/*
-	 * filter.<name>.smudge and filter.<name>.clean specifies
-	 * the command line:
+	 * filter.<name>.smudge, filter.<name>.clean,
+	 * filter.<name>.smudgeToFile, filter.<name>.cleanFromFile
+	 * specify the command line:
 	 *
 	 *	command-line
 	 *
@@ -524,6 +539,12 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	if (!strcmp("clean", key))
 		return git_config_string(&drv->clean, var, value);
 
+	if (!strcmp("smudgetofile", key))
+		return git_config_string(&drv->smudge_to_file, var, value);
+
+	if (!strcmp("cleanfromfile", key))
+		return git_config_string(&drv->clean_from_file, var, value);
+
 	if (!strcmp("required", key)) {
 		drv->required = git_config_bool(var, value);
 		return 0;
@@ -821,7 +842,37 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+	return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
+int can_clean_from_file(const char *path)
+{
+	struct conv_attrs ca;
+
+	convert_attrs(&ca, path);
+	if (!ca.drv)
+		return 0;
+
+	/*
+	 * Only use the cleanFromFile filter when the clean filter is also
+	 * configured.
+	 */
+	return (ca.drv->clean_from_file && ca.drv->clean);
+}
+
+int can_smudge_to_file(const char *path)
+{
+	struct conv_attrs ca;
+
+	convert_attrs(&ca, path);
+	if (!ca.drv)
+		return 0;
+
+	/*
+	 * Only use the smudgeToFile filter when the smudge filter is also
+	 * configured.
+	 */
+	return (ca.drv->smudge_to_file && ca.drv->smudge);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -864,7 +915,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		required = ca.drv->required;
 	}
 
-	ret |= apply_filter(path, src, len, -1, dst, filter);
+	ret |= apply_filter(path, NULL, src, len, -1, dst, filter);
 	if (!ret && required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -889,14 +940,34 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+	if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
-static int convert_to_working_tree_internal(const char *path, const char *src,
+void convert_to_git_filter_from_file(const char *path, struct strbuf *dst,
+				   enum safe_crlf checksafe)
+{
+	struct conv_attrs ca;
+	convert_attrs(&ca, path);
+
+	assert(ca.drv);
+	assert(ca.drv->clean);
+	assert(ca.drv->clean_from_file);
+
+	if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file))
+		die("%s: cleanFromFile filter '%s' failed", path, ca.drv->name);
+
+	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
+		checksafe);
+	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
+}
+
+static int convert_to_working_tree_internal(const char *path,
+					    const char *destpath,
+					    const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing)
 {
@@ -907,7 +978,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 
 	convert_attrs(&ca, path);
 	if (ca.drv) {
-		filter = ca.drv->smudge;
+		if (destpath)
+			filter = ca.drv->smudge_to_file;
+		else
+			filter = ca.drv->smudge;
 		required = ca.drv->required;
 	}
 
@@ -918,7 +992,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	}
 	/*
 	 * CRLF conversion can be skipped if normalizing, unless there
-	 * is a smudge filter.  The filter might expect CRLFs.
+	 * is a filter.  The filter might expect CRLFs.
 	 */
 	if (filter || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
@@ -928,21 +1002,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, filter);
+	ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter);
 	if (!ret_filter && required)
-		die("%s: smudge filter %s failed", path, ca.drv->name);
+		die("%s: %s filter %s failed", path, destpath ? "smudgeToFile" : "smudge", ca.drv->name);
 
 	return ret | ret_filter;
 }
 
 int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	return convert_to_working_tree_internal(path, src, len, dst, 0);
+	return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
+}
+
+int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len)
+{
+	struct strbuf output = STRBUF_INIT;
+	int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
+	/* The smudgeToFile filter stdout is not used. */
+	strbuf_release(&output);
+	return ret;
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
+	int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
diff --git a/convert.h b/convert.h
index 82871a1..6f46d10 100644
--- a/convert.h
+++ b/convert.h
@@ -42,6 +42,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
 			  struct strbuf *dst, enum safe_crlf checksafe);
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
+extern int convert_to_working_tree_filter_to_file(const char *path,
+						  const char *destpath,
+						  const char *src,
+						  size_t len);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
 static inline int would_convert_to_git(const char *path)
@@ -53,6 +57,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
 				     struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
+/* Precondition: can_clean_from_file(path) == true */
+extern void convert_to_git_filter_from_file(const char *path,
+					    struct strbuf *dst,
+					    enum safe_crlf checksafe);
+extern int can_clean_from_file(const char *path);
+extern int can_smudge_to_file(const char *path);
 
 /*****************************************************************
  *
-- 
2.8.1



* [PATCH v5 3/8] use cleanFromFile in git add
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
  2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
  2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 4/8] use smudgeToFile in git checkout etc Joey Hess
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

Includes test cases.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
 sha1_file.c           | 42 ++++++++++++++++++++++++++++++++++++------
 t/t0021-conversion.sh | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 2fc22b0..549a20f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3335,6 +3335,29 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 	return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+				      const char *path, unsigned flags)
+{
+	int ret;
+	const int write_object = flags & HASH_WRITE_OBJECT;
+	struct strbuf sbuf = STRBUF_INIT;
+
+	assert(path);
+	assert(can_clean_from_file(path));
+
+	convert_to_git_filter_from_file(path, &sbuf,
+				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+	if (write_object)
+		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+				      sha1);
+	else
+		ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+				     sha1);
+	strbuf_release(&sbuf);
+	return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
 		      const char *path, unsigned flags)
 {
@@ -3427,12 +3450,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned
 
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
-		fd = open(path, O_RDONLY);
-		if (fd < 0)
-			return error_errno("open(\"%s\")", path);
-		if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-			return error("%s: failed to insert into database",
-				     path);
+		if (can_clean_from_file(path)) {
+			if (index_from_file_convert_blob(sha1, path, flags) < 0)
+				return error("%s: failed to insert into database",
+					     path);
+		}
+		else {
+			fd = open(path, O_RDONLY);
+			if (fd < 0)
+				return error_errno("open(\"%s\")", path);
+			if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+				return error("%s: failed to insert into database",
+					     path);
+		}
 		break;
 	case S_IFLNK:
 		if (strbuf_readlink(&sb, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..bd84b80 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <<EOF >rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
 	git config filter.rot13.smudge ./rot13.sh &&
 	git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
 	test_must_be_empty err
 '
 
+test_expect_success 'cleanFromFile filter is used when adding a file' '
+	test_config filter.rot13.cleanFromFile ./rot13-from-file.sh &&
+
+	echo "*.t filter=rot13" >.gitattributes &&
+
+	cat test >fstest.t &&
+	git add fstest.t &&
+	test -e rot13-from-file.ran &&
+	rm -f rot13-from-file.ran &&
+
+	rm -f fstest.t &&
+	git checkout -- fstest.t &&
+	test_cmp test fstest.t
+'
+
+test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' '
+	test_config filter.noclean.smudge ./rot13.sh &&
+	test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
+
+	echo "*.no filter=noclean" >.gitattributes &&
+
+	cat test >test.no &&
+	git add test.no &&
+	test ! -e rot13-from-file.ran &&
+	git cat-file blob :test.no >actual &&
+	test_cmp test actual
+'
+
 test_done
-- 
2.8.1



* [PATCH v5 4/8] use smudgeToFile in git checkout etc
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (2 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 3/8] use cleanFromFile in git add Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config Joey Hess
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

This makes git checkout, git reset, etc use smudgeToFile.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudgeToFile.)

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
 entry.c               | 40 ++++++++++++++++++++++++++++++++--------
 t/t0021-conversion.sh | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..81d12a1 100644
--- a/entry.c
+++ b/entry.c
@@ -146,6 +146,7 @@ static int write_entry(struct cache_entry *ce,
 	unsigned long size;
 	size_t wrote, newsize = 0;
 	struct stat st;
+	int regular_file, smudge_to_file;
 
 	if (ce_mode_s_ifmt == S_IFREG) {
 		struct stream_filter *filter = get_stream_filter(ce->name, ce->sha1);
@@ -175,8 +176,13 @@ static int write_entry(struct cache_entry *ce,
 
 		/*
 		 * Convert from git internal format to working tree format
+		 * unless the smudgeToFile filter can write to the
+		 * file directly.
 		 */
-		if (ce_mode_s_ifmt == S_IFREG &&
+		regular_file = ce_mode_s_ifmt == S_IFREG;
+		smudge_to_file = regular_file
+			&& can_smudge_to_file(ce->name);
+		if (regular_file && !smudge_to_file &&
 		    convert_to_working_tree(ce->name, new, size, &buf)) {
 			free(new);
 			new = strbuf_detach(&buf, &newsize);
@@ -189,13 +195,31 @@ static int write_entry(struct cache_entry *ce,
 			return error_errno("unable to create file %s", path);
 		}
 
-		wrote = write_in_full(fd, new, size);
-		if (!to_tempfile)
-			fstat_done = fstat_output(fd, state, &st);
-		close(fd);
-		free(new);
-		if (wrote != size)
-			return error("unable to write file %s", path);
+		if (!smudge_to_file) {
+			wrote = write_in_full(fd, new, size);
+			if (!to_tempfile)
+				fstat_done = fstat_output(fd, state, &st);
+			close(fd);
+			free(new);
+			if (wrote != size)
+				return error("unable to write file %s", path);
+		}
+		else {
+			close(fd);
+			convert_to_working_tree_filter_to_file(ce->name, path, new, size);
+			free(new);
+			/*
+			 * The smudgeToFile filter may have replaced the
+			 * file; open it to make sure that the file
+			 * exists.
+			 */
+			fd = open(path, O_RDONLY);
+			if (fd < 0)
+				return error_errno("unable to create file %s", path);
+			if (!to_tempfile)
+				fstat_done = fstat_output(fd, state, &st);
+			close(fd);
+		}
 		break;
 	case S_IFGITLINK:
 		if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index bd84b80..ea18b17 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <<EOF >rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <<EOF >rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh >"\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
 	git config filter.rot13.smudge ./rot13.sh &&
 	git config filter.rot13.clean ./rot13.sh &&
@@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when adding a file' '
 	test_cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used when checking out a file' '
+	test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+	rm -f fstest.t &&
+	git checkout -- fstest.t &&
+	test_cmp test fstest.t &&
+
+	test -e rot13-to-file.ran &&
+	rm -f rot13-to-file.ran
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' '
 	test_config filter.noclean.smudge ./rot13.sh &&
 	test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
@@ -304,4 +323,15 @@ test_expect_success 'cleanFromFile filter is not used when clean filter is not c
 	test_cmp test actual
 '
 
+test_expect_success 'smudgeToFile filter is not used when smudge filter is not configured' '
+	test_config filter.nosmudge.clean ./rot13.sh &&
+	test_config filter.nosmudge.smudgeToFile ./rot13-to-file.sh &&
+
+	echo "*.no filter=nosmudge" >.gitattributes &&
+
+	rm -f fstest.t &&
+	git checkout -- fstest.t &&
+	test ! -e rot13-to-file.ran
+'
+
 test_done
-- 
2.8.1



* [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (3 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 4/8] use smudgeToFile in git checkout etc Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 6/8] better recovery from failure of smudgeToFile filter Joey Hess
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

Let the user know when they have a smudgeToFile/cleanFromFile config
that cannot be used because the corresponding smudge/clean config
is missing.

The warning is displayed at most once per git invocation,
and only when doing an operation that would use the filter.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
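
The decision made by can_filter_file can be sketched in shell as below.
This is an illustration under assumptions, not the C code in this patch:
the shell function takes the two configured command strings plus names
used only for the message, and it keeps a single "warned" flag, whereas
the patch keeps one static counter each for the clean and smudge sides.

```sh
#!/bin/sh
# Set once the warning has been printed (one flag for the whole sketch).
warned=0

# can_filter_file FILEFILTER STDIOFILTER DRIVER FILEKEY STDIOKEY
# Returns 0 when the file-based filter may be used, 1 otherwise.
can_filter_file() {
	# No file-based filter configured: nothing to use, nothing to warn about.
	[ -n "$1" ] || return 1
	# The stdio filter is configured too, so the file-based one is usable.
	[ -n "$2" ] && return 0
	# File-based filter without its stdio counterpart: warn only once.
	if [ "$warned" -eq 0 ]; then
		echo "warning: Not running your configured filter.$3.$4 command, because filter.$3.$5 is not configured" >&2
		warned=1
	fi
	return 1
}
```

For example, calling can_filter_file "./to-file.sh" "" rot13 smudgeToFile smudge
twice prints the warning on the first call only and returns 1 both times.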
 convert.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/convert.c b/convert.c
index eb7774f..e1b0b44 100644
--- a/convert.c
+++ b/convert.c
@@ -845,34 +845,50 @@ int would_convert_to_git_filter_fd(const char *path)
 	return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
 }
 
+static int can_filter_file(const char *filefilter, const char *filefiltername,
+			   const char *stdiofilter, const char *stdiofiltername,
+			   const struct conv_attrs *ca,
+			   int *warncount)
+{
+	if (!filefilter)
+		return 0;
+
+	if (stdiofilter)
+		return 1;
+
+	if (*warncount == 0)
+		warning("Not running your configured filter.%s.%s command, because filter.%s.%s is not configured",
+			ca->drv->name, filefiltername,
+			ca->drv->name, stdiofiltername);
+	*warncount = 1;
+
+	return 0;
+}
+
 int can_clean_from_file(const char *path)
 {
 	struct conv_attrs ca;
+	static int warncount = 0;
 
 	convert_attrs(&ca, path);
 	if (!ca.drv)
 		return 0;
 
-	/*
-	 * Only use the cleanFromFile filter when the clean filter is also
-	 * configured.
-	 */
-	return (ca.drv->clean_from_file && ca.drv->clean);
+	return can_filter_file(ca.drv->clean_from_file, "cleanFromFile",
+			       ca.drv->clean, "clean", &ca, &warncount);
 }
 
 int can_smudge_to_file(const char *path)
 {
 	struct conv_attrs ca;
+	static int warncount = 0;
 
 	convert_attrs(&ca, path);
 	if (!ca.drv)
 		return 0;
 
-	/*
-	 * Only use the smudgeToFile filter when the smudge filter is also
-	 * configured.
-	 */
-	return (ca.drv->smudge_to_file && ca.drv->smudge);
+	return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile",
+			       ca.drv->smudge, "smudge", &ca, &warncount);
 }
 
 const char *get_convert_attr_ascii(const char *path)
-- 
2.8.1



* [PATCH v5 6/8] better recovery from failure of smudgeToFile filter
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (4 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 7/8] use smudgeToFile filter in git am Joey Hess
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC
  To: git; +Cc: Joey Hess

If the smudgeToFile filter fails, it can leave the worktree file with the
wrong content, or even deleted. Recover from this by falling back to
running the smudge filter.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
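
The recovery flow can be sketched in shell as follows. This is only an
illustration of the control flow, under assumptions: the function name
checkout_with_fallback and its arguments are made up for the example, and
the blob is buffered in a temporary file so that it can be replayed, where
the C code simply keeps it in memory.

```sh
#!/bin/sh
# checkout_with_fallback DEST TOFILE_CMD STDOUT_CMD   (blob on stdin)
checkout_with_fallback() {
	dest=$1 tofile=$2 tostdout=$3
	blob=$(mktemp) || return 1
	cat >"$blob"	# keep the blob so it can be fed to a second filter
	: >"$dest"	# the file-based filter is handed an existing empty file

	# Run the file-based filter with the destination path appended as
	# its last argument and the blob on stdin.
	if sh -c "$tofile \"\$1\"" sh "$dest" <"$blob"; then
		rm -f "$blob"
		# The filter may have replaced the file; it must still exist.
		test -e "$dest"
		return
	fi

	# The failed filter may have left wrong content or deleted the
	# file: start over and let the stdout filter produce the content.
	rm -f "$dest"
	sh -c "$tostdout" <"$blob" >"$dest"
	status=$?
	rm -f "$blob"
	return $status
}
```

With "false" as the file-based command, the destination ends up holding
whatever the stdout filter wrote, which mirrors the first new test case
in this patch.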
 entry.c               | 66 ++++++++++++++++++++++++++++++++++-----------------
 t/t0021-conversion.sh | 24 +++++++++++++++++++
 2 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/entry.c b/entry.c
index 81d12a1..7811e31 100644
--- a/entry.c
+++ b/entry.c
@@ -182,12 +182,6 @@ static int write_entry(struct cache_entry *ce,
 		regular_file = ce_mode_s_ifmt == S_IFREG;
 		smudge_to_file = regular_file
 			&& can_smudge_to_file(ce->name);
-		if (regular_file && !smudge_to_file &&
-		    convert_to_working_tree(ce->name, new, size, &buf)) {
-			free(new);
-			new = strbuf_detach(&buf, &newsize);
-			size = newsize;
-		}
 
 		fd = open_output_fd(path, ce, to_tempfile);
 		if (fd < 0) {
@@ -195,7 +189,51 @@ static int write_entry(struct cache_entry *ce,
 			return error_errno("unable to create file %s", path);
 		}
 
+		if (smudge_to_file) {
+			close(fd);
+			if (convert_to_working_tree_filter_to_file(ce->name, path, new, size)) {
+				free(new);
+				/*
+				 * The smudgeToFile filter may have replaced
+				 * or deleted the file; reopen it to make
+				 * sure that the file exists.
+				 */
+				fd = open(path, O_RDONLY);
+				if (fd < 0)
+					return error_errno("unable to create file %s", path);
+				if (!to_tempfile)
+					fstat_done = fstat_output(fd, state, &st);
+				close(fd);
+			}
+			else {
+				/*
+				 * The failing smudgeToFile filter may have
+				 * deleted or replaced the file; delete
+				 * the file and re-open for recovery write.
+				 */
+				unlink(path);
+				fd = open_output_fd(path, ce, to_tempfile);
+				if (fd < 0) {
+					free(new);
+					return error_errno("unable to create file %s", path);
+				}
+				/* Fall through to normal write below. */
+				smudge_to_file = 0;
+			}
+		}
+
+		/*
+		 * Not an else of above if (smudge_to_file) because the
+		 * smudgeToFile filter may fail and in that case this is
+		 * run to recover.
+		 */
 		if (!smudge_to_file) {
+			if (regular_file &&
+			    convert_to_working_tree(ce->name, new, size, &buf)) {
+				free(new);
+				new = strbuf_detach(&buf, &newsize);
+				size = newsize;
+			}
 			wrote = write_in_full(fd, new, size);
 			if (!to_tempfile)
 				fstat_done = fstat_output(fd, state, &st);
@@ -204,22 +242,6 @@ static int write_entry(struct cache_entry *ce,
 			if (wrote != size)
 				return error("unable to write file %s", path);
 		}
-		else {
-			close(fd);
-			convert_to_working_tree_filter_to_file(ce->name, path, new, size);
-			free(new);
-			/*
-			 * The smudgeToFile filter may have replaced the
-			 * file; open it to make sure that the file
-			 * exists.
-			 */
-			fd = open(path, O_RDONLY);
-			if (fd < 0)
-				return error_errno("unable to create file %s", path);
-			if (!to_tempfile)
-				fstat_done = fstat_output(fd, state, &st);
-			close(fd);
-		}
 		break;
 	case S_IFGITLINK:
 		if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index ea18b17..0efad9b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,6 +28,14 @@ touch rot13-to-file.ran
 EOF
 chmod +x rot13-to-file.sh
 
+cat <<EOF >delete-file-and-fail.sh
+#!$SHELL_PATH
+destfile="\$1"
+rm -f "\$destfile"
+exit 1
+EOF
+chmod +x delete-file-and-fail.sh
+
 test_expect_success setup '
 	git config filter.rot13.smudge ./rot13.sh &&
 	git config filter.rot13.clean ./rot13.sh &&
@@ -310,6 +318,22 @@ test_expect_success 'smudgeToFile filter is used when checking out a file' '
 	rm -f rot13-to-file.ran
 '
 
+test_expect_success 'recovery from failure of smudgeToFile filter, using smudge filter' '
+	test_config filter.rot13.smudgeToFile false &&
+
+	rm -f fstest.t &&
+	git checkout -- fstest.t &&
+	test_cmp test fstest.t
+'
+
+test_expect_success 'recovery from failure of smudgeToFile filter that deletes the worktree file' '
+	test_config filter.rot13.smudgeToFile ./delete-file-and-fail.sh &&
+
+	rm -f fstest.t &&
+	git checkout -- fstest.t &&
+	test_cmp test fstest.t
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' '
 	test_config filter.noclean.smudge ./rot13.sh &&
 	test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
-- 
2.8.1



* [PATCH v5 7/8] use smudgeToFile filter in git am
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (5 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 6/8] better recovery from failure of smudgeToFile filter Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-11 22:45 ` [PATCH v5 8/8] use smudgeToFile filter in recursive merge Joey Hess
  2016-07-12 19:52 ` [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Junio C Hamano
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC (permalink / raw)
  To: git; +Cc: Joey Hess

git am updates the work tree and so should use the smudgeToFile filter.

This refactors convert_to_working_tree_filter_to_file so that it checks
the file after running the smudgeToFile command and cleans up after a
failing command. It now returns an fd open for reading on success, and
-1 on failure.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
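The return contract described above (an fd open for reading on success,
no file left behind on failure) can be sketched on its own. This is a
minimal illustration, not the code in convert.c: check_filtered_file is
a hypothetical name, and the filter's result is passed in as a flag
instead of being produced by running a command. The sketch operates on
the on-disk destination path.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * filter_ok is nonzero when the (hypothetical) filter reported success.
 * On success, return an fd open for reading destpath.
 * On failure, make sure destpath does not exist and return -1.
 */
static int check_filtered_file(const char *destpath, int filter_ok)
{
	int fd;

	if (!filter_ok) {
		/* The command may have created the file before failing. */
		unlink(destpath);
		return -1;
	}
	fd = open(destpath, O_RDONLY);
	if (fd < 0)
		unlink(destpath);	/* missing or unreadable: clean up */
	return fd;
}
```

A caller can then treat any negative return value as "fall back to the
normal write", without needing to know what the filter did to the file.
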
 apply.c               | 16 ++++++++++++++++
 convert.c             | 25 +++++++++++++++++++++++--
 entry.c               | 21 ++++-----------------
 t/t0021-conversion.sh | 13 +++++++++++++
 4 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/apply.c b/apply.c
index 4a6b2db..7db8344 100644
--- a/apply.c
+++ b/apply.c
@@ -4322,6 +4322,22 @@ static int try_create_file(const char *path, unsigned int mode, const char *buf,
 	if (fd < 0)
 		return 1;
 
+	if (can_smudge_to_file(path)) {
+		close(fd);
+		fd = convert_to_working_tree_filter_to_file(path, path, buf, size);
+		if (fd < 0) {
+			/* smudgeToFile filter failed; continue
+			 * with regular file creation instead. */
+			fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666);
+			if (fd < 0)
+				return -1;
+		}
+		else {
+			close(fd);
+			return 0;
+		}
+	}
+
 	if (convert_to_working_tree(path, buf, size, &nbuf)) {
 		size = nbuf.len;
 		buf  = nbuf.buf;
diff --git a/convert.c b/convert.c
index e1b0b44..3746ad5 100644
--- a/convert.c
+++ b/convert.c
@@ -1030,13 +1030,34 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc
 	return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
 }
 
+/*
+ * Returns fd open to read the worktree file on success.
+ * On failure, the worktree file will not exist.
+ */
 int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len)
 {
 	struct strbuf output = STRBUF_INIT;
-	int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
+	int ok = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
 	/* The smudgeToFile filter stdout is not used. */
 	strbuf_release(&output);
-	return ret;
+	if (ok) {
+		/*
+		 * Open the file to make sure that it's present
+		 * (and readable) after the command populated it.
+		 */
+		int fd = open(path, O_RDONLY);
+		if (fd < 0)
+			unlink(path);
+		return fd;
+	}
+	else {
+		/*
+		 * The command could have created the file before failing,
+		 * so delete it.
+		 */
+		unlink(path);
+		return -1;
+	}
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
diff --git a/entry.c b/entry.c
index 7811e31..40662eb 100644
--- a/entry.c
+++ b/entry.c
@@ -191,34 +191,21 @@ static int write_entry(struct cache_entry *ce,
 
 		if (smudge_to_file) {
 			close(fd);
-			if (convert_to_working_tree_filter_to_file(ce->name, path, new, size)) {
+			fd = convert_to_working_tree_filter_to_file(ce->name, path, new, size);
+			if (fd >= 0) {
 				free(new);
-				/*
-				 * The smudgeToFile filter may have replaced
-				 * or deleted the file; reopen it to make
-				 * sure that the file exists.
-				 */
-				fd = open(path, O_RDONLY);
-				if (fd < 0)
-					return error_errno("unable to create file %s", path);
 				if (!to_tempfile)
 					fstat_done = fstat_output(fd, state, &st);
 				close(fd);
 			}
 			else {
-				/*
-				 * The failing smudgeToFile filter may have
-				 * deleted or replaced the file; delete
-				 * the file and re-open for recovery write.
-				 */
-				unlink(path);
+				/* Fall through to normal write below. */
+				smudge_to_file = 0;
 				fd = open_output_fd(path, ce, to_tempfile);
 				if (fd < 0) {
 					free(new);
 					return error_errno("unable to create file %s", path);
 				}
-				/* Fall through to normal write below. */
-				smudge_to_file = 0;
 			}
 		}
 
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0efad9b..42b28aa 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,6 +334,19 @@ test_expect_success 'recovery from failure of smudgeToFile filter that deletes t
 	test_cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used by git am' '
+	test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+	git commit fstest.t -m "added fstest.t" &&
+	git format-patch HEAD^ --stdout >fstest.patch &&
+	git reset --hard HEAD^ &&
+	git am fstest.patch &&
+
+	test -e rot13-to-file.ran &&
+	rm -f rot13-to-file.ran &&
+	test_cmp test fstest.t
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' '
 	test_config filter.noclean.smudge ./rot13.sh &&
 	test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
-- 
2.8.1



* [PATCH v5 8/8] use smudgeToFile filter in recursive merge
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (6 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 7/8] use smudgeToFile filter in git am Joey Hess
@ 2016-07-11 22:45 ` Joey Hess
  2016-07-12 19:52 ` [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Junio C Hamano
  8 siblings, 0 replies; 11+ messages in thread
From: Joey Hess @ 2016-07-11 22:45 UTC (permalink / raw)
  To: git; +Cc: Joey Hess

Recursive merge updates the work tree and so should use the smudgeToFile
filter.

At this point, smudgeToFile is run by everything that updates work
tree files.

Signed-off-by: Joey Hess <joeyh@joeyh.name>
---
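Both the merge and the apply code paths create the work tree file with a
mode derived only from the executable bit. A small sketch of that
selection follows; worktree_create_mode is a hypothetical helper name,
not a function in git.

```c
#include <sys/types.h>

/*
 * Pick the mode passed to open(2) when creating a work tree file:
 * 0777 if the tracked mode has the owner-execute bit set, else 0666.
 * The process umask still applies when the file is created.
 */
static mode_t worktree_create_mode(unsigned int tree_mode)
{
	return (tree_mode & 0100) ? 0777 : 0666;
}
```
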
 merge-recursive.c     | 53 ++++++++++++++++++++++++++++++++++++++++-----------
 t/t0021-conversion.sh | 16 +++++++++++++++-
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index a4a1195..5fe3f50 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -758,6 +758,7 @@ static void update_file_flags(struct merge_options *o,
 		enum object_type type;
 		void *buf;
 		unsigned long size;
+		int isreg;
 
 		if (S_ISGITLINK(mode)) {
 			/*
@@ -774,22 +775,16 @@ static void update_file_flags(struct merge_options *o,
 			die(_("cannot read object %s '%s'"), oid_to_hex(oid), path);
 		if (type != OBJ_BLOB)
 			die(_("blob expected for %s '%s'"), oid_to_hex(oid), path);
-		if (S_ISREG(mode)) {
-			struct strbuf strbuf = STRBUF_INIT;
-			if (convert_to_working_tree(path, buf, size, &strbuf)) {
-				free(buf);
-				size = strbuf.len;
-				buf = strbuf_detach(&strbuf, NULL);
-			}
-		}
 
 		if (make_room_for_path(o, path) < 0) {
 			update_wd = 0;
 			free(buf);
 			goto update_index;
 		}
-		if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) {
+		isreg = S_ISREG(mode);
+		if (isreg || (!has_symlinks && S_ISLNK(mode))) {
 			int fd;
+			int smudge_to_file;
 			if (mode & 0100)
 				mode = 0777;
 			else
@@ -797,8 +792,44 @@ static void update_file_flags(struct merge_options *o,
 			fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
 			if (fd < 0)
 				die_errno(_("failed to open '%s'"), path);
-			write_in_full(fd, buf, size);
-			close(fd);
+
+			smudge_to_file = can_smudge_to_file(path);
+			if (smudge_to_file) {
+				close(fd);
+				fd = convert_to_working_tree_filter_to_file(path, path, buf, size);
+				if (fd < 0) {
+					/*
+					 * smudgeToFile filter failed;
+					 * continue with regular file
+					 * creation.
+					 */
+					smudge_to_file = 0;
+					fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
+					if (fd < 0)
+						die_errno(_("failed to open '%s'"), path);
+				}
+				else {
+					close(fd);
+				}
+			}
+
+			/*
+			 * Not an else of above if (smudge_to_file) because
+			 * the smudgeToFile filter may fail and in that case
+			 * this is run to recover.
+			 */
+			if (!smudge_to_file) {
+				if (isreg) {
+					struct strbuf strbuf = STRBUF_INIT;
+					if (convert_to_working_tree(path, buf, size, &strbuf)) {
+						free(buf);
+						size = strbuf.len;
+						buf = strbuf_detach(&strbuf, NULL);
+					}
+				}
+				write_in_full(fd, buf, size);
+				close(fd);
+			}
 		} else if (S_ISLNK(mode)) {
 			char *lnk = xmemdupz(buf, size);
 			safe_create_leading_directories_const(path);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 42b28aa..64b2b8f 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of smudgeToFile filter that deletes t
 	test_cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used in merge' '
+	test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+	git commit -m "added fstest.t" fstest.t &&
+	git checkout -b old &&
+	git reset --hard HEAD^ &&
+	git merge master &&
+	git checkout master &&
+
+	test -e rot13-to-file.ran &&
+	rm -f rot13-to-file.ran &&
+
+	test_cmp test fstest.t
+'
+
 test_expect_success 'smudgeToFile filter is used by git am' '
 	test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
 
-	git commit fstest.t -m "added fstest.t" &&
 	git format-patch HEAD^ --stdout >fstest.patch &&
 	git reset --hard HEAD^ &&
 	git am fstest.patch &&
-- 
2.8.1



* Re: [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu)
  2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
                   ` (7 preceding siblings ...)
  2016-07-11 22:45 ` [PATCH v5 8/8] use smudgeToFile filter in recursive merge Joey Hess
@ 2016-07-12 19:52 ` Junio C Hamano
  8 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2016-07-12 19:52 UTC (permalink / raw)
  To: Joey Hess; +Cc: git

Joey Hess <joeyh@joeyh.name> writes:

> Since tb/convert-peek-in-index is not currently in pu, this reroll isn't
> based on it, and will conflict if that topic gets added back into pu.
> Not sure what the status of tb/convert-peek-in-index is at this point?

It appears that we are converging on _not_ using that topic after
all (cf. $gmane/299320).

I'll try to apply these on top of a merge between the 'cc/apply-am'
topic and the current 'master' branch and requeue.

> Improvements from Junio's review:
>
> 	fix build with DEVELOPER=1
> 	style fixes
> 	use test_cmp in test cases
> 	improve robustness of a test case
> 	clean up some confusing code
> 	small performance tweak
>
> Joey Hess (8):
>   clarify %f documentation
>   add smudgeToFile and cleanFromFile filter configs
>   use cleanFromFile in git add
>   use smudgeToFile in git checkout etc
>   warn on unusable smudgeToFile/cleanFromFile config
>   better recovery from failure of smudgeToFile filter
>   use smudgeToFile filter in git am
>   use smudgeToFile filter in recursive merge
>
>  Documentation/config.txt        |  18 ++++-
>  Documentation/gitattributes.txt |  42 ++++++++++++
>  apply.c                         |  16 +++++
>  convert.c                       | 148 ++++++++++++++++++++++++++++++++++++----
>  convert.h                       |  10 +++
>  entry.c                         |  59 ++++++++++++----
>  merge-recursive.c               |  53 +++++++++++---
>  sha1_file.c                     |  42 ++++++++++--
>  t/t0021-conversion.sh           | 117 +++++++++++++++++++++++++++++++
>  9 files changed, 459 insertions(+), 46 deletions(-)


* Re: [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
  2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
@ 2016-07-13 16:02   ` Lars Schneider
  0 siblings, 0 replies; 11+ messages in thread
From: Lars Schneider @ 2016-07-13 16:02 UTC (permalink / raw)
  To: Joey Hess; +Cc: git


> On 12 Jul 2016, at 00:45, Joey Hess <joeyh@joeyh.name> wrote:
> 
> This adds new smudgeToFile and cleanFromFile filter commands,
> which are similar to smudge and clean but allow direct access to files on
> disk.
> 
> This interface can be much more efficient when operating on large files,
> because the whole file content does not need to be streamed through the
> filter. It even allows for things like cleanFromFile commands that avoid
> reading the whole content of the file, and for smudgeToFile commands that
> populate a work tree file using an efficient Copy On Write operation.
> 
> The new filter commands will not be used for all filtering. They are
> efficient to use when git add is adding a file, or when the work tree is
> being updated, but not a good fit when git is internally filtering blob
> objects in memory for, e.g., a diff.
> 
> So, a user who wants to use smudgeToFile should also provide a smudge
> command to be used in cases where smudgeToFile is not used. And ditto
> with cleanFromFile and clean. To avoid foot-shooting configurations, the
> new commands are not used unless the old commands are also configured.
> 
> That also ensures that a filter driver configuration that includes these
> new commands will work, although less efficiently, when used with an older
> version of git that does not support them.
> 
> Signed-off-by: Joey Hess <joeyh@joeyh.name>
> ---
> Documentation/config.txt        |  18 ++++++-
> Documentation/gitattributes.txt |  37 ++++++++++++++
> convert.c                       | 111 +++++++++++++++++++++++++++++++++++-----
> convert.h                       |  10 ++++
> 4 files changed, 160 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 19493aa..a55bed8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1325,15 +1325,29 @@ format.useAutoBase::
> 	format-patch by default.
> 
> filter.<driver>.clean::
> -	The command which is used to convert the content of a worktree
> +	The command which is used as a filter to convert the content of a worktree
> 	file to a blob upon checkin.  See linkgit:gitattributes[5] for
> 	details.
> 
> filter.<driver>.smudge::
> -	The command which is used to convert the content of a blob
> +	The command which is used as a filter to convert the content of a blob
> 	object to a worktree file upon checkout.  See
> 	linkgit:gitattributes[5] for details.
> 
> +filter.<driver>.cleanFromFile::
> +	Similar to filter.<driver>.clean but the specified command
> +	directly accesses a worktree file on disk, rather than
> +	receiving the file content from standard input.
> +	Only used when filter.<driver>.clean is also configured.
> +	See linkgit:gitattributes[5] for details.
> +
> +filter.<driver>.smudgeToFile::
> +	Similar to filter.<driver>.smudge but the specified command
> +	writes the content of a blob directly to a worktree file,
> +	rather than to standard output.
> +	Only used when filter.<driver>.smudge is also configured.
> +	See linkgit:gitattributes[5] for details.
> +
> fsck.<msg-id>::
> 	Allows overriding the message type (error, warn or ignore) of a
> 	specific message ID such as `missingEmail`.
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 197ece8..a58aafc 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands
> should not try to access the file on disk, but only act as filters on the
> content provided to them on standard input.
> 
> +There are two extra commands "cleanFromFile" and "smudgeToFile", which
> +can optionally be set in a filter driver. These are similar to the "clean"
> +and "smudge" commands, but avoid needing to pipe the contents of files
> +through the filters, and instead read/write files in the filesystem.
> +This can be more efficient when using filters with large files that are not
> +directly stored in the repository.
> +
> +Both "cleanFromFile" and "smudgeToFile" are provided a path as an
> +added parameter after the configured command line.
> +
> +The "cleanFromFile" command is provided the path to the file that
> +it should clean. Like the "clean" command, it should output the cleaned
> +version to standard output.
> +
> +The "smudgeToFile" command is provided a path to the file that it
> +should write to. (This file will already exist, as an empty file that can
> +be written to or replaced.) Like the "smudge" command, "smudgeToFile"
> +is fed the blob object from its standard input.
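To make the calling convention concrete, here is a rough sketch of the
core of a smudgeToFile-style filter written in C: it reads the blob from
a stream and writes the result to the given path. ROT13 is used because
the test suite's filters use it; smudge_stream_to_file and rot13 are
hypothetical names chosen for this example.

```c
#include <stdio.h>

/* ROT13 a single byte; non-letters pass through unchanged. */
static int rot13(int c)
{
	if (c >= 'a' && c <= 'z')
		return 'a' + (c - 'a' + 13) % 26;
	if (c >= 'A' && c <= 'Z')
		return 'A' + (c - 'A' + 13) % 26;
	return c;
}

/*
 * Read blob content from `in` and write the smudged result to the file
 * named by destpath, replacing its content.
 * Returns 0 on success and 1 on failure, like a filter's exit status.
 */
static int smudge_stream_to_file(FILE *in, const char *destpath)
{
	FILE *out = fopen(destpath, "wb");
	int c, failed;

	if (!out)
		return 1;
	while ((c = getc(in)) != EOF)
		putc(rot13(c), out);
	failed = ferror(in) || ferror(out);
	if (fclose(out) != 0)
		failed = 1;
	return failed ? 1 : 0;
}
```

A real filter's main() would presumably call
smudge_stream_to_file(stdin, argv[argc - 1]), since the path is passed
as an added parameter after the configured command line.
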
> +
> +Some git operations that need to apply filters cannot use "cleanFromFile"
> +and "smudgeToFile", since the files are not present on disk. So, to avoid
> +inconsistent behavior, "cleanFromFile" will only be used if "clean" is
> +also configured, and "smudgeToFile" will only be used if "smudge" is also
> +configured.
> +
> +An example large file storage filter driver using cleanFromFile and
> +smudgeToFile follows:
> +
> +------------------------
> +[filter "bigfiles"]
> +	clean = store-bigfile --from-stdin
> +	cleanFromFile = store-bigfile --from-file
> +	smudge = retrieve-bigfile --to-stdout
> +	smudgeToFile = retrieve-bigfile --to-file
> +	required
Minor nit: Do we need "required" in the minimal example? Plus, all test
cases use "required = true" (I just learned about that short-hand version).
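For readers who have not met the short-hand either: a key written with
no "= value" part is read as boolean true. A simplified sketch of that
rule follows. parse_config_bool is a hypothetical name, and git's own
parser accepts more forms than this (other integers, for instance).

```c
#include <stddef.h>
#include <strings.h>

/*
 * Simplified boolean parsing: 1 for true, 0 for false, -1 if the value
 * is not recognized. A NULL value models a key with no "= value" part.
 */
static int parse_config_bool(const char *value)
{
	if (!value)
		return 1;	/* bare key, e.g. "required" */
	if (!*value)
		return 0;	/* empty value, e.g. "required =" */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcasecmp(value, "1"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcasecmp(value, "0"))
		return 0;
	return -1;
}
```

So "required" and "required = true" end up with the same setting.
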


> +------------------------
> +
> Interaction between checkin/checkout attributes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> diff --git a/convert.c b/convert.c
> index 214c99f..eb7774f 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -358,7 +358,8 @@ struct filter_params {
> 	unsigned long size;
> 	int fd;
> 	const char *cmd;
> -	const char *path;
> +	const char *path; /* Path within the git repository */
> +	const char *fspath; /* Path to file on disk */
Good comment here. However, I wonder if these two "path" variables
could be confused in other parts of this file.

I think with an additional apply_filter function you could avoid this
confusion and it would be easier for me to apply my "long running filter
patch" on top :-)
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L1143-L1146
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L402-L488

Thanks for this patch series,
Lars


> };
> 
> static int filter_buffer_or_fd(int in, int out, void *data)
> @@ -387,6 +388,15 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
> 	strbuf_release(&path);
> 
> +	/* append fspath to the command if it's set, separated with a space */
> +	if (params->fspath) {
> +		struct strbuf fspath = STRBUF_INIT;
> +		sq_quote_buf(&fspath, params->fspath);
> +		strbuf_addstr(&cmd, " ");
> +		strbuf_addbuf(&cmd, &fspath);
> +		strbuf_release(&fspath);
> +	}
> +
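The appended path is shell-quoted so that spaces and quotes in file
names survive the trip through the shell. A rough sketch of
single-quote style quoting is below. shell_quote is a hypothetical
helper that only handles embedded single quotes; it is not git's
sq_quote_buf and may differ from it in details.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of src wrapped in single quotes, with
 * each embedded single quote rewritten as '\'' (close quote, escaped
 * quote, reopen quote). The caller frees the result.
 */
static char *shell_quote(const char *src)
{
	size_t len = strlen(src);
	/* Worst case: every byte expands to 4, plus 2 quotes and NUL. */
	char *out = malloc(len * 4 + 3);
	char *p = out;

	if (!out)
		return NULL;
	*p++ = '\'';
	for (; *src; src++) {
		if (*src == '\'') {
			memcpy(p, "'\\''", 4);
			p += 4;
		} else {
			*p++ = *src;
		}
	}
	*p++ = '\'';
	*p = '\0';
	return out;
}
```

With this, a path such as "my file.bin" becomes 'my file.bin' and can
be appended to the command after a single space.
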
> 	argv[0] = cmd.buf;
> 
> 	child_process.argv = argv;
> @@ -425,7 +435,8 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> 	return (write_err || status);
> }
> 
> -static int apply_filter(const char *path, const char *src, size_t len, int fd,
> +static int apply_filter(const char *path, const char *fspath,
> +			const char *src, size_t len, int fd,
>                         struct strbuf *dst, const char *cmd)
> {
> 	/*
> @@ -454,6 +465,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
> 	params.fd = fd;
> 	params.cmd = cmd;
> 	params.path = path;
> +	params.fspath = fspath;
> 
> 	fflush(NULL);
> 	if (start_async(&async))
> @@ -484,6 +496,8 @@ static struct convert_driver {
> 	struct convert_driver *next;
> 	const char *smudge;
> 	const char *clean;
> +	const char *smudge_to_file;
> +	const char *clean_from_file;
> 	int required;
> } *user_convert, **user_convert_tail;
> 
> @@ -510,8 +524,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> 	}
> 
> 	/*
> -	 * filter.<name>.smudge and filter.<name>.clean specifies
> -	 * the command line:
> +	 * filter.<name>.smudge, filter.<name>.clean,
> +	 * filter.<name>.smudgeToFile, filter.<name>.cleanFromFile
> +	 * specifies the command line:
> 	 *
> 	 *	command-line
> 	 *
> @@ -524,6 +539,12 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> 	if (!strcmp("clean", key))
> 		return git_config_string(&drv->clean, var, value);
> 
> +	if (!strcmp("smudgetofile", key))
> +		return git_config_string(&drv->smudge_to_file, var, value);
> +
> +	if (!strcmp("cleanfromfile", key))
> +		return git_config_string(&drv->clean_from_file, var, value);
> +
> 	if (!strcmp("required", key)) {
> 		drv->required = git_config_bool(var, value);
> 		return 0;
> @@ -821,7 +842,37 @@ int would_convert_to_git_filter_fd(const char *path)
> 	if (!ca.drv->required)
> 		return 0;
> 
> -	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
> +	return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
> +}
> +
> +int can_clean_from_file(const char *path)
> +{
> +	struct conv_attrs ca;
> +
> +	convert_attrs(&ca, path);
> +	if (!ca.drv)
> +		return 0;
> +
> +	/*
> +	 * Only use the cleanFromFile filter when the clean filter is also
> +	 * configured.
> +	 */
> +	return (ca.drv->clean_from_file && ca.drv->clean);
> +}
> +
> +int can_smudge_to_file(const char *path)
> +{
> +	struct conv_attrs ca;
> +
> +	convert_attrs(&ca, path);
> +	if (!ca.drv)
> +		return 0;
> +
> +	/*
> +	 * Only use the smudgeToFile filter when the smudge filter is also
> +	 * configured.
> +	 */
> +	return (ca.drv->smudge_to_file && ca.drv->smudge);
> }
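The gating rule in these two helpers can be illustrated without the
attribute lookup. The sketch below uses a trimmed-down, hypothetical
struct filter_driver in place of struct convert_driver, and the helper
names are made up for the example.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down view of a filter driver's configuration. */
struct filter_driver {
	const char *smudge;
	const char *clean;
	const char *smudge_to_file;
	const char *clean_from_file;
};

/* smudgeToFile is usable only when smudge is configured as well. */
static int driver_can_smudge_to_file(const struct filter_driver *drv)
{
	return drv && drv->smudge_to_file && drv->smudge;
}

/* cleanFromFile is usable only when clean is configured as well. */
static int driver_can_clean_from_file(const struct filter_driver *drv)
{
	return drv && drv->clean_from_file && drv->clean;
}
```

A driver that sets only smudgeToFile therefore behaves as if no
file-based filter were configured at all.
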
> 
> const char *get_convert_attr_ascii(const char *path)
> @@ -864,7 +915,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
> 		required = ca.drv->required;
> 	}
> 
> -	ret |= apply_filter(path, src, len, -1, dst, filter);
> +	ret |= apply_filter(path, NULL, src, len, -1, dst, filter);
> 	if (!ret && required)
> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
> 
> @@ -889,14 +940,34 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
> 	assert(ca.drv);
> 	assert(ca.drv->clean);
> 
> -	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
> +	if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean))
> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
> 
> 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
> 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> }
> 
> -static int convert_to_working_tree_internal(const char *path, const char *src,
> +void convert_to_git_filter_from_file(const char *path, struct strbuf *dst,
> +				   enum safe_crlf checksafe)
> +{
> +	struct conv_attrs ca;
> +	convert_attrs(&ca, path);
> +
> +	assert(ca.drv);
> +	assert(ca.drv->clean);
> +	assert(ca.drv->clean_from_file);
> +
> +	if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file))
> +		die("%s: cleanFromFile filter '%s' failed", path, ca.drv->name);
> +
> +	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
> +		checksafe);
> +	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> +}
> +
> +static int convert_to_working_tree_internal(const char *path,
> +					    const char *destpath,
> +					    const char *src,
> 					    size_t len, struct strbuf *dst,
> 					    int normalizing)
> {
> @@ -907,7 +978,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 
> 	convert_attrs(&ca, path);
> 	if (ca.drv) {
> -		filter = ca.drv->smudge;
> +		if (destpath)
> +			filter = ca.drv->smudge_to_file;
> +		else
> +			filter = ca.drv->smudge;
> 		required = ca.drv->required;
> 	}
> 
> @@ -918,7 +992,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 	}
> 	/*
> 	 * CRLF conversion can be skipped if normalizing, unless there
> -	 * is a smudge filter.  The filter might expect CRLFs.
> +	 * is a filter.  The filter might expect CRLFs.
> 	 */
> 	if (filter || !normalizing) {
> 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
> @@ -928,21 +1002,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 		}
> 	}
> 
> -	ret_filter = apply_filter(path, src, len, -1, dst, filter);
> +	ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter);
> 	if (!ret_filter && required)
> -		die("%s: smudge filter %s failed", path, ca.drv->name);
> +		die("%s: %s filter %s failed", path, destpath ? "smudgeToFile" : "smudge", ca.drv->name);
> 
> 	return ret | ret_filter;
> }
> 
> int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> -	return convert_to_working_tree_internal(path, src, len, dst, 0);
> +	return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
> +}
> +
> +int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len)
> +{
> +	struct strbuf output = STRBUF_INIT;
> +	int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
> +	/* The smudgeToFile filter stdout is not used. */
> +	strbuf_release(&output);
> +	return ret;
> }
> 
> int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> -	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
> +	int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1);
> 	if (ret) {
> 		src = dst->buf;
> 		len = dst->len;
> diff --git a/convert.h b/convert.h
> index 82871a1..6f46d10 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -42,6 +42,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
> 			  struct strbuf *dst, enum safe_crlf checksafe);
> extern int convert_to_working_tree(const char *path, const char *src,
> 				   size_t len, struct strbuf *dst);
> +extern int convert_to_working_tree_filter_to_file(const char *path,
> +						  const char *destpath,
> +						  const char *src,
> +						  size_t len);
> extern int renormalize_buffer(const char *path, const char *src, size_t len,
> 			      struct strbuf *dst);
> static inline int would_convert_to_git(const char *path)
> @@ -53,6 +57,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
> 				     struct strbuf *dst,
> 				     enum safe_crlf checksafe);
> extern int would_convert_to_git_filter_fd(const char *path);
> +/* Precondition: can_clean_from_file(path) == true */
> +extern void convert_to_git_filter_from_file(const char *path,
> +					    struct strbuf *dst,
> +					    enum safe_crlf checksafe);
> +extern int can_clean_from_file(const char *path);
> +extern int can_smudge_to_file(const char *path);
> 
> /*****************************************************************
>  *
> -- 
> 2.8.1
> 



Thread overview: 11+ messages
2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
2016-07-13 16:02   ` Lars Schneider
2016-07-11 22:45 ` [PATCH v5 3/8] use cleanFromFile in git add Joey Hess
2016-07-11 22:45 ` [PATCH v5 4/8] use smudgeToFile in git checkout etc Joey Hess
2016-07-11 22:45 ` [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config Joey Hess
2016-07-11 22:45 ` [PATCH v5 6/8] better recovery from failure of smudgeToFile filter Joey Hess
2016-07-11 22:45 ` [PATCH v5 7/8] use smudgeToFile filter in git am Joey Hess
2016-07-11 22:45 ` [PATCH v5 8/8] use smudgeToFile filter in recursive merge Joey Hess
2016-07-12 19:52 ` [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Junio C Hamano
