git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: Joey Hess <joeyh@joeyh.name>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
Date: Wed, 13 Jul 2016 18:02:50 +0200	[thread overview]
Message-ID: <1994A13B-DD56-4E60-8CBA-A7299A6D6703@gmail.com> (raw)
In-Reply-To: <1468277112-9909-3-git-send-email-joeyh@joeyh.name>


> On 12 Jul 2016, at 00:45, Joey Hess <joeyh@joeyh.name> wrote:
> 
> This adds new smudgeToFile and cleanFromFile filter commands,
> which are similar to smudge and clean but allow direct access to files on
> disk.
> 
> This interface can be much more efficient when operating on large files,
> because the whole file content does not need to be streamed through the
> filter. It even allows for things like cleanFromFile commands that avoid
> reading the whole content of the file, and for smudgeToFile commands that
> populate a work tree file using an efficient Copy On Write operation.
> 
> The new filter commands will not be used for all filtering. They are
> efficient to use when git add is adding a file, or when the work tree is
> being updated, but not a good fit when git is internally filtering blob
> objects in memory for eg, a diff.
> 
> So, a user who wants to use smudgeToFile should also provide a smudge
> command to be used in cases where smudgeToFile is not used. And ditto
> with cleanFromFile and clean. To avoid foot-shooting configurations, the
> new commands are not used unless the old commands are also configured.
> 
> That also ensures that a filter driver configuration that includes these
> new commands will work, although less efficiently, when used with an older
> version of git that does not support them.
> 
> Signed-off-by: Joey Hess <joeyh@joeyh.name>
> ---
> Documentation/config.txt        |  18 ++++++-
> Documentation/gitattributes.txt |  37 ++++++++++++++
> convert.c                       | 111 +++++++++++++++++++++++++++++++++++-----
> convert.h                       |  10 ++++
> 4 files changed, 160 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 19493aa..a55bed8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1325,15 +1325,29 @@ format.useAutoBase::
> 	format-patch by default.
> 
> filter.<driver>.clean::
> -	The command which is used to convert the content of a worktree
> +	The command which is used as a filter to convert the content of a worktree
> 	file to a blob upon checkin.  See linkgit:gitattributes[5] for
> 	details.
> 
> filter.<driver>.smudge::
> -	The command which is used to convert the content of a blob
> +	The command which is used as a filter to convert the content of a blob
> 	object to a worktree file upon checkout.  See
> 	linkgit:gitattributes[5] for details.
> 
> +filter.<driver>.cleanFromFile::
> +	Similar to filter.<driver>.clean but the specified command
> +	directly accesses a worktree file on disk, rather than
> +	receiving the file content from standard input.
> +	Only used when filter.<driver>.clean is also configured.
> +	See linkgit:gitattributes[5] for details.
> +
> +filter.<driver>.smudgeToFile::
> +	Similar to filter.<driver>.smudge but the specified command
> +	writes the content of a blob directly to a worktree file,
> +	rather than to standard output.
> +	Only used when filter.<driver>.smudge is also configured.
> +	See linkgit:gitattributes[5] for details.
> +
> fsck.<msg-id>::
> 	Allows overriding the message type (error, warn or ignore) of a
> 	specific message ID such as `missingEmail`.
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 197ece8..a58aafc 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands
> should not try to access the file on disk, but only act as filters on the
> content provided to them on standard input.
> 
> +There are two extra commands "cleanFromFile" and "smudgeToFile", which
> +can optionally be set in a filter driver. These are similar to the "clean"
> +and "smudge" commands, but avoid needing to pipe the contents of files
> +through the filters, and instead read/write files in the filesystem.
> +This can be more efficient when using filters with large files that are not
> +directly stored in the repository.
> +
> +Both "cleanFromFile" and "smudgeToFile" are provided a path as an
> +added parameter after the configured command line.
> +
> +The "cleanFromFile" command is provided the path to the file that
> +it should clean. Like the "clean" command, it should output the cleaned
> +version to standard output.
> +
> +The "smudgeToFile" command is provided a path to the file that it
> +should write to. (This file will already exist, as an empty file that can
> +be written to or replaced.) Like the "smudge" command, "smudgeToFile"
> +is fed the blob object from its standard input.
> +
> +Some git operations that need to apply filters cannot use "cleanFromFile"
> +and "smudgeToFile", since the files are not present to disk. So, to avoid
> +inconsistent behavior, "cleanFromFile" will only be used if "clean" is
> +also configured, and "smudgeToFile" will only be used if "smudge" is also
> +configured.
> +
> +An example large file storage filter driver using cleanFromFile and
> +smudgeToFile follows:
> +
> +------------------------
> +[filter "bigfiles"]
> +	clean = store-bigfile --from-stdin
> +	cleanFromFile = store-bigfile --from-file
> +	smudge = retrieve-bigfile --to-stdout
> +	smudgeToFile = retrieve-bigfile --to-file
> +	required
Minor nit: Do we need "required" in the minimal example? Plus, all test
cases use "required = true" (I just learned about that short-hand version).


> +------------------------
> +
> Interaction between checkin/checkout attributes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> diff --git a/convert.c b/convert.c
> index 214c99f..eb7774f 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -358,7 +358,8 @@ struct filter_params {
> 	unsigned long size;
> 	int fd;
> 	const char *cmd;
> -	const char *path;
> +	const char *path; /* Path within the git repository */
> +	const char *fspath; /* Path to file on disk */
Good comment here. However, I wonder if these two "path" variables
could be confused in other parts of this file.

I think with an additional apply_filter function you could avoid this
confusion and it would be easier for me to apply my "long running filter
patch" on top :-)
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L1143-L1146
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L402-L488

Thanks for this patch series,
Lars


> };
> 
> static int filter_buffer_or_fd(int in, int out, void *data)
> @@ -387,6 +388,15 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
> 	strbuf_release(&path);
> 
> +	/* append fspath to the command if it's set, separated with a space */
> +	if (params->fspath) {
> +		struct strbuf fspath = STRBUF_INIT;
> +		sq_quote_buf(&fspath, params->fspath);
> +		strbuf_addstr(&cmd, " ");
> +		strbuf_addbuf(&cmd, &fspath);
> +		strbuf_release(&fspath);
> +	}
> +
> 	argv[0] = cmd.buf;
> 
> 	child_process.argv = argv;
> @@ -425,7 +435,8 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> 	return (write_err || status);
> }
> 
> -static int apply_filter(const char *path, const char *src, size_t len, int fd,
> +static int apply_filter(const char *path, const char *fspath,
> +			const char *src, size_t len, int fd,
>                         struct strbuf *dst, const char *cmd)
> {
> 	/*
> @@ -454,6 +465,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
> 	params.fd = fd;
> 	params.cmd = cmd;
> 	params.path = path;
> +	params.fspath = fspath;
> 
> 	fflush(NULL);
> 	if (start_async(&async))
> @@ -484,6 +496,8 @@ static struct convert_driver {
> 	struct convert_driver *next;
> 	const char *smudge;
> 	const char *clean;
> +	const char *smudge_to_file;
> +	const char *clean_from_file;
> 	int required;
> } *user_convert, **user_convert_tail;
> 
> @@ -510,8 +524,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> 	}
> 
> 	/*
> -	 * filter.<name>.smudge and filter.<name>.clean specifies
> -	 * the command line:
> +	 * filter.<name>.smudge, filter.<name>.clean,
> +	 * filter.<name>.smudgeToFile, filter.<name>.cleanFromFile
> +	 * specifies the command line:
> 	 *
> 	 *	command-line
> 	 *
> @@ -524,6 +539,12 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> 	if (!strcmp("clean", key))
> 		return git_config_string(&drv->clean, var, value);
> 
> +	if (!strcmp("smudgetofile", key))
> +		return git_config_string(&drv->smudge_to_file, var, value);
> +
> +	if (!strcmp("cleanfromfile", key))
> +		return git_config_string(&drv->clean_from_file, var, value);
> +
> 	if (!strcmp("required", key)) {
> 		drv->required = git_config_bool(var, value);
> 		return 0;
> @@ -821,7 +842,37 @@ int would_convert_to_git_filter_fd(const char *path)
> 	if (!ca.drv->required)
> 		return 0;
> 
> -	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
> +	return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
> +}
> +
> +int can_clean_from_file(const char *path)
> +{
> +	struct conv_attrs ca;
> +
> +	convert_attrs(&ca, path);
> +	if (!ca.drv)
> +		return 0;
> +
> +	/*
> +	 * Only use the cleanFromFile filter when the clean filter is also
> +	 * configured.
> +	 */
> +	return (ca.drv->clean_from_file && ca.drv->clean);
> +}
> +
> +int can_smudge_to_file(const char *path)
> +{
> +	struct conv_attrs ca;
> +
> +	convert_attrs(&ca, path);
> +	if (!ca.drv)
> +		return 0;
> +
> +	/*
> +	 * Only use the smudgeToFile filter when the smudge filter is also
> +	 * configured.
> +	 */
> +	return (ca.drv->smudge_to_file && ca.drv->smudge);
> }
> 
> const char *get_convert_attr_ascii(const char *path)
> @@ -864,7 +915,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
> 		required = ca.drv->required;
> 	}
> 
> -	ret |= apply_filter(path, src, len, -1, dst, filter);
> +	ret |= apply_filter(path, NULL, src, len, -1, dst, filter);
> 	if (!ret && required)
> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
> 
> @@ -889,14 +940,34 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
> 	assert(ca.drv);
> 	assert(ca.drv->clean);
> 
> -	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
> +	if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean))
> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
> 
> 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
> 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> }
> 
> -static int convert_to_working_tree_internal(const char *path, const char *src,
> +void convert_to_git_filter_from_file(const char *path, struct strbuf *dst,
> +				   enum safe_crlf checksafe)
> +{
> +	struct conv_attrs ca;
> +	convert_attrs(&ca, path);
> +
> +	assert(ca.drv);
> +	assert(ca.drv->clean);
> +	assert(ca.drv->clean_from_file);
> +
> +	if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file))
> +		die("%s: cleanFromFile filter '%s' failed", path, ca.drv->name);
> +
> +	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
> +		checksafe);
> +	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> +}
> +
> +static int convert_to_working_tree_internal(const char *path,
> +					    const char *destpath,
> +					    const char *src,
> 					    size_t len, struct strbuf *dst,
> 					    int normalizing)
> {
> @@ -907,7 +978,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 
> 	convert_attrs(&ca, path);
> 	if (ca.drv) {
> -		filter = ca.drv->smudge;
> +		if (destpath)
> +			filter = ca.drv->smudge_to_file;
> +		else
> +			filter = ca.drv->smudge;
> 		required = ca.drv->required;
> 	}
> 
> @@ -918,7 +992,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 	}
> 	/*
> 	 * CRLF conversion can be skipped if normalizing, unless there
> -	 * is a smudge filter.  The filter might expect CRLFs.
> +	 * is a filter.  The filter might expect CRLFs.
> 	 */
> 	if (filter || !normalizing) {
> 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
> @@ -928,21 +1002,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> 		}
> 	}
> 
> -	ret_filter = apply_filter(path, src, len, -1, dst, filter);
> +	ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter);
> 	if (!ret_filter && required)
> -		die("%s: smudge filter %s failed", path, ca.drv->name);
> +		die("%s: %s filter %s failed", path, destpath ? "smudgeToFile" : "smudge", ca.drv->name);
> 
> 	return ret | ret_filter;
> }
> 
> int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> -	return convert_to_working_tree_internal(path, src, len, dst, 0);
> +	return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
> +}
> +
> +int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len)
> +{
> +	struct strbuf output = STRBUF_INIT;
> +	int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
> +	/* The smudgeToFile filter stdout is not used. */
> +	strbuf_release(&output);
> +	return ret;
> }
> 
> int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> -	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
> +	int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1);
> 	if (ret) {
> 		src = dst->buf;
> 		len = dst->len;
> diff --git a/convert.h b/convert.h
> index 82871a1..6f46d10 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -42,6 +42,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
> 			  struct strbuf *dst, enum safe_crlf checksafe);
> extern int convert_to_working_tree(const char *path, const char *src,
> 				   size_t len, struct strbuf *dst);
> +extern int convert_to_working_tree_filter_to_file(const char *path,
> +						  const char *destpath,
> +						  const char *src,
> +						  size_t len);
> extern int renormalize_buffer(const char *path, const char *src, size_t len,
> 			      struct strbuf *dst);
> static inline int would_convert_to_git(const char *path)
> @@ -53,6 +57,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
> 				     struct strbuf *dst,
> 				     enum safe_crlf checksafe);
> extern int would_convert_to_git_filter_fd(const char *path);
> +/* Precondition: can_clean_from_file(path) == true */
> +extern void convert_to_git_filter_from_file(const char *path,
> +					    struct strbuf *dst,
> +					    enum safe_crlf checksafe);
> +extern int can_clean_from_file(const char *path);
> +extern int can_smudge_to_file(const char *path);
> 
> /*****************************************************************
>  *
> -- 
> 2.8.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2016-07-13 16:04 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
2016-07-13 16:02   ` Lars Schneider [this message]
2016-07-11 22:45 ` [PATCH v5 3/8] use cleanFromFile in git add Joey Hess
2016-07-11 22:45 ` [PATCH v5 4/8] use smudgeToFile in git checkout etc Joey Hess
2016-07-11 22:45 ` [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config Joey Hess
2016-07-11 22:45 ` [PATCH v5 6/8] better recovery from failure of smudgeToFile filter Joey Hess
2016-07-11 22:45 ` [PATCH v5 7/8] use smudgeToFile filter in git am Joey Hess
2016-07-11 22:45 ` [PATCH v5 8/8] use smudgeToFile filter in recursive merge Joey Hess
2016-07-12 19:52 ` [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1994A13B-DD56-4E60-8CBA-A7299A6D6703@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=joeyh@joeyh.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).