From: Lars Schneider <larsxschneider@gmail.com>
To: Joey Hess <joeyh@joeyh.name>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
Date: Wed, 13 Jul 2016 18:02:50 +0200 [thread overview]
Message-ID: <1994A13B-DD56-4E60-8CBA-A7299A6D6703@gmail.com> (raw)
In-Reply-To: <1468277112-9909-3-git-send-email-joeyh@joeyh.name>
> On 12 Jul 2016, at 00:45, Joey Hess <joeyh@joeyh.name> wrote:
>
> This adds new smudgeToFile and cleanFromFile filter commands,
> which are similar to smudge and clean but allow direct access to files on
> disk.
>
> This interface can be much more efficient when operating on large files,
> because the whole file content does not need to be streamed through the
> filter. It even allows for things like cleanFromFile commands that avoid
> reading the whole content of the file, and for smudgeToFile commands that
> populate a work tree file using an efficient Copy On Write operation.
>
> The new filter commands will not be used for all filtering. They are
> efficient to use when git add is adding a file, or when the work tree is
> being updated, but not a good fit when git is internally filtering blob
> objects in memory for eg, a diff.
>
> So, a user who wants to use smudgeToFile should also provide a smudge
> command to be used in cases where smudgeToFile is not used. And ditto
> with cleanFromFile and clean. To avoid foot-shooting configurations, the
> new commands are not used unless the old commands are also configured.
>
> That also ensures that a filter driver configuration that includes these
> new commands will work, although less efficiently, when used with an older
> version of git that does not support them.
>
> Signed-off-by: Joey Hess <joeyh@joeyh.name>
> ---
> Documentation/config.txt | 18 ++++++-
> Documentation/gitattributes.txt | 37 ++++++++++++++
> convert.c | 111 +++++++++++++++++++++++++++++++++++-----
> convert.h | 10 ++++
> 4 files changed, 160 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 19493aa..a55bed8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1325,15 +1325,29 @@ format.useAutoBase::
> format-patch by default.
>
> filter.<driver>.clean::
> - The command which is used to convert the content of a worktree
> + The command which is used as a filter to convert the content of a worktree
> file to a blob upon checkin. See linkgit:gitattributes[5] for
> details.
>
> filter.<driver>.smudge::
> - The command which is used to convert the content of a blob
> + The command which is used as a filter to convert the content of a blob
> object to a worktree file upon checkout. See
> linkgit:gitattributes[5] for details.
>
> +filter.<driver>.cleanFromFile::
> + Similar to filter.<driver>.clean but the specified command
> + directly accesses a worktree file on disk, rather than
> + receiving the file content from standard input.
> + Only used when filter.<driver>.clean is also configured.
> + See linkgit:gitattributes[5] for details.
> +
> +filter.<driver>.smudgeToFile::
> + Similar to filter.<driver>.smudge but the specified command
> + writes the content of a blob directly to a worktree file,
> + rather than to standard output.
> + Only used when filter.<driver>.smudge is also configured.
> + See linkgit:gitattributes[5] for details.
> +
> fsck.<msg-id>::
> Allows overriding the message type (error, warn or ignore) of a
> specific message ID such as `missingEmail`.
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 197ece8..a58aafc 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands
> should not try to access the file on disk, but only act as filters on the
> content provided to them on standard input.
>
> +There are two extra commands "cleanFromFile" and "smudgeToFile", which
> +can optionally be set in a filter driver. These are similar to the "clean"
> +and "smudge" commands, but avoid needing to pipe the contents of files
> +through the filters, and instead read/write files in the filesystem.
> +This can be more efficient when using filters with large files that are not
> +directly stored in the repository.
> +
> +Both "cleanFromFile" and "smudgeToFile" are provided a path as an
> +added parameter after the configured command line.
> +
> +The "cleanFromFile" command is provided the path to the file that
> +it should clean. Like the "clean" command, it should output the cleaned
> +version to standard output.
> +
> +The "smudgeToFile" command is provided a path to the file that it
> +should write to. (This file will already exist, as an empty file that can
> +be written to or replaced.) Like the "smudge" command, "smudgeToFile"
> +is fed the blob object from its standard input.
> +
> +Some git operations that need to apply filters cannot use "cleanFromFile"
> +and "smudgeToFile", since the files are not present to disk. So, to avoid
> +inconsistent behavior, "cleanFromFile" will only be used if "clean" is
> +also configured, and "smudgeToFile" will only be used if "smudge" is also
> +configured.
> +
> +An example large file storage filter driver using cleanFromFile and
> +smudgeToFile follows:
> +
> +------------------------
> +[filter "bigfiles"]
> + clean = store-bigfile --from-stdin
> + cleanFromFile = store-bigfile --from-file
> + smudge = retrieve-bigfile --to-stdout
> + smudgeToFile = retrieve-bigfile --to-file
> + required
Minor nit: Do we need "required" in the minimal example? Plus, all test
cases use "required = true" (I just learned about that short-hand version).
> +------------------------
> +
> Interaction between checkin/checkout attributes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> diff --git a/convert.c b/convert.c
> index 214c99f..eb7774f 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -358,7 +358,8 @@ struct filter_params {
> unsigned long size;
> int fd;
> const char *cmd;
> - const char *path;
> + const char *path; /* Path within the git repository */
> + const char *fspath; /* Path to file on disk */
Good comment here. However, I wonder if these two "path" variables
could be confused in other parts of this file.
I think with an additional apply_filter function you could avoid this
confusion and it would be easier for me to apply my "long running filter
patch" on top :-)
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L1143-L1146
https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L402-L488
Thanks for this patch series,
Lars
> };
>
> static int filter_buffer_or_fd(int in, int out, void *data)
> @@ -387,6 +388,15 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
> strbuf_release(&path);
>
> + /* append fspath to the command if it's set, separated with a space */
> + if (params->fspath) {
> + struct strbuf fspath = STRBUF_INIT;
> + sq_quote_buf(&fspath, params->fspath);
> + strbuf_addstr(&cmd, " ");
> + strbuf_addbuf(&cmd, &fspath);
> + strbuf_release(&fspath);
> + }
> +
> argv[0] = cmd.buf;
>
> child_process.argv = argv;
> @@ -425,7 +435,8 @@ static int filter_buffer_or_fd(int in, int out, void *data)
> return (write_err || status);
> }
>
> -static int apply_filter(const char *path, const char *src, size_t len, int fd,
> +static int apply_filter(const char *path, const char *fspath,
> + const char *src, size_t len, int fd,
> struct strbuf *dst, const char *cmd)
> {
> /*
> @@ -454,6 +465,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
> params.fd = fd;
> params.cmd = cmd;
> params.path = path;
> + params.fspath = fspath;
>
> fflush(NULL);
> if (start_async(&async))
> @@ -484,6 +496,8 @@ static struct convert_driver {
> struct convert_driver *next;
> const char *smudge;
> const char *clean;
> + const char *smudge_to_file;
> + const char *clean_from_file;
> int required;
> } *user_convert, **user_convert_tail;
>
> @@ -510,8 +524,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> }
>
> /*
> - * filter.<name>.smudge and filter.<name>.clean specifies
> - * the command line:
> + * filter.<name>.smudge, filter.<name>.clean,
> + * filter.<name>.smudgeToFile, filter.<name>.cleanFromFile
> + * specifies the command line:
> *
> * command-line
> *
> @@ -524,6 +539,12 @@ static int read_convert_config(const char *var, const char *value, void *cb)
> if (!strcmp("clean", key))
> return git_config_string(&drv->clean, var, value);
>
> + if (!strcmp("smudgetofile", key))
> + return git_config_string(&drv->smudge_to_file, var, value);
> +
> + if (!strcmp("cleanfromfile", key))
> + return git_config_string(&drv->clean_from_file, var, value);
> +
> if (!strcmp("required", key)) {
> drv->required = git_config_bool(var, value);
> return 0;
> @@ -821,7 +842,37 @@ int would_convert_to_git_filter_fd(const char *path)
> if (!ca.drv->required)
> return 0;
>
> - return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
> + return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
> +}
> +
> +int can_clean_from_file(const char *path)
> +{
> + struct conv_attrs ca;
> +
> + convert_attrs(&ca, path);
> + if (!ca.drv)
> + return 0;
> +
> + /*
> + * Only use the cleanFromFile filter when the clean filter is also
> + * configured.
> + */
> + return (ca.drv->clean_from_file && ca.drv->clean);
> +}
> +
> +int can_smudge_to_file(const char *path)
> +{
> + struct conv_attrs ca;
> +
> + convert_attrs(&ca, path);
> + if (!ca.drv)
> + return 0;
> +
> + /*
> + * Only use the smudgeToFile filter when the smudge filter is also
> + * configured.
> + */
> + return (ca.drv->smudge_to_file && ca.drv->smudge);
> }
>
> const char *get_convert_attr_ascii(const char *path)
> @@ -864,7 +915,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
> required = ca.drv->required;
> }
>
> - ret |= apply_filter(path, src, len, -1, dst, filter);
> + ret |= apply_filter(path, NULL, src, len, -1, dst, filter);
> if (!ret && required)
> die("%s: clean filter '%s' failed", path, ca.drv->name);
>
> @@ -889,14 +940,34 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
> assert(ca.drv);
> assert(ca.drv->clean);
>
> - if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
> + if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean))
> die("%s: clean filter '%s' failed", path, ca.drv->name);
>
> crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
> ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> }
>
> -static int convert_to_working_tree_internal(const char *path, const char *src,
> +void convert_to_git_filter_from_file(const char *path, struct strbuf *dst,
> + enum safe_crlf checksafe)
> +{
> + struct conv_attrs ca;
> + convert_attrs(&ca, path);
> +
> + assert(ca.drv);
> + assert(ca.drv->clean);
> + assert(ca.drv->clean_from_file);
> +
> + if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file))
> + die("%s: cleanFromFile filter '%s' failed", path, ca.drv->name);
> +
> + crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
> + checksafe);
> + ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
> +}
> +
> +static int convert_to_working_tree_internal(const char *path,
> + const char *destpath,
> + const char *src,
> size_t len, struct strbuf *dst,
> int normalizing)
> {
> @@ -907,7 +978,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
>
> convert_attrs(&ca, path);
> if (ca.drv) {
> - filter = ca.drv->smudge;
> + if (destpath)
> + filter = ca.drv->smudge_to_file;
> + else
> + filter = ca.drv->smudge;
> required = ca.drv->required;
> }
>
> @@ -918,7 +992,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> }
> /*
> * CRLF conversion can be skipped if normalizing, unless there
> - * is a smudge filter. The filter might expect CRLFs.
> + * is a filter. The filter might expect CRLFs.
> */
> if (filter || !normalizing) {
> ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
> @@ -928,21 +1002,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
> }
> }
>
> - ret_filter = apply_filter(path, src, len, -1, dst, filter);
> + ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter);
> if (!ret_filter && required)
> - die("%s: smudge filter %s failed", path, ca.drv->name);
> + die("%s: %s filter %s failed", path, destpath ? "smudgeToFile" : "smudge", ca.drv->name);
>
> return ret | ret_filter;
> }
>
> int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> - return convert_to_working_tree_internal(path, src, len, dst, 0);
> + return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
> +}
> +
> +int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len)
> +{
> + struct strbuf output = STRBUF_INIT;
> + int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0);
> + /* The smudgeToFile filter stdout is not used. */
> + strbuf_release(&output);
> + return ret;
> }
>
> int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
> {
> - int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
> + int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1);
> if (ret) {
> src = dst->buf;
> len = dst->len;
> diff --git a/convert.h b/convert.h
> index 82871a1..6f46d10 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -42,6 +42,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
> struct strbuf *dst, enum safe_crlf checksafe);
> extern int convert_to_working_tree(const char *path, const char *src,
> size_t len, struct strbuf *dst);
> +extern int convert_to_working_tree_filter_to_file(const char *path,
> + const char *destpath,
> + const char *src,
> + size_t len);
> extern int renormalize_buffer(const char *path, const char *src, size_t len,
> struct strbuf *dst);
> static inline int would_convert_to_git(const char *path)
> @@ -53,6 +57,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd,
> struct strbuf *dst,
> enum safe_crlf checksafe);
> extern int would_convert_to_git_filter_fd(const char *path);
> +/* Precondition: can_clean_from_file(path) == true */
> +extern void convert_to_git_filter_from_file(const char *path,
> + struct strbuf *dst,
> + enum safe_crlf checksafe);
> +extern int can_clean_from_file(const char *path);
> +extern int can_smudge_to_file(const char *path);
>
> /*****************************************************************
> *
> --
> 2.8.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2016-07-13 16:04 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-11 22:45 [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Joey Hess
2016-07-11 22:45 ` [PATCH v5 1/8] clarify %f documentation Joey Hess
2016-07-11 22:45 ` [PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs Joey Hess
2016-07-13 16:02 ` Lars Schneider [this message]
2016-07-11 22:45 ` [PATCH v5 3/8] use cleanFromFile in git add Joey Hess
2016-07-11 22:45 ` [PATCH v5 4/8] use smudgeToFile in git checkout etc Joey Hess
2016-07-11 22:45 ` [PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config Joey Hess
2016-07-11 22:45 ` [PATCH v5 6/8] better recovery from failure of smudgeToFile filter Joey Hess
2016-07-11 22:45 ` [PATCH v5 7/8] use smudgeToFile filter in git am Joey Hess
2016-07-11 22:45 ` [PATCH v5 8/8] use smudgeToFile filter in recursive merge Joey Hess
2016-07-12 19:52 ` [PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu) Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1994A13B-DD56-4E60-8CBA-A7299A6D6703@gmail.com \
--to=larsxschneider@gmail.com \
--cc=git@vger.kernel.org \
--cc=joeyh@joeyh.name \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).