git@vger.kernel.org mailing list mirror (one of many)
* [Question] Diff text filters and git add
@ 2019-07-09 21:43 Randall S. Becker
  2019-07-09 21:51 ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Randall S. Becker @ 2019-07-09 21:43 UTC (permalink / raw)
  To: git

Hi all,

I am trying to do something a bit strange and wonder about the best way to
go. I have a text filter that presents content of very special binary file
formats using textconv. What I am wondering is whether using the textconv
mechanism is sufficient to have git calculate the file signature or whether
I need to use an external diff engine, so that git add behaves in a stable
manner (i.e., does git internally use the textconv mechanism for evaluating
whether a file changed or whether the external diff engine is required, or
whether this is even possible at all).

The basic use case is that there is a timestamp embedded in the binary file
that I want to forever ignore when committing. I only need this done on one
specific machine, which is under Jenkins control, so it's not something
developers would deal with at all (so the filter config is in one place).
When the binary generator runs, if the two file images are "similar enough"
(as in: the same except for the generated timestamp and a couple of other
annoying bits of metadata), I want git to treat them as the same,
automatically, so that when I am constructing commits I do not end up
committing a duplicate of what is essentially the same file.

Sadly, I cannot modify the generator, so I'm stuck with the files being
wonky. I also cannot run the generator anywhere downstream, so doing so on
the deployment engine is also not an option (don't ask, the generator is
limited on where it can be run). Suggestions are welcome, please.

Thanks,
Randall





* Re: [Question] Diff text filters and git add
  2019-07-09 21:43 [Question] Diff text filters and git add Randall S. Becker
@ 2019-07-09 21:51 ` Jeff King
  2019-07-10 12:44   ` Randall S. Becker
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2019-07-09 21:51 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Tue, Jul 09, 2019 at 05:43:05PM -0400, Randall S. Becker wrote:

> I am trying to do something a bit strange and wonder about the best way to
> go. I have a text filter that presents content of very special binary file
> formats using textconv. What I am wondering is whether using the textconv
> mechanism is sufficient to have git calculate the file signature or whether
> I need to use an external diff engine, so that git add behaves in a stable
> manner (i.e., does git internally use the textconv mechanism for evaluating
> whether a file changed or whether the external diff engine is required, or
> whether this is even possible at all).

No, textconv only applies when generating a diff to output, and will
never impact what's stored in Git.

It sounds like you might want a clean filter instead, to sanitize
the file contents as they come into Git (and perhaps a matching smudge
filter to convert back to the working-tree version if necessary).

You're talking about "the diff engine" here, but note that git-add would
never do a diff at all. It cares only about full sha1s (and optimizes
out re-computing the sha1 on each invocation by using stat data). So
outside of clean/smudge, there's nothing else going on.

-Peff
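
A minimal sketch of the clean-filter idea suggested above, in Python. The
file layout is an assumption made purely for illustration: the thread does
not describe the binary format, so the 8-byte timestamp at byte offset 16,
the filter name "stripmeta", the "*.bin" pattern, and the script name
strip_meta.py are all hypothetical. The git_blob_id helper mirrors how git
names a blob in a SHA-1 repository, to show that two images differing only
in the assumed timestamp field get the same object id once cleaned.

```python
import hashlib
import sys

# ASSUMPTION: the real format is not given in the thread. This sketch
# pretends the timestamp is an 8-byte field at byte offset 16.
TIMESTAMP_OFFSET = 16
TIMESTAMP_LENGTH = 8


def clean(data: bytes) -> bytes:
    """Return data with the assumed timestamp field overwritten by zeros."""
    end = TIMESTAMP_OFFSET + TIMESTAMP_LENGTH
    if len(data) < end:
        return data  # too short to hold the field; pass through unchanged
    return data[:TIMESTAMP_OFFSET] + b"\x00" * TIMESTAMP_LENGTH + data[end:]


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def main() -> None:
    # A clean filter receives the working-tree file on stdin; whatever it
    # writes to stdout is what git hashes and stores.
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))


# Hypothetical wiring (names are illustrative, not from the thread):
#   .gitattributes:  *.bin filter=stripmeta
#   git config filter.stripmeta.clean "python3 /path/to/strip_meta.py --clean"
if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    main()
```

Because only the clean side is configured here, the working-tree file keeps
its original timestamp; a smudge filter would be needed only if the stored
(zeroed) form is not acceptable on checkout.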


* RE: [Question] Diff text filters and git add
  2019-07-09 21:51 ` Jeff King
@ 2019-07-10 12:44   ` Randall S. Becker
  2019-07-11 17:28     ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Randall S. Becker @ 2019-07-10 12:44 UTC (permalink / raw)
  To: 'Jeff King'; +Cc: git

On July 9, 2019 5:51 PM, Peff wrote:
> On Tue, Jul 09, 2019 at 05:43:05PM -0400, Randall S. Becker wrote:
> 
> > I am trying to do something a bit strange and wonder about the best
> > way to go. I have a text filter that presents content of very special
> > binary file formats using textconv. What I am wondering is whether
> > using the textconv mechanism is sufficient to have git calculate the
> > file signature or whether I need to use an external diff engine, so
> > that git add behaves in a stable manner (i.e., does git internally use
> > the textconv mechanism for evaluating whether a file changed or
> > whether the external diff engine is required, or whether this is even
> > possible at all).
> 
> No, textconv only applies when generating a diff to output, and will never
> impact what's stored in Git.
> 
> It sounds like you might want a clean filter instead, to sanitize the file
> contents as they come into Git (and perhaps a matching smudge filter to
> convert back to the working-tree version if necessary).
> 
> You're talking about "the diff engine" here, but note that git-add would never
> do a diff at all. It cares only about full sha1s (and optimizes out re-computing
> the sha1 on each invocation by using stat data). So outside of clean/smudge,
> there's nothing else going on.

Thanks. I can script this instead. Will do an external diff then --assume-unchanged when I detect an equivalence.

Appreciate the advice and info,
Randall



* Re: [Question] Diff text filters and git add
  2019-07-10 12:44   ` Randall S. Becker
@ 2019-07-11 17:28     ` Jakub Narebski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2019-07-11 17:28 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: 'Jeff King', git

"Randall S. Becker" <rsbecker@nexbridge.com> writes:
> On July 9, 2019 5:51 PM, Peff wrote:
[...]
>> No, textconv only applies when generating a diff to output, and will never
>> impact what's stored in Git.
>> 
>> It sounds like you might want a clean filter instead, to sanitize the file
>> contents as they come into Git (and perhaps a matching smudge filter to
>> convert back to the working-tree version if necessary).
>> 
>> You're talking about "the diff engine" here, but note that git-add would never
>> do a diff at all. It cares only about full sha1s (and optimizes out re-computing
>> the sha1 on each invocation by using stat data). So outside of clean/smudge,
>> there's nothing else going on.
>
> Thanks. I can script this instead. Will do an external diff then
> --assume-unchanged when I detect an equivalence.

If you want to ignore changes, --assume-unchanged (i.e. lying to Git) is
the wrong solution, as it can lead to data loss.  It is meant as a
performance optimization.

A better solution would be to use --skip-worktree, which though meant
for sparse checkout can be used for ignoring changes.  The only problem
is that it can prevent some safe operations, like git-stash, because git
thinks that it could lead to data loss.

Though I am not sure whether either is needed with a clean/smudge filter.

Best,
-- 
Jakub Narębski
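
For a scripted setup like the Jenkins job described earlier in the thread,
the --skip-worktree bit could be toggled roughly as follows. This is only a
sketch: the helper names and the example path are hypothetical, and it
assumes git is on PATH and that the script runs inside the repository. The
only git interface used is "git update-index" with its --skip-worktree and
--no-skip-worktree options.

```python
import subprocess
from typing import List


def skip_worktree_cmd(path: str, enable: bool = True) -> List[str]:
    """Build the git command that sets or clears the skip-worktree bit."""
    flag = "--skip-worktree" if enable else "--no-skip-worktree"
    # "--" stops option parsing, so a path starting with "-" is safe.
    return ["git", "update-index", flag, "--", path]


def set_skip_worktree(path: str, enable: bool = True, repo: str = ".") -> None:
    """Run the command in the given repository; raises if git fails."""
    subprocess.run(skip_worktree_cmd(path, enable), cwd=repo, check=True)


# Example (hypothetical path):
#   set_skip_worktree("gen/output.bin")          # ignore local changes
#   set_skip_worktree("gen/output.bin", False)   # track changes again
```

The bit has to be cleared again before a genuinely changed image is staged,
so a script would pair the two calls around its own equivalence check.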

