From: "Randall S. Becker" <rsbecker@nexbridge.com>
To: "'Johannes Sixt'" <j6t@kdbg.org>
Cc: <git@vger.kernel.org>
Subject: RE: [Question] Signature calculation ignoring parts of binary files
Date: Wed, 12 Sep 2018 18:20:00 -0400
Message-ID: <000001d44ae6$c2a20ac0$47e62040$@nexbridge.com>
In-Reply-To: <004e01d44ada$b4a11ad0$1de35070$@nexbridge.com>

On September 12, 2018 4:54 PM, I wrote:
> On September 12, 2018 4:48 PM, Johannes Sixt wrote:
> > Am 12.09.18 um 21:16 schrieb Randall S. Becker:
> > > I feel really bad asking this, and I should know the answer, and yet.
> > >
> > > I have a binary file that needs to go into a repo intact (unchanged).
> > > I also have a program that interprets the contents, like a textconv,
> > > that can output the relevant portions of the file in whatever format
> > > I like - used for diff typically, dumps in 1K chunks by file section.
> > > What I'm looking for is to have the SHA1 signature calculated with
> > > just the relevant portions of the file so that two actually
> > > different files will be considered the same by git during a commit
> > > or status. In real terms, I'm trying to ignore the Creator metadata
> > > of a JPG because it is mutable and irrelevant to my repo contents.
> > >
> > > I'm sorry to ask, but I thought this was in .gitattributes but I
> > > can't confirm the SHA1 behaviour.
> >
> > You are looking for a clean filter. See the 'filter' attribute in gitattributes(5).
> > Your clean filter program or script should strip the unwanted metadata
> > or set it to a constant known-good value.
> >
> > (You shouldn't need a smudge filter.)
> >
> > -- Hannes
> 
> Thanks Hannes. I thought about the clean filter, but I don't actually want to
> modify the file when going into git, just for SHA calculation. I need to be able
> to keep some origin metadata that might change with subsequent copies, so
> just cleaning the origin is not going to work - actually knowing the original
> author is important to our process. My objective is to keep the original file
> 100% exact as supplied and then ignore any changes to the metadata that I
> don't care about (like Creator) if the remainder of the file is the same.

I had a thought that might be workable; opinions are welcome.

The commits in my rather weird project are made by a script, so I have
some flexibility in my approach. I could set up a diff textconv
configuration so that the text diff of two JPG files shows no differences
when the immutable fields and the image are the same. The script can then
trigger a git add and git commit for only those files where git diff
reports no differences. That way the actual original file is stored in git
with 100% fidelity (no cleaning). It's not as elegant as I'd like, but it
does solve what I'm trying to do. Does this sound reasonable, or is there
a better way?
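
A minimal sketch of the comparison step such a script might perform, written in Python purely for illustration. Instead of parsing git diff output, it compares the "relevant portions" of the committed blob and the working-tree file directly. The names committed_blob, worktree_bytes, drop_creator and classify are hypothetical, and drop_creator is only a toy stand-in for the real textconv-style program, which is not shown in this thread. The script would then decide which of the two resulting lists to pass to git add.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def committed_blob(path: str, rev: str = "HEAD") -> bytes:
    """Return the exact bytes git stores for `path` at `rev`.

    `path` is taken relative to the top of the repository.
    """
    result = subprocess.run(
        ["git", "cat-file", "blob", f"{rev}:{path}"],
        check=True,
        capture_output=True,
    )
    return result.stdout


def worktree_bytes(path: str) -> bytes:
    """Return the bytes of the file as it currently sits on disk."""
    return Path(path).read_bytes()


def drop_creator(data: bytes) -> bytes:
    """Toy extractor (assumption): drop lines that start with b'Creator:'.

    A real extractor would parse the JPG and emit only the fields and
    image data that matter; this one exists only to make the sketch runnable.
    """
    kept = [line for line in data.split(b"\n") if not line.startswith(b"Creator:")]
    return b"\n".join(kept)


def classify(
    paths: Iterable[str],
    load_old: Callable[[str], bytes],
    load_new: Callable[[str], bytes],
    extract: Callable[[bytes], bytes],
) -> Tuple[List[str], List[str]]:
    """Split paths into (relevant content unchanged, relevant content changed)."""
    same: List[str] = []
    changed: List[str] = []
    for path in paths:
        if extract(load_old(path)) == extract(load_new(path)):
            same.append(path)
        else:
            changed.append(path)
    return same, changed
```

Inside a repository this could be called as classify(paths, committed_blob, worktree_bytes, drop_creator). Because nothing is rewritten on the way in, whatever the script chooses to stage goes into git byte for byte.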

Cheers,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.





Thread overview: 10+ messages
2018-09-12 19:16 [Question] Signature calculation ignoring parts of binary files Randall S. Becker
2018-09-12 20:48 ` Johannes Sixt
2018-09-12 20:53   ` Randall S. Becker
2018-09-12 22:20     ` Randall S. Becker [this message]
2018-09-12 22:59       ` Junio C Hamano
2018-09-13 12:19         ` Randall S. Becker
2018-09-13 15:03           ` Junio C Hamano
2018-09-13 15:38             ` Randall S. Becker
2018-09-13 17:51             ` Junio C Hamano
2018-09-13 17:55               ` Randall S. Becker
