git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Randall S. Becker" <rsbecker@nexbridge.com>
To: "'Junio C Hamano'" <gitster@pobox.com>
Cc: "'Johannes Sixt'" <j6t@kdbg.org>, <git@vger.kernel.org>
Subject: RE: [Question] Signature calculation ignoring parts of binary files
Date: Thu, 13 Sep 2018 08:19:04 -0400	[thread overview]
Message-ID: <001101d44b5b$fa337350$ee9a59f0$@nexbridge.com> (raw)
In-Reply-To: <xmqq1s9yjp9j.fsf@gitster-ct.c.googlers.com>

On September 12, 2018 7:00 PM, Junio C Hamano wrote:
> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
> >> author is important to our process. My objective is to keep the
> >> original file 100% exact as supplied and then ignore any changes to
> >> the metadata that I don't care about (like Creator) if the remainder of
the
> file is the same.
> 
> That will *not* work.  If person A gave you a version of original, which
> hashes to X after you strip the cruft you do not care about, you would
> register that original with person A's fingerprint on under the name of X.
> What happens when person B gives you another version, which is not byte-
> for-byte identical to the one you got earlier from person A, but does hash
to
> the same X after you strip the cruft?  If you are going to store it in
Git, and if
> by SHA-1 you are calling what we perceive as "object name" in Git land,
you
> must store that one with person B's fingerprint on it also under the name
of
> X.  Now which version will you get from Git when you ask it to give you
the
> object that hashes to X?

The scenario is slightly different.
1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
git unchanged.
2. Person B gives me binary file-2 with fingerprint B2. This does not go
into git yet.
3. We attempt a git diff between the committed file-1 and uncommitted file-2
using a textconv implementation that strips what we don't need to compare.
4. If file-1 and file-2 have no difference when textconv is used, file-2 is
not added and not committed. It is discarded with impunity, never to be seen
again, although we might whine a lot at the user for attempting to put
file-2 in - but that's not git's issue.
5. If file-1 and file-2 have differences when textconv is used, file-2 is
committed with fingerprint B2.
6. Even if an error is made by the user and they commit file-2 with B2
regardless of textconv, there will be a human who complains about it, but
git has two unambiguous fingerprints that happen to have no diffs after
textconv is applied.

My original hope was that textconv could be used to influence the
fingerprint, but I do not think that is the case, so I went with an
alternative. In the application, I am not allowed to strip any cruft off
file-1 when it is stored - it must be byte-for-byte the original file. This
application is marginally related to a DRM-like situation where we only care
about the original image provided by a user, but any copies that are
provided by another user with modified metadata will be disallowed from
repository.

Does that make more sense? 

Cheers,
Randall


  reply	other threads:[~2018-09-13 12:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12 19:16 [Question] Signature calculation ignoring parts of binary files Randall S. Becker
2018-09-12 20:48 ` Johannes Sixt
2018-09-12 20:53   ` Randall S. Becker
2018-09-12 22:20     ` Randall S. Becker
2018-09-12 22:59       ` Junio C Hamano
2018-09-13 12:19         ` Randall S. Becker [this message]
2018-09-13 15:03           ` Junio C Hamano
2018-09-13 15:38             ` Randall S. Becker
2018-09-13 17:51             ` Junio C Hamano
2018-09-13 17:55               ` Randall S. Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='001101d44b5b$fa337350$ee9a59f0$@nexbridge.com' \
    --to=rsbecker@nexbridge.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).