Re: Git for games working group

git@vger.kernel.org mailing list mirror (one of many)

From: John Austin <john@astrangergravity.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Lars Schneider <larsxschneider@gmail.com>,
	pastelmobilesuit@github.com, id@joeyh.name
Subject: Re: Git for games working group
Date: Sun, 16 Sep 2018 13:49:58 -0700
Message-ID: <CA+AhR6dDEWSmQ8srbXmx2BYgDBdSRtz9U7czHwepioJAZt3Xkg@mail.gmail.com>
In-Reply-To: <878t41lcfi.fsf@evledraar.gmail.com>

Thanks for all the thoughts so far -- I'm going to try to collate some
of my responses to avoid this getting too lengthy.

## Regarding Merging / Diffing
A couple of folks have suggested that we could improve merging /
diffing of binary files in general. I think this is useful, but can
only ever result in minor improvements, for the following reasons:

1. Game developers use an incredible number of proprietary file
formats: Maya, Houdini, Photoshop, Wwise, Unreal UAssets, etc. At the
end of the day, it's fairly unlikely that we can build visual merge
tools for these asset types without an enormous amount of corporate
support.

2. Merging doesn't have a meaning for many types of files. I think git
has trained us that everything is merge-able, but that's not always
the case. If you gave an audio designer two voice-over audio files and
asked them to merge them, they'd give you a pretty strange look. You
have to re-record it from scratch. Content files can be highly
intertwined and highly subjective: as a textual metaphor, every line
of content conflicts with every other line. Even if you had a perfect
merge tool, it just doesn't make much sense to try to merge changes,
unless it's an incredibly simple change.

## Regarding File Locking:
File locking works well enough in Perforce, but there are a couple of
issues I've found using file locking in LFS or in Gitolite (hadn't
seen this before, thanks!).

1. File Locking is an 'active' system. File Locking adds extra
operations that must be taken, both before writing to a file and then
after finishing your changes. Artists either must drop down to a
terminal (unlikely), or we must integrate our file-locking system with
existing artist tools (a large amount of work). Either way it adds a
lot of extra grunt-work. Imagine having to manually mark which files
you modify rather than just using git status. One of git's biggest
benefits is removing this type of manual labor.

2. File Locking doesn't extend well across branches. Acquiring a lock
usually blocks modifications to this file across all branches. This
cuts off basic branching models and features (like having release
branches) that are a large part of why git is so successful.

3. It's not entirely sound. Developer A can modify 'binary.bin', and
push the changes to master. Developer B, who is behind master by a
couple of days, can then unknowingly acquire the lock and make further
changes ignoring A's new commit. When B attempts to push, they will
get conflicts. If you look closely, this is a symptom of issue 2:
locking doesn't understand branches.
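
To make issue 3 concrete, here is a toy model in Python. It is only a
sketch: `LockServer` and `simulate` are made-up names, the integer
"versions" stand in for commits on master, and I'm assuming A takes and
releases the lock around the push. The point is that the lock table
knows nothing about commits, so B's request succeeds even though B's
checkout is stale.

```python
from dataclasses import dataclass, field


@dataclass
class LockServer:
    """Toy central lock table: path -> holder. It knows nothing about commits."""
    locks: dict = field(default_factory=dict)

    def acquire(self, path: str, who: str) -> bool:
        # Grant the lock only if nobody currently holds it.
        if path in self.locks:
            return False
        self.locks[path] = who
        return True

    def release(self, path: str, who: str) -> None:
        # Only the current holder can release.
        if self.locks.get(path) == who:
            del self.locks[path]


def simulate() -> dict:
    server = LockServer()
    master_version = 1        # version of binary.bin on master
    b_checkout_version = 1    # Developer B last pulled at this version

    # Developer A locks, edits, pushes to master, and unlocks.
    assert server.acquire("binary.bin", "A")
    master_version += 1
    server.release("binary.bin", "A")

    # Developer B never pulled, yet the lock is free, so it is granted.
    b_got_lock = server.acquire("binary.bin", "B")
    b_is_stale = b_checkout_version < master_version
    return {"b_got_lock": b_got_lock, "b_is_stale": b_is_stale}
```

Nothing in `acquire` can compare B's checkout against master, which is
why the conflict only shows up at push time.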

## "Implicit" Locking

Instead, I think it's better to think about how we can use the
structure of the git graph to solve the issue. Imagine the following
pre-commit hook for a developer attempting to commit 'binary.bin':

If there exists any commit touching binary.bin on a different branch
that is not integrated into this branch, block the commit.

In this case, making a commit with a file blocks others from touching
it, until they pull in that commit. To make the parallel, making a
commit acquires a 'lock' on the file, but there's no release. The only
requirement is that you always modify the latest version of the file.
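
A rough sketch of that rule, modeled in Python over an in-memory commit
graph rather than as a real hook. The representation and the names
(`Graph`, `ancestors`, `blocking_commits`) are invented for
illustration, and it assumes the full graph and all branch tips are
already known locally, which is exactly the hard part in practice.

```python
from typing import Dict, List, Set, Tuple

# commit id -> (parent ids, paths modified by that commit)
Graph = Dict[str, Tuple[List[str], Set[str]]]


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """Return tip and every commit reachable from it through parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit][0])
    return seen


def blocking_commits(graph: Graph, tips: Dict[str, str],
                     branch: str, path: str) -> Set[str]:
    """Commits touching `path` on other branches that `branch` has not integrated."""
    integrated = ancestors(graph, tips[branch])
    blockers: Set[str] = set()
    for other, tip in tips.items():
        if other == branch:
            continue
        for commit in ancestors(graph, tip) - integrated:
            if path in graph[commit][1]:
                blockers.add(commit)
    return blockers


graph: Graph = {
    "c1": ([], {"binary.bin"}),
    "c2": (["c1"], {"binary.bin"}),   # on master
    "c3": (["c1"], {"notes.txt"}),    # on feature, has not pulled c2
    "c4": (["c3", "c2"], set()),      # feature after merging master
}
```

With master at c2 and feature at c3, a commit to binary.bin on feature
would be blocked by c2; once feature has merged master (tip c4) the set
of blockers is empty and the commit goes through.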

This has issues of its own, and it's a simplification of the system I
have in mind. It means Developer A needs to have information about the
commit graph local to Developer B's machine (but notably not the
files). However I think it is a better starting place for thinking
about these sorts of systems. The locks fall implicitly from the
commit graph structure, so it plays well with all of your normal git
commands. You can branch, cherry-pick, rebase, etc without any extra
support or aliases. I'll write up something a bit more detailed in a
bit.

- JA
On Sun, Sep 16, 2018 at 7:55 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sat, Sep 15 2018, Taylor Blau wrote:
>
> > On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote:
> >> I've been working myself on strategies for handling binary conflicts,
> >> and particularly how to do it in a git-friendly way (ie. avoiding as
> >> much centralization as possible and playing into the commit/branching
> >> model of git).
> >
> > Git LFS handles conflict resolution and merging over binary files with
> > two primary mechanisms: (1) file locking, and (2) use of a merge-tool.
> >
> >   1. is the most "non-Git-friendly" solution, since it requires the use
> >      of a centralized Git LFS server (to be run alongside your remote
> >      repository) and that every clone phones home to make sure that they
> >      are OK to acquire a lock.
> >
> >      The workflow that we expect is that users will run 'git lfs lock
> >      /path/to/file' any time they want to make a change to an
> >      unmergeable file, and that this call first checks to make sure
> >      that they are the only person who would hold the lock.
> >
> >      We also periodically "sync" the state of locks locally with those
> >      on the remote, namely during the post-merge, post-commit, and
> >      post-checkout hook(s).
> >
> >      Users are expected to perform the 'git lfs unlock /path/to/file'
> >      anytime they "merge" their changes back into master, but the
> >      thought is that servers could be taught to automatically do this
> >      upon the remote detecting the merge.
> >
> >   2. is a more Git-friendly approach, i.e., that the 'git mergetool'
> >      builtin does work with files tracked under Git LFS, i.e., that both
> >      sides of the merge are filtered so that the mergetool can resolve
> >      the changes in the large files instead of the textual pointers.
> >
> >
> >> I've got to a loose design that I like, but it'd be good to get some
> >> feedback, as well as hearing what other game devs would want in a
> >> binary conflict system.
> >
> > Please do share, and I would be happy to provide feedback (and make
> > proposals to integrate favorable parts of your ideas into Git LFS).
>
> All of this is obviously correct as far as git-lfs goes. Just to use
> this as a jump-off comment on the topic of file locking and to frame
> this discussion more generally.
>
> It's true that a tool like git-lfs "requires the use of a centralized
> [...] server" for file locking, but it's not the case that a feature
> like file locking requires a centralized authority.
>
> In particular, git-lfs unlike git-annex (which preceded it) does the
> opposite of (to quote John upthread) "avoid[...] as much centralization
> as possible", it *is* explicitly a centralized large file solution, not
> a distributed one, as opposed to git-annex.
>
> That's not a critique of git-lfs or the centralized method, or a
> recommendation for decentralization in this context, but we already have
> a similar distributed solution in the form of git-annex, it's just a hop
> skip and a jump away from changing "who has the file" to "who has the
> lock".
>
> So how does that work? In the centralized case like
> git-lfs/cvs/p4/whatever you have some "lock/unlock" command, and it
> locks a file on a central server, locking is usually a [locked?, who]
> state of "is it locked" and "who locked it?". Usually this is also
> followed-up on the client-side by checking those files out without the
> "w" flag.
>
> In the hypothetical git-annex-like case (simplifying a bit for the
> purposes of this explanation), for every FILE in your tree you have a
> corresponding FILE.lock file, but it's not a boolean, but a log of who's
> asked for locks, i.e. lines of:
>
>     <repository UUID> <ts> <state> <who (email?)> <explanation?>
>
> E.g.:
>
>     $ cat Makefile.lock
>     my-random-per-repo-id 2018-09-15 1 avarab@gmail.com "refactoring all Makefiles"
>     my-random-per-repo-id 2018-09-16 0 avarab@gmail.com "done!"
>
> This log is append-only, when clients encounter conflicts there's a
> merge driver to ensure that all updates are kept.
>
> You can then enact a policy saying you care or don't care about updates
> from certain sources, or ignore locks older than so-and-so.
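
For reference, a minimal Python sketch of how a log in that format
might be read and merged. The field layout follows the example lines
above; everything else (`LockEntry`, `parse_line`, `merge_logs`,
`current_holder`, ordering by the date field, treating the newest entry
as authoritative) is an assumption for illustration, not something
git-annex provides.

```python
from typing import List, NamedTuple, Optional


class LockEntry(NamedTuple):
    repo_id: str
    ts: str        # ISO date, so string order equals time order
    locked: bool   # state field: "1" means locked, "0" means released
    who: str
    note: str


def parse_line(line: str) -> LockEntry:
    # The explanation is optional and may contain spaces, so split at most 4 times.
    parts = line.split(maxsplit=4)
    repo_id, ts, state, who = parts[:4]
    note = parts[4].strip('"') if len(parts) > 4 else ""
    return LockEntry(repo_id, ts, state == "1", who, note)


def merge_logs(ours: List[str], theirs: List[str]) -> List[str]:
    """Keep every line from both sides, drop duplicates, order by timestamp."""
    unique = list(dict.fromkeys(ours + theirs))
    return sorted(unique, key=lambda line: parse_line(line).ts)


def current_holder(lines: List[str]) -> Optional[str]:
    """Who holds the lock according to the last entry, or None if released.

    Assumes `lines` is already in timestamp order, e.g. from merge_logs.
    """
    if not lines:
        return None
    last = parse_line(lines[-1])
    return last.who if last.locked else None
```

A policy such as "ignore locks older than so-and-so" would then be a
filter over the parsed entries before calling `current_holder`.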
>
> None of this is stuff I'd really recommend. It's just instructive to
> point out that if someone wants a distributed locking solution for git,
> it pretty much already exists, you can even (ab)use git-annex for it
> today with a tiny hack on top.
>
> I.e. each time you want to lock a file called Makefile just:
>
>     echo We created a lock for this >Makefile.lock &&
>     git annex add Makefile.lock &&
>     git annex sync
>
> And to release the lock:
>
>     git annex rm Makefile.lock &&
>     git annex sync
>
> Then you and others using this just mentally pretend (or set up aliases)
> that the following mapping exists:
>
>     git annex get <file> && git annex sync ==> git lockit <file>
>     git annex rm <file>  && git annex sync ==> git unlockit <file>
>
> And that stuff like "git annex whereis" (designed to list "who has the
> files") means "git annex who-has-locks".
>
> Then you'd change the post-{checkout,merge} hooks to list the locks
> "tracked annex files", chmod -w appropriately, and voila, a distributed
> locking solution for git built on top of an existing tool you can
> implement in a couple of hours.
>
> Now, if I were in a game studio like this would I do any of this? Nope,
> I think even if you go for locks something like the centralized git-lfs
> approach is simpler and probably more appropriate (you presumably want
> to be centralized anyway).
>
> But to be honest I don't really get the need for this given something
> like the use-case noted upthread:
>
>     > John Austin <john@astrangergravity.com> wrote:
>     > An essential example would be a team of 5 audio designers working
>     > together on the SFX for a game. If one designer wants to add a layer
>     > of ambience to 40% of the .wav files, they have to coordinate with
>     > everyone else on the project manually.
>
> If you have 5 people working on a project together, isn't it more
> straightforward to post in IRC/E-Mail:
>
>     Hey @all, don't change *.wav files for the next couple of days,
>     major refactoring.
>
> That's what we do all the time over in the non-game-non-binary-assets SW
> development world, and I daresay that even if you have textual
> conflicts, they're sometimes just as hard to solve.
>
> I.e. you can have two people unaware of each other on a team starting to
> in parallel refactor the same set of code in two completely different
> ways, needing a lot of manual merging / throwing out of most of one
> implementation. The way that's usually dealt with is something like the
> above example post to a ML.
>
> But maybe I'm just not imagining the use-cases.
>

