* Git for games working group
@ 2018-09-14 17:55 John Austin
  2018-09-14 19:00 ` Taylor Blau
  2018-09-14 21:21 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 35+ messages in thread
From: John Austin @ 2018-09-14 17:55 UTC (permalink / raw)
  To: git

Hey all,

I've been putting together a working group for game studios wanting to
use Git. There are a couple of blockers that keep most game and media
companies on Perforce or others, but most would love to use git if it
were feasible.

The biggest tasks I'd like to tackle are:
 - improvements to large file management (mostly solved by LFS, GVFS)
 - avoiding excessive binary file conflicts (this is one of the big
   reasons most studios are on Perforce)

Is anyone interested in contributing/offering insights? I suspect most
folks here are git users as is, but if you know someone stuck on
Perforce, I'd love to chat with them!

Happy to field thoughts in this thread or answer other questions about
why git doesn't work for games at the moment.

Cheers,
JA

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group
  2018-09-14 17:55 Git for games working group John Austin
@ 2018-09-14 19:00 ` Taylor Blau
  2018-09-14 21:09   ` John Austin
  2018-09-14 21:13   ` John Austin
  2018-09-14 21:21 ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 35+ messages in thread
From: Taylor Blau @ 2018-09-14 19:00 UTC (permalink / raw)
  To: John Austin; +Cc: git, sandals, larsxschneider, pastelmobilesuit

Hi John,

On Fri, Sep 14, 2018 at 10:55:39AM -0700, John Austin wrote:
> Is anyone interested in contributing/offering insights? I suspect most
> folks here are git users as is, but if you know someone stuck on
> Perforce, I'd love to chat with them!

I'm thrilled that other folks are interested in this, too. I'm not a
video game developer myself, but I am the maintainer of Git LFS. If
there's a capacity in which I could be useful to this group, I'd be more
than happy to offer myself in that capacity.

I'm cc-ing in brian carlson, Lars Schneider, and Preben Ingvaldsen on
this email, too, since they all serve on the core team of the project.

Thanks,
Taylor

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group
  2018-09-14 19:00 ` Taylor Blau
@ 2018-09-14 21:09   ` John Austin
  2018-09-15 16:40     ` Taylor Blau
  2018-09-14 21:13   ` John Austin
  1 sibling, 1 reply; 35+ messages in thread
From: John Austin @ 2018-09-14 21:09 UTC (permalink / raw)
  To: me; +Cc: git, sandals, larsxschneider, pastelmobilesuit

Hey Taylor,

Great to have your support! I think LFS has done a great job so far
solving the large file issue.

I've been working myself on strategies for handling binary conflicts,
and particularly how to do it in a git-friendly way (ie. avoiding as
much centralization as possible and playing into the commit/branching
model of git).

I've got to a loose design that I like, but it'd be good to get some
feedback, as well as hearing what other game devs would want in a
binary conflict system.

- John

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group
  2018-09-14 21:09 ` John Austin
@ 2018-09-15 16:40 ` Taylor Blau
  2018-09-16 14:55   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 35+ messages in thread
From: Taylor Blau @ 2018-09-15 16:40 UTC (permalink / raw)
  To: John Austin; +Cc: me, git, sandals, larsxschneider, pastelmobilesuit

On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote:
> I've been working myself on strategies for handling binary conflicts,
> and particularly how to do it in a git-friendly way (ie. avoiding as
> much centralization as possible and playing into the commit/branching
> model of git).

Git LFS handles conflict resolution and merging over binary files with
two primary mechanisms: (1) file locking, and (2) use of a merge-tool.

  1. is the most "non-Git-friendly" solution, since it requires the use
     of a centralized Git LFS server (to be run alongside your remote
     repository) and that every clone phones home to make sure that they
     are OK to acquire a lock.

     The workflow that we expect is that users will run 'git lfs lock
     /path/to/file' any time they want to make a change to an
     unmergeable file, and that this call first checks to make sure
     that they are the only person who would hold the lock.

     We also periodically "sync" the state of locks locally with those
     on the remote, namely during the post-merge, post-commit, and
     post-checkout hook(s).

     Users are expected to run 'git lfs unlock /path/to/file' anytime
     they "merge" their changes back into master, but the thought is
     that servers could be taught to automatically do this upon the
     remote detecting the merge.

  2. is a more Git-friendly approach: the 'git mergetool' builtin does
     work with files tracked under Git LFS, i.e., both sides of the
     merge are filtered so that the mergetool can resolve the changes
     in the large files instead of the textual pointers.
> I've got to a loose design that I like, but it'd be good to get some > feedback, as well as hearing what other game devs would want in a > binary conflict system. Please do share, and I would be happy to provide feedback (and make proposals to integrate favorable parts of your ideas into Git LFS). Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-15 16:40 ` Taylor Blau @ 2018-09-16 14:55 ` Ævar Arnfjörð Bjarmason 2018-09-16 20:49 ` John Austin 2018-09-17 13:55 ` Taylor Blau 0 siblings, 2 replies; 35+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-09-16 14:55 UTC (permalink / raw) To: Taylor Blau Cc: John Austin, git, sandals, larsxschneider, pastelmobilesuit, Joey Hess On Sat, Sep 15 2018, Taylor Blau wrote: > On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote: >> I've been working myself on strategies for handling binary conflicts, >> and particularly how to do it in a git-friendly way (ie. avoiding as >> much centralization as possible and playing into the commit/branching >> model of git). > > Git LFS handles conflict resolution and merging over binary files with > two primary mechanisms: (1) file locking, and (2) use of a merge-tool. > > 1. is the most "non-Git-friendly" solution, since it requires the use > of a centralized Git LFS server (to be run alongside your remote > repository) and that every clone phones home to make sure that they > are OK to acquire a lock. > > The workflow that we expect is that users will run 'git lfs lock > /path/to/file' any time they want to make a change to an > unmeregeable file, and that this call first checks to make sure > that they are the only person who would hold the lock. > > We also periodically "sync" the state of locks locally with those > on the remote, namely during the post-merge, post-commit, and > post-checkout hook(s). > > Users are expected to perform the 'git lfs unlock /path/to/file' > anytime they "merge" their changes back into master, but the > thought is that servers could be taught to automatically do this > upon the remote detecting the merge. > > 2. 
is a more Git-friendly approach, i.e., that the 'git mergetool'
>     builtin does work with files tracked under Git LFS, i.e., that both
>     sides of the merge are filtered so that the mergetool can resolve
>     the changes in the large files instead of the textual pointers.
>
>> I've got to a loose design that I like, but it'd be good to get some
>> feedback, as well as hearing what other game devs would want in a
>> binary conflict system.
>
> Please do share, and I would be happy to provide feedback (and make
> proposals to integrate favorable parts of your ideas into Git LFS).

All of this is obviously correct as far as git-lfs goes. I'd just like
to use this as a jumping-off point on the topic of file locking, and to
frame this discussion more generally.

It's true that a tool like git-lfs "requires the use of a centralized
[...] server" for file locking, but it's not the case that a feature
like file locking requires a centralized authority.

In particular, git-lfs, unlike git-annex (which preceded it), does the
opposite of (to quote John upthread) "avoid[...] as much centralization
as possible": it *is* explicitly a centralized large file solution, not
a distributed one.

That's not a critique of git-lfs or the centralized method, or a
recommendation for decentralization in this context, but we already have
a similar distributed solution in the form of git-annex; it's just a
hop, skip and a jump away from changing "who has the file" to "who has
the lock".

So how does that work? In the centralized case like
git-lfs/cvs/p4/whatever you have some "lock/unlock" command, and it
locks a file on a central server. The lock is usually a [locked?, who]
pair: "is it locked?" and "who locked it?". Usually this is also
followed up on the client side by checking those files out without the
"w" flag.
In the hypothetical git-annex-like case (simplifying a bit for the
purposes of this explanation), for every FILE in your tree you have a
corresponding FILE.lock file. It's not a boolean, but a log of who's
asked for locks, i.e. lines of:

    <repository UUID> <ts> <state> <who (email?)> <explanation?>

E.g.:

    $ cat Makefile.lock
    my-random-per-repo-id 2018-09-15 1 avarab@gmail.com "refactoring all Makefiles"
    my-random-per-repo-id 2018-09-16 0 avarab@gmail.com "done!"

This log is append-only; when clients encounter conflicts there's a
merge driver to ensure that all updates are kept.

You can then enact a policy saying you care or don't care about updates
from certain sources, or ignore locks older than so-and-so.

None of this is stuff I'd really recommend. It's just instructive to
point out that if someone wants a distributed locking solution for git,
it pretty much already exists; you can even (ab)use git-annex for it
today with a tiny hack on top.

I.e. each time you want to lock a file called Makefile just:

    echo We created a lock for this >Makefile.lock &&
    git annex add Makefile.lock &&
    git annex sync

And to release the lock:

    git annex rm Makefile.lock &&
    git annex sync

Then you and others using this just mentally pretend (or set up aliases)
that the following mapping exists:

    git annex get <file> && git annex sync ==> git lockit <file>
    git annex rm <file> && git annex sync  ==> git unlockit <file>

And that stuff like "git annex whereis" (designed to list "who has the
files") means "git annex who-has-locks".

Then you'd change the post-{checkout,merge} hooks to list the locks
"tracked annex files", chmod -w appropriately, and voila, a distributed
locking solution for git built on top of an existing tool you can
implement in a couple of hours.

Now, if I were in a game studio like this would I do any of this?
Nope, I think even if you go for locks something like the centralized git-lfs approach is simpler and probably more appropriate (you presumably want to be centralized anyway). But to be honest I don't really get the need for this given something like the use-case noted upthread: > John Austin <john@astrangergravity.com> wrote: > An essential example would be a team of 5 audio designers working > together on the SFX for a game. If one designer wants to add a layer > of ambience to 40% of the .wav files, they have to coordinate with > everyone else on the project manually. If you have 5 people working on a project together, isn't it more straightforward to post in IRC/E-Mail: Hey @all, don't change *.wav files for the next couple of days, major refactoring. That's what we do all the time over in the non-game-non-binary-assets SW development world, and I daresay that even if you have textual conflicts, they're sometimes just as hard to solve. I.e. you can have two people unaware of each other on a team starting to in parallel refactor the same set of code in two completely different ways, needing a lot of manual merging / throwing out of most of one implementation. The way that's usually dealt with is something like the above example post to a ML. But maybe I'm just not imagining the use-cases. ^ permalink raw reply [flat|nested] 35+ messages in thread
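To make the append-only lock log described above concrete, here is a minimal Python sketch. It assumes the five-field line format from the Makefile.lock example, reads state 1 as "lock requested" and state 0 as "released", and stands in for the merge driver with a plain set union. The names LockEvent, merge_logs and holders are hypothetical, and none of this is how git-annex itself is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockEvent:
    repo_id: str
    date: str   # ISO date, so string order is chronological order
    state: int  # assumed meaning: 1 = lock requested, 0 = lock released
    who: str
    note: str


def parse_line(line):
    """Parse one '<repo> <ts> <state> <who> <explanation>' log line."""
    repo_id, date, state, who, note = line.split(" ", 4)
    return LockEvent(repo_id, date, int(state), who, note.strip().strip('"'))


def merge_logs(ours, theirs):
    """Union merge: keep every event from both sides, drop exact duplicates.

    Within one day, requests are ordered before releases; that is an
    assumption forced by the day-granularity timestamps in the example.
    """
    lines = (ours + "\n" + theirs).splitlines()
    events = {parse_line(l) for l in lines if l.strip()}
    return sorted(events, key=lambda e: (e.date, -e.state, e.repo_id))


def holders(events):
    """Replay the merged log and return the (repo_id, who) pairs still locked."""
    held = set()
    for e in events:
        key = (e.repo_id, e.who)
        if e.state == 1:
            held.add(key)
        else:
            held.discard(key)
    return sorted(held)
```

Because the merge is a union of lines, two clones that append different events never conflict; the policy question (whose locks to honor, how old a lock may be) is left to whoever reads the result of holders().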
* Re: Git for games working group
  2018-09-16 14:55 ` Ævar Arnfjörð Bjarmason
@ 2018-09-16 20:49 ` John Austin
  2018-09-17 13:55 ` Taylor Blau
  1 sibling, 0 replies; 35+ messages in thread
From: John Austin @ 2018-09-16 20:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, brian m. carlson, Lars Schneider, pastelmobilesuit, id

Thanks for all the thoughts so far -- I'm going to try to collate some
of my responses to avoid this getting too lengthy.

## Regarding Merging / Diffing

A couple of folks have suggested that we could improve merging / diffing
of binary files in general. I think this is useful, but can only ever
result in minor improvements, for the following reasons:

1. Game developers use an incredible number of proprietary file
formats: Maya, Houdini, Photoshop, Wwise, Unreal UAssets, etc. At the
end of the day, it's fairly unlikely that we can build visual merge
tools for these asset types without an enormous amount of corporate
support.

2. Merging doesn't have a meaning for many types of files. I think git
has trained us that everything is merge-able, but that's not always the
case. If you gave an audio designer two voice-over audio files and asked
them to merge them, they'd give you a pretty strange look. You have to
re-record it from scratch. Content files can be highly intertwined and
highly subjective: as a textual metaphor, every line of content
conflicts with every other line. Even if you had a perfect merge tool,
it just doesn't make much sense to try to merge changes, unless it's an
incredibly simple change.

## Regarding File Locking:

File locking works well enough in Perforce, but there are a couple of
issues I've found using file locking in LFS or in Gitolite (hadn't seen
this before, thanks!).

1. File Locking is an 'active' system. File Locking adds extra
operations that must be taken, both before writing to a file and then
after finishing your changes. Artists either must drop down to a
terminal (unlikely), or we must integrate our file-locking system with
existing artist tools (a large amount of work). Either way it adds a lot
of extra grunt-work. Imagine having to manually mark which files you
modify rather than just using git status. One of git's biggest benefits
is removing this type of manual labor.

2. File Locking doesn't extend well across branches. Acquiring a lock
usually blocks modifications to this file across all branches. This cuts
off basic branching models and features (like having release branches)
that are a large part of why git is so successful.

3. It's not entirely sound. Developer A can modify 'binary.bin', and
push the changes to master. Developer B, who is behind master by a
couple of days, can then unknowingly acquire the lock and make further
changes ignoring A's new commit. When B attempts to push, they will get
conflicts. If you look closely, this is a symptom of issue 2: locking
doesn't understand branches.

## "Implicit" Locking

Instead, I think it's better to think about how we can use the structure
of the git graph to solve the issue. Imagine the following pre-commit
hook for a developer attempting to commit 'binary.bin':

    If there exists any commit touching binary.bin on a different
    branch that is not integrated into this branch, block the commit.

In this case, making a commit with a file blocks others from touching
it, until they pull in that commit. To make the parallel, making a
commit acquires a 'lock' on the file, but there's no release. The only
requirement is that you always modify the latest version of the file.

This has issues of its own, and it's a simplification of the system I
have in mind. It means Developer A needs to have information about the
commit graph local to Developer B's machine (but notably not the files).
However I think it is a better starting place for thinking about these
sorts of systems.
The locks fall implicitly from the commit graph structure, so it plays
well with all of your normal git commands. You can branch, cherry-pick,
rebase, etc without any extra support or aliases.

I'll write up something a bit more detailed in a bit.

- JA

^ permalink raw reply [flat|nested] 35+ messages in thread
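The pre-commit rule John describes under "Implicit" Locking can be sketched on a toy commit graph. This is only an illustration under stated assumptions: the graph is a plain dict mapping each commit to its parents, `touching` is the set of commits known to modify the file, and the function names are hypothetical. It is not the design John says he will write up. Against a real repository, something like `git rev-list --all ^HEAD -- binary.bin` asks roughly the same question, but only for the refs the local clone already knows about.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def blocking_commits(parents, touching, head):
    """Commits that modify the file but are not yet integrated into head."""
    return sorted(touching - ancestors(parents, head))


def may_commit(parents, touching, head):
    """The implicit lock: allow the commit only if nothing blocks it."""
    return not blocking_commits(parents, touching, head)


if __name__ == "__main__":
    # Developer A committed a1 (touching binary.bin) on top of root.
    # Developer B is on b1, which branched from root before a1 existed.
    parents = {"root": [], "a1": ["root"], "b1": ["root"], "m": ["b1", "a1"]}
    touching = {"a1"}
    print(may_commit(parents, touching, "b1"))  # B has not pulled a1 yet
    print(may_commit(parents, touching, "m"))   # after B merges a1
```

The sketch also shows the cost John mentions: the check is only as good as the set of commits each clone has heard about, so the graph metadata (though not the file contents) has to be shared between developers somehow.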
* Re: Git for games working group 2018-09-16 14:55 ` Ævar Arnfjörð Bjarmason 2018-09-16 20:49 ` John Austin @ 2018-09-17 13:55 ` Taylor Blau 2018-09-17 14:01 ` Randall S. Becker 2018-09-17 15:00 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 35+ messages in thread From: Taylor Blau @ 2018-09-17 13:55 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, John Austin, git, sandals, larsxschneider, pastelmobilesuit, Joey Hess On Sun, Sep 16, 2018 at 04:55:13PM +0200, Ævar Arnfjörð Bjarmason wrote: > In the hypothetical git-annex-like case (simplifying a bit for the > purposes this explanation), for every FILE in your tree you have a > corresponding FILE.lock file, but it's not a boolean, but a log of who's > asked for locks, i.e. lines of: > > <repository UUID> <ts> <state> <who (email?)> <explanation?> > > E.g.: > > $ cat Makefile.lock > my-random-per-repo-id 2018-09-15 1 avarab@gmail.com "refactoring all Makefiles" > my-random-per-repo-id 2018-09-16 0 avarab@gmail.com "done!" > > This log is append-only, when clients encounter conflicts there's a > merge driver to ensure that all updates are kept. Certainly. I think that there are two things that aren't well expressed under this mechanism: 1. Having a log of locks held against that (a) file doesn't prevent us from introducing merge conflicts at the <file>.lock level, so we're reliant upon the caller first running 'git pull' and hoping that no one beats them out to locking and pushing their lock. 2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z together." This isn't possible in Git LFS today with the existing "git lfs lock" command (I had to check, but it takes only _one_ filename as its argument). Perhaps it would be nice to support something like this someday in Git LFS, but I think we would have to reimagine how this would look in your file.lock scheme. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: Git for games working group 2018-09-17 13:55 ` Taylor Blau @ 2018-09-17 14:01 ` Randall S. Becker 2018-09-17 15:00 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 35+ messages in thread From: Randall S. Becker @ 2018-09-17 14:01 UTC (permalink / raw) To: 'Taylor Blau', 'Ævar Arnfjörð Bjarmason' Cc: 'John Austin', git, sandals, larsxschneider, pastelmobilesuit, 'Joey Hess' On September 17, 2018 9:55 AM Taylor Blau wrote: > On Sun, Sep 16, 2018 at 04:55:13PM +0200, Ævar Arnfjörð Bjarmason wrote: > > In the hypothetical git-annex-like case (simplifying a bit for the > > purposes this explanation), for every FILE in your tree you have a > > corresponding FILE.lock file, but it's not a boolean, but a log of > > who's asked for locks, i.e. lines of: > > > > <repository UUID> <ts> <state> <who (email?)> <explanation?> > > > > E.g.: > > > > $ cat Makefile.lock > > my-random-per-repo-id 2018-09-15 1 avarab@gmail.com "refactoring > all Makefiles" > > my-random-per-repo-id 2018-09-16 0 avarab@gmail.com "done!" > > > > This log is append-only, when clients encounter conflicts there's a > > merge driver to ensure that all updates are kept. > > Certainly. I think that there are two things that aren't well expressed under > this mechanism: > > 1. Having a log of locks held against that (a) file doesn't prevent us > from introducing merge conflicts at the <file>.lock level, so we're > reliant upon the caller first running 'git pull' and hoping that no > one beats them out to locking and pushing their lock. > > 2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z > together." This isn't possible in Git LFS today with the existing "git > lfs lock" command (I had to check, but it takes only _one_ filename as > its argument). > > Perhaps it would be nice to support something like this someday in > Git LFS, but I think we would have to reimagine how this would look > in your file.lock scheme. 
I have an interest in this particular scheme, so am looking at porting
both golang and git-lfs over to my platform (HPE-NonStop). The
multi-file lock problem can be addressed through a variety of
cooperative schemes, and if I get the port, I'm hoping to contribute
something to solve it (that's a big IF at this point in time) - there
are known mutex patterns to solve this AFAIR. My own community has a
similar requirement, so I'm investigating.

Cheers,
Randall

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group
  2018-09-17 13:55 ` Taylor Blau
  2018-09-17 14:01   ` Randall S. Becker
@ 2018-09-17 15:00   ` Ævar Arnfjörð Bjarmason
  2018-09-17 15:57     ` Taylor Blau
  2018-09-17 16:47     ` Joey Hess
  1 sibling, 2 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-17 15:00 UTC (permalink / raw)
  To: Taylor Blau
  Cc: John Austin, git, sandals, larsxschneider, pastelmobilesuit, Joey Hess

On Mon, Sep 17 2018, Taylor Blau wrote:

> On Sun, Sep 16, 2018 at 04:55:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> In the hypothetical git-annex-like case (simplifying a bit for the
>> purposes this explanation), for every FILE in your tree you have a
>> corresponding FILE.lock file, but it's not a boolean, but a log of who's
>> asked for locks, i.e. lines of:
>>
>>     <repository UUID> <ts> <state> <who (email?)> <explanation?>
>>
>> E.g.:
>>
>>     $ cat Makefile.lock
>>     my-random-per-repo-id 2018-09-15 1 avarab@gmail.com "refactoring all Makefiles"
>>     my-random-per-repo-id 2018-09-16 0 avarab@gmail.com "done!"
>>
>> This log is append-only, when clients encounter conflicts there's a
>> merge driver to ensure that all updates are kept.
>
> Certainly. I think that there are two things that aren't well expressed
> under this mechanism:
>
>   1. Having a log of locks held against that (a) file doesn't prevent us
>      from introducing merge conflicts at the <file>.lock level, so we're
>      reliant upon the caller first running 'git pull' and hoping that no
>      one beats them out to locking and pushing their lock.

I was eliding a lot of details about how git-annex works under the
hood. In reality under git-annex it's not a Makefile.lock file, but
there's a dedicated branch (called "git-annex") that stores this sort of
metadata, i.e. who has copies of the "Makefile" file. That branch has
dedicated merge drivers for the files it manages, so you never get into
these sorts of conflicts.
But yeah, the ad-hoc example I mentioned of:

    echo We created a lock for this >Makefile.lock

*would* conflict if two users picked a different string, so in practice
you'd need something standard there, i.e. everyone would just echo
"magic git-annex lock" to the file & track it, so even if they did that
same action in parallel it wouldn't conflict.

There are surely other aspects of that square peg of large file
tracking not fitting the round hole of file locking. The point of my
write-up was not that *that* solution is perfect, but that there's prior
art here that's very easily adapted to distributed locking if someone
wanted to scratch that itch, since the notion of keeping a log of who
has/hasn't gotten a file is very similar to a log of who has/hasn't
locked some file(s) in the tree.

>   2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z
>      together." This isn't possible in Git LFS today with the existing "git
>      lfs lock" command (I had to check, but it takes only _one_ filename as
>      its argument).
>
>      Perhaps it would be nice to support something like this someday in
>      Git LFS, but I think we would have to reimagine how this would look
>      in your file.lock scheme.

If you can do it for 1 file you can do it for N with a for-loop, no? So
is this just a general UI issue in git-annex where some commands don't
take lists of filenames (or git pathspecs) to operate on, or a more
general issue with locking?

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-17 15:00 ` Ævar Arnfjörð Bjarmason @ 2018-09-17 15:57 ` Taylor Blau 2018-09-17 16:21 ` Randall S. Becker 2018-09-17 16:47 ` Joey Hess 1 sibling, 1 reply; 35+ messages in thread From: Taylor Blau @ 2018-09-17 15:57 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, John Austin, git, sandals, larsxschneider, pastelmobilesuit, Joey Hess On Mon, Sep 17, 2018 at 05:00:10PM +0200, Ævar Arnfjörð Bjarmason wrote: > > 2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z > > together." This isn't possible in Git LFS today with the existing "git > > lfs lock" command (I had to check, but it takes only _one_ filename as > > its argument). > > > > Perhaps it would be nice to support something like this someday in > > Git LFS, but I think we would have to reimagine how this would look > > in your file.lock scheme. > > If you can do it for 1 file you can do it for N with a for-loop, no? So > is this just a genreal UI issue in git-annex where some commands don't > take lists of filenames (or git pathspecs) to operate on, or a more > general issue with locking? I think that it's more general. I envision a scenario where between iterations of the for-loop, another client acquires a lock later on in the list. I think that the general problem here is that there is no transactional way to express "please give me all N of these locks". Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: Git for games working group 2018-09-17 15:57 ` Taylor Blau @ 2018-09-17 16:21 ` Randall S. Becker 0 siblings, 0 replies; 35+ messages in thread From: Randall S. Becker @ 2018-09-17 16:21 UTC (permalink / raw) To: 'Taylor Blau', 'Ævar Arnfjörð Bjarmason' Cc: 'John Austin', git, sandals, larsxschneider, pastelmobilesuit, 'Joey Hess' On September 17, 2018 11:58 AM, Taylor Blau wrote: > On Mon, Sep 17, 2018 at 05:00:10PM +0200, Ævar Arnfjörð Bjarmason > wrote: > > > 2. Multi-file locks, e.g., "I need to lock file(s) X, Y, and Z > > > together." This isn't possible in Git LFS today with the existing "git > > > lfs lock" command (I had to check, but it takes only _one_ filename as > > > its argument). > > > > > > Perhaps it would be nice to support something like this someday in > > > Git LFS, but I think we would have to reimagine how this would look > > > in your file.lock scheme. > > > > If you can do it for 1 file you can do it for N with a for-loop, no? > > So is this just a general UI issue in git-annex where some commands > > don't take lists of filenames (or git pathspecs) to operate on, or a > > more general issue with locking? > > I think that it's more general. > > I envision a scenario where between iterations of the for-loop, another > client acquires a lock later on in the list. I think that the general problem here > is that there is no transactional way to express "please give me all N of these > locks". A composite mutex is better, constructing a long name of X+Y+Z.lock and obtaining the lock of that, then attempting all locks X.lock,Y.lock,Z.lock and if any fail, free up what you did. Otherwise you run into a potential mutex conflict if someone attempts the locks in a different order. Not perfect, but it prevents two from going after the same set of resources, if that set is common. 
Another pattern is to have a very temporary dir.lock that is active while locks are being grabbed within a subtree, then released when all locks are acquired or fail (so very short time). This second pattern should generally work no matter what combination of locks are required, although it single-threads lock acquisition - which is probably a good thing functionally, but slower operationally. Cheers, Randall ^ permalink raw reply [flat|nested] 35+ messages in thread
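A rough sketch of the "attempt all locks, and if any fail, free up what you did" idea, combined with a fixed acquisition order so that two clients never grab the same set in different orders. It assumes locks are directories on a shared local filesystem, where `mkdir` is atomic, and that file names contain no whitespace; a real Git LFS or server-side implementation would need its own atomic primitive. `acquire_all` and `release_all` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch: all-or-nothing acquisition of several advisory locks.
# Each lock is a directory "<file>.lock"; mkdir either creates it or fails,
# so exactly one caller can hold a given lock at a time.

acquire_all() {
    acquired=""
    # Sort the names so every client attempts the locks in the same order.
    for f in $(printf '%s\n' "$@" | sort); do
        if mkdir "$f.lock" 2>/dev/null; then
            acquired="$acquired $f"
        else
            # Someone else holds this lock: release what we took and fail.
            for g in $acquired; do
                rmdir "$g.lock"
            done
            return 1
        fi
    done
    return 0
}

release_all() {
    for f in "$@"; do
        rmdir "$f.lock" 2>/dev/null
    done
    return 0
}
```

A caller would run something like `acquire_all X Y Z && edit_files && release_all X Y Z` (with `edit_files` standing in for the real work). The rollback keeps a failed attempt from leaving a partial set of locks behind, which is the situation the plain for-loop cannot rule out.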
* Re: Git for games working group 2018-09-17 15:00 ` Ævar Arnfjörð Bjarmason 2018-09-17 15:57 ` Taylor Blau @ 2018-09-17 16:47 ` Joey Hess 2018-09-17 17:23 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 35+ messages in thread From: Joey Hess @ 2018-09-17 16:47 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, John Austin, git, sandals, larsxschneider, pastelmobilesuit Ævar Arnfjörð Bjarmason wrote: > There's surely other aspects of that square peg of large file tracking > not fitting the round hole of file locking, the point of my write-up was > not that *that* solution is perfect, but there's prior art here that's > very easily adapted to distributed locking if someone wanted to scratch > that itch, since the notion of keeping a log of who has/hasn't gotten a > file is very similar to a log of who has/hasn't locked some file(s) in > the tree. Actually they are fundamentally very different. git-annex's tracking of locations of files is eventually consistent, which of course means that at any given point in time it may be currently inconsistent. That is fine for tracking locations of files, but not for locking. When git-annex needs to do an operation that relies on someone else's copy of a file actually being present, it uses real locking. That locking is not centralized, instead it relies on the connections between git repositories. That turns out to be sufficient for git-annex's own locking needs, but it would not be sufficient to avoid file edit conflict problems in eg a split brain situation. -- see shy jo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-17 16:47 ` Joey Hess @ 2018-09-17 17:23 ` Ævar Arnfjörð Bjarmason 2018-09-23 17:28 ` John Austin 0 siblings, 1 reply; 35+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-09-17 17:23 UTC (permalink / raw) To: Joey Hess Cc: Taylor Blau, John Austin, git, sandals, larsxschneider, pastelmobilesuit On Mon, Sep 17 2018, Joey Hess wrote: > Ævar Arnfjörð Bjarmason wrote: >> There's surely other aspects of that square peg of large file tracking >> not fitting the round hole of file locking, the point of my write-up was >> not that *that* solution is perfect, but there's prior art here that's >> very easily adapted to distributed locking if someone wanted to scratch >> that itch, since the notion of keeping a log of who has/hasn't gotten a >> file is very similar to a log of who has/hasn't locked some file(s) in >> the tree. > > Actually they are fundamentally very different. git-annex's tracking of > locations of files is eventually consistent, which of course means that > at any given point in time it may be currently inconsistent. That is > fine for tracking locations of files, but not for locking. > > When git-annex needs to do an operation that relies on someone else's > copy of a file actually being present, it uses real locking. That > locking is not centralized, instead it relies on the connections between > git repositories. That turns out to be sufficient for git-annex's own > locking needs, but it would not be sufficient to avoid file edit > conflict problems in eg a split brain situation. Right, all of that's true. I forgot to explicitly say what I meant by "locking" in this context. Clearly it's not suitable for something like actual file locking (in the sense of flock() et al), but rather just advisory locking in the loosest sense of the word, i.e. some git-ish way of someone writing on the office whiteboard "unless you're Bob, don't touch main.c today Tuesday Sep 17th, he's hacking on it". 
So just a way to have some eventually consistent side channel to pass such a message through git. Something similar to what git-annex does with its "git-annex" branch would work for that, as long as everyone who wanted to get such messages ran some equivalent of "git annex sync" in a timely manner (or checked the office whiteboard every day...). Such a scheme is never going to be 100% reliable even in centralized source control systems, e.g. even with cvs/perforce you might pull the latest changes, then go on a plane and edit the locked main.c. Then the lock has "failed" in the sense of "the message didn't get there in time, and two people who could have just picked different areas to work on made conflicting edits". As noted upthread this isn't my use-case, I just wanted to point out the git-annex method of distributing metadata as a bolt-on to git as interesting prior art. If someone wants "truly distributed, but with file locking like cvs/perforce" something like what git-annex is doing would probably work for them. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-17 17:23 ` Ævar Arnfjörð Bjarmason @ 2018-09-23 17:28 ` John Austin 2018-09-23 17:56 ` Randall S. Becker 0 siblings, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-23 17:28 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: id, Taylor Blau, git, brian m. carlson, Lars Schneider, pastelmobilesuit I've been putting together a prototype file-locking implementation for a system that plays better with git. What are everyone's thoughts on something like the following? I'm tentatively labeling this system git-sync or sync-server. There are two pieces: 1. A centralized repository called the Global Graph that contains the union git commit graph for local developer repos. When Developer A makes a local commit on branch 'feature', git-sync will automatically push that new commit up to the global server, under a name-spaced branch: 'developera_repoabcdef/feature'. This can be done silently as a force push, and shouldn't ever interrupt the developer's workflow. Simple http queries can be made to the Global Graph, such as "Which commits descend from commit abcdefgh?" 2. A client-side tool that queries the Global Graph to determine when your current changes are in conflict with another developer. It might ask "Are there any commits I don't have locally that modify lockable_file.bin?". This could either be on pre-commit, or for more security, be part of a read-only marking system à la Git LFS. There wouldn't be any "lock" per se, rather, the client could refuse to modify a file if it found other commits for that file in the global graph. The key here is the separation of concerns. The Global Graph is fairly dimwitted -- it doesn't know anything about file locking. But it provides a layer of information from which we can implement file locking on the client side (or perhaps other interesting systems). Thoughts? 
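The client-side question in point 2 can be approximated with stock git once the Global Graph's branches are available locally. A minimal sketch, assuming they have been fetched under a remote named `gg` (a made-up name) so that the name-spaced branches appear under refs/remotes/gg/. The helpers `unseen_commits_touching` and `safe_to_modify` are illustrative only; the prototype's actual HTTP query interface is not described in the thread.

```shell
#!/bin/sh
# Hypothetical sketch of the client-side check. Assumes the Global Graph has
# been fetched as a remote named "gg" (an assumption, not an existing
# convention), so its name-spaced branches live under refs/remotes/gg/.

# Print commits reachable from any gg branch, but not from HEAD, that touch
# the given path.
unseen_commits_touching() {
    git rev-list --remotes=gg --not HEAD -- "$1"
}

# Succeed only if no such commit exists, i.e. nobody else has changes to
# the path that are missing from the current branch.
safe_to_modify() {
    [ -z "$(unseen_commits_touching "$1")" ]
}
```

A pre-commit hook could call `safe_to_modify lockable_file.bin` for each staged lockable file and refuse the commit when it fails. That is the weak, warning-style policy; it still depends on how recently the `gg` refs were fetched.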
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: Git for games working group 2018-09-23 17:28 ` John Austin @ 2018-09-23 17:56 ` Randall S. Becker 2018-09-23 19:53 ` John Austin 2018-09-24 13:59 ` Taylor Blau 0 siblings, 2 replies; 35+ messages in thread From: Randall S. Becker @ 2018-09-23 17:56 UTC (permalink / raw) To: 'John Austin', 'Ævar Arnfjörð Bjarmason' Cc: id, 'Taylor Blau', git, 'brian m. carlson', 'Lars Schneider', pastelmobilesuit On September 23, 2018 1:29 PM, John Austin wrote: > I've been putting together a prototype file-locking implementation for a > system that plays better with git. What are everyone's thoughts on > something like the following? I'm tentatively labeling this system git-sync or > sync-server. There are two pieces: > > 1. A centralized repository called the Global Graph that contains the union git > commit graph for local developer repos. When Developer A makes a local > commit on branch 'feature', git-sync will automatically push that new commit > up to the global server, under a name-spaced > branch: 'developera_repoabcdef/feature'. This can be done silently as a > force push, and shouldn't ever interrupt the developer's workflow. > Simple http queries can be made to the Global Graph, such as "Which > commits descend from commit abcdefgh?" > > 2. A client-side tool that queries the Global Graph to determine when your > current changes are in conflict with another developer. It might ask "Are > there any commits I don't have locally that modify lockable_file.bin?". This > could either be on pre-commit, or for more security, be part of a read-only > marking system à la Git LFS. There wouldn't be any "lock" per se, rather, the > client could refuse to modify a file if it found other commits for that file in the > global graph. > > The key here is the separation of concerns. The Global Graph is fairly > dimwitted -- it doesn't know anything about file locking. 
But it provides a > layer of information from which we can implement file locking on the client > side (or perhaps other interesting systems). > > Thoughts? I'm encouraged by where this is going. I might suggest "sync" is the wrong name here, with "mutex" being slightly better - I would even like to help with your effort and have non-unixy platforms I'd like to do this on. Having this separate from git LFS is an even better idea IMO, and I would suggest implementing this using the same set of build tools that git uses so that it is broadly portable, unlike git LFS. Glad to help there too. I would suggest that a higher-level grouping mechanism of resource groups might be helpful - as in "I need this directory" rather than "I need this file". Better still, I could see "I need all objects in this commit-ish", which would allow a revert operation to succeed or fail atomically while adhering to a lock requirement. One bit that traditional lock-brokering systems implement involves forcing security attribute changes - so an unlocked file is stored as chmod a-w to prevent accidental modification of lockables, then changing that to chmod ?+w when a lock is acquired. It's not perfect, but does catch a lot of errors. Cheers, Randall ^ permalink raw reply [flat|nested] 35+ messages in thread
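The read-only marking described above comes down to two permission flips. A small sketch, assuming the lock holder is the file's owner, so "?+w" is taken here to mean `u+w` (that choice is an assumption, since the message leaves the class open). `mark_unlocked` and `mark_locked` are hypothetical helpers, not commands from git or Git LFS.

```shell
#!/bin/sh
# Hypothetical helpers: keep lockable files read-only until a lock is held,
# so an accidental edit fails at save time instead of at merge time.

# No lock held: remove write permission for everyone.
mark_unlocked() {
    chmod a-w "$1"
}

# Lock acquired by the current user: restore write permission for the owner.
mark_locked() {
    chmod u+w "$1"
}
```

A client would call `mark_locked path` only after the lock request succeeds and `mark_unlocked path` when the lock is released. As the message says, this catches accidents rather than enforcing anything, since anyone can run chmod by hand.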
* Re: Git for games working group 2018-09-23 17:56 ` Randall S. Becker @ 2018-09-23 19:53 ` John Austin 2018-09-23 19:55 ` John Austin ` (2 more replies) 2018-09-24 13:59 ` Taylor Blau 1 sibling, 3 replies; 35+ messages in thread From: John Austin @ 2018-09-23 19:53 UTC (permalink / raw) To: Randall Becker Cc: Ævar Arnfjörð Bjarmason, id, Taylor Blau, git, brian m. carlson, Lars Schneider, pastelmobilesuit On Sun, Sep 23, 2018 at 10:57 AM Randall S. Becker <rsbecker@nexbridge.com> wrote: > I would even like to help with your effort and have non-unixy platforms I'd like to do this on. > Having this separate from git LFS is an even better idea IMO, and I would suggest implementing this using the same set of build tools that git uses so that it is broadly portable, unlike git LFS. Glad to help there too. Great to hear -- once the code is in a bit better shape I can open it up on github. Cross platform is definitely one of my focuses. I'm currently implementing in Rust because it targets the same space as C and has great, near trivial, cross-platform support. What sorts of platforms are you interested in? Windows is my first target because that's where many game developers live. > I would suggest that a higher-level grouping mechanism of resource groups might be helpful - as in "I need this directory" rather than "I need this file". Better still, I could see "I need all objects in this commit-ish", which would allow a revert operation to succeed or fail atomically while adhering to a lock requirement. > One bit that traditional lock-brokering systems implement involves forcing security attribute changes - so an unlocked file is stored as chmod a-w to prevent accidental modification of lockables, then changing that to chmod ?+w when a lock is acquired. It's not perfect, but does catch a lot of errors. Agreed -- I think this is all up to how the query endpoint and client is designed. 
A couple of different types of clients could be implemented, depending on the policies you want in place. One could have strict security that stored unlocked files with a-w, as mentioned. Another could be a weaker client, and simply warn developers when their current branch is in conflict. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-23 19:53 ` John Austin @ 2018-09-23 19:55 ` John Austin 2018-09-23 20:43 ` Randall S. Becker 2018-09-24 14:01 ` Taylor Blau 2 siblings, 0 replies; 35+ messages in thread From: John Austin @ 2018-09-23 19:55 UTC (permalink / raw) To: Randall Becker Cc: Ævar Arnfjörð Bjarmason, id, Taylor Blau, git, brian m. carlson, Lars Schneider, pastelmobilesuit Regarding integration into LFS, I'd like to build the library in such a way that it would be easy to bundle with LFS (so they could share the same git hooks), but also make it flexible enough to work for other workflows. ^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: Git for games working group 2018-09-23 19:53 ` John Austin 2018-09-23 19:55 ` John Austin @ 2018-09-23 20:43 ` Randall S. Becker 2018-09-24 14:01 ` Taylor Blau 2 siblings, 0 replies; 35+ messages in thread From: Randall S. Becker @ 2018-09-23 20:43 UTC (permalink / raw) To: 'John Austin' Cc: 'Ævar Arnfjörð Bjarmason', id, 'Taylor Blau', git, 'brian m. carlson', 'Lars Schneider', pastelmobilesuit On September 23, 2018 3:54 PM, John Austin wrote: > On Sun, Sep 23, 2018 at 10:57 AM Randall S. Becker > <rsbecker@nexbridge.com> wrote: > > I would even like to help with your effort and have non-unixy platforms I'd > like to do this on. > > Having this separate from git LFS is an even better idea IMO, and I would > suggest implementing this using the same set of build tools that git uses so > that it is broadly portable, unlike git LFS. Glad to help there too. > > Great to hear -- once the code is in a bit better shape I can open it up on > github. Cross platform is definitely one of my focuses. I'm currently > implementing in Rust because it targets the same space as C and has great, > near trivial, cross-platform support. What sorts of platforms are you > interested in? Windows is my first target because that's where many game > developers live. I have looked at porting Rust to my two mid-to-large platforms which do not have a Rust port. I would prefer keeping within what git currently requires without adding dependencies, but I'd be happy to take a Rust prototype and translate it. My need is actually not for gamers, but in similar processes that gamers use. The following dependencies are not available on the two platforms I have in mind: g++ or clang, and cmake (despite efforts by people on the platform to do ports). This puts me in a difficult spot with Rust. I understand you might want to use Rust's implied threading, so I would be willing to do the pthread work to make it happen in C. 
> > I would suggest that a higher-level grouping mechanism of resource groups > might be helpful - as in "I need this directory" rather than "I need this file". > Better still, I could see "I need all objects in this commit-ish", which would > allow a revert operation to succeed or fail atomically while adhering to a lock > requirement. > > One bit that traditional lock-brokering systems implement involves forcing > security attribute changes - so an unlocked file is stored as chmod a-w to > prevent accidental modification of lockables, then changing that to chmod > ?+w when a lock is acquired. It's not perfect, but does catch a lot of errors. > > Agreed -- I think this is all up to how the query endpoint and client is > designed. A couple of different types of clients could be implemented, > depending on the policies you want in place. One could have strict security > that stored unlocked files with a-w, as mentioned. Another could be a > weaker client, and simply warn developers when their current branch is in > conflict. Regards, Randall ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-23 19:53 ` John Austin 2018-09-23 19:55 ` John Austin 2018-09-23 20:43 ` Randall S. Becker @ 2018-09-24 14:01 ` Taylor Blau 2018-09-24 15:34 ` John Austin 2 siblings, 1 reply; 35+ messages in thread From: Taylor Blau @ 2018-09-24 14:01 UTC (permalink / raw) To: John Austin Cc: Randall Becker, Ævar Arnfjörð Bjarmason, id, Taylor Blau, git, brian m. carlson, Lars Schneider, pastelmobilesuit On Sun, Sep 23, 2018 at 12:53:58PM -0700, John Austin wrote: > On Sun, Sep 23, 2018 at 10:57 AM Randall S. Becker > <rsbecker@nexbridge.com> wrote: > > I would even like to help with your effort and have non-unixy platforms I'd like to do this on. > > Having this separate from git LFS is an even better idea IMO, and I would suggest implementing this using the same set of build tools that git uses so that it is broadly portable, unlike git LFS. Glad to help there too. > > Great to hear -- once the code is in a bit better shape I can open it > up on github. Cross platform is definitely one of my focuses. I'm > currently implementing in Rust because it targets the same space as C > and has great, near trivial, cross-platform support. What sorts of > platforms are you interested in? Windows is my first target because > that's where many game developers live. This would likely mean that Git LFS will have to reimplement it, since we strictly avoid using CGo (Go's mechanism to issue function calls to other languages). The upshot is that it likely shouldn't be too much effort for anybody, and the open-source community would get a Go implementation of the API, too. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-24 14:01 ` Taylor Blau @ 2018-09-24 15:34 ` John Austin 2018-09-24 19:58 ` Taylor Blau 0 siblings, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-24 15:34 UTC (permalink / raw) To: Taylor Blau Cc: Randall Becker, Ævar Arnfjörð Bjarmason, id, git, brian m. carlson, Lars Schneider, pastelmobilesuit

Perhaps git-global-graph is a decent name. GGG? G3? :). The structure right now in my head looks a bit like:

Global Graph:
    client - post-commit git hooks to push changes up to the GG
    git server - just the standard git server configuration
    query server - replies with information about the current state of the GG

Locks Pre-Commit:
    client - pre-commit hook that makes requests to the GG query server

For cross-platform compatibility, the Global Graph client and the Locks/Conflicts client are the pieces that need to be usable on all platforms. My goal is to keep these pieces as simple as possible. I'd like to at least start prototyping these in Rust, hopefully in a way that can either be easily ported or easily re-implemented in C later on, once things are feature-frozen.

For LFS, the main points of integration I see are:
-- bundling of packages (optionally install this package with a normal LFS installation)
-- `git lfs locks` integration, i.e. integration with the read-only control of LFS

If we push more of the functionality into the gg query server, the integration with `lfs locks` could be simple enough to be a couple of web requests. That might help avoid integration issues.

> we strictly avoid using CGo

What's the main reason for this? Build system complexity?

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-24 15:34 ` John Austin @ 2018-09-24 19:58 ` Taylor Blau 2018-09-25 4:05 ` John Austin 0 siblings, 1 reply; 35+ messages in thread From: Taylor Blau @ 2018-09-24 19:58 UTC (permalink / raw) To: John Austin Cc: Taylor Blau, Randall Becker, Ævar Arnfjörð Bjarmason, id, git, brian m. carlson, Lars Schneider, pastelmobilesuit On Mon, Sep 24, 2018 at 08:34:44AM -0700, John Austin wrote: > Perhaps git-global-graph is a decent name. GGG? G3? :). The structure > right now in my head looks a bit like: > > Global Graph: > client - post-commit git hooks to push changes up to the GG I'm replying to this part of the email to note that this would cause Git LFS to have to do some extra work, since running 'git lfs install' already writes to .git/hooks/post-commit (ironically, to detect and unlock locks that we should have released). I'm not immediately sure about how we'd resolve this, though I suspect it would look like either of: - Git LFS knows how to install or _append_ hooks to a given location, should one already exist at that path on disk, or - git-global-graph knows how to accommodate Git LFS, and can include a line that calls 'git-lfs-post-commit(1)', perhaps via: $ git global-graph install --git-lfs=$(which git-lfs) or similar. > For LFS, the main points of integration I see are: > -- bundling of packages (optionally install this package with a > normal LFS installation) > -- `git lfs locks` integration, i.e. integration with the read-only > control of LFS Sounds sane to me. > > we strictly avoid using CGo > > What's the main reason for this? Build system complexity? A couple of reasons. CGO is widely considered to be (1) slow and (2) unsafe. For our purposes, this would almost be OK, except that it makes it impossible for me to build cross-platform binaries without the correct compilers installed. 
Today, I build Git LFS for every pair in {Windows, Darwin, Linux, FreeBSD} x {386, amd64} by running 'make release', and using CGO would not allow me to do that. Transitioning from Go to CGO during each call is notoriously expensive, and concedes many of the benefits that led us to choose Go in the first place. (Although now that I write much more C than Go, I don't think I would make the same argument today ;-).) Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-24 19:58 ` Taylor Blau @ 2018-09-25 4:05 ` John Austin 2018-09-25 20:14 ` Taylor Blau 0 siblings, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-25 4:05 UTC (permalink / raw) To: Taylor Blau Cc: Randall Becker, Ævar Arnfjörð Bjarmason, id, git, brian m. carlson, Lars Schneider, pastelmobilesuit On Mon, Sep 24, 2018 at 12:58 PM Taylor Blau <me@ttaylorr.com> wrote: > I'm replying to this part of the email to note that this would cause Git > LFS to have to do some extra work, since running 'git lfs install' > already writes to .git/hooks/post-commit (ironically, to detect and > unlock locks that we should have released). Right, that should have been another bullet point. The fact that there can only be one git hook is.. frustrating. Perhaps, if LFS has an option to bundle global-graph, LFS could merge the hooks when installing? If you instead install global-graph after LFS, I think it should probably attempt something like: -- first move the existing hook to a folder: post-commit.d/ -- install the global-graph hook to post-commit.d/ -- install a new hook at post-commit that simply calls all executables in post-commit.d/ Not sure if this is something that's been discussed, since I know LFS has a similar issue with existing hooks, but might be sensible. ^ permalink raw reply [flat|nested] 35+ messages in thread
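The post-commit.d/ steps proposed above could look roughly like the following. This is a sketch under assumptions, not how Git LFS or the prototype installs hooks: `install_dispatcher` and the `00-existing` file name are made up, the function is not idempotent (running it twice would move the dispatcher itself into post-commit.d/), and installing the global-graph hook into the directory is left to the caller.

```shell
#!/bin/sh
# Hypothetical installer for a chained post-commit hook.
# $1 is the hooks directory, e.g. .git/hooks.
install_dispatcher() {
    hooks_dir=$1
    mkdir -p "$hooks_dir/post-commit.d"

    # Preserve an existing hook (for example the one written by
    # 'git lfs install') by moving it into post-commit.d/.
    if [ -f "$hooks_dir/post-commit" ]; then
        mv "$hooks_dir/post-commit" "$hooks_dir/post-commit.d/00-existing"
    fi

    # Install a new post-commit that runs every executable file in
    # post-commit.d/ in name order and reports the last failure, if any.
    cat > "$hooks_dir/post-commit" <<'EOF'
#!/bin/sh
status=0
for hook in "$(dirname "$0")"/post-commit.d/*; do
    [ -f "$hook" ] && [ -x "$hook" ] || continue
    "$hook" "$@" || status=$?
done
exit "$status"
EOF
    chmod +x "$hooks_dir/post-commit"
}
```

After `install_dispatcher .git/hooks`, a second tool only has to drop its own executable into `.git/hooks/post-commit.d/` (say as `10-global-graph`) instead of overwriting the single post-commit file.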
* Re: Git for games working group 2018-09-25 4:05 ` John Austin @ 2018-09-25 20:14 ` Taylor Blau 0 siblings, 0 replies; 35+ messages in thread From: Taylor Blau @ 2018-09-25 20:14 UTC (permalink / raw) To: John Austin Cc: Taylor Blau, Randall Becker, Ævar Arnfjörð Bjarmason, id, git, brian m. carlson, Lars Schneider, pastelmobilesuit On Mon, Sep 24, 2018 at 09:05:56PM -0700, John Austin wrote: > On Mon, Sep 24, 2018 at 12:58 PM Taylor Blau <me@ttaylorr.com> wrote: > > I'm replying to this part of the email to note that this would cause Git > > LFS to have to do some extra work, since running 'git lfs install' > > already writes to .git/hooks/post-commit (ironically, to detect and > > unlock locks that we should have released). > > Right, that should have been another bullet point. The fact that there > can only be one git hook is.. frustrating. Sure, I think one approach to dealing with this is to teach Git how to handle multiple hooks for the same phase of hook. I don't know how likely this is in practice to be something that would be acceptable, since it seems to involve much more work than either of our tools learning about the other. > Perhaps, if LFS has an option to bundle global-graph, LFS could merge > the hooks when installing? Right. I think that (in an ideal world) both tools would know about the other, that way we can not have to worry about who installs what first. > If you instead install global-graph after LFS, I think it should > probably attempt something like: > -- first move the existing hook to a folder: post-commit.d/ > -- install the global-graph hook to post-commit.d/ > -- install a new hook at post-commit that simply calls all > executables in post-commit.d/ > > Not sure if this is something that's been discussed, since I know LFS > has a similar issue with existing hooks, but might be sensible. Yeah, I think that that would be fine, too. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-23 17:56 ` Randall S. Becker 2018-09-23 19:53 ` John Austin @ 2018-09-24 13:59 ` Taylor Blau 1 sibling, 0 replies; 35+ messages in thread From: Taylor Blau @ 2018-09-24 13:59 UTC (permalink / raw) To: Randall S. Becker Cc: 'John Austin', 'Ævar Arnfjörð Bjarmason', id, 'Taylor Blau', git, 'brian m. carlson', 'Lars Schneider', pastelmobilesuit On Sun, Sep 23, 2018 at 01:56:37PM -0400, Randall S. Becker wrote: > On September 23, 2018 1:29 PM, John Austin wrote: > > I've been putting together a prototype file-locking implementation for a > > system that plays better with git. What are everyone's thoughts on > > something like the following? I'm tentatively labeling this system git-sync or > > sync-server. There are two pieces: > > > > 1. A centralized repository called the Global Graph that contains the union git > > commit graph for local developer repos. When Developer A makes a local > > commit on branch 'feature', git-sync will automatically push that new commit > > up to the global server, under a name-spaced > > branch: 'developera_repoabcdef/feature'. This can be done silently as a > > force push, and shouldn't ever interrupt the developer's workflow. > > Simple http queries can be made to the Global Graph, such as "Which > > commits descend from commit abcdefgh?" > > > > 2. A client-side tool that queries the Global Graph to determine when your > > current changes are in conflict with another developer. It might ask "Are > > there any commits I don't have locally that modify lockable_file.bin?". This > > could either be on pre-commit, or for more security, be part of a read-only > > marking system ala Git LFS. There wouldn't be any "lock" per se, rather, the > > client could refuse to modify a file if it found other commits for that file in the > > global graph. > > > > The key here is the separation of concerns. The Global Graph is fairly > > dimwitted -- it doesn't know anything about file locking. 
But it provides a > > layer of information from which we can implement file locking on the client > > side (or perhaps other interesting systems). > > > > Thoughts? > > I'm encouraged of where this is going. I might suggest "sync" is the > wrong name here, with "mutex" being slightly better - I would even > like to help with your effort and have non-unixy platforms I'd like to > do this on. > > Having this separate from git LFS is an even better idea IMO, and I > would suggest implementing this using the same set of build tools that > git uses so that it is broadly portable, unlike git LFS. Glad to help > there too. I think that this is the way that we would prefer it, too. Ideally users outside of those who have Git LFS installed or those that are regular users of it should be able to interoperate with those using the global graph. We're thinking a lot about what should go into the next major version of Git LFS, v3.0.0, and this seems a good candidate to me. We'd also want to figure out how to transition v2.0.0-era locks into the new global graph, but that seems a topic for a later discussion. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
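The client-side question John describes ("are there any commits I don't have locally that modify lockable_file.bin?") can be approximated with stock Git. A minimal sketch, assuming the Global Graph is stood in for by whatever remote-tracking refs the clone has already fetched; the function names are hypothetical, and the proposed HTTP query interface is not modeled here.

```shell
#!/bin/sh
# Sketch: list commits known to the local clone (via refs/remotes/*) that are
# not reachable from HEAD and that modify the given path.
unseen_commits_touching() {
    path=$1
    # --remotes: start from every remote-tracking ref
    # --not HEAD: exclude everything already reachable from the current branch
    git rev-list --remotes --not HEAD -- "$path"
}

# Return non-zero (and warn) if someone else has changes to the path that
# the current branch does not contain yet.
check_lockable() {
    path=$1
    unseen=$(unseen_commits_touching "$path")
    if [ -n "$unseen" ]; then
        echo "warning: $path is modified by commits not in HEAD:" >&2
        echo "$unseen" >&2
        return 1
    fi
}
```

Calling check_lockable from a pre-commit hook would give the advisory behavior described above; a read-only marking scheme would need more than this.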
* Re: Git for games working group 2018-09-14 19:00 ` Taylor Blau 2018-09-14 21:09 ` John Austin @ 2018-09-14 21:13 ` John Austin 2018-09-16 7:56 ` David Aguilar 1 sibling, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-14 21:13 UTC (permalink / raw) To: me; +Cc: git, sandals, larsxschneider, pastelmobilesuit Hey Taylor, Great to have your support! I think LFS has done a great job so far solving the large file issue. I've been working myself on strategies for handling binary conflicts, and particularly how to do it in a git-friendly way (i.e. avoiding as much centralization as possible and playing into the commit/branching model of git). I've got to a loose design that I like, but it'd be good to get some feedback, as well as hearing what other game devs would want in a binary conflict system. - John On Fri, Sep 14, 2018 at 12:00 PM Taylor Blau <me@ttaylorr.com> wrote: > > Hi John, > > On Fri, Sep 14, 2018 at 10:55:39AM -0700, John Austin wrote: > > Is anyone interested in contributing/offering insights? I suspect most > > folks here are git users as is, but if you know someone stuck on > > Perforce, I'd love to chat with them! > > I'm thrilled that other folks are interested in this, too. I'm not a > video game developer myself, but I am the maintainer of Git LFS. If > there's a capacity in which I could be useful to this group, I'd be more > than happy to offer myself in that capacity. > > I'm cc-ing in brian carlson, Lars Schneider, and Preben Ingvaldsen on > this email, too, since they all serve on the core team of the project. > > Thanks, > Taylor > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-14 21:13 ` John Austin @ 2018-09-16 7:56 ` David Aguilar 2018-09-17 13:48 ` Taylor Blau 0 siblings, 1 reply; 35+ messages in thread From: David Aguilar @ 2018-09-16 7:56 UTC (permalink / raw) To: John Austin; +Cc: me, git, sandals, larsxschneider, pastelmobilesuit On Fri, Sep 14, 2018 at 02:13:28PM -0700, John Austin wrote: > Hey Taylor, > > Great to have your support! I think LFS has done a great job so far > solving the large file issue. I've been working myself on strategies > for handling binary conflicts, and particularly how to do it in a > git-friendly way (ie. avoiding as much centralization as possible and > playing into the commit/branching model of git). I've got to a loose > design that I like, but it'd be good to get some feedback, as well as > hearing what other game devs would want in a binary conflict system. > > - John Hey John, thanks for LFS, and thanks to Taylor for bringing up this topic. Regarding file locking, the gitolite docs are insightful: http://gitolite.com/gitolite/locking/index.html File locking is how P4 handles binary conflicts. It's actually conflict prevention -- the locks prevent users from stepping on each other without needing to actually talk to each other. (I've always believed that this is actually a social problem (not a technical one) that is best served by better communication, but there's no doubt that having a technical guard in place is useful in many scenarios.) From the POV of using Git as a P4 replacement, the locking support in git-lfs seems like a fine solution to prevent binary conflicts. https://github.com/git-lfs/git-lfs/wiki/File-Locking Are there any missing features that would help improve LFS solution? Locking is just one aspect of binary conflicts. In a lock-free world, another aspect is tooling around dealing with actual conflicts. It seems like the main challenges there are related to introspection of changes and mechanisms for combining changes. 
Combining changes is inherently file-format specific, and I suspect that native authoring tools are best used in those scenarios. Maybe LFS can help deal with binary conflicts by having short and sweet ways to grab the "base", "their" and "our" versions of the conflict files. Example: git lfs checkout --theirs --to theirs.wav conflict.wav git lfs checkout --ours --to ours.wav conflict.wav git lfs checkout --base --to base.wav conflict.wav Then the user can use {ours,theirs,base}.wav to produce the resolved result using their usual authoring tools. From the plumbing perspective, we already have the tools to do this today, but they're not really user-friendly because they require the user to use "git cat-file --filters --path=..." and redirect the output to get at their changes. Not sure if git-lfs is the right place for that kind of helper wrapper command, but it's not a bad place for it either. That said, none of these are user-friendly for non-Gits that might be intimidated by a command-line. Is there anything we could add to git-cola to help? Being able to save the different conflicted index stages to separately named files seems like an obvious feature that would help users when confronted with a binary conflict. With LFS and the ongoing work related to MVFS, shallow clone, and partial checkout, the reasons to use P4 over Git are becoming less and less compelling. It'd be great to polish the game asset workflows further so that we can have a cohesive approach to doing game asset development using Git that is easy enough for non-technical users to use and understand. I mention git-cola because it's a Git porcelain that already has git-lfs support and I'm very much in favor of improving workflows related to interacting with LFS, large files, repos, and binary content. Are there other rough edges around (large) binary files that can be improved? 
One thought that comes to mind is diffing -- I imagine that we might want to use different diff tools depending on the file format. Currently git-difftool uses a single tool for all files, but it seems like being able to use different tools, based on the file type, could be helpful. Not sure if difftool is the right place for that, but being able to specify different tools per-file seems to be useful in that scenario. Another avenue that could use help is documentation about suggested workflows. Git's core documentation talks about various large-file-centric features in isolation, but it'd be good to have a single user-centric document (not unlike gitworkflows) to document best practices for dealing with large files, repos, game assets, etc. That alone would help dispel the myth that Git is unsuitable for large repos, large files, and binary content. -- David ^ permalink raw reply [flat|nested] 35+ messages in thread
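The plumbing route David mentions ("git cat-file --filters --path=...") can be wrapped in a few lines. A rough sketch, not the proposed 'git lfs checkout --theirs' interface: save_conflict_stages is a hypothetical name, the output files land in the current directory, and the path is taken relative to the repository root.

```shell
#!/bin/sh
# Sketch: write the base/ours/theirs versions of a conflicted path to separate
# files, passing them through smudge filters (such as Git LFS) on the way out.
# Index stages during a conflict: 1 = base, 2 = ours, 3 = theirs.
save_conflict_stages() {
    path=$1
    ext=${path##*.}
    for pair in 1:base 2:ours 3:theirs; do
        stage=${pair%%:*}
        label=${pair##*:}
        # A stage can be absent (e.g. no base when both sides added the file).
        oid=$(git rev-parse --verify --quiet ":$stage:$path") || continue
        git cat-file --filters --path="$path" "$oid" > "$label.$ext"
    done
}
```

After a conflicted merge, running save_conflict_stages conflict.wav should leave base.wav, ours.wav and theirs.wav ready to open in an authoring tool.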
* Re: Git for games working group 2018-09-16 7:56 ` David Aguilar @ 2018-09-17 13:48 ` Taylor Blau 0 siblings, 0 replies; 35+ messages in thread From: Taylor Blau @ 2018-09-17 13:48 UTC (permalink / raw) To: David Aguilar Cc: John Austin, me, git, sandals, larsxschneider, pastelmobilesuit On Sun, Sep 16, 2018 at 12:56:04AM -0700, David Aguilar wrote: > Combining changes is inherently file-format specific, and I suspect > that native authoring tools are best used in those scenarios. > Maybe LFS can help deal with binary conflicts by having short and sweet > ways to grab the "base", "their" and "our" versions of the conflict > files. > > Example: > > git lfs checkout --theirs --to theirs.wav conflict.wav > git lfs checkout --ours --to ours.wav conflict.wav > git lfs checkout --base --to base.wav conflict.wav > > Then the user can use {ours,theirs,base}.wav to produce the > resolved result using their usual authoring tools. That's a good idea, and I think that it's sensible that we teach Git LFS how to do it. I've opened an issue to that effect in our tracker: https://github.com/git-lfs/git-lfs/issues/3258 > One thought that comes to mind is diffing -- I imagine that we > might want to use different diff tools depending on the file format. > Currently git-difftool uses a single tool for all files, but it seems > like being able to use different tools, based on the file type, could > be helpful. We have had some internal discussion about this. I think that we had landed on something similar to: 1. Teach .gitattributes a new mergetool= attribute, which would specify a reference to a mergetool driver, and 2. Teach .gitconfig about a way to store mergetool drivers, similar to how we name filters today. Upon my re-reading of this proposal, it was suggested that we implement this in terms of 'git lfs mergetool', but I don't see why this wouldn't be a good fit for Git in general. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
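The mergetool= attribute above is a proposal only. For comparison, Git's existing diff= attribute already pairs a path pattern in .gitattributes with a named driver in the config, which is the same shape as the two steps Taylor lists. A minimal sketch of that existing mechanism; the driver name "hexdump", the *.bin pattern, and the use of od(1) as a stand-in converter are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: route *.bin files through a custom diff driver. Run inside a work tree.
setup_hex_diff_driver() {
    # .gitattributes selects the driver by path pattern...
    echo '*.bin diff=hexdump' >> .gitattributes
    # ...and the config says how that driver turns a blob into diffable text.
    git config diff.hexdump.textconv 'od -An -tx1'
}
```

With this in place, 'git diff' on a changed .bin file shows a line-by-line hex dump instead of "Binary files differ". A format-aware converter could be substituted for od in the same slot.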
* Re: Git for games working group 2018-09-14 17:55 Git for games working group John Austin 2018-09-14 19:00 ` Taylor Blau @ 2018-09-14 21:21 ` Ævar Arnfjörð Bjarmason 2018-09-14 23:36 ` John Austin 1 sibling, 1 reply; 35+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-09-14 21:21 UTC (permalink / raw) To: John Austin; +Cc: git On Fri, Sep 14 2018, John Austin wrote: > - improvements to large file management (mostly solved by LFS, GVFS) There's also the nascent "don't fetch all the blobs" work-in-progress clone mode which might be of interest to you: https://blog.github.com/2018-09-10-highlights-from-git-2-19/#partial-clones > - avoiding excessive binary file conflicts (this is one of the big > reasons most studios are on Perforce) Is this just a reference to the advisory locking mode perforce/cvs etc. have or is there something else at play here? ^ permalink raw reply [flat|nested] 35+ messages in thread
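For readers who want to try the partial clone mode Ævar links to, a minimal sketch follows. The wrapper name partial_clone is hypothetical; the --filter=blob:none option is real, but it only works against a server that permits filtering (for a self-hosted repository, uploadpack.allowFilter must be enabled).

```shell
#!/bin/sh
# Sketch: clone commits and trees up front, leaving blobs to be fetched on
# demand (for example when a checkout needs them).
partial_clone() {
    url=$1
    dest=$2
    git clone --filter=blob:none "$url" "$dest"
}
```

The initial checkout still downloads the blobs for the checked-out revision; the saving is in historical versions of large files that are never touched locally.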
* Re: Git for games working group 2018-09-14 21:21 ` Ævar Arnfjörð Bjarmason @ 2018-09-14 23:36 ` John Austin 2018-09-15 16:42 ` Taylor Blau 0 siblings, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-14 23:36 UTC (permalink / raw) To: avarab; +Cc: git > There's also the nascent "don't fetch all the blobs" work-in-progress > clone mode which might be of interest to you: > https://blog.github.com/2018-09-10-highlights-from-git-2-19/#partial-clones Yes! I've been pretty excited about this functionality. It drives a lot of GVFS/VFS for Git under the hood. I think it's a great solution to the repo-size issue. > Is this just a reference to the advisory locking mode perforce/cvs > etc. have or is there something else at play here? Good catch. I actually phrased this precisely to avoid calling it "File Locking". An essential example would be a team of 5 audio designers working together on the SFX for a game. If one designer wants to add a layer of ambience to 40% of the .wav files, they have to coordinate with everyone else on the project manually. Without coordination this developer will clobber any changes made to these files while he worked on them. File Locking is the way that Perforce manages this, where a developer can exclusively block modifications on a set of files across the entire team. File locking is just one solution to the problem. It's also one that doesn't play well with git's decentralized structure and branching model. I would state the problem more generally: Developers need some way to know, as early as possible, if modifying a file will cause conflicts upstream. Optionally this knowledge can block modifying the file directly (if we're certain there's already a conflicting version of the file on a different branch). JA ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-14 23:36 ` John Austin @ 2018-09-15 16:42 ` Taylor Blau 2018-09-16 18:17 ` John Austin 0 siblings, 1 reply; 35+ messages in thread From: Taylor Blau @ 2018-09-15 16:42 UTC (permalink / raw) To: John Austin; +Cc: avarab, git On Fri, Sep 14, 2018 at 04:36:19PM -0700, John Austin wrote: > > There's also the nascent "don't fetch all the blobs" work-in-progress > > clone mode which might be of interest to you: > > https://blog.github.com/2018-09-10-highlights-from-git-2-19/#partial-clones > > Yes! I've been pretty excited about this functionality. It drives a > lot of GVFS/VFS for Git under the hood. I think it's a great solution > to the repo-size issue. Right, though this still subjects the remote copy to all of the difficulty of packing large objects (though Christian's work to support other object database implementations would go a long way to help this). Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-15 16:42 ` Taylor Blau @ 2018-09-16 18:17 ` John Austin 2018-09-16 22:05 ` Jonathan Nieder 0 siblings, 1 reply; 35+ messages in thread From: John Austin @ 2018-09-16 18:17 UTC (permalink / raw) To: Taylor Blau; +Cc: Ævar Arnfjörð Bjarmason, git > Right, though this still subjects the remote copy to all of the > difficulty of packing large objects (though Christian's work to support > other object database implementations would go a long way to help this). Ah, interesting -- I didn't realize this step was part of the bottleneck. I presumed git didn't do much more than perhaps gzip'ing binary files when it packed them up. Or do you mean the growing cost of storing the objects locally as you work? Perhaps that could be solved by allowing the client more control (ie. delete the oldest blobs that exist on the server). ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-16 18:17 ` John Austin @ 2018-09-16 22:05 ` Jonathan Nieder 2018-09-17 13:58 ` Taylor Blau 0 siblings, 1 reply; 35+ messages in thread From: Jonathan Nieder @ 2018-09-16 22:05 UTC (permalink / raw) To: John Austin; +Cc: Taylor Blau, Ævar Arnfjörð Bjarmason, git Hi, On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote: > Taylor Blau wrote: >> Right, though this still subjects the remote copy to all of the >> difficulty of packing large objects (though Christian's work to support >> other object database implementations would go a long way to help this). > > Ah, interesting -- I didn't realize this step was part of the > bottleneck. I presumed git didn't do much more than perhaps gzip'ing > binary files when it packed them up. Or do you mean the growing cost > of storing the objects locally as you work? Perhaps that could be > solved by allowing the client more control (ie. delete the oldest > blobs that exist on the server). John, I believe you are correct. Taylor, can you elaborate about what packing overhead you are referring to? One thing I would like to see in the long run to help Git cope with very large files is adding something similar to bup's "bupsplit" to the packfile format (or even better, to the actual object format, so that it affects object names). In other words, using a rolling hash to decide where to split a blob and use a tree-like structure so that (1) common portions between files can be deduplicated and (2) portions can be hashed in parallel. I haven't heard of these things being the bottleneck for anyone in practice today, though. Thanks, Jonathan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-16 22:05 ` Jonathan Nieder @ 2018-09-17 13:58 ` Taylor Blau 2018-09-17 15:58 ` Jonathan Nieder 0 siblings, 1 reply; 35+ messages in thread From: Taylor Blau @ 2018-09-17 13:58 UTC (permalink / raw) To: Jonathan Nieder Cc: John Austin, Taylor Blau, Ævar Arnfjörð Bjarmason, git On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote: > Hi, > > On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote: > > Taylor Blau wrote: > > >> Right, though this still subjects the remote copy to all of the > >> difficulty of packing large objects (though Christian's work to support > >> other object database implementations would go a long way to help this). > > > > Ah, interesting -- I didn't realize this step was part of the > > bottleneck. I presumed git didn't do much more than perhaps gzip'ing > > binary files when it packed them up. Or do you mean the growing cost > > of storing the objects locally as you work? Perhaps that could be > > solved by allowing the client more control (ie. delete the oldest > > blobs that exist on the server). > > John, I believe you are correct. Taylor, can you elaborate about what > packing overhead you are referring to? Jonathan, you are right. I was also referring to the increased time that Git would spend trying to find good packfile chains with larger, non-textual objects. I haven't done any hard benchmarking work on this, so it may be a moot point. > In other words, using a rolling hash to decide where to split a blob > and use a tree-like structure so that (1) common portions between > files can be deduplicated and (2) portions can be hashed in parallel. I think that this is worth discussing further. Certainly, it would go a good bit of the way to addressing the point that I responded to earlier in this message. Thanks, Taylor ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-17 13:58 ` Taylor Blau @ 2018-09-17 15:58 ` Jonathan Nieder 2018-10-03 12:28 ` Thomas Braun 0 siblings, 1 reply; 35+ messages in thread From: Jonathan Nieder @ 2018-09-17 15:58 UTC (permalink / raw) To: Taylor Blau; +Cc: John Austin, Ævar Arnfjörð Bjarmason, git Taylor Blau wrote: > On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote: > > On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote: > > > Taylor Blau wrote: >>>> Right, though this still subjects the remote copy to all of the >>>> difficulty of packing large objects (though Christian's work to support >>>> other object database implementations would go a long way to help this). >>> >>> Ah, interesting -- I didn't realize this step was part of the >>> bottleneck. I presumed git didn't do much more than perhaps gzip'ing >>> binary files when it packed them up. Or do you mean the growing cost >>> of storing the objects locally as you work? Perhaps that could be >>> solved by allowing the client more control (ie. delete the oldest >>> blobs that exist on the server). >> >> John, I believe you are correct. Taylor, can you elaborate about what >> packing overhead you are referring to? > > Jonathan, you are right. I was also referring to the increased time > that Git would spend trying to find good packfile chains with larger, > non-textual objects. I haven't done any hard benchmarking work on this, > so it may be a moot point. Ah, thanks. See git-config(1): core.bigFileThreshold Files larger than this size are stored deflated, without attempting delta compression. Default is 512 MiB on all platforms. If that's failing on your machine then it would be a bug, so we'd definitely want to know. Jonathan ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Git for games working group 2018-09-17 15:58 ` Jonathan Nieder @ 2018-10-03 12:28 ` Thomas Braun 0 siblings, 0 replies; 35+ messages in thread From: Thomas Braun @ 2018-10-03 12:28 UTC (permalink / raw) To: Jonathan Nieder, Taylor Blau Cc: John Austin, Ævar Arnfjörð Bjarmason, git On 17.09.2018 at 17:58, Jonathan Nieder wrote: [...] > Ah, thanks. See git-config(1): > > core.bigFileThreshold > Files larger than this size are stored deflated, > without attempting delta compression. > > Default is 512 MiB on all platforms. > In addition to core.bigFileThreshold you can also unset the delta attribute for file extensions you don't want to get delta compressed. See "git help attributes". And while you are at it, mark the files as binary so that git diff/log don't have to guess. ^ permalink raw reply [flat|nested] 35+ messages in thread
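Thomas's two suggestions, together with the core.bigFileThreshold setting Jonathan quotes, fit in a couple of lines. A minimal sketch: the function name, the *.wav pattern and the 100m threshold are illustrative assumptions, while the delta attribute, the binary macro and the config key are documented Git features.

```shell
#!/bin/sh
# Sketch: skip delta compression for audio assets and mark them as binary.
# Run inside a work tree.
configure_large_binaries() {
    # "-delta" turns off delta compression for matching paths; the built-in
    # "binary" macro (-diff -merge -text) stops diff/log from guessing.
    echo '*.wav -delta binary' >> .gitattributes
    # Blobs above this size are stored deflated, with no delta attempt at all.
    git config core.bigFileThreshold 100m
}
```

'git check-attr delta diff -- some.wav' can be used afterwards to confirm that both attributes report "unset" for a matching path.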