git@vger.kernel.org list mirror (unofficial, one of many)
* Why Git LFS is not a built-in feature
@ 2020-11-13  9:45 Alireza
  2020-11-14  0:29 ` brian m. carlson
  0 siblings, 1 reply; 6+ messages in thread
From: Alireza @ 2020-11-13  9:45 UTC (permalink / raw)
  To: git

Currently, having to set up git-lfs on each client and check server
compatibility is a huge barrier to using it in the first place, even
though it is generally good practice to store large files in LFS.

As a consequence, a lot of repositories don't use it when they should.

Is there any reason that we don't have built-in support for such an
important feature?

* Re: Why Git LFS is not a built-in feature
  2020-11-13  9:45 Why Git LFS is not a built-in feature Alireza
@ 2020-11-14  0:29 ` brian m. carlson
  2020-11-14 16:27   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 6+ messages in thread
From: brian m. carlson @ 2020-11-14  0:29 UTC (permalink / raw)
  To: Alireza; +Cc: git

On 2020-11-13 at 09:45:52, Alireza wrote:
> Currently, having to set up git-lfs on each client and check server
> compatibility is a huge barrier to using it in the first place, even
> though it is generally good practice to store large files in LFS.
> 
> As a consequence, a lot of repositories don't use it when they should.
> 
> Is there any reason that we don't have built-in support for such an
> important feature?

There are a couple of reasons that it's not a built-in feature:

* First, there are several options in this space.  Git LFS is one,
  git-annex is another, and some people prefer to store large objects in
  the repository and use partial clone.  Git, as a project, tries to be
  flexible and meet the needs of various kinds of users without
  privileging one or another external tool.
* Git LFS is a complicated piece of software, and it's currently written
  in Go, unlike most of Git, which is written in C.  Re-implementing it
  in C would be burdensome, and there's little interest in maintaining Go
  software in the Git project.
* Git LFS uses a different protocol from Git, requiring additional
  configuration and a separate server-side component.
* The smudge and clean filter approach has some limitations, among them
  that users who don't have the external filter installed can commit
  uncleaned objects (the raw file contents rather than the pointer file),
  which then cause those files to show as modified in the working tree,
  even after git reset --hard.
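To make the last point concrete, here is a minimal Python sketch of the clean-filter idea. It assumes the pointer layout from the Git LFS v1 pointer specification (a version line, an oid line, and a size line); the function names are hypothetical and this is not Git LFS's actual implementation. appears_modified is a deliberately simplified model of the comparison git status performs once the filter is installed.

```python
import hashlib

# First line of every pointer, per the Git LFS v1 pointer specification.
LFS_VERSION_LINE = b"version https://git-lfs.github.com/spec/v1\n"


def clean(content: bytes) -> bytes:
    """Sketch of a clean filter: replace file content with a small pointer."""
    oid = hashlib.sha256(content).hexdigest()
    return LFS_VERSION_LINE + f"oid sha256:{oid}\nsize {len(content)}\n".encode("ascii")


def is_pointer(blob: bytes) -> bool:
    """Rough check for whether a committed blob is a pointer, not raw data."""
    return blob.startswith(LFS_VERSION_LINE)


def appears_modified(committed_blob: bytes, working_file: bytes) -> bool:
    """Simplified model of `git status` with the filter installed: the working
    file is run through the clean filter and compared with the committed blob."""
    return clean(working_file) != committed_blob
```

If a pointer was committed, appears_modified(clean(raw), raw) is False. If someone without the filter committed the raw bytes instead, appears_modified(raw, raw) is True however many times the file is checked out again, because the cleaned working file never equals the raw blob in history.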

It's my hope that the built-in support for partial clone will mature to
the point where it's a clear win and the need for external tools isn't
as great, since I think that will ultimately provide a better experience
for users.  Some people are already using it.  So in some sense, we do
have this as a built-in feature, maybe just not the one you were
expecting.
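For reference, a partial clone that leaves large blobs on the server can be requested with git clone's --filter option (for example blob:limit=<size>). The Python wrapper below is only an illustrative sketch: the function names are hypothetical, the URL and destination are placeholders, and the server must permit filtering (for example via uploadpack.allowFilter).

```python
import subprocess


def partial_clone_cmd(url: str, dest: str, blob_limit: str = "1m") -> list:
    """Build a `git clone` command that omits blobs larger than blob_limit.

    Omitted blobs stay on the remote (the "promisor") and are fetched on
    demand when a checkout or another command actually needs them.
    """
    return ["git", "clone", f"--filter=blob:limit={blob_limit}", url, dest]


def partial_clone(url: str, dest: str, blob_limit: str = "1m") -> None:
    # Raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(partial_clone_cmd(url, dest, blob_limit), check=True)
```

Using blob:none instead of a size limit skips all blobs in the initial fetch; a size limit keeps small files local and defers only the large ones.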

Additionally, in many cases, projects can avoid the need for storing
large files at all by using repository best practices, like not storing
build products or binary dependencies in the repository and instead
using an artifact server or a standard packaging system.  If possible,
that will almost always provide a better experience than any solution
for storing large files in the repository.

Finally, if you do want to use an external tool like Git LFS, it's
reasonably straightforward to write a script that installs and configures
the required dependencies for your project on each system so that
everything just works.  One popular location for this kind of script is
script/bootstrap.
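A hypothetical script/bootstrap along these lines is sketched below in Python. It assumes git and git-lfs are the only prerequisites; git lfs install --local and git lfs pull are real Git LFS commands, but the structure of the script is just one possible layout, not a convention from the thread.

```python
import shutil
import subprocess
import sys

# Assumed prerequisites for this hypothetical project.
REQUIRED_TOOLS = ("git", "git-lfs")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def main() -> int:
    missing = missing_tools()
    if missing:
        print("bootstrap: please install: " + ", ".join(missing), file=sys.stderr)
        return 1
    # Configure the LFS smudge/clean filters for this repository only,
    # then download the LFS objects for the current checkout.
    subprocess.run(["git", "lfs", "install", "--local"], check=True)
    subprocess.run(["git", "lfs", "pull"], check=True)
    return 0
```

End the actual script with sys.exit(main()) so that its exit status tells the user (or CI) whether setup succeeded.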
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Re: Why Git LFS is not a built-in feature
  2020-11-14  0:29 ` brian m. carlson
@ 2020-11-14 16:27   ` Konstantin Ryabitsev
  2020-11-14 18:20     ` Ævar Arnfjörð Bjarmason
  2020-11-14 19:15     ` Why Git LFS is not a built-in feature brian m. carlson
  0 siblings, 2 replies; 6+ messages in thread
From: Konstantin Ryabitsev @ 2020-11-14 16:27 UTC (permalink / raw)
  To: brian m. carlson, Alireza, git

On Sat, Nov 14, 2020 at 12:29:02AM +0000, brian m. carlson wrote:
> Additionally, in many cases, projects can avoid the need for storing
> large files at all by using repository best practices, like not storing
> build products or binary dependencies in the repository and instead
> using an artifact server or a standard packaging system.  If possible,
> that will almost always provide a better experience than any solution
> for storing large files in the repository.

Well, I would argue that if the goal is ongoing archival and easy
replication, then storing objects in a repository like git makes a lot
more sense than keeping them on a central server that may or may not be
there a few years down the line. Having large file support native in git
is a laudable goal and I quite often wish that it existed.

-K

* Re: Why Git LFS is not a built-in feature
  2020-11-14 16:27   ` Konstantin Ryabitsev
@ 2020-11-14 18:20     ` Ævar Arnfjörð Bjarmason
  2020-11-18 10:20       ` Partial clone demo for large files (Re: Why Git LFS is not a built-in feature) Christian Couder
  2020-11-14 19:15     ` Why Git LFS is not a built-in feature brian m. carlson
  1 sibling, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-14 18:20 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: brian m. carlson, Alireza, git


On Sat, Nov 14 2020, Konstantin Ryabitsev wrote:

> On Sat, Nov 14, 2020 at 12:29:02AM +0000, brian m. carlson wrote:
>> Additionally, in many cases, projects can avoid the need for storing
>> large files at all by using repository best practices, like not storing
>> build products or binary dependencies in the repository and instead
>> using an artifact server or a standard packaging system.  If possible,
>> that will almost always provide a better experience than any solution
>> for storing large files in the repository.
>
> Well, I would argue that if the goal is ongoing archival and easy
> replication, then storing objects in a repository like git makes a lot
> more sense than keeping them on a central server that may or may not be
> there a few years down the line. Having large file support native in git
> is a laudable goal and I quite often wish that it existed.

That native support does exist right now in the form of partial clones,
the packfile-uris support, core.bigFileThreshold, etc.

It's got a lot of rough edges currently, but if it's something you're
interested in you should try it out and see if the subset of features
that works well now is something that would work for you.
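As a rough illustration of one of these knobs: core.bigFileThreshold (set with, e.g., git config core.bigFileThreshold 100m) makes Git store blobs above the given size deflated but without attempting delta compression, and its documented default is 512 MiB. The Python sketch below is a simplified, hypothetical model of how such a value, with Git's binary k/m/g suffixes, maps to a cutoff; it is not Git's own parser.

```python
import re
from typing import Optional

# Documented default for core.bigFileThreshold: 512 MiB.
DEFAULT_BIG_FILE_THRESHOLD = 512 * 1024 * 1024

# Git's size suffixes are binary multiples and case-insensitive.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_git_size(value: str) -> int:
    """Parse a size such as '512m', '100k' or '1G' into bytes (simplified)."""
    match = re.fullmatch(r"(\d+)([kmgKMG]?)", value.strip())
    if match is None:
        raise ValueError(f"not a valid size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def exceeds_big_file_threshold(size: int, threshold: Optional[str] = None) -> bool:
    """Model: would a blob of `size` bytes be stored without delta compression?"""
    limit = (DEFAULT_BIG_FILE_THRESHOLD if threshold is None
             else parse_git_size(threshold))
    return size > limit
```

Lowering the threshold trades pack size for less CPU and memory during repacking, which mostly matters for repositories that really do hold many large binaries.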

* Re: Why Git LFS is not a built-in feature
  2020-11-14 16:27   ` Konstantin Ryabitsev
  2020-11-14 18:20     ` Ævar Arnfjörð Bjarmason
@ 2020-11-14 19:15     ` brian m. carlson
  1 sibling, 0 replies; 6+ messages in thread
From: brian m. carlson @ 2020-11-14 19:15 UTC (permalink / raw)
  To: Alireza, git

On 2020-11-14 at 16:27:00, Konstantin Ryabitsev wrote:
> On Sat, Nov 14, 2020 at 12:29:02AM +0000, brian m. carlson wrote:
> > Additionally, in many cases, projects can avoid the need for storing
> > large files at all by using repository best practices, like not storing
> > build products or binary dependencies in the repository and instead
> > using an artifact server or a standard packaging system.  If possible,
> > that will almost always provide a better experience than any solution
> > for storing large files in the repository.
> 
> Well, I would argue that if the goal is ongoing archival and easy
> replication, then storing objects in a repository like git makes a lot
> more sense than keeping them on a central server that may or may not be
> there a few years down the line. Having large file support native in git
> is a laudable goal and I quite often wish that it existed.

Sure, and I think that's a different goal from the typical source code
or writing project (documentation, book, etc.) repository.  For example,
one can use Git repositories for backups with the tool bup, which
actually works quite well but isn't a traditional use of Git.

The typical use case is that the user wants to store some reasonably
sized project on their local system and possibly also collaborate with
others on it, and with that goal, it makes sense to keep the repository
from becoming absurdly large, since most developer systems don't have
tons of storage.  As Ævar and I mentioned, there are built-in options
for large files that make this use case more palatable with native Git
tooling.  But for this particular use case, it doesn't logically make
sense to store build assets, whether they're yours or others', in the
project repository.

If your goal is archival and replication, then a tool like bup might
meet your needs, or simply a large repository with many objects.  But in
that context, you'll likely have more storage, CPU, and memory available
to you and the need for large file support will look different (e.g.,
core.bigFileThreshold) or not be present at all.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

* Partial clone demo for large files (Re: Why Git LFS is not a built-in feature)
  2020-11-14 18:20     ` Ævar Arnfjörð Bjarmason
@ 2020-11-18 10:20       ` Christian Couder
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Couder @ 2020-11-18 10:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Alireza
  Cc: Konstantin Ryabitsev, brian m. carlson, git

On Sat, Nov 14, 2020 at 7:25 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sat, Nov 14 2020, Konstantin Ryabitsev wrote:
>
> > On Sat, Nov 14, 2020 at 12:29:02AM +0000, brian m. carlson wrote:
> >> Additionally, in many cases, projects can avoid the need for storing
> >> large files at all by using repository best practices, like not storing
> >> build products or binary dependencies in the repository and instead
> >> using an artifact server or a standard packaging system.  If possible,
> >> that will almost always provide a better experience than any solution
> >> for storing large files in the repository.
> >
> > Well, I would argue that if the goal is ongoing archival and easy
> > replication, then storing objects in a repository like git makes a lot
> > more sense than keeping them on a central server that may or may not be
> > there a few years down the line. Having large file support native in git
> > is a laudable goal and I quite often wish that it existed.
>
> That native support does exist right now in the form of partial clones,
> the packfile-uris support, core.bigFileThreshold, etc.
>
> It's got a lot of rough edges currently, but if it's something you're
> interested in you should try it out and see if the subset of features
> that works well now is something that would work for you.

I have been working on a partial clone demo that stores large files on
an HTTP server:

https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/demo.txt

It has a lot of rough edges indeed. Fetching from the HTTP server
promisor remote is very slow. The last part requires a hacked Git.
The scripts have a lot of bugs and limitations and are not finished
(to say the least).

The goal for now is just to give people (especially product managers,
developers and managers inside GitLab) a glimpse of how it could
work.
