git@vger.kernel.org mailing list mirror (one of many)
* git-lfs integration?
@ 2019-01-08 15:16 Harald Dunkel
  2019-01-08 15:45 ` Ævar Arnfjörð Bjarmason
  2019-01-10  3:09 ` brian m. carlson
  0 siblings, 2 replies; 4+ messages in thread
From: Harald Dunkel @ 2019-01-08 15:16 UTC (permalink / raw)
  To: git

Hi folks,

I wonder why git-lfs is needed to efficiently handle large files
in git. Would it be reasonable to integrate this functionality
into the native git?

Please excuse me asking. I read some pretty scary articles about
rewriting history, asking everybody to clone existing repositories
again, and strange errors if git-lfs is *not* installed. Apparently
this is a one-way street, so I didn't dare to install git-lfs yet.


Regards
Harri


* Re: git-lfs integration?
  2019-01-08 15:16 git-lfs integration? Harald Dunkel
@ 2019-01-08 15:45 ` Ævar Arnfjörð Bjarmason
  2019-01-09  6:10   ` Harald Dunkel
  2019-01-10  3:09 ` brian m. carlson
  1 sibling, 1 reply; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-08 15:45 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: git


On Tue, Jan 08 2019, Harald Dunkel wrote:

> I wonder why git-lfs is needed to efficiently handle large files
> in git. Would it be reasonable to integrate this functionality
> into the native git?
>
> Please excuse me asking. I read some pretty scary articles about
> rewriting history, asking everybody to clone existing repositories
> again, and strange errors if git-lfs is *not* installed. Apparently
> this is a one-way street, so I didn't dare to install git-lfs yet.

It depends on what "integrate this" means.

I think it's unlikely that git-lfs would be integrated as-is. There are
various clean/smudge filters like it that do remote downloads (git-annex
is another). Everyone's probably better off if git itself maintains the
infrastructure needed for that ecosystem, and users can vote with their
usage for the one(s) they like.
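
To make the clean/smudge idea concrete, here is a minimal sketch in Python. It is not git-lfs or git-annex code: the `pointer:` format, the in-memory `REMOTE_STORE`, and both function names are hypothetical, chosen only to show the round trip such a filter performs.

```python
import hashlib

# Hypothetical stand-in for a remote object store (an assumption, not a real API).
REMOTE_STORE = {}

# Hypothetical pointer format used only in this sketch.
POINTER_PREFIX = b"pointer:"


def clean(content):
    """Run when a file is staged: stash the real bytes, return a small pointer."""
    digest = hashlib.sha256(content).hexdigest()
    REMOTE_STORE[digest] = content
    return POINTER_PREFIX + digest.encode("ascii") + b"\n"


def smudge(pointer):
    """Run on checkout: turn a pointer back into the real bytes."""
    if not pointer.startswith(POINTER_PREFIX):
        return pointer  # not one of our pointers, pass through unchanged
    digest = pointer[len(POINTER_PREFIX):].strip().decode("ascii")
    return REMOTE_STORE[digest]
```

With a filter like this, git itself only ever stores the short pointer. A real filter is wired up through `.gitattributes` and the `filter.<name>.clean` / `filter.<name>.smudge` configuration keys.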

But in more general terms, the problem of making git natively friendlier
to "large files" is being worked on along multiple fronts.

For one, Microsoft has been upstreaming parts of their GVFS fork. If you
search for "partial clone" in the release notes since 2.16.0 (including
2.20.0) you'll see some of the relevant work. One part of this is the
general ability to have partially available local history, whether
that's skipping (big) blobs, some trees, etc.
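
As an illustration of how that surfaces on the command line, the sketch below builds `git clone` invocations that use the `--filter` option from the partial clone work. The helper name `partial_clone_argv` and the example URL are made up for this note; `blob:none` and `blob:limit=<size>` are filter specs documented for `git clone`. Whether a given server honors them depends on its configuration.

```python
def partial_clone_argv(url, blob_limit=None):
    """Return the argv for a partial clone of `url`.

    blob_limit=None  -> skip all blobs until they are needed (blob:none)
    blob_limit="1m"  -> skip only blobs larger than that size (blob:limit=1m)
    """
    if blob_limit is None:
        filter_spec = "blob:none"
    else:
        filter_spec = f"blob:limit={blob_limit}"
    return ["git", "clone", f"--filter={filter_spec}", url]


# Hypothetical repository URL, for illustration only.
argv = partial_clone_argv("https://example.com/big-repo.git", blob_limit="1m")
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine with a sufficiently recent git installed.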

Another effort has been the introduction of the v2 protocol to Git,
which has happened recently, and is only now starting to get rolled out
at various hosting providers. That in and of itself hasn't helped with
this, but allows for future extensions to the protocol, such as "this is
not the full data, you can find the rest at xyz".

Then there's the "odb" effort, see e.g. here:
https://public-inbox.org/git/20180802061505.2983-1-chriscool@tuxfamily.org/

I think that long-term (5-20yrs) those efforts will probably completely
supplant the approach git-lfs is taking. It's a very useful tool, but
ultimately a bit of a hacky workaround in lieu of addressing fixable
issues in git itself, i.e. native support for partially downloaded
history. But getting to that point will take time & effort.


* Re: git-lfs integration?
  2019-01-08 15:45 ` Ævar Arnfjörð Bjarmason
@ 2019-01-09  6:10   ` Harald Dunkel
  0 siblings, 0 replies; 4+ messages in thread
From: Harald Dunkel @ 2019-01-09  6:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Hi Ævar,

thanks very much for your detailed response. Exactly what I was
looking for.


Regards
Harri


* Re: git-lfs integration?
  2019-01-08 15:16 git-lfs integration? Harald Dunkel
  2019-01-08 15:45 ` Ævar Arnfjörð Bjarmason
@ 2019-01-10  3:09 ` brian m. carlson
  1 sibling, 0 replies; 4+ messages in thread
From: brian m. carlson @ 2019-01-10  3:09 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: git


On Tue, Jan 08, 2019 at 04:16:41PM +0100, Harald Dunkel wrote:
> Hi folks,
> 
> I wonder why git-lfs is needed to efficiently handle large files
> in git. Would it be reasonable to integrate this functionality
> into the native git?

Most of the problems Git has with handling large files aren't really
problems if you have unlimited resources, but they are practical
problems in some situations.

Git doesn't really handle files that exceed memory capacity very well.
In order to deltify files, we need to have them in memory, so the option
is either to not deltify large files and have a lot of storage used, or
deltify and spend a lot of CPU and memory compressing, decompressing,
and deltifying them.
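
A rough way to see this trade-off, using Python's zlib rather than Git's actual delta code (an assumption made purely for illustration): compress a second version of a file on its own, then compress it against the first version used as a preset dictionary. The second result is much smaller, but both versions have to be in memory at once. The sample is kept to about 20,000 bytes because zlib's window is only 32 KiB.

```python
import hashlib
import zlib


def pseudo_random_bytes(n_blocks):
    # Deterministic, poorly compressible data: concatenated SHA-256 digests.
    return b"".join(
        hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(n_blocks)
    )


base = pseudo_random_bytes(625)  # "version 1", 20,000 bytes
edited = base[:10_000] + b"a small edit" + base[10_000:]  # "version 2"

# Option 1: no deltification, version 2 is compressed independently.
standalone_size = len(zlib.compress(edited))

# Option 2: compress version 2 against version 1 (a stand-in for a delta).
compressor = zlib.compressobj(zdict=base)
delta = compressor.compress(edited) + compressor.flush()

# Restoring version 2 requires version 1 to be loaded again.
decompressor = zlib.decompressobj(zdict=base)
restored = decompressor.decompress(delta) + decompressor.flush()

print(standalone_size, len(delta))
```

For files of many gigabytes, the memory and CPU side of Option 2 is exactly the cost described above.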

This means that Git can require a lot of resources to store and repack
large files. This is not only a problem on your system, but on whatever
remote system you host your repos on (your own server, GitHub, GitLab,
etc.). Your host, while probably having more resources than your local
machine, probably also has more repos.

Git LFS makes the trade-off to store files uncompressed and only copy
the needed files from the server to your system. That means that you
don't need to bloat your local clone with files you may never check out,
but you have the downside that your clone isn't necessarily complete.
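
For readers who have not seen one, what Git stores in place of a large file under Git LFS is a small text pointer. The sketch below builds and parses one following the layout in the published Git LFS pointer specification (a `version` line, an `oid sha256:` line, and a `size` line). The helper names are this note's own, and error handling is omitted.

```python
import hashlib

LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content):
    """Build the pointer text that would be committed instead of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_VERSION}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(text):
    """Split a pointer into its fields; assumes a well-formed pointer."""
    fields = dict(line.split(" ", 1) for line in text.splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],
        "size": int(fields["size"]),
    }


pointer = make_pointer(b"\0" * 5_000_000)  # a 5 MB file becomes a ~130 byte pointer
info = parse_pointer(pointer)
```

The clone only carries pointers like this one; the content behind `oid` is fetched when a checkout actually needs it, which is why the clone is not necessarily complete.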

I'm a maintainer of Git LFS, and I'm perfectly happy with solutions that
help Git handle large files better. Ævar gave a great explanation of
some of the work that's going on in this regard, and I'm also happy to
hear about other improvements that may come up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
