git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* (no subject)
@ 2020-06-24  0:38 shejan shuza
  2020-06-24  1:31 ` your mail brian m. carlson
  0 siblings, 1 reply; 2+ messages in thread
From: shejan shuza @ 2020-06-24  0:38 UTC (permalink / raw)
  To: git

Hi, I have a repository with about 55GB of contents, with binary files
that are less than 100MB each (so no LFS mode) from a project which
has almost filled up an entire hard drive. I am trying to add all of
the contents to a git repo and push it to GitHub but every time I do

git add .

in the folder with my contents after initializing and setting my
remote, git starts caching all the files to .git/objects, making the
.git folder grow in size rapidly. All the files are binaries, so git
cannot stage changes between versions anyway, so there is no reason to
cache versions.

Is there any way, such as editing the git attributes or changing
something about how files are staged in the git repository, to only
just add indexes or references to files in the repository rather than
cache them into the .git folder, while also being able to push all the
data to GitHub?

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: your mail
  2020-06-24  0:38 shejan shuza
@ 2020-06-24  1:31 ` brian m. carlson
  0 siblings, 0 replies; 2+ messages in thread
From: brian m. carlson @ 2020-06-24  1:31 UTC (permalink / raw)
  To: shejan shuza; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

On 2020-06-24 at 00:38:39, shejan shuza wrote:
> Hi, I have a repository with about 55GB of contents, with binary files
> that are less than 100MB each (so no LFS mode) from a project which
> has almost filled up an entire hard drive. I am trying to add all of
> the contents to a git repo and push it to GitHub but every time I do
> 
> git add .
> 
> in the folder with my contents after initializing and setting my
> remote, git starts caching all the files to .git/objects, making the
> .git folder grow in size rapidly. All the files are binaries, so git
> cannot stage changes between versions anyway, so there is no reason to
> cache versions.

What you're experiencing is normal; storing files in the .git directory
is how Git keeps track of them.  It can't rely on the copies in your
working tree because you can modify those files at any time, and if you
did so, relying on them would corrupt the repository.

Also, note that Git can and does deltify changes between revisions once
the data is packed, regardless of whether the file is binary, but how
well it does so depends on your data.  For example, if it's compressed,
it likely doesn't deltify very well, so storing things like compressed
images or zip files using deflate is generally going to result in a
bloated repository.

However, if you don't need versioning for these files, then you don't
need a Git repository.  Git is a tool for versioning files, not a
general storage mechanism.  You may find a cloud storage bucket or some
other artifact storage service may meet your needs better.

I will also tell you from a practical point of view that almost nobody
(including you) will want to host a 55 GB repository filled with binary
blobs.  Usually repacking these repositories is very expensive,
requiring extensive CPU and memory usage for a prolonged time for little
useful benefit.

> Is there any way, such as editing the git attributes or changing
> something about how files are staged in the git repository, to only
> just add indexes or references to files in the repository rather than
> cache them into the .git folder, while also being able to push all the
> data to GitHub?

This is how Git LFS and similar tools, like git-annex, work.  Git LFS
will create copies of the objects in your .git directory though, at
least until they're pushed to the server, at which point they can be
pruned.  Git LFS has the same limitation as Git here.  I'm less familiar
with git-annex, but it is also a popular choice.

However, as mentioned, it sounds like you don't need versioning at all,
so unless you do, Git with Git LFS will be no more suitable for this
than plain Git.  If that's the case, I encourage you to explore
alternate solutions.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-06-24  1:31 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24  0:38 shejan shuza
2020-06-24  1:31 ` your mail brian m. carlson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).