git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: shejan shuza <shejan0@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: your mail
Date: Wed, 24 Jun 2020 01:31:01 +0000	[thread overview]
Message-ID: <20200624013101.GA6451@camp.crustytoothpaste.net> (raw)
In-Reply-To: <CABO4vy0WTYbeAiyWf7hE_4QurwH_c1wJUdU=A4FSC0uSxiEO7g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

On 2020-06-24 at 00:38:39, shejan shuza wrote:
> Hi, I have a repository with about 55GB of contents, with binary files
> that are less than 100MB each (so no LFS mode) from a project which
> has almost filled up an entire hard drive. I am trying to add all of
> the contents to a git repo and push it to GitHub but every time I do
> 
> git add .
> 
> in the folder with my contents after initializing and setting my
> remote, git starts caching all the files to .git/objects, making the
> .git folder grow in size rapidly. All the files are binaries, so git
> cannot stage changes between versions anyway, so there is no reason to
> cache versions.

What you're experiencing is normal; storing files in the .git directory
is how Git keeps track of them.  It can't rely on the copies in your
working tree because you can modify those files at any time, and if you
did so, relying on them would corrupt the repository.

Also, note that Git can and does deltify changes between revisions once
the data is packed, regardless of whether the file is binary, but how
well it does so depends on your data.  For example, if it's compressed,
it likely doesn't deltify very well, so storing things like compressed
images or zip files using deflate is generally going to result in a
bloated repository.

However, if you don't need versioning for these files, then you don't
need a Git repository.  Git is a tool for versioning files, not a
general storage mechanism.  You may find a cloud storage bucket or some
other artifact storage service may meet your needs better.

I will also tell you from a practical point of view that almost nobody
(including you) will want to host a 55 GB repository filled with binary
blobs.  Usually repacking these repositories is very expensive,
requiring extensive CPU and memory usage for a prolonged time for little
useful benefit.

> Is there any way, such as editing the git attributes or changing
> something about how files are staged in the git repository, to only
> just add indexes or references to files in the repository rather than
> cache them into the .git folder, while also being able to push all the
> data to GitHub?

This is how Git LFS and similar tools, like git-annex, work.  Git LFS
will create copies of the objects in your .git directory though, at
least until they're pushed to the server, at which point they can be
pruned.  Git LFS has the same limitation as Git here.  I'm less familiar
with git-annex, but it is also a popular choice.

However, as mentioned, it sounds like you don't need versioning at all,
so unless you do, Git with Git LFS will be no more suitable for this
than plain Git.  If that's the case, I encourage you to explore
alternate solutions.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

  reply	other threads:[~2020-06-24  1:31 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-24  0:38 shejan shuza
2020-06-24  1:31 ` brian m. carlson [this message]
  -- strict thread matches above, loose matches on Subject: below --
2019-11-20  3:49 Han-Wen Nienhuys
2019-11-20  5:30 ` your mail Taylor Blau
2019-11-20  8:05   ` Christian Couder
2019-07-11 20:11 Robert Morgan
2019-07-11 20:18 ` your mail Kevin Daudt
2019-07-11 20:25   ` Robert Morgan
2019-01-25  9:47 Furkan DURUL
2019-01-25 11:27 ` your mail Kevin Daudt
2017-06-22  9:50 Jessie Hernandez
2017-06-22 12:48 ` your mail Simon Ruderich
2017-06-22 13:35   ` AW: " Patrick Lehmann
2017-06-22 13:47     ` Simon Ruderich
2017-06-22 13:55       ` AW: " Patrick Lehmann
2017-06-22 20:46         ` Simon Ruderich
2017-06-22 21:35           ` Junio C Hamano
2017-06-22 21:58             ` Ævar Arnfjörð Bjarmason
2017-06-22 22:14               ` Junio C Hamano
2017-06-22 23:21               ` Jeff King
2017-06-23  5:23                 ` Junio C Hamano
2017-06-23 16:53                   ` Jeff King
2017-06-23 18:44                     ` Junio C Hamano
2017-06-23  6:58               ` demerphq
2016-04-11 19:04 (unknown), miwilliams
2016-04-11 19:13 ` your mail Jeff King
2013-05-17 18:02 (unknown), ASHISH VERMA
2013-05-21 13:13 ` your mail Magnus Bäck
2012-05-06 14:17 (unknown), Bruce Zu
2012-05-06 17:04 ` your mail Marcus Karlsson
2008-08-13 14:54 (unknown), aneesh.kumar
2008-08-13 15:16 ` your mail Aneesh Kumar K.V
2007-12-05 19:00 [PATCH 0/6] builtin-remote Johannes Schindelin
2007-12-05 19:00 ` (unknown) Johannes Schindelin
2007-12-05 19:01   ` your mail Johannes Schindelin
2007-10-13  4:01 (unknown), Michael Witten
2007-10-13  4:07 ` your mail Jeff King
2006-11-21 22:24 (unknown) Johannes Schindelin
2006-11-22 20:16 ` your mail Davide Libenzi
2006-10-20 14:24 (unknown) andyparkins
2006-10-20 14:42 ` your mail Johannes Schindelin
     [not found] <C8DBC54F2A9BAD4FA7F445CC7ADD963B0232474F@sslmexchange1.paymentech.us>
2006-09-26 19:56 ` Linus Torvalds
2006-05-21 23:53 (unknown) J. Bruce Fields
2006-05-21 23:55 ` your mail J. Bruce Fields
2006-05-22  0:16   ` Junio C Hamano
2006-05-22  1:33     ` J. Bruce Fields
2005-10-05  6:10 (unknown), Willem Swart
2005-10-06 10:52 ` your mail Elfyn McBratney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200624013101.GA6451@camp.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=git@vger.kernel.org \
    --cc=shejan0@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).