git@vger.kernel.org mailing list mirror (one of many)
From: Farshid Zavareh <fhzavareh@gmail.com>
To: Andrew Ardill <andrew.ardill@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Should I store large text files on Git LFS?
Date: Mon, 24 Jul 2017 13:46:26 +1000
Message-ID: <FF08CB42-35AC-4B97-BB02-2473BEDE66A1@gmail.com>
In-Reply-To: <CAH5451mrL=GE6WrX6juoyGPV6trcQhXXthKhjT2=qCDCiffeeA@mail.gmail.com>

Hi Andrew.

Thanks for your reply.

I'll probably test this myself, but would modifying and committing a 4GB text file actually add 4GB to the repository's size? I anticipate that it won't, since Git keeps track of the changes only, instead of storing a copy of the whole file (whereas this is not the case with binary files, hence the need for LFS).
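
One way to reason about this before testing on a real repository: as far as I understand, Git first stores every committed version of a file as a complete, zlib-compressed blob object, and only delta-compresses objects against each other later, when they are packed. The sketch below is a minimal Python illustration under that assumption. `git_blob` and the generated CSV rows are hypothetical stand-ins, not part of Git, and the compression level Git actually uses may differ.

```python
import hashlib
import zlib


def git_blob(content):
    """Return (object id, zlib-compressed object) for a blob, using the
    same "blob <size>\\0<content>" framing that `git hash-object` hashes."""
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


# Hypothetical stand-in for a large CSV: 200,000 rows, then one row edited.
rows = [f"{i},customer-{i},{i * 7 % 1000}\n" for i in range(200_000)]
v1 = "".join(rows).encode()
rows[100_000] = "100000,customer-100000,EDITED\n"
v2 = "".join(rows).encode()

oid1, loose1 = git_blob(v1)
oid2, loose2 = git_blob(v2)

# A one-row edit yields a different object id, i.e. a second full object.
print("distinct objects:", oid1 != oid2)
# Each version is compressed on its own; neither is stored as a diff here.
print("raw bytes:", len(v1), "compressed v1:", len(loose1), "compressed v2:", len(loose2))
```

If this holds, each commit of the modified file initially adds a full compressed copy, and any further savings depend on how well `git gc` or `git repack` can delta the versions against each other.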

Kind regards,
Farshid

> On 24 Jul 2017, at 12:29 pm, Andrew Ardill <andrew.ardill@gmail.com> wrote:
> 
> Hi Farshid,
> 
> On 24 July 2017 at 12:01, Farshid Zavareh <fhzavareh@gmail.com> wrote:
>> I've been handed over a project that uses Git LFS for storing large CSV files.
>> 
>> My understanding is that the main benefit of using Git LFS is to keep the repository small for binary files, where Git can't keep track of the changes and ends up storing whole files for each revision. For a text file, that problem does not exist to begin with and Git can store only the changes. At the same time, using LFS here is going to make checkouts unnecessarily slow, not to mention the financial cost of storing the whole file for each revision.
>> 
>> Is there something I'm missing here?
> 
> Git LFS gives benefits when working on *large* files, not just large
> *binary* files.
> 
> I can imagine a few reasons for using LFS for some CSV files
> (especially the kinds of files I deal with sometimes!).
> 
> The main one is that many users don't need or want to download the
> large files, or all versions of the large file. Moreover, you probably
> don't care about changes between those files, or there would be so
> many that using the git machinery for comparing them would be
> cumbersome and ineffective.
> 
> For me, if I was storing any CSV file over a couple of hundred
> megabytes I would consider using something like LFS. An example would
> be a large Dun & Bradstreet data file, which I do an analysis on
> every quarter. I want to include the file in the repository, so that
> the analysis can be replicated later on, but I don't want to add 4GB
> of data to the repo every single time the dataset gets updated (also
> every quarter). Storing that in LFS would be a good solution then.
> 
> Regards,
> 
> Andrew Ardill
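
To illustrate what Andrew describes, here is a minimal sketch of what ends up in the Git history when a file is tracked by LFS: a small pointer file rather than the data itself. The pointer layout follows the Git LFS pointer format (spec v1). `lfs_pointer` and the sample snapshot are hypothetical names used for illustration only.

```python
import hashlib


def lfs_pointer(content):
    """Build the text of a Git LFS pointer file (spec v1) for the given bytes."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Hypothetical quarterly snapshot, standing in for a multi-gigabyte data file.
snapshot = b"id,name,score\n" + b"1,example,42\n" * 1000
pointer = lfs_pointer(snapshot)

print(pointer)
print(len(snapshot), "bytes of data,", len(pointer.encode()), "bytes committed to Git")
```

The pointer stays roughly the same size however large the data file grows, so a clone only has to fetch the pointers plus whichever versions of the data are actually checked out.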

