git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Andrew Ardill <andrew.ardill@gmail.com>,
	Farshid Zavareh <fhzavareh@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Should I store large text files on Git LFS?
Date: Tue, 25 Jul 2017 14:38:43 -0700
Message-ID: <CAGZ79kaFR0d4Z2kcUawHw8PmHpa9gaj4sBMktZjYoeAc27ywyQ@mail.gmail.com>
In-Reply-To: <20170725211300.vwlpioy5jes55273@sigill.intra.peff.net>

On Tue, Jul 25, 2017 at 2:13 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Jul 25, 2017 at 01:52:46PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > As you can see, core.bigfilethreshold is a pretty blunt instrument. It
>> > might be nice if .gitattributes understood other types of patterns
>> > besides filenames, so you could do something like:
>> >
>> >   echo '[size > 500MB] delta -diff' >.gitattributes
>> >
>> > or something like that. I don't think it's come up enough for anybody to
>> > care too much about it or work on it.
>>
>> But attributes is about paths, at which a blob may or may not exist,
>> so it is a bad fit to add conditionals that are based on sizes and
>> types.
>
> Do attributes _have_ to be about paths? In practice we often use them to
> describe objects, and paths are just the only mechanism we give to refer
> to objects.  But it is not actually a correct or rigorous mechanism in
> some cases.  For example, imagine I have a .gitattributes with:
>
>   foo -delta
>   bar delta
>
> and then imagine I have a tree with both "foo" and "bar" pointing to the
> same blob. When I run pack-objects, it wants to know whether to delta
> the object. What should it do?
>
> The delta decision is really a property of the object. But the only
> mechanism we give for selecting an object is by path, which we know is
> not a one-to-one mapping with objects. So the results you get will
> depend on which name we happened to see the object under first while
> traversing.
>
> I think the case you are getting at is something like clean filters,
> where we might not have an object at all. In that case I would argue
> that a property of an object could never be satisfied (so neither
> "size > 500" nor "size <= 500" could match). Whether object properties
> are meaningful is in the eye of the code that is looking up the value.
> Or more generally, the set of properties to be matched is in the eye of
> the caller. So looking up a clean filter might want to define the size
> property based on the working tree size.
>
> -Peff
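
To make the ambiguity in the quoted foo/bar example concrete, here is a
minimal sketch. It is a toy model, not git's pack-objects code: the
ATTRS table, delta_decisions() and the object id are all made up for
illustration, and treating unlisted paths as "delta allowed" is an
assumption.

```python
# Toy model (hypothetical, not git's implementation): attributes are
# keyed by path, but the delta decision is made once per object.
ATTRS = {"foo": False, "bar": True}  # from "foo -delta" and "bar delta"


def delta_decisions(tree_entries):
    """Return {blob_id: delta_allowed}, deciding by the first path seen."""
    decided = {}
    for path, blob_id in tree_entries:
        # setdefault keeps the first answer; later paths that point to
        # the same blob are ignored. Unlisted paths default to True
        # (an assumption made for this sketch).
        decided.setdefault(blob_id, ATTRS.get(path, True))
    return decided


shared = "blob-1234"  # made-up object id
print(delta_decisions([("foo", shared), ("bar", shared)]))  # foo seen first
print(delta_decisions([("bar", shared), ("foo", shared)]))  # bar seen first
```

The two calls describe the same tree contents, yet the answer for the
shared blob differs with the order in which the names are visited.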

I recall a similar discussion on the different "big repo" approaches.
Looking at the interface of LFS, there are things such as:

  git lfs fetch --recent
  git lfs fetch --all
  git lfs fetch [--exclude] <pathspec>

so LFS provides ways to address objects both by time and by path,
maybe even combined: "I want everything from <pathspec 1> but only
'recent' things from <pathspec 2>".
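
A rough sketch of what such a combined selection could look like. This
is not how git-lfs works internally; wanted(), the rule tuples and the
7-day window are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatch


def wanted(path, committed, now, rules, recent_days=7):
    """Decide whether to fetch an object.

    rules is a list of (glob, recent_only) pairs; the first glob that
    matches the path decides. recent_days is an arbitrary window used
    for this sketch.
    """
    for pattern, recent_only in rules:
        if fnmatch(path, pattern):
            if not recent_only:
                return True
            return now - committed <= timedelta(days=recent_days)
    return False  # no rule mentions this path


# "everything from docs/, but only recent things from assets/"
rules = [("docs/*", False), ("assets/*", True)]
now = datetime(2017, 7, 25)
old = datetime(2016, 1, 1)

print(wanted("docs/manual.txt", old, now, rules))
print(wanted("assets/big.psd", old, now, rules))
print(wanted("assets/big.psd", datetime(2017, 7, 24), now, rules))
```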

attributes can already be queried from pathspecs, and I think when
designing from scratch we might put it the other way round:

    delta:
        bar
        everything <500m
    -delta
        foo
        binaries

So in the far future, attributes may learn about more than just the
pathspecs that we currently use to assign labels; they could
* include size
* use properties derived from the 'file' utility
* be specific about certain objects (historic paths)
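
As a sketch of the "other way round" layout above, labels could map to
a list of selectors, where a selector is either a path pattern or a size
predicate. Everything here is hypothetical: RULES, lookup_delta(), the
"*.bin" stand-in for "binaries", reading "500m" as 500 MiB, and the
first-match ordering with the -delta entries checked first. Passing
size=None models the case where there is no object at all, so no size
predicate can match.

```python
from fnmatch import fnmatch

MB = 1024 * 1024  # assumption: "500m" read as 500 MiB

# Hypothetical rule table: (label, predicate over (path, size)).
# The first matching rule wins; this precedence is an assumption.
RULES = [
    ("-delta", lambda path, size: fnmatch(path, "foo")),
    ("-delta", lambda path, size: fnmatch(path, "*.bin")),  # "binaries"
    ("delta", lambda path, size: fnmatch(path, "bar")),
    ("delta", lambda path, size: size is not None and size < 500 * MB),
]


def lookup_delta(path, size=None):
    """Return the first matching label, or None if no rule matches."""
    for label, matches in RULES:
        if matches(path, size):
            return label
    return None


print(lookup_delta("bar", 10))
print(lookup_delta("foo", 10))
print(lookup_delta("notes.txt", 10 * MB))
print(lookup_delta("notes.txt", 600 * MB))
print(lookup_delta("notes.txt"))  # no object, so size rules cannot match
```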

