git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: Ben Peart <peartben@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, benpeart@microsoft.com,
	pclouds@gmail.com, christian.couder@gmail.com,
	git@jeffhostetler.com
Subject: Re: [PATCH v2 1/1] sha1_file: Add support for downloading blobs on demand
Date: Mon, 17 Jul 2017 16:24:31 -0700
Message-ID: <20170717162431.1e11737d@twelve2.svl.corp.google.com>
In-Reply-To: <d3f1f884-7b8a-885f-47cb-eca2b8ef0ecf@gmail.com>

On Mon, 17 Jul 2017 16:09:17 -0400
Ben Peart <peartben@gmail.com> wrote:

> > Is this change meant to ensure that Git code that operates on loose
> > objects directly (bypassing storage-agnostic functions such as
> > sha1_object_info_extended() and has_sha1_file()) still works? If yes,
> > this patch appears incomplete (for example, read_loose_object() needs to
> > be changed too), and this seems like a difficult task - in my patch set
> > [1], I ended up deciding to create a separate type of storage and
> > instead looked at the code that operates on *packed* objects directly
> > (because there were fewer such methods) to ensure that they would work
> > correctly in the presence of a separate type of storage.
> > 
> 
> Yes, with this set of patches, we've been running successfully on 
> completely sparse clones (no commits, trees, or blobs) for several 
> months.  read_loose_object() is only called by fsck when it is 
> enumerating existing loose objects, so it does not need to be updated.
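To make the distinction concrete, here is a toy model in C. All names are hypothetical, and it is not Git's actual object store. A storage-agnostic lookup, playing the role that has_sha1_file() plays, can fall back to an on-demand fetch. Code that inspects loose storage directly never triggers that fetch, which is why each direct loose-object reader has to be audited separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy object store: IDs are short strings rather than real
 * SHA-1s, and each "store" is just a fixed-size array. */
#define MAX_OBJS 8

struct toy_store {
    const char *ids[MAX_OBJS];
    int count;
};

static bool toy_store_has(const struct toy_store *s, const char *id)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->ids[i], id) == 0)
            return true;
    return false;
}

static void toy_store_add(struct toy_store *s, const char *id)
{
    if (s->count < MAX_OBJS && !toy_store_has(s, id))
        s->ids[s->count++] = id;
}

/* Local loose and packed storage, plus a stand-in for the remote. */
static struct toy_store loose, packed, remote;

/* Hypothetical download hook: copies the object from the "remote" into
 * local loose storage, as an on-demand fetch might. */
static bool toy_fetch(const char *id)
{
    if (!toy_store_has(&remote, id))
        return false;
    toy_store_add(&loose, id);
    return true;
}

/* Storage-agnostic lookup (similar in role to has_sha1_file()): checks
 * every local store, then falls back to fetching. */
static bool toy_has_object(const char *id)
{
    if (toy_store_has(&loose, id) || toy_store_has(&packed, id))
        return true;
    return toy_fetch(id);
}

/* Direct loose-only access, as in code that reads loose objects itself:
 * it never triggers the fetch. */
static bool toy_has_loose_object(const char *id)
{
    return toy_store_has(&loose, id);
}
```

With "blob-a" only on the remote, toy_has_loose_object("blob-a") is false until toy_has_object("blob-a") has fetched it.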

Ah, that's good to know. I think such an analysis (of the other
loose-related functions) in the commit message would be useful, like I
did for the packed-related functions [1].

[1] https://public-inbox.org/git/34efd9e9936fdab331655f5a33a098a72dc134f4.1499800530.git.jonathantanmy@google.com/

> We have a few thousand developers making ~100K commits per week, so in 
> our particular usage, I'm fairly confident it works correctly.  That 
> said, it is possible there is some code path I've missed. :)

I think that code paths like the history-searching ones ("git log -S",
for example) should still work offline if possible - one of my ideas is
to have these commands take a size threshold parameter so that we do not
need to fetch large blobs during the invocation. (Hence my preference
for size information to be already available to the repo.)
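A minimal sketch of the size-threshold idea, assuming the repo records each blob's size even when its content is missing. The struct, enum, function, and parameter names are hypothetical and do not correspond to an existing Git option. A missing blob is fetched only if its known size is at or below the threshold, and is skipped otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-blob metadata in a partial clone: whether the content
 * is present locally, and its size if the repo already knows it. */
struct blob_info {
    bool present;
    bool size_known;
    size_t size;
};

enum search_action {
    SEARCH_LOCAL,  /* content is local: search it */
    SEARCH_FETCH,  /* missing but small enough: fetch, then search */
    SEARCH_SKIP    /* missing and too large, or size unknown: skip */
};

/* Hypothetical policy for a history-searching command such as
 * "git log -S", given a maximum size of blob it may fetch. */
static enum search_action choose_action(const struct blob_info *b,
                                        size_t max_fetch_size)
{
    if (b->present)
        return SEARCH_LOCAL;
    if (!b->size_known)
        return SEARCH_SKIP;  /* threshold cannot be applied without a size */
    if (b->size > max_fetch_size)
        return SEARCH_SKIP;
    return SEARCH_FETCH;
}
```

Skipping blobs of unknown size shows why having size information available locally matters: without it, the threshold cannot be applied at all.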

Aside from that, does fsck of a partial repo work? Ability to fsck seems
quite important (or, at least, useful). I tried updating fsck to support
omitting commits and trees (in addition to blobs), and it seems
relatively involved (although I didn't look into it deeply yet).

(That investigation also made me realize that, in my patch set, I
didn't handle the case where a tag references a blob: fsck doesn't work
when the blob is missing, even if it is promised. That's something for
me to look into.)
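A toy model of the fsck problem, using hypothetical types and names (real fsck operates on Git objects, not strings): a connectivity walk in which a reference to a missing object counts as an error unless that object is promised. Routing every edge through the same check, tag-to-target edges included, is one way to avoid the tag-references-blob gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy object graph: IDs are short strings, and each object
 * lists the IDs it references (commit -> tree/parents, tree -> entries,
 * tag -> target). */
#define MAX_OBJS 16
#define MAX_LINKS 4

enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct toy_obj {
    const char *id;
    enum obj_type type;
    const char *links[MAX_LINKS];
    int nr_links;
};

struct toy_repo {
    struct toy_obj objs[MAX_OBJS];   /* objects present locally */
    int nr_objs;
    const char *promised[MAX_OBJS];  /* IDs the remote promises to serve */
    int nr_promised;
};

static const struct toy_obj *find_obj(const struct toy_repo *r, const char *id)
{
    for (int i = 0; i < r->nr_objs; i++)
        if (strcmp(r->objs[i].id, id) == 0)
            return &r->objs[i];
    return NULL;
}

static bool is_promised(const struct toy_repo *r, const char *id)
{
    for (int i = 0; i < r->nr_promised; i++)
        if (strcmp(r->promised[i], id) == 0)
            return true;
    return false;
}

/* Walk from `id`, counting references to objects that are neither present
 * nor promised. Every edge type goes through the same check. There is no
 * visited set: the graph is assumed acyclic, and a missing object that is
 * referenced twice is counted twice. */
static int count_broken_links(const struct toy_repo *r, const char *id)
{
    const struct toy_obj *o = find_obj(r, id);
    if (!o)
        return is_promised(r, id) ? 0 : 1;
    int broken = 0;
    for (int i = 0; i < o->nr_links; i++)
        broken += count_broken_links(r, o->links[i]);
    return broken;
}
```

A tag pointing at a missing blob is reported as broken until that blob is added to the promised list, just like a commit whose tree was omitted.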


Thread overview: 6+ messages
2017-07-14 13:26 [RFC/PATCH v2 0/1] Add support for downloading blobs on demand Ben Peart
2017-07-14 13:26 ` [PATCH v2 1/1] sha1_file: " Ben Peart
2017-07-14 15:18   ` Christian Couder
2017-07-17 18:06   ` Jonathan Tan
2017-07-17 20:09     ` Ben Peart
2017-07-17 23:24       ` Jonathan Tan [this message]
