From: Nicolas Pitre <nico@cam.org>
To: Dana How <danahow@gmail.com>
Cc: Jakub Narebski <jnareb@gmail.com>, git@vger.kernel.org
Subject: Re: If you would write git from scratch now, what would you change?
Date: Mon, 26 Nov 2007 15:55:43 -0500 (EST)
Message-ID: <alpine.LFD.0.99999.0711261529080.9605@xanadu.home>
In-Reply-To: <56b7f5510711261217h56214321xb7acd9851b677dd6@mail.gmail.com>
On Mon, 26 Nov 2007, Dana How wrote:
> On Nov 26, 2007 11:52 AM, Nicolas Pitre <nico@cam.org> wrote:
> > On Mon, 26 Nov 2007, Dana How wrote:
> > > Currently data can be quickly copied from pack to pack,
> > > but data cannot be quickly copied blob->pack or pack->blob
> > I don't see why you would need the pack->blob copy normally.
> True, but that doesn't change the main point.
Sure, but let's not go overboard either.
> > > (there was an alternate blob format that supported this,
> > > but it was deprecated). Using the pack format for blobs
> > > would fix this.
> >
> > Then you can do just that for big enough blobs where "big enough" is
> > configurable: encapsulate them in a pack instead of a loose object.
> > Problem solved. Sure you'll end up with a bunch of packs containing
> > only one blob object, but given that those blobs are so large as to be a
> > problem in your work flow when written out as loose objects, then they
> > certainly must be few enough not to cause an explosion in the number of
> > packs.
> Are you suggesting that "git add" create a new pack containing
> one blob when the blob is big enough?
Exactly.
> Re-using (part of) the pack format
> in a blob (or maybe only some blobs) seems like less code change.
Don't know what you mean exactly here, but what I mean is to do
something as simple as:
pretend_sha1_file(...);
add_object_entry(...);
write_pack_file();
when the buffer to make a blob from is larger than a configured
threshold.
> > > It would also mean blobs wouldn't need to
> > > be uncompressed to get the blob type or size I believe.
> >
> > They already don't.
> It looks like sha1_file.c:parse_sha1_header() works on a buffer
> filled in by sha1_file.c:unpack_sha1_header() by calling inflate(), right?
>
> It is true you don't have to uncompress the *entire* blob.
Right. Only the first 16 bytes or so need to be uncompressed.
> > > The equivalent operation in git would require the creation of
> > > the blob, and then of a temporary pack to send to the server.
> > > This requires 3 calls to zlib for each blob, which for very
> > > large files is not acceptable at my site.
> >
> > I currently count 2 calls to zlib, not 3.
> I count 3:
>
> Call 1: git-add calls zlib to make the blob.
>
> Call 2: builtin-pack-objects.c:write_one() calls sha1_file.c:read_sha1_file()
> calls :unpack_sha1_file() calls :unpack_sha1_{header,rest}() calls
> inflate() to get the data from the blob into a buffer.
>
> Call 3: Then write_one() calls deflate to make the new buffer
> to write into the pack. This is all under the "if (!to_reuse) {" path,
> which is active when packing a blob.
Oh, you're right. Somehow I didn't count the needed decompression.
> Remember, I'm comparing "p4 submit file" to
> "git add file"/"git commit"/"git push", which is the comparison
> the users will be making.
>
> On the other hand, I'm looking at code from June;
> but I haven't noticed big changes since then on the list.
>
> Calls 2 and 3 go away if the blob and pack formats were more similar.
... which my suggestion should provide with a minimum of changes, maybe
less than 10 lines of code.
Nicolas