git@vger.kernel.org mailing list mirror (one of many)
* "Unpacking objects" question
@ 2021-05-02 11:06 Bagas Sanjaya
  2021-05-02 16:55 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Bagas Sanjaya @ 2021-05-02 11:06 UTC (permalink / raw)
  To: Git Users

Hi,

Recently I stumbled upon the git unpack-objects documentation, which says:

> Read a packed archive (.pack) from the standard input, expanding the objects contained within and writing them into the repository in "loose" (one object per file) format.

However, I have some questions:

1. When I do git fetch, what is the threshold/limit for "Unpacking objects"?
    In other words, what is the minimum number of objects for invoking
    "Resolving deltas" instead of "Unpacking objects"?
2. Is the threshold between unpacking objects and resolving deltas
    configurable?
3. Why does Git in some cases unpack objects when it could resolve
    deltas instead?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara


* Re: "Unpacking objects" question
  2021-05-02 11:06 "Unpacking objects" question Bagas Sanjaya
@ 2021-05-02 16:55 ` Jeff King
  2021-05-03  1:22   ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2021-05-02 16:55 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: Git Users

On Sun, May 02, 2021 at 06:06:57PM +0700, Bagas Sanjaya wrote:

> Recently I stumbled upon the git unpack-objects documentation, which says:
> 
> > Read a packed archive (.pack) from the standard input, expanding the objects contained within and writing them into the repository in "loose" (one object per file) format.
> 
> However, I have some questions:
> 
> 1. When I do git fetch, what is the threshold/limit for "Unpacking objects"?
>    In other words, what is the minimum number of objects for invoking
>    "Resolving deltas" instead of "Unpacking objects"?
> 2. Is the threshold between unpacking objects and resolving deltas
>    configurable?

See the fetch.unpackLimit config. The default is 100 objects.
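
A minimal sketch of tweaking that threshold (the values below are only
for illustration; transfer.unpackLimit is the fallback used when
fetch.unpackLimit and receive.unpackLimit are unset):

  # always keep what "git fetch" brings in as a pack, however small
  $ git config fetch.unpackLimit 1

  # explode fetches of fewer than 1000 objects into loose objects
  $ git config fetch.unpackLimit 1000

  # fallback for both fetch.unpackLimit and receive.unpackLimit
  $ git config transfer.unpackLimit 100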

> 3. Why does Git in some cases unpack objects when it could resolve
>    deltas instead?

I don't know if the documentation discusses this tradeoff anywhere, but
off the top of my head:

  - storing packs can be more efficient in disk space (because of deltas
    within the pack, but also fewer inodes for small objects). This
    effect is bigger the more objects you have.

  - storing packs can be less efficient, because thin packs will be
    completed with duplicates of already-stored objects. The overhead is
    bigger the fewer objects you have.

That tradeoff, I suspect, is the main logic driving the object-count
threshold (I didn't dig into the history or the list archive, though;
you might find more discussion there). AFAIK the number 100 doesn't
have any real scientific basis.
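
If you want to see which path a particular fetch took, here is a rough
sketch (assuming a remote called "origin"):

  # loose ("count") vs. packed ("in-pack") objects before the fetch
  $ git count-objects -v

  $ git fetch origin

  # if "count" grew, the objects were exploded loose; a new .pack/.idx
  # pair under .git/objects/pack/ means the fetch was kept as a pack
  $ git count-objects -v
  $ ls .git/objects/pack/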

There are some other subtle effects, too:

  - storing packs from the wire may make git-gc more efficient (you can
    often reuse deltas sent by the other side)

  - storing packs from the wire may produce a worse outcome after
    git-gc, because you are reusing deltas produced by the client for
    their push (who might not have spent as much CPU looking for them as
    you would)

-Peff


* Re: "Unpacking objects" question
  2021-05-02 16:55 ` Jeff King
@ 2021-05-03  1:22   ` Junio C Hamano
  0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2021-05-03  1:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Bagas Sanjaya, Git Users

Jeff King <peff@peff.net> writes:

> I don't know if the documentation discusses this tradeoff anywhere, but
> off the top of my head:
>
>   - storing packs can be more efficient in disk space (because of deltas
>     within the pack, but also fewer inodes for small objects). This
>     effect is bigger the more objects you have.
>
>   - storing packs can be less efficient, because thin packs will be
>     completed with duplicates of already-stored objects. The overhead is
>     bigger the fewer objects you have.

Another original motivation was to avoid ending up with too many
small packs: in the pre-midx world, looking up an object could take
time proportional to the number of packfiles in the repository.
After many small fetches, gc would be able to pack them all into a
single pack.
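
As a rough illustration (the numbers are just the defaults as I
remember them), you can watch the small packs pile up and then
consolidate them:

  # count how many packfiles have accumulated from small fetches
  $ ls .git/objects/pack/*.pack | wc -l

  # consolidate everything into a single pack
  $ git repack -a -d

  # "git gc --auto" does the same once the count exceeds
  # gc.autoPackLimit (50 by default)
  $ git config gc.autoPackLimit 50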

> There are some other subtle effects, too:
>
>   - storing packs from the wire may make git-gc more efficient (you can
>     often reuse deltas sent by the other side)

 - storing and using packs that came from the wire may not have as
   good locality among objects, especially when the other side is a
   server that is optimized to reduce outbound network bandwidth
   (read: size) and its own processing cycles (read: object reuse
   from its packs).  Local packing has a dedicated phase to reorder
   the objects so that related ones sit close to each other, but the
   "server" side has no incentive to optimize for that.

>   - storing packs from the wire may produce a worse outcome after
>     git-gc, because you are reusing deltas produced by the client for
>     their push (who might not have spent as much CPU looking for them as
>     you would)
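
If the deltas and ordering that came over the wire turn out to be
poor, one way out (just a sketch; tune --window/--depth to taste) is
to repack from scratch instead of reusing them:

  # rewrite all packs, recomputing deltas rather than reusing the
  # ones received over the wire (this can take a while)
  $ git repack -a -d -f

  # or let gc do it with a more exhaustive delta search
  $ git gc --aggressive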

