git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index
@ 2019-06-10 23:35 Derrick Stolee via GitGitGadget
From: Derrick Stolee via GitGitGadget @ 2019-06-10 23:35 UTC
  To: git; +Cc: sbeller, peff, jrnieder, avarab, jonathantanmy, Junio C Hamano

The multi-pack-index provides a fast way to find an object among a large
list of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.
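
The selection rule above can be sketched in a few lines of Python. This is
only an illustration of the rule as described, not the C code in midx.c; the
names PackEntry and build_index are hypothetical, and the behavior for packs
with identical modification times is an assumption (the first one seen wins).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackEntry:
    """Hypothetical stand-in for a pack-file known to the index."""
    pack_name: str
    mtime: float  # modification time of the pack-file


def build_index(packs):
    """Map each object id to the single pack the index would reference.

    `packs` maps a PackEntry to the object ids stored in that pack.
    When several packs store the same object, the most recently
    modified pack is kept and the duplicates are ignored.
    """
    index = {}
    for pack, oids in packs.items():
        for oid in oids:
            current = index.get(oid)
            # Assumption: on equal mtimes the first pack seen is kept.
            if current is None or pack.mtime > current.mtime:
                index[oid] = pack
    return index
```

With an old pack holding objects a and b and a newer pack holding b and c,
the resulting index references the old pack only for a.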

Create new subcommands for the multi-pack-index builtin.

 * 'git multi-pack-index expire': If we have a pack-file indexed by the
   multi-pack-index, but all objects in that pack are duplicated in
   more-recently modified packs, then delete that pack (and any others like
   it). Delete the reference to that pack in the multi-pack-index.
   
   
 * 'git multi-pack-index repack --batch-size=<size>': Starting from the oldest
   pack-files covered by the multi-pack-index, find those whose "expected
   size" is below the batch size until we have a collection of packs whose
   expected sizes add up to the batch size. We compute the expected size by
   multiplying the number of referenced objects by the pack-size and
   dividing by the total number of objects in the pack. If the batch-size is
   zero, then select all packs. Create a new pack containing all objects
   that the multi-pack-index references to those packs.
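
The batch selection for 'repack' can be sketched as follows. This is a
minimal Python model of the description above, not the implementation in
midx_repack(); PackInfo, expected_size and select_batch are hypothetical
names, sizes are assumed to be in bytes, and integer division is an
assumption. The text does not say what happens when the expected sizes never
add up to the batch size, so the sketch simply returns what it collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackInfo:
    """Hypothetical summary of one pack covered by the multi-pack-index."""
    name: str
    mtime: float             # modification time; smaller means older
    size_bytes: int          # size of the pack-file on disk
    total_objects: int       # objects stored in the pack
    referenced_objects: int  # objects the multi-pack-index points at here


def expected_size(pack):
    """referenced objects * pack size / total objects in the pack."""
    if pack.total_objects == 0:
        return 0
    return pack.referenced_objects * pack.size_bytes // pack.total_objects


def select_batch(packs, batch_size):
    """Pick the packs whose referenced objects go into the new pack."""
    if batch_size == 0:
        # A batch size of zero selects every pack.
        return list(packs)
    selected = []
    total = 0
    # Walk the packs from oldest to newest.
    for pack in sorted(packs, key=lambda p: p.mtime):
        if total >= batch_size:
            break  # the collected expected sizes reached the batch size
        size = expected_size(pack)
        if size >= batch_size:
            continue  # only packs below the batch size are candidates
        selected.append(pack)
        total += size
    return selected
```

For example, a 1000-byte pack in which 10 of 100 objects are still
referenced has an expected size of 100 bytes, so it counts for far less
against the batch size than its size on disk suggests.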
   
   

This allows us to create a new pattern for repacking objects: run 'repack',
wait until all Git commands that started before that 'repack' have
finished, then run 'expire'. This approach has some advantages over the
existing "repack everything" model:

 1. Incremental. We can repack a small batch of objects at a time, instead
    of repacking all reachable objects. We can also limit ourselves to the
    objects that do not appear in newer pack-files.
    
    
 2. Highly Available. By adding a new pack-file (and not deleting the old
    pack-files) we do not interrupt concurrent Git commands, and do not
    suffer performance degradation. By expiring only pack-files that have no
    referenced objects, we know that Git commands that are doing normal
    object lookups* will not be interrupted.
    
    

 * Note: if someone concurrently runs a Git command that uses
   get_all_packs(), then that command could try to read the pack-files and
   pack-indexes that we are deleting during an expire command. Such commands
   are usually related to object maintenance (e.g. fsck, gc, pack-objects)
   or to less-often-used features (e.g. fast-import, http-backend,
   server-info).
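
The 'expire' rule (drop only packs that the multi-pack-index no longer
references) can be sketched like this. It is a simplified Python model, not
the code in midx.c; plan_expire and its arguments are hypothetical. The
`keep` argument reflects the test in this series that 'expire' respects
.keep files; how those files are detected is left out here.

```python
from collections import Counter


def plan_expire(index, pack_names, keep=frozenset()):
    """Split packs into those to keep and those that can be deleted.

    `index` maps each object id to the name of the one pack the
    multi-pack-index references for it. `pack_names` lists the packs
    covered by the index. `keep` names packs that must never be
    deleted (assumed to correspond to packs with a .keep file).
    """
    # Count how many objects the index references in each pack.
    referenced = Counter(index.values())
    kept, expired = [], []
    for name in pack_names:
        if referenced[name] > 0 or name in keep:
            kept.append(name)
        else:
            # No lookup through the index can land in this pack.
            expired.append(name)
    return kept, expired
```

Because an expired pack has zero referenced objects, a command that resolves
objects through the multi-pack-index never needs to open it, which is why
normal object lookups are not interrupted.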

We are using this approach in VFS for Git to do background maintenance of
the "shared object cache", which is a Git alternate directory filled with
pack-files containing commits and trees. We currently download pack-files on
an hourly basis to keep up-to-date with the central server. The cache
servers supply packs on an hourly and daily basis, so most of the hourly
packs become useless after a new daily pack is downloaded. The 'expire'
command would clear out most of those packs, but many would remain with
fewer than 100 objects still referenced. The 'repack' command (with a batch
size of 1-3 GB, probably) can condense the remaining packs in commands that
run for 1-3 minutes at a time. Since the daily packs range from 100-250 MB,
we will also combine and condense those packs.

This series is the same as v6 of an earlier series [1].

Thanks, -Stolee

[1] https://public-inbox.org/git/pull.92.git.gitgitgadget@gmail.com/T/#u

Derrick Stolee (11):
  repack: refactor pack deletion for future use
  Docs: rearrange subcommands for multi-pack-index
  multi-pack-index: prepare for 'expire' subcommand
  midx: simplify computation of pack name lengths
  midx: refactor permutation logic and pack sorting
  multi-pack-index: implement 'expire' subcommand
  multi-pack-index: prepare 'repack' subcommand
  midx: implement midx_repack()
  multi-pack-index: test expire while adding packs
  midx: add test that 'expire' respects .keep files
  t5319-multi-pack-index.sh: test batch size zero

 Documentation/git-multi-pack-index.txt |  32 +-
 builtin/multi-pack-index.c             |  14 +-
 builtin/repack.c                       |  14 +-
 midx.c                                 | 440 +++++++++++++++++++------
 midx.h                                 |   2 +
 packfile.c                             |  28 ++
 packfile.h                             |   7 +
 t/t5319-multi-pack-index.sh            | 184 +++++++++++
 8 files changed, 602 insertions(+), 119 deletions(-)


base-commit: af96fe3392fb078cb5447bcb94f2ed8d79d0a4a8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-261%2Fderrickstolee%2Fmidx-expire%2Fupstream-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-261/derrickstolee/midx-expire/upstream-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/261
-- 
gitgitgadget

Thread overview: 29+ messages
2019-06-10 23:35 [PATCH 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 01/11] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 02/11] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 03/11] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 04/11] midx: simplify computation of pack name lengths Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 05/11] midx: refactor permutation logic and pack sorting Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 06/11] multi-pack-index: implement 'expire' subcommand Derrick Stolee via GitGitGadget
2019-06-11 18:51   ` Junio C Hamano
2019-06-10 23:35 ` [PATCH 07/11] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 08/11] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 09/11] multi-pack-index: test expire while adding packs Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 10/11] midx: add test that 'expire' respects .keep files Derrick Stolee via GitGitGadget
2019-06-10 23:35 ` [PATCH 11/11] t5319-multi-pack-index.sh: test batch size zero Derrick Stolee via GitGitGadget
2019-06-30 18:57 ` [PATCH] t5319: don't trip over a user name with whitespace Johannes Sixt
2019-06-30 19:48   ` Eric Sunshine
2019-06-30 20:59     ` Johannes Sixt
2019-06-30 22:25       ` Jeff King
2019-07-01  6:33         ` Johannes Sixt
2019-07-01  9:16           ` Jeff King
2019-07-01 11:33             ` SZEDER Gábor
2019-07-01 12:03               ` Derrick Stolee
2019-07-01 12:11         ` Johannes Schindelin
2019-07-01 12:30           ` Derrick Stolee
2019-07-01 18:22             ` Johannes Sixt
2019-07-01 18:47               ` Derrick Stolee
2019-07-01 12:53           ` Jeff King
2019-07-01  8:36       ` SZEDER Gábor
2019-07-01 17:17   ` Andreas Schwab
2019-07-01 19:24     ` Junio C Hamano
