git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com,
	git@jeffhostetler.com, peff@peff.net, gitster@pobox.com,
	Johannes.Shindelin@gmx.de, jrnieder@gmail.com
Subject: Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
Date: Mon, 8 Jan 2018 15:35:59 -0500
Message-ID: <4d7a1fb2-84ca-6bf9-811c-29ad21b4c5a6@gmail.com>
In-Reply-To: <20180108113226.da265814e5c1deea1f8c404d@google.com>

On 1/8/2018 2:32 PM, Jonathan Tan wrote:
> On Sun,  7 Jan 2018 13:14:42 -0500
> Derrick Stolee <stolee@gmail.com> wrote:
>
>> +Design Details
>> +--------------
>> +
>> +- The MIDX file refers only to packfiles in the same directory
>> +  as the MIDX file.
>> +
>> +- A special file, 'midx-head', stores the hash of the latest
>> +  MIDX file so we can load the file without performing a dirstat.
>> +  This file is especially important with incremental MIDX files,
>> +  pointing to the newest file.
> I presume that the actual MIDX files are named by hash? (You might have
> written this somewhere that I somehow missed.)
>
> Also, I notice that in the "Future Work" section, the possibility of
> multiple MIDX files is raised. Could this 'midx-head' file be allowed to
> store multiple such files? That way, we avoid a bit of file format
> churn (in that we won't need to define a new "chunk" in the future).

I hadn't considered this idea, and I like it. I'm not sure it is a
robust solution, though, since an isolated MIDX file doesn't record
that it is meant to be used with other MIDX files, or what order they
should be in. I think the "order" of incremental MIDX files is
important in a few ways (such as the "stable object order" idea).

I will revisit this idea when I come back with the incremental MIDX 
feature. For now, the only reference to "number of base MIDX files" is 
in one byte of the MIDX header. We should consider changing that byte 
for this patch.
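
To make the multi-file 'midx-head' suggestion concrete, here is a
minimal Python sketch. It assumes (hypothetically) that 'midx-head'
holds one hex hash per line with the base file first, and that MIDX
files are named 'midx-<hash>.midx'. The helper name read_midx_head(),
the line format and the file naming are illustrative assumptions, not
something this series defines.

```python
import os


def read_midx_head(pack_dir):
    """Return the MIDX paths named by midx-head, in the order listed.

    Assumed format (hypothetical): one hex hash per line, base file
    first. Returns [] when midx-head is absent, so the caller can fall
    back to the plain .idx files without scanning the directory.
    """
    head = os.path.join(pack_dir, "midx-head")
    try:
        with open(head, "r", encoding="ascii") as f:
            # Skip blank lines; keep the order the file gives us.
            hashes = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []
    # Assumed naming scheme: midx-<hash>.midx next to the packfiles.
    return [os.path.join(pack_dir, "midx-%s.midx" % h) for h in hashes]
```

Keeping the order in 'midx-head' itself is one way to address the
ordering concern above without adding a new chunk to the MIDX format.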

>> +- If a packfile exists in the pack directory but is not referenced
>> +  by the MIDX file, then the packfile is loaded into the packed_git
>> +  list and Git can access the objects as usual. This behavior is
>> +  necessary since other tools could add packfiles to the pack
>> +  directory without notifying Git.
>> +
>> +- The MIDX file should be only a supplemental structure. If a
>> +  user downgrades or disables the `core.midx` config setting,
>> +  then the existing .idx and .pack files should be sufficient
>> +  to operate correctly.
> Let me try to summarize: so, at this point, there are no
> backwards-incompatible changes to the repo disk format. Unupdated code
> paths (and old versions of Git) can just read the .idx and .pack files,
> as always. Updated code paths will look at the .midx and .idx files, and
> will sort them as follows:
>   - .midx files go into a data structure
>   - .idx files not referenced by any .midx files go into the
>     existing packed_git data structure
>
> A writer can either merely write a new packfile (like old versions of
> Git) or write a packfile and update the .midx file, and everything above
> will still work. In the event that a writer deletes an existing packfile
> referenced by a .midx (for example, old versions of Git during a
> repack), we will lose the advantages of the .midx file - we will detect
> that the .midx no longer works when attempting to read an object given
> its information, but in this case, we can recover by dropping the .midx
> file and loading all the .idx files it references that still exist.
>
> As a reviewer, I think this is a very good approach, and this does make
> things easier to review (as opposed to, say, an approach where a lot of
> the code must be aware of .midx files).

Thanks! That is certainly the idea. If you know about MIDX, then you can
benefit from it. If you do not, then you have all the same data
available to you to do your work. Having a MIDX file will not break
other tools (libgit2, JGit, etc.).
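
The fallback behavior summarized above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
function name plan_pack_loading() is hypothetical, and it checks for
missing packs up front, whereas the summary above describes detecting
the problem lazily when an object read fails.

```python
def plan_pack_loading(idx_files, midx_packs):
    """Split pack indexes between the MIDX and the packed_git list.

    idx_files:  names of .idx files found in the pack directory
    midx_packs: names of .idx files the MIDX claims to cover
    Returns (use_midx, packed_git).
    """
    present = set(idx_files)
    covered = set(midx_packs)
    if not covered <= present:
        # A referenced pack was deleted (e.g. by an older Git's
        # repack): drop the MIDX and load every surviving .idx the
        # usual way.
        return False, sorted(present)
    # MIDX is intact: only packs it does not cover go to packed_git.
    return True, sorted(present - covered)
```

With packs a, b and c on disk and a MIDX covering a and b, only c.idx
lands in packed_git. If b.pack disappears, the MIDX is ignored and
both a.idx and c.idx are loaded directly.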

One thing I'd like to determine before this patch goes to v1 is how much 
we should make the other packfile-aware commands also midx-aware. My gut 
reaction right now is to have git-repack call 'git midx --clear' if 
core.midx=true and a packfile was deleted. However, this could easily be
changed to 'git midx --clear' followed by 'git midx --write
--update-head' if midx-head exists.
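
As a rough sketch of the two options, the decision git-repack would
make could look like the Python below. The function name and the
'rewrite' switch are hypothetical. The 'git midx' invocations are the
ones proposed in this RFC series, and the sketch only builds the
command lists without running them.

```python
def midx_commands_after_repack(core_midx, pack_deleted,
                               midx_head_exists, rewrite=False):
    """Return the 'git midx' commands to run after a repack.

    Nothing is needed unless core.midx is enabled and the repack
    deleted a packfile. 'rewrite' selects the second option: clear,
    then write a fresh MIDX if midx-head existed beforehand.
    """
    if not (core_midx and pack_deleted):
        return []
    cmds = [["git", "midx", "--clear"]]
    if rewrite and midx_head_exists:
        cmds.append(["git", "midx", "--write", "--update-head"])
    return cmds
```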

Thanks,
-Stolee

