git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Sun Chao <16657101987@163.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Sun Chao via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Thu, 15 Jul 2021 00:46:47 +0800	[thread overview]
Message-ID: <ACE7ECBE-0D7A-4FB8-B4F9-F9E32BE2234C@163.com> (raw)
In-Reply-To: <YO5RZ0Wix/K5q53Z@nand.local>



> 2021年7月14日 10:52,Taylor Blau <me@ttaylorr.com> 写道:
> 
> On Wed, Jul 14, 2021 at 03:39:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Hrm, per my v1 feedback (and I'm not sure if my suggestion is even good
>> here, there's others more familiar with this area than I am), I was
>> thinking of something like a *.bump file written via:
>> 
>>    core.packUseBumpFiles=bool
>> 
>> Or something like that, anyway, the edge case in allowing the user to
>> pick arbitrary suffixes is that we'd get in-the-wild user arbitrary
>> configuration squatting on a relatively sensitive part of the object
>> store.
>> 
>> E.g. we recently added *.rev files to go with
>> *.{pack,idx,bitmap,keep,promisor} (and I'm probably forgetting some
>> suffix). What if before that a user had set:
>> 
>>    core.packMtimeSuffix=rev
> 
> I think making the suffix configurable is probably a mistake. It seems
> like an unnecessary detail to expose, but it also forces us to think
> about cases like these where the configured suffix is already used for
> some other purpose.

Thanks, I agree with you and will fix it, such like the *.keep file, we
do not use the suffix configuration to create keep files.

> 
> I don't think that a new ".bump" file is a bad idea, but it does seem
> like we have a lot of files that represent a relatively little amount of
> the state that a pack can be in. The ".promisor" and ".keep" files both
> come to mind here. Some thoughts in this direction:
> 
>  - Combining *all* of the pack-related files (including the index,
>    reverse-index, bitmap, and so on) into a single "pack-meta" file
>    seems like a mistake for caching reasons.
> 
>  - But a meta file that contains just the small state (like promisor
>    information and whether or not the pack is "kept") seems like it
>    could be OK. On the other hand, being able to tweak the kept state
>    by touching or deleting a file is convenient (and having to rewrite
>    a meta file containing other information is much less so).

Yes, read and rewrite a meta file also means we need do lock/unlock, which
may cause inconvenient operations.

> 
> But a ".bump" file does seem like an awkward way to not rely on the
> mtime of the pack itself. And I do think it runs into compatibility
> issues like Ævar mentioned. Any version of Git that includes a
> hypothetical .bump file (or something like it) needs to also update the
> pack's mtime, too, so that old versions of Git can understand it. (Of
> course, that could be configurable, but that seems far too obscure to
> me).

Here we will have a configuration and default is backward compatiblity,
and if user decide to use the '.bump' file, which means he can not use
the old versions of Git, like the Repository Format Version, it is limited.

> 
> Stepping back, I'm not sure I understand why freshening a pack is so
> slow for you. freshen_file() just calls utime(2), and any sync back to
> the disk shouldn't need to update the pack itself, just a couple of
> fields in its inode. Maybe you could help explain further.
> 
> In any case, I couldn't find a spot in your patch that updates the
> packed_git's 'mtime' field, which is used to (a) sort packs in the
> linked list of packs, and (b) for determining the least-recently used
> pack if it has individual windows mmapped.

The reason why we want to avoid freshen the mtime of ".pack" file is to
improve the reading speed of Git Servers.

We have some large repositories in our Git Severs (some are bigger than 10GB),
and we created '.keep' files for large ".pack" files, we want the big files
unchanged to speed up git upload-pack, because in our mind the file system
cache will reduce the disk IO if a file does not changed.

However we find the mtime of ".pack" files changes over time which makes the
file system always reload the big files, that takes a lot of IO time and result
in lower speed of git upload-pack and even further the disk IOPS is exhausted.

> 
> Thanks,
> Taylor
> 



  reply	other threads:[~2021-07-14 16:47 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 19:01 [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-11 23:44 ` Ævar Arnfjörð Bjarmason
2021-07-12 16:17   ` Sun Chao
2021-07-14  1:28 ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Sun Chao via GitGitGadget
2021-07-14  1:39   ` Ævar Arnfjörð Bjarmason
2021-07-14  2:52     ` Taylor Blau
2021-07-14 16:46       ` Sun Chao [this message]
2021-07-14 17:04         ` Taylor Blau
2021-07-14 18:19           ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11             ` Martin Fick
2021-07-14 19:41               ` Ævar Arnfjörð Bjarmason
2021-07-14 20:20                 ` Martin Fick
2021-07-20  6:32                   ` Ævar Arnfjörð Bjarmason
2021-07-15  8:23                 ` Son Luong Ngoc
2021-07-20  6:29                   ` Ævar Arnfjörð Bjarmason
2021-07-14 19:30             ` Taylor Blau
2021-07-14 19:32               ` Ævar Arnfjörð Bjarmason
2021-07-14 19:52                 ` Taylor Blau
2021-07-14 21:40               ` Junio C Hamano
2021-07-15 16:30           ` Sun Chao
2021-07-15 16:42             ` Taylor Blau
2021-07-15 16:48               ` Sun Chao
2021-07-14 16:11     ` Sun Chao
2021-07-19 19:53   ` [PATCH v3] " Sun Chao via GitGitGadget
2021-07-19 20:51     ` Taylor Blau
2021-07-20  0:07       ` Junio C Hamano
2021-07-20 15:07         ` Sun Chao
2021-07-20  6:19       ` Ævar Arnfjörð Bjarmason
2021-07-20 15:34         ` Sun Chao
2021-07-20 15:00       ` Sun Chao
2021-07-20 16:53         ` Taylor Blau
2021-08-15 17:08     ` [PATCH v4 0/2] " Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ACE7ECBE-0D7A-4FB8-B4F9-F9E32BE2234C@163.com \
    --to=16657101987@163.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).