git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Sun Chao <16657101987@163.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Martin Fick <mfick@codeaurora.org>,
	Son Luong Ngoc <sluongng@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH v3] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 23:34:25 +0800	[thread overview]
Message-ID: <29757455-B195-49CD-8185-55B64DCEC1D1@163.com> (raw)
In-Reply-To: <87bl6xwlaz.fsf@evledraar.gmail.com>



> 2021年7月20日 14:19,Ævar Arnfjörð Bjarmason <avarab@gmail.com> 写道:
> 
> 
> On Mon, Jul 19 2021, Taylor Blau wrote:
> 
>> On Mon, Jul 19, 2021 at 07:53:19PM +0000, Sun Chao via GitGitGadget wrote:
>>> From: Sun Chao <16657101987@163.com>
>>> 
>>> Commit 33d4221c79 (write_sha1_file: freshen existing objects,
>>> 2014-10-15) avoid writing existing objects by freshen their
>>> mtime (especially the packfiles contains them) in order to
>>> aid the correct caching, and some process like find_lru_pack
>>> can make good decision. However, this is unfriendly to
>>> incremental backup jobs or services rely on cached file system
>>> when there are large '.pack' files exists.
>>> 
>>> For example, after packed all objects, use 'write-tree' to
>>> create same commit with the same tree and same environments
>>> such like GIT_COMMITTER_DATE and GIT_AUTHOR_DATE, we can
>>> notice the '.pack' file's mtime changed. Git servers
>>> that mount the same NFS disk will re-sync the '.pack' files
>>> to cached file system which will slow the git commands.
>>> 
>>> So if add core.freshenPackfiles to indicate whether or not
>>> packs can be freshened, turning off this option on some
>>> servers can speed up the execution of some commands on servers
>>> which use NFS disk instead of local disk.
>> 
>> Hmm. I'm still quite unconvinced that we should be taking this direction
>> without better motivation. We talked about your assumption that NFS
>> seems to be invalidating the block cache when updating the inodes that
>> point at those blocks, but I don't recall seeing further evidence.
> 
> I don't know about Sun's setup, but what he's describing is consistent
> with how NFS works, or can commonly be made to work.
> 
> See e.g. "lookupcache" in nfs(5) on Linux, but also a lot of people use
> some random vendor's proprietary NFS implementation, and commonly tweak
> various options that make it anywhere between "I guess that's not too
> crazy" and "are you kidding me?" levels of non-POSIX compliant.
> 
>> Regardless, a couple of idle thoughts:
>> 
>>> +	if (!core_freshen_packfiles)
>>> +		return 1;
>> 
>> It is important to still freshen the object mtimes even when we cannot
>> update the pack mtimes. That's why we return 0 when "freshen_file"
>> returned 0: even if there was an error calling utime, we should still
>> freshen the object. This is important because it impacts when
>> unreachable objects are pruned.
>> 
>> So I would have assumed that if a user set "core.freshenPackfiles=false"
>> that they would still want their object mtimes updated, in which case
>> the only option we have is to write those objects out loose.
>> 
>> ...and that happens by the caller of freshen_packed_object (either
>> write_object_file() or hash_object_file_literally()) then calling
>> write_loose_object() if freshen_packed_object() failed. So I would have
>> expected to see a "return 0" in the case that packfile freshening was
>> disabled.
>> 
>> But that leads us to an interesting problem: how many redundant objects
>> do we expect to see on the server? It may be a lot, in which case you
>> may end up having the same IO problems for a different reason. Peff
>> mentioned to me off-list that he suspected write-tree was overeager in
>> how many trees it would try to write out. I'm not sure.
> 
> In my experience with NFS the thing that kills you is anything that
> needs to do iterations, i.e. recursive readdir() and the like, or to
> read a lot of data, throughput was excellent. It's why I hacked core
> that core.checkCollisions patch.
I have read your patch, looks like a good idea to reduce the expensive operations
like readdir(). And in my production environments, IOPS stress of the NFS server
bothers me which make the git commands slow.

> 
> Jeff improved the situation I was mainly trying to fix with with the
> loose objects cache. I never got around to benchmarking the two in
> production, and now that setup is at an ex-job...
> 
>>> +test_expect_success 'do not bother loosening old objects without freshen pack time' '
>>> +	obj1=$(echo three | git hash-object -w --stdin) &&
>>> +	obj2=$(echo four | git hash-object -w --stdin) &&
>>> +	pack1=$(echo $obj1 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>>> +	pack2=$(echo $obj2 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>>> +	git -c core.freshenPackFiles=false prune-packed &&
>>> +	git cat-file -p $obj1 &&
>>> +	git cat-file -p $obj2 &&
>>> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
>>> +	git -c core.freshenPackFiles=false repack -A -d --unpack-unreachable=1.hour.ago &&
>>> +	git cat-file -p $obj1 &&
>>> +	test_must_fail git cat-file -p $obj2
>>> +'
>> 
>> I had a little bit of a hard time following this test. AFAICT, it
>> proceeds as follows:
>> 
>>  - Write two packs, each containing a unique unreachable blob.
>>  - Call 'git prune-packed' with packfile freshening disabled, then
>>    check that the object survived.
>>  - Then repack while in a state where one of the pack's contents would
>>    be pruned.
>>  - Make sure that one object survives and the other doesn't.
>> 
>> This doesn't really seem to be testing the behavior of disabling
>> packfile freshening so much as it's testing prune-packed, and repack's
>> `--unpack-unreachable` option. I would probably have expected to see
>> something more along the lines of:
>> 
>>  - Write an unreachable object, pack it, and then remove the loose copy
>>    you wrote in the first place.
>>  - Then roll the pack's mtime to some fixed value in the past.
>>  - Try to write the same object again with packfile freshening
>>    disabled, and verify that:
>>    - the pack's mtime was unchanged,
>>    - the object exists loose again
>> 
>> But I'd really like to get some other opinions (especially from Peff,
>> who brought up the potential concerns with write-tree) as to whether or
>> not this is a direction worth pursuing.
>> 
>> My opinion is that it is not, and that the bizarre caching behavior you
>> are seeing is out of Git's control.
> 
> Thanks for this, I found the test hard to follow too, but didn't have
> time to really think about it, this makes sense.
> 
> Back to the topic: I share your sentiment of trying to avoid complexity
> in this area.
> 
> Sun: Have you considered --keep-unreachable to "git repack"? It's
> orthagonal to what you're trying here, but I wonder if being more
> aggressive about keeping objects + some impromevents to perhaps skip
> this "touch" at all if we have that in effect wouldn't be more viable &
> something e.g. Taylor would be more comforable having part of git.git.

Yes, I will try to create some more useful test cases with both `--keep-unreachable`
and `--unpack-unreachable` if I still believe I need to do something with
the mtime, whatever I need to get my test reports first, thanks ;)


  reply	other threads:[~2021-07-20 15:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 19:01 [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-11 23:44 ` Ævar Arnfjörð Bjarmason
2021-07-12 16:17   ` Sun Chao
2021-07-14  1:28 ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Sun Chao via GitGitGadget
2021-07-14  1:39   ` Ævar Arnfjörð Bjarmason
2021-07-14  2:52     ` Taylor Blau
2021-07-14 16:46       ` Sun Chao
2021-07-14 17:04         ` Taylor Blau
2021-07-14 18:19           ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11             ` Martin Fick
2021-07-14 19:41               ` Ævar Arnfjörð Bjarmason
2021-07-14 20:20                 ` Martin Fick
2021-07-20  6:32                   ` Ævar Arnfjörð Bjarmason
2021-07-15  8:23                 ` Son Luong Ngoc
2021-07-20  6:29                   ` Ævar Arnfjörð Bjarmason
2021-07-14 19:30             ` Taylor Blau
2021-07-14 19:32               ` Ævar Arnfjörð Bjarmason
2021-07-14 19:52                 ` Taylor Blau
2021-07-14 21:40               ` Junio C Hamano
2021-07-15 16:30           ` Sun Chao
2021-07-15 16:42             ` Taylor Blau
2021-07-15 16:48               ` Sun Chao
2021-07-14 16:11     ` Sun Chao
2021-07-19 19:53   ` [PATCH v3] " Sun Chao via GitGitGadget
2021-07-19 20:51     ` Taylor Blau
2021-07-20  0:07       ` Junio C Hamano
2021-07-20 15:07         ` Sun Chao
2021-07-20  6:19       ` Ævar Arnfjörð Bjarmason
2021-07-20 15:34         ` Sun Chao [this message]
2021-07-20 15:00       ` Sun Chao
2021-07-20 16:53         ` Taylor Blau
2021-08-15 17:08     ` [PATCH v4 0/2] " Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29757455-B195-49CD-8185-55B64DCEC1D1@163.com \
    --to=16657101987@163.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=mfick@codeaurora.org \
    --cc=peff@peff.net \
    --cc=sluongng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).