git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Sun Chao <16657101987@163.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Sun Chao via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Martin Fick" <mfick@codeaurora.org>,
	"Son Luong Ngoc" <sluongng@gmail.com>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 23:00:18 +0800	[thread overview]
Message-ID: <5D8CDAF1-256C-4CC3-920C-2063CFACE9BD@163.com> (raw)
In-Reply-To: <YPXluqywHs3u4Qr+@nand.local>



> 2021年7月20日 04:51,Taylor Blau <me@ttaylorr.com> 写道:
> 
> On Mon, Jul 19, 2021 at 07:53:19PM +0000, Sun Chao via GitGitGadget wrote:
>> From: Sun Chao <16657101987@163.com>
>> 
>> Commit 33d4221c79 (write_sha1_file: freshen existing objects,
>> 2014-10-15) avoid writing existing objects by freshen their
>> mtime (especially the packfiles contains them) in order to
>> aid the correct caching, and some process like find_lru_pack
>> can make good decision. However, this is unfriendly to
>> incremental backup jobs or services rely on cached file system
>> when there are large '.pack' files exists.
>> 
>> For example, after packed all objects, use 'write-tree' to
>> create same commit with the same tree and same environments
>> such like GIT_COMMITTER_DATE and GIT_AUTHOR_DATE, we can
>> notice the '.pack' file's mtime changed. Git servers
>> that mount the same NFS disk will re-sync the '.pack' files
>> to cached file system which will slow the git commands.
>> 
>> So if add core.freshenPackfiles to indicate whether or not
>> packs can be freshened, turning off this option on some
>> servers can speed up the execution of some commands on servers
>> which use NFS disk instead of local disk.
> 
> Hmm. I'm still quite unconvinced that we should be taking this direction
> without better motivation. We talked about your assumption that NFS
> seems to be invalidating the block cache when updating the inodes that
> point at those blocks, but I don't recall seeing further evidence.

Yes, these days I'm trying to asking help from our SRE to do tests in our
production environments (where we can get the real traffic report of NFS server),
such like:

- setup a repository witch some large packfiles in a NFS disk
- keep running 'git pack-objects --all --stdout --delta-base-offset >/dev/null' in
multiple git servers that mount the same NFS disk above
- 'touch' the packfiles in another server, and check (a) if the IOPS and IO traffic
of the NFS server changes and (b) if the IO traffic of network interfaces from
the git servers to the NFS server changes

I like to share the data when I receive the reports.

Meanwhile I find the description of the cached file system for NFS Client:
   https://www.ibm.com/docs/en/aix/7.2?topic=performance-cache-file-system
It is mentioned that:

  3. To ensure that the cached directories and files are kept up to date, 
     CacheFS periodically checks the consistency of files stored in the cache.
     It does this by comparing the current modification time to the previous
     modification time.
  4. If the modification times are different, all data and attributes
     for the directory or file are purged from the cache, and new data and
     attributes are retrieved from the back file system.

It looks like reasonable, but perhaps I should check it from my test reports ;)

> 
> Regardless, a couple of idle thoughts:
> 
>> +	if (!core_freshen_packfiles)
>> +		return 1;
> 
> It is important to still freshen the object mtimes even when we cannot
> update the pack mtimes. That's why we return 0 when "freshen_file"
> returned 0: even if there was an error calling utime, we should still
> freshen the object. This is important because it impacts when
> unreachable objects are pruned.
> 
> So I would have assumed that if a user set "core.freshenPackfiles=false"
> that they would still want their object mtimes updated, in which case
> the only option we have is to write those objects out loose.
> 
> ...and that happens by the caller of freshen_packed_object (either
> write_object_file() or hash_object_file_literally()) then calling
> write_loose_object() if freshen_packed_object() failed. So I would have
> expected to see a "return 0" in the case that packfile freshening was
> disabled.
> 
> But that leads us to an interesting problem: how many redundant objects
> do we expect to see on the server? It may be a lot, in which case you
> may end up having the same IO problems for a different reason. Peff
> mentioned to me off-list that he suspected write-tree was overeager in
> how many trees it would try to write out. I'm not sure.

You are right, I haven't thought it in details, if we do not update the mtime
both of packfiles and loose files, 'prune' may delete them by accident.

> 
>> +test_expect_success 'do not bother loosening old objects without freshen pack time' '
>> +	obj1=$(echo three | git hash-object -w --stdin) &&
>> +	obj2=$(echo four | git hash-object -w --stdin) &&
>> +	pack1=$(echo $obj1 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>> +	pack2=$(echo $obj2 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>> +	git -c core.freshenPackFiles=false prune-packed &&
>> +	git cat-file -p $obj1 &&
>> +	git cat-file -p $obj2 &&
>> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
>> +	git -c core.freshenPackFiles=false repack -A -d --unpack-unreachable=1.hour.ago &&
>> +	git cat-file -p $obj1 &&
>> +	test_must_fail git cat-file -p $obj2
>> +'
> 
> I had a little bit of a hard time following this test. AFAICT, it
> proceeds as follows:
> 
>  - Write two packs, each containing a unique unreachable blob.
>  - Call 'git prune-packed' with packfile freshening disabled, then
>    check that the object survived.
>  - Then repack while in a state where one of the pack's contents would
>    be pruned.
>  - Make sure that one object survives and the other doesn't.
> 
> This doesn't really seem to be testing the behavior of disabling
> packfile freshening so much as it's testing prune-packed, and repack's
> `--unpack-unreachable` option. I would probably have expected to see
> something more along the lines of:
> 
>  - Write an unreachable object, pack it, and then remove the loose copy
>    you wrote in the first place.
>  - Then roll the pack's mtime to some fixed value in the past.
>  - Try to write the same object again with packfile freshening
>    disabled, and verify that:
>    - the pack's mtime was unchanged,
>    - the object exists loose again
> 
> But I'd really like to get some other opinions (especially from Peff,
> who brought up the potential concerns with write-tree) as to whether or
> not this is a direction worth pursuing.
> 
> My opinion is that it is not, and that the bizarre caching behavior you
> are seeing is out of Git's control.
OK, I will try this. And I will try to get the test reports from our SRE and
check how the mtime impacts the caches. 

> 
> Thanks,
> Taylor


  parent reply	other threads:[~2021-07-20 15:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 19:01 [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-11 23:44 ` Ævar Arnfjörð Bjarmason
2021-07-12 16:17   ` Sun Chao
2021-07-14  1:28 ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Sun Chao via GitGitGadget
2021-07-14  1:39   ` Ævar Arnfjörð Bjarmason
2021-07-14  2:52     ` Taylor Blau
2021-07-14 16:46       ` Sun Chao
2021-07-14 17:04         ` Taylor Blau
2021-07-14 18:19           ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11             ` Martin Fick
2021-07-14 19:41               ` Ævar Arnfjörð Bjarmason
2021-07-14 20:20                 ` Martin Fick
2021-07-20  6:32                   ` Ævar Arnfjörð Bjarmason
2021-07-15  8:23                 ` Son Luong Ngoc
2021-07-20  6:29                   ` Ævar Arnfjörð Bjarmason
2021-07-14 19:30             ` Taylor Blau
2021-07-14 19:32               ` Ævar Arnfjörð Bjarmason
2021-07-14 19:52                 ` Taylor Blau
2021-07-14 21:40               ` Junio C Hamano
2021-07-15 16:30           ` Sun Chao
2021-07-15 16:42             ` Taylor Blau
2021-07-15 16:48               ` Sun Chao
2021-07-14 16:11     ` Sun Chao
2021-07-19 19:53   ` [PATCH v3] " Sun Chao via GitGitGadget
2021-07-19 20:51     ` Taylor Blau
2021-07-20  0:07       ` Junio C Hamano
2021-07-20 15:07         ` Sun Chao
2021-07-20  6:19       ` Ævar Arnfjörð Bjarmason
2021-07-20 15:34         ` Sun Chao
2021-07-20 15:00       ` Sun Chao [this message]
2021-07-20 16:53         ` Taylor Blau
2021-08-15 17:08     ` [PATCH v4 0/2] " Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5D8CDAF1-256C-4CC3-920C-2063CFACE9BD@163.com \
    --to=16657101987@163.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=mfick@codeaurora.org \
    --cc=peff@peff.net \
    --cc=sluongng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).