From: <steve.norman@thomsonreuters.com>
To: <peff@peff.net>, <pclouds@gmail.com>
Cc: <git@vger.kernel.org>
Subject: RE: Troubleshoot clone issue to NFS.
Date: Fri, 22 May 2015 08:35:52 +0000 [thread overview]
Message-ID: <7FAE15F0A93C0144AD8B5FBD584E1C551975ADA4@C111KXTEMBX51.ERF.thomson.com> (raw)
In-Reply-To: <20150522071224.GA10734@peff.net>
On Friday, May 22, 2015 @ 8:12 AM Jeff King did scribble:
> > In builtin/index-pack.c, replace the line "collision_test_needed =
> > has_sha1_file(sha1);" with "collision_test_needed = 0;". Security is
> > compromised but for this test it should be ok. Then clone again. I
> > hope the new number gets down close to v1.8.4.1.
>
> Yeah, I think that is a good starting point. I timed a local clone
> before and after 45e8a74; there is a small difference on my system
> (about 5%), but it goes away with your suggestion.
Tested this change on a couple of versions, first of all on the revision
where things go wrong for me:
~ $ bin/git --version
git version 1.8.4.1.g45e8a74.dirty
~ $ time bin/git clone https://github.com/git/git test
<snip>
real 0m7.105s
user 0m9.566s
sys 0m0.989s
~ $ time bin/git clone https://github.com/git/git /sami/test
<snip>
real 0m14.411s
user 0m9.703s
sys 0m1.374s
This is more in line with what I see normally. Also tested on master:
~ $ bin/git --version
git version 2.4.1.217.g6c1249c.dirty
~ $ time bin/git clone https://github.com/git/git test
<snip>
real 0m5.946s
user 0m9.111s
sys 0m1.332s
~ $ time bin/git clone https://github.com/git/git /sami/test
<snip>
real 0m12.344s
user 0m9.187s
sys 0m1.579s
So similar on the latest as well.
> The problem with has_sha1_file() prior to v1.8.4.2 is that it is racy
> with respect to simultaneous operations; we might claim we do not have
> an object, when in fact we do. As you noted, usually has_sha1_file()
> returns true (i.e., we look up objects that we expect to have), and the
> performance impact is minimal.
>
> But for code paths where _not_ having the object is normal, the impact
> is much greater. So I think there are two possibilities for improving
> this:
>
> 1. Find places where we expect the object will not exist (like the
> collision_test check you pointed out) and use a
> "has_sha1_file_fast" that accepts that it may very occasionally
> erroneously return false. In this case it would mean potentially
> skipping a collision check, but I think that is OK. That could have
> security implications, but only if an attacker:
>
> a. has broken sha1 to generate a colliding object
>
> b. can manipulate the victim into repacking in a loop
>
> c. can manipulate the victim into fetching (or receiving a push)
> simultaneously with (b)
>
> at which point they can try to race the repack procedure to add
> their colliding object to the repository. It seems rather unlikely
> (especially part a).
>
> 2. Make reprepare_packed_git() cheaper in the common case that nothing
> has changed. It would probably be enough to stat("objects/pack").
> We know that packfiles themselves do not change; we may only add or
> delete them. And since the hierarchy of objects/pack is flat, we
> know that the mtime on that directory will change if any packs are
> added or removed.
>
> Of course, we are still doing an extra stat() for each has_sha1_file
> call. Whether that helps for the NFS case depends on whether stat()
> is significantly cheaper than opendir/readdir/closedir. On my local
> disk, the hacky patch below did seem to give me back the 5% lost by
> 45e8a74 (I did it directly on master, since that old commit does
> not have the stat_validity infrastructure).
Also tested master with the patch provided:
~ $ bin/git --version
git version 2.4.1.217.g6c1249c.dirty
~ $ time git clone https://github.com/git/git test
real 0m8.263s
user 0m10.550s
sys 0m3.763s
~ $ time git clone https://github.com/git/git /sami/test
real 1m3.286s
user 0m12.149s
sys 0m9.192s
So the patch isn't reducing the time taken when cloning to NAS.
Here are the top calls from strace
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
90.68 19.946171 46 437398 45436 futex
4.63 1.017637 22 46141 5 read
2.37 0.521161 4 141429 pread
0.73 0.161442 3 47130 9 write
0.42 0.093146 0 188645 188621 access
0.38 0.083033 26 3209 181 open
0.32 0.069587 0 188613 1146 stat
0.23 0.050855 12 4082 3925 lstat
0.11 0.023317 8 2979 1 fstat
0.04 0.009134 0 35696 3 recvfrom
0.03 0.007666 1917 4 wait4
0.02 0.004478 1 3923 madvise
0.01 0.002291 0 17858 poll
0.01 0.002155 0 17851 select
Thanks for looking into this.
Steve
next prev parent reply other threads:[~2015-05-22 8:36 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38 ` Duy Nguyen
2015-05-21 15:53 ` steve.norman
2015-05-22 0:16 ` Duy Nguyen
2015-05-22 7:12 ` Jeff King
2015-05-22 8:35 ` steve.norman [this message]
2015-05-22 10:05 ` Duy Nguyen
2015-05-22 14:37 ` Junio C Hamano
2015-05-22 15:02 ` steve.norman
2015-05-22 23:51 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51 ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00 ` Michael Haggerty
2015-05-24 8:29 ` Jeff King
2015-05-22 23:52 ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54 ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23 1:19 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23 1:21 ` Duy Nguyen
2015-05-24 8:20 ` Jeff King
2015-05-24 9:00 ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01 ` steve.norman
2015-06-05 12:18 ` Jeff King
2015-06-05 12:29 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24 ` Jeff King
2015-06-09 17:41 ` Jeff King
2015-06-10 3:46 ` Shawn Pearce
2015-06-10 14:00 ` Jeff King
2015-06-10 14:36 ` Duy Nguyen
2015-06-10 21:34 ` Shawn Pearce
2015-06-05 14:20 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7FAE15F0A93C0144AD8B5FBD584E1C551975ADA4@C111KXTEMBX51.ERF.thomson.com \
--to=steve.norman@thomsonreuters.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).