From: Eric Wong
To: Ivan Baldo
Cc: git@vger.kernel.org
Subject: Re: Fastest way to set files date and time to latest commit time of each one
Date: Sat, 29 Aug 2020 04:48:42 +0000
Message-ID: <20200829044842.GA5732@dcvr>

Ivan Baldo wrote:
> Hello.
> I know this is not standard usage of git, but I need a way to have
> more stable dates and times in the files in order to avoid rsync
> checksumming.
> So I found this
> https://stackoverflow.com/questions/2179722/checking-out-old-file-with-original-create-modified-timestamps/2179876#2179876
> and modified it a bit to run in CentOS 7:
>
> IFS="
> "
> for FILE in $(git ls-files -z | tr '\0' '\n')
> do
>   TIME=$(git log --pretty=format:%cd -n 1 --date=iso -- "$FILE")
>   touch -c -m -d "$TIME" "$FILE"
> done
>
> Unfortunately it takes ages for a 84k files repo.
> I see the CPU usage is dominated by the git log command.

Running git log for each file isn't necessary. On Debian, rsync
actually ships the `git-set-file-times' script in
/usr/share/doc/rsync/scripts/ which only runs `git log' once
and parses it.

You can also get my (original) version from:
https://yhbt.net/git-set-file-times

> I know a way I could use to split the work for all the CPU threads
> but anyway, I would like to know if you guys and girls know of a
> faster way to do this.

Much of your overhead is going to be from process spawning.
My Perl version reduces that significantly. I haven't tried it
with 84K files, but it'll have to keep all those filenames in
memory.

I'm not sure if parallelizing utime() syscalls is worth it,
either; maybe it helps on SSD more than HDD.
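
To illustrate the "run `git log' once and parse it" idea, here is a
minimal Python sketch. It is not the Perl script linked above and has
not been measured on an 84K-file repo. The names parse_log,
set_file_times and LOG_FORMAT are hypothetical, and wrapping the commit
time in \x01 and \x02 bytes is an assumption of this sketch so that
timestamps can be told apart from path names. It ignores merge-only
changes, assumes `repo' is the top of the working tree, and buffers the
whole log output in memory.

```python
import os
import re
import subprocess

# Assumption of this sketch: wrap each committer timestamp in \x01 ... \x02
# so it cannot be confused with a path name in the log output.
LOG_FORMAT = "--pretty=format:%x01%ct%x02"
_STAMP = re.compile(rb"\x01(\d+)\x02")


def parse_log(raw):
    """Return {path: newest committer time} from `git log -z --name-only`."""
    newest = {}
    # With one capture group, split yields [prefix, time, files, time, files, ...]
    parts = _STAMP.split(raw)
    for stamp, files in zip(parts[1::2], parts[2::2]):
        when = int(stamp)
        for name in files.split(b"\0"):
            # Tolerate separator newlines next to the header; a path that
            # really starts with a newline is not handled by this sketch.
            name = name.lstrip(b"\n")
            if name and when > newest.get(name, -1):
                newest[name] = when
    return newest


def set_file_times(repo="."):
    """Set atime and mtime of tracked files to their newest commit time.

    Spawns git twice in total, instead of once per file.
    Returns the number of files updated.
    """
    def git(*args):
        done = subprocess.run(("git",) + args, cwd=repo, check=True,
                              stdout=subprocess.PIPE)
        return done.stdout

    tracked = set(git("ls-files", "-z").split(b"\0")) - {b""}
    times = parse_log(git("log", "-z", "--name-only", LOG_FORMAT))
    root = os.fsencode(repo)
    updated = 0
    for path in tracked:
        when = times.get(path)
        if when is None:
            continue  # never listed by the log (e.g. merge-only change)
        try:
            os.utime(os.path.join(root, path), (when, when))
            updated += 1
        except FileNotFoundError:
            pass  # like `touch -c': never create missing files
    return updated
```

Calling set_file_times() from the top of a checkout would apply the
times. Keeping parse_log() separate from the git calls means the
parsing can be checked against a canned byte string without a
repository.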