git@vger.kernel.org mailing list mirror (one of many)
From: Phillip Wood <phillip.wood@talktalk.net>
To: Lars Schneider <larsxschneider@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Elijah Newren <newren@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	mgorny@gentoo.org, rtc@helen.PLASMA.Xg8.DE,
	winserver.support@winserver.com, tytso@mit.edu
Subject: Re: Optimizing writes to unchanged files during merges?
Date: Mon, 16 Apr 2018 18:47:53 +0100
Message-ID: <21d15f6b-f4ab-2efc-47a9-6cbf95cf80d7@talktalk.net>
In-Reply-To: <F1738316-71EF-4053-82E5-F009F491CCE8@gmail.com>

On 16/04/18 17:07, Lars Schneider wrote:
> 
> I am happy to see this discussion and the patches, because long rebuilds 
> are a constant annoyance for us. We might have been bitten by the exact 
> case discussed here, but more often, we have a slightly different 
> situation:
> 
> An engineer works on a task branch and runs incremental builds — all 
> is good. The engineer switches to another branch to review another 
> engineer's work. This other branch changes a low-level header file, 
> but no rebuild is triggered. The engineer switches back to the previous 
> task branch. At this point, the incremental build will rebuild 
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
> 
> Of course, this problem can be solved with a separate worktree. However, 
> our engineers forget about that sometimes, and then, they are annoyed by 
> a 4h rebuild.
> 
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
> 
> What if Git kept a LRU list that contains file path, content hash, and 
> mtime of any file that is removed or modified during a checkout. If a 
> file is checked out later with the exact same path and content hash, 
> then Git could set the mtime to the previous value. This way the 
> compiler would not think that the content has been changed since the 
> last rebuild.

Hi Lars

But if there has been a rebuild between the checkouts then you
want the compiler to rebuild. I've been using the script below
recently to save and restore mtimes around running rebase to squash
fixup commits. To avoid restoring the mtimes if there has been a
rebuild since they were stored, it takes a list of build sentinels and
stores their stat data (including mtimes) too. If any of those change
then it will refuse to restore the original mtimes of the tracked
files (if you give a path that does not exist when the mtimes are
stored then it will refuse to restore the mtimes if that path exists
when you run 'git mtimes restore'). The sentinels can be specified on
the command line when running 'git mtimes save' or stored in multiple
mtimes.sentinel config keys.
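
Stripped of the git plumbing, the guard described above comes down to
something like the following. This is only a minimal sketch under a few
assumptions, not the script itself: the helper names (content_hash,
snapshot_mtimes, restore_mtimes) are made up for illustration, it takes
a single sentinel, it compares only the sentinel's mtime rather than
most of its stat fields, and it hashes file contents with Digest::SHA
instead of asking the index for blob hashes.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Content hash of a file; a stand-in for the blob hash git keeps in the index.
sub content_hash {
    my ($file) = @_;
    return Digest::SHA->new(1)->addfile($file, 'b')->hexdigest;
}

# Record mtime and content hash of each file, plus the sentinel's mtime
# (undef if the sentinel does not exist yet).
sub snapshot_mtimes {
    my ($sentinel, @files) = @_;
    my $sentinel_mtime = (stat $sentinel)[9];
    my %files;
    for my $file (@files) {
        my $mtime = (stat $file)[9];
        next unless defined $mtime;
        $files{$file} = [ $mtime, content_hash($file) ];
    }
    return { sentinel       => $sentinel,
             sentinel_mtime => $sentinel_mtime,
             files          => \%files };
}

# Restore recorded mtimes for files whose content is unchanged. Refuses
# (returns undef) if the sentinel appeared, vanished or was rewritten,
# since that suggests a build ran. Otherwise returns the number restored.
sub restore_mtimes {
    my ($snap) = @_;
    my $then = $snap->{sentinel_mtime};
    my $now  = (stat $snap->{sentinel})[9];
    return undef if (defined $then xor defined $now);
    return undef if defined $now and $now != $then;
    my $restored = 0;
    for my $file (sort keys %{ $snap->{files} }) {
        my ($old_mtime, $old_hash) = @{ $snap->{files}{$file} };
        my ($atime, $mtime) = (stat $file)[8, 9];
        next unless defined $mtime and $mtime != $old_mtime;
        next unless content_hash($file) eq $old_hash;
        utime($atime, $old_mtime, $file) and $restored++;
    }
    return $restored;
}
```

You would call snapshot_mtimes() before the rebase and restore_mtimes()
afterwards. A file keeps its new mtime whenever its content differs
from what was recorded.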

Best Wishes

Phillip

--->8---

#!/usr/bin/perl

# Copyright (C) 2018 Phillip Wood <phillip.wood@dunelm.org.uk>
#
# git-mtimes.perl
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

use 5.008;
use strict;
use warnings;
use File::Copy (qw(copy));
use File::Spec::Functions (qw(abs2rel catfile file_name_is_absolute rel2abs));
use Storable ();

sub git;

my $GIT_DIR = git(qw(rev-parse --git-dir)) or exit 1;
$GIT_DIR = rel2abs($GIT_DIR);
my $mtimes_path = "$GIT_DIR/mtimes";

sub git {
    my @lines;
    # in scalar context slurp the whole output, removing one trailing $/
    # in list context return a list of chomped lines
    {
	local $/ = wantarray ? $/ : undef;
	local $,=' ';
	open my $fh, '-|', 'git', @_ or die "git @_ failed $!";
	@lines = <$fh>;
	chomp @lines;
	unless (close $fh) {
	    $? == -1 and die "git @_ not found";
	    my $code = $? >> 8;
	    $_[0] eq 'config' and $code == 1 or
		die "git @_ failed with exit code $code"
	}
    }
    wantarray and return @lines;
    @lines and chomp @lines;
    return $lines[0];
}

sub ls_files {
    # mode, uid, gid, mtime and maybe atime
    my @stat_indices = $_[0] ? (2, 4, 5, 9, 8) : (2, 4, 5, 9);
    local $_;
    local $/ = "\0";
    my @files;
    for (git(qw(ls-files --stage -z))) {
	# /s so that a path containing a newline is captured whole
	if (/^[^ ]+ ([^\t]+) 0\t(.*)/s) {
	    my @stat = stat($2);
	    # store name, hash, mode, uid, gid, mtime and maybe atime
	    push @files, [ $2, $1, @stat[@stat_indices] ];
	}
    }
    return @files;
}

sub get_config {
    local $/ = "\0";
    my $get = wantarray ? '--get-all' : '--get';
    git(qw(config -z), $get, @_);
}

sub save {
    local $_;
    my @sentinels = get_config('mtimes.sentinel');
    push @sentinels, @ARGV;
    @sentinels or die "No sentinels given";
    @sentinels = map { [ $_, [ stat $_ ] ] } @sentinels;
    my @files = ls_files();
    Storable::nstore( [ [ @sentinels ] , [ @files ] ], $mtimes_path) or
	die "unable to store mtimes $!";
}

sub match_sentinel_data {
    local $_;
    my ($old, $new, $trustctime) = @_;
    if (!@$old) {
	return (@$new) ? undef : 1;
    } else {
	@$new or return undef;
    }
    # Skip hardlink count, atime, blksize
    for (0..2,4..7,9,10,12) {
	next if ($_ == 10 and ! $trustctime);
	$old->[$_] == $new->[$_] or return undef;
    }
    return 1;
}

sub needs_update {
    local $_;
    my ($old, $new) = @_;
    for (0..1) {
	$old->[$_] eq $new->[$_] or return undef;
    }
    for (2..4) {
	$old->[$_] == $new->[$_] or return undef;
    }
    $old->[5] != $new->[5];
}

sub restore {
    local $_;
    my $stored = Storable::retrieve($mtimes_path) or
	die "unable to load stored data";
    my $trustctime = get_config('--bool', 'core.trustctime');
    $trustctime = defined($trustctime) ? $trustctime eq 'true' : 1;
    my ($sentinels, $oldfiles) = @$stored;
    for (@$sentinels) {
	match_sentinel_data( [ stat($_->[0]) ], $_->[1], $trustctime) or
	    die "Unable to restore mtimes, stat data for sentinel '$_->[0]' does not match";
    }
    my @newfiles = ls_files(1);
    my ($i, $restored) = (0, 0);
    for (@$oldfiles) {
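	# Test the bound first: indexing past the end would autovivify a
	# new element, extend @newfiles and keep the loop running forever.
	while ($i < @newfiles and $newfiles[$i]->[0] lt $_->[0]) {
	    $i++;
	}
	# Both lists are sorted, so nothing further can match.
	last if $i >= @newfiles;
	if (needs_update($_, $newfiles[$i])) {
	    utime($newfiles[$i]->[6], $_->[5], $_->[0]);
	    $restored = 1;
	}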
    }
    if ($restored) {
	print "restored mtimes\n";
    }
}

my $cmd = shift;
# Keep relative paths relative in case repository directory is renamed
# between saving and restoring mtimes.
if ($ENV{GIT_PREFIX}) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : catfile($ENV{GIT_PREFIX}, $_);
    } @ARGV;
}
my $up = git(qw(rev-parse --show-cdup));
if ($up) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : abs2rel(rel2abs($_), $up);
    } @ARGV;
    chdir $up or die "cannot chdir to '$up': $!\n";
}

my $tmp_index = catfile($GIT_DIR, "mtimes-index");
my $src_index = $ENV{GIT_INDEX_FILE} ? $ENV{GIT_INDEX_FILE} :
				       catfile($GIT_DIR, "index");
copy($src_index, $tmp_index) or
    die "cannot create temporary index '$tmp_index'\n";
$ENV{GIT_INDEX_FILE} = $tmp_index;
git(qw(add -u));

if (defined $cmd and $cmd eq 'save') {
    save();
} elsif (defined $cmd and $cmd eq 'restore' and ! @ARGV) {
    restore();
} else {
    print STDERR "usage: git mtimes <save [sentinels ...] | restore>\n";
    exit 1;
}

END {
    # $tmp_index is still undefined if we died before it was set
    unlink $tmp_index if defined $tmp_index;
}

> I think that would fix the problem that our engineers run into and also 
> the problem that Linus experienced during the merge, wouldn't it?
> 
> Thanks,
> Lars
> 
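
For reference, the LRU list proposed in the quoted text could be
sketched roughly as below. This is only an illustration under stated
assumptions: MtimeLRU and its remember/recall methods are hypothetical
names, nothing like this exists in Git, and the linear-time reordering
is chosen for brevity rather than speed. As the reply points out, the
list alone is not enough: if a build ran between the two checkouts, the
old mtime must not be reused.

```perl
use strict;
use warnings;

package MtimeLRU;

# Hypothetical cache mapping (path, content hash) to the mtime the file
# had just before checkout removed or rewrote it.
sub new {
    my ($class, $capacity) = @_;
    return bless { capacity => $capacity, keys => [], mtime => {} }, $class;
}

# Record an entry and evict the least recently used ones beyond capacity.
sub remember {
    my ($self, $path, $hash, $mtime) = @_;
    my $key = "$path\0$hash";
    $self->_touch($key);
    $self->{mtime}{$key} = $mtime;
    while (@{ $self->{keys} } > $self->{capacity}) {
        my $old = shift @{ $self->{keys} };
        delete $self->{mtime}{$old};
    }
    return;
}

# Return the mtime to reuse when the same path and content come back,
# or undef if there is no matching entry.
sub recall {
    my ($self, $path, $hash) = @_;
    my $key = "$path\0$hash";
    return undef unless exists $self->{mtime}{$key};
    $self->_touch($key);
    return $self->{mtime}{$key};
}

# Move $key to the most recently used end of the list.
sub _touch {
    my ($self, $key) = @_;
    @{ $self->{keys} } = ((grep { $_ ne $key } @{ $self->{keys} }), $key);
    return;
}

package main;
```

A checkout would call remember() for each file it removes or rewrites,
and recall() for each file it writes, passing the result to utime()
when it is defined.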


