git@vger.kernel.org mailing list mirror (one of many)
* Optimizing writes to unchanged files during merges?
@ 2018-04-12 21:14 Linus Torvalds
  2018-04-12 21:46 ` Junio C Hamano
  2018-04-13  0:01 ` Elijah Newren
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-04-12 21:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

So I just had an interesting experience that has happened before too,
but this time I decided to try to figure out *why* it happened.

I'm obviously in the latter part of the kernel merge window, and
things are slowly calming down. I do the second XFS merge during this
window, and it brings in updates just to the fs/xfs/ subdirectory, so
I expect that my test build for the full kernel configuration should
take about a minute.

Instead, it recompiles pretty much *everything*, and the test build
takes 22 minutes.

This happens occasionally, and I blame gremlins. But this time I
decided to look at what the gremlins actually *are*.

The diff that git shows for the pull was very clear: only fs/xfs/
changes. But doing

  ls -tr $(git ls-files '*.[chS]') | tail -10

gave the real reason: in between all the fs/xfs/xyz files was this:

    include/linux/mm.h

and yeah, that rather core header file causes pretty much everything
to be re-built.

Now, the reason it was marked as changed is that the xfs branch _had_
in fact changed it, but the changes were already upstream and got
merged away. But the file still got written out (with the same
contents it had before the merge), and 'make' obviously only looks at
modification time, so make rebuilt everything.
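The mtime-only test make applies can be sketched like this (a hypothetical simplification with an invented function name, not make's actual code):

```c
/* Hypothetical sketch of make's staleness test: a target is rebuilt
 * whenever any prerequisite has a newer mtime; file contents are
 * never compared at all. */
#include <sys/stat.h>

int needs_rebuild(const char *target, const char *prereq)
{
	struct stat t, p;

	if (stat(target, &t) != 0)
		return 1;	/* missing target: must build */
	if (stat(prereq, &p) != 0)
		return 1;	/* missing prereq: treat as stale */
	return p.st_mtime > t.st_mtime;
}
```

Rewriting a header with byte-identical contents still bumps its st_mtime, so every object file that lists it as a prerequisite fails this test and gets rebuilt.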

Now, because it's still the merge window, I don't have much time to
look into this, but I was hoping somebody in git land would like to
give it a quick look. I'm sure I'm not the only one to have ever been
hit by this, and I don't think the kernel is the only project to hit
it either.

Because it would be lovely if the merging logic would just notice "oh,
that file doesn't change", and not even write out the end result.

For testing, the merge that triggered this git introspection is kernel
commit 80aa76bcd364 ("Merge tag 'xfs-4.17-merge-4' of
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux"), which can act as a
test-case. It's a clean merge, so no kernel knowledge necessary: just
re-merge the parents and see if the modification time of
include/linux/mm.h changes.

I'm guessing some hack in update_file_flags() to say "if the new
object matches the old one, don't do anything" might work. Although I
didn't look if we perhaps end up writing to the working tree copy
earlier already.

Looking at the blame output of that function, most of it is really
old, so this "problem" goes back a long long time.

Just to clarify: this is absolutely not a correctness issue. It's not
even a *git* performance issue. It's literally just a "not updating
the working tree when things don't change results in better behavior
for other tools".

So if somebody gets interested in this problem, that would be great.
And if not, I'll hopefully remember to come back to this next week
when the merge window is over, and take a second look.

                     Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 21:14 Optimizing writes to unchanged files during merges? Linus Torvalds
@ 2018-04-12 21:46 ` Junio C Hamano
  2018-04-12 23:17   ` Junio C Hamano
  2018-04-12 23:18   ` Linus Torvalds
  2018-04-13  0:01 ` Elijah Newren
  1 sibling, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-04-12 21:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Now, the reason it was marked as changed is that the xfs branch _had_
> in fact changed it, but the changes were already upstream and got
> merged away. But the file still got written out (with the same
> contents it had before the merge), and 'make' obviously only looks at
> modification time, so make rebuilt everything.

Thanks for a clear description of the issue.  It does sound
interesting.


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 21:46 ` Junio C Hamano
@ 2018-04-12 23:17   ` Junio C Hamano
  2018-04-12 23:35     ` Linus Torvalds
  2018-04-12 23:18   ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2018-04-12 23:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> Now, the reason it was marked as changed is that the xfs branch _had_
>> in fact changed it, but the changes were already upstream and got
>> merged away. But the file still got written out (with the same
>> contents it had before the merge), and 'make' obviously only looks at
>> modification time, so make rebuilt everything.
>
> Thanks for a clear description of the issue.  It does sound
> interesting.

A bit of detour.  "Change in side branch happened to be a subset of
the change in trunk and gets subsumed, but we end up writing the
same result" happens also with the simpler resolve strategy.

Here is a fix.

 git-merge-one-file.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/git-merge-one-file.sh b/git-merge-one-file.sh
index 9879c59395..aa7392f7ff 100755
--- a/git-merge-one-file.sh
+++ b/git-merge-one-file.sh
@@ -137,11 +137,21 @@ case "${1:-.}${2:-.}${3:-.}" in
 		ret=1
 	fi
 
+	# Does the merge result happen to be identical to what we
+	# already have?  Then do not cause unnecessary recompilation.
+	# But don't bother the optimization if we need to chmod
+	if cmp -s "$4" "$src1" && test "$6" = "$7"
+	then
+		:; # happy
+	else
+
 	# Create the working tree file, using "our tree" version from the
 	# index, and then store the result of the merge.
 	git checkout-index -f --stage=2 -- "$4" && cat "$src1" >"$4" || exit 1
 	rm -f -- "$orig" "$src1" "$src2"
 
+	fi
+
 	if test "$6" != "$7"
 	then
 		if test -n "$msg"



* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 21:46 ` Junio C Hamano
  2018-04-12 23:17   ` Junio C Hamano
@ 2018-04-12 23:18   ` Linus Torvalds
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-04-12 23:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Thu, Apr 12, 2018 at 2:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Thanks for a clear description of the issue.  It does sound
> interesting.

I decided to show it with a simpler case that could be scripted and
doesn't need the kernel.

NOTE! This obviously doesn't happen for files with the trivial merge
cases (ie the ones that don't require any real merging at all).

There *is* a three-way merge going on, it's just that the end result
is the same as the original file.

Example script just appended here at the end.

NOTE: The script uses "ls -l --full-time", which afaik is a GNU ls
extension. So the script below is not portable.

                 Linus

---

  # Create throw-away test repository
  mkdir merge-test
  cd merge-test
  git init

  # Create silly baseline file with 10 lines of numbers in it
  for i in $(seq 1 10); do echo $i; done > a
  git add a
  git commit -m"Original"

  # Make a branch that changes '9' to 'nine'
  git checkout -b branch
  sed -i 's/9/nine/' a
  git commit -m "Nine" a

  # On the master, change '2' to 'two' _and_ '9' to 'nine'
  git checkout master
  sed -i 's/9/nine/' a
  sed -i 's/2/two/' a
  git commit -m "Two and nine" a

  # sleep to show the time difference
  sleep 1

  # show the date on 'a' and do the merge
  ls -l --full-time a
  git merge -m "Merge contents" branch

  # The merge didn't change the contents, but did rewrite the file
  ls -l --full-time a


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 23:17   ` Junio C Hamano
@ 2018-04-12 23:35     ` Linus Torvalds
  2018-04-12 23:41       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-04-12 23:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Thu, Apr 12, 2018 at 4:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> A bit of detour.  "Change in side branch happened to be a subset of
> the change in trunk and gets subsumed, but we end up writing the
> same result" happens also with the simpler resolve strategy.
>
> Here is a fix.

Heh. Except git-merge-one-file.sh is *only* used for that case, so
this doesn't fix the normal case.

I verified that in the normal case it is that "update_file_flags()"
thing that checks out the end result.

It's called by this case:

        } else if (a_oid && b_oid) {
                /* Case C: Added in both (check for same permissions) and */
                /* case D: Modified in both, but differently. */
                clean_merge = merge_content(o, path,
                                            o_oid, o_mode, a_oid, a_mode,
                                            b_oid, b_mode, NULL);

in process_entry(), and I think we could just add a test there for
whether o_oid,o_mode == a_oid,a_mode or something?

              Linus


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 23:35     ` Linus Torvalds
@ 2018-04-12 23:41       ` Linus Torvalds
  2018-04-12 23:55         ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-04-12 23:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Thu, Apr 12, 2018 at 4:35 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> in process_entry(), and I think we could just add a test there for
> whether o_oid,o_mode == a_oid,a_mode or something?

Actually, not process_entry, but merge_content().

Oddly, that *already* has the check:

        if (mfi.clean && !df_conflict_remains &&
            oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
                int path_renamed_outside_HEAD;
                output(o, 3, _("Skipped %s (merged same as existing)"), path);

but that doesn't seem to actually trigger for some reason.

But the code really seems to have the _intention_ of skipping the case
where the result ended up the same as the source.

Maybe I'm missing something.

                 Linus


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 23:41       ` Linus Torvalds
@ 2018-04-12 23:55         ` Linus Torvalds
  2018-04-13  0:01           ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-04-12 23:55 UTC (permalink / raw)
  To: Junio C Hamano, Elijah Newren; +Cc: Git Mailing List

[ Talking to myself ]

On Thu, Apr 12, 2018 at 4:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Oddly, that *already* has the check:
>
>         if (mfi.clean && !df_conflict_remains &&
>             oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
>                 int path_renamed_outside_HEAD;
>                 output(o, 3, _("Skipped %s (merged same as existing)"), path);
>
> but that doesn't seem to actually trigger for some reason.

Actually, that check triggers just fine.

> But the code really seems to have the _intention_ of skipping the case
> where the result ended up the same as the source.
>
> Maybe I'm missing something.

The later check that does

                /*
                 * The content merge resulted in the same file contents we
                 * already had.  We can return early if those file contents
                 * are recorded at the correct path (which may not be true
                 * if the merge involves a rename).
                 */
                path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
                if (!path_renamed_outside_HEAD) {

will see that 'path2' is NULL, and not trigger this early out case,
and then this all falls back to the normal cases after all.

So I think that 'path_renamed_outside_HEAD' logic is somehow broken.

Did it perhaps mean to say

                path_renamed_outside_HEAD = path2 && !strcmp(path, path2);

instead?
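To make the difference concrete, here is a hypothetical stand-alone rendering of the two predicates (helper names invented for illustration; the early-out in merge_content() runs only when !path_renamed_outside_HEAD):

```c
/* Hypothetical illustration of the two predicates under discussion. */
#include <stddef.h>
#include <string.h>

static int original_predicate(const char *path, const char *path2)
{
	return !path2 || !strcmp(path, path2);	/* current code */
}

static int proposed_predicate(const char *path, const char *path2)
{
	return path2 && !strcmp(path, path2);	/* suggested variant */
}
```

With no rename involved, path2 is NULL: the current expression yields 1, so the early-out is skipped and the file is rewritten anyway; the suggested variant yields 0, so the early-out would fire.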

See commit 5b448b853 ("merge-recursive: When we detect we can skip an
update, actually skip it") which really implies we want to actually
skip it (but then we don't anyway).

Also see commit b2c8c0a76 ("merge-recursive: When we detect we can
skip an update, actually skip it") which was an earlier version, and
which *actually* skipped it, but it was reverted because of that
rename issue.

Adding Elijah Newren to the cc, because he was working on this back
then, and is still around, and still working on merging ;)

               Linus


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 23:55         ` Linus Torvalds
@ 2018-04-13  0:01           ` Linus Torvalds
  2018-04-13  7:02             ` Elijah Newren
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-04-13  0:01 UTC (permalink / raw)
  To: Junio C Hamano, Elijah Newren; +Cc: Git Mailing List

[ Still talking to myself. Very soothing. ]

On Thu, Apr 12, 2018 at 4:55 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Talking to myself ]
>
> Did it perhaps mean to say
>
>                 path_renamed_outside_HEAD = path2 && !strcmp(path, path2);
>
> instead?

Probably not correct, but making that change makes my test-case work.

It probably breaks something important, though. I didn't look at why
that code happened in the first place.

But it's nice to know I'm at least looking at the right code while I'm
talking to myself.

           Linus


* Re: Optimizing writes to unchanged files during merges?
  2018-04-12 21:14 Optimizing writes to unchanged files during merges? Linus Torvalds
  2018-04-12 21:46 ` Junio C Hamano
@ 2018-04-13  0:01 ` Elijah Newren
  1 sibling, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2018-04-13  0:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Thu, Apr 12, 2018 at 2:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So I just had an interesting experience that has happened before too,
> but this time I decided to try to figure out *why* it happened.
>
<snip>
>     include/linux/mm.h
>
> and yeah, that rather core header file causes pretty much everything
> to be re-built.
>
> Now, the reason it was marked as changed is that the xfs branch _had_
> in fact changed it, but the changes were already upstream and got
> merged away. But the file still got written out (with the same
> contents it had before the merge), and 'make' obviously only looks at
> modification time, so make rebuilt everything.
<snip>

Wow, timing.  I've been looking at this independently -- it turns out
to be in the exact same code area involved in the breakage my recent
series caused.  The bug you are observing here will happen with
current git (going back about seven years, at which time it had
different bugs) whenever no rename is involved and the modifications
on the other branch are a subset of the changes made on HEAD's side.

My series (reverted yesterday morning) made this particular code break
in a new, totally different way.  This is a code path that would be
trivial to get right with Junio's suggested re-design of
merge-recursive[1], but the current design just makes this a bit of a
mess, thus resulting in various code changes over the years with
different breakages, each with its own curious flavor of
incorrectness.

Anyway, I'm writing a bunch of test cases, and will try to get a patch
(or patches) out soon.  Pretty busy this week, but I should definitely
have something out next week.

Elijah

[1] https://public-inbox.org/git/xmqqd147kpdm.fsf@gitster.mtv.corp.google.com/


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13  0:01           ` Linus Torvalds
@ 2018-04-13  7:02             ` Elijah Newren
  2018-04-13 17:14               ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Elijah Newren @ 2018-04-13  7:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Thu, Apr 12, 2018 at 5:01 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Still talking to myself. Very soothing. ]
>
> On Thu, Apr 12, 2018 at 4:55 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> [ Talking to myself ]
>>
>> Did it perhaps mean to say
>>
>>                 path_renamed_outside_HEAD = path2 && !strcmp(path, path2);
>>
>> instead?
>
> Probably not correct, but making that change makes my test-case work.
>
> It probably breaks something important, though. I didn't look at why
> that code happened in the first place.
>
> But it's nice to know I'm at least looking at the right code while I'm
> talking to myself.

I hope you don't mind me barging into your conversation, but I figured
out some interesting details.

What we want here is that if there are no content conflicts and the
contents match the version of the file from HEAD, and there are no
mode conflicts and the final mode matches what was in HEAD, and there
are no path conflicts (e.g. a rename/rename(1to2) issue or a D/F
conflict putting a directory in the way) and the final path matches
what was already in HEAD, then we can skip the update.

What does this look like in code?  Well, most of it was already there:

if (mfi.clean && !df_conflict_remains &&
    oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {

That covers everything except "the final path matches what was already
in HEAD".  How do we check for that?  The current code is just making
an approximation; your modification improves the approximation.
However, it turns out we have this awesome function called
"was_tracked(const char *path)" that was intended for answering this
exact question.  So, assuming was_tracked() isn't buggy, the correct
patch for this problem would look like:

-               path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-               if (!path_renamed_outside_HEAD) {
+               if (was_tracked(path)) {

Turns out that patch was already submitted as c5b761fb
("merge-recursive: ensure we write updates for directory-renamed
file", 2018-02-14).  While that patch was for a different bug, it is
identical to what I would have proposed to fix this bug.  A big series
including that patch was merged to master two days ago, but
unfortunately that exact patch was the one that caused some
impressively awful fireworks[1].  So it, along with the rest of the
series it was part of, was reverted just a few hours later.  As it
turns out, the cause of the problem is that was_tracked() can lie when
renames are involved.  It just hadn't ever bit us yet.

I have a fix for was_tracked(), and 10 additional testcases covering
interesting should-be-skipped and
should-not-be-skipped-but-is-close-to-a-should-be-skipped scenarios,
verifying the skipping actually happens if and only if it should
happen.  That should fix your bug, and the bug with my series.  Rough
WIP can be seen at the testme branch in github.com/newren/git for the
curious, but I need to sleep and have a bunch of other things on my
plate for the next few days.  But I thought I'd at least mention what
I've found so far.

Elijah

[1] https://public-inbox.org/git/xmqqefjm3w1h.fsf@gitster-ct.c.googlers.com/


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13  7:02             ` Elijah Newren
@ 2018-04-13 17:14               ` Linus Torvalds
  2018-04-13 17:39                 ` Stefan Beller
                                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-04-13 17:14 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2434 bytes --]

On Fri, Apr 13, 2018 at 12:02 AM, Elijah Newren <newren@gmail.com> wrote:
>
> I hope you don't mind me barging into your conversation

I was getting tired of my own rambling anyway..

> However, it turns out we have this awesome function called
> "was_tracked(const char *path)" that was intended for answering this
> exact question.  So, assuming was_tracked() isn't buggy, the correct
> patch for this problem would look like:

Apparently that causes problems, for some odd reason.

I like the notion of checking the index, but it's not clear that the
index is reliable in the presence of renames either.

>   A big series
> including that patch was merged to master two days ago, but
> unfortunately that exact patch was the one that caused some
> impressively awful fireworks[1].

Yeah, so this code is fragile.

How about we take a completely different approach? Instead of relying
on fragile (but clever) tests, why not rely on stupid brute force?

Yeah, yeah, it's bad to be stupid, but sometimes simple and stupid
really does work.

See the attached patch. It gets rid of all the subtle "has this been
renamed" tests entirely, and just _always_ does that final
update_file().

But instead, it makes update_file() a bit smarter, and says "before we
write this file out, let's see if it's already there and already has
the expected contents"?

Now, it really shouldn't be _quite_ as stupid as that: we should
probably set a flag in the "hey, the oid matches, maybe it's worth
checking" case, so that it doesn't do the check in the cases where we
know the merge has done things.

But it's actually *fairly* low cost, because before it reads the file
it at least checks that file length matches the expected length (and
that the user permission bits match the expected mode).

So if the file doesn't match, most of the time the real cost will just
be an extra 'open/fstat/close()' sequence. That's pretty cheap.

So even the completely stupid approach is probably not too bad, and it
could be made smarter.

NOTE! I have *NOT* tested this on anything interesting. I tested the
patch on my stupid test-case, but that's it. I didn't even bother
re-doing the kernel merge that started this.

Comments? Because considering the problems this code has had, maybe
"stupid" really is the right approach...

[ Ok, I lied. I just tested this on the kernel merge. It worked fine,
and avoided modifying <linux/mm.h> ]

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2221 bytes --]

 merge-recursive.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624..ed2200065 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -815,6 +815,32 @@ static int make_room_for_path(struct merge_options *o, const char *path)
 	return err(o, msg, path, _(": perhaps a D/F conflict?"));
 }
 
+static int working_tree_matches(const char *path, const char *buf, unsigned long size, unsigned mode)
+{
+	int fd, matches;
+	struct stat st;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+	matches = 0;
+	if (!fstat(fd, &st) && st.st_size == size && S_ISREG(st.st_mode) && !(0700 & (st.st_mode ^ mode))) {
+		char tmpbuf[1024];
+		while (size) {
+			int n = read(fd, tmpbuf, sizeof(tmpbuf));
+			if (n <= 0 || n > size)
+				break;
+			if (memcmp(tmpbuf, buf, n))
+				break;
+			buf += n;
+			size -= n;
+		}
+		matches = !size;
+	}
+	close(fd);
+	return matches;
+}
+
 static int update_file_flags(struct merge_options *o,
 			     const struct object_id *oid,
 			     unsigned mode,
@@ -856,6 +882,8 @@ static int update_file_flags(struct merge_options *o,
 				size = strbuf.len;
 				buf = strbuf_detach(&strbuf, NULL);
 			}
+			if (working_tree_matches(path, buf, size, mode))
+				goto free_buf;
 		}
 
 		if (make_room_for_path(o, path) < 0) {
@@ -1782,20 +1810,8 @@ static int merge_content(struct merge_options *o,
 
 	if (mfi.clean && !df_conflict_remains &&
 	    oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
-		int path_renamed_outside_HEAD;
 		output(o, 3, _("Skipped %s (merged same as existing)"), path);
-		/*
-		 * The content merge resulted in the same file contents we
-		 * already had.  We can return early if those file contents
-		 * are recorded at the correct path (which may not be true
-		 * if the merge involves a rename).
-		 */
-		path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-		if (!path_renamed_outside_HEAD) {
-			add_cacheinfo(o, mfi.mode, &mfi.oid, path,
-				      0, (!o->call_depth), 0);
-			return mfi.clean;
-		}
+		/* We could set a flag here and pass it to "update_file()" */
 	} else
 		output(o, 2, _("Auto-merging %s"), path);
 


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13 17:14               ` Linus Torvalds
@ 2018-04-13 17:39                 ` Stefan Beller
  2018-04-13 17:53                   ` Linus Torvalds
  2018-04-13 20:04                 ` Elijah Newren
  2018-04-16  1:44                 ` Junio C Hamano
  2 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-04-13 17:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Elijah Newren, Junio C Hamano, Git Mailing List

Hi Linus,

On Fri, Apr 13, 2018 at 10:14 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Comments? Because considering the problems this code has had, maybe
> "stupid" really is the right approach...

Would s/read/xread/ make sense in working_tree_matches ?
If read returns unexpectedly due to EINTR or EAGAIN,
this is no correctness issue, but you'd have to recompile the kernel.
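Stefan's point can be sketched as a plain retry loop (a hypothetical stand-in for git's xread(); the name and shape here are illustrative): in working_tree_matches(), a read() interrupted by a signal returns -1 with errno set to EINTR, the loop breaks with size still non-zero, and the unchanged file is treated as a mismatch and rewritten.

```c
/* Hypothetical sketch: retry a read() that was interrupted before any
 * data arrived, so a transient EINTR/EAGAIN cannot masquerade as a
 * content mismatch in the comparison loop. */
#include <errno.h>
#include <unistd.h>

static ssize_t robust_read(int fd, void *buf, size_t count)
{
	ssize_t n;

	do {
		n = read(fd, buf, count);
	} while (n < 0 && (errno == EINTR || errno == EAGAIN));
	return n;
}
```

A real implementation would typically poll() before retrying on EAGAIN rather than spinning; this sketch only shows the retry idea.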

Stefan


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13 17:39                 ` Stefan Beller
@ 2018-04-13 17:53                   ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-04-13 17:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Elijah Newren, Junio C Hamano, Git Mailing List

On Fri, Apr 13, 2018 at 10:39 AM, Stefan Beller <sbeller@google.com> wrote:
>
> Would s/read/xread/ make sense in working_tree_matches ?

Makes sense, yes.

That patch was really more of a RFD than anything that should be applied.

I would like to see the "might be same" flag pushed down so that we
don't do this comparison when it makes no sense, even if the stat info
makes that less of an issue than it might otherwise be.

And maybe that whole function should be taught to do the "verify CE
against file", and do the proper full cache state check (ie time etc)
instead of doing the read.

So maybe the best option is a combination of the two patches: my
"verify the working tree state" _together_ with the was_tracked()
logic that looks up a cache entry.

Again, the problem with "look up the index state" is about files
possibly being renamed as part of the merge, but if we then check the
index state against the actual contents of the working directory, that
isn't an issue.

              Linus


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13 17:14               ` Linus Torvalds
  2018-04-13 17:39                 ` Stefan Beller
@ 2018-04-13 20:04                 ` Elijah Newren
  2018-04-13 22:27                   ` Junio C Hamano
  2018-04-16  1:44                 ` Junio C Hamano
  2 siblings, 1 reply; 29+ messages in thread
From: Elijah Newren @ 2018-04-13 20:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Fri, Apr 13, 2018 at 10:14 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 13, 2018 at 12:02 AM, Elijah Newren <newren@gmail.com> wrote:

>> However, it turns out we have this awesome function called
>> "was_tracked(const char *path)" that was intended for answering this
>> exact question.  So, assuming was_tracked() isn't buggy, the correct
>> patch for this problem would look like:
>
> Apparently that causes problems, for some odd reason.
>
> I like the notion of checking the index, but it's not clear that the
> index is reliable in the presence of renames either.

Yes, precisely.  Checking the *current* index is not reliable in the
presence of renames.

Trying to use the current index as a proxy for what was in the index
before the merge started is a problem.  But we had a copy of the index
before the merge started; we just discarded it at the end of
unpack_trees().  We could keep it around instead.  That would also
have the benefits of making the was_dirty() checks more accurate too,
as using the mtime's in the current index as a proxy for what was in
the original index has the potential for the same kinds of problems.

>>   A big series
>> including that patch was merged to master two days ago, but
>> unfortunately that exact patch was the one that caused some
>> impressively awful fireworks[1].
>
> Yeah, so this code is fragile.
>
> How about we take a completely different approach? Instead of relying
> on fragile (but clever) tests, why not rely on stupid brute force?
>
> Yeah, yeah, it's bad to be stupid, but sometimes simple and stupid
> really does work.
>
<snip>
> Comments? Because considering the problems this code has had, maybe
> "stupid" really is the right approach...

It's certainly tempting as an interim solution.  I have an alternative
interim solution that I think explains well why the code here had been
fragile, and how to just implement what we want to know rather than
making approximations to it, which I just posted at [2].  But I can
see the draw of just gutting the code and replacing with simple brute
force.  Long term, I think the correct solution is still Junio's
suggested rewrite[1].  My alternative is slightly closer to that
end-state, so I favor it over simple brute-force, but if others have
strong preferences here, I can be happy with either.


Elijah

[1] https://public-inbox.org/git/xmqqd147kpdm.fsf@gitster.mtv.corp.google.com/
[2] https://public-inbox.org/git/20180413195607.18091-1-newren@gmail.com/


* Re: Optimizing writes to unchanged files during merges?
  2018-04-13 20:04                 ` Elijah Newren
@ 2018-04-13 22:27                   ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-04-13 22:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Linus Torvalds, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Yes, precisely.  Checking the *current* index is not reliable in the
> presence of renames.
>
> Trying to use the current index as a proxy for what was in the index
> before the merge started is a problem.  But we had a copy of the index
> before the merge started; we just discarded it at the end of
> unpack_trees().  We could keep it around instead.  That would also
> have the benefits of making the was_dirty() checks more accurate too,
> as using the mtime's in the current index as a proxy for what was in
> the original index has the potential for the same kinds of problems.

Yeah, I agree that you are going in the right direction with the
above.

> It's certainly tempting as an interim solution.  I have an alternative
> interim solution that I think explains well why the code here had been
> fragile, and how to just implement what we want to know rather than
> making approximations to it, which I just posted at [2].  But I can
> see the draw of just gutting the code and replacing with simple brute
> force.

Thanks; I'd love to reserve a block of time to think this through
myself, but I am a bit occupied otherwise this weekend, so I'll try
to catch up after that.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-13 17:14               ` Linus Torvalds
  2018-04-13 17:39                 ` Stefan Beller
  2018-04-13 20:04                 ` Elijah Newren
@ 2018-04-16  1:44                 ` Junio C Hamano
  2018-04-16  2:03                   ` Linus Torvalds
  2018-04-16 23:03                   ` Elijah Newren
  2 siblings, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-04-16  1:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Elijah Newren, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> How about we take a completely different approach? Instead of relying
> on fragile (but clever) tests, why not rely on stupid brute force?
>
> Yeah, yeah, it's bad to be stupid, but sometimes simple and stupid
> really does work.
>
> See the attached patch. It gets rid of all the subtle "has this been
> renamed" tests entirely, and just _always_ does that final
> update_file().

I think Elijah's corrected was_tracked() also does not care about
"has this been renamed".

One thing that makes me curious is what happens (and what we want to
happen) when such a "we already have the changes the side branch
tries to bring in" path has local (i.e. not yet in the index)
changes.  For a dirty file that trivially merges (e.g. a path we
modified since our histories forked, while the other side didn't do
anything, has local changes in the working tree), we try hard to
make the merge succeed while keeping the local changes, and we
should be able to do the same in this case, too.  

Your illustration patch that reads from the working tree to compare
with the contents we are about to write out of course won't cope
with this case ;-).  If we do things in the index like the approach
to fix was_tracked(), I suspect that we would just say "as long as
the index and the HEAD match, i.e. we are keeping the path as it is
found in HEAD as the merge result, we do not touch the working tree"
and leave such a local modification alone.
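The rule described in the last paragraph can be sketched in a few lines
(illustrative Python only, not git's actual C internals; all names here
are invented):

```python
# Hypothetical sketch: skip writing the working-tree file when the
# merge result for a path is exactly what HEAD already records and the
# index agrees with HEAD (i.e. nothing different has been staged).
def should_write_worktree(head_oid, index_oid, merged_oid):
    """Return True if the merge needs to (re)write the file."""
    if merged_oid == head_oid and index_oid == head_oid:
        # Merge result == HEAD, and the user staged nothing different:
        # leave the working-tree file, including any local edits, alone.
        return False
    return True
```

Under such a rule, the include/linux/mm.h from Linus's report would be
left untouched (local modifications included), since its merged content
equals what HEAD already has.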


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16  1:44                 ` Junio C Hamano
@ 2018-04-16  2:03                   ` Linus Torvalds
  2018-04-16 16:07                     ` Lars Schneider
  2018-04-16 22:55                     ` Elijah Newren
  2018-04-16 23:03                   ` Elijah Newren
  1 sibling, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-04-16  2:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Elijah Newren, Git Mailing List

On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I think Elijah's corrected was_tracked() also does not care "has
> this been renamed".

I'm perfectly happy with the slightly smarter patches. My patch was
really just an RFC, sent only because I had tried it out.

> One thing that makes me curious is what happens (and what we want to
> happen) when such a "we already have the changes the side branch
> tries to bring in" path has local (i.e. not yet in the index)
> changes.  For a dirty file that trivially merges (e.g. a path we
> modified since our histories forked, while the other side didn't do
> anything, has local changes in the working tree), we try hard to
> make the merge succeed while keeping the local changes, and we
> should be able to do the same in this case, too.

I think it might be nice, but probably not really worth it.

I find the "you can merge even if some files are dirty" to be really
convenient, because I often keep stupid test patches in my tree that I
may not even intend to commit, and I then use the same tree for
merging.

For example, I sometimes end up editing the Makefile for the release
version early, but I won't *commit* that until I actually cut the
release. But if I pull some branch that has also changed the Makefile,
it's not worth any complexity to try to be nice about the dirty state.

If it's a file that actually *has* been changed in the branch I'm
merging, I'm more than happy to just stage the patch (or throw it
away - I think it's about 50:50 for me).

So I don't think it's a big deal, and I'd rather have the merge fail
very early with "that file has seen changes in the branch you are
merging" than add any real complexity to the merge logic.

                Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16  2:03                   ` Linus Torvalds
@ 2018-04-16 16:07                     ` Lars Schneider
  2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
                                         ` (3 more replies)
  2018-04-16 22:55                     ` Elijah Newren
  1 sibling, 4 replies; 29+ messages in thread
From: Lars Schneider @ 2018-04-16 16:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Elijah Newren, Git Mailing List, mgorny, rtc,
	winserver.support, tytso


> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> I think Elijah's corrected was_tracked() also does not care "has
>> this been renamed".
> 
> I'm perfectly happy with the slightly smarter patches. My patch was
> really just an RFC and because I had tried it out.
> 
>> One thing that makes me curious is what happens (and what we want to
>> happen) when such a "we already have the changes the side branch
>> tries to bring in" path has local (i.e. not yet in the index)
>> changes.  For a dirty file that trivially merges (e.g. a path we
>> modified since our histories forked, while the other side didn't do
>> anything, has local changes in the working tree), we try hard to
>> make the merge succeed while keeping the local changes, and we
>> should be able to do the same in this case, too.
> 
> I think it might be nice, but probably not really worth it.
> 
> I find the "you can merge even if some files are dirty" to be really
> convenient, because I often keep stupid test patches in my tree that I
> may not even intend to commit, and I then use the same tree for
> merging.
> 
> For example, I sometimes end up editing the Makefile for the release
> version early, but I won't *commit* that until I actually cut the
> release. But if I pull some branch that has also changed the Makefile,
> it's not worth any complexity to try to be nice about the dirty state.
> 
> If it's a file that actually *has* been changed in the branch I'm
> merging, and I'm more than happy to just stage the patch (or throw it
> away - I think it's about 50:50 for me).
> 
> So I don't think it's a big deal, and I'd rather have the merge fail
> very early with "that file has seen changes in the branch you are
> merging" than add any real complexity to the merge logic.

I am happy to see this discussion and the patches, because long rebuilds 
are a constant annoyance for us. We might have been bitten by the exact 
case discussed here, but more often, we have a slightly different 
situation:

An engineer works on a task branch and runs incremental builds — all 
is good. The engineer switches to another branch to review another 
engineer's work. This other branch changes a low-level header file, 
but no rebuild is triggered. The engineer switches back to the previous 
task branch. At this point, the incremental build will rebuild 
everything, as the compiler thinks that the low-level header file has
been changed (because the mtime is different).

Of course, this problem can be solved with a separate worktree. However, 
our engineers forget about that sometimes, and then, they are annoyed by 
a 4h rebuild.

Is this situation a problem for others too?
If yes, what do you think about the following approach:

What if Git kept a LRU list that contains file path, content hash, and 
mtime of any file that is removed or modified during a checkout. If a 
file is checked out later with the exact same path and content hash, 
then Git could set the mtime to the previous value. This way the 
compiler would not think that the content has been changed since the 
last rebuild.
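The proposed cache could be modeled roughly like this (a toy sketch;
the class name and the in-memory dict are invented, and a real version
would have to persist the map inside .git and evict old entries
LRU-style):

```python
import hashlib
import os

class MtimeCache:
    """Remember (path, content-hash) -> mtime for files a checkout
    removes or rewrites, and restore the old mtime when identical
    content reappears at the same path."""

    def __init__(self):
        self.cache = {}

    def remember(self, path):
        # Called just before checkout removes or overwrites 'path'.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        self.cache[(path, digest)] = os.stat(path).st_mtime

    def restore(self, path):
        # Called after checkout wrote 'path'; no-op if content differs.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        mtime = self.cache.get((path, digest))
        if mtime is None:
            return False
        os.utime(path, (mtime, mtime))
        return True
```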

I think that would fix the problem that our engineers run into and also 
the problem that Linus experienced during the merge, wouldn't it?

Thanks,
Lars

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 16:07                     ` Lars Schneider
@ 2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
  2018-04-17 17:23                         ` Lars Schneider
  2018-04-16 17:43                       ` Jacob Keller
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-16 17:04 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	mgorny, rtc, winserver.support, tytso


On Mon, Apr 16 2018, Lars Schneider wrote:

>> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> I think Elijah's corrected was_tracked() also does not care "has
>>> this been renamed".
>>
>> I'm perfectly happy with the slightly smarter patches. My patch was
>> really just an RFC and because I had tried it out.
>>
>>> One thing that makes me curious is what happens (and what we want to
>>> happen) when such a "we already have the changes the side branch
>>> tries to bring in" path has local (i.e. not yet in the index)
>>> changes.  For a dirty file that trivially merges (e.g. a path we
>>> modified since our histories forked, while the other side didn't do
>>> anything, has local changes in the working tree), we try hard to
>>> make the merge succeed while keeping the local changes, and we
>>> should be able to do the same in this case, too.
>>
>> I think it might be nice, but probably not really worth it.
>>
>> I find the "you can merge even if some files are dirty" to be really
>> convenient, because I often keep stupid test patches in my tree that I
>> may not even intend to commit, and I then use the same tree for
>> merging.
>>
>> For example, I sometimes end up editing the Makefile for the release
>> version early, but I won't *commit* that until I actually cut the
>> release. But if I pull some branch that has also changed the Makefile,
>> it's not worth any complexity to try to be nice about the dirty state.
>>
>> If it's a file that actually *has* been changed in the branch I'm
>> merging, and I'm more than happy to just stage the patch (or throw it
>> away - I think it's about 50:50 for me).
>>
>> So I don't think it's a big deal, and I'd rather have the merge fail
>> very early with "that file has seen changes in the branch you are
>> merging" than add any real complexity to the merge logic.
>
> I am happy to see this discussion and the patches, because long rebuilds
> are a constant annoyance for us. We might have been bitten by the exact
> case discussed here, but more often, we have a slightly different
> situation:
>
> An engineer works on a task branch and runs incremental builds — all
> is good. The engineer switches to another branch to review another
> engineer's work. This other branch changes a low-level header file,
> but no rebuild is triggered. The engineer switches back to the previous
> task branch. At this point, the incremental build will rebuild
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
>
> Of course, this problem can be solved with a separate worktree. However,
> our engineers forget about that sometimes, and then, they are annoyed by
> a 4h rebuild.
>
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
>
> What if Git kept a LRU list that contains file path, content hash, and
> mtime of any file that is removed or modified during a checkout. If a
> file is checked out later with the exact same path and content hash,
> then Git could set the mtime to the previous value. This way the
> compiler would not think that the content has been changed since the
> last rebuild.
>
> I think that would fix the problem that our engineers run into and also
> the problem that Linus experienced during the merge, wouldn't it?

Could what you're describing be prototyped as a post-checkout hook that
looks at the reflog? It sounds to me like it could, but perhaps I've
missed some subtlety.

Not re-writing out a file that hasn't changed is one thing, but I think
for more complex behaviors (such as the "I want everything to have the
same mtime" mentioned in another thread on-list), and this, it makes
sense to provide some hook mechanism rather than have git itself do all
the work.

I also don't see how what you're describing could be generalized, or
even be made to work reliably in the case you're describing. If the
engineer runs "make" on the branch he's reviewing, that might produce
an object file that'll get used as-is once he switches back, since
you've set the mtime in the past for that file when re-checking it
out.
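For concreteness, such a post-checkout prototype might start from a
pure helper like the one below (hypothetical throughout; git ships no
such hook, and the saved-mtime store is deliberately left out):

```python
import subprocess
import sys

def restorable_paths(tracked_paths, changed_paths):
    """Paths not touched between the two commits; only these are even
    candidates for getting a previously saved mtime back."""
    changed = set(changed_paths)
    return [p for p in tracked_paths if p not in changed]

if __name__ == "__main__":
    # git invokes the hook as: post-checkout <old-head> <new-head> <flag>
    old_head, new_head = sys.argv[1], sys.argv[2]
    changed = subprocess.run(
        ["git", "diff", "--name-only", old_head, new_head],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    tracked = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for path in restorable_paths(tracked, changed):
        # A real hook would look up a saved (path, content-hash) ->
        # mtime record here and os.utime() the file back.
        pass
```

Note that the objection above still applies: nothing in this sketch
knows whether a build ran while the other branch was checked out.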

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 16:07                     ` Lars Schneider
  2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
@ 2018-04-16 17:43                       ` Jacob Keller
  2018-04-16 17:45                         ` Jacob Keller
  2018-04-16 17:47                       ` Phillip Wood
  2018-04-16 20:09                       ` lists
  3 siblings, 1 reply; 29+ messages in thread
From: Jacob Keller @ 2018-04-16 17:43 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	mgorny, Peter Backes, winserver.support, Theodore Ts'o

On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
>> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> I think Elijah's corrected was_tracked() also does not care "has
>>> this been renamed".
>>
>> I'm perfectly happy with the slightly smarter patches. My patch was
>> really just an RFC and because I had tried it out.
>>
>>> One thing that makes me curious is what happens (and what we want to
>>> happen) when such a "we already have the changes the side branch
>>> tries to bring in" path has local (i.e. not yet in the index)
>>> changes.  For a dirty file that trivially merges (e.g. a path we
>>> modified since our histories forked, while the other side didn't do
>>> anything, has local changes in the working tree), we try hard to
>>> make the merge succeed while keeping the local changes, and we
>>> should be able to do the same in this case, too.
>>
>> I think it might be nice, but probably not really worth it.
>>
>> I find the "you can merge even if some files are dirty" to be really
>> convenient, because I often keep stupid test patches in my tree that I
>> may not even intend to commit, and I then use the same tree for
>> merging.
>>
>> For example, I sometimes end up editing the Makefile for the release
>> version early, but I won't *commit* that until I actually cut the
>> release. But if I pull some branch that has also changed the Makefile,
>> it's not worth any complexity to try to be nice about the dirty state.
>>
>> If it's a file that actually *has* been changed in the branch I'm
>> merging, and I'm more than happy to just stage the patch (or throw it
>> away - I think it's about 50:50 for me).
>>
>> So I don't think it's a big deal, and I'd rather have the merge fail
>> very early with "that file has seen changes in the branch you are
>> merging" than add any real complexity to the merge logic.
>
> I am happy to see this discussion and the patches, because long rebuilds
> are a constant annoyance for us. We might have been bitten by the exact
> case discussed here, but more often, we have a slightly different
> situation:
>
> An engineer works on a task branch and runs incremental builds — all
> is good. The engineer switches to another branch to review another
> engineer's work. This other branch changes a low-level header file,
> but no rebuild is triggered. The engineer switches back to the previous
> task branch. At this point, the incremental build will rebuild
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
>
> Of course, this problem can be solved with a separate worktree. However,
> our engineers forget about that sometimes, and then, they are annoyed by
> a 4h rebuild.
>
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
>
> What if Git kept a LRU list that contains file path, content hash, and
> mtime of any file that is removed or modified during a checkout. If a
> file is checked out later with the exact same path and content hash,
> then Git could set the mtime to the previous value. This way the
> compiler would not think that the content has been changed since the
> last rebuild.

That would only work until they actually *did* a build on the second
branch; upon changing back, how would this detect that it needs to
update the mtime again? I don't think this solution really works.
Ultimately, the problem is that the build tool relies on the mtime to
determine what to rebuild. I think this would cause worse problems
because we *wouldn't* rebuild in that case. How is git supposed to
know whether we rebuilt when switching branches or not?

Thanks,
Jake

>
> I think that would fix the problem that our engineers run into and also
> the problem that Linus experienced during the merge, wouldn't it?
>
> Thanks,
> Lars

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 17:43                       ` Jacob Keller
@ 2018-04-16 17:45                         ` Jacob Keller
  2018-04-16 22:34                           ` Junio C Hamano
  2018-04-17 17:27                           ` Lars Schneider
  0 siblings, 2 replies; 29+ messages in thread
From: Jacob Keller @ 2018-04-16 17:45 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	mgorny, Peter Backes, winserver.support, Theodore Ts'o

On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
>> What if Git kept a LRU list that contains file path, content hash, and
>> mtime of any file that is removed or modified during a checkout. If a
>> file is checked out later with the exact same path and content hash,
>> then Git could set the mtime to the previous value. This way the
>> compiler would not think that the content has been changed since the
>> last rebuild.
>
> That would only work until they actuall *did* a build on the second
> branch, and upon changing back, how would this detect that it needs to
> update mtime again? I don't think this solution really works.
> Ultimately, the problem is that the build tool relies on the mtime to
> determine what to rebuild. I think this would cause worse problems
> because we *wouldn't* rebuild in the case. How is git supposed to know
> that we rebuilt when switching branches or not?
>
> Thanks,
> Jake

I think a better solution for your problem would be to extend the
build system you're using to avoid rebuilding when the contents
haven't changed since last build (possibly by using hashes?). At the
very least, I would not want this to be default, as it could possibly
result in *no* build when there should be one, which is far more
confusing to debug.

Thanks,
Jake
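The content-hash approach suggested above amounts to something like
this (a minimal sketch; a real hash-keyed build tool would also have
to track compiler flags, dependencies, and so on):

```python
import hashlib

def needs_rebuild(source_bytes, recorded_digest):
    """Rebuild only if the source content differs from the digest
    recorded at the last successful build; mtime plays no part."""
    return hashlib.sha256(source_bytes).hexdigest() != recorded_digest
```

After each successful build the tool would record the new digest, so
branch switches that restore byte-identical content never trigger a
rebuild, regardless of what happened to the mtimes.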

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 16:07                     ` Lars Schneider
  2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
  2018-04-16 17:43                       ` Jacob Keller
@ 2018-04-16 17:47                       ` Phillip Wood
  2018-04-16 20:09                       ` lists
  3 siblings, 0 replies; 29+ messages in thread
From: Phillip Wood @ 2018-04-16 17:47 UTC (permalink / raw)
  To: Lars Schneider, Linus Torvalds
  Cc: Junio C Hamano, Elijah Newren, Git Mailing List, mgorny, rtc,
	winserver.support, tytso

On 16/04/18 17:07, Lars Schneider wrote:
> 
> I am happy to see this discussion and the patches, because long rebuilds 
> are a constant annoyance for us. We might have been bitten by the exact 
> case discussed here, but more often, we have a slightly different 
> situation:
> 
> An engineer works on a task branch and runs incremental builds — all 
> is good. The engineer switches to another branch to review another 
> engineer's work. This other branch changes a low-level header file, 
> but no rebuild is triggered. The engineer switches back to the previous 
> task branch. At this point, the incremental build will rebuild 
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
> 
> Of course, this problem can be solved with a separate worktree. However, 
> our engineers forget about that sometimes, and then, they are annoyed by 
> a 4h rebuild.
> 
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
> 
> What if Git kept a LRU list that contains file path, content hash, and 
> mtime of any file that is removed or modified during a checkout. If a 
> file is checked out later with the exact same path and content hash, 
> then Git could set the mtime to the previous value. This way the 
> compiler would not think that the content has been changed since the 
> last rebuild.

Hi Lars

But if there has been a rebuild between the checkouts then you
want the compiler to rebuild. I've been using the script below
recently to save and restore mtimes around running rebase to squash
fixup commits. To avoid restoring the mtimes if there has been a
rebuild since they were stored, it takes a list of build sentinels and
stores their mtimes too - if any of those change then it will refuse
to restore the original mtimes of the tracked files. (If you give a
path that does not exist when the mtimes are stored, then it will
refuse to restore the mtimes if that path exists when you run 'git
mtimes restore'.) The sentinels can be specified on the command line
when running 'git mtimes save' or stored in multiple mtimes.sentinel
config keys.

Best Wishes

Phillip

--->8---

#!/usr/bin/perl

# Copyright (C) 2018 Phillip Wood <phillip.wood@dunelm.org.uk>
#
# git-mtimes.perl
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

use 5.008;
use strict;
use warnings;
use File::Copy (qw(copy));
use File::Spec::Functions (qw(abs2rel catfile file_name_is_absolute rel2abs));
use Storable ();

sub git;

my $GIT_DIR = git(qw(rev-parse --git-dir)) or exit 1;
$GIT_DIR = rel2abs($GIT_DIR);
my $mtimes_path = "$GIT_DIR/mtimes";

sub git {
    my @lines;
    # in a scalar context slurp removing any trailing $/
    # in an array context return a list of lines
    {
	local $/ = wantarray ? $/ : undef;
	local $,=' ';
	open my $fh, '-|', 'git', @_ or die "git @_ failed $!";
	@lines = <$fh>;
	chomp @lines;
	unless (close $fh) {
	    $? == -1 and die "git @_ not found";
	    my $code = $? >> 8;
	    $_[0] eq 'config' and $code == 1 or
		die "git @_ failed with exit code $code"
	}
    }
    wantarray and return @lines;
    @lines and chomp @lines;
    return $lines[0];
}

sub ls_files {
    # mode, uid, gid, mtime and maybe atime
    my @stat_indices = $_[0] ? (2, 4, 5, 9, 8) : (2, 4, 5, 9);
    local $_;
    local $/ = "\0";
    my @files;
    for (git(qw(ls-files --stage -z))) {
	if (/^[^ ]+ ([^\t]+) 0\t(.*)/) {
	    my @stat = stat($2);
	    # store name, hash, mode, uid, gid, mtime and maybe atime
	    push @files, [ $2, $1, @stat[@stat_indices] ];
	}
    }
    return @files;
}

sub get_config {
    local $/ = "\0";
    my $get = wantarray ? '--get-all' : '--get';
    git(qw(config -z), $get, @_);
}

sub save {
    local $_;
    my @sentinels = get_config('mtimes.sentinel');
    push @sentinels, @ARGV;
    @sentinels or die "No sentinels given";
    @sentinels = map { [ $_, [ stat $_ ] ] } @sentinels;
    my @files = ls_files();
    Storable::nstore( [ [ @sentinels ] , [ @files ] ], $mtimes_path) or
	die "unable to store mtimes $!";
}

sub match_sentinel_data {
    local $_;
    my ($old, $new, $trustctime) = @_;
    if (!@$old) {
	return (@$new) ? undef : 1;
    } else {
	@$new or return undef;
    }
    # Skip hardlink count, atime, blksize
    for (0..2,4..7,9,10,12) {
	next if ($_ == 10 and ! $trustctime);
	$old->[$_] == $new->[$_] or return undef;
    }
    return 1;
}

sub needs_update {
    local $_;
    my ($old, $new) = @_;
    for (0..1) {
	$old->[$_] eq $new->[$_] or return undef;
    }
    for (2..4) {
	$old->[$_] == $new->[$_] or return undef;
    }
    $old->[5] != $new->[5];
}

sub restore {
    local $_;
    my $stored = Storable::retrieve($mtimes_path) or
	die "unable to load stored data";
    my $trustctime = get_config('--bool', 'core.trustctime');
    $trustctime = defined($trustctime) ? $trustctime eq 'true' : 1;
    my ($sentinels, $oldfiles) = @$stored;
    for (@$sentinels) {
	match_sentinel_data( [ stat($_->[0]) ], $_->[1], $trustctime) or
	    die "Unable to restore mtimes, stat data for sentinel '$_->[0]' does not match";
    }
    my @newfiles = ls_files(1);
    my ($i, $restored) = (0, 0);
    for (@$oldfiles) {
	while ($i < @newfiles and $newfiles[$i]->[0] lt $_->[0]) {
	    $i++;
	}
	if ($i < @newfiles and needs_update($_, $newfiles[$i])) {
	    utime($newfiles[$i]->[6], $_->[5], $_->[0]);
	    $restored = 1;
	}
    }
    if ($restored) {
	print "restored mtimes\n";
    }
}

my $cmd = shift;
# Keep relative paths relative in case repository directory is renamed
# between saving and restoring mtimes.
if ($ENV{GIT_PREFIX}) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : catfile($ENV{GIT_PREFIX}, $_);
    } @ARGV;
}
my $up = git(qw(rev-parse --show-cdup));
if ($up) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : abs2rel(rel2abs($_), $up);
    } @ARGV;
    chdir $up;
}

my $tmp_index = catfile($GIT_DIR, "mtimes-index");
my $src_index = $ENV{GIT_INDEX_FILE} ? $ENV{GIT_INDEX_FILE} :
				       catfile($GIT_DIR, "index");
copy($src_index, $tmp_index) or
    die "cannot create temporary index '$tmp_index'\n";
$ENV{GIT_INDEX_FILE} = $tmp_index;
git(qw(add -u));

if ($cmd eq 'save') {
    save();
} elsif ($cmd eq 'restore' and ! @ARGV) {
    restore();
} else {
    print STDERR "usage: git mtimes <save [sentinels ...] | restore>\n";
}

END {
    unlink $tmp_index;
}

> I think that would fix the problem that our engineers run into and also 
> the problem that Linus experienced during the merge, wouldn't it?
> 
> Thanks,
> Lars
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 16:07                     ` Lars Schneider
                                         ` (2 preceding siblings ...)
  2018-04-16 17:47                       ` Phillip Wood
@ 2018-04-16 20:09                       ` lists
  3 siblings, 0 replies; 29+ messages in thread
From: lists @ 2018-04-16 20:09 UTC (permalink / raw)
  To: larsxschneider; +Cc: git

Lars Schneider <larsxschneider@gmail.com> wrote:

> An engineer works on a task branch and runs incremental builds — all 
> is good. The engineer switches to another branch to review another 
> engineer's work. This other branch changes a low-level header file, 
> but no rebuild is triggered. The engineer switches back to the previous
> task branch. At this point, the incremental build will rebuild 
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).

If you are using C or C++, look into ccache. We're using it to work
around this problem, and it's a near-perfect solution, I'd say.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 17:45                         ` Jacob Keller
@ 2018-04-16 22:34                           ` Junio C Hamano
  2018-04-17 17:27                           ` Lars Schneider
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-04-16 22:34 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Lars Schneider, Linus Torvalds, Elijah Newren, Git Mailing List,
	mgorny, Peter Backes, winserver.support, Theodore Ts'o

Jacob Keller <jacob.keller@gmail.com> writes:

> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
> ...
> I think a better solution for your problem would be to extend the
> build system you're using to avoid rebuilding when the contents
> haven't changed since last build (possibly by using hashes?). At the
> very least, I would not want this to be default, as it could possibly
> result in *no* build when there should be one, which is far more
> confusing to debug.

Yup.

Even though I take many optional features that I do not see a need
to support on the core side and do not plan to use personally, as
long as the implementation is cleanly separated to make it clear
that those who do not use them won't be negatively affected, I would
not want to have this in a system I maintain and get blamed by those
who get burned by it.  Something like ccache, not a version control
system, is meant to serve this audience.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16  2:03                   ` Linus Torvalds
  2018-04-16 16:07                     ` Lars Schneider
@ 2018-04-16 22:55                     ` Elijah Newren
  1 sibling, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2018-04-16 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Sun, Apr 15, 2018 at 7:03 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> One thing that makes me curious is what happens (and what we want to
>> happen) when such a "we already have the changes the side branch
>> tries to bring in" path has local (i.e. not yet in the index)
>> changes.  For a dirty file that trivially merges (e.g. a path we
>> modified since our histories forked, while the other side didn't do
>> anything, has local changes in the working tree), we try hard to
>> make the merge succeed while keeping the local changes, and we
>> should be able to do the same in this case, too.
>
> I think it might be nice, but probably not really worth it.
<snip>
> So I don't think it's a big deal, and I'd rather have the merge fail
> very early with "that file has seen changes in the branch you are
> merging" than add any real complexity to the merge logic.

That's actually problematic, and not yet possible with the current
merge-recursive code.  The fail-very-early code is in unpack_trees(),
and it doesn't understand renames.  By the time we get to the rest of
the logic of merge-recursive, unpack_trees() has already written
updates to lots of files throughout the tree, so bailing and leaving
the tree in a half-merged state is no longer an option.  (The logic in
merge-recursive.c is excessively complex in part due to this issue,
making it implement what amounts to a four-way merge instead of just a
three-way merge.  It's gross.)

So, if we were to use the brute-force scheme here, when renames are
involved, we'd need to have some special logic for handling dirty
files.  That logic would probably include checking the original index
for both modification times (to determine if the file is dirty), and
for comparison of contents.  In short, we'd still need all the logic
that this scheme was trying to get rid of in the first place.
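
The per-path decision being described can be sketched with plumbing
commands (illustrative only: decide_path is a made-up helper, not part
of git, and the real merge-recursive does this in C against the
in-memory index rather than by shelling out):

```shell
# Illustrative only: the per-path decision described above, expressed
# with git plumbing.  decide_path is a made-up helper; it takes a path
# and the blob id the merge result wants for that path.
decide_path () {
    path=$1 new_blob=$2

    if ! git diff-files --quiet -- "$path"; then
        # working tree differs from the index: the "dirty" case that
        # needs the careful handling discussed above
        echo "dirty: $path"
    elif [ "$(git rev-parse "HEAD:$path")" = "$new_blob" ]; then
        # merge result matches what is already checked out: skip the
        # write and leave the mtime alone
        echo "skip: $path"
    else
        echo "write: $path"
    fi
}
```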

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16  1:44                 ` Junio C Hamano
  2018-04-16  2:03                   ` Linus Torvalds
@ 2018-04-16 23:03                   ` Elijah Newren
  1 sibling, 0 replies; 29+ messages in thread
From: Elijah Newren @ 2018-04-16 23:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:

> One thing that makes me curious is what happens (and what we want to
> happen) when such a "we already have the changes the side branch
> tries to bring in" path has local (i.e. not yet in the index)
> changes.  For a dirty file that trivially merges (e.g. a path we
> modified since our histories forked, while the other side didn't do
> anything, has local changes in the working tree), we try hard to
> make the merge succeed while keeping the local changes, and we
> should be able to do the same in this case, too.
>
> Your illustration patch that reads from the working tree to compare
> with the contents we are about to write out of course won't cope
> with this case ;-).  If we do things in the index like the approach
> to fix was_tracked(), I suspect that we would just say "as long as
> the index and the HEAD matches, i.e. we are keeping the path as it is
> found in HEAD as the merge result, we do not touch the working tree"
> and leave such a local modification alone.

I could see it going either way (fail early, or succeed because we
don't need to overwrite anything), but my suspicion is also towards
letting the merge succeed.

It looks like my patches need a little more fixing up (the was_dirty()
calls would be mixing both the old and the new index -- it used the
original index via was_tracked(), but used the current index via
cache_file_exists(); it should be more consistent).  I'm fixing that
up, will add this as another testcase to t6046, and will submit a
newer RFC in a day or so.

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
@ 2018-04-17 17:23                         ` Lars Schneider
  0 siblings, 0 replies; 29+ messages in thread
From: Lars Schneider @ 2018-04-17 17:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	mgorny, rtc, winserver.support, tytso


> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> 
> On Mon, Apr 16 2018, Lars Schneider wrote:
> 
>>> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> 
>>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>> 
>>>> I think Elijah's corrected was_tracked() also does not care "has
>>>> this been renamed".
>>> 
>>> I'm perfectly happy with the slightly smarter patches. My patch was
>>> really just an RFC and because I had tried it out.
>>> 
>>>> One thing that makes me curious is what happens (and what we want to
>>>> happen) when such a "we already have the changes the side branch
>>>> tries to bring in" path has local (i.e. not yet in the index)
>>>> changes.  For a dirty file that trivially merges (e.g. a path we
>>>> modified since our histories forked, while the other side didn't do
>>>> anything, has local changes in the working tree), we try hard to
>>>> make the merge succeed while keeping the local changes, and we
>>>> should be able to do the same in this case, too.
>>> 
>>> I think it might be nice, but probably not really worth it.
>>> 
>>> I find the "you can merge even if some files are dirty" to be really
>>> convenient, because I often keep stupid test patches in my tree that I
>>> may not even intend to commit, and I then use the same tree for
>>> merging.
>>> 
>>> For example, I sometimes end up editing the Makefile for the release
>>> version early, but I won't *commit* that until I actually cut the
>>> release. But if I pull some branch that has also changed the Makefile,
>>> it's not worth any complexity to try to be nice about the dirty state.
>>> 
>>> If it's a file that actually *has* been changed in the branch I'm
>>> merging, and I'm more than happy to just stage the patch (or throw it
>>> away - I think it's about 50:50 for me).
>>> 
>>> So I don't think it's a big deal, and I'd rather have the merge fail
>>> very early with "that file has seen changes in the branch you are
>>> merging" than add any real complexity to the merge logic.
>> 
>> I am happy to see this discussion and the patches, because long rebuilds
>> are a constant annoyance for us. We might have been bitten by the exact
>> case discussed here, but more often, we have a slightly different
>> situation:
>> 
>> An engineer works on a task branch and runs incremental builds — all
>> is good. The engineer switches to another branch to review another
>> engineer's work. This other branch changes a low-level header file,
>> but no rebuild is triggered. The engineer switches back to the previous
>> task branch. At this point, the incremental build will rebuild
>> everything, as the compiler thinks that the low-level header file has
>> been changed (because the mtime is different).
>> 
>> Of course, this problem can be solved with a separate worktree. However,
>> our engineers forget about that sometimes, and then, they are annoyed by
>> a 4h rebuild.
>> 
>> Is this situation a problem for others too?
>> If yes, what do you think about the following approach:
>> 
>> What if Git kept a LRU list that contains file path, content hash, and
>> mtime of any file that is removed or modified during a checkout. If a
>> file is checked out later with the exact same path and content hash,
>> then Git could set the mtime to the previous value. This way the
>> compiler would not think that the content has been changed since the
>> last rebuild.
>> 
>> I think that would fix the problem that our engineers run into and also
>> the problem that Linus experienced during the merge, wouldn't it?
> 
> Could what you're describing be prototyped as a post-checkout hook that
> looks at the reflog? It sounds to me like it could, but perhaps I've
> missed some subtlety.

Yeah, probably. You don't even need the reflog I think. I just wanted
to get a sense if other people run into this problem too.
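
Sketched as a hook, the idea might look something like this (a
prototype only: the .git/mtime-cache file and its
"<path>\t<blob>\t<mtime>" format are invented here, some earlier step
would have to record entries into it, and touch -d "@<epoch>" assumes
GNU coreutils):

```shell
#!/bin/sh
# Prototype post-checkout hook for the mtime-restoring idea above.
# Everything here is invented: .git/mtime-cache is not a real git
# feature, and recording entries into it is left to an earlier step.
restore_mtimes () {
    db=$(git rev-parse --git-dir 2>/dev/null)/mtime-cache
    [ -f "$db" ] || return 0
    tab=$(printf '\t')
    while IFS=$tab read -r path blob mtime; do
        [ -f "$path" ] || continue
        # restore the old mtime only if the content really is unchanged
        if [ "$(git hash-object "$path")" = "$blob" ]; then
            touch -d "@$mtime" -- "$path"    # GNU touch
        fi
    done <"$db"
}
restore_mtimes
```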


> Not re-writing out a file that hasn't changed is one thing, but I think
> for more complex behaviors (such as the "I want everything to have the
> same mtime" mentioned in another thread on-list), and this, it makes
> sense to provide some hook mechanism than have git itself do all the
> work.
> 
> I also don't see how what you're describing could be generalized, or
> even be made to work reliably in the case you're describing. If the
> engineer runs "make" on this branch he's testing out that might produce
> an object file that'll get used as-is once he switches back, since
> you've set the mtime in the past for that file because you re-checked it
> out.

Ohh... you're right. I thought Visual Studio looks *just* at ctime/mtime 
of the files. But this seems not to be true [1]:
 
   "MSBuild to build it quickly checks if any of a project’s input files 
    are modified later than any of the project’s outputs"

In that case my idea outlined above is garbage.

Thanks,
Lars


[1] https://blogs.msdn.microsoft.com/kirillosenkov/2014/08/04/how-to-investigate-rebuilding-in-visual-studio-when-nothing-has-changed/

* Re: Optimizing writes to unchanged files during merges?
  2018-04-16 17:45                         ` Jacob Keller
  2018-04-16 22:34                           ` Junio C Hamano
@ 2018-04-17 17:27                           ` Lars Schneider
  2018-04-17 17:43                             ` Jacob Keller
  1 sibling, 1 reply; 29+ messages in thread
From: Lars Schneider @ 2018-04-17 17:27 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	mgorny, Peter Backes, winserver.support, Theodore Ts'o


> On 16 Apr 2018, at 19:45, Jacob Keller <jacob.keller@gmail.com> wrote:
> 
> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
>> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
>> <larsxschneider@gmail.com> wrote:
>>> What if Git kept a LRU list that contains file path, content hash, and
>>> mtime of any file that is removed or modified during a checkout. If a
>>> file is checked out later with the exact same path and content hash,
>>> then Git could set the mtime to the previous value. This way the
>>> compiler would not think that the content has been changed since the
>>> last rebuild.
>> 
>> That would only work until they actually *did* a build on the second
>> branch, and upon changing back, how would this detect that it needs to
>> update mtime again? I don't think this solution really works.
>> Ultimately, the problem is that the build tool relies on the mtime to
>> determine what to rebuild. I think this would cause worse problems
>> because we *wouldn't* rebuild in that case. How is git supposed to know
>> that we rebuilt when switching branches or not?
>> 
>> Thanks,
>> Jake
> 
> I think a better solution for your problem would be to extend the
> build system you're using to avoid rebuilding when the contents
> haven't changed since last build (possibly by using hashes?). At the
> very least, I would not want this to be default, as it could possibly
> result in *no* build when there should be one, which is far more
> confusing to debug.

I am 100% with you that this is a build system issue. But changing
the build system for many teams in a large organization is really
hard. That's why I wondered if Git could help with a shortcut.
Looks like there is no shortcut (see my other reply in this thread).
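
For what it's worth, the content-hash check is small enough to bolt
onto a rule-based build as a wrapper. A toy sketch, in which cat stands
in for the real compile step and the .hash.* stamp files are an
invented convention (ccache does all of this properly):

```shell
# Toy sketch of "rebuild on content change, not mtime change".
build () {
    src=$1 out=$2
    stamp=".hash.$(basename "$src")"
    new=$(sha1sum "$src" | cut -d' ' -f1)
    if [ -f "$out" ] && [ "$(cat "$stamp" 2>/dev/null)" = "$new" ]; then
        echo "up to date: $out"
    else
        cat "$src" >"$out"              # stand-in for the compiler
        printf '%s\n' "$new" >"$stamp"
        echo "rebuilt: $out"
    fi
}
```

A plain mtime bump (the situation in this thread) then no longer
triggers a rebuild, because the stamp still matches the content hash.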

Thanks
Lars

* Re: Optimizing writes to unchanged files during merges?
  2018-04-17 17:27                           ` Lars Schneider
@ 2018-04-17 17:43                             ` Jacob Keller
  0 siblings, 0 replies; 29+ messages in thread
From: Jacob Keller @ 2018-04-17 17:43 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Linus Torvalds, Junio C Hamano, Elijah Newren, Git Mailing List,
	Michał Górny, Peter Backes, winserver.support,
	Theodore Ts'o

On Tue, Apr 17, 2018 at 10:27 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
>> On 16 Apr 2018, at 19:45, Jacob Keller <jacob.keller@gmail.com> wrote:
>>
>> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
>>> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
>>> <larsxschneider@gmail.com> wrote:
>>>> What if Git kept a LRU list that contains file path, content hash, and
>>>> mtime of any file that is removed or modified during a checkout. If a
>>>> file is checked out later with the exact same path and content hash,
>>>> then Git could set the mtime to the previous value. This way the
>>>> compiler would not think that the content has been changed since the
>>>> last rebuild.
>>>
>>> That would only work until they actually *did* a build on the second
>>> branch, and upon changing back, how would this detect that it needs to
>>> update mtime again? I don't think this solution really works.
>>> Ultimately, the problem is that the build tool relies on the mtime to
>>> determine what to rebuild. I think this would cause worse problems
>>> because we *wouldn't* rebuild in that case. How is git supposed to know
>>> that we rebuilt when switching branches or not?
>>>
>>> Thanks,
>>> Jake
>>
>> I think a better solution for your problem would be to extend the
>> build system you're using to avoid rebuilding when the contents
>> haven't changed since last build (possibly by using hashes?). At the
>> very least, I would not want this to be default, as it could possibly
>> result in *no* build when there should be one, which is far more
>> confusing to debug.
>
> I am 100% with you that this is a build system issue. But changing
> the build system for many teams in a large organization is really
> hard. That's why I wondered if Git could help with a shortcut.
> Looks like there is no shortcut (see my other reply in this thread).
>
> Thanks
> Lars

Right. I think that solutions involving hooks or scripts which "fix"
the mtimes are the best bet for this problem then, given that building
it into git would cause problems for other users. (And personally I
would always err on the side of causing rebuilds unless we're 100%
sure)

Thanks,
Jake

end of thread, back to index

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 21:14 Optimizing writes to unchanged files during merges? Linus Torvalds
2018-04-12 21:46 ` Junio C Hamano
2018-04-12 23:17   ` Junio C Hamano
2018-04-12 23:35     ` Linus Torvalds
2018-04-12 23:41       ` Linus Torvalds
2018-04-12 23:55         ` Linus Torvalds
2018-04-13  0:01           ` Linus Torvalds
2018-04-13  7:02             ` Elijah Newren
2018-04-13 17:14               ` Linus Torvalds
2018-04-13 17:39                 ` Stefan Beller
2018-04-13 17:53                   ` Linus Torvalds
2018-04-13 20:04                 ` Elijah Newren
2018-04-13 22:27                   ` Junio C Hamano
2018-04-16  1:44                 ` Junio C Hamano
2018-04-16  2:03                   ` Linus Torvalds
2018-04-16 16:07                     ` Lars Schneider
2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
2018-04-17 17:23                         ` Lars Schneider
2018-04-16 17:43                       ` Jacob Keller
2018-04-16 17:45                         ` Jacob Keller
2018-04-16 22:34                           ` Junio C Hamano
2018-04-17 17:27                           ` Lars Schneider
2018-04-17 17:43                             ` Jacob Keller
2018-04-16 17:47                       ` Phillip Wood
2018-04-16 20:09                       ` lists
2018-04-16 22:55                     ` Elijah Newren
2018-04-16 23:03                   ` Elijah Newren
2018-04-12 23:18   ` Linus Torvalds
2018-04-13  0:01 ` Elijah Newren
