git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Elijah Newren <newren@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Optimizing writes to unchanged files during merges?
Date: Fri, 13 Apr 2018 10:14:26 -0700	[thread overview]
Message-ID: <CA+55aFwi9pTAJT_qtv=vHLgu=B1fdXBoD96i8Y5xnbS=zrfSzg@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BHQsOSCJiPU9Ku5b67QTkAjnEBrhx04mTXf2QdPBriHmw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2434 bytes --]

On Fri, Apr 13, 2018 at 12:02 AM, Elijah Newren <newren@gmail.com> wrote:
>
> I hope you don't mind me barging into your conversation

I was getting tired of my own rambling anyway..

> However, it turns out we have this awesome function called
> "was_tracked(const char *path)" that was intended for answering this
> exact question.  So, assuming was_tracked() isn't buggy, the correct
> patch for this problem would look like:

Apparently that causes problems, for some odd reason.

I like the notion of checking the index, but it's not clear that the
index is reliable in the presence of renames either.

>   A big series
> including that patch was merged to master two days ago, but
> unfortunately that exact patch was the one that caused some
> impressively awful fireworks[1].

Yeah, so this code is fragile.

How about we take a completely different approach? Instead of relying
on fragile (but clever) tests, why not rely on stupid brute force?

Yeah, yeah, it's bad to be stupid, but sometimes simple and stupid
really does work.

See the attached patch. It gets rid of all the subtle "has this been
renamed" tests entirely, and just _always_ does that final
update_file().

But instead, it makes update_file() a bit smarter, and says "before we
write this file out, let's see if it's already there and already has
the expected contents"?

Now, it really shouldn't be _quite_ as stupid as that: we should
probably set a flag in the "hey, the oid matches, maybe it's worth
checking", so that it doesn't do the check in the cases where we know
the merge has done things.

But it's actually *fairly* low cost, because before it reads the file
it at least checks that file length matches the expected length (and
that the user permission bits match the expected mode).

So if the file doesn't match, most of the time the real cost will just
be an extra 'open/fstat/close()' sequence. That's pretty cheap.

So even the completely stupid approach is probably not too bad, and it
could be made smarter.

NOTE! I have *NOT* tested this on anything interesting. I tested the
patch on my stupid test-case, but that'[s it. I didn't even bother
re-doing the kernel merge that started this.

Comments? Because considering the problems this code has had, maybe
"stupid" really is the right approach...

[ Ok, I lied. I just tested this on the kernel merge. It worked fine,
and avoided modifying <linux/mm.h> ]

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2221 bytes --]

 merge-recursive.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624..ed2200065 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -815,6 +815,32 @@ static int make_room_for_path(struct merge_options *o, const char *path)
 	return err(o, msg, path, _(": perhaps a D/F conflict?"));
 }
 
+static int working_tree_matches(const char *path, const char *buf, unsigned long size, unsigned mode)
+{
+	int fd, matches;
+	struct stat st;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+	matches = 0;
+	if (!fstat(fd, &st) && st.st_size == size && S_ISREG(st.st_mode) && !(0700 & (st.st_mode ^ mode))) {
+		char tmpbuf[1024];
+		while (size) {
+			int n = read(fd, tmpbuf, sizeof(tmpbuf));
+			if (n <= 0 || n > size)
+				break;
+			if (memcmp(tmpbuf, buf, n))
+				break;
+			buf += n;
+			size -= n;
+		}
+		matches = !size;
+	}
+	close(fd);
+	return matches;
+}
+
 static int update_file_flags(struct merge_options *o,
 			     const struct object_id *oid,
 			     unsigned mode,
@@ -856,6 +882,8 @@ static int update_file_flags(struct merge_options *o,
 				size = strbuf.len;
 				buf = strbuf_detach(&strbuf, NULL);
 			}
+			if (working_tree_matches(path, buf, size, mode))
+				goto free_buf;
 		}
 
 		if (make_room_for_path(o, path) < 0) {
@@ -1782,20 +1810,8 @@ static int merge_content(struct merge_options *o,
 
 	if (mfi.clean && !df_conflict_remains &&
 	    oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
-		int path_renamed_outside_HEAD;
 		output(o, 3, _("Skipped %s (merged same as existing)"), path);
-		/*
-		 * The content merge resulted in the same file contents we
-		 * already had.  We can return early if those file contents
-		 * are recorded at the correct path (which may not be true
-		 * if the merge involves a rename).
-		 */
-		path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-		if (!path_renamed_outside_HEAD) {
-			add_cacheinfo(o, mfi.mode, &mfi.oid, path,
-				      0, (!o->call_depth), 0);
-			return mfi.clean;
-		}
+		/* We could set a flag here and pass it to "update_file()" */
 	} else
 		output(o, 2, _("Auto-merging %s"), path);
 

  reply	other threads:[~2018-04-13 17:14 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-12 21:14 Optimizing writes to unchanged files during merges? Linus Torvalds
2018-04-12 21:46 ` Junio C Hamano
2018-04-12 23:17   ` Junio C Hamano
2018-04-12 23:35     ` Linus Torvalds
2018-04-12 23:41       ` Linus Torvalds
2018-04-12 23:55         ` Linus Torvalds
2018-04-13  0:01           ` Linus Torvalds
2018-04-13  7:02             ` Elijah Newren
2018-04-13 17:14               ` Linus Torvalds [this message]
2018-04-13 17:39                 ` Stefan Beller
2018-04-13 17:53                   ` Linus Torvalds
2018-04-13 20:04                 ` Elijah Newren
2018-04-13 22:27                   ` Junio C Hamano
2018-04-16  1:44                 ` Junio C Hamano
2018-04-16  2:03                   ` Linus Torvalds
2018-04-16 16:07                     ` Lars Schneider
2018-04-16 17:04                       ` Ævar Arnfjörð Bjarmason
2018-04-17 17:23                         ` Lars Schneider
2018-04-16 17:43                       ` Jacob Keller
2018-04-16 17:45                         ` Jacob Keller
2018-04-16 22:34                           ` Junio C Hamano
2018-04-17 17:27                           ` Lars Schneider
2018-04-17 17:43                             ` Jacob Keller
2018-04-16 17:47                       ` Phillip Wood
2018-04-16 20:09                       ` Stefan Haller
2018-04-16 22:55                     ` Elijah Newren
2018-04-16 23:03                   ` Elijah Newren
2018-04-12 23:18   ` Linus Torvalds
2018-04-13  0:01 ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFwi9pTAJT_qtv=vHLgu=B1fdXBoD96i8Y5xnbS=zrfSzg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).