git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] checkout: don't write merge results into the object database
@ 2017-06-15 11:33 René Scharfe
  2017-06-15 13:57 ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: René Scharfe @ 2017-06-15 11:33 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Johannes Schindelin, Jeff King

Merge results need to be written to the worktree, of course, but we
don't necessarily need object entries for them, especially if they
contain conflict markers.  Use pretend_sha1_file() to create fake
blobs to pass to make_cache_entry() and checkout_entry() instead.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/checkout.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 1624eed7e7..f51c15af9f 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -215,7 +215,7 @@ static int checkout_merged(int pos, const struct checkout *state)
 
 	/*
 	 * NEEDSWORK:
-	 * There is absolutely no reason to write this as a blob object
+	 * There is absolutely no reason to build a fake blob object
 	 * and create a phony cache entry.  This hack is primarily to get
 	 * to the write_entry() machinery that massages the contents to
 	 * work-tree format and writes out which only allows it for a
@@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct checkout *state)
 	 * (it also writes the merge result to the object database even
 	 * when it may contain conflicts).
 	 */
-	if (write_sha1_file(result_buf.ptr, result_buf.size,
-			    blob_type, oid.hash))
+	if (pretend_sha1_file(result_buf.ptr, result_buf.size,
+			      OBJ_BLOB, oid.hash))
 		die(_("Unable to add merge result for '%s'"), path);
 	free(result_buf.ptr);
 	ce = make_cache_entry(mode, oid.hash, path, 2, 0);
-- 
2.13.0



* Re: [PATCH] checkout: don't write merge results into the object database
  2017-06-15 11:33 [PATCH] checkout: don't write merge results into the object database René Scharfe
@ 2017-06-15 13:57 ` Jeff King
  2017-06-15 15:48   ` René Scharfe
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2017-06-15 13:57 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On Thu, Jun 15, 2017 at 01:33:42PM +0200, René Scharfe wrote:

> Merge results need to be written to the worktree, of course, but we
> don't necessarily need object entries for them, especially if they
> contain conflict markers.  Use pretend_sha1_file() to create fake
> blobs to pass to make_cache_entry() and checkout_entry() instead.

Conceptually this makes sense, although I have a comment below.

Out of curiosity, did this come up when looking into the cherry-pick
segfault/error from a few hours ago?

> @@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct checkout *state)
>  	 * (it also writes the merge result to the object database even
>  	 * when it may contain conflicts).
>  	 */
> -	if (write_sha1_file(result_buf.ptr, result_buf.size,
> -			    blob_type, oid.hash))
> +	if (pretend_sha1_file(result_buf.ptr, result_buf.size,
> +			      OBJ_BLOB, oid.hash))
>  		die(_("Unable to add merge result for '%s'"), path);
>  	free(result_buf.ptr);

I wondered if pretend_sha1_file() makes a copy of the buffer, and indeed
it does. So this is correct.

But that raises an interesting question: how big are these objects, and
is it a good idea to store them in RAM? Obviously they're in RAM
already, but I'm not sure whether that happens only one at a time. We
could be bumping the peak RAM usage if there is a large number of these
entries. Touching the on-disk odb does feel hacky, but it also means
they behave like other objects.

If it is a concern, I think it could be solved by "unpretending" after
our call to checkout_entry completes. That would need a new call in
sha1_file.c, but it should be easy to write.
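
Just to sketch what I mean (untested, and assuming the flat
cached_objects array that pretend_sha1_file() appends to stays as it
is), something like this in sha1_file.c might do:

int unpretend_sha1_file(const unsigned char *sha1)
{
	int i;

	for (i = 0; i < cached_object_nr; i++) {
		struct cached_object *co = &cached_objects[i];

		if (hashcmp(co->sha1, sha1))
			continue;
		free(co->buf);
		/* fill the hole with the last entry and shrink the array */
		cached_objects[i] = cached_objects[--cached_object_nr];
		return 0;
	}
	return -1;	/* no such pretended object */
}

checkout_merged() could then drop the fake blob as soon as its
checkout_entry() call returns.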

-Peff


* Re: [PATCH] checkout: don't write merge results into the object database
  2017-06-15 13:57 ` Jeff King
@ 2017-06-15 15:48   ` René Scharfe
  2017-06-15 21:56     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: René Scharfe @ 2017-06-15 15:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano, Johannes Schindelin

On 15.06.2017 at 15:57, Jeff King wrote:
> On Thu, Jun 15, 2017 at 01:33:42PM +0200, René Scharfe wrote:
> 
>> Merge results need to be written to the worktree, of course, but we
>> don't necessarily need object entries for them, especially if they
>> contain conflict markers.  Use pretend_sha1_file() to create fake
>> blobs to pass to make_cache_entry() and checkout_entry() instead.
> 
> Conceptually this makes sense, although I have a comment below.
> 
> Out of curiosity, did this come up when looking into the cherry-pick
> segfault/error from a few hours ago?

No, it came up in the discussion of Dscho's memory-leak plugging series [1].

>> @@ -225,8 +225,8 @@ static int checkout_merged(int pos, const struct checkout *state)
>>   	 * (it also writes the merge result to the object database even
>>   	 * when it may contain conflicts).
>>   	 */
>> -	if (write_sha1_file(result_buf.ptr, result_buf.size,
>> -			    blob_type, oid.hash))
>> +	if (pretend_sha1_file(result_buf.ptr, result_buf.size,
>> +			      OBJ_BLOB, oid.hash))
>>   		die(_("Unable to add merge result for '%s'"), path);
>>   	free(result_buf.ptr);
> 
> I wondered if pretend_sha1_file() makes a copy of the buffer, and indeed
> it does. So this is correct.
> 
> But that raises an interesting question: how big are these objects, and
> is it a good idea to store them in RAM? Obviously they're in RAM
> already, but I'm not sure whether that happens only one at a time. We
> could be bumping the peak RAM usage if there is a large number of these
> entries. Touching the on-disk odb does feel hacky, but it also means
> they behave like other objects.
> 
> If it is a concern, I think it could be solved by "unpretending" after
> our call to checkout_entry completes. That would need a new call in
> sha1_file.c, but it should be easy to write.

Good point; we'd accumulate fake entries that we'll never need again.
The patch should clean them up.

Alternatively we could finally address the NEEDSWORK comment and
provide a way to check out a file without faking an index entry.
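
The NEEDSWORK comment already hints at the shape of such an interface;
a rough sketch (hypothetical, name made up) could be:

/*
 * Hypothetical: write <buf, size, mode> to the work tree at path,
 * running the usual convert-to-working-tree filters, without
 * requiring a cache entry or a blob in the object database.
 */
int checkout_buffer(const struct checkout *state, const char *path,
		    const void *buf, unsigned long size, unsigned int mode);

checkout_merged() could then feed the merge result to it directly and
drop the whole fake blob and phony cache entry dance.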

René


[1] https://public-inbox.org/git/2704e145927c851c4163a68cfdfd5ada48fff21d.1493906085.git.johannes.schindelin@gmx.de/


* Re: [PATCH] checkout: don't write merge results into the object database
  2017-06-15 15:48   ` René Scharfe
@ 2017-06-15 21:56     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2017-06-15 21:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Git List, Johannes Schindelin

René Scharfe <l.s.r@web.de> writes:

>> If it is a concern, I think it could be solved by "unpretending" after
>> our call to checkout_entry completes. That would need a new call in
>> sha1_file.c, but it should be easy to write.
>
> Good point; we'd accumulate fake entries that we'll never need again.

Hopefully we are not pretending to have the same object from two
callsites; this one may know that the merged result no longer needs to
be in core, but if some other callsite wanted to pretend that a blob
with the same contents is in the object store, what happens to it?  I
do not think we want to refcount ;-)

> The patch should clean them up.
>
> Alternatively we could finally address the NEEDSWORK comment and
> provide a way to check out a file without faking an index entry.

Yeah, we should not need index entry, and we should not need the
blob object name.

    ... thinks a bit more ...

Having said that, in the longer term, it may be safer to keep writing
an actual object out to the object store.  The convert-to-working-tree
backends currently work only on raw bytes, but it is not implausible
for some new interfaces to want to pass the object name to the backend
and tell it either to give raw (converted) bytes back, or to write out
the bytes directly to the filesystem, bypassing the main Git process.
If we "pretend" in this process, we not only accumulate cruft in-core,
as Peff points out, but also risk giving out an invalid object name to
such external mechanisms.
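
For illustration only (no such interface exists today), imagine
something along the lines of:

/*
 * Hypothetical future hook: hand the conversion backend an object
 * name and let it produce the smudged contents itself, or write
 * them straight to the filesystem.
 */
int convert_to_working_tree_from_oid(const char *path,
				     const unsigned char *sha1,
				     struct strbuf *dst);

A pretended object name given to a backend that runs outside our
process would not resolve, which is exactly the risk above.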

I do not think it is too bad to leave a handful of temporary blobs
that are written out by "git checkout -m <other-branch>" in the
object store to be GC'ed.



