From: Jeff King <>
To: Junio C Hamano <>
Cc: John Cai <>, git <>,
	Christian Couder <>
Subject: Re: [PATCH 1/3] fsck: free tree buffers after walking unreachable objects
Date: Thu, 22 Sep 2022 18:16:12 -0400
Message-ID: <YyzerG/>
In-Reply-To: <xmqq5yhfyztm.fsf@gitster.g>

On Thu, Sep 22, 2022 at 12:27:33PM -0700, Junio C Hamano wrote:

> > As a side note, IMHO having tree->buffer at all is a mistake, because it
> > leads to exactly this kind of confusion about when the buffer should be
> > discarded. We'd be better off having all callers parse directly into a
> > local buffer, and then clean up when they're done.
> Yeah, tree-walk.c users would use tree_desc structure anyway, and
> instead of having a moving pointer that points into a separate thing
> (i.e. tree->buffer), it could have its own copy of the "whole buffer"
> that can be used to free when it is done iterating over entries.
> > .... But that's obviously a much bigger change.
> Yup.
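A minimal C sketch of the idea Junio describes: the walker keeps a pointer to the start of its allocation alongside the moving cursor, so whoever set up the walk can free the whole buffer once iteration is done. Everything here is hypothetical: owned_tree_desc, hypo_entry and their functions are illustrative names, not git's tree-walk.c API. The entry layout assumes a SHA-1 repository (mode in octal ASCII, a space, a NUL-terminated path, then a 20-byte raw hash).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: SHA-1 repository, so raw object ids are 20 bytes. */
#define HYPO_RAWSZ 20

/* Hypothetical iterator that owns its buffer (not git's struct tree_desc). */
struct owned_tree_desc {
	char *whole_buffer;   /* start of the allocation; freed on release */
	const char *cursor;   /* moving pointer into whole_buffer */
	size_t remaining;     /* bytes left from cursor to the end */
};

struct hypo_entry {
	unsigned mode;
	const char *path;          /* NUL-terminated, points into whole_buffer */
	const unsigned char *oid;  /* HYPO_RAWSZ raw bytes inside whole_buffer */
};

/* Take ownership of a heap buffer holding raw tree contents. */
static void owned_tree_desc_init(struct owned_tree_desc *d, char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	d->whole_buffer = buf;
	d->cursor = buf;
	d->remaining = len;
}

/* Fill *e with the next entry; return 0 at the end or on a malformed entry. */
static int owned_tree_desc_next(struct owned_tree_desc *d, struct hypo_entry *e)
{
	const char *p, *end, *nul;
	unsigned mode = 0;

	if (!d->remaining)
		return 0;
	p = d->cursor;
	end = p + d->remaining;
	while (p < end && *p >= '0' && *p <= '7')
		mode = (mode << 3) | (unsigned)(*p++ - '0');
	if (p >= end || *p++ != ' ')
		return 0;
	nul = memchr(p, '\0', (size_t)(end - p));
	if (!nul || (size_t)(end - nul - 1) < HYPO_RAWSZ)
		return 0;
	e->mode = mode;
	e->path = p;
	e->oid = (const unsigned char *)(nul + 1);
	d->cursor = nul + 1 + HYPO_RAWSZ;
	d->remaining = (size_t)(end - d->cursor);
	return 1;
}

/* Whoever started the walk frees the whole buffer when it is done. */
static void owned_tree_desc_release(struct owned_tree_desc *d)
{
	free(d->whole_buffer);
	d->whole_buffer = NULL;
	d->cursor = NULL;
	d->remaining = 0;
}
```

Because the descriptor, not the tree object, holds the allocation, there is no long-lived tree->buffer left behind to be freed (or forgotten) later.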

I took a (very) brief stab at this, out of curiosity. The sticking point
becomes obvious very quickly: how do we get the buffer to the caller? If
you are calling parse_tree(), we can add new out-parameters to provide
the buffer. But something like parse_object() just returns a pointer to
a struct object, so anything we want to communicate to the caller has to
be stuffed inside the polymorphic struct that contains it (e.g., struct
tree).
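A simplified model of that sticking point, using assumed names (hypo_object, hypo_tree, hypo_parse_tree and hypo_parse_object are hypothetical stand-ins, not git's real structs or signatures): a type-specific parser can hand the buffer back through an out-parameter, while a generic parser that returns only the base object pointer can pass it along only by storing it in the containing struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified object model; not git's real structs. */
enum hypo_type { HYPO_BLOB, HYPO_TREE };

struct hypo_object {
	enum hypo_type type;
	int parsed;
};

struct hypo_tree {
	struct hypo_object object;  /* first member: a tree "is an" object */
	char *buffer;               /* the cached buffer the thread wants gone */
	size_t size;
};

/* Stand-in for reading and inflating an object's contents. */
static char *hypo_read_contents(const char *contents, size_t *len)
{
	char *copy;

	assert(contents != NULL);
	*len = strlen(contents);
	copy = malloc(*len + 1);
	if (copy)
		memcpy(copy, contents, *len + 1);
	return copy;
}

/* Type-specific parser: an out-parameter can carry the buffer to the
 * caller, so t->buffer never needs to be set. */
static int hypo_parse_tree(struct hypo_tree *t, const char *contents,
			   char **buf_out, size_t *len_out)
{
	*buf_out = hypo_read_contents(contents, len_out);
	if (!*buf_out)
		return -1;
	t->object.parsed = 1;
	return 0;
}

/* Generic parser: only a struct hypo_object * comes back, so the buffer
 * can reach the caller only by being stashed in the containing struct.
 * (This sketch only ever builds trees; a real one would dispatch on type.) */
static struct hypo_object *hypo_parse_object(const char *contents)
{
	struct hypo_tree *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->object.type = HYPO_TREE;
	t->buffer = hypo_read_contents(contents, &t->size);
	if (!t->buffer) {
		free(t);
		return NULL;
	}
	t->object.parsed = 1;
	return &t->object;  /* caller must somehow know to free t->buffer later */
}
```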

We could split the concept of "parse" away from "get the buffer"
entirely, but then we have a potential slowdown. The "parse" functions
really want to open the object contents and check the hash (and removing
that in the general case would probably break part of fsck, at least).
So we'd end up inflating the object contents twice, which would probably
have a measurable impact.
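A rough sketch of why splitting "parse" from "get the buffer" costs a second inflation. Inflation is only simulated here (the stored bytes are copied and each call is counted), and a 32-bit FNV-1a checksum stands in for the real SHA-1/SHA-256 object hash; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Counts simulated inflations so the two approaches can be compared. */
static unsigned inflate_calls;

/* Stand-in for zlib inflation: just copies the stored bytes. */
static char *hypo_inflate(const char *stored, size_t *len)
{
	char *out;

	assert(stored != NULL);
	inflate_calls++;
	*len = strlen(stored);
	out = malloc(*len + 1);
	if (out)
		memcpy(out, stored, *len + 1);
	return out;
}

/* 32-bit FNV-1a, standing in for the real object hash. */
static uint32_t toy_hash(const char *p, size_t n)
{
	uint32_t h = 2166136261u;

	while (n--) {
		h ^= (unsigned char)*p++;
		h *= 16777619u;
	}
	return h;
}

/* Split API, step 1: "parse" inflates, checks the hash, discards the data. */
static int hypo_parse_verify(const char *stored, uint32_t expect)
{
	size_t len;
	char *buf = hypo_inflate(stored, &len);
	int ret;

	if (!buf)
		return -1;
	ret = toy_hash(buf, len) == expect ? 0 : -1;
	free(buf);
	return ret;
}

/* Split API, step 2: "get the buffer" has to inflate a second time. */
static char *hypo_get_buffer(const char *stored, size_t *len)
{
	return hypo_inflate(stored, len);
}

/* Combined: a single inflation serves both the check and the caller. */
static char *hypo_parse_and_return(const char *stored, uint32_t expect,
				   size_t *len)
{
	char *buf = hypo_inflate(stored, len);

	if (buf && toy_hash(buf, *len) != expect) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

With the split calls, verifying and then fetching one object costs two inflations; the combined call costs one, which is the per-object overhead the paragraph above expects to be measurable across a whole repository.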

I don't plan on digging any further on it for now, so this is just a
note for future people who do. :)

Thread overview: 10+ messages
2022-09-20 19:27 [INVESTIGATION] why is fsck --connectivity-only so much more expensive than rev-list --objects --all? John Cai
2022-09-20 20:41 ` Jeff King
2022-09-22 10:09   ` [PATCH 0/3] reducing fsck memory usage Jeff King
2022-09-22 10:11     ` [PATCH 1/3] fsck: free tree buffers after walking unreachable objects Jeff King
2022-09-22 18:40       ` Junio C Hamano
2022-09-22 18:58         ` Jeff King
2022-09-22 19:27           ` Junio C Hamano
2022-09-22 22:16             ` Jeff King [this message]
2022-09-22 10:13     ` [PATCH 2/3] fsck: turn off save_commit_buffer Jeff King
2022-09-22 10:15     ` [PATCH 3/3] parse_object_buffer(): respect save_commit_buffer Jeff King
