git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Brian Gesiak <modocache@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
Date: Mon, 10 Mar 2014 21:33:17 -0400	[thread overview]
Message-ID: <20140311013316.GA11524@sigill.intra.peff.net> (raw)
In-Reply-To: <CAN7MxmW-aWgTQpTMuEx=kzyHVUf5E7unZR-LmLQrY-AmmrZxjA@mail.gmail.com>

On Sun, Mar 09, 2014 at 02:04:16AM +0900, Brian Gesiak wrote:

> > Once the logic is extracted into a nice API, there are
> > several other places that can use it, too: ...
> 
> I've found the following four areas so far:
> 
> 1. lockfile.lock_file
> 2. git-compat-util.odb_mkstemp
> 3. git-compat-util.odb_pack_keep
> 4. diff.prepare_temp_file
> 
> Tons of files use (1) and (2). (3) is less common, and (4) is only
> used for external diffs.

Yeah, I would expect (1) and (2) to be the most frequent. (3) gets
written on every push and fetch, but only for a short period. (4) is
also used for diff's textconv, though like external diffs, they are
relatively rare.

In my experience, most of the cruft that gets left is from (2), since a
push or fetch will spool to a tmpfile, then verify the results via "git
index-pack". Any failure there leaves the file in place.

There are a few other potential candidates we can find by grepping for
mkstemp. Not all of those might want cleanup, but it's a starting point
for investigation.

> > the shallow_XXXXXX tempfiles
> 
> I'm not sure I was able to find this one. Are you referring to the
> lock files used when fetching, such as in fetch-pack.c?

I mean the xmkstemp from setup_temporary_shallow in shallow.c.

> I'd say the biggest difference between lockfiles and object files is
> that tempfile methods like odb_mkstemp need to know the location of
> the object directory. Aside from that, lockfiles and the external diff
> files appear to be cleaned up at exit, while temporary object files
> tend to have a more finely controlled lifecycle. I'm still
> investigating this aspect of the proposal, though.

The diff tempfiles are true tempfiles; they always go away in the end
(though of course we want to clean them up as we finish with them,
rather than doing it all at the end). Lockfiles may get committed into
place (i.e., via atomic rename) or rolled back (deleted).

Object files should generally be hard-linked into place, but there is
some extra magic in move_temp_to_file to fallback to renames.  Some of
that we may be able to get rid of (e.g., we try to avoid doing
cross-directory renames at all these days, so the comment there may be
out of date).

> One question, though: the idea on the ideas page specifies that
> temporary pack and object files may "optionally" be cleaned up in case
> of error during program execution. How will users specify their
> preference? I think the API for creating temporary files should allow
> cleanup options to be specified on a per-file basis. That way each
> part of the program that creates tempfiles can specify a different
> config value to determine the cleanup policy.

That probably makes sense. I certainly had a config option in mind. I
mentioned above that the most common cruft is leftover packfiles from
pushes and fetches. We haven't deleted those historically because the
same person often controls both the client and the server, and they
would want to possibly do forensics on the packfile sent to the remote,
or even rescue objects out of it. But the remote end may simply have
rejected the pack by some policy, and has no interest in forensics.

Having a config option for each type of file may be cool, but I don't
know how useful it would be in practice. Still, it's certainly worth
thinking about and looking into.

-Peff

  reply	other threads:[~2014-03-11  1:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-01 21:04 [GSoC14][RFC] Proposal Draft: Refactor tempfile handling Brian Gesiak
2014-03-03 22:42 ` Jeff King
2014-03-08 17:04   ` Brian Gesiak
2014-03-11  1:33     ` Jeff King [this message]
2014-03-11 16:27 ` Michael Haggerty
2014-03-11 18:05   ` Jeff King
2014-03-12 12:26     ` Brian Gesiak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140311013316.GA11524@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=modocache@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).