git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Brian Gesiak <modocache@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling
Date: Sun, 9 Mar 2014 02:04:16 +0900	[thread overview]
Message-ID: <CAN7MxmW-aWgTQpTMuEx=kzyHVUf5E7unZR-LmLQrY-AmmrZxjA@mail.gmail.com> (raw)
In-Reply-To: <20140303224238.GA2699@sigill.intra.peff.net>

Excellent, thank you very much for the feedback, Jeff! It was very
helpful and encouraging. I've done some more research based on your
comments.

> Once the logic is extracted into a nice API, there are
> several other places that can use it, too: ...

I've found the following four areas so far:

1. lockfile.lock_file
2. git-compat-util.odb_mkstemp
3. git-compat-util.odb_pack_keep
4. diff.prepare_temp_file

Tons of files use (1) and (2). (3) is less common, and (4) is only
used for external diffs.

> the shallow_XXXXXX tempfiles

I'm not sure I was able to find this one. Are you referring to the
lock files used when fetching, such as in fetch-pack.c?

> What are the mismatches in how lockfiles and object files are
> handled? E.g., how do we finalize them into place? How should
> the API be designed to minimize race conditions (e.g., if we
> get a signal delivered while we are committing or cleaning up a file)?

I'd say the biggest difference between lockfiles and object files is
that tempfile methods like odb_mkstemp need to know the location of
the object directory. Aside from that, lockfiles and the external diff
files appear to be cleaned up at exit, while temporary object files
tend to have a more finely controlled lifecycle. I'm still
investigating this aspect of the proposal, though.

One question, though: the idea on the ideas page specifies that
temporary pack and object files may "optionally" be cleaned up in case
of error during program execution. How will users specify their
preference? I think the API for creating temporary files should allow
cleanup options to be specified on a per-file basis. That way each
part of the program that creates tempfiles can specify a different
config value to determine the cleanup policy.

Thanks for all your help so far!

- Brian Gesiak

PS: I'm maintaining a working draft of my proposal here, in case
anyone wants to offer any feedback prior to its submission:
https://gist.github.com/modocache/9434914


On Tue, Mar 4, 2014 at 7:42 AM, Jeff King <peff@peff.net> wrote:
> On Sun, Mar 02, 2014 at 06:04:39AM +0900, Brian Gesiak wrote:
>
>> My name is Brian Gesiak. I'm a research student at the University of
>> Tokyo, and I'm hoping to participate in this year's Google Summer of
>> Code by contributing to Git. I'm a longtime user, first-time
>> contributor--some of you may have noticed my "microproject"
>> patches.[1][2]
>
> Yes, we did notice them. Thanks, and welcome. :)
>
>> The ideas page points out that while lock files are closed and
>> unlinked[3] when the program exits[4], object pack files implement
>> their own brand of temp file creation and deletion. This
>> implementation doesn't share the same guarantees as lock files--it is
>> possible that the program terminates before the temp file is
>> unlinked.[5]
>>
>> Lock file references are stored in a linked list. When the program
>> exits, this list is traversed and each file is closed and unlinked. It
>> seems to me that this mechanism is appropriate for temp files in
>> general, not just lock files. Thus, my proposal would be to extract
>> this logic into a separate module--tempfile.h, perhaps. Lock and
>> object files would share the tempfile implementation.
>>
>> That is, both object and lock temp files would be stored in a linked
>> list, and all of these would be deleted at program exit.
>
> Yes, I think this is definitely the right way to go. We should be able
> to unify the tempfile handling for all of git. Once the logic is
> extracted into a nice API, there are several other places that can use
> it, too:
>
>   - the external diff code creates tempfiles and uses its own cleanup
>     routines
>
>   - the shallow_XXXXXX tempfiles (these are not cleaned right now,
>     though I sent a patch recently for them to do their own cleanup)
>
> Those are just off the top of my head. There may be other spots, too.
>
> It is worth thinking in your proposal about some of the things that the
> API will want to handle. What are the mismatches in how lockfiles and
> object files are handled? E.g., how do we finalize them into place?
> How should the API be designed to minimize race conditions (e.g., if we
> get a signal delivered while we are committing or cleaning up a file)?
>
>> Please let me know if this seems like it would make for an interesting
>> proposal, or if perhaps there is something I am overlooking. Any
>> feedback at all would be appreciated. Thank you!
>
> You definitely have a grasp of what the project is aiming for, and which
> areas need to be touched.
>
> -Peff

  reply	other threads:[~2014-03-08 17:04 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-01 21:04 [GSoC14][RFC] Proposal Draft: Refactor tempfile handling Brian Gesiak
2014-03-03 22:42 ` Jeff King
2014-03-08 17:04   ` Brian Gesiak [this message]
2014-03-11  1:33     ` Jeff King
2014-03-11 16:27 ` Michael Haggerty
2014-03-11 18:05   ` Jeff King
2014-03-12 12:26     ` Brian Gesiak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAN7MxmW-aWgTQpTMuEx=kzyHVUf5E7unZR-LmLQrY-AmmrZxjA@mail.gmail.com' \
    --to=modocache@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).