git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: Git in Outreachy December 2019?
Date: Mon, 23 Sep 2019 17:28:34 -0400	[thread overview]
Message-ID: <20190923212834.GA19504@sigill.intra.peff.net> (raw)
In-Reply-To: <20190923203854.171170-1-jonathantanmy@google.com>

On Mon, Sep 23, 2019 at 01:38:54PM -0700, Jonathan Tan wrote:

> I didn't have any concrete ideas so I didn't include those, but some
> unrefined ideas:

One risk to a mentoring project like this is that the intern does a good
job of steps 1-5, and then in step 6 we realize that the whole thing is
not useful, and upstream doesn't want it. Which isn't to say the intern
didn't learn something, and the project didn't benefit. Negative results
can be useful; but it can also be demoralizing.

I'm not arguing that's going to be the case here. But I do think it's
worth talking through these things a bit as part of thinking about
proposals.

>  - index-pack has the CLI option to specify a message to be written into
>    the .promisor file, but in my patch to write fetched refs to
>    .promisor [1], I ended up making fetch-pack.c write the information
>    because I didn't know how many refs were going to be written (and I
>    didn't want to bump into CLI argument length limits). If we had this
>    feature, I might have been able to pass a callback to index-pack that
>    writes the list of refs once we have the fd into .promisor,
>    eliminating some code duplication (but I haven't verified this).

That makes some sense. We could pass the data over a pipe, but obviously
stdin is already in use to receive the pack here. Ideally we'd be able
to pass multiple streams between the programs, but I think due to
Windows support, we can't assume that arbitrary pipe descriptors will
make it across the run-command boundary. So I think we'd be left with
communicating via temporary files (which really isn't the worst thing in
the world, but has its own complications).

>  - In your reply [2] to the above [1], you mentioned the possibility of
>    keeping a list of cutoff points. One way of doing this, as I state in
>    [3], is my original suggestion back in 2017 of one such
>    repository-wide list. If we do this, it would be better for
>    fetch-pack to handle this instead of index-pack, and it seems more
>    efficient to me to have index-pack be able to pass objects to
>    fetch-pack as they are inflated instead of fetch-pack rereading the
>    compressed forms on disk (but again, I haven't verified this).

And this is the flip-side problem: we need to get data back, but we have
only stdout, which is already in use (so we need some kind of protocol).
That leads to things like the horrible NUL-byte added by 83558686ce
(receive-pack: send keepalives during quiet periods, 2016-07-15).

> There are also the debuggability improvements of not having to deal with
> 2 processes.

I think it can sometimes be easier to debug with two separate processes,
because the input to index-pack is well-defined and can be repeated
without hitting the network (though you do have to figure out how to
record the network response, which can be non-trivial). I've also done
similar things for running performance simulations.

We'll still have the stand-alone index-pack command, so it can be used
for those cases. But as we add more features that utilize the in-process
interface, that may eventually stop being feasible.

> > [dropping unpack-objects]
> >     Maybe that would be worth making part of the project?
> 
> I'm reluctant to do so because I don't want to increase the scope too
> much - although if my project has relatively narrow scope for an
> Outreachy project, we can do so. As for eliminating the utility of
> having richer communication, I don't think so, because in the situations
> where we require richer communication (right now, situations to do with
> partial clone), we specifically run index-pack anyway.

Yeah, we're in kind of a weird situation there, where unpack-objects is
used less and less. I wonder how many surprises are lurking where
somebody reasoned about index-pack behavior, but unpack-objects may do
something slightly differently (I know this came up when we looked at
fsck-ing incoming objects for submodule vulnerabilities).

I kind of wonder if it would be reasonable to just always use index-pack
for the sake of simplicity, even if it never learns to actually unpack
objects. We've been doing that for years on the server side at GitHub
without ill effects (I think the unpack route is slightly more efficient
for a thin pack, but since it only kicks in when there are few objects
anyway, I wonder how big an advantage it is in general).

-Peff

  reply	other threads:[~2019-09-23 21:28 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-27  5:17 Git in Outreachy December 2019? Jeff King
2019-08-31  7:58 ` Christian Couder
2019-08-31 19:44   ` Olga Telezhnaya
2019-09-04 19:41 ` Jeff King
2019-09-05  7:24   ` Christian Couder
2019-09-05 19:39   ` Emily Shaffer
2019-09-06 11:55     ` Carlo Arenas
2019-09-07  6:39       ` Jeff King
2019-09-07 10:13         ` Carlo Arenas
2019-09-07  6:36     ` Jeff King
2019-09-08 14:56   ` Pratyush Yadav
2019-09-09 17:00     ` Jeff King
2019-09-23 18:07   ` SZEDER Gábor
2019-09-26  9:47     ` SZEDER Gábor
2019-09-26 19:32       ` Johannes Schindelin
2019-09-26 21:54         ` SZEDER Gábor
2019-09-26 11:42     ` Johannes Schindelin
2019-09-13 20:03 ` Jonathan Tan
2019-09-13 20:51   ` Jeff King
2019-09-16 18:42     ` Emily Shaffer
2019-09-16 21:33       ` Eric Wong
2019-09-16 21:44       ` SZEDER Gábor
2019-09-16 23:13         ` Jonathan Nieder
2019-09-17  0:59           ` Jeff King
2019-09-17 11:23       ` Johannes Schindelin
2019-09-17 12:02         ` SZEDER Gábor
2019-09-23 12:47           ` Johannes Schindelin
2019-09-23 16:58             ` SZEDER Gábor
2019-09-26 11:04               ` Johannes Schindelin
2019-09-26 13:28                 ` SZEDER Gábor
2019-09-26 19:39                   ` Johannes Schindelin
2019-09-26 21:44                     ` SZEDER Gábor
2019-09-27 22:18                       ` Jeff King
2019-10-09 17:25                         ` SZEDER Gábor
2019-10-11  6:34                           ` Jeff King
2019-09-23 18:19             ` Jeff King
2019-09-24 14:30               ` Johannes Schindelin
2019-09-17 15:10         ` Christian Couder
2019-09-23 12:50           ` Johannes Schindelin
2019-09-23 19:30           ` Jeff King
2019-09-23 18:07         ` Jeff King
2019-09-24 14:25           ` Johannes Schindelin
2019-09-24 15:33             ` Jeff King
2019-09-28  3:56               ` Junio C Hamano
2019-09-24  0:55         ` Eric Wong
2019-09-26 12:45           ` Johannes Schindelin
2019-09-30  8:55             ` Eric Wong
2019-09-28  4:01           ` Junio C Hamano
2019-09-20 17:04     ` Jonathan Tan
2019-09-21  1:47       ` Emily Shaffer
2019-09-23 14:23         ` Christian Couder
2019-09-23 19:40         ` Jeff King
2019-09-23 22:29           ` Philip Oakley
2019-10-22 21:16         ` Emily Shaffer
2019-09-23 11:49       ` Christian Couder
2019-09-23 17:58         ` Jonathan Tan
2019-09-23 19:27           ` Jeff King
2019-09-23 20:48             ` Jonathan Tan
2019-09-23 19:15       ` Jeff King
2019-09-23 20:38         ` Jonathan Tan
2019-09-23 21:28           ` Jeff King [this message]
2019-09-24 17:07             ` Jonathan Tan
2019-09-26  7:09               ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190923212834.GA19504@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).