git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: William Duclot <william.duclot@ensimag.grenoble-inp.fr>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, antoine.queru@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr,
	Johannes.Schindelin@gmx.de, mh@glandium.org
Subject: Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory
Date: Wed, 8 Jun 2016 15:19:18 -0400	[thread overview]
Message-ID: <20160608191918.GB19572@sigill.intra.peff.net> (raw)
In-Reply-To: <575845D9.2010604@alum.mit.edu>

On Wed, Jun 08, 2016 at 06:20:41PM +0200, Michael Haggerty wrote:

> Instead, one could write
> 
> > static int feed_object(const unsigned char *sha1, int fd, int negative)
> > {
> > 	char buf[GIT_SHA1_HEXSZ + 2];
> > 	struct strbuf line = WRAPPED_FIXED_STRBUF(buf);
> > 
> > 	if (negative && !has_sha1_file(sha1))
> > 		return 1;
> > 
> > 	if (negative)
> > 		strbuf_addch(&line, '^');
> > 	strbuf_add(&line, sha1_to_hex(sha1), GIT_SHA1_HEXSZ);
> > 	strbuf_addch(&line, '\n');
> > 	return write_or_whine(fd, line.buf, line.len, "send-pack: send refs");
> > }

Hmm. I'm not sure that just replacing that with a regular heap-allocated
strbuf is so bad. It additionally gets rid of the SHA1_HEXSZ math in the
allocation.

So from your list of advantages:

> * It's a little less manual bookkeeping, and thus less error-prone,
>   than the current code.

We have this, but better.

> * If somebody decides to add another character to the line but
>   forgets to increase the allocation size, the code dies in testing
>   rather than (a) overflowing the buffer, like the current
>   code, or (b) silently becoming less performant, as if it used a
>   preallocated but non-fixed strbuf.

Instead of overflowing, it just silently works.

> * There's no need to strbuf_release() (which can be convenient in
>   a function with multiple exit paths).

Same.

The downside, obviously, is the cost of malloc/free. It may even be
noticeable here here because this really is a tight loop of strbuf
allocation (OTOH, we immediately make a syscall; how expensive is
write() compared to malloc()?).

We can hack around that by reusing the same strbuf.

Unfortunately the usual trick of:

  struct strbuf buf = STRBUF_INIT;
  for (...) {
	strbuf_reset(&buf);
	...
  }
  strbuf_release(&buf);

does not work, because we are in a sub-function. We can pass in the
buffer as scratch space, but that makes the function interface a little
uglier than it needs to be.

Likewise, we could make the strbuf static inside feed_object().  It's
not so bad here, where we know there aren't re-entrancy issues, but it's
not a very safe pattern in general (and it leaks the memory when we're
done with the function).

That made me wonder if we could repeatedly reuse a buffer attached to
the file descriptor. And indeed, isn't that what stdio is? The whole
reason this buffer exists is because we are using a direct descriptor
write. If we switched this function to use fprintf(), we'd avoid the
whole buffer question, have a fixed cap on our memory use (since we just
flush anytime the buffer is full) _and_ we'd reduce the number of
write syscalls we're making by almost a factor of 100.

> I don't know whether this particular function should be rewritten; I'm
> just giving an example of the type of scenario where I think it could be
> useful.
>
> In a world without fixed strbufs, what would one use in this situation?

I know I've done the exact opposite of what you wanted here and talked
about this specific function. But I _do_ think this is a pattern I've
seen several times, where we format into a buffer only to write() it
out. I think they may comprise a reasonable number of our buffer-using
loops.

-Peff

  parent reply	other threads:[~2016-06-08 19:19 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 15:13 [PATCH V2 0/3] strbuf: improve API William Duclot
2016-06-06 15:13 ` [PATCH V2 1/3] strbuf: add tests William Duclot
2016-06-06 16:11   ` Matthieu Moy
2016-06-07  8:44   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 2/3] pretty.c: rename strbuf_wrap() function William Duclot
2016-06-06 16:12   ` Matthieu Moy
2016-06-07  9:04   ` Johannes Schindelin
2016-06-06 15:13 ` [PATCH V2 3/3] strbuf: allow to use preallocated memory William Duclot
2016-06-06 16:17   ` Matthieu Moy
2016-06-06 17:19   ` Junio C Hamano
2016-06-06 20:39     ` William Duclot
2016-06-06 22:44       ` Junio C Hamano
2016-06-06 22:58         ` Jeff King
2016-06-06 23:24           ` Junio C Hamano
2016-06-06 23:25             ` Junio C Hamano
2016-06-06 23:30             ` Jeff King
2016-06-07  9:06             ` William Duclot
2016-06-07 18:10               ` Junio C Hamano
2016-06-08 16:20               ` Michael Haggerty
2016-06-08 18:07                 ` Junio C Hamano
2016-06-08 19:19                 ` Jeff King [this message]
2016-06-08 19:42                   ` [PATCH] send-pack: use buffered I/O to talk to pack-objects Jeff King
2016-06-09 12:10                     ` Matthieu Moy
2016-06-09 14:34                       ` Ramsay Jones
2016-06-09 17:12                         ` Jeff King
2016-06-09 22:40                           ` Ramsay Jones
2016-06-09 16:40                       ` Junio C Hamano
2016-06-09 17:14                         ` Jeff King
2016-06-09 17:22                         ` Matthieu Moy
2016-06-08 19:48                   ` [PATCH V2 3/3] strbuf: allow to use preallocated memory Junio C Hamano
2016-06-08 19:52                     ` Jeff King
2016-06-08 23:05                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160608191918.GB19572@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mh@glandium.org \
    --cc=mhagger@alum.mit.edu \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).