git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Jessica Clarke <jrtc27@jrtc27.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH] Properly align memory allocations and temporary buffers
Date: Sat, 8 Jan 2022 00:30:44 +0100	[thread overview]
Message-ID: <f40c1b47-9aad-2dcc-ceeb-5dee2b517cd8@web.de> (raw)
In-Reply-To: <xmqqzgo76xpj.fsf@gitster.g>

Am 07.01.22 um 22:30 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> If falling back to malloc(3) is fine in general then I wonder if we can
>> just it use always.  This would save two branches and make buffer
>> management trivial here.  How much worse is malloc(3) on platforms with
>> alloca(3)?  Do we sort lots of short lists somewhere?  In other words:
>> Does this stack allocation optimization actually make a measurable
>> difference?
>
> Well all the preimage of this came from your 04ee8b87 (compat: add
> qsort_s(), 2017-01-22), so you tell me ;-)

Right, except I stole the code almost verbatim from compat/qsort.c,
which had this optimization since 43fe901b71 (compat: Add simplified
merge sort implementation from glibc, 2008-02-05).  :)  The
optimization may have raised my eyebrow a bit at the time, but not
enough to come up with a meaningful benchmark..

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c (the
original) still uses alloca(3) (and more tricks), by the way;
https://cgit.freebsd.org/src/tree/lib/libc/stdlib/merge.c still
doesn't.

>
>> Heap fragmention should not be a concern here, at least, because the
>> pattern of requesting, using and releasing a single allocation won't
>> leave holes.
>
> Sure.  Even in a multi-threaded environment that should be true.
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
> Subject: compat/qsort_s.c: avoid using potentially unaligned access
>
> The compatibility definition for qsort_s() uses "char buffer[1024]"
> on the stack to avoid making malloc() calls for small temporary
> space, which essentially hand-rolls alloca().
>
> But the elements of the array being sorted may have alignment needs
> more strict than what an array of bytes may have. &buf[0] may be
> word aligned, but using the address as if it stores the first
> element of an array of a struct, whose first member may need to be
> aligned on double-word boundary, would be a no-no.
>
> We could use xalloca() from git-compat-util.h, or alloca() directly
> on platforms with HAVE_ALLOCA_H, but let's try using unconditionally
> xmalloc() before we know the performance characteristics of the
> callers.
>
> It may not make much of an argument to inspect the current callers
> and say "it shouldn't matter to any of them", but anyway:
>
>  * The one in object-name.c is used to sort potential matches to a
>    given ambiguous object name prefix in the error path;
>
>  * The one in pack-write.c is done once per a pack .idx file being
>    written to create the reverse index, so (1) the cost of malloc()
>    overhead is dwarfed by the cost of the packing operation, and (2)
>    the number of entries being sorted is the number of objects in a
>    pack;
>
>  * The one in ref-filter.c is used by "branch --list", "tag --list",
>    and "for-each-ref", only once per operation.  We sort an array of
>    pointers with entries, each corresponding to a ref that is shown.
>
>  * The one in string-list.c is used by sort_string_list(), which is
>    way too generic to assume any access patterns, so it may or may
>    not matter, but I do not care too much ;-)
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  compat/qsort_s.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git c/compat/qsort_s.c w/compat/qsort_s.c
> index 52d1f0a73d..0f7ff30f5f 100644
> --- c/compat/qsort_s.c
> +++ w/compat/qsort_s.c
> @@ -49,21 +49,15 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	char *tmp;
>
>  	if (!n)
>  		return 0;
>  	if (!b || !cmp)
>  		return -1;
>
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> -	} else {
> -		/* It's somewhat large, so malloc it.  */
> -		char *tmp = xmalloc(size);
> -		msort_with_tmp(b, n, s, cmp, tmp, ctx);
> -		free(tmp);
> -	}
> +	tmp = xmalloc(size);
> +	msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +	free(tmp);
>  	return 0;
>  }

--- >8 ---
Subject: [PATCH] stable-qsort: avoid using potentially unaligned access

Like in the previous patch for compat/qsort_s.c, remove the optimization
of using an on-stack buffer to avoid small allocations.  This ensures
maximum alignment for the array elements and simplifies the code a bit.

The performance impact for the current callers is unlikely to be
noticeable:

 * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and
   ALLOC_GROW several times already, so another allocation of up to 1KB
   should not matter much.

 * diffcore-rename.c::diffcore_rename_extended() is called once per diff
   or twice per merge, and those require allocations for each object and
   more already.

 * merge-ort.c::detect_and_process_renames() is called once per merge.
   It's responsible for the two per-merge diffcore_rename_extended()
   calls mentioned above as well, though.  So this is possibly the most
   impacted caller.  Per-object allocations are likely to dwarf the
   additional small allocations in git_stable_qsort(), though.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 stable-qsort.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/stable-qsort.c b/stable-qsort.c
index 6cbaf39f7b..7ff12467cd 100644
--- a/stable-qsort.c
+++ b/stable-qsort.c
@@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s,
 		      int (*cmp)(const void *, const void *))
 {
 	const size_t size = st_mult(n, s);
-	char buf[1024];
-
-	if (size < sizeof(buf)) {
-		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf);
-	} else {
-		/* It's somewhat large, so malloc it.  */
-		char *tmp = xmalloc(size);
-		msort_with_tmp(b, n, s, cmp, tmp);
-		free(tmp);
-	}
+	char *tmp;
+
+	tmp = xmalloc(size);
+	msort_with_tmp(b, n, s, cmp, tmp);
+	free(tmp);
 }
--
2.34.1



  reply	other threads:[~2022-01-07 23:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
2022-01-06 21:46 ` Taylor Blau
2022-01-06 21:56   ` Jessica Clarke
2022-01-06 22:27   ` Junio C Hamano
2022-01-06 22:56     ` Jessica Clarke
2022-01-07  0:10       ` Junio C Hamano
2022-01-07  0:22         ` Jessica Clarke
2022-01-07  0:31         ` brian m. carlson
2022-01-07  0:39           ` Jessica Clarke
2022-01-07  1:43             ` brian m. carlson
2022-01-07  2:08               ` Jessica Clarke
2022-01-07  2:11                 ` Jessica Clarke
2022-01-07 19:30               ` Junio C Hamano
2022-01-07 19:33                 ` Jessica Clarke
2022-01-07 20:56                 ` René Scharfe
2022-01-07 21:30                   ` Junio C Hamano
2022-01-07 23:30                     ` René Scharfe [this message]
2022-01-08  0:18                       ` Elijah Newren
2022-01-06 23:22 ` brian m. carlson
2022-01-06 23:31   ` Jessica Clarke
2022-01-07 14:57 ` Philip Oakley
2022-01-07 16:08 ` René Scharfe
2022-01-07 16:21   ` Jessica Clarke
2022-01-12 13:58 ` Jessica Clarke
2022-01-12 15:47   ` René Scharfe
2022-01-12 15:49     ` Jessica Clarke
2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
2022-01-23 20:17   ` Junio C Hamano
2022-01-23 20:23     ` Jessica Clarke
2022-01-23 20:28       ` Junio C Hamano
2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
2022-01-24 17:11     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f40c1b47-9aad-2dcc-ceeb-5dee2b517cd8@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrtc27@jrtc27.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).