From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-2.8 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,T_RP_MATCHES_RCVD shortcircuit=no autolearn=no autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id D0CF01F404 for ; Tue, 16 Jan 2018 14:01:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751835AbeAPOBd (ORCPT ); Tue, 16 Jan 2018 09:01:33 -0500 Received: from mail-qk0-f172.google.com ([209.85.220.172]:32984 "EHLO mail-qk0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751372AbeAPOBc (ORCPT ); Tue, 16 Jan 2018 09:01:32 -0500 Received: by mail-qk0-f172.google.com with SMTP id i141so20946894qke.0 for ; Tue, 16 Jan 2018 06:01:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=+ap1oel/+0yptjMGFsY7nZSV6U2nWSN5IBM+3Rpe1vo=; b=ofaUQX57Zq15lMJCIbugZuY7FNw0XGYpRdihlQ6LaBl9xuQuYXpvnWbvUzZsttIVIT vaBZTIpAbGE5Egc9vexTv159ujmFV6Wyg3iEtXubW80esYTaGaRt04dUgrjS50C3y8hE HmfR5WiZhANqj6jCNU1hutOQ9pXCZXoEwdZOR/E01YsthqA2ZZiW5Bbw8WeuaCoYlUVI hXJY1SjgNON/tpOXfqnRH7bf06wQupN3NwL7pSDnNur45iKoUzT505+Ismr0qTMv3Ywj N5ydwTdMQ3o829n3fWAIIW1yS8s18TTA7y/jSyJbyLW/CXjY8nK3Y3Hc7ETKmxvBgUqz qTBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=+ap1oel/+0yptjMGFsY7nZSV6U2nWSN5IBM+3Rpe1vo=; b=tElNwzRQk66rzN5aNrd/54utis7CWTOsGHSdRBNw96TfF/Zf2wZr3YIsCwqQ5F1sqD FeDvrVoaWXVqF6cL9iiezX6gtMjZARCuBlnc+2g/A2XbNMt5Gz9gSQYUWRqfXLGIweQh qlria4oM2rbjiTvo93U/PDS7dGzxTcaJMTF71M8wyAcrZElKbt65M1P2E5wJf8jdOrw/ 7EgBPHaKJ6kbgMPV9xy+ReldsKCYEghmLDz5CC29MstiQDySlt4EZB/+gtyAwemUp5dC xSadbi5hJfPQFXe5BOs6hWKd2JS3fOs3uZG8on4uPyOg6LeKaFsvXKoNXzgCBr36LF1y MX5A== X-Gm-Message-State: AKwxytesEhOykAvnIgdryOYAzMXOEKciAGPYgAG4/dU2nceFIft2+ztV Mla15zkBz4uXExH/msiVzEE= X-Google-Smtp-Source: ACJfBoseZzVz2zPVoOvdEyjt87JqjiflZfwDag5hCT7TrLEsIiqqQN2sWX5lqAYusRupMYcxMaxhPg== X-Received: by 10.55.214.75 with SMTP id t72mr55980124qki.12.1516111291573; Tue, 16 Jan 2018 06:01:31 -0800 (PST) Received: from ?IPv6:2001:4898:6808:13e:c4e6:7a22:56f1:df04? ([2001:4898:8010:0:ae1c:7a22:56f1:df04]) by smtp.gmail.com with ESMTPSA id n64sm1322068qkf.49.2018.01.16.06.01.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Jan 2018 06:01:31 -0800 (PST) Subject: Re: [PATCH] sha1_file: remove static strbuf from sha1_file_name() To: Christian Couder , git@vger.kernel.org Cc: Junio C Hamano , Jeff Hostetler , Jonathan Tan , Christian Couder References: <20180116071814.19884-1-chriscool@tuxfamily.org> From: Derrick Stolee Message-ID: <65c81aae-5652-281f-565a-f6116d94abc1@gmail.com> Date: Tue, 16 Jan 2018 09:01:30 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <20180116071814.19884-1-chriscool@tuxfamily.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 1/16/2018 2:18 AM, Christian Couder wrote: > Using a static buffer in sha1_file_name() is error prone > and the performance improvements it gives are not needed > in most of the callers. > > So let's get rid of this static buffer and, if necessary > or helpful, let's use one in the caller. First: this is a good change for preventing bugs in the future. Do not let my next thought deter you from making this change. Second: I wonder if there is any perf hit now that we are allocating buffers much more often. Also, how often does get_object_directory() change, so in some cases we could cache the buffer and only append the parts for the loose object (and not reallocate because the filenames will have equal length). I'm concerned about the perf implications when inspecting many loose objects (100k+) but these code paths seem to be involved with more substantial work, such as opening and parsing the objects, so keeping a buffer in-memory is probably unnecessary. > --- > cache.h | 8 +++----- > http-walker.c | 6 ++++-- > http.c | 16 ++++++++++------ > sha1_file.c | 38 +++++++++++++++++++++++++------------- > 4 files changed, 42 insertions(+), 26 deletions(-) > > diff --git a/cache.h b/cache.h > index d8b975a571..6db565408e 100644 > --- a/cache.h > +++ b/cache.h > @@ -957,12 +957,10 @@ extern void check_repository_format(void); > #define TYPE_CHANGED 0x0040 > > /* > - * Return the name of the file in the local object database that would > - * be used to store a loose object with the specified sha1. The > - * return value is a pointer to a statically allocated buffer that is > - * overwritten each time the function is called. > + * Put in `buf` the name of the file in the local object database that > + * would be used to store a loose object with the specified sha1. > */ > -extern const char *sha1_file_name(const unsigned char *sha1); > +extern void sha1_file_name(struct strbuf *buf, const unsigned char *sha1); > > /* > * Return an abbreviated sha1 unique within this repository's object database. > diff --git a/http-walker.c b/http-walker.c > index 1ae8363de2..07c2b1af82 100644 > --- a/http-walker.c > +++ b/http-walker.c > @@ -544,8 +544,10 @@ static int fetch_object(struct walker *walker, unsigned char *sha1) > } else if (hashcmp(obj_req->sha1, req->real_sha1)) { > ret = error("File %s has bad hash", hex); > } else if (req->rename < 0) { > - ret = error("unable to write sha1 filename %s", > - sha1_file_name(req->sha1)); > + struct strbuf buf = STRBUF_INIT; > + sha1_file_name(&buf, req->sha1); > + ret = error("unable to write sha1 filename %s", buf.buf); > + strbuf_release(&buf); > } > > release_http_object_request(req); > diff --git a/http.c b/http.c > index 5977712712..5979305bc9 100644 > --- a/http.c > +++ b/http.c > @@ -2168,7 +2168,7 @@ struct http_object_request *new_http_object_request(const char *base_url, > unsigned char *sha1) > { > char *hex = sha1_to_hex(sha1); > - const char *filename; > + struct strbuf filename = STRBUF_INIT; > char prevfile[PATH_MAX]; > int prevlocal; > char prev_buf[PREV_BUF_SIZE]; > @@ -2180,14 +2180,15 @@ struct http_object_request *new_http_object_request(const char *base_url, > hashcpy(freq->sha1, sha1); > freq->localfile = -1; > > - filename = sha1_file_name(sha1); > + sha1_file_name(&filename, sha1); > snprintf(freq->tmpfile, sizeof(freq->tmpfile), > - "%s.temp", filename); > + "%s.temp", filename.buf); > > - snprintf(prevfile, sizeof(prevfile), "%s.prev", filename); > + snprintf(prevfile, sizeof(prevfile), "%s.prev", filename.buf); > unlink_or_warn(prevfile); > rename(freq->tmpfile, prevfile); > unlink_or_warn(freq->tmpfile); > + strbuf_release(&filename); > > if (freq->localfile != -1) > error("fd leakage in start: %d", freq->localfile); > @@ -2302,6 +2303,7 @@ void process_http_object_request(struct http_object_request *freq) > int finish_http_object_request(struct http_object_request *freq) > { > struct stat st; > + struct strbuf filename = STRBUF_INIT; > > close(freq->localfile); > freq->localfile = -1; > @@ -2327,8 +2329,10 @@ int finish_http_object_request(struct http_object_request *freq) > unlink_or_warn(freq->tmpfile); > return -1; > } > - freq->rename = > - finalize_object_file(freq->tmpfile, sha1_file_name(freq->sha1)); > + > + sha1_file_name(&filename, freq->sha1); > + freq->rename = finalize_object_file(freq->tmpfile, filename.buf); > + strbuf_release(&filename); > > return freq->rename; > } > diff --git a/sha1_file.c b/sha1_file.c > index 3da70ac650..f66c21b2da 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -321,15 +321,11 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1) > } > } > > -const char *sha1_file_name(const unsigned char *sha1) > +void sha1_file_name(struct strbuf *buf, const unsigned char *sha1) > { > - static struct strbuf buf = STRBUF_INIT; > - > - strbuf_reset(&buf); > - strbuf_addf(&buf, "%s/", get_object_directory()); > + strbuf_addf(buf, "%s/", get_object_directory()); > > - fill_sha1_path(&buf, sha1); > - return buf.buf; > + fill_sha1_path(buf, sha1); > } Could you change this to use strbuf_addstr(buf, get_object_directory()) followed by a strbuf_addch(buf, '/')? This format string is unnecessary and could become slow if this method is called in a tight loop. > > struct strbuf *alt_scratch_buf(struct alternate_object_database *alt) > @@ -710,7 +706,12 @@ int check_and_freshen_file(const char *fn, int freshen) > > static int check_and_freshen_local(const unsigned char *sha1, int freshen) > { > - return check_and_freshen_file(sha1_file_name(sha1), freshen); > + static struct strbuf buf = STRBUF_INIT; > + > + strbuf_reset(&buf); > + sha1_file_name(&buf, sha1); > + > + return check_and_freshen_file(buf.buf, freshen); > } > > static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen) > @@ -866,8 +867,12 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st, > const char **path) > { > struct alternate_object_database *alt; > + static struct strbuf buf = STRBUF_INIT; > + > + strbuf_reset(&buf); > + sha1_file_name(&buf, sha1); > + *path = buf.buf; > > - *path = sha1_file_name(sha1); > if (!lstat(*path, st)) > return 0; > > @@ -891,8 +896,12 @@ static int open_sha1_file(const unsigned char *sha1, const char **path) > int fd; > struct alternate_object_database *alt; > int most_interesting_errno; > + static struct strbuf buf = STRBUF_INIT; > + > + strbuf_reset(&buf); > + sha1_file_name(&buf, sha1); > + *path = buf.buf; > > - *path = sha1_file_name(sha1); > fd = git_open(*path); > if (fd >= 0) > return fd; The check_and_freshen_nonlocal() and open_sha1_file() method comments warn about the 'path' pointer and how it could be overwritten with another call to sha1_file_name(). Your patch fixes that issue, so the comments can be removed. Since these methods are currently being extra careful about the 'path' parameter, perhaps these are candidates to keep a static buffer around? > @@ -1557,9 +1566,12 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen, > git_SHA_CTX c; > unsigned char parano_sha1[20]; > static struct strbuf tmp_file = STRBUF_INIT; > - const char *filename = sha1_file_name(sha1); > + static struct strbuf filename = STRBUF_INIT; > + > + strbuf_reset(&filename); > + sha1_file_name(&filename, sha1); > > - fd = create_tmpfile(&tmp_file, filename); > + fd = create_tmpfile(&tmp_file, filename.buf); > if (fd < 0) { > if (errno == EACCES) > return error("insufficient permission for adding an object to repository database %s", get_object_directory()); > @@ -1612,7 +1624,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen, > warning_errno("failed utime() on %s", tmp_file.buf); > } > > - return finalize_object_file(tmp_file.buf, filename); > + return finalize_object_file(tmp_file.buf, filename.buf); > } > > static int freshen_loose_object(const unsigned char *sha1) Thanks for this cleanup! -Stolee