From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-3.4 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id 7B3FA1F8C4 for ; Wed, 23 Mar 2022 03:48:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241709AbiCWDtw (ORCPT ); Tue, 22 Mar 2022 23:49:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241688AbiCWDtd (ORCPT ); Tue, 22 Mar 2022 23:49:33 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB4C063CB for ; Tue, 22 Mar 2022 20:48:03 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id u16so334501wru.4 for ; Tue, 22 Mar 2022 20:48:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=nFAAiqNO9m+Va/Z8MN+WGZqe3XS8J3j2boic6rijhE4=; b=m8kAsWhAXruykO9Q4UVrirJHH9x1JceOaidtNTCoNfutV0ybRJNnQNL+2gmG82iKTg 6AOCLw3t9d3so9JTZB4Xe0UV+fDPnYI7Hh6ehAi46D9yayKou7DuKx4w5JgwLmerHQug MSPCEZPhccuhZAYjleFQTHV+StYpzEFEjFiE7Ry55KrA32qvamOu6nJtwDsCytNMe2H2 q337uFQC3b5SAOMkSmm1RCyr1Mlj56q2mjwhL8VHsuBVcKLWu/+tHpoKl/KR23vQDup6 gKFkmtUGGwwRG7L4aiTVidPBYyRipn1YHytT4WtQVWBxqXlMdYygXDxpv//Y9p1lIOq2 OmwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=nFAAiqNO9m+Va/Z8MN+WGZqe3XS8J3j2boic6rijhE4=; b=zjvIcGdtH8q3KsyQAyOgARZgDME0e5ccCCPQXSiwSg5D46abAyTdaWvsOpU75mKT3q gk/XaT99V+LfTLHbPkTgpmPsXgFjib/n85HeVRySSHoUlnGpf+OikbcyIwlgQ+t2JrmF q/L4dmDVyVivlJrCQgkG6JKCkiq+ZbgjuiMnNPlOpUzOKSjV4zh7JGvdFgJCI6cOrj9H kmQUPgb5KskAiHwNBCKDdoNBnf4A11E90Awnrm8t8kNiXQUdnRkOhF9OC71tLWrcIT93 OkyHMB1+vE6HAiTs7aw9fZ/Rb1s4BHfCbzb+E2z7QaLw4wrAA5HwN+4CDAQxQZK7E/8/ iT/A== X-Gm-Message-State: AOAM5315PaPOr15JxshfPalscldyT9s4mNqau50tTmHQfmH6Pcj+OCfX ybyQ4kphVOwoYQPtoeA/Cg99UOvyIGB68A== X-Google-Smtp-Source: ABdhPJxdAubHOQIECRq2e0IP+5d+9YLQY2xxWqYKuMnojVor+/19/65DtD+t0piNXVF2OCmXtECB7A== X-Received: by 2002:a5d:526d:0:b0:203:d69f:3a66 with SMTP id l13-20020a5d526d000000b00203d69f3a66mr24942781wrc.74.1648007281958; Tue, 22 Mar 2022 20:48:01 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id o16-20020a05600c511000b0038c6d836935sm3028830wms.16.2022.03.22.20.48.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Mar 2022 20:48:01 -0700 (PDT) From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= To: git@vger.kernel.org Cc: Junio C Hamano , Neeraj Singh , Johannes Schindelin , Patrick Steinhardt , Bagas Sanjaya , Neeraj Singh , =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Subject: [RFC PATCH 3/7] object-file: pass down unpack-objects.c flags for "bulk" checkin Date: Wed, 23 Mar 2022 04:47:32 +0100 Message-Id: X-Mailer: git-send-email 2.35.1.1428.g1c1a0152d61 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Remove much of this as a POC for exploring some of what I mentioned in https://lore.kernel.org/git/220322.86mthinxnn.gmgdl@evledraar.gmail.com/ This commit is obviously not what we *should* do as end-state, but demonstrates what's needed (I think) for a bare-minimum implementation of just the "bulk" syncing method for loose objects without the part where we do the tmp-objdir.c dance. Performance with this is already quite promising. Benchmarking with: git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' \ -p 'rm -rf r.git && git init --bare r.git' \ './git -C r.git -c core.fsync=loose-object -c core.fsyncMethod=batch unpack-objects 0" branch. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/add.c | 3 -- builtin/unpack-objects.c | 2 - builtin/update-index.c | 4 -- bulk-checkin.c | 86 ---------------------------------------- bulk-checkin.h | 6 --- cache.h | 5 --- object-file.c | 37 ++++++++++++----- t/t1050-large.sh | 3 ++ 8 files changed, 29 insertions(+), 117 deletions(-) diff --git a/builtin/add.c b/builtin/add.c index 3ffb86a4338..a6d7f4dc1e1 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -670,8 +670,6 @@ int cmd_add(int argc, const char **argv, const char *prefix) string_list_clear(&only_match_skip_worktree, 0); } - plug_bulk_checkin(); - if (add_renormalize) exit_status |= renormalize_tracked_files(&pathspec, flags); else @@ -682,7 +680,6 @@ int cmd_add(int argc, const char **argv, const char *prefix) if (chmod_arg && pathspec.nr) exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only); - unplug_bulk_checkin(); finish: if (write_locked_index(&the_index, &lock_file, diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index ec40c6fd966..93da436581b 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -507,7 +507,6 @@ static void unpack_all(void) if (!quiet) progress = start_progress(_("Unpacking objects"), nr_objects); CALLOC_ARRAY(obj_list, nr_objects); - plug_bulk_checkin(); oflags = nr_objects > 1 ? HASH_N_OBJECTS : 0; for (i = 0; i < nr_objects; i++) { int nth = i + 1; @@ -517,7 +516,6 @@ static void unpack_all(void) unpack_one(i, oflags | f); display_progress(progress, nth); } - unplug_bulk_checkin(); stop_progress(&progress); if (delta_list) diff --git a/builtin/update-index.c b/builtin/update-index.c index cbd2b0d633b..95ed3c47b2e 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -1118,8 +1118,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) parse_options_start(&ctx, argc, argv, prefix, options, PARSE_OPT_STOP_AT_NON_OPTION); - /* optimize adding many objects to the object database */ - plug_bulk_checkin(); while (ctx.argc) { if (parseopt_state != PARSE_OPT_DONE) parseopt_state = parse_options_step(&ctx, options, @@ -1194,8 +1192,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_release(&buf); } - /* by now we must have added all of the new objects */ - unplug_bulk_checkin(); if (split_index > 0) { if (git_config_get_split_index() == 0) warning(_("core.splitIndex is set to false; " diff --git a/bulk-checkin.c b/bulk-checkin.c index a0dca79ba6a..4ffea87f44d 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -9,14 +9,11 @@ #include "pack.h" #include "strbuf.h" #include "string-list.h" -#include "tmp-objdir.h" #include "packfile.h" #include "object-store.h" static int bulk_checkin_plugged; -static struct tmp_objdir *bulk_fsync_objdir; - static struct bulk_checkin_state { char *pack_tmp_name; struct hashfile *f; @@ -85,40 +82,6 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state) reprepare_packed_git(the_repository); } -/* - * Cleanup after batch-mode fsync_object_files. - */ -static void do_batch_fsync(void) -{ - struct strbuf temp_path = STRBUF_INIT; - struct tempfile *temp; - - if (!bulk_fsync_objdir) - return; - - /* - * Issue a full hardware flush against a temporary file to ensure - * that all objects are durable before any renames occur. The code in - * fsync_loose_object_bulk_checkin has already issued a writeout - * request, but it has not flushed any writeback cache in the storage - * hardware or any filesystem logs. This fsync call acts as a barrier - * to ensure that the data in each new object file is durable before - * the final name is visible. - */ - strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory()); - temp = xmks_tempfile(temp_path.buf); - fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp)); - delete_tempfile(&temp); - strbuf_release(&temp_path); - - /* - * Make the object files visible in the primary ODB after their data is - * fully durable. - */ - tmp_objdir_migrate(bulk_fsync_objdir); - bulk_fsync_objdir = NULL; -} - static int already_written(struct bulk_checkin_state *state, struct object_id *oid) { int i; @@ -313,26 +276,6 @@ static int deflate_to_pack(struct bulk_checkin_state *state, return 0; } -void prepare_loose_object_bulk_checkin(void) -{ - if (bulk_checkin_plugged && !bulk_fsync_objdir) - bulk_fsync_objdir = tmp_objdir_create("bulk-fsync"); -} - -void fsync_loose_object_bulk_checkin(int fd, const char *filename) -{ - /* - * If we have a plugged bulk checkin, we issue a call that - * cleans the filesystem page cache but avoids a hardware flush - * command. Later on we will issue a single hardware flush - * before as part of do_batch_fsync. - */ - if (!bulk_fsync_objdir || - git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) { - fsync_or_die(fd, filename); - } -} - int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags) @@ -343,32 +286,3 @@ int index_bulk_checkin(struct object_id *oid, finish_bulk_checkin(&bulk_checkin_state); return status; } - -void plug_bulk_checkin(void) -{ - assert(!bulk_checkin_plugged); - - /* - * A temporary object directory is used to hold the files - * while they are not fsynced. - */ - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) { - bulk_fsync_objdir = tmp_objdir_create("bulk-fsync"); - if (!bulk_fsync_objdir) - die(_("Could not create temporary object directory for core.fsyncMethod=batch")); - - tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0); - } - - bulk_checkin_plugged = 1; -} - -void unplug_bulk_checkin(void) -{ - assert(bulk_checkin_plugged); - bulk_checkin_plugged = 0; - if (bulk_checkin_state.f) - finish_bulk_checkin(&bulk_checkin_state); - - do_batch_fsync(); -} diff --git a/bulk-checkin.h b/bulk-checkin.h index 181d3447ff9..76fc33e0c8f 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -6,14 +6,8 @@ #include "cache.h" -void prepare_loose_object_bulk_checkin(void); -void fsync_loose_object_bulk_checkin(int fd, const char *filename); - int index_bulk_checkin(struct object_id *oid, int fd, size_t size, enum object_type type, const char *path, unsigned flags); -void plug_bulk_checkin(void); -void unplug_bulk_checkin(void); - #endif diff --git a/cache.h b/cache.h index 320248a54e0..997bf2f57fd 100644 --- a/cache.h +++ b/cache.h @@ -1771,11 +1771,6 @@ void write_or_die(int fd, const void *buf, size_t count); void fsync_or_die(int fd, const char *); void fsync_component_or_die(enum fsync_component component, int fd, const char *msg); -static inline int batch_fsync_enabled(enum fsync_component component) -{ - return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH); -} - ssize_t read_in_full(int fd, void *buf, size_t count); ssize_t write_in_full(int fd, const void *buf, size_t count); ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset); diff --git a/object-file.c b/object-file.c index cd0ddb49e4b..dbeb3df502d 100644 --- a/object-file.c +++ b/object-file.c @@ -1886,19 +1886,37 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf, hash_object_file_literally(algo, buf, len, type_name(type), oid); } +static void sync_loose_object_batch(int fd, const char *filename, + const unsigned oflags) +{ + const int last = oflags & HASH_N_OBJECTS_LAST; + + /* + * We're doing a sync_file_range() (or equivalent) for 1..N-1 + * objects, and then a "real" fsync() for N. On some OS's + * enabling core.fsync=loose-object && core.fsyncMethod=batch + * improves the performance by a lot. + */ + if (last || (!last && git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0)) + fsync_or_die(fd, filename); +} + /* Finalize a file on disk, and close it. */ -static void close_loose_object(int fd, const char *filename) +static void close_loose_object(int fd, const char *filename, + const unsigned oflags) { + int fsync_loose; + if (the_repository->objects->odb->will_destroy) goto out; - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - fsync_loose_object_bulk_checkin(fd, filename); - else if (fsync_object_files > 0) + fsync_loose = fsync_components & FSYNC_COMPONENT_LOOSE_OBJECT; + + if (oflags & HASH_N_OBJECTS && fsync_loose && + fsync_method == FSYNC_METHOD_BATCH) + sync_loose_object_batch(fd, filename, oflags); + else if (fsync_object_files > 0 || fsync_loose) fsync_or_die(fd, filename); - else - fsync_component_or_die(FSYNC_COMPONENT_LOOSE_OBJECT, fd, - filename); out: if (close(fd) != 0) @@ -1962,9 +1980,6 @@ static int write_loose_object(const struct object_id *oid, char *hdr, static struct strbuf tmp_file = STRBUF_INIT; static struct strbuf filename = STRBUF_INIT; - if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT)) - prepare_loose_object_bulk_checkin(); - loose_object_path(the_repository, &filename, oid); fd = create_tmpfile(&tmp_file, filename.buf); @@ -2015,7 +2030,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr, die(_("confused by unstable object source data for %s"), oid_to_hex(oid)); - close_loose_object(fd, tmp_file.buf); + close_loose_object(fd, tmp_file.buf, flags); if (mtime) { struct utimbuf utb; diff --git a/t/t1050-large.sh b/t/t1050-large.sh index 4f3aa17c994..1baaa8024c8 100755 --- a/t/t1050-large.sh +++ b/t/t1050-large.sh @@ -5,6 +5,9 @@ test_description='adding and checking out large blobs' . ./test-lib.sh +skip_all='TODO: migrate the builtin/add.c code' +test_done + test_expect_success setup ' # clone does not allow us to pass core.bigfilethreshold to # new repos, so set core.bigfilethreshold globally -- 2.35.1.1428.g1c1a0152d61