From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 8ADBD1F5AE for ; Wed, 21 Jul 2021 04:25:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232334AbhGUDo1 (ORCPT ); Tue, 20 Jul 2021 23:44:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232111AbhGUDne (ORCPT ); Tue, 20 Jul 2021 23:43:34 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D89BAC061762 for ; Tue, 20 Jul 2021 21:24:10 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id k4so640648wrc.8 for ; Tue, 20 Jul 2021 21:24:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=GrqZQtDOMosDqWoAZzBDHFGfZidiSnrJH8SL/Q6ug8Y=; b=MMFYbOVJ4I5Dq9MXHb/8GvMlgFIfGboJZ82qJSPnrsoEH3Y+IFCA2Yd9ehbjXbJIWZ N86MdRPm5BK6qOJCa+23fTJhIQqFDRN219WJMM4IoKrrj4Eh+6tRiOFws4UP51JKsnMc DApD6g4T53ZnIc4nTx7V6YGcOFLX9eK/g6XHw3GixbdKZrpMntVWVMWhCgtxi8WMdlga e9JkpFmFTpiSZbRk0UuEXvoLmJEScXR/Uvcij8yWrv7bYaLzTOZ2/UrIORnr260dv9V6 J6ANFzBkIMTtxlTX/S97LatkgmbPV/2HX9cE+lNoaNFl7BR06jJCKfxfjjDrO7A0LcX7 39JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=GrqZQtDOMosDqWoAZzBDHFGfZidiSnrJH8SL/Q6ug8Y=; b=eWlny2bgwN1gjcF+6T/64x40b69l53j/eLdP/x8NbMo2NXodgiZcVbrYMtY99KrPhv FHmYX9K854Gz2yCE9a9DcgLpBOdZgyCRspSzm8/WM34F6GcuWhGz1JD/RXl21verPVsE cu0WYFbWxye51pKiiJX+NeSHOXz1xnZ+cEgZjgazIadmWaYuQIGogck8VaDvuVovPGtv BAKecNEAIZ3/kECCvsMxjb4B6HUtUuH4PqnVQmMPqhBfyYAPfpTj9BIJpr6yqV5Y1xjy 2gJwP/9O9uAvMJao+w6lB0ttCiVFtm2Z0jEzgYMygDDf75H02BbTGs6kG9Rbep2OkI6m szuQ== X-Gm-Message-State: AOAM532uJRSQd6dupcJ6H/eyVl3MhhKwlQ7qLU5pGUAdLGJ3tPMT4JCw zxDArpvn/lnzyOi/fmXI/HgNJuTzkFk= X-Google-Smtp-Source: ABdhPJx00A0uQ0zpPES2Yv0Aftjzg+q8QyfVlhfAJfnr2Yel8o4q3SMaJ0ddAc/sn17bw6dg58umcw== X-Received: by 2002:a5d:4e43:: with SMTP id r3mr39925074wrt.132.1626841449495; Tue, 20 Jul 2021 21:24:09 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m32sm3858803wms.23.2021.07.20.21.24.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Jul 2021 21:24:09 -0700 (PDT) Message-Id: <7b2112718157e65ba558842e27521df8c351f596.1626841444.git.gitgitgadget@gmail.com> In-Reply-To: References: From: "Elijah Newren via GitGitGadget" Date: Wed, 21 Jul 2021 04:24:03 +0000 Subject: [PATCH v4 6/7] merge-ort: avoid recursing into directories when we don't need to MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fcc: Sent To: git@vger.kernel.org Cc: Derrick Stolee , =?UTF-8?Q?=C3=86var_Arnfj=C3=B6r=C3=B0?= Bjarmason , Elijah Newren , Bagas Sanjaya , Elijah Newren , Elijah Newren Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Elijah Newren This combines the work of the last several patches, and implements the conditions when we don't need to recurse into directories. It's perhaps easiest to see the logic by separating the fact that a directory might have both rename sources and rename destinations: * rename sources: only files present in the merge base can serve as rename sources, and only when one side deletes that file. When the tree on one side matches the merge base, that means every file within the subtree matches the merge base. This means that the skip-irrelevant-rename-detection optimization from before kicks in and we don't need any of these files as rename sources. * rename destinations: the tree that does not match the merge base might have newly added and hence unmatched destination files. This is what usually prevents us from doing trivial directory resolutions in the merge machinery. However, the fact that we have deferred recursing into this directory until the end means we know whether there are any unmatched relevant potential rename sources elsewhere in this merge. If there are no unmatched such relevant sources anywhere, then there is no need to look for unmatched potential rename destinations to match them with. This informs our algorithm: * Search through relevant_sources; if we have entries, they better all be reflected in cached_pairs or cached_irrelevant, otherwise they represent an unmatched potential rename source (causing the optimization to be disallowed). * For any relevant_source represented in cached_pairs, we do need to to make sure to get the destination for each source, meaning we need to recurse into any ancestor directories of those destinations. * Once we've recursed into all the rename destinations for any relevant_sources in cached_pairs, we can then do the trivial directory resolution for the remaining directories. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms Acked-by: Derrick Stolee Signed-off-by: Elijah Newren --- merge-ort.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 99 insertions(+), 3 deletions(-) diff --git a/merge-ort.c b/merge-ort.c index dbccf8c62e2..a013708fa79 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -1230,6 +1230,18 @@ static int collect_merge_info_callback(int n, return mask; } +static void resolve_trivial_directory_merge(struct conflict_info *ci, int side) +{ + VERIFY_CI(ci); + assert((side == 1 && ci->match_mask == 5) || + (side == 2 && ci->match_mask == 3)); + oidcpy(&ci->merged.result.oid, &ci->stages[side].oid); + ci->merged.result.mode = ci->stages[side].mode; + ci->merged.is_null = is_null_oid(&ci->stages[side].oid); + ci->match_mask = 0; + ci->merged.clean = 1; /* (ci->filemask == 0); */ +} + static int handle_deferred_entries(struct merge_options *opt, struct traverse_info *info) { @@ -1239,9 +1251,72 @@ static int handle_deferred_entries(struct merge_options *opt, int side, ret = 0; for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) { - renames->deferred[side].trivial_merges_okay = 0; - strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges, - &iter, entry) { + unsigned optimization_okay = 1; + struct strintmap copy; + + /* Loop over the set of paths we need to know rename info for */ + strset_for_each_entry(&renames->relevant_sources[side], + &iter, entry) { + char *rename_target, *dir, *dir_marker; + struct strmap_entry *e; + + /* + * If we don't know delete/rename info for this path, + * then we need to recurse into all trees to get all + * adds to make sure we have it. + */ + if (strset_contains(&renames->cached_irrelevant[side], + entry->key)) + continue; + e = strmap_get_entry(&renames->cached_pairs[side], + entry->key); + if (!e) { + optimization_okay = 0; + break; + } + + /* If this is a delete, we have enough info already */ + rename_target = e->value; + if (!rename_target) + continue; + + /* If we already walked the rename target, we're good */ + if (strmap_contains(&opt->priv->paths, rename_target)) + continue; + + /* + * Otherwise, we need to get a list of directories that + * will need to be recursed into to get this + * rename_target. + */ + dir = xstrdup(rename_target); + while ((dir_marker = strrchr(dir, '/'))) { + *dir_marker = '\0'; + if (strset_contains(&renames->deferred[side].target_dirs, + dir)) + break; + strset_add(&renames->deferred[side].target_dirs, + dir); + } + free(dir); + } + renames->deferred[side].trivial_merges_okay = optimization_okay; + /* + * We need to recurse into any directories in + * possible_trivial_merges[side] found in target_dirs[side]. + * But when we recurse, we may need to queue up some of the + * subdirectories for possible_trivial_merges[side]. Since + * we can't safely iterate through a hashmap while also adding + * entries, move the entries into 'copy', iterate over 'copy', + * and then we'll also iterate anything added into + * possible_trivial_merges[side] once this loop is done. + */ + copy = renames->deferred[side].possible_trivial_merges; + strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges, + 0, + NULL, + 0); + strintmap_for_each_entry(©, &iter, entry) { const char *path = entry->key; unsigned dir_rename_mask = (intptr_t)entry->value; struct conflict_info *ci; @@ -1254,6 +1329,13 @@ static int handle_deferred_entries(struct merge_options *opt, VERIFY_CI(ci); dirmask = ci->dirmask; + if (optimization_okay && + !strset_contains(&renames->deferred[side].target_dirs, + path)) { + resolve_trivial_directory_merge(ci, side); + continue; + } + info->name = path; info->namelen = strlen(path); info->pathlen = info->namelen + 1; @@ -1289,6 +1371,20 @@ static int handle_deferred_entries(struct merge_options *opt, if (ret < 0) return ret; } + strintmap_clear(©); + strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges, + &iter, entry) { + const char *path = entry->key; + struct conflict_info *ci; + + ci = strmap_get(&opt->priv->paths, path); + VERIFY_CI(ci); + + assert(renames->deferred[side].trivial_merges_okay && + !strset_contains(&renames->deferred[side].target_dirs, + path)); + resolve_trivial_directory_merge(ci, side); + } } return ret; } -- gitgitgadget