From: Calvin Wan
Date: Wed, 30 Nov 2022 10:04:00 -0800
Subject: Re: [PATCH v4 5/5] diff-lib: parallelize run_diff_files for submodules
To: Elijah Newren
Cc: git@vger.kernel.org, emilyshaffer@google.com, avarab@gmail.com, phillip.wood123@gmail.com, myriamanis@google.com
References: <20221108184200.2813458-6-calvinwan@google.com>

> > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> > index 6490527b45..1144a5ad74 100644
> > --- a/Documentation/config/submodule.txt
> > +++ b/Documentation/config/submodule.txt
> > @@ -93,6 +93,18 @@ submodule.fetchJobs::
> >         in parallel. A value of 0 will give some reasonable default.
> >         If unset, it defaults to 1.
> >
> > +submodule.diffJobs::
> > +        Specifies how many submodules are diffed at the same time. A
> > +        positive integer allows up to that number of submodules diffed
> > +        in parallel. A value of 0 will give the number of logical cores.
>
> Why hardcode that 0 gives the number of logical cores? Why not just
> state that a value of 0 "gives a guess at optimal parallelism",
> allowing us to adjust it in the future if we can do some smart
> heuristics? It'd be nice to not have us tied down and prevented from
> taking a smarter approach.

I was unaware that the original intention of "reasonable default" was
to allow flexibility. (I have a WIP series standardizing these
parallelism config options that also uses "number of logical cores",
but I think that should probably change now.) Other parallel config
options also hardcode 0 to mean the number of logical cores, so my
initial thought was that we should use the more precise wording. The
argument for flexibility now seems preferable, however.

> > +        If unset, it defaults to 1. The diff operation is used by many
> > +        other git commands such as add, merge, diff, status, stash and
> > +        more. Note that the expensive part of the diff operation is
> > +        reading the index from cache or memory. Therefore multiple jobs
> > +        may be detrimental to performance if your hardware does not
> > +        support parallel reads or if the number of jobs greatly exceeds
> > +        the amount of supported reads.
>
> So, in the future, someone who wants to speed things up is going to
> need to configure submodule.diffJobs, submodule.fetchJobs,
> submodule.checkoutJobs, submodule.grepJobs, submodule.mergeJobs, etc.?
> I worry that we're headed towards a bit of a suboptimal user
> experience here. It'd be nice to have a more central configuration of
> "yes, I want parallelism; please don't make me benchmark things in
> order to take advantage of it", if that's possible.
> It may just be that the "optimal" parallelism varies significantly
> between commands, and also varies a lot based on hardware, repository
> sizes, background load on the system, etc. such that we can't provide
> a reasonable suggestion for those that want a value greater than 1.
> Or maybe in the future we allow folks somehow to request our best
> guess at a good parallelization level and then let users override
> with these individual flags. I'm just a little worried we might be
> making users do work that we should somehow figure out.

I had the same worry; see the discussion I had here:
https://lore.kernel.org/git/CAFySSZAbsPuyPVX0+DQzArny2CEWs+GpQqJ3AOxUB_ffo8B3SQ@mail.gmail.com/

I would also like to solve this problem eventually, but this patch
won't be the one to do so.
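To make the "central setting with per-command overrides" idea a bit more
concrete, here is a minimal C sketch of how a per-command value such as
submodule.diffJobs could fall back to one central setting, with 0 meaning
"make a reasonable guess" rather than a documented constant. The names
struct jobs_setting, resolve_jobs() and guess_parallelism(), and the
central setting itself, are hypothetical and are not part of this patch
or of Git. The guess here is simply the online CPU count from
sysconf(_SC_NPROCESSORS_ONLN), a widely supported extension rather than
strict POSIX, which we would be free to replace with smarter heuristics
later.

```c
#include <unistd.h>

/*
 * Hypothetical sketch only: none of these names exist in Git.
 * A parallelism setting as read from config: whether it was set at
 * all and, if so, its integer value.
 */
struct jobs_setting {
	int is_set;
	int value;
};

/*
 * The "reasonable default" used for a value of 0. Here it is just the
 * number of online CPUs; callers must not rely on that, so it can be
 * swapped for a smarter heuristic later.
 */
static int guess_parallelism(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? (int)n : 1;
}

/*
 * Resolve the job count for one command:
 *   specific - per-command setting (e.g. submodule.diffJobs)
 *   central  - hypothetical repository-wide "use parallelism" setting
 * The per-command setting wins; if neither is set, run serially (1).
 * Returns -1 if the chosen value is negative (invalid).
 */
static int resolve_jobs(struct jobs_setting specific,
			struct jobs_setting central)
{
	int value;

	if (specific.is_set)
		value = specific.value;
	else if (central.is_set)
		value = central.value;
	else
		return 1;

	if (value < 0)
		return -1;
	if (value == 0)
		return guess_parallelism();
	return value;
}
```

For example, with submodule.diffJobs unset and the central setting at 8,
resolve_jobs() yields 8; setting submodule.diffJobs to 4 overrides that,
and with nothing configured it keeps the documented default of 1.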