From: Junio C Hamano
To: "D. Ben Knoble"
Cc: git@vger.kernel.org
Subject: Re: git-status performance with submodules
Date: Sun, 01 Dec 2019 22:50:29 -0800
In-Reply-To: (D. Ben Knoble's message of "Mon, 2 Dec 2019 01:19:49 -0500")

"D. Ben Knoble" writes:

> ### What I am curious about
>
> From the traces (attached), it appears that git-status suffers from a lack of
> (possibly embarrassing) parallelism: I would expect each submodule to be
> independently check-able, ...
> ...
> What can we do to fix this? Is there a reason for this (really terribly slow)
> serial execution? Is this something developers haven't bothered to optimize
> ("unexpected use case")? If so, I would like to discuss taking a crack at it,
> because I do have at least one repository with this many submodules, and I
> care about its performance.

Nice to hear from somebody who cares about improving submodule
support.

I offhand do not think of a reason why we inherently have to process
them serially.  But the way "git status" code is structured, it
probably takes a bit of preparatory refactoring.

If I recall correctly, it walks each path in the index in the
superproject and notes how the file in the working tree is different
from that of the index and the HEAD, under the assumption that
inspection of each path is relatively cheap and at the same cost.
You'd first need to restructure that part so that inspecting groups
of index entries can be sharded to separate subprocesses while the
parent process waits, and have them report to the parent process,
and let the parent process continue with the aggregated result, or
something like that.

Thanks.
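
For illustration only, the shard-and-aggregate shape described above might look roughly like the Python sketch below. It is not how git's C code is or would be structured. The names `shard`, `inspect_with_git`, `inspect_group` and `parallel_status` are hypothetical, and the sketch assumes each submodule can be inspected by running `git -C <path> status --porcelain` as a child process. The worker threads do nothing but wait on those child processes, so the parent effectively waits while the subprocesses run side by side, then merges their reports.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def shard(paths: List[str], n_shards: int) -> List[List[str]]:
    """Split paths round-robin into at most n_shards non-empty groups."""
    n = max(1, n_shards)
    groups = [paths[i::n] for i in range(n)]
    return [g for g in groups if g]


def inspect_with_git(path: str) -> str:
    """Assumed per-path inspection: run git status in one submodule."""
    result = subprocess.run(
        ["git", "-C", path, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def inspect_group(group: List[str],
                  inspect: Callable[[str], str]) -> Dict[str, str]:
    """Inspect every path in one shard and return a partial report."""
    return {path: inspect(path) for path in group}


def parallel_status(paths: Iterable[str], n_shards: int = 4,
                    inspect: Callable[[str], str] = inspect_with_git,
                    ) -> Dict[str, str]:
    """Shard the paths, inspect the shards concurrently, merge the reports."""
    groups = shard(list(paths), n_shards)
    merged: Dict[str, str] = {}
    if not groups:
        return merged
    # One worker per shard; the parent blocks here until all shards report.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        for partial in pool.map(lambda g: inspect_group(g, inspect), groups):
            merged.update(partial)
    return merged
```

Passing `inspect` as a parameter keeps the sharding and aggregation separate from the per-path check, which mirrors the preparatory refactoring suggested above: the existing serial walk would first have to be split into "inspect one group" and "combine the results" before any of it could run concurrently.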