From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 1C3C51F453 for ; Wed, 26 Sep 2018 22:07:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726469AbeI0EWE (ORCPT ); Thu, 27 Sep 2018 00:22:04 -0400 Received: from mail-wm1-f66.google.com ([209.85.128.66]:52908 "EHLO mail-wm1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726397AbeI0EWE (ORCPT ); Thu, 27 Sep 2018 00:22:04 -0400 Received: by mail-wm1-f66.google.com with SMTP id l7-v6so3833200wme.2 for ; Wed, 26 Sep 2018 15:07:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=jp9nouSZ2TgXzDZCzfvK/knA83BKbUbXmrs8O5PgGsA=; b=mVJGSR2wSKvVviLrSG9ShcfN+3DFdwHJLJITdzuVXA7Yprq2rBmZXby/0SjfX6iWb4 F6+GaNTd8xWPqazH7w2WEMdOQEiVWizdIcaIpyomaFMEAcDFLx+SO6NE77Q1cDBulys4 ZiJY/xj3ay+3/tqK2WSMJuc4jl0vk10imGZGCnU+eSXnfRHrQ6e4FTwyAVHuC0Vfl0Ow lbqKbqmj9yjnr37qGeywu/xWXV/8K6kem2dBjVpiLKG12GwPkLhtmVqVUty0rev9vK2L 28Tf0TFMjTse63pNi6d7HmUkrW452ELYt0F0tgxEbfc1FqRIHQ8G3lMq1DmA/mq7o/B4 KKRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:to:cc:subject:references:date :in-reply-to:message-id:user-agent:mime-version; bh=jp9nouSZ2TgXzDZCzfvK/knA83BKbUbXmrs8O5PgGsA=; b=aq0wAbbeEU8eYzTHetF2/v7egO9XVmfFzCkx23mM6/+A0yKMew/yzPPquLm3igE7e3 6v7z7X6YXEr/SKGl3M1MoG2UnlRMLeoMHnwTGI+oWKZfWKYvi/RIcwOioqvbSLdv0S+P Evy8e91VSdLnm8vxcgUkq5e7UYhVBKb85tksASFhA5Oko8wXKXCoO5FIHEEM5wF4vUYc gJbbfaXXLaW6ylCb2WnJmi2ATPL2Px8joNGdTfKdf0rbRSr/iBqWBkL8k2E646QFCbK2 i7OYQ6zwD2S0lUCOto3qDjogjjQBABmaweVd/WW/8F4JBOVIw9SuBCrrt3nTzUVgi17M +FKA== X-Gm-Message-State: ABuFfojTIUl7Atk3++oPDvDmk3dKUNGZhLJJIsHinJURhg9G0HOOSTUl SSx9e57kjSl+f+t0RzUhGlI= X-Google-Smtp-Source: ACcGV62OGELRw0iqczGCOjpmHsf4/tIGnQ4V6jOOlQtlEzuql0b1db21lwldlY4xTe7MUhvp5aD56Q== X-Received: by 2002:a1c:7212:: with SMTP id n18-v6mr5915912wmc.33.1537999618942; Wed, 26 Sep 2018 15:06:58 -0700 (PDT) Received: from localhost (168.50.187.35.bc.googleusercontent.com. [35.187.50.168]) by smtp.gmail.com with ESMTPSA id r9sm139970wrx.65.2018.09.26.15.06.58 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 26 Sep 2018 15:06:58 -0700 (PDT) From: Junio C Hamano To: Ben Peart Cc: git@vger.kernel.org, pclouds@gmail.com, Ben Peart Subject: Re: [PATCH v6 0/7] speed up index load through parallelization References: <20180823154053.20212-1-benpeart@microsoft.com> <20180926195442.1380-1-benpeart@microsoft.com> Date: Wed, 26 Sep 2018 15:06:57 -0700 In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com> (Ben Peart's message of "Wed, 26 Sep 2018 15:54:35 -0400") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Ben Peart writes: > Base Ref: master > Web-Diff: https://github.com/benpeart/git/commit/a0300882d4 > Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 && git checkout a0300882d4 > > > This iteration brings back the Index Entry Offset Table (IEOT) extension > which enables us to multi-thread the cache entry parsing without having > the primary thread have to scan all the entries first. In cases where the > cache entry parsing is the most expensive part, this yields some additional > savings. Nice. > Test w/100,000 files Baseline Optimize V4 Extensions Entries > ---------------------------------------------------------------------------- > 0002.1: read_cache 22.36 18.74 -16.2% 18.64 -16.6% 12.63 -43.5% > > Test w/1,000,000 files Baseline Optimize V4 Extensions Entries > ----------------------------------------------------------------------------- > 0002.1: read_cache 304.40 270.70 -11.1% 195.50 -35.8% 204.82 -32.7% > > Note that on the 1,000,000 files case, multi-threading the cache entry parsing > does not yield a performance win. This is because the cost to parse the > index extensions in this repo, far outweigh the cost of loading the cache > entries. > ... > The high cost of parsing the index extensions is driven by the cache tree > and the untracked cache extensions. As this is currently the longest pole, > any reduction in this time will reduce the overall index load times so is > worth further investigation in another patch series. Interesting. > One option would be to load each extension on a separate thread but I > believe that is overkill for the vast majority of repos. Instead, some > optimization of the loading code for these two extensions is probably worth > looking into as a quick examination shows that the bulk of the time for both > of them is spent in xcalloc(). Thanks. Looking forward to block some quality time off to read this through, but from the cursory look (read: diff between the previous round), this looks quite promising.