From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 5BAEF1F597 for ; Mon, 23 Jul 2018 17:03:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388652AbeGWSFv (ORCPT ); Mon, 23 Jul 2018 14:05:51 -0400 Received: from mail-io0-f195.google.com ([209.85.223.195]:41886 "EHLO mail-io0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388357AbeGWSFv (ORCPT ); Mon, 23 Jul 2018 14:05:51 -0400 Received: by mail-io0-f195.google.com with SMTP id q9-v6so1134928ioj.8 for ; Mon, 23 Jul 2018 10:03:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=pt6mUKIjbvTIM0OZOywIMJn5VcAi4nGFmg19bHFIefI=; b=j7k79HCWtg5hMyOqmOH9tr1zXadsrCtGZR+NnKcG/bv/QvUd9qqErQuWty+Q9ona/n 8ooVWjler02Y03wcKjNWXu9cRKT9jIqv7q72sKa74o9krj6Wf1h5u0In9+8VQzYX06JC Yiv0NTXx6NXmDGDYM+5tY8rp8xvLk1aS/LgDHGcSc1gdhru67ti+Iu+CmhDo+zAq/1vo eX4fCYN4iQte7iXa5u787/VhZel4PA+G1qmHnzS1fJ0qOZzhN4uap8hMUehrWaqIc8tX 03s4Mwm3iWVJ6maWcbNIod5zSlE2SyC+Zn6q94aKmngVA/6L7iAGGNBvDhpVXUW2UipI dZuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=pt6mUKIjbvTIM0OZOywIMJn5VcAi4nGFmg19bHFIefI=; b=kwGeisHB96Vz8V/V43N6IKO5kGHzMHyFPPpZxIBrULlpZBUYp34WikFC1u9YIl/XPa RrDMXaop1/eJ91zlNIAfuThQReSB+drjm8dzM33UZ8dS788vdogHFMJ5HG7vE4VOF5H3 3WZWcKdGEP/A4Q9BPZ+/zTInACTSx7X5gcyesnSk/AERiqRy28Zl92OydaEDqJ1cefyd Fza64oC0FJf4dPxS0z9A9h1xgzRMIo/0bNQHPIeRvdy5fbWHVozFWgfWSnxLjUWkAeKh XcEXJGVqlGkAfX3BdR9KsBsA1Q9J2LG5gZ0JSkJMOyYf2En6PIlfYW+J8Z6NSxv7xOEv JKqg== X-Gm-Message-State: AOUpUlHpvz6y5tM8+g6LZZ2xIDJe0SwCmFaCgOHqnawqAsTcXqoBjqPj Q1rL9qkPfaj55Sl4vI447kXKfrLeM8m7L6pZYUA= X-Google-Smtp-Source: AAOMgpfEQJKxMVRnj3cKy5HQKmWDbjl4oI8fb3auKUN+uQlJ2BaBb101dpxjPMVe9140YyuJgnHi/z8+VlH5OirpRqw= X-Received: by 2002:a6b:825e:: with SMTP id e91-v6mr10552674iod.118.1532365422594; Mon, 23 Jul 2018 10:03:42 -0700 (PDT) MIME-Version: 1.0 References: <20180718204458.20936-1-benpeart@microsoft.com> <20180718213420.GA17291@sigill.intra.peff.net> In-Reply-To: From: Duy Nguyen Date: Mon, 23 Jul 2018 19:03:16 +0200 Message-ID: Subject: Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) To: Ben Peart Cc: Jeff King , Ben Peart , Git Mailing List , Junio C Hamano Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Jul 23, 2018 at 5:50 PM Ben Peart wrote: > > Anyway, on to the actual discussion: > > > >> Here is a checkout command with tracing turned on to demonstrate where= the > >> time is spent. Note, this is somewhat of a =EF=BF=BDbest case=EF=BF= =BD as I=EF=BF=BDm simply > >> checking out the current commit: > >> > >> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev) > >> $ /usr/src/git/git.exe checkout > >> 12:31:50.419016 read-cache.c:2006 performance: 1.180966800 s: re= ad cache .git/index > >> 12:31:51.184636 name-hash.c:605 performance: 0.664575200 s: in= itialize name hash > >> 12:31:51.200280 preload-index.c:111 performance: 0.019811600 s: pr= eload index > >> 12:31:51.294012 read-cache.c:1543 performance: 0.094515600 s: re= fresh index > >> 12:32:29.731344 unpack-trees.c:1358 performance: 33.889840200 s: t= raverse_trees > >> 12:32:37.512555 read-cache.c:2541 performance: 1.564438300 s: wr= ite index, changed mask =3D 28 > >> 12:32:44.918730 unpack-trees.c:1358 performance: 7.243155600 s: tr= averse_trees > >> 12:32:44.965611 diff-lib.c:527 performance: 7.374729200 s: di= ff-index > >> Waiting for GVFS to parse index and update placeholder files...Succeed= ed > >> 12:32:46.824986 trace.c:420 performance: 57.715656000 s: g= it command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout > > > > What's the current state of the index before this checkout? > > This was after running "git checkout" multiple times so there was really > nothing for git to do. Hmm.. this means cache-tree is fully valid, unless you have changes in index. We're quite aggressive in repairing cache-tree since aecf567cbf (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we have very good cache-tree records and still spend 33s on traverse_trees, maybe there's something else. > >> ODB cache > >> =3D=3D=3D=3D=3D=3D=3D=3D=3D > >> Since traverse_trees() hits the ODB for each tree object (of which the= re are > >> over 500K in this repo) I wrote and tested having an in-memory ODB cac= he > >> that cached all tree objects. This resulted in a > 50% hit ratio (lar= gely > >> due to the fact we traverse the tree twice during checkout) but result= ed in > >> only a minimal savings (1.3 seconds). > > > > In my experience, one major cost of object access is decompression, bot= h > > delta and zlib. Trees in particular tend to delta very well across > > versions. We have a cache to try to reuse intermediate delta results, > > but the default size is probably woefully undersized for your repositor= y > > (I know from past tests it's undersized a bit even for the linux > > kernel). > > > > Try bumping core.deltaBaseCacheLimit to see if that has any impact. It'= s > > 96MB by default. > > > > There may also be some possible work in making it more aggressive about > > storing the intermediate results. I seem to recall from past > > explorations that it doesn't keep everything, and I don't know if its > > heuristics have ever been proven sane. Could we be a bit more flexible about cache size? Say if we know there's 8 GB memory still available, we should be able to use like 1 GB at least (and that's done automatically without tinkering with config). --=20 Duy