From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 159951F5AE for ; Thu, 10 Jun 2021 10:54:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230084AbhFJK4t (ORCPT ); Thu, 10 Jun 2021 06:56:49 -0400 Received: from mail-ej1-f46.google.com ([209.85.218.46]:41575 "EHLO mail-ej1-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230059AbhFJK4o (ORCPT ); Thu, 10 Jun 2021 06:56:44 -0400 Received: by mail-ej1-f46.google.com with SMTP id ho18so32457449ejc.8 for ; Thu, 10 Jun 2021 03:54:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:user-agent:message-id:mime-version; bh=TC5lYBoske9Nqrh1xk48OHdXtmo8i8VIUGD9nVcrfvM=; b=qALVUU3qyywA1GT4f1srgL/0Y6E+xpfcAuXyL2JE0ZR3LB0LsAbhG79BGp0q/o02A9 bZmqXs0qauDhGPZ9GiRhN8aHWTSay6dB1qoidFQTVjYHy7CK+92fafFHwUFDbOmPFvue fDMb5ViqHrf1KmKVfTA/2Hexsp/uUVrlGGCYYmW6mZlxoywFIFQPWHB8nbWakP1FLz1p TEpWGEMQ03/nMvkysJJZr0n/NDnFEHFK2rc9V8Kt3SnX+gRPhiPzIDaXg8F9SYhiYX4X vV64uEPfbVpnIZyceUzsYAZ39G59J2uBPIpjlj+3bwmmgKse3tSz5KRYO7ig5I2JAL4Q Q6sA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:user-agent:message-id :mime-version; bh=TC5lYBoske9Nqrh1xk48OHdXtmo8i8VIUGD9nVcrfvM=; b=ncldR2zi7/E28HS9SQVZFP/BAnoxyx9ocggQg3DxaGPHsqslqCYP7I1T/wFiYCT3bo GS0VaC8h1gS/R2dXofcQBFIfu+KpuWkLZEZxKIVQRhZExxtgGQt19+NUoOQkIU4DQpEV nxs5Iegdx5YxgglPm6qZd4RhBn/YMI0ry+HrA6NFKUt5DxOkn3v8Gqfj+/DIgl4CAL7R EfwMvfeefaLN+YZSwqV0gz3+oNravVU5wBYas54pk3+nhHugttq2YuuzI/GK2loKjA1e TMsWOeAm98WOH0glLavVCgN4Mp/iQ1ndvz9sBSU8kM9KyLs7dqXnU9C6IztIB5Ollcgf o6TQ== X-Gm-Message-State: AOAM531DRs1KoRlCXJLnhV1qEmym5wb3aiwxldlmFZgywKaqVVjNVJrA o/+47BY6z8DvcV1n8f4+sBI= X-Google-Smtp-Source: ABdhPJwZ2ExfOmkOC397F7QFBk+fUl1SVSbQ2FQnj98KS7bxwDdYZ1qbxzvCcHbfr+WDBS2EiFtNlg== X-Received: by 2002:a17:906:5593:: with SMTP id y19mr3790275ejp.195.1623322427229; Thu, 10 Jun 2021 03:53:47 -0700 (PDT) Received: from evledraar (j57224.upc-j.chello.nl. [24.132.57.224]) by smtp.gmail.com with ESMTPSA id lv25sm864570ejb.100.2021.06.10.03.53.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Jun 2021 03:53:46 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Taylor Blau Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, szeder.dev@gmail.com Subject: Making split commit graphs pick up new options (namely --changed-paths) Date: Thu, 10 Jun 2021 12:40:33 +0200 User-agent: Debian GNU/Linux bullseye/sid; Emacs 27.1; mu4e 1.5.12 Message-ID: <871r9a2dol.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Sep 16 2020, Taylor Blau wrote: Replying to http://lore.kernel.org/git/ccb6482feb8d8606d82b5ab97e33184f26d6c5b6.1600279373.git.me@ttaylorr.com as a start-off point for discussion; > Introduce a command-line flag to specify the maximum number of new Bloom > filters that a 'git commit-graph write' is willing to compute from > scratch. > > Prior to this patch, a commit-graph write with '--changed-paths' would > compute Bloom filters for all selected commits which haven't already > been computed (i.e., by a previous commit-graph write with '--split' > such that a roll-up or replacement is performed). > > This behavior can cause prohibitively-long commit-graph writes for a > variety of reasons: > > * There may be lots of filters whose diffs take a long time to > generate (for example, they have close to the maximum number of > changes, diffing itself takes a long time, etc). > > * Old-style commit-graphs (which encode filters with too many entries > as not having been computed at all) cause us to waste time > recomputing filters that appear to have not been computed only to > discover that they are too-large. > > This can make the upper-bound of the time it takes for 'git commit-graph > write --changed-paths' to be rather unpredictable. > > To make this command behave more predictably, introduce > '--max-new-filters=' to allow computing at most '' Bloom filters > from scratch. This lets "computing" already-known filters proceed > quickly, while bounding the number of slow tasks that Git is willing to > do. > [...] > @@ -67,6 +67,11 @@ this option is given, future commit-graph writes will automatically assume > that this option was intended. Use `--no-changed-paths` to stop storing this > data. > + > +With the `--max-new-filters=` option, generate at most `n` new Bloom > +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is > +enforced. Commits whose filters are not calculated are stored as a > +length zero Bloom filter. > ++ > [...] Is there any way with an existing --split setup that introduces a --changed-paths to make the "add bloom filters to the graph" eventually consistent, or is some one-off --split=replace the only way to grandfather in such a feature? Reading the code there seems to be no way to do that, and we have the "chunk_bloom_data" in the graph, as well as "bloom_filter_settings". I'd expect some way to combine the "max_new_filters" and --split with some eventual-consistency logic so that graphs not matching our current settings are replaced, or replaced some at a time. Also, am I reading the expire_commit_graphs() logic correctly that we first write the split graph, and then unlink() things that are too old? I.e. if you rely on the commit-graph to optimize things this will make things slower until the next run of writing the graph? I expected to find something more gentle there, i.e. marking that file as obsolete, not making it part of the new chain (replacing it), and then unlinking only things not part of the current chain of data that are too old. But perhaps I'm just misreading or misunderstanding the behavior...