References: <20200821200121.GF1165@coredump.intra.peff.net> <20200821210301.GA11806@coredump.intra.peff.net> <20200828070802.GC2105050@coredump.intra.peff.net>
In-Reply-To: <20200828070802.GC2105050@coredump.intra.peff.net>
From: Elijah Newren
Date: Fri, 28 Aug 2020 10:20:40 -0700
Subject: Re: [PATCH 4/5] strmap: add strdup_strings option
To: Jeff King
Cc: Elijah Newren via GitGitGadget, Git Mailing List
X-Mailing-List: git@vger.kernel.org

On Fri, Aug 28, 2020 at 12:08 AM Jeff King wrote:
>
> On Fri, Aug 21, 2020 at 03:25:44PM -0700, Elijah Newren wrote:
>
> > >   - That sounds like a lot of maps. :) I guess you've looked at
> > >     compacting some of them into a single map-to-struct?
> >
> > Oh, map-to-struct is the primary use. But compacting them won't work,
> > because the reason for the additional maps is that they have different
> > sets of keys (this set of paths meet a certain condition...). Only
> > one map contains all the paths involved in the merge.
>
> OK, I guess I'm not surprised that you would not have missed such an
> obvious optimization.
> :)
>
> > Also, several of those maps don't even store a value; and are really
> > just a set implemented via strmap (thus meaning the only bit of data I
> > need for some conditions is whether any given path meets it). It
> > seems slightly ugly to have to call strmap_put(map, string, NULL) for
> > those. I wonder if I should have another strset type much like you're
> > suggesting for strintmap. Hmm...
>
> FWIW, khash does have a "set" mode where it avoids allocating the value
> array at all.

Cool.

> What's the easiest way to benchmark merge-ort?

Note that I discovered another optimization that I'm working on
implementing; when finished, it should cut down a little more on the
time spent on inexact rename detection. That should have the side
effect of having the time spent on strmaps stick out some more in the
overall timings (as a percentage of overall time anyway). So, I'm
focused on that before I do other benchmarking work (which is part of
the reason I mentioned my strmap/hashmap benchmarking last week might
take a while).

Anyway, on to your question:

=== If you just want to be able to run the ort merge algorithm ===

Clone git@github.com:newren/git, check out the 'ort' branch, and build
it. It currently changes the default merge algorithm to 'ort' and
even ignores '-s recursive' by remapping it to '-s ort' (because I
wanted to see how regression tests fared with ort as a replacement for
recursive). It should pass the regression tests if you want to run
those first. But note that if you want to compare 'ort' to
'recursive', then currently you need to have two different git builds,
one of my branch and one with a different checkout of something else
(e.g. 2.28.0 or 'master' or whatever).

=== Decide the granularity of your timing ===

I suspect you know more than me here, but maybe my pointers are useful anyway...

Decide if you want to measure overall program runtime, or dive into details.
I used both a simple 'time' and the better 'hyperfine' for the former,
and used both 'perf' and GIT_TRACE2_PERF for the latter. One nice
thing about GIT_TRACE2_PERF was I wrote a simple program to aggregate
the times per region and provide percentages, in a script at the
toplevel named 'summarize-perf' that I can use to prefix commands.
Thus, I could for example run from my linux clone:

    $ ../git/summarize-perf git fast-rebase --onto HEAD base hwmon-updates

and I'd get output that looks something like (note that this is a
subset of the real output):

    1.400 : 35 : label:inmemory_nonrecursive
    0.827 : 41 : ..label:renames
    0.019 : ( 2.2%)
    0.803 : 37 : ....label:regular renames
    0.004 : 31 : ....label:directory renames
    0.001 : 31 : ....label:process renames
    0.513 : 41 : ..label:collect_merge_info
    0.048 : 35 : ..label:process_entries
    0.117 : 1 : label:checkout
    0.000 : 1 : label:record_unmerged

and where those fields are