From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_PASS, SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 2635B1F66F for ; Thu, 5 Nov 2020 21:17:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732235AbgKEVRi (ORCPT ); Thu, 5 Nov 2020 16:17:38 -0500 Received: from cloud.peff.net ([104.130.231.41]:49372 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731234AbgKEVRi (ORCPT ); Thu, 5 Nov 2020 16:17:38 -0500 Received: (qmail 17591 invoked by uid 109); 5 Nov 2020 21:17:37 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 05 Nov 2020 21:17:37 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 27535 invoked by uid 111); 5 Nov 2020 21:17:37 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 05 Nov 2020 16:17:37 -0500 Authentication-Results: peff.net; auth=none Date: Thu, 5 Nov 2020 16:17:36 -0500 From: Jeff King To: Junio C Hamano Cc: Elijah Newren via GitGitGadget , git@vger.kernel.org, Elijah Newren Subject: Re: [PATCH v4 00/13] Add struct strmap and associated utility functions Message-ID: <20201105211736.GA134274@coredump.intra.peff.net> References: <20201105132909.GB91972@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Nov 05, 2020 at 12:25:14PM -0800, Junio C Hamano wrote: > Jeff King writes: > > > Subject: [PATCH] shortlog: drop custom strset implementation > > > > We can use the strset recently added in strmap.h instead. Note that this > > doesn't have a "check_and_add" function. We can easily write the same > > thing using separate "contains" and "adds" calls. This is slightly less > > efficient, in that it hashes the string twice, but for our use here it > > shouldn't be a big deal either way. > > > > I did leave it as a separate helper function, though, since we use it in > > three separate spots (some of which are in the middle of a conditional). > > It makes sense, but check_dup() sounds like its use pattern would be > > if (check_dup(it) == NO_DUP) > add(it); > > where it is more like "add it but just once". Hmph. I picked that name (versus just "contains") hoping it be general enough to cover the dual operation. Better name suggestions are welcome. Though... > By the way, is a strset a set or a bag? If it makes the second add > an no-op, perhaps your check_dup() is what strset_add() should do > itself? What builtin/shortlog.c::check_dup() does smells like it is > a workaround for the lack of a naturally-expected feature. Yes, if strset_add() returned an integer telling us whether the item was already in the set, then we could use it directly. It's slightly non-trivial to do, though, as it's built around strmap_put(), which returns a pointer to the old value. But since we're a set and not a map, that value is always NULL; we can't tell the difference between "I was storing an old value which was NULL" and "I was not storing any value". If strset were built around strintmap it could store "1" for "present in the set". It somehow feels hacky, though, to induce extra value writes just for the sake of working around the API. Since strset is defined within strmap.c, perhaps it wouldn't be too bad for it to be more intimate with the details here. I.e., to use find_strmap_entry() directly, and if the value is not present, to create a new hashmap entry. That would require hacking up strmap_put() into a few helpers, but it's probably not too bad. -Peff