From mboxrd@z Thu Jan  1 00:00:00 1970
References: <4fa6fba33b329ce85f68470fb2545adf1bb06900.1591978801.git.gitgitgadget@gmail.com>
From: Chris Torek
Date: Fri, 12 Jun 2020 12:25:21 -0700
Subject: Re: [PATCH v4 2/3] git diff: improve range handling
To: Junio C Hamano, git@vger.kernel.org
X-Mailing-List: git@vger.kernel.org

Ugh, forgot to tweak the gmail reply to go to the mailing list.  Also
I typed in a wrong word ("commit" should be "comment").  Corrected reply:

On Fri, Jun 12, 2020 at 11:31 AM Junio C Hamano wrote:
>
> "Chris Torek via GitGitGadget" writes:
>
> > +struct symdiff {
> > +	struct bitmap *skip;
> > +	int warn;
> > +	const char *base, *left, *right;
> > +};
> > +
> > +/*
> > + * Check for symmetric-difference arguments, and if present, arrange
> > + * everything we need to know to handle them correctly.  As a bonus,
> > + * weed out all bogus range-based revision specifications, e.g.,
> > + * "git diff A..B C..D" or "git diff A..B C" get rejected.
> > + *
> > + * For an actual symmetric diff, *symdiff is set this way:
> > + *
> > + *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
> > + *    indices that the caller should ignore (extra merge bases, of
> > + *    which there might be many, and A in A...B).  Note that the
> > + *    chosen merge base and right side are NOT marked.
> > + *  - warn is set if there are multiple merge bases.
> > + *  - base, left, and right point to the names to use in a
> > + *    warning about multiple merge bases.
> > + *
> > + * If there is no symmetric diff argument, sym->skip is NULL and
> > + * sym->warn is cleared.  The remaining fields are not set.
> > + */
>
> OK.
>
> > +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> > +{
> > +	int i, is_symdiff = 0, basecount = 0, othercount = 0;
> > +	int lpos = -1, rpos = -1, basepos = -1;
> > +	struct bitmap *map = NULL;
> > +
> > +	/*
> > +	 * Use the whence fields to find merge bases and left and
> > +	 * right parts of symmetric difference, so that we do not
> > +	 * depend on the order that revisions are parsed.  If there
> > +	 * are any revs that aren't from these sources, we have a
> > +	 * "git diff C A...B" or "git diff A...B C" case.  Or we
> > +	 * could even get "git diff A...B C...E", for instance.
> > +	 *
> > +	 * If we don't have just one merge base, we pick one
> > +	 * at random.
> > +	 *
> > +	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> > +	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
> > +	 * rev->pending.objects and rev->cmdline.rev are parallel.
> > +	 */
> > +	for (i = 0; i < rev->cmdline.nr; i++) {
> > +		struct object *obj = rev->pending.objects[i].item;
> > +		switch (rev->cmdline.rev[i].whence) {
> > +		case REV_CMD_MERGE_BASE:
> > +			if (basepos < 0)
> > +				basepos = i;
> > +			basecount++;
> > +			break;	/* do mark all bases */
>
> We find and use the first found merge base (i.e.
"pick at random" as > promised in the comment before the function), but for warning, keep > track of how many merge bases there are. > > > + case REV_CMD_LEFT: > > + if (lpos >= 0) > > + usage(builtin_diff_usage); > > A range (either A..B or A...B) has already been seen, and we have > another, which is now rejected. > > > + lpos = i; > > + if (obj->flags & SYMMETRIC_LEFT) { > > + is_symdiff = 1; > > + break; /* do mark A */ > > Inside this switch statement, "continue" is a sign that the caller > should use the rev, and "break" is a sign that the rev is to be > ignored. We obviously do not ignore "A" in ... > > > + } > > + continue; > > ... "A..B" notation, so we "continue" here. > > > + case REV_CMD_RIGHT: > > + if (rpos >= 0) > > + usage(builtin_diff_usage); > > Here is the same "we reject having two or more ranges". > > I actually suspect that this usage() would become dead code---we > would already have died when we saw the matching left end of the > second range (so this could become BUG(), even though usage() does > not hurt). Right - I considered it both ways and figured a second usage() was simpler and straightforward, but I'm fine with either method. > > + rpos = i; > > + continue; /* don't mark B */ > > And of course, whether "A..B" or "A...B", B will be used as the > "result" side of the diff, so won't be marked for skipping. > > > + case REV_CMD_PARENTS_ONLY: > > + case REV_CMD_REF: > > + case REV_CMD_REV: > > + othercount++; > > + continue; > > I wonder if we want to use "default" instead of these three > individual cases. Pros and cons? > > - If we forgot to list a whence REV_CMD_* here, it will be silently > marked to be skipped with this code. With "default", it will be > counted to be diffed (which may trigger "giving too many revs to > be diffed" error from the diff machinery, which is good). > > - With "default", when we add new type of whence to REV_CMD_* and > forget to adjust this code, it will be counted to be diffed. 
>    With the current code, it will be skipped.
>
> We probably could get the best of both worlds by keeping the
> above three for "counted in othercount and kept", and then add a
> default arm to the switch() that just says
>
>		default:
>			BUG("forgot to handle %d",
>			    rev->cmdline.rev[i].whence);

I'm fine with this as well.  I did it the way I did because at least
some compilers give a warning or error if you forget an enum value.
Using default (or adding one) defeats this, but the BUG method is
reasonable.

> That way, every time we add a new type of whence, we would be forced
> to think about what should be done with them.
>
> > +		}
> > +		if (map == NULL)
> > +			map = bitmap_new();
> > +		bitmap_set(map, i);
> > +	}
> > +
> > +	/*
> > +	 * Forbid any additional revs for both A...B and A..B.
> > +	 */
> > +	if (lpos >= 0 && othercount > 0)
> > +		usage(builtin_diff_usage);
>
> Meaning "git diff A..B C" is bad.  Reasonable.
>
> > +	if (!is_symdiff) {
> > +		bitmap_free(map);
>
> It is not wrong per se to free it unconditionally, but wouldn't it
> be a bug if (map != NULL) at this point in the flow?

Yes, because the A in A...B will have been marked.  It didn't seem
worth a BUG() call though.

> The merge bases would only be stuffed in the revs when A...B is
> given, and we are not skipping anything involved in A..B or
> non-range revs.
>
> > +		sym->warn = 0;
> > +		sym->skip = NULL;
>
> Clearing these two fields is as promised to the callers in the
> comment above, which is good.
>
> > +		return;
> > +	}
> > +
> > +	sym->left = rev->pending.objects[lpos].name;
> > +	sym->right = rev->pending.objects[rpos].name;
> > +	sym->base = rev->pending.objects[basepos].name;
> > +	if (basecount == 0)
> > +		die(_("%s...%s: no merge base"), sym->left, sym->right);
>
> Good.
>
> > +	bitmap_unset(map, basepos);	/* unmark the base we want */
>
> Hmm.
> You could
>
>		case REV_CMD_MERGE_BASE:
>			basecount++;
>			if (basepos < 0) {
>				basepos = i;
>				continue; /* keep this one */
>			}
>			break; /* skip all others */
>
> and lose this unset().  I do not think it makes too much of a
> difference, but it probably is easier to follow if we avoided this
> "do something and then come back to correct" pattern.

OK, I'll change it to do this in the re-roll.

> > +	sym->warn = basecount > 1;
> > +	sym->skip = map;
> > +}
> > +
> >  int cmd_diff(int argc, const char **argv, const char *prefix)
> >  {
> >  	int i;
> > @@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >  	struct object_array_entry *blob[2];
> >  	int nongit = 0, no_index = 0;
> >  	int result = 0;
> > +	struct symdiff sdiff;
> >
> >  	/*
> >  	 * We could get N tree-ish in the rev.pending_objects list.
> > @@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >  		}
> >  	}
> >
> > +	symdiff_prepare(&rev, &sdiff);
> >  	for (i = 0; i < rev.pending.nr; i++) {
> >  		struct object_array_entry *entry = &rev.pending.objects[i];
> >  		struct object *obj = entry->item;
> > @@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >  		obj = &get_commit_tree(((struct commit *)obj))->object;
> >
> >  		if (obj->type == OBJ_TREE) {
> > +			if (sdiff.skip && bitmap_get(sdiff.skip, i))
> > +				continue;
>
> By the way, I cannot shake this feeling that, given that
> rev.pending/cmdline.nr will not be an unreasonably large number, it
> is overkill to use the bitmap here.  If I were writing this code,
> I would have made symdiff_prepare() fill a separate object array
> by copying the elements to be used in the final "diff" out of the
> rev.pending array and updated this loop to iterate over that array.
>
> I am not saying that such an approach is better than the use of
> bitmap code here.  It just was a bit unexpected to see the bitmap
> code used for a set of objects that is typically less than a dozen.
I am sure it is overkill -- but at the same time, it's already coded
and cheap enough to use that rolling our own separate array felt
worse.  I can add a comment (NOT COMMIT) if you like.

Thanks,
Chris