From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 1888C1F87F for ; Mon, 12 Nov 2018 15:46:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729962AbeKMBkP (ORCPT ); Mon, 12 Nov 2018 20:40:15 -0500 Received: from mail-vs1-f65.google.com ([209.85.217.65]:46184 "EHLO mail-vs1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726997AbeKMBkO (ORCPT ); Mon, 12 Nov 2018 20:40:14 -0500 Received: by mail-vs1-f65.google.com with SMTP id r14so5338757vsc.13 for ; Mon, 12 Nov 2018 07:46:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=G0FH8MHxAhwWQQ3S2WIGMmVGpTpYne629GeSr1Q1mfo=; b=svrMY8Pryc6ESTeY1T47IYHxu8gXWnoWwzvKgVcV9ruoDjTdls+xCEYuM3SX9D8daK IVQDXgc7+hnTHHAdv8CU5BUDjJszl+0xUU4mkHip06o6BrnposFmsFyEjp7Yv0N6UN+3 Z+baDZeJYaAozrAIjPEEDE4soyVrVZXbpopgQwvi3dgiZ8WoVosHoZ2y/dkEaGJkoRFB L8EW85wQCLrdhvCWSJCoq0wzqgdQO3tJ2Qf7dnVJCwuDqskN3Fyee5m7dNhjDWrlXO7s 8OPHN8i4iwsl7xRE3aYALpvQbggUbAMOzcZbtdQ8JOxxN/QI2cUbgugzQLMVYQeWdH1w abQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=G0FH8MHxAhwWQQ3S2WIGMmVGpTpYne629GeSr1Q1mfo=; b=gJCm3azb5aZRtklh/mscgwFq0TqXHrh5tgKTKZJPyDlk1SQZtI9USunSphIDEDPzKp kUHjvA75444nsnSg23j9CUKd4y03wG5qgY+xwcozxARS6gnJzX1QMGNP3vh7RVaBtd8Q 2iWOFSLIWAfKLP1P51l9A69LoKomPT9DI+WcPelLCGkWzy1nP6fZ7fpfpP76SbBiz7LU 6BMa0NBY9pSYM9yMOS9yzGInc4XbUhYRhhlYdiG88tDmUupwaFca626k8cPJj2TsJdA8 1VvudSBiDT0N/qrX4udBgIRtSU8ON91d1KgM/p4AMOUjthOerdJdAVv/3m31dBM/zb+l fZRw== X-Gm-Message-State: AGRZ1gKV6TyCwEKWq761D+WixbkcXyclpkpp00x9plvKLeM0Gmled/VU ipgqsK/0FHAjDdAMc4gMeVG8rebXD6/bRn0Hie5Of6OB X-Google-Smtp-Source: AJdET5e1bhH9b7cihahHwNDxvsVy1L4QE9BYp68jYp9FiC++0G6rtHEvsjqowkyhOjYYxmM1NtdF3/P0sR17xTCqnso= X-Received: by 2002:a67:f696:: with SMTP id n22mr635867vso.175.1542037585988; Mon, 12 Nov 2018 07:46:25 -0800 (PST) MIME-Version: 1.0 References: <20181111062312.16342-1-newren@gmail.com> <20181111062312.16342-10-newren@gmail.com> <20181111072007.GI30850@sigill.intra.peff.net> <20181112125341.GH3956@sigill.intra.peff.net> In-Reply-To: <20181112125341.GH3956@sigill.intra.peff.net> From: Elijah Newren Date: Mon, 12 Nov 2018 07:46:14 -0800 Message-ID: Subject: Re: [PATCH 09/10] fast-export: add a --show-original-ids option to show original names To: Jeff King Cc: Git Mailing List , Lars Schneider , "brian m. carlson" , Taylor Blau , Jonathan Nieder Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Nov 12, 2018 at 4:53 AM Jeff King wrote: > On Sun, Nov 11, 2018 at 12:32:22AM -0800, Elijah Newren wrote: > > > > > Documentation/git-fast-export.txt | 7 +++++++ > > > > builtin/fast-export.c | 20 +++++++++++++++----- > > > > fast-import.c | 17 +++++++++++++++++ > > > > t/t9350-fast-export.sh | 17 +++++++++++++++++ > > > > 4 files changed, 56 insertions(+), 5 deletions(-) > > > > > > The fast-import format is documented in Documentation/git-fast-import.txt. > > > It might need an update to cover the new format. > > > > We document the format in both fast-import.c and > > Documentation/git-fast-import.txt? Maybe we should delete the long > > comments in fast-import.c so this isn't duplicated? > > Yes, that is probably worth doing (see the comment at the top of > fast-import.c). Some information might need to be migrated. > > If we're going to have just one spot, I think it needs to be the > user-facing documentation. This is a public interface that other people > are building compatible implementations for (including your new tool). Okay, I'll work on that. > OK, that matches my understanding. So why does fast-export need to print > the blob ids? If the intermediary is rewriting blobs, it can then > produce the "originally" line itself, can't it? > > The more interesting case I guess is your "strip out blobs by id" > example. There the intermediary _could_ do so itself, but it would > require recomputing the object id of each blob. > > If you use "--no-data", then this just works (we specify tree entries by > object id, rather than by mark). But I can see how it would be useful to > have the information even without "--no-data" (i.e., if you are doing > multiple kinds of rewrites on a single stream). > > I think the thing that confused me is that this "originally" is doing > two things: > > - mentioning blob ids as an optimization / convenience for the reader > > - mentioning rewritten commit (and presumably tag?) ids that were > rewritten as part of a partial history export. I suppose even trees > could be rewritten that way, too, but fast-import doesn't generally > consider trees to be a first-class item. > > So I'm OK with it, but I wonder if there is an easier way to explain it. Yeah, I started out just needing to add the original oids for commits. Once I added them there, I wondered whether someone would need them for tags and blobs too (not trees since fast-import doesn't work with those). For blobs, it made sense as a small performance optimization (when running without --no-data), as you pointed out. I can't think of a use for them in tags, but once I've included them in blobs and commits it felt like I might as well include them there for completeness. So maybe my commit message should have been something more like: """ Knowing the original names (hashes) of commits can sometimes enable post-filtering that would otherwise be difficult or impossible. In particular, the desire to rewrite commit messages which refer to other prior commits (on top of whatever other filtering is being done) is very difficult without knowing the original names of each commit. In addition, knowing the original names (hashes) of blobs can allow filtering by blob-id without requiring re-hashing the content of the blob, and is thus useful as a small optimization. Once we add original ids for both commits and blobs, we may as well add them for tags too for completeness. Perhaps someone will have a use for them. This commit teaches a new --show-original-ids option to fast-export which will make it add a 'original-oid ' line to blob, commits, and tags. It also teaches fast-import to parse (and ignore) such lines. """ ?