From: Elijah Newren
Date: Wed, 17 Jun 2020 16:07:01 -0700
Subject: Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
To: Derrick Stolee
Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, Git Mailing List,
    newren@gmaill.com, Jeff King, Taylor Blau, Jonathan Nieder,
    Derrick Stolee, Son Luong Ngoc
References: <2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@gmail.com>
    <6d354901-9361-d8d1-539d-3b6c3edb2d9f@gmail.com>
X-Mailing-List: git@vger.kernel.org

On Wed, May 20, 2020 at 10:52 AM Elijah Newren wrote:
>
> On Fri, May 8, 2020 at 8:42 AM Derrick Stolee wrote:
> >
> > On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > > "Derrick Stolee via GitGitGadget" writes:
> > >
> > >> One of the difficulties of using the sparse-checkout feature is not
> > >> knowing which directories are absolutely needed for working in a portion
> > >> of the repository. Some of this can be documented in README files or
> > >> included in a bootstrapping tool along with the repository. This is done
> > >> in an ad-hoc way by every project that wants to use it.
> > >>
> > >> Let's make this process easier for users by creating a way to define a
> > >> useful sparse-checkout definition inside the Git tree data. This has
> > >> several benefits. In particular, the data is available to anyone who has
> > >> a copy of the repository without needing a different data source.
> > >> Second, the needs of the repository can change over time and Git can
> > >> present a way to automatically update the working directory as these
> > >> sparse-checkout definitions change over time.
> > >
> > > And two lines of development can merge them together?
> > >
> > > Any time a new "feature" pops up that would eventually affect how
> > > "git clone" and "git checkout" work based on untrusted user data, we
> > > need to make sure there is no negative security implications.
> > >
> > > If it only boils down to "we have files that can record list of
> > > leading directory names and without offering extra 'flexibility'", I
> > > guess there aren't all that much that a malicious sparse definition
> > > can do and we would be safe, though.
> >
> > Yes. I hope that we can be extremely careful with this feature.
> > The RFC status of this series implicitly includes the question
> > "Should we do this at all?" I think the benefits outweigh the
> > risks, but we can minimize those risks with very careful design
> > and implementation.
> >
> > >> To use this feature, add the "--in-tree" option when setting or adding
> > >> directories to the sparse-checkout definition. For example:
> > >>
> > >>     $ git sparse-checkout set --in-tree .sparse/base
> > >>     $ git sparse-checkout add --in-tree .sparse/extra
> > >>
> > >> These commands add values to the multi-valued config setting
> > >> "sparse.inTree". When updating the sparse-checkout definition, these
> > >> values describe paths in the repository to find the sparse-checkout
> > >> data. After the commands listed earlier, we expect to see the following
> > >> in .git/config.worktree:
> > >>
> > >>     [sparse]
> > >>         intree = .sparse/base
> > >>         intree = .sparse/extra
> > >
> > > What does this say in human words? "These two tracked files specify
> > > which paths should be in the working tree"? Spelling it out here
> > > would help readers of this commit.
> >
> > You got it. Sounds good.
> >
> > >> When applying the sparse-checkout definitions from this config, the
> > >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> > >
> > > OK, so end-user edit to the working tree copy or what is added to
> > > the index does not count and only the committed version gets used.
> > >
> > > That makes it simple---I was wondering how we would operate when
> > > merging a branch with different contents in the .sparse/* files
> > > until the conflicts are resolved.
> >
> > It's worth testing this case so we can be sure what happens.
>
> During a merge or rebase or checkout -m, what happens if .sparse/extra
> has the following working tree content:
>
> [sparse]
>     dir = D
>     dir = X
> <<<<<< HEAD
>     dir = Y
> |||||| MERGE_BASE
> ======
>     inherit = .sparse/tools
> >>>>>> MERGE_HEAD
>     inherit = .sparse/base
>
> and, of course, three different entries in the index?
>
> Also, do we use the version of the --in-tree file from the latest
> commit, from the index, or from the working tree? (This is a question
> not only for merge and rebase, but also checkout with dirty changes
> and even checkout -m.) Which one "wins"?
>
> And what if the user updates and commits an ill-formed version of the
> file -- is it equivalent to getting an empty cone with just the
> toplevel directory, equivalent to getting a complete checkout of
> everything, or something else?

Son pointed out that Mercurial has a 'sparse' extension with some ideas
we could borrow here; see
https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
for further discussion.
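
To make the ill-formed-file question concrete, here is a rough Python
sketch (not what the series implements) of one way a definition blob
could be read. It assumes the file format is what the example quoted
above suggests: a [sparse] section with "dir" and "inherit" keys. The
names parse_definition, resolve, IllFormedDefinition and the read_blob
callback are all hypothetical. The sketch refuses blobs that contain
conflict markers or unknown keys and leaves the policy decision (empty
cone, full checkout, or keeping the previous definition) to the caller.

```python
import re

# Matches leftover merge conflict markers. The example in this thread uses
# six characters; git itself writes seven by default, so accept both.
CONFLICT_MARKER = re.compile(r"^(<{6,7}|\|{6,7}|={6,7}|>{6,7})(\s|$)")


class IllFormedDefinition(ValueError):
    """Raised when a definition blob cannot be trusted as written."""


def parse_definition(text):
    """Parse one definition blob into (dirs, inherits).

    Assumed format (taken from the example in this thread, not from git):
        [sparse]
            dir = <directory>
            inherit = <path of another definition file>
    """
    dirs, inherits = [], []
    in_sparse = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if CONFLICT_MARKER.match(line):
            raise IllFormedDefinition(f"line {lineno}: unresolved conflict marker")
        if line.startswith("["):
            # Only the [sparse] section is interpreted; others are skipped.
            in_sparse = line.lower() == "[sparse]"
            continue
        if not in_sparse:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep or not value:
            raise IllFormedDefinition(f"line {lineno}: expected 'key = value'")
        if key == "dir":
            dirs.append(value)
        elif key == "inherit":
            inherits.append(value)
        else:
            raise IllFormedDefinition(f"line {lineno}: unknown key {key!r}")
    return dirs, inherits


def resolve(path, read_blob, seen=None):
    """Return the set of directories named by `path` and everything it inherits.

    `read_blob` is a hypothetical callback returning the text of a definition
    file; which version it reads (HEAD, index, or working tree) is exactly the
    open question in this thread. Files already visited are skipped so that
    an inherit cycle terminates.
    """
    seen = set() if seen is None else seen
    if path in seen:
        return set()
    seen.add(path)
    dirs, inherits = parse_definition(read_blob(path))
    result = set(dirs)
    for parent in inherits:
        result |= resolve(parent, read_blob, seen)
    return result
```

With something like this, a caller could catch IllFormedDefinition and
keep whatever sparse-checkout patterns were already in place, which
would be one possible answer to the "something else?" option above.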