git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Victoria Dye <vdye@github.com>
To: Derrick Stolee <derrickstolee@github.com>,
	Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Johannes.Schindelin@gmx.de, gitster@pobox.com
Subject: Re: [PATCH 2/2] scalar: convert README.md into a technical design doc
Date: Mon, 11 Jul 2022 16:05:28 -0700	[thread overview]
Message-ID: <ee9ea998-fb9d-1bf0-635a-e1627c7c1c40@github.com> (raw)
In-Reply-To: <994f2efd-0789-afad-ba0d-27da9692b289@github.com>

Derrick Stolee wrote:
> On 6/29/2022 12:58 PM, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@github.com>
>>
>> Replace 'README.md' with 'technical/scalar.txt' (still in 'contrib/'). In
>> addition to reformatting for asciidoc, elaborate on the background, purpose,
>> and design choices that went into Scalar.
>>
>> This document is intended to persist in the 'Documentation/technical/'
>> directory after Scalar has been moved into the root of Git (out of
> 
> I wonder if it is a good idea to move this into Documentation/technical/ _now_,
> so we can have this document as part of "core Git's roadmap" as much as we can
> say that. Part of the roadmap is moving the Scalar code out of contrib/ and
> finalizing how users (or distributors) can install the 'scalar' binary.
> 

I initially hesitated in doing this because I felt it was a bit premature,
but your comment prompted me to look more closely at other docs in
'Documentation/technical'. Precedent seems to indicate that the doc should
be added *before* implementation (either in the same series as the
implementation [1] or separately [2]), so I'll do the same here. Thanks for
the suggestion!

[1] https://lore.kernel.org/git/daa9a6bcefbce977606484d502f5cfd7fca63ecc.1614111270.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/20171208192636.13678-1-git@jeffhostetler.com/

> It can be helpful to include the details of what steps to take to compile and
> test the 'scalar' executable. That documentation will then be updated when
> Scalar moves out of contrib/.
> 

As part of the move out of 'contrib/', I was planning on having Scalar built
and installed the same as any built-in (albeit in 'bin/' - like 'gitk',
'git-cvsserver', etc. - rather than 'libexec/git-core'). In that case, there
won't be any special steps needed to build/install 'scalar', so any
instructions here would be temporary. I could include those instructions in
the meantime, but with Scalar incomplete, I'm not sure whether that would be
valuable.

All that said, if later reviews/iterations lead to Scalar needing special
steps to build/install, I'll definitely include them here.

>> +Scalar
>> +======
>> +
>> +Scalar is a built-in repository management tool that optimizes Git for use in
> 
> I think "built-in" is unnecessary here. It makes me think of Git builtins and
> otherwise does not add much to the description.
> 
>> +large repositories. It accomplishes this by helping users to take advantage of
>> +advanced performance features in Git. Unlike most other Git built-in commands,
>> +Scalar is not executed as a subcommand of 'git'; rather, it is built as a
>> +separate executable containing its own series of subcommands.
> 
> Good to have this distinction.
> 
>> +Scalar is a large enough project that it is being upstreamed incrementally,
>> +living in `contrib/` until it is feature-complete. So far, the following patch
>> +series have been accepted:
>> +
>> +- `scalar-the-beginning`: The initial patch series which sets up
>> +  `contrib/scalar/` and populates it with a minimal `scalar` command that
>> +  demonstrates the fundamental ideas.
>> +
>> +- `scalar-c-and-C`: The `scalar` command learns about two options that can be
>> +  specified before the command, `-c <key>=<value>` and `-C <directory>`.
>> +
>> +- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
>> +
>> +Roughly speaking (and subject to change), the following series are needed to
>> +"finish" this initial version of Scalar:
> 
> I like how you have these two sections. As you move forward in this roadmap,
> the diff to the document should look like you are _moving_ the sentence above.
> 
>> +- Finish Scalar features: Enable the built-in FSMonitor in Scalar
>> +  enlistments and implement `scalar help`. At the end of this series, Scalar
>> +  should be feature-complete from the perspective of a user.
> 
> Wow, we're so close!
> 
>> +- Generalize features not specific to Scalar: In the spirit of
>> +  making Scalar configure only what is needed for large repo performance, move
> 
> (nit: Strange wrapping on the first line)
> 
>> +  common utilities into other parts of Git. Some of this will be internal-only,
>> +  but one major change will be generalizing `scalar diagnose` for use with any
>> +  Git repository.
>> +
>> +- Move Scalar to toplevel: Make `scalar` a built-in component of Git by
> 
> Here, I would say "included" instead of built-in. Or is there another term we
> should use to avoid confusion with "git <cmd>" builtins.
> 
>> +  moving it out of `contrib/` and into the root of `git`. The actual change will
>> +  be relatively small, but this series will also contain expanded testing to
>> +  ensure Scalar is stable and performant.
> 
> Perhaps "This change will also integrate Scalar into Git's test suite and CI
> checks."
> 
> You mention "performant" which makes me think that performance tests are intended
> to be part of this change. It makes me think it would be interesting to have our
> existing performance tests create a mode where they compare a "vanilla" Git repo
> to one registered with Scalar, but otherwise runs the same tests already in the
> t/perf/ test scripts. This is a wide aside so feel free to ignore me.
> 

This is a really interesting idea! My original plan was to add some basic
tests around the operations 'scalar' should (directly or indirectly) speed
up. I think I'll still need to do that anyway (e.g., for things like 'scalar
clone' vs 'git clone'), but I'll also try to find a (repeatable) way to
compare standard repo vs. Scalar enlistment performance in the existing perf
tests.

>> +Finally, there are two additional patch series that exist in Microsoft's fork of
>> +Git, but there is no current plan to upstream them. There are some interesting
>> +ideas there, but the implementation is too specific to Azure Repos and/or VFS
>> +for Git to be of much help in general.
>> +
>> +These still exist mainly because the GVFS protocol is what Azure Repos has
>> +instead of partial clone, while Git is focused on improving partial clone:
>> +
>> +- `scalar-with-gvfs`: The primary purpose of this patch series is to support
>> +  existing Scalar users whose repositories are hosted in Azure Repos (which does
>> +  not support Git's partial clones, but supports its predecessor, the GVFS
>> +  protocol, which is used by Scalar to emulate the partial clone).
>> +
>> +  Since the GVFS protocol will never be supported by core Git, this patch series
>> +  will remain in Microsoft's fork of Git.
>> +
>> +- `run-scalar-functional-tests`: The Scalar project developed a quite
>> +  comprehensive set of integration tests (or, "Functional Tests"). They are the
>> +  sole remaining part of the original C#-based Scalar project, and this patch
>> +  adds a GitHub workflow that runs them all.
>> +
>> +  Since the tests partially depend on features that are only provided in the
>> +  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
> 
> We've had a good track record of identifying when an upstream change causes a
> failure in the Scalar Functional Tests and then porting that test into the Git test
> suite. I expect that to continue, but it has also not happened in a while. This is
> probably because the test coverage now has a significant amount of overlap. Many
> of the tests were created specifically for VFS for Git and strangeness around the
> virtual filesystem layer. Several of those have found interesting corner cases in
> Git's sparse-checkout feature, but these have probably been resolved by now.
> 
> Thanks for doing this work, Victoria. I'm excited to see how the project progresses
> with your new plan.
> 
> Thanks,
> -Stolee


  reply	other threads:[~2022-07-11 23:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 16:58 [PATCH 0/2] [RFC] scalar: prepare documentation for future work Victoria Dye via GitGitGadget
2022-06-29 16:58 ` [PATCH 1/2] scalar: reword command documentation to clarify purpose Victoria Dye via GitGitGadget
2022-06-29 16:58 ` [PATCH 2/2] scalar: convert README.md into a technical design doc Victoria Dye via GitGitGadget
2022-06-29 17:58   ` Derrick Stolee
2022-07-11 23:05     ` Victoria Dye [this message]
2022-07-15 16:52       ` Derrick Stolee
2022-06-29 17:41 ` [PATCH 0/2] [RFC] scalar: prepare documentation for future work Derrick Stolee
2022-06-29 21:50 ` Johannes Schindelin
2022-07-12  0:06 ` [PATCH v2 0/2] " Victoria Dye via GitGitGadget
2022-07-12  0:06   ` [PATCH v2 1/2] scalar: reword command documentation to clarify purpose Victoria Dye via GitGitGadget
2022-07-12  0:06   ` [PATCH v2 2/2] scalar: convert README.md into a technical design doc Victoria Dye via GitGitGadget
2022-07-15 16:53   ` [PATCH v2 0/2] scalar: prepare documentation for future work Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee9ea998-fb9d-1bf0-635a-e1627c7c1c40@github.com \
    --to=vdye@github.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).