git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: git@vger.kernel.org
Subject: [Summit topic] Crazy (and not so crazy) ideas
Date: Thu, 21 Oct 2021 13:55:21 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2110211144490.56@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.2110211129130.56@tvgsbejvaqbjf.bet>

[-- Attachment #1: Type: text/plain, Size: 7494 bytes --]

This session was led by Elijah Newren. Supporting cast: Johannes "Dscho"
Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson,
Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey,
Taylor Blau, and Philip Oakley.

Notes:

* sent my idea for rebase merges on-list

* Test suite is slow. Shell scripts and process forking.

   * What if we had a special shell that interpreted the commands in a
     single process?

   * Even Git commands like rev-parse and hash-object, as long as that’s
     not the command you’re trying to test

   * Dscho wants to slip in a C-based solution

   * Jonathan tan commented: going back to your custom shell for tests
     idea, one thing we could do is have a custom command that generates
     the repo commits that we want (and that saves process spawns and
     might make the tests simpler too)

      * We could replace several “setup repo” steps with “git fast-import”
        instead.

   * Dscho measured: 0.5 sec - 30 sec in setup steps. Can use fast-import,
     or can make a new format that helps us set up the test scenario

   * Elijah: test-lib-functions helpers could be built ins

* Biggest idea: there are a lot of people who version control things via
  tarballs or .zip files per version. This prevents history from
  compressing well. Some people check in those compressed files into Git
  for purposes of history.

   * In particular, .jar files or npm packages. Initial testing showed
     that you can expand .jar files in a way that creates source-like
     files.

   * Jonathan Nieder points out that “pristine-tar” exists to do similar
     ideas: https://joeyh.name/code/pristine-tar/

   * Others use “git archive” for this purpose to mixed success.

   * jars and npm packages compress better if you store them in expanded
     form instead of compressed form

   * So many tools are used to using the end-archive, so while it’s
     tempting to have the build system be responsible for this, being able
     to “git add” the archive and have the right thing happen behind the
     scenes would be nice for ease of use

   * Goal here isn’t bit-for-bit reproducibility, just semantic
     reproducibility

   * What about other file formats that use zips, such as LibreOffice?

   * Git Merge 2018: Designers Git-It; A unified design system workflow
     did something similar, except made the tool understand the “exploded”
     file view.

   * Jonathan Tan mentions that smudge/clean filters can help, except this
     is about tree<->blob instead of blob<->blob

   * brian m. carlson mentions “git archive” output isn’t stable across
     Git versions. Should we have a canonical tar format that provides
     reproducibility?

   * Peff: tree<->blob filters can get confusing in the
     tree<->index<->worktree mapping. Possible, but requires careful
     thought about the details about when each spot

   * Old suggestion of a “blob-tree” type that allows storing a single
     index entry that corresponds to multiple trees and blobs in the
     background, possibly.

   * One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to
     store large binary files in a tree-structured way that can store
     common regions as deltas, improve random access, parallelized
     hashing. Involves a consistent way to split the file into stable
     pieces, like --rsyncable uses (based on a rolling hash being zero).

   * Peff: you can do that at the object model layer or at the storage
     layer. The latter is less invasive.

   * jrnieder: The benefits of blobtree are greater at the object model
     layer --- e.g. not having to transmit chunks over the wire that you
     already have. I think the main obstacle has been that the benefits
     haven’t been enough to be worth the complexity. If that changes, we
     can imagine bundling it with some other object format changes, e.g.
     putting blob sizes in tree objects, and rolling it out as a new
     object-format.

   * Ævar: can we do this in a simpler manner, without deep technical
     changes? (Context: was thinking about this in the context of some
     $id$ questions.) Clean/smudge filters have some significant UX
     drawbacks. Has experience helping users trying to commit .jar files.
     Some simple advice saying “maybe you don’t want to commit this file
     type, here are some ways to expand it to a committable format…” based
     on patterns such as .gitignore or .gitattributes. We don’t have ways
     to indicate “this repo uses Git LFS, but you don’t have the plugin.”

* Emily: If I could rewrite the commit object format, I would change some
  things

   * Allow multiple authors

   * Add a layer of indirection to author name

      * brian has thought about this too: replace name with email address
        + some ssh key or something and use something mailmap-like to map
        it. Could be a backward-compatible approach

   * CB has been thinking about these problems in the background. Could
     randomly generate an identifier when you commit your first patch, an
     @example.com address to avoid conflicting with any real address.
     Mailmap can be a blob maintained by the project

      * In the process can get first-class multiple authors

      * If I have this id representing this particular pair of authors,
        can update what the id points to

      * Cool stuff but gets complicated

   * Just getting mailmap applied to trailers in “git log” would be huge

      * CB: main reason I don’t put myself in mailmap is that it’s not
        worth bothering without that feature

      * Ævar: “git log --author” would want the mapping, too. (and ‘git
        shortlog --group’) Do we do this only at the presentation layer or
        if we do it at a lower layer do we get such things for free?

      * If anyone’s interested, I might know where the dragons are hiding,
        happy to give advice

      * Peff: “git shortlog” already knows how to parse it out so this
        seems very possible

      * Taylor:
        https://lore.kernel.org/git/YW8A5FznqLYs7MqH@coredump.intra.peff.net/T/

   * Generation number was discussed ~2011(?)

   * Ævar: does this really need a format change? Two “author” fields
     would break things, but could have “author” and “x-author” header

   * General principle when changing formats: teasing apart where it’s
     possible to achieve what you want backward compatibility

* Philip Oakley would like a commit id referring to an unborn branch as a
  proper id

   * brian: empty tree works for what you’re talking about when you want a
     diff

   * Philip: motivating example was “first parent is going nowhere, but
     you have a second parent”

   * jrnieder: I see, you want the --first-parent history of your
     published branch to match the reflog. As a workaround, you’re able to
     use an empty initial commit and use --no-ff merges whenever you pull
     things in, but you’re referring to wishing you didn’t have to make
     that empty initial commit

   * Ævar: reminds me of the discussion in
     https://www.fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki of
     commit/branch relationships

  reply	other threads:[~2021-10-21 11:55 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21 11:55 Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin
2021-10-21 11:55 ` Johannes Schindelin [this message]
2021-10-21 12:30   ` [Summit topic] Crazy (and not so crazy) ideas Son Luong Ngoc
2021-10-26 20:14   ` scripting speedups [was: [Summit topic] Crazy (and not so crazy) ideas] Eric Wong
2021-10-30 19:58     ` Ævar Arnfjörð Bjarmason
2021-11-03  9:24       ` test suite speedups via some not-so-crazy ideas (was: scripting speedups[...]) Ævar Arnfjörð Bjarmason
2021-11-03 22:12         ` test suite speedups via some not-so-crazy ideas Junio C Hamano
2021-11-02 13:52     ` scripting speedups [was: [Summit topic] Crazy (and not so crazy) ideas] Johannes Schindelin
2021-10-21 11:55 ` [Summit topic] SHA-256 Updates Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] Server-side merge/rebase: needs and wants? Johannes Schindelin
2021-10-22  3:06   ` Bagas Sanjaya
2021-10-22 10:01     ` Johannes Schindelin
2021-10-23 20:52       ` Ævar Arnfjörð Bjarmason
2021-11-08 18:21   ` Taylor Blau
2021-11-09  2:15     ` Ævar Arnfjörð Bjarmason
2021-11-30 10:06       ` Christian Couder
2021-10-21 11:56 ` [Summit topic] Submodules and how to make them worth using Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] Sparse checkout behavior and plans Johannes Schindelin
2021-10-21 11:56 ` [Summit topic] The state of getting a reftable backend working in git.git Johannes Schindelin
2021-10-25 19:00   ` Han-Wen Nienhuys
2021-10-25 22:09     ` Ævar Arnfjörð Bjarmason
2021-10-26  8:12       ` Han-Wen Nienhuys
2021-10-28 14:17         ` Philip Oakley
2021-10-26 15:51       ` Philip Oakley
2021-10-21 11:56 ` [Summit topic] Documentation (translations, FAQ updates, new user-focused, general improvements, etc.) Johannes Schindelin
2021-10-22 14:20   ` Jean-Noël Avila
2021-10-22 14:31     ` Ævar Arnfjörð Bjarmason
2021-10-27  7:02       ` Jean-Noël Avila
2021-10-27  8:50       ` Jeff King
2021-10-21 11:56 ` [Summit topic] Increasing diversity & inclusion (transition to `main`, etc) Johannes Schindelin
2021-10-21 12:55   ` Son Luong Ngoc
2021-10-22 10:02     ` vale check, was " Johannes Schindelin
2021-10-22 10:03       ` Johannes Schindelin
2021-10-21 11:57 ` [Summit topic] Improving Git UX Johannes Schindelin
2021-10-21 16:45   ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Ævar Arnfjörð Bjarmason
2021-10-21 23:03     ` changing the experimental 'git switch' Junio C Hamano
2021-10-22  3:33     ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Bagas Sanjaya
2021-10-22 14:04     ` martin
2021-10-22 14:24       ` Ævar Arnfjörð Bjarmason
2021-10-22 15:30         ` martin
2021-10-23  8:27           ` changing the experimental 'git switch' Sergey Organov
2021-10-22 21:54         ` Sergey Organov
2021-10-24  6:54       ` changing the experimental 'git switch' (was: [Summit topic] Improving Git UX) Martin
2021-10-24 20:27         ` changing the experimental 'git switch' Junio C Hamano
2021-10-25 12:48           ` Ævar Arnfjörð Bjarmason
2021-10-25 17:06             ` Junio C Hamano
2021-10-25 16:44     ` Sergey Organov
2021-10-25 22:23       ` Ævar Arnfjörð Bjarmason
2021-10-27 18:54         ` Sergey Organov
2021-10-21 11:57 ` [Summit topic] Improving reviewer quality of life (patchwork, subsystem lists?, etc) Johannes Schindelin
2021-10-21 13:41   ` Konstantin Ryabitsev
2021-10-22 22:06     ` Ævar Arnfjörð Bjarmason
2021-10-22  8:02 ` Missing notes, was Re: Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin
2021-10-22  8:22   ` Johannes Schindelin
2021-10-22  8:30     ` Johannes Schindelin
2021-10-22  9:07       ` Johannes Schindelin
2021-10-22  9:44 ` Let's have public Git chalk talks, " Johannes Schindelin
2021-10-25 12:58   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2110211144490.56@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).