From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: git@vger.kernel.org
Subject: [Summit topic] Crazy (and not so crazy) ideas
Date: Thu, 21 Oct 2021 13:55:21 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.2110211144490.56@tvgsbejvaqbjf.bet>
In-Reply-To: <nycvar.QRO.7.76.6.2110211129130.56@tvgsbejvaqbjf.bet>
This session was led by Elijah Newren. Supporting cast: Johannes "Dscho"
Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson,
Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey,
Taylor Blau, and Philip Oakley.
Notes:
* sent my idea for rebase merges on-list
* Test suite is slow. Shell scripts and process forking.
* What if we had a special shell that interpreted the commands in a
single process?
* Even Git commands like rev-parse and hash-object, as long as that’s
not the command you’re trying to test
* Dscho wants to slip in a C-based solution
* Jonathan Tan commented: going back to your custom shell for tests
idea, one thing we could do is have a custom command that generates
the repo commits that we want (and that saves process spawns and
might make the tests simpler too)
* We could replace several “setup repo” steps with “git fast-import”
instead.
* Dscho measured: 0.5 sec - 30 sec in setup steps. Can use fast-import,
or can make a new format that helps us set up the test scenario
* Elijah: test-lib-functions helpers could be built-ins
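A rough sketch of the fast-import idea above (not code from the session): a small Python helper can emit a whole linear test history as one stream, which a single "git fast-import" process then turns into commits. The helper name, the default identity, the ref name and the timestamps are all made up for illustration.

```python
def fast_import_stream(commits, ref="refs/heads/main",
                       ident="A U Thor <author@example.com>"):
    """Build a git fast-import stream for a linear history.

    commits: list of (message, {path: content}) tuples, oldest first.
    Returns the stream as bytes.
    """
    out = []
    for n, (message, files) in enumerate(commits, start=1):
        msg = message.encode()
        out.append(f"commit {ref}\n".encode())
        out.append(f"mark :{n}\n".encode())
        # Fixed, increasing timestamps keep the object IDs deterministic.
        out.append(f"committer {ident} {1112911993 + n} +0000\n".encode())
        out.append(b"data %d\n" % len(msg) + msg + b"\n")
        if n > 1:
            # Chain each commit onto the previous one via its mark.
            out.append(f"from :{n - 1}\n".encode())
        for path, content in sorted(files.items()):
            blob = content.encode()
            out.append(f"M 100644 inline {path}\n".encode())
            out.append(b"data %d\n" % len(blob) + blob + b"\n")
        out.append(b"\n")
    return b"".join(out)
```

Writing the returned bytes to the standard input of "git fast-import" inside a freshly initialized repository should create all the commits with one process spawn, instead of several spawns per commit in a shell-based setup step.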
* Biggest idea: there are a lot of people who version control things via
tarballs or .zip files per version. This prevents history from
compressing well. Some people check those compressed files into Git
for purposes of history.
* In particular, .jar files or npm packages. Initial testing showed
that you can expand .jar files in a way that creates source-like
files.
* Jonathan Nieder points out that “pristine-tar” exists to do something
similar: https://joeyh.name/code/pristine-tar/
* Others use “git archive” for this purpose to mixed success.
* jars and npm packages compress better if you store them in expanded
form instead of compressed form
* So many tools are used to using the end-archive, so while it’s
tempting to have the build system be responsible for this, being able
to “git add” the archive and have the right thing happen behind the
scenes would be nice for ease of use
* Goal here isn’t bit-for-bit reproducibility, just semantic
reproducibility
* What about other file formats that use zips, such as LibreOffice?
* Git Merge 2018: Designers Git-It; A unified design system workflow
did something similar, except made the tool understand the “exploded”
file view.
* Jonathan Tan mentions that smudge/clean filters can help, except this
is about tree<->blob instead of blob<->blob
* brian m. carlson mentions “git archive” output isn’t stable across
Git versions. Should we have a canonical tar format that provides
reproducibility?
* Peff: tree<->blob filters can get confusing in the
tree<->index<->worktree mapping. Possible, but requires careful
thought about the details about when each spot
* Old suggestion of a “blob-tree” type that allows storing a single
index entry that corresponds to multiple trees and blobs in the
background, possibly.
* One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to
store large binary files in a tree-structured way that can store
common regions as deltas, improve random access, parallelized
hashing. Involves a consistent way to split the file into stable
pieces, like --rsyncable uses (based on a rolling hash being zero).
* Peff: you can do that at the object model layer or at the storage
layer. The latter is less invasive.
* jrnieder: The benefits of blobtree are greater at the object model
layer --- e.g. not having to transmit chunks over the wire that you
already have. I think the main obstacle has been that the benefits
haven’t been enough to be worth the complexity. If that changes, we
can imagine bundling it with some other object format changes, e.g.
putting blob sizes in tree objects, and rolling it out as a new
object-format.
* Ævar: can we do this in a simpler manner, without deep technical
changes? (Context: was thinking about this in the context of some
$id$ questions.) Clean/smudge filters have some significant UX
drawbacks. Has experience helping users trying to commit .jar files.
Some simple advice saying “maybe you don’t want to commit this file
type, here are some ways to expand it to a committable format…” based
on patterns such as .gitignore or .gitattributes. We don’t have ways
to indicate “this repo uses Git LFS, but you don’t have the plugin.”
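To make the "split the file into stable pieces" idea from the bup/--rsyncable point above concrete, here is a minimal Python sketch of content-defined chunking. It uses a plain rolling sum rather than any particular tool's hash, and the window size, mask and function names are assumptions chosen for illustration, not what bup or gzip actually use.

```python
WINDOW = 48   # bytes covered by the rolling sum (illustrative value)
MASK = 0x3F   # cut when the low 6 bits are zero, so roughly every 64 bytes


def split_chunks(data: bytes) -> list:
    """Split data at content-defined boundaries.

    A boundary is placed after byte i when the sum of the last WINDOW
    bytes has its low bits all zero. The decision depends only on those
    WINDOW bytes, so an edit early in the file leaves the boundaries in
    the unchanged remainder where they were relative to the content.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if i + 1 >= WINDOW and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_chunks(old: bytes, new: bytes) -> set:
    """Chunks present in both versions, i.e. pieces that need no re-storing."""
    return set(split_chunks(old)) & set(split_chunks(new))
```

Two versions of a large file that differ only near the beginning end up sharing most of their chunks, which is the property a tree-structured blob representation would rely on for deduplication and for not re-sending pieces the other side already has.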
* Emily: If I could rewrite the commit object format, I would change some
things
* Allow multiple authors
* Add a layer of indirection to author name
* brian has thought about this too: replace name with email address
+ some ssh key or something and use something mailmap-like to map
it. Could be a backward-compatible approach
* CB has been thinking about these problems in the background. Could
randomly generate an identifier when you commit your first patch, an
@example.com address to avoid conflicting with any real address.
Mailmap can be a blob maintained by the project
* In the process can get first-class multiple authors
* If I have this id representing this particular pair of authors,
can update what the id points to
* Cool stuff but gets complicated
* Just getting mailmap applied to trailers in “git log” would be huge
* CB: main reason I don’t put myself in mailmap is that it’s not
worth bothering without that feature
* Ævar: “git log --author” would want the mapping, too. (and ‘git
shortlog --group’) Do we do this only at the presentation layer or
if we do it at a lower layer do we get such things for free?
* If anyone’s interested, I might know where the dragons are hiding,
happy to give advice
* Peff: “git shortlog” already knows how to parse it out so this
seems very possible
* Taylor:
https://lore.kernel.org/git/YW8A5FznqLYs7MqH@coredump.intra.peff.net/T/
* Generation number was discussed ~2011(?)
* Ævar: does this really need a format change? Two “author” fields
would break things, but could have “author” and “x-author” header
* General principle when changing formats: teasing apart where it’s
possible to achieve what you want backward-compatibly
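For the "mailmap applied to trailers" wish above, a presentation-layer version could look roughly like the following Python sketch. It is not how Git implements mailmap: it only understands a simplified subset of the mailmap syntax, matches on the commit email alone (ignoring any commit name), and treats every "Key: Name <email>" line as a trailer. The function names are hypothetical.

```python
import re

IDENT = re.compile(r"(?P<name>[^<>]*?)\s*<(?P<email>[^<>]+)>")
TRAILER = re.compile(
    r"^(?P<key>[A-Za-z-]+): (?P<name>[^<>]*?)\s*<(?P<email>[^<>]+)>$")


def parse_mailmap(text):
    """Map lower-cased commit email -> (proper name, proper email)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        idents = [(m["name"], m["email"]) for m in IDENT.finditer(line)]
        if len(idents) == 2:
            # "Proper Name <proper@email> [Commit Name] <commit@email>"
            (name, email), (_, old_email) = idents
            mapping[old_email.lower()] = (name, email)
        elif len(idents) == 1:
            # "Proper Name <commit@email>": only the name is corrected
            name, email = idents[0]
            mapping[email.lower()] = (name, email)
    return mapping


def map_trailers(message, mapping):
    """Rewrite identity trailers in a commit message using the mapping."""
    out = []
    for line in message.splitlines():
        m = TRAILER.match(line)
        if m and m["email"].lower() in mapping:
            name, email = mapping[m["email"].lower()]
            # An empty mapped name means "keep the name from the trailer".
            line = f"{m['key']}: {name or m['name']} <{email}>"
        out.append(line)
    return "\n".join(out)
```

Doing the mapping at display time like this leaves the stored commit untouched, which is the backward-compatible end of the spectrum discussed here; whether "git log --author" and similar options would then see the mapped identity is exactly the layering question Ævar raises.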
* Philip Oakley would like a commit id referring to an unborn branch as a
proper id
* brian: empty tree works for what you’re talking about when you want a
diff
* Philip: motivating example was “first parent is going nowhere, but
you have a second parent”
* jrnieder: I see, you want the --first-parent history of your
published branch to match the reflog. As a workaround, you’re able to
use an empty initial commit and use --no-ff merges whenever you pull
things in, but you’re referring to wishing you didn’t have to make
that empty initial commit
* Ævar: reminds me of the discussion in
https://www.fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki of
commit/branch relationships
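On brian's point that the empty tree works when you want a diff: the empty tree has a well-known object ID that can be derived from the object format itself. The Python sketch below assumes a SHA-1 repository; the helper name is made up.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 object ID: hash of '<type> <size>\\0' followed by the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

In a SHA-1 repository, "git diff 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD" then shows everything in HEAD as added, without needing an empty initial commit to diff against.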
2021-10-21 11:55 Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin