From: nicolas.mailhot@laposte.net
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [RFE] Add minimal universal release management capabilities to GIT
Date: Sat, 21 Oct 2017 15:56:51 +0200 (CEST)
Message-ID: <1760206035.56692.1508594211365.JavaMail.zimbra@laposte.net>
In-Reply-To: <CAGZ79kYRq4OugvTfb2WNdk-M5DMAZC0JpJHqC1KSeJY2eNN1=Q@mail.gmail.com>



----- Original Message -----
From: "Stefan Beller"

>> Unfortunately Git is so good more and more developers start to procrastinate on any activity that happens outside of GIT,
>> starting with cutting releases. The meme "one only needs a git commit hash" is going strong, even infecting institutions
>> like lwn and glibc (https://lwn.net/SubscriberLink/736429/e5a8c8888cc85cc8/)

> For release you would want to include more than just "the code" into
> the hash, such as compiler versions, environment variables, the phase
> of the moon, what have you, that may impact the release build.

Yes and no. Yes, because you do want to limit failure cases; no, because it's very easy to overspecify and block code reuse. Anyway, I don't see a strong consensus yet on how to capture those things: they are very language-specific. The first step is being able to identify the other code you depend on, which requires some sort of release id, and that is what my message was about. You can't build a compatibility matrix before you can name the dimensions of the matrix.

> It sounds to me as if you assume that if X, Y, Z were numbers (or
> rather had some order), this can be easily deduced.

It's a lot easier to use "option foo was introduced in version 2.3.4 and takes Y parameters" than "option foo was introduced in commit hash #############################################, you have version hash $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$", good luck.

> The output of git-describe ought to be sufficient for an ordering
> scheme to rely on?

That relies on git access to the repo of every bit of code your computer runs, which is not practical past the deployment phase. For deployment, the ordering needs to be extracted from the git data so that you only need to manipulate short ids that are friendly to both humans and tools. You need low coupling, not the strong coupling of git repo access.

>> — hashes are not ranked. You can not guess, looking at a hash, if it corresponds to a project stability point, or is in a
>> middle of a refactoring sequence, where things are expected to break. Evaluating every hash of every project you use
>> quickly becomes prohibitive, with the only possible strategy being to just use the latest commit at a given time and pray
>> (and if you are lucky never never update afterwards unless you have lots of fixing and testing time to waste).

> That is up to the hash function. One could imagine a hash function
> that generates bit patterns that you can use to obtain an order from.

No, that is not up to the hash function. First, because hashes are too long to be manipulated by humans; second, because no hash will ever capture human intent. You need an explicit human action to mark "I want others to use this particular state of my project because I am confident it is solid".

>> – commit mixing is broken by design.

> In Git terms a repository is the whole universe.
> If you want relationships between different projects, you need to
> include these projects e.g. via subtree or submodules.
> It scales even up to linux distributions (e.g.
> https://github.com/gittup/gittup, includes nethack!)

This is still the pre-deployment phase. And I wouldn't qualify this as a "full linux distro"; it's very small scale. If anything, it demonstrates that even on a smallish perimeter, relying on git alone as it stands today is too hard (3 updates in the whole of 2017!).

>> One can not adapt the user of a piece of code to changes in this piece of code before those changes are committed in the
>> first place. There will always be moments where the latest commit of a project is incompatible with the latest commit of
>> downstream users of this project. It is not a problem in developer environments and automated testers, where you want things
>> to break early and be fixed early. It is a huge problem when you follow the same early commit push strategy for actual
>> production code, where failures are not just a red light in a build farm dashboard, but have real-world consequences. And
>> the more interlinked git repositories you pile on one another, the higher the probability is two commits won't work with
>> one another with failures cascading down

> That is software engineering in general, I am not sure how Git relates
> to this? Any change that you make (with or without utilizing Git) can
> break the downstream world.

It's a lot easier to manage when you have discrete release synchronisation points and not just a flow of commits.

>> – commits are a bad inter-project synchronisation point. There are too many of them, they are not ranked, everyone is
>> choosing a different commit to deploy, that effectively kills the network effects that helped making traditional releases
>> solid (because distributors used the same release state, and could share feedback and audit results).

> There are different strategies. Relevant open source projects (kernel,
> glibc, git) are pretty good at not breaking the downstream users with
> every commit.

Just pick any random kernel commit during merge windows, try to build/run it and we'll talk again;)
What those projects are pretty good at is a clear release strategy that helps their users identify good project states which are safe to run.

Except that the releasing happens outside git, and it's still fairly manual. All I'm proposing is to integrate the basic functions into git, to simplify the life of those projects and to help smaller projects that want completely integrated git workflows.

> If you want faster velocity, you have to couple the projects more
> (submodules or a large repo including everything)

Just try to do it. You'll get slower velocity because of the difficulty inherent in managing a large number of projects with strong git coupling. And if you tell me "I'll just not update everything in parallel all the time", you've just reinvented releasing without the help of explicit release states.

> I am not convinced, yet. As said initially the release handling needs
> to take more things into account (compiler version, hardware version
> of the fleet, etc) which is usually not tracked in Git. Well you
> could, but that is the job of the release management tool, no?

Yes, and it is so much fun to herd hundreds of management tools with different conventions and quirks. About as much fun as managing dozens of SCMs before most projects settled on git. All commonalities need to migrate into the common git layer to simplify management, and the release id is the first of those. Besides, the first thing those tools want is a way to identify the states to use; they'll be the first consumers of release integration in git.

>> 1. "release versions" are first class objects that can be attached to a commit (not just freestyle tags that look like
>> versions, but may be something else entirely). Tools can identify release IDs reliably.

> git tags ?

Too loosely defined to be relied on by project-agnostic tools. That's why most tools won't ever try to use them. Anything you define around tags as they stand is unlikely to work on someone else's project.
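
To illustrate the kind of guesswork this forces on project-agnostic tools, here is a minimal Python sketch of a tag-name heuristic. The prefixes it accepts (`v`, `release-`, `rel_`) and the example tag names are hypothetical illustrations, not conventions taken from any particular project, and nothing here is part of git itself.

```python
import re

# Hypothetical prefixes and separators; every project picks its own, which is
# exactly why a heuristic like this cannot be relied on across projects.
_GUESS = re.compile(r"^(?:v|release-|rel_)?([0-9]+(?:[._][0-9]+)+)$", re.IGNORECASE)


def guess_release_ids(tag_names):
    """Map each tag name that looks like a version to a dotted numeric id.

    Tags that do not fit the guessed pattern are silently skipped, so the
    result may miss real releases or include tags that are not releases.
    """
    guesses = {}
    for name in tag_names:
        match = _GUESS.match(name)
        if match:
            guesses[name] = match.group(1).replace("_", ".")
    return guesses
```

A tool built this way works only until it meets a project whose tags follow a different pattern, which is the argument for a shared convention at the git level.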

>> 2. "release versions" have strong format constraints, that allow humans and tools to deduce their ordering without needing
>> access to something else (full git history or project-specific conventions). The usual string of numbers separated by dots
>> is probably simple and universal enough (if you start to allow letters people will try to use clever schemes like alpha or
>> roman numerals, that break automation). There needs to be at least two numbers in the string to allow tracking patchlevels.

> git tags are pretty open ended in their naming. the strictness would
> need to be enforced by the given requirement of the environment.

You've just lost. You can't build any complex system without some level of shared conventions. If you limit the conventions to the project level, you limit what you can build above the project level, starting with tooling.

> (Some
> want to have just one integer number going up; others want patch
> levels, i.e. 4 ints;

That's why I don't propose to set any constraint on the number of levels, except for a minimum (2, because in practice every project will need a point release at some time).

This is sufficient for automation, and pretty much half of what linux distros do to manage complex multi-project systems (convert loosely under-specified versioning to chains of numbers that deb or rpm can understand). And distributions manage to do that because that's already pretty much the release id convention everyone uses, with minor variations.

> yet others want dates?)

That never worked so well: half the time you miss the date because of delays, and by then it's too late to change the naming everyone expects. But anyway, nothing prevents you from using 2017.10.21.0 as a release id; the proposed scheme allows this.
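
The format discussed above (numbers separated by dots, at least two of them, no letters) can be validated and ordered without any access to git history. The following Python sketch is one possible reading of that rule, not an existing git feature; the function names are hypothetical, and treating a shorter id as older than a longer one with the same leading numbers (2.3 before 2.3.0) is an assumption of this sketch.

```python
import re

# Assumed format: two or more non-negative integers separated by dots.
_RELEASE_ID = re.compile(r"[0-9]+(?:\.[0-9]+)+")


def parse_release_id(text):
    """Return the release id as a tuple of ints, or raise ValueError."""
    if not _RELEASE_ID.fullmatch(text):
        raise ValueError(f"not a release id: {text!r}")
    return tuple(int(part) for part in text.split("."))


def sort_release_ids(ids):
    """Order release ids from oldest to newest by numeric comparison.

    Tuples compare element by element, so 2.10.0 sorts after 2.3.4,
    which a plain string sort would get wrong.
    """
    return sorted(ids, key=parse_release_id)
```

A date-based id such as 2017.10.21.0 passes the same check and is ordered by the same comparison, so no special case is needed for it.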

>> 3. several such objects can be attached to a commit (a project may wish to promote a minor release to major one after it
>> passes QA, versioning history should not be lost).

> Multiple git tags can be attached to the same commit. You can even tag
> a tag or tag a blob.

Again, the problem with tags is that they can be anything: you can't rely on a tag being a release id, you can't rely on a tag having ordering constraints, and you can't build any tooling around them above the project level.

>> 4. absent human intervention the release state of a repo is initialised at 0.0, for its first commit (tools can rely on at
>> least one release existing in a repo).

> An initial repo doesn't have tags, which comes close to 0.

And it's not defined anywhere, so some will insist history starts at 1 or at -52 BC or whatever. An explicit convention enforced by tooling, which other tools can rely on, trumps an implicit convention that can be argued to death all the time.
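
As a small illustration of why a defined starting point helps tooling, the sketch below returns the highest release of a repo and falls back to 0.0 when nothing has been released yet. The function name is hypothetical and the input is assumed to be already-validated dotted numeric ids; it is a sketch of the convention, not of anything git provides.

```python
def current_release(release_ids):
    """Return the highest release id as a tuple of ints.

    With the assumed convention that every repo starts at 0.0, callers
    never have to handle a "no release at all" case.
    """
    parsed = [tuple(int(part) for part in rid.split(".")) for rid in release_ids]
    return max(parsed, default=(0, 0))
```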

>> 5. a command, such as "git release", allows a human with control of the repo to set an explicit release version to a commit.

> This sounds fairly specific to an environment that you are in, maybe
> write git-release for your environment and then open source it. The
> world will love it (assuming they have the same environment and
> needs).

If you take the time to look at it, it is not specific, it is generic.

But anyway, yet another project bubble presents no value. The value of conventions is that they are shared, not that they are better than the neighbour's. I'll applaud anything done at git level because all the other tools and humans rely on this level. I'm sick of looking at conversion heuristics between higher-level tools, because they can't rely on scm-level conventions.

>> 9. a command, such as "git release cut", 
> git-archive comes to mind, doing a subset here.

It is not complex to do. The value is not in its complexity; the value is in setting conventions others can rely on.

>> 11. when no releasing has been done in a repo for some time (I'd suggest 3 months to balance freshness with churn, it can
>> be user-overridable in repo config), git reminds its human masters at the next commit events they should think about
>> stabilizing state and cutting a release.

> This is all process specific to your environment. Consider e.g. the
> C++ standard committee tracking the C++ Standard in Git.
> https://isocpp.org/std/the-committee
> They make a release every 10 years or such, so 3 months is off!

Actually, you'll find out that they do intermediary pre-standard releases way more often. 3 months is the average that works for most projects. I don't propose to set it in stone, just as a sane default.
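
The reminder rule with an overridable default could look roughly like the Python sketch below. The 90-day value is only an approximation of "3 months", and the function and constant names are hypothetical; how git would store the last release date or expose the override in repo config is not specified in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed default: roughly three months, expressed in days.
DEFAULT_MAX_AGE = timedelta(days=90)


def release_is_overdue(last_release, now, max_age=DEFAULT_MAX_AGE):
    """Return True when the newest release is older than max_age.

    A project with a slower cadence would pass a larger max_age, which
    stands in for the per-repo override mentioned in the proposal.
    """
    return now - last_release > max_age
```

For example, a repo last released on 2017-07-01 would trigger the reminder at a commit made on 2017-10-21 with the default, but not if the project sets its threshold to several years.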


> Integrating with CI and release is definitely important, but Git
> itself has no idea about the requirements and environments of the
> project specifics,

The proposal is not just about CI. The software life does not end when a dev pushes code to CI. You need to identify software during its whole lifecycle, and the id needs to start in the scm, because that's where the lifecycle starts.

Right now the only shared id git offers that does not depend on the project environment is the commit hash, and it is terrible in the post-dev stages of the lifecycle.

Regards,

-- 
Nicolas Mailhot
