About GIT Internals

git@vger.kernel.org mailing list mirror (one of many)

* About GIT Internals
@ 2022-05-25 16:10 Aman
  2022-05-25 16:49 ` Emily Shaffer
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Aman @ 2022-05-25 16:10 UTC (permalink / raw)
  To: git

Hello there,

I have recently been reading The Architecture for Open Source
Applications book - and read the chapters dedicated to GIT internals.
And if I am being completely honest, I didn't understand most of it.

Could someone please assist - in sharing some resources - which I
could go through, to better understand GIT software internals.

(I am a high school student, and really want to learn more about how
all the great software and hardware around us work - which so many of
us take for granted)

Regards,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: About GIT Internals
  2022-05-25 16:10 About GIT Internals Aman
@ 2022-05-25 16:49 ` Emily Shaffer
  2022-05-25 21:14 ` Erik Cervin Edin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Emily Shaffer @ 2022-05-25 16:49 UTC (permalink / raw)
  To: Aman; +Cc: Git List

On Wed, May 25, 2022 at 9:11 AM Aman <amanmatreja@gmail.com> wrote:
>
> Hello there,
>
> I have recently been reading The Architecture for Open Source
> Applications book - and read the chapters dedicated to GIT internals.
> And if I am being completely honest, I didn't understand most of it.
>
> Could someone please assist - in sharing some resources - which I
> could go through, to better understand GIT software internals.

I am really excited you asked! This puts you firmly on the road to
being the person who can help unstick all your friends when they get
into Git messes later on. ;)

https://docs.google.com/presentation/d/1IQCRPHEIX-qKo7QFxsD3V62yhyGA9_5YsYXFOiBpgkk/edit?usp=sharing
<- This is a really great intro to the internals which I love. I
pretty much always recommend it as the place to start for someone
curious about learning how Git works.
https://www.youtube.com/watch?v=5Gq3KVvcfDk <- This covers much of the
same territory but has a nice video to go through it, in case it's
easier for you to learn that way instead of reading slides.

If you have additional questions about the technical design of Git
following one or both of those presentations above, I think you could
get far starting with Git's own design documentation:
https://github.com/git/git/tree/master/Documentation/technical

From there I think the list will be the best place for specific
followup questions you might have.

Happy learning!

 - Emily


* Re: About GIT Internals
  2022-05-25 16:10 About GIT Internals Aman
  2022-05-25 16:49 ` Emily Shaffer
@ 2022-05-25 21:14 ` Erik Cervin Edin
  2022-05-25 23:34 ` git-vger
  2022-05-26 12:45 ` Konstantin Khomoutov
  3 siblings, 0 replies; 17+ messages in thread
From: Erik Cervin Edin @ 2022-05-25 21:14 UTC (permalink / raw)
  To: Aman; +Cc: git

On Wed, May 25, 2022 at 10:14 PM Aman <amanmatreja@gmail.com> wrote:
>
> And if I am being completely honest, I didn't understand most of it.

You are not alone, there are many that struggle with understanding how
git works internally.

> (I am a high school student, and really want to learn more about how
> all the great software and hardware around us work - which so many of
> us take for granted)

Perhaps not a good resource, depending on your familiarity with
computer science but
https://eagain.net/articles/git-for-computer-scientists/
is an article that is often recommended.

I think for me, the hardest part of understanding Git was the
difficulty conceptualizing it.
But at its core Git is very simple.

You can think of it as a folder of files that you can "save" (commit)
whenever you want.
Each time you "save" (commit), all files and folders are "copied" to
another folder (the local repository).
That means that if you ever want to look at a previous version of a
file, it's there.
For simplicity's sake you can think of this as being unchangeable.
Once a file is saved it's saved forever.

Just having a messy pile of every single version of a file is not useful,
so the rest of git consists of making this manageable.
For example by remembering who saved it, when and why (by making them
write a message when they save).

The main thing however is that Git orders saves.
This order is not necessarily one version after another, sorted by
when they were saved.
Instead, order is manually controlled by saving files in different
places (branches).
In its simplest form, a branch is several saves, one after another.

Because of how Git orders saves, I can work on files, save them and
give them to you.
You can keep working on those files and make your own saves.
But I don't have to wait for you to send your work back to me.
I can keep working on the same files and making my own saves.

When you're done you can put your saves in a "shared folder" (a remote
repository).
Later, when I'm done, I can get your saves, and Git can help me figure
out which parts of the files you changed that I didn't, and copy
both of our work into new files (merging).
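
As a rough illustration of that merging step, here is a minimal Python
sketch. It is not how Git actually merges (Git works on diffs and copes
with inserted and deleted lines); it assumes, for simplicity, that all
three versions have the same number of lines, and the function name
merge_lines is made up for this example.

```python
def merge_lines(base, mine, yours):
    """Line-by-line three-way merge for files with the same number of lines."""
    merged = []
    for b, m, y in zip(base, mine, yours):
        if m == y:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "yours" changed this line
            merged.append(y)
        elif y == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed it differently: a conflict
            merged.append(f"<<< {m} ||| {y} >>>")
    return merged


base = ["title", "intro", "body", "end"]            # the save we both started from
mine = ["title", "intro", "body (edited by me)", "end"]
yours = ["Title!", "intro", "body", "end"]

result = merge_lines(base, mine, yours)
print(result)
```

The common starting save (base) is what lets the merge tell "you changed
this" apart from "I changed this"; without it, any difference between the
two files would look like a conflict.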

This is a bit of an oversimplification and Git allows users to do more
advanced things but the gist is basically this.


* Re: About GIT Internals
  2022-05-25 16:10 About GIT Internals Aman
  2022-05-25 16:49 ` Emily Shaffer
  2022-05-25 21:14 ` Erik Cervin Edin
@ 2022-05-25 23:34 ` git-vger
  2022-05-26  8:47   ` Philip Oakley
  2022-05-26 12:45 ` Konstantin Khomoutov
  3 siblings, 1 reply; 17+ messages in thread
From: git-vger @ 2022-05-25 23:34 UTC (permalink / raw)
  To: Aman; +Cc: git

Hi Aman, responses inline below.

On Wed, May 25, 2022 at 09:40:42PM +0530, Aman wrote:
> Could someone please assist - in sharing some resources - which I
> could go through, to better understand GIT software internals.

There is an excellent free book at https://git-scm.com/book/en/v2 .

Chapter 10 is about git internals. It is important to realize that,
unlike many other version control systems, git works effectively on
files locally on your computer, without any server or other shared
resources to manage. Also, one good way to learn may be to form a
question that you want to answer first. "How do I ...." or "what happens
when I ....". Since git works locally, it is possible to create a git
repo, look at the files contained in the .git directory, take action
with git, and then look at the files again.
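
To give an idea of what you would find when looking inside .git, the
Python sketch below writes and reads one object in the layout Git uses for
"loose" objects under .git/objects in SHA-1 repositories. It is only an
illustration under stated assumptions: the helper names are made up, it
writes to a temporary directory rather than a real repository, and real
repositories may also keep objects in packfiles, which look different.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir, data, kind=b"blob"):
    """Store data in the loose-object layout and return its hex id."""
    # The stored bytes are "<type> <size>\0<content>"; the id is their SHA-1.
    store = kind + b" " + str(len(data)).encode() + b"\0" + data
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits name a directory, the remaining 38 the file.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_loose_object(objects_dir, oid):
    """Read an object back and return (type, size, content)."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    header, _, body = store.partition(b"\0")
    kind, size = header.split(b" ")
    return kind, int(size), body


objects_dir = os.path.join(tempfile.mkdtemp(), "objects")
oid = write_loose_object(objects_dir, b"hello world\n")
print(oid)
print(read_loose_object(objects_dir, oid))
```

In a real repository the same kind of file appears after "git add", and
"git cat-file -p <id>" prints its content.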

Many people use git from the command line. If you are not familiar with
the command line, you may be interested in learning more about it.
Mozilla, the makers of the Firefox web browser, have a wiki page to
familiarize yourself with the command line here: 
https://developer.mozilla.org/en-US/docs/Learn/Tools_and_testing/Understanding_client-side_tools/Command_line

Happy Explorations!
Eldon


* Re: About GIT Internals
  2022-05-25 23:34 ` git-vger
@ 2022-05-26  8:47   ` Philip Oakley
       [not found]     ` <CACMKQb3exv13sYN5uEP_AG-JYu1rmVj4HDxjdw8_Y-+maJPwGg@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Philip Oakley @ 2022-05-26  8:47 UTC (permalink / raw)
  To: git-vger, Aman; +Cc: git

On 26/05/2022 00:34, git-vger@eldondev.com wrote:
> Hi Aman, responses inline below.
>
> On Wed, May 25, 2022 at 09:40:42PM +0530, Aman wrote:
>> Could someone please assist - in sharing some resources - which I
>> could go through, to better understand GIT software internals.
> There is an excellent free book at https://git-scm.com/book/en/v2 .
>
> Chapter 10 is about git internals. It is important to realize that,
> unlike many other version control systems, git works effectively on
> files locally on your computer, without any server or other shared
> resources to manage. Also, one good way to learn may be to form a
> question that you want to answer first. "How do I ...." or "what happens
> when I ....". Since git works locally, it is possible to create a git
> repo, look at the files contained in the .git directory, take action
> with git, and then look at the files again.
>
>
Another Git feature, compared to older version control systems, is that
it flips the 'control' aspect on its head. (who controls what you can
store?)

It does this by using the hash (sha1, or sha256) values as a way of
users _checking_ that they have the right copy of a file or commit,
rather than needing special permissions to access (write/read) some
alleged 'master' copy (in the sense of a unique artefact) of the
particular version. Maintainers now check and authorise particular
versions much more easily.

Hence Git _Distributes Control_ - you no longer need permission to keep
versioned copies of your work. This was, in my mind, a core element of
its success.

There is other stuff about how Git splits the (file) content from its
meta-data, so if, say, 10 files contain the same licence text, then it
only holds one copy of that text, with its own unique hash. It then has a
hierarchy (pyramid) of hashes of the meta-data to build up a whole
project's hash (the top level 'tree'), and the same hierarchy technique
is repeated for the project's history of commits.
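
A small Python sketch of that idea follows. The blob_id function mirrors
how Git names file content in SHA-1 repositories (a hash over a short
header plus the bytes, with no file name involved). The "project id" part
is a deliberately simplified stand-in: real Git trees use a binary format
with file modes and nested sub-trees, and the file names and licence text
here are invented for the example.

```python
import hashlib


def blob_id(data):
    """Id Git (SHA-1 mode) gives to file content; the file name is not hashed."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


licence = b"Example licence text\n"
files = {f"module{i}/LICENCE": licence for i in range(10)}
files["README"] = b"A small project\n"

# A content store keyed by hash: identical content collapses to one entry.
store = {blob_id(data): data for data in files.values()}
print(len(files), "files,", len(store), "stored blobs")

# Simplified stand-in for a tree: hash the sorted (name, blob id) listing.
listing = "".join(
    f"{name} {blob_id(data)}\n" for name, data in sorted(files.items())
)
project_id = hashlib.sha1(listing.encode()).hexdigest()
print(project_id)
```

Changing one byte in any file changes that file's id, which changes the
listing, which changes the id at the top of the pyramid.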

If you have a copy of the repository with the latest (same) hash then
you have a perfect copy, indistinguishable from the 'original'! Older
versioning systems did not have those guarantees; many were derived from
systems for versioning engineering and architectural drawings such as
those that were used for the RMS Titanic or Empire State Building.
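
The "same hash means same history" property can be sketched in a few
lines of Python. This is a toy model, not Git's real commit format (real
commits also record author, committer and timestamps); the function names
and the "save N" messages are assumptions made for the example.

```python
import hashlib


def commit_id(tree_id, parent_id, message):
    """Toy commit id: a hash over the snapshot id, the parent id and a message."""
    text = f"tree {tree_id}\nparent {parent_id}\n\n{message}\n"
    return hashlib.sha1(text.encode()).hexdigest()


def history_tip(snapshots):
    """Chain the snapshots together and return the id of the newest commit."""
    parent = "none"
    for number, content in enumerate(snapshots, start=1):
        tree_id = hashlib.sha1(content).hexdigest()
        parent = commit_id(tree_id, parent, f"save {number}")
    return parent


original = [b"draft", b"draft, revised", b"final"]
copy = [b"draft", b"draft, revised", b"final"]
tampered = [b"draft!", b"draft, revised", b"final"]

same_as_copy = history_tip(original) == history_tip(copy)
same_as_tampered = history_tip(original) == history_tip(tampered)
print(same_as_copy, same_as_tampered)
```

Because each commit id covers its parent's id, a change to the oldest
save ripples all the way up to the newest id, so comparing one hash is
enough to compare whole histories.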

Philip

PS it's worth checking out the distinction between having a hash (a magic
id) of some text, and encrypting (a magic translation of) some text.
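
One way to see that distinction, as a hedged Python sketch: a hash is a
fixed-length fingerprint with no key and no way back to the text, while
encryption produces something that can be turned back into the text by
whoever has the key. The XOR "cipher" below is a toy chosen only to show
reversibility; it is not secure, and the key and text are made up.

```python
import hashlib


def toy_encrypt(data, key):
    """XOR each byte with a repeating key. Illustration only, not secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


text = b"The quick brown fox jumps over the lazy dog"
key = b"secret"

digest = hashlib.sha256(text).hexdigest()        # fixed length, no key needed
long_digest = hashlib.sha256(text * 1000).hexdigest()
ciphertext = toy_encrypt(text, key)              # as long as the input
recovered = toy_encrypt(ciphertext, key)         # the same key undoes it

print(len(digest), len(long_digest))
print(len(ciphertext) == len(text), recovered == text)
```

Git uses hashes in the first sense: as ids that anyone can recompute and
check, not as a way of hiding content.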


* Re: About GIT Internals
  2022-05-25 16:10 About GIT Internals Aman
                   ` (2 preceding siblings ...)
  2022-05-25 23:34 ` git-vger
@ 2022-05-26 12:45 ` Konstantin Khomoutov
  3 siblings, 0 replies; 17+ messages in thread
From: Konstantin Khomoutov @ 2022-05-26 12:45 UTC (permalink / raw)
  To: Aman; +Cc: git

In addition to what others have said, I would recommend starting with "The Git
Parable" [1] - which is an ideal gentle, non-technical introduction to the
concept of distributed version control systems, - and then read "Git from the
Bottom Up" [2] and "Git for Computer Scientists" which has already been
mentioned.

 1. https://tom.preston-werner.com/2009/05/19/the-git-parable.html
 2. https://jwiegley.github.io/git-from-the-bottom-up/


* Re: About GIT Internals
       [not found]     ` <CACMKQb3exv13sYN5uEP_AG-JYu1rmVj4HDxjdw8_Y-+maJPwGg@mail.gmail.com>
@ 2022-05-27 14:40       ` Philip Oakley
       [not found]         ` <C4B1A93D-800F-4C49-93D5-86FE58B1DDCA@hxcore.ol>
  2022-05-30  9:49         ` Kerry, Richard
  0 siblings, 2 replies; 17+ messages in thread
From: Philip Oakley @ 2022-05-27 14:40 UTC (permalink / raw)
  To: Aman; +Cc: Git List, git-vger

Hi Aman,
We try to keep all the cc's so everyone can gain from the learning!
Comments in-line.

On 26/05/2022 15:17, Aman wrote:
> Hey Phillip.
>
> Thanks a lot for your email, and for sharing the book! This is great.
That was Eldon, thank you.
(https://lore.kernel.org/git/Yo68+kjAeP6tnduW@invalid/)

There is also the Git Magic 'book' from Stanford, with Ch8 covering the 
internals http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html

>
> Just a follow-up question, if you don't mind:
>
> 1. I haven't had the experience of working with other (perhaps even
> older) version control systems, like subversion. So when referring to
> the "control" aspect,

The "control" aspect was from whoever was the 'manager' that limited 
access to the version system (i.e. acting like a museum curator), and 
deciding if your masterpiece was worthy of inclusion as a significant 
example of your craft, whether that was an engineering drawing or some 
software code.

>   you mean because with hashes we can verify

If you have a look at
https://www.makeuk.org/insights/blogs/how-to-read-engineering-drawings-a-simple-guide
you will see that the Title Block has a drawing (DWG) number
(EEF-001-AM) that is used to reference it. While it feels nice, the
reference is rather arbitrary (could someone else use that number?
what's the next in the sequence? what happens when we reach EEF-999-AM?
etc.).

So the computer hash (40 digits of 0-9a-f !) solves all those problems:
it is unique (>40 card shuffle level), it depends only on the content,
and computers like it. Yay.
(Computers are great at perfect replication, so cost of manufacture 
tends to zero! Cost of design wanders in the other direction;-)
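
For a sense of scale, here is a short Python check. It reads the ">40
card shuffle" remark as "there are more possible 40-digit ids than there
are ways to order a deck of 40 cards", which is one interpretation of the
comparison, and it uses SHA-1 (the 40-digit case) with arbitrary input.

```python
import hashlib
import math

digest = hashlib.sha1(b"any content at all").hexdigest()
digits = len(digest)             # number of hex digits in a SHA-1 id
possible_ids = 16 ** digits      # the same number as 2 ** 160

# The number of distinct orderings of a stack of n cards is n factorial.
more_than_40_cards = math.factorial(40) < possible_ids
fewer_than_41_cards = possible_ids < math.factorial(41)

print(digits, more_than_40_cards, fewer_than_41_cards)
```

Strictly speaking "unique" means "a collision is so unlikely that it is
treated as impossible in practice"; SHA-256 ids are longer still (64 hex
digits).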

> the
> integrity of the files (like code) in git - there is no need for
> having a central authority to guarantee that's it's the right content
> files (which is great)?

And it means managers no longer worry about _your_ working copy - 
computers have digital storage space to spare. That wasn't the case when 
it was on paper, and we didn't have photocopies - have a look at 'blue 
prints' https://en.wikipedia.org/wiki/Blueprint (see the invention 
date!, I still remember the smell from the late 1970s)



* Re: About GIT Internals
       [not found]         ` <C4B1A93D-800F-4C49-93D5-86FE58B1DDCA@hxcore.ol>
@ 2022-05-27 15:14           ` Philip Oakley
  0 siblings, 0 replies; 17+ messages in thread
From: Philip Oakley @ 2022-05-27 15:14 UTC (permalink / raw)
  To: Aman; +Cc: Git List

On 27/05/2022 16:01, Aman wrote:
>
> Hey, thank you again.
>
> I am finding this mailing list format of talking a bit confusing, sorry.
>
No problem, blame Microsoft for following the business ($$$) way of 
doing stuff.

The 'plain text, in-line replies, with trimming of unrelated items' 
style helps the lurkers, and folks who come to the discussion later.

Main point is that we convert each point of interest into its own 
discussion, rather than it being a big challenge-response style between 
legal negotiators - it's not a win/lose discussion ;-)


> Would there be any way to address everyone on the mailing list – like in
> the future – to continue this conversation about git internals?
>
The key method is to locate the "Reply All" option in your mail app. That
makes sure everyone in the discussion is copied, and the mailing list
as well, for all the 'lurkers' ;-)

The mailing list archive uses both the message titles and any 
in-reply-to hidden headers (see if your mailer has a 'show source' 
option to see those interesting bits) to organise the list archive.
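
As a sketch of how such threading can work, the Python below groups a few
made-up messages by their In-Reply-To header using the standard email
module. The message ids and bodies are invented, and a real archive does
more than this (for example it can also use the References header and
cope with missing messages), so treat it as an illustration only.

```python
from email import message_from_string

raw_messages = [
    "Message-ID: <a@example.invalid>\nSubject: About GIT Internals\n\nHello there\n",
    "Message-ID: <b@example.invalid>\nIn-Reply-To: <a@example.invalid>\n"
    "Subject: Re: About GIT Internals\n\nFirst reply\n",
    "Message-ID: <c@example.invalid>\nIn-Reply-To: <b@example.invalid>\n"
    "Subject: Re: About GIT Internals\n\nReply to the reply\n",
]

messages = [message_from_string(raw) for raw in raw_messages]

# Map each parent id to its replies; thread starters have no In-Reply-To,
# so they end up under the key None.
children = {}
for msg in messages:
    children.setdefault(msg["In-Reply-To"], []).append(msg)


def render(parent_id=None, depth=0):
    """Return the thread as indented lines, replies nested under parents."""
    lines = []
    for msg in children.get(parent_id, []):
        lines.append("  " * depth + msg["Message-ID"])
        lines.extend(render(msg["Message-ID"], depth + 1))
    return lines


thread = render()
print("\n".join(thread))
```

A reply sent to one person only never reaches the list, so the archive
has no message to attach at that point in the tree.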
>
> I found out the mailing list archive (so confusing)– and saw these 
> personal replies don’t get added in the thread. I would appreciate if 
> you give some advice, thank you
>



* RE: About GIT Internals
  2022-05-27 14:40       ` Philip Oakley
       [not found]         ` <C4B1A93D-800F-4C49-93D5-86FE58B1DDCA@hxcore.ol>
@ 2022-05-30  9:49         ` Kerry, Richard
  2022-05-30 11:53           ` Konstantin Khomoutov
  1 sibling, 1 reply; 17+ messages in thread
From: Kerry, Richard @ 2022-05-30  9:49 UTC (permalink / raw)
  To: Philip Oakley, Aman; +Cc: Git List, git-vger@eldondev.com

> -----Original Message-----
> From: Philip Oakley <philipoakley@iee.email>
> Sent: 27 May 2022 15:40
> To: Aman <amanmatreja@gmail.com>
> Cc: Git List <git@vger.kernel.org>; git-vger@eldondev.com
> Subject: Re: About GIT Internals
> 
> > Just a follow-up question, if you don't mind:
> >
> > 1. I haven't had the experience of working with other (perhaps even
> > older) version control systems, like subversion. So when referring to
> > the "control" aspect,
> 
> The "control" aspect was from whoever was the 'manager' that limited
> access to the version system (i.e. acting like a museum curator), and deciding
> if your masterpiece was worthy of inclusion as a significant example of your
> craft, whether that was an engineering drawing or some software code.

I'm not sure I get that idea.  I worked using server-based Version Control systems from the mid 80s until about 5 years ago when the team moved from Subversion to Git.  There was never a "curator" who controlled what went into VC.  You did your work, developed files, and committed when you thought it necessary.  When a build was to be done there would then be some consideration of what from VC would go into the build.
That is all still there nowadays using a distributed system (ie Git).  Those doing Open source work might operate a bit differently, as there is of necessity distribution of control of what gets into a release. But those of us who are developing proprietary software are still going through the same sort of release process.  And that's even if there isn't actually a separate person actively manipulating the contents of a release, it's just up to you to do what's necessary (actually there are others involved in dividing what will be in, but in our case they don't actively manipulate a repository).

> >>> Chapter 10 is about git internals. It is important to realize that,
> >>> unlike many other version control systems, git works effectively on
> >>> files locally on your computer, without any server or other shared
> >>> resources to manage. Also, one good way to learn may be to form a
> >>> question that you want to answer first. "How do I ...." or "what
> >>> happens when I ....". Since git works locally, it is possible to
> >>> create a git repo, look at the files contained in the .git
> >>> directory, take action with git, and then look at the files again.
> >>>
> >>>
> >> Another Git feature, compared to older version control systems, is
> >> that it flips the 'control' aspect on its head. (who controls what
> >> you can
> >> store?)

Again, I don't really recognize that.  You store what you want, probably with some sort of arrangement with the others on the team.  The important bit is determining what will go into the release.  Ie in choosing what, from everything that is stored, will be released.

> >> Hence Git _Distributes Control_ - you no longer need permission to
> >> keep versioned copies of your work. This was, in my mind, a core
> >> element of its success.

Maybe you do.  If you're working with others there will probably be "permission" in some sense involved.  I can store what I like locally, but then I miss out on some protection of my work, against a technical fault locally that might cause a loss of the whole repository.  If there is a remote server then I am probably only allowed to store company work to the company server.

A lot of this discussion seems to be more about the differences between the nature of Git and its client-server rivals.  I thought the original query was about how its internals worked, which would seem to be a slightly different question.

Regards,
Richard.
(Not old enough to remember the smell of blue prints, but old enough to know of the term)


* Re: About GIT Internals
  2022-05-30  9:49         ` Kerry, Richard
@ 2022-05-30 11:53           ` Konstantin Khomoutov
  2022-05-30 13:50             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Khomoutov @ 2022-05-30 11:53 UTC (permalink / raw)
  To: Kerry, Richard; +Cc: Philip Oakley, Aman, Git List, git-vger@eldondev.com

On Mon, May 30, 2022 at 09:49:57AM +0000, Kerry, Richard wrote:

[...]
> > > 1. I haven't had the experience of working with other (perhaps even
> > > older) version control systems, like subversion. So when referring to
> > > the "control" aspect,
> > 
> > The "control" aspect was from whoever was the 'manager' that limited
> > access to the version system (i.e. acting like a museum curator), and deciding
> > if your masterpiece was worthy of inclusion as a significant example of your
> > craft, whether that was an engineering drawing or some software code.
> 
> I'm not sure I get that idea.  I worked using server-based Version Control
> systems from the mid 80s until about 5 years ago when the team moved from
> Subversion to Git.  There was never a "curator" who controlled what went
> into VC.  You did your work, developed files, and committed when you thought
> it necessary.  When a build was to be done there would then be some
> consideration of what from VC would go into the build. That is all still
> there nowadays using a distributed system (ie Git).  Those doing Open source
> work might operate a bit differently, as there is of necessity distribution
> of control of what gets into a release. But those of us who are developing
> proprietary software are still going through the same sort of release
> process.  And that's even if there isn't actually a separate person actively
> manipulating the contents of a release, it's just up to you to do what's
> necessary (actually there are others involved in dividing what will be in,
> but in our case they don't actively manipulate a repository).

I think the "inversion of control" brought in by DVCS-es is about a slightly
different set of things.

I would say it is connected to F/OSS and the way most projects were
hosted before the DVCS-es took over: usually each project had a single repository
(say, on Sourceforge or elsewhere), and it was "truly central" in the sense
that if anyone decided to work on that project, they would need to
contact whoever was in charge of that project and ask them to set up
permissions allowing commits - maybe not to "the trunk", but anyway
commit access was required because in a centralized VCS commits are made on the
server side.
(Of course, there were projects where you could mail your patchset to a
maintainer, but maintaining such patchset was not convenient: you would either
need to host your own fully private VCS or use a tool like Quilt [1].
Also note that certain high-profile projects such as Linux and Git use mailing
lists for submission and review of patch series; this workflow coexists with
the concept of DVCS just fine.)

This approach has been effectively reversed by what was a killer-feature of
Github (I honestly am not sure whether Github was the first to implement it
but it was, and arguably is, the most popular): a network of "forks".
If a project is hosted using a DVCS, anyone is free to clone it and push their
work _elsewhere._ This point is crucial: you do not need to ask the project
maintainers to publish your modifications. Github pushed this concept quite
far: creating a fork and pushing your work there is actually a device to create
a pull request - a request to incorporate your changes into the original
project. While this approach has obvious upsides, it also has possible
downsides; one of the more visible is that when an original project becomes
dormant for some reason, its users might have a hard time understanding which
of the competing forks to switch to, and there are cases when multiple
competing forks implement different features and bugfixes in parallel.
One of the guys behind Subversion expressed his concerns about this back
when Git was in its relative infancy [2].

 1. https://en.wikipedia.org/wiki/Quilt_(software)
 2. http://blog.red-bean.com/sussman/?p=20


* Re: About GIT Internals
  2022-05-30 11:53           ` Konstantin Khomoutov
@ 2022-05-30 13:50             ` Ævar Arnfjörð Bjarmason
  2022-06-03 12:18               ` Aman
  0 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-30 13:50 UTC (permalink / raw)
  To: Konstantin Khomoutov
  Cc: Kerry, Richard, Philip Oakley, Aman, Git List,
	git-vger@eldondev.com

On Mon, May 30 2022, Konstantin Khomoutov wrote:

> On Mon, May 30, 2022 at 09:49:57AM +0000, Kerry, Richard wrote:
>
> [...]
>> > > 1. I haven't had the experience of working with other (perhaps even
>> > > older) version control systems, like subversion. So when refering to
>> > > the "control" aspect,
>> > 
>> > The "control" aspect was from whoever was the 'manager' that limited
>> > access to the version system (i.e. acting like a museum curator), and deciding
>> > if your masterpiece was worthy of inclusion as a significant example of your
>> > craft, whether that was an engineering drawing or some software code.
>> 
>> I'm not sure I get that idea.  I worked using server-based Version Control
>> systems from the mid 80s until about 5 years ago when the team moved from
>> Subversion to Git.  There was never a "curator" who controlled what went
>> into VC.  You did your work, developed files, and committed when you thought
>> it necessary.  When a build was to be done there would then be some
>> consideration of what from VC would go into the build. That is all still
>> there nowadays using a distributed system (ie Git).  Those doing Open source
>> work might operate a bit differently, as there is of necessity distribution
>> of control of what gets into a release. But those of us who are developing
>> proprietary software are still going through the same sort of release
>> process.  And that's even if there isn't actually a separate person actively
>> manipulating the contents of a release, it's just up to you to do what's
>> necessary (actually there are others involved in dividing what will be in,
>> but in our case they don't actively manipulate a repository).
>
> I think, the "inversion of control" brought in by DVCS-es is about a
> somewhat different set of things.

Re the "I'm not sure I get that idea" from Richard: I think his point
stands that some of the stories we carry around about VCS vs. DVCS in
free/open source software were more particular to how things were done
in those online communities, and not really about the implicit
constraints of centralized VCS per se.

Partly those two mix: it was quite common for free software projects not
to have any public VCS (usually CVS) access at all. Some did, but it was
enough of a hassle to set up, and not part of your "normal" workflow (as
opposed to setting up a hosted Git repository, which everyone uses), that
many just didn't do it.

> I would say it is connected to F/OSS and the way most projects were
> hosted before DVCS-es took over: usually each project had a single
> repository (say, on Sourceforge or elsewhere), and it was "truly central"
> in the sense that if anyone were to decide to work on that project, they
> would need to contact whoever was in charge of that project and ask them
> to set up permissions allowing commits - maybe not to "the trunk", but
> anyway the commit access was required because in centralized VCS commits
> are made on the server side.

We may have tried this in different eras, but from what I recall it was
a crapshoot whether there was any public VCS access at all. Some
projects were quite good about it, and SourceForge managed to push that
to more of them early on by making anonymous CVS access something you
could get by default.

But a lot of projects simply didn't have it at all. You'll still find
some of them today, i.e. various bits of "infrastructure" code that the
maintainers are (presumably) still manually managing with zip snapshots
and manually applied patches.

> (Of course, there were projects where you could mail your patchset to a
> maintainer, but maintaining such a patchset was not convenient: you would
> either need to host your own fully private VCS or use a tool like Quilt [1].
> Also note that certain high-profile projects such as Linux and Git use mailing
> lists for submission and review of patch series; this workflow coexists with
> the concept of DVCS just fine.)

I'd add though that this isn't really "co-existing" with DVCS so much as
using patches on a ML as an indirect transport protocol for "git push".

I.e. if you contributed to some similar projects "back in the day" you
could expect to effectively send your patches into a black hole until the
next release: the maintainer would apply them locally, and you wouldn't be
able to pull them back down via the DVCS.

Perhaps there would be development releases, but those could be weeks or
even months apart, and a "real" release might be once every 1-2 years.

Whereas both Junio and Linus (and other Linux maintainers) publish their
version of the patches they do integrate fairly quickly.

> [...] it also has possible
> downsides; one of the more visible is that when an original project becomes
> dormant for some reason, its users might have a hard time understanding
> which of the competing forks to switch to, and there are cases when multiple
> competing forks implement different features and bugfixes in parallel.
> One of the guys behind Subversion expressed his concerns about this back
> when Git was in its relative infancy [2].
>
>  1. https://en.wikipedia.org/wiki/Quilt_(software)
>  2. http://blog.red-bean.com/sussman/?p=20

It's interesting that this aspect of what proponents of centralized VCS
were fearful of when it came to DVCS turned out to be the exact
opposite:

    Notice what this user is now able to do: he wants to crawl off
    into a cave, work for weeks on a complex feature by himself, then
    present it as a polished result to the main codebase. And this is
    exactly the sort of behavior that I think is bad for open source
    communities.

I.e. lowering the cost to publish early and often has had the effect
that people are less likely to "crawl off into a cave" and work on
something for a long time without syncing up with other parallel
development.


* Re: About GIT Internals
  2022-05-30 13:50             ` Ævar Arnfjörð Bjarmason
@ 2022-06-03 12:18               ` Aman
  2022-06-03 15:23                 ` Konstantin Khomoutov
  2022-06-03 15:25                 ` Emily Shaffer
  0 siblings, 2 replies; 17+ messages in thread
From: Aman @ 2022-06-03 12:18 UTC (permalink / raw)
  To: Git List
  Cc: Konstantin Khomoutov, Kerry, Richard, Philip Oakley,
	git-vger@eldondev.com, Ævar Arnfjörð Bjarmason

Hello everyone. I sent out an email here last week, asking for a list
of resources, so I could better understand the workings and design of
git. I really appreciate everyone who gave links and advice.

I have been reading about GIT for some time now, and have looked at
almost all of the resources plus some others. I think I could say I
now have a decent conceptual understanding of how GIT works
internally.

(I also now understand the chapter about git in the book I am reading,
Architecture of Open Source Applications: Volume 2, which I didn't
understand at all before and which is the reason I started this thread.)
There must still be a lot of details and subtle things I may not
understand yet, though (like branches are nothing but pointers to commits,
wow! btw).

Now, continuing this discussion, and talking about the implementation
and engineering side of things, I wanted to ask another question and
hence wanted some advice.

Though I may understand the internal design and high-level
implementation of GIT, I really want to know how it's implemented and
was made, which means reading the SOURCE CODE.

1. I don't know how absurd of a quest this is, please enlighten me.
2. How do I do it? Where do I start? It's such a BIG repository - and
I am guessing it's not going to be easy.
3. Would someone advise, perhaps, having a look at an older version
of the source code rather than the latest one, for some reason?


Again, I would really appreciate it if someone could give their
thoughts on this.

Thank you,

Regards,
Aman




* Re: About GIT Internals
  2022-06-03 12:18               ` Aman
@ 2022-06-03 15:23                 ` Konstantin Khomoutov
  2022-06-04 15:24                   ` Aman
  2022-06-03 15:25                 ` Emily Shaffer
  1 sibling, 1 reply; 17+ messages in thread
From: Konstantin Khomoutov @ 2022-06-03 15:23 UTC (permalink / raw)
  To: Aman
  Cc: Git List, Konstantin Khomoutov, Kerry, Richard, Philip Oakley,
	git-vger@eldondev.com, Ævar Arnfjörð Bjarmason

On Fri, Jun 03, 2022 at 05:48:14PM +0530, Aman wrote:

[...]
> Though I may understand the internal design and high-level
> implementation of GIT, I really want to know how it's implemented and
> was made, which means reading the SOURCE CODE.
> 
> 1. I don't know how absurd of a quest this is, please enlighten me.
> 2. How do I do it? Where do I start? It's such a BIG repository - and
> I am guessing it's not going to be easy.
> 3. Would someone advise, perhaps, having a look at an older version
> of the source code rather than the latest one, for some reason?

Well, depends on what you mean when talking about the two mentioned designs.
I mean, there's the design of the approach to manage data and there's the
design of the software package (which Git is).

If you do also understand the latter - that is, you understand that Git is an
assortment of CLI tools combined into two layers called "plumbing" and
"porcelain" - then you should have no difficulty starting to read the code:
basically locate the source code of the entry point Git binary (which is,
well, "git", or "git.exe" on Windows) and start reading it. You'll find it
parses its command-line arguments and calls out to other executable modules
which are parts of the Git software package to do the heavy lifting.
You then read the source code of the modules of interest, and so on and so
on. I'm not sure there could be any other "guide" to reading the source code.

If you're not familiar with the design of Git-as-a-software-package, it's
probably time to clone the Git repository and explore the contents of the
directory named "Documentation" there.


* Re: About GIT Internals
  2022-06-03 12:18               ` Aman
  2022-06-03 15:23                 ` Konstantin Khomoutov
@ 2022-06-03 15:25                 ` Emily Shaffer
  2022-06-03 17:15                   ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Emily Shaffer @ 2022-06-03 15:25 UTC (permalink / raw)
  To: Aman
  Cc: Git List, Konstantin Khomoutov, Kerry, Richard, Philip Oakley,
	git-vger@eldondev.com, Ævar Arnfjörð Bjarmason

On Fri, Jun 3, 2022 at 5:21 AM Aman <amanmatreja@gmail.com> wrote:
>
> Hello everyone. I sent out an email here last week, asking for a list
> of resources, so I could better understand the workings and design of
> git. I really appreciate everyone, who gave the links and their
> advice.
>
> I have been reading about GIT for some time now, and have looked at
> almost all of the resources plus some others. I think I could say, I
> now have a decent conceptual understanding of how GIT  works
> internally.
>
> (Also, I understood the chapter about git I read in the book I am
> reading, Architecture of Open Source Applications: Volume 2, which I
> didn't understand at all, the reason I started this thread). Although
> there must definitely be a lot of details and subtle things I may not
> understand yet (like branches are nothing but pointers to commits,
> wow! btw)
>
> Now, continuing this discussion, and talking about the implementation
> and engineering side of things, I wanted to ask another question and
> hence wanted some advice.
>
> Though I may understand the internal design and high-level
> implementation of GIT, I really want to know how it's implemented and
> was made, which means reading the SOURCE CODE.
>
> 1. I don't know how absurd of a quest this is, please enlighten me.

It's a lot :) But I don't think that should discourage you.

> 2. How do I do it? Where do I start? It's such a BIG repository - and
> I am guessing it's not going to be easy.

I would actually start with "Documentation/MyFirstContribution.txt"
and "Documentation/MyFirstObjectWalk.txt" - but I am biased towards
those documents. ;) The other subtle hint I would give is that the
entry point for almost every command is at a function called
"cmd_cmdname()", so for example "git status" is at "cmd_status()",
usually somewhere in 'builtin/'.
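
To make that naming convention concrete, here is a minimal sketch of a
name-to-function table. It is not Git's actual code: `struct command`,
`dispatch()` and the two toy handlers are hypothetical, and the real
builtins take more arguments and do real work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handlers, named after the "cmd_<name>" convention. */
static int cmd_init(int argc, const char **argv)
{
    (void)argc; (void)argv;
    puts("init: pretending to create a repository");
    return 0;
}

static int cmd_status(int argc, const char **argv)
{
    (void)argc; (void)argv;
    puts("status: nothing to report in this sketch");
    return 0;
}

/* One row per subcommand: its name and the function implementing it. */
struct command {
    const char *name;
    int (*fn)(int argc, const char **argv);
};

static const struct command commands[] = {
    { "init", cmd_init },
    { "status", cmd_status },
};

/*
 * Look up argv[0] in the table and run the matching handler.
 * Returns the handler's result, or 1 if the name is missing or unknown.
 */
int dispatch(int argc, const char **argv)
{
    size_t i;

    assert(argv != NULL);
    if (argc < 1)
        return 1;
    for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
        if (!strcmp(commands[i].name, argv[0]))
            return commands[i].fn(argc, argv);
    }
    fprintf(stderr, "'%s' is not a known command\n", argv[0]);
    return 1;
}
```

A caller would pass the arguments that follow the program name, so that
"status" ends up in argv[0]. When reading the real code, searching for
"cmd_status" under 'builtin/' is the equivalent first step.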

> 3. Would someone advise, perhaps, having a look at an older version
> of the source code rather than the latest one, for some reason?

Some other piece of the developer documentation (maybe
"SubmittingPatches"?) suggests that you start from the initial commit
and understand that part first. I personally don't find this exercise
very useful anymore as Git has grown quite a lot since then (and is
even primarily in a different language, although we still have some
bash scripts here and there).

> Again, I would really appreciate it if someone could give their
> thoughts on this.

In your journeys, also keep an eye out for some commonly used libraries,
like calls from "run-command.h" or "parse-options.h", to help you
understand how we make stuff work more or less consistently across the
codebase, or libraries like "strbuf.h" and "string-list.h" to understand
some of the things that we do to make working with C a little less fraught.
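
As a rough illustration of why such helpers make C less fraught, here is
a minimal sketch of a growable string buffer. It is only in the spirit of
"strbuf.h": `struct mini_buf`, `mini_buf_addstr()` and
`mini_buf_release()` are hypothetical names, not Git's API, and the real
library does a good deal more.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; buf is NUL-terminated once non-NULL. */
struct mini_buf {
    char *buf;
    size_t len;    /* bytes in use, not counting the trailing NUL */
    size_t alloc;  /* bytes allocated */
};

/* Append s, growing the allocation as needed. Returns 0, or -1 on OOM. */
int mini_buf_addstr(struct mini_buf *mb, const char *s)
{
    size_t add, need;

    assert(mb != NULL && s != NULL);
    add = strlen(s);
    need = mb->len + add + 1;
    if (need > mb->alloc) {
        size_t newalloc = mb->alloc ? mb->alloc * 2 : 64;
        char *p;

        if (newalloc < need)
            newalloc = need;
        p = realloc(mb->buf, newalloc);
        if (!p)
            return -1;
        mb->buf = p;
        mb->alloc = newalloc;
    }
    memcpy(mb->buf + mb->len, s, add + 1);  /* copies the NUL as well */
    mb->len += add;
    return 0;
}

/* Free the storage and reset the buffer to its empty state. */
void mini_buf_release(struct mini_buf *mb)
{
    assert(mb != NULL);
    free(mb->buf);
    mb->buf = NULL;
    mb->len = 0;
    mb->alloc = 0;
}
```

The point of the design is that callers never compute sizes or call
realloc() themselves: they start from an all-zero struct, append as often
as they like, and release once at the end.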



* Re: About GIT Internals
  2022-06-03 15:25                 ` Emily Shaffer
@ 2022-06-03 17:15                   ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2022-06-03 17:15 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Aman, Git List, Konstantin Khomoutov, Kerry, Richard,
	Philip Oakley, git-vger@eldondev.com,
	Ævar Arnfjörð Bjarmason

Emily Shaffer <emilyshaffer@google.com> writes:

>> 3. Would someone advise, perhaps, to have a look at an older version
>> of the source code? rather than the latest one, for some reason.

For those who want to learn from source files, I would recommend
reading all the files in the very initial commit, cover to cover.

e83c5163 (Initial revision of "git", the information manager from
hell, 2005-04-07)

With only 1244 lines spread across 11 files, it is a short read that
is completable in a single sitting for those who are reasonably
fluent in C.  It does not have any frills, but the basic data
structures to express the important concepts are already there.


* Re: About GIT Internals
  2022-06-03 15:23                 ` Konstantin Khomoutov
@ 2022-06-04 15:24                   ` Aman
  2022-06-06 11:52                     ` Konstantin Khomoutov
  0 siblings, 1 reply; 17+ messages in thread
From: Aman @ 2022-06-04 15:24 UTC (permalink / raw)
  To: Konstantin Khomoutov
  Cc: Git List, Kerry, Richard, Philip Oakley, git-vger@eldondev.com,
	Ævar Arnfjörð Bjarmason

On Fri, Jun 3, 2022, at 8:53 PM Konstantin Khomoutov <kostix@bswap.ru> wrote:

> Well, depends on what you mean when talking about the two mentioned designs.
> I mean, there's the design of the approach to manage data and there's the
> design of the software package (which Git is).

That's a good perspective on the distinction between the designs. I am
not yet familiar with the design of GIT as a software package, and I
am guessing most people who'll be learning about GIT internals won't
be.

> If you do also understand the latter - that is, understanding that Git is an
> assortment of CLI tools combined into two layers called "plumbing" and
> "porcelain", - then you should have no difficulty starting to read the code:
> basically locate the source code of the entry point Git binary (which is,
> well, "git", or "git.exe" on Windows) and start reading it.

How do I do that?  What do you mean by the "entry point" of the git binary?


* Re: About GIT Internals
  2022-06-04 15:24                   ` Aman
@ 2022-06-06 11:52                     ` Konstantin Khomoutov
  0 siblings, 0 replies; 17+ messages in thread
From: Konstantin Khomoutov @ 2022-06-06 11:52 UTC (permalink / raw)
  To: Aman
  Cc: Konstantin Khomoutov, Git List, Kerry, Richard, Philip Oakley,
	git-vger@eldondev.com, Ævar Arnfjörð Bjarmason

On Sat, Jun 04, 2022 at 08:54:10PM +0530, Aman wrote:

[...]
> > If you do also understand the latter - that is, understanding that Git is an
> > assortment of CLI tools combined into two layers called "plumbing" and
> > "porcelain", - then you should have no difficulty starting to read the code:
> > basically locate the source code of the entry point Git binary (which is,
> > well, "git", or "git.exe" on Windows) and start reading it.

(I have reversed the order of your questions below so that my comments follow
logically one after another.)

> What do you mean by the "entry point" of the git binary?

Well, porcelain Git commands (those supposed to be used by users to carry out
their day-to-day tasks) are all implemented as subcommands of a single
executable image file called "git" on all supported platforms (except Windows,
where it's called "git.exe"): for instance, you run "git init" to initialize a
repository, and your OS looks up the executable image file named "git"
somewhere in the list of directories containing such files (the list is
usually contained in the environment variable named "PATH"), executes it and
passes it a single command-line argument - "init". The rest of the commands
work the same way. Therefore, that binary named "git" is an entry point of the
Git software package: the execution of most Git commands starts there (not
*all* Git commands, but let's not touch this yet).
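
The lookup step described above can be sketched in a few lines of C. This
is a simplified illustration rather than what any particular shell or
libc actually does: `find_in_path()` is a hypothetical helper, it assumes
a POSIX system, and it skips empty PATH entries (which traditionally mean
"the current directory").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Search the colon-separated directory list `path` for an executable
 * regular file called `name`. On success, write its full path into `out`
 * and return 0; return -1 if nothing was found. The contents of `out`
 * are only meaningful on success.
 */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *dir = path;

    assert(name != NULL && path != NULL && out != NULL);
    while (*dir) {
        size_t len = strcspn(dir, ":");  /* length of this PATH entry */

        if (len > 0) {
            struct stat st;
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, dir, name);

            if (n > 0 && (size_t)n < outlen &&
                stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
                access(out, X_OK) == 0)
                return 0;
        }
        dir += len;
        if (*dir == ':')
            dir++;  /* step over the separator */
    }
    return -1;
}
```

Calling it with "git" and the value of getenv("PATH") would yield the
file that gets executed; the remaining words on the command line, such as
"init", are then handed to that program as its arguments.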

> How do I do that?

Well, basically that's out of the scope of this list, but let's try...

Git is a complex software package mostly written in C (and POSIX shell).
Like many F/OSS projects written in C, it has a top-level Makefile which is a
file supposed to be processed by GNU Make; this file contains a set of rules
for generating files from other files (compiling C source code into object
files and linking those into libraries and executable image files is exactly
this - generating files from other files). So usually you start from reading
the Makefile to find where the binary file of interest is generated, and from
which source files.

The problem is that Git's Makefile is *complex.*
So let's save you some headache and cut straight to the point: of most
interest to you are two files, git.c and common-main.c. The former is
exactly what implements that top-level entry point program, "git", while the
latter implements the function called "main", which is the entry point of any
program written in C that is supposed to be runnable standalone (as opposed
to becoming a library). The object file generated when compiling common-main.c
is linked into every program that implements Git commands; its main()
calls cmd_main(), which is supposed to be implemented in the code of those
commands.
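
The split between a shared main() and a per-program cmd_main() can be
pictured with the following sketch. It is an assumption-laden toy, not
the contents of common-main.c: `shared_main()` stands in for the shared
main(), `setup_done` is a placeholder for whatever setup every program
needs, and both functions sit in one file here although in a real build
they would live in separate files joined by the linker.

```c
#include <assert.h>
#include <stdio.h>

/*
 * The program-specific part. Each program supplies its own cmd_main();
 * this one is a hypothetical example that only prints a greeting.
 */
int cmd_main(int argc, const char **argv)
{
    printf("hello from %s\n", argc > 0 ? argv[0] : "(unknown)");
    return 0;
}

/* Placeholder flag recording that the common setup has run. */
static int setup_done;

/*
 * Stand-in for the shared main(): do the setup every program needs,
 * then hand control to whichever cmd_main() was linked in.
 */
int shared_main(int argc, const char **argv)
{
    assert(argv != NULL);
    setup_done = 1;
    return cmd_main(argc, argv);
}
```

The benefit of this arrangement is that the common setup is written once,
while each program only has to provide its own cmd_main().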

The rest is basically just usual C stuff - source files and header files.
If you're not familiar with these basics, then, I'm afraid, Git may not be the
best project to dive into.

In any case, I find the idea proposed by Junio elsewhere in this thread to be
very smart: it should be quite enlightening to read the "early" Git code to
make yourself accustomed to its overall architecture before moving on to its
present - much more complicated - implementation which nevertheless still
maintains the same architecture.
