"Producting Open Source Software" book and distributed SCMs

git@vger.kernel.org mailing list mirror (one of many)

* "Producting Open Source Software" book and distributed SCMs
@ 2007-04-29 23:20 Jakub Narebski
  2007-05-01  9:35 ` Johannes Schindelin
  2007-05-01 16:15 ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Jakub Narebski @ 2007-04-29 23:20 UTC (permalink / raw)
  To: git

I have recently read the classic book "Producing Open Source Software:
How to Run a Successful Free Software Project" by Karl Fogel (2005).

Among other things, the author advocates using a version control system
as a basis for running a project. In "Choosing a Version Control System"
he writes:

  As of this writing, the version control system of choice in the free
  software world is the Concurrent Versions System or CVS.

Further on, many of the examples of managing a project and managing
volunteers revolve around the idea of "commit access", and it is
implicitly assumed that the version control system is centralized. That
is understandable, as in 2005 there were (according to Linus) no good
distributed version control systems (SCMs). Karl Fogel also writes in
the preface that much of the material came from five years of working
with the Subversion project. Subversion is a centralized SCM meant as a
"better CVS", and the project uses Subversion itself for revision
control, so any experience described had to be with a centralized SCM.

Distributed SCMs are mentioned in a footnote in the section "Committers"
in Chapter 8, Managing Volunteers:

 http://producingoss.com/producingoss.html#ftn.id284130

  [22] Note that the commit access means something a bit different in
  decentralized version control systems, where anyone can set up a
  repository that is linked into the project, and give themselves commit
  access to that repository. Nevertheless, the concept of commit access
  still applies: "commit access" is shorthand for "the right to make
  changes to the code that will ship in the group's next release of the
  software." In centralized version control systems, this means having
  direct commit access; in decentralized ones, it means having one's
  changes pulled into the main distribution by default. It is the same
  idea either way; the mechanics by which it is realized are not
  terribly important.

I'm interested in your experience with managing projects using a
distributed SCM, or even better, first a centralized and then a
distributed SCM: is the above difference the only one? Linus has said
that a fully distributed SCM improves forkability:

 "Re: If merging that is really fast forwarding creates new commit"
 Message-ID: <Pine.LNX.4.64.0611070841580.3667@g5.osdl.org>
 http://permalink.gmane.org/gmane.comp.version-control.git/31078

  Time for some purely philosophical arguments on why it's wrong to have 
  "special people" encoded in the tools:

  I think that "forking" is what keeps people honest. The _biggest_
  downside with CVS is actually that a central repository gets so much
  _political_ clout, that it's effectively impossible to fork the
  project: the maintainers of a central repo have huge powers over
  everybody else, and it's practically impossible for anybody else to
  say "you're wrong, and I'll show how wrong you are by competing fairly
  and being better".

According to "Producing Open Source Software", forkability is a very
important feature for an OSS project. See the section "Forkability" of
Chapter 4, Social and Political Infrastructure (at the beginning of the
chapter):

 http://producingoss.com/producingoss.html#forkability

  The indispensable ingredient that binds developers together on a free
  software project, and makes them willing to compromise when necessary,
  is the code's _forkability_: the ability of anyone to take a copy of
  the source code and use it to start a competing project, known as
  a fork.  The paradoxical thing is that the _possibility_ of forks is
  usually a much greater force in free software projects than actual
  forks, which are very rare.  Because a fork is bad for everyone (for
  reasons examined in detail in the section called "Forks" in Chapter 8,
  Managing Volunteers, http://producingoss.com/producingoss.html#forks),
  the more serious the threat of a fork becomes, the more willing people
  are to compromise to avoid it.

Besides that, what are the differences between managing a project with
a centralized SCM and one with a distributed SCM? What is the equivalent
of committers, of giving full and partial commit access, of revoking
commit access? How does good support for tagging and branching influence
writing code and the build procedure? Is a distributed SCM better geared
towards the "benevolent dictator" model than the "consensus-based
democracy" model, as described in the book?

Thanks in advance for all responses
-- 
Jakub Narebski
ShadeHawk on #git
Poland


* Re: "Producting Open Source Software" book and distributed SCMs
  2007-04-29 23:20 "Producting Open Source Software" book and distributed SCMs Jakub Narebski
@ 2007-05-01  9:35 ` Johannes Schindelin
  2007-05-01 15:23   ` Theodore Tso
  2007-05-01 18:30   ` Jakub Narebski
  2007-05-01 16:15 ` Linus Torvalds
  1 sibling, 2 replies; 9+ messages in thread
From: Johannes Schindelin @ 2007-05-01  9:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Mon, 30 Apr 2007, Jakub Narebski wrote:

> I have recently read the classic book "Producing Open Source Software:
> How to Run a Successful Free Software Project" by Karl Fogel (2005).
> 
> Among other things, the author advocates using a version control system
> as a basis for running a project. In "Choosing a Version Control System"
> he writes:
> 
>   As of this writing, the version control system of choice in the free
>   software world is the Concurrent Versions System or CVS.

Back then, it was. I ran all my projects on CVS. Then came along Git. I 
tried to keep up with it, but had to quit for day-job reasons. When I came 
back, Git was already so good that I switched almost everything over.

> Distributed SCMs are mentioned in a footnote in the section "Committers"
> in Chapter 8, Managing Volunteers:
> 
>  http://producingoss.com/producingoss.html#ftn.id284130
> 
>   [22] Note that the commit access means something a bit different in
>   decentralized version control systems, where anyone can set up a
>   repository that is linked into the project, and give themselves commit
>   access to that repository. Nevertheless, the concept of commit access
>   still applies: "commit access" is shorthand for "the right to make
>   changes to the code that will ship in the group's next release of the
>   software." In centralized version control systems, this means having
>   direct commit access; in decentralized ones, it means having one's
>   changes pulled into the main distribution by default. It is the same
>   idea either way; the mechanics by which it is realized are not
>   terribly important.
> 
> 
> I'm interested in your experience with managing projects using a 
> distributed SCM, or even better, first a centralized and then a 
> distributed SCM: is the above difference the only one?

In my experience, the offline mode has been a huge advantage. For example, 
in one project I work together with people from three different countries, 
some of them traveling quite a bit. I sold Git solely on the 
transportability. One of them was so happy that he switched over most of 
his projects, too.

BTW that is the common pattern I see: once people get hooked, they not only 
convert their existing projects to Git, but they use cvsimport a lot more, 
and they start to manage configuration settings, documents, pictures, etc. 
with Git, because it gives them an easy backup mechanism.

Another difference between central and distributed operation I see is the 
workflow. With Git, you can commit much more often. For example, when 
working with Sourceforge's CVS (which _was_ comparable in speed to 
corporate SourceSafe repos), I would always weigh committing (and having 
a coffee while it ran) against combining these changes with the next ones.

Obviously, committing more often leads to a much nicer repository 
structure, making it much easier to get into the code for new developers. 
It also makes it easier to get at bugs. And because it is so much faster, 
you can actually do a "git diff" before committing, to make sure that you 
did not leave in that stupid debug statement.
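
That pre-commit check can also be sketched in code. This is a minimal 
sketch, assuming the output of "git diff" is already available as a 
string; the helper name added_lines_matching, the marker list and the 
sample diff are made up for illustration and are not part of Git.

```python
def added_lines_matching(diff_text, markers):
    """Return added lines of a unified diff that contain any marker."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" introduces the new-file header.
        # Simplification: an added line whose own text begins with "++"
        # would be mistaken for a header and skipped.
        if line.startswith("+") and not line.startswith("+++"):
            if any(marker in line for marker in markers):
                hits.append(line[1:].strip())
    return hits


# Hypothetical diff, shaped like the text "git diff" prints.
SAMPLE_DIFF = """\
--- a/main.c
+++ b/main.c
@@ -1,3 +1,4 @@
 int main(void) {
+    fprintf(stderr, "DEBUG: got here");
     return 0;
 }
"""

leftovers = added_lines_matching(SAMPLE_DIFF, ["DEBUG"])
print(leftovers)
```

A hit is only a hint to look again before committing; reading the diff 
yourself is still the real check.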

> Linus has said that fully distributed SCM improves forkability:
>
> [...] 
> 
>   I think that "forking" is what keeps people honest. The _biggest_
>   downside with CVS is actually that a central repository gets so much
>   _political_ clout, that it's effectively impossible to fork the
>   project: [...]
> 
> According to "Producing Open Source Software", forkability is a very 
> important feature for an OSS project.
>
> [...]
> 
>   Because a fork is bad for everyone (for reasons examined in detail in 
>   the section called "Forks" in Chapter 8, Managing Volunteers, 
>   http://producingoss.com/producingoss.html#forks), the more serious the 
>   threat of a fork becomes, the more willing people are to compromise to 
>   avoid it.

This is a lousy argument, IMHO.

Why are forks bad? They are not. But if you "learnt" that merges are hard, 
they are.

It is a pity that so many people were trained in CVS, and keep thinking 
some of the lectures were true, when they are no longer.

Forks are good. In fact, we all "forked" with CVS as soon as we began 
hacking. Everybody who claims to never have started over from a fresh 
checkout, or from an "update -C"ed state, is probably lying, or a bad 
developer. Thinking about it, I believe that the difference between 
forking and branching is philosophical, not technical. You can always 
merge a fork.

And the thing is, you would not start hacking on some obscure feature if 
that happened completely in the open, for fear of being called a complete 
moron.

With CVS, that meant that you tried to get to a stage where others could 
see that it was worth doing, before committing. Which makes for monster 
commits. ("The number of bugs is the _square_ of the number of changed 
lines.") With Git, that problem is virtually not there.
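
Taken literally, that quip (a rule of thumb, not a measured law) says that 
one monster commit is far riskier than the same change split into small 
commits. A small sketch of the arithmetic, with made-up line counts and a 
made-up function name:

```python
def heuristic_bug_score(changed_lines):
    """Quadratic rule of thumb from the quip; an illustration, not a metric."""
    return changed_lines ** 2


# One 100-line commit versus the same 100 lines as ten 10-line commits.
monster_commit = heuristic_bug_score(100)
small_commits = sum(heuristic_bug_score(10) for _ in range(10))

print(monster_commit)  # 10000
print(small_commits)   # 1000
```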

> Besides that, what are the differences between managing a project with 
> a centralized SCM and one with a distributed SCM? What is the equivalent 
> of committers, of giving full and partial commit access, of revoking 
> commit access?

I have to admit that I drive one of my projects "CVS" style, with SSH 
accounts for all developers, who push into the same repo.

But that worked quite well up to now.

If I _had_ to restrict them, I'd probably do that by (temporarily) 
assigning a release engineer, and setting up some hook scripts in all 
repos. But I don't believe in restriction when it comes to creativity.

> How does good support for tagging and branching influence writing code 
> and the build procedure?

> Is a distributed SCM better geared towards the "benevolent dictator" 
> model than the "consensus-based democracy" model, as described in the 
> book?

Not at all. I think the best example is kernel.org, where you find tons of 
forks. IMHO it is really helping the benevolent dictator cave in to the 
consensus-based model, since forks can be preferred at any time. Hey, even 
switching from one upstream to another is just a git-pull away!

Ciao,
Dscho


* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01  9:35 ` Johannes Schindelin
@ 2007-05-01 15:23   ` Theodore Tso
  2007-05-01 15:45     ` Johannes Schindelin
  2007-05-01 18:30   ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Theodore Tso @ 2007-05-01 15:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git

On Tue, May 01, 2007 at 11:35:54AM +0200, Johannes Schindelin wrote:
> > [...]
> > 
> >   Because a fork is bad for everyone (for reasons examined in detail in 
> >   the section called "Forks" in Chapter 8, Managing Volunteers, 
> >   http://producingoss.com/producingoss.html#forks), the more serious the 
> >   threat of a fork becomes, the more willing people are to compromise to 
> >   avoid it.
> 
> This is a lousy argument, IMHO.
> 
> Why are forks bad? They are not. But if you "learnt" that merges are hard, 
> they are.
> 
> It is a pity that so many people were trained in CVS, and keep thinking 
> some of the lectures were true, when they are no longer.
> 
> Forks are good. In fact, we all "forked" with CVS as soon as we began 
> hacking. Everybody who claims to never have started over from a fresh 
> checkout, or from an "update -C"ed state, is probably lying, or a bad 
> developer. Thinking about it, I believe that the difference between 
> forking and branching is philosophical, not technical. You can always 
> merge a fork.

There's a confusion going on here between a "fork" meaning a branch in
the SCM sense of the word, and a "Project Fork" where there are two
camps competing for developers and users.  So for example, having
kernel developers develop using branches which are then merged into
the -mm tree and then into Linus's tree --- Good.  In the
suspend-to-disk world, where we have *three* separate implementations,
with two in the mainline tree, and one very popular one, suspend2,
with features that neither of the in-mainline implementations have,
and with Pavel constantly casting aspersions at Nigel because he's
splitting the development effort --- Not So Good.

I prefer to use the term "branch" to talk about a SCM and development
series, and to use the term "fork" to talk about the political/project
issues.  So for example, even though Ingo Molnar's CONFIG_PREEMPT_RT
patchset has been a very long-running thing, it is constantly getting
rebased against the kernel, and there is no expectation that this
would replace the mainline kernel.  That makes a code branch, and not
a fork.

So my suggestion is to let branches be branches, and to reserve fork
for when there is an attempt to compete for developer and user
attention.  That is more or less the general understanding of the two
terms, and conflating the two only leads to confusion and a general
muddying of the waters.

Regards,

					- Ted


* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01 15:23   ` Theodore Tso
@ 2007-05-01 15:45     ` Johannes Schindelin
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2007-05-01 15:45 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jakub Narebski, git

Hi,

On Tue, 1 May 2007, Theodore Tso wrote:

> On Tue, May 01, 2007 at 11:35:54AM +0200, Johannes Schindelin wrote:
> > > [...]
> > 
> > Forks are good. In fact, we all "forked" with CVS as soon as we began 
> > hacking. Everybody who claims to never have started over from a fresh 
> > checkout, or from an "update -C"ed state, is probably lying, or a bad 
> > developer. Thinking about it, I believe that the difference between 
> > forking and branching is philosophical, not technical. You can always 
> > merge a fork.
> 
> There's a confusion going on here between a "fork" meaning a branch in 
> the SCM sense of the word, and a "Project Fork" where there are two 
> camps competing for developers and users.

So you agree! I said that it is a philosophical, and not a technical 
issue.

> So for example, having kernel developers develop using branches which 
> are then merged into the -mm tree and then into Linus's tree --- Good.  
> In the suspend-to-disk world, where we have *three* separate 
> implementations, with two in the mainline tree, and one very popular 
> one, suspend2, with features that neither of the in-mainline 
> implementations have, and with Pavel constantly casting aspersions at 
> Nigel because he's splitting the development effort --- Not So Good.

But why? Because Pavel is just ignoring reality. I always wondered why the 
work of Nigel was never considered for inclusion, even if it was clearly 
superior from a usability viewpoint.

And if it is usable, but not clean, then clean it up. Instead, Pavel seems 
never to have even considered casting his planet-sized ego aside, 
admitting that his work is just not up to par with Nigel's, and starting 
to clean up suspend2.

So in that case, I am even _more_ happy that forking is so easy, because I 
did not _have_ to suffer all that much from people who cannot enter my 
flat because their head does not fit through the door, but I could just 
happily use suspend2 and be fine.

BTW the same goes for Reiser4, which is quite fast and flexible, and I do 
not care at all about the ardent discussions around it.

> I prefer to use the term "branch" to talk about a SCM and development 
> series, and to use the term "fork" to talk about the political/project 
> issues.  So for example, even though Ingo Molnar's CONFIG_PREEMPT_RT 
> patchset has been a very long-running thing, it is constantly getting 
> rebased against the kernel, and there is no expectation that this would 
> replace the mainline kernel.  That makes a code branch, and not a fork.

I refuse to get involved in such a sophistic (not to be confused with 
sophisticated) discussion.

I am _only_ interested in the technical side. Philosophical discussions, 
while fun when not taken too seriously, _can_ take all the fun out for me 
when the participants get too religious about their beliefs. So please, 
keep me out of them.

Ciao,
Dscho


* Re: "Producting Open Source Software" book and distributed SCMs
  2007-04-29 23:20 "Producting Open Source Software" book and distributed SCMs Jakub Narebski
  2007-05-01  9:35 ` Johannes Schindelin
@ 2007-05-01 16:15 ` Linus Torvalds
  2007-05-01 22:27   ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2007-05-01 16:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Mon, 30 Apr 2007, Jakub Narebski wrote:
> 
> Among other things, the author advocates using a version control system
> as a basis for running a project. In "Choosing a Version Control System"
> he writes:
> 
>   As of this writing, the version control system of choice in the free
>   software world is the Concurrent Versions System or CVS.

Well, I actually personally suspect that the original Linux method of 
"patches + tar-balls" is a perfectly valid method of source control 
management, and in many ways preferable over CVS.

So no, I don't think using a version control system should be the _basis_ 
of running a project. Version control comes pretty far down the list, long 
long after "good taste" and "willingness to do things rather than talk 
about them", the latter of which tends to kill more hypothetical projects 
than even CVS has ever done.

The _basis_ of an open source project is a good manager, a good idea, and 
a realization that what matters most is _using_ the end result, rather 
than the idea or discussions or "cool features".

The SCM becomes relevant only once you are far enough along that tar-balls 
and patches really don't work, and that might well take years.

[ I'm really serious: I think a lot of the good practices that the kernel 
  project has gotten are there exactly because of the "patches rule" 
  mentality. 

  We now use real revision control, but I really *really* believe that 
  pushing patches around is a much better way of managing stuff than with 
  CVS or any other centralized model, because in the centralized model it 
  always ends up being about the "core team". In contrast, even if there 
  is a core team, if they just push patches around and discuss them as 
  such, non-core-team members are automatically basically all equal.

  And avoiding the politics, and avoiding the "five people are special" 
  mentality is a *lot* more important than the limited and broken tracking 
  capabilities that CVS brings to the table.

  So maybe I'm just in denial, but I really believe that the fact that the 
  kernel was basically maintained _without_ an SCM for a decade was 
  actually a *good* thing, considering the alternatives. ]

> Further on, many of the examples of managing a project and managing
> volunteers revolve around the idea of "commit access", and it is
> implicitly assumed that the version control system is centralized.

Karl Fogel is wrong. 

It's an understandable mistake to make, since commit access is so important 
in a centralized environment, and he probably has never used anything else 
(even decentralized SCM's are often _used_ as centralized ones), but he's 
still *wrong*. Fundamentally so:

> Distributed SCMs are mentioned in a footnote in the section "Committers"
> in Chapter 8, Managing Volunteers:
> 
>  http://producingoss.com/producingoss.html#ftn.id284130
> 
>   [22] Note that the commit access means something a bit different in
>   decentralized version control systems, where anyone can set up a
>   repository that is linked into the project, and give themselves commit
>   access to that repository. Nevertheless, the concept of commit access
>   still applies: "commit access" is shorthand for "the right to make
>   changes to the code that will ship in the group's next release of the
>   software." In centralized version control systems, this means having
>   direct commit access; in decentralized ones, it means having one's
>   changes pulled into the main distribution by default. It is the same
>   idea either way; the mechanics by which it is realized are not
>   terribly important.

That's just making excuses. Yes, you can use the same words, and say that 
you call the two TOTALLY DIFFERENT things "commit access", and then, 
because you've made two totally different things use the same term, you 
claim that it's the same thing, and the differences aren't "terribly 
important".

It's like saying that a distributed (or threaded, for that matter) 
algorithm and a linear algorithm both result in the same result, so the 
"mechanics" of the algorithm are not terribly relevant: they're both 
algorithms.

Anybody who has ever done any distributed algorithms realizes that the 
mechanics are *hugely* important.  The difference between a distributed 
situation and a centralized one is absolutely humongous. It changes 
literally everything.

Does the fact that you *can* run a distributed algorithm on one machine 
make it the same? No. Does the fact that the end result is called the same 
make the two the same? No. It's a totally different model, and they share 
almost none of the issues.

When it comes to "commit access", not only is the term nonsensical in a 
distributed environment, even if you want to use that term to describe the 
notion of "gets pulled into the next release", it's not even TRUE.

People like Andrew, Ingo, and Davem have what Karl would probably call 
"commit access". Andrew and Ingo have it even though they don't actually 
even use git to synchronize with me. But no, they don't actually get 
pulled into the next release by default _anyway_ - there's always a 
conscious choice after the fact, rather than any implicit permission.

I quite often tell maintainers that I won't pull their stuff, simply 
because the changes look too scary, and I'm too close to a release. Yes, 
it happens less often than me just silently pulling it, but that's not a 
sign of "commit access", that's a sign of the fact that the process 
_works_ in the first place. If we spent all our time arguing about it, and 
people didn't just "know" how to behave, we'd never get anything done.

So that "get pulled by default" has _nothing_ to do with commit access, 
and everything to do with much higher-level process issues. And it's 
something that distributed development makes _possible_ in a way that the 
centralized model with "commit access" simply does not.

Miles and miles apart. And a very important distinction.

(Btw, I'll argue that it's really important inside companies too, even 
when the source control in question is "controlled". When you do things 
like validation, you shouldn't just allow "commit access" to the tree to 
be validated. The validation group should maintain a tree that *they* 
control, and getting things accepted into their tree should be just one 
step on a "release schedule")

> Linus has said that fully distributed SCM improves forkability:

Yes. There are two issues to forkability:

 - all real development happens as "micro-forks", and so you should make 
   that easy, whether it's an "inside" developer or somebody else who just 
   has a wild and crazy idea that might just work.

 - all real _honesty_ comes from a belief that the code *can* be forked, 
   and that even the original developer and/or top maintainer cannot force 
   his world-view on anybody.

Both of these are important, but the latter is important not because it 
should be the "normal case", but just because the _knowledge_ that a fork 
can happen should keep people honest.

Big forks due to fundamental personality clashes (they are sometimes 
about technology, but even when they are ostensibly about technology 
issues, they are often very much about strong personal ideas about that 
technology) are painful. But they should be painful not because of the SCM 
in question, but simply because handling personality issues is inherently 
painful.

The SCM shouldn't allow people to be a*-holes and control freaks.

And I think Karl Fogel agrees with me on that. When he says 

   .. the more serious the threat of a fork becomes, the more willing 
   people are to compromise to avoid it.

he's right on the money, and I _think_ he meant it in the good way 
(compromise and trying to work with people is absolutely a _must_).

> Besides that, what are the differences between managing a project with 
> a centralized SCM and one with a distributed SCM? What is the equivalent 
> of committers, of giving full and partial commit access, of revoking 
> commit access?

So here's what happens for the kernel:

 - we simply don't *have* commit access

 - there's no "partial", and there's no "revoking"

 - there are people I trust, but I don't trust them implicitly in the 
   sense that I give them the keys to my repository. If they go crazy, 
   there's nothing to revoke. NOTHING. If they go crazy, I just don't pull 
   from them. It's really that easy!

 - there are people I trust in certain areas, but that doesn't mean that 
   they can't make changes everywhere. It just means that I won't pull 
   unless I see that the changes are only to those areas.

   And again, it's not an "up-front" decision: when people ask me to pull, 
   they tell me (by way of a diffstat) what they changed, and I can - and 
   actually do this, although mostly because it avoids mistakes - verify 
   it, because the pull always tells me what got changed.

 - In fact, what happens occasionally is that I pull something, and tell 
   people "nope, that won't do" and just discard their changes. It doesn't 
   happen every day, but it happened yesterday - David Miller (who is one 
   of the top developers) sent me a fix, I fetched it and told him it was 
   incomplete and I wouldn't pull until it was fixed.

Notice? No partial commit access, no revoking, no granting. No politics. 
No up-front "you have rights". Just a very basic issue: trust.
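
The "only to those areas" check can be pictured as a tiny script. This is 
a sketch under assumptions, not how pulls are actually reviewed: the 
function name paths_outside_areas, the directory names and the file list 
are hypothetical, and the changed paths are assumed to have been copied 
from the diffstat in a pull request.

```python
from pathlib import PurePosixPath


def paths_outside_areas(changed_paths, trusted_areas):
    """Return the changed paths that lie outside every trusted directory."""
    areas = [PurePosixPath(area).parts for area in trusted_areas]
    outside = []
    for changed in changed_paths:
        parts = PurePosixPath(changed).parts
        # A path is inside an area when the area's components are a prefix
        # of the path's components (so "network/" does not match "net").
        if not any(parts[:len(area)] == area for area in areas):
            outside.append(changed)
    return outside


# Hypothetical pull request from someone trusted for the "net" directory.
changed = ["net/ipv4/tcp.c", "net/core/dev.c", "fs/ext3/inode.c"]
unexpected = paths_outside_areas(changed, ["net"])
print(unexpected)  # ['fs/ext3/inode.c']
```

An empty result means the request stays within the area it claims. A 
non-empty one is not a rejection, just a prompt to ask for the 
explanation.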

And the nice thing about this is that if some subsystem needs to make 
trivial changes to another subsystem, they don't need to ask for 
permission. They just do them, AND THEN THEY EXPLAIN THEM! And if they 
really were trivial and obvious (and that's almost always the case), they 
just get pulled normally. No special dispensation.

This is something that a centralized repository with commit access 
fundamentally *cannot* do! If a maintainer who has partial commit access 
needs to fix something else in order to make his subtree work, he's 
basically screwed. He cannot commit his changes to *his* area, just 
because they depend on a fix to another person's area, and he cannot commit 
that.

Centralized SCM's are *fundamentally* broken. And the whole "commit 
access" is very much part of that breakage. A distributed system doesn't 
have it, doesn't need it, and is much much better off without it!

This is why I said Karl was totally off when he said that there's an 
equivalent to "commit access" in a distributed system too. It's just not 
true. Everything that people use "commit access" for just entirely goes 
away!

> How does good support for tagging and branching influence writing code 
> and the build procedure? Is a distributed SCM better geared towards the 
> "benevolent dictator" model than the "consensus-based democracy" model, 
> as described in the book?

I think branching is so fundamental to being distributed, that asking 
whether good support for something like that is important for build 
procedure is just not a valid question. It's like asking "How important is 
water to your social life?" It's supremely important in the sense that 
without water, you wouldn't have a social life, but that's because you 
wouldn't _exist_ in the first place. But does that make water _directly_ 
important to your social life? Probably not, unless your life revolves 
around playing water polo with your buddies.

Same goes for the benevolent dictator vs consensus-based model. I think 
the distributed setup has advantages for both, and the advantages are much 
more fundamental than anything direct. You can use distributed for either 
model, and in both cases, the tools a distributed system gives you are 
just different (and much better). 

			Linus


* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01  9:35 ` Johannes Schindelin
  2007-05-01 15:23   ` Theodore Tso
@ 2007-05-01 18:30   ` Jakub Narebski
  2007-05-01 23:13     ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2007-05-01 18:30 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi

On Tuesday, 1 May 2007, Johannes Schindelin wrote:
> On Mon, 30 Apr 2007, Jakub Narebski wrote:
> 
>> Linus has said that fully distributed SCM improves forkability:
>>
>> [...] 
>> 
>>   I think that "forking" is what keeps people honest. The _biggest_
>>   downside with CVS is actually that a central repository gets so much
>>   _political_ clout, that it's effectively impossible to fork the
>>   project: [...]
>> 
>> According to "Producing Open Source Software", forkability is a very 
>> important feature for an OSS project.
>>
>> [...]
>> 
>>   Because a fork is bad for everyone (for reasons examined in detail in 
>>   the section called "Forks" in Chapter 8, Managing Volunteers, 
>>   http://producingoss.com/producingoss.html#forks), the more serious the 
>>   threat of a fork becomes, the more willing people are to compromise to 
>>   avoid it.
> 
> This is a lousy argument, IMHO.
> 
> Why are forks bad? They are not. But if you "learnt" that merges are hard, 
> they are.
> 
> It is a pity that so many people were trained in CVS, and keep thinking 
> some of the lectures were true, when they are no longer.
> 
> Forks are good. In fact, we all "forked" with CVS as soon as we began 
> hacking. Everybody who claims to never have started over from a fresh 
> checkout, or from an "update -C"ed state, is probably lying, or a bad 
> developer. Thinking about it, I believe that the difference between 
> forking and branching is philosophical, not technical. You can always 
> merge a fork.

IIRC Compiz and Beryl (fork of Compiz) plan to be merged. Both projects
use git as SCM. We will see how this "merge a fork" will work.

In "Producting Open Source Software" Karl Fogel gives an example of
GCC/EGCS fork, which resulted in "fast forward" merge (EGCS which was
fork of GCC, became next version of GCC). Similar example is XFree86/X.Org
fork; Linux distributions went from packaging XFree86 to packaging X.Org.

But the GNU Emacs / XEmacs fork, for example, will never be merged, I think.
So you cannot always merge a fork: you can try, but it stops being practical
once the codebases have diverged too much.

>> Is distributed SCM better geared towards "benevolent dictator"
>> model than "consensus-based democracy" model, as described in
>> OSSbook?
>
> Not at all. I think the best example is kernel.org, where you find
> tons of forks. IMHO it is really helping the benevolent dictator cave
> into the consensus-based model, since forks can be preferred at any
> time. Hey, even switching from one to another upstream is just a
> git-pull away!

What is or is not a fork is a bit blurry in the world of distributed
version control systems. Is a clone of a repository a fork? I think that
everybody would agree that it is not. Is the *-mm tree, for example, a fork?
I'd say not. But I'd say that Beryl is a fork of Compiz...

-- 
Jakub Narebski
ShadeHawk on #git
Poland

* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01 16:15 ` Linus Torvalds
@ 2007-05-01 22:27   ` Jakub Narebski
  2007-05-01 22:45     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2007-05-01 22:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds wrote:

> And the nice thing about this is that if some subsystem needs to make 
> trivial changes to another subsystem, they don't need to ask for 
> permission. They just do them, AND THEN THEY EXPLAIN THEM! And if they 
> really were trivial and obvious (and that's almost always the case), they 
> just get pulled normally. No special dispensation.

Actually Karl Fogel wrote in "Producing Open Source Software" that he
recommends and uses 'soft' partial commit access: committing is restricted
to a part of the project for some committers by a guideline, but the
restriction is not enforced by the tool (by the SCM).
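
To make the "guideline, not enforcement" idea concrete, here is a minimal
sketch of an advisory check. It is not from the book or from any real SCM
hook API; the AREAS table, the committer names and the function
outside_area are all made up for illustration. It only reports which
changed paths fall outside a committer's agreed area and never rejects
anything.

```python
# Hypothetical guideline table (an assumption for this sketch):
# committer -> path prefixes they have agreed to stay within.
AREAS = {
    "alice": ("docs/",),
    "bob": ("src/net/", "docs/"),
}


def outside_area(committer, changed_paths, areas=AREAS):
    """Return the changed paths that fall outside the committer's area.

    The result is advisory only: the caller may print a reminder,
    but nothing here blocks the commit.
    """
    allowed = tuple(areas.get(committer, ()))
    # str.startswith accepts a tuple of prefixes; an empty tuple matches nothing,
    # so an unknown committer gets every path reported back.
    return [path for path in changed_paths if not path.startswith(allowed)]
```

A wrapper could print the returned paths as a reminder and let the commit
through, which is the whole difference between a guideline and an access
control list.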

P.S. I recommend actually reading the book (at http://producingoss.com)
instead of relying on my understanding of it.

-- 
Jakub Narebski
ShadeHawk on #git
Poland

* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01 22:27   ` Jakub Narebski
@ 2007-05-01 22:45     ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2007-05-01 22:45 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Wed, 2 May 2007, Jakub Narebski wrote:
> 
> Actually Karl Fogel wrote in "Producing Open Source Software" that he
> recommends and uses 'soft' partial commit access; it means that committing
> is restricted to a part of project for some by a guideline, but is not
> enforced by the tool (by SCM).

Oh, absolutely. Except that really does require a lot of trust up front, 
which is the problem with commit access to begin with - you automatically 
have a very clear (and *big*) difference between insiders and outsiders, 
and there is no "gradual" way to move from one to the other.

So yes, for practical reasons, "commit access" really is almost always an 
all-or-nothing thing for most centralized setups, because nothing else 
really works. And when it isn't, it's just a horrible horrible pain in the 
*ss.

What people do instead of commit access is to set up triggers to notify 
people about certain subsystems being modified. Which is a good idea, but 
it's really a totally different thing.
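
As a rough illustration of such a trigger (a sketch only; the WATCHERS
table, the example.org addresses and the function who_to_notify are
assumptions, not any particular SCM's hook interface), the core of it is a
lookup from changed paths to the people who asked to hear about that
subsystem:

```python
# Hypothetical watch list (an assumption for this sketch):
# subsystem path prefix -> addresses that want to be told about changes there.
WATCHERS = {
    "mm/": ["vm-maintainer@example.org"],
    "net/": ["net-maintainer@example.org", "net-reviewer@example.org"],
}


def who_to_notify(changed_paths, watchers=WATCHERS):
    """Return the sorted, de-duplicated addresses watching any changed path."""
    recipients = set()
    for path in changed_paths:
        for prefix, addresses in watchers.items():
            if path.startswith(prefix):
                recipients.update(addresses)
    return sorted(recipients)
```

Note that this only tells people after the fact; it says nothing about who
is allowed to make the change.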

> P.S. I recommend actually reading the book (at http://producingoss.com)
> instead of relying on my understanding of it.

It actually looks like a fine book, even though I think Karl is totally 
off in not seeing the big difference between centralized and distributed. 

I saw it at the local Borders, and considered buying it. I didn't even 
realize that it apparently is downloadable too.

And it talks about a lot of other things than just SCMs.

			Linus

* Re: "Producting Open Source Software" book and distributed SCMs
  2007-05-01 18:30   ` Jakub Narebski
@ 2007-05-01 23:13     ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2007-05-01 23:13 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Johannes Schindelin, git

On Tue, 1 May 2007, Jakub Narebski wrote:
> 
> In "Producing Open Source Software" Karl Fogel gives an example of
> GCC/EGCS fork, which resulted in "fast forward" merge (EGCS which was
> fork of GCC, became next version of GCC).

The egcs fork was a total disaster, and a big part of that was CVS and the 
tight control of the gcc tree. 

It took _years_ for people to get so fed up with the gcc maintenance that 
the egcs tree happened at all, and it was a prime example of how *painful* 
CVS makes this, and how it allowed the gcc maintainers to do a really bad 
job, and ignore a whole lot of major problems simply because the whole gcc 
setup was so hard to get into.

So yes, the egcs fork is a great example. It was not only a required (and 
very good) fork, but it is _also_ an example of a setup where all the 
infrastructure made the fork take a lot longer to materialize and be a lot 
more painful than it should have been.

> But for example GNU Emacs / XEmacs fork will never be merged, I think.
> So not always you can merge a fork - you can try, unless codebase diverged
> too much.

In all honesty, I don't think any tools would help there. Git can make 
merging easier, but it cannot solve the fundamental differences in 
personality and it can't help with ten years of differences. Git tries to 
make merging easy by making it happen all the time, and thus the git 
merge capability really depends on changing the *model*. But git cannot 
really help you all that much if you have a decade of split, and the 
codebases just don't look similar any more.

(Not entirely true: git obviously does make merging easier, since people 
have piped up to say that they imported branches from SVN just to merge 
them in git and push the result back to SVN. So git _does_ help on the 
pure technical side too, but I think the even more important part is how 
git tries to encourage the model to be that one or both sides just merge 
often enough that the merges _stay_ easy).

> What is or is not a fork is a bit blurry in the world of distributed
> version control systems. Is a clone of repository a fork? I think that
> everybody would agree that it is not. Is for example *-mm tree a fork?
> I'd say not. But I'd say that Beryl is a fork of Compiz...

Well, the -mm tree is a fork, but perhaps the difference is that the 
_intention_ is to merge back.

We've had "real forks" in the kernel community too. Vendor branches for a 
while tended to be real forks - not because the vendors didn't want to 
merge back, but simply because they didn't have the capability and 
commitment to do so. That's changed, partly because 2.4->2.6 was so 
painful for some of them.

And the VM people have had real forks. The -aa tree was a real fork in the 
2.4.x timeframe.

So I think the reason kernel people don't really think about "-mm" as a 
fork is that we've tended to be pretty amicable about the forks, whatever 
the intention was. I personally encourage them, for example, in ways that 
most other bigger open source projects do not. That makes it easier 
psychologically to fork, but more importantly, it also makes it easier to 
join back again, because there were generally no hard feelings, just 
differences of opinion on technical matters that didn't get to be _too_ 
personal.

So to _me_, the big issue is not so much forking, but joining it all back 
(ie merging). Forking should be trivial, and not even worthy of any real 
discussion. It should be a daily event, and sure, you'd expect the small 
forks to heavily outnumber the big ones, but none of that really matters 
if you just consider forking to not be a big deal - and always realize 
that joining back is where the interesting stuff happens!

			Linus
