git@vger.kernel.org mailing list mirror (one of many)
* Should branches be objects?
@ 2014-06-19 21:37 Nico Williams
  2014-06-19 23:46 ` Jonathan Nieder
  0 siblings, 1 reply; 6+ messages in thread
From: Nico Williams @ 2014-06-19 21:37 UTC
  To: git

[I'm a list newbie here, but a git power user.]

If branches were objects...

 - one could see the history of branches, including

 - how commits were grouped when pushed/pulled (push 5 commits, and
the branch object will record that its head moved by those five
commits at once)

 - rebase history (git log <branch-object> -> better than git reflog!)

 - object transactional APIs would be used to update branches

Branch objects might be purely local, recording what was done in a
local repo to a branch, but they might be pullable, to make branch
history viewable in clones.

Just a thought,

Nico
--


* Re: Should branches be objects?
  2014-06-19 21:37 Should branches be objects? Nico Williams
@ 2014-06-19 23:46 ` Jonathan Nieder
  2014-06-20  0:25   ` Nico Williams
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2014-06-19 23:46 UTC
  To: Nico Williams; +Cc: git, Ronnie Sahlberg

Hi,

Nico Williams wrote:

>  - one could see the history of branches, including

Interesting.  'git log -g' is good for getting that information
locally, but the protocol doesn't have a way to get it from a remote
server so you have to ssh in.  Ronnie (cc-ed) and I were talking
recently about whether it would make sense to update git protocol to
have a way to get at the remote reflogs more easily --- would that be
useful to you?

>  - how commits were grouped when pushed/pulled (push 5 commits, and
> the branch object will record that its head moved by those five
> commits at once)

The reflog on the server (if enabled) records this.

>  - rebase history (git log <branch-object> -> better than git reflog!)

The local reflog ('git log -g <branch>') records this.

>  - object transactional APIs would be used to update branches

Ronnie's recent ref-transaction code does this.

Thanks and hope that helps,
Jonathan


* Re: Should branches be objects?
  2014-06-19 23:46 ` Jonathan Nieder
@ 2014-06-20  0:25   ` Nico Williams
  2014-06-20  0:31     ` Nico Williams
  2014-06-20  1:01     ` Jonathan Nieder
  0 siblings, 2 replies; 6+ messages in thread
From: Nico Williams @ 2014-06-20  0:25 UTC
  To: Jonathan Nieder; +Cc: git, Ronnie Sahlberg

On Thu, Jun 19, 2014 at 6:46 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Nico Williams wrote:
>
>>  - one could see the history of branches, including
>
> Interesting.  'git log -g' is good for getting that information
> locally, but the protocol doesn't have a way to get it from a remote
> server so you have to ssh in.  Ronnie (cc-ed) and I were talking
> recently about whether it would make sense to update git protocol to
> have a way to get at the remote reflogs more easily --- would that be
> useful to you?

Yes and no.  I've thought about that same concept, but:

a) reflogs include information about what's done to the workspace
(checkout...) that's not relevant to any branch,

b) reflogs aren't objects, which ISTM has caused transactional issues
(even if they are fixed or soon to be),

c) the fewer kinds of things, the more elegant the design, so maybe
reflogs ought to be objects themselves, which is one thought that led
me to "branches should be objects".

Another thought that led me there is that I often do:

$ git checkout -b ${branch}-rebase1
$ git rebase -i master
...
$ git checkout -b ${branch}-rebase2
$ git rebase -i master
...

I iterate through this until a set of commits is the way the upstream wants it.

No one really needs that history, except me: possibly to show my
boss/customer, possibly to put together a list of changes I've done to
show the upstream maintainer, ...   Yes, this is in the reflog, but...
it's mixed up with unrelated stuff.

Also, I'd like to be able to git diff
<branch-version>..<same-branch-diff-branch-version>.  Again, for my
own purposes in collating changes I've done to previously submitted
PRs.

Now, I can do that as I always have, but it litters my branch namespace.

Lastly, there are people who just don't get rebasing.  They think it's
horrible because it changes the truth.  You've met them, I'm certain.
Branches as objects might help mollify them.

>>  - how commits were grouped when pushed/pulled (push 5 commits, and
>> the branch object will record that its head moved by those five
>> commits at once)
>
> The reflog on the server (if enabled) records this.

Yeah, though as you point out I can't see it.

>>  - rebase history (git log <branch-object> -> better than git reflog!)
>
> The local reflog ('git log -g <branch>') records this.

See above.

>>  - object transactional APIs would be used to update branches
>
> Ronnie's recent ref-transaction code does this.

Speaking of which: are there any power failure corruption cases left
in git?  How is this tested?

Nico
--


* Re: Should branches be objects?
  2014-06-20  0:25   ` Nico Williams
@ 2014-06-20  0:31     ` Nico Williams
  2014-06-20  1:01     ` Jonathan Nieder
  1 sibling, 0 replies; 6+ messages in thread
From: Nico Williams @ 2014-06-20  0:31 UTC
  To: Jonathan Nieder; +Cc: git, Ronnie Sahlberg

Another thing is that branches as objects could store a lot more
information, like:

 - the merge-base and HEAD for a rebase (and the --onto)

 - the interactive rebase plan!  (and diffs to what would have been
the non-interactive plan)

 - the would-be no-op non-interactive rebase plan post rebase (again,
to elucidate what commit splitting and such things occurred during a
rebase)

Nico
--


* Re: Should branches be objects?
  2014-06-20  0:25   ` Nico Williams
  2014-06-20  0:31     ` Nico Williams
@ 2014-06-20  1:01     ` Jonathan Nieder
  2014-06-20  2:27       ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Nieder @ 2014-06-20  1:01 UTC
  To: Nico Williams; +Cc: git, Ronnie Sahlberg

Nico Williams wrote:

> a) reflogs include information about what's done to the workspace
> (checkout...) that's not relevant to any branch,

Nope, reflogs just record changes to refs and information about why
they happened.
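
To make that concrete, here is a minimal Python sketch that parses one
reflog entry.  It assumes the on-disk layout used by the files backend
(one line per ref update under .git/logs/<refname>: old value, new
value, identity, timestamp, timezone, then a tab and the reason).  The
ReflogEntry type, the parse_reflog_line helper, and the sample line are
made up for illustration and are not part of git.

```python
import re
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    """One ref update: where the ref was, where it went, who, when, why."""
    old: str
    new: str
    who: str
    timestamp: int
    tz: str
    message: str


# Assumed layout: "<old> <new> <identity> <unix-time> <tz>\t<reason>"
_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<who>.*) (?P<ts>\d+) (?P<tz>[+-]\d{4})(?:\t(?P<msg>.*))?$"
)


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split a single reflog line into its fields (hypothetical helper)."""
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError("not a reflog line: %r" % line)
    return ReflogEntry(
        old=m["old"],
        new=m["new"],
        who=m["who"],
        timestamp=int(m["ts"]),
        tz=m["tz"],
        message=m["msg"] or "",
    )


# Fabricated sample entry; the object names are placeholders, not real commits.
sample = (
    "a" * 40 + " " + "b" * 40
    + " A U Thor <author@example.com> 1403221500 -0500"
    + "\tcommit (amend): reword subject\n"
)
entry = parse_reflog_line(sample)
print(entry.old[:7], "->", entry.new[:7], entry.message)
```

A single entry covers the whole old-to-new move, however many commits
lie between the two values, which is why a push of five commits shows
up as one record.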

> b) reflogs aren't objects, which ISTM has caused transactional issues
> (even if they are fixed or soon to be),

Not sure I understand.  Do you mean that if reflogs were named by their
content then they wouldn't need to be renamed when a ref is renamed?
Or are you referring to some other atomicity issue?

[...]
> $ git checkout -b ${branch}-rebase1
> $ git rebase -i master
> ...
> $ git checkout -b ${branch}-rebase2
> $ git rebase -i master
> ...
>
> I iterate through this until a set of commits is the way the upstream wants it.
>
> No one really needs that history, except me: possibly to show my
> boss/customer, possibly to put together a list of changes I've done to
> show the upstream maintainer, ...   Yes, this is in the reflog, but...
> it's mixed up with unrelated stuff.

Yes, this isn't something we do well at all.  It would be nice to have a
tool that can take two versions of a branch (from different refs, taken
from the reflog, or whatever) and visually represent what happened to
corresponding commits.

Thomas Rast started work on such a thing called tbdiff, which you can
find at https://github.com/trast/tbdiff.

[...]
> Also, I'd like to be able to git diff
> <branch-version>..<same-branch-diff-branch-version>.  Again, for my
> own purposes in collating changes I've done to previously submitted
> PRs.

Do you mean 'git diff mybranch mybranch@{3}' /
'git diff <mybranch> <mybranch>@{3.days.ago}'?

[...]
>>>  - object transactional APIs would be used to update branches
>>
>> Ronnie's recent ref-transaction code does this.
>
> Speaking of which: are there any power failure corruption cases left
> in git?  How is this tested?

What kind of power failure corruption are you talking about?  Git
usually updates files by writing a completely new file and then
renaming it into place, so depending on your filesystem this means it
is very hard or very easy to lose data with a power failure. :)

If you're on one of those filesystems where it is very easy and you
lose power a lot, you'll probably want to enable the
core.fsyncobjectfiles configuration option.  It might be worth adding
another knob like that for the other files git writes if someone is
interested.

Jonathan


* Re: Should branches be objects?
  2014-06-20  1:01     ` Jonathan Nieder
@ 2014-06-20  2:27       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2014-06-20  2:27 UTC
  To: Jonathan Nieder; +Cc: Nico Williams, git, Ronnie Sahlberg

On Thu, Jun 19, 2014 at 06:01:47PM -0700, Jonathan Nieder wrote:

> > Speaking of which: are there any power failure corruption cases left
> > in git?  How is this tested?
> 
> What kind of power failure corruption are you talking about?  Git
> usually updates files by writing a completely new file and then
> renaming it into place, so depending on your filesystem this means it
> is very hard or very easy to lose data with a power failure. :)

We use git-core on ext4 at GitHub, and we certainly have seen our share
of machines failing unexpectedly. We haven't seen any problems of this
nature[1] (but note that we journal data writes; you should also be fine
with ordered data writes, but data=writeback is likely disastrous).

> If you're on one of those filesystems where it is very easy and you
> lose power a lot, you'll probably want to enable the
> core.fsyncobjectfiles configuration option.  It might be worth adding
> another knob like that for the other files git writes if someone is
> interested.

You probably know this already, Jonathan, but to be clear:

Git always fsyncs pack writes. That knob controls fsyncing of loose
object files, but nothing else. So ref writes (and writing packed-refs)
could be corrupted on a filesystem that doesn't order data and metadata
writes (and there is currently no way to tell git to do otherwise).

My recommendation would be to steer clear of or reconfigure such systems,
but it also would not be very hard to add an optional fsync in those
cases.
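
For illustration only, the write-then-rename pattern with an optional
fsync might look like the Python sketch below.  This is not git's code
(git is written in C and takes a <ref>.lock file); the
write_file_atomically name and its fsync parameter are hypothetical.

```python
import os
import tempfile


def write_file_atomically(path: str, data: bytes, fsync: bool = False) -> None:
    """Write data to a temporary file, then rename it over path.

    Readers see either the old contents or the new contents, never a
    half-written file.  With fsync=True the data is flushed to stable
    storage before the rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            if fsync:
                # Force the file contents to disk before the rename.
                os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Without the fsync, a filesystem that may commit the rename before the
data can leave an empty or partial file under the final name after a
crash.  A more thorough version would also fsync the containing
directory so the rename itself is durable.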

-Peff

[1] We did have one case where after a crash packfiles would end up
    corrupted, but it turned out to be bad RAM in a battery-backed RAID
    card that was transparently caching (and losing) the writes.
    There's not much git can do when fsync lies to it, nor much the
    kernel can do when the hardware lies to it. :)

