git@vger.kernel.org mailing list mirror (one of many)
* [GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
@ 2019-04-09 15:39 Kapil Jain
From: Kapil Jain @ 2019-04-09 15:39 UTC
  To: git, Thomas Gummerer, Johannes Schindelin

Plan to implement the project.

Objective:
Teach git stash to handle unmerged index entries.

Description:
When the index is unmerged, git stash refuses to do anything. That is
unnecessary, though, as it could easily craft e.g. an octopus merge of
the various stages. A subsequent git stash apply can detect that
octopus and re-generate the unmerged index.


Implementation Idea:
Performing an octopus merge of all unmerged index entries at `stage n`
(n > 0) could solve the problem, but that raises two questions.

What if merging them produces conflicts?
In that case, we would store (commit) the conflicted state, so that it
can be regenerated when the stash is applied.

How do we store the conflicted files?
Create a tree from the merge using `git-write-tree`,
and then commit that tree using `git-commit-tree`.


Relevant Discussions:
https://colabti.org/irclogger/irclogger_log/git-devel?date=2019-04-05#l92
https://colabti.org/irclogger/irclogger_log/git-devel?date=2019-04-09#l47


Idea Execution Plan: Divided into 2 parts.

Part 1: Store the unmerged index entries. This part will work with
`git stash push`.

stash.sh: this file would be changed to accommodate the implementation below.

Step 1:
Extract all the unmerged entries from the index file and store them in a
temporary index file.

read-cache.c: this file is responsible for reading the index file, so
this implementation will probably end up in this file.
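A minimal sketch of Step 1, written against a hypothetical, simplified `struct entry` (a path plus a stage number) rather than git's real `struct cache_entry`, which keeps the stage inside its flags. It copies every entry whose stage is non-zero into a separate array that stands in for the temporary index; reading and writing actual index files is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage; /* 0 = merged, 1 = common ancestor, 2 = ours, 3 = theirs */
};

/*
 * Copy every unmerged entry (stage 1, 2 or 3) from `in` into `out`,
 * preserving order.  `out` must have room for `n` entries.
 * Returns the number of entries copied.
 */
static size_t extract_unmerged(const struct entry *in, size_t n,
			       struct entry *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		assert(in[i].stage >= 0 && in[i].stage <= 3);
		if (in[i].stage > 0)
			out[k++] = in[i];
	}
	return k;
}
```

Because the input order is preserved, the temporary array stays in the same path-then-stage order that the index itself uses.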

Step 2:
cache-tree.c: study and implement a slightly modified version of the
function `write_index_as_tree()`

int write_index_as_tree(struct object_id *oid, struct index_state *index_state,
                        const char *index_path, int flags, const char *prefix);

This function writes a tree from the index file. Currently it requires
the index to be in a fully merged state, which is the exact opposite of
what we are dealing with. So a version that can write a tree for
unmerged index entries will be implemented.
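One possible shape for that "version for unmerged entries", sketched under the assumption that the existing tree writer stays unchanged: keep only the entries at one stage and relabel them as stage 0, so that each stage looks like a fully merged index. The `struct entry` type and `project_stage()` are hypothetical names for illustration. This only works when no path appears twice at the chosen stage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage; /* 0 = merged, 1 = common ancestor, 2 = ours, 3 = theirs */
};

/*
 * Keep only the entries at `stage` and relabel them as stage 0, so the
 * result looks like a fully merged index that an unmodified tree writer
 * could accept (provided no path occurs twice at that stage).
 * `out` must have room for `n` entries; returns the number kept.
 */
static size_t project_stage(const struct entry *in, size_t n, int stage,
			    struct entry *out)
{
	size_t k = 0;

	assert(stage >= 1 && stage <= 3);
	for (size_t i = 0; i < n; i++) {
		if (in[i].stage == stage) {
			out[k] = in[i];
			out[k].stage = 0;
			k++;
		}
	}
	return k;
}
```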

Step 3:
write-tree.c: some changes may go here, so that it uses the
modified version of the write_index_as_tree() function.

Step 4:
Use git-commit-tree to commit the written tree, and store the resulting
hash in some file, say `stash_conflicting_merge`.

Step 5:
Write tests for everything implemented up to this point.

Part 2: Retrieve the tree hash and regenerate the state of the repository
as it was earlier.

Step 6:
Modify the implementation of `git stash apply` so that it regenerates the
unmerged index from the committed tree.
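Regenerating the unmerged index is roughly the inverse of writing one snapshot per stage: relabel each per-stage snapshot with its stage number and merge them back into index order (sorted by path, then by stage). A minimal sketch, assuming the three snapshots have already been read out of the stash as lists of stage-0 entries; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage;
};

/* Order entries the way the index does: by path, then by stage. */
static int cmp_entry(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	int c = strcmp(x->path, y->path);

	if (c)
		return c;
	return (x->stage > y->stage) - (x->stage < y->stage);
}

/*
 * Rebuild an unmerged entry list from three per-stage snapshots in which
 * every entry was stored at stage 0.  per_stage[s - 1] holds count[s - 1]
 * entries for stage s.  `out` must be large enough for all of them.
 * Returns the total number of entries written.
 */
static size_t restore_stages(const struct entry *per_stage[3],
			     const size_t count[3], struct entry *out)
{
	size_t k = 0;

	for (int s = 1; s <= 3; s++) {
		for (size_t i = 0; i < count[s - 1]; i++) {
			out[k] = per_stage[s - 1][i];
			out[k].stage = s; /* undo the relabeling to stage 0 */
			k++;
		}
	}
	qsort(out, k, sizeof(*out), cmp_entry);
	return k;
}
```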

Step 7:
Write tests.


* Re: [GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
@ 2019-04-10  4:40 ` Junio C Hamano
From: Junio C Hamano @ 2019-04-10  4:40 UTC
  To: Kapil Jain; +Cc: git, Thomas Gummerer, Johannes Schindelin

Kapil Jain <jkapil.cs@gmail.com> writes:

> Plan to implement the project.
>
> Objective:
>
> Description:
>
> Implementation Idea:
>
> Relevant Discussions:
>
> Idea Execution Plan: Divided into 2 parts.

Two things are missing before the implementation idea: the design and,
more importantly, the success criteria.  What lets you and your mentor
declare victory?

As to the design, it does not quite matter if you add four or more
separate trees to represent stage #[0123] entries in the index to
the already octopus merge commit that represents a stash entry
(i.e. when keeping the untracked ones, I think the stash entry's
"result of the merge" tree records the state of the tracked files in
the working tree, and the "result of the merge" commit records the
then-current HEAD, a commit that records the state of the index, and
another commit that records the state of the untracked files as its
parents; that's already a 3-parent octopus).
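For reference, the parent layout described in the parenthetical above can be written down as a small C sketch. The enum and `describe_stash()` are hypothetical names for illustration, not git's code; they only encode that stash^1 is the then-current HEAD, stash^2 records the index, and an optional stash^3 records the untracked files.

```c
#include <assert.h>

/* Roles of the parents of a stash commit (hypothetical names). */
enum stash_parent {
	STASH_PARENT_HEAD = 0,      /* stash^1: the then-current HEAD */
	STASH_PARENT_INDEX = 1,     /* stash^2: records the index */
	STASH_PARENT_UNTRACKED = 2, /* stash^3: untracked files, optional */
};

/* Describe a stash commit from its number of parents (hypothetical helper). */
static const char *describe_stash(int nr_parents)
{
	assert(nr_parents >= 0);
	switch (nr_parents) {
	case 2:
		return "tracked changes and index";
	case 3:
		return "tracked changes, index and untracked files";
	default:
		return "unknown layout";
	}
}
```

Counting parents is enough to tell the two existing layouts apart, but any extension that adds more parents also has to say what the optional third slot means when it is unused.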

The fact that a stash entry is represented as a merge commit is a
mere implementation detail, and there is *NO* need to worry about
resolving merge conflicts while recording a stash.  If the result of
this GSoC task is to be at all usable together with the current version
in a backward-compatible way, you must record these extra states as
extra parents of the merge, so it is sort of given already that
you'd be using some form of an octopus merge.

The real challenge would be how the unstashing part of such a stash
entry that records unmerged state should work.  Personally I do not
think it will be very useful to allow unstashing such a stash entry
on top of any arbitrary commit---rather, I suspect that the user
would want to come back to the exact HEAD the user had trouble
resolving conflicts at, without having to check it out first.
IOW, a usual way to use "git stash" is:

	$ git checkout topic
	$ edit edit edit
	... I am happily hacking away ...
	... the boss appears with an ultra-urgent task ...
	$ git stash push -m WIP
	$ git checkout master
	$ edit-and-build-and-test
	$ git commit
	... now the emergency is over ...
	$ git checkout topic
	... sync with the work others may have done on topic
	... while I was dealing with the boss
	$ git pull --rebase origin topic
	$ git stash pop

IOW, it is expected to be applied on top of an updated commit.

But I have a moderately strong suspicion that a stash that holds
unmerged state (i.e. a conflicted merge in progress) is created with
a use case in mind that is very different from the normal one.
When creating such a stash entry, the above sequence would go
more like this:

	$ git checkout topic
	$ git merge ...
	... oops, conflicted, and it takes time to resolve ...
	$ edit edit inspect edit
	... the boss appears
	$ git stash push -m "Merge in progress"
	$ git checkout master
	... deal with the emergency the same way ...
	$ git checkout topic
	... go back to the conflict resolution first without
	... touching what may have happened on the branch in
	... the meantime---a human brain cannot afford to deal
	... with two or more parallel conflicts at the same
	... time.
	$ git stash pop
	... now deal with the conflict we were looking at
	... before the boss interrupted us.
	$ edit inspect edit
	... be satisfied with the result
	$ git commit
	... now let's see if others have something else that
	... is interesting
	$ git pull --rebase origin topic

And if we assume that the primary use of a stash for a conflicted
state is to bring us back to the exact state (rather than allowing
us to pretend as if we started from a different HEAD), it might even
make sense to teach the "git stash pop" step to barf if HEAD does not
match the first parent of the merge commit that represents the stash
entry being applied (again, stash^{tree} is the working tree, and
stash^1 is the then-current HEAD).  That would make the application
side a lot simpler and more manageable by developers who are not
intimately familiar with the code.
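The "barf if HEAD does not match stash^1" check is small. From the command line the two commits can be compared with `git rev-parse HEAD` and `git rev-parse 'stash@{0}^1'`; inside git it would be a comparison of object IDs. The sketch below is hypothetical and passes object names as hex strings to stay self-contained; real code would compare `struct object_id` values instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Refuse to apply a stash that records unmerged entries unless HEAD is
 * exactly the commit the stash was created on (stash^1).  Object names
 * are hex strings here for illustration only.
 * Returns 0 when it is safe to proceed, -1 otherwise.
 */
static int check_stash_base(const char *head, const char *stash_base)
{
	assert(head != NULL && stash_base != NULL);
	if (strcmp(head, stash_base) != 0) {
		fprintf(stderr,
			"error: HEAD (%.7s) is not the commit the stash was made on (%.7s)\n",
			head, stash_base);
		return -1;
	}
	return 0;
}
```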

Others may disagree with the above assumption (i.e. "a stash for a
conflicted state does not have to be applicable on top of an arbitrary
HEAD"), though, making your task a lot harder ;-).

Quite honestly, I do not think you can design a system that attempts
to "stash apply/pop" a recorded unmerged state on top of any
arbitrary HEAD and leave a state useful for the end user to deal with
when the "stash apply/pop" step itself introduces _new_ conflicts
due to the differences between the then-current HEAD the stash entry
is based on and the HEAD the "stash apply" is attempted on top of.
Even the current "stash apply/pop with the change between the HEAD
and the index" does punt when it cannot make a clean application,
and that is without any unmerged entries in the recorded index
state.

The key point is "a state useful for the end user"---it is easy to
build a system that claims to leave a state created from the updated
HEAD and what's recorded in a stash entry that the end users cannot
use as a starting point to make progress, but that is not something
our users would want.

Have fun.


* Re: [GSoC] [RFC] Proposal: Teach git stash to handle unmerged index entries.
@ 2019-04-10  5:09   ` Junio C Hamano
From: Junio C Hamano @ 2019-04-10  5:09 UTC
  To: Kapil Jain; +Cc: git, Thomas Gummerer, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> As to the design, it does not quite matter if you add four or more
> separate trees to represent stage #[0123] entries in the index to
> the already octopus merge commit that represents a stash entry ...

I forgot that I was planning to expand on this part while writing
the message I am following up on.

There are a few things you must take into account while designing a
new format for a stash entry:

 - Your new feature will *NOT* be the last extension to the stash
   subsystem.  Always leave room for other developers to extend it
   further, without breaking backward compatibility when your new
   feature is in use.

 - Even though you may never have encountered them in your projects,
   higher-stage entries can have duplicates.  When merging two
   branches into your current branch, and there are three merge
   bases for such an octopus merge, the system (and the index
   format) is designed to allow a merge backend to store 3 stage #1
   entries (because there are that many common ancestor versions in
   the example), 1 stage #2 entry (because there is only one
   "current branch" a merge is made into) and 2 stage #3 entries
   (because there are that many other branches you are merging into
   the current branch), all for the same path.
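To make the duplicate-entry example concrete, here it is as data: one path with three stage #1 entries, one stage #2 entry and two stage #3 entries. The `struct entry` type, the `file.c` path and `count_at_stage()` are hypothetical, chosen only to illustrate the counts described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage;
};

/* The example above: one path, 3 common ancestors, 1 "ours", 2 "theirs". */
static const struct entry example[] = {
	{ "file.c", 1 }, { "file.c", 1 }, { "file.c", 1 },
	{ "file.c", 2 },
	{ "file.c", 3 }, { "file.c", 3 },
};

/* Count how many entries `path` has at `stage`. */
static size_t count_at_stage(const struct entry *ents, size_t n,
			     const char *path, int stage)
{
	size_t c = 0;

	for (size_t i = 0; i < n; i++)
		if (ents[i].stage == stage && !strcmp(ents[i].path, path))
			c++;
	return c;
}
```

Any stash format that stores "the" stage #1 version of a path as a single tree entry would drop two of the three common-ancestor versions here.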

So, a design that says:

   A stash entry in the current system is recorded as a merge
   commit, whose tree represents the state of the tracked working
   tree files, whose first parent records the HEAD commit the stash
   entry was created on, and whose second parent records the tree
   that would have been created if "git write-tree" were done on the
   index when the stash entry was created.  Optionally, it can have
   a third parent whose tree records the state of untracked files.

   Let's add three more parents.  IOW, the fourth parent's tree
   records the result of "git write-tree" of the index after
   removing all the entries other than those at stage #1 and moving
   the remainder from stage #1 down to stage #0, and similarly the
   fifth is for stage #2 and the sixth is for stage #3.

is bad on multiple counts.

 - It does not say what should happen to the third parent when this
   new "record unmerged state" feature is used without using the
   "record untracked paths" feature.

 - It does not allow multiple stage #1 and/or stage #3 entries.
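The second problem can be detected mechanically: a tree holds each path at most once, so moving the entries of one stage down to stage #0 loses information whenever some path has more than one entry at that stage. A minimal sketch, assuming entries are sorted by path and then stage (the order the index keeps them in); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage;
};

/*
 * Return true if some path has more than one entry at `stage`, i.e. if
 * writing a single tree for that stage would have to drop entries.
 * Because entries are sorted by path and then stage, entries at one stage
 * that share a path are adjacent once the other stages are skipped.
 */
static bool stage_has_duplicates(const struct entry *ents, size_t n, int stage)
{
	const char *prev = NULL;

	for (size_t i = 0; i < n; i++) {
		if (ents[i].stage != stage)
			continue;
		if (prev && !strcmp(prev, ents[i].path))
			return true;
		prev = ents[i].path;
	}
	return false;
}
```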

For the first point, I think a trick to record the same commit as
the first parent may be a good hack to say "this is not used"; we
might need to allow commit-tree not to complain about duplicate
parents if we go that route.
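A sketch of the placeholder trick, assuming the unmerged state goes into one extra (fourth) parent. When no untracked files are recorded, HEAD is repeated in the third slot to say "this is not used". The function `stash_parents()` is hypothetical, and object names are plain strings to keep it self-contained.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build the parent list of a stash commit: HEAD, index, untracked files,
 * then a hypothetical fourth parent recording the unmerged stages.
 * Without unmerged state, the existing layout is kept unchanged.  With
 * it, a missing untracked parent is replaced by HEAD as a placeholder,
 * which is why commit-tree would have to accept duplicate parents.
 * Returns the number of parents written to `out`.
 */
static size_t stash_parents(const char *head, const char *index,
			    const char *untracked, const char *unmerged,
			    const char *out[4])
{
	size_t n = 0;

	assert(head != NULL && index != NULL);
	out[n++] = head;
	out[n++] = index;
	if (!unmerged) {
		/* Existing layout: the third parent exists only if used. */
		if (untracked)
			out[n++] = untracked;
		return n;
	}
	out[n++] = untracked ? untracked : head; /* "not used" placeholder */
	out[n++] = unmerged;
	return n;
}
```

Keeping the two- and three-parent layouts untouched when there is no unmerged state means stash entries created without the new feature look exactly as they do today.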

For the second one, there may be multiple solutions.  A
quick-and-dirty and obvious way may be to add only one new parent to
the merge commit that represents a stash entry (i.e. the fourth
parent).  Make that new parent a merge of three commits, each of
which represents what was in stage #1, stage #2 and stage #3 (we can
reuse the second parent of the stash entry that usually records the
index state to store stage #0 entries).

As we allow multiple stage #1 or stage #3 entries in the index, and
there is no fundamental reason why we should not allow multiple
stage #2 entries, make each of these three commits able to represent
multiple entries at the same stage, perhaps by

 - iterate over the index and count the maximum occurrence of the
   same path at the same stage #$n;
 - make that stage #$n commit a merge of that many parent commits.
   The tree recorded in that stage #$n commit can be an empty tree.
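The counting step in the list above can be sketched as follows, again over a hypothetical `struct entry` and assuming the entries are in index order (sorted by path, then stage), so that entries sharing a path and a stage form one contiguous run. It only computes how many parents each stage commit would need; creating the commits is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct entry {
	const char *path;
	int stage;
};

/*
 * For each stage 1..3, find the largest number of entries any single path
 * has at that stage; that is how many parents the stage #n commit would
 * need.  Results go to max[0..2] for stages 1..3.
 */
static void max_per_stage(const struct entry *ents, size_t n, size_t max[3])
{
	max[0] = max[1] = max[2] = 0;

	for (size_t i = 0; i < n; ) {
		size_t j = i + 1;

		/* Find the end of the run sharing ents[i]'s path and stage. */
		while (j < n && ents[j].stage == ents[i].stage &&
		       !strcmp(ents[j].path, ents[i].path))
			j++;
		if (ents[i].stage >= 1 && ents[i].stage <= 3 &&
		    j - i > max[ents[i].stage - 1])
			max[ents[i].stage - 1] = j - i;
		i = j;
	}
}
```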

I am not saying this is a good design.  I am merely showing the
expected level of detail when your design gets into presentable
shape and is shared with the list.

Have fun.


