git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC][RFC] discussion about stashing with conflicts
@ 2019-04-07 13:08 Kapil Jain
  2019-04-07 18:38 ` Thomas Gummerer
  0 siblings, 1 reply; 8+ messages in thread
From: Kapil Jain @ 2019-04-07 13:08 UTC (permalink / raw)
  To: git, Johannes Schindelin, Thomas Gummerer

below is my understanding from reading the parts of code as suggested on IRC.

what is the use of ce_stage macro ?
tells about stage of an index entry.
if ce_stage says, stage #0 i.e staging area, then that index entry is
in staging area
and nothing needs to be done.
else a temporary index entry is created and repo_read_index_unmerged()
calls other function and tries to add it to index.
if it fails, it issues an error.

is this correct interpretation ?

1) in repo_read_index_unmerged(), why don't we make the value of
`unmerged` 0, if adding index entry is successful; as the entry is no
longer unmerged ?

2) what is ADD_CACHE_SKIP_DFCHECK ?
i am unsure if i get its meaning, cache.h says that it means "Ok to
skip DF conflict checks"
what are DF conflict checks ? something about diffing to check for
conflicts ? if so why are we skipping it this entry had conflicts in
the past maybe it will create again.

3) what is cache_nr variable in index_state struct ? what is its use ?


Now, about add_index_entry_with_check(), i don't fully understand this
function but concentrating on the part pointed by dscho.
https://github.com/git/git/blob/v2.21.0/read-cache.c#L1284-L1294

    /*
     * Inserting a merged entry ("stage 0") into the index
     * will always replace all non-merged entries..
     */

so this is the part we need to play with for the project
https://git.github.io/SoC-2019-Ideas/#teach-git-stash-to-handle-unmerged-index-entries
try and change this in some-way to not replace those unmerged entries
or store them some place else ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-07 13:08 [GSoC][RFC] discussion about stashing with conflicts Kapil Jain
@ 2019-04-07 18:38 ` Thomas Gummerer
  2019-04-08  5:48   ` Kapil Jain
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gummerer @ 2019-04-07 18:38 UTC (permalink / raw)
  To: Kapil Jain; +Cc: git, Johannes Schindelin

On 04/07, Kapil Jain wrote:
> below is my understanding from reading the parts of code as suggested on IRC.
> 
> what is the use of ce_stage macro ?
> tells about stage of an index entry.
> if ce_stage says, stage #0 i.e staging area, then that index entry is
> in staging area
> and nothing needs to be done.

I don't quite understand what you mean with "nothing needst to be
done" here.  In the context of teaching 'git stash' to handle unmerged
index entries, nothing that is not already being done needs to be done
with an index entry that is at stage #0.  The current implementation
already handles that correctly.

> else a temporary index entry is created and repo_read_index_unmerged()
> calls other function and tries to add it to index.
> if it fails, it issues an error.

Not sure what you mean here.  Index entries with higher stages are not
temporary, they are written out to the index file, and can then be
read back with 'repo_read_index()' for example.

repo_read_index_unmerged() drops entries to stage #0, and marks them
as CE_CONFLICTED, which callers can then use to distinguish index
entries that are conflicted and those that are not.

In the context of the GSoC project, this function is probably not
useful, because we need to separate out entries of different stages,
so we can create commits of those entries, as described in the thread
that's also linked from the project page [*1*].  Once
repo_read_index_unmerged() is called, we no longer know which entry is
at which stage (they are all at stage 0).

*1*: https://public-inbox.org/git/nycvar.QRO.7.76.6.1902072023250.41@tvgsbejvaqbjf.bet/

> is this correct interpretation ?
> 
> 1) in repo_read_index_unmerged(), why don't we make the value of
> `unmerged` 0, if adding index entry is successful; as the entry is no
> longer unmerged ?

Because the caller often wants to know if the index is unmerged in the
first place, and would refuse to operate on such an index.  Read the
comment documenting the function again, that explains this very
nicely.  Then see how some callers actually use the function, and
you'll see that they actually don't care about dropping the entry to
stage 0, but they care about knowing whether the index is unmerged or
not.

> 2) what is ADD_CACHE_SKIP_DFCHECK ?
> i am unsure if i get its meaning, cache.h says that it means "Ok to
> skip DF conflict checks"
> what are DF conflict checks ? something about diffing to check for
> conflicts ? if so why are we skipping it this entry had conflicts in
> the past maybe it will create again.

A DF conflict is a directory/file conflict, where one side of the
merge has created a file that has the same name as a directory created
on the other side of the merge.  From a bit of digging around in
git.git, t/t1000-read-tree-m-3way.sh seems to have some good
explanation on the various types of conflicts, not sure if it's
documented somewhere in Documentation/ in detail as well.

In this function we just care about creating an index that doesn't
contain any conflicts, so we are skipping the checks here.

> 3) what is cache_nr variable in index_state struct ? what is its use ?

If you look for cache_nr in the codebase you'll see that it's often
used when iterating through the whole index.  That could give you a
hint what it is.  Alternatively you could go have a look where it is
assigned, and what it is assigned to.  Looking through read-cache.c I
can find this line [*2*].  hdr_entries is in 'struct cache_header',
which is the header of the index file (cache is another name for the
index that's used in the codebase).

Now if you look at Documentation/technical/index-format.txt, you can
see that it's the number of entries that are in the index.

*2*: https://github.com/gitster/git/blob/041f5ea1cf987a4068ef5f39ba0a09be85952064/read-cache.c#L2163

> Now, about add_index_entry_with_check(), i don't fully understand this
> function but concentrating on the part pointed by dscho.
> https://github.com/git/git/blob/v2.21.0/read-cache.c#L1284-L1294
> 
>     /*
>      * Inserting a merged entry ("stage 0") into the index
>      * will always replace all non-merged entries..
>      */
> 
> so this is the part we need to play with for the project
> https://git.github.io/SoC-2019-Ideas/#teach-git-stash-to-handle-unmerged-index-entries
> try and change this in some-way to not replace those unmerged entries
> or store them some place else ?

It would be useful to repeat what the context is in which Dscho gave
you this code to read.  Looking back at the IRC logs, I can see:

    11:55:15 dscho | Also, here is some code that is helpful to understand
    what happens when unmerged entries are collapsed into a merged entry:
    https://github.com/git/git/blob/v2.21.0/read-cache.c#L1284-L1294

And the whole conversation was about trying to understand index
entries better, which is a fundamental part of the project, rather
than code that needs to be changed for the project itself.  Most
likely you will not change any functions in read-cache.c, but you'll
need an understanding of how they do things.

So the question is, did you read this function in depth and understand
what it does?  If you want to validate your understanding of the
function, try to repeat what it does in your own words, and ask for us
to correct you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-07 18:38 ` Thomas Gummerer
@ 2019-04-08  5:48   ` Kapil Jain
  2019-04-08 10:31     ` Duy Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Kapil Jain @ 2019-04-08  5:48 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Johannes Schindelin

On Mon, Apr 8, 2019 at 12:09 AM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 04/07, Kapil Jain wrote:
> >
> > what is the use of ce_stage macro ?
> > tells about stage of an index entry.
> > if ce_stage says, stage #0 i.e staging area, then that index entry is
> > in staging area
> > and nothing needs to be done.
>
> I don't quite understand what you mean with "nothing needst to be
> done" here.  In the context of teaching 'git stash' to handle unmerged
> index entries, nothing that is not already being done needs to be done
> with an index entry that is at stage #0.  The current implementation
> already handles that correctly.
>
> > else a temporary index entry is created and repo_read_index_unmerged()
> > calls other function and tries to add it to index.
> > if it fails, it issues an error.
>
> Not sure what you mean here.  Index entries with higher stages are not
> temporary, they are written out to the index file, and can then be
> read back with 'repo_read_index()' for example.

sorry, i failed to provide detailed explanation. below is what i meant.

in repo_read_index_merged(),
if ce_stage() macro says that this cache_entry is in stage #0 i.e.
already merged,
then the function doesn't try to add that entry into index.

if (!ce_stage(ce))
    continue;

but when it is not in stage #0; the function, creates a temporary cache_entry,

struct cache_entry *new_ce;
new_ce = make_empty_cache_entry(istate, len);

and tries to add it to index file.

if (add_index_entry(istate, new_ce, ADD_CACHE_SKIP_DFCHECK))
    return error(_("%s: cannot drop to stage #0"),
             new_ce->name);

now if this try of adding index entry is successful, then that entry
is no longer unmerged, right ?
so can we make `unmerged` variable 0.

> >
> > 1) in repo_read_index_unmerged(), why don't we make the value of
> > `unmerged` 0, if adding index entry is successful; as the entry is no
> > longer unmerged ?
>
> Because the caller often wants to know if the index is unmerged in the
> first place, and would refuse to operate on such an index.  Read the
> comment documenting the function again, that explains this very
> nicely.  Then see how some callers actually use the function, and
> you'll see that they actually don't care about dropping the entry to
> stage 0,

>but they care about knowing whether the index is unmerged or
> not.

if they care about whether the index *is* unmerged, and that call to
add_index_entry()
is successful, then index is no longer unmerged (at least because of
that index_entry).
is it possible that they care about if index *was* unmerged ?


> So the question is, did you read this function in depth and understand
> what it does?  If you want to validate your understanding of the
> function, try to repeat what it does in your own words, and ask for us
> to correct you.

upcoming mail will do so.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-08  5:48   ` Kapil Jain
@ 2019-04-08 10:31     ` Duy Nguyen
  2019-04-08 11:00       ` Kapil Jain
  0 siblings, 1 reply; 8+ messages in thread
From: Duy Nguyen @ 2019-04-08 10:31 UTC (permalink / raw)
  To: Kapil Jain; +Cc: Thomas Gummerer, git, Johannes Schindelin

On Mon, Apr 8, 2019 at 12:49 PM Kapil Jain <jkapil.cs@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 12:09 AM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > On 04/07, Kapil Jain wrote:
> > >
> > > what is the use of ce_stage macro ?
> > > tells about stage of an index entry.
> > > if ce_stage says, stage #0 i.e staging area, then that index entry is
> > > in staging area
> > > and nothing needs to be done.
> >
> > I don't quite understand what you mean with "nothing needst to be
> > done" here.  In the context of teaching 'git stash' to handle unmerged
> > index entries, nothing that is not already being done needs to be done
> > with an index entry that is at stage #0.  The current implementation
> > already handles that correctly.
> >
> > > else a temporary index entry is created and repo_read_index_unmerged()
> > > calls other function and tries to add it to index.
> > > if it fails, it issues an error.
> >
> > Not sure what you mean here.  Index entries with higher stages are not
> > temporary, they are written out to the index file, and can then be
> > read back with 'repo_read_index()' for example.
>
> sorry, i failed to provide detailed explanation. below is what i meant.
>
> in repo_read_index_merged(),
> if ce_stage() macro says that this cache_entry is in stage #0 i.e.
> already merged,
> then the function doesn't try to add that entry into index.
>
> if (!ce_stage(ce))
>     continue;
>
> but when it is not in stage #0; the function, creates a temporary cache_entry,
>
> struct cache_entry *new_ce;
> new_ce = make_empty_cache_entry(istate, len);
>
> and tries to add it to index file.
>
> if (add_index_entry(istate, new_ce, ADD_CACHE_SKIP_DFCHECK))
>     return error(_("%s: cannot drop to stage #0"),
>              new_ce->name);
>
> now if this try of adding index entry is successful, then that entry
> is no longer unmerged, right ?
> so can we make `unmerged` variable 0.

There's a big block of comment on top saying return true if unmerged.
As you noticed, after the function returns, there will be no unmerged
entries left anyway. Always returning zero then does not even make
sense.

The main purpose of this function is probably just checking that there
are unmerged entries or not (and return 1 if so). The
add_index_entry() there is probably just to do some more validation.

Sometimes when I don't understand what some code does, I look at "git
log --patch". In this case, there a big explanation in ad3762042a
(read-cache: fix directory/file conflict handling in
read_index_unmerged(), 2018-07-31) that might help you.

Going further back to d147e501f3 (Builtin git-read-tree., 2006-05-23),
you can see that this function has two separate uses. One about the
return code, one about collapsing unmerged entries back to stage zero.
-- 
Duy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-08 10:31     ` Duy Nguyen
@ 2019-04-08 11:00       ` Kapil Jain
  2019-04-08 11:09         ` Duy Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Kapil Jain @ 2019-04-08 11:00 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Thomas Gummerer, git, Johannes Schindelin

On Mon, Apr 8, 2019 at 4:02 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> Sometimes when I don't understand what some code does, I look at "git
> log --patch". In this case, there a big explanation in ad3762042a
> (read-cache: fix directory/file conflict handling in
> read_index_unmerged(), 2018-07-31) that might help you.
>

i got my reason from the commit message,

"The _only_ reason we want to keep a previously unmerged entry in the
index at stage #0 is so that we don't forget the fact that we have
corresponding file in the work tree in order to be able to remove it
when the tree we are resetting to does not have the path."

and now that i got back to reading that comment, it makes sense.

for finding that commit message did you do something like this:
git log -L :repo_read_index_unmerged:read-cache.c

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-08 11:00       ` Kapil Jain
@ 2019-04-08 11:09         ` Duy Nguyen
  2019-04-08 14:27           ` Kapil Jain
  0 siblings, 1 reply; 8+ messages in thread
From: Duy Nguyen @ 2019-04-08 11:09 UTC (permalink / raw)
  To: Kapil Jain; +Cc: Thomas Gummerer, git, Johannes Schindelin

On Mon, Apr 8, 2019 at 6:00 PM Kapil Jain <jkapil.cs@gmail.com> wrote:
> for finding that commit message did you do something like this:
> git log -L :repo_read_index_unmerged:read-cache.c

No. I'm old fashioned. I just do "git log --patch --follow <path>"
then search read_index_unmerged, which should at least appear in the
@@ line. But yeah log -L or "tig blame" may be better to dig in
history.
-- 
Duy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-08 11:09         ` Duy Nguyen
@ 2019-04-08 14:27           ` Kapil Jain
  2019-04-08 21:55             ` Thomas Gummerer
  0 siblings, 1 reply; 8+ messages in thread
From: Kapil Jain @ 2019-04-08 14:27 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Thomas Gummerer, git, Johannes Schindelin

while understanding add_index_entry_with_check()
i got to do_invalidate_path().

the commit hash for do_invalidate_path() function is
`749864627c2d3c33bbc56d7ba0b5542af698cc40`

in the commit message it is explained that, cache-tree is used to
store object names of index file objects and it is kept separate from
index file because adding it would change index file format.

the part i couldn't understand is:
"During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy."

what exactly does invalidating means here ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] discussion about stashing with conflicts
  2019-04-08 14:27           ` Kapil Jain
@ 2019-04-08 21:55             ` Thomas Gummerer
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gummerer @ 2019-04-08 21:55 UTC (permalink / raw)
  To: Kapil Jain; +Cc: Duy Nguyen, git, Johannes Schindelin

On 04/08, Kapil Jain wrote:
> while understanding add_index_entry_with_check()
> i got to do_invalidate_path().
> 
> the commit hash for do_invalidate_path() function is
> `749864627c2d3c33bbc56d7ba0b5542af698cc40`
> 
> in the commit message it is explained that, cache-tree is used to
> store object names of index file objects and it is kept separate from
> index file because adding it would change index file format.
> 
> the part i couldn't understand is:
> "During various index manupulations, we selectively invalidate
> the parts so that the next write-tree can bypass regenerating
> tree objects for unchanged parts of the directory hierarchy."
> 
> what exactly does invalidating means here ?

FWIW, I don't think you need to understand cache-tree for the stash
GSoC project.  Your time is probably better spent taking what you
learned, and trying to make that into a proposal, as the application
period is coming to an end.

That said, since we are talking about a cache here, invalidating means
simply making part of the cache invalid, which means the caches
contents need to be re-calculated next time they are needed.

For the cache-tree in particular that means that we need to
re-calculate tree objects that have been invalidated, while we can
just re-use the ones that have not.

If you want to have a look at the cache-tree, you can use
't/helper/test-tool dump-cache-tree .git/index' from your locally
built git, which will dump the cache-tree that can be found
'.git/index'.  Compare the output of that command just after you did
'git reset --hard' on your repository (of course it needs some
contents), and after you modified some file, and added it to the index
using 'git add'.  In the latter case you will notice some directories
that are marked as 'invalid'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-08 21:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-07 13:08 [GSoC][RFC] discussion about stashing with conflicts Kapil Jain
2019-04-07 18:38 ` Thomas Gummerer
2019-04-08  5:48   ` Kapil Jain
2019-04-08 10:31     ` Duy Nguyen
2019-04-08 11:00       ` Kapil Jain
2019-04-08 11:09         ` Duy Nguyen
2019-04-08 14:27           ` Kapil Jain
2019-04-08 21:55             ` Thomas Gummerer

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).