git@vger.kernel.org mailing list mirror (one of many)
* Extending "extended SHA1" syntax to traverse through gitlinks?
@ 2016-08-20 22:50 Josh Triplett
  2016-08-21 13:46 ` Jakub Narębski
  0 siblings, 1 reply; 13+ messages in thread
From: Josh Triplett @ 2016-08-20 22:50 UTC
  To: git

Currently, if you have a branch "somebranch" that contains a gitlink
"somecommit", you can write "somebranch:somecommit" to refer to the
commit, just like a tree or blob.  ("man git-rev-parse" defines this
syntax in the "SPECIFYING REVISIONS" section.)  You can use this
anywhere you can use a committish, including "git show
somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
or even "git format-patch -1 somebranch:somecommit".
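A minimal sketch of the behaviour described above. It assumes a POSIX shell and a git recent enough to accept 'git update-index --cacheinfo <mode>,<object>,<path>'. It builds a throwaway repository, records a gitlink (mode 160000) named "somecommit" on a branch "somebranch" that points at a commit from the same repository, and resolves it with the existing <rev>:<path> syntax. All names are illustrative only.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q .

# The commit we will link to; it lives in this same object database.
echo content >somefile
git add somefile
git commit -q -m "linked commit"
linked=$(git rev-parse HEAD)

# Record it as a gitlink (mode 160000) on a new branch; no submodule
# checkout or .gitmodules entry is needed for this.
git checkout -q -b somebranch
git update-index --add --cacheinfo "160000,$linked,somecommit"
git commit -q -m "add gitlink"

git ls-tree somebranch somecommit   # shows mode 160000 and type "commit"
resolved=$(git rev-parse somebranch:somecommit)
git cat-file -t "$resolved"         # prints "commit"
```

Everything up to here already works today. Only the next step, going through the link, has no syntax.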

However, you cannot traverse *through* the gitlink to look at files
inside its own tree, or to look at other commits relative to that
commit.  For instance, "somebranch:somecommit:somefile" and
"somebranch:somecommit~3" do not work.

I'd love to have a syntax that allows traversing through the gitlink to
other files or commits.  Ideally, I'd suggest the syntax above, as a
natural extension of the existing extended syntax.

(That syntax would potentially introduce ambiguity if you had a file
named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
problem, though; the existing syntax already doesn't support accessing a
file named "x..y" or "x...y", so scripts already can't expect to access
arbitrary filenames with that syntax without some kind of quoting, which
we also don't have.)

Does this seem reasonable?  Would a patch introducing such syntax
(including documentation and tests) be acceptable?

- Josh Triplett

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-20 22:50 Extending "extended SHA1" syntax to traverse through gitlinks? Josh Triplett
@ 2016-08-21 13:46 ` Jakub Narębski
  2016-08-21 14:26   ` Josh Triplett
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narębski @ 2016-08-21 13:46 UTC
  To: Josh Triplett, git

On 21.08.2016 at 00:50, Josh Triplett wrote:

> Currently, if you have a branch "somebranch" that contains a gitlink
> "somecommit", you can write "somebranch:somecommit" to refer to the
> commit, just like a tree or blob.  ("man git-rev-parse" defines this
> syntax in the "SPECIFYING REVISIONS" section.)  You can use this
> anywhere you can use a committish, including "git show
> somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
> or even "git format-patch -1 somebranch:somecommit".
> 
> However, you cannot traverse *through* the gitlink to look at files
> inside its own tree, or to look at other commits relative to that
> commit.  For instance, "somebranch:somecommit:somefile" and
> "somebranch:somecommit~3" do not work.

Note that there is the same problem traversing through trees:
while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
doesn't:

  $ git cat-file -p HEAD:subdir:file
  fatal: Not a valid object name HEAD:subdir:file

Though you can do the resolve step manually:

  $ git cat-file -p $(git rev-parse HEAD:subdir):file

This works.
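The manual resolve step can be wrapped in a small shell function. This is a sketch, not an existing git command. The name gitlink_rev and its argument order are made up for illustration, and the function only works when the gitlinked commit is present in the current repository's object database. It resolves <treeish>:<gitlink> first and then appends either ':<path>' or a revision suffix such as '~1' or '^{tree}' to the resulting commit.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper (not a git command): resolve <treeish>:<gitlink>, then
# append <rest> to the resulting commit.  <rest> is ":<path>" to look inside
# the linked commit's tree, or a revision suffix such as "~1" or "^{tree}".
gitlink_rev() {
	commit=$(git rev-parse --verify --quiet "$1:$2") || return 1
	git rev-parse --verify --quiet "$commit$3"
}

# Throwaway repository whose gitlink points at one of its own commits.
cd "$(mktemp -d)"
git init -q .
echo one >somefile && git add somefile && git commit -q -m one
echo two >somefile && git commit -q -a -m two
linked=$(git rev-parse HEAD)
git update-index --add --cacheinfo "160000,$linked,somecommit"
git commit -q -m "add gitlink"

# Stand-in for the proposed "HEAD:somecommit:somefile" ...
git cat-file -p "$(gitlink_rev HEAD somecommit :somefile)"   # prints "two"
# ... and for "HEAD:somecommit~1" and "HEAD:somecommit^{tree}".
parent=$(gitlink_rev HEAD somecommit '~1')
tree=$(gitlink_rev HEAD somecommit '^{tree}')
```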

> 
> I'd love to have a syntax that allows traversing through the gitlink to
> other files or commits.  Ideally, I'd suggest the syntax above, as a
> natural extension of the existing extended syntax.

And with the above manual resolving, you can see the problem with
implementing it: the git-cat-file (in the submodule) and the
git-rev-parse (in the supermodule) are on opposite sides of a
repository boundary.

Also, the problem with the proposed syntax is that it is not very visible.
But perhaps it is all right.  Maybe :/ as separator would be better,
or using parentheses or braces?

> (That syntax would potentially introduce ambiguity if you had a file
> named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
> problem, though; the existing syntax already doesn't support accessing a
> file named "x..y" or "x...y", so scripts already can't expect to access
> arbitrary filenames with that syntax without some kind of quoting, which
> we also don't have.)

Errr... what?

  $ echo A..B >A..B
  $ git add A..B
  $ git commit -m 'A..B added'
  [master 2d69af9] A..B added
   1 file changed, 1 insertion(+), 1 deletion(-)
   create mode 100644 A..B
  $ git show HEAD:A..B
  A..B

> 
> Does this seem reasonable?  Would a patch introducing such syntax
> (including documentation and tests) be acceptable?

-- 
Jakub Narębski


* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-21 13:46 ` Jakub Narębski
@ 2016-08-21 14:26   ` Josh Triplett
  2016-08-22 18:39     ` Jakub Narębski
  0 siblings, 1 reply; 13+ messages in thread
From: Josh Triplett @ 2016-08-21 14:26 UTC
  To: Jakub Narębski; +Cc: git

On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
> On 21.08.2016 at 00:50, Josh Triplett wrote:
> > Currently, if you have a branch "somebranch" that contains a gitlink
> > "somecommit", you can write "somebranch:somecommit" to refer to the
> > commit, just like a tree or blob.  ("man git-rev-parse" defines this
> > syntax in the "SPECIFYING REVISIONS" section.)  You can use this
> > anywhere you can use a committish, including "git show
> > somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
> > or even "git format-patch -1 somebranch:somecommit".
> > 
> > However, you cannot traverse *through* the gitlink to look at files
> > inside its own tree, or to look at other commits relative to that
> > commit.  For instance, "somebranch:somecommit:somefile" and
> > "somebranch:somecommit~3" do not work.
> 
> Note that there is the same problem traversing through trees:
> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
> doesn't:
> 
>   $ git cat-file -p HEAD:subdir:file
>   fatal: Not a valid object name HEAD:subdir:file

Interesting point; if extending this syntax anyway, any treeish ought to
work, not just a committish.

> Though you can do the resolve step manually:
> 
>   $ git cat-file -p $(git rev-parse HEAD:subdir):file
> 
> This works.

True, but that seems quite inconvenient.

> > I'd love to have a syntax that allows traversing through the gitlink to
> > other files or commits.  Ideally, I'd suggest the syntax above, as a
> > natural extension of the existing extended syntax.
> 
> And with the above manual resolving, you can see the problem with
> implementing it: the git-cat-file (in submodule) and git-rev-parse
> (in supermodule) are across repository boundary.

Only if the gitlink points to a commit that doesn't exist in the same
repository.  A gitlink can point to a commit you already have.

> Also, the problem with the proposed syntax is that it is not very visible.
> But perhaps it is all right.  Maybe :/ as separator would be better,
> or using parentheses or braces?

It seems as visible as the standard commit:path syntax; the second colon
seems just as visible as the first.  :/ already has a different meaning
(text search), so that would introduce inconsistency.

> > (That syntax would potentially introduce ambiguity if you had a file
> > named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
> > problem, though; the existing syntax already doesn't support accessing a
> > file named "x..y" or "x...y", so scripts already can't expect to access
> > arbitrary filenames with that syntax without some kind of quoting, which
> > we also don't have.)
> 
> Errr... what?
> 
>   $ echo A..B >A..B
>   $ git add A..B
>   $ git commit -m 'A..B added'
>   [master 2d69af9] A..B added
>    1 file changed, 1 insertion(+), 1 deletion(-)
>    create mode 100644 A..B
>   $ git show HEAD:A..B
>   A..B

I stand corrected; I didn't find that.  I thought rev parsing worked
independently from the repository, and didn't have any automagic
detection based on the contents of the repository?

This seems ambiguous, and (AFAICT) not documented.  If HEAD:A and B both
refer to a commit, in addition to the blob A..B, which will HEAD:A..B
refer to?  I did test the HEAD:gitlink..anotherbranch case, and it does
parse as a range.

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-21 14:26   ` Josh Triplett
@ 2016-08-22 18:39     ` Jakub Narębski
  2016-08-23  6:53       ` Josh Triplett
  2016-08-23 16:39       ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Narębski @ 2016-08-22 18:39 UTC
  To: Josh Triplett; +Cc: git

On 21.08.2016 at 16:26, Josh Triplett wrote:
> On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
>> On 21.08.2016 at 00:50, Josh Triplett wrote:
>>> Currently, if you have a branch "somebranch" that contains a gitlink
>>> "somecommit", you can write "somebranch:somecommit" to refer to the
>>> commit, just like a tree or blob.  ("man git-rev-parse" defines this
>>> syntax in the "SPECIFYING REVISIONS" section.)  You can use this
>>> anywhere you can use a committish, including "git show
>>> somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
>>> or even "git format-patch -1 somebranch:somecommit".
>>>
>>> However, you cannot traverse *through* the gitlink to look at files
>>> inside its own tree, or to look at other commits relative to that
>>> commit.  For instance, "somebranch:somecommit:somefile" and
>>> "somebranch:somecommit~3" do not work.
>>
>> Note that there is the same problem traversing through trees:
>> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
>> doesn't:
>>
>>   $ git cat-file -p HEAD:subdir:file
>>   fatal: Not a valid object name HEAD:subdir:file
> 
> Interesting point; if extending this syntax anyway, any treeish ought to
> work, not just a committish.

Actually, because you can simply use "HEAD:subdir/file" I'd rather
it didn't work (no two ways of access), unless we can get it for free.

>> Though you can do the resolve step manually:
>>
>>   $ git cat-file -p $(git rev-parse HEAD:subdir):file
>>
>> This works.
> 
> True, but that seems quite inconvenient.

Especially since for submodules you need:

$ git --git-dir=subdir/.git cat-file -p $(git rev-parse HEAD:subdir):file

(or something like that), assuming that you start in the supermodule.
 
>>> I'd love to have a syntax that allows traversing through the gitlink to
>>> other files or commits.  Ideally, I'd suggest the syntax above, as a
>>> natural extension of the existing extended syntax.
>>
>> And with the above manual resolving, you can see the problem with
>> implementing it: the git-cat-file (in submodule) and git-rev-parse
>> (in supermodule) are across repository boundary.
> 
> Only if the gitlink points to a commit that doesn't exist in the same
> repository.  A gitlink can point to a commit you already have.

The idea of submodules is that a tree object in the superproject
includes a link to a commit of the subproject (a so-called gitlink).
The tree object is in the superproject repository, while the gitlinked
commit is in the submodule repository.

True, with modern Git the submodule repository is embedded in the .git
area of the superproject, with '.git' in the submodule being a gitfile,
but by design those objects are in different repositories, in different
object databases.

>> Also, the problem with the proposed syntax is that it is not very visible.
>> But perhaps it is all right.  Maybe :/ as separator would be better,
>> or using parentheses or braces?
> 
> It seems as visible as the standard commit:path syntax; the second colon
> seems just as visible as the first.  :/ already has a different meaning
> (text search), so that would introduce inconsistency.

Actually ":/" has a special meaning only if it is at beginning:
 - :/<text> for first matching commit from any ref
 - :/       is 'top directory' pathspec (equivalent to ':(top)')

But perhaps '//' would be better.

>>> (That syntax would potentially introduce ambiguity if you had a file
>>> named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
>>> problem, though; the existing syntax already doesn't support accessing a
>>> file named "x..y" or "x...y", so scripts already can't expect to access
>>> arbitrary filenames with that syntax without some kind of quoting, which
>>> we also don't have.)
>>
>> Errr... what?
>>
>>   $ echo A..B >A..B
>>   $ git add A..B
>>   $ git commit -m 'A..B added'
>>   [master 2d69af9] A..B added
>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>    create mode 100644 A..B
>>   $ git show HEAD:A..B
>>   A..B
> 
> I stand corrected; I didn't find that.  I thought rev parsing worked
> independently from the repository, and didn't have any automagic
> detection based on the contents of the repository?

It probably depends on whether the command expects a range (like
git-log), supports range-like notation (like git-diff), or expects
single or multiple things (like git-show).

> This seems ambiguous, and (AFAICT) not documented.  If HEAD:A and B both
> refer to a commit, in addition to the blob A..B, which will HEAD:A..B
> refer to?  I did test the HEAD:gitlink..anotherbranch case, and it does
> parse as a range.

Well, it is ambiguous.

We would probably want to support some kind of quoting, for example
HEAD:"A..B" (where everything inside "..." is c-quoted, but can use utf-8).

-- 
Jakub Narębski

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-22 18:39     ` Jakub Narębski
@ 2016-08-23  6:53       ` Josh Triplett
  2016-08-23 20:24         ` Jakub Narębski
  2016-08-23 16:39       ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Josh Triplett @ 2016-08-23  6:53 UTC
  To: Jakub Narębski; +Cc: git

On Mon, Aug 22, 2016 at 08:39:19PM +0200, Jakub Narębski wrote:
> On 21.08.2016 at 16:26, Josh Triplett wrote:
> > On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
> >> On 21.08.2016 at 00:50, Josh Triplett wrote:
> >>> Currently, if you have a branch "somebranch" that contains a gitlink
> >>> "somecommit", you can write "somebranch:somecommit" to refer to the
> >>> commit, just like a tree or blob.  ("man git-rev-parse" defines this
> >>> syntax in the "SPECIFYING REVISIONS" section.)  You can use this
> >>> anywhere you can use a committish, including "git show
> >>> somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
> >>> or even "git format-patch -1 somebranch:somecommit".
> >>>
> >>> However, you cannot traverse *through* the gitlink to look at files
> >>> inside its own tree, or to look at other commits relative to that
> >>> commit.  For instance, "somebranch:somecommit:somefile" and
> >>> "somebranch:somecommit~3" do not work.
> >>
> >> Note that there is the same problem traversing through trees:
> >> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
> >> doesn't:
> >>
> >>   $ git cat-file -p HEAD:subdir:file
> >>   fatal: Not a valid object name HEAD:subdir:file
> > 
> > Interesting point; if extending this syntax anyway, any treeish ought to
> > work, not just a committish.
> 
> Actually, because you can simply use "HEAD:subdir/file" I'd rather
> it didn't work (no two ways of access), unless we can get it for free.

Agreed.  I suspect we'd get it for free if we introduced a syntax for
traversing through commits (by allowing that syntax to work with any
treeish), but if not, I certainly don't see any value in adding a second
syntax for accessing tree contents.

> >>> I'd love to have a syntax that allows traversing through the gitlink to
> >>> other files or commits.  Ideally, I'd suggest the syntax above, as a
> >>> natural extension of the existing extended syntax.
> >>
> >> And with the above manual resolving, you can see the problem with
> >> implementing it: the git-cat-file (in submodule) and git-rev-parse
> >> (in supermodule) are across repository boundary.
> > 
> > Only if the gitlink points to a commit that doesn't exist in the same
> > repository.  A gitlink can point to a commit you already have.
> 
> The idea of submodules is that tree object in superproject includes
> link to commit of subproject (so called gitlink).  Tree object is
> in superproject repository, while gitlinked commit is in submodule
> repository.
> 
> True, with modern Git the submodule repository is embedded in .git
> area of superproject, with '.git' in submodule being a gitfile,
> but by design those objects are in different repositories, in different
> object databases.

git-submodule handles them that way by default, yes.  But a gitlink
doesn't inherently have to point to a separate repository, and even a
submodule could point to an object available in the same repository
(perhaps via another ref).

git-series creates such gitlinks, for instance.

> >> Also, the problem with the proposed syntax is that it is not very visible.
> >> But perhaps it is all right.  Maybe :/ as separator would be better,
> >> or using parentheses or braces?
> > 
> > It seems as visible as the standard commit:path syntax; the second colon
> > seems just as visible as the first.  :/ already has a different meaning
> > (text search), so that would introduce inconsistency.
> 
> Actually ":/" has a special meaning only if it is at beginning:

True, but it seems inconsistent to have :/ mean search if at the
beginning, or traversal if not.

> But perhaps '//' would be better.

That does seem unambiguous, and it can't conflict with an existing file.
Does it seem reasonable to allow that for the initial commit as well
('committish//file', as well as 'commit//gitlink//file')?

Also, while that handles traversal into the tree contained in the
gitlinked commit, what about navigating by commit (using '~' and '^',
for instance)?  Does it seem reasonable to allow those as well, perhaps
only if you use // to reach the gitlink?  For instance,
'commit//gitlink~3', or 'commit//gitlink^{tree}'?

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-22 18:39     ` Jakub Narębski
  2016-08-23  6:53       ` Josh Triplett
@ 2016-08-23 16:39       ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-08-23 16:39 UTC
  To: Jakub Narębski; +Cc: Josh Triplett, git

Jakub Narębski <jnareb@gmail.com> writes:

> Especially since for submodules you need:
>
> $ git --git-dir=subdir/.git cat-file -p $(git rev-parse HEAD:subdir):file
>
> (or something like that), assuming that you start in the supermodule.
>   ...
>
> But perhaps '//' would be better.

If the users have to know where they need to use a different
separator, I do not think it is worth complicating the plumbing to
do this for them.  I'd rather keep things simple, and let the users
build complex stuff on top of the plumbing.

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-23  6:53       ` Josh Triplett
@ 2016-08-23 20:24         ` Jakub Narębski
  2016-08-24  5:36           ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narębski @ 2016-08-23 20:24 UTC
  To: Josh Triplett; +Cc: git, Junio C Hamano

On 23.08.2016 at 08:53, Josh Triplett wrote:
> On Mon, Aug 22, 2016 at 08:39:19PM +0200, Jakub Narębski wrote:
>> On 21.08.2016 at 16:26, Josh Triplett wrote:
>>> On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
>>>> On 21.08.2016 at 00:50, Josh Triplett wrote:
>>>>>
[...]
>>>> And with the above manual resolving, you can see the problem with
>>>> implementing it: the git-cat-file (in submodule) and git-rev-parse
>>>> (in supermodule) are across repository boundary.
>>>
>>> Only if the gitlink points to a commit that doesn't exist in the same
>>> repository.  A gitlink can point to a commit you already have.
>>
>> The idea of submodules is that tree object in superproject includes
>> link to commit of subproject (so called gitlink).  Tree object is
>> in superproject repository, while gitlinked commit is in submodule
>> repository.
>>
>> True, with modern Git the submodule repository is embedded in .git
>> area of superproject, with '.git' in submodule being gitling file,
>> but by design those objects are in different repositories, in different
>> object databases.
> 
> git-submodule handles them that way by default, yes.  But a gitlink
> doesn't inherently have to point to a separate repository, and even a
> submodule could point to an object available in the same repository
> (perhaps via another ref).
> 
> git-series creates such gitlinks, for instance.

The point is that a submodule has its own object database.  It might
be the same as the superproject's, but you need to handle submodule
objects being in a separate submodule repository anyway.  A common
repository is just a special case.

By the way, this also means that proposed "extended extended SHA1"
syntax would be useful to users of submodules...

>>>> Also, the problem with the proposed syntax is that it is not very visible.
>>>> But perhaps it is all right.  Maybe :/ as separator would be better,
>>>> or using parentheses or braces?
>>>
>>> It seems as visible as the standard commit:path syntax; the second colon
>>> seems just as visible as the first.  :/ already has a different meaning
>>> (text search), so that would introduce inconsistency.
>>
>> Actually ":/" has a special meaning only if it is at beginning:
> 
> True, but it seems inconsistent to have :/ mean search if at the
> beginning, or traversal if not.

Right.  It would also mean that if we have directory or submodule
called 'foo:', then 'foo:/bar' would be ambiguous where it was not
before.

BTW, currently there is not much need for quoting, at least not for
the ':' as separator.  Files with ':' in them, even if they are
named 'HEAD:foo' can be distinguished with ./HEAD:foo, or with
':(top)HEAD:foo'.  This would not be the case if supermodule to
submodule separator was ':'; the '//' is safe-ish.
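A small demonstration of why ':' is hard to reuse. It assumes a POSIX system where ':' is allowed in file names (Windows may not cooperate, as the sidenote below shows). In <rev>:<path>, the revision ends at the first ':' (ignoring any ':' inside a '^{...}' suffix), so a file literally named 'HEAD:foo' is already reachable today as 'HEAD:HEAD:foo'. Making ':' a traversal separator would change what such existing names mean.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q .
echo colon >'HEAD:foo'
git add -- 'HEAD:foo'
git commit -q -m 'add HEAD:foo'

# Revision "HEAD", path "HEAD:foo": the second ':' is part of the path.
git cat-file -p 'HEAD:HEAD:foo'   # prints "colon"
```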

Also, '//' would have additional meaning, in that left hand side
and right hand side are in [possibly] different repositories.


Sidenote (on MS Windows):
 samsung@notebook MINGW64 ~/test (master)
 $ echo 'HEAD:A..B' >'HEAD:A..B'

 samsung@notebook MINGW64 ~/test (master)
 $ git add 'HEAD:A..B'
 fatal: pathspec 'HEAD:A..B' did not match any files

 samsung@notebook MINGW64 ~/test (master)
 $ ls
 A  A..B  B  HEAD:A..B  file  sub/  subm/


>> But perhaps '//' would be better.
> 
> That does seem unambiguous, and it can't conflict with an existing file.
> Does it seem reasonable to allow that for the initial commit as well
> ('committish//file', as well as 'commit//gitlink//file')?

I don't think we can change this without breaking scripts (because it
would be breaking backward compatibility).  And adding new syntax...

The problem might be shells sanitizing input, that is, turning '//'
into '/' before passing it to the command; I don't know if it is a problem.
Probably not.
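A quick check of that worry, assuming a POSIX shell. Word splitting and pathname expansion leave a '//' inside an argument alone, so an argument in the hypothetical '<rev>:<path1>//<path2>' form would reach git unchanged. This does not test the argument conversion that MSYS/MinGW environments on Windows apply to arguments that look like POSIX paths. That would need to be checked separately.

```shell
#!/bin/sh
set -e

# Print each argument on its own line, exactly as the function receives it.
show_args() {
	for arg in "$@"; do
		printf '%s\n' "$arg"
	done
}

# Quoted and unquoted forms both arrive with the "//" intact.
out=$(show_args 'HEAD:sub//file' HEAD:sub//file)
printf '%s\n' "$out"
```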

> Also, while that handles traversal into the tree contained in the
> gitlinked commit, what about navigating by commit (using '~' and '^',
> for instance)?  Does it seem reasonable to allow those as well, perhaps
> only if you use // to reach the gitlink?  For instance,
> 'commit//gitlink~3', or 'commit//gitlink^{tree}'?

I don't know which of those work, and which do not:

  HEAD:path/to/submodule~3
  :0:path/to/submodule^{tree}
  HEAD~3:path/to/submodule

But I think the following should work:

  v1.0.1~2^2~4:path/to/submodule~3//inner/subm~4//sub/file


NOTE that the syntax allows starting at a revision or at the index
state of the superproject, but it only goes to the state recorded, or
to be recorded, in the superproject.  There is no syntax to find out
the HEAD or index version of the submodule unless you are within the
submodule, is there?

Best,
-- 
Jakub Narębski


* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-23 20:24         ` Jakub Narębski
@ 2016-08-24  5:36           ` Junio C Hamano
  2016-08-24 13:16             ` Jakub Narębski
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-08-24  5:36 UTC
  To: Jakub Narębski; +Cc: Josh Triplett, git

Jakub Narębski <jnareb@gmail.com> writes:

> The point is that submodule has its own object database.  It might
> be the same as superproject's, but you need to handle submodule objects
> being in separate submodule repository anyway.  Common repository is
> just a special case.
>
> By the way, this also means that proposed "extended extended SHA1"
> syntax would be useful to users of submodules...

Not really.

I think that you gave a prime example why <treeish>:<path1>//<path2>
is not a useful thing for submodules.  When the syntax resolves to a
40-hex object name, that object name by itself is not useful.

You also need to carry an additional piece of information that lets
you identify the location of the repository, in which the object
name is valid, in the current user's context (i.e. somewhere in the
superproject where the submodule lives).  In other words, you'd need
to carry <treeish>:<path1> around anyway for the object name to be
useful, so there is no good reason why anybody should insist that
the plumbing level resolve <treeish>:<path1>//<path2> directly to an
object name in the first place.
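Junio's point shows up clearly when the lookup is done from the superproject. The helper below has a hypothetical name and is not part of git. It uses <path1> twice: first to read the gitlink from <treeish> in the superproject, then to find the submodule repository in which that object name is meaningful (the same job as Jakub's --git-dir command). It assumes the submodule is checked out at <path1> in the current working tree and still has the recorded commit. For a historical <treeish> in which the submodule lived at a different path, that assumption breaks.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: <path1> is used twice, first to read the gitlink from
# <treeish> in the superproject, then to locate the submodule repository in
# which that object name means something.  Assumes the submodule is checked
# out at <path1> and still contains the recorded commit.
submodule_cat() {
	commit=$(git rev-parse --verify --quiet "$1:$2") || return 1
	git -C "$2" cat-file -p "$commit:$3"
}

# Superproject with an embedded repository at "inner" recorded as a gitlink.
cd "$(mktemp -d)"
git init -q super && cd super
git init -q inner
echo v1 >inner/file
git -C inner add file && git -C inner commit -q -m v1
git add inner 2>/dev/null     # records a gitlink, not inner's files
git commit -q -m "record inner at v1"

# Move the submodule ahead; the superproject commit still records v1.
echo v2 >inner/file && git -C inner commit -q -a -m v2

submodule_cat HEAD inner file   # prints "v1", the recorded state
git -C inner show HEAD:file     # prints "v2", the submodule's own HEAD
```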

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-24  5:36           ` Junio C Hamano
@ 2016-08-24 13:16             ` Jakub Narębski
  2016-08-24 14:20               ` Josh Triplett
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Narębski @ 2016-08-24 13:16 UTC
  To: Junio C Hamano; +Cc: Josh Triplett, git

On 24.08.2016 at 07:36, Junio C Hamano wrote:
> Jakub Narębski <jnareb@gmail.com> writes:
> 
>> The point is that submodule has its own object database.  It might
>> be the same as superproject's, but you need to handle submodule objects
>> being in separate submodule repository anyway.  Common repository is
>> just a special case.
>>
>> By the way, this also means that proposed "extended extended SHA1"
>> syntax would be useful to users of submodules...
> 
> Not really.
> 
> I think that you gave a prime example why <treeish>:<path1>//<path2>
> is not a useful thing for submodules.  When the syntax resolves to a
> 40-hex object name, that object name by itself is not useful.
> 
> You also need to carry an additional piece of information that lets
> you identify the location of the repository, in which the object
> name is valid, in the current user's context (i.e. somewhere in the
> superproject where the submodule lives).  In other words, you'd need
> to carry <treeish>:<path1> around anyway for the object name to be
> useful, so there is no good reason why anybody should insist that
> the plumbing level resolve <treeish>:<path1>//<path2> directly to an
> object name in the first place.

Not really.

The above means only that support for the new syntax would not be
as easy as adding it to 'git rev-parse' (and its built-in equivalent),
except for the case where the submodule uses the same object database
as the supermodule.

So it wouldn't be as easy (on a conceptual level) as adding support
for ':/<text>' or '<commit>^{/<text>}'.  It would be at least as
hard as, if not harder than, adding support for '@{-1}' and its '-'
shortcut.


Josh, what was the reason behind proposing this feature? Was it
conceived as adding completeness to gitrevisions syntax, a low-hanging
fruit?  It isn't (the latter).  Or was it some problem with submodule
handling that you would want to use this syntax for?

As for usefulness: this fills the hole in accessing submodules, one
that could be handled by combining plumbing-level commands.  Namely,
there are 5 states of a submodule (as I understand it):

 * recorded in ref / commit in supermodule
 * recorded in the index in supermodule
 - recorded in ref / commit in submodule
 - recorded in the index in submodule
 - state of worktree in submodule

The last three can be easily accessed by cd-ing into the submodule.
The first two are not easy to get, AFAIUC.
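Here is a sketch of where each of the five states can be read with existing plumbing. It assumes a submodule checked out at 'inner' in the superproject's working tree, created here as an embedded repository for brevity. The object names for the first two states are available from the superproject ('HEAD:inner' and ':inner'). What is awkward is reading the contents of those commits, which needs the submodule's object database, hence the 'git -C inner' at the end.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q super && cd super
git init -q inner
echo v1 >inner/file && git -C inner add file && git -C inner commit -q -m v1
git add inner 2>/dev/null && git commit -q -m "record v1"   # commit: v1
echo v2 >inner/file && git -C inner commit -q -a -m v2
git add inner 2>/dev/null                                   # index: v2
echo v3 >inner/file && git -C inner commit -q -a -m v3      # sub HEAD: v3
echo dirty >inner/file                                      # sub worktree

s1=$(git rev-parse HEAD:inner)      # 1. recorded in a superproject commit
s2=$(git rev-parse :inner)          # 2. recorded in the superproject index
s3=$(git -C inner rev-parse HEAD)   # 3. HEAD of the submodule
s4=$(git -C inner write-tree)       # 4. submodule index, as a tree
git -C inner diff --stat            # 5. submodule worktree vs. its index

# Contents for states 1 and 2 still have to be read inside the submodule.
git -C inner show "$s1:file"        # prints "v1"
git -C inner show "$s2:file"        # prints "v2"
```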

-- 
Jakub Narębski

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-24 13:16             ` Jakub Narębski
@ 2016-08-24 14:20               ` Josh Triplett
  2016-08-24 16:26                 ` Stefan Beller
  2016-08-24 17:05                 ` Jakub Narębski
  0 siblings, 2 replies; 13+ messages in thread
From: Josh Triplett @ 2016-08-24 14:20 UTC
  To: Jakub Narębski; +Cc: Junio C Hamano, git

On Wed, Aug 24, 2016 at 03:16:56PM +0200, Jakub Narębski wrote:
> On 24.08.2016 at 07:36, Junio C Hamano wrote:
> > Jakub Narębski <jnareb@gmail.com> writes:
> > 
> >> The point is that submodule has its own object database.  It might
> >> be the same as superproject's, but you need to handle submodule objects
> >> being in separate submodule repository anyway.  Common repository is
> >> just a special case.
> >>
> >> By the way, this also means that proposed "extended extended SHA1"
> >> syntax would be useful to users of submodules...
> > 
> > Not really.
> > 
> > I think that you gave a prime example why <treeish>:<path1>//<path2>
> > is not a useful thing for submodules.  When the syntax resolves to a
> > 40-hex object name, that object name by itself is not useful.
> > 
> > You also need to carry an additional piece of information that lets
> > you identify the location of the repository, in which the object
> > name is valid, in the current user's context (i.e. somewhere in the
> > superproject where the submodule lives).  In other words, you'd need
> > to carry <treeish>:<path1> around anyway for the object name to be
> > useful, so there is no good reason why anybody should insist that
> > the plumbing level resolve <treeish>:<path1>//<path2> directly to an
> > object name in the first place.
> 
> Not really.
> 
> The above means only that support for the new syntax would not be
> as easy as adding it to 'git rev-parse' (and its built-in equivalent),
> except for the case where submodule uses the same object database as
> supermodule.
> 
> So it wouldn't be as easy (on conceptual level) as adding support
> for ':/<text>' or '<commit>^{/<text>}'.  It would be at least as
> hard as, if not harder than, adding support for '@{-1}' and its '-'
> shortcut.

Depends on which cases you want to handle.  In the most general case,
you'd need to find and process the applicable .gitmodules file, which
would only work if you started from the top-level tree, not a random
treeish.  On the other hand, in the most general case, you don't
necessarily even have the module you need, because .git/modules only
contains the modules the *current* version needed, not every past
version.

As an alternate approach (pun intended): treat every module in
.git/modules as an alternate and just look up the object by hash.  Or,
teach git-submodule to store all the objects for submodules in the
supermodule's .git/objects (and teach git's reachability algorithm to
respect refs in .git/modules, or store their refs in
.git/refs/submodules/ or in a namespace).
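The alternates idea can be tried by hand today. This sketch assumes an embedded submodule at 'inner' whose objects live in inner/.git/objects. For submodules cloned by git-submodule, the objects would instead be under .git/modules/<name>/objects, which is where the approach above would point. The sketch uses the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable rather than editing .git/objects/info/alternates, so nothing is changed permanently.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q super && cd super
git init -q inner
echo hello >inner/file
git -C inner add file && git -C inner commit -q -m hello
git add inner 2>/dev/null     # gitlink only; inner's objects stay in inner
git commit -q -m "record inner"

commit=$(git rev-parse HEAD:inner)

# On its own, the superproject cannot see the submodule's objects.
if git cat-file -e "$commit" 2>/dev/null; then
	echo "unexpected: commit visible without alternates" >&2
fi

# Borrow the submodule's object store for this one command only; the
# gitlinked commit can then be looked up by hash like any other object.
GIT_ALTERNATE_OBJECT_DIRECTORIES="$PWD/inner/.git/objects" \
	git cat-file -p "$commit:file"   # prints "hello"
```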

> Josh, what was the reason behind proposing this feature? Was it
> conceived as adding completeness to gitrevisions syntax, a low-hanging
> fruit?  It isn't (the latter).  Or was it some problem with submodule
> handling that you would want to use this syntax for?

This wasn't an abstract/theoretical completeness issue.  I specifically
wanted this syntax for practical use with actual trees containing
gitlinks, motivated by having a tool that creates and uses such
gitlinks. :)

> As for usefulness: this fills the hole in accessing submodules, one
> that could be handled by combining plumbing-level commands.  Namely,
> there are 5 states of a submodule (as I understand it):
> 
>  * recorded in ref / commit in supermodule
>  * recorded in the index in supermodule
>  - recorded in ref / commit in submodule
>  - recorded in the index in submodule
>  - state of worktree in submodule
> 
> The last three can be easily accessed by cd-ing to the submodule.  The first
> two are not easy to get, AFAIUC.

Right.  I primarily care about those first two cases, especially the
first one: given a commit containing a gitlink, how can I easily dig
into the linked commit?
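For those first two states, existing plumbing already exposes the gitlink's commit ID: `git ls-tree <commit> -- <path>` prints `<mode> SP <type> SP <object> TAB <path>` for the committed state, and `git ls-files --stage -- <path>` prints `<mode> <object> <stage> TAB <path>` for the index state; a gitlink has mode `160000`. A minimal sketch of picking the commit ID out of either line (the function names are hypothetical, and paths are assumed not to need the quoting Git applies without `-z`):

```python
from __future__ import annotations

GITLINK_MODE = "160000"  # tree entry mode Git uses for gitlinks


def gitlink_from_ls_tree(line: str) -> tuple[str, str] | None:
    """Parse one `git ls-tree` line; return (path, commit) for a gitlink."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, obj_type, oid = meta.split()
    if mode != GITLINK_MODE or obj_type != "commit":
        return None
    return path, oid


def gitlink_from_ls_files_stage(line: str) -> tuple[str, str] | None:
    """Parse one `git ls-files --stage` line; return (path, commit) for a gitlink."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, oid, _stage = meta.split()
    if mode != GITLINK_MODE:
        return None
    return path, oid
```

The returned ID still has to be looked up in whichever repository holds the submodule's objects, for example with `git -C <path> show <commit>:<file>` when the submodule is checked out.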

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-24 14:20               ` Josh Triplett
@ 2016-08-24 16:26                 ` Stefan Beller
  2016-08-24 17:05                 ` Jakub Narębski
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2016-08-24 16:26 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Jakub Narębski, Junio C Hamano, git@vger.kernel.org

On Wed, Aug 24, 2016 at 7:20 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> Depends on which cases you want to handle.  In the most general case,
> you'd need to find and process the applicable .gitmodules file, which
> would only work if you started from the top-level tree, not a random
> treeish.  On the other hand, in the most general case, you don't
> necessarily even have the module you need, because .git/modules only
> contains the modules the *current* version needed, not every past
> version.

The code in submodule-config.{c,h} allows exactly that:

    submodule_from_path(commit_sha1, path)

returns information about the submodule recorded in a .gitmodules
file of a specific revision of the superproject.
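From a script, roughly the same path-to-name lookup can be done by reading `.gitmodules` straight out of a revision: `git config --blob=<revision>:.gitmodules -z --get-regexp '^submodule\..*\.path$'` prints each `submodule.<name>.path` key, a newline, its value, and a NUL. The sketch below (hypothetical function name) turns that output into a path -> name map; it is not the C API itself, only an illustration of the same lookup.

```python
from __future__ import annotations


def submodule_paths(config_z_output: str) -> dict[str, str]:
    r"""Map submodule path -> submodule name.

    Input is the output of
      git config --blob=<revision>:.gitmodules -z --get-regexp '^submodule\..*\.path$'
    where each record is "<key>\n<value>\0".
    """
    paths: dict[str, str] = {}
    for record in config_z_output.split("\0"):
        if not record:
            continue
        key, _, value = record.partition("\n")
        prefix, suffix = "submodule.", ".path"
        if key.startswith(prefix) and key.endswith(suffix):
            # Submodule names may themselves contain dots or spaces.
            name = key[len(prefix):-len(suffix)]
            paths[value] = name
    return paths
```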

Ideally, .git/modules contains all the modules that have ever existed.
Well, "ideally" is the wrong word, but it is at least possible, as the
submodule's git dir is kept even when you remove the outdated
submodule's working dir.  (That's why the git dir is in the
superproject's git dir in the first place.)

But as you say, it is possible not to have the submodule available.

>
> As an alternate approach (pun intended): treat every module in
> .git/modules as an alternate and just look up the object by hash.  Or,
> teach git-submodule to store all the objects for submodules in the
> supermodule's .git/objects (and teach git's reachability algorithm to
> respect refs in .git/modules, or store their refs in
> .git/refs/submodules/ or in a namespace).

This is a sensible thing to do no matter the outcome of this discussion.

>>  * recorded in ref / commit in supermodule
>>  * recorded in the index in supermodule
>>  - recorded in ref / commit in submodule
>>  - recorded in the index in submodule
>>  - state of worktree in submodule
>>
>> The last three can be easily accessed by cd-ing to the submodule.  The first
>> two are not easy to get, AFAIUC.
>
> Right.  I primarily care about those first two cases, especially the
> first one: given a commit containing a gitlink, how can I easily dig
> into the linked commit?

What exactly do you need?  (What does "digging" mean here?)

See for example the series that Jake Keller is currently trying to land:
    "submodule inline diff format"
https://public-inbox.org/git/20160822234344.22797-1-jacob.e.keller@intel.com

That would enhance all the log/diff/show commands.

Reading the original message, do you want to create patches in
the submodule from the superproject?

Thanks,
Stefan

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-24 14:20               ` Josh Triplett
  2016-08-24 16:26                 ` Stefan Beller
@ 2016-08-24 17:05                 ` Jakub Narębski
  2016-08-24 20:21                   ` Josh Triplett
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Narębski @ 2016-08-24 17:05 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Junio C Hamano, git, Stefan Beller

On 24.08.2016 at 16:20, Josh Triplett writes:
> On Wed, Aug 24, 2016 at 03:16:56PM +0200, Jakub Narębski wrote:
[...]
>> Not really.
>>
>> The above means only that supporting the new syntax would not be
>> as easy as adding it to 'git rev-parse' (and its built-in equivalent),
>> except for the case where the submodule uses the same object database
>> as the supermodule.
>>
>> So it wouldn't be as easy (on a conceptual level) as adding support
>> for ':/<text>' or '<commit>^{/<text>}'.  It would be at least as
>> hard as, if not harder than, adding support for '@{-1}' and its '-'
>> shortcut.
> 
> Depends on which cases you want to handle.  In the most general case,
> you'd need to find and process the applicable .gitmodules file, which
> would only work if you started from the top-level tree, not a random
> treeish.  On the other hand, in the most general case, you don't
> necessarily even have the module you need, because .git/modules only
> contains the modules the *current* version needed, not every past
> version.

There is an additional problem, namely that the directory containing
a submodule can be renamed.

I don't know if there is an existing API, but assuming modern
git-submodule (with the repository in .git/modules), you would have to
do the following steps for <revision>:<path/to/submodule>//<path>:

 * look up <revision>:.gitmodules for the module whose 'path'
   is <path/to/submodule>; let's say it is named <submodule>
 * check if the <revision>:<path/to/submodule> commit object
   is present in .git/modules/<submodule>
 * look up this object
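The three steps above can be sketched as follows, assuming the path -> name map from `<revision>:.gitmodules` and the gitlink's commit ID have already been read (for example with `git config --blob` and `git ls-tree`). The presence check is passed in as a callable; in practice it could wrap `git --git-dir=<dir> cat-file -e <oid>^{commit}`. All names here are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Mapping, Optional


def resolve_modern(
    gitmodules_paths: Mapping[str, str],     # path -> name, from <revision>:.gitmodules
    submodule_path: str,                     # <path/to/submodule>
    gitlink_oid: str,                        # ID recorded at <revision>:<path/to/submodule>
    has_commit: Callable[[str, str], bool],  # (git_dir, oid) -> is the commit present?
    superproject_git_dir: str = ".git",
) -> Optional[str]:
    """Return the git dir holding the gitlink's commit, or None."""
    # Step 1: find the submodule's name via its recorded path.
    name = gitmodules_paths.get(submodule_path)
    if name is None:
        return None
    # Step 2: check that the commit exists in .git/modules/<name>.
    git_dir = f"{superproject_git_dir}/modules/{name}"
    if not has_commit(git_dir, gitlink_oid):
        return None
    # Step 3 is left to the caller: look up <path> inside gitlink_oid in git_dir.
    return git_dir
```

With the returned directory, step 3 would be something like `git --git-dir=.git/modules/<submodule> show <oid>:<path>`.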

In the case of a legacy submodule setup, with the submodule repository
in the supermodule's working directory, you would need to:

 * look up <revision>:.gitmodules for the module whose 'path'
   is <path/to/submodule>; let's say it is named <submodule>
 * look up the current .gitmodules for the current path of the
   submodule named <submodule>; let's say it is <new/path/submodule>
 * check if the <revision>:<path/to/submodule> commit object
   is present in the :(top)<new/path/submodule>/.git repository
 * look up this object
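The extra step in the legacy case is mapping the submodule's old path to its current one through its name.  A minimal sketch, assuming both `.gitmodules` files have already been parsed into path -> name maps (the function name is hypothetical):

```python
from __future__ import annotations

from typing import Mapping, Optional


def legacy_repo_location(
    old_paths: Mapping[str, str],      # path -> name, from <revision>:.gitmodules
    current_paths: Mapping[str, str],  # path -> name, from the current .gitmodules
    old_path: str,                     # <path/to/submodule> as recorded in <revision>
) -> Optional[str]:
    """Return <new/path/submodule>/.git relative to the worktree top, or None."""
    name = old_paths.get(old_path)
    if name is None:
        return None
    # Invert the current map so the submodule is found by name,
    # which survives a rename of its directory.
    current_by_name = {n: p for p, n in current_paths.items()}
    new_path = current_by_name.get(name)
    if new_path is None:
        return None  # no longer registered; its repository may be gone
    return f"{new_path}/.git"
```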

You could also check if the submodule repository (as stored
in config) is a path, and use it if it is... but that might
be going too far.


BTW. all that reminds me that gitweb should handle submodules
better.

> As an alternate approach (pun intended): treat every module in
> .git/modules as an alternate and just look up the object by hash.  

This could be a good fallback, to search through all submodules.

> Or, teach git-submodule to store all the objects for submodules in the
> supermodule's .git/objects (and teach git's reachability algorithm to
> respect refs in .git/modules, or store their refs in
> .git/refs/submodules/ or in a namespace).

And a fallback to this fallback could be searching through the
supermodule's object repository.

Storing all objects in a single repository runs counter to the design
decision behind submodules (though I don't remember what it was), but
it might be done.  Still, Git needs to be able to deal with legacy
situations anyway.
 
>> Josh, what was the reason behind proposing this feature? Was it
>> conceived as adding completeness to gitrevisions syntax, a low-hanging
>> fruit?  It isn't (the latter).  Or was it some problem with submodule
>> handling that you would want to use this syntax for?
> 
> This wasn't an abstract/theoretical completeness issue.  I specifically
> wanted this syntax for practical use with actual trees containing
> gitlinks, motivated by having a tool that creates and uses such
> gitlinks. :)

Could you explain what you need in more detail?  Is it a fragment
of the submodule's history, the contents of a file at a given point
of the superproject's history, a diff between a file in the submodule
and something else, or what?
 
>> As for usefulness: this fills the hole in accessing submodules, one
>> that could be handled by combining plumbing-level commands.  Namely,
>> there are 5 states of a submodule (as I understand it):
>>
>>  * recorded in ref / commit in supermodule
>>  * recorded in the index in supermodule
>>  - recorded in ref / commit in submodule
>>  - recorded in the index in submodule
>>  - state of worktree in submodule
>>
>> The last three can be easily accessed by cd-ing to the submodule.  The first
>> two are not easy to get, AFAIUC.
> 
> Right.  I primarily care about those first two cases, especially the
> first one: given a commit containing a gitlink, how can I easily dig
> into the linked commit?

All right.

Though you can cobble it together with plumbing... just saying.

-- 
Jakub Narębski

* Re: Extending "extended SHA1" syntax to traverse through gitlinks?
  2016-08-24 17:05                 ` Jakub Narębski
@ 2016-08-24 20:21                   ` Josh Triplett
  0 siblings, 0 replies; 13+ messages in thread
From: Josh Triplett @ 2016-08-24 20:21 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Junio C Hamano, git, Stefan Beller

On Wed, Aug 24, 2016 at 07:05:17PM +0200, Jakub Narębski wrote:
> On 24.08.2016 at 16:20, Josh Triplett writes:
> > On Wed, Aug 24, 2016 at 03:16:56PM +0200, Jakub Narębski wrote:
> [...]
> >> Not really.
> >>
> >> The above means only that supporting the new syntax would not be
> >> as easy as adding it to 'git rev-parse' (and its built-in equivalent),
> >> except for the case where the submodule uses the same object database
> >> as the supermodule.
> >>
> >> So it wouldn't be as easy (on a conceptual level) as adding support
> >> for ':/<text>' or '<commit>^{/<text>}'.  It would be at least as
> >> hard as, if not harder than, adding support for '@{-1}' and its '-'
> >> shortcut.
> > 
> > Depends on which cases you want to handle.  In the most general case,
> > you'd need to find and process the applicable .gitmodules file, which
> > would only work if you started from the top-level tree, not a random
> > treeish.  On the other hand, in the most general case, you don't
> > necessarily even have the module you need, because .git/modules only
> > contains the modules the *current* version needed, not every past
> > version.
> 
> There is an additional problem, namely that the directory containing
> a submodule can be renamed.
> 
> I don't know if there is an existing API, but assuming modern
> git-submodule (with the repository in .git/modules), you would have to
> do the following steps for <revision>:<path/to/submodule>//<path>:
> 
>  * look up <revision>:.gitmodules for the module whose 'path'
>    is <path/to/submodule>; let's say it is named <submodule>
>  * check if the <revision>:<path/to/submodule> commit object
>    is present in .git/modules/<submodule>
>  * look up this object

This also assumes your lookup started with a <committish> and not an
intermediate <treeish>, but that'll work in many cases.

> > As an alternate approach (pun intended): treat every module in
> > .git/modules as an alternate and just look up the object by hash.  
> 
> This could be a good fallback, to search through all submodules.
> 
> > Or, teach git-submodule to store all the objects for submodules in the
> > supermodule's .git/objects (and teach git's reachability algorithm to
> > respect refs in .git/modules, or store their refs in
> > .git/refs/submodules/ or in a namespace).
> 
> And a fallback to this fallback could be searching through the
> supermodule's object repository.

I'd flip those around: first search registered .gitmodules, then look up
the object in the superproject (since you have it at hand), and then
maybe search every submodule.
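That ordering could look like the sketch below: the submodule registered in `.gitmodules` first, then the superproject's own object store, then every other repository under `.git/modules`.  The presence check is again a caller-supplied callable, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def candidate_git_dirs(
    registered_name: Optional[str],
    all_module_names: Iterable[str],
    superproject_git_dir: str = ".git",
) -> list[str]:
    """Order the repositories to search for a gitlink's commit."""
    order: list[str] = []
    if registered_name is not None:
        order.append(f"{superproject_git_dir}/modules/{registered_name}")
    order.append(superproject_git_dir)  # already at hand, so cheap to check
    for name in sorted(all_module_names):
        git_dir = f"{superproject_git_dir}/modules/{name}"
        if git_dir not in order:
            order.append(git_dir)
    return order


def find_commit(
    oid: str,
    candidates: Iterable[str],
    has_commit: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first candidate git dir that contains oid, or None."""
    for git_dir in candidates:
        if has_commit(git_dir, oid):
            return git_dir
    return None
```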

> >> Josh, what was the reason behind proposing this feature? Was it
> >> conceived as adding completeness to gitrevisions syntax, a low-hanging
> >> fruit?  It isn't (the latter).  Or was it some problem with submodule
> >> handling that you would want to use this syntax for?
> > 
> > This wasn't an abstract/theoretical completeness issue.  I specifically
> > wanted this syntax for practical use with actual trees containing
> > gitlinks, motivated by having a tool that creates and uses such
> > gitlinks. :)
> 
> Could you explain what you need in more detail?  Is it a fragment
> of the submodule's history, the contents of a file at a given point
> of the superproject's history, a diff between a file in the submodule
> and something else, or what?

As part of git-series, I have commits whose trees contain various
gitlinks, such as "series" and "base".  Those gitlinks point to commits
in the same repository.  I'd like to use those gitlinks everywhere I
could use any other committish, such as a branch name.  In particular,
I'd like to write things like some_feature:series:path/to/file ("what
does path/to/file look like in the current version of some_feature"),
some_feature:series^ ("what's the second-to-last commit in
some_feature"), some_feature~5:series:path/to/file ("what did
path/to/file look like in an older version of some_feature"), or
some_feature~5:base..some_feature~5:series~2 ("all but the last two
patches in some_feature~5").  Those should work with show, diff,
format-patch, log, etc.
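A toy model of how such expressions could be evaluated: split on `:`, resolve the first segment as a revision, and treat each later segment as a path in the current commit's tree, hopping into the linked commit whenever that path is a gitlink.  The in-memory `refs`/`parents`/`trees` dictionaries and all function names are hypothetical stand-ins for real object lookups; only first-parent `~N`, `~` and `^` suffixes are handled, and a file whose name ends in one of those suffixes would be misread (the quoting ambiguity this syntax introduces).

```python
from __future__ import annotations

import re
from typing import Dict, Tuple

# Toy object model (hypothetical): commit -> {path: ("commit", oid) | ("blob", data)}
Trees = Dict[str, Dict[str, Tuple[str, str]]]

_SUFFIX = re.compile(r"^(?P<base>.*?)(?P<suffix>(?:~\d*|\^)*)$")


def split_suffix(segment: str) -> tuple[str, str]:
    """Split 'series~2' into ('series', '~2'); only ~N, ~ and ^ are recognised."""
    m = _SUFFIX.match(segment)
    return m.group("base"), m.group("suffix")


def walk(commit: str, suffix: str, parents: Dict[str, str]) -> str:
    """Apply first-parent steps: '~N' is N steps, a bare '~' or '^' is one."""
    for tok in re.findall(r"~\d*|\^", suffix):
        steps = int(tok[1:]) if len(tok) > 1 else 1
        for _ in range(steps):
            commit = parents[commit]
    return commit


def resolve(expr: str, refs: Dict[str, str], parents: Dict[str, str], trees: Trees) -> str:
    """Resolve <rev>:<gitlink>[:<gitlink>...][:<path>] with suffixes on each hop."""
    first, *rest = expr.split(":")
    base, suffix = split_suffix(first)
    commit = walk(refs[base], suffix, parents)
    for i, segment in enumerate(rest):
        path, suffix = split_suffix(segment)
        kind, value = trees[commit][path]
        if kind == "commit":                   # a gitlink: continue from the linked commit
            commit = walk(value, suffix, parents)
        elif i == len(rest) - 1 and not suffix:
            return value                       # final segment names a file: its contents
        else:
            raise ValueError(f"{path!r} is not a gitlink")
    return commit
```

A range such as `some_feature~5:base..some_feature~5:series~2` would be handled by splitting on `..` first and resolving each side separately.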

> >> As for usefulness: this fills the hole in accessing submodules, one
> >> that could be handled by combining plumbing-level commands.  Namely,
> >> there are 5 states of a submodule (as I understand it):
> >>
> >>  * recorded in ref / commit in supermodule
> >>  * recorded in the index in supermodule
> >>  - recorded in ref / commit in submodule
> >>  - recorded in the index in submodule
> >>  - state of worktree in submodule
> >>
> >> The last three can be easily accessed by cd-ing to the submodule.  The first
> >> two are not easy to get, AFAIUC.
> > 
> > Right.  I primarily care about those first two cases, especially the
> > first one: given a commit containing a gitlink, how can I easily dig
> > into the linked commit?
> 
> All right.
> 
> Though you can cobble it together with plumbing... just saying.

Sure, but that makes the expression much more complex.
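For comparison, the plumbing version of those expressions can be generated mechanically: every gitlink hop that is followed by another `:` or carries a `~`/`^` suffix gets wrapped in `$(git rev-parse ...)`, which works here because the git-series gitlinks point into the same repository.  A small sketch (hypothetical function name; it assumes ref names and paths contain no whitespace or shell metacharacters, and recognises only `~N`, `~` and `^` suffixes):

```python
from __future__ import annotations

import re

_SUFFIX = re.compile(r"^(?P<base>.*?)(?P<suffix>(?:~\d*|\^)*)$")


def _one_side(expr: str) -> str:
    first, *rest = expr.split(":")
    out = first
    for i, segment in enumerate(rest):
        m = _SUFFIX.match(segment)
        path, suffix = m.group("base"), m.group("suffix")
        out = f"{out}:{path}"
        # Git only understands a single <rev>:<path> today, so resolve this hop
        # to a commit ID before adding a suffix or descending further.
        if suffix or i < len(rest) - 1:
            out = f"$(git rev-parse {out}){suffix}"
    return out


def to_plumbing(expr: str) -> str:
    """Rewrite e.g. 'a:b:c' as '$(git rev-parse a:b):c' for use in a shell."""
    parts = re.split(r"(\.\.\.?)", expr)  # keep '..' / '...' range operators
    return "".join(p if i % 2 else _one_side(p) for i, p in enumerate(parts))
```

For instance, `some_feature~5:base..some_feature~5:series~2` becomes `some_feature~5:base..$(git rev-parse some_feature~5:series)~2`, which a shell can pass to `git log`.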

Thread overview: 13+ messages
2016-08-20 22:50 Extending "extended SHA1" syntax to traverse through gitlinks? Josh Triplett
2016-08-21 13:46 ` Jakub Narębski
2016-08-21 14:26   ` Josh Triplett
2016-08-22 18:39     ` Jakub Narębski
2016-08-23  6:53       ` Josh Triplett
2016-08-23 20:24         ` Jakub Narębski
2016-08-24  5:36           ` Junio C Hamano
2016-08-24 13:16             ` Jakub Narębski
2016-08-24 14:20               ` Josh Triplett
2016-08-24 16:26                 ` Stefan Beller
2016-08-24 17:05                 ` Jakub Narębski
2016-08-24 20:21                   ` Josh Triplett
2016-08-23 16:39       ` Junio C Hamano