* [RFC] subtree: handle unmerged history trees
From: Claus Schneider @ 2020-05-06 14:00 UTC (permalink / raw)
  To: git

Hi,

I would like to enhance the git subtree functionality.

Problem outline:
Consider a team that develops a software subsystem where part of the
code is internal and other parts (source and/or interfaces) should be
delivered continuously for others to incorporate into their
system/product.

One solution is to use a submodule structure for both the development
team and the product, but that is a hassle for the development team.
They need to make logical commits across the submodule and the parent
repository, which can be a problem with parallel development and
verification.

Subtree can split the repository to achieve this, but there are some
constraints that I would like to fix:
- In bare mode it pushes changes to a separate branch containing the
prefix changes, which is fine. The problem appears when you run the
next split. Either you re-split all the commits again, or you add the
--rejoin parameter, with the result that the split prefix patches are
part of your history twice, or even more often if you have further
extracts. So this is either a performance issue or a usability issue.
- You lose traceability from the extracted subtree commit back to your
original commits.

Solution outline using subtree:
- Add traceability to each extracted commit in the new history
  - It enables humans to trace from the extracted commit to the
original commit by reading the message, by clicking in tools like
gitk, or by scripting if desired.
  - It enables subtree itself to use this traceability to simulate
the add-repository or rejoin merge commit. Subtree can then behave
the same way regardless of the method being used.
  - Add an option for rev-list so it can list commits based on
prefix/subdirectory. I have not found any errors, issues or side
effects from adding "-- $dir" to the rev-list command. All the manual
tests I have done behave correctly with my fully patched git
revision. It gives a large performance gain for repositories with
many commits.
  - Example: split contrib/subtree out of the git repository.
    Without the option: parsing ~28000 commits
    With the option: parsing ~100 commits
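
To illustrate the effect of the "-- $dir" path limiter outside of subtree itself, here is a minimal sketch using plain git rev-list in a throwaway repository. The directory names (other/, sub/) and the commit counts are made up for the demonstration; it does not exercise the patched git-subtree code, only the rev-list behaviour the patch relies on.

```shell
#!/bin/sh
# Sketch: compare a full rev-list with a path-limited one.
# Everything here (paths, identity, commit counts) is a demo assumption.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir other sub

# Three commits that only touch other/.
for i in 1 2 3; do
    echo "$i" > other/file
    git add other/file
    git commit -q -m "other: change $i"
done

# Two commits that only touch sub/.
for i in 1 2; do
    echo "$i" > sub/file
    git add sub/file
    git commit -q -m "sub: change $i"
done

# Every commit reachable from HEAD.
all_count=$(git rev-list --count HEAD)

# Only the commits that change something under sub/.
sub_count=$(git rev-list --count HEAD -- sub)

echo "all commits: $all_count"
echo "commits touching sub/: $sub_count"
```

The gap between the two counts is the same kind of reduction as the ~28000 versus ~100 commits reported above, just at toy scale.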

Patches can be found here:
https://github.com/git/git/compare/master...Praqma:split-append-info-options-master
Commits related to this:
* cd712dda39 subtree: use append-info to only extract new commits in the prefix
* deb2e1cd8b subtree: add append-info option for adding info about
original commit
* ff73b37e22 subtree: Add option for listing commits based on prefix

Best regards
Claus Schneider

* Re: [RFC] subtree: handle unmerged history trees
From: Tom Clarkson @ 2020-05-11 11:46 UTC (permalink / raw)
  To: Claus Schneider; +Cc: git


> On 7 May 2020, at 12:00 am, Claus Schneider <claus.schneider@eficode.com> wrote:

> - In bare mode it pushes changes to a separate branch containing the
> prefix changes which is fine. You get a problem when you run the next
> split. Either you re-split all the commits again - Or you add the
> --rejoin parameter with the result that the split prefix patches are
> part of your history twice or even more if you have further extracts.
> So this is either a performance issue or a usability issue.

A simpler way to link a split without including both histories would be to add a mainline commit with a git-subtree-split annotation, but without having the subtree commit as a parent. That would give you a reference to a commit not reachable from HEAD though, so plenty of opportunity to shoot yourself in the foot.
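
As a rough sketch of what such an annotation-only commit could look like, the following uses only core git commands in a throwaway repository. The trailer names are the ones git subtree writes into its rejoin commits; the stand-in for the split commit is built with git commit-tree rather than a real split, and whether a given git-subtree version picks up a non-merge commit like this is an assumption that is not verified here.

```shell
#!/bin/sh
# Sketch: record a split on the mainline WITHOUT making the split
# commit a parent. Repository layout and messages are demo assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir sub
echo "content" > sub/file
git add sub/file
git commit -q -m "sub: add file"
mainline=$(git rev-parse HEAD)

# Stand-in for the result of a split: a parentless commit whose tree is
# the content of sub/. It is not reachable from HEAD.
split_id=$(git commit-tree "$(git rev-parse HEAD:sub)" -m "sub: add file")

# Ordinary single-parent commit carrying the annotation trailers.
git commit -q --allow-empty \
    -m "Record split of 'sub/' into commit '$split_id'" \
    -m "git-subtree-dir: sub
git-subtree-mainline: $mainline
git-subtree-split: $split_id"

# Read the annotation back and inspect the result.
recorded=$(git log -1 --format=%B | sed -n 's/^git-subtree-split: //p')
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
if git merge-base --is-ancestor "$split_id" HEAD; then
    reachable=yes
else
    reachable=no
fi

echo "recorded split: $recorded"
echo "parents of annotation commit: $parent_count"
echo "split reachable from HEAD: $reachable"
```

The last check shows the foot-gun: nothing in the mainline history keeps the split commit alive, so it needs its own ref (a branch or a remote) to survive garbage collection.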

Persisting the cache between runs would be enough to avoid any potential performance penalty on subsequent splits, and is just a matter of changing the directory used. My unrelated patch implements that for other reasons, along with letting you specify particular commit mappings from a script if that’s what you need.

> - Add traceability to each extracted commit in new history
>  - It enables humans to trace from the extracted commit to the
> original commit by basic reading, clicking in tools like gitk and
> scripting if desired
>  - Enable subtree itself to utilize the above mentioned traceability
> and simulate the add repository or rejoin merge commit. Subtree can
> then "behave" similarly independent of the method being used.

Have you considered how your annotations will behave if you import the same subtree repo into two different mainline repos? The subtree history would then have references to a bunch of commits that don’t exist. Adding similar annotations to merge commits on the mainline side seems like a good idea though, and would let you use find_existing_splits to avoid regenerating too many commits.
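
A small sketch of the dangling-reference problem, using core git only: given a list of commit ids taken from annotations, check which of them actually exist in the current repository. How the ids are extracted depends on the annotation format, which is not assumed here; the "unknown" id below is made up to stand for a commit that lives in some other mainline repo.

```shell
#!/bin/sh
# Sketch: detect annotation ids that do not resolve in this repository.
# The ids are demo values, not taken from any real annotation format.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "mainline commit"
known=$(git rev-parse HEAD)

# Well-formed id that belongs to no object in this repository.
unknown=1234567890abcdef1234567890abcdef12345678

missing=0
for id in "$known" "$unknown"; do
    # cat-file -e exits non-zero when the object cannot be found.
    if git cat-file -e "$id^{commit}" 2>/dev/null; then
        echo "$id: present"
    else
        echo "$id: missing from this repository"
        missing=$((missing + 1))
    fi
done

echo "missing references: $missing"
```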

For the human readable link from the subtree repo to your original monorepo, perhaps a custom annotation would be a better fit - something like

git subtree split dir --annotate-mainline-commit-as="id-in-monorepo"

>  - Add option for rev-list so it can list based on
> prefix/subdirectory. I have not been able to find any error, issues or
> side effects adding the "-- $dir" to the rev-list command. All the
> manual tests, I have done, behave correctly in my total patched git
> revision. It gives a heck of performance for many-commit repositories.

Have you tested the rev-list dir option against preexisting history that lacks your new annotations, or that was created without split? If any of the new commits has a parent that is not in the rev-list, it will look up that commit individually and recursively. A git-subtree-mainline annotation will shortcut that, but without it the individual lookup is massively slower than working from even a very large rev-list.

