[RFE] Inverted sparseness

git@vger.kernel.org mailing list mirror (one of many)

* [RFE] Inverted sparseness
@ 2017-12-01 17:21 Randall S. Becker
  2017-12-01 18:19 ` Jeff Hostetler
  0 siblings, 1 reply; 6+ messages in thread
From: Randall S. Becker @ 2017-12-01 17:21 UTC (permalink / raw)
  To: git

I recently encountered a really strange use-case relating to sparse clone/fetch that is backwards from the discussion that has been going on. I'm a bit embarrassed to bring it up, but I have no good solution; the only alternative I can see is building a separate data store that will end up inconsistent with the repositories (a bad solution).  The use-case is as follows:

Given a backbone of multiple git repositories spread across an organization with a server farm and upstream vendors.
The vendor delivers code by having the client perform git pull into a specific branch.
The customer may take the code as is or merge in customizations.
The vendor wants to know exactly what commit of theirs is installed on each server, in near real time.
The customer is willing to push the commit-ish to the vendor's upstream repo but does not want, by default, to share the actual commit contents for security reasons.
	Realistically, the vendor needs to know that their own commit id was put somewhere (process exists to track this, so not part of the use-case) and whether there is a subsequent commit contributed by the customer, but the content is not relevant initially.

After some time, the vendor may request the commit contents from the customer in order to satisfy support requirements, e.g. a defect was found and has to be resolved.
The customer would then perform a deeper push that looks a lot like a "slightly" symmetrical operation of a deep fetch following a prior sparse fetch to supply the vendor with the specific commit(s).

This is not hard to realize if the sparse commit is HEAD on a branch, but if it's inside a tree, well, I don't even know where to start. To self-deprecate, this is likely a bad idea, but it has come up a few times.

Thoughts? Nasty Remarks?

Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000) 
-- In my real life, I talk too much.


* Re: [RFE] Inverted sparseness
  2017-12-01 17:21 [RFE] Inverted sparseness Randall S. Becker
@ 2017-12-01 18:19 ` Jeff Hostetler
  2017-12-01 18:31   ` Randall S. Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Hostetler @ 2017-12-01 18:19 UTC (permalink / raw)
  To: Randall S. Becker, git

On 12/1/2017 12:21 PM, Randall S. Becker wrote:
> I recently encountered a really strange use-case relating to sparse clone/fetch that is really backwards from the discussion that has been going on, and well, I'm a bit embarrassed to bring it up, but I have no good solution including building a separate data store that will end up inconsistent with repositories (a bad solution).  The use-case is as follows:
> 
> Given a backbone of multiple git repositories spread across an organization with a server farm and upstream vendors.
> The vendor delivers code by having the client perform git pull into a specific branch.
> The customer may take the code as is or merge in customizations.
> The vendor wants to know exactly what commit of theirs is installed on each server, in near real time.
> The customer is willing to push the commit-ish to the vendor's upstream repo but does not want, by default, to share the actual commit contents for security reasons.
> 	Realistically, the vendor needs to know that their own commit id was put somewhere (process exists to track this, so not part of the use-case) and whether there is a subsequent commit contributed by the customer, but the content is not relevant initially.
> 
> After some time, the vendor may request the commit contents from the customer in order to satisfy support requirements - a.k.a. a defect was found but has to be resolved.
> The customer would then perform a deeper push that looks a lot like a "slightly" symmetrical operation of a deep fetch following a prior sparse fetch to supply the vendor with the specific commit(s).
> 
> This is not hard to realize if the sparse commit is HEAD on a branch, but if its inside a tree, well, I don't even know where to start. To self-deprecate, this is likely a bad idea, but it has come up a few times.
> 
> Thoughts? Nasty Remarks?
> 
> Randall

Perhaps I'm not understanding the subtleties of what you're describing,
but could you do this with stock git functionality?

Let the vendor publish a "well known branch" for the client.
Let the client pull that and build.
Let the client create a branch set to the same commit that they fetched.
Let the client push that branch as a client-specific branch to
the vendor to indicate that that is the official release they
are based on.

Then the vendor would know the official commit that the client was
using.

If the client makes local changes, does the vendor really need the
SHA of those -- without the actual content?  I mean any SHA would
do right?  Perhaps let the client create a second client-specific
branch (set to the same commit as the first) to indicate they had
mods.

Later, when the vendor needs the actual client changes, the client
does a normal push to this 2nd client-specific branch at the vendor.
This would send everything that the client has done to the code
since the official release.
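
The steps above can be sketched with stock commands. This is only an illustration under stated assumptions: the throwaway repositories, the `release` branch and the `clients/acme/...` branch names are hypothetical and do not come from the thread.

```shell
#!/bin/sh
set -eu
# Fixed identity so the sketch runs without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Vendor side: publish a "well known branch" (hypothetical name: release).
git init -q "$work/vendor"
git -C "$work/vendor" symbolic-ref HEAD refs/heads/release
git -C "$work/vendor" commit -q --allow-empty -m "vendor release 1.0"

# Client side: fetch the release, then record the exact commit that was installed.
git clone -q "$work/vendor" "$work/client"
git -C "$work/client" branch --no-track acme-installed origin/release
git -C "$work/client" push -q origin acme-installed:refs/heads/clients/acme/installed

# Second client-specific branch, initially at the same commit, to signal "has mods".
git -C "$work/client" branch --no-track acme-mods acme-installed
git -C "$work/client" push -q origin acme-mods:refs/heads/clients/acme/mods

# Later, when the vendor needs the real changes: commit and do a normal push.
git -C "$work/client" checkout -q acme-mods
git -C "$work/client" commit -q --allow-empty -m "client customization"
git -C "$work/client" push -q origin acme-mods:refs/heads/clients/acme/mods

# Vendor's view of what this client is running.
git -C "$work/vendor" for-each-ref --format='%(objectname:short) %(refname:short)' refs/heads
```

Note that in this sketch the vendor only learns the client's commit ids once the final push happens, which is exactly the gap the follow-up messages discuss.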

I'm not sure what you mean about "it is inside a tree".

Hope this helps,
Jeff


* RE: [RFE] Inverted sparseness
  2017-12-01 18:19 ` Jeff Hostetler
@ 2017-12-01 18:31   ` Randall S. Becker
  2017-12-03 23:14     ` Philip Oakley
  0 siblings, 1 reply; 6+ messages in thread
From: Randall S. Becker @ 2017-12-01 18:31 UTC (permalink / raw)
  To: 'Jeff Hostetler', git

On December 1, 2017 1:19 PM, Jeff Hostetler wrote:
>On 12/1/2017 12:21 PM, Randall S. Becker wrote:
>> I recently encountered a really strange use-case relating to sparse clone/fetch that is really backwards from the discussion that has been going on, and well, I'm a bit embarrassed to bring it up, but I have no good solution including building a separate data store that will end up inconsistent with repositories (a bad solution).  The use-case is as follows:
>> 
>> Given a backbone of multiple git repositories spread across an organization with a server farm and upstream vendors.
>> The vendor delivers code by having the client perform git pull into a specific branch.
>> The customer may take the code as is or merge in customizations.
>> The vendor wants to know exactly what commit of theirs is installed on each server, in near real time.
>> The customer is willing to push the commit-ish to the vendor's upstream repo but does not want, by default, to share the actual commit contents for security reasons.
>> 	Realistically, the vendor needs to know that their own commit id was put somewhere (process exists to track this, so not part of the use-case) and whether there is a subsequent commit contributed >by the customer, but the content is not relevant initially.
>> 
>> After some time, the vendor may request the commit contents from the customer in order to satisfy support requirements - a.k.a. a defect was found but has to be resolved.
>> The customer would then perform a deeper push that looks a lot like a "slightly" symmetrical operation of a deep fetch following a prior sparse fetch to supply the vendor with the specific commit(s).

>Perhaps I'm not understanding the subtleties of what you're describing, but could you do this with stock git functionality.

>Let the vendor publish a "well known branch" for the client.
>Let the client pull that and build.
>Let the client create a branch set to the same commit that they fetched.
>Let the client push that branch as a client-specific branch to the vendor to indicate that that is the official release they are based on.

>Then the vendor would know the official commit that the client was using.
This is the easy part, and it doesn't require anything sparse to exist.

>If the client makes local changes, does the vendor really need the SHA of those -- without the actual content?
>I mean any SHA would do right?  Perhaps let the client create a second client-specific branch (set to
> the same commit as the first) to indicate they had mods.
>Later, when the vendor needs the actual client changes, the client does a normal push to this 2nd client-specific branch at the vendor.
>This would send everything that the client has done to the code since the official release.

What I should have added to the use-case was that there is a strong audit requirement (regulatory, actually) that the SHA is exact, immutable, and cannot be substituted or forged (one of the reasons git is held in such high regard). So no, I can't arrange a fake SHA to represent a SHA to be named later. The SHA of the installed commit is part of the official record of what happened on the specific server, so I'm stuck with it.
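
As a small illustration of why a placeholder cannot stand in for the real id: a commit's id is the hash of its own content, so it can be recomputed and checked, and any edit produces a different id. The sketch below uses a throwaway repository and a hypothetical commit message.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "vendor release"

# The id git recorded for the commit.
recorded=$(git rev-parse HEAD)

# Recompute the id from the raw commit object (tree, parents, author, message).
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Change one word of the message and hash again: the id no longer matches.
altered=$(git cat-file commit HEAD \
    | sed 's/vendor release/vendor release (edited)/' \
    | git hash-object -t commit --stdin)

printf 'recorded:   %s\nrecomputed: %s\naltered:    %s\n' \
    "$recorded" "$recomputed" "$altered"
```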

>I'm not sure what you mean about "it is inside a tree".

m---a---b---c---H1
          `---d---H2

d would be at a head. b would be inside. Determining content of c is problematic if b is sparse, so I'm really unsure that any of this is possible.
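
For readers who want to poke at the diagram, here is a rough sketch that rebuilds it in a scratch repository and reports which commits sit at a head and which are inside the history. It assumes H1 and H2 are branch names pointing at c and d; the `commit-*` tag names and the `position` helper are made up for the example.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# m---a---b---c on branch H1, each commit tagged so it can be named later.
git symbolic-ref HEAD refs/heads/H1
for name in m a b c; do
    git commit -q --allow-empty -m "$name"
    git tag "commit-$name"
done

# d forks off b on branch H2.
git checkout -q -b H2 commit-b
git commit -q --allow-empty -m d
git tag commit-d

# Hypothetical helper: "head" if some branch points at the commit, else "inside".
position() {
    id=$(git rev-parse "$1^{commit}")
    if git for-each-ref --format='%(objectname)' refs/heads | grep -q "^$id\$"; then
        echo head
    else
        echo inside
    fi
}

for name in b c d; do
    branches=$(git for-each-ref --format='%(refname:short)' \
        --contains="commit-$name" refs/heads | tr '\n' ' ')
    printf '%s: %s, reachable from: %s\n' "$name" "$(position "commit-$name")" "$branches"
done
```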

Cheers,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000) 
-- In my real life, I talk too much.





* Re: [RFE] Inverted sparseness
  2017-12-01 18:31   ` Randall S. Becker
@ 2017-12-03 23:14     ` Philip Oakley
  2017-12-03 23:44       ` Randall S. Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Philip Oakley @ 2017-12-03 23:14 UTC (permalink / raw)
  To: Randall S. Becker, 'Jeff Hostetler', git

From: "Randall S. Becker" <rsbecker@nexbridge.com>
Sent: Friday, December 01, 2017 6:31 PM
> What I should have added to the use-case was that there is a strong audit 
> requirement (regulatory, actually) involved that the SHA is exact, 
> immutable, and cannot be substitute or forged (one of the reasons git is 
> in such high regard). So, no I can't arrange a fake SHA to represent a SHA 
> to be named later. It SHA of the installed commit is part of the official 
> record of what happened on the specific server, so I'm stuck with it.
>
>>I'm not sure what you mean about "it is inside a tree".
>
> m---a---b---c---H1
>          `---d---H2
>
> d would be at a head. b would be inside. Determining content of c is 
> problematic if b is sparse, so I'm really unsure that any of this is 
> possible.
>
> Cheers,
> Randall
>
> -- Brief whoami: NonStop&UNIX developer since approximately 
> UNIX(421664400)/NonStop(211288444200000000)
> -- In my real life, I talk too much.

I think I get the gist of your use case. Would I be right that you don't 
have a true working solution yet, i.e. that it's a problem that is almost 
sorted but falls down at the last step?

If one pretended that this was a single development shop, and treated the 
various vendors, clients and customers as independent developers, each of 
whom is overprotective of their code, it may give a better view that maps 
onto classic feature development diagrams (i.e. draw the answer for local 
devs, then mark where the splits happen).

In particular, I think you could use a notional regulator's view that the 
whole code base is part of a large Git hierarchy of branches and merges, and 
that some of the feature loops are only available via the particular 
developer that worked on that feature.

This would mean that from a regulatory overview there is a merge commit in 
the 'main' (master) hierarchy that has the main and feature commits listed 
as its parents. The feature commit is probably an --allow-empty commit (one 
that has an empty tree if they are that paranoid) that says 'function X 
released' (and is probably tagged), and that release commit then has, as its 
parent, the true release commit, with the true code tree. The latter commit 
isn't actually being shown to you!
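
A minimal sketch of such a marker commit, assuming a scratch repository: it uses `git commit-tree` with the empty tree so the marker carries no file content, and gives it the true release commit as parent. The file name, commit messages and tag name are hypothetical.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The true release commit, whose tree is what the owner wants to withhold.
echo 'proprietary change' > secret.txt
git add secret.txt
git commit -q -m "true release commit (content withheld)"
true_release=$(git rev-parse HEAD)

# The empty tree object; -w makes sure it is written to the object store.
empty_tree=$(git hash-object -w -t tree /dev/null)

# Marker commit: empty tree, but its parent is the true release commit.
marker=$(git commit-tree "$empty_tree" -p "$true_release" -m "function X released")
git tag -a -m "release marker for function X" function-x-released "$marker"

# The marker names its parent by id yet lists no files of its own.
git cat-file -p "$marker"
```

The marker's id still depends on the id of the true release commit, so it pins that commit without showing its tree. Transferring the marker without its parent is a separate problem.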

At this point the potential for using the graft capability comes in (as a 
regulated method!). Locally the graft records the missing line of pearls for 
that paranoid dev/vendor/customer/client. The whole git hierarchy still 
works.
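
One way to experiment with this locally is sketched below. It is an assumption on my part to use `git replace --graft` here rather than a grafts file, and the commit messages are hypothetical. The sketch creates an "island" commit with no parent and then records, locally only, that it should be treated as a child of an existing commit.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Two commits of ordinary history; the second is where the island should attach.
git commit -q --allow-empty -m "vendor release 1.0"
git commit -q --allow-empty -m "vendor release 1.1"
base=$(git rev-parse HEAD)

# An island: a root commit with an empty tree and no recorded parent.
empty_tree=$(git hash-object -w -t tree /dev/null)
island=$(git commit-tree "$empty_tree" -m "function X released")

# Record the graft: treat $island as if its parent were $base.
git replace --graft "$island" "$base"

# With the graft in effect the island sits on top of the history ...
with_graft=$(git rev-list --count "$island")
# ... while the underlying object is untouched and still has no parent.
without_graft=$(git --no-replace-objects rev-list --count "$island")

printf 'commits reachable with graft: %s, without: %s\n' "$with_graft" "$without_graft"
```

The island keeps its original id either way, which seems to be the property the audit requirement cares about.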

The question is how to get that release commit with its empty tree, and its 
tag, to you from the dev. I'd guess that a fast-export of just that 
tag/commit/empty tree would allow you to bring in that sentinel point to 
your hierarchy (initially as a pseudo --root), and then graft it on. (I 
haven't checked if fast-export allows such specificity, but it's a method.)

You can now form the merge commit and have regulatory oversight and the full 
git validation and verification capability, as long as your web of trust 
extends to the regulator looking effectively across the air gap. It's a 
fresh way of seeing the web of trust.

Thus you/they have various "shallow clones", but with gaps and islands in 
the shallowness....  and those gaps are spanned by grafts (which are 
audited). The `git-replace` may also be an option, but I don't think it's 
quite right for this case. You just have a temporary gap in the history, and 
with fast export

If using the empty tree part doesn't pass muster (i.e. showing nothing isn't 
sufficient), then the narrow clone could come into play to limit what parts 
of the trees are widely visible, but mainly it's using the grafts to cover 
the regulatory gap, and (for the moment) using fast-export to transfer the 
singleton commit / tags.

Oh, just remembered: there is the newish capability to fetch random blobs, so 
that may help.

Philip


* RE: [RFE] Inverted sparseness
  2017-12-03 23:14     ` Philip Oakley
@ 2017-12-03 23:44       ` Randall S. Becker
  2017-12-04 11:54         ` Philip Oakley
  0 siblings, 1 reply; 6+ messages in thread
From: Randall S. Becker @ 2017-12-03 23:44 UTC (permalink / raw)
  To: 'Philip Oakley', 'Jeff Hostetler', git

On December 3, 2017 6:14 PM, Philip Oakley wrote a nugget of wisdom: 

>I think I get the jist of your use case. Would I be right that you don't have a true working
>solution yet? i.e. that it's a problem that is almost sorted but falls down at the last step.

>If one pretended that this was a single development shop, and the various vendors, clients
>and customers as being independent devolopers, each of whom is over protective of their
>code, it may give a better view that maps onto classic feature development diagrams.
>(i.e draw the answer for local devs, then mark where the splits happen)

>In particular, I think you could use a notional regulator's view that the whole code base is
>part of a large Git heirarchy of branches and merges, and that some of the feature loops
>are only available via the particular developer that worked on that feature.

>This would mean that from a regulatory overview there is a merge commit in the 'main'
>(master) heirachy that has the main and feature commits listed, and the feature commit
>is probably an --allow-empty commit (that has an empty tree if they are that paranoid) that
>says 'function X released' (and probably tagged), and that release commit then has, as its
>parent, the true release commit, with the true code tree. The latter commit isn't actually being
>shown to you!

>At this point the potential for using the graft capability comes in (as a regulated method!). 
>Locally the graft records the missing line of pearls for that paranoid dev/vendor/customer/client. 
>The whole git heirachy still works.

>The question is how to get that  release commit with its empty tree, and its tag, to you from
>the dev. I'd guess that a fast-export of just that tag/commit/empty tree would allow you to
>bring in that sentinel point to your heirachy (initially as a psuedo --root), and then graft it
>on. (I haven't checked if fast-export allows such specificity, but it's a method)

>You can now form the merge commit and have regulatory oversight and the full git validation
>and verification capability, as long as your web of trust extends to the regulator looking
>effectively across the air gap. It's a fresh way of seeing the web of trust.

>Thus you/they have various "shallow clones", but with gaps and islands in the shallowness.... 
>and those gaps are spanned by grafts (which are audited). The `git-replace` may also be an
>option, but I don't think it's quite right for this case. You just have a temporary gap in
>the history, and with fast export

>If using the empty tree part doesn't pass muster (i.e. showing nothing isn't sufficient), 
>then the narrow clone could come into play to limit what parts of the trees are widely
>visible, but mainly its using the grafts to cover the regulatory gap, and (for the moment) 
>using fast-export to transfer the singleton commit / tags

>Oh Just remembered, there is the newish capability to fetch random blobs, so that may help.

I think you hit the nail on the head pretty well. We're currently at 2.3.7, with a push to 2.15.1 this week, so I'm looking forward to trying this. My two worries are whether the empty tree is acceptable (it should be to the client, and might be to the vendor), and doing this reliably (semi-automated) so the user base does not have to worry about the gory details of doing this. The unit tests for it are undoubtedly going to give me headaches.

Thanks for the advice. Islands of shallowness are a really descriptive image for what this is. So identifying that there are shoals (to extend the metaphor somewhat) will be crucial to this adventure.

Cheers,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000) 
-- In my real life, I talk too much.





* Re: [RFE] Inverted sparseness
  2017-12-03 23:44       ` Randall S. Becker
@ 2017-12-04 11:54         ` Philip Oakley
  0 siblings, 0 replies; 6+ messages in thread
From: Philip Oakley @ 2017-12-04 11:54 UTC (permalink / raw)
  To: Randall S. Becker, 'Jeff Hostetler', Git List

From: "Randall S. Becker"  :December 03, 2017 11:44 PM

Randall said>
I think you hit the nail on the head pretty well. We're currently at 2.3.7,
with a push to 2.15.1 this week, so I'm looking forward to trying this. My
two worries are whether the empty tree is acceptable (it should be to the
client, and might be to the vendor), and doing this reliably
(semi-automated) so the user base does not have to worry about the gory
details of doing this. The unit tests for it are undoubtedly going to give
me headaches.

Thanks for the advice. Islands of shallowness are a really descriptive image
for what this is. So identifying that there are shoals (to extend the
metaphor somewhat), will be crucial to this adventure.
---

An overnight sleep reminded me that the ideal way of transferring across the 
air gap is *obviously* the use of `git bundle`.

Bundle allows you to specify the exact revisions (the tag and commits) in 
the bundle that is transferred by sneakernet (or email) between the repos.
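
A rough sketch of what such a bundle could look like, assuming a scratch repository and a hypothetical marker tag named `function-x-released`. The bundle is asked for the tagged marker commit only, with the true release commit explicitly excluded.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# True release commit (content to be withheld) and an empty-tree marker on top.
echo 'proprietary change' > secret.txt
git add secret.txt
git commit -q -m "true release commit (content withheld)"
true_release=$(git rev-parse HEAD)
empty_tree=$(git hash-object -w -t tree /dev/null)
marker=$(git commit-tree "$empty_tree" -p "$true_release" -m "function X released")
git tag function-x-released "$marker"

# Bundle only the marker: include the tag, exclude the true release and its history.
bundle="$(mktemp -d)/marker.bundle"
git bundle create "$bundle" function-x-released "^$true_release" 2>/dev/null

# The refs the bundle offers to whoever receives it.
heads=$(git bundle list-heads "$bundle")
printf '%s\n' "$heads"
```

One caveat worth testing before relying on this: the excluded commit is recorded in the bundle as a prerequisite, and a stock `git bundle verify` on the receiving side expects that commit to be present already. How the receiving repository should cope with the missing parent is the part the graft is meant to cover.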

I'm also thinking that the vendor/client/customer should also have, on their 
side, one of the empty merge commits that shows both the original fork-point 
(aka merge base) and their current (empty) release commit. This provides the 
authentication and verification that they have used the right base commit 
for their ox-bow lake of disconnected development (Oh the metaphors just 
keep coming). It also provides a place for the automated/scripted graft to 
get the two ends of the graft from, and check they are valid.
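
As a sketch of how a script might read the two ends back out of such a commit and check them: the record commit below is an empty-tree merge whose first parent is the fork point and whose second parent is the current release. That parent order and the commit messages are assumptions made for the example.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Vendor release that the client forked from, then the client's own work.
git commit -q --allow-empty -m "vendor release 1.0"
fork_point=$(git rev-parse HEAD)
git commit -q --allow-empty -m "client customization"
git commit -q --allow-empty -m "client release"
release=$(git rev-parse HEAD)

# Empty-tree record commit: parent 1 = fork point, parent 2 = current release.
empty_tree=$(git hash-object -w -t tree /dev/null)
record=$(git commit-tree "$empty_tree" -p "$fork_point" -p "$release" \
    -m "client record: fork point and release")

# Scripted check: read both ends from the record and confirm the first
# really is the merge base of the two.
end1=$(git rev-parse "$record^1")
end2=$(git rev-parse "$record^2")
base=$(git merge-base "$end1" "$end2")
if [ "$base" = "$end1" ]; then
    echo "fork point $end1 is the merge base of release $end2"
else
    echo "recorded fork point does not match merge base $base" >&2
    exit 1
fi
```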

It would be very easy for transcription errors to sneak in at that step of 
recording the fork-point (which would be a bit of unexpected river capture - 
https://phys.org/news/2017-04-retreating-yukon-glacier-river.html), so 
making the client also do it removes that concern.

The creation of client side and your side empty-merges should also create a 
criss-cross plaiting that locks the two processes together - it's almost a 
block-chain!

Hope it goes well. It would be great to hear the result.
--
Philip




