git@vger.kernel.org mailing list mirror (one of many)
* Repository data loss in fast-export with a merge of a deleted submodule
@ 2011-10-27 19:27 Joshua Jensen
  2011-11-03 16:05 ` Joshua Jensen
  0 siblings, 1 reply; 5+ messages in thread
From: Joshua Jensen @ 2011-10-27 19:27 UTC
  To: git@vger.kernel.org

Hello.

We had a submodule that we deleted and then added back into the 
repository at the same location as the former submodule.  When running 
fast-export, the newly 'added' files for the merge commit are listed and 
then are followed with a:

M ... path/to/submodule/file
D path/to/submodule

On fast-import, the resultant repository becomes corrupt due to the 
Delete instruction above occurring AFTER the file adds/modifications.  
The new repository does not match the old repository where the 
fast-export was performed.
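
To illustrate why the order of the two commands matters, here is a minimal sketch in C. It is not fast-import code: `toy_tree`, `toy_modify`, `toy_delete`, `toy_has` and `toy_sub_file_survives` are hypothetical names invented for this example, and the "tree" is just a flat list of paths. The only behavior it borrows from fast-import is that a `D` on a path removes that path and, recursively, everything beneath it.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PATHS 16

/* Hypothetical stand-in for a branch's tree: a flat list of paths. */
struct toy_tree {
	const char *paths[TOY_MAX_PATHS];
	int nr;
};

/* Return 1 if the exact path is present in the toy tree. */
static int toy_has(const struct toy_tree *t, const char *path)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->paths[i], path))
			return 1;
	return 0;
}

/* Model of an 'M' command: record the path if it is not there yet. */
static void toy_modify(struct toy_tree *t, const char *path)
{
	if (toy_has(t, path))
		return;
	assert(t->nr < TOY_MAX_PATHS);
	t->paths[t->nr++] = path;
}

/* Model of a 'D' command: drop the path and everything below "path/". */
static void toy_delete(struct toy_tree *t, const char *path)
{
	size_t len = strlen(path);
	int i, kept = 0;

	for (i = 0; i < t->nr; i++) {
		const char *p = t->paths[i];
		int covered = !strncmp(p, path, len) &&
			      (p[len] == '\0' || p[len] == '/');

		if (!covered)
			t->paths[kept++] = p;
	}
	t->nr = kept;
}

/*
 * Replay the two commands from the report against a parent tree that
 * still contains the old submodule entry "sub". Returns 1 if
 * "sub/file" is present afterwards.
 */
static int toy_sub_file_survives(int delete_first)
{
	struct toy_tree t = { { "sub" }, 1 };

	if (delete_first) {
		toy_delete(&t, "sub");
		toy_modify(&t, "sub/file");
	} else {
		toy_modify(&t, "sub/file");
		toy_delete(&t, "sub");
	}
	return toy_has(&t, "sub/file");
}
```

In this model, `toy_sub_file_survives(0)` replays the order that fast-export emitted (M, then D) and loses `sub/file`, while `toy_sub_file_survives(1)` replays the delete first and keeps it.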

I have included a repro script below.  I have not been able to test this 
on Git 1.7.7.1, but I have tested on Git 1.7.7 (msysGit version).

Please compare the differences between the generated main.fe and 
brokenmain.fe files.  brokenmain.fe has data loss.

I am not familiar with the fast-export code.  Can anyone help out?

Thanks.

Josh

---------

rm -rf main brokenmain sub main.fe brokenmain.fe

# Create the submodule.
mkdir sub
cd sub
git init
echo file > file
git add file
git commit -m file
cd ..

# Create the main repository.
mkdir main
cd main
git init

# Add the submodule.
git submodule add ../sub sub
git commit -m "Add submodule"

# Remove the submodule.
rm -rf sub
git rm sub .gitmodules
git commit -m "Remove submodule"

# Add sub/file to the master branch.
mkdir sub
echo file > sub/file
git add sub/file
git commit -m "Add sub/file"
if [ -f sub/file ]; then
     echo "main: master branch: sub/file exists."
fi

# Delete the submodule directory manually, because we know that the
# incoming merge will need it gone.
git checkout -B will-be-broken HEAD^^
rm -rf sub
git merge --no-ff master

# sub/file exists within the 'will-be-broken' branch.
if [ -f sub/file ]; then
     echo "main: will-be-broken branch: sub/file exists."
fi

# Export out the main repository.
git fast-export --all > ../main.fe

# Create the brokenmain repository.
cd ..
mkdir brokenmain
cd brokenmain
git init

# Import in everything from the main repository.
git fast-import < ../main.fe

# sub/file exists within the master branch.
git checkout master
if [ -f sub/file ]; then
     echo "brokenmain: master branch: sub/file exists."
fi

# sub/file SHOULD exist within the 'will-be-broken' branch but doesn't.
git checkout will-be-broken
if [ ! -f sub/file ]; then
     echo "brokenmain: will-be-broken branch: sub/file SHOULD exist but doesn't."
fi

# Export out the brokenmain repository.
git fast-export --all > ../brokenmain.fe


* Re: Repository data loss in fast-export with a merge of a deleted submodule
  2011-10-27 19:27 Repository data loss in fast-export with a merge of a deleted submodule Joshua Jensen
@ 2011-11-03 16:05 ` Joshua Jensen
  2011-11-14 15:06   ` Joshua Jensen
  0 siblings, 1 reply; 5+ messages in thread
From: Joshua Jensen @ 2011-11-03 16:05 UTC
  To: git@vger.kernel.org

----- Original Message -----
From: Joshua Jensen
Date: 10/27/2011 1:27 PM
> We had a submodule that we deleted and then added back into the 
> repository at the same location as the former submodule.  When running 
> fast-export, the newly 'added' files for the merge commit are listed 
> and then are followed with a:
>
> M ... path/to/submodule/file
> D path/to/submodule
>
> On fast-import, the resultant repository becomes corrupt due to the 
> Delete instruction above occurring AFTER the file adds/modifications.  
> The new repository does not match the old repository where the 
> fast-export was performed.
>
> I am not familiar with the fast-export code.  Can anyone help out?
Okay, I looked into this further, and I came up with a patch that works 
for me.  Nevertheless, I do not understand exactly what is going on 
here, so I would like to defer to someone else's patch to fix the issue.

-Josh


---
builtin/fast-export.c |    8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9836e6b..1abc470 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -161,6 +161,14 @@ static int depth_first(const void *a_, const void *b_)
                name_a = a->one ? a->one->path : a->two->path;
                name_b = b->one ? b->one->path : b->two->path;
+             /*
+             * Move 'D'elete entries first.
+             */
+             if (a->status == 'D')
+                             return -1;
+             else if (b->status == 'D')
+                             return 1;
+
                len_a = strlen(name_a);
                len_b = strlen(name_b);
                len = (len_a < len_b) ? len_a : len_b;
-- 


* Re: Repository data loss in fast-export with a merge of a deleted submodule
  2011-11-03 16:05 ` Joshua Jensen
@ 2011-11-14 15:06   ` Joshua Jensen
  2011-11-14 19:51     ` Jens Lehmann
  2011-11-30  7:15     ` Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Joshua Jensen @ 2011-11-14 15:06 UTC
  To: git@vger.kernel.org

----- Original Message -----
From: Joshua Jensen
Date: 11/3/2011 10:05 AM
> ----- Original Message -----
> From: Joshua Jensen
> Date: 10/27/2011 1:27 PM
>> We had a submodule that we deleted and then added back into the 
>> repository at the same location as the former submodule.  When 
>> running fast-export, the newly 'added' files for the merge commit are 
>> listed and then are followed with a:
>>
>> M ... path/to/submodule/file
>> D path/to/submodule
>>
>> On fast-import, the resultant repository becomes corrupt due to the 
>> Delete instruction above occurring AFTER the file 
>> adds/modifications.  The new repository does not match the old 
>> repository where the fast-export was performed.
>>
>> I am not familiar with the fast-export code.  Can anyone help out?
> Okay, I looked into this further, and I came up with a patch that 
> works for me.  Nevertheless, I do not understand exactly what is going 
> on here, so I would like to defer to someone else's patch to fix the 
> issue.
>
Hi.

__This is a genuine data loss problem in Git.__

I'm confused at the lack of response to this.  I first posted about the
issue **2-1/2 weeks ago**, and there have been no responses.  Does no one
care?

In case no one received the messages, you can find them at [1] and [2].

-Josh

[1] http://www.spinics.net/lists/git/msg168295.html
[2] http://www.spinics.net/lists/git/msg168691.html


* Re: Repository data loss in fast-export with a merge of a deleted submodule
  2011-11-14 15:06   ` Joshua Jensen
@ 2011-11-14 19:51     ` Jens Lehmann
  2011-11-30  7:15     ` Jeff King
  1 sibling, 0 replies; 5+ messages in thread
From: Jens Lehmann @ 2011-11-14 19:51 UTC
  To: Joshua Jensen; +Cc: git@vger.kernel.org, Elijah Newren, Johannes Sixt

On 14.11.2011 16:06, Joshua Jensen wrote:
> ----- Original Message -----
> From: Joshua Jensen
> Date: 11/3/2011 10:05 AM
>> ----- Original Message -----
>> From: Joshua Jensen
>> Date: 10/27/2011 1:27 PM
>>> We had a submodule that we deleted and then added back into the repository at the same location as the former submodule.  When running fast-export, the newly 'added' files for the merge commit are listed and then are followed with a:
>>>
>>> M ... path/to/submodule/file
>>> D path/to/submodule
>>>
>>> On fast-import, the resultant repository becomes corrupt due to the Delete instruction above occurring AFTER the file adds/modifications.  The new repository does not match the old repository where the fast-export was performed.
>>>
>>> I am not familiar with the fast-export code.  Can anyone help out?
>> Okay, I looked into this further, and I came up with a patch that works for me.  Nevertheless, I do not understand exactly what is going on here, so I would like to defer to someone else's patch to fix the issue.
>>
> Hi.
> 
> __This is a genuine data loss problem in Git.__
> 
> I'm confused at the lack of response to this.  I first posted about the issue **2-1/2 weeks ago**, and there have been no responses.  Does no one care?

Maybe no one cares, people didn't read the message (or forgot about it)
or they are too busy ... thanks for prodding us again.

While I'm interested in this issue because submodules are affected, I'm
very short on Git time these days and can't investigate this issue
further (and I have no clue about export/import either). I added the last
two people who touched depth_first() in builtin/fast-export.c to the CC,
maybe they can tell us more about your patch to solve this issue (found
in [2]).

> In case no one received the messages, you can find them at [1] and [2].
> 
> -Josh
> 
> [1] http://www.spinics.net/lists/git/msg168295.html
> [2] http://www.spinics.net/lists/git/msg168691.html


* Re: Repository data loss in fast-export with a merge of a deleted submodule
  2011-11-14 15:06   ` Joshua Jensen
  2011-11-14 19:51     ` Jens Lehmann
@ 2011-11-30  7:15     ` Jeff King
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2011-11-30  7:15 UTC
  To: Joshua Jensen; +Cc: git@vger.kernel.org

On Mon, Nov 14, 2011 at 08:06:51AM -0700, Joshua Jensen wrote:

> __This is a genuine data loss problem in Git.__
> 
> I'm confused at the lack of response to this.  I first posted about
> the issue **2-1/2 weeks ago**, and there have been no responses.  Does
> no one care?

Still not much response.

I think the keywords "submodule" and "fast-export" in the subject line
hit a lot of people's do-not-care filters.

I read your original two messages. It does seem like a simple ordering
problem from your description. I suspect you would get more response if
you actually posted your patch with a commit message explaining the
problem and an accompanying test. And then at the very least, one outcome
could be Junio picking up the patch. :)

I think you have all of those components spread across your messages,
and just need to polish them and put them in one place.

Regarding your patch itself, your explanation makes sense to me and the
goal of your patch looks reasonable, bearing in mind that I know
virtually nothing about the innards of fast-import/fast-export.

But for the patch text itself:

> @@ -161,6 +161,14 @@ static int depth_first(const void *a_, const void *b_)
>                name_a = a->one ? a->one->path : a->two->path;
>                name_b = b->one ? b->one->path : b->two->path;
> +             /*
> +             * Move 'D'elete entries first.
> +             */
> +             if (a->status == 'D')
> +                             return -1;
> +             else if (b->status == 'D')
> +                             return 1;
> +
>                len_a = strlen(name_a);
>                len_b = strlen(name_b);
>                len = (len_a < len_b) ? len_a : len_b;

If you have multiple deleted entries, doesn't this leave them in a
random order at the beginning of the list? Does that matter? If they are
both 'D', should they be compared as usual? I.e.:

  if (a->status != b->status) {
          if (a->status == 'D')
                  return -1;
          if (b->status == 'D')
                  return 1;
  }

  /* and now we do the rest of the function as usual... */
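
As a rough, self-contained sketch of that idea (not the actual builtin/fast-export.c code): `toy_change`, `toy_depth_first` and `toy_sort_changes` below are hypothetical names, and the path comparison after the status check is an assumption about what "the rest of the function" does (compare the common prefix, then put the longer path first). The point it illustrates is that the status check only fires when the two statuses differ, so two 'D' entries are compared like any other pair. With the unconditional check from the quoted patch, comparing two 'D' entries returns -1 regardless of argument order, which gives qsort() an inconsistent comparator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a diff entry. */
struct toy_change {
	char status;		/* 'M', 'D', ... */
	const char *path;
};

static int toy_depth_first(const void *a_, const void *b_)
{
	const struct toy_change *a = a_;
	const struct toy_change *b = b_;
	size_t len_a, len_b, len;
	int cmp;

	assert(a->path && b->path);

	/* Move 'D'elete entries first, but only when the statuses differ. */
	if (a->status != b->status) {
		if (a->status == 'D')
			return -1;
		if (b->status == 'D')
			return 1;
	}

	/*
	 * Assumed remainder of the comparison: order by the common
	 * prefix, and on a tie put the longer path ("d/e") before the
	 * shorter one ("d").
	 */
	len_a = strlen(a->path);
	len_b = strlen(b->path);
	len = (len_a < len_b) ? len_a : len_b;

	cmp = memcmp(a->path, b->path, len);
	if (cmp)
		return cmp;
	if (len_a != len_b)
		return (len_a > len_b) ? -1 : 1;
	return 0;
}

/* Sort an array of changes in place with the comparator above. */
static void toy_sort_changes(struct toy_change *changes, size_t nr)
{
	qsort(changes, nr, sizeof(*changes), toy_depth_first);
}
```

Sorting the pair from the original report, `{'M', "sub/file"}` and `{'D', "sub"}`, with `toy_sort_changes()` puts the `D sub` entry ahead of `M sub/file`, and several 'D' entries still come out in a well-defined order among themselves.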

-Peff
