git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] git-fast-import: note 1M limit of mark number
@ 2008-04-15 12:54 Sam Vilain
  2008-04-15 15:50 ` Michael Haggerty
  0 siblings, 1 reply; 5+ messages in thread
From: Sam Vilain @ 2008-04-15 12:54 UTC (permalink / raw)
  To: git

The insert_mark() function in fast-import.c has this limit; note the
limitation in the documentation.

Signed-off-by: Sam Vilain <sam@vilain.net>
---
 Documentation/git-fast-import.txt |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index c29a4f8..b5cc3c2 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -410,6 +410,9 @@ or `refs/heads/42`), or an abbreviated SHA-1 which happened to
 consist only of base-10 digits.
 +
 Marks must be declared (via `mark`) before they can be used.
++
+Note that due to current internal limitations, you may not make marks
+with a higher number than 1048575 (2^20-1).
 
 * A complete 40 byte or abbreviated commit SHA-1 in hex.
 
-- 
1.5.4.rc2.85.g7c8f5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] git-fast-import: note 1M limit of mark number
  2008-04-15 12:54 [PATCH] git-fast-import: note 1M limit of mark number Sam Vilain
@ 2008-04-15 15:50 ` Michael Haggerty
  2008-04-15 21:05   ` Sam Vilain
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Haggerty @ 2008-04-15 15:50 UTC (permalink / raw)
  To: Sam Vilain; +Cc: git

Sam Vilain wrote:
> The insert_mark() function in fast-import.c has this limit; note the
> limitation in the documentation.
> 
> Signed-off-by: Sam Vilain <sam@vilain.net>
> ---
>  Documentation/git-fast-import.txt |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index c29a4f8..b5cc3c2 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -410,6 +410,9 @@ or `refs/heads/42`), or an abbreviated SHA-1 which happened to
>  consist only of base-10 digits.
>  +
>  Marks must be declared (via `mark`) before they can be used.
> ++
> +Note that due to current internal limitations, you may not make marks
> +with a higher number than 1048575 (2^20-1).
>  
>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
>  

Oh.  Um.  That is an awkwardly small number nowadays.

cvs2svn has been used for repositories with O(2^20) distinct file
revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
small for some users.

Moreover, cvs2git needs to generate marks for both file contents and for
commits.  It generates the latter by adding 1000000000 to the small
integer IDs that it uses internally.  If git-fast-import only allows
20-bit integers, this makes me wonder why this hasn't broken
dramatically in the past.  Pure numerological good fortune, combined
with weak range checking in git-fast-import?

In any case, this restriction will require changes in cvs2git.

While I'm at it, let me also renew my suggestion that git-fast-import
use separate namespaces ("markspaces", so to speak) for file content
marks and for commit marks.  There is no reason for these distinct types
of marks to be located in a shared space of integers.

Michael

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] git-fast-import: note 1M limit of mark number
  2008-04-15 15:50 ` Michael Haggerty
@ 2008-04-15 21:05   ` Sam Vilain
  2008-04-16  6:54     ` Shawn O. Pearce
  2008-04-16  7:04     ` Michael Haggerty
  0 siblings, 2 replies; 5+ messages in thread
From: Sam Vilain @ 2008-04-15 21:05 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git

Michael Haggerty wrote:
>> ++
>> +Note that due to current internal limitations, you may not make marks
>> +with a higher number than 1048575 (2^20-1).
>>  
>>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
>>  
> 
> Oh.  Um.  That is an awkwardly small number nowadays.
> 
> cvs2svn has been used for repositories with O(2^20) distinct file
> revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
> small for some users.

Right.  But, if you're not making the importer you write for a
conversion of that size restartable, you're insane.  So, marking more
than 1Mi *marks* in a single gfi session might not be so vital.

It only tripped me up because I was using a database sequence to
generate the marks, which meant I hit the ceiling.

> Moreover, cvs2git needs to generate marks for both file contents and for
> commits.  It generates the latter by adding 1000000000 to the small
> integer IDs that it uses internally.  If git-fast-import only allows
> 20-bit integers, this makes me wonder why this hasn't broken
> dramatically in the past.  Pure numerological good fortune, combined
> with weak range checking in git-fast-import?

Perhaps.  All I saw was that after I hit 1Mi for the mark ID, the mark
numbers in the returned file were drastically different from the ones I
put in.  I had a glance over this code and it seemed likely to be a
culprit - this docpatch is really more raising awareness of the problem.
 Obviously finding the fault and fixing it would be preferable.

> While I'm at it, let me also renew my suggestion that git-fast-import
> use separate namespaces ("markspaces", so to speak) for file content
> marks and for commit marks.  There is no reason for these distinct types
> of marks to be located in a shared space of integers.

There is a reason, it's because they're both just object IDs.  Is it
really that much of a drag?  I know what you mean though, it meant for
my code I had to keep track of which type each mark was.

Sam.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] git-fast-import: note 1M limit of mark number
  2008-04-15 21:05   ` Sam Vilain
@ 2008-04-16  6:54     ` Shawn O. Pearce
  2008-04-16  7:04     ` Michael Haggerty
  1 sibling, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2008-04-16  6:54 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Michael Haggerty, git

Sam Vilain <sam@vilain.net> wrote:
> Michael Haggerty wrote:
> >> ++
> >> +Note that due to current internal limitations, you may not make marks
> >> +with a higher number than 1048575 (2^20-1).
> >>  
> >>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
> > 
> > Oh.  Um.  That is an awkwardly small number nowadays.
> > 
> > cvs2svn has been used for repositories with O(2^20) distinct file
> > revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
> > small for some users.
> 
> Right.  But, if you're not making the importer you write for a
> conversion of that size restartable, you're insane.  So, marking more
> than 1Mi *marks* in a single gfi session might not be so vital.
> 
> It only tripped me up because I was using a database sequence to
> generate the marks, which meant I hit the ceiling.

Uhm.  Wow.  gfi has a bug then; mark numbers are supposed to
be whatever uintmax_t is on your platform; for any systems that
support a 64 bit off_t (and most do these days) that should be a
64 bit integer value, which we all know easily exceeds 2^20-1.

I'd rather see the bug fixed than a documentation patch.  But I'm
too whacked out from heavy travel in the past two days flying
coast-to-coast and back to attempt debugging gfi and writing such
a fix patch tonight.  Maybe someone else will find it before I
can look at it later.  ;-)
 
> > While I'm at it, let me also renew my suggestion that git-fast-import
> > use separate namespaces ("markspaces", so to speak) for file content
> > marks and for commit marks.  There is no reason for these distinct types
> > of marks to be located in a shared space of integers.
> 
> There is a reason, it's because they're both just object IDs.  Is it
> really that much of a drag?  I know what you mean though, it meant for
> my code I had to keep track of which type each mark was.

Yea.  I think I had pointed out the same point to Michael earlier
when he asked about it.  I didn't want create two different tables
of marks because then we either need to extend the language to say
which table, or we have to infer it based on position.  But either
way its just object IDs.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] git-fast-import: note 1M limit of mark number
  2008-04-15 21:05   ` Sam Vilain
  2008-04-16  6:54     ` Shawn O. Pearce
@ 2008-04-16  7:04     ` Michael Haggerty
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Haggerty @ 2008-04-16  7:04 UTC (permalink / raw)
  To: Sam Vilain; +Cc: git

Sam Vilain wrote:
> Michael Haggerty wrote:
>>> ++
>>> +Note that due to current internal limitations, you may not make marks
>>> +with a higher number than 1048575 (2^20-1).
>>>  
>>>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
>>>  
>> Oh.  Um.  That is an awkwardly small number nowadays.
>>
>> cvs2svn has been used for repositories with O(2^20) distinct file
>> revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
>> small for some users.
> 
> Right.  But, if you're not making the importer you write for a
> conversion of that size restartable, you're insane.  So, marking more
> than 1Mi *marks* in a single gfi session might not be so vital.

Well, call me insane then.  Because there *are* no bulletproof
incremental CVS importers, and it would be a lot of work to write one.
(git-cvsimport doesn't count because it is far from bulletproof.)  So it
is reasonable to choose a high-quality one-time conversion over a broken
restartable conversion.

>> While I'm at it, let me also renew my suggestion that git-fast-import
>> use separate namespaces ("markspaces", so to speak) for file content
>> marks and for commit marks.  There is no reason for these distinct types
>> of marks to be located in a shared space of integers.
> 
> There is a reason, it's because they're both just object IDs.  Is it
> really that much of a drag?  I know what you mean though, it meant for
> my code I had to keep track of which type each mark was.

That's exactly my point.  That's at least two of us then who have had
extra work to merge two separate ID concepts into a single integer
sequence.  It's not a big deal, but it would be convenient if
git-fast-import allowed the sequences to overlap.

Michael

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-04-16  7:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-15 12:54 [PATCH] git-fast-import: note 1M limit of mark number Sam Vilain
2008-04-15 15:50 ` Michael Haggerty
2008-04-15 21:05   ` Sam Vilain
2008-04-16  6:54     ` Shawn O. Pearce
2008-04-16  7:04     ` Michael Haggerty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).