From: Sam Vilain <sam@vilain.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] git-fast-import: note 1M limit of mark number
Date: Wed, 16 Apr 2008 09:05:06 +1200
Message-ID: <48051882.8000201@vilain.net>
In-Reply-To: <4804CECE.2040205@alum.mit.edu>

Michael Haggerty wrote:
>> ++
>> +Note that due to current internal limitations, you may not make marks
>> +with a higher number than 1048575 (2^20-1).
>>  
>>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
>>  
> 
> Oh.  Um.  That is an awkwardly small number nowadays.
> 
> cvs2svn has been used for repositories with O(2^20) distinct file
> revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
> small for some users.

Right.  But, if you're not making the importer you write for a
conversion of that size restartable, you're insane.  So, marking more
than 1Mi *marks* in a single gfi session might not be so vital.

It only tripped me up because I was using a database sequence to
generate the marks, which meant I hit the ceiling.
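One way to avoid tripping over the ceiling with externally generated IDs is to map them onto small, dense marks inside the importer. The sketch below is only an illustration: `MarkAllocator` and `mark_for` are hypothetical names, and the 2^20-1 limit is taken from the doc patch above rather than from a confirmed reading of the fast-import source.

```python
# Hypothetical helper: maps arbitrary external IDs (e.g. values from a
# database sequence) to small, dense mark numbers for one import session.
# The limit is the figure quoted in the doc patch, assumed here.
MAX_MARK = 2**20 - 1  # 1048575


class MarkAllocator:
    def __init__(self, limit=MAX_MARK):
        self.limit = limit
        self._marks = {}  # external ID -> mark number

    def mark_for(self, external_id):
        """Return the mark for external_id, allocating one if needed."""
        mark = self._marks.get(external_id)
        if mark is None:
            mark = len(self._marks) + 1  # start at 1, staying clear of 0
            if mark > self.limit:
                raise OverflowError(
                    "mark %d exceeds assumed limit %d; start a new session"
                    % (mark, self.limit)
                )
            self._marks[external_id] = mark
        return mark


if __name__ == "__main__":
    alloc = MarkAllocator()
    # A large sequence value still gets a small mark.
    print(":%d" % alloc.mark_for(5000000))
```

A restartable importer could create a fresh allocator per session, so the count only has to stay under the limit within one run.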

> Moreover, cvs2git needs to generate marks for both file contents and for
> commits.  It generates the latter by adding 1000000000 to the small
> integer IDs that it uses internally.  If git-fast-import only allows
> 20-bit integers, this makes me wonder why this hasn't broken
> dramatically in the past.  Pure numerological good fortune, combined
> with weak range checking in git-fast-import?

Perhaps.  All I saw was that after I hit 1Mi for the mark ID, the mark
numbers in the returned file were drastically different from the ones I
put in.  I had a glance over this code and it seemed likely to be a
culprit - this docpatch is really more about raising awareness of the problem.
Obviously finding the fault and fixing it would be preferable.

> While I'm at it, let me also renew my suggestion that git-fast-import
> use separate namespaces ("markspaces", so to speak) for file content
> marks and for commit marks.  There is no reason for these distinct types
> of marks to be located in a shared space of integers.

There is a reason, it's because they're both just object IDs.  Is it
really that much of a drag?  I know what you mean though, it meant for
my code I had to keep track of which type each mark was.
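The bookkeeping for a shared mark space can be as small as a dictionary kept on the importer side. This is a rough sketch under that assumption; `TypedMarks`, `record` and `kind_of` are made-up names for illustration, not anything provided by git-fast-import.

```python
# Hypothetical importer-side registry: remembers whether each mark in the
# shared integer space refers to a blob (file content) or a commit.
class TypedMarks:
    KINDS = ("blob", "commit")

    def __init__(self):
        self._kind = {}  # mark number -> "blob" or "commit"

    def record(self, mark, kind):
        """Remember the kind of a mark; reject reuse with another kind."""
        if kind not in self.KINDS:
            raise ValueError("unknown kind: %r" % (kind,))
        previous = self._kind.setdefault(mark, kind)
        if previous != kind:
            raise ValueError(
                "mark %d already recorded as %s" % (mark, previous)
            )

    def kind_of(self, mark):
        """Return the recorded kind; raises KeyError for unknown marks."""
        return self._kind[mark]


if __name__ == "__main__":
    marks = TypedMarks()
    marks.record(1, "blob")
    marks.record(2, "commit")
    print(marks.kind_of(2))
```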

Sam.

Thread overview: 5+ messages
2008-04-15 12:54 [PATCH] git-fast-import: note 1M limit of mark number Sam Vilain
2008-04-15 15:50 ` Michael Haggerty
2008-04-15 21:05   ` Sam Vilain [this message]
2008-04-16  6:54     ` Shawn O. Pearce
2008-04-16  7:04     ` Michael Haggerty
