* Revisiting large binary files issue. @ 2006-07-10 23:01 Carl Baldwin 2006-07-10 23:14 ` Junio C Hamano 2006-07-10 23:28 ` Linus Torvalds 0 siblings, 2 replies; 37+ messages in thread From: Carl Baldwin @ 2006-07-10 23:01 UTC (permalink / raw) To: git Hi, Some of this stuff has been discussed before but I thought I'd bring it up again. I am using git in a way for which, admittedly, it was not intended. I have repositories in which I'm storing some source code mixed with some large binary files (from 20MB up to 1GB in the worst case). The large files easily dominate the space in the repository and the time that it takes to perform network operations between repositories. Just to refresh your memory a sample of these files led to work that Nicolas did to improve the performance of packing large binary files. Thank you Nicolas, I think your work improved the speed of git in the face of largish files. Attempting to delta compress these blobs is a frivolous effort. The nature of the blobs is such that if given two blobs: 'A', and the next revision, 'B' it is just as good --- from a pack-size standpoint --- to compress the entire contents of 'B' than try to find a delta from A -> B. It is also much faster than trying to find deltas. In git, I can accomplish this by setting the pack window (I think this is right) to 0. When I set the window to 0 I one more issue. Even though the blobs are already compressed on disk I still seem to pay the penalty of inflating them into memory and then deflating them into the pack. When the window size is 0 this is just wasted cycles. With large binary files these wasted cycles slow down the push/fetch operation considerably. Couldn't the compressed blobs be copied into the pack without first deflating them in this 0 window case? My 'porcelain' on top of git works around these issues by first rsyncing the object directories and then running the git push/fetch command afterward. The push/fetch command sees that the remote is up-to-date with all the necessary objects and skips packing. This works and is fast (much faster than even packing with window of 0 because of the time to inflate/deflate the objects) but I'd like to remove this work-around in the long-term. Ideally, there are two things that I would like available with git to be able to remove my work-around. First, I would like to be able to set the packing window to 0 for all of the git commands. It would be nice if I could set this in a per-repository config file so that any push/fetch operation would honor this window. Is there currently a way to do this? Second, I would like to not pay the penalty to inflate and then deflate the objects into the pack when I use a window of 0. How hard would this be? I am a capable programmer and wouldn't mind getting my hands dirty in the code to implement this if someone could point me in the right direction. Thanks for your time, Carl Baldwin -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Carl Baldwin RADCAD (R&D CAD) Hewlett Packard Company MS 88 work: 970 898-1523 3404 E. Harmony Rd. work: Carl.N.Baldwin@hp.com Fort Collins, CO 80525 home: Carl@ecBaldwin.net - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Revisiting large binary files issue. 2006-07-10 23:01 Revisiting large binary files issue Carl Baldwin @ 2006-07-10 23:14 ` Junio C Hamano 2006-07-11 6:20 ` Peter Baumann 2006-07-10 23:28 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2006-07-10 23:14 UTC (permalink / raw) To: Carl Baldwin; +Cc: git Carl Baldwin <cnb@fc.hp.com> writes: > First, I would like to be able to set the packing window to 0 for all of > the git commands. It would be nice if I could set this in a > per-repository config file so that any push/fetch operation would honor > this window. Is there currently a way to do this? Should not be hard to add. > Second, I would like to not pay the penalty to inflate and then deflate > the objects into the pack when I use a window of 0. How hard would this > be? I am a capable programmer and wouldn't mind getting my hands dirty > in the code to implement this if someone could point me in the right > direction. The problem is that unpacked objects have the single line header (type followed by its inflated size in decimal) which starts the deflated stream, while in-pack representation of non-delta does not. There was an attempt to help doing this, but I haven't pursued it. http://article.gmane.org/gmane.comp.version-control.git/17368 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Revisiting large binary files issue. 2006-07-10 23:14 ` Junio C Hamano @ 2006-07-11 6:20 ` Peter Baumann 0 siblings, 0 replies; 37+ messages in thread From: Peter Baumann @ 2006-07-11 6:20 UTC (permalink / raw) To: git On 2006-07-10, Junio C Hamano <junkio@cox.net> wrote: > Carl Baldwin <cnb@fc.hp.com> writes: > >> First, I would like to be able to set the packing window to 0 for all of >> the git commands. It would be nice if I could set this in a >> per-repository config file so that any push/fetch operation would honor >> this window. Is there currently a way to do this? > > Should not be hard to add. > >> Second, I would like to not pay the penalty to inflate and then deflate >> the objects into the pack when I use a window of 0. How hard would this >> be? I am a capable programmer and wouldn't mind getting my hands dirty >> in the code to implement this if someone could point me in the right >> direction. > > The problem is that unpacked objects have the single line header > (type followed by its inflated size in decimal) which starts the > deflated stream, while in-pack representation of non-delta does > not. > > There was an attempt to help doing this, but I haven't pursued it. > > http://article.gmane.org/gmane.comp.version-control.git/17368 > > Wouldn't it make more sense to convert the unpacked object reprasentation to the one which is used in the pack files? This would make it really easy to just copy the object in the non-delta case to the pack and avoid the inflate/deflate calls. Peter Baumann ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Revisiting large binary files issue. 2006-07-10 23:01 Revisiting large binary files issue Carl Baldwin 2006-07-10 23:14 ` Junio C Hamano @ 2006-07-10 23:28 ` Linus Torvalds 2006-07-11 9:40 ` [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) sf 2006-07-11 14:55 ` Revisiting large binary files issue Carl Baldwin 1 sibling, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-10 23:28 UTC (permalink / raw) To: Carl Baldwin; +Cc: git On Mon, 10 Jul 2006, Carl Baldwin wrote: > > When I set the window to 0 I one more issue. Even though the blobs are > already compressed on disk I still seem to pay the penalty of inflating > them into memory and then deflating them into the pack. When the window > size is 0 this is just wasted cycles. With large binary files these > wasted cycles slow down the push/fetch operation considerably. Couldn't > the compressed blobs be copied into the pack without first deflating > them in this 0 window case? The problem is that the individual object disk format isn't actually the same as the pack-file object format for one object. The header is different: a pack-file uses a very dense bit packing, while the individual object format is a bit less dense. Sad, really, but it means that right now you can only re-use data that was already packed (when the format matches). Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) 2006-07-10 23:28 ` Linus Torvalds @ 2006-07-11 9:40 ` sf 2006-07-11 18:00 ` Linus Torvalds 2006-07-11 14:55 ` Revisiting large binary files issue Carl Baldwin 1 sibling, 1 reply; 37+ messages in thread From: sf @ 2006-07-11 9:40 UTC (permalink / raw) To: git Linus Torvalds wrote: ... > The problem is that the individual object disk format isn't actually the > same as the pack-file object format for one object. The header is > different: a pack-file uses a very dense bit packing, while the individual > object format is a bit less dense. I just stumbled over the same fact and asked myself why there are still two formats. Wouldn't it make more sense to use the pack-file object format for individual objects as well? As it happens individual objects all start with nibble 7 (deflated with default _zlib_ window size of 32K) whereas in the pack-file object format nibble 7 indicates delta entries which never occur as individual files. Roadmap for using pack-file format as individual object disk format: Step 1. When reading individual objects from disk check the first nibble and decode accordingly (see above). Step 2. When writing individual objects to disk write them in pack-file object format. Make that optional (config-file parameter, command line option etc.)? Step 3. Remove code for (old) individual object disk format. Please comment. Regards Stephan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) 2006-07-11 9:40 ` [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) sf @ 2006-07-11 18:00 ` Linus Torvalds 2006-07-11 21:45 ` sf 0 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 18:00 UTC (permalink / raw) To: sf; +Cc: git [ I read my personal mailbox first, so I didn't see this one until after I had already written my version.. ] On Tue, 11 Jul 2006, sf wrote: > > I just stumbled over the same fact and asked myself why there are still > two formats. Wouldn't it make more sense to use the pack-file object > format for individual objects as well? Yes, see the git list for a series of patches that try to do this. > As it happens individual objects all start with nibble 7 (deflated with > default _zlib_ window size of 32K) whereas in the pack-file object > format nibble 7 indicates delta entries which never occur as individual > files. I didn't actually do it that way, but it would be better to make the "parse_ascii_sha1_header()" more strict, and only accept the old names. Right now my patch-series could in theory accept something that is _not_ an ASCII header (eg it would be a binary header that just happened to have the format "x n\0", where "n" was a valid number). > Step 1. When reading individual objects from disk check the first nibble > and decode accordingly (see above). Check more than that, but yes, this should be tightened up in my series. > Step 2. When writing individual objects to disk write them in pack-file > object format. Make that optional (config-file parameter, command line > option etc.)? Done. > Step 3. Remove code for (old) individual object disk format. Well, I'm not sure how necessary that even is. We actually do have to generate the old header regardless, if for no other reason than the fact that we generate the SHA1 names based on it (even if we then write a new-style dense binary header to disk and discard the ASCII header). Having it there means that you can always just get a new version of git, and never worry about how old the archive you're working with is. (And then doing a "git repack -a -d" will make any archive also work with an old-style git, since the pack-file format didn't change, and a "git repack" thus ends up always creating something that is readable by anybody, including old clients). Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) 2006-07-11 18:00 ` Linus Torvalds @ 2006-07-11 21:45 ` sf 2006-07-11 22:17 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: sf @ 2006-07-11 21:45 UTC (permalink / raw) To: git Linus Torvalds wrote: ... > On Tue, 11 Jul 2006, sf wrote: ... >> Step 1. When reading individual objects from disk check the first nibble >> and decode accordingly (see above). > > Check more than that, but yes, this should be tightened up in my > series. Just look at the first byte of the object file _without doing any decompression_. It is 0x78 _if and only if_ the object file is in the traditional format. >> Step 3. Remove code for (old) individual object disk format. > > Well, I'm not sure how necessary that even is. We actually do have to > generate the old header regardless, if for no other reason than the fact > that we generate the SHA1 names based on it (even if we then write a > new-style dense binary header to disk and discard the ASCII header). > > Having it there means that you can always just get a new version of git, > and never worry about how old the archive you're working with is. > > (And then doing a "git repack -a -d" will make any archive also work with > an old-style git, since the pack-file format didn't change, and a "git > repack" thus ends up always creating something that is readable by > anybody, including old clients). Agreed. Regards Stephan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) 2006-07-11 21:45 ` sf @ 2006-07-11 22:17 ` Linus Torvalds 2006-07-11 22:26 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 22:17 UTC (permalink / raw) To: sf; +Cc: git On Tue, 11 Jul 2006, sf wrote: > > Just look at the first byte of the object file _without doing any > decompression_. It is 0x78 _if and only if_ the object file is in the > traditional format. 0x78 isn't the only valid flag for a zlib stream, as far as I can tell. It may be the only one _in_practice_, of course, but the zlib standard defines the first byte as - for low bits: CM (compression method): "This identifies the compression method used in the file. CM = 8 denotes the "deflate" compression method with a window size up to 32K. This is the method used by gzip and PNG (see references [1] and [2] in Chapter 3, below, for the reference documents). CM = 15 is reserved. It might be used in a future version of this specification to indicate the presence of an extra field before the compressed data." - four high bits are CINFO: "For CM = 8, CINFO is the base-2 logarithm of the LZ77 window size, minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above 7 are not allowed in this version of the specification. CINFO is not defined in this specification for CM not equal to 8." so 0x78 means "deflate with 32kB window size", but I don't see anything guaranteeing that we might not see something else for an object that cannot be compressed, for example. Anyway, the good news is that _if_ 0x78 is indeed the only possible value, then it is also an illegal value for an unpacked object in pack-file format (type=7 being OBJ_DELTA) and we wouldn't need any other flag for this. I just don't know if it's the only possible one.. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) 2006-07-11 22:17 ` Linus Torvalds @ 2006-07-11 22:26 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 22:26 UTC (permalink / raw) To: sf; +Cc: git On Tue, 11 Jul 2006, Linus Torvalds wrote: > > - for low bits: CM (compression method): > > "This identifies the compression method used in the file. CM = 8 > denotes the "deflate" compression method with a window size up > to 32K. This is the method used by gzip and PNG (see > references [1] and [2] in Chapter 3, below, for the reference > documents). CM = 15 is reserved. It might be used in a future > version of this specification to indicate the presence of an > extra field before the compressed data." > > - four high bits are CINFO: > > "For CM = 8, CINFO is the base-2 logarithm of the LZ77 window > size, minus eight (CINFO=7 indicates a 32K window size). Values > of CINFO above 7 are not allowed in this version of the > specification. CINFO is not defined in this specification for > CM not equal to 8." > > so 0x78 means "deflate with 32kB window size", but I don't see anything > guaranteeing that we might not see something else for an object that > cannot be compressed, for example. Ahh. Looking at the zlib sources, I see /* Write the zlib header */ if (s->status == INIT_STATE) { uInt header = (Z_DEFLATED + ((s->w_bits-8)<<4)) << 8; uInt level_flags = (s->level-1) >> 1; if (level_flags > 3) level_flags = 3; header |= (level_flags << 6); if (s->strstart != 0) header |= PRESET_DICT; header += 31 - (header % 31); s->status = BUSY_STATE; putShortMSB(s, header); (which is that first 16-bit word, MSB first). So we'll always have the Z-DEFLATED (8) there in the low four bits, but the high nybble will be "s->w_bits-8" where w_bits comes from windowBits, and I think we can depend on it beign 15: "The windowBits parameter is the base two logarithm of the window size (the size of the history buffer). It should be in the range 8..15 for this version of the library. Larger values of this parameter result in better compression at the expense of memory usage. The default value is 15 if deflateInit is used instead." so since we use deflateInit(), we know the window will be 15. So I guess we _can_ depend on the first byte being 0x78 for our use. Goodie. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Revisiting large binary files issue. 2006-07-10 23:28 ` Linus Torvalds 2006-07-11 9:40 ` [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) sf @ 2006-07-11 14:55 ` Carl Baldwin 2006-07-11 17:09 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Carl Baldwin @ 2006-07-11 14:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: git I'd like to get my hands dirty and see for myself where the issue lies. I hope to have some time next week to devote to this. Is it reasonable to hope for a solution that is at least a lot lighter weight than the current status quo? Carl On Mon, Jul 10, 2006 at 04:28:24PM -0700, Linus Torvalds wrote: > > > On Mon, 10 Jul 2006, Carl Baldwin wrote: > > > > When I set the window to 0 I one more issue. Even though the blobs are > > already compressed on disk I still seem to pay the penalty of inflating > > them into memory and then deflating them into the pack. When the window > > size is 0 this is just wasted cycles. With large binary files these > > wasted cycles slow down the push/fetch operation considerably. Couldn't > > the compressed blobs be copied into the pack without first deflating > > them in this 0 window case? > > The problem is that the individual object disk format isn't actually the > same as the pack-file object format for one object. The header is > different: a pack-file uses a very dense bit packing, while the individual > object format is a bit less dense. > > Sad, really, but it means that right now you can only re-use data that was > already packed (when the format matches). > > Linus > -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Carl Baldwin RADCAD (R&D CAD) Hewlett Packard Company MS 88 work: 970 898-1523 3404 E. Harmony Rd. work: Carl.N.Baldwin@hp.com Fort Collins, CO 80525 home: Carl@ecBaldwin.net - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Revisiting large binary files issue. 2006-07-11 14:55 ` Revisiting large binary files issue Carl Baldwin @ 2006-07-11 17:09 ` Linus Torvalds 2006-07-11 17:10 ` [PATCH 1/3] Make the unpacked object header functions static to sha1_file.c Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 17:09 UTC (permalink / raw) To: Carl Baldwin, Junio C Hamano; +Cc: Git Mailing List On Tue, 11 Jul 2006, Carl Baldwin wrote: > > I'd like to get my hands dirty and see for myself where the issue lies. > I hope to have some time next week to devote to this. Is it reasonable > to hope for a solution that is at least a lot lighter weight than the > current status quo? Ok, I decided to see how nasty an object database change would be. It doesn't look too bad. The following three patches implement what looks like a workable model. NOTE NOTE NOTE! It makes the new "binary headers" the default, and that will mean that unless you have applied the first two patches, any repository that has had objects added with the new git version WILL NOT BE READABLE BY AN OLDER GIT VERSION! So I think that the first two patches can be added to the main git branch pretty much immediately (after people have tested this all a _bit_ more, of course), because the first two patches just add the capability to _read_ a mixed-format repository. The third patch is the one that actually starts _writing_ new-format repo. It should be applied with extreme care, although it can basically be de-fanged by setting the default initial value of the "use_binary_headers" to 0, which will make it not write the binary headers by default. I have _not_ verified that the actual object format is identical to the pack-file one, but it should be. It's simple enough. The three patches will be sent as replies to this email. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/3] Make the unpacked object header functions static to sha1_file.c 2006-07-11 17:09 ` Linus Torvalds @ 2006-07-11 17:10 ` Linus Torvalds 2006-07-11 17:12 ` [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" Linus Torvalds 2006-07-11 17:16 ` [PATCH 3/3] Enable the new binary header format for unpacked objects Linus Torvalds 2 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 17:10 UTC (permalink / raw) To: Carl Baldwin, Junio C Hamano; +Cc: Git Mailing List Nobody else uses them, and I'm going to start changing them. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- cache.h | 2 -- sha1_file.c | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index b5e3f8f..d433d46 100644 --- a/cache.h +++ b/cache.h @@ -219,8 +219,6 @@ int safe_create_leading_directories(char char *enter_repo(char *path, int strict); /* Read and unpack a sha1 file into memory, write memory to a sha1 file */ -extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size); -extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep); extern int sha1_object_info(const unsigned char *, char *, unsigned long *); extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size); extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size); diff --git a/sha1_file.c b/sha1_file.c index 459430a..8734d50 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -684,7 +684,7 @@ static void *map_sha1_file_internal(cons return map; } -int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size) +static int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size) { /* Get the data stream */ memset(stream, 0, sizeof(*stream)); @@ -720,7 +720,7 @@ static void *unpack_sha1_rest(z_stream * * too permissive for what we want to check. So do an anal * object header parse by hand. */ -int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) +static int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) { int i; unsigned long size; ^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 17:09 ` Linus Torvalds 2006-07-11 17:10 ` [PATCH 1/3] Make the unpacked object header functions static to sha1_file.c Linus Torvalds @ 2006-07-11 17:12 ` Linus Torvalds 2006-07-11 18:40 ` Johannes Schindelin 2006-07-11 21:24 ` sf 2006-07-11 17:16 ` [PATCH 3/3] Enable the new binary header format for unpacked objects Linus Torvalds 2 siblings, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 17:12 UTC (permalink / raw) To: Carl Baldwin, Junio C Hamano; +Cc: Git Mailing List The pack-file format is slightly different from the traditional git object format, in that it has a much denser binary header encoding. The traditional format uses an ASCII string with type and length information, which is somewhat wasteful. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- This should probably be applied to the main tree asap if we think this is at all a worthwhile exercise. But somebody should verify that I got the format right first! sha1_file.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 files changed, 57 insertions(+), 9 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 8734d50..ca5f0c0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -697,9 +697,9 @@ static int unpack_sha1_header(z_stream * return inflate(stream, 0); } -static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size) +static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size, unsigned int hdrlen) { - int bytes = strlen(buffer) + 1; + int bytes = hdrlen; unsigned char *buf = xmalloc(1+size); memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); @@ -720,9 +720,9 @@ static void *unpack_sha1_rest(z_stream * * too permissive for what we want to check. So do an anal * object header parse by hand. */ -static int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) +static int parse_ascii_sha1_header(char *hdr, char *type, unsigned long *sizep) { - int i; + int i, bytes = 0; unsigned long size; /* @@ -733,6 +733,7 @@ static int parse_sha1_header(char *hdr, i = 10; for (;;) { char c = *hdr++; + bytes++; if (c == ' ') break; if (!--i) @@ -746,6 +747,7 @@ static int parse_sha1_header(char *hdr, * decimal format (ie "010" is not valid). */ size = *hdr++ - '0'; + bytes++; if (size > 9) return -1; if (size) { @@ -754,6 +756,7 @@ static int parse_sha1_header(char *hdr, if (c > 9) break; hdr++; + bytes++; size = size * 10 + c; } } @@ -762,20 +765,65 @@ static int parse_sha1_header(char *hdr, /* * The length must be followed by a zero byte */ - return *hdr ? -1 : 0; + bytes++; + if (*hdr) + bytes = -1; + return bytes; +} + +static int parse_binary_sha1_header(char *hdr, char *type, unsigned long *sizep) +{ + unsigned char c; + int bytes = 1; + unsigned long size; + unsigned object_type, bits; + static const char *typename[8] = { + NULL, /* OBJ_EXT */ + "commit", "tree", "blob", "tag", + NULL, NULL, NULL + }; + + c = *hdr++; + object_type = (c >> 4) & 7; + if (!typename[object_type]) + return -1; + strcpy(type, typename[object_type]); + size = c & 15; + bits = 4; + while (!(c & 0x80)) { + if (bits >= 8*sizeof(unsigned long)) + return -1; + c = *hdr++; + size += (unsigned long) (c & 0x7f) << bits; + bytes++; + bits += 7; + } + *sizep = size; + return bytes; +} + +static int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) +{ + int retval = parse_ascii_sha1_header(hdr, type, sizep); + if (retval < 0) + retval = parse_binary_sha1_header(hdr, type, sizep); + return retval; } void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size) { - int ret; + int ret, hdrlen; z_stream stream; char hdr[8192]; ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)); - if (ret < Z_OK || parse_sha1_header(hdr, type, size) < 0) + if (ret < Z_OK) + return NULL; + hdrlen = parse_sha1_header(hdr, type, size); + if (hdrlen < 0) return NULL; - return unpack_sha1_rest(&stream, hdr, *size); + return unpack_sha1_rest(&stream, hdr, *size, hdrlen); } /* forward declaration for a mutually recursive function */ @@ -1192,7 +1240,7 @@ struct packed_git *find_sha1_pack(const int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep) { - int status; + int status, hdrlen; unsigned long mapsize, size; void *map; z_stream stream; ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 17:12 ` [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" Linus Torvalds @ 2006-07-11 18:40 ` Johannes Schindelin 2006-07-11 18:58 ` Linus Torvalds 2006-07-11 21:24 ` sf 1 sibling, 1 reply; 37+ messages in thread From: Johannes Schindelin @ 2006-07-11 18:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Carl Baldwin, Junio C Hamano, Git Mailing List Hi, On Tue, 11 Jul 2006, Linus Torvalds wrote: > @@ -762,20 +765,65 @@ static int parse_sha1_header(char *hdr, > /* > * The length must be followed by a zero byte > */ > - return *hdr ? -1 : 0; > + bytes++; > + if (*hdr) > + bytes = -1; > + return bytes; Why not just say return *hdr ? -1 : bytes; > +} > + > +static int parse_binary_sha1_header(char *hdr, char *type, unsigned long *sizep) > +{ > + unsigned char c; > + int bytes = 1; > + unsigned long size; > + unsigned object_type, bits; > + static const char *typename[8] = { > + NULL, /* OBJ_EXT */ > + "commit", "tree", "blob", "tag", > + NULL, NULL, NULL > + }; > + > + c = *hdr++; > + object_type = (c >> 4) & 7; > + if (!typename[object_type]) > + return -1; You might want to add a comment saying "since the type is lowercase in ascii format, object_type must be 6 or 7, which is an invalid object type." It took me a little to figure that out... > + strcpy(type, typename[object_type]); > + size = c & 15; Just a nit here: I think 0xf is easier to read with boolean operations. > + bits = 4; > + while (!(c & 0x80)) { > + if (bits >= 8*sizeof(unsigned long)) > + return -1; > + c = *hdr++; > + size += (unsigned long) (c & 0x7f) << bits; > + bytes++; > + bits += 7; > + } Are you not losing the last byte by putting the "while" _before_ instead of _after_ the loop? > @@ -1192,7 +1240,7 @@ struct packed_git *find_sha1_pack(const > > int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep) > { > - int status; > + int status, hdrlen; > unsigned long mapsize, size; > void *map; > z_stream stream; > - This hunk is unnecessary, right? Ciao, Dscho ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 18:40 ` Johannes Schindelin @ 2006-07-11 18:58 ` Linus Torvalds 2006-07-11 19:20 ` Johannes Schindelin 0 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 18:58 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Carl Baldwin, Junio C Hamano, Git Mailing List On Tue, 11 Jul 2006, Johannes Schindelin wrote: > > Why not just say > > return *hdr ? -1 : bytes; Hey, whatever works. I rewrote more, and edited some of my changes down again.. > You might want to add a comment saying "since the type is lowercase in > ascii format, object_type must be 6 or 7, which is an invalid object > type." It took me a little to figure that out... It's not even correct in my version - I check the ASCII header _first_, so by the time it looks at the binary one, it already knows it's not ascii. The problematic case is actually the other way around: my "parse_ascii_sha1_header()" isn't strict enough. Or, more likely, the parse_sha1_header() function should just be changed to check the binary format first (and then add your comment about why that is safe). > > + bits = 4; > > + while (!(c & 0x80)) { > > + if (bits >= 8*sizeof(unsigned long)) > > + return -1; > > + c = *hdr++; > > + size += (unsigned long) (c & 0x7f) << bits; > > + bytes++; > > + bits += 7; > > + } > > Are you not losing the last byte by putting the "while" _before_ instead > of _after_ the loop? No. The very first byte can have the 0x80 end marker, when the size was between 0..15. > > int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep) > > { > > - int status; > > + int status, hdrlen; > > unsigned long mapsize, size; > > void *map; > > z_stream stream; > > This hunk is unnecessary, right? Yeah, never mind. That function didn't actually need the hdrlen, it only cared about the SHA1. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 18:58 ` Linus Torvalds @ 2006-07-11 19:20 ` Johannes Schindelin 2006-07-11 19:48 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: Johannes Schindelin @ 2006-07-11 19:20 UTC (permalink / raw) To: Linus Torvalds; +Cc: Carl Baldwin, Junio C Hamano, Git Mailing List Hi, On Tue, 11 Jul 2006, Linus Torvalds wrote: > On Tue, 11 Jul 2006, Johannes Schindelin wrote: > > > You might want to add a comment saying "since the type is lowercase in > > ascii format, object_type must be 6 or 7, which is an invalid object > > type." It took me a little to figure that out... > > It's not even correct in my version - I check the ASCII header _first_, so > by the time it looks at the binary one, it already knows it's not ascii. Just realized it all by myself... > Or, more likely, the parse_sha1_header() function should just be changed > to check the binary format first (and then add your comment about why that > is safe). Yes, exactly. > > > + bits = 4; > > > + while (!(c & 0x80)) { > > > + if (bits >= 8*sizeof(unsigned long)) > > > + return -1; > > > + c = *hdr++; > > > + size += (unsigned long) (c & 0x7f) << bits; > > > + bytes++; > > > + bits += 7; > > > + } > > > > Are you not losing the last byte by putting the "while" _before_ instead > > of _after_ the loop? > > No. The very first byte can have the 0x80 end marker, when the size was > between 0..15. Yes, I understand now. I was a little confused by the way it is written... Thanks for the clarification, Dscho ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 19:20 ` Johannes Schindelin @ 2006-07-11 19:48 ` Linus Torvalds 2006-07-11 21:25 ` Johannes Schindelin 2006-07-11 21:47 ` Junio C Hamano 0 siblings, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 19:48 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Carl Baldwin, Junio C Hamano, Git Mailing List On Tue, 11 Jul 2006, Johannes Schindelin wrote: > > > Or, more likely, the parse_sha1_header() function should just be changed > > to check the binary format first (and then add your comment about why that > > is safe). > > Yes, exactly. Here's a newer verson of [2/3], with these issues fixed. It actually fixes things twice: (a) by parsing the binary version first (which makes sense for a totally independent reason - if that is going to be the "default" version in the long run, we should just test it first anyway) and (b) by making the ASCII version parser stricter too. This did, btw, also fix the test failure, so the fact that the ASCII header parser wasn't careful enough was actually a problem in real life. So please throw away the old version. Linus --- From: Linus Torvalds <torvalds@osdl.org> Subject: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" The pack-file format is slightly different from the traditional git object format, in that it has a much denser binary header encoding. The traditional format uses an ASCII string with type and length information, which is somewhat wasteful. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- sha1_file.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 82 insertions(+), 11 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 8734d50..15ccf5e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -697,9 +697,9 @@ static int unpack_sha1_header(z_stream * return inflate(stream, 0); } -static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size) +static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size, unsigned int hdrlen) { - int bytes = strlen(buffer) + 1; + int bytes = hdrlen; unsigned char *buf = xmalloc(1+size); memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); @@ -720,25 +720,40 @@ static void *unpack_sha1_rest(z_stream * * too permissive for what we want to check. So do an anal * object header parse by hand. */ -static int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) +static int parse_ascii_sha1_header(char *hdr, char *type, unsigned long *sizep) { - int i; + int bytes = 0; unsigned long size; /* * The type can be at most ten bytes (including the * terminating '\0' that we add), and is followed by * a space. + * + * We want at least three characters, and they should + * all be normal lower-case letters. */ - i = 10; for (;;) { - char c = *hdr++; + unsigned char c = *hdr++; + bytes++; if (c == ' ') break; - if (!--i) + if (bytes >= 10) return -1; *type++ = c; + + /* + * The high nybble must be 6 of 7, see + * parse_binary_header(). This covers + * all ASCII lowercase characters. + */ + if (c < 0x60 || c > 0x7f) + return -1; } + + /* Minimum three letters and the space */ + if (bytes < 4) + return -1; *type = 0; /* @@ -746,6 +761,7 @@ static int parse_sha1_header(char *hdr, * decimal format (ie "010" is not valid). */ size = *hdr++ - '0'; + bytes++; if (size > 9) return -1; if (size) { @@ -754,6 +770,7 @@ static int parse_sha1_header(char *hdr, if (c > 9) break; hdr++; + bytes++; size = size * 10 + c; } } @@ -762,20 +779,74 @@ static int parse_sha1_header(char *hdr, /* * The length must be followed by a zero byte */ - return *hdr ? -1 : 0; + bytes++; + return *hdr ? -1 : bytes; +} + +/* + * We never confuse a binary header with an old ASCII one, + * because the ASCII one will always start with a lower-case + * letter, meaning that the first byte will be of the form + * 0x6? or 0x7?. + * + * That in turn would be parsed as object type 6 or 7, neither + * of which is valid for a unpacked object (object type 7 is + * a delta, and can only exist in a pack-file, while object type + * 6 is invalid). + */ +static int parse_binary_sha1_header(char *hdr, char *type, unsigned long *sizep) +{ + unsigned char c; + int bytes = 1; + unsigned long size; + unsigned object_type, bits; + static const char *typename[8] = { + NULL, /* OBJ_EXT */ + "commit", "tree", "blob", "tag", + NULL, NULL, NULL + }; + + c = *hdr++; + object_type = (c >> 4) & 7; + if (!typename[object_type]) + return -1; + strcpy(type, typename[object_type]); + size = c & 15; + bits = 4; + while (!(c & 0x80)) { + if (bits >= 8*sizeof(unsigned long)) + return -1; + c = *hdr++; + size += (unsigned long) (c & 0x7f) << bits; + bytes++; + bits += 7; + } + *sizep = size; + return bytes; +} + +static int parse_sha1_header(char *hdr, char *type, unsigned long *sizep) +{ + int retval = parse_binary_sha1_header(hdr, type, sizep); + if (retval < 0) + retval = parse_ascii_sha1_header(hdr, type, sizep); + return retval; } void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size) { - int ret; + int ret, hdrlen; z_stream stream; char hdr[8192]; ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)); - if (ret < Z_OK || parse_sha1_header(hdr, type, size) < 0) + if (ret < Z_OK) + return NULL; + hdrlen = parse_sha1_header(hdr, type, size); + if (hdrlen < 0) return NULL; - return unpack_sha1_rest(&stream, hdr, *size); + return unpack_sha1_rest(&stream, hdr, *size, hdrlen); } /* forward declaration for a mutually recursive function */ ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 19:48 ` Linus Torvalds @ 2006-07-11 21:25 ` Johannes Schindelin 2006-07-11 21:47 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Johannes Schindelin @ 2006-07-11 21:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Hi, On Tue, 11 Jul 2006, Linus Torvalds wrote: > On Tue, 11 Jul 2006, Johannes Schindelin wrote: > > > > > Or, more likely, the parse_sha1_header() function should just be changed > > > to check the binary format first (and then add your comment about why that > > > is safe). > > > > Yes, exactly. > > Here's a newer verson of [2/3], with these issues fixed. It actually fixes > things twice: (a) by parsing the binary version first (which makes sense > for a totally independent reason - if that is going to be the "default" > version in the long run, we should just test it first anyway) and (b) by > making the ASCII version parser stricter too. Melikey. Ciao, Dscho ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 19:48 ` Linus Torvalds 2006-07-11 21:25 ` Johannes Schindelin @ 2006-07-11 21:47 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2006-07-11 21:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, Carl Baldwin, Git Mailing List Linus Torvalds <torvalds@osdl.org> writes: > Here's a newer verson of [2/3], with these issues fixed. It actually fixes > things twice: (a) by parsing the binary version first (which makes sense > for a totally independent reason - if that is going to be the "default" > version in the long run, we should just test it first anyway) and (b) by > making the ASCII version parser stricter too. Wait a minute. read-sha1-file maps sha1-file-internal (for unpacked one), and then calls unpack-sha1-file. unpack-sha1-file calls unpack-sha1-header to start inflation, lets parse-sha1-header to read the header in the inflated buffer, and calls unpack-sha1-rest to inflate the rest. But in packs, we have binary header not deflated, followed by deflated payload. If we want to copy things from loose objects into pack without changing the packfile format, this change would not help, I suspect. At least, your updated unpack_sha1_file() needs to check for binary header first (starting from "map"), and if that starts with binary header, start inflating after the header to extract the payload. Otherwise you would do the traditional. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 17:12 ` [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" Linus Torvalds 2006-07-11 18:40 ` Johannes Schindelin @ 2006-07-11 21:24 ` sf 2006-07-11 22:09 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: sf @ 2006-07-11 21:24 UTC (permalink / raw) To: git Linus Torvalds wrote: > The pack-file format is slightly different from the traditional git > object format, in that it has a much denser binary header encoding. > > The traditional format uses an ASCII string with type and length > information, which is somewhat wasteful. And in the traditional format type and length are compressed whereas in the pack-file format they are not. > This should probably be applied to the main tree asap if we think > this is at all a worthwhile exercise. But somebody should verify that I > got the format right first! Sorry but see above. Regards Stephan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 21:24 ` sf @ 2006-07-11 22:09 ` Linus Torvalds 2006-07-11 22:25 ` sf 2006-07-11 23:03 ` Junio C Hamano 0 siblings, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 22:09 UTC (permalink / raw) To: sf; +Cc: git On Tue, 11 Jul 2006, sf wrote: > > And in the traditional format type and length are compressed whereas in > the pack-file format they are not. Ahh. Yes. > > This should probably be applied to the main tree asap if we think > > this is at all a worthwhile exercise. But somebody should verify that I > > got the format right first! > > Sorry but see above. Good catch, thanks indeed. Doing that for unpacked objects would in fact make a lot of things much simpler, so it would be good to do. The _bad_ part is that this also makes it a lot harder to see the difference between a "binary header" and a "compressed ASCII header". The two are not "obviously different" any more. The common byte sequence for a compressed stream is 78 9c ... where the first byte is the CMF byte (compression info and method). But it's not the only possible such sequence according to the zlib format. (The 16-bit hex number in MSB format, ie 0x789c above, is defined to have a built-in checksum, so that it must be a multiple of 31 according to the standard: 0x789c = 996 * 31). So if we have a uncompressed header, we'd need to add a separate 2-byte fingerprint to it _before_ the real header that isn't divisible by 31, and use that as the thing to test. Ho humm. I'll see what I can come up with. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 22:09 ` Linus Torvalds @ 2006-07-11 22:25 ` sf 2006-07-11 23:03 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: sf @ 2006-07-11 22:25 UTC (permalink / raw) To: git Linus Torvalds wrote: ... > The common byte sequence for a compressed stream is > > 78 9c ... > > where the first byte is the CMF byte (compression info and method). > > But it's not the only possible such sequence according to the zlib format. No, but 0x78 is the only first byte ever produced by git; the files are always deflated (second nibble is 8) with window size 32K (first nibble is 7). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 22:09 ` Linus Torvalds 2006-07-11 22:25 ` sf @ 2006-07-11 23:03 ` Junio C Hamano 2006-07-12 0:03 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2006-07-11 23:03 UTC (permalink / raw) To: git Linus Torvalds <torvalds@osdl.org> writes: > So if we have a uncompressed header, we'd need to add a separate 2-byte > fingerprint to it _before_ the real header that isn't divisible by 31, and > use that as the thing to test. > > Ho humm. I'll see what I can come up with. I do not like to rely too heavily on what zlib compression's beginning of stream looks like. I think the new format can be deflated new header (fully synched) followed by deflated payload. So the sequence unpack-sha1-header followed by parse-sha1-header would notice we are dealing with new format and reinitialize the deflator at the point where the header deflator left off. Wouldn't that work? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-11 23:03 ` Junio C Hamano @ 2006-07-12 0:03 ` Linus Torvalds 2006-07-12 0:39 ` Johannes Schindelin ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 0:03 UTC (permalink / raw) To: Junio C Hamano, sf, Johannes Schindelin; +Cc: Git Mailing List On Tue, 11 Jul 2006, Junio C Hamano wrote: > > I do not like to rely too heavily on what zlib compression's > beginning of stream looks like. Well, I normally would agree with you if it was a "oh, all our zlib objects seem to start with 0x78" thing, but after having dug into both the zlib standard (which is actually an RFC, not just some random thing), and looked at the sources, it's definitely the case that the "0x78" byte isn't just an implementation detail. It's actually really part of the specs, and not just happenstance. > I think the new format can be deflated new header (fully > synched) followed by deflated payload. That would work, but on the other hand, one of the advantages of doing the new format would be that the "check size and type" code wouldn't even need to call into the zlib code. Anyway, I think this following patch replaces the old 2/3 and 3/3 (it still depends on the original [1/3] cleanup. (It also renames and reverses the meaning of the config file option: it's now "[core] LegacyHeaders = true" for using legacy headers.) Not heavily tested, but seems ok. sf? Dscho? Can you check this thing out? Linus ---- Documentation/config.txt | 6 +++ cache.h | 1 config.c | 5 ++ environment.c | 1 sha1_file.c | 106 +++++++++++++++++++++++++++++++++++++++++++--- 5 files changed, 111 insertions(+), 8 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 0b434c1..9780c89 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -97,6 +97,12 @@ core.compression:: compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. +core.legacyheaders:: + A boolean which enables the legacy object header format in case + you want to interoperate with old clients accessing the object + database directly (where the "http://" and "rsync://" protocols + count as direct access). + alias.*:: Command aliases for the gitlink:git[1] command wrapper - e.g. after defining "alias.last = cat-file commit HEAD", the invocation diff --git a/cache.h b/cache.h index d433d46..eee5fc9 100644 --- a/cache.h +++ b/cache.h @@ -176,6 +176,7 @@ extern int commit_lock_file(struct lock_ extern void rollback_lock_file(struct lock_file *); /* Environment bits from configuration mechanism */ +extern int use_legacy_headers; extern int trust_executable_bit; extern int assume_unchanged; extern int prefer_symlink_refs; diff --git a/config.c b/config.c index 8445f7d..0ac6aeb 100644 --- a/config.c +++ b/config.c @@ -279,6 +279,11 @@ int git_default_config(const char *var, return 0; } + if (!strcmp(var, "core.legacyheaders")) { + use_legacy_headers = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "core.compression")) { int level = git_config_int(var, value); if (level == -1) diff --git a/environment.c b/environment.c index 97d42b1..d80a39a 100644 --- a/environment.c +++ b/environment.c @@ -11,6 +11,7 @@ #include "cache.h" char git_default_email[MAX_GITNAME]; char git_default_name[MAX_GITNAME]; +int use_legacy_headers = 0; int trust_executable_bit = 1; int assume_unchanged = 0; int prefer_symlink_refs = 0; diff --git a/sha1_file.c b/sha1_file.c index 8734d50..475b23d 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -684,26 +684,74 @@ static void *map_sha1_file_internal(cons return map; } -static int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size) +static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz) { + unsigned char c; + unsigned int word, bits; + unsigned long size; + static const char *typename[8] = { + NULL, /* OBJ_EXT */ + "commit", "tree", "blob", "tag", + NULL, NULL, NULL + }; + const char *type; + /* Get the data stream */ memset(stream, 0, sizeof(*stream)); stream->next_in = map; stream->avail_in = mapsize; stream->next_out = buffer; - stream->avail_out = size; + stream->avail_out = bufsiz; + + /* + * Is it a zlib-compressed buffer? If so, the first byte + * must be 0x78 (15-bit window size, deflated), and the + * first 16-bit word is evenly divisible by 31 + */ + word = (map[0] << 8) + map[1]; + if (map[0] == 0x78 && !(word % 31)) { + inflateInit(stream); + return inflate(stream, 0); + } + + c = *map++; + mapsize--; + type = typename[(c >> 4) & 7]; + if (!type) + return -1; + + bits = 4; + size = c & 0xf; + while (!(c & 0x80)) { + if (bits >= 8*sizeof(long)) + return -1; + c = *map++; + size += (c & 0x7f) << bits; + bits += 7; + mapsize--; + } + /* Set up the stream for the rest.. */ + stream->next_in = map; + stream->avail_in = mapsize; inflateInit(stream); - return inflate(stream, 0); + + /* And generate the fake traditional header */ + stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu", type, size); + return 0; } static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size) { int bytes = strlen(buffer) + 1; unsigned char *buf = xmalloc(1+size); + unsigned long n; - memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); - bytes = stream->total_out - bytes; + n = stream->total_out - bytes; + if (n > size) + n = size; + memcpy(buf, (char *) buffer + bytes, n); + bytes = n; if (bytes < size) { stream->next_out = buf + bytes; stream->avail_out = size - bytes; @@ -1414,6 +1462,49 @@ static int write_buffer(int fd, const vo return 0; } +static int write_binary_header(unsigned char *hdr, enum object_type type, unsigned long len) +{ + int hdr_len; + unsigned char c; + + c = (type << 4) | (len & 15); + len >>= 4; + hdr_len = 1; + while (len) { + *hdr++ = c; + hdr_len++; + c = (len & 0x7f); + len >>= 7; + } + *hdr = c | 0x80; + return hdr_len; +} + +static void setup_object_header(z_stream *stream, const char *type, unsigned long len) +{ + int obj_type, hdr; + + if (use_legacy_headers) { + while (deflate(stream, 0) == Z_OK) + /* nothing */; + return; + } + if (!strcmp(type, blob_type)) + obj_type = OBJ_BLOB; + else if (!strcmp(type, tree_type)) + obj_type = OBJ_TREE; + else if (!strcmp(type, commit_type)) + obj_type = OBJ_COMMIT; + else if (!strcmp(type, tag_type)) + obj_type = OBJ_TAG; + else + die("trying to generate bogus object of type '%s'", type); + hdr = write_binary_header(stream->next_out, obj_type, len); + stream->total_out = hdr; + stream->next_out += hdr; + stream->avail_out -= hdr; +} + int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1) { int size; @@ -1459,7 +1550,7 @@ int write_sha1_file(void *buf, unsigned /* Set it up */ memset(&stream, 0, sizeof(stream)); deflateInit(&stream, zlib_compression_level); - size = deflateBound(&stream, len+hdrlen); + size = 8 + deflateBound(&stream, len+hdrlen); compressed = xmalloc(size); /* Compress it */ @@ -1469,8 +1560,7 @@ int write_sha1_file(void *buf, unsigned /* First header.. */ stream.next_in = hdr; stream.avail_in = hdrlen; - while (deflate(&stream, 0) == Z_OK) - /* nothing */; + setup_object_header(&stream, type, len); /* Then the data itself.. */ stream.next_in = buf; ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 0:03 ` Linus Torvalds @ 2006-07-12 0:39 ` Johannes Schindelin 2006-07-12 3:45 ` Linus Torvalds 2006-07-12 0:46 ` Junio C Hamano 2006-07-12 6:49 ` Peter Baumann 2 siblings, 1 reply; 37+ messages in thread From: Johannes Schindelin @ 2006-07-12 0:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, sf, Git Mailing List Hi, On Tue, 11 Jul 2006, Linus Torvalds wrote: > On Tue, 11 Jul 2006, Junio C Hamano wrote: > > > > I do not like to rely too heavily on what zlib compression's > > beginning of stream looks like. > > Well, I normally would agree with you if it was a "oh, all our zlib > objects seem to start with 0x78" thing, but after having dug into both the > zlib standard (which is actually an RFC, not just some random thing), and > looked at the sources, it's definitely the case that the "0x78" byte isn't > just an implementation detail. > > It's actually really part of the specs, and not just happenstance. Good. > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -684,26 +684,74 @@ static void *map_sha1_file_internal(cons > return map; > } > > -static int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size) > +static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz) > { > + unsigned char c; > + unsigned int word, bits; > + unsigned long size; > + static const char *typename[8] = { > + NULL, /* OBJ_EXT */ > + "commit", "tree", "blob", "tag", > + NULL, NULL, NULL > + }; I completely forgot to mention that type_names[] is already declared in object.h. Obviously, it is not really important, but maybe it would be better to obey the DRY principle (think addition of "bind" object type). > + while (!(c & 0x80)) { > + if (bits >= 8*sizeof(long)) Another nit: while it is safe to assume that sizeof(long) == sizeof(unsigned long), it was nevertheless a little confusing to yours truly (especially since you changed it since your last patch). > static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size) > { > int bytes = strlen(buffer) + 1; > unsigned char *buf = xmalloc(1+size); > + unsigned long n; > > - memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); > - bytes = stream->total_out - bytes; > + n = stream->total_out - bytes; > + if (n > size) > + n = size; Just out of curiosity: when can this happen? I mean, there is no error or something which could tell the caller that not the whole object was copied. > memset(&stream, 0, sizeof(stream)); > deflateInit(&stream, zlib_compression_level); > - size = deflateBound(&stream, len+hdrlen); > + size = 8 + deflateBound(&stream, len+hdrlen); Again, I had to think why this is correct. I think it should be something like 2 + sizeof(unsigned long) * 8 / 7, but I did not think all that hard. It looks good to me, but I did not really test... Ciao, Dscho ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 0:39 ` Johannes Schindelin @ 2006-07-12 3:45 ` Linus Torvalds 2006-07-12 4:31 ` Linus Torvalds 2006-07-12 6:35 ` Junio C Hamano 0 siblings, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 3:45 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Junio C Hamano, sf, Git Mailing List On Wed, 12 Jul 2006, Johannes Schindelin wrote: >> > I completely forgot to mention that type_names[] is already declared in > object.h. Obviously, it is not really important, but maybe it would be > better to obey the DRY principle (think addition of "bind" object type). Actually, the type_names[] array in object.h is _totally_ different. That one is indexed by TYPE_xxx, not OBJ_xxxx. And yeah, it's stupid to have two different numberings for the same thing, and we really really shouldn't. But the OBJ_xxx numbering was originally a a totally internal implementation detail to just the pack-files. I shouldn't have introduced the new TYPE_xxx macros. I should just have used the same OBJ_xxx macros that we use in the pack-file. Junio: this patch is totally independent from the other patches, and is on top of you current "master". It gets rid of TYPE_xxx in favor of the OBJ_xxx enums, which are moved from pack.h into object.h. That way, we can't have people mixing up the two different kinds of type enumeration. (Eventually, I want to get rid of passing around the "type strings" entirely, and this will help - no confusion about two different integer enumeration). Linus ---- diff --git a/blob.c b/blob.c index 496f270..d1af2e6 100644 --- a/blob.c +++ b/blob.c @@ -10,12 +10,12 @@ struct blob *lookup_blob(const unsigned if (!obj) { struct blob *ret = alloc_blob_node(); created_object(sha1, &ret->object); - ret->object.type = TYPE_BLOB; + ret->object.type = OBJ_BLOB; return ret; } if (!obj->type) - obj->type = TYPE_BLOB; - if (obj->type != TYPE_BLOB) { + obj->type = OBJ_BLOB; + if (obj->type != OBJ_BLOB) { error("Object %s is a %s, not a blob", sha1_to_hex(sha1), typename(obj->type)); return NULL; diff --git a/builtin-diff.c b/builtin-diff.c index ae901dd..cb38f44 100644 --- a/builtin-diff.c +++ b/builtin-diff.c @@ -285,9 +285,9 @@ int cmd_diff(int argc, const char **argv obj = deref_tag(obj, NULL, 0); if (!obj) die("invalid object '%s' given.", name); - if (obj->type == TYPE_COMMIT) + if (obj->type == OBJ_COMMIT) obj = &((struct commit *)obj)->tree->object; - if (obj->type == TYPE_TREE) { + if (obj->type == OBJ_TREE) { if (ARRAY_SIZE(ent) <= ents) die("more than %d trees given: '%s'", (int) ARRAY_SIZE(ent), name); @@ -297,7 +297,7 @@ int cmd_diff(int argc, const char **argv ents++; continue; } - if (obj->type == TYPE_BLOB) { + if (obj->type == OBJ_BLOB) { if (2 <= blobs) die("more than two blobs given: '%s'", name); memcpy(blob[blobs].sha1, obj->sha1, 20); diff --git a/builtin-fmt-merge-msg.c b/builtin-fmt-merge-msg.c index 6527482..b3c4f98 100644 --- a/builtin-fmt-merge-msg.c +++ b/builtin-fmt-merge-msg.c @@ -177,7 +177,7 @@ static void shortlog(const char *name, u int flags = UNINTERESTING | TREECHANGE | SEEN | SHOWN | ADDED; branch = deref_tag(parse_object(sha1), sha1_to_hex(sha1), 40); - if (!branch || branch->type != TYPE_COMMIT) + if (!branch || branch->type != OBJ_COMMIT) return; setup_revisions(0, NULL, rev, NULL); diff --git a/builtin-grep.c b/builtin-grep.c index 4c2f7df..a79bac3 100644 --- a/builtin-grep.c +++ b/builtin-grep.c @@ -891,9 +891,9 @@ static int grep_tree(struct grep_opt *op static int grep_object(struct grep_opt *opt, const char **paths, struct object *obj, const char *name) { - if (obj->type == TYPE_BLOB) + if (obj->type == OBJ_BLOB) return grep_sha1(opt, obj->sha1, name); - if (obj->type == TYPE_COMMIT || obj->type == TYPE_TREE) { + if (obj->type == OBJ_COMMIT || obj->type == OBJ_TREE) { struct tree_desc tree; void *data; int hit; diff --git a/builtin-rev-list.c b/builtin-rev-list.c index 63bad0e..8f32871 100644 --- a/builtin-rev-list.c +++ b/builtin-rev-list.c @@ -167,16 +167,16 @@ static void show_commit_list(struct rev_ const char *name = pending->name; if (obj->flags & (UNINTERESTING | SEEN)) continue; - if (obj->type == TYPE_TAG) { + if (obj->type == OBJ_TAG) { obj->flags |= SEEN; add_object_array(obj, name, &objects); continue; } - if (obj->type == TYPE_TREE) { + if (obj->type == OBJ_TREE) { process_tree((struct tree *)obj, &objects, NULL, name); continue; } - if (obj->type == TYPE_BLOB) { + if (obj->type == OBJ_BLOB) { process_blob((struct blob *)obj, &objects, NULL, name); continue; } diff --git a/commit.c b/commit.c index c6bf10d..46d5867 100644 --- a/commit.c +++ b/commit.c @@ -56,7 +56,7 @@ static struct commit *check_commit(struc const unsigned char *sha1, int quiet) { - if (obj->type != TYPE_COMMIT) { + if (obj->type != OBJ_COMMIT) { if (!quiet) error("Object %s is a %s, not a commit", sha1_to_hex(sha1), typename(obj->type)); @@ -86,11 +86,11 @@ struct commit *lookup_commit(const unsig if (!obj) { struct commit *ret = alloc_commit_node(); created_object(sha1, &ret->object); - ret->object.type = TYPE_COMMIT; + ret->object.type = OBJ_COMMIT; return ret; } if (!obj->type) - obj->type = TYPE_COMMIT; + obj->type = OBJ_COMMIT; return check_commit(obj, sha1, 0); } diff --git a/describe.c b/describe.c index 8e68d5d..324ca89 100644 --- a/describe.c +++ b/describe.c @@ -67,7 +67,7 @@ static int get_name(const char *path, co * Otherwise only annotated tags are used. */ if (!strncmp(path, "refs/tags/", 10)) { - if (object->type == TYPE_TAG) + if (object->type == OBJ_TAG) prio = 2; else prio = 1; diff --git a/fetch-pack.c b/fetch-pack.c index f2c51eb..b7824db 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -46,7 +46,7 @@ static int rev_list_insert_ref(const cha { struct object *o = deref_tag(parse_object(sha1), path, 0); - if (o && o->type == TYPE_COMMIT) + if (o && o->type == OBJ_COMMIT) rev_list_push((struct commit *)o, SEEN); return 0; @@ -256,14 +256,14 @@ static int mark_complete(const char *pat { struct object *o = parse_object(sha1); - while (o && o->type == TYPE_TAG) { + while (o && o->type == OBJ_TAG) { struct tag *t = (struct tag *) o; if (!t->tagged) break; /* broken repository */ o->flags |= COMPLETE; o = parse_object(t->tagged->sha1); } - if (o && o->type == TYPE_COMMIT) { + if (o && o->type == OBJ_COMMIT) { struct commit *commit = (struct commit *)o; commit->object.flags |= COMPLETE; insert_by_date(commit, &complete); @@ -357,7 +357,7 @@ static int everything_local(struct ref * * in sync with the other side at some time after * that (it is OK if we guess wrong here). */ - if (o->type == TYPE_COMMIT) { + if (o->type == OBJ_COMMIT) { struct commit *commit = (struct commit *)o; if (!cutoff || cutoff < commit->date) cutoff = commit->date; @@ -376,7 +376,7 @@ static int everything_local(struct ref * struct object *o = deref_tag(lookup_object(ref->old_sha1), NULL, 0); - if (!o || o->type != TYPE_COMMIT || !(o->flags & COMPLETE)) + if (!o || o->type != OBJ_COMMIT || !(o->flags & COMPLETE)) continue; if (!(o->flags & SEEN)) { diff --git a/fsck-objects.c b/fsck-objects.c index ef54a8a..e167f41 100644 --- a/fsck-objects.c +++ b/fsck-objects.c @@ -297,13 +297,13 @@ static int fsck_sha1(unsigned char *sha1 if (obj->flags & SEEN) return 0; obj->flags |= SEEN; - if (obj->type == TYPE_BLOB) + if (obj->type == OBJ_BLOB) return 0; - if (obj->type == TYPE_TREE) + if (obj->type == OBJ_TREE) return fsck_tree((struct tree *) obj); - if (obj->type == TYPE_COMMIT) + if (obj->type == OBJ_COMMIT) return fsck_commit((struct commit *) obj); - if (obj->type == TYPE_TAG) + if (obj->type == OBJ_TAG) return fsck_tag((struct tag *) obj); /* By now, parse_object() would've returned NULL instead. */ return objerror(obj, "unknown type '%d' (internal fsck error)", obj->type); @@ -472,7 +472,7 @@ static int fsck_cache_tree(struct cache_ } mark_reachable(obj, REACHABLE); obj->used = 1; - if (obj->type != TYPE_TREE) + if (obj->type != OBJ_TREE) err |= objerror(obj, "non-tree in cache-tree"); } for (i = 0; i < it->subtree_nr; i++) diff --git a/http-push.c b/http-push.c index f761584..4768619 100644 --- a/http-push.c +++ b/http-push.c @@ -1784,16 +1784,16 @@ static int get_delta(struct rev_info *re if (obj->flags & (UNINTERESTING | SEEN)) continue; - if (obj->type == TYPE_TAG) { + if (obj->type == OBJ_TAG) { obj->flags |= SEEN; p = add_one_object(obj, p); continue; } - if (obj->type == TYPE_TREE) { + if (obj->type == OBJ_TREE) { p = process_tree((struct tree *)obj, p, NULL, name); continue; } - if (obj->type == TYPE_BLOB) { + if (obj->type == OBJ_BLOB) { p = process_blob((struct blob *)obj, p, NULL, name); continue; } @@ -1960,12 +1960,12 @@ static int ref_newer(const unsigned char * old. Otherwise we require --force. */ o = deref_tag(parse_object(old_sha1), NULL, 0); - if (!o || o->type != TYPE_COMMIT) + if (!o || o->type != OBJ_COMMIT) return 0; old = (struct commit *) o; o = deref_tag(parse_object(new_sha1), NULL, 0); - if (!o || o->type != TYPE_COMMIT) + if (!o || o->type != OBJ_COMMIT) return 0; new = (struct commit *) o; @@ -2044,7 +2044,7 @@ static void add_remote_info_ref(struct r fwrite_buffer(ref_info, 1, len, buf); free(ref_info); - if (o->type == TYPE_TAG) { + if (o->type == OBJ_TAG) { o = deref_tag(o, ls->dentry_name, 0); if (o) { len = strlen(ls->dentry_name) + 45; diff --git a/name-rev.c b/name-rev.c index 083d067..f92f14e 100644 --- a/name-rev.c +++ b/name-rev.c @@ -84,14 +84,14 @@ static int name_ref(const char *path, co if (tags_only && strncmp(path, "refs/tags/", 10)) return 0; - while (o && o->type == TYPE_TAG) { + while (o && o->type == OBJ_TAG) { struct tag *t = (struct tag *) o; if (!t->tagged) break; /* broken repository */ o = parse_object(t->tagged->sha1); deref = 1; } - if (o && o->type == TYPE_COMMIT) { + if (o && o->type == OBJ_COMMIT) { struct commit *commit = (struct commit *)o; if (!strncmp(path, "refs/heads/", 11)) @@ -111,7 +111,7 @@ static const char* get_rev_name(struct o struct rev_name *n; struct commit *c; - if (o->type != TYPE_COMMIT) + if (o->type != OBJ_COMMIT) return "undefined"; c = (struct commit *) o; n = c->util; @@ -172,7 +172,7 @@ int main(int argc, char **argv) } o = deref_tag(parse_object(sha1), *argv, 0); - if (!o || o->type != TYPE_COMMIT) { + if (!o || o->type != OBJ_COMMIT) { fprintf(stderr, "Could not get commit for %s. Skipping.\n", *argv); continue; diff --git a/object.c b/object.c index 37277f9..e7ca56e 100644 --- a/object.c +++ b/object.c @@ -19,7 +19,8 @@ struct object *get_indexed_object(unsign } const char *type_names[] = { - "none", "blob", "tree", "commit", "bad" + "none", "commit", "tree", "blob", "tag", + "bad type 5", "bad type 6", "delta", "bad", }; static unsigned int hash_obj(struct object *obj, unsigned int n) @@ -88,7 +89,7 @@ void created_object(const unsigned char { obj->parsed = 0; obj->used = 0; - obj->type = TYPE_NONE; + obj->type = OBJ_NONE; obj->flags = 0; memcpy(obj->sha1, sha1, 20); @@ -131,7 +132,7 @@ struct object *lookup_unknown_object(con if (!obj) { union any_object *ret = xcalloc(1, sizeof(*ret)); created_object(sha1, &ret->object); - ret->object.type = TYPE_NONE; + ret->object.type = OBJ_NONE; return &ret->object; } return obj; diff --git a/object.h b/object.h index e0125e1..7893e94 100644 --- a/object.h +++ b/object.h @@ -24,12 +24,19 @@ struct object_array { #define TYPE_BITS 3 #define FLAG_BITS 27 -#define TYPE_NONE 0 -#define TYPE_BLOB 1 -#define TYPE_TREE 2 -#define TYPE_COMMIT 3 -#define TYPE_TAG 4 -#define TYPE_BAD 5 +/* + * The object type is stored in 3 bits. + */ +enum object_type { + OBJ_NONE = 0, + OBJ_COMMIT = 1, + OBJ_TREE = 2, + OBJ_BLOB = 3, + OBJ_TAG = 4, + /* 5/6 for future expansion */ + OBJ_DELTA = 7, + OBJ_BAD, +}; struct object { unsigned parsed : 1; @@ -40,14 +47,14 @@ struct object { }; extern int track_object_refs; -extern const char *type_names[]; +extern const char *type_names[9]; extern unsigned int get_max_object_index(void); extern struct object *get_indexed_object(unsigned int); static inline const char *typename(unsigned int type) { - return type_names[type > TYPE_TAG ? TYPE_BAD : type]; + return type_names[type > OBJ_TAG ? OBJ_BAD : type]; } extern struct object_refs *lookup_object_refs(struct object *); diff --git a/pack-check.c b/pack-check.c diff --git a/pack.h b/pack.h index 694e0c5..eb07b03 100644 --- a/pack.h +++ b/pack.h @@ -1,20 +1,7 @@ #ifndef PACK_H #define PACK_H -/* - * The packed object type is stored in 3 bits. - * The type value 0 is a reserved prefix if ever there is more than 7 - * object types, or any future format extensions. - */ -enum object_type { - OBJ_EXT = 0, - OBJ_COMMIT = 1, - OBJ_TREE = 2, - OBJ_BLOB = 3, - OBJ_TAG = 4, - /* 5/6 for future expansion */ - OBJ_DELTA = 7, -}; +#include "object.h" /* * Packed object header diff --git a/revision.c b/revision.c index 7df9089..874e349 100644 --- a/revision.c +++ b/revision.c @@ -135,7 +135,7 @@ static struct commit *handle_commit(stru /* * Tag object? Look what it points to.. */ - while (object->type == TYPE_TAG) { + while (object->type == OBJ_TAG) { struct tag *tag = (struct tag *) object; if (revs->tag_objects && !(flags & UNINTERESTING)) add_pending_object(revs, object, tag->tag); @@ -148,7 +148,7 @@ static struct commit *handle_commit(stru * Commit object? Just return it, we'll do all the complex * reachability crud. */ - if (object->type == TYPE_COMMIT) { + if (object->type == OBJ_COMMIT) { struct commit *commit = (struct commit *)object; if (parse_commit(commit) < 0) die("unable to parse commit %s", name); @@ -164,7 +164,7 @@ static struct commit *handle_commit(stru * Tree object? Either mark it uniniteresting, or add it * to the list of objects to look at later.. */ - if (object->type == TYPE_TREE) { + if (object->type == OBJ_TREE) { struct tree *tree = (struct tree *)object; if (!revs->tree_objects) return NULL; @@ -179,7 +179,7 @@ static struct commit *handle_commit(stru /* * Blob object? You know the drill by now.. */ - if (object->type == TYPE_BLOB) { + if (object->type == OBJ_BLOB) { struct blob *blob = (struct blob *)object; if (!revs->blob_objects) return NULL; @@ -494,11 +494,11 @@ static int add_parents_only(struct rev_i return 0; while (1) { it = get_reference(revs, arg, sha1, 0); - if (it->type != TYPE_TAG) + if (it->type != OBJ_TAG) break; memcpy(sha1, ((struct tag*)it)->tagged->sha1, 20); } - if (it->type != TYPE_COMMIT) + if (it->type != OBJ_COMMIT) return 0; commit = (struct commit *)it; for (parents = commit->parents; parents; parents = parents->next) { diff --git a/send-pack.c b/send-pack.c index 4019a4b..10bc8bc 100644 --- a/send-pack.c +++ b/send-pack.c @@ -151,12 +151,12 @@ static int ref_newer(const unsigned char * old. Otherwise we require --force. */ o = deref_tag(parse_object(old_sha1), NULL, 0); - if (!o || o->type != TYPE_COMMIT) + if (!o || o->type != OBJ_COMMIT) return 0; old = (struct commit *) o; o = deref_tag(parse_object(new_sha1), NULL, 0); - if (!o || o->type != TYPE_COMMIT) + if (!o || o->type != OBJ_COMMIT) return 0; new = (struct commit *) o; diff --git a/server-info.c b/server-info.c index fdfe05a..7df628f 100644 --- a/server-info.c +++ b/server-info.c @@ -12,7 +12,7 @@ static int add_info_ref(const char *path struct object *o = parse_object(sha1); fprintf(info_ref_fp, "%s %s\n", sha1_to_hex(sha1), path); - if (o->type == TYPE_TAG) { + if (o->type == OBJ_TAG) { o = deref_tag(o, path, 0); if (o) fprintf(info_ref_fp, "%s %s^{}\n", diff --git a/sha1_name.c b/sha1_name.c index f2cbafa..5fe8e5d 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -381,13 +381,13 @@ static int peel_onion(const char *name, sp++; /* beginning of type name, or closing brace for empty */ if (!strncmp(commit_type, sp, 6) && sp[6] == '}') - expected_type = TYPE_COMMIT; + expected_type = OBJ_COMMIT; else if (!strncmp(tree_type, sp, 4) && sp[4] == '}') - expected_type = TYPE_TREE; + expected_type = OBJ_TREE; else if (!strncmp(blob_type, sp, 4) && sp[4] == '}') - expected_type = TYPE_BLOB; + expected_type = OBJ_BLOB; else if (sp[0] == '}') - expected_type = TYPE_NONE; + expected_type = OBJ_NONE; else return -1; @@ -416,9 +416,9 @@ static int peel_onion(const char *name, memcpy(sha1, o->sha1, 20); return 0; } - if (o->type == TYPE_TAG) + if (o->type == OBJ_TAG) o = ((struct tag*) o)->tagged; - else if (o->type == TYPE_COMMIT) + else if (o->type == OBJ_COMMIT) o = &(((struct commit *) o)->tree->object); else return error("%.*s: expected %s type, but the object dereferences to %s type", diff --git a/tag.c b/tag.c index 74d0dab..864ac1b 100644 --- a/tag.c +++ b/tag.c @@ -5,7 +5,7 @@ const char *tag_type = "tag"; struct object *deref_tag(struct object *o, const char *warn, int warnlen) { - while (o && o->type == TYPE_TAG) + while (o && o->type == OBJ_TAG) o = parse_object(((struct tag *)o)->tagged->sha1); if (!o && warn) { if (!warnlen) @@ -21,12 +21,12 @@ struct tag *lookup_tag(const unsigned ch if (!obj) { struct tag *ret = alloc_tag_node(); created_object(sha1, &ret->object); - ret->object.type = TYPE_TAG; + ret->object.type = OBJ_TAG; return ret; } if (!obj->type) - obj->type = TYPE_TAG; - if (obj->type != TYPE_TAG) { + obj->type = OBJ_TAG; + if (obj->type != OBJ_TAG) { error("Object %s is a %s, not a tree", sha1_to_hex(sha1), typename(obj->type)); return NULL; diff --git a/tree.c b/tree.c index 1023655..a6032e3 100644 --- a/tree.c +++ b/tree.c @@ -131,12 +131,12 @@ struct tree *lookup_tree(const unsigned if (!obj) { struct tree *ret = alloc_tree_node(); created_object(sha1, &ret->object); - ret->object.type = TYPE_TREE; + ret->object.type = OBJ_TREE; return ret; } if (!obj->type) - obj->type = TYPE_TREE; - if (obj->type != TYPE_TREE) { + obj->type = OBJ_TREE; + if (obj->type != OBJ_TREE) { error("Object %s is a %s, not a tree", sha1_to_hex(sha1), typename(obj->type)); return NULL; @@ -216,11 +216,11 @@ struct tree *parse_tree_indirect(const u do { if (!obj) return NULL; - if (obj->type == TYPE_TREE) + if (obj->type == OBJ_TREE) return (struct tree *) obj; - else if (obj->type == TYPE_COMMIT) + else if (obj->type == OBJ_COMMIT) obj = &(((struct commit *) obj)->tree->object); - else if (obj->type == TYPE_TAG) + else if (obj->type == OBJ_TAG) obj = ((struct tag *) obj)->tagged; else return NULL; diff --git a/upload-pack.c b/upload-pack.c index b18eb9b..2e820c9 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -326,7 +326,7 @@ static int got_sha1(char *hex, unsigned o = parse_object(sha1); if (!o) die("oops (%s)", sha1_to_hex(sha1)); - if (o->type == TYPE_COMMIT) { + if (o->type == OBJ_COMMIT) { struct commit_list *parents; if (o->flags & THEY_HAVE) return 0; @@ -457,7 +457,7 @@ static int send_ref(const char *refname, o->flags |= OUR_REF; nr_our_refs++; } - if (o->type == TYPE_TAG) { + if (o->type == OBJ_TAG) { o = deref_tag(o, refname, 0); packet_write(1, "%s %s^{}\n", sha1_to_hex(o->sha1), refname); } ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 3:45 ` Linus Torvalds @ 2006-07-12 4:31 ` Linus Torvalds 2006-07-12 6:35 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 4:31 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Junio C Hamano, sf, Git Mailing List On Tue, 11 Jul 2006, Linus Torvalds wrote: > > Junio: this patch is totally independent from the other patches, and is on > top of you current "master". It gets rid of TYPE_xxx in favor of the > OBJ_xxx enums, which are moved from pack.h into object.h. ... > static inline const char *typename(unsigned int type) > { > - return type_names[type > TYPE_TAG ? TYPE_BAD : type]; > + return type_names[type > OBJ_TAG ? OBJ_BAD : type]; That should be "[type > OBJ_BAD ? OBJ_BAD : type]" Not that any users will care, because the current users of the typename() macro are the old TYPE_xxxx users which will never see the OBJ_DELTA case anyway. So it's not a big deal, but let's do it right for future users (ie if the pack-file things want to start using "typename(type)" and they actually _have_ delta descriptors). Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 3:45 ` Linus Torvalds 2006-07-12 4:31 ` Linus Torvalds @ 2006-07-12 6:35 ` Junio C Hamano 2006-07-12 16:29 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2006-07-12 6:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > I shouldn't have introduced the new TYPE_xxx macros. I should just have > used the same OBJ_xxx macros that we use in the pack-file. > > Junio: this patch is totally independent from the other patches, and is on > top of you current "master". It gets rid of TYPE_xxx in favor of the > OBJ_xxx enums, which are moved from pack.h into object.h. You seem to have forgotten one file (fetch-pack.c) but that was trivial. I'll apply and push it out shortly. This might collide with a few topic branches but I have rerere to help me cope with it ;-). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 6:35 ` Junio C Hamano @ 2006-07-12 16:29 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 16:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, 11 Jul 2006, Junio C Hamano wrote: > > You seem to have forgotten one file (fetch-pack.c) but that was > trivial. I'll apply and push it out shortly. Actually, it was fetch.c, not fetch-pack.c, and I forgot that due to a very real (and totally separate) bug. The makefile doesn't have any dependencies for "fetch.o" and "rsh.o", so when you change the headers, they never get rebuilt. I think fetch.o and rsh.o should either get added to the library files, or we need something like this.. (I didn't check that I caught all the appropriate *.o files, but this should be better than what we have now). Even better would be to make the dependancies automatic. The kernel does that really well with some GNU Makefile magic, but it also depends on magic gcc command line flags ("-Wp,-MD,$(depfile)") Linus --- diff --git a/Makefile b/Makefile index e75fb13..854e0af 100644 --- a/Makefile +++ b/Makefile @@ -236,6 +236,9 @@ BUILTIN_OBJS = \ builtin-cat-file.o builtin-mailsplit.o builtin-stripspace.o \ builtin-update-ref.o builtin-fmt-merge-msg.o +MISC_OBJS = \ + fetch.o rsh.o http-fetch.o http-push.o + GITLIBS = $(LIB_FILE) $(XDIFF_LIB) LIBS = $(GITLIBS) -lz @@ -615,7 +618,7 @@ git-http-push$X: revision.o http.o http- $(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) \ $(LIBS) $(CURL_LIBCURL) $(EXPAT_LIBEXPAT) -$(LIB_OBJS) $(BUILTIN_OBJS): $(LIB_H) +$(LIB_OBJS) $(BUILTIN_OBJS) $(MISC_OBJS): $(LIB_H) $(patsubst git-%$X,%.o,$(PROGRAMS)): $(LIB_H) $(wildcard */*.h) $(DIFF_OBJS): diffcore.h ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 0:03 ` Linus Torvalds 2006-07-12 0:39 ` Johannes Schindelin @ 2006-07-12 0:46 ` Junio C Hamano 2006-07-12 3:42 ` Linus Torvalds 2006-07-12 6:49 ` Peter Baumann 2 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2006-07-12 0:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: sf, Johannes Schindelin, Git Mailing List Linus Torvalds <torvalds@osdl.org> writes: > It's actually really part of the specs, and not just happenstance. > Well, I normally would agree with you if it was a "oh, all our zlib > objects seem to start with 0x78" thing, but after having dug into both the > zlib standard (which is actually an RFC, not just some random thing), and > looked at the sources, it's definitely the case that the "0x78" byte isn't > just an implementation detail. Ok, I do not think we would worry about casting use of deflate + 32k windowsize in stone that much, and being able to check the size and type without inflating certainly is attractive. Validating FCHECK bits is surely a nice touch. Thanks. > Anyway, I think this following patch replaces the old 2/3 and 3/3 (it > still depends on the original [1/3] cleanup. > > (It also renames and reverses the meaning of the config file option: it's > now "[core] LegacyHeaders = true" for using legacy headers.) > > Not heavily tested, but seems ok. I'd queue it in "pu" with reversed default and then move it to "next" later. > static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size) > { > int bytes = strlen(buffer) + 1; > unsigned char *buf = xmalloc(1+size); > + unsigned long n; > > - memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); > - bytes = stream->total_out - bytes; > + n = stream->total_out - bytes; > + if (n > size) > + n = size; > + memcpy(buf, (char *) buffer + bytes, n); > + bytes = n; > if (bytes < size) { > stream->next_out = buf + bytes; > stream->avail_out = size - bytes; This one looks like an independent fix for a well spotted bug. > int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1) > { > int size; > @@ -1459,7 +1550,7 @@ int write_sha1_file(void *buf, unsigned > /* Set it up */ > memset(&stream, 0, sizeof(stream)); > deflateInit(&stream, zlib_compression_level); > - size = deflateBound(&stream, len+hdrlen); > + size = 8 + deflateBound(&stream, len+hdrlen); > compressed = xmalloc(size); > > /* Compress it */ I am wondring what this eight is. You would pack 7 7-bit length plus 4-bit totalling 49+4 = 53-bit length (plus 4-bit type). Is it an unwritten decision that the format would not deal with objects larger than 2^53 (which is probably fine but looks magic)? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 0:46 ` Junio C Hamano @ 2006-07-12 3:42 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 3:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: sf, Johannes Schindelin, Git Mailing List On Tue, 11 Jul 2006, Junio C Hamano wrote: > > unsigned char *buf = xmalloc(1+size); > > + unsigned long n; > > > > - memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); > > - bytes = stream->total_out - bytes; > > + n = stream->total_out - bytes; > > + if (n > size) > > + n = size; > > + memcpy(buf, (char *) buffer + bytes, n); > > + bytes = n; > > if (bytes < size) { > > stream->next_out = buf + bytes; > > stream->avail_out = size - bytes; > > This one looks like an independent fix for a well spotted bug. Yeah, well, the "bug" only happens if you screw something up (which I triggered both times I tried to rewrite this ;) Or possibly it the object is corrupt. But yes, it's independent. > > - size = deflateBound(&stream, len+hdrlen); > > + size = 8 + deflateBound(&stream, len+hdrlen); > > compressed = xmalloc(size); > > > > /* Compress it */ > > I am wondring what this eight is. You would pack 7 7-bit length > plus 4-bit totalling 49+4 = 53-bit length (plus 4-bit type). Is > it an unwritten decision that the format would not deal with > objects larger than 2^53 (which is probably fine but looks > magic)? 8 was just "obviously enough". The "hdrlen" part should already give us _way_ more padding than we need (the old-fashioned header will deflate to something bigger than the new header, so deflateBound() even _without_ the extra space should be plenty). But I decided to add a few bytes just because it won't hurt. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 0:03 ` Linus Torvalds 2006-07-12 0:39 ` Johannes Schindelin 2006-07-12 0:46 ` Junio C Hamano @ 2006-07-12 6:49 ` Peter Baumann 2006-07-12 7:16 ` Junio C Hamano 2006-07-12 15:13 ` Linus Torvalds 2 siblings, 2 replies; 37+ messages in thread From: Peter Baumann @ 2006-07-12 6:49 UTC (permalink / raw) To: git On 2006-07-12, Linus Torvalds <torvalds@osdl.org> wrote: [...] > Anyway, I think this following patch replaces the old 2/3 and 3/3 (it > still depends on the original [1/3] cleanup. > > (It also renames and reverses the meaning of the config file option: it's > now "[core] LegacyHeaders = true" for using legacy headers.) > > Not heavily tested, but seems ok. > > sf? Dscho? Can you check this thing out? > > Linus > ---- [...] > diff --git a/sha1_file.c b/sha1_file.c > index 8734d50..475b23d 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -684,26 +684,74 @@ static void *map_sha1_file_internal(cons > return map; > } > > -static int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size) > +static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz) > { > + unsigned char c; > + unsigned int word, bits; > + unsigned long size; > + static const char *typename[8] = { > + NULL, /* OBJ_EXT */ > + "commit", "tree", "blob", "tag", > + NULL, NULL, NULL > + }; > + const char *type; > + > /* Get the data stream */ > memset(stream, 0, sizeof(*stream)); > stream->next_in = map; > stream->avail_in = mapsize; > stream->next_out = buffer; > - stream->avail_out = size; > + stream->avail_out = bufsiz; > + > + /* > + * Is it a zlib-compressed buffer? If so, the first byte > + * must be 0x78 (15-bit window size, deflated), and the > + * first 16-bit word is evenly divisible by 31 > + */ > + word = (map[0] << 8) + map[1]; > + if (map[0] == 0x78 && !(word % 31)) { > + inflateInit(stream); > + return inflate(stream, 0); > + } > + > + c = *map++; > + mapsize--; > + type = typename[(c >> 4) & 7]; > + if (!type) > + return -1; > + > + bits = 4; > + size = c & 0xf; > + while (!(c & 0x80)) { > + if (bits >= 8*sizeof(long)) > + return -1; > + c = *map++; > + size += (c & 0x7f) << bits; > + bits += 7; > + mapsize--; > + } This doesn't match the logic used in unpack_object_header, which is used in the packs: static unsigned long unpack_object_header(struct packed_git *p, unsigned long offset, enum object_type *type, unsigned long *sizep) { unsigned shift; unsigned char *pack, c; unsigned long size; if (offset >= p->pack_size) die("object offset outside of pack file"); pack = (unsigned char *) p->pack_base + offset; c = *pack++; offset++; *type = (c >> 4) & 7; size = c & 15; shift = 4; while (c & 0x80) { <========== if (offset >= p->pack_size) die("object offset outside of pack file"); c = *pack++; offset++; size += (c & 0x7f) << shift; shift += 7; } *sizep = size; <========== return offset; } > @@ -1414,6 +1462,49 @@ static int write_buffer(int fd, const vo > return 0; > } > > +static int write_binary_header(unsigned char *hdr, enum object_type type, unsigned long len) > +{ > + int hdr_len; > + unsigned char c; > + > + c = (type << 4) | (len & 15); > + len >>= 4; > + hdr_len = 1; > + while (len) { > + *hdr++ = c; > + hdr_len++; > + c = (len & 0x7f); > + len >>= 7; > + } > + *hdr = c | 0x80; > + return hdr_len; > +} > + Dito, but in this case see pack-objects.c /* * The per-object header is a pretty dense thing, which is * - first byte: low four bits are "size", then three bits of "type", * and the high bit is "size continues". * - each byte afterwards: low seven bits are size continuation, * with the high bit being "size continues" */ static int encode_header(enum object_type type, unsigned long size, unsigned char *hdr) { int n = 1; unsigned char c; if (type < OBJ_COMMIT || type > OBJ_DELTA) die("bad type %d", type); c = (type << 4) | (size & 15); size >>= 4; while (size) { *hdr++ = c | 0x80; <======= c = size & 0x7f; size >>= 7; n++; } *hdr = c; <======= return n; } -Peter Baumann ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 6:49 ` Peter Baumann @ 2006-07-12 7:16 ` Junio C Hamano 2006-07-12 8:28 ` Peter Baumann 2006-07-12 15:13 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2006-07-12 7:16 UTC (permalink / raw) To: Peter Baumann; +Cc: git, Linus Torvalds Peter Baumann <Peter.B.Baumann@stud.informatik.uni-erlangen.de> writes: >> + bits = 4; >> + size = c & 0xf; >> + while (!(c & 0x80)) { >> + if (bits >= 8*sizeof(long)) >> + return -1; >> + c = *map++; >> + size += (c & 0x7f) << bits; >> + bits += 7; >> + mapsize--; >> + } > > This doesn't match the logic used in unpack_object_header, which is used > in the packs: > ... >> + c = (type << 4) | (len & 15); >> + len >>= 4; >> + hdr_len = 1; >> + while (len) { >> + *hdr++ = c; >> + hdr_len++; >> + c = (len & 0x7f); >> + len >>= 7; >> + } >> + *hdr = c | 0x80; >> + return hdr_len; >> +} >> + > > Dito, but in this case see pack-objects.c Well, while these are not strictly needed to match, there is no good reason to make them inconsistent. Very well spotted. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 7:16 ` Junio C Hamano @ 2006-07-12 8:28 ` Peter Baumann 0 siblings, 0 replies; 37+ messages in thread From: Peter Baumann @ 2006-07-12 8:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Peter Baumann, git, Linus Torvalds On Wed, Jul 12, 2006 at 12:16:12AM -0700, Junio C Hamano wrote: > >> + bits = 4; > >> + size = c & 0xf; > >> + while (!(c & 0x80)) { > >> + if (bits >= 8*sizeof(long)) > >> + return -1; > >> + c = *map++; > >> + size += (c & 0x7f) << bits; > >> + bits += 7; > >> + mapsize--; > >> + } > > > > This doesn't match the logic used in unpack_object_header, which is used > > in the packs: > > ... > >> + c = (type << 4) | (len & 15); > >> + len >>= 4; > >> + hdr_len = 1; > >> + while (len) { > >> + *hdr++ = c; > >> + hdr_len++; > >> + c = (len & 0x7f); > >> + len >>= 7; > >> + } > >> + *hdr = c | 0x80; > >> + return hdr_len; > >> +} > >> + > > > > Dito, but in this case see pack-objects.c > > Well, while these are not strictly needed to match, there is no > good reason to make them inconsistent. Very well spotted. > During packing one could simply copy the existing object into the generated pack in the non delta case, so I think this _is_ necessary/usefull. -Peter Baumann ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 6:49 ` Peter Baumann 2006-07-12 7:16 ` Junio C Hamano @ 2006-07-12 15:13 ` Linus Torvalds 2006-07-12 15:27 ` Junio C Hamano 1 sibling, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2006-07-12 15:13 UTC (permalink / raw) To: Peter Baumann; +Cc: Junio C Hamano, Git Mailing List On Wed, 12 Jul 2006, Peter Baumann wrote: > > This doesn't match the logic used in unpack_object_header, which is used > in the packs: Yeah, good point. I reversed the meaning of the high bit by mistake. In pack-files, the high bit is a "there is more to come" bit, and in my new code it was a "this is the last byte" bit. It doesn't matter for the ultimate reason for this (being able to re-use the actual _bulk_ of the data), but having it be different would be stupid and confusing, and means that we can't later try to use the same routines for packing/unpacking the objects. Junio - do you want me to send an updated patch, or do you want to reverse bit#7 yourself? Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" 2006-07-12 15:13 ` Linus Torvalds @ 2006-07-12 15:27 ` Junio C Hamano 0 siblings, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2006-07-12 15:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > Junio - do you want me to send an updated patch, or do you want to reverse > bit#7 yourself? Pushed out in "pu" already, thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 3/3] Enable the new binary header format for unpacked objects 2006-07-11 17:09 ` Linus Torvalds 2006-07-11 17:10 ` [PATCH 1/3] Make the unpacked object header functions static to sha1_file.c Linus Torvalds 2006-07-11 17:12 ` [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" Linus Torvalds @ 2006-07-11 17:16 ` Linus Torvalds 2 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2006-07-11 17:16 UTC (permalink / raw) To: Carl Baldwin, Junio C Hamano; +Cc: Git Mailing List Enable the new binary header format for unpacked objects This makes unpacked objects use the same header encoding as the packed objects do, which should eventually allow us to be able to re-use the object data directly when creating pack-files (rather than having to decode and re-encode the data when inserting it in a pack). It's enabled by default, but can be disabled with [core] BinaryHeaders = false in the config file. We can read both formats, of course, so you can have a mixed archive. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- This should _not_ be applied to the main git sources, at least not with the default being "use_binary_headers = 1". But if you change that initial assignment to 0, this should be reasonably good. Not extensively tested, of course. It fails t9102-git-svn-deep-rmdir.sh for me for some reason, I didn't really look at it yet, since this whole thing is more for Carl Baldwin to play with right now. Documentation/config.txt | 5 ++++ cache.h | 1 + config.c | 5 ++++ environment.c | 1 + sha1_file.c | 65 ++++++++++++++++++++++++++++++++++++++++------ 5 files changed, 69 insertions(+), 8 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 0b434c1..bc95416 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -97,6 +97,11 @@ core.compression:: compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. +core.binaryheaders:: + A boolean, that if set to false, disables the use of the + new-style binary objects headers that share the same format with + the headers in a pack file. + alias.*:: Command aliases for the gitlink:git[1] command wrapper - e.g. after defining "alias.last = cat-file commit HEAD", the invocation diff --git a/cache.h b/cache.h index d433d46..756d89f 100644 --- a/cache.h +++ b/cache.h @@ -176,6 +176,7 @@ extern int commit_lock_file(struct lock_ extern void rollback_lock_file(struct lock_file *); /* Environment bits from configuration mechanism */ +extern int use_binary_headers; extern int trust_executable_bit; extern int assume_unchanged; extern int prefer_symlink_refs; diff --git a/config.c b/config.c index 8445f7d..2497447 100644 --- a/config.c +++ b/config.c @@ -289,6 +289,11 @@ int git_default_config(const char *var, return 0; } + if (!strcmp(var, "core.binaryheaders")) { + use_binary_headers = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "user.name")) { strlcpy(git_default_name, value, sizeof(git_default_name)); return 0; diff --git a/environment.c b/environment.c index 43823ff..340214d 100644 --- a/environment.c +++ b/environment.c @@ -11,6 +11,7 @@ #include "cache.h" char git_default_email[MAX_GITNAME]; char git_default_name[MAX_GITNAME]; +int use_binary_headers = 1; int trust_executable_bit = 1; int assume_unchanged = 0; int prefer_symlink_refs = 0; diff --git a/sha1_file.c b/sha1_file.c index ca5f0c0..700f455 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -699,11 +699,14 @@ static int unpack_sha1_header(z_stream * static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size, unsigned int hdrlen) { - int bytes = hdrlen; + unsigned long bytes = hdrlen, n; unsigned char *buf = xmalloc(1+size); - memcpy(buf, (char *) buffer + bytes, stream->total_out - bytes); - bytes = stream->total_out - bytes; + n = stream->total_out - bytes; + if (n > size) + n = size; + memcpy(buf, (char *) buffer + bytes, n); + bytes = n; if (bytes < size) { stream->next_out = buf + bytes; stream->avail_out = size - bytes; @@ -1240,7 +1243,7 @@ struct packed_git *find_sha1_pack(const int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep) { - int status, hdrlen; + int status; unsigned long mapsize, size; void *map; z_stream stream; @@ -1349,24 +1352,70 @@ void *read_object_with_reference(const u } } +static int write_binary_header(unsigned char *hdr, enum object_type type, unsigned long len) +{ + int hdr_len; + unsigned char c; + + c = (type << 4) | (len & 15); + len >>= 4; + hdr_len = 1; + while (len) { + *hdr++ = c; + hdr_len++; + c = (len & 0x7f); + len >>= 7; + } + *hdr = c | 0x80; + return hdr_len; +} + +static int generate_binary_header(unsigned char *hdr, const char *type, unsigned long len) +{ + int obj_type; + + if (!strcmp(type, blob_type)) + obj_type = OBJ_BLOB; + else if (!strcmp(type, tree_type)) + obj_type = OBJ_TREE; + else if (!strcmp(type, commit_type)) + obj_type = OBJ_COMMIT; + else if (!strcmp(type, tag_type)) + obj_type = OBJ_TAG; + else + die("trying to generate bogus object of type '%s'", type); + return write_binary_header(hdr, obj_type, len); +} + char *write_sha1_file_prepare(void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned char *hdr, - int *hdrlen) + int *hdrlenp) { SHA_CTX c; + int hdr_len; - /* Generate the header */ - *hdrlen = sprintf((char *)hdr, "%s %lu", type, len)+1; + /* + * Generate the header. + * + * NOTE! Regardless of whether we end up using the ASCII + * or binary header, we always generate the SHA1 of the + * file as if we had the ASCII header. + */ + hdr_len = sprintf((char *)hdr, "%s %lu", type, len)+1; /* Sha1.. */ SHA1_Init(&c); - SHA1_Update(&c, hdr, *hdrlen); + SHA1_Update(&c, hdr, hdr_len); SHA1_Update(&c, buf, len); SHA1_Final(sha1, &c); + if (use_binary_headers) + hdr_len = generate_binary_header(hdr, type, len); + *hdrlenp = hdr_len; + return sha1_file_name(sha1); } ^ permalink raw reply related [flat|nested] 37+ messages in thread
end of thread, other threads:[~2006-07-12 16:29 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-07-10 23:01 Revisiting large binary files issue Carl Baldwin 2006-07-10 23:14 ` Junio C Hamano 2006-07-11 6:20 ` Peter Baumann 2006-07-10 23:28 ` Linus Torvalds 2006-07-11 9:40 ` [RFC]: Pack-file object format for individual objects (Was: Revisiting large binary files issue.) sf 2006-07-11 18:00 ` Linus Torvalds 2006-07-11 21:45 ` sf 2006-07-11 22:17 ` Linus Torvalds 2006-07-11 22:26 ` Linus Torvalds 2006-07-11 14:55 ` Revisiting large binary files issue Carl Baldwin 2006-07-11 17:09 ` Linus Torvalds 2006-07-11 17:10 ` [PATCH 1/3] Make the unpacked object header functions static to sha1_file.c Linus Torvalds 2006-07-11 17:12 ` [PATCH 2/3] sha1_file: add the ability to parse objects in "pack file format" Linus Torvalds 2006-07-11 18:40 ` Johannes Schindelin 2006-07-11 18:58 ` Linus Torvalds 2006-07-11 19:20 ` Johannes Schindelin 2006-07-11 19:48 ` Linus Torvalds 2006-07-11 21:25 ` Johannes Schindelin 2006-07-11 21:47 ` Junio C Hamano 2006-07-11 21:24 ` sf 2006-07-11 22:09 ` Linus Torvalds 2006-07-11 22:25 ` sf 2006-07-11 23:03 ` Junio C Hamano 2006-07-12 0:03 ` Linus Torvalds 2006-07-12 0:39 ` Johannes Schindelin 2006-07-12 3:45 ` Linus Torvalds 2006-07-12 4:31 ` Linus Torvalds 2006-07-12 6:35 ` Junio C Hamano 2006-07-12 16:29 ` Linus Torvalds 2006-07-12 0:46 ` Junio C Hamano 2006-07-12 3:42 ` Linus Torvalds 2006-07-12 6:49 ` Peter Baumann 2006-07-12 7:16 ` Junio C Hamano 2006-07-12 8:28 ` Peter Baumann 2006-07-12 15:13 ` Linus Torvalds 2006-07-12 15:27 ` Junio C Hamano 2006-07-11 17:16 ` [PATCH 3/3] Enable the new binary header format for unpacked objects Linus Torvalds
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).