git@vger.kernel.org mailing list mirror (one of many)
* Reading commit objects
@ 2013-05-21 21:21 Chico Sokol
  2013-05-21 21:25 ` Felipe Contreras
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Chico Sokol @ 2013-05-21 21:21 UTC
  To: git

Hello,

I'm building a library to manipulate git repositories (interacting
directly with the filesystem).

Currently, we're trying to parse commit objects. After decompressing
the contents of a commit object file we got the following output:

commit 191
author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300

first commit

We hoped to get the same output as "git cat-file -p <sha1>", but
that didn't happen. Given a commit object, how can I find the hash
of its tree object?

Thanks,


--
Chico Sokol

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Reading commit objects
  2013-05-21 21:21 Reading commit objects Chico Sokol
@ 2013-05-21 21:25 ` Felipe Contreras
  2013-05-21 21:37 ` John Szakmeister
  2013-05-21 22:20 ` Junio C Hamano
  2 siblings, 0 replies; 19+ messages in thread
From: Felipe Contreras @ 2013-05-21 21:25 UTC
  To: Chico Sokol; +Cc: git

On Tue, May 21, 2013 at 4:21 PM, Chico Sokol <chico.sokol@gmail.com> wrote:
> Hello,
>
> I'm building a library to manipulate git repositories (interacting
> directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing
> the contents of a commit object file we got the following output:
>
> commit 191
> author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
> committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>
> first commit
>
> We hoped to get the same output as "git cat-file -p <sha1>", but
> that didn't happen. Given a commit object, how can I find the hash
> of its tree object?

git rev-parse <sha1>:

-- 
Felipe Contreras

* Re: Reading commit objects
  2013-05-21 21:21 Reading commit objects Chico Sokol
  2013-05-21 21:25 ` Felipe Contreras
@ 2013-05-21 21:37 ` John Szakmeister
  2013-05-21 22:18   ` Chico Sokol
  2013-05-21 22:20 ` Junio C Hamano
  2 siblings, 1 reply; 19+ messages in thread
From: John Szakmeister @ 2013-05-21 21:37 UTC
  To: Chico Sokol; +Cc: git

On Tue, May 21, 2013 at 5:21 PM, Chico Sokol <chico.sokol@gmail.com> wrote:
> Hello,
>
> I'm building a library to manipulate git repositories (interacting
> directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing
> the contents of a commit object file we got the following output:
>
> commit 191
> author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
> committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>
> first commit

Does `git cat-file -p <sha1>` show a tree object?  FWIW, I expected to
see a tree line there, so maybe this object was created without a
tree?  I also don't see a parent listed.

I did this on one of my repos:

>>> buf = open('.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802', 'rb').read()
>>> import zlib
>>> zlib.decompress(buf)
'commit 246\x00tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\nparent
0636e7ff6b79470b0cd53ceacea88e7796f202ce\nauthor John Szakmeister
<john@szakmeister.net> 1369168481 -0400\ncommitter John Szakmeister
<john@szakmeister.net> 1369168481 -0400\n\nGot a file listing.\n'

So at least creating the commits with Git, I see a tree.  How was the
commit you're referencing created?  Perhaps something is wrong with
that process?
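For readers following along in Java, a rough equivalent of the Python session above might look like the sketch below. It is only an illustration: the class and method names are hypothetical, and it assumes the whole loose object fits in memory. `java.util.zip.Inflater` with its default constructor expects the same zlib (RFC 1950) wrapper that Python's `zlib.decompress` handles, and the `visible` helper renders the NUL byte as `\x00` so the header terminator shows up in the output the way it does in Python's repr.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of any library.
class InflateLoose {
    /** Inflates a complete zlib (RFC 1950) stream, like Python's zlib.decompress(). */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater(); // default constructor: zlib header expected
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses bytes with zlib; used here only to build sample input. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Renders NUL and LF as \x00 and \n, similar to Python 2's repr of a str. */
    static String visible(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int c = b & 0xff;
            if (c == 0) sb.append("\\x00");
            else if (c == '\n') sb.append("\\n");
            else sb.append((char) c);
        }
        return sb.toString();
    }
}
```

Called on the bytes of a loose object file (for example via `Files.readAllBytes`), `visible(inflate(bytes))` should print something shaped like the Python output: `commit <size>\x00tree ...`.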

> We hoped to get the same output as "git cat-file -p <sha1>", but
> that didn't happen. Given a commit object, how can I find the hash
> of its tree object?

I'd expect that too.

-John

* Re: Reading commit objects
  2013-05-21 21:37 ` John Szakmeister
@ 2013-05-21 22:18   ` Chico Sokol
  2013-05-21 22:22     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Chico Sokol @ 2013-05-21 22:18 UTC
  To: John Szakmeister; +Cc: git

OK, we discovered that the commit object does contain the tree
object's SHA-1, by reading its contents with Python's zlib library.

So the bug must be in our Java code (we're building a Java library).

Is there anything non-standard about git's zlib compression? We're
decompressing its contents with Java's default zlib API, so it should
work normally. Here's our code, which prints that wrong output:

import java.io.File;
import java.io.FileInputStream;
import java.util.zip.InflaterInputStream;
import org.apache.commons.io.IOUtils;
...
File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(obj));
System.out.println(IOUtils.readLines(inflaterInputStream));


I know this isn't the right place to ask about Java issues,
but we would appreciate any help.



--
Chico Sokol


* Re: Reading commit objects
  2013-05-21 21:21 Reading commit objects Chico Sokol
  2013-05-21 21:25 ` Felipe Contreras
  2013-05-21 21:37 ` John Szakmeister
@ 2013-05-21 22:20 ` Junio C Hamano
  2 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2013-05-21 22:20 UTC
  To: Chico Sokol; +Cc: git

Chico Sokol <chico.sokol@gmail.com> writes:

> Hello,
>
> I'm building a library to manipulate git repositories (interacting
> directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing
> the contents of a commit object file we got the following output:

Who wrote this commit object you are trying to read?  Us, or your
library (this question is to see if you are chasing the right
problem)?

> commit 191
> author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
> committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>
> first commit
>
> We hoped to get the same output as "git cat-file -p <sha1>", but
> that didn't happen. Given a commit object, how can I find the hash
> of its tree object?

If you care about the byte-for-byte compatibility, never use
"cat-file -p".  That is meant for human consumption.

"git cat-file commit <sha1>" gives you the raw representation after
inflating and stripping out the leading "<type> SP <length> NUL" header.
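Splitting that header off by hand is short once the inflated object is treated as bytes rather than lines. The sketch below is hypothetical (the class and record names are made up, and it assumes the object has already been fully inflated into memory). Note that the loose-object header is terminated by a NUL byte, not a newline, so a line-oriented reader sees the header and the first content line (here, the `tree` line) as a single line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; splits "<type> SP <length> NUL <content>".
class LooseHeader {
    record Parsed(String type, int size, byte[] content) {}

    static Parsed parse(byte[] inflated) {
        int sp = indexOf(inflated, (byte) ' ', 0);
        if (sp < 0) throw new IllegalArgumentException("no space in header");
        int nul = indexOf(inflated, (byte) 0, sp + 1);
        if (nul < 0) throw new IllegalArgumentException("no NUL after header");
        String type = new String(inflated, 0, sp, StandardCharsets.US_ASCII);
        // The length is a base-10 ASCII string; parseInt rejects garbage.
        int size = Integer.parseInt(new String(inflated, sp + 1, nul - sp - 1, StandardCharsets.US_ASCII));
        byte[] content = Arrays.copyOfRange(inflated, nul + 1, inflated.length);
        if (content.length != size) {
            throw new IllegalArgumentException("header says " + size + " bytes, found " + content.length);
        }
        return new Parsed(type, size, content);
    }

    private static int indexOf(byte[] data, byte value, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == value) return i;
        }
        return -1;
    }
}
```

Checking the declared length against the bytes that follow is a cheap way to catch a reader that silently dropped or altered data.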

* Re: Reading commit objects
  2013-05-21 22:18   ` Chico Sokol
@ 2013-05-21 22:22     ` Junio C Hamano
  2013-05-21 22:33       ` Chico Sokol
  2013-05-22  4:51     ` java zlib woes (was: Reading commit objects) Andreas Krey
  2013-05-22  5:59     ` Reading commit objects Shawn Pearce
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2013-05-21 22:22 UTC
  To: Chico Sokol; +Cc: John Szakmeister, git

Chico Sokol <chico.sokol@gmail.com> writes:

> OK, we discovered that the commit object does contain the tree
> object's SHA-1, by reading its contents with Python's zlib library.
>
> So the bug must be in our Java code (we're building a Java library).

Why aren't you using jgit?

* Re: Reading commit objects
  2013-05-21 22:22     ` Junio C Hamano
@ 2013-05-21 22:33       ` Chico Sokol
  2013-05-21 23:34         ` Jonathan Nieder
  2013-05-22  5:54         ` Shawn Pearce
  0 siblings, 2 replies; 19+ messages in thread
From: Chico Sokol @ 2013-05-21 22:33 UTC
  To: Junio C Hamano; +Cc: John Szakmeister, git

It was git that created that object.

We're trying to build an improved Java library focused on our needs
(JGit has a really confusing API focused on solving EGit's needs). But
we're about to dig into their code to discover how to decompress git
objects.


--
Chico Sokol


* Re: Reading commit objects
  2013-05-21 22:33       ` Chico Sokol
@ 2013-05-21 23:34         ` Jonathan Nieder
  2013-05-22  5:54         ` Shawn Pearce
  1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2013-05-21 23:34 UTC
  To: Chico Sokol; +Cc: Junio C Hamano, John Szakmeister, git

Chico Sokol wrote:

> We're trying to build an improved Java library focused on our needs
> (JGit has a really confusing API focused on solving EGit's needs).

JGit is also open to contributions, including contributions that
add less confusing API calls. :)  See

 http://wiki.eclipse.org/JGit/User_Guide
 http://wiki.eclipse.org/EGit/Contributor_Guide#JGit
 http://wiki.eclipse.org/EGit/Contributor_Guide#Using_Gerrit_at_https:.2F.2Fgit.eclipse.org.2Fr
 https://dev.eclipse.org/mailman/listinfo/jgit-dev

Thanks,
Jonathan

* java zlib woes (was: Reading commit objects)
  2013-05-21 22:18   ` Chico Sokol
  2013-05-21 22:22     ` Junio C Hamano
@ 2013-05-22  4:51     ` Andreas Krey
  2013-05-22  5:56       ` Shawn Pearce
  2013-05-22  5:59     ` Reading commit objects Shawn Pearce
  2 siblings, 1 reply; 19+ messages in thread
From: Andreas Krey @ 2013-05-22  4:51 UTC
  To: Chico Sokol; +Cc: John Szakmeister, git

On Tue, 21 May 2013 19:18:35 +0000, Chico Sokol wrote:
> OK, we discovered that the commit object does contain the tree
> object's SHA-1, by reading its contents with Python's zlib library.
>
> So the bug must be in our Java code (we're building a Java library).

That's interesting. We ran into a similar problem: we had a fetch
with JGit hanging within the zlib deflater code in what looked
like a busy loop. Unfortunately we don't yet have a publishable
repo on which this happens.

Andreas

* Re: Reading commit objects
  2013-05-21 22:33       ` Chico Sokol
  2013-05-21 23:34         ` Jonathan Nieder
@ 2013-05-22  5:54         ` Shawn Pearce
  1 sibling, 0 replies; 19+ messages in thread
From: Shawn Pearce @ 2013-05-22  5:54 UTC
  To: Chico Sokol; +Cc: Junio C Hamano, John Szakmeister, git

On Tue, May 21, 2013 at 3:33 PM, Chico Sokol <chico.sokol@gmail.com> wrote:
> It was git that created that object.
>
> We're trying to build an improved Java library focused on our needs
> (JGit has a really confusing API focused on solving EGit's needs).

JGit code... is confusing because it's fast. We spent a lot of time
trying to make things fast on the JVM, and somewhat comparable with C
Git even though it's not in C. Some of the low-level APIs are fast
because they bypass conventional Java wisdom and just tell the #@!*
machine what to do, with no pretty bits about it. Make it pretty, it
goes slower. Or uses more RAM. Java likes RAM.

Good luck making an improved library. JGit of course is also
interested in contributions. The api package has been trying to make a
simpler calling convention for common use cases that match the command
line interface users are familiar with, but it's still incomplete and
hides some optimizations that are possible with the lower-level calls.

* Re: java zlib woes (was: Reading commit objects)
  2013-05-22  4:51     ` java zlib woes (was: Reading commit objects) Andreas Krey
@ 2013-05-22  5:56       ` Shawn Pearce
  2013-05-27  4:11         ` Andreas Krey
  0 siblings, 1 reply; 19+ messages in thread
From: Shawn Pearce @ 2013-05-22  5:56 UTC
  To: Andreas Krey; +Cc: Chico Sokol, John Szakmeister, git

On Tue, May 21, 2013 at 9:51 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Tue, 21 May 2013 19:18:35 +0000, Chico Sokol wrote:
>> OK, we discovered that the commit object does contain the tree
>> object's SHA-1, by reading its contents with Python's zlib library.
>>
>> So the bug must be in our Java code (we're building a Java library).
>
> That's interesting. We ran into a similar problem: we had a fetch
> with JGit hanging within the zlib deflater code in what looked
> like a busy loop. Unfortunately we don't yet have a publishable
> repo on which this happens.

This was with JGit?  A stack trace and JGit version (so we can
correlate line numbers) would be a much more useful bug report than
nothing at all.

* Re: Reading commit objects
  2013-05-21 22:18   ` Chico Sokol
  2013-05-21 22:22     ` Junio C Hamano
  2013-05-22  4:51     ` java zlib woes (was: Reading commit objects) Andreas Krey
@ 2013-05-22  5:59     ` Shawn Pearce
  2013-05-22 14:20       ` Chico Sokol
  2013-05-22 14:25       ` Chico Sokol
  2 siblings, 2 replies; 19+ messages in thread
From: Shawn Pearce @ 2013-05-22  5:59 UTC
  To: Chico Sokol; +Cc: John Szakmeister, git

On Tue, May 21, 2013 at 3:18 PM, Chico Sokol <chico.sokol@gmail.com> wrote:
> OK, we discovered that the commit object does contain the tree
> object's SHA-1, by reading its contents with Python's zlib library.
>
> So the bug must be in our Java code (we're building a Java library).
>
> Is there anything non-standard about git's zlib compression? We're
> decompressing its contents with Java's default zlib API, so it should
> work normally. Here's our code, which prints that wrong output:
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.util.zip.InflaterInputStream;
> import org.apache.commons.io.IOUtils;
> ...
> File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
> InflaterInputStream inflaterInputStream = new InflaterInputStream(new
> FileInputStream(obj));
> System.out.println(IOUtils.readLines(inflaterInputStream));
...
>>> Currently, we're trying to parse commit objects. After decompressing
>>> the contents of a commit object file we got the following output:
>>>
>>> commit 191
>>> author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>>> committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>>>
>>> first commit

Your code is broken. IOUtils is probably corrupting what you get back.
After inflating the stream you should see the object type ("commit"),
space, its length in bytes as a base 10 string, and then a NUL ('\0').
Following that is the tree line, and parent(s) if any. I wonder if
IOUtils discarded the remainder of the line after the NUL and did not
consider the tree line.
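Once the header has been stripped, the tree id that the original question asked about is simply the value of the `tree` header line, and the parents are the `parent` lines that follow it. A minimal sketch, assuming the body has already been separated from the `commit <length>` header; the class and method names are hypothetical, and continuation lines of multi-line headers (which start with a space) are simply never matched:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading ids out of a commit body.
class CommitHeaders {
    /** Returns the value of the first "tree" header. */
    static String treeId(byte[] body) {
        for (String line : headerLines(body)) {
            if (line.startsWith("tree ")) return line.substring("tree ".length());
        }
        throw new IllegalArgumentException("commit has no tree header");
    }

    /** Returns all "parent" ids in order; empty for a root commit. */
    static List<String> parentIds(byte[] body) {
        List<String> parents = new ArrayList<>();
        for (String line : headerLines(body)) {
            if (line.startsWith("parent ")) parents.add(line.substring("parent ".length()));
        }
        return parents;
    }

    /** Header lines end at the first empty line; the message follows it. */
    private static List<String> headerLines(byte[] body) {
        // ISO-8859-1 maps every byte to one char, so no byte is lost while scanning.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) end = text.length();
            String line = text.substring(start, end);
            if (line.isEmpty()) break; // blank line separates headers from message
            lines.add(line);
            start = end + 1;
        }
        return lines;
    }
}
```

Applied to the commit John inflated earlier in the thread, `treeId` would return `2abfe1a7bedb29672a223a5c5f266b7dc70a8d87`.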

And you wonder why JGit code is confusing. We can't rely on "standard
Java APIs" to do the right thing, because commonly used libraries have
made assumptions that disagree with the way Git works.

* Re: Reading commit objects
  2013-05-22  5:59     ` Reading commit objects Shawn Pearce
@ 2013-05-22 14:20       ` Chico Sokol
  2013-05-22 20:02         ` Shawn Pearce
  2013-05-22 14:25       ` Chico Sokol
  1 sibling, 1 reply; 19+ messages in thread
From: Chico Sokol @ 2013-05-22 14:20 UTC
  To: Shawn Pearce; +Cc: John Szakmeister, git

I'm not criticizing JGit, guys. It simply doesn't fit our needs.
We're not interested in mapping git commands to Java, and we don't have
the same RAM limitations.

I know the JGit team is doing a great job, and we do not intend to build a
library with such completeness.

Are you guys JGit contributors? Can you point me to the
code that unpacks git objects? The closest I could get was this class:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seem to be a standard and a non-standard format for the unpacked
object, judging by the comments of this method:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that the Java API's default Inflater class expects the
object to be in the standard format.

What does the following comment mean? What's the "Experimental pack-based"
format? Are there any docs on its spec?

We must determine if the buffer contains the standard
zlib-deflated stream or the experimental format based
on the in-pack object format. Compare the header byte
for each format:
RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
Experimental pack-based : Stttssss : ttt = 1,2,3,4


--
Chico Sokol


* Re: Reading commit objects
  2013-05-22  5:59     ` Reading commit objects Shawn Pearce
  2013-05-22 14:20       ` Chico Sokol
@ 2013-05-22 14:25       ` Chico Sokol
  2013-05-22 14:47         ` Chico Sokol
  2013-05-22 19:59         ` Shawn Pearce
  1 sibling, 2 replies; 19+ messages in thread
From: Chico Sokol @ 2013-05-22 14:25 UTC
  To: Shawn Pearce; +Cc: John Szakmeister, git

> Your code is broken. IOUtils is probably corrupting what you get back.
> After inflating the stream you should see the object type ("commit"),
> space, its length in bytes as a base 10 string, and then a NUL ('\0').
> Following that is the tree line, and parent(s) if any. I wonder if
> IOUtils discarded the remainder of the line after the NUL and did not
> consider the tree line.


Maybe you're right, Shawn. I've also tried the following code:

File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(dotGit));
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));

But we got the same result. I'll try to read the bytes myself
(without Apache IOUtils). Are the contents of an unpacked object UTF-8
encoded?

* Re: Reading commit objects
  2013-05-22 14:25       ` Chico Sokol
@ 2013-05-22 14:47         ` Chico Sokol
  2013-05-22 19:59         ` Shawn Pearce
  1 sibling, 0 replies; 19+ messages in thread
From: Chico Sokol @ 2013-05-22 14:47 UTC
  To: Shawn Pearce; +Cc: John Szakmeister, git

Solved! It was exactly the problem Shawn pointed out.

Here is the working code:

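import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.util.zip.InflaterInputStream;
import org.apache.commons.io.IOUtils;
...
File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream =
    new InflaterInputStream(new FileInputStream(dotGit));
int read = inflaterInputStream.read();
while (read != 0) { // print and skip the 'commit <length>\0' header
    if (read == -1) throw new EOFException("no NUL after the object header");
    System.out.print((char) read);
    read = inflaterInputStream.read();
}
System.out.println();
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));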

Thank you all!



--
Chico Sokol


* Re: Reading commit objects
  2013-05-22 14:25       ` Chico Sokol
  2013-05-22 14:47         ` Chico Sokol
@ 2013-05-22 19:59         ` Shawn Pearce
  1 sibling, 0 replies; 19+ messages in thread
From: Shawn Pearce @ 2013-05-22 19:59 UTC
  To: Chico Sokol; +Cc: John Szakmeister, git

On Wed, May 22, 2013 at 7:25 AM, Chico Sokol <chico.sokol@gmail.com> wrote:
>> Your code is broken. IOUtils is probably corrupting what you get back.
>> After inflating the stream you should see the object type ("commit"),
>> space, its length in bytes as a base 10 string, and then a NUL ('\0').
>> Following that is the tree line, and parent(s) if any. I wonder if
>> IOUtils discarded the remainder of the line after the NUL and did not
>> consider the tree line.
>
...
> Are the contents of an unpacked object UTF-8
> encoded?

It's more complicated than that. Commit objects are usually in UTF-8,
unless a repository configuration setting told you otherwise, or an
encoding header appears in the commit. And sometimes that data lies
anyway. ISO-8859-1 is one of the safer charsets for reading a commit,
but that also isn't always accurate.
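A decoding policy along those lines could be sketched as follows. It is an assumption-laden illustration, not code taken from Git or JGit: it honors an `encoding` header if one is present, otherwise tries UTF-8 strictly, and falls back to ISO-8859-1 (which maps every byte to a character) when the bytes do not decode. It ignores any repository configuration setting, and, as noted above, even a declared encoding can be wrong. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical policy: "encoding" header, else UTF-8, else ISO-8859-1 fallback.
// Repository configuration settings are deliberately ignored here.
class CommitText {
    static String decode(byte[] body) {
        Charset cs = declaredEncoding(body);
        try {
            return cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            // The bytes lied about their charset. ISO-8859-1 maps every byte
            // to a character, so this never fails, though it may show mojibake.
            return new String(body, StandardCharsets.ISO_8859_1);
        }
    }

    /** Looks for an "encoding <name>" header before the first blank line. */
    static Charset declaredEncoding(byte[] body) {
        String text = new String(body, StandardCharsets.ISO_8859_1);
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) break; // end of headers
            if (line.startsWith("encoding ")) {
                try {
                    return Charset.forName(line.substring("encoding ".length()).trim());
                } catch (IllegalArgumentException e) {
                    break; // unknown or illegal charset name: use the UTF-8 default
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Reporting malformed input instead of silently substituting replacement characters is what makes the fallback possible; the default `new String(bytes, UTF_8)` would never signal a mismatch.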

* Re: Reading commit objects
  2013-05-22 14:20       ` Chico Sokol
@ 2013-05-22 20:02         ` Shawn Pearce
  0 siblings, 0 replies; 19+ messages in thread
From: Shawn Pearce @ 2013-05-22 20:02 UTC
  To: Chico Sokol; +Cc: John Szakmeister, git

On Wed, May 22, 2013 at 7:20 AM, Chico Sokol <chico.sokol@gmail.com> wrote:
> I'm not criticizing JGit, guys. It simply doesn't fit our needs.
> We're not interested in mapping git commands to Java, and we don't have
> the same RAM limitations.

I guess you aren't trying to process the WebKit or Linux kernel
repositories. Or you can afford more RAM than I can[1]. :-)

[1] $DAY_JOB has lots of RAM.  Lots.

> Are you guys JGit contributors?

Not really. I had nothing to do with JGit.  :-)

> Can you point me to the
> code that unpacks git objects? The closest I could get was this class:
> https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

This class handles the loose object format in $GIT_DIR/objects, but
does not handle objects contained in pack files. That is elsewhere,
and well, more complex. Look at PackFile.java.
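For the loose case, locating the file from an object id is just a matter of splitting the hex name, as in the `.git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21` path used earlier in the thread. A sketch, assuming a SHA-1 repository (40 lowercase hex digits) and hypothetical helper names; if no loose file exists, the object may live in a pack instead and needs the more complex pack-reading code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper for the loose-object layout under $GIT_DIR/objects.
class LoosePath {
    static Path looseObjectPath(Path gitDir, String hexId) {
        if (!hexId.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("expected 40 lowercase hex digits: " + hexId);
        }
        // objects/<first 2 hex digits>/<remaining 38 hex digits>
        return gitDir.resolve("objects").resolve(hexId.substring(0, 2)).resolve(hexId.substring(2));
    }

    /** Empty if no loose file exists; the object may then be in a pack file. */
    static Optional<Path> findLoose(Path gitDir, String hexId) {
        Path p = looseObjectPath(gitDir, hexId);
        return Files.isRegularFile(p) ? Optional.of(p) : Optional.empty();
    }
}
```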

> There seem to be a standard and a non-standard format for the unpacked
> object, judging by the comments of this method:
> https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

There are two formats, the official format that is used, and an
experimental format that was discarded but is still supported for
legacy reasons.

> I suspect that the Java API's default Inflater class expects the
> object to be in the standard format.
>
> What does the following comment mean? What's the "Experimental pack-based"
> format? Are there any docs on its spec?

Read the code. This is the dead format that is no longer written, but
is still supported.
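Going by the bit layout in the JGit comment quoted earlier (`0www1000` for zlib, `Stttssss` for the experimental format), the two loose formats can be told apart from the first two bytes. This sketch is a reading of that comment plus the RFC 1950 header checksum, not JGit's actual method; the enum and method names are hypothetical.

```java
// Hypothetical classifier based on the bit layout in JGit's comment
// plus the RFC 1950 header checksum; not JGit's actual implementation.
class LooseFormat {
    enum Kind { ZLIB, EXPERIMENTAL, UNKNOWN }

    static Kind classify(byte[] hdr) {
        if (hdr.length < 2) return Kind.UNKNOWN;
        int cmf = hdr[0] & 0xff;
        int flg = hdr[1] & 0xff;
        // 0www1000: compression method 8 (deflate), window bits www <= 7.
        // RFC 1950 also requires (CMF * 256 + FLG) to be a multiple of 31.
        if ((cmf & 0x8f) == 0x08 && ((cmf << 8) | flg) % 31 == 0) {
            return Kind.ZLIB;
        }
        // Stttssss: S = more size bytes follow, ttt = type 1..4
        // (commit, tree, blob, tag in pack numbering).
        int type = (cmf >> 4) & 0x7;
        if (type >= 1 && type <= 4) {
            return Kind.EXPERIMENTAL;
        }
        return Kind.UNKNOWN;
    }
}
```

Testing for zlib first matters because some valid zlib first bytes also have a `ttt` value in the 1..4 range; the checksum on the second byte is what disambiguates them. A loose object written by C Git is zlib-compressed and commonly starts with `0x78`.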

* Re: java zlib woes (was: Reading commit objects)
  2013-05-22  5:56       ` Shawn Pearce
@ 2013-05-27  4:11         ` Andreas Krey
  2013-06-04 10:18           ` fetch delta resolution vs. checkout (was: java zlib woes) Andreas Krey
  0 siblings, 1 reply; 19+ messages in thread
From: Andreas Krey @ 2013-05-27  4:11 UTC
  To: Shawn Pearce; +Cc: Chico Sokol, John Szakmeister, git

On Tue, 21 May 2013 22:56:21 +0000, Shawn Pearce wrote:
...
> This was with JGit?  A stack trace and JGit version (so we can
> correlate line numbers) would be a much more useful bug report than
> nothing at all.

I now have a full test case (involving a generated repo just shy of 1GB)
that will reproduce that hang. Will look up the existing jgit bug to
report there.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

* fetch delta resolution vs. checkout (was: java zlib woes)
  2013-05-27  4:11         ` Andreas Krey
@ 2013-06-04 10:18           ` Andreas Krey
  0 siblings, 0 replies; 19+ messages in thread
From: Andreas Krey @ 2013-06-04 10:18 UTC
  To: Shawn Pearce; +Cc: Chico Sokol, John Szakmeister, git

On Mon, 27 May 2013 06:11:46 +0000, Andreas Krey wrote:
...
> 
> I now have a full test case (involving a generated repo just shy of 1GB)
> that will reproduce that hang. Will look up the existing jgit bug to
> report there.

On https://bugs.eclipse.org/bugs/show_bug.cgi?id=394078

A question: The delta decoding. If I understand correctly,
git and jgit do verify the packfile content after fetching/cloning,
and need to resolve any deltified files in the pack.
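For readers wondering what resolving a deltified entry involves: a pack delta is a pair of sizes followed by copy and insert instructions applied to a base object. The sketch below follows the delta encoding described in Git's pack-format documentation; the class name is hypothetical, only a few sanity checks are included, and it is meant to show what the work consists of, not to diagnose the hang described here.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of applying one git pack delta to its base object.
class DeltaApply {
    /** Little-endian base-128 size: 7 bits per byte, high bit = more bytes follow. */
    private static long readSize(byte[] d, int[] pos) {
        long value = 0;
        int shift = 0;
        int b;
        do {
            b = d[pos[0]++] & 0xff;
            value |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    static byte[] apply(byte[] base, byte[] delta) {
        int[] pos = {0};
        long baseSize = readSize(delta, pos);
        long resultSize = readSize(delta, pos);
        if (baseSize != base.length) throw new IllegalArgumentException("base size mismatch");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (pos[0] < delta.length) {
            int cmd = delta[pos[0]++] & 0xff;
            if ((cmd & 0x80) != 0) {
                // Copy from base: bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
                long offset = 0;
                long size = 0;
                for (int i = 0; i < 4; i++) {
                    if ((cmd & (1 << i)) != 0) offset |= (long) (delta[pos[0]++] & 0xff) << (8 * i);
                }
                for (int i = 0; i < 3; i++) {
                    if ((cmd & (0x10 << i)) != 0) size |= (long) (delta[pos[0]++] & 0xff) << (8 * i);
                }
                if (size == 0) size = 0x10000; // a zero size means 64 KiB
                if (offset + size > base.length) throw new IllegalArgumentException("copy out of range");
                out.write(base, (int) offset, (int) size);
            } else if (cmd != 0) {
                // Insert: the next cmd bytes of the delta are literal data.
                if (pos[0] + cmd > delta.length) throw new IllegalArgumentException("truncated insert");
                out.write(delta, pos[0], cmd);
                pos[0] += cmd;
            } else {
                throw new IllegalArgumentException("reserved delta opcode 0");
            }
        }
        byte[] result = out.toByteArray();
        if (result.length != resultSize) throw new IllegalArgumentException("result size mismatch");
        return result;
    }
}
```

A deltified object whose base is itself a delta has to be resolved recursively down the chain, which is one reason reading such objects can cost noticeably more than reading a plain zlib-compressed one.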

And when checking out a commit, does it need to do this again for
the files that are being checked out?

I ask because we now see the packfile fetched fine, but a checkout
then hangs (at 100% CPU) on one of the large files, and on one that,
according to core.bigfilethreshold, should not even be deltified.

(Setting core.bigfilethreshold to 20m in the source repo (C git)
stops JGit from hanging in the fetch/delta resolution phase.
And it doesn't look like JGit repacks the pack file; it uses it
as it was received, plus 20 bytes at the end.)

Andreas
