git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
@ 2021-08-16 14:00 ZheNing Hu
  2021-08-17 14:34 ` Fwd: " ZheNing Hu
  2021-08-17 16:09 ` Christian Couder
  0 siblings, 2 replies; 14+ messages in thread
From: ZheNing Hu @ 2021-08-16 14:00 UTC (permalink / raw)
  To: Git List, Junio C Hamano, Christian Couder, Stefan Beller,
	Hariom verma

Hi,

In the implementation of %(raw) atom
(bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
to copy the content of the object. But if we can reuse the content
of the object?

Since git cat-file --batch needs to use ref-filter
as the backend, if the object buffer can be reused correctly here,
we can save a lot of copies and improve its performance by about 6%.

Tracing back to the source, the object buffer is allocated from
oid_object_info_extended(), but in parse_object_buffer() we may lose
the ownership of the buffer (maybe the buffer is eaten), but I browsed the
source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
buffers which have been eaten are released by the program.

So can we reuse it?

This is what I want to do:

diff --git a/ref-filter.c b/ref-filter.c
index 93ce2a6ef2..1f6c1daabd 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
atom_value *val, int deref, struct exp
                        unsigned long buf_size = data->size;

                        if (atom->u.raw_data.option == RAW_BARE) {
-                               v->s = xmemdupz(buf, buf_size);
+                               v->s = buf;
                                v->s_size = buf_size;
                        } else if (atom->u.raw_data.option == RAW_LENGTH) {
                                v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
@@ -1768,8 +1768,6 @@ static int get_object(struct ref_array_item
*ref, int deref, struct object **obj
        }

        grab_common_values(ref->value, deref, oi);
-       if (!eaten)
-               free(oi->content);
        return 0;
 }

Thanks.
--
ZheNing Hu

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Fwd: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-16 14:00 [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content? ZheNing Hu
@ 2021-08-17 14:34 ` ZheNing Hu
  2021-08-17 16:09 ` Christian Couder
  1 sibling, 0 replies; 14+ messages in thread
From: ZheNing Hu @ 2021-08-17 14:34 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Christian Couder, Hariom verma,
	Ævar Arnfjörð Bjarmason, Stefan Beller

Hi,

Because this email was accidentally sent to everyone’s mail
instead of CC everyone, so this email was re-sent again.

In the implementation of %(raw) atom
(bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
to copy the content of the object. But if we can reuse the content
of the object?

Since git cat-file --batch needs to use ref-filter
as the backend, if the object buffer can be reused correctly here,
we can save a lot of copies and improve its performance by about 6%.

Tracing back to the source, the object buffer is allocated from
oid_object_info_extended(), but in parse_object_buffer() we may lose
the ownership of the buffer (maybe the buffer is eaten), but I browsed the
source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
buffers which have been eaten are released by the program.

So can we reuse it? (Or can we free() it freely?)

This is what I want to do:

diff --git a/ref-filter.c b/ref-filter.c
index 93ce2a6ef2..1f6c1daabd 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
atom_value *val, int deref, struct exp
                        unsigned long buf_size = data->size;

                        if (atom->u.raw_data.option == RAW_BARE) {
-                               v->s = xmemdupz(buf, buf_size);
+                               v->s = buf;
                                v->s_size = buf_size;
                        } else if (atom->u.raw_data.option == RAW_LENGTH) {
                                v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
@@ -1768,8 +1768,6 @@ static int get_object(struct ref_array_item
*ref, int deref, struct object **obj
        }

        grab_common_values(ref->value, deref, oi);
-       if (!eaten)
-               free(oi->content);
        return 0;
 }

Thanks.
--
ZheNing Hu

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-16 14:00 [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content? ZheNing Hu
  2021-08-17 14:34 ` Fwd: " ZheNing Hu
@ 2021-08-17 16:09 ` Christian Couder
  2021-08-18  4:51   ` ZheNing Hu
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Couder @ 2021-08-17 16:09 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

Hi,

On Mon, Aug 16, 2021 at 4:00 PM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Hi,
>
> In the implementation of %(raw) atom
> (bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
> to copy the content of the object. But if we can reuse the content
> of the object?
>
> Since git cat-file --batch needs to use ref-filter
> as the backend, if the object buffer can be reused correctly here,
> we can save a lot of copies and improve its performance by about 6%.

Yeah, that would be great.

> Tracing back to the source, the object buffer is allocated from
> oid_object_info_extended(), but in parse_object_buffer() we may lose
> the ownership of the buffer (maybe the buffer is eaten), but I browsed the
> source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
> buffers which have been eaten are released by the program.
>
> So can we reuse it?
>
> This is what I want to do:
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 93ce2a6ef2..1f6c1daabd 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
> atom_value *val, int deref, struct exp
>                         unsigned long buf_size = data->size;
>
>                         if (atom->u.raw_data.option == RAW_BARE) {
> -                               v->s = xmemdupz(buf, buf_size);
> +                               v->s = buf;

It seems to me that this could work only if 'buf' isn't freed. Have
you checked that? Did we leak 'buf' before this patch? Otherwise when
are we freeing it?

>                                 v->s_size = buf_size;
>                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
>                                 v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
> @@ -1768,8 +1768,6 @@ static int get_object(struct ref_array_item
> *ref, int deref, struct object **obj
>         }
>
>         grab_common_values(ref->value, deref, oi);
> -       if (!eaten)
> -               free(oi->content);

This change might not be needed. 'oi->content' seems to come from the
'buf' above, either directly or through a copy, but if we can indeed
own the 'buf' completely, then we should be able to dispose of it how
we want.

>         return 0;
>  }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-17 16:09 ` Christian Couder
@ 2021-08-18  4:51   ` ZheNing Hu
  2021-08-18  8:53     ` Christian Couder
  0 siblings, 1 reply; 14+ messages in thread
From: ZheNing Hu @ 2021-08-18  4:51 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

Christian Couder <christian.couder@gmail.com> 于2021年8月18日周三 上午12:10写道:
>
> Hi,
>
> On Mon, Aug 16, 2021 at 4:00 PM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Hi,
> >
> > In the implementation of %(raw) atom
> > (bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
> > to copy the content of the object. But if we can reuse the content
> > of the object?
> >
> > Since git cat-file --batch needs to use ref-filter
> > as the backend, if the object buffer can be reused correctly here,
> > we can save a lot of copies and improve its performance by about 6%.
>
> Yeah, that would be great.
>
> > Tracing back to the source, the object buffer is allocated from
> > oid_object_info_extended(), but in parse_object_buffer() we may lose
> > the ownership of the buffer (maybe the buffer is eaten), but I browsed the
> > source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
> > buffers which have been eaten are released by the program.
> >
> > So can we reuse it?
> >
> > This is what I want to do:
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 93ce2a6ef2..1f6c1daabd 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
> > atom_value *val, int deref, struct exp
> >                         unsigned long buf_size = data->size;
> >
> >                         if (atom->u.raw_data.option == RAW_BARE) {
> > -                               v->s = xmemdupz(buf, buf_size);
> > +                               v->s = buf;
>
> It seems to me that this could work only if 'buf' isn't freed. Have
> you checked that? Did we leak 'buf' before this patch? Otherwise when
> are we freeing it?
>
This is how I use gdb find out if this buffer have been freed:

Breakpoint 1, get_object (ref=0x5555559ad7a0, deref=1,
obj=0x7fffffffc930, oi=0x5555559a21e0 <oi_deref>, err=0x7fffffffcb80)
at ref-filter.c:1744
1744    {
(gdb) n
1746            int eaten = 1;
(gdb)
1747            if (oi->info.contentp) {
(gdb)
1749                    oi->info.sizep = &oi->size;
(gdb)
1750                    oi->info.typep = &oi->type;
(gdb)
1752            if (oid_object_info_extended(the_repository, &oi->oid,
&oi->info,
(gdb)
1756            if (oi->info.disk_sizep && oi->disk_size < 0)
(gdb)
1759            if (oi->info.contentp) {
(gdb) p oi->info.contentp
$1 = (void **) 0x5555559a2240 <oi_deref+96>
(gdb) p oi->content
$2 = (void *) 0x5555559afc90
(gdb) b free if $rdi == 0x5555559afc90
Breakpoint 2 at 0x7ffff7e1f980
(gdb) d 1
(gdb) c
Continuing.
tree 2025c6f1d02348585487ab5597d77151f158325f
parent 532c266f73e11df407a8939135f2722c003b7539
author ZheNing Hu <adlternative@gmail.com> 1628909038 +0800
committer ZheNing Hu <adlternative@gmail.com> 1628909038 +0800

test2

tree 2025c6f1d02348585487ab5597d77151f158325f
parent 532c266f73e11df407a8939135f2722c003b7539
author ZheNing Hu <adlternative@gmail.com> 1628909038 +0800
committer ZheNing Hu <adlternative@gmail.com> 1628909038 +0800

test2

[Inferior 1 (process 74283) exited normally]

We didn't stop at the breakpoint, this means that we did not free() the memory.
Or will this be used as a caching mechanism? I dunno...

> >                                 v->s_size = buf_size;
> >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> >                                 v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
> > @@ -1768,8 +1768,6 @@ static int get_object(struct ref_array_item
> > *ref, int deref, struct object **obj
> >         }
> >
> >         grab_common_values(ref->value, deref, oi);
> > -       if (!eaten)
> > -               free(oi->content);
>
> This change might not be needed. 'oi->content' seems to come from the
> 'buf' above, either directly or through a copy, but if we can indeed
> own the 'buf' completely, then we should be able to dispose of it how
> we want.
>

Since we will release the `v->s` (now is the buf) in free_array_item(), if we
continue to call free() in get_object(), it may cause double-free.

> >         return 0;
> >  }

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-18  4:51   ` ZheNing Hu
@ 2021-08-18  8:53     ` Christian Couder
  2021-08-18  9:07       ` ZheNing Hu
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Couder @ 2021-08-18  8:53 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

On Wed, Aug 18, 2021 at 6:51 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> 于2021年8月18日周三 上午12:10写道:
> >
> > Hi,
> >
> > On Mon, Aug 16, 2021 at 4:00 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > In the implementation of %(raw) atom
> > > (bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
> > > to copy the content of the object. But if we can reuse the content
> > > of the object?
> > >
> > > Since git cat-file --batch needs to use ref-filter
> > > as the backend, if the object buffer can be reused correctly here,
> > > we can save a lot of copies and improve its performance by about 6%.
> >
> > Yeah, that would be great.
> >
> > > Tracing back to the source, the object buffer is allocated from
> > > oid_object_info_extended(), but in parse_object_buffer() we may lose
> > > the ownership of the buffer (maybe the buffer is eaten), but I browsed the
> > > source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
> > > buffers which have been eaten are released by the program.
> > >
> > > So can we reuse it?
> > >
> > > This is what I want to do:
> > >
> > > diff --git a/ref-filter.c b/ref-filter.c
> > > index 93ce2a6ef2..1f6c1daabd 100644
> > > --- a/ref-filter.c
> > > +++ b/ref-filter.c
> > > @@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
> > > atom_value *val, int deref, struct exp
> > >                         unsigned long buf_size = data->size;
> > >
> > >                         if (atom->u.raw_data.option == RAW_BARE) {
> > > -                               v->s = xmemdupz(buf, buf_size);
> > > +                               v->s = buf;
> >
> > It seems to me that this could work only if 'buf' isn't freed. Have
> > you checked that? Did we leak 'buf' before this patch? Otherwise when
> > are we freeing it?
> >
> This is how I use gdb find out if this buffer have been freed:

I was asking about 'buf' before the patch.

Before the patch, we were doing:

v->s = xmemdupz(buf, buf_size);

which means that in v->s there is a copy of 'buf', not 'buf' itself.
So I was wondering if 'buf' was freed then, not the copy in in v->s.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-18  8:53     ` Christian Couder
@ 2021-08-18  9:07       ` ZheNing Hu
  2021-08-18 11:11         ` ZheNing Hu
  0 siblings, 1 reply; 14+ messages in thread
From: ZheNing Hu @ 2021-08-18  9:07 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

Christian Couder <christian.couder@gmail.com> 于2021年8月18日周三 下午4:54写道:
>
> On Wed, Aug 18, 2021 at 6:51 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Christian Couder <christian.couder@gmail.com> 于2021年8月18日周三 上午12:10写道:
> > >
> > > Hi,
> > >
> > > On Mon, Aug 16, 2021 at 4:00 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > In the implementation of %(raw) atom
> > > > (bd0708c7 ref-filter: add %(raw) atom), we use xmemdupz()
> > > > to copy the content of the object. But if we can reuse the content
> > > > of the object?
> > > >
> > > > Since git cat-file --batch needs to use ref-filter
> > > > as the backend, if the object buffer can be reused correctly here,
> > > > we can save a lot of copies and improve its performance by about 6%.
> > >
> > > Yeah, that would be great.
> > >
> > > > Tracing back to the source, the object buffer is allocated from
> > > > oid_object_info_extended(), but in parse_object_buffer() we may lose
> > > > the ownership of the buffer (maybe the buffer is eaten), but I browsed the
> > > > source code of for-each-ref.c or cat-file.c, and I don’t seem to find that the
> > > > buffers which have been eaten are released by the program.
> > > >
> > > > So can we reuse it?
> > > >
> > > > This is what I want to do:
> > > >
> > > > diff --git a/ref-filter.c b/ref-filter.c
> > > > index 93ce2a6ef2..1f6c1daabd 100644
> > > > --- a/ref-filter.c
> > > > +++ b/ref-filter.c
> > > > @@ -1443,7 +1443,7 @@ static void grab_sub_body_contents(struct
> > > > atom_value *val, int deref, struct exp
> > > >                         unsigned long buf_size = data->size;
> > > >
> > > >                         if (atom->u.raw_data.option == RAW_BARE) {
> > > > -                               v->s = xmemdupz(buf, buf_size);
> > > > +                               v->s = buf;
> > >
> > > It seems to me that this could work only if 'buf' isn't freed. Have
> > > you checked that? Did we leak 'buf' before this patch? Otherwise when
> > > are we freeing it?
> > >
> > This is how I use gdb find out if this buffer have been freed:
>
> I was asking about 'buf' before the patch.
>
> Before the patch, we were doing:
>
> v->s = xmemdupz(buf, buf_size);
>
> which means that in v->s there is a copy of 'buf', not 'buf' itself.
> So I was wondering if 'buf' was freed then, not the copy in in v->s.

I think the 'buf' is not freed in ref-filter logic.

But one thing worth noting is:

parse_object_buffer() may take this buffer away, and store it in
tree->buffer or use set_commit_buffer() to store it in commit_slab.

So in theory, as long as we don’t use parse_object_buffer(), this
dynamically allocated memory can be used freely for us. If we use
parse_object_buffer() and free the buffer, there seems to be a risk,
but it does not appear so far.

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-18  9:07       ` ZheNing Hu
@ 2021-08-18 11:11         ` ZheNing Hu
  2021-08-19  1:39           ` ZheNing Hu
  2021-08-20 15:58           ` Christian Couder
  0 siblings, 2 replies; 14+ messages in thread
From: ZheNing Hu @ 2021-08-18 11:11 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

ZheNing Hu <adlternative@gmail.com> 于2021年8月18日周三 下午5:07写道:
> I think the 'buf' is not freed in ref-filter logic.
>
> But one thing worth noting is:
>
> parse_object_buffer() may take this buffer away, and store it in
> tree->buffer or use set_commit_buffer() to store it in commit_slab.
>
> So in theory, as long as we don’t use parse_object_buffer(), this
> dynamically allocated memory can be used freely for us. If we use
> parse_object_buffer() and free the buffer, there seems to be a risk,
> but it does not appear so far.
>

When parse_object_buffer() hints that we don’t free the buffer when
eaten == 1, I have a better idea, If the buffer is "eaten", we copy it,
otherwise we use the originally allocated space.

-static void grab_sub_body_contents(struct atom_value *val, int deref,
struct expand_data *data)
+static void grab_sub_body_contents(struct atom_value *val, int deref,
struct expand_data *data, int eaten)
 {
        int i;
        const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
@@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
atom_value *val, int deref, struct exp
                        unsigned long buf_size = data->size;

                        if (atom->u.raw_data.option == RAW_BARE) {
-                               v->s = buf;
+                               if (eaten)
+                                       v->s = xmemdupz(buf, buf_size);
+                               else
+                                       v->s = buf;
                                v->s_size = buf_size;
                        } else if (atom->u.raw_data.option == RAW_LENGTH) {
                                v->s = xstrfmt_len(&v->s_size,
"%"PRIuMAX, (uintmax_t)buf_size);

As parse_object_buffer() does internally: the buffer of commit/tree objects
needs to be copied, but blob/tag not. You said that the number of commits
is generally significantly greater than the others. It seems that we cannot
make full use of this idea. But remember the "skip_parse_buffer" before?
If we skip the parse_object_buffer(), this buffer is also "!eaten".

In other words, those default paths of git cat-file --batch are included in it!
So from the perspective of performance regression, this patch will be
beneficial.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-18 11:11         ` ZheNing Hu
@ 2021-08-19  1:39           ` ZheNing Hu
  2021-08-20 16:13             ` Christian Couder
  2021-08-20 15:58           ` Christian Couder
  1 sibling, 1 reply; 14+ messages in thread
From: ZheNing Hu @ 2021-08-19  1:39 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma,
	Ævar Arnfjörð Bjarmason, Jeff King

Hi, Christian and Hariom,

I want to use this patch series as the temporary final version of GSOC project:

https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic

Due to the branch ref-filter-opt-code-logic or branch
ref-filter-opt-perf patch series
temporarily unable to reflect its optimization to git cat-file
--batch. Therefore, using
branch cat-file-reuse-ref-filter-logic is the most effective now.

This is the final performance regression test result:
Test                                        upstream/master   this
tree
------------------------------------------------------------------------------------
1006.2: cat-file --batch-check              0.06(0.06+0.00)
0.08(0.07+0.00) +33.3%
1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)
0.06(0.06+0.00) +0.0%
1006.4: cat-file --batch                    0.49(0.47+0.02)
0.48(0.47+0.01) -2.0%
1006.5: cat-file --batch with atoms         0.48(0.44+0.03)
0.47(0.46+0.01) -2.1%

git cat-file --batch has a performance improvement of about 2%.
git cat-file --batch-check still has a performance gap of 33.3%.

The performance degradation of git cat-file --batch-check is actually
not very big.

upstream/master (225bc32a98):

$ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
--batch-check --batch-all-objects"
Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
 Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
 Range (min … max):   586.9 ms … 607.9 ms    10 runs

cat-file-reuse-ref-filter-logic (709a0c5c12):

$ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
--batch-check --batch-all-objects"
Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
 Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
 Range (min … max):   596.7 ms … 613.3 ms    10 runs

The execution time of git cat-file --batch-check is only a few
milliseconds away.

But look at the execution time changes of git cat-file --batch:

upstream/master (225bc32a98):

$ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects
>/dev/null
/home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >
 24.61s user 0.30s system 99% cpu 24.908 total

cat-file-reuse-ref-filter-logic (709a0c5c12):

$ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
cat-file --batch --batch-all-objects > /dev/null  25.10s user 0.30s
system 99% cpu 25.417 total

The execution time has been reduced by nearly 0.5 seconds. Intuition
tells me that the performance improvement of git cat-file --batch will be
more important.

In fact, git cat-file origin code directly adds the obtained object data
to the output buffer; But after using ref-filter logic, it needs to copy
the object data to the intermediate data (atom_value), and finally
to the output buffer. At present, we cannot easily eliminate intermediate
data, because git for-each-ref --sort has a lot of dependence on it,
but we can reduce the overhead of copying or allocating memory as
much as possible.

I had an idea that I didn't implement before: partial data delayed evaluation.
Or to be more specific, waiting until the data is about to be added to
the output
buffer, form specific output content, this may be a way to bypass the
intermediate
data.

To be optimistic, I think this patch can be merged with the current
performance of
git cat-file --batch. Of course, this still needs more suggestions
from reviewers.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-18 11:11         ` ZheNing Hu
  2021-08-19  1:39           ` ZheNing Hu
@ 2021-08-20 15:58           ` Christian Couder
  2021-08-21  2:16             ` ZheNing Hu
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Couder @ 2021-08-20 15:58 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
>
> ZheNing Hu <adlternative@gmail.com> 于2021年8月18日周三 下午5:07写道:
> > I think the 'buf' is not freed in ref-filter logic.
> >
> > But one thing worth noting is:
> >
> > parse_object_buffer() may take this buffer away, and store it in
> > tree->buffer or use set_commit_buffer() to store it in commit_slab.
> >
> > So in theory, as long as we don’t use parse_object_buffer(), this
> > dynamically allocated memory can be used freely for us. If we use
> > parse_object_buffer() and free the buffer, there seems to be a risk,
> > but it does not appear so far.
> >
>
> When parse_object_buffer() hints that we don’t free the buffer when
> eaten == 1, I have a better idea, If the buffer is "eaten", we copy it,
> otherwise we use the originally allocated space.
>
> -static void grab_sub_body_contents(struct atom_value *val, int deref,
> struct expand_data *data)
> +static void grab_sub_body_contents(struct atom_value *val, int deref,
> struct expand_data *data, int eaten)
>  {
>         int i;
>         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> atom_value *val, int deref, struct exp
>                         unsigned long buf_size = data->size;
>
>                         if (atom->u.raw_data.option == RAW_BARE) {
> -                               v->s = buf;
> +                               if (eaten)
> +                                       v->s = xmemdupz(buf, buf_size);

Can the 'buf' already be eaten at this point? I thought that it could
be eaten only through v->s so after this code.

> +                               else
> +                                       v->s = buf;
>                                 v->s_size = buf_size;
>                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
>                                 v->s = xstrfmt_len(&v->s_size,
> "%"PRIuMAX, (uintmax_t)buf_size);
>
> As parse_object_buffer() does internally: the buffer of commit/tree objects
> needs to be copied, but blob/tag not. You said that the number of commits
> is generally significantly greater than the others. It seems that we cannot
> make full use of this idea. But remember the "skip_parse_buffer" before?
> If we skip the parse_object_buffer(), this buffer is also "!eaten".
>
> In other words, those default paths of git cat-file --batch are included in it!
> So from the perspective of performance regression, this patch will be
> beneficial.

Yeah, it seems that we can benefit from something like this, but it'd
be nice if things were clarified a bit more.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-19  1:39           ` ZheNing Hu
@ 2021-08-20 16:13             ` Christian Couder
  2021-08-21  2:36               ` ZheNing Hu
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Couder @ 2021-08-20 16:13 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Git List, Junio C Hamano, Hariom verma,
	Ævar Arnfjörð Bjarmason, Jeff King

Hi ZheNing,

On Thu, Aug 19, 2021 at 3:39 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Hi, Christian and Hariom,
>
> I want to use this patch series as the temporary final version of GSOC project:
>
> https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic

I am still not very happy with the last patch in the series,but it can
be improved later.

> Due to the branch ref-filter-opt-code-logic or branch
> ref-filter-opt-perf patch series
> temporarily unable to reflect its optimization to git cat-file
> --batch. Therefore, using
> branch cat-file-reuse-ref-filter-logic is the most effective now.
>
> This is the final performance regression test result:
> Test                                        upstream/master   this
> tree
> ------------------------------------------------------------------------------------
> 1006.2: cat-file --batch-check              0.06(0.06+0.00)
> 0.08(0.07+0.00) +33.3%
> 1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)
> 0.06(0.06+0.00) +0.0%
> 1006.4: cat-file --batch                    0.49(0.47+0.02)
> 0.48(0.47+0.01) -2.0%
> 1006.5: cat-file --batch with atoms         0.48(0.44+0.03)
> 0.47(0.46+0.01) -2.1%
>
> git cat-file --batch has a performance improvement of about 2%.
> git cat-file --batch-check still has a performance gap of 33.3%.
>
> The performance degradation of git cat-file --batch-check is actually
> not very big.
>
> upstream/master (225bc32a98):
>
> $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
> --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>  Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
>  Range (min … max):   586.9 ms … 607.9 ms    10 runs
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
> --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>  Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
>  Range (min … max):   596.7 ms … 613.3 ms    10 runs
>
> The execution time of git cat-file --batch-check is only a few
> milliseconds away.

Yeah, it looks like less than 1% overhead.

Great work!

> But look at the execution time changes of git cat-file --batch:
>
> upstream/master (225bc32a98):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects
> >/dev/null
> /home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >
>  24.61s user 0.30s system 99% cpu 24.908 total
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> cat-file --batch --batch-all-objects > /dev/null  25.10s user 0.30s
> system 99% cpu 25.417 total
>
> The execution time has been reduced by nearly 0.5 seconds.

It looks like it has increased by 0.5s, not been reduced.

> Intuition
> tells me that the performance improvement of git cat-file --batch will be
> more important.
>
> In fact, git cat-file origin code directly adds the obtained object data
> to the output buffer; But after using ref-filter logic, it needs to copy
> the object data to the intermediate data (atom_value), and finally
> to the output buffer. At present, we cannot easily eliminate intermediate
> data, because git for-each-ref --sort has a lot of dependence on it,
> but we can reduce the overhead of copying or allocating memory as
> much as possible.

Ok.

> I had an idea that I didn't implement before: partial data delayed evaluation.
> Or to be more specific, waiting until the data is about to be added to
> the output
> buffer, form specific output content, this may be a way to bypass the
> intermediate
> data.

Yeah, that might be a good idea.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-20 15:58           ` Christian Couder
@ 2021-08-21  2:16             ` ZheNing Hu
  2021-08-24  7:11               ` Christian Couder
  0 siblings, 1 reply; 14+ messages in thread
From: ZheNing Hu @ 2021-08-21  2:16 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

Christian Couder <christian.couder@gmail.com> 于2021年8月20日周五 下午11:58写道:
>
> On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> 于2021年8月18日周三 下午5:07写道:
> > > I think the 'buf' is not freed in ref-filter logic.
> > >
> > > But one thing worth noting is:
> > >
> > > parse_object_buffer() may take this buffer away, and store it in
> > > tree->buffer or use set_commit_buffer() to store it in commit_slab.
> > >
> > > So in theory, as long as we don’t use parse_object_buffer(), this
> > > dynamically allocated memory can be used freely for us. If we use
> > > parse_object_buffer() and free the buffer, there seems to be a risk,
> > > but it does not appear so far.
> > >
> >
> > When parse_object_buffer() hints that we don’t free the buffer when
> > eaten == 1, I have a better idea, If the buffer is "eaten", we copy it,
> > otherwise we use the originally allocated space.
> >
> > -static void grab_sub_body_contents(struct atom_value *val, int deref,
> > struct expand_data *data)
> > +static void grab_sub_body_contents(struct atom_value *val, int deref,
> > struct expand_data *data, int eaten)
> >  {
> >         int i;
> >         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> > atom_value *val, int deref, struct exp
> >                         unsigned long buf_size = data->size;
> >
> >                         if (atom->u.raw_data.option == RAW_BARE) {
> > -                               v->s = buf;
> > +                               if (eaten)
> > +                                       v->s = xmemdupz(buf, buf_size);
>
> Can the 'buf' already be eaten at this point? I thought that it could
> be eaten only through v->s so after this code.
>

I think where the 'buf' is eaten is parse_object_buffer(). set_commit_buffer()
or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
manually. In grab_sub_body_contents() we can indeed write the address of
'buf' to v->s, but what I worry about is that before we completely write v->s to
the output buffer through append_atom(), will this buf be freed in
somewhere else?

> > +                               else
> > +                                       v->s = buf;
> >                                 v->s_size = buf_size;
> >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> >                                 v->s = xstrfmt_len(&v->s_size,
> > "%"PRIuMAX, (uintmax_t)buf_size);
> >
> > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > needs to be copied, but blob/tag not. You said that the number of commits
> > is generally significantly greater than the others. It seems that we cannot
> > make full use of this idea. But remember the "skip_parse_buffer" before?
> > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> >
> > In other words, those default paths of git cat-file --batch are included in it!
> > So from the perspective of performance regression, this patch will be
> > beneficial.
>
> Yeah, it seems that we can benefit from something like this, but it'd

Only when the data allocated to us is dynamically allocated and we do have
the ownership of it, we can benefit by reducing copies and allocate
memory again.

> be nice if things were clarified a bit more.

OK. In get_object(), eaten means that the oi->content we obtained is cached
by git in parse_object_buffer(). This means that we don't need to free this buf
manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
parse_buffer() when using some atoms, which can avoid some meaningless parsing.
So under this path, our'buf' is not cached. This means we have
ownership of it, so we
can free it freely, and benefit from this.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-20 16:13             ` Christian Couder
@ 2021-08-21  2:36               ` ZheNing Hu
  0 siblings, 0 replies; 14+ messages in thread
From: ZheNing Hu @ 2021-08-21  2:36 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git List, Junio C Hamano, Hariom verma,
	Ævar Arnfjörð Bjarmason, Jeff King

Christian Couder <christian.couder@gmail.com> 于2021年8月21日周六 上午12:13写道:
>
> Hi ZheNing,
>
> On Thu, Aug 19, 2021 at 3:39 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Hi, Christian and Hariom,
> >
> > I want to use this patch series as the temporary final version of GSOC project:
> >
> > https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic
>
> I am still not very happy with the last patch in the series,but it can
> be improved later.
>

To be free to tell me what's not good about it, I can try my best to
improve it. :-)


> > Due to the branch ref-filter-opt-code-logic or branch
> > ref-filter-opt-perf patch series
> > temporarily unable to reflect its optimization to git cat-file
> > --batch. Therefore, using
> > branch cat-file-reuse-ref-filter-logic is the most effective now.
> >
> > This is the final performance regression test result:
> > Test                                        upstream/master   this
> > tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.06(0.06+0.00)
> > 0.08(0.07+0.00) +33.3%
> > 1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)
> > 0.06(0.06+0.00) +0.0%
> > 1006.4: cat-file --batch                    0.49(0.47+0.02)
> > 0.48(0.47+0.01) -2.0%
> > 1006.5: cat-file --batch with atoms         0.48(0.44+0.03)
> > 0.47(0.46+0.01) -2.1%
> >
> > git cat-file --batch has a performance improvement of about 2%.
> > git cat-file --batch-check still has a performance gap of 33.3%.
> >
> > The performance degradation of git cat-file --batch-check is actually
> > not very big.
> >
> > upstream/master (225bc32a98):
> >
> > $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
> > --batch-check --batch-all-objects"
> > Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
> >  Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
> >  Range (min … max):   586.9 ms … 607.9 ms    10 runs
> >
> > cat-file-reuse-ref-filter-logic (709a0c5c12):
> >
> > $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file
> > --batch-check --batch-all-objects"
> > Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
> >  Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
> >  Range (min … max):   596.7 ms … 613.3 ms    10 runs
> >
> > The execution time of git cat-file --batch-check is only a few
> > milliseconds away.
>
> Yeah, it looks like less than 1% overhead.
>
> Great work!
>
> > But look at the execution time changes of git cat-file --batch:
> >
> > upstream/master (225bc32a98):
> >
> > $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects
> > >/dev/null
> > /home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >
> >  24.61s user 0.30s system 99% cpu 24.908 total
> >
> > cat-file-reuse-ref-filter-logic (709a0c5c12):
> >
> > $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> > cat-file --batch --batch-all-objects > /dev/null  25.10s user 0.30s
> > system 99% cpu 25.417 total
> >
> > The execution time has been reduced by nearly 0.5 seconds.
>
> It looks like it has increased by 0.5s, not been reduced.
>

Unfortunately, you are right, it is not faster, but slower.
It seems that the 2% optimization measured by t/perf does not seem to be
so credible? I donno.

Test                                        upstream/master   this
tree
------------------------------------------------------------------------------------
1006.2: cat-file --batch-check              0.07(0.06+0.01)
0.08(0.07+0.01) +14.3%
1006.3: cat-file --batch-check with atoms   0.06(0.05+0.01)
0.07(0.05+0.01) +16.7%
1006.4: cat-file --batch                    0.49(0.46+0.03)
0.48(0.47+0.01) -2.0%
1006.5: cat-file --batch with atoms         0.48(0.45+0.03)
0.47(0.46+0.01) -2.1%

Do we need to focus on the benchmark instead of the sum of the
benchmark plus the
variance? i.e. 1006.4, benchmark are 0.46 and 0.47, From this perspective, the
performance of git cat-file --batch will be worse.

> > Intuition
> > tells me that the performance improvement of git cat-file --batch will be
> > more important.
> >
> > In fact, git cat-file origin code directly adds the obtained object data
> > to the output buffer; But after using ref-filter logic, it needs to copy
> > the object data to the intermediate data (atom_value), and finally
> > to the output buffer. At present, we cannot easily eliminate intermediate
> > data, because git for-each-ref --sort has a lot of dependence on it,
> > but we can reduce the overhead of copying or allocating memory as
> > much as possible.
>
> Ok.
>
> > I had an idea that I didn't implement before: partial data delayed evaluation.
> > Or to be more specific, waiting until the data is about to be added to
> > the output
> > buffer, form specific output content, this may be a way to bypass the
> > intermediate
> > data.
>
> Yeah, that might be a good idea.

I will try to do it.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-21  2:16             ` ZheNing Hu
@ 2021-08-24  7:11               ` Christian Couder
  2021-08-25  8:11                 ` ZheNing Hu
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Couder @ 2021-08-24  7:11 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

On Sat, Aug 21, 2021 at 4:16 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> 于2021年8月20日周五 下午11:58写道:
> >
> > On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@gmail.com> wrote:

> > > -static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data)
> > > +static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data, int eaten)
> > >  {
> > >         int i;
> > >         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> > > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> > > atom_value *val, int deref, struct exp
> > >                         unsigned long buf_size = data->size;
> > >
> > >                         if (atom->u.raw_data.option == RAW_BARE) {
> > > -                               v->s = buf;
> > > +                               if (eaten)
> > > +                                       v->s = xmemdupz(buf, buf_size);
> >
> > Can the 'buf' already be eaten at this point? I thought that it could
> > be eaten only through v->s so after this code.
>
> I think where the 'buf' is eaten is parse_object_buffer().

Yeah, right, and parse_object_buffer() seems to be called by
get_object() which calls grab_values() after that. So the buf can
indeed be eaten at that point.

> set_commit_buffer()
> or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
> manually.

Yeah, but it doesn't mean that we cannot use the buf. So perhaps we
could still use it, even if it's eaten.

> In grab_sub_body_contents() we can indeed write the address of
> 'buf' to v->s, but what I worry about is that before we completely write v->s to
> the output buffer through append_atom(), will this buf be freed in
> somewhere else?

Yeah, that's a good question. But actually it seems that the buf is
freed (at least in get_object()) only when it's not eaten.

> > > +                               else
> > > +                                       v->s = buf;
> > >                                 v->s_size = buf_size;
> > >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> > >                                 v->s = xstrfmt_len(&v->s_size,
> > > "%"PRIuMAX, (uintmax_t)buf_size);
> > >
> > > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > > needs to be copied, but blob/tag not. You said that the number of commits
> > > is generally significantly greater than the others. It seems that we cannot
> > > make full use of this idea. But remember the "skip_parse_buffer" before?
> > > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> > >
> > > In other words, those default paths of git cat-file --batch are included in it!
> > > So from the perspective of performance regression, this patch will be
> > > beneficial.
> >
> > Yeah, it seems that we can benefit from something like this, but it'd
>
> Only when the data allocated to us is dynamically allocated and we do have
> the ownership of it, we can benefit by reducing copies and allocate
> memory again.
>
> > be nice if things were clarified a bit more.
>
> OK. In get_object(), eaten means that the oi->content we obtained is cached
> by git in parse_object_buffer(). This means that we don't need to free this buf
> manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
> parse_buffer() when using some atoms, which can avoid some meaningless parsing.
> So under this path, our'buf' is not cached. This means we have
> ownership of it, so we
> can free it freely, and benefit from this.

In the patch we are not freeing it, we only duplicate it when it's
eaten. So I am not sure that the above explanation are clear enough.
That's the issue I have with this patch, but maybe the commit message
can now be discussed and improved after sending the patch series to
the mailing list instead of in this thread.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
  2021-08-24  7:11               ` Christian Couder
@ 2021-08-25  8:11                 ` ZheNing Hu
  0 siblings, 0 replies; 14+ messages in thread
From: ZheNing Hu @ 2021-08-25  8:11 UTC (permalink / raw)
  To: Christian Couder; +Cc: Git List, Junio C Hamano, Stefan Beller, Hariom verma

Christian Couder <christian.couder@gmail.com> 于2021年8月24日周二 下午3:11写道:
>
> > set_commit_buffer()
> > or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
> > manually.
>
> Yeah, but it doesn't mean that we cannot use the buf. So perhaps we
> could still use it, even if it's eaten.
>

Yes. Just like the following question: How do we know that the eaten 'buf' has
not been free() by some logic in git when we want to use it?

> > In grab_sub_body_contents() we can indeed write the address of
> > 'buf' to v->s, but what I worry about is that before we completely write v->s to
> > the output buffer through append_atom(), will this buf be freed in
> > somewhere else?
>
> Yeah, that's a good question. But actually it seems that the buf is
> freed (at least in get_object()) only when it's not eaten.
>

get_object() will free the 'buf' which has not been eaten, but we can
take use of
it (v->s = buf and we free it later)

> > > > +                               else
> > > > +                                       v->s = buf;
> > > >                                 v->s_size = buf_size;
> > > >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> > > >                                 v->s = xstrfmt_len(&v->s_size,
> > > > "%"PRIuMAX, (uintmax_t)buf_size);
> > > >
> > > > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > > > needs to be copied, but blob/tag not. You said that the number of commits
> > > > is generally significantly greater than the others. It seems that we cannot
> > > > make full use of this idea. But remember the "skip_parse_buffer" before?
> > > > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> > > >
> > > > In other words, those default paths of git cat-file --batch are included in it!
> > > > So from the perspective of performance regression, this patch will be
> > > > beneficial.
> > >
> > > Yeah, it seems that we can benefit from something like this, but it'd
> >
> > Only when the data allocated to us is dynamically allocated and we do have
> > the ownership of it, we can benefit by reducing copies and allocate
> > memory again.
> >
> > > be nice if things were clarified a bit more.
> >
> > OK. In get_object(), eaten means that the oi->content we obtained is cached
> > by git in parse_object_buffer(). This means that we don't need to free this buf
> > manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
> > parse_buffer() when using some atoms, which can avoid some meaningless parsing.
> > So under this path, our'buf' is not cached. This means we have
> > ownership of it, so we
> > can free it freely, and benefit from this.
>
> In the patch we are not freeing it, we only duplicate it when it's
> eaten. So I am not sure that the above explanation are clear enough.
> That's the issue I have with this patch, but maybe the commit message
> can now be discussed and improved after sending the patch series to
> the mailing list instead of in this thread.
>

My thoughts are:

1. those 'buf' which have been eaten, because we don't know when they
will be freed, if we
use it in append_atom(), it may be a little unsafe if it have been
freed, but it is safe to use after
copying it;
2. those  uneaten 'buf', it can be used safely.

Maybe we can guarantee that the buf have not been freed when append_atom()?
I don't have a good grasp.

> Thanks!

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-08-25  8:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-16 14:00 [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content? ZheNing Hu
2021-08-17 14:34 ` Fwd: " ZheNing Hu
2021-08-17 16:09 ` Christian Couder
2021-08-18  4:51   ` ZheNing Hu
2021-08-18  8:53     ` Christian Couder
2021-08-18  9:07       ` ZheNing Hu
2021-08-18 11:11         ` ZheNing Hu
2021-08-19  1:39           ` ZheNing Hu
2021-08-20 16:13             ` Christian Couder
2021-08-21  2:36               ` ZheNing Hu
2021-08-20 15:58           ` Christian Couder
2021-08-21  2:16             ` ZheNing Hu
2021-08-24  7:11               ` Christian Couder
2021-08-25  8:11                 ` ZheNing Hu

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).