git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <derrickstolee@github.com>
To: Abhradeep Chakraborty via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>,
	Kaartic Sivaram <kaartic.sivaraam@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Subject: Re: [PATCH 2/5] roaring.[ch]: apply Git specific changes to the roaring API
Date: Mon, 19 Sep 2022 14:33:51 -0400
Message-ID: <b727c25c-469f-ca56-bbd6-82f82c762523@github.com>
In-Reply-To: <38ec2360f4fbfe65fa2d9f1e9cfb7d4944d1714f.1663609659.git.gitgitgadget@gmail.com>

On 9/19/2022 1:47 PM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> Though the Roaring library is introduced in the previous commit, the
> library cannot be used as is. One reason is that the library doesn't
> support big-endian machines. Besides, Git-specific file-related
> functions do use `hashwrite()` (or similar). So there is a need to
> modify the library.

There are a few refactorings happening in this single patch, so it
might be good to split them out for easier spot-checking from the
reviewer's perspective. I'll try to list the ones I see.
 

>  int32_t array_container_write(const array_container_t *container, char *buf);
> +
> +int array_container_network_write(const array_container_t *container,
> +				  int (*write_fn) (void *, const void *, size_t),
> +				  void *data);

Should we make write_fn a defined type? I'm not sure I've seen this
implicit type within a function declaration before.
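
A named callback type could look like the sketch below. The names
`roaring_write_fn`, `mem_sink` and `write_twice` are hypothetical and
exist only for illustration; the in-memory sink merely stands in for
`hashwrite()` or a similar writer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: one typedef shared by every *_network_write() prototype. */
typedef int (*roaring_write_fn)(void *data, const void *buf, size_t len);

/* Illustrative in-memory sink standing in for hashwrite() or similar. */
struct mem_sink {
	unsigned char bytes[64];
	size_t len;
};

/* Matches roaring_write_fn: appends to the sink, returns -1 when it is full. */
static int mem_sink_write(void *data, const void *buf, size_t len)
{
	struct mem_sink *sink = data;

	if (len > sizeof(sink->bytes) - sink->len)
		return -1;
	memcpy(sink->bytes + sink->len, buf, len);
	sink->len += len;
	return 0;
}

/* A prototype using the typedef is shorter than the inline pointer syntax. */
static int write_twice(roaring_write_fn write_fn, void *data,
		       const void *buf, size_t len)
{
	if (write_fn(data, buf, len) < 0)
		return -1;
	return write_fn(data, buf, len);
}
```

With the typedef in place, each prototype that currently spells out
`int (*write_fn) (void *, const void *, size_t)` would only need
`roaring_write_fn write_fn`.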

>  /**
>   * Reads the instance from buf, outputs how many bytes were read.
>   * This is meant to be byte-by-byte compatible with the Java and Go versions of
> @@ -1801,6 +1805,9 @@ int32_t array_container_write(const array_container_t *container, char *buf);
>  int32_t array_container_read(int32_t cardinality, array_container_t *container,
>                               const char *buf);
>  
> +int32_t array_container_network_read(int32_t cardinality, array_container_t *container,
> +                        	     const char *buf);
> +

Both of these functions are creating new implementations instead
of modifying the existing implementations. Is there any reason
why we should keep both of these in perpetuity? They are likely
to drift if we do that.

> +static int container_network_write(const container_t *c, uint8_t typecode,
> +				   int (*write_fn) (void *, const void *, size_t),
> +				   void *data)
> +{
> +	c = container_unwrap_shared(c, &typecode);
> +	switch (typecode) {
> +		case BITSET_CONTAINER_TYPE:
> +			return bitset_container_network_write(const_CAST_bitset(c), write_fn, data);
> +		case ARRAY_CONTAINER_TYPE:
> +			return array_container_network_write(const_CAST_array(c), write_fn, data);
> +		case RUN_CONTAINER_TYPE:
> +			return run_container_network_write(const_CAST_run(c), write_fn, data);
> +	}
> +	assert(false);
> +	__builtin_unreachable();
> +	return 0;
> +}
> +

This is similarly a copy of an existing function. Instead, we
should probably make all writers and readers expect network byte
order (for all multi-byte integers).

> +static size_t ra_portable_network_size_in_bytes(const roaring_array_t *ra)
> +{
> +	size_t count = ra_portable_network_header_size(ra);
> +
> +	for (int32_t k = 0; k < ra->size; ++k)

We have not loosened the restriction on declaring iterator variables
inside the for statement, so this declaration needs to move to the
outer block. One possible refactoring would be to move these
declarations everywhere within roaring.c.
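
As an illustration, the loop quoted above could be written with the
iterator declared at the top of the block. `struct toy_array` and
`toy_size_in_bytes()` are simplified stand-ins, not the real
`roaring_array_t` or its size function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for roaring_array_t; only the fields used here. */
struct toy_array {
	int32_t size;
	const size_t *container_sizes;
};

static size_t toy_size_in_bytes(const struct toy_array *ra, size_t header_size)
{
	size_t count = header_size;
	int32_t k; /* declared in the outer block, not in the for statement */

	for (k = 0; k < ra->size; ++k)
		count += ra->container_sizes[k];
	return count;
}
```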

> @@ -8603,16 +8981,16 @@ extern inline void roaring_bitmap_remove_range(roaring_bitmap_t *r, uint64_t min
>  void roaring_bitmap_printf(const roaring_bitmap_t *r) {
>      const roaring_array_t *ra = &r->high_low_container;
>  
> -    printf("{");
> +    fprintf(stderr, "{");
>      for (int i = 0; i < ra->size; ++i) {
>          container_printf_as_uint32_array(ra->containers[i], ra->typecodes[i],
>                                           ((uint32_t)ra->keys[i]) << 16);
>  
>          if (i + 1 < ra->size) {
> -            printf(",");
> +            fprintf(stderr, ",");
>          }
>      }
> -    printf("}");
> +    fprintf(stderr, "}");
>  }

This change is confusing to me. I expect printf() to print to
stdout, and this might be used in a test helper or something. If
you really want this to go somewhere other than stdout, then the
method should be changed to take an arbitrary FILE*.
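
A minimal sketch of that shape, assuming a flat array of values
instead of the real container types. `print_uint32_set()` is a
hypothetical helper, not part of CRoaring.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: prints "{a,b,c}" to whichever stream the caller picks. */
static void print_uint32_set(FILE *out, const uint32_t *values, size_t nr)
{
	size_t i;

	fprintf(out, "{");
	for (i = 0; i < nr; i++) {
		if (i)
			fprintf(out, ",");
		fprintf(out, "%u", (unsigned)values[i]);
	}
	fprintf(out, "}");
}
```

A test helper would then pass stdout and a debugging call site would
pass stderr, without the library deciding for either of them.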

> +void roaring_bitmap_free_safe(roaring_bitmap_t **r)
> +{
> +	if (*r) {
> +		roaring_bitmap_free((const roaring_bitmap_t *)*r);
> +		r = NULL;

I think you want "*r = NULL" here, if you are intending to free
and NULL the given address.

This method seems separate from the network-byte-order changes.
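
To illustrate the "*r = NULL" point: the sketch below uses a stand-in
struct and plain free() in place of `roaring_bitmap_t` and
`roaring_bitmap_free()`, so the names are assumptions made only to
keep it self-contained.

```c
#include <stdlib.h>

/* Stand-in for roaring_bitmap_t so the sketch compiles on its own. */
struct toy_bitmap {
	int unused;
};

static void toy_bitmap_free_safe(struct toy_bitmap **r)
{
	if (!r || !*r)
		return;
	free(*r);
	/* Clears the caller's pointer; "r = NULL" would only change the local copy. */
	*r = NULL;
}
```

Because the caller's pointer is reset, calling the function a second
time on the same variable is a no-op rather than a double free.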

> +	}
> +}
> +
  
> +size_t roaring_bitmap_network_portable_size_in_bytes(const roaring_bitmap_t *r)
> +{
> +	return ra_portable_network_size_in_bytes(&r->high_low_container);
> +}

Does network order change the potential size of the bitmap?

> +roaring_bitmap_t *roaring_bitmap_portable_network_deserialize_safe(const char *buf, size_t maxbytes)
> +{
> +	roaring_bitmap_t *ans =
> +		(roaring_bitmap_t *)roaring_malloc(sizeof(roaring_bitmap_t));
> +	if (ans == NULL) {
> +		return NULL;
> +	}

nit: Lose braces around single-line blocks.

> +	size_t bytesread;
> +	bool is_ok = ra_portable_network_deserialize(&ans->high_low_container, buf, maxbytes, &bytesread);

Declare all variables before your logic. I think this will fail if
you run "make DEVELOPER=1".

> +	if(is_ok) assert(bytesread <= maxbytes);

nit: break lines for if bodies.
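
Putting the three style notes together, the function could be laid out
roughly as follows. This is only a sketch: `struct fake_bitmap` and
`fake_deserialize()` are stand-ins for `roaring_bitmap_t` and
`ra_portable_network_deserialize()`, and the copy-on-write call is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for roaring_bitmap_t. */
struct fake_bitmap {
	unsigned char first_byte;
};

/* Stand-in for ra_portable_network_deserialize(): consumes one byte. */
static bool fake_deserialize(struct fake_bitmap *bm, const char *buf,
			     size_t maxbytes, size_t *bytesread)
{
	if (!maxbytes)
		return false;
	bm->first_byte = (unsigned char)buf[0];
	*bytesread = 1;
	return true;
}

static struct fake_bitmap *fake_deserialize_safe(const char *buf, size_t maxbytes)
{
	/* All declarations come before the first statement. */
	struct fake_bitmap *ans = malloc(sizeof(*ans));
	size_t bytesread = 0;
	bool is_ok;

	if (!ans)
		return NULL; /* no braces around a single-line body */
	is_ok = fake_deserialize(ans, buf, maxbytes, &bytesread);
	if (!is_ok) {
		free(ans);
		return NULL;
	}
	assert(bytesread <= maxbytes); /* the success check sits on its own line */
	return ans;
}
```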

> +	roaring_bitmap_set_copy_on_write(ans, false);
> +	if (!is_ok) {
> +		roaring_free(ans);
> +		return NULL;
> +	}
> +	return ans;
> +}
> +

>  size_t roaring_bitmap_portable_serialize(const roaring_bitmap_t *r,
>                                           char *buf) {
>      return ra_portable_serialize(&r->high_low_container, buf);
>  }
>  
> +int roaring_bitmap_portable_network_serialize(roaring_bitmap_t *rb,
> +				     int (*write_fn) (void *, const void *, size_t),
> +				     void *data)
> +{
> +	return ra_portable_network_serialize(&rb->high_low_container, write_fn, data);
> +}

I'm not sure why these methods are created as wrappers instead of
renaming the base methods.


>  roaring_bitmap_t *roaring_bitmap_deserialize(const void *buf) {
>      const char *bufaschar = (const char *)buf;
>      if (*(const unsigned char *)buf == CROARING_SERIALIZATION_ARRAY_UINT32) {
> @@ -13827,9 +14247,9 @@ void array_container_printf_as_uint32_array(const array_container_t *v,
>      if (v->cardinality == 0) {
>          return;
>      }
> -    printf("%u", v->array[0] + base);
> +    fprintf(stderr, "%u", v->array[0] + base);
>      for (int i = 1; i < v->cardinality; ++i) {
> -        printf(",%u", v->array[i] + base);
> +        fprintf(stderr, ",%u", v->array[i] + base);

Here's another printf to fprintf situation that is unclear to me.

> @@ -15208,13 +15659,13 @@ void run_container_printf_as_uint32_array(const run_container_t *cont,
>      {
>          uint32_t run_start = base + cont->runs[0].value;
>          uint16_t le = cont->runs[0].length;
> -        printf("%u", run_start);
> -        for (uint32_t j = 1; j <= le; ++j) printf(",%u", run_start + j);
> +        fprintf(stderr, "%u", run_start);
> +        for (uint32_t j = 1; j <= le; ++j) fprintf(stderr, ",%u", run_start + j);

Ditto here. I see we are inheriting off-style code from the original.

> +/**
> + * Frees the memory if exists
> + */
> +void roaring_bitmap_free_safe(roaring_bitmap_t **r);

And nullifies the pointer, don't forget!

In general, I think this change would be a lot smaller if you took
the existing implementation and inserted the proper ntohl() and
htonl() conversions. Git will never call the other versions, so
why keep them in the tree? Why require re-checking all of the format
logic here instead of only the places where we write multi-byte
words?
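
As a rough sketch of what I mean, a 32-bit field can be converted at
the point where it is written or read, leaving the rest of the format
logic alone. The helper names are hypothetical, <arpa/inet.h> assumes
a POSIX system, and 16-bit fields would use htons() and ntohs() in the
same way.

```c
#include <arpa/inet.h> /* htonl(), ntohl(); POSIX */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores value at buf in network (big-endian) order. */
static size_t put_be32_sketch(unsigned char *buf, uint32_t value)
{
	uint32_t be = htonl(value);

	memcpy(buf, &be, sizeof(be));
	return sizeof(be);
}

/* Hypothetical helper: loads a network-order value and returns it in host order. */
static uint32_t get_be32_sketch(const unsigned char *buf)
{
	uint32_t be;

	memcpy(&be, buf, sizeof(be));
	return ntohl(be);
}
```

The bytes in the buffer come out the same on little-endian and
big-endian hosts, which is the property the on-disk format needs.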

Thanks,
-Stolee

