git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`
Date: Tue, 2 Apr 2024 23:52:58 -0500	[thread overview]
Message-ID: <hwfkdfilniy46usnc3vnksaphdxboi5bxep4ek7aj2qxfhu332@6ym7dnn35k7z> (raw)
In-Reply-To: <f10882a0840a77f2569cf891374b70d1e84ceb4b.1711519925.git.ps@pks.im>

On 24/03/27 07:37AM, Patrick Steinhardt wrote:
> The table iterator allows the caller to iterate through all records in a
> reftable table. To do so it iterates through all blocks of the desired
> type one by one, where for each block it creates a new block iterator
> and yields all its entries.
> 
> One of the things that is somewhat confusing in this context is who owns
> the block reader that is being used to read the blocks and pass them to
> the block iterator. Intuitively, as the table iterator is responsible
> for iterating through the blocks, one would assume that this iterator is
> also responsible for managing the lifecycle of the reader. And while it
> somewhat is, the block reader is ultimately stored inside of the block
> iterator.
> 
> Refactor the code such that the block reader is instead fully managed by
> the table iterator. Instead of passing the reader to the block iterator,
> we now only end up passing the block data to it. Despite clearing up the
> lifecycle of the reader, it will also allow for better reuse of the
> reader in subsequent patches.
> 
> The following benchmark prints a single matching ref out of 1 million
> refs. Before:
> 
>   HEAP SUMMARY:
>       in use at exit: 13,603 bytes in 125 blocks
>     total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
> 
> After:
> 
>   HEAP SUMMARY:
>       in use at exit: 13,603 bytes in 125 blocks
>     total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
> 
> Note that while there are more allocation and free calls now, the
> overall number of bytes allocated is significantly lower. The number of
> allocations will be reduced significantly by the next patch though.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...
> @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
>  int block_iter_next(struct block_iter *it, struct reftable_record *rec)
>  {
>  	struct string_view in = {
> -		.buf = it->br->block.data + it->next_off,
> -		.len = it->br->block_len - it->next_off,
> +		.buf = (unsigned char *) it->block + it->next_off,

Would it be best to use the `uint8_t *` type instead of `unsigned char *`
to match `string_view.buf`? Not sure if it matters in this case.

> +		.len = it->block_len - it->next_off,
>  	};
...  
> diff --git a/reftable/block.h b/reftable/block.h
> index 601a1e0e89..b41efa5042 100644
> --- a/reftable/block.h
> +++ b/reftable/block.h
> @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
>  void block_reader_release(struct block_reader *br);
>  
>  /* Returns the block type (eg. 'r' for refs) */
> -uint8_t block_reader_type(struct block_reader *r);
> +uint8_t block_reader_type(const struct block_reader *r);
>  
>  /* Decodes the first key in the block */
> -int block_reader_first_key(struct block_reader *br, struct strbuf *key);
> +int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
>  
>  /* Iterate over entries in a block */
>  struct block_iter {
>  	/* offset within the block of the next entry to read. */
>  	uint32_t next_off;
> -	struct block_reader *br;
> +	const unsigned char *block;

Same question here. Would it be better to use `uint8_t *`? Or does it not
really matter?

> +	size_t block_len;
> +	int hash_size;
>  
>  	/* key for last entry we read. */
>  	struct strbuf last_key;
> @@ -106,17 +108,22 @@ struct block_iter {
>  }
>  
>  /* Position `it` at start of the block */
> -void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
> +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
>  
>  /* Position `it` to the `want` key in the block */
> -int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
> +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
>  			struct strbuf *want);
>  
> -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
> +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
>  
>  /* return < 0 for error, 0 for OK, > 0 for EOF. */
>  int block_iter_next(struct block_iter *it, struct reftable_record *rec);
>  
> +/*
> + * Reset the block iterator to pristine state without releasing its memory.
> + */

Do we want to make the comment a single line to match the other adjacent
examples?

> +void block_iter_reset(struct block_iter *it);
> +
>  /* deallocate memory for `it`. The block reader and its block is left intact. */
>  void block_iter_close(struct block_iter *it);
>  
...

-Justin


  reply	other threads:[~2024-04-03  4:54 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-03 13:16   ` Karthik Nayak
2024-04-08 12:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-03  4:52   ` Justin Tobler [this message]
2024-04-03 13:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-10 10:15     ` Karthik Nayak
2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
2024-04-09  1:29     ` Justin Tobler
2024-04-09  3:18       ` Patrick Steinhardt
2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
2024-04-10 11:35   ` Karthik Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=hwfkdfilniy46usnc3vnksaphdxboi5bxep4ek7aj2qxfhu332@6ym7dnn35k7z \
    --to=jltobler@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).