Re: [PATCH v2] t/t0021: convert the rot13-filter.pl script to C

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Matheus Tavares <matheus.bernardino@usp.br>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, gitster@pobox.com, larsxschneider@gmail.com,
	christian.couder@gmail.com, avarab@gmail.com
Subject: Re: [PATCH v2] t/t0021: convert the rot13-filter.pl script to C
Date: Sat, 30 Jul 2022 23:52:16 -0300	[thread overview]
Message-ID: <CAHd-oW6LZay=MX2FdFjgTh1pjE=g-XTm63mGWuMhHd=-N=tXRA@mail.gmail.com> (raw)
In-Reply-To: <4n20476q-6ssr-osp8-q5o3-p8ns726q4pn3@tzk.qr>

Hi, Dscho

On Thu, Jul 28, 2022 at 1:58 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> > On Sun, 24 Jul 2022, Matheus Tavares wrote:
> >
> > diff --git a/t/helper/test-rot13-filter.c b/t/helper/test-rot13-filter.c
> > +static char *rot13(char *str)
> > +{
> > +     char *c;
> > +     for (c = str; *c; c++) {
> > +             if (*c >= 'a' && *c <= 'z')
> > +                     *c = 'a' + (*c - 'a' + 13) % 26;
> > +             else if (*c >= 'A' && *c <= 'Z')
> > +                     *c = 'A' + (*c - 'A' + 13) % 26;
>
> That's quite verbose, but it _is_ correct (if a bit harder than necessary
> to validate, I admit that I had to look up whether `%`'s precedence is higher
> than `+` in https://en.cppreference.com/w/c/language/operator_precedence).
>
> A conciser way (also easier to reason about):
>
>         for (c = str; *c; c++)
>                 if (isalpha(*c))
>                         *c += tolower(*c) < 'n' ? 13 : -13;

Nice :) Thanks.

> > [...]
> > +static void packet_read_capabilities(struct string_list *caps)
> > +{
> > +     while (1) {
> > +             int size;
> > +             char *buf = packet_read_line(0, &size);
> > +             if (!buf)
> > +                     break;
> > +             string_list_append_nodup(caps,
> > +                                      skip_key_dup(buf, size, "capability"));
>
> It is tempting to use unsorted string lists for everything because Perl
> makes that relatively easy.
>
> However, in this instance I would strongly recommend using something more
> akin to Perl's "hash" data structure, in this instance a `strset`.

Ok, will do.

> > +
> > +/*
> > + * Check our capabilities we want to advertise against the remote ones
> > + * and then advertise our capabilities
> > + */
> > +static void packet_check_and_write_capabilities(struct string_list *remote_caps,
> > +                                             struct string_list *our_caps)
>
> The list of "our caps" comes from the command-line. In C, this means we
> get a `const char **argv` and an `int argc`. So:
>
> static void check_and_write_capabilities(struct strset *remote_caps,
>                                          const char **caps, int caps_count)
> {
>         int i;
>
>         for (i = 0; i < caps_count; i++) {
>                 if (!strset_contains(remote_caps, caps[i]))
>                         die("our capability '%s' is not available from remote",
>                             caps[i]);
>
>                 packet_write_fmt(1, "capability=%s\n", caps[i]);
>         }
>         packet_flush(1);
> }

Makes sense. We also use the list elsewhere (has_capability()), but we
can easily replace that with two global flags to indicate if we have
the "clean" and "smudge" caps.

> And then we would call it via
>
>         check_and_write_capabilities(remote_caps, argv + 1, argc - 1);
>
> [...]
> > +static void command_loop(void)
> > +{
> > +     while (1) {
> > +             char *command = packet_key_val_read("command");
> > +             if (!command) {
> > +                     fprintf(logfile, "STOP\n");
> > +                     break;
> > +             }
> > +             fprintf(logfile, "IN: %s", command);
>
> We will also need to `fflush(logfile)` here, to imitate the Perl script's
> behavior more precisely.

I was somewhat intrigued as to why the flushes were needed in the Perl
script. But reading [1] and [2], now, it seems to have been an
oversight.

That is, Eric suggested splictily flushing stdout because it is a
pipe, but the author ended up erroneously disabling autoflush for
stdout too, so that's why we needed the flushes there. They later
acknowledged that and said that they would re-enabled it (see [2]),
but it seems to have been forgotten. So I think we can safely drop the
flush calls.

[1]: http://public-inbox.org/git/20160723072721.GA20875%40starla/
[2]: https://lore.kernel.org/git/7F1F1A0E-8FC3-4FBD-81AA-37786DE0EF50@gmail.com/

> > +
> > +             if (!strcmp(command, "list_available_blobs")) {
> > +                     struct hashmap_iter iter;
> > +                     struct strmap_entry *ent;
> > +                     struct string_list_item *str_item;
> > +                     struct string_list paths = STRING_LIST_INIT_NODUP;
> > +
> > +                     /* flush */
> > +                     if (packet_read_line(0, NULL))
> > +                             die("bad list_available_blobs end");
> > +
> > +                     strmap_for_each_entry(&delay, &iter, ent) {
> > +                             struct delay_entry *delay_entry = ent->value;
> > +                             if (!delay_entry->requested)
> > +                                     continue;
> > +                             delay_entry->count--;
> > +                             if (!strcmp(ent->key, "invalid-delay.a")) {
> > +                                     /* Send Git a pathname that was not delayed earlier */
> > +                                     packet_write_fmt(1, "pathname=unfiltered");
> > +                             }
> > +                             if (!strcmp(ent->key, "missing-delay.a")) {
> > +                                     /* Do not signal Git that this file is available */
> > +                             } else if (!delay_entry->count) {
> > +                                     string_list_insert(&paths, ent->key);
> > +                                     packet_write_fmt(1, "pathname=%s", ent->key);
> > +                             }
> > +                     }
> > +
> > +                     /* Print paths in sorted order. */
>
> The Perl script does not order them specifically. Do we really have to do
> that here?

It actually prints them in sorted order:

        foreach my $pathname ( sort keys %DELAY )

That is required because some test cases will compare the output using
this order.

> In any case, it is more performant to append the paths in an unsorted way
> and then sort them once in the end (that's O(N log(N)) instead of O(N^2)).

OK, will do.

> > +                     for_each_string_list_item(str_item, &paths)
> > +                             fprintf(logfile, " %s", str_item->string);
> > +                     string_list_clear(&paths, 0);
> > +
> > +                     packet_flush(1);
> > +
> > +                     fprintf(logfile, " [OK]\n");
> > +                     packet_write_fmt(1, "status=success");
> > +                     packet_flush(1);
>
> I know the Perl script uses an else here, but I'd much rather insert a
> `continue` at the end of the `list_available_blobs` clause and de-indent
> the remainder of the loop body.

Sure! I think we can take a step further and extract the if logic to a
separate function.

> > +             } else {
> > +                     char *buf, *output;
> > +                     int size;
> > +                     char *pathname;
> > +                     struct delay_entry *entry;
> > +                     struct strbuf input = STRBUF_INIT;
> > +
> > +                     pathname = packet_key_val_read("pathname");
> > +                     if (!pathname)
> > +                             die("unexpected EOF while expecting pathname");
> > +                     fprintf(logfile, " %s", pathname);
>
> Again, let's `fflush(logfile)` here.
>
> > +
> > +                     /* Read until flush */
> > +                     buf = packet_read_line(0, &size);
> > +                     while (buf) {
>
> Let's write this in more idiomatic C:
>
>                         while ((buf = packet_read_line(0, &size))) {
>
> > +                             if (!strcmp(buf, "can-delay=1")) {
> > +                                     entry = strmap_get(&delay, pathname);
> > +                                     if (entry && !entry->requested) {
> > +                                             entry->requested = 1;
> > +                                     } else if (!entry && always_delay) {
> > +                                             entry = xcalloc(1, sizeof(*entry));
> > +                                             entry->requested = 1;
> > +                                             entry->count = 1;
> > +                                             strmap_put(&delay, pathname, entry);
>
> I guess here is our chance to extend the signature of `add_delay_entry()`
> to accept a `requested` parameter, and to call that here.
>
> > +                                     }
> > +                             } else if (starts_with(buf, "ref=") ||
> > +                                        starts_with(buf, "treeish=") ||
> > +                                        starts_with(buf, "blob=")) {
> > +                                     fprintf(logfile, " %s", buf);
> > +                             } else {
> > +                                     /*
> > +                                      * In general, filters need to be graceful about
> > +                                      * new metadata, since it's documented that we
> > +                                      * can pass any key-value pairs, but for tests,
> > +                                      * let's be a little stricter.
> > +                                      */
> > +                                     die("Unknown message '%s'", buf);
> > +                             }
> > +                             buf = packet_read_line(0, &size);
> > +                     }
> > +
> > +
> > +                     read_packetized_to_strbuf(0, &input, 0);
> > +                     fprintf(logfile, " %"PRIuMAX" [OK] -- ", (uintmax_t)input.len);
>
> This reads _so much nicer_ than the Perl version!
>
> > +
> > +                     entry = strmap_get(&delay, pathname);
> > +                     if (entry && entry->output) {
> > +                             output = entry->output;
> > +                     } else if (!strcmp(pathname, "error.r") || !strcmp(pathname, "abort.r")) {
> > +                             output = "";
> > +                     } else if (!strcmp(command, "clean") && has_capability("clean")) {
> > +                             output = rot13(input.buf);
> > +                     } else if (!strcmp(command, "smudge") && has_capability("smudge")) {
> > +                             output = rot13(input.buf);
> > +                     } else {
> > +                             die("bad command '%s'", command);
> > +                     }
> > +
> > +                     if (!strcmp(pathname, "error.r")) {
> > +                             fprintf(logfile, "[ERROR]\n");
> > +                             packet_write_fmt(1, "status=error");
> > +                             packet_flush(1);
> > +                     } else if (!strcmp(pathname, "abort.r")) {
> > +                             fprintf(logfile, "[ABORT]\n");
> > +                             packet_write_fmt(1, "status=abort");
> > +                             packet_flush(1);
> > +                     } else if (!strcmp(command, "smudge") &&
> > +                                (entry = strmap_get(&delay, pathname)) &&
> > +                                entry->requested == 1) {
> > +                             fprintf(logfile, "[DELAYED]\n");
> > +                             packet_write_fmt(1, "status=delayed");
> > +                             packet_flush(1);
> > +                             entry->requested = 2;
> > +                             entry->output = xstrdup(output);
>
> We need to call `free(entry->output)` before that lest we leak memory, but
> only if `output` is not identical anyway:
>
>                                 if (entry->output != output) {
>                                         free(entry->output);
>                                         entry->output = xstrdup(output);
>                                 }

I think, entry->output will always be NULL here, since we only get
inside this if block after entry->requested has been set to 1 at the
top of the function; and, at that point, we haven't run ro13 yet.
Nevertheless, it doesn't hurt to add the free call anyway :)

>
> > +                     } else {
> > +                             int i, nr_packets;
> > +                             size_t output_len;
> > +                             struct strbuf sb = STRBUF_INIT;
> > +                             packet_write_fmt(1, "status=success");
> > +                             packet_flush(1);
> > +
> > +                             strbuf_addf(&sb, "%s-write-fail.r", command);
> > +                             if (!strcmp(pathname, sb.buf)) {
>
> We can easily avoid allocating the string just for comparing it:
>
>                                 const char *p;
>
>                                 if (skip_prefix(pathname, command, &p) &&
>                                     !strcmp(p, "-write-fail.r")) {
>
> > +                                     fprintf(logfile, "[WRITE FAIL]\n");
>
>                                         fflush(logfile) ;-)
>
> > +                                     die("%s write error", command);
> > +                             }
> > +
> > +                             output_len = strlen(output);
> > +                             fprintf(logfile, "OUT: %"PRIuMAX" ", (uintmax_t)output_len);
> > +
> > +                             if (write_packetized_from_buf_no_flush_count(output,
> > +                                     output_len, 1, &nr_packets))
> > +                                     die("failed to write buffer to stdout");
> > +                             packet_flush(1);
> > +
> > +                             for (i = 0; i < nr_packets; i++)
> > +                                     fprintf(logfile, ".");
>
> That's not quite the same as the Perl script does: it prints a '.'
> (without flushing, though) _every_ time it wrote a packet.
>
> If you want to emulate that, you will have to copy/edit that loop (and in
> that case, the insanely long-named function
> `write_packetized_from_buf_no_flush_count()` is unnecessary, too).

Hmm, I'm not sure we need to emulate that. I do dislike the huge
function name as well, but I also don't quite like to repeat code
copying that loop here...

> > +                             fprintf(logfile, " [OK]\n");
> > +
> > +                             packet_flush(1);
> > +                             strbuf_release(&sb);
> > +                     }
> > +                     free(pathname);
> > +                     strbuf_release(&input);
> > +             }
> > +             free(command);
> > +     }
> > +}
> > [...]
> > +static void packet_initialize(const char *name, int version)
> > +{
> > +     struct strbuf sb = STRBUF_INIT;
> > +     int size;
> > +     char *pkt_buf = packet_read_line(0, &size);
> > +
> > +     strbuf_addf(&sb, "%s-client", name);
> > +     if (!pkt_buf || strncmp(pkt_buf, sb.buf, size))
>
> We do not need the flexibility of the Perl package, where `name` is a
> parameter. We can hard-code `git-filter-client` here. I.e. something like
> this:
>
>         if (!pkt_buf || size != 17 ||
>             strncmp(pkt_buf, "git-filter-client", 17))

Good idea! Thanks. Perhaps, can't we do:

        if (!pkt_buf || strncmp(pkt_buf, "git-filter-client", size))

to avoid the hard-coded and possibly error-prone 17?

> > +             die("bad initialize: '%s'", xstrndup(pkt_buf, size));
> > +
> > +     strbuf_reset(&sb);
> > +     strbuf_addf(&sb, "version=%d", version);

Thanks for a very detailed review and great suggestions!

next prev parent reply	other threads:[~2022-07-31  2:53 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-22 19:42 [PATCH 0/2] t0021: convert perl script to C test-tool helper Matheus Tavares
2022-07-22 19:42 ` [PATCH 1/2] t/t0021: convert the rot13-filter.pl script to C Matheus Tavares
2022-07-23  4:52   ` Ævar Arnfjörð Bjarmason
2022-07-23  4:59   ` Ævar Arnfjörð Bjarmason
2022-07-23 13:36     ` Matheus Tavares
2022-07-22 19:42 ` [PATCH 2/2] t/t0021: replace old rot13-filter.pl uses with new test-tool cmd Matheus Tavares
2022-07-24 15:09 ` [PATCH v2] t/t0021: convert the rot13-filter.pl script to C Matheus Tavares
2022-07-28 16:58   ` Johannes Schindelin
2022-07-28 17:54     ` Junio C Hamano
2022-07-28 19:50     ` Ævar Arnfjörð Bjarmason
2022-07-31  2:52     ` Matheus Tavares [this message]
2022-08-09  9:36       ` Johannes Schindelin
2022-07-31 18:19   ` [PATCH v3 0/3] t0021: convert perl script to C test-tool helper Matheus Tavares
2022-07-31 18:19     ` [PATCH v3 1/3] t0021: avoid grepping for a Perl-specific string at filter output Matheus Tavares
2022-08-01 20:41       ` Junio C Hamano
2022-07-31 18:19     ` [PATCH v3 2/3] t0021: implementation the rot13-filter.pl script in C Matheus Tavares
2022-08-01 11:33       ` Ævar Arnfjörð Bjarmason
2022-08-02  0:16         ` Matheus Tavares
2022-08-09  9:45           ` Johannes Schindelin
2022-08-01 11:39       ` Ævar Arnfjörð Bjarmason
2022-08-01 21:18       ` Junio C Hamano
2022-08-02  0:13         ` Matheus Tavares
2022-08-09 10:00         ` Johannes Schindelin
2022-08-10 18:37           ` Junio C Hamano
2022-08-10 19:58             ` Junio C Hamano
2022-08-09 10:37         ` Johannes Schindelin
2022-08-09 10:47       ` Johannes Schindelin
2022-07-31 18:19     ` [PATCH v3 3/3] tests: use the new C rot13-filter helper to avoid PERL prereq Matheus Tavares
2022-08-15  1:06     ` [PATCH v4 0/3] t0021: convert perl script to C test-tool helper Matheus Tavares
2022-08-15  1:06       ` [PATCH v4 1/3] t0021: avoid grepping for a Perl-specific string at filter output Matheus Tavares
2022-08-15  1:06       ` [PATCH v4 2/3] t0021: implementation the rot13-filter.pl script in C Matheus Tavares
2022-08-15  1:06       ` [PATCH v4 3/3] tests: use the new C rot13-filter helper to avoid PERL prereq Matheus Tavares
2022-08-15 13:01       ` [PATCH v4 0/3] t0021: convert perl script to C test-tool helper Johannes Schindelin
2022-08-19 22:17         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHd-oW6LZay=MX2FdFjgTh1pjE=g-XTm63mGWuMhHd=-N=tXRA@mail.gmail.com' \
    --to=matheus.bernardino@usp.br \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).