git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <stefanbeller@googlemail.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: git@vger.kernel.org, l.s.r@web.de, mfick@codeaurora.org,
	apelisse@gmail.com, Matthieu.Moy@grenoble-inp.fr,
	pclouds@gmail.com, iveqy@iveqy.com, gitster@pobox.com,
	mackyle@gmail.com
Subject: Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Date: Tue, 20 Aug 2013 17:08:54 +0200	[thread overview]
Message-ID: <52138686.1070304@googlemail.com> (raw)
In-Reply-To: <52136F9C.6030308@kdbg.org>

[-- Attachment #1: Type: text/plain, Size: 15287 bytes --]

On 08/20/2013 03:31 PM, Johannes Sixt wrote:
> 
> Are the long forms of options your invention?

I tried to keep strong similarity with the shell script for
ease of review. In the shellscript the options where 
put in variables having these names, so for example there was:

	-f)	no_reuse=--no-reuse-delta ;;
	-F)	no_reuse=--no-reuse-object ;;

So I used these variable names as well in here. And as I assumed
the variables are meaningful in itself.

In the shell script they may be meaningful, but with the option
parser in the C version, I overlooked the possibility for 
--no-<option> being possible as you noted below.

Maybe we should inverse the logic and have the variables and options
called reuse-delta and being enabled by default.

> 
>> +        OPT_BOOL('a', "all", &pack_everything,
>> +                N_("pack everything in a single pack")),
>> +        OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
>> +                N_("same as -a, and turn unreachable objects loose")),
> 
> --all-but-loose does not express what the help text says. The long form
> of -A is --all --unpack-unreachable, so it is really just a short option
> for convenience. It does not need its own long form.

Ok, I'll keep that in mind, and will only use the varialbe tied to -A
to set the -a and --unpack-unreachable variable.

> 
>> +        OPT_BOOL('d', "delete-redundant", &delete_redundant,
>> +                N_("remove redundant packs, and run git-prune-packed")),
>> +        OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
>> +                N_("pass --no-reuse-delta to git-pack-objects")),
>> +        OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
>> +                N_("pass --no-reuse-object to git-pack-objects")),
> 
> Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?

see above, I'd try not to.

> 
>> +        OPT_BOOL('n', NULL, &no_update_server_info,
>> +                N_("do not run git-update-server-info")),
> 
> No long option name?

This is also a negated option, so as above, maybe 
we could have --update_server_info and --no-update_server_info
respectively. Talking about the shortform then: Is it possible to
negate the shortform?

> 
>> +        OPT__QUIET(&quiet, N_("be quiet")),
>> +        OPT_BOOL('l', "local", &local,
>> +                N_("pass --local to git-pack-objects")),
> 
> Good.
> 
>> +        OPT_STRING(0, "unpack-unreachable", &unpack_unreachable,
>> N_("approxidate"),
>> +                N_("with -A, do not loosen objects older than this
>> Packing constraints")),
> 
> "Packing constraints" is a section heading, not a continuation of the
> previous help text.
> 
>> +        OPT_INTEGER(0, "window", &window,
>> +                N_("size of the window used for delta compression")),
> 
> This help text is suboptimal as the option is a count, not a "size" in
> the narrow sense. But that can be changed later (as it would affect
> other tools as well, I guess).
> 
>> +        OPT_INTEGER(0, "window-memory", &window_memory,
>> +                N_("same as the above, but limit memory size instead
>> of entries count")),
>> +        OPT_INTEGER(0, "depth", &depth,
>> +                N_("limits the maximum delta depth")),
>> +        OPT_INTEGER(0, "max-pack-size", &max_pack_size,
>> +                N_("maximum size of each packfile")),
>> +        OPT_END()
>> +    };
> 
> Good.
> 
>> +
>> +    git_config(repack_config, NULL);
>> +
>> +    argc = parse_options(argc, argv, prefix, builtin_repack_options,
>> +                git_repack_usage, 0);
>> +
>> +    sigchain_push_common(remove_pack_on_signal);
> 
> Good.
> 
>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
> 
> Should this not be
> 
>     packdir = xstrdup(git_path("pack"));
>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
> 
> Perhaps make packdir and packtmp global so that the strings need not be
> duplicated in get_pack_filenames and remove_temporary_files?

ok

> 
>> +
>> +    remove_temporary_files();
> 
> Yes, the shell script had this. But is it really necessary?

Well I can drop it if it's not needed.
It actually should implement
	rm -f "$PACKTMP"-*
and then the trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
as well.

> 
>> +
>> +    struct argv_array cmd_args = ARGV_ARRAY_INIT;
> 
> Declaration after statement.

will fix.

> 
>> +    argv_array_push(&cmd_args, "pack-objects");
>> +    argv_array_push(&cmd_args, "--keep-true-parents");
>> +    argv_array_push(&cmd_args, "--honor-pack-keep");
>> +    argv_array_push(&cmd_args, "--non-empty");
>> +    argv_array_push(&cmd_args, "--all");
>> +    argv_array_push(&cmd_args, "--reflog");
>> +
>> +    if (window)
>> +        argv_array_pushf(&cmd_args, "--window=%u", window);
>> +
>> +    if (window_memory)
>> +        argv_array_pushf(&cmd_args, "--window-memory=%u",
>> window_memory);
>> +
>> +    if (depth)
>> +        argv_array_pushf(&cmd_args, "--depth=%u", depth);
>> +
>> +    if (max_pack_size)
>> +        argv_array_pushf(&cmd_args, "--max_pack_size=%u",
>> max_pack_size);
>> +
>> +    if (no_reuse_delta)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-delta");
>> +
>> +    if (no_reuse_object)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-object");
> 
> no_reuse_delta and no_reuse_object are mutually exclusive, according to
> the shell script version.

I'll change it then to OPT_BIT and die() when both are set.

> 
>> +
>> +    if (pack_everything + pack_everything_but_loose == 0) {
>> +        argv_array_push(&cmd_args, "--unpacked");
>> +        argv_array_push(&cmd_args, "--incremental");
>> +    } else {
>> +        struct string_list fname_list = STRING_LIST_INIT_DUP;
>> +        get_pack_filenames(packdir, &fname_list);
>> +        for_each_string_list_item(item, &fname_list) {
>> +            char *fname;
>> +            fname = mkpathdup("%s/%s.keep", packdir, item->string);
>> +            if (stat(fname, &statbuffer) &&
>> S_ISREG(statbuffer.st_mode)) {
> 
>             if (!stat(fname, &statbuffer) && ...
> 
> But you are using file_exists() later. That should be good enough here
> as well, no?

will do.

> 
>> +                /* when the keep file is there, we're ignoring that
>> pack */
>> +            } else {
>> +                string_list_append(&existing_packs, item->string);
>> +            }
>> +            free(fname);
>> +        }
>> +
>> +        if (existing_packs.nr && delete_redundant) {
>> +            if (unpack_unreachable)
>> +                argv_array_pushf(&cmd_args,
>> "--unpack-unreachable=%s", unpack_unreachable);
>> +            else if (pack_everything_but_loose)
>> +                argv_array_push(&cmd_args, "--unpack-unreachable");
>> +        }
>> +    }
>> +
>> +    if (local)
>> +        argv_array_push(&cmd_args,  "--local");
>> +    if (quiet)
>> +        argv_array_push(&cmd_args,  "--quiet");
>> +    if (delta_base_offset)
>> +        argv_array_push(&cmd_args,  "--delta-base-offset");
>> +
>> +    argv_array_push(&cmd_args, packtmp);
> 
> Otherwise, argument setup looks fine.
> 
>> +
>> +    memset(&cmd, 0, sizeof(cmd));
>> +    cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Is it necessary to detach the arguments?

Probably not. 

> 
>> +    cmd.git_cmd = 1;
>> +    cmd.out = -1;
>> +    cmd.no_stdin = 1;
>> +
>> +    if (run_command(&cmd))
>> +        return 1;
> 
> You cannot run_command() and then later read its output! You must split
> it into start_command(), read stdout, finish_command().

Thanks for this hint. Could that explain rare non-deterministic failures in
the test suite?

> 
>> +
>> +    struct string_list names = STRING_LIST_INIT_DUP;
>> +    struct string_list rollback = STRING_LIST_INIT_DUP;
> 
> Declaration after statement.

will fix
> 
>> +
>> +    char line[1024];
>> +    int counter = 0;
>> +    FILE *out = xfdopen(cmd.out, "r");
>> +    while (fgets(line, sizeof(line), out)) {
>> +        /* a line consists of 40 hex chars + '\n' */
>> +        assert(strlen(line) == 41);
> 
> You cannot make assertions about input that you read from an external
> command! You can die() if the expectation is not met. But I think that
> in this case the only necessary expectation is that a line is not empty.
> 
> BTW, don't we have strbuf functions to read from an fd linewise?

I'll check.

> 
>> +        line[40] = '\0';
>> +        string_list_append(&names, line);
>> +        counter++;
>> +    }
>> +    if (!counter)
>> +        printf("Nothing new to pack.\n");
> 
> This was 'say Nothing new to pack.'. say obeys --quiet, IIRC.

ok
> 
>> +    fclose(out);
>> +
>> +    int failed = 0;
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 1; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s%s", packdir, item->string,
>> exts[ext]);
>> +            if (!file_exists(fname)) {
>> +                free(fname);
>> +                continue;
>> +            }
>> +
>> +            fname_old = mkpathdup("%s/old-%s%s", packdir,
>> item->string, exts[ext]);
> 
> If you could use git_path() instead of mkpathdup() in these two cases,
> we would not need to free() the names.
> 
>> +            if (file_exists(fname_old))
>> +                unlink(fname_old);
>> +
>> +            if (rename(fname, fname_old)) {
>> +                failed = 1;
>> +                break;
>> +            }
>> +            free(fname_old);
>> +            string_list_append_nodup(&rollback, fname);
> 
> Ah, we would need to allocate here then.
> 
>> +        }
>> +        if (failed)
>> +            /* set to last element to break for_each loop */
>> +            item = names.items + names.nr;
> 
> A mere
>             break;
> doesn't do it here?

Sure! I'll replace by break.

> 
>> +    }
>> +    if (failed) {
>> +        struct string_list rollback_failure;
>> +        for_each_string_list_item(item, &rollback) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s", packdir, item->string);
>> +            fname_old = mkpathdup("%s/old-%s", packdir, item->string);
> 
> I think it's possible to attach arbitrary data to each string_list item.
> We could attach the "%s/old-%s" name to the item name, then we wouldn't
> need to re-construct the names here.

handy! I'll try to do that.

> 
>> +            if (rename(fname_old, fname))
>> +                string_list_append(&rollback_failure, fname);
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +
>> +        if (rollback.nr) {
>> +            int i;
>> +            fprintf(stderr,
>> +                "WARNING: Some packs in use have been renamed by\n"
>> +                "WARNING: prefixing old- to their name, in order to\n"
>> +                "WARNING: replace them with the new version of the\n"
>> +                "WARNING: file.  But the operation failed, and the\n"
>> +                "WARNING: attempt to rename them back to their\n"
>> +                "WARNING: original names also failed.\n"
>> +                "WARNING: Please rename them in $PACKDIR manually:\n");
>> +            for (i = 0; i < rollback.nr; i++)
>> +                fprintf(stderr, "WARNING:   old-%s -> %s\n",
>> +                    rollback.items[i].string,
>> +                    rollback.items[i].string);
>> +        }
>> +        exit(1);
>> +    }
>> +
>> +    /* Now the ones with the same name are out of the way... */
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 2; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/pack-%s%s", packdir, item->string,
>> exts[ext]);
>> +            fname_old = mkpathdup("%s-%s%s", packtmp, item->string,
>> exts[ext]);
> 
> Same here: git_path()?
> 
>> +            stat(fname_old, &statbuffer);
> 
> We ignore errors during chmod in the shell script. But this doesn't give
> you license to ignore stat() errors completely: If stat() fails, then
> don't chmod() below, either.

ok

> 
>> +            statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
> 
>             statbuffer.st_mode &= ~(S_IWUSR|S_IWGRP|S_IWOTH);
> 
>> +            chmod(fname_old, statbuffer.st_mode);
>> +            if (rename(fname_old, fname))
>> +                die("Could not rename packfile: %s -> %s", fname_old,
>> fname);
> 
> Use die_errno() here.
> 
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +    }
>> +
>> +    /* Remove the "old-" files */
>> +    for_each_string_list_item(item, &names) {
>> +        char *fname;
>> +        fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> die_errno() makes sense here, too.
> 
>> +        free(fname);
>> +
>> +        fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> and here as well.
> 
>> +        free(fname);
> 
> Again git_path?
> 
>> +    }
>> +
>> +    /* End of pack replacement. */
> 
> Nit: A blank line should follow this comment.
> 
>> +    if (delete_redundant) {
>> +        sort_string_list(&names);
>> +        for_each_string_list_item(item, &existing_packs) {
>> +            char *sha1;
>> +            size_t len = strlen(item->string);
>> +            if (len < 40)
>> +                continue;
>> +            sha1 = item->string + len - 40;
>> +            if (!string_list_has_string(&names, sha1))
>> +                remove_pack(packdir, item->string);
>> +        }
> 
> OK.
> 
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "prune-packed");
>> +        if (quiet)
>> +            argv_array_push(&cmd_args, "--quiet");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Again: is it necessary to detach?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +
>> +    if (!no_update_server_info) {
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "update-server-info");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Same here?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +    return 0;
>> +}
> 
> In my opinion, it is good that you keep a large function that resembles
> the structure of the shell script because it is easier to review. But
> ultimately, it should be factored into smaller functions.
> 
> -- Hannes
> 

Hannes,

thank you very much for the review. I'll follow your suggestions and dive
deeper into the API to change your annotated lines.

Thanks,
Stefan




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

  reply	other threads:[~2013-08-20 15:09 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
2013-08-14  7:26   ` Matthieu Moy
2013-08-14 16:26     ` Stefan Beller
2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
2013-08-14 16:49         ` Antoine Pelisse
2013-08-14 17:04           ` Stefan Beller
2013-08-14 17:19             ` Jeff King
2013-08-14 17:25           ` Martin Fick
2013-08-14 22:16             ` Stefan Beller
2013-08-14 22:28               ` Martin Fick
2013-08-14 22:53                 ` Junio C Hamano
2013-08-14 23:28                   ` Martin Fick
2013-08-15 17:15                     ` Junio C Hamano
2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
2013-08-17 13:34                         ` René Scharfe
2013-08-17 19:18                           ` Kyle J. McKay
2013-08-18 14:34                           ` Stefan Beller
2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
2013-08-18 15:41                               ` Kyle J. McKay
2013-08-18 16:44                               ` René Scharfe
2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
2013-08-19 23:23                                   ` Stefan Beller
2013-08-20 13:31                                     ` Johannes Sixt
2013-08-20 15:08                                       ` Stefan Beller [this message]
2013-08-20 18:38                                         ` Johannes Sixt
2013-08-20 18:57                                         ` René Scharfe
2013-08-20 22:36                                           ` Stefan Beller
2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
2013-08-21  8:25                                               ` Jonathan Nieder
2013-08-21 10:37                                                 ` Stefan Beller
2013-08-21 17:25                                                 ` Stefan Beller
2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
2013-08-21 21:52                                                       ` Matthieu Moy
2013-08-21 22:15                                                       ` Stefan Beller
2013-08-21 22:50                                                         ` Junio C Hamano
2013-08-21 22:57                                                           ` Stefan Beller
2013-08-22 10:46                                                         ` Johannes Sixt
2013-08-22 21:03                                                       ` Jonathan Nieder
2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
2013-08-21 12:47                                                 ` Stefan Beller
2013-08-21 13:05                                                   ` Matthieu Moy
2013-08-21 12:53                                                 ` Stefan Beller
2013-08-21 13:07                                                   ` Matthieu Moy
2013-08-22 10:46                                                     ` Johannes Sixt
2013-08-22 10:46                                                 ` Johannes Sixt
2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
2013-08-22 20:31                                                     ` Junio C Hamano
2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
2013-08-21  9:20                                             ` Johannes Sixt
2013-08-20 21:24                                       ` Stefan Beller
2013-08-20 21:34                                         ` Jonathan Nieder
2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
2013-08-20 21:59                                             ` Jonathan Nieder
2013-08-21 22:43                                               ` Stefan Beller
2013-08-22 17:29                                                 ` Junio C Hamano
2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
2013-08-14 22:59                 ` Matthieu Moy
2013-08-15  7:47                   ` Stefan Beller
2013-08-15  4:15             ` Duy Nguyen
2013-08-14 17:26           ` Junio C Hamano
2013-08-14 22:51           ` Matthieu Moy
2013-08-14 23:25             ` Martin Fick
2013-08-15  0:26               ` Martin Fick
2013-08-15  7:46               ` Stefan Beller
2013-08-15 15:04                 ` Martin Fick
2013-08-15  4:20             ` Duy Nguyen
2013-08-14 17:04         ` Junio C Hamano
2013-08-15  7:53           ` Stefan Beller
2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52138686.1070304@googlemail.com \
    --to=stefanbeller@googlemail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=apelisse@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=iveqy@iveqy.com \
    --cc=j6t@kdbg.org \
    --cc=l.s.r@web.de \
    --cc=mackyle@gmail.com \
    --cc=mfick@codeaurora.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).