From: Johannes Sixt <j6t@kdbg.org>
To: Stefan Beller <stefanbeller@googlemail.com>
Cc: git@vger.kernel.org, l.s.r@web.de, mfick@codeaurora.org,
apelisse@gmail.com, Matthieu.Moy@grenoble-inp.fr,
pclouds@gmail.com, iveqy@iveqy.com, gitster@pobox.com,
mackyle@gmail.com
Subject: Re: [RFC PATCHv4] repack: rewrite the shell script in C.
Date: Tue, 20 Aug 2013 15:31:08 +0200 [thread overview]
Message-ID: <52136F9C.6030308@kdbg.org> (raw)
In-Reply-To: <1376954619-24314-1-git-send-email-stefanbeller@googlemail.com>
I didn't look at functions above cmd_repack.
Am 20.08.2013 01:23, schrieb Stefan Beller:
> +int cmd_repack(int argc, const char **argv, const char *prefix) {
> +
> + int pack_everything = 0;
> + int pack_everything_but_loose = 0;
> + int delete_redundant = 0;
> + char *unpack_unreachable = NULL;
> + int window = 0, window_memory = 0;
> + int depth = 0;
> + int max_pack_size = 0;
> + int no_reuse_delta = 0, no_reuse_object = 0;
> + int no_update_server_info = 0;
> + int quiet = 0;
> + int local = 0;
> + char *packdir, *packtmp;
> + struct child_process cmd;
> + struct string_list_item *item;
> + struct string_list existing_packs = STRING_LIST_INIT_DUP;
> + struct stat statbuffer;
> + int ext;
> + char *exts[2] = {".idx", ".pack"};
> +
> + struct option builtin_repack_options[] = {
Are the long forms of options your invention?
> + OPT_BOOL('a', "all", &pack_everything,
> + N_("pack everything in a single pack")),
> + OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
> + N_("same as -a, and turn unreachable objects loose")),
--all-but-loose does not express what the help text says. The long form of
-A is --all --unpack-unreachable, so it is really just a short option for
convenience. It does not need its own long form.
> + OPT_BOOL('d', "delete-redundant", &delete_redundant,
> + N_("remove redundant packs, and run git-prune-packed")),
> + OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
> + N_("pass --no-reuse-delta to git-pack-objects")),
> + OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
> + N_("pass --no-reuse-object to git-pack-objects")),
Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?
> + OPT_BOOL('n', NULL, &no_update_server_info,
> + N_("do not run git-update-server-info")),
No long option name?
> + OPT__QUIET(&quiet, N_("be quiet")),
> + OPT_BOOL('l', "local", &local,
> + N_("pass --local to git-pack-objects")),
Good.
> + OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
> + N_("with -A, do not loosen objects older than this Packing constraints")),
"Packing constraints" is a section heading, not a continuation of the
previous help text.
> + OPT_INTEGER(0, "window", &window,
> + N_("size of the window used for delta compression")),
This help text is suboptimal as the option is a count, not a "size" in the
narrow sense. But that can be changed later (as it would affect other
tools as well, I guess).
> + OPT_INTEGER(0, "window-memory", &window_memory,
> + N_("same as the above, but limit memory size instead of entries count")),
> + OPT_INTEGER(0, "depth", &depth,
> + N_("limits the maximum delta depth")),
> + OPT_INTEGER(0, "max-pack-size", &max_pack_size,
> + N_("maximum size of each packfile")),
> + OPT_END()
> + };
Good.
> +
> + git_config(repack_config, NULL);
> +
> + argc = parse_options(argc, argv, prefix, builtin_repack_options,
> + git_repack_usage, 0);
> +
> + sigchain_push_common(remove_pack_on_signal);
Good.
> + packdir = mkpathdup("%s/pack", get_object_directory());
> + packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
Should this not be
packdir = xstrdup(git_path("pack"));
packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
Perhaps make packdir and packtmp global so that the strings need not be
duplicated in get_pack_filenames and remove_temporary_files?
> +
> + remove_temporary_files();
Yes, the shell script had this. But is it really necessary?
> +
> + struct argv_array cmd_args = ARGV_ARRAY_INIT;
Declaration after statement.
> + argv_array_push(&cmd_args, "pack-objects");
> + argv_array_push(&cmd_args, "--keep-true-parents");
> + argv_array_push(&cmd_args, "--honor-pack-keep");
> + argv_array_push(&cmd_args, "--non-empty");
> + argv_array_push(&cmd_args, "--all");
> + argv_array_push(&cmd_args, "--reflog");
> +
> + if (window)
> + argv_array_pushf(&cmd_args, "--window=%u", window);
> +
> + if (window_memory)
> + argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
> +
> + if (depth)
> + argv_array_pushf(&cmd_args, "--depth=%u", depth);
> +
> + if (max_pack_size)
> + argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
> +
> + if (no_reuse_delta)
> + argv_array_pushf(&cmd_args, "--no-reuse-delta");
> +
> + if (no_reuse_object)
> + argv_array_pushf(&cmd_args, "--no-reuse-object");
no_reuse_delta and no_reuse_object are mutually exclusive, according to
the shell script version.
> +
> + if (pack_everything + pack_everything_but_loose == 0) {
> + argv_array_push(&cmd_args, "--unpacked");
> + argv_array_push(&cmd_args, "--incremental");
> + } else {
> + struct string_list fname_list = STRING_LIST_INIT_DUP;
> + get_pack_filenames(packdir, &fname_list);
> + for_each_string_list_item(item, &fname_list) {
> + char *fname;
> + fname = mkpathdup("%s/%s.keep", packdir, item->string);
> + if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
if (!stat(fname, &statbuffer) && ...
But you are using file_exists() later. That should be good enough here as
well, no?
> + /* when the keep file is there, we're ignoring that pack */
> + } else {
> + string_list_append(&existing_packs, item->string);
> + }
> + free(fname);
> + }
> +
> + if (existing_packs.nr && delete_redundant) {
> + if (unpack_unreachable)
> + argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
> + else if (pack_everything_but_loose)
> + argv_array_push(&cmd_args, "--unpack-unreachable");
> + }
> + }
> +
> + if (local)
> + argv_array_push(&cmd_args, "--local");
> + if (quiet)
> + argv_array_push(&cmd_args, "--quiet");
> + if (delta_base_offset)
> + argv_array_push(&cmd_args, "--delta-base-offset");
> +
> + argv_array_push(&cmd_args, packtmp);
Otherwise, argument setup looks fine.
> +
> + memset(&cmd, 0, sizeof(cmd));
> + cmd.argv = argv_array_detach(&cmd_args, NULL);
Is it necessary to detach the arguments?
> + cmd.git_cmd = 1;
> + cmd.out = -1;
> + cmd.no_stdin = 1;
> +
> + if (run_command(&cmd))
> + return 1;
You cannot run_command() and then later read its output! You must split it
into start_command(), read stdout, finish_command().
> +
> + struct string_list names = STRING_LIST_INIT_DUP;
> + struct string_list rollback = STRING_LIST_INIT_DUP;
Declaration after statement.
> +
> + char line[1024];
> + int counter = 0;
> + FILE *out = xfdopen(cmd.out, "r");
> + while (fgets(line, sizeof(line), out)) {
> + /* a line consists of 40 hex chars + '\n' */
> + assert(strlen(line) == 41);
You cannot make assertions about input that you read from an external
command! You can die() if the expectation is not met. But I think that in
this case the only necessary expectation is that a line is not empty.
BTW, don't we have strbuf functions to read from an fd linewise?
> + line[40] = '\0';
> + string_list_append(&names, line);
> + counter++;
> + }
> + if (!counter)
> + printf("Nothing new to pack.\n");
This was 'say Nothing new to pack.'. say obeys --quiet, IIRC.
> + fclose(out);
> +
> + int failed = 0;
> + for_each_string_list_item(item, &names) {
> + for (ext = 0; ext < 1; ext++) {
> + char *fname, *fname_old;
> + fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
> + if (!file_exists(fname)) {
> + free(fname);
> + continue;
> + }
> +
> + fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);
If you could use git_path() instead of mkpathdup() in these two cases, we
would not need to free() the names.
> + if (file_exists(fname_old))
> + unlink(fname_old);
> +
> + if (rename(fname, fname_old)) {
> + failed = 1;
> + break;
> + }
> + free(fname_old);
> + string_list_append_nodup(&rollback, fname);
Ah, we would need to allocate here then.
> + }
> + if (failed)
> + /* set to last element to break for_each loop */
> + item = names.items + names.nr;
A mere
break;
doesn't do it here?
> + }
> + if (failed) {
> + struct string_list rollback_failure;
> + for_each_string_list_item(item, &rollback) {
> + char *fname, *fname_old;
> + fname = mkpathdup("%s/%s", packdir, item->string);
> + fname_old = mkpathdup("%s/old-%s", packdir, item->string);
I think it's possible to attach arbitrary data to each string_list item.
We could attach the "%s/old-%s" name to the item name, then we wouldn't
need to re-construct the names here.
> + if (rename(fname_old, fname))
> + string_list_append(&rollback_failure, fname);
> + free(fname);
> + free(fname_old);
> + }
> +
> + if (rollback.nr) {
> + int i;
> + fprintf(stderr,
> + "WARNING: Some packs in use have been renamed by\n"
> + "WARNING: prefixing old- to their name, in order to\n"
> + "WARNING: replace them with the new version of the\n"
> + "WARNING: file. But the operation failed, and the\n"
> + "WARNING: attempt to rename them back to their\n"
> + "WARNING: original names also failed.\n"
> + "WARNING: Please rename them in $PACKDIR manually:\n");
> + for (i = 0; i < rollback.nr; i++)
> + fprintf(stderr, "WARNING: old-%s -> %s\n",
> + rollback.items[i].string,
> + rollback.items[i].string);
> + }
> + exit(1);
> + }
> +
> + /* Now the ones with the same name are out of the way... */
> + for_each_string_list_item(item, &names) {
> + for (ext = 0; ext < 2; ext++) {
> + char *fname, *fname_old;
> + fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
> + fname_old = mkpathdup("%s-%s%s", packtmp, item->string, exts[ext]);
Same here: git_path()?
> + stat(fname_old, &statbuffer);
We ignore errors during chmod in the shell script. But this doesn't give
you license to ignore stat() errors completely: If stat() fails, then
don't chmod() below, either.
> + statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
statbuffer.st_mode &= ~(S_IWUSR|S_IWGRP|S_IWOTH);
> + chmod(fname_old, statbuffer.st_mode);
> + if (rename(fname_old, fname))
> + die("Could not rename packfile: %s -> %s", fname_old, fname);
Use die_errno() here.
> + free(fname);
> + free(fname_old);
> + }
> + }
> +
> + /* Remove the "old-" files */
> + for_each_string_list_item(item, &names) {
> + char *fname;
> + fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
> + if (remove_path(fname))
> + die("Could not remove file: %s", fname);
die_errno() makes sense here, too.
> + free(fname);
> +
> + fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
> + if (remove_path(fname))
> + die("Could not remove file: %s", fname);
and here as well.
> + free(fname);
Again git_path?
> + }
> +
> + /* End of pack replacement. */
Nit: A blank line should follow this comment.
> + if (delete_redundant) {
> + sort_string_list(&names);
> + for_each_string_list_item(item, &existing_packs) {
> + char *sha1;
> + size_t len = strlen(item->string);
> + if (len < 40)
> + continue;
> + sha1 = item->string + len - 40;
> + if (!string_list_has_string(&names, sha1))
> + remove_pack(packdir, item->string);
> + }
OK.
> + argv_array_clear(&cmd_args);
> + argv_array_push(&cmd_args, "prune-packed");
> + if (quiet)
> + argv_array_push(&cmd_args, "--quiet");
> +
> + memset(&cmd, 0, sizeof(cmd));
> + cmd.argv = argv_array_detach(&cmd_args, NULL);
Again: is it necessary to detach?
> + cmd.git_cmd = 1;
> + run_command(&cmd);
> + }
> +
> + if (!no_update_server_info) {
> + argv_array_clear(&cmd_args);
> + argv_array_push(&cmd_args, "update-server-info");
> +
> + memset(&cmd, 0, sizeof(cmd));
> + cmd.argv = argv_array_detach(&cmd_args, NULL);
Same here?
> + cmd.git_cmd = 1;
> + run_command(&cmd);
> + }
> + return 0;
> +}
In my opinion, it is good that you keep a large function that resembles
the structure of the shell script because it is easier to review. But
ultimately, it should be factored into smaller functions.
-- Hannes
next prev parent reply other threads:[~2013-08-20 13:31 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
2013-08-14 7:26 ` Matthieu Moy
2013-08-14 16:26 ` Stefan Beller
2013-08-14 16:27 ` [RFC PATCH] " Stefan Beller
2013-08-14 16:49 ` Antoine Pelisse
2013-08-14 17:04 ` Stefan Beller
2013-08-14 17:19 ` Jeff King
2013-08-14 17:25 ` Martin Fick
2013-08-14 22:16 ` Stefan Beller
2013-08-14 22:28 ` Martin Fick
2013-08-14 22:53 ` Junio C Hamano
2013-08-14 23:28 ` Martin Fick
2013-08-15 17:15 ` Junio C Hamano
2013-08-16 0:12 ` [RFC PATCHv2] " Stefan Beller
2013-08-17 13:34 ` René Scharfe
2013-08-17 19:18 ` Kyle J. McKay
2013-08-18 14:34 ` Stefan Beller
2013-08-18 14:36 ` [RFC PATCHv3] " Stefan Beller
2013-08-18 15:41 ` Kyle J. McKay
2013-08-18 16:44 ` René Scharfe
2013-08-18 22:26 ` [RFC PATCHv4] " Stefan Beller
2013-08-19 23:23 ` Stefan Beller
2013-08-20 13:31 ` Johannes Sixt [this message]
2013-08-20 15:08 ` Stefan Beller
2013-08-20 18:38 ` Johannes Sixt
2013-08-20 18:57 ` René Scharfe
2013-08-20 22:36 ` Stefan Beller
2013-08-20 22:38 ` [PATCH] " Stefan Beller
2013-08-21 8:25 ` Jonathan Nieder
2013-08-21 10:37 ` Stefan Beller
2013-08-21 17:25 ` Stefan Beller
2013-08-21 17:28 ` [RFC PATCHv6 1/2] " Stefan Beller
2013-08-21 17:28 ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
2013-08-21 20:56 ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
2013-08-21 21:52 ` Matthieu Moy
2013-08-21 22:15 ` Stefan Beller
2013-08-21 22:50 ` Junio C Hamano
2013-08-21 22:57 ` Stefan Beller
2013-08-22 10:46 ` Johannes Sixt
2013-08-22 21:03 ` Jonathan Nieder
2013-08-21 8:49 ` [PATCH] " Matthieu Moy
2013-08-21 12:47 ` Stefan Beller
2013-08-21 13:05 ` Matthieu Moy
2013-08-21 12:53 ` Stefan Beller
2013-08-21 13:07 ` Matthieu Moy
2013-08-22 10:46 ` Johannes Sixt
2013-08-22 10:46 ` Johannes Sixt
2013-08-22 20:06 ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
2013-08-22 20:31 ` Junio C Hamano
2013-08-20 22:46 ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
2013-08-21 9:20 ` Johannes Sixt
2013-08-20 21:24 ` Stefan Beller
2013-08-20 21:34 ` Jonathan Nieder
2013-08-20 21:40 ` Dokumenting api-paths.txt Stefan Beller
2013-08-20 21:59 ` Jonathan Nieder
2013-08-21 22:43 ` Stefan Beller
2013-08-22 17:29 ` Junio C Hamano
2013-08-14 22:51 ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
2013-08-14 22:59 ` Matthieu Moy
2013-08-15 7:47 ` Stefan Beller
2013-08-15 4:15 ` Duy Nguyen
2013-08-14 17:26 ` Junio C Hamano
2013-08-14 22:51 ` Matthieu Moy
2013-08-14 23:25 ` Martin Fick
2013-08-15 0:26 ` Martin Fick
2013-08-15 7:46 ` Stefan Beller
2013-08-15 15:04 ` Martin Fick
2013-08-15 4:20 ` Duy Nguyen
2013-08-14 17:04 ` Junio C Hamano
2013-08-15 7:53 ` Stefan Beller
2013-08-14 7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52136F9C.6030308@kdbg.org \
--to=j6t@kdbg.org \
--cc=Matthieu.Moy@grenoble-inp.fr \
--cc=apelisse@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=iveqy@iveqy.com \
--cc=l.s.r@web.de \
--cc=mackyle@gmail.com \
--cc=mfick@codeaurora.org \
--cc=pclouds@gmail.com \
--cc=stefanbeller@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).