git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <stefanbeller@googlemail.com>
Cc: git@vger.kernel.org, mfick@codeaurora.org, apelisse@gmail.com,
	Matthieu.Moy@grenoble-inp.fr, pclouds@gmail.com, iveqy@iveqy.com,
	gitster@pobox.com, mackyle@gmail.com, j6t@kdbg.org
Subject: Re: [PATCH] repack: rewrite the shell script in C.
Date: Wed, 21 Aug 2013 01:25:27 -0700	[thread overview]
Message-ID: <20130821082527.GC2802@elie.Belkin> (raw)
In-Reply-To: <1377038334-15799-1-git-send-email-stefanbeller@googlemail.com>

Hi,

Stefan Beller wrote:

> [PATCH] repack: rewrite the shell script in C.

Thanks for your work so far.  This review will have mostly cosmetic
notes.  Hopefully others can try it out to see if the actual behavior
is good.

As a first nit: in git, as usual in emails, the style in subject lines
is not to end with a period.  The above subject line is otherwise good
(a nice summary that quickly explains the effect, which is handy in
e.g. abbreviated changelogs from release announcements).

> This is the beginning of the rewrite of the repacking.

This is a place to explain

 - the motivation / intended positive effect of the patch
 - any noticeable behavior changes
 - complications and other hints for people looking back and trying
   to understand this code

Based on the discussion before, I think the motivation is to get
closer to a goal of being able to have a core subset of git
functionality built in to git.  That would mean

 * people on Windows could get a copy of at least the core parts
   of Git without having to install a Unix-style shell

 * people deploying to servers don't have to rewrite the #! line
   or worry about the PATH and quality of installed POSIX
   utilities, if they are only using the built-in part written
   in C 

This patch is meant to be mostly a literal translation of the
git-repack script; the intent is that later patches would start using
more library facilities, but this patch is meant to be as close to a
no-op as possible so it doesn't do that kind of thing.

> All tests are constantly positive now.

This kind of changes-since-the-previous-iteration information that
doesn't need to be recorded in the commit log for posterity goes
after the "---" marker.

> 
> Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
> ---
>  Makefile                                        |   2 +-
[...]
> --- /dev/null
> +++ b/builtin/repack.c
> @@ -0,0 +1,361 @@
[...]
> +static int delta_base_offset = 0;

The "= 0" is automatic for statics without an initializer.  The
prevailing style in git is to leave it out.

Behavior change: in the script, wasn't the default "true"?

[...]
> +static void remove_temporary_files() {

Style: argument list should have "void".  (In C89 and C99, an empty
argument list means "having unspecified arguments" instead of "having
no arguments" as in C++.)

> +	DIR *dir;
> +	struct dirent *e;
> +	char *prefix, *path;
> +
> +	prefix = mkpathdup(".tmp-%d-pack", getpid());

pid_t is not guaranteed to be an "int", so this needs a cast.

> +	path = mkpathdup("%s/pack", get_object_directory());

The names "prefix" and "path" are quite generic.  What does this
function do?  A comment could help, e.g.:

	/*
	 * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
	 */

> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {

What happens if the directory does not exist?

> +		if (!prefixcmp(e->d_name, prefix)) {

The git-repack script removes $PACKTMP-*, but this code matches $PACKTMP*
instead.  Intentional?

> +			struct strbuf fname = STRBUF_INIT;
> +			strbuf_addf(&fname, "%s/%s", path, e->d_name);
> +			unlink(strbuf_detach(&fname, NULL));

This leaks fname (see Documentation/technical/api-strbuf.txt for an
explanation of strbuf_detach).

> +		}
> +	}
> +	free(prefix);
> +	free(path);
> +	closedir(dir);

I wonder if it would make sense for buffers to share space here.
E.g. something like

 {
	/*
	 * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
	 */

	struct strbuf buf = STRBUF_INIT;
	size_t dirlen, prefixlen;
	DIR *dir;
	struct dirent *e;

	/* .git/objects/pack */
	strbuf_addstr(&buf, get_object_directory());
	strbuf_addstr(&buf, "/pack");
	dir = opendir(buf.buf);
	if (!dir)
		... handle error ...

	/* .git/objects/pack/.tmp-$$-pack-* */
	dirlen = buf.len + 1;
	strbuf_addf(&buf, "/.tmp-%d-pack-", getpid());
	prefixlen = buf.len - dirlen;

	while ((e = readdir(dir))) {
		if (strncmp(e->d_name, buf.buf + dirlen, prefixlen))
			continue;
		strbuf_setlen(&buf, dirlen);
		strbuf_addstr(&buf, e->d_name);
		unlink(buf.buf);
	}
	if (closedir(dir))
		... handle error ...

	strbuf_release(&buf);
 }

I dunno.

[...]
> +/*
> + * Fills the filename list with all the files found in the pack directory
> + * ending with .pack, without that extension.
> + */

Ideally a comment opening a function will save lazy readers the
trouble of reading the body of the function, by explaining what the
function is for and giving them some reliable summary of what its
effect will be.

The above comment doesn't do either: it doesn't make it clear why
the function exists, and it doesn't make the semantics precise:
should fname_list be empty before this function is called?  Are the
filenames filling it absolute or relative?  What happens if packdir
is unreadable or doesn't exist?  What happens to files without a .pack
extension?

Also, the above comment is in the wrong part of the file to maximally
help a lazy reader: it should be at the call site, so the reader
doesn't have to look for the function's definition at all.

> +void get_pack_filenames(char *packdir, struct string_list *fname_list)

File-local function should be static.

Missing "const" on packdir.

> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *path, *suffix, *fname;
> +
> +	path = mkpath("%s/pack", get_object_directory());

Whenever I see the result of "mkpath" stored for more than a couple
of lines, I fret a little (since it's easy to scribble over the
rotating list of 4 get_pathname() buffers).  Would doing

	dir = opendir(mkpath(...))

directly work?  By the way, why does this function both compute
packdir and take it as an argument?

> +	suffix = ".pack";

Why not pass the string directly to suffixcmp and strlen?

> +	dir = opendir(path);

What happens if the packdir does not exist or cannot be read, or
another error occurs?

> +	while ((e = readdir(dir)) != NULL) {
> +		if (!suffixcmp(e->d_name, suffix)) {

Can decrease the indent and deal with the boring case early
by reversing the test:

		if (suffixcmp(...))
			continue;

[...]
> +void remove_pack(char *path, char* sha1)

Missing "const" on path.  The * in pointers sticks to the variable
name instead of its type.

> +{
> +	char *exts[] = {".pack", ".idx", ".keep"};
> +	int ext = 0;
> +	for (ext = 0; ext < 3; ext++) {

String constants are allowed in C to be assigned to a char * for
historical reasons, but it's never a good idea :), since they're
not mutable.

Array index is being assigned twice.  ARRAY_SIZE could make this
clearer:

 {
	const char *exts[] = {".pack", ... };
	int i;

	for (i = 0; i < ARRAY_SIZE(exts); i++)
		unlink(mkpath("%s/pack-%s%s", packdir, sha1, exts[i]));
 }

Is the sha1 parameter actually a sha1?

It wasn't obvious to me at first what this function is for.  Maybe
a name like remove_redundant_pack() would work.  E.g.:

	static void remove_redundant_pack(const char *pack_sha1)
	{
		const char *exts[] = {".pack", ".idx", ".keep"};
		struct strbuf buf = STRBUF_INIT;
		size_t plen;
		int i;

		strbuf_addf(&buf, "%s/pack-%s", get_object_directory(), pack_sha1);
		plen = buf.len;

		for (i = 0; i < ARRAY_SIZE(exts); i++) {
			strbuf_setlen(&buf, plen);
			strbuf_addstr(&buf, exts[i]);
			unlink(buf.buf);
		}
	}

[...]
> +int cmd_repack(int argc, const char **argv, const char *prefix) {
> +

Stray blank line.

> +	char *exts[2] = {".idx", ".pack"};

Missing "const".

> +	char *packdir, *packtmp, line[1024];

Fixed-size buffers are no fun.  Better to use a strbuf with
e.g. strbuf_getline (see Documentation/technical/api-strbuf.txt).

[...]
> +	struct option builtin_repack_options[] = {
> +		OPT_BOOL('a', NULL, &pack_everything,
> +				N_("pack everything in a single pack")),
> +		OPT_BOOL('A', NULL, &pack_everything_but_loose,
> +				N_("same as -a, and turn unreachable objects loose")),
> +		OPT_BOOL('d', NULL, &delete_redundant,
> +				N_("remove redundant packs, and run git-prune-packed")),
> +		OPT_BOOL('f', NULL, &no_reuse_delta,
> +				N_("pass --no-reuse-delta to git-pack-objects")),
> +		OPT_BOOL('F', NULL, &no_reuse_object,
> +				N_("pass --no-reuse-object to git-pack-objects")),
> +		OPT_BOOL('n', NULL, &no_update_server_info,
> +				N_("do not run git-update-server-info")),
> +		OPT__QUIET(&quiet, N_("be quiet")),
> +		OPT_BOOL('l', "local", &local,
> +				N_("pass --local to git-pack-objects")),
> +		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
> +				N_("with -A, do not loosen objects older than this Packing constraints")),

Do you mean
		... than this")),

		OPT_GROUP(N_("Packing constraints")),

?

[...]
> +	git_config(repack_config, NULL);
> +
> +	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> +				git_repack_usage, 0);
> +
> +	sigchain_push_common(remove_pack_on_signal);
> +
> +	packdir = mkpathdup("%s/pack", get_object_directory());
> +	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());

This filename pattern also appears higher in the file.  Maybe it's
possible for them to share a constant or something?  (If it's too
much fuss, no need to bother.)

> +	argv_array_push(&cmd_args, "pack-objects");
> +	argv_array_push(&cmd_args, "--keep-true-parents");
> +	argv_array_push(&cmd_args, "--honor-pack-keep");
> +	argv_array_push(&cmd_args, "--non-empty");
> +	argv_array_push(&cmd_args, "--all");
> +	argv_array_push(&cmd_args, "--reflog");
> +
> +	if (window)
> +		argv_array_pushf(&cmd_args, "--window=%u", window);
> +
> +	if (window_memory)
> +		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);

Style: I'd leave out these blank lines, so the reader can see more
of the arguments in one screenful.

Are these signed or unsigned integers?  What happens if I pass
a value < 0 or > INT_MAX?

[...]
> +	if (pack_everything + pack_everything_but_loose == 0) {

Probably easier to read as

	if (!pack_everything && !pack_everything_but_loose) {

> +		argv_array_push(&cmd_args, "--unpacked");
> +		argv_array_push(&cmd_args, "--incremental");
> +	} else {
> +		struct string_list fname_list = STRING_LIST_INIT_DUP;
> +		get_pack_filenames(packdir, &fname_list);
> +		for_each_string_list_item(item, &fname_list) {

The git-repack script uses "find", which scans the directory
recursively, though I'm not sure why.  (Probably not important?)

Instead of building a temporary and potentially long string_list
and then pruning it to build another list, why not build the list
of packs without a corresponding .keep file in a single pass?

> +			char *fname;
> +			fname = mkpathdup("%s/%s.keep", packdir, item->string);
> +			if (file_exists(fname)) {
> +				/* when the keep file is there, we're ignoring that pack */
> +			} else {
> +				string_list_append(&existing_packs, item->string);
> +			}

Simplifying, and avoiding braces around a single-line "if" body:

			if (!file_exists(mkpath(...)))
				string_list_append(...);

[...]
> +		if (existing_packs.nr && delete_redundant) {
> +			if (unpack_unreachable)
> +				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);

Long line.

[...]
> +	count_packs = 0;
> +	out = xfdopen(cmd.out, "r");
> +	while (fgets(line, sizeof(line), out)) {
> +		/* a line consists of 40 hex chars + '\n' */

Time to sleep.  Stopping here for this round.

Hope that helps,
Jonathan

  reply	other threads:[~2013-08-21  8:25 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
2013-08-14  7:26   ` Matthieu Moy
2013-08-14 16:26     ` Stefan Beller
2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
2013-08-14 16:49         ` Antoine Pelisse
2013-08-14 17:04           ` Stefan Beller
2013-08-14 17:19             ` Jeff King
2013-08-14 17:25           ` Martin Fick
2013-08-14 22:16             ` Stefan Beller
2013-08-14 22:28               ` Martin Fick
2013-08-14 22:53                 ` Junio C Hamano
2013-08-14 23:28                   ` Martin Fick
2013-08-15 17:15                     ` Junio C Hamano
2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
2013-08-17 13:34                         ` René Scharfe
2013-08-17 19:18                           ` Kyle J. McKay
2013-08-18 14:34                           ` Stefan Beller
2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
2013-08-18 15:41                               ` Kyle J. McKay
2013-08-18 16:44                               ` René Scharfe
2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
2013-08-19 23:23                                   ` Stefan Beller
2013-08-20 13:31                                     ` Johannes Sixt
2013-08-20 15:08                                       ` Stefan Beller
2013-08-20 18:38                                         ` Johannes Sixt
2013-08-20 18:57                                         ` René Scharfe
2013-08-20 22:36                                           ` Stefan Beller
2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
2013-08-21  8:25                                               ` Jonathan Nieder [this message]
2013-08-21 10:37                                                 ` Stefan Beller
2013-08-21 17:25                                                 ` Stefan Beller
2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
2013-08-21 21:52                                                       ` Matthieu Moy
2013-08-21 22:15                                                       ` Stefan Beller
2013-08-21 22:50                                                         ` Junio C Hamano
2013-08-21 22:57                                                           ` Stefan Beller
2013-08-22 10:46                                                         ` Johannes Sixt
2013-08-22 21:03                                                       ` Jonathan Nieder
2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
2013-08-21 12:47                                                 ` Stefan Beller
2013-08-21 13:05                                                   ` Matthieu Moy
2013-08-21 12:53                                                 ` Stefan Beller
2013-08-21 13:07                                                   ` Matthieu Moy
2013-08-22 10:46                                                     ` Johannes Sixt
2013-08-22 10:46                                                 ` Johannes Sixt
2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
2013-08-22 20:31                                                     ` Junio C Hamano
2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
2013-08-21  9:20                                             ` Johannes Sixt
2013-08-20 21:24                                       ` Stefan Beller
2013-08-20 21:34                                         ` Jonathan Nieder
2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
2013-08-20 21:59                                             ` Jonathan Nieder
2013-08-21 22:43                                               ` Stefan Beller
2013-08-22 17:29                                                 ` Junio C Hamano
2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
2013-08-14 22:59                 ` Matthieu Moy
2013-08-15  7:47                   ` Stefan Beller
2013-08-15  4:15             ` Duy Nguyen
2013-08-14 17:26           ` Junio C Hamano
2013-08-14 22:51           ` Matthieu Moy
2013-08-14 23:25             ` Martin Fick
2013-08-15  0:26               ` Martin Fick
2013-08-15  7:46               ` Stefan Beller
2013-08-15 15:04                 ` Martin Fick
2013-08-15  4:20             ` Duy Nguyen
2013-08-14 17:04         ` Junio C Hamano
2013-08-15  7:53           ` Stefan Beller
2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130821082527.GC2802@elie.Belkin \
    --to=jrnieder@gmail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=apelisse@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=iveqy@iveqy.com \
    --cc=j6t@kdbg.org \
    --cc=mackyle@gmail.com \
    --cc=mfick@codeaurora.org \
    --cc=pclouds@gmail.com \
    --cc=stefanbeller@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).