git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] Rewriting git-repack in C
@ 2013-08-13 19:23 Stefan Beller
  2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
  2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-13 19:23 UTC
  To: git, Matthieu.Moy, pclouds, iveqy, gitster, apelisse; +Cc: Stefan Beller

Hello,

A few days ago, I asked how I should proceed if I wanted to rewrite git-repack.
The general consensus (Duy, Junio and Matthieu) was to not touch
git-pack-objects, but rather to translate the shell version of git-repack
to C.
I'll send a very rough patch, which still contains 2 todos, so it is not yet
feature complete. I was using the run_command api and the string-list api
a lot.
The following patch is definitely not ready for a detailed review, but
I'd still like feedback on whether this seems like the right approach.

Also I'd like to propose a small technical change:
I found no easy way to get a subset of files in a specific directory,
which is very easy in the shell version via * (rm file/in/dir/startswithprefix-*),
So maybe instead of just prefixing the temporary files such as:
$PACKDIR/.tmp-$PID-pack-*
we could put all the temporary files in a temporary directory.
This would come in handy, as there already exist functions to manipulate
a directory recursively (such as remove_dir_recursively).

This patch requires the patches from jc/parseopt-command-modes.

Stefan

Stefan Beller (1):
  repack: rewrite the shell script in C.

 Makefile                       |   2 +-
 builtin.h                      |   1 +
 builtin/repack.c               | 313 +++++++++++++++++++++++++++++++++++++++++
 contrib/examples/git-repack.sh | 194 +++++++++++++++++++++++++
 git-repack.sh                  | 194 -------------------------
 git.c                          |   1 +
 6 files changed, 510 insertions(+), 195 deletions(-)
 create mode 100644 builtin/repack.c
 create mode 100755 contrib/examples/git-repack.sh
 delete mode 100755 git-repack.sh

-- 
1.8.4.rc2


* [PATCH] repack: rewrite the shell script in C.
  2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
@ 2013-08-13 19:23 ` Stefan Beller
  2013-08-14  7:26   ` Matthieu Moy
  2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-13 19:23 UTC
  To: git, Matthieu.Moy, pclouds, iveqy, gitster, apelisse; +Cc: Stefan Beller

This is the beginning of the rewrite of the repacking.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                       |   2 +-
 builtin.h                      |   1 +
 builtin/repack.c               | 313 +++++++++++++++++++++++++++++++++++++++++
 contrib/examples/git-repack.sh | 194 +++++++++++++++++++++++++
 git-repack.sh                  | 194 -------------------------
 git.c                          |   1 +
 6 files changed, 510 insertions(+), 195 deletions(-)
 create mode 100644 builtin/repack.c
 create mode 100755 contrib/examples/git-repack.sh
 delete mode 100755 git-repack.sh

diff --git a/Makefile b/Makefile
index 3588ca1..69e5267 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -971,6 +970,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..bfec56d
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,313 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <dirent.h>
+
+static const char * const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+static int delta_base_offset = 1; // enabled by default since 22c79eab (2008-06-25)
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	struct strbuf packtmpbuf = STRBUF_INIT;
+	/* todo: the files are not in a subdirectory, but only a subset in %s/pack, see disscussion */
+	strbuf_addf(&packtmpbuf, "%s/pack/.tmp-%d-pack", get_object_directory(), getpid());
+	remove_dir_recursively(&packtmpbuf, 0);
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	int unpack_unreachable = 0;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	const char *cmd_args[20];
+	int cmd_i = 0;
+	struct child_process cmd;
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_DATE(0, "unpack-unreachable", &unpack_unreachable,
+				N_("with -A, do not loosen objects older than this Packing constraints")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpath("%s/pack", get_object_directory());
+	packtmp = mkpath("%s/.tmp-%d-pack", packdir, getpid());
+
+	struct strbuf packtmpbuf = STRBUF_INIT;
+	strbuf_addf(&packtmpbuf, "%s", packtmp);
+	remove_dir_recursively(&packtmpbuf, 0);
+
+	cmd_args[cmd_i++] = "pack-objects";
+	cmd_args[cmd_i++] = "--keep-true-parents";
+	cmd_args[cmd_i++] = "--honor-pack-keep";
+	cmd_args[cmd_i++] = "--non-empty";
+	cmd_args[cmd_i++] = "--all";
+	cmd_args[cmd_i++] = "--reflog";
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		cmd_args[cmd_i++] = "--unpacked";
+		cmd_args[cmd_i++] = "--incremental";
+	} else {
+		// todo:
+		// handle -a and -A option here.
+		// find all *.pack in pack dir:
+
+/*
+	if [ -d "$PACKDIR" ]; then
+		# cd into packdir and find all packs, sed will remove beginning ./ and ending .pack
+		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
+			| sed -e 's/^\.\///' -e 's/\.pack$//'`
+		do
+			if [ -e "$PACKDIR/$e.keep" ]; then
+				: keep
+				# do nothing
+			else
+				# add the pack to the existing list
+				existing="$existing $e"
+			fi
+		done
+		# -n string     True if the length of string is nonzero.
+		# -a and
+		if test -n "$existing" -a -n "$unpack_unreachable" -a \
+			-n "$remove_redundant"
+		then
+			# This may have arbitrary user arguments, so we
+			# have to protect it against whitespace splitting
+			# when it gets run as "pack-objects $args" later.
+			# Fortunately, we know it's an approxidate, so we
+			# can just use dots instead.
+			args="$args $(echo "$unpack_unreachable" | tr ' ' .)"
+		fi
+	fi
+*/
+	}
+
+	if (local)
+		cmd_args[cmd_i++] = "--local";
+
+	if (delta_base_offset)
+		cmd_args[cmd_i++] = "--delta-base-offset";
+
+	cmd_args[cmd_i++] = packtmp;
+	cmd_args[cmd_i] = NULL;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = cmd_args;
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	run_command(&cmd);
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		assert(strlen(line) == 41); // 40 chars for sha1 + '\n'
+		line[40] = '\0'; // remove '\n'
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	char *fname, *fname_old;
+	fname = xmalloc(strlen(packdir) + 47); // 40 chars for the sha1
+	strcpy(fname, packdir);
+	strcpy(fname + strlen(packdir), "/");
+
+	fname_old = xmalloc(strlen(packdir) + 52); // 40 chars for the sha1
+	strcpy(fname_old, packdir);
+	strcpy(fname_old + strlen(packdir), "/");
+	char *exts[2] = {".idx", ".pack"};
+
+	int failed = 0;
+
+	struct string_list_item *item;
+	for_each_string_list_item(item, &names) {
+		int ext;
+		for (ext = 0; ext < 1; ext++) {
+			strcpy(fname + strlen(packdir) + 1, item->string);
+			strcpy(fname + strlen(packdir) + 41, exts[ext]);
+			if (!file_exists(fname))
+				continue;
+
+			strcpy(fname_old, packdir);
+			strcpy(fname_old + strlen(packdir), "/old-");
+			strcpy(fname_old + strlen(packdir) + 5, item->string);
+			strcpy(fname_old + strlen(packdir) + 45, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break while loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+
+		for_each_string_list_item(item, &rollback) {
+			sprintf(fname, "%s/%s", packdir, item->string);
+			sprintf(fname_old, "%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	// Now the ones with the same name are out of the way...
+	struct string_list fullbases = STRING_LIST_INIT_DUP;
+	for_each_string_list_item(item, &names) {
+		string_list_append(&fullbases, item->string);
+
+		sprintf(fname, "%s/pack-%s.pack", packdir, item->string);
+		sprintf(fname_old, "%s-%s.pack", packtmp, item->string);
+		chmod(fname_old, 0660);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+
+		sprintf(fname, "%s/pack-%s.idx", packdir, item->string);
+		sprintf(fname_old, "%s-%s.idx", packtmp, item->string);
+		chmod(fname_old, 0660); // todo, in shell there was "chmod a-w ..."
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		sprintf(fname, "%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+
+		sprintf(fname, "%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		// todo
+		/*
+			# We know $existing are all redundant.
+			if [ -n "$existing" ]
+			then
+				( cd "$PACKDIR" &&
+				  for e in $existing
+				  do
+					case " $fullbases " in
+					*" $e "*) ;;
+					*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+					esac
+				  done
+				)
+			fi
+			git prune-packed ${GIT_QUIET:+-q}
+		*/
+	}
+
+	if (!no_update_server_info) {
+		cmd_i = 0;
+		cmd_args[cmd_i++] = "update-server-info";
+		cmd_args[cmd_i++] = NULL;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
+
diff --git a/contrib/examples/git-repack.sh b/contrib/examples/git-repack.sh
new file mode 100755
index 0000000..7579331
--- /dev/null
+++ b/contrib/examples/git-repack.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Linus Torvalds
+#
+
+OPTIONS_KEEPDASHDASH=
+OPTIONS_SPEC="\
+git repack [options]
+--
+a               pack everything in a single pack
+A               same as -a, and turn unreachable objects loose
+d               remove redundant packs, and run git-prune-packed
+f               pass --no-reuse-delta to git-pack-objects
+F               pass --no-reuse-object to git-pack-objects
+n               do not run git-update-server-info
+q,quiet         be quiet
+l               pass --local to git-pack-objects
+unpack-unreachable=  with -A, do not loosen objects older than this
+ Packing constraints
+window=         size of the window used for delta compression
+window-memory=  same as the above, but limit memory size instead of entries count
+depth=          limits the maximum delta depth
+max-pack-size=  maximum size of each packfile
+"
+SUBDIRECTORY_OK='Yes'
+. git-sh-setup
+
+no_update_info= all_into_one= remove_redundant= unpack_unreachable=
+local= no_reuse= extra=
+while test $# != 0
+do
+	case "$1" in
+	-n)	no_update_info=t ;;
+	-a)	all_into_one=t ;;
+	-A)	all_into_one=t
+		unpack_unreachable=--unpack-unreachable ;;
+	--unpack-unreachable)
+		unpack_unreachable="--unpack-unreachable=$2"; shift ;;
+	-d)	remove_redundant=t ;;
+	-q)	GIT_QUIET=t ;;
+	-f)	no_reuse=--no-reuse-delta ;;
+	-F)	no_reuse=--no-reuse-object ;;
+	-l)	local=--local ;;
+	--max-pack-size|--window|--window-memory|--depth)
+		extra="$extra $1=$2"; shift ;;
+	--) shift; break;;
+	*)	usage ;;
+	esac
+	shift
+done
+
+case "`git config --bool repack.usedeltabaseoffset || echo true`" in
+true)
+	extra="$extra --delta-base-offset" ;;
+esac
+
+PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
+PACKTMP="$PACKDIR/.tmp-$$-pack"
+rm -f "$PACKTMP"-*
+trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
+
+# There will be more repacking strategies to come...
+case ",$all_into_one," in
+,,)
+	args='--unpacked --incremental'
+	;;
+,t,)
+	args= existing=
+	if [ -d "$PACKDIR" ]; then
+		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
+			| sed -e 's/^\.\///' -e 's/\.pack$//'`
+		do
+			if [ -e "$PACKDIR/$e.keep" ]; then
+				: keep
+			else
+				existing="$existing $e"
+			fi
+		done
+		if test -n "$existing" -a -n "$unpack_unreachable" -a \
+			-n "$remove_redundant"
+		then
+			# This may have arbitrary user arguments, so we
+			# have to protect it against whitespace splitting
+			# when it gets run as "pack-objects $args" later.
+			# Fortunately, we know it's an approxidate, so we
+			# can just use dots instead.
+			args="$args $(echo "$unpack_unreachable" | tr ' ' .)"
+		fi
+	fi
+	;;
+esac
+
+mkdir -p "$PACKDIR" || exit
+
+args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
+names=$(git pack-objects --keep-true-parents --honor-pack-keep --non-empty --all --reflog $args </dev/null "$PACKTMP") ||
+	exit 1
+if [ -z "$names" ]; then
+	say Nothing new to pack.
+fi
+
+# Ok we have prepared all new packfiles.
+
+# First see if there are packs of the same name and if so
+# if we can move them out of the way (this can happen if we
+# repacked immediately after packing fully.
+rollback=
+failed=
+for name in $names
+do
+	for sfx in pack idx
+	do
+		file=pack-$name.$sfx
+		test -f "$PACKDIR/$file" || continue
+		rm -f "$PACKDIR/old-$file" &&
+		mv "$PACKDIR/$file" "$PACKDIR/old-$file" || {
+			failed=t
+			break
+		}
+		rollback="$rollback $file"
+	done
+	test -z "$failed" || break
+done
+
+# If renaming failed for any of them, roll the ones we have
+# already renamed back to their original names.
+if test -n "$failed"
+then
+	rollback_failure=
+	for file in $rollback
+	do
+		mv "$PACKDIR/old-$file" "$PACKDIR/$file" ||
+		rollback_failure="$rollback_failure $file"
+	done
+	if test -n "$rollback_failure"
+	then
+		echo >&2 "WARNING: Some packs in use have been renamed by"
+		echo >&2 "WARNING: prefixing old- to their name, in order to"
+		echo >&2 "WARNING: replace them with the new version of the"
+		echo >&2 "WARNING: file.  But the operation failed, and"
+		echo >&2 "WARNING: attempt to rename them back to their"
+		echo >&2 "WARNING: original names also failed."
+		echo >&2 "WARNING: Please rename them in $PACKDIR manually:"
+		for file in $rollback_failure
+		do
+			echo >&2 "WARNING:   old-$file -> $file"
+		done
+	fi
+	exit 1
+fi
+
+# Now the ones with the same name are out of the way...
+fullbases=
+for name in $names
+do
+	fullbases="$fullbases pack-$name"
+	chmod a-w "$PACKTMP-$name.pack"
+	chmod a-w "$PACKTMP-$name.idx"
+	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
+	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
+	exit
+done
+
+# Remove the "old-" files
+for name in $names
+do
+	rm -f "$PACKDIR/old-pack-$name.idx"
+	rm -f "$PACKDIR/old-pack-$name.pack"
+done
+
+# End of pack replacement.
+
+if test "$remove_redundant" = t
+then
+	# We know $existing are all redundant.
+	if [ -n "$existing" ]
+	then
+		( cd "$PACKDIR" &&
+		  for e in $existing
+		  do
+			case " $fullbases " in
+			*" $e "*) ;;
+			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			esac
+		  done
+		)
+	fi
+	git prune-packed ${GIT_QUIET:+-q}
+fi
+
+case "$no_update_info" in
+t) : ;;
+*) git update-server-info ;;
+esac
diff --git a/git-repack.sh b/git-repack.sh
deleted file mode 100755
index 7579331..0000000
--- a/git-repack.sh
+++ /dev/null
@@ -1,194 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Linus Torvalds
-#
-
-OPTIONS_KEEPDASHDASH=
-OPTIONS_SPEC="\
-git repack [options]
---
-a               pack everything in a single pack
-A               same as -a, and turn unreachable objects loose
-d               remove redundant packs, and run git-prune-packed
-f               pass --no-reuse-delta to git-pack-objects
-F               pass --no-reuse-object to git-pack-objects
-n               do not run git-update-server-info
-q,quiet         be quiet
-l               pass --local to git-pack-objects
-unpack-unreachable=  with -A, do not loosen objects older than this
- Packing constraints
-window=         size of the window used for delta compression
-window-memory=  same as the above, but limit memory size instead of entries count
-depth=          limits the maximum delta depth
-max-pack-size=  maximum size of each packfile
-"
-SUBDIRECTORY_OK='Yes'
-. git-sh-setup
-
-no_update_info= all_into_one= remove_redundant= unpack_unreachable=
-local= no_reuse= extra=
-while test $# != 0
-do
-	case "$1" in
-	-n)	no_update_info=t ;;
-	-a)	all_into_one=t ;;
-	-A)	all_into_one=t
-		unpack_unreachable=--unpack-unreachable ;;
-	--unpack-unreachable)
-		unpack_unreachable="--unpack-unreachable=$2"; shift ;;
-	-d)	remove_redundant=t ;;
-	-q)	GIT_QUIET=t ;;
-	-f)	no_reuse=--no-reuse-delta ;;
-	-F)	no_reuse=--no-reuse-object ;;
-	-l)	local=--local ;;
-	--max-pack-size|--window|--window-memory|--depth)
-		extra="$extra $1=$2"; shift ;;
-	--) shift; break;;
-	*)	usage ;;
-	esac
-	shift
-done
-
-case "`git config --bool repack.usedeltabaseoffset || echo true`" in
-true)
-	extra="$extra --delta-base-offset" ;;
-esac
-
-PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
-PACKTMP="$PACKDIR/.tmp-$$-pack"
-rm -f "$PACKTMP"-*
-trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
-
-# There will be more repacking strategies to come...
-case ",$all_into_one," in
-,,)
-	args='--unpacked --incremental'
-	;;
-,t,)
-	args= existing=
-	if [ -d "$PACKDIR" ]; then
-		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
-			| sed -e 's/^\.\///' -e 's/\.pack$//'`
-		do
-			if [ -e "$PACKDIR/$e.keep" ]; then
-				: keep
-			else
-				existing="$existing $e"
-			fi
-		done
-		if test -n "$existing" -a -n "$unpack_unreachable" -a \
-			-n "$remove_redundant"
-		then
-			# This may have arbitrary user arguments, so we
-			# have to protect it against whitespace splitting
-			# when it gets run as "pack-objects $args" later.
-			# Fortunately, we know it's an approxidate, so we
-			# can just use dots instead.
-			args="$args $(echo "$unpack_unreachable" | tr ' ' .)"
-		fi
-	fi
-	;;
-esac
-
-mkdir -p "$PACKDIR" || exit
-
-args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
-names=$(git pack-objects --keep-true-parents --honor-pack-keep --non-empty --all --reflog $args </dev/null "$PACKTMP") ||
-	exit 1
-if [ -z "$names" ]; then
-	say Nothing new to pack.
-fi
-
-# Ok we have prepared all new packfiles.
-
-# First see if there are packs of the same name and if so
-# if we can move them out of the way (this can happen if we
-# repacked immediately after packing fully.
-rollback=
-failed=
-for name in $names
-do
-	for sfx in pack idx
-	do
-		file=pack-$name.$sfx
-		test -f "$PACKDIR/$file" || continue
-		rm -f "$PACKDIR/old-$file" &&
-		mv "$PACKDIR/$file" "$PACKDIR/old-$file" || {
-			failed=t
-			break
-		}
-		rollback="$rollback $file"
-	done
-	test -z "$failed" || break
-done
-
-# If renaming failed for any of them, roll the ones we have
-# already renamed back to their original names.
-if test -n "$failed"
-then
-	rollback_failure=
-	for file in $rollback
-	do
-		mv "$PACKDIR/old-$file" "$PACKDIR/$file" ||
-		rollback_failure="$rollback_failure $file"
-	done
-	if test -n "$rollback_failure"
-	then
-		echo >&2 "WARNING: Some packs in use have been renamed by"
-		echo >&2 "WARNING: prefixing old- to their name, in order to"
-		echo >&2 "WARNING: replace them with the new version of the"
-		echo >&2 "WARNING: file.  But the operation failed, and"
-		echo >&2 "WARNING: attempt to rename them back to their"
-		echo >&2 "WARNING: original names also failed."
-		echo >&2 "WARNING: Please rename them in $PACKDIR manually:"
-		for file in $rollback_failure
-		do
-			echo >&2 "WARNING:   old-$file -> $file"
-		done
-	fi
-	exit 1
-fi
-
-# Now the ones with the same name are out of the way...
-fullbases=
-for name in $names
-do
-	fullbases="$fullbases pack-$name"
-	chmod a-w "$PACKTMP-$name.pack"
-	chmod a-w "$PACKTMP-$name.idx"
-	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
-	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
-	exit
-done
-
-# Remove the "old-" files
-for name in $names
-do
-	rm -f "$PACKDIR/old-pack-$name.idx"
-	rm -f "$PACKDIR/old-pack-$name.pack"
-done
-
-# End of pack replacement.
-
-if test "$remove_redundant" = t
-then
-	# We know $existing are all redundant.
-	if [ -n "$existing" ]
-	then
-		( cd "$PACKDIR" &&
-		  for e in $existing
-		  do
-			case " $fullbases " in
-			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
-			esac
-		  done
-		)
-	fi
-	git prune-packed ${GIT_QUIET:+-q}
-fi
-
-case "$no_update_info" in
-t) : ;;
-*) git update-server-info ;;
-esac
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc2


* Re: [PATCH] Rewriting git-repack in C
  2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
  2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
@ 2013-08-14  7:12 ` Matthieu Moy
  1 sibling, 0 replies; 72+ messages in thread
From: Matthieu Moy @ 2013-08-14  7:12 UTC
  To: Stefan Beller; +Cc: git, pclouds, iveqy, gitster, apelisse

Stefan Beller <stefanbeller@googlemail.com> writes:

> Also I'd like to propose a small technical change:
> I found no easy way to get a subset of files in a specific directory,
> which is very easy in the shell version via * (rm file/in/dir/startswithprefix-*),
> So maybe instead of just prefixing the temporary files such as:
> $PACKDIR/.tmp-$PID-pack-*

If it's just a prefix, you can iterate over the full list of files, and
then use prefixcmp(...) to find the right files. May seem inefficient,
but AFAIK it's how the shell does wildcard expansion.

You should be careful that if the operation is interrupted, the next
"git gc" must remove all the garbage, so if you change the naming
convention, you must also change the place where the cleanup is done.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
@ 2013-08-14  7:26   ` Matthieu Moy
  2013-08-14 16:26     ` Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Matthieu Moy @ 2013-08-14  7:26 UTC
  To: Stefan Beller; +Cc: git, pclouds, iveqy, gitster, apelisse

Stefan Beller <stefanbeller@googlemail.com> writes:

> This is the beginning of the rewrite of the repacking.

(please, mark your patch as RFC/PATCH in the subject in this case)

A few quick comments on style below.

>  Makefile                       |   2 +-
>  builtin.h                      |   1 +
>  builtin/repack.c               | 313 +++++++++++++++++++++++++++++++++++++++++
>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++++++++
>  git-repack.sh                  | 194 -------------------------
>  git.c                          |   1 +

I suggested that you first enrich the test suite if needed. Did you
check that the testsuite had good enough coverage for git-repack?

> +static const char * const git_repack_usage[] = {

Style: no space after *.

> +static int delta_base_offset = 1; // enabled by default since 22c79eab (2008-06-25)

No // comments please (they're C99, not portable C90).

> +int cmd_repack(int argc, const char **argv, const char *prefix) {

Brace on the next line.

> +	fname = xmalloc(strlen(packdir) + 47); // 40 chars for the sha1

> +	fname_old = xmalloc(strlen(packdir) + 52); // 40 chars for the sha1

I'd rather have "40 + strlen("whatever")" explicitly.
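
As a rough illustration of what that could look like (a sketch only:
pack_path() and SHA1_HEXSZ are hypothetical names, and plain malloc()
stands in for git's xmalloc()), each term in the size computation
corresponds to one piece of the resulting path:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_HEXSZ 40 /* hypothetical name: length of a hex-encoded SHA-1 */

/*
 * Build "<packdir>/pack-<sha1><ext>" in a freshly allocated buffer.
 * The caller frees the result. Returns NULL if allocation fails.
 */
static char *pack_path(const char *packdir, const char *sha1_hex,
		       const char *ext)
{
	size_t len = strlen(packdir) + strlen("/pack-")
		+ SHA1_HEXSZ + strlen(ext) + 1; /* +1 for the trailing NUL */
	char *path = malloc(len);

	if (!path)
		return NULL;
	/* snprintf never writes more than len bytes, including the NUL */
	snprintf(path, len, "%s/pack-%s%s", packdir, sha1_hex, ext);
	return path;
}
```

Spelled out this way, a reader can check the allocation against the
format string without having to work out where 47 or 52 came from.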

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-14  7:26   ` Matthieu Moy
@ 2013-08-14 16:26     ` Stefan Beller
  2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-14 16:26 UTC
  To: Matthieu Moy; +Cc: git, pclouds, iveqy, gitster, apelisse

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

On 08/14/2013 09:26 AM, Matthieu Moy wrote:
> I suggested that you first enrich the test suite if needed. Did you
> check that the testsuite had good enough coverage for git-repack?

There are already quite a few tests that use git-repack while testing
other things. Also, git repack seems to be called internally from other
commands such as "git notes prune" or "git gc".

I'll look into enhancing the test suite once I have the rewritten
code working on the current test suite.

Stefan




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]


* [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:26     ` Stefan Beller
@ 2013-08-14 16:27       ` Stefan Beller
  2013-08-14 16:49         ` Antoine Pelisse
  2013-08-14 17:04         ` Junio C Hamano
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-14 16:27 UTC
  To: git, Matthieu.Moy, pclouds, iveqy, gitster, apelisse; +Cc: Stefan Beller

* Suggestions by Matthieu Moy have been included.
* I think it is completed apart from small todos and bugfixes.
* breaks the test suite, first breakage is t5301 (gc, sliding window)

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                       |   2 +-
 builtin.h                      |   1 +
 builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
 contrib/examples/git-repack.sh | 194 +++++++++++++++++++
 git-repack.sh                  | 194 -------------------
 git.c                          |   1 +
 6 files changed, 607 insertions(+), 195 deletions(-)
 create mode 100644 builtin/repack.c
 create mode 100755 contrib/examples/git-repack.sh
 delete mode 100755 git-repack.sh

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..d39c34e
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,410 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <dirent.h>
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files(void) {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path, *fname;
+
+	prefix = xmalloc(strlen(".tmp--pack") + 20 + 1); /* 20: room for any pid */
+	sprintf(prefix, ".tmp-%d-pack", getpid());
+
+	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
+	sprintf(path, "%s/pack", get_object_directory());
+
+	fname = xmalloc(strlen(path) + strlen("/")
+		+ strlen(prefix) + strlen("/")
+		+ 40 + strlen(".pack") + 1);
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			sprintf(fname, "%s/%s", path, e->d_name);
+			unlink(fname);
+		}
+	}
+	free(fname);
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+void get_pack_sha1_list(char *packdir, struct string_list *sha1_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix;
+
+	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
+	sprintf(path, "%s/pack", get_object_directory());
+
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			char buf[255], *sha1;
+			strcpy(buf, e->d_name);
+			buf[strlen(e->d_name) - strlen(suffix)] = '\0';
+			sha1 = &buf[strlen(e->d_name) - strlen(suffix) - 40];
+			string_list_append(sha1_list, sha1);
+		}
+	}
+	free(path);
+	closedir(dir);
+}
+
+void remove_pack(char *path, char* sha1)
+{
+	DIR *dir;
+	struct dirent *e;
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+
+		char *sha_begin, *sha_end;
+		sha_end = e->d_name + strlen(e->d_name);
+		while (sha_end > e->d_name && *sha_end != '.')
+			sha_end--;
+
+		/* do not touch files not ending in .pack, .idx or .keep */
+		if (strcmp(sha_end, ".pack") &&
+			strcmp(sha_end, ".idx") &&
+			strcmp(sha_end, ".keep"))
+			continue;
+
+		sha_begin = sha_end - 40;
+
+		if (sha_begin >= e->d_name && !strncmp(sha_begin, sha1, 40)) {
+			char *fname;
+			fname = xmalloc(strlen(path) + 1 + strlen(e->d_name) + 1);
+			sprintf(fname, "%s/%s", path, e->d_name);
+			unlink(fname);
+			free(fname);
+			/* keep scanning: .pack, .idx and .keep share the same sha1 */
+		}
+	}
+	closedir(dir);
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	unsigned long unpack_unreachable = 0;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	const char *cmd_args[20];
+	int cmd_i = 0;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct stat statbuffer;
+	char window_str[64], window_mem_str[64], depth_str[64], max_pack_size_str[64];
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_DATE(0, "unpack-unreachable", &unpack_unreachable,
+				N_("with -A, do not loosen objects older than this")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpath("%s/pack", get_object_directory());
+	packtmp = xmalloc(strlen(packdir) + strlen("/.tmp--pack") + 20 + 1);
+	sprintf(packtmp, "%s/.tmp-%d-pack", packdir, getpid());
+
+	remove_temporary_files();
+
+	cmd_args[cmd_i++] = "pack-objects";
+	cmd_args[cmd_i++] = "--keep-true-parents";
+	cmd_args[cmd_i++] = "--honor-pack-keep";
+	cmd_args[cmd_i++] = "--non-empty";
+	cmd_args[cmd_i++] = "--all";
+	cmd_args[cmd_i++] = "--reflog";
+
+	if (window) {
+		sprintf(window_str, "--window=%u", window);
+		cmd_args[cmd_i++] = window_str;
+	}
+	if (window_memory) {
+		sprintf(window_mem_str, "--window-memory=%u", window_memory);
+		cmd_args[cmd_i++] = window_mem_str;
+	}
+	if (depth) {
+		sprintf(depth_str, "--depth=%u", depth);
+		cmd_args[cmd_i++] = depth_str;
+	}
+	if (max_pack_size) {
+		sprintf(max_pack_size_str, "--max-pack-size=%u", max_pack_size);
+		cmd_args[cmd_i++] = max_pack_size_str;
+	}
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		cmd_args[cmd_i++] = "--unpacked";
+		cmd_args[cmd_i++] = "--incremental";
+	} else {
+		if (pack_everything_but_loose)
+			cmd_args[cmd_i++] = "--unpack-unreachable";
+
+		struct string_list sha1_list = STRING_LIST_INIT_DUP;
+		get_pack_sha1_list(packdir, &sha1_list);
+		for_each_string_list_item(item, &sha1_list) {
+			char *fname;
+			fname = xmalloc(strlen(packdir) + strlen("/pack-") + 40 + strlen(".keep") + 1);
+			sprintf(fname, "%s/pack-%s.keep", packdir, item->string);
+			if (!stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+		}
+
+		if (existing_packs.nr && unpack_unreachable && delete_redundant) {
+			/*
+			 * TODO: convert unpack_unreachable (being time since epoch)
+			 * to an approxidate again
+			 */
+			cmd_args[cmd_i++] = "--unpack-unreachable=$DATE";
+		}
+	}
+
+	if (local)
+		cmd_args[cmd_i++] = "--local";
+
+	if (delta_base_offset)
+		cmd_args[cmd_i++] = "--delta-base-offset";
+
+	cmd_args[cmd_i++] = packtmp;
+	cmd_args[cmd_i] = NULL;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = cmd_args;
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	run_command(&cmd);
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		assert(strlen(line) == 41);
+		line[40] = '\0';
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	char *fname, *fname_old;
+	fname = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
+	strcpy(fname, packdir);
+	strcpy(fname + strlen(packdir), "/");
+
+	fname_old = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
+	strcpy(fname_old, packdir);
+	strcpy(fname_old + strlen(packdir), "/");
+	char *exts[2] = {".idx", ".pack"};
+
+	int failed = 0;
+
+	for_each_string_list_item(item, &names) {
+		int ext;
+		for (ext = 0; ext < 2; ext++) {
+			strcpy(fname + strlen(packdir) + 1, item->string);
+			strcpy(fname + strlen(packdir) + 41, exts[ext]);
+			if (!file_exists(fname))
+				continue;
+
+			strcpy(fname_old, packdir);
+			strcpy(fname_old + strlen(packdir), "/old-");
+			strcpy(fname_old + strlen(packdir) + 5, item->string);
+			strcpy(fname_old + strlen(packdir) + 45, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break while loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
+
+		for_each_string_list_item(item, &rollback) {
+			sprintf(fname, "%s/%s", packdir, item->string);
+			sprintf(fname_old, "%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+		}
+
+		if (rollback_failure.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in %s manually:\n", packdir);
+			for (i = 0; i < rollback_failure.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback_failure.items[i].string,
+					rollback_failure.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	struct string_list fullbases = STRING_LIST_INIT_DUP;
+	for_each_string_list_item(item, &names) {
+		string_list_append(&fullbases, item->string);
+
+		sprintf(fname, "%s/pack-%s.pack", packdir, item->string);
+		sprintf(fname_old, "%s-%s.pack", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+
+		sprintf(fname, "%s/pack-%s.idx", packdir, item->string);
+		sprintf(fname_old, "%s-%s.idx", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		sprintf(fname, "%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+
+		sprintf(fname, "%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		sort_string_list(&fullbases);
+		fname = xmalloc(strlen(packtmp) + strlen("/") + 40 + strlen(".pack") + 1);
+		for_each_string_list_item(item, &existing_packs) {
+			if (!string_list_has_string(&fullbases, item->string))
+				remove_pack(packdir, item->string);
+		}
+		free(fname);
+		cmd_i = 0;
+		cmd_args[cmd_i++] = "prune-packed";
+		cmd_args[cmd_i++] = NULL;
+		/* TODO: pass argument: ${GIT_QUIET:+-q} */
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		cmd_i = 0;
+		cmd_args[cmd_i++] = "update-server-info";
+		cmd_args[cmd_i++] = NULL;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
+
diff --git a/contrib/examples/git-repack.sh b/contrib/examples/git-repack.sh
new file mode 100755
index 0000000..7579331
--- /dev/null
+++ b/contrib/examples/git-repack.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Linus Torvalds
+#
+
+OPTIONS_KEEPDASHDASH=
+OPTIONS_SPEC="\
+git repack [options]
+--
+a               pack everything in a single pack
+A               same as -a, and turn unreachable objects loose
+d               remove redundant packs, and run git-prune-packed
+f               pass --no-reuse-delta to git-pack-objects
+F               pass --no-reuse-object to git-pack-objects
+n               do not run git-update-server-info
+q,quiet         be quiet
+l               pass --local to git-pack-objects
+unpack-unreachable=  with -A, do not loosen objects older than this
+ Packing constraints
+window=         size of the window used for delta compression
+window-memory=  same as the above, but limit memory size instead of entries count
+depth=          limits the maximum delta depth
+max-pack-size=  maximum size of each packfile
+"
+SUBDIRECTORY_OK='Yes'
+. git-sh-setup
+
+no_update_info= all_into_one= remove_redundant= unpack_unreachable=
+local= no_reuse= extra=
+while test $# != 0
+do
+	case "$1" in
+	-n)	no_update_info=t ;;
+	-a)	all_into_one=t ;;
+	-A)	all_into_one=t
+		unpack_unreachable=--unpack-unreachable ;;
+	--unpack-unreachable)
+		unpack_unreachable="--unpack-unreachable=$2"; shift ;;
+	-d)	remove_redundant=t ;;
+	-q)	GIT_QUIET=t ;;
+	-f)	no_reuse=--no-reuse-delta ;;
+	-F)	no_reuse=--no-reuse-object ;;
+	-l)	local=--local ;;
+	--max-pack-size|--window|--window-memory|--depth)
+		extra="$extra $1=$2"; shift ;;
+	--) shift; break;;
+	*)	usage ;;
+	esac
+	shift
+done
+
+case "`git config --bool repack.usedeltabaseoffset || echo true`" in
+true)
+	extra="$extra --delta-base-offset" ;;
+esac
+
+PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
+PACKTMP="$PACKDIR/.tmp-$$-pack"
+rm -f "$PACKTMP"-*
+trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
+
+# There will be more repacking strategies to come...
+case ",$all_into_one," in
+,,)
+	args='--unpacked --incremental'
+	;;
+,t,)
+	args= existing=
+	if [ -d "$PACKDIR" ]; then
+		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
+			| sed -e 's/^\.\///' -e 's/\.pack$//'`
+		do
+			if [ -e "$PACKDIR/$e.keep" ]; then
+				: keep
+			else
+				existing="$existing $e"
+			fi
+		done
+		if test -n "$existing" -a -n "$unpack_unreachable" -a \
+			-n "$remove_redundant"
+		then
+			# This may have arbitrary user arguments, so we
+			# have to protect it against whitespace splitting
+			# when it gets run as "pack-objects $args" later.
+			# Fortunately, we know it's an approxidate, so we
+			# can just use dots instead.
+			args="$args $(echo "$unpack_unreachable" | tr ' ' .)"
+		fi
+	fi
+	;;
+esac
+
+mkdir -p "$PACKDIR" || exit
+
+args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
+names=$(git pack-objects --keep-true-parents --honor-pack-keep --non-empty --all --reflog $args </dev/null "$PACKTMP") ||
+	exit 1
+if [ -z "$names" ]; then
+	say Nothing new to pack.
+fi
+
+# Ok we have prepared all new packfiles.
+
+# First see if there are packs of the same name and if so
+# if we can move them out of the way (this can happen if we
+# repacked immediately after packing fully.
+rollback=
+failed=
+for name in $names
+do
+	for sfx in pack idx
+	do
+		file=pack-$name.$sfx
+		test -f "$PACKDIR/$file" || continue
+		rm -f "$PACKDIR/old-$file" &&
+		mv "$PACKDIR/$file" "$PACKDIR/old-$file" || {
+			failed=t
+			break
+		}
+		rollback="$rollback $file"
+	done
+	test -z "$failed" || break
+done
+
+# If renaming failed for any of them, roll the ones we have
+# already renamed back to their original names.
+if test -n "$failed"
+then
+	rollback_failure=
+	for file in $rollback
+	do
+		mv "$PACKDIR/old-$file" "$PACKDIR/$file" ||
+		rollback_failure="$rollback_failure $file"
+	done
+	if test -n "$rollback_failure"
+	then
+		echo >&2 "WARNING: Some packs in use have been renamed by"
+		echo >&2 "WARNING: prefixing old- to their name, in order to"
+		echo >&2 "WARNING: replace them with the new version of the"
+		echo >&2 "WARNING: file.  But the operation failed, and"
+		echo >&2 "WARNING: attempt to rename them back to their"
+		echo >&2 "WARNING: original names also failed."
+		echo >&2 "WARNING: Please rename them in $PACKDIR manually:"
+		for file in $rollback_failure
+		do
+			echo >&2 "WARNING:   old-$file -> $file"
+		done
+	fi
+	exit 1
+fi
+
+# Now the ones with the same name are out of the way...
+fullbases=
+for name in $names
+do
+	fullbases="$fullbases pack-$name"
+	chmod a-w "$PACKTMP-$name.pack"
+	chmod a-w "$PACKTMP-$name.idx"
+	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
+	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
+	exit
+done
+
+# Remove the "old-" files
+for name in $names
+do
+	rm -f "$PACKDIR/old-pack-$name.idx"
+	rm -f "$PACKDIR/old-pack-$name.pack"
+done
+
+# End of pack replacement.
+
+if test "$remove_redundant" = t
+then
+	# We know $existing are all redundant.
+	if [ -n "$existing" ]
+	then
+		( cd "$PACKDIR" &&
+		  for e in $existing
+		  do
+			case " $fullbases " in
+			*" $e "*) ;;
+			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
+			esac
+		  done
+		)
+	fi
+	git prune-packed ${GIT_QUIET:+-q}
+fi
+
+case "$no_update_info" in
+t) : ;;
+*) git update-server-info ;;
+esac
diff --git a/git-repack.sh b/git-repack.sh
deleted file mode 100755
index 7579331..0000000
--- a/git-repack.sh
+++ /dev/null
@@ -1,194 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Linus Torvalds
-#
-
-OPTIONS_KEEPDASHDASH=
-OPTIONS_SPEC="\
-git repack [options]
---
-a               pack everything in a single pack
-A               same as -a, and turn unreachable objects loose
-d               remove redundant packs, and run git-prune-packed
-f               pass --no-reuse-delta to git-pack-objects
-F               pass --no-reuse-object to git-pack-objects
-n               do not run git-update-server-info
-q,quiet         be quiet
-l               pass --local to git-pack-objects
-unpack-unreachable=  with -A, do not loosen objects older than this
- Packing constraints
-window=         size of the window used for delta compression
-window-memory=  same as the above, but limit memory size instead of entries count
-depth=          limits the maximum delta depth
-max-pack-size=  maximum size of each packfile
-"
-SUBDIRECTORY_OK='Yes'
-. git-sh-setup
-
-no_update_info= all_into_one= remove_redundant= unpack_unreachable=
-local= no_reuse= extra=
-while test $# != 0
-do
-	case "$1" in
-	-n)	no_update_info=t ;;
-	-a)	all_into_one=t ;;
-	-A)	all_into_one=t
-		unpack_unreachable=--unpack-unreachable ;;
-	--unpack-unreachable)
-		unpack_unreachable="--unpack-unreachable=$2"; shift ;;
-	-d)	remove_redundant=t ;;
-	-q)	GIT_QUIET=t ;;
-	-f)	no_reuse=--no-reuse-delta ;;
-	-F)	no_reuse=--no-reuse-object ;;
-	-l)	local=--local ;;
-	--max-pack-size|--window|--window-memory|--depth)
-		extra="$extra $1=$2"; shift ;;
-	--) shift; break;;
-	*)	usage ;;
-	esac
-	shift
-done
-
-case "`git config --bool repack.usedeltabaseoffset || echo true`" in
-true)
-	extra="$extra --delta-base-offset" ;;
-esac
-
-PACKDIR="$GIT_OBJECT_DIRECTORY/pack"
-PACKTMP="$PACKDIR/.tmp-$$-pack"
-rm -f "$PACKTMP"-*
-trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
-
-# There will be more repacking strategies to come...
-case ",$all_into_one," in
-,,)
-	args='--unpacked --incremental'
-	;;
-,t,)
-	args= existing=
-	if [ -d "$PACKDIR" ]; then
-		for e in `cd "$PACKDIR" && find . -type f -name '*.pack' \
-			| sed -e 's/^\.\///' -e 's/\.pack$//'`
-		do
-			if [ -e "$PACKDIR/$e.keep" ]; then
-				: keep
-			else
-				existing="$existing $e"
-			fi
-		done
-		if test -n "$existing" -a -n "$unpack_unreachable" -a \
-			-n "$remove_redundant"
-		then
-			# This may have arbitrary user arguments, so we
-			# have to protect it against whitespace splitting
-			# when it gets run as "pack-objects $args" later.
-			# Fortunately, we know it's an approxidate, so we
-			# can just use dots instead.
-			args="$args $(echo "$unpack_unreachable" | tr ' ' .)"
-		fi
-	fi
-	;;
-esac
-
-mkdir -p "$PACKDIR" || exit
-
-args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
-names=$(git pack-objects --keep-true-parents --honor-pack-keep --non-empty --all --reflog $args </dev/null "$PACKTMP") ||
-	exit 1
-if [ -z "$names" ]; then
-	say Nothing new to pack.
-fi
-
-# Ok we have prepared all new packfiles.
-
-# First see if there are packs of the same name and if so
-# if we can move them out of the way (this can happen if we
-# repacked immediately after packing fully.
-rollback=
-failed=
-for name in $names
-do
-	for sfx in pack idx
-	do
-		file=pack-$name.$sfx
-		test -f "$PACKDIR/$file" || continue
-		rm -f "$PACKDIR/old-$file" &&
-		mv "$PACKDIR/$file" "$PACKDIR/old-$file" || {
-			failed=t
-			break
-		}
-		rollback="$rollback $file"
-	done
-	test -z "$failed" || break
-done
-
-# If renaming failed for any of them, roll the ones we have
-# already renamed back to their original names.
-if test -n "$failed"
-then
-	rollback_failure=
-	for file in $rollback
-	do
-		mv "$PACKDIR/old-$file" "$PACKDIR/$file" ||
-		rollback_failure="$rollback_failure $file"
-	done
-	if test -n "$rollback_failure"
-	then
-		echo >&2 "WARNING: Some packs in use have been renamed by"
-		echo >&2 "WARNING: prefixing old- to their name, in order to"
-		echo >&2 "WARNING: replace them with the new version of the"
-		echo >&2 "WARNING: file.  But the operation failed, and"
-		echo >&2 "WARNING: attempt to rename them back to their"
-		echo >&2 "WARNING: original names also failed."
-		echo >&2 "WARNING: Please rename them in $PACKDIR manually:"
-		for file in $rollback_failure
-		do
-			echo >&2 "WARNING:   old-$file -> $file"
-		done
-	fi
-	exit 1
-fi
-
-# Now the ones with the same name are out of the way...
-fullbases=
-for name in $names
-do
-	fullbases="$fullbases pack-$name"
-	chmod a-w "$PACKTMP-$name.pack"
-	chmod a-w "$PACKTMP-$name.idx"
-	mv -f "$PACKTMP-$name.pack" "$PACKDIR/pack-$name.pack" &&
-	mv -f "$PACKTMP-$name.idx"  "$PACKDIR/pack-$name.idx" ||
-	exit
-done
-
-# Remove the "old-" files
-for name in $names
-do
-	rm -f "$PACKDIR/old-pack-$name.idx"
-	rm -f "$PACKDIR/old-pack-$name.pack"
-done
-
-# End of pack replacement.
-
-if test "$remove_redundant" = t
-then
-	# We know $existing are all redundant.
-	if [ -n "$existing" ]
-	then
-		( cd "$PACKDIR" &&
-		  for e in $existing
-		  do
-			case " $fullbases " in
-			*" $e "*) ;;
-			*)	rm -f "$e.pack" "$e.idx" "$e.keep" ;;
-			esac
-		  done
-		)
-	fi
-	git prune-packed ${GIT_QUIET:+-q}
-fi
-
-case "$no_update_info" in
-t) : ;;
-*) git update-server-info ;;
-esac
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc2

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
@ 2013-08-14 16:49         ` Antoine Pelisse
  2013-08-14 17:04           ` Stefan Beller
                             ` (3 more replies)
  2013-08-14 17:04         ` Junio C Hamano
  1 sibling, 4 replies; 72+ messages in thread
From: Antoine Pelisse @ 2013-08-14 16:49 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Matthieu Moy, Nguyễn Thái Ngọc Duy, iveqy,
	Junio C Hamano

On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
<stefanbeller@googlemail.com> wrote:
>  builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++
>  git-repack.sh                  | 194 -------------------

I'm still not sure I understand the trade-off here.

Most of what git-repack does is compute some file paths, (re)move
those files and call git-pack-objects, and potentially
git-prune-packed and git-update-server-info.
Maybe I'm wrong, but I have the feeling that the correct tool for that
is Shell, rather than C (and I think the code looks less intuitive in
C for that matter).
I'm not sure anyone would run that command a thousand times a second,
so I'm not sure it would make a real-life performance difference.

Last, and much less important: I think it's OK to run format-patch
with -M, especially when you move a file.

Cheers,
Antoine

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
  2013-08-14 16:49         ` Antoine Pelisse
@ 2013-08-14 17:04         ` Junio C Hamano
  2013-08-15  7:53           ` Stefan Beller
  1 sibling, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2013-08-14 17:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Matthieu.Moy, pclouds, iveqy, apelisse

Stefan Beller <stefanbeller@googlemail.com> writes:

> diff --git a/builtin/repack.c b/builtin/repack.c
> new file mode 100644
> index 0000000..d39c34e
> --- /dev/null
> +++ b/builtin/repack.c
> @@ -0,0 +1,410 @@
> +/*
> + * The shell version was written by Linus Torvalds (2005) and many others.
> + * This is a translation into C by Stefan Beller (2013)
> + */
> +
> +#include "builtin.h"
> +#include "cache.h"
> +#include "dir.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +#include "sigchain.h"
> +#include "strbuf.h"
> +#include "string-list.h"
> +
> +#include <sys/types.h>
> +#include <unistd.h>
> +#include <stdio.h>
> +#include <dirent.h>

If you need these system-includes here, it means that our own
platform-portability layer "git-compat-util.h" is broken.  On
various systems, often some system header files are missing, need a
few feature macros to be defined before including, and/or need to be
included in certain order, etc., and "git-compat-util.h" is meant to
hide all such details from the programmers.

I do not think the above four need to be included in *.c, as long
as you include either builtin.h or cache.h, both of which include
the compat-util header.

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:49         ` Antoine Pelisse
@ 2013-08-14 17:04           ` Stefan Beller
  2013-08-14 17:19             ` Jeff King
  2013-08-14 17:25           ` Martin Fick
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-14 17:04 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: git, Matthieu Moy, Nguyễn Thái Ngọc Duy, iveqy,
	Junio C Hamano, Jeff King

On 08/14/2013 06:49 PM, Antoine Pelisse wrote:
> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> <stefanbeller@googlemail.com> wrote:
>>  builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
>>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++
>>  git-repack.sh                  | 194 -------------------
> 
> I'm still not sure I understand the trade-off here.
> 
> Most of what git-repack does is compute some file paths, (re)move
> those files and call git-pack-objects, and potentially
> git-prune-packed and git-update-server-info.
> Maybe I'm wrong, but I have the feeling that the correct tool for that
> is Shell, rather than C (and I think the code looks less intuitive in
> C for that matter).
> I'm not sure anyone would run that command a thousand times a second,
> so I'm not sure it would make a real-life performance difference.

From IRC:
<iveqy> IIRC repack is one of the most important scripts to port
<iveqy> it's one of the rare times a script is used serverside
<PjotrOrial> heh, I picked it because of its size
<iveqy> (and people want to be able to use git in chrooted environments
with few dependencies)

My contributions so far have been very small nitpicks, made just to
familiarize myself with the code base, and mostly found with tools
such as static code checkers.

However, I'd like to contribute to the project in a more meaningful way,
but I still feel that I am not completely familiar with the project's
code base (heh, it sure takes time with such a large project).

So I think the best way to get a feeling for the code base is to
rewrite a shell script in C. I picked the smallest script so that the
task would stay small. Or so I thought: the rewrite is larger than I
originally assumed.


But apart from my blabbering, I think iveqy made a good point:
the C parts don't rely on external tools, only on libc and the
kernel, so a C version may be nicer than a shell script. Also, as it is
used on the server side, the performance aspect is not negligible.

I included Jeff King, who could maybe elaborate on git-repack on the
server side?


> 
> Last, and much less important: I think it's OK to run format-patch
> with -M, especially when you move a file.
> 

Noted, thanks.

> Cheers,
> Antoine
> 

Stefan



* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 17:04           ` Stefan Beller
@ 2013-08-14 17:19             ` Jeff King
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff King @ 2013-08-14 17:19 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Wed, Aug 14, 2013 at 07:04:37PM +0200, Stefan Beller wrote:

> But apart from my blabbering, I think iveqy made a good point:
> the C parts don't rely on external tools, only on libc and the
> kernel, so a C version may be nicer than a shell script. Also, as it is
> used on the server side, the performance aspect is not negligible.
> 
> I included Jeff King, who could maybe elaborate on git-repack on the
> server side?

I don't think the performance of repack as a C program versus a shell
script is really relevant to us at GitHub. Sure, we run a fair number of
repacks, but the cost is totally dominated by the pack-objects process
itself.

You might be able to achieve some speedups if it was not simply a
shell->C conversion, but an overall gc rewrite that did more in a single
process, and reused results (for example, you can reuse all or part of
the history traversal from pack-object's "counting objects" phase to do
the reachability analysis during prune)[1].

But I'd be very wary of stuffing too many things in a single process.
There are parts of the code that make assumptions about which objects
have been seen in the global object hash table (I believe index-pack is
one of these; see check_objects). And there are parts of the code which
must run separately (e.g., the connectivity check after transfer runs in
a separate process, both because it may die(), but also because we want
a clean slate of which packs are available, with no caching of results
we may have seen).

None of those problems is unsolvable, but it's very hard to know when
one is going to pop up and bite you. And because the repacking and
pruning code is the most likely place for a bug to cause data loss, it
makes me a bit nervous.

-Peff

[1] Another way to reuse the history traversal is to generate the
    much-discussed pack reachability bitmaps, and then use them in
    git-prune.

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:49         ` Antoine Pelisse
  2013-08-14 17:04           ` Stefan Beller
@ 2013-08-14 17:25           ` Martin Fick
  2013-08-14 22:16             ` Stefan Beller
  2013-08-15  4:15             ` Duy Nguyen
  2013-08-14 17:26           ` Junio C Hamano
  2013-08-14 22:51           ` Matthieu Moy
  3 siblings, 2 replies; 72+ messages in thread
From: Martin Fick @ 2013-08-14 17:25 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: Stefan Beller, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Wednesday, August 14, 2013 10:49:58 am Antoine Pelisse 
wrote:
> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> 
> <stefanbeller@googlemail.com> wrote:
> >  builtin/repack.c               | 410
> >  +++++++++++++++++++++++++++++++++++++++++
> >  contrib/examples/git-repack.sh | 194
> >  +++++++++++++++++++ git-repack.sh                  |
> >  194 -------------------
> 
> I'm still not sure I understand the trade-off here.
> 
> Most of what git-repack does is compute some file paths,
> (re)move those files and call git-pack-objects, and
> potentially git-prune-packed and git-update-server-info.
> Maybe I'm wrong, but I have the feeling that the correct
> tool for that is Shell, rather than C (and I think the
> code looks less intuitive in C for that matter).
> I'm not sure anyone would run that command a thousand
> times a second, so I'm not sure it would make a
> real-life performance difference.

I have been holding off a bit on expressing this opinion too 
because I don't want to squash someone's energy to improve 
things, and because I am not yet a git dev, but since it was 
brought up anyway...
 
I can say that as a user, having git-repack as a shell 
script has been very valuable.  For example:  we have 
modified it for our internal use to no longer ever overwrite
new packfiles with the same name as the current packfile.  
This modification was relatively easy to do and see in shell 
script.  If this were C code I can't imagine having 
personally: 1) identified that there was an issue with the 
original git-repack (it temporarily makes objects 
unavailable) 2) made a simple custom fix to that policy.

The script really is mostly a policy script, and with the
discussions happening in other threads about how to improve
git gc, I think it is helpful to be able to quickly modify
the policies in this script; it makes it easier to prototype
things.  Shell portability issues aside, this script is not
a low-level tool that I feel will benefit from being in C;
I feel it will in fact be worse off in C.

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:49         ` Antoine Pelisse
  2013-08-14 17:04           ` Stefan Beller
  2013-08-14 17:25           ` Martin Fick
@ 2013-08-14 17:26           ` Junio C Hamano
  2013-08-14 22:51           ` Matthieu Moy
  3 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2013-08-14 17:26 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: Stefan Beller, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy

Antoine Pelisse <apelisse@gmail.com> writes:

> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> <stefanbeller@googlemail.com> wrote:
>>  builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
>>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++
>>  git-repack.sh                  | 194 -------------------
>
> I'm still not sure I understand the trade-off here.
>
> Most of what git-repack does is compute some file paths, (re)move
> those files and call git-pack-objects, and potentially
> git-prune-packed and git-update-server-info.
> Maybe I'm wrong, but I have the feeling that the correct tool for that
> is Shell, rather than C (and I think the code looks less intuitive in
> C for that matter).
> I'm not sure anyone would run that command a thousand times a second,
> so I'm not sure it would make a real-life performance difference.

I do not think the motivation of this patch is about performance in
the first place, though.

> Last and very less important: I think it's OK to format-patch with -M,
> especially when you move a file.
>
> Cheers,
> Antoine

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 17:25           ` Martin Fick
@ 2013-08-14 22:16             ` Stefan Beller
  2013-08-14 22:28               ` Martin Fick
  2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
  2013-08-15  4:15             ` Duy Nguyen
  1 sibling, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-14 22:16 UTC (permalink / raw)
  To: Martin Fick
  Cc: Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On 08/14/2013 07:25 PM, Martin Fick wrote:
> I have been holding off a bit on expressing this opinion too 
> because I don't want to squash someone's energy to improve 
> things, and because I am not yet a git dev, but since it was 
> brought up anyway...

It's OK; if you know of a better topic to work on, I'd gladly switch over
(provided it is a good beginner's topic).

>  
> I can say that as a user, having git-repack as a shell 
> script has been very valuable.  For example:  we have 
> modified it for our internal use to no longer ever overwrite
> new packfiles with the same name as the current packfile.  
> This modification was relatively easy to do and see in shell 
> script.  If this were C code I can't imagine having 
> personally: 1) identified that there was an issue with the 
> original git-repack (it temporarily makes objects 
> unavailable) 2) made a simple custom fix to that policy.

Looking at `git log -- git-repack.sh`, the last commit is from
April 2012 and the one before that is from 2011, so I assumed the
script was stable enough to port to C, as there is not much
modification going on. I'd be glad to include the changes you're
using in the rewrite, or in the shell script as it is now.

> 
> The script really is mostly a policy script, and with the 
> discussions happening in other threads about how to improve 
> git gc, I think it is helpful to potentially be able to 
> quickly modify the policies in this script, it makes it 
> easier to prototype things.  Shell portability issues aside, 
> this script is not a low level type of tool that I feel will 
> benefit from being in C, I feel it will in fact be worse off 
> in C,

As far as I have followed the git mailing list, there are people
dreaming of 'everything in C' and apparently others who are OK
with lots of shell code as well, 'because it's high level'.
I tend to side with the first group, dreaming of everything in C.
Thanks for pointing that out; if it really keeps people from using
git effectively, I'd rather not contribute this patch. But I'd like
to stress again that the prototyping should be done by now.

I asked for a todo wish list a few weeks ago, but got no real answer,
but rather: "Pick your choice and try to come up with good patches".
This is a good policy from the project's point of view (choosing your
own topic helps you write good patches, and good patches do not need
as much review, so the reviewing costs are low), so I chose this topic.

> 
> -Martin
> 

Stefan


* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:16             ` Stefan Beller
@ 2013-08-14 22:28               ` Martin Fick
  2013-08-14 22:53                 ` Junio C Hamano
  2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
  1 sibling, 1 reply; 72+ messages in thread
From: Martin Fick @ 2013-08-14 22:28 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Wednesday, August 14, 2013 04:16:35 pm Stefan Beller 
wrote:
> On 08/14/2013 07:25 PM, Martin Fick wrote:
> > I have been holding off a bit on expressing this
> > opinion too because I don't want to squash someone's
> > energy to improve things, and because I am not yet a
> > git dev, but since it was brought up anyway...
> 
> It's ok, if you knew a better topic to work on, I'd
> gladly switch over. (Given it would be a good beginners
> topic.)

See below...

> > I can say that as a user, having git-repack as a shell
> > script has been very valuable.  For example:  we have
> > modified it for our internal use to no longer ever
> > overwrite new packfiles with the same name as the
> > current packfile. This modification was relatively
> > easy to do and see in shell script.  If this were C
> > code I can't imagine having personally: 1) identified
> > that there was an issue with the original git-repack
> > (it temporarily makes objects unavailable) 2) made a
> > simple custom fix to that policy.
> 
> Looking at the `git log -- git-repack.sh` the last commit
> is from April 2012 and the commit before is 2011, so I
> assumed it stable enough for porting over to C, as there
> is not much modification going on. I'd be glad to
> include these changes you're using into the rewrite or
> the shell script as of now.

One suggestion would be to change the repack code to create 
pack filenames based on the sha1 of the contents of the pack 
file instead of on the sha1 of the objects in the packfile.  

Since the same objects can be stored in a packfile in many 
ways (different deltification/compression options), it is 
currently possible to have 2 different pack files with the 
same names.  The contents are different, but the contained 
objects are the same.  This causes the object availability 
bug that I describe above in git repack when a new packfile 
is generated with the same name as a current one.
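
The collision described above can be sketched in a few lines. This is only an illustration under stated assumptions, not git's code: `toy_hash` is FNV-1a standing in for SHA-1, and `struct toy_pack`, `name_from_objects` and `name_from_contents` are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a: a toy stand-in for SHA-1, used only to keep this sketch short. */
#define TOY_HASH_INIT 0xcbf29ce484222325ULL

uint64_t toy_hash(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Hypothetical model of a pack: which objects it holds, and its raw bytes. */
struct toy_pack {
	const char **object_ids;	/* sorted object names */
	size_t nr_objects;
	const char *bytes;		/* on-disk representation */
	size_t nr_bytes;
};

/* Name derived from the list of contained objects. */
uint64_t name_from_objects(const struct toy_pack *p)
{
	uint64_t h = TOY_HASH_INIT;
	size_t i;

	for (i = 0; i < p->nr_objects; i++)
		h = toy_hash(h, p->object_ids[i], strlen(p->object_ids[i]));
	return h;
}

/* Name derived from the bytes of the pack file (the proposed scheme). */
uint64_t name_from_contents(const struct toy_pack *p)
{
	return toy_hash(TOY_HASH_INIT, p->bytes, p->nr_bytes);
}
```

Two packs holding the same objects but written with different delta or compression settings get the same `name_from_objects` value and different `name_from_contents` values, which is the difference the suggestion relies on.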

I am not 100% sure if the change in naming convention I 
propose wouldn't cause any problems?  But if others agree it 
is a good idea, perhaps it is something a beginner could do?

-Martin

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 16:49         ` Antoine Pelisse
                             ` (2 preceding siblings ...)
  2013-08-14 17:26           ` Junio C Hamano
@ 2013-08-14 22:51           ` Matthieu Moy
  2013-08-14 23:25             ` Martin Fick
  2013-08-15  4:20             ` Duy Nguyen
  3 siblings, 2 replies; 72+ messages in thread
From: Matthieu Moy @ 2013-08-14 22:51 UTC (permalink / raw)
  To: Antoine Pelisse
  Cc: Stefan Beller, git, Nguyễn Thái Ngọc Duy, iveqy,
	Junio C Hamano

Antoine Pelisse <apelisse@gmail.com> writes:

> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> <stefanbeller@googlemail.com> wrote:
>>  builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
>>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++
>>  git-repack.sh                  | 194 -------------------
>
> I'm still not sure I understand the trade-off here.
>
> Most of what git-repack does is compute some file paths, (re)move
> those files and call git-pack-objects, and potentially
> git-prune-packed and git-update-server-info.
> Maybe I'm wrong, but I have the feeling that the correct tool for that
> is Shell, rather than C (and I think the code looks less intuitive in
> C for that matter).

There's a real problem with git-repack being shell (I already mentioned
it in the previous thread about the rewrite): it creates dependencies on
a few external binaries, and a restricted server may not have them. I
have this issue on a fusionforge server where Git repos are accessed in
a chroot with very few commands available: everything went OK until the
first project grew enough to require a "git gc --auto", and then it
stopped accepting pushes for that project.

I tracked down the origin of the problem and the sysadmins disabled
auto-gc, but that's not a very satisfactory solution.

C is rather painful to write, but as a sysadmin, drop the binary on
your server and it just works. That's really important. AFAIK,
git-repack is the only remaining shell part on the server, and it's
rather small. I'd really love to see it disappear.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:16             ` Stefan Beller
  2013-08-14 22:28               ` Martin Fick
@ 2013-08-14 22:51               ` Junio C Hamano
  2013-08-14 22:59                 ` Matthieu Moy
  1 sibling, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2013-08-14 22:51 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Martin Fick, Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy

Stefan Beller <stefanbeller@googlemail.com> writes:

> I asked for a todo wish list a few weeks ago, but got no real answer,
> but rather: "Pick your choice and try to come up with good patches".

Hmph, I hope that wasn't me.

There are some good ones here;

  http://git-blame.blogspot.com/search?q=leftover

Some are trivial, some are moderate complexity, and some are
pie-in-the-sky.

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:28               ` Martin Fick
@ 2013-08-14 22:53                 ` Junio C Hamano
  2013-08-14 23:28                   ` Martin Fick
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2013-08-14 22:53 UTC (permalink / raw)
  To: Martin Fick
  Cc: Stefan Beller, Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy

Martin Fick <mfick@codeaurora.org> writes:

> One suggestion would be to change the repack code to create 
> pack filenames based on the sha1 of the contents of the pack 
> file instead of on the sha1 of the objects in the packfile.  
> ...
> I am not 100% sure if the change in naming convention I 
> propose wouldn't cause any problems?  But if others agree it 
> is a good idea, perhaps it is something a beginner could do?

I would not be surprised if that change breaks some other people's
reimplementation.  I know we do not validate the pack name with the
hash of the contents in the current code, but at the same time I do
remember that was one of the planned things to be done while I and
Linus were working on the original pack design, which was the last
task we did together before he retired from the maintainership of
this project.

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
@ 2013-08-14 22:59                 ` Matthieu Moy
  2013-08-15  7:47                   ` Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Matthieu Moy @ 2013-08-14 22:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Martin Fick, Antoine Pelisse, git,
	Nguyễn Thái Ngọc Duy, iveqy

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <stefanbeller@googlemail.com> writes:
>
>> I asked for a todo wish list a few weeks ago, but got no real answer,
>> but rather: "Pick your choice and try to come up with good patches".
>
> Hmph, I hope that wasn't me.
>
> There are some good ones here;
>
>   http://git-blame.blogspot.com/search?q=leftover

See also:

  https://git.wiki.kernel.org/index.php/SmallProjectsIdeas

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:51           ` Matthieu Moy
@ 2013-08-14 23:25             ` Martin Fick
  2013-08-15  0:26               ` Martin Fick
  2013-08-15  7:46               ` Stefan Beller
  2013-08-15  4:20             ` Duy Nguyen
  1 sibling, 2 replies; 72+ messages in thread
From: Martin Fick @ 2013-08-14 23:25 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Antoine Pelisse, Stefan Beller, git,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Wednesday, August 14, 2013 04:51:14 pm Matthieu Moy 
wrote:
> Antoine Pelisse <apelisse@gmail.com> writes:
> > On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> > 
> > <stefanbeller@googlemail.com> wrote:
> >>  builtin/repack.c               | 410
> >>  +++++++++++++++++++++++++++++++++++++++++
> >>  contrib/examples/git-repack.sh | 194
> >>  +++++++++++++++++++ git-repack.sh                  |
> >>  194 -------------------
> > 
> > I'm still not sure I understand the trade-off here.
> > 
> > Most of what git-repack does is compute some file
> > paths, (re)move those files and call git-pack-objects,
> > and potentially git-prune-packed and
> > git-update-server-info.
> > Maybe I'm wrong, but I have the feeling that the
> > correct tool for that is Shell, rather than C (and I
> > think the code looks less intuitive in C for that
> > matter).
> 
> There's a real problem with git-repack being shell (I
> already mentioned it in the previous thread about the
> rewrite): it creates dependencies on a few external
> binaries, and a restricted server may not have them. I
> have this issue on a fusionforge server where Git repos
> are accessed in a chroot with very few commands
> available: everything went OK until the first project
> grew enough to require a "git gc --auto", and then it
> stopped accepting pushes for that project.
> 
> I tracked down the origin of the problem and the
> sysadmins disabled auto-gc, but that's not a very
> satisfactory solution.
> 
> C is rather painful to write, but as a sysadmin, drop
> the binary on your server and it just works. That's
> really important. AFAIK, git-repack is the only
> remaining shell part on the server, and it's rather
> small. I'd really love to see it disappear.

I didn't review the proposed C version, but how was it 
planning on removing the dependencies on these binaries?  
Was it planning to reimplement mv, cp, find?  Were there 
other binaries that were problematic that you were thinking 
of?  From what I can tell it also uses test, mkdir, sed, 
chmod and naturally sh, that is 8 dependencies.  If those 
can't be depended upon for existing, perhaps git should just 
consider bundling busy-box or some other limited shell 
utils, or yikes!, even its own reimplementation of these 
instead of implementing these independently inside other git 
programs?

-Martin

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:53                 ` Junio C Hamano
@ 2013-08-14 23:28                   ` Martin Fick
  2013-08-15 17:15                     ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Fick @ 2013-08-14 23:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy

On Wednesday, August 14, 2013 04:53:36 pm Junio C Hamano 
wrote:
> Martin Fick <mfick@codeaurora.org> writes:
> > One suggestion would be to change the repack code to
> > create pack filenames based on the sha1 of the
> > contents of the pack file instead of on the sha1 of
> > the objects in the packfile. ...
> > I am not 100% sure if the change in naming convention I
> > propose wouldn't cause any problems?  But if others
> > agree it is a good idea, perhaps it is something a
> > beginner could do?
> 
> I would not be surprised if that change breaks some other
> people's reimplementation.  I know we do not validate
> the pack name with the hash of the contents in the
> current code, but at the same time I do remember that
> was one of the planned things to be done while I and
> Linus were working on the original pack design, which
> was the last task we did together before he retired from
> the maintainership of this project.

Perhaps a config option?  One that becomes standard for git 
2.0?

-Martin

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 23:25             ` Martin Fick
@ 2013-08-15  0:26               ` Martin Fick
  2013-08-15  7:46               ` Stefan Beller
  1 sibling, 0 replies; 72+ messages in thread
From: Martin Fick @ 2013-08-15  0:26 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Antoine Pelisse, Stefan Beller, git,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Wednesday, August 14, 2013 05:25:42 pm Martin Fick wrote:
> On Wednesday, August 14, 2013 04:51:14 pm Matthieu Moy
> 
> wrote:
> > Antoine Pelisse <apelisse@gmail.com> writes:
> > > On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> > > 
> > > <stefanbeller@googlemail.com> wrote:
> > >>  builtin/repack.c               | 410
> > >>  +++++++++++++++++++++++++++++++++++++++++
> > >>  contrib/examples/git-repack.sh | 194
> > >>  +++++++++++++++++++ git-repack.sh                 
> > >>  | 194 -------------------
> > > 
> > > I'm still not sure I understand the trade-off here.
> > > 
> > > Most of what git-repack does is compute some file
> > > paths, (re)move those files and call
> > > git-pack-objects, and potentially git-prune-packed
> > > and
> > > git-update-server-info.
> > > Maybe I'm wrong, but I have the feeling that the
> > > correct tool for that is Shell, rather than C (and I
> > > think the code looks less intuitive in C for that
> > > matter).
> > 
> > There's a real problem with git-repack being shell (I
> > already mentioned it in the previous thread about the
> > rewrite): it creates dependencies on a few external
> > binaries, and a restricted server may not have them. I
> > have this issue on a fusionforge server where Git repos
> > are accessed in a chroot with very few commands
> > available: everything went OK until the first project
> > grew enough to require a "git gc --auto", and then it
> > stopped accepting pushes for that project.
> > 
> > I tracked down the origin of the problem and the
> > sysadmins disabled auto-gc, but that's not a very
> > satisfactory solution.
> > 
> > C is rather painful to write, but as a sysadmin, drop
> > the binary on your server and it just works. That's
> > really important. AFAIK, git-repack is the only
> > remaining shell part on the server, and it's rather
> > small. I'd really love to see it disappear.
> 
> I didn't review the proposed C version, but how was it
> planning on removing the dependencies on these binaries?
> Was it planning to reimplement mv, cp, find?  Were there
> other binaries that were problematic that you were
> thinking of?  From what I can tell it also uses test,
> mkdir, sed, chmod and naturally sh, that is 8
> dependencies.  If those can't be depended upon for
> existing, perhaps git should just consider bundling
> busy-box or some other limited shell utils, or yikes!,
> even its own reimplementation of these instead of
> implementing these independently inside other git
> programs?

Sorry I didn't comprehend your email fully when I first read 
it.  I guess that wouldn't really solve your problem unless 
someone had a way of bundling an sh program and whatever it 
calls inside a single executable? :(

I can see why you would want what you want,

-Martin

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 17:25           ` Martin Fick
  2013-08-14 22:16             ` Stefan Beller
@ 2013-08-15  4:15             ` Duy Nguyen
  1 sibling, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2013-08-15  4:15 UTC (permalink / raw)
  To: Martin Fick
  Cc: Antoine Pelisse, Stefan Beller, git, Matthieu Moy,
	Fredrik Gustafsson, Junio C Hamano

On Thu, Aug 15, 2013 at 12:25 AM, Martin Fick <mfick@codeaurora.org> wrote:
> The script really is mostly a policy script, and with the
> discussions happening in other threads about how to improve
> git gc, I think it is helpful to potentially be able to
> quickly modify the policies in this script, it makes it
> easier to prototype things.  Shell portability issues aside,
> this script is not a low level type of tool that I feel will
> benefit from being in C, I feel it will in fact be worse off
> in C,

I think C is better for the modification you made in git-exproll.sh,
if it gets merged to git-repack.sh. Such calculations are not a strong
point of shell scripting. git-repack.sh is still around for
experimenting, although I think perl, ruby or python is better than
shell for prototyping.
-- 
Duy

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:51           ` Matthieu Moy
  2013-08-14 23:25             ` Martin Fick
@ 2013-08-15  4:20             ` Duy Nguyen
  1 sibling, 0 replies; 72+ messages in thread
From: Duy Nguyen @ 2013-08-15  4:20 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Antoine Pelisse, Stefan Beller, git, Fredrik Gustafsson,
	Junio C Hamano

On Thu, Aug 15, 2013 at 5:51 AM, Matthieu Moy
<Matthieu.Moy@grenoble-inp.fr> wrote:
> There's a real problem with git-repack being shell (I already mentioned
> it in the previous thread about the rewrite): it creates dependencies on
> a few external binaries, and a restricted server may not have them.

There's also the Windows port. A POSIX shell environment is required
for using git on Windows, but I feel one should be able to use core
functionality even without POSIX utilities. git-repack is part of this
core in my opinion.
-- 
Duy

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 23:25             ` Martin Fick
  2013-08-15  0:26               ` Martin Fick
@ 2013-08-15  7:46               ` Stefan Beller
  2013-08-15 15:04                 ` Martin Fick
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-15  7:46 UTC (permalink / raw)
  To: Martin Fick
  Cc: Matthieu Moy, Antoine Pelisse, git,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On 08/15/2013 01:25 AM, Martin Fick wrote:
> On Wednesday, August 14, 2013 04:51:14 pm Matthieu Moy 
> wrote:
>> Antoine Pelisse <apelisse@gmail.com> writes:
>>> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
>>>
>>> <stefanbeller@googlemail.com> wrote:
>>>>  builtin/repack.c               | 410
>>>>  +++++++++++++++++++++++++++++++++++++++++
>>>>  contrib/examples/git-repack.sh | 194
>>>>  +++++++++++++++++++ git-repack.sh                  |
>>>>  194 -------------------
>>>
>>> I'm still not sure I understand the trade-off here.
>>>
>>> Most of what git-repack does is compute some file
>>> paths, (re)move those files and call git-pack-objects,
>>> and potentially git-prune-packed and
>>> git-update-server-info.
>>> Maybe I'm wrong, but I have the feeling that the
>>> correct tool for that is Shell, rather than C (and I
>>> think the code looks less intuitive in C for that
>>> matter).
>>
>> There's a real problem with git-repack being shell (I
>> already mentioned it in the previous thread about the
>> rewrite): it creates dependencies on a few external
>> binaries, and a restricted server may not have them. I
>> have this issue on a fusionforge server where Git repos
>> are accessed in a chroot with very few commands
>> available: everything went OK until the first project
>> grew enough to require a "git gc --auto", and then it
>> stopped accepting pushes for that project.
>>
>> I tracked down the origin of the problem and the
>> sysadmins disabled auto-gc, but that's not a very
>> satisfactory solution.
>>
>> C is rather painful to write, but as a sysadmin, drop
>> the binary on your server and it just works. That's
>> really important. AFAIK, git-repack is the only
>> remaining shell part on the server, and it's rather
>> small. I'd really love to see it disappear.
> 
> I didn't review the proposed C version, but how was it 
> planning on removing the dependencies on these binaries?  
> Was it planning to reimplement mv, cp, find?  

These small programs (at least mv and cp) are just convenient
shell interfaces to system calls. You can use those system calls
directly to achieve results similar to the command-line tools:
http://linux.die.net/man/2/rename
http://linux.die.net/man/2/unlink
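
A minimal sketch of that idea, assuming both paths live on the same filesystem (rename(2) does not move files across mount points). The helper names `move_file` and `remove_file` are made up for illustration and are not taken from the patch:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Rough equivalent of "mv src dst" within a single filesystem. */
int move_file(const char *src, const char *dst)
{
	if (rename(src, dst) < 0) {
		fprintf(stderr, "rename %s -> %s: %s\n",
			src, dst, strerror(errno));
		return -1;
	}
	return 0;
}

/* Rough equivalent of "rm -f path": a missing file is not an error. */
int remove_file(const char *path)
{
	if (unlink(path) < 0 && errno != ENOENT) {
		fprintf(stderr, "unlink %s: %s\n", path, strerror(errno));
		return -1;
	}
	return 0;
}
```

Both helpers return 0 on success and -1 on failure, so a caller can stop at the first failed step, much like checking `$?` in the shell script.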

> Were there 
> other binaries that were problematic that you were thinking 
> of?  From what I can tell it also uses test, mkdir, sed, 
> chmod and naturally sh, that is 8 dependencies. 

mkdir, test and chmod are also easily done via system calls.
The system calls are usually wrapped by libc to give them an
easy C interface (a standard C function call).
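
As a rough illustration, assuming a POSIX system, the three commands map onto mkdir(2), stat(2) and chmod(2) roughly like this. The helper names `ensure_dir`, `is_regular_file` and `make_read_only` are hypothetical:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Like "mkdir -p" for one path component: an existing directory is fine. */
int ensure_dir(const char *path)
{
	struct stat st;

	if (mkdir(path, 0777) == 0)
		return 0;
	if (errno == EEXIST && stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		return 0;
	return -1;
}

/* Like "test -f path": 1 for a regular file, 0 otherwise. */
int is_regular_file(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Like "chmod a-w path": clear every write bit, keep the other bits. */
int make_read_only(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return chmod(path, st.st_mode & 07777 & ~(S_IWUSR | S_IWGRP | S_IWOTH));
}
```

The mode passed to mkdir() is still filtered by the process umask, as it would be for the mkdir command.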

sed and find are trickier indeed, but you can get around them with
a few lines of C (maybe 10?) for each occurrence. We don't need
the full power of sed and find, only the specific pattern
matching used in each place.
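
For instance, a pipeline of the form `find . -name '*.pack' | sed 's/\.pack$//'` (an assumed shape, not quoted from the script) could be approximated with opendir/readdir and a suffix check. The names `strip_suffix_copy` and `for_each_pack` are invented for this sketch, which, unlike `find -type f`, does not check that an entry is a regular file:

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * If name ends with suffix (and has something in front of it), copy
 * the part before the suffix into out and return 1; otherwise return 0.
 */
int strip_suffix_copy(const char *name, const char *suffix,
		      char *out, size_t outlen)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	if (nlen <= slen || strcmp(name + nlen - slen, suffix) != 0)
		return 0;
	if (nlen - slen >= outlen)
		return 0;
	memcpy(out, name, nlen - slen);
	out[nlen - slen] = '\0';
	return 1;
}

/*
 * Call fn(base, data) for every entry in dir whose name ends in ".pack",
 * where base is the name without that suffix. fn may be NULL.
 * Returns the number of matches, or -1 if dir cannot be opened.
 */
int for_each_pack(const char *dir,
		  void (*fn)(const char *base, void *data), void *data)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	char base[256];
	int n = 0;

	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		if (!strip_suffix_copy(e->d_name, ".pack", base, sizeof(base)))
			continue;
		if (fn)
			fn(base, data);
		n++;
	}
	closedir(d);
	return n;
}
```

The same loop with a prefix test in place of the suffix test would cover shell globs such as `prefix-*`.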

> If those
> can't be depended upon for existing, perhaps git should just 
> consider bundling busy-box or some other limited shell 
> utils, or yikes!, even its own reimplementation of these 
> instead of implementing these independently inside other git 
> programs?
> 

The C version as of now has twice as many lines of code as the
shell version. And I am pretty sure I made some rookie mistakes,
so the code can be downsized by better use of already existing
functions. So I guess the final version will have fewer lines than
the patch as currently proposed.



* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 22:59                 ` Matthieu Moy
@ 2013-08-15  7:47                   ` Stefan Beller
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-15  7:47 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Junio C Hamano, Martin Fick, Antoine Pelisse, git,
	Nguyễn Thái Ngọc Duy, iveqy

On 08/15/2013 12:59 AM, Matthieu Moy wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Stefan Beller <stefanbeller@googlemail.com> writes:
>>
>>> I asked for a todo wish list a few weeks ago, but got no real answer,
>>> but rather: "Pick your choice and try to come up with good patches".
>>
>> Hmph, I hope that wasn't me.
>>
>> There are some good ones here;
>>
>>   http://git-blame.blogspot.com/search?q=leftover
> 
> See also:
> 
>   https://git.wiki.kernel.org/index.php/SmallProjectsIdeas
> 

Thanks, I bookmarked both of those pages.
The wiki already lists 'Rewrite "git repack" in C' at the
very end, so my sense of where to start wasn't way off. ;)




* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 17:04         ` Junio C Hamano
@ 2013-08-15  7:53           ` Stefan Beller
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-15  7:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Matthieu.Moy, pclouds, iveqy, apelisse

On 08/14/2013 07:04 PM, Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:
> 
>> diff --git a/builtin/repack.c b/builtin/repack.c
>> new file mode 100644
>> index 0000000..d39c34e
>> --- /dev/null
>> +++ b/builtin/repack.c
>> @@ -0,0 +1,410 @@
>> +/*
>> + * The shell version was written by Linus Torvalds (2005) and many others.
>> + * This is a translation into C by Stefan Beller (2013)
>> + */
>> +
>> +#include "builtin.h"
>> +#include "cache.h"
>> +#include "dir.h"
>> +#include "parse-options.h"
>> +#include "run-command.h"
>> +#include "sigchain.h"
>> +#include "strbuf.h"
>> +#include "string-list.h"
>> +
>> +#include <sys/types.h>
>> +#include <unistd.h>
>> +#include <stdio.h>
>> +#include <dirent.h>
> 
> If you need these system-includes here, it means that our own
> platform-portability layer "git-compat-util.h" is broken.  On
> various systems, often some system header files are missing, need a
> few feature macros to be defined before including, and/or need to be
> included in certain order, etc., and "git-compat-util.h" is meant to
> hide all such details from the programmers.
> 
> I do not think the above four need to be included in *.c, as long
> as you include either builtin.h or cache.h, both of which include
> the compat-util header.
> 

Thanks. It works without these includes as well. I think I added them
before I realized how much infrastructure is already available, and
then forgot to remove them once I had added the other headers.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-15  7:46               ` Stefan Beller
@ 2013-08-15 15:04                 ` Martin Fick
  0 siblings, 0 replies; 72+ messages in thread
From: Martin Fick @ 2013-08-15 15:04 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Matthieu Moy, Antoine Pelisse, git,
	Nguyễn Thái Ngọc Duy, iveqy, Junio C Hamano

On Thursday, August 15, 2013 01:46:02 am Stefan Beller 
wrote:
> On 08/15/2013 01:25 AM, Martin Fick wrote:
> > On Wednesday, August 14, 2013 04:51:14 pm Matthieu Moy
> > 
> > wrote:
> >> Antoine Pelisse <apelisse@gmail.com> writes:
> >>> On Wed, Aug 14, 2013 at 6:27 PM, Stefan Beller
> >>> 
> >>> <stefanbeller@googlemail.com> wrote:
> >>>>  builtin/repack.c               | 410 +++++++++++++++++++++++++++++++++++++++++
> >>>>  contrib/examples/git-repack.sh | 194 +++++++++++++++++++
> >>>>  git-repack.sh                  | 194 -------------------
> >>> 
> >>> I'm still not sure I understand the trade-off here.
> >>> 
> >>> Most of what git-repack does is compute some file
> >>> paths, (re)move those files and call
> >>> git-pack-objects, and potentially git-prune-packed
> >>> and
> >>> git-update-server-info.
> >>> Maybe I'm wrong, but I have the feeling that the
> >>> correct tool for that is Shell, rather than C (and I
> >>> think the code looks less intuitive in C for that
> >>> matter).
> >> 
> >> There's a real problem with git-repack being shell (I
> >> already mentioned it in the previous thread about the
> >> rewrite): it creates dependencies on a few external
> >> binaries, and a restricted server may not have them. I
> >> have this issue on a fusionforge server where Git
> >> repos are accessed in a chroot with very few commands
> >> available: everything went OK until the first project
> >> grew enough to require a "git gc --auto", and then it
> >> stopped accepting pushes for that project.
> >> 
> >> I tracked down the origin of the problem and the
> >> sysadmins disabled auto-gc, but that's not a very
> >> satisfactory solution.
> >> 
> >> C is rather painful to write, but as a sysadmin, drop
> >> the binary on your server and it just works. That's
> >> really important. AFAIK, git-repack is the only
> >> remaining shell part on the server, and it's rather
> >> small. I'd really love to see it disappear.
> > 
> > I didn't review the proposed C version, but how was it
> > planning on removing the dependencies on these
> > binaries? Was it planning to reimplement mv, cp, find?
> 
> These small programs (at least mv and cp) are just
> convenient interfaces for system calls from within the
> shell. You can use these system calls to achieve
> results similar to the command-line tools.
> http://linux.die.net/man/2/rename
> http://linux.die.net/man/2/unlink

Sure, but have you ever looked at the code to mv?  It isn't 
pretty. ;(  But in all that ugliness is decades worth of 
portability and corner cases.  Also, mv is smart enough to 
copy when rename doesn't work (on some systems it doesn't).  
So C may sound more portable, but I am not sure it actually 
is.  Now hopefully you won't need all of that, but I think 
that some of the design decisions that went into git-repack 
did consider some of the more eccentric filesystems out 
there.
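
To make the copy fallback concrete, here is a rough sketch of the
core of what mv does when rename(2) fails because source and
destination are on different filesystems (EXDEV).  move_file and
copy_file are made-up names for illustration only, and a real mv
handles far more (permissions, timestamps, directories, partial
writes), so treat this as the bare idea and nothing more.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst byte for byte. Returns 0 on success. */
static int copy_file(const char *src, const char *dst)
{
	char buf[8192];
	size_t n;
	FILE *in, *out;
	int ret = 0;

	in = fopen(src, "rb");
	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			ret = -1;
			break;
		}
	}
	if (ferror(in))
		ret = -1;
	fclose(in);
	if (fclose(out))	/* flush errors surface here */
		ret = -1;
	return ret;
}

/* Hypothetical helper: rename, falling back to copy + unlink on EXDEV. */
static int move_file(const char *src, const char *dst)
{
	assert(src && dst);
	if (!rename(src, dst))
		return 0;
	if (errno != EXDEV)
		return -1;	/* some other failure; do not guess */
	if (copy_file(src, dst)) {
		unlink(dst);	/* do not leave a partial copy behind */
		return -1;
	}
	return unlink(src);
}
```

Note that the fallback is no longer atomic the way a plain rename
is, which is one of the corner cases a C rewrite would have to
decide how to handle.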

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC PATCH] repack: rewrite the shell script in C.
  2013-08-14 23:28                   ` Martin Fick
@ 2013-08-15 17:15                     ` Junio C Hamano
  2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2013-08-15 17:15 UTC (permalink / raw)
  To: Martin Fick
  Cc: Stefan Beller, Antoine Pelisse, git, Matthieu Moy,
	Nguyễn Thái Ngọc Duy, iveqy

Martin Fick <mfick@codeaurora.org> writes:

> On Wednesday, August 14, 2013 04:53:36 pm Junio C Hamano 
> wrote:
>> Martin Fick <mfick@codeaurora.org> writes:
>> > One suggestion would be to change the repack code to
>> > create pack filenames based on the sha1 of the
>> > contents of the pack file instead of on the sha1 of
>> > the objects in the packfile. ...
>> > I am not 100% sure if the change in naming convention I
>> > propose wouldn't cause any problems?  But if others
>> > agree it is a good idea, perhaps it is something a
>> > beginner could do?
>> 
>> I would not be surprised if that change breaks some other
>> people's reimplementation.  I know we do not validate
>> the pack name with the hash of the contents in the
>> current code, but at the same time I do remember that
>> was one of the planned things to be done while I and
>> Linus were working on the original pack design, which
>> was the last task we did together before he retired from
>> the maintainership of this project.
>
> Perhaps a config option?  One that becomes standard for git 
> 2.0?

Anything new is too late for Git 2.0, as we do not want to hold the
switching of push.default to "simple" too long.  End of this year
might be a bit too soon, but I want 2.0 to happen by the next
spring.

You can discuss, design the new naming and necessary transition plan
for existing repositories, reach a consensus and declare the name
switch in the future, and then schedule that for the next major
version bump after 2.0 happens.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [RFC PATCHv2] repack: rewrite the shell script in C.
  2013-08-15 17:15                     ` Junio C Hamano
@ 2013-08-16  0:12                       ` Stefan Beller
  2013-08-17 13:34                         ` René Scharfe
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-16  0:12 UTC (permalink / raw)
  To: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster; +Cc: Stefan Beller

This is the beginning of the rewrite of the repacking.

 * Removed unneeded system header files
 * corrected remove_pack to really remove any pack files with the given
   sha1
 * fail if pack-objects fails
 * With this patch applied, only the 2nd test in t7701 still fails.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 411 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 414 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..f72911d
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,411 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files() {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path, *fname;
+
+	prefix = xmalloc(strlen(".tmp-10000-pack") + 1);
+	sprintf(prefix, ".tmp-%d-pack", getpid());
+
+	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
+	sprintf(path, "%s/pack", get_object_directory());
+
+	fname = xmalloc(strlen(path) + strlen("/")
+		+ strlen(prefix) + strlen("/")
+		+ 40 + strlen(".pack") + 1);
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			sprintf(fname, "%s/%s", path, e->d_name);
+			unlink(fname);
+		}
+	}
+	free(fname);
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+void get_pack_sha1_list(char *packdir, struct string_list *sha1_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix;
+
+	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
+	sprintf(path, "%s/pack", get_object_directory());
+
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			char buf[255], *sha1;
+			strcpy(buf, e->d_name);
+			buf[strlen(e->d_name) - strlen(suffix)] = '\0';
+			sha1 = &buf[strlen(e->d_name) - strlen(suffix) - 40];
+			string_list_append(sha1_list, sha1);
+		}
+	}
+	free(path);
+	closedir(dir);
+}
+
+/*
+ * remove_pack will remove any files following the pattern *${SHA1}.{EXT}
+ * where EXT is one of {pack, idx, keep}. The SHA1 consists of 40 chars and
+ * is specified by the sha1 parameter.
+ * path is specifying the directory in which all found files will be deleted.
+ */
+void remove_pack(char *path, char* sha1)
+{
+	DIR *dir;
+	struct dirent *e;
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+
+		char *sha_begin, *sha_end;
+		sha_end = e->d_name + strlen(e->d_name);
+		while (sha_end > e->d_name && *sha_end != '.')
+			sha_end--;
+
+		/* do not touch files not ending in .pack, .idx or .keep */
+		if (strcmp(sha_end, ".pack") &&
+			strcmp(sha_end, ".idx") &&
+			strcmp(sha_end, ".keep"))
+			continue;
+
+		sha_begin = sha_end - 40;
+
+		if (sha_begin >= e->d_name && !strncmp(sha_begin, sha1, 40)) {
+			char *fname;
+			fname = xmalloc(strlen(path) + 1 + strlen(e->d_name));
+			sprintf(fname, "%s/%s", path, e->d_name);
+			unlink(fname);
+			/*TODO: free(fname); fails here sometimes, needs investigation*/
+		}
+	}
+	closedir(dir);
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	unsigned long unpack_unreachable = 0;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	const char *cmd_args[20];
+	int cmd_i = 0;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct stat statbuffer;
+	char window_str[64], window_mem_str[64], depth_str[64], max_pack_size_str[64];
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_DATE(0, "unpack-unreachable", &unpack_unreachable,
+				N_("with -A, do not loosen objects older than this")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpath("%s/pack", get_object_directory());
+	packtmp = xmalloc(strlen(packdir) + strlen("/.tmp-10000-pack") + 1);
+	sprintf(packtmp, "%s/.tmp-%d-pack", packdir, getpid());
+
+	remove_temporary_files();
+
+	cmd_args[cmd_i++] = "pack-objects";
+	cmd_args[cmd_i++] = "--keep-true-parents";
+	cmd_args[cmd_i++] = "--honor-pack-keep";
+	cmd_args[cmd_i++] = "--non-empty";
+	cmd_args[cmd_i++] = "--all";
+	cmd_args[cmd_i++] = "--reflog";
+
+	if (window) {
+		sprintf(window_str, "--window=%u", window);
+		cmd_args[cmd_i++] = window_str;
+	}
+	if (window_memory) {
+		sprintf(window_mem_str, "--window-memory=%u", window_memory);
+		cmd_args[cmd_i++] = window_mem_str;
+	}
+	if (depth) {
+		sprintf(depth_str, "--depth=%u", depth);
+		cmd_args[cmd_i++] = depth_str;
+	}
+	if (max_pack_size) {
+		sprintf(max_pack_size_str, "--max-pack-size=%u", max_pack_size);
+		cmd_args[cmd_i++] = max_pack_size_str;
+	}
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		cmd_args[cmd_i++] = "--unpacked";
+		cmd_args[cmd_i++] = "--incremental";
+	} else {
+		if (pack_everything_but_loose)
+			cmd_args[cmd_i++] = "--unpack-unreachable";
+
+		struct string_list sha1_list = STRING_LIST_INIT_DUP;
+		get_pack_sha1_list(packdir, &sha1_list);
+		for_each_string_list_item(item, &sha1_list) {
+			char *fname;
+			fname = xmalloc(strlen(packdir) + strlen("/") + 40 + strlen(".keep"));
+			sprintf(fname, "%s/%s.keep", packdir, item->string);
+			if (!stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+		}
+
+		if (existing_packs.nr && unpack_unreachable && delete_redundant) {
+			/*
+			 * TODO: convert unpack_unreachable (being time since epoch)
+			 * to an aproxidate again
+			 */
+			cmd_args[cmd_i++] = "--unpack-unreachable=$DATE";
+		}
+	}
+
+	if (local)
+		cmd_args[cmd_i++] = "--local";
+
+	if (delta_base_offset)
+		cmd_args[cmd_i++] = "--delta-base-offset";
+
+	cmd_args[cmd_i++] = packtmp;
+	cmd_args[cmd_i] = NULL;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = cmd_args;
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	if (run_command(&cmd))
+		return 1;
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		assert(strlen(line) == 41);
+		line[40] = '\0';
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	char *fname, *fname_old;
+	fname = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
+	strcpy(fname, packdir);
+	strcpy(fname + strlen(packdir), "/");
+
+	fname_old = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
+	strcpy(fname_old, packdir);
+	strcpy(fname_old + strlen(packdir), "/");
+	char *exts[2] = {".idx", ".pack"};
+
+	int failed = 0;
+
+	for_each_string_list_item(item, &names) {
+		int ext;
+		for (ext = 0; ext < 2; ext++) {
+			strcpy(fname + strlen(packdir) + 1, item->string);
+			strcpy(fname + strlen(packdir) + 41, exts[ext]);
+			if (!file_exists(fname))
+				continue;
+
+			strcpy(fname_old, packdir);
+			strcpy(fname_old + strlen(packdir), "/old-");
+			strcpy(fname_old + strlen(packdir) + 5, item->string);
+			strcpy(fname_old + strlen(packdir) + 45, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break while loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
+
+		for_each_string_list_item(item, &rollback) {
+			sprintf(fname, "%s/%s", packdir, item->string);
+			sprintf(fname_old, "%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	struct string_list fullbases = STRING_LIST_INIT_DUP;
+	for_each_string_list_item(item, &names) {
+		string_list_append(&fullbases, item->string);
+
+		sprintf(fname, "%s/pack-%s.pack", packdir, item->string);
+		sprintf(fname_old, "%s-%s.pack", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+
+		sprintf(fname, "%s/pack-%s.idx", packdir, item->string);
+		sprintf(fname_old, "%s-%s.idx", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		sprintf(fname, "%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+
+		sprintf(fname, "%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		sort_string_list(&fullbases);
+		fname = xmalloc(strlen(packtmp) + strlen("/") + 40 + strlen(".pack") + 1);
+		for_each_string_list_item(item, &existing_packs) {
+			if (!string_list_has_string(&fullbases, item->string))
+				remove_pack(packdir, item->string);
+		}
+		free(fname);
+		cmd_i = 0;
+		cmd_args[cmd_i++] = "prune-packed";
+		cmd_args[cmd_i++] = NULL;
+		/* TODO: pass argument: ${GIT_QUIET:+-q} */
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		cmd_i = 0;
+		cmd_args[cmd_i++] = "update-server-info";
+		cmd_args[cmd_i++] = NULL;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
+
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.1.gc1ebd90

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [RFC PATCHv2] repack: rewrite the shell script in C.
  2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
@ 2013-08-17 13:34                         ` René Scharfe
  2013-08-17 19:18                           ` Kyle J. McKay
  2013-08-18 14:34                           ` Stefan Beller
  0 siblings, 2 replies; 72+ messages in thread
From: René Scharfe @ 2013-08-17 13:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster

>
>
> This is the beginning of the rewrite of the repacking.
>
>  * Removed unneeded system header files
>  * corrected remove_pack to really remove any pack files with the given
>    sha1
>  * fail if pack-objects fails
>  * With this patch applied, only the 2nd test in t7701 still fails.
>
> Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
> ---
>  Makefile                                        |   2 +-
>  builtin.h                                       |   1 +
>  builtin/repack.c                                | 411 ++++++++++++++++++++++++
>  git-repack.sh => contrib/examples/git-repack.sh |   0
>  git.c                                           |   1 +
>  5 files changed, 414 insertions(+), 1 deletion(-)
>  create mode 100644 builtin/repack.c
>  rename git-repack.sh => contrib/examples/git-repack.sh (100%)
>
> diff --git a/Makefile b/Makefile
> index ef442eb..4ec5bbe 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
>  SCRIPT_SH += git-quiltimport.sh
>  SCRIPT_SH += git-rebase.sh
>  SCRIPT_SH += git-remote-testgit.sh
> -SCRIPT_SH += git-repack.sh
>  SCRIPT_SH += git-request-pull.sh
>  SCRIPT_SH += git-stash.sh
>  SCRIPT_SH += git-submodule.sh
> @@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
>  BUILTIN_OBJS += builtin/remote.o
>  BUILTIN_OBJS += builtin/remote-ext.o
>  BUILTIN_OBJS += builtin/remote-fd.o
> +BUILTIN_OBJS += builtin/repack.o
>  BUILTIN_OBJS += builtin/replace.o
>  BUILTIN_OBJS += builtin/rerere.o
>  BUILTIN_OBJS += builtin/reset.o
> diff --git a/builtin.h b/builtin.h
> index 8afa2de..b56cf07 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
>  extern int cmd_remote(int argc, const char **argv, const char *prefix);
>  extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
>  extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
> +extern int cmd_repack(int argc, const char **argv, const char *prefix);
>  extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
>  extern int cmd_rerere(int argc, const char **argv, const char *prefix);
>  extern int cmd_reset(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/repack.c b/builtin/repack.c
> new file mode 100644
> index 0000000..f72911d
> --- /dev/null
> +++ b/builtin/repack.c
> @@ -0,0 +1,411 @@
> +/*
> + * The shell version was written by Linus Torvalds (2005) and many others.
> + * This is a translation into C by Stefan Beller (2013)
> + */
> +
> +#include "builtin.h"
> +#include "cache.h"
> +#include "dir.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +#include "sigchain.h"
> +#include "strbuf.h"
> +#include "string-list.h"
> +
> +static const char *const git_repack_usage[] = {
> +	N_("git repack [options]"),
> +	NULL
> +};
> +
> +/* enabled by default since 22c79eab (2008-06-25) */
> +static int delta_base_offset = 1;
> +
> +static int repack_config(const char *var, const char *value, void *cb)
> +{
> +	if (!strcmp(var, "repack.usedeltabaseoffset")) {
> +		delta_base_offset = git_config_bool(var, value);
> +		return 0;
> +	}
> +	return git_default_config(var, value, cb);
> +}
> +
> +static void remove_temporary_files() {
> +	DIR *dir;
> +	struct dirent *e;
> +	char *prefix, *path, *fname;
> +
> +	prefix = xmalloc(strlen(".tmp-10000-pack") + 1);
> +	sprintf(prefix, ".tmp-%d-pack", getpid());

This will overflow for PIDs with more than five digits. Better use a 
strbuf and build the string with strbuf_addf.  Or better, use mkpathdup, 
which does that under the hood.
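
For illustration, a standalone sketch of the "measure first, then
allocate" idea that makes the fixed-size guess unnecessary.
format_dup is a hypothetical name and uses only the C library; it is
not git's strbuf_addf or mkpathdup, just an approximation of the
sizing they take care of for the caller.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for strbuf_addf/mkpathdup-style formatting:
 * measure the formatted length first, then allocate exactly that much.
 * Returns a malloc'ed string, or NULL on failure.
 */
static char *format_dup(const char *fmt, ...)
{
	va_list ap, ap2;
	int len;
	char *buf;

	assert(fmt);
	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap);	/* length without the NUL */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}
```

With this, format_dup(".tmp-%d-pack", (int)getpid()) is sized
correctly no matter how many digits the PID has.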

> +
> +	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
> +	sprintf(path, "%s/pack", get_object_directory());

mkpathdup?

> +
> +	fname = xmalloc(strlen(path) + strlen("/")
> +		+ strlen(prefix) + strlen("/")
> +		+ 40 + strlen(".pack") + 1);
> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {
> +		if (!prefixcmp(e->d_name, prefix)) {
> +			sprintf(fname, "%s/%s", path, e->d_name);

If someone has a directory entry that begins with 'prefix' but is longer 
than expected this will overflow.  That's unlikely, but let's avoid it 
outright.  You could use a strbuf instead and reset it to the length of 
the prefix at the start of the loop (with strbuf_setlen), before adding 
the entry's name.
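
A sketch of that reset-and-append pattern, using a tiny growable
buffer written only for this mail.  struct minibuf and its two
helpers are hypothetical and merely imitate the shape of the strbuf
calls mentioned above; they are not git's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a growable string buffer; not git's strbuf. */
struct minibuf {
	char *buf;
	size_t len, alloc;
};

/* Append s, growing the allocation when needed. */
static void minibuf_addstr(struct minibuf *mb, const char *s)
{
	size_t n = strlen(s);

	if (mb->len + n + 1 > mb->alloc) {
		size_t want = (mb->len + n + 1) * 2;
		char *p = realloc(mb->buf, want);

		assert(p);	/* real code would report the failure and die */
		mb->buf = p;
		mb->alloc = want;
	}
	memcpy(mb->buf + mb->len, s, n + 1);	/* copies the NUL too */
	mb->len += n;
}

/* Truncate back to a shorter length, keeping the allocation. */
static void minibuf_setlen(struct minibuf *mb, size_t len)
{
	assert(mb->buf && len <= mb->len);
	mb->len = len;
	mb->buf[len] = '\0';
}
```

In the readdir loop you would add the directory and "/" once,
remember that length, and call minibuf_setlen with it before adding
each e->d_name.  An unexpectedly long entry then only makes the
buffer grow.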

> +			unlink(fname);
> +		}
> +	}
> +	free(fname);
> +	free(prefix);
> +	free(path);
> +	closedir(dir);
> +}
> +
> +static void remove_pack_on_signal(int signo)
> +{
> +	remove_temporary_files();
> +	sigchain_pop(signo);
> +	raise(signo);
> +}
> +
> +void get_pack_sha1_list(char *packdir, struct string_list *sha1_list)
> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *path, *suffix;
> +
> +	path = xmalloc(strlen(get_object_directory()) + strlen("/pack") + 1);
> +	sprintf(path, "%s/pack", get_object_directory());

mkpathdup again.

Or would it make sense to cd into the pack directory and avoid these 
string manipulations outright?  Probably not if we need to call other 
git functions later on.

> +
> +	suffix = ".pack";
> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {
> +		if (!suffixcmp(e->d_name, suffix)) {
> +			char buf[255], *sha1;
> +			strcpy(buf, e->d_name);
> +			buf[strlen(e->d_name) - strlen(suffix)] = '\0';
> +			sha1 = &buf[strlen(e->d_name) - strlen(suffix) - 40];
> +			string_list_append(sha1_list, sha1);

You could avoid the need for a temporary buffer by using xmemdupz and 
string_list_append_nodup instead.
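
Roughly like this, as a standalone sketch.  dup_slice is a
hypothetical stand-in for xmemdupz, and pack_sha1_from_name is a
made-up helper; the extra length check is an addition of this sketch
so that short names cannot index before the start of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical equivalent of xmemdupz: copy len bytes and NUL-terminate. */
static char *dup_slice(const char *data, size_t len)
{
	char *p = malloc(len + 1);

	assert(p);	/* real code would die() on allocation failure */
	memcpy(p, data, len);
	p[len] = '\0';
	return p;
}

/*
 * Return a newly allocated copy of the 40 chars in front of ".pack",
 * or NULL if the name is too short or has a different suffix.
 */
static char *pack_sha1_from_name(const char *name)
{
	static const char suffix[] = ".pack";
	size_t len = strlen(name), suffix_len = sizeof(suffix) - 1;

	if (len < 40 + suffix_len)
		return NULL;
	if (strcmp(name + len - suffix_len, suffix))
		return NULL;
	return dup_slice(name + len - suffix_len - 40, 40);
}
```

The returned string is already heap-allocated, so it can be handed
to the list without another copy.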

> +		}
> +	}
> +	free(path);
> +	closedir(dir);
> +}
> +
> +/*
> + * remove_pack will remove any files following the pattern *${SHA1}.{EXT}
> + * where EXT is one of {pack, idx, keep}. The SHA1 consists of 40 chars and
> + * is specified by the sha1 parameter.
> + * path is specifying the directory in which all found files will be deleted.
> + */
> +void remove_pack(char *path, char* sha1)
> +{
> +	DIR *dir;
> +	struct dirent *e;
> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {
> +

Extra newline.

> +		char *sha_begin, *sha_end;
> +		sha_end = e->d_name + strlen(e->d_name);
> +		while (sha_end > e->d_name && *sha_end != '.')
> +			sha_end--;
> +
> +		/* do not touch files not ending in .pack, .idx or .keep */
> +		if (strcmp(sha_end, ".pack") &&
> +			strcmp(sha_end, ".idx") &&
> +			strcmp(sha_end, ".keep"))
> +			continue;
> +
> +		sha_begin = sha_end - 40;
> +
> +		if (sha_begin >= e->d_name && !strncmp(sha_begin, sha1, 40)) {
> +			char *fname;
> +			fname = xmalloc(strlen(path) + 1 + strlen(e->d_name));
> +			sprintf(fname, "%s/%s", path, e->d_name);

mkpathdup..

> +			unlink(fname);
> +			/*TODO: free(fname); fails here sometimes, needs investigation*/

Strange.  Perhaps valgrind can tell you what's wrong.
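
One guess, going only by the lines quoted above: the xmalloc size
has room for path, the slash and the name, but not for the
terminating NUL, so sprintf writes one byte past the allocation.
Heap corruption of that kind often only shows up later, for example
in free().  A standalone sketch of the size arithmetic (join_path is
a hypothetical name):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate and build "<dir>/<name>". */
static char *join_path(const char *dir, const char *name)
{
	/* dir + '/' + name + terminating NUL */
	size_t size = strlen(dir) + 1 + strlen(name) + 1;
	char *path = malloc(size);

	assert(path);	/* real code would die() on allocation failure */
	snprintf(path, size, "%s/%s", dir, name);
	return path;
}
```

Using snprintf with the computed size also means that a wrong
calculation truncates the string instead of overrunning the buffer.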

> +		}
> +	}
> +	closedir(dir);
> +}

Hmm, stepping back a bit, why not just build the paths and call unlink 
for them right away, without readdir?  The shell version only ever 
deletes existing .pack files (those in $existing alias existing_packs) 
as well as their .idx and .keep files, if present.  It doesn't use a 
glob pattern, unlike remove_pack here.
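
Something along these lines, as a sketch only.  remove_pack_files is
a hypothetical name, and the "pack-" file name prefix and the
4096-byte path buffer are assumptions of this example; in the real
code the path would be built with the helpers mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical rewrite of remove_pack without readdir: try the three
 * known file names directly. A missing file (ENOENT) is not an error.
 * Returns the number of files removed, or -1 on any other failure.
 */
static int remove_pack_files(const char *packdir, const char *sha1)
{
	static const char *exts[] = { ".pack", ".idx", ".keep" };
	char path[4096];
	int removed = 0;
	size_t i;

	assert(packdir && sha1);
	for (i = 0; i < sizeof(exts) / sizeof(exts[0]); i++) {
		int n = snprintf(path, sizeof(path), "%s/pack-%s%s",
				 packdir, sha1, exts[i]);

		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;	/* path would have been truncated */
		if (!unlink(path))
			removed++;
		else if (errno != ENOENT)
			return -1;
	}
	return removed;
}
```

This never touches a file whose name merely happens to end in the
same 40 characters, which the readdir-based matching could.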

> +
> +int cmd_repack(int argc, const char **argv, const char *prefix) {
> +
> +	int pack_everything = 0;
> +	int pack_everything_but_loose = 0;
> +	int delete_redundant = 0;
> +	unsigned long unpack_unreachable = 0;
> +	int window = 0, window_memory = 0;
> +	int depth = 0;
> +	int max_pack_size = 0;
> +	int no_reuse_delta = 0, no_reuse_object = 0;
> +	int no_update_server_info = 0;
> +	int quiet = 0;
> +	int local = 0;
> +	char *packdir, *packtmp;
> +	const char *cmd_args[20];
> +	int cmd_i = 0;
> +	struct child_process cmd;
> +	struct string_list_item *item;
> +	struct string_list existing_packs = STRING_LIST_INIT_DUP;
> +	struct stat statbuffer;
> +	char window_str[64], window_mem_str[64], depth_str[64], max_pack_size_str[64];
> +
> +	struct option builtin_repack_options[] = {
> +		OPT_BOOL('a', "all", &pack_everything,
> +				N_("pack everything in a single pack")),
> +		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
> +				N_("same as -a, and turn unreachable objects loose")),
> +		OPT_BOOL('d', "delete-redundant", &delete_redundant,
> +				N_("remove redundant packs, and run git-prune-packed")),
> +		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
> +				N_("pass --no-reuse-delta to git-pack-objects")),
> +		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
> +				N_("pass --no-reuse-object to git-pack-objects")),
> +		OPT_BOOL('n', NULL, &no_update_server_info,
> +				N_("do not run git-update-server-info")),
> +		OPT__QUIET(&quiet, N_("be quiet")),
> +		OPT_BOOL('l', "local", &local,
> +				N_("pass --local to git-pack-objects")),
> +		OPT_DATE(0, "unpack-unreachable", &unpack_unreachable,
> +				N_("with -A, do not loosen objects older than this Packing constraints")),
> +		OPT_INTEGER(0, "window", &window,
> +				N_("size of the window used for delta compression")),
> +		OPT_INTEGER(0, "window-memory", &window_memory,
> +				N_("same as the above, but limit memory size instead of entries count")),
> +		OPT_INTEGER(0, "depth", &depth,
> +				N_("limits the maximum delta depth")),
> +		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
> +				N_("maximum size of each packfile")),
> +		OPT_END()
> +	};
> +
> +	git_config(repack_config, NULL);
> +
> +	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> +				git_repack_usage, 0);
> +
> +	sigchain_push_common(remove_pack_on_signal);
> +
> +	packdir = mkpath("%s/pack", get_object_directory());

Ah, mkpath is already used.  That function is a bit tricky because you
can only use it with four paths concurrently and some internal functions
might already need one (or more) of the slots for themselves.  It's better
to consume its output on the spot (like in printf("my path is %s\n",
mkpath(...));) or use mkpathdup.
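
The pitfall can be reproduced with a small stand-in. The sketch below is not git's mkpath; rot_path is a hypothetical function that hands out one of four static buffers in rotation, which is the behaviour described above. It shows how a pointer kept across later calls ends up pointing at someone else's path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_SLOTS 4	/* assumed slot count, as stated in the message */

/* Stand-in for mkpath(): formats into one of NR_SLOTS static buffers. */
const char *rot_path(const char *fmt, ...)
{
	static char bufs[NR_SLOTS][256];
	static int next;
	char *buf = bufs[next];
	va_list ap;

	assert(fmt);
	next = (next + 1) % NR_SLOTS;
	va_start(ap, fmt);
	vsnprintf(buf, sizeof(bufs[0]), fmt, ap);
	va_end(ap);
	return buf;
}

/* Returns 1 if a result obtained earlier was clobbered by later calls. */
int first_result_clobbered(void)
{
	const char *first = rot_path("%s/pack", "objects");
	int i;

	/* NR_SLOTS further calls wrap around and reuse first's buffer. */
	for (i = 0; i < NR_SLOTS; i++)
		rot_path("unrelated-%d", i);
	return strcmp(first, "objects/pack") != 0;
}
```

A duplicating variant avoids the problem because every caller owns its own allocation, at the price of having to free it.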

> +	packtmp = xmalloc(strlen(packdir) + strlen("/.tmp-10000-pack") + 1);
> +	sprintf(packtmp, "%s/.tmp-%d-pack", packdir, getpid());

mkpathdup again..

> +
> +	remove_temporary_files();
> +
> +	cmd_args[cmd_i++] = "pack-objects";
> +	cmd_args[cmd_i++] = "--keep-true-parents";
> +	cmd_args[cmd_i++] = "--honor-pack-keep";
> +	cmd_args[cmd_i++] = "--non-empty";
> +	cmd_args[cmd_i++] = "--all";
> +	cmd_args[cmd_i++] = "--reflog";
> +
> +	if (window) {
> +		sprintf(window_str, "--window=%u", window);
> +		cmd_args[cmd_i++] = window_str;
> +	}
> +	if (window_memory) {
> +		sprintf(window_mem_str, "--window-memory=%u", window_memory);
> +		cmd_args[cmd_i++] = window_str;
> +	}
> +	if (depth) {
> +		sprintf(depth_str, "--depth=%u", depth);
> +		cmd_args[cmd_i++] = depth_str;
> +	}
> +	if (max_pack_size) {
> +		sprintf(max_pack_size_str, "--max_pack_size=%u", max_pack_size);
> +		cmd_args[cmd_i++] = max_pack_size_str;
> +	}


You can simplify that part by using argv_array and argv_array_pushf.
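
As a rough standalone illustration of why a growable, formatting argument vector helps, the sketch below uses a hypothetical struct arg_vec with arg_vec_pushf and arg_vec_clear. These names are stand-ins invented for this example and are not git's argv_array API. Each formatted argument gets its own allocation, so there are no per-option scratch buffers to mix up (the quoted window-memory branch pushes window_str, for instance) and no fixed cmd_args[20] limit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a NULL-terminated, growable argument vector. */
struct arg_vec {
	char **argv;
	size_t nr, alloc;
};

/* Append a printf-style formatted argument; aborts on allocation failure. */
void arg_vec_pushf(struct arg_vec *v, const char *fmt, ...)
{
	va_list ap;
	int len;
	char *s;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure first */
	va_end(ap);
	if (len < 0)
		abort();
	s = malloc((size_t)len + 1);
	if (!s)
		abort();
	va_start(ap, fmt);
	vsnprintf(s, (size_t)len + 1, fmt, ap);
	va_end(ap);

	if (v->nr + 2 > v->alloc) {	/* room for s and the trailing NULL */
		v->alloc = v->alloc ? v->alloc * 2 : 8;
		v->argv = realloc(v->argv, v->alloc * sizeof(*v->argv));
		if (!v->argv)
			abort();
	}
	v->argv[v->nr++] = s;
	v->argv[v->nr] = NULL;
}

/* Free every argument and reset the vector to its empty state. */
void arg_vec_clear(struct arg_vec *v)
{
	size_t i;

	for (i = 0; i < v->nr; i++)
		free(v->argv[i]);
	free(v->argv);
	v->argv = NULL;
	v->nr = v->alloc = 0;
}
```

A caller would start from an all-zero struct, push "pack-objects" and the optional "--window=%d" style arguments, and hand v.argv to whatever spawns the child process.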

> +
> +	if (pack_everything + pack_everything_but_loose == 0) {
> +		cmd_args[cmd_i++] = "--unpacked";
> +		cmd_args[cmd_i++] = "--incremental";
> +	} else {
> +		if (pack_everything_but_loose)
> +			cmd_args[cmd_i++] = "--unpack-unreachable";
> +
> +		struct string_list sha1_list = STRING_LIST_INIT_DUP;
> +		get_pack_sha1_list(packdir, &sha1_list);
> +		for_each_string_list_item(item, &sha1_list) {
> +			char *fname;
> +			fname = xmalloc(strlen(packdir) + strlen("/") + 40 + strlen(".keep"));
> +			sprintf(fname, "%s/%s.keep", packdir, item->string);

mkpathdup..

Or maybe go through the entries in the pack directory once, like already 
done in get_pack_sha1_list, and instead of just making a list of .pack 
files, make a list of .keep files as well.  Then work with those lists 
instead of accessing the directory again with stat.

> +			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
> +				/* when the keep file is there, we're ignoring that pack */
> +			} else {
> +				string_list_append(&existing_packs, item->string);
> +			}
> +		}
> +
> +		if (existing_packs.nr && unpack_unreachable && delete_redundant) {
> +			/*
> +			 * TODO: convert unpack_unreachable (being time since epoch)
> +			 * to an aproxidate again
> +			 */
> +			cmd_args[cmd_i++] = "--unpack-unreachable=$DATE";
> +		}
> +	}
> +
> +	if (local)
> +		cmd_args[cmd_i++] = "--local";
> +
> +	if (delta_base_offset)
> +		cmd_args[cmd_i++] = "--delta-base-offset";
> +
> +	cmd_args[cmd_i++] = packtmp;
> +	cmd_args[cmd_i] = NULL;
> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.argv = cmd_args;
> +	cmd.git_cmd = 1;
> +	cmd.out = -1;
> +	cmd.no_stdin = 1;
> +
> +	if (run_command(&cmd))
> +		return 1;
> +
> +	struct string_list names = STRING_LIST_INIT_DUP;
> +	struct string_list rollback = STRING_LIST_INIT_DUP;
> +
> +	char line[1024];
> +	int counter = 0;
> +	FILE *out = xfdopen(cmd.out, "r");
> +	while (fgets(line, sizeof(line), out)) {
> +		/* a line consists of 40 hex chars + '\n' */
> +		assert(strlen(line) == 41);
> +		line[40] = '\0';
> +		string_list_append(&names, line);
> +		counter++;
> +	}
> +	if (!counter)
> +		printf("Nothing new to pack.\n");
> +	fclose(out);
> +
> +	char *fname, *fname_old;
> +	fname = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
> +	strcpy(fname, packdir);
> +	strcpy(fname + strlen(packdir), "/");
> +
> +	fname_old = xmalloc(strlen(packdir) + strlen("/old-pack-") + 40 + strlen(".pack") + 1);
> +	strcpy(fname_old, packdir);
> +	strcpy(fname_old + strlen(packdir), "/");
> +	char *exts[2] = {".idx", ".pack"};
> +
> +	int failed = 0;
> +
> +	for_each_string_list_item(item, &names) {
> +		int ext;
> +		for (ext = 0; ext < 1; ext++) {
> +			strcpy(fname + strlen(packdir) + 1, item->string);
> +			strcpy(fname + strlen(packdir) + 41, exts[ext]);
> +			if (!file_exists(fname))
> +				continue;
> +
> +			strcpy(fname_old, packdir);
> +			strcpy(fname_old + strlen(packdir), "/old-");
> +			strcpy(fname_old + strlen(packdir) + 5, item->string);
> +			strcpy(fname_old + strlen(packdir) + 45, exts[ext]);
> +			if (file_exists(fname_old))
> +				unlink(fname_old);
> +
> +			if (rename(fname, fname_old)) {
> +				failed = 1;
> +				break;
> +			}
> +			string_list_append(&rollback, fname);

mkpathdup with string_list_append_nodup instead?

> +		}
> +		if (failed)
> +			/* set to last element to break while loop */
> +			item = names.items + names.nr;
> +	}
> +	if (failed) {
> +		struct string_list rollback_failure;
> +
> +		for_each_string_list_item(item, &rollback) {
> +			sprintf(fname, "%s/%s", packdir, item->string);
> +			sprintf(fname_old, "%s/old-%s", packdir, item->string);
> +			if (rename(fname_old, fname))
> +				string_list_append(&rollback_failure, fname);

Ditto.

> +		}
> +
> +		if (rollback.nr) {
> +			int i;
> +			fprintf(stderr,
> +				"WARNING: Some packs in use have been renamed by\n"
> +				"WARNING: prefixing old- to their name, in order to\n"
> +				"WARNING: replace them with the new version of the\n"
> +				"WARNING: file.  But the operation failed, and\n"
> +				"WARNING: attempt to rename them back to their\n"
> +				"WARNING: original names also failed.\n"
> +				"WARNING: Please rename them in $PACKDIR manually:\n");
> +			for (i = 0; i < rollback.nr; i++)
> +				fprintf(stderr, "WARNING:   old-%s -> %s\n",
> +					rollback.items[i].string,
> +					rollback.items[i].string);
> +		}
> +		exit(1);
> +	}
> +
> +	/* Now the ones with the same name are out of the way... */
> +	struct string_list fullbases = STRING_LIST_INIT_DUP;
> +	for_each_string_list_item(item, &names) {
> +		string_list_append(&fullbases, item->string);

Why make a copy of 'names'?  Can't you use it directly instead of
'fullbases'?  Ah, the shell version adds a "pack-" at the beginning of
each string.  We don't need to do that and thus can get rid of that
extra list, no?

> +
> +		sprintf(fname, "%s/pack-%s.pack", packdir, item->string);
> +		sprintf(fname_old, "%s-%s.pack", packtmp, item->string);
> +		stat(fname_old, &statbuffer);
> +		statbuffer.st_mode &= ~S_IWOTH;
> +		chmod(fname_old, statbuffer.st_mode);
> +		if (rename(fname_old, fname))
> +			die("Could not rename packfile: %s -> %s", fname_old, fname);
> +
> +		sprintf(fname, "%s/pack-%s.idx", packdir, item->string);
> +		sprintf(fname_old, "%s-%s.idx", packtmp, item->string);
> +		stat(fname_old, &statbuffer);
> +		statbuffer.st_mode &= ~S_IWOTH;
> +		chmod(fname_old, statbuffer.st_mode);
> +		if (rename(fname_old, fname))
> +			die("Could not rename packfile: %s -> %s", fname_old, fname);
> +	}
> +
> +	/* Remove the "old-" files */
> +	for_each_string_list_item(item, &names) {
> +		sprintf(fname, "%s/old-pack-%s.idx", packdir, item->string);
> +		if (remove_path(fname))
> +			die("Could not remove file: %s", fname);
> +
> +		sprintf(fname, "%s/old-pack-%s.pack", packdir, item->string);
> +		if (remove_path(fname))
> +			die("Could not remove file: %s", fname);
> +	}
> +
> +	/* End of pack replacement. */
> +	if (delete_redundant) {
> +		sort_string_list(&fullbases);
> +		fname = xmalloc(strlen(packtmp) + strlen("/") + 40 + strlen(".pack") + 1);
> +		for_each_string_list_item(item, &existing_packs) {
> +			if (!string_list_has_string(&fullbases, item->string))
> +				remove_pack(packdir, item->string);
> +		}
> +		free(fname);

Why allocate memory for fname and give it back unused?

> +		cmd_i = 0;
> +		cmd_args[cmd_i++] = "prune-packed";
> +		cmd_args[cmd_i++] = NULL;
> +		/* TODO: pass argument: ${GIT_QUIET:+-q} */
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = cmd_args;
> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}
> +
> +	if (!no_update_server_info) {
> +		cmd_i = 0;
> +		cmd_args[cmd_i++] = "update-server-info";
> +		cmd_args[cmd_i++] = NULL;
> +
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = cmd_args;
> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}
> +	return 0;
> +}
> +
> diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
> similarity index 100%
> rename from git-repack.sh
> rename to contrib/examples/git-repack.sh
> diff --git a/git.c b/git.c
> index 2025f77..03510be 100644
> --- a/git.c
> +++ b/git.c
> @@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
>  		{ "remote", cmd_remote, RUN_SETUP },
>  		{ "remote-ext", cmd_remote_ext },
>  		{ "remote-fd", cmd_remote_fd },
> +		{ "repack", cmd_repack, RUN_SETUP },
>  		{ "replace", cmd_replace, RUN_SETUP },
>  		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
>  		{ "rerere", cmd_rerere, RUN_SETUP },
> -- 1.8.4.rc3.1.gc1ebd90
>

* Re: [RFC PATCHv2] repack: rewrite the shell script in C.
  2013-08-17 13:34                         ` René Scharfe
@ 2013-08-17 19:18                           ` Kyle J. McKay
  2013-08-18 14:34                           ` Stefan Beller
  1 sibling, 0 replies; 72+ messages in thread
From: Kyle J. McKay @ 2013-08-17 19:18 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	Junio C Hamano, René Scharfe

On Aug 17, 2013, at 06:34, René Scharfe wrote:
> On Aug 15, 2013, at 17:12, Stefan Beller wrote:
>> +		if (sha_begin >= e->d_name && !strncmp(sha_begin, sha1, 40)) {
>> +			char *fname;
>> +			fname = xmalloc(strlen(path) + 1 + strlen(e->d_name));

This needs another +1 because

>> +			sprintf(fname, "%s/%s", path, e->d_name);

len(path) + len("/") + len(e->d_name) + len("\0")

>
> mkpathdup..
>
>> +			unlink(fname);
>> +			/*TODO: free(fname); fails here sometimes, needs investigation*/
>
> Strange.  Perhaps valgrind can tell you what's wrong.

which is probably why it fails since the byte beyond the end of fname  
is being overwritten with a nul.
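
Spelled out as a standalone sketch, the size calculation looks as follows. The helper name join_path is made up for this example, and plain malloc stands in for xmalloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join dir and name with '/', returning a malloc'd string. */
char *join_path(const char *dir, const char *name)
{
	/* strlen(dir) + 1 for '/' + strlen(name) + 1 for the terminating NUL */
	size_t len = strlen(dir) + 1 + strlen(name) + 1;
	char *out = malloc(len);

	if (!out)
		return NULL;
	snprintf(out, len, "%s/%s", dir, name);
	return out;
}
```

Passing the buffer size to snprintf means that a miscounted length truncates the string instead of writing past the allocation, which makes this kind of mistake far easier to spot than a sporadic crash in free().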

* Re: [RFC PATCHv2] repack: rewrite the shell script in C.
  2013-08-17 13:34                         ` René Scharfe
  2013-08-17 19:18                           ` Kyle J. McKay
@ 2013-08-18 14:34                           ` Stefan Beller
  2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-18 14:34 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster

On 08/17/2013 03:34 PM, René Scharfe wrote:
> 
> Hmm, stepping back a bit, why not just build the paths and call unlink
> for them right away, without readdir?  The shell version only ever
> deletes existing .pack files (those in $existing alias existing_packs)
> as well as their .idx and .keep files, if present.  It doesn't use a
> glob pattern, unlike remove_pack here.

I'll meditate on that. 

Thanks for all the other remarks. Now the code looks much more
git-ish, similar to other commands.
The lines of code went down from 411 to 385; I guess we can cut out
more inefficiencies there.

As you suggested, maybe we should just have one helper function that
reads the pack directory and keeps all the information (complete filename),
so we do not need to find the exact filename again later by looping over
the directory a second time.

Stefan

* [RFC PATCHv3] repack: rewrite the shell script in C.
  2013-08-18 14:34                           ` Stefan Beller
@ 2013-08-18 14:36                             ` Stefan Beller
  2013-08-18 15:41                               ` Kyle J. McKay
  2013-08-18 16:44                               ` René Scharfe
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-18 14:36 UTC (permalink / raw)
  To: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster
  Cc: Stefan Beller

This is the beginning of the rewrite of the repacking.

 * replace all plain string handling functions
   by git helper functions, most often mkpathdup
 * use argv-array structs to pass arguments to
   other git invocations.

Only test t7701 (2nd) fails now with this patch.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 385 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 388 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..190eb5f
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,385 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+#include "argv-array.h"
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files() {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path;
+
+	prefix = mkpathdup(".tmp-%d-pack", getpid());
+	path = mkpathdup("%s/pack", get_object_directory());
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			struct strbuf fname = STRBUF_INIT;
+			strbuf_addf(&fname, "%s/%s", path, e->d_name);
+			unlink(strbuf_detach(&fname, NULL));
+		}
+	}
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+void get_pack_sha1_list(char *packdir, struct string_list *sha1_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix;
+
+	path = mkpathdup("%s/pack", get_object_directory());
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			char *buf, *sha1;
+			buf = xmemdupz(e->d_name, strlen(e->d_name));
+			buf[strlen(e->d_name) - strlen(suffix)] = '\0';
+			if (strlen(e->d_name) - strlen(suffix) > 40) {
+				sha1 = &buf[strlen(e->d_name) - strlen(suffix) - 40];
+				string_list_append_nodup(sha1_list, sha1);
+			} else {
+				/*TODO: what should happen to pack files having no 40 char sha1 specifier?*/
+			}
+		}
+	}
+	free(path);
+	closedir(dir);
+}
+
+/*
+ * remove_pack will remove any files following the pattern *${SHA1}.{EXT}
+ * where EXT is one of {pack, idx, keep}. The SHA1 consists of 40 chars and
+ * is specified by the sha1 parameter.
+ * path is specifying the directory in which all found files will be deleted.
+ */
+void remove_pack(char *path, char* sha1)
+{
+	DIR *dir;
+	struct dirent *e;
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		char *sha_begin, *sha_end;
+		sha_end = e->d_name + strlen(e->d_name);
+		while (sha_end > e->d_name && *sha_end != '.')
+			sha_end--;
+
+		/* do not touch files not ending in .pack, .idx or .keep */
+		if (strcmp(sha_end, ".pack") &&
+			strcmp(sha_end, ".idx") &&
+			strcmp(sha_end, ".keep"))
+			continue;
+
+		sha_begin = sha_end - 40;
+
+		if (sha_begin >= e->d_name && !strncmp(sha_begin, sha1, 40)) {
+			char *fname;
+			fname = mkpathdup("%s/%s", path, e->d_name);
+			unlink(fname);
+			free(fname);
+		}
+	}
+	closedir(dir);
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	char *unpack_unreachable = NULL;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct stat statbuffer;
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
+				N_("with -A, do not loosen objects older than this Packing constraints")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
+
+	remove_temporary_files();
+
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	argv_array_push(&cmd_args, "pack-objects");
+	argv_array_push(&cmd_args, "--keep-true-parents");
+	argv_array_push(&cmd_args, "--honor-pack-keep");
+	argv_array_push(&cmd_args, "--non-empty");
+	argv_array_push(&cmd_args, "--all");
+	argv_array_push(&cmd_args, "--reflog");
+
+	if (window)
+		argv_array_pushf(&cmd_args, "--window=%u", window);
+
+	if (window_memory)
+		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
+
+	if (depth)
+		argv_array_pushf(&cmd_args, "--depth=%u", depth);
+
+	if (max_pack_size)
+		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		argv_array_push(&cmd_args, "--unpacked");
+		argv_array_push(&cmd_args, "--incremental");
+	} else {
+		if (pack_everything_but_loose)
+			argv_array_push(&cmd_args, "--unpack-unreachable");
+
+		struct string_list sha1_list = STRING_LIST_INIT_DUP;
+		get_pack_sha1_list(packdir, &sha1_list);
+		for_each_string_list_item(item, &sha1_list) {
+			char *fname;
+			fname = mkpathdup("%s/%s.keep", packdir, item->string);
+			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+			free(fname);
+		}
+
+		if (existing_packs.nr && unpack_unreachable && delete_redundant) {
+			argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
+		}
+	}
+
+	if (local)
+		argv_array_push(&cmd_args,  "--local");
+
+	if (delta_base_offset)
+		argv_array_push(&cmd_args,  "--delta-base-offset");
+
+	argv_array_push(&cmd_args, packtmp);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = argv_array_detach(&cmd_args, NULL);
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	if (run_command(&cmd))
+		return 1;
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		assert(strlen(line) == 41);
+		line[40] = '\0';
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	char *exts[2] = {".idx", ".pack"};
+	int failed = 0;
+	for_each_string_list_item(item, &names) {
+		int ext;
+		for (ext = 0; ext < 1; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
+			if (!file_exists(fname)) {
+				free(fname);
+				continue;
+			}
+
+			fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append_nodup(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break for_each loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+		for_each_string_list_item(item, &rollback) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s", packdir, item->string);
+			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+			free(fname);
+			free(fname_old);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	for_each_string_list_item(item, &names) {
+		char *fname, *fname_old;
+		fname = mkpathdup("%s/pack-%s.pack", packdir, item->string);
+		fname_old = mkpathdup("%s-%s.pack", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+		free(fname);
+		free(fname_old);
+
+		fname = mkpathdup("%s/pack-%s.idx", packdir, item->string);
+		fname_old = mkpathdup("%s-%s.idx", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+		free(fname);
+		free(fname_old);
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		char *fname;
+		fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+
+		fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		sort_string_list(&names);
+		for_each_string_list_item(item, &existing_packs) {
+			if (!string_list_has_string(&names, item->string))
+				remove_pack(packdir, item->string);
+		}
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "prune-packed");
+		/* TODO: pass argument: ${GIT_QUIET:+-q} */
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "update-server-info");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
+
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.2.g2c2b664

* Re: [RFC PATCHv3] repack: rewrite the shell script in C.
  2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
@ 2013-08-18 15:41                               ` Kyle J. McKay
  2013-08-18 16:44                               ` René Scharfe
  1 sibling, 0 replies; 72+ messages in thread
From: Kyle J. McKay @ 2013-08-18 15:41 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster

On Aug 18, 2013, at 07:36, Stefan Beller wrote:

> +			fprintf(stderr,
> +				"WARNING: Some packs in use have been renamed by\n"
> +				"WARNING: prefixing old- to their name, in order to\n"
> +				"WARNING: replace them with the new version of the\n"
> +				"WARNING: file.  But the operation failed, and\n"
> +				"WARNING: attempt to rename them back to their\n"
> +				"WARNING: original names also failed.\n"

Bad grammar "But the operation failed, and attempt to rename them ...".

How about "But the operation failed, and the attempt to rename  
them ..." instead.

* Re: [RFC PATCHv3] repack: rewrite the shell script in C.
  2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
  2013-08-18 15:41                               ` Kyle J. McKay
@ 2013-08-18 16:44                               ` René Scharfe
  2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
  1 sibling, 1 reply; 72+ messages in thread
From: René Scharfe @ 2013-08-18 16:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster

> +static void remove_temporary_files() {
> +	DIR *dir;
> +	struct dirent *e;
> +	char *prefix, *path;
> +
> +	prefix = mkpathdup(".tmp-%d-pack", getpid());
> +	path = mkpathdup("%s/pack", get_object_directory());
> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {
> +		if (!prefixcmp(e->d_name, prefix)) {
> +			struct strbuf fname = STRBUF_INIT;
> +			strbuf_addf(&fname, "%s/%s", path, e->d_name);
> +			unlink(strbuf_detach(&fname, NULL));

I'm not sure I like the memory allocation done here for each file to be
deleted, but it's probably not worth worrying about.

> +void get_pack_sha1_list(char *packdir, struct string_list *sha1_list)
> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *path, *suffix;
> +
> +	path = mkpathdup("%s/pack", get_object_directory());
> +	suffix = ".pack";
> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {
> +		if (!suffixcmp(e->d_name, suffix)) {
> +			char *buf, *sha1;
> +			buf = xmemdupz(e->d_name, strlen(e->d_name));
> +			buf[strlen(e->d_name) - strlen(suffix)] = '\0';
> +			if (strlen(e->d_name) - strlen(suffix) > 40) {
> +				sha1 = &buf[strlen(e->d_name) - strlen(suffix) - 40];
> +				string_list_append_nodup(sha1_list, sha1);

Unless sha1 == buf, this will crash when that string_list is freed
because sha1 was not returned by malloc.  If it doesn't crash for
you then I guess sha1_list is never freed. :)  How about just
taking the part of d_name we need, like this?

			size_t len = strlen(e->d_name) - strlen(suffix);
			if (len > 40) {
				char *sha1 = xmemdupz(e->d_name + len - 40, 40);
				string_list_append_nodup(sha1_list, sha1);
			}
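
For reference, a self-contained variant of the same idea, with plain malloc and memcpy standing in for xmemdupz. The function name sha1_from_pack_name is hypothetical. The point is that the returned pointer is the start of its own allocation, so it can later be passed to free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the 40 characters that
 * precede ".pack" in name, or NULL if name does not end in ".pack" or
 * the part before the suffix is not longer than 40 characters.
 */
char *sha1_from_pack_name(const char *name)
{
	static const char suffix[] = ".pack";
	size_t nlen = strlen(name), slen = sizeof(suffix) - 1, len;
	char *sha1;

	if (nlen < slen || strcmp(name + nlen - slen, suffix))
		return NULL;
	len = nlen - slen;
	if (len <= 40)
		return NULL;
	sha1 = malloc(41);
	if (!sha1)
		return NULL;
	memcpy(sha1, name + len - 40, 40);
	sha1[40] = '\0';
	return sha1;	/* start of an allocation, so free(sha1) is valid */
}
```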

> +			} else {
> +				/*TODO: what should happen to pack files having no 40 char sha1 specifier?*/

What does the current code do with them?  From a quick glance it
looks like it deletes them in the end, right?

René

* [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-18 16:44                               ` René Scharfe
@ 2013-08-18 22:26                                 ` Stefan Beller
  2013-08-19 23:23                                   ` Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-18 22:26 UTC (permalink / raw)
  To: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle
  Cc: Stefan Beller

This is the beginning of the rewrite of the repacking.

 * rename get_pack_sha1_list to get_pack_filename_list, which
   reads the pack directory only once as suggested by René.
 * fix the grammar as suggested by Kyle.

All tests have passed at least once now.
However, there is still one non-deterministic error occurring.
I am too tired to search for it now; I'll get it working tomorrow.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 372 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 375 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..bfaaad7
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,372 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+#include "argv-array.h"
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files() {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path;
+
+	prefix = mkpathdup(".tmp-%d-pack", getpid());
+	path = mkpathdup("%s/pack", get_object_directory());
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			struct strbuf fname = STRBUF_INIT;
+			strbuf_addf(&fname, "%s/%s", path, e->d_name);
+			unlink(strbuf_detach(&fname, NULL));
+		}
+	}
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+/*
+ * Fills the filename list with all the files found in the pack directory
+ * ending with .pack, without that extension.
+ */
+void get_pack_filenames(char *packdir, struct string_list *fname_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix, *fname;
+
+	path = mkpathdup("%s/pack", get_object_directory());
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			size_t len = strlen(e->d_name) - strlen(suffix);
+			fname = xmemdupz(e->d_name, len);
+			string_list_append_nodup(fname_list, fname);
+		}
+	}
+	free(path);
+	closedir(dir);
+}
+
+/*
+ * remove_pack will remove any files following the pattern *${SHA1}.{EXT}
+ * where EXT is one of {pack, idx, keep}. The SHA1 consists of 40 chars and
+ * is specified by the sha1 parameter.
+ * path is specifying the directory in which all found files will be deleted.
+ */
+void remove_pack(char *path, char* sha1)
+{
+	char *exts[] = {".pack", ".idx",".keep"};
+	char *fname;
+	int ext = 0;
+	for (ext = 0; ext < 3; ext++) {
+		fname = mkpathdup("%s/%s%s", path, sha1, exts[ext]);
+		unlink(fname);
+		free(fname);
+	}
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	char *unpack_unreachable = NULL;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct stat statbuffer;
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
+				N_("with -A, do not loosen objects older than this Packing constraints")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
+
+	remove_temporary_files();
+
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	argv_array_push(&cmd_args, "pack-objects");
+	argv_array_push(&cmd_args, "--keep-true-parents");
+	argv_array_push(&cmd_args, "--honor-pack-keep");
+	argv_array_push(&cmd_args, "--non-empty");
+	argv_array_push(&cmd_args, "--all");
+	argv_array_push(&cmd_args, "--reflog");
+
+	if (window)
+		argv_array_pushf(&cmd_args, "--window=%u", window);
+
+	if (window_memory)
+		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
+
+	if (depth)
+		argv_array_pushf(&cmd_args, "--depth=%u", depth);
+
+	if (max_pack_size)
+		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		argv_array_push(&cmd_args, "--unpacked");
+		argv_array_push(&cmd_args, "--incremental");
+	} else {
+		if (pack_everything_but_loose && delete_redundant)
+			argv_array_push(&cmd_args, "--unpack-unreachable");
+
+		struct string_list fname_list = STRING_LIST_INIT_DUP;
+		get_pack_filenames(packdir, &fname_list);
+		for_each_string_list_item(item, &fname_list) {
+			char *fname;
+			fname = mkpathdup("%s/%s.keep", packdir, item->string);
+			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+			free(fname);
+		}
+
+		if (existing_packs.nr && unpack_unreachable && delete_redundant) {
+			argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
+		}
+	}
+
+	if (local)
+		argv_array_push(&cmd_args,  "--local");
+
+	if (delta_base_offset)
+		argv_array_push(&cmd_args,  "--delta-base-offset");
+
+	argv_array_push(&cmd_args, packtmp);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = argv_array_detach(&cmd_args, NULL);
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	if (run_command(&cmd))
+		return 1;
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		assert(strlen(line) == 41);
+		line[40] = '\0';
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	char *exts[2] = {".idx", ".pack"};
+	int failed = 0;
+	for_each_string_list_item(item, &names) {
+		int ext;
+		for (ext = 0; ext < 1; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
+			if (!file_exists(fname)) {
+				free(fname);
+				continue;
+			}
+
+			fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append_nodup(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break for_each loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+		for_each_string_list_item(item, &rollback) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s", packdir, item->string);
+			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+			free(fname);
+			free(fname_old);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and the\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	for_each_string_list_item(item, &names) {
+		char *fname, *fname_old;
+		fname = mkpathdup("%s/pack-%s.pack", packdir, item->string);
+		fname_old = mkpathdup("%s-%s.pack", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+		free(fname);
+		free(fname_old);
+
+		fname = mkpathdup("%s/pack-%s.idx", packdir, item->string);
+		fname_old = mkpathdup("%s-%s.idx", packtmp, item->string);
+		stat(fname_old, &statbuffer);
+		statbuffer.st_mode &= ~S_IWOTH;
+		chmod(fname_old, statbuffer.st_mode);
+		if (rename(fname_old, fname))
+			die("Could not rename packfile: %s -> %s", fname_old, fname);
+		free(fname);
+		free(fname_old);
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		char *fname;
+		fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+
+		fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		sort_string_list(&names);
+		for_each_string_list_item(item, &existing_packs) {
+			char *sha1;
+			size_t len = strlen(item->string);
+			if (len < 40)
+				continue;
+			sha1 = item->string + len - 40;
+			if (!string_list_has_string(&names, sha1))
+				remove_pack(packdir, item->string);
+		}
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "prune-packed");
+		if (quiet)
+			argv_array_push(&cmd_args, "--quiet");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "update-server-info");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
+
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.2.g2c2b664


* [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
@ 2013-08-19 23:23                                   ` Stefan Beller
  2013-08-20 13:31                                     ` Johannes Sixt
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-19 23:23 UTC (permalink / raw)
  To: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle
  Cc: Stefan Beller

Hi,

today I compared the argument lists that the repack shell script and the
C rewrite pass on to the pack-objects command, and fixed some corner
cases (-A -d --unpack-unreachable=%s should pass
--unpack-unreachable=%s to pack-objects only once).
I also added some smaller options that were missing (--quiet, --no-reuse-delta, --no-reuse-object).
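
The --unpack-unreachable corner case can be sketched roughly as follows.
This is a standalone illustration under my own assumptions, not the code
from the patch: the function name unpack_unreachable_arg and its parameters
are hypothetical. It shows the intended rule that at most one
--unpack-unreachable argument is produced, with an explicit date taking
precedence over the bare form implied by -A.

```c
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only: writes at most one
 * --unpack-unreachable argument into buf.  Returns 1 if an argument
 * was produced, 0 otherwise (buf then holds the empty string).
 */
int unpack_unreachable_arg(char *buf, size_t size,
			   int have_existing_packs, int delete_redundant,
			   int all_but_loose, const char *date)
{
	if (!size)
		return 0;
	buf[0] = '\0';

	/* Only relevant when old packs exist and -d was given. */
	if (!have_existing_packs || !delete_redundant)
		return 0;

	/* An explicit date wins; the bare option is the -A fallback. */
	if (date) {
		snprintf(buf, size, "--unpack-unreachable=%s", date);
		return 1;
	}
	if (all_but_loose) {
		snprintf(buf, size, "--unpack-unreachable");
		return 1;
	}
	return 0;
}
```

Because the two branches are chained with an early return, the option can
never be emitted twice, whichever combination of flags is given.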

I have run the test suite several times successfully now,
trying to find a pattern for the non-deterministic bug, which appears to
occur in only about 1 out of 4 test suite runs.

The test suite has around 40 calls to repack and 35 calls to gc, which calls
repack internally. That alone covers quite a lot of the repack options,
but debugging the test cases is no fun, as the calls to repack
are usually not the main concern of the respective tests.

There are however 
  t7700-repack.sh 
  t7701-repack-unpack-unreachable.sh
  
It was suggested earlier, and I think it's a good idea to enhance those
tests.

Anyway, here is an updated version of the repack rewrite.

Stefan

--8<--
From f6da16ac3ca71aa746fe6d9224b06e6cc4e7a104 Mon Sep 17 00:00:00 2001
From: Stefan Beller <stefanbeller@googlemail.com>
Date: Fri, 16 Aug 2013 02:08:47 +0200
Subject: [RFC PATCHv4] repack: rewrite the shell script in C.

This is the beginning of the rewrite of the repacking.

All tests have passed at least once now.
However, there is still a non-deterministic error occurring in
about 1 out of 4 test suite runs (usually in 7701 or 9301,
but it could also occur in 5501 or 3306 iirc).

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 363 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..a87900e
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,363 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "argv-array.h"
+
+static int delta_base_offset = 0;
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files() {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path;
+
+	prefix = mkpathdup(".tmp-%d-pack", getpid());
+	path = mkpathdup("%s/pack", get_object_directory());
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			struct strbuf fname = STRBUF_INIT;
+			strbuf_addf(&fname, "%s/%s", path, e->d_name);
+			unlink(strbuf_detach(&fname, NULL));
+		}
+	}
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+/*
+ * Fills the filename list with all the files found in the pack directory
+ * ending with .pack, without that extension.
+ */
+void get_pack_filenames(char *packdir, struct string_list *fname_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix, *fname;
+
+	path = mkpathdup("%s/pack", get_object_directory());
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			size_t len = strlen(e->d_name) - strlen(suffix);
+			fname = xmemdupz(e->d_name, len);
+			string_list_append_nodup(fname_list, fname);
+		}
+	}
+	free(path);
+	closedir(dir);
+}
+
+void remove_pack(char *path, char* sha1)
+{
+	char *exts[] = {".pack", ".idx", ".keep"};
+	int ext = 0;
+	for (ext = 0; ext < 3; ext++) {
+		char *fname;
+		fname = mkpathdup("%s/%s%s", path, sha1, exts[ext]);
+		unlink(fname);
+		free(fname);
+	}
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	char *unpack_unreachable = NULL;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+	char *packdir, *packtmp;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct stat statbuffer;
+	int ext;
+	char *exts[2] = {".idx", ".pack"};
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', "all", &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
+				N_("with -A, do not loosen objects older than this Packing constraints")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
+
+	remove_temporary_files();
+
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	argv_array_push(&cmd_args, "pack-objects");
+	argv_array_push(&cmd_args, "--keep-true-parents");
+	argv_array_push(&cmd_args, "--honor-pack-keep");
+	argv_array_push(&cmd_args, "--non-empty");
+	argv_array_push(&cmd_args, "--all");
+	argv_array_push(&cmd_args, "--reflog");
+
+	if (window)
+		argv_array_pushf(&cmd_args, "--window=%u", window);
+
+	if (window_memory)
+		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
+
+	if (depth)
+		argv_array_pushf(&cmd_args, "--depth=%u", depth);
+
+	if (max_pack_size)
+		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
+
+	if (no_reuse_delta)
+		argv_array_pushf(&cmd_args, "--no-reuse-delta");
+
+	if (no_reuse_object)
+		argv_array_pushf(&cmd_args, "--no-reuse-object");
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		argv_array_push(&cmd_args, "--unpacked");
+		argv_array_push(&cmd_args, "--incremental");
+	} else {
+		struct string_list fname_list = STRING_LIST_INIT_DUP;
+		get_pack_filenames(packdir, &fname_list);
+		for_each_string_list_item(item, &fname_list) {
+			char *fname;
+			fname = mkpathdup("%s/%s.keep", packdir, item->string);
+			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+			free(fname);
+		}
+
+		if (existing_packs.nr && delete_redundant) {
+			if (unpack_unreachable)
+				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
+			else if (pack_everything_but_loose)
+				argv_array_push(&cmd_args, "--unpack-unreachable");
+		}
+	}
+
+	if (local)
+		argv_array_push(&cmd_args,  "--local");
+	if (quiet)
+		argv_array_push(&cmd_args,  "--quiet");
+	if (delta_base_offset)
+		argv_array_push(&cmd_args,  "--delta-base-offset");
+
+	argv_array_push(&cmd_args, packtmp);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = argv_array_detach(&cmd_args, NULL);
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	if (run_command(&cmd))
+		return 1;
+
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+
+	char line[1024];
+	int counter = 0;
+	FILE *out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		assert(strlen(line) == 41);
+		line[40] = '\0';
+		string_list_append(&names, line);
+		counter++;
+	}
+	if (!counter)
+		printf("Nothing new to pack.\n");
+	fclose(out);
+
+	int failed = 0;
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 1; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
+			if (!file_exists(fname)) {
+				free(fname);
+				continue;
+			}
+
+			fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			free(fname_old);
+			string_list_append_nodup(&rollback, fname);
+		}
+		if (failed)
+			/* set to last element to break for_each loop */
+			item = names.items + names.nr;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+		for_each_string_list_item(item, &rollback) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s", packdir, item->string);
+			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+			free(fname);
+			free(fname_old);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and the\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
+			fname_old = mkpathdup("%s-%s%s", packtmp, item->string, exts[ext]);
+			stat(fname_old, &statbuffer);
+			statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
+			chmod(fname_old, statbuffer.st_mode);
+			if (rename(fname_old, fname))
+				die("Could not rename packfile: %s -> %s", fname_old, fname);
+			free(fname);
+			free(fname_old);
+		}
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		char *fname;
+		fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+
+		fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die("Could not remove file: %s", fname);
+		free(fname);
+	}
+
+	/* End of pack replacement. */
+	if (delete_redundant) {
+		sort_string_list(&names);
+		for_each_string_list_item(item, &existing_packs) {
+			char *sha1;
+			size_t len = strlen(item->string);
+			if (len < 40)
+				continue;
+			sha1 = item->string + len - 40;
+			if (!string_list_has_string(&names, sha1))
+				remove_pack(packdir, item->string);
+		}
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "prune-packed");
+		if (quiet)
+			argv_array_push(&cmd_args, "--quiet");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "update-server-info");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.1.gc1ebd90


* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-19 23:23                                   ` Stefan Beller
@ 2013-08-20 13:31                                     ` Johannes Sixt
  2013-08-20 15:08                                       ` Stefan Beller
  2013-08-20 21:24                                       ` Stefan Beller
  0 siblings, 2 replies; 72+ messages in thread
From: Johannes Sixt @ 2013-08-20 13:31 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle

I didn't look at functions above cmd_repack.

Am 20.08.2013 01:23, schrieb Stefan Beller:
> +int cmd_repack(int argc, const char **argv, const char *prefix) {
> +
> +	int pack_everything = 0;
> +	int pack_everything_but_loose = 0;
> +	int delete_redundant = 0;
> +	char *unpack_unreachable = NULL;
> +	int window = 0, window_memory = 0;
> +	int depth = 0;
> +	int max_pack_size = 0;
> +	int no_reuse_delta = 0, no_reuse_object = 0;
> +	int no_update_server_info = 0;
> +	int quiet = 0;
> +	int local = 0;
> +	char *packdir, *packtmp;
> +	struct child_process cmd;
> +	struct string_list_item *item;
> +	struct string_list existing_packs = STRING_LIST_INIT_DUP;
> +	struct stat statbuffer;
> +	int ext;
> +	char *exts[2] = {".idx", ".pack"};
> +
> +	struct option builtin_repack_options[] = {

Are the long forms of options your invention?

> +		OPT_BOOL('a', "all", &pack_everything,
> +				N_("pack everything in a single pack")),
> +		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
> +				N_("same as -a, and turn unreachable objects loose")),

--all-but-loose does not express what the help text says. The long form of 
-A is --all --unpack-unreachable, so it is really just a short option for 
convenience. It does not need its own long form.

> +		OPT_BOOL('d', "delete-redundant", &delete_redundant,
> +				N_("remove redundant packs, and run git-prune-packed")),
> +		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
> +				N_("pass --no-reuse-delta to git-pack-objects")),
> +		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
> +				N_("pass --no-reuse-object to git-pack-objects")),

Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?

> +		OPT_BOOL('n', NULL, &no_update_server_info,
> +				N_("do not run git-update-server-info")),

No long option name?

> +		OPT__QUIET(&quiet, N_("be quiet")),
> +		OPT_BOOL('l', "local", &local,
> +				N_("pass --local to git-pack-objects")),

Good.

> +		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
> +				N_("with -A, do not loosen objects older than this Packing constraints")),

"Packing constraints" is a section heading, not a continuation of the 
previous help text.

> +		OPT_INTEGER(0, "window", &window,
> +				N_("size of the window used for delta compression")),

This help text is suboptimal as the option is a count, not a "size" in the 
narrow sense. But that can be changed later (as it would affect other 
tools as well, I guess).

> +		OPT_INTEGER(0, "window-memory", &window_memory,
> +				N_("same as the above, but limit memory size instead of entries count")),
> +		OPT_INTEGER(0, "depth", &depth,
> +				N_("limits the maximum delta depth")),
> +		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
> +				N_("maximum size of each packfile")),
> +		OPT_END()
> +	};

Good.

> +
> +	git_config(repack_config, NULL);
> +
> +	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> +				git_repack_usage, 0);
> +
> +	sigchain_push_common(remove_pack_on_signal);

Good.

> +	packdir = mkpathdup("%s/pack", get_object_directory());
> +	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());

Should this not be

	packdir = xstrdup(git_path("pack"));
	packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));

Perhaps make packdir and packtmp global so that the strings need not be 
duplicated in get_pack_filenames and remove_temporary_files?

> +
> +	remove_temporary_files();

Yes, the shell script had this. But is it really necessary?

> +
> +	struct argv_array cmd_args = ARGV_ARRAY_INIT;

Declaration after statement.

> +	argv_array_push(&cmd_args, "pack-objects");
> +	argv_array_push(&cmd_args, "--keep-true-parents");
> +	argv_array_push(&cmd_args, "--honor-pack-keep");
> +	argv_array_push(&cmd_args, "--non-empty");
> +	argv_array_push(&cmd_args, "--all");
> +	argv_array_push(&cmd_args, "--reflog");
> +
> +	if (window)
> +		argv_array_pushf(&cmd_args, "--window=%u", window);
> +
> +	if (window_memory)
> +		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
> +
> +	if (depth)
> +		argv_array_pushf(&cmd_args, "--depth=%u", depth);
> +
> +	if (max_pack_size)
> +		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
> +
> +	if (no_reuse_delta)
> +		argv_array_pushf(&cmd_args, "--no-reuse-delta");
> +
> +	if (no_reuse_object)
> +		argv_array_pushf(&cmd_args, "--no-reuse-object");

no_reuse_delta and no_reuse_object are mutually exclusive, according to 
the shell script version.

> +
> +	if (pack_everything + pack_everything_but_loose == 0) {
> +		argv_array_push(&cmd_args, "--unpacked");
> +		argv_array_push(&cmd_args, "--incremental");
> +	} else {
> +		struct string_list fname_list = STRING_LIST_INIT_DUP;
> +		get_pack_filenames(packdir, &fname_list);
> +		for_each_string_list_item(item, &fname_list) {
> +			char *fname;
> +			fname = mkpathdup("%s/%s.keep", packdir, item->string);
> +			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {

			if (!stat(fname, &statbuffer) && ...

But you are using file_exists() later. That should be good enough here as 
well, no?
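
A minimal sketch of the check meant here, under the assumption of a
hypothetical helper name is_regular_file() (it is not an existing
function in the tree): stat() returns 0 on success, so the result has
to be negated, and the stat buffer is only inspected when the call
succeeded.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if path exists and is a regular
 * file, 0 otherwise. stat() returns 0 on success, so st is only
 * read after a successful call (short-circuit evaluation of &&).
 */
static int is_regular_file(const char *path)
{
	struct stat st;

	return !stat(path, &st) && S_ISREG(st.st_mode);
}
```

With the negation missing, st_mode is read exactly when stat() failed,
that is, when the buffer was never filled in.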

> +				/* when the keep file is there, we're ignoring that pack */
> +			} else {
> +				string_list_append(&existing_packs, item->string);
> +			}
> +			free(fname);
> +		}
> +
> +		if (existing_packs.nr && delete_redundant) {
> +			if (unpack_unreachable)
> +				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
> +			else if (pack_everything_but_loose)
> +				argv_array_push(&cmd_args, "--unpack-unreachable");
> +		}
> +	}
> +
> +	if (local)
> +		argv_array_push(&cmd_args,  "--local");
> +	if (quiet)
> +		argv_array_push(&cmd_args,  "--quiet");
> +	if (delta_base_offset)
> +		argv_array_push(&cmd_args,  "--delta-base-offset");
> +
> +	argv_array_push(&cmd_args, packtmp);

Otherwise, argument setup looks fine.

> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.argv = argv_array_detach(&cmd_args, NULL);

Is it necessary to detach the arguments?

> +	cmd.git_cmd = 1;
> +	cmd.out = -1;
> +	cmd.no_stdin = 1;
> +
> +	if (run_command(&cmd))
> +		return 1;

You cannot run_command() and then later read its output! You must split it 
into start_command(), read stdout, finish_command().

> +
> +	struct string_list names = STRING_LIST_INIT_DUP;
> +	struct string_list rollback = STRING_LIST_INIT_DUP;

Declaration after statement.

> +
> +	char line[1024];
> +	int counter = 0;
> +	FILE *out = xfdopen(cmd.out, "r");
> +	while (fgets(line, sizeof(line), out)) {
> +		/* a line consists of 40 hex chars + '\n' */
> +		assert(strlen(line) == 41);

You cannot make assertions about input that you read from an external 
command! You can die() if the expectation is not met. But I think that in 
this case the only necessary expectation is that a line is not empty.

BTW, don't we have strbuf functions to read from an fd linewise?
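
A sketch of checking such a line at run time instead of asserting,
with a hypothetical helper parse_hash_line() (not an existing
function). It is stricter than the "not empty" check suggested above
and reports failure through its return value, so the caller can decide
whether to die().

```c
#include <ctype.h>
#include <string.h>

#define HEX_LEN 40

/*
 * Hypothetical helper: copy the 40 hex characters of one output line
 * into out (which must hold HEX_LEN + 1 bytes). A trailing newline is
 * optional. Returns 0 on success and -1 on malformed input.
 */
static int parse_hash_line(const char *line, char *out)
{
	size_t len = strlen(line);
	size_t i;

	if (len && line[len - 1] == '\n')
		len--;
	if (len != HEX_LEN)
		return -1;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	memcpy(out, line, len);
	out[len] = '\0';
	return 0;
}
```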

> +		line[40] = '\0';
> +		string_list_append(&names, line);
> +		counter++;
> +	}
> +	if (!counter)
> +		printf("Nothing new to pack.\n");

This was 'say Nothing new to pack.'. say obeys --quiet, IIRC.

> +	fclose(out);
> +
> +	int failed = 0;
> +	for_each_string_list_item(item, &names) {
> +		for (ext = 0; ext < 1; ext++) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
> +			if (!file_exists(fname)) {
> +				free(fname);
> +				continue;
> +			}
> +
> +			fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);

If you could use git_path() instead of mkpathdup() in these two cases, we 
would not need to free() the names.

> +			if (file_exists(fname_old))
> +				unlink(fname_old);
> +
> +			if (rename(fname, fname_old)) {
> +				failed = 1;
> +				break;
> +			}
> +			free(fname_old);
> +			string_list_append_nodup(&rollback, fname);

Ah, we would need to allocate here then.

> +		}
> +		if (failed)
> +			/* set to last element to break for_each loop */
> +			item = names.items + names.nr;

A mere
			break;
doesn't do it here?

> +	}
> +	if (failed) {
> +		struct string_list rollback_failure;
> +		for_each_string_list_item(item, &rollback) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/%s", packdir, item->string);
> +			fname_old = mkpathdup("%s/old-%s", packdir, item->string);

I think it's possible to attach arbitrary data to each string_list item. 
We could attach the "%s/old-%s" name to the item name, then we wouldn't 
need to re-construct the names here.
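
Roughly what I have in mind, sketched with a stand-in struct rather
than the real string_list API; struct name_item, concat3() and the two
name_item_*() functions are invented for this example. Each item
carries its own "old-" name in the payload pointer, so the rollback
loop can use it directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a list item that carries a payload pointer. */
struct name_item {
	char *name;	/* "<dir>/<base>" */
	void *util;	/* payload: the matching "<dir>/old-<base>" */
};

/* Concatenate three strings into freshly allocated memory. */
static char *concat3(const char *a, const char *b, const char *c)
{
	size_t len = strlen(a) + strlen(b) + strlen(c) + 1;
	char *s = malloc(len);

	if (s)
		snprintf(s, len, "%s%s%s", a, b, c);
	return s;
}

/* Build both names once. Returns 0 on success, -1 on allocation failure. */
static int name_item_init(struct name_item *item, const char *dir,
			  const char *base)
{
	item->name = concat3(dir, "/", base);
	item->util = concat3(dir, "/old-", base);
	if (!item->name || !item->util) {
		free(item->name);
		free(item->util);
		item->name = NULL;
		item->util = NULL;
		return -1;
	}
	return 0;
}

/* Free both names. */
static void name_item_release(struct name_item *item)
{
	free(item->name);
	free(item->util);
	item->name = NULL;
	item->util = NULL;
}
```

The rollback would then be rename(item->util, item->name) without any
further string building.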

> +			if (rename(fname_old, fname))
> +				string_list_append(&rollback_failure, fname);
> +			free(fname);
> +			free(fname_old);
> +		}
> +
> +		if (rollback.nr) {
> +			int i;
> +			fprintf(stderr,
> +				"WARNING: Some packs in use have been renamed by\n"
> +				"WARNING: prefixing old- to their name, in order to\n"
> +				"WARNING: replace them with the new version of the\n"
> +				"WARNING: file.  But the operation failed, and the\n"
> +				"WARNING: attempt to rename them back to their\n"
> +				"WARNING: original names also failed.\n"
> +				"WARNING: Please rename them in $PACKDIR manually:\n");
> +			for (i = 0; i < rollback.nr; i++)
> +				fprintf(stderr, "WARNING:   old-%s -> %s\n",
> +					rollback.items[i].string,
> +					rollback.items[i].string);
> +		}
> +		exit(1);
> +	}
> +
> +	/* Now the ones with the same name are out of the way... */
> +	for_each_string_list_item(item, &names) {
> +		for (ext = 0; ext < 2; ext++) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
> +			fname_old = mkpathdup("%s-%s%s", packtmp, item->string, exts[ext]);

Same here: git_path()?

> +			stat(fname_old, &statbuffer);

We ignore errors during chmod in the shell script. But this doesn't give 
you license to ignore stat() errors completely: If stat() fails, then 
don't chmod() below, either.

> +			statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;

			statbuffer.st_mode &= ~(S_IWUSR|S_IWGRP|S_IWOTH);
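
To spell out the difference with a small sketch (the two function
names are made up for illustration): each of ~S_IWUSR, ~S_IWGRP and
~S_IWOTH has every bit set except one, so OR-ing them gives all ones
and the &= changes nothing. Negating the OR-ed mask clears all three
write bits.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* The expression from the patch: the mask is all ones, so this is a no-op. */
static mode_t strip_write_bits_buggy(mode_t mode)
{
	return mode & (~S_IWUSR | ~S_IWGRP | ~S_IWOTH);
}

/* Corrected form: clear the write bits for user, group and others. */
static mode_t strip_write_bits(mode_t mode)
{
	return mode & ~(S_IWUSR | S_IWGRP | S_IWOTH);
}
```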

> +			chmod(fname_old, statbuffer.st_mode);
> +			if (rename(fname_old, fname))
> +				die("Could not rename packfile: %s -> %s", fname_old, fname);

Use die_errno() here.

> +			free(fname);
> +			free(fname_old);
> +		}
> +	}
> +
> +	/* Remove the "old-" files */
> +	for_each_string_list_item(item, &names) {
> +		char *fname;
> +		fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
> +		if (remove_path(fname))
> +			die("Could not remove file: %s", fname);

die_errno() makes sense here, too.

> +		free(fname);
> +
> +		fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
> +		if (remove_path(fname))
> +			die("Could not remove file: %s", fname);

and here as well.

> +		free(fname);

Again git_path?

> +	}
> +
> +	/* End of pack replacement. */

Nit: A blank line should follow this comment.

> +	if (delete_redundant) {
> +		sort_string_list(&names);
> +		for_each_string_list_item(item, &existing_packs) {
> +			char *sha1;
> +			size_t len = strlen(item->string);
> +			if (len < 40)
> +				continue;
> +			sha1 = item->string + len - 40;
> +			if (!string_list_has_string(&names, sha1))
> +				remove_pack(packdir, item->string);
> +		}

OK.

> +		argv_array_clear(&cmd_args);
> +		argv_array_push(&cmd_args, "prune-packed");
> +		if (quiet)
> +			argv_array_push(&cmd_args, "--quiet");
> +
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = argv_array_detach(&cmd_args, NULL);

Again: is it necessary to detach?

> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}
> +
> +	if (!no_update_server_info) {
> +		argv_array_clear(&cmd_args);
> +		argv_array_push(&cmd_args, "update-server-info");
> +
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = argv_array_detach(&cmd_args, NULL);

Same here?

> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}
> +	return 0;
> +}

In my opinion, it is good that you keep a large function that resembles 
the structure of the shell script because it is easier to review. But 
ultimately, it should be factored into smaller functions.

-- Hannes

* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 13:31                                     ` Johannes Sixt
@ 2013-08-20 15:08                                       ` Stefan Beller
  2013-08-20 18:38                                         ` Johannes Sixt
  2013-08-20 18:57                                         ` René Scharfe
  2013-08-20 21:24                                       ` Stefan Beller
  1 sibling, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-20 15:08 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle

On 08/20/2013 03:31 PM, Johannes Sixt wrote:
> 
> Are the long forms of options your invention?

I tried to keep a strong similarity with the shell script for
ease of review. In the shell script the options were
put in variables having these names, so for example there was:

	-f)	no_reuse=--no-reuse-delta ;;
	-F)	no_reuse=--no-reuse-object ;;

So I used these variable names here as well, as I assumed
the variables are meaningful in themselves.

In the shell script they may be meaningful, but with the option
parser in the C version, I overlooked that --no-<option> becomes
possible, as you noted below.

Maybe we should invert the logic and have the variables and options
called reuse-delta and enabled by default.

> 
>> +        OPT_BOOL('a', "all", &pack_everything,
>> +                N_("pack everything in a single pack")),
>> +        OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
>> +                N_("same as -a, and turn unreachable objects loose")),
> 
> --all-but-loose does not express what the help text says. The long form
> of -A is --all --unpack-unreachable, so it is really just a short option
> for convenience. It does not need its own long form.

Ok, I'll keep that in mind, and will only use the variable tied to -A
to set the -a and --unpack-unreachable variables.

> 
>> +        OPT_BOOL('d', "delete-redundant", &delete_redundant,
>> +                N_("remove redundant packs, and run git-prune-packed")),
>> +        OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
>> +                N_("pass --no-reuse-delta to git-pack-objects")),
>> +        OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
>> +                N_("pass --no-reuse-object to git-pack-objects")),
> 
> Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?

see above, I'd try not to.

> 
>> +        OPT_BOOL('n', NULL, &no_update_server_info,
>> +                N_("do not run git-update-server-info")),
> 
> No long option name?

This is also a negated option, so as above, maybe 
we could have --update_server_info and --no-update_server_info
respectively. Talking about the shortform then: Is it possible to
negate the shortform?

> 
>> +        OPT__QUIET(&quiet, N_("be quiet")),
>> +        OPT_BOOL('l', "local", &local,
>> +                N_("pass --local to git-pack-objects")),
> 
> Good.
> 
>> +        OPT_STRING(0, "unpack-unreachable", &unpack_unreachable,
>> N_("approxidate"),
>> +                N_("with -A, do not loosen objects older than this
>> Packing constraints")),
> 
> "Packing constraints" is a section heading, not a continuation of the
> previous help text.
> 
>> +        OPT_INTEGER(0, "window", &window,
>> +                N_("size of the window used for delta compression")),
> 
> This help text is suboptimal as the option is a count, not a "size" in
> the narrow sense. But that can be changed later (as it would affect
> other tools as well, I guess).
> 
>> +        OPT_INTEGER(0, "window-memory", &window_memory,
>> +                N_("same as the above, but limit memory size instead
>> of entries count")),
>> +        OPT_INTEGER(0, "depth", &depth,
>> +                N_("limits the maximum delta depth")),
>> +        OPT_INTEGER(0, "max-pack-size", &max_pack_size,
>> +                N_("maximum size of each packfile")),
>> +        OPT_END()
>> +    };
> 
> Good.
> 
>> +
>> +    git_config(repack_config, NULL);
>> +
>> +    argc = parse_options(argc, argv, prefix, builtin_repack_options,
>> +                git_repack_usage, 0);
>> +
>> +    sigchain_push_common(remove_pack_on_signal);
> 
> Good.
> 
>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
> 
> Should this not be
> 
>     packdir = xstrdup(git_path("pack"));
>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
> 
> Perhaps make packdir and packtmp global so that the strings need not be
> duplicated in get_pack_filenames and remove_temporary_files?

ok

> 
>> +
>> +    remove_temporary_files();
> 
> Yes, the shell script had this. But is it really necessary?

Well I can drop it if it's not needed.
It actually should implement
	rm -f "$PACKTMP"-*
and then the trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
as well.

> 
>> +
>> +    struct argv_array cmd_args = ARGV_ARRAY_INIT;
> 
> Declaration after statement.

will fix.

> 
>> +    argv_array_push(&cmd_args, "pack-objects");
>> +    argv_array_push(&cmd_args, "--keep-true-parents");
>> +    argv_array_push(&cmd_args, "--honor-pack-keep");
>> +    argv_array_push(&cmd_args, "--non-empty");
>> +    argv_array_push(&cmd_args, "--all");
>> +    argv_array_push(&cmd_args, "--reflog");
>> +
>> +    if (window)
>> +        argv_array_pushf(&cmd_args, "--window=%u", window);
>> +
>> +    if (window_memory)
>> +        argv_array_pushf(&cmd_args, "--window-memory=%u",
>> window_memory);
>> +
>> +    if (depth)
>> +        argv_array_pushf(&cmd_args, "--depth=%u", depth);
>> +
>> +    if (max_pack_size)
>> +        argv_array_pushf(&cmd_args, "--max_pack_size=%u",
>> max_pack_size);
>> +
>> +    if (no_reuse_delta)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-delta");
>> +
>> +    if (no_reuse_object)
>> +        argv_array_pushf(&cmd_args, "--no-reuse-object");
> 
> no_reuse_delta and no_reuse_object are mutually exclusive, according to
> the shell script version.

I'll change it then to OPT_BIT and die() when both are set.

> 
>> +
>> +    if (pack_everything + pack_everything_but_loose == 0) {
>> +        argv_array_push(&cmd_args, "--unpacked");
>> +        argv_array_push(&cmd_args, "--incremental");
>> +    } else {
>> +        struct string_list fname_list = STRING_LIST_INIT_DUP;
>> +        get_pack_filenames(packdir, &fname_list);
>> +        for_each_string_list_item(item, &fname_list) {
>> +            char *fname;
>> +            fname = mkpathdup("%s/%s.keep", packdir, item->string);
>> +            if (stat(fname, &statbuffer) &&
>> S_ISREG(statbuffer.st_mode)) {
> 
>             if (!stat(fname, &statbuffer) && ...
> 
> But you are using file_exists() later. That should be good enough here
> as well, no?

will do.

> 
>> +                /* when the keep file is there, we're ignoring that
>> pack */
>> +            } else {
>> +                string_list_append(&existing_packs, item->string);
>> +            }
>> +            free(fname);
>> +        }
>> +
>> +        if (existing_packs.nr && delete_redundant) {
>> +            if (unpack_unreachable)
>> +                argv_array_pushf(&cmd_args,
>> "--unpack-unreachable=%s", unpack_unreachable);
>> +            else if (pack_everything_but_loose)
>> +                argv_array_push(&cmd_args, "--unpack-unreachable");
>> +        }
>> +    }
>> +
>> +    if (local)
>> +        argv_array_push(&cmd_args,  "--local");
>> +    if (quiet)
>> +        argv_array_push(&cmd_args,  "--quiet");
>> +    if (delta_base_offset)
>> +        argv_array_push(&cmd_args,  "--delta-base-offset");
>> +
>> +    argv_array_push(&cmd_args, packtmp);
> 
> Otherwise, argument setup looks fine.
> 
>> +
>> +    memset(&cmd, 0, sizeof(cmd));
>> +    cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Is it necessary to detach the arguments?

Probably not. 

> 
>> +    cmd.git_cmd = 1;
>> +    cmd.out = -1;
>> +    cmd.no_stdin = 1;
>> +
>> +    if (run_command(&cmd))
>> +        return 1;
> 
> You cannot run_command() and then later read its output! You must split
> it into start_command(), read stdout, finish_command().

Thanks for this hint. Could that explain rare non-deterministic failures in
the test suite?

> 
>> +
>> +    struct string_list names = STRING_LIST_INIT_DUP;
>> +    struct string_list rollback = STRING_LIST_INIT_DUP;
> 
> Declaration after statement.

will fix
> 
>> +
>> +    char line[1024];
>> +    int counter = 0;
>> +    FILE *out = xfdopen(cmd.out, "r");
>> +    while (fgets(line, sizeof(line), out)) {
>> +        /* a line consists of 40 hex chars + '\n' */
>> +        assert(strlen(line) == 41);
> 
> You cannot make assertions about input that you read from an external
> command! You can die() if the expectation is not met. But I think that
> in this case the only necessary expectation is that a line is not empty.
> 
> BTW, don't we have strbuf functions to read from an fd linewise?

I'll check.

> 
>> +        line[40] = '\0';
>> +        string_list_append(&names, line);
>> +        counter++;
>> +    }
>> +    if (!counter)
>> +        printf("Nothing new to pack.\n");
> 
> This was 'say Nothing new to pack.'. say obeys --quiet, IIRC.

ok
> 
>> +    fclose(out);
>> +
>> +    int failed = 0;
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 1; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s%s", packdir, item->string,
>> exts[ext]);
>> +            if (!file_exists(fname)) {
>> +                free(fname);
>> +                continue;
>> +            }
>> +
>> +            fname_old = mkpathdup("%s/old-%s%s", packdir,
>> item->string, exts[ext]);
> 
> If you could use git_path() instead of mkpathdup() in these two cases,
> we would not need to free() the names.
> 
>> +            if (file_exists(fname_old))
>> +                unlink(fname_old);
>> +
>> +            if (rename(fname, fname_old)) {
>> +                failed = 1;
>> +                break;
>> +            }
>> +            free(fname_old);
>> +            string_list_append_nodup(&rollback, fname);
> 
> Ah, we would need to allocate here then.
> 
>> +        }
>> +        if (failed)
>> +            /* set to last element to break for_each loop */
>> +            item = names.items + names.nr;
> 
> A mere
>             break;
> doesn't do it here?

Sure! I'll replace by break.

> 
>> +    }
>> +    if (failed) {
>> +        struct string_list rollback_failure;
>> +        for_each_string_list_item(item, &rollback) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/%s", packdir, item->string);
>> +            fname_old = mkpathdup("%s/old-%s", packdir, item->string);
> 
> I think it's possible to attach arbitrary data to each string_list item.
> We could attach the "%s/old-%s" name to the item name, then we wouldn't
> need to re-construct the names here.

handy! I'll try to do that.

> 
>> +            if (rename(fname_old, fname))
>> +                string_list_append(&rollback_failure, fname);
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +
>> +        if (rollback.nr) {
>> +            int i;
>> +            fprintf(stderr,
>> +                "WARNING: Some packs in use have been renamed by\n"
>> +                "WARNING: prefixing old- to their name, in order to\n"
>> +                "WARNING: replace them with the new version of the\n"
>> +                "WARNING: file.  But the operation failed, and the\n"
>> +                "WARNING: attempt to rename them back to their\n"
>> +                "WARNING: original names also failed.\n"
>> +                "WARNING: Please rename them in $PACKDIR manually:\n");
>> +            for (i = 0; i < rollback.nr; i++)
>> +                fprintf(stderr, "WARNING:   old-%s -> %s\n",
>> +                    rollback.items[i].string,
>> +                    rollback.items[i].string);
>> +        }
>> +        exit(1);
>> +    }
>> +
>> +    /* Now the ones with the same name are out of the way... */
>> +    for_each_string_list_item(item, &names) {
>> +        for (ext = 0; ext < 2; ext++) {
>> +            char *fname, *fname_old;
>> +            fname = mkpathdup("%s/pack-%s%s", packdir, item->string,
>> exts[ext]);
>> +            fname_old = mkpathdup("%s-%s%s", packtmp, item->string,
>> exts[ext]);
> 
> Same here: git_path()?
> 
>> +            stat(fname_old, &statbuffer);
> 
> We ignore errors during chmod in the shell script. But this doesn't give
> you license to ignore stat() errors completely: If stat() fails, then
> don't chmod() below, either.

ok

> 
>> +            statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
> 
>             statbuffer.st_mode &= ~(S_IWUSR|S_IWGRP|S_IWOTH);
> 
>> +            chmod(fname_old, statbuffer.st_mode);
>> +            if (rename(fname_old, fname))
>> +                die("Could not rename packfile: %s -> %s", fname_old,
>> fname);
> 
> Use die_errno() here.
> 
>> +            free(fname);
>> +            free(fname_old);
>> +        }
>> +    }
>> +
>> +    /* Remove the "old-" files */
>> +    for_each_string_list_item(item, &names) {
>> +        char *fname;
>> +        fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> die_errno() makes sense here, too.
> 
>> +        free(fname);
>> +
>> +        fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
>> +        if (remove_path(fname))
>> +            die("Could not remove file: %s", fname);
> 
> and here as well.
> 
>> +        free(fname);
> 
> Again git_path?
> 
>> +    }
>> +
>> +    /* End of pack replacement. */
> 
> Nit: A blank line should follow this comment.
> 
>> +    if (delete_redundant) {
>> +        sort_string_list(&names);
>> +        for_each_string_list_item(item, &existing_packs) {
>> +            char *sha1;
>> +            size_t len = strlen(item->string);
>> +            if (len < 40)
>> +                continue;
>> +            sha1 = item->string + len - 40;
>> +            if (!string_list_has_string(&names, sha1))
>> +                remove_pack(packdir, item->string);
>> +        }
> 
> OK.
> 
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "prune-packed");
>> +        if (quiet)
>> +            argv_array_push(&cmd_args, "--quiet");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Again: is it necessary to detach?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +
>> +    if (!no_update_server_info) {
>> +        argv_array_clear(&cmd_args);
>> +        argv_array_push(&cmd_args, "update-server-info");
>> +
>> +        memset(&cmd, 0, sizeof(cmd));
>> +        cmd.argv = argv_array_detach(&cmd_args, NULL);
> 
> Same here?
> 
>> +        cmd.git_cmd = 1;
>> +        run_command(&cmd);
>> +    }
>> +    return 0;
>> +}
> 
> In my opinion, it is good that you keep a large function that resembles
> the structure of the shell script because it is easier to review. But
> ultimately, it should be factored into smaller functions.
> 
> -- Hannes
> 

Hannes,

thank you very much for the review. I'll follow your suggestions and dive
deeper into the API to change the lines you annotated.

Thanks,
Stefan





* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 15:08                                       ` Stefan Beller
@ 2013-08-20 18:38                                         ` Johannes Sixt
  2013-08-20 18:57                                         ` René Scharfe
  1 sibling, 0 replies; 72+ messages in thread
From: Johannes Sixt @ 2013-08-20 18:38 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, l.s.r, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle

Am 20.08.2013 17:08, schrieb Stefan Beller:
> On 08/20/2013 03:31 PM, Johannes Sixt wrote:
>> You cannot run_command() and then later read its output! You must split
>> it into start_command(), read stdout, finish_command().
>
> Thanks for this hint. Could that explain rare non-deterministic failures in
> the test suite?

Yes, it's a possible explanation.

-- Hannes

* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 15:08                                       ` Stefan Beller
  2013-08-20 18:38                                         ` Johannes Sixt
@ 2013-08-20 18:57                                         ` René Scharfe
  2013-08-20 22:36                                           ` Stefan Beller
  1 sibling, 1 reply; 72+ messages in thread
From: René Scharfe @ 2013-08-20 18:57 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Johannes Sixt, git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle

Am 20.08.2013 17:08, schrieb Stefan Beller:
> On 08/20/2013 03:31 PM, Johannes Sixt wrote:
>>
>> Are the long forms of options your invention?
>
> I tried to keep a strong similarity with the shell script for
> ease of review. In the shell script the options were
> put in variables having these names, so for example there was:
>
> 	-f)	no_reuse=--no-reuse-delta ;;
> 	-F)	no_reuse=--no-reuse-object ;;
>
> So I used these variable names here as well, as I assumed
> the variables are meaningful in themselves.
>
> In the shell script they may be meaningful, but with the option
> parser in the C version, I overlooked that --no-<option> becomes
> possible, as you noted below.
>
> Maybe we should invert the logic and have the variables and options
> called reuse-delta and enabled by default.

That's what git pack-objects does, which these options eventually get passed to.

But I think Johannes also wanted to point out that git-repack.sh
doesn't recognize --no-reuse-delta, --all etc.  I think it's better to
introduce new long options in a separate patch.  Switching the
programming language is big enough of a change already. :)

>>> +        OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
>>> +                N_("pass --no-reuse-delta to git-pack-objects")),
>>> +        OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
>>> +                N_("pass --no-reuse-object to git-pack-objects")),
>>
>> Do we want to allow --no-no-reuse-delta and --no-no-reuse-object?
>
> see above, I'd try not to.

The declaration above allows --reuse-delta, --no-reuse-delta and
--no-no-reuse-delta to be used.  The latter looks funny, but I don't
think we need to forbid it.  That said, dropping the no- and thus
declaring them the same way as pack-objects is a good idea.

>>
>>> +        OPT_BOOL('n', NULL, &no_update_server_info,
>>> +                N_("do not run git-update-server-info")),
>>
>> No long option name?
>
> This is also a negated option, so as above, maybe
> we could have --update_server_info and --no-update_server_info
> respectively. Talking about the shortform then: Is it possible to
> negate the shortform?

Words in long options are separated by dashes, so --update-server-info. 
  The no- prefix is provided for free by parseopt, unless the flag 
PARSE_OPT_NONEG is given.

There is no automatic way to provide a short option that negates another 
short option.  You can build such a pair explicitly using OPTION_BIT and 
OPTION_NEGBIT or with OPTION_SET_INT and different values.
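
As a toy model of the long-option behavior described above (this is
not parse-options code; parse_bool_flag() is a hypothetical helper and
it does not model PARSE_OPT_NONEG): a boolean named X matches --X and
--no-X, and a name that itself starts with "no-" also matches with
that prefix dropped.

```c
#include <string.h>

/*
 * Toy model of a negatable boolean long option. Returns 1 and sets
 * *value if arg refers to the option called name, 0 otherwise.
 */
static int parse_bool_flag(const char *arg, const char *name, int *value)
{
	if (strncmp(arg, "--", 2))
		return 0;
	arg += 2;
	if (!strcmp(arg, name)) {
		/* --<name> switches the option on */
		*value = 1;
		return 1;
	}
	if (!strncmp(arg, "no-", 3) && !strcmp(arg + 3, name)) {
		/* --no-<name> switches it off */
		*value = 0;
		return 1;
	}
	if (!strncmp(name, "no-", 3) && !strcmp(arg, name + 3)) {
		/* name is "no-X": a plain --X also switches it off */
		*value = 0;
		return 1;
	}
	return 0;
}
```

With the name "no-reuse-delta" this accepts --no-reuse-delta (on),
--no-no-reuse-delta (off) and --reuse-delta (off); with the name
"update-server-info" it accepts --update-server-info and
--no-update-server-info.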

>>> +    if (pack_everything + pack_everything_but_loose == 0) {
>>> +        argv_array_push(&cmd_args, "--unpacked");
>>> +        argv_array_push(&cmd_args, "--incremental");
>>> +    } else {
>>> +        struct string_list fname_list = STRING_LIST_INIT_DUP;
>>> +        get_pack_filenames(packdir, &fname_list);
>>> +        for_each_string_list_item(item, &fname_list) {
>>> +            char *fname;
>>> +            fname = mkpathdup("%s/%s.keep", packdir, item->string);
>>> +            if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {

"t7700-repack.sh --valgrind" fails and flags that line...

>>
>>             if (!stat(fname, &statbuffer) && ...

... but with this fix it runs fine.  I suspect that explains your
sporadic test failures.

>>
>> But you are using file_exists() later. That should be good enough here
>> as well, no?
>
> will do.

* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 13:31                                     ` Johannes Sixt
  2013-08-20 15:08                                       ` Stefan Beller
@ 2013-08-20 21:24                                       ` Stefan Beller
  2013-08-20 21:34                                         ` Jonathan Nieder
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-20 21:24 UTC (permalink / raw)
  Cc: git

On 08/20/2013 03:31 PM, Johannes Sixt wrote:
> 
>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
> 
> Should this not be
> 
>     packdir = xstrdup(git_path("pack"));
>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));

Just a question for documentation purposes. ;)
Am I right in suggesting the following:

`mkpathdup`::
	Use parameters to build the path on the filesystem,
	i.e. create required folders and then return a duplicate
	of that path. The caller is responsible for freeing the memory.

`xstrdup`::
	Duplicates the given string, making the caller responsible
	for freeing the return value. (No side effects on the fs or
	other global memory.) Basically the same as man 3 strdup
	with error handling.

`git_path`::
	Returns a pointer to a static string buffer, so it can just
	be used once or must be duplicated using xstrdup. The path
	given is relative and is inside the repository.


Stefan



* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 21:24                                       ` Stefan Beller
@ 2013-08-20 21:34                                         ` Jonathan Nieder
  2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Jonathan Nieder @ 2013-08-20 21:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller wrote:
> On 08/20/2013 03:31 PM, Johannes Sixt wrote:

>>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
>> 
>> Should this not be
>> 
>>     packdir = xstrdup(git_path("pack"));
>>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
>
> Just a question for documentational purpose. ;)
> Am I right suggesting the following:
>
> `mkpathdup`::
> 	Use parameters to build the path on the filesystem,
> 	i.e. create required folders and then return a duplicate
> 	of that path. The caller is responsible to free the memory

Right.  mkpathdup is basically just mkpath composed with xstrdup,
except that it avoids stomping on mkpath's buffers.

The corresponding almost-shortcut for xstrdup(git_path(s)) is
git_pathdup(s).  But that's a minor detail.

Maybe a new Documentation/technical/api-paths.txt is in order.

Thanks,
Jonathan


* Dokumenting api-paths.txt
  2013-08-20 21:34                                         ` Jonathan Nieder
@ 2013-08-20 21:40                                           ` Stefan Beller
  2013-08-20 21:59                                             ` Jonathan Nieder
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-20 21:40 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On 08/20/2013 11:34 PM, Jonathan Nieder wrote:
> Stefan Beller wrote:
>> On 08/20/2013 03:31 PM, Johannes Sixt wrote:
> 
>>>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>>>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
>>>
>>> Should this not be
>>>
>>>     packdir = xstrdup(git_path("pack"));
>>>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
>>
>> Just a question for documentational purpose. ;)
>> Am I right suggesting the following:
>>
>> `mkpathdup`::
>> 	Use parameters to build the path on the filesystem,
>> 	i.e. create required folders and then return a duplicate
>> 	of that path. The caller is responsible to free the memory
> 
> Right.  mkpathdup is basically just mkpath composed with xstrdup,
> except that it avoids stomping on mkpath's buffers.
> 
> The corresponding almost-shortcut for xstrdup(git_path(s)) is
> git_pathdup(s).  But that's a minor detail.
> 
> Maybe a new Documentation/technical/api-paths.txt is in order.
> 
> Thanks,
> Jonathan
> 

Is there a way to create a path without using git_path?
git_path seems to imply adding .git.

So if I have 
	packdir = xstrdup(git_path("pack"));
	...
	path = git_path("%s/%s", packdir, filename)

This produces something as:
.git/.git/objects/pack/.tmp-13199-pack-c59c5758ef159b272f6ab10cb9fadee443966e71.idx
definitely having one .git too much.
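
The doubled prefix can be reproduced outside of git with a small sketch. `repo_path()` and `SKETCH_GIT_DIR` are hypothetical names, and the sketch only assumes what is observed above: the helper always prepends the git directory to whatever it is given.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Assumption for the demo: the repository metadata lives in ./.git */
#define SKETCH_GIT_DIR ".git"

/*
 * Hypothetical stand-in for a git_path()-style helper: it always
 * prepends the git directory, whether or not the argument already
 * contains it.
 */
static const char *repo_path(const char *fmt, ...)
{
	static char buf[4096];
	va_list ap;
	int len;

	assert(fmt != NULL);
	len = snprintf(buf, sizeof(buf), "%s/", SKETCH_GIT_DIR);
	va_start(ap, fmt);
	vsnprintf(buf + len, sizeof(buf) - (size_t)len, fmt, ap);
	va_end(ap);
	return buf;
}
```

Feeding an already-resolved path such as ".git/objects/pack" back into `repo_path("%s/%s", ...)` yields ".git/.git/objects/pack/...", which matches the symptom described here.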

Also interesting to add would be that git_path operates in the
.git/objects directory?

Thanks,
Stefan



* Re: Dokumenting api-paths.txt
  2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
@ 2013-08-20 21:59                                             ` Jonathan Nieder
  2013-08-21 22:43                                               ` Stefan Beller
  0 siblings, 1 reply; 72+ messages in thread
From: Jonathan Nieder @ 2013-08-20 21:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Johannes Sixt

Stefan Beller wrote:
>>> On 08/20/2013 03:31 PM, Johannes Sixt wrote:
>>>> Stefan Beller wrote:

>>>>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>>>>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
>>>>
>>>> Should this not be
>>>>
>>>>     packdir = xstrdup(git_path("pack"));
>>>>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
[...]
> So if I have 
> 	packdir = xstrdup(git_path("pack"));
> 	...
> 	path = git_path("%s/%s", packdir, filename)
>
> This produces something as:
> .git/.git/objects/pack/.tmp-13199-pack-c59c5758ef159b272f6ab10cb9fadee443966e71.idx
> definitely having one .git too much.

The version with get_object_directory() was right.  The object
directory is not even necessarily under .git/, since it can be
overridden using the GIT_OBJECT_DIRECTORY envvar.
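
As a rough illustration of why the object directory cannot be derived from the git directory alone, here is a simplified standalone sketch. `object_dir_sketch()` is a hypothetical name and this is not git's implementation; it only models the rule stated above, with "<git_dir>/objects" as the assumed fallback.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified lookup: use GIT_OBJECT_DIRECTORY if it is set in the
 * environment, otherwise fall back to "<git_dir>/objects".
 */
static void object_dir_sketch(const char *git_dir, char *out, size_t outlen)
{
	const char *override = getenv("GIT_OBJECT_DIRECTORY");

	assert(git_dir && out && outlen > 0);
	if (override)
		snprintf(out, outlen, "%s", override);
	else
		snprintf(out, outlen, "%s/objects", git_dir);
}
```

With the variable unset the result stays under the git directory; with it set to, say, /srv/objects, the pack directory ends up somewhere a git_path()-style helper would never point.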

> Also interesting to add would be that git_path operates in the
> .git/objects directory?

git_path is for resolving paths within GIT_DIR, such as
git_path("config") and git_path("COMMIT_EDITMSG").

Jonathan


* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 18:57                                         ` René Scharfe
@ 2013-08-20 22:36                                           ` Stefan Beller
  2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
                                                               ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-20 22:36 UTC (permalink / raw)
  To: René Scharfe
  Cc: Johannes Sixt, git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	gitster, mackyle


So here is an update of git-repack
Thanks for all the reviews and annotations!
I think I got all the suggestions except the
use of git_path/mkpathdup.
I replaced mkpathdup by mkpath where possible,
but it's still not perfect.
I'll wait for Jonathan's documentation patch
before changing all these occurrences back and forth
again.

What would be perfect here is a function
that just does the string formatting and returns
the result, so
	fname = create_string(fmt, ...);
or with duplication:
	fname = create_string_dup(fmt, ...);

Ah wait! There is struct strbuf, but that
would require more lines (init, add to buffer,
get as char*).
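
For reference, a helper of the kind wished for above could look roughly like the following standalone sketch. `create_string()` is the hypothetical name from this mail, not an existing git function, and the sketch uses plain C99 `vsnprintf()` rather than the strbuf API; returning NULL on failure (instead of dying) is also just an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical create_string(): printf-style formatting into a freshly
 * allocated buffer. The caller owns the result and must free() it.
 * Returns NULL on formatting or allocation failure.
 */
static char *create_string(const char *fmt, ...)
{
	va_list ap, ap2;
	int len;
	char *buf;

	assert(fmt != NULL);
	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap);	/* first pass: measure */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);	/* second pass: format */
	va_end(ap2);
	return buf;
}
```

A call such as `create_string("%s/old-%s%s", packdir, name, ext)` would then replace the init / add / detach sequence in one line, at the cost of one extra formatting pass.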

Below there is just the diff against RFC PATCHv4,
however I'll send the whole patch as well.

Thanks,
Stefan

--8<--
From e544eb9b7bdea6c2000c5f0d3043845fb901e90b Mon Sep 17 00:00:00 2001
From: Stefan Beller <stefanbeller@googlemail.com>
Date: Wed, 21 Aug 2013 00:35:18 +0200
Subject: [PATCH] Suggestions of reviewers

---
 builtin/repack.c | 104 +++++++++++++++++++++++++++----------------------------
 1 file changed, 51 insertions(+), 53 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a87900e..9fbe636 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -67,7 +67,7 @@ void get_pack_filenames(char *packdir, struct string_list *fname_list)
 	struct dirent *e;
 	char *path, *suffix, *fname;
 
-	path = mkpathdup("%s/pack", get_object_directory());
+	path = mkpath("%s/pack", get_object_directory());
 	suffix = ".pack";
 
 	dir = opendir(path);
@@ -78,7 +78,6 @@ void get_pack_filenames(char *packdir, struct string_list *fname_list)
 			string_list_append_nodup(fname_list, fname);
 		}
 	}
-	free(path);
 	closedir(dir);
 }
 
@@ -88,14 +87,25 @@ void remove_pack(char *path, char* sha1)
 	int ext = 0;
 	for (ext = 0; ext < 3; ext++) {
 		char *fname;
-		fname = mkpathdup("%s/%s%s", path, sha1, exts[ext]);
+		fname = mkpath("%s/%s%s", path, sha1, exts[ext]);
 		unlink(fname);
-		free(fname);
 	}
 }
 
 int cmd_repack(int argc, const char **argv, const char *prefix) {
 
+	char *exts[2] = {".idx", ".pack"};
+	char *packdir, *packtmp, line[1024];
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	int count_packs, ext;
+	FILE *out;
+
+	/* variables to be filled by option parsing */
 	int pack_everything = 0;
 	int pack_everything_but_loose = 0;
 	int delete_redundant = 0;
@@ -107,24 +117,17 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	int no_update_server_info = 0;
 	int quiet = 0;
 	int local = 0;
-	char *packdir, *packtmp;
-	struct child_process cmd;
-	struct string_list_item *item;
-	struct string_list existing_packs = STRING_LIST_INIT_DUP;
-	struct stat statbuffer;
-	int ext;
-	char *exts[2] = {".idx", ".pack"};
 
 	struct option builtin_repack_options[] = {
-		OPT_BOOL('a', "all", &pack_everything,
+		OPT_BOOL('a', NULL, &pack_everything,
 				N_("pack everything in a single pack")),
-		OPT_BOOL('A', "all-but-loose", &pack_everything_but_loose,
+		OPT_BOOL('A', NULL, &pack_everything_but_loose,
 				N_("same as -a, and turn unreachable objects loose")),
-		OPT_BOOL('d', "delete-redundant", &delete_redundant,
+		OPT_BOOL('d', NULL, &delete_redundant,
 				N_("remove redundant packs, and run git-prune-packed")),
-		OPT_BOOL('f', "no-reuse-delta", &no_reuse_delta,
+		OPT_BOOL('f', NULL, &no_reuse_delta,
 				N_("pass --no-reuse-delta to git-pack-objects")),
-		OPT_BOOL('F', "no-reuse-object", &no_reuse_object,
+		OPT_BOOL('F', NULL, &no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_BOOL('n', NULL, &no_update_server_info,
 				N_("do not run git-update-server-info")),
@@ -154,9 +157,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
 
-	remove_temporary_files();
-
-	struct argv_array cmd_args = ARGV_ARRAY_INIT;
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
 	argv_array_push(&cmd_args, "--honor-pack-keep");
@@ -191,7 +191,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 		for_each_string_list_item(item, &fname_list) {
 			char *fname;
 			fname = mkpathdup("%s/%s.keep", packdir, item->string);
-			if (stat(fname, &statbuffer) && S_ISREG(statbuffer.st_mode)) {
+			if (file_exists(fname)) {
 				/* when the keep file is there, we're ignoring that pack */
 			} else {
 				string_list_append(&existing_packs, item->string);
@@ -217,34 +217,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	argv_array_push(&cmd_args, packtmp);
 
 	memset(&cmd, 0, sizeof(cmd));
-	cmd.argv = argv_array_detach(&cmd_args, NULL);
+	cmd.argv = cmd_args.argv;
 	cmd.git_cmd = 1;
 	cmd.out = -1;
 	cmd.no_stdin = 1;
 
-	if (run_command(&cmd))
+	if (start_command(&cmd))
 		return 1;
 
-	struct string_list names = STRING_LIST_INIT_DUP;
-	struct string_list rollback = STRING_LIST_INIT_DUP;
-
-	char line[1024];
-	int counter = 0;
-	FILE *out = xfdopen(cmd.out, "r");
+	count_packs = 0;
+	out = xfdopen(cmd.out, "r");
 	while (fgets(line, sizeof(line), out)) {
 		/* a line consists of 40 hex chars + '\n' */
-		assert(strlen(line) == 41);
+		if (strlen(line) != 41)
+			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
 		line[40] = '\0';
 		string_list_append(&names, line);
-		counter++;
+		count_packs++;
 	}
-	if (!counter)
-		printf("Nothing new to pack.\n");
+	if (finish_command(&cmd))
+		return 1;
 	fclose(out);
 
+	if (!count_packs && !quiet)
+		printf("Nothing new to pack.\n");
+
 	int failed = 0;
 	for_each_string_list_item(item, &names) {
-		for (ext = 0; ext < 1; ext++) {
+		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
 			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
 			if (!file_exists(fname)) {
@@ -252,7 +252,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 				continue;
 			}
 
-			fname_old = mkpathdup("%s/old-%s%s", packdir, item->string, exts[ext]);
+			fname_old = mkpath("%s/old-%s%s", packdir, item->string, exts[ext]);
 			if (file_exists(fname_old))
 				unlink(fname_old);
 
@@ -260,23 +260,21 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 				failed = 1;
 				break;
 			}
-			free(fname_old);
 			string_list_append_nodup(&rollback, fname);
+			free(fname);
 		}
 		if (failed)
-			/* set to last element to break for_each loop */
-			item = names.items + names.nr;
+			break;
 	}
 	if (failed) {
 		struct string_list rollback_failure;
 		for_each_string_list_item(item, &rollback) {
 			char *fname, *fname_old;
 			fname = mkpathdup("%s/%s", packdir, item->string);
-			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
+			fname_old = mkpath("%s/old-%s", packdir, item->string);
 			if (rename(fname_old, fname))
 				string_list_append(&rollback_failure, fname);
 			free(fname);
-			free(fname_old);
 		}
 
 		if (rollback.nr) {
@@ -301,33 +299,33 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
+			struct stat statbuffer;
 			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
-			fname_old = mkpathdup("%s-%s%s", packtmp, item->string, exts[ext]);
-			stat(fname_old, &statbuffer);
-			statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
-			chmod(fname_old, statbuffer.st_mode);
+			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
+			if (!stat(fname_old, &statbuffer)) {
+				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
+				chmod(fname_old, statbuffer.st_mode);
+			}
 			if (rename(fname_old, fname))
-				die("Could not rename packfile: %s -> %s", fname_old, fname);
+				die_errno(_("renaming '%s' failed"), fname_old);
 			free(fname);
-			free(fname_old);
 		}
 	}
 
 	/* Remove the "old-" files */
 	for_each_string_list_item(item, &names) {
 		char *fname;
-		fname = mkpathdup("%s/old-pack-%s.idx", packdir, item->string);
+		fname = mkpath("%s/old-pack-%s.idx", packdir, item->string);
 		if (remove_path(fname))
-			die("Could not remove file: %s", fname);
-		free(fname);
+			die_errno(_("removing '%s' failed"), fname);
 
-		fname = mkpathdup("%s/old-pack-%s.pack", packdir, item->string);
+		fname = mkpath("%s/old-pack-%s.pack", packdir, item->string);
 		if (remove_path(fname))
-			die("Could not remove file: %s", fname);
-		free(fname);
+			die_errno(_("removing '%s' failed"), fname);
 	}
 
 	/* End of pack replacement. */
+
 	if (delete_redundant) {
 		sort_string_list(&names);
 		for_each_string_list_item(item, &existing_packs) {
@@ -345,7 +343,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 			argv_array_push(&cmd_args, "--quiet");
 
 		memset(&cmd, 0, sizeof(cmd));
-		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.argv = cmd_args.argv;
 		cmd.git_cmd = 1;
 		run_command(&cmd);
 	}
@@ -355,7 +353,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 		argv_array_push(&cmd_args, "update-server-info");
 
 		memset(&cmd, 0, sizeof(cmd));
-		cmd.argv = argv_array_detach(&cmd_args, NULL);
+		cmd.argv = cmd_args.argv;
 		cmd.git_cmd = 1;
 		run_command(&cmd);
 	}
-- 
1.8.4.rc3.1.gc1ebd90



* [PATCH] repack: rewrite the shell script in C.
  2013-08-20 22:36                                           ` Stefan Beller
@ 2013-08-20 22:38                                             ` Stefan Beller
  2013-08-21  8:25                                               ` Jonathan Nieder
  2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
  2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
  2013-08-21  9:20                                             ` Johannes Sixt
  2 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-20 22:38 UTC (permalink / raw)
  To: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t
  Cc: Stefan Beller

This is the beginning of the rewrite of the repacking.
All tests are constantly positive now.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 361 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 364 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..9fbe636
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,361 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "argv-array.h"
+
+static int delta_base_offset = 0;
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+static void remove_temporary_files() {
+	DIR *dir;
+	struct dirent *e;
+	char *prefix, *path;
+
+	prefix = mkpathdup(".tmp-%d-pack", getpid());
+	path = mkpathdup("%s/pack", get_object_directory());
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!prefixcmp(e->d_name, prefix)) {
+			struct strbuf fname = STRBUF_INIT;
+			strbuf_addf(&fname, "%s/%s", path, e->d_name);
+			unlink(strbuf_detach(&fname, NULL));
+		}
+	}
+	free(prefix);
+	free(path);
+	closedir(dir);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+/*
+ * Fills the filename list with all the files found in the pack directory
+ * ending with .pack, without that extension.
+ */
+void get_pack_filenames(char *packdir, struct string_list *fname_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *path, *suffix, *fname;
+
+	path = mkpath("%s/pack", get_object_directory());
+	suffix = ".pack";
+
+	dir = opendir(path);
+	while ((e = readdir(dir)) != NULL) {
+		if (!suffixcmp(e->d_name, suffix)) {
+			size_t len = strlen(e->d_name) - strlen(suffix);
+			fname = xmemdupz(e->d_name, len);
+			string_list_append_nodup(fname_list, fname);
+		}
+	}
+	closedir(dir);
+}
+
+void remove_pack(char *path, char* sha1)
+{
+	char *exts[] = {".pack", ".idx", ".keep"};
+	int ext = 0;
+	for (ext = 0; ext < 3; ext++) {
+		char *fname;
+		fname = mkpath("%s/%s%s", path, sha1, exts[ext]);
+		unlink(fname);
+	}
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix) {
+
+	char *exts[2] = {".idx", ".pack"};
+	char *packdir, *packtmp, line[1024];
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	int count_packs, ext;
+	FILE *out;
+
+	/* variables to be filled by option parsing */
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	char *unpack_unreachable = NULL;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', NULL, &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', NULL, &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', NULL, &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', NULL, &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', NULL, &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
+				N_("with -A, do not loosen objects older than this Packing constraints")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
+
+	argv_array_push(&cmd_args, "pack-objects");
+	argv_array_push(&cmd_args, "--keep-true-parents");
+	argv_array_push(&cmd_args, "--honor-pack-keep");
+	argv_array_push(&cmd_args, "--non-empty");
+	argv_array_push(&cmd_args, "--all");
+	argv_array_push(&cmd_args, "--reflog");
+
+	if (window)
+		argv_array_pushf(&cmd_args, "--window=%u", window);
+
+	if (window_memory)
+		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
+
+	if (depth)
+		argv_array_pushf(&cmd_args, "--depth=%u", depth);
+
+	if (max_pack_size)
+		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
+
+	if (no_reuse_delta)
+		argv_array_pushf(&cmd_args, "--no-reuse-delta");
+
+	if (no_reuse_object)
+		argv_array_pushf(&cmd_args, "--no-reuse-object");
+
+	if (pack_everything + pack_everything_but_loose == 0) {
+		argv_array_push(&cmd_args, "--unpacked");
+		argv_array_push(&cmd_args, "--incremental");
+	} else {
+		struct string_list fname_list = STRING_LIST_INIT_DUP;
+		get_pack_filenames(packdir, &fname_list);
+		for_each_string_list_item(item, &fname_list) {
+			char *fname;
+			fname = mkpathdup("%s/%s.keep", packdir, item->string);
+			if (file_exists(fname)) {
+				/* when the keep file is there, we're ignoring that pack */
+			} else {
+				string_list_append(&existing_packs, item->string);
+			}
+			free(fname);
+		}
+
+		if (existing_packs.nr && delete_redundant) {
+			if (unpack_unreachable)
+				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
+			else if (pack_everything_but_loose)
+				argv_array_push(&cmd_args, "--unpack-unreachable");
+		}
+	}
+
+	if (local)
+		argv_array_push(&cmd_args,  "--local");
+	if (quiet)
+		argv_array_push(&cmd_args,  "--quiet");
+	if (delta_base_offset)
+		argv_array_push(&cmd_args,  "--delta-base-offset");
+
+	argv_array_push(&cmd_args, packtmp);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = cmd_args.argv;
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	if (start_command(&cmd))
+		return 1;
+
+	count_packs = 0;
+	out = xfdopen(cmd.out, "r");
+	while (fgets(line, sizeof(line), out)) {
+		/* a line consists of 40 hex chars + '\n' */
+		if (strlen(line) != 41)
+			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
+		line[40] = '\0';
+		string_list_append(&names, line);
+		count_packs++;
+	}
+	if (finish_command(&cmd))
+		return 1;
+	fclose(out);
+
+	if (!count_packs && !quiet)
+		printf("Nothing new to pack.\n");
+
+	int failed = 0;
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
+			if (!file_exists(fname)) {
+				free(fname);
+				continue;
+			}
+
+			fname_old = mkpath("%s/old-%s%s", packdir, item->string, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append_nodup(&rollback, fname);
+			free(fname);
+		}
+		if (failed)
+			break;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+		for_each_string_list_item(item, &rollback) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s", packdir, item->string);
+			fname_old = mkpath("%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+			free(fname);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and the\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in $PACKDIR manually:\n");
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname, *fname_old;
+			struct stat statbuffer;
+			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
+			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
+			if (!stat(fname_old, &statbuffer)) {
+				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
+				chmod(fname_old, statbuffer.st_mode);
+			}
+			if (rename(fname_old, fname))
+				die_errno(_("renaming '%s' failed"), fname_old);
+			free(fname);
+		}
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		char *fname;
+		fname = mkpath("%s/old-pack-%s.idx", packdir, item->string);
+		if (remove_path(fname))
+			die_errno(_("removing '%s' failed"), fname);
+
+		fname = mkpath("%s/old-pack-%s.pack", packdir, item->string);
+		if (remove_path(fname))
+			die_errno(_("removing '%s' failed"), fname);
+	}
+
+	/* End of pack replacement. */
+
+	if (delete_redundant) {
+		sort_string_list(&names);
+		for_each_string_list_item(item, &existing_packs) {
+			char *sha1;
+			size_t len = strlen(item->string);
+			if (len < 40)
+				continue;
+			sha1 = item->string + len - 40;
+			if (!string_list_has_string(&names, sha1))
+				remove_pack(packdir, item->string);
+		}
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "prune-packed");
+		if (quiet)
+			argv_array_push(&cmd_args, "--quiet");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args.argv;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+
+	if (!no_update_server_info) {
+		argv_array_clear(&cmd_args);
+		argv_array_push(&cmd_args, "update-server-info");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args.argv;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+	}
+	return 0;
+}
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.1.gc1ebd90


* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 22:36                                           ` Stefan Beller
  2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
@ 2013-08-20 22:46                                             ` Jonathan Nieder
  2013-08-21  9:20                                             ` Johannes Sixt
  2 siblings, 0 replies; 72+ messages in thread
From: Jonathan Nieder @ 2013-08-20 22:46 UTC (permalink / raw)
  To: Stefan Beller
  Cc: René Scharfe, Johannes Sixt, git, mfick, apelisse,
	Matthieu.Moy, pclouds, iveqy, gitster, mackyle

Stefan Beller wrote:

> I think I got all the suggestions except the
> use of git_path/mkpathdup.
> I replaced mkpathdup by mkpath where possible,
> but it's still not perfect.

No, mkpathdup is generally better unless you know what you're doing.

> I'll wait for the documentation patch of Jonathan,

I never promised to write one. :)  I would have preferred to have a
rough draft with the results of your investigations so far to start
from.

Oh well.  I'll look into it tonight.

Thanks,
Jonathan


* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
@ 2013-08-21  8:25                                               ` Jonathan Nieder
  2013-08-21 10:37                                                 ` Stefan Beller
  2013-08-21 17:25                                                 ` Stefan Beller
  2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
  1 sibling, 2 replies; 72+ messages in thread
From: Jonathan Nieder @ 2013-08-21  8:25 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t

Hi,

Stefan Beller wrote:

> [PATCH] repack: rewrite the shell script in C.

Thanks for your work so far.  This review will have mostly cosmetic
notes.  Hopefully others can try it out to see if the actual behavior
is good.

As a first nit: in git, as usual in emails, the style in subject lines
is not to end with a period.  The above subject line is otherwise good
(a nice summary that quickly explains the effect, which is handy in
e.g. abbreviated changelogs from release announcements).

> This is the beginning of the rewrite of the repacking.

This is a place to explain

 - the motivation / intended positive effect of the patch
 - any noticeable behavior changes
 - complications and other hints for people looking back and trying
   to understand this code

Based on the discussion before, I think the motivation is to get
closer to a goal of being able to have a core subset of git
functionality built in to git.  That would mean

 * people on Windows could get a copy of at least the core parts
   of Git without having to install a Unix-style shell

 * people deploying to servers don't have to rewrite the #! line
   or worry about the PATH and quality of installed POSIX
   utilities, if they are only using the built-in part written
   in C 

This patch is meant to be mostly a literal translation of the
git-repack script; the intent is that later patches would start using
more library facilities, but this patch is meant to be as close to a
no-op as possible so it doesn't do that kind of thing.

> All tests are constantly positive now.

This kind of changes-since-the-previous-iteration information that
doesn't need to be recorded in the commit log for posterity goes
after the "---" marker.

> 
> Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
> ---
>  Makefile                                        |   2 +-
[...]
> --- /dev/null
> +++ b/builtin/repack.c
> @@ -0,0 +1,361 @@
[...]
> +static int delta_base_offset = 0;

The "= 0" is automatic for statics without an initializer.  The
prevailing style in git is to leave it out.

Behavior change: in the script, wasn't the default "true"?

[...]
> +static void remove_temporary_files() {

Style: argument list should have "void".  (In C89 and C99, an empty
argument list means "having unspecified arguments" instead of "having
no arguments" as in C++.)

> +	DIR *dir;
> +	struct dirent *e;
> +	char *prefix, *path;
> +
> +	prefix = mkpathdup(".tmp-%d-pack", getpid());

pid_t is not guaranteed to be an "int", so this needs a cast.
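
A minimal sketch of the cast, assuming plain snprintf() in place of git's mkpathdup(); the helper name format_tmp_prefix is hypothetical and only serves the illustration:

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: writes ".tmp-<pid>-pack" into buf.
 * pid_t need not have the same width as int, so the value is cast
 * explicitly to match the %d conversion.
 */
static int format_tmp_prefix(char *buf, size_t size, pid_t pid)
{
	return snprintf(buf, size, ".tmp-%d-pack", (int)pid);
}
```

A caller would pass getpid() as the last argument.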

> +	path = mkpathdup("%s/pack", get_object_directory());

The names "prefix" and "path" are quite generic.  What does this
function do?  A comment could help, e.g.:

	/*
	 * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
	 */

> +
> +	dir = opendir(path);
> +	while ((e = readdir(dir)) != NULL) {

What happens if the directory does not exist?

> +		if (!prefixcmp(e->d_name, prefix)) {

The git-repack script removes $PACKTMP-*, but this code matches $PACKTMP*
instead.  Intentional?
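
To make the difference concrete, here is a small sketch with a hypothetical starts_with_prefix() helper (plain strncmp, not git's prefixcmp). Without the trailing dash, the prefix also matches names that merely begin with the same characters:

```c
#include <string.h>

/* Returns 1 if name starts with prefix, 0 otherwise. */
static int starts_with_prefix(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

With this helper, ".tmp-12-packed" matches the prefix ".tmp-12-pack" but not ".tmp-12-pack-", which is the stricter pattern the script uses.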

> +			struct strbuf fname = STRBUF_INIT;
> +			strbuf_addf(&fname, "%s/%s", path, e->d_name);
> +			unlink(strbuf_detach(&fname, NULL));

This leaks fname (see Documentation/technical/api-strbuf.txt for an
explanation of strbuf_detach).

> +		}
> +	}
> +	free(prefix);
> +	free(path);
> +	closedir(dir);

I wonder if it would make sense for buffers to share space here.
E.g. something like

 {
	/*
	 * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
	 */

	struct strbuf buf = STRBUF_INIT;
	size_t dirlen, prefixlen;
	DIR *dir;
	struct dirent *e;

	/* .git/objects/pack */
	strbuf_addstr(&buf, get_object_directory());
	strbuf_addstr(&buf, "/pack");
	dir = opendir(buf.buf);
	if (!dir)
		... handle error ...

	/* .git/objects/pack/.tmp-$$-pack-* */
	dirlen = buf.len + 1;
	strbuf_addf(&buf, "/.tmp-%d-pack-", getpid());
	prefixlen = buf.len - dirlen;

	while ((e = readdir(dir))) {
		if (strncmp(e->d_name, buf.buf + dirlen, prefixlen))
			continue;
		strbuf_setlen(&buf, dirlen);
		strbuf_addstr(&buf, e->d_name);
		unlink(buf.buf);
	}
	if (closedir(dir))
		... handle error ...

	strbuf_release(&buf);
 }

I dunno.

[...]
> +/*
> + * Fills the filename list with all the files found in the pack directory
> + * ending with .pack, without that extension.
> + */

Ideally a comment opening a function will save lazy readers the
trouble of reading the body of the function, by explaining what the
function is for and giving them some reliable summary of what its
effect will be.

The above comment doesn't do either: it doesn't make it clear why
the function exists, and it doesn't make the semantics precise:
should fname_list be empty before this function is called?  Are the
filenames filling it absolute or relative?  What happens if packdir
is unreadable or doesn't exist?  What happens to files without a .pack
extension?

Also, the above comment is in the wrong part of the file to maximally
help a lazy reader: it should be at the call site, so the reader
doesn't have to look for the function's definition at all.

> +void get_pack_filenames(char *packdir, struct string_list *fname_list)

File-local function should be static.

Missing "const" on packdir.

> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *path, *suffix, *fname;
> +
> +	path = mkpath("%s/pack", get_object_directory());

Whenever I see the result of "mkpath" stored for more than a couple
of lines, I fret a little (since it's easy to scribble over the
rotating list of 4 get_pathname() buffers).  Would doing

	dir = opendir(mkpath(...))

directly work?  By the way, why does this function both compute
packdir and take it as an argument?

> +	suffix = ".pack";

Why not pass the string directly to suffixcmp and strlen?

> +	dir = opendir(path);

What happens if the packdir does not exist or cannot be read, or
another error occurs?

> +	while ((e = readdir(dir)) != NULL) {
> +		if (!suffixcmp(e->d_name, suffix)) {

Can decrease the indent and deal with the boring case early
by reversing the test:

		if (suffixcmp(...))
			continue;

[...]
> +void remove_pack(char *path, char* sha1)

Missing "const" on path.  The * in pointers sticks to the variable
name instead of its type.

> +{
> +	char *exts[] = {".pack", ".idx", ".keep"};
> +	int ext = 0;
> +	for (ext = 0; ext < 3; ext++) {

String constants are allowed in C to be assigned to a char * for
historical reasons, but it's never a good idea :), since they're
not mutable.

Array index is being assigned twice.  ARRAY_SIZE could make this
clearer:

 {
	const char *exts[] = {".pack", ... };
	int i;

	for (i = 0; i < ARRAY_SIZE(exts); i++)
		unlink(mkpath("%s/pack-%s%s", packdir, sha1, exts[i]));
 }

Is the sha1 parameter actually a sha1?

It wasn't obvious to me at first what this function is for.  Maybe
a name like remove_redundant_pack() would work.  E.g.:

	static void remove_redundant_pack(const char *pack_sha1)
	{
		const char *exts[] = {".pack", ".idx", ".keep"};
		struct strbuf buf = STRBUF_INIT;
		size_t plen;
		int i;

		strbuf_addf(&buf, "%s/pack-%s", get_object_directory(), pack_sha1);
		plen = buf.len;

		for (i = 0; i < ARRAY_SIZE(exts); i++) {
			strbuf_setlen(&buf, plen);
			strbuf_addstr(&buf, exts[i]);
			unlink(buf.buf);
		}
	}

[...]
> +int cmd_repack(int argc, const char **argv, const char *prefix) {
> +

Stray blank line.

> +	char *exts[2] = {".idx", ".pack"};

Missing "const".

> +	char *packdir, *packtmp, line[1024];

Fixed-size buffers are no fun.  Better to use a strbuf with
e.g. strbuf_getline (see Documentation/technical/api-strbuf.txt).
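
As an illustration of reading lines without a fixed-size buffer, here is a sketch that assumes POSIX getline() rather than git's strbuf API; count_sha1_lines is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: counts the input lines that consist of exactly
 * 40 lowercase hex characters.  getline() grows the buffer as needed,
 * so no line-length limit is baked in.
 */
static int count_sha1_lines(FILE *in)
{
	char *line = NULL;
	size_t alloc = 0;
	ssize_t len;
	int count = 0;

	while ((len = getline(&line, &alloc, in)) != -1) {
		if (len > 0 && line[len - 1] == '\n')
			line[--len] = '\0';	/* strip the newline */
		if (len == 40 && strspn(line, "0123456789abcdef") == 40)
			count++;
	}
	free(line);
	return count;
}
```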

[...]
> +	struct option builtin_repack_options[] = {
> +		OPT_BOOL('a', NULL, &pack_everything,
> +				N_("pack everything in a single pack")),
> +		OPT_BOOL('A', NULL, &pack_everything_but_loose,
> +				N_("same as -a, and turn unreachable objects loose")),
> +		OPT_BOOL('d', NULL, &delete_redundant,
> +				N_("remove redundant packs, and run git-prune-packed")),
> +		OPT_BOOL('f', NULL, &no_reuse_delta,
> +				N_("pass --no-reuse-delta to git-pack-objects")),
> +		OPT_BOOL('F', NULL, &no_reuse_object,
> +				N_("pass --no-reuse-object to git-pack-objects")),
> +		OPT_BOOL('n', NULL, &no_update_server_info,
> +				N_("do not run git-update-server-info")),
> +		OPT__QUIET(&quiet, N_("be quiet")),
> +		OPT_BOOL('l', "local", &local,
> +				N_("pass --local to git-pack-objects")),
> +		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
> +				N_("with -A, do not loosen objects older than this Packing constraints")),

Do you mean
		... than this")),

		OPT_GROUP(N_("Packing constraints")),

?

[...]
> +	git_config(repack_config, NULL);
> +
> +	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> +				git_repack_usage, 0);
> +
> +	sigchain_push_common(remove_pack_on_signal);
> +
> +	packdir = mkpathdup("%s/pack", get_object_directory());
> +	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());

This filename pattern also appears higher in the file.  Maybe it's
possible for them to share a constant or something?  (If it's too
much fuss, no need to bother.)

> +	argv_array_push(&cmd_args, "pack-objects");
> +	argv_array_push(&cmd_args, "--keep-true-parents");
> +	argv_array_push(&cmd_args, "--honor-pack-keep");
> +	argv_array_push(&cmd_args, "--non-empty");
> +	argv_array_push(&cmd_args, "--all");
> +	argv_array_push(&cmd_args, "--reflog");
> +
> +	if (window)
> +		argv_array_pushf(&cmd_args, "--window=%u", window);
> +
> +	if (window_memory)
> +		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);

Style: I'd leave out these blank lines, so the reader can see more
of the arguments in one screenful.

Are these signed or unsigned integers?  What happens if I pass
a value < 0 or > INT_MAX?
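
One way to make the range explicit is sketched below, assuming the value arrives as a string; parse_uint is a hypothetical helper, not part of git's option parser:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative decimal number that fits
 * in an unsigned int.  Returns 0 on success, -1 on malformed or
 * out-of-range input.
 */
static int parse_uint(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	/* strtoul() silently accepts a leading '-', so insist on a digit */
	if (*s < '0' || *s > '9')
		return -1;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	return 0;
}
```

A value parsed this way can be passed to a "%u" format without surprises.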

[...]
> +	if (pack_everything + pack_everything_but_loose == 0) {

Probably easier to read as

	if (!pack_everything && !pack_everything_but_loose) {

> +		argv_array_push(&cmd_args, "--unpacked");
> +		argv_array_push(&cmd_args, "--incremental");
> +	} else {
> +		struct string_list fname_list = STRING_LIST_INIT_DUP;
> +		get_pack_filenames(packdir, &fname_list);
> +		for_each_string_list_item(item, &fname_list) {

The git-repack script uses "find", which scans the directory
recursively, though I'm not sure why.  (Probably not important?)

Instead of building a temporary and potentially long string_list
and then pruning it to build another list, why not build the list
of packs without a corresponding .keep file in a single pass?

> +			char *fname;
> +			fname = mkpathdup("%s/%s.keep", packdir, item->string);
> +			if (file_exists(fname)) {
> +				/* when the keep file is there, we're ignoring that pack */
> +			} else {
> +				string_list_append(&existing_packs, item->string);
> +			}

Simplifying, and avoiding braces around a single-line "if" body:

			if (!file_exists(mkpath(...)))
				string_list_append(...);

[...]
> +		if (existing_packs.nr && delete_redundant) {
> +			if (unpack_unreachable)
> +				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);

Long line.

[...]
> +	count_packs = 0;
> +	out = xfdopen(cmd.out, "r");
> +	while (fgets(line, sizeof(line), out)) {
> +		/* a line consists of 40 hex chars + '\n' */

Time to sleep.  Stopping here for this round.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
  2013-08-21  8:25                                               ` Jonathan Nieder
@ 2013-08-21  8:49                                               ` Matthieu Moy
  2013-08-21 12:47                                                 ` Stefan Beller
                                                                   ` (2 more replies)
  1 sibling, 3 replies; 72+ messages in thread
From: Matthieu Moy @ 2013-08-21  8:49 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, apelisse, pclouds, iveqy, gitster, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

> All tests are constantly positive now.

Cool!

> +/*
> + * Fills the filename list with all the files found in the pack directory

Detail: "filename list" could be "fname_list" to match the actual
argument below.

> + * ending with .pack, without that extension.
> + */
> +void get_pack_filenames(char *packdir, struct string_list *fname_list)
> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *path, *suffix, *fname;
> +
> +	path = mkpath("%s/pack", get_object_directory());
> +	suffix = ".pack";
> +
> +	dir = opendir(path);

I think you should test and complain if dir is NULL ("cannot open pack
directory: ...")

> +void remove_pack(char *path, char* sha1)
> +{
> +	char *exts[] = {".pack", ".idx", ".keep"};
> +	int ext = 0;
> +	for (ext = 0; ext < 3; ext++) {
> +		char *fname;
> +		fname = mkpath("%s/%s%s", path, sha1, exts[ext]);
> +		unlink(fname);

Here also, the return value from unlink is not checked. Probably not
serious (mistakenly deleting a pack file would be very serious, but
keeping it around by mistake shouldn't harm), but a warning message may
be welcome.
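
A minimal sketch of such a warning, using only POSIX calls; unlink_with_warning is a hypothetical name, and treating a missing file as success is an assumption made for this example:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove path and warn on stderr if that fails.
 * A file that is already gone is not treated as an error.
 * Returns 0 on success, -1 on failure.
 */
static int unlink_with_warning(const char *path)
{
	if (!unlink(path) || errno == ENOENT)
		return 0;
	fprintf(stderr, "warning: unable to unlink %s: %s\n",
		path, strerror(errno));
	return -1;
}
```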

These kinds of warnings are never shown in normal usage, but may be
welcome when something goes really wrong with the repo, as a diagnosis
tool for the user. The shell version had these warnings implicitly since
"rm" displays the message on stderr when it fails.

> +	struct child_process cmd;
> +	struct argv_array cmd_args = ARGV_ARRAY_INIT;

Since the command is going to be "pack-objects", you may call the
variables pack_cmd and pack_cmd_args or so.

> +	if (local)
> +		argv_array_push(&cmd_args,  "--local");
> +	if (quiet)
> +		argv_array_push(&cmd_args,  "--quiet");
> +	if (delta_base_offset)
> +		argv_array_push(&cmd_args,  "--delta-base-offset");
> +
> +	argv_array_push(&cmd_args, packtmp);
> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.argv = cmd_args.argv;
> +	cmd.git_cmd = 1;
> +	cmd.out = -1;
> +	cmd.no_stdin = 1;
> +
> +	if (start_command(&cmd))
> +		return 1;

A warning message would be welcome in addition to returning 1.

> +	if (!count_packs && !quiet)
> +		printf("Nothing new to pack.\n");
> +
> +	int failed = 0;

Don't declare variables after statements; it's not C90.

> +	for_each_string_list_item(item, &names) {
> +		for (ext = 0; ext < 2; ext++) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
> +			if (!file_exists(fname)) {
> +				free(fname);
> +				continue;
> +			}
> +
> +			fname_old = mkpath("%s/old-%s%s", packdir, item->string, exts[ext]);
> +			if (file_exists(fname_old))
> +				unlink(fname_old);

Unchecked return value.

> +			if (rename(fname, fname_old)) {
> +				failed = 1;
> +				break;
> +			}
> +			string_list_append_nodup(&rollback, fname);
> +			free(fname);
> +		}
> +		if (failed)
> +			break;
> +	}

I tend to dislike these "set a variable and break twice" to exit nested
loops. Using an auxiliary function, you could just do

int f()
{
	for_each {
		for () {
			...
			if ()
				return 1;
			...
		}
	}
	return 0;
}

(Matter of taste, though. Some people may disagree)

A good side effect would be to move some code out of cmd_repack, which
is rather long.
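
A concrete version of that shape could look like the sketch below. All names (pair_fn, for_each_name_ext, fail_on_b_idx) are hypothetical, and the callback stands in for whatever per-file operation may fail:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns nonzero on failure. */
typedef int (*pair_fn)(const char *name, const char *ext);

/* Visits every (name, ext) pair; stops at the first failure. */
static int for_each_name_ext(const char **names, size_t nr_names,
			     const char **exts, size_t nr_exts, pair_fn fn)
{
	size_t i, j;

	for (i = 0; i < nr_names; i++)
		for (j = 0; j < nr_exts; j++)
			if (fn(names[i], exts[j]))
				return 1;	/* leaves both loops at once */
	return 0;
}

/* Example callback: counts calls and fails on the pair ("b", ".idx"). */
static int calls;
static int fail_on_b_idx(const char *name, const char *ext)
{
	calls++;
	return !strcmp(name, "b") && !strcmp(ext, ".idx");
}
```

The early return replaces the flag variable and both break statements.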

> +	/* Now the ones with the same name are out of the way... */
> +	for_each_string_list_item(item, &names) {
> +		for (ext = 0; ext < 2; ext++) {
> +			char *fname, *fname_old;
> +			struct stat statbuffer;
> +			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
> +			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
> +			if (!stat(fname_old, &statbuffer)) {
> +				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
> +				chmod(fname_old, statbuffer.st_mode);

Unchecked return value.

> +	/* Remove the "old-" files */
> +	for_each_string_list_item(item, &names) {
> +		char *fname;
> +		fname = mkpath("%s/old-pack-%s.idx", packdir, item->string);
> +		if (remove_path(fname))
> +			die_errno(_("removing '%s' failed"), fname);
> +
> +		fname = mkpath("%s/old-pack-%s.pack", packdir, item->string);
> +		if (remove_path(fname))
> +			die_errno(_("removing '%s' failed"), fname);

Does this have to be a fatal error? If I read correctly, it wasn't fatal
in the shell version.

Any reason why you duplicate the code for .idx and .pack here, while you
iterate over an ext array in other places of the code?

> +	if (delete_redundant) {
> +		sort_string_list(&names);
> +		for_each_string_list_item(item, &existing_packs) {
> +			char *sha1;
> +			size_t len = strlen(item->string);
> +			if (len < 40)
> +				continue;
> +			sha1 = item->string + len - 40;
> +			if (!string_list_has_string(&names, sha1))
> +				remove_pack(packdir, item->string);
> +		}
> +		argv_array_clear(&cmd_args);
> +		argv_array_push(&cmd_args, "prune-packed");
> +		if (quiet)
> +			argv_array_push(&cmd_args, "--quiet");
> +
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = cmd_args.argv;
> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}

It's tempting to call prune_packed_objects() directly here, but it's
implemented in builtin/, so I guess it would first require a
refactoring patch to move it to libgit.a.

> +	if (!no_update_server_info) {
> +		argv_array_clear(&cmd_args);
> +		argv_array_push(&cmd_args, "update-server-info");
> +
> +		memset(&cmd, 0, sizeof(cmd));
> +		cmd.argv = cmd_args.argv;
> +		cmd.git_cmd = 1;
> +		run_command(&cmd);
> +	}
> +	return 0;
> +}

Any reason to fork a new process instead of just calling
update_server_info() directly?

Not that efficiency matters here, but the code would be a bit simpler.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC PATCHv4] repack: rewrite the shell script in C.
  2013-08-20 22:36                                           ` Stefan Beller
  2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
  2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
@ 2013-08-21  9:20                                             ` Johannes Sixt
  2 siblings, 0 replies; 72+ messages in thread
From: Johannes Sixt @ 2013-08-21  9:20 UTC (permalink / raw)
  To: Stefan Beller
  Cc: René Scharfe, git, mfick, apelisse, Matthieu.Moy, pclouds,
	iveqy, gitster, mackyle

Am 21.08.2013 00:36, schrieb Stefan Beller:
> I think I got all the suggestions except the
> use of git_path/mkpathdup.
> I replaced mkpathdup by mkpath where possible,
> but it's still not perfect.
> I'll wait for the documentation patch of Jonathan,
> before changing all these occurrences back and forth
> again.

I trust Jonathan's judgement of how to use git_path, mkpath, and mkpathdup 
more than my own. So, please take my earlier comments in this regard with 
an appropriately large grain of salt.

> Below there is just the diff against RFC PATCHv4,
> however I'll send the whole patch as well.

Thanks, that is VERY helpful!

I'll comment here and have a look at the full patch later.

>...
>   int cmd_repack(int argc, const char **argv, const char *prefix) {
>

You should move the opening brace to the next line, which would then not 
be empty anymore.

>...
> @@ -217,34 +217,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
>   	argv_array_push(&cmd_args, packtmp);
>
>   	memset(&cmd, 0, sizeof(cmd));
> -	cmd.argv = argv_array_detach(&cmd_args, NULL);
> +	cmd.argv = cmd_args.argv;
>   	cmd.git_cmd = 1;
>   	cmd.out = -1;
>   	cmd.no_stdin = 1;
>
> -	if (run_command(&cmd))
> +	if (start_command(&cmd))
>   		return 1;

You should have an int ret here and use it like

	ret = start_command(&cmd);
	if (ret)
		return ret;

to retain any exit codes from the sub-process. I know, the script didn't 
preserve it:

	names=$(git pack-objects ...) || exit 1

but that was not idiomatic as it should have been written as

	names=$(git pack-objects ...) || exit

to forward the failure exit code.
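
The same idea in plain POSIX C, as a sketch only: this deliberately uses fork() and waitpid() instead of git's run-command API, and run_and_forward is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] and return the child's own exit
 * code instead of collapsing every failure to 1.  Returns -1 if the
 * child could not be started, waited for, or did not exit normally.
 */
static int run_and_forward(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```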

>
> -	struct string_list names = STRING_LIST_INIT_DUP;
> -	struct string_list rollback = STRING_LIST_INIT_DUP;
> -
> -	char line[1024];
> -	int counter = 0;
> -	FILE *out = xfdopen(cmd.out, "r");

Nice! I missed these decl-after-stmt in my earlier review.

> +	count_packs = 0;
> +	out = xfdopen(cmd.out, "r");
>   	while (fgets(line, sizeof(line), out)) {
>   		/* a line consists of 40 hex chars + '\n' */
> -		assert(strlen(line) == 41);
> +		if (strlen(line) != 41)
> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");

I agree with Jonathan that you should use strbuf_getline() here.

>   		line[40] = '\0';
>   		string_list_append(&names, line);
> -		counter++;
> +		count_packs++;
>   	}
> -	if (!counter)
> -		printf("Nothing new to pack.\n");
> +	if (finish_command(&cmd))
> +		return 1;

Same as above here:

	ret = finish_command(&cmd);
	if (ret)
		return ret;

I would prefer to see

	argv_array_clear(&cmd_args);

here, i.e., at the end of the current use rather than later at the 
beginning of the next use. (Ditto for the other uses of cmd_args.)

>   	fclose(out);

This should happen before finish_command(). It doesn't matter if there are 
no errors, but if things go awry, closing the channel before 
finish_command() avoids deadlocks.

>
> +	if (!count_packs && !quiet)
> +		printf("Nothing new to pack.\n");
> +
>...
> @@ -301,33 +299,33 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
>   	for_each_string_list_item(item, &names) {
>   		for (ext = 0; ext < 2; ext++) {
>   			char *fname, *fname_old;
> +			struct stat statbuffer;
>   			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
> -			fname_old = mkpathdup("%s-%s%s", packtmp, item->string, exts[ext]);
> -			stat(fname_old, &statbuffer);
> -			statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
> -			chmod(fname_old, statbuffer.st_mode);
> +			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
> +			if (!stat(fname_old, &statbuffer)) {
> +				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;

This is still wrong: it should be one of

			... &= ~S_IWUSR & ~S_IWGRP & ~S_IWOTH;
			... &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
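
A small sketch of why the original expression has no effect; the function names are hypothetical. OR-ing the complements of three distinct bits yields a mask with every bit set, so the AND changes nothing:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Clears the write permission bits for user, group and others. */
static mode_t drop_write_bits(mode_t mode)
{
	return mode & ~(S_IWUSR | S_IWGRP | S_IWOTH);
}

/* The expression from the patch: the mask is all ones, so this is a no-op. */
static mode_t drop_write_bits_buggy(mode_t mode)
{
	return mode & (~S_IWUSR | ~S_IWGRP | ~S_IWOTH);
}
```

For a mode of 0666 the first function gives 0444, while the second returns 0666 unchanged.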

> +				chmod(fname_old, statbuffer.st_mode);
> +			}
>   			if (rename(fname_old, fname))
> -				die("Could not rename packfile: %s -> %s", fname_old, fname);
> +				die_errno(_("renaming '%s' failed"), fname_old);
>   			free(fname);
> -			free(fname_old);
>   		}
>   	}
>...

Everything else looks OK. But as I said, mkpath() may have to be reverted 
to mkpathdup() as per Jonathan's comments.

-- Hannes

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21  8:25                                               ` Jonathan Nieder
@ 2013-08-21 10:37                                                 ` Stefan Beller
  2013-08-21 17:25                                                 ` Stefan Beller
  1 sibling, 0 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 10:37 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t

On 08/21/2013 10:25 AM, Jonathan Nieder wrote:
>> +static int delta_base_offset = 0;
> 
> The "= 0" is automatic for statics without an initializer.  The
> prevailing style in git is to leave it out.
> 
> Behavior change: in the script, wasn't the default "true"?
> 

Yes, I was printing out the arguments of the shell version and
of the C version and tried to match them.
I must have misconfigured the test repository where
I ran these differential tests.
Now that I test again, the --delta-base-offset option
shows up as the default, as documented.

Now fixing the rest of your annotations.

Stefan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
@ 2013-08-21 12:47                                                 ` Stefan Beller
  2013-08-21 13:05                                                   ` Matthieu Moy
  2013-08-21 12:53                                                 ` Stefan Beller
  2013-08-22 10:46                                                 ` Johannes Sixt
  2 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 12:47 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, mfick, apelisse, pclouds, iveqy, gitster, mackyle, j6t

On 08/21/2013 10:49 AM, Matthieu Moy wrote:
>> +	if (start_command(&cmd))
>> > +		return 1;
> A warning message would be welcome in addition to returning 1.
> 

Johannes Sixt proposes to retain the return value of
the subprocess, which I agree with.
However, why do we need a warning message here?

I'd expect pack-objects to print the warning itself, as
stderr is untouched in the command invocation.
And we already passed --quiet (or not), so pack-objects
should know how to behave on its own.
Also, it is a builtin command, so we do not need to check
whether it is found, so I'd rely strongly on the error
and warning reporting from the underlying process, no?

Thanks,
Stefan




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
  2013-08-21 12:47                                                 ` Stefan Beller
@ 2013-08-21 12:53                                                 ` Stefan Beller
  2013-08-21 13:07                                                   ` Matthieu Moy
  2013-08-22 10:46                                                 ` Johannes Sixt
  2 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 12:53 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, mfick, apelisse, pclouds, iveqy, gitster, mackyle, j6t

On 08/21/2013 10:49 AM, Matthieu Moy wrote:
> I tend to dislike these "set a variable and break twice" to exit nested
> loops. Using an auxiliary function, you could just do
> 
> int f()
> {
> 	for_each {
> 		for () {
> 			...
> 			if ()
> 				return 1;
> 			...
> 		}
> 	}
> 	return 0;
> }
> 
> (Matter of taste, though. Some people may disagree)
> 
> A good side effect would be to move some code out of cmd_repack, which
> is rather long.

Thank you very much for the review, it helps me very much to focus
on the small details.

I intend to have the C code in this patch as close to the
shell version as possible. This goes both for functionality as
well as style/organisation within the file.

All the additional changes, such as this one
(or, in the previous mail, retaining the error code of subprocesses),
I'd like to put in small follow-up patches, each changing just one
thing at a time.

But as these follow-up changes rely heavily on the very first patch,
I will first try to get that right, meaning accepted into pu.
Then I can send patches with these proposals, such as splitting the
code into more functions.

Thanks,
Stefan



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21 12:47                                                 ` Stefan Beller
@ 2013-08-21 13:05                                                   ` Matthieu Moy
  0 siblings, 0 replies; 72+ messages in thread
From: Matthieu Moy @ 2013-08-21 13:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, apelisse, pclouds, iveqy, gitster, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

> On 08/21/2013 10:49 AM, Matthieu Moy wrote:
>>> +	if (start_command(&cmd))
>>> > +		return 1;
>> A warning message would be welcome in addition to returning 1.
>> 
>
> Johannes Sixt proposes to retain the return value of
> the sub process, which I'd agree on.

Yes.

> I'd expect the pack-objects to bring up the warning as
> the stderr is untouched in the command invocation.

I was more thinking of weird cases like failure to fork or so. But
according to api-run-command.txt:

  . If a system call failed, errno is set and -1 is returned. A diagnostic
    is printed.

So you actually don't need it. In this case, following Johannes's
suggestion, you'd return -1 from the main function, which is unusual but
AFAICT is OK.
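
For reference, a status of -1 is seen by the parent as 255, because only the low 8 bits of the exit status are reported. A sketch in plain POSIX C (observed_exit_status is a hypothetical name):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: let a child exit with the given code and return
 * the status the parent observes, or -1 on error.
 */
static int observed_exit_status(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid)
		_exit(code);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```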

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21 12:53                                                 ` Stefan Beller
@ 2013-08-21 13:07                                                   ` Matthieu Moy
  2013-08-22 10:46                                                     ` Johannes Sixt
  0 siblings, 1 reply; 72+ messages in thread
From: Matthieu Moy @ 2013-08-21 13:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, mfick, apelisse, pclouds, iveqy, gitster, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

> But as these follow up changes heavily rely on the very first patch
> I will first try to get that right, meaning accepted into pu.
> Then I can send patches with these proposals such as making more
> functions.

I think it's better to get the style right first, to avoid doubling the
review effort (reviewing a hard-to-review patch first, and then
re-reviewing a style-fix one).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21  8:25                                               ` Jonathan Nieder
  2013-08-21 10:37                                                 ` Stefan Beller
@ 2013-08-21 17:25                                                 ` Stefan Beller
  2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 17:25 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t

On 08/21/2013 10:25 AM, Jonathan Nieder wrote:
> Hi,
> 
> Stefan Beller wrote:
> 
>> [PATCH] repack: rewrite the shell script in C.
> 
> Thanks for your work so far.  This review will have mostly cosmetic
> notes.  Hopefully others can try it out to see if the actual behavior
> is good.

Thanks for all the reviews. I hope to have included every suggestion
so far or to have sent out mail discussing why not.

There have been quite a few changes since the last round
because of so many reviews.

Here is a diff against the last sent patch; I'll also send
the whole patch on its own again.
Last time I forgot to label it correctly with [RFC PATCHv5],
so the next patch should be v6.

Stefan

Changes since "[PATCH] repack: rewrite the shell script in C.":

--8<--
From 3cda569cdcd1312679c0035d151515cba7dacc59 Mon Sep 17 00:00:00 2001
From: Stefan Beller <stefanbeller@googlemail.com>
Date: Wed, 21 Aug 2013 12:33:13 +0200
Subject: [PATCH 2/3] Changes to last round.

 * get_pack_filenames: directly check for .keep files
 * packdir is a global variable now
 * fix help string for parsing options.
 * reenable the delta-base-offset being turned on by default
 * rewrite remove_temporary_files(void), remove_redundant_pack(fname)
   to use more strbuf instead of using mkpath(dup)
 * beautifying the code (line length, empty lines)

Still on the todo list for this patch:
 * Inspect the code for unlink, rename and see if we
   need to deal with their return codes.
 * Check for datatypes (--window-memory could use ulong?)

Later:
 * Move parts of cmd_repack to extra functions
 * check if subprocesses are needed (update-server-info,
   prune-packed)
---
 builtin/repack.c | 191 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 103 insertions(+), 88 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 9fbe636..fb050c0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -13,7 +13,9 @@
 #include "string-list.h"
 #include "argv-array.h"

-static int delta_base_offset = 0;
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+char *packdir;

 static const char *const git_repack_usage[] = {
 	N_("git repack [options]"),
@@ -29,25 +31,39 @@ static int repack_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }

-static void remove_temporary_files() {
+/*
+ * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
+ */
+static void remove_temporary_files(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t dirlen, prefixlen;
 	DIR *dir;
 	struct dirent *e;
-	char *prefix, *path;

-	prefix = mkpathdup(".tmp-%d-pack", getpid());
-	path = mkpathdup("%s/pack", get_object_directory());
+	/* .git/objects/pack */
+	strbuf_addstr(&buf, get_object_directory());
+	strbuf_addstr(&buf, "/pack");
+	dir = opendir(buf.buf);
+	if (!dir) {
+		strbuf_release(&buf);
+		return;
+	}

-	dir = opendir(path);
-	while ((e = readdir(dir)) != NULL) {
-		if (!prefixcmp(e->d_name, prefix)) {
-			struct strbuf fname = STRBUF_INIT;
-			strbuf_addf(&fname, "%s/%s", path, e->d_name);
-			unlink(strbuf_detach(&fname, NULL));
-		}
+	/* .git/objects/pack/.tmp-$$-pack-* */
+	dirlen = buf.len + 1;
+	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	prefixlen = buf.len - dirlen;
+
+	while ((e = readdir(dir))) {
+		if (strncmp(e->d_name, buf.buf + dirlen, prefixlen))
+			continue;
+		strbuf_setlen(&buf, dirlen);
+		strbuf_addstr(&buf, e->d_name);
+		unlink(buf.buf);
 	}
-	free(prefix);
-	free(path);
 	closedir(dir);
+	strbuf_release(&buf);
 }

 static void remove_pack_on_signal(int signo)
@@ -57,52 +73,57 @@ static void remove_pack_on_signal(int signo)
 	raise(signo);
 }

-/*
- * Fills the filename list with all the files found in the pack directory
- * ending with .pack, without that extension.
- */
-void get_pack_filenames(char *packdir, struct string_list *fname_list)
+static void get_pack_filenames(struct string_list *fname_list)
 {
 	DIR *dir;
 	struct dirent *e;
-	char *path, *suffix, *fname;
+	char *fname;

-	path = mkpath("%s/pack", get_object_directory());
-	suffix = ".pack";
+	if (!(dir = opendir(packdir)))
+		return;

-	dir = opendir(path);
 	while ((e = readdir(dir)) != NULL) {
-		if (!suffixcmp(e->d_name, suffix)) {
-			size_t len = strlen(e->d_name) - strlen(suffix);
-			fname = xmemdupz(e->d_name, len);
+		if (suffixcmp(e->d_name, ".pack"))
+			continue;
+
+		size_t len = strlen(e->d_name) - strlen(".pack");
+		fname = xmemdupz(e->d_name, len);
+
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
-		}
 	}
 	closedir(dir);
 }

-void remove_pack(char *path, char* sha1)
+static void remove_redundant_pack(const char *path, const char *sha1)
 {
-	char *exts[] = {".pack", ".idx", ".keep"};
-	int ext = 0;
-	for (ext = 0; ext < 3; ext++) {
-		char *fname;
-		fname = mkpath("%s/%s%s", path, sha1, exts[ext]);
-		unlink(fname);
+	const char *exts[] = {".pack", ".idx", ".keep"};
+	int i;
+	struct strbuf buf = STRBUF_INIT;
+	size_t plen;
+
+	strbuf_addf(&buf, "%s/%s", path, sha1);
+	plen = buf.len;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		strbuf_setlen(&buf, plen);
+		strbuf_addstr(&buf, exts[i]);
+		unlink(buf.buf);
 	}
 }

-int cmd_repack(int argc, const char **argv, const char *prefix) {
-
-	char *exts[2] = {".idx", ".pack"};
-	char *packdir, *packtmp, line[1024];
+int cmd_repack(int argc, const char **argv, const char *prefix)
+{
+	const char *exts[2] = {".idx", ".pack"};
+	char *packtmp;
 	struct child_process cmd;
 	struct string_list_item *item;
 	struct argv_array cmd_args = ARGV_ARRAY_INIT;
 	struct string_list names = STRING_LIST_INIT_DUP;
 	struct string_list rollback = STRING_LIST_INIT_DUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
-	int count_packs, ext;
+	struct strbuf line = STRBUF_INIT;
+	int count_packs, ext, ret;
 	FILE *out;

 	/* variables to be filled by option parsing */
@@ -135,7 +156,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 		OPT_BOOL('l', "local", &local,
 				N_("pass --local to git-pack-objects")),
 		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
-				N_("with -A, do not loosen objects older than this Packing constraints")),
+				N_("with -A, do not loosen objects older than this")),
 		OPT_INTEGER(0, "window", &window,
 				N_("size of the window used for delta compression")),
 		OPT_INTEGER(0, "window-memory", &window_memory,
@@ -155,7 +176,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	sigchain_push_common(remove_pack_on_signal);

 	packdir = mkpathdup("%s/pack", get_object_directory());
-	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());

 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
@@ -163,47 +184,33 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
-
 	if (window)
 		argv_array_pushf(&cmd_args, "--window=%u", window);
-
 	if (window_memory)
 		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
-
 	if (depth)
 		argv_array_pushf(&cmd_args, "--depth=%u", depth);
-
 	if (max_pack_size)
 		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
-
 	if (no_reuse_delta)
 		argv_array_pushf(&cmd_args, "--no-reuse-delta");
-
 	if (no_reuse_object)
 		argv_array_pushf(&cmd_args, "--no-reuse-object");

-	if (pack_everything + pack_everything_but_loose == 0) {
+	if (!pack_everything && !pack_everything_but_loose) {
 		argv_array_push(&cmd_args, "--unpacked");
 		argv_array_push(&cmd_args, "--incremental");
 	} else {
-		struct string_list fname_list = STRING_LIST_INIT_DUP;
-		get_pack_filenames(packdir, &fname_list);
-		for_each_string_list_item(item, &fname_list) {
-			char *fname;
-			fname = mkpathdup("%s/%s.keep", packdir, item->string);
-			if (file_exists(fname)) {
-				/* when the keep file is there, we're ignoring that pack */
-			} else {
-				string_list_append(&existing_packs, item->string);
-			}
-			free(fname);
-		}
+		get_pack_filenames(&existing_packs);

 		if (existing_packs.nr && delete_redundant) {
 			if (unpack_unreachable)
-				argv_array_pushf(&cmd_args, "--unpack-unreachable=%s", unpack_unreachable);
+				argv_array_pushf(&cmd_args,
+						"--unpack-unreachable=%s",
+						unpack_unreachable);
 			else if (pack_everything_but_loose)
-				argv_array_push(&cmd_args, "--unpack-unreachable");
+				argv_array_push(&cmd_args,
+						"--unpack-unreachable");
 		}
 	}

@@ -222,22 +229,24 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	cmd.out = -1;
 	cmd.no_stdin = 1;

-	if (start_command(&cmd))
+	ret = start_command(&cmd);
+	if (ret)
 		return 1;

 	count_packs = 0;
 	out = xfdopen(cmd.out, "r");
-	while (fgets(line, sizeof(line), out)) {
-		/* a line consists of 40 hex chars + '\n' */
-		if (strlen(line) != 41)
+	while (strbuf_getline(&line, out, '\n') != EOF) {
+		if (line.len != 40)
 			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
-		line[40] = '\0';
-		string_list_append(&names, line);
+		strbuf_addstr(&line, "");
+		string_list_append(&names, line.buf);
 		count_packs++;
 	}
-	if (finish_command(&cmd))
-		return 1;
 	fclose(out);
+	ret = finish_command(&cmd);
+	if (ret)
+		return 1;
+	argv_array_clear(&cmd_args);

 	if (!count_packs && !quiet)
 		printf("Nothing new to pack.\n");
@@ -246,13 +255,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
-			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
+			fname = mkpathdup("%s/%s%s", packdir,
+						item->string, exts[ext]);
 			if (!file_exists(fname)) {
 				free(fname);
 				continue;
 			}

-			fname_old = mkpath("%s/old-%s%s", packdir, item->string, exts[ext]);
+			fname_old = mkpathdup("%s/old-%s%s", packdir,
+						item->string, exts[ext]);
 			if (file_exists(fname_old))
 				unlink(fname_old);

@@ -262,6 +273,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 			}
 			string_list_append_nodup(&rollback, fname);
 			free(fname);
+			free(fname_old);
 		}
 		if (failed)
 			break;
@@ -286,7 +298,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 				"WARNING: file.  But the operation failed, and the\n"
 				"WARNING: attempt to rename them back to their\n"
 				"WARNING: original names also failed.\n"
-				"WARNING: Please rename them in $PACKDIR manually:\n");
+				"WARNING: Please rename them in %s manually:\n", packdir);
 			for (i = 0; i < rollback.nr; i++)
 				fprintf(stderr, "WARNING:   old-%s -> %s\n",
 					rollback.items[i].string,
@@ -300,28 +312,32 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
 			struct stat statbuffer;
-			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
-			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
+			fname = mkpathdup("%s/pack-%s%s",
+					packdir, item->string, exts[ext]);
+			fname_old = mkpathdup("%s-%s%s",
+					packtmp, item->string, exts[ext]);
 			if (!stat(fname_old, &statbuffer)) {
-				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
+				statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
 				chmod(fname_old, statbuffer.st_mode);
 			}
 			if (rename(fname_old, fname))
 				die_errno(_("renaming '%s' failed"), fname_old);
 			free(fname);
+			free(fname_old);
 		}
 	}

 	/* Remove the "old-" files */
 	for_each_string_list_item(item, &names) {
-		char *fname;
-		fname = mkpath("%s/old-pack-%s.idx", packdir, item->string);
-		if (remove_path(fname))
-			die_errno(_("removing '%s' failed"), fname);
-
-		fname = mkpath("%s/old-pack-%s.pack", packdir, item->string);
-		if (remove_path(fname))
-			die_errno(_("removing '%s' failed"), fname);
+		for (ext = 0; ext < 2; ext++) {
+			char *fname;
+			fname = mkpath("%s/old-pack-%s%s",
+					packdir,
+					item->string,
+					exts[ext]);
+			if (remove_path(fname))
+				warning(_("removing '%s' failed"), fname);
+		}
 	}

 	/* End of pack replacement. */
@@ -335,9 +351,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 				continue;
 			sha1 = item->string + len - 40;
 			if (!string_list_has_string(&names, sha1))
-				remove_pack(packdir, item->string);
+				remove_redundant_pack(packdir, item->string);
 		}
-		argv_array_clear(&cmd_args);
 		argv_array_push(&cmd_args, "prune-packed");
 		if (quiet)
 			argv_array_push(&cmd_args, "--quiet");
@@ -346,16 +361,16 @@ int cmd_repack(int argc, const char **argv, const char *prefix) {
 		cmd.argv = cmd_args.argv;
 		cmd.git_cmd = 1;
 		run_command(&cmd);
+		argv_array_clear(&cmd_args);
 	}

 	if (!no_update_server_info) {
-		argv_array_clear(&cmd_args);
 		argv_array_push(&cmd_args, "update-server-info");
-
 		memset(&cmd, 0, sizeof(cmd));
 		cmd.argv = cmd_args.argv;
 		cmd.git_cmd = 1;
 		run_command(&cmd);
+		argv_array_clear(&cmd_args);
 	}
 	return 0;
 }
-- 
1.8.4.rc3.1.gc1ebd90

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 17:25                                                 ` Stefan Beller
@ 2013-08-21 17:28                                                   ` Stefan Beller
  2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
  2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 17:28 UTC (permalink / raw)
  To: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t
  Cc: Stefan Beller

The motivation of this patch is to get closer to a goal of being
able to have a core subset of git functionality built in to git.
That would mean

 * people on Windows could get a copy of at least the core parts
   of Git without having to install a Unix-style shell

 * people deploying to servers don't have to rewrite the #! line
   or worry about the PATH and quality of installed POSIX
   utilities, if they are only using the built-in part written
   in C

This patch is meant to be mostly a literal translation of the
git-repack script; the intent is that later patches would start using
more library facilities, but this patch is meant to be as close to a
no-op as possible so it doesn't do that kind of thing.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 Makefile                                        |   2 +-
 builtin.h                                       |   1 +
 builtin/repack.c                                | 376 ++++++++++++++++++++++++
 git-repack.sh => contrib/examples/git-repack.sh |   0
 git.c                                           |   1 +
 5 files changed, 379 insertions(+), 1 deletion(-)
 create mode 100644 builtin/repack.c
 rename git-repack.sh => contrib/examples/git-repack.sh (100%)

diff --git a/Makefile b/Makefile
index ef442eb..4ec5bbe 100644
--- a/Makefile
+++ b/Makefile
@@ -464,7 +464,6 @@ SCRIPT_SH += git-pull.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-rebase.sh
 SCRIPT_SH += git-remote-testgit.sh
-SCRIPT_SH += git-repack.sh
 SCRIPT_SH += git-request-pull.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
@@ -972,6 +971,7 @@ BUILTIN_OBJS += builtin/reflog.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/remote-ext.o
 BUILTIN_OBJS += builtin/remote-fd.o
+BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
diff --git a/builtin.h b/builtin.h
index 8afa2de..b56cf07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -102,6 +102,7 @@ extern int cmd_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 extern int cmd_remote_fd(int argc, const char **argv, const char *prefix);
+extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
diff --git a/builtin/repack.c b/builtin/repack.c
new file mode 100644
index 0000000..fb050c0
--- /dev/null
+++ b/builtin/repack.c
@@ -0,0 +1,376 @@
+/*
+ * The shell version was written by Linus Torvalds (2005) and many others.
+ * This is a translation into C by Stefan Beller (2013)
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "string-list.h"
+#include "argv-array.h"
+
+/* enabled by default since 22c79eab (2008-06-25) */
+static int delta_base_offset = 1;
+char *packdir;
+
+static const char *const git_repack_usage[] = {
+	N_("git repack [options]"),
+	NULL
+};
+
+static int repack_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "repack.usedeltabaseoffset")) {
+		delta_base_offset = git_config_bool(var, value);
+		return 0;
+	}
+	return git_default_config(var, value, cb);
+}
+
+/*
+ * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
+ */
+static void remove_temporary_files(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t dirlen, prefixlen;
+	DIR *dir;
+	struct dirent *e;
+
+	/* .git/objects/pack */
+	strbuf_addstr(&buf, get_object_directory());
+	strbuf_addstr(&buf, "/pack");
+	dir = opendir(buf.buf);
+	if (!dir) {
+		strbuf_release(&buf);
+		return;
+	}
+
+	/* .git/objects/pack/.tmp-$$-pack-* */
+	dirlen = buf.len + 1;
+	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	prefixlen = buf.len - dirlen;
+
+	while ((e = readdir(dir))) {
+		if (strncmp(e->d_name, buf.buf + dirlen, prefixlen))
+			continue;
+		strbuf_setlen(&buf, dirlen);
+		strbuf_addstr(&buf, e->d_name);
+		unlink(buf.buf);
+	}
+	closedir(dir);
+	strbuf_release(&buf);
+}
+
+static void remove_pack_on_signal(int signo)
+{
+	remove_temporary_files();
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void get_pack_filenames(struct string_list *fname_list)
+{
+	DIR *dir;
+	struct dirent *e;
+	char *fname;
+
+	if (!(dir = opendir(packdir)))
+		return;
+
+	while ((e = readdir(dir)) != NULL) {
+		if (suffixcmp(e->d_name, ".pack"))
+			continue;
+
+		size_t len = strlen(e->d_name) - strlen(".pack");
+		fname = xmemdupz(e->d_name, len);
+
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
+			string_list_append_nodup(fname_list, fname);
+	}
+	closedir(dir);
+}
+
+static void remove_redundant_pack(const char *path, const char *sha1)
+{
+	const char *exts[] = {".pack", ".idx", ".keep"};
+	int i;
+	struct strbuf buf = STRBUF_INIT;
+	size_t plen;
+
+	strbuf_addf(&buf, "%s/%s", path, sha1);
+	plen = buf.len;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		strbuf_setlen(&buf, plen);
+		strbuf_addstr(&buf, exts[i]);
+		unlink(buf.buf);
+	}
+}
+
+int cmd_repack(int argc, const char **argv, const char *prefix)
+{
+	const char *exts[2] = {".idx", ".pack"};
+	char *packtmp;
+	struct child_process cmd;
+	struct string_list_item *item;
+	struct argv_array cmd_args = ARGV_ARRAY_INIT;
+	struct string_list names = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list existing_packs = STRING_LIST_INIT_DUP;
+	struct strbuf line = STRBUF_INIT;
+	int count_packs, ext, ret;
+	FILE *out;
+
+	/* variables to be filled by option parsing */
+	int pack_everything = 0;
+	int pack_everything_but_loose = 0;
+	int delete_redundant = 0;
+	char *unpack_unreachable = NULL;
+	int window = 0, window_memory = 0;
+	int depth = 0;
+	int max_pack_size = 0;
+	int no_reuse_delta = 0, no_reuse_object = 0;
+	int no_update_server_info = 0;
+	int quiet = 0;
+	int local = 0;
+
+	struct option builtin_repack_options[] = {
+		OPT_BOOL('a', NULL, &pack_everything,
+				N_("pack everything in a single pack")),
+		OPT_BOOL('A', NULL, &pack_everything_but_loose,
+				N_("same as -a, and turn unreachable objects loose")),
+		OPT_BOOL('d', NULL, &delete_redundant,
+				N_("remove redundant packs, and run git-prune-packed")),
+		OPT_BOOL('f', NULL, &no_reuse_delta,
+				N_("pass --no-reuse-delta to git-pack-objects")),
+		OPT_BOOL('F', NULL, &no_reuse_object,
+				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL('n', NULL, &no_update_server_info,
+				N_("do not run git-update-server-info")),
+		OPT__QUIET(&quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &local,
+				N_("pass --local to git-pack-objects")),
+		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
+				N_("with -A, do not loosen objects older than this")),
+		OPT_INTEGER(0, "window", &window,
+				N_("size of the window used for delta compression")),
+		OPT_INTEGER(0, "window-memory", &window_memory,
+				N_("same as the above, but limit memory size instead of entries count")),
+		OPT_INTEGER(0, "depth", &depth,
+				N_("limits the maximum delta depth")),
+		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
+				N_("maximum size of each packfile")),
+		OPT_END()
+	};
+
+	git_config(repack_config, NULL);
+
+	argc = parse_options(argc, argv, prefix, builtin_repack_options,
+				git_repack_usage, 0);
+
+	sigchain_push_common(remove_pack_on_signal);
+
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
+
+	argv_array_push(&cmd_args, "pack-objects");
+	argv_array_push(&cmd_args, "--keep-true-parents");
+	argv_array_push(&cmd_args, "--honor-pack-keep");
+	argv_array_push(&cmd_args, "--non-empty");
+	argv_array_push(&cmd_args, "--all");
+	argv_array_push(&cmd_args, "--reflog");
+	if (window)
+		argv_array_pushf(&cmd_args, "--window=%u", window);
+	if (window_memory)
+		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
+	if (depth)
+		argv_array_pushf(&cmd_args, "--depth=%u", depth);
+	if (max_pack_size)
+		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
+	if (no_reuse_delta)
+		argv_array_pushf(&cmd_args, "--no-reuse-delta");
+	if (no_reuse_object)
+		argv_array_pushf(&cmd_args, "--no-reuse-object");
+
+	if (!pack_everything && !pack_everything_but_loose) {
+		argv_array_push(&cmd_args, "--unpacked");
+		argv_array_push(&cmd_args, "--incremental");
+	} else {
+		get_pack_filenames(&existing_packs);
+
+		if (existing_packs.nr && delete_redundant) {
+			if (unpack_unreachable)
+				argv_array_pushf(&cmd_args,
+						"--unpack-unreachable=%s",
+						unpack_unreachable);
+			else if (pack_everything_but_loose)
+				argv_array_push(&cmd_args,
+						"--unpack-unreachable");
+		}
+	}
+
+	if (local)
+		argv_array_push(&cmd_args,  "--local");
+	if (quiet)
+		argv_array_push(&cmd_args,  "--quiet");
+	if (delta_base_offset)
+		argv_array_push(&cmd_args,  "--delta-base-offset");
+
+	argv_array_push(&cmd_args, packtmp);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.argv = cmd_args.argv;
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+
+	ret = start_command(&cmd);
+	if (ret)
+		return 1;
+
+	count_packs = 0;
+	out = xfdopen(cmd.out, "r");
+	while (strbuf_getline(&line, out, '\n') != EOF) {
+		if (line.len != 40)
+			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
+		strbuf_addstr(&line, "");
+		string_list_append(&names, line.buf);
+		count_packs++;
+	}
+	fclose(out);
+	ret = finish_command(&cmd);
+	if (ret)
+		return 1;
+	argv_array_clear(&cmd_args);
+
+	if (!count_packs && !quiet)
+		printf("Nothing new to pack.\n");
+
+	int failed = 0;
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s%s", packdir,
+						item->string, exts[ext]);
+			if (!file_exists(fname)) {
+				free(fname);
+				continue;
+			}
+
+			fname_old = mkpathdup("%s/old-%s%s", packdir,
+						item->string, exts[ext]);
+			if (file_exists(fname_old))
+				unlink(fname_old);
+
+			if (rename(fname, fname_old)) {
+				failed = 1;
+				break;
+			}
+			string_list_append_nodup(&rollback, fname);
+			free(fname);
+			free(fname_old);
+		}
+		if (failed)
+			break;
+	}
+	if (failed) {
+		struct string_list rollback_failure;
+		for_each_string_list_item(item, &rollback) {
+			char *fname, *fname_old;
+			fname = mkpathdup("%s/%s", packdir, item->string);
+			fname_old = mkpath("%s/old-%s", packdir, item->string);
+			if (rename(fname_old, fname))
+				string_list_append(&rollback_failure, fname);
+			free(fname);
+		}
+
+		if (rollback.nr) {
+			int i;
+			fprintf(stderr,
+				"WARNING: Some packs in use have been renamed by\n"
+				"WARNING: prefixing old- to their name, in order to\n"
+				"WARNING: replace them with the new version of the\n"
+				"WARNING: file.  But the operation failed, and the\n"
+				"WARNING: attempt to rename them back to their\n"
+				"WARNING: original names also failed.\n"
+				"WARNING: Please rename them in %s manually:\n", packdir);
+			for (i = 0; i < rollback.nr; i++)
+				fprintf(stderr, "WARNING:   old-%s -> %s\n",
+					rollback.items[i].string,
+					rollback.items[i].string);
+		}
+		exit(1);
+	}
+
+	/* Now the ones with the same name are out of the way... */
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname, *fname_old;
+			struct stat statbuffer;
+			fname = mkpathdup("%s/pack-%s%s",
+					packdir, item->string, exts[ext]);
+			fname_old = mkpathdup("%s-%s%s",
+					packtmp, item->string, exts[ext]);
+			if (!stat(fname_old, &statbuffer)) {
+				statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
+				chmod(fname_old, statbuffer.st_mode);
+			}
+			if (rename(fname_old, fname))
+				die_errno(_("renaming '%s' failed"), fname_old);
+			free(fname);
+			free(fname_old);
+		}
+	}
+
+	/* Remove the "old-" files */
+	for_each_string_list_item(item, &names) {
+		for (ext = 0; ext < 2; ext++) {
+			char *fname;
+			fname = mkpath("%s/old-pack-%s%s",
+					packdir,
+					item->string,
+					exts[ext]);
+			if (remove_path(fname))
+				warning(_("removing '%s' failed"), fname);
+		}
+	}
+
+	/* End of pack replacement. */
+
+	if (delete_redundant) {
+		sort_string_list(&names);
+		for_each_string_list_item(item, &existing_packs) {
+			char *sha1;
+			size_t len = strlen(item->string);
+			if (len < 40)
+				continue;
+			sha1 = item->string + len - 40;
+			if (!string_list_has_string(&names, sha1))
+				remove_redundant_pack(packdir, item->string);
+		}
+		argv_array_push(&cmd_args, "prune-packed");
+		if (quiet)
+			argv_array_push(&cmd_args, "--quiet");
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args.argv;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+		argv_array_clear(&cmd_args);
+	}
+
+	if (!no_update_server_info) {
+		argv_array_push(&cmd_args, "update-server-info");
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.argv = cmd_args.argv;
+		cmd.git_cmd = 1;
+		run_command(&cmd);
+		argv_array_clear(&cmd_args);
+	}
+	return 0;
+}
diff --git a/git-repack.sh b/contrib/examples/git-repack.sh
similarity index 100%
rename from git-repack.sh
rename to contrib/examples/git-repack.sh
diff --git a/git.c b/git.c
index 2025f77..03510be 100644
--- a/git.c
+++ b/git.c
@@ -396,6 +396,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "remote", cmd_remote, RUN_SETUP },
 		{ "remote-ext", cmd_remote_ext },
 		{ "remote-fd", cmd_remote_fd },
+		{ "repack", cmd_repack, RUN_SETUP },
 		{ "replace", cmd_replace, RUN_SETUP },
 		{ "repo-config", cmd_repo_config, RUN_SETUP_GENTLY },
 		{ "rerere", cmd_rerere, RUN_SETUP },
-- 
1.8.4.rc3.1.gc1ebd90

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [RFC PATCHv6 2/2] repack: retain the return value of pack-objects
  2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
@ 2013-08-21 17:28                                                     ` Stefan Beller
  2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
  1 sibling, 0 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 17:28 UTC (permalink / raw)
  To: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t
  Cc: Stefan Beller

During the review of the previous commit (repack: rewrite the
shell script in C), Johannes Sixt proposed retaining the exit code of
the sub-process, which probably makes the cause more obvious in case
of failure.

As the previous commit should behave as closely as possible to the
original shell script, the proposed change is put in this extra commit.
The infrastructure (a local 'ret' variable), however, was already set
up in the previous commit.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 builtin/repack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index fb050c0..1f13e0d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -231,7 +231,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	ret = start_command(&cmd);
 	if (ret)
-		return 1;
+		return ret;
 
 	count_packs = 0;
 	out = xfdopen(cmd.out, "r");
@@ -245,7 +245,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	fclose(out);
 	ret = finish_command(&cmd);
 	if (ret)
-		return 1;
+		return ret;
 	argv_array_clear(&cmd_args);
 
 	if (!count_packs && !quiet)
-- 
1.8.4.rc3.1.gc1ebd90

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
  2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
@ 2013-08-21 20:56                                                     ` Junio C Hamano
  2013-08-21 21:52                                                       ` Matthieu Moy
                                                                         ` (2 more replies)
  1 sibling, 3 replies; 72+ messages in thread
From: Junio C Hamano @ 2013-08-21 20:56 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

> The motivation of this patch is to get closer to a goal of being
> able to have a core subset of git functionality built in to git.
> That would mean
>
>  * people on Windows could get a copy of at least the core parts
>    of Git without having to install a Unix-style shell
>
>  * people deploying to servers don't have to rewrite the #! line
>    or worry about the PATH and quality of installed POSIX
>    utilities, if they are only using the built-in part written
>    in C

I am not sure what is meant by the latter.  Rewriting #! is part of
any scripted Porcelain done by the top-level Makefile, and I do not
think we have seen any problem reports on it.

As to "quality of ... utilities", I think the real issue some people
in the thread had was not about "deploying to servers" but about
installing in a minimalistic chrooted environment where standard
tools may be lacking.

> diff --git a/builtin/repack.c b/builtin/repack.c
> new file mode 100644
> index 0000000..fb050c0
> --- /dev/null
> +++ b/builtin/repack.c
> @@ -0,0 +1,376 @@
> +/*
> + * The shell version was written by Linus Torvalds (2005) and many others.
> + * This is a translation into C by Stefan Beller (2013)
> + */

I am not sure if we want to record "ownership" in the code like
this; it will go stale over time.

> +#include "builtin.h"
> +#include "cache.h"
> +#include "dir.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +#include "sigchain.h"
> +#include "strbuf.h"
> +#include "string-list.h"
> +#include "argv-array.h"
> +
> +/* enabled by default since 22c79eab (2008-06-25) */

It may be of some value that by default --delta-base-offset is used,
but that can be read from the initialization.  Do we need this
comment?

> +static int delta_base_offset = 1;
> +char *packdir;

Does this have to be global?

> +static const char *const git_repack_usage[] = {
> +	N_("git repack [options]"),
> +	NULL
> +};
> +
> +static int repack_config(const char *var, const char *value, void *cb)
> +{
> +	if (!strcmp(var, "repack.usedeltabaseoffset")) {
> +		delta_base_offset = git_config_bool(var, value);
> +		return 0;
> +	}
> +	return git_default_config(var, value, cb);
> +}
> +
> +/*
> + * Remove temporary $GIT_OBJECT_DIRECTORY/pack/.tmp-$$-pack-* files.
> + */
> +static void remove_temporary_files(void)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t dirlen, prefixlen;
> +	DIR *dir;
> +	struct dirent *e;
> +
> +	/* .git/objects/pack */

We can read what is in there from two strbuf calls without comment.

> +	strbuf_addstr(&buf, get_object_directory());
> +	strbuf_addstr(&buf, "/pack");

More importantly, you already know what this directory and the
packtmp prefix are.

Also, you can keep &buf empty until opendir() succeeds.

> +	dir = opendir(buf.buf);
> +	if (!dir) {
> +		strbuf_release(&buf);
> +		return;
> +	}
> +
> +	/* .git/objects/pack/.tmp-$$-pack-* */
> +	dirlen = buf.len + 1;

Likewise; it is a good idea to document what "dirlen" points at,
though.

> +	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
> +	prefixlen = buf.len - dirlen;

So in summary:

	dir = opendir(packdir);
	if (!dir)
		return;

	strbuf_addf(&buf, "%s-", packtmp);

	/* Point at the slash at the end of ".../objects/pack/" */
	dirlen = strlen(packdir) + 1;
	/* Point at the dash at the end of ".../.tmp-%d-pack-" */
	prefixlen = buf.len - dirlen;

You would need to move the initialization of packdir and packtmp
before sigchain_push() in cmd_repack() if you were to do this.
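
To make the offset arithmetic above concrete, here is a minimal
standalone sketch. It uses plain libc instead of git's strbuf API, and
the helper name is_own_tmpfile() is hypothetical (it is not part of the
patch). It assumes packtmp is packdir followed by "/.tmp-PID-pack", as
in cmd_repack().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: report whether the
 * directory entry d_name is one of our temporary files, given
 * packdir (".../objects/pack") and packtmp
 * (".../objects/pack/.tmp-PID-pack").
 */
static int is_own_tmpfile(const char *packdir, const char *packtmp,
			  const char *d_name)
{
	char buf[4096];
	size_t dirlen, prefixlen;
	int n;

	/* Assumption: packtmp lives directly inside packdir. */
	assert(!strncmp(packtmp, packdir, strlen(packdir)));
	assert(packtmp[strlen(packdir)] == '/');

	n = snprintf(buf, sizeof(buf), "%s-", packtmp);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;

	/* Offset just past the slash at the end of ".../objects/pack/" */
	dirlen = strlen(packdir) + 1;
	/* Length of the ".tmp-PID-pack-" part that follows that slash */
	prefixlen = (size_t)n - dirlen;

	return !strncmp(d_name, buf + dirlen, prefixlen);
}
```

Only the part of the buffer after dirlen is compared against the entry
name, so the directory part is never re-checked for each entry.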

> +	while ((e = readdir(dir))) {
> +		if (strncmp(e->d_name, buf.buf + dirlen, prefixlen))
> +			continue;
> +		strbuf_setlen(&buf, dirlen);
> +		strbuf_addstr(&buf, e->d_name);
> +		unlink(buf.buf);

This unlink(2) could fail, but there is not much we could do here.

> +	}
> +	closedir(dir);
> +	strbuf_release(&buf);
> +}
> +
> +static void remove_pack_on_signal(int signo)
> +{
> +	remove_temporary_files();
> +	sigchain_pop(signo);
> +	raise(signo);
> +}
> +
> +static void get_pack_filenames(struct string_list *fname_list)
> +{
> +	DIR *dir;
> +	struct dirent *e;
> +	char *fname;
> +
> +	if (!(dir = opendir(packdir)))
> +		return;
> +
> +	while ((e = readdir(dir)) != NULL) {
> +		if (suffixcmp(e->d_name, ".pack"))
> +			continue;

We may want to tighten this to ignore cruft that does not match

	/^pack-[0-9a-f]{40}\.pack$/

in a later patch, but this is a faithful rewrite from the original.
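
A tightened check along those lines could be sketched without a regex
library as follows.  The function name is_pack_filename() is
hypothetical, and the sketch assumes 40-hex SHA-1 names in lowercase,
exactly as in the pattern above.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of git: return 1 only if name looks
 * like "pack-" + 40 lowercase hex digits + ".pack".
 */
static int is_pack_filename(const char *name)
{
	size_t i;

	if (strlen(name) != 5 + 40 + 5)
		return 0;
	if (strncmp(name, "pack-", 5))
		return 0;
	for (i = 5; i < 45; i++) {
		char c = name[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return !strcmp(name + 45, ".pack");
}
```

Checking the total length first means the later fixed offsets never
read past the end of the string.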

> +		size_t len = strlen(e->d_name) - strlen(".pack");

decl-after-stmt.

> +		fname = xmemdupz(e->d_name, len);
> +
> +		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
> +			string_list_append_nodup(fname_list, fname);

mental note: this is getting names of non-kept packs, not all packs.

> +	}
> +	closedir(dir);
> +}
> +
> +static void remove_redundant_pack(const char *path, const char *sha1)

These parameter names may want to be changed to clarify what they
are; see below.

> +{
> +	const char *exts[] = {".pack", ".idx", ".keep"};
> +	int i;
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t plen;
> +
> +	strbuf_addf(&buf, "%s/%s", path, sha1);

This suggests that path[] has ".../objects/pack/pack-" and sha1[] is
a 40-hex representation of the pack name.  Calling the former
path_prefix[] and the latter hex[] may be clearer.

> +	plen = buf.len;
> +
> +	for (i = 0; i < ARRAY_SIZE(exts); i++) {
> +		strbuf_setlen(&buf, plen);
> +		strbuf_addstr(&buf, exts[i]);
> +		unlink(buf.buf);
> +	}
> +}
> +
> +int cmd_repack(int argc, const char **argv, const char *prefix)
> +{
> +	const char *exts[2] = {".idx", ".pack"};
> +	char *packtmp;
> +	struct child_process cmd;
> +	struct string_list_item *item;
> +	struct argv_array cmd_args = ARGV_ARRAY_INIT;
> +	struct string_list names = STRING_LIST_INIT_DUP;
> +	struct string_list rollback = STRING_LIST_INIT_DUP;
> +	struct string_list existing_packs = STRING_LIST_INIT_DUP;
> +	struct strbuf line = STRBUF_INIT;
> +	int count_packs, ext, ret;
> +	FILE *out;
> +
> +	/* variables to be filled by option parsing */
> +	int pack_everything = 0;
> +	int pack_everything_but_loose = 0;
> +	int delete_redundant = 0;
> +	char *unpack_unreachable = NULL;
> +	int window = 0, window_memory = 0;
> +	int depth = 0;
> +	int max_pack_size = 0;
> +	int no_reuse_delta = 0, no_reuse_object = 0;
> +	int no_update_server_info = 0;
> +	int quiet = 0;
> +	int local = 0;
> +
> +	struct option builtin_repack_options[] = {
> +		OPT_BOOL('a', NULL, &pack_everything,
> +				N_("pack everything in a single pack")),
> +		OPT_BOOL('A', NULL, &pack_everything_but_loose,
> +				N_("same as -a, and turn unreachable objects loose")),
> +		OPT_BOOL('d', NULL, &delete_redundant,
> +				N_("remove redundant packs, and run git-prune-packed")),
> +		OPT_BOOL('f', NULL, &no_reuse_delta,
> +				N_("pass --no-reuse-delta to git-pack-objects")),
> +		OPT_BOOL('F', NULL, &no_reuse_object,
> +				N_("pass --no-reuse-object to git-pack-objects")),
> +		OPT_BOOL('n', NULL, &no_update_server_info,
> +				N_("do not run git-update-server-info")),
> +		OPT__QUIET(&quiet, N_("be quiet")),
> +		OPT_BOOL('l', "local", &local,
> +				N_("pass --local to git-pack-objects")),
> +		OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
> +				N_("with -A, do not loosen objects older than this")),
> +		OPT_INTEGER(0, "window", &window,
> +				N_("size of the window used for delta compression")),
> +		OPT_INTEGER(0, "window-memory", &window_memory,
> +				N_("same as the above, but limit memory size instead of entries count")),
> +		OPT_INTEGER(0, "depth", &depth,
> +				N_("limits the maximum delta depth")),
> +		OPT_INTEGER(0, "max-pack-size", &max_pack_size,
> +				N_("maximum size of each packfile")),
> +		OPT_END()
> +	};
> +
> +	git_config(repack_config, NULL);
> +
> +	argc = parse_options(argc, argv, prefix, builtin_repack_options,
> +				git_repack_usage, 0);

Nice. In a later patch we might want to allow --delta-base-offset to
be overridden from the command line and doing config first and then
options second like the above would allow us to do so easily.

> +	sigchain_push_common(remove_pack_on_signal);
> +	packdir = mkpathdup("%s/pack", get_object_directory());
> +	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> +
> +	argv_array_push(&cmd_args, "pack-objects");
> +	argv_array_push(&cmd_args, "--keep-true-parents");
> +	argv_array_push(&cmd_args, "--honor-pack-keep");
> +	argv_array_push(&cmd_args, "--non-empty");
> +	argv_array_push(&cmd_args, "--all");
> +	argv_array_push(&cmd_args, "--reflog");
> +	if (window)
> +		argv_array_pushf(&cmd_args, "--window=%u", window);
> +	if (window_memory)
> +		argv_array_pushf(&cmd_args, "--window-memory=%u", window_memory);
> +	if (depth)
> +		argv_array_pushf(&cmd_args, "--depth=%u", depth);
> +	if (max_pack_size)
> +		argv_array_pushf(&cmd_args, "--max_pack_size=%u", max_pack_size);
> +	if (no_reuse_delta)
> +		argv_array_pushf(&cmd_args, "--no-reuse-delta");
> +	if (no_reuse_object)
> +		argv_array_pushf(&cmd_args, "--no-reuse-object");
> +
> +	if (!pack_everything && !pack_everything_but_loose) {
> +		argv_array_push(&cmd_args, "--unpacked");
> +		argv_array_push(&cmd_args, "--incremental");
> +	} else {
> +		get_pack_filenames(&existing_packs);
> +
> +		if (existing_packs.nr && delete_redundant) {
> +			if (unpack_unreachable)
> +				argv_array_pushf(&cmd_args,
> +						"--unpack-unreachable=%s",
> +						unpack_unreachable);
> +			else if (pack_everything_but_loose)
> +				argv_array_push(&cmd_args,
> +						"--unpack-unreachable");
> +		}
> +	}
> +
> +	if (local)
> +		argv_array_push(&cmd_args,  "--local");
> +	if (quiet)
> +		argv_array_push(&cmd_args,  "--quiet");

The original seems to push "-q", but it is probably OK to make it
more readable by spelling it out like this.

> +	if (delta_base_offset)
> +		argv_array_push(&cmd_args,  "--delta-base-offset");
> +
> +	argv_array_push(&cmd_args, packtmp);
> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.argv = cmd_args.argv;
> +	cmd.git_cmd = 1;
> +	cmd.out = -1;
> +	cmd.no_stdin = 1;
> +
> +	ret = start_command(&cmd);
> +	if (ret)
> +		return 1;
> +
> +	count_packs = 0;
> +	out = xfdopen(cmd.out, "r");
> +	while (strbuf_getline(&line, out, '\n') != EOF) {
> +		if (line.len != 40)
> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
> +		strbuf_addstr(&line, "");

What is this addstr() about?

> +		string_list_append(&names, line.buf);
> +		count_packs++;

It probably is more in line with our naming convention to call this
nr_packs, num_packs, etc.  "count_packs" sounds more like a boolean
that instructs the code to either count or not bother counting,
which this thing is not.

> +	}
> +	fclose(out);
> +	ret = finish_command(&cmd);
> +	if (ret)
> +		return 1;
> +	argv_array_clear(&cmd_args);
> +
> +	if (!count_packs && !quiet)
> +		printf("Nothing new to pack.\n");
> +
> +	int failed = 0;

decl-after-stmt.

> +	for_each_string_list_item(item, &names) {
> +		for (ext = 0; ext < 2; ext++) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/%s%s", packdir,
> +						item->string, exts[ext]);
> +			if (!file_exists(fname)) {
> +				free(fname);
> +				continue;
> +			}
> +
> +			fname_old = mkpathdup("%s/old-%s%s", packdir,
> +						item->string, exts[ext]);
> +			if (file_exists(fname_old))
> +				unlink(fname_old);
> +
> +			if (rename(fname, fname_old)) {
> +				failed = 1;
> +				break;

"break"-ing from here leaks fname_old.  As the only out-of-line call
file_exists() is just a thin wrapper around lstat(), I think it is
fine not to pathdup the fname_old here.

> +			}
> +			string_list_append_nodup(&rollback, fname);
> +			free(fname);

This looks bad, doesn't it?  append_nodup() lets &rollback string
list to take the ownership of the piece of memory pointed at by
fname, but then you free it here, no?

If you initialize &rollback with INIT_NODUP, you would not have to
call append_nodup().
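
The ownership rule behind this comment can be shown with a toy list.
This is only a sketch: struct toy_list and the toy_* functions are
made-up stand-ins, not git's string-list implementation.  The point is
that a pointer handed over with a "nodup" append belongs to the list
afterwards, so the caller must not free it.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a list of strings; all names here are hypothetical. */
struct toy_list {
	char **items;
	size_t nr, alloc;
};

static char *toy_copy(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);
	if (p)
		memcpy(p, s, len);
	return p;
}

/* Store the pointer as-is: the list owns s from now on. */
static int toy_append_nodup(struct toy_list *l, char *s)
{
	if (l->nr == l->alloc) {
		size_t n = l->alloc ? 2 * l->alloc : 4;
		char **p = realloc(l->items, n * sizeof(*p));
		if (!p)
			return -1;
		l->items = p;
		l->alloc = n;
	}
	l->items[l->nr++] = s;
	return 0;
}

/* Store a private copy: the caller keeps ownership of s. */
static int toy_append(struct toy_list *l, const char *s)
{
	char *copy = toy_copy(s);
	if (!copy)
		return -1;
	if (toy_append_nodup(l, copy)) {
		free(copy);
		return -1;
	}
	return 0;
}

/* Free every stored string exactly once, then the array itself. */
static void toy_clear(struct toy_list *l)
{
	size_t i;
	for (i = 0; i < l->nr; i++)
		free(l->items[i]);
	free(l->items);
	l->items = NULL;
	l->nr = l->alloc = 0;
}
```

With this model, calling free(fname) right after toy_append_nodup(&l,
fname) would leave a dangling pointer in the list, which is the problem
pointed out above.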

> +			free(fname_old);
> +		}
> +		if (failed)
> +			break;
> +	}
> +	if (failed) {
> +		struct string_list rollback_failure;

Initialization?

> +		for_each_string_list_item(item, &rollback) {
> +			char *fname, *fname_old;
> +			fname = mkpathdup("%s/%s", packdir, item->string);
> +			fname_old = mkpath("%s/old-%s", packdir, item->string);
> +			if (rename(fname_old, fname))
> +				string_list_append(&rollback_failure, fname);
> +			free(fname);
> +		}

* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
@ 2013-08-21 21:52                                                       ` Matthieu Moy
  2013-08-21 22:15                                                       ` Stefan Beller
  2013-08-22 21:03                                                       ` Jonathan Nieder
  2 siblings, 0 replies; 72+ messages in thread
From: Matthieu Moy @ 2013-08-21 21:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, mfick, apelisse, pclouds, iveqy, mackyle, j6t

Junio C Hamano <gitster@pobox.com> writes:

> Stefan Beller <stefanbeller@googlemail.com> writes:
>
>> The motivation of this patch is to get closer to a goal of being
>> able to have a core subset of git functionality built in to git.
>> That would mean
>>
>>  * people on Windows could get a copy of at least the core parts
>>    of Git without having to install a Unix-style shell
>>
>>  * people deploying to servers don't have to rewrite the #! line
>>    or worry about the PATH and quality of installed POSIX
>>    utilities, if they are only using the built-in part written
>>    in C
>
> I am not sure what is meant by the latter.  Rewriting #! is part of
> any scripted Porcelain done by the top-level Makefile, and I do not
> think we have seen any problem reports on it.

I think the case of a server with exotic OS and totally broken /bin/sh
would also benefit from this (the user won't have to find a non-broken
sh and point SHELL_PATH to it). I have no concrete example though.

>> +		size_t len = strlen(e->d_name) - strlen(".pack");
>
> decl-after-stmt.

Stefan: you can make sure this does not happen again by adding

CFLAGS += -Wdeclaration-after-statement

in config.mak.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
  2013-08-21 21:52                                                       ` Matthieu Moy
@ 2013-08-21 22:15                                                       ` Stefan Beller
  2013-08-21 22:50                                                         ` Junio C Hamano
  2013-08-22 10:46                                                         ` Johannes Sixt
  2013-08-22 21:03                                                       ` Jonathan Nieder
  2 siblings, 2 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 22:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, mackyle, j6t

On 08/21/2013 10:56 PM, Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:
> 
>> The motivation of this patch is to get closer to a goal of being
>> able to have a core subset of git functionality built in to git.
>> That would mean
>>
>>  * people on Windows could get a copy of at least the core parts
>>    of Git without having to install a Unix-style shell
>>
>>  * people deploying to servers don't have to rewrite the #! line
>>    or worry about the PATH and quality of installed POSIX
>>    utilities, if they are only using the built-in part written
>>    in C
> 
> I am not sure what is meant by the latter.  Rewriting #! is part of
> any scripted Porcelain done by the top-level Makefile, and I do not
> think we have seen any problem reports on it.
> 
> As to "quality of ... utilities", I think the real issue some people
> in the thread had was not about "deploying to servers" but about
> installing in a minimalistic chrooted environment where standard
> tools may be lacking.
> 
>> diff --git a/builtin/repack.c b/builtin/repack.c
>> new file mode 100644
>> index 0000000..fb050c0
>> --- /dev/null
>> +++ b/builtin/repack.c
>> @@ -0,0 +1,376 @@
>> +/*
>> + * The shell version was written by Linus Torvalds (2005) and many others.
>> + * This is a translation into C by Stefan Beller (2013)
>> + */
> 
> I am not sure if we want to record "ownership" in the code like
> this; it will go stale over time.

I'll remove it. Initially I put it there as I found similar 
comments in other files as well.


>> +static int delta_base_offset = 1;
>> +char *packdir;
> 
> Does this have to be global?

We could pass it to all the functions, making it not global.
I'd be ok with that for the functions get_pack_filenames 
and remove_redundant_pack, but we also need to know
packdir in remove_temporary_files which is called from
the signal handler remove_pack_on_signal.

As the path is pretty obvious (get_object_directory() + "/pack"),
we could however also construct it again in the signal handler.


> So in summary:
> 
> 	dir = opendir(packdir);
>         if (!dir)
> 		return;
> 
> 	strbuf_addf(&buf, "%s-", packtmp);

packtmp is not yet a global variable, but it could be passed to
this function. Currently we're reconstructing it here.

> 
>         /* Point at the slash at the end of ".../objects/pack/" */
> 	dirlen = strlen(packdir) + 1;
>         /* Point at the dash at the end of ".../.tmp-%d-pack-" */
>         prefixlen = buf.len - dirlen;
> 
> You would need to move the initialization of packdir and packtmp
> before sigchain_push() in cmd_repack() if you were to do this.

Ah ok, I'll do so.

>> +
>> +		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
>> +			string_list_append_nodup(fname_list, fname);
> 
> mental note: this is getting names of non-kept packs, not all packs.

I should document that. ;)


>> +	while (strbuf_getline(&line, out, '\n') != EOF) {
>> +		if (line.len != 40)
>> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
>> +		strbuf_addstr(&line, "");
> 
> What is this addstr() about?

According to the documentation of strbufs, we cannot assume to have sane 
strings, but anything. Adding an empty string however will make sure to
add a NUL-terminated string to the buffer, no?

In a previous roll of this patch, which operated on char* line,
there was just line[40] = '\0'; // replacing '\n' by '\0'
to have it sane in the string list.


> 
>> +		string_list_append(&names, line.buf);
>> +		count_packs++;
> 
> It probably is more in line with our naming convention to call this
> nr_packs, num_packs, etc.  "count_packs" sounds more like a boolean
> that instructs the code to either count or not bother counting,
> which this thing is not.

This is something subtle, but important to know. Thanks, will be fixed in
the reroll.


>> +
>> +			if (rename(fname, fname_old)) {
>> +				failed = 1;
>> +				break;
> 
> "break"-ing from here leaks fname_old.  As the only out-of-line call
> file_exists() is just a thin wrapper around lstat(), I think it is
> fine not to pathdup the fname_old here.

fixed

I'd really appreciate it if there were documentation on these functions.
(When is mkpath safe? Which is better in which situation: mkpath or strbufs?)
Maybe I could start writing it (but only for the functions I have used so
far; there are many more in cache.h).

> 
>> +			}
>> +			string_list_append_nodup(&rollback, fname);
>> +			free(fname);
> 
> This looks bad, doesn't it?  append_nodup() lets &rollback string
> list to take the ownership of the piece of memory pointed at by
> fname, but then you free it here, no?
> 
> If you initialize &rollback with INIT_NODUP, you would not have to
> call append_nodup().

Removed the free.
Having rollback initialized with NODUP and then not explicitly
using append_nodup() makes me feel unhappy, because now you need
to check different places to make sure there is no leaking memory
(you need to know the list is NODUP). I changed it nevertheless;
maybe I will feel enlightened later on. ;)

As Matthieu proposed, I also set
CFLAGS += -Wdeclaration-after-statement in config.mak now. Hopefully
I don't screw up again now.

Thanks,
Stefan

--8<--
From 79945f5ae45f08fa2dbabfa1f6b7cd0b344ec0b3 Mon Sep 17 00:00:00 2001
From: Stefan Beller <stefanbeller@googlemail.com>
Date: Thu, 22 Aug 2013 00:13:35 +0200
Subject: [PATCH] Suggestions by Junio

---
 builtin/repack.c | 68 ++++++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 1f13e0d..bb90f07 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1,8 +1,3 @@
-/*
- * The shell version was written by Linus Torvalds (2005) and many others.
- * This is a translation into C by Stefan Beller (2013)
- */
-
 #include "builtin.h"
 #include "cache.h"
 #include "dir.h"
@@ -13,9 +8,8 @@
 #include "string-list.h"
 #include "argv-array.h"
 
-/* enabled by default since 22c79eab (2008-06-25) */
 static int delta_base_offset = 1;
-char *packdir;
+char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [options]"),
@@ -41,18 +35,16 @@ static void remove_temporary_files(void)
 	DIR *dir;
 	struct dirent *e;
 
-	/* .git/objects/pack */
-	strbuf_addstr(&buf, get_object_directory());
-	strbuf_addstr(&buf, "/pack");
-	dir = opendir(buf.buf);
-	if (!dir) {
-		strbuf_release(&buf);
+	dir = opendir(packdir);
+	if (!dir)
 		return;
-	}
 
-	/* .git/objects/pack/.tmp-$$-pack-* */
+	strbuf_addstr(&buf, packdir);
+
+	/* dirlen holds the length of the path before the file name */
 	dirlen = buf.len + 1;
-	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	strbuf_addf(&buf, "%s", packtmp);
+	/* prefixlen holds the length of the prefix */
 	prefixlen = buf.len - dirlen;
 
 	while ((e = readdir(dir))) {
@@ -73,11 +65,16 @@ static void remove_pack_on_signal(int signo)
 	raise(signo);
 }
 
+/*
+ * Adds all packs hex strings to the fname list, which do not
+ * have a corresponding .keep file.
+ */
 static void get_pack_filenames(struct string_list *fname_list)
 {
 	DIR *dir;
 	struct dirent *e;
 	char *fname;
+	size_t len;
 
 	if (!(dir = opendir(packdir)))
 		return;
@@ -86,7 +83,7 @@ static void get_pack_filenames(struct string_list *fname_list)
 		if (suffixcmp(e->d_name, ".pack"))
 			continue;
 
-		size_t len = strlen(e->d_name) - strlen(".pack");
+		len = strlen(e->d_name) - strlen(".pack");
 		fname = xmemdupz(e->d_name, len);
 
 		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
@@ -95,14 +92,14 @@ static void get_pack_filenames(struct string_list *fname_list)
 	closedir(dir);
 }
 
-static void remove_redundant_pack(const char *path, const char *sha1)
+static void remove_redundant_pack(const char *path_prefix, const char *hex)
 {
 	const char *exts[] = {".pack", ".idx", ".keep"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
 
-	strbuf_addf(&buf, "%s/%s", path, sha1);
+	strbuf_addf(&buf, "%s/%s", path_prefix, hex);
 	plen = buf.len;
 
 	for (i = 0; i < ARRAY_SIZE(exts); i++) {
@@ -115,15 +112,14 @@ static void remove_redundant_pack(const char *path, const char *sha1)
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
 	const char *exts[2] = {".idx", ".pack"};
-	char *packtmp;
 	struct child_process cmd;
 	struct string_list_item *item;
 	struct argv_array cmd_args = ARGV_ARRAY_INIT;
 	struct string_list names = STRING_LIST_INIT_DUP;
-	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
 	struct strbuf line = STRBUF_INIT;
-	int count_packs, ext, ret;
+	int nr_packs, ext, ret, failed;
 	FILE *out;
 
 	/* variables to be filled by option parsing */
@@ -173,11 +169,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 				git_repack_usage, 0);
 
-	sigchain_push_common(remove_pack_on_signal);
-
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
+	sigchain_push_common(remove_pack_on_signal);
+
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
 	argv_array_push(&cmd_args, "--honor-pack-keep");
@@ -233,14 +229,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (ret)
 		return ret;
 
-	count_packs = 0;
+	nr_packs = 0;
 	out = xfdopen(cmd.out, "r");
 	while (strbuf_getline(&line, out, '\n') != EOF) {
 		if (line.len != 40)
 			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
 		strbuf_addstr(&line, "");
 		string_list_append(&names, line.buf);
-		count_packs++;
+		nr_packs++;
 	}
 	fclose(out);
 	ret = finish_command(&cmd);
@@ -248,10 +244,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		return ret;
 	argv_array_clear(&cmd_args);
 
-	if (!count_packs && !quiet)
+	if (!nr_packs && !quiet)
 		printf("Nothing new to pack.\n");
 
-	int failed = 0;
+	failed = 0;
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
@@ -262,7 +258,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 
-			fname_old = mkpathdup("%s/old-%s%s", packdir,
+			fname_old = mkpath("%s/old-%s%s", packdir,
 						item->string, exts[ext]);
 			if (file_exists(fname_old))
 				unlink(fname_old);
@@ -271,15 +267,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				failed = 1;
 				break;
 			}
-			string_list_append_nodup(&rollback, fname);
-			free(fname);
-			free(fname_old);
+			string_list_append(&rollback, fname);
 		}
 		if (failed)
 			break;
 	}
 	if (failed) {
-		struct string_list rollback_failure;
+		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
 		for_each_string_list_item(item, &rollback) {
 			char *fname, *fname_old;
 			fname = mkpathdup("%s/%s", packdir, item->string);
@@ -289,7 +283,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			free(fname);
 		}
 
-		if (rollback.nr) {
+		if (rollback_failure.nr) {
 			int i;
 			fprintf(stderr,
 				"WARNING: Some packs in use have been renamed by\n"
@@ -299,10 +293,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				"WARNING: attempt to rename them back to their\n"
 				"WARNING: original names also failed.\n"
 				"WARNING: Please rename them in %s manually:\n", packdir);
-			for (i = 0; i < rollback.nr; i++)
+			for (i = 0; i < rollback_failure.nr; i++)
 				fprintf(stderr, "WARNING:   old-%s -> %s\n",
-					rollback.items[i].string,
-					rollback.items[i].string);
+					rollback_failure.items[i].string,
+					rollback_failure.items[i].string);
 		}
 		exit(1);
 	}
-- 
1.8.4.rc3.1.gc1ebd90





* Re: Documenting api-paths.txt
  2013-08-20 21:59                                             ` Jonathan Nieder
@ 2013-08-21 22:43                                               ` Stefan Beller
  2013-08-22 17:29                                                 ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 22:43 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Johannes Sixt

On 08/20/2013 11:59 PM, Jonathan Nieder wrote:
> Stefan Beller wrote:
>>>> On 08/20/2013 03:31 PM, Johannes Sixt wrote:
>>>>> Stefan Beller wrote:
> 
>>>>>> +    packdir = mkpathdup("%s/pack", get_object_directory());
>>>>>> +    packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, getpid());
>>>>>
>>>>> Should this not be
>>>>>
>>>>>     packdir = xstrdup(git_path("pack"));
>>>>>     packtmp = xstrdup(git_path("pack/.tmp-%d-pack", getpid()));
> [...]
>> So if I have 
>> 	packdir = xstrdup(git_path("pack"));
>> 	...
>> 	path = git_path("%s/%s", packdir, filename)
>>
>> This produces something as:
>> .git/.git/objects/pack/.tmp-13199-pack-c59c5758ef159b272f6ab10cb9fadee443966e71.idx
>> definitely having one .git too much.
> 
> The version with get_object_directory() was right.  The object
> directory is not even necessarily under .git/, since it can be
> overridden using the GIT_OBJECT_DIRECTORY envvar.
> 
>> Also interesting to add would be that git_path operates in the
>> .git/objects directory?
> 
> git_path is for resolving paths within GIT_DIR, such as
> git_path("config") and git_path("COMMIT_EDITMSG").
> 
> Jonathan
> 

Before we duplicate work: I just wrote down my understanding
so far. Feel free to tweak it, or remove obvious parts.

Thanks,
Stefan


---
path API
========

The functions described in this document are meant to be
used when dealing with paths in the filesystem. The functions
only manipulate path strings; none of them touches the actual
filesystem.


`mkpath`::
	The parameters are in printf format. This function can be
	used to construct short-lived filename strings. It is meant
	to be passed directly to system functions, such as
	opendir(mkpath("%s/pack", get_object_directory())).
	The return value is a pointer to such a sanitized filename
	string, but it resides in a static buffer, so it will
	be overwritten by the next call to mkpath (or other functions?)
	This function only does string handling. It doesn't actually
	change anything on the filesystem. (This is not Git's mkdir -p.)

`mkpathdup`::
	The same as mkpath, but the memory is duplicated into a new
	buffer, so it is not short-lived, but stays as long as the
	caller doesn't free the memory, which the caller is supposed
	to do.

`xstrdup`::
	Duplicates the given string, making the caller responsible
	for freeing the return value. Basically the same as strdup(3)
	with error handling.

	I am not sure if this belongs in the path API documentation,
	but it's not documented anywhere else.

`git_path`::
	git_path is for resolving paths within GIT_DIR, such as
	git_path("config") and git_path("COMMIT_EDITMSG").
	This is similar to mkpath, returning a pointer to a static
	buffer, which may be overwritten soon.

`git_pathdup`::
	The same as git_path, but creating a new buffer. The caller
	is responsible for freeing the returned buffer.


`git_path_submodule`::

`mksnpath`::

`git_snpath`::

`sha1_file_name`::
	Returns the filename to a given sha1 value within
	the objects directory.

`sha1_pack_name`::
	
`sha1_pack_index_name`::
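
To illustrate the difference between the static-buffer and the "dup"
flavours described above, here is a minimal sketch. toy_mkpath() and
toy_mkpathdup() are hypothetical stand-ins written for this note, not
the real functions; in particular, using exactly one static buffer is
a simplifying assumption.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mkpath(): formats into one static buffer. */
static const char *toy_mkpath(const char *fmt, ...)
{
	static char buf[4096];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return NULL;
	return buf;	/* overwritten by the next toy_mkpath() call */
}

/* Hypothetical stand-in for mkpathdup(): the caller frees the result. */
static char *toy_mkpathdup(const char *fmt, ...)
{
	char tmp[4096];
	va_list ap;
	char *copy;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return NULL;
	copy = malloc((size_t)n + 1);
	if (copy)
		memcpy(copy, tmp, (size_t)n + 1);
	return copy;
}
```

A pointer returned by toy_mkpath() is only good until the next call,
so it suits one-shot uses such as passing the result straight to a
system call; anything that must survive longer should use the "dup"
variant and be freed by the caller.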



* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 22:15                                                       ` Stefan Beller
@ 2013-08-21 22:50                                                         ` Junio C Hamano
  2013-08-21 22:57                                                           ` Stefan Beller
  2013-08-22 10:46                                                         ` Johannes Sixt
  1 sibling, 1 reply; 72+ messages in thread
From: Junio C Hamano @ 2013-08-21 22:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

>>> +static int delta_base_offset = 1;
>>> +char *packdir;
>> 
>> Does this have to be global?
>
> We could pass it to all the functions, making it not global.

Sorry for being unclear; I meant "not static".  It is perfectly fine
for this to be a file-scope static.

>>> +
>>> +		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
>>> +			string_list_append_nodup(fname_list, fname);
>> 
>> mental note: this is getting names of non-kept packs, not all packs.
>
> I should document that. ;)

Rather, consider giving the function a better name, perhaps?

>>> +	while (strbuf_getline(&line, out, '\n') != EOF) {
>>> +		if (line.len != 40)
>>> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
>>> +		strbuf_addstr(&line, "");
>> 
>> What is this addstr() about?
>
> According to the documentation of strbufs, we cannot assume to have sane 
> strings, but anything.

Sorry, I do not get this.  What is a sane string and what is an
insane string?  sb->buf[sb->len] is always terminated with a NUL
when strbuf_getline() returns success, isn't it?

* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 22:50                                                         ` Junio C Hamano
@ 2013-08-21 22:57                                                           ` Stefan Beller
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Beller @ 2013-08-21 22:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, mackyle, j6t

On 08/22/2013 12:50 AM, Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:
> 
>>>> +static int delta_base_offset = 1;
>>>> +char *packdir;
>>>
>>> Does this have to be global?
>>
>> We could pass it to all the functions, making it not global.
> 
> Sorry for being unclear; I meant "not static".  It is perfectly fine
> for this to be a file-scope static.

No need to be sorry! I am sleepy, and may misunderstand even clear
messages. I'll change it to static of course.

> 
>>>> +
>>>> +		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
>>>> +			string_list_append_nodup(fname_list, fname);
>>>
>>> mental note: this is getting names of non-kept packs, not all packs.
>>
>> I should document that. ;)
> 
> Rather, consider giving the function a better name, perhaps?

What about one of:
get_non_kept_pack_filenames
get_prunable_pack_filenames
get_remove_candidate_pack_filenames

> 
>>>> +	while (strbuf_getline(&line, out, '\n') != EOF) {
>>>> +		if (line.len != 40)
>>>> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
>>>> +		strbuf_addstr(&line, "");
>>>
>>> What is this addstr() about?
>>
>> According to the documentation of strbufs, we cannot assume to have sane 
>> strings, but anything.
> 
> Sorry, I do not get this.  What is a sane string and what is an
> insane string?  sb->buf[sb->len] is always terminated with a NUL
> when strbuf_getline() returns success, isn't it?
> 

I should read the strbuf documentation again. Thanks for pointing it
out. I'll remove the strbuf_addstr(&line, "");

Thanks for your patience in the reviews,
Stefan


* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 22:15                                                       ` Stefan Beller
  2013-08-21 22:50                                                         ` Junio C Hamano
@ 2013-08-22 10:46                                                         ` Johannes Sixt
  1 sibling, 0 replies; 72+ messages in thread
From: Johannes Sixt @ 2013-08-22 10:46 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, git, mfick, apelisse, Matthieu.Moy, pclouds,
	iveqy, mackyle

Am 22.08.2013 00:15, schrieb Stefan Beller:
> On 08/21/2013 10:56 PM, Junio C Hamano wrote:
>> Stefan Beller <stefanbeller@googlemail.com> writes:
>>> +static int delta_base_offset = 1;
>>> +char *packdir;
>>
>> Does this have to be global?
>
> As the path is pretty obvious (get_object_directory() + "/pack"),
> we could however also construct it again in the signal handler.

I would advise against doing that. The recomputation would call malloc(), 
which is not async-signal-safe. (It would not be the first case where we 
call "forbidden" functions from signal handlers, but we need not pile more 
on top of them.)
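
As a rough sketch of what I mean (standalone C, not the code in the patch): do all string formatting before the handler is installed and let the handler only read the result. The names `tmpfile_path`, `cleanup_on_signal` and `install_cleanup` are invented for this example, and it removes a single file rather than scanning a directory.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Filled in once, before the handler is installed; the handler only reads it. */
static char tmpfile_path[4096];

/* Uses only unlink(), signal() and raise(), which POSIX lists as async-signal-safe. */
static void cleanup_on_signal(int signo)
{
	if (tmpfile_path[0])
		unlink(tmpfile_path);
	signal(signo, SIG_DFL);
	raise(signo);
}

/* Hypothetical setup: the formatting happens here, outside any handler. */
static int install_cleanup(const char *dir, long pid)
{
	int n;

	assert(dir != NULL);
	n = snprintf(tmpfile_path, sizeof(tmpfile_path),
		     "%s/.tmp-%ld-pack", dir, pid);
	if (n < 0 || (size_t)n >= sizeof(tmpfile_path)) {
		tmpfile_path[0] = '\0';
		return -1;
	}
	if (signal(SIGINT, cleanup_on_signal) == SIG_ERR)
		return -1;
	if (signal(SIGTERM, cleanup_on_signal) == SIG_ERR)
		return -1;
	return 0;
}
```

The ordering matters: if the handler were installed before the path is filled in, a signal arriving in between would find nothing to clean up.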

-- Hannes

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21 13:07                                                   ` Matthieu Moy
@ 2013-08-22 10:46                                                     ` Johannes Sixt
  0 siblings, 0 replies; 72+ messages in thread
From: Johannes Sixt @ 2013-08-22 10:46 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Stefan Beller, git, mfick, apelisse, pclouds, iveqy, gitster,
	mackyle

Am 21.08.2013 15:07, schrieb Matthieu Moy:
> Stefan Beller <stefanbeller@googlemail.com> writes:
>
>> But as these follow up changes heavily rely on the very first patch
>> I will first try to get that right, meaning accepted into pu.
>> Then I can send patches with these proposals such as making more
>> functions.
>
> I think it's better to get the style right before, to avoid doubling the
> review effort (review a hard-to-review patch first, and then re-review a
> style-fix one).

If by "style fix" you mean "coding style fix", I agree.

But, IMO, refactoring the long function can wait because the long function 
is easier to compare to the shell script, and I think that is more 
important later when you need to dig through the history.

It is already too late to save review effort.

-- Hannes

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C.
  2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
  2013-08-21 12:47                                                 ` Stefan Beller
  2013-08-21 12:53                                                 ` Stefan Beller
@ 2013-08-22 10:46                                                 ` Johannes Sixt
  2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
  2 siblings, 1 reply; 72+ messages in thread
From: Johannes Sixt @ 2013-08-22 10:46 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Stefan Beller, git, mfick, apelisse, pclouds, iveqy, gitster,
	mackyle

Am 21.08.2013 10:49, schrieb Matthieu Moy:
> Stefan Beller <stefanbeller@googlemail.com> writes:
>> +	for_each_string_list_item(item, &names) {
>> +		for (ext = 0; ext < 2; ext++) {
>> +			char *fname, *fname_old;
>> +			fname = mkpathdup("%s/%s%s", packdir, item->string, exts[ext]);
>> +			if (!file_exists(fname)) {
>> +				free(fname);
>> +				continue;
>> +			}
>> +
>> +			fname_old = mkpath("%s/old-%s%s", packdir, item->string, exts[ext]);
>> +			if (file_exists(fname_old))
>> +				unlink(fname_old);
>
> Unchecked returned value.

Good catch! The original was 'rm -f ... && mv ... || failed=t'
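
A minimal sketch of that shell idiom in plain C, assuming a made-up helper name `move_aside`: a missing destination is not an error (like 'rm -f'), while any other unlink or rename failure is reported to the caller so it can set its own "failed" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring "rm -f old && mv cur old":
 * a missing destination is fine, every other failure is reported.
 * Returns 0 on success, -1 on failure.
 */
static int move_aside(const char *cur, const char *old)
{
	assert(cur != NULL && old != NULL);
	if (unlink(old) && errno != ENOENT)
		return -1;
	if (rename(cur, old))
		return -1;
	return 0;
}
```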

>> +	/* Now the ones with the same name are out of the way... */
>> +	for_each_string_list_item(item, &names) {
>> +		for (ext = 0; ext < 2; ext++) {
>> +			char *fname, *fname_old;
>> +			struct stat statbuffer;
>> +			fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext]);
>> +			fname_old = mkpath("%s-%s%s", packtmp, item->string, exts[ext]);
>> +			if (!stat(fname_old, &statbuffer)) {
>> +				statbuffer.st_mode &= ~S_IWUSR | ~S_IWGRP | ~S_IWOTH;
>> +				chmod(fname_old, statbuffer.st_mode);
>
> Unchecked return value.

The original was an unchecked 'chmod a-w', so we don't care.

Of course, we could mimic the original better by issuing warnings.

>
>> +	/* Remove the "old-" files */
>> +	for_each_string_list_item(item, &names) {
>> +		char *fname;
>> +		fname = mkpath("%s/old-pack-%s.idx", packdir, item->string);
>> +		if (remove_path(fname))
>> +			die_errno(_("removing '%s' failed"), fname);
>> +
>> +		fname = mkpath("%s/old-pack-%s.pack", packdir, item->string);
>> +		if (remove_path(fname))
>> +			die_errno(_("removing '%s' failed"), fname);
>
> Does this have to be a fatal error? If I read correctly, it wasn't fatal
> in the shell version.

Good catch.

-- Hannes

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Dokumenting api-paths.txt
  2013-08-21 22:43                                               ` Stefan Beller
@ 2013-08-22 17:29                                                 ` Junio C Hamano
  0 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2013-08-22 17:29 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git, Johannes Sixt

Stefan Beller <stefanbeller@googlemail.com> writes:

>> git_path is for resolving paths within GIT_DIR, such as
>> git_path("config") and git_path("COMMIT_EDITMSG").
>> 
>> Jonathan
>
> Before we're doing double work, I just wrote down my understanding
> so far. Feel free to tweak it, or remove obvious parts.

> path API
> ========

I am not sure if they deserve to be called "API"; it is just a
set of simple helper functions.

> `mkpath`::
> 	The parameters are in printf format. This function can be
> 	used to construct short-lived filename strings. It is meant
> 	for direct use in system functions such as
> 	dir(mkpath("%s/pack", get_object_directory())).
> 	The return value is a pointer to such a sanitized filename
> 	string, but it resides in a static buffer, so it will
> 	be overwritten by the next call to mkpath (or other functions?)
> 	This function only does string handling. It doesn't actually
> 	change anything on the filesystem. (This is not Git's mkdir -p)
>
> `mkpathdup`::
> 	The same as mkpath, but the memory is duplicated into a new
> 	buffer, so it is not short-lived, but stays as long as the
> 	caller doesn't free the memory, which the caller is supposed
> 	to do.

Good.
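
To illustrate the lifetime difference described above, here is a standalone sketch. `fmt_static` and `fmt_dup` are hypothetical analogues written for this example; they are not the implementations of mkpath() and mkpathdup(), and the real helpers may differ in detail (this sketch uses exactly one static buffer).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of mkpath(): the result lives in one static buffer. */
static const char *fmt_static(const char *fmt, ...)
{
	static char buf[4096];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(buf))
		abort();
	return buf;
}

/* Hypothetical analogue of mkpathdup(): the caller owns and frees the result. */
static char *fmt_dup(const char *fmt, ...)
{
	char tmp[4096], *copy;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		abort();
	copy = malloc((size_t)n + 1);
	if (!copy)
		abort();
	memcpy(copy, tmp, (size_t)n + 1);
	return copy;
}
```

In this sketch a second call to fmt_static() overwrites what the first call returned, while a string from fmt_dup() stays valid until it is freed.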

> `xstrdup`::
> 	Duplicates the given string, making the caller responsible
> 	to free the return value. Basically the same as strdup(2)
> 	with error handling.
>
> 	I am not sure if this belongs in the path api documentation,
> 	but it's not documented anywhere else.

This does not belong.  It should be grouped together with xmalloc(),
xcalloc(), xrealloc(), etc. and these are not "path" functions.

> `git_path`::
> 	git_path is for resolving paths within GIT_DIR, such as
> 	git_path("config") and git_path("COMMIT_EDITMSG").
> 	This is similar to mkpath, returning a pointer to a static
> 	buffer, which may be overwritten soon.
>
> `git_pathdup`::
> 	The same as git_path, but creating a new buffer. The caller
> 	is responsible to free the returned buffer.

OK.

> `git_path_submodule`::

Similar to git_path() but is run for a submodule specified by the
"path" given as its first parameter.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH] repack: rewrite the shell script in C (squashing proposal)
  2013-08-22 10:46                                                 ` Johannes Sixt
@ 2013-08-22 20:06                                                   ` Stefan Beller
  2013-08-22 20:31                                                     ` Junio C Hamano
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Beller @ 2013-08-22 20:06 UTC (permalink / raw)
  To: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, gitster,
	mackyle, j6t
  Cc: Stefan Beller

This patch is meant to be squashed into bb4335a21441a0
(repack: rewrite the shell script in C), I'll do so when rerolling
the series. For reviewing I'll just send this patch.

* Remove comments, which likely get out of date (authorship is kept in
  git anyway)
* rename get_pack_filenames to get_non_kept_pack_filenames
* catch return value of unlink and fail as the shell version did
* beauty fixes to remove_temporary_files as Junio proposed
* install signal handling after static variables packdir, packtmp are set
* remove adding the empty string to the buffer.
* fix the rollback mechanism (wrong variable name)

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 builtin/repack.c | 78 ++++++++++++++++++++++++++------------------------------
 1 file changed, 36 insertions(+), 42 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 1f13e0d..e0d1f17 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1,8 +1,3 @@
-/*
- * The shell version was written by Linus Torvalds (2005) and many others.
- * This is a translation into C by Stefan Beller (2013)
- */
-
 #include "builtin.h"
 #include "cache.h"
 #include "dir.h"
@@ -13,9 +8,8 @@
 #include "string-list.h"
 #include "argv-array.h"
 
-/* enabled by default since 22c79eab (2008-06-25) */
 static int delta_base_offset = 1;
-char *packdir;
+static char *packdir, *packtmp;
 
 static const char *const git_repack_usage[] = {
 	N_("git repack [options]"),
@@ -41,18 +35,16 @@ static void remove_temporary_files(void)
 	DIR *dir;
 	struct dirent *e;
 
-	/* .git/objects/pack */
-	strbuf_addstr(&buf, get_object_directory());
-	strbuf_addstr(&buf, "/pack");
-	dir = opendir(buf.buf);
-	if (!dir) {
-		strbuf_release(&buf);
+	dir = opendir(packdir);
+	if (!dir)
 		return;
-	}
 
-	/* .git/objects/pack/.tmp-$$-pack-* */
+	strbuf_addstr(&buf, packdir);
+
+	/* dirlen holds the length of the path before the file name */
 	dirlen = buf.len + 1;
-	strbuf_addf(&buf, "/.tmp-%d-pack-", (int)getpid());
+	strbuf_addf(&buf, "%s", packtmp);
+	/* prefixlen holds the length of the prefix */
 	prefixlen = buf.len - dirlen;
 
 	while ((e = readdir(dir))) {
@@ -73,11 +65,16 @@ static void remove_pack_on_signal(int signo)
 	raise(signo);
 }
 
-static void get_pack_filenames(struct string_list *fname_list)
+/*
+ * Adds all packs hex strings to the fname list, which do not
+ * have a corresponding .keep file.
+ */
+static void get_non_kept_pack_filenames(struct string_list *fname_list)
 {
 	DIR *dir;
 	struct dirent *e;
 	char *fname;
+	size_t len;
 
 	if (!(dir = opendir(packdir)))
 		return;
@@ -86,7 +83,7 @@ static void get_pack_filenames(struct string_list *fname_list)
 		if (suffixcmp(e->d_name, ".pack"))
 			continue;
 
-		size_t len = strlen(e->d_name) - strlen(".pack");
+		len = strlen(e->d_name) - strlen(".pack");
 		fname = xmemdupz(e->d_name, len);
 
 		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
@@ -95,14 +92,14 @@ static void get_pack_filenames(struct string_list *fname_list)
 	closedir(dir);
 }
 
-static void remove_redundant_pack(const char *path, const char *sha1)
+static void remove_redundant_pack(const char *path_prefix, const char *hex)
 {
 	const char *exts[] = {".pack", ".idx", ".keep"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
 
-	strbuf_addf(&buf, "%s/%s", path, sha1);
+	strbuf_addf(&buf, "%s/%s", path_prefix, hex);
 	plen = buf.len;
 
 	for (i = 0; i < ARRAY_SIZE(exts); i++) {
@@ -115,15 +112,14 @@ static void remove_redundant_pack(const char *path, const char *sha1)
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
 	const char *exts[2] = {".idx", ".pack"};
-	char *packtmp;
 	struct child_process cmd;
 	struct string_list_item *item;
 	struct argv_array cmd_args = ARGV_ARRAY_INIT;
 	struct string_list names = STRING_LIST_INIT_DUP;
-	struct string_list rollback = STRING_LIST_INIT_DUP;
+	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
 	struct strbuf line = STRBUF_INIT;
-	int count_packs, ext, ret;
+	int nr_packs, ext, ret, failed;
 	FILE *out;
 
 	/* variables to be filled by option parsing */
@@ -173,11 +169,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
 				git_repack_usage, 0);
 
-	sigchain_push_common(remove_pack_on_signal);
-
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
 
+	sigchain_push_common(remove_pack_on_signal);
+
 	argv_array_push(&cmd_args, "pack-objects");
 	argv_array_push(&cmd_args, "--keep-true-parents");
 	argv_array_push(&cmd_args, "--honor-pack-keep");
@@ -201,7 +197,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_push(&cmd_args, "--unpacked");
 		argv_array_push(&cmd_args, "--incremental");
 	} else {
-		get_pack_filenames(&existing_packs);
+		get_non_kept_pack_filenames(&existing_packs);
 
 		if (existing_packs.nr && delete_redundant) {
 			if (unpack_unreachable)
@@ -233,14 +229,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (ret)
 		return ret;
 
-	count_packs = 0;
+	nr_packs = 0;
 	out = xfdopen(cmd.out, "r");
 	while (strbuf_getline(&line, out, '\n') != EOF) {
 		if (line.len != 40)
 			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
-		strbuf_addstr(&line, "");
 		string_list_append(&names, line.buf);
-		count_packs++;
+		nr_packs++;
 	}
 	fclose(out);
 	ret = finish_command(&cmd);
@@ -248,10 +243,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		return ret;
 	argv_array_clear(&cmd_args);
 
-	if (!count_packs && !quiet)
+	if (!nr_packs && !quiet)
 		printf("Nothing new to pack.\n");
 
-	int failed = 0;
+	failed = 0;
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < 2; ext++) {
 			char *fname, *fname_old;
@@ -262,24 +257,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 
-			fname_old = mkpathdup("%s/old-%s%s", packdir,
+			fname_old = mkpath("%s/old-%s%s", packdir,
 						item->string, exts[ext]);
 			if (file_exists(fname_old))
-				unlink(fname_old);
+				if (unlink(fname_old))
+					failed = 1;
 
-			if (rename(fname, fname_old)) {
+			if (!failed && rename(fname, fname_old)) {
 				failed = 1;
 				break;
 			}
-			string_list_append_nodup(&rollback, fname);
-			free(fname);
-			free(fname_old);
+			string_list_append(&rollback, fname);
 		}
 		if (failed)
 			break;
 	}
 	if (failed) {
-		struct string_list rollback_failure;
+		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
 		for_each_string_list_item(item, &rollback) {
 			char *fname, *fname_old;
 			fname = mkpathdup("%s/%s", packdir, item->string);
@@ -289,7 +283,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			free(fname);
 		}
 
-		if (rollback.nr) {
+		if (rollback_failure.nr) {
 			int i;
 			fprintf(stderr,
 				"WARNING: Some packs in use have been renamed by\n"
@@ -299,10 +293,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				"WARNING: attempt to rename them back to their\n"
 				"WARNING: original names also failed.\n"
 				"WARNING: Please rename them in %s manually:\n", packdir);
-			for (i = 0; i < rollback.nr; i++)
+			for (i = 0; i < rollback_failure.nr; i++)
 				fprintf(stderr, "WARNING:   old-%s -> %s\n",
-					rollback.items[i].string,
-					rollback.items[i].string);
+					rollback_failure.items[i].string,
+					rollback_failure.items[i].string);
 		}
 		exit(1);
 	}
-- 
1.8.4.rc3.1.gc1ebd90

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH] repack: rewrite the shell script in C (squashing proposal)
  2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
@ 2013-08-22 20:31                                                     ` Junio C Hamano
  0 siblings, 0 replies; 72+ messages in thread
From: Junio C Hamano @ 2013-08-22 20:31 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy, mackyle, j6t

Stefan Beller <stefanbeller@googlemail.com> writes:

> @@ -41,18 +35,16 @@ static void remove_temporary_files(void)
>  	DIR *dir;
>  	struct dirent *e;
>  
> +	dir = opendir(packdir);
> +	if (!dir)
>  		return;
>  
> +	strbuf_addstr(&buf, packdir);
> +
> +	/* dirlen holds the length of the path before the file name */
>  	dirlen = buf.len + 1;
> +	strbuf_addf(&buf, "%s", packtmp);
> +	/* prefixlen holds the length of the prefix */

Thanks to the name of the variable that is self-describing, this
comment does not add much value.

But it misses the whole point of my suggestion in the earlier
message to phrase these like so:

        /* Point at the slash at the end of ".../objects/pack/" */
	dirlen = strlen(packdir) + 1;
        /* Point at the dash at the end of ".../.tmp-%d-pack-" */
        prefixlen = buf.len - dirlen;

to clarify what the writer considers as "the prefix" is, which may
be quite different from what the readers think "the prefix" is.  In
".tmp-2342-pack-0d8beaa5b76e824c9869f0d1f1b19ec7acf4982f.pack", is
the prefix ".tmp-2342-", ".tmp-2342-pack", or ".tmp-2342-pack-"?
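
As an illustration of why the exact extent of "the prefix" matters, here is a standalone sketch that treats the prefix as ".tmp-<pid>-pack-", trailing dash included. The helper names are invented for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the file-name prefix ".tmp-<pid>-pack-"
 * (including the trailing dash) into out; returns its length or -1.
 */
static int temp_pack_prefix(char *out, size_t outsz, long pid)
{
	int n;

	assert(out != NULL);
	n = snprintf(out, outsz, ".tmp-%ld-pack-", pid);
	return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}

/* Does a directory entry name start with that exact prefix? */
static int is_temp_pack(const char *name, const char *prefix, size_t prefixlen)
{
	return strncmp(name, prefix, prefixlen) == 0;
}
```

Comparing against the full prefix, delimiters included, keeps a file left behind by pid 23420 from matching when the current pid is 2342.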

>  int cmd_repack(int argc, const char **argv, const char *prefix)
>  {
> ...
>  	packdir = mkpathdup("%s/pack", get_object_directory());
>  	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
>  
> +	sigchain_push_common(remove_pack_on_signal);
> +
>  	argv_array_push(&cmd_args, "pack-objects");
>  	argv_array_push(&cmd_args, "--keep-true-parents");
>  	argv_array_push(&cmd_args, "--honor-pack-keep");
> ...
> +					rollback_failure.items[i].string,
> +					rollback_failure.items[i].string);
>  		}
>  		exit(1);
>  	}

The scripted version uses

    trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15

so remove_temporary_files() needs to be called before exiting from
the program without getting killed by a signal.
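
The "0" in that trap is the shell's EXIT condition. One way to get a similar effect in C is atexit(); the sketch below is only an illustration in standalone C with invented names (`cleanup_path`, `register_exit_cleanup`), not a claim about how the patch should do it. Handlers registered with atexit() run on exit() or return from main(), not when the process is killed by a signal, so the signal handler is still needed alongside.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Set before registering; must stay valid until the process exits. */
static const char *cleanup_path;

/* Runs on normal termination (exit() or return from main()). */
static void cleanup_at_exit(void)
{
	if (cleanup_path)
		unlink(cleanup_path);
}

/* Hypothetical setup helper; returns 0 on success like atexit(). */
static int register_exit_cleanup(const char *path)
{
	assert(path != NULL);
	cleanup_path = path;
	return atexit(cleanup_at_exit);
}
```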

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [RFC PATCHv6 1/2] repack: rewrite the shell script in C
  2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
  2013-08-21 21:52                                                       ` Matthieu Moy
  2013-08-21 22:15                                                       ` Stefan Beller
@ 2013-08-22 21:03                                                       ` Jonathan Nieder
  2 siblings, 0 replies; 72+ messages in thread
From: Jonathan Nieder @ 2013-08-22 21:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, mfick, apelisse, Matthieu.Moy, pclouds, iveqy,
	mackyle, j6t

Junio C Hamano wrote:
> Stefan Beller <stefanbeller@googlemail.com> writes:

>> The motivation of this patch is to get closer to a goal of being
>> able to have a core subset of git functionality built in to git.
>> That would mean
>>
>>  * people on Windows could get a copy of at least the core parts
>>    of Git without having to install a Unix-style shell
>>
>>  * people deploying to servers don't have to rewrite the #! line
>>    or worry about the PATH and quality of installed POSIX
>>    utilities, if they are only using the built-in part written
>>    in C
>
> I am not sure what is meant by the latter.  Rewriting #! is part of
> any scripted Porcelain done by the top-level Makefile, and I do not
> think we have seen any problem reports on it.
>
> As to "quality of ... utilities", I think the real issue some people
> in the thread had was not about "deploying to servers" but about
> installing in a minimalistic chrooted environment where standard
> tools may be lacking.

Thanks for a sanity check.  Yeah, the second item should be about
minimal chroots, not my sloppy guess about some hypothetical bad
operating system with untrustworthy tools.

Jonathan

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2013-08-22 21:04 UTC | newest]

Thread overview: 72+ messages
2013-08-13 19:23 [PATCH] Rewriting git-repack in C Stefan Beller
2013-08-13 19:23 ` [PATCH] repack: rewrite the shell script " Stefan Beller
2013-08-14  7:26   ` Matthieu Moy
2013-08-14 16:26     ` Stefan Beller
2013-08-14 16:27       ` [RFC PATCH] " Stefan Beller
2013-08-14 16:49         ` Antoine Pelisse
2013-08-14 17:04           ` Stefan Beller
2013-08-14 17:19             ` Jeff King
2013-08-14 17:25           ` Martin Fick
2013-08-14 22:16             ` Stefan Beller
2013-08-14 22:28               ` Martin Fick
2013-08-14 22:53                 ` Junio C Hamano
2013-08-14 23:28                   ` Martin Fick
2013-08-15 17:15                     ` Junio C Hamano
2013-08-16  0:12                       ` [RFC PATCHv2] " Stefan Beller
2013-08-17 13:34                         ` René Scharfe
2013-08-17 19:18                           ` Kyle J. McKay
2013-08-18 14:34                           ` Stefan Beller
2013-08-18 14:36                             ` [RFC PATCHv3] " Stefan Beller
2013-08-18 15:41                               ` Kyle J. McKay
2013-08-18 16:44                               ` René Scharfe
2013-08-18 22:26                                 ` [RFC PATCHv4] " Stefan Beller
2013-08-19 23:23                                   ` Stefan Beller
2013-08-20 13:31                                     ` Johannes Sixt
2013-08-20 15:08                                       ` Stefan Beller
2013-08-20 18:38                                         ` Johannes Sixt
2013-08-20 18:57                                         ` René Scharfe
2013-08-20 22:36                                           ` Stefan Beller
2013-08-20 22:38                                             ` [PATCH] " Stefan Beller
2013-08-21  8:25                                               ` Jonathan Nieder
2013-08-21 10:37                                                 ` Stefan Beller
2013-08-21 17:25                                                 ` Stefan Beller
2013-08-21 17:28                                                   ` [RFC PATCHv6 1/2] " Stefan Beller
2013-08-21 17:28                                                     ` [RFC PATCHv6 2/2] repack: retain the return value of pack-objects Stefan Beller
2013-08-21 20:56                                                     ` [RFC PATCHv6 1/2] repack: rewrite the shell script in C Junio C Hamano
2013-08-21 21:52                                                       ` Matthieu Moy
2013-08-21 22:15                                                       ` Stefan Beller
2013-08-21 22:50                                                         ` Junio C Hamano
2013-08-21 22:57                                                           ` Stefan Beller
2013-08-22 10:46                                                         ` Johannes Sixt
2013-08-22 21:03                                                       ` Jonathan Nieder
2013-08-21  8:49                                               ` [PATCH] " Matthieu Moy
2013-08-21 12:47                                                 ` Stefan Beller
2013-08-21 13:05                                                   ` Matthieu Moy
2013-08-21 12:53                                                 ` Stefan Beller
2013-08-21 13:07                                                   ` Matthieu Moy
2013-08-22 10:46                                                     ` Johannes Sixt
2013-08-22 10:46                                                 ` Johannes Sixt
2013-08-22 20:06                                                   ` [PATCH] repack: rewrite the shell script in C (squashing proposal) Stefan Beller
2013-08-22 20:31                                                     ` Junio C Hamano
2013-08-20 22:46                                             ` [RFC PATCHv4] repack: rewrite the shell script in C Jonathan Nieder
2013-08-21  9:20                                             ` Johannes Sixt
2013-08-20 21:24                                       ` Stefan Beller
2013-08-20 21:34                                         ` Jonathan Nieder
2013-08-20 21:40                                           ` Dokumenting api-paths.txt Stefan Beller
2013-08-20 21:59                                             ` Jonathan Nieder
2013-08-21 22:43                                               ` Stefan Beller
2013-08-22 17:29                                                 ` Junio C Hamano
2013-08-14 22:51               ` [RFC PATCH] repack: rewrite the shell script in C Junio C Hamano
2013-08-14 22:59                 ` Matthieu Moy
2013-08-15  7:47                   ` Stefan Beller
2013-08-15  4:15             ` Duy Nguyen
2013-08-14 17:26           ` Junio C Hamano
2013-08-14 22:51           ` Matthieu Moy
2013-08-14 23:25             ` Martin Fick
2013-08-15  0:26               ` Martin Fick
2013-08-15  7:46               ` Stefan Beller
2013-08-15 15:04                 ` Martin Fick
2013-08-15  4:20             ` Duy Nguyen
2013-08-14 17:04         ` Junio C Hamano
2013-08-15  7:53           ` Stefan Beller
2013-08-14  7:12 ` [PATCH] Rewriting git-repack " Matthieu Moy
