git@vger.kernel.org mailing list mirror (one of many)
* [RFC PATCH 0/1] Introduce git-recover
@ 2018-08-04 14:22 Edward Thomson
  2018-08-04 14:24 ` [RFC PATCH 1/1] recover: restoration of deleted worktree files Edward Thomson
  2018-08-05  1:34 ` [RFC PATCH 0/1] Introduce git-recover Jonathan Nieder
  0 siblings, 2 replies; 8+ messages in thread
From: Edward Thomson @ 2018-08-04 14:22 UTC
  To: git

Hello-

I created a simple shell script a while back to help people recover
files that they deleted from their working directory (but had been added
to the repository), which looks for unreachable blobs in the object
database and places them in the working directory (either en masse,
interactively, or via command-line arguments).

This has been available at https://github.com/ethomson/git-recover for
about a year, and in that time, someone has suggested that I propose
this as part of git itself.  So I thought I'd see if there's any
interest in this.

If there is, I'd like to get a sense of the amount of work required to
make this suitable for inclusion.  There are some larger pieces of work
required -- at a minimum, I think this requires:

- Tests -- there are none, which is fine with me but probably less fine
  for inclusion here.
- Documentation -- the current README is below but it will need proper
  documentation that can be rendered into manpages, etc, by the tools.
- Remove bashisms -- there are many.

Again, this may not be particularly interesting, but I thought I'd send
it along in case it is.

Cheers-
-ed

-----------------------------------------------------------------------

git-recover allows you to recover some files that you've accidentally
deleted from your working directory.  It helps you find files that exist
in the repository's object database (because you ran `git add`) but were
never committed.
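Finding those files comes down to asking `git fsck` which blobs are unreachable. Below is a minimal POSIX sh sketch of that step. The function name `list_unreachable_blobs` is hypothetical. The fsck invocation follows the one in the patch's `find_unreachable`.

```shell
# Sketch (hypothetical helper): print the full ID of each unreachable
# blob, one per line. Commits referenced only from reflogs are not
# treated as reachable (--no-reflogs), so blobs that survive only
# through a reflog entry are listed as well.
list_unreachable_blobs() {
	git fsck --unreachable --no-reflogs --no-progress "$@" |
	sed -n -e 's/^unreachable blob //p'
}
```

Any extra arguments are passed through to `git fsck`. For example, `--no-full` (which git-recover uses by default) limits the scan to loose objects.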

Getting Started
---------------
The simplest way to use git-recover is in interactive mode - simply run
`git-recover -i` and it will show you all the files that you can recover
and prompt you to act.

Using git-recover
-----------------
Running git-recover without any arguments will list all the files (git
"blobs") that were recently orphaned, by their ID.  (Their filename is not 
known.)

You can examine these blobs by running `git show <objectid>`.  If you
find one that you want to recover, you can provide the ID as the argument
to git-recover.  You can specify the `--filename` option to write the
file out and apply any filters that are set up in the repository.  For
example:

    git-recover 38762cf7f55934b34d179ae6a4c80cadccbb7f0a \
        --filename shattered.pdf

You can also specify multiple files to recover, each with an optional
output filename:

    git-recover 38762c --filename one.txt cafebae --filename bae.txt

If you want to recover _all_ the orphaned blobs in your repository, run
`git-recover --all`.  This will write all the orphaned files to the
current working directory, so it's best to run this inside a temporary
directory beneath your working directory.  For example:

    mkdir _tmp && cd _tmp && git-recover --all

By default, git-recover limits itself to recently created orphaned blobs.
If you want to see _all_ orphaned files that have been created in your
repository (but haven't yet been garbage collected), you can run:

    git-recover --full
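Each loose object is stored in its own file. The first two hex digits of the ID name the directory. git-recover sorts its listing by these files' modification times. Packed blobs have no such file, so they are listed last with an unknown date. Below is a portable sketch of the path computation. The helper is hypothetical, and it avoids the `${BLOB::2}` substring bashism that the patch uses.

```shell
# Sketch (hypothetical helper): print the path of object $1 relative
# to $GIT_DIR/objects if it is stored loose, e.g. "38/762cf7...".
loose_object_path() {
	id=$1
	rest=${id#??}        # everything after the first two characters
	dir=${id%"$rest"}    # the first two characters
	printf '%s/%s\n' "$dir" "$rest"
}
```

For example, `[ -f "$(git rev-parse --git-path objects)/$(loose_object_path "$id")" ]` tests whether an object is loose.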

Options
-------
    git-recover [-a] [-i] [--full] [<id> [-f <filename>] ...]

-a, --all
Write all orphaned blobs to the current working directory.  Each file will
be named using its 40-character object ID.

-i, --interactive
Display information about each orphaned blob and prompt to recover it.

--full
List or recover all orphaned blobs, even those that are in packfiles.  By 
default, `git-recover` will only look at loose object files, which limits
it to the most recently created files.  Examining packfiles may be slow,
especially in large repositories.

<id>
The object ID (or its abbreviation) to recover.  The file will be written to
the current working directory and named using its 40-character object ID,
unless the `-f` option is specified.

-f <filename>, --filename <filename>
When specified after an object ID, the file written will use this filename.
In addition, any filters (for example: CRLF conversion or Git-LFS) will be
run according to the `gitattributes` configuration.
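The filtering described above comes from `git cat-file --filters --path=<filename>`. Below is a rough sketch of recovering a single blob this way. The helper name `recover_blob_as` is hypothetical. The sketch also refuses to overwrite an existing file, which is a choice made here; the script's command-line mode simply writes over it.

```shell
# Sketch (hypothetical helper): write blob $1 to file $2, applying the
# smudge and end-of-line filters that .gitattributes assigns to $2.
recover_blob_as() {
	blob=$1 name=$2
	if [ -e "$name" ]; then
		echo "refusing to overwrite existing file: $name" >&2
		return 1
	fi
	if ! git cat-file --filters --path="$name" "$blob" >"$name"; then
		rm -f "$name"     # do not leave a truncated file behind
		return 1
	fi
}
```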


Edward Thomson (1):
  recover: restoration of deleted worktree files

 git-recover.sh | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 311 insertions(+)
 create mode 100755 git-recover.sh

-- 
2.0.0 (libgit2)



* [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 14:22 [RFC PATCH 0/1] Introduce git-recover Edward Thomson
@ 2018-08-04 14:24 ` Edward Thomson
  2018-08-04 15:54   ` Junio C Hamano
  2018-08-05  1:34 ` [RFC PATCH 0/1] Introduce git-recover Jonathan Nieder
  1 sibling, 1 reply; 8+ messages in thread
From: Edward Thomson @ 2018-08-04 14:24 UTC
  To: git

Introduce git-recover, a simple script to aid in the restoration of deleted
worktree files.  This will look for unreachable blobs in the object
database and prompt users to restore them to disk, either interactively
or on the command-line.
---
 git-recover.sh | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 311 insertions(+)
 create mode 100755 git-recover.sh

diff --git a/git-recover.sh b/git-recover.sh
new file mode 100755
index 000000000..651d4116f
--- /dev/null
+++ b/git-recover.sh
@@ -0,0 +1,311 @@
+#!/usr/bin/env bash
+#
+# This program helps recover files in your repository that were deleted
+# from the working tree.
+#
+# Copyright (c) 2017-2018 Edward Thomson.
+
+set -e
+
+IFS=$'\n'
+
+PROGNAME=$(echo "$0" | sed -e 's/.*\///')
+GIT_DIR=$(git rev-parse --git-dir)
+
+DO_RECOVER=0
+DO_FULL=0
+DO_INTERACTIVE=0
+BLOBS=()
+FILENAMES=()
+
+function die_usage {
+	echo "usage: $PROGNAME [-a] [-i] [--full] [<id> [-f <filename>] ...]" >&2
+	exit 1
+}
+
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+	-a|--all)
+		DO_RECOVER=1
+		;;
+	-i|--interactive)
+		DO_INTERACTIVE=1
+		;;
+	--full)
+		DO_FULL=1
+		;;
+	*)
+		if [ "${1:0:1}" == "-" ]; then
+			echo "$PROGNAME: unknown argument: $1" >&2
+			die_usage
+		fi
+		BLOBS+=("$1")
+
+		shift
+		if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then
+			shift
+			if [ $# == 0 ]; then
+				die_usage
+			fi
+			FILENAMES+=("$1")
+			shift
+		else
+			FILENAMES+=("")
+		fi
+		continue
+	;;
+	esac
+	shift
+done
+
+if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then
+	die_usage
+elif [ ${#BLOBS[@]} != 0 ]; then
+	DO_RECOVER=1
+fi
+
+case "$OSTYPE" in
+	darwin*|freebsd*) IS_BSD=1 ;;
+	*) IS_BSD=0 ;;
+esac
+
+function expand_given_blobs() {
+	for i in "${!BLOBS[@]}"; do
+		ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true)
+
+		if [ -z "$ID" ]; then
+			echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 1>&2
+			exit 1
+		fi
+
+		TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true)
+
+		if [ "$TYPE" != "blob" ]; then
+			echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2
+			exit
+		fi
+
+		BLOBS[$i]=$ID
+	done
+}
+
+# find all the unreachable blobs
+function find_unreachable() {
+	FULLNESS="--no-full"
+
+	if [ $DO_FULL == 1 ]; then FULLNESS="--full"; fi
+
+	BLOBS=($(git fsck --unreachable --no-reflogs \
+		"${FULLNESS}" --no-progress | sed -ne 's/^unreachable blob //p'))
+}
+
+function read_one_file {
+	BLOB=$1
+	FILTER_NAME=$2
+	ARGS=()
+
+	if [ -z "$FILTER_NAME" ]; then
+		ARGS+=("blob")
+	else
+		ARGS+=("--filters" "--path=$FILTER_NAME")
+	fi
+
+	git cat-file "${ARGS[@]}" "$BLOB"
+}
+
+function write_one_file {
+	BLOB=$1
+	FILTER_NAME=$2
+	OUTPUT_NAME=$3
+
+	ABBREV=$(git rev-parse --short "${BLOB}")
+
+	echo -n "Writing $ABBREV: "
+	read_one_file "$BLOB" "$FILTER_NAME" > "$OUTPUT_NAME"
+	echo "$OUTPUT_NAME."
+}
+
+function unique_filename {
+	if [ ! -f "${BLOB}" ]; then
+		echo "$BLOB"
+	else
+		cnt=1
+		while true
+		do
+			fn="${BLOB}~${cnt}"
+			if [ ! -f "${fn}" ]; then
+				echo "${fn}"
+				break
+			fi
+			cnt=$((cnt+1))
+		done
+	fi
+}
+
+function write_recoverable {
+	for i in "${!BLOBS[@]}"; do
+		BLOB=${BLOBS[$i]}
+		FILTER_NAME=${FILENAMES[$i]}
+		OUTPUT_NAME=${FILENAMES[$i]:-$(unique_filename)}
+
+		write_one_file "$BLOB" "$FILTER_NAME" "$OUTPUT_NAME"
+	done
+}
+
+function file_time {
+	if [ $IS_BSD == 1 ]; then
+		stat -f %c "$1"
+	else
+		stat -c %Y "$1"
+	fi
+}
+
+function timestamp_to_s {
+	if [ $IS_BSD == 1 ]; then
+		date -r "$1"
+	else
+		date -d @"$1"
+	fi
+}
+
+function sort_by_timestamp {
+	# sort blobs in loose objects by their timestamp (packed blobs last)
+	BLOB_AND_TIMESTAMPS=($(for BLOB in "${BLOBS[@]}"; do
+		LOOSE="${BLOB::2}/${BLOB:2}"
+		TIME=$(file_time "$GIT_DIR/objects/$LOOSE" 2>/dev/null || true)
+		echo "$BLOB $TIME"
+	done | sort -k2 -r))
+}
+
+function print_recoverable {
+	echo "Recoverable orphaned git blobs:"
+	echo ""
+
+	sort_by_timestamp
+	for BLOB_AND_TIMESTAMP in "${BLOB_AND_TIMESTAMPS[@]}"; do
+		BLOB=${BLOB_AND_TIMESTAMP::40}
+		TIME=${BLOB_AND_TIMESTAMP:41}
+		DATE=$([ ! -z "$TIME" ] && timestamp_to_s "$TIME" || echo "(Unknown)") 
+
+		echo "$BLOB  $DATE"
+	done
+}
+
+function prompt_for_filename {
+	while true
+	do
+		echo -n "Filename (return to skip): "
+		read -r FILENAME
+
+		if [ -f "$FILENAME" ]; then
+			echo -n "File exists, overwrite? [y,N]: "
+			read -r overwrite
+
+			case "$overwrite" in
+			[yY]*)
+				return 0
+				;;
+			esac
+
+			echo
+		else
+			return 0
+		fi
+	done
+}
+
+function view_file {
+	read_one_file "${BLOB}" | ${PAGER:-less}
+}
+
+function show_summary {
+	FILETYPE=$(read_one_file "${BLOB}" | file -b -)
+	IS_TEXT=$(echo "${FILETYPE}" | grep -c ' text$' 2>/dev/null || true)
+
+	if [ "$IS_TEXT" == "1" ]; then
+		read_one_file "${BLOB}"
+	else
+		read_one_file "${BLOB}" | hexdump -C
+	fi
+}
+
+function interactive {
+	echo "Recoverable orphaned git blobs:"
+
+	sort_by_timestamp
+	for BLOB_AND_TIMESTAMP in "${BLOB_AND_TIMESTAMPS[@]}"; do
+		echo
+
+		BLOB=${BLOB_AND_TIMESTAMP::40}
+		TIME=${BLOB_AND_TIMESTAMP:41}
+		DATE=$([ ! -z "$TIME" ] && timestamp_to_s "$TIME" || echo "(Unknown)") 
+
+		echo "$BLOB  ($DATE)"
+		show_summary "${BLOB}" | head -4 | sed -e 's/^/> /'
+		echo
+
+		while true
+		do
+			echo -n "Recover this file? [y,n,v,f,q,?]: "
+			read -r ans || return 1
+
+			case "$ans" in
+			[yY]*)
+				write_one_file "${BLOB}" "" "$(unique_filename)"
+				break
+				;;
+			[nN]*)
+				break
+				;;
+			[vV]*)
+				view_file "${BLOB}"
+				echo
+				;;
+			[fF]*)
+				prompt_for_filename
+
+				if [ "$FILENAME" == "" ]; then
+					break
+				fi
+
+				write_one_file "${BLOB}" "${FILENAME}" "${FILENAME}"
+				break
+				;;
+			\?*)
+				echo
+				echo "Do you want to recover this file?"
+				echo " y: yes, write the file to ${BLOB}"
+				echo " n: no, skip this file and see the next orphaned file"
+				echo " v: view the file"
+				echo " f: prompt for a filename to use for recovery"
+				echo " q: quit"
+				echo
+				;;
+			[qQ]*)
+				return 0
+				;;
+			esac
+		done
+	done
+}
+
+
+if [ ${#BLOBS[@]} != 0 ]; then
+	expand_given_blobs
+else
+	find_unreachable
+fi
+
+if [ ${#BLOBS[@]} == 0 ]; then
+	echo "$PROGNAME: no recoverable orphaned blobs."
+	exit
+fi
+
+if [ $DO_INTERACTIVE == 1 ]; then
+	interactive
+elif [ $DO_RECOVER == 1 ]; then
+	write_recoverable
+else
+	print_recoverable
+fi
+
-- 
2.0.0 (libgit2)



* Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 14:24 ` [RFC PATCH 1/1] recover: restoration of deleted worktree files Edward Thomson
@ 2018-08-04 15:54   ` Junio C Hamano
  2018-08-04 16:17     ` Robert P. J. Day
  2018-08-04 16:19     ` Edward Thomson
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2018-08-04 15:54 UTC
  To: Edward Thomson; +Cc: git

Edward Thomson <ethomson@edwardthomson.com> writes:

> Introduce git-recover, a simple script to aid in the restoration of deleted
> worktree files.  This will look for unreachable blobs in the object
> database and prompt users to restore them to disk, either interactively
> or on the command-line.
> ---
>  git-recover.sh | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 311 insertions(+)
>  create mode 100755 git-recover.sh

My first reaction was to say that I am not going to take a new
command written only for bash, full of bashisms, and without docs,
tests, or Makefile integration, for Git itself.  Then I
reconsidered, as not everything related to Git is git-core, and all
of the above traits are signs that this patch is _not_ meant for git-core.

In other words, I think this patch can be a fine addition to
somebody else's project (i.e. random collection of scripts that may
help Git users), so let's see how I can offer comments/inputs to
help you improve it.  So I won't comment on language, log message, or
shell scripting style; these are project conventions, and the git-core
conventions won't be relevant to this patch.

> diff --git a/git-recover.sh b/git-recover.sh
> new file mode 100755
> index 000000000..651d4116f
> --- /dev/null
> +++ b/git-recover.sh
> @@ -0,0 +1,311 @@
> +#!/usr/bin/env bash
> +#
> +# This program helps recover files in your repository that were deleted
> +# from the working tree.
> +#
> +# Copyright (c) 2017-2018 Edward Thomson.
> +
> +set -e
> +
> +IFS=$'\n'
> +
> +PROGNAME=$(echo "$0" | sed -e 's/.*\///')
> +GIT_DIR=$(git rev-parse --git-dir)
> +
> +DO_RECOVER=0
> +DO_FULL=0
> +DO_INTERACTIVE=0
> +BLOBS=()
> +FILENAMES=()
> +
> +function die_usage {
> +	echo "usage: $PROGNAME [-a] [-i] [--full] [<id> [-f <filename>] ...]" >&2
> +	exit 1
> +}
> +
> +while [[ $# -gt 0 ]]; do
> +	case "$1" in
> +	-a|--all)
> +		DO_RECOVER=1
> +		;;
> +	-i|--interactive)
> +		DO_INTERACTIVE=1
> +		;;
> +	--full)
> +		DO_FULL=1
> +		;;
> +	*)
> +		if [ "${1:0:1}" == "-" ]; then
> +			echo "$PROGNAME: unknown argument: $1" >&2
> +			die_usage
> +		fi
> +		BLOBS+=("$1")
> +
> +		shift
> +		if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then
> +			shift
> +			if [ $# == 0 ]; then
> +				die_usage
> +			fi
> +			FILENAMES+=("$1")
> +			shift
> +		else
> +			FILENAMES+=("")
> +		fi

You do not want to take "--file=Makefile" (i.e. abbreviated option
name, and value as part of the option arg after '=')?
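One way to accept that form is sketched below in POSIX sh. It uses a hypothetical `parse_blob_specs` helper that prints one tab-separated `<id>`/`<filename>` pair per blob, and it assumes `-a`, `-i` and `--full` are handled elsewhere. Abbreviations are accepted only down to `--fi`, because `--f` would be ambiguous with `--full`.

```shell
# Sketch (hypothetical helper): parse "<id> [-f NAME | --filename NAME |
# --filename=NAME] ..." and print "<id><TAB><filename>" per blob; the
# filename is empty when none was given.
parse_blob_specs() {
	while [ $# -gt 0 ]; do
		case $1 in
		-*)
			echo "unknown argument: $1" >&2
			return 1
			;;
		esac
		spec=$1 name=
		shift
		case ${1-} in
		-f|--fi|--fil|--file|--filen|--filena|--filenam|--filename)
			[ $# -ge 2 ] || { echo "missing filename after $1" >&2; return 1; }
			name=$2
			shift 2
			;;
		--fi=*|--fil=*|--file=*|--filen=*|--filena=*|--filenam=*|--filename=*)
			name=${1#*=}    # value after the first '='
			shift
			;;
		esac
		printf '%s\t%s\n' "$spec" "$name"
	done
}
```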

> +		continue
> +	;;
> +	esac
> +	shift
> +done

So, as a user, I can run this with "-a" but no blob object names to
run it in DO_RECOVER mode, or I can give one or more "blob spec"
where I say object id, optionally followed by one "-f filename"; in
the latter mode, BLOBS[] and FILENAMES[] array would have the same
number of elements, corresponding to each other.

> +if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then
> +	die_usage
> +elif [ ${#BLOBS[@]} != 0 ]; then
> +	DO_RECOVER=1
> +fi

If I did not say "-a" but did give one or more "blob spec"s, then I am
implicitly asking to work in DO_RECOVER mode.

I think I understood what the program wants to do so far.

> +case "$OSTYPE" in
> +	darwin*|freebsd*) IS_BSD=1 ;;
> +	*) IS_BSD=0 ;;
> +esac
> +
> +function expand_given_blobs() {
> +	for i in "${!BLOBS[@]}"; do
> +		ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true)
> +
> +		if [ -z "$ID" ]; then
> +			echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 1>&2
> +			exit 1
> +		fi
> +
> +		TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true)

An earlier "set -e" makes "|| true" ugliness required.  I suspect
use of "set -e" overall is a loss (vs explicit error checking).

> +		if [ "$TYPE" != "blob" ]; then
> +			echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2
> +			exit
> +		fi

A user may have given us 11f5bcd9 and this function makes sure such
an object exists in the object store *and* is a blob.  Otherwise
it dies.  The main objective of this function is to turn that user
supplied object name to a full hex that is known to refer to an
existing blob.

> +		BLOBS[$i]=$ID
> +	done

I find a disconnect between this being a loop and the attitude "we
won't tolerate any erroneous input".  If a user is feeding dozens of
blob object names, wouldn't it be more helpful to give a warning, go
on and help the user with the rest?
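Below is a sketch of that alternative; the helper is hypothetical and not part of the patch. It uses explicit checks instead of `set -e`. `git rev-parse --verify --quiet "<name>^{blob}"` resolves the name and checks its type in one step. Bad names are reported as warnings instead of aborting the whole run.

```shell
# Sketch (hypothetical helper): print the full ID of each argument that
# names a blob; warn about and skip the others. Returns 1 if any
# argument was skipped, 0 otherwise.
resolve_blobs() {
	rc=0
	for spec in "$@"; do
		# "<name>^{blob}" fails unless <name> resolves to a blob, so one
		# rev-parse call checks both existence and type.
		if id=$(git rev-parse --verify --quiet "$spec^{blob}"); then
			printf '%s\n' "$id"
		else
			echo "warning: $spec is not a blob, skipping" >&2
			rc=1
		fi
	done
	return $rc
}
```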

> +}
> +
> +# find all the unreachable blobs
> +function find_unreachable() {
> +	FULLNESS="--no-full"
> +
> +	if [ $DO_FULL == 1 ]; then FULLNESS="--full"; fi
> +
> +	BLOBS=($(git fsck --unreachable --no-reflogs \
> +		"${FULLNESS}" --no-progress | sed -ne 's/^unreachable blob //p'))
> +}

If you are going to do a full sweep with fsck anyway, perhaps you could
make it do the work of writing out lost-found (with "--lost-found"), so
that you can iterate over them?
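For comparison, below is a sketch of iterating over what `git fsck --lost-found` writes; the helper name is hypothetical. That option saves only *dangling* objects. A lost blob that is still referenced by some other unreachable tree will therefore not appear in `lost-found/other/`. That directory can also hold dangling trees, so the sketch checks the type of each entry.

```shell
# Sketch (hypothetical helper): let "git fsck --lost-found" save each
# dangling object under $GIT_DIR/lost-found, then print the paths of
# the saved files that hold blob contents.
list_lost_found_blobs() {
	git_dir=$(git rev-parse --git-dir) || return 1
	# fsck lists the dangling objects on stdout; only the files matter here.
	git fsck --lost-found --no-progress >/dev/null || return 1
	for f in "$git_dir"/lost-found/other/*; do
		[ -f "$f" ] || continue
		# Each file is named after its object ID; trees land here too.
		if [ "$(git cat-file -t "${f##*/}")" = blob ]; then
			printf '%s\n' "$f"
		fi
	done
}
```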

As a blob that is only reachable from a commit in reflog and not in
the histories that are alive is something a user would want to recover,
the use of --no-reflogs option makes sense to me.

> +function read_one_file {
> +	BLOB=$1
> +	FILTER_NAME=$2
> +	ARGS=()
> +
> +	if [ -z "$FILTER_NAME" ]; then
> +		ARGS+=("blob")
> +	else
> +		ARGS+=("--filters" "--path=$FILTER_NAME")
> +	fi
> +
> +	git cat-file "${ARGS[@]}" "$BLOB"
> +}

This takes a blob object name and an optional filename, and drives
cat-file, possibly with "--filters --path=name", to grab its contents.
I think it is a good idea to use "--filters", which does the equivalent
of the smudge codepath, as you eventually want to materialize the
found contents as a working tree file ...

> +function write_one_file {
> +	BLOB=$1
> +	FILTER_NAME=$2
> +	OUTPUT_NAME=$3
> +
> +	ABBREV=$(git rev-parse --short "${BLOB}")
> +
> +	echo -n "Writing $ABBREV: "
> +	read_one_file "$BLOB" "$FILTER_NAME" > "$OUTPUT_NAME"
> +	echo "$OUTPUT_NAME."
> +}

... which happens here.

> +function unique_filename {
> +	if [ ! -f "${BLOB}" ]; then
> +		echo "$BLOB"
> +	else
> +		cnt=1
> +		while true
> +		do
> +			fn="${BLOB}~${cnt}"
> +			if [ ! -f "${fn}" ]; then
> +				echo "${fn}"
> +				break
> +			fi
> +			cnt=$((cnt+1))
> +		done
> +	fi
> +}

The function comes up with a name for a given blob to be written in
the directory in which the user happened to have started the
command.  The function cannot be used unless we are processing each
blob fully before moving onto the next one. In other words,
"resurrect abcde --file=Makefile abcde --file=README.pdf" would
leave the same blob in the BLOBS[] array twice, so that two
different --filters can be attempted, but the calling code cannot
first call this function for all the BLOBS[] elements to assign them
unique-filename, as this function depends on "test -f" to be able to
see what happened to the previous elements in the BLOBS[] array.
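Below is a sketch of a variant without that limitation. The helper is hypothetical and keeps the patch's `<id>~<n>` naming. It also consults a newline-separated list of names already handed out. This allows a name to be assigned to every entry before anything is written.

```shell
# Sketch (hypothetical helper): print a name for blob $1 that neither
# exists on disk nor appears in $2, a newline-separated list of names
# already assigned to earlier entries.
unique_name() {
	candidate=$1 n=0
	while [ -e "$candidate" ] ||
	      printf '%s\n' "$2" | grep -Fqx -e "$candidate"; do
		n=$((n + 1))
		candidate="$1~$n"
	done
	printf '%s\n' "$candidate"
}
```

A caller could loop over BLOBS[], appending each returned name to the list, and write the files only afterwards.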

> +function write_recoverable {
> +	for i in "${!BLOBS[@]}"; do
> +		BLOB=${BLOBS[$i]}
> +		FILTER_NAME=${FILENAMES[$i]}
> +		OUTPUT_NAME=${FILENAMES[$i]:-$(unique_filename)}
> +
> +		write_one_file "$BLOB" "$FILTER_NAME" "$OUTPUT_NAME"
> +	done
> +}

And we do that for all in BLOBS[].

> +function interactive {
> +	echo "Recoverable orphaned git blobs:"
> +
> +	sort_by_timestamp
> +	for BLOB_AND_TIMESTAMP in "${BLOB_AND_TIMESTAMPS[@]}"; do
> +		echo
> +
> +		BLOB=${BLOB_AND_TIMESTAMP::40}
> +		TIME=${BLOB_AND_TIMESTAMP:41}
> +		DATE=$([ ! -z "$TIME" ] && timestamp_to_s "$TIME" || echo "(Unknown)") 
> +
> +		echo "$BLOB  ($DATE)"
> +		show_summary "${BLOB}" | head -4 | sed -e 's/^/> /'
> +		echo
> +
> +		while true
> +		do
> +			echo -n "Recover this file? [y,n,v,f,q,?]: "
> +			read -r ans || return 1
> +
> +			case "$ans" in
> +			[yY]*)
> +				write_one_file "${BLOB}" "" "$(unique_filename)"
> +				break
> +				;;
> +			[nN]*)
> +				break
> +				;;
> +			[vV]*)
> +				view_file "${BLOB}"
> +				echo
> +				;;
> +			[fF]*)
> +				prompt_for_filename
> +
> +				if [ "$FILENAME" == "" ]; then
> +					break
> +				fi
> +
> +				write_one_file "${BLOB}" "${FILENAME}" "${FILENAME}"
> +				break
> +				;;
> +			\?*)
> +				echo
> +				echo "Do you want to recover this file?"
> +				echo " y: yes, write the file to ${BLOB}"
> +				echo " n: no, skip this file and see the next orphaned file"
> +				echo " v: view the file"
> +				echo " f: prompt for a filename to use for recovery"
> +				echo " q: quit"
> +				echo
> +				;;
> +			[qQ]*)
> +				return 0
> +				;;
> +			esac
> +		done
> +	done

This shows a snippet of each orphaned blob and offers to write it to
disk, or to show it in full to give a larger clue about its contents, etc.
I am not sure of the value of [yY] not doing the prompt thing,
possibly using the generated unique name as the default; after all,
this is interactive.
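Below is a minimal sketch of that idea, using a hypothetical helper. The generated name is shown as a default that the user accepts by pressing return. Bash's `read -e -i` could pre-fill an editable default instead; this version stays portable.

```shell
# Sketch (hypothetical helper): prompt on stderr for a filename, using
# $1 when the user just presses return; print the chosen name.
prompt_with_default() {
	printf 'Filename [%s]: ' "$1" >&2
	IFS= read -r answer
	printf '%s\n' "${answer:-$1}"
}
```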

One thing that I am mildly disappointed about in this script is that it
does not offer much over the lost-found service that fsck already provides,
other than the "interactive" loop.  A dedicated "resurrect" command should
do a lot more.

Let me throw one piece of idea out, which may or may not work well.

"fsck" finds unreachable blobs, but it should be finding trees that
are also unreachable that contain them, recursively.  A command to
truly help users recover lost blobs should be taking advantage of
that fact and spending cycles to exploit it to help them.  For
example, you could (and this won't happen inside a bash script)

 - enumerate unreachable trees and blobs;
 - identify _a_ tree the sought-blob appears in;
 - identify _a_ tree that tree appears in;
 - recursively do the above until you find no more.

That gives you _a_ path to the blob in _a_ tree.  That top tree
might be the toplevel of a working tree (it may be referenced by a
commit that is unreachable, for example), or the commit and its top
level tree may already have been lost to gc and what you have might
be a tree structure representing a subtree (e.g. you thought that
you found "Makefile", but it turns out not to be the top-level
Makefile, but "t/Makefile").

But any such name is 1000x better than a random name derived from
the object name.


* Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 15:54   ` Junio C Hamano
@ 2018-08-04 16:17     ` Robert P. J. Day
  2018-08-04 17:33       ` Todd Zullinger
  2018-08-04 16:19     ` Edward Thomson
  1 sibling, 1 reply; 8+ messages in thread
From: Robert P. J. Day @ 2018-08-04 16:17 UTC
  To: Junio C Hamano; +Cc: Edward Thomson, git

On Sat, 4 Aug 2018, Junio C Hamano wrote:

> Edward Thomson <ethomson@edwardthomson.com> writes:
>
> > Introduce git-recover, a simple script to aid in the restoration of
> > deleted worktree files.  This will look for unreachable blobs in
> > the object database and prompt users to restore them to disk,
> > either interactively or on the command-line.

> >  git-recover.sh | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 311 insertions(+)
> >  create mode 100755 git-recover.sh
>
> My first reaction was to say that I am not going to take a new
> command written only for bash, full of bashisms, and without docs,
> tests, or Makefile integration, for Git itself.  Then I
> reconsidered, as not everything related to Git is git-core, and all
> of the above traits are signs that this patch is _not_ meant for git-core.
>
> In other words, I think this patch can be a fine addition to
> somebody else's project (i.e. random collection of scripts that may
> help Git users), so let's see how I can offer comments/inputs to
> help you improve it.  So I won't comment on language, log message, or
> shell scripting style; these are project conventions, and the
> git-core conventions won't be relevant to this patch.

  not sure how relevant this is, but fedora bundles a bunch of neat
utilities into two packages: git-tools and git-extras. i have no idea
what relationship those packages have to official git, or who decides
what goes into them.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                  http://crashcourse.ca/dokuwiki

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================


* Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 15:54   ` Junio C Hamano
  2018-08-04 16:17     ` Robert P. J. Day
@ 2018-08-04 16:19     ` Edward Thomson
  2018-08-04 16:48       ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Edward Thomson @ 2018-08-04 16:19 UTC
  To: Junio C Hamano; +Cc: git

On Sat, Aug 04, 2018 at 08:54:49AM -0700, Junio C Hamano wrote:
> 
> My first reaction was to say that I am not going to take a new
> command written only for bash, full of bashisms, and without docs,
> tests, or Makefile integration, for Git itself.  Then I
> reconsidered, as not everything related to Git is git-core, and all
> of the above traits are signs that this patch is _not_ meant for git-core.

Yes, obviously I was not suggesting that this would be mergeable with
the bashisms, as I mentioned in my cover letter.

In any case, it sounds like you're not particularly interested in
this, although I certainly appreciate you taking the time to suggest
improvements despite that.  There's some good feedback there.

Cheers-
-ed


* Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 16:19     ` Edward Thomson
@ 2018-08-04 16:48       ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2018-08-04 16:48 UTC
  To: Edward Thomson; +Cc: git

Edward Thomson <ethomson@edwardthomson.com> writes:

> In any case, it sounds like you're not particularly interested in
> this, although I certainly appreciate you taking the time to suggest
> improvements despite that.  There's some good feedback there.

Not in its current shape.  But do not take this the wrong way.  It
may be useful in a third-party script collection in its current
shape already.

More importantly, I am not opposed to having a "resurrect" utility in
the core distribution.  It just has to be a lot better than what
"grep -e 'I think I wrote this string' .git/lost-found/other/*"
gives us.

Filename discovery (perhaps from lost trees, which was the idea I
wrote in the message I am responding to, but others may come up with
better alternative approaches) is a must, but not primarily because
such a grep won't find the path to which the contents should go.
When a user says "I think I wrote this string in the file I am
looking for", s/he already knows what s/he wants to recover (i.e. it
was a README file at the top-level).  Filename discovery is a must
because grepping in the raw blob contents without smudge filter
chain applied may not find what we want in the first place, and for
that to happen, we need to have a filename.

	Side note.  That may mean that even when working in the
	do-recover mode, the script may want to take a filename,
	letting the user say "pretend all lost blobs are of this
	type, as that is the type of the blob I just lost and am
	interested in, and a filename will help you find an
	appropriate smudge and/or textconv filter to help me".

That makes me realize that I did not mention one more thing, besides
the "interactive loop", that I liked in the script over what
lost-found gives us: smudge filter support.  I do not very often
work with contents that need clean/smudge other than in one project
(obviously not "git.git"), and I can see how it is essential in
helping the user to find the contents the user is looking for.

Thanks.


* Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
  2018-08-04 16:17     ` Robert P. J. Day
@ 2018-08-04 17:33       ` Todd Zullinger
  0 siblings, 0 replies; 8+ messages in thread
From: Todd Zullinger @ 2018-08-04 17:33 UTC
  To: Robert P. J. Day; +Cc: Junio C Hamano, Edward Thomson, git

Hi,

Robert P. J. Day wrote:
> On Sat, 4 Aug 2018, Junio C Hamano wrote:
>> In other words, I think this patch can be a fine addition to
>> somebody else's project (i.e. random collection of scripts that may
>> help Git users), so let's see how I can offer comments/inputs to
>> help you improve it.  So I won't comment on language, log message, or
>> shell scripting style; these are project conventions, and the
>> git-core conventions won't be relevant to this patch.
> 
>   not sure how relevant this is, but fedora bundles a bunch of neat
> utilities into two packages: git-tools and git-extras. i have no idea
> what relationship those packages have to official git, or who decides
> what goes into them.

For anyone curious, those packages (git-extras and
git-tools) are both entirely separate projects upstream and
in the fedora packaging.  A git-recover script may well be a
good fit in one of those upstream projects.

The git-(extras|tools) package names are a bit confusing
IMO.  But it's probably more confusing that they each add a
number of git-* commands in the default PATH the way they're
packaged.

We do package some bits from contrib/ (e.g. completion,
subtree, etc.) in the fedora git packages.  We don't add
scripts and commands from outside of the git tarballs as
part of the fedora git package, though.

So far, I don't recall anyone filing a bug report about
commands from git-extras or git-tools against git.  So it
seems that users of those additional packages aren't being
confused, thankfully.

-- 
Todd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Between two evils, I always pick the one I never tried before.
    -- Mae West



* Re: [RFC PATCH 0/1] Introduce git-recover
  2018-08-04 14:22 [RFC PATCH 0/1] Introduce git-recover Edward Thomson
  2018-08-04 14:24 ` [RFC PATCH 1/1] recover: restoration of deleted worktree files Edward Thomson
@ 2018-08-05  1:34 ` Jonathan Nieder
  1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Nieder @ 2018-08-05  1:34 UTC
  To: Edward Thomson; +Cc: git

Hi,

Edward Thomson wrote:

> I created a simple shell script a while back to help people recover
> files that they deleted from their working directory (but had been added
> to the repository), which looks for unreachable blobs in the object
> database and places them in the working directory (either en masse,
> interactively, or via command-line arguments).

Cool!  Most of this belongs in the commit message, which is part of why
I always discourage having a separate cover letter in single-patch
series.

> This has been available at https://github.com/ethomson/git-recover for
> about a year, and in that time, someone has suggested that I propose
> this as part of git itself.  So I thought I'd see if there's any
> interest in this.
>
> If there is, I'd like to get a sense of the amount of work required to
> make this suitable for inclusion.  There are some larger pieces of work
> required -- at a minimum, I think this requires:
>
> - Tests -- there are none, which is fine with me but probably less fine
>   for inclusion here.
> - Documentation -- the current README is below but it will need proper
>   documentation that can be rendered into manpages, etc, by the tools.
> - Remove bashisms -- there are many.

One possible path in that direction would be to "stage" the code in
contrib/ first, while documenting the intention of graduating to a
command in git itself.  Then the list can pitch in with those tasks.
There are good reasons for a tool to exist outside of Git, so I
wouldn't recommend this unless we have a clear plan for its graduation
that we've agreed upon as a project, but thought I should mention it
as a mechanism in case we decide to do that.

The trend these days for Git commands has been to prefer to have them
in C.  Portable shell is a perfectly fine stopping point on the way
there, though.

My more fundamental thought is separate from those logistics: how
does this relate to "git fsck --lost-found"?  What would your ideal
interface to solve this problem look like?  Can we make Git's commands
complement each other in a good way to solve it well?

Thanks,
Jonathan


Thread overview: 8+ messages
2018-08-04 14:22 [RFC PATCH 0/1] Introduce git-recover Edward Thomson
2018-08-04 14:24 ` [RFC PATCH 1/1] recover: restoration of deleted worktree files Edward Thomson
2018-08-04 15:54   ` Junio C Hamano
2018-08-04 16:17     ` Robert P. J. Day
2018-08-04 17:33       ` Todd Zullinger
2018-08-04 16:19     ` Edward Thomson
2018-08-04 16:48       ` Junio C Hamano
2018-08-05  1:34 ` [RFC PATCH 0/1] Introduce git-recover Jonathan Nieder
