git@vger.kernel.org mailing list mirror (one of many)
* [BUG] git add -A --dry-run adds files to .git directory
@ 2021-10-12 14:55 git.mexon
  2021-10-12 19:15 ` [PATCH] add: don't write objects with --dry-run René Scharfe
  0 siblings, 1 reply; 5+ messages in thread
From: git.mexon @ 2021-10-12 14:55 UTC
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

mat@charly:~$ git init repo
mat@charly:~$ cat /dev/random | head -c 100000000 > repo/randomfile
mat@charly:~$ git -C repo add -A -n
mat@charly:~$ du -hs repo/.git

What did you expect to happen? (Expected behavior)

  76K    repo/.git

What happened instead? (Actual behavior)

  97M    repo/.git

What's different between what you expected and what actually happened?

Even though I specified -n on the command line to make a dry run, it 
still added the large file as an object.
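
The same check can be made without filling the disk. The following is a minimal sketch, not part of the original report: it uses a temporary directory and a 1 MB file (both illustrative choices) and compares the loose-object count reported by "git count-objects" before and after the dry run. On affected versions the count is expected to rise; with the behaviour proposed later in this thread it should stay the same.

```shell
#!/bin/sh
# Sketch: does "git add -A -n" write loose objects?
# The temporary directory and file name are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"

# 1 MB of random data is enough to make a stray object visible.
head -c 1000000 /dev/urandom >"$repo/randomfile"

# "git count-objects" prints "<n> objects, <m> kilobytes"; keep <n>.
count_before=$(git -C "$repo" count-objects | cut -d' ' -f1)
git -C "$repo" add -A -n >/dev/null
count_after=$(git -C "$repo" count-objects | cut -d' ' -f1)

# The index must be untouched by a dry run on any version.
staged=$(git -C "$repo" ls-files)

echo "loose objects before: $count_before, after: $count_after"
echo "staged paths: ${staged:-<none>}"
```

If the two counts differ, the dry run wrote to the object database even though nothing was staged.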

Anything else you want to add:

This is counter-intuitive behaviour.  When the documentation says "Don't 
actually add the file(s)", it's reasonable to expect that this command 
will leave the repository state as it found it.  If there's a good 
reason why these files should be added to the .git directory, that 
behaviour should be made clear in the documentation, probably including 
the reason why that counter-intuitive behaviour is necessary.

This bit me due to a script that was using git add -A -n to detect if a 
repository had been locally modified.  While I do have alternative 
(actually better) ways to get this information, it wasn't an 
unreasonable command to use in the first place, and it caused a real 
issue in a production system.
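
For scripts that only need to know whether a working tree has local changes, one commonly used alternative is "git status --porcelain". The sketch below is an assumption about what such a check could look like, not necessarily the method the reporter switched to; is_modified is a hypothetical helper name. The --no-optional-locks option asks Git to skip optional operations that need a lock, such as the opportunistic index refresh that "git status" otherwise performs.

```shell
#!/bin/sh
# is_modified is a hypothetical helper: succeeds if the repository at $1
# has staged, unstaged, or untracked changes.
is_modified() {
	# --porcelain output is stable for scripts; any output means "dirty".
	test -n "$(git --no-optional-locks -C "$1" status --porcelain)"
}

repo=$(mktemp -d)
git init -q "$repo"

# A freshly initialised repository has nothing to report.
if is_modified "$repo"; then clean_state=modified; else clean_state=clean; fi

# An untracked file is enough to make it count as modified.
echo content >"$repo/file"
if is_modified "$repo"; then dirty_state=modified; else dirty_state=clean; fi

echo "$clean_state $dirty_state"
```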

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.30.1 (Apple Git-130)
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Darwin 20.6.0 Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 
PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64 x86_64
compiler info: clang: 13.0.0 (clang-1300.0.29.3)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]




* [PATCH] add: don't write objects with --dry-run
  2021-10-12 14:55 [BUG] git add -A --dry-run adds files to .git directory git.mexon
@ 2021-10-12 19:15 ` René Scharfe
  2021-10-12 20:15   ` Junio C Hamano
  2021-10-12 20:17   ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 5+ messages in thread
From: René Scharfe @ 2021-10-12 19:15 UTC
  To: git.mexon, git; +Cc: Junio C Hamano

When the option --dry-run/-n is given, "git add" doesn't change the
index, but still writes out new object files.  Only hash the latter
without writing instead to make the run as dry as possible.

Use this opportunity to also make the hash_flags variable unsigned,
to match the index_path() parameter it is used as.

Reported-by: git.mexon@spamgourmet.com
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Am I missing something?  Do we sometimes rely on the written objects
within the "git add --dry-run" command?

 read-cache.c          | 2 +-
 t/t2200-add-update.sh | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index a78b88a41b..7fcc948077 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -738,7 +738,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	int intent_only = flags & ADD_CACHE_INTENT;
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
-	int hash_flags = HASH_WRITE_OBJECT;
+	unsigned hash_flags = pretend ? 0 : HASH_WRITE_OBJECT;
 	struct object_id oid;

 	if (flags & ADD_CACHE_RENORMALIZE)
diff --git a/t/t2200-add-update.sh b/t/t2200-add-update.sh
index 45ca35d60a..94c4cb0672 100755
--- a/t/t2200-add-update.sh
+++ b/t/t2200-add-update.sh
@@ -129,12 +129,15 @@ test_expect_success 'add -n -u should not add but just report' '
 		echo "remove '\''top'\''"
 	) >expect &&
 	before=$(git ls-files -s check top) &&
+	git count-objects -v >objects_before &&
 	echo changed >>check &&
 	rm -f top &&
 	git add -n -u >actual &&
 	after=$(git ls-files -s check top) &&
+	git count-objects -v >objects_after &&

 	test "$before" = "$after" &&
+	test_cmp objects_before objects_after &&
 	test_cmp expect actual

 '
--
2.33.0



* Re: [PATCH] add: don't write objects with --dry-run
  2021-10-12 19:15 ` [PATCH] add: don't write objects with --dry-run René Scharfe
@ 2021-10-12 20:15   ` Junio C Hamano
  2021-10-12 20:17   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-10-12 20:15 UTC
  To: René Scharfe; +Cc: git.mexon, git

René Scharfe <l.s.r@web.de> writes:

> When the option --dry-run/-n is given, "git add" doesn't change the
> index, but still writes out new object files.  Only hash the latter
> without writing instead to make the run as dry as possible.
>
> Use this opportunity to also make the hash_flags variable unsigned,
> to match the index_path() parameter it is used as.
>
> Reported-by: git.mexon@spamgourmet.com
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Am I missing something?  Do we sometimes rely on the written objects
> within the "git add --dry-run" command?

Good question.  I cannot think of anything offhand, but this obvious
"omission" makes me suspect that we may be forgetting something.

Thanks.




* Re: [PATCH] add: don't write objects with --dry-run
  2021-10-12 19:15 ` [PATCH] add: don't write objects with --dry-run René Scharfe
  2021-10-12 20:15   ` Junio C Hamano
@ 2021-10-12 20:17   ` Ævar Arnfjörð Bjarmason
  2021-10-12 20:37     ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-12 20:17 UTC
  To: René Scharfe; +Cc: git.mexon, git, Junio C Hamano


On Tue, Oct 12 2021, René Scharfe wrote:

> When the option --dry-run/-n is given, "git add" doesn't change the
> index, but still writes out new object files.  Only hash the latter
> without writing instead to make the run as dry as possible.
>
> Use this opportunity to also make the hash_flags variable unsigned,
> to match the index_path() parameter it is used as.
>
> Reported-by: git.mexon@spamgourmet.com
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Am I missing something?  Do we sometimes rely on the written objects
> within the "git add --dry-run" command?

Probably not.  Here's a semi-related patch of mine that never got
integrated.  E.g. you'll probably find that even if you're not writing
objects we're still doing things like zlib compression here too (or not,
I haven't looked):
https://lore.kernel.org/git/20190520222932.22843-1-avarab@gmail.com/

I think the "git fetch --dry-run" command behaves like this too,
i.e. doesn't update refs, but fetches and writes objects.

For the patch I hacked up I think it's easy to argue that it shouldn't
do compression etc.

For this sort of thing and "fetch" I'm not so sure. Do we really know
that there aren't people who rely on this for say the performance of
seeing what an operation would do, and then not pay as much for the
"real one" that updates the index/refs/etc. later? Is that subsequent
"fetch" cheaper because of the --dry-run?

Maybe not, but it seems like something to look into.
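
One way to look into it is to fetch from a local repository with --dry-run and then inspect both the refs and the object store of the receiving side. This is only a sketch under assumptions: the two temporary repositories, the file name, and the committer identity are made up for illustration, and whether any objects show up may depend on the Git version in use.

```shell
#!/bin/sh
# Sketch: what does "git fetch --dry-run" leave behind?
# src and dst are throwaway repositories created for this experiment.
src=$(mktemp -d)
dst=$(mktemp -d)
git init -q "$src"
git init -q "$dst"

# Create one commit in the source; the identity is a placeholder.
echo data >"$src/file"
git -C "$src" add file
git -C "$src" -c user.name=Example -c user.email=example@example.invalid \
	commit -q -m initial

git -C "$dst" remote add origin "$src"
git -C "$dst" fetch --dry-run -q origin

# A dry run must not create any refs in the receiving repository.
refs_after=$(git -C "$dst" for-each-ref)
# Report whatever objects (loose or packed) the dry run left behind.
objects_after=$(git -C "$dst" count-objects -v)

echo "refs after dry-run fetch: ${refs_after:-<none>}"
echo "$objects_after"
```

A non-zero "count" or "in-pack" value in the output would mean the dry run transferred objects, which a later real fetch would not have to download again.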


* Re: [PATCH] add: don't write objects with --dry-run
  2021-10-12 20:17   ` Ævar Arnfjörð Bjarmason
@ 2021-10-12 20:37     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-10-12 20:37 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: René Scharfe, git.mexon, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think the "git fetch --dry-run" command behaves like this too,
> i.e. doesn't update refs, but fetches and writes objects.
>
> For the patch I hacked up I think it's easy to argue that it shouldn't
> do compression etc.
>
> For this sort of thing and "fetch" I'm not so sure. Do we really know
> that there aren't people who rely on this for say the performance of
> seeing what an operation would do, and then not pay as much for the
> "real one" that updates the index/refs/etc. later? Is that subsequent
> "fetch" cheaper because of the --dry-run?

The answer to the last one is an easy "yes".  Trying to gauge the
time it would take for a real fetch with "--dry-run" is a losing
battle, I would think, as the pre-fetching would make the "real" one
cheaper, so from that point of view, I think we can ignore those who
time "--dry-run" and try to figure out anything meaningful.

This in any case is an interesting area, as the definition of
correctness of what "dry-run" does can be quite fuzzy.  As long as
it does not change the index, "git add --dry-run", even if it writes
objects or detects filesystem corruption by noticing an I/O error while
compressing the data taken from the working tree files, is still
correct and the patch in question is not technically a bugfix (it is
a performance thing).  "git fetch --dry-run" would fall into the
same category, so would "git hash-object" without "-w".

All of them could use performance enhancements without breaking
existing users, I would think.
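
The "git hash-object" case shows the distinction between hashing and writing in isolation. The following is a small sketch, using a throwaway repository and file name chosen for illustration: without "-w" the command only computes the object ID, and with "-w" the same ID is also stored in the object database.

```shell
#!/bin/sh
# Sketch: hashing versus writing with "git hash-object".
repo=$(mktemp -d)
git init -q "$repo"
echo "hello" >"$repo/file"

# Without -w: compute the object ID only.
oid=$(git -C "$repo" hash-object file)
if git -C "$repo" cat-file -e "$oid" 2>/dev/null
then exists_before=yes
else exists_before=no
fi

# With -w: compute the same ID and write the blob.
oid_written=$(git -C "$repo" hash-object -w file)
if git -C "$repo" cat-file -e "$oid_written" 2>/dev/null
then exists_after=yes
else exists_after=no
fi

echo "stored after plain hash: $exists_before, after -w: $exists_after"
```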

Thanks.
