git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] builtin/ls-files.c:add git ls-file --dedup option
@ 2021-01-06  8:53 阿德烈 via GitGitGadget
  2021-01-07  6:10 ` Eric Sunshine
  2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
  0 siblings, 2 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-06  8:53 UTC (permalink / raw)
  To: git; +Cc: 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

1. When git ls-files is used with both -m and -d,
the same path can be listed more than once, which
is sometimes confusing.
2. While a branch merge is in conflict, plain
git ls-files also lists the same file name
multiple times.

Therefore, add a --dedup option to git ls-files.
1. With -m, -d, and --dedup together, a deleted file is
listed only once, as deleted.
2. During a conflicted merge, --dedup removes the duplicate
file names (unless -s or -u is used).

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    builtin/ls-files.c:add git ls-file --dedup option
    
    I am reading the source code of git ls-files and learned that git
    ls-files may output duplicate entries when a conflict occurs in a
    branch merge or when different options are used at the same time.
    Users may feel confused when they see these duplicate entries.
    
    As Junio C Hamano said, it has odd behaviour.
    
    Therefore, we can provide an additional option to git ls-files to
    suppress these repeated entries.
    
    This fixes https://github.com/gitgitgadget/git/issues/198
    
    Thanks!

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/832

 builtin/ls-files.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..66a7e251a46 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int delete_dup;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_stage=NULL;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -315,7 +317,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	if (show_cached || show_stage) {
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
-
+			if(show_cached && delete_dup){
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage=ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -336,7 +351,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
 			int err;
-
+			if(delete_dup){
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage=ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -347,10 +375,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			if(delete_dup && show_deleted && show_modified && err)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else{
+				if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified && ie_modified(repo->index, ce, &st, 0))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		}
 	}
 
@@ -578,6 +610,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),
 		OPT_END()
 	};
 

base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
-- 
gitgitgadget


* Re: [PATCH] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-06  8:53 [PATCH] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
@ 2021-01-07  6:10 ` Eric Sunshine
  2021-01-07  6:40   ` Junio C Hamano
  2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
  1 sibling, 1 reply; 65+ messages in thread
From: Eric Sunshine @ 2021-01-07  6:10 UTC (permalink / raw)
  To: 阿德烈 via GitGitGadget; +Cc: Git List, ZheNing Hu

On Wed, Jan 6, 2021 at 3:54 AM 阿德烈 via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> [...]
> Therefore, add a --dedup option to git ls-files.
> 1. With -m, -d, and --dedup together, a deleted file is
> listed only once, as deleted.
> 2. During a conflicted merge, --dedup removes the duplicate
> file names (unless -s or -u is used).

I'm just pointing out a few minor style issues below; I'm not properly
reviewing the patch...

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  builtin/ls-files.c | 43 ++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 38 insertions(+), 5 deletions(-)

This change adds a new command-line option, so the documentation
(Documentation/git-ls-files.txt) should be updated and at least one
new test should be added (in one of the t/t30??-ls-files-*.sh scripts
probably).

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> @@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>         struct strbuf fullname = STRBUF_INIT;
> +       const struct cache_entry *last_stage=NULL;

Add spaces around `=` similar to the preceding line:

    const struct cache_entry *last_stage = NULL;

> @@ -315,7 +317,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>                 for (i = 0; i < repo->index->cache_nr; i++) {
>                         const struct cache_entry *ce = repo->index->cache[i];
> -

This patch deletes the blank line but this project usually prefers to
have a blank line after declarations.

> +                       if(show_cached && delete_dup){

Add space after `if` and before `{`:

    if (show_cached && delete_dup) {

> @@ -336,7 +351,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> +                       if(delete_dup){

Style: if (delete_dup) {

> @@ -347,10 +375,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> -                       if (show_deleted && err)
> +                       if(delete_dup && show_deleted && show_modified && err)

Style: if (delete_dup && ...

> -                       if (show_modified && ie_modified(repo->index, ce, &st, 0))
> -                               show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +                       else{

Style: else {

> +                               if (show_deleted && err)/* you can't find it,so it's actually removed at all! */

Add space before `/* comment */`.
Add space in "...it, so...".

> @@ -578,6 +610,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> +               OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),

The short help makes it seem like it's modifying the index. Perhaps instead:

    N_("suppress duplicate entries")


* Re: [PATCH] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-07  6:10 ` Eric Sunshine
@ 2021-01-07  6:40   ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-07  6:40 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: 阿德烈 via GitGitGadget, Git List, ZheNing Hu

Eric Sunshine <sunshine@sunshineco.com> writes:

>> +                               if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
>
> Add space before `/* comment */`.
> Add space in "...it, so...".

Avoid overly long lines.


* [PATCH v2 0/2] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-06  8:53 [PATCH] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-07  6:10 ` Eric Sunshine
@ 2021-01-08 14:36 ` 阿德烈 via GitGitGadget
  2021-01-08 14:36   ` [PATCH v2 1/2] " ZheNing Hu via GitGitGadget
                     ` (2 more replies)
  1 sibling, 3 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-08 14:36 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, 阿德烈

I am reading the source code of git ls-files and learned that git ls-files
may output duplicate entries when a conflict occurs in a branch merge or when
different options are used at the same time. Users may feel confused when
they see these duplicate entries.

As Junio C Hamano said, it has odd behaviour.

Therefore, we can provide an additional option to git ls-files to suppress
these repeated entries.
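
The duplication described above can be reproduced with stock git. The following is a minimal sketch, not part of the patch: the file name, branch name, and identity are placeholders, and `uniq` stands in for the proposed option so that the script does not depend on `--dedup` being available.

```shell
#!/bin/sh
# Sketch: show that plain `git ls-files` repeats a conflicted path.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo base >a.txt
git add a.txt
git commit -q -m "base"

git checkout -q -b dev                       # hypothetical branch name
echo dev >a.txt
git commit -q -a -m "dev change"

git checkout -q -                            # back to the initial branch
echo other >a.txt
git commit -q -a -m "other change"

# Both sides changed a.txt, so the merge stops with a conflict.
git merge dev >/dev/null 2>&1 || true

echo "plain ls-files:"
git ls-files            # a.txt appears once per index stage (1, 2, 3)
echo "filtered through uniq:"
git ls-files | uniq     # adjacent duplicates collapsed outside of git
```

Because the index is sorted by path, the repeated entries are adjacent, which is why a simple "same name as the previous entry" check is enough.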

This fixes https://github.com/gitgitgadget/git/issues/198

Thanks!

ZheNing Hu (2):
  builtin/ls-files.c:add git ls-file --dedup option
  builtin:ls-files.c:add git ls-file --dedup option

 Documentation/git-ls-files.txt |  4 +++
 builtin/ls-files.c             | 41 ++++++++++++++++++++--
 t/t3012-ls-files-dedup.sh      | 63 ++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 3 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v1:

 1:  0261e5d245e = 1:  0261e5d245e builtin/ls-files.c:add git ls-file --dedup option
 -:  ----------- > 2:  a09a5098aa6 builtin:ls-files.c:add git ls-file --dedup option

-- 
gitgitgadget


* [PATCH v2 1/2] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
@ 2021-01-08 14:36   ` ZheNing Hu via GitGitGadget
  2021-01-08 14:36   ` [PATCH v2 2/2] builtin:ls-files.c:add " ZheNing Hu via GitGitGadget
  2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
  2 siblings, 0 replies; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-08 14:36 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

1. When git ls-files is used with both -m and -d,
the same path can be listed more than once, which
is sometimes confusing.
2. While a branch merge is in conflict, plain
git ls-files also lists the same file name
multiple times.

Therefore, add a --dedup option to git ls-files.
1. With -m, -d, and --dedup together, a deleted file is
listed only once, as deleted.
2. During a conflicted merge, --dedup removes the duplicate
file names (unless -s or -u is used).

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/ls-files.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..66a7e251a46 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int delete_dup;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_stage=NULL;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -315,7 +317,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	if (show_cached || show_stage) {
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
-
+			if(show_cached && delete_dup){
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage=ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -336,7 +351,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
 			int err;
-
+			if(delete_dup){
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage=ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -347,10 +375,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			if(delete_dup && show_deleted && show_modified && err)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else{
+				if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified && ie_modified(repo->index, ce, &st, 0))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		}
 	}
 
@@ -578,6 +610,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),
 		OPT_END()
 	};
 
-- 
gitgitgadget



* [PATCH v2 2/2] builtin:ls-files.c:add git ls-file --dedup option
  2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
  2021-01-08 14:36   ` [PATCH v2 1/2] " ZheNing Hu via GitGitGadget
@ 2021-01-08 14:36   ` ZheNing Hu via GitGitGadget
  2021-01-14  6:38     ` Eric Sunshine
  2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
  2 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-08 14:36 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

This commit standardizes the code format.
It also documents the git ls-files --dedup option in
Documentation/git-ls-files.txt and adds the test script
t/t3012-ls-files-dedup.sh to check that the --dedup
option works correctly.

This patch fixes https://github.com/gitgitgadget/git/issues/198
Thanks.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-ls-files.txt |  4 +++
 builtin/ls-files.c             | 20 ++++++-----
 t/t3012-ls-files-dedup.sh      | 63 ++++++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 9 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..41a9c5a8b27 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--dedup]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,9 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--dedup::
+	Suppress duplicates entries when conflicts happen or
+	specify -d -m at the same time.
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 66a7e251a46..bc4eded19ab 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -302,7 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
-	const struct cache_entry *last_stage=NULL;
+	const struct cache_entry *last_stage = NULL;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -317,7 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	if (show_cached || show_stage) {
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
-			if(show_cached && delete_dup){
+
+			if (show_cached && delete_dup) {
 				switch (ce_stage(ce)) {
 				case 0:
 				default:
@@ -328,7 +329,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 					if (last_stage &&
 					!strcmp(last_stage->name, ce->name))
 						continue;
-					last_stage=ce;
+					last_stage = ce;
 				}
 			}
 			construct_fullname(&fullname, repo, ce);
@@ -351,7 +352,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
 			int err;
-			if(delete_dup){
+
+			if (delete_dup) {
 				switch (ce_stage(ce)) {
 				case 0:
 				default:
@@ -362,7 +364,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 					if (last_stage &&
 					!strcmp(last_stage->name, ce->name))
 						continue;
-					last_stage=ce;
+					last_stage = ce;
 				}
 			}
 			construct_fullname(&fullname, repo, ce);
@@ -375,10 +377,10 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if(delete_dup && show_deleted && show_modified && err)
+			if (delete_dup && show_deleted && show_modified && err)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			else{
-				if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
+			else {
+				if (show_deleted && err)
 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
 				if (show_modified && ie_modified(repo->index, ce, &st, 0))
 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
@@ -610,7 +612,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
-		OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),
+		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..00c7f65cfc1
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+test_description='git ls-files --dedup test.
+
+This test prepares the following in the cache:
+
+    a.txt       - a file(base)
+    a.txt	- a file(master)
+    a.txt       - a file(dev)
+    b.txt       - a file
+    delete.txt  - a file
+    expect1	- a file
+    expect2	- a file
+
+'
+
+. ./test-lib.sh
+
+test_expect_success 'master branch setup and write expect1 expect2 and commit' '
+	touch a.txt &&
+	touch b.txt &&
+	touch delete.txt &&
+	cat <<-EOF >expect1 &&
+	M a.txt
+	H b.txt
+	H delete.txt
+	H expect1
+	H expect2
+	EOF
+	cat <<-EOF >expect2 &&
+	C a.txt
+	R delete.txt
+	EOF
+	git add a.txt b.txt delete.txt expect1 expect2 &&
+	git commit -m master:1
+'
+
+test_expect_success 'main commit again' '
+	echo a>a.txt &&
+	echo b>b.txt &&
+	echo delete>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m master:2
+'
+
+test_expect_success 'dev commit' '
+	git checkout HEAD~ &&
+	git switch -c dev &&
+	echo change>a.txt &&
+	git add a.txt &&
+	git commit -m dev:1
+'
+
+test_expect_success 'dev merge master' '
+	test_must_fail git merge master &&
+	git ls-files -t --dedup >actual1 &&
+	test_cmp expect1 actual1 &&
+	rm delete.txt &&
+	git ls-files -d -m -t --dedup >actual2 &&
+	test_cmp expect2 actual2
+'
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v2 2/2] builtin:ls-files.c:add git ls-file --dedup option
  2021-01-08 14:36   ` [PATCH v2 2/2] builtin:ls-files.c:add " ZheNing Hu via GitGitGadget
@ 2021-01-14  6:38     ` Eric Sunshine
  2021-01-14  8:17       ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Sunshine @ 2021-01-14  6:38 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: Git List, ZheNing Hu

On Fri, Jan 8, 2021 at 9:36 AM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> builtin:ls-files.c:add git ls-file --dedup option

This subject concisely explains the purpose of the patch. That's good.
A more typical way to write it would be:

    ls-files: add --dedup option

> This commit standardizes the code format.

Fixing problems pointed out by reviewers is good. Normally, however,
when you submit a new version of your patch or patch series, you
should apply these fixes directly to the patch(es) which introduced
the problems in the first place rather than adding one or more
additional patches to fix problems introduced in earlier patches. To
do this, you typically would use `git rebase -i` or `git commit
--amend` to squash the fixes into the problematic patches. Thus, when
you re-submit the patches, they will appear to be "perfect".

For this particular two-patch series, patch [2/2] is doing two things:
(1) fixing style problems from patch [1/2], and (2) adding
documentation and tests which logically belong with the feature added
by patch [1/2]. Taking the above advice into account, a better
presentation when you re-submit this series would be to squash these
two patches into a single patch.
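
As an illustration of the squashing workflow described above, here is a minimal sketch in a throwaway repository. The file name, file contents, and identity are placeholders. `git commit --fixup` and `git rebase --autosquash` are one possible way to drive `git rebase -i` and are not named in this thread; setting `GIT_SEQUENCE_EDITOR=:` accepts the generated todo list unchanged so the script runs without an editor.

```shell
#!/bin/sh
# Sketch: fold a later style fix into the commit that introduced the problem.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo base >file.c                            # hypothetical file
git add file.c
git commit -q -m "base"

echo "if(x){" >>file.c                       # feature commit with a style problem
git commit -q -a -m "ls-files: add --dedup option"

echo base >file.c
echo "if (x) {" >>file.c                     # the fix a reviewer asked for
git commit -q -a --fixup=HEAD                # recorded as "fixup! <subject>"

# Reorder and squash the fixup into its target without opening an editor.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash HEAD~2

git log --format=%s                          # only the base and feature commits remain
```

When the commit to fix is the most recent one, `git commit --amend` gives the same result with less ceremony.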

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> @@ -81,6 +82,9 @@ OPTIONS
> +--dedup::
> +       Suppress duplicates entries when conflicts happen or
> +       specify -d -m at the same time.

For consistency with typesetting elsewhere in this file, use backticks
around the command-line options. It also often is a good idea to spell
the options using long form since it is typically easier to search for
the long form of an option in documentation. So, perhaps the above can
be written like this:

    Suppress duplicate entries when `--deleted` and `--modified` are
    combined.
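
For context, the `--deleted` plus `--modified` duplication can be seen with a single deleted file. This is a small sketch with a placeholder identity; it again uses `uniq` in place of the proposed option, so it only shows the behaviour being suppressed, not the patch itself.

```shell
#!/bin/sh
# Sketch: a deleted file is reported by both --deleted and --modified.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo content >delete.txt
git add delete.txt
git commit -q -m "add delete.txt"

rm delete.txt                                # unstaged deletion

echo "--deleted --modified:"
git ls-files --deleted --modified            # delete.txt is listed twice
echo "filtered through uniq:"
git ls-files --deleted --modified | uniq
```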

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> -       const struct cache_entry *last_stage=NULL;
> +       const struct cache_entry *last_stage = NULL;
> -                       if(show_cached && delete_dup){
> +                       if (show_cached && delete_dup) {
> -                                       last_stage=ce;
> +                                       last_stage = ce;
> -                       if(delete_dup){
> +                       if (delete_dup) {
> -                       if(delete_dup && show_deleted && show_modified && err)
> +                       if (delete_dup && show_deleted && show_modified && err)
> -                       else{
> -                               if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
> +                       else {
> +                               if (show_deleted && err)

As mentioned above, these style fixes should be squashed into the
first patch, rather than being done in a separate patch, so that
reviewers see a nicely polished patch rather than a patch which
requires later fixing up.

> diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> @@ -0,0 +1,63 @@
> +test_expect_success 'master branch setup and write expect1 expect2 and commit' '

We usually give this test a simple title such as "setup" so that we
don't have to worry about the title becoming outdated as people make
changes to the test itself.

> +       touch a.txt &&
> +       touch b.txt &&
> +       touch delete.txt &&

On this project, we use `touch` when the timestamp of the empty files
is important to the test. If the timestamp is not important, then we
just use `>`, like this:

    >a.txt &&
    >b.txt &&
    >delete.txt &&
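
To make the difference concrete, here is a small standalone sketch, assuming a POSIX `sh`; the file names are placeholders. Both forms create an empty file, but on an existing file `touch` only updates timestamps while `>` truncates the content.

```shell
#!/bin/sh
# Sketch: `>file` versus `touch file`.
set -e
dir=$(mktemp -d)
cd "$dir"

>a.txt              # creates an empty file using only a redirection
touch b.txt         # also creates an empty file, via the touch utility

echo data >c.txt
touch c.txt         # existing content is kept; only timestamps change
kept=$(cat c.txt)
>c.txt              # existing content is truncated to zero bytes

ls -l a.txt b.txt c.txt
```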

> +       cat <<-EOF >expect1 &&
> +       M a.txt
> +       H b.txt
> +       H delete.txt
> +       H expect1
> +       H expect2
> +       EOF
> +       cat <<-EOF >expect2 &&
> +       C a.txt
> +       R delete.txt
> +       EOF

When no variables are being interpolated in the here-doc content, we
use -\EOF to let readers know that the here-doc body is literal. So:

    cat >expect1 <<-\EOF &&
    ...
    EOF
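
The effect of quoting the delimiter can be checked outside the test suite. In this sketch the variable `name` and the output file names are placeholders. The here-doc bodies are deliberately left unindented: the `-` in `<<-` strips leading tabs only, so the example does not depend on tab characters surviving copy and paste.

```shell
#!/bin/sh
# Sketch: <<-EOF expands the body, <<-\EOF keeps it literal.
set -e
dir=$(mktemp -d)
cd "$dir"
name=world          # hypothetical variable, only here to show expansion

# Unquoted delimiter: the body is expanded like a double-quoted string.
cat >expanded <<-EOF
hello $name
EOF

# Quoted delimiter (\EOF): the body is taken literally.
cat >literal <<-\EOF
hello $name
EOF

cat expanded literal
```

With an `expect` file such as the ones in this patch, which contain no variables, both forms produce the same bytes; the quoted form simply tells the reader that nothing is meant to be expanded.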

> +       git add a.txt b.txt delete.txt expect1 expect2 &&
> +       git commit -m master:1
> +'
> +
> +test_expect_success 'main commit again' '
> +       echo a>a.txt &&
> +       echo b>b.txt &&
> +       echo delete>delete.txt &&
> +       git add a.txt b.txt delete.txt &&
> +       git commit -m master:2
> +'
> +
> +test_expect_success 'dev commit' '
> +       git checkout HEAD~ &&
> +       git switch -c dev &&
> +       echo change>a.txt &&
> +       git add a.txt &&
> +       git commit -m dev:1
> +'

These two tests following the "setup" test also seem to be doing setup
tasks rather than testing the new --dedup functionality. If this is
the case, then it probably would make sense to combine all three tests
into a single "setup" test.

> +test_expect_success 'dev merge master' '
> +       test_must_fail git merge master &&
> +       git ls-files -t --dedup >actual1 &&
> +       test_cmp expect1 actual1 &&
> +       rm delete.txt &&
> +       git ls-files -d -m -t --dedup >actual2 &&
> +       test_cmp expect2 actual2
> +'

Do you foresee that people will add more tests to this file which will
use the files and branches set up by the "setup" test(s)? If not, if
those branches and files are only ever going to be used by this one
test, then it probably would be better to combine all the above code
into a single test.


* Re: [PATCH v2 2/2] builtin:ls-files.c:add git ls-file --dedup option
  2021-01-14  6:38     ` Eric Sunshine
@ 2021-01-14  8:17       ` 胡哲宁
  0 siblings, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-14  8:17 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: ZheNing Hu via GitGitGadget, Git List

You can see that the coding and documentation of the Git community are really
very standardized, which may be one of the things I lack and need to improve ;)
Thanks for patiently correcting my errors.

Eric Sunshine <sunshine@sunshineco.com> wrote on Thu, Jan 14, 2021 at 2:39 PM:
>
> On Fri, Jan 8, 2021 at 9:36 AM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > builtin:ls-files.c:add git ls-file --dedup option
>
> This subject concisely explains the purpose of the patch. That's good.
> A more typical way to write it would be:
>
>     ls-files: add --dedup option
>
OK. I will correct it to follow the convention.
> > This commit standardizes the code format.
>
> Fixing problems pointed out by reviewers is good. Normally, however,
> when you submit a new version of your patch or patch series, you
> should apply these fixes directly to the patch(es) which introduced
> the problems in the first place rather than adding one or more
> additional patches to fix problems introduced in earlier patches. To
> do this, you typically would use `git rebase -i` or `git commit
> --amend` to squash the fixes into the problematic patches. Thus, when
> you re-submit the patches, they will appear to be "perfect".
>
> For this particular two-patch series, patch [2/2] is doing two things:
> (1) fixing style problems from patch [1/2], and (2) adding
> documentation and tests which logically belong with the feature added
> by patch [1/2]. Taking the above advice into account, a better
> presentation when you re-submit this series would be to squash these
> two patches into a single patch.
>
Before this, I thought that gitgitgadget would send the duplicate patch
over and over again. It seems I really should go straight ahead
and squash my commits, so now I know what I should do.
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> > diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> > @@ -81,6 +82,9 @@ OPTIONS
> > +--dedup::
> > +       Suppress duplicates entries when conflicts happen or
> > +       specify -d -m at the same time.
>
> For consistency with typesetting elsewhere in this file, use backticks
> around the command-line options. It also often is a good idea to spell
> the options using long form since it is typically easier to search for
> the long form of an option in documentation. So, perhaps the above can
> be written like this:
>
>     Suppress duplicate entries when `--deleted` and `--modified` are
>     combined.
>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > -       const struct cache_entry *last_stage=NULL;
> > +       const struct cache_entry *last_stage = NULL;
> > -                       if(show_cached && delete_dup){
> > +                       if (show_cached && delete_dup) {
> > -                                       last_stage=ce;
> > +                                       last_stage = ce;
> > -                       if(delete_dup){
> > +                       if (delete_dup) {
> > -                       if(delete_dup && show_deleted && show_modified && err)
> > +                       if (delete_dup && show_deleted && show_modified && err)
> > -                       else{
> > -                               if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
> > +                       else {
> > +                               if (show_deleted && err)
>
> As mentioned above, these style fixes should be squashed into the
> first patch, rather than being done in a separate patch, so that
> reviewers see a nicely polished patch rather than a patch which
> requires later fixing up.
>
> > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> > @@ -0,0 +1,63 @@
> > +test_expect_success 'master branch setup and write expect1 expect2 and commit' '
>
> We usually give this test a simple title such as "setup" so that we
> don't have to worry about the title becoming outdated as people make
> changes to the test itself.
>
> > +       touch a.txt &&
> > +       touch b.txt &&
> > +       touch delete.txt &&
>
> On this project, we use `touch` when the timestamp of the empty files
> is important to the test. If the timestamp is not important, then we
> just use `>`, like this:
>
>     >a.txt &&
>     >b.txt &&
>     >delete.txt &&
>
OK, that is probably because I always use touch to create files.
> > +       cat <<-EOF >expect1 &&
> > +       M a.txt
> > +       H b.txt
> > +       H delete.txt
> > +       H expect1
> > +       H expect2
> > +       EOF
> > +       cat <<-EOF >expect2 &&
> > +       C a.txt
> > +       R delete.txt
> > +       EOF
>
> When no variables are being interpolated in the here-doc content, we
> use -\EOF to let readers know that the here-doc body is literal. So:
>
>     cat >expect1 <<-\EOF &&
>     ...
>     EOF
>
> > +       git add a.txt b.txt delete.txt expect1 expect2 &&
> > +       git commit -m master:1
> > +'
> > +
> > +test_expect_success 'main commit again' '
> > +       echo a>a.txt &&
> > +       echo b>b.txt &&
> > +       echo delete>delete.txt &&
> > +       git add a.txt b.txt delete.txt &&
> > +       git commit -m master:2
> > +'
> > +
> > +test_expect_success 'dev commit' '
> > +       git checkout HEAD~ &&
> > +       git switch -c dev &&
> > +       echo change>a.txt &&
> > +       git add a.txt &&
> > +       git commit -m dev:1
> > +'
>
> These two tests following the "setup" test also seem to be doing setup
> tasks rather than testing the new --dedup functionality. If this is
> the case, then it probably would make sense to combine all three tests
> into a single "setup" test.
>
> > +test_expect_success 'dev merge master' '
> > +       test_must_fail git merge master &&
> > +       git ls-files -t --dedup >actual1 &&
> > +       test_cmp expect1 actual1 &&
> > +       rm delete.txt &&
> > +       git ls-files -d -m -t --dedup >actual2 &&
> > +       test_cmp expect2 actual2
> > +'
>
> Do you foresee that people will add more tests to this file which will
> use the files and branches set up by the "setup" test(s)? If not, if
> those branches and files are only ever going to be used by this one
> test, then it probably would be better to combine all the above code
> into a single test.
No, the test file probably needs only one test.

* [PATCH v3] ls-files.c: add --dedup option
  2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
  2021-01-08 14:36   ` [PATCH v2 1/2] " ZheNing Hu via GitGitGadget
  2021-01-08 14:36   ` [PATCH v2 2/2] builtin:ls-files.c:add " ZheNing Hu via GitGitGadget
@ 2021-01-14 12:22   ` 阿德烈 via GitGitGadget
  2021-01-15  0:59     ` Junio C Hamano
                       ` (2 more replies)
  2 siblings, 3 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-14 12:22 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, 阿德烈,
	ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to provide users a better experience
when viewing information about files in the index
and the working tree, the `--dedup` option will suppress
some duplicate options under some conditions.

In a merge conflict, one item of "git ls-files" output may
appear multiple times. For example,now the file `a.c` has
a conflict,`a.c` will appear three times in the output of
"git ls-files".We can use "git ls-files --dedup" to output
`a.c` only one time.(unless `--stage` or `--unmerged` is
used to view all the detailed information in the index)

In addition, if you use both `--delete` and `--modify` in
the same time, The `--dedup` option can also suppress modified
entries output.

`--dedup` option relevant descriptions in
`Documentation/git-ls-files.txt`,
the test script in `t/t3012-ls-files-dedup.sh`
prove the correctness of the `--dedup` option.

this patch fixed:
https://github.com/gitgitgadget/git/issues/198
Thanks.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    builtin/ls-files.c:add git ls-file --dedup option
    
    I am reading the source code of git ls-files and learned that git ls
    -files may have duplicate entries when conflict occurs in a branch merge
    or when different options are used at the same time. Users may fell
    confuse when they see these duplicate entries.
    
    As Junio C Hamano said ,it have odd behaviour.
    
    Therefore, we can provide an additional option to git ls-files to delete
    those repeated information.
    
    This fixes https://github.com/gitgitgadget/git/issues/198
    
    Thanks!

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v2:

 1:  0261e5d245e < -:  ----------- builtin/ls-files.c:add git ls-file --dedup option
 2:  a09a5098aa6 ! 1:  5ce52c8b7a4 builtin:ls-files.c:add git ls-file --dedup option
     @@ Metadata
      Author: ZheNing Hu <adlternative@gmail.com>
      
       ## Commit message ##
     -    builtin:ls-files.c:add git ls-file --dedup option
     +    ls-files.c: add --dedup option
      
     -    This commit standardizes the code format.
     -    For git ls-file --dedup option added
     -    relevant descriptions in Documentation/git-ls-files.txt
     -    and wrote t/t3012-ls-files-dedup.sh test script
     -    to prove the correctness of--dedup option.
     +    In order to provide users a better experience
     +    when viewing information about files in the index
     +    and the working tree, the `--dedup` option will suppress
     +    some duplicate options under some conditions.
      
     -    this patch fixed: https://github.com/gitgitgadget/git/issues/198
     +    In a merge conflict, one item of "git ls-files" output may
     +    appear multiple times. For example,now the file `a.c` has
     +    a conflict,`a.c` will appear three times in the output of
     +    "git ls-files".We can use "git ls-files --dedup" to output
     +    `a.c` only one time.(unless `--stage` or `--unmerged` is
     +    used to view all the detailed information in the index)
     +
     +    In addition, if you use both `--delete` and `--modify` in
     +    the same time, The `--dedup` option can also suppress modified
     +    entries output.
     +
     +    `--dedup` option relevant descriptions in
     +    `Documentation/git-ls-files.txt`,
     +    the test script in `t/t3012-ls-files-dedup.sh`
     +    prove the correctness of the `--dedup` option.
     +
     +    this patch fixed:
     +    https://github.com/gitgitgadget/git/issues/198
          Thanks.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
     @@ Documentation/git-ls-files.txt: OPTIONS
       	See OUTPUT below for more information.
       
      +--dedup::
     -+	Suppress duplicates entries when conflicts happen or
     -+	specify -d -m at the same time.
     ++	Suppress duplicate entries when conflict happen or `--deleted`
     ++	and `--modified` are combined.
     ++
       -x <pattern>::
       --exclude=<pattern>::
       	Skip untracked files matching pattern.
      
       ## builtin/ls-files.c ##
     +@@ builtin/ls-files.c: static int line_terminator = '\n';
     + static int debug_mode;
     + static int show_eol;
     + static int recurse_submodules;
     ++static int delete_dup;
     + 
     + static const char *prefix;
     + static int max_prefix_len;
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
       {
       	int i;
       	struct strbuf fullname = STRBUF_INIT;
     --	const struct cache_entry *last_stage=NULL;
      +	const struct cache_entry *last_stage = NULL;
       
       	/* For cached/deleted files we don't need to even do the readdir */
       	if (show_others || show_killed) {
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 	if (show_cached || show_stage) {
       		for (i = 0; i < repo->index->cache_nr; i++) {
       			const struct cache_entry *ce = repo->index->cache[i];
     --			if(show_cached && delete_dup){
     -+
     + 
      +			if (show_cached && delete_dup) {
     - 				switch (ce_stage(ce)) {
     - 				case 0:
     - 				default:
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 					if (last_stage &&
     - 					!strcmp(last_stage->name, ce->name))
     - 						continue;
     --					last_stage=ce;
     ++				switch (ce_stage(ce)) {
     ++				case 0:
     ++				default:
     ++					break;
     ++				case 1:
     ++				case 2:
     ++				case 3:
     ++					if (last_stage &&
     ++					!strcmp(last_stage->name, ce->name))
     ++						continue;
      +					last_stage = ce;
     - 				}
     - 			}
     ++				}
     ++			}
       			construct_fullname(&fullname, repo, ce);
     + 
     + 			if ((dir->flags & DIR_SHOW_IGNORED) &&
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 			const struct cache_entry *ce = repo->index->cache[i];
       			struct stat st;
       			int err;
     --			if(delete_dup){
     -+
     + 
      +			if (delete_dup) {
     - 				switch (ce_stage(ce)) {
     - 				case 0:
     - 				default:
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 					if (last_stage &&
     - 					!strcmp(last_stage->name, ce->name))
     - 						continue;
     --					last_stage=ce;
     ++				switch (ce_stage(ce)) {
     ++				case 0:
     ++				default:
     ++					break;
     ++				case 1:
     ++				case 2:
     ++				case 3:
     ++					if (last_stage &&
     ++					!strcmp(last_stage->name, ce->name))
     ++						continue;
      +					last_stage = ce;
     - 				}
     - 			}
     ++				}
     ++			}
       			construct_fullname(&fullname, repo, ce);
     + 
     + 			if ((dir->flags & DIR_SHOW_IGNORED) &&
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
       			if (ce_skip_worktree(ce))
       				continue;
       			err = lstat(fullname.buf, &st);
     --			if(delete_dup && show_deleted && show_modified && err)
     +-			if (show_deleted && err)
      +			if (delete_dup && show_deleted && show_modified && err)
       				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --			else{
     --				if (show_deleted && err)/* you can't find it,so it's actually removed at all! */
     +-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
     +-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
      +			else {
      +				if (show_deleted && err)
     - 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     - 				if (show_modified && ie_modified(repo->index, ce, &st, 0))
     - 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++				if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++			}
     + 		}
     + 	}
     + 
      @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
       			N_("pretend that paths removed since <tree-ish> are still present")),
       		OPT__ABBREV(&abbrev),
       		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
     --		OPT_BOOL(0, "dedup", &delete_dup, N_("delete duplicate entry in index")),
      +		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
       		OPT_END()
       	};
     @@ t/t3012-ls-files-dedup.sh (new)
      +
      +. ./test-lib.sh
      +
     -+test_expect_success 'master branch setup and write expect1 expect2 and commit' '
     -+	touch a.txt &&
     -+	touch b.txt &&
     -+	touch delete.txt &&
     -+	cat <<-EOF >expect1 &&
     ++test_expect_success 'setup' '
     ++	> a.txt &&
     ++	> b.txt &&
     ++	> delete.txt &&
     ++	cat >expect1<<-\EOF &&
      +	M a.txt
      +	H b.txt
      +	H delete.txt
      +	H expect1
      +	H expect2
      +	EOF
     -+	cat <<-EOF >expect2 &&
     ++	cat >expect2<<-EOF &&
      +	C a.txt
      +	R delete.txt
      +	EOF
      +	git add a.txt b.txt delete.txt expect1 expect2 &&
     -+	git commit -m master:1
     -+'
     -+
     -+test_expect_success 'main commit again' '
     ++	git commit -m master:1 &&
      +	echo a>a.txt &&
      +	echo b>b.txt &&
     -+	echo delete>delete.txt &&
     ++	echo delete >delete.txt &&
      +	git add a.txt b.txt delete.txt &&
     -+	git commit -m master:2
     -+'
     -+
     -+test_expect_success 'dev commit' '
     ++	git commit -m master:2 &&
      +	git checkout HEAD~ &&
      +	git switch -c dev &&
     -+	echo change>a.txt &&
     ++	echo change >a.txt &&
      +	git add a.txt &&
     -+	git commit -m dev:1
     -+'
     -+
     -+test_expect_success 'dev merge master' '
     ++	git commit -m dev:1 &&
      +	test_must_fail git merge master &&
      +	git ls-files -t --dedup >actual1 &&
      +	test_cmp expect1 actual1 &&


 Documentation/git-ls-files.txt |  5 ++++
 builtin/ls-files.c             | 41 ++++++++++++++++++++++++--
 t/t3012-ls-files-dedup.sh      | 54 ++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..0f8dbeeea20 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--dedup]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--dedup::
+	Suppress duplicate entries when conflict happen or `--deleted`
+	and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..bc4eded19ab 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int delete_dup;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_stage = NULL;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
 
+			if (show_cached && delete_dup) {
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage = ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -337,6 +353,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			struct stat st;
 			int err;
 
+			if (delete_dup) {
+				switch (ce_stage(ce)) {
+				case 0:
+				default:
+					break;
+				case 1:
+				case 2:
+				case 3:
+					if (last_stage &&
+					!strcmp(last_stage->name, ce->name))
+						continue;
+					last_stage = ce;
+				}
+			}
 			construct_fullname(&fullname, repo, ce);
 
 			if ((dir->flags & DIR_SHOW_IGNORED) &&
@@ -347,10 +377,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			if (delete_dup && show_deleted && show_modified && err)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else {
+				if (show_deleted && err)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified && ie_modified(repo->index, ce, &st, 0))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		}
 	}
 
@@ -578,6 +612,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..aec7d364235
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+
+test_description='git ls-files --dedup test.
+
+This test prepares the following in the cache:
+
+    a.txt       - a file(base)
+    a.txt	- a file(master)
+    a.txt       - a file(dev)
+    b.txt       - a file
+    delete.txt  - a file
+    expect1	- a file
+    expect2	- a file
+
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	> a.txt &&
+	> b.txt &&
+	> delete.txt &&
+	cat >expect1<<-\EOF &&
+	M a.txt
+	H b.txt
+	H delete.txt
+	H expect1
+	H expect2
+	EOF
+	cat >expect2<<-EOF &&
+	C a.txt
+	R delete.txt
+	EOF
+	git add a.txt b.txt delete.txt expect1 expect2 &&
+	git commit -m master:1 &&
+	echo a>a.txt &&
+	echo b>b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m master:2 &&
+	git checkout HEAD~ &&
+	git switch -c dev &&
+	echo change >a.txt &&
+	git add a.txt &&
+	git commit -m dev:1 &&
+	test_must_fail git merge master &&
+	git ls-files -t --dedup >actual1 &&
+	test_cmp expect1 actual1 &&
+	rm delete.txt &&
+	git ls-files -d -m -t --dedup >actual2 &&
+	test_cmp expect2 actual2
+'
+
+test_done

base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
-- 
gitgitgadget

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
@ 2021-01-15  0:59     ` Junio C Hamano
  2021-01-17  3:45       ` 胡哲宁
  2021-01-16  7:13     ` Eric Sunshine
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
  2 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-15  0:59 UTC (permalink / raw)
  To: 阿德烈 via GitGitGadget
  Cc: git, Eric Sunshine, 胡哲宁

"阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--dedup` option will suppress
> some duplicate options under some conditions.
>
> In a merge conflict, one item of "git ls-files" output may
> appear multiple times. For example,now the file `a.c` has
> a conflict,`a.c` will appear three times in the output of
> "git ls-files".We can use "git ls-files --dedup" to output
> `a.c` only one time.(unless `--stage` or `--unmerged` is
> used to view all the detailed information in the index)

Unlike these option names we see in the description, "dedup" is not
a full word.  Perhaps spell it out fully as "--deduplicate", while
letting the parse-options machinery accept a unique prefix (including
"--dedup")?

> In addition, if you use both `--delete` and `--modify` in
> the same time, The `--dedup` option can also suppress modified

"at the same time", I think.

> entries output.

[let's call this point "point A"]

> `--dedup` option relevant descriptions in
> `Documentation/git-ls-files.txt`,

I am not sure what this means.

> the test script in `t/t3012-ls-files-dedup.sh`
> prove the correctness of the `--dedup` option.

No amount of tests "proves" any correctness, but that is OK.  I
think you meant to say "a few tests have been added to t3012 to
protect the new feature from future breakage" or something like
that.

In any case, I think everything after "point A" and before your sign
off does not belong to the log message.  The diffstat shows that
documentation and tests have been added already.

> +--dedup::
> +	Suppress duplicate entries when conflict happen

"conflict happen" -> "there are unmerged paths", as the term
"unmerged" is already shown to readers of "ls-files --help".

> +	or `--deleted` and `--modified` are combined.

I somehow thought that you refrained from deduping when you are
showing the stages with "ls-files -u" and "ls-files -s", or you are
showing status with "ls-files -t", because you will otherwise lose
information.  In other words, showing only one cache entry out of
many that share the same name makes sense only when we are showing
name and nothing else.

Has that been changed from the previous rounds?

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index c8eae899b82..bc4eded19ab 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  		for (i = 0; i < repo->index->cache_nr; i++) {
>  			const struct cache_entry *ce = repo->index->cache[i];
>  
> +			if (show_cached && delete_dup) {
> +				switch (ce_stage(ce)) {
> +				case 0:
> +				default:
> +					break;

This part looks somewhat strange for two reasons:

 - The code enumerates ALL the possible stage numbers from 0 to 3;
   if we were to have "default", I'd expect it to be a separate
   switch arm from the possible values, one that calls out a
   programming error, e.g. BUG("at stage #%d???", ce_stage(ce)).
   Simply removing the "default" arm would be another way out of
   this strangeness.

 - When we see a stage #0 entry, we know we will not have higher
   stage entries with the same name.  Not clearing last_stage here
   feels wrong, as the primary reason why last_stage variable is
   used is to remember the last ce that was shown, so that other
   entries with the same name can be skipped.

By the way, "last_shown_ce" may be a much better name for the
variable, as you do not really care what stage number the ce you
showed last was at (you care about its name).

Also, I do not see a good reason why the last_shown_ce variable
should have a lifetime longer than the block that contains this for()
loop (and the other for loop for the deleted/modified codepath we see
later).  You made the variable visible to the entire function and
initialize it to NULL before entering the first for loop, but you do
not set it back to NULL before entering the second for loop, which
invites a subtle bug.  You may have been given show_cached and
show_modified at the same time, so you enter the first loop and show
the first stage of the last conflicted path, whose cache entry is
left in the last_stage variable.  Since the variable has a longer
lifespan than necessary, it still points at that cache entry when the
second loop is entered, and an entry with the same name can then be
skipped there by mistake.
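
The hazard can be illustrated with a minimal, self-contained sketch.
It is not code from builtin/ls-files.c: `struct entry`, `count_shown()`
and the two `second_pass_*()` helpers are hypothetical names, and the
index is reduced to an array of name/stage pairs.  Assuming both loops
apply the same "skip if the name matches the remembered entry" test to
unmerged entries, as the patch does, a pointer left over from the first
pass makes the second pass skip every stage of the only conflicted path:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct cache_entry: name and stage only. */
struct entry {
	const char *name;
	int stage;	/* 0 = merged, 1..3 = unmerged stages */
};

/*
 * Count how many entries one pass would show, skipping an unmerged
 * entry whose name matches *last.  The caller owns *last, so it can
 * either carry it over from an earlier pass or reset it to NULL.
 */
static int count_shown(const struct entry *e, size_t n,
		       const struct entry **last)
{
	int shown = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].stage) {
			if (*last && !strcmp((*last)->name, e[i].name))
				continue;
			*last = &e[i];
		}
		shown++;
	}
	return shown;
}

/* Two passes sharing one pointer; returns what the second pass shows. */
static int second_pass_shared(const struct entry *e, size_t n)
{
	const struct entry *last = NULL;

	count_shown(e, n, &last);
	return count_shown(e, n, &last);	/* "last" is stale here */
}

/* Same as above, but the pointer is reset before the second pass. */
static int second_pass_reset(const struct entry *e, size_t n)
{
	const struct entry *last = NULL;

	count_shown(e, n, &last);
	last = NULL;
	return count_shown(e, n, &last);
}
```

With an index holding `a.txt` at stages 1, 2 and 3 followed by `b.txt`
at stage 0, the shared-pointer variant shows only `b.txt` in its second
pass, while the reset variant shows `a.txt` once and then `b.txt`.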

Having said all that, I suspect that we may be much better off if we
can somehow merge the two loops into one.  You may be able to dedup
adjacent entries in each loop separately with the approach taken by
this patch, but I do not think the patch would work to deduplicate
across the two loops.  For example, what happens if you do this?

    $ git reset --hard
    $ echo >>builtin/ls-files.c
    $ git ls-files -c -m builtin/ls-files.c
    $ git ls-files -t -c -m builtin/ls-files.c

I think you see the path twice in the output, with or without your
--dedup option (remember what I said about proving, by the way? ;-)).

Once we have successfully merged the two loops into one, the part that
shows tracked paths in the function would have only one loop, and it
would become a lot cleaner to add the logic to "skip showing the ce if
it has the same name as the previously shown one, only when doing so
won't lose information", by doing something like this:

	static void show_files(....)
	{
		/* show_others || show_killed done here */
		...

		/* leave early if not showing anything */
		if (! (show_cached || show_stage || show_deleted || show_modified))
			return;

		last_shown_ce = NULL;
		for (i = 0; i < repo->index->cache_nr; i++) {
			const struct cache_entry *ce = repo->index->cache[i];

			if (skipping_duplicates && last_shown_ce)
				if (!strcmp(ce->name, last_shown_ce->name))
					continue;

			construct_fullname();

			/* various reasons to skip the entry tested */
			if (showing ignored directory and ce is excluded)
				continue;
			if (show_unmerged && !ce_stage(ce))
				continue;
			if (ce->ce_flags & CE_UPDATE)
				continue;
			... other reasons may appear here ...

			/* now we are committed to show it */
			last_shown_ce = ce;

			... various different ways to show ce come here ... 
			show_ce(...);
		}
	}

where "skipping_duplicates" would be set when "--deduplicate" is
asked for and we are not showing information other than the pathname
via various options, e.g. the tags (-t) or stages (-s/-u).
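
The loop outlined above can be sketched as a small standalone function.
This is only an illustration under stated assumptions:
`struct path_entry`, `struct ls_opts`, `skipping_duplicates()` and
`list_names()` are made-up names, the per-entry skip conditions are
left out, and "showing" an entry is reduced to recording its name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry: name and stage only. */
struct path_entry {
	const char *name;
	int stage;
};

/* Hypothetical option flags; in ls-files these are file-scope statics. */
struct ls_opts {
	int deduplicate;	/* --deduplicate was asked for */
	int show_tag;		/* -t */
	int show_stage;		/* -s */
	int show_unmerged;	/* -u */
};

/* Deduplicate only when nothing but the pathname is being shown. */
static int skipping_duplicates(const struct ls_opts *o)
{
	return o->deduplicate &&
	       !(o->show_tag || o->show_stage || o->show_unmerged);
}

/*
 * Single pass over entries sorted by name.  Names that would be shown
 * are stored in out[], which must have room for n pointers.  Returns
 * the number of names stored.
 */
static size_t list_names(const struct path_entry *e, size_t n,
			 const struct ls_opts *o, const char **out)
{
	const struct path_entry *last_shown = NULL;
	int skip = skipping_duplicates(o);
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		if (skip && last_shown &&
		    !strcmp(e[i].name, last_shown->name))
			continue;

		/* now we are committed to show it */
		last_shown = &e[i];
		out[nr++] = e[i].name;
	}
	return nr;
}
```

Because the index is sorted by name, comparing against the previously
shown entry is enough; no set of already-seen names is needed.  With
`show_tag` set, the same input is listed in full, since dropping
entries would lose the per-entry tag.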

> +			if (delete_dup && show_deleted && show_modified && err)
>  				show_ce(repo, dir, ce, fullname.buf, tag_removed);

I actually think the original code that is still shown here ...

> +			else {
> +				if (show_deleted && err)
> +					show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +				if (show_modified && ie_modified(repo->index, ce, &st, 0))

... about modified file is buggy.  If lstat() failed, then &st has
no useful information, so it is wrong to feed it to ie_modified().
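
A minimal sketch of the distinction, assuming only POSIX `lstat()` and
`errno`; `enum path_state` and `classify_path()` are hypothetical
names, not helpers that exist in git.  The point is that the stat data
is usable only when `lstat()` succeeds, so a caller would consult it
(for example to decide "modified") only in the `PATH_PRESENT` case:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical classification of a working tree path. */
enum path_state {
	PATH_PRESENT,	/* lstat() succeeded, *st is valid */
	PATH_DELETED,	/* lstat() failed with ENOENT */
	PATH_ERROR	/* lstat() failed for some other reason */
};

/*
 * Classify a path.  *st is filled in only when the result is
 * PATH_PRESENT; callers must not read it otherwise.
 */
static enum path_state classify_path(const char *path, struct stat *st)
{
	if (!lstat(path, st))
		return PATH_PRESENT;
	return errno == ENOENT ? PATH_DELETED : PATH_ERROR;
}
```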

Perhaps a three-patch series that is structured like this may be in
order?

 #1: bugfix for --deleted and --modified.

	err = lstat(fullname.buf, &st);
	if (err) {
		/* deleted? */
		if (errno != ENOENT)
			error_errno("cannot lstat '%s'", fullname.buf);
		else {
			if (show_deleted)
				show_ce(..., tag_removed);
			if (show_modified)
				show_ce(..., tag_modified);
		}
	} else if (show_modified && ie_modified(...))
		show_ce(..., tag_modified);
    
     This hopefully should not change the semantics.  If you ask for
     --deleted and --modified, a deleted path would be listed twice.

 #2: consolidate two for loops into one.

     The two loops have slightly different conditions to skip a ce,
     and different logic on what tag each path is shown with.  When
     --cached and --modified or --deleted are asked for at the same
     time, we'd show them multiple times (this is done inside the
     loop for each ce):

	if (show_cached || show_stage)
		show_ce(... ce_stage(ce) ? tag_unmerged : ...);
	err = lstat(fullname.buf, &st);
	if (err) {
		/* deleted? */
		... code that corresponds to the
		... illustration in #1 above comes here.
	} else if (...)
		show_ce(..., tag_modified);

     This changes the semantics.  The original iterates the index
     twice, so you may see the same entry from --cached once and
     then again from --modified.  The updated one still will show
     the same entry twice but next to each other.

 #3: optionally deduplicate.

     Once we have a single loop, deduplicating based on names is
     trivial, as we have seen before.


Hmm?

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
  2021-01-15  0:59     ` Junio C Hamano
@ 2021-01-16  7:13     ` Eric Sunshine
  2021-01-17  3:49       ` 胡哲宁
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
  2 siblings, 1 reply; 65+ messages in thread
From: Eric Sunshine @ 2021-01-16  7:13 UTC (permalink / raw)
  To: 阿德烈 via GitGitGadget; +Cc: Git List, ZheNing Hu

On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--dedup` option will suppress
> some duplicate options under some conditions.
> [...]

I have a few very minor comments alongside Junio's review comments...

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
> diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> @@ -0,0 +1,54 @@
> +test_description='git ls-files --dedup test.
> +
> +This test prepares the following in the cache:
> +
> +    a.txt       - a file(base)
> +    a.txt      - a file(master)
> +    a.txt       - a file(dev)
> +    b.txt       - a file
> +    delete.txt  - a file
> +    expect1    - a file
> +    expect2    - a file
> +
> +'

This test script description is outdated now. Perhaps shorten it to:

    test_description='ls-files dedup tests'

Or, it might be suitable to simply add the new test to the existing
t3004-ls-files-basic.sh instead.

> +test_expect_success 'setup' '
> +       > a.txt &&
> +       > b.txt &&
> +       > delete.txt &&
> +       cat >expect1<<-\EOF &&

Style nits: no space after redirection operator and a space before
redirection operator:

    >a.txt &&
    >b.txt &&
    >delete.txt &&
    cat >expect1 <<-\EOF &&

> +       cat >expect2<<-EOF &&

Nit: missing the backslash (and wrong spacing):

    cat >expect2 <<-\EOF &&

> +       echo a>a.txt &&
> +       echo b>b.txt &&

Style:

    echo a >a.txt &&
    echo b >b.txt &&

> +       echo delete >delete.txt &&
> +       git add a.txt b.txt delete.txt &&
> +       git commit -m master:2 &&
> +       git checkout HEAD~ &&
> +       git switch -c dev &&

If someone adds a new test after this test, then that new test will
run in the "dev" branch, which might be unexpected or undesirable. It
often is a good idea to ensure that tests do certain types of cleanup
to avoid breaking subsequent tests. Here, it would be a good idea to
ensure that the test switches back to the original branch when it
finishes (regardless of whether it finishes successfully or
unsuccessfully).

    git switch -c dev &&
    test_when_finished "git switch master" &&

Or you could use `git switch -` if you don't want to hard-code the
name "master" in the test (since there has been effort lately to
remove that name from tests).

> +       echo change >a.txt &&
> +       git add a.txt &&
> +       git commit -m dev:1 &&
> +       test_must_fail git merge master &&
> +       git ls-files -t --dedup >actual1 &&
> +       test_cmp expect1 actual1 &&
> +       rm delete.txt &&
> +       git ls-files -d -m -t --dedup >actual2 &&
> +       test_cmp expect2 actual2

We usually don't bother giving temporary files unique names like
"actual1" and "actual2" unless those files must exist at the same
time. This is because unique names like this may confuse readers into
wondering if there is some hidden interdependency between the files.
In this case, the files don't need to exist at the same time, so it
may be better simply to use the names "actual" and "expect", like
this:

    ...other stuff...
    cat >expect <<-\EOF &&
    ...
    EOF
    git ls-files -t --dedup >actual &&
    test_cmp expect actual &&
    rm delete.txt &&
    cat >expect <<-\EOF &&
    ...
    EOF
    git ls-files -d -m -t --dedup >actual &&
    test_cmp expect actual

(It also has the benefit that the "expect" content is closer to the
place where it is actually used, which may make it a bit easier for a
person reading the test to understand what is supposed to be
produced.)

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-15  0:59     ` Junio C Hamano
@ 2021-01-17  3:45       ` 胡哲宁
  2021-01-17  4:37         ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-17  3:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Eric Sunshine

Junio, thank you for patiently reviewing my patch and
showing me how to improve it.

On Fri, Jan 15, 2021 at 8:59 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> >
> > In a merge conflict, one item of "git ls-files" output may
> > appear multiple times. For example,now the file `a.c` has
> > a conflict,`a.c` will appear three times in the output of
> > "git ls-files".We can use "git ls-files --dedup" to output
> > `a.c` only one time.(unless `--stage` or `--unmerged` is
> > used to view all the detailed information in the index)
>
> Unlike these option names we see in the description, "dedup" is not
> a full word.  Perhaps spell it fully "--deduplicate" while letting
> the parse-options machinery accept a unique prefix (including
> "--dedup")?
>
OK, I have renamed "--dedup" to "--deduplicate".
> > In addition, if you use both `--delete` and `--modify` in
> > the same time, The `--dedup` option can also suppress modified
>
> "at the same time", I think.
>
My poor English grammar :-)
> > entries output.
>
> [let's call this point "point A"]
>
> > `--dedup` option relevant descriptions in
> > `Documentation/git-ls-files.txt`,
>
> I am not sure what this means.
>
> > the test script in `t/t3012-ls-files-dedup.sh`
> > prove the correctness of the `--dedup` option.
>
> No amount of tests "proves" any correctness, but that is OK.  I
> think you meant to say "a few tests have been added to t3012 to
> protect the new feature from future breakage" or something like
> that.
>
Alright, I understand!
> In any case, I think everything after "point A" and before your sign
> off does not belong to the log message.  The diffstat shows that
> documentation and tests have been added already.
>
> > +--dedup::
> > +     Suppress duplicate entries when conflict happen
>
> "conflict happen" -> "there are unmerged paths", as the term
> "unmerged" is already shown to readers of "ls-files --help".
>
Well, maybe I'm not familiar enough with these terms.
> > +     or `--deleted` and `--modified` are combined.
>
> I somehow thought that you refrained from deduping when you are
> showing the stages with "ls-files -u" and "ls-files -s", or you are
> showing status with "ls-files -t", because you will otherwise lose
> information.  In other words, showing only one cache entry out of
> many that share the same name makes sense only when we are showing
> name and nothing else.
>
You are right, "--deduplicate" should only suppress duplicate file names,
so its behavior with "ls-files -t" also needs to be corrected.
This is indeed a bug that I had not noticed.
> Having said all that, I suspect that we may be much better off if we
> can somehow merge the two loops into one.  You may be able to dedup adjacent
> entries in each loop separately with the approach taken by this
> patch, but I do not think the patch would work to deduplicate across
> two loops.  For example, what happens if you do this?
>
>     $ git reset --hard
>     $ echo >>builtin/ls-files.c
>     $ git ls-files -c -m builtin/ls-files.c
>     $ git ls-files -t -c -m builtin/ls-files.c
>
> I think you see the path twice in the output, with or without your
> --dedup option (remember what I said about proving, by the way? ;-)).
>
Yes, this is because I did not consider the -c option being combined
with other options.
Here I may disagree with your point of view:
                 if (errno != ENOENT)
                         error_errno("cannot lstat '%s'", fullname.buf);
With this check included, the patch fails the test
t/t3010-ls-files-killed-modified.sh:
errno may be ENOTDIR when `r` is a regular file and you call `lstat("r/f", &st);`.
So I have removed the errno check for now.
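To illustrate the ENOTDIR case, here is a minimal standalone sketch
(it is not code from the patch). The helper names `path_is_gone` and
`lstat_errno_below_regular_file` are made up for this example, and it
assumes a POSIX system with a writable /tmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: both errno values mean "nothing at this path". */
static int path_is_gone(int err)
{
	return err == ENOENT || err == ENOTDIR;
}

/*
 * Create a scratch directory containing a regular file "r", then
 * lstat() the path "r/f" below that file.  Returns the errno set by
 * lstat(), 0 if lstat() unexpectedly succeeded, or -1 if the setup
 * itself failed.
 */
static int lstat_errno_below_regular_file(void)
{
	char dir[] = "/tmp/lstat-demo-XXXXXX";
	char file[64], below[64];
	struct stat st;
	int fd, result;

	if (!mkdtemp(dir))
		return -1;
	snprintf(file, sizeof(file), "%s/r", dir);
	snprintf(below, sizeof(below), "%s/r/f", dir);

	fd = open(file, O_CREAT | O_WRONLY, 0600);
	if (fd < 0) {
		rmdir(dir);
		return -1;
	}
	close(fd);

	/* "r" is a regular file, so "r/f" cannot be looked up. */
	result = lstat(below, &st) ? errno : 0;

	unlink(file);
	rmdir(dir);
	return result;
}
```

Here lstat() fails with ENOTDIR rather than ENOENT, so a check for
ENOENT alone would report an error for such a path, while a check
like path_is_gone() would treat it as deleted.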
>  #2: consolidate two for loops into one.
>
>      The two loops have slightly different condition to skip a ce,
>      and different logic on what tag each path is shown with.  When
>      --cached and --modified or --deleted are asked for at the same
>      time, we'd show them multiple times (this is done inside the
>      loop for each ce)
>
>         if (show_cached || show_stage)
>                 show_ce(... ce_stage(ce) ? tag_unmerged : ...);
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 ... code that corresponds to the
>                 ... illustration in #1 above come here.
>         } else if (...)
>                 show_ce(..., tag_modified);
>
>      This changes the semantics.  The original iterates the index
>      twice, so you may see the same entry from --cached once and
>      then again from --modified.  The updated one still will show
>      the same entry twice but next to each other.
>
Yes, this does change the semantics. I think people who relied on the two
separate loops may have wanted the different kinds of output kept apart.
Now, without "--deduplicate", you may see six consecutive entries
under some combinations of options.
>  #3: optionally deduplicate.
>
>      Once we have a single loop, deduplicating based on names is
>      trivial, as we have seen before.
>
>
Indeed so.
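For reference, the name-based skipping in a single loop could be
sketched like this. It is a standalone illustration, not the patch
itself: `struct entry` and `collect_shown` are hypothetical stand-ins
for git's cache entries and for the loop in show_files(), and the
entries are assumed to be sorted by name, as they are in the index.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry: a path name plus a stage. */
struct entry {
	const char *name;
	int stage;
};

/*
 * Walk "entries" (sorted by name) and store in "out" the entries that
 * would be shown.  When skip_dups is set, an entry whose name equals
 * that of the previously shown entry is skipped.  Returns the number
 * of entries stored in "out", which must have room for "nr" pointers.
 */
static size_t collect_shown(const struct entry *entries, size_t nr,
			    int skip_dups, const struct entry **out)
{
	const struct entry *last_shown = NULL;
	size_t i, shown = 0;

	for (i = 0; i < nr; i++) {
		const struct entry *e = &entries[i];

		if (skip_dups && last_shown &&
		    !strcmp(e->name, last_shown->name))
			continue;

		/* now we are committed to show it */
		last_shown = e;
		out[shown++] = e;
	}
	return shown;
}
```

Because entries with the same name are adjacent, remembering only the
last shown entry is enough; no lookup table of names is needed.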
> Hmm?
THANKS.

On Fri, Jan 15, 2021 at 8:59 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "阿德烈 via GitGitGadget"  <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> >
> > In a merge conflict, one item of "git ls-files" output may
> > appear multiple times. For example,now the file `a.c` has
> > a conflict,`a.c` will appear three times in the output of
> > "git ls-files".We can use "git ls-files --dedup" to output
> > `a.c` only one time.(unless `--stage` or `--unmerged` is
> > used to view all the detailed information in the index)
>
> Unlike these option names we see in the description, "dedup" is not
> a full word.  Perhaps spell it fully "--deduplicate" while letting
> the parse-options machinery accept a unique prefix (including
> "--dedup")?
>
> > In addition, if you use both `--delete` and `--modify` in
> > the same time, The `--dedup` option can also suppress modified
>
> "at the same time", I think.
>
> > entries output.
>
> [let's call this point "point A"]
>
> > `--dedup` option relevant descriptions in
> > `Documentation/git-ls-files.txt`,
>
> I am not sure what this means.
>
> > the test script in `t/t3012-ls-files-dedup.sh`
> > prove the correctness of the `--dedup` option.
>
> No amount of tests "proves" any correctness, but that is OK.  I
> think you meant to say "a few tests have been added to t3012 to
> protect the new feature from future breakage" or something like
> that.
>
> In any case, I think everything after "point A" and before your sign
> off does not belong to the log message.  The diffstat shows that
> documentation and tests have been added already.
>
> > +--dedup::
> > +     Suppress duplicate entries when conflict happen
>
> "conflict happen" -> "there are unmerged paths", as the term
> "unmerged" is already shown to readers of "ls-files --help".
>
> > +     or `--deleted` and `--modified` are combined.
>
> I somehow thought that you refrained from deduping when you are
> showing the stages with "ls-files -u" and "ls-files -s", or you are
> showing status with "ls-files -t", because you will otherwise lose
> information.  In other words, showing only one cache entry out of
> many that share the same name makes sense only when we are showing
> name and nothing else.
>
> Has that been changed from the previous rounds?
>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index c8eae899b82..bc4eded19ab 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >               for (i = 0; i < repo->index->cache_nr; i++) {
> >                       const struct cache_entry *ce = repo->index->cache[i];
> >
> > +                     if (show_cached && delete_dup) {
> > +                             switch (ce_stage(ce)) {
> > +                             case 0:
> > +                             default:
> > +                                     break;
>
> This part looks somewhat strange for two reasons:
>
>  - The code enumerates ALL the possible stage numbers from 0 to 3;
>    if we were to have "default", I'd expect it would be a separate
>    switch arm from the possible values that calls out a programming
>    error, e.g. BUG("at stage #%d???", ce_stage(ce)).  Simply removing
>    the "default" arm would be another way out of this strangeness.
>
>  - When we see a stage #0 entry, we know we will not have higher
>    stage entries with the same name.  Not clearing last_stage here
>    feels wrong, as the primary reason why last_stage variable is
>    used is to remember the last ce that was shown, so that other
>    entries with the same name can be skipped.
>
> By the way, "last_shown_ce" may be a much better name for the
> variable, as you do not really care what stage number the ce you
> showed last was at (you care about its name).
>
> Also, I do not see a good reason why the last_shown_ce variable
> should have lifetime longer than the block that contains this for()
> loop (and the other for loop for deleted/modified codepath we see
> later).  Especially since you initialize the variable that you made
> visible to the entire function to NULL before entering the first for
> loop, but you do not set it back to NULL before entering the second
> for loop, it is inviting a subtle bug.  You may have been given
> show_cached and show_modified at the same time, so you enter the
> first loop and have shown the first stage of the last conflicted
> path, whose cache entry is left in the last_stage variable.  Since
> the variable has longer lifespan than necessary, when the second
> loop is entered, it still points at the cache entry for the highest
> stage of the last conflicted path.  That is because the code forgets
> to clear it to NULL before entering the second for loop.
>
> Having said all that, I suspect that we may be much better off if we
> can somehow merge the two loops into one.  You may be able to dedup adjacent
> entries in each loop separately with the approach taken by this
> patch, but I do not think the patch would work to deduplicate across
> two loops.  For example, what happens if you do this?
>
>     $ git reset --hard
>     $ echo >>builtin/ls-files.c
>     $ git ls-files -c -m builtin/ls-files.c
>     $ git ls-files -t -c -m builtin/ls-files.c
>
> I think you see the path twice in the output, with or without your
> --dedup option (remember what I said about proving, by the way? ;-)).
>
> Once we successfully merged two loops into one, the part that shows
> tracked paths in the function would have only one loop, and it would
> become a lot cleaner to add the logic to "skip showing the ce if it
> has the same name as the previously shown one, only when doing so
> won't lose information", by doing something like this:
>
>         static void show_files(....)
>         {
>                 /* show_others || show_killed done here */
>                 ...
>
>                 /* leave early if not showing anything */
>                 if (! (show_cached || show_stage || show_deleted || show_modified))
>                         return;
>
>                 last_shown_ce = NULL;
>                 for (i = 0; i < repo->index->cache_nr; i++) {
>                         const struct cache_entry *ce = repo->index->cache[i];
>
>                         if (skipping_duplicates && last_shown_ce)
>                                 if (!strcmp(ce->name, last_shown_ce->name))
>                                         continue;
>
>                         construct_fullname();
>
>                         /* various reasons to skip the entry tested */
>                         if (showing ignored directory and ce is excluded)
>                                 continue;
>                         if (show_unmerged && !ce_stage(ce))
>                                 continue;
>                         if (ce->ce_flags & CE_UPDATE)
>                                 continue;
>                         ... other reasons may appear here ...
>
>                         /* now we are committed to show it */
>                         last_shown_ce = ce;
>
>                         ... various different ways to show ce come here ...
>                         show_ce(...);
>                 }
>         }
>
> where "skipping_duplicates" would be set when "--deduplicate" is
> asked and we are not showing information other than the pathname
> via various options e.g. the tags (-t) or stages (-s/-u).
>
> > +                     if (delete_dup && show_deleted && show_modified && err)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_removed);
>
> I actually think the original code that is still shown here ...
>
> > +                     else {
> > +                             if (show_deleted && err)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified && ie_modified(repo->index, ce, &st, 0))
>
> ... about modified file is buggy.  If lstat() failed, then &st has
> no useful information, so it is wrong to feed it to ie_modified().
>
> Perhaps a three-patch series that is structured like this may be in
> order?
>
>  #1: bugfix for --deleted and --modified.
>
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 if (errno != ENOENT)
>                         error_errno("cannot lstat '%s'", fullname.buf);
>                 else {
>                         if (show_deleted)
>                                 show_ce(..., tag_removed);
>                         if (show_modified)
>                                 show_ce(..., tag_modified);
>                 }
>         } else if (show_modified && ie_modified(...))
>                 show_ce(..., tag_modified);
>
>      This hopefully should not change the semantics.  If you ask
>      --deleted and --modified, a deleted path would be listed twice.
>
>  #2: consolidate two for loops into one.
>
>      The two loops have slightly different condition to skip a ce,
>      and different logic on what tag each path is shown with.  When
>      --cached and --modified or --deleted are asked for at the same
>      time, we'd show them multiple times (this is done inside the
>      loop for each ce)
>
>         if (show_cached || show_stage)
>                 show_ce(... ce_stage(ce) ? tag_unmerged : ...);
>         err = lstat(fullname.buf, &st);
>         if (err) {
>                 /* deleted? */
>                 ... code that corresponds to the
>                 ... illustration in #1 above come here.
>         } else if (...)
>                 show_ce(..., tag_modified);
>
>      This changes the semantics.  The original iterates the index
>      twice, so you may see the same entry from --cached once and
>      then again from --modified.  The updated one still will show
>      the same entry twice but next to each other.
>
>  #3: optionally deduplicate.
>
>      Once we have a single loop, deduplicating based on names is
>      trivial, as we have seen before.
>
>
> Hmm?


* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-16  7:13     ` Eric Sunshine
@ 2021-01-17  3:49       ` 胡哲宁
  2021-01-17  5:11         ` Eric Sunshine
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-17  3:49 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, Git List

Eric, thanks!
I am a little confused: I can use `test_when_finished "git switch master"`,
but I cannot use `test_when_finished "git switch -"`.
Why is that?

On Sat, Jan 16, 2021 at 3:13 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Thu, Jan 14, 2021 at 7:22 AM 阿德烈 via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--dedup` option will suppress
> > some duplicate options under some conditions.
> > [...]
>
> I have a few very minor comments alongside Junio's review comments...
>
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> > @@ -0,0 +1,54 @@
> > +test_description='git ls-files --dedup test.
> > +
> > +This test prepares the following in the cache:
> > +
> > +    a.txt       - a file(base)
> > +    a.txt      - a file(master)
> > +    a.txt       - a file(dev)
> > +    b.txt       - a file
> > +    delete.txt  - a file
> > +    expect1    - a file
> > +    expect2    - a file
> > +
> > +'
>
> This test script description is outdated now. Perhaps shorten it to:
>
>     test_description='ls-files dedup tests'
>
> Or, it might be suitable to simply add the new test to the existing
> t3004-ls-files-basic.sh instead.
>
> > +test_expect_success 'setup' '
> > +       > a.txt &&
> > +       > b.txt &&
> > +       > delete.txt &&
> > +       cat >expect1<<-\EOF &&
>
> Style nits: no space after redirection operator and a space before
> redirection operator:
>
>     >a.txt &&
>     >b.txt &&
>     >delete.txt &&
>     cat >expect1 <<-\EOF &&
>
> > +       cat >expect2<<-EOF &&
>
> Nit: missing the backslash (and wrong spacing):
>
>     cat >expect2 <<-\EOF &&
>
> > +       echo a>a.txt &&
> > +       echo b>b.txt &&
>
> Style:
>
>     echo a >a.txt &&
>     echo b >b.txt &&
>
> > +       echo delete >delete.txt &&
> > +       git add a.txt b.txt delete.txt &&
> > +       git commit -m master:2 &&
> > +       git checkout HEAD~ &&
> > +       git switch -c dev &&
>
> If someone adds a new test after this test, then that new test will
> run in the "dev" branch, which might be unexpected or undesirable. It
> often is a good idea to ensure that tests do certain types of cleanup
> to avoid breaking subsequent tests. Here, it would be a good idea to
> ensure that the test switches back to the original branch when it
> finishes (regardless of whether it finishes successfully or
> unsuccessfully).
>
>     git switch -c dev &&
>     test_when_finished "git switch master" &&
>
> Or you could use `git switch -` if you don't want to hard-code the
> name "master" in the test (since there has been effort lately to
> remove that name from tests).
>
> > +       echo change >a.txt &&
> > +       git add a.txt &&
> > +       git commit -m dev:1 &&
> > +       test_must_fail git merge master &&
> > +       git ls-files -t --dedup >actual1 &&
> > +       test_cmp expect1 actual1 &&
> > +       rm delete.txt &&
> > +       git ls-files -d -m -t --dedup >actual2 &&
> > +       test_cmp expect2 actual2
>
> We usually don't bother giving temporary files unique names like
> "actual1" and "actual2" unless those files must exist at the same
> time. This is because unique names like this may confuse readers into
> wondering if there is some hidden interdependency between the files.
> In this case, the files don't need to exist at the same time, so it
> may be better simply to use the names "actual" and "expect", like
> this:
>
>     ...other stuff...
>     cat >expect <<-\EOF &&
>     ...
>     EOF
>     git ls-files -t --dedup >actual &&
>     test_cmp expect actual &&
>     rm delete.txt &&
>     cat >expect <<-\EOF &&
>     ...
>     EOF
>     git ls-files -d -m -t --dedup >actual &&
>     test_cmp expect actual
>
> (It also has the benefit that the "expect" content is closer to the
> place where it is actually used, which may make it a bit easier for a
> person reading the test to understand what is supposed to be
> produced.)


* [PATCH v4 0/3] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
  2021-01-15  0:59     ` Junio C Hamano
  2021-01-16  7:13     ` Eric Sunshine
@ 2021-01-17  4:02     ` 阿德烈 via GitGitGadget
  2021-01-17  4:02       ` [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
                         ` (3 more replies)
  2 siblings, 4 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-17  4:02 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈

While reading the source code of git ls-files, I learned that git ls-files
may print duplicate file names when there are unmerged paths during a branch
merge or when certain options are combined. Users may be confused
when they see these duplicate file names.

As Junio C Hamano said, this is odd behaviour.

Therefore, we can add an option to git ls-files to suppress
the repeated information.

This fixes https://github.com/gitgitgadget/git/issues/198

Thanks!

ZheNing Hu (3):
  ls_files.c: bugfix for --deleted and --modified
  ls_files.c: consolidate two for loops into one
  ls-files: add --deduplicate option

 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 82 +++++++++++++++++++---------------
 t/t3012-ls-files-dedup.sh      | 57 +++++++++++++++++++++++
 3 files changed, 109 insertions(+), 35 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v3:

 -:  ----------- > 1:  f4d9af8a312 ls_files.c: bugfix for --deleted and --modified
 -:  ----------- > 2:  50efd9b45b1 ls_files.c: consolidate two for loops into one
 1:  5ce52c8b7a4 ! 3:  0c7830d07db ls-files.c: add --dedup option
     @@ Metadata
      Author: ZheNing Hu <adlternative@gmail.com>
      
       ## Commit message ##
     -    ls-files.c: add --dedup option
     +    ls-files: add --deduplicate option
      
          In order to provide users a better experience
          when viewing information about files in the index
     -    and the working tree, the `--dedup` option will suppress
     -    some duplicate options under some conditions.
     +    and the working tree, the `--deduplicate` option will suppress
     +    some duplicate name under some conditions.
      
     -    In a merge conflict, one item of "git ls-files" output may
     -    appear multiple times. For example,now the file `a.c` has
     -    a conflict,`a.c` will appear three times in the output of
     -    "git ls-files".We can use "git ls-files --dedup" to output
     +    In a merge conflict, one file name of "git ls-files" output may
     +    appear multiple times. For example,now there is an unmerged path
     +    `a.c`,`a.c` will appear three times in the output of
     +    "git ls-files".We can use "git ls-files --deduplicate" to output
          `a.c` only one time.(unless `--stage` or `--unmerged` is
          used to view all the detailed information in the index)
      
     -    In addition, if you use both `--delete` and `--modify` in
     -    the same time, The `--dedup` option can also suppress modified
     -    entries output.
     +    In addition, if you use both `--delete` and `--modify` at
     +    the same time, The `--deduplicate` option
     +    can also suppress file name output.
      
     -    `--dedup` option relevant descriptions in
     -    `Documentation/git-ls-files.txt`,
     -    the test script in `t/t3012-ls-files-dedup.sh`
     -    prove the correctness of the `--dedup` option.
     -
     -    this patch fixed:
     -    https://github.com/gitgitgadget/git/issues/198
     -    Thanks.
     +    Additional instructions:
     +    In order to display entries information,`deduplicate` suppresses
     +    the output of duplicate file names, not the output of duplicate
     +    entries information, so under the option of `-t`, `--stage`, `--unmerge`,
     +    `--deduplicate` will have no effect.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
      
     @@ Documentation/git-ls-files.txt: SYNOPSIS
       		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
       		(-[c|d|o|i|s|u|k|m])*
       		[--eol]
     -+		[--dedup]
     ++		[--deduplicate]
       		[-x <pattern>|--exclude=<pattern>]
       		[-X <file>|--exclude-from=<file>]
       		[--exclude-per-directory=<file>]
     @@ Documentation/git-ls-files.txt: OPTIONS
       	\0 line termination on output and do not quote filenames.
       	See OUTPUT below for more information.
       
     -+--dedup::
     -+	Suppress duplicate entries when conflict happen or `--deleted`
     -+	and `--modified` are combined.
     ++--deduplicate::
     ++	Suppress duplicate entries when there are unmerged paths in index
     ++	or `--deleted` and `--modified` are combined.
      +
       -x <pattern>::
       --exclude=<pattern>::
     @@ builtin/ls-files.c: static int line_terminator = '\n';
       static int debug_mode;
       static int show_eol;
       static int recurse_submodules;
     -+static int delete_dup;
     ++static int skipping_duplicates;
       
       static const char *prefix;
       static int max_prefix_len;
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
       {
       	int i;
       	struct strbuf fullname = STRBUF_INIT;
     -+	const struct cache_entry *last_stage = NULL;
     ++	const struct cache_entry *last_shown_ce;
       
       	/* For cached/deleted files we don't need to even do the readdir */
       	if (show_others || show_killed) {
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 		for (i = 0; i < repo->index->cache_nr; i++) {
     - 			const struct cache_entry *ce = repo->index->cache[i];
     - 
     -+			if (show_cached && delete_dup) {
     -+				switch (ce_stage(ce)) {
     -+				case 0:
     -+				default:
     -+					break;
     -+				case 1:
     -+				case 2:
     -+				case 3:
     -+					if (last_stage &&
     -+					!strcmp(last_stage->name, ce->name))
     -+						continue;
     -+					last_stage = ce;
     -+				}
     -+			}
     - 			construct_fullname(&fullname, repo, ce);
     - 
     - 			if ((dir->flags & DIR_SHOW_IGNORED) &&
     + 	}
     + 	if (! (show_cached || show_stage || show_deleted || show_modified))
     + 		return;
     ++	last_shown_ce = NULL;
     + 	for (i = 0; i < repo->index->cache_nr; i++) {
     + 		const struct cache_entry *ce = repo->index->cache[i];
     + 		struct stat st;
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 			struct stat st;
     - 			int err;
       
     -+			if (delete_dup) {
     -+				switch (ce_stage(ce)) {
     -+				case 0:
     -+				default:
     -+					break;
     -+				case 1:
     -+				case 2:
     -+				case 3:
     -+					if (last_stage &&
     -+					!strcmp(last_stage->name, ce->name))
     -+						continue;
     -+					last_stage = ce;
     -+				}
     -+			}
     - 			construct_fullname(&fullname, repo, ce);
     + 		construct_fullname(&fullname, repo, ce);
       
     - 			if ((dir->flags & DIR_SHOW_IGNORED) &&
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 			if (ce_skip_worktree(ce))
     - 				continue;
     - 			err = lstat(fullname.buf, &st);
     --			if (show_deleted && err)
     -+			if (delete_dup && show_deleted && show_modified && err)
     - 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --			if (show_modified && ie_modified(repo->index, ce, &st, 0))
     --				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++		if (skipping_duplicates && last_shown_ce &&
     ++			!strcmp(last_shown_ce->name,ce->name))
     ++				continue;
     + 		if ((dir->flags & DIR_SHOW_IGNORED) &&
     + 			!ce_excluded(dir, repo->index, fullname.buf, ce))
     + 			continue;
     + 		if (ce->ce_flags & CE_UPDATE)
     + 			continue;
     + 		if (show_cached || show_stage) {
     ++			if (show_cached && skipping_duplicates && last_shown_ce &&
     ++				!strcmp(last_shown_ce->name,ce->name))
     ++					continue;
     + 			if (!show_unmerged || ce_stage(ce))
     + 				show_ce(repo, dir, ce, fullname.buf,
     + 					ce_stage(ce) ? tag_unmerged :
     + 					(ce_skip_worktree(ce) ? tag_skip_worktree :
     + 						tag_cached));
     ++			if(show_cached && skipping_duplicates)
     ++				last_shown_ce = ce;
     + 		}
     + 		if (ce_skip_worktree(ce))
     + 			continue;
     ++		if (skipping_duplicates && last_shown_ce && !strcmp(last_shown_ce->name,ce->name))
     ++			continue;
     + 		err = lstat(fullname.buf, &st);
     + 		if (err) {
     ++			if (skipping_duplicates && show_deleted && show_modified)
     ++				show_ce(repo, dir, ce, fullname.buf, tag_removed);
      +			else {
     -+				if (show_deleted && err)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+				if (show_modified && ie_modified(repo->index, ce, &st, 0))
     -+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     + 				if (show_deleted)
     + 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     + 				if (show_modified)
     + 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     +-		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
      +			}
     - 		}
     ++		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     + 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++		last_shown_ce = ce;
       	}
       
     + 	strbuf_release(&fullname);
      @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
       			N_("pretend that paths removed since <tree-ish> are still present")),
       		OPT__ABBREV(&abbrev),
       		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
     -+		OPT_BOOL(0, "dedup", &delete_dup, N_("suppress duplicate entries")),
     ++		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
       		OPT_END()
       	};
       
     +@@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
     + 		tag_skip_worktree = "S ";
     + 		tag_resolve_undo = "U ";
     + 	}
     ++	if (show_tag && skipping_duplicates)
     ++		skipping_duplicates = 0;
     + 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
     + 		require_work_tree = 1;
     + 	if (show_unmerged)
      
       ## t/t3012-ls-files-dedup.sh (new) ##
      @@
      +#!/bin/sh
      +
     -+test_description='git ls-files --dedup test.
     -+
     -+This test prepares the following in the cache:
     -+
     -+    a.txt       - a file(base)
     -+    a.txt	- a file(master)
     -+    a.txt       - a file(dev)
     -+    b.txt       - a file
     -+    delete.txt  - a file
     -+    expect1	- a file
     -+    expect2	- a file
     -+
     -+'
     ++test_description='git ls-files --deduplicate test'
      +
      +. ./test-lib.sh
      +
      +test_expect_success 'setup' '
     -+	> a.txt &&
     -+	> b.txt &&
     -+	> delete.txt &&
     -+	cat >expect1<<-\EOF &&
     -+	M a.txt
     -+	H b.txt
     -+	H delete.txt
     -+	H expect1
     -+	H expect2
     -+	EOF
     -+	cat >expect2<<-EOF &&
     -+	C a.txt
     -+	R delete.txt
     -+	EOF
     -+	git add a.txt b.txt delete.txt expect1 expect2 &&
     ++	>a.txt &&
     ++	>b.txt &&
     ++	>delete.txt &&
     ++	git add a.txt b.txt delete.txt &&
      +	git commit -m master:1 &&
     -+	echo a>a.txt &&
     -+	echo b>b.txt &&
     ++	echo a >a.txt &&
     ++	echo b >b.txt &&
      +	echo delete >delete.txt &&
      +	git add a.txt b.txt delete.txt &&
      +	git commit -m master:2 &&
      +	git checkout HEAD~ &&
      +	git switch -c dev &&
     ++	test_when_finished "git switch master" &&
      +	echo change >a.txt &&
      +	git add a.txt &&
      +	git commit -m dev:1 &&
      +	test_must_fail git merge master &&
     -+	git ls-files -t --dedup >actual1 &&
     -+	test_cmp expect1 actual1 &&
     ++	git ls-files --deduplicate >actual &&
     ++	cat >expect <<-\EOF &&
     ++	a.txt
     ++	b.txt
     ++	delete.txt
     ++	EOF
     ++	test_cmp expect actual &&
      +	rm delete.txt &&
     -+	git ls-files -d -m -t --dedup >actual2 &&
     -+	test_cmp expect2 actual2
     ++	git ls-files -d -m --deduplicate >actual &&
     ++	cat >expect <<-\EOF &&
     ++	a.txt
     ++	delete.txt
     ++	EOF
     ++	test_cmp expect actual &&
     ++	git ls-files -d -m -t  --deduplicate >actual &&
     ++	cat >expect <<-\EOF &&
     ++	C a.txt
     ++	C a.txt
     ++	C a.txt
     ++	R delete.txt
     ++	C delete.txt
     ++	EOF
     ++	test_cmp expect actual &&
     ++	git ls-files -d -m -c  --deduplicate >actual &&
     ++	cat >expect <<-\EOF &&
     ++	a.txt
     ++	b.txt
     ++	delete.txt
     ++	EOF
     ++	test_cmp expect actual &&
     ++	git merge --abort
      +'
     -+
      +test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
@ 2021-01-17  4:02       ` ZheNing Hu via GitGitGadget
  2021-01-17  6:22         ` Junio C Hamano
  2021-01-17  4:02       ` [PATCH v4 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-17  4:02 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

This situation may occur in the original code: lstat() failed
but we use `&st` to feed ie_modified() later.

It's buggy!

Therefore, when lstat() has failed, call show_ce() directly
without consulting ie_modified().

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/ls-files.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..6f97a23c2dc 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -347,9 +347,12 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			if (err) {
+					if (show_deleted)
+						show_ce(repo, dir, ce, fullname.buf, tag_removed);
+					if (show_modified)
+						show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
 		}
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v4 2/3] ls_files.c: consolidate two for loops into one
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
  2021-01-17  4:02       ` [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-17  4:02       ` ZheNing Hu via GitGitGadget
  2021-01-17  4:02       ` [PATCH v4 3/3] ls-files: add --deduplicate option ZheNing Hu via GitGitGadget
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  3 siblings, 0 replies; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-17  4:02 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Refactor the two for loops into one. Skip showing the ce if it
has the same name as the previously shown one, but only when doing so
won't lose information.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/ls-files.c | 68 +++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 40 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 6f97a23c2dc..49c242128d7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -312,49 +312,37 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (show_cached || show_stage) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-
-			construct_fullname(&fullname, repo, ce);
-
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			show_ce(repo, dir, ce, fullname.buf,
-				ce_stage(ce) ? tag_unmerged :
-				(ce_skip_worktree(ce) ? tag_skip_worktree :
-				 tag_cached));
-		}
-	}
-	if (show_deleted || show_modified) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-			struct stat st;
-			int err;
+	if (! (show_cached || show_stage || show_deleted || show_modified))
+		return;
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct stat st;
+		int err;
 
-			construct_fullname(&fullname, repo, ce);
+		construct_fullname(&fullname, repo, ce);
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			if (ce_skip_worktree(ce))
-				continue;
-			err = lstat(fullname.buf, &st);
-			if (err) {
-					if (show_deleted)
-						show_ce(repo, dir, ce, fullname.buf, tag_removed);
-					if (show_modified)
-						show_ce(repo, dir, ce, fullname.buf, tag_modified);
-			}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		if ((dir->flags & DIR_SHOW_IGNORED) &&
+			!ce_excluded(dir, repo->index, fullname.buf, ce))
+			continue;
+		if (ce->ce_flags & CE_UPDATE)
+			continue;
+		if (show_cached || show_stage) {
+			if (!show_unmerged || ce_stage(ce))
+				show_ce(repo, dir, ce, fullname.buf,
+					ce_stage(ce) ? tag_unmerged :
+					(ce_skip_worktree(ce) ? tag_skip_worktree :
+						tag_cached));
 		}
+		if (ce_skip_worktree(ce))
+			continue;
+		err = lstat(fullname.buf, &st);
+		if (err) {
+				if (show_deleted)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified)
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			show_ce(repo, dir, ce, fullname.buf, tag_modified);
 	}
 
 	strbuf_release(&fullname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
  2021-01-17  4:02       ` [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
  2021-01-17  4:02       ` [PATCH v4 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-17  4:02       ` ZheNing Hu via GitGitGadget
  2021-01-17  6:25         ` Junio C Hamano
  2021-01-17 23:34         ` Junio C Hamano
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  3 siblings, 2 replies; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-17  4:02 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to provide users a better experience
when viewing information about files in the index
and the working tree, the `--deduplicate` option will suppress
duplicate names under some conditions.

In a merge conflict, one file name may appear multiple times in
the output of "git ls-files". For example, if there is an unmerged
path `a.c`, `a.c` will appear three times in the output of
"git ls-files". We can use "git ls-files --deduplicate" to output
`a.c` only once (unless `--stage` or `--unmerged` is
used to view all the detailed information in the index).

In addition, if you use both `--deleted` and `--modified` at
the same time, the `--deduplicate` option
can also suppress duplicate file name output.

Additional notes:
In order to display entry information, `--deduplicate` suppresses
the output of duplicate file names, not the output of duplicate
entry information, so with `-t`, `--stage`, or `--unmerged`,
`--deduplicate` will have no effect.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 23 +++++++++++++-
 t/t3012-ls-files-dedup.sh      | 57 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..d11c8ade402 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 49c242128d7..390d7ef6b44 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_shown_ce;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -314,6 +316,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	}
 	if (! (show_cached || show_stage || show_deleted || show_modified))
 		return;
+	last_shown_ce = NULL;
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -321,28 +324,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 
 		construct_fullname(&fullname, repo, ce);
 
+		if (skipping_duplicates && last_shown_ce &&
+			!strcmp(last_shown_ce->name,ce->name))
+				continue;
 		if ((dir->flags & DIR_SHOW_IGNORED) &&
 			!ce_excluded(dir, repo->index, fullname.buf, ce))
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
 		if (show_cached || show_stage) {
+			if (show_cached && skipping_duplicates && last_shown_ce &&
+				!strcmp(last_shown_ce->name,ce->name))
+					continue;
 			if (!show_unmerged || ce_stage(ce))
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if(show_cached && skipping_duplicates)
+				last_shown_ce = ce;
 		}
 		if (ce_skip_worktree(ce))
 			continue;
+		if (skipping_duplicates && last_shown_ce && !strcmp(last_shown_ce->name,ce->name))
+			continue;
 		err = lstat(fullname.buf, &st);
 		if (err) {
+			if (skipping_duplicates && show_deleted && show_modified)
+				show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			else {
 				if (show_deleted)
 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
 				if (show_modified)
 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
-		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			}
+		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		last_shown_ce = ce;
 	}
 
 	strbuf_release(&fullname);
@@ -569,6 +587,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -600,6 +619,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		tag_skip_worktree = "S ";
 		tag_resolve_undo = "U ";
 	}
+	if (show_tag && skipping_duplicates)
+		skipping_duplicates = 0;
 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
 		require_work_tree = 1;
 	if (show_unmerged)
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..75877255c2c
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m master:1 &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m master:2 &&
+	git checkout HEAD~ &&
+	git switch -c dev &&
+	test_when_finished "git switch master" &&
+	echo change >a.txt &&
+	git add a.txt &&
+	git commit -m dev:1 &&
+	test_must_fail git merge master &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t  --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c  --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-17  3:45       ` 胡哲宁
@ 2021-01-17  4:37         ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-17  4:37 UTC (permalink / raw)
  To: 胡哲宁; +Cc: Git List, Eric Sunshine

胡哲宁 <adlternative@gmail.com> writes:

> Here I may disagree with your point of view:
>                  if (errno != ENOENT)
>                          error_errno("cannot lstat '%s'", fullname.buf);
> With this check included, the patch fails the test
> t/t3010-ls-files-killed-modified.sh.
> errno may be ENOTDIR when `r` is a file and you call `lstat("r/f", &st);`.
> So I temporarily removed the errno check.

I didn't mean to give you a solution that is ready to be
cut-and-pasted without thinking ;-) 

If NOTDIR needs to also be excluded, then you can exclude it just
like the above illustration excludes NOENT to solve the issue,
right?
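
A minimal sketch of that idea, written as a standalone program fragment rather than as a change to builtin/ls-files.c. The names `path_state`, `classify_lstat_errno()` and `check_path()` are hypothetical and exist only for this illustration; the assumption is that ENOENT and ENOTDIR both mean "the path is gone", while any other lstat() failure deserves a message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result codes, used only for this illustration. */
enum path_state {
	PATH_PRESENT,	/* lstat() succeeded */
	PATH_MISSING,	/* ENOENT or ENOTDIR: treat the path as deleted */
	PATH_ERROR	/* any other failure, worth reporting */
};

/* Classify the errno value left behind by a failed lstat(). */
static enum path_state classify_lstat_errno(int saved_errno)
{
	if (saved_errno == ENOENT || saved_errno == ENOTDIR)
		return PATH_MISSING;
	return PATH_ERROR;
}

/* lstat() a path; complain on stderr only for unexpected failures. */
static enum path_state check_path(const char *path, struct stat *st)
{
	enum path_state state;
	int saved_errno;

	assert(path && st);
	if (!lstat(path, st))
		return PATH_PRESENT;
	saved_errno = errno;	/* save before calling anything else */
	state = classify_lstat_errno(saved_errno);
	if (state == PATH_ERROR)
		fprintf(stderr, "cannot lstat '%s': %s\n",
			path, strerror(saved_errno));
	return state;
}
```

With something along these lines, a caller would show the entry as deleted or modified only for PATH_MISSING, and would not read `st` at all unless the result is PATH_PRESENT.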

>>  #2: consolidate two for loops into one.
>> ...
>>      This changes the semantics.  The original iterates the index
>>      twice, so you may see the same entry from --cached once and
>>      then again from --modified.  The updated one still will show
>>      the same entry twice but next to each other.
>>
> Well, this does change the semantics. I think people who relied on the two
> for loops before may want the different outputs kept separate.
> Now, if you don't use "--deduplicate", you may see six consecutive
> items under a combination of multiple options.

Yes, and that is intended and is a vast improvement from the current
behaviour, which shows 3 in the first loop, bunch of unrelated
entries from the rest of the first loop, yet another bunch of
unrelated entries from the early part of the second loop and then
finally shows 3 from the second loop.  With the "single loop"
update, at least, the entries are sorted by their path and it would
make it easier to see (if the user cares to trigger both --cached
and --modified, that is), no?
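
A toy model of the ordering difference, not the real show_files() code. `struct toy_entry`, `emit()`, `list_two_loops()` and `list_one_loop()` are made-up names for this sketch; the "H" and "C" tags are the ones `-t` prints for cached and modified entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an index entry; not git's struct cache_entry. */
struct toy_entry {
	const char *name;
	int modified;	/* nonzero if the working tree copy differs */
};

/* Append "<tag> <name>\n" to the NUL-terminated buffer out. */
static void emit(char *out, size_t outsz, const char *tag, const char *name)
{
	size_t len = strlen(out);

	assert(len < outsz);
	snprintf(out + len, outsz - len, "%s %s\n", tag, name);
}

/* Old shape: one loop for --cached, then a second loop for --modified. */
static void list_two_loops(const struct toy_entry *e, size_t nr,
			   char *out, size_t outsz)
{
	size_t i;

	assert(outsz > 0);
	out[0] = '\0';
	for (i = 0; i < nr; i++)
		emit(out, outsz, "H", e[i].name);
	for (i = 0; i < nr; i++)
		if (e[i].modified)
			emit(out, outsz, "C", e[i].name);
}

/* New shape: a single loop, so all lines for one path are adjacent. */
static void list_one_loop(const struct toy_entry *e, size_t nr,
			  char *out, size_t outsz)
{
	size_t i;

	assert(outsz > 0);
	out[0] = '\0';
	for (i = 0; i < nr; i++) {
		emit(out, outsz, "H", e[i].name);
		if (e[i].modified)
			emit(out, outsz, "C", e[i].name);
	}
}
```

For entries a.txt (modified), b.txt and c.txt (modified), the two-loop version lists all three "H" lines and then the two "C" lines, while the one-loop version keeps "H a.txt" and "C a.txt" next to each other, sorted by path.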

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-17  3:49       ` 胡哲宁
@ 2021-01-17  5:11         ` Eric Sunshine
  2021-01-17 23:04           ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: Eric Sunshine @ 2021-01-17  5:11 UTC (permalink / raw)
  To: 胡哲宁; +Cc: Junio C Hamano, Git List

On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> wrote on Sat, Jan 16, 2021 at 3:13 PM:
> > > +       git switch -c dev &&
> >
> > If someone adds a new test after this test, then that new test will
> > run in the "dev" branch, which might be unexpected or undesirable. It
> > often is a good idea to ensure that tests do certain types of cleanup
> > to avoid breaking subsequent tests. Here, it would be a good idea to
> > ensure that the test switches back to the original branch when it
> > finishes (regardless of whether it finishes successfully or
> > unsuccessfully).
> >
> >     git switch -c dev &&
> >     test_when_finished "git switch master" &&
> >
> > Or you could use `git switch -` if you don't want to hard-code the
> > name "master" in the test (since there has been effort lately to
> > remove that name from tests).
> >
> I am a little confused: I can use `test_when_finished "git switch master"`,
> but I can't use `test_when_finished "git switch -"`.
> Why?

You may use either one. I presented both as alternative approaches.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-17  4:02       ` [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-17  6:22         ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-17  6:22 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> This situation may occur in the original code: lstat() failed
> but we use `&st` to feed ie_modified() later.
>
> It's buggy!

Wasteful extra paragraph with almost no information content beyond
what has already been said in the first paragraph.

> Therefore, when lstat() has failed, call show_ce() directly
> without consulting ie_modified().

But it introduces another bug, as you do not even see why a path
cannot be lstat'ed, no?

>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  builtin/ls-files.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index c8eae899b82..6f97a23c2dc 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -347,9 +347,12 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  			if (ce_skip_worktree(ce))
>  				continue;
>  			err = lstat(fullname.buf, &st);
> -			if (show_deleted && err)
> -				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> -			if (show_modified && ie_modified(repo->index, ce, &st, 0))
> +			if (err) {
> +					if (show_deleted)
> +						show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +					if (show_modified)
> +						show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
>  				show_ce(repo, dir, ce, fullname.buf, tag_modified);
>  		}
>  	}

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-17  4:02       ` [PATCH v4 3/3] ls-files: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-17  6:25         ` Junio C Hamano
  2021-01-17 23:34         ` Junio C Hamano
  1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-17  6:25 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> +			}
> +		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))

The preimage shows a style violation "}else if" introduced by an
earlier step in the series, and this fixes it.

Please make sure to proofread your patches before you show them to
others, so that you can pretend to be a perfect coder who does not need
"oops what I did earlier in the series was wrong and here is a fix-up".

Thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-17  5:11         ` Eric Sunshine
@ 2021-01-17 23:04           ` Junio C Hamano
  2021-01-18 14:59             ` Eric Sunshine
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-17 23:04 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: 胡哲宁, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
>> Eric Sunshine <sunshine@sunshineco.com> wrote on Sat, Jan 16, 2021 at 3:13 PM:
>> > > +       git switch -c dev &&
>> >
>> > If someone adds a new test after this test, then that new test will
>> > run in the "dev" branch, which might be unexpected or undesirable. It
>> > often is a good idea to ensure that tests do certain types of cleanup
>> > to avoid breaking subsequent tests. Here, it would be a good idea to
>> > ensure that the test switches back to the original branch when it
>> > finishes (regardless of whether it finishes successfully or
>> > unsuccessfully).
>> >
>> >     git switch -c dev &&
>> >     test_when_finished "git switch master" &&
>> >
>> > Or you could use `git switch -` if you don't want to hard-code the
>> > name "master" in the test (since there has been effort lately to
>> > remove that name from tests).
>> >
>> I am a little confused: I can use `test_when_finished "git switch master"`,
>> but I can't use `test_when_finished "git switch -"`.
>> Why?
>
> You may use either one. I presented both as alternative approaches.

I am sensing a bit of miscommunication here.  You sound like you
still believe either would work OK, but to me, it sounds like
the author claims that one of them does not work for him/her.

I do not think the test framework itself mucks with the reflog to
make switching to "-" (or "@{-1}") break, so I find it implausible
for the "switch -" form not to work, and unlike your "either is OK", I
have a moderately strong preference for the "go back to the
previous one, whatever it is called" form.

Thanks.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-17  4:02       ` [PATCH v4 3/3] ls-files: add --deduplicate option ZheNing Hu via GitGitGadget
  2021-01-17  6:25         ` Junio C Hamano
@ 2021-01-17 23:34         ` Junio C Hamano
  2021-01-18  4:09           ` 胡哲宁
  1 sibling, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-17 23:34 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> new file mode 100755
> index 00000000000..75877255c2c
> --- /dev/null
> +++ b/t/t3012-ls-files-dedup.sh
> @@ -0,0 +1,57 @@
> +#!/bin/sh
> +
> +test_description='git ls-files --deduplicate test'
> +
> +. ./test-lib.sh

We should already have an ls-files test, so we can add a handful of
new tests to it instead of dedicating a whole new test script.

Also, don't do everything in a single 'setup'.  There are various
scenarios in which you want to make sure ls-files works (grep for
ls-files in what you added below---I count 4 of them), and when a future
developer touches the code, he or she may break one but not the other
three.  The reason you write tests is to protect your new feature
from such a developer *AND* to help such a developer debug and fix
his or her changes.  For that, it would be a lot more sensible to
have one set-up that is common, and then four separate tests.

> +test_expect_success 'setup' '
> +	>a.txt &&
> +	>b.txt &&
> +	>delete.txt &&
> +	git add a.txt b.txt delete.txt &&
> +	git commit -m master:1 &&

Needless use of the word "master".  Observe what is going on in the
project around you and avoid stepping on other people's toes.  One of
the ongoing efforts is to grep for the phrase master in the t/ directory
and examine what happens when the default initial branch name
becomes something other than 'master', so adding a needless hit like
this is most unwelcome.

> +	echo a >a.txt &&
> +	echo b >b.txt &&
> +	echo delete >delete.txt &&
> +	git add a.txt b.txt delete.txt &&
> +	git commit -m master:2 &&

> +	git checkout HEAD~ &&
> +	git switch -c dev &&

Needless mixture of checkout/switch.  If you switch branches using
"git checkout", for example, consistently do so, i.e.

	git checkout -b dev HEAD~1 

It's not like these new tests are to test checkout and switch; your
mission is to protect "ls-files --dedup" feature here.

> +	test_when_finished "git switch master" &&
> +	echo change >a.txt &&
> +	git add a.txt &&
> +	git commit -m dev:1 &&

I'd consider all of the above to be 'setup' that is common for
subsequent tests.  It may make sense to actually do everything
on the initial branch, i.e. after creating two commits, do

	git tag tip &&
	git reset --hard HEAD^ &&
	echo change >a.txt &&
	git commit -a -m side &&
	git tag side

You are always on the initial branch without ever switching, so
there is no need for the when_finished stuff.

Then the first of your test is to show the index with conflicts.

> +	test_must_fail git merge master &&

This will become "git merge tip" instead of 'master'.

> +	git ls-files --deduplicate >actual &&
> +	cat >expect <<-\EOF &&
> +	a.txt
> +	b.txt
> +	delete.txt
> +	EOF
> +	test_cmp expect actual &&

And up to this point is the first test after 'setup'.

The next test should begin with:

	git reset --hard side &&
	test_must_fail git merge tip &&

so that even when the first test is skipped, or left unmerged,
you'll begin with a known state.

> +	rm delete.txt &&
> +	git ls-files -d -m --deduplicate >actual &&
> +	cat >expect <<-\EOF &&
> +	a.txt
> +	delete.txt
> +	EOF
> +	test_cmp expect actual &&
> +	git ls-files -d -m -t  --deduplicate >actual &&
> +	cat >expect <<-\EOF &&
> +	C a.txt
> +	C a.txt
> +	C a.txt
> +	R delete.txt
> +	C delete.txt
> +	EOF
> +	test_cmp expect actual &&
> +	git ls-files -d -m -c  --deduplicate >actual &&
> +	cat >expect <<-\EOF &&
> +	a.txt
> +	b.txt
> +	delete.txt
> +	EOF
> +	test_cmp expect actual &&

These three can be kept in the same test_expect_success, as they are
exercising read-only operation on the same state but with different
display options.

But in this case, the preparation is not too tedious (just a failed
merge plus a deletion), so you probably would prefer to split it
into 3 independent tests---that may make it more helpful to future
developers.

> +	git merge --abort
> +'
> +test_done

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-17 23:34         ` Junio C Hamano
@ 2021-01-18  4:09           ` 胡哲宁
  2021-01-18  6:05             ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-18  4:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Junio C Hamano <gitster@pobox.com> wrote on Mon, Jan 18, 2021 at 7:34 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
> > new file mode 100755
> > index 00000000000..75877255c2c
> > --- /dev/null
> > +++ b/t/t3012-ls-files-dedup.sh
> > @@ -0,0 +1,57 @@
> > +#!/bin/sh
> > +
> > +test_description='git ls-files --deduplicate test'
> > +
> > +. ./test-lib.sh
>
> We should already have an ls-files test, so we can add a handful of
> new tests to it instead of dedicating a whole new test script.
>
Fine. It might be easier for me to write a separate test file for the time being,
but I will learn over time.
> Also, don't do everything in a single 'setup'.  There are various
> scenarios in which you want to make sure ls-files works (grep for
> ls-files in what you added below---I count 4 of them), and when a future
> developer touches the code, he or she may break one but not the other
> three.  The reason you write tests is to protect your new feature
> from such a developer *AND* to help such a developer debug and fix
> his or her changes.  For that, it would be a lot more sensible to
> have one set-up that is common, and then four separate tests.
>
> > +test_expect_success 'setup' '
> > +     >a.txt &&
> > +     >b.txt &&
> > +     >delete.txt &&
> > +     git add a.txt b.txt delete.txt &&
> > +     git commit -m master:1 &&
>
> Needless use of the word "master".  Observe what is going on in the
> project around you and avoid stepping on other people's toes.  One of
> the ongoing efforts is to grep for the phrase master in the t/ directory
> and examine what happens when the default initial branch name
> becomes something other than 'master', so adding a needless hit like
> this is most unwelcome.
>
Well, I will try my best to use less "master".
> > +     echo a >a.txt &&
> > +     echo b >b.txt &&
> > +     echo delete >delete.txt &&
> > +     git add a.txt b.txt delete.txt &&
> > +     git commit -m master:2 &&
>
> > +     git checkout HEAD~ &&
> > +     git switch -c dev &&
>
> Needless mixture of checkout/switch.  If you switch branches using
> "git checkout", for example, consistently do so, i.e.
>
>         git checkout -b dev HEAD~1
>
> It's not like these new tests are to test checkout and switch; your
> mission is to protect "ls-files --dedup" feature here.
>
> > +     test_when_finished "git switch master" &&
> > +     echo change >a.txt &&
> > +     git add a.txt &&
> > +     git commit -m dev:1 &&
>
> I'd consider all of the above to be 'setup' that is common for
> subsequent tests.  It may make sense to actually do everything
> on the initial branch, i.e. after creating two commits, do
>
I understand it now... setup serves as a basis for the other tests.
>         git tag tip &&
>         git reset --hard HEAD^ &&
>         echo change >a.txt &&
>         git commit -a -m side &&
>         git tag side
>
> You are always on the initial branch without ever switching, so
> there is no need for the when_finished stuff.
>
> Then the first of your test is to show the index with conflicts.
>
> > +     test_must_fail git merge master &&
>
> This will become "git merge tip" instead of 'master'.
>
Use a tag instead of a branch name...
> > +     git ls-files --deduplicate >actual &&
> > +     cat >expect <<-\EOF &&
> > +     a.txt
> > +     b.txt
> > +     delete.txt
> > +     EOF
> > +     test_cmp expect actual &&
>
> And up to this point is the first test after 'setup'.
>
> The next test should begin with:
>
>         git reset --hard side &&
>         test_must_fail git merge tip &&
>
> so that even when the first test is skipped, or left unmerged,
> you'll begin with a known state.
>
Well, I understand now that each test_expect_success should allow other
programmers to skip it, so we should reset to a known
state at the beginning of each test.
> > +     rm delete.txt &&
> > +     git ls-files -d -m --deduplicate >actual &&
> > +     cat >expect <<-\EOF &&
> > +     a.txt
> > +     delete.txt
> > +     EOF
> > +     test_cmp expect actual &&
> > +     git ls-files -d -m -t  --deduplicate >actual &&
> > +     cat >expect <<-\EOF &&
> > +     C a.txt
> > +     C a.txt
> > +     C a.txt
> > +     R delete.txt
> > +     C delete.txt
> > +     EOF
> > +     test_cmp expect actual &&
> > +     git ls-files -d -m -c  --deduplicate >actual &&
> > +     cat >expect <<-\EOF &&
> > +     a.txt
> > +     b.txt
> > +     delete.txt
> > +     EOF
> > +     test_cmp expect actual &&
>
> These three can be kept in the same test_expect_success, as they are
> exercising read-only operation on the same state but with different
> display options.
>
Indeed so.
> But in this case, the preparation is not too tedious (just a failed
> merge plus a deletion), so you probably would prefer to split it
> into 3 independent tests---that may make it more helpful to future
> developers.
>
Thanks:)
> > +     git merge --abort
> > +'
> > +test_done

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-18  4:09           ` 胡哲宁
@ 2021-01-18  6:05             ` 胡哲宁
  2021-01-18 21:31               ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-18  6:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Hi, Junio!
I am thinking that the role of "--deduplicate" is to suppress
duplicate filenames rather than duplicate entries. Do you
think I should modify this sentence?

> > OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v3] ls-files.c: add --dedup option
  2021-01-17 23:04           ` Junio C Hamano
@ 2021-01-18 14:59             ` Eric Sunshine
  0 siblings, 0 replies; 65+ messages in thread
From: Eric Sunshine @ 2021-01-18 14:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: 胡哲宁, Git List

On Sun, Jan 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > On Sat, Jan 16, 2021 at 10:48 PM 胡哲宁 <adlternative@gmail.com> wrote:
> >> On Sat, Jan 16, 2021 at 3:13 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> >> >     test_when_finished "git switch master" &&
> >> >
> >> > Or you could use `git switch -` if you don't want to hard-code the
> >> > name "master" in the test (since there has been effort lately to
> >> > remove that name from tests).
> >> >
> >> I am a little confused: why can I use `test_when_finished "git switch master"`
> >> but not `test_when_finished "git switch -"`?
> >
> > You may use either one. I presented both as alternative approaches.
>
> I am sensing a bit of miscommunication here.  You sound like you
> still believe either would work OK, but to me, it sounds like the
> author claims that one of them does not work for him/her.

That could be. My eye glided over the code too quickly to notice that
it was using `git checkout HEAD~` followed immediately by `git switch
-c dev`, which means that `git switch -` would indeed not return to
"master" (in fact, it errors out).

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-18  6:05             ` 胡哲宁
@ 2021-01-18 21:31               ` Junio C Hamano
  2021-01-19  2:56                 ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-18 21:31 UTC (permalink / raw)
  To: 胡哲宁
  Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

胡哲宁 <adlternative@gmail.com> writes:

> I am thinking that the role of "--deduplicate" is to suppress
> duplicate filenames rather than duplicate entries. Do you
> think I should modify this sentence?
>
>> > OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),

I see no strong need to.  One set of output entries from "ls-files"
may say

    $ git ls-files -u
    100644 536e55524db72bd2acf175208aef4f3dfc148d41 1	COPYING
    100644 536e55524db72bd2acf175208aef4f3dfc148d43 3	COPYING

and these three "entries" are not duplicates.  Another set of output
entries may say

    $ git ls-files COPYING
    COPYING
    COPYING
    COPYING

and these output entries are duplicates.  If you deduplicate the
latter but not the former, then "suppress duplicate entries" is
exactly what you are doing, I would think.

And if you are asked to show entries that would look like this in a
not-deduplicated form:

    $ git ls-files -u
    100644 536e55524db72bd2acf175208aef4f3dfc148d41 1	COPYING
    100644 536e55524db72bd2acf175208aef4f3dfc148d41 1	COPYING
    100644 536e55524db72bd2acf175208aef4f3dfc148d43 3	COPYING

"suppressing duplicates" would give us the first entry and drop the
second entry that is identical to the first entry, I would think
[*1*].

So "duplicate entries" would probably be more correct description of
what we want to happen than "duplicate filenames".


[Footnote]

*1* Multiple "common ancestor" versions at stage #1 for the same
    path is not an error.  That is how "merge-resolve" expresses
    criss-cross merge where multiple merge-bases exist.

    Multiple "their" versions at stage #3 for the same path is not
    an error, and "merge-octopus" should use it to express contents
    from histories being merged into ours, but the implementation of
    the octopus strategy does not use this feature of the index.

    Multiple "our" versions at stage #2 by definition should not
    happen ;-)
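
The distinction drawn above can be sketched in C. This is only an
illustration of the idea, not the code in builtin/ls-files.c:
dedup_adjacent() and its parameters are hypothetical names, and it
compares whole rendered output lines, whereas the patch under review
compares ce->name and turns deduplication off when -t or -s is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy pointers from lines[] into out[], dropping any
 * line that is byte-for-byte identical to the line kept just before it.
 * Returns the number of lines kept.  out[] must have room for n pointers.
 */
static size_t dedup_adjacent(const char *const *lines, size_t n,
			     const char **out)
{
	size_t kept = 0;
	size_t i;

	assert(lines != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		/* Compare against the last line that was actually kept. */
		if (kept && !strcmp(out[kept - 1], lines[i]))
			continue;
		out[kept++] = lines[i];
	}
	return kept;
}
```

Fed three identical "COPYING" lines, such a helper would keep one; fed
lines that differ in object name or stage, it would keep every distinct
one, which matches the "duplicate entries" wording.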

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v4 3/3] ls-files: add --deduplicate option
  2021-01-18 21:31               ` Junio C Hamano
@ 2021-01-19  2:56                 ` 胡哲宁
  0 siblings, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-19  2:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Thank you very much for your answer; I learned a lot from it. I will
use the description "suppress duplicate entries".

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-01-17  4:02       ` [PATCH v4 3/3] ls-files: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-19  6:30       ` 阿德烈 via GitGitGadget
  2021-01-19  6:30         ` [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
                           ` (3 more replies)
  3 siblings, 4 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-19  6:30 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈

While reading the source code of git ls-files, I learned that git ls-files
may show duplicate file names when there are unmerged paths in a branch merge
or when different options are used at the same time. Users may feel confused
when they see these duplicate file names.

As Junio C Hamano said, this is odd behaviour.

Therefore, we can provide an additional option to git ls-files to suppress
this repeated information.

This fixes https://github.com/gitgitgadget/git/issues/198

Thanks!

ZheNing Hu (3):
  ls_files.c: bugfix for --deleted and --modified
  ls_files.c: consolidate two for loops into one
  ls-files.c: add --deduplicate option

 Documentation/git-ls-files.txt |  5 ++
 builtin/ls-files.c             | 83 ++++++++++++++++++++--------------
 t/t3012-ls-files-dedup.sh      | 66 +++++++++++++++++++++++++++
 3 files changed, 120 insertions(+), 34 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v4:

 1:  f4d9af8a312 ! 1:  ec9464f6094 ls_files.c: bugfix for --deleted and --modified
     @@ Commit message
          This situation may occur in the original code: lstat() failed
          but we use `&st` to feed ie_modified() later.
      
     -    It's buggy!
     -
          Therefore, we can directly execute show_ce without the judgment of
          ie_modified() when lstat() has failed.
      
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -				show_ce(repo, dir, ce, fullname.buf, tag_removed);
      -			if (show_modified && ie_modified(repo->index, ce, &st, 0))
      +			if (err) {
     -+					if (show_deleted)
     -+						show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+					if (show_modified)
     -+						show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+			}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++				if (errno != ENOENT && errno != ENOTDIR)
     ++				    error_errno("cannot lstat '%s'", fullname.buf);
     ++				if (show_deleted)
     ++					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++				if (show_modified)
     ++					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
       				show_ce(repo, dir, ce, fullname.buf, tag_modified);
       		}
       	}
 2:  50efd9b45b1 ! 2:  802ff802be8 ls_files.c: consolidate two for loops into one
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -	if (show_cached || show_stage) {
      -		for (i = 0; i < repo->index->cache_nr; i++) {
      -			const struct cache_entry *ce = repo->index->cache[i];
     --
     ++	if (! (show_cached || show_stage || show_deleted || show_modified))
     ++		return;
     ++	for (i = 0; i < repo->index->cache_nr; i++) {
     ++		const struct cache_entry *ce = repo->index->cache[i];
     ++		struct stat st;
     ++		int err;
     + 
      -			construct_fullname(&fullname, repo, ce);
     --
     ++		construct_fullname(&fullname, repo, ce);
     + 
      -			if ((dir->flags & DIR_SHOW_IGNORED) &&
      -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
      -				continue;
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -				ce_stage(ce) ? tag_unmerged :
      -				(ce_skip_worktree(ce) ? tag_skip_worktree :
      -				 tag_cached));
     --		}
     ++		if ((dir->flags & DIR_SHOW_IGNORED) &&
     ++			!ce_excluded(dir, repo->index, fullname.buf, ce))
     ++			continue;
     ++		if (ce->ce_flags & CE_UPDATE)
     ++			continue;
     ++		if (show_cached || show_stage) {
     ++			if (!show_unmerged || ce_stage(ce))
     ++				show_ce(repo, dir, ce, fullname.buf,
     ++					ce_stage(ce) ? tag_unmerged :
     ++					(ce_skip_worktree(ce) ? tag_skip_worktree :
     ++						tag_cached));
     + 		}
      -	}
      -	if (show_deleted || show_modified) {
      -		for (i = 0; i < repo->index->cache_nr; i++) {
      -			const struct cache_entry *ce = repo->index->cache[i];
      -			struct stat st;
      -			int err;
     -+	if (! (show_cached || show_stage || show_deleted || show_modified))
     -+		return;
     -+	for (i = 0; i < repo->index->cache_nr; i++) {
     -+		const struct cache_entry *ce = repo->index->cache[i];
     -+		struct stat st;
     -+		int err;
     - 
     +-
      -			construct_fullname(&fullname, repo, ce);
     -+		construct_fullname(&fullname, repo, ce);
     - 
     +-
      -			if ((dir->flags & DIR_SHOW_IGNORED) &&
      -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
      -				continue;
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -				continue;
      -			err = lstat(fullname.buf, &st);
      -			if (err) {
     --					if (show_deleted)
     --						show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --					if (show_modified)
     --						show_ce(repo, dir, ce, fullname.buf, tag_modified);
     --			}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     --				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+		if ((dir->flags & DIR_SHOW_IGNORED) &&
     -+			!ce_excluded(dir, repo->index, fullname.buf, ce))
     -+			continue;
     -+		if (ce->ce_flags & CE_UPDATE)
     -+			continue;
     -+		if (show_cached || show_stage) {
     -+			if (!show_unmerged || ce_stage(ce))
     -+				show_ce(repo, dir, ce, fullname.buf,
     -+					ce_stage(ce) ? tag_unmerged :
     -+					(ce_skip_worktree(ce) ? tag_skip_worktree :
     -+						tag_cached));
     - 		}
     +-				if (errno != ENOENT && errno != ENOTDIR)
     +-				    error_errno("cannot lstat '%s'", fullname.buf);
     +-				if (show_deleted)
     +-					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     +-				if (show_modified)
     +-					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     +-			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
      +		if (ce_skip_worktree(ce))
      +			continue;
      +		err = lstat(fullname.buf, &st);
      +		if (err) {
     -+				if (show_deleted)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+				if (show_modified)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++			if (errno != ENOENT && errno != ENOTDIR)
     ++				error_errno("cannot lstat '%s'", fullname.buf);
     ++			if (show_deleted)
     ++				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++			if (show_modified)
     + 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     +-		}
     ++		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
      +			show_ce(repo, dir, ce, fullname.buf, tag_modified);
       	}
       
 3:  0c7830d07db ! 3:  e9c53186706 ls-files: add --deduplicate option
     @@ Metadata
      Author: ZheNing Hu <adlternative@gmail.com>
      
       ## Commit message ##
     -    ls-files: add --deduplicate option
     +    ls-files.c: add --deduplicate option
      
          In order to provide users a better experience
          when viewing information about files in the index
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
       		if (ce->ce_flags & CE_UPDATE)
       			continue;
       		if (show_cached || show_stage) {
     -+			if (show_cached && skipping_duplicates && last_shown_ce &&
     ++			if (skipping_duplicates && last_shown_ce &&
      +				!strcmp(last_shown_ce->name,ce->name))
      +					continue;
       			if (!show_unmerged || ce_stage(ce))
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
       					ce_stage(ce) ? tag_unmerged :
       					(ce_skip_worktree(ce) ? tag_skip_worktree :
       						tag_cached));
     -+			if(show_cached && skipping_duplicates)
     ++			if (show_cached && skipping_duplicates)
      +				last_shown_ce = ce;
       		}
       		if (ce_skip_worktree(ce))
       			continue;
     -+		if (skipping_duplicates && last_shown_ce && !strcmp(last_shown_ce->name,ce->name))
     -+			continue;
     ++		if (skipping_duplicates && last_shown_ce &&
     ++			!strcmp(last_shown_ce->name,ce->name))
     ++				continue;
       		err = lstat(fullname.buf, &st);
       		if (err) {
     +-			if (errno != ENOENT && errno != ENOTDIR)
     +-				error_errno("cannot lstat '%s'", fullname.buf);
     +-			if (show_deleted)
      +			if (skipping_duplicates && show_deleted && show_modified)
     -+				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     + 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     +-			if (show_modified)
     +-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
      +			else {
     - 				if (show_deleted)
     - 					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     - 				if (show_modified)
     - 					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     --		}else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++				if (errno != ENOENT && errno != ENOTDIR)
     ++					error_errno("cannot lstat '%s'", fullname.buf);
     ++				if (show_deleted)
     ++					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++				if (show_modified)
     ++					show_ce(repo, dir, ce, fullname.buf, tag_modified);
      +			}
     -+		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     + 		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
       			show_ce(repo, dir, ce, fullname.buf, tag_modified);
      +		last_shown_ce = ce;
       	}
     @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
       	};
       
      @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
     - 		tag_skip_worktree = "S ";
     - 		tag_resolve_undo = "U ";
     - 	}
     -+	if (show_tag && skipping_duplicates)
     + 		 * you also show the stage information.
     + 		 */
     + 		show_stage = 1;
     ++	if (show_tag || show_stage)
      +		skipping_duplicates = 0;
     - 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
     - 		require_work_tree = 1;
     - 	if (show_unmerged)
     + 	if (dir.exclude_per_dir)
     + 		exc_given = 1;
     + 
      
       ## t/t3012-ls-files-dedup.sh (new) ##
      @@
     @@ t/t3012-ls-files-dedup.sh (new)
      +	>b.txt &&
      +	>delete.txt &&
      +	git add a.txt b.txt delete.txt &&
     -+	git commit -m master:1 &&
     ++	git commit -m base &&
      +	echo a >a.txt &&
      +	echo b >b.txt &&
      +	echo delete >delete.txt &&
      +	git add a.txt b.txt delete.txt &&
     -+	git commit -m master:2 &&
     -+	git checkout HEAD~ &&
     -+	git switch -c dev &&
     -+	test_when_finished "git switch master" &&
     ++	git commit -m tip &&
     ++	git tag tip &&
     ++	git reset --hard HEAD^ &&
      +	echo change >a.txt &&
     -+	git add a.txt &&
     -+	git commit -m dev:1 &&
     -+	test_must_fail git merge master &&
     ++	git commit -a -m side &&
     ++	git tag side
     ++'
     ++
     ++test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
     ++	test_must_fail git merge tip &&
      +	git ls-files --deduplicate >actual &&
      +	cat >expect <<-\EOF &&
      +	a.txt
     @@ t/t3012-ls-files-dedup.sh (new)
      +	delete.txt
      +	EOF
      +	test_cmp expect actual &&
     ++	git merge --abort
     ++'
     ++
     ++test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
     ++	git reset --hard side &&
     ++	test_must_fail git merge tip &&
      +	rm delete.txt &&
      +	git ls-files -d -m --deduplicate >actual &&
      +	cat >expect <<-\EOF &&
     @@ t/t3012-ls-files-dedup.sh (new)
      +	delete.txt
      +	EOF
      +	test_cmp expect actual &&
     -+	git ls-files -d -m -t  --deduplicate >actual &&
     ++	git ls-files -d -m -t --deduplicate >actual &&
      +	cat >expect <<-\EOF &&
      +	C a.txt
      +	C a.txt
     @@ t/t3012-ls-files-dedup.sh (new)
      +	C delete.txt
      +	EOF
      +	test_cmp expect actual &&
     -+	git ls-files -d -m -c  --deduplicate >actual &&
     ++	git ls-files -d -m -c --deduplicate >actual &&
      +	cat >expect <<-\EOF &&
      +	a.txt
      +	b.txt
     @@ t/t3012-ls-files-dedup.sh (new)
      +	test_cmp expect actual &&
      +	git merge --abort
      +'
     ++
      +test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
@ 2021-01-19  6:30         ` ZheNing Hu via GitGitGadget
  2021-01-20 20:26           ` Junio C Hamano
  2021-01-19  6:30         ` [PATCH v5 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-19  6:30 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

This situation may occur in the original code: lstat() fails,
but `&st` is still passed to ie_modified() later.

Therefore, when lstat() has failed, we can call show_ce() directly
without consulting ie_modified().

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/ls-files.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..f1617260064 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -347,9 +347,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			if (ce_skip_worktree(ce))
 				continue;
 			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			if (err) {
+				if (errno != ENOENT && errno != ENOTDIR)
+				    error_errno("cannot lstat '%s'", fullname.buf);
+				if (show_deleted)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified)
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
 		}
 	}
-- 
gitgitgadget
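
A minimal sketch of the pattern this patch applies, assuming a POSIX
system. The enum, classify_path() and its size-based comparison are
hypothetical stand-ins (the real code calls show_ce() and
ie_modified()); the point is only that `st` is read on the success
branch and never after lstat() has failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

enum path_state { PATH_UNCHANGED, PATH_MODIFIED, PATH_REMOVED };

/*
 * Hypothetical stand-in for the deleted/modified decision.  expected_size
 * plays the role of the data recorded in the index.
 */
static enum path_state classify_path(const char *path, off_t expected_size)
{
	struct stat st;

	assert(path != NULL);
	if (lstat(path, &st)) {
		/* st is indeterminate here; do not inspect it. */
		if (errno != ENOENT && errno != ENOTDIR)
			fprintf(stderr, "cannot lstat '%s': %s\n",
				path, strerror(errno));
		return PATH_REMOVED;
	}
	/* Only a successful lstat() makes st safe to read. */
	return st.st_size != expected_size ? PATH_MODIFIED : PATH_UNCHANGED;
}
```

The sketch collapses the failure case into a single PATH_REMOVED
result; the patch itself shows such a path under both --deleted and
--modified when both are requested.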


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v5 2/3] ls_files.c: consolidate two for loops into one
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-19  6:30         ` [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-19  6:30         ` ZheNing Hu via GitGitGadget
  2021-01-20 20:27           ` Junio C Hamano
  2021-01-19  6:30         ` [PATCH v5 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  3 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-19  6:30 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Refactor the two for loops into one; skip showing the ce if it
has the same name as the previously shown one, but only when doing so
won't lose information.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/ls-files.c | 70 +++++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 41 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f1617260064..1454ab1ae6f 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -312,51 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (show_cached || show_stage) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
+	if (! (show_cached || show_stage || show_deleted || show_modified))
+		return;
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct stat st;
+		int err;
 
-			construct_fullname(&fullname, repo, ce);
+		construct_fullname(&fullname, repo, ce);
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			show_ce(repo, dir, ce, fullname.buf,
-				ce_stage(ce) ? tag_unmerged :
-				(ce_skip_worktree(ce) ? tag_skip_worktree :
-				 tag_cached));
+		if ((dir->flags & DIR_SHOW_IGNORED) &&
+			!ce_excluded(dir, repo->index, fullname.buf, ce))
+			continue;
+		if (ce->ce_flags & CE_UPDATE)
+			continue;
+		if (show_cached || show_stage) {
+			if (!show_unmerged || ce_stage(ce))
+				show_ce(repo, dir, ce, fullname.buf,
+					ce_stage(ce) ? tag_unmerged :
+					(ce_skip_worktree(ce) ? tag_skip_worktree :
+						tag_cached));
 		}
-	}
-	if (show_deleted || show_modified) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-			struct stat st;
-			int err;
-
-			construct_fullname(&fullname, repo, ce);
-
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			if (ce_skip_worktree(ce))
-				continue;
-			err = lstat(fullname.buf, &st);
-			if (err) {
-				if (errno != ENOENT && errno != ENOTDIR)
-				    error_errno("cannot lstat '%s'", fullname.buf);
-				if (show_deleted)
-					show_ce(repo, dir, ce, fullname.buf, tag_removed);
-				if (show_modified)
-					show_ce(repo, dir, ce, fullname.buf, tag_modified);
-			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
+		if (ce_skip_worktree(ce))
+			continue;
+		err = lstat(fullname.buf, &st);
+		if (err) {
+			if (errno != ENOENT && errno != ENOTDIR)
+				error_errno("cannot lstat '%s'", fullname.buf);
+			if (show_deleted)
+				show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (show_modified)
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
-		}
+		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			show_ce(repo, dir, ce, fullname.buf, tag_modified);
 	}
 
 	strbuf_release(&fullname);
-- 
gitgitgadget



* [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-19  6:30         ` [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
  2021-01-19  6:30         ` [PATCH v5 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-19  6:30         ` ZheNing Hu via GitGitGadget
  2021-01-20 21:26           ` Junio C Hamano
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  3 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-19  6:30 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to give users a better experience
when viewing information about files in the index
and the working tree, the `--deduplicate` option suppresses
duplicate file names under some conditions.

During a merge conflict, a file name may appear multiple times
in the output of "git ls-files". For example, if there is an
unmerged path `a.c`, it will appear three times in the output of
"git ls-files". We can use "git ls-files --deduplicate" to output
`a.c` only once (unless `--stage` or `--unmerged` is
used to view all the detailed information in the index).

In addition, if you use both `--deleted` and `--modified` at
the same time, the `--deduplicate` option
also suppresses the duplicate file name output.

Additional notes:
`--deduplicate` suppresses the output of duplicate file names,
not the output of duplicate entry information, so it has no
effect when `-t`, `--stage`, or `--unmerged` is given.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 32 ++++++++++++++---
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+), 5 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..d11c8ade402 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1454ab1ae6f..709d727c574 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -301,6 +302,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 {
 	int i;
 	struct strbuf fullname = STRBUF_INIT;
+	const struct cache_entry *last_shown_ce;
 
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
@@ -314,6 +316,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	}
 	if (! (show_cached || show_stage || show_deleted || show_modified))
 		return;
+	last_shown_ce = NULL;
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
@@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 
 		construct_fullname(&fullname, repo, ce);
 
+		if (skipping_duplicates && last_shown_ce &&
+			!strcmp(last_shown_ce->name,ce->name))
+				continue;
 		if ((dir->flags & DIR_SHOW_IGNORED) &&
 			!ce_excluded(dir, repo->index, fullname.buf, ce))
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
 		if (show_cached || show_stage) {
+			if (skipping_duplicates && last_shown_ce &&
+				!strcmp(last_shown_ce->name,ce->name))
+					continue;
 			if (!show_unmerged || ce_stage(ce))
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (show_cached && skipping_duplicates)
+				last_shown_ce = ce;
 		}
 		if (ce_skip_worktree(ce))
 			continue;
+		if (skipping_duplicates && last_shown_ce &&
+			!strcmp(last_shown_ce->name,ce->name))
+				continue;
 		err = lstat(fullname.buf, &st);
 		if (err) {
-			if (errno != ENOENT && errno != ENOTDIR)
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (show_deleted)
+			if (skipping_duplicates && show_deleted && show_modified)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified)
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			else {
+				if (errno != ENOENT && errno != ENOTDIR)
+					error_errno("cannot lstat '%s'", fullname.buf);
+				if (show_deleted)
+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
+				if (show_modified)
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			}
 		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		last_shown_ce = ce;
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..2682b1f43a6
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-19  6:30         ` [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-20 20:26           ` Junio C Hamano
  2021-01-21 10:02             ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-20 20:26 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> This situation may occur in the original code: lstat() failed
> but we use `&st` to feed ie_modified() later.
>
> Therefore, we can directly execute show_ce without the judgment of
> ie_modified() when lstat() has failed.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---

Thanks.  A few comments:

 * The error_errno() line is not indented correctly; I'll fix it up
   while queuing, but it would conflict with 2/3 as you'll be moving
   that line around.

 * When we say "error", we do not even know if the thing got removed
   or modified at all, so it is somewhat strange to report it as
   such (the path may be intact and the only issue may be that we
   cannot read the containing directory).  It is equally strange not
   to say anything on the path, and between the two, there isn't
   clearly a more correct answer.  What you implemented here does
   not change the traditional behaviour of reporting it as
   deleted/modified to "alert" the user, which I think is good.

 * The logic for modified entry looks a bit duplicated.  I wonder if
   the one at the end of this message reads better.  Renaming err to
   stat_err is optional, but I think the name makes it clear why it
   is sensible that these two places use the variable as a sign that
   the path was deleted and/or modified.

>  			err = lstat(fullname.buf, &st);
> +			if (err) {
> +				if (errno != ENOENT && errno != ENOTDIR)
> +				    error_errno("cannot lstat '%s'", fullname.buf);
> +				if (show_deleted)
> +					show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +				if (show_modified)
> +					show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
>  				show_ce(repo, dir, ce, fullname.buf, tag_modified);


			stat_err = lstat(...);
			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
				error_errno("cannot lstat '%s'", fullname.buf);

			if (show_deleted && stat_err)
				show_ce(..., tag_removed);
			if (show_modified &&
			    (stat_err || ie_modified(..., &st, 0)))
				show_ce(..., tag_modified);
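
The consolidated conditions above can be checked in isolation. The following is only a minimal sketch, not the code in builtin/ls-files.c: `tags_to_show()` and the `SHOW_*` bit names are hypothetical, and the result of ie_modified() is passed in as a plain flag so that no index or working tree is needed.

```c
/* Hypothetical bit flags standing in for show_ce(..., tag_removed / tag_modified). */
enum { SHOW_REMOVED = 1, SHOW_MODIFIED = 2 };

/*
 * Decide what to report for one entry.
 * stat_err:    non-zero when lstat() failed
 * is_modified: stand-in for the result of ie_modified(); it only
 *              matters when lstat() succeeded
 */
static int tags_to_show(int show_deleted, int show_modified,
			int stat_err, int is_modified)
{
	int tags = 0;

	if (show_deleted && stat_err)
		tags |= SHOW_REMOVED;
	if (show_modified && (stat_err || is_modified))
		tags |= SHOW_MODIFIED;
	return tags;
}
```

With both options on and a failed lstat(), the sketch reports the path as both removed and modified, which matches the traditional behaviour described earlier in this message.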



* Re: [PATCH v5 2/3] ls_files.c: consolidate two for loops into one
  2021-01-19  6:30         ` [PATCH v5 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-20 20:27           ` Junio C Hamano
  2021-01-21 11:05             ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-20 20:27 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Refactor the two for loops into one,skip showing the ce if it
> has the same name as the previously shown one, only when doing so
> won't lose information.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  builtin/ls-files.c | 70 +++++++++++++++++++---------------------------
>  1 file changed, 29 insertions(+), 41 deletions(-)

This one needs a bit more work, but I like the basic structure of
the rewritten loop.

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index f1617260064..1454ab1ae6f 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -312,51 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  		if (show_killed)
>  			show_killed_files(repo->index, dir);
>  	}
> -	if (show_cached || show_stage) {
> -		for (i = 0; i < repo->index->cache_nr; i++) {
> -			const struct cache_entry *ce = repo->index->cache[i];
> +	if (! (show_cached || show_stage || show_deleted || show_modified))
> +		return;

If none of these four are given, nothing will be shown after this
point, so returning early is good.

> +	for (i = 0; i < repo->index->cache_nr; i++) {
> +		const struct cache_entry *ce = repo->index->cache[i];
> +		struct stat st;
> +		int err;
>  
> +		construct_fullname(&fullname, repo, ce);
>  
> +		if ((dir->flags & DIR_SHOW_IGNORED) &&
> +			!ce_excluded(dir, repo->index, fullname.buf, ce))
> +			continue;
> +		if (ce->ce_flags & CE_UPDATE)
> +			continue;

The above two are common between the original two codepaths, and
merging them is good.

> +		if (show_cached || show_stage) {
> +			if (!show_unmerged || ce_stage(ce))
> +				show_ce(repo, dir, ce, fullname.buf,
> +					ce_stage(ce) ? tag_unmerged :
> +					(ce_skip_worktree(ce) ? tag_skip_worktree :
> +						tag_cached));
>  		}

We would want to reduce the indentation level of the show_ce() by
consolidating the nested if/if to

		if ((show_cached || show_stage) &&
                    (!show_unmerged || ce_stage(ce)))
			show_ce(...);


Everything below from this point should be skipped (especially, the
call to lstat()) unless show_modified and/or show_deleted was asked
by the caller, i.e.  we want to insert

		if (!(show_deleted || show_modified))
			continue;

here, before we call ce_skip_worktree(), I think.
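
The effect of that early "continue" can be illustrated without touching the filesystem. This is a minimal sketch under stated assumptions: `struct entry` is a hypothetical, simplified stand-in for struct cache_entry, and `count_stat_calls()` is a hypothetical helper that only counts how many entries would reach lstat().

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cache_entry. */
struct entry {
	const char *name;
	int skip_worktree;
};

/*
 * Count how many entries would reach the lstat() call in a loop
 * shaped like the one discussed above.
 */
static size_t count_stat_calls(const struct entry *e, size_t nr,
			       int show_deleted, int show_modified)
{
	size_t i, calls = 0;

	for (i = 0; i < nr; i++) {
		/* ... the show_cached / show_stage output would happen here ... */
		if (!(show_deleted || show_modified))
			continue;	/* nobody asked; skip the stat entirely */
		if (e[i].skip_worktree)
			continue;
		calls++;		/* lstat() would be called here */
	}
	return calls;
}
```

When neither option is given the count stays at zero, so a plain "git ls-files -c" style run would not pay for one lstat() per index entry.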

> +		if (ce_skip_worktree(ce))
> +			continue;
> +		err = lstat(fullname.buf, &st);
> +		if (err) {
> +			if (errno != ENOENT && errno != ENOTDIR)
> +				error_errno("cannot lstat '%s'", fullname.buf);
> +			if (show_deleted)
> +				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +			if (show_modified)
>  				show_ce(repo, dir, ce, fullname.buf, tag_modified);
> -		}
> +		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> +			show_ce(repo, dir, ce, fullname.buf, tag_modified);
>  	}

And this part would look somewhat different if we take my earlier
suggestion for [1/3].

Thanks.


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-19  6:30         ` [PATCH v5 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-20 21:26           ` Junio C Hamano
  2021-01-21 11:00             ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-20 21:26 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget; +Cc: git, Eric Sunshine, 胡哲宁

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  
>  		construct_fullname(&fullname, repo, ce);
>  
> +		if (skipping_duplicates && last_shown_ce &&
> +			!strcmp(last_shown_ce->name,ce->name))
> +				continue;

Style.  Missing SP after comma.

>  		if ((dir->flags & DIR_SHOW_IGNORED) &&
>  			!ce_excluded(dir, repo->index, fullname.buf, ce))
>  			continue;
>  		if (ce->ce_flags & CE_UPDATE)
>  			continue;
>  		if (show_cached || show_stage) {
> +			if (skipping_duplicates && last_shown_ce &&
> +				!strcmp(last_shown_ce->name,ce->name))
> +					continue;

OK.  When show_stage is set, skipping_duplicates is automatically
turned off (and show_unmerged is automatically covered as it turns
show_stage on automatically).  So this feature has really become
"are we showing only names, and if so, did we show an entry of the
same name before?".

>  			if (!show_unmerged || ce_stage(ce))
>  				show_ce(repo, dir, ce, fullname.buf,
>  					ce_stage(ce) ? tag_unmerged :
>  					(ce_skip_worktree(ce) ? tag_skip_worktree :
>  						tag_cached));
> +			if (show_cached && skipping_duplicates)
> +				last_shown_ce = ce;

The code that calls show_ce() belonging to a totally separate if()
statement makes my stomach hurt---how are we going to guarantee that
"last shown" really will keep track of what was shown last?

Shouldn't the above be more like this?

- 			if (!show_unmerged || ce_stage(ce))
+ 			if (!show_unmerged || ce_stage(ce)) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+				last_shown_ce = ce;
+			}

It does maintain last_shown_ce even when skipping_duplicates is not
set, but I think that is an overall win.  Assigning unconditionally
would be cheaper than making a conditional jump on the variable and
then making the assignment (or not).
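
The idea of updating "last shown" in the same place as the output can be sketched on plain strings. This is a minimal illustration, not the ls-files code: `dedup_adjacent()` is a hypothetical helper, and it assumes that equal names are adjacent in the input, as the stages of one unmerged path are expected to be in the index.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy names[] to out[], dropping any name equal to the last one kept.
 * Relies on equal names being adjacent in the input.
 * Returns the number of names written to out[].
 */
static size_t dedup_adjacent(const char **names, size_t nr, const char **out)
{
	const char *last_shown = NULL;
	size_t i, kept = 0;

	for (i = 0; i < nr; i++) {
		if (last_shown && !strcmp(last_shown, names[i]))
			continue;
		out[kept++] = names[i];
		last_shown = names[i];	/* updated right where the name is "shown" */
	}
	return kept;
}
```

Because the assignment sits next to the statement that emits the name, "last_shown" cannot drift away from what was actually output.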

>  		}
>  		if (ce_skip_worktree(ce))
>  			continue;
> +		if (skipping_duplicates && last_shown_ce &&
> +			!strcmp(last_shown_ce->name,ce->name))
> +				continue;

Style.  Missing SP after comma.

OK, if we've shown an entry of the same name under skip-duplicates
mode, the code that follows would show the same entry again (if it
decides to show it), so we can go to the next entry early.

>  		err = lstat(fullname.buf, &st);
>  		if (err) {
> -			if (errno != ENOENT && errno != ENOTDIR)
> -				error_errno("cannot lstat '%s'", fullname.buf);
> -			if (show_deleted)
> +			if (skipping_duplicates && show_deleted && show_modified)
>  				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> -			if (show_modified)
> -				show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			else {
> +				if (errno != ENOENT && errno != ENOTDIR)
> +					error_errno("cannot lstat '%s'", fullname.buf);
> +				if (show_deleted)
> +					show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +				if (show_modified)
> +					show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			}
>  		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
>  			show_ce(repo, dir, ce, fullname.buf, tag_modified);

This part will change shape quite a bit when we follow the
suggestion I made on 1/3, so I won't analyze how correct this
version is.

> +		last_shown_ce = ce;
>  	}
>  
>  	strbuf_release(&fullname);
> @@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>  			N_("pretend that paths removed since <tree-ish> are still present")),
>  		OPT__ABBREV(&abbrev),
>  		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> +		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
>  		OPT_END()
>  	};
>  
> @@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>  		 * you also show the stage information.
>  		 */
>  		show_stage = 1;
> +	if (show_tag || show_stage)
> +		skipping_duplicates = 0;

OK.

>  	if (dir.exclude_per_dir)
>  		exc_given = 1;
>  

Thanks.


* Re: [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-20 20:26           ` Junio C Hamano
@ 2021-01-21 10:02             ` 胡哲宁
  0 siblings, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-21 10:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

On Thu, Jan 21, 2021 at 4:26 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > This situation may occur in the original code: lstat() failed
> > but we use `&st` to feed ie_modified() later.
> >
> > Therefore, we can directly execute show_ce without the judgment of
> > ie_modified() when lstat() has failed.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
>
> Thanks.  A few comments:
>
>  * The error_errno() line is not indented correctly; I'll fix it up
>    while queuing, but it would conflict with 2/3 as you'll be moving
>    that line around.
>
I had not noticed that; very sorry.
>  * When we say "error", we do not even know if the thing got removed
>    or modified at all, so it is somewhat strange to report it as
>    such (the path may be intact and the only issue may be that we
>    cannot read the containing directory).  It is equally strange not
>    to say anything on the path, and between the two, there isn't
>    clearly a more correct answer.  What you implemented here does
>    not change the traditional behaviour of reporting it as
>    deleted/modified to "alert" the user, which I think is good.
>
Haha, thanks!
>  * The logic for modified entry looks a bit duplicated.  I wonder if
>    the one at the end of this message reads better.  Renaming err to
>    stat_err is optional, but I think the name makes it clear why it
>    is sensible that these two places use the variable as a sign that
>    the path was deleted and/or modified.
>
> >                       err = lstat(fullname.buf, &st);
> > +                     if (err) {
> > +                             if (errno != ENOENT && errno != ENOTDIR)
> > +                                 error_errno("cannot lstat '%s'", fullname.buf);
> > +                             if (show_deleted)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     } else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> >                               show_ce(repo, dir, ce, fullname.buf, tag_modified);
>
>
>                         stat_err = lstat(...);
>                         if (stat_err && (errno != ENOENT && errno != ENOTDIR))
>                                 error_errno("cannot lstat '%s'", fullname.buf);
>
>                         if (show_deleted && stat_err)
>                                 show_ce(..., tag_removed);
>                         if (show_modified &&
>                             (stat_err || ie_modified(..., &st, 0)))
>                                 show_ce(..., tag_modified);
>
Yes, "stat_err" may be better than "err".
And putting the conditions together will make the code more compact.

Thanks for your comments :)


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-20 21:26           ` Junio C Hamano
@ 2021-01-21 11:00             ` 胡哲宁
  2021-01-21 20:45               ` Junio C Hamano
  2021-01-22 15:46               ` [PATCH v6] " ZheNing Hu
  0 siblings, 2 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-21 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

On Thu, Jan 21, 2021 at 5:26 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >
> >               construct_fullname(&fullname, repo, ce);
> >
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
Got it.
>
> >               if ((dir->flags & DIR_SHOW_IGNORED) &&
> >                       !ce_excluded(dir, repo->index, fullname.buf, ce))
> >                       continue;
> >               if (ce->ce_flags & CE_UPDATE)
> >                       continue;
> >               if (show_cached || show_stage) {
> > +                     if (skipping_duplicates && last_shown_ce &&
> > +                             !strcmp(last_shown_ce->name,ce->name))
> > +                                     continue;
>
> OK.  When show_stage is set, skipping_duplicates is automatically
> turned off (and show_unmerged is automatically covered as it turns
> show_stage on automatically).  So this feature has really become
> "are we showing only names, and if so, did we show an entry of the
> same name before?".
Yeah, showing only names; that is why I asked that question yesterday :)
>
> >                       if (!show_unmerged || ce_stage(ce))
> >                               show_ce(repo, dir, ce, fullname.buf,
> >                                       ce_stage(ce) ? tag_unmerged :
> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >                                               tag_cached));
> > +                     if (show_cached && skipping_duplicates)
> > +                             last_shown_ce = ce;
>
> The code that calls show_ce() belonging to a totally separate if()
> statement makes my stomach hurt---how are we going to guarantee that
> "last shown" really will keep track of what was shown last?
>
> Shouldn't the above be more like this?
>
> -                       if (!show_unmerged || ce_stage(ce))
> +                       if (!show_unmerged || ce_stage(ce)) {
>                                 show_ce(repo, dir, ce, fullname.buf,
>                                         ce_stage(ce) ? tag_unmerged :
>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
>                                                 tag_cached));
> +                               last_shown_ce = ce;
> +                       }
>
Well, I have also been thinking about this question: "last_shown_ce" is
not truly the last shown ce, but if "last_shown_ce" really tracked every
shown ce, we might need more cumbersome logic to make the program correct.
I tried the approach in your code above before, but found
that it may produce some errors.
> It does maintain last_shown_ce even when skipping_duplicates is not
> set, but I think that is overall win.  Assigning unconditionally
> would be cheaper than making a conditional jump on the variable and
> make assignment (or not).
>
> >               }
> >               if (ce_skip_worktree(ce))
> >                       continue;
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
>
> OK, if we've shown an entry of the same name under skip-duplicates
> mode, and the code that follows will show the same entry (if they
> decide to show it), so we can go to the next entry early.
>
> >               err = lstat(fullname.buf, &st);
> >               if (err) {
> > -                     if (errno != ENOENT && errno != ENOTDIR)
> > -                             error_errno("cannot lstat '%s'", fullname.buf);
> > -                     if (show_deleted)
> > +                     if (skipping_duplicates && show_deleted && show_modified)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > -                     if (show_modified)
> > -                             show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     else {
> > +                             if (errno != ENOENT && errno != ENOTDIR)
> > +                                     error_errno("cannot lstat '%s'", fullname.buf);
> > +                             if (show_deleted)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     }
> >               } else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> >                       show_ce(repo, dir, ce, fullname.buf, tag_modified);
>
> This part will change shape quite a bit when we follow the
> suggestion I made on 1/3, so I won't analyze how correct this
> version is.
>
Fine...
> > +             last_shown_ce = ce;
> >       }
> >
> >       strbuf_release(&fullname);
> > @@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                       N_("pretend that paths removed since <tree-ish> are still present")),
> >               OPT__ABBREV(&abbrev),
> >               OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> > +             OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
> >               OPT_END()
> >       };
> >
> > @@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                * you also show the stage information.
> >                */
> >               show_stage = 1;
> > +     if (show_tag || show_stage)
> > +             skipping_duplicates = 0;
>
> OK.
>
> >       if (dir.exclude_per_dir)
> >               exc_given = 1;
> >
>
> Thanks.

Thanks, Junio. I see that my PR on gitgitgadget has been accepted.
By the way,
I noticed that the issues labeled "leftoverbit" and "good first issue"
on gitgitgadget may not have been updated for a long time, and most of
them may already have been resolved.

Should they be updated?
Then we can happily be "bounty hunters" in the git community, haha!


* Re: [PATCH v5 2/3] ls_files.c: consolidate two for loops into one
  2021-01-20 20:27           ` Junio C Hamano
@ 2021-01-21 11:05             ` 胡哲宁
  0 siblings, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-21 11:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

On Thu, Jan 21, 2021 at 4:27 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > Refactor the two for loops into one,skip showing the ce if it
> > has the same name as the previously shown one, only when doing so
> > won't lose information.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >  builtin/ls-files.c | 70 +++++++++++++++++++---------------------------
> >  1 file changed, 29 insertions(+), 41 deletions(-)
>
> This one needs a bit more work, but I like the basic structure of
> the rewritten loop.
>
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index f1617260064..1454ab1ae6f 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -312,51 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >               if (show_killed)
> >                       show_killed_files(repo->index, dir);
> >       }
> > -     if (show_cached || show_stage) {
> > -             for (i = 0; i < repo->index->cache_nr; i++) {
> > -                     const struct cache_entry *ce = repo->index->cache[i];
> > +     if (! (show_cached || show_stage || show_deleted || show_modified))
> > +             return;
>
> If none of these four are given, nothing will be given after this
> point, so returning early is good.
>
I understand.
> > +     for (i = 0; i < repo->index->cache_nr; i++) {
> > +             const struct cache_entry *ce = repo->index->cache[i];
> > +             struct stat st;
> > +             int err;
> >
> > +             construct_fullname(&fullname, repo, ce);
> >
> > +             if ((dir->flags & DIR_SHOW_IGNORED) &&
> > +                     !ce_excluded(dir, repo->index, fullname.buf, ce))
> > +                     continue;
> > +             if (ce->ce_flags & CE_UPDATE)
> > +                     continue;
>
> The above two are common between the original two codepaths, and
> merging them is good.
>
> > +             if (show_cached || show_stage) {
> > +                     if (!show_unmerged || ce_stage(ce))
> > +                             show_ce(repo, dir, ce, fullname.buf,
> > +                                     ce_stage(ce) ? tag_unmerged :
> > +                                     (ce_skip_worktree(ce) ? tag_skip_worktree :
> > +                                             tag_cached));
> >               }
>
> We would want to reduce the indentation level of the show_ce() by
> consolidating the nested if/if to
>
>                 if ((show_cached || show_stage) &&
>                     (!show_unmerged || ce_stage(ce)))
>                         show_ce(...);
>
>
The reason I left it nested may be that in 3/3 I added some logic
inside "if (show_cached || show_stage)".
> Everything below from this point should be skipped (especially, the
> call to lstat()) unless show_modified and/or show_deleted was asked
> by the caller, i.e.  we want to insert
>
>                 if (!(show_deleted || show_modified))
>                         continue;
>
I agree.
> here, before we call ce_skip_worktree(), I think.
>
> > +             if (ce_skip_worktree(ce))
> > +                     continue;
> > +             err = lstat(fullname.buf, &st);
> > +             if (err) {
> > +                     if (errno != ENOENT && errno != ENOTDIR)
> > +                             error_errno("cannot lstat '%s'", fullname.buf);
> > +                     if (show_deleted)
> > +                             show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                     if (show_modified)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > -             }
> > +             } else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> > +                     show_ce(repo, dir, ce, fullname.buf, tag_modified);
> >       }
>
> And this part would look somewhat different if we take my earlier
> suggestion for [1/3].
>
Fine.
> Thanks.
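
For readers following along, the lstat() handling discussed in this message can be sketched in isolation. This is only a rough illustration, not the code in builtin/ls-files.c: classify_path(), the SHOWN_* flags and the content_changed parameter (a stand-in for the result of ie_modified()) are hypothetical names made up for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical flags: which listings a path would be shown in. */
enum { SHOWN_DELETED = 1, SHOWN_MODIFIED = 2 };

/*
 * content_changed is an assumption standing in for ie_modified();
 * it is only consulted when lstat() succeeds.
 */
int classify_path(const char *path, int show_deleted, int show_modified,
		  int content_changed)
{
	struct stat st;
	int shown = 0;
	int stat_err;

	assert(path);
	stat_err = lstat(path, &st);
	/* A missing path is expected; anything else is worth reporting. */
	if (stat_err && errno != ENOENT && errno != ENOTDIR)
		fprintf(stderr, "cannot lstat '%s': %s\n", path, strerror(errno));
	if (stat_err && show_deleted)
		shown |= SHOWN_DELETED;
	/* A path that cannot be stat-ed also counts as modified. */
	if (show_modified && (stat_err || content_changed))
		shown |= SHOWN_MODIFIED;
	return shown;
}
```

With both flags set, a missing path comes back as deleted and modified at the same time, which is the duplicate that --deduplicate is meant to suppress.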


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-21 11:00             ` 胡哲宁
@ 2021-01-21 20:45               ` Junio C Hamano
  2021-01-22  9:50                 ` 胡哲宁
  2021-01-22 15:46               ` [PATCH v6] " ZheNing Hu
  1 sibling, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-21 20:45 UTC (permalink / raw)
  To: 胡哲宁
  Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

胡哲宁 <adlternative@gmail.com> writes:

>> OK.  When show_stage is set, skipping_duplicates is automatically
>> turned off (and show_unmerged is automatically covered as it turns
>> show_stage on automatically).  So this feature has really become
>> "are we showing only names, and if so, did we show an entry of the
>> same name before?".
> Yeah, showing only names; that is why I asked that question yesterday :)
>>
>> >                       if (!show_unmerged || ce_stage(ce))
>> >                               show_ce(repo, dir, ce, fullname.buf,
>> >                                       ce_stage(ce) ? tag_unmerged :
>> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
>> >                                               tag_cached));
>> > +                     if (show_cached && skipping_duplicates)
>> > +                             last_shown_ce = ce;
>>
>> The code that calls show_ce() belonging to a totally separate if()
>> statement makes my stomach hurt---how are we going to guarantee that
>> "last shown" really will keep track of what was shown last?
>>
>> Shouldn't the above be more like this?
>>
>> -                       if (!show_unmerged || ce_stage(ce))
>> +                       if (!show_unmerged || ce_stage(ce)) {
>>                                 show_ce(repo, dir, ce, fullname.buf,
>>                                         ce_stage(ce) ? tag_unmerged :
>>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
>>                                                 tag_cached));
>> +                               last_shown_ce = ce;
>> +                       }
>>
> Well, I have also been thinking about this: "last_shown_ce" is not truly
> the last shown ce. But if "last_shown_ce" had to track every shown ce, we
> might need more cumbersome logic to keep the program correct.
> I tried the approach in your code above before, but found that it may
> cause some errors.

I think judicious use of "goto" without introducing the last_shown
would probably result in a much more maintainable code.  It may look
somewhat like so:

	for (i = 0; i < repo->index->cache_nr; i++) {
		const struct cache_entry *ce = repo->index->cache[i];
		struct stat st;
		int stat_err;

		construct_fullname(&fullname, repo, ce);

		if ((dir->flags & DIR_SHOW_IGNORED) &&
			!ce_excluded(dir, repo->index, fullname.buf, ce))
			continue;
		if (ce->ce_flags & CE_UPDATE)
			continue;
		if ((show_cached || show_stage) &&
		    (!show_unmerged || ce_stage(ce))) {
			show_ce(repo, dir, ce, fullname.buf,
				ce_stage(ce) ? tag_unmerged :
				(ce_skip_worktree(ce) ? tag_skip_worktree :
				 tag_cached));
			if (skip_duplicates)
				goto skip_to_next_name;
		}

		if (!show_deleted && !show_modified)
			continue;
		if (ce_skip_worktree(ce))
			continue;
		stat_err = lstat(fullname.buf, &st);
		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
			error_errno("cannot lstat '%s'", fullname.buf);

		if (show_deleted) {
			show_ce(repo, dir, ce, fullname.buf, tag_removed);
			if (skip_duplicates)
				goto skip_to_next_name;
		}
		if (show_modified &&
		    (stat_err || ie_modified(repo->index, ce, &st, 0)))
			show_ce(repo, dir, ce, fullname.buf, tag_modified);
		continue;

	skip_to_next_name:
		{
			int j;
			const struct cache_entry **cache = repo->index->cache;
			for (j = i + 1; j < repo->index->cache_nr; j++)
				if (strcmp(ce->ce_name, cache[j]->ce_name))
					break;
			i = j - 1; /* compensate for outer for loop */
		}
	}
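
The skip-ahead idea can also be tried outside of git. The following is a minimal sketch under stated assumptions: a plain sorted array of strings stands in for repo->index->cache, and list_unique_names() is a hypothetical helper, not an existing git function. It relies on the same property as the loop above, namely that entries sharing a name sit next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * names[] is an assumed stand-in for the index: sorted by name, with an
 * unmerged path appearing once per stage. Writes each distinct name to
 * out[] once and returns how many names were written.
 */
size_t list_unique_names(const char **names, size_t nr, const char **out)
{
	size_t i, n_out = 0;

	assert(names && out);
	for (i = 0; i < nr; i++) {
		size_t j;

		out[n_out++] = names[i];	/* "show" this entry */

		/* same job as skip_to_next_name: step over equal names */
		for (j = i + 1; j < nr; j++)
			if (strcmp(names[i], names[j]))
				break;
		i = j - 1;	/* compensate for the outer loop's i++ */
	}
	return n_out;
}
```

Because j starts at i + 1, the assignment i = j - 1 never moves i backwards, so the outer loop always makes progress.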


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-21 20:45               ` Junio C Hamano
@ 2021-01-22  9:50                 ` 胡哲宁
  2021-01-22 16:04                   ` Johannes Schindelin
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-22  9:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Junio C Hamano <gitster@pobox.com> wrote on Fri, Jan 22, 2021 at 4:45 AM:
>
> 胡哲宁 <adlternative@gmail.com> writes:
>
> >> OK.  When show_stage is set, skipping_duplicates is automatically
> >> turned off (and show_unmerged is automatically covered as it turns
> >> show_stage on automatically).  So this feature has really become
> >> "are we showing only names, and if so, did we show an entry of the
> >> same name before?".
> > Yeah, showing only names; that is why I asked that question yesterday :)
> >>
> >> >                       if (!show_unmerged || ce_stage(ce))
> >> >                               show_ce(repo, dir, ce, fullname.buf,
> >> >                                       ce_stage(ce) ? tag_unmerged :
> >> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >> >                                               tag_cached));
> >> > +                     if (show_cached && skipping_duplicates)
> >> > +                             last_shown_ce = ce;
> >>
> >> The code that calls show_ce() belonging to a totally separate if()
> >> statement makes my stomach hurt---how are we going to guarantee that
> >> "last shown" really will keep track of what was shown last?
> >>
> >> Shouldn't the above be more like this?
> >>
> >> -                       if (!show_unmerged || ce_stage(ce))
> >> +                       if (!show_unmerged || ce_stage(ce)) {
> >>                                 show_ce(repo, dir, ce, fullname.buf,
> >>                                         ce_stage(ce) ? tag_unmerged :
> >>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
> >>                                                 tag_cached));
> >> +                               last_shown_ce = ce;
> >> +                       }
> >>
> > Well, I have also been thinking about this: "last_shown_ce" is not truly
> > the last shown ce. But if "last_shown_ce" had to track every shown ce, we
> > might need more cumbersome logic to keep the program correct.
> > I tried the approach in your code above before, but found that it may
> > cause some errors.
>
> I think judicious use of "goto" without introducing the last_shown
> would probably result in a much more maintainable code.  It may look
> somewhat like so:
>
>         for (i = 0; i < repo->index->cache_nr; i++) {
>                 const struct cache_entry *ce = repo->index->cache[i];
>                 struct stat st;
>                 int stat_err;
>
>                 construct_fullname(&fullname, repo, ce);
>
>                 if ((dir->flags & DIR_SHOW_IGNORED) &&
>                         !ce_excluded(dir, repo->index, fullname.buf, ce))
>                         continue;
>                 if (ce->ce_flags & CE_UPDATE)
>                         continue;
>                 if ((show_cached || show_stage) &&
>                     (!show_unmerged || ce_stage(ce))) {
>                         show_ce(repo, dir, ce, fullname.buf,
>                                 ce_stage(ce) ? tag_unmerged :
>                                 (ce_skip_worktree(ce) ? tag_skip_worktree :
>                                  tag_cached));
>                         if (skip_duplicates)
>                                 goto skip_to_next_name;
>                 }
>
>                 if (!show_deleted && !show_modified)
>                         continue;
>                 if (ce_skip_worktree(ce))
>                         continue;
>                 stat_err = lstat(fullname.buf, &st);
>                 if (stat_err && (errno != ENOENT && errno != ENOTDIR))
>                         error_errno("cannot lstat '%s'", fullname.buf);
>
>                 if (show_deleted) {
>                         show_ce(repo, dir, ce, fullname.buf, tag_removed);
>                         if (skip_duplicates)
>                                 goto skip_to_next_name;
>                 }
>                 if (show_modified &&
>                     (stat_err || ie_modified(repo->index, ce, &st, 0)))
>                         show_ce(repo, dir, ce, fullname.buf, tag_modified);
>                 continue;
>
>         skip_to_next_name:
>                 {
>                         int j;
>                         const struct cache_entry **cache = repo->index->cache;
>                         for (j = i + 1; j < repo->index->cache_nr; j++)
>                                 if (strcmp(ce->ce_name, cache[j]->ce_name))
>                                         break;
>                         i = j - 1; /* compensate for outer for loop */
>                 }
>         }
I have to admit that skipping with "goto" is indeed a good approach.
Thanks for your help.
Also, should I keep using the GitGitGadget PR on my original branch "dedup",
or send patches on the branch "zh/ls-files-deduplicate"?


* [PATCH v6] ls-files.c: add --deduplicate option
  2021-01-21 11:00             ` 胡哲宁
  2021-01-21 20:45               ` Junio C Hamano
@ 2021-01-22 15:46               ` ZheNing Hu
  2021-01-22 20:52                 ` Junio C Hamano
  1 sibling, 1 reply; 65+ messages in thread
From: ZheNing Hu @ 2021-01-22 15:46 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Junio C Hamano, 阿德烈

To give users a better experience when viewing information about
files in the index and the working tree, the `--deduplicate` option
suppresses duplicate file names under certain conditions.

During a merge conflict, a file name may appear multiple times in
the output of "git ls-files". For example, if there is an unmerged
path `a.c`, then `a.c` will appear three times in the output of
"git ls-files". "git ls-files --deduplicate" outputs `a.c` only
once (unless `--stage` or `--unmerged` is used to view all the
detailed information in the index).

In addition, when `--deleted` and `--modified` are used together,
a deleted file is listed twice; the `--deduplicate` option
suppresses the duplicate file name in that case as well.

Additional notes:
`--deduplicate` suppresses duplicate file names, not duplicate
entry information, so it has no effect when `-t`, `--stage`, or
`--unmerged` is given.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 46 +++++++++++++++++-------
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 12 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 0a3b5265b3..a05f063d3d 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -80,6 +81,10 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1454ab1ae6..e67dc1ff45 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -317,7 +318,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
 		struct stat st;
-		int err;
+		int stat_err;
 
 		construct_fullname(&fullname, repo, ce);
 
@@ -326,25 +327,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
-		if (show_cached || show_stage) {
-			if (!show_unmerged || ce_stage(ce))
+		if ((show_cached || show_stage) &&
+			(!show_unmerged || ce_stage(ce))) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (show_cached && skipping_duplicates)
+				goto skip_to_next_name;
 		}
+		if (!show_deleted && !show_modified)
+			continue;
 		if (ce_skip_worktree(ce))
 			continue;
-		err = lstat(fullname.buf, &st);
-		if (err) {
-			if (errno != ENOENT && errno != ENOTDIR)
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (show_deleted)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified)
+		stat_err = lstat(fullname.buf, &st);
+		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+			error_errno("cannot lstat '%s'", fullname.buf);
+		if (stat_err && show_deleted) {
+			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		if (show_modified &&
+			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
-		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
-			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		continue;
+skip_to_next_name:
+		{
+			int j;
+			struct cache_entry **cache = repo->index->cache;
+			for (j = i + 1; j < repo->index->cache_nr; j++)
+				if (strcmp(ce->name, cache[j]->name))
+					break;
+			i = j - 1; /* compensate for outer for loop */
+		}
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 0000000000..2682b1f43a
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done
-- 
2.30.0



* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-22  9:50                 ` 胡哲宁
@ 2021-01-22 16:04                   ` Johannes Schindelin
  2021-01-22 18:02                     ` Junio C Hamano
  2021-01-23  8:20                     ` 胡哲宁
  0 siblings, 2 replies; 65+ messages in thread
From: Johannes Schindelin @ 2021-01-22 16:04 UTC (permalink / raw)
  To: 胡哲宁
  Cc: Junio C Hamano, ZheNing Hu via GitGitGadget, Git List,
	Eric Sunshine


Hi 胡哲宁,

On Fri, 22 Jan 2021, 胡哲宁 wrote:

> And should I still use gitgitgadget PR on my origin branch "dedup"or
> send patch on branch "zh/ls-files-deduplicate"?

The way GitGitGadget is designed asks for contributors to adjust their
patch(es) via interactive rebase, implementing the suggestions and
addressing the concerns while doing so, then force-pushing, optionally
amending the first PR comment (i.e. the description) with a list of
those changes, and then submitting a new iteration via `/submit`.

Ciao,
Johannes


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-22 16:04                   ` Johannes Schindelin
@ 2021-01-22 18:02                     ` Junio C Hamano
  2021-03-19 13:54                       ` GitGitGadget and `next`, was " Johannes Schindelin
  2021-01-23  8:20                     ` 胡哲宁
  1 sibling, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 18:02 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: 胡哲宁, ZheNing Hu via GitGitGadget, Git List,
	Eric Sunshine

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> And should I still use gitgitgadget PR on my origin branch "dedup"or
>> send patch on branch "zh/ls-files-deduplicate"?
>
> The way GitGitGadget is designed asks for contributors to adjust their
> patch(es) via interactive rebase, implementing the suggestions and
> addressing the concerns while doing so, then force-pushing, optionally
> amending the first PR comment (i.e. the description) with a list of
> those changes, and then submitting a new iteration via `/submit`.

Thanks for clearly explaining the rules.

As I suspect many people are afraid of forcing their pushes, it
would reassure them to explain that it is OK to force when they
restart the series from scratch by replacing the commits.

And it would very much help on the receiving end when the
description gets updated.

Just being curious, but when a series hits 'next', would the way in
which the user interacts with GGG change?  With or without GGG, what
is done on the local side is not all that different---you build new
commits on top without disturbing the commits that are in 'next'.
Then what?  Push it again (this time there is no need to force) and
submit the additional ones via `/submit`?

Thanks.


* Re: [PATCH v6] ls-files.c: add --deduplicate option
  2021-01-22 15:46               ` [PATCH v6] " ZheNing Hu
@ 2021-01-22 20:52                 ` Junio C Hamano
  2021-01-23  8:27                   ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-22 20:52 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: git, Eric Sunshine

ZheNing Hu <adlternative@gmail.com> writes:

> In order to provide users a better experience
> when viewing information about files in the index
> and the working tree, the `--deduplicate` option will suppress
> some duplicate name under some conditions.

Now is it just a single patch squashing everything together?
That does not look like it.

> @@ -317,7 +318,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  	for (i = 0; i < repo->index->cache_nr; i++) {
>  		const struct cache_entry *ce = repo->index->cache[i];
>  		struct stat st;
> -		int err;
> +		int stat_err;
>  
>  		construct_fullname(&fullname, repo, ce);
>  
> @@ -326,25 +327,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  			continue;
>  		if (ce->ce_flags & CE_UPDATE)
>  			continue;
> -		if (show_cached || show_stage) {
> -			if (!show_unmerged || ce_stage(ce))
> +		if ((show_cached || show_stage) &&
> +			(!show_unmerged || ce_stage(ce))) {
>  				show_ce(repo, dir, ce, fullname.buf,
>  					ce_stage(ce) ? tag_unmerged :
>  					(ce_skip_worktree(ce) ? tag_skip_worktree :
>  						tag_cached));
> +			if (show_cached && skipping_duplicates)
> +				goto skip_to_next_name;

Why should this be so complex?  You are dropping skipping_duplicates
when the output is not name-only, so shouldn't this look more like

		if ((show_cached || show_stage) &&
		    (!show_unmerged || ce_stage(ce))) {
			show_ce(...);
			if (skipping_duplicates)
				goto skip_to_next_name;
		}

It seems that this still depends on the 2/3 from the previous
iteration, against which I suggested to merge the conditions of
nested if statements into one.  That should be done in the updated
2/3, not in this step, no?
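
The claim that the extra "show_cached &&" is redundant can be checked by brute force. What follows is a small sketch, not code from the patch: struct opts, normalize() and count_mismatches() are hypothetical names, and normalize() is assumed to mirror the fix-up in cmd_ls_files() that clears skipping_duplicates whenever show_tag or show_stage is set.

```c
#include <assert.h>

/* Hypothetical bundle of the four flags that matter here. */
struct opts {
	int show_cached, show_stage, show_tag, skipping_duplicates;
};

/* Assumed mirror of the fix-up done after option parsing. */
void normalize(struct opts *o)
{
	assert(o);
	if (o->show_tag || o->show_stage)
		o->skipping_duplicates = 0;
}

/*
 * Walks all 16 flag combinations and counts those where
 * "show_cached && skipping_duplicates" and plain "skipping_duplicates"
 * disagree inside the (show_cached || show_stage) branch.
 */
int count_mismatches(void)
{
	int bits, mismatches = 0;

	for (bits = 0; bits < 16; bits++) {
		struct opts o = {
			!!(bits & 1), !!(bits & 2), !!(bits & 4), !!(bits & 8)
		};

		normalize(&o);
		if (!(o.show_cached || o.show_stage))
			continue;	/* the goto is not reachable here */
		if ((o.show_cached && o.skipping_duplicates) !=
		    o.skipping_duplicates)
			mismatches++;
	}
	return mismatches;
}
```

Under that assumption no combination disagrees: once show_stage forces skipping_duplicates to 0, the only way to enter the branch with skipping_duplicates still set is through show_cached.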

>  		}
> +		if (!show_deleted && !show_modified)
> +			continue;

And this one also belongs to the step 2/3 that consolidates the two
loops into one.

I think you'd need to start from the three patches in v5, "rebase -i"
not just [3/3] but at least [2/3], too.

Thanks.


* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-22 16:04                   ` Johannes Schindelin
  2021-01-22 18:02                     ` Junio C Hamano
@ 2021-01-23  8:20                     ` 胡哲宁
  1 sibling, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-23  8:20 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine,
	Junio C Hamano

Hi Johannes Schindelin,
Thanks for explaining which one to choose. I was a little confused
about the git workflow before; I don't need to touch branches like
"zh/ls-files-deduplicate", right?

Well, then I will modify my code on the original branch.

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote on Sat, Jan 23, 2021 at 12:04 AM:
>
> Hi 胡哲宁,
>
> On Fri, 22 Jan 2021, 胡哲宁 wrote:
>
> > And should I still use gitgitgadget PR on my origin branch "dedup"or
> > send patch on branch "zh/ls-files-deduplicate"?
>
> The way GitGitGadget is designed asks for contributors to adjust their
> patch(es) via interactive rebase, implementing the suggestions and
> addressing the concerns while doing so, then force-pushing, optionally
> amending the first PR comment (i.e. the description) with a list of
> those changes, and then submitting a new iteration via `/submit`.
>
> Ciao,
> Johannes


* Re: [PATCH v6] ls-files.c: add --deduplicate option
  2021-01-22 20:52                 ` Junio C Hamano
@ 2021-01-23  8:27                   ` 胡哲宁
  0 siblings, 0 replies; 65+ messages in thread
From: 胡哲宁 @ 2021-01-23  8:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Junio C Hamano <gitster@pobox.com> wrote on Sat, Jan 23, 2021 at 4:52 AM:
>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> > In order to provide users a better experience
> > when viewing information about files in the index
> > and the working tree, the `--deduplicate` option will suppress
> > some duplicate name under some conditions.
>
> Now is it just a single patch squashing everything together?
> That does not look like it.
>
> > @@ -317,7 +318,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >       for (i = 0; i < repo->index->cache_nr; i++) {
> >               const struct cache_entry *ce = repo->index->cache[i];
> >               struct stat st;
> > -             int err;
> > +             int stat_err;
> >
> >               construct_fullname(&fullname, repo, ce);
> >
> > @@ -326,25 +327,43 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >                       continue;
> >               if (ce->ce_flags & CE_UPDATE)
> >                       continue;
> > -             if (show_cached || show_stage) {
> > -                     if (!show_unmerged || ce_stage(ce))
> > +             if ((show_cached || show_stage) &&
> > +                     (!show_unmerged || ce_stage(ce))) {
> >                               show_ce(repo, dir, ce, fullname.buf,
> >                                       ce_stage(ce) ? tag_unmerged :
> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >                                               tag_cached));
> > +                     if (show_cached && skipping_duplicates)
> > +                             goto skip_to_next_name;
>
> Why should this be so complex?  You are dropping skipping_duplicates
> when the output is not name-only, so shouldn't this look more like
>
True, I may have overthought this: if "show_stage" is set,
"skipping_duplicates" must already be false.
>                 if ((show_cached || show_stage) &&
>                     (!show_unmerged || ce_stage(ce))) {
>                         show_ce(...);
>                         if (skipping_duplicates)
>                                 goto skip_to_next_name;
>                 }
>
> It seems that this still depends on the 2/3 from the previous
> iteration, against which I suggested to merge the conditions of
> nested if statements into one.  That should be done in the updated
> 2/3, not in this step, no?
>
> >               }
> > +             if (!show_deleted && !show_modified)
> > +                     continue;
>
> And this one also belongs to the step 2/3 that consolidates the two
> loops into one.
>
> I think you'd need to start from the three patches in v5, "rebase -i"
> not just [3/3] but at least [2/3], too.
>
> Thanks.

Thanks, I am rewriting it.


* [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-01-19  6:30         ` [PATCH v5 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-23 10:20         ` 阿德烈 via GitGitGadget
  2021-01-23 10:20           ` [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
                             ` (4 more replies)
  3 siblings, 5 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-23 10:20 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈

While reading the source code of git ls-files, I learned that it may
output duplicate file names when there are unmerged paths during a branch
merge, or when certain options are used together. Users may be confused
when they see these duplicate file names.

As Junio C Hamano said, it has odd behaviour.

Therefore, we can provide an additional option to git ls-files to suppress
this repeated information.

This fixes https://github.com/gitgitgadget/git/issues/198

Thanks!

ZheNing Hu (3):
  ls_files.c: bugfix for --deleted and --modified
  ls_files.c: consolidate two for loops into one
  ls-files.c: add --deduplicate option

 Documentation/git-ls-files.txt |  5 ++
 builtin/ls-files.c             | 85 ++++++++++++++++++++--------------
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 35 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v5:

 1:  ec9464f6094 ! 1:  fbc38ce9075 ls_files.c: bugfix for --deleted and --modified
     @@ Commit message
      
       ## builtin/ls-files.c ##
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     + 		for (i = 0; i < repo->index->cache_nr; i++) {
     + 			const struct cache_entry *ce = repo->index->cache[i];
     + 			struct stat st;
     +-			int err;
     ++			int stat_err;
     + 
     + 			construct_fullname(&fullname, repo, ce);
     + 
     +@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     + 				continue;
       			if (ce_skip_worktree(ce))
       				continue;
     - 			err = lstat(fullname.buf, &st);
     +-			err = lstat(fullname.buf, &st);
      -			if (show_deleted && err)
     --				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++			stat_err = lstat(fullname.buf, &st);
     ++			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
     ++				error_errno("cannot lstat '%s'", fullname.buf);
     ++			if (stat_err && show_deleted)
     + 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
      -			if (show_modified && ie_modified(repo->index, ce, &st, 0))
     -+			if (err) {
     -+				if (errno != ENOENT && errno != ENOTDIR)
     -+				    error_errno("cannot lstat '%s'", fullname.buf);
     -+				if (show_deleted)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+				if (show_modified)
     +-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++			if (show_modified &&
     ++				(stat_err || ie_modified(repo->index, ce, &st, 0)))
      +					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     - 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
       		}
       	}
     + 
 2:  802ff802be8 ! 2:  3997d390883 ls_files.c: consolidate two for loops into one
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -	if (show_cached || show_stage) {
      -		for (i = 0; i < repo->index->cache_nr; i++) {
      -			const struct cache_entry *ce = repo->index->cache[i];
     -+	if (! (show_cached || show_stage || show_deleted || show_modified))
     -+		return;
     -+	for (i = 0; i < repo->index->cache_nr; i++) {
     -+		const struct cache_entry *ce = repo->index->cache[i];
     -+		struct stat st;
     -+		int err;
     - 
     +-
      -			construct_fullname(&fullname, repo, ce);
     -+		construct_fullname(&fullname, repo, ce);
     - 
     +-
      -			if ((dir->flags & DIR_SHOW_IGNORED) &&
      -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
      -				continue;
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -				ce_stage(ce) ? tag_unmerged :
      -				(ce_skip_worktree(ce) ? tag_skip_worktree :
      -				 tag_cached));
     -+		if ((dir->flags & DIR_SHOW_IGNORED) &&
     -+			!ce_excluded(dir, repo->index, fullname.buf, ce))
     -+			continue;
     -+		if (ce->ce_flags & CE_UPDATE)
     -+			continue;
     -+		if (show_cached || show_stage) {
     -+			if (!show_unmerged || ce_stage(ce))
     -+				show_ce(repo, dir, ce, fullname.buf,
     -+					ce_stage(ce) ? tag_unmerged :
     -+					(ce_skip_worktree(ce) ? tag_skip_worktree :
     -+						tag_cached));
     - 		}
     +-		}
      -	}
      -	if (show_deleted || show_modified) {
      -		for (i = 0; i < repo->index->cache_nr; i++) {
      -			const struct cache_entry *ce = repo->index->cache[i];
      -			struct stat st;
     --			int err;
     --
     +-			int stat_err;
     ++	if (! (show_cached || show_stage || show_deleted || show_modified))
     ++		return;
     ++	for (i = 0; i < repo->index->cache_nr; i++) {
     ++		const struct cache_entry *ce = repo->index->cache[i];
     ++		struct stat st;
     ++		int stat_err;
     + 
      -			construct_fullname(&fullname, repo, ce);
     --
     ++		construct_fullname(&fullname, repo, ce);
     + 
      -			if ((dir->flags & DIR_SHOW_IGNORED) &&
      -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
      -				continue;
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
      -				continue;
      -			if (ce_skip_worktree(ce))
      -				continue;
     --			err = lstat(fullname.buf, &st);
     --			if (err) {
     --				if (errno != ENOENT && errno != ENOTDIR)
     --				    error_errno("cannot lstat '%s'", fullname.buf);
     --				if (show_deleted)
     --					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --				if (show_modified)
     +-			stat_err = lstat(fullname.buf, &st);
     +-			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
     +-				error_errno("cannot lstat '%s'", fullname.buf);
     +-			if (stat_err && show_deleted)
     +-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     +-			if (show_modified &&
     +-				(stat_err || ie_modified(repo->index, ce, &st, 0)))
      -					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     --			} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     ++		if ((dir->flags & DIR_SHOW_IGNORED) &&
     ++			!ce_excluded(dir, repo->index, fullname.buf, ce))
     ++			continue;
     ++		if (ce->ce_flags & CE_UPDATE)
     ++			continue;
     ++		if (show_cached || show_stage) {
     ++			if (!show_unmerged || ce_stage(ce))
     ++				show_ce(repo, dir, ce, fullname.buf,
     ++					ce_stage(ce) ? tag_unmerged :
     ++					(ce_skip_worktree(ce) ? tag_skip_worktree :
     ++						tag_cached));
     + 		}
     ++		if (!show_deleted && !show_modified)
     ++			continue;
      +		if (ce_skip_worktree(ce))
      +			continue;
     -+		err = lstat(fullname.buf, &st);
     -+		if (err) {
     -+			if (errno != ENOENT && errno != ENOTDIR)
     -+				error_errno("cannot lstat '%s'", fullname.buf);
     -+			if (show_deleted)
     -+				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+			if (show_modified)
     - 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     --		}
     -+		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     -+			show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++		stat_err = lstat(fullname.buf, &st);
     ++		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
     ++			error_errno("cannot lstat '%s'", fullname.buf);
     ++		if (stat_err && show_deleted)
     ++			show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++		if (show_modified &&
     ++			(stat_err || ie_modified(repo->index, ce, &st, 0)))
     ++				show_ce(repo, dir, ce, fullname.buf, tag_modified);
       	}
       
       	strbuf_release(&fullname);
 3:  e9c53186706 ! 3:  07b603fd97c ls-files.c: add --deduplicate option
     @@ builtin/ls-files.c: static int line_terminator = '\n';
       static const char *prefix;
       static int max_prefix_len;
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - {
     - 	int i;
     - 	struct strbuf fullname = STRBUF_INIT;
     -+	const struct cache_entry *last_shown_ce;
     - 
     - 	/* For cached/deleted files we don't need to even do the readdir */
     - 	if (show_others || show_killed) {
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 	}
     - 	if (! (show_cached || show_stage || show_deleted || show_modified))
     - 		return;
     -+	last_shown_ce = NULL;
     - 	for (i = 0; i < repo->index->cache_nr; i++) {
     - 		const struct cache_entry *ce = repo->index->cache[i];
     - 		struct stat st;
     -@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     - 
     - 		construct_fullname(&fullname, repo, ce);
     - 
     -+		if (skipping_duplicates && last_shown_ce &&
     -+			!strcmp(last_shown_ce->name,ce->name))
     -+				continue;
     - 		if ((dir->flags & DIR_SHOW_IGNORED) &&
     - 			!ce_excluded(dir, repo->index, fullname.buf, ce))
       			continue;
       		if (ce->ce_flags & CE_UPDATE)
       			continue;
     - 		if (show_cached || show_stage) {
     -+			if (skipping_duplicates && last_shown_ce &&
     -+				!strcmp(last_shown_ce->name,ce->name))
     -+					continue;
     - 			if (!show_unmerged || ce_stage(ce))
     +-		if (show_cached || show_stage) {
     +-			if (!show_unmerged || ce_stage(ce))
     ++		if ((show_cached || show_stage) &&
     ++			(!show_unmerged || ce_stage(ce))) {
       				show_ce(repo, dir, ce, fullname.buf,
       					ce_stage(ce) ? tag_unmerged :
       					(ce_skip_worktree(ce) ? tag_skip_worktree :
       						tag_cached));
     -+			if (show_cached && skipping_duplicates)
     -+				last_shown_ce = ce;
     ++			if (skipping_duplicates)
     ++				goto skip_to_next_name;
       		}
     - 		if (ce_skip_worktree(ce))
     + 		if (!show_deleted && !show_modified)
       			continue;
     -+		if (skipping_duplicates && last_shown_ce &&
     -+			!strcmp(last_shown_ce->name,ce->name))
     -+				continue;
     - 		err = lstat(fullname.buf, &st);
     - 		if (err) {
     --			if (errno != ENOENT && errno != ENOTDIR)
     --				error_errno("cannot lstat '%s'", fullname.buf);
     --			if (show_deleted)
     -+			if (skipping_duplicates && show_deleted && show_modified)
     - 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
     --			if (show_modified)
     --				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+			else {
     -+				if (errno != ENOENT && errno != ENOTDIR)
     -+					error_errno("cannot lstat '%s'", fullname.buf);
     -+				if (show_deleted)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_removed);
     -+				if (show_modified)
     -+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+			}
     - 		} else if (show_modified && ie_modified(repo->index, ce, &st, 0))
     - 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
     -+		last_shown_ce = ce;
     +@@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
     + 		stat_err = lstat(fullname.buf, &st);
     + 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
     + 			error_errno("cannot lstat '%s'", fullname.buf);
     +-		if (stat_err && show_deleted)
     ++		if (stat_err && show_deleted) {
     + 			show_ce(repo, dir, ce, fullname.buf, tag_removed);
     ++			if (skipping_duplicates)
     ++				goto skip_to_next_name;
     ++		}
     + 		if (show_modified &&
     +-			(stat_err || ie_modified(repo->index, ce, &st, 0)))
     ++			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
     + 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
     ++			if (skipping_duplicates)
     ++				goto skip_to_next_name;
     ++		}
     ++		continue;
     ++skip_to_next_name:
     ++		{
     ++			int j;
     ++			struct cache_entry **cache = repo->index->cache;
     ++			for (j = i + 1; j < repo->index->cache_nr; j++)
     ++				if (strcmp(ce->name, cache[j]->name))
     ++					break;
     ++			i = j - 1; /* compensate for outer for loop */
     ++		}
       	}
       
       	strbuf_release(&fullname);

-- 
gitgitgadget


* [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
@ 2021-01-23 10:20           ` ZheNing Hu via GitGitGadget
  2021-01-23 17:55             ` Junio C Hamano
  2021-01-23 10:20           ` [PATCH v6 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-23 10:20 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

The original code can reach this situation: lstat() fails, but the
`st` it did not fill in is still passed to ie_modified().

Therefore, when lstat() has failed, call show_ce() directly without
consulting ie_modified().

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
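
As a rough illustration of the control flow this patch aims for (not the
code in builtin/ls-files.c itself), the sketch below reports a path as
removed and/or modified while never reading `st` after a failed lstat().
The names classify_path(), looks_modified(), SHOW_REMOVED and
SHOW_MODIFIED are hypothetical and exist only for this example;
looks_modified() is a stand-in for ie_modified() that merely compares
file sizes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result flags for this sketch; not part of git. */
enum { SHOW_REMOVED = 1, SHOW_MODIFIED = 2 };

/*
 * Hypothetical stand-in for ie_modified(): a file counts as modified
 * when its size differs from the recorded one. It must only be called
 * with a `st` that lstat() actually filled in.
 */
static int looks_modified(const struct stat *st, off_t recorded_size)
{
	return st->st_size != recorded_size;
}

/* Decide what to report for one path, mirroring the patched flow. */
static int classify_path(const char *path, off_t recorded_size,
			 int show_deleted, int show_modified)
{
	struct stat st;
	int result = 0;
	int stat_err = lstat(path, &st);

	/* A missing path is expected; any other failure gets a warning. */
	if (stat_err && errno != ENOENT && errno != ENOTDIR)
		fprintf(stderr, "cannot lstat '%s': %s\n",
			path, strerror(errno));
	if (stat_err && show_deleted)
		result |= SHOW_REMOVED;
	/* Short-circuit: `st` is never read when lstat() failed. */
	if (show_modified &&
	    (stat_err || looks_modified(&st, recorded_size)))
		result |= SHOW_MODIFIED;
	return result;
}
```

For a path that does not exist, calling classify_path() with both flags
set yields SHOW_REMOVED | SHOW_MODIFIED, and looks_modified() is never
reached.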
 builtin/ls-files.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..1e264bd1329 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -335,7 +335,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
-			int err;
+			int stat_err;
 
 			construct_fullname(&fullname, repo, ce);
 
@@ -346,11 +346,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 				continue;
 			if (ce_skip_worktree(ce))
 				continue;
-			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			stat_err = lstat(fullname.buf, &st);
+			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+				error_errno("cannot lstat '%s'", fullname.buf);
+			if (stat_err && show_deleted)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (show_modified &&
+				(stat_err || ie_modified(repo->index, ce, &st, 0)))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
 		}
 	}
 
-- 
gitgitgadget



* [PATCH v6 2/3] ls_files.c: consolidate two for loops into one
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-23 10:20           ` [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-23 10:20           ` ZheNing Hu via GitGitGadget
  2021-01-23 19:50             ` Junio C Hamano
  2021-01-23 10:20           ` [PATCH v6 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-23 10:20 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Refactor the two for loops into one, and skip showing the ce if it
has the same name as the previously shown one, but only when doing so
won't lose information.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
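
A simplified sketch of what folding the two loops into one looks like,
under stated assumptions: `struct entry`, `struct line` and
list_entries() are hypothetical names for this example only, and the
tag characters 'H' and 'R' are borrowed loosely from `ls-files -t`
output. It is not the code in builtin/ls-files.c.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an index entry. */
struct entry {
	const char *name;
	int missing; /* nonzero when the file is gone from the worktree */
};

/* Hypothetical record of one output line: a tag and a name. */
struct line {
	char tag;
	const char *name;
};

/*
 * Single pass over the entries: emit the "cached" line and then the
 * "removed" line for the same entry before moving on, instead of
 * walking the whole array once per kind of output.
 */
static size_t list_entries(const struct entry *entries, size_t nr,
			   int show_cached, int show_deleted,
			   struct line *out)
{
	size_t i, n = 0;

	if (!(show_cached || show_deleted))
		return 0;
	for (i = 0; i < nr; i++) {
		if (show_cached) {
			out[n].tag = 'H';
			out[n++].name = entries[i].name;
		}
		if (!show_deleted)
			continue;
		if (entries[i].missing) {
			out[n].tag = 'R';
			out[n++].name = entries[i].name;
		}
	}
	return n;
}
```

With both kinds of output requested, the lines for one entry come out
next to each other rather than in two separate groups, which is what
allows a later step to stop after the first line shown for a name.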
 builtin/ls-files.c | 70 ++++++++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 40 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1e264bd1329..966c0ab0296 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -312,49 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (show_cached || show_stage) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-
-			construct_fullname(&fullname, repo, ce);
-
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			show_ce(repo, dir, ce, fullname.buf,
-				ce_stage(ce) ? tag_unmerged :
-				(ce_skip_worktree(ce) ? tag_skip_worktree :
-				 tag_cached));
-		}
-	}
-	if (show_deleted || show_modified) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-			struct stat st;
-			int stat_err;
+	if (! (show_cached || show_stage || show_deleted || show_modified))
+		return;
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct stat st;
+		int stat_err;
 
-			construct_fullname(&fullname, repo, ce);
+		construct_fullname(&fullname, repo, ce);
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			if (ce_skip_worktree(ce))
-				continue;
-			stat_err = lstat(fullname.buf, &st);
-			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (stat_err && show_deleted)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified &&
-				(stat_err || ie_modified(repo->index, ce, &st, 0)))
-					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		if ((dir->flags & DIR_SHOW_IGNORED) &&
+			!ce_excluded(dir, repo->index, fullname.buf, ce))
+			continue;
+		if (ce->ce_flags & CE_UPDATE)
+			continue;
+		if (show_cached || show_stage) {
+			if (!show_unmerged || ce_stage(ce))
+				show_ce(repo, dir, ce, fullname.buf,
+					ce_stage(ce) ? tag_unmerged :
+					(ce_skip_worktree(ce) ? tag_skip_worktree :
+						tag_cached));
 		}
+		if (!show_deleted && !show_modified)
+			continue;
+		if (ce_skip_worktree(ce))
+			continue;
+		stat_err = lstat(fullname.buf, &st);
+		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+			error_errno("cannot lstat '%s'", fullname.buf);
+		if (stat_err && show_deleted)
+			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+		if (show_modified &&
+			(stat_err || ie_modified(repo->index, ce, &st, 0)))
+				show_ce(repo, dir, ce, fullname.buf, tag_modified);
 	}
 
 	strbuf_release(&fullname);
-- 
gitgitgadget



* [PATCH v6 3/3] ls-files.c: add --deduplicate option
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-23 10:20           ` [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
  2021-01-23 10:20           ` [PATCH v6 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-23 10:20           ` ZheNing Hu via GitGitGadget
  2021-01-23 19:51             ` Junio C Hamano
  2021-01-23 19:53           ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified Junio C Hamano
  2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  4 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-23 10:20 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

To give users a better experience when viewing information about
files in the index and the working tree, the `--deduplicate` option
suppresses duplicate file names under some conditions.

During a merge conflict, one file name may appear multiple times in
the output of "git ls-files". For example, if there is an unmerged
path `a.c`, then `a.c` will appear three times in the output of
"git ls-files". "git ls-files --deduplicate" outputs `a.c` only
once (unless `--stage` or `--unmerged` is used to view all the
detailed information in the index).

In addition, if `--deleted` and `--modified` are used at the same
time, the `--deduplicate` option also suppresses the duplicate file
names.

Additional notes:
`--deduplicate` suppresses duplicate file names, not duplicate entry
information, so it has no effect when `-t`, `--stage`, or
`--unmerged` is given, because those options display per-entry
information.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
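
The skip-ahead idea behind `skip_to_next_name` can be sketched on a
plain array of strings. This is an illustration only: unique_names() is
a hypothetical helper, and the sorted `names` array stands in for the
index, on the assumption that entries sharing a name sit next to each
other.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy each distinct name from a sorted array into `out`, skipping
 * the run of identical names that follows it. Returns the number of
 * names written; `out` must have room for `nr` pointers.
 */
static size_t unique_names(const char **names, size_t nr,
			   const char **out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		size_t j;

		out[n++] = names[i];
		/* Advance j past every following entry with this name. */
		for (j = i + 1; j < nr; j++)
			if (strcmp(names[i], names[j]))
				break;
		i = j - 1; /* the outer i++ lands on the next new name */
	}
	return n;
}
```

Given the names a.txt, a.txt, a.txt, b.txt, delete.txt, delete.txt,
the helper writes a.txt, b.txt and delete.txt. The approach needs no
extra state such as a "last shown" pointer, but it does depend on equal
names being adjacent.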
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 30 +++++++++++++---
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+), 4 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..d11c8ade402 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,10 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	Suppress duplicate entries when there are unmerged paths in index
+	or `--deleted` and `--modified` are combined.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 966c0ab0296..fb9cf50d764 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -326,12 +327,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
-		if (show_cached || show_stage) {
-			if (!show_unmerged || ce_stage(ce))
+		if ((show_cached || show_stage) &&
+			(!show_unmerged || ce_stage(ce))) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (skipping_duplicates)
+				goto skip_to_next_name;
 		}
 		if (!show_deleted && !show_modified)
 			continue;
@@ -340,11 +343,27 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		stat_err = lstat(fullname.buf, &st);
 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
 			error_errno("cannot lstat '%s'", fullname.buf);
-		if (stat_err && show_deleted)
+		if (stat_err && show_deleted) {
 			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
 		if (show_modified &&
-			(stat_err || ie_modified(repo->index, ce, &st, 0)))
+			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		continue;
+skip_to_next_name:
+		{
+			int j;
+			struct cache_entry **cache = repo->index->cache;
+			for (j = i + 1; j < repo->index->cache_nr; j++)
+				if (strcmp(ce->name, cache[j]->name))
+					break;
+			i = j - 1; /* compensate for outer for loop */
+		}
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..2682b1f43a6
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-23 10:20           ` [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-23 17:55             ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 17:55 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Eric Sunshine, 胡哲宁, Johannes Schindelin

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> This situation may occur in the original code: lstat() failed
> but we use `&st` to feed ie_modified() later.
>
> Therefore, we can directly execute show_ce without the judgment of
> ie_modified() when lstat() has failed.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  builtin/ls-files.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)

Looks good.  I think we are finished with this part, except for one
nit.

> +			if (stat_err && show_deleted)
>  				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> -			if (show_modified && ie_modified(repo->index, ce, &st, 0))
> -				show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +			if (show_modified &&
> +				(stat_err || ie_modified(repo->index, ce, &st, 0)))
> +					show_ce(repo, dir, ce, fullname.buf, tag_modified);
>  		}

The last line is misindented by having one leading horizontal tab
too many.  show_ce() for modified files and show_ce() for deleted
files are done independently under different conditions and stand as
equals, so the beginning of them should align to show that.

Perhaps format the last three lines more like so:

			if (show_modified &&
			    (stat_err || ie_modified(repo->index, ce, &st, 0)))
				show_ce(repo, dir, ce, fullname.buf, tag_modified);

Again this would cascade throughout the series, but let's see if
there are other things we may want to change in the rest of the
series first.  Otherwise, instead of having you rebase, I probably
have time to tweak the series on my end while queuing.

Thanks.


* Re: [PATCH v6 2/3] ls_files.c: consolidate two for loops into one
  2021-01-23 10:20           ` [PATCH v6 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-23 19:50             ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 19:50 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Eric Sunshine, 胡哲宁, Johannes Schindelin

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> Refactor the two for loops into one,skip showing the ce if it
> has the same name as the previously shown one, only when doing so
> won't lose information.

This message is all stale now.  This step does only refactoring,
without "skip showing" and others.

I've rebased the series locally and sending out a "v7" for your
review later.

Thanks.

>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  builtin/ls-files.c | 70 ++++++++++++++++++++--------------------------
>  1 file changed, 30 insertions(+), 40 deletions(-)
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index 1e264bd1329..966c0ab0296 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -312,49 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  		if (show_killed)
>  			show_killed_files(repo->index, dir);
>  	}
> -	if (show_cached || show_stage) {
> -		for (i = 0; i < repo->index->cache_nr; i++) {
> -			const struct cache_entry *ce = repo->index->cache[i];
> -
> -			construct_fullname(&fullname, repo, ce);
> -
> -			if ((dir->flags & DIR_SHOW_IGNORED) &&
> -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
> -				continue;
> -			if (show_unmerged && !ce_stage(ce))
> -				continue;
> -			if (ce->ce_flags & CE_UPDATE)
> -				continue;
> -			show_ce(repo, dir, ce, fullname.buf,
> -				ce_stage(ce) ? tag_unmerged :
> -				(ce_skip_worktree(ce) ? tag_skip_worktree :
> -				 tag_cached));
> -		}
> -	}
> -	if (show_deleted || show_modified) {
> -		for (i = 0; i < repo->index->cache_nr; i++) {
> -			const struct cache_entry *ce = repo->index->cache[i];
> -			struct stat st;
> -			int stat_err;
> +	if (! (show_cached || show_stage || show_deleted || show_modified))
> +		return;
> +	for (i = 0; i < repo->index->cache_nr; i++) {
> +		const struct cache_entry *ce = repo->index->cache[i];
> +		struct stat st;
> +		int stat_err;
>  
> -			construct_fullname(&fullname, repo, ce);
> +		construct_fullname(&fullname, repo, ce);
>  
> -			if ((dir->flags & DIR_SHOW_IGNORED) &&
> -			    !ce_excluded(dir, repo->index, fullname.buf, ce))
> -				continue;
> -			if (ce->ce_flags & CE_UPDATE)
> -				continue;
> -			if (ce_skip_worktree(ce))
> -				continue;
> -			stat_err = lstat(fullname.buf, &st);
> -			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
> -				error_errno("cannot lstat '%s'", fullname.buf);
> -			if (stat_err && show_deleted)
> -				show_ce(repo, dir, ce, fullname.buf, tag_removed);
> -			if (show_modified &&
> -				(stat_err || ie_modified(repo->index, ce, &st, 0)))
> -					show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +		if ((dir->flags & DIR_SHOW_IGNORED) &&
> +			!ce_excluded(dir, repo->index, fullname.buf, ce))
> +			continue;
> +		if (ce->ce_flags & CE_UPDATE)
> +			continue;
> +		if (show_cached || show_stage) {
> +			if (!show_unmerged || ce_stage(ce))
> +				show_ce(repo, dir, ce, fullname.buf,
> +					ce_stage(ce) ? tag_unmerged :
> +					(ce_skip_worktree(ce) ? tag_skip_worktree :
> +						tag_cached));
>  		}
> +		if (!show_deleted && !show_modified)
> +			continue;
> +		if (ce_skip_worktree(ce))
> +			continue;
> +		stat_err = lstat(fullname.buf, &st);
> +		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
> +			error_errno("cannot lstat '%s'", fullname.buf);
> +		if (stat_err && show_deleted)
> +			show_ce(repo, dir, ce, fullname.buf, tag_removed);
> +		if (show_modified &&
> +			(stat_err || ie_modified(repo->index, ce, &st, 0)))
> +				show_ce(repo, dir, ce, fullname.buf, tag_modified);
>  	}
>  
>  	strbuf_release(&fullname);


* Re: [PATCH v6 3/3] ls-files.c: add --deduplicate option
  2021-01-23 10:20           ` [PATCH v6 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-23 19:51             ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 19:51 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Eric Sunshine, 胡哲宁, Johannes Schindelin

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Additional instructions:
> In order to display entries information,`deduplicate` suppresses
> the output of duplicate file names, not the output of duplicate
> entries information, so under the option of `-t`, `--stage`, `--unmerge`,
> `--deduplicate` will have no effect.

That information belongs to the end-user documentation.


* [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
                             ` (2 preceding siblings ...)
  2021-01-23 10:20           ` [PATCH v6 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
@ 2021-01-23 19:53           ` Junio C Hamano
  2021-01-23 19:53             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one Junio C Hamano
  2021-01-23 19:53             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option Junio C Hamano
  2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  4 siblings, 2 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 19:53 UTC (permalink / raw)
  To: git; +Cc: gitgitgadget, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

The original code can reach this situation: lstat() fails, but the
`st` it did not fill in is still passed to ie_modified().

Therefore, when lstat() has failed, call show_ce() directly without
consulting ie_modified().

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: fixed misindented code]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/ls-files.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b8..ce6f6ad00e 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -335,7 +335,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
-			int err;
+			int stat_err;
 
 			construct_fullname(&fullname, repo, ce);
 
@@ -346,10 +346,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 				continue;
 			if (ce_skip_worktree(ce))
 				continue;
-			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			stat_err = lstat(fullname.buf, &st);
+			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+				error_errno("cannot lstat '%s'", fullname.buf);
+			if (stat_err && show_deleted)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
+			if (show_modified &&
+			    (stat_err || ie_modified(repo->index, ce, &st, 0)))
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
 		}
 	}
-- 
2.30.0-491-g302c625a7b


^ permalink raw reply related	[flat|nested] 65+ messages in thread
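
A minimal sketch of the lstat() handling that patch 1/3 introduces,
written as a standalone helper rather than as git code.  The names
`path_state` and `probe_path` are hypothetical and do not exist in
git.  ENOENT and ENOTDIR are treated as "the path is gone"; any other
failure is reported as well.  Note that the patch itself still lists
the entry as deleted and modified for every kind of lstat() failure;
the three-way result here only makes the distinction visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result type; git itself only keeps an int stat_err. */
enum path_state { PATH_PRESENT, PATH_MISSING, PATH_STAT_ERROR };

/*
 * Probe a path with lstat().  On success *st is filled in and may be
 * inspected by the caller.  On failure *st must not be used at all,
 * which is the bug the patch fixes for ie_modified().
 */
static enum path_state probe_path(const char *path, struct stat *st)
{
	assert(path != NULL && st != NULL);

	if (!lstat(path, st))
		return PATH_PRESENT;
	if (errno == ENOENT || errno == ENOTDIR)
		return PATH_MISSING;	/* expected for a deleted file */

	/* Unexpected failure: report it, as error_errno() does in git. */
	fprintf(stderr, "cannot lstat '%s': %s\n", path, strerror(errno));
	return PATH_STAT_ERROR;
}
```

A caller would consult the stat data only when the result is
PATH_PRESENT, and treat the other two results as "deleted, hence also
modified".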

* [PATCH v7 2/3] ls_files.c: consolidate two for loops into one
  2021-01-23 19:53           ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified Junio C Hamano
@ 2021-01-23 19:53             ` Junio C Hamano
  2021-01-23 19:53             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option Junio C Hamano
  1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 19:53 UTC (permalink / raw)
  To: git; +Cc: gitgitgadget, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

This will make it easier to show only one entry per filename in the
next step.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: corrected the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/ls-files.c | 63 ++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 36 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index ce6f6ad00e..e94d724aff 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -312,49 +312,40 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (show_cached || show_stage) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
 
-			construct_fullname(&fullname, repo, ce);
+	if (!(show_cached || show_stage || show_deleted || show_modified))
+		return;
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct stat st;
+		int stat_err;
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
+		construct_fullname(&fullname, repo, ce);
+
+		if ((dir->flags & DIR_SHOW_IGNORED) &&
+			!ce_excluded(dir, repo->index, fullname.buf, ce))
+			continue;
+		if (ce->ce_flags & CE_UPDATE)
+			continue;
+		if ((show_cached || show_stage) &&
+		    (!show_unmerged || ce_stage(ce)))
 			show_ce(repo, dir, ce, fullname.buf,
 				ce_stage(ce) ? tag_unmerged :
 				(ce_skip_worktree(ce) ? tag_skip_worktree :
 				 tag_cached));
-		}
-	}
-	if (show_deleted || show_modified) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-			struct stat st;
-			int stat_err;
-
-			construct_fullname(&fullname, repo, ce);
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			if (ce_skip_worktree(ce))
-				continue;
-			stat_err = lstat(fullname.buf, &st);
-			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (stat_err && show_deleted)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified &&
-			    (stat_err || ie_modified(repo->index, ce, &st, 0)))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
-		}
+		if (!(show_deleted || show_modified))
+			continue;
+		if (ce_skip_worktree(ce))
+			continue;
+		stat_err = lstat(fullname.buf, &st);
+		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+			error_errno("cannot lstat '%s'", fullname.buf);
+		if (stat_err && show_deleted)
+			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+		if (show_modified &&
+		    (stat_err || ie_modified(repo->index, ce, &st, 0)))
+			show_ce(repo, dir, ce, fullname.buf, tag_modified);
 	}
 
 	strbuf_release(&fullname);
-- 
2.30.0-491-g302c625a7b


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v7 3/3] ls-files.c: add --deduplicate option
  2021-01-23 19:53           ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified Junio C Hamano
  2021-01-23 19:53             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one Junio C Hamano
@ 2021-01-23 19:53             ` Junio C Hamano
  1 sibling, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-23 19:53 UTC (permalink / raw)
  To: git; +Cc: gitgitgadget, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

During a merge conflict, the name of a file may appear multiple
times in "git ls-files" output, once for each stage.  If you use
both `--deleted` and `--modified` at the same time, the output may
mention a deleted file twice.

When none of the '-t', '-u', or '-s' options is in use, these
duplicate entries do not add much value to the output.

Introduce a new '--deduplicate' option to suppress them.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: extended doc and rewritten commit log]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-ls-files.txt |  8 +++++
 builtin/ls-files.c             | 31 ++++++++++++++--
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 3 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 0a3b5265b3..6d11ab506b 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -80,6 +81,13 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	When only filenames are shown, suppress duplicates that may
+	come from having multiple stages during a merge, or giving
+	`--deleted` and `--modified` option at the same time.
+	When any of the `-t`, `--unmerged`, or `--stage` option is
+	in use, this option has no effect.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e94d724aff..f6f9e483b2 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -328,11 +329,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
 		if ((show_cached || show_stage) &&
-		    (!show_unmerged || ce_stage(ce)))
+		    (!show_unmerged || ce_stage(ce))) {
 			show_ce(repo, dir, ce, fullname.buf,
 				ce_stage(ce) ? tag_unmerged :
 				(ce_skip_worktree(ce) ? tag_skip_worktree :
 				 tag_cached));
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
 
 		if (!(show_deleted || show_modified))
 			continue;
@@ -341,11 +345,28 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		stat_err = lstat(fullname.buf, &st);
 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
 			error_errno("cannot lstat '%s'", fullname.buf);
-		if (stat_err && show_deleted)
+		if (stat_err && show_deleted) {
 			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
 		if (show_modified &&
-		    (stat_err || ie_modified(repo->index, ce, &st, 0)))
+		    (stat_err || ie_modified(repo->index, ce, &st, 0))) {
 			show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		continue;
+
+skip_to_next_name:
+		{
+			int j;
+			struct cache_entry **cache = repo->index->cache;
+			for (j = i + 1; j < repo->index->cache_nr; j++)
+				if (strcmp(ce->name, cache[j]->name))
+					break;
+			i = j - 1; /* compensate for the for loop */
+		}
 	}
 
 	strbuf_release(&fullname);
@@ -572,6 +593,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "deduplicate", &skipping_duplicates,
+			 N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -611,6 +634,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 0000000000..2682b1f43a
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done
-- 
2.30.0-491-g302c625a7b


^ permalink raw reply related	[flat|nested] 65+ messages in thread
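
The skip_to_next_name block in patch 3/3 relies on index entries with
the same name being adjacent.  A minimal sketch of that technique on
a plain sorted array of strings follows; `unique_names` is a
hypothetical helper, not a git function, and it assumes the input is
already sorted so that equal names sit next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the first entry of every run of equal names from the sorted
 * array `names` into `out`, which must have room for `nr` pointers.
 * Returns the number of names written.
 */
static size_t unique_names(const char **names, size_t nr, const char **out)
{
	size_t i, n = 0;

	assert(names != NULL && out != NULL);

	for (i = 0; i < nr; i++) {
		size_t j;

		out[n++] = names[i];

		/* Find the first later entry whose name differs. */
		for (j = i + 1; j < nr; j++)
			if (strcmp(names[i], names[j]))
				break;

		/* The i++ of the outer loop then lands on that entry. */
		i = j - 1;
	}
	return n;
}
```

Given "a.txt" three times (one per merge stage), "b.txt" once and
"delete.txt" twice, the helper keeps one pointer per name, in the
original order.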

* [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option
  2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
                             ` (3 preceding siblings ...)
  2021-01-23 19:53           ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified Junio C Hamano
@ 2021-01-24 10:54           ` 阿德烈 via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
                               ` (2 more replies)
  4 siblings, 3 replies; 65+ messages in thread
From: 阿德烈 via GitGitGadget @ 2021-01-24 10:54 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈

While reading the source code of git ls-files, I learned that it may print
duplicate file names when there are unmerged paths during a branch merge,
or when certain options are combined. Users may be confused when they see
these duplicate file names.

As Junio C Hamano said, this is odd behaviour.

Therefore, we can add an option to git ls-files that suppresses the
repeated names.

This fixes https://github.com/gitgitgadget/git/issues/198

Thanks!

ZheNing Hu (3):
  ls_files.c: bugfix for --deleted and --modified
  ls_files.c: consolidate two for loops into one
  ls-files.c: add --deduplicate option

 Documentation/git-ls-files.txt |  8 ++++
 builtin/ls-files.c             | 85 ++++++++++++++++++++--------------
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++
 3 files changed, 124 insertions(+), 35 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh


base-commit: 6d3ef5b467eccd2769f1aa1c555d317d3c8dc707
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-832%2Fadlternative%2Fls-files-dedup-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-832/adlternative/ls-files-dedup-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/832

Range-diff vs v6:

 1:  fbc38ce9075 ! 1:  8b02367a359 ls_files.c: bugfix for --deleted and --modified
     @@ Commit message
          ie_modified() when lstat() has failed.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
     +    [jc: fixed misindented code]
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/ls-files.c ##
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
 2:  3997d390883 ! 2:  f9d5e44d2c0 ls_files.c: consolidate two for loops into one
     @@ Metadata
       ## Commit message ##
          ls_files.c: consolidate two for loops into one
      
     -    Refactor the two for loops into one,skip showing the ce if it
     -    has the same name as the previously shown one, only when doing so
     -    won't lose information.
     +    This will make it easier to show only one entry per filename in the
     +    next step.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
     +    [jc: corrected the log message]
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/ls-files.c ##
      @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_struct *dir)
 3:  07b603fd97c ! 3:  384f77a4c18 ls-files.c: add --deduplicate option
     @@ Metadata
       ## Commit message ##
          ls-files.c: add --deduplicate option
      
     -    In order to provide users a better experience
     -    when viewing information about files in the index
     -    and the working tree, the `--deduplicate` option will suppress
     -    some duplicate name under some conditions.
     +    During a merge conflict, the name of a file may appear multiple
     +    times in "git ls-files" output, once for each stage.  If you use
     +    both `--deleted` and `--modified` at the same time, the output may
     +    mention a deleted file twice.
      
     -    In a merge conflict, one file name of "git ls-files" output may
     -    appear multiple times. For example,now there is an unmerged path
     -    `a.c`,`a.c` will appear three times in the output of
     -    "git ls-files".We can use "git ls-files --deduplicate" to output
     -    `a.c` only one time.(unless `--stage` or `--unmerged` is
     -    used to view all the detailed information in the index)
     +    When none of the '-t', '-u', or '-s' options is in use, these
     +    duplicate entries do not add much value to the output.
      
     -    In addition, if you use both `--delete` and `--modify` at
     -    the same time, The `--deduplicate` option
     -    can also suppress file name output.
     -
     -    Additional instructions:
     -    In order to display entries information,`deduplicate` suppresses
     -    the output of duplicate file names, not the output of duplicate
     -    entries information, so under the option of `-t`, `--stage`, `--unmerge`,
     -    `--deduplicate` will have no effect.
     +    Introduce a new '--deduplicate' option to suppress them.
      
          Signed-off-by: ZheNing Hu <adlternative@gmail.com>
     +    [jc: extended doc and rewritten commit log]
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-ls-files.txt ##
      @@ Documentation/git-ls-files.txt: SYNOPSIS
     @@ Documentation/git-ls-files.txt: OPTIONS
       	See OUTPUT below for more information.
       
      +--deduplicate::
     -+	Suppress duplicate entries when there are unmerged paths in index
     -+	or `--deleted` and `--modified` are combined.
     ++	When only filenames are shown, suppress duplicates that may
     ++	come from having multiple stages during a merge, or giving
     ++	`--deleted` and `--modified` option at the same time.
     ++	When any of the `-t`, `--unmerged`, or `--stage` option is
     ++	in use, this option has no effect.
      +
       -x <pattern>::
       --exclude=<pattern>::

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
@ 2021-01-24 10:54             ` ZheNing Hu via GitGitGadget
  2021-01-24 22:04               ` Junio C Hamano
  2021-01-24 10:54             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
  2 siblings, 1 reply; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-24 10:54 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In the original code, lstat() may fail, yet the contents of `st`
are still fed to ie_modified() afterwards.

Therefore, show the entry directly, without the check by
ie_modified() when lstat() has failed.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: fixed misindented code]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/ls-files.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c8eae899b82..1e264bd1329 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -335,7 +335,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		for (i = 0; i < repo->index->cache_nr; i++) {
 			const struct cache_entry *ce = repo->index->cache[i];
 			struct stat st;
-			int err;
+			int stat_err;
 
 			construct_fullname(&fullname, repo, ce);
 
@@ -346,11 +346,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 				continue;
 			if (ce_skip_worktree(ce))
 				continue;
-			err = lstat(fullname.buf, &st);
-			if (show_deleted && err)
+			stat_err = lstat(fullname.buf, &st);
+			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+				error_errno("cannot lstat '%s'", fullname.buf);
+			if (stat_err && show_deleted)
 				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified && ie_modified(repo->index, ce, &st, 0))
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (show_modified &&
+				(stat_err || ie_modified(repo->index, ce, &st, 0)))
+					show_ce(repo, dir, ce, fullname.buf, tag_modified);
 		}
 	}
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v7 2/3] ls_files.c: consolidate two for loops into one
  2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-24 10:54             ` ZheNing Hu via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
  2 siblings, 0 replies; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-24 10:54 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

This will make it easier to show only one entry per filename in the
next step.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: corrected the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/ls-files.c | 70 ++++++++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 40 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 1e264bd1329..966c0ab0296 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -312,49 +312,39 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (show_cached || show_stage) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-
-			construct_fullname(&fullname, repo, ce);
-
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			show_ce(repo, dir, ce, fullname.buf,
-				ce_stage(ce) ? tag_unmerged :
-				(ce_skip_worktree(ce) ? tag_skip_worktree :
-				 tag_cached));
-		}
-	}
-	if (show_deleted || show_modified) {
-		for (i = 0; i < repo->index->cache_nr; i++) {
-			const struct cache_entry *ce = repo->index->cache[i];
-			struct stat st;
-			int stat_err;
+	if (! (show_cached || show_stage || show_deleted || show_modified))
+		return;
+	for (i = 0; i < repo->index->cache_nr; i++) {
+		const struct cache_entry *ce = repo->index->cache[i];
+		struct stat st;
+		int stat_err;
 
-			construct_fullname(&fullname, repo, ce);
+		construct_fullname(&fullname, repo, ce);
 
-			if ((dir->flags & DIR_SHOW_IGNORED) &&
-			    !ce_excluded(dir, repo->index, fullname.buf, ce))
-				continue;
-			if (ce->ce_flags & CE_UPDATE)
-				continue;
-			if (ce_skip_worktree(ce))
-				continue;
-			stat_err = lstat(fullname.buf, &st);
-			if (stat_err && (errno != ENOENT && errno != ENOTDIR))
-				error_errno("cannot lstat '%s'", fullname.buf);
-			if (stat_err && show_deleted)
-				show_ce(repo, dir, ce, fullname.buf, tag_removed);
-			if (show_modified &&
-				(stat_err || ie_modified(repo->index, ce, &st, 0)))
-					show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		if ((dir->flags & DIR_SHOW_IGNORED) &&
+			!ce_excluded(dir, repo->index, fullname.buf, ce))
+			continue;
+		if (ce->ce_flags & CE_UPDATE)
+			continue;
+		if (show_cached || show_stage) {
+			if (!show_unmerged || ce_stage(ce))
+				show_ce(repo, dir, ce, fullname.buf,
+					ce_stage(ce) ? tag_unmerged :
+					(ce_skip_worktree(ce) ? tag_skip_worktree :
+						tag_cached));
 		}
+		if (!show_deleted && !show_modified)
+			continue;
+		if (ce_skip_worktree(ce))
+			continue;
+		stat_err = lstat(fullname.buf, &st);
+		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
+			error_errno("cannot lstat '%s'", fullname.buf);
+		if (stat_err && show_deleted)
+			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+		if (show_modified &&
+			(stat_err || ie_modified(repo->index, ce, &st, 0)))
+				show_ce(repo, dir, ce, fullname.buf, tag_modified);
 	}
 
 	strbuf_release(&fullname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v7 3/3] ls-files.c: add --deduplicate option
  2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
  2021-01-24 10:54             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
@ 2021-01-24 10:54             ` ZheNing Hu via GitGitGadget
  2 siblings, 0 replies; 65+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-01-24 10:54 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, 胡哲宁, Junio C Hamano,
	Johannes Schindelin, 阿德烈, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

During a merge conflict, the name of a file may appear multiple
times in "git ls-files" output, once for each stage.  If you use
both `--deleted` and `--modified` at the same time, the output may
mention a deleted file twice.

When none of the '-t', '-u', or '-s' options is in use, these
duplicate entries do not add much value to the output.

Introduce a new '--deduplicate' option to suppress them.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: extended doc and rewritten commit log]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-ls-files.txt |  8 +++++
 builtin/ls-files.c             | 30 +++++++++++++---
 t/t3012-ls-files-dedup.sh      | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+), 4 deletions(-)
 create mode 100755 t/t3012-ls-files-dedup.sh

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index cbcf5263dd0..60449a69b69 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 		(-[c|d|o|i|s|u|k|m])*
 		[--eol]
+		[--deduplicate]
 		[-x <pattern>|--exclude=<pattern>]
 		[-X <file>|--exclude-from=<file>]
 		[--exclude-per-directory=<file>]
@@ -81,6 +82,13 @@ OPTIONS
 	\0 line termination on output and do not quote filenames.
 	See OUTPUT below for more information.
 
+--deduplicate::
+	When only filenames are shown, suppress duplicates that may
+	come from having multiple stages during a merge, or giving
+	`--deleted` and `--modified` option at the same time.
+	When any of the `-t`, `--unmerged`, or `--stage` option is
+	in use, this option has no effect.
+
 -x <pattern>::
 --exclude=<pattern>::
 	Skip untracked files matching pattern.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 966c0ab0296..fb9cf50d764 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static int skipping_duplicates;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -326,12 +327,14 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
-		if (show_cached || show_stage) {
-			if (!show_unmerged || ce_stage(ce))
+		if ((show_cached || show_stage) &&
+			(!show_unmerged || ce_stage(ce))) {
 				show_ce(repo, dir, ce, fullname.buf,
 					ce_stage(ce) ? tag_unmerged :
 					(ce_skip_worktree(ce) ? tag_skip_worktree :
 						tag_cached));
+			if (skipping_duplicates)
+				goto skip_to_next_name;
 		}
 		if (!show_deleted && !show_modified)
 			continue;
@@ -340,11 +343,27 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		stat_err = lstat(fullname.buf, &st);
 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
 			error_errno("cannot lstat '%s'", fullname.buf);
-		if (stat_err && show_deleted)
+		if (stat_err && show_deleted) {
 			show_ce(repo, dir, ce, fullname.buf, tag_removed);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
 		if (show_modified &&
-			(stat_err || ie_modified(repo->index, ce, &st, 0)))
+			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
 				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+			if (skipping_duplicates)
+				goto skip_to_next_name;
+		}
+		continue;
+skip_to_next_name:
+		{
+			int j;
+			struct cache_entry **cache = repo->index->cache;
+			for (j = i + 1; j < repo->index->cache_nr; j++)
+				if (strcmp(ce->name, cache[j]->name))
+					break;
+			i = j - 1; /* compensate for outer for loop */
+		}
 	}
 
 	strbuf_release(&fullname);
@@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
 		OPT_END()
 	};
 
@@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		 * you also show the stage information.
 		 */
 		show_stage = 1;
+	if (show_tag || show_stage)
+		skipping_duplicates = 0;
 	if (dir.exclude_per_dir)
 		exc_given = 1;
 
diff --git a/t/t3012-ls-files-dedup.sh b/t/t3012-ls-files-dedup.sh
new file mode 100755
index 00000000000..2682b1f43a6
--- /dev/null
+++ b/t/t3012-ls-files-dedup.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='git ls-files --deduplicate test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	>a.txt &&
+	>b.txt &&
+	>delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m base &&
+	echo a >a.txt &&
+	echo b >b.txt &&
+	echo delete >delete.txt &&
+	git add a.txt b.txt delete.txt &&
+	git commit -m tip &&
+	git tag tip &&
+	git reset --hard HEAD^ &&
+	echo change >a.txt &&
+	git commit -a -m side &&
+	git tag side
+'
+
+test_expect_success 'git ls-files --deduplicate to show unique unmerged path' '
+	test_must_fail git merge tip &&
+	git ls-files --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_expect_success 'git ls-files -d -m --deduplicate with different display options' '
+	git reset --hard side &&
+	test_must_fail git merge tip &&
+	rm delete.txt &&
+	git ls-files -d -m --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -t --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	C a.txt
+	C a.txt
+	C a.txt
+	R delete.txt
+	C delete.txt
+	EOF
+	test_cmp expect actual &&
+	git ls-files -d -m -c --deduplicate >actual &&
+	cat >expect <<-\EOF &&
+	a.txt
+	b.txt
+	delete.txt
+	EOF
+	test_cmp expect actual &&
+	git merge --abort
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 65+ messages in thread
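
A small sketch of when `--deduplicate` stays in effect, following the
documentation text and the last builtin/ls-files.c hunk of patch 3/3.
The struct and function names are hypothetical.  That `--unmerged`
turns on stage output is an assumption drawn from the comment visible
in the hunk context ("you also show the stage information") and from
the documentation; the condition guarding that assignment is not part
of the patch.

```c
#include <assert.h>

/* Hypothetical bundle of the relevant command-line flags. */
struct ls_files_flags {
	int show_tag;		/* -t */
	int show_stage;		/* -s, --stage */
	int show_unmerged;	/* -u, --unmerged */
	int deduplicate;	/* --deduplicate */
};

/* Return 1 if duplicate names would be suppressed, 0 otherwise. */
static int dedup_in_effect(struct ls_files_flags f)
{
	assert(f.deduplicate == 0 || f.deduplicate == 1);

	/* Assumption: --unmerged implies showing stage information. */
	if (f.show_unmerged)
		f.show_stage = 1;

	/* Tags and stage data differ per entry, so nothing is dropped. */
	if (f.show_tag || f.show_stage)
		return 0;
	return f.deduplicate;
}
```

This mirrors the test expectation in t3012 that `-d -m -t
--deduplicate` still prints every tagged line.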

* Re: [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-24 10:54             ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
@ 2021-01-24 22:04               ` Junio C Hamano
  2021-01-25  6:05                 ` 胡哲宁
  0 siblings, 1 reply; 65+ messages in thread
From: Junio C Hamano @ 2021-01-24 22:04 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Eric Sunshine, 胡哲宁, Johannes Schindelin

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: ZheNing Hu <adlternative@gmail.com>
>
> In the original code, lstat() may fail, yet the contents of `st`
> are still fed to ie_modified() afterwards.
>
> Therefore, show the entry directly, without the check by
> ie_modified() when lstat() has failed.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> [jc: fixed misindented code]

I noticed that you reverted my fix in this version, when this is
compared with the one I sent last night.

Comparing the result of applying all three with what I sent last
night, this v7 looks worse (see below).  Let's discard this round
and declare victory with what is already on 'seen'.

Thanks.


---

A comparison between what these three patches would produce (preimage)
and what is on 'seen' (postimage) is shown here.

diff --git w/builtin/ls-files.c c/builtin/ls-files.c
index fb9cf50d76..f6f9e483b2 100644
--- w/builtin/ls-files.c
+++ c/builtin/ls-files.c
@@ -313,7 +313,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (show_killed)
 			show_killed_files(repo->index, dir);
 	}
-	if (! (show_cached || show_stage || show_deleted || show_modified))
+
+	if (!(show_cached || show_stage || show_deleted || show_modified))
 		return;
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
@@ -328,15 +329,16 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
 		if ((show_cached || show_stage) &&
-			(!show_unmerged || ce_stage(ce))) {
-				show_ce(repo, dir, ce, fullname.buf,
-					ce_stage(ce) ? tag_unmerged :
-					(ce_skip_worktree(ce) ? tag_skip_worktree :
-						tag_cached));
+		    (!show_unmerged || ce_stage(ce))) {
+			show_ce(repo, dir, ce, fullname.buf,
+				ce_stage(ce) ? tag_unmerged :
+				(ce_skip_worktree(ce) ? tag_skip_worktree :
+				 tag_cached));
 			if (skipping_duplicates)
 				goto skip_to_next_name;
 		}
-		if (!show_deleted && !show_modified)
+
+		if (!(show_deleted || show_modified))
 			continue;
 		if (ce_skip_worktree(ce))
 			continue;
@@ -349,12 +351,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 				goto skip_to_next_name;
 		}
 		if (show_modified &&
-			(stat_err || ie_modified(repo->index, ce, &st, 0))) {
-				show_ce(repo, dir, ce, fullname.buf, tag_modified);
+		    (stat_err || ie_modified(repo->index, ce, &st, 0))) {
+			show_ce(repo, dir, ce, fullname.buf, tag_modified);
 			if (skipping_duplicates)
 				goto skip_to_next_name;
 		}
 		continue;
+
 skip_to_next_name:
 		{
 			int j;
@@ -362,7 +365,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			for (j = i + 1; j < repo->index->cache_nr; j++)
 				if (strcmp(ce->name, cache[j]->name))
 					break;
-			i = j - 1; /* compensate for outer for loop */
+			i = j - 1; /* compensate for the for loop */
 		}
 	}
 
@@ -590,7 +593,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
-		OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
+		OPT_BOOL(0, "deduplicate", &skipping_duplicates,
+			 N_("suppress duplicate entries")),
 		OPT_END()
 	};
 

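The skip_to_next_name block in the diff above relies on index entries
being sorted by name, so that all stages of a conflicted path sit next to
each other. Below is a minimal standalone sketch of that idea, not the
code in builtin/ls-files.c: struct entry and unique_names() are
hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an index entry: a path name and a stage. */
struct entry {
	const char *name;
	int stage;
};

/*
 * Store the name of the first entry of every run of equal names in
 * out[] and return how many names were kept. entries[] must already
 * be sorted by name; out[] must have room for nr pointers.
 */
static size_t unique_names(const struct entry *entries, size_t nr,
			   const char **out)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++) {
		size_t j;

		out[kept++] = entries[i].name;

		/* skip to the next name: pass over entries with the same name */
		for (j = i + 1; j < nr; j++)
			if (strcmp(entries[i].name, entries[j].name))
				break;
		i = j - 1; /* compensate for the i++ of the for loop */
	}
	return kept;
}
```

Given three stages of a.txt, one entry for b.txt and two stages of
delete.txt, the sketch keeps one name per path, which is the effect the
--deduplicate tests expect when no other display options are in play.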

* Re: [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-24 22:04               ` Junio C Hamano
@ 2021-01-25  6:05                 ` 胡哲宁
  2021-01-25 19:05                   ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: 胡哲宁 @ 2021-01-25  6:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine,
	Johannes Schindelin

OK, I didn’t notice any formatting changes before.

Am I done with this patch now? I should probably
look for other issues to work on.

Junio, thank you for all your patient help.
I know I often make low-level mistakes,
and I am grateful.

Cheers.

Junio C Hamano <gitster@pobox.com> wrote on Mon, Jan 25, 2021 at 6:04 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > This situation may occur in the original code: lstat() failed
> > but we use `&st` to feed ie_modified() later.
> >
> > Therefore, we can directly execute show_ce without the judgment of
> > ie_modified() when lstat() has failed.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > [jc: fixed misindented code]
>
> I noticed that you reverted my fix in this version, when this is
> compared with the one I sent last night.
>
> Comparing the result of applying all three with what I sent last
> night, this v7 looks worse (see below).  Let's discard this round
> and declare victory with what is already on 'seen'.
>
> Thanks.
>
>
> ---
>
> A comparison between what these three patches would produce (preimage)
> and what is on 'seen' (postimage) is shown here.
>
> diff --git w/builtin/ls-files.c c/builtin/ls-files.c
> index fb9cf50d76..f6f9e483b2 100644
> --- w/builtin/ls-files.c
> +++ c/builtin/ls-files.c
> @@ -313,7 +313,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>                 if (show_killed)
>                         show_killed_files(repo->index, dir);
>         }
> -       if (! (show_cached || show_stage || show_deleted || show_modified))
> +
> +       if (!(show_cached || show_stage || show_deleted || show_modified))
>                 return;
>         for (i = 0; i < repo->index->cache_nr; i++) {
>                 const struct cache_entry *ce = repo->index->cache[i];
> @@ -328,15 +329,16 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>                 if (ce->ce_flags & CE_UPDATE)
>                         continue;
>                 if ((show_cached || show_stage) &&
> -                       (!show_unmerged || ce_stage(ce))) {
> -                               show_ce(repo, dir, ce, fullname.buf,
> -                                       ce_stage(ce) ? tag_unmerged :
> -                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> -                                               tag_cached));
> +                   (!show_unmerged || ce_stage(ce))) {
> +                       show_ce(repo, dir, ce, fullname.buf,
> +                               ce_stage(ce) ? tag_unmerged :
> +                               (ce_skip_worktree(ce) ? tag_skip_worktree :
> +                                tag_cached));
>                         if (skipping_duplicates)
>                                 goto skip_to_next_name;
>                 }
> -               if (!show_deleted && !show_modified)
> +
> +               if (!(show_deleted || show_modified))
>                         continue;
>                 if (ce_skip_worktree(ce))
>                         continue;
> @@ -349,12 +351,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>                                 goto skip_to_next_name;
>                 }
>                 if (show_modified &&
> -                       (stat_err || ie_modified(repo->index, ce, &st, 0))) {
> -                               show_ce(repo, dir, ce, fullname.buf, tag_modified);
> +                   (stat_err || ie_modified(repo->index, ce, &st, 0))) {
> +                       show_ce(repo, dir, ce, fullname.buf, tag_modified);
>                         if (skipping_duplicates)
>                                 goto skip_to_next_name;
>                 }
>                 continue;
> +
>  skip_to_next_name:
>                 {
>                         int j;
> @@ -362,7 +365,7 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>                         for (j = i + 1; j < repo->index->cache_nr; j++)
>                                 if (strcmp(ce->name, cache[j]->name))
>                                         break;
> -                       i = j - 1; /* compensate for outer for loop */
> +                       i = j - 1; /* compensate for the for loop */
>                 }
>         }
>
> @@ -590,7 +593,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                         N_("pretend that paths removed since <tree-ish> are still present")),
>                 OPT__ABBREV(&abbrev),
>                 OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> -               OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
> +               OPT_BOOL(0, "deduplicate", &skipping_duplicates,
> +                        N_("suppress duplicate entries")),
>                 OPT_END()
>         };
>


* Re: [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified
  2021-01-25  6:05                 ` 胡哲宁
@ 2021-01-25 19:05                   ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-01-25 19:05 UTC (permalink / raw)
  To: 胡哲宁
  Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine,
	Johannes Schindelin

胡哲宁 <adlternative@gmail.com> writes:

> OK, I didn’t notice any formatting changes before.
>
> Am I done with this patch now? I should probably
> look for other issues to work on.

I think we are pretty much done with it.  Thanks for working on the
topic so patiently.



* GitGitGadget and `next`, was Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-01-22 18:02                     ` Junio C Hamano
@ 2021-03-19 13:54                       ` Johannes Schindelin
  2021-03-19 18:11                         ` Junio C Hamano
  0 siblings, 1 reply; 65+ messages in thread
From: Johannes Schindelin @ 2021-03-19 13:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: 胡哲宁, ZheNing Hu via GitGitGadget, Git List,
	Eric Sunshine

Hi Junio,

I just noticed that this still waited in my inbox for me to answer it.

On Fri, 22 Jan 2021, Junio C Hamano wrote:

> Just being curious, but when a series hits 'next', would the way in
> which the user interacts with GGG change?

My hunch is that we should probably tell new users (for whom GitGitGadget
now uses the "new user" PR label) about the expectations of only adding
patches on top (i.e. in a new PR), unless the branch gets kicked out of
`next`.

> With or without GGG, what is done on the local side is not all that
> different---you build new commits on top without disturbing the commits
> that are in 'next'. Then what?  Push it again (this time there is no
> need to force) and submit the additional ones via `/submit`?

GitGitGadget would send the entire patch series, which is probably not a
good idea.

Ciao,
Dscho


* Re: GitGitGadget and `next`, was Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  2021-03-19 13:54                       ` GitGitGadget and `next`, was " Johannes Schindelin
@ 2021-03-19 18:11                         ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2021-03-19 18:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: 胡哲宁, ZheNing Hu via GitGitGadget, Git List,
	Eric Sunshine

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 22 Jan 2021, Junio C Hamano wrote:
>
>> Just being curious, but when a series hits 'next', would the way in
>> which the user interacts with GGG change?
>
> My hunch is that we should probably tell new users (for whom GitGitGadget
> now uses the "new user" PR label) about the expectations of only adding
> patches on top (i.e. in a new PR), unless the branch gets kicked out of
> `next`.
>
>> With or without GGG, what is done on the local side is not all that
>> different---you build new commits on top without disturbing the commits
>> that are in 'next'. Then what?  Push it again (this time there is no
>> need to force) and submit the additional ones via `/submit`?
>
> GitGitGadget would send the entire patch series, which is probably not a
> good idea.

Thanks for the clarification.

While we are on the topic of GGG, if I may ask for a new feature or
two (or perhaps such a feature already exists), it would be nice if
contributors were allowed to tweak who is CC'ed in the outgoing
patch mail in various ways:

 - I may author a commit as <gitster@work.addre.ss> and make a pull
   request on GitHub, but <gitster@pobox.com> is the address
   associated with the GitHub account making the pull request.  I
   think GGG sends CC to the author (at work) as well as me, but I
   may prefer to get correspondence on the patch at only one of my
   addresses, not both.  Being able to say "Mr GGG, please compute
   the CC list the normal way, and drop this address from it" when
   I say "/submit" might be a good way to do so.

 - Also, at "/submit" time, being able to say "Also CC: these
   addresses" would be a good feature, without contaminating the
   commit log message with CC: trailer lines.

Thanks.


Thread overview: 65+ messages
2021-01-06  8:53 [PATCH] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
2021-01-07  6:10 ` Eric Sunshine
2021-01-07  6:40   ` Junio C Hamano
2021-01-08 14:36 ` [PATCH v2 0/2] " 阿德烈 via GitGitGadget
2021-01-08 14:36   ` [PATCH v2 1/2] " ZheNing Hu via GitGitGadget
2021-01-08 14:36   ` [PATCH v2 2/2] builtin:ls-files.c:add " ZheNing Hu via GitGitGadget
2021-01-14  6:38     ` Eric Sunshine
2021-01-14  8:17       ` 胡哲宁
2021-01-14 12:22   ` [PATCH v3] ls-files.c: add " 阿德烈 via GitGitGadget
2021-01-15  0:59     ` Junio C Hamano
2021-01-17  3:45       ` 胡哲宁
2021-01-17  4:37         ` Junio C Hamano
2021-01-16  7:13     ` Eric Sunshine
2021-01-17  3:49       ` 胡哲宁
2021-01-17  5:11         ` Eric Sunshine
2021-01-17 23:04           ` Junio C Hamano
2021-01-18 14:59             ` Eric Sunshine
2021-01-17  4:02     ` [PATCH v4 0/3] builtin/ls-files.c:add git ls-file " 阿德烈 via GitGitGadget
2021-01-17  4:02       ` [PATCH v4 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
2021-01-17  6:22         ` Junio C Hamano
2021-01-17  4:02       ` [PATCH v4 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
2021-01-17  4:02       ` [PATCH v4 3/3] ls-files: add --deduplicate option ZheNing Hu via GitGitGadget
2021-01-17  6:25         ` Junio C Hamano
2021-01-17 23:34         ` Junio C Hamano
2021-01-18  4:09           ` 胡哲宁
2021-01-18  6:05             ` 胡哲宁
2021-01-18 21:31               ` Junio C Hamano
2021-01-19  2:56                 ` 胡哲宁
2021-01-19  6:30       ` [PATCH v5 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
2021-01-19  6:30         ` [PATCH v5 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
2021-01-20 20:26           ` Junio C Hamano
2021-01-21 10:02             ` 胡哲宁
2021-01-19  6:30         ` [PATCH v5 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
2021-01-20 20:27           ` Junio C Hamano
2021-01-21 11:05             ` 胡哲宁
2021-01-19  6:30         ` [PATCH v5 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
2021-01-20 21:26           ` Junio C Hamano
2021-01-21 11:00             ` 胡哲宁
2021-01-21 20:45               ` Junio C Hamano
2021-01-22  9:50                 ` 胡哲宁
2021-01-22 16:04                   ` Johannes Schindelin
2021-01-22 18:02                     ` Junio C Hamano
2021-03-19 13:54                       ` GitGitGadget and `next`, was " Johannes Schindelin
2021-03-19 18:11                         ` Junio C Hamano
2021-01-23  8:20                     ` 胡哲宁
2021-01-22 15:46               ` [PATCH v6] " ZheNing Hu
2021-01-22 20:52                 ` Junio C Hamano
2021-01-23  8:27                   ` 胡哲宁
2021-01-23 10:20         ` [PATCH v6 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
2021-01-23 10:20           ` [PATCH v6 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
2021-01-23 17:55             ` Junio C Hamano
2021-01-23 10:20           ` [PATCH v6 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
2021-01-23 19:50             ` Junio C Hamano
2021-01-23 10:20           ` [PATCH v6 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
2021-01-23 19:51             ` Junio C Hamano
2021-01-23 19:53           ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified Junio C Hamano
2021-01-23 19:53             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one Junio C Hamano
2021-01-23 19:53             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option Junio C Hamano
2021-01-24 10:54           ` [PATCH v7 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
2021-01-24 10:54             ` [PATCH v7 1/3] ls_files.c: bugfix for --deleted and --modified ZheNing Hu via GitGitGadget
2021-01-24 22:04               ` Junio C Hamano
2021-01-25  6:05                 ` 胡哲宁
2021-01-25 19:05                   ` Junio C Hamano
2021-01-24 10:54             ` [PATCH v7 2/3] ls_files.c: consolidate two for loops into one ZheNing Hu via GitGitGadget
2021-01-24 10:54             ` [PATCH v7 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
