git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v3 2/8] dir-iterator: support iteration in sorted order
Date: Wed, 21 Feb 2024 13:37:23 +0100	[thread overview]
Message-ID: <89cf960d47026cc1a1527e35b1c069c6598ac3e0.1708518982.git.ps@pks.im> (raw)
In-Reply-To: <cover.1708518982.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 6195 bytes --]

The `struct dir_iterator` is a helper that allows us to iterate through
directory entries. This iterator returns entries in the exact same order
as readdir(3P) does -- or in other words, it guarantees no specific
order at all.

This is about to become problematic as we are introducing a new reflog
subcommand to list reflogs. As the "files" backend uses the directory
iterator to enumerate reflogs, returning reflog names and exposing them
to the user would inherit the indeterministic ordering. Naturally, it
would make for a terrible user interface to show a list with no
discernible order.

While this could be handled at a higher level by the new subcommand
itself by collecting and ordering the reflogs, this would be inefficient
because we would first have to collect all reflogs before we can sort
them, which would introduce additional latency when there are many
reflogs.

Instead, introduce a new option into the directory iterator that asks
for its entries to be yielded in lexicographical order. If set, the
iterator will read all directory entries greedily and sort them before
we start to iterate over them.

While this will of course also incur overhead as we cannot yield the
directory entries immediately, it should at least be more efficient than
having to sort the complete list of reflogs as we only need to sort one
directory at a time.

This functionality will be used in a follow-up commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 dir-iterator.c | 99 +++++++++++++++++++++++++++++++++++++++++++-------
 dir-iterator.h |  3 ++
 2 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index f58a97e089..de619846f2 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -2,10 +2,19 @@
 #include "dir.h"
 #include "iterator.h"
 #include "dir-iterator.h"
+#include "string-list.h"
 
 struct dir_iterator_level {
 	DIR *dir;
 
+	/*
+	 * The directory entries of the current level. This list will only be
+	 * populated when the iterator is ordered. In that case, `dir` will be
+	 * set to `NULL`.
+	 */
+	struct string_list entries;
+	size_t entries_idx;
+
 	/*
 	 * The length of the directory part of path at this level
 	 * (including a trailing '/'):
@@ -43,6 +52,31 @@ struct dir_iterator_int {
 	unsigned int flags;
 };
 
+static int next_directory_entry(DIR *dir, const char *path,
+				struct dirent **out)
+{
+	struct dirent *de;
+
+repeat:
+	errno = 0;
+	de = readdir(dir);
+	if (!de) {
+		if (errno) {
+			warning_errno("error reading directory '%s'",
+				      path);
+			return -1;
+		}
+
+		return 1;
+	}
+
+	if (is_dot_or_dotdot(de->d_name))
+		goto repeat;
+
+	*out = de;
+	return 0;
+}
+
 /*
  * Push a level in the iter stack and initialize it with information from
  * the directory pointed by iter->base->path. It is assumed that this
@@ -72,6 +106,35 @@ static int push_level(struct dir_iterator_int *iter)
 		return -1;
 	}
 
+	string_list_init_dup(&level->entries);
+	level->entries_idx = 0;
+
+	/*
+	 * When the iterator is sorted we read and sort all directory entries
+	 * directly.
+	 */
+	if (iter->flags & DIR_ITERATOR_SORTED) {
+		struct dirent *de;
+
+		while (1) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
+				if (errno != ENOENT &&
+				    iter->flags & DIR_ITERATOR_PEDANTIC)
+					return -1;
+				continue;
+			} else if (ret > 0) {
+				break;
+			}
+
+			string_list_append(&level->entries, de->d_name);
+		}
+		string_list_sort(&level->entries);
+
+		closedir(level->dir);
+		level->dir = NULL;
+	}
+
 	return 0;
 }
 
@@ -88,6 +151,7 @@ static int pop_level(struct dir_iterator_int *iter)
 		warning_errno("error closing directory '%s'",
 			      iter->base.path.buf);
 	level->dir = NULL;
+	string_list_clear(&level->entries, 0);
 
 	return --iter->levels_nr;
 }
@@ -139,27 +203,34 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 		struct dirent *de;
 		struct dir_iterator_level *level =
 			&iter->levels[iter->levels_nr - 1];
+		const char *name;
 
 		strbuf_setlen(&iter->base.path, level->prefix_len);
-		errno = 0;
-		de = readdir(level->dir);
 
-		if (!de) {
-			if (errno) {
-				warning_errno("error reading directory '%s'",
-					      iter->base.path.buf);
+		if (level->dir) {
+			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
+			if (ret < 0) {
 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
 					goto error_out;
-			} else if (pop_level(iter) == 0) {
-				return dir_iterator_abort(dir_iterator);
+				continue;
+			} else if (ret > 0) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
 			}
-			continue;
-		}
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
+			name = de->d_name;
+		} else {
+			if (level->entries_idx >= level->entries.nr) {
+				if (pop_level(iter) == 0)
+					return dir_iterator_abort(dir_iterator);
+				continue;
+			}
 
-		if (prepare_next_entry_data(iter, de->d_name)) {
+			name = level->entries.items[level->entries_idx++].string;
+		}
+
+		if (prepare_next_entry_data(iter, name)) {
 			if (errno != ENOENT && iter->flags & DIR_ITERATOR_PEDANTIC)
 				goto error_out;
 			continue;
@@ -188,6 +259,8 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 			warning_errno("error closing directory '%s'",
 				      iter->base.path.buf);
 		}
+
+		string_list_clear(&level->entries, 0);
 	}
 
 	free(iter->levels);
diff --git a/dir-iterator.h b/dir-iterator.h
index 479e1ec784..6d438809b6 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -54,8 +54,11 @@
  *   and ITER_ERROR is returned immediately. In both cases, a meaningful
  *   warning is emitted. Note: ENOENT errors are always ignored so that
  *   the API users may remove files during iteration.
+ *
+ * - DIR_ITERATOR_SORTED: sort directory entries alphabetically.
  */
 #define DIR_ITERATOR_PEDANTIC (1 << 0)
+#define DIR_ITERATOR_SORTED   (1 << 1)
 
 struct dir_iterator {
 	/* The current path: */
-- 
2.44.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2024-02-21 12:37 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19 14:35 [PATCH 0/6] reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 1/6] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 2/6] dir-iterator: support iteration in sorted order Patrick Steinhardt
2024-02-19 23:39   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 3/6] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-20  0:04   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 4/6] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-20  0:14   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 5/6] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-20  0:14   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-19 14:35 ` [PATCH 6/6] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-02-20  0:32   ` Junio C Hamano
2024-02-20  8:34     ` Patrick Steinhardt
2024-02-20  9:06 ` [PATCH v2 0/7] reflog: " Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 1/7] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 2/7] dir-iterator: support iteration in sorted order Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 3/7] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 4/7] refs: always treat iterators as ordered Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 5/7] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 6/7] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-20  9:06   ` [PATCH v2 7/7] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt
2024-04-24  7:30     ` Teng Long
2024-04-24  8:01       ` Patrick Steinhardt
2024-04-24 14:53         ` Junio C Hamano
2024-02-20 17:22   ` [PATCH v2 0/7] reflog: " Junio C Hamano
2024-02-21 11:48     ` Patrick Steinhardt
2024-02-21 12:37 ` [PATCH v3 0/8] " Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 1/8] dir-iterator: pass name to `prepare_next_entry_data()` directly Patrick Steinhardt
2024-02-21 12:37   ` Patrick Steinhardt [this message]
2024-02-21 12:37   ` [PATCH v3 3/8] refs/files: sort reflogs returned by the reflog iterator Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 4/8] refs/files: sort merged worktree and common reflogs Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 5/8] refs: always treat iterators as ordered Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 6/8] refs: drop unused params from the reflog iterator callback Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 7/8] refs: stop resolving ref corresponding to reflogs Patrick Steinhardt
2024-02-21 12:37   ` [PATCH v3 8/8] builtin/reflog: introduce subcommand to list reflogs Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89cf960d47026cc1a1527e35b1c069c6598ac3e0.1708518982.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).