git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] allow disabling fsync everywhere
@ 2021-10-28  0:21 Eric Wong
  2021-10-28  1:21 ` Eric Sunshine
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-28  0:21 UTC (permalink / raw)
  To: git

"core.fsync" and the "GIT_FSYNC" environment variable now exist
for disabling fsync() even on packfiles and other critical data.

Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
runtime from ~4 minutes down to ~3 minutes.

SVN interopability tests are minimally affected since SVN will
still use fsync in various places.

This will also be useful for 3rd-party tools which create
throwaway git repositories of temporary data.

Signed-off-by: Eric Wong <e@80x24.org>
---
 Documentation/config/core.txt |  7 +++++++
 Documentation/git.txt         |  5 +++++
 cache.h                       |  1 +
 config.c                      |  5 +++++
 environment.c                 |  1 +
 git-cvsserver.perl            | 22 ++++++++++++++++++++++
 perl/Git/SVN.pm               | 20 ++++++++++++++++++--
 t/test-lib.sh                 |  1 +
 write-or-die.c                |  5 ++++-
 9 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a..2ad5364246 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -555,6 +555,13 @@ data writes properly, but can be useful for filesystems that do not use
 journalling (traditional UNIX filesystems) or that only journal metadata
 and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
 
+core.fsync::
+	A boolean to control 'fsync()' on all files.
++
+Setting this to false can speed up writes of unimportant data.
+Disabling fsync will lead to data loss on power failure.  If set
+to false, this also overrides core.fsyncObjectFiles.  Defaults to true.
+
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
 +
diff --git a/Documentation/git.txt b/Documentation/git.txt
index d63c65e67d..cda4504e41 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -703,6 +703,11 @@ for further details.
 	not set, Git will choose buffered or record-oriented flushing
 	based on whether stdout appears to be redirected to a file or not.
 
+`GIT_FSYNC`::
+	Setting this environment variable to "0" disables fsync() entirely.
+	This is intended for running test suites and other repositories of
+	unimportant data.  See also core.fsync in linkgit:git-config[1].
+
 `GIT_TRACE`::
 	Enables general trace messages, e.g. alias expansion, built-in
 	command execution and external command execution.
diff --git a/cache.h b/cache.h
index eba12487b9..de6c45cf44 100644
--- a/cache.h
+++ b/cache.h
@@ -986,6 +986,7 @@ extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
+extern int use_fsync;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 2dcbe901b6..1ea7cb801b 100644
--- a/config.c
+++ b/config.c
@@ -1492,6 +1492,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.fsync")) {
+		use_fsync = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.preloadindex")) {
 		core_preload_index = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 9da7f3c1a1..0d06a31024 100644
--- a/environment.c
+++ b/environment.c
@@ -42,6 +42,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
 int fsync_object_files;
+int use_fsync = -1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 64319bed43..7679819e4d 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -3607,6 +3607,25 @@ package GITCVS::updater;
 use strict;
 use warnings;
 use DBI;
+our $_use_fsync;
+
+# n.b. consider using Git.pm
+sub use_fsync {
+    if (!defined($_use_fsync)) {
+        my $x = $ENV{GIT_FSYNC};
+        if (defined $x) {
+            local $ENV{GIT_CONFIG};
+            delete $ENV{GIT_CONFIG};
+            my $v = ::safe_pipe_capture('git', '-c', "core.fsync=$x",
+                                        qw(config --type=bool core.fsync));
+            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+        } else {
+            my $v = `git config --type=bool core.fsync`;
+            $_use_fsync = $v eq "false\n" ? 0 : 1;
+        }
+    }
+    $_use_fsync;
+}
 
 =head1 METHODS
 
@@ -3676,6 +3695,9 @@ sub new
                                 $self->{dbuser},
                                 $self->{dbpass});
     die "Error connecting to database\n" unless defined $self->{dbh};
+    if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
+        $self->{dbh}->do('PRAGMA synchronous = OFF');
+    }
 
     $self->{tables} = {};
     foreach my $table ( keys %{$self->{dbh}->table_info(undef,undef,undef,'TABLE')->fetchall_hashref('TABLE_NAME')} )
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 35ff5a6896..7b333ea62e 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -6,7 +6,7 @@ package Git::SVN;
 use vars qw/$_no_metadata
             $_repack $_repack_flags $_use_svm_props $_head
             $_use_svnsync_props $no_reuse_existing
-	    $_use_log_author $_add_author_from $_localtime/;
+	    $_use_log_author $_add_author_from $_localtime $_use_fsync/;
 use Carp qw/croak/;
 use File::Path qw/mkpath/;
 use IPC::Open3;
@@ -2269,6 +2269,22 @@ sub mkfile {
 	}
 }
 
+# TODO: move this to Git.pm?
+sub use_fsync {
+	if (!defined($_use_fsync)) {
+		my $x = $ENV{GIT_FSYNC};
+		if (defined $x) {
+			my $v = command_oneline('git', '-c', "core.fsync=$x",
+				qw(config --type=bool core.fsync));
+			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+		} else {
+			$_use_fsync = Git::config_bool('core.fsync');
+			$_use_fsync = 1 if !defined($_use_fsync);
+		}
+	}
+	$_use_fsync;
+}
+
 sub rev_map_set {
 	my ($self, $rev, $commit, $update_ref, $uuid) = @_;
 	defined $commit or die "missing arg3\n";
@@ -2290,7 +2306,7 @@ sub rev_map_set {
 	my $sync;
 	# both of these options make our .rev_db file very, very important
 	# and we can't afford to lose it because rebuild() won't work
-	if ($self->use_svm_props || $self->no_metadata) {
+	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
 		require File::Copy;
 		$sync = 1;
 		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a291a5d4a2..61bb24444c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -442,6 +442,7 @@ unset VISUAL EMAIL LANGUAGE $("$PERL_PATH" -e '
 		PERF_
 		CURL_VERBOSE
 		TRACE_CURL
+		FSYNC
 	));
 	my @vars = grep(/^GIT_/ && !/^GIT_($ok)/o, @env);
 	print join("\n", @vars);
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b..d2962dc423 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "config.h"
 #include "run-command.h"
 
 /*
@@ -57,7 +58,9 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	if (use_fsync < 0)
+		use_fsync = git_env_bool("GIT_FSYNC", 1);
+	while (use_fsync && fsync(fd) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
@ 2021-10-28  1:21 ` Eric Sunshine
  2021-10-28 14:36 ` Jeff King
  2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Sunshine @ 2021-10-28  1:21 UTC (permalink / raw)
  To: Eric Wong; +Cc: Git List

On Wed, Oct 27, 2021 at 8:21 PM Eric Wong <e@80x24.org> wrote:
> "core.fsync" and the "GIT_FSYNC" environment variable now exist
> for disabling fsync() even on packfiles and other critical data.
> [...]
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
> diff --git a/write-or-die.c b/write-or-die.c
> @@ -57,7 +58,9 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  void fsync_or_die(int fd, const char *msg)
>  {
> -       while (fsync(fd) < 0) {
> +       if (use_fsync < 0)
> +               use_fsync = git_env_bool("GIT_FSYNC", 1);
> +       while (use_fsync && fsync(fd) < 0) {
>                 if (errno != EINTR)
>                         die_errno("fsync error on '%s'", msg);
>         }

This is minor, but placing `use_fsync` in the `while` condition makes
the logic harder to digest. The intent would be clearer at-a-glance if
structured like this:

    if (use_fsync < 0)
        use_fsync = git_env_bool("GIT_FSYNC", 1);
    if (!use_fsync)
        return;
    while (fsync(fd) < 0) {
        if (errno != EINTR)
            die_errno("fsync error on '%s'", msg);
    }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
  2021-10-28  1:21 ` Eric Sunshine
@ 2021-10-28 14:36 ` Jeff King
  2021-10-28 18:28   ` Eric Wong
  2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-10-28 14:36 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Thu, Oct 28, 2021 at 12:21:02AM +0000, Eric Wong wrote:

> "core.fsync" and the "GIT_FSYNC" environment variable now exist
> for disabling fsync() even on packfiles and other critical data.
> 
> Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> runtime from ~4 minutes down to ~3 minutes.
> 
> SVN interopability tests are minimally affected since SVN will
> still use fsync in various places.
> 
> This will also be useful for 3rd-party tools which create
> throwaway git repositories of temporary data.

Do you mostly just care about the tests, or is the third-party tool
support important to you? I ask because most of us switched to running
the tests with --root=/some/tmpfs long ago to achieve the same speedup.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28 14:36 ` Jeff King
@ 2021-10-28 18:28   ` Eric Wong
  2021-10-28 19:29     ` Junio C Hamano
  2021-10-28 21:40     ` [PATCH] allow disabling " brian m. carlson
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-28 18:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> On Thu, Oct 28, 2021 at 12:21:02AM +0000, Eric Wong wrote:
> 
> > "core.fsync" and the "GIT_FSYNC" environment variable now exist
> > for disabling fsync() even on packfiles and other critical data.

Fwiw, I'm questioning the need for core.fsync.  GIT_FSYNC alone
may be sufficient.

> > This will also be useful for 3rd-party tools which create
> > throwaway git repositories of temporary data.
> 
> Do you mostly just care about the tests, or is the third-party tool
> support important to you? I ask because most of us switched to running
> the tests with --root=/some/tmpfs long ago to achieve the same speedup.

Third-party tools and OSes which don't have a tmpfs mounted by
default (I don't think most *BSDs have tmpfs enabled by
default).

I try to use libeatmydata everywhere I can; but that's not
always installed.

I'm also strongly considering making GIT_FSYNC=0 the default for
our own test suite since it's less setup for newbies.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28 18:28   ` Eric Wong
@ 2021-10-28 19:29     ` Junio C Hamano
  2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
  2021-10-28 21:40     ` [PATCH] allow disabling " brian m. carlson
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-10-28 19:29 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git

Eric Wong <e@80x24.org> writes:

> Jeff King <peff@peff.net> wrote:
>> On Thu, Oct 28, 2021 at 12:21:02AM +0000, Eric Wong wrote:
>> 
>> > "core.fsync" and the "GIT_FSYNC" environment variable now exist
>> > for disabling fsync() even on packfiles and other critical data.
>
> Fwiw, I'm questioning the need for core.fsync.  GIT_FSYNC alone
> may be sufficient.
>
>> > This will also be useful for 3rd-party tools which create
>> > throwaway git repositories of temporary data.
>> 
>> Do you mostly just care about the tests, or is the third-party tool
>> support important to you? I ask because most of us switched to running
>> the tests with --root=/some/tmpfs long ago to achieve the same speedup.
>
> Third-party tools and OSes which don't have a tmpfs mounted by
> default (I don't think most *BSDs have tmpfs enabled by
> default).
>
> I try to use libeatmydata everywhere I can; but that's not
> always installed.
>
> I'm also strongly considering making GIT_FSYNC=0 the default for
> our own test suite since it's less setup for newbies.

If the primary focus is for testing, perhaps GIT_TEST_FSYNC would be
better?  I do not think we want to even advertise an option for not
syncing at all for end users working with any real repositories.
Not even when they promise that the end user accepts all the
responsibility and will keep both halves when things got broken.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28 18:28   ` Eric Wong
  2021-10-28 19:29     ` Junio C Hamano
@ 2021-10-28 21:40     ` brian m. carlson
  1 sibling, 0 replies; 17+ messages in thread
From: brian m. carlson @ 2021-10-28 21:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git

[-- Attachment #1: Type: text/plain, Size: 1160 bytes --]

On 2021-10-28 at 18:28:24, Eric Wong wrote:
> Third-party tools and OSes which don't have a tmpfs mounted by
> default (I don't think most *BSDs have tmpfs enabled by
> default).
> 
> I try to use libeatmydata everywhere I can; but that's not
> always installed.

I was going to suggest libeatmydata for this purpose.  I don't think we
should grow this option when there's a library that does all of this for
us without any mistakes and also is universally applicable.  Presumably
if you find Git too slow in this case, you'll also find dpkg and other
programs too slow, and want to use it there as well.

It's also potentially a footgun where people end up "making Git faster"
and then get corrupted data.  I'm imagining blog posts and Stack
Overflow suggestions that people do this, just like we see tons of
suggestions for people to set http.postBuffer without understanding what
it does.  I think "eat my data" is pretty clear about the consequences
so we don't have to be, and it comes with a giant warning that you're
almost certainly going to experience data loss.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] tests: disable fsync everywhere
  2021-10-28 19:29     ` Junio C Hamano
@ 2021-10-29  0:15       ` Eric Wong
  2021-10-29  5:18         ` Junio C Hamano
                           ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-29  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> >> > This will also be useful for 3rd-party tools which create
> >> > throwaway git repositories of temporary data.
> >> 
> >> Do you mostly just care about the tests, or is the third-party tool
> >> support important to you? I ask because most of us switched to running
> >> the tests with --root=/some/tmpfs long ago to achieve the same speedup.
> >
> > Third-party tools and OSes which don't have a tmpfs mounted by
> > default (I don't think most *BSDs have tmpfs enabled by
> > default).

> > I'm also strongly considering making GIT_FSYNC=0 the default for
> > our own test suite since it's less setup for newbies.
> 
> If the primary focus is for testing, perhaps GIT_TEST_FSYNC would be
> better?  I do not think we want to even advertise an option for not
> syncing at all for end users working with any real repositories.
> Not even when they promise that the end user accepts all the
> responsibility and will keep both halves when things got broken.

I think having an undocumented GIT_TEST_FSYNC is a good
compromise and saves us the support burden.

Fwiw, I do currently have a 3rd-party tool which creates
throwaway repos, but the throwaways are currently small enough
that objects stay loose.  They might get bigger and pack in
in the future.

Relying on an LD_PRELOAD-based solution such as eatmydata
doesn't work for staticly-linked systems, and can conflict
with other LD_PRELOAD-based tools one might use(*)

v2 changes:
* s/GIT_FSYNC/GIT_TEST_FSYNC/
* disable fsync by default for tests, reduces setup for newcomers
* fix style nit noted by Eric Sunshine

--------------8<------------
Subject: [PATCH] tests: disable fsync everywhere

The "GIT_TEST_FSYNC" environment variable now exists for
disabling fsync() even on packfiles and other "critical" data.

Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
on an HDD, test runtime drops from ~4 minutes down to ~3 minutes.
Using "GIT_TEST_FSYNC=1" re-enables fsync() for comparison
purposes.

SVN interopability tests are minimally affected since SVN will
still use fsync in various places.

This will also be useful for 3rd-party tools which create
throwaway git repositories of temporary data, but remains
undocumented for end users.

Signed-off-by: Eric Wong <e@80x24.org>
---
 cache.h            |  1 +
 environment.c      |  1 +
 git-cvsserver.perl | 19 +++++++++++++++++++
 perl/Git/SVN.pm    | 17 +++++++++++++++--
 t/test-lib.sh      |  7 +++++++
 write-or-die.c     |  5 +++++
 6 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index eba12487b9..de6c45cf44 100644
--- a/cache.h
+++ b/cache.h
@@ -986,6 +986,7 @@ extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
+extern int use_fsync;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/environment.c b/environment.c
index 9da7f3c1a1..0d06a31024 100644
--- a/environment.c
+++ b/environment.c
@@ -42,6 +42,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
 int fsync_object_files;
+int use_fsync = -1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 64319bed43..4c8118010a 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -3607,6 +3607,22 @@ package GITCVS::updater;
 use strict;
 use warnings;
 use DBI;
+our $_use_fsync;
+
+# n.b. consider using Git.pm
+sub use_fsync {
+    if (!defined($_use_fsync)) {
+        my $x = $ENV{GIT_TEST_FSYNC};
+        if (defined $x) {
+            local $ENV{GIT_CONFIG};
+            delete $ENV{GIT_CONFIG};
+            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
+                                        qw(config --type=bool test.fsync));
+            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+        }
+    }
+    $_use_fsync;
+}
 
 =head1 METHODS
 
@@ -3676,6 +3692,9 @@ sub new
                                 $self->{dbuser},
                                 $self->{dbpass});
     die "Error connecting to database\n" unless defined $self->{dbh};
+    if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
+        $self->{dbh}->do('PRAGMA synchronous = OFF');
+    }
 
     $self->{tables} = {};
     foreach my $table ( keys %{$self->{dbh}->table_info(undef,undef,undef,'TABLE')->fetchall_hashref('TABLE_NAME')} )
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 35ff5a6896..df5a87a151 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -6,7 +6,7 @@ package Git::SVN;
 use vars qw/$_no_metadata
             $_repack $_repack_flags $_use_svm_props $_head
             $_use_svnsync_props $no_reuse_existing
-	    $_use_log_author $_add_author_from $_localtime/;
+	    $_use_log_author $_add_author_from $_localtime $_use_fsync/;
 use Carp qw/croak/;
 use File::Path qw/mkpath/;
 use IPC::Open3;
@@ -2269,6 +2269,19 @@ sub mkfile {
 	}
 }
 
+# TODO: move this to Git.pm?
+sub use_fsync {
+	if (!defined($_use_fsync)) {
+		my $x = $ENV{GIT_TEST_FSYNC};
+		if (defined $x) {
+			my $v = command_oneline('git', '-c', "test.fsync=$x",
+					qw(config --type=bool test.fsync));
+			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+		}
+	}
+	$_use_fsync;
+}
+
 sub rev_map_set {
 	my ($self, $rev, $commit, $update_ref, $uuid) = @_;
 	defined $commit or die "missing arg3\n";
@@ -2290,7 +2303,7 @@ sub rev_map_set {
 	my $sync;
 	# both of these options make our .rev_db file very, very important
 	# and we can't afford to lose it because rebuild() won't work
-	if ($self->use_svm_props || $self->no_metadata) {
+	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
 		require File::Copy;
 		$sync = 1;
 		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a291a5d4a2..21f5fab999 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -489,6 +489,13 @@ then
 	export GIT_PERL_FATAL_WARNINGS
 fi
 
+case $GIT_TEST_FSYNC in
+'')
+	GIT_TEST_FSYNC=0
+	export GIT_TEST_FSYNC
+	;;
+esac
+
 # Add libc MALLOC and MALLOC_PERTURB test
 # only if we are not executing the test with valgrind
 if test -n "$valgrind" ||
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b..a3d5784cec 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "config.h"
 #include "run-command.h"
 
 /*
@@ -57,6 +58,10 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
+	if (use_fsync < 0)
+		use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
+	if (!use_fsync)
+		return;
 	while (fsync(fd) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
-- 

(*) I happen to be working on an LD_PRELOAD-based malloc tracer
    for Perl which may help us cut memory use in git-svn;
    but interposing malloc/free gets tricky...

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
@ 2021-10-29  5:18         ` Junio C Hamano
  2021-10-29  7:56           ` Eric Wong
  2021-10-29  7:33         ` Junio C Hamano
  2021-10-29 20:34         ` Jeff King
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-10-29  5:18 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Eric Wong <e@80x24.org> writes:

> @@ -42,6 +42,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
>  int fsync_object_files;
> +int use_fsync = -1;

OK, (-1) is "undetermined yet", as usual.

> diff --git a/git-cvsserver.perl b/git-cvsserver.perl
> index 64319bed43..4c8118010a 100755
> --- a/git-cvsserver.perl
> +++ b/git-cvsserver.perl
> @@ -3607,6 +3607,22 @@ package GITCVS::updater;
>  use strict;
>  use warnings;
>  use DBI;
> +our $_use_fsync;
> +
> +# n.b. consider using Git.pm
> +sub use_fsync {
> +    if (!defined($_use_fsync)) {
> +        my $x = $ENV{GIT_TEST_FSYNC};
> +        if (defined $x) {

I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
guess there is no way to place in %ENV anyway, so it would be OK.

> +            local $ENV{GIT_CONFIG};
> +            delete $ENV{GIT_CONFIG};

OK, "git -c test.fsync=no cvsserver" would added something to
GIT_CONFIG that would affect test.fsync, but wouldn't the usual
last-one-wins rule be sufficient to check the value of $x using the
next construction, no matter what is in GIT_CONFIG?  I do not think
it would hurt to delete $ENV{GIT_CONFIG}, but I am not sure how it
is necessary.

> +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
> +                                        qw(config --type=bool test.fsync));

THis is an interesting idiom.

> +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> +        }
> +    }
> +    $_use_fsync;
> +}


> +# TODO: move this to Git.pm?
> +sub use_fsync {

Possibly, but in a slightly more general form, taking the name of
the environment variable that holds a boolean value as an argument,
or something?

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index a291a5d4a2..21f5fab999 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -489,6 +489,13 @@ then
>  	export GIT_PERL_FATAL_WARNINGS
>  fi
>  
> +case $GIT_TEST_FSYNC in
> +'')
> +	GIT_TEST_FSYNC=0
> +	export GIT_TEST_FSYNC
> +	;;
> +esac

> diff --git a/write-or-die.c b/write-or-die.c
> index 0b1ec8190b..a3d5784cec 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "config.h"
>  #include "run-command.h"
>  
>  /*
> @@ -57,6 +58,10 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> +	if (use_fsync < 0)
> +		use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
> +	if (!use_fsync)
> +		return;

OK.  That's quite straight-forward.

>  	while (fsync(fd) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);

Will queue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
  2021-10-29  5:18         ` Junio C Hamano
@ 2021-10-29  7:33         ` Junio C Hamano
  2021-10-29  7:48           ` Eric Wong
  2021-10-29 20:34         ` Jeff King
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-10-29  7:33 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Eric Wong <e@80x24.org> writes:

> v2 changes:
> * s/GIT_FSYNC/GIT_TEST_FSYNC/
> * disable fsync by default for tests, reduces setup for newcomers
> * fix style nit noted by Eric Sunshine

https://github.com/git/git/runs/4043532265?check_suite_focus=true#step:5:70

Seems to be dying in "git svn" tests somehow.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  7:33         ` Junio C Hamano
@ 2021-10-29  7:48           ` Eric Wong
  2021-10-29 17:22             ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2021-10-29  7:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > v2 changes:
> > * s/GIT_FSYNC/GIT_TEST_FSYNC/
> > * disable fsync by default for tests, reduces setup for newcomers
> > * fix style nit noted by Eric Sunshine
> 
> https://github.com/git/git/runs/4043532265?check_suite_focus=true#step:5:70

Fwiw, I couldn't view that (not sure if it's from lack of JS
or lack of GH account).  Either way it's accessibility problem.

> Seems to be dying in "git svn" tests somehow.

Easy repro+fix, though.  I only tested my final patch with NO_SVN_TESTS :x
Can you squash this in or do you want a reroll?

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index df5a87a151..6ce2e283c8 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -2274,7 +2274,7 @@ sub use_fsync {
 	if (!defined($_use_fsync)) {
 		my $x = $ENV{GIT_TEST_FSYNC};
 		if (defined $x) {
-			my $v = command_oneline('git', '-c', "test.fsync=$x",
+			my $v = command_oneline('-c', "test.fsync=$x",
 					qw(config --type=bool test.fsync));
 			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  5:18         ` Junio C Hamano
@ 2021-10-29  7:56           ` Eric Wong
  2021-10-29 18:12             ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2021-10-29  7:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > +# n.b. consider using Git.pm
> > +sub use_fsync {
> > +    if (!defined($_use_fsync)) {
> > +        my $x = $ENV{GIT_TEST_FSYNC};
> > +        if (defined $x) {
> 
> I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
> guess there is no way to place in %ENV anyway, so it would be OK.

Was that meant to say: "no way to place `undef' in %ENV anyway"?

If so, `undef' can actually be in Perl's %ENV, though it appears
to get coerced into "" (empty string) when spawning processes.

> > +            local $ENV{GIT_CONFIG};
> > +            delete $ENV{GIT_CONFIG};
> 
> OK, "git -c test.fsync=no cvsserver" would added something to
> GIT_CONFIG that would affect test.fsync, but wouldn't the usual
> last-one-wins rule be sufficient to check the value of $x using the
> next construction, no matter what is in GIT_CONFIG?  I do not think
> it would hurt to delete $ENV{GIT_CONFIG}, but I am not sure how it
> is necessary.

Leaving GIT_CONFIG set was actually causing "git config" to
exit(1) since git-cvsserver sets GIT_CONFIG and the GIT_CONFIG
file doesn't have a test.fsync setting.  This is the current
behavior, I think it's a weird quirk, but intended behavior of
git-config.

# this assumes you don't have foo.bar set in your ~/.gitconfig :>
$ GIT_CONFIG=$HOME/.gitconfig git -c foo.bar=0 config --type=bool foo.bar
$ echo $?
1

> > +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
> > +                                        qw(config --type=bool test.fsync));
> 
> THis is an interesting idiom.

Heh, I just thought of it before sending my original.  I was
going to use a regexp originally (in git-svn, too), but didn't
want to get into corner cases such as hex and +/- prefixes).

> > +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> > +        }
> > +    }
> > +    $_use_fsync;
> > +}
> 
> 
> > +# TODO: move this to Git.pm?
> > +sub use_fsync {
> 
> Possibly, but in a slightly more general form, taking the name of
> the environment variable that holds a boolean value as an argument,
> or something?

Yeah.  It would've been more useful if git-cvsserver used Git.pm;
but I didn't want to introduce Git.pm into git-cvsserver in case
somebody relies on git-cvsserver being standalone.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
  2021-10-28  1:21 ` Eric Sunshine
  2021-10-28 14:36 ` Jeff King
@ 2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
  2021-10-30 10:39   ` Eric Wong
  2 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-29 11:20 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Neeraj Singh


On Thu, Oct 28 2021, Eric Wong wrote:

> "core.fsync" and the "GIT_FSYNC" environment variable now exist
> for disabling fsync() even on packfiles and other critical data.

First off, I think this would be useful to add, even as a non-test
thing.

There's plenty of use-cases where users don't care about fsync(), and
aside from data reliability it also has implications for say laptop
battery use.

I care about my E-Mail, but my E-Mail syncing thingy's (mbsync) has an
fsync=off in its config, I'm pretty confident that I won't lose power on
that *nix laptop, and even if I did I have other recovery methods.

I wouldn't use this in general, but might for some operations, e.g. when
I'm doing some batch operation with git-annex or whatever.

> Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> runtime from ~4 minutes down to ~3 minutes.

On a related topic: Have you tried having it use an empty template
directory? I have some unsubmitted patches for that, and if you retain
all trash dirs you'll find that we have IIRC 100-200MB of just
accumulated *.sample etc. hooks, out of ~1GB total for the tests.

> +core.fsync::
> +	A boolean to control 'fsync()' on all files.
> ++
> +Setting this to false can speed up writes of unimportant data.
> +Disabling fsync will lead to data loss on power failure.  If set
> +to false, this also overrides core.fsyncObjectFiles.  Defaults to true.
> +

Per the recent threads about fsync semantics this should really be
worded to be more scary, i.e. it's not guaranteed that your data is
on-disk at all. So subsequent git operations might see repo corruption,
if we're being POSIX-pedantic.

So some more "all bets are off" wording would be better here, and any
extra promises could be e.g.:

    On commonly deployed OS's such as Linux, OSX etc. a lack of fsync()
    gives you extra guarantees that are non-standard, primarily that a
    subsequent running process that accesses the data will see the
    updated version, as the VFS will serve up the new version, even if
    it's not on-disk yet. Even these light guarantees may not be
    something you're getting from our OS. Here be dragons!

> [...]
> --- a/git-cvsserver.perl
> +++ b/git-cvsserver.perl
> @@ -3607,6 +3607,25 @@ package GITCVS::updater;
>  use strict;
>  use warnings;
>  use DBI;
> +our $_use_fsync;

s/our/my/?

> +# n.b. consider using Git.pm
> +sub use_fsync {
> +    if (!defined($_use_fsync)) {
> +        my $x = $ENV{GIT_FSYNC};
> +        if (defined $x) {
> +            local $ENV{GIT_CONFIG};
> +            delete $ENV{GIT_CONFIG};
> +            my $v = ::safe_pipe_capture('git', '-c', "core.fsync=$x",
> +                                        qw(config --type=bool core.fsync));
> +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> +        } else {
> +            my $v = `git config --type=bool core.fsync`;
> +            $_use_fsync = $v eq "false\n" ? 0 : 1;
> +        }
> +    }
> +    $_use_fsync;
> +}

I wonder most of all if we really need these perl changes, does it
really matter for the overall performance that you want, or was it just
to get all fsync() uses out of the test suite?

This is a more general comment, but so much of scaffolding like that in
the *.perl scripts could go away if we taught git.c some "not quite a
builtin, but it's ours" mode, and had say code for git-send-email,
git-svn etc. to just set the various data they need in the environment
before we invoke them. Then this would just be say:

    our $use_fsync = $ENV{GIT_FOR_CVSSERVER_FSYNC};


> +    if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
> +        $self->{dbh}->do('PRAGMA synchronous = OFF');
> +    }

..in particular if we're doing this we should update the docs to say
that we'll also tweak third-party software's use of fsync() in some
cases.

Even if we "own" that sqlite DB a user might not expect a git option to
have gone so far as to be the cause of their on-disk sqlite DB's
corruption.

> [...]
>  	my $sync;
>  	# both of these options make our .rev_db file very, very important
>  	# and we can't afford to lose it because rebuild() won't work
> -	if ($self->use_svm_props || $self->no_metadata) {
> +	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
>  		require File::Copy;
>  		$sync = 1;
>  		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",

This doesn't just impact $io->sync, but also $io->flush, which is a
different thing. PerlIO flushing to the OS != fsync().

So even on OS's that make the above noted guarantees you'd get nothing,
since we're now at the mercy of perl not crashing in the interim, not
kernel or stdlib behavior.

> @@ -57,7 +58,9 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	if (use_fsync < 0)
> +		use_fsync = git_env_bool("GIT_FSYNC", 1);
> +	while (use_fsync && fsync(fd) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}

This patch direction looks good to me, aside from the above we should
really update any other fsync() docs we have, maybe with a
cross-reference in core.fsyncObjectFiles?

I'm not sure offhand what the state of the other fsync patches that
Neeraj Singh has been submitting is, but let's make sure that whatever
docs/config knobs etc. we have all play nicely together, and are
somewhat future-proof if possible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  7:48           ` Eric Wong
@ 2021-10-29 17:22             ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-10-29 17:22 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Eric Wong <e@80x24.org> writes:

> Easy repro+fix, though.  I only tested my final patch with NO_SVN_TESTS :x
> Can you squash this in or do you want a reroll?
>
> diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> index df5a87a151..6ce2e283c8 100644
> --- a/perl/Git/SVN.pm
> +++ b/perl/Git/SVN.pm
> @@ -2274,7 +2274,7 @@ sub use_fsync {
>  	if (!defined($_use_fsync)) {
>  		my $x = $ENV{GIT_TEST_FSYNC};
>  		if (defined $x) {
> -			my $v = command_oneline('git', '-c', "test.fsync=$x",
> +			my $v = command_oneline('-c', "test.fsync=$x",
>  					qw(config --type=bool test.fsync));
>  			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
>  		}

Yuck, I should have known that command_oneline() does not want the
leading 'git'.

A sad part of the story is that this is not exactly the fault of the
interface, as the full name of the thing is Git::command_oneline(),
which makes it sufficiently clear that it is a "git" specific thing,
and Git::oneline_result_from_git_command_do_not_give_git_at_front()
is not a good sub name X-<.

Thanks for quickly diagnosing.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  7:56           ` Eric Wong
@ 2021-10-29 18:12             ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-10-29 18:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git, Eric Sunshine, brian m. carlson

Eric Wong <e@80x24.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> > +# n.b. consider using Git.pm
>> > +sub use_fsync {
>> > +    if (!defined($_use_fsync)) {
>> > +        my $x = $ENV{GIT_TEST_FSYNC};
>> > +        if (defined $x) {
>> 
>> I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
>> guess there is no way to place in %ENV anyway, so it would be OK.
>
> Was that meant to say: "no way to place `undef' in %ENV anyway"?

Yes.  Nothing the external callers of a Perl script does by futzing
the environment with setenv(3) and unsetenv(3) can make undef appear
as a value in $ENV{SomeKey}, so defined $ENV{V} and exists $ENV{V}
are equivalent.

I still prefer "exists $ENV{V}", which I think conveys the intent of
the check better, i.e. "we do this iff the environment variable X is
there".

> If so, `undef' can actually be in Perl's %ENV, though it appears
> to get coerced into "" (empty string) when spawning processes.

Yes, but you are talking about the opposite direction, what Perl can
do to %ENV to affect processes it spawns, which is not what I meant.

> Leaving GIT_CONFIG set was actually causing "git config" to
> exit(1) since git-cvsserver sets GIT_CONFIG and the GIT_CONFIG
> file doesn't have a test.fsync setting.  This is the current
> behavior, I think it's a weird quirk, but intended behavior of
> git-config.

Ah, sorry, I misread the variable.  GIT_CONFIG was the thing that
says "read from this file and nowhere else"; we do want to disable
it locally for "-c var=val" to take effect.

> # this assumes you don't have foo.bar set in your ~/.gitconfig :>
> $ GIT_CONFIG=$HOME/.gitconfig git -c foo.bar=0 config --type=bool foo.bar
> $ echo $?
> 1
>
>> > +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
>> > +                                        qw(config --type=bool test.fsync));
>> 
>> THis is an interesting idiom.
>
> Heh, I just thought of it before sending my original.  I was
> going to use a regexp originally (in git-svn, too), but didn't
> want to get into corner cases such as hex and +/- prefixes).

And this I think is the best way to do so ;-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
  2021-10-29  5:18         ` Junio C Hamano
  2021-10-29  7:33         ` Junio C Hamano
@ 2021-10-29 20:34         ` Jeff King
  2021-10-29 20:42           ` Junio C Hamano
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-10-29 20:34 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Eric Sunshine, brian m. carlson

On Fri, Oct 29, 2021 at 12:15:52AM +0000, Eric Wong wrote:

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index a291a5d4a2..21f5fab999 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -489,6 +489,13 @@ then
>  	export GIT_PERL_FATAL_WARNINGS
>  fi
>  
> +case $GIT_TEST_FSYNC in
> +'')
> +	GIT_TEST_FSYNC=0
> +	export GIT_TEST_FSYNC
> +	;;
> +esac

I don't really have a problem with doing this by default, as it might
help and shouldn't hurt. But I didn't find it actually changed much for
me. Here are timings running the test suite on my machine:

  Benchmark 1: GIT_TEST_FSYNC=1 make
    Time (mean ± σ):     60.973 s ±  0.158 s    [User: 575.914 s, System: 333.243 s]
    Range (min … max):   60.731 s … 61.228 s    10 runs

  Benchmark 2: GIT_TEST_FSYNC=0 make
    Time (mean ± σ):     59.800 s ±  0.094 s    [User: 575.151 s, System: 337.111 s]
    Range (min … max):   59.643 s … 59.996 s    10 runs

  Benchmark 3: GIT_TEST_FSYNC=1 make GIT_TEST_OPTS=--root=/var/ram
    Time (mean ± σ):     56.966 s ±  0.066 s    [User: 572.353 s, System: 300.808 s]
    Range (min … max):   56.874 s … 57.063 s    10 runs

  Summary
    'GIT_TEST_FSYNC=1 make GIT_TEST_OPTS=--root=/var/ram' ran
      1.05 ± 0.00 times faster than 'GIT_TEST_FSYNC=0 make'
      1.07 ± 0.00 times faster than 'GIT_TEST_FSYNC=1 make'

So using an actual ram disk provided a much bigger edge for me than just
disabling fsync. This may be because the system is using a decent SSD
for its actual disk. But it may also be because I'm running 32 tests
simultaneously. So fsync introducing latency in tests isn't a big deal;
the bottleneck is CPU and there's always another script ready to run.
Though it is also interesting that the system CPU time is so much lower
in the tmpfs case.

Again, not really an objection, but I don't think this replaces the
"you're better off running the test suite on a RAM disk" wisdom.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tests: disable fsync everywhere
  2021-10-29 20:34         ` Jeff King
@ 2021-10-29 20:42           ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-10-29 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git, Eric Sunshine, brian m. carlson

Jeff King <peff@peff.net> writes:

> Again, not really an objection, but I don't think this replaces the
> "you're better off running the test suite on a RAM disk" wisdom.

Nice summary.  This was sold as "not everybody has ready access to
tmpfs", so those of us who can use either should stick to our tmpfs
as we have done so far.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] allow disabling fsync everywhere
  2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
@ 2021-10-30 10:39   ` Eric Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-30 10:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Neeraj Singh

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Oct 28 2021, Eric Wong wrote:
> 
> > "core.fsync" and the "GIT_FSYNC" environment variable now exist
> > for disabling fsync() even on packfiles and other critical data.
> 
> First off, I think this would be useful to add, even as a non-test
> thing.

<snip>

Agreed, I've LD_PRELOAD-ed libeatmydata via ~/.bashrc at times :>

OTOH, I understand Junio and brian's positions on dealing with
misinformed users losing data.  I'm also lazy when it comes to
docs and support, so I'm leaning towards keeping this exclusively
for those who read source code.

I also know it's easy to add a sysctl knob for disabling fsync
et. al. in the kernel; but it's not something I want to deal
with supporting and documenting, either.

> > Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> > on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> > runtime from ~4 minutes down to ~3 minutes.
> 
> On a related topic: Have you tried having it use an empty template
> directory? I have some unsubmitted patches for that, and if you retain
> all trash dirs you'll find that we have IIRC 100-200MB of just
> accumulated *.sample etc. hooks, out of ~1GB total for the tests.

Not, not yet.  I also expect using reflinks on some Linux FSes
will help users save space in real-world use
(ioctl(dst_fd, FICLONE, src_fd)).

> Per the recent threads about fsync semantics this should really be
> worded to be more scary, i.e. it's not guaranteed that your data is
> on-disk at all. So subsequent git operations might see repo corruption,
> if we're being POSIX-pedantic.

Yeah, I wasn't sure how strongly to word it.  Now it's
undocumented; I suppose we can use your wording if we decide
it's worth documenting.

> > --- a/git-cvsserver.perl
> > +++ b/git-cvsserver.perl
> > @@ -3607,6 +3607,25 @@ package GITCVS::updater;
> >  use strict;
> >  use warnings;
> >  use DBI;
> > +our $_use_fsync;
> 
> s/our/my/?

Erm, yes.  Though it doesn't matter for a standalone script;
probably not worth a reroll..

> I wonder most of all if we really need these perl changes, does it
> really matter for the overall performance that you want, or was it just
> to get all fsync() uses out of the test suite?

Yes to both, or at least an attempt to...  A single fsync() can
end up being equivalent to syncfs().

Fully-disabling fsync in SVN doesn't seem possible, though.
There's a few svnadmin switches (--no-flush-to-disk +
--bdb-txn-nosync) but AFAIK they aren't 100% solutions.
Fortuntanately, NO_SVN_TESTS=1 exists.

> This is a more general comment, but so much of scaffolding like that in
> the *.perl scripts could go away if we taught git.c some "not quite a
> builtin, but it's ours" mode, and had say code for git-send-email,
> git-svn etc. to just set the various data they need in the environment
> before we invoke them. Then this would just be say:
> 
>     our $use_fsync = $ENV{GIT_FOR_CVSSERVER_FSYNC};

That's potentially a lot of environment variables, though.
Maybe just:  eval($ENV{GIT_CFG_CVSSERVER}) if $ENV{GIT_CFG_PID} == $$;
The PID check should be sufficient to avoid mismatches and
malicious code injection.

> > [...]
> >  	my $sync;
> >  	# both of these options make our .rev_db file very, very important
> >  	# and we can't afford to lose it because rebuild() won't work
> > -	if ($self->use_svm_props || $self->no_metadata) {
> > +	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
> >  		require File::Copy;
> >  		$sync = 1;
> >  		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
> 
> This doesn't just impact $io->sync, but also $io->flush, which is a
> different thing. PerlIO flushing to the OS != fsync().

Right, close($fh) (right below[1]) also takes care of PerlIO
flushing.  ->flush was only there for fsync.

[1] https://yhbt.net/lore/git/7b333ea62e/s/?b=perl/Git/SVN.pm#n2334

> This patch direction looks good to me, aside from the above we should
> really update any other fsync() docs we have, maybe with a
> cross-reference in core.fsyncObjectFiles?

Alright, though I'm happy with it being undocumented, for now.

> I'm not sure offhand what the state of the other fsync patches that
> Neeraj Singh has been submitting is, but let's make sure that whatever
> docs/config knobs etc. we have all play nicely together, and are
> somewhat future-proof if possible.

It looks like it some changes will be necessary to keep tests fast
after that lands.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-30 10:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
2021-10-28  1:21 ` Eric Sunshine
2021-10-28 14:36 ` Jeff King
2021-10-28 18:28   ` Eric Wong
2021-10-28 19:29     ` Junio C Hamano
2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
2021-10-29  5:18         ` Junio C Hamano
2021-10-29  7:56           ` Eric Wong
2021-10-29 18:12             ` Junio C Hamano
2021-10-29  7:33         ` Junio C Hamano
2021-10-29  7:48           ` Eric Wong
2021-10-29 17:22             ` Junio C Hamano
2021-10-29 20:34         ` Jeff King
2021-10-29 20:42           ` Junio C Hamano
2021-10-28 21:40     ` [PATCH] allow disabling " brian m. carlson
2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason
2021-10-30 10:39   ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).