[PATCH 0/4] Documentation updates to FAQ and git-archive

* [PATCH 0/4] Documentation updates to FAQ and git-archive
@ 2021-02-27 19:18 brian m. carlson
  2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: brian m. carlson @ 2021-02-27 19:18 UTC
  To: git; +Cc: Emily Shaffer, Johannes Schindelin

This series introduces several new FAQ items and an update to the
git-archive documentation.

The first three patches introduce FAQ entries for questions I've seen
extremely frequently on Stack Overflow.  Since clearly users are seeing
these problems, we should update our documentation to address them and
help users find clear and accurate solutions.

I realize that suggesting people share a working tree across systems is
controversial, but people are doing it, so let's tell them how to do it
safely.  Users frequently use things like Dropbox, OneDrive, iCloud, and
similar cloud syncing services to do this and then wonder why things are
broken or their repository is corrupted.  We tell them that
POSIX-compliant file systems should be used and give examples of what we
know does and doesn't work, and we tell them about the security pitfalls
of untrusted working trees.
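
For machines that aren't connected by a network, the FAQ entry in patch
1 points at `git bundle`.  A minimal sketch of that workflow follows.
The temporary directories, the file name, and the branch name `main`
are assumptions made for illustration and are not part of the patch:

```shell
set -eu
work=$(mktemp -d)

# Source repository with one commit on a branch named "main" (illustrative name).
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
printf 'hello\n' > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'Initial commit'

# Pack the branch into a single file that can be carried on removable media.
git -C "$work/src" bundle create "$work/repo.bundle" main

# On the other machine, clone (or later fetch) from the bundle file.
git clone -q -b main "$work/repo.bundle" "$work/dst"
```

Because the bundle is consumed through clone or fetch, only objects and
refs cross the machine boundary; no configuration or hooks from the
source repository are copied along.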

The third patch addresses several common situations with HTTP pushes and
fetches.  The majority of these problems are going to be with TLS MITM
devices, intercepting and filtering proxies of various sorts, and
non-default antivirus and firewalls, all of which security experts
steadfastly recommend against.  We don't do so here (yet), but we do
explicitly call them out as potential sources of problems and we
encourage users to report these problems to vendors and network
administrators so that they can be addressed.

The fourth patch states a fact which we've been explicit about on the
list but have never documented: that the output of git archive is not
stable.  I do recall that I sent a patch breaking kernel.org's
infrastructure in the past due to a change in archive output and I'd
like to avoid other folks relying on bit-for-bit identical output.
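
As one possible alternative to hashing the archive bytes, tooling can
check which commit an archive was made from, since the tar format
records the commit ID in a pax header (as the existing git-archive
documentation notes).  This is only a sketch; the throwaway repository
and file name are assumptions:

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
printf 'hello\n' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'Initial commit'

# The tar format stores the commit ID in an extended pax header; read it back.
archived_id=$(git -C "$repo" archive --format=tar HEAD | git get-tar-commit-id)
head_id=$(git -C "$repo" rev-parse HEAD)

# Compare commit identities rather than archive bytes.
[ "$archived_id" = "$head_id" ] && echo "archive was made from $head_id"
```

This only works when the archive is created from a commit or tag, not
from a bare tree, and it says nothing about the byte layout of the
archive itself.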

brian m. carlson (4):
  docs: add a question on syncing repositories to the FAQ
  docs: add line ending configuration article to FAQ
  docs: add a FAQ section on push and fetch problems
  docs: note that archives are not stable

 Documentation/git-archive.txt |   3 +
 Documentation/gitfaq.txt      | 176 +++++++++++++++++++++++++++++++++-
 2 files changed, 178 insertions(+), 1 deletion(-)

* [PATCH 1/4] docs: add a question on syncing repositories to the FAQ
  2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
@ 2021-02-27 19:18 ` brian m. carlson
  2021-02-28 13:01   ` Ævar Arnfjörð Bjarmason
  2021-02-27 19:18 ` [PATCH 2/4] docs: add line ending configuration article to FAQ brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: brian m. carlson @ 2021-02-27 19:18 UTC
  To: git; +Cc: Emily Shaffer, Johannes Schindelin

It is very common that users want to transport repositories with working
trees across machines.  While this is not recommended, many users do it
anyway and moreover, do it using cloud syncing services, which often
corrupt their data.  The results of such are often seen in tales of woe
on common user question fora.

Let's tell users what we recommend they do in this circumstance and how
to do it safely.  Warn them about the dangers of untrusted working trees
and the downsides of index refreshes, as well as the problems with cloud
syncing services.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index afdaeab850..042b11e88a 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -241,6 +241,45 @@ How do I know if I want to do a fetch or a pull?::
 	ignore the upstream changes.  A pull consists of a fetch followed
 	immediately by either a merge or rebase.  See linkgit:git-pull[1].
 
+[[syncing-across-computers]]
+How do I sync a Git repository across multiple computers, VMs, or operating systems?::
+	The best way to sync a repository across computers is by pushing and fetching.
+	This uses the native Git mechanisms to transport data efficiently and is the
+	easiest and best way to move data across machines.  If the machines aren't
+	connected by a network, you can use `git bundle` to create a file with your
+	changes and then fetch or pull them from the file on the remote machine.
+	Pushing and fetching are also the only secure ways to interact with a
+	repository you don't own or trust.
++
+However, sometimes people want to sync a repository with a working tree across
+machines.  While this isn't recommended, it can be done with `rsync` (usually
+over an SSH connection), but only when the repository is completely idle (that
+is, no processes, including `git gc`, are modifying it at all).  If `rsync`
+isn't available, you can use `tar` to create a tar archive of the repository and
+copy it to another machine.  Zip files shouldn't be used due to their poor
+support for permissions and symbolic links.
++
+You may also use a shared file system between the two machines that is POSIX
+compliant, such as SSHFS (SFTP) or NFSv4.  If you are using SFTP for this
+purpose, the server should support fsync and POSIX renames (OpenSSH does).  File
+systems that don't provide POSIX semantics, such as DAV mounts, shouldn't be
+used.
++
+Note that you must not work with untrusted working trees, since it's trivial
+for an attacker to set configuration options that will cause arbitrary code to
+be executed on your machine.  Also, in almost all cases when sharing a working
+tree across machines, Git will need to re-read all files the next time you run
+`git status` or otherwise refresh the index, which can be slow.  This generally
+can't be avoided and is part of the reason why sharing a working tree isn't
+recommended.
++
+In no circumstances should you share a working tree or bare repository using a
+cloud syncing service or store it in a directory managed by such a service.
+Such services sync file by file and don't maintain the invariants required for
+repository integrity; in addition, they can cause files to be added, removed, or
+duplicated unexpectedly.  If you must use one of these services, use it to store
+the repository in a tar archive instead.
+
 Merging and Rebasing
 --------------------
 

* [PATCH 2/4] docs: add line ending configuration article to FAQ
  2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
  2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
@ 2021-02-27 19:18 ` brian m. carlson
  2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
  2021-02-27 19:18 ` [PATCH 4/4] docs: note that archives are not stable brian m. carlson
  3 siblings, 0 replies; 16+ messages in thread
From: brian m. carlson @ 2021-02-27 19:18 UTC
  To: git; +Cc: Emily Shaffer, Johannes Schindelin

A common source of problems when working across projects is getting line
endings to work in a consistent way.  Let's explain to users how to
configure their line endings such that they're automatically converted
using the .gitattributes file.  Update a reference to an incorrect FAQ
entry by referring to the previous entry instead of the following one.
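
A rough end-to-end sketch of the renormalization step described in the
new entry, run in a throwaway repository.  The file name `notes.txt`
and the committer identity are made up for the example:

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config core.autocrlf false

# Commit a file with CRLF endings before any attributes exist.
printf 'one\r\ntwo\r\n' > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'Add notes with CRLF endings'

# Enable automatic conversion, then restage tracked files under the new rules.
printf '* text=auto\n' > "$repo/.gitattributes"
git -C "$repo" add --renormalize .
git -C "$repo" add .gitattributes
git -C "$repo" -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'Normalize line endings'

# Count carriage returns stored in the repository before and after.
cr_before=$(git -C "$repo" show HEAD~1:notes.txt | tr -cd '\r' | wc -c)
cr_after=$(git -C "$repo" show HEAD:notes.txt | tr -cd '\r' | wc -c)
echo "CRs before: $cr_before, after: $cr_after"
```

Note that `--renormalize` only restages files that are already tracked,
which is why `.gitattributes` is added separately here.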

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 042b11e88a..a132f66032 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -387,6 +387,41 @@ repository will apply to all users of the repository.
 See the following entry for information about normalizing line endings as well,
 and see linkgit:gitattributes[5] for more information about attribute files.
 
+[[line-ending-gitattributes]]
+How do I fix my line endings to work well across platforms?::
+	The best way to do this is to ask Git to perform automatic line ending
+	conversion in your repository such that it always stores LF (Unix) line
+	endings in the repository and checks them out to the user's preferred endings.
+	This is done using the `text` attribute in the `.gitattributes` file in the
+	root of your repository.  If you want to use the built-in heuristic to
+	determine text files, you can write this:
++
+----
+* text=auto
+----
++
+If you have certain files that must always use specific line endings when
+checked out, such as shell scripts or PowerShell files, you can explicitly
+specify the line endings to be used, and you can also mark some files as
+not wanting line-ending conversion (`-text`):
++
+----
+*.sh text eol=lf
+*.ps1 text eol=crlf
+*.jpg -text
+----
++
+When you're done making these changes to the `.gitattributes` file, run `git add
+--renormalize .` and then commit.  This will make sure that the files in the
+repository are properly stored with LF endings.
++
+Using this approach means that each developer can choose the line endings that
+are best for their environment while keeping the repository consistent, avoiding
+needless changes in the repository based on differing line endings, and allowing
+tools like `git diff` to not display spurious whitespace errors.
++
+See linkgit:gitattributes[5] for more information about attribute files.
+
 [[windows-diff-control-m]]
 I'm on Windows and git diff shows my files as having a `^M` at the end.::
 	By default, Git expects files to be stored with Unix line endings.  As such,
@@ -396,7 +431,7 @@ I'm on Windows and git diff shows my files as having a `^M` at the end.::
 +
 You can store the files in the repository with Unix line endings and convert
 them automatically to your platform's line endings.  To do that, set the
-configuration option `core.eol` to `native` and see the following entry for
+configuration option `core.eol` to `native` and see the previous entry for
 information about how to configure files as text or binary.
 +
 You can also control this behavior with the `core.whitespace` setting if you

* [PATCH 3/4] docs: add a FAQ section on push and fetch problems
  2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
  2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
  2021-02-27 19:18 ` [PATCH 2/4] docs: add line ending configuration article to FAQ brian m. carlson
@ 2021-02-27 19:18 ` brian m. carlson
  2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
  2021-03-01 18:02   ` Junio C Hamano
  2021-02-27 19:18 ` [PATCH 4/4] docs: note that archives are not stable brian m. carlson
  3 siblings, 2 replies; 16+ messages in thread
From: brian m. carlson @ 2021-02-27 19:18 UTC
  To: git; +Cc: Emily Shaffer, Johannes Schindelin

There are a lot of questions on the Internet about common problems with
fetching and pushing.  Roughly, the vast majority of these problems are
when using HTTP and involve HTTP/2 streams, certain HTTP errors, or
connections which are interrupted.  This latter case is especially
common on Windows where non-default antivirus and firewall software
frequently tampers with connections in undesirable ways.

Let's add some FAQ entries explaining what is happening and how to
troubleshoot and solve these problems.  When discussing network
connection issues, explicitly call out TLS man-in-the-middle devices,
proxies, antivirus programs, and firewall applications, which are the
cause of most of these problems, and encourage users to report these
programs as broken.  Since many sites offer both HTTPS and SSH, suggest
using SSH, which is often not intercepted, as a good way to work around
these problems.
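
The two workarounds the new entries suggest might look like this in
practice.  This is a sketch in a throwaway repository; the remote name
`origin` and the `example.org` URLs are placeholders:

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
# Placeholder HTTPS remote; host and path are illustrative only.
git -C "$repo" remote add origin https://example.org/org1/project1.git

# Workaround for a device that mishandles HTTP/2: force HTTP/1.1 for this repository.
git -C "$repo" config http.version HTTP/1.1

# Alternative: switch the remote to SSH, which is often not intercepted.
git -C "$repo" remote set-url origin git@example.org:org1/project1.git

git -C "$repo" config --get http.version
git -C "$repo" remote get-url origin
```

Setting `http.version` per repository rather than with `--global` keeps
the workaround scoped to the remote that needs it.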

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitfaq.txt | 100 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index a132f66032..fde54d2664 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -185,6 +185,106 @@ Then, you can adjust your push URL to use `git@example_author` or
 `git@example_committer` instead of `git@example.org` (e.g., `git remote set-url
 git@example_author:org1/project1.git`).
 
+Problems Fetching and Pushing
+-----------------------------
+
+[[remote-connection-http-2-stream-error]]
+Why do I get an error about an HTTP/2 stream not being closed cleanly?::
+	Sometimes when pushing or fetching over HTTP, users see a message such as "RPC
+	failed; curl 92 HTTP/2 stream 0 was not closed cleanly".  This message
+	indicates that Git is using HTTP/2, a recent version of the HTTP protocol, and
+	that the remote server, or a middlebox, such as a proxy, TLS middlebox,
+	antivirus, or firewall, failed to speak the protocol correctly and thus the
+	connection was interrupted.
++
+In such a case, the software causing the problem is buggy and will likely be
+broken with a wide variety of web browsers and other HTTP-using applications.
+The best thing to do is contact the responsible party to get the software fixed.
++
+If that isn't possible, you can set the `http.version` option to `HTTP/1.1`,
+which will force the use of an older version of HTTP.  This should allow Git to
+function with this broken software or device.  If the remote server supports
+SSH, you may wish to try switching to SSH instead.
+
+[[remote-connection-http-411]]
+Why do I get an error about an HTTP 411 status?::
+	Sometimes users see error messages when pushing that refer to HTTP status 411,
+	such as "RPC failed; result=22, HTTP code = 411."  This status means that the
+	server or a machine in the middle, such as a proxy, TLS middlebox, antivirus,
+	firewall, or other middlebox, refuses to accept a streaming data connection.
++
+When pushing or fetching over HTTP, Git normally uses a small buffer and, if the
+data is large, uses HTTP 1.1 chunked transfer encoding or HTTP 2 streaming to
+send the data without a defined size.  This is useful because it allows a push
+or fetch to start much faster and therefore complete much faster.  This type of
+streaming has been standardized since 1999 and is well understood, and all
+modern software should be capable of supporting it.
++
+However, in this case, the remote server or middlebox is misconfigured and does
+not correctly support this.  The best thing to do is contact the responsible
+party and ask them to fix the server or middlebox, since this misconfiguration
+can affect many pieces of software, some of which will simply not function at
+all in this environment.
++
+If the remote server supports SSH, you may wish to try using SSH instead.  If
+that is not possible, you can set `http.postBuffer` to a larger value as a
+workaround.  This is one of the few times when that option is useful, but note
+<<http-postbuffer,as outlined in the answer above>> that doing so will increase
+the memory usage for every push, no matter how small, and will not be able to
+handle pushes of arbitrary sizes, so fixing the broken server or device or
+switching to SSH is preferable in almost all cases.
+
+[[remote-connection-reset]]
+Why do I get errors that the connection was reset?::
+	When pushing or fetching, sometimes users see problems where the connection
+	was reset.  Common symptoms of this problem include (but are not limited to)
+	messages like the following:
++
+* RPC failed; curl 56 OpenSSL SSL_read: Connection was reset, errno 10054
+* RPC failed; curl 55 SSL_write() returned SYSCALL, errno = 10053
+* RPC failed; curl 56 LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60
+* RPC failed; result=56, HTTP code = 200
+* RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.
++
+These messages, and almost every message with a libcurl error code of 55 or 56,
+essentially mean that the network connection between Git and the remote server
+was terminated unexpectedly.  This can be caused by any sort of generic network
+problem, such as packet loss or an unstable connection.  Sometimes users also
+see it when connected to a VPN if the connection over the VPN is unstable.  In
+such a case, disabling the VPN or switching to a different connection may help
+the problem, or sending or receiving less data may work around the problem.
++
+This may also be caused by devices or software in the middle of the connection
+which attempt to inspect the data.  For example, if you're on a network which
+uses a TLS middlebox or a proxy, these devices may attempt to inspect the data
+and terminate the connection if the data is too large for them to handle or if
+they mistakenly think it is malicious, offensive, inappropriate, or otherwise
+unacceptable.  To test if this is the problem, try using a different network
+where these devices are not enabled, or contact your network administrator and
+report the problem to them.
++
+On Windows, and to a lesser extent on other platforms, antivirus, firewall, or
+network monitoring software that is not the default (on Windows, something other
+than Windows Defender and Windows Firewall) can intercept network connections
+and cause the same problems as the devices mentioned above.  This may also
+happen when using Git under the Windows Subsystem for Linux with such software.
+To test if this is the problem, remove the non-default software completely and
+restart your computer.  Some such software does not disable the broken
+functionality properly when it is set to disabled and so removing the software
+is the only way to perform the test.  If this is the problem, use the default
+software instead, report the problem to the software vendor, or contact your
+network administrator and report the problem to them.
++
+If you are using HTTPS and the remote server supports SSH, you may wish to try
+using SSH instead.
++
+Note that in all these cases, this is not a problem in Git, but a problem with
+the network or the devices and software managing it.  Some parties mistakenly
+recommend adjusting the `http.postBuffer` setting to work around this, but
+<<http-postbuffer,see the above answer>> for why that usually doesn't work, and
+even when it does work, indicates a defect in the network or software such as
+one mentioned above in this answer.
+
 Common Issues
 -------------
 

* [PATCH 4/4] docs: note that archives are not stable
  2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
                   ` (2 preceding siblings ...)
  2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
@ 2021-02-27 19:18 ` brian m. carlson
  2021-02-28 12:48   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 16+ messages in thread
From: brian m. carlson @ 2021-02-27 19:18 UTC
  To: git; +Cc: Emily Shaffer, Johannes Schindelin

We have in the past told users on the list that git archive does not
necessarily produce stable archives, but we've never explicitly
documented this.  Unfortunately, we've had people in the past who have
relied on the relative stability of our archives to their detriment and
then had breakage occur.

Let's tell people that we don't guarantee stable archives so that they
can make good choices about how they structure their tooling and don't
end up with problems if we need to change archives later.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-archive.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index 9f8172828d..1f126cbdcc 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -30,6 +30,9 @@ extended pax header if the tar format is used; it can be extracted
 using 'git get-tar-commit-id'. In ZIP files it is stored as a file
 comment.
 
+The output of 'git archive' is not guaranteed to be stable and may change
+between versions.
+
 OPTIONS
 -------
 

* Re: [PATCH 3/4] docs: add a FAQ section on push and fetch problems
  2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
@ 2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
  2021-02-28 18:07     ` brian m. carlson
  2021-03-01 18:02   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-28 12:37 UTC
  To: brian m. carlson; +Cc: git, Emily Shaffer, Johannes Schindelin


On Sat, Feb 27 2021, brian m. carlson wrote:

> There are a lot of questions on the Internet about common problems with
> fetching and pushing.  Roughly, the vast majority of these problems are
> when using HTTP and involve HTTP/2 streams, certain HTTP errors, or
> connections which are interrupted.  This latter case is especially
> common on Windows where non-default antivirus and firewall software
> frequently tampers with connections in undesirable ways.
>
> Let's add some FAQ entries explaining what is happening and how to
> troubleshoot and solve these problems.  When discussing network
> connection issues, explicitly call out TLS man-in-the-middle devices,
> proxies, antivirus programs, and firewall applications, which are the
> cause of most of these problems, and encourage users to report these
> programs as broken.  Since many sites offer both HTTPS and SSH, suggest
> using SSH, which is often not intercepted, as a good way to work around
> these problems.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/gitfaq.txt | 100 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>
> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index a132f66032..fde54d2664 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -185,6 +185,106 @@ Then, you can adjust your push URL to use `git@example_author` or
>  `git@example_committer` instead of `git@example.org` (e.g., `git remote set-url
>  git@example_author:org1/project1.git`).
>  
> +Problems Fetching and Pushing
> +-----------------------------
> +
> +[[remote-connection-http-2-stream-error]]
> +Why do I get an error about an HTTP/2 stream not being closed cleanly?::
> +	Sometimes when pushing or fetching over HTTP, users see a message such as "RPC
> +	failed; curl 92 HTTP/2 stream 0 was not closed cleanly".  This message
> +	indicates that Git is using HTTP/2, a recent version of the HTTP protocol, and
> +	that the remote server, or a middlebox, such as a proxy, TLS middlebox,
> +	antivirus, or firewall, failed to speak the protocol correctly and thus the
> +	connection was interrupted.
> ++
> +In such a case, the software causing the problem is buggy and will likely be
> +broken with a wide variety of web browsers and other HTTP-using applications.
> +The best thing to do is contact the responsible party to get the software fixed.
> ++
> +If that isn't possible, you can set the `http.version` option to `HTTP/1.1`,
> +which will force the use of an older version of HTTP.  This should allow Git to
> +function with this broken software or device.  If the remote server supports
> +SSH, you may wish to try switching to SSH instead.
> +
> +[[remote-connection-http-411]]
> +Why do I get an error about an HTTP 411 status?::
> +	Sometimes users see error messages when pushing that refer to HTTP status 411,
> +	such as "RPC failed; result=22, HTTP code = 411."  This status means that the
> +	server or a machine in the middle, such as a proxy, TLS middlebox, antivirus,
> +	firewall, or other middlebox, refuses to accept a streaming data connection.
> ++
> +When pushing or fetching over HTTP, Git normally uses a small buffer and, if the
> +data is large, uses HTTP 1.1 chunked transfer encoding or HTTP 2 streaming to
> +send the data without a defined size.  This is useful because it allows a push
> +or fetch to start much faster and therefore complete much faster.  This type of
> +streaming has been standardized since 1999 and is well understood, and all
> +modern software should be capable of supporting it.
> ++
> +However, in this case, the remote server or middlebox is misconfigured and does
> +not correctly support this.  The best thing to do is contact the responsible
> +party and ask them to fix the server or middlebox, since this misconfiguration
> +can affect many pieces of software, some of which will simply not function at
> +all in this environment.
> ++
> +If the remote server supports SSH, you may wish to try using SSH instead.  If
> +that is not possible, you can set `http.postBuffer` to a larger value as a
> +workaround.  This is one of the few times when that option is useful, but note
> +<<http-postbuffer,as outlined in the answer above>> that doing so will increase
> +the memory usage for every push, no matter how small, and will not be able to
> +handle pushes of arbitrary sizes, so fixing the broken server or device or
> +switching to SSH is preferable in almost all cases.
> +
> +[[remote-connection-reset]]
> +Why do I get errors that the connection was reset?::
> +	When pushing or fetching, sometimes users see problems where the connection
> +	was reset.  Common symptoms of this problem include (but are not limited to)
> +	messages like the following:
> ++
> +* RPC failed; curl 56 OpenSSL SSL_read: Connection was reset, errno 10054
> +* RPC failed; curl 55 SSL_write() returned SYSCALL, errno = 10053
> +* RPC failed; curl 56 LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60
> +* RPC failed; result=56, HTTP code = 200
> +* RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.

I haven't looked in detail at the content of the FAQ itself being added
here (as far as proposed solutions etc. go), but I wonder if this
wouldn't be 10x more useful to users if we cross-linked these errors
with the docs, e.g.:

diff --git a/remote-curl.c b/remote-curl.c
index 0290b04891..ffb1001703 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -829,7 +829,7 @@ static int run_slot(struct active_request_slot *slot,
                                strbuf_addstr(&msg, curl_errorstr);
                        }
                }
-               error(_("RPC failed; %s"), msg.buf);
+               error(_("RPC failed (see 'git help faq'); %s"), msg.buf);
                strbuf_release(&msg);
        }


> +These messages, and almost every message with a libcurl error code of 55 or 56,
> +essentially mean that the network connection between Git and the remote server
> +was terminated unexpectedly.  This can be caused by any sort of generic network
> +problem, such as packet loss or an unstable connection.  Sometimes users also
> +see it when connected to a VPN if the connection over the VPN is unstable.  In
> +such a case, disabling the VPN or switching to a different connection may help
> +the problem, or sending or receiving less data may work around the problem.
> ++
> +This may also be caused by devices or software in the middle of the connection
> +which attempt to inspect the data.  For example, if you're on a network which
> +uses a TLS middlebox or a proxy, these devices may attempt to inspect the data
> +and terminate the connection if the data is too large for them to handle or if
> +they mistakenly think it is malicious, offensive, inappropriate, or otherwise
> +unacceptable.  To test if this is the problem, try using a different network
> +where these devices are not enabled, or contact your network administrator and
> +report the problem to them.
> ++
> +On Windows, and to a lesser extent on other platforms, antivirus, firewall, or
> +network monitoring software that is not the default (on Windows, something other
> +than Windows Defender and Windows Firewall) can intercept network connections
> +and cause the same problems as the devices mentioned above.  This may also
> +happen when using Git under the Windows Subsystem for Linux with such software.
> +To test if this is the problem, remove the non-default software completely and
> +restart your computer.  Some such software does not disable the broken
> +functionality properly when it is set to disabled and so removing the software
> +is the only way to perform the test.  If this is the problem, use the default
> +software instead, report the problem to the software vendor, or contact your
> +network administrator and report the problem to them.
> ++
> +If you are using HTTPS and the remote server supports SSH, you may wish to try
> +using SSH instead.
> ++
> +Note that in all these cases, this is not a problem in Git, but a problem with
> +the network or the devices and software managing it.  Some parties mistakenly
> +recommend adjusting the `http.postBuffer` setting to work around this, but
> +<<http-postbuffer,see the above answer>> for why that usually doesn't work, and
> +even when it does work, indicates a defect in the network or software such as
> +one mentioned above in this answer.
> +
>  Common Issues
>  -------------
>  


* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-02-27 19:18 ` [PATCH 4/4] docs: note that archives are not stable brian m. carlson
@ 2021-02-28 12:48   ` Ævar Arnfjörð Bjarmason
  2021-02-28 18:19     ` brian m. carlson
  0 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-28 12:48 UTC
  To: brian m. carlson
  Cc: git, Emily Shaffer, Johannes Schindelin, Konstantin Ryabitsev,
	Jason Pyeron


On Sat, Feb 27 2021, brian m. carlson wrote:

> We have in the past told users on the list that git archive does not
> necessarily produce stable archives, but we've never explicitly
> documented this.  Unfortunately, we've had people in the past who have
> relied on the relative stability of our archives to their detriment and
> then had breakage occur.
>
> Let's tell people that we don't guarantee stable archives so that they
> can make good choices about how they structure their tooling and don't
> end up with problems if we need to change archives later.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/git-archive.txt | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index 9f8172828d..1f126cbdcc 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -30,6 +30,9 @@ extended pax header if the tar format is used; it can be extracted
>  using 'git get-tar-commit-id'. In ZIP files it is stored as a file
>  comment.
>  
> +The output of 'git archive' is not guaranteed to be stable and may change
> +between versions.

Is "stable archive" a well-known term people would understand, or is
someone going to read this thinking they might extract different content
today than tomorrow? :) I wonder how much if anything this means to
someone not privy to the recent thread[1] that prompted this patch.

Perhaps something like this instead:

    The output of 'git archive' is guaranteed to be the same across
    versions of git, but the archive itself is not guaranteed to be
    bit-for-bit identical.

    In practice the output of 'git archive' is relatively stable across
    git versions, but has changed in the past, and most likely will in
    the future.

    Since the tar format provides multiple ways to encode the same
    output (ordering, headers, padding etc.) you should not rely on
    output being bit-for-bit identical across versions of git for
    e.g. GPG signing a SHA-256 hash of an archive generated with one
    version of git, and then expecting to be able to validate that GPG
    signature with a freshly generated archive made with the same arguments
    on another version of git.

1. https://lore.kernel.org/git/20210122213954.7dlnnpngjoay3oia@chatter.i7.local/
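
To make the "same content, not necessarily the same bytes" distinction concrete, here is a minimal sketch of checking an archive by what it extracts to rather than by a hash of the archive file. It assumes a tar implementation that silently consumes the pax global header (GNU tar and bsdtar do), and a repository with no export-ignore/export-subst attributes or ignore rules affecting tracked files. The helper name `tree_id_of_tar` and the demo repository are made up for illustration.

```shell
#!/bin/sh
# Sketch: verify an archive by its content (Git tree ID), not its bytes.
set -eu

# tree_id_of_tar REPO TARFILE (hypothetical helper)
# Extract TARFILE into a scratch directory, stage the extracted files in a
# throwaway index, and print the resulting tree ID.
tree_id_of_tar() {
    repo=$(cd "$1" && pwd)
    tarfile=$2
    scratch=$(mktemp -d)
    mkdir "$scratch/extract"
    tar -xf "$tarfile" -C "$scratch/extract"
    (
        cd "$scratch/extract"
        export GIT_DIR="$repo/.git"
        export GIT_WORK_TREE="$scratch/extract"
        export GIT_INDEX_FILE="$scratch/index"   # do not touch the real index
        git add -A
        git write-tree
    )
    rm -rf "$scratch"
}

# Demo repository (illustrative only; left in place for inspection).
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
printf 'hello\n' > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m initial

git archive --format=tar -o "$demo/a.tar" HEAD

expected=$(git rev-parse 'HEAD^{tree}')
actual=$(tree_id_of_tar "$demo/repo" "$demo/a.tar")

if [ "$expected" = "$actual" ]; then
    echo "archive content matches HEAD"
else
    echo "archive content differs from HEAD"
fi
```

A check like this keeps working if a later version of git encodes the same tree differently, which is exactly the case where a stored hash of the tarball would stop matching.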


* Re: [PATCH 1/4] docs: add a question on syncing repositories to the FAQ
  2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
@ 2021-02-28 13:01   ` Ævar Arnfjörð Bjarmason
  2021-03-15 20:40     ` brian m. carlson
  0 siblings, 1 reply; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-28 13:01 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Emily Shaffer, Johannes Schindelin

On Sat, Feb 27 2021, brian m. carlson wrote:

> It is very common that users want to transport repositories with working
> trees across machines.  While this is not recommended, many users do it
> anyway and moreover, do it using cloud syncing services, which often
> corrupt their data.  The results of such are often seen in tales of woe
> on common user question fora.
>
> Let's tell users what we recommend they do in this circumstance and how
> to do it safely.  Warn them about the dangers of untrusted working trees
> and the downsides of index refreshes, as well as the problems with cloud
> syncing services.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/gitfaq.txt | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index afdaeab850..042b11e88a 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -241,6 +241,45 @@ How do I know if I want to do a fetch or a pull?::
>  	ignore the upstream changes.  A pull consists of a fetch followed
>  	immediately by either a merge or rebase.  See linkgit:git-pull[1].
>  
> +[[syncing-across-computers]]
> +How do I sync a Git repository across multiple computers, VMs, or operating systems?::
> +	The best way to sync a repository across computers is by pushing and fetching.
> +	This uses the native Git mechanisms to transport data efficiently and is the
> +	easiest and best way to move data across machines.  If the machines aren't
> +	connected by a network, you can use `git bundle` to create a file with your
> +	changes and then fetch or pull them from the file on the remote machine.
> +	Pushing and fetching are also the only secure ways to interact with a
> +	repository you don't own or trust.
> ++
> +However, sometimes people want to sync a repository with a working tree across
> +machines.  While this isn't recommended, it can be done with `rsync` (usually
> +over an SSH connection), but only when the repository is completely idle (that
> +is, no processes, including `git gc`, are modifying it at all).  If `rsync`
> +isn't available, you can use `tar` to create a tar archive of the repository and
> +copy it to another machine.  Zip files shouldn't be used due to their poor
> +support for permissions and symbolic links.
> ++
> +You may also use a shared file system between the two machines that is POSIX
> +compliant, such as SSHFS (SFTP) or NFSv4.  If you are using SFTP for this
> +purpose, the server should support fsync and POSIX renames (OpenSSH does).  File
> +systems that don't provide POSIX semantics, such as DAV mounts, shouldn't be
> +used.
> ++
> +Note that you must not work with untrusted working trees, since it's trivial
> +for an attacker to set configuration options that will cause arbitrary code to
> +be executed on your machine.  Also, in almost all cases when sharing a working
> +tree across machines, Git will need to re-read all files the next time you run
> +`git status` or otherwise refresh the index, which can be slow.  This generally
> +can't be avoided and is part of the reason why sharing a working tree isn't
> +recommended.
> ++
> +In no circumstances should you share a working tree or bare repository using a
> +cloud syncing service or store it in a directory managed by such a service.
> +Such services sync file by file and don't maintain the invariants required for
> +repository integrity; in addition, they can cause files to be added, removed, or
> +duplicated unexpectedly.  If you must use one of these services, use it to store
> +the repository in a tar archive instead.

I think documentation on this topic is needed, but wonder if we couldn't
make this more understandable by going to the heart of the matter, i.e.:

 * We prefer push/pull/bundle to copy/replicate .git content

 * Regardless, a .git directory can be copied across systems just fine
   if you recursively guarantee snapshot integrity, e.g. it doesn't
   depend on the endian-ness of the OS, nor does it contain anything
   like symlinks made by git itself.

 * Anything which copies .git data on a concurrently updated repo can
   lead to corruption, whether that's cp -R, rsync with any combination
   of flags, some cloud syncing service that expects to present that
   tree to two computers without guaranteeing POSIX fs semantics between
   the two etc.

 * A common pitfall with such copying of a .git directory is that file
   deletions are also critical, e.g. rsync without --delete is almost
   guaranteed to produce a corrupt .git if repeated enough times
   (e.g. git might prefer stale loose refs over now-packed ones).

 * It's OK to copy .git between systems that differ in their support of
   symbolic links, but the work tree may be in an inconsistent state and
   need some manner of "git reset" to repair it.

And, not sure if this is correct:

 * It may be OK to edit a .git directory on a non-POSIX conforming fs
   (but perhaps validate the result with "git fsck"). But it's not OK to
   have two writing git processes work on such a repository at the same
   time. Keep in mind that certain operations and default settings (such
   as background gc, see `gc.autoDetach` in linkgit:git-config[1]) might
   result in two processes working on the directory even if you're
   changing it only in one terminal window at a time.

I.e. to go a bit beyond the docs you have of basically saying "there be
dragons in non-POSIX" and describe the particular scenarios where it can
go wrong. Something like the above still leaves the door open to users
using cloud syncing services, which they can then judge for themselves
as being OK or not. I'm sure there's some that are far from POSIX
compliance that are OK in practice if the above warnings are observed.
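
As an illustration of the first bullet above (prefer push/pull/bundle over copying .git), here is a minimal sketch of moving a repository through a single bundle file, which is the option the patch suggests when the machines are not connected by a network. The directory names and the file name `repo.bundle` are made up; only `git bundle create`, `git bundle verify` and `git clone` are taken from git itself.

```shell
#!/bin/sh
# Sketch: transport a repository as a bundle file instead of copying .git.
set -eu

demo=$(mktemp -d)

# "Source machine": a small repository with one commit.
git init -q "$demo/src"
cd "$demo/src"
printf 'data\n' > notes.txt
git add notes.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "add notes"

# Pack all refs (including HEAD) into one file that can be carried on
# removable media or stored in a syncing service.
git bundle create "$demo/repo.bundle" --all

# "Destination machine": check the bundle, then clone from it like a remote.
git bundle verify "$demo/repo.bundle"
git clone -q "$demo/repo.bundle" "$demo/dst"

src_head=$(git rev-parse HEAD)
dst_head=$(git -C "$demo/dst" rev-parse HEAD)
echo "source HEAD:      $src_head"
echo "destination HEAD: $dst_head"
```

Because the bundle is a single file written in one go, a file-by-file syncing service cannot leave it in the half-updated state that the later bullets warn about for a live .git directory.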


* Re: [PATCH 3/4] docs: add a FAQ section on push and fetch problems
  2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
@ 2021-02-28 18:07     ` brian m. carlson
  0 siblings, 0 replies; 16+ messages in thread
From: brian m. carlson @ 2021-02-28 18:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Emily Shaffer, Johannes Schindelin


On 2021-02-28 at 12:37:39, Ævar Arnfjörð Bjarmason wrote:
> I haven't looked in detail at the content of the FAQ itself being added
> here (as far as proposed solutions etc. go), but I wonder if this
> wouldn't be 10x more useful to users if we cross-linked these errors
> with the docs, e.g.:
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 0290b04891..ffb1001703 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -829,7 +829,7 @@ static int run_slot(struct active_request_slot *slot,
>                                 strbuf_addstr(&msg, curl_errorstr);
>                         }
>                 }
> -               error(_("RPC failed; %s"), msg.buf);
> +               error(_("RPC failed (see 'git help faq'); %s"), msg.buf);
>                 strbuf_release(&msg);
>         }

Sure, I can send a patch to do that.  That's a great idea.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-02-28 12:48   ` Ævar Arnfjörð Bjarmason
@ 2021-02-28 18:19     ` brian m. carlson
  2021-02-28 18:46       ` Ævar Arnfjörð Bjarmason
  2021-03-01 18:15       ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: brian m. carlson @ 2021-02-28 18:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Emily Shaffer, Johannes Schindelin, Konstantin Ryabitsev,
	Jason Pyeron


On 2021-02-28 at 12:48:56, Ævar Arnfjörð Bjarmason wrote:
> Perhaps something like this instead:
> 
>     The output of 'git archive' is guaranteed to be the same across
>     versions of git, but the archive itself is not guaranteed to be
>     bit-for-bit identical.
> 
>     In practice the output of 'git archive' is relatively stable across
>     git versions, but has changed in the past, and most likely will in
>     the future.
> 
>     Since the tar format provides multiple ways to encode the same
>     output (ordering, headers, padding etc.) you should not rely on
>     output being bit-for-bit identical across versions of git for
>     e.g. GPG signing a SHA-256 hash of an archive generated with one
>     version of git, and then expecting to be able to validate that GPG
>     signature with a freshly generated archive made with the same arguments
>     on another version of git.

I think something like this is good.  I'm a bit nervous about telling
people that the output is relatively stable because that will likely
push people in the direction that we don't want to encourage.  I might
rephrase the first two paragraphs as so:

  The output of 'git archive' is guaranteed to be the same across
  versions of git, but the archive itself is not guaranteed to be
  bit-for-bit identical.  The output of 'git archive' has changed
  in the past, and most likely will in the future.

I'm not very familiar with the zip format, but I assume that it also has
features that allow equivalent but not bit-for-bit equal archives.
Looking at Wikipedia leads me to believe that one could indeed create
different archives just by either writing a Zip64 record or not, and if
we store the SHA-1 revision ID in a comment, then we would also produce
a different archive when using an equivalent SHA-256 repo.  And of
course there's compression, which allows many different but equivalent
serializations.  So we'd probably need to say the same thing about zip
files as well.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-02-28 18:19     ` brian m. carlson
@ 2021-02-28 18:46       ` Ævar Arnfjörð Bjarmason
  2021-03-01 18:15       ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-28 18:46 UTC (permalink / raw)
  To: brian m. carlson
  Cc: git, Emily Shaffer, Johannes Schindelin, Konstantin Ryabitsev,
	Jason Pyeron


On Sun, Feb 28 2021, brian m. carlson wrote:

> On 2021-02-28 at 12:48:56, Ævar Arnfjörð Bjarmason wrote:
>> Perhaps something like this instead:
>> 
>>     The output of 'git archive' is guaranteed to be the same across
>>     versions of git, but the archive itself is not guaranteed to be
>>     bit-for-bit identical.
>> 
>>     In practice the output of 'git archive' is relatively stable across
>>     git versions, but has changed in the past, and most likely will in
>>     the future.
>> 
>>     Since the tar format provides multiple ways to encode the same
>>     output (ordering, headers, padding etc.) you should not rely on
>>     output being bit-for-bit identical across versions of git for
>>     e.g. GPG signing a SHA-256 hash of an archive generated with one
>>     version of git, and then expecting to be able to validate that GPG
>>     signature with a freshly generated archive made with the same arguments
>>     on another version of git.
>
> I think something like this is good.  I'm a bit nervous about telling
> people that the output is relatively stable because that will likely
> push people in the direction that we don't want to encourage.  I might
> rephrase the first two paragraphs as so:
>
>   The output of 'git archive' is guaranteed to be the same across
>   versions of git, but the archive itself is not guaranteed to be
>   bit-for-bit identical.  The output of 'git archive' has changed
>   in the past, and most likely will in the future.
>
> I'm not very familiar with the zip format, but I assume that it also has
> features that allow equivalent but not bit-for-bit equal archives.
> Looking at Wikipedia leads me to believe that one could indeed create
> different archives just by either writing a Zip64 record or not, and if
> we store the SHA-1 revision ID in a comment, then we would also produce
> a different archive when using an equivalent SHA-256 repo.  And of
> course there's compression, which allows many different but equivalent
> serializations.  So we'd probably need to say the same thing about zip
> files as well.

Yes, I think your version is better, and we should have some wording so
it generalizes to the various output formats we support, perhaps further
noting that the "relatively stable" (if you want to keep a note of that)
only refers to our own output, not when we invoke gzip or zip.

I thought that "relatively stable" and "[when you extract it you get the
same thing]" were good to note, to say that e.g. GPG signing across
versions = bad, but if you e.g. offer downloadable archives with the
contents of tags, there's no reason to make your git version a part of a
cache key for the purposes of saving yourself CPU time when
(re-)generating them.
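
The cache-key point can be sketched as follows: a key for a downloadable archive derived only from the commit and the archive arguments, with the git version deliberately left out. The helper name `archive_cache_key`, the tag `v1.0` and the prefix are made up for illustration. Such a key is only suitable for the "serve the same content again" case, not for anything that relies on the bytes being reproducible.

```shell
#!/bin/sh
# Sketch: cache key for generated archives that ignores the git version.
set -eu

# archive_cache_key REV FORMAT PREFIX (hypothetical helper)
# Resolve REV to a commit ID and hash it together with the archive
# arguments, so two names for the same commit share one cache entry.
archive_cache_key() {
    commit=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    printf '%s %s %s\n' "$commit" "$2" "$3" | git hash-object --stdin
}

# Demo repository (illustrative only).
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
printf 'release\n' > VERSION
git add VERSION
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "release 1.0"
git tag v1.0

key_tag=$(archive_cache_key v1.0 tar.gz proj-1.0/)
key_head=$(archive_cache_key HEAD tar.gz proj-1.0/)
key_zip=$(archive_cache_key v1.0 zip proj-1.0/)

echo "tag key:  $key_tag"
echo "HEAD key: $key_head"
echo "zip key:  $key_zip"
```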


* Re: [PATCH 3/4] docs: add a FAQ section on push and fetch problems
  2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
  2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
@ 2021-03-01 18:02   ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2021-03-01 18:02 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Emily Shaffer, Johannes Schindelin

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> +[[remote-connection-http-411]]
> +Why do I get an error about an HTTP 411 status?::
> +	Sometimes users see error messages when pushing that refer to HTTP status 411,
> +	such as "RPC failed; result=22, HTTP code = 411."  This status means that the
> +	server or a machine in the middle, such as a proxy, TLS middlebox, antivirus,
> +	firewall, or other middlebox, refuses to accept a streaming data connection.
> ++
> +When pushing or fetching over HTTP, Git normally uses a small buffer and, if the
> +data is large, uses HTTP 1.1 chunked transfer encoding or HTTP 2 streaming to
> +send the data without a defined size.  This is useful because it allows a push
> +or fetch to start much faster and therefore complete much faster.  This type of
> +streaming has been standardized since 1999 and is well understood, and all
> +modern software should be capable of supporting it.
> ++
> +However, in this case, the remote server or middlebox is misconfigured and does
> +not correctly support this.  The best thing to do is contact the responsible
> +party and ask them to fix the server or middlebox, since this misconfiguration
> +can affect many pieces of software, some of which will simply not function at
> +all in this environment.
> ++
> +If the remote server supports SSH, you may wish to try using SSH instead.  If
> +that is not possible, you can set `http.postBuffer` to a larger value as a
> +workaround.  This is one of the few times when that option is useful, but note
> +<<http-postbuffer,as outlined in the answer above>> that doing so will increase
> +the memory usage for every push, no matter how small, and will not be able to
> +handle pushes of arbitrary sizes, so fixing the broken server or device or
> +switching to SSH is preferable in almost all cases.

Don't we rather want to merge this with the [[http-postbuffer]] part of
the faq?  If we can have two header lines for the same description
(i.e. the FAQ list may have

    "What does `http.postBuffer` do?" aka "I got HTTP 411--what now?"

as either a single link or as two separate but clearly related
entries), that might be ideal.
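
For reference, the workaround described in the quoted FAQ text amounts to a one-line configuration change. The sketch below sets it in a throwaway repository; the 50 MiB value is an arbitrary illustration, not a recommendation from the patch, and the SSH URL in the comment is a placeholder.

```shell
#!/bin/sh
# Sketch: the http.postBuffer workaround for HTTP 411, in a scratch repository.
set -eu

demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

# Raise the buffer for this repository only (value in bytes; 50 MiB here is
# an arbitrary example).  Every push then buffers up to this much in memory.
git config http.postBuffer 52428800
value=$(git config --get http.postBuffer)
echo "http.postBuffer is now $value"

# Preferred alternative from the FAQ text: switch the remote to SSH, e.g.
#   git remote set-url origin git@example.com:project.git   (placeholder URL)

# Undo the workaround once the server or middlebox has been fixed.
git config --unset http.postBuffer
```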

Thanks.



* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-02-28 18:19     ` brian m. carlson
  2021-02-28 18:46       ` Ævar Arnfjörð Bjarmason
@ 2021-03-01 18:15       ` Junio C Hamano
  2021-03-03  0:36         ` brian m. carlson
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2021-03-01 18:15 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, git, Emily Shaffer,
	Johannes Schindelin, Konstantin Ryabitsev, Jason Pyeron

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>   The output of 'git archive' is guaranteed to be the same across
>   versions of git, but the archive itself is not guaranteed to be
>   bit-for-bit identical.

I do not quite get this; your original was clearer.  What does it
mean to "be the same across versions of git but not identical" at
the same time?  If the outputs from Git versions 1.0 and 2.0 are guaranteed
to be the same across versions, what more is there for the readers
to worry about the format stability?

Perhaps you meant

	... is guaranteed to be the same for any given version of
	Git across ports.

or something?  It would allow kernel.org's use of "Konstantin tells
kernel.org users to use Git version X to run 'git archive' and
create detached signature on the output, and upload only the
signature.  The site uses the same Git version X to run 'git
archive' to create a tarball and the detached signature magically
matches, as the outputs in the two places are bit-for-bit identical".

>   The output of 'git archive' has changed
>   in the past, and most likely will in the future.

That is correct as a statement of fact.  I feel that saying it is
both redundant and insufficient at the same time.  If we want to
tell them "do not depend on the output being bit-for-bit identical",
we should say it more explicitly after this sentence, I would think.
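
The narrower property in the kernel.org scenario (same git build, same commit, same arguments, uncompressed tar) can be checked locally with a sketch like the one below. It says nothing about other git versions, other ports, or the compressed formats, which are the cases under discussion. The repository contents and the prefix are made up for illustration.

```shell
#!/bin/sh
# Sketch: two runs of the same git on the same commit, compared byte for byte.
set -eu

demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
printf 'v1\n' > README
git add README
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "first"

# Archive a commit (not a bare tree), so member timestamps come from the
# commit time rather than from the current time.
git archive --format=tar --prefix=proj-1.0/ -o "$demo/one.tar" HEAD
git archive --format=tar --prefix=proj-1.0/ -o "$demo/two.tar" HEAD

if cmp -s "$demo/one.tar" "$demo/two.tar"; then
    same=yes
else
    same=no
fi
echo "same git, same commit, same arguments: identical=$same"
```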


* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-03-01 18:15       ` Junio C Hamano
@ 2021-03-03  0:36         ` brian m. carlson
  2021-03-03  6:55           ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: brian m. carlson @ 2021-03-03  0:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Emily Shaffer,
	Johannes Schindelin, Konstantin Ryabitsev, Jason Pyeron


On 2021-03-01 at 18:15:29, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >   The output of 'git archive' is guaranteed to be the same across
> >   versions of git, but the archive itself is not guaranteed to be
> >   bit-for-bit identical.
> 
> I do not quite get this; your original was clearer.  What does it
> mean to "be the same across versions of git but not identical" at
> the same time?  If the outputs from Git versions 1.0 and 2.0 are guaranteed
> to be the same across versions, what more is there for the readers
> to worry about the format stability?
> 
> Perhaps you meant
> 
> 	... is guaranteed to be the same for any given version of
> 	Git across ports.
> 
> or something?  It would allow kernel.org's use of "Konstantin tells
> kernel.org users to use Git version X to run 'git archive' and
> create detached signature on the output, and upload only the
> signature.  The site uses the same Git version X to run 'git
> archive' to create a tarball and the detached signature magically
> matches, as the outputs in the two places are bit-for-bit identical".

I think what I had intended was that Git produces deterministic output,
but I don't actually think that's true across ports.  If someone uses a
different version of zlib on a different OS, the output may differ.

I'll rephrase to avoid giving a misleading impression.

> >   The output of 'git archive' has changed
> >   in the past, and most likely will in the future.
> 
> That is correct as a statement of fact.  I feel that saying it is
> both redundant and insufficient at the same time.  If we want to
> tell them "do not depend on the output being bit-for-bit identical",
> we should say it more explicitly after this sentence, I would think.

I agree we should explicitly say that.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US


* Re: [PATCH 4/4] docs: note that archives are not stable
  2021-03-03  0:36         ` brian m. carlson
@ 2021-03-03  6:55           ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2021-03-03  6:55 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, git, Emily Shaffer,
	Johannes Schindelin, Konstantin Ryabitsev, Jason Pyeron

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I think what I had intended was that Git produces deterministic output,
> but I don't actually think that's true across ports.  If someone uses a
> different version of zlib on a different OS, the output may differ.

I agree.  When I wrote my response, I had the "tar" format in mind,
for which we write everything ourselves, but zip and the compressed
outputs are a different story---we do rely on third-party libraries.

Thanks.


* Re: [PATCH 1/4] docs: add a question on syncing repositories to the FAQ
  2021-02-28 13:01   ` Ævar Arnfjörð Bjarmason
@ 2021-03-15 20:40     ` brian m. carlson
  0 siblings, 0 replies; 16+ messages in thread
From: brian m. carlson @ 2021-03-15 20:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Emily Shaffer, Johannes Schindelin


On 2021-02-28 at 13:01:04, Ævar Arnfjörð Bjarmason wrote:
> make this more understandable by going to the heart of the matter, i.e.:
> 
>  * We prefer push/pull/bundle to copy/replicate .git content
> 
>  * Regardless, a .git directory can be copied across systems just fine
>    if you recursively guarantee snapshot integrity, e.g. it doesn't
>    depend on the endian-ness of the OS, or has anything like symlinks in
>    there made by git itself.

I'll revise to make this clearer.

>  * Anything which copies .git data on a concurrently updated repo can
>    lead to corruption, whether that's cp -R, rsync with any combination
>    of flags, some cloud syncing service that expects to present that
>    tree to two computers without guaranteeing POSIX fs semantics between
>    the two etc.
> 
>  * A common pitfall with such copying of a .git directory is that file
>    deletions are also critical, e.g. rsync without --delete is almost
>    guaranteed to produce a corrupt .git if repeated enough times
>    (e.g. git might prefer stale loose refs over now-packed ones).

I'll include that as well.

>  * It's OK to copy .git between systems that differ in their support of
>    symbolic links, but the work tree may be in an inconsistent state and
>    need some manner of "git reset" to repair it.

And this.

> And, not sure if this is correct:
> 
>  * It may be OK to edit a .git directory on a non-POSIX conforming fs
>    (but perhaps validate the result with "git fsck"). But it's not OK to
>    have two writing git processes work on such a repository at the same
>    time. Keep in mind that certain operations and default settings (such
>    as background gc, see `gc.autoDetach` in linkgit:git-config[1]) might
>    result in two processes working on the directory even if you're
>    changing it only in one terminal window at a time.

I'll reflect the fact that only one process may modify the repository at
a time.

> I.e. to go a bit beyond the docs you have of basically saying "there be
> dragons in non-POSIX" and describe the particular scenarios where it can
> go wrong. Something like the above still leaves the door open to users
> using cloud syncing services, which they can then judge for themselves
> as being OK or not. I'm sure there's some that are far from POSIX
> compliance that are OK in practice if the above warnings are observed.

Unfortunately, it's a bit hard to make a concise entry that explains all
the different ways that using a non-POSIX compliant file system can
break things.  We've seen NFS where open(2) is broken, both with
permissions and O_EXCL; cloud syncing services that restore files that
have been deleted; DAV mounts that don't support the necessary
semantics; systems where O_APPEND doesn't have POSIX semantics; and a
whole host of other sadness.  I don't therefore think that we want to
tell people that we think that using a file system that doesn't support
POSIX semantics is okay, because in most cases, it is not.

Users frequently try to judge for themselves what works and then they
try one of the above things which clearly does not, so saying, "Try it
and see if it breaks," just tends to result in users complaining about
repository corruption later on.  I'm really tired of answering these
same questions again and again and telling users that their repositories
are hosed and that they've lost data, so I want to be definitive that we
don't recommend or support these various broken environments and that
users should not use them.  It may be in rare cases that users have
extensive knowledge about the behavior of their file systems and Git's
requirements and can make a sound judgment to use a non-POSIX file
system, but the people on the planet who can do that effectively are
almost all Git developers.  I will state that "File systems that don't
provide POSIX semantics, such as DAV mounts, shouldn't be used without
fully understanding the situation and requirements," which I think is
the most generous recommendation we can safely give.
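
For the "only one process may modify the repository at a time" case, a minimal sketch of the tar-based copy mentioned in the patch might look like the following. It assumes nothing else is writing to the source repository during the copy; setting `gc.auto` to 0 is used here as one way to keep automatic gc from starting, and the directory names are made up for illustration.

```shell
#!/bin/sh
# Sketch: copy an idle repository with tar, then validate the copy.
set -eu

demo=$(mktemp -d)
git init -q "$demo/src"
cd "$demo/src"
printf 'work in progress\n' > draft.txt
git add draft.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "draft"

# Keep automatic gc from kicking in while the snapshot is taken.
git config gc.auto 0

# Snapshot the whole repository (working tree and .git) into one file.
tar -C "$demo" -cf "$demo/src.tar" src

# "Other machine": unpack, then check object integrity and tree state.
mkdir "$demo/other"
tar -C "$demo/other" -xf "$demo/src.tar"
git -C "$demo/other/src" fsck --strict
copy_status=$(git -C "$demo/other/src" status --short)
copy_head=$(git -C "$demo/other/src" rev-parse HEAD)
src_head=$(git rev-parse HEAD)

echo "copy HEAD: $copy_head"
echo "uncommitted changes in copy: ${copy_status:-none}"
```

The `git status` call in the copy is also where the index refresh cost mentioned in the patch shows up: on a large working tree it has to re-read every file once.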
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

Thread overview: 16+ messages
2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
2021-02-28 13:01   ` Ævar Arnfjörð Bjarmason
2021-03-15 20:40     ` brian m. carlson
2021-02-27 19:18 ` [PATCH 2/4] docs: add line ending configuration article to FAQ brian m. carlson
2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
2021-02-28 18:07     ` brian m. carlson
2021-03-01 18:02   ` Junio C Hamano
2021-02-27 19:18 ` [PATCH 4/4] docs: note that archives are not stable brian m. carlson
2021-02-28 12:48   ` Ævar Arnfjörð Bjarmason
2021-02-28 18:19     ` brian m. carlson
2021-02-28 18:46       ` Ævar Arnfjörð Bjarmason
2021-03-01 18:15       ` Junio C Hamano
2021-03-03  0:36         ` brian m. carlson
2021-03-03  6:55           ` Junio C Hamano
