git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH 1/4] docs: add a question on syncing repositories to the FAQ
Date: Sun, 28 Feb 2021 14:01:04 +0100	[thread overview]
Message-ID: <87eeh0l52n.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210227191813.96148-2-sandals@crustytoothpaste.net>


On Sat, Feb 27 2021, brian m. carlson wrote:

> It is very common that users want to transport repositories with working
> trees across machines.  While this is not recommended, many users do it
> anyway and moreover, do it using cloud syncing services, which often
> corrupt their data.  The results of such are often seen in tales of woe
> on common user question fora.
>
> Let's tell users what we recommend they do in this circumstance and how
> to do it safely.  Warn them about the dangers of untrusted working trees
> and the downsides of index refreshes, as well as the problems with cloud
> syncing services.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/gitfaq.txt | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index afdaeab850..042b11e88a 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -241,6 +241,45 @@ How do I know if I want to do a fetch or a pull?::
>  	ignore the upstream changes.  A pull consists of a fetch followed
>  	immediately by either a merge or rebase.  See linkgit:git-pull[1].
>  
> +[[syncing-across-computers]]
> +How do I sync a Git repository across multiple computers, VMs, or operating systems?::
> +	The best way to sync a repository across computers is by pushing and fetching.
> +	This uses the native Git mechanisms to transport data efficiently and is the
> +	easiest and best way to move data across machines.  If the machines aren't
> +	connected by a network, you can use `git bundle` to create a file with your
> +	changes and then fetch or pull them from the file on the remote machine.
> +	Pushing and fetching are also the only secure ways to interact with a
> +	repository you don't own or trust.
> ++
> +However, sometimes people want to sync a repository with a working tree across
> +machines.  While this isn't recommended, it can be done with `rsync` (usually
> +over an SSH connection), but only when the repository is completely idle (that
> +is, no processes, including `git gc`, are modifying it at all).  If `rsync`
> +isn't available, you can use `tar` to create a tar archive of the repository and
> +copy it to another machine.  Zip files shouldn't be used due to their poor
> +support for permissions and symbolic links.
> ++
> +You may also use a shared file system between the two machines that is POSIX
> +compliant, such as SSHFS (SFTP) or NFSv4.  If you are using SFTP for this
> +purpose, the server should support fsync and POSIX renames (OpenSSH does).  File
> +systems that don't provide POSIX semantics, such as DAV mounts, shouldn't be
> +used.
> ++
> +Note that you must not work with untrusted working trees, since it's trivial
> +for an attacker to set configuration options that will cause arbitrary code to
> +be executed on your machine.  Also, in almost all cases when sharing a working
> +tree across machines, Git will need to re-read all files the next time you run
> +`git status` or otherwise refresh the index, which can be slow.  This generally
> +can't be avoided and is part of the reason why sharing a working tree isn't
> +recommended.
> ++
> +In no circumstances should you share a working tree or bare repository using a
> +cloud syncing service or store it in a directory managed by such a service.
> +Such services sync file by file and don't maintain the invariants required for
> +repository integrity; in addition, they can cause files to be added, removed, or
> +duplicated unexpectedly.  If you must use one of these services, use it to store
> +the repository in a tar archive instead.

I think documentation on this topic is needed, but wonder if we couldn't
make this more understandable by going to the heart of the matter, i.e.:

 * We prefer push/pull/bundle to copy/replicate .git content

 * Regardless, a .git directory can be copied across systems just fine
   if you recursively guarantee snapshot integrity, e.g. it doesn't
   depend on the endian-ness of the OS, or has anything like symlinks in
   there made by git itself.

 * Anything which copies .git data on a concurrently updated repo can
   lead to corruption, whether that's cp -R, rsync with any combination
   of flags, some cloud syncing service that expects to present that
   tree to two computers without guaranteeing POSIX fs semantics between
   the two etc.

 * A common pitfall with such copying of a .git directory is that file
   deletions are also critical, e.g. rsync without --delete is almost
   guaranteed to produce a corrupt .git if repeated enough times
   (e.g. git might prefer stale loose refs over now-packed ones).

 * It's OK to copy .git between system that differ in their support of
   symbolic links, but the work tree may be in an inconsistent state and
   need some manner of "git reset" to repair it.

And, not sure if this is correct:

 * It may be OK to edit a .git directory on a non-POSIX conforming fs
   (but perhaps validate the result with "git fsck"). But it's not OK to
   have two writing git processes work on such a repository at the same
   time. Keep in mind that certain operations and default settings (such
   as background gc, see `gc.autoDetach` in linkgit:git-config[1]) might
   result in two processes working on the directory even if you're
   changing it only in one terminal window at a time.

I.e. to go a bit beyond the docs you have of basically saying "there be
dragons in non-POSIX" and describe the particular scenarios where it can
go wrong. Something like the above still leaves the door open to users
using cloud syncing services, which they can then judge for themselves
as being OK or not. I'm sure there's some that are far from POSIX
compliance that are OK in practice if the above warnings are observed.


  reply	other threads:[~2021-02-28 13:10 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-27 19:18 [PATCH 0/4] Documentation updates to FAQ and git-archive brian m. carlson
2021-02-27 19:18 ` [PATCH 1/4] docs: add a question on syncing repositories to the FAQ brian m. carlson
2021-02-28 13:01   ` Ævar Arnfjörð Bjarmason [this message]
2021-03-15 20:40     ` brian m. carlson
2021-02-27 19:18 ` [PATCH 2/4] docs: add line ending configuration article to FAQ brian m. carlson
2021-02-27 19:18 ` [PATCH 3/4] docs: add a FAQ section on push and fetch problems brian m. carlson
2021-02-28 12:37   ` Ævar Arnfjörð Bjarmason
2021-02-28 18:07     ` brian m. carlson
2021-03-01 18:02   ` Junio C Hamano
2021-02-27 19:18 ` [PATCH 4/4] docs: note that archives are not stable brian m. carlson
2021-02-28 12:48   ` Ævar Arnfjörð Bjarmason
2021-02-28 18:19     ` brian m. carlson
2021-02-28 18:46       ` Ævar Arnfjörð Bjarmason
2021-03-01 18:15       ` Junio C Hamano
2021-03-03  0:36         ` brian m. carlson
2021-03-03  6:55           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eeh0l52n.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).