From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Eric Sunshine" <sunshine@sunshineco.com>,
"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Johannes Sixt" <j6t@kdbg.org>,
"Torsten Bögershausen" <tboegi@web.de>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v5 0/5] Fix and extend encoding handling in fast export/import
Date: Mon, 13 May 2019 16:17:21 -0700
Message-ID: <20190513231726.16218-1-newren@gmail.com>
In-Reply-To: <20190513164722.31534-1-newren@gmail.com>

While stress testing `git filter-repo`, I noticed an issue with
encoding; further digging led to the fixes and features in this series.
See the individual commit messages for details.

Changes since v4 (full range-diff below):
  * Used git_parse_maybe_bool()
  * Updated Documentation/git-fast-export.txt to document the new option

Elijah Newren (5):
  t9350: fix encoding test to actually test reencoding
  fast-import: support 'encoding' commit header
  fast-export: avoid stripping encoding header if we cannot reencode
  fast-export: differentiate between explicitly utf-8 and implicitly
    utf-8
  fast-export: do automatic reencoding of commit messages only if
    requested

 Documentation/git-fast-export.txt            |  7 ++
 Documentation/git-fast-import.txt            |  7 ++
 builtin/fast-export.c                        | 55 ++++++++++++--
 fast-import.c                                | 11 ++-
 t/t9300-fast-import.sh                       | 20 +++++
 t/t9350-fast-export.sh                       | 78 +++++++++++++++++---
 t/t9350/broken-iso-8859-7-commit-message.txt |  1 +
 t/t9350/simple-iso-8859-7-commit-message.txt |  1 +
 8 files changed, 163 insertions(+), 17 deletions(-)
 create mode 100644 t/t9350/broken-iso-8859-7-commit-message.txt
 create mode 100644 t/t9350/simple-iso-8859-7-commit-message.txt

Range-diff:
1: 37a68a0ffd = 1: 37a68a0ffd t9350: fix encoding test to actually test reencoding
2: 3d84f4613d = 2: 3d84f4613d fast-import: support 'encoding' commit header
3: baa8394a3a = 3: baa8394a3a fast-export: avoid stripping encoding header if we cannot reencode
4: 49960164c6 = 4: 49960164c6 fast-export: differentiate between explicitly utf-8 and implicitly utf-8
5: 571613a09e ! 5: d8be4ee826 fast-export: do automatic reencoding of commit messages only if requested
    @@ -13,6 +13,24 @@

         Signed-off-by: Elijah Newren <newren@gmail.com>

    + diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
    + --- a/Documentation/git-fast-export.txt
    + +++ b/Documentation/git-fast-export.txt
    +@@
    + 	for intermediary filters (e.g. for rewriting commit messages
    + 	which refer to older commits, or for stripping blobs by id).
    +
    ++--reencode=(yes|no|abort)::
    ++	Specify how to handle `encoding` header in commit objects. When
    ++	asking to 'abort' (which is the default), this program will die
    ++	when encountering such a commit object. With 'yes', the commit
    ++	message will be reencoded into UTF-8. With 'no', the original
    ++	encoding will be preserved.
    ++
    + --refspec::
    + 	Apply the specified refspec to each ref exported. Multiple of them can
    + 	be specified.
    +
      diff --git a/builtin/fast-export.c b/builtin/fast-export.c
      --- a/builtin/fast-export.c
      +++ b/builtin/fast-export.c
    @@ -31,14 +49,25 @@
     +static int parse_opt_reencode_mode(const struct option *opt,
     +				   const char *arg, int unset)
     +{
    -+	if (unset || !strcmp(arg, "abort"))
    ++	if (unset) {
     +		reencode_mode = REENCODE_ABORT;
    -+	else if (!strcmp(arg, "yes") || !strcmp(arg, "true") || !strcmp(arg, "on"))
    -+		reencode_mode = REENCODE_YES;
    -+	else if (!strcmp(arg, "no") || !strcmp(arg, "false") || !strcmp(arg, "off"))
    ++		return 0;
    ++	}
    ++
    ++	switch (git_parse_maybe_bool(arg)) {
    ++	case 0:
     +		reencode_mode = REENCODE_NO;
    -+	else
    -+		return error("Unknown reencoding mode: %s", arg);
    ++		break;
    ++	case 1:
    ++		reencode_mode = REENCODE_YES;
    ++		break;
    ++	default:
    ++		if (arg && !strcasecmp(arg, "abort"))
    ++			reencode_mode = REENCODE_ABORT;
    ++		else
    ++			return error("Unknown reencoding mode: %s", arg);
    ++	}
    ++
     +	return 0;
     +}
     +
--
2.21.0.782.gd8be4ee826
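
For readers who want to try the option parsing from the range-diff outside of git, here is a minimal standalone sketch. It rests on several assumptions: `sketch_parse_maybe_bool()` is a simplified stand-in for git's `git_parse_maybe_bool()` (the real helper, to my understanding, also accepts integer values, which this sketch skips), and `sketch_parse_reencode()` with its `mode` out-parameter is a hypothetical name used only for illustration. The patch itself uses a parse-options callback that sets a file-level `reencode_mode` and reports failures through `error()`.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

enum reencode_mode { REENCODE_ABORT, REENCODE_YES, REENCODE_NO };

/*
 * Simplified stand-in (assumption) for git_parse_maybe_bool():
 * returns 1 for true-like text, 0 for false-like text, -1 otherwise.
 * A NULL value counts as true and an empty string as false.
 */
static int sketch_parse_maybe_bool(const char *value)
{
	if (!value)
		return 1;
	if (!*value)
		return 0;
	if (!strcasecmp(value, "true") ||
	    !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false") ||
	    !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return 0;
	return -1;
}

/*
 * Hypothetical wrapper mirroring the control flow of the v5 callback:
 * a negated option falls back to "abort", boolean-like text selects
 * yes/no, and "abort" (case-insensitive) is the only other valid value.
 * Returns 0 on success and -1 on an unknown mode.
 */
static int sketch_parse_reencode(const char *arg, int unset,
				 enum reencode_mode *mode)
{
	if (unset) {
		*mode = REENCODE_ABORT;
		return 0;
	}

	switch (sketch_parse_maybe_bool(arg)) {
	case 0:
		*mode = REENCODE_NO;
		break;
	case 1:
		*mode = REENCODE_YES;
		break;
	default:
		/* arg is non-NULL here: NULL is handled as "true" above */
		if (!strcasecmp(arg, "abort")) {
			*mode = REENCODE_ABORT;
		} else {
			fprintf(stderr, "Unknown reencoding mode: %s\n", arg);
			return -1;
		}
	}

	return 0;
}
```

Compared with the v4 chain of strcmp() calls, delegating to a shared boolean parser means spellings such as "ON" or "False" are accepted without listing each one, and only "abort" needs special handling.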