From: "Torsten Bögershausen" <tboegi@web.de>
To: Elijah Newren <newren@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Johannes Sixt <j6t@kdbg.org>
Subject: Re: [PATCH v5 3/5] fast-export: avoid stripping encoding header if we cannot reencode
Date: Tue, 14 May 2019 04:56:51 +0200
Message-ID: <20190514025651.gjtvikhxcjoudkrj@tb-raspi4>
In-Reply-To: <20190513231726.16218-4-newren@gmail.com>

On Mon, May 13, 2019 at 04:17:24PM -0700, Elijah Newren wrote:
> When fast-export encounters a commit with an 'encoding' header, it tries
> to reencode in utf-8 and then drops the encoding header. However, if it
> fails to reencode in utf-8 because e.g. one of the characters in the
> commit message was invalid in the old encoding, then we need to retain
> the original encoding or otherwise we lose information needed to
> understand all the other (valid) characters in the original commit
> message.

Minor question: should this be "utf-8" or "UTF-8"?
In Git we mostly write "UTF-8".

>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> builtin/fast-export.c | 7 +++++--
> t/t9350-fast-export.sh | 21 ++++++++++++++++++++
> t/t9350/broken-iso-8859-7-commit-message.txt | 1 +
> 3 files changed, 27 insertions(+), 2 deletions(-)
> create mode 100644 t/t9350/broken-iso-8859-7-commit-message.txt
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 9e283482ef..7734a9f5a5 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -642,9 +642,12 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
> printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
> if (show_original_ids)
> printf("original-oid %s\n", oid_to_hex(&commit->object.oid));
> - printf("%.*s\n%.*s\ndata %u\n%s",
> + printf("%.*s\n%.*s\n",
> (int)(author_end - author), author,
> - (int)(committer_end - committer), committer,
> + (int)(committer_end - committer), committer);
> + if (!reencoded && encoding)
> + printf("encoding %s\n", encoding);
> + printf("data %u\n%s",
> (unsigned)(reencoded
> ? strlen(reencoded) : message
> ? strlen(message) : 0),
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index c721026260..4fd637312a 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -118,6 +118,27 @@ test_expect_success 'iso-8859-7' '
> ! grep ^encoding actual)
> '
>
> +test_expect_success 'encoding preserved if reencoding fails' '
> +
> + test_when_finished "git reset --hard HEAD~1" &&
> + test_config i18n.commitencoding iso-8859-7 &&
> + echo rosten >file &&
> + git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
> + git fast-export wer^..wer >iso-8859-7.fi &&
> + sed "s/wer/i18n-invalid/" iso-8859-7.fi |
> + (cd new &&
> + git fast-import &&
> + git cat-file commit i18n-invalid >actual &&
> + # Make sure the commit still has the encoding header
> + grep ^encoding actual &&
> + # Verify that the commit has the expected size; i.e.
> + # that no bytes were re-encoded to a different encoding.
> + test 252 -eq "$(git cat-file -s i18n-invalid)" &&
> + # ...and check for the original special bytes
> + grep $(printf "\360") actual &&
> + grep $(printf "\377") actual)
> +'
> +
> test_expect_success 'import/export-marks' '
>
> git checkout -b marks master &&
> diff --git a/t/t9350/broken-iso-8859-7-commit-message.txt b/t/t9350/broken-iso-8859-7-commit-message.txt
> new file mode 100644
> index 0000000000..d06ad75b44
> --- /dev/null
> +++ b/t/t9350/broken-iso-8859-7-commit-message.txt
> @@ -0,0 +1 @@
> +Pi: ?; Invalid: ?
> \ No newline at end of file
> --
> 2.21.0.782.gd8be4ee826
>
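
For readers following along, the header logic in the quoted builtin/fast-export.c hunk can be looked at in isolation. The following is only a minimal sketch under stated assumptions, not the code in Git: format_commit_tail() and its buffer-based interface are hypothetical names made up for illustration (the real code calls printf() directly inside handle_commit()). It only shows the decision the patch adds: emit "encoding <name>" when the message could not be reencoded, and drop it when the reencoded copy is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a function in Git): format the part of a
 * fast-export commit record that follows the committer line.
 *
 * encoding:  value of the commit's 'encoding' header, or NULL if absent
 * reencoded: message converted to UTF-8, or NULL if the conversion
 *            failed or was not needed
 * message:   original message bytes, or NULL if the commit has none
 *
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * buf is too small.
 */
static int format_commit_tail(char *buf, size_t size, const char *encoding,
			      const char *reencoded, const char *message)
{
	const char *body = reencoded ? reencoded : message ? message : "";
	int head = 0, tail;

	assert(buf && size > 0);
	buf[0] = '\0';

	/* Keep the header only when the original bytes are passed through. */
	if (!reencoded && encoding) {
		head = snprintf(buf, size, "encoding %s\n", encoding);
		if (head < 0 || (size_t)head >= size)
			return -1;
	}

	tail = snprintf(buf + head, size - head, "data %u\n%s",
			(unsigned)strlen(body), body);
	if (tail < 0 || (size_t)tail >= size - head)
		return -1;
	return head + tail;
}
```

Calling it with encoding "iso-8859-7" and a NULL reencoded pointer keeps the header in front of the "data" line; passing a non-NULL reencoded string omits it, which matches the pre-patch behavior for messages that convert cleanly.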