From: John Keeping <john@keeping.me.uk>
To: Pete Wyckoff <pw@padd.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Michael Haggerty <mhagger@alum.mit.edu>,
git@vger.kernel.org, "Eric S. Raymond" <esr@thyrsus.com>,
Felipe Contreras <felipe.contreras@gmail.com>,
Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [RFC/PATCH 2/8 v3] git_remote_helpers: fix input when running under Python 3
Date: Wed, 16 Jan 2013 09:45:34 +0000
Message-ID: <20130116094418.GA9089@river>
In-Reply-To: <20130116000316.GA26999@padd.com>

On Tue, Jan 15, 2013 at 07:03:16PM -0500, Pete Wyckoff wrote:
> john@keeping.me.uk wrote on Tue, 15 Jan 2013 22:40 +0000:
>> This is what keeping the refs as byte strings looks like.
>
> As John knows, it is not possible to interpret text from a byte
> string without talking about the character encoding.
>
> Git is (largely) a C program and uses the character set defined
> in the C standard, which is a subset of ASCII. But git does
> "math" on strings, like this snippet that takes something from
> argv[] and prepends "refs/heads/":
>
>     strcpy(refname, "refs/heads/");
>     strcpy(refname + strlen("refs/heads/"), ret->name);
>
> The result doesn't talk about what character set it is using,
> but because it combines a prefix from ASCII with its input,
> git makes the assumption that the input is ASCII-compatible.
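The same prefix "math" can be mirrored on Python 3 byte strings. This is a minimal sketch that assumes the branch name arrives as UTF-8 bytes; make_refname() is a hypothetical helper and is not part of git or git_remote_helpers:

```python
def make_refname(name):
    """Prepend the ASCII prefix refs/heads/ to a branch name given as bytes.

    Hypothetical helper. Like the strcpy() pair, it only gives a meaningful
    result if `name` uses an ASCII-compatible encoding such as UTF-8.
    """
    prefix = "refs/heads/".encode("ascii")
    return prefix + name


# A non-ASCII branch name ("cafe" with an acute e) survives when UTF-8 encoded.
refname = make_refname("caf\u00e9".encode("utf-8"))
print(refname)                                            # b'refs/heads/caf\xc3\xa9'
print(refname.decode("utf-8") == "refs/heads/caf\u00e9")  # True
```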
>
> If you feed a UTF-16 string in argv, e.g.
>
>     $ echo master | iconv -f ascii -t utf16 | xargs git branch
>     xargs: Warning: a NUL character occurred in the input. It cannot be passed through in the argument list. Did you mean to use the --null option?
>     fatal: Not a valid object name: ''.
>
> you get an error about NUL, and not the branch you hoped for.
> Git assumes that the input character set contains roughly ASCII
> in byte positions 0..127.
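The failure can be sketched in Python 3 by mimicking how C string functions stop at the first NUL byte. as_c_string() is a hypothetical helper, and "utf-16-le" is an assumption chosen for a fixed byte order; the iconv output in the shell example may carry a byte-order mark and use a different byte order:

```python
def as_c_string(data):
    """Return what C string functions would see: the bytes before the first NUL."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


# UTF-16 puts a NUL byte next to every ASCII character.
encoded = "master".encode("utf-16-le")
print(encoded)                                # b'm\x00a\x00s\x00t\x00e\x00r\x00'
print(as_c_string(encoded))                   # b'm'
print(as_c_string("master".encode("utf-8")))  # b'master'
```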
>
> That's one small reason why the useful character encodings put
> ASCII in the 0..127 range, including utf-8, big5 and shift-jis.
> ASCII is indeed special due to its legacy, and both C and Python
> recognize this.
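A quick Python 3 check of the ASCII-compatibility claim, assuming an ASCII-only ref name (non-ASCII names would of course encode differently in each encoding):

```python
# Encodings that keep ASCII in bytes 0..127 give identical bytes for an
# ASCII-only ref name; UTF-16 does not.
REF = "refs/heads/master"
ascii_bytes = REF.encode("ascii")
ENCODINGS = ("utf-8", "big5", "shift_jis", "utf-16-le")

results = {enc: REF.encode(enc) == ascii_bytes for enc in ENCODINGS}
for enc in ENCODINGS:
    print(enc, results[enc])
# utf-8 True
# big5 True
# shift_jis True
# utf-16-le False
```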
>
>> diff --git a/git_remote_helpers/git/importer.py b/git_remote_helpers/git/importer.py
>> @@ -18,13 +18,16 @@ class GitImporter(object):
>>
>>      def get_refs(self, gitdir):
>>          """Returns a dictionary with refs.
>> +
>> +        Note that the keys in the returned dictionary are byte strings as
>> +        read from git.
>>          """
>>          args = ["git", "--git-dir=" + gitdir, "for-each-ref", "refs/heads"]
>> -        lines = check_output(args).strip().split('\n')
>> +        lines = check_output(args).strip().split('\n'.encode('utf-8'))
>>          refs = {}
>>          for line in lines:
>> -            value, name = line.split(' ')
>> -            name = name.strip('commit\t')
>> +            value, name = line.split(' '.encode('utf-8'))
>> +            name = name.strip('commit\t'.encode('utf-8'))
>>              refs[name] = value
>>          return refs
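Here is a git-free sketch of the same parsing under Python 3, run on made-up "git for-each-ref" output, whose default line format is "<objectname> SP <objecttype> TAB <refname>". parse_refs() and the sample object names are hypothetical. Unlike the patch, it splits off the object type at the tab. The reason is that strip('commit\t') removes any of those characters from both ends rather than a literal prefix, so a branch called "commit" would be mangled:

```python
# Separators built with encode('ascii') rather than b'' literals.
NL = "\n".encode("ascii")
SP = " ".encode("ascii")
TAB = "\t".encode("ascii")


def parse_refs(output):
    """Map ref names to object names, both kept as byte strings (hypothetical)."""
    refs = {}
    for line in output.strip().split(NL):
        if not line:  # empty output: nothing to parse
            continue
        value, rest = line.split(SP, 1)
        # Split at the tab instead of strip('commit\t'), which strips a
        # character set and would turn refs/heads/commit into refs/heads/.
        _objtype, name = rest.split(TAB, 1)
        refs[name] = value
    return refs


# Hypothetical sample output; object names shortened for illustration.
sample = (
    "1111111 commit\trefs/heads/master\n"
    "2222222 commit\trefs/heads/commit\n"
).encode("ascii")
print(parse_refs(sample))
# {b'refs/heads/master': b'1111111', b'refs/heads/commit': b'2222222'}
```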
>
> I'd suggest for this Python conundrum using byte-string literals, e.g.:
>
>     lines = check_output(args).strip().split(b'\n')
>     value, name = line.split(b' ')
>     name = name.strip(b'commit\t')
>
> Essentially identical to what you have, but avoids naming "utf-8" as
> the encoding. It instead relies on Python's interpretation of
> ASCII characters in string context, which is exactly what C does.
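Assuming the separators stay ASCII-only, the three spellings produce identical bytes. Here is a Python 3 check:

```python
# b'' literal, encode('ascii') and encode('utf-8') agree for ASCII text.
assert b"\n" == "\n".encode("ascii") == "\n".encode("utf-8")
assert b"commit\t" == "commit\t".encode("ascii") == "commit\t".encode("utf-8")

line = b"1111111 commit\trefs/heads/master"
print(line.split(b" "))  # [b'1111111', b'commit\trefs/heads/master']
```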

The problem is that AFAICT the byte-string prefix is only available in
Python 2.7 and later (compare [1] and [2]). I think we need this more
convoluted code if we want to keep supporting Python 2.6 (although
perhaps 'ascii' would be a better choice than 'utf-8').

[1] http://docs.python.org/2.6/reference/lexical_analysis.html#literals
[2] http://docs.python.org/2.7/reference/lexical_analysis.html#literals
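If b'' literals are ruled out for Python 2.6, one option is a small helper in the spirit of six.b() from the six library (which encodes with latin-1). This sketch is hypothetical and uses 'ascii' instead, so non-ASCII input fails loudly:

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        """Python 3: turn a native str literal into bytes (hypothetical helper)."""
        return s.encode("ascii")
else:
    def b(s):
        """Python 2: str is already a byte string."""
        return s

# Define separators once so the encoding step stays out of the parsing loop.
NL = b("\n")
lines = b("1111111 commit\trefs/heads/master\n").strip().split(NL)
print(lines)  # [b'1111111 commit\trefs/heads/master']
```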

John