git@vger.kernel.org mailing list mirror (one of many)
From: John Keeping <john@keeping.me.uk>
To: Pete Wyckoff <pw@padd.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	git@vger.kernel.org, "Eric S. Raymond" <esr@thyrsus.com>,
	Felipe Contreras <felipe.contreras@gmail.com>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [RFC/PATCH 2/8 v3] git_remote_helpers: fix input when running under Python 3
Date: Wed, 16 Jan 2013 09:45:34 +0000
Message-ID: <20130116094418.GA9089@river>
In-Reply-To: <20130116000316.GA26999@padd.com>

On Tue, Jan 15, 2013 at 07:03:16PM -0500, Pete Wyckoff wrote:
> john@keeping.me.uk wrote on Tue, 15 Jan 2013 22:40 +0000:
>> This is what keeping the refs as byte strings looks like.
> 
> As John knows, it is not possible to interpret text from a byte
> string without talking about the character encoding.
> 
> Git is (largely) a C program and uses the character set defined
> in the C standard, which is a subset of ASCII.  But git does
> "math" on strings, like this snippet that takes something from
> argv[] and prepends "refs/heads/":
> 
>     strcpy(refname, "refs/heads/");
>     strcpy(refname + strlen("refs/heads/"), ret->name);
> 
> The result doesn't talk about what character set it is using,
> but because it combines a prefix from ASCII with its input,
> git makes the assumption that the input is ASCII-compatible.
> 
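The same prefix "math" can be sketched in Python on byte strings. This is a minimal illustration written for Python 3; `PREFIX` and `full_refname` are hypothetical names, not part of git or git_remote_helpers. Like the C code, the concatenation never inspects an encoding, so the result is only readable when the input encoding stores ASCII in the same bytes as the prefix:

```python
# Python 3 sketch of the C strcpy() pair: prepend an ASCII prefix byte-for-byte.
# PREFIX and full_refname are hypothetical names used only for illustration.
PREFIX = "refs/heads/".encode("ascii")


def full_refname(name_bytes):
    """Return PREFIX + name_bytes without decoding or checking any encoding."""
    return PREFIX + name_bytes


branch = "caf\u00e9"
# The result is in whatever encoding the input used.  It decodes cleanly here
# only because UTF-8 and Latin-1 both store ASCII in bytes 0..127.
print(full_refname(branch.encode("utf-8")).decode("utf-8"))
print(full_refname(branch.encode("latin-1")).decode("latin-1"))
```

Neither call needs to know which encoding it was given; the caller has to decode with the same encoding it used to produce the name.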
> If you feed a UTF-16 string in argv, e.g.
> 
>     $ echo master | iconv -f ascii -t utf16 | xargs git branch
>     xargs: Warning: a NUL character occurred in the input.  It cannot be passed through in the argument list.  Did you mean to use the --null option?
>     fatal: Not a valid object name: ''.
> 
> you get an error about NUL, and not the branch you hoped for.
> Git assumes that the input character set contains roughly ASCII
> in byte positions 0..127.
> 
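The UTF-16 failure can be sketched by mimicking how C reads an argument: a C string ends at the first NUL byte. The helper `as_c_string` is hypothetical and the sketch is written for Python 3; `utf-16-be` and `utf-16-le` are used instead of iconv's `utf16` only so that the byte order is fixed and no byte-order mark is emitted:

```python
def as_c_string(arg):
    """Hypothetical helper: the bytes a NUL-terminated C string would see."""
    return arg.split(b"\x00", 1)[0]


for codec in ("utf-8", "utf-16-le", "utf-16-be"):
    encoded = "master".encode(codec)
    # UTF-16 stores each ASCII letter as two bytes, one of which is NUL.
    print(codec, encoded, as_c_string(encoded))
```

With big-endian UTF-16 the very first byte is NUL, so C sees an empty string, much like the `''` in the error above; the ASCII-compatible UTF-8 bytes pass through unchanged.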
> That's one small reason why the useful character encodings put
> ASCII in the 0..127 range, including utf-8, big5 and shift-jis.
> ASCII is indeed special due to its legacy, and both C and Python
> recognize this.
> 
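A quick spot check of the ASCII-compatibility claim, using Python 3's built-in codecs. The helper `agrees_with_ascii` and the sample ref name are hypothetical; only one string is tested, so this is a sanity check rather than a proof. The sample deliberately avoids `\` and `~`, which JIS X 0201-based Shift-JIS tables map to the yen sign and overline:

```python
SAMPLE = "refs/heads/master"  # hypothetical ASCII-only sample text


def agrees_with_ascii(codec, text=SAMPLE):
    """True if `codec` encodes this ASCII text to exactly the ASCII bytes."""
    return text.encode(codec) == text.encode("ascii")


for codec in ("utf-8", "big5", "shift_jis", "utf-16-le"):
    print(codec, agrees_with_ascii(codec))
```

The first three codecs should report `True`; UTF-16 reports `False` because every character gains a zero byte.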
>> diff --git a/git_remote_helpers/git/importer.py b/git_remote_helpers/git/importer.py
>> @@ -18,13 +18,16 @@ class GitImporter(object):
>>  
>>      def get_refs(self, gitdir):
>>          """Returns a dictionary with refs.
>> +
>> +        Note that the keys in the returned dictionary are byte strings as
>> +        read from git.
>>          """
>>          args = ["git", "--git-dir=" + gitdir, "for-each-ref", "refs/heads"]
>> -        lines = check_output(args).strip().split('\n')
>> +        lines = check_output(args).strip().split('\n'.encode('utf-8'))
>>          refs = {}
>>          for line in lines:
>> -            value, name = line.split(' ')
>> -            name = name.strip('commit\t')
>> +            value, name = line.split(' '.encode('utf-8'))
>> +            name = name.strip('commit\t'.encode('utf-8'))
>>              refs[name] = value
>>          return refs
> 
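A caveat about the hunk above that applies equally to its old and new lines: `strip('commit\t')` removes any of the characters `c`, `o`, `m`, `i`, `t` and tab from both ends rather than a literal prefix, so a branch such as `refs/heads/test` would come back as `refs/heads/tes`. The sketch below is a hypothetical alternative, not the patch's code. It splits the default `for-each-ref` line format (`<objectname> SP <objecttype> TAB <refname>`) on the tab and then on the space, and builds its byte constants with `.encode("ascii")` in the same spirit as the patch. It takes the command output as an argument so that it runs without a repository:

```python
# Byte constants built from ASCII text rather than b'' literals.
NL = "\n".encode("ascii")
SP = " ".encode("ascii")
TAB = "\t".encode("ascii")


def parse_for_each_ref(output):
    """Hypothetical helper: map refname -> objectname from default
    `git for-each-ref` output.  Keys and values stay byte strings."""
    refs = {}
    for line in output.strip().split(NL):
        if not line:
            continue  # empty output splits into a single empty line
        meta, name = line.split(TAB, 1)      # "<objectname> <objecttype>", "<refname>"
        value, _objtype = meta.split(SP, 1)  # keep only the object name
        refs[name] = value
    return refs


sample = ("1111111111111111111111111111111111111111 commit\trefs/heads/master\n"
          "2222222222222222222222222222222222222222 commit\trefs/heads/test\n"
          ).encode("ascii")
print(parse_for_each_ref(sample))
```

In `get_refs` the `sample` bytes would be replaced by the output of `check_output(args)`.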
> I'd suggest for this Python conundrum using byte-string literals, e.g.:
> 
>         lines = check_output(args).strip().split(b'\n')
>         value, name = line.split(b' ')
>         name = name.strip(b'commit\t')
> 
> Essentially identical to what you have, but avoids naming "utf-8" as
> the encoding.  It instead relies on Python's interpretation of
> ASCII characters in string context, which is exactly what C does.

The problem is that AFAICT the byte-string prefix is only available in
Python 2.7 and later (compare [1] and [2]).  I think we need this more
convoluted code if we want to keep supporting Python 2.6 (although
perhaps 'ascii' would be a better choice than 'utf-8').

[1] http://docs.python.org/2.6/reference/lexical_analysis.html#literals
[2] http://docs.python.org/2.7/reference/lexical_analysis.html#literals


John
