From: "Uwe Kleine-König" <zeisberg@informatik.uni-freiburg.de>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <junkio@cox.net>, Nicolas Pitre <nico@cam.org>,
	git@vger.kernel.org
Subject: Re: [PATCH 1/2] libgit.a: add some UTF-8 handling functions
Date: Fri, 22 Dec 2006 23:14:48 +0100
Message-ID: <20061222221448.GB2407@cepheus>
In-Reply-To: <Pine.LNX.4.63.0612222233150.19693@wbgn013.biozentrum.uni-wuerzburg.de>

Hello Johannes,

Johannes Schindelin wrote:
> On Fri, 22 Dec 2006, Junio C Hamano wrote:
> 
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > 
> > > This adds utf8_byte_count(), utf8_strlen() and print_wrapped_text().
> > >
> > > The most important is probably utf8_strlen(), which returns the length
> > > of the text, if it is in UTF-8, otherwise -1.
> > >
> > > Note that we do not go the full nine yards: we could also check that
> > > the character is encoded with the minimum amount of bytes, as pointed
> > > out by Uwe Kleine-Koenig.
> > >
> > > The function print_wrapped_text() can be used to wrap text to a certain
> > > line length.
> > 
> > If you do wrapped_text, I think you do not _want_ strlen (the
> > definition to me of strlen is "number of characters in the
> > string").  What you want is a function that returns the number
> > of columns consumed when displayed on monospace terminal.
> 
> To me, characters are the symbols occupying one "column" each. Bytes are 
> the 8-bit thingies that you usually use to encode the characters.

Quoting utf-8(7):

	are no longer valid in UTF-8 locales.  Firstly, a single byte
	does not necessarily correspond any more to a single character.
	Secondly, since modern terminal emulators in UTF-8 mode also
	support Chinese, Japanese, and Korean double-width characters as
	well as non-spacing combining characters, outputting a single
	character does not necessarily advance the cursor by one
	position as it did in ASCII.  Library functions such as
	mbsrtowcs(3) and wcswidth(3) should be used today to count
	characters and cursor positions.

I'd prefer a naming scheme similar to that of these library functions.
In support of Junio's point: wcslen(3), the wide-character equivalent
of strlen(), counts the number of wide characters in a string, not the
number of columns it occupies.
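
To make the three notions concrete, here is a minimal sketch (not taken
from the patch under discussion) that separates bytes, characters and
columns using mbstowcs(3) and wcswidth(3).  The names text_bytes(),
text_chars() and text_columns() are made up for illustration.  The
results depend on the current LC_CTYPE locale, so the caller is assumed
to have called setlocale(LC_CTYPE, "") beforehand; passing NULL as the
destination of mbstowcs() to get only the count is a POSIX extension.

```c
#define _XOPEN_SOURCE 700	/* needed for the wcswidth() declaration */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Bytes: what strlen() reports, independent of the locale. */
static size_t text_bytes(const char *s)
{
	assert(s != NULL);
	return strlen(s);
}

/*
 * Characters: number of wide characters the string decodes to in the
 * current locale, or -1 if it is not a valid multibyte string.
 */
static long text_chars(const char *s)
{
	size_t n;

	assert(s != NULL);
	n = mbstowcs(NULL, s, 0);	/* count only, store nothing */
	return n == (size_t)-1 ? -1 : (long)n;
}

/*
 * Columns: cursor positions consumed on a monospace terminal, or -1 if
 * the string is invalid, contains a non-printable character, or memory
 * allocation fails.
 */
static int text_columns(const char *s)
{
	size_t n;
	wchar_t *w;
	int cols;

	assert(s != NULL);
	n = mbstowcs(NULL, s, 0);
	if (n == (size_t)-1)
		return -1;
	w = malloc((n + 1) * sizeof(*w));
	if (!w)
		return -1;
	mbstowcs(w, s, n + 1);		/* converts and NUL-terminates */
	cols = wcswidth(w, n);
	free(w);
	return cols;
}
```

In a UTF-8 locale, "König" encoded as UTF-8 should give 6 bytes but 5
characters and 5 columns, while a string of CJK double-width characters
would report more columns than characters.  For wrapping text, only the
last of the three functions is the relevant one.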

Best regards,
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=e+%5E+%28i+pi%29
