[cryptography] Oddity in common bcrypt implementation

James A. Donald jamesd at echeque.com
Mon Jun 20 17:24:04 EDT 2011

On 2011-06-21 5:09 AM, Marsh Ray wrote:
> There are certainly more bugs lurking where the complex rules of
> international character data "collide" with password hashing. How does a
> password login application work from a UTF-8 terminal (or web page) when
> the host is using a single-byte code page?

C and the existing C libraries were created when all the characters that
anyone could ever want fit in seven bits.

When we ran out of space, each hardware manufacturer and each programmer 
implemented his own incompatible solution ad hoc.

The solution, of course, is more bits.  The world is now standardizing 
on Unicode.  Anything that is more than seven bits, and less than 
Unicode, is asking for endless compatibility crises.

Eight-bit ASCII is a compatibility bug.
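
To make the mismatch concrete, here is a minimal Python sketch, not the
code of any actual login system.  The password is made up, and SHA-256
from the standard library merely stands in for bcrypt; the point is only
that a hash sees bytes, not characters.

```python
import hashlib

# Hypothetical password with one non-ASCII character (U+00E4, a-umlaut).
password = "p\u00e4ssword"

# The same text as sent by a UTF-8 terminal and by a Latin-1 host.
utf8_bytes = password.encode("utf-8")      # a-umlaut becomes two bytes: C3 A4
latin1_bytes = password.encode("latin-1")  # a-umlaut becomes one byte: E4

# SHA-256 is an assumption standing in for the real password hash;
# any byte-oriented hash shows the same effect.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()

print(len(utf8_bytes), len(latin1_bytes))  # 9 8
print(utf8_digest == latin1_digest)        # False
```

The same typed password yields two different byte strings and two
unrelated digests, so a password set through one path will not verify
through the other.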

> I once looked up the Unicode algorithm for some basic "case insensitive"
> string comparison... 40 pages!

When one goes truly international, case insensitivity is an AI-hard
problem.  Only someone with an intimate knowledge of the culture can
tell you if two text strings are in some sense the same, when they are
not exactly alike.
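
A small Python illustration, with word pairs chosen purely as examples:
str.lower() and str.casefold() are the standard library's
locale-independent mappings, and even they disagree with each other and
with what a native reader would expect.

```python
# German: a sharp s (U+00DF) is conventionally written "SS" in capitals.
german_lower = "stra\u00dfe"
german_upper = "STRASSE"

# Simple lowercasing leaves the sharp s alone, so the comparison fails.
lower_match = german_lower.lower() == german_upper.lower()

# Full case folding expands the sharp s to "ss", so the comparison succeeds.
fold_match = german_lower.casefold() == german_upper.casefold()

# Turkish: the lowercase of "I" is dotless i (U+0131), but the default
# folding knows nothing about the reader's language and maps "I" to "i".
turkish_match = "I".casefold() == "\u0131".casefold()

print(lower_match)    # False
print(fold_match)     # True
print(turkish_match)  # False
```

Whether the Turkish result is right depends on who typed the string,
which is exactly the cultural knowledge the code does not have.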

Humans are so good at judging that two things are almost the same, or 
very similar, that we tend to overlook small differences, where 
computers are incapable of noticing the similarity.  This is apt to
create insoluble UI issues, for example the difference between egold.com
(all alphabetic) and ego1d.com (letters and numbers).  One has to design
around such problems.  Don't go there!
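
One possible way to design around it, sketched in Python under loud
assumptions: the LOOKALIKES table and both function names below are
hypothetical, and a three-entry table is nowhere near complete.  The
idea is to collapse look-alike characters to one canonical form and
refuse any new name that collides with an existing one.

```python
# Hypothetical, deliberately tiny table of look-alike characters.
LOOKALIKES = {"1": "l", "0": "o", "5": "s"}


def skeleton(name: str) -> str:
    """Collapse each character to its canonical look-alike (assumed table)."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name.lower())


def visually_collides(a: str, b: str) -> bool:
    """True when two different names collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(visually_collides("egold.com", "ego1d.com"))  # True
print(visually_collides("egold.com", "egold.org"))  # False
```

This only catches the pairs the table knows about; it does not make the
computer see similarity the way a human does.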
