From: Rasmus Villemoes
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton
Subject: The kernel's ctype
Date: Wed, 04 Feb 2015 12:18:11 +0100
Message-ID: <87iofiapsc.fsf@rasmusvillemoes.dk>

Hi,

The kernel's ctype is almost, but not quite, equivalent to latin1.
Apart from whether one wants to include the C1 control chars
(0x80-0x9f), there are a few other differences. For example, 0xb5
(MICRO SIGN) is, at least according to glibc, both alpha and lower,
while the kernel classifies it as punct.

A slightly surprising quirk of the kernel's ctype implementation is
that toupper() is not idempotent: both 0xdf (LATIN SMALL LETTER SHARP S)
and 0xff (LATIN SMALL LETTER Y WITH DIAERESIS) are correctly classified
as lower, but since neither character's uppercase version is
representable in latin1, correct toupper() behaviour would be to return
the character itself. Instead, we have toupper(0xff) == 0xdf and
toupper(0xdf) == 0xbf.

Digging in pre-git history, I see that ctype.c was originally
ASCII-only, which I think is the only sane choice. It was changed around
1996, but the commit log that I've found just says "Import 2.0.1", so
it's hard to tell what the intention was.

What would break if ctype.c was changed back to ASCII?
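
The toupper() quirk described above can be reproduced with a small
userspace model. This is only a sketch, not the kernel source: the names
model_islower(), model_toupper(), fixed_toupper() and is_idempotent() are
made up for illustration, and the "lower" range (0xdf-0xff except 0xf7) is
an assumption about the table in ctype.c. The conversion is modelled as
"subtract 0x20 from anything classified as lower", which matches the two
results quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's islower(): ASCII a-z plus the
 * latin1 range 0xdf-0xff, excluding 0xf7 (DIVISION SIGN). The exact range
 * is an assumption, not copied from the kernel's table.
 */
bool model_islower(unsigned char c)
{
	if (c >= 'a' && c <= 'z')
		return true;
	return c >= 0xdf && c != 0xf7;
}

/* Modelled behaviour: anything "lower" is shifted down by 0x20. */
unsigned char model_toupper(unsigned char c)
{
	if (model_islower(c))
		c -= 'a' - 'A';
	return c;
}

/*
 * One possible variant: leave 0xdf and 0xff alone, since their uppercase
 * forms are not representable in latin1.
 */
unsigned char fixed_toupper(unsigned char c)
{
	if (c == 0xdf || c == 0xff)
		return c;
	return model_toupper(c);
}

/* True if f(f(c)) == f(c) for every possible byte value. */
bool is_idempotent(unsigned char (*f)(unsigned char))
{
	for (int c = 0; c < 256; c++) {
		unsigned char once = f((unsigned char)c);

		if (f(once) != once)
			return false;
	}
	return true;
}

/* Self-check of the two values quoted in the mail. */
void ctype_quirk_selfcheck(void)
{
	assert(model_toupper(0xff) == 0xdf);
	assert(model_toupper(0xdf) == 0xbf);
}
```

In this model, is_idempotent(model_toupper) is false because 0xff maps to
0xdf, which is itself "lower" and maps again to 0xbf. The fixed_toupper()
variant passes the same check; an ASCII-only table would avoid the
question altogether.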
Rasmus