Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932697AbcCBBVj (ORCPT ); Tue, 1 Mar 2016 20:21:39 -0500 Received: from mail333.us4.mandrillapp.com ([205.201.137.77]:34623 "EHLO mail333.us4.mandrillapp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755532AbcCAXz1 (ORCPT ); Tue, 1 Mar 2016 18:55:27 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; q=dns; s=mandrill; d=linuxfoundation.org; b=ZJVHzLenu6oba81ykPSsbUkhZL/erOy1oTcJa92UtpG+ssQLhmtvqktY8LRbH97M5xlH2BhPyZ/j jyNsEuKZQAhtLS1EvLy6ChkHtj2navMKcQ4B2fa7MWDAk7k+pXods6vFWY/Y2ts8fxpHK5eyVr0h dlVBNc0JNj+kghrOej4=; From: Greg Kroah-Hartman Subject: [PATCH 4.4 055/342] lib/ucs2_string: Add ucs2 -> utf8 helper functions X-Mailer: git-send-email 2.7.2 To: Cc: Greg Kroah-Hartman , , Peter Jones , "Lee, Chun-Yi" , Matthew Garrett , Matt Fleming Message-Id: <20160301234529.783452061@linuxfoundation.org> In-Reply-To: <20160301234527.990448862@linuxfoundation.org> References: <20160301234527.990448862@linuxfoundation.org> X-Report-Abuse: Please forward a copy of this message, including all headers, to abuse@mandrill.com X-Report-Abuse: You can also report abuse here: http://mandrillapp.com/contact/abuse?id=30481620.0052269358184728b43eb28507eecbd6 X-Mandrill-User: md_30481620 Date: Tue, 01 Mar 2016 23:54:01 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2634 Lines: 101 4.4-stable review patch. If anyone has any objections, please let me know. ------------------ From: Peter Jones commit 73500267c930baadadb0d02284909731baf151f7 upstream. This adds ucs2_utf8size(), which tells us how big our ucs2 string is in bytes, and ucs2_as_utf8, which translates from ucs2 to utf8.. Signed-off-by: Peter Jones Tested-by: Lee, Chun-Yi Acked-by: Matthew Garrett Signed-off-by: Matt Fleming Signed-off-by: Greg Kroah-Hartman --- include/linux/ucs2_string.h | 4 ++ lib/ucs2_string.c | 62 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+) --- a/include/linux/ucs2_string.h +++ b/include/linux/ucs2_string.h @@ -11,4 +11,8 @@ unsigned long ucs2_strlen(const ucs2_cha unsigned long ucs2_strsize(const ucs2_char_t *data, unsigned long maxlength); int ucs2_strncmp(const ucs2_char_t *a, const ucs2_char_t *b, size_t len); +unsigned long ucs2_utf8size(const ucs2_char_t *src); +unsigned long ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, + unsigned long maxlength); + #endif /* _LINUX_UCS2_STRING_H_ */ --- a/lib/ucs2_string.c +++ b/lib/ucs2_string.c @@ -49,3 +49,65 @@ ucs2_strncmp(const ucs2_char_t *a, const } } EXPORT_SYMBOL(ucs2_strncmp); + +unsigned long +ucs2_utf8size(const ucs2_char_t *src) +{ + unsigned long i; + unsigned long j = 0; + + for (i = 0; i < ucs2_strlen(src); i++) { + u16 c = src[i]; + + if (c > 0x800) + j += 3; + else if (c > 0x80) + j += 2; + else + j += 1; + } + + return j; +} +EXPORT_SYMBOL(ucs2_utf8size); + +/* + * copy at most maxlength bytes of whole utf8 characters to dest from the + * ucs2 string src. + * + * The return value is the number of characters copied, not including the + * final NUL character. + */ +unsigned long +ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, unsigned long maxlength) +{ + unsigned int i; + unsigned long j = 0; + unsigned long limit = ucs2_strnlen(src, maxlength); + + for (i = 0; maxlength && i < limit; i++) { + u16 c = src[i]; + + if (c > 0x800) { + if (maxlength < 3) + break; + maxlength -= 3; + dest[j++] = 0xe0 | (c & 0xf000) >> 12; + dest[j++] = 0x80 | (c & 0x0fc0) >> 8; + dest[j++] = 0x80 | (c & 0x003f); + } else if (c > 0x80) { + if (maxlength < 2) + break; + maxlength -= 2; + dest[j++] = 0xc0 | (c & 0xfe0) >> 5; + dest[j++] = 0x80 | (c & 0x01f); + } else { + maxlength -= 1; + dest[j++] = c & 0x7f; + } + } + if (maxlength) + dest[j] = '\0'; + return j; +} +EXPORT_SYMBOL(ucs2_as_utf8);