From: Gabriel Krisman Bertazi
To: viro@ZenIV.linux.org.uk
Cc: jra@google.com, tytso@mit.edu, olaf@sgi.com, darrick.wong@oracle.com,
    kernel@lists.collabora.co.uk, linux-fsdevel@vger.kernel.org,
    david@fromorbit.com, jack@suse.cz, linux-kernel@vger.kernel.org,
    Gabriel Krisman Bertazi
Subject: [PATCH v2 10/15] nls: utf8norm: Add unicode character database files
Date: Mon, 21 May 2018 14:36:12 -0300
Message-Id: <20180521173617.31625-11-krisman@collabora.co.uk>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180521173617.31625-1-krisman@collabora.co.uk>
References: <20180521173617.31625-1-krisman@collabora.co.uk>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Olaf Weber

Add files from the Unicode Character Database, version 10.0.0, to the
source. A helper program that generates a trie used for normalization
from these files is part of a separate commit.

- Notes on the update from 8.0.0 to 10.0.0:

  The structure of the ucd files and the special cases did not change
  between versions 8.0.0 and 10.0.0. 8.0.0 saw the addition of Cherokee
  LC characters, which is an interesting case for case-folding. The
  update is accompanied by new tests in the test_ucd module to catch
  specific cases. No changes to the mkutf8data script were required for
  the update.

The actual files are not part of the commit submitted to the list
because they are too big and would bounce. Still, they can be obtained
with the following script:

  FILES="CaseFolding.txt DerivedAge.txt extracted/DerivedCombiningClass.txt
         DerivedCoreProperties.txt NormalizationCorrections.txt
         NormalizationTest.txt UnicodeData.txt"
  VERSION=10.0.0
  BASE=http://www.unicode.org/Public/${VERSION}/ucd
  for i in ${FILES} ; do
    wget "${BASE}/$i" -O fs/nls/ucd/$(basename ${i} .txt)-${VERSION}.txt
  done

Signed-off-by: Olaf Weber
Signed-off-by: Gabriel Krisman Bertazi
[Move ucd directory to fs/nls/]
[Update to ucd-10.0.0]
---
 fs/nls/ucd/README | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 fs/nls/ucd/README

diff --git a/fs/nls/ucd/README b/fs/nls/ucd/README
new file mode 100644
index 000000000000..67f2075d1fca
--- /dev/null
+++ b/fs/nls/ucd/README
@@ -0,0 +1,33 @@
+The files in this directory are part of the Unicode Character Database
+for version 10.0.0 of the Unicode standard.
+
+The full set of files can be found here:
+
+  http://www.unicode.org/Public/10.0.0/ucd/
+
+The latest released version of the UCD can be found here:
+
+  http://www.unicode.org/Public/UCD/latest/
+
+The files in this directory are identical, except that they have been
+renamed with a suffix indicating the unicode version.
+
+Individual source links:
+
+  http://www.unicode.org/Public/10.0.0/ucd/CaseFolding.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedAge.txt
+  http://www.unicode.org/Public/10.0.0/ucd/extracted/DerivedCombiningClass.txt
+  http://www.unicode.org/Public/10.0.0/ucd/DerivedCoreProperties.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationCorrections.txt
+  http://www.unicode.org/Public/10.0.0/ucd/NormalizationTest.txt
+  http://www.unicode.org/Public/10.0.0/ucd/UnicodeData.txt
+
+md5sums
+
+  7893b6e005c5a521319a0d12062ae122  CaseFolding-10.0.0.txt
+  a602e4b44de3350087e40f2eb2184898  DerivedAge-10.0.0.txt
+  5abdeb21af4edcc5d1e4c0b5802fc7a7  DerivedCombiningClass-10.0.0.txt
+  eda11c2c2e3c308d9d3b90e2b3282024  DerivedCoreProperties-10.0.0.txt
+  425ece5ffbecd0140d98c13ce05724aa  NormalizationCorrections-10.0.0.txt
+  7296fe7aa07d7d288e65d559af2ad49b  NormalizationTest-10.0.0.txt
+  2a52f30695dcc821f0f224650552beaf  UnicodeData-10.0.0.txt
-- 
2.17.0
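
Since the UCD files have to be fetched separately, the md5sums listed in the README give a way to confirm that the downloaded copies match the ones the patch was built against. The following is only a sketch, not part of the patch: `check_ucd_md5sums` is a hypothetical helper name, and it assumes GNU coreutils `md5sum` and a checksum file holding the README's `<md5>  <filename>` lines without the leading `+` and indentation.

```shell
#!/bin/sh
# Sketch only: check_ucd_md5sums is a hypothetical helper, not part of
# the patch. Assumes GNU coreutils md5sum is installed.
#
# Usage: check_ucd_md5sums <directory> <checksum-file>
#   <directory>      where the renamed UCD files live, e.g. fs/nls/ucd
#   <checksum-file>  lines of the form "<md5>  <filename>"
check_ucd_md5sums() {
    dir=$1
    sums=$2
    # md5sum -c looks up file names relative to the current directory,
    # so run it inside <directory>. The checksum list is opened before
    # the cd (redirection on the subshell) and passed on stdin, which
    # keeps a relative <checksum-file> path working.
    ( cd "$dir" && md5sum -c - ) < "$sums"
}
```

Called as `check_ucd_md5sums fs/nls/ucd ucd.md5` (with `ucd.md5` being an assumed name for the saved checksum list), the function returns non-zero if any file is missing or differs.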