From: Pali Rohár
To: linux-fsdevel@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net,
    linux-cifs@vger.kernel.org, jfs-discussion@lists.sourceforge.net,
    linux-kernel@vger.kernel.org, Alexander Viro, Jan Kara,
    "Theodore Y . Ts'o", Anton Altaparmakov, OGAWA Hirofumi,
    Luis de Bethencourt, Salah Triki, Steve French, Paulo Alcantara,
    Ronnie Sahlberg, Shyam Prasad N, Tom Talpey, Dave Kleikamp,
    Andrew Morton, Pavel Machek, Christoph Hellwig, Kari Argillander,
    Viacheslav Dubeyko
Subject: [RFC PATCH v2 00/18] fs: Remove usage of broken nls_utf8 and drop it
Date: Mon, 26 Dec 2022 15:21:32 +0100
Message-Id: <20221226142150.13324-1-pali@kernel.org>

The nls_utf8 module is broken in several ways. Despite its name, it does
not support full UTF-8: it cannot handle 4-byte UTF-8 sequences, and its
tolower/toupper tables are not implemented at all. This makes it unsuitable
for case-insensitive filesystems and for UTF-16 filesystems (e.g. because
UTF-16 surrogate pair processing is missing).
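
To make the two gaps named above concrete, here is a minimal user-space
sketch. It is not kernel code and not part of this series; the names
sketch_utf32_to_utf8() and sketch_utf32_to_utf16() are hypothetical and
chosen only for illustration. It shows that a code point above U+FFFF needs
a 4-byte UTF-8 sequence and, on a UTF-16 filesystem, a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): encode one Unicode code point
 * as UTF-8. Returns the number of bytes written (1..4), or 0 if the
 * input is not a valid Unicode scalar value.
 */
static size_t sketch_utf32_to_utf8(uint32_t cp, uint8_t out[4])
{
	if (cp < 0x80) {
		out[0] = (uint8_t)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (uint8_t)(0xC0 | (cp >> 6));
		out[1] = (uint8_t)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		/* Surrogate code points are not valid on their own. */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;
		out[0] = (uint8_t)(0xE0 | (cp >> 12));
		out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (uint8_t)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		/* Code points outside the BMP need the 4-byte form. */
		out[0] = (uint8_t)(0xF0 | (cp >> 18));
		out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (uint8_t)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/*
 * Hypothetical helper (not a kernel API): encode one Unicode code point
 * as UTF-16. Returns the number of 16-bit units written (1 or 2), or 0
 * if the input is not a valid Unicode scalar value.
 */
static size_t sketch_utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;
	if (cp < 0x10000) {
		out[0] = (uint16_t)cp;
		return 1;
	}
	if (cp <= 0x10FFFF) {
		/* Split the 20-bit offset into a high and low surrogate. */
		cp -= 0x10000;
		out[0] = (uint16_t)(0xD800 | (cp >> 10));
		out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		return 2;
	}
	return 0;
}
```

For U+1F600 the first helper writes the four bytes F0 9F 98 80 and the
second writes the surrogate pair D83D DE00. A conversion layer that stops
at 3-byte sequences cannot round-trip such a file name.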

This is v2 of the RFC patch series that unifies and fixes the iocharset=utf8
mount option in all fs drivers, and converts all remaining fs drivers to
implement UTF-8 support with the utf8s_to_utf16s(), utf16s_to_utf8s(),
utf8_to_utf32() and utf32_to_utf8() functions instead of nls_utf8. With that
done, the broken nls_utf8 module can be dropped completely.

For more details, see the email thread where the fs unification was
discussed:
https://lore.kernel.org/linux-fsdevel/20200102211855.gg62r7jshp742d6i@pali/t/#u

This patch series is mostly untested and is presented as an RFC. Please let
me know what you think about it and whether this is the correct way to fix
the broken UTF-8 support in fs drivers. As explained in the email thread
above, I think it does not make sense to try fixing the whole NLS framework;
it is easier to just drop the nls_utf8 module.

Note: this patch series does not address the UTF-8 fat case-sensitivity
issue:
https://lore.kernel.org/linux-fsdevel/20200119221455.bac7dc55g56q2l4r@pali/

Changes since RFC v1:
* Dropped already merged udf and isofs patches
* Addressed review comments:
  - updated documentation
  - usage of seq_puts
  - some code moved to local variables
  - usage of true/false instead of 1/0
  - rebased on top of master branch

Link to RFC v1:
https://lore.kernel.org/linux-fsdevel/20210808162453.1653-1-pali@kernel.org/

Pali Rohár (18):
  fat: Fix iocharset=utf8 mount option
  hfsplus: Add iocharset= mount option as alias for nls=
  ntfs: Undeprecate iocharset= mount option
  ntfs: Fix error processing when load_nls() fails
  befs: Fix printing iocharset= mount option
  befs: Rename enum value Opt_charset to Opt_iocharset to match mount
    option
  befs: Fix error processing when load_nls() fails
  befs: Allow to use native UTF-8 mode
  hfs: Explicitly set hsb->nls_disk when hsb->nls_io is set
  hfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
  hfsplus: Do not use broken utf8 NLS table for iocharset=utf8 mount
    option
  jfs: Remove custom iso8859-1 implementation
  jfs: Fix buffer overflow in jfs_strfromUCS_le() function
  jfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
  ntfs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
  cifs: Do not use broken utf8 NLS table for iocharset=utf8 mount option
  cifs: Remove usage of load_nls_default() calls
  nls: Drop broken nls_utf8 module

 Documentation/filesystems/hfsplus.rst |   3 +
 Documentation/filesystems/ntfs.rst    |   5 +-
 Documentation/filesystems/vfat.rst    |  13 +--
 fs/befs/linuxvfs.c                    |  24 +++--
 fs/cifs/cifs_unicode.c                | 128 +++++++++++++++++---------
 fs/cifs/cifs_unicode.h                |   2 +-
 fs/cifs/cifsfs.c                      |   2 +
 fs/cifs/cifssmb.c                     |   8 +-
 fs/cifs/connect.c                     |   8 +-
 fs/cifs/dfs_cache.c                   |  24 ++---
 fs/cifs/dir.c                         |  28 ++++--
 fs/cifs/smb2pdu.c                     |  18 +---
 fs/cifs/winucase.c                    |  14 ++-
 fs/fat/Kconfig                        |  19 +---
 fs/fat/dir.c                          |  17 ++--
 fs/fat/fat.h                          |  22 +++++
 fs/fat/inode.c                        |  28 +++---
 fs/fat/namei_vfat.c                   |  26 ++++--
 fs/hfs/super.c                        |  62 +++++++++++--
 fs/hfs/trans.c                        |  62 +++++++------
 fs/hfsplus/dir.c                      |   7 +-
 fs/hfsplus/options.c                  |  39 +++++---
 fs/hfsplus/super.c                    |   7 +-
 fs/hfsplus/unicode.c                  |  31 ++++++-
 fs/hfsplus/xattr.c                    |  20 ++--
 fs/hfsplus/xattr_security.c           |   6 +-
 fs/jfs/jfs_dtree.c                    |  13 ++-
 fs/jfs/jfs_unicode.c                  |  35 +++----
 fs/jfs/jfs_unicode.h                  |   2 +-
 fs/jfs/super.c                        |  29 ++++--
 fs/nls/Kconfig                        |   9 --
 fs/nls/Makefile                       |   1 -
 fs/nls/nls_utf8.c                     |  67 --------------
 fs/ntfs/dir.c                         |   6 +-
 fs/ntfs/inode.c                       |   5 +-
 fs/ntfs/super.c                       |  60 ++++++------
 fs/ntfs/unistr.c                      |  29 +++++-
 37 files changed, 493 insertions(+), 386 deletions(-)
 delete mode 100644 fs/nls/nls_utf8.c

-- 
2.20.1