Date: Fri, 7 Feb 2020 17:35:45 -0800
In-Reply-To: <20200208013552.241832-1-drosen@google.com>
Message-Id: <20200208013552.241832-2-drosen@google.com>
References: <20200208013552.241832-1-drosen@google.com>
Subject: [PATCH v7 1/8] unicode: Add utf8_casefold_iter
From: Daniel Rosenberg
To: "Theodore Ts'o", linux-ext4@vger.kernel.org, Jaegeuk Kim, Chao Yu,
    linux-f2fs-devel@lists.sourceforge.net, Eric Biggers,
    linux-fscrypt@vger.kernel.org, Alexander Viro, Richard Weinberger
Cc: linux-mtd@lists.infradead.org, Andreas Dilger, Jonathan Corbet,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, Gabriel Krisman Bertazi,
    kernel-team@android.com, Daniel Rosenberg
X-Mailing-List: linux-kernel@vger.kernel.org

This function allows other users of the unicode code to act on a
casefolded string without having to allocate their own copy of it.
The iterator calls the actor once for each byte of the casefolded
output, passing the byte and its position. The actor function can
return a nonzero value to exit early; that value is then returned
to the caller.

Signed-off-by: Daniel Rosenberg
---
 fs/unicode/utf8-core.c  | 25 ++++++++++++++++++++++++-
 include/linux/unicode.h | 10 ++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index 2a878b739115d..db050bf59a32b 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -122,9 +122,32 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
-
 EXPORT_SYMBOL(utf8_casefold);
 
+int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
+		       struct utf8_itr_context *ctx)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	int res = 0;
+	int pos = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return c;
+		res = ctx->actor(ctx, c, pos);
+		pos++;
+		if (res)
+			return res;
+	}
+	return res;
+}
+EXPORT_SYMBOL(utf8_casefold_iter);
+
 int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 		   unsigned char *dest, size_t dlen)
 {
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 990aa97d80496..2ae12f8710ae2 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -10,6 +10,13 @@ struct unicode_map {
 	int version;
 };
 
+struct utf8_itr_context;
+typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
+
+struct utf8_itr_context {
+	utf8_itr_actor_t actor;
+};
+
 int utf8_validate(const struct unicode_map *um, const struct qstr *str);
 
 int utf8_strncmp(const struct unicode_map *um,
@@ -27,6 +34,9 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 		  unsigned char *dest, size_t dlen);
 
+int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
+		       struct utf8_itr_context *ctx);
+
 struct unicode_map *utf8_load(const char *version);
 void utf8_unload(struct unicode_map *um);
 
-- 
2.25.0.341.g760bfbb309-goog
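
For illustration only, here is a rough userspace sketch of how a caller might use the actor interface introduced by this patch. The struct utf8_itr_context and utf8_itr_actor_t declarations mirror the ones in the diff; everything else is an assumption made for the example: mock_casefold_iter() is a hypothetical stand-in that folds ASCII with tolower() rather than using the kernel's utf8nfdicf tables, and hash_ctx, hash_actor(), fold_hash() and container_of_sketch() are invented names, not part of the series. The sketch shows the caller embedding the context in its own state and returning nonzero from the actor to stop early.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the declarations the patch adds to include/linux/unicode.h. */
struct utf8_itr_context;
typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);

struct utf8_itr_context {
	utf8_itr_actor_t actor;
};

/*
 * HYPOTHETICAL stand-in for utf8_casefold_iter(): folds ASCII only, via
 * tolower(), so the example builds outside the kernel. The control flow
 * follows the patch: call the actor per byte, stop on a nonzero result.
 */
static int mock_casefold_iter(const char *name, size_t len,
			      struct utf8_itr_context *ctx)
{
	int res = 0;
	size_t pos;

	for (pos = 0; pos < len; pos++) {
		int c = tolower((unsigned char)name[pos]);

		res = ctx->actor(ctx, c, (int)pos);
		if (res)
			return res;
	}
	return res;
}

/* Hypothetical caller state that embeds the iterator context. */
struct hash_ctx {
	struct utf8_itr_context itr;
	unsigned long hash;
	int limit;	/* stop after this many folded bytes */
};

/* Userspace equivalent of the kernel's container_of(), for this sketch. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Actor: accumulate a simple multiplicative hash over the folded bytes. */
static int hash_actor(struct utf8_itr_context *ctx, int byte, int pos)
{
	struct hash_ctx *h = container_of_sketch(ctx, struct hash_ctx, itr);

	if (pos >= h->limit)
		return 1;	/* nonzero: ask the iterator to exit early */
	h->hash = h->hash * 31 + (unsigned char)byte;
	return 0;
}

/* Hash at most 'limit' folded bytes of 'name' without copying the string. */
static unsigned long fold_hash(const char *name, int limit)
{
	struct hash_ctx h = {
		.itr = { .actor = hash_actor },
		.hash = 0,
		.limit = limit,
	};

	mock_casefold_iter(name, strlen(name), &h.itr);
	return h.hash;
}
```

With this sketch, fold_hash("ABC", 16) and fold_hash("abc", 16) produce the same value, and a small limit makes the actor cut the walk short. Embedding the context in a larger struct is one possible way to carry per-call state, since the actor only receives the context pointer, the byte and the position.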