Message-ID: <5df1b66b-622e-657e-f024-ce8d1a7e2476@linux.alibaba.com>
Date: Mon, 3 Jul 2023 17:07:22 +0800
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr
To: Alexander Larsson
Cc: hsiangkao@linux.alibaba.com, chao@kernel.org, huyue2@coolpad.com,
 linux-erofs@lists.ozlabs.org, linux-kernel@vger.kernel.org
References:
 <20230621083209.116024-1-jefflexu@linux.alibaba.com>
From: Jingbo Xu
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/3/23 3:25 PM, Alexander Larsson wrote:
> On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu wrote:
>>
>> Hi all,
>>
>> Sorry for the late reply; I was on vacation these days.
>>
>> I tested the hash bits for all xattrs given by Alex[1], to see if each
>> xattr could be mapped into one unique bit in the 32-bit bloom filter.
>>
>> [1]
>> https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/
>>
>>
>> On 6/21/23 4:32 PM, Jingbo Xu wrote:
>>>
>>> 3.2. input of hash function
>>> -------------------------
>>> As previously described, each hash function will map the given data into
>>> one bit of the bloom filter map. In our use case, xattr name serves as
>>> the key of hash function.
>>>
>>> When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
>>> and the remaining name apart from the prefix are handy. To avoid
>>> constructing the full xattr name, the above index and name are fed into
>>> the hash function directly in the following way:
>>>
>>> ```
>>> bit = xxh32(name, strlen(name), index + i);
>>> ```
>>>
>>> where index serves as part of seed, so that it gets involved in the
>>> calculation for the hash.
>>
>>
>> All xattrs are hashed with one single hash function.
>>
>> I first tested with the following hash function:
>>
>> ```
>> xxh32(name, strlen(name), index)
>> ```
>>
>> where `index` represents the index of the corresponding predefined name
>> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
>> after stripping the above predefined name prefix (e.g.
>> "overlay.metacopy" for "user.overlay.metacopy")
>>
>>
>> The mapping results are:
>>
>> bit 0: security.SMACK64EXEC
>> bit 1:
>> bit 2: user.overlay.protattr
>> bit 3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
>> bit 4:
>> bit 5: user.overlay.origin
>> bit 6: user.overlay.metacopy, security.evm
>> bit 8: trusted.overlay.opaque
>> bit 9: trusted.overlay.origin
>> bit 10: trusted.overlay.upper, trusted.overlay.protattr
>> bit 11: security.apparmor, security.capability
>> bit 12: security.SMACK64
>> bit 13: user.overlay.redirect, security.ima
>> bit 14: user.overlay.upper
>> bit 15: trusted.overlay.redirect
>> bit 16: security.SMACK64IPOUT
>> bit 17:
>> bit 18: system.posix_acl_access
>> bit 19: security.selinux
>> bit 20:
>> bit 21:
>> bit 22: system.posix_acl_default
>> bit 23: security.SMACK64MMAP
>> bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 25: trusted.overlay.metacopy
>> bit 26:
>> bit 27: security.SMACK64IPIN
>> bit 28:
>> bit 29:
>> bit 30: trusted.overlay.nlink
>> bit 31:
>>
>> Here 30 xattrs are mapped into 22 bits. There are two potential
>> conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
>> and bit 24 (user.overlay.impure, user.overlay.nlink).
>
> Bit 11 (apparmor and capability) seems like the most likely thing to
> run into. I.e. on an apparmor-using system, many files would have the
> apparmor xattr set, so looking up security.capability on them would
> cause a false positive and we'd unnecessarily read the xattrs.
>
>>> An alternative way is to calculate the hash from the full xattr name by
>>> feeding the prefix string and the remaining name string separately in
>>> the following way:
>>>
>>> ```
>>> xxh32_reset()
>>> xxh32_update(prefix string, ...)
>>> xxh32_update(remaining name, ...)
>>> xxh32_digest()
>>> ```
>>>
>>> But I doubt it is really worth calling multiple APIs instead of one
>>> single xxh32().
>>
>>
>> I also tested with the following hash function, where the full name of
>> the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.
>>
>> ```
>> xxh32(name, strlen(name), 0)
>> ```
>>
>>
>> Following are the mapping results:
>>
>> bit 0: trusted.overlay.impure, user.overlay.protattr
>> bit 1: security.SMACK64IPOUT
>> bit 2:
>> bit 3: security.capability
>> bit 4: security.selinux
>> bit 5: security.ima
>> bit 6: user.overlay.metacopy
>> bit 8:
>> bit 9: trusted.overlay.redirect, security.SMACK64EXEC
>> bit 10: system.posix_acl_access
>> bit 11: trusted.overlay.nlink
>> bit 12: trusted.overlay.opaque
>> bit 13:
>> bit 14:
>> bit 15:
>> bit 16:
>> bit 17: user.overlay.impure
>> bit 18: security.apparmor
>> bit 19:
>> bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 21:
>> bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
>> bit 23: user.overlay.upper, security.evm
>> bit 24: user.overlay.redirect, security.SMACK64IPIN, system.posix_acl_default
>> bit 25: security.SMACK64
>> bit 26:
>> bit 27: trusted.overlay.upper, security.SMACK64MMAP
>> bit 28: trusted.overlay.origin, user.mime_type
>> bit 29:
>> bit 30:
>> bit 31: user.overlay.opaque
>>
>> 30 xattrs are mapped into 20 bits. Similarly there are two potential
>> conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
>> 22 (trusted.overlay.metacopy, trusted.overlay.protattr).
>>
>>
>> Summary
>> =======
>>
>> Personally I would prefer the former, as it maps xattrs into the bloom
>> filter more evenly (22 bits vs 20 bits) and can better cooperate with
>> the kernel routine (index and the remaining name string, rather than the
>> full name string, are handy).
>
> I agree that we want the approach with better cooperation with the
> kernel function. However, I would much prefer if all the xattrs that
> are commonly set on many files are unconflicted. This would be at
> least: selinux, ima, evm, apparmor.
>
> Can't you just add a magic constant to the seed? Then we can come up
> with one that gives a good spread and hardcode that.

Brilliant idea! I will try it and see if it works.

-- 
Thanks,
Jingbo
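
For reference, the magic-constant idea could be tried out in userspace
with something like the rough sketch below. It is not the code from the
patch set, and several things in it are assumptions: xxh32_sketch() is a
stand-in for the kernel's xxh32() written from the public XXH32
algorithm; the hash is reduced to a bit number with `& 31`, which the
thread does not spell out; the constant is simply added to the index to
form the seed; and all function and struct names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Prime constants of the public XXH32 algorithm. */
#define XP1 0x9E3779B1u
#define XP2 0x85EBCA77u
#define XP3 0xC2B2AE3Du
#define XP4 0x27D4EB2Fu
#define XP5 0x165667B1u

static uint32_t rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

/* Read 4 bytes as a little-endian word, independent of host endianness. */
static uint32_t read32le(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t xxh32_round(uint32_t acc, uint32_t in)
{
	return rotl32(acc + in * XP2, 13) * XP1;
}

/* Userspace stand-in for the kernel's xxh32(input, len, seed). */
uint32_t xxh32_sketch(const void *input, size_t len, uint32_t seed)
{
	const uint8_t *p = input;
	const uint8_t *end = p + len;
	uint32_t h;

	if (len >= 16) {
		uint32_t v1 = seed + XP1 + XP2, v2 = seed + XP2;
		uint32_t v3 = seed, v4 = seed - XP1;

		do {
			v1 = xxh32_round(v1, read32le(p));
			v2 = xxh32_round(v2, read32le(p + 4));
			v3 = xxh32_round(v3, read32le(p + 8));
			v4 = xxh32_round(v4, read32le(p + 12));
			p += 16;
		} while (end - p >= 16);
		h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
	} else {
		h = seed + XP5;
	}
	h += (uint32_t)len;
	for (; end - p >= 4; p += 4)
		h = rotl32(h + read32le(p) * XP3, 17) * XP4;
	for (; p < end; p++)
		h = rotl32(h + *p * XP5, 11) * XP1;
	h ^= h >> 15;
	h *= XP2;
	h ^= h >> 13;
	h *= XP3;
	h ^= h >> 16;
	return h;
}

/* Hypothetical key: prefix index plus the name with the prefix stripped. */
struct xattr_key {
	uint32_t index;
	const char *name;
};

/* Assumption: the filter bit is the low 5 bits of the 32-bit hash. */
unsigned int xattr_filter_bit(uint32_t seed_base, uint32_t index, const char *name)
{
	return xxh32_sketch(name, strlen(name), seed_base + index) & 31u;
}

/* Return 1 if every key lands on its own bit for this seed constant. */
int keys_unconflicted(uint32_t seed_base, const struct xattr_key *keys, size_t n)
{
	uint32_t used = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = 1u << xattr_filter_bit(seed_base, keys[i].index, keys[i].name);

		if (used & mask)
			return 0;
		used |= mask;
	}
	return 1;
}

/* Scan constants in [0, limit); store the first conflict-free one in *out. */
int find_seed(const struct xattr_key *keys, size_t n, uint32_t limit, uint32_t *out)
{
	assert(out != NULL);
	for (uint32_t s = 0; s < limit; s++) {
		if (keys_unconflicted(s, keys, n)) {
			*out = s;
			return 1;
		}
	}
	return 0;
}
```

To try it, one would fill an array of struct xattr_key with the xattrs
that should stay unconflicted (e.g. selinux, ima, evm and apparmor under
the security prefix index, which I believe is 6 in erofs_fs.h but is
worth double-checking) and call find_seed() with some scan limit. At most
32 keys can ever be conflict-free in a 32-bit filter, so the full list of
30 xattrs would need a much larger scan range than the handful of
commonly set ones, if a perfect spread exists at all.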