2023-06-21 08:54:28

by Jingbo Xu

[permalink] [raw]
Subject: [RFC 0/2] erofs: introduce bloom filter for xattr

Background
==========
Filesystems with ACL enabled generally need to read
"system.posix_acl_access"/"system.posix_acl_default" xattr to get the
access and default ACL. When filesystem is mounted with ACL enabled
while files in the system have not set access/default ACL, the getattr()
will run in vain while the round trip can decrease the performance in
workload like "ls -lR".

For example, there's a 12% performance boost if erofs is mounted with
"noacl" when running "ls -lR" workload on dataset [1] (given in [2]).

We'd better offer a fastpath to boost the above workload, as well as
other negative xattr lookup.


Proposal
========
Introduce a per-inode bloom filter for xattrs to boost the negative
xattr queries.

As following shows, a 32-bit bloom filter is introduced for each inode,
describing if a xattr with specific name exists on this inode.

```
struct erofs_xattr_ibody_header {
- __le32 h_reserved;
+ __le32 h_map; /* bloom filter */
...
}
```

Following are some implementation details for bloom filter.

1. Reverse bit value
--------------------
The bloom filter structure describes if a given data is inside the set.
It will map the given data into several bits of the bloom filter map.
The data must not exist inside the set if any mapped bit is 0, while the
data may be not inside the set even if all mapped bits is 1.

While in our use case, as erofs_xattr_ibody_header.h_map is previously a
(all zero) reserved field, the bit value for the bloom filter has a
reverse semantics in consideration for compatibility. That is, for a
given data, the mapped bits will be cleared to 0. Thus for a previously
built image without support for bloom filter, the bloom filter is all
zero and when it's mounted by the new kernel with support for bloom
filter, it can not determine if the queried xattr exists on the inode and
thus will fallback to the original routine of iterating all on-disk
xattrs to determine if the queried xattr exists.


2. The number of hash functions
-------------------------------
The optimal value for the number of the hash functions (k) is (ln2 *
m/n), where m stands the number of bits of the bloom filter map, while n
stands the number of all candidates may be inside the set.

In our use case, the number of common used xattr (n) is approximately 8,
including system.[posix_acl_access|posix_acl_default],
security.[capability|selinux] and
security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].

Given the number of bits of the bloom filter (m) is 32, the optimal value
for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).


3. The hash function
--------------------
xxh32() is chosen as the hash function.

Following are some tested statistics of several candidate hash
functions. Listed are time (in millionsecond) consumed when these hash
functions process input data in chunks of 24, 32, 64 and 4096 bytes.

| 24 B | 32 B | 64 B | 4 KB
--------+-------+-------+-------+------
jhash | 1325 | 2041 | 4016 | 2294
jhash2 | 1323 | 2035 | 4011 | 2310
crc16 | 7918 | 1056 | 2110 | 13784
crc32 | 1824 | 2436 | 4873 | 3107
crc32c | 2120 | 2708 | 5142 | 3109
xxhash | 1320 | 1967 | 2131 | 429
xxh32 | 1458 | 1358 | 1848 | 836
xxh64 | 1321 | 2081 | 2128 | 429


3.1. multiple hash functions with various seeds
-----------------------------------------------
As previously described, the given data will be mapped into several bits
of the bloom filter map with hash functions. There could be several
hash functions (k), with each hash function mapping the given data into
one bit of the bloom filter map. Thus given the number of hash
functions (k), each xattr name will be mapped into k bits of the bloom
filter map.

Here in our use case, k hash functions are all xxh32() but with
different seeds. As following shows, the seed is (index + i), where i
stands the index of the current hash function among all hash functions.
In this way, each hash function is distinguishable with others.

```
for (i = 0; i < k; i++)
bit = xxh32(name, strlen(name), index + i);
```

3.2. input of hash function
-------------------------
As previously described, each hash function will map the given data into
one bit of the bloom filter map. In our use case, xattr name serves as
the key of hash function.

When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
and the remaining name apart from the prefix are handy. To avoid
constructing the full xattr name, the above index and name are fed into
the hash function directly in the following way:

```
bit = xxh32(name, strlen(name), index + i);
```

where index serves as part of seed, so that it gets involved in the
calculation for the hash.

An alternative way is to calculate the hash from the full xattr name by
feeding the prefix string and the remaining name string separately in
the following way:

```
xxh32_reset()
xxh32_update(prefix string, ...)
xxh32_update(remaining name, ...)
xxh32_digest()
```

But I doubt if it really deserves to call multiple APIs instead of one
single xxh32().


Also be noted that for xattrs with long xattr name prefixes, the
above "name" is the xattr name after stripping the corresponding short
predefined xattr name prefix rather than the long xattr name prefix, as
only the former is handy in the kernel routine.


3.3. discussions
----------------
I think a wider discussion on the implementation details is needed,
including the number of the hash functions, and all other implementation
details mentioned above, as they are also part of the on-disk format.


Performance Improvement
=======================
The performance statistics are tested with 'ls -lR' workload upon the
dataset [1].
| uncached(ms) | cached(ms)
------------------------+---------------+----------
erofs.share | 468 | 264
erofs.share.bloom | 370 | 254
erofs.share.noacl | 412 | 216
erofs.share.noacl.bloom | 318 | 206

The tested erofs image is built in shared xattr layout. It indicates
~20% performance improvement with bloom filter in uncached scenarios.


[1] https://my.owndrive.com/index.php/s/irHJXRpZHtT3a5i
[2] https://lore.kernel.org/all/[email protected]/


Jingbo Xu (2):
erofs: update on-disk format for xattr bloom filter
erofs: optimize getxattr with bloom filter

fs/erofs/erofs_fs.h | 8 +++++++-
fs/erofs/internal.h | 2 ++
fs/erofs/xattr.c | 16 +++++++++++++++-
3 files changed, 24 insertions(+), 2 deletions(-)

--
2.19.1.6.gb485710b



2023-06-21 08:54:41

by Jingbo Xu

[permalink] [raw]
Subject: [RFC 2/2] erofs: optimize getxattr with bloom filter

Boost the negative xattr lookup with bloom filter.

The bit value for the bloom filter map has a reverse semantics for
compatibility. That is, the mapped bits will be cleared to 0, while the
bit value of 1 indicates the absence of corresponding xattr.

Signed-off-by: Jingbo Xu <[email protected]>
---
fs/erofs/internal.h | 2 ++
fs/erofs/xattr.c | 16 +++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 1e39c03357d1..49b4b350af8a 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -285,6 +285,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
+EROFS_FEATURE_FUNCS(xattr_bloom, compat, COMPAT_XATTR_BLOOM)

/* atomic flag definitions */
#define EROFS_I_EA_INITED_BIT 0
@@ -304,6 +305,7 @@ struct erofs_inode {
unsigned char inode_isize;
unsigned int xattr_isize;

+ unsigned long xattr_bloom_map;
unsigned int xattr_shared_count;
unsigned int *xattr_shared_xattrs;

diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index 4376f654474d..1ab481b46e8d 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -5,6 +5,7 @@
* Copyright (C) 2021-2022, Alibaba Cloud
*/
#include <linux/security.h>
+#include <linux/xxhash.h>
#include "xattr.h"

struct erofs_xattr_iter {
@@ -87,6 +88,7 @@ static int erofs_init_inode_xattrs(struct inode *inode)
}

ih = it.kaddr + erofs_blkoff(sb, it.pos);
+ vi->xattr_bloom_map = le32_to_cpu(ih->h_map);
vi->xattr_shared_count = ih->h_shared_count;
vi->xattr_shared_xattrs = kmalloc_array(vi->xattr_shared_count,
sizeof(uint), GFP_KERNEL);
@@ -392,8 +394,11 @@ int erofs_getxattr(struct inode *inode, int index,
const char *name,
void *buffer, size_t buffer_size)
{
- int ret;
+ int i, ret;
+ uint32_t bit;
struct erofs_xattr_iter it;
+ struct erofs_inode *const vi = EROFS_I(inode);
+ struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);

if (!name)
return -EINVAL;
@@ -402,6 +407,15 @@ int erofs_getxattr(struct inode *inode, int index,
if (ret)
return ret;

+ if (erofs_sb_has_xattr_bloom(sbi) && vi->xattr_bloom_map) {
+ for (i = 0; i < EROFS_XATTR_BLOOM_COUNTS; i++) {
+ bit = xxh32(name, strlen(name), index + i);
+ bit &= EROFS_XATTR_BLOOM_MASK;
+ if (test_bit(bit, &vi->xattr_bloom_map))
+ return -ENOATTR;
+ }
+ }
+
it.index = index;
it.name = (struct qstr)QSTR_INIT(name, strlen(name));
if (it.name.len > EROFS_NAME_LEN)
--
2.19.1.6.gb485710b


2023-06-21 12:07:48

by Alexander Larsson

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <[email protected]> wrote:
>
> Background
> ==========
> Filesystems with ACL enabled generally need to read
> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
> access and default ACL. When filesystem is mounted with ACL enabled
> while files in the system have not set access/default ACL, the getattr()
> will run in vain while the round trip can decrease the performance in
> workload like "ls -lR".
>
> For example, there's a 12% performance boost if erofs is mounted with
> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
>
> We'd better offer a fastpath to boost the above workload, as well as
> other negative xattr lookup.
>
>
> Proposal
> ========
> Introduce a per-inode bloom filter for xattrs to boost the negative
> xattr queries.
>
> As following shows, a 32-bit bloom filter is introduced for each inode,
> describing if a xattr with specific name exists on this inode.
>
> ```
> struct erofs_xattr_ibody_header {
> - __le32 h_reserved;
> + __le32 h_map; /* bloom filter */
> ...
> }
> ```
>
> Following are some implementation details for bloom filter.
>
> 1. Reverse bit value
> --------------------
> The bloom filter structure describes if a given data is inside the set.
> It will map the given data into several bits of the bloom filter map.
> The data must not exist inside the set if any mapped bit is 0, while the
> data may be not inside the set even if all mapped bits is 1.
>
> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
> (all zero) reserved field, the bit value for the bloom filter has a
> reverse semantics in consideration for compatibility. That is, for a
> given data, the mapped bits will be cleared to 0. Thus for a previously
> built image without support for bloom filter, the bloom filter is all
> zero and when it's mounted by the new kernel with support for bloom
> filter, it can not determine if the queried xattr exists on the inode and
> thus will fallback to the original routine of iterating all on-disk
> xattrs to determine if the queried xattr exists.
>
>
> 2. The number of hash functions
> -------------------------------
> The optimal value for the number of the hash functions (k) is (ln2 *
> m/n), where m stands the number of bits of the bloom filter map, while n
> stands the number of all candidates may be inside the set.
>
> In our use case, the number of common used xattr (n) is approximately 8,
> including system.[posix_acl_access|posix_acl_default],
> security.[capability|selinux] and
> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
>
> Given the number of bits of the bloom filter (m) is 32, the optimal value
> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).

This is indeed the optimal value in a traditional use of bloom
filters. However, I think it is based on a much larger set of values.
For this usecase it may be better to choose a different value.

I did some research a while ago on this, and I thought about the
counts too. Having more than one hash function is useful because it
allows you to avoid problems if two values happen to hash to the same
bucket, but this happens at the cost of there being less "unique
buckets". I spent some time looking for common xattr values
(including some from userspace) and ended up with a list of about 30.
If we can choose a single hash function that maps all (or most) of
these to a unique bucket (mod 32), then it seems to me that having
just that single hash function is better, as it is faster to compute,
and is good enough to uniquely identify each of these commonly used
xattrs.


2023-06-22 07:03:24

by Jingbo Xu

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 6/21/23 7:50 PM, Alexander Larsson wrote:
> On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <[email protected]> wrote:
>>
>> Background
>> ==========
>> Filesystems with ACL enabled generally need to read
>> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
>> access and default ACL. When filesystem is mounted with ACL enabled
>> while files in the system have not set access/default ACL, the getattr()
>> will run in vain while the round trip can decrease the performance in
>> workload like "ls -lR".
>>
>> For example, there's a 12% performance boost if erofs is mounted with
>> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
>>
>> We'd better offer a fastpath to boost the above workload, as well as
>> other negative xattr lookup.
>>
>>
>> Proposal
>> ========
>> Introduce a per-inode bloom filter for xattrs to boost the negative
>> xattr queries.
>>
>> As following shows, a 32-bit bloom filter is introduced for each inode,
>> describing if a xattr with specific name exists on this inode.
>>
>> ```
>> struct erofs_xattr_ibody_header {
>> - __le32 h_reserved;
>> + __le32 h_map; /* bloom filter */
>> ...
>> }
>> ```
>>
>> Following are some implementation details for bloom filter.
>>
>> 1. Reverse bit value
>> --------------------
>> The bloom filter structure describes if a given data is inside the set.
>> It will map the given data into several bits of the bloom filter map.
>> The data must not exist inside the set if any mapped bit is 0, while the
>> data may be not inside the set even if all mapped bits is 1.
>>
>> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
>> (all zero) reserved field, the bit value for the bloom filter has a
>> reverse semantics in consideration for compatibility. That is, for a
>> given data, the mapped bits will be cleared to 0. Thus for a previously
>> built image without support for bloom filter, the bloom filter is all
>> zero and when it's mounted by the new kernel with support for bloom
>> filter, it can not determine if the queried xattr exists on the inode and
>> thus will fallback to the original routine of iterating all on-disk
>> xattrs to determine if the queried xattr exists.
>>
>>
>> 2. The number of hash functions
>> -------------------------------
>> The optimal value for the number of the hash functions (k) is (ln2 *
>> m/n), where m stands the number of bits of the bloom filter map, while n
>> stands the number of all candidates may be inside the set.
>>
>> In our use case, the number of common used xattr (n) is approximately 8,
>> including system.[posix_acl_access|posix_acl_default],
>> security.[capability|selinux] and
>> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
>>
>> Given the number of bits of the bloom filter (m) is 32, the optimal value
>> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
>
> This is indeed the optimal value in a traditional use of bloom
> filters. However, I think it is based on a much larger set of values.
> For this usecase it may be better to choose a different value.
>
> I did some research a while ago on this, and I thought about the
> counts too. Having more than one hash function is useful because it
> allows you to avoid problems if two values happen to hash to the same
> bucket, but this happens at the cost of there being less "unique
> buckets". I spent some time looking for common xattr values
> (including some from userspace) and ended up with a list of about 30.

Yeah, if the number of common used xattr (n) is 30, then the optimal
value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
The optimal value in theory also matches our intuition.


> If we can choose a single hash function that maps all (or most) of
> these to a unique bucket (mod 32),

Excellent research! Would you mind sharing the list of these
approximately 30 commonly used xattrs, so that I could check if they are
mapped to unique bucket with the single hash function we proposed?


--
Thanks,
Jingbo

2023-06-22 08:01:27

by Alexander Larsson

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

On Thu, Jun 22, 2023 at 8:37 AM Jingbo Xu <[email protected]> wrote:
>
>
>
> On 6/21/23 7:50 PM, Alexander Larsson wrote:
> > On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <[email protected]> wrote:
> >>
> >> Background
> >> ==========
> >> Filesystems with ACL enabled generally need to read
> >> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
> >> access and default ACL. When filesystem is mounted with ACL enabled
> >> while files in the system have not set access/default ACL, the getattr()
> >> will run in vain while the round trip can decrease the performance in
> >> workload like "ls -lR".
> >>
> >> For example, there's a 12% performance boost if erofs is mounted with
> >> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
> >>
> >> We'd better offer a fastpath to boost the above workload, as well as
> >> other negative xattr lookup.
> >>
> >>
> >> Proposal
> >> ========
> >> Introduce a per-inode bloom filter for xattrs to boost the negative
> >> xattr queries.
> >>
> >> As following shows, a 32-bit bloom filter is introduced for each inode,
> >> describing if a xattr with specific name exists on this inode.
> >>
> >> ```
> >> struct erofs_xattr_ibody_header {
> >> - __le32 h_reserved;
> >> + __le32 h_map; /* bloom filter */
> >> ...
> >> }
> >> ```
> >>
> >> Following are some implementation details for bloom filter.
> >>
> >> 1. Reverse bit value
> >> --------------------
> >> The bloom filter structure describes if a given data is inside the set.
> >> It will map the given data into several bits of the bloom filter map.
> >> The data must not exist inside the set if any mapped bit is 0, while the
> >> data may be not inside the set even if all mapped bits is 1.
> >>
> >> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
> >> (all zero) reserved field, the bit value for the bloom filter has a
> >> reverse semantics in consideration for compatibility. That is, for a
> >> given data, the mapped bits will be cleared to 0. Thus for a previously
> >> built image without support for bloom filter, the bloom filter is all
> >> zero and when it's mounted by the new kernel with support for bloom
> >> filter, it can not determine if the queried xattr exists on the inode and
> >> thus will fallback to the original routine of iterating all on-disk
> >> xattrs to determine if the queried xattr exists.
> >>
> >>
> >> 2. The number of hash functions
> >> -------------------------------
> >> The optimal value for the number of the hash functions (k) is (ln2 *
> >> m/n), where m stands the number of bits of the bloom filter map, while n
> >> stands the number of all candidates may be inside the set.
> >>
> >> In our use case, the number of common used xattr (n) is approximately 8,
> >> including system.[posix_acl_access|posix_acl_default],
> >> security.[capability|selinux] and
> >> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
> >>
> >> Given the number of bits of the bloom filter (m) is 32, the optimal value
> >> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
> >
> > This is indeed the optimal value in a traditional use of bloom
> > filters. However, I think it is based on a much larger set of values.
> > For this usecase it may be better to choose a different value.
> >
> > I did some research a while ago on this, and I thought about the
> > counts too. Having more than one hash function is useful because it
> > allows you to avoid problems if two values happen to hash to the same
> > bucket, but this happens at the cost of there being less "unique
> > buckets". I spent some time looking for common xattr values
> > (including some from userspace) and ended up with a list of about 30.
>
> Yeah, if the number of common used xattr (n) is 30, then the optimal
> value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
> The optimal value in theory also matches our intuition.
>
>
> > If we can choose a single hash function that maps all (or most) of
> > these to a unique bucket (mod 32),
>
> Excellent research! Would you mind sharing the list of these
> approximately 30 commonly used xattrs, so that I could check if they are
> mapped to unique bucket with the single hash function we proposed?

This is the list I came up with:

trusted.overlay.opaque
trusted.overlay.redirect
trusted.overlay.origin
trusted.overlay.impure
trusted.overlay.nlink
trusted.overlay.upper
trusted.overlay.metacopy
trusted.overlay.protattr
user.overlay.opaque
user.overlay.redirect
user.overlay.origin
user.overlay.impure
user.overlay.nlink
user.overlay.upper
user.overlay.metacopy
user.overlay.protattr
security.evm
security.ima
security.selinux
security.SMACK64
security.SMACK64IPIN
security.SMACK64IPOUT
security.SMACK64EXEC
security.SMACK64TRANSMUTE
security.SMACK64MMAP
security.apparmor
security.capability
system.posix_acl_access
system.posix_acl_default
user.mime_type

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
[email protected] [email protected]


2023-06-24 01:44:49

by Jingbo Xu

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 6/22/23 3:56 PM, Alexander Larsson wrote:
> On Thu, Jun 22, 2023 at 8:37 AM Jingbo Xu <[email protected]> wrote:
>>
>>
>>
>> On 6/21/23 7:50 PM, Alexander Larsson wrote:
>>> On Wed, Jun 21, 2023 at 10:32 AM Jingbo Xu <[email protected]> wrote:
>>>>
>>>> Background
>>>> ==========
>>>> Filesystems with ACL enabled generally need to read
>>>> "system.posix_acl_access"/"system.posix_acl_default" xattr to get the
>>>> access and default ACL. When filesystem is mounted with ACL enabled
>>>> while files in the system have not set access/default ACL, the getattr()
>>>> will run in vain while the round trip can decrease the performance in
>>>> workload like "ls -lR".
>>>>
>>>> For example, there's a 12% performance boost if erofs is mounted with
>>>> "noacl" when running "ls -lR" workload on dataset [1] (given in [2]).
>>>>
>>>> We'd better offer a fastpath to boost the above workload, as well as
>>>> other negative xattr lookup.
>>>>
>>>>
>>>> Proposal
>>>> ========
>>>> Introduce a per-inode bloom filter for xattrs to boost the negative
>>>> xattr queries.
>>>>
>>>> As following shows, a 32-bit bloom filter is introduced for each inode,
>>>> describing if a xattr with specific name exists on this inode.
>>>>
>>>> ```
>>>> struct erofs_xattr_ibody_header {
>>>> - __le32 h_reserved;
>>>> + __le32 h_map; /* bloom filter */
>>>> ...
>>>> }
>>>> ```
>>>>
>>>> Following are some implementation details for bloom filter.
>>>>
>>>> 1. Reverse bit value
>>>> --------------------
>>>> The bloom filter structure describes if a given data is inside the set.
>>>> It will map the given data into several bits of the bloom filter map.
>>>> The data must not exist inside the set if any mapped bit is 0, while the
>>>> data may be not inside the set even if all mapped bits is 1.
>>>>
>>>> While in our use case, as erofs_xattr_ibody_header.h_map is previously a
>>>> (all zero) reserved field, the bit value for the bloom filter has a
>>>> reverse semantics in consideration for compatibility. That is, for a
>>>> given data, the mapped bits will be cleared to 0. Thus for a previously
>>>> built image without support for bloom filter, the bloom filter is all
>>>> zero and when it's mounted by the new kernel with support for bloom
>>>> filter, it can not determine if the queried xattr exists on the inode and
>>>> thus will fallback to the original routine of iterating all on-disk
>>>> xattrs to determine if the queried xattr exists.
>>>>
>>>>
>>>> 2. The number of hash functions
>>>> -------------------------------
>>>> The optimal value for the number of the hash functions (k) is (ln2 *
>>>> m/n), where m stands the number of bits of the bloom filter map, while n
>>>> stands the number of all candidates may be inside the set.
>>>>
>>>> In our use case, the number of common used xattr (n) is approximately 8,
>>>> including system.[posix_acl_access|posix_acl_default],
>>>> security.[capability|selinux] and
>>>> security.[SMACK64|SMACK64TRANSMUTE|SMACK64EXEC|SMACK64MMAP].
>>>>
>>>> Given the number of bits of the bloom filter (m) is 32, the optimal value
>>>> for the number of the hash functions (k) is 2 (ln2 * m/n = 2.7).
>>>
>>> This is indeed the optimal value in a traditional use of bloom
>>> filters. However, I think it is based on a much larger set of values.
>>> For this usecase it may be better to choose a different value.
>>>
>>> I did some research a while ago on this, and I thought about the
>>> counts too. Having more than one hash function is useful because it
>>> allows you to avoid problems if two values happen to hash to the same
>>> bucket, but this happens at the cost of there being less "unique
>>> buckets". I spent some time looking for common xattr values
>>> (including some from userspace) and ended up with a list of about 30.
>>
>> Yeah, if the number of common used xattr (n) is 30, then the optimal
>> value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
>> The optimal value in theory also matches our intuition.
>>
>>
>>> If we can choose a single hash function that maps all (or most) of
>>> these to a unique bucket (mod 32),
>>
>> Excellent research! Would you mind sharing the list of these
>> approximately 30 commonly used xattrs, so that I could check if they are
>> mapped to unique bucket with the single hash function we proposed?
>
> This is the list I came up with:
>
> trusted.overlay.opaque
> trusted.overlay.redirect
> trusted.overlay.origin
> trusted.overlay.impure
> trusted.overlay.nlink
> trusted.overlay.upper
> trusted.overlay.metacopy
> trusted.overlay.protattr
> user.overlay.opaque
> user.overlay.redirect
> user.overlay.origin
> user.overlay.impure
> user.overlay.nlink
> user.overlay.upper
> user.overlay.metacopy
> user.overlay.protattr
> security.evm
> security.ima
> security.selinux
> security.SMACK64
> security.SMACK64IPIN
> security.SMACK64IPOUT
> security.SMACK64EXEC
> security.SMACK64TRANSMUTE
> security.SMACK64MMAP
> security.apparmor
> security.capability
> system.posix_acl_access
> system.posix_acl_default
> user.mime_type
>

Got it. Thanks a lot!

--
Thanks,
Jingbo

2023-06-28 08:37:12

by Jingbo Xu

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

Hi all,

Sorry for the late reply as I was on vacation these days.

I test the hash bit for all xattrs given by Alex[1], to see if each
xattr could be mapped into one unique bit in the 32-bit bloom filter.

[1]
https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/


On 6/21/23 4:32 PM, Jingbo Xu wrote:
>
> 3.2. input of hash function
> -------------------------
> As previously described, each hash function will map the given data into
> one bit of the bloom filter map. In our use case, xattr name serves as
> the key of hash function.
>
> When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
> and the remaining name apart from the prefix are handy. To avoid
> constructing the full xattr name, the above index and name are fed into
> the hash function directly in the following way:
>
> ```
> bit = xxh32(name, strlen(name), index + i);
> ```
>
> where index serves as part of seed, so that it gets involved in the
> calculation for the hash.


All xattrs are hashed with one single hash function.

I first tested with the following hash function:

```
xxh32(name, strlen(name), index)
```

where `index` represents the index of corresponding predefined name
prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
after stripping the above predefined name prefix (e.g.
"overlay.metacopy" for "user.overlay.metacopy")


The mapping results are:

bit 0: security.SMACK64EXEC
bit 1:
bit 2: user.overlay.protattr
bit 3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
bit 4:
bit 5: user.overlay.origin
bit 6: user.overlay.metacopy, security.evm
bit 8: trusted.overlay.opaque
bit 9: trusted.overlay.origin
bit 10: trusted.overlay.upper, trusted.overlay.protattr
bit 11: security.apparmor, security.capability
bit 12: security.SMACK64
bit 13: user.overlay.redirect, security.ima
bit 14: user.overlay.upper
bit 15: trusted.overlay.redirect
bit 16: security.SMACK64IPOUT
bit 17:
bit 18: system.posix_acl_access
bit 19: security.selinux
bit 20:
bit 21:
bit 22: system.posix_acl_default
bit 23: security.SMACK64MMAP
bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
bit 25: trusted.overlay.metacopy
bit 26:
bit 27: security.SMACK64IPIN
bit 28:
bit 29:
bit 30: trusted.overlay.nlink
bit 31:

Here 30 xattrs are mapped into 22 bits. There are two potential
conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
and bit 24 (user.overlay.impure, user.overlay.nlink).


>
> An alternative way is to calculate the hash from the full xattr name by
> feeding the prefix string and the remaining name string separately in
> the following way:
>
> ```
> xxh32_reset()
> xxh32_update(prefix string, ...)
> xxh32_update(remaining name, ...)
> xxh32_digest()
> ```
>
> But I doubt if it really deserves to call multiple APIs instead of one
> single xxh32().


I also tested with the following hash function, where the full name of
the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.

```
xxh32(name, strlen(name), 0)
```


Following are the mapping results:

bit 0: trusted.overlay.impure, user.overlay.protattr
bit 1: security.SMACK64IPOUT
bit 2:
bit 3: security.capability
bit 4: security.selinux
bit 5: security.ima
bit 6: user.overlay.metacopy
bit 8:
bit 9: trusted.overlay.redirect, security.SMACK64EXEC
bit 10: system.posix_acl_access
bit 11: trusted.overlay.nlink
bit 12: trusted.overlay.opaque
bit 13:
bit 14:
bit 15:
bit 16:
bit 17: user.overlay.impure
bit 18: security.apparmor
bit 19:
bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
bit 21:
bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
bit 23: user.overlay.upper, security.evm
bit 24: user.overlay.redirect, security.SMACK64IPIN,
system.posix_acl_default
bit 25: security.SMACK64
bit 26:
bit 27: trusted.overlay.upper, security.SMACK64MMAP
bit 28: trusted.overlay.origin, user.mime_type
bit 29:
bit 30:
bit 31: user.overlay.opaque

30 xattrs are mapped into 20 bits. Similarly there are two potential
conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
22 (trusted.overlay.metacopy, trusted.overlay.protattr).


Summary
=======

Personally I would prefer the former, as it maps xattrs into the bloom
filter more evenly (22 bits vs 20 bits) and can better cooperate with
the kernel routine (index and the remaining name string, rather than the
full name string, are handy).

--
Thanks,
Jingbo

2023-07-03 07:33:52

by Alexander Larsson

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu <[email protected]> wrote:
>
> Hi all,
>
> Sorry for the late reply as I was on vacation these days.
>
> I test the hash bit for all xattrs given by Alex[1], to see if each
> xattr could be mapped into one unique bit in the 32-bit bloom filter.
>
> [1]
> https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/
>
>
> On 6/21/23 4:32 PM, Jingbo Xu wrote:
> >
> > 3.2. input of hash function
> > -------------------------
> > As previously described, each hash function will map the given data into
> > one bit of the bloom filter map. In our use case, xattr name serves as
> > the key of hash function.
> >
> > When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
> > and the remaining name apart from the prefix are handy. To avoid
> > constructing the full xattr name, the above index and name are fed into
> > the hash function directly in the following way:
> >
> > ```
> > bit = xxh32(name, strlen(name), index + i);
> > ```
> >
> > where index serves as part of seed, so that it gets involved in the
> > calculation for the hash.
>
>
> All xattrs are hashed with one single hash function.
>
> I first tested with the following hash function:
>
> ```
> xxh32(name, strlen(name), index)
> ```
>
> where `index` represents the index of corresponding predefined name
> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
> after stripping the above predefined name prefix (e.g.
> "overlay.metacopy" for "user.overlay.metacopy")
>
>
> The mapping results are:
>
> bit 0: security.SMACK64EXEC
> bit 1:
> bit 2: user.overlay.protattr
> bit 3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
> bit 4:
> bit 5: user.overlay.origin
> bit 6: user.overlay.metacopy, security.evm
> bit 8: trusted.overlay.opaque
> bit 9: trusted.overlay.origin
> bit 10: trusted.overlay.upper, trusted.overlay.protattr
> bit 11: security.apparmor, security.capability
> bit 12: security.SMACK64
> bit 13: user.overlay.redirect, security.ima
> bit 14: user.overlay.upper
> bit 15: trusted.overlay.redirect
> bit 16: security.SMACK64IPOUT
> bit 17:
> bit 18: system.posix_acl_access
> bit 19: security.selinux
> bit 20:
> bit 21:
> bit 22: system.posix_acl_default
> bit 23: security.SMACK64MMAP
> bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
> bit 25: trusted.overlay.metacopy
> bit 26:
> bit 27: security.SMACK64IPIN
> bit 28:
> bit 29:
> bit 30: trusted.overlay.nlink
> bit 31:
>
> Here 30 xattrs are mapped into 22 bits. There are two potential
> conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
> and bit 24 (user.overlay.impure, user.overlay.nlink).

Bit 11 (apparmor and capabilities) seems like the most likely thing to
run into. I.e. on an apparmor-using system, many files would have
apparmor xattr set, so looking up security.capabilities on it would
cause a false negative and we'd unnecessarily read the xattrs.

> > An alternative way is to calculate the hash from the full xattr name by
> > feeding the prefix string and the remaining name string separately in
> > the following way:
> >
> > ```
> > xxh32_reset()
> > xxh32_update(prefix string, ...)
> > xxh32_update(remaining name, ...)
> > xxh32_digest()
> > ```
> >
> > But I doubt if it really deserves to call multiple APIs instead of one
> > single xxh32().
>
>
> I also tested with the following hash function, where the full name of
> the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.
>
> ```
> xxh32(name, strlen(name), 0)
> ```
>
>
> Following are the mapping results:
>
> bit 0: trusted.overlay.impure, user.overlay.protattr
> bit 1: security.SMACK64IPOUT
> bit 2:
> bit 3: security.capability
> bit 4: security.selinux
> bit 5: security.ima
> bit 6: user.overlay.metacopy
> bit 8:
> bit 9: trusted.overlay.redirect, security.SMACK64EXEC
> bit 10: system.posix_acl_access
> bit 11: trusted.overlay.nlink
> bit 12: trusted.overlay.opaque
> bit 13:
> bit 14:
> bit 15:
> bit 16:
> bit 17: user.overlay.impure
> bit 18: security.apparmor
> bit 19:
> bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
> bit 21:
> bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
> bit 23: user.overlay.upper, security.evm
> bit 24: user.overlay.redirect, security.SMACK64IPIN,
> system.posix_acl_default
> bit 25: security.SMACK64
> bit 26:
> bit 27: trusted.overlay.upper, security.SMACK64MMAP
> bit 28: trusted.overlay.origin, user.mime_type
> bit 29:
> bit 30:
> bit 31: user.overlay.opaque
>
> 30 xattrs are mapped into 20 bits. Similarly there are two potential
> conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
> 22 (trusted.overlay.metacopy, trusted.overlay.protattr).
>
>
> Summary
> =======
>
> Personally I would prefer the former, as it maps xattrs into the bloom
> filter more evenly (22 bits vs 20 bits) and can better cooperate with
> the kernel routine (index and the remaining name string, rather than the
> full name string, are handy).

I agree that we want the approach with better cooperation with the
kernel function. However, I would much prefer if all the xattrs that
are commonly set on many files are unconflicted. This would be at
least: selinux, ima, evm, apparmor.

Can't you just add a magic constant to the seed? Then we can come up
with one that gives a good spread and hardcode that.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
[email protected] [email protected]


2023-07-03 09:16:48

by Jingbo Xu

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 7/3/23 3:25 PM, Alexander Larsson wrote:
> On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu <[email protected]> wrote:
>>
>> Hi all,
>>
>> Sorry for the late reply as I was on vacation these days.
>>
>> I test the hash bit for all xattrs given by Alex[1], to see if each
>> xattr could be mapped into one unique bit in the 32-bit bloom filter.
>>
>> [1]
>> https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/
>>
>>
>> On 6/21/23 4:32 PM, Jingbo Xu wrote:
>>>
>>> 3.2. input of hash function
>>> -------------------------
>>> As previously described, each hash function will map the given data into
>>> one bit of the bloom filter map. In our use case, xattr name serves as
>>> the key of hash function.
>>>
>>> When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
>>> and the remaining name apart from the prefix are handy. To avoid
>>> constructing the full xattr name, the above index and name are fed into
>>> the hash function directly in the following way:
>>>
>>> ```
>>> bit = xxh32(name, strlen(name), index + i);
>>> ```
>>>
>>> where index serves as part of seed, so that it gets involved in the
>>> calculation for the hash.
>>
>>
>> All xattrs are hashed with one single hash function.
>>
>> I first tested with the following hash function:
>>
>> ```
>> xxh32(name, strlen(name), index)
>> ```
>>
>> where `index` represents the index of corresponding predefined name
>> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
>> after stripping the above predefined name prefix (e.g.
>> "overlay.metacopy" for "user.overlay.metacopy")
>>
>>
>> The mapping results are:
>>
>> bit 0: security.SMACK64EXEC
>> bit 1:
>> bit 2: user.overlay.protattr
>> bit 3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
>> bit 4:
>> bit 5: user.overlay.origin
>> bit 6: user.overlay.metacopy, security.evm
>> bit 8: trusted.overlay.opaque
>> bit 9: trusted.overlay.origin
>> bit 10: trusted.overlay.upper, trusted.overlay.protattr
>> bit 11: security.apparmor, security.capability
>> bit 12: security.SMACK64
>> bit 13: user.overlay.redirect, security.ima
>> bit 14: user.overlay.upper
>> bit 15: trusted.overlay.redirect
>> bit 16: security.SMACK64IPOUT
>> bit 17:
>> bit 18: system.posix_acl_access
>> bit 19: security.selinux
>> bit 20:
>> bit 21:
>> bit 22: system.posix_acl_default
>> bit 23: security.SMACK64MMAP
>> bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 25: trusted.overlay.metacopy
>> bit 26:
>> bit 27: security.SMACK64IPIN
>> bit 28:
>> bit 29:
>> bit 30: trusted.overlay.nlink
>> bit 31:
>>
>> Here 30 xattrs are mapped into 22 bits. There are two potential
>> conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
>> and bit 24 (user.overlay.impure, user.overlay.nlink).
>
> Bit 11 (apparmor and capabilities) seems like the most likely thing to
> run into. I.e. on an apparmor-using system, many files would have
> apparmor xattr set, so looking up security.capabilities on it would
> cause a false negative and we'd unnecessarily read the xattrs.
>
>>> An alternative way is to calculate the hash from the full xattr name by
>>> feeding the prefix string and the remaining name string separately in
>>> the following way:
>>>
>>> ```
>>> xxh32_reset()
>>> xxh32_update(prefix string, ...)
>>> xxh32_update(remaining name, ...)
>>> xxh32_digest()
>>> ```
>>>
>>> But I doubt if it really deserves to call multiple APIs instead of one
>>> single xxh32().
>>
>>
>> I also tested with the following hash function, where the full name of
>> the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.
>>
>> ```
>> xxh32(name, strlen(name), 0)
>> ```
>>
>>
>> Following are the mapping results:
>>
>> bit 0: trusted.overlay.impure, user.overlay.protattr
>> bit 1: security.SMACK64IPOUT
>> bit 2:
>> bit 3: security.capability
>> bit 4: security.selinux
>> bit 5: security.ima
>> bit 6: user.overlay.metacopy
>> bit 8:
>> bit 9: trusted.overlay.redirect, security.SMACK64EXEC
>> bit 10: system.posix_acl_access
>> bit 11: trusted.overlay.nlink
>> bit 12: trusted.overlay.opaque
>> bit 13:
>> bit 14:
>> bit 15:
>> bit 16:
>> bit 17: user.overlay.impure
>> bit 18: security.apparmor
>> bit 19:
>> bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 21:
>> bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
>> bit 23: user.overlay.upper, security.evm
>> bit 24: user.overlay.redirect, security.SMACK64IPIN,
>> system.posix_acl_default
>> bit 25: security.SMACK64
>> bit 26:
>> bit 27: trusted.overlay.upper, security.SMACK64MMAP
>> bit 28: trusted.overlay.origin, user.mime_type
>> bit 29:
>> bit 30:
>> bit 31: user.overlay.opaque
>>
>> 30 xattrs are mapped into 20 bits. Similarly there are two potential
>> conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
>> 22 (trusted.overlay.metacopy, trusted.overlay.protattr).
>>
>>
>> Summary
>> =======
>>
>> Personally I would prefer the former, as it maps xattrs into the bloom
>> filter more evenly (22 bits vs 20 bits) and can better cooperate with
>> the kernel routine (index and the remaining name string, rather than the
>> full name string, are handy).
>
> I agree that we want the approach with better cooperation with the
> kernel function. However, I would much prefer if all the xattrs that
> are commonly set on many files are unconflicted. This would be at
> least: selinux, ima, evm, apparmor.
>
> Can't you just add a magic constant to the seed? Then we can come up
> with one that gives a good spread and hardcode that.

Brilliant idea! I would try to see if it works.

--
Thanks,
Jingbo

2023-07-04 06:03:45

by Jingbo Xu

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 7/3/23 3:25 PM, Alexander Larsson wrote:

>
> Can't you just add a magic constant to the seed? Then we can come up
> with one that gives a good spread and hardcode that.
>

I tried Alex's proposal of making a constant magic as part of the seed like:

```
xxh32(name, strlen(name), SEED_MAGIC + index)
```

>> where `index` represents the index of corresponding predefined name
>> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
>> after stripping the above predefined name prefix (e.g.
>> "overlay.metacopy" for "user.overlay.metacopy")


I tried SEED_MAGIC in range [0, UINT32_MAX], trying to find a constant
magic giving a best spread.

There are totally 30 commonly used xattrs and 32 bits could be mapped
into. I failed to find the most perfect constant magic which maps these
30 xattrs to exactly 30 bits with no conflict.

Later I can only select a magic from those that 1) maps 30 xattrs into
29 bits (with one conflict) and 2) at least xattrs like selinux, ima,
evm, apparmor are unconflicted to each other. Following are all
candidates and the conflicted xattrs:

```
seed: 745a51e
bit 29: user.overlay.opaque, security.selinux,

seed: fb08ad9
bit 10: user.overlay.impure, system.posix_acl_access,

seed: 11aa6c32
bit 19: user.overlay.redirect, security.SMACK64MMAP,

seed: 1438d503
bit 10: trusted.overlay.protattr, security.ima,

seed: 14ccea75
bit 30: user.overlay.upper, security.ima,

seed: 1d6a0d15
bit 31: trusted.overlay.upper, user.overlay.metacopy,

seed: 25bbe08f
bit 31: trusted.overlay.metacopy, user.overlay.metacopy,

seed: 2649da2a
bit 6: user.overlay.nlink, security.SMACK64TRANSMUTE,

seed: 2b9180b2
bit 29: user.overlay.impure, security.evm,

seed: 2c5d7862
bit 16: user.overlay.upper, system.posix_acl_default,

seed: 322040a6
bit 17: trusted.overlay.impure, user.overlay.metacopy,

seed: 328bcb8c
bit 30: user.overlay.opaque, security.evm,

seed: 35afc469
bit 3: trusted.overlay.opaque, user.overlay.origin,

seed: 435bed25
bit 4: trusted.overlay.upper, security.SMACK64MMAP,

seed: 43a853af
bit 3: trusted.overlay.protattr, security.selinux,

seed: 4930f511
bit 22: trusted.overlay.origin, security.SMACK64,

seed: 4acdce1d
bit 21: user.overlay.opaque, security.selinux,

seed: 4fc5f2b0
bit 24: user.overlay.redirect, system.posix_acl_default,

seed: 50632396
bit 6: security.SMACK64, system.posix_acl_access,

seed: 535f6351
bit 3: system.posix_acl_access, user.mime_type,

seed: 56f4306e
bit 9: user.overlay.redirect, security.SMACK64MMAP,

seed: 637e2bd9
bit 22: trusted.overlay.upper, security.evm,

seed: 6ab57b38
bit 9: trusted.overlay.upper, user.overlay.metacopy,

seed: 7113bd57
bit 19: trusted.overlay.redirect, security.SMACK64,

seed: 753e3f24
bit 6: user.overlay.redirect, security.SMACK64EXEC,

seed: 81883030
bit 0: trusted.overlay.upper, security.SMACK64IPOUT,

seed: 835f9f14
bit 27: security.SMACK64EXEC, system.posix_acl_access,

seed: 938fbb78
bit 28: user.overlay.upper, security.apparmor,

seed: 953bb3e9
bit 30: user.overlay.protattr, system.posix_acl_access,

seed: 962ccd7f
bit 29: trusted.overlay.nlink, security.SMACK64IPOUT,

seed: 9fff798e
bit 3: user.overlay.nlink, user.mime_type,

seed: a2e324eb
bit 28: trusted.overlay.nlink, user.overlay.impure,

seed: a6e00cef
bit 26: user.overlay.opaque, security.SMACK64TRANSMUTE,

seed: ae26aaa9
bit 0: trusted.overlay.metacopy, user.overlay.impure,

seed: b2172573
bit 14: trusted.overlay.upper, security.ima,

seed: b5917739
bit 8: trusted.overlay.protattr, user.overlay.impure,

seed: c01a4919
bit 22: trusted.overlay.nlink, user.overlay.upper,

seed: c1fa0c0a
bit 19: security.SMACK64TRANSMUTE, user.mime_type,

seed: cbd314d7
bit 6: trusted.overlay.upper, security.SMACK64IPOUT,

seed: cc6dd8ee
bit 2: trusted.overlay.nlink, user.overlay.upper,

seed: cc922efd
bit 3: trusted.overlay.opaque, user.overlay.opaque,

seed: cd478c17
bit 6: trusted.overlay.metacopy, user.overlay.metacopy,

seed: d35be1c8
bit 9: trusted.overlay.protattr, security.SMACK64MMAP,

seed: d3914458
bit 1: trusted.overlay.redirect, security.SMACK64IPIN,

seed: f3251337
bit 7: user.overlay.opaque, security.SMACK64IPOUT,

seed: f37d8900
bit 9: trusted.overlay.impure, user.overlay.protattr,

seed: fafd6c93
bit 0: security.SMACK64, user.mime_type,

seed: fcd35cbb
bit 3: trusted.overlay.upper, user.overlay.redirect,

seed: fe2e058a
bit 14: user.overlay.impure, user.mime_type,
```


Among all these candidates, I would prefer the following three
candidates as for each inode the same xattr of overlayfs family xattrs,
e.g. "overlay.metacopy", can not be prefixed with "trusted." and "user."
at the same time.

```
seed: 25bbe08f
bit 31: trusted.overlay.metacopy, user.overlay.metacopy,

seed: cc922efd
bit 3: trusted.overlay.opaque, user.overlay.opaque,

seed: cd478c17
bit 6: trusted.overlay.metacopy, user.overlay.metacopy,
```


Any other idea?


--
Thanks,
Jingbo

2023-07-04 08:18:41

by Alexander Larsson

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

On Tue, Jul 4, 2023 at 7:56 AM Jingbo Xu <[email protected]> wrote:
>
>
>
> On 7/3/23 3:25 PM, Alexander Larsson wrote:
>
> >
> > Can't you just add a magic constant to the seed? Then we can come up
> > with one that gives a good spread and hardcode that.
> >
>
> I tried Alex's proposal of making a constant magic as part of the seed like:
>
> ```
> xxh32(name, strlen(name), SEED_MAGIC + index)
> ```
>
> >> where `index` represents the index of corresponding predefined name
> >> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
> >> after stripping the above predefined name prefix (e.g.
> >> "overlay.metacopy" for "user.overlay.metacopy")
>
>
> I tried SEED_MAGIC in range [0, UINT32_MAX], trying to find a constant
> magic giving a best spread.
>
> There are totally 30 commonly used xattrs and 32 bits could be mapped
> into. I failed to find the most perfect constant magic which maps these
> 30 xattrs to exactly 30 bits with no conflict.
>
> Later I can only select a magic from those that 1) maps 30 xattrs into
> 29 bits (with one conflict) and 2) at least xattrs like selinux, ima,
> evm, apparmor are unconflicted to each other. Following are all
> candidates and the conflicted xattrs:
>
> ```
> seed: 745a51e
> bit 29: user.overlay.opaque, security.selinux,
>
> seed: fb08ad9
> bit 10: user.overlay.impure, system.posix_acl_access,
>
> seed: 11aa6c32
> bit 19: user.overlay.redirect, security.SMACK64MMAP,
>
> seed: 1438d503
> bit 10: trusted.overlay.protattr, security.ima,
>
> seed: 14ccea75
> bit 30: user.overlay.upper, security.ima,
>
> seed: 1d6a0d15
> bit 31: trusted.overlay.upper, user.overlay.metacopy,
>
> seed: 25bbe08f
> bit 31: trusted.overlay.metacopy, user.overlay.metacopy,
>
> seed: 2649da2a
> bit 6: user.overlay.nlink, security.SMACK64TRANSMUTE,
>
> seed: 2b9180b2
> bit 29: user.overlay.impure, security.evm,
>
> seed: 2c5d7862
> bit 16: user.overlay.upper, system.posix_acl_default,
>
> seed: 322040a6
> bit 17: trusted.overlay.impure, user.overlay.metacopy,
>
> seed: 328bcb8c
> bit 30: user.overlay.opaque, security.evm,
>
> seed: 35afc469
> bit 3: trusted.overlay.opaque, user.overlay.origin,
>
> seed: 435bed25
> bit 4: trusted.overlay.upper, security.SMACK64MMAP,
>
> seed: 43a853af
> bit 3: trusted.overlay.protattr, security.selinux,
>
> seed: 4930f511
> bit 22: trusted.overlay.origin, security.SMACK64,
>
> seed: 4acdce1d
> bit 21: user.overlay.opaque, security.selinux,
>
> seed: 4fc5f2b0
> bit 24: user.overlay.redirect, system.posix_acl_default,
>
> seed: 50632396
> bit 6: security.SMACK64, system.posix_acl_access,
>
> seed: 535f6351
> bit 3: system.posix_acl_access, user.mime_type,
>
> seed: 56f4306e
> bit 9: user.overlay.redirect, security.SMACK64MMAP,
>
> seed: 637e2bd9
> bit 22: trusted.overlay.upper, security.evm,
>
> seed: 6ab57b38
> bit 9: trusted.overlay.upper, user.overlay.metacopy,
>
> seed: 7113bd57
> bit 19: trusted.overlay.redirect, security.SMACK64,
>
> seed: 753e3f24
> bit 6: user.overlay.redirect, security.SMACK64EXEC,
>
> seed: 81883030
> bit 0: trusted.overlay.upper, security.SMACK64IPOUT,
>
> seed: 835f9f14
> bit 27: security.SMACK64EXEC, system.posix_acl_access,
>
> seed: 938fbb78
> bit 28: user.overlay.upper, security.apparmor,
>
> seed: 953bb3e9
> bit 30: user.overlay.protattr, system.posix_acl_access,
>
> seed: 962ccd7f
> bit 29: trusted.overlay.nlink, security.SMACK64IPOUT,
>
> seed: 9fff798e
> bit 3: user.overlay.nlink, user.mime_type,
>
> seed: a2e324eb
> bit 28: trusted.overlay.nlink, user.overlay.impure,
>
> seed: a6e00cef
> bit 26: user.overlay.opaque, security.SMACK64TRANSMUTE,
>
> seed: ae26aaa9
> bit 0: trusted.overlay.metacopy, user.overlay.impure,
>
> seed: b2172573
> bit 14: trusted.overlay.upper, security.ima,
>
> seed: b5917739
> bit 8: trusted.overlay.protattr, user.overlay.impure,
>
> seed: c01a4919
> bit 22: trusted.overlay.nlink, user.overlay.upper,
>
> seed: c1fa0c0a
> bit 19: security.SMACK64TRANSMUTE, user.mime_type,
>
> seed: cbd314d7
> bit 6: trusted.overlay.upper, security.SMACK64IPOUT,
>
> seed: cc6dd8ee
> bit 2: trusted.overlay.nlink, user.overlay.upper,
>
> seed: cc922efd
> bit 3: trusted.overlay.opaque, user.overlay.opaque,
>
> seed: cd478c17
> bit 6: trusted.overlay.metacopy, user.overlay.metacopy,
>
> seed: d35be1c8
> bit 9: trusted.overlay.protattr, security.SMACK64MMAP,
>
> seed: d3914458
> bit 1: trusted.overlay.redirect, security.SMACK64IPIN,
>
> seed: f3251337
> bit 7: user.overlay.opaque, security.SMACK64IPOUT,
>
> seed: f37d8900
> bit 9: trusted.overlay.impure, user.overlay.protattr,
>
> seed: fafd6c93
> bit 0: security.SMACK64, user.mime_type,
>
> seed: fcd35cbb
> bit 3: trusted.overlay.upper, user.overlay.redirect,
>
> seed: fe2e058a
> bit 14: user.overlay.impure, user.mime_type,
> ```
>
>
> Among all these candidates, I would prefer the following three
> candidates as for each inode the same xattr of overlayfs family xattrs,
> e.g. "overlay.metacopy", can not be prefixed with "trusted." and "user."
> at the same time.

Well, they *can* be the same at the same time, it just doesn't make
much sense, so it seems very unlikely. I agree that these make sense,
if the kernel is looking for user.overlay.metacopy, it would be in an
userxattr mounted overlayfs layer, and it would be very unlikely that
it had a trusted metacopy xattr, so there won't be any false negatives
causing us to do a full look up.

> ```
> seed: 25bbe08f
> bit 31: trusted.overlay.metacopy, user.overlay.metacopy,
>
> seed: cc922efd
> bit 3: trusted.overlay.opaque, user.overlay.opaque,
>
> seed: cd478c17
> bit 6: trusted.overlay.metacopy, user.overlay.metacopy,
> ```
>
>
> Any other idea?

Any of these three is good to me. I don't have any ideas directly
related to this. However, semi-related, it would be really cool if
fuse supported the same approach for xattr lookup. I.e. if lookup
could add a bloom filter value as part of the inode data, then we
could avoid a kernel<->fusefs-d transition for negative xattrs
lookups.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
[email protected] [email protected]


2023-07-04 09:34:30

by Gao Xiang

[permalink] [raw]
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 2023/7/4 16:05, Alexander Larsson wrote:
> On Tue, Jul 4, 2023 at 7:56 AM Jingbo Xu <[email protected]> wrote:
>>
>>

...

>> ```
>> seed: 25bbe08f
>> bit 31: trusted.overlay.metacopy, user.overlay.metacopy,
>>
>> seed: cc922efd
>> bit 3: trusted.overlay.opaque, user.overlay.opaque,
>>
>> seed: cd478c17
>> bit 6: trusted.overlay.metacopy, user.overlay.metacopy,
>> ```
>>
>>
>> Any other idea?
>
> Any of these three is good to me. I don't have any ideas directly
> related to this. However, semi-related, it would be really cool if
> fuse supported the same approach for xattr lookup. I.e. if lookup
> could add a bloom filter value as part of the inode data, then we
> could avoid a kernel<->fusefs-d transition for negative xattrs
> lookups.

I think we could just use a hardcoded magic number (e.g. 25bbe08f)
as a start first, since it seems already fulfill our requirement
and it's a compatible feature (and the implementation is also not
complex.) So I'm not quite worried about this.

Thanks,
Gao Xiang

>