2022-02-24 03:49:41

by Dharmendra Singh

Subject: [PATCH 0/2] FUSE: Implement atomic lookup + open

FUSE currently makes aggressive lookup calls into libfuse in certain code
paths. Some of these lookup calls can be avoided. The following two
patches address this issue.

The first patch handles the case where we open a file or directory for the
first time, or create a file (O_CREAT), but perform a lookup on it first.
After the lookup, we make a second call into libfuse to open the file.
These two separate calls into libfuse can be combined and performed as a
single call.

The second patch handles the case where we open an already existing file
(positive dentry). Before the open call, we re-validate the inode, and
this re-validation performs a lookup on the file to verify the inode.
This separate lookup can also be avoided (for non-directories) and
combined with the open call into libfuse.

Here are the links to the libfuse patches which implement atomic open:

https://github.com/d-hans/libfuse/commit/5255ce89decac71912e25b3cb4d79ebac538a456
https://github.com/d-hans/libfuse/commit/346b9feb2de5b6ff2b15882a38d7de0a0768c17c
https://github.com/d-hans/libfuse/commit/ac010dac446a9267b619afb138ab315d6c6eeb3e


Dharmendra Singh (2):
FUSE: Implement atomic lookup + open
FUSE: Avoid lookup in d_revalidate()

fs/fuse/dir.c | 170 +++++++++++++++++++++++++++++++++-----
fs/fuse/file.c | 30 ++++++-
fs/fuse/fuse_i.h | 13 ++-
fs/fuse/inode.c | 4 +-
fs/fuse/ioctl.c | 2 +-
include/uapi/linux/fuse.h | 2 +
6 files changed, 195 insertions(+), 26 deletions(-)

--
2.17.1


2022-03-01 16:24:00

by Miklos Szeredi

Subject: Re: [PATCH 0/2] FUSE: Implement atomic lookup + open

On Thu, 24 Feb 2022 at 04:27, Dharmendra Singh <[email protected]> wrote:
>
> FUSE currently makes aggressive lookup calls into libfuse in certain code
> paths. Some of these lookup calls can be avoided. The following two
> patches address this issue.

Can you give performance numbers?

Thanks,
Miklos

2022-03-11 23:08:21

by Dharmendra Singh

Subject: Re: [PATCH 0/2] FUSE: Implement atomic lookup + open

Thanks, Miklos. For measuring the performance, bonnie++ was run over a passthrough_ll mount on tmpfs.
When taking numbers on a VM, I saw non-deterministic results, so core binding was used to keep
passthrough_ll and bonnie++ on separate cores.

Here are the Google Sheets with the performance numbers:
https://docs.google.com/spreadsheets/d/1JRgF8DTR9xk5zz3_azmLcyy5kW3bgjjItmS8CYsAoT4/edit#gid=0
https://docs.google.com/spreadsheets/d/1JRgF8DTR9xk5zz3_azmLcyy5kW3bgjjItmS8CYsAoT4/edit#gid=1833203226

The following libfuse patches (commits on March 7 and March 8 in the first link) were used to test
these changes:
https://github.com/aakefbs/libfuse/commits/atomic-open-and-no-flush
https://github.com/libfuse/libfuse/pull/644

Parameters used in mounting passthrough_ll:
numactl --localalloc --physcpubind=16-23 passthrough_ll -f -osource=/tmp/source,allow_other,allow_root,
cache=never -o max_idle_threads=1 /tmp/dest
(Here cache=never results in direct I/O on the file.)

Parameters used in bonnie++:
In sheet 0B:
numactl --localalloc --physcpubind=0-7 bonnie++ -x 4 -q -s0 -d /tmp/dest/ -n 10:0:0:10 -r 0 -u 0 2>/dev/null

In sheet 1B:
numactl --localalloc --physcpubind=0-7 bonnie++ -x 4 -q -s0 -d /tmp/dest/ -n 10:1:1:10 -r 0 -u 0 2>/dev/null

Additional settings done on the testing machine:
cpupower frequency-set -g performance

Running bonnie++ gives results for Create/s, Read/s and Delete/s. The tables below summarise the numbers
for these three operations. Note that for a read of 0 bytes, bonnie++ performs create/open, close and stat,
but never triggers atomic open; the results in sheet 0B therefore carry the overhead of the extra stat calls.
In sheet 1B, bonnie++ was directed to read 1 byte, which does trigger the atomic open call, but the numbers
for that run include the overhead of the read operation itself rather than just a plain open/close.

Here are the tables summarising the performance numbers.

Table: 0B
                                              Sequential              |  Random
                                              Create/s Read/s  Del/s  |  Create/s Read/s  Del/s
Patched Libfuse                               -3.55%   -4.9%   -4.43% |  -0.4%    -1.6%   -1.0%
Patched Libfuse + No-Flush                    +22.3%   +6%     +5.15% |  +27.9%   +14.5%  +2.8%
Patched Libfuse + Patched FuseK               +22.9%   +6.1%   +5.3%  |  +28.3%   +14.5%  +2.3%
Patched Libfuse + Patched FuseK + No-Flush    +33.4%   -4.4%   -3.73% |  +38.8%   -2.5%   -2.0%



Table: 1B
                                              Sequential               |  Random
                                              Create/s Read/s  Del/s   |  Create/s Read/s  Del/s
Patched Libfuse                               -0.22%   -0.35%  -0.7%   |  -0.27%   -0.78%  -2.35%
Patched Libfuse + No-Flush                    +2.5%    +2.6%   -9.6%   |  +2.5%    -8.6%   -6.26%
Patched Libfuse + Patched FuseK               +1.63%   -1.0%   -11.45% |  +4.48%   -6.84%  -4.0%
Patched Libfuse + Patched FuseK + No-Flush    +32.43%  +26.61% +0.76%  |  +33.2%   +14.7%  -0.40%

Here, No-Flush = no flush request sent from the FUSE kernel into libfuse.

In Table 1B, the 4th row shows good improvements for both Create and Read, whereas Del is almost
unchanged. In Table 0B, 3rd row, Read performance is reduced; this turned out to be caused by some
changes in libfuse. That was fixed, and in the same row of Table 1B we see improved numbers.

In Table 0B, 3rd row, we have good numbers because bonnie++ read 0 bytes, which changed the behaviour
and affected performance, whereas in the same row of Table 1B the numbers are reduced because reading
1 byte involves flush calls from the FUSE kernel into libfuse.

These changes are not only about FUSE kernel/userspace context switches; our main goal is improved
performance for network file systems:
- fewer network round trips
- reduced load on metadata servers with thousands of clients

Reduced kernel/userspace context switches are 'just' a side effect.

Thanks,
Dharmendra