A data race occurs when two concurrent code paths access
fuse_conn->num_background simultaneously. Specifically,
fuse_request_end() reads and modifies ->num_background while holding
bg_lock, whereas fuse_readahead() reads ->num_background without
acquiring any lock beforehand. KCSAN flags this potential data race:
BUG: KCSAN: data-race in fuse_readahead [fuse] / fuse_request_end [fuse]
read-write to 0xffff8883a6666598 of 4 bytes by task 113809 on cpu 39:
fuse_request_end (fs/fuse/dev.c:318) fuse
fuse_dev_do_write (fs/fuse/dev.c:?) fuse
fuse_dev_write (fs/fuse/dev.c:?) fuse
...
read to 0xffff8883a6666598 of 4 bytes by task 113787 on cpu 8:
fuse_readahead (fs/fuse/file.c:1005) fuse
read_pages (mm/readahead.c:166)
page_cache_ra_unbounded (mm/readahead.c:?)
...
value changed: 0x00000001 -> 0x00000000
Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
to silence such KCSAN reports.
Suggested-by: Miklos Szeredi <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
fs/fuse/dev.c | 6 ++++--
fs/fuse/file.c | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3ec8bb5e68ff..8e63dba49eff 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -282,6 +282,7 @@ void fuse_request_end(struct fuse_req *req)
struct fuse_mount *fm = req->fm;
struct fuse_conn *fc = fm->fc;
struct fuse_iqueue *fiq = &fc->iq;
+ unsigned int num_background;
if (test_and_set_bit(FR_FINISHED, &req->flags))
goto put_request;
@@ -301,7 +302,8 @@ void fuse_request_end(struct fuse_req *req)
if (test_bit(FR_BACKGROUND, &req->flags)) {
spin_lock(&fc->bg_lock);
clear_bit(FR_BACKGROUND, &req->flags);
- if (fc->num_background == fc->max_background) {
+ num_background = READ_ONCE(fc->num_background);
+ if (num_background == fc->max_background) {
fc->blocked = 0;
wake_up(&fc->blocked_waitq);
} else if (!fc->blocked) {
@@ -315,7 +317,7 @@ void fuse_request_end(struct fuse_req *req)
wake_up(&fc->blocked_waitq);
}
- fc->num_background--;
+ WRITE_ONCE(fc->num_background, num_background - 1);
fc->active_background--;
flush_bg_queue(fc);
spin_unlock(&fc->bg_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce4157640..07331889bbf3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1002,7 +1002,7 @@ static void fuse_readahead(struct readahead_control *rac)
struct fuse_io_args *ia;
struct fuse_args_pages *ap;
- if (fc->num_background >= fc->congestion_threshold &&
+ if (READ_ONCE(fc->num_background) >= fc->congestion_threshold &&
rac->ra->async_size >= readahead_count(rac))
/*
* Congested and only async pages left, so skip the
--
2.43.0
On Thu, 9 May 2024 at 14:57, Breno Leitao <[email protected]> wrote:
> Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
> to silence such KCSAN reports.
I'm not sure the write side part is really needed, since the lock is
properly protecting against concurrent readers/writers within the
locked region.
Does KCSAN still complain if you just add the READ_ONCE() to fuse_readahead()?
Thanks,
Miklos
Hello Miklos,
On Fri, May 10, 2024 at 11:21:19AM +0200, Miklos Szeredi wrote:
> On Thu, 9 May 2024 at 14:57, Breno Leitao <[email protected]> wrote:
>
> > Annotate the reader with READ_ONCE() and the writer with WRITE_ONCE()
> > to silence such KCSAN reports.
>
> I'm not sure the write side part is really needed, since the lock is
> properly protecting against concurrent readers/writers within the
> locked region.
I understand that num_background is read from an unlocked region
(fuse_readahead()).
> Does KCSAN still complain if you just add the READ_ONCE() to fuse_readahead()?
I haven't checked, but, looking at the documentation, it says that both
sides need to be marked. Here is an example very similar to ours, from
tools/memory-model/Documentation/access-marking.txt:
Lock-Protected Writes With Lockless Reads
-----------------------------------------
For another example, suppose a shared variable "foo" is updated only
while holding a spinlock, but is read locklessly. The code might look
as follows:
int foo;
DEFINE_SPINLOCK(foo_lock);
void update_foo(int newval)
{
spin_lock(&foo_lock);
WRITE_ONCE(foo, newval);
ASSERT_EXCLUSIVE_WRITER(foo);
do_something(newval);
spin_unlock(&foo_lock);
}
int read_foo(void)
{
do_something_else();
return READ_ONCE(foo);
}
Because foo is read locklessly, all accesses are marked.
From my understanding, we need a WRITE_ONCE() inside the lock, because
the bg_lock lock in fuse_request_end() is invisible to fuse_readahead(),
and fuse_readahead() might read a num_background value that was written
non-atomically/torn (if there is no WRITE_ONCE()).
That said, if the reader (fuse_readahead()) can handle possibly
corrupted data, we can mark it with the data_race() annotation. Then I
understand we don't need to mark the write with WRITE_ONCE().
Here is what access-marking.txt says about this case:
Here are some situations where data_race() should be used instead of
READ_ONCE() and WRITE_ONCE():
1. Data-racy loads from shared variables whose values are used only
for diagnostic purposes.
2. Data-racy reads whose values are checked against marked reload.
3. Reads whose values feed into error-tolerant heuristics.
4. Writes setting values that feed into error-tolerant heuristics.
Anyway, I am more than happy to test with only a READ_ONCE() on the
reader side, if that is the approach you prefer.
Thanks!
On Mon, 13 May 2024 at 14:41, Breno Leitao <[email protected]> wrote:
> That said, if the reader (fuse_readahead()) can handle possibly
> corrupted data, we can mark it with the data_race() annotation. Then I
> understand we don't need to mark the write with WRITE_ONCE().
Adding Willy, since the readahead code in fuse is fairly special.
I don't think it actually matters if "fc->num_background >=
fc->congestion_threshold" returns a false positive or a false negative,
but I don't have a full understanding of how readahead works.
Willy, can you please look at fuse_readahead() to confirm that
breaking out of the loop is okay if (rac->ra->async_size >=
readahead_count(rac)), no matter what?
Thanks,
Miklos