The mmap_miss field of struct file_ra_state could be accessed concurrently
during page faults, as noticed by KCSAN:
BUG: KCSAN: data-race in filemap_fault / filemap_map_pages
write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
filemap_fault+0x920/0xfc0
do_sync_mmap_readahead at mm/filemap.c:2384
(inlined by) filemap_fault at mm/filemap.c:2486
__xfs_filemap_fault+0x112/0x3e0 [xfs]
xfs_filemap_fault+0x74/0x90 [xfs]
__do_fault+0x9e/0x220
do_fault+0x4a0/0x920
__handle_mm_fault+0xc69/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
filemap_map_pages+0xc2e/0xd80
filemap_map_pages at mm/filemap.c:2625
do_fault+0x3da/0x920
__handle_mm_fault+0xc69/0xd00
handle_mm_fault+0xfc/0x2f0
do_page_fault+0x263/0x6f9
page_fault+0x34/0x40
Reported by Kernel Concurrency Sanitizer on:
CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G W L 5.5.0-next-20200210+ #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
ra->mmap_miss is used to inform readahead decisions, so a data race on it
is undesirable. Both the reads and the writes happen only under the
non-exclusive mmap_sem, so two concurrent updaters could even underflow
the counter. Fix the underflow by operating on a local copy and then
committing a single final store to ra->mmap_miss; the resulting small
inaccuracy in the counter is acceptable.
Suggested-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
---
v2: fix the underflow issue pointed out by Matthew.
mm/filemap.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478270e1..2e298db2e80f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2365,6 +2365,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
struct address_space *mapping = file->f_mapping;
struct file *fpin = NULL;
pgoff_t offset = vmf->pgoff;
+ unsigned int mmap_miss;
/* If we don't want any read-ahead, don't bother */
if (vmf->vma->vm_flags & VM_RAND_READ)
@@ -2380,14 +2381,15 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
}
/* Avoid banging the cache line if not needed */
- if (ra->mmap_miss < MMAP_LOTSAMISS * 10)
- ra->mmap_miss++;
+ mmap_miss = READ_ONCE(ra->mmap_miss);
+ if (mmap_miss < MMAP_LOTSAMISS * 10)
+ WRITE_ONCE(ra->mmap_miss, ++mmap_miss);
/*
* Do we miss much more than hit in this file? If so,
* stop bothering with read-ahead. It will only hurt.
*/
- if (ra->mmap_miss > MMAP_LOTSAMISS)
+ if (mmap_miss > MMAP_LOTSAMISS)
return fpin;
/*
@@ -2413,13 +2415,15 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
struct file_ra_state *ra = &file->f_ra;
struct address_space *mapping = file->f_mapping;
struct file *fpin = NULL;
+ unsigned int mmap_miss;
pgoff_t offset = vmf->pgoff;
/* If we don't want any read-ahead, don't bother */
if (vmf->vma->vm_flags & VM_RAND_READ)
return fpin;
- if (ra->mmap_miss > 0)
- ra->mmap_miss--;
+ mmap_miss = READ_ONCE(ra->mmap_miss);
+ if (mmap_miss)
+ WRITE_ONCE(ra->mmap_miss, --mmap_miss);
if (PageReadahead(page)) {
fpin = maybe_unlock_mmap_for_io(vmf, fpin);
page_cache_async_readahead(mapping, ra, file,
@@ -2586,6 +2590,7 @@ void filemap_map_pages(struct vm_fault *vmf,
unsigned long max_idx;
XA_STATE(xas, &mapping->i_pages, start_pgoff);
struct page *page;
+ unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
rcu_read_lock();
xas_for_each(&xas, page, end_pgoff) {
@@ -2622,8 +2627,8 @@ void filemap_map_pages(struct vm_fault *vmf,
if (page->index >= max_idx)
goto unlock;
- if (file->f_ra.mmap_miss > 0)
- file->f_ra.mmap_miss--;
+ if (mmap_miss > 0)
+ mmap_miss--;
vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
if (vmf->pte)
@@ -2643,6 +2648,7 @@ void filemap_map_pages(struct vm_fault *vmf,
break;
}
rcu_read_unlock();
+ WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
}
EXPORT_SYMBOL(filemap_map_pages);
--
2.21.0 (Apple Git-122.2)
On Mon, Feb 10, 2020 at 10:01:34PM -0500, Qian Cai wrote:
> [...]
> Suggested-by: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Qian Cai <[email protected]>
That's more than Suggested-by. The correct way to submit this patch is:
From: Kirill A. Shutemov <[email protected]>
(at the top of the patch, so it gets credited to Kirill)
then in this section:
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Qian Cai <[email protected]>
And now you can add:
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
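For reference, a resulting v3 could then carry the tags in exactly that order (the subject line and layout below are illustrative only, with addresses elided as elsewhere in this thread):

```
From: Kirill A. Shutemov <[email protected]>

<changelog as in v2>

Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Qian Cai <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
```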
> On Feb 10, 2020, at 10:49 PM, Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Feb 10, 2020 at 10:01:34PM -0500, Qian Cai wrote:
>> [...]
>> Suggested-by: Kirill A. Shutemov <[email protected]>
>> Signed-off-by: Qian Cai <[email protected]>
>
> That's more than Suggested-by. The correct way to submit this patch is:
>
> From: Kirill A. Shutemov <[email protected]>
> (at the top of the patch, so it gets credited to Kirill)
Sure, if Kirill is going to provide his Signed-off-by in the first place, I’ll be happy to
submit it on his behalf.
>
> then in this section:
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Tested-by: Qian Cai <[email protected]>
>
> And now you can add:
>
> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
On Mon, Feb 10, 2020 at 10:55:45PM -0500, Qian Cai wrote:
>
>
> > On Feb 10, 2020, at 10:49 PM, Matthew Wilcox <[email protected]> wrote:
> >
> > On Mon, Feb 10, 2020 at 10:01:34PM -0500, Qian Cai wrote:
> >> [...]
> >
> > That's more than Suggested-by. The correct way to submit this patch is:
> >
> > From: Kirill A. Shutemov <[email protected]>
> > (at the top of the patch, so it gets credited to Kirill)
>
> Sure, if Kirill is going to provide his Signed-off-by in the first place, I’ll be happy to
> submit it on his behalf.
Signed-off-by: Kirill A. Shutemov <[email protected]>
--
Kirill A. Shutemov