LinuxLists.cc - [BUG] KCSAN: data-race in xas_clear_mark / xas_find

2023-08-18 18:04:07

Subject: [BUG] KCSAN: data-race in xas_clear_mark / xas_find_marked

Hi,

This is your friendly bug reporter.

I have found this KCSAN reported bug on a torvalds tree kernel 6.5-rc6, on an Ubuntu 22.04 LTS system and
a Ryzen 9 7950X assembled box.

So, the kernel is reported to be tainted, but I reckon it the taint comes from a previous KCSAN detection,
not from a loaded proprietary module.

Excert from dmesg log:

[ 206.510010] ==================================================================
[ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked

[ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
[ 206.510081] xas_clear_mark+0xd5/0x180
[ 206.510097] __xa_clear_mark+0xd1/0x100
[ 206.510114] __folio_end_writeback+0x293/0x5a0
[ 206.510128] folio_end_writeback+0x60/0x170
[ 206.510143] end_page_writeback+0x2a/0xb0
[ 206.510155] btrfs_page_clear_writeback+0xbe/0xe0 [btrfs]
[ 206.510994] end_bio_extent_writepage+0x103/0x310 [btrfs]
[ 206.511817] __btrfs_bio_end_io+0x9b/0xc0 [btrfs]
[ 206.512640] btrfs_orig_bbio_end_io+0x70/0x170 [btrfs]
[ 206.513497] btrfs_simple_end_io+0x122/0x170 [btrfs]
[ 206.514350] bio_endio+0x2c4/0x2f0
[ 206.514362] blk_mq_end_request_batch+0x238/0x9b0
[ 206.514377] nvme_pci_complete_batch+0x38/0x1a0 [nvme]
[ 206.514437] nvme_irq+0xa0/0xb0 [nvme]
[ 206.514500] __handle_irq_event_percpu+0x7c/0x290
[ 206.514517] handle_irq_event+0x7c/0x100
[ 206.514533] handle_edge_irq+0x13d/0x450
[ 206.514549] __common_interrupt+0x4f/0x110
[ 206.514563] common_interrupt+0x9f/0xb0
[ 206.514583] asm_common_interrupt+0x27/0x40
[ 206.514599] kcsan_setup_watchpoint+0x274/0x3f0
[ 206.514612] __tsan_read8+0x11c/0x180
[ 206.514626] steal_from_bitmap.part.0+0x29f/0x410 [btrfs]
[ 206.515491] __btrfs_add_free_space+0x1b4/0x850 [btrfs]
[ 206.516361] btrfs_add_free_space_async_trimmed+0x62/0xa0 [btrfs]
[ 206.517231] add_new_free_space+0x127/0x160 [btrfs]
[ 206.518095] load_free_space_tree+0x552/0x680 [btrfs]
[ 206.518953] caching_thread+0x923/0xba0 [btrfs]
[ 206.519800] btrfs_work_helper+0xfa/0x620 [btrfs]
[ 206.520643] process_one_work+0x525/0x930
[ 206.520658] worker_thread+0x311/0x7e0
[ 206.520672] kthread+0x18b/0x1d0
[ 206.520684] ret_from_fork+0x43/0x70
[ 206.520701] ret_from_fork_asm+0x1b/0x30

[ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
[ 206.520735] xas_find_marked+0xe5/0x600
[ 206.520750] filemap_get_folios_tag+0xf9/0x3d0
[ 206.520763] __filemap_fdatawait_range+0xa1/0x180
[ 206.520777] filemap_fdatawait_range+0x13/0x30
[ 206.520790] btrfs_wait_ordered_range+0x86/0x180 [btrfs]
[ 206.521641] btrfs_sync_file+0x36e/0xa80 [btrfs]
[ 206.522495] vfs_fsync_range+0x70/0x120
[ 206.522509] __x64_sys_fsync+0x44/0x80
[ 206.522522] do_syscall_64+0x58/0x90
[ 206.522535] entry_SYSCALL_64_after_hwframe+0x73/0xdd

[ 206.522557] value changed: 0xfffffffffff80000 -> 0xfffffffffff00000

[ 206.522574] Reported by Kernel Concurrency Sanitizer on:
[ 206.522585] CPU: 6 PID: 2793 Comm: tracker-extract Tainted: G L 6.5.0-rc6+ #44
[ 206.522600] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 206.522608] ==================================================================

Unwound:

[ 206.510010] ==================================================================
[ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked

[ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
[ 206.510081] xas_clear_mark (./arch/x86/include/asm/bitops.h:178 ./include/asm-generic/bitops/instrumented-non-atomic.h:115 lib/xarray.c:102 lib/xarray.c:914)
[ 206.510097] __xa_clear_mark (lib/xarray.c:1923)
[ 206.510114] __folio_end_writeback (mm/page-writeback.c:2981)
[ 206.510128] folio_end_writeback (mm/filemap.c:1616)
[ 206.510143] end_page_writeback (mm/folio-compat.c:28)
[ 206.510155] btrfs_page_clear_writeback (fs/btrfs/subpage.c:646) btrfs
[ 206.510994] end_bio_extent_writepage (./include/linux/bio.h:84 fs/btrfs/extent_io.c:542) btrfs
[ 206.511817] __btrfs_bio_end_io (fs/btrfs/bio.c:117 fs/btrfs/bio.c:112) btrfs
[ 206.512640] btrfs_orig_bbio_end_io (fs/btrfs/bio.c:164) btrfs
[ 206.513497] btrfs_simple_end_io (fs/btrfs/bio.c:380) btrfs
[ 206.514350] bio_endio (block/bio.c:1617)
[ 206.514362] blk_mq_end_request_batch (block/blk-mq.c:837 block/blk-mq.c:1073)
[ 206.514377] nvme_pci_complete_batch (drivers/nvme/host/pci.c:986) nvme
[ 206.514437] nvme_irq (drivers/nvme/host/pci.c:1086) nvme
[ 206.514500] __handle_irq_event_percpu (kernel/irq/handle.c:158)
[ 206.514517] handle_irq_event (kernel/irq/handle.c:195 kernel/irq/handle.c:210)
[ 206.514533] handle_edge_irq (kernel/irq/chip.c:836)
[ 206.514549] __common_interrupt (./include/linux/irqdesc.h:161 arch/x86/kernel/irq.c:238 arch/x86/kernel/irq.c:257)
[ 206.514563] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
[ 206.514583] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
[ 206.514599] kcsan_setup_watchpoint (kernel/kcsan/core.c:705 (discriminator 1))
[ 206.514612] __tsan_read8 (kernel/kcsan/core.c:1025)
[ 206.514626] steal_from_bitmap.part.0 (./include/linux/find.h:186 fs/btrfs/free-space-cache.c:2557 fs/btrfs/free-space-cache.c:2613) btrfs
[ 206.515491] __btrfs_add_free_space (fs/btrfs/free-space-cache.c:2689 fs/btrfs/free-space-cache.c:2667) btrfs
[ 206.516361] btrfs_add_free_space_async_trimmed (fs/btrfs/free-space-cache.c:2798) btrfs
[ 206.517231] add_new_free_space (fs/btrfs/block-group.c:550) btrfs
[ 206.518095] load_free_space_tree (fs/btrfs/free-space-tree.c:1595 fs/btrfs/free-space-tree.c:1658) btrfs
[ 206.518953] caching_thread (fs/btrfs/block-group.c:873) btrfs
[ 206.519800] btrfs_work_helper (fs/btrfs/async-thread.c:314) btrfs
[ 206.520643] process_one_work (kernel/workqueue.c:2600)
[ 206.520658] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 206.520672] kthread (kernel/kthread.c:389)
[ 206.520684] ret_from_fork (arch/x86/kernel/process.c:145)
[ 206.520701] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)

[ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
[ 206.520735] xas_find_marked (./include/linux/xarray.h:1706 lib/xarray.c:1354)
[ 206.520750] filemap_get_folios_tag (mm/filemap.c:1975 mm/filemap.c:2273)
[ 206.520763] __filemap_fdatawait_range (mm/filemap.c:519)
[ 206.520777] filemap_fdatawait_range (mm/filemap.c:556)
[ 206.520790] btrfs_wait_ordered_range (fs/btrfs/ordered-data.c:839) btrfs
[ 206.521641] btrfs_sync_file (fs/btrfs/file.c:1859) btrfs
[ 206.522495] vfs_fsync_range (fs/sync.c:188)
[ 206.522509] __x64_sys_fsync (./include/linux/file.h:45 fs/sync.c:213 fs/sync.c:220 fs/sync.c:218 fs/sync.c:218)
[ 206.522522] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 206.522535] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[ 206.522557] value changed: 0xfffffffffff80000 -> 0xfffffffffff00000

[ 206.522574] Reported by Kernel Concurrency Sanitizer on:
[ 206.522585] CPU: 6 PID: 2793 Comm: tracker-extract Tainted: G L 6.5.0-rc6+ #44
[ 206.522600] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 206.522608] ==================================================================

Best regards,
Mirsad Todorovac

2023-08-19 15:31:48

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [BUG] KCSAN: data-race in xas_clear_mark / xas_find_marked

On Fri, Aug 18, 2023 at 10:01:32AM +0200, Mirsad Todorovac wrote:
> [ 206.510010] ==================================================================
> [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
>
> [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
> [ 206.510081] xas_clear_mark+0xd5/0x180
> [ 206.510097] __xa_clear_mark+0xd1/0x100
> [ 206.510114] __folio_end_writeback+0x293/0x5a0
> [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
> [ 206.520735] xas_find_marked+0xe5/0x600
> [ 206.520750] filemap_get_folios_tag+0xf9/0x3d0
Also, before submitting this kind of report, you should run the
trace through scripts/decode_stacktrace.sh to give us line numbers
instead of hex offsets, which are useless to anyone who doesn't have
your exact kernel build.

> [ 206.510010] ==================================================================
> [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
>
> [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
> [ 206.510081] xas_clear_mark (./arch/x86/include/asm/bitops.h:178 ./include/asm-generic/bitops/instrumented-non-atomic.h:115 lib/xarray.c:102 lib/xarray.c:914)
> [ 206.510097] __xa_clear_mark (lib/xarray.c:1923)
> [ 206.510114] __folio_end_writeback (mm/page-writeback.c:2981)

This path is properly using xa_lock_irqsave() before calling
__xa_clear_mark().

> [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
> [ 206.520735] xas_find_marked (./include/linux/xarray.h:1706 lib/xarray.c:1354)
> [ 206.520750] filemap_get_folios_tag (mm/filemap.c:1975 mm/filemap.c:2273)

This takes the RCU read lock before calling xas_find_marked() as it's
supposed to.

What garbage do I have to write to tell KCSAN it's wrong? The line
that's probably triggering it is currently:

unsigned long data = *addr & (~0UL << offset);

2023-08-19 16:33:52

by Mirsad Todorovac

[permalink] [raw]

Subject: Re: [BUG] KCSAN: data-race in xas_clear_mark / xas_find_marked

On 8/18/23 16:19, Matthew Wilcox wrote:
> On Fri, Aug 18, 2023 at 03:37:10PM +0200, Mirsad Todorovac wrote:
>> I am new to KCSAN. I was not aware of KCSAN false positives thus far, so my best bet was to report them.
>>
>> I thought that maybe READ_ONCE() was required, but I will trust your judgment.
>>
>> I hope I can find this resolved.
>
> I haven't looked into KCSAN in any detail, I don't know what the right
> way is to resolve this.

That's OK, I suppose.

I don't feel clarity about it either.

I am certain that the original developers have the big picture. :-)

Best regards,
Mirsad Todorovac

2023-09-14 20:09:08

by Jan Kara

[permalink] [raw]

Subject: Re: [BUG] KCSAN: data-race in xas_clear_mark / xas_find_marked

On Fri 18-08-23 13:21:17, Matthew Wilcox wrote:
> On Fri, Aug 18, 2023 at 10:01:32AM +0200, Mirsad Todorovac wrote:
> > [ 206.510010] ==================================================================
> > [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
> >
> > [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
> > [ 206.510081] xas_clear_mark+0xd5/0x180
> > [ 206.510097] __xa_clear_mark+0xd1/0x100
> > [ 206.510114] __folio_end_writeback+0x293/0x5a0
> > [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
> > [ 206.520735] xas_find_marked+0xe5/0x600
> > [ 206.520750] filemap_get_folios_tag+0xf9/0x3d0
> Also, before submitting this kind of report, you should run the
> trace through scripts/decode_stacktrace.sh to give us line numbers
> instead of hex offsets, which are useless to anyone who doesn't have
> your exact kernel build.
>
> > [ 206.510010] ==================================================================
> > [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
> >
> > [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
> > [ 206.510081] xas_clear_mark (./arch/x86/include/asm/bitops.h:178 ./include/asm-generic/bitops/instrumented-non-atomic.h:115 lib/xarray.c:102 lib/xarray.c:914)
> > [ 206.510097] __xa_clear_mark (lib/xarray.c:1923)
> > [ 206.510114] __folio_end_writeback (mm/page-writeback.c:2981)
>
> This path is properly using xa_lock_irqsave() before calling
> __xa_clear_mark().
>
> > [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
> > [ 206.520735] xas_find_marked (./include/linux/xarray.h:1706 lib/xarray.c:1354)
> > [ 206.520750] filemap_get_folios_tag (mm/filemap.c:1975 mm/filemap.c:2273)
>
> This takes the RCU read lock before calling xas_find_marked() as it's
> supposed to.
>
> What garbage do I have to write to tell KCSAN it's wrong? The line
> that's probably triggering it is currently:
>
> unsigned long data = *addr & (~0UL << offset);

I don't think it is actually wrong in this case. You're accessing xarray
only with RCU protection so it can be changing under your hands. For
example the code in xas_find_chunk():

unsigned long data = *addr & (~0UL << offset);
if (data)
return __ffs(data);

is prone to the compiler refetching 'data' from *addr after checking for
data != 0 and getting 0 the second time which would trigger undefined
behavior of __ffs(). So that code should definitely use READ_ONCE() to make
things safe.

BTW, find_next_bit() seems to need a similar treatment and in fact I'm not
sure why xas_find_chunk() has a special case for XA_CHUNK_SIZE ==
BITS_PER_LONG because find_next_bit() checks for that and handles that in a
fast path in the same way.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-09-17 19:11:53

by Mirsad Todorovac

[permalink] [raw]

Subject: Re: [BUG] KCSAN: data-race in xas_clear_mark / xas_find_marked

On 9/14/23 10:08, Jan Kara wrote:
> On Fri 18-08-23 13:21:17, Matthew Wilcox wrote:
>> On Fri, Aug 18, 2023 at 10:01:32AM +0200, Mirsad Todorovac wrote:
>>> [ 206.510010] ==================================================================
>>> [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
>>>
>>> [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
>>> [ 206.510081] xas_clear_mark+0xd5/0x180
>>> [ 206.510097] __xa_clear_mark+0xd1/0x100
>>> [ 206.510114] __folio_end_writeback+0x293/0x5a0
>>> [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
>>> [ 206.520735] xas_find_marked+0xe5/0x600
>>> [ 206.520750] filemap_get_folios_tag+0xf9/0x3d0
>> Also, before submitting this kind of report, you should run the
>> trace through scripts/decode_stacktrace.sh to give us line numbers
>> instead of hex offsets, which are useless to anyone who doesn't have
>> your exact kernel build.
>>
>>> [ 206.510010] ==================================================================
>>> [ 206.510035] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked
>>>
>>> [ 206.510067] write to 0xffff963df6a90fe0 of 8 bytes by interrupt on cpu 22:
>>> [ 206.510081] xas_clear_mark (./arch/x86/include/asm/bitops.h:178 ./include/asm-generic/bitops/instrumented-non-atomic.h:115 lib/xarray.c:102 lib/xarray.c:914)
>>> [ 206.510097] __xa_clear_mark (lib/xarray.c:1923)
>>> [ 206.510114] __folio_end_writeback (mm/page-writeback.c:2981)
>>
>> This path is properly using xa_lock_irqsave() before calling
>> __xa_clear_mark().
>>
>>> [ 206.520722] read to 0xffff963df6a90fe0 of 8 bytes by task 2793 on cpu 6:
>>> [ 206.520735] xas_find_marked (./include/linux/xarray.h:1706 lib/xarray.c:1354)
>>> [ 206.520750] filemap_get_folios_tag (mm/filemap.c:1975 mm/filemap.c:2273)
>>
>> This takes the RCU read lock before calling xas_find_marked() as it's
>> supposed to.
>>
>> What garbage do I have to write to tell KCSAN it's wrong? The line
>> that's probably triggering it is currently:
>>
>> unsigned long data = *addr & (~0UL << offset);
>
> I don't think it is actually wrong in this case. You're accessing xarray
> only with RCU protection so it can be changing under your hands. For
> example the code in xas_find_chunk():
>
> unsigned long data = *addr & (~0UL << offset);
> if (data)
> return __ffs(data);
>
> is prone to the compiler refetching 'data' from *addr after checking for
> data != 0 and getting 0 the second time which would trigger undefined
> behavior of __ffs(). So that code should definitely use READ_ONCE() to make
> things safe.
>
> BTW, find_next_bit() seems to need a similar treatment and in fact I'm not
> sure why xas_find_chunk() has a special case for XA_CHUNK_SIZE ==
> BITS_PER_LONG because find_next_bit() checks for that and handles that in a
> fast path in the same way.
>
> Honza

Hi,

Thank you for your insight on the matter.

I guess you meant something like implementing this:

include/linux/xarray.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index cb571dfcf4b1..1715fd322d62 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1720,7 +1720,7 @@ static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
offset++;
if (XA_CHUNK_SIZE == BITS_PER_LONG) {
if (offset < XA_CHUNK_SIZE) {
- unsigned long data = *addr & (~0UL << offset);
+ unsigned long data = READ_ONCE(*addr) & (~0UL << offset);
if (data)
return __ffs(data);
}

This apparently clears the KCSAN xas_find_marked() warning, so this might have been a data race after all.

Do you think we should escalate this to a formal patch?

Best regards,
Mirsad Todorovac