I recently got UAF by running generic/019 in qemu:
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x4508/0x68c0
Read of size 8 at addr ffff88811327f080 by task fio/11147
CPU: 6 PID: 11147 Comm: fio Tainted: G W 5.7.0-next-20200602+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
Call Trace:
dump_stack+0xf6/0x16e
? __lock_acquire+0x4508/0x68c0
? __lock_acquire+0x4508/0x68c0
print_address_description.constprop.0+0x1a/0x210
? __lock_acquire+0x4508/0x68c0
kasan_report.cold+0x1f/0x37
? lockdep_hardirqs_on_prepare+0x480/0x550
? __lock_acquire+0x4508/0x68c0
__lock_acquire+0x4508/0x68c0
? print_usage_bug+0x1f0/0x1f0
? finish_task_switch+0x126/0x5e0
? lockdep_hardirqs_on_prepare+0x550/0x550
? mark_held_locks+0x9e/0xe0
? __schedule+0x801/0x1d90
? _raw_spin_unlock_irq+0x1f/0x30
lock_acquire+0x182/0x790
? remove_wait_queue+0x1d/0x180
? __switch_to_asm+0x42/0x70
? lock_release+0x710/0x710
? __schedule+0x85c/0x1d90
? xfs_log_commit_cil+0x1d8e/0x2a50
? __sched_text_start+0x8/0x8
_raw_spin_lock_irqsave+0x32/0x50
? remove_wait_queue+0x1d/0x180
remove_wait_queue+0x1d/0x180
xfs_log_commit_cil+0x1d9e/0x2a50
? xlog_cil_empty+0x90/0x90
? wake_up_q+0x140/0x140
? rcu_read_lock_sched_held+0x9c/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
__xfs_trans_commit+0x292/0xec0
? xfs_trans_unreserve_and_mod_sb+0xab0/0xab0
? rcu_read_lock_bh_held+0xb0/0xb0
? xfs_isilocked+0x87/0x2e0
? xfs_trans_log_inode+0x1ad/0x480
xfs_vn_update_time+0x3eb/0x6d0
? xfs_setattr_mode.isra.0+0xa0/0xa0
? current_time+0xa8/0x110
? timestamp_truncate+0x2f0/0x2f0
? xfs_setattr_mode.isra.0+0xa0/0xa0
update_time+0x70/0xc0
file_update_time+0x2b7/0x490
? update_time+0xc0/0xc0
? __sb_start_write+0x197/0x3e0
__xfs_filemap_fault.constprop.0+0x1b7/0x480
do_page_mkwrite+0x1ac/0x470
do_wp_page+0x9e2/0x1b10
? do_raw_spin_lock+0x121/0x290
? finish_mkwrite_fault+0x4a0/0x4a0
? rwlock_bug.part.0+0x90/0x90
? handle_mm_fault+0xa81/0x3570
handle_mm_fault+0x1c65/0x3570
? __pmd_alloc+0x4c0/0x4c0
? vmacache_find+0x55/0x2a0
do_user_addr_fault+0x635/0xd42
exc_page_fault+0xdd/0x5b0
? asm_common_interrupt+0x8/0x40
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7f40e022336a
Code: Bad RIP value.
RSP: 002b:00007ffedefb0218 EFLAGS: 00010206
RAX: 00007f40b7a5a000 RBX: 0000000002562280 RCX: 00000000025633d0
RDX: 0000000000000fc0 RSI: 0000000002562420 RDI: 00007f40b7a5a000
RBP: 00007f40b8620190 R08: 0000000000000000 R09: 00007f40b7a5aff0
R10: 00007ffedeff8000 R11: 00007f40b7a5aff0 R12: 0000000000000001
R13: 0000000000001000 R14: 00000000025622a8 R15: 00007f40b8620198
Allocated by task 6826:
save_stack+0x1b/0x40
__kasan_kmalloc.constprop.0+0xc2/0xd0
kmem_alloc+0x154/0x450
xlog_cil_push_work+0xff/0x1250
process_one_work+0xa3e/0x17a0
worker_thread+0x8e2/0x1050
kthread+0x355/0x470
ret_from_fork+0x22/0x30
Freed by task 6826:
save_stack+0x1b/0x40
__kasan_slab_free+0x12c/0x170
kfree+0xd6/0x300
kvfree+0x42/0x50
xlog_cil_committed+0xa9c/0xf30
xlog_cil_push_work+0xa8c/0x1250
process_one_work+0xa3e/0x17a0
worker_thread+0x8e2/0x1050
kthread+0x355/0x470
ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff88811327f000
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 128 bytes inside of
256-byte region [ffff88811327f000, ffff88811327f100)
The buggy address belongs to the page:
page:ffffea00044c9f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00044c9f00 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head)
raw: 0200000000010200 dead000000000100 dead000000000122 ffff88811a40e800
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88811327ef80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88811327f000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88811327f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88811327f100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88811327f180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
I think the reason is that when 'ctx' is freed in xlog_cil_committed(),
a previous call to xlog_wait(&ctx->xc_ctx->push_wait, ...) hasn't finished
yet. Thus when remove_wait_queue() is called, UAF will be triggered
since 'ctx' was freed:
thread1 thread2 thread3
__xfs_trans_commit
xfs_log_commit_cil
xlog_wait
schedule
xlog_cil_push_work
wake_up_all
xlog_cil_committed
kmem_free
remove_wait_queue
spin_lock_irqsave --> UAF
I tried to fix the problem by using autoremove_wake_function() in
xlog_wait(), however, soft lockup will be triggered this way.
Instead, make sure waitqueue_active(&ctx->push_wait) return false before
freeing 'ctx'.
Signed-off-by: Yu Kuai <[email protected]>
---
fs/xfs/xfs_log_cil.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b43f0e8f43f2..59b21485b0fc 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -607,7 +607,7 @@ xlog_cil_committed(
if (!list_empty(&ctx->busy_extents))
xlog_discard_busy_extents(mp, ctx);
- else
+ else if (!waitqueue_active(&ctx->push_wait))
kmem_free(ctx);
}
--
2.25.4
On Thu, Jun 11, 2020 at 09:39:52AM +0800, Yu Kuai wrote:
> I recently got UAF by running generic/019 in qemu:
>
> ==================================================================
> BUG: KASAN: use-after-free in __lock_acquire+0x4508/0x68c0
> Read of size 8 at addr ffff88811327f080 by task fio/11147
....
> remove_wait_queue+0x1d/0x180
> xfs_log_commit_cil+0x1d9e/0x2a50
> __xfs_trans_commit+0x292/0xec0
Ok, so this is waking up from a the CIL context overrunning the hard
size limit....
> Freed by task 6826:
> save_stack+0x1b/0x40
> __kasan_slab_free+0x12c/0x170
> kfree+0xd6/0x300
> kvfree+0x42/0x50
> xlog_cil_committed+0xa9c/0xf30
> xlog_cil_push_work+0xa8c/0x1250
> process_one_work+0xa3e/0x17a0
> worker_thread+0x8e2/0x1050
> kthread+0x355/0x470
> ret_from_fork+0x22/0x30
Hmmmm. The CIL push work freed the context which means somethign
went wrong somewhere - we must be in CIL commit error path here...
/me checks generic/019
Oh, it's a repeated shutdown test. Right, so we're getting a
shutdown in the middle of a CIL push when the CIL is hard throttling
callers and the CIL context gets freed before the throttled tasks
can be woken.
Gotcha. Yup, that's a real issue, thanks for reporting it!
> I think the reason is that when 'ctx' is freed in xlog_cil_committed(),
> a previous call to xlog_wait(&ctx->xc_ctx->push_wait, ...) hasn't finished
> yet. Thus when remove_wait_queue() is called, UAF will be triggered
> since 'ctx' was freed:
>
> thread1 thread2 thread3
>
> __xfs_trans_commit
> xfs_log_commit_cil
> xlog_wait
> schedule
> xlog_cil_push_work
> wake_up_all
> xlog_cil_committed
> kmem_free
> remove_wait_queue
> spin_lock_irqsave --> UAF
Actually, it's a lot simpler:
thread1 thread2
__xfs_trans_commit
xfs_log_commit_cil
xlog_wait
schedule
xlog_cil_push_work
wake_up_all
<shutdown aborts commit>
xlog_cil_committed
kmem_free
remove_wait_queue
spin_lock_irqsave --> UAF
> Instead, make sure waitqueue_active(&ctx->push_wait) return false before
> freeing 'ctx'.
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> fs/xfs/xfs_log_cil.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index b43f0e8f43f2..59b21485b0fc 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -607,7 +607,7 @@ xlog_cil_committed(
>
> if (!list_empty(&ctx->busy_extents))
> xlog_discard_busy_extents(mp, ctx);
> - else
> + else if (!waitqueue_active(&ctx->push_wait))
> kmem_free(ctx);
That will just leak the memory instead, which is no better.
Let me go write a patch to fix this.
Cheers,
Dave.
--
Dave Chinner
[email protected]
From: Dave Chinner <[email protected]>
xlog_wait() on the CIL context can reference a freed context if the
waiter doesn't get scheduled before the CIL context is freed. This
can happen when a task is on the hard throttle and the CIL push
aborts due to a shutdown. This was detected by generic/019:
thread 1 thread 2
__xfs_trans_commit
xfs_log_commit_cil
<CIL size over hard throttle limit>
xlog_wait
schedule
xlog_cil_push_work
wake_up_all
<shutdown aborts commit>
xlog_cil_committed
kmem_free
remove_wait_queue
spin_lock_irqsave --> UAF
Fix it by moving the wait queue to the CIL rather than keeping it in
in the CIL context that gets freed on push completion. Because the
wait queue is now independent of the CIL context and we might have
multiple contexts in flight at once, only wake the waiters on the
push throttle when the context we are pushing is over the hard
throttle size threshold.
Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
Reported-by: Yu Kuai <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
---
fs/xfs/xfs_log_cil.c | 10 +++++-----
fs/xfs/xfs_log_priv.h | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b43f0e8f43f2e..9ed90368ab311 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -671,7 +671,8 @@ xlog_cil_push_work(
/*
* Wake up any background push waiters now this context is being pushed.
*/
- wake_up_all(&ctx->push_wait);
+ if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
+ wake_up_all(&cil->xc_push_wait);
/*
* Check if we've anything to push. If there is nothing, then we don't
@@ -743,13 +744,12 @@ xlog_cil_push_work(
/*
* initialise the new context and attach it to the CIL. Then attach
- * the current context to the CIL committing lsit so it can be found
+ * the current context to the CIL committing list so it can be found
* during log forces to extract the commit lsn of the sequence that
* needs to be forced.
*/
INIT_LIST_HEAD(&new_ctx->committing);
INIT_LIST_HEAD(&new_ctx->busy_extents);
- init_waitqueue_head(&new_ctx->push_wait);
new_ctx->sequence = ctx->sequence + 1;
new_ctx->cil = cil;
cil->xc_ctx = new_ctx;
@@ -937,7 +937,7 @@ xlog_cil_push_background(
if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
ASSERT(cil->xc_ctx->space_used < log->l_logsize);
- xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock);
+ xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
return;
}
@@ -1216,12 +1216,12 @@ xlog_cil_init(
INIT_LIST_HEAD(&cil->xc_committing);
spin_lock_init(&cil->xc_cil_lock);
spin_lock_init(&cil->xc_push_lock);
+ init_waitqueue_head(&cil->xc_push_wait);
init_rwsem(&cil->xc_ctx_lock);
init_waitqueue_head(&cil->xc_commit_wait);
INIT_LIST_HEAD(&ctx->committing);
INIT_LIST_HEAD(&ctx->busy_extents);
- init_waitqueue_head(&ctx->push_wait);
ctx->sequence = 1;
ctx->cil = cil;
cil->xc_ctx = ctx;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index ec22c7a3867f1..75a62870b63af 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -240,7 +240,6 @@ struct xfs_cil_ctx {
struct xfs_log_vec *lv_chain; /* logvecs being pushed */
struct list_head iclog_entry;
struct list_head committing; /* ctx committing list */
- wait_queue_head_t push_wait; /* background push throttle */
struct work_struct discard_endio_work;
};
@@ -274,6 +273,7 @@ struct xfs_cil {
wait_queue_head_t xc_commit_wait;
xfs_lsn_t xc_current_sequence;
struct work_struct xc_push_work;
+ wait_queue_head_t xc_push_wait; /* background push throttle */
} ____cacheline_aligned_in_smp;
/*
On 2020/6/11 10:28, Dave Chinner wrote
> Actually, it's a lot simpler:
>
> thread1 thread2
>
> __xfs_trans_commit
> xfs_log_commit_cil
> xlog_wait
> schedule
> xlog_cil_push_work
> wake_up_all
> <shutdown aborts commit>
> xlog_cil_committed
> kmem_free
>
> remove_wait_queue
> spin_lock_irqsave --> UAF
>
It's ture in this case, however, I got another result when I
tried to reporduce it, which seems 'ctx' can be freed in a
different path:
[ 64.975996] run fstests generic/019 at 2020-06-10 16:13:44
[ 69.126208] xfs filesystem being mounted at /tmp/scratch supports
timestamps until 2038 (0x7fffffff)
[ 99.166846] XFS (sdb): log I/O error -5
[ 99.170111] XFS (sdb): Log I/O Error Detected. Shutting down filesystem
[ 99.171071] XFS (sdb): Please unmount the filesystem and rectify the
problem(s)
[ 99.179569] ------------[ cut here ]------------
[ 99.180432] WARNING: CPU: 7 PID: 2705 at fs/iomap/buffered-io.c:1030
iomap_page_mkwrite_actor+0x17d/0x1b0
[ 99.181273]
==================================================================
[ 99.181758] Modules linked in:
[ 99.182806] BUG: KASAN: use-after-free in do_raw_spin_trylock+0x67/0x180
[ 99.183255] CPU: 7 PID: 2705 Comm: fio Not tainted
5.7.0-next-20200602+ #29
[ 99.184166] Read of size 4 at addr ffff888115f28868 by task fio/2704
[ 99.184171]
[ 99.185142] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31
04/01/2014
[ 99.185995] CPU: 6 PID: 2704 Comm: fio Not tainted
5.7.0-next-20200602+ #29
[ 99.186227] RIP: 0010:iomap_page_mkwrite_actor+0x17d/0x1b0
[ 99.188199] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31
04/01/2014
[ 99.189178] Code: 89 ef e8 a6 d4 c7 ff e9 3f ff ff ff e8 fc 64 ad ff
89 da 31 f6 48 89 ef e8 b0 1e f2 ff 49 89 dc e9 26 ff ff ff e8 e3 64 ad
ff <0f> 0b eb be e8 da 64 ad ff 4d 8d 67 ffe
[ 99.189899] Call Trace:
[ 99.191748] RSP: 0000:ffff88810599fa18 EFLAGS: 00010293
[ 99.194218] dump_stack+0xf6/0x16e
[ 99.194574] RAX: ffff888106df3680 RBX: 0000000000001000 RCX:
ffffffff94338cad
[ 99.195301] ? do_raw_spin_trylock+0x67/0x180
[ 99.195777] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000001
[ 99.195786] RBP: ffffea0002544800 R08: ffff888106df3680 R09:
fffff940004a8901
[ 99.196764] ? do_raw_spin_trylock+0x67/0x180
[ 99.196778] print_address_description.constprop.0.cold+0xd3/0x415
[ 99.197393] R10: ffffea0002544807 R11: fffff940004a8900 R12:
0000000000000000
[ 99.198380] ? do_raw_spin_trylock+0x67/0x180
[ 99.199349] R13: 0000000000000000 R14: ffff8880b6ee7280 R15:
ffffea00025447c8
[ 99.199939] kasan_report.cold+0x1f/0x37
[ 99.199949] ? __switch_to+0x510/0xef0
[ 99.200812] FS: 00007ff7d8562740(0000) GS:ffff88811a400000(0000)
knlGS:0000000000000000
[ 99.201755] ? do_raw_spin_trylock+0x67/0x180
[ 99.201765] check_memory_region+0x14e/0x1b0
[ 99.202378] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.203319] do_raw_spin_trylock+0x67/0x180
[ 99.203858] CR2: 00007ff7ae3e4ff0 CR3: 00000001028be000 CR4:
00000000000006e0
[ 99.204374] ? do_raw_spin_lock+0x290/0x290
[ 99.205470] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 99.206069] _raw_spin_lock_irqsave+0x44/0x80
[ 99.206641] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 99.206647] Call Trace:
[ 99.207420] ? remove_wait_queue+0x22/0x190
[ 99.208014] iomap_apply+0x2d7/0xc00
[ 99.208949] remove_wait_queue+0x22/0x190
[ 99.209539] ? iomap_page_create+0x350/0x350
[ 99.210510] xfs_log_commit_cil+0x1c7e/0x2940
[ 99.211115] ? trace_event_raw_event_iomap_apply+0x430/0x430
[ 99.212073] ? xlog_cil_empty+0x90/0x90
[ 99.212421] ? down_write_trylock+0x2f0/0x2f0
[ 99.212979] ? check_flags.part.0+0x430/0x430
[ 99.213486] ? update_time+0xc0/0xc0
[ 99.214025] ? wake_up_q+0x140/0x140
[ 99.214610] iomap_page_mkwrite+0x26a/0x3b0
[ 99.215201] ? xlog_ticket_alloc+0x3e0/0x3e0
[ 99.215966] ? iomap_page_create+0x350/0x350
[ 99.216487] ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 99.217097] __xfs_filemap_fault.constprop.0+0x32f/0x4e0
[ 99.217675] __xfs_trans_commit+0x2b3/0xf20
[ 99.218206] do_page_mkwrite+0x1b1/0x470
[ 99.218681] ? xfs_trans_unreserve_and_mod_sb+0xab0/0xab0
[ 99.219259] do_wp_page+0x9e7/0x1b10
[ 99.219838] ? xfs_isilocked+0x8c/0x2f0
[ 99.220439] ? finish_mkwrite_fault+0x4b0/0x4b0
[ 99.221102] ? xfs_trans_log_inode+0x1b2/0x480
[ 99.221817] ? do_user_addr_fault+0x2fd/0xd42
[ 99.222382] xfs_vn_update_time+0x40a/0x730
[ 99.222922] handle_mm_fault+0x1c9f/0x3600
[ 99.223634] ? xfs_setattr_mode.isra.0+0xb0/0xb0
[ 99.224140] ? __pmd_alloc+0x390/0x390
[ 99.224653] ? current_time+0xad/0x110
[ 99.225278] ? vmacache_find+0x5a/0x2a0
[ 99.225861] ? timestamp_truncate+0x2f0/0x2f0
[ 99.226461] do_user_addr_fault+0x635/0xd42
[ 99.227019] ? xfs_setattr_mode.isra.0+0xb0/0xb0
[ 99.227029] update_time+0x75/0xc0
[ 99.227760] exc_page_fault+0x12a/0x6f0
[ 99.228377] file_update_time+0x2bc/0x490
[ 99.228889] ? asm_exc_page_fault+0x8/0x30
[ 99.229414] ? update_time+0xc0/0xc0
[ 99.229933] asm_exc_page_fault+0x1e/0x30
[ 99.230515] ? __sb_start_write+0x225/0x3f0
[ 99.231099] RIP: 0033:0x7ff7d709636a
[ 99.231743] ? __sb_start_write+0x1a3/0x3f0
[ 99.231755] __xfs_filemap_fault.constprop.0+0x313/0x4e0
[ 99.232230] Code: Bad RIP value.
[ 99.232747] do_page_mkwrite+0x1b1/0x470
[ 99.233296] RSP: 002b:00007ffd89c6e2a8 EFLAGS: 00010206
[ 99.233841] do_wp_page+0x9e7/0x1b10
[ 99.234333] RAX: 00007ff7ae3e4000 RBX: 000000000159c280 RCX:
000000000159d3d0
[ 99.234863] ? finish_mkwrite_fault+0x4b0/0x4b0
[ 99.235432] RDX: 0000000000000fc0 RSI: 000000000159c420 RDI:
00007ff7ae3e4000
[ 99.235906] ? _raw_spin_unlock_irq+0x24/0x30
[ 99.236485] RBP: 00007ff7af493190 R08: 0000000000000000 R09:
00007ff7ae3e4ff0
[ 99.237198] handle_mm_fault+0x1c9f/0x3600
[ 99.237638] R10: 00007ffd89d04000 R11: 00007ff7ae3e4ff0 R12:
0000000000000001
[ 99.238163] ? __pmd_alloc+0x390/0x390
[ 99.238861] R13: 0000000000001000 R14: 000000000159c2a8 R15:
00007ff7af493198
[ 99.238874] irq event stamp: 0
[ 99.239368] ? vmacache_find+0x5a/0x2a0
[ 99.240324] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 99.240946] do_user_addr_fault+0x635/0xd42
[ 99.241903] hardirqs last disabled at (0): [<ffffffff93b3715c>]
copy_process+0x182c/0x64b0
[ 99.242488] exc_page_fault+0x12a/0x6f0
[ 99.243447] softirqs last enabled at (0): [<ffffffff93b371fd>]
copy_process+0x18cd/0x64b0
[ 99.244001] ? asm_exc_page_fault+0x8/0x30
[ 99.244959] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 99.244965] ---[ end trace a1aa182be3fbd861 ]---
[ 99.245482] asm_exc_page_fault+0x1e/0x30
[ 99.254277] RIP: 0033:0x7ff7d709636a
[ 99.254749] Code: Bad RIP value.
[ 99.255182] RSP: 002b:00007ffd89c6e2a8 EFLAGS: 00010206
[ 99.255879] RAX: 00007ff7ae7f5000 RBX: 000000000159c280 RCX:
000000000159d3d0
[ 99.256815] RDX: 0000000000000fc0 RSI: 000000000159c420 RDI:
00007ff7ae7f5000
[ 99.257760] RBP: 00007ff7af47a3a0 R08: 0000000000000000 R09:
00007ff7ae7f5ff0
[ 99.258720] R10: 00007ffd89d04000 R11: 00007ff7ae7f5ff0 R12:
0000000000000001
[ 99.259727] R13: 0000000000001000 R14: 000000000159c2a8 R15:
00007ff7af47a3a8
[ 99.260722]
[ 99.260934] Allocated by task 986:
[ 99.261403] save_stack+0x1b/0x40
[ 99.261843] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 99.262483] __kmalloc+0x179/0x3e0
[ 99.262937] kmem_alloc+0x175/0x4a0
[ 99.263416] xlog_cil_push_work+0x104/0x1250
[ 99.264038] process_one_work+0xa62/0x1840
[ 99.264583] worker_thread+0xa3/0x1050
[ 99.265087] kthread+0x35a/0x470
[ 99.265520] ret_from_fork+0x22/0x30
[ 99.266021]
[ 99.266238] Freed by task 1123:
[ 99.266658] save_stack+0x1b/0x40
[ 99.267095] __kasan_slab_free+0x117/0x160
[ 99.267652] kfree+0xdf/0x320
[ 99.268070] kvfree+0x47/0x50
[ 99.268500] xlog_cil_committed+0xaf0/0xf80
[ 99.269104] xlog_cil_process_committed+0xfe/0x1e0
[ 99.269743] xlog_state_do_callback+0x513/0x9b0
[ 99.270335] xfs_log_force_umount+0x2d6/0x4a0
[ 99.270925] xfs_do_force_shutdown+0xa2/0x220
[ 99.271514] xlog_ioend_work+0x13a/0x240
[ 99.272042] process_one_work+0xa62/0x1840
[ 99.272582] worker_thread+0xa3/0x1050
[ 99.273104] kthread+0x35a/0x470
[ 99.273527] ret_from_fork+0x22/0x30
[ 99.274008]
[ 99.274221] The buggy address belongs to the object at ffff888115f28800
[ 99.274221] which belongs to the cache kmalloc-512 of size 512
[ 99.275821] The buggy address is located 104 bytes inside of
[ 99.275821] 512-byte region [ffff888115f28800, ffff888115f28a00)
[ 99.277477] The buggy address belongs to the page:
[ 99.278121] page:ffffea000457ca00 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 head:ffffea000457ca00 order:3
compound_mapcount:0 compound_pincount:0
[ 99.279988] flags: 0x200000000010200(slab|head)
[ 99.280582] raw: 0200000000010200 ffffea0004156c00 0000000400000004
ffff888117c0e580
[ 99.281615] raw: 0000000000000000 0000000080200020 00000001ffffffff
0000000000000000
[ 99.282615] page dumped because: kasan: bad access detected
[ 99.283364]
[ 99.283585] Memory state around the buggy address:
[ 99.284227] ffff888115f28700: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 99.285206] ffff888115f28780: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 99.286164] >ffff888115f28800: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 99.287116] ^
[ 99.287962] ffff888115f28880: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 99.288915] ffff888115f28900: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 99.289874]
==================================================================
Thanks!
Yu Kuai
On Thu, Jun 11, 2020 at 11:01:38AM +0800, yukuai (C) wrote:
> On 2020/6/11 10:28, Dave Chinner wrote
> > Actually, it's a lot simpler:
> >
> > thread1 thread2
> >
> > __xfs_trans_commit
> > xfs_log_commit_cil
> > xlog_wait
> > schedule
> > xlog_cil_push_work
> > wake_up_all
> > <shutdown aborts commit>
> > xlog_cil_committed
> > kmem_free
> >
> > remove_wait_queue
> > spin_lock_irqsave --> UAF
> >
>
> It's ture in this case, however, I got another result when I
> tried to reporduce it, which seems 'ctx' can be freed in a
> different path:
Yup, it's effectively the same thing because of the nature of the IO
failures (generated at submit time) and scheduler behaviour of
workqueues. THis means the IO completion that processes the error is
is queued to a workqueue on the same CPU. When thread 2 finishes
running (it hasn't seen an error yet) the completion work will get
get scheduled ahead of thread1 (cpu bound kernel task vs unbound
user task). The completion work then runs the shutdown because it
saw a log IO error and because it's the commit record bio it runs
the journal checkpoint completion to abort all the items attached to
it and free the CIL context. Then thread 1 runs again.
The only difference between the two cases is which IO of the CIL
commit the request was failed on....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Jun 11, 2020 at 12:45:03PM +1000, Dave Chinner wrote:
>
> From: Dave Chinner <[email protected]>
>
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
>
> thread 1 thread 2
>
> __xfs_trans_commit
> xfs_log_commit_cil
> <CIL size over hard throttle limit>
> xlog_wait
> schedule
> xlog_cil_push_work
> wake_up_all
> <shutdown aborts commit>
> xlog_cil_committed
> kmem_free
>
> remove_wait_queue
> spin_lock_irqsave --> UAF
>
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.
>
> Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
> Reported-by: Yu Kuai <[email protected]>
> Signed-off-by: Dave Chinner <[email protected]>
> ---
Looks reasonable:
Reviewed-by: Brian Foster <[email protected]>
> fs/xfs/xfs_log_cil.c | 10 +++++-----
> fs/xfs/xfs_log_priv.h | 2 +-
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index b43f0e8f43f2e..9ed90368ab311 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -671,7 +671,8 @@ xlog_cil_push_work(
> /*
> * Wake up any background push waiters now this context is being pushed.
> */
> - wake_up_all(&ctx->push_wait);
> + if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
> + wake_up_all(&cil->xc_push_wait);
>
> /*
> * Check if we've anything to push. If there is nothing, then we don't
> @@ -743,13 +744,12 @@ xlog_cil_push_work(
>
> /*
> * initialise the new context and attach it to the CIL. Then attach
> - * the current context to the CIL committing lsit so it can be found
> + * the current context to the CIL committing list so it can be found
> * during log forces to extract the commit lsn of the sequence that
> * needs to be forced.
> */
> INIT_LIST_HEAD(&new_ctx->committing);
> INIT_LIST_HEAD(&new_ctx->busy_extents);
> - init_waitqueue_head(&new_ctx->push_wait);
> new_ctx->sequence = ctx->sequence + 1;
> new_ctx->cil = cil;
> cil->xc_ctx = new_ctx;
> @@ -937,7 +937,7 @@ xlog_cil_push_background(
> if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
> trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
> ASSERT(cil->xc_ctx->space_used < log->l_logsize);
> - xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock);
> + xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
> return;
> }
>
> @@ -1216,12 +1216,12 @@ xlog_cil_init(
> INIT_LIST_HEAD(&cil->xc_committing);
> spin_lock_init(&cil->xc_cil_lock);
> spin_lock_init(&cil->xc_push_lock);
> + init_waitqueue_head(&cil->xc_push_wait);
> init_rwsem(&cil->xc_ctx_lock);
> init_waitqueue_head(&cil->xc_commit_wait);
>
> INIT_LIST_HEAD(&ctx->committing);
> INIT_LIST_HEAD(&ctx->busy_extents);
> - init_waitqueue_head(&ctx->push_wait);
> ctx->sequence = 1;
> ctx->cil = cil;
> cil->xc_ctx = ctx;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index ec22c7a3867f1..75a62870b63af 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -240,7 +240,6 @@ struct xfs_cil_ctx {
> struct xfs_log_vec *lv_chain; /* logvecs being pushed */
> struct list_head iclog_entry;
> struct list_head committing; /* ctx committing list */
> - wait_queue_head_t push_wait; /* background push throttle */
> struct work_struct discard_endio_work;
> };
>
> @@ -274,6 +273,7 @@ struct xfs_cil {
> wait_queue_head_t xc_commit_wait;
> xfs_lsn_t xc_current_sequence;
> struct work_struct xc_push_work;
> + wait_queue_head_t xc_push_wait; /* background push throttle */
> } ____cacheline_aligned_in_smp;
>
> /*
>
On 2020/6/11 10:45, Dave Chinner wrote:
>
> From: Dave Chinner <[email protected]>
>
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
>
> thread 1 thread 2
>
> __xfs_trans_commit
> xfs_log_commit_cil
> <CIL size over hard throttle limit>
> xlog_wait
> schedule
> xlog_cil_push_work
> wake_up_all
> <shutdown aborts commit>
> xlog_cil_committed
> kmem_free
>
> remove_wait_queue
> spin_lock_irqsave --> UAF
>
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.
Hi, Dave,
How do you think about the following fix:
1. use autoremove_wake_func(), and remove remove_wait_queue() to
avoid UAF.
2. add finish_wait().
@@ -576,12 +576,13 @@ xlog_wait(
__releases(lock)
{
DECLARE_WAITQUEUE(wait, current);
+ wait.func = autoremove_wake_function;
add_wait_queue_exclusive(wq, &wait);
__set_current_state(TASK_UNINTERRUPTIBLE);
spin_unlock(lock);
schedule();
- remove_wait_queue(wq, &wait);
+ finish_wait(wq, &wait);
}
Best regards!
Yu Kuai
On Tue, Jun 16, 2020 at 09:16:09AM +0800, yukuai (C) wrote:
> On 2020/6/11 10:45, Dave Chinner wrote:
> >
> > From: Dave Chinner <[email protected]>
> >
> > xlog_wait() on the CIL context can reference a freed context if the
> > waiter doesn't get scheduled before the CIL context is freed. This
> > can happen when a task is on the hard throttle and the CIL push
> > aborts due to a shutdown. This was detected by generic/019:
> >
> > thread 1 thread 2
> >
> > __xfs_trans_commit
> > xfs_log_commit_cil
> > <CIL size over hard throttle limit>
> > xlog_wait
> > schedule
> > xlog_cil_push_work
> > wake_up_all
> > <shutdown aborts commit>
> > xlog_cil_committed
> > kmem_free
> >
> > remove_wait_queue
> > spin_lock_irqsave --> UAF
> >
> > Fix it by moving the wait queue to the CIL rather than keeping it in
> > in the CIL context that gets freed on push completion. Because the
> > wait queue is now independent of the CIL context and we might have
> > multiple contexts in flight at once, only wake the waiters on the
> > push throttle when the context we are pushing is over the hard
> > throttle size threshold.
>
> Hi, Dave,
>
> How do you think about the following fix:
>
> 1. use autoremove_wake_func(), and remove remove_wait_queue() to
> avoid UAF.
> 2. add finish_wait().
>
> @@ -576,12 +576,13 @@ xlog_wait(
> __releases(lock)
> {
> DECLARE_WAITQUEUE(wait, current);
> + wait.func = autoremove_wake_function;
>
> add_wait_queue_exclusive(wq, &wait);
> __set_current_state(TASK_UNINTERRUPTIBLE);
> spin_unlock(lock);
> schedule();
> - remove_wait_queue(wq, &wait);
> + finish_wait(wq, &wait);
> }
Yes, that would address this specific symptom of the problem, but it
doesn't fix the problem root cause: that the wq can be freed while
this function sleeps. IMO, this sort of change leaves a trap for
future modifications - all the code calling xlog_wait() assumes the
embedded wq the task is sleeping on still exists after waiting so we
really should be fixing the problem the incorrect existence
guarantee in the CIL code that you tripped over.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Jun 11, 2020 at 12:45:03PM +1000, Dave Chinner wrote:
>
> From: Dave Chinner <[email protected]>
>
> xlog_wait() on the CIL context can reference a freed context if the
> waiter doesn't get scheduled before the CIL context is freed. This
> can happen when a task is on the hard throttle and the CIL push
> aborts due to a shutdown. This was detected by generic/019:
>
> thread 1 thread 2
>
> __xfs_trans_commit
> xfs_log_commit_cil
> <CIL size over hard throttle limit>
> xlog_wait
> schedule
> xlog_cil_push_work
> wake_up_all
> <shutdown aborts commit>
> xlog_cil_committed
> kmem_free
>
> remove_wait_queue
> spin_lock_irqsave --> UAF
>
> Fix it by moving the wait queue to the CIL rather than keeping it in
> in the CIL context that gets freed on push completion. Because the
> wait queue is now independent of the CIL context and we might have
> multiple contexts in flight at once, only wake the waiters on the
> push throttle when the context we are pushing is over the hard
> throttle size threshold.
>
> Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push")
> Reported-by: Yu Kuai <[email protected]>
> Signed-off-by: Dave Chinner <[email protected]>
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>