Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).
The race can be demonstrated below:
Thread-1 Thread-2
| nci_dev_up()
| nci_open_device()
| __nci_request(nci_reset_req)
| nci_send_cmd
| queue_work(cmd_work)
nci_unregister_device() |
nci_close_device() | ...
del_timer_sync(cmd_timer)[1] |
... | Worker
nci_free_device() | nci_cmd_work()
kfree(ndev)[3] | mod_timer(cmd_timer)[2]
In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.
This UAF is easy to trigger, crash trace by POC is like below
[ 66.703713] ==================================================================
[ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[ 66.703974]
[ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[ 66.703974] Call Trace:
[ 66.703974] <TASK>
[ 66.703974] dump_stack_lvl+0x57/0x7d
[ 66.703974] print_report.cold+0x5e/0x5db
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] kasan_report+0xbe/0x1c0
[ 66.703974] ? enqueue_timer+0x448/0x490
[ 66.703974] enqueue_timer+0x448/0x490
[ 66.703974] __mod_timer+0x5e6/0xb80
[ 66.703974] ? mark_held_locks+0x9e/0xe0
[ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 66.703974] ? queue_work_on+0x61/0x80
[ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130
[ 66.703974] process_one_work+0x8bb/0x1510
[ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230
[ 66.703974] ? rwlock_bug.part.0+0x90/0x90
[ 66.703974] ? _raw_spin_lock_irq+0x41/0x50
[ 66.703974] worker_thread+0x575/0x1190
[ 66.703974] ? process_one_work+0x1510/0x1510
[ 66.703974] kthread+0x2a0/0x340
[ 66.703974] ? kthread_complete_and_exit+0x20/0x20
[ 66.703974] ret_from_fork+0x22/0x30
[ 66.703974] </TASK>
[ 66.703974]
[ 66.703974] Allocated by task 267:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] __kasan_kmalloc+0x81/0xa0
[ 66.703974] nci_allocate_device+0xd3/0x390
[ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0
[ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd
[ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0
[ 66.703974] tty_ioctl+0x764/0x1310
[ 66.703974] __x64_sys_ioctl+0x122/0x190
[ 66.703974] do_syscall_64+0x3b/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 66.703974]
[ 66.703974] Freed by task 406:
[ 66.703974] kasan_save_stack+0x1e/0x40
[ 66.703974] kasan_set_track+0x21/0x30
[ 66.703974] kasan_set_free_info+0x20/0x30
[ 66.703974] __kasan_slab_free+0x108/0x170
[ 66.703974] kfree+0xb0/0x330
[ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0
[ 66.703974] nci_uart_tty_close+0xdf/0x180
[ 66.703974] tty_ldisc_kill+0x73/0x110
[ 66.703974] tty_ldisc_hangup+0x281/0x5b0
[ 66.703974] __tty_hangup.part.0+0x431/0x890
[ 66.703974] tty_release+0x3a8/0xc80
[ 66.703974] __fput+0x1f0/0x8c0
[ 66.703974] task_work_run+0xc9/0x170
[ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0
[ 66.703974] syscall_exit_to_user_mode+0x19/0x50
[ 66.703974] do_syscall_64+0x48/0x90
[ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <[email protected]>
---
net/nfc/nci/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index d2537383a3e8..0d7763c322b5 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev)
mutex_lock(&ndev->req_lock);
if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
+ /* Need to flush the cmd wq in case
+ * there is a queued/running cmd_work
+ */
+ flush_workqueue(ndev->cmd_wq);
del_timer_sync(&ndev->cmd_timer);
del_timer_sync(&ndev->data_timer);
mutex_unlock(&ndev->req_lock);
--
2.35.1
Hello:
This patch was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:
On Wed, 13 Apr 2022 00:04:30 +0800 you wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
>
> The race can be demonstrated below:
>
> Thread-1 Thread-2
> | nci_dev_up()
> | nci_open_device()
> | __nci_request(nci_reset_req)
> | nci_send_cmd
> | queue_work(cmd_work)
> nci_unregister_device() |
> nci_close_device() | ...
> del_timer_sync(cmd_timer)[1] |
> ... | Worker
> nci_free_device() | nci_cmd_work()
> kfree(ndev)[3] | mod_timer(cmd_timer)[2]
>
> [...]
Here is the summary with links:
- [v0] nfc: nci: add flush_workqueue to prevent uaf
https://git.kernel.org/netdev/net/c/ef27324e2cb7
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
On 12/04/2022 18:04, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
>
> The race can be demonstrated below:
>
Thanks!
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Best regards,
Krzysztof
Hello Guenter,
> I have been wondering about this and the same code further below.
> What prevents the command timer from firing after the call to
> flush_workqueue() ?
>
> Thanks,
> Guenter
>
From my understanding, once the flush_workqueue() is executed, the work that queued in
ndev->cmd_wq will be taken the care of.
That is, once the flush_workqueue() is finished, it promises there is no executing or
pending nci_cmd_work() ever.
static void nci_cmd_work(struct work_struct *work)
{
// ...
mod_timer(&ndev->cmd_timer,
jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
// ...
}
The command timer is still able be fired because the mod_timer() here. That is why the
del_timer_sync() is necessary after the flush_workqueue().
One very puzzling part is that you may find out the timer queue the work again
/* NCI command timer function */
static void nci_cmd_timer(struct timer_list *t)
{
// ...
queue_work(ndev->cmd_wq, &ndev->cmd_work);
}
But I found that this is okay because there is no packets in ndev->cmd_q buffers hence
even there is a queued nci_cmd_work(), it simply checks the queue and returns.
That is, the old race picture as below
> Thread-1 Thread-2
> | nci_dev_up()
> | nci_open_device()
> | __nci_request(nci_reset_req)
> | nci_send_cmd
> | queue_work(cmd_work)
> nci_unregister_device() |
> nci_close_device() | ...
> del_timer_sync(cmd_timer)[1] |
> ... | Worker
> nci_free_device() | nci_cmd_work()
> kfree(ndev)[3] | mod_timer(cmd_timer)[2]
is impossible now because the patched flush_workqueue() make the race like below
> Thread-1 Thread-2
> | nci_dev_up()
> | nci_open_device()
> | __nci_request(nci_reset_req)
> | nci_send_cmd
> | queue_work(cmd_work)
> nci_unregister_device() |
> nci_close_device() | ...
> flush_workqueue()[patch] | Worker
> | nci_cmd_work()
> | mod_timer(cmd_timer)[2]
> // work over then return
> del_timer_sync(cmd_timer)[1] |
> | Timer
> | nci_cmd_timer()
> |
> // timer over then return |
> ... |
> nci_free_device() |
> kfree(ndev)[3] |
With above thinkings and the given fact that my POC didn't raise the UAF, I think the
flush_workqueue() + del_timer_sync() combination is okay to hinder this race.
Tell me if there is anything wrong.
Regards
Lin Ma
On Wed, Apr 13, 2022 at 12:04:30AM +0800, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
>
> The race can be demonstrated below:
>
> Thread-1 Thread-2
> | nci_dev_up()
> | nci_open_device()
> | __nci_request(nci_reset_req)
> | nci_send_cmd
> | queue_work(cmd_work)
> nci_unregister_device() |
> nci_close_device() | ...
> del_timer_sync(cmd_timer)[1] |
> ... | Worker
> nci_free_device() | nci_cmd_work()
> kfree(ndev)[3] | mod_timer(cmd_timer)[2]
>
> In short, the cleanup routine thought that the cmd_timer has already
> been detached by [1] but the mod_timer can re-attach the timer [2], even
> it is already released [3], resulting in UAF.
>
> This UAF is easy to trigger, crash trace by POC is like below
>
> [ 66.703713] ==================================================================
> [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
> [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
> [ 66.703974]
> [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
> [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
> [ 66.703974] Call Trace:
> [ 66.703974] <TASK>
> [ 66.703974] dump_stack_lvl+0x57/0x7d
> [ 66.703974] print_report.cold+0x5e/0x5db
> [ 66.703974] ? enqueue_timer+0x448/0x490
> [ 66.703974] kasan_report+0xbe/0x1c0
> [ 66.703974] ? enqueue_timer+0x448/0x490
> [ 66.703974] enqueue_timer+0x448/0x490
> [ 66.703974] __mod_timer+0x5e6/0xb80
> [ 66.703974] ? mark_held_locks+0x9e/0xe0
> [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0
> [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410
> [ 66.703974] ? queue_work_on+0x61/0x80
> [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130
> [ 66.703974] process_one_work+0x8bb/0x1510
> [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410
> [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230
> [ 66.703974] ? rwlock_bug.part.0+0x90/0x90
> [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50
> [ 66.703974] worker_thread+0x575/0x1190
> [ 66.703974] ? process_one_work+0x1510/0x1510
> [ 66.703974] kthread+0x2a0/0x340
> [ 66.703974] ? kthread_complete_and_exit+0x20/0x20
> [ 66.703974] ret_from_fork+0x22/0x30
> [ 66.703974] </TASK>
> [ 66.703974]
> [ 66.703974] Allocated by task 267:
> [ 66.703974] kasan_save_stack+0x1e/0x40
> [ 66.703974] __kasan_kmalloc+0x81/0xa0
> [ 66.703974] nci_allocate_device+0xd3/0x390
> [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0
> [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd
> [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0
> [ 66.703974] tty_ioctl+0x764/0x1310
> [ 66.703974] __x64_sys_ioctl+0x122/0x190
> [ 66.703974] do_syscall_64+0x3b/0x90
> [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 66.703974]
> [ 66.703974] Freed by task 406:
> [ 66.703974] kasan_save_stack+0x1e/0x40
> [ 66.703974] kasan_set_track+0x21/0x30
> [ 66.703974] kasan_set_free_info+0x20/0x30
> [ 66.703974] __kasan_slab_free+0x108/0x170
> [ 66.703974] kfree+0xb0/0x330
> [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0
> [ 66.703974] nci_uart_tty_close+0xdf/0x180
> [ 66.703974] tty_ldisc_kill+0x73/0x110
> [ 66.703974] tty_ldisc_hangup+0x281/0x5b0
> [ 66.703974] __tty_hangup.part.0+0x431/0x890
> [ 66.703974] tty_release+0x3a8/0xc80
> [ 66.703974] __fput+0x1f0/0x8c0
> [ 66.703974] task_work_run+0xc9/0x170
> [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0
> [ 66.703974] syscall_exit_to_user_mode+0x19/0x50
> [ 66.703974] do_syscall_64+0x48/0x90
> [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> To fix the UAF, this patch adds flush_workqueue() to ensure the
> nci_cmd_work is finished before the following del_timer_sync.
> This combination will promise the timer is actually detached.
>
> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
> Signed-off-by: Lin Ma <[email protected]>
> Reviewed-by: Krzysztof Kozlowski <[email protected]>
> ---
> net/nfc/nci/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
> index d2537383a3e8..0d7763c322b5 100644
> --- a/net/nfc/nci/core.c
> +++ b/net/nfc/nci/core.c
> @@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev)
> mutex_lock(&ndev->req_lock);
>
> if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
> + /* Need to flush the cmd wq in case
> + * there is a queued/running cmd_work
> + */
> + flush_workqueue(ndev->cmd_wq);
> del_timer_sync(&ndev->cmd_timer);
I have been wondering about this and the same code further below.
What prevents the command timer from firing after the call to
flush_workqueue() ?
Thanks,
Guenter
> del_timer_sync(&ndev->data_timer);
> mutex_unlock(&ndev->req_lock);
On Mon, Apr 18, 2022 at 09:59:10PM +0800, Lin Ma wrote:
> Hello Guenter,
>
> > I have been wondering about this and the same code further below.
> > What prevents the command timer from firing after the call to
> > flush_workqueue() ?
> >
> > Thanks,
> > Guenter
> >
>
> From my understanding, once the flush_workqueue() is executed, the work that queued in
> ndev->cmd_wq will be taken the care of.
>
> That is, once the flush_workqueue() is finished, it promises there is no executing or
> pending nci_cmd_work() ever.
>
> static void nci_cmd_work(struct work_struct *work)
> {
> // ...
> mod_timer(&ndev->cmd_timer,
> jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
> // ...
> }
>
> The command timer is still able be fired because the mod_timer() here. That is why the
> del_timer_sync() is necessary after the flush_workqueue().
>
> One very puzzling part is that you may find out the timer queue the work again
>
> /* NCI command timer function */
> static void nci_cmd_timer(struct timer_list *t)
> {
> // ...
> queue_work(ndev->cmd_wq, &ndev->cmd_work);
> }
>
> But I found that this is okay because there is no packets in ndev->cmd_q buffers hence
> even there is a queued nci_cmd_work(), it simply checks the queue and returns.
>
> That is, the old race picture as below
>
> > Thread-1 Thread-2
> > | nci_dev_up()
> > | nci_open_device()
> > | __nci_request(nci_reset_req)
> > | nci_send_cmd
> > | queue_work(cmd_work)
> > nci_unregister_device() |
> > nci_close_device() | ...
> > del_timer_sync(cmd_timer)[1] |
> > ... | Worker
> > nci_free_device() | nci_cmd_work()
> > kfree(ndev)[3] | mod_timer(cmd_timer)[2]
>
> is impossible now because the patched flush_workqueue() make the race like below
>
> > Thread-1 Thread-2
> > | nci_dev_up()
> > | nci_open_device()
> > | __nci_request(nci_reset_req)
> > | nci_send_cmd
> > | queue_work(cmd_work)
> > nci_unregister_device() |
> > nci_close_device() | ...
> > flush_workqueue()[patch] | Worker
> > | nci_cmd_work()
> > | mod_timer(cmd_timer)[2]
> > // work over then return
> > del_timer_sync(cmd_timer)[1] |
> > | Timer
> > | nci_cmd_timer()
> > |
> > // timer over then return |
> > ... |
> > nci_free_device() |
> > kfree(ndev)[3] |
>
>
> With above thinkings and the given fact that my POC didn't raise the UAF, I think the
> flush_workqueue() + del_timer_sync() combination is okay to hinder this race.
>
> Tell me if there is anything wrong.
>
Thanks a lot for the detailed explanation and analysis.
I agree with your conclusion.
Guenter