2021-10-18 09:05:28

by Tzung-Bi Shih

[permalink] [raw]
Subject: [RESEND PATCH] platform/chrome: cros_ec_debugfs: detach log reader wq from devm

Debugfs console_log uses devm memory (see struct cros_ec_debugfs in
cros_ec_console_log fops). However, lifecycles of device and debugfs
are independent. An use-after-free issue is observed if userland
program operates the debugfs after the memory has been freed.

An userland program example in Python code:
... import select
... p = select.epoll()
... f = open('/sys/kernel/debug/cros_scp/console_log')
... p.register(f, select.POLLIN)
... p.poll(1)
[(4, 1)] # 4=fd, 1=select.POLLIN

[ shutdown cros_scp at the point ]

... p.poll(1)
[(4, 16)] # 4=fd, 16=select.POLLHUP
... p.unregister(f)

An use-after-free issue raises here. It called epoll_ctl with
EPOLL_CTL_DEL which in turn to use the workqueue in the devm (i.e.
log_wq).

The calling stack:
do_raw_spin_lock
_raw_spin_lock_irqsave
remove_wait_queue
ep_unregister_pollwait
ep_remove
do_epoll_ctl

Detaches log reader's workqueue from devm to make sure it is persistent
even if the device has been removed.

Signed-off-by: Tzung-Bi Shih <[email protected]>
---
As for 2 other cases:

Case 1. userland program opens the debugfs after the device has been removed

ENOENT. cros_ec_debugfs_remove() calls debugfs_remove_recursive().

Case 2. userland program is reading when the device is removing

If the userland program is waiting in cros_ec_console_log_read(), the device
removal will wait for it.

See the calling stack for the case:
wait_for_completion
__debugfs_file_removed
remove_one
simple_recursive_removal
debugfs_remove
cros_ec_debugfs_remove
platform_drv_remove
device_release_driver_internal
device_release_driver
bus_remove_device
device_del
platform_device_del
platform_device_unregister

drivers/platform/chrome/cros_ec_debugfs.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/platform/chrome/cros_ec_debugfs.c b/drivers/platform/chrome/cros_ec_debugfs.c
index 272c89837d74..0dbceee87a4b 100644
--- a/drivers/platform/chrome/cros_ec_debugfs.c
+++ b/drivers/platform/chrome/cros_ec_debugfs.c
@@ -25,6 +25,9 @@

#define CIRC_ADD(idx, size, value) (((idx) + (value)) & ((size) - 1))

+/* waitqueue for log readers */
+static DECLARE_WAIT_QUEUE_HEAD(cros_ec_debugfs_log_wq);
+
/**
* struct cros_ec_debugfs - EC debugging information.
*
@@ -33,7 +36,6 @@
* @log_buffer: circular buffer for console log information
* @read_msg: preallocated EC command and buffer to read console log
* @log_mutex: mutex to protect circular buffer
- * @log_wq: waitqueue for log readers
* @log_poll_work: recurring task to poll EC for new console log data
* @panicinfo_blob: panicinfo debugfs blob
*/
@@ -44,7 +46,6 @@ struct cros_ec_debugfs {
struct circ_buf log_buffer;
struct cros_ec_command *read_msg;
struct mutex log_mutex;
- wait_queue_head_t log_wq;
struct delayed_work log_poll_work;
/* EC panicinfo */
struct debugfs_blob_wrapper panicinfo_blob;
@@ -107,7 +108,7 @@ static void cros_ec_console_log_work(struct work_struct *__work)
buf_space--;
}

- wake_up(&debug_info->log_wq);
+ wake_up(&cros_ec_debugfs_log_wq);
}

mutex_unlock(&debug_info->log_mutex);
@@ -141,7 +142,7 @@ static ssize_t cros_ec_console_log_read(struct file *file, char __user *buf,

mutex_unlock(&debug_info->log_mutex);

- ret = wait_event_interruptible(debug_info->log_wq,
+ ret = wait_event_interruptible(cros_ec_debugfs_log_wq,
CIRC_CNT(cb->head, cb->tail, LOG_SIZE));
if (ret < 0)
return ret;
@@ -173,7 +174,7 @@ static __poll_t cros_ec_console_log_poll(struct file *file,
struct cros_ec_debugfs *debug_info = file->private_data;
__poll_t mask = 0;

- poll_wait(file, &debug_info->log_wq, wait);
+ poll_wait(file, &cros_ec_debugfs_log_wq, wait);

mutex_lock(&debug_info->log_mutex);
if (CIRC_CNT(debug_info->log_buffer.head,
@@ -377,7 +378,6 @@ static int cros_ec_create_console_log(struct cros_ec_debugfs *debug_info)
debug_info->log_buffer.tail = 0;

mutex_init(&debug_info->log_mutex);
- init_waitqueue_head(&debug_info->log_wq);

debugfs_create_file("console_log", S_IFREG | 0444, debug_info->dir,
debug_info, &cros_ec_console_log_fops);
--
2.33.0.1079.g6e70778dc9-goog


2021-10-18 15:03:00

by Guenter Roeck

[permalink] [raw]
Subject: Re: [RESEND PATCH] platform/chrome: cros_ec_debugfs: detach log reader wq from devm

On Mon, Oct 18, 2021 at 2:03 AM Tzung-Bi Shih <[email protected]> wrote:
>
> Debugfs console_log uses devm memory (see struct cros_ec_debugfs in
> cros_ec_console_log fops). However, lifecycles of device and debugfs
> are independent. An use-after-free issue is observed if userland
> program operates the debugfs after the memory has been freed.
>

It would help to see the backtrace. Without it, it is difficult to
determine where the UAF is observed. Also, most if not all of the
touched functions access struct cros_ec_debugfs all over the place,
not only for the wait queue, so I am not sure if moving the wait queue
out of the data structure is the correct fix. It might instead be
necessary to disconnect memory allocations from the ec device.

Guenter



Guenter

> An userland program example in Python code:
> ... import select
> ... p = select.epoll()
> ... f = open('/sys/kernel/debug/cros_scp/console_log')
> ... p.register(f, select.POLLIN)
> ... p.poll(1)
> [(4, 1)] # 4=fd, 1=select.POLLIN
>
> [ shutdown cros_scp at the point ]
>
> ... p.poll(1)
> [(4, 16)] # 4=fd, 16=select.POLLHUP
> ... p.unregister(f)
>
> An use-after-free issue raises here. It called epoll_ctl with
> EPOLL_CTL_DEL which in turn to use the workqueue in the devm (i.e.
> log_wq).
>
> The calling stack:
> do_raw_spin_lock
> _raw_spin_lock_irqsave
> remove_wait_queue
> ep_unregister_pollwait
> ep_remove
> do_epoll_ctl
>
> Detaches log reader's workqueue from devm to make sure it is persistent
> even if the device has been removed.
>
> Signed-off-by: Tzung-Bi Shih <[email protected]>
> ---
> As for 2 other cases:
>
> Case 1. userland program opens the debugfs after the device has been removed
>
> ENOENT. cros_ec_debugfs_remove() calls debugfs_remove_recursive().
>
> Case 2. userland program is reading when the device is removing
>
> If the userland program is waiting in cros_ec_console_log_read(), the device
> removal will wait for it.
>
> See the calling stack for the case:
> wait_for_completion
> __debugfs_file_removed
> remove_one
> simple_recursive_removal
> debugfs_remove
> cros_ec_debugfs_remove
> platform_drv_remove
> device_release_driver_internal
> device_release_driver
> bus_remove_device
> device_del
> platform_device_del
> platform_device_unregister
>
> drivers/platform/chrome/cros_ec_debugfs.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/platform/chrome/cros_ec_debugfs.c b/drivers/platform/chrome/cros_ec_debugfs.c
> index 272c89837d74..0dbceee87a4b 100644
> --- a/drivers/platform/chrome/cros_ec_debugfs.c
> +++ b/drivers/platform/chrome/cros_ec_debugfs.c
> @@ -25,6 +25,9 @@
>
> #define CIRC_ADD(idx, size, value) (((idx) + (value)) & ((size) - 1))
>
> +/* waitqueue for log readers */
> +static DECLARE_WAIT_QUEUE_HEAD(cros_ec_debugfs_log_wq);
> +
> /**
> * struct cros_ec_debugfs - EC debugging information.
> *
> @@ -33,7 +36,6 @@
> * @log_buffer: circular buffer for console log information
> * @read_msg: preallocated EC command and buffer to read console log
> * @log_mutex: mutex to protect circular buffer
> - * @log_wq: waitqueue for log readers
> * @log_poll_work: recurring task to poll EC for new console log data
> * @panicinfo_blob: panicinfo debugfs blob
> */
> @@ -44,7 +46,6 @@ struct cros_ec_debugfs {
> struct circ_buf log_buffer;
> struct cros_ec_command *read_msg;
> struct mutex log_mutex;
> - wait_queue_head_t log_wq;
> struct delayed_work log_poll_work;
> /* EC panicinfo */
> struct debugfs_blob_wrapper panicinfo_blob;
> @@ -107,7 +108,7 @@ static void cros_ec_console_log_work(struct work_struct *__work)
> buf_space--;
> }
>
> - wake_up(&debug_info->log_wq);
> + wake_up(&cros_ec_debugfs_log_wq);
> }
>
> mutex_unlock(&debug_info->log_mutex);
> @@ -141,7 +142,7 @@ static ssize_t cros_ec_console_log_read(struct file *file, char __user *buf,
>
> mutex_unlock(&debug_info->log_mutex);
>
> - ret = wait_event_interruptible(debug_info->log_wq,
> + ret = wait_event_interruptible(cros_ec_debugfs_log_wq,
> CIRC_CNT(cb->head, cb->tail, LOG_SIZE));
> if (ret < 0)
> return ret;
> @@ -173,7 +174,7 @@ static __poll_t cros_ec_console_log_poll(struct file *file,
> struct cros_ec_debugfs *debug_info = file->private_data;
> __poll_t mask = 0;
>
> - poll_wait(file, &debug_info->log_wq, wait);
> + poll_wait(file, &cros_ec_debugfs_log_wq, wait);
>
> mutex_lock(&debug_info->log_mutex);
> if (CIRC_CNT(debug_info->log_buffer.head,
> @@ -377,7 +378,6 @@ static int cros_ec_create_console_log(struct cros_ec_debugfs *debug_info)
> debug_info->log_buffer.tail = 0;
>
> mutex_init(&debug_info->log_mutex);
> - init_waitqueue_head(&debug_info->log_wq);
>
> debugfs_create_file("console_log", S_IFREG | 0444, debug_info->dir,
> debug_info, &cros_ec_console_log_fops);
> --
> 2.33.0.1079.g6e70778dc9-goog
>

2021-10-19 05:25:51

by Tzung-Bi Shih

[permalink] [raw]
Subject: Re: [RESEND PATCH] platform/chrome: cros_ec_debugfs: detach log reader wq from devm

On Mon, Oct 18, 2021 at 4:59 PM Guenter Roeck <[email protected]> wrote:
>
> On Mon, Oct 18, 2021 at 2:03 AM Tzung-Bi Shih <[email protected]> wrote:
> >
> > Debugfs console_log uses devm memory (see struct cros_ec_debugfs in
> > cros_ec_console_log fops). However, lifecycles of device and debugfs
> > are independent. An use-after-free issue is observed if userland
> > program operates the debugfs after the memory has been freed.
> >
>
> It would help to see the backtrace. Without it, it is difficult to
> determine where the UAF is observed. Also, most if not all of the
> touched functions access struct cros_ec_debugfs all over the place,
> not only for the wait queue, so I am not sure if moving the wait queue
> out of the data structure is the correct fix. It might instead be
> necessary to disconnect memory allocations from the ec device.

A trimmed backtrace is in the commit message, but the more verbose one:
[ 426.174308] Call trace:
[ 426.174314] dump_backtrace+0x0/0x3ec
[ 426.174318] show_stack+0x20/0x2c
[ 426.174324] dump_stack+0x11c/0x1ac
[ 426.174329] print_address_description+0x7c/0x510
[ 426.174333] kasan_report+0x134/0x174
[ 426.174337] __asan_report_load4_noabort+0x44/0x50
[ 426.174341] do_raw_spin_lock+0x214/0x308
[ 426.174345] _raw_spin_lock_irqsave+0x68/0xf0
[ 426.174350] remove_wait_queue+0x3c/0x10c
[ 426.174355] ep_unregister_pollwait+0x120/0x170
[ 426.174358] ep_remove+0x60/0x2a0
[ 426.174362] do_epoll_ctl+0x590/0x7f4

I guess only the wait queue in the struct cros_ec_debugfs has
deep-coupled to console_log debugfs. There are 2 more file operation
scenarios appended after the "--".

> > An userland program example in Python code:
> > ... import select
> > ... p = select.epoll()
> > ... f = open('/sys/kernel/debug/cros_scp/console_log')
> > ... p.register(f, select.POLLIN)
> > ... p.poll(1)
> > [(4, 1)] # 4=fd, 1=select.POLLIN
> >
> > [ shutdown cros_scp at the point ]
> >
> > ... p.poll(1)
> > [(4, 16)] # 4=fd, 16=select.POLLHUP
> > ... p.unregister(f)
> >
> > An use-after-free issue raises here. It called epoll_ctl with
> > EPOLL_CTL_DEL which in turn to use the workqueue in the devm (i.e.
> > log_wq).
> >
> > The calling stack:
> > do_raw_spin_lock
> > _raw_spin_lock_irqsave
> > remove_wait_queue
> > ep_unregister_pollwait
> > ep_remove
> > do_epoll_ctl

Here is the trimmed backtrace.

[...]
> > ---
> > As for 2 other cases:
> >
> > Case 1. userland program opens the debugfs after the device has been removed
> >
> > ENOENT. cros_ec_debugfs_remove() calls debugfs_remove_recursive().
> >
> > Case 2. userland program is reading when the device is removing
> >
> > If the userland program is waiting in cros_ec_console_log_read(), the device
> > removal will wait for it.
> >
> > See the calling stack for the case:
> > wait_for_completion
> > __debugfs_file_removed
> > remove_one
> > simple_recursive_removal
> > debugfs_remove
> > cros_ec_debugfs_remove
> > platform_drv_remove
> > device_release_driver_internal
> > device_release_driver
> > bus_remove_device
> > device_del
> > platform_device_del
> > platform_device_unregister

Here are the 2 other cases.

2021-10-25 09:33:24

by Tzung-Bi Shih

[permalink] [raw]
Subject: Re: [RESEND PATCH] platform/chrome: cros_ec_debugfs: detach log reader wq from devm

Hi Guenter,

On Tue, Oct 19, 2021 at 1:24 PM Tzung-Bi Shih <[email protected]> wrote:
>
> On Mon, Oct 18, 2021 at 4:59 PM Guenter Roeck <[email protected]> wrote:
> >
> > On Mon, Oct 18, 2021 at 2:03 AM Tzung-Bi Shih <[email protected]> wrote:
> > >
> > > Debugfs console_log uses devm memory (see struct cros_ec_debugfs in
> > > cros_ec_console_log fops). However, lifecycles of device and debugfs
> > > are independent. An use-after-free issue is observed if userland
> > > program operates the debugfs after the memory has been freed.
> > >
> >
> > It would help to see the backtrace. Without it, it is difficult to
> > determine where the UAF is observed. Also, most if not all of the
> > touched functions access struct cros_ec_debugfs all over the place,
> > not only for the wait queue, so I am not sure if moving the wait queue
> > out of the data structure is the correct fix. It might instead be
> > necessary to disconnect memory allocations from the ec device.
>
> A trimmed backtrace is in the commit message, but the more verbose one:
> [ 426.174308] Call trace:
> [ 426.174314] dump_backtrace+0x0/0x3ec
> [ 426.174318] show_stack+0x20/0x2c
> [ 426.174324] dump_stack+0x11c/0x1ac
> [ 426.174329] print_address_description+0x7c/0x510
> [ 426.174333] kasan_report+0x134/0x174
> [ 426.174337] __asan_report_load4_noabort+0x44/0x50
> [ 426.174341] do_raw_spin_lock+0x214/0x308
> [ 426.174345] _raw_spin_lock_irqsave+0x68/0xf0
> [ 426.174350] remove_wait_queue+0x3c/0x10c
> [ 426.174355] ep_unregister_pollwait+0x120/0x170
> [ 426.174358] ep_remove+0x60/0x2a0
> [ 426.174362] do_epoll_ctl+0x590/0x7f4
>
> I guess only the wait queue in the struct cros_ec_debugfs has
> deep-coupled to console_log debugfs. There are 2 more file operation
> scenarios appended after the "--".

Do you think the backtrace is sufficient to determine the UAF happens
on the wait queue?

How about we keep the fix as is since we have a constantly reproducing
step for the UAF. And look forward to the approach "disconnect memory
allocations from the ec device" if we could discover more UAFs?