2022-05-23 07:17:10

by Duoming Zhou

[permalink] [raw]
Subject: Re: [PATCH net v2] net: wireless: marvell: mwifiex: fix sleep in atomic context bugs

Hello,

On Fri, 20 May 2022 09:08:52 -0700 Jeff Johnson wrote:

> >>>>> There are sleep in atomic context bugs when uploading device dump
> >>>>> data on usb interface. The root cause is that the operations that
> >>>>> may sleep are called in fw_dump_timer_fn which is a timer handler.
> >>>>> The call tree shows the execution paths that could lead to bugs:
> >>>>>
> >>>>> (Interrupt context)
> >>>>> fw_dump_timer_fn
> >>>>> mwifiex_upload_device_dump
> >>>>> dev_coredumpv(..., GFP_KERNEL)
> >>
> >> just looking at this description, why isn't the simple fix just to
> >> change this call to use GFP_ATOMIC?
> >
> > Because change the parameter of dev_coredumpv() to GFP_ATOMIC could only solve
> > partial problem. The following GFP_KERNEL parameters are in /lib/kobject.c
> > which is not influenced by dev_coredumpv().
> >
> > kobject_set_name_vargs
> > kvasprintf_const(GFP_KERNEL, ...); //may sleep
> > kstrdup(s, GFP_KERNEL); //may sleep
>
> Then it seems there is a problem with dev_coredumpm().
>
> dev_coredumpm() takes a gfp param which means it expects to be called in
> any context, but it then calls dev_set_name() which, as you point out,
> cannot be called from an atomic context.
>
> So if we cannot change the fact that dev_set_name() cannot be called
> from an atomic context, then it would seem to follow that
> dev_coredumpv also cannot be called from an atomic
> context and hence their gfp param is pointless and should presumably be
> removed.

Thanks for your time and suggestions! I think the gfp_t parameter of dev_coredumpv and
dev_coredumpm may not be removed, because it could be used to pass value to gfp_t
parameter of kzalloc in dev_coredumpm. What's more, there are also many other places
use dev_coredumpv and dev_coredumpm, if we remove the gfp_t parameter, there are too many
places that need to modify and these places are not in interrupt context.

There are two solutions now: one is to moves the operations that may sleep into a work item.
Another is to change the gfp_t parameter of dev_coredumpv from GFP_KERNEL to GFP_ATOMIC, and
change the gfp_t parameter of kvasprintf_const and kstrdup from GFP_KERNEL to
"in_interrupt() ? GFP_ATOMIC : GFP_KERNEL".

The detail of the first solution is shown below:

diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index 88c72d1827a..cc3f1121eb9 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -63,11 +63,19 @@ static void wakeup_timer_fn(struct timer_list *t)
adapter->if_ops.card_reset(adapter);
}

+static void fw_dump_work(struct work_struct *work)
+{
+ struct mwifiex_adapter *adapter =
+ container_of(work, struct mwifiex_adapter, devdump_work);
+
+ mwifiex_upload_device_dump(adapter);
+}
+
static void fw_dump_timer_fn(struct timer_list *t)
{
struct mwifiex_adapter *adapter = from_timer(adapter, t, devdump_timer);

- mwifiex_upload_device_dump(adapter);
+ schedule_work(&adapter->devdump_work);
}

/*
@@ -321,6 +329,7 @@ static void mwifiex_init_adapter(struct mwifiex_adapter *adapter)
adapter->active_scan_triggered = false;
timer_setup(&adapter->wakeup_timer, wakeup_timer_fn, 0);
adapter->devdump_len = 0;
+ INIT_WORK(&adapter->devdump_work, fw_dump_work);
timer_setup(&adapter->devdump_timer, fw_dump_timer_fn, 0);
}

@@ -401,6 +410,7 @@ mwifiex_adapter_cleanup(struct mwifiex_adapter *adapter)
{
del_timer(&adapter->wakeup_timer);
del_timer_sync(&adapter->devdump_timer);
+ cancel_work_sync(&adapter->devdump_work);
mwifiex_cancel_all_pending_cmd(adapter);
wake_up_interruptible(&adapter->cmd_wait_q.wait);
wake_up_interruptible(&adapter->hs_activate_wait_q);
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 332dd1c8db3..c8ac2f57f18 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -900,6 +900,7 @@ struct mwifiex_adapter {
struct work_struct rx_work;
struct workqueue_struct *dfs_workqueue;
struct work_struct dfs_work;
+ struct work_struct devdump_work;
bool rx_work_enabled;
bool rx_processing;
bool delay_main_work;
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_event.c b/drivers/net/wireless/marvell/mwifiex/sta_event.c
index 7d42c5d2dbf..8e28d0107d7 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_event.c
@@ -644,6 +644,7 @@ mwifiex_fw_dump_info_event(struct mwifiex_private *priv,

upload_dump:
del_timer_sync(&adapter->devdump_timer);
+ cancel_work_sync(&adapter->devdump_work);
mwifiex_upload_device_dump(adapter);
}

The detail of the second solution is shown below:

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ace7371c477..258906920a2 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1116,7 +1116,7 @@ void mwifiex_upload_device_dump(struct mwifiex_adapter *adapter)
mwifiex_dbg(adapter, MSG,
"== mwifiex dump information to /sys/class/devcoredump start\n");
dev_coredumpv(adapter->dev, adapter->devdump_data, adapter->devdump_len,
- GFP_KERNEL);
+ GFP_ATOMIC);
mwifiex_dbg(adapter, MSG,
"== mwifiex dump information to /sys/class/devcoredump end\n");

diff --git a/lib/kobject.c b/lib/kobject.c
index 5f0e71ab292..dbb2e57ef78 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -254,7 +254,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
if (kobj->name && !fmt)
return 0;

- s = kvasprintf_const(GFP_KERNEL, fmt, vargs);
+ s = kvasprintf_const(in_interrupt() ? GFP_ATOMIC : GFP_KERNEL, fmt, vargs);
if (!s)
return -ENOMEM;

@@ -267,7 +267,7 @@ int kobject_set_name_vargs(struct kobject *kobj, const char *fmt,
if (strchr(s, '/')) {
char *t;

- t = kstrdup(s, GFP_KERNEL);
+ t = kstrdup(s, in_interrupt() ? GFP_ATOMIC : GFP_KERNEL);
kfree_const(s);
if (!t)
return -ENOMEM;

Which one do you think is better? I will choose the better one to test.

Best regards,
Duoming Zhou