2019-04-12 12:27:48

by Lorenzo Bianconi

[permalink] [raw]
Subject: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
properly deallocate all pending skbs during suspend/resume phase

Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
Signed-off-by: Lorenzo Bianconi <[email protected]>
---
drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index a3acc070063a..575207133775 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
void mt76u_stop_queues(struct mt76_dev *dev)
{
tasklet_disable(&dev->usb.rx_tasklet);
- tasklet_disable(&dev->usb.tx_tasklet);
-
mt76u_stop_rx(dev);
+
mt76u_stop_tx(dev);
+ tasklet_disable(&dev->usb.tx_tasklet);
}
EXPORT_SYMBOL_GPL(mt76u_stop_queues);

--
2.20.1



2019-04-12 14:54:54

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> properly deallocate all pending skbs during suspend/resume phase

On suspend/resume tx skb's are processed after tasklet_enable()
in resume callback. There is issue with device removal though
(during suspend or otherwise).

> Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> Signed-off-by: Lorenzo Bianconi <[email protected]>
> ---
> drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> index a3acc070063a..575207133775 100644
> --- a/drivers/net/wireless/mediatek/mt76/usb.c
> +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> void mt76u_stop_queues(struct mt76_dev *dev)
> {
> tasklet_disable(&dev->usb.rx_tasklet);
> - tasklet_disable(&dev->usb.tx_tasklet);
> -
> mt76u_stop_rx(dev);
> +
> mt76u_stop_tx(dev);
> + tasklet_disable(&dev->usb.tx_tasklet);

If tasklet is scheduled and we disable it and never enable, we end up
with infinite loop in tasklet_action_common(). This patch make the
problem less reproducible since tasklet_disable() is moved after
usb_kill_urb() -> tasklet_schedule(), but it is still possible.

I comprehansive have fix for that, but giving it more testing.
Please drop this patch.

Stanislaw

2019-04-12 15:35:17

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

> On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > properly deallocate all pending skbs during suspend/resume phase
>
> On suspend/resume tx skb's are processed after tasklet_enable()
> in resume callback. There is issue with device removal though
> (during suspend or otherwise).

Hi Stanislaw,

I guess the right moment to deallocate the skbs is during suspend since resume
can happen in very far future

>
> > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > ---
> > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > index a3acc070063a..575207133775 100644
> > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > void mt76u_stop_queues(struct mt76_dev *dev)
> > {
> > tasklet_disable(&dev->usb.rx_tasklet);
> > - tasklet_disable(&dev->usb.tx_tasklet);
> > -
> > mt76u_stop_rx(dev);
> > +
> > mt76u_stop_tx(dev);
> > + tasklet_disable(&dev->usb.tx_tasklet);
>
> If tasklet is scheduled and we disable it and never enable, we end up
> with infinite loop in tasklet_action_common(). This patch make the
> problem less reproducible since tasklet_disable() is moved after
> usb_kill_urb() -> tasklet_schedule(), but it is still possible.

I can see the point here. Maybe we can just run tasklet_kill instead of
tasklet_disable here (at least for tx one)

Regards,
Lorenzo

>
> I comprehansive have fix for that, but giving it more testing.
> Please drop this patch.
>
> Stanislaw


Attachments:
(No filename) (1.80 kB)
signature.asc (228.00 B)
Download all attachments

2019-04-12 16:27:55

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

> > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > properly deallocate all pending skbs during suspend/resume phase
> >
> > On suspend/resume tx skb's are processed after tasklet_enable()
> > in resume callback. There is issue with device removal though
> > (during suspend or otherwise).
>
> Hi Stanislaw,
>
> I guess the right moment to deallocate the skbs is during suspend since resume
> can happen in very far future
>
> >
> > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > ---
> > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > index a3acc070063a..575207133775 100644
> > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > {
> > > tasklet_disable(&dev->usb.rx_tasklet);
> > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > -
> > > mt76u_stop_rx(dev);
> > > +
> > > mt76u_stop_tx(dev);
> > > + tasklet_disable(&dev->usb.tx_tasklet);
> >
> > If tasklet is scheduled and we disable it and never enable, we end up
> > with infinite loop in tasklet_action_common(). This patch make the
> > problem less reproducible since tasklet_disable() is moved after
> > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
>
> I can see the point here. Maybe we can just run tasklet_kill instead of
> tasklet_disable here (at least for tx one)

I think it is enough even for rx side since usb_kill_urb() will wait the end of
mt76u_complete_rx() and tasklet_kill() will wait for the completion of
mt76u_rx_tasklet(). We will need to remove tasklet_enable in resume routines.

Regards,
Lorenzo

>
> Regards,
> Lorenzo
>
> >
> > I comprehansive have fix for that, but giving it more testing.
> > Please drop this patch.
> >
> > Stanislaw



Attachments:
(No filename) (2.15 kB)
signature.asc (228.00 B)
Download all attachments

2019-04-13 08:31:03

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > properly deallocate all pending skbs during suspend/resume phase
> > >
> > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > in resume callback. There is issue with device removal though
> > > (during suspend or otherwise).
> >
> > Hi Stanislaw,
> >
> > I guess the right moment to deallocate the skbs is during suspend since resume
> > can happen in very far future

Yes, it's better to free on suspend, but in practice does not really matter since
system is disabled till resume.

> > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > ---
> > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > index a3acc070063a..575207133775 100644
> > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > {
> > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > -
> > > > mt76u_stop_rx(dev);
> > > > +
> > > > mt76u_stop_tx(dev);
> > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > >
> > > If tasklet is scheduled and we disable it and never enable, we end up
> > > with infinite loop in tasklet_action_common(). This patch make the
> > > problem less reproducible since tasklet_disable() is moved after
> > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> >
> > I can see the point here. Maybe we can just run tasklet_kill instead of
> > tasklet_disable here (at least for tx one)

I think you have right as tasklet_kill() will wait for scheduled tasklet .
Originally in my patch (see below) I used wait_event as I thought
tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
leak) but that seems to be not true.

> I think it is enough even for rx side since usb_kill_urb() will wait the end of
> mt76u_complete_rx() and tasklet_kill() will wait for the completion of
> mt76u_rx_tasklet(). We will need to remove tasklet_enable in resume routines.

No, because rx urb's are resubmitted on mt76u_rx_tasklet(). I use usb_poison_urb()
to prevent that in my patch.

Stanislaw

commit 088b1209302e17cda688c9b562e18970c92823ac
Author: Stanislaw Gruszka <[email protected]>
Date: Thu Apr 4 13:37:43 2019 +0200

mt76usb: fix tx/rx stop

Disabling tasklets on stopping rx/tx is wrong. If blocked tasklet
is scheduled and we remove device we get 100% cpu usage:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9 root 20 0 0 0 0 R 93.8 0.0 1:47.19 ksoftirqd/0

by using memory after free and eventually crash:

[ 2068.591964] RIP: 0010:tasklet_action_common.isra.17+0x66/0x100
[ 2068.591966] Code: 41 89 f5 eb 25 f0 48 0f ba 33 00 0f 83 b1 00 00 00 48 8b 7a 20 48 8b 42 18 e8 56 a3 b5 00 f0 80 23 fd 48 89 ea 48 85 ed 74 53 <48> 8b 2a 48 8d 5a 08 f0 48 0f ba 6a 08 01 72 0b 8b 42 10 85 c0 74
[ 2068.591968] RSP: 0018:ffff98758c34be58 EFLAGS: 00010206
[ 2068.591969] RAX: ffff98758e6966d0 RBX: ffff98756e69aef8 RCX: 0000000000000006
[ 2068.591970] RDX: 01060a053d060305 RSI: 0000000000000006 RDI: ffff98758e6966d0
[ 2068.591971] RBP: 01060a053d060305 R08: 0000000000000000 R09: 00000000000203c0
[ 2068.591971] R10: 000003ff65b34f08 R11: 0000000000000001 R12: ffff98758e6966d0
[ 2068.591972] R13: 0000000000000006 R14: 0000000000000040 R15: 0000000000000006
[ 2068.591974] FS: 0000000000000000(0000) GS:ffff98758e680000(0000) knlGS:0000000000000000
[ 2068.591975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2068.591975] CR2: 00002c5f73a6cc20 CR3: 00000002f920a001 CR4: 00000000003606e0
[ 2068.591977] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2068.591978] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2068.591978] Call Trace:
[ 2068.591985] __do_softirq+0xe3/0x30a
[ 2068.591989] ? sort_range+0x20/0x20
[ 2068.591990] run_ksoftirqd+0x26/0x40
[ 2068.591992] smpboot_thread_fn+0xc5/0x160
[ 2068.591995] kthread+0x112/0x130
[ 2068.591997] ? kthread_create_on_node+0x40/0x40
[ 2068.591998] ret_from_fork+0x35/0x40
[ 2068.591999] Modules linked in: ccm arc4 fuse rfcomm cmac bnep sunrpc snd_hda_codec_hdmi snd_soc_skl snd_soc_core snd_soc_acpi_intel_match snd_hda_codec_realtek snd_soc_acpi snd_hda_codec_generic snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core iTCO_wdt snd_hda_intel intel_rapl iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp btusb mei_wdt coretemp btrtl snd_hda_codec btbcm btintel intel_cstate snd_hwdep intel_uncore uvcvideo snd_hda_core videobuf2_vmalloc videobuf2_memops intel_rapl_perf wmi_bmof videobuf2_v4l2 intel_wmi_thunderbolt snd_seq bluetooth joydev videobuf2_common snd_seq_device snd_pcm videodev media i2c_i801 snd_timer idma64 ecdh_generic intel_lpss_pci intel_lpss mei_me mei ucsi_acpi typec_ucsi processor_thermal_device intel_soc_dts_iosf intel_pch_thermal typec thinkpad_acpi wmi snd soundcore rfkill int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad pcc_cpufreq uas usb_storage crc32c_intel i915 i2c_algo_bit nvme
serio_raw
[ 2068.592033] drm_kms_helper e1000e nvme_core drm video ipv6 [last unloaded: cfg80211]

Fortunate thing is that this not happen frequently, as scheduling
tasklet on blocked state is very exceptional, though might happen.

Due to different RX/TX tasklet processing fix is different for those.
For TX just wait until queues are empty. For RX assure rx_tasklet
do fail to resubmit buffers by poisoning urb's and kill the tasklet.

Signed-off-by: Stanislaw Gruszka <[email protected]>

diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 15825e9a458f..7bf91566e212 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -384,7 +384,7 @@ void mt76_rx(struct mt76_dev *dev, enum mt76_rxq_id q, struct sk_buff *skb)
}
EXPORT_SYMBOL_GPL(mt76_rx);

-static bool mt76_has_tx_pending(struct mt76_dev *dev)
+bool mt76_has_tx_pending(struct mt76_dev *dev)
{
struct mt76_queue *q;
int i;
@@ -397,6 +397,7 @@ static bool mt76_has_tx_pending(struct mt76_dev *dev)

return false;
}
+EXPORT_SYMBOL_GPL(mt76_has_tx_pending);

void mt76_set_channel(struct mt76_dev *dev)
{
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index b432da3f55c7..19d03294e678 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -682,6 +682,7 @@ void mt76_release_buffered_frames(struct ieee80211_hw *hw,
u16 tids, int nframes,
enum ieee80211_frame_release_type reason,
bool more_data);
+bool mt76_has_tx_pending(struct mt76_dev *dev);
void mt76_set_channel(struct mt76_dev *dev);
int mt76_get_survey(struct ieee80211_hw *hw, int idx,
struct survey_info *survey);
@@ -776,9 +777,9 @@ int mt76u_vendor_request(struct mt76_dev *dev, u8 req,
void mt76u_single_wr(struct mt76_dev *dev, const u8 req,
const u16 offset, const u32 val);
int mt76u_init(struct mt76_dev *dev, struct usb_interface *intf);
-int mt76u_submit_rx_buffers(struct mt76_dev *dev);
int mt76u_alloc_queues(struct mt76_dev *dev);
void mt76u_stop_queues(struct mt76_dev *dev);
+int mt76u_start_queues(struct mt76_dev *dev);
void mt76u_stop_stat_wk(struct mt76_dev *dev);
void mt76u_queues_deinit(struct mt76_dev *dev);

diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
index 1ef00e971cfa..a27a3ae3dbf5 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
@@ -327,13 +327,10 @@ static int __maybe_unused mt76x0_resume(struct usb_interface *usb_intf)
struct mt76_usb *usb = &dev->mt76.usb;
int ret;

- ret = mt76u_submit_rx_buffers(&dev->mt76);
+ ret = mt76u_start_queues(&dev->mt76);
if (ret < 0)
goto err;

- tasklet_enable(&usb->rx_tasklet);
- tasklet_enable(&usb->tx_tasklet);
-
ret = mt76x0u_init_hardware(dev);
if (ret)
goto err;
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
index d08bb964966b..5450fb1c3f55 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
@@ -119,13 +119,10 @@ static int __maybe_unused mt76x2u_resume(struct usb_interface *intf)
struct mt76_usb *usb = &dev->mt76.usb;
int err;

- err = mt76u_submit_rx_buffers(&dev->mt76);
- if (err < 0)
+ err = mt76u_start_queues(&dev->mt76);
+ if (err)
goto err;

- tasklet_enable(&usb->rx_tasklet);
- tasklet_enable(&usb->tx_tasklet);
-
err = mt76x2u_init_hardware(dev);
if (err < 0)
goto err;
diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index a3acc070063a..8e32a60ac1c3 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -540,7 +540,7 @@ static void mt76u_rx_tasklet(unsigned long data)
rcu_read_unlock();
}

-int mt76u_submit_rx_buffers(struct mt76_dev *dev)
+static int mt76u_submit_rx_buffers(struct mt76_dev *dev)
{
struct mt76_queue *q = &dev->q_rx[MT_RXQ_MAIN];
unsigned long flags;
@@ -558,7 +558,6 @@ int mt76u_submit_rx_buffers(struct mt76_dev *dev)

return err;
}
-EXPORT_SYMBOL_GPL(mt76u_submit_rx_buffers);

static int mt76u_alloc_rx(struct mt76_dev *dev)
{
@@ -611,7 +610,9 @@ static void mt76u_stop_rx(struct mt76_dev *dev)
int i;

for (i = 0; i < q->ndesc; i++)
- usb_kill_urb(q->entry[i].urb);
+ usb_poison_urb(q->entry[i].urb);
+
+ tasklet_kill(&dev->usb.rx_tasklet);
}

static void mt76u_tx_tasklet(unsigned long data)
@@ -830,20 +831,35 @@ static void mt76u_free_tx(struct mt76_dev *dev)
static void mt76u_stop_tx(struct mt76_dev *dev)
{
struct mt76_queue *q;
- int i, j;
+ int i, j, ret;

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
q = dev->q_tx[i].q;
for (j = 0; j < q->ndesc; j++)
usb_kill_urb(q->entry[j].urb);
}
+
+ ret = wait_event_timeout(dev->tx_wait, !mt76_has_tx_pending(dev), HZ/5);
+ if (!ret)
+ dev_err(dev->dev, "timed out waiting for pending tx\n");
}

-void mt76u_stop_queues(struct mt76_dev *dev)
+int mt76u_start_queues(struct mt76_dev *dev)
{
- tasklet_disable(&dev->usb.rx_tasklet);
- tasklet_disable(&dev->usb.tx_tasklet);
+ struct mt76_queue *q = &dev->q_rx[MT_RXQ_MAIN];
+ int i;
+
+ /* Nothing to do for TX */

+ for (i = 0; i < q->ndesc; i++)
+ usb_unpoison_urb(q->entry[i].urb);
+
+ return mt76u_submit_rx_buffers(dev);
+}
+EXPORT_SYMBOL_GPL(mt76u_start_queues);
+
+void mt76u_stop_queues(struct mt76_dev *dev)
+{
mt76u_stop_rx(dev);
mt76u_stop_tx(dev);
}

2019-04-13 10:11:13

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

> On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > properly deallocate all pending skbs during suspend/resume phase
> > > >
> > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > in resume callback. There is issue with device removal though
> > > > (during suspend or otherwise).
> > >
> > > Hi Stanislaw,
> > >
> > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > can happen in very far future
>
> Yes, it's better to free on suspend, but in practice does not really matter since
> system is disabled till resume.
>
> > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > ---
> > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > index a3acc070063a..575207133775 100644
> > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > {
> > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > -
> > > > > mt76u_stop_rx(dev);
> > > > > +
> > > > > mt76u_stop_tx(dev);
> > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > >
> > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > problem less reproducible since tasklet_disable() is moved after
> > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > >
> > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > tasklet_disable here (at least for tx one)
>
> I think you have right as tasklet_kill() will wait for scheduled tasklet .
> Originally in my patch (see below) I used wait_event as I thought
> tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> leak) but that seems to be not true.

I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
is already waiting for tx pending so we just need to use tasklet_kill
at the end of mt76u_stop_queues, in this way we will free pending skbs during
suspend

Regards,
Lorenzo

>
> > I think it is enough even for rx side since usb_kill_urb() will wait the end of
> > mt76u_complete_rx() and tasklet_kill() will wait for the completion of
> > mt76u_rx_tasklet(). We will need to remove tasklet_enable in resume routines.
>
> No, because rx urb's are resubmitted on mt76u_rx_tasklet(). I use usb_poison_urb()
> to prevent that in my patch.
>
> Stanislaw
>
> commit 088b1209302e17cda688c9b562e18970c92823ac
> Author: Stanislaw Gruszka <[email protected]>
> Date: Thu Apr 4 13:37:43 2019 +0200
>
> mt76usb: fix tx/rx stop
>
> Disabling tasklets on stopping rx/tx is wrong. If blocked tasklet
> is scheduled and we remove device we get 100% cpu usage:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 9 root 20 0 0 0 0 R 93.8 0.0 1:47.19 ksoftirqd/0
>
> by using memory after free and eventually crash:
>
> [ 2068.591964] RIP: 0010:tasklet_action_common.isra.17+0x66/0x100
> [ 2068.591966] Code: 41 89 f5 eb 25 f0 48 0f ba 33 00 0f 83 b1 00 00 00 48 8b 7a 20 48 8b 42 18 e8 56 a3 b5 00 f0 80 23 fd 48 89 ea 48 85 ed 74 53 <48> 8b 2a 48 8d 5a 08 f0 48 0f ba 6a 08 01 72 0b 8b 42 10 85 c0 74
> [ 2068.591968] RSP: 0018:ffff98758c34be58 EFLAGS: 00010206
> [ 2068.591969] RAX: ffff98758e6966d0 RBX: ffff98756e69aef8 RCX: 0000000000000006
> [ 2068.591970] RDX: 01060a053d060305 RSI: 0000000000000006 RDI: ffff98758e6966d0
> [ 2068.591971] RBP: 01060a053d060305 R08: 0000000000000000 R09: 00000000000203c0
> [ 2068.591971] R10: 000003ff65b34f08 R11: 0000000000000001 R12: ffff98758e6966d0
> [ 2068.591972] R13: 0000000000000006 R14: 0000000000000040 R15: 0000000000000006
> [ 2068.591974] FS: 0000000000000000(0000) GS:ffff98758e680000(0000) knlGS:0000000000000000
> [ 2068.591975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2068.591975] CR2: 00002c5f73a6cc20 CR3: 00000002f920a001 CR4: 00000000003606e0
> [ 2068.591977] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2068.591978] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2068.591978] Call Trace:
> [ 2068.591985] __do_softirq+0xe3/0x30a
> [ 2068.591989] ? sort_range+0x20/0x20
> [ 2068.591990] run_ksoftirqd+0x26/0x40
> [ 2068.591992] smpboot_thread_fn+0xc5/0x160
> [ 2068.591995] kthread+0x112/0x130
> [ 2068.591997] ? kthread_create_on_node+0x40/0x40
> [ 2068.591998] ret_from_fork+0x35/0x40
> [ 2068.591999] Modules linked in: ccm arc4 fuse rfcomm cmac bnep sunrpc snd_hda_codec_hdmi snd_soc_skl snd_soc_core snd_soc_acpi_intel_match snd_hda_codec_realtek snd_soc_acpi snd_hda_codec_generic snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core iTCO_wdt snd_hda_intel intel_rapl iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp btusb mei_wdt coretemp btrtl snd_hda_codec btbcm btintel intel_cstate snd_hwdep intel_uncore uvcvideo snd_hda_core videobuf2_vmalloc videobuf2_memops intel_rapl_perf wmi_bmof videobuf2_v4l2 intel_wmi_thunderbolt snd_seq bluetooth joydev videobuf2_common snd_seq_device snd_pcm videodev media i2c_i801 snd_timer idma64 ecdh_generic intel_lpss_pci intel_lpss mei_me mei ucsi_acpi typec_ucsi processor_thermal_device intel_soc_dts_iosf intel_pch_thermal typec thinkpad_acpi wmi snd soundcore rfkill int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad pcc_cpufreq uas usb_storage crc32c_intel i915 i2c_algo_bit nvme serio_raw
> [ 2068.592033] drm_kms_helper e1000e nvme_core drm video ipv6 [last unloaded: cfg80211]
>
> Fortunate thing is that this not happen frequently, as scheduling
> tasklet on blocked state is very exceptional, though might happen.
>
> Due to different RX/TX tasklet processing fix is different for those.
> For TX just wait until queues are empty. For RX assure rx_tasklet
> do fail to resubmit buffers by poisoning urb's and kill the tasklet.
>
> Signed-off-by: Stanislaw Gruszka <[email protected]>
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
> index 15825e9a458f..7bf91566e212 100644

[...]

> @@ -830,20 +831,35 @@ static void mt76u_free_tx(struct mt76_dev *dev)
> static void mt76u_stop_tx(struct mt76_dev *dev)
> {
> struct mt76_queue *q;
> - int i, j;
> + int i, j, ret;
>
> for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> q = dev->q_tx[i].q;
> for (j = 0; j < q->ndesc; j++)
> usb_kill_urb(q->entry[j].urb);
> }
> +
> + ret = wait_event_timeout(dev->tx_wait, !mt76_has_tx_pending(dev), HZ/5);
> + if (!ret)
> + dev_err(dev->dev, "timed out waiting for pending tx\n");
> }
>
> -void mt76u_stop_queues(struct mt76_dev *dev)
> +int mt76u_start_queues(struct mt76_dev *dev)

maybe mt76u_resume_rx_queue would be more clear

> {
> - tasklet_disable(&dev->usb.rx_tasklet);
> - tasklet_disable(&dev->usb.tx_tasklet);
> + struct mt76_queue *q = &dev->q_rx[MT_RXQ_MAIN];
> + int i;
> +
> + /* Nothing to do for TX */
>
> + for (i = 0; i < q->ndesc; i++)
> + usb_unpoison_urb(q->entry[i].urb);
> +
> + return mt76u_submit_rx_buffers(dev);
> +}
> +EXPORT_SYMBOL_GPL(mt76u_start_queues);
> +
> +void mt76u_stop_queues(struct mt76_dev *dev)
> +{
> mt76u_stop_rx(dev);
> mt76u_stop_tx(dev);
> }


Attachments:
(No filename) (7.84 kB)
signature.asc (228.00 B)
Download all attachments

2019-04-15 11:54:03

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > >
> > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > in resume callback. There is issue with device removal though
> > > > > (during suspend or otherwise).
> > > >
> > > > Hi Stanislaw,
> > > >
> > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > can happen in very far future
> >
> > Yes, it's better to free on suspend, but in practice does not really matter since
> > system is disabled till resume.
> >
> > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > ---
> > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > index a3acc070063a..575207133775 100644
> > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > {
> > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > -
> > > > > > mt76u_stop_rx(dev);
> > > > > > +
> > > > > > mt76u_stop_tx(dev);
> > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > >
> > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > >
> > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > tasklet_disable here (at least for tx one)
> >
> > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > Originally in my patch (see below) I used wait_event as I thought
> > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > leak) but that seems to be not true.
>
> I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> is already waiting for tx pending so we just need to use tasklet_kill
> at the end of mt76u_stop_queues, in this way we will free pending skbs during
> suspend

I looked more into that and there are some issues with this approach.
tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
do not free skb's that require status check and dev->usb.stat_work
is already (correctly) stopped on mac80211.stop.

I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
issues correctly.

Stanislaw

2019-04-15 15:04:13

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

> On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > > >
> > > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > > in resume callback. There is issue with device removal though
> > > > > > (during suspend or otherwise).
> > > > >
> > > > > Hi Stanislaw,
> > > > >
> > > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > > can happen in very far future
> > >
> > > Yes, it's better to free on suspend, but in practice does not really matter since
> > > system is disabled till resume.
> > >
> > > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > > ---
> > > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > index a3acc070063a..575207133775 100644
> > > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > > {
> > > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > -
> > > > > > > mt76u_stop_rx(dev);
> > > > > > > +
> > > > > > > mt76u_stop_tx(dev);
> > > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > > >
> > > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > > >
> > > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > > tasklet_disable here (at least for tx one)
> > >
> > > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > > Originally in my patch (see below) I used wait_event as I thought
> > > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > > leak) but that seems to be not true.
> >
> > I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> > is already waiting for tx pending so we just need to use tasklet_kill
> > at the end of mt76u_stop_queues, in this way we will free pending skbs during
> > suspend
>
> I looked more into that and there are some issues with this approach.
> tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
> do not free skb's that require status check and dev->usb.stat_work
> is already (correctly) stopped on mac80211.stop.

right

>
> I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
> issues correctly.

ack

>
> Stanislaw

during device removal I guess we should also flush skbs in status queue, doing
something like (after commit 0b5f71304cd9 (mt76: introduce mt76_free_device
routine))

diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
index 1ef00e971cfa..d4d1eb003148 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
@@ -299,7 +299,7 @@ static void mt76x0_disconnect(struct usb_interface *usb_intf)
if (!initalized)
return;

- ieee80211_unregister_hw(dev->mt76.hw);
+ mt76_unregister_device(&dev->mt76);
mt76x0u_cleanup(dev);

usb_set_intfdata(usb_intf, NULL);
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
index d08bb964966b..4394c7c10535 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
@@ -94,7 +94,7 @@ static void mt76x2u_disconnect(struct usb_interface *intf)
struct ieee80211_hw *hw = mt76_hw(dev);

set_bit(MT76_REMOVED, &dev->mt76.state);
- ieee80211_unregister_hw(hw);
+ mt76_unregister_device(&dev->mt76);
mt76x2u_cleanup(dev);

ieee80211_free_hw(hw);

Regards,
Lorenzo


Attachments:
(No filename) (4.45 kB)
signature.asc (228.00 B)
Download all attachments

2019-04-16 08:04:48

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Mon, Apr 15, 2019 at 05:04:06PM +0200, Lorenzo Bianconi wrote:
> > On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > > > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > > > >
> > > > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > > > in resume callback. There is issue with device removal though
> > > > > > > (during suspend or otherwise).
> > > > > >
> > > > > > Hi Stanislaw,
> > > > > >
> > > > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > > > can happen in very far future
> > > >
> > > > Yes, it's better to free on suspend, but in practice does not really matter since
> > > > system is disabled till resume.
> > > >
> > > > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > index a3acc070063a..575207133775 100644
> > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > > > {
> > > > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > -
> > > > > > > > mt76u_stop_rx(dev);
> > > > > > > > +
> > > > > > > > mt76u_stop_tx(dev);
> > > > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > >
> > > > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > > > >
> > > > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > > > tasklet_disable here (at least for tx one)
> > > >
> > > > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > > > Originally in my patch (see below) I used wait_event as I thought
> > > > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > > > leak) but that seems to be not true.
> > >
> > > I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> > > is already waiting for tx pending so we just need to use tasklet_kill
> > > at the end of mt76u_stop_queues, in this way we will free pending skbs during
> > > suspend
> >
> > I looked more into that and there are some issues with this approach.
> > tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
> > do not free skb's that require status check and dev->usb.stat_work
> > is already (correctly) stopped on mac80211.stop.
>
> right
>
> >
> > I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
> > issues correctly.
>
> ack
>
> >
> > Stanislaw
>
> during device removal I guess we should also flush skbs in status queue, doing
> something like (after commit 0b5f71304cd9 (mt76: introduce mt76_free_device
> routine))
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> index 1ef00e971cfa..d4d1eb003148 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> @@ -299,7 +299,7 @@ static void mt76x0_disconnect(struct usb_interface *usb_intf)
> if (!initalized)
> return;
>
> - ieee80211_unregister_hw(dev->mt76.hw);
> + mt76_unregister_device(&dev->mt76);

mt76_unregister_device() free mmio dma. I've added mt76_tx_status_check()
on mt76u_stop_tx() routine instead.

Stanislaw


2019-04-16 08:12:49

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

> On Mon, Apr 15, 2019 at 05:04:06PM +0200, Lorenzo Bianconi wrote:
> > > On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > > > > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > > > > >
> > > > > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > > > > in resume callback. There is issue with device removal though
> > > > > > > > (during suspend or otherwise).
> > > > > > >
> > > > > > > Hi Stanislaw,
> > > > > > >
> > > > > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > > > > can happen in very far future
> > > > >
> > > > > Yes, it's better to free on suspend, but in practice does not really matter since
> > > > > system is disabled till resume.
> > > > >
> > > > > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > > > > ---
> > > > > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > index a3acc070063a..575207133775 100644
> > > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > > > > {
> > > > > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > > -
> > > > > > > > > mt76u_stop_rx(dev);
> > > > > > > > > +
> > > > > > > > > mt76u_stop_tx(dev);
> > > > > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > >
> > > > > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > > > > >
> > > > > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > > > > tasklet_disable here (at least for tx one)
> > > > >
> > > > > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > > > > Originally in my patch (see below) I used wait_event as I thought
> > > > > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > > > > leak) but that seems to be not true.
> > > >
> > > > I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> > > > is already waiting for tx pending so we just need to use tasklet_kill
> > > > at the end of mt76u_stop_queues, in this way we will free pending skbs during
> > > > suspend
> > >
> > > I looked more into that and there are some issues with this approach.
> > > tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
> > > do not free skb's that require status check and dev->usb.stat_work
> > > is already (correctly) stopped on mac80211.stop.
> >
> > right
> >
> > >
> > > I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
> > > issues correctly.
> >
> > ack
> >
> > >
> > > Stanislaw
> >
> > during device removal I guess we should also flush skbs in status queue, doing
> > something like (after commit 0b5f71304cd9 (mt76: introduce mt76_free_device
> > routine))
> >
> > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > index 1ef00e971cfa..d4d1eb003148 100644
> > --- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > @@ -299,7 +299,7 @@ static void mt76x0_disconnect(struct usb_interface *usb_intf)
> > if (!initalized)
> > return;
> >
> > - ieee80211_unregister_hw(dev->mt76.hw);
> > + mt76_unregister_device(&dev->mt76);
>
> mt76_unregister_device() free mmio dma. I've added mt76_tx_status_check()
> on mt76u_stop_tx() routine instead.

nope, after commit 0b5f71304cd98fb7b3b5b3a633e470bea979fe94
(https://github.com/nbd168/wireless/commit/0b5f71304cd98fb7b3b5b3a633e470bea979fe94)
it can be used even for usb

Lorenzo

>
> Stanislaw
>


Attachments:
(No filename) (4.56 kB)
signature.asc (228.00 B)
Download all attachments

2019-04-16 08:27:26

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Tue, Apr 16, 2019 at 10:12:42AM +0200, Lorenzo Bianconi wrote:
> > On Mon, Apr 15, 2019 at 05:04:06PM +0200, Lorenzo Bianconi wrote:
> > > > On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > > > > > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > > > > > >
> > > > > > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > > > > > in resume callback. There is issue with device removal though
> > > > > > > > > (during suspend or otherwise).
> > > > > > > >
> > > > > > > > Hi Stanislaw,
> > > > > > > >
> > > > > > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > > > > > can happen in very far future
> > > > > >
> > > > > > Yes, it's better to free on suspend, but in practice does not really matter since
> > > > > > system is disabled till resume.
> > > > > >
> > > > > > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > index a3acc070063a..575207133775 100644
> > > > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > > > > > {
> > > > > > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > > > -
> > > > > > > > > > mt76u_stop_rx(dev);
> > > > > > > > > > +
> > > > > > > > > > mt76u_stop_tx(dev);
> > > > > > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > >
> > > > > > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > > > > > >
> > > > > > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > > > > > tasklet_disable here (at least for tx one)
> > > > > >
> > > > > > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > > > > > Originally in my patch (see below) I used wait_event as I thought
> > > > > > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > > > > > leak) but that seems to be not true.
> > > > >
> > > > > I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> > > > > is already waiting for tx pending so we just need to use tasklet_kill
> > > > > at the end of mt76u_stop_queues, in this way we will free pending skbs during
> > > > > suspend
> > > >
> > > > I looked more into that and there are some issues with this approach.
> > > > tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
> > > > do not free skb's that require status check and dev->usb.stat_work
> > > > is already (correctly) stopped on mac80211.stop.
> > >
> > > right
> > >
> > > >
> > > > I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
> > > > issues correctly.
> > >
> > > ack
> > >
> > > >
> > > > Stanislaw
> > >
> > > during device removal I guess we should also flush skbs in status queue, doing
> > > something like (after commit 0b5f71304cd9 (mt76: introduce mt76_free_device
> > > routine))
> > >
> > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > index 1ef00e971cfa..d4d1eb003148 100644
> > > --- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > @@ -299,7 +299,7 @@ static void mt76x0_disconnect(struct usb_interface *usb_intf)
> > > if (!initalized)
> > > return;
> > >
> > > - ieee80211_unregister_hw(dev->mt76.hw);
> > > + mt76_unregister_device(&dev->mt76);
> >
> > mt76_unregister_device() free mmio dma. I've added mt76_tx_status_check()
> > on mt76u_stop_tx() routine instead.
>
> nope, after commit 0b5f71304cd98fb7b3b5b3a633e470bea979fe94
> (https://github.com/nbd168/wireless/commit/0b5f71304cd98fb7b3b5b3a633e470bea979fe94)
> it can be used even for usb

Ok, but as you pointed before 'right moment to deallocate the skbs is
during suspend' so I still preffer to flush statuses there.

Stanislaw

2019-04-16 08:33:11

by Lorenzo Bianconi

[permalink] [raw]
Subject: Re: [PATCH] mt76: usb: fix possible memory leak during suspend/resume

On Apr 16, Stanislaw Gruszka wrote:
> On Tue, Apr 16, 2019 at 10:12:42AM +0200, Lorenzo Bianconi wrote:
> > > On Mon, Apr 15, 2019 at 05:04:06PM +0200, Lorenzo Bianconi wrote:
> > > > > On Sat, Apr 13, 2019 at 12:10:59PM +0200, Lorenzo Bianconi wrote:
> > > > > > > On Fri, Apr 12, 2019 at 06:27:48PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > > > On Fri, Apr 12, 2019 at 02:27:16PM +0200, Lorenzo Bianconi wrote:
> > > > > > > > > > > Disable mt76u_tx_tasklet at the end of mt76u_stop_queues in order to
> > > > > > > > > > > properly deallocate all pending skbs during suspend/resume phase
> > > > > > > > > >
> > > > > > > > > > On suspend/resume tx skb's are processed after tasklet_enable()
> > > > > > > > > > in resume callback. There is issue with device removal though
> > > > > > > > > > (during suspend or otherwise).
> > > > > > > > >
> > > > > > > > > Hi Stanislaw,
> > > > > > > > >
> > > > > > > > > I guess the right moment to deallocate the skbs is during suspend since resume
> > > > > > > > > can happen in very far future
> > > > > > >
> > > > > > > Yes, it's better to free on suspend, but in practice does not really matter since
> > > > > > > system is disabled till resume.
> > > > > > >
> > > > > > > > > > > Fixes: b40b15e1521f ("mt76: add usb support to mt76 layer")
> > > > > > > > > > > Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/net/wireless/mediatek/mt76/usb.c | 4 ++--
> > > > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > > index a3acc070063a..575207133775 100644
> > > > > > > > > > > --- a/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > > +++ b/drivers/net/wireless/mediatek/mt76/usb.c
> > > > > > > > > > > @@ -842,10 +842,10 @@ static void mt76u_stop_tx(struct mt76_dev *dev)
> > > > > > > > > > > void mt76u_stop_queues(struct mt76_dev *dev)
> > > > > > > > > > > {
> > > > > > > > > > > tasklet_disable(&dev->usb.rx_tasklet);
> > > > > > > > > > > - tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > > > > -
> > > > > > > > > > > mt76u_stop_rx(dev);
> > > > > > > > > > > +
> > > > > > > > > > > mt76u_stop_tx(dev);
> > > > > > > > > > > + tasklet_disable(&dev->usb.tx_tasklet);
> > > > > > > > > >
> > > > > > > > > > If tasklet is scheduled and we disable it and never enable, we end up
> > > > > > > > > > with infinite loop in tasklet_action_common(). This patch make the
> > > > > > > > > > problem less reproducible since tasklet_disable() is moved after
> > > > > > > > > > usb_kill_urb() -> tasklet_schedule(), but it is still possible.
> > > > > > > > >
> > > > > > > > > I can see the point here. Maybe we can just run tasklet_kill instead of
> > > > > > > > > tasklet_disable here (at least for tx one)
> > > > > > >
> > > > > > > I think you have right as tasklet_kill() will wait for scheduled tasklet .
> > > > > > > Originally in my patch (see below) I used wait_event as I thought
> > > > > > > tasklet_kill() may prevent scheduled tasklet to be executed (hence cause
> > > > > > > leak) but that seems to be not true.
> > > > > >
> > > > > > I agree with rx side (good catch!!), but on tx one I guess usb_kill_urb()
> > > > > > is already waiting for tx pending so we just need to use tasklet_kill
> > > > > > at the end of mt76u_stop_queues, in this way we will free pending skbs during
> > > > > > suspend
> > > > >
> > > > > I looked more into that and there are some issues with this approach.
> > > > > tx_tasklet do mt76_txq_schedule() which can queue tx frames. Also we
> > > > > do not free skb's that require status check and dev->usb.stat_work
> > > > > is already (correctly) stopped on mac80211.stop.
> > > >
> > > > right
> > > >
> > > > >
> > > > > I'll use wait_event(dev->tx_wait) on mac80211 stop to handle those
> > > > > issues correctly.
> > > >
> > > > ack
> > > >
> > > > >
> > > > > Stanislaw
> > > >
> > > > during device removal I guess we should also flush skbs in status queue, doing
> > > > something like (after commit 0b5f71304cd9 (mt76: introduce mt76_free_device
> > > > routine))
> > > >
> > > > diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > > index 1ef00e971cfa..d4d1eb003148 100644
> > > > --- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > > +++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
> > > > @@ -299,7 +299,7 @@ static void mt76x0_disconnect(struct usb_interface *usb_intf)
> > > > if (!initalized)
> > > > return;
> > > >
> > > > - ieee80211_unregister_hw(dev->mt76.hw);
> > > > + mt76_unregister_device(&dev->mt76);
> > >
> > > mt76_unregister_device() free mmio dma. I've added mt76_tx_status_check()
> > > on mt76u_stop_tx() routine instead.
> >
> > nope, after commit 0b5f71304cd98fb7b3b5b3a633e470bea979fe94
> > (https://github.com/nbd168/wireless/commit/0b5f71304cd98fb7b3b5b3a633e470bea979fe94)
> > it can be used even for usb
>
> Ok, but as you pointed before 'right moment to deallocate the skbs is
> during suspend' so I still preffer to flush statuses there.
>

I agree

Lorenzo

> Stanislaw


Attachments:
(No filename) (5.17 kB)
signature.asc (228.00 B)
Download all attachments