2014-10-08 09:14:49

by Rajkumar Manoharan

[permalink] [raw]
Subject: [PATCH] ath10k: fix kernel panic while shutting down AP

The commit "ath10k: workaround fw beaconing bug" is freeing
DMA-coherent memory in irq context which is hitting BUG ON
in ARM platforms. Fix this by moving dma_free out of spin
lock.

kernel BUG at mm/vmalloc.c:1512!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
task: dd58b840 ti: da6a6000 task.ti: da6a6000
PC is at vunmap+0x24/0x34
LR is at __arm_dma_free.isra.21+0x12c/0x190
[<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
[<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
(ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
[<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
(ath10k_remove_interface+0x44/0x1ec [ath10k_core])
[<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
(ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
[<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
(ieee80211_stop+0x10/0x18 [mac80211])
[<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
(__dev_close_many+0x9c/0xcc)

Cc: Michal Kazior <[email protected]>
Signed-off-by: Rajkumar Manoharan <[email protected]>
---
drivers/net/wireless/ath/ath10k/mac.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index bf8333c..dd4f56a 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -498,21 +498,6 @@ void ath10k_mac_vif_beacon_free(struct ath10k_vif *arvif)
arvif->beacon_sent = false;
}

-static void ath10k_mac_vif_beacon_cleanup(struct ath10k_vif *arvif)
-{
- struct ath10k *ar = arvif->ar;
-
- lockdep_assert_held(&ar->data_lock);
-
- ath10k_mac_vif_beacon_free(arvif);
-
- if (arvif->beacon_buf) {
- dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
- arvif->beacon_buf, arvif->beacon_paddr);
- arvif->beacon_buf = NULL;
- }
-}
-
static inline int ath10k_vdev_setup_sync(struct ath10k *ar)
{
int ret;
@@ -2404,8 +2389,15 @@ void ath10k_halt(struct ath10k *ar)

spin_lock_bh(&ar->data_lock);
list_for_each_entry(arvif, &ar->arvifs, list)
- ath10k_mac_vif_beacon_cleanup(arvif);
+ ath10k_mac_vif_beacon_free(arvif);
spin_unlock_bh(&ar->data_lock);
+ list_for_each_entry(arvif, &ar->arvifs, list) {
+ if (!arvif->beacon_buf)
+ continue;
+ dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
+ arvif->beacon_buf, arvif->beacon_paddr);
+ arvif->beacon_buf = NULL;
+ }
}

static int ath10k_get_antenna(struct ieee80211_hw *hw, u32 *tx_ant, u32 *rx_ant)
@@ -2988,9 +2980,15 @@ static void ath10k_remove_interface(struct ieee80211_hw *hw,
cancel_work_sync(&arvif->wep_key_work);

spin_lock_bh(&ar->data_lock);
- ath10k_mac_vif_beacon_cleanup(arvif);
+ ath10k_mac_vif_beacon_free(arvif);
spin_unlock_bh(&ar->data_lock);

+ if (arvif->beacon_buf) {
+ dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
+ arvif->beacon_buf, arvif->beacon_paddr);
+ arvif->beacon_buf = NULL;
+ }
+
ret = ath10k_spectral_vif_stop(arvif);
if (ret)
ath10k_warn(ar, "failed to stop spectral for vdev %i: %d\n",
--
2.1.2



2014-10-08 11:06:57

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On Wed, Oct 08, 2014 at 12:50:28PM +0200, Michal Kazior wrote:
> On 8 October 2014 12:33, Rajkumar Manoharan <[email protected]> wrote:
> > On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
> >> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
> >> > The commit "ath10k: workaround fw beaconing bug" is freeing
> >> > DMA-coherent memory in irq context which is hitting BUG ON
> >> > in ARM platforms. Fix this by moving dma_free out of spin
> >> > lock.
> >>
> >> I hardly see how moving the freeing outside the spinlock is a fix.
> >>
> >>
> >> > kernel BUG at mm/vmalloc.c:1512!
> >> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> >> > CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
> >> > task: dd58b840 ti: da6a6000 task.ti: da6a6000
> >> > PC is at vunmap+0x24/0x34
> >> > LR is at __arm_dma_free.isra.21+0x12c/0x190
> >> > [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
> >> > [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
> >> > (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
> >> > [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
> >> > (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
> >> > [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
> >> > (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
> >> > [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
> >> > (ieee80211_stop+0x10/0x18 [mac80211])
> >> > [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
> >> > (__dev_close_many+0x9c/0xcc)
> >>
> >> 1. How can even ieee80211_add_virtual_monitor() call
> >> ath10k_remove_interface()? Upstream ath10k doesn't advertise
> >> IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
> >> you're not using upstream ath10k and/or have custom patches applied to
> >> ath10k.
> >>
> > This is the backtrace captured on panic and we are getting the same
> > backtrace consistently. I confirmed that add_virtual_monitor is not
> > called for ath10k as it is not advertising. ath10k_remove_interface is
> > called for master mode.
> >
> >> 2. How can ieee80211_stop() be called from an interrupt context
> >> anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
> >> ieee80211_roc_purge() which tries to get a hold of local->mtx. This
> >> implies ieee80211_stop() isn't design to be run in an interrupt
> >> context to begin with so I don't see why ath10k should even care if
> >> ath10k_remove_interface() is called in an interrupt context at this
> >> point.
> >>
> > in_interrupt is counting soft and hard irqs. ieee80211_stop is not
> > called from interrupt context. In ath10k, by aquiring spin_lock in
> > ath10k_mac_vif_beacon_free is increasing soft irq count.
> >
> > In ARM arch, __arm_dma_free is calling vunmap which might sleep. So it
> > can not be called within spin_lock.
>
> Did you try using GFP_ATOMIC in the dma_alloc_coherent instead of
> moving the spinlock?
>
Nope. The problem is while freeing dma memory not during allocation.
dma_free_coherent won't take any GFP_* flags.
>
> >
> > Similar to dma_alloc_coherent, dma_free_coherent can not be called under
> > soft irq context.
>
> The call trace points to ath10k_mac_vif_beacon_free() which doesn't
> use dma_free_coherent() so why are you blaming it for the BUG_ON?
>
I agree the calltrace is a bogus. ath10k_mac_vif_beacon_cleanup is
calling dma_free_coherent.

> If anything the offender should be dma_unmap_single() but the thing is
> beacon_buf is always allocated for AP/IBSS now which means
> dma_unmap_single() is never called. For non-AP/IBSS both arvif->beacon
> and arvif->beacon_buf are always NULL so neither
> dma_alloc/free_coherent nor dma_map/unmap_single are called.
>
Agree. We need one more check in ath10k_mac_vif_beacon_free. But here
the problem is dma_free_coherent is being called from soft irq context
which is not allowed (BUG_ON noted in ARM arch).

After moving dma_free_coherent out of spinlock, the kernel
BUG at mm/vmalloc.c:1512 is never hit.

-Rajkumar

2014-10-08 10:46:50

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On Wed, Oct 08, 2014 at 12:52:04PM +0300, Kalle Valo wrote:
> Rajkumar Manoharan <[email protected]> writes:
>
> > The commit "ath10k: workaround fw beaconing bug" is freeing
> > DMA-coherent memory in irq context which is hitting BUG ON
> > in ARM platforms. Fix this by moving dma_free out of spin
> > lock.

> [...]
>
> > @@ -2404,8 +2389,15 @@ void ath10k_halt(struct ath10k *ar)
> >
> > spin_lock_bh(&ar->data_lock);
> > list_for_each_entry(arvif, &ar->arvifs, list)
> > - ath10k_mac_vif_beacon_cleanup(arvif);
> > + ath10k_mac_vif_beacon_free(arvif);
> > spin_unlock_bh(&ar->data_lock);
> > + list_for_each_entry(arvif, &ar->arvifs, list) {
> > + if (!arvif->beacon_buf)
> > + continue;
> > + dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
> > + arvif->beacon_buf, arvif->beacon_paddr);
> > + arvif->beacon_buf = NULL;
> > + }
> > }
>
> Until now we have protected arvif->beacon_buf with data_lock. How do we
> know that this is safe to do without taking data_lock?
>
As said, spin_lock can not be used for dma_free_coherent.
arvif->beacon_buf is already protected by conf_mutex. At this state
in ath10k_halt path, no one can access beacon_buf. So mutex lock itself
is sufficient.

-Rajkumar

2014-10-08 12:24:19

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On 8 October 2014 12:48, Rajkumar Manoharan <[email protected]> wrote:
> On Wed, Oct 08, 2014 at 12:52:04PM +0300, Kalle Valo wrote:
>> Rajkumar Manoharan <[email protected]> writes:
>>
>> > The commit "ath10k: workaround fw beaconing bug" is freeing
>> > DMA-coherent memory in irq context which is hitting BUG ON
>> > in ARM platforms. Fix this by moving dma_free out of spin
>> > lock.
>
>> [...]
>>
>> > @@ -2404,8 +2389,15 @@ void ath10k_halt(struct ath10k *ar)
>> >
>> > spin_lock_bh(&ar->data_lock);
>> > list_for_each_entry(arvif, &ar->arvifs, list)
>> > - ath10k_mac_vif_beacon_cleanup(arvif);
>> > + ath10k_mac_vif_beacon_free(arvif);
>> > spin_unlock_bh(&ar->data_lock);
>> > + list_for_each_entry(arvif, &ar->arvifs, list) {
>> > + if (!arvif->beacon_buf)
>> > + continue;
>> > + dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
>> > + arvif->beacon_buf, arvif->beacon_paddr);
>> > + arvif->beacon_buf = NULL;
>> > + }
>> > }
>>
>> Until now we have protected arvif->beacon_buf with data_lock. How do we
>> know that this is safe to do without taking data_lock?
>>
> As said, spin_lock can not be used for dma_free_coherent.
> arvif->beacon_buf is already protected by conf_mutex. At this state
> in ath10k_halt path, no one can access beacon_buf. So mutex lock itself
> is sufficient.

beacon_buf is protected by conf_mutex implicitly. It wasn't the main
intent. It is protected with data_lock spinlock.

Do not trust the device - if there's a spurious SWBA event while
ath10k_remove_interface() is running you could end up with invalid
memory access.

It might be acceptable to drop the spinlock for ath10k_halt() since
the device is guaranteed to be stopped at that point (effectively
reset) though.

Anyway I'm hoping this bug can be fixed with the gfp flag.


Michał

2014-10-08 12:37:13

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

Michal Kazior <[email protected]> writes:

>>> Until now we have protected arvif->beacon_buf with data_lock. How do we
>>> know that this is safe to do without taking data_lock?
>>>
>> As said, spin_lock can not be used for dma_free_coherent.
>> arvif->beacon_buf is already protected by conf_mutex. At this state
>> in ath10k_halt path, no one can access beacon_buf. So mutex lock itself
>> is sufficient.
>
> beacon_buf is protected by conf_mutex implicitly. It wasn't the main
> intent. It is protected with data_lock spinlock.
>
> Do not trust the device - if there's a spurious SWBA event while
> ath10k_remove_interface() is running you could end up with invalid
> memory access.
>
> It might be acceptable to drop the spinlock for ath10k_halt() since
> the device is guaranteed to be stopped at that point (effectively
> reset) though.
>
> Anyway I'm hoping this bug can be fixed with the gfp flag.

Yeah, fixing this with the gfp flag would be much better solution.

--
Kalle Valo

2014-10-08 12:17:15

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On 8 October 2014 13:13, Rajkumar Manoharan <[email protected]> wrote:
> On Wed, Oct 08, 2014 at 04:38:44PM +0530, Rajkumar Manoharan wrote:
>> On Wed, Oct 08, 2014 at 12:50:28PM +0200, Michal Kazior wrote:
>> > If anything the offender should be dma_unmap_single() but the thing is
>> > beacon_buf is always allocated for AP/IBSS now which means
>> > dma_unmap_single() is never called. For non-AP/IBSS both arvif->beacon
>> > and arvif->beacon_buf are always NULL so neither
>> > dma_alloc/free_coherent nor dma_map/unmap_single are called.
>> >
>> Agree. We need one more check in ath10k_mac_vif_beacon_free.
>
> No additional check is needed. For non beaconing mode, arvif->beacon
> should be false. isn't it?

Correct. No extra checks are necessary.


Michał

2014-10-08 09:52:16

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

Rajkumar Manoharan <[email protected]> writes:

> The commit "ath10k: workaround fw beaconing bug" is freeing
> DMA-coherent memory in irq context which is hitting BUG ON
> in ARM platforms. Fix this by moving dma_free out of spin
> lock.
>
> kernel BUG at mm/vmalloc.c:1512!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
> task: dd58b840 ti: da6a6000 task.ti: da6a6000
> PC is at vunmap+0x24/0x34
> LR is at __arm_dma_free.isra.21+0x12c/0x190
> [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
> [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
> (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
> [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
> (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
> [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
> (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
> [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
> (ieee80211_stop+0x10/0x18 [mac80211])
> [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
> (__dev_close_many+0x9c/0xcc)
>
> Cc: Michal Kazior <[email protected]>
> Signed-off-by: Rajkumar Manoharan <[email protected]>

[...]

> @@ -2404,8 +2389,15 @@ void ath10k_halt(struct ath10k *ar)
>
> spin_lock_bh(&ar->data_lock);
> list_for_each_entry(arvif, &ar->arvifs, list)
> - ath10k_mac_vif_beacon_cleanup(arvif);
> + ath10k_mac_vif_beacon_free(arvif);
> spin_unlock_bh(&ar->data_lock);
> + list_for_each_entry(arvif, &ar->arvifs, list) {
> + if (!arvif->beacon_buf)
> + continue;
> + dma_free_coherent(ar->dev, IEEE80211_MAX_FRAME_LEN,
> + arvif->beacon_buf, arvif->beacon_paddr);
> + arvif->beacon_buf = NULL;
> + }
> }

Until now we have protected arvif->beacon_buf with data_lock. How do we
know that this is safe to do without taking data_lock?

--
Kalle Valo

2014-10-08 09:45:39

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
> The commit "ath10k: workaround fw beaconing bug" is freeing
> DMA-coherent memory in irq context which is hitting BUG ON
> in ARM platforms. Fix this by moving dma_free out of spin
> lock.

I hardly see how moving the freeing outside the spinlock is a fix.


> kernel BUG at mm/vmalloc.c:1512!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
> task: dd58b840 ti: da6a6000 task.ti: da6a6000
> PC is at vunmap+0x24/0x34
> LR is at __arm_dma_free.isra.21+0x12c/0x190
> [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
> [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
> (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
> [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
> (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
> [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
> (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
> [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
> (ieee80211_stop+0x10/0x18 [mac80211])
> [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
> (__dev_close_many+0x9c/0xcc)

1. How can even ieee80211_add_virtual_monitor() call
ath10k_remove_interface()? Upstream ath10k doesn't advertise
IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
you're not using upstream ath10k and/or have custom patches applied to
ath10k.

2. How can ieee80211_stop() be called from an interrupt context
anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
ieee80211_roc_purge() which tries to get a hold of local->mtx. This
implies ieee80211_stop() isn't design to be run in an interrupt
context to begin with so I don't see why ath10k should even care if
ath10k_remove_interface() is called in an interrupt context at this
point.


Michał

2014-10-08 10:50:29

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On 8 October 2014 12:33, Rajkumar Manoharan <[email protected]> wrote:
> On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
>> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
>> > The commit "ath10k: workaround fw beaconing bug" is freeing
>> > DMA-coherent memory in irq context which is hitting BUG ON
>> > in ARM platforms. Fix this by moving dma_free out of spin
>> > lock.
>>
>> I hardly see how moving the freeing outside the spinlock is a fix.
>>
>>
>> > kernel BUG at mm/vmalloc.c:1512!
>> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> > CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
>> > task: dd58b840 ti: da6a6000 task.ti: da6a6000
>> > PC is at vunmap+0x24/0x34
>> > LR is at __arm_dma_free.isra.21+0x12c/0x190
>> > [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
>> > [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
>> > (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
>> > [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
>> > (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
>> > [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
>> > (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
>> > [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
>> > (ieee80211_stop+0x10/0x18 [mac80211])
>> > [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
>> > (__dev_close_many+0x9c/0xcc)
>>
>> 1. How can even ieee80211_add_virtual_monitor() call
>> ath10k_remove_interface()? Upstream ath10k doesn't advertise
>> IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
>> you're not using upstream ath10k and/or have custom patches applied to
>> ath10k.
>>
> This is the backtrace captured on panic and we are getting the same
> backtrace consistently. I confirmed that add_virtual_monitor is not
> called for ath10k as it is not advertising. ath10k_remove_interface is
> called for master mode.
>
>> 2. How can ieee80211_stop() be called from an interrupt context
>> anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
>> ieee80211_roc_purge() which tries to get a hold of local->mtx. This
>> implies ieee80211_stop() isn't design to be run in an interrupt
>> context to begin with so I don't see why ath10k should even care if
>> ath10k_remove_interface() is called in an interrupt context at this
>> point.
>>
> in_interrupt is counting soft and hard irqs. ieee80211_stop is not
> called from interrupt context. In ath10k, by aquiring spin_lock in
> ath10k_mac_vif_beacon_free is increasing soft irq count.
>
> In ARM arch, __arm_dma_free is calling vunmap which might sleep. So it
> can not be called within spin_lock.

Did you try using GFP_ATOMIC in the dma_alloc_coherent instead of
moving the spinlock?


>
> Similar to dma_alloc_coherent, dma_free_coherent can not be called under
> soft irq context.

The call trace points to ath10k_mac_vif_beacon_free() which doesn't
use dma_free_coherent() so why are you blaming it for the BUG_ON?

If anything the offender should be dma_unmap_single() but the thing is
beacon_buf is always allocated for AP/IBSS now which means
dma_unmap_single() is never called. For non-AP/IBSS both arvif->beacon
and arvif->beacon_buf are always NULL so neither
dma_alloc/free_coherent nor dma_map/unmap_single are called.


> PS: I am using upstream ath10k driver without private changes.

Interesting. Why is there ieee80211_add_virtual_monitor() calling
ath10k_remove_interface() then? This makes absolutely no sense.

Can you try checking what lines does the call trace point to in gdb?


Michał

2014-10-08 11:11:46

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On Wed, Oct 08, 2014 at 04:38:44PM +0530, Rajkumar Manoharan wrote:
> On Wed, Oct 08, 2014 at 12:50:28PM +0200, Michal Kazior wrote:
> > On 8 October 2014 12:33, Rajkumar Manoharan <[email protected]> wrote:
> > > On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
> > >> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
> > >> > The commit "ath10k: workaround fw beaconing bug" is freeing
> > >> > DMA-coherent memory in irq context which is hitting BUG ON
> > >> > in ARM platforms. Fix this by moving dma_free out of spin
> > >> > lock.
> > >>
> > >> I hardly see how moving the freeing outside the spinlock is a fix.
> > >>
> > >>
> > >> > kernel BUG at mm/vmalloc.c:1512!
> > >> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> > >> > CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
> > >> > task: dd58b840 ti: da6a6000 task.ti: da6a6000
> > >> > PC is at vunmap+0x24/0x34
> > >> > LR is at __arm_dma_free.isra.21+0x12c/0x190
> > >> > [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
> > >> > [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
> > >> > (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
> > >> > [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
> > >> > (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
> > >> > [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
> > >> > (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
> > >> > [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
> > >> > (ieee80211_stop+0x10/0x18 [mac80211])
> > >> > [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
> > >> > (__dev_close_many+0x9c/0xcc)
> > >>
> > >> 1. How can even ieee80211_add_virtual_monitor() call
> > >> ath10k_remove_interface()? Upstream ath10k doesn't advertise
> > >> IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
> > >> you're not using upstream ath10k and/or have custom patches applied to
> > >> ath10k.
> > >>
> > > This is the backtrace captured on panic and we are getting the same
> > > backtrace consistently. I confirmed that add_virtual_monitor is not
> > > called for ath10k as it is not advertising. ath10k_remove_interface is
> > > called for master mode.
> > >
> > >> 2. How can ieee80211_stop() be called from an interrupt context
> > >> anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
> > >> ieee80211_roc_purge() which tries to get a hold of local->mtx. This
> > >> implies ieee80211_stop() isn't design to be run in an interrupt
> > >> context to begin with so I don't see why ath10k should even care if
> > >> ath10k_remove_interface() is called in an interrupt context at this
> > >> point.
> > >>
> > > in_interrupt is counting soft and hard irqs. ieee80211_stop is not
> > > called from interrupt context. In ath10k, by aquiring spin_lock in
> > > ath10k_mac_vif_beacon_free is increasing soft irq count.
> > >
> > > In ARM arch, __arm_dma_free is calling vunmap which might sleep. So it
> > > can not be called within spin_lock.
> >
> > Did you try using GFP_ATOMIC in the dma_alloc_coherent instead of
> > moving the spinlock?
> >
> Nope. The problem is while freeing dma memory not during allocation.
> dma_free_coherent won't take any GFP_* flags.
> >
> > >
> > > Similar to dma_alloc_coherent, dma_free_coherent can not be called under
> > > soft irq context.
> >
> > The call trace points to ath10k_mac_vif_beacon_free() which doesn't
> > use dma_free_coherent() so why are you blaming it for the BUG_ON?
> >
> I agree the calltrace is a bogus. ath10k_mac_vif_beacon_cleanup is
> calling dma_free_coherent.
>
> > If anything the offender should be dma_unmap_single() but the thing is
> > beacon_buf is always allocated for AP/IBSS now which means
> > dma_unmap_single() is never called. For non-AP/IBSS both arvif->beacon
> > and arvif->beacon_buf are always NULL so neither
> > dma_alloc/free_coherent nor dma_map/unmap_single are called.
> >
> Agree. We need one more check in ath10k_mac_vif_beacon_free.

No additional check is needed. For non beaconing mode, arvif->beacon
should be false. isn't it?

-Rajkumar

2014-10-08 10:31:48

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
> > The commit "ath10k: workaround fw beaconing bug" is freeing
> > DMA-coherent memory in irq context which is hitting BUG ON
> > in ARM platforms. Fix this by moving dma_free out of spin
> > lock.
>
> I hardly see how moving the freeing outside the spinlock is a fix.
>
>
> > kernel BUG at mm/vmalloc.c:1512!
> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> > CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
> > task: dd58b840 ti: da6a6000 task.ti: da6a6000
> > PC is at vunmap+0x24/0x34
> > LR is at __arm_dma_free.isra.21+0x12c/0x190
> > [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
> > [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
> > (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
> > [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
> > (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
> > [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
> > (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
> > [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
> > (ieee80211_stop+0x10/0x18 [mac80211])
> > [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
> > (__dev_close_many+0x9c/0xcc)
>
> 1. How can even ieee80211_add_virtual_monitor() call
> ath10k_remove_interface()? Upstream ath10k doesn't advertise
> IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
> you're not using upstream ath10k and/or have custom patches applied to
> ath10k.
>
This is the backtrace captured on panic and we are getting the same
backtrace consistently. I confirmed that add_virtual_monitor is not
called for ath10k as it is not advertising. ath10k_remove_interface is
called for master mode.

> 2. How can ieee80211_stop() be called from an interrupt context
> anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
> ieee80211_roc_purge() which tries to get a hold of local->mtx. This
> implies ieee80211_stop() isn't design to be run in an interrupt
> context to begin with so I don't see why ath10k should even care if
> ath10k_remove_interface() is called in an interrupt context at this
> point.
>
in_interrupt is counting soft and hard irqs. ieee80211_stop is not
called from interrupt context. In ath10k, by aquiring spin_lock in
ath10k_mac_vif_beacon_free is increasing soft irq count.

In ARM arch, __arm_dma_free is calling vunmap which might sleep. So it
can not be called within spin_lock.

Similar to dma_alloc_coherent, dma_free_coherent can not be called under
soft irq context.

PS: I am using upstream ath10k driver without private changes.

-Rajkumar

2014-10-08 12:16:20

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On 8 October 2014 13:08, Rajkumar Manoharan <[email protected]> wrote:
> On Wed, Oct 08, 2014 at 12:50:28PM +0200, Michal Kazior wrote:
>> On 8 October 2014 12:33, Rajkumar Manoharan <[email protected]> wrote:
>> > On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
>> >> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
>> >> > The commit "ath10k: workaround fw beaconing bug" is freeing
>> >> > DMA-coherent memory in irq context which is hitting BUG ON
>> >> > in ARM platforms. Fix this by moving dma_free out of spin
>> >> > lock.
>> >>
>> >> I hardly see how moving the freeing outside the spinlock is a fix.
>> >>
>> >>
>> >> > kernel BUG at mm/vmalloc.c:1512!
>> >> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> >> > CPU: 0 PID: 722 Comm: hostapd Not tainted 3.14.0 #3
>> >> > task: dd58b840 ti: da6a6000 task.ti: da6a6000
>> >> > PC is at vunmap+0x24/0x34
>> >> > LR is at __arm_dma_free.isra.21+0x12c/0x190
>> >> > [<c02a97d0>] (vunmap) from [<c021f81c>] (__arm_dma_free.isra.21+0x12c/0x190)
>> >> > [<c021f81c>] (__arm_dma_free.isra.21) from [<bf3b2440>]
>> >> > (ath10k_mac_vif_beacon_free+0xf4/0x100 [ath10k_core])
>> >> > [<bf3b2440>] (ath10k_mac_vif_beacon_free [ath10k_core]) from [<bf3b2490>]
>> >> > (ath10k_remove_interface+0x44/0x1ec [ath10k_core])
>> >> > [<bf3b2490>] (ath10k_remove_interface [ath10k_core]) from [<bf3352e4>]
>> >> > (ieee80211_add_virtual_monitor+0x9d8/0x9f0 [mac80211])
>> >> > [<bf3352e4>] (ieee80211_add_virtual_monitor [mac80211]) from [<bf33530c>]
>> >> > (ieee80211_stop+0x10/0x18 [mac80211])
>> >> > [<bf33530c>] (ieee80211_stop [mac80211]) from [<c040d144>]
>> >> > (__dev_close_many+0x9c/0xcc)
>> >>
>> >> 1. How can even ieee80211_add_virtual_monitor() call
>> >> ath10k_remove_interface()? Upstream ath10k doesn't advertise
>> >> IEEE80211_HW_WANT_MONITOR_VIF. This call trace is either invalid,
>> >> you're not using upstream ath10k and/or have custom patches applied to
>> >> ath10k.
>> >>
>> > This is the backtrace captured on panic and we are getting the same
>> > backtrace consistently. I confirmed that add_virtual_monitor is not
>> > called for ath10k as it is not advertising. ath10k_remove_interface is
>> > called for master mode.
>> >
>> >> 2. How can ieee80211_stop() be called from an interrupt context
>> >> anyway? ieee80211_stop() calls ieee80211_do_stop() which calls
>> >> ieee80211_roc_purge() which tries to get a hold of local->mtx. This
>> >> implies ieee80211_stop() isn't design to be run in an interrupt
>> >> context to begin with so I don't see why ath10k should even care if
>> >> ath10k_remove_interface() is called in an interrupt context at this
>> >> point.
>> >>
>> > in_interrupt is counting soft and hard irqs. ieee80211_stop is not
>> > called from interrupt context. In ath10k, by aquiring spin_lock in
>> > ath10k_mac_vif_beacon_free is increasing soft irq count.
>> >
>> > In ARM arch, __arm_dma_free is calling vunmap which might sleep. So it
>> > can not be called within spin_lock.
>>
>> Did you try using GFP_ATOMIC in the dma_alloc_coherent instead of
>> moving the spinlock?
>>
> Nope. The problem is while freeing dma memory not during allocation.
> dma_free_coherent won't take any GFP_* flags.

My thinking was GFP_ given to dma_alloc_coherent could impact
behaviour of dma_free_coherent (e.g. different kind of pages is
allocated so different kind of freeing will be involved) but I may be
wrong.

Can you give it a try, please?


>> > Similar to dma_alloc_coherent, dma_free_coherent can not be called under
>> > soft irq context.
>>
>> The call trace points to ath10k_mac_vif_beacon_free() which doesn't
>> use dma_free_coherent() so why are you blaming it for the BUG_ON?
>>
> I agree the calltrace is a bogus. ath10k_mac_vif_beacon_cleanup is
> calling dma_free_coherent.

The call trace without additional comment in the commit log (stating
the fact that the call trace is inaccurate for some reason) is
confusing.


>> If anything the offender should be dma_unmap_single() but the thing is
>> beacon_buf is always allocated for AP/IBSS now which means
>> dma_unmap_single() is never called. For non-AP/IBSS both arvif->beacon
>> and arvif->beacon_buf are always NULL so neither
>> dma_alloc/free_coherent nor dma_map/unmap_single are called.
>>
> Agree. We need one more check in ath10k_mac_vif_beacon_free. But here
> the problem is dma_free_coherent is being called from soft irq context
> which is not allowed (BUG_ON noted in ARM arch).
>
> After moving dma_free_coherent out of spinlock, the kernel
> BUG at mm/vmalloc.c:1512 is never hit.

It seems there's a bit of branching in the arm dma code. Maybe it's
possible to fix this problem in ath10k differently (vide the gfp
idea).


Michał

2014-10-10 09:34:48

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix kernel panic while shutting down AP

On Wed, Oct 08, 2014 at 02:16:18PM +0200, Michal Kazior wrote:
> On 8 October 2014 13:08, Rajkumar Manoharan <[email protected]> wrote:
> > On Wed, Oct 08, 2014 at 12:50:28PM +0200, Michal Kazior wrote:
> >> On 8 October 2014 12:33, Rajkumar Manoharan <[email protected]> wrote:
> >> > On Wed, Oct 08, 2014 at 11:45:38AM +0200, Michal Kazior wrote:
> >> >> On 8 October 2014 11:16, Rajkumar Manoharan <[email protected]> wrote:
> >> >> > The commit "ath10k: workaround fw beaconing bug" is freeing
> >> >> > DMA-coherent memory in irq context which is hitting BUG ON
> >> >> > in ARM platforms. Fix this by moving dma_free out of spin
> >> >> > lock.
> >> >>
> >> >> I hardly see how moving the freeing outside the spinlock is a fix.
> >> >>
[...]
>
> My thinking was GFP_ given to dma_alloc_coherent could impact
> behaviour of dma_free_coherent (e.g. different kind of pages is
> allocated so different kind of freeing will be involved) but I may be
> wrong.
>
> Can you give it a try, please?
>
Michal,

Sorry for the delay.. Good catch. It is working after changing the GFP flag to
atomic. Will send the updated version and remove the backtrace from
commit log as well.

Thanks for the review.

-Rajkumar