2012-07-02 15:21:25

by Larry Finger

Subject: Kernel oops in __netif_schedule() for at76c50x-usb

Regarding the oops that I reported for PPC architecture that reported "Unable to
handle kernel paging request for data at address 0x000004c", I have now repeated
it on x86_64 architecture, where the objdump tool is better. The error occurs in
the line in __netif_schedule() that says

if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))

Debug printouts have shown that q is not NULL, and it appears to be in the
correct address range. I think q->state is zero; however, q->state cannot be
written.
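
As a rough userspace illustration of why writability matters here:
test_and_set_bit() is an atomic read-modify-write, so it faults if the word
holding the bit cannot be written, even when the bit would read as zero. The
sketch below is not the kernel implementation (which is architecture-specific);
it uses C11 atomics, and every sketch_* name is hypothetical and exists only
for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct Qdisc; only the state word is modelled. */
struct sketch_qdisc {
	atomic_ulong state;
};

/* Hypothetical bit number standing in for __QDISC_STATE_SCHED. */
#define SKETCH_STATE_SCHED 0

/* Counts how often the "schedule" branch was taken (illustration only). */
static int sketch_schedule_count;

/*
 * Atomically set bit nr in *word and report whether it was already set.
 * The fetch-or always writes to *word, so the memory must be writable.
 */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);

	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_or(word, mask);

	return (old & mask) != 0;
}

/* Same shape as the failing line: act only if the bit was previously clear. */
static void sketch_netif_schedule(struct sketch_qdisc *q)
{
	if (!sketch_test_and_set_bit(SKETCH_STATE_SCHED, &q->state))
		sketch_schedule_count++; /* the kernel would queue q here */
}
```

Calling sketch_netif_schedule() twice on a zero-initialized sketch_qdisc takes
the branch only once, because the first call leaves the bit set.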

Additional testing shows this problem to be another side effect of commit
3a25a8c ("mac80211: add improved HW queue control") for a device with only a
single HW queue.

Any suggestions for additional debugging printouts will be greatly appreciated.

Thanks,

Larry


2012-07-02 22:50:24

by Larry Finger

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

On 07/02/2012 12:38 PM, Johannes Berg wrote:
>
>>> I'm not sure which fix is correct though. Should we have software QoS
>>> queues for these drivers, but we'll never use them? Then this would
>>> work:
>>> http://p.sipsolutions.net/e015bf7db9a05887.txt
>>>
>>> Or we could change the enable code path. Hmm.
>>
>> That patch does prevent the oops. I was not able to make a connection with the
>> device, but I just acquired it, and I'm not sure of its quality, or that of the
>> driver.
>
> I don't think that device works today -- IIRC it requires the BSSID
> before authentication and that wasn't possible before the auth redesign.
>
>> It does scan OK, and I think the patch is OK. I'll do more tests with
>> b43legacy later as the machine with that iface is busy. I will also test b43 on
>> the PPC using the open-source firmware.
>>
>> Although you may want to change the enable code path, some patch will be needed
>> to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
>> for me.
>
> Thanks. Could you try this patch instead? I think it makes more sense.
>
> http://p.sipsolutions.net/c3e9b814a409ca11.txt

That one fails and gives the oops in __netif_schedule.

Larry


2012-07-04 09:46:14

by Johannes Berg

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

On Mon, 2012-07-02 at 17:50 -0500, Larry Finger wrote:
> On 07/02/2012 12:38 PM, Johannes Berg wrote:
> >
> >>> I'm not sure which fix is correct though. Should we have software QoS
> >>> queues for these drivers, but we'll never use them? Then this would
> >>> work:
> >>> http://p.sipsolutions.net/e015bf7db9a05887.txt
> >>>
> >>> Or we could change the enable code path. Hmm.
> >>
> >> That patch does prevent the oops. I was not able to make a connection with the
> >> device, but I just acquired it, and I'm not sure of its quality, or that of the
> >> driver.
> >
> > I don't think that device works today -- IIRC it requires the BSSID
> > before authentication and that wasn't possible before the auth redesign.
> >
> >> It does scan OK, and I think the patch is OK. I'll do more tests with
> >> b43legacy later as the machine with that iface is busy. I will also test b43 on
> >> the PPC using the open-source firmware.
> >>
> >> Although you may want to change the enable code path, some patch will be needed
> >> to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
> >> for me.
> >
> > Thanks. Could you try this patch instead? I think it makes more sense.
> >
> > http://p.sipsolutions.net/c3e9b814a409ca11.txt
>
> That one fails and gives the oops in __netif_schedule.

Hmmm, that's odd. I'll try to reproduce this to be able to track it
better.

johannes


2012-07-02 16:13:09

by Larry Finger

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

On 07/02/2012 10:31 AM, Johannes Berg wrote:
> Hi Larry,
>
> Sorry! I had your other email still marked unread but hadn't gotten
> around to it :-(
>
>> Regarding the oops that I reported for PPC architecture that reported "Unable to
>> handle kernel paging request for data at address 0x000004c", I have now repeated
>> it on x86_64 architecture, where the objdump tool is better. The error occurs in
>> the line in __netif_schedule() that says
>>
>> if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
>>
>> Debug printouts have shown that q is not NULL, and it appears to be in the
>> correct address range. I think q->state is zero; however, q->state cannot be
>> written.
>>
>> Additional testing shows this problem to be another side effect of commit
>> 3a25a8c ("mac80211: add improved HW queue control") for a device with only a
>> single HW queue.
>
> Looking at the code again, it seems pretty obviously wrong ... OUCH!
>
> I'm not sure which fix is correct though. Should we have software QoS
> queues for these drivers, but we'll never use them? Then this would
> work:
> http://p.sipsolutions.net/e015bf7db9a05887.txt
>
> Or we could change the enable code path. Hmm.

That patch does prevent the oops. I was not able to make a connection with the
device, but I just acquired it, and I'm not sure of its quality, or that of the
driver. It does scan OK, and I think the patch is OK. I'll do more tests with
b43legacy later as the machine with that iface is busy. I will also test b43 on
the PPC using the open-source firmware.

Although you may want to change the enable code path, some patch will be needed
to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
for me.

Larry



2012-07-02 17:38:57

by Johannes Berg

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb


> > I'm not sure which fix is correct though. Should we have software QoS
> > queues for these drivers, but we'll never use them? Then this would
> > work:
> > http://p.sipsolutions.net/e015bf7db9a05887.txt
> >
> > Or we could change the enable code path. Hmm.
>
> That patch does prevent the oops. I was not able to make a connection with the
> device, but I just acquired it, and I'm not sure of its quality, or that of the
> driver.

I don't think that device works today -- IIRC it requires the BSSID
before authentication and that wasn't possible before the auth redesign.

> It does scan OK, and I think the patch is OK. I'll do more tests with
> b43legacy later as the machine with that iface is busy. I will also test b43 on
> the PPC using the open-source firmware.
>
> Although you may want to change the enable code path, some patch will be needed
> to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
> for me.

Thanks. Could you try this patch instead? I think it makes more sense.

http://p.sipsolutions.net/c3e9b814a409ca11.txt

johannes


2012-07-04 10:54:47

by Johannes Berg

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

On Wed, 2012-07-04 at 12:49 +0200, Johannes Berg wrote:
> On Wed, 2012-07-04 at 11:46 +0200, Johannes Berg wrote:
>
> > > >> Although you may want to change the enable code path, some patch will be needed
> > > >> to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
> > > >> for me.
> > > >
> > > > Thanks. Could you try this patch instead? I think it makes more sense.
> > > >
> > > > http://p.sipsolutions.net/c3e9b814a409ca11.txt
> > >
> > > That one fails and gives the oops in __netif_schedule.
> >
> > Hmmm, that's odd. I'll try to reproduce this to be able to track it
> > better.
>
> Ok, strange. I can reproduce the original problem easily with hwsim, but
> with this new patch, which should be equivalent to the old, it's fixed:
>
> http://p.sipsolutions.net/d78d8740ad2d15b4.txt

Ok, actually, the same bug is on stop, but for some reason that doesn't
crash for me.

I've posted this patch now:
http://p.sipsolutions.net/55032a5ae0520dd8.txt

johannes


2012-07-04 10:49:45

by Johannes Berg

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

On Wed, 2012-07-04 at 11:46 +0200, Johannes Berg wrote:

> > >> Although you may want to change the enable code path, some patch will be needed
> > >> to prevent a regression in 3.5. If this is the one, you may add a "Tested-by"
> > >> for me.
> > >
> > > Thanks. Could you try this patch instead? I think it makes more sense.
> > >
> > > http://p.sipsolutions.net/c3e9b814a409ca11.txt
> >
> > That one fails and gives the oops in __netif_schedule.
>
> Hmmm, that's odd. I'll try to reproduce this to be able to track it
> better.

Ok, strange. I can reproduce the original problem easily with hwsim, but
with this new patch, which should be equivalent to the old, it's fixed:

http://p.sipsolutions.net/d78d8740ad2d15b4.txt

Can you try just this patch?

johannes


2012-07-02 15:31:28

by Johannes Berg

Subject: Re: Kernel oops in __netif_schedule() for at76c50x-usb

Hi Larry,

Sorry! I had your other email still marked unread but hadn't gotten
around to it :-(

> Regarding the oops that I reported for PPC architecture that reported "Unable to
> handle kernel paging request for data at address 0x000004c", I have now repeated
> it on x86_64 architecture, where the objdump tool is better. The error occurs in
> the line in __netif_schedule() that says
>
> if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
>
> Debug printouts have shown that q is not NULL, and it appears to be in the
> correct address range. I think q->state is zero; however, q->state cannot be
> written.
>
> Additional testing shows this problem to be another side effect of commit
> 3a25a8c ("mac80211: add improved HW queue control") for a device with only a
> single HW queue.

Looking at the code again, it seems pretty obviously wrong ... OUCH!

I'm not sure which fix is correct though. Should we have software QoS
queues for these drivers, but we'll never use them? Then this would
work:
http://p.sipsolutions.net/e015bf7db9a05887.txt

Or we could change the enable code path. Hmm.

johannes