2017-10-24 20:03:15

by Mario Theodoridis

Subject: Re: iwlwifi crash with hostapd

Sorry for skipping the list on the last one.

On 19.10.2017 22:59, James Cameron wrote:
> On Thu, Oct 19, 2017 at 08:56:46AM +0200, Mario Theodoridis wrote:
>> On 18/10/17 23:33, James Cameron wrote:
>>
>> For your interest, kernel v4.4.93 in stable series just released has
>> changes in relevant files.
>>
>> https://lwn.net/Articles/736770/
>>
>> Thanks James,
>>
>> after looking into bisection last night, i found that just before i wanted to
>> test out the 4.4.0-82 kernel, i found 3 stack traces in my syslog. :(
>>
>> I guess, i'm dealing with race conditions now. But it seems the 79 kernel still
>> crashes wifi a lot less than later ones.
>>
>> How do i get line numbers into these traces?

As the 4.4.0-79 kernel was sometimes crapping out, too, i decided to try
to test the latest kernel instead of bisecting after all.
This took a while because virtualbox was being a bitch. virtualbox-5.0
doesn't work well with virtualbox-dkms-51, so i ended up rebuilding
virtualbox-5.1 to prevent dependency hell.
The vb-dkms package doesn't do 4.14, so i ended up going with the 4.13
kernel that comes with artful.

This one pretty quickly loads my syslog with new error stacks. I haven't
tested actual behavior yet, but the logs don't look so hot.

I ran another wireless-info (attached) and appended some of the syslog
stuff to it.


> Several methods, though by far the most common seems to be personal
> experience with offsets.
>
> When you don't have that personal experience, the methods are;
>
> 1. using GDB against the .o file,
>
> 2. using binutils objdump to disassemble .o file or vmlinuz,
>
> 3. using GCC to generate assembly listings,
>
> See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
> the end of page for the GDB method.

I haven't gotten around to that part yet, as i was busy with the above,
but it seems later versions have issues, too.


--
Mit freundlichen Grüßen/Best regards

Mario Theodoridis


Attachments:
wireless-info.txt (33.76 kB)

2017-10-24 21:02:05

by James Cameron

Subject: Re: iwlwifi crash with hostapd

Summary: WARN_ON(iwl_mvm_is_dqa_supported(mvm)) in
iwl_mvm_rx_tx_cmd_single with v4.13, but the code has since changed.

On Tue, Oct 24, 2017 at 09:56:31PM +0200, Mario Theodoridis wrote:
> Sorry for skipping the list one the last one.

Sorry, that was my fault. It was a private message you replied to.

> On 19.10.2017 22:59, James Cameron wrote:
> >On Thu, Oct 19, 2017 at 08:56:46AM +0200, Mario Theodoridis wrote:
> >>On 18/10/17 23:33, James Cameron wrote:
> >>
> >> For your interest, kernel v4.4.93 in stable series just released has
> >> changes in relevant files.
> >>
> >> https://lwn.net/Articles/736770/
> >>
> >>Thanks James,
> >>
> >>after looking into bisection last night, i found that just before
> >>i wanted to test out the 4.4.0-82 kernel, i found 3 stack traces
> >>in my syslog. :(
> >>
> >>I guess, i'm dealing with race conditions now. But it seems the 79
> >>kernel still crashes wifi a lot less than later ones.
> >>
> >>How do i get line numbers into these traces?
>
> As the 4.4.0-79 kernel was sometimes crapping out, too, i decided to
> try to test the latest kernel instead of bisecting after all. This
> took a while because virtualbox was being a bitch. virtualbox-5.0
> doesn't bode well with virtualbox-dkms-51, so i ended up rebuilding
> virtualbox-5.1 to prevent dependency hell. The vb-dkms package
> doesn't do 4.14, so i ended up going with the 4.13 kernel that comes
> with artful.

You didn't say virtualbox was essential for reproducing the problem,
so I'm continuing to exclude it from thought. If it is essential for
reproducing, then you might contact them.

Please do make sure you can exclude virtualbox as a cause.

> This one pretty quickly loads my syslog with new error stacks. I
> haven't tested actual behavior yet, but the logs don't look so hot.

Do connections frequently keep dying as before?

> I ran another wireless-info (attached) and appended some of the
> syslog stuff to it.

Thanks, you identified a line of code and cause: a WARN_ON in
iwl_mvm_rx_tx_cmd_single:

    case TX_STATUS_FAIL_DEST_PS:
        /* In DQA, the FW should have stopped the queue and not
         * return this status
         */
        WARN_ON(iwl_mvm_is_dqa_supported(mvm));
        info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
        break;

But it is only a warning. If connections aren't dying, it may not be
important to you.
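
To illustrate "only a warning", here is a minimal user-space sketch, not the kernel's actual macro. It assumes only the general WARN_ON behaviour: report when the condition is true, then carry on. The names warn_on, WARN_ON_SKETCH, TX_FILTERED_SKETCH and handle_fail_dest_ps are made up for this example and do not exist in iwlwifi.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many times the sketch warning fired (illustration only). */
static int warn_count;

/* Hypothetical stand-in for WARN_ON(): report a true condition on stderr,
 * then return it so the caller simply continues. */
static bool warn_on(bool cond, const char *expr)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", expr);
    }
    return cond;
}
#define WARN_ON_SKETCH(x) warn_on((x), #x)

/* Made-up flag value standing in for IEEE80211_TX_STAT_TX_FILTERED. */
#define TX_FILTERED_SKETCH 0x1u

/* Same shape as the quoted case: warn if DQA is supported, then still
 * mark the frame as filtered and return normally. */
static unsigned int handle_fail_dest_ps(bool dqa_supported, unsigned int flags)
{
    WARN_ON_SKETCH(dqa_supported);
    flags |= TX_FILTERED_SKETCH;
    return flags;
}
```

Calling handle_fail_dest_ps(true, 0) prints the warning and still returns with the filtered flag set, which is why the stack traces in syslog do not by themselves mean the connection died.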

Please check that you are using the most recent linux-firmware.

> >Several methods, though by far the most common seems to be personal
> >experience with offsets.
> >
> >When you don't have that personal experience, the methods are;
> >
> >1. using GDB against the .o file,
> >
> >2. using binutils objdump to disassemble .o file or vmlinuz,
> >
> >3. using GCC to generate assembly listings,
> >
> >See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
> >the end of page for the GDB method.
>
> I have gotten around to that part, yet, as i was busy with the
> above, but it seems later versions have issues, too.

However, you're still testing old source code.

Several changes made since are worth testing; please either
cherry-pick the patches or test a 4.14 rc kernel, and without
involving dkms or virtualbox.

Or, if new firmware fixes the problem, go with that instead.

> --
> Mit freundlichen Grüßen/Best regards
>
> Mario Theodoridis

>
> ########## wireless info START ##########
> [...]

--
James Cameron
http://quozl.netrek.org/

2017-10-31 19:33:36

by Mario Theodoridis

Subject: Re: iwlwifi crash with hostapd



On 31.10.2017 20:25, Mario Theodoridis wrote:
> Hi James,
>
>
> On 24.10.2017 23:01, James Cameron wrote:
>> But it is only a warning.  If connections aren't dying, it may not be
>> important to you.
>
> Regarding whether wifi hangs, it usually takes a while to get going
> and then disappears. Sunday night i ended up rebooting into the 4.4-79
> kernel because the 4.13 just got too ridiculous.
> I.e. Wlan off, wlan on no longer worked.
>
>
>> Please check you are using the most recent linux-firmware?
>
> Just in case i haven't answered that it's at 1.169
>
>
>>>> Several methods, though by far the most common seems to be personal
>>>> experience with offsets.
>>>>
>>>> When you don't have that personal experience, the methods are;
>>>>
>>>> 1.  using GDB against the .o file,
>>>>
>>>> 2.  using binutils objdump to disassemble .o file or vmlinuz,
>>>>
>>>> 3.  using GCC to generate assembly listings,
>>>>
>>>> See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
>>>> the end of page for the GDB method.
>>>
>>> I have gotten around to that part, yet, as i was busy with the
>>> above, but it seems later versions have issues, too.
>>
>> However, you're still testing old source code.
>>
>> Several changes made since are worth testing, please either
>> cherry-pick the patches or test a 4.14 rc kernel, and without
>> involving dkms or virtualbox.
>>
>> Or, if new firmware fixes the problem, go with that instead.
>
> I just managed to patch the 5.1.30 dkms package so i wouldn't need to
> update to virtualbox-5.2.
>
>
> Here are the results with the 4.14-rc7 kernel.
> As last time i appended what fills my syslog now.

Yes, maybe i ought to attach them ;)

--
Mit freundlichen Grüßen/Best regards

Mario Theodoridis


Attachments:
wireless-info.txt (29.71 kB)

2017-10-25 07:08:25

by Mario Theodoridis

Subject: Re: iwlwifi crash with hostapd

On 24/10/17 23:01, James Cameron wrote:
> Summary: WARN_ON(iwl_mvm_is_dqa_supported(mvm)) in
> iwl_mvm_rx_tx_cmd_single with v4.13, but code is since changed.
>
> On Tue, Oct 24, 2017 at 09:56:31PM +0200, Mario Theodoridis wrote:
>> Sorry for skipping the list one the last one.
>
> Sorry, that was my fault. It was a private message you replied to.
>
>> On 19.10.2017 22:59, James Cameron wrote:
>> [...]
>
> You didn't say virtualbox was essential for reproducing the problem,
> so I'm continuing to exclude it from thought. If it is essential for
> reproducing, then you might contact them.
>
> Please do make sure you can exclude virtualbox as a cause.

Let me clarify the virtualbox thing. The machine in question is a VM
host. It hosts several machines, one of which is my mail server, and
another (openbsd) which acts as a gateway to the internet for all machines.
If i run this machine without virtualbox, then my entire network
topology is off-line. While one could argue that this is bad design,
the alternative would be to use openbsd as a virtual host, but i haven't
seen many tutorials on that. I also would like to run just one machine
24/7 to keep a cap on the electricity consumption.

This machine also bridges several interfaces and acts as a hotspot for
my wlan.

So i don't know whether virtualbox is responsible, but not running
virtualbox is simply not an option.

>> This one pretty quickly loads my syslog with new error stacks. I
>> haven't tested actual behavior yet, but the logs don't look so hot.
>
> Do connections frequently keep dying as before?
>
>> I ran another wireless-info (attached) and appended some of the
>> syslog stuff to it.
>
> Thanks, you identified a line of code and cause; a WARN_ON in
> iwl_mvm_rx_tx_cmd_single;
>
>     case TX_STATUS_FAIL_DEST_PS:
>         /* In DQA, the FW should have stopped the queue and not
>          * return this status
>          */
>         WARN_ON(iwl_mvm_is_dqa_supported(mvm));
>         info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
>         break;
>
> But it is only a warning. If connections aren't dying, it may not be
> important to you.
>
> Please check you are using the most recent linux-firmware?

I'm running
ii linux-firmware 1.169 all
from artful.
No difference to the xenial version.

>
>>> Several methods, though by far the most common seems to be personal
>>> experience with offsets.
>>>
>>> When you don't have that personal experience, the methods are;
>>>
>>> 1. using GDB against the .o file,
>>>
>>> 2. using binutils objdump to disassemble .o file or vmlinuz,
>>>
>>> 3. using GCC to generate assembly listings,
>>>
>>> See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
>>> the end of page for the GDB method.
>>
>> I have gotten around to that part, yet, as i was busy with the
>> above, but it seems later versions have issues, too.
>
> However, you're still testing old source code.
>
> Several changes made since are worth testing, please either
> cherry-pick the patches or test a 4.14 rc kernel, and without
> involving dkms or virtualbox.

Then i'd have to patch those files so they build for 4.14 first.
I've seen patches, but still need to figure out how to get them applied
in the build process.

--
Mit freundlichen Grüßen/Best Regards

Mario Theodoridis

2017-10-25 09:06:49

by James Cameron

Subject: Re: iwlwifi crash with hostapd

On Wed, Oct 25, 2017 at 09:08:17AM +0200, Mario Theodoridis wrote:
> On 24/10/17 23:01, James Cameron wrote:
> >Summary: WARN_ON(iwl_mvm_is_dqa_supported(mvm)) in
> >iwl_mvm_rx_tx_cmd_single with v4.13, but code is since changed.
> >
> >On Tue, Oct 24, 2017 at 09:56:31PM +0200, Mario Theodoridis wrote:
> >>Sorry for skipping the list one the last one.
> >
> >Sorry, that was my fault. It was a private message you replied to.
> >
> >>On 19.10.2017 22:59, James Cameron wrote:
> >>[...]
> >
> >You didn't say virtualbox was essential for reproducing the problem,
> >so I'm continuing to exclude it from thought. If it is essential for
> >reproducing, then you might contact them.
> >
> >Please do make sure you can exclude virtualbox as a cause.
>
> Let me clarify the virtualbox thing. The machine in question is a VM host.
> It hosts several machines, one of which is my mail server, and another
> (openbsd) which acts as a gateway to the internet for all machines.
> If i run this machine without virtualbox, then my entire network topology is
> off-line. While one could argue, that this is bad design, the alternative
> would be to use openbsd as a virtual host, but i haven't seen many tutorials
> on that. I also would like to run just one machine 24/7 to keep a tap on the
> electricity consumption.
>
> This machine also bridges several interfaces and acts as a hotspot for my
> wlan.
>
> So i don't know whether virtualbox is responsible, but not running
> virtualbox is simply not an option.

Thanks.

I don't have a machine with the same wireless device, so I can't hope
to reproduce the problem or test fixes. I do have a slightly later
wireless device which uses the same driver, but I'm not confident it
would reproduce the problem, because (a) I've not seen the same stack
traces, (b) the WARN_ON relates to device response coded in firmware,
and my wireless device may use different firmware, and (c) it isn't
clear to me what you did to enable the problem.

You do have a machine, and you might do tests without virtualbox,
but as you say, this is not an option for you.

> >>This one pretty quickly loads my syslog with new error stacks. I
> >>haven't tested actual behavior yet, but the logs don't look so hot.
> >
> >Do connections frequently keep dying as before?
> >
> >>I ran another wireless-info (attached) and appended some of the
> >>syslog stuff to it.
> >
> >Thanks, you identified a line of code and cause; a WARN_ON in
> >iwl_mvm_rx_tx_cmd_single;
> >
> >     case TX_STATUS_FAIL_DEST_PS:
> >         /* In DQA, the FW should have stopped the queue and not
> >          * return this status
> >          */
> >         WARN_ON(iwl_mvm_is_dqa_supported(mvm));
> >         info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
> >         break;
> >
> >But it is only a warning. If connections aren't dying, it may not be
> >important to you.
> >
> >Please check you are using the most recent linux-firmware?
>
> I'm running
> ii linux-firmware 1.169 all
> from artful.
> No difference to the xenial version.

Good, thanks.

> >>>Several methods, though by far the most common seems to be personal
> >>>experience with offsets.
> >>>
> >>>When you don't have that personal experience, the methods are;
> >>>
> >>>1. using GDB against the .o file,
> >>>
> >>>2. using binutils objdump to disassemble .o file or vmlinuz,
> >>>
> >>>3. using GCC to generate assembly listings,
> >>>
> >>>See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
> >>>the end of page for the GDB method.
> >>
> >>I have gotten around to that part, yet, as i was busy with the
> >>above, but it seems later versions have issues, too.
> >
> >However, you're still testing old source code.
> >
> >Several changes made since are worth testing, please either
> >cherry-pick the patches or test a 4.14 rc kernel, and without
> >involving dkms or virtualbox.
>
> Then i'd have to patch those files so they build for 4.14 first.
> I've seen patches, but still need to figure out how to get them
> applied in the build process.

It may be more efficient to wait for your dkms packagers to catch up
so that the v4.14-rc6 or v4.14 kernel will work with your
package configuration.

> --
> Mit freundlichen Grüßen/Best Regards
>
> Mario Theodoridis

--
James Cameron
http://quozl.netrek.org/

2017-10-31 19:25:33

by Mario Theodoridis

Subject: Re: iwlwifi crash with hostapd

Hi James,


On 24.10.2017 23:01, James Cameron wrote:
> But it is only a warning. If connections aren't dying, it may not be
> important to you.

Regarding whether wifi hangs, it usually takes a while to get going
and then disappears. Sunday night i ended up rebooting into the 4.4-79
kernel because the 4.13 just got too ridiculous.
I.e. Wlan off, wlan on no longer worked.


> Please check you are using the most recent linux-firmware?

Just in case i haven't answered that: it's at 1.169.


>>> Several methods, though by far the most common seems to be personal
>>> experience with offsets.
>>>
>>> When you don't have that personal experience, the methods are;
>>>
>>> 1. using GDB against the .o file,
>>>
>>> 2. using binutils objdump to disassemble .o file or vmlinuz,
>>>
>>> 3. using GCC to generate assembly listings,
>>>
>>> See https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks right down
>>> the end of page for the GDB method.
>>
>> I have gotten around to that part, yet, as i was busy with the
>> above, but it seems later versions have issues, too.
>
> However, you're still testing old source code.
>
> Several changes made since are worth testing, please either
> cherry-pick the patches or test a 4.14 rc kernel, and without
> involving dkms or virtualbox.
>
> Or, if new firmware fixes the problem, go with that instead.

I just managed to patch the 5.1.30 dkms package so i wouldn't need to
update to virtualbox-5.2.


Here are the results with the 4.14-rc7 kernel.
As last time i appended what fills my syslog now.


--
Mit freundlichen Grüßen/Best regards

Mario Theodoridis