2013-11-06 04:47:35

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/5 wzyboy <[email protected]>:
> I've been using Linux 3.12 for one day now, no more bugs about
> iwlwifi.ko are encountered... Maybe this is fixed in Linux 3.12?

Hi, I'm back.

This terrible bug occurred again in Linux 3.12. This time I was
prepared for it and wrote down the complete process of its appearance
and my reaction, as attached.

IMHO they are quite similar to those errors in 3.11...


Sincere regards.

--
wzyboy


Attachments:
iwlwifi.error4.log (16.85 kB)

2013-11-10 11:38:22

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/10 Emmanuel Grumbach <[email protected]>:
> HW people seem to want to know what happens if you disable L1 substates.
> Can you enter your BIOS and check if you have such an option in your BIOS?


Could you be more specific about what the option name looks like? Or I could
take photos of each tab in the BIOS and attach them here.

--
wzyboy

2013-11-06 17:50:34

by Emmanuel Grumbach

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Hi,

adding PCI folks.
Here is the story:

* Wzyboy has a Lenovo laptop with _OSC control *not* granted
* L1 Active is enabled
* kernel: 3.12.0
* Nic is PCIe (Gen2 but not sure...)

At some random point, the driver loses access to the NIC: all readl
operations return 0xff.
Even lspci returns 0xff:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Here is the output of lspci *before* the issue hits:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00

Do you have any idea what we can do to understand what is going wrong here?

Thanks
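
A minimal sketch of how the two dumps above could be compared programmatically. It assumes the `lspci -x` row layout shown in this mail (an offset, a colon, then up to 16 hex bytes). The function names are hypothetical; the field offsets (vendor ID at 0x00, device ID at 0x02, revision at 0x08) come from the standard PCI configuration header.

```python
import re

# One hexdump row as printed by lspci -x, e.g. "10: 04 00 40 f0 ..."
ROW = re.compile(r"([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})")


def parse_lspci_hexdump(text):
    """Collect the bytes of every hexdump row; all other lines are ignored."""
    cfg = bytearray()
    for line in text.splitlines():
        match = ROW.fullmatch(line.strip())
        if match:
            cfg += bytes.fromhex(match.group(2))
    return bytes(cfg)


def describe(cfg):
    """Summarize the first bytes of a PCI configuration header."""
    if len(cfg) < 9:
        raise ValueError("need at least 9 bytes of config space")
    if all(byte == 0xFF for byte in cfg):
        # A device that no longer answers reads back as all ones.
        return "no response (all reads returned 0xff)"
    vendor = int.from_bytes(cfg[0:2], "little")
    device = int.from_bytes(cfg[2:4], "little")
    return f"vendor {vendor:04x} device {device:04x} rev {cfg[8]:02x}"


# First row of each dump quoted in this mail.
BEFORE = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00"""
AFTER = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff"""

print(describe(parse_lspci_hexdump(BEFORE)))
print(describe(parse_lspci_hexdump(AFTER)))
```

The all-0xff check is the interesting part: a valid header can never have vendor ID 0xffff, so a dump of nothing but 0xff means the reads are not reaching the device at all.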

On 11/06/2013 09:12 AM, wzyboy wrote:
> 2013/11/6 Grumbach, Emmanuel <[email protected]>:
>> Wait - you mean that after the bug occurred before you rebooted, lspci -xxx show all 00?
>> I can see 0xff here.
>> Anyway - this is very bad... checking with HW guys...
>
>
> Sorry, that's my typo. They are all 0xff... (I don't know what they
> mean, but it looks bad...)
>
> Thanks for your effort! I'm waiting for good news from you and HW guys. :-)
>

2013-11-14 06:20:53

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/14 Bjorn Helgaas <[email protected]>:
> > Thanks.  Are you 100% sure the lspci output is before the setpci
> > workaround?
>
>
> To ensure that, I did it again:
>
> boot up laptop -> connect to dormitory's WiFi -> run lspci -> run setpci -> run
> dmesg.
>
> Here are the outputs.

Can you please try the following:
* Boot without any changes
* setpci -s03:00.0 0x160.B=0x00
* setpci -s00:1c.1 0x204.B=0x10
lspci -vv

and tell me if WiFi works then.
(this replaces the previous setpci commands)


Thank you

2013-11-11 22:44:43

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Sat, Nov 09, 2013 at 10:46:21AM +0800, wzyboy wrote:
> 2013/11/9 Bjorn Helgaas <[email protected]>:
> > Thanks. But can you please attach the output of "lspci -vvxxx" (not
> > "-vxxxx") for the entire system before the problem occurs?
>
>
> Sorry I used the wrong command...
>
> I've attached the output of -vvxxx below.
>
> There are three files:
>
> * lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
> * lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
> link" after I ran "ip link set wlan0 up".
> * lspci.vvxxx.normal3.txt: When the interface is connected to the
> Wi-Fi of my dormitory and got an address (but without default
> gateway, I'm using wired network now).

The only interesting difference is this (between "normal" and "normal3"):

--- lspci.vvxxx.normal.txt 2013-11-11 14:42:14.000000000 -0700
+++ lspci.vvxxx.normal3.txt 2013-11-11 14:42:14.000000000 -0700

00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4) (prog-if 00 [Normal decode])
- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
+ LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train+ SlotClk+ DLActive+ BWMgmt+ ABWMgmt-

In "normal3", the Link Training bit is set. I'm not a hardware person,
but my guess is that this might be normal. The spec says Link Training
indicates that the "LTSSM is in the Configuration or Recovery state,"
and Figure 5-1 shows that the transition from L1 to L0 goes through
the Recovery state. So we might just be seeing the device returning
from L1 to L0. Maybe Emmanuel can confirm this with the hardware guys.

Comparing "lspci.vvxxx.normal.txt" with "lspci.vvxxx.patched.bug.txt",
I see these changes in the 00:1c.1 Downstream Port (the bridge that
leads to the 7260 NIC):

--- before 2013-11-11 15:24:04.755738964 -0700
+++ after 2013-11-11 15:24:11.875722068 -0700
00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4) (prog-if 00 [Normal decode])
- DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
+ DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
+ LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
- Changed: MRL- PresDet- LinkState+
+ SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
+ Changed: MRL- PresDet+ LinkState+
- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-

So when the bug occurs,

- Correctable Error Detected is set
- Data Link Layer Link Active is cleared
- Presence Detect State is cleared
- LTR Mechanism Enable is cleared (spec says this bit must be
reset to the default value when a Downstream Port goes to
DL_Down)
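
The comparison above can be reproduced mechanically. This is only a sketch, assuming lspci's convention of suffixing each status flag with `+` (set) or `-` (clear); `parse_flags` and `changed_flags` are hypothetical helper names, not part of pciutils.

```python
def parse_flags(line):
    """Map tokens such as 'CorrErr+' or 'LTR-,' to {'CorrErr': True, 'LTR': False}."""
    flags = {}
    for token in line.replace(",", " ").split():
        if len(token) > 1 and token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


def changed_flags(before, after):
    """Return {flag: (old, new)} for every flag whose state differs."""
    old, new = parse_flags(before), parse_flags(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


# The LnkSta lines from the "before"/"after" diff quoted above.
lnksta_before = "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-"
lnksta_after = "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-"
print(changed_flags(lnksta_before, lnksta_after))
```

Feeding it the DevSta, SltSta and DevCtl2 line pairs the same way yields the other flags in the list.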

This all seems consistent with the device being powered off. Maybe
the 7260 is on a daughterboard with a bad connection to the system
board? Any chance you can open up the box and make sure the
connection is tight?

It's possible there's some ASPM issue, but I would think Presence
Detect would still work even if the 7260 had a problem with ASPM.
Here's another experiment to try to rule out ASPM. Run these
commands as root after the driver is loaded but before the bug occurs:

setpci -s03:00.0 0x50.W=0x140
setpci -s00:1c.1 0x50.W=0x040
lspci -vv

This should disable ASPM completely on that link, and the lspci output
will help verify that.
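
As a hedged illustration of what those two values mean: the sketch below assumes that offset 0x50 is the Link Control register of the PCI Express capability on both devices, which is what the commands above imply. The bit positions (ASPM Control in bits 1:0, Common Clock Configuration in bit 6, Enable Clock Power Management in bit 8) follow the PCIe Link Control register layout; `decode_link_control` is a hypothetical helper.

```python
# ASPM Control field, bits 1:0 of the Link Control register.
ASPM_CONTROL = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}


def decode_link_control(value):
    """Decode the Link Control bits relevant to this experiment."""
    return {
        "aspm": ASPM_CONTROL[value & 0x3],
        "common_clock": bool(value & (1 << 6)),
        "clock_pm": bool(value & (1 << 8)),
    }


# The two values written by the setpci commands above.
for device, value in (("03:00.0", 0x140), ("00:1c.1", 0x040)):
    print(device, decode_link_control(value))
```

Both values leave bits 1:0 at zero, which is why the writes turn ASPM off on each end of the link while keeping the common-clock setting.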

Bjorn

2013-11-12 18:14:51

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Tue, Nov 12, 2013 at 5:16 AM, Grumbach, Emmanuel
<[email protected]> wrote:
>>
>> 2013/11/12 wzyboy <[email protected]>:
>> > I will continue downloading big files to benchmark it.
>>
>> Hi guys, good news!
>>
>> Six hours ago I ran a simple loop script to repeatedly download big files (and
>> saving to /dev/null) and went to have lessons. Six hours later it's after school.
>> I found that the wireless still works!
>>
>> So I believe that the two "setpci" commands really work! Thanks Bjorn and
>> Emmanuel!
>
> Well... I haven't done much, but the setpci isn't really a solution - it is more a work around.
> Bjorn is basically disabling L1 PCIe feature which allows to save power. While you might not care, I do :)
> The HW folks here would still want to know if you can disable L1 substates feature (not that I know what it is - but I can guess).
> If you can try to:
> * upgrade your BIOS (if needed)
> * check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS
>
> it'd help me to understand the issue.
> In any case, I am happy that you have a way to reliably use your NIC now.

The setpci experiment was only for debugging. Obviously it's not a
real fix, and it doesn't help any other users of this ThinkPad X240s.
But it does seem clear that the problem is related to ASPM.

And it looks like the same thing we investigated here:
https://bugzilla.kernel.org/show_bug.cgi?id=57331, which is even on
the same device.

From your dmesg logs:

ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
acpi PNP0A08:00: ACPI _OSC control for PCIe not granted, disabling ASPM

The messages are misleading. Linux does not actually disable ASPM as
you did with setpci. All Linux does is leave the current ASPM
configuration untouched, because we believe that the BIOS is managing
it. The BIOS must have enabled ASPM on this device (you could verify
this by booting with "pci=earlydump"), and BIOS also says the OS must
not enable ASPM control (via the ACPI FADT table and the PCI host
bridge _OSC method).
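
For readers who want to check the FADT side of this themselves, here is a minimal sketch. It assumes the ACPI FADT layout in which the 16-bit IAPC_BOOT_ARCH field sits at byte offset 109 and bit 4 is the "PCIe ASPM Controls" flag; the function name is hypothetical, and the idea of reading the raw table from /sys/firmware/acpi/tables/FACP is an assumption about the running system (it normally requires root).

```python
import struct

IAPC_BOOT_ARCH_OFFSET = 109   # byte offset of the 16-bit boot architecture flags
PCIE_ASPM_CONTROLS = 1 << 4   # set: OS must not enable ASPM control


def fadt_forbids_aspm(fadt):
    """Return True if the raw FADT tells the OS not to enable PCIe ASPM."""
    if fadt[:4] != b"FACP":
        raise ValueError("not a FADT (signature should be 'FACP')")
    if len(fadt) < IAPC_BOOT_ARCH_OFFSET + 2:
        raise ValueError("table too short to contain IAPC_BOOT_ARCH")
    (boot_arch,) = struct.unpack_from("<H", fadt, IAPC_BOOT_ARCH_OFFSET)
    return bool(boot_arch & PCIE_ASPM_CONTROLS)
```

On a live system one would pass it the bytes of the firmware table, for example `fadt_forbids_aspm(open("/sys/firmware/acpi/tables/FACP", "rb").read())`. A True result corresponds to the first dmesg line quoted above; it says nothing about how the BIOS itself left ASPM configured.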

It would really help if you still had Windows on this system, and we
could look and see whether it disables ASPM for this device (if
anybody does have Windows, I would probably use AIDA64 to dump the PCI
config space). I did experiments for bug 57331 that suggested that
Windows leaves ASPM alone just like Linux does, but my experiments
were on qemu, not on real hardware, and I didn't have an Intel wifi
device.

If the Windows driver works fine even with ASPM enabled, that would
suggest that the problem is something in the Linux iwlwifi driver. If
Windows does actually disable ASPM on this device, then we would have
to figure out if there's a way we can safely make Linux also disable
ASPM.

Bjorn

2013-11-15 08:49:48

by Emmanuel Grumbach

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly



On 11/15/2013 05:06 AM, wzyboy wrote:
> 2013/11/15 Bjorn Helgaas <[email protected]>:
>> Why would it be unlikely to fix the driver? Do people think the
>> problem is not actually in the driver?
>>
>> Asking Lenovo how to disable L1 PM substates is really a non-answer.
>> Only the extremely technical and extremely patient user (hi wzyboy :))
>> will even bother to investigate why wifi works fine with Windows but
>> not with Linux. The only thing Lenovo *could* do is to release a new
>> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
>> would never do that because then I would have to tell customers
>> "disable this for Linux, enable this for Windows," and I'd have to
>> deal with support calls about devices using more power than they
>> should, battery life being shorter, etc. Plus you'd have to ask every
>> Linux user to upgrade their BIOS. That's all just a terrible user
>> experience.
>
>
> I am a little confused. There are two sets of "setpci" commands, both
> of which can make me use my NIC reliably. But you two say they are
> just workarounds, not real fixes.
>

Right - because they force a mode that the BIOS doesn't allow. The BIOS
doesn't allow the OS (the driver) to decide in what mode to work - so we
cannot reach the same effect as the setpci command from the OS / driver
level. setpci just directly accesses the HW without asking the
permissions of anyone.

> I know the "side effect" of first two "setpci" commands is consuming
> more power. (Actually by my experience of running on battery, I did
> not notice ...)
>
> But Grumbach said after the second two "setpci" commands enables "L1".
> Does it mean it saves power? So what's the "side effect" of second two
> "setpci" commands?
>

They are both the same in terms of side-effects. The first set of setpci
commands will disable L1 altogether - meaning you don't save any power.
The second set of setpci commands doesn't disable L1, but disables a more
subtle set of power states which are defined as L1 PM substates. In
these substates, you save less power than in L1 (I think) but you are
more likely to be able to reach them. After all, it is always the same
story - the deeper you sleep, the longer it takes to wake up. And if it
takes longer to wake up, it also means that in several cases you won't
choose to go to sleep. So the way PCI folks help to save power even in
cases where you cannot go to a deep sleep is to define states in the
middle in which you save less power, but in which you are more likely to
be. Again - time spent in each state and power saved in each state are a
trade-off.
Now:
L1 - deep sleep
L1 PM substate - something in the middle.

First setpci command - disable both features.
Second setpci command - disable only the second feature.

Regarding side effects... I don't think this is really "dangerous". But
this is not a fix, in the sense that I wouldn't like to deploy millions of
machines like that. The risk you have here is probably bad timing:
having the setpci commands run exactly when the link is in a
state that setpci disables. That would be bad. How bad? It would probably
just require a reboot - or, in the worst case, G3 (take the battery off).

> IMHO, if this lets users use their NICs reliably, maybe Grumbach could
> write these commands into the iwlwifi driver and run them when a 7260 is
> detected...

I can't, as explained above.

>
> BTW, no replies from Lenovo, yet.
>


> Or maybe you could add an option that enables this "workaround" if the
> user wants it. A user could simply write an /etc/modprobe.d/iwlwifi.conf
> to enable the "workaround" and use their NIC without having to
> reboot from time to time...

same.

2013-11-10 11:32:29

by Emmanuel Grumbach

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Sun, Nov 10, 2013 at 12:19 PM, wzyboy <[email protected]> wrote:
> 2013/11/9 Bjorn Helgaas <[email protected]>:
>> it might
>> be interesting to do "echo on >
>> /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
>> makes a difference.
>
>
> Hi, should I run this command after the bug? I just ran this after the
> bug occurs, but there is no output in dmesg, and "ip link set wlan0
> up" still returns same error ("RTNETLINK answers: Connection timed
> out").
>
> --

HW people seem to want to know what happens if you disable L1 substates.
Can you enter your BIOS and check if you have such an option in your BIOS?

2013-11-12 22:09:35

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach <[email protected]> wrote:
> On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
>> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
>> <[email protected]> wrote:
>>
>>> Right - I remember the discussion we had on that.
>>> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
>>
>> If ASPM is supposed to work as far as the hardware is concerned, I
>> guess you're saying this must be an iwlwifi driver issue. Right?
>
> ASPM is supposed to work as far as the hardware is concerned.
> We might very well have an issue in iwlwifi - and I am checking this
> internally with our System guys.
> It can be a PCI core problem too, and it could also be a platform / BIOS
> / Lenovo issue.
> Of course, I have no clue which of these is the culprit here.
> Our System folks seemed to say that this new device uses L1 substates
> which can be enabled in Haswell platform which the user owns.
> Now - L1 substates is a new feature and might introduce issues
> (apparently) - and this is why they (System folks) wanted the try
> without L1 substates. But disabling L1 substates doesn't seem trivial
> with the production BIOS of Lenovo. So I am pretty stuck here.

For debugging purposes, we could configure L1 substates with setpci,
as we did for ASPM. The Linux kernel knows nothing about L1
substates, so the PCI core isn't doing anything with them. It's
possible the driver itself could muck with L1 substate configuration,
but that would be discouraged, and I don't see anything in iwlwifi
that is doing that.

The lspci output in
https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
Substates extended capability (capability ID 0x1e) for the Root Port
leading to the 7260 device, but not for the 7260 device itself:

00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root
Port 3 (rev e4) (prog-if 00 [Normal decode])
Capabilities: [200 v1] #1e

Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think L1
substates must be configured on both ends of the link, and if the 7260
device doesn't have that capability, I don't see how it could be
enabled.

The lspci version wzyboy has doesn't decode the L1 PM Substates
capability, but there is a newer version at
git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should
decode it. Also, "lspci -vvxxx" didn't hexdump this capability, which
should be at offset 0x200. Using "lspci -xxxx" (four "x"s) should
dump it, and we can decode it manually.
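
Manual decoding of such a dump could start along these lines. This is a sketch under the assumption that the full 4096-byte configuration space is available as a bytes object; it relies on the standard extended capability header format (list starts at offset 0x100, capability ID in bits 15:0, next pointer in bits 31:20). `find_ext_capability` is a hypothetical name.

```python
L1_PM_SUBSTATES_ID = 0x1E  # extended capability ID named in this mail


def find_ext_capability(cfg, wanted_id):
    """Walk the PCIe extended capability list; return the offset or None."""
    offset = 0x100
    visited = set()
    while offset and offset + 4 <= len(cfg) and offset not in visited:
        visited.add(offset)  # guards against a looping next pointer
        header = int.from_bytes(cfg[offset:offset + 4], "little")
        if header in (0, 0xFFFFFFFF):
            return None  # empty list or device not responding
        if header & 0xFFFF == wanted_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned
    return None
```

Run against the Root Port dump this should land on 0x200, matching the `[200 v1] #1e` line above; run against a dump of the 7260 it would show whether the capability is present on that end of the link at all.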

wzyboy, can you run these commands before the bug occurs and before
using the "setpci" workaround:

lspci -vvxxxx -s00:1c.1
lspci -vvxxxx -s03:00.0

Bjorn

2013-11-10 12:17:50

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/10 Grumbach, Emmanuel <[email protected]>:
> > It is really called L1 PM Substate.
> > I don't really know the BIOS of ThinkPad... But we can try...
>
>
> I have booted into the BIOS and written down almost all the options, as attached.
>
> By the way, I always attach my laptop to an AC adaptor.
>

Cool.... they mask all the interesting options... oh well...

2013-11-06 06:47:47

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/6 Grumbach, Emmanuel <[email protected]>:
> I don't know - I am trying to check with our HW guys here.
> Can you please run lspci -xxx before and after it happens?
> BTW - how do you recover? Reloading the module is enough?

I've attached the output of `lspci -xxx'. The next time this bug
occurs I will run it again and send the output to you.

This bug has occurred more than 15 times since I bought this laptop on
Nov 01, and each time I've tried a different method of bringing the
network back without rebooting. However, no matter whether I use "ip
link set wlan0 down", "ip link set wlan0 up" or reload the kernel
module, it just does not work -- the only way is to reboot the laptop
... (That's why I said "I hate rebooting")

Could you suggest any other methods that might have a chance of
recovering without rebooting? I could try them the next time this bug
occurs.

--
wzyboy


Attachments:
lspci.normal.txt (4.25 kB)

2013-11-12 18:25:56

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> On Tue, Nov 12, 2013 at 5:16 AM, Grumbach, Emmanuel
> <[email protected]> wrote:
> >>
> >> 2013/11/12 wzyboy <[email protected]>:
> >> > I will continue downloading big files to benchmark it.
> >>
> >> Hi guys, good news!
> >>
> >> Six hours ago I ran a simple loop script to repeatedly download big
> >> files (and saving to /dev/null) and went to have lessons. Six hours
> >> later it's after school.
> >> I found that the wireless still works!
> >>
> >> So I believe that the two "setpci" commands really work! Thanks Bjorn
> >> and Emmanuel!
> >
> > Well... I haven't done much, but the setpci isn't really a solution -
> > it is more a work around.
> > Bjorn is basically disabling L1 PCIe feature which allows to save
> > power. While you might not care, I do :) The HW folks here would still
> > want to know if you can disable L1 substates feature (not that I know
> > what it is - but I can guess).
> > If you can try to:
> > * upgrade your BIOS (if needed)
> > * check the advanced options I sent to you to see if you can unlock
> > the advanced menu in your BIOS
> >
> > it'd help me to understand the issue.
> > In any case, I am happy that you have a way to reliably use your NIC now.
>
> The setpci experiment was only for debugging. Obviously it's not a real fix,
> and it doesn't help any other users of this ThinkPad X240s.
> But it does seem clear that the problem is related to ASPM.
>
> And it looks like the same thing we investigated here:
> https://bugzilla.kernel.org/show_bug.cgi?id=57331, which is even on the
> same device.
>

Not the same device but the same driver.

> From your dmesg logs:
>
> ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
> acpi PNP0A08:00: ACPI _OSC control for PCIe not granted, disabling ASPM
>

Right - I remember the discussion we had on that.
On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
This code is new in 3.12, and is not in 3.11. The first log that the user here sent is on 3.11, hence you still see the error message from PCI subsystem.
Now (3.12) the code reads:

    if (!cfg->base_params->pcie_l1_allowed) {
            /*
             * W/A - seems to solve weird behavior. We need to remove this
             * if we don't want to stay in L1 all the time. This wastes a
             * lot of power.
             */
            pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
                                   PCIE_LINK_STATE_L1 |
                                   PCIE_LINK_STATE_CLKPM);
    }

and the if is *not* taken on the 7260, which is the device we are talking about.

> The messages are misleading. Linux does not actually disable ASPM as you
> did with setpci. All Linux does is leave the current ASPM configuration
> untouched, because we believe that the BIOS is managing it. The BIOS must
> have enabled ASPM on this device (you could verify this by booting with
> "pci=earlydump"), and BIOS also says the OS must not enable ASPM control
> (via the ACPI FADT table and the PCI host bridge _OSC method).
>
> It would really help if you still had Windows on this system, and we could look
> and see whether it disables ASPM for this device (if anybody does have
> Windows, I would probably use AIDA64 to dump the PCI config space). I did
> experiments for bug 57331 that suggested that Windows leaves ASPM alone
> just like Linux does, but my experiments were on qemu, not on real
> hardware, and I didn't have an Intel wifi device.
>
> If the Windows driver works fine even with ASPM enabled, that would
> suggest that the problem is something in the Linux iwlwifi driver. If Windows
> does actually disable ASPM on this device, then we would have to figure out
> if there's a way we can safely make Linux also disable ASPM.
>

2013-11-06 06:18:53

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> Hi, I'm back.
>
> This terrible bug occurred again in Linux 3.12. This time I was prepared for
> it and wrote down the complete process of its appearance and my reaction, as
> attached.

Thanks - was that with power save disabled?

>
> IMHO they are quite similar to those errors in 3.11...

Indeed. The only difference is that you don't have PCI complaining about not being able to disable L1.

2013-11-06 07:10:37

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/6 wzyboy <[email protected]>:
> > I've attached the output of `lspci -xxx'. The next time this bug
> > occurs I will run it again and send the output to you.
>
>
> How dramatic. Several minutes ago I clicked on the "Senden"
> (German for "Send") button to send that email. After the mail was sent, I
> opened a new tab in the browser, tried to google something and found the
> Internet connection was lost. -- The bug occurred again!
>
> So I ran `lspci -xxx' and saved the output along with the kernel logs. I'm
> shocked that the hex strings are all "00".

Wait - you mean that after the bug occurred, before you rebooted, lspci -xxx showed all 00?
I can see 0xff here.
Anyway - this is very bad... checking with HW guys...

>
> I rebooted my laptop and tried to send you the logs, and the bug occurred again...
> -- it seems more and more frequent -- and I had to reboot ...
>
> Here are the logs ... finally.
>
> --
> wzyboy

2013-11-14 08:40:02

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> > Awesome - you have L1 enabled:
>
>
> Though do not understand but it seems like a good news :-)
>

You are saving power (and I know that at least, L1 works).
L1 PM substates doesn't work (but that's a brand new feature)

> Does that mean you find a real fix instead of a workaround? Congratulations!

No. The real fix should come from the driver (unlikely from what I hear from system people here) or disable in BIOS.
So I guess you'd need to ask Lenovo how to disable L1 PM Substate in BIOS.

2013-11-11 21:55:57

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Sun, Nov 10, 2013 at 06:19:58PM +0800, wzyboy wrote:
> 2013/11/9 Bjorn Helgaas <[email protected]>:
> > it might
> > be interesting to do "echo on >
> > /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
> > makes a difference.
>
> Hi, should I run this command after the bug? I just ran this after the
> bug occurs, but there is no output in dmesg, and "ip link set wlan0
> up" still returns same error ("RTNETLINK answers: Connection timed
> out").

Run the command before the bug occurs. The idea is to disable run-time
power management. If the problem is that we're turning off power to
the device, disabling power management might make a difference.
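
The echo command quoted above amounts to writing the string `on` to the device's sysfs `power/control` file. A minimal sketch of the same step, with a hypothetical function name; the `sysfs_root` parameter exists only so the function can be tried against a scratch directory instead of the real /sys tree.

```python
from pathlib import Path


def disable_runtime_pm(device, sysfs_root="/sys/bus/pci/devices"):
    """Write 'on' to <device>/power/control and return the value read back."""
    control = Path(sysfs_root) / device / "power" / "control"
    control.write_text("on\n")   # "on" keeps the device powered; "auto" allows runtime PM
    return control.read_text().strip()
```

On the laptop discussed here the call would be `disable_runtime_pm("0000:03:00.0")`, run as root and before the bug occurs. The setting does not survive a reboot.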

Bjorn

2013-11-10 12:13:40

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/10 Grumbach, Emmanuel <[email protected]>:
> It is really called L1 PM Substate.
> I don't really know the BIOS of ThinkPad... But we can try...


I have booted into the BIOS and written down almost all the options, as attached.

By the way, I always attach my laptop to an AC adaptor.


--
wzyboy


Attachments:
x240s-bios.txt (2.05 kB)

2013-11-14 17:54:20

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Thu, Nov 14, 2013 at 1:39 AM, Grumbach, Emmanuel
<[email protected]> wrote:
>> > Awesome - you have L1 enabled:
>>
>> Though do not understand but it seems like a good news :-)
>
> You are saving power (and I know that at least, L1 works).
> L1 PM substates doesn't work (but that's a brand new feature)
>
>> Does that mean you find a real fix instead of a workaround? Congratulations!
>
> No. The real fix should come from the driver (unlikely from what I hear from system people here) or disable in BIOS.
> So I guess you'd need to ask Lenovo how to disable L1 PM Substate in BIOS.

Why would it be unlikely to fix the driver? Do people think the
problem is not actually in the driver?

Asking Lenovo how to disable L1 PM substates is really a non-answer.
Only the extremely technical and extremely patient user (hi wzyboy :))
will even bother to investigate why wifi works fine with Windows but
not with Linux. The only thing Lenovo *could* do is to release a new
BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
would never do that because then I would have to tell customers
"disable this for Linux, enable this for Windows," and I'd have to
deal with support calls about devices using more power than they
should, battery life being shorter, etc. Plus you'd have to ask every
Linux user to upgrade their BIOS. That's all just a terrible user
experience.

Bjorn

2013-11-06 07:07:29

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/6 wzyboy <[email protected]>:
> I've attached the output of `lspci -xxx'. The next time this bug
> occurs I will run it again and send the output to you.


How dramatic. Several minutes ago I clicked on the "Senden"
(German for "Send") button to send that email. After the mail was
sent, I opened a new tab in the browser, tried to google something
and found the Internet connection was lost. -- The bug occurred again!

So I ran `lspci -xxx' and saved the output along with the kernel
logs. I'm shocked that the hex strings are all "00".

I rebooted my laptop and tried to send you the logs, and the bug occurred
again... -- it seems more and more frequent -- and I had to reboot ...

Here are the logs ... finally.

--
wzyboy


Attachments:
lspci.afterbug.txt (7.41 kB)

2013-11-13 05:39:24

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/13 Bjorn Helgaas <[email protected]>:
> wzyboy, can you run these commands before the bug occurs and before
> using the "setpci" workaround:
>
> lspci -vvxxxx -s00:1c.1
> lspci -vvxxxx -s03:00.0

After this morning's lessons I booted up my laptop with the pci=earlydump
kernel parameter, and here is the output of lspci (without setpci and
before the bug hit) and dmesg.

--
wzyboy


Attachments:
lspci.without-setpci.before-bug.txt (32.04 kB)
pci-earlydump.dmesg (63.46 kB)

2013-11-11 11:40:18

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/11 Emmanuel Grumbach <[email protected]>:
> Can you please try this?


Hi, thanks for your patch. I re-compiled my kernel with the patch and
benchmarked it, but sadly the bug still exists...

After booting into the new kernel, I tried to download a 4.0GB file
from the Internet with wget:

HTTP request sent, awaiting response... 200 OK
Length: 4268605440 (4.0G) [application/octet-stream]
Saving to: './en_windows_server_2012_r2_x64_dvd_2707946.iso'

63% [=====> ] 2,713,761,088 --.-K/s ETA 19m 59s ^C

The download speed averaged 2.3 MB/s, and then suddenly the bug occurred...

The dmesg log and lspci -vvxxx output are attached. They seem no different...

--
wzyboy


Attachments:
lspci.vvxxx.patched.bug.txt (30.81 kB)
linux-3.12-patched.dmesg (64.90 kB)

2013-11-07 14:25:12

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/7 wzyboy <[email protected]>:
> Here is the output of lspci -vxxxx just now. I'll run this command
> again when the bug occurs next time:


Hi, I'm back. The bug occurred two more times; after the first one
I forgot to run that command.

Attached is the output of lspci -vxxxx after the bug occurred
and before I rebooted my laptop.

There are too many terrible 0xff bytes there...


--
wzyboy


Attachments:
lspci.vxxxx.afterbug.txt (93.16 kB)

2013-11-08 17:38:28

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Fri, Nov 8, 2013 at 10:20 AM, Bjorn Helgaas <[email protected]> wrote:
> My only guess is that there's something wrong with the ASPM
> configuration and the device just stops responding to config accesses
> (and probably MMIO accesses, too, based on the errors in your dmesg
> log). Or maybe the device got powered off somehow.

If you've figured out a way to reproduce this more reliably, it might
be interesting to do "echo on >
/sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
makes a difference. That should prevent us from using runtime power
management for the iwlwifi device.
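
The same write can be scripted. Below is a minimal Python sketch, assuming the standard sysfs runtime-PM attribute `power/control`, which accepts `on` and `auto`. The helper name `set_runtime_pm` and the `sysfs_root` parameter are made up for illustration, and on a real system the write needs root.

```python
from pathlib import Path


def set_runtime_pm(bdf: str, mode: str = "on", sysfs_root: str = "/sys") -> Path:
    """Write `mode` to <sysfs_root>/bus/pci/devices/<bdf>/power/control.

    "on"   keeps the device fully powered (runtime PM not used).
    "auto" lets the kernel runtime-suspend the device again.
    """
    if mode not in ("on", "auto"):
        raise ValueError("mode must be 'on' or 'auto'")
    control = Path(sysfs_root) / "bus/pci/devices" / bdf / "power/control"
    control.write_text(mode + "\n")
    return control


# Example call for the device discussed in this thread (run as root):
#   set_runtime_pm("0000:03:00.0", "on")
```

The setting does not survive a reboot, so it has to be applied again after each boot and before trying to reproduce the bug.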

Bjorn

2013-11-12 12:11:13

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/12 wzyboy <[email protected]>:
> I will continue downloading big files to benchmark it.

Hi guys, good news!

Six hours ago I started a simple loop script that repeatedly downloads big
files (saving them to /dev/null) and went to my lessons. Now, six hours
later and after school, I found that the wireless still works!
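
The script itself was not posted. A comparable stress loop could look like the following Python sketch, which reads each response in chunks and throws the data away, much like saving to /dev/null. The function names, the chunk size and the URL in the usage comment are all assumptions for illustration.

```python
import urllib.request


def drain(response, chunk_size: int = 1 << 20) -> int:
    """Read a response to the end, discarding the data; return the byte count."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def stress(url: str, rounds: int, opener=urllib.request.urlopen) -> int:
    """Download `url` `rounds` times in a row and return the total bytes read."""
    total = 0
    for _ in range(rounds):
        with opener(url) as response:
            total += drain(response)
    return total


# Hypothetical usage; replace the URL with any large file you may download:
#   stress("http://example.invalid/big.iso", rounds=100)
```

If the link dies mid-transfer, the call raises an exception, which gives a rough timestamp for when the bug hit.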

So I believe that the two "setpci" commands really work! Thanks Bjorn
and Emmanuel!

--
wzyboy

2013-11-15 03:09:53

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/15 wzyboy <[email protected]>:
> IMHO, if this lets users use their NICs reliably, maybe Grumbach could
> write these commands into the iwlwifi driver and run them when a 7260 is
> detected...


Or maybe you could add an option that enables this "workaround" if the
user wants it. A user could simply write an /etc/modprobe.d/iwlwifi.conf
to enable the "workaround" and use their NIC without having to
reboot from time to time...

--
wzyboy

2013-11-14 07:02:16

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/14 Grumbach, Emmanuel <[email protected]>:
> Can you please try the following:
> * Boot without any changes
> * setpci -s03:00.0 0x160.B=0x00
> * setpci -s00:1c.1 0x204.B=0x10
> lspci -vv
>
> and tell me if WiFi works then.
> (this replaces the previous setpci commands)
>
>
> Thank you


boot up (no connection) -> run new setpci commands -> lspci and dmesg
-> connect to dormitory's WiFi -> download 3.5 GiB data -> works fine!
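
To make the new commands concrete: `setpci -s03:00.0 0x160.B=0x00` writes a single byte at config offset 0x160 of the wireless card. The Python sketch below replays that write on the 0x160 row from the lspci dump decoded elsewhere in this thread, where the dword at that offset is labelled "Control 1". The helper names are invented for illustration, and the sketch only models the byte arithmetic, not how the hardware reacts.

```python
def write_byte(row: bytearray, row_base: int, offset: int, value: int) -> None:
    """Mimic `setpci OFFSET.B=VALUE` on one 16-byte row of a config dump."""
    row[offset - row_base] = value & 0xFF


def read_le32(row: bytearray, row_base: int, offset: int) -> int:
    """Read a little-endian 32-bit register out of the same row."""
    start = offset - row_base
    return int.from_bytes(row[start:start + 4], "little")


# Row "160:" of the 7260 dump, taken before the write.
row_160 = bytearray.fromhex("0f00a040f0000000" "0000000000000000")

before = read_le32(row_160, 0x160, 0x160)
write_byte(row_160, 0x160, 0x160, 0x00)
after = read_le32(row_160, 0x160, 0x160)

print(hex(before), "->", hex(after))  # 0x40a0000f -> 0x40a00000
```

Only the low byte changes, so the four enable bits (0x0f) are cleared while the threshold fields in the upper bytes stay as they were. The root-port command works the same way on offset 0x204; whether that register accepts the write depends on the hardware.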

--
wzyboy


Attachments:
lspci.after-setpci.before-connect.txt (21.25 kB)
pci-earlydump.3.dmesg (62.88 kB)

2013-11-12 05:42:53

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Hi, I've got some good news. Here is what I did today:

boot up laptop -> do the sysfs trick -> start downloading a big file
to benchmark it -> several minutes later the bug occurs -> reboot my
laptop to recover -> do the setpci trick -> start downloading a big
file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
and everything works fine!

Here is the output of lspci -vv after running the two "setpci" commands.

There is also a screenshot of ThinkPad X240s HMM, showing how the
wireless card is connected to the motherboard. In this figure #10 is
the Wireless LAN card. It is connected to the motherboard with Intel's
NGFF connector.

I will continue downloading big files to benchmark it.

--
wzyboy


Attachments:
lspci.vv.aftersetpci.txt (21.25 kB)
x240s-hmm.png (149.02 kB)

2013-11-12 12:16:16

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/12 wzyboy <[email protected]>:
> > I will continue downloading big files to benchmark it.
>
> Hi guys, good news!
>
> Six hours ago I ran a simple loop script to repeatly download big files (and
> saving to /dev/null) and went to have lessons. Six hours later it's after school.
> I found that the wireless still works!
>
> So I believe that the two "setpci" commands really work! Thanks Bjorn and
> Emmanuel!
>

Well... I haven't done much, but the setpci isn't really a solution - it is more a work around.
Bjorn is basically disabling L1 PCIe feature which allows to save power. While you might not care, I do :)
The HW folks here would still want to know if you can disable L1 substates feature (not that I know what it is - but I can guess).
If you can try to:
 * upgrade your BIOS (if needed)
 * check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS

it'd help me to understand the issue.
In any case, I am happy that you have a way to reliably use your NIC now.

2013-11-12 12:45:20

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/12 Grumbach, Emmanuel <[email protected]>:
> > Well... I haven't done much, but the setpci isn't really a solution -
> > it is more a work around.
> > Bjorn is basically disabling L1 PCIe feature which allows to save
> > power. While you might not care, I do :)
> > The HW folks here would still want to know if you can disable L1
> > substates feature (not that I know what it is - but I can guess).
> > If you can try to:
> >  * upgrade your BIOS (if needed)
> >  * check the advanced options I sent to you to see if you can unlock
> >    the advanced menu in your BIOS
> >
> > it'd help me to understand the issue.
> > In any case, I am happy that you have a way to reliably use your NIC now.
>
>
> Oh, actually I upgraded my BIOS to newest version the day I got this laptop
> (12 days ago).
>
> ThinkPad really changed a lot after being acquired by Lenovo...
>

If you say so :) - I don't know :)

> As of ASU, I've downloaded it but I do not know how to show hidden BIOS
> options with it since I have little knowledge about hardware...
>
> Here is what I got when running ./asu64 dump:
>
> wzyboy@xenien:~/Desktop/asu$ sudo ./asu64 dump
> IBM Advanced Settings Utility version 9.41.81K
> Licensed Materials - Property of IBM
> (C) Copyright IBM Corp. 2007-2013 All Rights Reserved
>      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
> 00: 00>00*00*00*00*00*67*8b*45*06*67*f6*45*ec*01*75
> 10: 03*b8*00*f0*8e*d8*67*8e*45*04*67*8b*7d*02*33*c0
> 20: 26*89*45*04*a1*61*09*26*89*45*02*a0*63*09*26*88
> 30: 45*01*26*c6*05*01*b8*00*00*c3*b8*82*00*c3*b8*82
> 40: 00*c3*b8*82*00*c3*b8*82*00*c3*b8*00*10*26*89*45
> 50: 0d*f8*c3*9c*66*60*e4*60*eb*00*eb*00*66*61*9d*c3
> 60: 1e*b8*40*00*8e*d8*f6*06*10*04*04*74*03*1f*f8*c3
> 70: 1f*f9*c3*f8*c3*25*00*00*41*d0*00*00*08*00*00*03
> 80: 00*22*04*00*47*01*20*00*20*00*00*02*47*01*a0*00
> 90: a0*00*00*02*79*00*79*00*79*00*45*00*00*41*d0*02
> a0: 00*08*01*00*03*00*2a*10*00*47*01*00*00*00*00*00
> b0: 10*47*01*81*00*81*00*00*03*47*01*87*00*87*00*00
> c0: 01*47*01*89*00*89*00*00*03*47*01*8f*00*8f*00*00
> d0: 03*47*01*c0*00*c0*00*00*20*79*00*79*00*79*00*1d
> e0: 00*00*41*d0*01*00*08*02*01*03*00*22*01*00*47*01
> f0: 40*00*40*00*00*04*79*00*79*00*79*00*1d*00*00*41
>
>
> Could you help by telling me what command should I run to enable those
> hidden options in BIOS?

I have no clue unfortunately - maybe contact Lenovo?

2013-11-13 17:42:59

by Bjorn Helgaas

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

[+cc Bj?rn]

On Tue, Nov 12, 2013 at 10:39 PM, wzyboy <[email protected]> wrote:
> 2013/11/13 Bjorn Helgaas <[email protected]>:
>> wzyboy, can you run these commands before the bug occurs and before
>> using the "setpci" workaround:
>>
>> lspci -vvxxxx -s00:1c.1
>> lspci -vvxxxx -s03:00.0
>
> After today's morning lessons I booted up my laptop with the pci=earlydump
> kernel parameter, and here is the output of lspci (without setpci and
> before the bug hit) and dmesg.

Thanks. Are you 100% sure the lspci output is before the setpci
workaround? The dmesg earlydump shows this (the ASPM control bits are
in the 16-bit Link Control register, which is at 0x50 for both
devices):

00:1c.1 Root Port config
50: 42 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
03:00.0 Intel 7260 config:
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00

So at boot-time, ASPM was enabled. But the lspci shows:

00:1c.1 Root Port config
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
LnkCtl: ASPM Disabled
50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
03:00.0 Intel 7260 config
Capabilities: [40] Express (v2) Endpoint, MSI 00
LnkCtl: ASPM Disabled
50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00

And now ASPM is disabled. I'm pretty sure the kernel did not disable
ASPM, so I assume it was disabled by setpci.
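
For readers following the register values, here is a minimal Python sketch of how the two bytes at 0x50 decode. It assumes the standard PCIe Link Control layout, in which bits 1:0 are the ASPM Control field, bit 6 is Common Clock Configuration and bit 8 is Enable Clock Power Management. The function name and the sample labels are made up for illustration.

```python
# ASPM Control is bits 1:0 of the 16-bit PCIe Link Control register.
ASPM_STATES = {0b00: "Disabled", 0b01: "L0s", 0b10: "L1", 0b11: "L0s L1"}


def decode_lnkctl(low: int, high: int) -> dict:
    """Decode the two config bytes at 0x50 and 0x51 (little-endian)."""
    value = (high << 8) | low
    return {
        "raw": value,
        "aspm": ASPM_STATES[value & 0x3],
        "common_clock": bool(value & (1 << 6)),  # Common Clock Configuration
        "clock_pm": bool(value & (1 << 8)),      # Enable Clock Power Management
    }


# Byte pairs copied from the dumps quoted above.
samples = {
    "root port, earlydump": (0x42, 0x00),
    "7260, earlydump": (0x42, 0x01),
    "root port, lspci": (0x40, 0x00),
    "7260, lspci": (0x40, 0x01),
}
for name, (low, high) in samples.items():
    print(f"{name}: ASPM {decode_lnkctl(low, high)['aspm']}")
```

The only difference between 0x42 and 0x40 is bit 1, which is the L1 half of the ASPM Control field.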

I manually decoded the L1 PM Substates registers for both the Root
Port and the 7260 device (details appended below), and everything
appears enabled there (though I think that since ASPM is disabled, L1
PM substates is ignored).

My conclusion is that the BIOS enabled both ASPM and L1 PM substates.
Obviously the BIOS will do the same when booting Windows, and I assume
the device works fine with the Windows driver. Based on our previous
experience with Windows, I don't think it will change the ASPM
configuration because the ACPI FADT table and the PCI _OSC method do
not grant control of ASPM to the OS. Therefore, I think the problem
is in the Linux iwlwifi driver.

I don't think there's anything more I can do here because there's no
evidence that the PCI core is doing anything wrong. But if it turns
out that we *should* be doing something differently, let me know.

Bjorn


00:1c.1 Root Port config
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
LnkCtl: ASPM Disabled
50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
Capabilities: [200 v1] #1e
200: 1e 00 01 00 1f 28 28 00 1f 28 a0 40 f0 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Header 0x0001001e
ID 0x001e (L1 Substates)
Version 1
Capabilities 0x0028281f
Control 1 0x40a0281f
0x40a0281f Control 1
0x00000001 PCI-PM L1.2 Enable
0x00000002 PCI-PM L1.1 Enable
0x00000004 ASPM L1.2 Enable
0x00000008 ASPM L1.1 Enable
0x00000010 RsvdP
0x00002800 Common_Mode_Restore_Time
0x00a00000 LTR_L1.2_THRESHOLD_Value
0x40000000 LTR_L1.2_THRESHOLD_Scale
Control 2 0x000000f0
0x000000f0 Control 2
0x000000f0 T_POWER_ON Value

03:00.0 Intel 7260 config
Capabilities: [40] Express (v2) Endpoint, MSI 00
LnkCtl: ASPM Disabled
50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
Capabilities: [154 v1] Vendor Specific Information: ID=cafe
Rev=1 Len=014 <?>
150: 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
Header 0x0001000b
ID 0x000b (Vendor-specific)
Version 1
Vendor-specific header 0x0141cafe
VSEC ID 0xcafe
VSEC Rev 1
Length 0x14
Capabilities 0x00f01e1f
Control 1 0x40a0000f
0x40a0000f Control 1
0x00000001 PCI-PM L1.2 Enable
0x00000002 PCI-PM L1.1 Enable
0x00000004 ASPM L1.2 Enable
0x00000008 ASPM L1.1 Enable
Control 2 0x000000f0
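
The manual decode above can be reproduced with a few shifts and masks. This Python sketch assumes the L1 PM Substates Control 1 layout from the PCIe specification (enable bits 3:0, Common_Mode_Restore_Time in bits 15:8 in microseconds, LTR_L1.2_THRESHOLD_Value in bits 25:16, LTR_L1.2_THRESHOLD_Scale in bits 31:29 with each scale step multiplying by 32 ns). The 7260 exposes its copy through a vendor-specific capability, so applying the same layout there follows the decode above and is an assumption. The function name is invented for illustration.

```python
def decode_l1ss_ctl1(value: int) -> dict:
    """Split an L1 PM Substates Control 1 dword into its fields."""
    threshold = (value >> 16) & 0x3FF
    scale = (value >> 29) & 0x7
    return {
        "pci_pm_l1_2": bool(value & 0x1),
        "pci_pm_l1_1": bool(value & 0x2),
        "aspm_l1_2": bool(value & 0x4),
        "aspm_l1_1": bool(value & 0x8),
        "common_mode_restore_time_us": (value >> 8) & 0xFF,
        "ltr_l1_2_threshold_value": threshold,
        "ltr_l1_2_threshold_scale": scale,
        # Assumed scale encoding: 0 -> 1 ns, 1 -> 32 ns, 2 -> 1024 ns, ...
        "ltr_l1_2_threshold_ns": threshold * (32 ** scale),
    }


root_port = decode_l1ss_ctl1(0x40A0281F)
wifi_card = decode_l1ss_ctl1(0x40A0000F)
for name, fields in (("00:1c.1", root_port), ("03:00.0", wifi_card)):
    print(name, fields)
```

Both values have all four enable bits set, which matches the observation that every L1 substate was left enabled.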

2013-11-13 20:31:39

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> [+cc Bj?rn]
>
> On Tue, Nov 12, 2013 at 10:39 PM, wzyboy <[email protected]> wrote:
> > 2013/11/13 Bjorn Helgaas <[email protected]>:
> >> wzyboy, can you run these commands before the bug occurs and before
> >> using the "setpci" workaround:
> >>
> >> lspci -vvxxxx -s00:1c.1
> >> lspci -vvxxxx -s03:00.0
> >
> > After today's morning lessons I booted up my laptop with the pci=earlydump
> > kernel parameter, and here is the output of lspci (without setpci and
> > before the bug hit) and dmesg.
>
> Thanks. Are you 100% sure the lspci output is before the setpci workaround?
> The dmesg earlydump shows this (the ASPM control bits are in the 16-bit Link
> Control register, which is at 0x50 for both
> devices):
>
> 00:1c.1 Root Port config
> 50: 42 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> 03:00.0 Intel 7260 config:
> 50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
>
> So at boot-time, ASPM was enabled. But the lspci shows:
>
> 00:1c.1 Root Port config
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> 03:00.0 Intel 7260 config
> Capabilities: [40] Express (v2) Endpoint, MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
>
> And now ASPM is disabled. I'm pretty sure the kernel did not disable ASPM,
> so I assume it was disabled by setpci.
>
> I manually decoded the L1 PM Substates registers for both the Root Port and
> the 7260 device (details appended below), and everything appears enabled
> there (though I think that since ASPM is disabled, L1 PM substates is
> ignored).
>
> My conclusion is that the BIOS enabled both ASPM and L1 PM substates.
> Obviously the BIOS will do the same when booting Windows, and I assume
> the device works fine with the Windows driver. Based on our previous
> experience with Windows, I don't think it will change the ASPM configuration
> because the ACPI FADT table and the PCI _OSC method do not grant control
> of ASPM to the OS. Therefore, I think the problem is in the Linux iwlwifi
> driver.
>
> I don't think there's anything more I can do here because there's no
> evidence that the PCI core is doing anything wrong. But if it turns out that we
> *should* be doing something differently, let me know.
>

Right - no evidence of anything - Thank you a lot for all your help. I have learnt a lot from this thread.
I guess I'll try to disable L1 PM substates with setpci command and see what happens.



> Bjorn
>
>
> 00:1c.1 Root Port config
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 00 11 70 00 b2 14 00 00 00 40 01 00 00 00 00
> Capabilities: [200 v1] #1e
> 200: 1e 00 01 00 1f 28 28 00 1f 28 a0 40 f0 00 00 00
> 210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Header 0x0001001e
> ID 0x001e (L1 Substates)
> Version 1
> Capabilities 0x0028281f
> Control 1 0x40a0281f
> 0x40a0281f Control 1
> 0x00000001 PCI-PM L1.2 Enable
> 0x00000002 PCI-PM L1.1 Enable
> 0x00000004 ASPM L1.2 Enable
> 0x00000008 ASPM L1.1 Enable
> 0x00000010 RsvdP
> 0x00002800 Common_Mode_Restore_Time
> 0x00a00000 LTR_L1.2_THRESHOLD_Value
> 0x40000000 LTR_L1.2_THRESHOLD_Scale
> Control 2 0x000000f0
> 0x000000f0 Control 2
> 0x000000f0 T_POWER_ON Value
>
> 03:00.0 Intel 7260 config
> Capabilities: [40] Express (v2) Endpoint, MSI 00
> LnkCtl: ASPM Disabled
> 50: 40 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
> Capabilities: [154 v1] Vendor Specific Information: ID=cafe
> Rev=1 Len=014 <?>
> 150: 0b 00 01 00 fe ca 41 01 1f 1e f0 00
> 160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
> Header 0x0001000b
> ID 0x000b (Vendor-specific)
> Version 1
> Vendor-specific header 0x0141cafe
> VSEC ID 0xcafe
> VSEC Rev 1
> Length 0x14
> Capabilities 0x00f01e1f
> Control 1 0x40a0000f
> 0x40a0000f Control 1
> 0x00000001 PCI-PM L1.2 Enable
> 0x00000002 PCI-PM L1.1 Enable
> 0x00000004 ASPM L1.2 Enable
> 0x00000008 ASPM L1.1 Enable
> Control 2 0x000000f0

2013-11-14 04:24:05

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/14 Bjorn Helgaas <[email protected]>:
> Thanks. Are you 100% sure the lspci output is before the setpci
> workaround?


To ensure that, I did it again:

boot up laptop -> connect to dormitory's WiFi -> run lspci -> run
setpci -> run dmesg.

Here are the outputs.
--
wzyboy


Attachments:
lspci.without-setpci.before-bug.2.txt (32.04 kB)
pci-earlydump.2.dmesg (63.48 kB)

2013-11-15 03:07:15

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/15 Bjorn Helgaas <[email protected]>:
> Why would it be unlikely to fix the driver? Do people think the
> problem is not actually in the driver?
>
> Asking Lenovo how to disable L1 PM substates is really a non-answer.
> Only the extremely technical and extremely patient user (hi wzyboy :))
> will even bother to investigate why wifi works fine with Windows but
> not with Linux. The only thing Lenovo *could* do is to release a new
> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
> would never do that because then I would have to tell customers
> "disable this for Linux, enable this for Windows," and I'd have to
> deal with support calls about devices using more power than they
> should, battery life being shorter, etc. Plus you'd have to ask every
> Linux user to upgrade their BIOS. That's all just a terrible user
> experience.


I am a little confused. There are two sets of "setpci" commands, both
of which let me use my NIC reliably. But you two say they are
just workarounds, not real fixes.

I know the "side effect" of the first two "setpci" commands is higher
power consumption. (Actually, in my experience of running on battery, I
did not notice ...)

But Grumbach said that "L1" is enabled after the second two "setpci"
commands. Does that mean it saves power? So what is the "side effect" of
the second two "setpci" commands?

IMHO, if this lets users use their NICs reliably, maybe Grumbach could
write these commands into the iwlwifi driver and run them when a 7260 is
detected...

BTW, no replies from Lenovo, yet.

--
wzyboy

2013-11-15 09:04:43

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Thanks a lot for the explanation, Emmanuel!

Now I finally know why this is a "catch-22" situation: disabling those
features from the OS/driver cannot be as clean as disabling them directly
in the BIOS. And there is a chance that disabling them at a bad moment
may require a G3...

--
wzyboy


2013/11/15 Emmanuel Grumbach <[email protected]>:
>
>
> On 11/15/2013 05:06 AM, wzyboy wrote:
>> 2013/11/15 Bjorn Helgaas <[email protected]>:
>>> Why would it be unlikely to fix the driver? Do people think the
>>> problem is not actually in the driver?
>>>
>>> Asking Lenovo how to disable L1 PM substates is really a non-answer.
>>> Only the extremely technical and extremely patient user (hi wzyboy :))
>>> will even bother to investigate why wifi works fine with Windows but
>>> not with Linux. The only thing Lenovo *could* do is to release a new
>>> BIOS with a switch to control L1 PM Substates. If I were Lenovo, I
>>> would never do that because then I would have to tell customers
>>> "disable this for Linux, enable this for Windows," and I'd have to
>>> deal with support calls about devices using more power than they
>>> should, battery life being shorter, etc. Plus you'd have to ask every
>>> Linux user to upgrade their BIOS. That's all just a terrible user
>>> experience.
>>
>>
>> I am a little confused. There are two sets of "setpci" commands, both
>> of which can make me use my NIC reliably. But you two say they are
>> just workarounds, not real fixes.
>>
>
> Right - because they force a mode that the BIOS doesn't allow. The BIOS
> doesn't allow the OS (the driver) to decide in what mode to work - so we
> cannot reach the same effect as the setpci command from the OS / driver
> level. setpci just directly accesses the HW without asking the
> permissions of anyone.
>
>> I know the "side effect" of first two "setpci" commands is consuming
>> more power. (Actually by my experience of running on battery, I did
>> not notice ...)
>>
>> But Grumbach said after the second two "setpci" commands enables "L1".
>> Does it mean it saves power? So what's the "side effect" of second two
>> "setpci" commands?
>>
>
> They are both the same in terms of side-effects. The first set of setpci
> commands will disable L1 altogether - meaning you don't save any power.
> The second set of setpci doesn't disable L1, but disables a more subtle
> power state (actually several) which are defined as L1 PM substates. In
> these substates, you save less power than in L1 (I think) but you are
> more likely to be able to reach them. After all, it is always the same
> story - the deeper you sleep, the longer it takes to wake up. And if it
> takes longer to wake up, it also means that in several cases you won't
> choose to go to sleep. So the way PCI folks help to save power even in
> cases where you cannot go to a deep sleep is to define states in the
> middle in which you save less power, but in which you are more likely to
> be. Again - time spent in each state and power saved in each state trade
> off.
> Now:
> L1 - deep sleep
> L1 PM substate - something in the middle.
>
> First setpci command - disable both features.
> Second setpci command - disable only the second feature.
>
> Regarding side effects... I don't think this is really "dangerous". But
> this is not a fix in the way that I wouldn't like to deploy millions of
> machines like that. The risk you have here is probably to have a bad
> timing and have the setpci commands run exactly when the link is in a
> state that setpci disables. That would be bad. How bad? Probably would
> just require a reboot - or worst case G3 (take the battery off).
>
>> IMHO, if this could user use their NIC reliably, maybe Grumbach may
>> write these commands to iwlwifi driver and run them when 7260 is
>> detected...
>
> I can't, as explained above.
>
>>
>> BTW, no replies from Lenovo, yet.
>>
>
>
>> Or maybe you could add an option, which enables this "workaround" if
>> user wants. A user could simply write a /etc/modprobe.d/iwlwifi.conf
>> and enable this "workaround", to use their NICs without having to
>> reboot from time to time...
>
> same.
>

2013-11-12 09:36:42

by Emmanuel Grumbach

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Tue, Nov 12, 2013 at 9:02 AM, Emmanuel Grumbach <[email protected]> wrote:
> On Tue, Nov 12, 2013 at 7:42 AM, wzyboy <[email protected]> wrote:
>> Hi, I've got some good news. Here is what I did today:
>>
>> boot up laptop -> do the sysfs trick -> start downloading a big file
>> to benchmark it -> several minutes later the bug occurs -> reboot my
>> laptop to recover -> do the setpci trick -> start downloading a big
>> file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
>> and everything works fine!
>
> encouraging. Thanks.
> I just wonder... the patch I sent was supposed to tell the HW not to
> use L1. So I would have hoped it would have helped in the same way?
> After all, L1 is a handshake between the device and the bridge, so
> that if the device doesn't initiate / refuses to go into L1, I'd
> expect it to have the same effect as disabling L1 in the ASPM register
> PCIe config space?
> Obviously I am wrong though.
>

Can you please check whether there are BIOS updates? (It seems that all
the BIOS update tools run on Windows... - but I can have a bootable CD
:))

You can also check this out:
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=TOOL-ASU
This might help to remove support for L1 substates.
I guess it'd be nice to ask Lenovo too about how to find these options
in BIOS. From our experience, there are a lot of features and options
in BIOS that are accessible only after you enter a "secret" (I meant
obscure) sequence of keys.

>>
>> Here are the output of lspci -vv after running two "setpci" commands.
>>
>> There is also a screenshot of ThinkPad X240s HMM, showing how the
>> wireless card is connected to the motherboard. In this figure #10 is
>> the Wireless LAN card. It is connected to the motherboard with Intel's
>> NGFF connector.
>>
>> I will continue downloading big files to benchmark it.
>>
>
> Thanks
>
>> --
>> wzyboy

2013-11-10 07:08:49

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/10 Emmanuel Grumbach <[email protected]>:
> one more thing?
> Are you using KVM with pass-through? Or is it a native installation?


No, I’m using a native installation. I just wiped the pre-installed
Windows OS and booted from Arch Linux installation disk, set up LUKS
(without LVM) and installed the system.

However, as you can see in dmesg "modules linked in", I have
VirtualBox installed and "vboxdrv.ko" loaded.

--
wzyboy

2013-11-14 07:05:04

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> 2013/11/14 Grumbach, Emmanuel <[email protected]>:
> > Can you please try the following:
> > * Boot without any changes
> > * setpci -s03:00.0 0x160.B=0x00
> > * setpci -s00:1c.1 0x204.B=0x10
> > lspci -vv
> >
> > and tell me if WiFi works then.
> > (this replaces the previous setpci commands)
> >
> >
> > Thank you
>
>
> boot up (no connection) -> run new setpci commands -> lspci and dmesg
> -> connect to domitory's WiFi -> download 3.5 GiB data -> works fine!
>

Awesome - you have L1 enabled:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-

2013-11-08 04:46:50

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/8 wzyboy <[email protected]>:
> so I have no idea how this wireless card
> performs under Windows.


And I can no longer test this under Windows, since my laptop
is now a BIOS + GPT + LUKS setup, so it would be a big project to
install Windows again -- Windows 7/8/8.1 does not allow a BIOS +
GPT setup, not to mention the headache of resizing LUKS containers...

--
wzyboy

2013-11-06 07:12:57

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/6 Grumbach, Emmanuel <[email protected]>:
> Wait - you mean that after the bug occurred before you rebooted, lspci -xxx show all 00?
> I can see 0xff here.
> Anyway - this is very bad... checking with HW guys...


Sorry, that was my typo. They are all 0xff... (I don't know what they
mean, but it looks bad...)
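
For context, a PCI config read that gets no answer from the device comes back as all ones, so a dump made only of 0xff means the card has stopped responding. A minimal Python sketch of that check on `lspci -x` style text follows. The function name is made up, and the two sample rows are the first dump lines posted earlier in this thread.

```python
import re

# Matches hex-dump rows such as "00: 86 80 b2 08 ..." but not the
# "03:00.0 Network controller: ..." heading line.
ROW = re.compile(r"([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)")


def config_space_is_dead(lspci_text: str) -> bool:
    """Return True if every dumped config byte reads back as 0xff."""
    dumped = []
    for line in lspci_text.splitlines():
        match = ROW.fullmatch(line.strip())
        if match:
            dumped.extend(int(byte, 16) for byte in match.group(2).split())
    return bool(dumped) and all(byte == 0xFF for byte in dumped)


healthy = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00"""
broken = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff"""

print(config_space_is_dead(healthy), config_space_is_dead(broken))  # False True
```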

Thanks for your effort! I'm waiting for good news from you and HW guys. :-)

--
wzyboy

2013-11-10 11:41:12

by Grumbach, Emmanuel

Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2013/11/10 Emmanuel Grumbach <[email protected]>:
> > HW people seem to want to know what happens in you disable L1
> > substates.
> > Can you enter you BIOS and check if you have such an option in your BIOS?
>
>
> Could you be more specific what the option name look like? Or I could take
> photos for each tab in BIOS and attach the photos here.
>

It is really called L1 PM Substate.
I don't really know the BIOS of ThinkPad... But we can try...

2013-11-12 12:25:57

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/12 Grumbach, Emmanuel <[email protected]>:
> Well... I haven't done much, but the setpci isn't really a solution - it is more a work around.
> Bjorn is basically disabling L1 PCIe feature which allows to save power. While you might not care, I do :)
> The HW folks here would still want to know if you can disable L1 substates feature (not that I know what it is - but I can guess).
> If you can try to:
> * upgrade your BIOS (if needed)
> * check the advanced options I sent to you to see if you can unlock the advanced menu in your BIOS
>
> it'd help me to understand the issue.
> In any case, I am happy that you have a way to reliably use your NIC now.


Oh, actually I upgraded my BIOS to the newest version the day I got this
laptop (12 days ago).

ThinkPad really changed a lot after being acquired by Lenovo...

As for ASU, I've downloaded it but I do not know how to show hidden
BIOS options with it, since I have little knowledge about hardware...

Here is what I got when running ./asu64 dump:

wzyboy@xenien:~/Desktop/asu$ sudo ./asu64 dump
IBM Advanced Settings Utility version 9.41.81K
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2013 All Rights Reserved
0 1 2 3 4 5 6 7 8 9 A B C D E F
00: 00>00*00*00*00*00*67*8b*45*06*67*f6*45*ec*01*75
10: 03*b8*00*f0*8e*d8*67*8e*45*04*67*8b*7d*02*33*c0
20: 26*89*45*04*a1*61*09*26*89*45*02*a0*63*09*26*88
30: 45*01*26*c6*05*01*b8*00*00*c3*b8*82*00*c3*b8*82
40: 00*c3*b8*82*00*c3*b8*82*00*c3*b8*00*10*26*89*45
50: 0d*f8*c3*9c*66*60*e4*60*eb*00*eb*00*66*61*9d*c3
60: 1e*b8*40*00*8e*d8*f6*06*10*04*04*74*03*1f*f8*c3
70: 1f*f9*c3*f8*c3*25*00*00*41*d0*00*00*08*00*00*03
80: 00*22*04*00*47*01*20*00*20*00*00*02*47*01*a0*00
90: a0*00*00*02*79*00*79*00*79*00*45*00*00*41*d0*02
a0: 00*08*01*00*03*00*2a*10*00*47*01*00*00*00*00*00
b0: 10*47*01*81*00*81*00*00*03*47*01*87*00*87*00*00
c0: 01*47*01*89*00*89*00*00*03*47*01*8f*00*8f*00*00
d0: 03*47*01*c0*00*c0*00*00*20*79*00*79*00*79*00*1d
e0: 00*00*41*d0*01*00*08*02*01*03*00*22*01*00*47*01
f0: 40*00*40*00*00*04*79*00*79*00*79*00*1d*00*00*41


Could you help by telling me which command I should run to enable those
hidden options in the BIOS?
--
wzyboy

2013-11-14 07:09:52

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/14 Grumbach, Emmanuel <[email protected]>:
> Awesome - you have L1 enabled:


I do not fully understand it, but it seems like good news :-)

Does that mean you found a real fix instead of a workaround? Congratulations!

--
wzyboy

2013-11-10 10:20:15

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/9 Bjorn Helgaas <[email protected]>:
> it might
> be interesting to do "echo on >
> /sys/bus/pci/devices/0000:03:00.0/power/control" and see whether it
> makes a difference.


Hi, should I run this command after the bug? I just ran it after the
bug occurred, but there is no output in dmesg, and "ip link set wlan0
up" still returns the same error ("RTNETLINK answers: Connection timed
out").

--
wzyboy

2013-11-12 12:59:37

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/12 Grumbach, Emmanuel <[email protected]>:
> If you say so :) - I don't know :)
>
I checked Lenovo's page again:
http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS035950

It says the latest BIOS version is 2.12, and this is the version I am using...
>
> I have no clue unfortunately - maybe contact Lenovo?

I sent an email to Lenovo support and am waiting for a reply. I hope the
customer service is as good as in the IBM era...

--
wzyboy

2013-11-07 04:49:39

by Sascha Weaver

Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/7 Bjorn Helgaas <[email protected]>:
> Do you have any more details? Maybe open a bugzilla.kernel.org report
> and attach:
>
> - complete dmesg log
> - lspci -vvxxx output for entire system before issue occurs
> - lspci -vvxxx output for entire system after issue occurs


Hi, I have filed a bug on bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=64541

I have also posted some more logs before; you can find them in
the mailing list archive:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/115259

Here is the output of lspci -vxxxx from just now. I'll run this command
again the next time the bug occurs:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
Subsystem: Intel Corporation Wireless-N 7260
Flags: bus master, fast devsel, latency 0, IRQ 62
Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [40] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 5c-51-4f-ff-ff-0d-82-ac
Capabilities: [14c] Latency Tolerance Reporting
Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
40: 10 00 02 00 c0 8e 00 10 10 0c 19 00 11 ec 06 00
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 12 08 08 00 05 04 00 00 00 00 00 00
70: 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
d0: 05 40 81 00 0c f0 e0 fe 00 00 00 00 42 41 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00
110: 00 20 00 00 00 20 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 03 00 c1 14 ac 82 0d ff ff 4f 51 5c 18 00 41 15
150: 03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
740: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
850: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
920: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
aa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ad0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
--
wzyboy
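
Since the raw "-xxxx" dump is hard to read by eye, here is a rough Python sketch of how the ASPM setting can be pulled out of it. It parses the hexdump rows, follows the capability pointer at 0x34 until it finds the PCI Express capability (ID 0x10), and reads the Link Control word at offset 0x10 inside it. The function names are illustrative only, and only the rows needed for the walk are copied from the dump above.

```python
import re

PCI_CAPABILITY_LIST = 0x34  # config offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10       # PCI Express capability ID
PCI_EXP_LNKCTL = 0x10       # Link Control offset inside that capability

ROW = re.compile(r"^\s*([0-9a-f]+):((?:\s+[0-9a-f]{2})+)\s*$")


def parse_hexdump(text):
    """Turn lspci -x style rows ('40: 10 00 02 ...') into a bytes object."""
    space = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers such as "03:00.0 Network controller: ..."
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            space[base + i] = int(byte, 16)
    # Rows that were not supplied are treated as zero.
    return bytes(space.get(i, 0) for i in range(max(space) + 1))


def walk_capabilities(cfg):
    """Yield (offset, capability ID) by following the next pointers."""
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while ptr and ptr not in seen:  # 'seen' guards against a looping list
        seen.add(ptr)
        yield ptr, cfg[ptr]
        ptr = cfg[ptr + 1] & 0xFC


def link_control(cfg):
    """Return the 16-bit Link Control word, or None without a PCIe capability."""
    for off, cap_id in walk_capabilities(cfg):
        if cap_id == PCI_CAP_ID_EXP:
            reg = off + PCI_EXP_LNKCTL
            return cfg[reg] | (cfg[reg + 1] << 8)
    return None


DUMP = """\
03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
40: 10 00 02 00 c0 8e 00 10 10 0c 19 00 11 ec 06 00
50: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 23 c8 00 00 00 0d
d0: 05 40 81 00 0c f0 e0 fe 00 00 00 00 42 41 00 00
"""

cfg = parse_hexdump(DUMP)
print([(hex(off), hex(cap)) for off, cap in walk_capabilities(cfg)])
print(hex(link_control(cfg)))  # low two bits are the ASPM Control field
```

For this dump the walk visits c8, d0 and 40, the same three capabilities that lspci lists, and Link Control reads 0x0142. Its low two bits are 0b10, which is the encoding for "L1 enabled, L0s disabled".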

2013-11-12 07:02:53

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Tue, Nov 12, 2013 at 7:42 AM, wzyboy <[email protected]> wrote:
> Hi, I've got some good news. Here is what I did today:
>
> boot up laptop -> do the sysfs trick -> start downloading a big file
> to benchmark it -> several minutes later the bug occurs -> reboot my
> laptop to recover -> do the setpci trick -> start downloading a big
> file to benchmark it -> more than 5 GiB downloaded (at ~ 2.3 MiB/s)
> and everything works fine!

encouraging. Thanks.
I just wonder... the patch I sent was supposed to tell the HW not to
use L1. So I would have hoped it would have helped in the same way?
After all, L1 is a handshake between the device and the bridge, so
that if the device doesn't initiate / refuses to go into L1, I'd
expect it to have the same effect as disabling L1 in the ASPM register
in PCIe config space?
Obviously I am wrong though.

>
> Here are the output of lspci -vv after running two "setpci" commands.
>
> There is also a screenshot of ThinkPad X240s HMM, showing how the
> wireless card is connected to the motherboard. In this figure #10 is
> the Wireless LAN card. It is connected to the motherboard with Intel's
> NGFF connector.
>
> I will continue downloading big files to benchmark it.
>

Thanks

> --
> wzyboy
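
For reference, a small Python sketch of what "disabling L1 in the ASPM register" amounts to at the bit level. The ASPM Control field is bits 1:0 of the PCIe Link Control register; the sample value 0x0142 is the Link Control word visible at offset 0x50 in wzyboy's earlier dump of 03:00.0. The exact setpci commands used in the workaround are not reproduced here, and the helper names are hypothetical.

```python
PCI_EXP_LNKCTL_ASPMC = 0x0003  # ASPM Control field, bits 1:0 of Link Control

ASPM_NAMES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s+L1"}


def aspm_state(lnkctl):
    """Name the ASPM setting encoded in a Link Control value."""
    return ASPM_NAMES[lnkctl & PCI_EXP_LNKCTL_ASPMC]


def with_aspm_disabled(lnkctl):
    """Return the Link Control value with both ASPM bits cleared."""
    return lnkctl & ~PCI_EXP_LNKCTL_ASPMC & 0xFFFF


before = 0x0142  # Link Control of 03:00.0 as dumped earlier in the thread
after = with_aspm_disabled(before)
print(aspm_state(before), "->", aspm_state(after), hex(after))
```

Only the two ASPM bits change (0x0142 becomes 0x0140); the other Link Control bits are left alone. Because ASPM is negotiated per link, the same field on the Root Port at the other end presumably needs the same treatment.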

2013-11-13 08:45:51

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> On Tue, Nov 12, 2013 at 12:37 PM, Emmanuel Grumbach
> <[email protected]> wrote:
> > On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> >> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> >> <[email protected]> wrote:
> >>
> >>> Right - I remember the discussion we had on that.
> >>> On this device (7260 that has an issue with ASPM), we don't call
> >>> pci_disable_link_state, because we know it is supposed to work...
> >>
> >> If ASPM is supposed to work as far as the hardware is concerned, I
> >> guess you're saying this must be an iwlwifi driver issue. Right?
> >
> > ASPM is supposed to work as far as the hardware is concerned.
> > We might very well have an issue in iwlwifi - and I am checking this
> > internally with our System guys.
> > It can be a PCI core problem too, and it could also be a platform /
> > BIOS / Lenovo issue.
> > Of course, I have no clue which of these is the culprit here.
> > Our System folks seemed to say that this new device uses L1 substates,
> > which can be enabled on the Haswell platform that the user owns.
> > Now - L1 substates is a new feature and might introduce issues
> > (apparently) - and this is why they (System folks) wanted to try
> > without L1 substates. But disabling L1 substates doesn't seem trivial
> > with Lenovo's production BIOS. So I am pretty stuck here.
>
> For debugging purposes, we could configure L1 substates with setpci, as we
> did for ASPM. The Linux kernel knows nothing about L1 substates, so the PCI
> core isn't doing anything with them. It's possible the driver itself could muck
> with L1 substate configuration, but that would be discouraged, and I don't
> see anything in iwlwifi that is doing that.
>
> The lspci output in
> https://bugzilla.kernel.org/attachment.cgi?id=114061 shows an L1 PM
> Substates extended capability (capability ID 0x1e) for the Root Port leading to
> the 7260 device, but not for the 7260 device itself:
>
> 00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3
> (rev e4) (prog-if 00 [Normal decode])
> Capabilities: [200 v1] #1e
>
> Per sec 5.5.4 of the ECN for L1 PM Substates (15 Aug 2012), I think L1
> substates must be configured on both ends of the link, and if the 7260 device
> doesn't have that capability, I don't see how it could be enabled.

Makes sense.

>
> The lspci version wzyboy has doesn't decode the L1 PM Substates capability,
> but there is a newer version at
> git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git that should decode it.
> Also, "lspci -vvxxx" didn't hexdump this capability, which should be at offset
> 0x200. Using "lspci -xxxx" (four "x"s) should dump it, and we can decode it
> manually.
>

You can find this in http://permalink.gmane.org/gmane.linux.kernel.wireless.general/115378.

Somehow my System team says that it should be at offset 0x160?
Is it possible that there is a "walk algorithm" with pointers, just like for the ASPM register?
I'll try to check the PCI spec when I find the time for that.

In any case, here are the relevant offsets:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
[...]
160: 0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00
[...]
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

> wzyboy, can you run these commands before the bug occurs and before
> using the "setpci" workaround:
>
> lspci -vvxxxx -s00:1c.1
> lspci -vvxxxx -s03:00.0
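
On the "walk algorithm" question: PCIe extended capabilities are chained in a similar way, starting at offset 0x100, with each 32-bit header holding the capability ID (bits 15:0), the version (bits 19:16) and the next offset (bits 31:20). The following Python sketch applies that walk to the rows of wzyboy's earlier "-xxxx" dump of 03:00.0. The function name is illustrative and the code is a hand-rolled sketch, not how lspci is implemented.

```python
import struct


def walk_ext_capabilities(cfg):
    """Yield (offset, capability ID, version) for each extended capability."""
    off = 0x100
    seen = set()
    while 0x100 <= off <= len(cfg) - 4 and off not in seen:
        seen.add(off)
        (header,) = struct.unpack_from("<I", cfg, off)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable device
        yield off, header & 0xFFFF, (header >> 16) & 0xF
        off = (header >> 20) & 0xFFC  # next pointer; 0 ends the list


# Rows 100, 140, 150 and 160 copied from the dump earlier in the thread;
# every other byte of the 4 KiB config space is left at zero.
cfg = bytearray(0x1000)
cfg[0x100:0x110] = bytes.fromhex("01 00 01 14 00 00 00 00 00 00 00 00 31 20 46 00")
cfg[0x140:0x150] = bytes.fromhex("03 00 c1 14 ac 82 0d ff ff 4f 51 5c 18 00 41 15")
cfg[0x150:0x160] = bytes.fromhex("03 10 03 10 0b 00 01 00 fe ca 41 01 1f 1e f0 00")
cfg[0x160:0x170] = bytes.fromhex("0f 00 a0 40 f0 00 00 00 00 00 00 00 00 00 00 00")

for off, cap_id, version in walk_ext_capabilities(bytes(cfg)):
    print(f"[{off:03x} v{version}] ID {cap_id:#06x}")

# Vendor-specific header that follows the capability header at 0x154.
(vsec,) = struct.unpack_from("<I", cfg, 0x158)
print(f"VSEC ID={vsec & 0xFFFF:04x} Rev={(vsec >> 16) & 0xF} Len={vsec >> 20:#05x}")
```

On this data the chain is 100 (AER), 140 (Device Serial Number), 14c (LTR) and 154 (vendor specific), then it ends. No capability with ID 0x1e shows up on the device side, which matches Bjorn's reading. The vendor-specific block reports a length of 0x14 bytes, so if that field is read correctly it spans 0x154 through 0x167, and offset 0x160 would fall inside that vendor-specific block rather than in a standard capability.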

2013-11-12 19:37:12

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly



On 11/12/2013 09:14 PM, Bjorn Helgaas wrote:
> On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
> <[email protected]> wrote:
>
>> Right - I remember the discussion we had on that.
>> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...
>
> If ASPM is supposed to work as far as the hardware is concerned, I
> guess you're saying this must be an iwlwifi driver issue. Right?

ASPM is supposed to work as far as the hardware is concerned.
We might very well have an issue in iwlwifi - and I am checking this
internally with our System guys.
It can be a PCI core problem too, and it could also be a platform / BIOS
/ Lenovo issue.
Of course, I have no clue which of these is the culprit here.
Our System folks seemed to say that this new device uses L1 substates,
which can be enabled on the Haswell platform that the user owns.
Now - L1 substates is a new feature and might introduce issues
(apparently) - and this is why they (System folks) wanted to try
without L1 substates. But disabling L1 substates doesn't seem trivial
with Lenovo's production BIOS. So I am pretty stuck here.
Another possibility is to run a PCI analyser on the machine, but that
requires having the machine in the lab...

> If you think it's a PCI core problem, we have to figure out what the
> core needs to do differently. If somebody can point to a difference
> in the ASPM configuration between Windows and Linux, that would be a
> good start.
>
> Bjorn
>

2013-11-09 02:46:39

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/9 Bjorn Helgaas <[email protected]>:
> Thanks. But can you please attach the output of "lspci -vvxxx" (not
> "-vxxxx") for the entire system before the problem occurs?


Sorry I used the wrong command...

I've attached the output of -vvxxx below.

There are three files:

* lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
* lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
link" after I ran "ip link set wlan0 up".
* lspci.vvxxx.normal3.txt: When the interface is connected to the
Wi-Fi of my dormitory and got an address (but without default
gateway, I'm using wired network now).

--
wzyboy


Attachments:
lspci.vvxxx.normal.txt (33.44 kB)
lspci.vvxxx.normal2.txt (33.44 kB)
lspci.vvxxx.normal3.txt (33.44 kB)

2013-11-13 06:46:21

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/13 wzyboy <[email protected]>:
> After today's morning lessons I booted up my laptop with the pci=earlydump
> kernel parameter and here is the output of lspci (without setpci and
> before the bug hit) and dmesg.


Hi, I have a question: with the "setpci" workaround I can now use my
NIC without having to reboot my laptop from time to time. However,
that is under Linux 3.12 with Grumbach's patch. I'm wondering whether the
"setpci" workaround still works with the official Linux 3.12 kernel?

I'll try official Linux 3.12 later.

--
wzyboy

2013-11-06 06:35:07

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/11/6 Grumbach, Emmanuel <[email protected]>:
>>
>> Hi, I'm back.
>>
>> This terrible bug occurs again in Linux 3.12. This time I've been prepared for
>> it and wrote down the complete process of its appearance and my reaction, as
>> attached.
>
> Thanks - was that with power save disabled?

Yes, Linux 3.12 with "options iwlmvm power_scheme=1". (I did not touch
that .conf file after the kernel upgrade)

>
>>
>> IMHO they are quite similar with those errors in 3.11...
>
> Indeed. The only difference is that you don’t have PCI complain about not being able to disable L1.

I see. But I still cannot figure out what the "trigger" of this bug
is. Today (Nov 06, UTC+8) the bug has occurred twice so far (as of 14:30),
at 12:31 and 13:30. The first time I was about to do a system
upgrade, and the second time I was using rsync to upload photos from
my Android phone to my laptop.

Sometimes I was not even using the network (the traffic was near zero)
when the bug occurred. So this bug seems to occur regardless of network
traffic?

Could you think of a possible "trigger" of this bug so I could try to
avoid it (I hate rebooting) before the final fix is released? For
example, if there is something wrong with the "modules linked in" I
could blacklist that module...


Sincere regards.

--
wzyboy

2013-11-08 17:20:47

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Wed, Nov 6, 2013 at 9:49 PM, wzyboy <[email protected]> wrote:
> 2013/11/7 Bjorn Helgaas <[email protected]>:
>> Do you have any more details? Maybe open a bugzilla.kernel.org report
>> and attach:
>>
>> - complete dmesg log
>> - lspci -vvxxx output for entire system before issue occurs
>> - lspci -vvxxx output for entire system after issue occurs
>
>
> Hi, I have filed a bug on bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=64541
>
> Actually I have posted some more logs before, you could find them on
> the mailing lists archive:
> http://thread.gmane.org/gmane.linux.kernel.wireless.general/115259
>
> Here is the output of lspci -vxxxx just now. I'll run this command
> again when the bug occurs next time:

Thanks. But can you please attach the output of "lspci -vvxxx" (not
"-vxxxx") for the entire system before the problem occurs? All the
info is in the "-xxxx" output, but it's really painful to decode it
all by hand. Using "-vv" will decode the PCIe Capability structures
where the ASPM configuration is. And the entire system is
interesting, because ASPM requires configuration on upstream bridges
as well as on the device itself.

My only guess is that there's something wrong with the ASPM
configuration and the device just stops responding to config accesses
(and probably MMIO accesses, too, based on the errors in your dmesg
log). Or maybe the device got powered off somehow.

Bjorn
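
As an illustration of what "stops responding to config accesses" looks like in the dumps: reads from a device that no longer answers typically come back as all ones, so the vendor and device IDs read as 0xffff. A short Python sketch of that check, using the first config-space row from before and after the failure as quoted earlier in this thread; the helper names are made up for the example.

```python
def decode_ids(cfg):
    """Return (vendor ID, device ID) from the first four config-space bytes."""
    vendor = cfg[0] | (cfg[1] << 8)
    device = cfg[2] | (cfg[3] << 8)
    return vendor, device


def device_vanished(cfg):
    """True when the ID registers read back as all ones."""
    return decode_ids(cfg) == (0xFFFF, 0xFFFF)


before = bytes.fromhex("86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00")
after = bytes.fromhex("ff " * 16)

print([hex(x) for x in decode_ids(before)], device_vanished(before))
print([hex(x) for x in decode_ids(after)], device_vanished(after))
```

The healthy row decodes to vendor 0x8086 and device 0x08b2, while the failed one reads 0xffff for both. The check cannot tell a powered-off device from a link that is stuck; it only shows that nothing answered.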

2013-11-12 19:14:45

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Tue, Nov 12, 2013 at 11:25 AM, Grumbach, Emmanuel
<[email protected]> wrote:

> Right - I remember the discussion we had on that.
> On this device (7260 that has an issue with ASPM), we don't call pci_disable_link_state, because we know it is supposed to work...

If ASPM is supposed to work as far as the hardware is concerned, I
guess you're saying this must be an iwlwifi driver issue. Right?

If you think it's a PCI core problem, we have to figure out what the
core needs to do differently. If somebody can point to a difference
in the ASPM configuration between Windows and Linux, that would be a
good start.

Bjorn

2013-11-06 18:33:17

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Wed, Nov 6, 2013 at 10:50 AM, Emmanuel Grumbach <[email protected]> wrote:
> Hi,
>
> adding PCI folks.
> Here is the story:
>
> * Wzyboy has a Lenovo laptop with _OSC control *not* granted
> * L1 Active is enabled
> * kernel: 3.12.0
> * Nic is PCIe (Gen2 but not sure...)
>
> At some random point, the driver loses access to the NIC: all readl
> operation return 0xff.
> Even lspci returns 0xff:
>
> 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> here is the output of lspci *before* the issue hits:
>
> 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
> 00: 86 80 b2 08 06 04 10 00 6b 00 80 02 10 00 00 00
> 10: 04 00 40 f0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 62 c2
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 09 01 00 00
>
> do you have any idea what we can do to understand what is going wrong here?

Do you have any more details? Maybe open a bugzilla.kernel.org report
and attach:

- complete dmesg log
- lspci -vvxxx output for entire system before issue occurs
- lspci -vvxxx output for entire system after issue occurs

Bjorn

> On 11/06/2013 09:12 AM, wzyboy wrote:
>> 2013/11/6 Grumbach, Emmanuel <[email protected]>:
>>> Wait - you mean that after the bug occurred before you rebooted, lspci -xxx show all 00?
>>> I can see 0xff here.
>>> Anyway - this is very bad... checking with HW guys...
>>
>>
>> Sorry, that's my typo. They are all 0xff... (I don't know what they
>> mean but it looks bad...)
>>
>> Thanks for your effort! I'm waiting for good news from you and HW guys. :-)
>>

2013-11-08 04:41:45

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Hi,

it seems that I may have found one possible "trigger" of this bug: heavy downloading.

Today I was trying to download a big file with wget, and the bug occurred
six times before I could not bear it any more (every time I had to
reboot to recover my network!), so I bought a network cable downstairs
in the supermarket and used the wired network instead.

It seems that when downloading at full speed (20 Mbps fiber, ~2.3
MiB/s) for several minutes, the bug is very likely to occur.

Several days ago, when I tried to download "linux-3.12.tar.xz" from
kernel.org with wget, the bug also occurred twice during the whole
download process.

Could this be a hardware issue (flawed hardware?) or just a driver
issue? I bought this laptop 8 days ago and wiped the pre-installed
Windows the moment I got it, so I have no idea how this wireless card
performs under Windows.

--
wzyboy
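
As a sanity check on the figures above, a tiny Python sketch of the unit conversion, assuming "20 Mbps" means 20 * 10^6 bits per second. The function name is illustrative.

```python
def mbps_to_mib_per_s(mbps):
    """Convert a line rate in megabits per second to MiB per second."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second / 2**20


print(round(mbps_to_mib_per_s(20), 2))
```

The theoretical ceiling comes out just under 2.4 MiB/s, so an observed ~2.3 MiB/s means the link was essentially saturated when the failures happened.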

2013-11-11 09:43:56

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

>>
>> 2013/11/10 Grumbach, Emmanuel <[email protected]>:
>> > It is really called L1 PM Substate.
>> > I don't really know the BIOS of ThinkPad... But we can try...
>>
>>
>> I have booted into BIOS and wrote down almost all the options, as attached.
>>
>> By the way, I always attach my laptop to an AC adaptor.
>>
>
> Cool.... they mask all the interesting options... oh well...

Can you please try this?

diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index ebe351d..f8fbe08 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -131,7 +131,7 @@ static void iwl_pcie_apm_config(struct iwl_trans *trans)
 	 * power savings, even without L1.
 	 */
 	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_LNKCTL, &lctl);
-	if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
+	if (0) {
 		/* L1-ASPM enabled; disable(!) L0S */
 		iwl_set_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
 		dev_info(trans->dev, "L1 Enabled; Disabling L0S\n");

2013-11-06 06:37:22

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> >> Hi, I'm back.
> >>
> >> This terrible bug occurs again in Linux 3.12. This time I've been
> >> prepared for it and wrote down the complete process of its appearance
> >> and my reaction, as attached.
> >
> > Thanks - was that with power save disabled?
>
> Yes, Linux 3.12 with "options iwlmvm power_scheme=1". (I did not touch
> that .conf file after the kernel upgrade)
>
> >
> >>
> >> IMHO they are quite similar with those errors in 3.11...
> >
> > Indeed. The only difference is that you don’t have PCI complain about not
> > being able to disable L1.
>
> I see. But I still cannot figure out what is the "trigger" of this bug. Today (Nov
> 06 UTC+8) this bug occurs twice till now (14:30), they were at 12:31 and 13:30.
> At the first time I was about to do a system upgrade and at the second time I
> was using rsync to upload photos from my Android phone to my laptop.
>
> Sometimes I was not even using network (the traffic was near zero) when
> the bug occurs. So this bug seems to occur no matter of network traffic
> states?
>
> Could you think of a possible "trigger" of this bug so I could try to avoid it (I
> hate rebooting) before the final fix is released? For example, if there is
> something wrong with the "modules linked in" I could blacklist that module...
>

I don't know - I am trying to check with our HW guys here.
Can you please run lspci -xxx before and after it happens?
BTW - how do you recover? Is reloading the module enough?

2013-11-10 07:03:18

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
>
> Sorry I used the wrong command...
>
> I've attached the output of -vvxxx below.
>
> There are three files:
>
> * lspci.vvxxx.normal.txt: When the interface is "state DOWN" in "ip link".
> * lspci.vvxxx.normal2.txt: When the interface is "state UP" in "ip
> link" after I ran "ip link set wlan0 up".
> * lspci.vvxxx.normal3.txt: When the interface is connected to the
> Wi-Fi of my dormitory and got an address (but without default
> gateway, I'm using wired network now).
>

One more thing:
Are you using KVM with pass-through? Or is it a native installation?

2013-12-28 09:57:55

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

(Sorry for sending this mail multiple times...)

Hi,

yesterday a friend of mine told me that one can now install Windows
8/8.1 on a USB device natively. So I tried this so-called "Windows To
Go" technology.

Now I have successfully deployed a Windows 8.1 installation on my
external USB HDD and booted my laptop up with it. That is to say, I
can now observe how Intel 7260 acts under Windows.

Could you tell me what I should do to gather debugging information,
such as the L1 mode, in Windows? I will then send it to you. Maybe
this could help to figure out what the bug in iwlwifi is.

--
wzyboy

2013-12-29 11:45:21

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
>
> Here are the output of "PCI" and "PCI Index" of Intel Wireless.
>
>

looks like all the power features are enabled... including the ones I
told you to disable.
I am lost now...

2013-12-29 13:06:41

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/12/29 Emmanuel Grumbach <[email protected]>:
> looks like all the power features are enabled... including the ones I
> told you to disable.
> I am lost now...


Oh... That sounds bad. But I thought both the Windows and the Linux
drivers for this NIC were written by Intel?

--
wzyboy

2013-12-29 09:23:37

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/12/29 Grumbach, Emmanuel <[email protected]>:
> http://rweverything.com/


Here are the output of "PCI" and "PCI Index" of Intel Wireless.


--
wzyboy


Attachments:
P030000.rw (33.38 kB)
PI000000.rw (3.16 kB)
Download all attachments

2013-12-25 08:28:00

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Fri, Nov 15, 2013 at 11:04 AM, wzyboy <[email protected]> wrote:
> Thanks a lot for the explanation, Emmanuel!
>
> Now I finally know why this is a "catch-22" situation: Disabling those
> features from the OS/driver cannot be as neat as disabling them directly
> in the BIOS. And there is a chance that disabling them at a bad time
> may cause G3...
>
> --

Back to you.
Can you please drop the setpci trick and apply this instead:


diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c
index 079a511..e8a52f3 100644
--- a/drivers/net/wireless/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
@@ -707,6 +707,8 @@ void iwl_pcie_tx_start(struct iwl_trans *trans, u32 scd_base_addr)
 	iwl_write_direct32(trans, FH_TX_CHICKEN_BITS_REG,
 			   reg_val | FH_TX_CHICKEN_BITS_SCD_AUTO_RETRY_EN);
 
+	iwl_set_bits_prph(trans, 0xa04068, 0x8);
+
 	/* Enable L1-Active */
 	iwl_clear_bits_prph(trans, APMG_PCIDEV_STT_REG,
 			    APMG_PCIDEV_STT_VAL_L1_ACT_DIS);


thanks

2013-12-25 10:44:27

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2013/12/25 Emmanuel Grumbach <[email protected]>:
> Back to you.
> Can you please try not to do the setpci and add this:


Glad to see you again :-)

I've compiled Linux 3.12.5 with your patch, removed the "setpci" trick
and rebooted.

While the new kernel was booting, I saw additional (error) messages
among the "systemd-fsck" lines, but I was not fast enough to photograph
them before the tty login prompt flushed them away.

After logging in, I found that netcfg had not connected to the
dormitory's Wi-Fi as it did before. I ran "lspci -vvxxx" and found that
the interface's config space is filled with "ff". I've attached the
output of "lspci -vvxxx" and "dmesg".




(And here is something "fun": I reverted my kernel to Arch's official
3.12.5 and rebooted, and the interface is totally missing! I mean, it
disappeared from the output of "ip link". I cannot even see it in
"lspci -vvxxx", not even "ff". The strange effect vanished after one
more reboot and a cold boot.)
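
The "filled with ff" symptom can be checked mechanically. The following is a minimal sketch, assuming the raw config-space bytes have already been read into a buffer; config_space_all_ff() is a hypothetical helper for illustration, not something from iwlwifi or pciutils.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of iwlwifi or pciutils): returns true when
 * every byte of a config-space dump is 0xff, which is what the lspci dumps
 * in this thread show once the NIC has stopped responding.
 */
static bool config_space_all_ff(const uint8_t *cfg, size_t len)
{
    if (len == 0)
        return false;           /* an empty dump proves nothing */

    for (size_t i = 0; i < len; i++) {
        if (cfg[i] != 0xff)
            return false;       /* at least one real register value */
    }
    return true;
}
```

On Linux the bytes could presumably be taken from the device's sysfs "config" file (for example under /sys/bus/pci/devices/0000:03:00.0/, using the address from the lspci output earlier in the thread); that path is an assumption about the reporter's system, not something verified here.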

--
wzyboy


Attachments:
patch-20131225.dmesg (53.29 kB)
patch-20131225.lspci.vvxxx (11.16 kB)
Download all attachments

2013-12-28 09:55:11

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Hi,

yesterday a friend of mine told me that one can now install Windows
8/8.1 on a USB device natively. So I tried this so-called "Windows To
Go" technology.

Now I have successfully deployed a Windows 8.1 installation on my
external USB HDD and booted my laptop up with it. That is to say, I
can now observe how Intel 7260 acts under Windows.

--
wzyboy

Could you tell me what I should do to gather debugging information,
such as the L1 mode, in Windows? I will then send it to you. Maybe
this could help to figure out what the bug in iwlwifi is.

2013-12-29 08:15:05

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> Hi,
>
> yesterday a friend of mine told me that one can now install Windows
> 8/8.1 on a USB device natively. So I tried this so-called "Windows To Go"
> technology.
>
> Now I have successfully deployed a Windows 8.1 installation on my external
> USB HDD and booted my laptop up with it. That is to say, I can now observe
> how Intel 7260 acts under Windows.
>
> Could you tell me what should I do to gather debugging information such as
> L1 mode etc in Windows? I will send them to you then. Maybe this could
> help, to figure out what the bug of iwlwifi is.

http://rweverything.com/

2013-12-25 10:38:24

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
>
> Glad to see you again :-)
>
> I've compiled Linux 3.12.5 with your patch, removed "setpci" trick and
> rebooted.
>
> During the boot of new kernel, I can see additional (error) messages among
> "systemd-fsck" lines but I was not fast enough to take photos for them
> before they disappeared (flushed away) by tty login interface.
>
> After logging in, I find that netcfg did not connect to dormitory's Wi-Fi as
> before. I run "lspci -vvxxx" and find that the interface is filled with "ff". I've
> attached the output of "lspci -vvxxx" and "dmesg".
>

So it didn't work - ok. I am not surprised, but I still wanted to know.
This patch is supposed to fix some timing issue in the wake up from L1.
Thanks for testing.

>
> (And here is something "fun": I reverted my kernel to Arch's official
> 3.12.5 and rebooted, and the interface is totally missing! I mean, it
> disappeared from the output of "ip link". I cannot even see it in
> "lspci -vvxxx", not even "ff". The strange effect vanished after one more
> reboot and a cold boot.)

Great - that can happen sometimes when something goes wrong...

2014-01-14 03:56:28

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/14 Bjorn Helgaas <[email protected]>:
> It doesn't seem strange to me; maybe we're interpreting wzyboy's data
> differently. The way I read it, if /etc/modprobe.d/iwlwifi.conf
> contains "options iwlmvm power_scheme=1", everything works fine (cases
> 0, 4, 8). If iwlwifi.conf does not exist or contains only a
> commented-out line, he sees problems (cases 2, 6).
>

Yes, Bjorn got my point :-)

With "options iwlmvm power_scheme=1" -> network is good
Without "options iwlmvm power_scheme=1" -> network is bad

Currently I am using Linux 3.12.7 with Emmanuel's patch, and without
any setpci tricks.

Emmanuel says a new firmware is available in 3.13 so I will follow up
when 3.13 is released in my distro's repo.



--
Sascha Weaver

2014-01-02 21:34:37

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

On Sun, Dec 29, 2013 at 2:23 AM, wzyboy <[email protected]> wrote:
> 2013/12/29 Grumbach, Emmanuel <[email protected]>:
>> http://rweverything.com/
>
>
> Here are the output of "PCI" and "PCI Index" of Intel Wireless.

ASPM must be configured on both ends of the link, so for completeness,
can you also collect the "PCI" output for the bridge leading to the
7260 device? Based on the Linux lspci output, this should be
0000:00:1c.1.

And I assume the device works well with the Windows driver?

Bjorn
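
To illustrate the "both ends of the link" point, here is a small sketch that decodes the ASPM bits of two Link Control register values, one for the bridge and one for the 7260. The bit definitions follow the kernel's pci_regs.h; the two helper functions are hypothetical names introduced for this example, and the "both ends must enable L1" rule is the simplified reading of the remark above, not a full model of ASPM negotiation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL_ASPM_L0S 0x0001  /* L0s entry enabled */
#define PCI_EXP_LNKCTL_ASPM_L1  0x0002  /* L1 entry enabled */

/* Hypothetical helper: is the ASPM L1 bit set in one Link Control word? */
static bool lnkctl_l1_enabled(uint16_t lnkctl)
{
    return (lnkctl & PCI_EXP_LNKCTL_ASPM_L1) != 0;
}

/*
 * Hypothetical helper: treat L1 as usable on the link only when both the
 * bridge (upstream end) and the device (downstream end) enable it.
 */
static bool link_l1_enabled(uint16_t bridge_lnkctl, uint16_t device_lnkctl)
{
    return lnkctl_l1_enabled(bridge_lnkctl) &&
           lnkctl_l1_enabled(device_lnkctl);
}
```

The two input words could be read with something like "setpci -s 00:1c.1 CAP_EXP+10.w" for the bridge and the same command with "-s 03:00.0" for the NIC; those addresses come from the lspci output in this thread and would differ on other machines.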

2014-01-13 06:16:10

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/13 Grumbach, Emmanuel <[email protected]>:
> Small update from the bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=64541).
> The bug is solved... There seems to be a hardware bug with the L1 OFF exit timer. To solve this bug we need *not* to rely on the internal clock and need to keep using the external clock. The internal clock isn't reliable enough and can lead to loss of synchronization between the bridge and the device upon L1 OFF transition.
> This issue has been seen in simulation and not on real hardware... until now... The Windows driver has a workaround for this hardware bug, which is why the issue wasn't seen on Windows. I am porting the workaround to the Linux driver.
>
> Thank you wzyboy for your patience...
>
> The end.


So this is a hardware bug rather than a driver bug? Oh...

I'm glad this is finally solved; it has been open since 2013-11-03.
Thanks to everyone who provided me with workarounds, without which I
could only have used the wired network. :-)

Cheers.

--
wzyboy

2014-01-04 14:41:42

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/3 Bjorn Helgaas <[email protected]>:
> ASPM must be configured on both ends of the link, so for completeness,
> can you also collect the "PCI" output for the bridge leading to the
> 7260 device? Based on the Linux lspci output, this should be
> 0000:00:1c.1.
>
> And I assume the device works well with the Windows driver?


Here are the "PCI" and "PCI Index" data for 0000:00:1c.1.

And yes the NIC works nice in Windows.

--
wzyboy


Attachments:
P001C01.rw (33.47 kB)
PI1C0100.rw (3.16 kB)
Download all attachments

2014-01-13 08:59:42

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> Hi, the point is:
>
> * With "options iwlmvm power_scheme=1" -> everything's fine
> * Without "options iwlmvm power_scheme=1" -> network is bad
>
> Absense of the modconf file and a modconf with only one comment line
> have the *same* effect - network is bad.
>
> cat /sys/module/iwlmvm/parameters/power_scheme now returns 1, and
> the network is good.
>
> > Also - what code base are you using?
>
> What is "code base"...?

What kernel version :)
If you use 3.13, you can try a newer firmware.

>
> English is not my native language so I checked if I misuse the phrase
> "comment out": http://english.stackexchange.com/questions/33483/when-i-say-comment-out-does-it-mean-to-uncomment-something-or-comment-it
>
> Seems not...

English isn't my mother tongue either:)

>
> --
> wzyboy

2014-01-13 17:29:51

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

[-cc linux-pci]

On Mon, Jan 13, 2014 at 1:26 AM, Grumbach, Emmanuel
<[email protected]> wrote:
>> Hi,
>>
>> I am sorry but here is bad news.
>>
>> During previous debugging process, I have a modconf file
>> /etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
>> power_scheme=1". I removed it just now (Emmanuel says I can remove it
>> now) and encountered (maybe) new bugs. Here is what I did just now:
>>
>> 0. Current kernel: patched with
>> https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
>> setpci trick: none ; NIC status: works nice after ~16 hours heavy
>> usage.
>> 1. Delete that modconf file, reboot.
>> 2. Network connection becomes painfully laggy and lossy.
>> 3. Re-create that modconf file, reboot.
>> 4. Network connection works fine.
>> 5. Comment out that line, reboot.
>> 6. Network connection becomes painfully laggy and lossy.
>> 7. Uncomment that line, reboot.
>> 8. Network connection works fine.
>>
>> What I mean "painfully laggy and lossy" is that, to whomever I "ping"
>> (Google, 8.8.8.8, local DNS server...), the RTT is rather high than
>> normal, and packet loss rate is above 90% (some addresses 100% loss).
>> While at the same time, other network device in the same LAN works
>> fine.
>>
>> I've attached dmesg and lspci output at step 6 and 8.
>>
>
> Are you sure about step 1 and 5?
> It seems completely weird that an existing file with a line commented out have any impact.

It doesn't seem strange to me; maybe we're interpreting wzyboy's data
differently. The way I read it, if /etc/modprobe.d/iwlwifi.conf
contains "options iwlmvm power_scheme=1", everything works fine (cases
0, 4, 8). If iwlwifi.conf does not exist or contains only a
commented-out line, he sees problems (cases 2, 6).

> Can you please send the output of:
> cat /sys/module/iwlmvm/parameters/power_scheme
> in both cases.
>
> Also - what code base are you using?
> Since this is surely not related to PCI, please remove them in your reply.
> (I keep them here to have them see my mail :))

Agreed, this doesn't sound PCI-related, so I removed linux-pci. Feel
free to keep me or add me back if you do see anything PCI-related.

Bjorn

2014-01-13 07:56:43

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

Hi,

I am sorry but here is bad news.

During the previous debugging I had a modconf file,
/etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
power_scheme=1". I removed it just now (Emmanuel said I could remove
it now) and encountered (maybe) new bugs. Here is what I did:

0. Current kernel: patched with
https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
setpci trick: none ; NIC status: works nice after ~16 hours heavy
usage.
1. Delete that modconf file, reboot.
2. Network connection becomes painfully laggy and lossy.
3. Re-create that modconf file, reboot.
4. Network connection works fine.
5. Comment out that line, reboot.
6. Network connection becomes painfully laggy and lossy.
7. Uncomment that line, reboot.
8. Network connection works fine.

By "painfully laggy and lossy" I mean that, whatever host I "ping"
(Google, 8.8.8.8, the local DNS server...), the RTT is much higher than
normal and the packet loss rate is above 90% (100% for some addresses).
At the same time, other network devices on the same LAN work fine.

I've attached dmesg and lspci output at step 6 and 8.


--
wzyboy


Attachments:
dmesg-with-modconf.txt (49.80 kB)
dmesg-without-modconf.txt (50.37 kB)
lspci-vvxxxx-with-modconf.txt (107.97 kB)
lspci-vvxxxx-without-modconf.txt (107.97 kB)
Download all attachments

2014-01-13 09:06:09

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/13 Grumbach, Emmanuel <[email protected]>:
> What kernel version :)
> If you use 3.13, you can try a newer firmware.
>

I applied your patch against Arch Linux's stock kernel 3.12.7, using
this PKGBUILD: https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/linux&id=5e2f3b10dc4be40da8f1fe355bc871d7936ec2d8

wzyboy@xenien:~$ uname -a
Linux xenien.wzyboy.im 3.12.7-1-ARCH #1 SMP PREEMPT Sun Jan 12
20:38:55 CST 2014 x86_64 GNU/Linux
wzyboy@xenien:~$ cat /proc/version
Linux version 3.12.7-1-ARCH ([email protected]) (gcc version
4.8.2 20131219 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Jan 12 20:38:55
CST 2014



--
wzyboy

2014-01-13 10:52:10

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

>
> 2014/1/13 Grumbach, Emmanuel <[email protected]>:
> > What kernel version :)
> > If you use 3.13, you can try a newer firmware.
> >
>
> I applied your patch against Arch Linux's stock kernel 3.12.7, using this
> PKGBUILD:
> https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/linux&id=5e2f3b10dc4be40da8f1fe355bc871d7936ec2d8
>
> wzyboy@xenien:~$ uname -a
> Linux xenien.wzyboy.im 3.12.7-1-ARCH #1 SMP PREEMPT Sun Jan 12
> 20:38:55 CST 2014 x86_64 GNU/Linux
> wzyboy@xenien:~$ cat /proc/version
> Linux version 3.12.7-1-ARCH ([email protected]) (gcc version
> 4.8.2 20131219 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Jan 12 20:38:55 CST
> 2014
>
>

Are you using Bluetooth at the same time?
Also - it might be worth to try wireless-testing and the latest firmware.
While you seem to be the first to report issues about power save on this firmware, we know that a lot of issues have been fixed in the latest firmware (-8.ucode) which is supported in 3.13 only. But if you change kernel, go for wireless-testing.
Thanks.

>
> --
> wzyboy

2014-01-13 10:56:30

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/13 Grumbach, Emmanuel <[email protected]>:
> Are you using Bluetooth at the same time?

No, Bluetooth has been disabled in the BIOS since Nov 2013. I have never used it.

> Also - it might be worth to try wireless-testing and the latest firmware.
> While you seem to be the first to report issues about power save on this firmware, we know that a lot of issues have been fixed in the latest firmware (-8.ucode) which is supported in 3.13 only. But if you change kernel, go for wireless-testing.
> Thanks.

Okay, I'll put this down and follow up when Linux 3.13 goes stable.

Thanks.

--
wzyboy

2014-01-13 08:26:15

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> Hi,
>
> I am sorry but here is bad news.
>
> During previous debugging process, I have a modconf file
> /etc/modprobe.d/iwlwifi.conf, containing "options iwlmvm
> power_scheme=1". I removed it just now (Emmanuel says I can remove it
> now) and encountered (maybe) new bugs. Here is what I did just now:
>
> 0. Current kernel: patched with
> https://bugzilla.kernel.org/attachment.cgi?id=121671&action=diff ;
> setpci trick: none ; NIC status: works nice after ~16 hours heavy
> usage.
> 1. Delete that modconf file, reboot.
> 2. Network connection becomes painfully laggy and lossy.
> 3. Re-create that modconf file, reboot.
> 4. Network connection works fine.
> 5. Comment out that line, reboot.
> 6. Network connection becomes painfully laggy and lossy.
> 7. Uncomment that line, reboot.
> 8. Network connection works fine.
>
> What I mean "painfully laggy and lossy" is that, to whomever I "ping"
> (Google, 8.8.8.8, local DNS server...), the RTT is rather high than
> normal, and packet loss rate is above 90% (some addresses 100% loss).
> While at the same time, other network device in the same LAN works
> fine.
>
> I've attached dmesg and lspci output at step 6 and 8.
>

Are you sure about step 1 and 5?
It seems completely weird that an existing file with a line commented out have any impact.
Can you please send the output of:
	cat /sys/module/iwlmvm/parameters/power_scheme
in both cases.

Also - what code base are you using?
Since this is surely not related to PCI, please remove them in your reply.
(I keep them here to have them see my mail :))

2014-01-13 06:02:06

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: [Ilw] Intel Wireless 7260 hardware timed out randomly

> > ASPM must be configured on both ends of the link, so for completeness,
> > can you also collect the "PCI" output for the bridge leading to the
> > 7260 device?  Based on the Linux lspci output, this should be
> > 0000:00:1c.1.
> >
> > And I assume the device works well with the Windows driver?
>
>
> Here are the "PCI" and "PCI Index" data for 0000:00:1c:1.
>
> And yes the NIC works nice in Windows.
>

Small update from the bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=64541).
The bug is solved... There seems to be a hardware bug with the L1 OFF exit timer. To solve this bug we need *not* to rely on the internal clock and need to keep using the external clock. The internal clock isn't reliable enough and can lead to loss of synchronization between the bridge and the device upon L1 OFF transition.
This issue has been seen in simulation and not on real hardware... until now... The Windows driver has a workaround for this hardware bug, which is why the issue wasn't seen on Windows. I am porting the workaround to the Linux driver.

Thank you wzyboy for your patience...

The end.

2014-01-13 08:51:13

by Sascha Weaver

[permalink] [raw]
Subject: Re: [Ilw] Intel Wireless 7260 hardware timed out randomly

2014/1/13 Grumbach, Emmanuel <[email protected]>:
> Are you sure about step 1 and 5?
> It seems completely weird that an existing file with a line commented out have any impact.
> Can you please send the output of:
> cat /sys/module/iwlmvm/parameters/power_scheme
> in both cases.
>

Hi, the point is:

* With "options iwlmvm power_scheme=1" -> everything's fine
* Without "options iwlmvm power_scheme=1" -> network is bad

Absence of the modconf file and a modconf file containing only a
commented-out line have the *same* effect - network is bad.

cat /sys/module/iwlmvm/parameters/power_scheme now returns 1, and the
network is good.
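
The equivalence between "no file" and "file with only a commented-out line" can be made explicit with a small sketch. It assumes the usual modprobe.d convention that a line starting with '#' is a comment; line_sets_power_scheme() is a hypothetical helper for illustration and only recognises the exact one-option form used in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: does one line of a modprobe.d file actively set
 * iwlmvm's power_scheme? Blank lines and comment lines (leading '#')
 * never count. On success the parsed value is stored in *scheme.
 */
static bool line_sets_power_scheme(const char *line, int *scheme)
{
    /* Skip leading blanks. */
    while (*line == ' ' || *line == '\t')
        line++;

    if (*line == '#' || *line == '\0')
        return false;   /* commented out or empty: module default applies */

    /* Only the single-option form "options iwlmvm power_scheme=N" is handled. */
    return sscanf(line, "options iwlmvm power_scheme=%d", scheme) == 1;
}
```

With this reading, steps 1 and 5 of the test sequence both leave no active setting, so the module presumably falls back to its default scheme, while steps 3 and 7 restore power_scheme=1. The value actually in effect is the one reported by /sys/module/iwlmvm/parameters/power_scheme.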

> Also - what code base are you using?

What is "code base"...?

> Since this is surely not related to PCI, please remove them in your reply.
> (I keep them here to have them see my mail :))

Done. :-)



English is not my native language so I checked if I misuse the phrase
"comment out": http://english.stackexchange.com/questions/33483/when-i-say-comment-out-does-it-mean-to-uncomment-something-or-comment-it

Seems not...

--
wzyboy