2014-04-12 15:11:39

by Alexander Monakov

Subject: iwlwifi null pointer dereference

Hello,

On a laptop with a Wireless-N 135 card and a 3.14.0 kernel I'm getting a
kernel oops on cold boot and wireless seems unusable, but surprisingly
everything works fine after a reboot:

[ 4.155609] Intel(R) Wireless WiFi driver for Linux, in-tree:
[ 4.155611] Copyright(c) 2003- 2014 Intel Corporation
[ 4.155745] iwlwifi 0000:05:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 4.155810] iwlwifi 0000:05:00.0: pci_enable_msi failed(0Xffffffda)
[ 4.173755] iwlwifi 0000:05:00.0: RF_KILL bit toggled to enable radio.
[ 4.173763] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 4.175243] IP: [<ffffffffa00057a0>] iwl_pcie_irq_handler+0xa10/0xb50 [iwlwifi]
[ 4.176641] PGD 0
[ 4.176874] iwlwifi 0000:05:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
[ 4.179239] Oops: 0000 [#1] SMP
[ 4.180623] Modules linked in: iwlwifi

I've upgraded to 3.14.0 from 3.10.5, where this problem was not
present. I can bisect as needed. The complete kernel log is
attached.

Any suggestions?

Thanks.
Alexander


Attachments:
iwloops.txt (61.06 kB)
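
The value in "pci_enable_msi failed(0Xffffffda)" above is a negative error code printed as unsigned hex. A minimal sketch of decoding it, assuming a 32-bit int as on the reporter's platform; decode_kernel_err is a hypothetical helper written for this note and is not part of iwlwifi or the kernel:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of iwlwifi or the kernel): recover the
 * signed error code from a value that was printed with %x as a 32-bit
 * unsigned number.
 */
static int32_t decode_kernel_err(uint32_t raw)
{
	/* Values up to INT32_MAX are already non-negative codes. */
	if (raw <= (uint32_t)INT32_MAX)
		return (int32_t)raw;
	/* Undo the two's-complement wrap using unsigned arithmetic only. */
	return -(int32_t)(UINT32_MAX - raw) - 1;
}
```

decode_kernel_err(0xffffffdaU) gives -38, and 38 is ENOSYS ("Function not implemented") in the Linux errno numbering.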

2014-04-20 06:06:17

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference

Hi,


On 04/18/2014 11:33 PM, Alexander Monakov wrote:
> On Fri, Apr 18, 2014 at 6:50 PM, Emmanuel Grumbach <[email protected]> wrote:
>>>> I have to say that this is really strange - I must be missing something. It looks like the ISR is called even before it is requested...
>>>> So I definitely want to understand what is happening here before applying the fix.
>>>
>>> Sure, I can help with further investigation. Would it help if I bisected?
>>>
>>
>> I can't decline such an offer:)
>>
>> Note that the "bad" commit might be in pci too.
>> As a first try, I'd try 3.13.
>> I've made quite a few changes in the ISR for 3.14.
>
> It bisects to
>
> commit 2dbc368d7fded35ed221a3751405b15e06eb8925
> Author: Emmanuel Grumbach <[email protected]>
> Date: Mon Dec 9 11:09:47 2013 +0200
>
> iwlwifi: pcie: track interrupt mask in SW
>

Yeah - ok.
Thank you for this information.
I'll check again later.

2014-04-13 07:23:35

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference

On Sat, Apr 12, 2014 at 6:11 PM, Alexander Monakov <[email protected]> wrote:
> Hello,
>
> On a laptop with a Wireless-N 135 card and a 3.14.0 kernel I'm getting a
> kernel oops on cold boot and wireless seems unusable, but surprisingly
> everything works fine after a reboot:
>
> [ 4.155609] Intel(R) Wireless WiFi driver for Linux, in-tree:
> [ 4.155611] Copyright(c) 2003- 2014 Intel Corporation
> [ 4.155745] iwlwifi 0000:05:00.0: can't disable ASPM; OS doesn't have ASPM control
> [ 4.155810] iwlwifi 0000:05:00.0: pci_enable_msi failed(0Xffffffda)
> [ 4.173755] iwlwifi 0000:05:00.0: RF_KILL bit toggled to enable radio.
> [ 4.173763] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 4.175243] IP: [<ffffffffa00057a0>] iwl_pcie_irq_handler+0xa10/0xb50 [iwlwifi]
> [ 4.176641] PGD 0
> [ 4.176874] iwlwifi 0000:05:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm
> [ 4.179239] Oops: 0000 [#1] SMP
> [ 4.180623] Modules linked in: iwlwifi
>

Wow - interesting - you are getting an interrupt before we even enable them.

Does this help you?

diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index 37f7acc..bebab4e 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -1803,6 +1803,10 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
 	 * PCI Tx retries from interfering with C3 CPU state */
 	pci_write_config_byte(pdev, PCI_CFG_RETRY_TIMEOUT, 0x00);

+	trans->dev = &pdev->dev;
+	trans_pcie->pci_dev = pdev;
+	iwl_disable_interrupts(trans);
+
 	err = pci_enable_msi(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "pci_enable_msi failed(0X%x)\n", err);
@@ -1814,8 +1818,6 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
 		}
 	}

-	trans->dev = &pdev->dev;
-	trans_pcie->pci_dev = pdev;
 	trans->hw_rev = iwl_read32(trans, CSR_HW_REV);
 	trans->hw_id = (pdev->device << 16) + pdev->subsystem_device;
 	snprintf(trans->hw_id_str, sizeof(trans->hw_id_str),
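
The idea behind the patch above (mask the device's interrupts before anything in probe can let one through) can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, the device is assumed to come up with its sources unmasked and one event already latched, and nothing here models the real iwlwifi code or the root cause, which this message does not establish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device (hypothetical): an interrupt mask register and one latched event. */
struct toy_dev {
	uint32_t int_mask;	/* 0 means every interrupt source is masked */
	bool pending;		/* an event latched before the driver loaded */
};

/* Toy driver state; rx_state stands in for data set up late in probe. */
struct toy_drv {
	struct toy_dev *dev;
	int *rx_state;		/* NULL until probe has finished its setup */
	int handled;		/* interrupts serviced with valid state */
	int null_hits;		/* interrupts that arrived before setup */
};

/* Handler: a real kernel handler would oops where this one only counts. */
static void toy_isr(struct toy_drv *drv)
{
	if (drv->rx_state == NULL) {
		drv->null_hits++;
		return;
	}
	(*drv->rx_state)++;
	drv->handled++;
}

/* Deliver the latched event only if the device has unmasked sources. */
static void toy_deliver(struct toy_drv *drv)
{
	if (drv->dev->pending && drv->dev->int_mask != 0) {
		drv->dev->pending = false;
		toy_isr(drv);
	}
}

/* Probe: optionally mask first, then finish setup, then unmask. */
static int toy_probe(struct toy_drv *drv, int *state, bool mask_first)
{
	if (mask_first)
		drv->dev->int_mask = 0;
	toy_deliver(drv);		/* window in which an early interrupt may fire */
	drv->rx_state = state;
	drv->dev->int_mask = UINT32_MAX;
	toy_deliver(drv);		/* event is serviced once state is valid */
	return drv->null_hits;
}
```

With mask_first set to false the latched event reaches toy_isr while rx_state is still NULL; with mask_first set to true it is held back until setup has completed.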

2014-04-18 14:50:48

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference



On 04/18/2014 05:34 PM, Alexander Monakov wrote:
> On Fri, Apr 18, 2014 at 5:47 PM, Emmanuel Grumbach <[email protected]> wrote:
>>
>>
>> On 04/18/2014 08:08 AM, Emmanuel Grumbach wrote:
>>> Hi,
>>>
>>> On 04/17/2014 11:56 PM, Alexander Monakov wrote:
>>>> Emmanuel,
>>>>
>>>> I'm curious about the plans w.r.t this fix; is it going to
>>>> iwlwifi-next.git? Thanks again.
>>>>
>>>
>>> It will go to iwlwifi-fixes.
>>>
>>
>> I have to say that this is really strange - I must be missing something. It looks like the ISR is called even before it is requested...
>> So I definitely want to understand what is happening here before applying the fix.
>
> Sure, I can help with further investigation. Would it help if I bisected?
>

I can't decline such an offer:)

Note that the "bad" commit might be in pci too.
As a first try, I'd try 3.13.
I've made quite a few changes in the ISR for 3.14.

2014-04-13 08:49:04

by Alexander Monakov

Subject: Re: iwlwifi null pointer dereference

On Sun, Apr 13, 2014 at 11:23 AM, Emmanuel Grumbach <[email protected]> wrote:
> On Sat, Apr 12, 2014 at 6:11 PM, Alexander Monakov <[email protected]> wrote:
>> Hello,
>>
>> On a laptop with a Wireless-N 135 card and a 3.14.0 kernel I'm getting a
>> kernel oops on cold boot and wireless seems unusable, but surprisingly
>> everything works fine after a reboot:
>
> Wow - interesting - you are getting an interrupt before we even enable them.
>
> Does this help you?

Yes, it did help. Thanks.

(regarding interrupts, after installing an mSATA SSD that goes into
the mini-PCIe slot next to the wireless card, I've noticed that
"events" (mostly interrupts I guess) shown by PowerTop for the
wireless card and the SATA controller seem correlated, even though the
disk should be idle; I wonder what that means)

Alexander

2014-04-18 14:34:57

by Alexander Monakov

Subject: Re: iwlwifi null pointer dereference

On Fri, Apr 18, 2014 at 5:47 PM, Emmanuel Grumbach <[email protected]> wrote:
>
>
> On 04/18/2014 08:08 AM, Emmanuel Grumbach wrote:
>> Hi,
>>
>> On 04/17/2014 11:56 PM, Alexander Monakov wrote:
>>> Emmanuel,
>>>
>>> I'm curious about the plans w.r.t this fix; is it going to
>>> iwlwifi-next.git? Thanks again.
>>>
>>
>> It will go to iwlwifi-fixes.
>>
>
> I have to say that this is really strange - I must be missing something. It looks like the ISR is called even before it is requested...
> So I definitely want to understand what is happening here before applying the fix.

Sure, I can help with further investigation. Would it help if I bisected?

Alexander

2014-04-18 20:33:14

by Alexander Monakov

Subject: Re: iwlwifi null pointer dereference

On Fri, Apr 18, 2014 at 6:50 PM, Emmanuel Grumbach <[email protected]> wrote:
>>> I have to say that this is really strange - I must be missing something. It looks like the ISR is called even before it is requested...
>>> So I definitely want to understand what is happening here before applying the fix.
>>
>> Sure, I can help with further investigation. Would it help if I bisected?
>>
>
> I can't decline such an offer:)
>
> Note that the "bad" commit might be in pci too.
> As a first try, I'd try 3.13.
> I've made quite a few changes in the ISR for 3.14.

It bisects to

commit 2dbc368d7fded35ed221a3751405b15e06eb8925
Author: Emmanuel Grumbach <[email protected]>
Date: Mon Dec 9 11:09:47 2013 +0200

iwlwifi: pcie: track interrupt mask in SW

2014-04-18 05:08:20

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference

Hi,

On 04/17/2014 11:56 PM, Alexander Monakov wrote:
> Emmanuel,
>
> I'm curious about the plans w.r.t this fix; is it going to
> iwlwifi-next.git? Thanks again.
>

It will go to iwlwifi-fixes.

2014-04-17 20:56:13

by Alexander Monakov

Subject: Re: iwlwifi null pointer dereference

Emmanuel,

I'm curious about the plans w.r.t this fix; is it going to
iwlwifi-next.git? Thanks again.

On Sun, Apr 13, 2014 at 12:49 PM, Alexander Monakov <[email protected]> wrote:
> On Sun, Apr 13, 2014 at 11:23 AM, Emmanuel Grumbach <[email protected]> wrote:
>> On Sat, Apr 12, 2014 at 6:11 PM, Alexander Monakov <[email protected]> wrote:
>>> Hello,
>>>
>>> On a laptop with a Wireless-N 135 card and a 3.14.0 kernel I'm getting a
>>> kernel oops on cold boot and wireless seems unusable, but surprisingly
>>> everything works fine after a reboot:
>>
>> Wow - interesting - you are getting an interrupt before we even enable them.
>>
>> Does this help you?
>
> Yes, it did help. Thanks.
>
> (regarding interrupts, after installing an mSATA SSD that goes into
> the mini-PCIe slot next to the wireless card, I've noticed that
> "events" (mostly interrupts I guess) shown by PowerTop for the
> wireless card and the SATA controller seem correlated, even though the
> disk should be idle; I wonder what that means)
>
> Alexander

2014-04-18 13:47:52

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference



On 04/18/2014 08:08 AM, Emmanuel Grumbach wrote:
> Hi,
>
> On 04/17/2014 11:56 PM, Alexander Monakov wrote:
>> Emmanuel,
>>
>> I'm curious about the plans w.r.t this fix; is it going to
>> iwlwifi-next.git? Thanks again.
>>
>
> It will go to iwlwifi-fixes.
>

I have to say that this is really strange - I must be missing something. It looks like the ISR is called even before it is requested...
So I definitely want to understand what is happening here before applying the fix.

2014-05-01 01:19:48

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference

Hi again,

On Sat, Apr 19, 2014 at 11:06 PM, Emmanuel Grumbach <[email protected]> wrote:
> Hi,
>
>

>>
>> It bisects to
>>
>> commit 2dbc368d7fded35ed221a3751405b15e06eb8925
>> Author: Emmanuel Grumbach <[email protected]>
>> Date: Mon Dec 9 11:09:47 2013 +0200
>>
>> iwlwifi: pcie: track interrupt mask in SW
>>
>
> Yeah - ok.
> Thank you for this information.
> I'll check again later.

Can you try this? I guess it won't work, but I want to be sure about it.

diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index 37f7acc..b3134f6 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -1803,6 +1803,9 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
 	 * PCI Tx retries from interfering with C3 CPU state */
 	pci_write_config_byte(pdev, PCI_CFG_RETRY_TIMEOUT, 0x00);

+	trans->dev = &pdev->dev;
+	trans_pcie->pci_dev = pdev;
+
 	err = pci_enable_msi(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "pci_enable_msi failed(0X%x)\n", err);
@@ -1814,8 +1817,6 @@ struct iwl_trans *iwl_trans_pcie_alloc(struct pci_dev *pdev,
 		}
 	}

-	trans->dev = &pdev->dev;
-	trans_pcie->pci_dev = pdev;
 	trans->hw_rev = iwl_read32(trans, CSR_HW_REV);
 	trans->hw_id = (pdev->device << 16) + pdev->subsystem_device;
 	snprintf(trans->hw_id_str, sizeof(trans->hw_id_str),

Thanks!

2014-05-07 13:37:14

by Emmanuel Grumbach

Subject: Re: iwlwifi null pointer dereference

On Thu, May 1, 2014 at 11:43 AM, Alexander Monakov <[email protected]> wrote:
> On Thu, May 1, 2014 at 5:19 AM, Emmanuel Grumbach <[email protected]> wrote:
>>>> It bisects to
>>>>
>>>> commit 2dbc368d7fded35ed221a3751405b15e06eb8925
>>>> Author: Emmanuel Grumbach <[email protected]>
>>>> Date: Mon Dec 9 11:09:47 2013 +0200
>>>>
>>>> iwlwifi: pcie: track interrupt mask in SW
>>>>
>>>
>>> Yeah - ok.
>>> Thank you for this information.
>>> I'll check again later.
>>
>> Can you try this? I guess it won't work, but I want to be sure about it.
>
> Indeed, it still fails for me with that patch on top of 3.14.0.
>
> Anything else I can do?

Ok - I now understand what happened. Fix is on the way.
Sorry for taking so long. Vacation plus travelling didn't help.

2014-05-01 08:43:25

by Alexander Monakov

Subject: Re: iwlwifi null pointer dereference

On Thu, May 1, 2014 at 5:19 AM, Emmanuel Grumbach <[email protected]> wrote:
>>> It bisects to
>>>
>>> commit 2dbc368d7fded35ed221a3751405b15e06eb8925
>>> Author: Emmanuel Grumbach <[email protected]>
>>> Date: Mon Dec 9 11:09:47 2013 +0200
>>>
>>> iwlwifi: pcie: track interrupt mask in SW
>>>
>>
>> Yeah - ok.
>> Thank you for this information.
>> I'll check again later.
>
> Can you try this? I guess it won't work, but I want to be sure about it.

Indeed, it still fails for me with that patch on top of 3.14.0.

Anything else I can do?