2014-04-05 14:35:39

by Rafał Miłecki

[permalink] [raw]
Subject: [PATCH][FIX 3.15] b43: N-PHY: access B43_MMIO_PSM_PHY_HDR using 16b ops

Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
functions isn't safe. On my machine it causes delayed (!) CPU exception:

Disabling lock debugging due to kernel taint
mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
mce: [Hardware Error]: TSC 164083803dc
mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Signed-off-by: Rafał Miłecki <[email protected]>
---
John: I think this patch it worth picking for 3.15 release. This bug
causes instability and can be triggered depending on the default state
of B43_NPHY_BANDCTL register.
It doesn't cause exception immediately, so I spent few hours tracing it.
---
drivers/net/wireless/b43/phy_n.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/b43/phy_n.c b/drivers/net/wireless/b43/phy_n.c
index 05ee7f1..24ccbe9 100644
--- a/drivers/net/wireless/b43/phy_n.c
+++ b/drivers/net/wireless/b43/phy_n.c
@@ -5176,22 +5176,22 @@ static void b43_nphy_channel_setup(struct b43_wldev *dev,
int ch = new_channel->hw_value;

u16 old_band_5ghz;
- u32 tmp32;
+ u16 tmp16;

old_band_5ghz =
b43_phy_read(dev, B43_NPHY_BANDCTL) & B43_NPHY_BANDCTL_5GHZ;
if (new_channel->band == IEEE80211_BAND_5GHZ && !old_band_5ghz) {
- tmp32 = b43_read32(dev, B43_MMIO_PSM_PHY_HDR);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32 | 4);
+ tmp16 = b43_read16(dev, B43_MMIO_PSM_PHY_HDR);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16 | 4);
b43_phy_set(dev, B43_PHY_B_BBCFG, 0xC000);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16);
b43_phy_set(dev, B43_NPHY_BANDCTL, B43_NPHY_BANDCTL_5GHZ);
} else if (new_channel->band == IEEE80211_BAND_2GHZ && old_band_5ghz) {
b43_phy_mask(dev, B43_NPHY_BANDCTL, ~B43_NPHY_BANDCTL_5GHZ);
- tmp32 = b43_read32(dev, B43_MMIO_PSM_PHY_HDR);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32 | 4);
+ tmp16 = b43_read16(dev, B43_MMIO_PSM_PHY_HDR);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16 | 4);
b43_phy_mask(dev, B43_PHY_B_BBCFG, 0x3FFF);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16);
}

b43_chantab_phy_upload(dev, e);
--
1.8.4.5



2014-04-05 16:08:32

by Rafał Miłecki

[permalink] [raw]
Subject: [PATCH V2][FIX 3.15] b43: Fix machine check error due to improper access of B43_MMIO_PSM_PHY_HDR

Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
functions isn't safe. On my machine it causes delayed (!) CPU exception:

Disabling lock debugging due to kernel taint
mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
mce: [Hardware Error]: TSC 164083803dc
mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Signed-off-by: Rafał Miłecki <[email protected]>
Acked-by: Larry Finger <[email protected]>
Cc: Stable <[email protected]> [2.6.35+]
---
V2: Change commit message as Larry suggested, no code changes
---
drivers/net/wireless/b43/phy_n.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/b43/phy_n.c b/drivers/net/wireless/b43/phy_n.c
index 05ee7f1..24ccbe9 100644
--- a/drivers/net/wireless/b43/phy_n.c
+++ b/drivers/net/wireless/b43/phy_n.c
@@ -5176,22 +5176,22 @@ static void b43_nphy_channel_setup(struct b43_wldev *dev,
int ch = new_channel->hw_value;

u16 old_band_5ghz;
- u32 tmp32;
+ u16 tmp16;

old_band_5ghz =
b43_phy_read(dev, B43_NPHY_BANDCTL) & B43_NPHY_BANDCTL_5GHZ;
if (new_channel->band == IEEE80211_BAND_5GHZ && !old_band_5ghz) {
- tmp32 = b43_read32(dev, B43_MMIO_PSM_PHY_HDR);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32 | 4);
+ tmp16 = b43_read16(dev, B43_MMIO_PSM_PHY_HDR);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16 | 4);
b43_phy_set(dev, B43_PHY_B_BBCFG, 0xC000);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16);
b43_phy_set(dev, B43_NPHY_BANDCTL, B43_NPHY_BANDCTL_5GHZ);
} else if (new_channel->band == IEEE80211_BAND_2GHZ && old_band_5ghz) {
b43_phy_mask(dev, B43_NPHY_BANDCTL, ~B43_NPHY_BANDCTL_5GHZ);
- tmp32 = b43_read32(dev, B43_MMIO_PSM_PHY_HDR);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32 | 4);
+ tmp16 = b43_read16(dev, B43_MMIO_PSM_PHY_HDR);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16 | 4);
b43_phy_mask(dev, B43_PHY_B_BBCFG, 0x3FFF);
- b43_write32(dev, B43_MMIO_PSM_PHY_HDR, tmp32);
+ b43_write16(dev, B43_MMIO_PSM_PHY_HDR, tmp16);
}

b43_chantab_phy_upload(dev, e);
--
1.8.4.5


2014-04-05 15:49:03

by Larry Finger

[permalink] [raw]
Subject: Re: [PATCH][FIX 3.15] b43: N-PHY: access B43_MMIO_PSM_PHY_HDR using 16b ops

On 04/05/2014 09:35 AM, Rafał Miłecki wrote:
> Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
> functions isn't safe. On my machine it causes delayed (!) CPU exception:
>
> Disabling lock debugging due to kernel taint
> mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
> mce: [Hardware Error]: TSC 164083803dc
> mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
> mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> mce: [Hardware Error]: Machine check: Processor context corrupt
> Kernel panic - not syncing: Fatal machine check on current CPU
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>
> Signed-off-by: Rafał Miłecki <[email protected]>
> ---
> John: I think this patch it worth picking for 3.15 release. This bug
> causes instability and can be triggered depending on the default state
> of B43_NPHY_BANDCTL register.
> It doesn't cause exception immediately, so I spent few hours tracing it.
> ---

I agree that this is a BUG fix that should be in 3.15, *and* in all stable
versions to which it would apply.

To aid John, I suggest that a better subject would be something like "b43: Fix
machine check error due to improper access of B43_MMIO_PSM_PHY_HDR". That makes
it more obvious that a bug is being fixed. In addition, you should add a "Cc:
Stable <[email protected]> [2.6.35+]".

As I rarely run an 802.11n Broadcom device, it is unlikely that I have
encountered this problem, but your evidence is convincing. Once the subject and
Cc are changed, then Acked-by: Larry Finger <[email protected]>

Larry