2013-09-25 19:06:28

by Josh Boyer

[permalink] [raw]
Subject: applesmc oops in 3.10/3.11

Hi All,

Chris has reported[1] an issue with the applesmc driver in the 3.10
and 3.11 kernels. This seems to be a bit transient, but the oops is
consistent across those releases. This is with a MacBook Pro 4,1.
The backtrace is below.

Any ideas on this one?

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719

BUG: unable to handle kernel paging request at ffff880322aed005
IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
PGD 1fe5067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
iTCO_vendor_support microcode input_polldev snd_hda_intel
snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
loop
CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
MBP41.88Z.00C1.B03.0802271651 02/27/08
task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>]
applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286
RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
Stack:
000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
Call Trace:
[<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
[<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
[<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
[<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
[<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
[<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
[<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
[<ffffffff813f1f77>] driver_probe_device+0x87/0x390
[<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
[<ffffffff813f22bb>] __device_attach+0x3b/0x40
[<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
[<ffffffff813f1e78>] device_attach+0x88/0xa0
[<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
[<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
[<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
[<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
[<ffffffffa05e3000>] ? 0xffffffffa05e2fff
[<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
[<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
[<ffffffff81052483>] ? set_memory_nx+0x43/0x50
[<ffffffff810cc58d>] load_module+0x1bbd/0x2660
[<ffffffff810c8890>] ? store_uevent+0x40/0x40
[<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
[<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
RSP <ffff8801343ffa18>


2013-09-25 19:56:35

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> Hi All,
>
> Chris has reported[1] an issue with the applesmc driver in the 3.10
> and 3.11 kernels. This seems to be a bit transient, but the oops is
> consistent across those releases. This is with a MacBook Pro 4,1.
> The backtrace is below.
>
> Any ideas on this one?
>
Some of the bug reports suggest that there are related messages in at least some
of the kernel logs.

Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail

This suggests that initialization may be attempted more than once. The key cache
is allocated only once, but the number of keys is read for each attempt.

No idea if that can happen, but if the number of keys can increase after
the first initialization attempt you would have an explanation for the crash.

Guenter


> josh
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719
>
> BUG: unable to handle kernel paging request at ffff880322aed005
> IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> PGD 1fe5067 PUD 0
> Oops: 0000 [#1] SMP
> Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
> uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
> iTCO_vendor_support microcode input_polldev snd_hda_intel
> snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
> videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
> snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
> snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
> mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
> xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
> ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
> i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
> loop
> CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
> Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
> MBP41.88Z.00C1.B03.0802271651 02/27/08
> task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
> RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>]
> applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286
> RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
> RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
> RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
> R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
> R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
> FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
> Stack:
> 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
> ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
> ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
> Call Trace:
> [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
> [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
> [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
> [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
> [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
> [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
> [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
> [<ffffffff813f1f77>] driver_probe_device+0x87/0x390
> [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
> [<ffffffff813f22bb>] __device_attach+0x3b/0x40
> [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
> [<ffffffff813f1e78>] device_attach+0x88/0xa0
> [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
> [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
> [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
> [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
> [<ffffffffa05e3000>] ? 0xffffffffa05e2fff
> [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
> [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
> [<ffffffff81052483>] ? set_memory_nx+0x43/0x50
> [<ffffffff810cc58d>] load_module+0x1bbd/0x2660
> [<ffffffff810c8890>] ? store_uevent+0x40/0x40
> [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
> [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
> Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
> e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
> 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
> RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> RSP <ffff8801343ffa18>
>

2013-09-25 21:47:15

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > Hi All,
> >
> > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > and 3.11 kernels. This seems to be a bit transient, but the oops is
> > consistent across those releases. This is with a MacBook Pro 4,1.
> > The backtrace is below.
> >
> > Any ideas on this one?
> >
> Some of the bug reports suggest that there are related messages in at least some
> of the kernel logs.
>
> Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
>
> This suggests that initialization may be attempted more than once. The key cache
> is allocated only once, but the number of keys is read for each attempt.
>
> No idea if that can happen, but if the number of keys can increase after
> the first initialization attempt you would have an explanation for the crash.

Good idea, and easy enough to test with the patch below.

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..d962c4d 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
{
struct applesmc_registers *s = &smcreg;
bool left_light_sensor, right_light_sensor;
+ unsigned int count;
u8 tmp[1];
int ret;

if (s->init_complete)
return 0;

- ret = read_register_count(&s->key_count);
+ ret = read_register_count(&count);
if (ret)
return ret;

+ if (s->cache && s->key_count != count) {
+ pr_warn("key count changed from %d to %d\n", s->key_count, count);
+ kfree(s->cache);
+ s->cache = NULL;
+ }
+ s->key_count = count;
+
if (!s->cache)
s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
if (!s->cache)

Thanks,
Henrik

2013-09-25 22:08:44

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Sep 25, 2013 at 11:48:07PM +0200, Henrik Rydberg wrote:
> On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > > Hi All,
> > >
> > > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > > and 3.11 kernels. This seems to be a bit transient, but the oops is
> > > consistent across those releases. This is with a MacBook Pro 4,1.
> > > The backtrace is below.
> > >
> > > Any ideas on this one?
> > >
> > Some of the bug reports suggest that there are related messages in at least some
> > of the kernel logs.
> >
> > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
> >
> > This suggests that initialization may be attempted more than once. The key cache
> > is allocated only once, but the number of keys is read for each attempt.
> >
> > No idea if that can happen, but if the number of keys can increase after
> > the first initialization attempt you would have an explanation for the crash.
>
> Good idea, and easy enough to test with the patch below.
>
Should we apply this patch even though it may not solve the specific problem ?

Again, not sure if the key count can change, but the current code is at the very
least inconsistent, as it keeps reading the key count without updating or
verifying the cache size.

Thanks,
Guenter

> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..d962c4d 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
> {
> struct applesmc_registers *s = &smcreg;
> bool left_light_sensor, right_light_sensor;
> + unsigned int count;
> u8 tmp[1];
> int ret;
>
> if (s->init_complete)
> return 0;
>
> - ret = read_register_count(&s->key_count);
> + ret = read_register_count(&count);
> if (ret)
> return ret;
>
> + if (s->cache && s->key_count != count) {
> + pr_warn("key count changed from %d to %d\n", s->key_count, count);
> + kfree(s->cache);
> + s->cache = NULL;
> + }
> + s->key_count = count;
> +
> if (!s->cache)
> s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
> if (!s->cache)
>
> Thanks,
> Henrik
>

2013-09-26 06:33:58

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> > > This suggests that initialization may be attempted more than once. The key cache
> > > is allocated only once, but the number of keys is read for each attempt.
> > >
> > > No idea if that can happen, but if the number of keys can increase after
> > > the first initialization attempt you would have an explanation for the crash.
> >
> > Good idea, and easy enough to test with the patch below.
> >
> Should we apply this patch even though it may not solve the specific problem ?

Yes, why not - it certainly won't hurt. I am running it right now, so
it is at least run-tested.

> Again, not sure if the key count can change, but the current code is at the very
> least inconsistent, as it keeps reading the key count without updating or
> verifying the cache size.

Yes - I agree that the error state is far-fetched, but it is hard to
see any other logical explanation. There is of course always the
possibility that the problem is somewhere else completely.

Proper patch attached.

Thanks,
Henrik

---

>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <[email protected]>
Date: Thu, 26 Sep 2013 08:33:16 +0200
Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding

After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.

Reported-by: <[email protected]>
Suggested-by: Guenter Roeck <[email protected]>
Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/hwmon/applesmc.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..98814d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
{
struct applesmc_registers *s = &smcreg;
bool left_light_sensor, right_light_sensor;
+ unsigned int count;
u8 tmp[1];
int ret;

if (s->init_complete)
return 0;

- ret = read_register_count(&s->key_count);
+ ret = read_register_count(&count);
if (ret)
return ret;

+ if (s->cache && s->key_count != count) {
+ pr_warn("key count changed from %d to %d\n",
+ s->key_count, count);
+ kfree(s->cache);
+ s->cache = NULL;
+ }
+ s->key_count = count;
+
if (!s->cache)
s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
if (!s->cache)
--
1.8.3.4

2013-09-26 10:36:10

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
There are also ACPI conflicts in each of the bug reports I looked at.
Can this play a role, or is that "normal" on apple systems ?

Thanks,
Guenter

> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <[email protected]>
> Suggested-by: Guenter Roeck <[email protected]>
> Signed-off-by: Henrik Rydberg <[email protected]>
> ---
> drivers/hwmon/applesmc.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..98814d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
> {
> struct applesmc_registers *s = &smcreg;
> bool left_light_sensor, right_light_sensor;
> + unsigned int count;
> u8 tmp[1];
> int ret;
>
> if (s->init_complete)
> return 0;
>
> - ret = read_register_count(&s->key_count);
> + ret = read_register_count(&count);
> if (ret)
> return ret;
>
> + if (s->cache && s->key_count != count) {
> + pr_warn("key count changed from %d to %d\n",
> + s->key_count, count);
> + kfree(s->cache);
> + s->cache = NULL;
> + }
> + s->key_count = count;
> +
> if (!s->cache)
> s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
> if (!s->cache)
>

2013-09-26 10:53:23

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <[email protected]>
> Suggested-by: Guenter Roeck <[email protected]>
> Signed-off-by: Henrik Rydberg <[email protected]>
> ---
> drivers/hwmon/applesmc.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)

Applied. I'll run my sanity tests before I send it upstream.
I'll also Cc: stable.

Interesting is that this started to happen with 3.10, even though
I did not find any relevant changes in the driver. Is it possible that
changed boot timing (ie reduced boot time) exposes this problem ?

Thanks,
Guenter

2013-09-26 11:10:55

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> Applied. I'll run my sanity tests before I send it upstream.
> I'll also Cc: stable.

Great.

> Interesting is that this started to happen with 3.10, even though
> I did not find any relevant changes in the driver. Is it possible that
> changed boot timing (ie reduced boot time) exposes this problem ?

I was thinking the same thing - if the SMC takes some time to boot up,
it could very well be the culprit. Imagine that, the kernel is too
fast. ;-)

Thanks,
Henrik

2013-09-26 11:12:54

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> >Yes - I agree that the error state is far-fetched, but it is hard to
> >see any other logical explanation. There is of course always the
> >possibility that the problem is somewhere else completely.
> >
> There are also ACPI conflicts in each of the bug reports I looked at.
> Can this play a role, or is that "normal" on apple systems ?

I honstestly don't know. On my own machine (MBA3,1), I have problems
with the PCIe NIC somehow overwriting the graphics buffer, resulting
in a corrupted screen at boot. After a suspend/resume, that problem is
gone, but there is surely something fishy going on...

Thanks,
Henrik

2013-09-27 16:21:08

by Josh Boyer

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>> > > This suggests that initialization may be attempted more than once. The key cache
>> > > is allocated only once, but the number of keys is read for each attempt.
>> > >
>> > > No idea if that can happen, but if the number of keys can increase after
>> > > the first initialization attempt you would have an explanation for the crash.
>> >
>> > Good idea, and easy enough to test with the patch below.
>> >
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <[email protected]>
> Suggested-by: Guenter Roeck <[email protected]>
> Signed-off-by: Henrik Rydberg <[email protected]>
> ---
> drivers/hwmon/applesmc.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)

Thanks for the quick reply. I'll get this rolled into our kernels soon.

josh

2013-09-27 17:13:06

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <[email protected]>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <[email protected]>
> > Suggested-by: Guenter Roeck <[email protected]>
> > Signed-off-by: Henrik Rydberg <[email protected]>
> > ---
> > drivers/hwmon/applesmc.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
>
> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>
I sent a pull request to Linus, so you should be able to pull it from
the upstream kernel shortly. Would be great to get feedback if the patch
solves the problem (or doesn't).

Thanks,
Guenter

2013-09-27 17:59:34

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>
> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:
>
> > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
> >>>>>> This suggests that initialization may be attempted more than once. The key cache
> >>>>>> is allocated only once, but the number of keys is read for each attempt.
> >>>>>>
> >>>>>> No idea if that can happen, but if the number of keys can increase after
> >>>>>> the first initialization attempt you would have an explanation for the crash.
> >>>>>
> >>>>> Good idea, and easy enough to test with the patch below.
> >>>>>
> >>>> Should we apply this patch even though it may not solve the specific problem ?
> >>>
> >>> Yes, why not - it certainly won't hurt. I am running it right now, so
> >>> it is at least run-tested.
> >>>
> >>>> Again, not sure if the key count can change, but the current code is at the very
> >>>> least inconsistent, as it keeps reading the key count without updating or
> >>>> verifying the cache size.
> >>>
> >>> Yes - I agree that the error state is far-fetched, but it is hard to
> >>> see any other logical explanation. There is of course always the
> >>> possibility that the problem is somewhere else completely.
> >>>
> >>> Proper patch attached.
> >>>
> >>> Thanks,
> >>> Henrik
> >>>
> >>> ---
> >>>
> >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> >>> From: Henrik Rydberg <[email protected]>
> >>> Date: Thu, 26 Sep 2013 08:33:16 +0200
> >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >>>
> >>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> >>> Guenter pointed at the initialization problem fixed below. The patch
> >>> has not been verified to fix the crash, but should be applied
> >>> regardless.
> >>>
> >>> Reported-by: <[email protected]>
> >>> Suggested-by: Guenter Roeck <[email protected]>
> >>> Signed-off-by: Henrik Rydberg <[email protected]>
> >>> ---
> >>> drivers/hwmon/applesmc.c | 11 ++++++++++-
> >>> 1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> Thanks for the quick reply. I'll get this rolled into our kernels soon.
> >>
> > I sent a pull request to Linus, so you should be able to pull it from
> > the upstream kernel shortly. Would be great to get feedback if the patch
> > solves the problem (or doesn't).
>
> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>
When you do, can you possibly trigger an event based on the warning added
with the patch ? This might help us to identify if the problem fixed
with the patch actually happens.

Thanks,
Guenter

2013-09-27 18:01:08

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <[email protected]>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <[email protected]>
> > Suggested-by: Guenter Roeck <[email protected]>
> > Signed-off-by: Henrik Rydberg <[email protected]>
> > ---
> > drivers/hwmon/applesmc.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
>
> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>
Patch is now upstream (commit 5f4513864).

Guenter

2013-09-27 18:03:15

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Sep 27, 2013, at 11:59 AM, Guenter Roeck <[email protected]> wrote:

> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>
>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:
>>
>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>
>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>
>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>
>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>
>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>> it is at least run-tested.
>>>>>
>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>> verifying the cache size.
>>>>>
>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>> see any other logical explanation. There is of course always the
>>>>> possibility that the problem is somewhere else completely.
>>>>>
>>>>> Proper patch attached.
>>>>>
>>>>> Thanks,
>>>>> Henrik
>>>>>
>>>>> ---
>>>>>
>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>> From: Henrik Rydberg <[email protected]>
>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>
>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>> has not been verified to fix the crash, but should be applied
>>>>> regardless.
>>>>>
>>>>> Reported-by: <[email protected]>
>>>>> Suggested-by: Guenter Roeck <[email protected]>
>>>>> Signed-off-by: Henrik Rydberg <[email protected]>
>>>>> ---
>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>>>>
>>> I sent a pull request to Linus, so you should be able to pull it from
>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>> solves the problem (or doesn't).
>>
>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>
> When you do, can you possibly trigger an event based on the warning added
> with the patch ? This might help us to identify if the problem fixed
> with the patch actually happens.

I don't understand the question. I'm uncertain how to trigger, and also what event.

Chris-

2013-09-27 18:03:29

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:

> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>
>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>
>>>>> Good idea, and easy enough to test with the patch below.
>>>>>
>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>
>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>> it is at least run-tested.
>>>
>>>> Again, not sure if the key count can change, but the current code is at the very
>>>> least inconsistent, as it keeps reading the key count without updating or
>>>> verifying the cache size.
>>>
>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>> see any other logical explanation. There is of course always the
>>> possibility that the problem is somewhere else completely.
>>>
>>> Proper patch attached.
>>>
>>> Thanks,
>>> Henrik
>>>
>>> ---
>>>
>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>> From: Henrik Rydberg <[email protected]>
>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>
>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>> Guenter pointed at the initialization problem fixed below. The patch
>>> has not been verified to fix the crash, but should be applied
>>> regardless.
>>>
>>> Reported-by: <[email protected]>
>>> Suggested-by: Guenter Roeck <[email protected]>
>>> Signed-off-by: Henrik Rydberg <[email protected]>
>>> ---
>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>>
> I sent a pull request to Linus, so you should be able to pull it from
> the upstream kernel shortly. Would be great to get feedback if the patch
> solves the problem (or doesn't).

I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.

Chris-

2013-09-27 23:33:12

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 09/27/2013 11:03 AM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <[email protected]> wrote:
>
>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>
>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>
>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>
>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>
>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>> it is at least run-tested.
>>>>>>
>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>> verifying the cache size.
>>>>>>
>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>> see any other logical explanation. There is of course always the
>>>>>> possibility that the problem is somewhere else completely.
>>>>>>
>>>>>> Proper patch attached.
>>>>>>
>>>>>> Thanks,
>>>>>> Henrik
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>> From: Henrik Rydberg <[email protected]>
>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>
>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>> has not been verified to fix the crash, but should be applied
>>>>>> regardless.
>>>>>>
>>>>>> Reported-by: <[email protected]>
>>>>>> Suggested-by: Guenter Roeck <[email protected]>
>>>>>> Signed-off-by: Henrik Rydberg <[email protected]>
>>>>>> ---
>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>>>>>
>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>> solves the problem (or doesn't).
>>>
>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>
>> When you do, can you possibly trigger an event based on the warning added
>> with the patch ? This might help us to identify if the problem fixed
>> with the patch actually happens.
>
> I don't understand the question. I'm uncertain how to trigger, and also what event.
>

The patch includes a new warning message.

pr_warn("key count changed from %d to %d\n",
s->key_count, count);

It would be great if there would be a means to detect if this message is seen
in a kernel log, because it would show that the potential crash condition
fixed with the patch was actually encountered. This would help us to determine
if we actually fixed the problem or not.

Of course, we'll know if is wasn't fixed if the system still crashes.

Thanks,
Guenter

2013-10-01 03:11:50

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Sep 27, 2013, at 5:33 PM, Guenter Roeck <[email protected]> wrote:

> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>>
>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <[email protected]> wrote:
>>
>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>>
>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:
>>>>
>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>>
>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>>
>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>>
>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>>
>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>> it is at least run-tested.
>>>>>>>
>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>> verifying the cache size.
>>>>>>>
>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>> see any other logical explanation. There is of course always the
>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>>
>>>>>>> Proper patch attached.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Henrik
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>> From: Henrik Rydberg <[email protected]>
>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>>
>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>> regardless.
>>>>>>>
>>>>>>> Reported-by: <[email protected]>
>>>>>>> Suggested-by: Guenter Roeck <[email protected]>
>>>>>>> Signed-off-by: Henrik Rydberg <[email protected]>
>>>>>>> ---
>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>>>>>>
>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>> solves the problem (or doesn't).
>>>>
>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>>
>>> When you do, can you possibly trigger an event based on the warning added
>>> with the patch ? This might help us to identify if the problem fixed
>>> with the patch actually happens.
>>
>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>>
>
> The patch includes a new warning message.
>
> pr_warn("key count changed from %d to %d\n",
> s->key_count, count);
>
> It would be great if there would be a means to detect if this message is seen
> in a kernel log, because it would show that the potential crash condition
> fixed with the patch was actually encountered. This would help us to determine
> if we actually fixed the problem or not.
>
> Of course, we'll know if is wasn't fixed if the system still crashes.

Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.

[ 10.886016] applesmc: key count changed from 261 to 1174405121

Attaching new full dmesg to the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11

Chris-

2013-10-01 03:37:46

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 09/30/2013 06:57 PM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 5:33 PM, Guenter Roeck <[email protected]> wrote:
>
>> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <[email protected]> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <[email protected]> wrote:
>>>>>
>>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <[email protected]> wrote:
>>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>>>
>>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>>>
>>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>>>
>>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>>>
>>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>>> it is at least run-tested.
>>>>>>>>
>>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>>> verifying the cache size.
>>>>>>>>
>>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>>> see any other logical explanation. There is of course always the
>>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>>>
>>>>>>>> Proper patch attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Henrik
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>>> From: Henrik Rydberg <[email protected]>
>>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>>>
>>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>>> regardless.
>>>>>>>>
>>>>>>>> Reported-by: <[email protected]>
>>>>>>>> Suggested-by: Guenter Roeck <[email protected]>
>>>>>>>> Signed-off-by: Henrik Rydberg <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon.
>>>>>>>
>>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>>> solves the problem (or doesn't).
>>>>>
>>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>>>
>>>> When you do, can you possibly trigger an event based on the warning added
>>>> with the patch ? This might help us to identify if the problem fixed
>>>> with the patch actually happens.
>>>
>>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>>>
>>
>> The patch includes a new warning message.
>>
>> pr_warn("key count changed from %d to %d\n",
>> s->key_count, count);
>>
>> It would be great if there would be a means to detect if this message is seen
>> in a kernel log, because it would show that the potential crash condition
>> fixed with the patch was actually encountered. This would help us to determine
>> if we actually fixed the problem or not.
>>
>> Of course, we'll know if is wasn't fixed if the system still crashes.
>
> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>
> [ 10.886016] applesmc: key count changed from 261 to 1174405121
>

Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
Which I guess explains the subsequent memory allocation error in the log.

Henrik, any idea what might be going on ? Is it possible that the previous
command failure leaves some state machine in a bad state ?

Guenter

> Attaching new full dmesg to the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11
>
> Chris
>

2013-10-01 11:01:10

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >
> >[ 10.886016] applesmc: key count changed from 261 to 1174405121
> >
>
> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> Which I guess explains the subsequent memory allocation error in the log.
>
> Henrik, any idea what might be going on ? Is it possible that the previous
> command failure leaves some state machine in a bad state ?

I seem to recall a report on another similar state problem on newer
machines, so maybe, yes. Older machines seem fine, I have never
encountered the problem myself. Here is a patch to test that
theory. It has been tested to be pretty harmless on two different
generations.

I really really do not want to add an 'if (value is insane)' check ;-)

Thanks,
Henrik

>From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <[email protected]>
Date: Tue, 1 Oct 2013 12:16:09 +0200
Subject: [PATCH] applesmc remedy take 1

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Exception: the write-number-of-bytes-to-read command seems to differ
between models (it may not be needed on the newest), so do not try to
flush the data at that particular point.

Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
has not been reproduced, so the actual effect of this patch is unknown.
---
drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..f6eaf6d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -186,10 +186,23 @@ static int wait_read(void)
* send_byte - Write to SMC port, retrying when necessary. Callers
* must hold applesmc_lock.
*/
-static int send_byte(u8 cmd, u16 port)
+static int send_byte(u8 cmd, u16 port, bool flush)
{
- u8 status;
- int us;
+ u8 status, data;
+ int us, nskip;
+
+ if (flush) {
+ /* read the data port until bit0 is cleared */
+ for (nskip = 0; nskip < 16; nskip++) {
+ udelay(APPLESMC_MIN_WAIT);
+ status = inb(APPLESMC_CMD_PORT);
+ if (!(status & 0x01))
+ break;
+ data = inb(APPLESMC_DATA_PORT);
+ }
+ if (nskip)
+ pr_warn("flushed %d bytes\n", nskip);
+ }

outb(cmd, port);
for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
@@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)

static int send_command(u8 cmd)
{
- return send_byte(cmd, APPLESMC_CMD_PORT);
+ return send_byte(cmd, APPLESMC_CMD_PORT, true);
}

static int send_argument(const char *key)
@@ -223,7 +236,7 @@ static int send_argument(const char *key)
int i;

for (i = 0; i < 4; i++)
- if (send_byte(key[i], APPLESMC_DATA_PORT))
+ if (send_byte(key[i], APPLESMC_DATA_PORT, true))
return -EIO;
return 0;
}
@@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
return -EIO;
}

- if (send_byte(len, APPLESMC_DATA_PORT)) {
+ if (send_byte(len, APPLESMC_DATA_PORT, false)) {
pr_warn("%.4s: read len fail\n", key);
return -EIO;
}
@@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
return -EIO;
}

- if (send_byte(len, APPLESMC_DATA_PORT)) {
+ if (send_byte(len, APPLESMC_DATA_PORT, true)) {
pr_warn("%.4s: write len fail\n", key);
return -EIO;
}

for (i = 0; i < len; i++) {
- if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
+ if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
pr_warn("%s: write data fail\n", key);
return -EIO;
}
--
1.8.3.4

2013-10-01 15:26:23

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> > >
> > >[ 10.886016] applesmc: key count changed from 261 to 1174405121
> > >
> >
> > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> > Which I guess explains the subsequent memory allocation error in the log.
> >
> > Henrik, any idea what might be going on ? Is it possible that the previous
> > command failure leaves some state machine in a bad state ?
>
> I seem to recall a report on another similar state problem on newer
> machines, so maybe, yes. Older machines seem fine, I have never
> encountered the problem myself. Here is a patch to test that
> theory. It has been tested to be pretty harmless on two different
> generations.
>
> I really really do not want to add an 'if (value is insane)' check ;-)
>
Chris,

any chance you can load this patch on an affected machine so we can get
test feedback ? This one is too experimental to submit upstream without
knowing that it really fixes the problem.

Thanks,
Guenter

> Thanks,
> Henrik
>
> From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Tue, 1 Oct 2013 12:16:09 +0200
> Subject: [PATCH] applesmc remedy take 1
>
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
>
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
>
> Exception: the write-number-of-bytes-to-read command seems to differ
> between models (it may not be needed on the newest), so do not try to
> flush the data at that particular point.
>
> Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
> has not been reproduced, so the actual effect of this patch is unknown.
> ---
> drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..f6eaf6d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -186,10 +186,23 @@ static int wait_read(void)
> * send_byte - Write to SMC port, retrying when necessary. Callers
> * must hold applesmc_lock.
> */
> -static int send_byte(u8 cmd, u16 port)
> +static int send_byte(u8 cmd, u16 port, bool flush)
> {
> - u8 status;
> - int us;
> + u8 status, data;
> + int us, nskip;
> +
> + if (flush) {
> + /* read the data port until bit0 is cleared */
> + for (nskip = 0; nskip < 16; nskip++) {
> + udelay(APPLESMC_MIN_WAIT);
> + status = inb(APPLESMC_CMD_PORT);
> + if (!(status & 0x01))
> + break;
> + data = inb(APPLESMC_DATA_PORT);
> + }
> + if (nskip)
> + pr_warn("flushed %d bytes\n", nskip);
> + }
>
> outb(cmd, port);
> for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
> @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)
>
> static int send_command(u8 cmd)
> {
> - return send_byte(cmd, APPLESMC_CMD_PORT);
> + return send_byte(cmd, APPLESMC_CMD_PORT, true);
> }
>
> static int send_argument(const char *key)
> @@ -223,7 +236,7 @@ static int send_argument(const char *key)
> int i;
>
> for (i = 0; i < 4; i++)
> - if (send_byte(key[i], APPLESMC_DATA_PORT))
> + if (send_byte(key[i], APPLESMC_DATA_PORT, true))
> return -EIO;
> return 0;
> }
> @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> return -EIO;
> }
>
> - if (send_byte(len, APPLESMC_DATA_PORT)) {
> + if (send_byte(len, APPLESMC_DATA_PORT, false)) {
> pr_warn("%.4s: read len fail\n", key);
> return -EIO;
> }
> @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
> return -EIO;
> }
>
> - if (send_byte(len, APPLESMC_DATA_PORT)) {
> + if (send_byte(len, APPLESMC_DATA_PORT, true)) {
> pr_warn("%.4s: write len fail\n", key);
> return -EIO;
> }
>
> for (i = 0; i < len; i++) {
> - if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
> + if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
> pr_warn("%s: write data fail\n", key);
> return -EIO;
> }
> --
> 1.8.3.4
>
>

2013-10-01 15:33:39

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:

> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>
>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
>>>>
>>>
>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>> Which I guess explains the subsequent memory allocation error in the log.
>>>
>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>> command failure leaves some state machine in a bad state ?
>>
>> I seem to recall a report on another similar state problem on newer
>> machines, so maybe, yes. Older machines seem fine, I have never
>> encountered the problem myself. Here is a patch to test that
>> theory. It has been tested to be pretty harmless on two different
>> generations.
>>
>> I really really do not want to add an 'if (value is insane)' check ;-)
>>
> Chris,
>
> any chance you can load this patch on an affected machine so we can get
> test feedback ? This one is too experimental to submit upstream without
> knowing that it really fixes the problem.

Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).

Chris-

2013-10-01 16:24:32

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>
> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:
>
> > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>>
> >>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
> >>>>
> >>>
> >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>> Which I guess explains the subsequent memory allocation error in the log.
> >>>
> >>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>> command failure leaves some state machine in a bad state ?
> >>
> >> I seem to recall a report on another similar state problem on newer
> >> machines, so maybe, yes. Older machines seem fine, I have never
> >> encountered the problem myself. Here is a patch to test that
> >> theory. It has been tested to be pretty harmless on two different
> >> generations.
> >>
> >> I really really do not want to add an 'if (value is insane)' check ;-)
> >>
> > Chris,
> >
> > any chance you can load this patch on an affected machine so we can get
> > test feedback ? This one is too experimental to submit upstream without
> > knowing that it really fixes the problem.
>
> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>
Only requirement is that it also includes the previous patch, so it would be
optimal if you can apply it on top of the previous image.

Thanks,
Guenter

2013-10-02 01:09:33

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 1, 2013, at 10:24 AM, Guenter Roeck <[email protected]> wrote:

> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>
>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:
>>
>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>
>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>
>>>>>
>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>
>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>> command failure leaves some state machine in a bad state ?
>>>>
>>>> I seem to recall a report on another similar state problem on newer
>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>> encountered the problem myself. Here is a patch to test that
>>>> theory. It has been tested to be pretty harmless on two different
>>>> generations.
>>>>
>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>
>>> Chris,
>>>
>>> any chance you can load this patch on an affected machine so we can get
>>> test feedback ? This one is too experimental to submit upstream without
>>> knowing that it really fixes the problem.
>>
>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>
> Only requirement is that it also includes the previous patch, so it would be
> optimal if you can apply it on top of the previous image.

Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:

[ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
[ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10


Chris-

2013-10-02 03:52:04

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>
> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <[email protected]> wrote:
>
> > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
> >>
> >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:
> >>
> >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>>>>
> >>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
> >>>>>>
> >>>>>
> >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>>>> Which I guess explains the subsequent memory allocation error in the log.
> >>>>>
> >>>>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>>>> command failure leaves some state machine in a bad state ?
> >>>>
> >>>> I seem to recall a report on another similar state problem on newer
> >>>> machines, so maybe, yes. Older machines seem fine, I have never
> >>>> encountered the problem myself. Here is a patch to test that
> >>>> theory. It has been tested to be pretty harmless on two different
> >>>> generations.
> >>>>
> >>>> I really really do not want to add an 'if (value is insane)' check ;-)
> >>>>
> >>> Chris,
> >>>
> >>> any chance you can load this patch on an affected machine so we can get
> >>> test feedback ? This one is too experimental to submit upstream without
> >>> knowing that it really fixes the problem.
> >>
> >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
> >>
> > Only requirement is that it also includes the previous patch, so it would be
> > optimal if you can apply it on top of the previous image.
>
> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>
> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>

Hi Chris,

That only means that you did not hit the problem. There may be some secondary
trigger (cold boot ? coffee on the cpu ?).

One thing I have seen in all logs is the earlier "send_byte fail" message, so
I think that is a pre-requisite.

Thanks,
Guenter

2013-10-02 03:55:54

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 1, 2013, at 9:51 PM, Guenter Roeck <[email protected]> wrote:

> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>>
>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <[email protected]> wrote:
>>
>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>>
>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:
>>>>
>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>>
>>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>>
>>>>>>>
>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>>
>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>>
>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>> generations.
>>>>>>
>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>>
>>>>> Chris,
>>>>>
>>>>> any chance you can load this patch on an affected machine so we can get
>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>> knowing that it really fixes the problem.
>>>>
>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>>
>>> Only requirement is that it also includes the previous patch, so it would be
>>> optimal if you can apply it on top of the previous image.
>>
>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>
>> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>
>
> Hi Chris,
>
> That only means that you did not hit the problem. There may be some secondary
> trigger (cold boot ? coffee on the cpu ?).
>
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.


Chris-

2013-10-02 04:02:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 10/01/2013 08:55 PM, Chris Murphy wrote:
>
> On Oct 1, 2013, at 9:51 PM, Guenter Roeck <[email protected]> wrote:
>
>> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>>>
>>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <[email protected]> wrote:
>>>
>>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>>>
>>>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>>>
>>>>>>>>
>>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>>>
>>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>>>
>>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>>> generations.
>>>>>>>
>>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>>>
>>>>>> Chris,
>>>>>>
>>>>>> any chance you can load this patch on an affected machine so we can get
>>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>>> knowing that it really fixes the problem.
>>>>>
>>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>>>
>>>> Only requirement is that it also includes the previous patch, so it would be
>>>> optimal if you can apply it on top of the previous image.
>>>
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> Hi Chris,
>>
>> That only means that you did not hit the problem. There may be some secondary
>> trigger (cold boot ? coffee on the cpu ?).
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.
>

I am sure you didn't pour coffee onto the CPU yet :)

Basic rule of testing: You'll only hit the problem again after you are convinced
that it was magically fixed by a completely unrelated change.

Cheers,
Guenter

2013-10-02 09:52:10

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
> >
> > [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
> > [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
> >
>
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

Not necessarily - it could be that the patch actually fixes the root
cause. One possible scenario is that on recent SMCs, some of the
commands produce more data than we actually read. This would
eventually lead to both data corruption and overflow somwhere in the
SMC internals. If the original SMC error is interpreted as a read
buffer overflow, then that problem should be fixed with this patch.

Even so, the patch does require more testing. It would be great to get
coverage on all models we can find.

Thanks,
Henrik

2013-10-02 13:30:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 10/02/2013 02:53 AM, Henrik Rydberg wrote:
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> Not necessarily - it could be that the patch actually fixes the root
> cause. One possible scenario is that on recent SMCs, some of the
> commands produce more data than we actually read. This would
> eventually lead to both data corruption and overflow somwhere in the
> SMC internals. If the original SMC error is interpreted as a read
> buffer overflow, then that problem should be fixed with this patch.
>

Good point.

But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

Thanks,
Guenter

2013-10-02 16:33:13

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

> >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>I think that is a pre-requisite.
> >
> >Not necessarily - it could be that the patch actually fixes the root
> >cause. One possible scenario is that on recent SMCs, some of the
> >commands produce more data than we actually read. This would
> >eventually lead to both data corruption and overflow somwhere in the
> >SMC internals. If the original SMC error is interpreted as a read
> >buffer overflow, then that problem should be fixed with this patch.
> >
>
> Good point.
>
> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

The explanation I have there is that the (newer) SMC needs the
application to read the 'no more bytes' or it will get confused. It
makes sense, if the number of bytes to read is no longer specified.

Thanks,
Henrik

2013-10-02 16:47:26

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > >>I think that is a pre-requisite.
> > >
> > >Not necessarily - it could be that the patch actually fixes the root
> > >cause. One possible scenario is that on recent SMCs, some of the
> > >commands produce more data than we actually read. This would
> > >eventually lead to both data corruption and overflow somwhere in the
> > >SMC internals. If the original SMC error is interpreted as a read
> > >buffer overflow, then that problem should be fixed with this patch.
> > >
> >
> > Good point.
> >
> > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>
> The explanation I have there is that the (newer) SMC needs the
> application to read the 'no more bytes' or it will get confused. It
> makes sense, if the number of bytes to read is no longer specified.
>
You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
That might make sense.

Thanks,
Guenter

2013-10-02 17:22:36

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > >>I think that is a pre-requisite.
> > > >
> > > >Not necessarily - it could be that the patch actually fixes the root
> > > >cause. One possible scenario is that on recent SMCs, some of the
> > > >commands produce more data than we actually read. This would
> > > >eventually lead to both data corruption and overflow somwhere in the
> > > >SMC internals. If the original SMC error is interpreted as a read
> > > >buffer overflow, then that problem should be fixed with this patch.
> > > >
> > >
> > > Good point.
> > >
> > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >
> > The explanation I have there is that the (newer) SMC needs the
> > application to read the 'no more bytes' or it will get confused. It
> > makes sense, if the number of bytes to read is no longer specified.
> >
> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> That might make sense.

It also points at the possibility of a smaller patch to test, but I
have not had the time to check this very deeply myself:

Thanks,
Henrik

>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <[email protected]>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] applesmc remedy take 2

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Tests by Chris suggests reading the status is enough for the problem
to go away, which is consistent with a change in the SMC interface,
where the number of bytes to read is no longer specified, but found
out by reading until end of data.

Tested on a MacBookAir3,1, but the original problem has not been
reproduced.
---
drivers/hwmon/applesmc.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..c0ff350 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)

static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
{
+ u8 status, data;
int i;

if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
return -EIO;
}

+ /* This has no effect on newer (2012) SMCs */
if (send_byte(len, APPLESMC_DATA_PORT)) {
pr_warn("%.4s: read len fail\n", key);
return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
buffer[i] = inb(APPLESMC_DATA_PORT);
}

+ /* Read the data port until bit0 is cleared */
+ for (i = 0; i < 16; i++) {
+ udelay(APPLESMC_MIN_WAIT);
+ status = inb(APPLESMC_CMD_PORT);
+ if (!(status & 0x01))
+ break;
+ data = inb(APPLESMC_DATA_PORT);
+ }
+ if (i)
+ pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
return 0;
}

--
1.8.3.4

2013-10-02 18:02:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > > >>I think that is a pre-requisite.
> > > > >
> > > > >Not necessarily - it could be that the patch actually fixes the root
> > > > >cause. One possible scenario is that on recent SMCs, some of the
> > > > >commands produce more data than we actually read. This would
> > > > >eventually lead to both data corruption and overflow somwhere in the
> > > > >SMC internals. If the original SMC error is interpreted as a read
> > > > >buffer overflow, then that problem should be fixed with this patch.
> > > > >
> > > >
> > > > Good point.
> > > >
> > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> > >
> > > The explanation I have there is that the (newer) SMC needs the
> > > application to read the 'no more bytes' or it will get confused. It
> > > makes sense, if the number of bytes to read is no longer specified.
> > >
> > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> > That might make sense.
>
> It also points at the possibility of a smaller patch to test, but I
> have not had the time to check this very deeply myself:
>
I like this patch much more than the previous patch. Chris, can you test it ?

Thanks,
Guenter

> Thanks,
> Henrik
>
> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
>
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
>
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
>
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
>
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.
> ---
> drivers/hwmon/applesmc.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>
> static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> {
> + u8 status, data;
> int i;
>
> if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> return -EIO;
> }
>
> + /* This has no effect on newer (2012) SMCs */
> if (send_byte(len, APPLESMC_DATA_PORT)) {
> pr_warn("%.4s: read len fail\n", key);
> return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> buffer[i] = inb(APPLESMC_DATA_PORT);
> }
>
> + /* Read the data port until bit0 is cleared */
> + for (i = 0; i < 16; i++) {
> + udelay(APPLESMC_MIN_WAIT);
> + status = inb(APPLESMC_CMD_PORT);
> + if (!(status & 0x01))
> + break;
> + data = inb(APPLESMC_DATA_PORT);
> + }
> + if (i)
> + pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
> return 0;
> }
>
> --
> 1.8.3.4
>
>

2013-10-02 18:33:09

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 2, 2013, at 12:02 PM, Guenter Roeck <[email protected]> wrote:

> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>> I think that is a pre-requisite.
>>>>>>
>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>> commands produce more data than we actually read. This would
>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>> SMC internals. If the original SMC error is interpreted as a read
>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>>
>>>>>
>>>>> Good point.
>>>>>
>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>>
>>>> The explanation I have there is that the (newer) SMC needs the
>>>> application to read the 'no more bytes' or it will get confused. It
>>>> makes sense, if the number of bytes to read is no longer specified.
>>>>
>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>> That might make sense.
>>
>> It also points at the possibility of a smaller patch to test, but I
>> have not had the time to check this very deeply myself:
>>
> I like this patch much more than the previous patch. Chris, can you test it ?

Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.

Chris Murphy-

2013-10-02 20:59:47

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
>
> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <[email protected]> wrote:
>
> > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>> I think that is a pre-requisite.
> >>>>>>
> >>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>> commands produce more data than we actually read. This would
> >>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>> SMC internals. If the original SMC error is interpreted as a read
> >>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>>
> >>>>>
> >>>>> Good point.
> >>>>>
> >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>>
> >>>> The explanation I have there is that the (newer) SMC needs the
> >>>> application to read the 'no more bytes' or it will get confused. It
> >>>> makes sense, if the number of bytes to read is no longer specified.
> >>>>
> >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>> That might make sense.
> >>
> >> It also points at the possibility of a smaller patch to test, but I
> >> have not had the time to check this very deeply myself:
> >>
> > I like this patch much more than the previous patch. Chris, can you test it ?
>
> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
>
None, if I understand correctly and if the patch really fixes the root cause
of the problem.

Guenter

2013-10-02 21:35:18

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 2, 2013, at 2:59 PM, Guenter Roeck <[email protected]> wrote:

> On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
>>
>> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <[email protected]> wrote:
>>
>>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>>>> I think that is a pre-requisite.
>>>>>>>>
>>>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>>>> commands produce more data than we actually read. This would
>>>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>>>> SMC internals. If the original SMC error is interpreted as a read
>>>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>>>>
>>>>>>>
>>>>>>> Good point.
>>>>>>>
>>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>>>>
>>>>>> The explanation I have there is that the (newer) SMC needs the
>>>>>> application to read the 'no more bytes' or it will get confused. It
>>>>>> makes sense, if the number of bytes to read is no longer specified.
>>>>>>
>>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>>>> That might make sense.
>>>>
>>>> It also points at the possibility of a smaller patch to test, but I
>>>> have not had the time to check this very deeply myself:
>>>>
>>> I like this patch much more than the previous patch. Chris, can you test it ?
>>
>> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
>>
> None, if I understand correctly and if the patch really fixes the root cause
> of the problem.

A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.


Chris-

2013-10-02 23:32:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 02, 2013 at 03:34:41PM -0600, Chris Murphy wrote:
>
> On Oct 2, 2013, at 2:59 PM, Guenter Roeck <[email protected]> wrote:
>
> > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
> >>
> >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <[email protected]> wrote:
> >>
> >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>>>> I think that is a pre-requisite.
> >>>>>>>>
> >>>>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>>>> commands produce more data than we actually read. This would
> >>>>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>>>> SMC internals. If the original SMC error is interpreted as a read
> >>>>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Good point.
> >>>>>>>
> >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>>>>
> >>>>>> The explanation I have there is that the (newer) SMC needs the
> >>>>>> application to read the 'no more bytes' or it will get confused. It
> >>>>>> makes sense, if the number of bytes to read is no longer specified.
> >>>>>>
> >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>>>> That might make sense.
> >>>>
> >>>> It also points at the possibility of a smaller patch to test, but I
> >>>> have not had the time to check this very deeply myself:
> >>>>
> >>> I like this patch much more than the previous patch. Chris, can you test it ?
> >>
> >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
> >>
> > None, if I understand correctly and if the patch really fixes the root cause
> > of the problem.
>
> A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.
>
Yes, that could be a possible trigger. I thought it might be triggered by faster
boot (as one gets with 3.10 and 3.11), but slow boot is just as likely.

Guenter

2013-10-07 23:42:21

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On 10/02/2013 10:24 AM, Henrik Rydberg wrote:

>>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
>
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
>
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
>
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
>
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.

So, what should we do with this patch ? Apply it ?

Thanks,
Guenter

> ---
> drivers/hwmon/applesmc.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>
> static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> {
> + u8 status, data;
> int i;
>
> if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> return -EIO;
> }
>
> + /* This has no effect on newer (2012) SMCs */
> if (send_byte(len, APPLESMC_DATA_PORT)) {
> pr_warn("%.4s: read len fail\n", key);
> return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
> buffer[i] = inb(APPLESMC_DATA_PORT);
> }
>
> + /* Read the data port until bit0 is cleared */
> + for (i = 0; i < 16; i++) {
> + udelay(APPLESMC_MIN_WAIT);
> + status = inb(APPLESMC_CMD_PORT);
> + if (!(status & 0x01))
> + break;
> + data = inb(APPLESMC_DATA_PORT);
> + }
> + if (i)
> + pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
> return 0;
> }
>
>

2013-10-07 23:47:04

by Chris Murphy

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11


On Oct 7, 2013, at 5:42 PM, Guenter Roeck <[email protected]> wrote:

> On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
>
>>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
>> From: Henrik Rydberg <[email protected]>
>> Date: Wed, 2 Oct 2013 19:15:03 +0200
>> Subject: [PATCH] applesmc remedy take 2
>>
>> Conjectured problem: there are remnant bytes ready on the data line
>> which corrupts the read after a failure.
>>
>> Remedy: assuming bit0 is the read valid line, try to flush it before
>> starting a new command.
>>
>> Tests by Chris suggests reading the status is enough for the problem
>> to go away, which is consistent with a change in the SMC interface,
>> where the number of bytes to read is no longer specified, but found
>> out by reading until end of data.
>>
>> Tested on a MacBookAir3,1, but the original problem has not been
>> reproduced.
>
> So, what should we do with this patch ? Apply it ?

So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.

Chris-

2013-10-08 15:48:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Mon, Oct 07, 2013 at 05:46:55PM -0600, Chris Murphy wrote:
>
> On Oct 7, 2013, at 5:42 PM, Guenter Roeck <[email protected]> wrote:
>
> > On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> >
> >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> >> From: Henrik Rydberg <[email protected]>
> >> Date: Wed, 2 Oct 2013 19:15:03 +0200
> >> Subject: [PATCH] applesmc remedy take 2
> >>
> >> Conjectured problem: there are remnant bytes ready on the data line
> >> which corrupts the read after a failure.
> >>
> >> Remedy: assuming bit0 is the read valid line, try to flush it before
> >> starting a new command.
> >>
> >> Tests by Chris suggests reading the status is enough for the problem
> >> to go away, which is consistent with a change in the SMC interface,
> >> where the number of bytes to read is no longer specified, but found
> >> out by reading until end of data.
> >>
> >> Tested on a MacBookAir3,1, but the original problem has not been
> >> reproduced.
> >
> > So, what should we do with this patch ? Apply it ?
>
> So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
>

Seems to me we should apply it.

Henrik,

what do you think ? Did you have time for additional testing ?
If you think we should apply it, please send me a signed patch.

Thanks,
Guenter

2013-10-08 16:27:11

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

Hi Guenter,

> > > So, what should we do with this patch ? Apply it ?
> >
> > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> >
>
> Seems to me we should apply it.
>
> Henrik,
>
> what do you think ? Did you have time for additional testing ?
> If you think we should apply it, please send me a signed patch.

I want to make one more test tomorrow, then I'll send a proper patch. Thanks!

Henrik

2013-10-08 16:29:15

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Tue, Oct 08, 2013 at 06:29:17PM +0200, Henrik Rydberg wrote:
> Hi Guenter,
>
> > > > So, what should we do with this patch ? Apply it ?
> > >
> > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> > >
> >
> > Seems to me we should apply it.
> >
> > Henrik,
> >
> > what do you think ? Did you have time for additional testing ?
> > If you think we should apply it, please send me a signed patch.
>
> I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
>
Ok, great.

Thanks,
Guenter

2013-10-09 08:27:39

by Henrik Rydberg

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

Hi Guenter,

> > I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
> >
> Ok, great.

And here it is, with compiler warning fixed.

Thanks everybody,
Henrik

>From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <[email protected]>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] hwmon: (applesmc) Always read until end of data

The crash reported and investigated in commit 5f4513 turned out to be
caused by a change to the read interface on newer (2012) SMCs.

Tests by Chris show that simply reading the data valid line is enough
for the problem to go away. Additional tests show that the newer SMCs
no longer wait for the number of requested bytes, but start sending
data right away. Apparently the number of bytes to read is no longer
specified as before, but instead found out by reading until end of
data. Failure to read until end of data confuses the state machine,
which eventually causes the crash.

As a remedy, assuming bit0 is the read valid line, make sure there is
nothing more to read before leaving the read function.

Tested to resolve the original problem, and runtested on MBA3,1,
MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
machines before 2012.

Tested-by: Chris Murphy <[email protected]>
Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/hwmon/applesmc.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..3288f13 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)

static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
{
+ u8 status, data = 0;
int i;

if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
return -EIO;
}

+ /* This has no effect on newer (2012) SMCs */
if (send_byte(len, APPLESMC_DATA_PORT)) {
pr_warn("%.4s: read len fail\n", key);
return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
buffer[i] = inb(APPLESMC_DATA_PORT);
}

+ /* Read the data port until bit0 is cleared */
+ for (i = 0; i < 16; i++) {
+ udelay(APPLESMC_MIN_WAIT);
+ status = inb(APPLESMC_CMD_PORT);
+ if (!(status & 0x01))
+ break;
+ data = inb(APPLESMC_DATA_PORT);
+ }
+ if (i)
+ pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
return 0;
}

--
1.8.3.4

2013-10-09 16:53:01

by Guenter Roeck

[permalink] [raw]
Subject: Re: applesmc oops in 3.10/3.11

On Wed, Oct 09, 2013 at 10:29:39AM +0200, Henrik Rydberg wrote:
> From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <[email protected]>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] hwmon: (applesmc) Always read until end of data
>
> The crash reported and investigated in commit 5f4513 turned out to be
> caused by a change to the read interface on newer (2012) SMCs.
>
> Tests by Chris show that simply reading the data valid line is enough
> for the problem to go away. Additional tests show that the newer SMCs
> no longer wait for the number of requested bytes, but start sending
> data right away. Apparently the number of bytes to read is no longer
> specified as before, but instead found out by reading until end of
> data. Failure to read until end of data confuses the state machine,
> which eventually causes the crash.
>
> As a remedy, assuming bit0 is the read valid line, make sure there is
> nothing more to read before leaving the read function.
>
> Tested to resolve the original problem, and runtested on MBA3,1,
> MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
> machines before 2012.
>
> Tested-by: Chris Murphy <[email protected]>
> Signed-off-by: Henrik Rydberg <[email protected]>

Applied. I'll do my usual sanity testing and send it to Linus
either tomorrow or Friday.

Thanks,
Guenter