2013-06-03 17:39:04

by Stefan Seyfried

Subject: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

Or, to be more precise: it breaks resume.

The machine seems to lock up hard after resume, then after a few seconds
it panics (caps lock blinking).

Reproduced on ThinkPad X200s

00:03.0 0780: 8086:2a44 (rev 07)
Intel Corporation Mobile 4 Series Chipset MEI Controller

Debugging with "init=/bin/bash no_console_suspend", I see lots of errors
from the mei_me driver, and finally the panic (some kind of overflow, maybe?).

Unbinding the device before suspend fixes resume.
This machine has suspended and resumed fine with 3.9.
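A minimal sketch of the "unbind before suspend" workaround, assuming the device address 0000:00:03.0 shown above and a root shell. The function names and the MEI_DEV and SYSFS variables are hypothetical; SYSFS exists only so the functions can be dry-run against a scratch directory instead of the real /sys. Rebinding is kept as a separate step, since later messages in this thread report that rebinding on 3.10-rc floods the log.

```shell
#!/bin/sh
# Sketch: keep mei_me away from the MEI controller across suspend-to-RAM.
# Assumptions (hypothetical names): MEI_DEV defaults to the PCI address
# from this report; SYSFS may point at a scratch directory for a dry run.
MEI_DEV="${MEI_DEV:-0000:00:03.0}"

# Directory of the mei_me PCI driver in sysfs.
mei_drv_dir() {
    echo "${SYSFS:-/sys}/bus/pci/drivers/mei_me"
}

# Detach the MEI controller from mei_me (the workaround described above).
mei_unbind() {
    echo "$MEI_DEV" > "$(mei_drv_dir)/unbind"
}

# Reattach it later; on 3.10-rc this is reported to flood the log.
mei_bind() {
    echo "$MEI_DEV" > "$(mei_drv_dir)/bind"
}

# Unbind first, then suspend to RAM. Nothing runs when the file is sourced.
suspend_without_mei() {
    mei_unbind || return 1
    echo mem > "${SYSFS:-/sys}/power/state"
}
```

After sourcing the file as root, calling suspend_without_mei performs the unbind and the suspend; mei_bind can be run by hand after resume if the device is needed again.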

This machine has no serial port, so it is hard for me to capture output.
I could try to take a picture of the panic message if that would be helpful.

Best regards,

Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


2013-06-03 17:44:38

by Stefan Seyfried

Subject: Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

On 03.06.2013 19:38, Stefan Seyfried wrote:
> Or, to be more precise: it breaks resume.
>
> The machine seems to lock up hard after resume, then after a few seconds
> it panics (caps lock blinking).
>
> Reproduced on ThinkPad X200s
>
> 00:03.0 0780: 8086:2a44 (rev 07)
> Intel Corporation Mobile 4 Series Chipset MEI Controller
>
> Debugging with "init=/bin/bash no_console_suspend", I see lots of errors
> from the mei_me driver, and finally the panic (some kind of overflow, maybe?).
>
> Unbinding the device before suspend fixes resume.

I just noticed that I get the following message on unbinding:

$ echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
$ dmesg|tail -2
[ 1216.830034] mei_me 0000:00:03.0: stop
[ 1216.837018] mei_me 0000:00:03.0: wait hw ready failed. status = 0x0

Not sure if this is related.

Best regards,

Stefan

2013-06-03 18:10:01

by Tomas Winkler

Subject: Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

>
> > Or, to be more precise: it breaks resume.
> >
> > The machine seems to lock up hard after resume, then after a few seconds
> > it panics (caps lock blinking).
> >
> > Reproduced on ThinkPad X200s
> >
> > 00:03.0 0780: 8086:2a44 (rev 07)
> > Intel Corporation Mobile 4 Series Chipset MEI Controller
> >
> > Debugging with "init=/bin/bash no_console_suspend", I see lots of errors
> > from the mei_me driver, and finally the panic (some kind of overflow, maybe?).
> >
> > Unbinding the device before suspend fixes resume.
>
> I just noticed that I get the following message on unbinding:
>
> $ echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
> $ dmesg|tail -2
> [ 1216.830034] mei_me 0000:00:03.0: stop
> [ 1216.837018] mei_me 0000:00:03.0: wait hw ready failed. status = 0x0
>
> Not sure if this is related.
>
Thanks for the report, I'm looking into it.

Thanks
Tomas

2013-06-18 18:27:21

by Stefan Seyfried

Subject: Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

Hi Tomas,

Executive summary: it is not fixed in 3.10-rc6.

On 03.06.2013 20:09, Tomas Winkler wrote:
>>> Or, to be more precise: it breaks resume.
>>>
>>> The machine seems to lock up hard after resume, then after a few seconds
>>> it panics (caps lock blinking).
>>>
>>> Reproduced on ThinkPad X200s
>>>
>>> 00:03.0 0780: 8086:2a44 (rev 07)
>>> Intel Corporation Mobile 4 Series Chipset MEI Controller
>>>
>>> Debugging with "init=/bin/bash no_console_suspend", I see lots of errors
>>> from the mei_me driver, and finally the panic (some kind of overflow, maybe?).
>>>
>>> Unbinding the device before suspend fixes resume.
>>
>> I just noticed that I get the following message on unbinding:
>>
>> $ echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
>> $ dmesg|tail -2
>> [ 1216.830034] mei_me 0000:00:03.0: stop
>> [ 1216.837018] mei_me 0000:00:03.0: wait hw ready failed. status = 0x0
>>
>> Not sure if this is related.
>>
> Thanks for the report, I'm looking into it.

I looked at the git log of drivers/misc/mei and it looked promising.

However, it still does not work: commit
42f132febff3b7b42c6c9dbfc151f29233be3132 does not seem to help enough on
my hardware.

Even just unbinding and rebinding with
echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/bind

still triggers a flood of messages like these:
[ 318.330981] mei_me 0000:00:03.0: reset: wrong host start response
[ 318.330984] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 318.330990] mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
[ 318.330993] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 318.331016] mei_me 0000:00:03.0: reset: wrong host start response
[ 318.331019] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 346.571031] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[ 346.571047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 376.631030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[ 376.631044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
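When the log floods like this, a per-reason tally is easier to read than the raw lines. A small sketch, assuming dmesg-style lines on standard input; the function name mei_reset_summary is hypothetical. It counts only the "reset: <reason>" lines, not the "unexpected reset: dev_state = ..." line that follows each of them.

```shell
#!/bin/sh
# Tally mei_me "reset: <reason>" messages read from stdin (e.g. dmesg).
# Lines such as "unexpected reset: dev_state = RESETTING" do not contain
# ": reset: " and are therefore not counted.
mei_reset_summary() {
    LC_ALL=C awk '
        /mei_me .*: reset: / {
            reason = $0
            sub(/.*: reset: /, "", reason)   # keep only the reason text
            count[reason]++
        }
        END { for (r in count) printf "%d\t%s\n", count[r], r }
    ' | sort -rn
}
```

Typical use would be `dmesg | mei_reset_summary`, which prints one "count<TAB>reason" line per distinct reason, most frequent first.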

It does, however, calm down after a few seconds, only to spew a few lines
once every 30 seconds:

[ 406.691032] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[ 406.691048] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 436.751033] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[ 436.751047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
[ 466.811030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
[ 466.811044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
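The "once every 30 seconds" cadence can be confirmed from the timestamps. A sketch, assuming the default dmesg timestamp format "[ seconds.micros]"; the function name is hypothetical. It prints the gap between consecutive "init clients timeout" resets; for the lines above every gap comes out as 30.060 seconds.

```shell
#!/bin/sh
# Print the time (in seconds) between consecutive mei_me
# "init clients timeout" resets, read from dmesg-style lines on stdin.
mei_timeout_intervals() {
    LC_ALL=C awk '
        /mei_me .*: reset: init clients timeout/ {
            split($0, f, "]")      # f[1] is "[ 346.571031"
            ts = f[1]
            sub(/^\[ */, "", ts)   # strip the leading "[" and padding
            if (prev != "") printf "%.3f\n", ts - prev
            prev = ts
        }
    '
}
```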

So it is not yet fixed, unfortunately.

Best regards,

Stefan

2013-06-19 08:52:45

by Tomas Winkler

Subject: RE: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

>
> However, it still does not work: commit
> 42f132febff3b7b42c6c9dbfc151f29233be3132 does not seem to help enough
> on my hardware.
>
> Even just unbinding and rebinding with
> echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/unbind
> echo 0000:00:03.0 > /sys/bus/pci/drivers/mei_me/bind
>
> still triggers a flood of messages like these:
> [ 318.330981] mei_me 0000:00:03.0: reset: wrong host start response
> [ 318.330984] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 318.330990] mei_me 0000:00:03.0: reset: unexpected enumeration response hbm.
> [ 318.330993] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 318.331016] mei_me 0000:00:03.0: reset: wrong host start response
> [ 318.331019] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 346.571031] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
> [ 346.571047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 376.631030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
> [ 376.631044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
>
> It does, however, calm down after a few seconds, only to spew a few lines
> once every 30 seconds:
>
> [ 406.691032] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
> [ 406.691048] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 436.751033] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
> [ 436.751047] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
> [ 466.811030] mei_me 0000:00:03.0: reset: init clients timeout hbm_state = 1.
> [ 466.811044] mei_me 0000:00:03.0: unexpected reset: dev_state = RESETTING
>
> So it is not yet fixed, unfortunately.

I'm not sure I understand how to reproduce it. Is it still failing on suspend/resume, or just on unbind/bind?
Would you be so kind as to send me the whole log?

Thanks
Tomas


2013-06-19 09:03:09

by Stefan Seyfried

Subject: Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

Hi Tomas,

On 19.06.2013 10:52, Winkler, Tomas wrote:

>> So it is not yet fixed, unfortunately.
>
> I'm not sure I understand how to reproduce it. Is it still failing on suspend/resume, or just on unbind/bind?
> Would you be so kind as to send me the whole log?

Both are still broken. I'm actually not sure whether the unbind/bind
problem is related to the suspend/resume failure at all. The messages
just looked similar to me, but that might not mean anything.

Sending the whole log is not easy, since it overflows the dmesg buffer
(I have CONFIG_LOG_BUF_SHIFT=18, which is usually "big enough") and
journald just exits and restarts itself under such flooding, but I'll try.
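For scale: CONFIG_LOG_BUF_SHIFT is the base-2 logarithm of the kernel log buffer size, so 18 means 2^18 bytes, i.e. 256 KiB. The helpers below only do that arithmetic (the function names are hypothetical). If the flood outruns this buffer, the kernel's log_buf_len= boot parameter (a power of two, e.g. log_buf_len=4M) can enlarge it, which may make capturing the whole log easier.

```shell
#!/bin/sh
# Kernel log buffer size in bytes implied by CONFIG_LOG_BUF_SHIFT=<shift>.
log_buf_bytes() {
    echo $((1 << $1))
}

# The same size in KiB, for comparison with a log_buf_len= setting.
log_buf_kib() {
    echo $(( (1 << $1) / 1024 ))
}
```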

Since resume from suspend to RAM hangs, it is hard to get any logs: resume
does not seem to get as far as restarting userspace before the machine dies,
so nothing gets into the logs. I never got the MEI serial port working, and
this ThinkPad has no "real" serial port.

The suspend/resume failure is easily reproduced by

* booting with "init=/bin/bash no_console_suspend"
* mount /sys
* echo mem > /sys/power/state
* resume => lots of messages, finally kernel panic.
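The same steps as a shell function, for typing into the init=/bin/bash shell. This is a sketch: repro_mei_resume and the SYSFS dry-run override are hypothetical, the explicit "mount -t sysfs sysfs /sys" stands in for "mount /sys" so it works without an fstab entry, and the check that "mem" is listed in /sys/power/state is an added precaution, not part of the original recipe.

```shell
#!/bin/sh
# Sketch of the reproduction recipe above (run as root from init=/bin/bash,
# booted with no_console_suspend). SYSFS is a hypothetical dry-run override.
repro_mei_resume() {
    sys="${SYSFS:-/sys}"
    # Step 2: with init=/bin/bash nothing has mounted sysfs yet.
    if [ ! -e "$sys/power/state" ]; then
        [ "$sys" = /sys ] || return 1
        mount -t sysfs sysfs /sys || return 1
    fi
    # Added precaution: /sys/power/state lists the sleep states on offer.
    if ! grep -qw mem "$sys/power/state"; then
        echo "suspend to RAM (mem) not offered" >&2
        return 1
    fi
    # Step 3: suspend to RAM; the errors and the panic follow on resume.
    echo mem > "$sys/power/state"
}
```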

As for bind/unbind: the driver is built in (this is the openSUSE
kernel-of-the-day), but unbinding/rebinding also reproducibly floods
the logs. It does not seem to have other side effects, but I cannot
test whether MEI actually still works afterwards.
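Whether the controller ended up bound to mei_me again after such a cycle can at least be read from sysfs, even though that says nothing about whether MEI actually works. A sketch, assuming the same device address; the function name and the MEI_DEV and SYSFS overrides are hypothetical.

```shell
#!/bin/sh
# Print the name of the driver bound to the MEI PCI device, or "none".
# MEI_DEV and SYSFS are hypothetical overrides (defaults: this report's
# device address and the real /sys).
mei_bound_driver() {
    link="${SYSFS:-/sys}/bus/pci/devices/${MEI_DEV:-0000:00:03.0}/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink into .../bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```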

I could try to take a picture of the panic, but it did not look directly
related; it looked more like a stack overflow after too many errors, or
something like that (it also takes a few seconds after resume for the
machine to panic).

Best regards,

Stefan

2013-06-29 19:46:43

by Stefan Seyfried

Subject: Re: INTEL_MEI_ME=y breaks suspend on 3.10-rc3

Hi all,

I hate to say it, but this regression from 3.9 is still present in
3.10-rc7 :-(

On 19.06.2013 11:02, Stefan Seyfried wrote:
> The suspend/resume failure is easily reproduced by
>
> * booting with "init=/bin/bash no_console_suspend"
> * mount /sys
> * echo mem > /sys/power/state
> * resume => lots of messages, finally kernel panic.

Best regards,

Stefan