2024-03-29 09:50:25

by Paul Menzel

Subject: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`

Dear Linux folks,


This is on a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, running
Debian sid/unstable and a self-built Linux 6.9-rc1+ with one patch on
top [1] and KASAN enabled.

    $ git log --no-decorate --oneline -2 a2ce022afcbb
    a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
    8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

After several ACPI S3 (deep) suspend and resume cycles, this morning I
noticed the error below:

[29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected

This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.

    if (!mei_cl_is_connected(cl)) {
        cl_err(dev, cl, "is not connected");
        rets = -ENODEV;
        goto out;
    }

with `drivers/misc/mei/client.h` containing:

    /**
     * mei_cl_is_connected - host client is connected
     *
     * @cl: host client
     *
     * Return: true if the host client is connected
     */
    static inline bool mei_cl_is_connected(const struct mei_cl *cl)
    {
        return cl->state == MEI_FILE_CONNECTED;
    }

Unfortunately, I do not know why the ME needs to be written to, what
the write was trying to send, or what the effect of this failure is.

Could you please take a look at it?


Kind regards,

Paul


[1]:
https://lore.kernel.org/all/20240326202548.GLZgMvTGpPfQcs2cQ_@fat_crate.local/


Attachments:
20240328--dell-xps-13-9360--linux-6.9-rc1+--messages.txt (200.09 kB)

2024-03-30 10:56:50

by Paul Menzel

Subject: Re: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`


Dear Tomas,


Thank you for your quick response.

On 30.03.24 at 11:50, Winkler, Tomas wrote:
>
>> -----Original Message-----
>> From: Paul Menzel <[email protected]>
>> Sent: Friday, March 29, 2024 12:49 PM

[…]

>> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
>> sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
>> KASAN enabled.
>>
>> $ git log --no-decorate --oneline -2 a2ce022afcbb
>> a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
>> 8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
>>
>> After several ACPI S3 (deep) suspend and resume cycles, this morning I
>> noticed the error below:
>>
>> [29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected
>>
>> This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.
>>
>> if (!mei_cl_is_connected(cl)) {
>> cl_err(dev, cl, "is not connected");
>> rets = -ENODEV;
>> goto out;
>> }
>>
>> with `drivers/misc/mei/client.h` containing:
>>
>> /**
>> * mei_cl_is_connected - host client is connected
>> *
>> * @cl: host client
>> *
>> * Return: true if the host client is connected
>> */
>> static inline bool mei_cl_is_connected(const struct mei_cl *cl)
>> {
>> return cl->state == MEI_FILE_CONNECTED;
>> }
>>
>> Unfortunately, I do not know at all, why the ME needs to be written to, and
>> what was tried to be written, and what the effect of this failure is.
>>
>> Could you please take a look at it?
>
> Looks like a timing issue between setting up HDCP by graphics and
> device power management. I don't think this is a really an issue if
> this is happening during power cycles stress.

Understood. Could this be because of the Address Sanitizer (KASAN)?

> Anyway we will look at that, will you be able to provide more debug
> information if we ask for it?

Thank you. Yes, I can test patches. But right now, I was only able to
see this once, so I am not sure how to reproduce it.


Kind regards,

Paul

2024-03-30 10:58:41

by Winkler, Tomas

Subject: RE: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`


> -----Original Message-----
> From: Paul Menzel <[email protected]>
> Sent: Friday, March 29, 2024 12:49 PM
> To: Winkler, Tomas <[email protected]>
> Cc: LKML <[email protected]>
> Subject: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not
> connected`
>
> Dear Linux folks,
>
>
> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
> sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
> KASAN enabled.
>
> $ git log --no-decorate --oneline -2 a2ce022afcbb
> a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c
> intermediaries
> 8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
>
> After several ACPI S3 (deep) suspend and resume cycles, this morning I
> noticed the error below:
>
> [29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected
>
> This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.
>
> if (!mei_cl_is_connected(cl)) {
> cl_err(dev, cl, "is not connected");
> rets = -ENODEV;
> goto out;
> }
>
> with `drivers/misc/mei/client.h` containing:
>
> /**
> * mei_cl_is_connected - host client is connected
> *
> * @cl: host client
> *
> * Return: true if the host client is connected
> */
> static inline bool mei_cl_is_connected(const struct mei_cl *cl)
> {
> return cl->state == MEI_FILE_CONNECTED;
> }
>
> Unfortunately, I do not know at all, why the ME needs to be written to, and
> what was tried to be written, and what the effect of this failure is.
>
> Could you please take a look at it?

Looks like a timing issue between the graphics driver setting up HDCP and
device power management. I don't think this is really an issue if it is
happening under power-cycle stress. Anyway, we will look at it. Will you
be able to provide more debug information if we ask for it?

Thanks
Tomas


2024-04-01 06:54:57

by Usyskin, Alexander

Subject: RE: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`

> -----Original Message-----
> From: Paul Menzel <[email protected]>
> Sent: Saturday, March 30, 2024 13:56
> To: Winkler, Tomas <[email protected]>; Usyskin, Alexander
> <[email protected]>
> Cc: LKML <[email protected]>
> Subject: Re: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not
> connected`
>
>
> Dear Tomas,
>
>
> Thank you for your quick response.
>
> On 30.03.24 at 11:50, Winkler, Tomas wrote:
> >
> >> -----Original Message-----
> >> From: Paul Menzel <[email protected]>
> >> Sent: Friday, March 29, 2024 12:49 PM
>
> […]
>
> >> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
> >> sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
> >> KASAN enabled.
> >>
> >> $ git log --no-decorate --oneline -2 a2ce022afcbb
> >> a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c
> intermediaries
> >> 8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
> >>
> >> After several ACPI S3 (deep) suspend and resume cycles, this morning I
> >> noticed the error below:
> >>
> >> [29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected
> >>
> >> This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.
> >>
> >> if (!mei_cl_is_connected(cl)) {
> >> cl_err(dev, cl, "is not connected");
> >> rets = -ENODEV;
> >> goto out;
> >> }
> >>
> >> with `drivers/misc/mei/client.h` containing:
> >>
> >> /**
> >> * mei_cl_is_connected - host client is connected
> >> *
> >> * @cl: host client
> >> *
> >> * Return: true if the host client is connected
> >> */
> >> static inline bool mei_cl_is_connected(const struct mei_cl *cl)
> >> {
> >> return cl->state == MEI_FILE_CONNECTED;
> >> }
> >>
> >> Unfortunately, I do not know at all, why the ME needs to be written to, and
> >> what was tried to be written, and what the effect of this failure is.
> >>
> >> Could you please take a look at it?
> >
> > Looks like a timing issue between setting up HDCP by graphics and
> > device power management. I don't think this is a really an issue if
> > this is happening during power cycles stress.
>
> Understood. Could this be because of the Address Sanitizer (KASAN)?
>
> > Anyway we will look at that, will you be able to provide more debug
> > information if we ask for it?
> Thank you. Yes, I can test patches. But right now, I was only able to
> see this once, so I am not sure how to reproduce it.
>

This print is in a code path that is executed from user space only.
It seems that some user-space application had a connection open before
suspend and tried to write after resume, but the driver closes all
connections on suspend.
This is the normal flow; user space should reopen the handle and retry in
this case.
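
The reopen-and-retry flow described above could look roughly like the
following user-space sketch. It is only an illustration under stated
assumptions: `struct mei_handle`, `mei_reopen()` and `mei_write_retry()` are
hypothetical names, not part of any MEI library, and the `connect` hook
stands in for the client connect step (in a real program it would issue
IOCTL_MEI_CONNECT_CLIENT for the wanted client UUID, which is not shown
here).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle type for this sketch; not part of any MEI library. */
struct mei_handle {
    const char *path;        /* device node, e.g. "/dev/mei0" */
    int fd;                  /* -1 while closed */
    int (*connect)(int fd);  /* optional hook for the client connect step;
                                returns 0 on success, may be NULL */
};

/* Close any stale descriptor, open the node again and reconnect. */
static int mei_reopen(struct mei_handle *h)
{
    assert(h && h->path);
    if (h->fd >= 0)
        close(h->fd);
    h->fd = open(h->path, O_RDWR);
    if (h->fd < 0)
        return -1;
    if (h->connect && h->connect(h->fd) != 0) {
        int saved = errno;   /* keep the connect error for the caller */
        close(h->fd);
        h->fd = -1;
        errno = saved;
        return -1;
    }
    return 0;
}

/*
 * Write one message. If the driver reports ENODEV (connection dropped,
 * e.g. across suspend/resume), reopen the handle and retry, at most
 * max_retries times. Returns the write(2) result, or -1 with errno set.
 */
static ssize_t mei_write_retry(struct mei_handle *h, const void *buf,
                               size_t len, int max_retries)
{
    if (h->fd < 0 && mei_reopen(h) != 0)
        return -1;

    for (int attempt = 0; ; attempt++) {
        ssize_t n = write(h->fd, buf, len);
        if (n >= 0 || errno != ENODEV || attempt >= max_retries)
            return n;
        if (mei_reopen(h) != 0)
            return -1;
    }
}
```

A caller would fill in `path` and `connect`, set `fd` to -1, and call
`mei_write_retry(&h, msg, msg_len, 1)`. Any state the firmware client kept
for the old connection is lost on reconnect, so the application may have to
restart its own protocol exchange as well.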

The print can be demoted to debug, I think.

--
Alexander (Sasha) Usyskin

CSE FW Dev - Host SW
Intel Israel (74) Limited



>
> Kind regards,
>
> Paul

2024-04-01 13:07:35

by Paul Menzel

Subject: Re: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`

Dear Alexander,


Thank you very much for your reply.

On 01.04.24 at 08:54, Usyskin, Alexander wrote:
>> -----Original Message-----

>> Sent: Saturday, March 30, 2024 13:56
>> On 30.03.24 at 11:50, Winkler, Tomas wrote:
>>>
>>>> -----Original Message-----
>>>> From: Paul Menzel <[email protected]>
>>>> Sent: Friday, March 29, 2024 12:49 PM
>>
>> […]
>>
>>>> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
>>>> sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
>>>> KASAN enabled.
>>>>
>>>> $ git log --no-decorate --oneline -2 a2ce022afcbb
>>>> a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
>>>> 8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
>>>>
>>>> After several ACPI S3 (deep) suspend and resume cycles, this morning I
>>>> noticed the error below:
>>>>
>>>> [29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected
>>>>
>>>> This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.
>>>>
>>>> if (!mei_cl_is_connected(cl)) {
>>>> cl_err(dev, cl, "is not connected");
>>>> rets = -ENODEV;
>>>> goto out;
>>>> }
>>>>
>>>> with `drivers/misc/mei/client.h` containing:
>>>>
>>>> /**
>>>> * mei_cl_is_connected - host client is connected
>>>> *
>>>> * @cl: host client
>>>> *
>>>> * Return: true if the host client is connected
>>>> */
>>>> static inline bool mei_cl_is_connected(const struct mei_cl *cl)
>>>> {
>>>> return cl->state == MEI_FILE_CONNECTED;
>>>> }
>>>>
>>>> Unfortunately, I do not know at all, why the ME needs to be written to, and
>>>> what was tried to be written, and what the effect of this failure is.
>>>>
>>>> Could you please take a look at it?
>>>
>>> Looks like a timing issue between setting up HDCP by graphics and
>>> device power management. I don't think this is a really an issue if
>>> this is happening during power cycles stress.
>>
>> Understood. Could this be because of the Address Sanitizer (KASAN)?
>>
>>> Anyway we will look at that, will you be able to provide more debug
>>> information if we ask for it?
>>
>> Thank you. Yes, I can test patches. But right now, I was only able to
>> see this once, so I am not sure how to reproduce it.
>
> This print is in the code path executed from user-space only. Seem
> like some user space app have had connection opened before suspend
> and tried to write after resume, but driver closed all connections on
> suspend. This is normal flow; user space should reopen handle and
> retry in this case.

Interesting. Which user space program could this be?

> The print can be demoted to debug, I think.

Understood. Still, maybe the message could also be extended, so that the
cause and solution can be deduced from the Linux logs.


Kind regards,

Paul


PS: Only if you care:

> --
> Alexander (Sasha) Usyskin

Your signature delimiter misses a trailing space at the end [1].


[1]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter

2024-04-01 15:13:38

by Usyskin, Alexander

Subject: RE: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`

> -----Original Message-----
> From: Paul Menzel <[email protected]>
> Sent: Monday, April 01, 2024 16:07
> To: Usyskin, Alexander <[email protected]>; Winkler, Tomas
> <[email protected]>
> Cc: LKML <[email protected]>
> Subject: Re: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not
> connected`
>
> Dear Alexander,
>
>
> Thank you very much for your reply.
>
> On 01.04.24 at 08:54, Usyskin, Alexander wrote:
> >> -----Original Message-----
>
> >> Sent: Saturday, March 30, 2024 13:56
> >> On 30.03.24 at 11:50, Winkler, Tomas wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Paul Menzel <[email protected]>
> >>>> Sent: Friday, March 29, 2024 12:49 PM
> >>
> >> […]
> >>
> >>>> On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
> >>>> sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
> >>>> KASAN enabled.
> >>>>
> >>>> $ git log --no-decorate --oneline -2 a2ce022afcbb
> >>>> a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c
> intermediaries
> >>>> 8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
> >>>>
> >>>> After several ACPI S3 (deep) suspend and resume cycles, this morning I
> >>>> noticed the error below:
> >>>>
> >>>> [29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not
> connected
> >>>>
> >>>> This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.
> >>>>
> >>>> if (!mei_cl_is_connected(cl)) {
> >>>> cl_err(dev, cl, "is not connected");
> >>>> rets = -ENODEV;
> >>>> goto out;
> >>>> }
> >>>>
> >>>> with `drivers/misc/mei/client.h` containing:
> >>>>
> >>>> /**
> >>>> * mei_cl_is_connected - host client is connected
> >>>> *
> >>>> * @cl: host client
> >>>> *
> >>>> * Return: true if the host client is connected
> >>>> */
> >>>> static inline bool mei_cl_is_connected(const struct mei_cl *cl)
> >>>> {
> >>>> return cl->state == MEI_FILE_CONNECTED;
> >>>> }
> >>>>
> >>>> Unfortunately, I do not know at all, why the ME needs to be written to, and
> >>>> what was tried to be written, and what the effect of this failure is.
> >>>>
> >>>> Could you please take a look at it?
> >>>
> >>> Looks like a timing issue between setting up HDCP by graphics and
> >>> device power management. I don't think this is a really an issue if
> >>> this is happening during power cycles stress.
> >>
> >> Understood. Could this be because of the Address Sanitizer (KASAN)?
> >>
> >>> Anyway we will look at that, will you be able to provide more debug
> >>> information if we ask for it?
> >>
> >> Thank you. Yes, I can test patches. But right now, I was only able to
> >> see this once, so I am not sure how to reproduce it.
> >
> > This print is in the code path executed from user-space only. Seem
> > like some user space app have had connection opened before suspend
> > and tried to write after resume, but driver closed all connections on
> > suspend. This is normal flow; user space should reopen handle and
> > retry in this case.
>
> Interesting. Would user space program could this be?
>

You can check which process has /dev/mei* open, using lsof or similar.
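
If lsof is not at hand, the same lookup can be sketched directly against
procfs. This is a minimal sketch, assuming a Linux system with /proc
mounted; `list_openers()` is a hypothetical helper name. Run it as root,
otherwise the descriptors of other users' processes are skipped silently.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Print "pid N fd M -> target" for every open descriptor whose symlink
 * target in /proc/<pid>/fd starts with prefix (e.g. "/dev/mei").
 * Returns the number of matches, or -1 if /proc cannot be read.
 */
int list_openers(const char *prefix, FILE *out)
{
    assert(prefix && out);

    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    int matches = 0;
    size_t plen = strlen(prefix);
    struct dirent *p;

    while ((p = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)p->d_name[0]))
            continue;                      /* not a process directory */

        char fddir[300];
        snprintf(fddir, sizeof fddir, "/proc/%s/fd", p->d_name);
        DIR *fds = opendir(fddir);
        if (!fds)
            continue;                      /* no permission, or it exited */

        struct dirent *f;
        while ((f = readdir(fds)) != NULL) {
            char link[600], target[4096];
            snprintf(link, sizeof link, "%s/%s", fddir, f->d_name);
            ssize_t n = readlink(link, target, sizeof target - 1);
            if (n < 0)
                continue;                  /* "." and ".." are not symlinks */
            target[n] = '\0';
            if (strncmp(target, prefix, plen) == 0) {
                fprintf(out, "pid %s fd %s -> %s\n",
                        p->d_name, f->d_name, target);
                matches++;
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return matches;
}
```

Calling `list_openers("/dev/mei", stdout)` from a small main() lists every
process that currently holds a MEI device node open.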

> > The print can be demoted to debug, I think.
>
> Understood. Still maybe it could be extended too, so the cause/solution
> could be deduced from the Linux logs.
>
This can also be a genuine user-space error, when an app breaks the protocol.
The driver has no data to distinguish the two cases.

>
> Kind regards,
>
> Paul
>
>
> PS: Only if you care:
>
> > --
> > Alexander (Sasha) Usyskin
>
> Your signature delimiter misses a trailing space at the end [1].
>
>
> [1]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter

Thanks, fixed; it should be right now.

--
Alexander (Sasha) Usyskin

CSE FW Dev - Host SW
Intel Israel (74) Limited