2016-04-22 06:39:41

by Philip Müller

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

Hi Greg, hi Sasha,

It seems I found another regression within the latest point releases of the
3.18 and 4.1 kernel series. We tested it on AMD and Intel CPUs so far.
They hit the same regression. Other kernels released on that day are not
affected. Do you guys have a clue what might have been missed here?

3.18.30 and 4.1.21 didn't have that issue on the same hardware.

kind regards
Philip Müller
--------------------------
Manjaro Project Lead

https://github.com/manjaro/packages-core/issues/36


2016-04-22 06:46:40

by Greg Kroah-Hartman

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On Fri, Apr 22, 2016 at 08:17:58AM +0200, Philip Müller wrote:
> Hi Greg, hi Sasha,
>
> It seems I found another regression within the latest point releases of the
> 3.18 and 4.1 kernel series. We tested it on AMD and Intel CPUs so far.
> They hit the same regression. Other kernels released on that day are not
> affected. Do you guys have a clue what might have been missed here?
>
> 3.18.30 and 4.1.21 didn't have that issue on the same hardware.

You are going to have to be a bit more specific here...
What is the oops message? How do you reproduce this? Does it also
happen on 4.6-rc4?

Can you run 'git bisect' to find the offending patch?

thanks,

greg k-h

2016-04-22 07:52:20

by Sebastian M. Bobrecki

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

Hi,

I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.

On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
> ...
> You are going to have to be a bit more specific here...
> What is the oops message? How do you reproduce this? Does it also
> happen on 4.6-rc4?
>
> Can you run 'git bisect' to find the offending patch?
>
Greg, have you seen the screenshots linked by Philip?

--
Sebastian M. Bobrecki

2016-04-22 07:56:13

by Greg Kroah-Hartman

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On Fri, Apr 22, 2016 at 09:47:04AM +0200, Sebastian M. Bobrecki wrote:
> Hi,
>
> I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.
>
> On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
> > ...
> > You are going to have to be a bit more specific here...
> > What is the oops message? How do you reproduce this? Does it also
> > happen on 4.6-rc4?
> >
> > Can you run 'git bisect' to find the offending patch?
> >
> Greg, have you seen the screenshots linked by Philip?

I saw no such screenshots in the email.

2016-04-22 08:11:14

by Sebastian M. Bobrecki

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On 22.04.2016 at 09:55, Greg Kroah-Hartman wrote:
> On Fri, Apr 22, 2016 at 09:47:04AM +0200, Sebastian M. Bobrecki wrote:
>> Hi,
>>
>> I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.
>>
>> On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
>>> ...
>>> You are going to have to be a bit more specific here...
>>> What is the oops message? How do you reproduce this? Does it also
>>> happen on 4.6-rc4?
>>>
>>> Can you run 'git bisect' to find the offending patch?
>>>
>> Greg, have you seen the screenshots linked by Philip?
> I saw no such screenshots in the email.
They are here: https://github.com/manjaro/packages-core/issues/36

--
Sebastian M. Bobrecki

2016-04-22 08:23:43

by Greg Kroah-Hartman

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On Fri, Apr 22, 2016 at 10:10:59AM +0200, Sebastian M. Bobrecki wrote:
> On 22.04.2016 at 09:55, Greg Kroah-Hartman wrote:
> > On Fri, Apr 22, 2016 at 09:47:04AM +0200, Sebastian M. Bobrecki wrote:
> > > Hi,
> > >
> > > I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.
> > >
> > > On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
> > > > ...
> > > > You are going to have to be a bit more specific here...
> > > > What is the oops message? How do you reproduce this? Does it also
> > > > happen on 4.6-rc4?
> > > >
> > > > Can you run 'git bisect' to find the offending patch?
> > > >
> > > Greg, have you seen the screenshots linked by Philip?
> > I saw no such screenshots in the email.
> They are here: https://github.com/manjaro/packages-core/issues/36

Looks like an acpi thermal patch got backported incorrectly, again, 'git
bisect' is going to help out the best here.

thanks,

greg k-h

2016-04-22 10:16:53

by Mike Galbraith

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On Fri, 2016-04-22 at 17:23 +0900, Greg Kroah-Hartman wrote:
> On Fri, Apr 22, 2016 at 10:10:59AM +0200, Sebastian M. Bobrecki wrote:
> > On 22.04.2016 at 09:55, Greg Kroah-Hartman wrote:
> > > On Fri, Apr 22, 2016 at 09:47:04AM +0200, Sebastian M. Bobrecki wrote:
> > > > Hi,
> > > >
> > > > I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.
> > > >
> > > > On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
> > > > > ...
> > > > > You are going to have to be a bit more specific here...
> > > > > What is the oops message? How do you reproduce this? Does it also
> > > > > happen on 4.6-rc4?
> > > > >
> > > > > Can you run 'git bisect' to find the offending patch?
> > > > >
> > > > Greg, have you seen the screenshots linked by Philip?
> > > I saw no such screenshots in the email.
> > They are here: https://github.com/manjaro/packages-core/issues/36
>
> Looks like an acpi thermal patch got backported incorrectly, again, 'git
> bisect' is going to help out the best here.

That'll work, but requires repeatedly ignoring the big-fat-warning :)

Backport of 81ad4276b505e987dd8ebbdf63605f92cd172b52 failed to adjust
for intervening ->get_trip_temp() argument type change, thus causing
stack protector to panic.

drivers/thermal/thermal_core.c: In function ‘thermal_zone_device_register’:
drivers/thermal/thermal_core.c:1569:41: warning: passing argument 3 of ‘tz->ops->get_trip_temp’ from incompatible pointer type [-Wincompatible-pointer-types]
  if (tz->ops->get_trip_temp(tz, count, &trip_temp))
                                        ^
drivers/thermal/thermal_core.c:1569:41: note: expected ‘long unsigned int *’ but argument is of type ‘int *’

CC: <[email protected]> #3.18,#4.1
Signed-off-by: Mike Galbraith <[email protected]>
---
drivers/thermal/thermal_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1491,7 +1491,7 @@ struct thermal_zone_device *thermal_zone
 {
 	struct thermal_zone_device *tz;
 	enum thermal_trip_type trip_type;
-	int trip_temp;
+	unsigned long trip_temp;
 	int result;
 	int count;
 	int passive = 0;

2016-04-23 02:03:05

by Sasha Levin

Subject: Re: [regression] linux318, linux41 - kernel stack is corrupted

On 04/22/2016 06:16 AM, Mike Galbraith wrote:
> On Fri, 2016-04-22 at 17:23 +0900, Greg Kroah-Hartman wrote:
>> On Fri, Apr 22, 2016 at 10:10:59AM +0200, Sebastian M. Bobrecki wrote:
>>> On 22.04.2016 at 09:55, Greg Kroah-Hartman wrote:
>>>> On Fri, Apr 22, 2016 at 09:47:04AM +0200, Sebastian M. Bobrecki wrote:
>>>>> Hi,
>>>>>
>>>>> I just hit the same with 4.1.22 on Gentoo. 4.1.21 is working fine.
>>>>>
>>>>> On 22.04.2016 at 08:46, Greg Kroah-Hartman wrote:
>>>>>> ...
>>>>>> You are going to have to be a bit more specific here...
>>>>>> What is the oops message? How do you reproduce this? Does it also
>>>>>> happen on 4.6-rc4?
>>>>>>
>>>>>> Can you run 'git bisect' to find the offending patch?
>>>>>>
>>>>> Greg, have you seen the screenshots linked by Philip?
>>>> I saw no such screenshots in the email.
>>> They are here: https://github.com/manjaro/packages-core/issues/36
>>
>> Looks like an acpi thermal patch got backported incorrectly, again, 'git
>> bisect' is going to help out the best here.
>
> That'll work, but requires repeatedly ignoring the big-fat-warning :)
>
> Backport of 81ad4276b505e987dd8ebbdf63605f92cd172b52 failed to adjust
> for intervening ->get_trip_temp() argument type change, thus causing
> stack protector to panic.
>
> drivers/thermal/thermal_core.c: In function ‘thermal_zone_device_register’:
> drivers/thermal/thermal_core.c:1569:41: warning: passing argument 3 of
> ‘tz->ops->get_trip_temp’ from incompatible pointer type [-Wincompatible-pointer-types]
> if (tz->ops->get_trip_temp(tz, count, &trip_temp))
> ^
> drivers/thermal/thermal_core.c:1569:41: note: expected ‘long unsigned int *’
> but argument is of type ‘int *’
>
> CC: <[email protected]> #3.18,#4.1
> Signed-off-by: Mike Galbraith <[email protected]>
> ---
> drivers/thermal/thermal_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1491,7 +1491,7 @@ struct thermal_zone_device *thermal_zone
> {
> struct thermal_zone_device *tz;
> enum thermal_trip_type trip_type;
> - int trip_temp;
> + unsigned long trip_temp;
> int result;
> int count;
> int passive = 0;
>

Thanks!

I'll put it on both 3.18 and 4.1, and will try to ship it within a day or
two once all tests have gone through.


Thanks,
Sasha