2011-03-09 09:46:47

by Wang Lei

Subject: [BUG] Fans keep running, first found since v2.6.38-rc7

Hi, hackers!

I have reported a few bugs before and always got kind help, even though one
issue is still not fixed, leaving my SBx00 sound card without sound.

Recently I encountered another problem: after startup, all fans keep
running. I guess it's a bug. I first saw this with v2.6.38-rc7, and it still
exists in v2.6.38-rc8. The version I'm running now, v2.6.38-rc6+, is OK.

Running cat /sys/class/thermal/thermal_zone0/trip_point_*_temp on the latest
v2.6.38-rc8 gives this output:

[~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
105000
15900
15900
15900
15900
15900
[~]$

This is the output at v2.6.38-rc6+:

[~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
105000
95000
75000
65000
55000
40000
[~]$

Any help is appreciated.

Cheers,
Lei
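
The sysfs trip-point files report temperatures in millidegrees Celsius, so 105000 is 105 °C while 15900 is only 15.9 °C, which is consistent with the fans never switching off. A minimal sketch, assuming the two outputs above are pasted in as strings; the helper names and the 30 °C plausibility floor are illustrative assumptions, not part of any kernel interface:

```python
def parse_trip_points(output):
    """Convert one millidegree value per line into degrees Celsius."""
    return [int(line) / 1000.0 for line in output.split()]


def suspicious(trips_c, floor_c=30.0):
    """Return the trip points below an assumed plausibility floor."""
    return [t for t in trips_c if t < floor_c]


# Values copied from the two sysfs dumps in the report.
RC8 = "105000\n15900\n15900\n15900\n15900\n15900"
RC6 = "105000\n95000\n75000\n65000\n55000\n40000"

if __name__ == "__main__":
    for name, out in (("v2.6.38-rc8", RC8), ("v2.6.38-rc6+", RC6)):
        trips = parse_trip_points(out)
        print(name, trips, "suspicious:", suspicious(trips))
```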


2011-03-09 10:41:29

by Rafael J. Wysocki

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

On Wednesday, March 09, 2011, Wang Lei wrote:
> Hi, hackers!
>
> I have reported a few bugs before and always got kind help, even though one
> issue is still not fixed, leaving my SBx00 sound card without sound.
>
> Recently I encountered another problem: after startup, all fans keep
> running. I guess it's a bug. I first saw this with v2.6.38-rc7, and it still
> exists in v2.6.38-rc8. The version I'm running now, v2.6.38-rc6+, is OK.
>
> Running cat /sys/class/thermal/thermal_zone0/trip_point_*_temp on the latest
> v2.6.38-rc8 gives this output:
>
> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
> 105000
> 15900
> 15900
> 15900
> 15900
> 15900
> [~]$
>
> This is the output at v2.6.38-rc6+:
>
> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
> 105000
> 95000
> 75000
> 65000
> 55000
> 40000
> [~]$
>
> Any help is appreciated.

There was only one commit in that area since 2.6.38-rc6, but it shouldn't
affect the functionality this way.

Is your thermal management controlled by ACPI?

Rafael

2011-03-09 13:28:14

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

"Rafael J. Wysocki" <[email protected]> writes:

>
> There was only one commit in that area since 2.6.38-rc6, but it shouldn't
> affect the functionality this way.
>
> Is your thermal management controlled by ACPI?
>
> Rafael

Thanks for your reply!

How could I know that?

--
Regards,
Lei

2011-03-09 20:25:55

by Rafael J. Wysocki

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

On Wednesday, March 09, 2011, Wang Lei wrote:
> "Rafael J. Wysocki" <[email protected]> writes:
>
> >
> > There was only one commit in that area since 2.6.38-rc6, but it shouldn't
> > affect the functionality this way.
> >
> > Is your thermal management controlled by ACPI?
> >
> > Rafael
>
> Thanks for your reply!
>
> How could I know that?

What does "ls -l /sys/class/thermal/cooling_device0/device" say?

Rafael

2011-03-09 23:16:04

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

"Rafael J. Wysocki" <[email protected]> writes:

> On Wednesday, March 09, 2011, Wang Lei wrote:
>> "Rafael J. Wysocki" <[email protected]> writes:
>>
>> >
>> > There was only one commit in that area since 2.6.38-rc6, but it shouldn't
>> > affect the functionality this way.
>> >
>> > Is your thermal management controlled by ACPI?
>> >
>> > Rafael
>>
>> Thanks for your reply!
>>
>> How could I know that?
>
> What does "ls -l /sys/class/thermal/cooling_device0/device" say?
>
> Rafael

[~]$ ls -l /sys/class/thermal/cooling_device0/device
lrwxrwxrwx 1 root root 0 Mar 10 07:13 /sys/class/thermal/cooling_device0/device -> ../../../LNXSYSTM:00/device:47/PNP0C0B:00
[~]$

--
Regards,
Lei

2011-03-10 00:06:49

by Rafael J. Wysocki

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

On Thursday, March 10, 2011, Wang Lei wrote:
> "Rafael J. Wysocki" <[email protected]> writes:
>
> > On Wednesday, March 09, 2011, Wang Lei wrote:
> >> "Rafael J. Wysocki" <[email protected]> writes:
> >>
> >> >
> >> > There was only one commit in that area since 2.6.38-rc6, but it shouldn't
> >> > affect the functionality this way.
> >> >
> >> > Is your thermal management controlled by ACPI?
> >> >
> >> > Rafael
> >>
> >> Thanks for your reply!
> >>
> >> How could I know that?
> >
> > What does "ls -l /sys/class/thermal/cooling_device0/device" say?
> >
> > Rafael
>
> [~]$ ls -l /sys/class/thermal/cooling_device0/device
> lrwxrwxrwx 1 root root 0 Mar 10 07:13 /sys/class/thermal/cooling_device0/device -> ../../../LNXSYSTM:00/device:47/PNP0C0B:00
> [~]$

That's ACPI.

I don't know, however, which change might cause the problem to happen.

Can you bisect the commits between 2.6.38-rc6 and -rc7 to find the one that
introduced the issue?

Rafael
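
The symlink target above passes through LNXSYSTM (the root of the ACPI device tree in sysfs) and ends in PNP0C0B (the ACPI hardware ID of a fan), which is what marks the cooling device as ACPI-managed. A rough sketch of the same check, assuming the link target is available as a string; both function names are hypothetical:

```python
import os


def looks_like_acpi(link_target):
    """Heuristic: the device path passes through the ACPI root LNXSYSTM."""
    return any(part.startswith("LNXSYSTM") for part in link_target.split("/"))


def is_acpi_fan(link_target):
    """PNP0C0B is the ACPI hardware ID of a fan device."""
    last = os.path.basename(link_target)
    return looks_like_acpi(link_target) and last.startswith("PNP0C0B")


if __name__ == "__main__":
    path = "/sys/class/thermal/cooling_device0/device"
    if os.path.islink(path):
        target = os.readlink(path)
        print(path, "->", target, "ACPI fan:", is_acpi_fan(target))
```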

2011-03-10 11:59:29

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

"Rafael J. Wysocki" <[email protected]> writes:

> On Thursday, March 10, 2011, Wang Lei wrote:
>> "Rafael J. Wysocki" <[email protected]> writes:
>>
>> > On Wednesday, March 09, 2011, Wang Lei wrote:
>> >> "Rafael J. Wysocki" <[email protected]> writes:
>> >>
>> >> >
>> >> > There was only one commit in that area since 2.6.38-rc6, but it shouldn't
>> >> > affect the functionality this way.
>> >> >
>> >> > Is your thermal management controlled by ACPI?
>> >> >
>> >> > Rafael
>> >>
>> >> Thanks for your reply!
>> >>
>> >> How could I know that?
>> >
>> > What does "ls -l /sys/class/thermal/cooling_device0/device" say?
>> >
>> > Rafael
>>
>> [~]$ ls -l /sys/class/thermal/cooling_device0/device
>> lrwxrwxrwx 1 root root 0 Mar 10 07:13 /sys/class/thermal/cooling_device0/device -> ../../../LNXSYSTM:00/device:47/PNP0C0B:00
>> [~]$
>
> That's ACPI.
>
> I don't know, however, which change might cause the problem to happen.
>
> Can you bisect the commits between 2.6.38-rc6 and -rc7 to find the one that
> introduced the issue?
>
> Rafael

OK, I'll try.

--
Regards,
Lei

2011-03-10 21:31:02

by Henrik Rydberg

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

Hi,

> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
> 105000
> 15900
> 15900
> 15900
> 15900
> 15900
> [~]$
>
> This is the output at v2.6.38-rc6+:
>
> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
> 105000
> 95000
> 75000
> 65000
> 55000
> 40000
> [~]$
>
> Any help is appreciated.
>
> Cheers,
> Lei

I wonder if this problem is related, present in latest git (35d34df71):

mainline>cat /sys/module/*/srcversion | sort | uniq -c
1 03B88A441752AC2157DBD6A
1 0C932B9F52535F81D3FFF93
1 1BB88B035855B2AE7F47229
1 2100DB8F374F0EE1AA607F2
1 2BBC9DD51CFC881CF5F60AA
1 3D4D8AFD09A0C71D9B19948
1 3EF20C25CC62BD750F4C3F3
1 41024DA8E830C7DAE171017
1 47AD35AE180473EB06EED32
19 533BB7E5866E52F63B9ACCB
1 598C709DDDAB55EB331379A
1 5AC5CB9DA8C242CBC76EEC0
1 6961679677DB019310F9046
1 6BBB3E29D835F5EC8C2FCF9
1 78FBC4BE6E0D68CC70FBFEC
1 97A6441C3D26B1B6A9B2B2B
1 AAD974CC23F320629986F38
2 AEBBDFD273E0316FD4E5D04
1 CC2F5468F15E654BA13360D
1 CD18C3E8FA6CD119EE11935
1 EA10C238A334BB44445DC83
1 EF2B1B70D28CB9290D97726

It shows that 19 modules exhibit the same module srcversion... This
behavior showed up late in -rc's, I am going to bisect now (unless
this is already done).

Henrik
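
The pipeline above only counts duplicates. To see which modules share a srcversion, a small sketch along these lines may help; it assumes /sys/module/<name>/srcversion is present (this depends on the kernel configuration), and the function names are illustrative:

```python
import glob
import os
from collections import defaultdict


def shared_srcversions(versions):
    """Map each srcversion used by more than one module to those modules."""
    groups = defaultdict(list)
    for module, srcversion in versions.items():
        groups[srcversion].append(module)
    return {v: sorted(m) for v, m in groups.items() if len(m) > 1}


def read_srcversions(root="/sys/module"):
    """Read the srcversion of every module that exposes one."""
    versions = {}
    for path in glob.glob(os.path.join(root, "*", "srcversion")):
        module = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            versions[module] = f.read().strip()
    return versions


if __name__ == "__main__":
    for srcversion, modules in shared_srcversions(read_srcversions()).items():
        print(len(modules), srcversion, " ".join(modules))
```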

2011-03-11 00:22:35

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

"Rafael J. Wysocki" <[email protected]> writes:

> On Thursday, March 10, 2011, Wang Lei wrote:
>> "Rafael J. Wysocki" <[email protected]> writes:
>>
>> > On Wednesday, March 09, 2011, Wang Lei wrote:
>> >> "Rafael J. Wysocki" <[email protected]> writes:
>> >>
>> >> >
>> >> > There was only one commit in that area since 2.6.38-rc6, but it shouldn't
>> >> > affect the functionality this way.
>> >> >
>> >> > Is your thermal management controlled by ACPI?
>> >> >
>> >> > Rafael
>> >>
>> >> Thanks for your reply!
>> >>
>> >> How could I know that?
>> >
>> > What does "ls -l /sys/class/thermal/cooling_device0/device" say?
>> >
>> > Rafael
>>
>> [~]$ ls -l /sys/class/thermal/cooling_device0/device
>> lrwxrwxrwx 1 root root 0 Mar 10 07:13 /sys/class/thermal/cooling_device0/device -> ../../../LNXSYSTM:00/device:47/PNP0C0B:00
>> [~]$
>
> That's ACPI.
>
> I don't know, however, which change might cause the problem to happen.
>
> Can you bisect the commits between 2.6.38-rc6 and -rc7 to find the one that
> introduced the issue?
>
> Rafael

Thanks, Rafael.
Bisect stopped at commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12

--------------------
[~/repository/kernel]$ git bisect good
7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 is the first bad commit
commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12
Author: Andreas Herrmann <[email protected]>
Date: Thu Feb 24 15:53:46 2011 +0100

    x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems

    On some SB800 systems polarity for IOAPIC pin2 is wrongly
    specified as low active by BIOS. This caused system hangs after
    resume from S3 when HPET was used in one-shot mode on such
    systems because a timer interrupt was missed (HPET signal is
    high active).

    For more details see:

    http://marc.info/?l=linux-kernel&m=129623757413868

    Tested-by: Manoj Iyer <[email protected]>
    Tested-by: Andre Przywara <[email protected]>
    Signed-off-by: Andreas Herrmann <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: [email protected] # 37.x, 32.x
    LKML-Reference: <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 918adb3e08ef8cd258016dc46afab842e1be65fc 77d76d6f2451b16f963ce1ffe183aacfed9d994d M arch
[~/repository/kernel]$
--------------------

So I have Cc'ed Andreas Herrmann; I hope you will notice this and help fix it.

Thanks, all you hackers!

--
Regards,
Lei
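
git bisect finds the first bad commit by binary search over the range between a known-good and a known-bad revision. A toy sketch of that search, assuming a linear history and a caller-supplied is_bad test (both are simplifications, and the commit labels are placeholders rather than real commits):

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


if __name__ == "__main__":
    history = ["c%d" % i for i in range(1, 9)]   # placeholder commit names
    print(first_bad(history, lambda c: int(c[1:]) >= 6))
```

Each step halves the remaining range, which is why bisecting the commits between two release candidates needs only a handful of test boots.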

2011-03-11 00:39:11

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7


Hi, Henrik Rydberg.

"Henrik Rydberg" <[email protected]> writes:

> Hi,
>
>> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
>> 105000
>> 15900
>> 15900
>> 15900
>> 15900
>> 15900
>> [~]$
>>
>> This is the output at v2.6.38-rc6+:
>>
>> [~]$ cat /sys/class/thermal/thermal_zone0/trip_point_*_temp
>> 105000
>> 95000
>> 75000
>> 65000
>> 55000
>> 40000
>> [~]$
>>
>> Any help is appreciated.
>>
>> Cheers,
>> Lei
>
> I wonder if this problem is related, present in latest git (35d34df71):
>
> mainline>cat /sys/module/*/srcversion | sort | uniq -c
> 1 03B88A441752AC2157DBD6A
> 1 0C932B9F52535F81D3FFF93
> 1 1BB88B035855B2AE7F47229
> 1 2100DB8F374F0EE1AA607F2
> 1 2BBC9DD51CFC881CF5F60AA
> 1 3D4D8AFD09A0C71D9B19948
> 1 3EF20C25CC62BD750F4C3F3
> 1 41024DA8E830C7DAE171017
> 1 47AD35AE180473EB06EED32
> 19 533BB7E5866E52F63B9ACCB
> 1 598C709DDDAB55EB331379A
> 1 5AC5CB9DA8C242CBC76EEC0
> 1 6961679677DB019310F9046
> 1 6BBB3E29D835F5EC8C2FCF9
> 1 78FBC4BE6E0D68CC70FBFEC
> 1 97A6441C3D26B1B6A9B2B2B
> 1 AAD974CC23F320629986F38
> 2 AEBBDFD273E0316FD4E5D04
> 1 CC2F5468F15E654BA13360D
> 1 CD18C3E8FA6CD119EE11935
> 1 EA10C238A334BB44445DC83
> 1 EF2B1B70D28CB9290D97726
>
> It shows that 19 modules exhibit the same module srcversion... This
> behavior showed up late in -rc's, I am going to bisect now (unless
> this is already done).
>
> Henrik

I've confirmed the commit that caused the problem and sent a mail.

I don't know if these two problems are related, because I ran your
command but there is no `/sys/module/*/srcversion' on my system. I think
this is because of some config options I didn't enable. But I'll wait to see if
I've provided enough information to help you hackers fix it.

Thanks!

--
Regards,
Lei

by Andreas Herrmann

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

Hi Lei,

can you please tell me which system you are using and provide dmesg
(please boot with the apic=debug kernel parameter) and the output of
lspci -nnxxxx (run as root)?

This reminds me of some HP laptop issues, see
https://bugzilla.kernel.org/show_bug.cgi?id=11715
https://bugzilla.kernel.org/show_bug.cgi?id=11516
The BIOSes on HP laptops somehow checked IO-APIC configuration
and set trip_points differently (for unknown reason).

But I need above debug info to sort this out.


Thanks,

Andreas


--
Operating | Advanced Micro Devices GmbH
System    | Einsteinring 24, 85609 Dornach b. München, Germany
Research  | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC)    | Registergericht München, HRB Nr. 43632

2011-03-11 07:59:53

by Henrik Rydberg

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

> I've confirmed the commit that caused the problem and sent a mail.
>
> I don't know if these two problems are related, because I ran your
> command but there is no `/sys/module/*/srcversion' on my system. I think
> this is because of some config options I didn't enable. But I'll wait to see if
> I've provided enough information to help you hackers fix it.

Hm, looks like a different problem, then. Mine stops at

commit b7bd182176960fdd139486cadb9962b39f8a2b50
Author: Michal Marek <[email protected]>

fixdep: Do not record dependency on the source file itself

This looks like a build change, so perhaps a symptom rather than the
actual problem. Nevertheless, that's where it stops.

Thanks,
Henrik

2011-03-11 10:47:33

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7


> This reminds me of some HP laptop issues, see
> https://bugzilla.kernel.org/show_bug.cgi?id=11715
> https://bugzilla.kernel.org/show_bug.cgi?id=11516
> The BIOSes on HP laptops somehow checked IO-APIC configuration
> and set trip_points differently (for unknown reason).
>
> But I need above debug info to sort this out.
>

I am running an HP Compaq 6515b laptop (unfortunately). Sigh.

--
Regards,
Lei


Attachments:
dmesg (32.99 kB)
lspci (18.96 kB)

2011-03-11 11:38:47

by Andreas Herrmann

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

On Fri, Mar 11, 2011 at 06:47:34PM +0800, Wang Lei wrote:
> Hi Andreas,
>
> Thanks for your reply.
>
> Andreas Herrmann <[email protected]> writes:
>
> > Hi Lei,
> >
> > can you please tell me which system you are using and provide dmesg
> > (please boot with the apic=debug kernel parameter) and the output of
> > lspci -nnxxxx (run as root)?
> >
>
> I'm using Debian sid with a customized kernel. dmesg and lspci are
> attached.

> Initializing cgroup subsys cpuset
> Initializing cgroup subsys cpu
> Linux version 2.6.38-rc6+ (root@localhost) (gcc version 4.4.5 (Debian 4.4.5-12) ) #1 SMP Fri Feb 25 20:01:40 CST 2011
> Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.38-rc6+ root=/dev/sda1 ro apic=debug

So that was the good case (all working as expected).

> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 0000000097fb0000 (usable)
> BIOS-e820: 0000000097fb0000 - 0000000097fc8000 (reserved)
> BIOS-e820: 0000000097fc8000 - 0000000097fe7fb8 (ACPI NVS)
> BIOS-e820: 0000000097fe7fb8 - 00000000a0000000 (reserved)
> BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> BIOS-e820: 00000000fec00000 - 00000000fec02000 (reserved)
> BIOS-e820: 00000000ffbc0000 - 00000000ffcc0000 (reserved)
> BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> NX (Execute Disable) protection: active
> DMI 2.4 present.
> DMI: Hewlett-Packard HP Compaq 6515b (GL087PA#AB2)/30C2, BIOS 68YTT Ver. F.05 04/26/2007

^^^^^^^^^^^^^^^

Hmm, I thought that this might be the case.

Can you please try booting -rc7 with the kernel parameter acpi_skip_timer_override
and send the same output (dmesg, again with apic=debug, and lspci -nnxxxx)?


Thanks,

Andreas

PS: My assumption is that the patch in -rc7 leads to usage of IO-APIC
pin2 for the timer interrupt (potentially I have broken the chipset
revision determination for some SB600 systems).
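
Whether a given boot really used the requested parameters can be checked in /proc/cmdline (the same string appears in dmesg as "Command line:"). A minimal sketch, assuming the command line can be split on whitespace and ignoring quoted values; the helper name is hypothetical:

```python
import os


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value-or-None}."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
        print("apic=debug:", params.get("apic") == "debug")
        print("acpi_skip_timer_override:", "acpi_skip_timer_override" in params)
```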

2011-03-11 14:29:04

by Andreas Herrmann

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7

On Fri, Mar 11, 2011 at 08:47:01PM +0800, Wang Lei wrote:
>
> On 2011-03-11 19:38:39 +0800, Andreas wrote:
> >
> > Can you please try booting -rc7 with the kernel parameter acpi_skip_timer_override
> > and send the same output (dmesg, again with apic=debug, and lspci -nnxxxx)?
> >
> >
> > Thanks,
> >
> > Andreas
> >
> > PS: My assumption is that the patch in -rc7 leads to usage of IO-APIC
> > pin2 for the timer interrupt (potentially I have broken the chipset
> > revision determination for some SB600 systems).

> Hi Andreas,
>
> I did what you said and appended acpi_skip_timer_override when booting -rc7.
> Now the fans work OK. I don't know why, and I don't think this is the
> final solution. If not, I'll wait.

Ok, the problem is that all SB[6-8]00 chipsets use the same PCI device ID.
To differentiate the versions I need to check the revision ID.

With my patch I removed some special treatment for SB600.
(See http://support.amd.com/us/Embedded_TechDocs/46155_sb600_rrg_pub_3.03.pdf)

Revision ID/Class Code - R - 32 bits - [PCI_Reg: 08h]

Field Name   Bits   Default   Description
RevisionID   7:0    11h /     This field reflects the ASIC revision.
                    12h /       11h : For ASIC revision A11
                    13h         12h : For ASIC revision A12
                                13h : For ASIC revision A13
                              For ASIC revisions after A13, by default this
                              field will read 13h still. However, if SMBUS
                              PCI config 70h bit 8 is set to 1, a hidden
                              revision ID can be read from this field.

The old code temporarily cleared bit 8 in PCI config 70h and received
13h as revision for device 14.0 on your system (the "hidden revision
ID" shown in your lspci output is 0x14 and that is what the new code
is using). For SB700/SB800 PCI config 70h is reserved ("software
should not write to it") and that is why I wanted to avoid accesses to
that register. (See SB700 documentation
http://support.amd.com/us/Embedded_TechDocs/43009_sb7xx_rrg_pub_1.00.pdf)

So the right thing to do is to correct the check for SB600 to cover
all SB600 revisions w/o depending on the setting of bit 8 in PCI
config 70h.

Attached patch should achieve this.
Can you please test this patch on top of -rc7?


Thanks a lot,

Andreas
---
From 8453f3aef2e2b89ba30877998dcbfc06f475e253 Mon Sep 17 00:00:00 2001
From: Andreas Herrmann <[email protected]>
Date: Fri, 11 Mar 2011 15:16:47 +0100
Subject: [PATCH] x86, quirk: Fix SB600 revision check

Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12
(x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems)
introduced a regression. It removed some SB600 specific code
to determine the revision ID without adapting a corresponding
revision ID check for SB600.

See this mail thread
http://marc.info/?l=linux-kernel&m=129980296006380&w=2

This patch adapts the corresponding check to cover all SB600
revisions.

Signed-off-by: Andreas Herrmann <[email protected]>
---
 arch/x86/kernel/early-quirks.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 9efbdcc..3755ef4 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -159,7 +159,12 @@ static void __init ati_bugs_contd(int num, int slot, int func)
 	if (rev >= 0x40)
 		acpi_fix_pin2_polarity = 1;
 
-	if (rev > 0x13)
+	/*
+	 * SB600: revisions 0x11, 0x12, 0x13, 0x14, ...
+	 * SB700: revisions 0x39, 0x3a, ...
+	 * SB800: revisions 0x40, 0x41, ...
+	 */
+	if (rev >= 0x39)
 		return;
 
 	if (acpi_use_timer_override)
--
1.7.4.1
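
The effect of the one-line change can be seen by modelling just the early-return condition. The sketch below is a simplified model, not the kernel code: the function names are made up for illustration, and the family ranges are taken from the comment in the patch (treating everything below 0x39 as SB600 is an assumption).

```python
def family(rev):
    """Chipset family by SMBus revision ID, per the comment in the patch."""
    if rev >= 0x40:
        return "SB800"
    if rev >= 0x39:
        return "SB700"
    return "SB600"   # assumption: everything below 0x39 is treated as SB600


def skips_sb600_quirk_old(rev):
    """Check before the patch: 'if (rev > 0x13) return;'."""
    return rev > 0x13


def skips_sb600_quirk_new(rev):
    """Check with the patch applied: 'if (rev >= 0x39) return;'."""
    return rev >= 0x39


if __name__ == "__main__":
    for rev in (0x13, 0x14, 0x39, 0x40):
        print(hex(rev), family(rev),
              "old skips:", skips_sb600_quirk_old(rev),
              "new skips:", skips_sb600_quirk_new(rev))
```

For revision 0x14, the hidden revision ID reported for the system in this thread, the old check returns early and the SB600 handling is skipped, while the patched check lets it run.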

2011-03-11 23:02:30

by Wang Lei

Subject: Re: [BUG] Fans keep running, first found since v2.6.38-rc7


Thanks!

--
Regards,
Lei


Attachments:
dmesg3 (33.07 kB)
lspci3 (18.96 kB)