Hi,
A lot of our users complain about the problem in $subject. Here are some clues:
- All users have mainboards with nForce2 chipset:
nVidia Corporation nForce2 IGP2 [10de:01e0] (rev c1)
- The last working kernel for them is 2.6.30.9. They can't boot into 2.6.31.9-11,
- They all tried several boot parameters to disable acpi, lapic, mce, etc. None of them works,
- The dmesg output of their last working kernel (2.6.30.9) shows some suspicious stuff:
[ 0.000000] Using APIC driver default
[ 0.000000] Nvidia board detected. Ignoring ACPI timer override.
[ 0.000000] If you got timer trouble try acpi_use_timer_override
[ 0.000000] ACPI: PM-Timer IO Port: 0x4008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: BIOS IRQ0 pin2 override ignored.
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
...
...
[ 0.102842] pci 0000:00:00.0: reg 10 32bit mmio: [0xe0000000-0xe7ffffff]
[ 0.102858] pci 0000:00:00.0: nForce2 C1 Halt Disconnect fixup
[ 0.103045] pci 0000:00:01.1: reg 10 io port: [0xe400-0xe41f]
[ 0.103079] pci 0000:00:01.1: PME# supported from D3hot D3cold
[ 0.103083] pci 0000:00:01.1: PME# disabled
...
...
[ 0.224587] Unpacking initramfs...
[ 0.407342] Freeing initrd memory: 4954k freed
[ 0.413366] cpu0(1) debug files 5
[ 0.413372] Machine check exception polling timer started.
[ 0.413382] cpufreq-nforce2: Detected nForce2 chipset revision C1
[ 0.413385] cpufreq-nforce2: FSB changing is maybe unstable and can lead to crashes and data loss.
[ 0.413398] cpufreq-nforce2: FSB currently at 200 MHz, FID 11.0
[ 0.413423] ondemand governor failed, too long transition latency of HW, fallback to performance governor
I've googled a lot but couldn't find a similar bug report or regression between 2.6.30 and 2.6.31. I'd like to know how
I can help them debug the issue (other than bisecting, because that would be a tough task for them).
What should I conclude from a hang just after freeing initrd memory? They can't even reach the busybox in the initramfs,
so I don't suspect the initramfs code for now.
Note that I've made two radical configuration changes from 2.6.30 to 2.6.31: building the AGP drivers and the libata driver
into the kernel image.
Regards,
Ozan Caglayan
Pardus Linux -- http://www.pardus.org.tr/eng
Ozan Çağlayan wrote On 13-01-2010 20:51:
(CC'ing relevant people)
> Hi,
>
> A lot of our users complains about the problem in $subject. Here are some clues:
>
> - All users have mainboards with nForce2 chipset:
> nVidia Corporation nForce2 IGP2 [10de:01e0] (rev c1)
> - The last working kernel for them is 2.6.30.9. They can't boot into 2.6.31.9-11,
> - They all tried several boot parameters to disable acpi, lapic, mce, etc. none of them works,
>
I just made them boot with:
bootmem_debug debug debugpat dynamic_printk earlyprintk=vga initcall_debug
loglevel=7 mminit_loglevel=4 pnp.debug sched_debug apic=debug
and we had more relevant messages:
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
initcall inet_init+0x0/0x199 returned 0 after 3585 usecs
calling af_unix_init+0x0/0x47 @ 1
NET: Registered protocol family 1
initcall af_unix_init+0x0/0x47 returned 0 after 101 usecs
calling populate_rootfs+0x0/0x62 @ 1
Unpacking initramfs...
Freeing initrd memory: 5109k freed
initcall populate_rootfs+0x0/0x62 returned 0 after 215338 usecs
calling i8259A_init_sysfs+0x0/0x1d @ 1
initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 42 usecs
calling sbf_init+0x0/0xda @ 1
initcall sbf_init+0x0/0xda returned 0 after 0 usecs
calling i8237A_init_sysfs+0x0/0x1d @ 1
initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 13 usecs
calling add_rtc_cmos+0x0/0x94 @ 1
initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
calling cache_sysfs_init+0x0/0x55 @ 1
initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
calling cpu_debug_init+0x0/0xe3 @ 1
and it hangs. Note that this is the point where 2.6.30.9 continues with:
[ 0.404102] cpu0(1) debug files 5 <--
[ 0.404109] Machine check exception polling timer started.
[ 0.404119] cpufreq-nforce2: Detected nForce2 chipset revision C1
[ 0.404122] cpufreq-nforce2: FSB changing is maybe unstable and can lead to crashes and data loss.
[ 0.404135] cpufreq-nforce2: FSB currently at 167 MHz, FID 11.5
[ 0.404155] ondemand governor failed, too long transition latency of HW, fallback to performance governor
The users all have an AMD Athlon XP series processor. There are 576 changes in arch/x86/kernel and 291
in arch/x86/kernel/cpu between 2.6.30 and 2.6.31. I need some pointers on where to start debugging the problem.
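
With initcall_debug, the hanging initcall is the one that logs "calling" but never logs "returned".
For longer captures this can be checked mechanically. Below is a minimal sketch in Python, assuming
the console output has been saved as plain text; the function name unfinished_initcalls is made up
for illustration and is not part of any kernel tooling:

```python
import re

# Match "calling  fn+0x0/0x47 @ 1" and "initcall fn+0x0/0x47 returned ...",
# with or without a leading "[ 0.469041]" timestamp or "> " quote prefix.
CALL_RE = re.compile(r"calling\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+@")
RET_RE = re.compile(r"initcall\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+returned")


def unfinished_initcalls(log_text):
    """Return the initcalls that were called but never logged a return."""
    pending = []
    for line in log_text.splitlines():
        ret = RET_RE.search(line)
        if ret:
            # This initcall completed, so it is no longer a suspect.
            if ret.group(1) in pending:
                pending.remove(ret.group(1))
            continue
        call = CALL_RE.search(line)
        if call:
            pending.append(call.group(1))
    return pending


# Tail of the 2.6.31 output quoted in this mail.
sample = """\
calling cache_sysfs_init+0x0/0x55 @ 1
initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
calling cpu_debug_init+0x0/0xe3 @ 1
"""
print(unfinished_initcalls(sample))  # ['cpu_debug_init']
```

An empty result would mean that every logged initcall returned, so the hang would have to be
somewhere after the last initcall in the capture.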
Regards,
Ozan Caglayan
On 01/13/2010 10:58 PM, Ozan Çağlayan wrote:
> Ozan Çağlayan wrote On 13-01-2010 20:51:
>
> (CC'ing relevant people)
>
>> Hi,
>>
>> A lot of our users complains about the problem in $subject. Here are some clues:
>>
>> - All users have mainboards with nForce2 chipset:
>> nVidia Corporation nForce2 IGP2 [10de:01e0] (rev c1)
>> - The last working kernel for them is 2.6.30.9. They can't boot into 2.6.31.9-11,
>> - They all tried several boot parameters to disable acpi, lapic, mce, etc. none of them works,
>>
>
> I just made them boot with:
>
> bootmem_debug debug debugpat dynamic_printk earlyprintk=vga initcall_debug
> loglevel=7 mminit_loglevel=4 pnp.debug sched_debug apic=debug
>
> and we had more relevant messages:
>
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> initcall inet_init+0x0/0x199 returned 0 after 3585 usecs
> calling af_unix_init+0x0/0x47 @ 1
> NET: Registered protocol family 1
> initcall af_unix_init+0x0/0x47 returned 0 after 101 usecs
> calling populate_rootfs+0x0/0x62 @ 1
> Unpacking initramfs...
> Freeing initrd memory: 5109k freed
> initcall populate_rootfs+0x0/0x62 returned 0 after 215338 usecs
> calling i8259A_init_sysfs+0x0/0x1d @ 1
> initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 42 usecs
> calling sbf_init+0x0/0xda @ 1
> initcall sbf_init+0x0/0xda returned 0 after 0 usecs
> calling i8237A_init_sysfs+0x0/0x1d @ 1
> initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 13 usecs
> calling add_rtc_cmos+0x0/0x94 @ 1
> initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
> calling cache_sysfs_init+0x0/0x55 @ 1
> initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
> calling cpu_debug_init+0x0/0xe3 @ 1
>
> and it hangs. Not that this is the place where on 2.6.30.9 it continues with:
>
> [ 0.404102] cpu0(1) debug files 5 <--
> [ 0.404109] Machine check exception polling timer started.
> [ 0.404119] cpufreq-nforce2: Detected nForce2 chipset revision C1
> [ 0.404122] cpufreq-nforce2: FSB changing is maybe unstable and can lead to
> crashes and data loss.
> [ 0.404135] cpufreq-nforce2: FSB currently at 167 MHz, FID 11.5
> [ 0.404155] ondemand governor failed, too long transition latency of HW,
> fallback to performance governor
>
> The users all have an AMD Athlon XP series processor. There are 576 changes in arch/x86/kernel and 291
> in arch/x86/kernel/cpu between 2.6.30 and 2.6.31. Need some hands where to start to debug the problem.
Can you just deselect CONFIG_X86_CPU_DEBUG in your .config?
YH
Yinghai Lu wrote On 14-01-2010 09:43:
> On 01/13/2010 10:58 PM, Ozan Çağlayan wrote:
>> Ozan Çağlayan wrote On 13-01-2010 20:51:
>>
>>
>> The users all have an AMD Athlon XP series processor. There are 576 changes in arch/x86/kernel and 291
>> in arch/x86/kernel/cpu between 2.6.30 and 2.6.31. Need some hands where to start to debug the problem.
>
> can you just deselect CONFIG_X86_CPU_DEBUG in your .config ?
I just did that and am waiting for feedback from the users, as I can't find an Athlon XP myself. If it boots correctly,
I'll disable that option temporarily until we bisect it.
Yinghai Lu wrote On 14-01-2010 09:43:
> On 01/13/2010 10:58 PM, Ozan Çağlayan wrote:
>> Ozan Çağlayan wrote On 13-01-2010 20:51:
>
> can you just deselect CONFIG_X86_CPU_DEBUG in your .config ?
Yes, that fixed the issue. I have now re-enabled it and reverted the following commit, which is one of the two commits
modifying the cpu_debug code between 2.6.30 and 2.6.31:
From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
From: Jaswinder Singh Rajput <[email protected]>
Date: Fri, 5 Jun 2009 23:27:17 +0530
Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding
Remove model information, encoding/decoding and reduce bookkeeping.
This, besides removing a lot of code and cleaning up the code, also
enables these features on many more CPUs that were enumerated before.
The other commit (which I think just improves the output) is:
From 97a52714658cd959a3cfa35c5b6f489859f0204b Mon Sep 17 00:00:00 2001
From: Andreas Herrmann <[email protected]>
Date: Fri, 8 May 2009 18:23:50 +0200
Subject: [PATCH] x86: display extended apic registers with print_local_APIC and cpu_debug code
Both print_local_APIC (used when apic=debug kernel param is set) and
cpu_debug code missed support for some extended APIC registers that
I'd like to see.
This adds support to show:
- extended APIC feature register
- extended APIC control register
- extended LVT registers
[ Impact: print more debug info ]
Ozan Çağlayan wrote On 14-01-2010 12:35:
> Yinghai Lu wrote On 14-01-2010 09:43:
>> On 01/13/2010 10:58 PM, Ozan Çağlayan wrote:
>>> Ozan Çağlayan wrote On 13-01-2010 20:51:
>
>> can you just deselect CONFIG_X86_CPU_DEBUG in your .config ?
>
> Yes that fixed the issue. I now reenabled it and reverted the following commit which is 1/2 commits
> modifying cpu_debug code between 2.6.30..2.6.31:
>
> From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
> From: Jaswinder Singh Rajput <[email protected]>
> Date: Fri, 5 Jun 2009 23:27:17 +0530
> Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding
Reverting this commit on top of 2.6.31.11 fixes the boot hangs with AMD Athlon XP processors.
I'll double check with other reporters in a day or two.
Ozan Çağlayan wrote On 14-01-2010 15:08:
(CC'ing stable)
>>
>> From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
>> From: Jaswinder Singh Rajput <[email protected]>
>> Date: Fri, 5 Jun 2009 23:27:17 +0530
>> Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding
>
> Reverting this commit on top of 2.6.31.11 fixes the boot hangs with AMD Athlon XP processors.
> I'll double check with other reporters in a day or two.
OK, we've verified on 2 separate systems with Athlon XP that reverting the commit fixes the issue.
Here's /proc/cpuinfo and the relevant dmesg output (with the commit reverted) for reference:
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : AMD Athlon(tm) XP 2600+
stepping : 0
cpu MHz : 1920.500
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
bogomips : 3842.29
clflush size : 32
power management: ts
[ 0.461404] Freeing initrd memory: 5191k freed
[ 0.467844] initcall populate_rootfs+0x0/0x62 returned 0 after 217975 usecs
[ 0.467947] calling i8259A_init_sysfs+0x0/0x1d @ 1
[ 0.468093] initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 41 usecs
[ 0.468190] calling sbf_init+0x0/0xda @ 1
[ 0.468282] initcall sbf_init+0x0/0xda returned 0 after 0 usecs
[ 0.468378] calling i8237A_init_sysfs+0x0/0x1d @ 1
[ 0.468484] initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 12 usecs
[ 0.468582] calling add_rtc_cmos+0x0/0x94 @ 1
[ 0.468679] initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
[ 0.468780] calling cache_sysfs_init+0x0/0x55 @ 1
[ 0.468940] initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
[ 0.469041] calling cpu_debug_init+0x0/0x1f @ 1
[ 0.469190] cpu0(1) debug files 5
[ 0.469282] initcall cpu_debug_init+0x0/0x1f returned 0 after 143 usecs <-- That call wasn't returning at all
I think that the commit should be reverted or a fix should be released for the linux-2.6 tree,
as well as the .31 and .32 stable trees.
Thanks,
Ozan Caglayan
On Saturday 16 January 2010, Ozan Çağlayan wrote:
> Ozan Çağlayan wrote:
> > Ozan Çağlayan wrote On 14-01-2010 15:08:
> >
> > (CC'ing stable)
> >
> >>> From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
> >>> From: Jaswinder Singh Rajput <[email protected]>
> >>> Date: Fri, 5 Jun 2009 23:27:17 +0530
> >>> Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding
> >> Reverting this commit on top of 2.6.31.11 fixes the boot hangs with AMD Athlon XP processors.
> >> I'll double check with other reporters in a day or two.
> >
> > OK we've verified on 2 separate systems with Athlon XP that reverting the commit fixes the issue.
> > Here's proc/cpuinfo and relevant dmesg output for reference:
> >
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 6
> > model : 10
> > model name : AMD Athlon(tm) XP 2600+
> > stepping : 0
> > cpu MHz : 1920.500
> > cache size : 512 KB
> > fdiv_bug : no
> > hlt_bug : no
> > f00f_bug : no
> > coma_bug : no
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 1
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> > pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
> > bogomips : 3842.29
> > clflush size : 32
> > power management: ts
> >
> >
> >
> > [ 0.461404] Freeing initrd memory: 5191k freed
> > [ 0.467844] initcall populate_rootfs+0x0/0x62 returned 0 after 217975 usecs
> > [ 0.467947] calling i8259A_init_sysfs+0x0/0x1d @ 1
> > [ 0.468093] initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 41 usecs
> > [ 0.468190] calling sbf_init+0x0/0xda @ 1
> > [ 0.468282] initcall sbf_init+0x0/0xda returned 0 after 0 usecs
> > [ 0.468378] calling i8237A_init_sysfs+0x0/0x1d @ 1
> > [ 0.468484] initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 12 usecs
> > [ 0.468582] calling add_rtc_cmos+0x0/0x94 @ 1
> > [ 0.468679] initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
> > [ 0.468780] calling cache_sysfs_init+0x0/0x55 @ 1
> > [ 0.468940] initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
> > [ 0.469041] calling cpu_debug_init+0x0/0x1f @ 1
> > [ 0.469190] cpu0(1) debug files 5
> > [ 0.469282] initcall cpu_debug_init+0x0/0x1f returned 0 after 143 usecs <-- That call wasn't returning at all
> >
> > I think that the commit should be reverted or a fix should be released for linux-2.6 tree,
> > as well as .31 and .32 stable trees.
>
>
> Linux 2.6.31 is released on Sep 9th. So people having an Athlon XP processor + a kernel newer than 4 months
> which enables CONFIG_X86_CPU_DEBUG can't even boot into a linux kernel. I was *at least* expecting a comment from the relevant
> people but nope for 2 days.
>
> It seems to be a serious regression which doesn't get caught. I'm also CC'ing Rafael, maybe he can inject this
> in one of his regression threads.
Well, the problem is I'm not listing regressions from 2.6.31 any more.
Also, I don't have hardware to reproduce the problem on.
Is there a bug entry for this in the kernel Bugzilla?
Rafael
Rafael J. Wysocki wrote:
> On Saturday 16 January 2010, Ozan Çağlayan wrote:
>>>
>>> I think that the commit should be reverted or a fix should be released for linux-2.6 tree,
>>> as well as .31 and .32 stable trees.
>>
>> Linux 2.6.31 is released on Sep 9th. So people having an Athlon XP processor + a kernel newer than 4 months
>> which enables CONFIG_X86_CPU_DEBUG can't even boot into a linux kernel. I was *at least* expecting a comment from the relevant
>> people but nope for 2 days.
>>
>> It seems to be a serious regression which doesn't get caught. I'm also CC'ing Rafael, maybe he can inject this
>> in one of his regression threads.
>
> Well, the problem is I'm not listing regressions from 2.6.31 any more.
Yes you're right.
>
> Also, I don't have hardware to reproduce the problem on.
>
> Is there a bug entry for this in the kernel Bugzilla?
Nope, not yet. I'll file one.
Rafael J. Wysocki wrote:
>
> Well, the problem is I'm not listing regressions from 2.6.31 any more.
>
> Also, I don't have hardware to reproduce the problem on.
>
> Is there a bug entry for this in the kernel Bugzilla?
http://bugzilla.kernel.org/show_bug.cgi?id=15075
On Sunday 17 January 2010, Ozan Çağlayan wrote:
> Rafael J. Wysocki wrote:
> > On Saturday 16 January 2010, Ozan Çağlayan wrote:
>
> >>>
> >>> I think that the commit should be reverted or a fix should be released for linux-2.6 tree,
> >>> as well as .31 and .32 stable trees.
> >>
> >> Linux 2.6.31 is released on Sep 9th. So people having an Athlon XP processor + a kernel newer than 4 months
> >> which enables CONFIG_X86_CPU_DEBUG can't even boot into a linux kernel. I was *at least* expecting a comment from the relevant
> >> people but nope for 2 days.
> >>
> >> It seems to be a serious regression which doesn't get caught. I'm also CC'ing Rafael, maybe he can inject this
> >> in one of his regression threads.
> >
> > Well, the problem is I'm not listing regressions from 2.6.31 any more.
I should have said "from 2.6.30" actually.
Rafael
On Sunday 17 January 2010, Ozan Çağlayan wrote:
> Rafael J. Wysocki wrote:
>
> >
> > Well, the problem is I'm not listing regressions from 2.6.31 any more.
> >
> > Also, I don't have hardware to reproduce the problem on.
> >
> > Is there a bug entry for this in the kernel Bugzilla?
>
>
> http://bugzilla.kernel.org/show_bug.cgi?id=15075
I linked it to http://bugzilla.kernel.org/show_bug.cgi?id=13615 as a regression
from 2.6.30.
Rafael
On 01/16/2010 04:28 AM, Ozan Çağlayan wrote:
>
> Linux 2.6.31 is released on Sep 9th. So people having an Athlon XP processor + a kernel newer than 4 months
> which enables CONFIG_X86_CPU_DEBUG can't even boot into a linux kernel. I was *at least* expecting a comment from the relevant
> people but nope for 2 days.
>
> It seems to be a serious regression which doesn't get caught. I'm also CC'ing Rafael, maybe he can inject this
> in one of his regression threads.
>
Anything which involves enabling CONFIG_X86_CPU_DEBUG can hardly be
considered serious. It's a broken piece of work that should never have
gotten into the kernel in the first place.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
CONFIG_X86_CPU_DEBUG really seems to be causing more problems than it
ever solved. This is an RFC for immediately deprecating it and
scheduling it for removal in the 2.6.34 cycle.
If this was a high value feature, it would be different -- but it's not
even close.
Posting this as an RFC just on the off chance someone actually depends on
this.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Monday 18 January 2010, H. Peter Anvin wrote:
> CONFIG_X86_CPU_DEBUG really seems to be causing more problems than it
> ever solved. This is an RFC for immediately deprecating it, and
> schedule it for removal in the 2.6.34 cycle.
>
> If this was a high value feature, it would be different -- but it's not
> even close.
ACK
On Sun, 17 Jan 2010 17:26:53 -0800
"H. Peter Anvin" <[email protected]> wrote:
> CONFIG_X86_CPU_DEBUG really seems to be causing more problems than it
> ever solved. This is an RFC for immediately deprecating it, and
> schedule it for removal in the 2.6.34 cycle.
>
> If this was a high value feature, it would be different -- but it's not
> even close.
>
> Posting this as an RFC just on the offchance someone actually depends on
> this.
>
We know that enabling this feature will cause some machines to hang,
and that this problem has existed for six months.
Would it not be better to fix that problem (perhaps just with the
revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
nuke the feature in 2.6.34.
Alternatively, we can nuke the feature from 2.6.33 and 2.6.32.x and
earlier right now. Where "nuke" might mean "make it difficult to
enable".
Whatever. Bottom line is that it'd be nice to do something to fix up
2.6.33 and earlier.
On 01/22/2010 04:53 PM, Andrew Morton wrote:
>
> Would it not be better to fix that problem (perhaps just with the
> revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
> nuke the feature in 2.6.34.
>
> Alternatively, we can nuke the feature from 2.6.33 and 2.6.32.x and
> earlier right now. Where "nuke" might mean "make it difficult to
> enable".
>
> Whatever. Bottom line is that it'd be nice to do something to fix up
> 2.6.33 and earlier.
>
I would be all for nuking the feature immediately. The easiest way to
nuke the feature quickly is to make it a noninteractive Kconfig feature.
All in favor?
-hpa
On Fri, Jan 22, 2010 at 05:05:08PM -0800, H. Peter Anvin wrote:
> On 01/22/2010 04:53 PM, Andrew Morton wrote:
> >
> > Would it not be better to fix that problem (perhaps just with the
> > revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
> > nuke the feature in 2.6.34.
> >
> > Alternatively, we can nuke the feature from 2.6.33 and 2.6.32.x and
> > earlier right now. Where "nuke" might mean "make it difficult to
> > enable".
> >
> > Whatever. Bottom line is that it'd be nice to do something to fix up
> > 2.6.33 and earlier.
> >
>
> I would be all for nuking the feature immediately. The easiest way to
> nuke the feature quickly is to make it a noninteractive Kconfig feature.
>
> All in favor?
/me raises his hand.
A Kconfig change would be nice to have.
thanks,
greg k-h
On 01/22/2010 05:20 PM, Greg KH wrote:
> On Fri, Jan 22, 2010 at 05:05:08PM -0800, H. Peter Anvin wrote:
>> On 01/22/2010 04:53 PM, Andrew Morton wrote:
>>>
>>> Would it not be better to fix that problem (perhaps just with the
>>> revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
>>> nuke the feature in 2.6.34.
>>>
>>> Alternatively, we can nuke the feature from 2.6.33 and 2.6.32.x and
>>> earlier right now. Where "nuke" might mean "make it difficult to
>>> enable".
>>>
>>> Whatever. Bottom line is that it'd be nice to do something to fix up
>>> 2.6.33 and earlier.
>>>
>>
>> I would be all for nuking the feature immediately. The easiest way to
>> nuke the feature quickly is to make it a noninteractive Kconfig feature.
>>
>> All in favor?
>
> /me raises his hand.
>
> A Kconfig change would be nice to have.
>
I take that as an Acked-by: ...
-hpa
On Fri, 22 Jan 2010, Andrew Morton wrote:
>
> We know that enabling this feature will cause some machines to hang,
> and that this problem has existed for six months.
>
> Would it not be better to fix that problem (perhaps just with the
> revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
> nuke the feature in 2.6.34.
Another way of looking at it is "we know it's been broken for six months, and
clearly nobody really ever enabled it in any distro, and even getting a
bug report on it took forever. So why keep it around at all"?
So I'd personally rather just remove it outright than deprecate it or even
try to fix it. Since clearly absolutely nobody depends on it.
The usual reason for deprecating a feature is to give people time to move
away from it, but since clearly nobody uses it...
Linus
Makes sense. Will do.
"Linus Torvalds" <[email protected]> wrote:
>
>
>On Fri, 22 Jan 2010, Andrew Morton wrote:
>>
>> We know that enabling this feature will cause some machines to hang,
>> and that this problem has existed for six months.
>>
>> Would it not be better to fix that problem (perhaps just with the
>> revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
>> nuke the feature in 2.6.34.
>
>Another way of looking at is "we know it's been broken for six months, and
>clearly nobody really ever enabled it in any distro, and even getting a
>bug report on it took forever. So why keep it around at all"?
>
>So I'd personally rather just remove it outright than deprecate it or even
>try to fix it. Since clearly absolutely nobody depends on it.
>
>The usual reason for deprecating a feature is to give people time to move
>away from it, but since clearly nobody uses it...
>
> Linus
--
Sent from my mobile phone, pardon any lack of formatting.
* Linus Torvalds <[email protected]> wrote:
> On Fri, 22 Jan 2010, Andrew Morton wrote:
> >
> > We know that enabling this feature will cause some machines to hang,
> > and that this problem has existed for six months.
> >
> > Would it not be better to fix that problem (perhaps just with the
> > revert) so that 2.6.33, 2.6.32.x and earlier can be fixed? Then we can
> > nuke the feature in 2.6.34.
>
> Another way of looking at is "we know it's been broken for six months, and
> clearly nobody really ever enabled it in any distro, and even getting a bug
> report on it took forever. So why keep it around at all"?
>
> So I'd personally rather just remove it outright than deprecate it or even
> try to fix it. Since clearly absolutely nobody depends on it.
>
> The usual reason for deprecating a feature is to give people time to move
> away from it, but since clearly nobody uses it...
Excellent - that makes it all even simpler to handle.
Ingo
Ozan Çağlayan wrote:
> Ozan Çağlayan wrote On 14-01-2010 15:08:
>
> (CC'ing stable)
>
>>> From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
>>> From: Jaswinder Singh Rajput <[email protected]>
>>> Date: Fri, 5 Jun 2009 23:27:17 +0530
>>> Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding
>> Reverting this commit on top of 2.6.31.11 fixes the boot hangs with AMD Athlon XP processors.
>> I'll double check with other reporters in a day or two.
>
> OK we've verified on 2 separate systems with Athlon XP that reverting the commit fixes the issue.
> Here's proc/cpuinfo and relevant dmesg output for reference:
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 10
> model name : AMD Athlon(tm) XP 2600+
> stepping : 0
> cpu MHz : 1920.500
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
> bogomips : 3842.29
> clflush size : 32
> power management: ts
>
>
>
> [ 0.461404] Freeing initrd memory: 5191k freed
> [ 0.467844] initcall populate_rootfs+0x0/0x62 returned 0 after 217975 usecs
> [ 0.467947] calling i8259A_init_sysfs+0x0/0x1d @ 1
> [ 0.468093] initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 41 usecs
> [ 0.468190] calling sbf_init+0x0/0xda @ 1
> [ 0.468282] initcall sbf_init+0x0/0xda returned 0 after 0 usecs
> [ 0.468378] calling i8237A_init_sysfs+0x0/0x1d @ 1
> [ 0.468484] initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 12 usecs
> [ 0.468582] calling add_rtc_cmos+0x0/0x94 @ 1
> [ 0.468679] initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
> [ 0.468780] calling cache_sysfs_init+0x0/0x55 @ 1
> [ 0.468940] initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
> [ 0.469041] calling cpu_debug_init+0x0/0x1f @ 1
> [ 0.469190] cpu0(1) debug files 5
> [ 0.469282] initcall cpu_debug_init+0x0/0x1f returned 0 after 143 usecs <-- That call wasn't returning at all
>
> I think that the commit should be reverted or a fix should be released for linux-2.6 tree,
> as well as .31 and .32 stable trees.
Linux 2.6.31 was released on Sep 9th. So for more than 4 months, people with an Athlon XP processor and a kernel
that enables CONFIG_X86_CPU_DEBUG haven't been able to boot Linux at all. I was *at least* expecting a comment from the relevant
people, but there has been none for 2 days.
It seems to be a serious regression that went uncaught. I'm also CC'ing Rafael; maybe he can include this
in one of his regression threads.
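
For triaging incoming reports, the three conditions seen so far (Athlon XP, 2.6.31 or newer,
CONFIG_X86_CPU_DEBUG enabled) can be checked with a small script. This is only a rough sketch in
Python; the function name possibly_affected is made up for illustration, and matching on the
"Athlon(tm) XP" model name is an assumption based on the two systems we verified, not a statement
about which CPUs the bug can actually hit:

```python
import re


def possibly_affected(cpuinfo_text, config_text, kernel_release):
    """Rough triage: True if a report matches the conditions seen in this thread."""
    # Kernel release such as "2.6.31.11" or "2.6.31.9-11"; only the first three parts matter.
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", kernel_release)
    if not m:
        return False
    version = tuple(int(part) for part in m.groups())

    # Assumption: only the Athlon XP model name is matched, since that is all we verified.
    is_athlon_xp = re.search(
        r"^model name\s*:.*Athlon\(tm\) XP", cpuinfo_text, re.MULTILINE
    ) is not None

    # Treats =y and =m alike; the hang reported here was seen with the code running at boot.
    cpu_debug_on = re.search(
        r"^CONFIG_X86_CPU_DEBUG=[ym]$", config_text, re.MULTILINE
    ) is not None

    return version >= (2, 6, 31) and is_athlon_xp and cpu_debug_on


cpuinfo = "vendor_id : AuthenticAMD\nmodel name : AMD Athlon(tm) XP 2600+\n"
print(possibly_affected(cpuinfo, "CONFIG_X86_CPU_DEBUG=y\n", "2.6.31.11"))  # True
```

A False result only means the report doesn't match what we have seen so far; it doesn't rule out
other hardware being affected.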
Thanks
Ozan Caglayan