[sorry for repost, local MTA problems here...]
Hi list, Hi Andrew,
I cannot boot 2.6.17-rc4-mm1 because my rootdisk is a scsi disk and upon
scsi-init (SYM53C8XX_2) I'm getting:
May 19 15:39:55 prinz sym0: <895> rev 0x1 at pci 0000:02:09.0 irq 161
May 19 15:39:55 prinz sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
May 19 15:39:55 prinz sym0: SCSI BUS has been reset.
May 19 15:39:55 prinz scsi0 : sym-2.2.3
May 19 15:40:08 prinz 0:0:0:0: ABORT operation started.
May 19 15:40:13 prinz 0:0:0:0: ABORT operation timed-out.
May 19 15:40:13 prinz 0:0:0:0: DEVICE RESET operation started.
May 19 15:40:18 prinz 0:0:0:0: DEVICE RESET operation timed-out.
May 19 15:40:18 prinz 0:0:0:0: BUS RESET operation started.
May 19 15:40:23 prinz 0:0:0:0: BUS RESET operation timed-out.
May 19 15:40:23 prinz 0:0:0:0: HOST RESET operation started.
May 19 15:40:23 prinz sym0: SCSI BUS has been reset.
May 19 15:40:28 prinz 0:0:0:0: HOST RESET operation timed-out.
May 19 15:40:28 prinz 0:0:0:0: scsi: Device offlined - not ready after error recovery
May 19 15:40:33 prinz 0:0:1:0: ABORT operation started.
May 19 15:40:38 prinz 0:0:1:0: ABORT operation timed-out.
May 19 15:40:38 prinz 0:0:1:0: DEVICE RESET operation started.
May 19 15:40:43 prinz 0:0:1:0: DEVICE RESET operation timed-out.
May 19 15:40:43 prinz 0:0:1:0: BUS RESET operation started.
I have backed out drivers-scsi-use-array_size-macro.patch, but to no
avail. There are other scsi-related patches in the broken-out
mm-directory, any hint which one to try first? Sometimes they're dependent
on each other, so I find it not easy to just "patch -R" all "*scsi*.patch"
files.
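
One way around the dependency problem is to bisect the patch series instead of reverting individual patches: apply only the first N patches in their original order, test, and halve the range. Below is a minimal sketch of that search in Python. It assumes the patches are available as an ordered list (for example read from a series file), that the unpatched base boots, and that there is a single regression, so every prefix containing the bad patch fails. `boots_with` is a hypothetical callback standing in for "build, boot, and report the result"; it is not part of any existing tool.

```python
from typing import Callable, Optional, Sequence


def first_bad_patch(series: Sequence[str],
                    boots_with: Callable[[int], bool]) -> Optional[str]:
    """Return the first patch whose inclusion breaks the boot, or None.

    boots_with(n) must report whether a tree with the first n patches of
    the series applied (in order) boots. Assumes boots_with(0) is True and
    that once a prefix fails, every longer prefix fails as well.
    """
    if boots_with(len(series)):
        return None  # the full series boots, nothing to find

    lo, hi = 0, len(series)  # invariant: prefix lo boots, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_with(mid):
            lo = mid
        else:
            hi = mid
    return series[hi - 1]


if __name__ == "__main__":
    # Hypothetical patch names, for illustration only.
    demo_series = ["a.patch", "b.patch", "c.patch", "d.patch"]
    # Pretend the third patch is the culprit: prefixes of length >= 3 fail.
    print(first_bad_patch(demo_series, lambda n: n < 3))
```

Because each test tree is a prefix of the series, inter-patch dependencies are always satisfied, and a series of a few hundred patches needs fewer than ten build-and-boot cycles.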
Please see http://www.nerdbynature.de/bits/2.6.17-rc4-mm1/ for
netconsole dmesg logs of 2.6.17-rc4 (working fine) and of -mm1.
I've tried different .configs for -mm1, created with:
- yes '' | make oldconfig (config-2.6-mm.2.6.17-rc4-mm1.oldconfig_default)
- yes 'N' | make oldconfig (config-2.6-mm.2.6.17-rc4-mm1.oldconfig_no)
- make oldconfig (interactive, config-2.6-mm.2.6.17-rc4-mm1.oldconfig_my)
Thanks,
Christian.
--
BOFH excuse #442:
Trojan horse ran out of hay
"Christian Kujau" <[email protected]> wrote:
>
> [sorry for repost, local MTA problems here...]
>
> Hi list, Hi Andrew,
>
> I cannot boot 2.6.17-rc4-mm1 because my rootdisk is a scsi disk and upon
> scsi-init (SYM53C8XX_2) I'm getting:
>
> May 19 15:39:55 prinz sym0: <895> rev 0x1 at pci 0000:02:09.0 irq 161
> May 19 15:39:55 prinz sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
> May 19 15:39:55 prinz sym0: SCSI BUS has been reset.
> May 19 15:39:55 prinz scsi0 : sym-2.2.3
> May 19 15:40:08 prinz 0:0:0:0: ABORT operation started.
> May 19 15:40:13 prinz 0:0:0:0: ABORT operation timed-out.
> May 19 15:40:13 prinz 0:0:0:0: DEVICE RESET operation started.
> May 19 15:40:18 prinz 0:0:0:0: DEVICE RESET operation timed-out.
> May 19 15:40:18 prinz 0:0:0:0: BUS RESET operation started.
> May 19 15:40:23 prinz 0:0:0:0: BUS RESET operation timed-out.
> May 19 15:40:23 prinz 0:0:0:0: HOST RESET operation started.
> May 19 15:40:23 prinz sym0: SCSI BUS has been reset.
> May 19 15:40:28 prinz 0:0:0:0: HOST RESET operation timed-out.
> May 19 15:40:28 prinz 0:0:0:0: scsi: Device offlined - not ready after error recovery
> May 19 15:40:33 prinz 0:0:1:0: ABORT operation started.
> May 19 15:40:38 prinz 0:0:1:0: ABORT operation timed-out.
> May 19 15:40:38 prinz 0:0:1:0: DEVICE RESET operation started.
> May 19 15:40:43 prinz 0:0:1:0: DEVICE RESET operation timed-out.
> May 19 15:40:43 prinz 0:0:1:0: BUS RESET operation started.
>
> I have backed out drivers-scsi-use-array_size-macro.patch, but to no
> avail. There are other scsi-related patches in the broken-out
> mm-directory, any hint which one to try first? Sometimes they're dependent
> on each other, so I find it not easy to just "patch -R" all "*scsi*.patch"
> files.
>
> Please see http://www.nerdbynature.de/bits/2.6.17-rc4-mm1/ for
> netconsole dmesg logs of 2.6.17-rc4 (working fine) and of -mm1.
>
> I've tried different .configs for -mm1, created with:
>
> - yes '' | make oldconfig (config-2.6-mm.2.6.17-rc4-mm1.oldconfig_default)
> - yes 'N' | make oldconfig (config-2.6-mm.2.6.17-rc4-mm1.oldconfig_no)
> - make oldconfig (interactive, config-2.6-mm.2.6.17-rc4-mm1.oldconfig_my)
>
Thanks for the report, and thanks for testing. The full dmesg output
really helps.
It goes pear-shaped very early:
--- prinz64-nc.2.6.17-rc4.log Fri May 19 13:56:34 2006
+++ prinz64-nc.2.6.17-rc4-mm1.log Fri May 19 13:56:58 2006
@@ -12,20 +12,17 @@
BIOS-e820: 00000000fefffc00 - 00000000ff000000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
DMI 2.2 present.
+ACPI: Unable to map RSDT header
+node 0 zone Normal missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
+node 0 zone HighMem missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
And from then on, ACPI is kaput. So your interrupts are kaput, as is the
disk controller.
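
The comparison above is an ordinary unified diff of the two boot logs. As a minimal sketch, the same thing can be done with Python's standard `difflib`, stripping a syslog-style prefix first so that differing timestamps do not mark every line as changed. The prefix pattern and the default file names are assumptions for illustration; logs captured another way may need a different pattern or none at all.

```python
import difflib
import re
from typing import Iterable, List

# Matches a prefix such as "May 19 15:39:55 prinz " (month, day, time, host).
_SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ ")


def strip_prefix(line: str) -> str:
    """Drop the timestamp/host prefix and the trailing newline, if present."""
    return _SYSLOG_PREFIX.sub("", line.rstrip("\n"))


def log_diff(good: Iterable[str], bad: Iterable[str],
             good_name: str = "good.log",
             bad_name: str = "bad.log") -> List[str]:
    """Return unified-diff lines between two boot logs, ignoring prefixes."""
    return list(difflib.unified_diff(
        [strip_prefix(line) for line in good],
        [strip_prefix(line) for line in bad],
        fromfile=good_name, tofile=bad_name, lineterm=""))


if __name__ == "__main__":
    # Message texts are taken from the thread; the input is otherwise made up.
    good = ["DMI 2.2 present."]
    bad = ["DMI 2.2 present.", "ACPI: Unable to map RSDT header"]
    print("\n".join(log_diff(good, bad)))
```

Lines starting with `+` in the result exist only in the failing boot, which is where the first divergence (here the RSDT mapping failure) shows up.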
I had some of this happening too - it's due to some of the MM patches from
Mel and/or Andy. I also managed to provoke "Too many memory regions,
truncating" out of it.
I hope that's all sorted out now. Please test next -mm (hopefully
tomorrow) and let us know?
Or, if you're super-keen,
http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2 is my current rollup
(against 2.6.17-rc4). It was compilable this morning, but I've since
merged stuff ;) It would be interesting to know if that has fixed the bug.
> Thanks for the report, and thanks for testing. The full dmesg output
> really helps.
>
>
> It goes pear-shaped very early:
>
> --- prinz64-nc.2.6.17-rc4.log Fri May 19 13:56:34 2006
> +++ prinz64-nc.2.6.17-rc4-mm1.log Fri May 19 13:56:58 2006
> @@ -12,20 +12,17 @@
> BIOS-e820: 00000000fefffc00 - 00000000ff000000 (reserved)
> BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
> DMI 2.2 present.
> +ACPI: Unable to map RSDT header
> +node 0 zone Normal missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
> +node 0 zone HighMem missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
>
>
> And from then on, ACPI is kaput. So your interrupts are kaput, as is the
> disk controller.
>
> I had some of this happening too - it's due to some of the MM patches from
> Mel and/or Andy.
The warnings in this case are valid but I would think harmless. ZONE_NORMAL
on x86_64 begins at MAX_DMA32_PFN on the 4GiB boundary which is MAX_ORDER
aligned. From the e820 map, I am guessing the machine has 1GiB of memory
so the normal and highmem zones are empty. Andy's latest patches should
catch that.
The places where I now expect to see zone alignment error messages are
where the lowest PFN in a node is not aligned, so the zone appears to
start unaligned. As the node_mem_map is aligned to the MAX_ORDER
boundary, we will see the warning, but it'll be harmless again.
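
The alignment claim for the 4GiB boundary can be checked with a little arithmetic. The sketch below assumes 4 KiB pages (PAGE_SHIFT = 12) and the default MAX_ORDER of 11, so a MAX_ORDER block is 1 << (MAX_ORDER - 1) = 1024 pages (4 MiB); a kernel built with other values would need the constants adjusted. The helper names are illustrative, not kernel functions.

```python
PAGE_SHIFT = 12                             # assumed: 4 KiB pages
MAX_ORDER = 11                              # assumed: kernel default
MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1)   # 1024 pages = 4 MiB


def pfn_of(addr: int) -> int:
    """Page frame number containing a physical address."""
    return addr >> PAGE_SHIFT


def is_max_order_aligned(pfn: int) -> bool:
    """True if pfn sits on a MAX_ORDER block boundary."""
    return (pfn & (MAX_ORDER_NR_PAGES - 1)) == 0


# Start of ZONE_NORMAL on x86_64 as described above: the 4GiB boundary.
MAX_DMA32_PFN = pfn_of(4 << 30)

if __name__ == "__main__":
    print(hex(MAX_DMA32_PFN), is_max_order_aligned(MAX_DMA32_PFN))
```

With these constants the 4GiB boundary is PFN 0x100000, a multiple of 1024, so a zone starting there is aligned; a zone whose first present page is, say, one page later would not be.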
I am struggling to see how the alignment patches or
arch-independent-zone-sizing would clobber the mapping of the ACPI table :(
> I also managed to provoke "Too many memory regions,
> truncating" out of it.
>
"Too many memory regions, truncating" is of concern because memory will be
effectively lost. Is this on x86_64 as well? If so, I need to submit a
patch that sets CONFIG_MAX_ACTIVE_REGIONS to 128 on x86_64, which is the
same value as E820MAX. This is similar to what PPC64 does for LMB regions
(see MAX_ACTIVE_REGIONS in arch/powerpc/Kconfig for example). If it's not
x86_64, what arch does it occur on?
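
To illustrate why truncation means lost memory, here is a toy model of a fixed-capacity region table. It is not the kernel's add_active_range() and ignores everything else the real code does; the class and attribute names are made up for this sketch. It only shows that once the table is full, further ranges are dropped, which is why the capacity should be at least as large as the number of entries the firmware map can supply.

```python
from typing import List, Tuple


class ActiveRegions:
    """Toy fixed-size table of (start_pfn, end_pfn) ranges."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.regions: List[Tuple[int, int]] = []
        self.dropped_pages = 0  # pages in ranges that did not fit

    def add(self, start_pfn: int, end_pfn: int) -> bool:
        """Record a range; return False if the table is full and it is lost."""
        if len(self.regions) >= self.capacity:
            self.dropped_pages += end_pfn - start_pfn
            return False
        self.regions.append((start_pfn, end_pfn))
        return True


if __name__ == "__main__":
    table = ActiveRegions(capacity=2)
    for start, end in [(0, 10), (20, 30), (40, 100)]:
        if not table.add(start, end):
            print("Too many memory regions, truncating")
    print("pages lost:", table.dropped_pages)
```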
> I hope that's all sorted out now. Please test next -mm (hopefully
> tomorrow) and let us know?
>
> Or, if you're super-keen,
> http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2 is my current rollup
> (against 2.6.17-rc4). It was compilable this morning, but I've since
> merged stuff ;) It would be interesting to know if that has fixed the bug.
>
[email protected] (Mel Gorman) wrote:
>
> I am struggling to see how the alignment patches or
> arch-independent-zone-sizing would clobber the mapping of the ACPI table :(
hm. Well something did it ;)
> > I also managed to provoke "Too many memory regions,
> > truncating" out of it.
> >
>
> "Too many memory regions, truncating" is of concern because memory will be
> effectively lost. Is this on x86_64 as well? If so, I need to submit a
> patch that sets CONFIG_MAX_ACTIVE_REGIONS to 128 on x86_64, which is the
> same value as E820MAX. This is similar to what PPC64 does for LMB regions
> (see MAX_ACTIVE_REGIONS in arch/powerpc/Kconfig for example). If it's not
> x86_64, what arch does it occur on?
Yes, it's x86_64. It kind of went away though. I seem to have been
finding various .config combinations which cause x86_64 to die horridly -
that was one.
Hi there,
On Fri, 19 May 2006, Andrew Morton wrote:
> DMI 2.2 present.
> +ACPI: Unable to map RSDT header
> +node 0 zone Normal missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
> +node 0 zone HighMem missaligned start pfn, enable UNALIGNED_ZONE_BOUNDRIES
gah, diff(1) is actually not new to me, but I forgot to use it :(
Thanks for spotting this!
> Or, if you're super-keen,
> http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2 is my current rollup
> (against 2.6.17-rc4). It was compilable this morning, but I've since
> merged stuff ;) It would be interesting to know if that has fixed the bug.
I tried to be "super-keen" and applied x.bz2 to pristine 2.6.17-rc4, but
the scsi error persists (logs, .config coming in a few minutes.)
Furthermore, I had to do 2 more things to get rc4-mm* compiling:
1) apply the attached patch, as the compile breaks with:
CC drivers/pci/msi-apic.o
In file included from include/asm/msi.h:11,
from drivers/pci/msi.h:71,
from drivers/pci/msi-apic.c:8:
include/asm/smp.h:103: error: syntax error before '->' token
make[2]: *** [drivers/pci/msi-apic.o] Error 1
make[1]: *** [drivers/pci] Error 2
make: *** [drivers] Error 2
(this has been reported with 2.6.17-rc3-mm1, but was not fixed?)
2) disable CONFIG_ROOT_NFS=y, as the compile breaks with:
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
fs/built-in.o: In function `nfs_root_setup':nfsroot.c:(.init.text+0x1809):
undefined reference to `root_nfs_parse_addr'
:nfsroot.c:(.init.text+0x1810): undefined reference to `root_server_addr'
fs/built-in.o: In function `nfs_root_data': undefined reference to
`root_server_path'
fs/built-in.o: In function `nfs_root_data': undefined reference to
`root_server_addr'
As said before, .config and dmesg for rc4-mm2 in a moment, netconsole is
not working...hm.
Thank you!
Christian.
--
"No one talks peace unless he's ready to back it up with war."
"He talks of peace if it is the only way to live."
-- Colonel Green and Surak of Vulcan, "The Savage Curtain",
stardate 5906.5.
Hi Mel,
On Fri, 19 May 2006, Mel Gorman wrote:
> The warnings in this case are valid but I would think harmless. ZONE_NORMAL
> on x86_64 begins at MAX_DMA32_PFN on the 4GiB boundary which is MAX_ORDER
> aligned. From the e820 map, I am guessing the machine has 1GiB of memory
yes, this (x86_64) box has 1GB of memory, non-ECC.
> I am struggling to see how the alignment patches or
> arch-independent-zone-sizing would clobber the mapping of the ACPI table :(
I'll try to disable ACPI in the next testing runs...
Thanks,
Christian.
--
"The combination of a number of things to make existence worthwhile."
"Yes, the philosophy of 'none,' meaning 'all.'"
-- Spock and Lincoln, "The Savage Curtain", stardate 5906.4
On Fri, 19 May 2006, Andrew Morton wrote:
> [email protected] (Mel Gorman) wrote:
>>
>> I am struggling to see how the alignment patches or
>> arch-independent-zone-sizing would clobber the mapping of the ACPI table :(
>
> hm. Well something did it ;)
>
Obviously. One option is to back out
have-x86_64-use-add_active_range-and-free_area_init_nodes.patch and see
what happens on Christian's machine.
>> > I also managed to provoke "Too many memory regions,
>> > truncating" out of it.
>> >
>>
>> "Too many memory regions, truncating" is of concern because memory will be
>> effectively lost. Is this on x86_64 as well? If so, I need to submit a
>> patch that sets CONFIG_MAX_ACTIVE_REGIONS to 128 on x86_64, which is the
>> same value as E820MAX. This is similar to what PPC64 does for LMB regions
>> (see MAX_ACTIVE_REGIONS in arch/powerpc/Kconfig for example). If it's not
>> x86_64, what arch does it occur on?
>
> Yes, it's x86_64. It kind of went away though. I seem to have been
> finding various .config combinations which cause x86_64 to die horridly -
> that was one.
>
Can you please post some of the configs so I can see whether I can
reproduce it locally?
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
On Sat, 20 May 2006, Christian Kujau wrote:
> I tried to be "super-keen" and applied x.bz2 to pristine 2.6.17-rc4, but the
> scsi error persists (logs, .config coming in a few minutes.)
Please see .config and dmesgs here:
http://www.nerdbynature.de/bits/2.6.17-rc4-mm2.x/
I'll try with ACPI disabled later on and let you know. If you have more
patches to test/back-out I'll be happy to test. What puzzles me: sym53c8xx
does not seem *too* exotic but I seem to be the only one whining...
Thanks,
Christian.
--
"The combination of a number of things to make existence worthwhile."
"Yes, the philosophy of 'none,' meaning 'all.'"
-- Spock and Lincoln, "The Savage Curtain", stardate 5906.4
On Sat, 20 May 2006, Mel Gorman wrote:
> Obviously. One option is to back out
> have-x86_64-use-add_active_range-and-free_area_init_nodes.patch and see what
> happens on Christian's machine.
I've disabled CONFIG_PM and backed out the above patch (from -rc4-mm1), but
sadly, the error persists:
http://nerdbynature.de/bits/2.6.17-rc4-mm1/no-CONFIG_PM/
http://nerdbynature.de/bits/2.6.17-rc4-mm2.x/no-CONFIG_PM/
(the first one with the said patch backed out)
Thanks for your ideas,
Christian.
--
There's another way to survive. Mutual trust -- and help.
-- Kirk, "Day of the Dove", stardate unknown
Just in case the news didn't get through: the issue has been fixed in
-mm3. I'm not sure what the real fix was, since
- rc4 is working
- rc4-mm1 is not working
- rc4-mm2 is not working
- rc4-mm3 is working
Mel Gorman sent me the zonesizing-v13 patch for -mm3 (thanks again!),
which was also working, results are here:
http://nerdbynature.de/bits/2.6.17-rc4-mm3/
Thanks to all involved,
Christian.
--
BOFH excuse #435:
Internet shut down due to maintenance