2007-12-03 04:00:04

by Geoff Levand

Subject: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Hi.

I'm finding that recently kexec'ed kernels on PS3 will
panic on startup. It seems the trouble was introduced
with the ppc64 SPARSEMEM_VMEMMAP support. The problem
is the same when starting either new or old kernels:

2.6.24 -> 2.6.23 ok
2.6.24 -> 2.6.23 panic
2.6.24 -> 2.6.24 panic

These are the commits that seem to introduce the problem:

d29eff7bca60c9ee401d691d4562a4abca8de543 ppc64: SPARSEMEM_VMEMMAP support
8f6aac419bd590f535fb110875a51f7db2b62b5b Generic Virtual Memmap support for SPARSEMEM


Below is a startup dump. Any help in finding the problem
would be appreciated.

-Geoff



ps3_mm_add_memory:317: start_addr 740320000000h, start_pfn 740320000h, nr_pages 17000h
<4>swapper: page allocation failure. order:12, mode:0x80d0
Call Trace:
[c000000006047820] [c00000000000e700] .show_stack+0x68/0x1b0 (unreliable)
[c0000000060478c0] [c000000000089eb4] .__alloc_pages+0x358/0x3ac
[c0000000060479b0] [c0000000000a3964] .vmemmap_alloc_block+0x6c/0xf4
[c000000006047a40] [c000000000026544] .vmemmap_populate+0x74/0x100
[c000000006047ae0] [c0000000000a385c] .sparse_mem_map_populate+0x38/0x5c
[c000000006047b70] [c0000000000a36e4] .sparse_add_one_section+0x64/0x128
[c000000006047c20] [c0000000000aa74c] .__add_pages+0xac/0x18c
[c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
[c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
[c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
[c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
[c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
Mem-info:
DMA per-cpu:
CPU 0: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0
CPU 1: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0
Active:0 inactive:0 dirty:0 writeback:0 unstable:0
free:18094 slab:122 mapped:0 pagetables:0 bounce:0
DMA free:72376kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:129280kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 8*4kB 5*8kB 5*16kB 7*32kB 3*64kB 5*128kB 4*256kB 3*512kB 5*1024kB 3*2048kB 4*4096kB 5*8192kB 0*16384kB = 72376kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 0kB
Total swap = 0kB
Free swap: 0kB
32768 pages of RAM
10403 reserved pages
0 pages shared
0 pages swap cached
<1>Unable to handle kernel paging request for data at address 0xcf0001960b000010
<1>Faulting instruction address: 0xc000000000087340
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 PS3
Modules linked in:
NIP: c000000000087340 LR: c00000000008733c CTR: 0000000000000000
REGS: c000000006047900 TRAP: 0300 Not tainted (2.6.24-rc3-ps3-linux-dev-g91428d55-dirty)
MSR: 8000000000008032 <EE,IR,DR> CR: 22004444 XER: 00000000
DAR: cf0001960b000010, DSISR: 0000000042000000
TASK = c000000006041080[1] 'swapper' THREAD: c000000006044000 CPU: 1
<6>GPR00: 0000000000000000 c000000006047b80 c00000000052b410 c000000006001b40
<6>GPR04: 0000000000000001 0000000000000003 0000000000000008 0000000000000000
<6>GPR08: 0000000000000002 cf0001960b000008 c000000006051240 0000000000000003
<6>GPR12: 0000000000000003 c000000000484080 00000000100d0000 0000000000bc5000
<6>GPR16: 0000000007fff000 0000000000000001 00000000100a0000 00000000100d0000
<6>GPR20: 0000000000000000 00000000100df628 00000000100df458 00000000100df678
<6>GPR24: 0000000000740336 c000000000492c00 0000000000000000 0000000000000001
<6>GPR28: 0000000740325000 0000000740324924 c0000000004ce9a8 cf0001960affffe0
NIP [c000000000087340] .memmap_init_zone+0xf0/0x134
LR [c00000000008733c] .memmap_init_zone+0xec/0x134
Call Trace:
[c000000006047b80] [c0000000001da530] .add_memory_block+0xd8/0x108 (unreliable)
[c000000006047c20] [c0000000000aa7ac] .__add_pages+0x10c/0x18c
[c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
[c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
[c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
[c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
[c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
Instruction dump:
901f000c 38000400 7d20f8a8 7d290378 7d20f9ad 40a2fff4 7ba00521 7fe3fb78
38800002 41820008 4bffff0d 393f0028 <f9290008> f93f0028 3bbd0001 3bff0038
<0>Kernel panic - not syncing: Attempted to kill init!


2007-12-03 16:32:42

by Milton Miller

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Dec 2, 2007, at 9:59 PM, Geoff Levand wrote:

> Hi.
>
> I'm finding that recently kexec'ed kernels on PS3 will
> panic on startup. It seems the trouble was introduced
> with the ppc64 SPARSEMEM_VMEMMAP support. The problem
> is the same when starting either new or old kernels:
>
> 2.6.24 -> 2.6.23 ok
> 2.6.24 -> 2.6.23 panic
> 2.6.24 -> 2.6.24 panic

I'm not sure I completely follow this. What is the difference between
lines 1 and 2? Also, you are talking about starting with kexec, but I
don't see how that fits in with the failure you have below. In other
words, there may be more than one failure. But I can talk a bit about the
scope of the problem in the current traceback.

>
> These are the commits that seem to introduce the problem:
>
> d29eff7bca60c9ee401d691d4562a4abca8de543 ppc64: SPARSEMEM_VMEMMAP
> suppor
> 8f6aac419bd590f535fb110875a51f7db2b62b5b Generic Virtual Memmap
> support for SPARSEMEM
>
>
> Below is a startup dump. Any help in finding the problem
> would be appreciated.
>
> -Geoff
>
>
>
> ps3_mm_add_memory:317: start_addr 740320000000h, start_pfn 740320000h,
> nr_pages 17000h
> <4>swapper: page allocation failure. order:12, mode:0x80d0
> Call Trace:
> [c000000006047820] [c00000000000e700] .show_stack+0x68/0x1b0
> (unreliable)
> [c0000000060478c0] [c000000000089eb4] .__alloc_pages+0x358/0x3ac
> [c0000000060479b0] [c0000000000a3964] .vmemmap_alloc_block+0x6c/0xf4
> [c000000006047a40] [c000000000026544] .vmemmap_populate+0x74/0x100
> [c000000006047ae0] [c0000000000a385c]
> .sparse_mem_map_populate+0x38/0x5c
> [c000000006047b70] [c0000000000a36e4]
> .sparse_add_one_section+0x64/0x128
> [c000000006047c20] [c0000000000aa74c] .__add_pages+0xac/0x18c
> [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> Mem-info:
> DMA per-cpu:
> CPU 0: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch:
> 3 usd: 0
> CPU 1: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch:
> 3 usd: 0
> Active:0 inactive:0 dirty:0 writeback:0 unstable:0
> free:18094 slab:122 mapped:0 pagetables:0 bounce:0
> DMA free:72376kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
> present:129280kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 8*4kB 5*8kB 5*16kB 7*32kB 3*64kB 5*128kB 4*256kB 3*512kB 5*1024kB
> 3*2048kB 4*4096kB 5*8192kB 0*16384kB = 72376kB
> Swap cache: add 0, delete 0, find 0/0, race 0+0
> Free swap = 0kB
> Total swap = 0kB
> Free swap: 0kB
> 32768 pages of RAM
> 10403 reserved pages
> 0 pages shared
> 0 pages swap cached

The kernel is using 16MB pages for the linear mapping and, since it's in
the same region, the sparse virtual memmap. PS3 uses hotplug for
almost all of its memory. In this case, it's trying to allocate an
additional page to cover a new region of the memory map. However, the
initial 128 MB is fragmented: the free list shows five 8MB chunks but no
16MB ones.
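
The arithmetic behind that reading of the dump can be checked with a small userspace sketch. It assumes 4 kB base pages (the smallest unit on the "DMA:" line), so an order:12 request is 4 kB << 12 = 16 MB. The function names are made up for illustration and are not kernel APIs; the per-order counts are copied from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by one buddy block of the given order. */
unsigned long block_bytes(unsigned int order, unsigned long base_page_bytes)
{
    return base_page_bytes << order;
}

/* Total free memory in kB summed over all orders. */
unsigned long total_free_kb(const unsigned int *free_count, size_t n_orders,
                            unsigned long base_page_kb)
{
    unsigned long total = 0;

    assert(free_count != NULL);
    for (size_t order = 0; order < n_orders; order++)
        total += (unsigned long)free_count[order] * (base_page_kb << order);
    return total;
}

/* Returns 1 if a free block of at least `order` exists, else 0. */
int can_allocate(const unsigned int *free_count, size_t n_orders,
                 unsigned int order)
{
    assert(free_count != NULL);
    for (size_t o = order; o < n_orders; o++)
        if (free_count[o] > 0)
            return 1;
    return 0;
}

/* Counts from the "DMA:" line of the dump, orders 0..12 (4kB..16384kB). */
const unsigned int dump_free_count[13] = {
    8, 5, 5, 7, 3, 5, 4, 3, 5, 3, 4, 5, 0
};
```

With these counts the total comes to the 72376 kB reported in the dump, an order-11 (8 MB) request could be satisfied, and an order-12 (16 MB) request could not, even though far more than 16 MB is free in total.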

> <1>Unable to handle kernel paging request for data at address
> 0xcf0001960b000010
> <1>Faulting instruction address: 0xc000000000087340
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2 PS3
> Modules linked in:
> NIP: c000000000087340 LR: c00000000008733c CTR: 0000000000000000
> REGS: c000000006047900 TRAP: 0300 Not tainted
> (2.6.24-rc3-ps3-linux-dev-g91428d55-dirty)
> MSR: 8000000000008032 <EE,IR,DR> CR: 22004444 XER: 00000000
> DAR: cf0001960b000010, DSISR: 0000000042000000
> TASK = c000000006041080[1] 'swapper' THREAD: c000000006044000 CPU: 1
> <6>GPR00: 0000000000000000 c000000006047b80 c00000000052b410
> c000000006001b40
> <6>GPR04: 0000000000000001 0000000000000003 0000000000000008
> 0000000000000000
> <6>GPR08: 0000000000000002 cf0001960b000008 c000000006051240
> 0000000000000003
> <6>GPR12: 0000000000000003 c000000000484080 00000000100d0000
> 0000000000bc5000
> <6>GPR16: 0000000007fff000 0000000000000001 00000000100a0000
> 00000000100d0000
> <6>GPR20: 0000000000000000 00000000100df628 00000000100df458
> 00000000100df678
> <6>GPR24: 0000000000740336 c000000000492c00 0000000000000000
> 0000000000000001
> <6>GPR28: 0000000740325000 0000000740324924 c0000000004ce9a8
> cf0001960affffe0
> NIP [c000000000087340] .memmap_init_zone+0xf0/0x134
> LR [c00000000008733c] .memmap_init_zone+0xec/0x134
> Call Trace:
> [c000000006047b80] [c0000000001da530] .add_memory_block+0xd8/0x108
> (unreliable)
> [c000000006047c20] [c0000000000aa7ac] .__add_pages+0x10c/0x18c
> [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> Instruction dump:
> 901f000c 38000400 7d20f8a8 7d290378 7d20f9ad 40a2fff4 7ba00521 7fe3fb78
> 38800002 41820008 4bffff0d 393f0028 <f9290008> f93f0028 3bbd0001
> 3bff0038
> <0>Kernel panic - not syncing: Attempted to kill init!

Instead of detecting the failure and aborting the add, we proceed to
dereference the memory map.
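
The addresses in the oops can be related to the region being added with a rough calculation. This is a sketch under two assumptions that the thread does not state: that sizeof(struct page) is 56 bytes (inferred from the 0x38 stride in the instruction dump) and that the virtual memmap starts at 0xcf00000000000000 (inferred from the faulting address). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: struct page stride of 0x38 bytes, as seen in the dump. */
#define ASSUMED_STRUCT_PAGE_BYTES 56ULL
/* 16 MB mapping pages, per the explanation of the linear mapping. */
#define MAPPING_PAGE_SHIFT 24

/* Offset of the struct page for `pfn` from the start of the memmap. */
uint64_t memmap_offset(uint64_t pfn)
{
    return pfn * ASSUMED_STRUCT_PAGE_BYTES;
}

/* Number of 16 MB mapping pages touched by [start, start + len). */
uint64_t mapping_pages_spanned(uint64_t start, uint64_t len)
{
    assert(len > 0);
    uint64_t first = start >> MAPPING_PAGE_SHIFT;
    uint64_t last = (start + len - 1) >> MAPPING_PAGE_SHIFT;
    return last - first + 1;
}
```

Under these assumptions, start_pfn 740320000h maps to offset 0x1960af00000, and the memmap for 17000h pages would touch two 16 MB mapping pages. GPR29 (0x740324924) maps to offset 0x1960affffe0, which matches the low bits of GPR31, and the faulting address sits 0x10 bytes past the next 16 MB boundary. That is consistent with the second mapping page never having been populated, though the thread does not confirm it.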

Chris, as you can see, PS3 needs to allocate 1/8th of total initial
memory to add any more memory. Geoff, can you predict what linear
address the additional memory will occupy? Judging from the attempted
address to add, maybe not. If not, my only thought is to pre-reserve
an additional page and consume it on the first add. Additional adds
will likely draw from the first added region, pinning it.
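
The pre-reservation idea can be illustrated with a userspace toy. This is only a sketch of the reserve-once, consume-once pattern: reserve_early() and take_reserved() are hypothetical names, aligned_alloc() stands in for whatever early allocator the kernel would use, and nothing here reflects how the PS3 platform code actually does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HUGE_PAGE_BYTES (16UL << 20)   /* one 16 MB mapping page */

static void *reserved_block;           /* set aside before memory fragments */

/* Reserve one aligned 16 MB block early. Returns 0 on success, -1 on failure. */
int reserve_early(void)
{
    if (reserved_block)
        return 0;
    reserved_block = aligned_alloc(HUGE_PAGE_BYTES, HUGE_PAGE_BYTES);
    return reserved_block ? 0 : -1;
}

/* Hand out the reserved block exactly once; later callers get NULL. */
void *take_reserved(void)
{
    void *block = reserved_block;

    reserved_block = NULL;
    return block;
}

/* Usage: reserve once early, consume once at the first add. */
void demo_first_add(void)
{
    assert(reserve_early() == 0);
    void *backing = take_reserved();
    assert(backing != NULL);          /* the first add gets the block */
    assert(take_reserved() == NULL);  /* later adds must look elsewhere */
    free(backing);
}
```

The point of the pattern is that the 16 MB request is made while a block of that size still exists, so the first hot-add does not depend on the state of the free lists at that moment.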

milton

2007-12-04 08:31:05

by Geert Uytterhoeven

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Mon, 3 Dec 2007, Milton Miller wrote:
> On Dec 2, 2007, at 9:59 PM, Geoff Levand wrote:
> > I'm finding that recently kexec'ed kernels on PS3 will
> > panic on startup. It seems the trouble was introduced
> > with the ppc64 SPARSEMEM_VMEMMAP support. The problem
> > is the same when starting either new or old kernels:
> >
> > 2.6.24 -> 2.6.23 ok
> > 2.6.24 -> 2.6.23 panic
> > 2.6.24 -> 2.6.24 panic
>
> I'm not sure I completely follow this. What is the difference between 1 and 2
> ? Also, you are talking about starting with kexec, but I don't see how that

I think the first line should be `2.6.23 -> 2.6.24' (i.e. a 2.6.23 kernel
kexec's a 2.6.24(-rc) kernel).

> fits in the failure you have below. In other words, there may be more than
> one failure. But I can talk a bit about the scope of the problem in the
> current traceback.

The problem can be triggered in other ways, e.g. playing with the ps3fb=xxxM
parameter, which specifies how much memory is reserved for ps3fb.
With 2.6.23, I can boot with `ps3fb=48M'. Very early in 2.6.24-rc*, this
changed.
2.6.24 seems to fragment more easily than 2.6.23.

> > These are the commits that seem to introduce the problem:
> >
> > d29eff7bca60c9ee401d691d4562a4abca8de543 ppc64: SPARSEMEM_VMEMMAP suppor
> > 8f6aac419bd590f535fb110875a51f7db2b62b5b Generic Virtual Memmap support
> > for SPARSEMEM
> >
> >
> > Below is a startup dump. Any help in finding the problem
> > would be appreciated.
> >
> > -Geoff
> >
> >
> >
> > ps3_mm_add_memory:317: start_addr 740320000000h, start_pfn 740320000h,
> > nr_pages 17000h
> > <4>swapper: page allocation failure. order:12, mode:0x80d0
> > Call Trace:
> > [c000000006047820] [c00000000000e700] .show_stack+0x68/0x1b0 (unreliable)
> > [c0000000060478c0] [c000000000089eb4] .__alloc_pages+0x358/0x3ac
> > [c0000000060479b0] [c0000000000a3964] .vmemmap_alloc_block+0x6c/0xf4
> > [c000000006047a40] [c000000000026544] .vmemmap_populate+0x74/0x100
> > [c000000006047ae0] [c0000000000a385c] .sparse_mem_map_populate+0x38/0x5c
> > [c000000006047b70] [c0000000000a36e4] .sparse_add_one_section+0x64/0x128
> > [c000000006047c20] [c0000000000aa74c] .__add_pages+0xac/0x18c
> > [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> > [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> > [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> > [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> > [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> > Mem-info:
> > DMA per-cpu:
> > CPU 0: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd:
> > 0
> > CPU 1: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd:
> > 0
> > Active:0 inactive:0 dirty:0 writeback:0 unstable:0
> > free:18094 slab:122 mapped:0 pagetables:0 bounce:0
> > DMA free:72376kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
> > present:129280kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 8*4kB 5*8kB 5*16kB 7*32kB 3*64kB 5*128kB 4*256kB 3*512kB 5*1024kB
> > 3*2048kB 4*4096kB 5*8192kB 0*16384kB = 72376kB
> > Swap cache: add 0, delete 0, find 0/0, race 0+0
> > Free swap = 0kB
> > Total swap = 0kB
> > Free swap: 0kB
> > 32768 pages of RAM
> > 10403 reserved pages
> > 0 pages shared
> > 0 pages swap cached
>
> The kernel is using 16MB pages for the linear mapping and, since its in the
> same region, the sparse virtural memmap. PS3 uses hotplug for all most all of
> its memory. In this case, its trying to allocate an additional page to cover
> a new region of the memory map. However, the initial 128 MB is fragmented,
> we have 8 8M chunks but no 16MB ones.
>
> > <1>Unable to handle kernel paging request for data at address
> > 0xcf0001960b000010
> > <1>Faulting instruction address: 0xc000000000087340
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=2 PS3
> > Modules linked in:
> > NIP: c000000000087340 LR: c00000000008733c CTR: 0000000000000000
> > REGS: c000000006047900 TRAP: 0300 Not tainted
> > (2.6.24-rc3-ps3-linux-dev-g91428d55-dirty)
> > MSR: 8000000000008032 <EE,IR,DR> CR: 22004444 XER: 00000000
> > DAR: cf0001960b000010, DSISR: 0000000042000000
> > TASK = c000000006041080[1] 'swapper' THREAD: c000000006044000 CPU: 1
> > <6>GPR00: 0000000000000000 c000000006047b80 c00000000052b410
> > c000000006001b40
> > <6>GPR04: 0000000000000001 0000000000000003 0000000000000008
> > 0000000000000000
> > <6>GPR08: 0000000000000002 cf0001960b000008 c000000006051240
> > 0000000000000003
> > <6>GPR12: 0000000000000003 c000000000484080 00000000100d0000
> > 0000000000bc5000
> > <6>GPR16: 0000000007fff000 0000000000000001 00000000100a0000
> > 00000000100d0000
> > <6>GPR20: 0000000000000000 00000000100df628 00000000100df458
> > 00000000100df678
> > <6>GPR24: 0000000000740336 c000000000492c00 0000000000000000
> > 0000000000000001
> > <6>GPR28: 0000000740325000 0000000740324924 c0000000004ce9a8
> > cf0001960affffe0
> > NIP [c000000000087340] .memmap_init_zone+0xf0/0x134
> > LR [c00000000008733c] .memmap_init_zone+0xec/0x134
> > Call Trace:
> > [c000000006047b80] [c0000000001da530] .add_memory_block+0xd8/0x108
> > (unreliable)
> > [c000000006047c20] [c0000000000aa7ac] .__add_pages+0x10c/0x18c
> > [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> > [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> > [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> > [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> > [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> > Instruction dump:
> > 901f000c 38000400 7d20f8a8 7d290378 7d20f9ad 40a2fff4 7ba00521 7fe3fb78
> > 38800002 41820008 4bffff0d 393f0028 <f9290008> f93f0028 3bbd0001 3bff0038
> > <0>Kernel panic - not syncing: Attempted to kill init!
>
> Instead of detecting the fail and aborting the add, we proceed to dereference
> the memory map.
>
> Chris, as you can see, PS3 needs to allocate 1/8th of total initial memory to
> add any more memory. Geoff, can you predict what linear address the
> additional memory will occupy? Judging from the attempted address toa add,
> maybe not. If not, my only thought is to pre-reserve an additional page and
> consume it on the first add. Additional adds will likely draw from the first
> added region, pinning.

To me it sounds a bit strange that hotplug memory relies on having huge
contiguous blocks of memory available. If this isn't done very early in the
boot process, chances are high it will fail.

Would it be possible to allocate the memory from the newly added block, which
is guaranteed to be unfragmented?

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-12-05 04:56:17

by Geoff Levand

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Milton Miller wrote:
> On Dec 2, 2007, at 9:59 PM, Geoff Levand wrote:
>
>> Hi.
>>
>> I'm finding that recently kexec'ed kernels on PS3 will
>> panic on startup. It seems the trouble was introduced
>> with the ppc64 SPARSEMEM_VMEMMAP support. The problem
>> is the same when starting either new or old kernels:
>>
>> 2.6.24 -> 2.6.23 ok
>> 2.6.24 -> 2.6.23 panic
>> 2.6.24 -> 2.6.24 panic
>
> I'm not sure I completely follow this. What is the difference between
> 1 and 2 ?


Sorry, '2.6.23 -> 2.6.24 ok', but it really doesn't have much meaning,
considering what the actual problem is.


> Also, you are talking about starting with kexec, but I
> don't see how that fits in the failure you have below.


I think it is just by chance that the kexec'ed kernel hits it: the
2.6.24 kernel is right at the point of hitting the condition,
and the memory usage of the kexec'ed kernel triggers it.

If I just reduce the size of the kernel a small amount kexec
works ok, and as Geert pointed out, if you increase the size
of the first stage kernel it will hit it.


>> DMA free:72376kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
>> present:129280kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0
>> DMA: 8*4kB 5*8kB 5*16kB 7*32kB 3*64kB 5*128kB 4*256kB 3*512kB 5*1024kB
>> 3*2048kB 4*4096kB 5*8192kB 0*16384kB = 72376kB
>> Swap cache: add 0, delete 0, find 0/0, race 0+0
>> Free swap = 0kB
>> Total swap = 0kB
>> Free swap: 0kB
>> 32768 pages of RAM
>> 10403 reserved pages
>> 0 pages shared
>> 0 pages swap cached
>
> The kernel is using 16MB pages for the linear mapping and, since its in
> the same region, the sparse virtural memmap. PS3 uses hotplug for all
> most all of its memory. In this case, its trying to allocate an
> additional page to cover a new region of the memory map. However, the
> initial 128 MB is fragmented, we have 8 8M chunks but no 16MB ones.


Yes, I see this is the problem.

-Geoff

2007-12-05 04:56:31

by Geoff Levand

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Geert Uytterhoeven wrote:
> On Mon, 3 Dec 2007, Milton Miller wrote:
>> Chris, as you can see, PS3 needs to allocate 1/8th of total initial memory to
>> add any more memory. Geoff, can you predict what linear address the
>> additional memory will occupy? Judging from the attempted address toa add,
>> maybe not. If not, my only thought is to pre-reserve an additional page and
>> consume it on the first add. Additional adds will likely draw from the first
>> added region, pinning.
>
> To me it sounds a bit strange that hotplug memory relies on having huge
> contiguous blocks of memory available. If this isn't done very early in the
> boot process, changes are high it will fail.
>
> Would it be possible to allocate the memory from the newly added block, which
> is guaranteed to be unfragmented?


Yes, this sounds like a cleaner solution than pre-allocating, as the memory is
there and its properties are known.

-Geoff

2007-12-05 09:53:00

by Geert Uytterhoeven

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Mon, 3 Dec 2007, Milton Miller wrote:
> On Dec 2, 2007, at 9:59 PM, Geoff Levand wrote:
> > ps3_mm_add_memory:317: start_addr 740320000000h, start_pfn 740320000h,
> > nr_pages 17000h
> > <4>swapper: page allocation failure. order:12, mode:0x80d0
> > Call Trace:
> > [c000000006047820] [c00000000000e700] .show_stack+0x68/0x1b0 (unreliable)
> > [c0000000060478c0] [c000000000089eb4] .__alloc_pages+0x358/0x3ac
> > [c0000000060479b0] [c0000000000a3964] .vmemmap_alloc_block+0x6c/0xf4
> > [c000000006047a40] [c000000000026544] .vmemmap_populate+0x74/0x100
> > [c000000006047ae0] [c0000000000a385c] .sparse_mem_map_populate+0x38/0x5c
> > [c000000006047b70] [c0000000000a36e4] .sparse_add_one_section+0x64/0x128
> > [c000000006047c20] [c0000000000aa74c] .__add_pages+0xac/0x18c
> > [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> > [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> > [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> > [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> > [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> > Mem-info:
> > DMA per-cpu:
> > CPU 0: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd:
> > 0
> > CPU 1: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd:
> > 0
> > Active:0 inactive:0 dirty:0 writeback:0 unstable:0
> > free:18094 slab:122 mapped:0 pagetables:0 bounce:0
> > DMA free:72376kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
> > present:129280kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 8*4kB 5*8kB 5*16kB 7*32kB 3*64kB 5*128kB 4*256kB 3*512kB 5*1024kB
> > 3*2048kB 4*4096kB 5*8192kB 0*16384kB = 72376kB
> > Swap cache: add 0, delete 0, find 0/0, race 0+0
> > Free swap = 0kB
> > Total swap = 0kB
> > Free swap: 0kB
> > 32768 pages of RAM
> > 10403 reserved pages
> > 0 pages shared
> > 0 pages swap cached
>
> The kernel is using 16MB pages for the linear mapping and, since its in the
> same region, the sparse virtural memmap. PS3 uses hotplug for all most all of
> its memory. In this case, its trying to allocate an additional page to cover
> a new region of the memory map. However, the initial 128 MB is fragmented,
> we have 8 8M chunks but no 16MB ones.
>
> > <1>Unable to handle kernel paging request for data at address
> > 0xcf0001960b000010
> > <1>Faulting instruction address: 0xc000000000087340
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=2 PS3
> > Modules linked in:
> > NIP: c000000000087340 LR: c00000000008733c CTR: 0000000000000000
> > REGS: c000000006047900 TRAP: 0300 Not tainted
> > (2.6.24-rc3-ps3-linux-dev-g91428d55-dirty)
> > MSR: 8000000000008032 <EE,IR,DR> CR: 22004444 XER: 00000000
> > DAR: cf0001960b000010, DSISR: 0000000042000000
> > TASK = c000000006041080[1] 'swapper' THREAD: c000000006044000 CPU: 1
> > <6>GPR00: 0000000000000000 c000000006047b80 c00000000052b410
> > c000000006001b40
> > <6>GPR04: 0000000000000001 0000000000000003 0000000000000008
> > 0000000000000000
> > <6>GPR08: 0000000000000002 cf0001960b000008 c000000006051240
> > 0000000000000003
> > <6>GPR12: 0000000000000003 c000000000484080 00000000100d0000
> > 0000000000bc5000
> > <6>GPR16: 0000000007fff000 0000000000000001 00000000100a0000
> > 00000000100d0000
> > <6>GPR20: 0000000000000000 00000000100df628 00000000100df458
> > 00000000100df678
> > <6>GPR24: 0000000000740336 c000000000492c00 0000000000000000
> > 0000000000000001
> > <6>GPR28: 0000000740325000 0000000740324924 c0000000004ce9a8
> > cf0001960affffe0
> > NIP [c000000000087340] .memmap_init_zone+0xf0/0x134
> > LR [c00000000008733c] .memmap_init_zone+0xec/0x134
> > Call Trace:
> > [c000000006047b80] [c0000000001da530] .add_memory_block+0xd8/0x108
> > (unreliable)
> > [c000000006047c20] [c0000000000aa7ac] .__add_pages+0x10c/0x18c
> > [c000000006047cd0] [c000000000025fd4] .arch_add_memory+0x44/0x60
> > [c000000006047d60] [c0000000000aa5b0] .add_memory+0xd4/0x124
> > [c000000006047e00] [c000000000452544] .ps3_mm_add_memory+0x8c/0x108
> > [c000000006047ea0] [c0000000004417c4] .kernel_init+0x1f4/0x3b8
> > [c000000006047f90] [c000000000021d88] .kernel_thread+0x4c/0x68
> > Instruction dump:
> > 901f000c 38000400 7d20f8a8 7d290378 7d20f9ad 40a2fff4 7ba00521 7fe3fb78
> > 38800002 41820008 4bffff0d 393f0028 <f9290008> f93f0028 3bbd0001 3bff0038
> > <0>Kernel panic - not syncing: Attempted to kill init!
>
> Instead of detecting the fail and aborting the add, we proceed to dereference
> the memory map.

sparse_add_one_section() does:

memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, nr_pages);

but doesn't check whether the allocation succeeded.

Patch to make it continue booting below (but this doesn't fix the issue that
you need 16 MiB of contiguous memory before you can add a new memory region).

--------------------------------------------------------------------------------
Subject: sparsemem: sparse_add_one_section() may fail to allocate memory

sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
whether the allocation succeeded before proceeding to touch the allocated
memory.

From: Geert Uytterhoeven <[email protected]>

Signed-off-by: Geert Uytterhoeven <[email protected]>
---
FIXME There are still some possible memory leaks in sparse_add_one_section():
- usemap is never deallocated
- __kfree_section_memmap() is a not yet implemented dummy

mm/sparse.c | 3 +++
1 files changed, 3 insertions(+)

--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -391,6 +391,9 @@ int sparse_add_one_section(struct zone *
 	 */
 	sparse_index_init(section_nr, pgdat->node_id);
 	memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, nr_pages);
+	if (!memmap)
+		return -ENOMEM;
+
 	usemap = __kmalloc_section_usemap();
 
 	pgdat_resize_lock(pgdat, &flags);


With kind regards,

Geert Uytterhoeven

2007-12-05 23:12:37

by Andrew Morton

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Wed, 5 Dec 2007 10:52:48 +0100 (CET)
Geert Uytterhoeven <[email protected]> wrote:

> --------------------------------------------------------------------------------
> Subject: sparsemem: sparse_add_one_section() may fail to allocate memory
>
> sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
> whether the allocation succeeded before proceeding to touch the allocated
> memory.
>
> From: Geert Uytterhoeven <[email protected]>
>
> Signed-off-by: Geert Uytterhoeven <[email protected]>
> ---
> FIXME There are still some possible memory leaks in sparse_add_one_section():
> - usemap is never deallocated
> - __kfree_section_memmap() is a not yet implemented dummy

I already had

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch
and
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch

queued. Do they fix the problem, and should they be merged in 2.6.24?

2007-12-05 23:45:27

by Geoff Levand

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Andrew Morton wrote:
> On Wed, 5 Dec 2007 10:52:48 +0100 (CET)
> Geert Uytterhoeven <[email protected]> wrote:
>
>> --------------------------------------------------------------------------------
>> Subject: sparsemem: sparse_add_one_section() may fail to allocate memory
>>
>> sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
>> whether the allocation succeeded before proceeding to touch the allocated
>> memory.
>>
>> From: Geert Uytterhoeven <[email protected]>
>>
>> Signed-off-by: Geert Uytterhoeven <[email protected]>
>> ---
>> FIXME There are still some possible memory leaks in sparse_add_one_section():
>> - usemap is never deallocated
>> - __kfree_section_memmap() is a not yet implemented dummy
>
> I already had
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch
> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch
>
> queued. Do they fix the problem, and should they be merged in 2.6.24?

No, a quick test shows it just panics in a different place. Geert's
patch does also.

I'll try Milton's suggestion to pre-allocate the memory early. It seems
that should work as long as nothing else before the hot-plug mem is added
needs a large chunk.

-Geoff

2007-12-06 05:44:27

by Geoff Levand

Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Andrew Morton wrote:
> On Wed, 5 Dec 2007 10:52:48 +0100 (CET)
> Geert Uytterhoeven <[email protected]> wrote:
>
>> --------------------------------------------------------------------------------
>> Subject: sparsemem: sparse_add_one_section() may fail to allocate memory
>>
>> sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
>> whether the allocation succeeded before proceeding to touch the allocated
>> memory.
>>
>> From: Geert Uytterhoeven <[email protected]>
>>
>> Signed-off-by: Geert Uytterhoeven <[email protected]>
>> ---
>> FIXME There are still some possible memory leaks in sparse_add_one_section():
>> - usemap is never deallocated
>> - __kfree_section_memmap() is a not yet implemented dummy
>
> I already had
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch


This one has an error in it. A patch to fix it is below.


> and
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch
>
> queued. Do they fix the problem, and should they be merged in 2.6.24?


These two plus my fix below allow the hot plug add_memory() call to fail
gracefully and for the platform code to continue to boot on the
128MB of boot mem.

With ps3_defconfig the condition is only hit by the second stage
kexec'ed (kboot) kernel, which is not generally built by end users,
but there is a chance this condition would be hit by a custom kernel
config, so I think they should go in for 2.6.24.

I'll continue to work on a fix for the memory allocation failure.

-Geoff


------------------
Subject: sparsemem: Fix sparse_index_init return check

sparse_index_init() returns -EEXIST to indicate the index
has already been created. Exclude this from the error check
on the return value.

Signed-off-by: Geoff Levand <[email protected]>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -392,7 +392,7 @@ int sparse_add_one_section(struct zone *
* plus, it does a kmalloc
*/
ret = sparse_index_init(section_nr, pgdat->node_id);
- if (ret < 0)
+ if (ret < 0 && ret != -EEXIST)
return ret;
memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, nr_pages);
if (!memmap)
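
For illustration only, a minimal user-space sketch of the return check
in the patch above. demo_index_init() and demo_add_one_section() are
hypothetical stand-ins, not the functions in mm/sparse.c, and the
fixed-size table is an assumption made to keep the example
self-contained. It shows why -EEXIST has to be excluded from the error
test: a second add of the same section would otherwise be reported as
a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_NR_SECTIONS 8	/* hypothetical table size for this sketch */

static bool demo_index_present[DEMO_NR_SECTIONS];

/*
 * Hypothetical stand-in for sparse_index_init(): returns 0 when the
 * index is created, -EEXIST when it already exists, and -ENOMEM when
 * the table has no room (standing in for an allocation failure).
 */
static int demo_index_init(unsigned long section_nr)
{
	if (section_nr >= DEMO_NR_SECTIONS)
		return -ENOMEM;
	if (demo_index_present[section_nr])
		return -EEXIST;
	demo_index_present[section_nr] = true;
	return 0;
}

/* Caller pattern from the patch: only real errors abort the add. */
static int demo_add_one_section(unsigned long section_nr)
{
	int ret = demo_index_init(section_nr);

	if (ret < 0 && ret != -EEXIST)
		return ret;

	/* Here the index exists, whether it was just created or not. */
	assert(ret == 0 || ret == -EEXIST);
	/* ... the memmap allocation would follow here ... */
	return 0;
}
```

With the old `if (ret < 0)` test, the second call for the same section
in this sketch would return -EEXIST instead of 0.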


2007-12-06 06:11:17

by Yasunori Goto

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec


> I'll try Milton's suggestion to pre-allocate the memory early. It seems
> that should work as long as nothing else before the hot-plug mem is added
> needs a large chunk.

Hello, Geoff-san. Sorry for the late response.

Could you tell me the value of the following page_size calculation
in vmemmap_populate()? I think this page_size may be too large.

------
int __meminit vmemmap_populate(struct page *start_page,
unsigned long nr_pages, int node)
:
:
unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
:
-------


In addition, I remember that the current add_memory() is designed to
add only one section at a time. (See memory_probe_store() and
sparse_mem_map_populate(): they request the mem_map for only one
section by specifying PAGES_PER_SECTION.)
The section size for a normal powerpc box is only 16MB
(IA64 -> 1GB, x86-64 -> 128MB).

But, if my understanding is correct, PS3's add_memory() call covers all
of the added memory at once. I'm afraid some other problems may still
be hidden in this issue.

(However, I think Milton-san's suggestion is very desirable.
If preallocation for hot-add works on ia64 too, I would be very glad.)

Thanks.

--
Yasunori Goto

2007-12-06 07:41:42

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Wed, 5 Dec 2007, Geoff Levand wrote:
> Andrew Morton wrote:
> > On Wed, 5 Dec 2007 10:52:48 +0100 (CET)
> > Geert Uytterhoeven <[email protected]> wrote:
> >
> >> --------------------------------------------------------------------------------
> >> Subject: sparsemem: sparse_add_one_section() may fail to allocate memory
> >>
> >> sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
> >> whether the allocation succeeded before proceeding to touch the allocated
> >> memory.
> >>
> >> From: Geert Uytterhoeven <[email protected]>
> >>
> >> Signed-off-by: Geert Uytterhoeven <[email protected]>
> >> ---
> >> FIXME There are still some possible memory leaks in sparse_add_one_section():
> >> - usemap is never deallocated
> >> - __kfree_section_memmap() is a not yet implemented dummy
> >
> > I already had
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch
> > and
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch
> >
> > queued. Do they fix the problem, and should they be merged in 2.6.24?
>
> No, a quick test shows it just panics in a different place. Geert's
> patch does also.

What do you mean, that it still panicked after my patch?

The kernel did boot successfully for me when passing ps3fb=48M. Userspace saw 58
MiB (128 MiB - kernelsize - 48 MiB(ps3fb)).

I did not try kexec, though.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-12-06 09:55:23

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Thu, 6 Dec 2007, Yasunori Goto wrote:
> > I'll try Milton's suggestion to pre-allocate the memory early. It seems
> > that should work as long as nothing else before the hot-plug mem is added
> > needs a large chunk.
>
> Hello. Geoff-san. Sorry for late response.
>
> Could you tell me the value of the following page_size calculation
> in vmemmap_populate()? I think this page_size may be too big value.
>
> ------
> int __meminit vmemmap_populate(struct page *start_page,
> unsigned long nr_pages, int node)
> :
> :
> unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> :
> -------

24 MiB

> In addition, I remember that current add_memory() is designed for
> only 1 section's addition. (See: memory_probe_store() and
> sparse_mem_map_populate().
> they require only for 1 section's mem_map by specifing
> PAGES_PER_SECTION.)
> The 1 section size for normal powerpc box is only 16MB.
> (IA64 -> 1GB, x86-64 -> 128MB).
>
> But, if my understanding is correct, PS3's add_memory() requires all
> of total memory. I'm afraid something other problems might be hidden
> in this issue yet.
>
> (However, I think Milton-san's suggestion is very desirable.
> If preallocation of hotadd works on ia64 too, I'm very glad.)

PS3 initially starts with 128 MiB.
Later hotplug is used to add the remaining memory (96 or 112 MiB, IIRC).

With kind regards,

Geert Uytterhoeven

2007-12-06 09:55:48

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On Thu, 6 Dec 2007, Geert Uytterhoeven wrote:
> On Thu, 6 Dec 2007, Yasunori Goto wrote:
> > > I'll try Milton's suggestion to pre-allocate the memory early. It seems
> > > that should work as long as nothing else before the hot-plug mem is added
> > > needs a large chunk.
> >
> > Hello. Geoff-san. Sorry for late response.
> >
> > Could you tell me the value of the following page_size calculation
> > in vmemmap_populate()? I think this page_size may be too big value.
> >
> > ------
> > int __meminit vmemmap_populate(struct page *start_page,
> > unsigned long nr_pages, int node)
> > :
> > :
> > unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> > :
> > -------
>
> 24 MiB

Bummer, messing up bits and MiB.

16 MiB of course.

With kind regards,

Geert Uytterhoeven

2007-12-06 10:50:07

by Yasunori Goto

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

> On Thu, 6 Dec 2007, Geert Uytterhoeven wrote:
> > On Thu, 6 Dec 2007, Yasunori Goto wrote:
> > > > I'll try Milton's suggestion to pre-allocate the memory early. It seems
> > > > that should work as long as nothing else before the hot-plug mem is added
> > > > needs a large chunk.
> > >
> > > Hello. Geoff-san. Sorry for late response.
> > >
> > > Could you tell me the value of the following page_size calculation
> > > in vmemmap_populate()? I think this page_size may be too big value.
> > >
> > > ------
> > > int __meminit vmemmap_populate(struct page *start_page,
> > > unsigned long nr_pages, int node)
> > > :
> > > :
> > > unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> > > :
> > > -------
> >
> > 24 MiB
>
> Bummer, messing up bits and MiB.
>
> 16 MiB of course.

16 MiB is not the page size. It is the "section size".
IIRC, powerpc's page size must be 4K (or 64K).
If the page size is 4K, vmemmap_alloc_block() will request an order-12
allocation.
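
As a rough check of that arithmetic, a minimal user-space sketch:
demo_order() is a hypothetical helper that mirrors the idea of the
kernel's get_order(), not its implementation. A 16 MiB block built from
4K pages needs 2^12 pages, which agrees with the "order:12" in the
allocation failure at the start of the thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size.
 * Hypothetical helper for illustration; page_size must be a power of two.
 */
static unsigned int demo_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	while ((page_size << order) < size)
		order++;
	return order;
}
```

Under these assumptions, demo_order((size_t)16 << 20, 4096) gives 12,
and the same block with 64K pages would need order 8.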

Is that really the correct value for vmemmap population?

> PS3 initially starts with 128 MiB.
> Later hotplug is used to add the remaining memory (96 or 112 MIB, IIRC).

Ok.
Then add_memory() must be called 6 or 7 times, once for each section.

Thanks.


--
Yasunori Goto

2007-12-07 05:55:31

by Geoff Levand

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Geert Uytterhoeven wrote:
> On Wed, 5 Dec 2007, Geoff Levand wrote:
>> Andrew Morton wrote:
>> > On Wed, 5 Dec 2007 10:52:48 +0100 (CET)
>> > Geert Uytterhoeven <[email protected]> wrote:
>> >
>> >> --------------------------------------------------------------------------------
>> >> Subject: sparsemem: sparse_add_one_section() may fail to allocate memory
>> >>
>> >> sparsemem: sparse_add_one_section() may fail to allocate memory, and must check
>> >> whether the allocation succeeded before proceeding to touch the allocated
>> >> memory.
>> >>
>> >> From: Geert Uytterhoeven <[email protected]>
>> >>
>> >> Signed-off-by: Geert Uytterhoeven <[email protected]>
>> >> ---
>> >> FIXME There are still some possible memory leaks in sparse_add_one_section():
>> >> - usemap is never deallocated
>> >> - __kfree_section_memmap() is a not yet implemented dummy
>> >
>> > I already had
>> >
>> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch
>> > and
>> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch
>> >
>> > queued. Do they fix the problem, and should they be merged in 2.6.24?
>>
>> No, a quick test shows it just panics in a different place. Geert's
>> patch does also.
>
> What do you mean, that it still paniced after my patch?
>
> The kernel did boot succesfully for me when passing ps3fb=48M. Userspace saw 58
> MiB (128 MiB - kernelsize - 48 MiB(ps3fb)).
>
> I did not try kexec, though.

On looking at it, your patch should have worked, so I guess I didn't boot the
correct image, or something like that. Sorry.

-Geoff

2007-12-08 02:50:14

by Geoff Levand

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

Yasunori Goto wrote:
>> On Thu, 6 Dec 2007, Geert Uytterhoeven wrote:
>> > On Thu, 6 Dec 2007, Yasunori Goto wrote:
>> > > > I'll try Milton's suggestion to pre-allocate the memory early. It seems
>> > > > that should work as long as nothing else before the hot-plug mem is added
>> > > > needs a large chunk.
>> > >
>> > > Hello. Geoff-san. Sorry for late response.
>> > >
>> > > Could you tell me the value of the following page_size calculation
>> > > in vmemmap_populate()? I think this page_size may be too big value.
>> > >
>> > > ------
>> > > int __meminit vmemmap_populate(struct page *start_page,
>> > > unsigned long nr_pages, int node)
>> > > :
>> > > :
>> > > unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
>> > > :
>> > > -------
>>
>> 16 MiB of course.
>
> 16 MiB is not page size. It is "section size".
> IIRC, powerpc's page size must be 4K (or 64K).
> If page size is 4k, vmemmap_alloc_block will call the order 12 page.


By default PS3 uses 4K virtual pages, and 16M linear pages.


> Is it really correct value for vmemmap population?


It seems vmemmap needs linear pages, so I think it is ok.


>> PS3 initially starts with 128 MiB.
>> Later hotplug is used to add the remaining memory (96 or 112 MIB, IIRC).
>
> Ok.
> Then, add_memory() must be called 6 or 7 times for each sections.


Yes, I call add_memory() once, then it in turn calls sparse_add_one_section()
7 times.
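
As a rough illustration of that count, assuming the 16 MiB section size
mentioned earlier in the thread (demo_nr_sections() is a hypothetical
helper, not a kernel function):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SECTION_SIZE ((size_t)16 << 20)	/* 16 MiB, per the thread */

/* Number of sections needed to cover a hot-added region of `bytes`. */
static size_t demo_nr_sections(size_t bytes)
{
	assert(bytes > 0);
	/* Round up so a partial trailing section is still counted. */
	return (bytes + DEMO_SECTION_SIZE - 1) / DEMO_SECTION_SIZE;
}
```

For 112 MiB of added memory this gives 7 sections, and for 96 MiB it
gives 6.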

-Geoff

2007-12-08 03:27:01

by Geoff Levand

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

On 12/05/2007 10:09 PM, Yasunori Goto wrote:
>> I'll try Milton's suggestion to pre-allocate the memory early. It seems
>> that should work as long as nothing else before the hot-plug mem is added
>> needs a large chunk.
>
> (However, I think Milton-san's suggestion is very desirable.
> If preallocation of hotadd works on ia64 too, I'm very glad.)

As it turns out, preallocation is not such a good solution, I
think, because in the general case the system may need many
allocations to support the added memory, so it would be difficult
to know how much pre-allocated memory is needed. I think a
preferable solution is to use memory from the newly added
region.

I don't plan to work on a pre-allocation method. I think I can
free up sufficient memory in other ways.

-Geoff

2007-12-08 03:50:29

by jeff

[permalink] [raw]
Subject: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

I am running linux kernel 2.6.23.1, which I compiled.
The base system was mandriva 2008.

I have a dual processor pentium III 933 system.
It has 3gb of ram, an intel stl-2 motherboard.
It also has a promise 100 tx2 pata controller,
a supermicro marvell based 8 port pcix sata controller,
and a nvidia pci based video card.

I have the os on a pata drive, and have made a software raid array
consisting of 4 sata drives attached to the pcix sata controller.
I created the array, and formatted with reiserfs 3.6
I have run bonnie++ (filesystem benchmark) on the array without incident.
When I use samba-3.0.25b-4.3 and copy files from a windows machine to
the fileserver,
every so often, the fileserver crashes or hangs. It seems to happen
more often under heavy samba traffic.
Enclosed is the oops from syslog.
I also have a 'kernel bug' from syslog if that would be helpful.

jeff


Dec 7 17:20:52 sata_fileserver kernel: BUG: unable to handle kernel
NULL pointer dereference at virtual address 0000000d
Dec 7 17:20:52 sata_fileserver kernel: printing eip:
Dec 7 17:20:52 sata_fileserver kernel: c02cc820
Dec 7 17:20:52 sata_fileserver kernel: *pde = 00000000
Dec 7 17:20:52 sata_fileserver kernel: Oops: 0000 [#1]
Dec 7 17:20:52 sata_fileserver kernel: SMP
Dec 7 17:20:52 sata_fileserver kernel: Modules linked in: raid456
async_xor async_memcpy async_tx xor iptable_raw xt_comment xt_policy
xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME
ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP
ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN ipt_ecn ipt_CLUSTERIP
ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip
nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp
nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp
nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp
nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss
xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG xt_MARK xt_mark xt_mac
xt_limit xt_length xt_helper xt_hashlimit ip6_tables xt_dccp
xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY nfsd xt_tcpudp
exportfs auth_rpcgss xt_state iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nfs iptable_mangle lockd nfs_acl sunrpc nfnetlink
iptable_filter ip_table
Dec 7 17:20:52 sata_fileserver kernel: x_tables af_packet ipv6
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer_oss ipmi_si ipmi_msghandler binfmt_misc loop nls_utf8 ntfs
dm_mod usb_storage sg sd_mod sata_mv libata scsi_mod video output
thermal sbs processor fan container button dock battery ac floppy
snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm
snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep
ehci_hcd snd ohci_hcd i2c_piix4 uhci_hcd soundcore e1000 sworks_agp
i2c_core ide_cd usbcore agpgart emu10k1_gp gameport tsdev evdev
reiserfs ide_disk serverworks pdc202xx_new ide_core
Dec 7 17:20:52 sata_fileserver kernel: CPU: 1
Dec 7 17:20:52 sata_fileserver kernel:
EIP: 0060:[<c02cc820>] Not tainted VLI
Dec 7 17:20:52 sata_fileserver kernel: EFLAGS: 00210202 (2.6.23.1 #1)
Dec 7 17:20:52 sata_fileserver kernel: EIP is at tcp_recvmsg+0x150/0xbf0
Dec 7 17:20:52 sata_fileserver kernel: eax: 00000000 ebx:
f55c4b60 ecx: 784e2c7c edx: f63f63d8
Dec 7 17:20:52 sata_fileserver kernel: esi: 784e2c7a edi:
f63f614c ebp: e21fde24 esp: e21fddc4
Dec 7 17:20:52 sata_fileserver kernel: ds: 007b es: 007b fs:
00d8 gs: 0033 ss: 0068
Dec 7 17:20:52 sata_fileserver kernel: Process smbd (pid: 9524,
ti=e21fc000 task=f5109000 task.ti=e21fc000)
Dec 7 17:20:52 sata_fileserver kernel: Stack: 00000000 ffffffff
00000000 c13e5740 f557b000 c03fa300 00000000 e21fde90
Dec 7 17:20:52 sata_fileserver kernel: f63f60e0 00000000
00000b64 f63f63d8 000005b4 00000001 00000000 00000000
Dec 7 17:20:52 sata_fileserver kernel: 00000000 000005b4
e21fde4c 7fffffff e21fde28 00000000 c03a4de0 e21fde90
Dec 7 17:20:52 sata_fileserver kernel: Call Trace:
Dec 7 17:20:53 sata_fileserver kernel: [<c010542a>]
show_trace_log_lvl+0x1a/0x30
Dec 7 17:20:53 sata_fileserver kernel: [<c01054eb>]
show_stack_log_lvl+0xab/0xd0
Dec 7 17:20:53 sata_fileserver kernel: [<c01056e1>]
show_registers+0x1d1/0x2d0
Dec 7 17:20:53 sata_fileserver kernel: [<c01058f6>] die+0x116/0x250
Dec 7 17:20:53 sata_fileserver kernel: [<c011f52b>] do_page_fault+0x28b/0x6a0
Dec 7 17:20:53 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
Dec 7 17:20:53 sata_fileserver kernel: [<c0295423>]
sock_common_recvmsg+0x43/0x60
Dec 7 17:20:53 sata_fileserver kernel: [<c029301c>] sock_aio_read+0x11c/0x130
Dec 7 17:20:53 sata_fileserver kernel: [<c017db30>] do_sync_read+0xd0/0x110
Dec 7 17:20:53 sata_fileserver kernel: [<c017e47d>] vfs_read+0x12d/0x140
Dec 7 17:20:53 sata_fileserver kernel: [<c017e8bd>] sys_read+0x3d/0x70
Dec 7 17:20:53 sata_fileserver kernel: [<c01042fe>]
sysenter_past_esp+0x6b/0xa1
Dec 7 17:20:53 sata_fileserver kernel: =======================
Dec 7 17:20:53 sata_fileserver kernel: Code: 6c 39 df 74 59 8d b6 00
00 00 00 85 db 74 4f 8b 55 cc 8d 43 20 8b 0a 3b 48 18 0f 88 f4 05 00
00 89 ce 2b 70 18 8b 83 90 00 00 00 <0f> b6 50 0d 89 d0 83 e0 02 3c
01 8b 43 50 83 d6 ff 39 c6 0f 82
Dec 7 17:20:53 sata_fileserver kernel: EIP: [<c02cc820>]
tcp_recvmsg+0x150/0xbf0 SS:ESP 0068:e21fddc4
Dec 7 17:21:11 sata_fileserver kernel:
Shorewall:net2all:DROP:IN=eth0 OUT=
MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9964
PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24064
Dec 7 17:21:13 sata_fileserver kernel:
Shorewall:net2all:DROP:IN=eth0 OUT=
MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9975
PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24320

2007-12-09 04:22:30

by Geoff Levand

[permalink] [raw]
Subject: sparsemem: Make SPARSEMEM_VMEMMAP selectable


From: Geoff Levand <[email protected]>

SPARSEMEM_VMEMMAP needs to be a selectable config option to
support building the kernel both with and without sparsemem
vmemmap support. This selection is desirable for platforms
which could be configured one way for platform specific
builds and the other for multi-platform builds.

Signed-off-by: Miguel Botón <[email protected]>
Signed-off-by: Geoff Levand <[email protected]>
---

Andrew,

Please consider for 2.6.24.

-Geoff


mm/Kconfig | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -112,18 +112,17 @@ config SPARSEMEM_EXTREME
def_bool y
depends on SPARSEMEM && !SPARSEMEM_STATIC

-#
-# SPARSEMEM_VMEMMAP uses a virtually mapped mem_map to optimise pfn_to_page
-# and page_to_pfn. The most efficient option where kernel virtual space is
-# not under pressure.
-#
config SPARSEMEM_VMEMMAP_ENABLE
def_bool n

config SPARSEMEM_VMEMMAP
- bool
- depends on SPARSEMEM
- default y if (SPARSEMEM_VMEMMAP_ENABLE)
+ bool "Sparse Memory virtual memmap"
+ depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
+ default y
+ help
+ SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise
+ pfn_to_page and page_to_pfn operations. This is the most
+ efficient option when sufficient kernel resources are available.

# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG

2007-12-10 01:57:06

by Yasunori Goto

[permalink] [raw]
Subject: Re: PS3: trouble with SPARSEMEM_VMEMMAP and kexec

> Yasunori Goto wrote:
> >> On Thu, 6 Dec 2007, Geert Uytterhoeven wrote:
> >> > On Thu, 6 Dec 2007, Yasunori Goto wrote:
> >> > > > I'll try Milton's suggestion to pre-allocate the memory early. It seems
> >> > > > that should work as long as nothing else before the hot-plug mem is added
> >> > > > needs a large chunk.
> >> > >
> >> > > Hello. Geoff-san. Sorry for late response.
> >> > >
> >> > > Could you tell me the value of the following page_size calculation
> >> > > in vmemmap_populate()? I think this page_size may be too big value.
> >> > >
> >> > > ------
> >> > > int __meminit vmemmap_populate(struct page *start_page,
> >> > > unsigned long nr_pages, int node)
> >> > > :
> >> > > :
> >> > > unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
> >> > > :
> >> > > -------
> >>
> >> 16 MiB of course.
> >
> > 16 MiB is not page size. It is "section size".
> > IIRC, powerpc's page size must be 4K (or 64K).
> > If page size is 4k, vmemmap_alloc_block will call the order 12 page.
>
>
> By default PS3 uses 4K virtual pages, and 16M linear pages.
>
>
> > Is it really correct value for vmemmap population?
>
>
> It seems vmemmap needs linear pages, so I think it is ok.

Oh, I see. Sorry for noise.

Bye.

--
Yasunori Goto

2007-12-10 05:51:20

by Yasunori Goto

[permalink] [raw]
Subject: Re: sparsemem: Make SPARSEMEM_VMEMMAP selectable

Looks good to me.

Thanks.

Acked-by: Yasunori Goto <[email protected]>


>
> From: Geoff Levand <[email protected]>
>
> SPARSEMEM_VMEMMAP needs to be a selectable config option to
> support building the kernel both with and without sparsemem
> vmemmap support. This selection is desirable for platforms
> which could be configured one way for platform specific
> builds and the other for multi-platform builds.
>
> Signed-off-by: Miguel Boton <[email protected]>
> Signed-off-by: Geoff Levand <[email protected]>
> ---
>
> Andrew,
>
> Please consider for 2.6.24.
>
> -Geoff
>
>
> mm/Kconfig | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -112,18 +112,17 @@ config SPARSEMEM_EXTREME
> def_bool y
> depends on SPARSEMEM && !SPARSEMEM_STATIC
>
> -#
> -# SPARSEMEM_VMEMMAP uses a virtually mapped mem_map to optimise pfn_to_page
> -# and page_to_pfn. The most efficient option where kernel virtual space is
> -# not under pressure.
> -#
> config SPARSEMEM_VMEMMAP_ENABLE
> def_bool n
>
> config SPARSEMEM_VMEMMAP
> - bool
> - depends on SPARSEMEM
> - default y if (SPARSEMEM_VMEMMAP_ENABLE)
> + bool "Sparse Memory virtual memmap"
> + depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
> + default y
> + help
> + SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise
> + pfn_to_page and page_to_pfn operations. This is the most
> + efficient option when sufficient kernel resources are available.
>
> # eventually, we can have this option just 'select SPARSEMEM'
> config MEMORY_HOTPLUG
>
>

--
Yasunori Goto

2007-12-16 11:05:34

by Andrew Morton

[permalink] [raw]
Subject: Re: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

On Fri, 07 Dec 2007 19:49:52 -0800 jeffunit <[email protected]> wrote:

> I am running linux kernel 2.6.23.1, which I compiled.
> The base system was mandriva 2008.
>
> I have a dual processor pentium III 933 system.
> It has 3gb of ram, an intel stl-2 motherboard.
> It also has a promise 100 tx2 pata controller,
> a supermicro marvell based 8 port pcix sata controller,
> and a nvidia pci based video card.
>
> I have the os on a pata drive, and have made a software raid array
> consisting of 4 sata drives attached to the pcix sata controller.
> I created the array, and formatted with reiserfs 3.6
> I have run bonnie++ (filesystem benchmark) on the array without incident.
> When I use samba-3.0.25b-4.3 and copy files from a windows machine to
> the fileserver,
> every so often, the fileserver crashes or hangs. It seems to happen
> more often under heavy samba traffic.
> Enclosed is the oops from syslog.
> I also have a 'kernel bug' from syslog if that would be helpful.
>
> jeff
>
>
> Dec 7 17:20:52 sata_fileserver kernel: BUG: unable to handle kernel
> NULL pointer dereference at virtual address 0000000d
> Dec 7 17:20:52 sata_fileserver kernel: printing eip:
> Dec 7 17:20:52 sata_fileserver kernel: c02cc820
> Dec 7 17:20:52 sata_fileserver kernel: *pde = 00000000
> Dec 7 17:20:52 sata_fileserver kernel: Oops: 0000 [#1]
> Dec 7 17:20:52 sata_fileserver kernel: SMP
> Dec 7 17:20:52 sata_fileserver kernel: Modules linked in: raid456
> async_xor async_memcpy async_tx xor iptable_raw xt_comment xt_policy
> xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME
> ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP
> ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN ipt_ecn ipt_CLUSTERIP
> ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip
> nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp
> nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp
> nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
> nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss
> xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG xt_MARK xt_mark xt_mac
> xt_limit xt_length xt_helper xt_hashlimit ip6_tables xt_dccp
> xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY nfsd xt_tcpudp
> exportfs auth_rpcgss xt_state iptable_nat nf_nat nf_conntrack_ipv4
> nf_conntrack nfs iptable_mangle lockd nfs_acl sunrpc nfnetlink
> iptable_filter ip_table
> Dec 7 17:20:52 sata_fileserver kernel: x_tables af_packet ipv6
> snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
> snd_mixer_oss ipmi_si ipmi_msghandler binfmt_misc loop nls_utf8 ntfs
> dm_mod usb_storage sg sd_mod sata_mv libata scsi_mod video output
> thermal sbs processor fan container button dock battery ac floppy
> snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm
> snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep
> ehci_hcd snd ohci_hcd i2c_piix4 uhci_hcd soundcore e1000 sworks_agp
> i2c_core ide_cd usbcore agpgart emu10k1_gp gameport tsdev evdev
> reiserfs ide_disk serverworks pdc202xx_new ide_core
> Dec 7 17:20:52 sata_fileserver kernel: CPU: 1
> Dec 7 17:20:52 sata_fileserver kernel:
> EIP: 0060:[<c02cc820>] Not tainted VLI
> Dec 7 17:20:52 sata_fileserver kernel: EFLAGS: 00210202 (2.6.23.1 #1)
> Dec 7 17:20:52 sata_fileserver kernel: EIP is at tcp_recvmsg+0x150/0xbf0
> Dec 7 17:20:52 sata_fileserver kernel: eax: 00000000 ebx:
> f55c4b60 ecx: 784e2c7c edx: f63f63d8
> Dec 7 17:20:52 sata_fileserver kernel: esi: 784e2c7a edi:
> f63f614c ebp: e21fde24 esp: e21fddc4
> Dec 7 17:20:52 sata_fileserver kernel: ds: 007b es: 007b fs:
> 00d8 gs: 0033 ss: 0068
> Dec 7 17:20:52 sata_fileserver kernel: Process smbd (pid: 9524,
> ti=e21fc000 task=f5109000 task.ti=e21fc000)
> Dec 7 17:20:52 sata_fileserver kernel: Stack: 00000000 ffffffff
> 00000000 c13e5740 f557b000 c03fa300 00000000 e21fde90
> Dec 7 17:20:52 sata_fileserver kernel: f63f60e0 00000000
> 00000b64 f63f63d8 000005b4 00000001 00000000 00000000
> Dec 7 17:20:52 sata_fileserver kernel: 00000000 000005b4
> e21fde4c 7fffffff e21fde28 00000000 c03a4de0 e21fde90
> Dec 7 17:20:52 sata_fileserver kernel: Call Trace:
> Dec 7 17:20:53 sata_fileserver kernel: [<c010542a>]
> show_trace_log_lvl+0x1a/0x30
> Dec 7 17:20:53 sata_fileserver kernel: [<c01054eb>]
> show_stack_log_lvl+0xab/0xd0
> Dec 7 17:20:53 sata_fileserver kernel: [<c01056e1>]
> show_registers+0x1d1/0x2d0
> Dec 7 17:20:53 sata_fileserver kernel: [<c01058f6>] die+0x116/0x250
> Dec 7 17:20:53 sata_fileserver kernel: [<c011f52b>] do_page_fault+0x28b/0x6a0
> Dec 7 17:20:53 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
> Dec 7 17:20:53 sata_fileserver kernel: [<c0295423>]
> sock_common_recvmsg+0x43/0x60
> Dec 7 17:20:53 sata_fileserver kernel: [<c029301c>] sock_aio_read+0x11c/0x130
> Dec 7 17:20:53 sata_fileserver kernel: [<c017db30>] do_sync_read+0xd0/0x110
> Dec 7 17:20:53 sata_fileserver kernel: [<c017e47d>] vfs_read+0x12d/0x140
> Dec 7 17:20:53 sata_fileserver kernel: [<c017e8bd>] sys_read+0x3d/0x70
> Dec 7 17:20:53 sata_fileserver kernel: [<c01042fe>]
> sysenter_past_esp+0x6b/0xa1
> Dec 7 17:20:53 sata_fileserver kernel: =======================
> Dec 7 17:20:53 sata_fileserver kernel: Code: 6c 39 df 74 59 8d b6 00
> 00 00 00 85 db 74 4f 8b 55 cc 8d 43 20 8b 0a 3b 48 18 0f 88 f4 05 00
> 00 89 ce 2b 70 18 8b 83 90 00 00 00 <0f> b6 50 0d 89 d0 83 e0 02 3c
> 01 8b 43 50 83 d6 ff 39 c6 0f 82
> Dec 7 17:20:53 sata_fileserver kernel: EIP: [<c02cc820>]
> tcp_recvmsg+0x150/0xbf0 SS:ESP 0068:e21fddc4
> Dec 7 17:21:11 sata_fileserver kernel:
> Shorewall:net2all:DROP:IN=eth0 OUT=
> MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
> DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9964
> PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24064
> Dec 7 17:21:13 sata_fileserver kernel:
> Shorewall:net2all:DROP:IN=eth0 OUT=
> MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
> DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9975
> PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24320

(Please try to avoid the wordwrapping).

That's a networking crash. Do the oops traces which you're getting all look
like this one?

Pentium III's are getting a bit old (resistive connections, drooping
power supplies, etc) so there's a decent chance that you're seeing
hardware failures here.

2007-12-16 11:57:17

by Herbert Xu

[permalink] [raw]
Subject: Re: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

Andrew Morton <[email protected]> wrote:
>
>> Dec 7 17:20:53 sata_fileserver kernel: Code: 6c 39 df 74 59 8d b6 00
>> 00 00 00 85 db 74 4f 8b 55 cc 8d 43 20 8b 0a 3b 48 18 0f 88 f4 05 00
>> 00 89 ce 2b 70 18 8b 83 90 00 00 00 <0f> b6 50 0d 89 d0 83 e0 02 3c
>> 01 8b 43 50 83 d6 ff 39 c6 0f 82

This means that skb->network_header == NULL so this line crashes:

if (tcp_hdr(skb)->syn)
offset--;
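
A small user-space sketch of why a NULL header pointer would fault at
address 0000000d, as in the oops. struct demo_tcphdr is a hypothetical
byte-level layout of the fixed TCP header per RFC 793, not the kernel's
struct tcphdr. The flag bits, including SYN, sit in the byte at offset
13, so reading them through a NULL base touches address 0xd. That would
also be consistent with eax being 00000000 in the register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte layout of the fixed 20-byte TCP header (RFC 793). */
struct demo_tcphdr {
	uint16_t source;	/* source port */
	uint16_t dest;		/* destination port */
	uint32_t seq;		/* sequence number */
	uint32_t ack_seq;	/* acknowledgment number */
	uint8_t  doff_res;	/* data offset and reserved bits */
	uint8_t  flags;		/* FIN, SYN, RST, PSH, ACK, URG, ... */
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

#define DEMO_TCP_FLAG_SYN 0x02

/* The flags byte is the 14th byte of the header, i.e. offset 13. */
static_assert(offsetof(struct demo_tcphdr, flags) == 13,
	      "flags byte expected at offset 13");

/* Address touched when the flags byte is read for a given header base. */
static uintptr_t demo_flags_address(uintptr_t header_base)
{
	return header_base + offsetof(struct demo_tcphdr, flags);
}
```

With a header base of 0, demo_flags_address() yields 0xd.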

> That's a networking crash. Do the oops traces which you're getting all look
> like this one?

What's spooky is that I just did a google and we've had reports
since 1998 all crashing on exactly the same line in tcp_recvmsg.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-12-16 12:22:10

by Herbert Xu

[permalink] [raw]
Subject: Re: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

On Sun, Dec 16, 2007 at 07:56:56PM +0800, Herbert Xu wrote:
>
> What's spooky is that I just did a google and we've had reports
> since 1998 all crashing on exactly the same line in tcp_recvmsg.

However, there have been no reports at all since 2000 apart from this
one, so the earlier ones are probably not related.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-12-16 14:56:15

by jeff

[permalink] [raw]
Subject: Re: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

At 03:05 AM 12/16/2007, Andrew Morton wrote:
>On Fri, 07 Dec 2007 19:49:52 -0800 jeffunit <[email protected]> wrote:
>
> > I am running linux kernel 2.6.23.1, which I compiled.
> > The base system was mandriva 2008.
> >
> > I have a dual processor pentium III 933 system.
> > It has 3gb of ram, an intel stl-2 motherboard.
> > It also has a promise 100 tx2 pata controller,
> > a supermicro marvell based 8 port pcix sata controller,
> > and a nvidia pci based video card.
> >
> > I have the os on a pata drive, and have made a software raid array
> > consisting of 4 sata drives attached to the pcix sata controller.
> > I created the array, and formatted with reiserfs 3.6
> > I have run bonnie++ (filesystem benchmark) on the array without incident.
> > When I use samba-3.0.25b-4.3 and copy files from a windows machine to
> > the fileserver,
> > every so often, the fileserver crashes or hangs. It seems to happen
> > more often under heavy samba traffic.
> > Enclosed is the oops from syslog.
> > I also have a 'kernel bug' from syslog if that would be helpful.
> >
> > jeff
> >
> >
> > Dec 7 17:20:52 sata_fileserver kernel: BUG: unable to handle kernel
> > NULL pointer dereference at virtual address 0000000d
> > Dec 7 17:20:52 sata_fileserver kernel: printing eip:
> > Dec 7 17:20:52 sata_fileserver kernel: c02cc820
> > Dec 7 17:20:52 sata_fileserver kernel: *pde = 00000000
> > Dec 7 17:20:52 sata_fileserver kernel: Oops: 0000 [#1]
> > Dec 7 17:20:52 sata_fileserver kernel: SMP
> > Dec 7 17:20:52 sata_fileserver kernel: Modules linked in: raid456
> > async_xor async_memcpy async_tx xor iptable_raw xt_comment xt_policy
> > xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME
> > ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP
> > ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN ipt_ecn ipt_CLUSTERIP
> > ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip
> > nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp
> > nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp
> > nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp
> > nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
> > nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss
> > xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG xt_MARK xt_mark xt_mac
> > xt_limit xt_length xt_helper xt_hashlimit ip6_tables xt_dccp
> > xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY nfsd xt_tcpudp
> > exportfs auth_rpcgss xt_state iptable_nat nf_nat nf_conntrack_ipv4
> > nf_conntrack nfs iptable_mangle lockd nfs_acl sunrpc nfnetlink
> > iptable_filter ip_table
> > Dec 7 17:20:52 sata_fileserver kernel: x_tables af_packet ipv6
> > snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
> > snd_mixer_oss ipmi_si ipmi_msghandler binfmt_misc loop nls_utf8 ntfs
> > dm_mod usb_storage sg sd_mod sata_mv libata scsi_mod video output
> > thermal sbs processor fan container button dock battery ac floppy
> > snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm
> > snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep
> > ehci_hcd snd ohci_hcd i2c_piix4 uhci_hcd soundcore e1000 sworks_agp
> > i2c_core ide_cd usbcore agpgart emu10k1_gp gameport tsdev evdev
> > reiserfs ide_disk serverworks pdc202xx_new ide_core
> > Dec 7 17:20:52 sata_fileserver kernel: CPU: 1
> > Dec 7 17:20:52 sata_fileserver kernel:
> > EIP: 0060:[<c02cc820>] Not tainted VLI
> > Dec 7 17:20:52 sata_fileserver kernel: EFLAGS: 00210202 (2.6.23.1 #1)
> > Dec 7 17:20:52 sata_fileserver kernel: EIP is at tcp_recvmsg+0x150/0xbf0
> > Dec 7 17:20:52 sata_fileserver kernel: eax: 00000000 ebx:
> > f55c4b60 ecx: 784e2c7c edx: f63f63d8
> > Dec 7 17:20:52 sata_fileserver kernel: esi: 784e2c7a edi:
> > f63f614c ebp: e21fde24 esp: e21fddc4
> > Dec 7 17:20:52 sata_fileserver kernel: ds: 007b es: 007b fs:
> > 00d8 gs: 0033 ss: 0068
> > Dec 7 17:20:52 sata_fileserver kernel: Process smbd (pid: 9524,
> > ti=e21fc000 task=f5109000 task.ti=e21fc000)
> > Dec 7 17:20:52 sata_fileserver kernel: Stack: 00000000 ffffffff
> > 00000000 c13e5740 f557b000 c03fa300 00000000 e21fde90
> > Dec 7 17:20:52 sata_fileserver kernel: f63f60e0 00000000
> > 00000b64 f63f63d8 000005b4 00000001 00000000 00000000
> > Dec 7 17:20:52 sata_fileserver kernel: 00000000 000005b4
> > e21fde4c 7fffffff e21fde28 00000000 c03a4de0 e21fde90
> > Dec 7 17:20:52 sata_fileserver kernel: Call Trace:
> > Dec 7 17:20:53 sata_fileserver kernel: [<c010542a>]
> > show_trace_log_lvl+0x1a/0x30
> > Dec 7 17:20:53 sata_fileserver kernel: [<c01054eb>]
> > show_stack_log_lvl+0xab/0xd0
> > Dec 7 17:20:53 sata_fileserver kernel: [<c01056e1>]
> > show_registers+0x1d1/0x2d0
> > Dec 7 17:20:53 sata_fileserver kernel: [<c01058f6>] die+0x116/0x250
> > Dec 7 17:20:53 sata_fileserver kernel: [<c011f52b>]
> > do_page_fault+0x28b/0x6a0
> > Dec 7 17:20:53 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
> > Dec 7 17:20:53 sata_fileserver kernel: [<c0295423>]
> > sock_common_recvmsg+0x43/0x60
> > Dec 7 17:20:53 sata_fileserver kernel: [<c029301c>]
> > sock_aio_read+0x11c/0x130
> > Dec 7 17:20:53 sata_fileserver kernel: [<c017db30>]
> > do_sync_read+0xd0/0x110
> > Dec 7 17:20:53 sata_fileserver kernel: [<c017e47d>] vfs_read+0x12d/0x140
> > Dec 7 17:20:53 sata_fileserver kernel: [<c017e8bd>] sys_read+0x3d/0x70
> > Dec 7 17:20:53 sata_fileserver kernel: [<c01042fe>]
> > sysenter_past_esp+0x6b/0xa1
> > Dec 7 17:20:53 sata_fileserver kernel: =======================
> > Dec 7 17:20:53 sata_fileserver kernel: Code: 6c 39 df 74 59 8d b6 00
> > 00 00 00 85 db 74 4f 8b 55 cc 8d 43 20 8b 0a 3b 48 18 0f 88 f4 05 00
> > 00 89 ce 2b 70 18 8b 83 90 00 00 00 <0f> b6 50 0d 89 d0 83 e0 02 3c
> > 01 8b 43 50 83 d6 ff 39 c6 0f 82
> > Dec 7 17:20:53 sata_fileserver kernel: EIP: [<c02cc820>]
> > tcp_recvmsg+0x150/0xbf0 SS:ESP 0068:e21fddc4
> > Dec 7 17:21:11 sata_fileserver kernel:
> > Shorewall:net2all:DROP:IN=eth0 OUT=
> > MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
> > DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9964
> > PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24064
> > Dec 7 17:21:13 sata_fileserver kernel:
> > Shorewall:net2all:DROP:IN=eth0 OUT=
> > MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
> > DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=9975
> > PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=24320
>
>(Please try to avoid the wordwrapping).
>
>That's a networking crash. Do the oops traces which you're getting all look
>like this one?
>
>Pentium III's are getting a bit old (resistive connections, drooping
>power supplies, etc) so there's a decent chance that you're seeing
>hardware failures here.

The other trace is a kernel bug. It is included below.

It is true the hardware is a bit old, but I freshly assembled the system.
The power supply is new, everything has been re-seated.
I will be updating the hardware eventually, but I picked this hardware
because it is low power (@120watts), server grade, has ecc memory,
and has pcix- slots, which my ethernet card and 8 port sata controller need.

For what it is worth, the ethernet card is an intel pro1000-mt.

Dec 3 15:44:50 sata_fileserver kernel: ------------[ cut here ]------------
Dec 3 15:44:50 sata_fileserver kernel: Kernel BUG at c0167b30
[verbose debug info unavailable]
Dec 3 15:44:50 sata_fileserver kernel: invalid opcode: 0000 [#1]
Dec 3 15:44:51 sata_fileserver kernel: SMP
Dec 3 15:44:51 sata_fileserver kernel: Modules linked in:
iptable_raw xt_comment xt_policy xt_multiport ipt_ULOG ipt_TTL
ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent
ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN
ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp
nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc
nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp
nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323
nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG
xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_hashlimit
ip6_tables xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY
xt_tcpudp nfsd xt_state iptable_nat nf_nat nf_conntrack_ipv4 exportfs
auth_rpcgss nf_conntrack iptable_mangle nfnetlink nfs lockd nfs_acl
sunrpc iptable_filter ip_tables x_tables af_packet ipv6 snd_seq_dummy snd_
Dec 3 15:44:51 sata_fileserver kernel: eq_oss snd_seq_midi_event
snd_seq snd_pcm_oss snd_mixer_oss ipmi_si ipmi_msghandler binfmt_misc
loop nls_utf8 ntfs raid456 async_xor async_memcpy async_tx xor dm_mod
usb_storage sg sd_mod sata_mv libata scsi_mod video output thermal
sbs processor fan container button dock battery ac floppy snd_emu10k1
snd_rawmidi snd_ac97_codec ac97_bus snd_pcm ide_cd snd_seq_device
snd_timer snd_page_alloc i2c_piix4 snd_util_mem ohci_hcd uhci_hcd
i2c_core ehci_hcd snd_hwdep e1000 snd sworks_agp agpgart soundcore
usbcore emu10k1_gp gameport tsdev evdev reiserfs ide_disk serverworks
pdc202xx_new ide_core
Dec 3 15:44:51 sata_fileserver kernel: CPU: 1
Dec 3 15:44:51 sata_fileserver kernel:
EIP: 0060:[<c0167b30>] Not tainted VLI
Dec 3 15:44:51 sata_fileserver kernel: EFLAGS: 00210246 (2.6.23.1 #1)
Dec 3 15:44:51 sata_fileserver kernel: EIP is at set_page_address+0x170/0x180
Dec 3 15:44:51 sata_fileserver kernel: eax: ffbff000 ebx:
ffbff000 ecx: c0005ffc edx: ffbff000
Dec 3 15:44:51 sata_fileserver kernel: esi: c17d6c60 edi:
c0443ec0 ebp: ea139c88 esp: ea139c74
Dec 3 15:44:51 sata_fileserver kernel: ds: 007b es: 007b fs:
00d8 gs: 0033 ss: 0068
Dec 3 15:44:52 sata_fileserver kernel: Process smbd (pid: 6132,
ti=ea138000 task=f139c000 task.ti=ea138000)
Dec 3 15:44:52 sata_fileserver kernel: Stack: ffbff000 00200286
ffbff000 c17d6c60 3eb63163 ea139cb4 c0167ed2 ea139ca8
Dec 3 15:44:52 sata_fileserver kernel: ea138000 804cbe2c
804cce2c ea139cac c0125248 c17d6c60 804cbe2c 804cce2c
Dec 3 15:44:52 sata_fileserver kernel: ea139cc0 c01209b0
00000000 ea139cec f8aa86b5 0000000f 00000002 00000000
Dec 3 15:44:52 sata_fileserver kernel: Call Trace:
Dec 3 15:44:52 sata_fileserver kernel: [<c010542a>]
show_trace_log_lvl+0x1a/0x30
Dec 3 15:44:52 sata_fileserver kernel: [<c01054eb>]
show_stack_log_lvl+0xab/0xd0
Dec 3 15:44:52 sata_fileserver kernel: [<c01056e1>]
show_registers+0x1d1/0x2d0
Dec 3 15:44:52 sata_fileserver kernel: [<c01058f6>] die+0x116/0x250
Dec 3 15:44:53 sata_fileserver kernel: [<c0105ac1>] do_trap+0x91/0xc0
Dec 3 15:44:53 sata_fileserver kernel: [<c0105dd8>] do_invalid_op+0x88/0xa0
Dec 3 15:44:53 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
Dec 3 15:44:53 sata_fileserver kernel: [<c0167ed2>] kmap_high+0x152/0x1b0
Dec 3 15:44:53 sata_fileserver kernel: [<c01209b0>] kmap+0x50/0x80
Dec 3 15:44:53 sata_fileserver kernel: [<f8aa86b5>]
reiserfs_copy_from_user_to_file_region+0xa5/0xf0 [reiserfs]
Dec 3 15:44:53 sata_fileserver kernel: [<f8aa9c06>]
reiserfs_file_write+0x746/0x1dd0 [reiserfs]
Dec 3 15:44:53 sata_fileserver kernel: [<c017e2c5>] vfs_write+0xb5/0x140
Dec 3 15:44:53 sata_fileserver kernel: [<c017ea43>] sys_pwrite64+0x63/0x80
Dec 3 15:44:54 sata_fileserver kernel: [<c01042fe>]
sysenter_past_esp+0x6b/0xa1
Dec 3 15:44:54 sata_fileserver kernel: =======================
Dec 3 15:44:54 sata_fileserver kernel: Code: 3a 44 c0 89 1a 89 53 04
89 c2 b8 0c 3a 44 c0 e8 67 15 1a 00 e9 6f ff ff ff 8b 45 f0 89 ca e8
58 15 1a 00 83 c4 08 5b 5e 5f 5d c3 <0f> 0b eb fe 8d b6 00 00 00 00
8d bf 00 00 00 00 55 89 e5 83 ec
Dec 3 15:44:54 sata_fileserver kernel: EIP: [<c0167b30>]
set_page_address+0x170/0x180 SS:ESP 0068:ea139c74
Dec 3 15:44:54 sata_fileserver kernel: WARNING: at
/usr/src/linux-2.6.23.1/kernel/exit.c:892 do_exit()
Dec 3 15:44:54 sata_fileserver kernel: [<c010542a>]
show_trace_log_lvl+0x1a/0x30
Dec 3 15:44:54 sata_fileserver kernel: [<c0106022>] show_trace+0x12/0x20
Dec 3 15:44:54 sata_fileserver kernel: [<c0106046>] dump_stack+0x16/0x20
Dec 3 15:44:54 sata_fileserver kernel: [<c012d064>] do_exit+0x834/0x840
Dec 3 15:44:54 sata_fileserver kernel: [<c0105a29>] die+0x249/0x250
Dec 3 15:44:54 sata_fileserver kernel: [<c0105ac1>] do_trap+0x91/0xc0
Dec 3 15:44:54 sata_fileserver kernel: [<c0105dd8>] do_invalid_op+0x88/0xa0
Dec 3 15:44:54 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
Dec 3 15:44:54 sata_fileserver kernel: [<c0167ed2>] kmap_high+0x152/0x1b0
Dec 3 15:44:54 sata_fileserver kernel: [<c01209b0>] kmap+0x50/0x80
Dec 3 15:44:54 sata_fileserver kernel: [<f8aa86b5>]
reiserfs_copy_from_user_to_file_region+0xa5/0xf0 [reiserfs]
Dec 3 15:44:54 sata_fileserver kernel: [<f8aa9c06>]
reiserfs_file_write+0x746/0x1dd0 [reiserfs]
Dec 3 15:44:54 sata_fileserver kernel: [<c017e2c5>] vfs_write+0xb5/0x140
Dec 3 15:44:54 sata_fileserver kernel: [<c017ea43>] sys_pwrite64+0x63/0x80
Dec 3 15:44:54 sata_fileserver kernel: [<c01042fe>]
sysenter_past_esp+0x6b/0xa1
Dec 3 15:44:54 sata_fileserver kernel: =======================
Dec 3 15:44:54 sata_fileserver kernel:
Shorewall:net2all:DROP:IN=eth0 OUT=
MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=24365
PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=6912
Dec 3 15:44:54 sata_fileserver kernel:
Shorewall:net2all:DROP:IN=eth0 OUT=
MAC=00:04:23:a8:12:cf:00:11:2f:42:d4:32:08:00 SRC=192.168.47.120
DST=192.168.47.101 LEN=60 TOS=0x00 PREC=0x00 TTL=32 ID=24381
PROTO=ICMP TYPE=8 CODE=0 ID=512 SEQ=7168

2007-12-16 22:09:46

by Andrew Morton

[permalink] [raw]
Subject: Re: oops with 2.6.23.1, marvel, software raid, reiserfs and samba

On Sun, 16 Dec 2007 06:55:51 -0800 jeffunit <[email protected]> wrote:

> At 03:05 AM 12/16/2007, Andrew Morton wrote:
> >On Fri, 07 Dec 2007 19:49:52 -0800 jeffunit <[email protected]> wrote:
> >
> > > I am running linux kernel 2.6.23.1, which I compiled.
> > > The base system was mandriva 2008.
> > >
> > > I have a dual processor pentium III 933 system.
> > > It has 3gb of ram, an intel stl-2 motherboard.
> > > It also has a promise 100 tx2 pata controller,
> > > a supermicro marvell based 8 port pcix sata controller,
> > > and a nvidia pci based video card.
> > >
> > > I have the os on a pata drive, and have made a software raid array
> > > consisting of 4 sata drives attached to the pcix sata controller.
> > > I created the array, and formatted with reiserfs 3.6
> > > I have run bonnie++ (filesystem benchmark) on the array without incident.
> > > When I use samba-3.0.25b-4.3 and copy files from a windows machine to
> > > the fileserver,
> > > every so often, the fileserver crashes or hangs. It seems to happen
> > > more often under heavy samba traffic.
> > > Enclosed is the oops from syslog.
> > > I also have a 'kernel bug' from syslog if that would be helpful.
> > >
>
> ...
>
> >
> >(Please try to avoid the wordwrapping).

(you didn't)
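
For readers who want to undo the wrapping in a trace like the one quoted here, a minimal sketch follows. It assumes every real record starts with a syslog prefix of the form "Dec 3 15:44:50 host kernel:" and treats any other non-empty line as a continuation. The regular expression and the function name are assumptions made for illustration. Wraps that split a token in the middle (as happened in the module list) are not recoverable this way.

```python
import re

# Assumed record prefix, e.g. "Dec 3 15:44:50 sata_fileserver kernel:"
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ kernel:")


def unwrap(lines):
    """Join wrapped continuation lines onto the preceding syslog record."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if PREFIX.match(text) or not records:
            # A new record (or the very first line seen).
            records.append(text)
        else:
            # A continuation: append it to the current record.
            records[-1] += " " + text
    return records
```

Any mail quote markers ("> ") would have to be stripped before passing lines in.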

> >That's a networking crash. Do the oops traces which you're getting all look
> >like this one?
> >
> >Pentium III's are getting a bit old (resistive connections, drooping
> >power supplies, etc) so there's a decent chance that you're seeing
> >hardware failures here.
>
> The other trace is a kernel bug. It is included below.
>
> It is true the hardware is a bit old, but I freshly assembled the system.
> The power supply is new, everything has been re-seated.
> I will be updating the hardware eventually, but I picked this hardware
> because it is low power (@120watts), server grade, has ecc memory,
> and has pcix- slots, which my ethernet card and 8 port sata controller need.
>
> For what it is worth, the ethernet card is an intel pro1000-mt.
>
> Dec 3 15:44:50 sata_fileserver kernel: ------------[ cut here ]------------
> Dec 3 15:44:50 sata_fileserver kernel: Kernel BUG at c0167b30
> [verbose debug info unavailable]

I'd suggest that you enable CONFIG_DEBUG_BUGVERBOSE, especially when the
system is having trouble. It's worth it.
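
One way to see whether a given kernel build already has that option is to look at its .config, where an enabled option appears as "CONFIG_DEBUG_BUGVERBOSE=y" and a disabled one as "# CONFIG_DEBUG_BUGVERBOSE is not set". The sketch below is a hypothetical helper that checks config text passed in as a string; where the .config lives on a particular system is left to the reader.

```python
def config_enabled(config_text, option):
    """Return True if `option` is built in (=y) in the given .config text."""
    wanted = option + "=y"
    for line in config_text.splitlines():
        if line.strip() == wanted:
            return True
    return False


sample = "CONFIG_BUG=y\n# CONFIG_DEBUG_BUGVERBOSE is not set\n"
print(config_enabled(sample, "CONFIG_DEBUG_BUGVERBOSE"))  # False
```

If the option is off, it has to be turned on in the kernel configuration and the kernel rebuilt before BUG reports include file and line information.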

> Dec 3 15:44:50 sata_fileserver kernel: invalid opcode: 0000 [#1]
> Dec 3 15:44:51 sata_fileserver kernel: SMP
> Dec 3 15:44:51 sata_fileserver kernel: Modules linked in:
> iptable_raw xt_comment xt_policy xt_multiport ipt_ULOG ipt_TTL
> ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent
> ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN
> ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp
> nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc
> nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda
> nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp
> nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
> nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323
> nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG
> xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_hashlimit
> ip6_tables xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY
> xt_tcpudp nfsd xt_state iptable_nat nf_nat nf_conntrack_ipv4 exportfs
> auth_rpcgss nf_conntrack iptable_mangle nfnetlink nfs lockd nfs_acl
> sunrpc iptable_filter ip_tables x_tables af_packet ipv6 snd_seq_dummy snd_
> Dec 3 15:44:51 sata_fileserver kernel: eq_oss snd_seq_midi_event
> snd_seq snd_pcm_oss snd_mixer_oss ipmi_si ipmi_msghandler binfmt_misc
> loop nls_utf8 ntfs raid456 async_xor async_memcpy async_tx xor dm_mod
> usb_storage sg sd_mod sata_mv libata scsi_mod video output thermal
> sbs processor fan container button dock battery ac floppy snd_emu10k1
> snd_rawmidi snd_ac97_codec ac97_bus snd_pcm ide_cd snd_seq_device
> snd_timer snd_page_alloc i2c_piix4 snd_util_mem ohci_hcd uhci_hcd
> i2c_core ehci_hcd snd_hwdep e1000 snd sworks_agp agpgart soundcore
> usbcore emu10k1_gp gameport tsdev evdev reiserfs ide_disk serverworks
> pdc202xx_new ide_core
> Dec 3 15:44:51 sata_fileserver kernel: CPU: 1
> Dec 3 15:44:51 sata_fileserver kernel:
> EIP: 0060:[<c0167b30>] Not tainted VLI
> Dec 3 15:44:51 sata_fileserver kernel: EFLAGS: 00210246 (2.6.23.1 #1)
> Dec 3 15:44:51 sata_fileserver kernel: EIP is at set_page_address+0x170/0x180
> Dec 3 15:44:51 sata_fileserver kernel: eax: ffbff000 ebx:
> ffbff000 ecx: c0005ffc edx: ffbff000
> Dec 3 15:44:51 sata_fileserver kernel: esi: c17d6c60 edi:
> c0443ec0 ebp: ea139c88 esp: ea139c74
> Dec 3 15:44:51 sata_fileserver kernel: ds: 007b es: 007b fs:
> 00d8 gs: 0033 ss: 0068
> Dec 3 15:44:52 sata_fileserver kernel: Process smbd (pid: 6132,
> ti=ea138000 task=f139c000 task.ti=ea138000)
> Dec 3 15:44:52 sata_fileserver kernel: Stack: ffbff000 00200286
> ffbff000 c17d6c60 3eb63163 ea139cb4 c0167ed2 ea139ca8
> Dec 3 15:44:52 sata_fileserver kernel: ea138000 804cbe2c
> 804cce2c ea139cac c0125248 c17d6c60 804cbe2c 804cce2c
> Dec 3 15:44:52 sata_fileserver kernel: ea139cc0 c01209b0
> 00000000 ea139cec f8aa86b5 0000000f 00000002 00000000
> Dec 3 15:44:52 sata_fileserver kernel: Call Trace:
> Dec 3 15:44:52 sata_fileserver kernel: [<c010542a>]
> show_trace_log_lvl+0x1a/0x30
> Dec 3 15:44:52 sata_fileserver kernel: [<c01054eb>]
> show_stack_log_lvl+0xab/0xd0
> Dec 3 15:44:52 sata_fileserver kernel: [<c01056e1>]
> show_registers+0x1d1/0x2d0
> Dec 3 15:44:52 sata_fileserver kernel: [<c01058f6>] die+0x116/0x250
> Dec 3 15:44:53 sata_fileserver kernel: [<c0105ac1>] do_trap+0x91/0xc0
> Dec 3 15:44:53 sata_fileserver kernel: [<c0105dd8>] do_invalid_op+0x88/0xa0
> Dec 3 15:44:53 sata_fileserver kernel: [<c030938a>] error_code+0x72/0x78
> Dec 3 15:44:53 sata_fileserver kernel: [<c0167ed2>] kmap_high+0x152/0x1b0
> Dec 3 15:44:53 sata_fileserver kernel: [<c01209b0>] kmap+0x50/0x80
> Dec 3 15:44:53 sata_fileserver kernel: [<f8aa86b5>]
> reiserfs_copy_from_user_to_file_region+0xa5/0xf0 [reiserfs]
> Dec 3 15:44:53 sata_fileserver kernel: [<f8aa9c06>]
> reiserfs_file_write+0x746/0x1dd0 [reiserfs]
> Dec 3 15:44:53 sata_fileserver kernel: [<c017e2c5>] vfs_write+0xb5/0x140
> Dec 3 15:44:53 sata_fileserver kernel: [<c017ea43>] sys_pwrite64+0x63/0x80
> Dec 3 15:44:54 sata_fileserver kernel: [<c01042fe>]
> sysenter_past_esp+0x6b/0xa1

This is a totally different crash and I don't think I've ever before seen a
crash in kmap()->set_page_address(). I'm suspecting hardware problems.
Can you run memtest86 on that box for a day or so?