2009-03-16 07:10:40

by Sachin Sant

[permalink] [raw]
Subject: [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

Using pSeries machine description
Page orders: linear mapping = 24, virtual = 16, io = 12, vmemmap = 24
Using 1TB segments
Found initrd at 0xc000000002600000:0xc000000002865000
console [udbg0] enabled
Partition configured for 20 cpus.
CPU maps initialized for 2 threads per core
(thread shift is 1)
Starting Linux PPC64 #1 SMP Sat Mar 14 22:21:27 IST 2009
-----------------------------------------------------
ppc64_pft_size = 0x1b
physicalMemorySize = 0x1c0000000
htab_hash_mask = 0xfffff
-----------------------------------------------------
Initializing cgroup subsys cpuset
Linux version 2.6.29-rc8-autokern1 ([email protected]) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Sat Mar 14 22:21:27 IST 2009
[boot]0012 Setup Arch
Node 0 Memory: 0x10000000-0xf0000000
Node 1 Memory: 0x0-0x10000000 0xf0000000-0x1c0000000
EEH: No capable adapters found
PPC64 nvram contains 7168 bytes
Using shared processor idle loop
Zone PFN ranges:
DMA 0x00000000 -> 0x0001c000
Normal 0x0001c000 -> 0x0001c000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
1: 0x00000000 -> 0x00001000
0: 0x00001000 -> 0x0000f000
1: 0x0000f000 -> 0x0001c000
On node 0 totalpages: 57344
DMA zone: 49 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 57295 pages, LIFO batch:1
On node 1 totalpages: 57344
DMA zone: 98 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 57246 pages, LIFO batch:1
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on. Total pages: 114541
Policy zone: DMA
Kernel command line: ro console=hvc0 rhgb quiet root=LABEL=/1 autobench_args: root=/dev/sda3 ABAT:1237051183
[boot]0020 XICS Init
[boot]0021 XICS Done
pic: no ISA interrupt controller
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 187.199000 MHz
time_init: processor frequency = 1498.500000 MHz
clocksource: timebase mult[155e1d6] shift[22] registered
clockevent: decrementer mult[2fec] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
allocated 4587520 bytes of page_cgroup
please try cgroup_disable=memory option if you don't want
freeing bootmem node 0
freeing bootmem node 1
Memory: 7259520k/7340032k available (6016k kernel code, 96384k reserved, 1024k data, 992k bss, 448k init)
Calibrating delay loop... 373.76 BogoMIPS (lpj=186880)
Security Framework initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
Mount-cache hash table entries: 4096
Initializing cgroup subsys debug
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
clockevent: decrementer mult[2fec] shift[16] cpu[1]
Processor 1 found.
Brought up 2 CPUs
Node 0 CPUs: 0-1
Node 1 CPUs:
CPU0 attaching sched-domain:
domain 0: span 0-1 level SIBLING
groups: 0 1
domain 1: span 0-1 level NODE
groups: 0-1
CPU1 attaching sched-domain:
domain 0: span 0-1 level SIBLING
groups: 1 0
domain 1: span 0-1 level NODE
groups: 0-1
khelper used greatest stack depth: 12160 bytes left
khelper used greatest stack depth: 11024 bytes left
net_namespace: 1672 bytes
NET: Registered protocol family 16
IBM eBus Device Driver
PCI: Probing PCI hardware
PCI: Probing PCI hardware done
bio: create slab <bio-0> at 0
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 3, 524288 bytes)
TCP established hash table entries: 262144 (order: 6, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 4, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
NET: Registered protocol family 1
checking if image is initramfs... it is
Freeing initrd memory: 2452k freed
IOMMU table initialized, virtual merging enabled
RTAS daemon started
RTAS: event: 333, Type: Resource Deallocation Event, Severity: 2
RTAS: event: 334, Type: Resource Deallocation Event, Severity: 2
RTAS: event: 335, Type: Resource Deallocation Event, Severity: 2
audit: initializing netlink socket (disabled)
type=2000 audit(1237051441.513:1): initialized
HugeTLB registered 16 MB page size, pre-allocated 0 pages
HugeTLB registered 16 GB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
msgmni has been set to 14180
SELinux: Registering netfilter hooks
alg: No test for stdrng (krng)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Linux agpgart interface v0.103
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
brd: module loaded
Uniform Multi-Platform E-IDE driver
ide-gd driver 1.18
mice: PS/2 mouse device common for all mice
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
TCP bic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
Freeing unused kernel memory: 448k freed
uhci_hcd: USB Universal Host Controller Interface driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after
SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
vio_register_driver: driver ibmvscsi registering
ibmvscsi 30000003: SRP_VERSION: 16.a
scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
ibmvscsi 30000003: partner initialization complete
ibmvscsi 30000003: sent SRP login
ibmvscsi 30000003: SRP_LOGIN succeeded
ibmvscsi 30000003: host srp version: 16.a, host partition VIO_Server (1), OS 3, max io 262144
scsi 0:0:1:0: Direct-Access AIX VDASD PQ: 0 ANSI: 3
sd 0:0:1:0: [sda] 136314880 512-byte hardware sectors: (69.7 GB/65.0 GiB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
sd 0:0:1:0: [sda] Cache data unavailable
sd 0:0:1:0: [sda] Assuming drive cache: write through
sd 0:0:1:0: [sda] 136314880 512-byte hardware sectors: (69.7 GB/65.0 GiB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
sd 0:0:1:0: [sda] Cache data unavailable
sd 0:0:1:0: [sda] Assuming drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:1:0: [sda] Attached SCSI disk
insmod used greatest stack depth: 4960 bytes left
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
type=1404 audit(1237051445.261:2): selinux=0 auid=4294967295 ses=4294967295
drivers/net/ibmveth.c: ibmveth: IBM i/pSeries Virtual Ethernet Driver 1.03
vio_register_driver: driver ibmveth registering
eth0 (ibmveth): not using net_device_ops yet
sd 0:0:1:0: Attached scsi generic sg0 type 0
lp: driver loaded but no devices found
i2c-core: driver [dummy] registered
device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: [email protected]
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on sda3, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 2048128k swap on /dev/sda5. Priority:-1 extents:1 across:2048128k


Attachments:
dmesg-log (7.58 kB)
jfs-fsstress.log (140.93 kB)
Download all attachments

2009-03-17 14:22:16

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [Jfs-discussion] [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

On Mon, 2009-03-16 at 12:40 +0530, Sachin Sant wrote:
> While running file system stress (fsstress) on a JFS partition
> with 2.6.29-rc8, i hit the following BUG.

I tried reproducing this yesterday, but was unsuccessful. Maybe I
didn't let it run long enough. Have you run this successfully on
earlier kernels? If so, what was the most recent that worked for you.
Have you seen it more than once? I'm not sure how easy it is to
reproduce.

Thanks,
Shaggy

> ------------[ cut here ]------------
> kernel BUG at fs/mpage.c:473!
> cpu 0x0: Vector: 700 (Program Check) at [c0000000be9daba0]
> pc: c00000000014e3a0: .__mpage_writepage+0xd0/0x630
> lr: c0000000000dfedc: .write_cache_pages+0x288/0x42c
> sp: c0000000be9dae20
> msr: 8000000000029032
> current = 0xc00000003f30dd10
> paca = 0xc000000000753400
> pid = 27402, comm = fsstress
> kernel BUG at fs/mpage.c:473!
> enter ? for help
> [c0000000be9db3a0] c0000000000dfedc .write_cache_pages+0x288/0x42c
> [c0000000be9db530] c00000000014e298 .mpage_writepages+0x54/0x8c
> [c0000000be9db5d0] d0000000032f1720 .jfs_writepages+0x1c/0x34 [jfs]
> [c0000000be9db650] c0000000000e0120 .do_writepages+0x68/0xa4
> [c0000000be9db6d0] c0000000000d764c .__filemap_fdatawrite_range+0x8c/0xbc
> [c0000000be9db7c0] c0000000000d76bc .filemap_write_and_wait_range+0x40/0x8c
> [c0000000be9db860] c0000000000d79f4 .generic_file_buffered_write+0x2ec/0x358
> [c0000000be9db9a0] c0000000000d7f88
> .__generic_file_aio_write_nolock+0x300/0x3ec
> [c0000000be9dbaa0] c0000000000d8390 .generic_file_aio_write+0x80/0x114
> [c0000000be9dbb60] c000000000118a24 .do_sync_write+0xc4/0x124
> [c0000000be9dbcf0] c00000000011959c .vfs_write+0xd8/0x1a4
> [c0000000be9dbd90] c000000000119768 .SyS_write+0x50/0x90
> [c0000000be9dbe30] c000000000008534 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 000000000ff0d8c8
> SP (ff87c6f0) is in userspace
>
> This is a powerpc box (IBM,9133-55A). I have attached the dmesg log
> as well as the jfs-fsstress.log
>
> Partition Table information :
>
> /dev/sda3 on / type ext3 (rw)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> /dev/sda2 on /boot type ext3 (rw)
> tmpfs on /dev/shm type tmpfs (rw)
> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> /dev/sda6 on /mnt/tmp type jfs (rw)
>
> Let me know if there is any other information that i can provide.
>
> Thanks
> -Sachin

--
David Kleikamp
IBM Linux Technology Center

2009-03-18 05:29:38

by Sachin Sant

[permalink] [raw]
Subject: Re: [Jfs-discussion] [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

Dave Kleikamp wrote:
> On Mon, 2009-03-16 at 12:40 +0530, Sachin Sant wrote:
>
>> While running file system stress (fsstress) on a JFS partition
>> with 2.6.29-rc8, i hit the following BUG.
>>
>
> I tried reproducing this yesterday, but was unsuccessful. Maybe I
> didn't let it run long enough. Have you run this successfully on
> earlier kernels? If so, what was the most recent that worked for you.
> Have you seen it more than once? I'm not sure how easy it is to
> reproduce.
I haven't tried to test with earlier kernels. This was the first
time i ran these tests on this particular box.I run the fsstress
on different type of file systems one after the another (in no
particular order ). Something like ..

* create ext2 filesystem on a device
* run fsstress.
* unmount the device.

I repeat the above steps with ext3, reiserfs and jfs. May be this
combination is causing this problem.

I have seen this problem on two partitions of a power box. Same problem
seem to have been reported earlier against 2.6.25-rcx kernel.

http://lkml.indiana.edu/hypermail/linux/kernel/0803.2/1418.html


Thanks
-Sachin




--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-03-18 10:06:11

by Christian Kujau

[permalink] [raw]
Subject: Re: [Jfs-discussion] [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

On Mon, 16 Mar 2009, Sachin Sant wrote:
> autorun: reading command 85 /usr/local/autobench/scripts/autobench.dat
> autorun: run test fsstress -d $scratchdir -l $iterations: command read
> autorun: run test fsstress -d /mnt/tmp -l 1: was expanded
> 03/15/2009-19:36:37 processing command: (1) 'run test fsstress -d /mnt/tmp -l 1'

How many iterations were really run to hit this BUG? When I specify "-l
1", fsstress[0] comes back instantly (and no error of course)...

Thanks,
Christian.

[0] http://ltp.cvs.sourceforge.net/viewvc/ltp/ltp/testcases/kernel/fs/fsstress/
--
Bruce Schneier's passphrase is used to etch diamonds

2009-03-19 06:33:44

by Sachin Sant

[permalink] [raw]
Subject: Re: [Jfs-discussion] [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

Christian Kujau wrote:
> On Mon, 16 Mar 2009, Sachin Sant wrote:
>
>> autorun: reading command 85 /usr/local/autobench/scripts/autobench.dat
>> autorun: run test fsstress -d $scratchdir -l $iterations: command read
>> autorun: run test fsstress -d /mnt/tmp -l 1: was expanded
>> 03/15/2009-19:36:37 processing command: (1) 'run test fsstress -d /mnt/tmp -l 1'
>>
>
> How many iterations were really run to hit this BUG? When I specify "-l
> 1", fsstress[0] comes back instantly (and no error of course)...
>
Sorry for the late reply. Somehow i missed this mail.

Yeah with default option values fsstress completes in a jiffy. I generally
invoke fsstress with 1000 operations per process (-n 1000) and 1000
processes (-p 1000).

fsstress -d /mnt/tmp -l 1 -n 1000 -p 1000

Thanks
-Sachin


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-03-20 01:51:21

by Christian Kujau

[permalink] [raw]
Subject: Re: [Jfs-discussion] [BUG/JFS/2.6.29-rc8] kernel BUG at fs/mpage.c:473!

On Thu, 19 Mar 2009, Sachin Sant wrote:
> Yeah with default option values fsstress completes in a jiffy. I generally
> invoke fsstress with 1000 operations per process (-n 1000) and 1000
> processes (-p 1000).

Ah! I was curious if I could reproduce this on amd64 and started
with 100, 1000, 10000 *iterations* (-l), will try with -n/-p now...

Thanks,
Christian.
--
My firewall cannot block http://www.schneier.com.