2013-05-19 22:25:22

by Toralf Förster

[permalink] [raw]
Subject: BUG at fs/ext4/inode.c:1590!

The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3:

2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------
2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590!
2013-05-19T23:28:34.195+02:00 n22 kernel: invalid opcode: 0000 [#1] SMP
2013-05-19T23:28:34.195+02:00 n22 kernel: Modules linked in: loop rc_dib0700_rc5 dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib7000p dvb_core dibx000_common dib0090 rc_core nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun coretemp kvm_intel kvm fbcon bitblit usblp softcursor font psmouse acpi_cpufreq i915 uvcvideo sdhci_pci cfbfillrect sdhci videobuf2_vmalloc cfbimgblt videobuf2_memops i2c_algo_bit mmc_core videobuf2_core cfbcopyarea intel_agp videodev intel_gtt drm_kms_helper drm mperf arc4 iwldvm agpgart evdev processor video mac80211 iwlwifi cfg80211 thermal thermal_sys ac thinkpad_acpi battery nvram wmi e1000e rfkill fb snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer i2c_i801 tpm_tis tpm i2c_core but
ton tpm_bios 8250_pci hwmon 8250 snd ptp pps_core serial_core soundcore fbdev aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
2013-05-19T23:28:34.195+02:00 n22 kernel: Pid: 6292, comm: flush-7:1 Not tainted 3.9.3 #12 LENOVO 4180F65/4180F65
2013-05-19T23:28:34.195+02:00 n22 kernel: EIP: 0060:[<c11a71e9>] EFLAGS: 00010202 CPU: 2
2013-05-19T23:28:34.195+02:00 n22 kernel: EIP is at mpage_da_submit_io+0x339/0x360
2013-05-19T23:28:34.195+02:00 n22 kernel: EAX: 00000002 EBX: f2293d20 ECX: e4663700 EDX: 00000000
2013-05-19T23:28:34.195+02:00 n22 kernel: ESI: 00000000 EDI: e4663700 EBP: f2293ca4 ESP: f2293bf4
2013-05-19T23:28:34.195+02:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
2013-05-19T23:28:34.197+02:00 n22 kernel: CR0: 80050033 CR2: b74ca060 CR3: 1bfd2000 CR4: 000407f0
2013-05-19T23:28:34.197+02:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
2013-05-19T23:28:34.197+02:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
2013-05-19T23:28:34.197+02:00 n22 kernel: Process flush-7:1 (pid: 6292, ti=f2292000 task=f14e0000 task.ti=f2292000)
2013-05-19T23:28:34.197+02:00 n22 kernel: Stack:
2013-05-19T23:28:34.197+02:00 n22 kernel: 0000000e e85f039c 00000003 e85f02d8 0000003e 80000001 0000003e f2293dac
2013-05-19T23:28:34.197+02:00 n22 kernel: 80000001 00000004 00000000 00001000 00000000 80000001 f5a1cc00 0000f001
2013-05-19T23:28:34.197+02:00 n22 kernel: 00000000 00000000 00000000 00000004 00000000 f5a1cc00 f4796fe0 f612bf60
2013-05-19T23:28:34.197+02:00 n22 kernel: Call Trace:
2013-05-19T23:28:34.197+02:00 n22 kernel: [<c11abbba>] ? ext4_mark_inode_dirty+0x6a/0x1c0
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11acfba>] mpage_da_map_and_submit+0xfa/0x5c0
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11d875b>] ? __ext4_journal_start_sb+0x6b/0x140
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11adbe9>] ext4_da_writepages+0x339/0x5d0
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10df3e1>] do_writepages+0x21/0x40
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c1141108>] __writeback_single_inode+0x38/0x240
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10551d3>] ? wake_up_bit+0x23/0x30
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114359b>] writeback_sb_inodes+0x16b/0x2f0
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114389b>] wb_writeback+0xcb/0x2c0
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10438bb>] ? lock_timer_base.isra.38+0x2b/0x50
2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10442e9>] ? del_timer_sync+0x49/0x60
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1144eec>] wb_do_writeback+0x9c/0x1d0
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145095>] bdi_writeback_thread+0x75/0x230
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145020>] ? wb_do_writeback+0x1d0/0x1d0
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054d64>] kthread+0x94/0xa0
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1488177>] ret_from_kernel_thread+0x1b/0x28
2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054cd0>] ? flush_kthread_work+0xd0/0xd0
2013-05-19T23:28:34.200+02:00 n22 kernel: Code: ff ff 85 d2 0f 45 85 78 ff ff ff 89 85 78 ff ff ff e9 59 ff ff ff 8d 74 26 00 8b b5 58 ff ff ff 89 b5 7c ff ff ff e9 16 fe ff ff <0f> 0b 0f 0b 0f 0b 0f 0b c7 85 78 ff ff ff 00 00 00 00 e9 76 ff
2013-05-19T23:28:34.200+02:00 n22 kernel: EIP: [<c11a71e9>] mpage_da_submit_io+0x339/0x360 SS:ESP 0068:f2293bf4
2013-05-19T23:28:34.200+02:00 n22 kernel: ---[ end trace 6b3eadfbb825e4d2 ]---



The trinity log files hangs here since about a hour:
...
[4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument)
[4673] [416] munlock(addr=0x85c6800, len=4096) = 0
[4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097
[4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument)
[4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument)
[4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument)
[4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL.


I created an EXT4FS on the file /mnt/ramdisk/disk1, loop-mounted it at /mnt/ramdisk/victims,
I mounted a stable Gentoo Linux image onto /mnt/ramdisk/trinity, chrooted into + started a fuzzying process

$ ps axf | grep trinity
2427 pts/2 S+ 0:00 | \_ sudo /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
2428 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
2479 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
4681 pts/2 D+ 0:00 | | \_ grep -q -e Regenerating random pages -e Triggering periodic reseed. /mnt/ramdisk/victims/v1/trinity.log
2483 pts/2 S+ 0:00 | \_ su - tfoerste -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
2485 pts/2 S+ 0:00 | \_ /bin/bash -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
4510 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4511 pts/2 D+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4512 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4514 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4546 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4651 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
4673 pts/2 DNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
5421 pts/5 S+ 0:00 \_ grep --colour=auto trinity


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


2013-05-19 23:55:09

by Theodore Ts'o

[permalink] [raw]
Subject: Re: BUG at fs/ext4/inode.c:1590!

So this BUG happened with a corrupted file system using a fuzzing
process? What is trinity? Is that the fuzzing process or the
workload?

Can you replicate it? Do you have the corrupted file system?

Thanks,

- Ted

On Mon, May 20, 2013 at 12:25:17AM +0200, Toralf F?rster wrote:
> The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3:
>
> 2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------
> 2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590!
> 2013-05-19T23:28:34.195+02:00 n22 kernel: invalid opcode: 0000 [#1] SMP
> 2013-05-19T23:28:34.195+02:00 n22 kernel: Modules linked in: loop rc_dib0700_rc5 dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib7000p dvb_core dibx000_common dib0090 rc_core nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun coretemp kvm_intel kvm fbcon bitblit usblp softcursor font psmouse acpi_cpufreq i915 uvcvideo sdhci_pci cfbfillrect sdhci videobuf2_vmalloc cfbimgblt videobuf2_memops i2c_algo_bit mmc_core videobuf2_core cfbcopyarea intel_agp videodev intel_gtt drm_kms_helper drm mperf arc4 iwldvm agpgart evdev processor video mac80211 iwlwifi cfg80211 thermal thermal_sys ac thinkpad_acpi battery nvram wmi e1000e rfkill fb snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer i2c_i801 tpm_tis tpm i2!
> c_core but
> ton tpm_bios 8250_pci hwmon 8250 snd ptp pps_core serial_core soundcore fbdev aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
> 2013-05-19T23:28:34.195+02:00 n22 kernel: Pid: 6292, comm: flush-7:1 Not tainted 3.9.3 #12 LENOVO 4180F65/4180F65
> 2013-05-19T23:28:34.195+02:00 n22 kernel: EIP: 0060:[<c11a71e9>] EFLAGS: 00010202 CPU: 2
> 2013-05-19T23:28:34.195+02:00 n22 kernel: EIP is at mpage_da_submit_io+0x339/0x360
> 2013-05-19T23:28:34.195+02:00 n22 kernel: EAX: 00000002 EBX: f2293d20 ECX: e4663700 EDX: 00000000
> 2013-05-19T23:28:34.195+02:00 n22 kernel: ESI: 00000000 EDI: e4663700 EBP: f2293ca4 ESP: f2293bf4
> 2013-05-19T23:28:34.195+02:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> 2013-05-19T23:28:34.197+02:00 n22 kernel: CR0: 80050033 CR2: b74ca060 CR3: 1bfd2000 CR4: 000407f0
> 2013-05-19T23:28:34.197+02:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> 2013-05-19T23:28:34.197+02:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
> 2013-05-19T23:28:34.197+02:00 n22 kernel: Process flush-7:1 (pid: 6292, ti=f2292000 task=f14e0000 task.ti=f2292000)
> 2013-05-19T23:28:34.197+02:00 n22 kernel: Stack:
> 2013-05-19T23:28:34.197+02:00 n22 kernel: 0000000e e85f039c 00000003 e85f02d8 0000003e 80000001 0000003e f2293dac
> 2013-05-19T23:28:34.197+02:00 n22 kernel: 80000001 00000004 00000000 00001000 00000000 80000001 f5a1cc00 0000f001
> 2013-05-19T23:28:34.197+02:00 n22 kernel: 00000000 00000000 00000000 00000004 00000000 f5a1cc00 f4796fe0 f612bf60
> 2013-05-19T23:28:34.197+02:00 n22 kernel: Call Trace:
> 2013-05-19T23:28:34.197+02:00 n22 kernel: [<c11abbba>] ? ext4_mark_inode_dirty+0x6a/0x1c0
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11acfba>] mpage_da_map_and_submit+0xfa/0x5c0
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11d875b>] ? __ext4_journal_start_sb+0x6b/0x140
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11adbe9>] ext4_da_writepages+0x339/0x5d0
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10df3e1>] do_writepages+0x21/0x40
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c1141108>] __writeback_single_inode+0x38/0x240
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10551d3>] ? wake_up_bit+0x23/0x30
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114359b>] writeback_sb_inodes+0x16b/0x2f0
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114389b>] wb_writeback+0xcb/0x2c0
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10438bb>] ? lock_timer_base.isra.38+0x2b/0x50
> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10442e9>] ? del_timer_sync+0x49/0x60
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1144eec>] wb_do_writeback+0x9c/0x1d0
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145095>] bdi_writeback_thread+0x75/0x230
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145020>] ? wb_do_writeback+0x1d0/0x1d0
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054d64>] kthread+0x94/0xa0
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1488177>] ret_from_kernel_thread+0x1b/0x28
> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054cd0>] ? flush_kthread_work+0xd0/0xd0
> 2013-05-19T23:28:34.200+02:00 n22 kernel: Code: ff ff 85 d2 0f 45 85 78 ff ff ff 89 85 78 ff ff ff e9 59 ff ff ff 8d 74 26 00 8b b5 58 ff ff ff 89 b5 7c ff ff ff e9 16 fe ff ff <0f> 0b 0f 0b 0f 0b 0f 0b c7 85 78 ff ff ff 00 00 00 00 e9 76 ff
> 2013-05-19T23:28:34.200+02:00 n22 kernel: EIP: [<c11a71e9>] mpage_da_submit_io+0x339/0x360 SS:ESP 0068:f2293bf4
> 2013-05-19T23:28:34.200+02:00 n22 kernel: ---[ end trace 6b3eadfbb825e4d2 ]---
>
>
>
> The trinity log files hangs here since about a hour:
> ...
> [4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument)
> [4673] [416] munlock(addr=0x85c6800, len=4096) = 0
> [4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097
> [4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument)
> [4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument)
> [4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument)
> [4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL.
>
>
> I created an EXT4FS on the file /mnt/ramdisk/disk1, loop-mounted it at /mnt/ramdisk/victims,
> I mounted a stable Gentoo Linux image onto /mnt/ramdisk/trinity, chrooted into + started a fuzzying process
>
> $ ps axf | grep trinity
> 2427 pts/2 S+ 0:00 | \_ sudo /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
> 2428 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
> 2479 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
> 4681 pts/2 D+ 0:00 | | \_ grep -q -e Regenerating random pages -e Triggering periodic reseed. /mnt/ramdisk/victims/v1/trinity.log
> 2483 pts/2 S+ 0:00 | \_ su - tfoerste -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
> 2485 pts/2 S+ 0:00 | \_ /bin/bash -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
> 4510 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4511 pts/2 D+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4512 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4514 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4546 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4651 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 4673 pts/2 DNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
> 5421 pts/5 S+ 0:00 \_ grep --colour=auto trinity
>
>
> --
> MfG/Sincerely
> Toralf F?rster
> pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-05-20 02:28:09

by Eric Sandeen

[permalink] [raw]
Subject: Re: BUG at fs/ext4/inode.c:1590!

On 5/19/13 6:55 PM, Theodore Ts'o wrote:
> So this BUG happened with a corrupted file system using a fuzzing
> process? What is trinity? Is that the fuzzing process or the
> workload?

a system call fuzz tester

http://codemonkey.org.uk/projects/trinity/

It's probably possible that it's memory corruption too.

> Can you replicate it? Do you have the corrupted file system?

Right, these bugs need to be narrowed down to be useful.

"I threw random garbage at the kernel and something fell off after a
few hours" is a bit vague. ;)

Does trinity start w/ a random seed, so you can restart? Or better
yet, restart w/ that seed and show the last 20 syscalls before the
bug, etc?

Ideally if you can get it down to a short sequence that leads to the
bug, then we can investigate.

Maybe it's the last few lines of the logfile but I'm not sure, I'm
not too familiar w/ trinity.

/me goes off to ask davej

-Eric

> Thanks,
>
> - Ted
>
> On Mon, May 20, 2013 at 12:25:17AM +0200, Toralf F?rster wrote:
>> The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3:
>>
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590!
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: invalid opcode: 0000 [#1] SMP
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: Modules linked in: loop rc_dib0700_rc5 dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib7000p dvb_core dibx000_common dib0090 rc_core nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun coretemp kvm_intel kvm fbcon bitblit usblp softcursor font psmouse acpi_cpufreq i915 uvcvideo sdhci_pci cfbfillrect sdhci videobuf2_vmalloc cfbimgblt videobuf2_memops i2c_algo_bit mmc_core videobuf2_core cfbcopyarea intel_agp videodev intel_gtt drm_kms_helper drm mperf arc4 iwldvm agpgart evdev processor video mac80211 iwlwifi cfg80211 thermal thermal_sys ac thinkpad_acpi battery nvram wmi e1000e rfkill fb snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer i2c_i801 tpm_tis tpm!
!
> i2!
>> c_core but
>> ton tpm_bios 8250_pci hwmon 8250 snd ptp pps_core serial_core soundcore fbdev aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: Pid: 6292, comm: flush-7:1 Not tainted 3.9.3 #12 LENOVO 4180F65/4180F65
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: EIP: 0060:[<c11a71e9>] EFLAGS: 00010202 CPU: 2
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: EIP is at mpage_da_submit_io+0x339/0x360
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: EAX: 00000002 EBX: f2293d20 ECX: e4663700 EDX: 00000000
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: ESI: 00000000 EDI: e4663700 EBP: f2293ca4 ESP: f2293bf4
>> 2013-05-19T23:28:34.195+02:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: CR0: 80050033 CR2: b74ca060 CR3: 1bfd2000 CR4: 000407f0
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: Process flush-7:1 (pid: 6292, ti=f2292000 task=f14e0000 task.ti=f2292000)
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: Stack:
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: 0000000e e85f039c 00000003 e85f02d8 0000003e 80000001 0000003e f2293dac
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: 80000001 00000004 00000000 00001000 00000000 80000001 f5a1cc00 0000f001
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: 00000000 00000000 00000000 00000004 00000000 f5a1cc00 f4796fe0 f612bf60
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: Call Trace:
>> 2013-05-19T23:28:34.197+02:00 n22 kernel: [<c11abbba>] ? ext4_mark_inode_dirty+0x6a/0x1c0
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11acfba>] mpage_da_map_and_submit+0xfa/0x5c0
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11d875b>] ? __ext4_journal_start_sb+0x6b/0x140
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c11adbe9>] ext4_da_writepages+0x339/0x5d0
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10df3e1>] do_writepages+0x21/0x40
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c1141108>] __writeback_single_inode+0x38/0x240
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10551d3>] ? wake_up_bit+0x23/0x30
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114359b>] writeback_sb_inodes+0x16b/0x2f0
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c114389b>] wb_writeback+0xcb/0x2c0
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10438bb>] ? lock_timer_base.isra.38+0x2b/0x50
>> 2013-05-19T23:28:34.199+02:00 n22 kernel: [<c10442e9>] ? del_timer_sync+0x49/0x60
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1144eec>] wb_do_writeback+0x9c/0x1d0
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145095>] bdi_writeback_thread+0x75/0x230
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1145020>] ? wb_do_writeback+0x1d0/0x1d0
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054d64>] kthread+0x94/0xa0
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1488177>] ret_from_kernel_thread+0x1b/0x28
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: [<c1054cd0>] ? flush_kthread_work+0xd0/0xd0
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: Code: ff ff 85 d2 0f 45 85 78 ff ff ff 89 85 78 ff ff ff e9 59 ff ff ff 8d 74 26 00 8b b5 58 ff ff ff 89 b5 7c ff ff ff e9 16 fe ff ff <0f> 0b 0f 0b 0f 0b 0f 0b c7 85 78 ff ff ff 00 00 00 00 e9 76 ff
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: EIP: [<c11a71e9>] mpage_da_submit_io+0x339/0x360 SS:ESP 0068:f2293bf4
>> 2013-05-19T23:28:34.200+02:00 n22 kernel: ---[ end trace 6b3eadfbb825e4d2 ]---
>>
>>
>>
>> The trinity log files hangs here since about a hour:
>> ...
>> [4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument)
>> [4673] [416] munlock(addr=0x85c6800, len=4096) = 0
>> [4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097
>> [4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument)
>> [4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument)
>> [4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument)
>> [4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL.
>>
>>
>> I created an EXT4FS on the file /mnt/ramdisk/disk1, loop-mounted it at /mnt/ramdisk/victims,
>> I mounted a stable Gentoo Linux image onto /mnt/ramdisk/trinity, chrooted into + started a fuzzying process
>>
>> $ ps axf | grep trinity
>> 2427 pts/2 S+ 0:00 | \_ sudo /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
>> 2428 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
>> 2479 pts/2 S+ 0:00 | \_ /bin/sh /home/tfoerste/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/trinity -t cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
>> 4681 pts/2 D+ 0:00 | | \_ grep -q -e Regenerating random pages -e Triggering periodic reseed. /mnt/ramdisk/victims/v1/trinity.log
>> 2483 pts/2 S+ 0:00 | \_ su - tfoerste -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
>> 2485 pts/2 S+ 0:00 | \_ /bin/bash -c cd /mnt/n22/v1; while [[ : ]]; do trinity -C 4 -V /mnt/n22/v1/v2/ -m; sleep 2; done
>> 4510 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4511 pts/2 D+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4512 pts/2 S+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4514 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4546 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4651 pts/2 SNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 4673 pts/2 DNL+ 0:00 | \_ trinity -C 4 -V /mnt/n22/v1/v2/ -m
>> 5421 pts/5 S+ 0:00 \_ grep --colour=auto trinity
>>
>>
>> --
>> MfG/Sincerely
>> Toralf F?rster
>> pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-05-20 07:51:53

by Toralf Förster

[permalink] [raw]
Subject: Re: BUG at fs/ext4/inode.c:1590!

On 05/20/2013 04:28 AM, Eric Sandeen wrote:
> It's probably possible that it's memory corruption too.
>
>> > Can you replicate it? Do you have the corrupted file system?
> Right, these bugs need to be narrowed down to be useful.
>
When the bug occurred I was neither able to cd into the directory where
the log files resides nor I could do a sync. psgrep and friends hang too.
On the other hand I was able to write an email with thunderbird and send
it out before I had to power off the system - sysrq key alt+print+b
didn't worked too.

> Does trinity start w/ a random seed, so you can restart? Or better
> yet, restart w/ that seed and show the last 20 syscalls before the
> bug, etc?
yes - trinity uses a randomly choosen seed. SO a replay is possible later.
But because the bug occurred after 3 hours /me thinks that a simple
replay with just few syscalls won't work.

> "I threw random garbage at the kernel and something fell off after a
> few hours" is a bit vague. ;)

yes - I do know that the bug report lacks data to easy reproduce it -
OTOH I thought it is better to report that then to ignore.

To speed up things for fuzzy tests I do use file systems living in a
tempfs. I'll change my scripts to hold at least the main log file of
trinity on a hard disk and came back if I do have useful log data too.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-05-20 12:32:21

by Eric Sandeen

[permalink] [raw]
Subject: Re: BUG at fs/ext4/inode.c:1590!

On 5/20/13 2:51 AM, Toralf Förster wrote:
> On 05/20/2013 04:28 AM, Eric Sandeen wrote:
>> It's probably possible that it's memory corruption too.
>>
>>>> Can you replicate it? Do you have the corrupted file system?
>> Right, these bugs need to be narrowed down to be useful.
>>
> When the bug occurred I was neither able to cd into the directory where
> the log files resides nor I could do a sync. psgrep and friends hang too.
> On the other hand I was able to write an email with thunderbird and send
> it out before I had to power off the system - sysrq key alt+print+b
> didn't worked too.
>
>> Does trinity start w/ a random seed, so you can restart? Or better
>> yet, restart w/ that seed and show the last 20 syscalls before the
>> bug, etc?
> yes - trinity uses a randomly choosen seed. SO a replay is possible later.
> But because the bug occurred after 3 hours /me thinks that a simple
> replay with just few syscalls won't work.

Then maybe it needs an option to say "start with this seed, but only
start operation at the Nth syscall in the series" where N is indicated by
the last bit of info in the logs.

i.e. rather than start with seed S and do 10,000 operations and crash,
start with seed S, skip the first 9,990 operations, and do the next 10.
If that doesn't work, skip the first 9,900 ... back up until you get
a small set of reproducing steps.

>> "I threw random garbage at the kernel and something fell off after a
>> few hours" is a bit vague. ;)
>
> yes - I do know that the bug report lacks data to easy reproduce it -
> OTOH I thought it is better to report that then to ignore.

Possibly. :) (fwiw I think davej asks that he be cc'd on bugs that
it finds as well)

> To speed up things for fuzzy tests I do use file systems living in a
> tempfs. I'll change my scripts to hold at least the main log file of
> trinity on a hard disk and came back if I do have useful log data too.

Ok, I do think it's worth testing, but it's not actionable this way.

Given the fact that it's a purely random and usually intended to be
invalid input, it's going to be something that's never been seen
before, and very hard to solve by inspection.

Thanks,
-Eric

2013-05-20 15:27:30

by Eric Sandeen

[permalink] [raw]
Subject: Re: Trinity: BUG at fs/ext4/inode.c:1590!

On 5/19/13 5:25 PM, Toralf Förster wrote:
> The following BUG happened today at a stable Gentoo Linux 32bit system with stable kernel 3.9.3:

Please say right away, or even in the subject, when you're reporting a trinity fuzzer bug.
It helps set the stage for what we're looking at. :)

> 2013-05-19T23:28:34.195+02:00 n22 kernel: ------------[ cut here ]------------
> 2013-05-19T23:28:34.195+02:00 n22 kernel: kernel BUG at fs/ext4/inode.c:1590!

<snip>

> The trinity log files hangs here since about a hour:
> ...

ok so you're running it as:

# trinity -C 4 -V /mnt/n22/v1/v2/ -m

so: 4 children, victim files on /mnt/n22/v1/v2, and monochrome

PID
| child syscall nr
| |
> [4673] [415] rt_sigsuspend(unewset=0xc0100220, sigsetsize=0x5ffdef7a) = -1 (Invalid argument)
> [4673] [416] munlock(addr=0x85c6800, len=4096) = 0
> [4673] [417] splice(fd_in=8, off_in=0, fd_out=12, off_out=0x85c3000[page_0xff], len=4097, flags=8) = 4097
> [4673] [418] fstatat64(dfd=12, filename="/mnt/n22/v1/v2/d10", statbuf=0, flag=0x284d0014) = -1 (Invalid argument)
> [4673] [419] mincore(start=1, len=0x1000000, vec=0x85c0000[page_zeros]) = -1 (Invalid argument)
> [4673] [420] timer_settime(timer_id=0x5f3bdbfa, flags=0x3075aee6, new_setting=0x85c3000[page_0xff], old_setting=0x85c3001) = -1 (Invalid argument)
> [4673] [421] syncfs(fd=12) [watchdog] pid 4514 hasn't made progress in 30 seconds! (last:1368998898 now:1368998928 diff:30). Stuck in syscall 267:clock_nanosleep. Sending SIGKILL.

hm, and it's telling us (?) that pid 4514 is the one that's stuck, is that in the logs, should we see what syscalls were done there?

trinity reseeds itself now and then, and prints out the seed it's using at that point.

Gathering at least the seed info would allow the test to be restarted with that seed (-s option), and in particular see if it's repeatable.

Dave, I had suggested earlier in the thread that an option to specify a seed and a nr. of syscalls to skip would help narrow down what triggers a bug. i.e. in this case, we could find the last seed (for this child?) and then run with:

* that seed
* max syscalls 421
* skip the first 400 syscalls

and see if it reproduces. Keep narrowing the window until we get the smallest set that reproduces. fsx has something similar to this.

(heh, now I want an option to emit C code to recreate the last N syscalls it's made, for permanent testcases. I suppose that'd be tough) ;)

-Eric

Here's a really hacky, untested patch that might implement the skipping I'm talking about. Caveat; I've never actually used trinity. :)

So if I'm missing something obvious about how to narrow down a failure to the call that caused it, I'm all ears. :)

=====

trinity: Add option to skip first X syscalls

To narrow down what set of operations triggers a bug, it may
be useful to rerun w/ the same seed, but skip the first X syscalls,
to get the minimum set of calls which causes a problem.

Add an "-X" option to do this. This is per-child, in the
same way that the -N option is per-child. (I think!)

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/include/syscall.h b/include/syscall.h
index ccbe436..6dc00ec 100644
--- a/include/syscall.h
+++ b/include/syscall.h
@@ -99,6 +99,7 @@ extern const struct syscalltable *syscalls_32bit;
extern const struct syscalltable *syscalls_64bit;

extern unsigned long syscalls_todo;
+extern unsigned long syscalls_skip;
extern unsigned int max_nr_syscalls;
extern unsigned int max_nr_32bit_syscalls;
extern unsigned int max_nr_64bit_syscalls;
diff --git a/params.c b/params.c
index c2c3f70..d09c032 100644
--- a/params.c
+++ b/params.c
@@ -64,6 +64,7 @@ static void usage(void)
fprintf(stderr, " -N#: do # syscalls then exit.\n");
fprintf(stderr, " -p: pause after syscall.\n");
fprintf(stderr, " -s#: use # as random seed.\n");
+ fprintf(stderr, " -X#: skip first # syscalls.\n");
exit(EXIT_SUCCESS);
}

@@ -92,7 +93,7 @@ void parse_args(int argc, char *argv[])
{
int opt;

- while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:", longopts, NULL)) != -1) {
+ while ((opt = getopt_long(argc, argv, "c:C:dDg:hIl:LN:mnP:pqr:s:SV:vx:X", longopts, NULL)) != -1) {
switch (opt) {
default:
if (opt == '?')
@@ -189,7 +190,9 @@ void parse_args(int argc, char *argv[])
seed = strtol(optarg, NULL, 10);
user_set_seed = TRUE;
break;
-
+ case 'X':
+ syscalls_skip = strtol(optarg, NULL, 10);
+ break;

case 'S':
do_syslog = TRUE;
diff --git a/syscall.c b/syscall.c
index 42ac75e..cc33c52 100644
--- a/syscall.c
+++ b/syscall.c
@@ -83,10 +83,12 @@ static unsigned long do_syscall(int childno, int *errno_saved)

errno = 0;

- if (shm->do32bit[childno] == FALSE)
- ret = syscall(nr, a1, a2, a3, a4, a5, a6);
- else
- ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6);
+ if (shm->total_syscalls_done > syscalls_skip) {
+ if (shm->do32bit[childno] == FALSE)
+ ret = syscall(nr, a1, a2, a3, a4, a5, a6);
+ else
+ ret = syscall32(num_args, nr, a1, a2, a3, a4, a5, a6);
+ }

*errno_saved = errno;

@@ -221,7 +223,8 @@ args_done:
sptr += sprintf(sptr, ") ");
*sptr = '\0';

- output(2, "%s", string);
+ if (shm->total_syscalls_done >= syscalls_skip)
+ output(2, "%s", string);

if (dopause == TRUE) {
synclogs();
@@ -243,6 +246,7 @@ args_done:
*/
call += SYSCALL_OFFSET;

+ /* do_syscall will skip the actual call if we're still skipping */
ret = do_syscall(childno, &errno_saved);

sptr = string;
@@ -262,7 +266,8 @@ args_done:

*sptr = '\0';

- output(2, "%s", string);
+ if (shm->total_syscalls_done > syscalls_skip)
+ output(2, "%s", string);

/* If the syscall doesn't exist don't bother calling it next time. */
if ((ret == -1) && (errno_saved == ENOSYS)) {
diff --git a/tables.c b/tables.c
index 2bccd83..de11fb5 100644
--- a/tables.c
+++ b/tables.c
@@ -19,6 +19,7 @@ const struct syscalltable *syscalls_32bit;
const struct syscalltable *syscalls_64bit;

unsigned long syscalls_todo = 0;
+unsigned long syscalls_skip = 0;

unsigned int max_nr_syscalls;
unsigned int max_nr_32bit_syscalls;

2013-05-21 17:07:14

by Dave Jones

[permalink] [raw]
Subject: Re: Trinity: BUG at fs/ext4/inode.c:1590!

On Mon, May 20, 2013 at 10:27:26AM -0500, Eric Sandeen wrote:

> Dave, I had suggested earlier in the thread that an option to specify a seed and a nr. of syscalls
> to skip would help narrow down what triggers a bug.
> i.e. in this case, we could find the last seed (for this child?) and then run with:

The seed is 'global', ie, child 1 uses seed+1, child 2 uses seed+2 etc etc.
We only log the seed value without the child number, so as long as you run with the same number
of child processes, the children will all get the correct seed values.

> * that seed
> * max syscalls 421
> * skip the first 400 syscalls
>
> and see if it reproduces. Keep narrowing the window until we get the smallest set that reproduces. fsx has something similar to this.

Setting the seed with -N should have the same net result as skipping the syscalls up until that point.

There are however two gotchas with the code to use prior seeds.

1. imagine this scenario.

a. initial seed
b. N syscalls done
c. reseed
d. N syscalls done.

If we get an oops after D, you might think "great, I can ignore everything before (c)" but that may not necessarily
be the case, if something at (b) created/corrupted some kernel state that (d) ends up falling over.

2. We gather a list of filenames at startup from /proc, /sys and /dev, and these change at each run.
This has undesirable results when you're trying to recreate something based on 'fd 243' etc, when that
ends up mapping to a different file.
However, in Toralf's case, if he's using -V, we'll only gather fd's from there, and as long as the files/dirs
at that path are static across runs, all should be good.

> (heh, now I want an option to emit C code to recreate the last N syscalls it's made, for permanent testcases. I suppose that'd be tough) ;)

Not tough, just yet another idea to add to the already mile-long TODO :-)

> Here's a really hacky, untested patch that might implement the skipping I'm talking about. Caveat; I've never actually used trinity. :)

I think if we wanted to skip a few million syscalls this would involve
quite a wait, and you're not really going to be saving much time over
just doing the syscalls ;)

> So if I'm missing something obvious about how to narrow down a failure to the call that caused it, I'm all ears. :)

I really hope that -N is enough here. The only real gotcha is the 2nd case
above, which we could solve by using a filename cache in the same way
there's a network socket cache.

Dave


2013-05-26 12:42:53

by Toralf Förster

[permalink] [raw]
Subject: Re: BUG at fs/ext4/inode.c:1590!

On 05/20/2013 04:28 AM, Eric Sandeen wrote:
> On 5/19/13 6:55 PM, Theodore Ts'o wrote:
>> So this BUG happened with a corrupted file system using a fuzzing
>> process? What is trinity? Is that the fuzzing process or the
>> workload?
>
> a system call fuzz tester
>
> http://codemonkey.org.uk/projects/trinity/
>
> It's probably possible that it's memory corruption too.
>
>> Can you replicate it? Do you have the corrupted file system?
>
> Right, these bugs need to be narrowed down to be useful.

Today I run while fuzzying latest v3.10-rc2-448-g72de4c6 into a
nearly similar situation (32bit Gentoo Linux)

Till now I'm not able to gave a reproducible scenario just the bug messages.
All what I can tell till is that with stable kernel 3.9.x I never observed such issues
while 3.10-rcX produces it now for the 2nd time.

Unfortunately I lost the file containing the file system, the command mount gave:
$> /mnt/ramdisk/disk0 (deleted) on /mnt/ramdisk/victims type ext4 (rw)


What happened after that was :

$> rmdir /mnt/ramdisk/victims/
rmdir: failed to remove ‘/mnt/ramdisk/victims/’: Device or resource busy

$> ll /mnt/ramdisk/victims/
total 0

$> umount /mnt/ramdisk/victims/
Segmentation fault

And the syslog for completeness :

2013-05-26T14:32:59.612+02:00 n22 kernel: EXT4-fs (loop0): sb orphan head is 32029
2013-05-26T14:32:59.612+02:00 n22 kernel: sb_info orphan list:
2013-05-26T14:32:59.612+02:00 n22 kernel: inode loop0:32029 at e85e57f8: mode 102002, nlink 0, next 32018
2013-05-26T14:32:59.612+02:00 n22 kernel: inode loop0:32018 at e8bf49e8: mode 105043, nlink 0, next 32023
2013-05-26T14:32:59.612+02:00 n22 kernel: inode loop0:32023 at e938bb20: mode 101247, nlink 0, next 32047
2013-05-26T14:32:59.612+02:00 n22 kernel: inode loop0:32047 at e840b418: mode 101027, nlink 0, next 0
2013-05-26T14:32:59.612+02:00 n22 kernel: ------------[ cut here ]------------
2013-05-26T14:32:59.612+02:00 n22 kernel: kernel BUG at fs/ext4/super.c:804!
2013-05-26T14:32:59.612+02:00 n22 kernel: invalid opcode: 0000 [#1] SMP
2013-05-26T14:32:59.612+02:00 n22 kernel: Modules linked in: loop nfsd auth_rpcgss oid_registry lockd sunrpc ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc ipv6 tun fuse dm_mod coretemp kvm_intel kvm aesni_intel hid_generic usblp xts hid_cherry usbhid aes_i586 rc_dib0700_rc5 i915 cfbfillrect cfbimgblt hid lrw i2c_algo_bit gf128mul ablk_helper arc4 iwldvm mac80211 cfbcopyarea cryptd intel_agp dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib0090 dib7000p uvcvideo dvb_core dibx000_common videobuf2_vmalloc rc_core videobuf2_memops videobuf2_core videodev snd_hda_codec_conexant e1000e sr_mod snd_hda_intel intel_gtt thinkpad_acpi nvram snd_hda_codec snd_pcm snd_page_alloc snd_timer fbcon bitblit softcursor font drm_kms_helper psmouse sdhci_pci cdrom sdhci mmc_core w
mi iwlwifi cfg80211 ac snd evdev drm thermal acpi_cpufreq tpm_tis mperf rfkill battery tpm video tpm_bios button i2c_i801 agpgart 8250_pci fb processor i2c_core 8250 ptp serial_core soundcore pps_core thermal_sys fbdev hwmon [last unloaded: microcode]
2013-05-26T14:32:59.613+02:00 n22 kernel: CPU: 1 PID: 25253 Comm: umount Not tainted 3.10.0-rc2+ #3
2013-05-26T14:32:59.613+02:00 n22 kernel: Hardware name: LENOVO 4180F65/4180F65, BIOS 83ET73WW (1.43 ) 11/30/2012
2013-05-26T14:32:59.613+02:00 n22 kernel: task: eaff27f0 ti: e9c28000 task.ti: e9c28000
2013-05-26T14:32:59.613+02:00 n22 kernel: EIP: 0060:[<c11ba55c>] EFLAGS: 00010287 CPU: 1
2013-05-26T14:32:59.613+02:00 n22 kernel: EIP is at ext4_put_super+0x2dc/0x2e0
2013-05-26T14:32:59.613+02:00 n22 kernel: EAX: 0000003d EBX: eb519400 ECX: eb519550 EDX: eb519550
2013-05-26T14:32:59.613+02:00 n22 kernel: ESI: eb51c400 EDI: eb519514 EBP: e9c29efc ESP: e9c29ecc
2013-05-26T14:32:59.613+02:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
2013-05-26T14:32:59.613+02:00 n22 kernel: CR0: 80050033 CR2: b773f0c0 CR3: 31a7d000 CR4: 000407f0
2013-05-26T14:32:59.613+02:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
2013-05-26T14:32:59.614+02:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
2013-05-26T14:32:59.614+02:00 n22 kernel: Stack:
2013-05-26T14:32:59.614+02:00 n22 kernel: c1565830 eb51c5bc 00007d2f e840b418 00008217 00000000 00000000 e840b3f8
2013-05-26T14:32:59.614+02:00 n22 kernel: eb519550 eb51c400 eb51c458 c149e6a0 e9c29f18 c111f5e1 e9c29f28 e9c29f18
2013-05-26T14:32:59.614+02:00 n22 kernel: f1d74600 00000083 eb51c400 e9c29f28 c111f689 eb51c400 c15f9e28 e9c29f38
2013-05-26T14:32:59.614+02:00 n22 kernel: Call Trace:
2013-05-26T14:32:59.614+02:00 n22 kernel: [<c111f5e1>] generic_shutdown_super+0x51/0xd0
2013-05-26T14:32:59.614+02:00 n22 kernel: [<c111f689>] kill_block_super+0x29/0x70
2013-05-26T14:32:59.614+02:00 n22 kernel: [<c111f8d4>] deactivate_locked_super+0x44/0x70
2013-05-26T14:32:59.614+02:00 n22 kernel: [<c11202a7>] deactivate_super+0x47/0x60
2013-05-26T14:32:59.615+02:00 n22 kernel: [<c113702d>] mntput_no_expire+0xcd/0x120
2013-05-26T14:32:59.615+02:00 n22 kernel: [<c1137eee>] SyS_umount+0xae/0x330
2013-05-26T14:32:59.615+02:00 n22 kernel: [<c113818e>] SyS_oldumount+0x1e/0x20
2013-05-26T14:32:59.615+02:00 n22 kernel: [<c1480901>] sysenter_do_call+0x12/0x22
2013-05-26T14:32:59.615+02:00 n22 kernel: [<c1480000>] ? tty_unlock+0x20/0x50
2013-05-26T14:32:59.615+02:00 n22 kernel: Code: 24 30 58 56 c1 05 bc 01 00 00 89 44 24 04 e8 b4 db 2b 00 8b 4d ec 8b 55 f0 8b 09 39 ca 75 b2 39 93 50 01 00 00 0f 84 9a fe ff ff <0f> 0b 66 90 55 89 e5 83 ec 20 66 66 66 66 90 8d 45 18 c7 04 24
2013-05-26T14:32:59.615+02:00 n22 kernel: EIP: [<c11ba55c>] ext4_put_super+0x2dc/0x2e0 SS:ESP 0068:e9c29ecc
2013-05-26T14:32:59.615+02:00 n22 kernel: ---[ end trace 4ebfe9c46d1fd9eb ]---



--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3