2011-03-01 19:26:50

by Moffett, Kyle D

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

Whoops, looks like the Debian bug-tracker lost the CC list somehow. I believe I've got all the CCs re-added; sorry for any duplicate emails.

On Mar 01, 2011, at 11:52, Kyle Moffett wrote:
> Package: linux-2.6
> Version: 2.6.32-30
> Severity: important
>
> I'm getting a repeatable BUG from ext4, which seems to be caused by
> Postfix processing its mail queue. The specific filesystem block device
> that has the problem seems to be "dm-13", which on this boot is the
> logical volume containing the "/var/spool/postfix" chroot.
>
> This is a completely standard Debian installation running on an Amazon EC2
> instance (x86_64). The filesystem is mounted in "data=journal" mode.
>
> This crash is *very* repeatable. It occurs almost every reboot when
> there are more than 1 or 2 queued emails. I will try re-mounting the
> filesystem in "data=ordered" mode momentarily.
>
> The relevant filesystems are:
> /dev/mapper/system-root / ext4 rw,noatime,barrier=1,data=ordered 0 0
> /dev/mapper/system-var /var ext4 rw,noatime,barrier=1,nodelalloc,data=journal 0 0
> /dev/mapper/system-log /var/log ext4 rw,nosuid,nodev,noatime,barrier=1,data=ordered 0 0
> /dev/xvda1 /boot ext3 rw,noatime,user_xattr,acl,data=journal 0 0
> /dev/mapper/db-mail /var/mail ext4 rw,nosuid,nodev,noatime,barrier=1,data=ordered 0 0
> /dev/mapper/db-postfix /var/spool/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
> /dev/mapper/db-smtp /var/lib/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
> /dev/mapper/db-smtpcfg /etc/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
>
> In particular, I note that there was a previous report of a BUG at
> fs/jbd2/commit.c:533 which never seemed to get isolated:
> http://www.kerneltrap.com/mailarchive/linux-ext4/2009/9/2/6373283
>
> I need to get this system operational again right now, but I'm going to
> take a consistent snapshot of it so I can debug it later.
>
> NOTE: For followers on the linux-ext4 mailing list, this particular
> system is running the Debian "squeeze" kernel (based on 2.6.32), so it's
> theoretically possible this bug has been fixed upstream since then.
> I didn't have any luck finding such a fix on Google, though.
>
> -- Package-specific info:
> ** Version:
> Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30) ([email protected]) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 05:46:49 UTC 2011
>
> ** Command line:
> root=/dev/mapper/system-root ro
>
> ** Tainted: D (128)
> * Kernel has oopsed before.
>
> ** Kernel log:
> [ 118.525038] alloc irq_desc for 526 on node -1
> [ 118.525040] alloc kstat_irqs on node -1
> [ 118.700415] device-mapper: uevent: version 1.0.3
> [ 118.700890] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: [email protected]
> [ 118.925563] EXT4-fs (dm-0): INFO: recovery required on readonly filesystem
> [ 118.925580] EXT4-fs (dm-0): write access will be enabled during recovery
> [ 118.968700] EXT4-fs (dm-0): orphan cleanup on readonly fs
> [ 118.968716] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790044
> [ 118.968761] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790012
> [ 118.968768] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790011
> [ 118.968775] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790010
> [ 118.968782] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790009
> [ 118.968788] EXT4-fs (dm-0): 5 orphan inodes deleted
> [ 118.968794] EXT4-fs (dm-0): recovery complete
> [ 118.979150] EXT4-fs (dm-0): mounted filesystem with ordered data mode
> [ 119.293543] udev[204]: starting version 164
> [ 119.366778] input: PC Speaker as /devices/platform/pcspkr/input/input1
> [ 119.436417] Error: Driver 'pcspkr' is already registered, aborting...
> [ 124.153241] Adding 4194296k swap on /dev/xvdb1. Priority:-1 extents:1 across:4194296k SS
> [ 125.156599] loop: module loaded
> [ 138.650657] EXT4-fs (dm-21): Ignoring delalloc option - requested data journaling mode
> [ 138.650959] EXT4-fs (dm-21): mounted filesystem with journalled data mode
> [ 138.660092] EXT4-fs (dm-22): mounted filesystem with ordered data mode
> [ 138.674436] kjournald starting. Commit interval 5 seconds
> [ 138.675145] EXT3 FS on xvda1, internal journal
> [ 138.675155] EXT3-fs: mounted filesystem with journal data mode.
> [ 138.728462] EXT4-fs (xvdc): mounted filesystem without journal
> [ 138.745406] EXT4-fs (dm-17): mounted filesystem with ordered data mode
> [ 138.748531] EXT4-fs (dm-18): mounted filesystem with ordered data mode
> [ 138.774667] EXT4-fs (dm-19): mounted filesystem with ordered data mode
> [ 138.780834] EXT4-fs (dm-2): Ignoring delalloc option - requested data journaling mode
> [ 138.781400] EXT4-fs (dm-2): mounted filesystem with journalled data mode
> [ 138.784700] EXT4-fs (dm-1): Ignoring delalloc option - requested data journaling mode
> [ 138.784773] EXT4-fs (dm-1): mounted filesystem with journalled data mode
> [ 138.790320] EXT4-fs (dm-4): Ignoring delalloc option - requested data journaling mode
> [ 138.790871] EXT4-fs (dm-4): mounted filesystem with journalled data mode
> [ 138.794955] EXT4-fs (dm-3): Ignoring delalloc option - requested data journaling mode
> [ 138.795064] EXT4-fs (dm-3): mounted filesystem with journalled data mode
> [ 138.800441] EXT4-fs (dm-14): Ignoring delalloc option - requested data journaling mode
> [ 138.801015] EXT4-fs (dm-14): mounted filesystem with journalled data mode
> [ 138.804188] EXT4-fs (dm-13): Ignoring delalloc option - requested data journaling mode
> [ 138.804263] EXT4-fs (dm-13): mounted filesystem with journalled data mode
> [ 138.807812] EXT4-fs (dm-12): Ignoring delalloc option - requested data journaling mode
> [ 138.807890] EXT4-fs (dm-12): mounted filesystem with journalled data mode
> [ 138.814241] EXT4-fs (dm-16): mounted filesystem with ordered data mode
> [ 138.817864] EXT4-fs (dm-15): mounted filesystem with ordered data mode
> [ 138.821251] EXT4-fs (dm-9): mounted filesystem with ordered data mode
> [ 138.824801] EXT4-fs (dm-7): Ignoring delalloc option - requested data journaling mode
> [ 138.824877] EXT4-fs (dm-7): mounted filesystem with journalled data mode
> [ 138.828423] EXT4-fs (dm-6): Ignoring delalloc option - requested data journaling mode
> [ 138.828495] EXT4-fs (dm-6): mounted filesystem with journalled data mode
> [ 138.832295] EXT4-fs (dm-5): Ignoring delalloc option - requested data journaling mode
> [ 138.832868] EXT4-fs (dm-5): mounted filesystem with journalled data mode
> [ 138.836422] EXT4-fs (dm-8): Ignoring delalloc option - requested data journaling mode
> [ 138.836495] EXT4-fs (dm-8): mounted filesystem with journalled data mode
> [ 138.840309] EXT4-fs (dm-11): mounted filesystem with ordered data mode
> [ 138.844664] EXT4-fs (dm-10): Ignoring delalloc option - requested data journaling mode
> [ 138.844735] EXT4-fs (dm-10): mounted filesystem with journalled data mode
> [ 138.859715] EXT4-fs (dm-20): mounted filesystem with ordered data mode
> [ 150.205516] eth0: no IPv6 routers present
> [ 157.443694] ------------[ cut here ]------------
> [ 157.443708] kernel BUG at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:534!
> [ 157.443717] invalid opcode: 0000 [#1] SMP
> [ 157.443722] last sysfs file: /sys/devices/virtual/vc/vcsa6/uevent
> [ 157.443727] CPU 0
> [ 157.443730] Modules linked in: ext3 jbd loop snd_pcm snd_timer snd soundcore evdev snd_page_alloc pcspkr ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> [ 157.443759] Pid: 748, comm: jbd2/dm-13-8 Not tainted 2.6.32-5-xen-amd64 #1
> [ 157.443764] RIP: e030:[<ffffffffa00213bc>] [<ffffffffa00213bc>] jbd2_journal_commit_transaction+0x58b/0x115b [jbd2]
> [ 157.443778] RSP: e02b:ffff8801dc2adcf0 EFLAGS: 00010202
> [ 157.443782] RAX: 0000000000000009 RBX: ffff8801d7c26870 RCX: 00000000ffff74c8
> [ 157.443788] RDX: 00000000ffff74c8 RSI: ffff8801d7c26800 RDI: ffff8801d87a9024
> [ 157.443793] RBP: ffff8801d7c26850 R08: ffff8801d87a9058 R09: ffff88000b1df780
> [ 157.443798] R10: ffff8801d7aa0048 R11: ffff8801dba1d880 R12: ffff8801d87a9000
> [ 157.443803] R13: ffff8801dc2add88 R14: ffff8801d7c26800 R15: 0000000000000000
> [ 157.443812] FS: 00007f3afa7d3700(0000) GS:ffff88000b1ca000(0000) knlGS:0000000000000000
> [ 157.443818] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 157.443822] CR2: 00007f3afc878268 CR3: 00000001d7b05000 CR4: 0000000000002660
> [ 157.443828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 157.443833] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
> [ 157.443839] Process jbd2/dm-13-8 (pid: 748, threadinfo ffff8801dc2ac000, task ffff8801db02e2e0)
> [ 157.443844] Stack:
> [ 157.443847] ffff8801db02e318 ffff8801dc2add70 ffff8801db02e2e0 ffff8801d7aa38c8
> [ 157.443854] <0> 00000024a85fc12e ffff8801d87a9024 ffff8801d87a939c ffff88000b1df780
> [ 157.443865] <0> ffff8801d72f3000 ffff8801db02e2e0 0000000800000000 00000441000116e0
> [ 157.443877] Call Trace:
> [ 157.443886] [<ffffffff8130b79f>] ? thread_return+0x79/0xe0
> [ 157.443893] [<ffffffff8100e609>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
> [ 157.443901] [<ffffffff8130cb3a>] ? _spin_unlock_irqrestore+0xd/0xe
> [ 157.443908] [<ffffffff8105b748>] ? try_to_del_timer_sync+0x63/0x6c
> [ 157.443917] [<ffffffffa002788f>] ? kjournald2+0xbe/0x206 [jbd2]
> [ 157.443923] [<ffffffff81065d4a>] ? autoremove_wake_function+0x0/0x2e
> [ 157.443930] [<ffffffffa00277d1>] ? kjournald2+0x0/0x206 [jbd2]
> [ 157.443936] [<ffffffff81065a7d>] ? kthread+0x79/0x81
> [ 157.443942] [<ffffffff81012baa>] ? child_rip+0xa/0x20
> [ 157.443947] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> [ 157.443952] [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> [ 157.443956] Code: 00 48 89 4c 24 38 45 31 ed c7 44 24 4c 00 00 00 00 c7 44 24 50 00 00 00 00 c7 44 24 58 ff ff ff ff 48 89 44 24 18 e9 f0 02 00 00 <0f> 0b eb fe 41 f6 04 24 02 74 40 48 8b 13 48 8d 42 02 f0 80 62
> [ 157.444009] RIP [<ffffffffa00213bc>] jbd2_journal_commit_transaction+0x58b/0x115b [jbd2]
> [ 157.444017] RSP <ffff8801dc2adcf0>
> [ 157.444035] ---[ end trace 1277bd032d9ae98f ]---
>
> ** Model information
> Amazon EC2 instance
>
> ** Loaded modules:
> Module Size Used by
> ext3 106502 1
> jbd 37085 1 ext3
> loop 11799 0
> snd_pcm 60551 0
> snd_timer 15582 1 snd_pcm
> snd 46446 2 snd_pcm,snd_timer
> soundcore 4598 1 snd
> evdev 7352 0
> snd_page_alloc 6249 1 snd_pcm
> pcspkr 1699 0
> ext4 289033 24
> mbcache 5050 2 ext3,ext4
> jbd2 67015 1 ext4
> crc16 1319 1 ext4
> dm_mod 53754 69
> xen_netfront 16257 0
> xen_blkfront 9435 8
>
> ** Network interface configuration:
> ###
> ## WARNING:
> ## DO NOT MESS WITH THIS FILE, AND ESPECIALLY DO NOT RUN IFDOWN OR IFUP!!!
> ## THIS MACHINE IS AN AMAZON EC2 VM WITH NO CONSOLE ACCESS!!!
> ###
>
> auto lo
> iface lo inet loopback
> dns-search border.exmeritus.com exmeritus.com
>
> auto eth0
> iface eth0 inet dhcp
>
>
> ** Network status:
> *** IP interfaces and addresses:
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> inet6 ::1/128 scope host
> valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> link/ether 12:31:3b:0a:94:dc brd ff:ff:ff:ff:ff:ff
> inet 10.204.151.42/23 brd 10.204.151.255 scope global eth0
> inet6 fe80::1031:3bff:fe0a:94dc/64 scope link
> valid_lft forever preferred_lft forever
>
> *** Device statistics:
> Inter-| Receive | Transmit
> face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
> lo: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> eth0: 185543 1684 0 0 0 0 0 0 239161 1212 0 0 0 0 0 0
>
> *** Protocol statistics:
> Ip:
> 1653 total packets received
> 1 with invalid addresses
> 0 forwarded
> 0 incoming packets discarded
> 1652 incoming packets delivered
> 1203 requests sent out
> Icmp:
> 9 ICMP messages received
> 0 input ICMP message failed.
> ICMP input histogram:
> echo requests: 9
> 9 ICMP messages sent
> 0 ICMP messages failed
> ICMP output histogram:
> echo replies: 9
> IcmpMsg:
> InType8: 9
> OutType0: 9
> Tcp:
> 26 active connections openings
> 6 passive connection openings
> 7 failed connection attempts
> 7 connection resets received
> 6 connections established
> 1477 segments received
> 1008 segments send out
> 20 segments retransmited
> 0 bad segments received.
> 12 resets sent
> Udp:
> 166 packets received
> 0 packets to unknown port received.
> 0 packet receive errors
> 166 packets sent
> UdpLite:
> TcpExt:
> 2 TCP sockets finished time wait in fast timer
> 22 delayed acks sent
> 4 packets directly queued to recvmsg prequeue.
> 563 packet headers predicted
> 84 acknowledgments not containing data payload received
> 740 predicted acknowledgments
> 6 congestion windows recovered without slow start after partial ack
> 13 other TCP timeouts
> 1 connections reset due to unexpected data
> 2 connections reset due to early user close
> 7 connections aborted due to timeout
> IpExt:
> InOctets: 184459
> OutOctets: 221425
>
> *** Device features:
> eth0: 0x50003
> lo: 0x13865
>
> ** PCI devices:
>
> ** USB devices:
> not available
>
>
> -- System Information:
> Debian Release: 6.0
> APT prefers stable
> APT policy: (990, 'stable'), (700, 'testing'), (600, 'unstable'), (500, 'squeeze-updates'), (500, 'experimental')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 depends on:
> ii debconf [debconf-2.0] 1.5.36.1 Debian configuration management sy
> ii initramfs-tools 0.98.8 tools for generating an initramfs
> ii linux-base 2.6.32-30 Linux image base package
> ii module-init-tools 3.12-1 tools for managing Linux kernel mo
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 recommends:
> pn firmware-linux-free <none> (no description available)
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 suggests:
> ii grub-legacy [grub] 0.97-64 GRand Unified Bootloader (Legacy v
> pn linux-doc-2.6.32 <none> (no description available)
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 is related to:
> pn firmware-bnx2 <none> (no description available)
> pn firmware-bnx2x <none> (no description available)
> pn firmware-ipw2x00 <none> (no description available)
> pn firmware-ivtv <none> (no description available)
> pn firmware-iwlwifi <none> (no description available)
> pn firmware-linux <none> (no description available)
> pn firmware-linux-nonfree <none> (no description available)
> pn firmware-qlogic <none> (no description available)
> pn firmware-ralink <none> (no description available)
> pn xen-hypervisor <none> (no description available)
>
> -- debconf information:
> linux-image-2.6.32-5-xen-amd64/postinst/ignoring-do-bootloader-2.6.32-5-xen-amd64:
> linux-image-2.6.32-5-xen-amd64/prerm/removing-running-kernel-2.6.32-5-xen-amd64: true
> linux-image-2.6.32-5-xen-amd64/postinst/depmod-error-initrd-2.6.32-5-xen-amd64: false
> linux-image-2.6.32-5-xen-amd64/postinst/missing-firmware-2.6.32-5-xen-amd64:
>
>


2011-04-03 02:02:36

by Theodore Ts'o

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

Hi Kyle,

Sorry for not following up sooner. Are you still able to reproduce
this failure? If I set up an identical Debian stable instance on
EC-2, am I likely to reproduce it myself? Do you have a package list
or EC2 base image I can use as a starting point?

Thanks,

- Ted

2011-04-04 15:04:12

by Moffett, Kyle D

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Apr 02, 2011, at 22:02, Ted Ts'o wrote:
> Sorry for not following up sooner. Are you still able to reproduce
> this failure? If I set up an identical Debian stable instance on
> EC-2, am I likely to reproduce it myself? Do you have a package list
> or EC2 base image I can use as a starting point?

I'll need to check on this.

Unfortunately it was not a trivial process to install Debian "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts, a patched Debian-Installer, and several manual post-install-but-before-reboot steps (like fixing up GRUB 0.99). One of these days I may get time to update all that to the official "wheezy" release and submit bug reports.

I have an exact image of the failing instance, but it has proprietary data on it and if I stand up an old copy I need to be careful not to actually let it send all the queued emails :-D.

It's probably easier for me to halt email delivery and clone the working instance and try to reproduce from there. If I recall, the (easily undone) workaround was to remount from "data=journal" to "data=ordered" on a couple filesystems. It may take a day or two to get this done, though.
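
For reference, the workaround looks roughly like this, using the device
and mount point names from the fstab in the original report (depending on
the kernel, the data= mode may only be changeable across a full
unmount/mount cycle rather than a plain "-o remount"):

umount /var/spool/postfix
mount -o nosuid,nodev,noatime,data=ordered \
    /dev/mapper/db-postfix /var/spool/postfix

# ...and to undo it again later:
umount /var/spool/postfix
mount -o nosuid,nodev,noatime,nodelalloc,data=journal \
    /dev/mapper/db-postfix /var/spool/postfix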

If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.

Thanks again!

Cheers,
Kyle Moffett

2011-04-04 20:52:56

by Moffett, Kyle D

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Apr 04, 2011, at 10:24, Moffett, Kyle D wrote:
> On Apr 02, 2011, at 22:02, Ted Ts'o wrote:
>> Sorry for not following up sooner. Are you still able to reproduce
>> this failure? If I set up an identical Debian stable instance on
>> EC-2, am I likely to reproduce it myself? Do you have a package list
>> or EC2 base image I can use as a starting point?
>
> I'll need to check on this.
>
> Unfortunately it was not a trivial process to install Debian "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts, a patched Debian-Installer, and several manual post-install-but-before-reboot steps (like fixing up GRUB 0.99). One of these days I may get time to update all that to the official "wheezy" release and submit bug reports.
>
> I have an exact image of the failing instance, but it has proprietary data on it and if I stand up an old copy I need to be careful not to actually let it send all the queued emails :-D.
>
> It's probably easier for me to halt email delivery and clone the working instance and try to reproduce from there. If I recall, the (easily undone) workaround was to remount from "data=journal" to "data=ordered" on a couple filesystems. It may take a day or two to get this done, though.
>
> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.

Bah...

I went back to the old image that was crashing every boot before and I can't find any way to make it crash at all now...

If I manage to reproduce it again later I'll send you another email.

Cheers,
Kyle Moffett

2011-04-05 00:15:48

by Theodore Ts'o

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Mon, Apr 04, 2011 at 09:24:28AM -0500, Moffett, Kyle D wrote:
>
> Unfortunately it was not a trivial process to install Debian
> "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts,
> a patched Debian-Installer, and several manual
> post-install-but-before-reboot steps (like fixing up GRUB 0.99).
> One of these days I may get time to update all that to the official
> "wheezy" release and submit bug reports.

Sigh, I was hoping someone was maintaining semi-official EC2 images
for Debian, much like alestic has been maintaining for Ubuntu. (Hmm,
actually, he has EC2 images for Lenny and Etch, but unfortunately not
for squeeze. Sigh....)

> It's probably easier for me to halt email delivery and clone the
> working instance and try to reproduce from there. If I recall, the
> (easily undone) workaround was to remount from "data=journal" to
> "data=ordered" on a couple filesystems. It may take a day or two to
> get this done, though.

Couple of questions which might give me some clues: (a) was this a
natively formatted ext4 file system, or an ext3 file system which was
later converted to ext4? (b) How big are the files/directories
involved? In particular, how big is the Postfix mail queue directory,
and is it an extent-based directory? (what does lsattr on the mail
queue directory report) As far as file sizes, does it matter how big
the e-mail messages are, and are there any other database files that
postgress might be touching at the time that you get the OOPS?

I have found a bug in ext4 where we were underestimating how many
journal credits were needed when modifying direct/indirect-mapped
files (which would be seen on ext4 if you had an ext3 file system that
was converted to start using extents; but old, pre-existing
directories wouldn't be converted), which is why I'm asking the
question about whether this was an ext2/ext3 file system which was
converted to use ext4.

I have a patch to fix it, but backporting it into a kernel which will
work with EC2 is not something I've done before. Can anyone point me
at a web page that gives me the quick cheat sheet?

> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.

If we can reproduce the problem on that base image it would be really
great! I have an Amazon AWS account; contact me when you have an
image you want to share, if you want to share it just with my AWS
account id, instead of sharing it publicly...

- Ted

2011-04-05 16:36:22

by Moffett, Kyle D

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Apr 04, 2011, at 20:15, Ted Ts'o wrote:
> On Mon, Apr 04, 2011 at 09:24:28AM -0500, Moffett, Kyle D wrote:
>>
>> Unfortunately it was not a trivial process to install Debian
>> "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts,
>> a patched Debian-Installer, and several manual
>> post-install-but-before-reboot steps (like fixing up GRUB 0.99).
>> One of these days I may get time to update all that to the official
>> "wheezy" release and submit bug reports.
>
> Sigh, I was hoping someone was maintaining semi-official EC2 images
> for Debian, much like alestic has been maintaining for Ubuntu. (Hmm,
> actually, he has EC2 images for Lenny and Etch, but unfortunately not
> for squeeze. Sigh....)

The Alestic EC2 images (now replaced by official Ubuntu images) use
kernel images packaged as AKIs, which ordinary users can't upload
themselves. Before a couple of Ubuntu staff got special permission
to upload kernel images, all the Alestic EC2 images just borrowed
Red Hat or Fedora kernels and copied over the modules.

The big problem for Squeeze is that it ships a newer udev, which is not
compatible with those older kernels.

For the Debian-Installer and my Debian images, I use the PV-GRUB
AKI to load a kernel image from my rootfs.
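
PV-GRUB just reads a legacy menu.lst off the guest's own disk, so a
minimal sketch of the setup looks something like this (the kernel and
initrd names match the report above; the GRUB root device and exact
search path depend on the AKI variant and partition layout, so treat
them as assumptions):

# written onto the installed system before first boot
cat > /boot/grub/menu.lst <<'EOF'
default 0
timeout 1

title  Debian GNU/Linux, kernel 2.6.32-5-xen-amd64
root   (hd0,0)
kernel /vmlinuz-2.6.32-5-xen-amd64 root=/dev/mapper/system-root ro
initrd /initrd.img-2.6.32-5-xen-amd64
EOF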

Specifically, one of the Perl scripts builds an S3-based AMI
containing a Debian-Installer kernel and initramfs (using a tweaked
and preseeded D-I build). It uploads the AMI to my account and
registers it with EC2.

Then another Perl script starts the uploaded AMI and attaches one
or more EBS volumes for the Debian-Installer to use. When you've
completed the install, it takes EBS snapshots and creates an
EBS-backed AMI from those.

The scripts use an odd mix of the Net::Amazon::EC2 CPAN module and
shell callouts to the ec2 tools, but they seem to work well enough.
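
The EC2 side of that workflow boils down to a handful of calls; a very
rough sketch with the classic ec2-api-tools (every ID, size, and device
name below is made up for illustration):

# boot the uploaded Debian-Installer AMI and hand it a target EBS volume
ec2-run-instances ami-00000001 -t m1.small -k my-keypair
ec2-create-volume -s 20 -z us-east-1a
ec2-attach-volume vol-00000001 -i i-00000001 -d /dev/sdf

# ...run the installer over the network console, then snapshot the result
# and register an EBS-backed AMI that boots via the PV-GRUB AKI
ec2-create-snapshot vol-00000001
ec2-register -n "debian-squeeze-amd64" -a x86_64 --kernel aki-00000001 \
    --root-device-name /dev/sda1 -b /dev/sda1=snap-00000001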

I'm actually using the official Debian Xen kernels for both the
install process and the operational system, but the regular pv_ops
kernels (without extra Xen patches) work fine too. The only bug I
found so far already has a known workaround for old buggy hypervisors:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592428

That one is fixed in the official "squeeze" release.

>> It's probably easier for me to halt email delivery and clone the
>> working instance and try to reproduce from there. If I recall, the
>> (easily undone) workaround was to remount from "data=journal" to
>> "data=ordered" on a couple filesystems. It may take a day or two to
>> get this done, though.
>
> Couple of questions which might give me some clues:
> (a) was this a natively formatted ext4 file system, or an ext3 file
> system which was later converted to ext4?

All the filesystems were formatted like this using Debian e2fsprogs
as of 9 months ago:

mke2fs -t ext4 -E lazy_itable_init=1 -L db:mail /dev/mapper/db-mail
tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/mapper/db-mail

Ooooh.... could the lazy_itable_init have anything to do with it?
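
One way to double-check that these really are natively formatted ext4
filesystems (and that the journal_data default from tune2fs stuck) would
be something like:

tune2fs -l /dev/mapper/db-mail | grep -iE 'features|default mount'

The "Filesystem features" line should list "extent" for a native
mke2fs -t ext4, and the default mount options should include journal_data.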

> (b) How big are the files/directories involved? In particular,
> how big is the Postfix mail queue directory, and is it an
> extent-based directory? (what does lsattr on the mail queue
> directory report)

OK, there are a couple of relatively small filesystems:
/var/spool/postfix (20971520 sectors, 728K used right now)
/var/lib/postfix (262144 sectors, 26K used right now)
/var/mail (8380416 sectors, 340K used right now)

As far as I can tell, everything in each filesystem is using
extents (at least I assume that's what the 'e' flag from lsattr -R means):
-----------------e- .
-----------------e- ./corrupt
-----------------e- ./deferred
[...]

The "/var/spool/postfix" is the Postfix chroot as per the default
Debian configuration.

I should also mention that the EC2 hypervisor does not seem to
support barriers or flushes. PV-GRUB complains about that very
early during the boot process.

> (c) As far as file sizes, does it matter how big the e-mail
> messages are, and are there any other database files that
> postgress might be touching at the time that you get the OOPS?

I assume you mean "postfix" instead of "postgres" here. I'm not
entirely sure because I can't reproduce the OOPS anymore, but there
does not seem to be anything in the Postfix directories other than
the individual spooled-mail files (one per email), some libraries,
some PID files, some UNIX-domain sockets, and a couple of static
files in etc/, so I would assume not. I'm pretty sure that it was
/var/spool/postfix that was crashing.

The emails that were triggering the issue were between 4k and 120k,
with no more than 100-120 stuck emails in total.

The SSL session cache files are stored in /var/lib/postfix, which
as I said above is an entirely separate filesystem.


> I have found a bug in ext4 where we were underestimating how many
> journal credits were needed when modifying direct/indirect-mapped
> files (which would be seen on ext4 if you had an ext3 file system that
> was converted to start using extents; but old, pre-existing
> directories wouldn't be converted), which is why I'm asking the
> question about whether this was an ext2/ext3 file system which was
> converted to use ext4.

I'm not entirely clear if this applies to me or not, but I'm more
than happy to try patches.


> I have a patch to fix it, but backporting it into a kernel which will
> work with EC2 is not something I've done before. Can anyone point me
> at a web page that gives me the quick cheat sheet?

As I said above, I'm just using unmodified Debian "squeeze" kernels.
The EC2 stuff is basically a really old version of Xen, and latest
upstream kernels with paravirt_ops seem to work. Not sure how much
good it does since I can't seem to reproduce...

I've switched the relevant filesystems back to data=journal mode,
so if you want to send me a patch for 2.6.32 that I can apply to a
Debian kernel I will keep that kernel around and if I see it happen
again I'll check if the patch fixes it.


>> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.
>
> If we can reproduce the problem on that base image it would be really
> great! I have an Amazon AWS account; contact me when you have an
> image you want to share, if you want to share it just with my AWS
> account id, instead of sharing it publicly...

Well, the base image is essentially a somewhat basic Debian "squeeze"
for EC2 with our SSH public keys and a couple generic customizations
applied. It does not have Postfix installed or configured, so there
would be some work involved.

I also didn't see any problems with the system at all until the
queue got backed up with ~100-120 stuck emails. After Postfix tried
and failed to deliver a bunch of emails I would get the OOPS.

Cheers,
Kyle Moffett

2011-04-05 19:07:47

by Theodore Ts'o

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Tue, Apr 05, 2011 at 10:30:11AM -0500, Moffett, Kyle D wrote:
> > Couple of questions which might give me some clues:
> > (a) was this a natively formatted ext4 file system, or an ext3 file
> > system which was later converted to ext4?
>
> All the filesystems were formatted like this using Debian e2fsprogs
> as of 9 months ago:

Rats. OK, so the indirect block journal credit bug fix won't help
this bug.

> mke2fs -t ext4 -E lazy_itable_init=1 -L db:mail /dev/mapper/db-mail
> tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/mapper/db-mail
>
> Ooooh.... could the lazy_itable_init have anything to do with it?

Shouldn't be, since 2.6.32 doesn't have the lazy inode init support.
That support didn't show up until 2.6.37.

> I've switched the relevant filesystems back to data=journal mode,
> so if you want to send me a patch for 2.6.32 that I can apply to a
> Debian kernel I will keep that kernel around and if I see it happen
> again I'll check if the patch fixes it.

Given that this was a freshly created file system with mke2fs -t ext4,
I doubt the patch would help.

> Well, the base image is essentially a somewhat basic Debian "squeeze"
> for EC2 with our SSH public keys and a couple generic customizations
> applied. It does not have Postfix installed or configured, so there
> would be some work involved.

Well, if you can share that image in AWS with the ssh keys stripped
out it would save me a bunch of time. I assume it's not set up to
automatically set ssh keys and pass them back to AWS like the generic
images can?

> I also didn't see any problems with the system at all until the
> queue got backed up with ~100-120 stuck emails. After Postfix tried
> and failed to deliver a bunch of emails I would get the OOPS.

Yeah, what I'd probably try to do is install postfix and then send a
few hundred messages to [email protected] and see if I can repro the OOPS.
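
Something along these lines, presumably (a rough sketch; the recipient is
just a placeholder that should bounce or defer so the queue backs up, and
the message size is in the same ballpark as the 4k-120k mentioned earlier):

# assumes postfix plus bsd-mailx (or anything else providing mail(1))
for i in $(seq 1 200); do
    head -c 65536 /dev/urandom | base64 | \
        mail -s "queue filler $i" [email protected]
done
postqueue -p | tail -n 1    # rough check on how much is sitting in the queue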

Thanks for investigating!

- Ted

2011-04-05 19:44:33

by Moffett, Kyle D

Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Apr 05, 2011, at 15:07, Ted Ts'o wrote:
> On Tue, Apr 05, 2011 at 10:30:11AM -0500, Moffett, Kyle D wrote:
>> Well, the base image is essentially a somewhat basic Debian "squeeze"
>> for EC2 with our SSH public keys and a couple generic customizations
>> applied. It does not have Postfix installed or configured, so there
>> would be some work involved.
>
> Well, if you can share that image in AWS with the ssh keys stripped
> out it would save me a bunch of time. I assume it's not set up to
> automatically set ssh keys and pass them back to AWS like the generic
> images can?

Well, the generic images just download to ~root/.ssh/authorized_keys
from http://169.254.169.254/meta-data/public-keys/0/openssh-key

They don't really "generate" a keypair themselves; AWS does that and
provides the pubkey via that URL. The 169.254.169.254 is
just the link-local address for some virtualization infrastructure
software. These days they even let you upload your own pubkey.
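
In other words, roughly this at first boot (a sketch, not the exact init
script any particular image ships):

mkdir -p /root/.ssh && chmod 700 /root/.ssh
wget -q -O /root/.ssh/authorized_keys \
    http://169.254.169.254/meta-data/public-keys/0/openssh-key
chmod 600 /root/.ssh/authorized_keys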

Unfortunately our images don't do that download, although the patched
D-I image does do that when initially setting up the network console.

If you send me an SSH key and AWS account number in private email then
I will cook an updated image and share it out to you.
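
Sharing with a single account rather than making the AMI public should
just be a one-liner with the ec2 tools (the IDs here are made up):

ec2-modify-image-attribute ami-00000001 -l -a 123456789012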


>> I also didn't see any problems with the system at all until the
>> queue got backed up with ~100-120 stuck emails. After Postfix tried
>> and failed to deliver a bunch of emails I would get the OOPS.
>
> Yeah, what I'd probably try to do is install postfix and then send a
> few hundred messages to [email protected] and see if I can repro the OOPS.

Not sure if it's related, but the particular SMTP errors that were
causing things to back up in the first place came from the remote server
rejecting emails after a *successful* connection.


> Thanks for investigating!

Thanks for debugging!

Cheers,
Kyle Moffett