2009-03-16 19:43:30

by bugme-daemon

Subject: [Bug 12885] New: kernel BUG at fs/jbd/transaction.c:1376!

http://bugzilla.kernel.org/show_bug.cgi?id=12885

Summary: kernel BUG at fs/jbd/transaction.c:1376!
Product: File System
Version: 2.5
KernelVersion: 2.6.28
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext3
AssignedTo: [email protected]
ReportedBy: [email protected]


Latest working kernel version: 2.6.28
Earliest failing kernel version: 2.6.28
Distribution: Archlinux
Hardware Environment: x86_64
Software Environment:
Problem Description:
------------[ cut here ]------------
kernel BUG at fs/jbd/transaction.c:1376!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
CPU 1
Modules linked in: xt_tcpudp iptable_mangle xt_MARK ip_tables x_tables
cpufreq_ondemand ext3 jbd ext2 lrw joydev usbhid hid uhci_hcd snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ohci1394
i2c_i801 i2c_core ieee1394 ehci_hcd skge snd_hda_intel snd_pcm snd_timer
snd_page_alloc snd_hwdep snd soundcore pcspkr usbcore intel_agp iTCO_wdt
iTCO_vendor_support sg battery ac tun fuse sky2 evdev acpi_cpufreq freq_table
coretemp button fan thermal processor rtc_cmos rtc_core rtc_lib ext4 mbcache
jbd2 crc16 aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sd_mod sr_mod
cdrom pata_acpi ata_generic ahci ata_piix pata_marvell libata scsi_mod
Pid: 11065, comm: scp Not tainted 2.6.28-ARCH #1
RIP: 0010:[<ffffffffa039119e>] [<ffffffffa039119e>] journal_stop+0x1ee/0x200
[jbd]
RSP: 0018:ffff88009121bd48 EFLAGS: 00010282
RAX: ffff8800cf9c2520 RBX: 0000000000000000 RCX: 0000000000000034
RDX: ffff88020d93eb88 RSI: ffff88022f7fb5d0 RDI: ffff88022f7fb5d0
RBP: ffff880216f6e3c0 R08: 0400000000000000 R09: 0000000000000000
R10: ffff88022e848000 R11: ffffffff80321f00 R12: ffff88022f7fb5d0
R13: ffff88022ea8a000 R14: 00000000000081f8 R15: ffff88009121bdbc
FS: 00002abe4691f810(0000) GS:ffff88022f802900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000192c428 CR3: 00000000cfb9f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scp (pid: 11065, threadinfo ffff88009121a000, task ffff8800cf9c2520)
Stack:
ffffffff80488675 0000000000000000 ffff88022ea89800 ffffffffa03b77f6
ffff88020d93eb88 00000000000081f8 ffff88009121bdbc ffffffffa03b12dd
ffff88021a7c97d8 ffff88022f7fb5d0 0000000000000000 ffffffffa03abaef
Call Trace:
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffffa03b12dd>] ? __ext3_journal_stop+0x2d/0x60 [ext3]
[<ffffffffa03abaef>] ? ext3_create+0x3f/0x130 [ext3]
[<ffffffff802bc1a7>] ? vfs_create+0xf7/0x140
[<ffffffff802bf17c>] ? do_filp_open+0x85c/0x980
[<ffffffff802572a0>] ? autoremove_wake_function+0x0/0x30
[<ffffffff802b5b73>] ? vfs_stat_fd+0x23/0x60
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffff802c8b42>] ? alloc_fd+0x122/0x150
[<ffffffff802af640>] ? do_sys_open+0x80/0x110
[<ffffffff8020c50a>] ? system_call_fastpath+0x16/0x1b
Code: e8 58 3b ea df 41 8b 75 28 85 f6 0f 84 37 ff ff ff 49 8d 7d 78 31 c9 ba
01 00 00 00 be 03 00 00 00 e8 37 3b ea df e9 1d ff ff ff <0f> 0b eb fe 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
RIP [<ffffffffa039119e>] journal_stop+0x1ee/0x200 [jbd]
RSP <ffff88009121bd48>


Steps to reproduce:

Don't know.


--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


2009-03-17 09:31:48

by bugme-daemon

------- Comment #1 from [email protected] 2009-03-17 02:30 -------
Another one:
------------[ cut here ]------------
kernel BUG at fs/jbd/transaction.c:1376!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
CPU 0
Modules linked in: xt_tcpudp iptable_mangle xt_MARK ip_tables x_tables
cpufreq_ondemand ext3 jbd ext2 lrw joydev snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device usbhid hid snd_pcm_oss snd_mixer_oss ohci1394 uhci_hcd
snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep ieee1394 skge snd
pcspkr ehci_hcd soundcore i2c_i801 i2c_core usbcore iTCO_wdt intel_agp
iTCO_vendor_support sg battery ac tun fuse sky2 evdev acpi_cpufreq freq_table
coretemp button fan thermal processor rtc_cmos rtc_core rtc_lib ext4 mbcache
jbd2 crc16 aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sr_mod cdrom
sd_mod pata_acpi ata_generic ahci ata_piix pata_marvell libata scsi_mod
Pid: 6706, comm: unrar Not tainted 2.6.28-ARCH #1
RIP: 0010:[<ffffffffa039119e>] [<ffffffffa039119e>] journal_stop+0x1ee/0x200
[jbd]
RSP: 0000:ffff8800cfac5de8 EFLAGS: 00010286
RAX: ffff8801b33a2b50 RBX: 0000000000000000 RCX: 0000000000000034
RDX: ffff880160894770 RSI: ffff880229b61390 RDI: ffff880229b61390
RBP: ffff8801a01e0180 R08: 0400000000000000 R09: 0000000000000000
R10: ffff88022e758000 R11: ffffffff80321f00 R12: ffff880229b61390
R13: ffff88022554ec00 R14: 00000000000041fd R15: ffff880141756ea0
FS: 00002af651d5bb20(0000) GS:ffffffff8065e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f6c80ef9000 CR3: 00000000cfb7c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process unrar (pid: 6706, threadinfo ffff8800cfac4000, task ffff8801b33a2b50)
Stack:
ffff8800432cd310 0000000000000000 ffff88022554c800 ffffffffa03b77c8
ffff8800432cd310 00000000000041fd ffff880141756ea0 ffffffffa03b12dd
ffff8801608bab88 ffff880229b61390 ffff880160894770 ffffffffa03aba2e
Call Trace:
[<ffffffffa03b12dd>] ? __ext3_journal_stop+0x2d/0x60 [ext3]
[<ffffffffa03aba2e>] ? ext3_mkdir+0x24e/0x2d0 [ext3]
[<ffffffff802bbc67>] ? vfs_mkdir+0xe7/0x130
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffff802be27c>] ? sys_mkdirat+0x11c/0x130
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffff802c8b42>] ? alloc_fd+0x122/0x150
[<ffffffff8020c50a>] ? system_call_fastpath+0x16/0x1b
Code: e8 58 3b ea df 41 8b 75 28 85 f6 0f 84 37 ff ff ff 49 8d 7d 78 31 c9 ba
01 00 00 00 be 03 00 00 00 e8 37 3b ea df e9 1d ff ff ff <0f> 0b eb fe 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
RIP [<ffffffffa039119e>] journal_stop+0x1ee/0x200 [jbd]
RSP <ffff8800cfac5de8>
---[ end trace 0b2b3734ee2df4f1 ]---



2009-03-17 21:50:40

by bugme-daemon

------- Comment #2 from [email protected] 2009-03-17 14:49 -------
The BUG is caused by the following in journal_stop() in transaction.c:

J_ASSERT(journal_current_handle() == handle);

Somehow the journal handle got corrupted, or we screwed up the refcount of
open handles and allowed a journal transaction to commit even though
ext3_mkdir() and ext3_create() were in the middle of doing something with a
handle. This is one of those should-never-happen situations.

What was the system doing at the time of the crash? And we have to wonder why
you are seeing it but apparently not others. Is this a completely unmodified,
stock 2.6.28 kernel? Have there been any patches applied?
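
To make the assertion above concrete, here is a toy model in Python. It is only a sketch, not kernel code: the `Task` and `Handle` classes and the helper names are made up for illustration, and `journal_info` stands in for the per-task pointer that journal_current_handle() reads. Stopping a handle that is not the task's current handle trips the check, which is the user-space analogue of the BUG at transaction.c:1376.

```python
class Handle:
    """Stand-in for a journal handle; only the nesting count is modelled."""

    def __init__(self):
        self.ref = 1


class Task:
    """Stand-in for a kernel task; journal_info mimics current->journal_info."""

    def __init__(self):
        self.journal_info = None


def journal_start(task):
    # A nested start reuses the task's current handle and bumps its count.
    if task.journal_info is not None:
        task.journal_info.ref += 1
        return task.journal_info
    handle = Handle()
    task.journal_info = handle
    return handle


def journal_stop(task, handle):
    # Counterpart of J_ASSERT(journal_current_handle() == handle).
    assert task.journal_info is handle, "handle is not the task's current handle"
    handle.ref -= 1
    if handle.ref == 0:
        task.journal_info = None


def stop_with_mismatch():
    """Return True if stopping a stray handle trips the assertion."""
    task = Task()
    journal_start(task)
    stray = Handle()  # e.g. a stale or corrupted handle pointer
    try:
        journal_stop(task, stray)
    except AssertionError:
        return True
    return False
```

In the model, balanced start/stop pairs leave `journal_info` cleared again, while `stop_with_mismatch()` returns True.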



2009-03-18 14:47:49

by bugme-daemon

------- Comment #3 from [email protected] 2009-03-18 07:46 -------
Stock Archlinux kernel, which uses this patch set afaik:
ftp://ftp.archlinux.org/other/kernel26/patch-2.6.28.7-2-ARCH.bz2
Or the version before. The ext3 is on a luks encrypted hdd.

Afair I tried to scp to the fs and the scp just hung, even after the disk spun
up.
That was when I noticed.

Is there a way to recover?

Is there a way to differentiate between hardware problem and software bug?



2009-03-18 14:58:51

by bugme-daemon

------- Comment #4 from [email protected] 2009-03-18 07:57 -------
Oh, this looks pretty likely to be a software bug. It just seems to be one of
these "should never happen; why isn't anyone else seeing it?" things. Maybe it
has something to do with the luks encryption layer? I don't know.

If you could do some experiments to see how easily you can reproduce it, and
maybe some hints about how we could reproduce it on one of our machines, that
would be really appreciated. If you're willing to work with us on some
debugging patches, that would be really great as well.

Once you can reproduce it, then we can try to twiddle various variables, such
as whether it goes away if you remove the luks layer. We can also try
introducing some patches that might log some in-flight information, etc. So
trying to see if we can get an easy reproduction case is the first thing that
we really need to do. If you have the time to help us out, it will really
help us track it down. (If not, we'll understand, of course.)

Thanks in advance...



2009-03-18 14:59:45

by bugme-daemon

------- Comment #5 from [email protected] 2009-03-18 07:58 -------
Oh, once it happens, probably the only way to recover is to reboot.



2009-03-18 16:35:58

by bugme-daemon

------- Comment #6 from [email protected] 2009-03-18 09:34 -------
I'm quite willing to nail this one down.

Do you mean reproduction on another ext3 or on the same one?
On the same one it wouldn't be a problem; it happens, e.g., with a cp -rp when
trying to back that data up.
And while most fs on my system are ext4 now, I've used ext3 over luks for quite
some time now, without any problems.



2009-03-18 17:42:39

by bugme-daemon

------- Comment #7 from [email protected] 2009-03-18 10:41 -------
It's good to know that you can reproduce it easily on that particular ext3 file
system. One interesting question is whether you can reproduce it on some
other ext3 file system (since then we might be able to reproduce it on our
end); although I haven't played with LUKS at all, and I don't know how much
effort it would take for me to set it up on my system.

How are you backing up the data? tar? Amanda? Some other backup system?



2009-03-18 17:46:14

by bugme-daemon

------- Comment #8 from [email protected] 2009-03-18 10:45 -------
One interesting thing you could try, if you can go to 2.6.29-rc8, is you could
try mounting your ext3 filesystem using ext4, and see if the problem goes away
or not. If the problem sticks around, then that's actually very interesting,
since so much has changed between ext3 and ext4. If it goes away, that could
be a solution for you, but it also could be that it's just harder to reproduce
on ext4.

This smells like a bug in jbd layer, or perhaps some assumption which is
getting violated by LUKS; since if it was as simple as just doing a backup of a
tree while another process was copying data into it, ext3 is such a commonly
used file system that I would have thought such a problem would have been
discovered by now. So if we have something like this which can be easily
reproduced, I really want to try to get it chased down.

Thanks,



2009-03-18 17:52:18

by bugme-daemon

------- Comment #9 from [email protected] 2009-03-18 10:51 -------
Trying to mount ext4 over LUKS on 2.6.28 should work as well. (Sorry, the
reason why I said going to 2.6.29-rc was because I confused myself for a moment
and thought this was an ext2 problem, and 2.6.29-rc1 and later will allow you
to mount an ext2 filesystem using the ext4 filesystem code.) But simply a
quick try to see if it will reproduce with using ext4 to mount your ext3
filesystem over LUKS would also be worth a very quick test. Again, if the
problem still sticks around, that's the much more interesting data point; if
you can't reproduce it, that's good to know, but it doesn't help us chase down
the ext3/jbd bug....



2009-03-18 18:11:31

by bugme-daemon

------- Comment #10 from [email protected] 2009-03-18 11:10 -------
(In reply to comment #7)
> It's good to know that you can reproduce it easily on that particular ext3 file
> system. One interesting question is whether you can reproduce it on some
> other ext3 file system (since then we might be able to reproduce it on our
> end); although I haven't played with LUKS at all, and I don't know how much
> effort it would take for me to set it up on my system.

That should be rather easy, follow:
http://wiki.archlinux.org/index.php/LUKS

Basically:

$ cryptsetup -c aes-xts-plain -y -s 512 luksFormat /dev/sda3
$ cryptsetup luksOpen /dev/sda3 test
$ mkfs.ext3 /dev/mapper/test

> How are you backing up the data? tar? Amanda? Some other backup system?

I just tried to cp -rp the data away. I don't think anything was writing to the
fs at the time.



2009-03-18 18:15:25

by bugme-daemon

------- Comment #11 from [email protected] 2009-03-18 11:14 -------
(In reply to comment #8)
> One interesting thing you could try, if you can go to 2.6.29-rc8, is you could
> try mounting your ext3 filesystem using ext4, and see if the problem goes away
> or not. If the problem sticks around, then that's actually very interesting,
> since so much has changed between ext3 and ext4. If it goes away, that could
> be a solution for you, but it also could be that it's just harder to reproduce
> on ext4.

Might we not lose the bug that way, as in it'd be gone forever?
Can I do that read-only?

> This smells like a bug in jbd layer, or perhaps some assumption which is
> getting violated by LUKS; since if it was as simple as just doing a backup of a
> tree while another process was copying data into it, ext3 is such a commonly
> used file system that I would have thought such a problem would have been
> discovered by now. So if we have something like this which can be easily
> reproduced, I really want to try to get it chased down.

It's as easy as *reading* the fs, afaics. I'll test this some more.



2009-03-18 18:53:40

by bugme-daemon

------- Comment #12 from [email protected] 2009-03-18 11:50 -------
OK. The BUG above says there was an unrar that caused it. So that probably
happened before my cp -rp. Most of the files are readable after the bug
happens, but processes hang when accessing the problematic directory or file.
Is this to be expected after the bug happens but before a reboot?
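
The reasoning above rests on the "Pid: ..., comm: ..." line of the oops. A minimal Python sketch for pulling that line and the BUG location out of saved dmesg text follows; the function name and the regular expressions are illustrative assumptions written against the traces quoted in this thread, not a general oops parser.

```python
import re

# Patterns modelled on the oops text quoted in this thread.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
PID_RE = re.compile(r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+)")


def summarize_oops(text):
    """Return the BUG location and the faulting process found in `text`."""
    bug = BUG_RE.search(text)
    proc = PID_RE.search(text)
    return {
        "file": bug.group("file") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "pid": int(proc.group("pid")) if proc else None,
        "comm": proc.group("comm") if proc else None,
    }


sample = """kernel BUG at fs/jbd/transaction.c:1376!
Pid: 6706, comm: unrar Not tainted 2.6.28-ARCH #1"""
print(summarize_oops(sample))
```

Run against each saved trace, this makes it easy to see at a glance which process (scp, unrar, touch) hit the assertion.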



2009-03-18 19:11:43

by bugme-daemon

------- Comment #13 from [email protected] 2009-03-18 12:11 -------
Oh, sorry, I misunderstood what you were saying. All you need to do is "cp
-rp" the directory and you get the system hang? Uh, that's very interesting.
How big is the file system in question? I thought you were saying you were
doing a backup while some other process was doing a cp -rp *into* the
directory.

I agree, if that's all it takes, then it must be highly sensitive to the
filesystem state, and we want to be careful to preserve it. In fact, before
you do anything else, you might want to save the filesystem image using
e2image:

e2image -r /dev/sda1 - | bzip2 > sda1.e2i.bz2

Given that this is an encrypted filesystem, I can imagine that you probably
won't be willing to send this to me, even though this omits all of the data
blocks, and only keeps the metadata blocks (although this does include the
directory blocks and hence the file names).

However, you can take this raw image file, and dump it on a raw disk, and see
if you can replicate the problem on a disk partition. Something else you
could do is to send me a "scrambled" e2image:

e2image -rs /dev/sda1 - | bzip2 > sda1.e2i.bz2

This randomizes the directory names, although it means that I would have to
turn off the dir_index flags before I could try using it. Still, it might be
enough to replicate the problem on my end.

Final question --- have you tried running e2fsck -n on the filesystem; do you
know if the filesystem has been reported as self-consistent by e2fsck?

Thanks,



2009-03-18 19:14:25

by bugme-daemon

------- Comment #14 from [email protected] 2009-03-18 12:13 -------
In answer to your question in comment #12, yes, it's expected that processes
accessing the problematic directory might hang until a reboot, since when the
process died after the BUG, it probably left some locks locked, and so
processes would end up waiting forever for the locks to get unlocked (which
they won't since the process that held them died after the OOPS message).
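
The same effect can be shown in user space with a small Python sketch. This is an analogy only: a thread takes a lock and exits without releasing it, standing in for the task that was killed by the oops, and the function names are made up. A later caller can never acquire the lock; the timeout is there only so the example terminates, whereas the kernel waiters have no such timeout.

```python
import threading


def holder_dies_with_lock(lock):
    # Take the lock and return without releasing it, like a task that is
    # killed while holding a kernel lock.
    lock.acquire()


def try_access(lock, timeout=0.2):
    """Try to take the lock; return False if it is still held after `timeout`."""
    got = lock.acquire(timeout=timeout)
    if got:
        lock.release()
    return got


lock = threading.Lock()
holder = threading.Thread(target=holder_dies_with_lock, args=(lock,))
holder.start()
holder.join()  # the holder is gone, but the lock stays locked

print(try_access(lock))  # False: nobody is left to release the lock
```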

But if this was caused by the unrar, then this might be harder to replicate....

In any case, if you haven't rebooted yet, I would try rebooting, and then
running e2fsck on the filesystem to make sure it is consistent. Then the next
trick is to see what is needed to replicate the BUG/oops message.



2009-03-18 21:13:44

by bugme-daemon

------- Comment #15 from [email protected] 2009-03-18 14:12 -------
A dd of the hdd is about 370GB. Using this dd image and loop mounting it
I can reproduce a hang (no fsck), but this time nothing in dmesg.

I'm probably able to share it, but only to a limited group. It's got nothing
really confidential on it. I just encrypt all my disks so I can throw them away
without a second thought, especially if they break (or send them for repair).



2009-03-18 21:52:21

by bugme-daemon

------- Comment #16 from [email protected] 2009-03-18 14:50 -------
So after rebooting and manual fsck, a simple touch is triggering it,
or something similar:

$ fsck /dev/mapper/media
fsck 1.41.4 (27-Jan-2009)
e2fsck 1.41.4 (27-Jan-2009)
/dev/mapper/media: recovering journal
/dev/mapper/media has been mounted 157 times without being checked, check
forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/media: 21438/24420352 files (30.1% non-contiguous),
89299732/97677469 blocks


$ touch x
[1] 5794 segmentation fault touch x

dmesg:
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
------------[ cut here ]------------
kernel BUG at fs/jbd/transaction.c:1376!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
CPU 1
Modules linked in: ext3 jbd xt_tcpudp iptable_mangle xt_MARK ip_tables x_tables
cpufreq_ondemand ext2 lrw joydev usbhid hid uhci_hcd snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ohci1394
ieee1394 pcspkr skge ehci_hcd i2c_i801 i2c_core snd_hda_intel snd_pcm snd_timer
snd_page_alloc snd_hwdep snd soundcore usbcore intel_agp iTCO_wdt
iTCO_vendor_support sg battery ac tun fuse sky2 evdev acpi_cpufreq freq_table
coretemp button fan thermal processor rtc_cmos rtc_core rtc_lib ext4 mbcache
jbd2 crc16 aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sr_mod cdrom
sd_mod pata_acpi ata_generic ahci ata_piix pata_marvell libata scsi_mod
Pid: 5794, comm: touch Not tainted 2.6.28-ARCH #1
RIP: 0010:[<ffffffffa053019e>] [<ffffffffa053019e>] journal_stop+0x1ee/0x200
[jbd]
RSP: 0018:ffff88021acfbd48 EFLAGS: 00010282
RAX: ffff8802190e18c0 RBX: 0000000000000000 RCX: 0000000000000034
RDX: ffff8801b799fbf0 RSI: ffff88022f516168 RDI: ffff88022f516168
RBP: ffff8801ac5f6780 R08: 0400000000000000 R09: 0000000000000000
R10: ffff88020b544000 R11: ffffffff80321f00 R12: ffff88022f516168
R13: ffff88020b401800 R14: 00000000000081b4 R15: ffff88021acfbdbc
FS: 00002ab790209000(0000) GS:ffff88022f802900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000406da0 CR3: 0000000228bf2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process touch (pid: 5794, threadinfo ffff88021acfa000, task ffff8802190e18c0)
Stack:
ffffffff80488675 0000000000000000 ffff88020b401000 ffffffffa05567f6
ffff8801b799fbf0 00000000000081b4 ffff88021acfbdbc ffffffffa05502dd
ffff8801ce594e00 ffff88022f516168 0000000000000000 ffffffffa054aaef
Call Trace:
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffffa05502dd>] ? __ext3_journal_stop+0x2d/0x60 [ext3]
[<ffffffffa054aaef>] ? ext3_create+0x3f/0x130 [ext3]
[<ffffffffa054ae7b>] ? ext3_lookup+0xcb/0x100 [ext3]
[<ffffffff802bc1a7>] ? vfs_create+0xf7/0x140
[<ffffffff802bf17c>] ? do_filp_open+0x85c/0x980
[<ffffffff80488675>] ? _spin_unlock+0x5/0x30
[<ffffffff802c8b42>] ? alloc_fd+0x122/0x150
[<ffffffff802af640>] ? do_sys_open+0x80/0x110
[<ffffffff8020c50a>] ? system_call_fastpath+0x16/0x1b
Code: e8 58 4b d0 df 41 8b 75 28 85 f6 0f 84 37 ff ff ff 49 8d 7d 78 31 c9 ba
01 00 00 00 be 03 00 00 00 e8 37 4b d0 df e9 1d ff ff ff <0f> 0b eb fe 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
RIP [<ffffffffa053019e>] journal_stop+0x1ee/0x200 [jbd]
RSP <ffff88021acfbd48>
---[ end trace 778583b30cdd0759 ]---



2009-03-18 22:06:08

by bugme-daemon

------- Comment #17 from [email protected] 2009-03-18 15:04 -------
Mounting as ext4, touch works, mounting again as ext3 hits the bug again.



2009-03-19 09:38:44

by bugme-daemon

------- Comment #18 from [email protected] 2009-03-19 02:37 -------
Booting a new Archlinux kernel 2.6.28.8, I got this on booting:

Adding 1999992k swap on /a.swap. Priority:-1 extents:462 across:8115000k
JBD: barrier-based sync failed on dm-0:8 - disabling barriers
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
general protection fault: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/block/dm-2/size
CPU 0
Modules linked in: ext2 lrw joydev usbhid hid uhci_hcd snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ohci1394
ieee1394 ehci_hcd snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd
soundcore skge i2c_i801 i2c_core usbcore intel_agp pcspkr iTCO_wdt
iTCO_vendor_support sg battery ac tun fuse sky2 evdev acpi_cpufreq freq_table
coretemp button fan thermal processor rtc_cmos rtc_core rtc_lib ext4 mbcache
jbd2 crc16 aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sd_mod sr_mod
cdrom pata_acpi ata_generic ahci ata_piix pata_marvell libata scsi_mod
Pid: 4955, comm: hald Not tainted 2.6.28-ARCH #1
RIP: 0010:[<ffffffffa01620ab>] [<ffffffffa01620ab>]
acpi_processor_info_seq_show+0x10/0x69 [processor]
RSP: 0018:ffff88022e571e48 EFLAGS: 00010206
RAX: ffff88022ee1f760 RBX: ffff88022e513380 RCX: 2222222222222222
RDX: 0038004000000000 RSI: 0000000000000001 RDI: ffff88022e513380
RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: ffff88022e513380 R12: 0000000000000001
R13: ffff88022e571e98 R14: 0000000000000000 R15: 0000000000000400
FS: 00007fadef4276f0(0000) GS:ffffffff8065e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000435660 CR3: 000000022cbf1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hald (pid: 4955, threadinfo ffff88022e570000, task ffff88022e61dcd0)
Stack:
ffff88022e8fa900 ffff88022e571f50 0000000000000400 ffffffff802cd9c6
00000007fadef2c2 ffff88022e571f50 0000000000655860 ffff88022e8fa900
ffff88022e5133b0 0000000000000000 0000000000000000 ffff88022eb28480
Call Trace:
[<ffffffff802cd9c6>] ? seq_read+0xd6/0x360
[<ffffffff802cd8f0>] ? seq_read+0x0/0x360
[<ffffffff802fec51>] ? proc_reg_read+0x81/0xd0
[<ffffffff802b23f8>] ? vfs_read+0xc8/0x180
[<ffffffff802b25b3>] ? sys_read+0x53/0xa0
[<ffffffff8020c50a>] ? system_call_fastpath+0x16/0x1b
Code: c3 48 8b 47 e8 48 89 f1 48 c7 c6 9b 20 16 a0 48 89 cf 48 8b 50 60 e9 e5
bc 16 e0 48 83 ec 18 48 8b 57 70 49 89 fb 48 85 d2 74 52 <8a> 42 1c 49 c7 c0 ce
6a 16 a0 48 c7 c6 d1 6a 16 a0 4d 89 c2 4c
RIP [<ffffffffa01620ab>] acpi_processor_info_seq_show+0x10/0x69 [processor]
RSP <ffff88022e571e48>
---[ end trace 7a143c8d515d5620 ]---



2009-03-19 21:42:31

by bugme-daemon

------- Comment #19 from [email protected] 2009-03-19 14:41 -------
The problem was introduced between Archlinux kernel versions 2.6.28.7-1 and
2.6.28.7-2.
I'll upload a diff of the package. The patch used by Archlinux has not changed
between those revisions.
The only thing I could see that might be related to the bug I'm seeing is a
change in CONFIG_NLS_UTF8, which is now built as a module instead of into the
kernel.
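
Config changes like this one can be listed mechanically. The following Python sketch compares two kernel .config texts and reports every option whose value differs; the function names are illustrative, and the sample values in the usage are assumptions apart from CONFIG_NLS_UTF8 going from built-in (y) to module (m) as described above.

```python
def parse_config(text):
    """Map option names to values; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def changed_options(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical fragments of the two package configs.
old_cfg = "CONFIG_NLS_UTF8=y\nCONFIG_EXT3_FS=m\n"
new_cfg = "CONFIG_NLS_UTF8=m\nCONFIG_EXT3_FS=m\n"
print(changed_options(old_cfg, new_cfg))
```

A short list of changed options is a much smaller search space than the full package diff.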



2009-03-19 21:43:44

by bugme-daemon

------- Comment #20 from [email protected] 2009-03-19 14:43 -------
Created an attachment (id=20601)
--> (http://bugzilla.kernel.org/attachment.cgi?id=20601&action=view)
Archlinux kernel package diff

A diff between the kernel source packages that introduced the bug and the one
before.



2009-03-19 21:52:01

by bugme-daemon

------- Comment #21 from [email protected] 2009-03-19 14:51 -------
Can you replicate this on stock upstream 2.6.28.7, with the archlinux configs?

-Eric



2009-03-19 21:55:10

by bugme-daemon

------- Comment #22 from [email protected] 2009-03-19 14:54 -------
Maybe a related bug:
http://bugs.archlinux.org/task/13762?project=1



2009-03-20 07:25:07

by bugme-daemon

------- Comment #23 from [email protected] 2009-03-20 00:23 -------
Just commenting out the Archlinux patch results in ext3 not being able to load
anymore:
ext3: Unknown symbol __grab_cache_page



2009-03-20 13:28:33

by bugme-daemon

------- Comment #24 from [email protected] 2009-03-20 06:26 -------
Andreas, if you can send me the compressed e2image file (it should compress
well, since all of the data blocks have been taken out) via some kind of
private download URL, I'd really appreciate it. At this point that's probably
going to be the fastest way to track down what is happening.

The possibly related bug you pointed at in Comment #22 is a reiserfs bug, which
uses totally unrelated journalling machinery to ext4, so I very much doubt it
is related.



2009-03-21 07:12:47

by bugme-daemon

[email protected] changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |REJECTED
Resolution| |INVALID




------- Comment #25 from [email protected] 2009-03-21 00:12 -------
Turns out the grub root was wrong and so an old kernel got loaded,
leading to a mismatch between kernel and modules.

If you still consider this a bug, I'll submit the e2image.
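
For anyone who ends up here with a similar trace: the oopses in this thread report the same release string (2.6.28-ARCH) even after the kernel update, so comparing release strings alone would presumably not have revealed the mismatch. One possible check, sketched in Python below, is to read the version string embedded in the installed x86 kernel image (the boot protocol stores a pointer to it at offset 0x20E, relative to 0x200) and compare its build stamp with what the running kernel reports. The function names and the sample strings in the test data are assumptions for illustration.

```python
import struct


def image_version_string(image):
    """Return the version string stored in an x86 bzImage header, or None."""
    # The setup header carries the magic "HdrS" at offset 0x202.
    if len(image) < 0x210 or image[0x202:0x206] != b"HdrS":
        return None
    # kernel_version: 16-bit little-endian pointer, stored less 0x200.
    (pointer,) = struct.unpack_from("<H", image, 0x20E)
    if pointer == 0:
        return None
    start = pointer + 0x200
    end = image.find(b"\0", start)
    if end == -1:
        return None
    return image[start:end].decode("ascii", errors="replace")


def same_build(image_version, release, version):
    """True if the image string starts with `release` and ends with `version`.

    `release` and `version` are what the running kernel reports, e.g.
    platform.release() and platform.version() (uname -r and uname -v).
    """
    return (
        image_version is not None
        and image_version.startswith(release + " ")
        and image_version.endswith(version)
    )
```

A typical use would read the image that the current root's /boot holds (the exact file name depends on the distribution), pass its bytes to `image_version_string()`, and call `same_build()` with `platform.release()` and `platform.version()`. A False result suggests the bootloader loaded a different kernel than the one the installed modules were built for.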

