LinuxLists.cc - [next-20101038] Call trace in ext4

2010-10-28 10:05:05

Subject: [next-20101038] Call trace in ext4

Hi,

I built linux-next as of next-20101028 today in a non-BKL config
(kernel-config attached) on a Debian i386/sid host.

When I start my quassel IRC client I reproducibly get this call trace:

# tail -40 kern.log
Oct 28 11:42:54 tbox kernel: [ 32.872957] EXT3-fs (sdb5): warning:
maximal mount count reached, running e2fsck is recommended
Oct 28 11:42:54 tbox kernel: [ 32.873621] EXT3-fs (sdb5): using
internal journal
Oct 28 11:42:54 tbox kernel: [ 32.873635] EXT3-fs (sdb5): mounted
filesystem with ordered data mode
Oct 28 11:44:16 tbox kernel: [ 115.480401] ------------[ cut here ]------------
Oct 28 11:44:16 tbox kernel: [ 115.480598] kernel BUG at
/home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!
Oct 28 11:44:16 tbox kernel: [ 115.480979] invalid opcode: 0000 [#1] SMP
Oct 28 11:44:16 tbox kernel: [ 115.481155] last sysfs file:
/sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Oct 28 11:44:16 tbox kernel: [ 115.481610] Modules linked in: ext3
jbd sco aes_i586 rfcomm bnep aes_generic l2cap bluetooth acpi_cpufreq
mperf cpufreq_powersave cpufreq_userspace cpufreq_stats ppdev
cpufreq_conservative lp dm_crypt binfmt_misc arc4 snd_intel8x0 ecb
snd_intel8x0m radeon thinkpad_acpi snd_ac97_codec ath5k snd_seq_midi
ac97_bus pcmcia snd_pcm_oss joydev snd_rawmidi snd_seq_midi_event ath
mac80211 snd_mixer_oss ttm cfg80211 snd_seq snd_pcm yenta_socket
pcmcia_rsrc drm_kms_helper drm snd_seq_device snd_timer pcmcia_core
i2c_algo_bit nsc_ircc rfkill snd_page_alloc irda i2c_i801 shpchp snd
i2c_core tpm_tis rng_core pci_hotplug psmouse processor tpm soundcore
parport_pc video ac tpm_bios battery button led_class pcspkr evdev
parport power_supply serio_raw nvram output crc_ccitt fuse autofs4
ext4 mbcache jbd2 crc16 dm_mod usbhid hid usb_storage sg sd_mod sr_mod
crc_t10dif cdrom ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore
scsi_mod thermal e1000 thermal_sys floppy nls_base [last unloaded:
scsi
Oct 28 11:44:16 tbox kernel: _wait_scan]
Oct 28 11:44:16 tbox kernel: [ 115.484020]
Oct 28 11:44:16 tbox kernel: [ 115.484020] Pid: 237, comm:
jbd2/sda5-8 Not tainted 2.6.36-next-20101028.1-686 #1 2374SG6/2374SG6
Oct 28 11:44:16 tbox kernel: [ 115.484020] EIP: 0060:[<f86282fb>]
EFLAGS: 00010246 CPU: 0
Oct 28 11:44:16 tbox kernel: [ 115.484020] EIP is at
ext4_writepage+0x8d/0x1f1 [ext4]
Oct 28 11:44:16 tbox kernel: [ 115.484020] EAX: 40020029 EBX:
f76934a0 ECX: 05050030 EDX: 00000000
Oct 28 11:44:16 tbox kernel: [ 115.484020] ESI: 00005050 EDI:
00001000 EBP: f5f3c548 ESP: ef973dbc
Oct 28 11:44:16 tbox kernel: [ 115.484020] DS: 007b ES: 007b FS:
00d8 GS: 00e0 SS: 0068
Oct 28 11:44:16 tbox kernel: [ 115.484020] Process jbd2/sda5-8 (pid:
237, ti=ef972000 task=ef8c9080 task.ti=ef972000)
Oct 28 11:44:16 tbox kernel: [ 115.484020] Stack:
Oct 28 11:44:16 tbox kernel: [ 115.484020] 00000005 00000000
ef973e98 00000000 f5f3c600 ef973e98 00000000 00005050
Oct 28 11:44:16 tbox kernel: [ 115.484020] c108fe6e f76934a0
c1090ea4 00000001 f5f3c600 00000004 00000002 00000000
Oct 28 11:44:16 tbox kernel: [ 115.484020] 00000006 0000000e
ef9a6180 c108fe66 e74172ff 0000000e 00000000 f7693400
Oct 28 11:44:16 tbox kernel: [ 115.484020] Call Trace:
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c108fe6e>] ? __writepage+0x8/0x1f
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1090ea4>] ?
write_cache_pages+0x1cc/0x281
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c108fe66>] ? __writepage+0x0/0x1f
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1090f6f>] ?
generic_writepages+0x16/0x1d
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<f85ab8d9>] ?
journal_submit_data_buffers+0xf5/0x150 [jbd2]
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<f85abd1e>] ?
jbd2_journal_commit_transaction+0x2d1/0xda3 [jbd2]
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1025bdb>] ?
dequeue_task_fair+0x1b/0x57
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c10392df>] ?
lock_timer_base+0x19/0x34
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1039356>] ?
try_to_del_timer_sync+0x5c/0x63
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<f85afdf2>] ?
kjournald2+0x9e/0x1c7 [jbd2]
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1043ffe>] ?
autoremove_wake_function+0x0/0x29
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<f85afd54>] ?
kjournald2+0x0/0x1c7 [jbd2]
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1043cb2>] ? kthread+0x63/0x68
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c1043c4f>] ? kthread+0x0/0x68
Oct 28 11:44:16 tbox kernel: [ 115.484020] [<c10034fe>] ?
kernel_thread_helper+0x6/0x10
Oct 28 11:44:16 tbox kernel: [ 115.484020] Code: 0c 89 34 24 89 ce 0f
ac d6 0c 39 7c 24 04 75 05 39 34 24 74 07 bf 00 10 00 00 eb 08 89 cf
81 e7 ff 0f 00 00 8b 03 f6 c4 08 75 04 <0f> 0b eb fe c7 04 24 00 00 00
00 83 7b 0c 00 75 37 68 46 ac 62
Oct 28 11:44:16 tbox kernel: [ 115.484020] EIP: [<f86282fb>]
ext4_writepage+0x8d/0x1f1 [ext4] SS:ESP 0068:ef973dbc
Oct 28 11:44:16 tbox kernel: [ 115.564589] ---[ end trace 1b8c420fb1d1ae45 ]---

The code section looks like:
[ fs/ext4/inode.c ]
...
	/*
	 * If the page does not have buffers (for whatever reason),
	 * try to create them using __block_write_begin. If this
	 * fails, redirty the page and move on.
	 */
	if (!page_buffers(page)) {	<--- LINE #2721
		if (__block_write_begin(page, 0, len,
					noalloc_get_block_write)) {
		redirty_page:
			redirty_page_for_writepage(wbc, page);
			unlock_page(page);
			return 0;
		}
		commit_write = 1;
	}
...
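
A note for readers decoding the trace: the Code: bytes just before the
trapping <0f> 0b (ud2) are 8b 03 f6 c4 08 75 04. These disassemble to a
load through EBX into EAX, a test of bit 3 of AH (that is, bit 11 of EAX),
and a jump over the trap if that bit is set. EAX holds 40020029, in which
bit 11 is clear, so the trap fires. The sketch below replays that check in
plain C. It assumes that the loaded word is page->flags and that bit 11 is
PG_private, as in the 2.6.36-era page-flags.h; both are inferences from the
trace rather than something the report states, and the function names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: PG_private is bit 11 of page->flags, as in the 2.6.36-era
 * include/linux/page-flags.h. The bit number can differ between kernels.
 */
#define PG_PRIVATE_BIT 11u

/* Returns 1 if the (assumed) PG_private bit is set in a raw flags word. */
int flags_have_private(uint32_t flags)
{
	return (int)((flags >> PG_PRIVATE_BIT) & 1u);
}

/* Models "test $0x8, %ah": AH is bits 8..15 of EAX, so 0x08 is bit 11. */
int test_ah_0x08(uint32_t eax)
{
	uint8_t ah = (uint8_t)(eax >> 8);

	return (ah & 0x08u) != 0;
}

/* Prints the decoded state for one EAX value copied from an oops. */
void report(uint32_t eax)
{
	/* Both views of the same bit must agree. */
	assert(flags_have_private(eax) == test_ah_0x08(eax));
	printf("EAX=%08x PG_private=%d\n", (unsigned)eax,
	       flags_have_private(eax));
}
```

Calling report(0x40020029u) prints "EAX=40020029 PG_private=0", which is
what one would expect for a page with no buffers attached.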

I am not sure if this is a general ext4 problem or only in a non-BKL setup.

Hope this helps to track down the problem.

Kind Regards,
- Sedat -

P.S.:
I have also tried with the patches from [1] and [2]; v2 versions are
attached to make linux-next happy.

[1] http://lkml.org/lkml/2010/10/28/36
[2] http://lkml.org/lkml/2010/10/28/40

Attachments:

config-2.6.36-next-20101028.1-686 (114.05 kB)
GIT-PULL-ext4-update-for-2.6.37-v2.patch (463.00 B)
tip-origin-tree-build-failure-was-GIT-PULL-ext4-update-for-2.6.37-v2.patch (427.00 B)

2010-10-28 10:34:56

by Sedat Dilek

Subject: Re: [next-20101028] Call trace in ext4

[ Changed subject: s/next-20101038/next-20101028 ]

Not sure whether something went wrong when merging ext4 [1] into
linux-next (before the GIT pull request to LKML), or whether some patches
are missing from Ted's ext4/upstream-merge GIT branch [2]?

- Sedat -

P.S.: I forgot to add v2 of the 2nd patch I mentioned below, sorry.

[1] http://lkml.org/lkml/2010/10/27/480
[2] http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=shortlog;h=refs/heads/upstream-merge

Attachments:

tip-origin-tree-build-failure-was-GIT-PULL-ext4-update-for-2.6.37-v2.patch (427.00 B)

2010-10-28 15:08:34

by Sedat Dilek

Subject: Re: [next-20101028] Call trace in ext4

I tried to GIT-pull ext4/upstream-merge and linux-2.6/master into
linux-next (20101028); in both cases CONFLICTS occurred.
So let's see if this is fixed in tomorrow's linux-next patch.

For now, I switch to Linux 2.6.36-git11 and will report later.

- Sedat -

Attached files:
MERGE_CONFLICT_ext4-upstream-merge and MERGE_CONFLICT_linux-2.6

Attachments:

MERGE_CONFLICT_ext4-upstream-merge (20.84 kB)
MERGE_CONFLICT_linux-2.6 (2.31 kB)

2010-10-28 15:27:54

by Theodore Ts'o

Subject: Re: [next-20101028] Call trace in ext4

On Thu, Oct 28, 2010 at 05:08:32PM +0200, Sedat Dilek wrote:
> I tried to GIT-pull ext4/upstream-merge and linux-2.6/master into
> linux-next (20101028); in both cases CONFLICTS occurred.
> So let's see if this is fixed in tomorrow's linux-next patch.
>
> For now, I switch to Linux 2.6.36-git11 and will report later.

Yeah, sorry. What got dropped into linux-next was something that was
in the middle of being prepared for the push to Linus. Thanks to
the shortened merge window and things being crazy for me (between
$WORK stuff, Kernel Summit and Linux Plumbers Conference stuff, etc.)
some stuff didn't get fully finished in the ext4 tree until the very
last minute.

There were indeed a largish number of conflicts this time around,
mainly because of parallel development between the bio layer and ext4.
I suspect Stephen had not resolved those conflicts in the same way as
I had, and thus you had problems trying to pull in ext4/upstream-merge
into linux-next.

Linux 2.6.36-git11 has the full set of ext4 commits which I intended
to push to Linus before 37-rc1, and so linux-next should be free of
conflicts for tomorrow night. Let me know if you have any problems
with -git11.

Regards,

- Ted

2010-10-28 17:52:21

by Markus Trippelsdorf

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 12:05:05PM +0200, Sedat Dilek wrote:
> Hi,
>
> I have built today linux-next as of next-20101028 in a non-BKL config
> (kernel-config attached) on a Debian i386/sid host.
>
> When I start my quassel IRC-client I get reproducibly this call-trace:
>
> # tail -40 kern.log
> Oct 28 11:42:54 tbox kernel: [ 32.872957] EXT3-fs (sdb5): warning:
> maximal mount count reached, running e2fsck is recommended
> Oct 28 11:42:54 tbox kernel: [ 32.873621] EXT3-fs (sdb5): using
> internal journal
> Oct 28 11:42:54 tbox kernel: [ 32.873635] EXT3-fs (sdb5): mounted
> filesystem with ordered data mode
> Oct 28 11:44:16 tbox kernel: [ 115.480401] ------------[ cut here ]------------
> Oct 28 11:44:16 tbox kernel: [ 115.480598] kernel BUG at
> /home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!

The same BUG (inode.c:2721) happened here today running latest vanilla
git. There is nothing in my logs unfortunately, but I shot a photo of
the trace (see attachment).

--
Markus

Attachments:

(No filename) (1.01 kB)
ext4bug_.jpg (205.34 kB)

2010-10-28 18:01:22

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
>
> The same BUG (inode.c:2721) happened here today running latest vanilla
> git. There is nothing in my logs unfortunately, but I shot a photo of
> the trace (see attachment).

I see, it's the page_buffers() call which is triggering. Looking into
it...

- Ted

2010-10-28 18:17:06

by Eric Sandeen

Subject: Re: [next-20101038] Call trace in ext4

Ted Ts'o wrote:
> On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
>> The same BUG (inode.c:2721) happened here today running latest vanilla
>> git. There is nothing in my logs unfortunately, but I shot a photo of
>> the trace (see attachment).
>
> I see, it's the page_buffers() call which is triggering. Looking into
> it...
>
> - Ted

Same bug that Avinash Kurup reported, right?

-Eric

2010-10-28 18:20:34

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 7:52 PM, Markus Trippelsdorf
<[email protected]> wrote:
> On Thu, Oct 28, 2010 at 12:05:05PM +0200, Sedat Dilek wrote:
>> Hi,
>>
>> I have built today linux-next as of next-20101028 in a non-BKL config
>> (kernel-config attached) on a Debian i386/sid host.
>>
>> When I start my quassel IRC-client I get reproducibly this call-trace:
>>
>> # tail -40 kern.log
>> Oct 28 11:42:54 tbox kernel: [ 32.872957] EXT3-fs (sdb5): warning:
>> maximal mount count reached, running e2fsck is recommended
>> Oct 28 11:42:54 tbox kernel: [ 32.873621] EXT3-fs (sdb5): using
>> internal journal
>> Oct 28 11:42:54 tbox kernel: [ 32.873635] EXT3-fs (sdb5): mounted
>> filesystem with ordered data mode
>> Oct 28 11:44:16 tbox kernel: [ 115.480401] ------------[ cut here ]------------
>> Oct 28 11:44:16 tbox kernel: [ 115.480598] kernel BUG at
>> /home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!
>
> The same BUG (inode.c:2721) happened here today running latest vanilla
> git. There is nothing in my logs unfortunately, but I shot a photo of
> the trace (see attachment).
>
> --
> Markus
>

Looks like the two patches from ext/for_linus GIT-branch stabilize
the system here (2.6.36-git11).

root@tbox:~# uptime
20:19:42 up 19 min, 4 users, load average: 1.27, 0.89, 0.58

- Sedat -

Attachments:

0001-ext4-Fix-build-when-CONFIG_EXT4_FS_XATTR.patch (1.28 kB)
0002-fs-build-fix-when-CONFIG_BLOCK.patch (1.37 kB)

2010-10-28 18:34:21

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 8:20 PM, Sedat Dilek <[email protected]> wrote:
> On Thu, Oct 28, 2010 at 7:52 PM, Markus Trippelsdorf
> <[email protected]> wrote:
>> On Thu, Oct 28, 2010 at 12:05:05PM +0200, Sedat Dilek wrote:
>>> Hi,
>>>
>>> I have built today linux-next as of next-20101028 in a non-BKL config
>>> (kernel-config attached) on a Debian i386/sid host.
>>>
>>> When I start my quassel IRC-client I get reproducibly this call-trace:
>>>
>>> # tail -40 kern.log
>>> Oct 28 11:42:54 tbox kernel: [ 32.872957] EXT3-fs (sdb5): warning:
>>> maximal mount count reached, running e2fsck is recommended
>>> Oct 28 11:42:54 tbox kernel: [ 32.873621] EXT3-fs (sdb5): using
>>> internal journal
>>> Oct 28 11:42:54 tbox kernel: [ 32.873635] EXT3-fs (sdb5): mounted
>>> filesystem with ordered data mode
>>> Oct 28 11:44:16 tbox kernel: [ 115.480401] ------------[ cut here ]------------
>>> Oct 28 11:44:16 tbox kernel: [ 115.480598] kernel BUG at
>>> /home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!
>>
>> The same BUG (inode.c:2721) happened here today running latest vanilla
>> git. There is nothing in my logs unfortunately, but I shot a photo of
>> the trace (see attachment).
>>
>> --
>> Markus
>>
>
> Looks like the two patches from ext/for_linus GIT-branch stabilize
> the system here (2.6.36-git11).
>
> root@tbox:~# uptime
> 20:19:42 up 19 min, 4 users, load average: 1.27, 0.89, 0.58
>
> - Sedat -
>

One minute after I sent the mail, I ran into the error.
Now I could run my IRC-client for a while... until it crashed.

- Sedat -

2010-10-28 18:55:36

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 08:34:19PM +0200, Sedat Dilek wrote:
>
> One minute after I sent the mail, I ran into the error.
> Now I could run my IRC-client for a while... until it crashed.

Yeah, the patches in ext4/next are just things to fix compiling without
CONFIG_BLOCK or CONFIG_EXT4_FS_XATTR. They don't make any code
changes at all.

- Ted

2010-10-28 19:32:17

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 02:01:18PM -0400, Ted Ts'o wrote:
> On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
> >
> > The same BUG (inode.c:2721) happened here today running latest vanilla
> > git. There is nothing in my logs unfortunately, but I shot a photo of
> > the trace (see attachment).
>
> I see, it's the page_buffers() call which is triggering. Looking into
> it...

Can folks let me know if this fixes the problem?

In this case I haven't been able to replicate the problem, but I've
eyeballed the problem and I'm about 90% certain this should fix
things. But I don't want to push this to Linus until I get
confirmation from you all that it fixes things. That's just one of
the ways in which your testing is critically important for ext4, so
thanks again for your help in the past, present, and future.
Thanks!!

- Ted

commit 51279fcb9720aa856ad81673886ca2349a373dac
Author: Theodore Ts'o <[email protected]>
Date: Thu Oct 28 15:15:21 2010 -0400

ext4: BUG_ON fix: check if page has buffers before calling page_buffers()

We need to check whether a page has buffers by calling
page_has_buffers(page) before calling page_buffers(page) in
ext4_writepage(). Otherwise page_buffers() could throw a BUG_ON.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2d6c6c8..1916164 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2718,7 +2718,7 @@ static int ext4_writepage(struct page *page,
 	 * try to create them using __block_write_begin. If this
 	 * fails, redirty the page and move on.
 	 */
-	if (!page_buffers(page)) {
+	if (!page_has_buffers(page)) {
 		if (__block_write_begin(page, 0, len,
 					noalloc_get_block_write)) {
 		redirty_page:
@@ -2732,12 +2732,10 @@ static int ext4_writepage(struct page *page,
 	if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
 			      ext4_bh_delay_or_unwritten)) {
 		/*
-		 * We don't want to do block allocation So redirty the
-		 * page and return We may reach here when we do a
-		 * journal commit via
-		 * journal_submit_inode_data_buffers. If we don't
-		 * have mapping block we just ignore them. We can also
-		 * reach here via shrink_page_list
+		 * We don't want to do block allocation, so redirty
+		 * the page and return. We may reach here when we do
+		 * a journal commit via journal_submit_inode_data_buffers.
+		 * We can also reach here via shrink_page_list
 		 */
 		goto redirty_page;
 	}
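
The difference between the two helpers can be modelled in userspace. In the
kernel, page_buffers() asserts via BUG_ON that the page has PagePrivate set
before returning the buffer list, whereas page_has_buffers() only tests the
flag. The sketch below is a simplified stand-in, not kernel code: struct
page_model, the model_* functions, and the flag value are hypothetical names
chosen for illustration, and assert() plays the role of BUG_ON.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct buffer_head {
	int dummy;
};

struct page_model {
	unsigned long flags;
	struct buffer_head *private_data;
};

/* Assumed flag bit for the model; the real value is kernel-specific. */
#define MODEL_PG_PRIVATE (1UL << 11)

/* Safe query: only tests the flag, never traps. */
int model_page_has_buffers(const struct page_model *p)
{
	return (p->flags & MODEL_PG_PRIVATE) != 0;
}

/* Accessor: traps if the page has no buffers, like BUG_ON in the kernel. */
struct buffer_head *model_page_buffers(const struct page_model *p)
{
	assert(model_page_has_buffers(p));
	return p->private_data;
}

/*
 * The corrected pattern from the patch: test with the query first, attach
 * buffers if they are missing, and only then call the accessor.
 */
struct buffer_head *model_get_or_create_buffers(struct page_model *p,
						struct buffer_head *fresh)
{
	if (!model_page_has_buffers(p)) {
		p->private_data = fresh;
		p->flags |= MODEL_PG_PRIVATE;
	}
	return model_page_buffers(p);
}
```

Using the accessor as the condition, as the buggy line did, would hit the
assert on exactly the pages the branch was meant to handle.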

2010-10-28 19:36:23

by Eric Sandeen

Subject: Re: [next-20101038] Call trace in ext4

Ted Ts'o wrote:
> On Thu, Oct 28, 2010 at 02:01:18PM -0400, Ted Ts'o wrote:
>> On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
>>> The same BUG (inode.c:2721) happened here today running latest vanilla
>>> git. There is nothing in my logs unfortunately, but I shot a photo of
>>> the trace (see attachment).
>> I see, it's the page_buffers() call which is triggering. Looking into
>> it...
>
> Can folks let me know if this fixes the problem?

Ted, any idea what caused the change in behavior here?

-Eric


2010-10-28 19:54:56

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 02:36:08PM -0500, Eric Sandeen wrote:
> Ted Ts'o wrote:
> > On Thu, Oct 28, 2010 at 02:01:18PM -0400, Ted Ts'o wrote:
> >> On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
> >>> The same BUG (inode.c:2721) happened here today running latest vanilla
> >>> git. There is nothing in my logs unfortunately, but I shot a photo of
> >>> the trace (see attachment).
> >> I see, it's the page_buffers() call which is triggering. Looking into
> >> it...
> >
> > Can folks let me know if this fixes the problem?
>
> Ted, any idea what caused the change in behavior here?

The bug was caused by commit a42afc5f56: ext4: simplify ext4_writepage()

I somehow managed to use page_buffers(page) instead of
page_has_buffers(page) when cleaning up ext4_writepage(). It's not
something I can trigger in xfstests, and so on my todo list is to
create a test case that can trigger this issue.

The immediate trigger was journal_submit_inode_data_buffers() getting
called in data=ordered mode, which ends up calling
generic_writepages() which iterates over all of the dirty pages in the
inode and calls ext4_writepage() on them. If we're under enough
memory pressure that the buffer heads get stripped from the page
before the journal commit happens (by default on a 5 second interval),
then we'll end up calling page_buffers() on a page with the buffer
heads stripped. Because I had somehow changed page_has_buffers() to
page_buffers(), that trips the BUG_ON inside page_buffers().

My standard test setup runs xfstests using 768k of memory on a
dual-CPU system, and apparently fsstress wasn't enough to trigger the
case where the bh's get stripped from the page, even with a relatively
small memory configuration. Which is surprising to me, but one good
thing about this bug is that it has pointed out a gap in my testing
strategy.

To address this, we need to either (a) create tests that generate
enough memory pressure so this happens, or (b) we need to have some
hooks (maybe some magic ioctl's) that emulate this by forcibly
detaching bh's from some random number of pages.

- Ted

2010-10-28 19:54:23

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 9:32 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Oct 28, 2010 at 02:01:18PM -0400, Ted Ts'o wrote:
>> On Thu, Oct 28, 2010 at 07:52:21PM +0200, Markus Trippelsdorf wrote:
>> >
>> > The same BUG (inode.c:2721) happened here today running latest vanilla
>> > git. There is nothing in my logs unfortunately, but I shot a photo of
>> > the trace (see attachment).
>>
>> I see, it's the page_buffers() call which is triggering. Looking into
>> it...
>
> Can folks let me know if this fixes the problem?
>
> In this case I haven't been able to replicate the problem, but I've
> eyeballed the problem and I'm about 90% certain this should fix
> things. But I don't want to push this to Linus until I get
> confirmation from you all that it fixes things. That's just one of
> the ways in which your testing is critically important for ext4, so
> thanks again for your help in the past, present, and future.
> Thanks!!
>
> - Ted
>
> commit 51279fcb9720aa856ad81673886ca2349a373dac
> Author: Theodore Ts'o <[email protected]>
> Date: Thu Oct 28 15:15:21 2010 -0400
>
> ext4: BUG_ON fix: check if page has buffers before calling page_buffers()
>
> We need to make check if a page does not have buffes by checking
> page_has_buffers(page) before calling page_buffers(page) in
> ext4_writepage(). Otherwise page_buffers() could throw a BUG_ON.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2d6c6c8..1916164 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2718,7 +2718,7 @@ static int ext4_writepage(struct page *page,
> * try to create them using __block_write_begin. If this
> * fails, redirty the page and move on.
> */
> - if (!page_buffers(page)) {
> + if (!page_has_buffers(page)) {
> if (__block_write_begin(page, 0, len,
> noalloc_get_block_write)) {
> redirty_page:
> @@ -2732,12 +2732,10 @@ static int ext4_writepage(struct page *page,
> if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
> ext4_bh_delay_or_unwritten)) {
> /*
> - * We don't want to do block allocation So redirty the
> - * page and return We may reach here when we do a
> - * journal commit via
> - * journal_submit_inode_data_buffers. If we don't
> - * have mapping block we just ignore them. We can also
> - * reach here via shrink_page_list
> + * We don't want to do block allocation, so redirty
> + * the page and return. We may reach here when we do
> + * a journal commit via journal_submit_inode_data_buffers.
> + * We can also reach here via shrink_page_list
> */
> goto redirty_page;
> }
>

Hm, unfortunately NO (see logs).

I have compiled via M=fs/ext4 in an already compiled build-tree with
these 3 patches.

sd@tbox:~/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none$
cat .pc/applied-patches
0001-ext4-Fix-build-when-CONFIG_EXT4_FS_XATTR.patch
0002-fs-build-fix-when-CONFIG_BLOCK.patch
ext4-BUG_ON-fix-check-if-page-has-buffers-before-calling-page_buffers.patch

- Sedat -

Attachments:

dmesg.txt (55.70 kB)
debug (85.39 kB)
kern.log (90.93 kB)
messages (93.89 kB)
syslog (90.55 kB)

2010-10-28 20:05:52

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 09:54:23PM +0200, Sedat Dilek wrote:
>
> Hm, unfortunately NO (see logs).
>
> I have compiled via M=fs/ext4 in an already compiled build-tree with
> these 3 patches.

Ok, stupid question. You did make sure the new ext4 module was
loaded, right?

> [ 100.884524] ------------[ cut here ]------------
> [ 100.884718] kernel BUG at /home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!

OK, so after the patch, line 2721 changed from page_buffers() to:

if (!page_has_buffers(page))

page_has_buffers() expands to:

#define page_has_buffers(page) PagePrivate(page)

which expands to a test_bit() call to see if PG_private is set in
page->flags. There is no BUG_ON anywhere there as far as I can tell.

Line 2721 in the older kernel was page_buffers() which does have a
BUG_ON check.

- Ted

2010-10-28 20:15:00

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 10:05 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Oct 28, 2010 at 09:54:23PM +0200, Sedat Dilek wrote:
>>
>> Hm, unfortunately NO (see logs).
>>
>> I have compiled via M=fs/ext4 in an already compiled build-tree with
>> these 3 patches.
>
> Ok, stupid question. You did make sure the new ext4 module was
> loaded, right?
>
>> [ 100.884524] ------------[ cut here ]------------
>> [ 100.884718] kernel BUG at /home/sd/src/linux-2.6/linux-2.6.36/debian/build/source_i386_none/fs/ext4/inode.c:2721!
>
> OK, so after the patch, line 2721: changed from page_buffers() to:
>
> if (!page_has_buffers(page))
>
> page_has_buffers() expands to:
>
> #define page_has_buffers(page) PagePrivate(page)
>
> which expands to a test_bit() call to see if PG_private is set in
> page->flags. There is no BUG_ON anywhere there as far as I can tell.
>
> Line 2721 in the older kernel was page_buffers() which does have a
> BUG_ON check.
>
> - Ted
>

I created a new ext4.ko via "make M=fs/ext4" in the build-dir and
copied the kernel module to
/lib/modules/$(uname -r)/kernel/fs/ext4/. Is that not enough?
If not, I have to recompile a new kernel.

- Sedat -

2010-10-28 20:37:14

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 10:15:00PM +0200, Sedat Dilek wrote:
>
> I created a new ext4.ko via "make M=fs/ext4" in the build-dir and
> copied the kernel module to
> /lib/modules/$(uname -r)/kernel/fs/ext4/. Is that not enough?

It might not be. Some distributions include modules in the initial
ramdisk and load the module from the initrd, so simply dropping the
module in /lib/modules/<kver>/... might not be enough. Recreating
the initrd and then rebooting may be needed.

And if you drop it there on a running kernel, just copying a module
into /lib/modules/... *definitely* isn't enough. Unless you unmount all
of your ext4 file systems, unload the module, and then reload it,
you'll still have the old module.

- Ted

2010-10-28 21:02:15

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 10:37 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Oct 28, 2010 at 10:15:00PM +0200, Sedat Dilek wrote:
>>
>> I created a new ext4.ko via "make M=fs/ext4" in the build-dir and
>> copied the kernel module to
>> /lib/modules/$(uname -r)/kernel/fs/ext4/. Is that not enough?
>
> It might not be. Some distributions include modules in the initial
> ramdisk, and load the module from the initrd, simply dropping the
> module in /lib/modules/<kver>/... might not be enough. So recreating
> the initrd and then rebooting might be enough.
>
> Certainly if you drop it there on a running kernel, if you don't
> unload the module (before unmounting all of your ext4 file systems),
> and then reload the module, *definitely* just copying a module into
> /lib/modules.... without making sure the module is reloaded, you'll
> still have the old module.
>
> - Ted
>

Grrr, I did not think of the ext4 kernel module being in initrd.img.

OK, I have recreated a new one:

$ update-initramfs -k 2.6.36-git11.sd.1-686 -c

I could copy a complete linux-2.6 GIT tree within my $HOME/src, looks good.

Any other test-case you can suggest to be sure things work as expected, now?

BTW, feel free to add:

Reported-and-tested-by: Sedat Dilek <[email protected]>

- Sedat -

2010-10-28 21:06:47

by Theodore Ts'o

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 11:02:14PM +0200, Sedat Dilek wrote:
>
> I could copy a complete linux-2.6 GIT tree within my $HOME/src, looks good.

Excellent; and attempts to do this before resulted in the BUG_ON
tripping? If so, that's certainly a good sign!

>
> Any other test-case you can suggest to be sure things work as expected, now?

Well, given that I haven't been able to replicate it myself, no, not
really.

I am curious about your setup. How much memory do you have on your
system, and how much free memory did you have before running the copy?
(i.e., what does the "free" command display)?

> BTW, feel free to add:
>
> Reported-and-tested-by: Sedat Dilek <[email protected]>

I'll certainly give you credit (as well as to the other people who
reported the problem).

Thanks again!!

- Ted

2010-10-28 21:17:19

by Sedat Dilek

Subject: Re: [next-20101038] Call trace in ext4

On Thu, Oct 28, 2010 at 11:06 PM, Ted Ts'o <[email protected]> wrote:
> On Thu, Oct 28, 2010 at 11:02:14PM +0200, Sedat Dilek wrote:
>>
>> I could copy a complete linux-2.6 GIT tree within my $HOME/src, looks good.
>
> Excellent; and attempts to do this before resulted on the BUG_ON
> tripping? If so, that's certainly a good sign!
>
>>
>> Any other test-case you can suggest to be sure things work as expected, now?
>
> Well, given that I haven't been able to replicate myself, no, not
> really.
>
> I am curious about your setup. How much memory do you have on your
> system, and how much free memory did you have before running the copy?
> (i.e, what does the "free" command display)?
>

This is an IBM T40p with Pentium-M (Banias) 1.7GHz and 1GByte RAM, 60GByte HDD.

Before running the copy command, I have 2 KDE konsoles, 1x Quassel
IRC-client and 1x Firefox with 5 tabs open.

# free
             total       used       free     shared    buffers     cached
Mem:       1033796    1005812      27984          0      61400     617252
-/+ buffers/cache:     327160     706636
Swap:      1052244          0    1052244

While copying (a list from invoking the free command several times is attached)...

# free
             total       used       free     shared    buffers     cached
Mem:       1033796    1017332      16464          0      66144     630488
-/+ buffers/cache:     320700     713096
Swap:      1052244          0    1052244

>> BTW, feel free to add:
>>
>> Reported-and-tested-by: Sedat Dilek <[email protected]>
>
> I'll certainly give you credit (as well as to the other people who
> reported the problem).
>

Of course. Linus wrote in one 2.6.36-rcX announcement that maintainers
had forgotten about that, and I saw people on LKML adding a
"Feel free to add...", so I follow the etiquette.

> Thanks again!!
>
> - Ted
>

I have to thank you for the fast replies and the patch.

- Sedat -

Attachments:

free.txt (8.63 kB)