LinuxLists.cc - Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

2013-04-03 08:52:18

by Dmitry Monakhov

[permalink] [raw]

Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

Attachments:

(No filename) (342.00 B)
ext4_zeroout_cpu2disk_fix.patch (504.00 B)
(No filename) (2.95 kB)
Download all attachments

2013-04-03 09:54:00

by Dmitry Monakhov

[permalink] [raw]

Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

Attachments:

(No filename) (653.00 B)
sparse.log (9.02 kB)
(No filename) (3.71 kB)
Download all attachments

2013-04-03 10:05:53

by Zheng Liu

[permalink] [raw]

Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

On Wed, Apr 03, 2013 at 01:53:49PM +0400, Dmitry Monakhov wrote:
> On Wed, 03 Apr 2013 12:52:06 +0400, Dmitry Monakhov <[email protected]> wrote:
> Non-text part: multipart/mixed
> > On Tue, 2 Apr 2013 16:22:41 -0700 (PDT), Christian Kujau <[email protected]> wrote:
> > > On Wed, 3 Apr 2013 at 02:05, Dmitry Monakhov wrote:
> > > > Please drop that patch and collect logs with a kernel which
> > > > has only 0001-enable-ES_AGGRESSIVE_TEST-V2.patch patch applied
> > Ok I have found at least one issue.
> Yeah.. My college advise me to use sparse in order to spot all
> cpu_to_ondisk format conversion
> make C=2 CF="-D__CHECK_ENDIAN__" fs/ext4/
> And it spotted a huge amount of issues. Which tell us that we are deeply
> in shit.

Yes, My college also suggest me that we should use sparse to check this
problem. I think the following patch could fix this bug.

Regards,
- Zheng

Subject: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

From: Zheng Liu <[email protected]>

When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out. So
fix it.

Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/extents.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e4a6844..2352467 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2999,20 +2999,23 @@ static int ext4_split_extent_at(handle_t *handle,
if (split_flag & EXT4_EXT_DATA_VALID1) {
err = ext4_ext_zeroout(inode, ex2);
zero_ex.ee_block = ex2->ee_block;
- zero_ex.ee_len = ext4_ext_get_actual_len(ex2);
+ zero_ex.ee_len = cpu_to_le16(
+ ext4_ext_get_actual_len(ex2));
ext4_ext_store_pblock(&zero_ex,
ext4_ext_pblock(ex2));
} else {
err = ext4_ext_zeroout(inode, ex);
zero_ex.ee_block = ex->ee_block;
- zero_ex.ee_len = ext4_ext_get_actual_len(ex);
+ zero_ex.ee_len = cpu_to_le16(
+ ext4_ext_get_actual_len(ex));
ext4_ext_store_pblock(&zero_ex,
ext4_ext_pblock(ex));
}
} else {
err = ext4_ext_zeroout(inode, &orig_ex);
zero_ex.ee_block = orig_ex.ee_block;
- zero_ex.ee_len = ext4_ext_get_actual_len(&orig_ex);
+ zero_ex.ee_len = cpu_to_le16(
+ ext4_ext_get_actual_len(&orig_ex));
ext4_ext_store_pblock(&zero_ex,
ext4_ext_pblock(&orig_ex));
}
@@ -3272,7 +3275,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
if (err)
goto out;
zero_ex.ee_block = ex->ee_block;
- zero_ex.ee_len = ext4_ext_get_actual_len(ex);
+ zero_ex.ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex));
ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));

err = ext4_ext_get_access(handle, inode, path + depth);
--
1.7.12.rc2.18.g61b472e

2013-04-03 11:02:59

by Dmitry Monakhov

[permalink] [raw]

Subject: Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

On Wed, 03 Apr 2013 13:53:49 +0400, Dmitry Monakhov <[email protected]> wrote:
Non-text part: multipart/mixed
> On Wed, 03 Apr 2013 12:52:06 +0400, Dmitry Monakhov <[email protected]> wrote:
> Non-text part: multipart/mixed
> > On Tue, 2 Apr 2013 16:22:41 -0700 (PDT), Christian Kujau <[email protected]> wrote:
> > > On Wed, 3 Apr 2013 at 02:05, Dmitry Monakhov wrote:
> > > > Please drop that patch and collect logs with a kernel which
> > > > has only 0001-enable-ES_AGGRESSIVE_TEST-V2.patch patch applied
Good news big endian cpu owners
Please try following patches(second is most important):
http://patchwork.ozlabs.org/patch/233396/
http://patchwork.ozlabs.org/patch/233397/
I hope this should fix all known issues
> > Ok I have found at least one issue.
> Yeah.. My college advise me to use sparse in order to spot all
> cpu_to_ondisk format conversion
> make C=2 CF="-D__CHECK_ENDIAN__" fs/ext4/
> And it spotted a huge amount of issues. Which tell us that we are deeply
> in shit.
> <stdin>:1220:2: warning: #warning syscall kcmp not implemented
> <stdin>:1223:2: warning: #warning syscall finit_module not implemented
> fs/ext4/ialloc.c:902:37: warning: symbol 'sbi' shadows an earlier one
> fs/ext4/ialloc.c:650:29: originally declared here
> fs/ext4/inode.c:58:17: warning: incorrect type in assignment (different base types)
> fs/ext4/inode.c:58:17: expected unsigned short [unsigned] [usertype] csum_lo
> fs/ext4/inode.c:58:17: got restricted __le16 [usertype] l_i_checksum_lo
> fs/ext4/inode.c:62:25: warning: incorrect type in assignment (different base types)
> fs/ext4/inode.c:62:25: expected unsigned short [unsigned] [usertype] csum_hi
> fs/ext4/inode.c:62:25: got restricted __le16 [usertype] i_checksum_hi
> fs/ext4/inode.c:69:28: warning: incorrect type in assignment (different base types)
> fs/ext4/inode.c:69:28: expected restricted __le16 [usertype] l_i_checksum_lo
> fs/ext4/inode.c:69:28: got unsigned short [unsigned] [usertype] csum_lo
> fs/ext4/inode.c:72:36: warning: incorrect type in assignment (different base types)
> fs/ext4/inode.c:72:36: expected restricted __le16 [usertype] i_checksum_hi
> fs/ext4/inode.c:72:36: got unsigned short [unsigned] [usertype] csum_hi
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> fs/ext4/ioctl.c:358:36: warning: symbol 'sb' shadows an earlier one
> fs/ext4/ioctl.c:26:28: originally declared here
> fs/ext4/namei.c:2008:36: warning: potentially expensive pointer subtraction
> fs/ext4/namei.c:423:18: warning: incorrect type in assignment (different base types)
> fs/ext4/namei.c:423:18: expected unsigned int [unsigned] [usertype] old_csum
> fs/ext4/namei.c:423:18: got restricted __le32 [usertype] dt_checksum
> fs/ext4/namei.c:427:24: warning: incorrect type in assignment (different base types)
> fs/ext4/namei.c:427:24: expected restricted __le32 [usertype] dt_checksum
> fs/ext4/namei.c:427:24: got unsigned int [unsigned] [usertype] old_csum
> include/trace/events/ext4.h:1926:1: warning: cast from restricted __le32
> include/trace/events/ext4.h:1926:1: warning: incorrect type in assignment (different base types)
> include/trace/events/ext4.h:1926:1: expected unsigned int [unsigned] [usertype] ee_lblk
> include/trace/events/ext4.h:1926:1: got restricted __le32 [usertype] <noident>
> include/trace/events/ext4.h:1926:1: warning: cast from restricted __le32
> include/trace/events/ext4.h:1926:1: warning: incorrect type in assignment (different base types)
> include/trace/events/ext4.h:1926:1: expected unsigned int [unsigned] [usertype] ee_lblk
> include/trace/events/ext4.h:1926:1: got restricted __le32 [usertype] <noident>
> fs/ext4/super.c:1957:26: warning: incorrect type in assignment (different base types)
> fs/ext4/super.c:1957:26: expected unsigned short [unsigned] [usertype] old_csum
> fs/ext4/super.c:1957:26: got restricted __le16 [usertype] bg_checksum
> fs/ext4/super.c:1963:34: warning: incorrect type in assignment (different base types)
> fs/ext4/super.c:1963:34: expected restricted __le16 [usertype] bg_checksum
> fs/ext4/super.c:1963:34: got unsigned short [unsigned] [usertype] old_csum
> include/uapi/linux/swab.h:60:16: error: undefined identifier '__builtin_bswap32'
> include/uapi/linux/swab.h:60:33: error: not a function <noident>
> fs/ext4/extents.c:3002:48: warning: incorrect type in assignment (different base types)
> fs/ext4/extents.c:3002:48: expected restricted __le16 [assigned] [usertype] ee_len
> fs/ext4/extents.c:3002:48: got restricted __be16 [usertype] <noident>
> fs/ext4/extents.c:3008:48: warning: incorrect type in assignment (different base types)
> fs/ext4/extents.c:3008:48: expected restricted __le16 [addressable] [assigned] [usertype] ee_len
> fs/ext4/extents.c:3008:48: got restricted __be16 [usertype] <noident>
> fs/ext4/extents.c:3015:40: warning: incorrect type in assignment (different base types)
> fs/ext4/extents.c:3015:40: expected restricted __le16 [addressable] [assigned] [usertype] ee_len
> fs/ext4/extents.c:3015:40: got restricted __be16 [usertype] <noident>
> fs/ext4/extents.c:603:28: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:671:28: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:849:43: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:984:47: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:1063:50: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:1656:52: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:1706:32: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:1929:43: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:2206:55: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:2528:72: warning: potentially expensive pointer subtraction
> fs/ext4/extents.c:2787:36: warning: incorrect type in argument 5 (different base types)
> fs/ext4/extents.c:2787:36: expected unsigned short [unsigned] eh_entries
> fs/ext4/extents.c:2787:36: got restricted __le16 [usertype] eh_entries
> fs/ext4/extents.c:3275:32: warning: incorrect type in assignment (different base types)
> fs/ext4/extents.c:3275:32: expected restricted __le16 [assigned] [usertype] ee_len
> fs/ext4/extents.c:3275:32: got restricted __be16 [usertype] <noident>
> fs/ext4/extents.c:4642:34: warning: cast from restricted __le16
> fs/ext4/extents.c:4642:34: warning: incorrect type in argument 1 (different base types)
> fs/ext4/extents.c:4642:34: expected unsigned short [unsigned] [usertype] val
> fs/ext4/extents.c:4642:34: got restricted __le16 [usertype] eh_entries
> fs/ext4/extents.c:4642:34: warning: cast from restricted __le16
> fs/ext4/extents.c:4642:34: warning: cast from restricted __le16
> fs/ext4/extents.c:4642:34: warning: restricted __be16 degrades to integer
> fs/ext4/mballoc.c:863:21: warning: symbol 'group' shadows an earlier one
> fs/ext4/mballoc.c:795:35: originally declared here
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> fs/ext4/mballoc.c:4855:9: warning: context imbalance in 'ext4_trim_extent' - unexpected unlock
> fs/ext4/move_extent.c:859:29: warning: symbol 'err' shadows an earlier one
> fs/ext4/move_extent.c:835:16: originally declared here
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> fs/ext4/mmp.c:18:16: warning: incorrect type in return expression (different base types)
> fs/ext4/mmp.c:18:16: expected unsigned int
> fs/ext4/mmp.c:18:16: got restricted __le32 [usertype] <noident>
> fs/ext4/mmp.c:27:19: warning: restricted __le32 degrades to integer
> fs/ext4/mmp.c:36:27: warning: incorrect type in assignment (different base types)
> fs/ext4/mmp.c:36:27: expected restricted __le32 [usertype] mmp_checksum
> fs/ext4/mmp.c:36:27: got unsigned int
> fs/ext4/indirect.c:579:41: warning: potentially expensive pointer subtraction
> fs/ext4/indirect.c:592:52: warning: potentially expensive pointer subtraction
> fs/ext4/indirect.c:1257:68: warning: potentially expensive pointer subtraction
> fs/ext4/indirect.c:1268:67: warning: potentially expensive pointer subtraction
> fs/ext4/indirect.c:1275:48: warning: potentially expensive pointer subtraction
> fs/ext4/indirect.c:1327:52: warning: incorrect type in argument 2 (different base types)
> fs/ext4/indirect.c:1327:52: expected unsigned long [unsigned] [usertype] block
> fs/ext4/indirect.c:1327:52: got restricted __le32 [assigned] [usertype] blk
> fs/ext4/indirect.c:1329:33: warning: incorrect type in argument 4 (different base types)
> fs/ext4/indirect.c:1329:33: expected unsigned long long [unsigned] [usertype] <noident>
> fs/ext4/indirect.c:1329:33: got restricted __le32 [assigned] [usertype] blk
> fs/ext4/xattr.c:127:13: warning: incorrect type in assignment (different base types)
> fs/ext4/xattr.c:127:13: expected unsigned int [unsigned] [usertype] old
> fs/ext4/xattr.c:127:13: got restricted __le32 [usertype] h_checksum
> fs/ext4/xattr.c:129:18: warning: incorrect type in assignment (different base types)
> fs/ext4/xattr.c:129:18: expected unsigned long [unsigned] [usertype] block_nr
> fs/ext4/xattr.c:129:18: got restricted __le64 [usertype] <noident>
> fs/ext4/xattr.c:135:25: warning: incorrect type in assignment (different base types)
> fs/ext4/xattr.c:135:25: expected restricted __le32 [usertype] h_checksum
> fs/ext4/xattr.c:135:25: got unsigned int [unsigned] [usertype] old
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> include/linux/mm.h:759:16: warning: potentially expensive pointer subtraction
> So please give me couple of hours and I'll send you a complete patch.
>
>
> > Please give a try to this patch
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 1530cf4..e8460f6 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -3272,7 +3272,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> > if (err)
> > goto out;
> > zero_ex.ee_block = ex->ee_block;
> > - zero_ex.ee_len = ext4_ext_get_actual_len(ex);
> > + zero_ex.ee_len = cpu_to_le16(ext4_ext_get_actual_len(ext));
> > ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
> >
> > err = ext4_ext_get_access(handle, inode, path + depth);
> >
> > >
> > > I've applied (only) 0001-enable-ES_AGGRESSIVE_TEST-V2.patch to 3.9-rc4:
> > >
> > > patching file fs/ext4/extents_status.h
> > > patching file fs/ext4/inode.c
> > > Hunk #1 succeeded at 1588 (offset 42 lines).
> > > Hunk #2 succeeded at 1609 (offset 42 lines).
> > >
> > > And tried to download some images via bittorrent. As expected, lots of
> > > "ES cache assertation failed" were being logged:
> > >
> > > http://nerdbynature.de/bits/3.9.0-rc4/ext4/
> > > => messages_0001-enable-ES_AGGRESSIVE_TEST-V2.txt.xz
> > >
> > > I've tried to download 3 files, all via bittorrent (so, fallocate & heavy
> > > mmap)
> > >
> > > 1) 8GB Fedora iso, there are also WARNINGs bring triggered, see below.
> > > I decided to cancel the download after some gigabyes.
> > >
> > > 2) A 50 MB Debian iso, this produced just one "ES cache assertation"
> > > message. Download went OK, checksum was correct too.
> > >
> > > 3) A 221 MB Fedora iso, produced a couple of "ES cache assertation"
> > > messages, but no WARNINGs. Download went OK, checksum was correct too.
> > >
> > > It's all in that messages_0001-enable-ES_AGGRESSIVE_TEST-V2.txt.xz file
> > > above. I just e2fsck'ed the ext4 filesystem again (and did so last night),
> > > but no errors were found.
> > >
> > > HTH,
> > > Christian.
> > >
> > > One of the WARNINGs during that 8GB download:
> > >
> > > ino:39190654 lbkl:0, b_state=0x0004b988, b_size=4131
> > > ------------[ cut here ]------------
> > > WARNING: at /opt/linux-git/fs/ext4/inode.c:1600
> > > Modules linked in: md5 ecb nfs i2c_powermac therm_adt746x ecryptfs firewire_sbp2 arc4 b43 usb_storage mac80211 cfg80211
> > > NIP: c013745c LR: c013745c CTR: c000df9c
> > > REGS: edc479a0 TRAP: 0700 Tainted: G W (3.9.0-rc4-dirty)
> > > MSR: 00029032 <EE,ME,IR,DR,RI> CR: 44028644 XER: 20000000
> > > TASK = edca9740[4379] 'flush-254:1' THREAD: edc46000
> > > GPR00: c013745c edc47a50 edca9740 00000034 edca9db0 00000006 00000000 00008000
> > > GPR08: 00003fb0 00218f23 00000000 c000006e 00000dc9 00000000 00000009 ee18cca0
> > > GPR16: edc47c78 0000000e 0004b9bf 0004b988 00000000 edc47a84 00001000 e6357540
> > > GPR24: edc47b78 e6357540 00000000 0004b97f 00000000 0051d188 00000000 0004b988
> > > NIP [c013745c] mpage_da_submit_io+0x3dc/0x3f0
> > > LR [c013745c] mpage_da_submit_io+0x3dc/0x3f0
> > > Call Trace:
> > > [edc47a50] [c013745c] mpage_da_submit_io+0x3dc/0x3f0 (unreliable)
> > > [edc47b30] [c013c9f0] mpage_da_map_and_submit+0x16c/0x5f0
> > > [edc47bc0] [c013d2e4] write_cache_pages_da+0x470/0x480
> > > [edc47c70] [c013d554] ext4_da_writepages+0x260/0x49c
> > > [edc47d20] [c00eeea0] __writeback_single_inode+0x34/0x120
> > > [edc47d40] [c00ef508] writeback_sb_inodes+0x1fc/0x34c
> > > [edc47db0] [c00ef6e4] __writeback_inodes_wb+0x8c/0xd0
> > > [edc47de0] [c00efa90] wb_writeback+0x1b4/0x1bc
> > > [edc47e20] [c00f06d0] wb_do_writeback+0x1ec/0x1f4
> > > [edc47e80] [c00f0748] bdi_writeback_thread+0x70/0x140
> > > [edc47eb0] [c0051c18] kthread+0xa8/0xac
> > > [edc47f40] [c00106cc] ret_from_kernel_thread+0x64/0x6c
> > > --- Exception: 0 at (null)
> > > LR = (null)
> > > Instruction dump:
> > >
> > > --
> > > BOFH excuse #61:
> > >
> > > not approved by the FCC

2013-04-03 12:21:10

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

On Wed, Apr 03, 2013 at 06:22:04PM +0800, Zheng Liu wrote:
> Subject: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
>
> From: Zheng Liu <[email protected]>
>
> When an extent was zeroed out, we forgot to do convert from cpu to le16.
> It could make us hit a BUG_ON when we try to write dirty pages out. So
> fix it.
>
> Signed-off-by: Zheng Liu <[email protected]>

Thanks for finding this! I think we should push this to Linus right
away, and not wait for the next merge window. The bug has been here
for a long time, but it was unmasked by the fact that we unbroke
extent zeroing in 3.9-rcX.

I have two big questions. (1) Shouldn't Eric Whitney have picked this
up with his ARM pandaboard testing, since IIRC it's big-endian as
well? If not, is there something we can do to improve our testing wrt
to big-endian systems?

And (2) does it make sense to have an inline function
ext4_ext_set_len(len)? It might save some lines of code, but more
importantly, it might make it less likely that we will overlook this
sort of bug in the future.

- Ted

2013-04-03 12:29:59

by Dmitry Monakhov

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

On Wed, 3 Apr 2013 08:20:58 -0400, Theodore Ts'o <[email protected]> wrote:
> On Wed, Apr 03, 2013 at 06:22:04PM +0800, Zheng Liu wrote:
> > Subject: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> >
> > From: Zheng Liu <[email protected]>
> >
> > When an extent was zeroed out, we forgot to do convert from cpu to le16.
> > It could make us hit a BUG_ON when we try to write dirty pages out. So
> > fix it.
> >
> > Signed-off-by: Zheng Liu <[email protected]>
>
> Thanks for finding this! I think we should push this to Linus right
> away, and not wait for the next merge window. The bug has been here
> for a long time, but it was unmasked by the fact that we unbroke
> extent zeroing in 3.9-rcX.
IMHO you have to pick this one http://patchwork.ozlabs.org/patch/233397
because it also fix ext_to_indirect_migration and inode's csum
>
> I have two big questions. (1) Shouldn't Eric Whitney have picked this
> up with his ARM pandaboard testing, since IIRC it's big-endian as
> well? If not, is there something we can do to improve our testing wrt
> to big-endian systems?
>
> And (2) does it make sense to have an inline function
> ext4_ext_set_len(len)? It might save some lines of code, but more
> importantly, it might make it less likely that we will overlook this
> sort of bug in the future.
Let's live it for now, later I'll cleanup/optimize this 'zero_ex'
at least from from ext4_split_extent_at because at the end we are shure
that 'ex' is fully mapped and initialized
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2013-04-03 14:34:14

by Eric Whitney

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

* Theodore Ts'o <[email protected]>:
> On Wed, Apr 03, 2013 at 06:22:04PM +0800, Zheng Liu wrote:
> > Subject: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> >
> > From: Zheng Liu <[email protected]>
> >
> > When an extent was zeroed out, we forgot to do convert from cpu to le16.
> > It could make us hit a BUG_ON when we try to write dirty pages out. So
> > fix it.
> >
> > Signed-off-by: Zheng Liu <[email protected]>
>
> Thanks for finding this! I think we should push this to Linus right
> away, and not wait for the next merge window. The bug has been here
> for a long time, but it was unmasked by the fact that we unbroke
> extent zeroing in 3.9-rcX.
>
> I have two big questions. (1) Shouldn't Eric Whitney have picked this
> up with his ARM pandaboard testing, since IIRC it's big-endian as
> well? If not, is there something we can do to improve our testing wrt
> to big-endian systems?

The TI OMAP4 processor on my Pandaboard test system is little endian.

Eric

>
> And (2) does it make sense to have an inline function
> ext4_ext_set_len(len)? It might save some lines of code, but more
> importantly, it might make it less likely that we will overlook this
> sort of bug in the future.
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-04-03 14:41:22

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

On Wed, Apr 03, 2013 at 10:34:06AM -0400, Eric Whitney wrote:
>
> The TI OMAP4 processor on my Pandaboard test system is little endian.

Ah... so basically, we need to find a test platform which allows us to
boot arbitrary kernels and allows us to have root access (which means
it's unlikely we'll be able to do this via remote access) and which
doesn't have exotic power requirements (which as far as I know rules
out pSeries and zSeries systems....)

It would also be nice if we could run tests in finite time, which
probably rules out the Hercules emulator (it runs at one-tenth zSeries
processor speeds, which doesn't win speed competitions by default, and
I suspect their storage speeds are even worse).

Anyone else have any suggestions? Or anyone willing to help us run
ext4 regression tests on the ext4 dev tree, so we can find these
problems before we merge into mainline?

Thanks,

- Ted

2013-04-03 15:23:43

by Florian Fainelli

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

Hello,

Le 04/03/13 16:41, Theodore Ts'o a écrit :
> On Wed, Apr 03, 2013 at 10:34:06AM -0400, Eric Whitney wrote:
>>
>> The TI OMAP4 processor on my Pandaboard test system is little endian.
>
> Ah... so basically, we need to find a test platform which allows us to
> boot arbitrary kernels and allows us to have root access (which means
> it's unlikely we'll be able to do this via remote access) and which
> doesn't have exotic power requirements (which as far as I know rules
> out pSeries and zSeries systems....)
>
> It would also be nice if we could run tests in finite time, which
> probably rules out the Hercules emulator (it runs at one-tenth zSeries
> processor speeds, which doesn't win speed competitions by default, and
> I suspect their storage speeds are even worse).
>
> Anyone else have any suggestions? Or anyone willing to help us run
> ext4 regression tests on the ext4 dev tree, so we can find these
> problems before we merge into mainline?

Qemu emulates various mainline PowerPC, MIPS and SPARC big-endian
systems pretty efficiently and it should not be too hard neither to
script nor to get a recent kernel up and running on these platforms.

My 2 cents.
--
Florian

2013-04-09 03:05:21

by Qian Cai

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

Hello Ted,

----- Original Message -----
> From: "Theodore Ts'o" <[email protected]>
> To: "Eric Whitney" <[email protected]>
> Cc: "Dmitry Monakhov" <[email protected]>, "Christian Kujau" <[email protected]>, "CAI Qian"
> <[email protected]>, "LKML" <[email protected]>, "linux-s390" <[email protected]>, "Steve Best"
> <[email protected]>, [email protected]
> Sent: Wednesday, April 3, 2013 10:41:14 PM
> Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
>
> On Wed, Apr 03, 2013 at 10:34:06AM -0400, Eric Whitney wrote:
> >
> > The TI OMAP4 processor on my Pandaboard test system is little endian.
>
> Ah... so basically, we need to find a test platform which allows us to
> boot arbitrary kernels and allows us to have root access (which means
> it's unlikely we'll be able to do this via remote access) and which
> doesn't have exotic power requirements (which as far as I know rules
> out pSeries and zSeries systems....)
>
> It would also be nice if we could run tests in finite time, which
> probably rules out the Hercules emulator (it runs at one-tenth zSeries
> processor speeds, which doesn't win speed competitions by default, and
> I suspect their storage speeds are even worse).
>
> Anyone else have any suggestions? Or anyone willing to help us run
> ext4 regression tests on the ext4 dev tree, so we can find these
> problems before we merge into mainline?
I can help run xfstests for ext4 dev tree on x64, Power7, Z10 and
KVM platforms with back-storage like SAN/multipath, iSCSI and FCoE.
I plan to run this weekly and setup a wiki page to update the testing
status by every Friday.
CAI Qian
>
> Thanks,
>
> - Ted
>

2013-04-20 15:20:03

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

On Mon, Apr 08, 2013 at 11:05:11PM -0400, CAI Qian wrote:
> I can help run xfstests for ext4 dev tree on x64, Power7, Z10 and
> KVM platforms with back-storage like SAN/multipath, iSCSI and FCoE.
> I plan to run this weekly and setup a wiki page to update the testing
> status by every Friday.

Hi CAI,

Sorry for not getting back to you sooner; I was at Collaboration
Summit and LSF/MM last week.

It would be great if you could help run xfstests on the ext4 dev tree
on various platforms. We don't have any coverage on Power7 or
s390/Z10 at the moment, so that would be especially welcome. Coverage
on alternate storage backends can be interesting in finding timing
problems so they would be valuable as well.

If you have any Itanium platforms, that would be great too, since we
don't have that today.

The various ext4 configurations which I test can be found in the
kvm-autorun/conf directory in my xfstests-bld git repository:

git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git

(This repository is a convenient setup to do build xfstests in a
hermetic environment, and convenience scripts to run xfstests under
kvm, and scripts on the host OS kick off the kvm test run and parse
the test output afterwards.)

Thanks for offering to test the dev branch!

- Ted

2013-04-22 03:40:50

by Qian Cai

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

Hi Ted,

----- Original Message -----
> From: "Theodore Ts'o" <[email protected]>
> To: "CAI Qian" <[email protected]>
> Cc: "Eric Whitney" <[email protected]>, "Dmitry Monakhov" <[email protected]>, "Christian Kujau"
> nerdbynature.de>, "LKML" <[email protected]>, "linux-s390" <[email protected]>, "Steve
> Best" <[email protected]>, [email protected]
> Sent: Saturday, April 20, 2013 11:19:45 PM
> Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
>
> On Mon, Apr 08, 2013 at 11:05:11PM -0400, CAI Qian wrote:
> > I can help run xfstests for ext4 dev tree on x64, Power7, Z10 and
> > KVM platforms with back-storage like SAN/multipath, iSCSI and FCoE.
> > I plan to run this weekly and setup a wiki page to update the testing
> > status by every Friday.
>
> Hi CAI,
>
> Sorry for not getting back to you sooner; I was at Collaboration
> Summit and LSF/MM last week.
>
> It would be great if you could help run xfstests on the ext4 dev tree
> on various platforms. We don't have any coverage on Power7 or
> s390/Z10 at the moment, so that would be especially welcome. Coverage
> on alternate storage backends can be interesting in finding timing
> problems so they would be valuable as well.
>
> If you have any Itanium platforms, that would be great too, since we
> don't have that today.
Unfortunately, to get those ia64 up and running with the upstream kernel
required some significant efforts. I'd leave that for now until it is
something very important.
>
> The various ext4 configurations which I test can be found in the
> kvm-autorun/conf directory in my xfstests-bld git repository:
>
> git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
>
> (This repository is a convenient setup to do build xfstests in a
> hermetic environment, and convenience scripts to run xfstests under
> kvm, and scripts on the host OS kick off the kvm test run and parse
> the test output afterwards.)
>
> Thanks for offering to test the dev branch!
OK, will check with that. BTW, git.kernel.org is kind of broken for me
very often those days, as almost all my tests got this,
+ git clone http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git -b dev
Cloning into 'ext4'...
fatal: The remote end hung up unexpectedly
fatal: recursion detected in die handler

Therefore, it is going to take a while to re-try later as the test systems
here always do testing from a clean environment, i.e., re-install OS, and
then re-clone the tree etc. :\
CAI Qian
>
> - Ted
>

2013-04-22 10:05:01

by Christian Borntraeger

[permalink] [raw]

Subject: Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

On 20/04/13 17:19, Theodore Ts'o wrote:
> On Mon, Apr 08, 2013 at 11:05:11PM -0400, CAI Qian wrote:
>> I can help run xfstests for ext4 dev tree on x64, Power7, Z10 and
>> KVM platforms with back-storage like SAN/multipath, iSCSI and FCoE.
>> I plan to run this weekly and setup a wiki page to update the testing
>> status by every Friday.
>
> Hi CAI,
>
> Sorry for not getting back to you sooner; I was at Collaboration
> Summit and LSF/MM last week.
>
> It would be great if you could help run xfstests on the ext4 dev tree
> on various platforms. We don't have any coverage on Power7 or
> s390/Z10 at the moment, so that would be especially welcome. Coverage
> on alternate storage backends can be interesting in finding timing
> problems so they would be valuable as well.

It is probably easier to get access to a Power system (gcc compile farm?).
Having said that, there is a service available to get access to a System z
for open source development:
http://www-03.ibm.com/systems/z/os/linux/support/community.html

When we did the valgrind port we used that a lot to gave access to
non-IBMers.
Dont know if it is ok to install a private kernel, though. Will double
check.

Christian