2007-06-14 07:23:17

by Dmitri Monakhov

[permalink] [raw]
Subject: delayed allocatiou result in Oops

Simple test failed on ext4 when delayed allocation was used.
#mkfs.ext3 -b4096 /dev/vzvg/test2
#mount -text4dev /dev/vzvg/test2 /mnt/test -odelalloc
#fsstress -d /mnt/test/ -l100 -n100000 -p20 -f dwrite=0

<CONSOLE LOG>
EXT4-fs: writeback error = -28
......
EXT4-fs: writeback error = -28
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
PGD 44c1067 PUD 44fd067 PMD 0
Oops: 0000 [2] SMP
CPU 0
Modules linked in: ext4dev jbd2
Pid: 4833, comm: fsstress Not tainted 2.6.22-rc4-mm2 #9
RIP: 0010:[<ffffffff802a12d2>] [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
RSP: 0018:ffff810004df9a58 EFLAGS: 00010203
RAX: 0000000000001000 RBX: ffff8100cf4256f8 RCX: 000000000000000c

RDX: 0000000000000001 RSI: 000000000000000c RDI: ffff8100cf4256f8
RBP: 0000000000000000 R08: ffff810004df9be8 R09: ffff810004df9c58
R10: 8888888888888888 R11: 8888888888888888 R12: 0000000000000052
R13: 0000000000001000 R14: 0000000000000000 R15: 00000000000000d3
FS: 00002adfe3f7d6f0(0000) GS:ffffffff80730000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000004362000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fsstress (pid: 4833, threadinfo ffff810004df8000, task ffff810004c867a0)
Stack: ffffffff880180f2 ffff8100054a23f0 0000000000000000 0000000000000000
ffff810005dcbb80 ffff81000549bf00 0000000000000000 ffff810005def8b0
ffffffff88029e60 ffffffff8025b2be ffff8100054da540 ffff8100cf496fb0
Call Trace:
[<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
[<ffffffff8025b2be>] find_get_page+0x21/0x51
[<ffffffff802a5b45>] do_mpage_readpage+0x45f/0x480
[<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
[<ffffffff88003d64>] :jbd2:jbd2_journal_dirty_metadata+0x197/0x1be
[<ffffffff80245f3b>] bit_waitqueue+0x1c/0x99
[<ffffffff802a5bb4>] mpage_readpage+0x4e/0x67
[<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
[<ffffffff8028817e>] do_lookup+0x63/0x1ae
[<ffffffff8025b1ae>] file_read_actor+0x8d/0xf6
[<ffffffff8025b2be>] find_get_page+0x21/0x51
[<ffffffff8025b93a>] do_generic_mapping_read+0x23c/0x3da
[<ffffffff8025b121>] file_read_actor+0x0/0xf6
[<ffffffff8025d123>] generic_file_aio_read+0x119/0x156
[<ffffffff80281848>] do_sync_read+0xc9/0x10c

[<ffffffff802845b2>] cp_new_stat+0xe5/0xfd
[<ffffffff80246007>] autoremove_wake_function+0x0/0x2e
[<ffffffff80281fba>] vfs_read+0xaa/0x132
[<ffffffff80282356>] sys_read+0x45/0x6e
[<ffffffff8020b41e>] system_call+0x7e/0x83
Code: 8b 45 00 a8 01 0f 85 e6 00 00 00 8b 45 00 a8 20 0f 85 c9 00
<CONSOLE LOG>

I've digged this a litle bit with folowig results:

int block_read_full_page(struct page *page, get_block_t *get_block)
{
...
1914: if (!page_has_buffers(page)) <<< page_has_buffers(page) == true
create_empty_buffers(page, blocksize, 0);
head = page_buffers(page); <<<< page_buffers(page) == NULL
<<<i've add debug info here:
<<< page->flags == 100000000000821
<<< PagePrivate(page) == 1, (page)->private == NULL
<<< So we have private page without buffers, it is WRONG.

iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
bh = head;
nr = 0;
i = 0;

do {
if (buffer_uptodate(bh)) << Null pointer deref here result in oops
.......
}


2007-06-15 05:14:33

by Alex Tomas

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

looks like an error in error handling path (notice -28 (ENOSPC) before)

thanks for the report, Alex

Dmitriy Monakhov wrote:
> )
>
> Simple test failed on ext4 when delayed allocation was used.
> #mkfs.ext3 -b4096 /dev/vzvg/test2
> #mount -text4dev /dev/vzvg/test2 /mnt/test -odelalloc
> #fsstress -d /mnt/test/ -l100 -n100000 -p20 -f dwrite=0
>
> <CONSOLE LOG>
> EXT4-fs: writeback error = -28
> ......
> EXT4-fs: writeback error = -28
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> PGD 44c1067 PUD 44fd067 PMD 0
> Oops: 0000 [2] SMP
> CPU 0
> Modules linked in: ext4dev jbd2
> Pid: 4833, comm: fsstress Not tainted 2.6.22-rc4-mm2 #9
> RIP: 0010:[<ffffffff802a12d2>] [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> RSP: 0018:ffff810004df9a58 EFLAGS: 00010203
> RAX: 0000000000001000 RBX: ffff8100cf4256f8 RCX: 000000000000000c
>
> RDX: 0000000000000001 RSI: 000000000000000c RDI: ffff8100cf4256f8
> RBP: 0000000000000000 R08: ffff810004df9be8 R09: ffff810004df9c58
> R10: 8888888888888888 R11: 8888888888888888 R12: 0000000000000052
> R13: 0000000000001000 R14: 0000000000000000 R15: 00000000000000d3
> FS: 00002adfe3f7d6f0(0000) GS:ffffffff80730000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000004362000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process fsstress (pid: 4833, threadinfo ffff810004df8000, task ffff810004c867a0)
> Stack: ffffffff880180f2 ffff8100054a23f0 0000000000000000 0000000000000000
> ffff810005dcbb80 ffff81000549bf00 0000000000000000 ffff810005def8b0
> ffffffff88029e60 ffffffff8025b2be ffff8100054da540 ffff8100cf496fb0
> Call Trace:
> [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> [<ffffffff8025b2be>] find_get_page+0x21/0x51
> [<ffffffff802a5b45>] do_mpage_readpage+0x45f/0x480
> [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> [<ffffffff88003d64>] :jbd2:jbd2_journal_dirty_metadata+0x197/0x1be
> [<ffffffff80245f3b>] bit_waitqueue+0x1c/0x99
> [<ffffffff802a5bb4>] mpage_readpage+0x4e/0x67
> [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> [<ffffffff8028817e>] do_lookup+0x63/0x1ae
> [<ffffffff8025b1ae>] file_read_actor+0x8d/0xf6
> [<ffffffff8025b2be>] find_get_page+0x21/0x51
> [<ffffffff8025b93a>] do_generic_mapping_read+0x23c/0x3da
> [<ffffffff8025b121>] file_read_actor+0x0/0xf6
> [<ffffffff8025d123>] generic_file_aio_read+0x119/0x156
> [<ffffffff80281848>] do_sync_read+0xc9/0x10c
>
> [<ffffffff802845b2>] cp_new_stat+0xe5/0xfd
> [<ffffffff80246007>] autoremove_wake_function+0x0/0x2e
> [<ffffffff80281fba>] vfs_read+0xaa/0x132
> [<ffffffff80282356>] sys_read+0x45/0x6e
> [<ffffffff8020b41e>] system_call+0x7e/0x83
> Code: 8b 45 00 a8 01 0f 85 e6 00 00 00 8b 45 00 a8 20 0f 85 c9 00
> <CONSOLE LOG>
>
> I've digged this a litle bit with folowig results:
>
> int block_read_full_page(struct page *page, get_block_t *get_block)
> {
> ...
> 1914: if (!page_has_buffers(page)) <<< page_has_buffers(page) == true
> create_empty_buffers(page, blocksize, 0);
> head = page_buffers(page); <<<< page_buffers(page) == NULL
> <<<i've add debug info here:
> <<< page->flags == 100000000000821
> <<< PagePrivate(page) == 1, (page)->private == NULL
> <<< So we have private page without buffers, it is WRONG.
>
> iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
> lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
> bh = head;
> nr = 0;
> i = 0;
>
> do {
> if (buffer_uptodate(bh)) << Null pointer deref here result in oops
> .......
> }
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-06-15 23:16:25

by Mingming Cao

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

I hit almost the same issue today also, but with different error #, and
one more kernel oops, when run fsstress on x86_64.

EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<ffffffff8028bbb6>] block_read_full_page+0xb5/0x267
PGD 1f9842067 PUD 1f9843067 PMD 0
Oops: 0000 [5] SMP
CPU 3
Modules linked in:
Pid: 10900, comm: fsstress Not tainted 2.6.22-rc4-autokern1 #1
RIP: 0010:[<ffffffff8028bbb6>] [<ffffffff8028bbb6>] block_read_full_page+0xb5/0x267
RSP: 0000:ffff8101f984fa48 EFLAGS: 00010213
RAX: 0000000000000179 RBX: 0000000000000000 RCX: 000000000000000c
RDX: 000000000000000c RSI: ffffffff802e0f7b RDI: ffff81017ff578c8
RBP: ffff81017ff578c8 R08: ffff8101f984fbe8 R09: ffff8101f984fbe0
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000e5
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8101803ec5c0(0063) knlGS:00000000f7dec460
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001f9841000 CR4: 00000000000006e0
Process fsstress (pid: 10900, threadinfo ffff8101f984e000, task ffff8101f9824280)
Stack: 00001000f9ad4080 0000000100000000 0000000000000000 0000000000000179
ffff8100de7c4100 ffffffff802e0f7b 000003363e3761bf ffffffff804eac2d
ffff8101f984fb48 0000000000000082 ffff81017e9bc550 ffff81017e9bc588
Call Trace:
[<ffffffff802e0f7b>] ext4_get_block+0x0/0x104
[<ffffffff804eac2d>] thread_return+0x0/0xd5
[<ffffffff8028fcd6>] do_mpage_readpage+0x411/0x430
[<ffffffff804eb481>] io_schedule+0x26/0x32
[<ffffffff804eb6fb>] __wait_on_bit_lock+0x5f/0x6d
[<ffffffff8028fe7e>] mpage_readpage+0x42/0x5b
[<ffffffff802e0f7b>] ext4_get_block+0x0/0x104
[<ffffffff802395eb>] wake_bit_function+0x0/0x23
[<ffffffff8024a9bd>] file_read_actor+0x89/0xf4
[<ffffffff8024a21e>] find_get_page+0x1e/0x4d
[<ffffffff8024a763>] do_generic_mapping_read+0x20e/0x3df
[<ffffffff8024a934>] file_read_actor+0x0/0xf4
[<ffffffff8024c2e7>] generic_file_aio_read+0x11d/0x154
[<ffffffff8026c7ca>] do_sync_read+0xc8/0x10b
[<ffffffff80272c4f>] permission+0xbb/0xbd
[<ffffffff802395bd>] autoremove_wake_function+0x0/0x2e
[<ffffffff8026be62>] nameidata_to_filp+0x25/0x34
[<ffffffff8026be9e>] do_filp_open+0x2d/0x3d
[<ffffffff8026f355>] vfs_getattr+0x2b/0x2f
[<ffffffff8026f43d>] vfs_fstat+0x33/0x3a
[<ffffffff8026c8b8>] vfs_read+0xab/0x12e
[<ffffffff8026cbbc>] sys_read+0x45/0x6e
[<ffffffff80219f02>] ia32_sysret+0x0/0xa


Code: 8b 03 a8 01 0f 85 e1 00 00 00 8b 03 a8 20 0f 85 cc 00 00 00
RIP [<ffffffff8028bbb6>] block_read_full_page+0xb5/0x267
RSP <ffff8101f984fa48>
CR2: 0000000000000000
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
EXT4-fs: writeback error = -2
------------[ cut here ]------------
kernel BUG at fs/ext4/writeback.c:266!
invalid opcode: 0000 [6] SMP
CPU 3
Modules linked in:
Pid: 10851, comm: fsstress Not tainted 2.6.22-rc4-autokern1 #1
RIP: 0010:[<ffffffff802ed5f6>] [<ffffffff802ed5f6>] ext4_wb_submit_extent+0x1ef/0x3d9
RSP: 0000:ffff8101e47cfab8 EFLAGS: 00010246
RAX: 000000000001182c RBX: ffff8100c6709ca0 RCX: 000000000000000c
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8101e8de5000
RBP: ffff8100c6709a48 R08: ffff8101b1056338 R09: 0000000000000000
R10: ffff8101b1056338 R11: ffff8100c6709a48 R12: 0000000000000040
R13: ffff81017eaa5b98 R14: 0000000000000040 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8101803ec5c0(0063) knlGS:00000000f7dec460
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7dcb004 CR3: 00000001e47c1000 CR4: 00000000000006e0
Process fsstress (pid: 10851, threadinfo ffff8101e47ce000, task ffff8101e47a6b30)
Stack: ffff8101cf22c9b8 0000000000000000 0000000000000001 0000000c00000001
ffff8100c6709a48 000000018028938e ffff8101e47cfb68 0000000000000000
ffff8101e47cfd28 ffff8100c6709ca0 ffff8100c6709a48 ffff8100c6709990
Call Trace:
[<ffffffff802edb95>] ext4_wb_handle_extent+0x3b5/0x48c
[<ffffffff802ebc24>] ext4_ext_walk_space+0x18a/0x20c
[<ffffffff802ed7e0>] ext4_wb_handle_extent+0x0/0x48c
[<ffffffff802edcc7>] ext4_wb_flush+0x5b/0x153
[<ffffffff802ee1a0>] ext4_wb_writepages+0x34b/0x398
[<ffffffff8024f81b>] do_writepages+0x20/0x2d
[<ffffffff80286164>] __writeback_single_inode+0x1df/0x3a7
[<ffffffff8024a47e>] find_get_pages_tag+0x34/0x89
[<ffffffff80250c66>] pagevec_lookup_tag+0x1a/0x24
[<ffffffff80249e89>] wait_on_page_writeback_range+0xc7/0x10d
[<ffffffff80286702>] sync_sb_inodes+0x1cb/0x2a0
[<ffffffff8028687c>] sync_inodes_sb+0xa5/0xb9
[<ffffffff803b3e09>] __up_read+0x10/0x8a
[<ffffffff802868fa>] __sync_inodes+0x6a/0xb1
[<ffffffff80286952>] sync_inodes+0x11/0x29
[<ffffffff8028895c>] do_sync+0x2c/0x50
[<ffffffff8028898b>] sys_sync+0xb/0xf
[<ffffffff80219f02>] ia32_sysret+0x0/0xa


Code: 0f 0b eb fe f0 41 0f ba 75 00 14 48 8b 4c 24 40 01 51 10 48
RIP [<ffffffff802ed5f6>] ext4_wb_submit_extent+0x1ef/0x3d9
RSP <ffff8101e47cfab8>

I will try the patch below...Alex, any hint about the second oops?

Mingming
Alex please
On Fri, 2007-06-15 at 09:14 +0400, Alex Tomas wrote:
> looks like an error in error handling path (notice -28 (ENOSPC) before)
>
> thanks for the report, Alex
>
> Dmitriy Monakhov wrote:
> > )
> >
> > Simple test failed on ext4 when delayed allocation was used.
> > #mkfs.ext3 -b4096 /dev/vzvg/test2
> > #mount -text4dev /dev/vzvg/test2 /mnt/test -odelalloc
> > #fsstress -d /mnt/test/ -l100 -n100000 -p20 -f dwrite=0
> >
> > <CONSOLE LOG>
> > EXT4-fs: writeback error = -28
> > ......
> > EXT4-fs: writeback error = -28
> > Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> > PGD 44c1067 PUD 44fd067 PMD 0
> > Oops: 0000 [2] SMP
> > CPU 0
> > Modules linked in: ext4dev jbd2
> > Pid: 4833, comm: fsstress Not tainted 2.6.22-rc4-mm2 #9
> > RIP: 0010:[<ffffffff802a12d2>] [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> > RSP: 0018:ffff810004df9a58 EFLAGS: 00010203
> > RAX: 0000000000001000 RBX: ffff8100cf4256f8 RCX: 000000000000000c
> >
> > RDX: 0000000000000001 RSI: 000000000000000c RDI: ffff8100cf4256f8
> > RBP: 0000000000000000 R08: ffff810004df9be8 R09: ffff810004df9c58
> > R10: 8888888888888888 R11: 8888888888888888 R12: 0000000000000052
> > R13: 0000000000001000 R14: 0000000000000000 R15: 00000000000000d3
> > FS: 00002adfe3f7d6f0(0000) GS:ffffffff80730000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000000 CR3: 0000000004362000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process fsstress (pid: 4833, threadinfo ffff810004df8000, task ffff810004c867a0)
> > Stack: ffffffff880180f2 ffff8100054a23f0 0000000000000000 0000000000000000
> > ffff810005dcbb80 ffff81000549bf00 0000000000000000 ffff810005def8b0
> > ffffffff88029e60 ffffffff8025b2be ffff8100054da540 ffff8100cf496fb0
> > Call Trace:
> > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > [<ffffffff8025b2be>] find_get_page+0x21/0x51
> > [<ffffffff802a5b45>] do_mpage_readpage+0x45f/0x480
> > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > [<ffffffff88003d64>] :jbd2:jbd2_journal_dirty_metadata+0x197/0x1be
> > [<ffffffff80245f3b>] bit_waitqueue+0x1c/0x99
> > [<ffffffff802a5bb4>] mpage_readpage+0x4e/0x67
> > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > [<ffffffff8028817e>] do_lookup+0x63/0x1ae
> > [<ffffffff8025b1ae>] file_read_actor+0x8d/0xf6
> > [<ffffffff8025b2be>] find_get_page+0x21/0x51
> > [<ffffffff8025b93a>] do_generic_mapping_read+0x23c/0x3da
> > [<ffffffff8025b121>] file_read_actor+0x0/0xf6
> > [<ffffffff8025d123>] generic_file_aio_read+0x119/0x156
> > [<ffffffff80281848>] do_sync_read+0xc9/0x10c
> >
> > [<ffffffff802845b2>] cp_new_stat+0xe5/0xfd
> > [<ffffffff80246007>] autoremove_wake_function+0x0/0x2e
> > [<ffffffff80281fba>] vfs_read+0xaa/0x132
> > [<ffffffff80282356>] sys_read+0x45/0x6e
> > [<ffffffff8020b41e>] system_call+0x7e/0x83
> > Code: 8b 45 00 a8 01 0f 85 e6 00 00 00 8b 45 00 a8 20 0f 85 c9 00
> > <CONSOLE LOG>
> >
> > I've digged this a litle bit with folowig results:
> >
> > int block_read_full_page(struct page *page, get_block_t *get_block)
> > {
> > ...
> > 1914: if (!page_has_buffers(page)) <<< page_has_buffers(page) == true
> > create_empty_buffers(page, blocksize, 0);
> > head = page_buffers(page); <<<< page_buffers(page) == NULL
> > <<<i've add debug info here:
> > <<< page->flags == 100000000000821
> > <<< PagePrivate(page) == 1, (page)->private == NULL
> > <<< So we have private page without buffers, it is WRONG.
> >
> > iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
> > bh = head;
> > nr = 0;
> > i = 0;
> >
> > do {
> > if (buffer_uptodate(bh)) << Null pointer deref here result in oops
> > .......
> > }
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-06-16 08:01:32

by Dmitri Monakhov

[permalink] [raw]
Subject: [PATCH] ext4:fix invariant checking in ext4_rebalance_reservation

Variable "free" was declarated as __u64 so conidition (free < 0) always
false, even if free was overflowed during substraction.
---
fs/ext4/balloc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 43ae8f8..a9655f1 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1909,8 +1909,8 @@ void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
chunk = free;
if (rs[i].rs_reserved || i == smp_processor_id()) {
rs[i].rs_reserved = chunk;
+ BUG_ON(free < chunk);
free -= chunk;
- BUG_ON(free < 0);
}
}
BUG_ON(free);
--
1.5.2

2007-06-16 08:13:54

by Dmitri Monakhov

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On 16:16 Птн 15 Июн , Mingming Cao wrote:
> I hit almost the same issue today also, but with different error #, and
> one more kernel oops, when run fsstress on x86_64.
>
> EXT4-fs: writeback error = -2
> EXT4-fs: writeback error = -2
This error never happens in writeback in my case, only ENOSPC.
Btw: i've send one more micro fix (see in this thread " PATCH] ext4:fix
invariant checking in ext4_rebalance_reservation")
>
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> [<ffffffff8028bbb6>] block_read_full_page+0xb5/0x267
> PGD 1f9842067 PUD 1f9843067 PMD 0
> Oops: 0000 [5] SMP
> CPU 3
> Modules linked in:
> Pid: 10900, comm: fsstress Not tainted 2.6.22-rc4-autokern1 #1
> RIP: 0010:[<ffffffff8028bbb6>] [<ffffffff8028bbb6>] block_read_full_page+0xb5/0x267
> RSP: 0000:ffff8101f984fa48 EFLAGS: 00010213
> RAX: 0000000000000179 RBX: 0000000000000000 RCX: 000000000000000c
> RDX: 000000000000000c RSI: ffffffff802e0f7b RDI: ffff81017ff578c8
> RBP: ffff81017ff578c8 R08: ffff8101f984fbe8 R09: ffff8101f984fbe0
> R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000e5
> R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff8101803ec5c0(0063) knlGS:00000000f7dec460
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 00000001f9841000 CR4: 00000000000006e0
> Process fsstress (pid: 10900, threadinfo ffff8101f984e000, task ffff8101f9824280)
> Stack: 00001000f9ad4080 0000000100000000 0000000000000000 0000000000000179
> ffff8100de7c4100 ffffffff802e0f7b 000003363e3761bf ffffffff804eac2d
> ffff8101f984fb48 0000000000000082 ffff81017e9bc550 ffff81017e9bc588
> Call Trace:
> [<ffffffff802e0f7b>] ext4_get_block+0x0/0x104
> [<ffffffff804eac2d>] thread_return+0x0/0xd5
> [<ffffffff8028fcd6>] do_mpage_readpage+0x411/0x430
> [<ffffffff804eb481>] io_schedule+0x26/0x32
> [<ffffffff804eb6fb>] __wait_on_bit_lock+0x5f/0x6d
> [<ffffffff8028fe7e>] mpage_readpage+0x42/0x5b
> [<ffffffff802e0f7b>] ext4_get_block+0x0/0x104
> [<ffffffff802395eb>] wake_bit_function+0x0/0x23
> [<ffffffff8024a9bd>] file_read_actor+0x89/0xf4
> [<ffffffff8024a21e>] find_get_page+0x1e/0x4d
> [<ffffffff8024a763>] do_generic_mapping_read+0x20e/0x3df
> [<ffffffff8024a934>] file_read_actor+0x0/0xf4
> [<ffffffff8024c2e7>] generic_file_aio_read+0x11d/0x154
> [<ffffffff8026c7ca>] do_sync_read+0xc8/0x10b
> [<ffffffff80272c4f>] permission+0xbb/0xbd
> [<ffffffff802395bd>] autoremove_wake_function+0x0/0x2e
> [<ffffffff8026be62>] nameidata_to_filp+0x25/0x34
> [<ffffffff8026be9e>] do_filp_open+0x2d/0x3d
> [<ffffffff8026f355>] vfs_getattr+0x2b/0x2f
> [<ffffffff8026f43d>] vfs_fstat+0x33/0x3a
> [<ffffffff8026c8b8>] vfs_read+0xab/0x12e
> [<ffffffff8026cbbc>] sys_read+0x45/0x6e
> [<ffffffff80219f02>] ia32_sysret+0x0/0xa
>
[skip]
> ------------[ cut here ]------------
> kernel BUG at fs/ext4/writeback.c:266!
Yepp. I've saw this oops too, But IMHO it may be artefact caused by
previous one.
> invalid opcode: 0000 [6] SMP
> CPU 3
> Modules linked in:
> Pid: 10851, comm: fsstress Not tainted 2.6.22-rc4-autokern1 #1
> RIP: 0010:[<ffffffff802ed5f6>] [<ffffffff802ed5f6>] ext4_wb_submit_extent+0x1ef/0x3d9
> RSP: 0000:ffff8101e47cfab8 EFLAGS: 00010246
> RAX: 000000000001182c RBX: ffff8100c6709ca0 RCX: 000000000000000c
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8101e8de5000
> RBP: ffff8100c6709a48 R08: ffff8101b1056338 R09: 0000000000000000
> R10: ffff8101b1056338 R11: ffff8100c6709a48 R12: 0000000000000040
> R13: ffff81017eaa5b98 R14: 0000000000000040 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff8101803ec5c0(0063) knlGS:00000000f7dec460
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00000000f7dcb004 CR3: 00000001e47c1000 CR4: 00000000000006e0
> Process fsstress (pid: 10851, threadinfo ffff8101e47ce000, task ffff8101e47a6b30)
> Stack: ffff8101cf22c9b8 0000000000000000 0000000000000001 0000000c00000001
> ffff8100c6709a48 000000018028938e ffff8101e47cfb68 0000000000000000
> ffff8101e47cfd28 ffff8100c6709ca0 ffff8100c6709a48 ffff8100c6709990
> Call Trace:
> [<ffffffff802edb95>] ext4_wb_handle_extent+0x3b5/0x48c
> [<ffffffff802ebc24>] ext4_ext_walk_space+0x18a/0x20c
> [<ffffffff802ed7e0>] ext4_wb_handle_extent+0x0/0x48c
> [<ffffffff802edcc7>] ext4_wb_flush+0x5b/0x153
> [<ffffffff802ee1a0>] ext4_wb_writepages+0x34b/0x398
> [<ffffffff8024f81b>] do_writepages+0x20/0x2d
> [<ffffffff80286164>] __writeback_single_inode+0x1df/0x3a7
> [<ffffffff8024a47e>] find_get_pages_tag+0x34/0x89
> [<ffffffff80250c66>] pagevec_lookup_tag+0x1a/0x24
> [<ffffffff80249e89>] wait_on_page_writeback_range+0xc7/0x10d
> [<ffffffff80286702>] sync_sb_inodes+0x1cb/0x2a0
> [<ffffffff8028687c>] sync_inodes_sb+0xa5/0xb9
> [<ffffffff803b3e09>] __up_read+0x10/0x8a
> [<ffffffff802868fa>] __sync_inodes+0x6a/0xb1
> [<ffffffff80286952>] sync_inodes+0x11/0x29
> [<ffffffff8028895c>] do_sync+0x2c/0x50
> [<ffffffff8028898b>] sys_sync+0xb/0xf
> [<ffffffff80219f02>] ia32_sysret+0x0/0xa
>
>
> Code: 0f 0b eb fe f0 41 0f ba 75 00 14 48 8b 4c 24 40 01 51 10 48
> RIP [<ffffffff802ed5f6>] ext4_wb_submit_extent+0x1ef/0x3d9
> RSP <ffff8101e47cfab8>
>
> I will try the patch below...Alex, any hint about the second oops?
>
> Mingming
> Alex please
> On Fri, 2007-06-15 at 09:14 +0400, Alex Tomas wrote:
> > looks like an error in error handling path (notice -28 (ENOSPC) before)
> >
> > thanks for the report, Alex
> >
> > Dmitriy Monakhov wrote:
> > > )
> > >
> > > Simple test failed on ext4 when delayed allocation was used.
> > > #mkfs.ext3 -b4096 /dev/vzvg/test2
> > > #mount -text4dev /dev/vzvg/test2 /mnt/test -odelalloc
> > > #fsstress -d /mnt/test/ -l100 -n100000 -p20 -f dwrite=0
> > >
> > > <CONSOLE LOG>
> > > EXT4-fs: writeback error = -28
> > > ......
> > > EXT4-fs: writeback error = -28
> > > Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> > > PGD 44c1067 PUD 44fd067 PMD 0
> > > Oops: 0000 [2] SMP
> > > CPU 0
> > > Modules linked in: ext4dev jbd2
> > > Pid: 4833, comm: fsstress Not tainted 2.6.22-rc4-mm2 #9
> > > RIP: 0010:[<ffffffff802a12d2>] [<ffffffff802a12d2>] block_read_full_page+0xab/0x25f
> > > RSP: 0018:ffff810004df9a58 EFLAGS: 00010203
> > > RAX: 0000000000001000 RBX: ffff8100cf4256f8 RCX: 000000000000000c
> > >
> > > RDX: 0000000000000001 RSI: 000000000000000c RDI: ffff8100cf4256f8
> > > RBP: 0000000000000000 R08: ffff810004df9be8 R09: ffff810004df9c58
> > > R10: 8888888888888888 R11: 8888888888888888 R12: 0000000000000052
> > > R13: 0000000000001000 R14: 0000000000000000 R15: 00000000000000d3
> > > FS: 00002adfe3f7d6f0(0000) GS:ffffffff80730000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000000 CR3: 0000000004362000 CR4: 00000000000006e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process fsstress (pid: 4833, threadinfo ffff810004df8000, task ffff810004c867a0)
> > > Stack: ffffffff880180f2 ffff8100054a23f0 0000000000000000 0000000000000000
> > > ffff810005dcbb80 ffff81000549bf00 0000000000000000 ffff810005def8b0
> > > ffffffff88029e60 ffffffff8025b2be ffff8100054da540 ffff8100cf496fb0
> > > Call Trace:
> > > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > > [<ffffffff8025b2be>] find_get_page+0x21/0x51
> > > [<ffffffff802a5b45>] do_mpage_readpage+0x45f/0x480
> > > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > > [<ffffffff88003d64>] :jbd2:jbd2_journal_dirty_metadata+0x197/0x1be
> > > [<ffffffff80245f3b>] bit_waitqueue+0x1c/0x99
> > > [<ffffffff802a5bb4>] mpage_readpage+0x4e/0x67
> > > [<ffffffff880180f2>] :ext4dev:ext4_get_block+0x0/0x109
> > > [<ffffffff8028817e>] do_lookup+0x63/0x1ae
> > > [<ffffffff8025b1ae>] file_read_actor+0x8d/0xf6
> > > [<ffffffff8025b2be>] find_get_page+0x21/0x51
> > > [<ffffffff8025b93a>] do_generic_mapping_read+0x23c/0x3da
> > > [<ffffffff8025b121>] file_read_actor+0x0/0xf6
> > > [<ffffffff8025d123>] generic_file_aio_read+0x119/0x156
> > > [<ffffffff80281848>] do_sync_read+0xc9/0x10c
> > >
> > > [<ffffffff802845b2>] cp_new_stat+0xe5/0xfd
> > > [<ffffffff80246007>] autoremove_wake_function+0x0/0x2e
> > > [<ffffffff80281fba>] vfs_read+0xaa/0x132
> > > [<ffffffff80282356>] sys_read+0x45/0x6e
> > > [<ffffffff8020b41e>] system_call+0x7e/0x83
> > > Code: 8b 45 00 a8 01 0f 85 e6 00 00 00 8b 45 00 a8 20 0f 85 c9 00
> > > <CONSOLE LOG>
> > >
> > > I've digged this a litle bit with folowig results:
> > >
> > > int block_read_full_page(struct page *page, get_block_t *get_block)
> > > {
> > > ...
> > > 1914: if (!page_has_buffers(page)) <<< page_has_buffers(page) == true
> > > create_empty_buffers(page, blocksize, 0);
> > > head = page_buffers(page); <<<< page_buffers(page) == NULL
> > > <<<i've add debug info here:
> > > <<< page->flags == 100000000000821
> > > <<< PagePrivate(page) == 1, (page)->private == NULL
> > > <<< So we have private page without buffers, it is WRONG.
> > >
> > > iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > > lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
> > > bh = head;
> > > nr = 0;
> > > i = 0;
> > >
> > > do {
> > > if (buffer_uptodate(bh)) << Null pointer deref here result in oops
> > > .......
> > > }
> > >
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-06-18 21:09:43

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4:fix invariant checking in ext4_rebalance_reservation

On Sat, 2007-06-16 at 12:02 +0400, Dmitriy Monakhov wrote:
> Variable "free" was declarated as __u64 so conidition (free < 0) always
> false, even if free was overflowed during substraction.

Agreed.

Can you add your Signed-off to this patch? I can add mine after yours.

Thanks,
Mingming
> ---
> fs/ext4/balloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 43ae8f8..a9655f1 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -1909,8 +1909,8 @@ void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
> chunk = free;
> if (rs[i].rs_reserved || i == smp_processor_id()) {
> rs[i].rs_reserved = chunk;
> + BUG_ON(free < chunk);
> free -= chunk;
> - BUG_ON(free < 0);
> }
> }
> BUG_ON(free);

2007-06-19 01:01:29

by Mingming Cao

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On Sat, 2007-06-16 at 12:14 +0400, Dmitriy Monakhov wrote:
> On 16:16 Птн 15 Июн , Mingming Cao wrote:
> > I hit almost the same issue today also, but with different error #, and
> > one more kernel oops, when run fsstress on x86_64.
> >
> > EXT4-fs: writeback error = -2
> > EXT4-fs: writeback error = -2
> This error never happens in writeback in my case, only ENOSPC.
I see why, error -2 is ENOENT, your kernel already patched with your
previus patch to fix the return error from ext4_reserve_global() from
ENOENT to ENOSPC.

> > > > I've digged this a litle bit with folowig results:
> > > >
> > > > int block_read_full_page(struct page *page, get_block_t *get_block)
> > > > {
> > > > ...
> > > > 1914: if (!page_has_buffers(page)) <<< page_has_buffers(page) == true
> > > > create_empty_buffers(page, blocksize, 0);
> > > > head = page_buffers(page); <<<< page_buffers(page) == NULL
> > > > <<<i've add debug info here:
> > > > <<< page->flags == 100000000000821
> > > > <<< PagePrivate(page) == 1, (page)->private == NULL
> > > > <<< So we have private page without buffers, it is WRONG.
> > > >

Right, I think the problem is the PagePrivate() is being set without a
valid page->private pointer.

The PagePrivate flag is being set by delayed allocation in the
ext4_wb_commit_write().

Looking at ext4-delayed-allocation.patch

+int ext4_wb_commit_write(struct file *file, struct page *page,
+ unsigned from, unsigned to)
+{
+ loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ struct inode *inode = page->mapping->host;
+ int err = 0;
+
+ wb_debug("commit page %lu (%u-%u) for inode %lu\n",
+ page->index, from, to, inode->i_ino);
+
+ /* mark page private so that we get
+ * called to invalidate/release page */
+ SetPagePrivate(page);
+
+ if (!PageBooked(page) && !PageMappedToDisk(page)) {
+ /* ->prepare_write() observed that block for this
+ * page hasn't been allocated yet. there fore it
+ * asked to reserve block for later allocation */
+ BUG_ON(page->private == 0);
+ page->private = 0;
+ err = ext4_wb_reserve_space_page(page, 1);
+ if (err)
+ return err;
+ }


It sets the PagePrivate flag at the commit_write() time. And the page-
>private is cleared to 0. page->private is being set to 1 at
ext4_wb_prepare_write():

+int ext4_wb_prepare_write(struct file *file, struct page *page,
+ unsigned from, unsigned to)
+{
+ struct inode *inode = page->mapping->host;
+ struct buffer_head bh, *bhw = &bh;
+ int err = 0;
+
+ wb_debug("prepare page %lu (%u-%u) for inode %lu\n",
+ page->index, from, to, page->mapping->host->i_ino);
+
+ /* if page is uptodate this means that ->prepare_write() has
+ * been called on page before and page is mapped to disk or
+ * we did reservation. page is protected and nobody can
+ * access it. hence, it safe to use page->private to pass
+ * flag that ->commit_write() has to reserve blocks. because
+ * an error may occur after ->prepare_write() we should not
+ * reserve block here. it's better to do in ->commit_write()
+ * when we're sure page is to be written */
+ page->private = 0;
+ if (!PageUptodate(page)) {
+ /* first write to this page */
+ bh.b_state = 0;
+ err = ext4_get_block(inode, page->index, bhw, 0);
+ if (err)
+ return err;
+ if (!buffer_mapped(bhw)) {
+ /* this block isn't allocated yet, reserve space */
+ wb_debug("reserve space for new block\n");
+ page->private = 1;
+ ext4_wb_clear_page(page, from, to);
+ ClearPageMappedToDisk(page);



>From the comments it says the page->private is set to 1 to letting
commit_write know that it needs block reservation, but I don't see the
page->private value being checked in ext4_wb_commit_write(). Instead,
the PageMappedToDisk(page) flag is being checked.

Alex, can you clarify the use of page->private and PagePrivate flag
here? Do we still need the page->private for delayed allocation, with
PageBooked flag and PageMappedToDisk page flag?

Thanks,

Mingming

2007-06-19 05:55:19

by Dmitri Monakhov

[permalink] [raw]
Subject: Re: [PATCH] ext4:fix invariant checking in ext4_rebalance_reservation

On 12:02 Сбт 16 Июн , Dmitry Monakhov wrote:
> Variable "free" was declarated as __u64 so conidition (free < 0) always
> false, even if free was overflowed during substraction.
> ---
> fs/ext4/balloc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 43ae8f8..a9655f1 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -1909,8 +1909,8 @@ void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
> chunk = free;
> if (rs[i].rs_reserved || i == smp_processor_id()) {
> rs[i].rs_reserved = chunk;
> + BUG_ON(free < chunk);
> free -= chunk;
> - BUG_ON(free < 0);
> }
> }
> BUG_ON(free);
> --
> 1.5.2
Signed-off-by: Dmitry Monakhov <[email protected]>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-06-19 07:11:22

by Alex Tomas

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

Mingming Cao wrote:
>>From the comments it says the page->private is set to 1 to letting
> commit_write know that it needs block reservation, but I don't see the
> page->private value being checked in ext4_wb_commit_write(). Instead,
> the PageMappedToDisk(page) flag is being checked.
>
> Alex, can you clarify the use of page->private and PagePrivate flag
> here? Do we still need the page->private for delayed allocation, with
> PageBooked flag and PageMappedToDisk page flag?

sorry for confusion, we need PagePrivate so that truncate calls our
->invalidatepage(). in turn this call is needed to drop unused reservation.

block_read_full_page() must not happen to pages being delayed allocated
in the first place - it's uptodate by definition. I think the problem
began when ext4_wb_commit_write() exited due to -ENOSPC but left not
uptodate *and* with PG_private. then subsequent access to page turned to
block_read_full_page() which relies on PG_private and meaningful private
field. Dmitry, could you repeat the test with SetPagePrivate(page) moved
to after that if() with ext4_wb_reserve_space_page(), please?


as i'm here ... status update: I've been reworking delayed allocation
patches to support blocksize < PAGE_CACHE_SIZE (and address akpm's request
for more generic implementation). the patch isn't ready for review, but
hopefully will be in few days.


thanks, Alex

2007-06-19 09:15:52

by Dmitri Monakhov

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On 11:11 Втр 19 Июн , Alex Tomas wrote:
> Mingming Cao wrote:
> >>From the comments it says the page->private is set to 1 to letting
> >commit_write know that it needs block reservation, but I don't see the
> >page->private value being checked in ext4_wb_commit_write(). Instead,
> >the PageMappedToDisk(page) flag is being checked.
> >
> >Alex, can you clarify the use of page->private and PagePrivate flag
> >here? Do we still need the page->private for delayed allocation, with
> >PageBooked flag and PageMappedToDisk page flag?
>
> sorry for confusion, we need PagePrivate so that truncate calls our
> ->invalidatepage(). in turn this call is needed to drop unused reservation.
>
> block_read_full_page() must not happen to pages being delayed allocated
> in the first place - it's uptodate by definition. I think the problem
> began when ext4_wb_commit_write() exited due to -ENOSPC but left not
> uptodate *and* with PG_private. then subsequent access to page turned to
> block_read_full_page() which relies on PG_private and meaningful private
> field. Dmitry, could you repeat the test with SetPagePrivate(page) moved
> to after that if() with ext4_wb_reserve_space_page(), please?
Yes it works. Bug not triggered now.

But whole approach based on using PagePrivate bit and page->private
ptr for special purposes is realy dengerous, and imho wrong,
because avery fs-related code assume that page->private points to
page_buffers. Especially this approach not work for blksize < pgsize.
The best way to store/pass any block related info is buffer_head flags.
This approach works for blksize < pgsize case too.
>
>
> as i'm here ... status update: I've been reworking delayed allocation
> patches to support blocksize < PAGE_CACHE_SIZE (and address akpm's request
> for more generic implementation). the patch isn't ready for review, but
> hopefully will be in few days.
>
>
> thanks, Alex
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2007-06-19 09:37:10

by Alex Tomas

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

Dmitry Monakhov wrote:
> But whole approach based on using PagePrivate bit and page->private
> ptr for special purposes is realy dengerous, and imho wrong,
> because avery fs-related code assume that page->private points to
> page_buffers. Especially this approach not work for blksize < pgsize.
> The best way to store/pass any block related info is buffer_head flags.
> This approach works for blksize < pgsize case too.

it's not dangerous as buffer_heads is optional facility
and any filesystem is free to use them or not. or even
use own mechanism like reiser4 does. this is why struct
page got private field instead of buffers (in 2.4).

buffer_heads have cost - 56 bytes/page (on 32bit arch)
+ cpu cycles to maintain them.

current delayed allocation isn't supposed to work with
bsize != PAGE_CACHE_SIZE. this will be resolved by the
upcoming patch, but bsize == PAGE_CACHE_SIZE is worthwhile
to optimize, I believe.

thanks, Alex

2007-06-19 16:24:01

by Mingming Cao

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On Tue, 2007-06-19 at 11:11 +0400, Alex Tomas wrote:
> Mingming Cao wrote:
> >>From the comments it says the page->private is set to 1 to letting
> > commit_write know that it needs block reservation, but I don't see the
> > page->private value being checked in ext4_wb_commit_write(). Instead,
> > the PageMappedToDisk(page) flag is being checked.
> >
> > Alex, can you clarify the use of page->private and PagePrivate flag
> > here? Do we still need the page->private for delayed allocation, with
> > PageBooked flag and PageMappedToDisk page flag?
>
> sorry for confusion, we need PagePrivate so that truncate calls our
> ->invalidatepage(). in turn this call is needed to drop unused reservation.


One more question, do we still need page->private? I don't see page-
>private value being checked anywhere. I see you cleared
PageMappedToDisk page flag at prepare_write, and checked
PageMappedToDisk in commit_write()...

Hmm, PageMappedToDisk is probably not sufficient enough for pagesize!
=blocksize. Is that the reason we need page->private to pass the
request?


> block_read_full_page() must not happen to pages being delayed allocated
> in the first place - it's uptodate by definition. I think the problem
> began when ext4_wb_commit_write() exited due to -ENOSPC but left not
> uptodate *and* with PG_private. then subsequent access to page turned to
> block_read_full_page() which relies on PG_private and meaningful private
> field. Dmitry, could you repeat the test with SetPagePrivate(page) moved
> to after that if() with ext4_wb_reserve_space_page(), please?
>
>
> as i'm here ... status update: I've been reworking delayed allocation
> patches to support blocksize < PAGE_CACHE_SIZE (and address akpm's request
> for more generic implementation). the patch isn't ready for review, but
> hopefully will be in few days.
>

That's good to know, thanks for the update. So probably above error case
handling will be addressed in the new version?

BTW, can you point me your latest and greatest mballoc patch? I am
trying to forward port and merge that patch to ext4 patch queue....


Thanks,


Mingming
>
> thanks, Alex
>
>

2007-06-19 16:47:33

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops



Mingming Cao wrote:
> O
>
> BTW, can you point me your latest and greatest mballoc patch? I am
> trying to forward port and merge that patch to ext4 patch queue....
>
>

I am looking at the one found at.

ftp://ftp.clusterfs.com/pub/people/alex/2.6.19-rc6


-aneesh

2007-06-19 21:50:08

by Andreas Dilger

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On Jun 19, 2007 13:16 +0400, Dmitry Monakhov wrote:
> But whole approach based on using PagePrivate bit and page->private
> ptr for special purposes is realy dengerous, and imho wrong,
> because avery fs-related code assume that page->private points to
> page_buffers.

This is actually a misconception. page->private can be anything. While
many fs helper routines use page->private as buffers, that only happens
when the fs itself causes those helper routines to be called. Lustre,
for example, has no buffer heads on the clients, and uses page->private
to track RPC batching information and data checksums.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-06-20 08:15:30

by Alex Tomas

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

Mingming Cao wrote:
> Hmm, PageMappedToDisk is probably not sufficient enough for pagesize!
> =blocksize. Is that the reason we need page->private to pass the
> request?

PageMappedToDisk isn't enough in that case, definitely. bh is the way
to track state of each block (this is what i'm implementing now), but
I think current nobh version (per-page flags are used) is valuable.

> That's good to know, thanks for the update. So probably above error case
> handling will be addressed in the new version?

well, you actually can move that SetPagePrivate() few lines above
(Dmitriy already tested this).

> BTW, can you point me your latest and greatest mballoc patch? I am
> trying to forward port and merge that patch to ext4 patch queue....

I don't have version for mainline yet. will prepare soon. thanks for
your interest.

thanks, Alex

2007-06-20 23:46:13

by Mingming Cao

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On Wed, 2007-06-20 at 12:15 +0400, Alex Tomas wrote:
> Mingming Cao wrote:
> > Hmm, PageMappedToDisk is probably not sufficient enough for pagesize!
> > =blocksize. Is that the reason we need page->private to pass the
> > request?
>
> PageMappedToDisk isn't enough in that case, definitely. bh is the way
> to track state of each block (this is what i'm implementing now), but
> I think current nobh version (per-page flags are used) is valuable.
>
> > That's good to know, thanks for the update. So probably above error case
> > handling will be addressed in the new version?
>
> well, you actually can move that SetPagePrivate() few lines above
> (Dmitriy already tested this).

Like this?

Index: linux-2.6.22-rc5/fs/ext4/writeback.c
===================================================================
--- linux-2.6.22-rc5.orig/fs/ext4/writeback.c 2007-06-20 16:41:26.000000000 -0700
+++ linux-2.6.22-rc5/fs/ext4/writeback.c 2007-06-20 16:44:10.000000000 -0700
@@ -918,16 +918,16 @@ int ext4_wb_commit_write(struct file *fi
wb_debug("commit page %lu (%u-%u) for inode %lu\n",
page->index, from, to, inode->i_ino);

- /* mark page private so that we get
- * called to invalidate/release page */
- SetPagePrivate(page);
-
if (!PageBooked(page) && !PageMappedToDisk(page)) {
/* ->prepare_write() observed that block for this
* page hasn't been allocated yet. there fore it
* asked to reserve block for later allocation */
BUG_ON(page->private == 0);
page->private = 0;
+ /* mark page private so that we get
+ * called to invalidate/release page */
+ SetPagePrivate(page);
+
err = ext4_wb_reserve_space_page(page, 1);
if (err)
return err;

2007-06-21 04:50:16

by Dmitri Monakhov

[permalink] [raw]
Subject: Re: delayed allocatiou result in Oops

On 16:46 Срд 20 Июн , Mingming Cao wrote:
> On Wed, 2007-06-20 at 12:15 +0400, Alex Tomas wrote:
> > Mingming Cao wrote:
> > > Hmm, PageMappedToDisk is probably not sufficient enough for pagesize!
> > > =blocksize. Is that the reason we need page->private to pass the
> > > request?
> >
> > PageMappedToDisk isn't enough in that case, definitely. bh is the way
> > to track state of each block (this is what i'm implementing now), but
> > I think current nobh version (per-page flags are used) is valuable.
> >
> > > That's good to know, thanks for the update. So probably above error case
> > > handling will be addressed in the new version?
> >
> > well, you actually can move that SetPagePrivate() few lines above
> > (Dmitriy already tested this).
>
> Like this?
Of course not, we may set page private bit only if
space was successfully reserved by ext4_wb_reserve_space_page().
So patch looks like this:
diff --git a/fs/ext4/writeback.c b/fs/ext4/writeback.c
index 3e669d6..84e5029 100644
--- a/fs/ext4/writeback.c
+++ b/fs/ext4/writeback.c
@@ -904,9 +904,6 @@ int ext4_wb_commit_write(struct file *file, struct page *page,
wb_debug("commit page %lu (%u-%u) for inode %lu\n",
page->index, from, to, inode->i_ino);

- /* mark page private so that we get
- * called to invalidate/release page */
- SetPagePrivate(page);

if (!PageBooked(page) && !PageMappedToDisk(page)) {
/* ->prepare_write() observed that block for this
@@ -918,6 +915,9 @@ int ext4_wb_commit_write(struct file *file, struct page *page,
if (err)
return err;
}
+ /* mark page private so that we get
+ * called to invalidate/release page */
+ SetPagePrivate(page);

/* ok. block for this page is allocated already or it has
* been reserved succesfully. so, user may use it */
--
1.5.2
>
> Index: linux-2.6.22-rc5/fs/ext4/writeback.c
> ===================================================================
> --- linux-2.6.22-rc5.orig/fs/ext4/writeback.c 2007-06-20 16:41:26.000000000 -0700
> +++ linux-2.6.22-rc5/fs/ext4/writeback.c 2007-06-20 16:44:10.000000000 -0700
> @@ -918,16 +918,16 @@ int ext4_wb_commit_write(struct file *fi
> wb_debug("commit page %lu (%u-%u) for inode %lu\n",
> page->index, from, to, inode->i_ino);
>
> - /* mark page private so that we get
> - * called to invalidate/release page */
> - SetPagePrivate(page);
> -
> if (!PageBooked(page) && !PageMappedToDisk(page)) {
> /* ->prepare_write() observed that block for this
> * page hasn't been allocated yet. there fore it
> * asked to reserve block for later allocation */
> BUG_ON(page->private == 0);
> page->private = 0;
> + /* mark page private so that we get
> + * called to invalidate/release page */
> + SetPagePrivate(page);
> +
> err = ext4_wb_reserve_space_page(page, 1);
> if (err)
> return err;
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html