2009-02-15 05:33:12

by Theodore Ts'o

[permalink] [raw]
Subject: ext4 not currently doing (much) multi-block allocation?

So I was looking at the ext4 code to see how hard it would be to add a
function that would take a struct inode *, and make sure that all of
the pages in the page cache had been allocated a physical block on
disk (but not necessarily writing the I/O to disk). The idea would be
to do this on close if the file had been truncated or opened with
O_TRUNC, and to also call this function if the inode had been renamed
and in the process a destination inode was freed. That way if we have
data=ordered, the blocks would be allocated, and at the next commit,
we would force the data blocks to disk.

While I was looking at the code, it looks to me like we are currently
only allocating a page at a time; ext4_da_writepages() may end up
allocating a number of pages, but it's doing it one page at a time,
not an extent at a time. So if the filesystem blocksize is 4k (and
the page size is 4k), the only time we will ever call the mballoc with
an allocation request greater than 1 is in the fallocate() system call
handler. This seems... non-optimal. Am I missing something?

- Ted




2009-02-15 11:06:26

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: ext4 not currently doing (much) multi-block allocation?

On Sun, Feb 15, 2009 at 12:32:06AM -0500, Theodore Tso wrote:
> So I was looking at the ext4 code to see how hard it would be to add a
> function that would take a struct inode *, and make sure that all of
> the pages in the page cache had been allocated a physical block on
> disk (but not necessarily writing the I/O to disk). The idea would be
> to do this on close if the file had been truncated or opened with
> O_TRUNC, and to also call this function if the inode had been renamed
> and in the process a destination inode was freed. That way if we have
> data=ordered, the blocks would be allocated, and at the next commit,
> we would force the data blocks to disk.
>
> While I was looking at the code, it looks to me like we are currently
> only allocating a page at a time; ext4_da_writepages() may end up
> allocating a number of pages, but it's doing it one page at a time,
> not an extent at a time. So if the filesystem blocksize is 4k (and
> the page size is 4k), the only time we will ever call the mballoc with
> an allocation request greater than 1 is in the fallocate() system call
> handler. This seems... non-optimal. Am I missing something?
>

Here is how it works. During writepages we loop through the dirty pages
and build largest contiguous block extent (mpage_add_bh_to_extent). Then we call
mpage_da_map_blocks. mpage_da_map_blocks does the mutli block request.
Once we have the blocks allocated we map these blocks to the pages. And
then we writeback one page at a time using writepage callback.

-aneesh

2009-02-15 13:36:37

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4 not currently doing (much) multi-block allocation?

On Sun, Feb 15, 2009 at 04:35:28PM +0530, Aneesh Kumar K.V wrote:
> Here is how it works. During writepages we loop through the dirty pages
> and build largest contiguous block extent (mpage_add_bh_to_extent). Then we call
> mpage_da_map_blocks. mpage_da_map_blocks does the mutli block request.
> Once we have the blocks allocated we map these blocks to the pages. And
> then we writeback one page at a time using writepage callback.

mpage_da_map_blocks() calls mpd->get_block, which is set to
ext4_da_get_block_write(), which allocates a single block at a time
(max_blocks is set to bh->b_size >> inode->i_blkbits).

Put another way, where is the call to ext4_get_blocks_wrap() which
does the multi-block request? I don't see it...

- Ted

2009-02-15 17:37:03

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: ext4 not currently doing (much) multi-block allocation?

On Sun, Feb 15, 2009 at 08:36:18AM -0500, Theodore Tso wrote:
> On Sun, Feb 15, 2009 at 04:35:28PM +0530, Aneesh Kumar K.V wrote:
> > Here is how it works. During writepages we loop through the dirty pages
> > and build largest contiguous block extent (mpage_add_bh_to_extent). Then we call
> > mpage_da_map_blocks. mpage_da_map_blocks does the mutli block request.
> > Once we have the blocks allocated we map these blocks to the pages. And
> > then we writeback one page at a time using writepage callback.
>
> mpage_da_map_blocks() calls mpd->get_block, which is set to
> ext4_da_get_block_write(), which allocates a single block at a time
> (max_blocks is set to bh->b_size >> inode->i_blkbits).


That bh>b_size indicate multiple blocks.

we do the below in mpage_add_bh_to_extent

2024 if (logical == next && (bh->b_state & BH_FLAGS) == lbh->b_state) {
2025 lbh->b_size += b_size;
2026 return;
2027 }

>
> Put another way, where is the call to ext4_get_blocks_wrap() which
> does the multi-block request? I don't see it...
>

-aneesh

2009-02-15 19:37:59

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4 not currently doing (much) multi-block allocation?

Aneesh Kumar K.V wrote:
> On Sun, Feb 15, 2009 at 08:36:18AM -0500, Theodore Tso wrote:
>> On Sun, Feb 15, 2009 at 04:35:28PM +0530, Aneesh Kumar K.V wrote:
>>> Here is how it works. During writepages we loop through the dirty pages
>>> and build largest contiguous block extent (mpage_add_bh_to_extent). Then we call
>>> mpage_da_map_blocks. mpage_da_map_blocks does the mutli block request.
>>> Once we have the blocks allocated we map these blocks to the pages. And
>>> then we writeback one page at a time using writepage callback.
>> mpage_da_map_blocks() calls mpd->get_block, which is set to
>> ext4_da_get_block_write(), which allocates a single block at a time
>> (max_blocks is set to bh->b_size >> inode->i_blkbits).
>
>
> That bh>b_size indicate multiple blocks.

I never did like this overloading of a buffer head for this sort of
purpose, for this (sometimes confusing) reason, but it's done throughout
the kernel... :)

But, maybe a comment would be in order just to make it clear "This is a
mapping BH" or something.

-Eric

> we do the below in mpage_add_bh_to_extent
>
> 2024 if (logical == next && (bh->b_state & BH_FLAGS) == lbh->b_state) {
> 2025 lbh->b_size += b_size;
> 2026 return;
> 2027 }
>
>> Put another way, where is the call to ext4_get_blocks_wrap() which
>> does the multi-block request? I don't see it...
>>
>


2009-02-15 21:22:47

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4 not currently doing (much) multi-block allocation?

On Sun, Feb 15, 2009 at 11:06:29PM +0530, Aneesh Kumar K.V wrote:
>
> That bh>b_size indicate multiple blocks.
>
> we do the below in mpage_add_bh_to_extent
>
> 2024 if (logical == next && (bh->b_state & BH_FLAGS) == lbh->b_state) {
> 2025 lbh->b_size += b_size;
> 2026 return;
> 2027 }
>

Urgh, right. mpd->lbh isn't a real struct buffer_head at all; the
only fields we use out of it is b_size, b_state, and b_blocknr. I
really dislike this coding style; it's hard to tell what is a real
buffer_head, and what is a fake buffer_head. It also wastes about 200
bytes of kernel stack space.

There's a similar problem with bh_result; in which functions is it a
real buffer head (and must be a real buffer head), and in which
functions is it a fake buffer head, and which functions is it
sometimes a real buffer_head, and when it is a fake buffer_head? This
is one of those things which makes for hard-to-maintain code.

- Ted