2013-09-09 03:25:41

by Yan, Zheng

[permalink] [raw]
Subject: [PATCH] ext4: fix delayed pages writback regression.

From: "Yan, Zheng" <[email protected]>

Our Linux Kernel Performance project found that commit 4e7ea81db5
(ext4: restructure writeback path) indroduced a read performance
regression. After the commit, ext4 does not merge adjacent delayed
pages during writeback. The regression is caused by the "buffer
mapped" check in mpage_add_bh_to_extent(), delayed dirty pages are
not mapped.

Signed-off-by: Yan, Zheng <[email protected]>
---
fs/ext4/inode.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c79fd7d..f2034cb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1944,8 +1944,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
struct ext4_map_blocks *map = &mpd->map;

/* Buffer that doesn't need mapping for writeback? */
- if (!buffer_dirty(bh) || !buffer_mapped(bh) ||
- (!buffer_delay(bh) && !buffer_unwritten(bh))) {
+ if (!buffer_dirty(bh) ||
+ (!buffer_mapped(bh) &&
+ !buffer_delay(bh) && !buffer_unwritten(bh))) {
/* So far no extent to map => we write the buffer right away */
if (map->m_len == 0)
return true;
--
1.8.1.4



2013-09-09 13:52:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix delayed pages writback regression.

Hello,

Thanks for testing and the report.

On Mon 09-09-13 11:25:17, Yan, Zheng wrote:
> From: "Yan, Zheng" <[email protected]>
>
> Our Linux Kernel Performance project found that commit 4e7ea81db5
> (ext4: restructure writeback path) indroduced a read performance
> regression. After the commit, ext4 does not merge adjacent delayed
Really "read performance regression"? Do you mean that the file was more
fragmented and therefore reading got slower? Or how exactly did a change in
writeback path cause read perfomance regression?

Also what benchmark and HW configuration do you use for testing? And how
big regression do you see exactly? I can try to reproduce the results...

> pages during writeback. The regression is caused by the "buffer
> mapped" check in mpage_add_bh_to_extent(), delayed dirty pages are
> not mapped.
This shouldn't happen. As a comment before ext4_da_get_block_prep()
describes, delayed allocated buffers should be marked with BH_Mapped |
BH_New | BH_Delay. So if you can see BH_Delay buffers without BH_Mapped set
that's a bug we should find.

Honza
>
> Signed-off-by: Yan, Zheng <[email protected]>
> ---
> fs/ext4/inode.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c79fd7d..f2034cb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1944,8 +1944,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
> struct ext4_map_blocks *map = &mpd->map;
>
> /* Buffer that doesn't need mapping for writeback? */
> - if (!buffer_dirty(bh) || !buffer_mapped(bh) ||
> - (!buffer_delay(bh) && !buffer_unwritten(bh))) {
> + if (!buffer_dirty(bh) ||
> + (!buffer_mapped(bh) &&
> + !buffer_delay(bh) && !buffer_unwritten(bh))) {
> /* So far no extent to map => we write the buffer right away */
> if (map->m_len == 0)
> return true;
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-09-10 02:02:35

by Yan, Zheng

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix delayed pages writback regression.

On Mon, Sep 9, 2013 at 9:52 PM, Jan Kara <[email protected]> wrote:
> Hello,
>
> Thanks for testing and the report.
>
> On Mon 09-09-13 11:25:17, Yan, Zheng wrote:
>> From: "Yan, Zheng" <[email protected]>
>>
>> Our Linux Kernel Performance project found that commit 4e7ea81db5
>> (ext4: restructure writeback path) indroduced a read performance
>> regression. After the commit, ext4 does not merge adjacent delayed
> Really "read performance regression"? Do you mean that the file was more
> fragmented and therefore reading got slower? Or how exactly did a change in
> writeback path cause read perfomance regression?
>
> Also what benchmark and HW configuration do you use for testing? And how
> big regression do you see exactly? I can try to reproduce the results...

It's fio benchmark, about 50% regression. job file is attached below.

---
[global]
direct=0
ioengine=mmap
size=1500M
bs=4k
pre_read=1
numjobs=1
overwrite=1
loops=5
runtime=300
group_reporting
invalidate=0
directory=/mnt/stp/fiodata
file_service_type=random:36
file_service_type=random:36

[job_sdb1_sub0]
startdelay=0
rw=randrw
filename=data0/f1:data0/f2

[job_sdb1_sub1]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1

[job_sdb1_sub2]
startdelay=0
rw=randrw
filename=data0/f1:data0/f2

[job_sdb1_sub3]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1

[job_sdb1_sub4]
startdelay=0
rw=randrw
filename=data0/f1:data0/f2

[job_sdb1_sub5]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1

[job_sdb1_sub6]
startdelay=0
rw=randrw
filename=data0/f1:data0/f2

[job_sdb1_sub7]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1

[job_sdc1_sub0]
startdelay=0
rw=randrw
filename=data1/f1:data1/f2

[job_sdc1_sub1]
startdelay=0
rw=randrw
filename=data1/f2:data1/f1

[job_sdc1_sub2]
startdelay=0
rw=randrw
filename=data1/f1:data1/f2

[job_sdc1_sub3]
startdelay=0
rw=randrw
filename=data1/f2:data1/f1

[job_sdc1_sub4]
startdelay=0
rw=randrw
filename=data1/f1:data1/f2

[job_sdc1_sub5]
startdelay=0
rw=randrw
filename=data1/f2:data1/f1

[job_sdc1_sub6]
startdelay=0
rw=randrw
filename=data1/f1:data1/f2

[job_sdc1_sub7]
startdelay=0
rw=randrw
filename=data1/f2:data1/f1

[job_sdd1_sub0]
startdelay=0
rw=randrw
filename=data2/f1:data2/f2

[job_sdd1_sub1]
startdelay=0
rw=randrw
filename=data2/f2:data2/f1

[job_sdd1_sub2]
startdelay=0
rw=randrw
filename=data2/f1:data2/f2

[job_sdd1_sub3]
startdelay=0
rw=randrw
filename=data2/f2:data2/f1

[job_sdd1_sub4]
startdelay=0
rw=randrw
filename=data2/f1:data2/f2

[job_sdd1_sub5]
startdelay=0
rw=randrw
filename=data2/f2:data2/f1

[job_sdd1_sub6]
startdelay=0
rw=randrw
filename=data2/f1:data2/f2

[job_sdd1_sub7]
startdelay=0
rw=randrw
filename=data2/f2:data2/f1

[job_sde1_sub0]
startdelay=0
rw=randrw
filename=data3/f1:data3/f2

[job_sde1_sub1]
startdelay=0
rw=randrw
filename=data3/f2:data3/f1

[job_sde1_sub2]
startdelay=0
rw=randrw
filename=data3/f1:data3/f2

[job_sde1_sub3]
startdelay=0
rw=randrw
filename=data3/f2:data3/f1

[job_sde1_sub4]
startdelay=0
rw=randrw
filename=data3/f1:data3/f2

[job_sde1_sub5]
startdelay=0
rw=randrw
filename=data3/f2:data3/f1

[job_sde1_sub6]
startdelay=0
rw=randrw
filename=data3/f1:data3/f2

[job_sde1_sub7]
startdelay=0
rw=randrw
filename=data3/f2:data3/f1


>
>> pages during writeback. The regression is caused by the "buffer
>> mapped" check in mpage_add_bh_to_extent(), delayed dirty pages are
>> not mapped.
> This shouldn't happen. As a comment before ext4_da_get_block_prep()
> describes, delayed allocated buffers should be marked with BH_Mapped |
> BH_New | BH_Delay. So if you can see BH_Delay buffers without BH_Mapped set
> that's a bug we should find.

Sorry, I re-ran the test and found it's the "(!buffer_delay(bh) &&
!buffer_unwritten(bh))"
check that prevents the merging. I will send a new patch soon

Regards
Yan, Zheng


>
> Honza
>>
>> Signed-off-by: Yan, Zheng <[email protected]>
>> ---
>> fs/ext4/inode.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index c79fd7d..f2034cb 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1944,8 +1944,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
>> struct ext4_map_blocks *map = &mpd->map;
>>
>> /* Buffer that doesn't need mapping for writeback? */
>> - if (!buffer_dirty(bh) || !buffer_mapped(bh) ||
>> - (!buffer_delay(bh) && !buffer_unwritten(bh))) {
>> + if (!buffer_dirty(bh) ||
>> + (!buffer_mapped(bh) &&
>> + !buffer_delay(bh) && !buffer_unwritten(bh))) {
>> /* So far no extent to map => we write the buffer right away */
>> if (map->m_len == 0)
>> return true;
>> --
>> 1.8.1.4
>>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html