From: Robin Dong <[email protected]>
We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
"dd" takes less than 1 second.

The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_cluster() scans all pages
in the cluster because no buffer is "delayed".
A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1GB file. That
severely hurts performance.

Therefore, we don't call ext4_find_delalloc_cluster() when using "nodelalloc".

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/extents.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..e15d32b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3724,7 +3724,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	if (!(flags & EXT4_GET_BLOCKS_PUNCH_OUT_EXT) &&
 		ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
 		if (!newex.ee_start_lo && !newex.ee_start_hi) {
-			if ((sbi->s_cluster_ratio > 1) &&
+			if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+			    (sbi->s_cluster_ratio > 1) &&
 			    ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
 				map->m_flags |= EXT4_MAP_FROM_CLUSTER;
 
@@ -3900,7 +3901,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 		}
 	}
 
-	if ((sbi->s_cluster_ratio > 1) &&
+	if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+	    (sbi->s_cluster_ratio > 1) &&
 	    ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
 		map->m_flags |= EXT4_MAP_FROM_CLUSTER;
 
--
1.7.4.1
Hi Robin,
If a file system is mounted with delalloc and is later switched to
nodelalloc mode, does the patch still work?
Yongqiang.
On Wed, Dec 7, 2011 at 2:04 PM, Robin Dong <[email protected]> wrote:
> [...]
--
Best Wishes
Yongqiang Yang
From: Robin Dong <[email protected]>
We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
"dd" takes less than 1 second.

The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() scans all pages
in the cluster because no buffer is "delayed".
A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1GB file. That
severely hurts performance.

Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/extents.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..60f5f25 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3282,6 +3282,9 @@ static int ext4_find_delalloc_range(struct inode *inode,
 	ext4_lblk_t i, pg_lblk;
 	pgoff_t index;
 
+	if (!test_opt(inode->i_sb, DELALLOC))
+		return 0;
+
 	/* reverse search wont work if fs block size is less than page size */
 	if (inode->i_blkbits < PAGE_CACHE_SHIFT)
 		search_hint_reverse = 0;
--
1.7.4.1
On Thu, Dec 8, 2011 at 2:59 PM, Robin Dong <[email protected]> wrote:
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1GB file. That
> severely hurts performance.
Looks good to me.
I think the delayed extent tree can help a lot in the delalloc case
when a cluster has hundreds of pages.
Hi Ted,
Any plans on merging the delayed extent tree patches?
Yongqiang.
--
Best Wishes
Yongqiang Yang
On Thu, Dec 08, 2011 at 02:59:54PM +0800, Robin Dong wrote:
> Signed-off-by: Robin Dong <[email protected]>
Thanks, applied.
- Ted