From: Robin Dong <[email protected]>
When I run the shell script below for about 10 minutes on a 16-core server (upstream kernel):
DEV=/dev/sdc
FILE=/test/hello
do_write()
{
	while [ 1 ]
	do
		dd if=/dev/zero of=$FILE bs=1k count=$1 conv=notrunc &> /dev/null
	done
}
do_truncate()
{
	while [ 1 ]
	do
		truncate -s $1 $FILE
	done
}
mke2fs -m 0 -C 1048576 -O ^has_journal,^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,bigalloc $DEV
mount -t ext4 $DEV /test/
do_write 1 &
do_write 3 &
do_write 5 &
do_write 7 &
do_truncate 0 &
do_truncate 0 &
do_truncate 0 &
The "Used" ratio of the ext4 filesystem (as reported by the "df" command) grows very fast until it reaches 100%, even though the file in /test/ never grows beyond 7k.
Imagine a file that has only one page (0~4k), which is delayed and not yet written back (so i_reserved_data_blocks is 1).
Now two processes arrive: process0 truncates page0 (bh0) and process1 writes page1 (bh1). The race looks like this:
process0                                  process1
-->truncate
  -->ext4_da_invalidatepage
    -->ext4_da_page_release_reservation
      -->clear_buffer_delay(bh0)
                                          -->ext4_da_map_blocks
                                            -->ext4_ext_map_blocks
                                              -->map->m_flags |= EXT4_MAP_FROM_CLUSTER
                                                 (because bh0 is not delay now)
                                            -->ext4_da_reserve_space
                                               (i_reserved_data_blocks is 2 now)
      (the bh1 is delay, so ext4_da_release_space
       will not be called)
After bh1 is written back, i_reserved_data_blocks is 1, but there is no dirty cluster left in the fs.
Subsequent write operations will call ext4_da_update_reserve_space, but sbi->s_dirtyclusters_counter will not be decreased, because i_reserved_data_blocks never returns to zero. As a result, s_dirtyclusters_counter grows quickly.
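
The interleaving above can be replayed in a small userspace toy model. This is only a sketch for illustration and is not kernel code: every toy_* name is hypothetical, the single "reserved" field stands in for i_reserved_data_blocks, and the rule "reserve only when no buffer in the cluster is delayed" is a simplified assumption about the cluster check described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one cluster with two buffers (bh0, bh1). */
struct toy_bh { bool delay; };

struct toy_inode {
	struct toy_bh bh[2];
	int reserved;	/* stands in for i_reserved_data_blocks */
};

/* True if any buffer in the cluster is still marked delayed. */
static bool toy_cluster_has_delay(const struct toy_inode *in)
{
	return in->bh[0].delay || in->bh[1].delay;
}

/* Write side: reserve a cluster unless a delayed buffer already covers it. */
static void toy_write(struct toy_inode *in, int idx)
{
	if (!toy_cluster_has_delay(in))
		in->reserved++;
	in->bh[idx].delay = true;
}

/* Truncate side, step 1: clear the delay bit of the truncated buffer. */
static void toy_truncate_clear(struct toy_inode *in, int idx)
{
	in->bh[idx].delay = false;
}

/* Truncate side, step 2: release only if nothing in the cluster is delayed. */
static void toy_truncate_release(struct toy_inode *in)
{
	if (!toy_cluster_has_delay(in))
		in->reserved--;
}

/* Writeback: the delayed buffer is written and consumes one reservation. */
static void toy_writeback(struct toy_inode *in, int idx)
{
	assert(in->bh[idx].delay);
	in->bh[idx].delay = false;
	in->reserved--;
}

/* Racy order: the write lands between the two truncate steps. */
int run_racy(void)
{
	struct toy_inode in = { .bh = { { true }, { false } }, .reserved = 1 };

	toy_truncate_clear(&in, 0);	/* process0 */
	toy_write(&in, 1);		/* process1: reserved becomes 2 */
	toy_truncate_release(&in);	/* process0: bh1 is delayed, no release */
	toy_writeback(&in, 1);
	return in.reserved;		/* leaked reservation */
}

/* Serialized order: truncate finishes before the write starts. */
int run_serialized(void)
{
	struct toy_inode in = { .bh = { { true }, { false } }, .reserved = 1 };

	toy_truncate_clear(&in, 0);
	toy_truncate_release(&in);
	toy_write(&in, 1);
	toy_writeback(&in, 1);
	return in.reserved;
}
```

In this model run_racy() ends with one reservation still held although no buffer is delayed, while run_serialized() ends at zero. The serialized run corresponds to keeping the two truncate steps from interleaving with the write, which is what taking i_data_sem in ext4_da_invalidatepage is meant to achieve.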
Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/inode.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..9b3ceac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1209,10 +1209,10 @@ static void ext4_da_page_release_reservation(struct page *page,
do {
unsigned int next_off = curr_off + bh->b_size;
- if ((offset <= curr_off) && (buffer_delay(bh))) {
+ if ((offset <= curr_off) && buffer_delay(bh) &&
+ !buffer_da_mapped(bh)) {
to_release++;
clear_buffer_delay(bh);
- clear_buffer_da_mapped(bh);
}
curr_off = next_off;
} while ((bh = bh->b_this_page) != head);
@@ -2544,6 +2544,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
* Drop reserved blocks
*/
BUG_ON(!PageLocked(page));
+ down_write(&EXT4_I(page->mapping->host)->i_data_sem);
if (!page_has_buffers(page))
goto out;
@@ -2552,6 +2553,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
out:
ext4_invalidatepage(page, offset);
+ up_write(&EXT4_I(page->mapping->host)->i_data_sem);
return;
}
--
1.7.3.2