2011-05-17 22:59:30

by Jiaying Zhang

[permalink] [raw]
Subject: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
flag and then ftruncate the file to a size larger than the file's i_size,
any allocated but unwritten blocks will be freed but the file size is set
to the size that ftruncate specifies.

Here is a simple test to reproduce the problem:
1. fallocate a 12k size file with KEEP_SIZE flag
2. write the first 4k
3. ftruncate the file to 8k
Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
shows the file has only the first written block left.

Below is the proposed patch to fix the bug:

ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().

Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
of ext4_truncate(inode) when it needs to truncate an inode so that
if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
to a size larger than the inode's i_size, we will only truncate the blocks
beyond the specified truncate size instead of all of blocks beyond i_size.

Signed-off-by: Jiaying Zhang <[email protected]>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3424e82..3bfad57 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
}
}
/* ext4_truncate will clear the flag */
- if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
- ext4_truncate(inode);
+ if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
+ rc = vmtruncate(inode, attr->ia_size);
+ if (rc)
+ goto err_out;
+ }
}

if ((attr->ia_valid & ATTR_SIZE) &&


2011-05-18 02:46:37

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <[email protected]> wrote:
> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> flag and then ftruncate the file to a size larger than the file's i_size,
> any allocated but unwritten blocks will be freed but the file size is set
> to the size that ftruncate specifies.
>
> Here is a simple test to reproduce the problem:
> ?1. fallocate a 12k size file with KEEP_SIZE flag
> ?2. write the first 4k
> ?3. ftruncate the file to 8k
> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> shows the file has only the first written block left.
>
> Below is the proposed patch to fix the bug:
>
> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>
> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
> of ext4_truncate(inode) when it needs to truncate an inode so that
> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
> to a size larger than the inode's i_size, we will only truncate the blocks
> beyond the specified truncate size instead of all of blocks beyond i_size.
>
> Signed-off-by: Jiaying Zhang <[email protected]>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3424e82..3bfad57 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> ? ? ? ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
> - ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> - ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> + ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
> + ? ? ? ? ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
> + ? ? ? ? ? ? ? ? ? ? ? if (rc)
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto err_out;
> + ? ? ? ? ? ? ? }
Hi there,
It seems that the error handling has problems. 1)We should handle
error as the below code does. or 2) we can add a OR condition to the
if statement below so that it can handle case here. I prefer the 2nd
solution. it looks like:
- ? ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
- ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
- ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);

- if ((attr->ia_valid & ATTR_SIZE) &&
- attr->ia_size != i_size_read(inode))
+ if (((attr->ia_valid & ATTR_SIZE) &&
+ attr->ia_size != i_size_read(inode)) ||
+ ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
rc = vmtruncate(inode, attr->ia_size);

Yongqiang.
> ? ? ? ?}
>
> ? ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&


> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>



--
Best Wishes
Yongqiang Yang

2011-05-18 02:56:43

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <[email protected]> wrote:
> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> flag and then ftruncate the file to a size larger than the file's i_size,
> any allocated but unwritten blocks will be freed but the file size is set
> to the size that ftruncate specifies.
>
> Here is a simple test to reproduce the problem:
> ?1. fallocate a 12k size file with KEEP_SIZE flag
> ?2. write the first 4k
> ?3. ftruncate the file to 8k
> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> shows the file has only the first written block left.
>
> Below is the proposed patch to fix the bug:
>
> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>
> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
> of ext4_truncate(inode) when it needs to truncate an inode so that
> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
> to a size larger than the inode's i_size, we will only truncate the blocks
> beyond the specified truncate size instead of all of blocks beyond i_size.
>
> Signed-off-by: Jiaying Zhang <[email protected]>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3424e82..3bfad57 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> ? ? ? ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
> - ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> - ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> + ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
> + ? ? ? ? ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
> + ? ? ? ? ? ? ? ? ? ? ? if (rc)
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto err_out;
> + ? ? ? ? ? ? ? }
> ? ? ? ?}
Another question, does vmtruncate() write zeros to blocks beyond old
EOF? I had a rough look and could not find the code doing this, maybe
due to my density.

Yongqiang.
>
> ? ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>



--
Best Wishes
Yongqiang Yang

2011-05-18 03:19:10

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On 5/17/11 5:59 PM, Jiaying Zhang wrote:
> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> flag and then ftruncate the file to a size larger than the file's i_size,
> any allocated but unwritten blocks will be freed but the file size is set
> to the size that ftruncate specifies.
>
> Here is a simple test to reproduce the problem:
> 1. fallocate a 12k size file with KEEP_SIZE flag
> 2. write the first 4k
> 3. ftruncate the file to 8k
> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> shows the file has only the first written block left.

To be honest I'm not 100% certain what the fiesystem -should- do in this case.

If I go through that same sequence on xfs, I get 4k written / 8k unwritten:

# xfs_bmap -vp testfile
testfile:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000

size 8k:
# ls -l testfile
-rw-r--r-- 1 root root 8192 May 17 22:33 testfile

and diskspace used 12k:
# du -hc testfile
12K testfile
12K total

I think this is a different result from ext4, either with or without your patch.

On ext4 I get size 8k, but only the first 4k mapped, as you say.

I don't recall when truncate is supposed to free fallocated blocks, and from what point?

-Eric

> Below is the proposed patch to fix the bug:
>
> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>
> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
> of ext4_truncate(inode) when it needs to truncate an inode so that
> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
> to a size larger than the inode's i_size, we will only truncate the blocks
> beyond the specified truncate size instead of all of blocks beyond i_size.
>
> Signed-off-by: Jiaying Zhang <[email protected]>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3424e82..3bfad57 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> }
> }
> /* ext4_truncate will clear the flag */
> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> - ext4_truncate(inode);
> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
> + rc = vmtruncate(inode, attr->ia_size);
> + if (rc)
> + goto err_out;
> + }
> }
>
> if ((attr->ia_valid & ATTR_SIZE) &&
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-05-18 03:27:26

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Wed, May 18, 2011 at 10:56 AM, Yongqiang Yang <[email protected]> wrote:
> On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <[email protected]> wrote:
>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> flag and then ftruncate the file to a size larger than the file's i_size,
>> any allocated but unwritten blocks will be freed but the file size is set
>> to the size that ftruncate specifies.
>>
>> Here is a simple test to reproduce the problem:
>> ?1. fallocate a 12k size file with KEEP_SIZE flag
>> ?2. write the first 4k
>> ?3. ftruncate the file to 8k
>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> shows the file has only the first written block left.
>>
>> Below is the proposed patch to fix the bug:
>>
>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>>
>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
>> of ext4_truncate(inode) when it needs to truncate an inode so that
>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
>> to a size larger than the inode's i_size, we will only truncate the blocks
>> beyond the specified truncate size instead of all of blocks beyond i_size.
>>
>> Signed-off-by: Jiaying Zhang <[email protected]>
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 3424e82..3bfad57 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>> ? ? ? ? ? ? ? ? ? ? ? ?}
>> ? ? ? ? ? ? ? ?}
>> ? ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
>> - ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
>> - ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
>> + ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
>> + ? ? ? ? ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
>> + ? ? ? ? ? ? ? ? ? ? ? if (rc)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto err_out;
>> + ? ? ? ? ? ? ? }
>> ? ? ? ?}
> Another question, does vmtruncate() write zeros to blocks beyond old
> EOF? ?I had a rough look and could not find the code doing this, maybe
> due to my density.
Sorry, ignore this, the extent is unwritten, no need to zero.
>
> Yongqiang.
>>
>> ? ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Best Wishes
> Yongqiang Yang
>



--
Best Wishes
Yongqiang Yang

2011-05-18 05:35:42

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On May 17, 2011, at 21:19, Eric Sandeen wrote:
> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> flag and then ftruncate the file to a size larger than the file's i_size,
>> any allocated but unwritten blocks will be freed but the file size is set
>> to the size that ftruncate specifies.
>>
>> Here is a simple test to reproduce the problem:
>> 1. fallocate a 12k size file with KEEP_SIZE flag
>> 2. write the first 4k
>> 3. ftruncate the file to 8k
>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> shows the file has only the first written block left.
>
> To be honest I'm not 100% certain what the fiesystem -should- do in this case.

I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space.

> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>
> # xfs_bmap -vp testfile
> testfile:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000
>
> size 8k:
> # ls -l testfile
> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>
> and diskspace used 12k:
> # du -hc testfile
> 12K testfile
> 12K total
>
> I think this is a different result from ext4, either with or without your patch.
>
> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>
> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>
> -Eric
>
>> Below is the proposed patch to fix the bug:
>>
>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>>
>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
>> of ext4_truncate(inode) when it needs to truncate an inode so that
>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
>> to a size larger than the inode's i_size, we will only truncate the blocks
>> beyond the specified truncate size instead of all of blocks beyond i_size.
>>
>> Signed-off-by: Jiaying Zhang <[email protected]>
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 3424e82..3bfad57 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>> }
>> }
>> /* ext4_truncate will clear the flag */
>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
>> - ext4_truncate(inode);
>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
>> + rc = vmtruncate(inode, attr->ia_size);
>> + if (rc)
>> + goto err_out;
>> + }
>> }
>>
>> if ((attr->ia_valid & ATTR_SIZE) &&
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






2011-05-18 06:18:19

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote:
> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> > flag and then ftruncate the file to a size larger than the file's i_size,
> > any allocated but unwritten blocks will be freed but the file size is set
> > to the size that ftruncate specifies.
> >
> > Here is a simple test to reproduce the problem:
> > 1. fallocate a 12k size file with KEEP_SIZE flag
> > 2. write the first 4k
> > 3. ftruncate the file to 8k
> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> > shows the file has only the first written block left.
>
> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>
> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>
> # xfs_bmap -vp testfile
> testfile:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000

Ok, so that's the case for a _truncate up_ from 4k to 8k:

$ rm /mnt/test/foo
$ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 0
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..23]: 9712..9735 0 (9712..9735) 24 10000
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec)
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 8192
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 2
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

But you get a different result on truncate down:

$rm /mnt/test/foo
$ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 12288
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..23]: 9584..9607 0 (9584..9607) 24 10000
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec)
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
1: [8..23]: 9592..9607 0 (9592..9607) 16 10000
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
1: [8..15]: 9592..9599 0 (9592..9599) 8 10000
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 8192
stat.blocks = 16
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 2
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136

IOWs, on XFS a truncate up does not change the preallocation at all,
while a truncate down will _always_ remove preallocation beyond the
new EOF. It's always had this behaviour w.r.t. to truncate(2) and
preallocation beyond EOF.

> I think this is a different result from ext4, either with or without your patch.
>
> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>
> I don't recall when truncate is supposed to free fallocated blocks, and from what point?

It's entirely up to the filesystem how it treats blocks beyond EOF
during truncation. XFS frees them on truncate down, because it is
much safer to just truncate away everything beyond the new EOF than
to leave written extents beyond EOF as potential landmines.

Indeed, that's why calling vmtruncate() as a bad fix. If you have:


NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
....----+----------+--------+--------+
A B C D

Where A = new EOF (N)
A->B = unwritten (U)
B->C = written (W)
C = old EOF (O)
C->D = unwritten (U)

Then just calling vmtruncate() will leave the blocks in the range
B->C as written blocks. Hence then doing an extending truncate back
out to D will expose stale data rather than zeros in the range
B->C....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-05-18 14:05:31

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On 5/18/11 1:13 AM, Dave Chinner wrote:

> It's entirely up to the filesystem how it treats blocks beyond EOF
> during truncation. XFS frees them on truncate down, because it is
> much safer to just truncate away everything beyond the new EOF than
> to leave written extents beyond EOF as potential landmines.
>
> Indeed, that's why calling vmtruncate() as a bad fix. If you have:
>
>
> NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
> ....----+----------+--------+--------+
> A B C D
>
> Where A = new EOF (N)
> A->B = unwritten (U)
> B->C = written (W)
> C = old EOF (O)
> C->D = unwritten (U)
>
> Then just calling vmtruncate() will leave the blocks in the range
> B->C as written blocks. Hence then doing an extending truncate back
> out to D will expose stale data rather than zeros in the range
> B->C....

Hm, running recent xfstests which includes fsx-with-fallocate should probably
eventually catch that then.

-Eric

> Cheers,
>
> Dave.


2011-05-18 20:29:27

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <[email protected]> wrote:
>
> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> > flag and then ftruncate the file to a size larger than the file's i_size,
> > any allocated but unwritten blocks will be freed but the file size is set
> > to the size that ftruncate specifies.
> >
> > Here is a simple test to reproduce the problem:
> > ? 1. fallocate a 12k size file with KEEP_SIZE flag
> > ? 2. write the first 4k
> > ? 3. ftruncate the file to 8k
> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> > shows the file has only the first written block left.
>
> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>
> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>
> # xfs_bmap -vp testfile
> testfile:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ? ? ? ?AG AG-OFFSET ? ? ? ? ? ? ?TOTAL FLAGS
> ? 0: [0..7]: ? ? ? ? ?2648750760..2648750767 ?3 (356066400..356066407) ? ? 8 00000
> ? 1: [8..23]: ? ? ? ? 2648750768..2648750783 ?3 (356066408..356066423) ? ?16 10000
>
> size 8k:
> # ls -l testfile
> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>
> and diskspace used 12k:
> # du -hc testfile
> 12K ? ? testfile
> 12K ? ? total
>
> I think this is a different result from ext4, either with or without your patch.
>
> On ext4 I get size 8k, but only the first 4k mapped, as you say.
I agree that truncating to a size larger than i_size is un-specified by
POSIX. However, I think the problem with the current behavior is that
we have an inconsistency between file's i_size and its extent tree.
Now we have 8k i_size but the file has only 4k space allocated. That
can confuse applications.

Jiaying
>
> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>
> -Eric
>
> > Below is the proposed patch to fix the bug:
> >
> > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
> >
> > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
> > of ext4_truncate(inode) when it needs to truncate an inode so that
> > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
> > to a size larger than the inode's i_size, we will only truncate the blocks
> > beyond the specified truncate size instead of all of blocks beyond i_size.
> >
> > Signed-off-by: Jiaying Zhang <[email protected]>
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 3424e82..3bfad57 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> > ? ? ? ? ? ? ? ? ? ? ? }
> > ? ? ? ? ? ? ? }
> > ? ? ? ? ? ? ? /* ext4_truncate will clear the flag */
> > - ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> > - ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> > + ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
> > + ? ? ? ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
> > + ? ? ? ? ? ? ? ? ? ? if (rc)
> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto err_out;
> > + ? ? ? ? ? ? }
> > ? ? ? }
> >
> > ? ? ? if ((attr->ia_valid & ATTR_SIZE) &&
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2011-05-18 20:31:34

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On 5/18/11 3:28 PM, Jiaying Zhang wrote:
>
>
> On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <[email protected] <mailto:[email protected]>> wrote:
>
> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> > flag and then ftruncate the file to a size larger than the file's i_size,
> > any allocated but unwritten blocks will be freed but the file size is set
> > to the size that ftruncate specifies.
> >
> > Here is a simple test to reproduce the problem:
> > 1. fallocate a 12k size file with KEEP_SIZE flag
> > 2. write the first 4k
> > 3. ftruncate the file to 8k
> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> > shows the file has only the first written block left.
>
> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>
> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>
> # xfs_bmap -vp testfile
> testfile:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000
>
> size 8k:
> # ls -l testfile
> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>
> and diskspace used 12k:
> # du -hc testfile
> 12K testfile
> 12K total
>
> I think this is a different result from ext4, either with or without your patch.
>
> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>
> I agree that truncating to a size larger than i_size is un-specified by
> POSIX. However, I think the problem with the current behavior is that
> we have an inconsistency between file's i_size and its extent tree.
> Now we have 8k i_size but the file has only 4k space allocated. That
> can confuse applications.

That's called "a sparse file" right? Apps should not be confused by that ...

-Eric

2011-05-18 20:32:56

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <[email protected]> wrote:
> On May 17, 2011, at 21:19, Eric Sandeen wrote:
>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>>> flag and then ftruncate the file to a size larger than the file's i_size,
>>> any allocated but unwritten blocks will be freed but the file size is set
>>> to the size that ftruncate specifies.
>>>
>>> Here is a simple test to reproduce the problem:
>>> 1. fallocate a 12k size file with KEEP_SIZE flag
>>> 2. write the first 4k
>>> 3. ftruncate the file to 8k
>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>>> shows the file has only the first written block left.
>>
>> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>
> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. ?For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space.
>
Indeed we have applications that are doing exactly the same as you
described. They always fallocate files to a pre-defined size with
KEEP_SIZE flag and if they end up using less than the allocated size,
they ftruncate files to their written size later.

Jiaying

>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>
>> # xfs_bmap -vp testfile
>> testfile:
>> EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ? ? ? ?AG AG-OFFSET ? ? ? ? ? ? ?TOTAL FLAGS
>> 0: [0..7]: ? ? ? ? ?2648750760..2648750767 ?3 (356066400..356066407) ? ? 8 00000
>> 1: [8..23]: ? ? ? ? 2648750768..2648750783 ?3 (356066408..356066423) ? ?16 10000
>>
>> size 8k:
>> # ls -l testfile
>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>>
>> and diskspace used 12k:
>> # du -hc testfile
>> 12K ? testfile
>> 12K ? total
>>
>> I think this is a different result from ext4, either with or without your patch.
>>
>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>
>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>>
>> -Eric
>>
>>> Below is the proposed patch to fix the bug:
>>>
>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>>>
>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
>>> of ext4_truncate(inode) when it needs to truncate an inode so that
>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
>>> to a size larger than the inode's i_size, we will only truncate the blocks
>>> beyond the specified truncate size instead of all of blocks beyond i_size.
>>>
>>> Signed-off-by: Jiaying Zhang <[email protected]>
>>>
>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>> index 3424e82..3bfad57 100644
>>> --- a/fs/ext4/inode.c
>>> +++ b/fs/ext4/inode.c
>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>>> ? ? ? ? ? ? ? ? ? ? ?}
>>> ? ? ? ? ? ? ?}
>>> ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
>>> - ? ? ? ? ? ?if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
>>> - ? ? ? ? ? ? ? ? ? ?ext4_truncate(inode);
>>> + ? ? ? ? ? ?if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
>>> + ? ? ? ? ? ? ? ? ? ?rc = vmtruncate(inode, attr->ia_size);
>>> + ? ? ? ? ? ? ? ? ? ?if (rc)
>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto err_out;
>>> + ? ? ? ? ? ?}
>>> ? ? ?}
>>>
>>> ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
>

2011-05-18 20:38:25

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Wed, May 18, 2011 at 1:31 PM, Eric Sandeen <[email protected]> wrote:
> On 5/18/11 3:28 PM, Jiaying Zhang wrote:
>>
>>
>> On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <[email protected] <mailto:[email protected]>> wrote:
>>
>> ? ? On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>> ? ? > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> ? ? > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> ? ? > flag and then ftruncate the file to a size larger than the file's i_size,
>> ? ? > any allocated but unwritten blocks will be freed but the file size is set
>> ? ? > to the size that ftruncate specifies.
>> ? ? >
>> ? ? > Here is a simple test to reproduce the problem:
>> ? ? > ? 1. fallocate a 12k size file with KEEP_SIZE flag
>> ? ? > ? 2. write the first 4k
>> ? ? > ? 3. ftruncate the file to 8k
>> ? ? > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> ? ? > shows the file has only the first written block left.
>>
>> ? ? To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>>
>> ? ? If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>
>> ? ? # xfs_bmap -vp testfile
>> ? ? testfile:
>> ? ? ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ? ? ? ?AG AG-OFFSET ? ? ? ? ? ? ?TOTAL FLAGS
>> ? ? ? 0: [0..7]: ? ? ? ? ?2648750760..2648750767 ?3 (356066400..356066407) ? ? 8 00000
>> ? ? ? 1: [8..23]: ? ? ? ? 2648750768..2648750783 ?3 (356066408..356066423) ? ?16 10000
>>
>> ? ? size 8k:
>> ? ? # ls -l testfile
>> ? ? -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>>
>> ? ? and diskspace used 12k:
>> ? ? # du -hc testfile
>> ? ? 12K ? ? testfile
>> ? ? 12K ? ? total
>>
>> ? ? I think this is a different result from ext4, either with or without your patch.
>>
>> ? ? On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>
>> I agree that truncating to a size larger than i_size is un-specified by
>> POSIX. However, I think the problem with the current behavior is that
>> we have an inconsistency between file's i_size and its extent tree.
>> Now we have 8k i_size but the file has only 4k space allocated. That
>> can confuse applications.
>
> That's called "a sparse file" right? ?Apps should not be confused by that ...
But applications don't intend to create a sparse file in this case. They may
create a lot of such files in this way with 'df' showing there is
still plenty of
free space available but as they start writing to these files they will fill up
space and get ENOSPC unexpectedly.

Jiaying
>
> -Eric
>

2011-05-18 20:42:38

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Tue, May 17, 2011 at 11:13 PM, Dave Chinner <[email protected]> wrote:
> On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote:
>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> > flag and then ftruncate the file to a size larger than the file's i_size,
>> > any allocated but unwritten blocks will be freed but the file size is set
>> > to the size that ftruncate specifies.
>> >
>> > Here is a simple test to reproduce the problem:
>> > ? 1. fallocate a 12k size file with KEEP_SIZE flag
>> > ? 2. write the first 4k
>> > ? 3. ftruncate the file to 8k
>> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> > shows the file has only the first written block left.
>>
>> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>>
>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>
>> # xfs_bmap -vp testfile
>> testfile:
>> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ? ? ? ?AG AG-OFFSET ? ? ? ? ? ? ?TOTAL FLAGS
>> ? ?0: [0..7]: ? ? ? ? ?2648750760..2648750767 ?3 (356066400..356066407) ? ? 8 00000
>> ? ?1: [8..23]: ? ? ? ? 2648750768..2648750783 ?3 (356066408..356066423) ? ?16 10000
>
> Ok, so that's the case for a _truncate up_ from 4k to 8k:
>
> $ rm /mnt/test/foo
> $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 0
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..23]: ? ? ? ? 9712..9735 ? ? ? ?0 (9712..9735) ? ? ? ?24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec)
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..7]: ? ? ? ? ?9712..9719 ? ? ? ?0 (9712..9719) ? ? ? ? 8 00000
> ? 1: [8..23]: ? ? ? ? 9720..9735 ? ? ? ?0 (9720..9735) ? ? ? ?16 10000
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..7]: ? ? ? ? ?9712..9719 ? ? ? ?0 (9712..9719) ? ? ? ? 8 00000
> ? 1: [8..23]: ? ? ? ? 9720..9735 ? ? ? ?0 (9720..9735) ? ? ? ?16 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> But you get a different result on truncate down:
>
> $rm /mnt/test/foo
> $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 12288
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..23]: ? ? ? ? 9584..9607 ? ? ? ?0 (9584..9607) ? ? ? ?24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec)
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..7]: ? ? ? ? ?9584..9591 ? ? ? ?0 (9584..9591) ? ? ? ? 8 00000
> ? 1: [8..23]: ? ? ? ? 9592..9607 ? ? ? ?0 (9592..9607) ? ? ? ?16 10000
> /mnt/test/foo:
> ?EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ?AG AG-OFFSET ? ? ? ?TOTAL FLAGS
> ? 0: [0..7]: ? ? ? ? ?9584..9591 ? ? ? ?0 (9584..9591) ? ? ? ? 8 00000
> ? 1: [8..15]: ? ? ? ? 9592..9599 ? ? ? ?0 (9592..9599) ? ? ? ? 8 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 16
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> IOWs, on XFS a truncate up does not change the preallocation at all,
> while a truncate down will _always_ remove preallocation beyond the
> new EOF. ?It's always had this behaviour w.r.t. to truncate(2) and
> preallocation beyond EOF.
>
>> I think this is a different result from ext4, either with or without your patch.
>>
>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>
>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>
> It's entirely up to the filesystem how it treats blocks beyond EOF
> during truncation. XFS frees them on truncate down, because it is
> much safer to just truncate away everything beyond the new EOF than
> to leave written extents beyond EOF as potential landmines.
>
> Indeed, that's why calling vmtruncate() as a bad fix. If you have:
>
>
> ? ? ? ? ? ? ? NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
> ? ? ? ....----+----------+--------+--------+
> ? ? ? ? ? ? ? A ? ? ? ? ?B ? ? ? ?C ? ? ? ?D
>
> Where ? A = new EOF (N)
> ? ? ? ?A->B = unwritten (U)
> ? ? ? ?B->C = written (W)
> ? ? ? ?C = old EOF (O)
> ? ? ? ?C->D = unwritten (U)
>
> Then just calling vmtruncate() will leave the blocks in the range
> B->C as written blocks. Hence then doing an extending truncate back
> out to D will expose stale data rather than zeros in the range
> B->C....
Sorry I am a little confused. If I understand correctly, in the situation
you described, we call a truncate that causes EOF to change from
C to A. On ext4, we should free all of blocks after A. And when we
do an extending truncate to D, any blocks beyond A should be treated
as unwritten blocks so we should not expose any stale data, right?

Jiaying
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2011-05-18 20:45:15

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On May 18, 2011, at 14:32, Jiaying Zhang wrote:
> On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <[email protected]> wrote:
>> On May 17, 2011, at 21:19, Eric Sandeen wrote:
>>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>>>> flag and then ftruncate the file to a size larger than the file's i_size,
>>>> any allocated but unwritten blocks will be freed but the file size is set
>>>> to the size that ftruncate specifies.
>>>>
>>>> Here is a simple test to reproduce the problem:
>>>> 1. fallocate a 12k size file with KEEP_SIZE flag
>>>> 2. write the first 4k
>>>> 3. ftruncate the file to 8k
>>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>>>> shows the file has only the first written block left.
>>>
>>> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>>
>> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space.
>>
>
> Indeed we have applications that are doing exactly the same as you
> described. They always fallocate files to a pre-defined size with
> KEEP_SIZE flag and if they end up using less than the allocated size,
> they ftruncate files to their written size later.

If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing.

If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation.

>>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>>
>>> # xfs_bmap -vp testfile
>>> testfile:
>>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
>>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
>>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000
>>>
>>> size 8k:
>>> # ls -l testfile
>>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>>>
>>> and diskspace used 12k:
>>> # du -hc testfile
>>> 12K testfile
>>> 12K total
>>>
>>> I think this is a different result from ext4, either with or without your patch.
>>>
>>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>>
>>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>>>
>>> -Eric
>>>
>>>> Below is the proposed patch to fix the bug:
>>>>
>>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>>>>
>>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
>>>> of ext4_truncate(inode) when it needs to truncate an inode so that
>>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
>>>> to a size larger than the inode's i_size, we will only truncate the blocks
>>>> beyond the specified truncate size instead of all of blocks beyond i_size.
>>>>
>>>> Signed-off-by: Jiaying Zhang <[email protected]>
>>>>
>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>> index 3424e82..3bfad57 100644
>>>> --- a/fs/ext4/inode.c
>>>> +++ b/fs/ext4/inode.c
>>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>>>> }
>>>> }
>>>> /* ext4_truncate will clear the flag */
>>>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
>>>> - ext4_truncate(inode);
>>>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
>>>> + rc = vmtruncate(inode, attr->ia_size);
>>>> + if (rc)
>>>> + goto err_out;
>>>> + }
>>>> }
>>>>
>>>> if ((attr->ia_valid & ATTR_SIZE) &&
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






2011-05-18 20:57:45

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

On Wed, May 18, 2011 at 1:45 PM, Andreas Dilger <[email protected]> wrote:
> On May 18, 2011, at 14:32, Jiaying Zhang wrote:
>> On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <[email protected]> wrote:
>>> On May 17, 2011, at 21:19, Eric Sandeen wrote:
>>>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>>>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>>>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>>>>> flag and then ftruncate the file to a size larger than the file's i_size,
>>>>> any allocated but unwritten blocks will be freed but the file size is set
>>>>> to the size that ftruncate specifies.
>>>>>
>>>>> Here is a simple test to reproduce the problem:
>>>>> 1. fallocate a 12k size file with KEEP_SIZE flag
>>>>> 2. write the first 4k
>>>>> 3. ftruncate the file to 8k
>>>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>>>>> shows the file has only the first written block left.
>>>>
>>>> To be honest I'm not 100% certain what the fiesystem -should- do in this case.
>>>
>>> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. ?For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space.
>>>
>>
>> Indeed we have applications that are doing exactly the same as you
>> described. They always fallocate files to a pre-defined size with
>> KEEP_SIZE flag and if they end up using less than the allocated size,
>> they ftruncate files to their written size later.
>
> If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. ?I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. ?However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing.
>
> If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation.
With my patch, it is still a little different from what xfs does in
truncating-up case. As Eric mentioned, when an application fallocates
12k, writes 4k, and then truncates to 8k, on xfs there will be 12k
allocated blocks left, but on ext4 with my patch there will be 8k
allocated blocks left. The reason I think we may want to free any
blocks beyond the truncate size is because there may be situations
that applications are running out of space and want to shrink
fallocated files to a smaller size that is still larger than the
current i_size.

Jiaying
>
>>>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>>>
>>>> # xfs_bmap -vp testfile
>>>> testfile:
>>>> EXT: FILE-OFFSET ? ? ?BLOCK-RANGE ? ? ? ? ? ?AG AG-OFFSET ? ? ? ? ? ? ?TOTAL FLAGS
>>>> 0: [0..7]: ? ? ? ? ?2648750760..2648750767 ?3 (356066400..356066407) ? ? 8 00000
>>>> 1: [8..23]: ? ? ? ? 2648750768..2648750783 ?3 (356066408..356066423) ? ?16 10000
>>>>
>>>> size 8k:
>>>> # ls -l testfile
>>>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile
>>>>
>>>> and diskspace used 12k:
>>>> # du -hc testfile
>>>> 12K ? testfile
>>>> 12K ? total
>>>>
>>>> I think this is a different result from ext4, either with or without your patch.
>>>>
>>>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>>>
>>>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>>>>
>>>> -Eric
>>>>
>>>>> Below is the proposed patch to fix the bug:
>>>>>
>>>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr().
>>>>>
>>>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead
>>>>> of ext4_truncate(inode) when it needs to truncate an inode so that
>>>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate
>>>>> to a size larger than the inode's i_size, we will only truncate the blocks
>>>>> beyond the specified truncate size instead of all of blocks beyond i_size.
>>>>>
>>>>> Signed-off-by: Jiaying Zhang <[email protected]>
>>>>>
>>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>>> index 3424e82..3bfad57 100644
>>>>> --- a/fs/ext4/inode.c
>>>>> +++ b/fs/ext4/inode.c
>>>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>>>>> ? ? ? ? ? ? ? ? ? ? ?}
>>>>> ? ? ? ? ? ? ?}
>>>>> ? ? ? ? ? ? ?/* ext4_truncate will clear the flag */
>>>>> - ? ? ? ? ? ?if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
>>>>> - ? ? ? ? ? ? ? ? ? ?ext4_truncate(inode);
>>>>> + ? ? ? ? ? ?if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
>>>>> + ? ? ? ? ? ? ? ? ? ?rc = vmtruncate(inode, attr->ia_size);
>>>>> + ? ? ? ? ? ? ? ? ? ?if (rc)
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto err_out;
>>>>> + ? ? ? ? ? ?}
>>>>> ? ? ?}
>>>>>
>>>>> ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>> Cheers, Andreas
>>>
>>>
>>>
>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
>

2011-05-22 23:56:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

Here's a cleaner way of fixing the problem. vmtruncate() is now
deprecated, so I'm using truncate_setsize() instead; in addition, I
avoid calling ext4_truncate() twice in the case where EOFBLOCKS_FL.
Since ext4_truncate() is a very heavyweight function, which requires
manipulating the orphan list and taking i_data_sem, avoiding a double
call of ext4_truncate() is a big win.

- Ted

commit bb5eabd2de1e9dfdeeac822174fba17f2d2a7f2b
Author: Theodore Ts'o <[email protected]>
Date: Sun May 22 19:45:01 2011 -0400

ext4: use truncate_setsize() unconditionally

In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
before calling vmtruncate(). This caused any allocated but unwritten
blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
flag to be dropped. This was done to make to make sure that
EOFBLOCKS_FL would not be cleared while still leaving blocks past
i_size allocated. This was not necessary, since ext4_truncate()
guarantees that blocks past i_size will be dropped, even in the case
where truncate() has increased i_size before calling ext4_truncate().

So fix this by removing the EOFBLOCKS_FL special case treatment in
ext4_setattr(). In addition, use truncate_setsize() followed by a
call to ext4_truncate() instead of using vmtruncate(). This is more
efficient since it skips the call to inode_newsize_ok(), which has
been checked already by inode_change_ok(). This is also in a win in
the case where EOFBLOCKS_FL is set since it avoids calling
ext4_truncate() twice.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index df3fb20..4ca8411 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)

if (S_ISREG(inode->i_mode) &&
attr->ia_valid & ATTR_SIZE &&
- (attr->ia_size < inode->i_size ||
- (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) {
+ (attr->ia_size < inode->i_size)) {
handle_t *handle;

handle = ext4_journal_start(inode, 3);
@@ -5398,14 +5397,13 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
goto err_out;
}
}
- /* ext4_truncate will clear the flag */
- if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
- ext4_truncate(inode);
}

if ((attr->ia_valid & ATTR_SIZE) &&
- attr->ia_size != i_size_read(inode))
- rc = vmtruncate(inode, attr->ia_size);
+ attr->ia_size != i_size_read(inode)) {
+ truncate_setsize(inode, attr->ia_size);
+ ext4_truncate(inode);
+ }

if (!rc) {
setattr_copy(inode, attr);

2011-05-23 18:30:38

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()

Hi Ted,

On Sun, May 22, 2011 at 4:56 PM, Ted Ts'o <[email protected]> wrote:
> Here's a cleaner way of fixing the problem. ?vmtruncate() is now
> deprecated, so I'm using truncate_setsize() instead; in addition, I
> avoid calling ext4_truncate() twice in the case where EOFBLOCKS_FL.
> Since ext4_truncate() is a very heavyweight function, which requires
> manipulating the orphan list and taking i_data_sem, avoiding a double
> call of ext4_truncate() is a big win.
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>
> commit bb5eabd2de1e9dfdeeac822174fba17f2d2a7f2b
> Author: Theodore Ts'o <[email protected]>
> Date: ? Sun May 22 19:45:01 2011 -0400
>
> ? ?ext4: use truncate_setsize() unconditionally
>
> ? ?In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
> ? ?past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
> ? ?before calling vmtruncate(). ?This caused any allocated but unwritten
> ? ?blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
> ? ?flag to be dropped. ?This was done to make to make sure that
> ? ?EOFBLOCKS_FL would not be cleared while still leaving blocks past
> ? ?i_size allocated. ?This was not necessary, since ext4_truncate()
> ? ?guarantees that blocks past i_size will be dropped, even in the case
> ? ?where truncate() has increased i_size before calling ext4_truncate().
>
> ? ?So fix this by removing the EOFBLOCKS_FL special case treatment in
> ? ?ext4_setattr(). ?In addition, use truncate_setsize() followed by a
> ? ?call to ext4_truncate() instead of using vmtruncate(). ?This is more
> ? ?efficient since it skips the call to inode_newsize_ok(), which has
> ? ?been checked already by inode_change_ok(). ?This is also in a win in
> ? ?the case where EOFBLOCKS_FL is set since it avoids calling
> ? ?ext4_truncate() twice.
>
> ? ?Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index df3fb20..4ca8411 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>
> ? ? ? ?if (S_ISREG(inode->i_mode) &&
> ? ? ? ? ? ?attr->ia_valid & ATTR_SIZE &&
> - ? ? ? ? ? (attr->ia_size < inode->i_size ||
> - ? ? ? ? ? ?(ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) {
> + ? ? ? ? ? (attr->ia_size < inode->i_size)) {
> ? ? ? ? ? ? ? ?handle_t *handle;
>
> ? ? ? ? ? ? ? ?handle = ext4_journal_start(inode, 3);
> @@ -5398,14 +5397,13 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto err_out;
> ? ? ? ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?}
> - ? ? ? ? ? ? ? /* ext4_truncate will clear the flag */
> - ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> - ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> ? ? ? ?}
>
> ? ? ? ?if ((attr->ia_valid & ATTR_SIZE) &&
> - ? ? ? ? ? attr->ia_size != i_size_read(inode))
> - ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
> + ? ? ? ? ? attr->ia_size != i_size_read(inode)) {
> + ? ? ? ? ? ? ? truncate_setsize(inode, attr->ia_size);
> + ? ? ? ? ? ? ? ext4_truncate(inode);
> + ? ? ? }
I think your patch doesn't cover the case when we fallocate with
KEEP_SIZE a 12k file, write 4k, and then truncate the file to 4k.
In this case, attr->ia_size is equal to i_size but we still want to
free any allocated but unwritten blocks during truncate.

Jiaying
>
> ? ? ? ?if (!rc) {
> ? ? ? ? ? ? ? ?setattr_copy(inode, attr);
>

2011-05-23 19:19:05

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH -v2] ext4: use truncate_setsize() unconditionally

In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
before calling vmtruncate(). This caused any allocated but unwritten
blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
flag to be dropped. This was done to make to make sure that
EOFBLOCKS_FL would not be cleared while still leaving blocks past
i_size allocated. This was not necessary, since ext4_truncate()
guarantees that blocks past i_size will be dropped, even in the case
where truncate() has increased i_size before calling ext4_truncate().

So fix this by removing the EOFBLOCKS_FL special case treatment in
ext4_setattr(). In addition, use truncate_setsize() followed by a
call to ext4_truncate() instead of using vmtruncate(). This is more
efficient since it skips the call to inode_newsize_ok(), which has
been checked already by inode_change_ok(). This is also in a win in
the case where EOFBLOCKS_FL is set since it avoids calling
ext4_truncate() twice.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and
then truncate to 4k, we should discard the excess fallocate'd blocks. So if
attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but
if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate().

fs/ext4/inode.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index df3fb20..2e95819 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)

if (S_ISREG(inode->i_mode) &&
attr->ia_valid & ATTR_SIZE &&
- (attr->ia_size < inode->i_size ||
- (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) {
+ (attr->ia_size < inode->i_size)) {
handle_t *handle;

handle = ext4_journal_start(inode, 3);
@@ -5398,14 +5397,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
goto err_out;
}
}
- /* ext4_truncate will clear the flag */
- if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
- ext4_truncate(inode);
}

- if ((attr->ia_valid & ATTR_SIZE) &&
- attr->ia_size != i_size_read(inode))
- rc = vmtruncate(inode, attr->ia_size);
+ if (attr->ia_valid & ATTR_SIZE) {
+ if (attr->ia_size != i_size_read(inode)) {
+ truncate_setsize(inode, attr->ia_size);
+ ext4_truncate(inode);
+ } else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
+ ext4_truncate(inode);
+ }

if (!rc) {
setattr_copy(inode, attr);
--
1.7.3.1


2011-05-23 20:22:42

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally

On Mon, May 23, 2011 at 12:19 PM, Theodore Ts'o <[email protected]> wrote:
> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
> before calling vmtruncate(). ?This caused any allocated but unwritten
> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
> flag to be dropped. ?This was done to make to make sure that
> EOFBLOCKS_FL would not be cleared while still leaving blocks past
> i_size allocated. ?This was not necessary, since ext4_truncate()
> guarantees that blocks past i_size will be dropped, even in the case
> where truncate() has increased i_size before calling ext4_truncate().
>
> So fix this by removing the EOFBLOCKS_FL special case treatment in
> ext4_setattr(). ?In addition, use truncate_setsize() followed by a
> call to ext4_truncate() instead of using vmtruncate(). ?This is more
> efficient since it skips the call to inode_newsize_ok(), which has
> been checked already by inode_change_ok(). ?This is also in a win in
> the case where EOFBLOCKS_FL is set since it avoids calling
> ext4_truncate() twice.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> ?Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and
> ?then truncate to 4k, we should discard the excess fallocate'd blocks. ?So if
> ?attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but
> ?if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate().
>
> ?fs/ext4/inode.c | ? 16 ++++++++--------
> ?1 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index df3fb20..2e95819 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5363,8 +5363,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>
> ? ? ? ?if (S_ISREG(inode->i_mode) &&
> ? ? ? ? ? ?attr->ia_valid & ATTR_SIZE &&
> - ? ? ? ? ? (attr->ia_size < inode->i_size ||
> - ? ? ? ? ? ?(ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))) {
> + ? ? ? ? ? (attr->ia_size < inode->i_size)) {
> ? ? ? ? ? ? ? ?handle_t *handle;
>
> ? ? ? ? ? ? ? ?handle = ext4_journal_start(inode, 3);
> @@ -5398,14 +5397,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto err_out;
> ? ? ? ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?}
> - ? ? ? ? ? ? ? /* ext4_truncate will clear the flag */
> - ? ? ? ? ? ? ? if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)))
> - ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> ? ? ? ?}
>
> - ? ? ? if ((attr->ia_valid & ATTR_SIZE) &&
> - ? ? ? ? ? attr->ia_size != i_size_read(inode))
> - ? ? ? ? ? ? ? rc = vmtruncate(inode, attr->ia_size);
> + ? ? ? if (attr->ia_valid & ATTR_SIZE) {
> + ? ? ? ? ? ? ? if (attr->ia_size != i_size_read(inode)) {
> + ? ? ? ? ? ? ? ? ? ? ? truncate_setsize(inode, attr->ia_size);
> + ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> + ? ? ? ? ? ? ? } else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
> + ? ? ? ? ? ? ? ? ? ? ? ext4_truncate(inode);
> + ? ? ? }
>
> ? ? ? ?if (!rc) {
> ? ? ? ? ? ? ? ?setattr_copy(inode, attr);
> --
> 1.7.3.1
lgtm.

Jiaying

>
>

2011-05-24 14:30:06

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally

On 5/23/11 2:19 PM, Theodore Ts'o wrote:
> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
> before calling vmtruncate(). This caused any allocated but unwritten
> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
> flag to be dropped. This was done to make to make sure that
> EOFBLOCKS_FL would not be cleared while still leaving blocks past
> i_size allocated. This was not necessary, since ext4_truncate()
> guarantees that blocks past i_size will be dropped, even in the case
> where truncate() has increased i_size before calling ext4_truncate().
>
> So fix this by removing the EOFBLOCKS_FL special case treatment in
> ext4_setattr(). In addition, use truncate_setsize() followed by a
> call to ext4_truncate() instead of using vmtruncate(). This is more
> efficient since it skips the call to inode_newsize_ok(), which has
> been checked already by inode_change_ok(). This is also in a win in
> the case where EOFBLOCKS_FL is set since it avoids calling
> ext4_truncate() twice.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and
> then truncate to 4k, we should discard the excess fallocate'd blocks. So if
> attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but
> if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate().

are there xfstests which cover this explicitly? It should be simple to write.

If filesystem behavior differs we can always make ext4-only tests.

-Eric

2011-05-24 22:06:54

by Jiaying Zhang

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally

On Tue, May 24, 2011 at 7:30 AM, Eric Sandeen <[email protected]> wrote:
> On 5/23/11 2:19 PM, Theodore Ts'o wrote:
>> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
>> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
>> before calling vmtruncate(). ?This caused any allocated but unwritten
>> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
>> flag to be dropped. ?This was done to make to make sure that
>> EOFBLOCKS_FL would not be cleared while still leaving blocks past
>> i_size allocated. ?This was not necessary, since ext4_truncate()
>> guarantees that blocks past i_size will be dropped, even in the case
>> where truncate() has increased i_size before calling ext4_truncate().
>>
>> So fix this by removing the EOFBLOCKS_FL special case treatment in
>> ext4_setattr(). ?In addition, use truncate_setsize() followed by a
>> call to ext4_truncate() instead of using vmtruncate(). ?This is more
>> efficient since it skips the call to inode_newsize_ok(), which has
>> been checked already by inode_change_ok(). ?This is also in a win in
>> the case where EOFBLOCKS_FL is set since it avoids calling
>> ext4_truncate() twice.
>>
>> Signed-off-by: "Theodore Ts'o" <[email protected]>
>> ---
>> ?Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and
>> ?then truncate to 4k, we should discard the excess fallocate'd blocks. ?So if
>> ?attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but
>> ?if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate().
>
> are there xfstests which cover this explicitly? ?It should be simple to write.
>
Vivek has written a xfstest to cover this and more fallocate/truncate cases:
http://old.nabble.com/-PATCH--xfstests%3A-test-fallocate%2C-write%2C-ftruncate-combinations.-to31666685.html#a31666685

Jiaying

> If filesystem behavior differs we can always make ext4-only tests.
>
> -Eric
>

2011-05-24 22:31:27

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH -v2] ext4: use truncate_setsize() unconditionally

On 5/24/11 5:06 PM, Jiaying Zhang wrote:
> On Tue, May 24, 2011 at 7:30 AM, Eric Sandeen <[email protected]> wrote:
>> On 5/23/11 2:19 PM, Theodore Ts'o wrote:
>>> In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
>>> past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
>>> before calling vmtruncate(). This caused any allocated but unwritten
>>> blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
>>> flag to be dropped. This was done to make to make sure that
>>> EOFBLOCKS_FL would not be cleared while still leaving blocks past
>>> i_size allocated. This was not necessary, since ext4_truncate()
>>> guarantees that blocks past i_size will be dropped, even in the case
>>> where truncate() has increased i_size before calling ext4_truncate().
>>>
>>> So fix this by removing the EOFBLOCKS_FL special case treatment in
>>> ext4_setattr(). In addition, use truncate_setsize() followed by a
>>> call to ext4_truncate() instead of using vmtruncate(). This is more
>>> efficient since it skips the call to inode_newsize_ok(), which has
>>> been checked already by inode_change_ok(). This is also in a win in
>>> the case where EOFBLOCKS_FL is set since it avoids calling
>>> ext4_truncate() twice.
>>>
>>> Signed-off-by: "Theodore Ts'o" <[email protected]>
>>> ---
>>> Jiayingz pointed out that in the case where we fallocate 12k, write 4k, and
>>> then truncate to 4k, we should discard the excess fallocate'd blocks. So if
>>> attr->ia_size == inode.i_size, we can skip the truncate_setsize() call, but
>>> if the EOFBLOCKS_FL flag is set, we should still call ext4_truncate().
>>
>> are there xfstests which cover this explicitly? It should be simple to write.
>>
> Vivek has written a xfstest to cover this and more fallocate/truncate cases:
> http://old.nabble.com/-PATCH--xfstests%3A-test-fallocate%2C-write%2C-ftruncate-combinations.-to31666685.html#a31666685

ah, right - ok, thanks! Sorry for not keeping up.

-Eric

> Jiaying
>
>> If filesystem behavior differs we can always make ext4-only tests.
>>
>> -Eric
>>