2010-08-10 05:41:57

by Toshiyuki Okajima

[permalink] [raw]
Subject: [Q] ext4: the max file size of each case is correct?

Hi.

I have tested the following cases in order to confirm the maximum file size.
For the tests, I selected two parameters:
(These parameters relate to the max file size.)
1) Filesystem Feature
"extent"
2) File(Inode) Flag
"EXT4_EXTENTS_FL"
(This parameter corresponds to the case where people shifts from ext3 into ext4.
(Files which are created with ext3 have no "EXT4_EXTENTS_FL" flag.))

Table. the max file size which we can write or seek
at each filesystem feature tuning and file flag setting
+============+===============================+===============================+
| \ File flag| | |
| \ | !EXT4_EXTENTS_FL | EXT4_EXTETNS_FL |
|Fs Features\| | |
+------------+-------------------------------+-------------------------------+
| !extent | write: 2194719883264 | write: -------------- |
| | seek: 2199023251456 | seek: -------------- |
+------------+-------------------------------+-------------------------------+
| extent | write: 4402345721856 | write: 17592186044415 |
| | seek: 17592186044415 | seek: 17592186044415 |
+------------+-------------------------------+-------------------------------+
(
The symbols (!extent, extent) mean:
!extent: The filesystem feature "extent" is not set.
ex. mkfs.ext3 <dev>; mount -t ext4 <dev>
extent: The filesystem feature "extent" is set.
ex. mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>
The symbols ("!EXT4_EXTENTS_FL","EXT4_EXTENTS_FL") mean:
!EXT4_EXTENS_FL: The file flag, "EXT4_EXTENTS_FL" is not set.
EXT4_EXTENS_FL: The file flag, "EXT4_EXTENTS_FL" is set.
)

According to the table, if EXT4_EXTETNS_FL flag is not set, the max file
size of write() is different from the one of seek().

These differences don't include in the ext4-specification, do they?
I think the differences are not the ext4-specification.

So, I made a fix patch of the differences.
================================================================================
Subject: [PATCH] ext4: create own llseek function to handle the max file size correctly.
From: Toshiyuki Okajima <[email protected]>

If the file has no EXT4_EXTENTS_FL flag, the maximum size which can be written
(write systemcall) is different from the maximum size which can be seeked
(lseek systemcall).

For example, the following 2 cases show us the differences:
#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>

Table. the max file size which we can write or seek
at each filesystem feature tuning and file flag setting
+============+===============================+===============================+
| \ File flag| | |
| \ | !EXT4_EXTENTS_FL | EXT4_EXTETNS_FL |
|case \| | |
+------------+-------------------------------+-------------------------------+
| #1 | write: 2194719883264 | write: -------------- |
| | seek: 2199023251456 | seek: -------------- |
+------------+-------------------------------+-------------------------------+
| #2 | write: 4402345721856 | write: 17592186044415 |
| | seek: 17592186044415 | seek: 17592186044415 |
+------------+-------------------------------+-------------------------------+

The differences exist because ext4 has 2 max-file-size (sb->s_maxbytes,
EXT4_SB(sb)->s_bitmap_maxbytes) although generic_file_llseek uses only
sb->s_maxbytes. (llseek of ext4_file_operations is generic_file_llseek.)
Therefore we create own llseek function which uses 2 max-file-size.

The new own function originates from generic_file_llseek_nolocked().
If the file flag, "EXT4_EXTENTS_FL" is not set, the function alters
inode->i_sb->s_maxbytes into EXT4_SB(inode->i_sb)->s_bitmap_maxbytes.

Signed-off-by: Toshiyuki Okajima <[email protected]>
---
fs/ext4/dir.c | 2 +-
fs/ext4/ext4.h | 1 +
fs/ext4/file.c | 39 ++++++++++++++++++++++++++++++++++++++-
3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index ea5e6cb..62c9bba 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -39,7 +39,7 @@ static int ext4_release_dir(struct inode *inode,
struct file *filp);

const struct file_operations ext4_dir_operations = {
- .llseek = generic_file_llseek,
+ .llseek = ext4_llseek,
.read = generic_read_dir,
.readdir = ext4_readdir, /* we take BKL. needed?*/
.unlocked_ioctl = ext4_ioctl,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..e7050cd 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1870,6 +1870,7 @@ extern const struct file_operations ext4_dir_operations;
/* file.c */
extern const struct inode_operations ext4_file_inode_operations;
extern const struct file_operations ext4_file_operations;
+extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);

/* namei.c */
extern const struct inode_operations ext4_dir_inode_operations;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 5313ae4..f2e2d57 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -129,8 +129,45 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
return dquot_file_open(inode, filp);
}

+loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ loff_t maxbytes;
+
+ mutex_lock(&inode->i_mutex);
+ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
+ maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
+ else
+ maxbytes = inode->i_sb->s_maxbytes;
+ switch (origin) {
+ case SEEK_END:
+ offset += inode->i_size;
+ break;
+ case SEEK_CUR:
+ if (offset == 0) {
+ mutex_unlock(&inode->i_mutex);
+ return file->f_pos;
+ }
+ offset += file->f_pos;
+ break;
+ }
+
+ if (offset < 0 || offset > maxbytes) {
+ mutex_unlock(&inode->i_mutex);
+ return -EINVAL;
+ }
+
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+ mutex_unlock(&inode->i_mutex);
+
+ return offset;
+}
+
const struct file_operations ext4_file_operations = {
- .llseek = generic_file_llseek,
+ .llseek = ext4_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = generic_file_aio_read,
--
1.5.5.6


2010-08-10 06:37:33

by Andreas Dilger

[permalink] [raw]
Subject: Re: [Q] ext


On 2010-08-10, at 10:23, Toshiyuki Okajima wrote:
> Table. the max file size which we can write or seek
> at each filesystem feature tuning and file flag setting
> +===============================+===============================+
> | | |
> | !EXT4_EXTENTS_FL | EXT4_EXTETNS_FL |
> | | |
> +-------------------------------+-------------------------------+
> | write: 2194719883264 | write: -------------- |
> | seek: 2199023251456 | seek: -------------- |
> +-------------------------------+-------------------------------+
> | write: 4402345721856 | write: 17592186044415 |
> | seek: 17592186044415 | seek: 17592186044415 |
> +-------------------------------+-------------------------------+

Interesting. The 2TB vs. 4TB difference for block-mapped files is due to the removal of the 2^32 sectors limit imposed by i_blocks, and is not strictly related to extents.

> +loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
> +{
> + struct inode *inode = file->f_mapping->host;
> + loff_t maxbytes;
> +
> + mutex_lock(&inode->i_mutex);
> + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
> + maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
> + else
> + maxbytes = inode->i_sb->s_maxbytes;

This part doesn't really have to be under i_mutex. Otherwise, the patch looks reasonable.

> + switch (origin) {
> + case SEEK_END:
> + offset += inode->i_size;
> + break;
> + case SEEK_CUR:
> + if (offset == 0) {
> + mutex_unlock(&inode->i_mutex);
> + return file->f_pos;
> + }
> + offset += file->f_pos;
> + break;
> + }
> +
> + if (offset < 0 || offset > maxbytes) {
> + mutex_unlock(&inode->i_mutex);
> + return -EINVAL;
> + }
> +
> + if (offset != file->f_pos) {
> + file->f_pos = offset;
> + file->f_version = 0;
> + }
> + mutex_unlock(&inode->i_mutex);
> +
> + return offset;
> +}

It's too bad that we have to duplicate the whole generic_file_llseek() code here, but I don't think there is a better solution. However, it is worthwhile to add a comment to this function like:

/* copied from generic_file_llseek() to handle both block-mapped and
* extent-mapped maxbytes values. Should otherwise be identical. */

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.


2010-08-18 01:06:51

by Toshiyuki Okajima

[permalink] [raw]
Subject: Re: [Q] ext

Hi.

(2010/08/10 15:35), Andreas Dilger wrote:
>
> On 2010-08-10, at 10:23, Toshiyuki Okajima wrote:
<snip>
>> +loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
>> +{
>> + struct inode *inode = file->f_mapping->host;
>> + loff_t maxbytes;
>> +
>> + mutex_lock(&inode->i_mutex);
>> + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
>> + maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
>> + else
>> + maxbytes = inode->i_sb->s_maxbytes;
>
> This part doesn't really have to be under i_mutex. Otherwise, the patch looks reasonable.
OK.

>
>> + switch (origin) {
>> + case SEEK_END:
>> + offset += inode->i_size;
>> + break;
>> + case SEEK_CUR:
>> + if (offset == 0) {
>> + mutex_unlock(&inode->i_mutex);
>> + return file->f_pos;
>> + }
>> + offset += file->f_pos;
>> + break;
>> + }
>> +
>> + if (offset< 0 || offset> maxbytes) {
>> + mutex_unlock(&inode->i_mutex);
>> + return -EINVAL;
>> + }
>> +
>> + if (offset != file->f_pos) {
>> + file->f_pos = offset;
>> + file->f_version = 0;
>> + }
>> + mutex_unlock(&inode->i_mutex);
>> +
>> + return offset;
>> +}
>
> It's too bad that we have to duplicate the whole generic_file_llseek() code here, but I don't think there is a better solution. However, it is worthwhile to add a comment to this function like:
>
> /* copied from generic_file_llseek() to handle both block-mapped and
> * extent-mapped maxbytes values. Should otherwise be identical. */
I understand it.

I apply your comments and then rebuild my patch.
Immediately I'll send it.

Thanks,
Toshiyuki Okajima