2013-09-10 14:16:15

by Namjae Jeon

[permalink] [raw]
Subject: [PATCH v6] fat: additions to support fat_fallocate

From: Namjae Jeon <[email protected]>

v6: In v3 of the patch we released the allocated clutsers in
fat_file_release on the basis of d_count. Moving this from fat_file_release
to fat_evict_inode and on the basis of i_count , so that allocated
clusters are released at the last refrence of the inode when inode
evicts from memory .

In case of direct IO , writting in fallocated area fall back to
buffered write.

Support for fibmap is also added for fallocated region
by modifying fat_bmap to map cluster in case of read request
for a block in fallocated region.

v5: change to avoid compilation warning:
fs/fat/inode.c: In function 'fat_zero_falloc_area':
>> fs/fat/inode.c:169:11: warning: comparison of distinct pointer
types lacks a cast [enabled by default]

v4: Rework based on review comments.
Add check in fat_setattr to release fallocated blocks on a truncate

v3: Release preallocated blocks at file release.

With FALLOC_FL_KEEP_SIZE, there is no way to distinguish if the
mismatch between i_size and no. of clusters allocated is a consequence
of fallocate or just plain corruption. When a non fallocate aware (old)
linux fat driver tries to write to such a file, it throws an error.Also,
fsck detects this as inconsistency and truncates the prealloc'd blocks.

To avoid this, as suggested by OGAWA, remove changes that make fallocate
persistent across mounts and restrict lifetime of blocks from
fallocate(2) to file release.

v2: On an area preallocated with FALLOC_FL_KEEP_SIZE, when a seek was
done to an offset beyond i_size, the old (garbage) data was exposed as
we did not zero out the area at allocation time. Added
fat_zero_falloc_area() to fix this.

v1: Reworked an earlier patch of the same name
(https://lkml.org/lkml/2007/12/22/130) to fix some bugs:
i) Preallocated space was not persistent and was lost on remount. Fixed
it.
ii) Did not zero out allocated clusters when FALLOC_FL_KEEP_SIZE was set,
thereby speeding up preallocation time.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Amit Sahrawat <[email protected]>
---
fs/fat/cache.c | 16 +++++++++--
fs/fat/file.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fat/inode.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index 91ad9e1..06cb903 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -312,6 +312,7 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys,
const unsigned char blocksize_bits = sb->s_blocksize_bits;
sector_t last_block;
int cluster, offset;
+ loff_t i_size = i_size_read(inode);

*phys = 0;
*mapped_blocks = 0;
@@ -323,11 +324,19 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys,
return 0;
}

- last_block = (i_size_read(inode) + (blocksize - 1)) >> blocksize_bits;
+ last_block = (i_size + (blocksize - 1)) >> blocksize_bits;
if (sector >= last_block) {
- if (!create)
- return 0;
+ if (!create) {
+ /*
+ * to map cluster in case of read request
+ * for a block in fallocated region
+ */
+ if (MSDOS_I(inode)->mmu_private >
+ round_up(i_size, sb->s_blocksize))
+ goto out_map_cluster;

+ return 0;
+ }
/*
* ->mmu_private can access on only allocation path.
* (caller must hold ->i_mutex)
@@ -338,6 +347,7 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys,
return 0;
}

+out_map_cluster:
cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits);
offset = sector & (sbi->sec_per_clus - 1);
cluster = fat_bmap_cluster(inode, cluster);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 33711ff..a3abc56 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,8 +17,11 @@
#include <linux/blkdev.h>
#include <linux/fsnotify.h>
#include <linux/security.h>
+#include <linux/falloc.h>
#include "fat.h"

+static long fat_fallocate(struct file *file, int mode,
+ loff_t offset, loff_t len);
static int fat_ioctl_get_attributes(struct inode *inode, u32 __user *user_attr)
{
u32 attr;
@@ -182,6 +185,7 @@ const struct file_operations fat_file_operations = {
#endif
.fsync = fat_file_fsync,
.splice_read = generic_file_splice_read,
+ .fallocate = fat_fallocate,
};

static int fat_cont_expand(struct inode *inode, loff_t size)
@@ -220,6 +224,87 @@ out:
return err;
}

+/*
+ * Preallocate space for a file. This implements fat's fallocate file
+ * operation, which gets called from sys_fallocate system call. User
+ * space requests len bytes at offset. If FALLOC_FL_KEEP_SIZE is set
+ * we just allocate clusters without zeroing them out. Otherwise we
+ * allocate and zero out clusters via an expanding truncate.
+ */
+static long fat_fallocate(struct file *file, int mode,
+ loff_t offset, loff_t len)
+{
+ int cluster, fclus, dclus;
+ int nr_cluster; /* Number of clusters to be allocated */
+ loff_t nr_bytes; /* Number of bytes to be allocated*/
+ loff_t free_bytes; /* Unused bytes in the last cluster of file*/
+ struct inode *inode = file->f_mapping->host;
+ struct super_block *sb = inode->i_sb;
+ struct msdos_sb_info *sbi = MSDOS_SB(sb);
+ int err = 0;
+
+ /* No support for hole punch or other fallocate flags. */
+ if (mode & ~FALLOC_FL_KEEP_SIZE)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&inode->i_mutex);
+ if ((offset + len) <= MSDOS_I(inode)->mmu_private) {
+ fat_msg(sb, KERN_ERR,
+ "fat_fallocate(): Blocks already allocated");
+ err = -EINVAL;
+ goto error;
+ }
+
+ if (mode & FALLOC_FL_KEEP_SIZE) {
+ /* First compute the number of clusters to be allocated */
+ if (inode->i_size > 0) {
+ err = fat_get_cluster(inode, FAT_ENT_EOF,
+ &fclus, &dclus);
+ if (err < 0) {
+ fat_msg(sb, KERN_ERR,
+ "fat_fallocate(): fat_get_cluster() error");
+ goto error;
+ }
+ free_bytes = ((fclus + 1) << sbi->cluster_bits) -
+ inode->i_size;
+ nr_bytes = offset + len - inode->i_size - free_bytes;
+ MSDOS_I(inode)->mmu_private = (fclus + 1) <<
+ sbi->cluster_bits;
+ } else
+ nr_bytes = offset + len - inode->i_size;
+
+ nr_cluster = (nr_bytes + (sbi->cluster_size - 1)) >>
+ sbi->cluster_bits;
+
+ /* Start the allocation.We are not zeroing out the clusters */
+ while (nr_cluster-- > 0) {
+ err = fat_alloc_clusters(inode, &cluster, 1);
+ if (err) {
+ fat_msg(sb, KERN_ERR,
+ "fat_fallocate(): fat_alloc_clusters() error");
+ goto error;
+ }
+ err = fat_chain_add(inode, cluster, 1);
+ if (err) {
+ fat_free_clusters(inode, cluster);
+ goto error;
+ }
+ MSDOS_I(inode)->mmu_private += sbi->cluster_size;
+ }
+ } else {
+ /* This is just an expanding truncate */
+ err = fat_cont_expand(inode, (offset + len));
+ if (err) {
+ fat_msg(sb, KERN_ERR,
+ "fat_fallocate(): fat_cont_expand() error");
+ }
+ }
+
+error:
+ mutex_unlock(&inode->i_mutex);
+ return err;
+}
+
/* Free all clusters after the skip'th cluster. */
static int fat_free(struct inode *inode, int skip)
{
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 3134d1e..3bd6e73 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -152,16 +152,67 @@ static void fat_write_failed(struct address_space *mapping, loff_t to)
}
}

+static int fat_zero_falloc_area(struct file *file,
+ struct address_space *mapping, loff_t pos)
+{
+ struct page *page;
+ struct inode *inode = mapping->host;
+ loff_t curpos = i_size_read(inode);
+ size_t count = pos - curpos;
+ int err;
+
+ do {
+ unsigned offset;
+ size_t bytes;
+ void *fsdata;
+
+ offset = (curpos & (PAGE_CACHE_SIZE - 1));
+ bytes = PAGE_CACHE_SIZE - offset;
+ bytes = min(bytes, count);
+
+ err = pagecache_write_begin(NULL, mapping, curpos, bytes,
+ AOP_FLAG_UNINTERRUPTIBLE,
+ &page, &fsdata);
+ if (err)
+ break;
+
+ zero_user(page, offset, bytes);
+
+ err = pagecache_write_end(NULL, mapping, curpos, bytes, bytes,
+ page, fsdata);
+ if (err < 0)
+ break;
+ curpos += bytes;
+ count -= bytes;
+ err = 0;
+ } while (count);
+
+ return err;
+}
+
static int fat_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
int err;
+ struct inode *inode = mapping->host;
+ struct super_block *sb = inode->i_sb;
+ loff_t i_size = i_size_read(inode);
+
+ if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
+ && pos > i_size) {
+ err = fat_zero_falloc_area(file, mapping, pos);
+ if (err) {
+ fat_msg(sb, KERN_ERR,
+ "Error (%d) zeroing fallocated area", err);
+ return err;
+ }
+ }

*pagep = NULL;
err = cont_write_begin(file, mapping, pos, len, flags,
pagep, fsdata, fat_get_block,
- &MSDOS_I(mapping->host)->mmu_private);
+ &MSDOS_I(inode)->mmu_private);
if (err < 0)
fat_write_failed(mapping, pos + len);
return err;
@@ -205,6 +256,14 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb,
loff_t size = offset + iov_iter_count(iter);
if (MSDOS_I(inode)->mmu_private < size)
return 0;
+ /*
+ * In case of writing in fallocated region, return 0 and
+ * fallback to buffered write.
+ */
+ if (MSDOS_I(inode)->mmu_private >
+ round_up(i_size_read(inode), inode->i_sb->s_blocksize))
+ return 0;
+
}

/*
@@ -488,6 +547,19 @@ EXPORT_SYMBOL_GPL(fat_build_inode);

static void fat_evict_inode(struct inode *inode)
{
+
+ struct super_block *sb = inode->i_sb;
+
+ /*
+ * Release unwritten fallocated blocks on file release.
+ * Do this only when the inode evict and i_count becomes 0.
+ */
+ mutex_lock(&inode->i_mutex);
+ if (round_up(inode->i_size, sb->s_blocksize) <
+ MSDOS_I(inode)->mmu_private && atomic_read(&inode->i_count) == 0)
+ fat_truncate_blocks(inode, inode->i_size);
+ mutex_unlock(&inode->i_mutex);
+
truncate_inode_pages(&inode->i_data, 0);
if (!inode->i_nlink) {
inode->i_size = 0;
--
1.7.9.5


2013-09-21 08:14:26

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

Namjae Jeon <[email protected]> writes:

Entirely, we would have to consider in the case of write fail
(e.g. -ENOSPC). If write failed, it will call truncate(). Then, it can
be truncate the fallocate region too unexpectedly.

> + if (!create) {
> + /*
> + * to map cluster in case of read request
> + * for a block in fallocated region
> + */
> + if (MSDOS_I(inode)->mmu_private >
> + round_up(i_size, sb->s_blocksize))
> + goto out_map_cluster;

->mmu_private can't stable without i_mutex. I.e. racy.

> +/*
> + * Preallocate space for a file. This implements fat's fallocate file
> + * operation, which gets called from sys_fallocate system call. User
> + * space requests len bytes at offset. If FALLOC_FL_KEEP_SIZE is set
> + * we just allocate clusters without zeroing them out. Otherwise we
> + * allocate and zero out clusters via an expanding truncate.
> + */
> +static long fat_fallocate(struct file *file, int mode,
> + loff_t offset, loff_t len)
> +{
> + int cluster, fclus, dclus;
> + int nr_cluster; /* Number of clusters to be allocated */
> + loff_t nr_bytes; /* Number of bytes to be allocated*/
> + loff_t free_bytes; /* Unused bytes in the last cluster of file*/
> + struct inode *inode = file->f_mapping->host;
> + struct super_block *sb = inode->i_sb;
> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
> + int err = 0;
> +
> + /* No support for hole punch or other fallocate flags. */
> + if (mode & ~FALLOC_FL_KEEP_SIZE)
> + return -EOPNOTSUPP;

Don't we need to check S_ISREG() here? If this inode is dir, fallocate()
will break dir.

> + mutex_lock(&inode->i_mutex);

Don't we need to check inode_newsize_ok()? It seems nobody is checking
rlimit.

> + if ((offset + len) <= MSDOS_I(inode)->mmu_private) {
> + fat_msg(sb, KERN_ERR,
> + "fat_fallocate(): Blocks already allocated");
> + err = -EINVAL;
> + goto error;
> + }

This is user error, so we should not print from kernel. Otherwise,
attacker can make flood of kernel message.

Furthermore, is this right behavior? User should care about current
fallocated size? Other FSes return -EINVAL too?

> + if (mode & FALLOC_FL_KEEP_SIZE) {
> + /* First compute the number of clusters to be allocated */
> + if (inode->i_size > 0) {
> + err = fat_get_cluster(inode, FAT_ENT_EOF,
> + &fclus, &dclus);
> + if (err < 0) {
> + fat_msg(sb, KERN_ERR,
> + "fat_fallocate(): fat_get_cluster() error");
> + goto error;
> + }
> + free_bytes = ((fclus + 1) << sbi->cluster_bits) -
> + inode->i_size;
> + nr_bytes = offset + len - inode->i_size - free_bytes;
> + MSDOS_I(inode)->mmu_private = (fclus + 1) <<
> + sbi->cluster_bits;
> + } else
> + nr_bytes = offset + len - inode->i_size;

Hm, why do we need to re-compute ->mmu_private here?

> static int fat_write_begin(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned flags,
> struct page **pagep, void **fsdata)
> {
> int err;
> + struct inode *inode = mapping->host;
> + struct super_block *sb = inode->i_sb;
> + loff_t i_size = i_size_read(inode);
> +
> + if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
> + && pos > i_size) {
> + err = fat_zero_falloc_area(file, mapping, pos);
> + if (err) {
> + fat_msg(sb, KERN_ERR,
> + "Error (%d) zeroing fallocated area", err);
> + return err;
> + }
> + }

Again, I'm not fan of this way.

Normally, get_block() returns with buffer_new(). Then, caller checks
blockdev buffer with

unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);

then, zeroed buffer. Do we really don't need to check this race?

> static void fat_evict_inode(struct inode *inode)
> {
> +
> + struct super_block *sb = inode->i_sb;
> +
> + /*
> + * Release unwritten fallocated blocks on file release.
> + * Do this only when the inode evict and i_count becomes 0.
> + */
> + mutex_lock(&inode->i_mutex);
> + if (round_up(inode->i_size, sb->s_blocksize) <
> + MSDOS_I(inode)->mmu_private && atomic_read(&inode->i_count) == 0)
> + fat_truncate_blocks(inode, inode->i_size);
> + mutex_unlock(&inode->i_mutex);

This is strange.

- inode->i_count is always 0 here.
- nobody touch this inode anymore, so no need ->i_mutex.
- no need to truncate if ->i_nlink == 0. it should be done already.

> truncate_inode_pages(&inode->i_data, 0);
> if (!inode->i_nlink) {
> inode->i_size = 0;

Thanks.
--
OGAWA Hirofumi <[email protected]>

2013-09-23 08:14:03

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

2013/9/21, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
Hi OGAWA.

>
> Entirely, we would have to consider in the case of write fail
> (e.g. -ENOSPC). If write failed, it will call truncate(). Then, it can
> be truncate the fallocate region too unexpectedly.
I agree. I will update as your suggestion.

>
>> + if (!create) {
>> + /*
>> + * to map cluster in case of read request
>> + * for a block in fallocated region
>> + */
>> + if (MSDOS_I(inode)->mmu_private >
>> + round_up(i_size, sb->s_blocksize))
>> + goto out_map_cluster;
>
> ->mmu_private can't stable without i_mutex. I.e. racy.
Yes. I will add lock here.

>
>> +/*
>> + * Preallocate space for a file. This implements fat's fallocate file
>> + * operation, which gets called from sys_fallocate system call. User
>> + * space requests len bytes at offset. If FALLOC_FL_KEEP_SIZE is set
>> + * we just allocate clusters without zeroing them out. Otherwise we
>> + * allocate and zero out clusters via an expanding truncate.
>> + */
>> +static long fat_fallocate(struct file *file, int mode,
>> + loff_t offset, loff_t len)
>> +{
>> + int cluster, fclus, dclus;
>> + int nr_cluster; /* Number of clusters to be allocated */
>> + loff_t nr_bytes; /* Number of bytes to be allocated*/
>> + loff_t free_bytes; /* Unused bytes in the last cluster of file*/
>> + struct inode *inode = file->f_mapping->host;
>> + struct super_block *sb = inode->i_sb;
>> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
>> + int err = 0;
>> +
>> + /* No support for hole punch or other fallocate flags. */
>> + if (mode & ~FALLOC_FL_KEEP_SIZE)
>> + return -EOPNOTSUPP;
>
> Don't we need to check S_ISREG() here? If this inode is dir, fallocate()
> will break dir.
No, It is surely needed. I will update.

>
>> + mutex_lock(&inode->i_mutex);
>
> Don't we need to check inode_newsize_ok()? It seems nobody is checking
> rlimit.
Correct. I will update.

>
>> + if ((offset + len) <= MSDOS_I(inode)->mmu_private) {
>> + fat_msg(sb, KERN_ERR,
>> + "fat_fallocate(): Blocks already allocated");
>> + err = -EINVAL;
>> + goto error;
>> + }
>
> This is user error, so we should not print from kernel. Otherwise,
> attacker can make flood of kernel message.
okay, I will remove these codes.

>
> Furthermore, is this right behavior? User should care about current
> fallocated size? Other FSes return -EINVAL too?
You're right. There is a mistake from me. Normally other fs return 0.
I will update.

>
>> + if (mode & FALLOC_FL_KEEP_SIZE) {
>> + /* First compute the number of clusters to be allocated */
>> + if (inode->i_size > 0) {
>> + err = fat_get_cluster(inode, FAT_ENT_EOF,
>> + &fclus, &dclus);
>> + if (err < 0) {
>> + fat_msg(sb, KERN_ERR,
>> + "fat_fallocate(): fat_get_cluster() error");
>> + goto error;
>> + }
>> + free_bytes = ((fclus + 1) << sbi->cluster_bits) -
>> + inode->i_size;
>> + nr_bytes = offset + len - inode->i_size - free_bytes;
>> + MSDOS_I(inode)->mmu_private = (fclus + 1) <<
>> + sbi->cluster_bits;
>> + } else
>> + nr_bytes = offset + len - inode->i_size;
>
> Hm, why do we need to re-compute ->mmu_private here?
Not needed, I will remove unneeded compute code.

>
>> static int fat_write_begin(struct file *file, struct address_space
>> *mapping,
>> loff_t pos, unsigned len, unsigned flags,
>> struct page **pagep, void **fsdata)
>> {
>> int err;
>> + struct inode *inode = mapping->host;
>> + struct super_block *sb = inode->i_sb;
>> + loff_t i_size = i_size_read(inode);
>> +
>> + if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
>> + && pos > i_size) {
>> + err = fat_zero_falloc_area(file, mapping, pos);
>> + if (err) {
>> + fat_msg(sb, KERN_ERR,
>> + "Error (%d) zeroing fallocated area", err);
>> + return err;
>> + }
>> + }
>
> Again, I'm not fan of this way.
>
> Normally, get_block() returns with buffer_new(). Then, caller checks
> blockdev buffer with
>
> unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
>
> then, zeroed buffer. Do we really don't need to check this race?
We considered after your advice before. we reach for the conclusion
that use this method.
because, Cluster is already allocated in fat fallocate and
when we write with radom offset over i_size on fallocated region, It
will be hit by fat cache in fat_bmap of get_block, which mean buffer
is not set to new.

>
>> static void fat_evict_inode(struct inode *inode)
>> {
>> +
>> + struct super_block *sb = inode->i_sb;
>> +
>> + /*
>> + * Release unwritten fallocated blocks on file release.
>> + * Do this only when the inode evict and i_count becomes 0.
>> + */
>> + mutex_lock(&inode->i_mutex);
>> + if (round_up(inode->i_size, sb->s_blocksize) <
>> + MSDOS_I(inode)->mmu_private && atomic_read(&inode->i_count) == 0)
>> + fat_truncate_blocks(inode, inode->i_size);
>> + mutex_unlock(&inode->i_mutex);
>
> This is strange.
>
> - inode->i_count is always 0 here.
> - nobody touch this inode anymore, so no need ->i_mutex.
> - no need to truncate if ->i_nlink == 0. it should be done already.
>
>> truncate_inode_pages(&inode->i_data, 0);
>> if (!inode->i_nlink) {
>> inode->i_size = 0;
>
Right. I will update patch.

Thanks for your valuable review!

> Thanks.
> --
> OGAWA Hirofumi <[email protected]>
>

2013-09-23 13:13:37

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

Namjae Jeon <[email protected]> writes:

>>> + if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
>>> + && pos > i_size) {
>>> + err = fat_zero_falloc_area(file, mapping, pos);
>>> + if (err) {
>>> + fat_msg(sb, KERN_ERR,
>>> + "Error (%d) zeroing fallocated area", err);
>>> + return err;
>>> + }
>>> + }
>>
>> Again, I'm not fan of this way.
>>
>> Normally, get_block() returns with buffer_new(). Then, caller checks
>> blockdev buffer with
>>
>> unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
>>
>> then, zeroed buffer. Do we really don't need to check this race?
> We considered after your advice before. we reach for the conclusion
> that use this method.
> because, Cluster is already allocated in fat fallocate and
> when we write with radom offset over i_size on fallocated region, It
> will be hit by fat cache in fat_bmap of get_block, which mean buffer
> is not set to new.

Hm, how does it hit to fat cache? I think fat_alloc_clusters() and
fat_chain_add() doesn't update fat cache, right? I.e. initial write
after fallocate() should not hit fat cache over i_size?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2013-09-24 04:12:42

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

2013/9/23, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>>>> + if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
>>>> + && pos > i_size) {
>>>> + err = fat_zero_falloc_area(file, mapping, pos);
>>>> + if (err) {
>>>> + fat_msg(sb, KERN_ERR,
>>>> + "Error (%d) zeroing fallocated area", err);
>>>> + return err;
>>>> + }
>>>> + }
>>>
>>> Again, I'm not fan of this way.
>>>
>>> Normally, get_block() returns with buffer_new(). Then, caller checks
>>> blockdev buffer with
>>>
>>> unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
>>>
>>> then, zeroed buffer. Do we really don't need to check this race?
>> We considered after your advice before. we reach for the conclusion
>> that use this method.
>> because, Cluster is already allocated in fat fallocate and
>> when we write with radom offset over i_size on fallocated region, It
>> will be hit by fat cache in fat_bmap of get_block, which mean buffer
>> is not set to new.
>
> Hm, how does it hit to fat cache? I think fat_alloc_clusters() and
> fat_chain_add() doesn't update fat cache, right? I.e. initial write
> after fallocate() should not hit fat cache over i_size?

Ah.. Sorry for wrong reply. old memory make me confusing.
By allocating cluster in fat fallocate, when write, fat_bmap of
get_block return physical sector number.
So buffer is not set to new in _fat_get_block.

When we fallocate with keep size on -> only clusters are added to the
fat chain calling fat_get_cluster(),and add the cluster to cluster
chain.
This doesn’t call fat_get_block() .

Now when we try to write in the fallocated region in the
fat_write_begin() when it is called first time it checks that the
mismatch is present between the mmu_private and mmu_actual (i.e., the
file has pre-allocated blocks), and hence zero out the region ;
Since buffer_new() is not set for fallocated region by fat_get_block()
, we explicitly zero out the lseek'ed region using
“fat_zero_falloc_area” and normal write occurs beyond that,and i_size
is updated accordingly.

Thanks :)
>
> Thanks.
> --
> OGAWA Hirofumi <[email protected]>
>

2013-09-24 09:58:53

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

Namjae Jeon <[email protected]> writes:

>>> We considered after your advice before. we reach for the conclusion
>>> that use this method.
>>> because, Cluster is already allocated in fat fallocate and
>>> when we write with radom offset over i_size on fallocated region, It
>>> will be hit by fat cache in fat_bmap of get_block, which mean buffer
>>> is not set to new.
>>
>> Hm, how does it hit to fat cache? I think fat_alloc_clusters() and
>> fat_chain_add() doesn't update fat cache, right? I.e. initial write
>> after fallocate() should not hit fat cache over i_size?
>
> Ah.. Sorry for wrong reply. old memory make me confusing.
> By allocating cluster in fat fallocate, when write, fat_bmap of
> get_block return physical sector number.
> So buffer is not set to new in _fat_get_block.
>
> When we fallocate with keep size on -> only clusters are added to the
> fat chain calling fat_get_cluster(),and add the cluster to cluster
> chain.
> This doesn$B!G(Bt call fat_get_block() .

Right.

> Now when we try to write in the fallocated region in the
> fat_write_begin() when it is called first time it checks that the
> mismatch is present between the mmu_private and mmu_actual (i.e., the
> file has pre-allocated blocks), and hence zero out the region ;
> Since buffer_new() is not set for fallocated region by fat_get_block()
> , we explicitly zero out the lseek'ed region using
> $B!H(Bfat_zero_falloc_area$B!I(B and normal write occurs beyond that,and i_size
> is updated accordingly.

Yes. So I'm saying fixing fat_get_block() would not be hard.

For example, add new size the disk_size, totally 3 sizes - 1) i_size 2)
mmu_private (aka, initialized size) 3) disk_size (aka, uninitialized
size).

When called fat_get_block(), it checks the region between mmu_private
and disk_size. If block hits that region, block is uninitialized area,
so return as buffer_new().

Like this, I think it is not hard. Please consider like above example
too.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2013-09-26 11:28:25

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH v6] fat: additions to support fat_fallocate

2013/9/24, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>>>> We considered after your advice before. we reach for the conclusion
>>>> that use this method.
>>>> because, Cluster is already allocated in fat fallocate and
>>>> when we write with radom offset over i_size on fallocated region, It
>>>> will be hit by fat cache in fat_bmap of get_block, which mean buffer
>>>> is not set to new.
>>>
>>> Hm, how does it hit to fat cache? I think fat_alloc_clusters() and
>>> fat_chain_add() doesn't update fat cache, right? I.e. initial write
>>> after fallocate() should not hit fat cache over i_size?
>>
>> Ah.. Sorry for wrong reply. old memory make me confusing.
>> By allocating cluster in fat fallocate, when write, fat_bmap of
>> get_block return physical sector number.
>> So buffer is not set to new in _fat_get_block.
>>
>> When we fallocate with keep size on -> only clusters are added to the
>> fat chain calling fat_get_cluster(),and add the cluster to cluster
>> chain.
>> This doesn’t call fat_get_block() .
>
> Right.
>
>> Now when we try to write in the fallocated region in the
>> fat_write_begin() when it is called first time it checks that the
>> mismatch is present between the mmu_private and mmu_actual (i.e., the
>> file has pre-allocated blocks), and hence zero out the region ;
>> Since buffer_new() is not set for fallocated region by fat_get_block()
>> , we explicitly zero out the lseek'ed region using
>> “fat_zero_falloc_area” and normal write occurs beyond that,and i_size
>> is updated accordingly.
>
> Yes. So I'm saying fixing fat_get_block() would not be hard.
>
> For example, add new size the disk_size, totally 3 sizes - 1) i_size 2)
> mmu_private (aka, initialized size) 3) disk_size (aka, uninitialized
> size).
>
> When called fat_get_block(), it checks the region between mmu_private
> and disk_size. If block hits that region, block is uninitialized area,
> so return as buffer_new().
>
> Like this, I think it is not hard. Please consider like above example
> too.
Yes, I checked after change as your suggestion. it work fine.
I will post the updated patch soon.

Thanks. OGAWA.
>
> Thanks.
> --
> OGAWA Hirofumi <[email protected]>
>