2014-02-03 04:26:29

by OGAWA Hirofumi

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

Namjae Jeon <[email protected]> writes:

> From: Namjae Jeon <[email protected]>
>
> Make the fibmap call return the proper physical block number for any
> offset request in the fallocated range.
>
> Signed-off-by: Namjae Jeon <[email protected]>
> Signed-off-by: Amit Sahrawat <[email protected]>
> ---
> fs/fat/cache.c | 13 ++++++++++---
> fs/fat/fat.h | 3 +++
> fs/fat/inode.c | 3 +++
> 3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/fat/cache.c b/fs/fat/cache.c
> index a132666..d22c1a2 100644
> --- a/fs/fat/cache.c
> +++ b/fs/fat/cache.c
> @@ -325,19 +325,26 @@ int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys,
>
>  	last_block = (i_size_read(inode) + (blocksize - 1)) >> blocksize_bits;
>  	if (sector >= last_block) {
> -		if (!create)
> -			return 0;
> -
>  		/*
>  		 * Both ->mmu_private and ->i_disksize can access
>  		 * on only allocation path. (caller must hold ->i_mutex)
>  		 */
>  		last_block = (MSDOS_I(inode)->i_disksize + (blocksize - 1))
>  			>> blocksize_bits;
> +		if (!create) {
> +			/* Map a block in fallocated region */
> +			if (atomic_read(&MSDOS_I(inode)->beyond_isize))
> +				if (sector < last_block)
> +					goto out_map_cluster;
> +
> +			return 0;
> +		}
> +
>  		if (sector >= last_block)
>  			return 0;
>  	}
>
> +out_map_cluster:
>  	cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits);
>  	offset = sector & (sbi->sec_per_clus - 1);
>  	cluster = fat_bmap_cluster(inode, cluster);
> diff --git a/fs/fat/fat.h b/fs/fat/fat.h
> index 7b5851f..b884276 100644
> --- a/fs/fat/fat.h
> +++ b/fs/fat/fat.h
> @@ -129,6 +129,9 @@ struct msdos_inode_info {
>  	struct hlist_node i_dir_hash;	/* hash by i_logstart */
>  	struct rw_semaphore truncate_lock; /* protect bmap against truncate */
>  	struct inode vfs_inode;
> +
> +	/* for getting block number beyond file size in case of fallocate */
> +	atomic_t beyond_isize;
>  };
>
> struct fat_slot_info {
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 3636617..1c3192b 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -256,7 +256,10 @@ static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
>
>  	/* fat_get_cluster() assumes the requested blocknr isn't truncated. */
>  	down_read(&MSDOS_I(mapping->host)->truncate_lock);
> +	/* To get block number beyond file size in fallocated region */
> +	atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>  	blocknr = generic_block_bmap(mapping, block, fat_get_block);
> +	atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>  	up_read(&MSDOS_I(mapping->host)->truncate_lock);

This is racy. While a user is calling bmap, the kernel can allocate new blocks.
We should use a separate function for this.

For example, something like

fat_get_block_bmap()
{
	[...]
	fat_get_block2(inode, iblock, &max_blocks, bh_result, create, bmap);
	[...]
}

blocknr = generic_block_bmap(mapping, block, fat_get_block_bmap);
--
OGAWA Hirofumi <[email protected]>
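
The suggestion above (a dedicated get_block callback for bmap instead of a per-inode flag) can be sketched in user space roughly as follows. This is only an illustrative model, not kernel code: struct model_inode, model_maps_existing(), model_get_block(), model_get_block_bmap() and the 512-byte block size are all assumptions made up for the example, and nothing here claims to match the real fat_get_block2() beyond the argument list shown in the message. The point it tries to show is that "this is a bmap request" travels as an argument of one call, so no other caller on the same inode can observe it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the FAT in-memory inode. */
struct model_inode {
	uint64_t i_size;     /* bytes visible to readers */
	uint64_t i_disksize; /* bytes backed by clusters, incl. fallocated tail */
};

enum { MODEL_BLOCK_BITS = 9 }; /* assumed 512-byte blocks */

/* Number of blocks needed to hold `bytes`, rounded up. */
static uint64_t model_blocks(uint64_t bytes)
{
	return (bytes + (1u << MODEL_BLOCK_BITS) - 1) >> MODEL_BLOCK_BITS;
}

/*
 * Shared helper: does `sector` resolve to an already allocated block?
 * `bmap` is a per-call argument, so it cannot leak into other callers.
 */
static bool model_maps_existing(const struct model_inode *inode,
				uint64_t sector, bool create, bool bmap)
{
	if (sector < model_blocks(inode->i_size))
		return true;
	if (!create && !bmap)
		return false; /* plain read beyond i_size: leave unmapped */
	return sector < model_blocks(inode->i_disksize);
}

/* Callback for ordinary reads and writes. */
bool model_get_block(const struct model_inode *inode, uint64_t sector,
		     bool create)
{
	return model_maps_existing(inode, sector, create, false);
}

/* Callback used only by the bmap path. */
bool model_get_block_bmap(const struct model_inode *inode, uint64_t sector)
{
	return model_maps_existing(inode, sector, false, true);
}

/* Usage: a read past i_size stays unmapped, bmap sees the fallocated block. */
void model_self_check(void)
{
	struct model_inode inode = { .i_size = 1024, .i_disksize = 4096 };

	assert(model_get_block(&inode, 1, false));   /* inside i_size */
	assert(!model_get_block(&inode, 4, false));  /* read beyond i_size */
	assert(model_get_block_bmap(&inode, 4));     /* fallocated region */
	assert(!model_get_block_bmap(&inode, 8));    /* beyond i_disksize */
}
```

Because the two callbacks differ only in the constant they pass down, the read path and the bmap path share one lookup routine without sharing any mutable state.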


2014-02-03 23:13:28

by Namjae Jeon

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

2014-02-03, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>> From: Namjae Jeon <[email protected]>
>>
>> Make the fibmap call return the proper physical block number for any
>> offset request in the fallocated range.
>>
>> Signed-off-by: Namjae Jeon <[email protected]>
>> Signed-off-by: Amit Sahrawat <[email protected]>
>> ---
>> fs/fat/cache.c | 13 ++++++++++---
>> fs/fat/fat.h | 3 +++
>> fs/fat/inode.c | 3 +++
>> 3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fat/cache.c b/fs/fat/cache.c
>> index a132666..d22c1a2 100644
>> --- a/fs/fat/cache.c
>> +++ b/fs/fat/cache.c
>> @@ -325,19 +325,26 @@ int fat_bmap(struct inode *inode, sector_t sector,
>> sector_t *phys,
>>
>> last_block = (i_size_read(inode) + (blocksize - 1)) >> blocksize_bits;
>> if (sector >= last_block) {
>> - if (!create)
>> - return 0;
>> -
>> /*
>> * Both ->mmu_private and ->i_disksize can access
>> * on only allocation path. (caller must hold ->i_mutex)
>> */
>> last_block = (MSDOS_I(inode)->i_disksize + (blocksize - 1))
>> >> blocksize_bits;
>> + if (!create) {
>> + /* Map a block in fallocated region */
>> + if (atomic_read(&MSDOS_I(inode)->beyond_isize))
>> + if (sector < last_block)
>> + goto out_map_cluster;
>> +
>> + return 0;
>> + }
>> +
>> if (sector >= last_block)
>> return 0;
>> }
>>
>> +out_map_cluster:
>> cluster = sector >> (sbi->cluster_bits - sb->s_blocksize_bits);
>> offset = sector & (sbi->sec_per_clus - 1);
>> cluster = fat_bmap_cluster(inode, cluster);
>> diff --git a/fs/fat/fat.h b/fs/fat/fat.h
>> index 7b5851f..b884276 100644
>> --- a/fs/fat/fat.h
>> +++ b/fs/fat/fat.h
>> @@ -129,6 +129,9 @@ struct msdos_inode_info {
>> struct hlist_node i_dir_hash; /* hash by i_logstart */
>> struct rw_semaphore truncate_lock; /* protect bmap against truncate */
>> struct inode vfs_inode;
>> +
>> + /* for getting block number beyond file size in case of fallocate */
>> + atomic_t beyond_isize;
>> };
>>
>> struct fat_slot_info {
>> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
>> index 3636617..1c3192b 100644
>> --- a/fs/fat/inode.c
>> +++ b/fs/fat/inode.c
>> @@ -256,7 +256,10 @@ static sector_t _fat_bmap(struct address_space
>> *mapping, sector_t block)
>>
>> /* fat_get_cluster() assumes the requested blocknr isn't truncated. */
>> down_read(&MSDOS_I(mapping->host)->truncate_lock);
>> + /* To get block number beyond file size in fallocated region */
>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>> blocknr = generic_block_bmap(mapping, block, fat_get_block);
>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>> up_read(&MSDOS_I(mapping->host)->truncate_lock);
>
> This is racy. While user is using bmap, kernel can allocate new blocks.
> We should use another function for this.
I understand that FAT can map fallocated blocks in the read case while
a user is using bmap.
But I cannot find a case where new blocks are allocated.
If I am missing something, could you please elaborate?
Is it the case of a _bmap request returning the block number of a block
allocated in a parallel write path?

Thanks.
>
> For example, something like
>
> fat_get_block_bmap()
> {
> [...]
> fat_get_block2(inode, iblock, &max_blocks, bh_result, create, bmap);
> [...]
> }
>
> blocknr = generic_block_bmap(mapping, block, fat_get_block_bmap);
> --
> OGAWA Hirofumi <[email protected]>
>

2014-02-04 02:46:07

by OGAWA Hirofumi

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

Namjae Jeon <[email protected]> writes:

>>> /* fat_get_cluster() assumes the requested blocknr isn't truncated. */
>>> down_read(&MSDOS_I(mapping->host)->truncate_lock);
>>> + /* To get block number beyond file size in fallocated region */
>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>>> blocknr = generic_block_bmap(mapping, block, fat_get_block);
>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>>> up_read(&MSDOS_I(mapping->host)->truncate_lock);
>>
>> This is racy. While user is using bmap, kernel can allocate new blocks.
>> We should use another function for this.
> I understand that fat can map fallocated blocks in read case while
> user is using bmap.
> But I can not find the case allocate new blocks.
> If I am missing something, Could you please elaborate more ?
> Is it a case of _bmap request returning the block number for block
> allocated in parallel write path ?

->beyond_isize is global to the inode. So a write(2) path on the same inode
can also see the 1 set by bmap() while another process is using bmap().
--
OGAWA Hirofumi <[email protected]>

2014-02-04 04:03:19

by Namjae Jeon

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

2014-02-04, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>>>> /* fat_get_cluster() assumes the requested blocknr isn't truncated.
>>>> */
>>>> down_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>> + /* To get block number beyond file size in fallocated region */
>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>>>> blocknr = generic_block_bmap(mapping, block, fat_get_block);
>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>>>> up_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>
>>> This is racy. While user is using bmap, kernel can allocate new blocks.
>>> We should use another function for this.
>> I understand that fat can map fallocated blocks in read case while
>> user is using bmap.
>> But I can not find the case allocate new blocks.
>> If I am missing something, Could you please elaborate more ?
>> Is it a case of _bmap request returning the block number for block
>> allocated in parallel write path ?
>
> ->beyond_isize is global for inode. So, write(2) path on same inode with
> bmap() also can see 1 set by bmap() while another process is using bmap().
The 'create' flag will be 1 in the write(2) path, and ->beyond_isize is only
checked when 'create' is 0. Is there any case where beyond_isize causes a
race in the write(2) path?

Thanks.
> --
> OGAWA Hirofumi <[email protected]>
>

2014-02-04 06:55:57

by OGAWA Hirofumi

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

Namjae Jeon <[email protected]> writes:

> 2014-02-04, OGAWA Hirofumi <[email protected]>:
>> Namjae Jeon <[email protected]> writes:
>>
>>>>> /* fat_get_cluster() assumes the requested blocknr isn't truncated.
>>>>> */
>>>>> down_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>>> + /* To get block number beyond file size in fallocated region */
>>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>>>>> blocknr = generic_block_bmap(mapping, block, fat_get_block);
>>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>>>>> up_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>>
>>>> This is racy. While user is using bmap, kernel can allocate new blocks.
>>>> We should use another function for this.
>>> I understand that fat can map fallocated blocks in read case while
>>> user is using bmap.
>>> But I can not find the case allocate new blocks.
>>> If I am missing something, Could you please elaborate more ?
>>> Is it a case of _bmap request returning the block number for block
>>> allocated in parallel write path ?
>>
>> ->beyond_isize is global for inode. So, write(2) path on same inode with
>> bmap() also can see 1 set by bmap() while another process is using bmap().
> 'create' flag will be 1 in write(2) path. ->beyond_isize will only be
> checked when 'create' flag is 0. Is there any case to be racy by
> beyond_isize in write(2) path ?

Ah, so instead of write, a simple read will assign physical addresses to
buffers beyond i_size if it races with bmap? In that case, it is still wrong.
--
OGAWA Hirofumi <[email protected]>
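
The interleaving discussed in this message can be replayed deterministically in user space. The sketch below is a hedged model, not the kernel code: struct race_inode, race_lookup() and race_replay() are hypothetical names, block counts stand in for i_size and i_disksize, and a plain bool stands in for the atomic_t from the quoted patch. It only illustrates that a flag stored in the inode is visible to every create == 0 lookup that runs between the set and the clear, whoever issued it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the inode state touched by the quoted patch. */
struct race_inode {
	uint64_t size_blocks; /* blocks covered by i_size */
	uint64_t disk_blocks; /* blocks covered by i_disksize */
	bool beyond_isize;    /* shared per-inode flag, as in the patch */
};

/* Simplified version of the lookup decision in the patched fat_bmap(). */
static bool race_lookup(const struct race_inode *inode, uint64_t sector,
			bool create)
{
	if (sector < inode->size_blocks)
		return true;
	if (!create) /* read or bmap: only the shared flag tells them apart */
		return inode->beyond_isize && sector < inode->disk_blocks;
	return sector < inode->disk_blocks;
}

/*
 * Replays one interleaving: an unrelated read (create == 0) runs after the
 * bmap caller has set the flag and before it clears it again.
 * Returns whether that read got a mapping beyond i_size.
 */
bool race_replay(void)
{
	struct race_inode inode = {
		.size_blocks = 2, .disk_blocks = 8, .beyond_isize = false,
	};
	bool before, during, after;

	before = race_lookup(&inode, 4, false); /* read, flag clear */
	inode.beyond_isize = true;              /* bmap: set flag */
	during = race_lookup(&inode, 4, false); /* read from another task */
	inode.beyond_isize = false;             /* bmap: clear flag */
	after = race_lookup(&inode, 4, false);  /* read, flag clear again */

	assert(!before);
	assert(!after);
	return during;
}
```

In this model the read issued inside the window is mapped to a block beyond i_size, which is the behavior described above as still wrong, while the same read outside the window stays unmapped.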

2014-02-04 07:00:22

by Namjae Jeon

Subject: Re: [PATCH v3 5/6] fat: permit to return phy block number by fibmap in fallocated region

2014-02-04, OGAWA Hirofumi <[email protected]>:
> Namjae Jeon <[email protected]> writes:
>
>> 2014-02-04, OGAWA Hirofumi <[email protected]>:
>>> Namjae Jeon <[email protected]> writes:
>>>
>>>>>> /* fat_get_cluster() assumes the requested blocknr isn't truncated.
>>>>>> */
>>>>>> down_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>>>> + /* To get block number beyond file size in fallocated region */
>>>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 1);
>>>>>> blocknr = generic_block_bmap(mapping, block, fat_get_block);
>>>>>> + atomic_set(&MSDOS_I(mapping->host)->beyond_isize, 0);
>>>>>> up_read(&MSDOS_I(mapping->host)->truncate_lock);
>>>>>
>>>>> This is racy. While user is using bmap, kernel can allocate new
>>>>> blocks.
>>>>> We should use another function for this.
>>>> I understand that fat can map fallocated blocks in read case while
>>>> user is using bmap.
>>>> But I can not find the case allocate new blocks.
>>>> If I am missing something, Could you please elaborate more ?
>>>> Is it a case of _bmap request returning the block number for block
>>>> allocated in parallel write path ?
>>>
>>> ->beyond_isize is global for inode. So, write(2) path on same inode with
>>> bmap() also can see 1 set by bmap() while another process is using
>>> bmap().
>> 'create' flag will be 1 in write(2) path. ->beyond_isize will only be
>> checked when 'create' flag is 0. Is there any case to be racy by
>> beyond_isize in write(2) path ?
>
> Ah, so instead of write, it will assign physical address to buffers
> beyond i_size for simple read if race? In this case, it is still wrong.
Right. I will fix this case.
Thanks for the review!
> --
> OGAWA Hirofumi <[email protected]>
>