2009-11-23 18:13:49

by Dmitry Monakhov

[permalink] [raw]
Subject: ext4+quota patch series

Hello, I've prepared patch-set which fixes some issues
in quota related code. With this patch-set applied i've
got success pass from all quota tests I use.


2009-11-23 18:15:44

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4+quota patch series

Dmitry Monakhov wrote:
> Hello, I've prepared patch-set which fixes some issues
> in quota related code. With this patch-set applied i've
> got success pass from all quota tests I use.
> --



Can you share those quota tests? I'd love to put them into the xfstests
suite we've been using for ext4 as well.

Thanks,
-Eric

2009-11-23 18:30:22

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 1/4] ext4: delalloc quota fixes

This patch fix most visible problems with delalloc+quota issues
- ext4_get_reserved_space() must return bytes instead of blocks.
- Claim allocated meta blocks. Do it as we do for data blocks
but delay it until proper moment.
- Move space claiming to ext4_da_update_reserve_space()
Only here we do know what we are dealing with data or metadata

The most useful test case for delalloc+quota is concurrent tasks
with write+truncate vs chown for a same file.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 13 ++++++++-----
fs/ext4/mballoc.c | 6 ------
2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ab22297..84863e6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
EXT4_I(inode)->i_reserved_meta_blocks;
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

- return total;
+ return (total << inode->i_blkbits);
}
/*
* Calculate the number of metadata blocks need to reserve
@@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
static void ext4_da_update_reserve_space(struct inode *inode, int used)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- int total, mdb, mdb_free;
+ int total, mdb, mdb_free, mdb_claim = 0;

spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
/* recalculate the number of metablocks still need to be reserved */
@@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)

if (mdb_free) {
/* Account for allocated meta_blocks */
- mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
+ mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+ BUG_ON(mdb_free < mdb_claim);
+ mdb_free -= mdb_claim;

/* update fs dirty blocks counter */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
@@ -1119,8 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)

/* update per-inode reservations */
BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
- EXT4_I(inode)->i_reserved_data_blocks -= used;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+ EXT4_I(inode)->i_reserved_data_blocks -= used;
+ percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
+ vfs_dq_claim_block(inode, used + mdb_claim);

/*
* free those over-booking quota for metadata blocks
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index bba1282..d4c52db 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
- else {
- percpu_counter_sub(&sbi->s_dirtyblocks_counter,
- ac->ac_b_ex.fe_len);
- /* convert reserved quota blocks to real quota blocks */
- vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
- }

if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi,
--
1.6.0.4


2009-11-23 18:32:00

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 2/4] ext4: fix race chown vs truncate

Currently all functions which call vfs_dq_release_reservation_block()
call it without i_block_reservation_lock. This result in
ext4_reservation vs quota_reservation inconsistency which provoke
incorrect reservation transfer and incorrect quota.
Task 1 (chown) Task 2 (truncate)
dquot_transfer
->down_write(dqptr_sem) ext4_da_release_spac
->dquot_get_reserved_space ->lock(i_block_reservation_lock)
->get_reserved_space /* decrement reservation */
->ext4_get_reserved_spac ->unlock(i_block_reservation_lock)
lock(i_block_rsv_lock) ---- /* During this time window
Read incorrect value * fs's reservation not equals
* to quota's */
->vfs_dq_release_reservation_block()
In fact i_block_reservation_lock is held by ext4_da_reserve_space()
while calling vfs_dq_reserve_block(). This may result in deadlock:
because of different lock ordering:
ext4_da_reserve_space() dquot_transfer()
lock(i_block_reservation_lock) down_write(dqptr_sem)
down_write(dqptr_sem) lock(i_block_reservation_lock)

But this not happens only because both callers must have i_mutex so
serialization happens on i_mutex.

To prevent ext4_reservation vs dquot_reservation inconsistency, we have
to reorganize locking ordering like follows:
i_block_reservation_lock > dqptr_sem
This means what all functions which changes ext4 or quota reservation have
to hold i_block_reservation_lock.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 23 ++++++++++++++++++-----
1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84863e6..c521c93 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1047,16 +1047,23 @@ cleanup:
out:
return err;
}
+/*
+ * Quota_transfer callback.
+ * During quota transfer we have to transfer rsv-blocks from one dquot to
+ * another. inode must be protected from concurrent reservation/reclamation.
+ * Locking ordering for all space reservation code:
+ * i_block_reservation_lock > dqptr_sem
+ * This is differ from i_block,i_lock locking ordering, but this is the
+ * only possible way.
+ * NOTE: Caller must hold i_block_reservation_lock.
+ */

qsize_t ext4_get_reserved_space(struct inode *inode)
{
unsigned long long total;

- spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
total = EXT4_I(inode)->i_reserved_data_blocks +
EXT4_I(inode)->i_reserved_meta_blocks;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
-
return (total << inode->i_blkbits);
}
/*
@@ -1131,6 +1138,8 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
if (mdb_free)
vfs_dq_release_reservation_block(inode, mdb_free);

+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+
/*
* If we have done all the pending block allocations and if
* there aren't any writers on the inode, we can discard the
@@ -1866,8 +1875,8 @@ repeat:
}

if (ext4_claim_free_blocks(sbi, total)) {
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
vfs_dq_release_reservation_block(inode, total);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
yield();
goto repeat;
@@ -1924,9 +1933,9 @@ static void ext4_da_release_space(struct inode *inode, int to_free)

BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
EXT4_I(inode)->i_reserved_meta_blocks = mdb;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

vfs_dq_release_reservation_block(inode, release);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
}

static void ext4_da_page_release_reservation(struct page *page,
@@ -5436,7 +5445,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
error = PTR_ERR(handle);
goto err_out;
}
+ /* i_block_reservation must being held in order to avoid races
+ * with concurent block reservation. */
+ spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
if (error) {
ext4_journal_stop(handle);
return error;
--
1.6.0.4


2009-11-23 18:34:32

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer.

Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b166e90..428336a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5440,7 +5440,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
/* (user+group)*(old+new) structure, inode write (sb,
* inode block, ? - but truncate inode update has it) */
handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
- EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
+ EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto err_out;
--
1.6.0.4


2009-11-23 18:33:00

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 3/4] ext4: quota macros cleanup

Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/ext4_jbd2.h | 8 ++++++--
fs/ext4/extents.c | 2 +-
fs/ext4/inode.c | 2 +-
fs/ext4/migrate.c | 4 ++--
fs/ext4/namei.c | 8 ++++----
5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index a286598..404ab6c 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -49,7 +49,7 @@

#define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \
EXT4_XATTR_TRANS_BLOCKS - 2 + \
- 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
+ EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

/*
* Define the number of metadata blocks we need to account to modify data.
@@ -57,7 +57,7 @@
* This include super block, inode block, quota blocks and xattr blocks
*/
#define EXT4_META_TRANS_BLOCKS(sb) (EXT4_XATTR_TRANS_BLOCKS + \
- 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
+ EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

/* Delete operations potentially hit one directory's namespace plus an
* entire inode, plus arbitrary amounts of bitmap/indirection data. Be
@@ -92,6 +92,7 @@
* but inode, sb and group updates are done only once */
#define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_INIT_REWRITE) : 0)
+
#define EXT4_QUOTA_DEL_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_DEL_ALLOC*\
(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_DEL_REWRITE) : 0)
#else
@@ -99,6 +100,9 @@
#define EXT4_QUOTA_INIT_BLOCKS(sb) 0
#define EXT4_QUOTA_DEL_BLOCKS(sb) 0
#endif
+#define EXT4_MAXQUOTAS_TRANS_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS(sb))
+#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
+#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))

int
ext4_mark_iloc_dirty(handle_t *handle,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 715264b..f73c3ff 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2167,7 +2167,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
correct_index = 1;
credits += (ext_depth(inode)) + 1;
}
- credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
+ credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

err = ext4_ext_truncate_extend_restart(handle, inode, credits);
if (err)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c521c93..b166e90 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5439,7 +5439,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)

/* (user+group)*(old+new) structure, inode write (sb,
* inode block, ? - but truncate inode update has it) */
- handle = ext4_journal_start(inode, 2*(EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)+
+ handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index a93d5b8..8646149 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -238,7 +238,7 @@ static int extend_credit_for_blkdel(handle_t *handle, struct inode *inode)
* So allocate a credit of 3. We may update
* quota (user and group).
*/
- needed = 3 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
+ needed = 3 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

if (ext4_journal_extend(handle, needed) != 0)
retval = ext4_journal_restart(handle, needed);
@@ -477,7 +477,7 @@ int ext4_ext_migrate(struct inode *inode)
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2 * EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)
+ EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
+ 1);
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fde08c9..17a17e1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1769,7 +1769,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, int mode,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -1803,7 +1803,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -1840,7 +1840,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2253,7 +2253,7 @@ static int ext4_symlink(struct inode *dir,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 5 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

--
1.6.0.4


2009-11-23 18:42:38

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH 2/4] ext4: fix race chown vs truncate

Dmitry Monakhov <[email protected]> writes:
> To prevent ext4_reservation vs dquot_reservation inconsistency, we have
> to reorganize locking ordering like follows:
> i_block_reservation_lock > dqptr_sem
> This means what all functions which changes ext4 or quota reservation have
> to hold i_block_reservation_lock.
While investigating this issue i've considered other solution

* Introduce i_block analog for generic reserved space management:
We may introduce i_rsv_block field in generic inode, and protected
it by i_lock(similar to i_block).
Introduce inc/dec/set/get methods similar to inode_get_bytes,
inode_sub_bytes.. .
This value is managed internally by quota code. Perform reservation
management inside dquot_reserve_space, dquot_release_reservation
without interfering with fs internals, as we do for i_blocks.
IMHO this is best way because:

1)We don't have to hold i_block_reservation_lock while quota-op
which may lead to virtual performance penalty.
2)This brings to well defined VFS interface for reserved space management.

But I expect some problems from AlViro because only ext4 would use it by now.
And off course this may lead to ext4_rsv quot_rsv inconsistency
due to some bugs.
>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> ---
> fs/ext4/inode.c | 23 ++++++++++++++++++-----
> 1 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 84863e6..c521c93 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1047,16 +1047,23 @@ cleanup:
> out:
> return err;
> }
> +/*
> + * Quota_transfer callback.
> + * During quota transfer we have to transfer rsv-blocks from one dquot to
> + * another. inode must be protected from concurrent reservation/reclamation.
> + * Locking ordering for all space reservation code:
> + * i_block_reservation_lock > dqptr_sem
> + * This is differ from i_block,i_lock locking ordering, but this is the
> + * only possible way.
> + * NOTE: Caller must hold i_block_reservation_lock.
> + */
>
> qsize_t ext4_get_reserved_space(struct inode *inode)
> {
> unsigned long long total;
>
> - spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> total = EXT4_I(inode)->i_reserved_data_blocks +
> EXT4_I(inode)->i_reserved_meta_blocks;
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> -
> return (total << inode->i_blkbits);
> }
> /*
> @@ -1131,6 +1138,8 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
> if (mdb_free)
> vfs_dq_release_reservation_block(inode, mdb_free);
>
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +
> /*
> * If we have done all the pending block allocations and if
> * there aren't any writers on the inode, we can discard the
> @@ -1866,8 +1875,8 @@ repeat:
> }
>
> if (ext4_claim_free_blocks(sbi, total)) {
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> vfs_dq_release_reservation_block(inode, total);
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
> yield();
> goto repeat;
> @@ -1924,9 +1933,9 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>
> BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
> EXT4_I(inode)->i_reserved_meta_blocks = mdb;
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> vfs_dq_release_reservation_block(inode, release);
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> }
>
> static void ext4_da_page_release_reservation(struct page *page,
> @@ -5436,7 +5445,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> error = PTR_ERR(handle);
> goto err_out;
> }
> + /* i_block_reservation must being held in order to avoid races
> + * with concurent block reservation. */
> + spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> if (error) {
> ext4_journal_stop(handle);
> return error;

2009-11-23 19:18:29

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: ext4+quota patch series

Eric Sandeen <[email protected]> writes:
>Can you share those quota tests? I'd love to put them into the xfstests
>suite we've been using for ext4 as well.

write-truncate-chown: test delalloc + quota_transfer
I've written crappy quotactl for quota manipulation
(which i use for ct-tree-quota development).Some times
it more useful. See files attached.

mkfs.ext4 /dev/sdb5 -b4096
mount /dev/sdb5 /mnt -ogrpquota,usrquota
quotacheck -cug /mnt
# sync is necessary, because files may have reserved some blocks
# which later lead to complain from claim_reserved_space
# Probably we have print *huge* warning if we found
# file with reserved blocks inodes traversing on quotaon
sync;sync;sync
# turn on quota
./quotactl --all --on --device=/dev/sdb5 --path /mnt
# run test
./write-truncate-chown /mnt/ 9999999999&
# get quota report, print warn in case of incorrect quota.
./quotactl --all --get --type 0 --device=/dev/sdb5 || echo "failed"
# checkout dmesg
dmesg


Attachments:
quotactl.c (5.10 kB)
write-truncate-chown.c (1.31 kB)
Download all attachments

2009-11-23 19:35:01

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4+quota patch series

Dmitry Monakhov wrote:
> Eric Sandeen <[email protected]> writes:
>> Can you share those quota tests? I'd love to put them into the xfstests
>> suite we've been using for ext4 as well.
>
> write-truncate-chown: test delalloc + quota_transfer
> I've written crappy quotactl for quota manipulation
> (which i use for ct-tree-quota development).Some times
> it more useful. See files attached.

Great, thanks!

-Eric


> mkfs.ext4 /dev/sdb5 -b4096
> mount /dev/sdb5 /mnt -ogrpquota,usrquota
> quotacheck -cug /mnt
> # sync is necessary, because files may have reserved some blocks
> # which later lead to complain from claim_reserved_space
> # Probably we have print *huge* warning if we found
> # file with reserved blocks inodes traversing on quotaon
> sync;sync;sync
> # turn on quota
> ./quotactl --all --on --device=/dev/sdb5 --path /mnt
> # run test
> ./write-truncate-chown /mnt/ 9999999999&
> # get quota report, print warn in case of incorrect quota.
> ./quotactl --all --get --type 0 --device=/dev/sdb5 || echo "failed"
> # checkout dmesg
> dmesg
>
>


2009-11-23 22:43:53

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH 1/4] ext4: delalloc quota fixes

Dmitry Monakhov <[email protected]> writes:
> @@ -1119,8 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> /* update per-inode reservations */
> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
> - EXT4_I(inode)->i_reserved_data_blocks -= used;
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
^^^^^^^^^^ OOps... i've just deleted spin_unlock, but it supposed to
be moved down. And appear again in second patch ;(
Seem what i've mistyped during patch splitting. Please ignore
this version, i'll send new version in a minute.

> + EXT4_I(inode)->i_reserved_data_blocks -= used;
> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
> + vfs_dq_claim_block(inode, used + mdb_claim);
>
> /*
> * free those over-booking quota for metadata blocks
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index bba1282..d4c52db 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
> /* release all the reserved blocks if non delalloc */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
> - else {
> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> - ac->ac_b_ex.fe_len);
> - /* convert reserved quota blocks to real quota blocks */
> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
> - }
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi,

2009-11-23 22:58:53

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 1/4] ext4: delalloc quota fixes

This patch fix most visible problems with delalloc+quota issues
- ext4_get_reserved_space() must return bytes instead of blocks.
- Claim allocated meta blocks. Do it as we do for data blocks
but delay it untill proper moment.
- Move space claiming to ext4_da_update_reserve_space()
Only here we do know what we are dealing with data or metadata

The most useful test case for delalloc+quota is concurent tasks
with write+truncate vs chown for a same file.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 12 ++++++++----
fs/ext4/mballoc.c | 6 ------
2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ab22297..e642cdb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
EXT4_I(inode)->i_reserved_meta_blocks;
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

- return total;
+ return (total << inode->i_blkbits);
}
/*
* Calculate the number of metadata blocks need to reserve
@@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
static void ext4_da_update_reserve_space(struct inode *inode, int used)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- int total, mdb, mdb_free;
+ int total, mdb, mdb_free, mdb_claim = 0;

spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
/* recalculate the number of metablocks still need to be reserved */
@@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)

if (mdb_free) {
/* Account for allocated meta_blocks */
- mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
+ mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+ BUG_ON(mdb_free < mdb_claim);
+ mdb_free -= mdb_claim;

/* update fs dirty blocks counter */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
@@ -1119,7 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)

/* update per-inode reservations */
BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
- EXT4_I(inode)->i_reserved_data_blocks -= used;
+ EXT4_I(inode)->i_reserved_data_blocks -= used;
+ percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
+ vfs_dq_claim_block(inode, used + mdb_claim);
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index bba1282..d4c52db 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
- else {
- percpu_counter_sub(&sbi->s_dirtyblocks_counter,
- ac->ac_b_ex.fe_len);
- /* convert reserved quota blocks to real quota blocks */
- vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
- }

if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi,
--
1.6.0.4


2009-11-23 22:58:54

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 2/4] ext4: fix race chown vs truncate

Currently all functions which call vfs_dq_release_reservation_block()
call it without i_block_reservation_lock. This result in
ext4_reservation vs quota_reservation inconsistency which provoke
incorrect reservation transfer and incorrect quota.
Task 1 (chown) Task 2 (truncate)
dquot_transfer
->down_write(dqptr_sem) ext4_da_release_spac
->dquot_get_reserved_space ->lock(i_block_reservation_lock)
->get_reserved_space /* decrement reservation */
->ext4_get_reserved_spac ->unlock(i_block_reservation_lock)
lock(i_block_rsv_lock) ---- /* During this time window
Read incorrect value * fs's reservation not equals
* to quota's */
->vfs_dq_release_reservation_block()
In fact i_block_reservation_lock is held by ext4_da_reserve_space()
while calling vfs_dq_reserve_block(). This may result in deadlock:
because of different lock ordering:
ext4_da_reserve_space() dquot_transfer()
lock(i_block_reservation_lock) down_write(dqptr_sem)
down_write(dqptr_sem) lock(i_block_reservation_lock)

But this not happens only because both callers must have i_mutex so
serialization happens on i_mutex.

To prevent ext4_reservation vs dquot_reservation inconsistency, we have
to reorganize locking ordering like follows:
i_block_reservation_lock > dqptr_sem
This means what all functions which changes ext4 or quota reservation have
to hold i_block_reservation_lock.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 23 +++++++++++++++++------
1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e642cdb..979362d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1047,16 +1047,23 @@ cleanup:
out:
return err;
}
+/*
+ * Quota_transfer callback.
+ * During quota transfer we have to transfer rsv-blocks from one dquot to
+ * another. inode must be protected from concurrent reservation/reclamation.
+ * Locking ordering for all space reservation code:
+ * i_block_reservation_lock > dqptr_sem
+ * This is differ from i_block,i_lock locking ordering, but this is the
+ * only possible way.
+ * NOTE: Caller must hold i_block_reservation_lock.
+ */

qsize_t ext4_get_reserved_space(struct inode *inode)
{
unsigned long long total;

- spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
total = EXT4_I(inode)->i_reserved_data_blocks +
EXT4_I(inode)->i_reserved_meta_blocks;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
-
return (total << inode->i_blkbits);
}
/*
@@ -1124,13 +1131,13 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
EXT4_I(inode)->i_reserved_data_blocks -= used;
percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
vfs_dq_claim_block(inode, used + mdb_claim);
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

/*
* free those over-booking quota for metadata blocks
*/
if (mdb_free)
vfs_dq_release_reservation_block(inode, mdb_free);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

/*
* If we have done all the pending block allocations and if
@@ -1867,8 +1874,8 @@ repeat:
}

if (ext4_claim_free_blocks(sbi, total)) {
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
vfs_dq_release_reservation_block(inode, total);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
yield();
goto repeat;
@@ -1925,9 +1932,9 @@ static void ext4_da_release_space(struct inode *inode, int to_free)

BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
EXT4_I(inode)->i_reserved_meta_blocks = mdb;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

vfs_dq_release_reservation_block(inode, release);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
}

static void ext4_da_page_release_reservation(struct page *page,
@@ -5437,7 +5444,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
error = PTR_ERR(handle);
goto err_out;
}
+ /* i_block_reservation must being held in order to avoid races
+ * with concurent block reservation. */
+ spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
if (error) {
ext4_journal_stop(handle);
return error;
--
1.6.0.4


2009-11-23 22:58:55

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 3/4] ext4: quota macros cleanup

Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/ext4_jbd2.h | 8 ++++++--
fs/ext4/extents.c | 2 +-
fs/ext4/inode.c | 2 +-
fs/ext4/migrate.c | 4 ++--
fs/ext4/namei.c | 8 ++++----
5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index a286598..404ab6c 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -49,7 +49,7 @@

#define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \
EXT4_XATTR_TRANS_BLOCKS - 2 + \
- 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
+ EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

/*
* Define the number of metadata blocks we need to account to modify data.
@@ -57,7 +57,7 @@
* This include super block, inode block, quota blocks and xattr blocks
*/
#define EXT4_META_TRANS_BLOCKS(sb) (EXT4_XATTR_TRANS_BLOCKS + \
- 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
+ EXT4_MAXQUOTAS_TRANS_BLOCKS(sb))

/* Delete operations potentially hit one directory's namespace plus an
* entire inode, plus arbitrary amounts of bitmap/indirection data. Be
@@ -92,6 +92,7 @@
* but inode, sb and group updates are done only once */
#define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_INIT_REWRITE) : 0)
+
#define EXT4_QUOTA_DEL_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_DEL_ALLOC*\
(EXT4_SINGLEDATA_TRANS_BLOCKS(sb)-3)+3+DQUOT_DEL_REWRITE) : 0)
#else
@@ -99,6 +100,9 @@
#define EXT4_QUOTA_INIT_BLOCKS(sb) 0
#define EXT4_QUOTA_DEL_BLOCKS(sb) 0
#endif
+#define EXT4_MAXQUOTAS_TRANS_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS(sb))
+#define EXT4_MAXQUOTAS_INIT_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_INIT_BLOCKS(sb))
+#define EXT4_MAXQUOTAS_DEL_BLOCKS(sb) (MAXQUOTAS*EXT4_QUOTA_DEL_BLOCKS(sb))

int
ext4_mark_iloc_dirty(handle_t *handle,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 715264b..f73c3ff 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2167,7 +2167,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
correct_index = 1;
credits += (ext_depth(inode)) + 1;
}
- credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
+ credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

err = ext4_ext_truncate_extend_restart(handle, inode, credits);
if (err)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 979362d..211722b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5438,7 +5438,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)

/* (user+group)*(old+new) structure, inode write (sb,
* inode block, ? - but truncate inode update has it) */
- handle = ext4_journal_start(inode, 2*(EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)+
+ handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index a93d5b8..8646149 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -238,7 +238,7 @@ static int extend_credit_for_blkdel(handle_t *handle, struct inode *inode)
* So allocate a credit of 3. We may update
* quota (user and group).
*/
- needed = 3 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
+ needed = 3 + EXT4_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);

if (ext4_journal_extend(handle, needed) != 0)
retval = ext4_journal_restart(handle, needed);
@@ -477,7 +477,7 @@ int ext4_ext_migrate(struct inode *inode)
handle = ext4_journal_start(inode,
EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2 * EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)
+ EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)
+ 1);
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fde08c9..17a17e1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1769,7 +1769,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, int mode,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -1803,7 +1803,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -1840,7 +1840,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

@@ -2253,7 +2253,7 @@ static int ext4_symlink(struct inode *dir,
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 5 +
- 2*EXT4_QUOTA_INIT_BLOCKS(dir->i_sb));
+ EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);

--
1.6.0.4


2009-11-23 22:58:56

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer.

Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 211722b..d42e954 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5439,7 +5439,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
/* (user+group)*(old+new) structure, inode write (sb,
* inode block, ? - but truncate inode update has it) */
handle = ext4_journal_start(inode, (EXT4_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
- EXT4_QUOTA_DEL_BLOCKS(inode->i_sb))+3);
+ EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb))+3);
if (IS_ERR(handle)) {
error = PTR_ERR(handle);
goto err_out;
--
1.6.0.4


2009-11-24 15:24:06

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 1/4] ext4: delalloc quota fixes

Dmitry Monakhov wrote:
> This patch fix most visible problems with delalloc+quota issues
> - ext4_get_reserved_space() must return bytes instead of blocks.
> - Claim allocated meta blocks. Do it as we do for data blocks
> but delay it untill proper moment.
> - Move space claiming to ext4_da_update_reserve_space()
> Only here we do know what we are dealing with data or metadata
>
> The most useful test case for delalloc+quota is concurent tasks
> with write+truncate vs chown for a same file.
>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> ---
> fs/ext4/inode.c | 12 ++++++++----
> fs/ext4/mballoc.c | 6 ------
> 2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ab22297..e642cdb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
> EXT4_I(inode)->i_reserved_meta_blocks;
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> - return total;
> + return (total << inode->i_blkbits);
> }

Ow, whoops, yes this looks like a bug! (maybe should be in its own patch?)

> /*
> * Calculate the number of metadata blocks need to reserve
> @@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
> static void ext4_da_update_reserve_space(struct inode *inode, int used)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> - int total, mdb, mdb_free;
> + int total, mdb, mdb_free, mdb_claim = 0;
>
> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> /* recalculate the number of metablocks still need to be reserved */
> @@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> if (mdb_free) {
> /* Account for allocated meta_blocks */
> - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
> + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
> + BUG_ON(mdb_free < mdb_claim);

Did you see this happening in testing?

> + mdb_free -= mdb_claim;
>
> /* update fs dirty blocks counter */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
> @@ -1119,7 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> /* update per-inode reservations */
> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
> - EXT4_I(inode)->i_reserved_data_blocks -= used;
> + EXT4_I(inode)->i_reserved_data_blocks -= used;

looks like a little whitespace damage here

> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
> + vfs_dq_claim_block(inode, used + mdb_claim);
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> /*
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index bba1282..d4c52db 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
> /* release all the reserved blocks if non delalloc */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
> - else {
> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> - ac->ac_b_ex.fe_len);
> - /* convert reserved quota blocks to real quota blocks */
> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
> - }

Maybe Mingming can review this change better, I don't have all the quota paths
in my head yet ...

-Eric

> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi,


2009-11-24 19:38:42

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH 1/4] ext4: delalloc quota fixes

Eric Sandeen <[email protected]> writes:

> Dmitry Monakhov wrote:
>> This patch fix most visible problems with delalloc+quota issues
>> - ext4_get_reserved_space() must return bytes instead of blocks.
>> - Claim allocated meta blocks. Do it as we do for data blocks
>> but delay it untill proper moment.
>> - Move space claiming to ext4_da_update_reserve_space()
>> Only here we do know what we are dealing with data or metadata
>>
>> The most useful test case for delalloc+quota is concurent tasks
>> with write+truncate vs chown for a same file.
>>
>> Signed-off-by: Dmitry Monakhov <[email protected]>
>> ---
>> fs/ext4/inode.c | 12 ++++++++----
>> fs/ext4/mballoc.c | 6 ------
>> 2 files changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index ab22297..e642cdb 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
>> EXT4_I(inode)->i_reserved_meta_blocks;
>> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>>
>> - return total;
>> + return (total << inode->i_blkbits);
>> }
>
> Ow, whoops, yes this looks like a bug! (maybe should be in its own patch?)
Ok i'll resubmit patchset soon.
>
>> /*
>> * Calculate the number of metadata blocks need to reserve
>> @@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
>> static void ext4_da_update_reserve_space(struct inode *inode, int used)
>> {
>> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>> - int total, mdb, mdb_free;
>> + int total, mdb, mdb_free, mdb_claim = 0;
>>
>> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>> /* recalculate the number of metablocks still need to be reserved */
>> @@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>>
>> if (mdb_free) {
>> /* Account for allocated meta_blocks */
>> - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
>> + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
>> + BUG_ON(mdb_free < mdb_claim);
>
> Did you see this happening in testing?
>
No i didn't. I've add it just for sanity reason.
>> + mdb_free -= mdb_claim;
>>
>> /* update fs dirty blocks counter */
>> percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
>> @@ -1119,7 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>>
>> /* update per-inode reservations */
>> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
>> - EXT4_I(inode)->i_reserved_data_blocks -= used;
>> + EXT4_I(inode)->i_reserved_data_blocks -= used;
>
> looks like a little whitespace damage here
>
>> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
>> + vfs_dq_claim_block(inode, used + mdb_claim);
>> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>>
>> /*
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index bba1282..d4c52db 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
>> /* release all the reserved blocks if non delalloc */
>> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
>> - else {
>> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
>> - ac->ac_b_ex.fe_len);
>> - /* convert reserved quota blocks to real quota blocks */
>> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
>> - }
>
> Maybe Mingming can review this change better, I don't have all the quota paths
> in my head yet ...
>
> -Eric
>
>> if (sbi->s_log_groups_per_flex) {
>> ext4_group_t flex_group = ext4_flex_group(sbi,

2009-12-08 00:00:54

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 1/4] ext4: delalloc quota fixes

On Tue, 2009-11-24 at 01:58 +0300, Dmitry Monakhov wrote:
> This patch fix most visible problems with delalloc+quota issues
> - ext4_get_reserved_space() must return bytes instead of blocks.
> - Claim allocated meta blocks. Do it as we do for data blocks
> but delay it untill proper moment.
> - Move space claiming to ext4_da_update_reserve_space()
> Only here we do know what we are dealing with data or metadata
>

Thanks for sending the first fix, I am actually surprisely to know the
fix didn't get merged before, as Jan has pointed this out when he review
the original patch and I have get this fixed...somehow the latest patch
wasn't being picked at the end.

here it is.
http://marc.info/?l=linux-ext4&m=123185939602949&w=2


About the second and third issue you are trying to fix here... I think
the current code does what you want already.

The current code does reserve quota for metadata blocks at
ext4_da_update_reserve_space() already and delay the quota claim at the
later time when metadata blocks are really allocated (via
ext4_mb_mark_diskspace_used), extra reserved quota for metadata blocks
will get freed at ext4_da_update_reserve_space().
ext4_mb_mark_diskspace_used() is called for every block allocation
including medatadata allocation, so we won't miss quota claim for
metadata there.

The reason that I keep all quota claim immediately at the block
allocation time via ext4_mb_mark_diskspace_used() is to keep the block
allocation space accounting/dirty/delayed block space accounting/quota
accounting in the same place; Plus, when ext4_da_update_reserve_space()
called, the number of blocks passed is the number of blocks mapped, may
not necessarilly the the number of blocks just new allocated.

cheers,
Mingming

put the space claiming under
> The most useful test case for delalloc+quota is concurent tasks
> with write+truncate vs chown for a same file.
>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> ---
> fs/ext4/inode.c | 12 ++++++++----
> fs/ext4/mballoc.c | 6 ------
> 2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ab22297..e642cdb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
> EXT4_I(inode)->i_reserved_meta_blocks;
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> - return total;
> + return (total << inode->i_blkbits);
> }



> /*
> * Calculate the number of metadata blocks need to reserve
> @@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
> static void ext4_da_update_reserve_space(struct inode *inode, int used)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> - int total, mdb, mdb_free;
> + int total, mdb, mdb_free, mdb_claim = 0;
>
> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> /* recalculate the number of metablocks still need to be reserved */
> @@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> if (mdb_free) {
> /* Account for allocated meta_blocks */
> - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
> + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
> + BUG_ON(mdb_free < mdb_claim);
> + mdb_free -= mdb_claim;
>
> /* update fs dirty blocks counter */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
> @@ -1119,7 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> /* update per-inode reservations */
> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
> - EXT4_I(inode)->i_reserved_data_blocks -= used;
> + EXT4_I(inode)->i_reserved_data_blocks -= used;
> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
> + vfs_dq_claim_block(inode, used + mdb_claim);
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> /*
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index bba1282..d4c52db 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
> /* release all the reserved blocks if non delalloc */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
> - else {
> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> - ac->ac_b_ex.fe_len);
> - /* convert reserved quota blocks to real quota blocks */
> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
> - }
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi,



2009-12-08 06:34:24

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH 1/4] ext4: delalloc quota fixes

Mingming <[email protected]> writes:

> On Tue, 2009-11-24 at 01:58 +0300, Dmitry Monakhov wrote:
>> This patch fix most visible problems with delalloc+quota issues
>> - ext4_get_reserved_space() must return bytes instead of blocks.
>> - Claim allocated meta blocks. Do it as we do for data blocks
>> but delay it untill proper moment.
>> - Move space claiming to ext4_da_update_reserve_space()
>> Only here we do know what we are dealing with data or metadata
>>
>
> Thanks for sending the first fix, I am actually surprisely to know the
> fix didn't get merged before, as Jan has pointed this out when he review
> the original patch and I have get this fixed...somehow the latest patch
> wasn't being picked at the end.
>
> here it is.
> http://marc.info/?l=linux-ext4&m=123185939602949&w=2
>
>
> About the second and third issue you are trying to fix here... I think
> the current code does what you want already.
>
> The current code does reserve quota for metadata blocks at
> ext4_da_update_reserve_space() already and delay the quota claim at the
> later time when metadata blocks are really allocated (via
> ext4_mb_mark_diskspace_used), extra reserved quota for metadata blocks
> will get freed at ext4_da_update_reserve_space().
> ext4_mb_mark_diskspace_used() is called for every block allocation
> including medatadata allocation, so we won't miss quota claim for
> metadata there.
>
> The reason that I keep all quota claim immediately at the block
> allocation time via ext4_mb_mark_diskspace_used() is to keep the block
> allocation space accounting/dirty/delayed block space accounting/quota
> accounting in the same place; Plus, when ext4_da_update_reserve_space()
> called, the number of blocks passed is the number of blocks mapped, may
> not necessarilly the the number of blocks just new allocated.
You have reviewed wrong patch version. Please read the latest one
http://article.gmane.org/gmane.comp.file-systems.ext4/16629 in order
to get my idea. But in fact as Aneesh have spotted, my I've missed
important issue aka #14739. Currently I'm working on new patch set
version. This patch series will fix "chown vs truncate issue" and
#14739 by one shot.
>
> cheers,
> Mingming
>
> put the space claiming under
>> The most useful test case for delalloc+quota is concurent tasks
>> with write+truncate vs chown for a same file.
>>
>> Signed-off-by: Dmitry Monakhov <[email protected]>
>> ---
>> fs/ext4/inode.c | 12 ++++++++----
>> fs/ext4/mballoc.c | 6 ------
>> 2 files changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index ab22297..e642cdb 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1057,7 +1057,7 @@ qsize_t ext4_get_reserved_space(struct inode *inode)
>> EXT4_I(inode)->i_reserved_meta_blocks;
>> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>>
>> - return total;
>> + return (total << inode->i_blkbits);
>> }
>
>
>
>> /*
>> * Calculate the number of metadata blocks need to reserve
>> @@ -1096,7 +1096,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
>> static void ext4_da_update_reserve_space(struct inode *inode, int used)
>> {
>> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>> - int total, mdb, mdb_free;
>> + int total, mdb, mdb_free, mdb_claim = 0;
>>
>> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>> /* recalculate the number of metablocks still need to be reserved */
>> @@ -1109,7 +1109,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>>
>> if (mdb_free) {
>> /* Account for allocated meta_blocks */
>> - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
>> + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
>> + BUG_ON(mdb_free < mdb_claim);
>> + mdb_free -= mdb_claim;
>>
>> /* update fs dirty blocks counter */
>> percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
>> @@ -1119,7 +1121,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>>
>> /* update per-inode reservations */
>> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
>> - EXT4_I(inode)->i_reserved_data_blocks -= used;
>> + EXT4_I(inode)->i_reserved_data_blocks -= used;
>> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
>> + vfs_dq_claim_block(inode, used + mdb_claim);
>> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>>
>> /*
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index bba1282..d4c52db 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2750,12 +2750,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
>> /* release all the reserved blocks if non delalloc */
>> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
>> - else {
>> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
>> - ac->ac_b_ex.fe_len);
>> - /* convert reserved quota blocks to real quota blocks */
>> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
>> - }
>>
>> if (sbi->s_log_groups_per_flex) {
>> ext4_group_t flex_group = ext4_flex_group(sbi,
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html