2014-05-22 10:47:51

by Niu Yawei

Subject: [PATCH] quota: remove dqptr_sem for scalability

There are several global locks in the VFS quota code which hurt
performance a lot when quota accounting is enabled; dqptr_sem is the major one.

This patch tries to make the VFS quota code scalable with minimal changes.

The following tests (mdtest & dbench) were run over an ext4 fs in a
CentOS 6.5 VM (8 CPUs, 4G mem, kernel: 3.15.0-rc5+), and the results show
that the patch relieves the lock contention a lot.

=== mdtest (http://sourceforge.net/projects/mdtest/)

mdtest creation:
threads                 1       2       4       8      16
disabled quota:     40045   78379  128652   89176  103666
enabled quota:      34939   46725   24095   14321   16510
patched/disabled:   39120   75325  124181   72012   86622
patched/enabled:    34769   67086  111854   85923   87982

mdtest unlink:
threads                 1       2       4       8      16
disabled quota:     91587  148808  227496  193661  190477
enabled quota:      72426   48726   14845   12825   15907
patched/disabled:   85246  146369  228514  194053  192407
patched/enabled:    78257  124332  166146  180874  174715

=== dbench test (8 threads)

disabled quota:

Operation                  Count    AvgLat    MaxLat
--------------------------------------------------
Deltree                       32     3.585     8.437
Flush                      82625     3.797   207.561
Close                     865785     0.004     1.353
LockX                       3840     0.005     0.182
Mkdir                         16     0.005     0.007
Rename                     49897     0.050     6.085
ReadX                    1847719     0.006     6.332
WriteX                    588019     0.033     6.968
Unlink                    238061     0.050     6.537
UnlockX                     3840     0.004     0.302
FIND_FIRST                413054     0.024     2.920
SET_FILE_INFORMATION       95961     0.035     6.998
QUERY_FILE_INFORMATION    187253     0.003     0.478
QUERY_PATH_INFORMATION   1068225     0.010     6.211
QUERY_FS_INFORMATION      195932     0.006     0.541
NTCreateX                1178614     0.021    64.684

Throughput 616.998 MB/sec 8 clients 8 procs max_latency=207.575 ms

enabled quota:

Operation                  Count    AvgLat    MaxLat
--------------------------------------------------
Deltree                       16    11.240    54.888
Flush                      61421     3.627   127.876
Close                     643369     0.004     0.924
LockX                       2848     0.005     0.253
Mkdir                          8     0.005     0.008
Rename                     37088     0.116     3.845
ReadX                    1372315     0.007     5.024
WriteX                    435537     0.106    18.304
Unlink                    176928     0.351    29.266
UnlockX                     2848     0.004     0.095
FIND_FIRST                306847     0.024     1.689
SET_FILE_INFORMATION       71406     0.040     8.933
QUERY_FILE_INFORMATION    138904     0.003     0.421
QUERY_PATH_INFORMATION    794000     0.011     4.027
QUERY_FS_INFORMATION      145520     0.006     0.473
NTCreateX                 875964     0.072    52.923

Throughput 457.433 MB/sec 8 clients 8 procs max_latency=127.902 ms

patched/disabled:

Operation                  Count    AvgLat    MaxLat
--------------------------------------------------
Deltree                       32     3.332     8.210
Flush                      82543     3.790   146.987
Close                     865200     0.004     1.289
LockX                       3836     0.005     0.142
Mkdir                         16     0.008     0.038
Rename                     49870     0.052     4.907
ReadX                    1846334     0.006     6.107
WriteX                    587645     0.033     8.086
Unlink                    237737     0.052     6.440
UnlockX                     3836     0.004     0.105
FIND_FIRST                412704     0.024     1.597
SET_FILE_INFORMATION       95948     0.034     7.854
QUERY_FILE_INFORMATION    187179     0.003     0.408
QUERY_PATH_INFORMATION   1067460     0.010     5.316
QUERY_FS_INFORMATION      195706     0.006     0.613
NTCreateX                1177689     0.021     6.521

Throughput 616.574 MB/sec 8 clients 8 procs max_latency=147.007 ms

patched/enabled:

Operation                  Count    AvgLat    MaxLat
--------------------------------------------------
Deltree                       32     3.248     8.430
Flush                      80481     3.908   241.537
Close                     843781     0.004     0.561
LockX                       3746     0.005     0.141
Mkdir                         16     0.005     0.007
Rename                     48642     0.051     6.466
ReadX                    1800754     0.006    87.027
WriteX                    573185     0.033     6.750
Unlink                    231880     0.058    14.507
UnlockX                     3746     0.004     0.103
FIND_FIRST                402463     0.024     1.342
SET_FILE_INFORMATION       93557     0.035    42.348
QUERY_FILE_INFORMATION    182573     0.003     1.305
QUERY_PATH_INFORMATION   1041026     0.010    86.289
QUERY_FS_INFORMATION      190869     0.006     1.240
NTCreateX                1148570     0.022     6.285

Throughput 602.147 MB/sec 8 clients 8 procs max_latency=241.561 ms


[PATCH] quota: remove dqptr_sem for scalability

Remove dqptr_sem (but keep it in struct quota_info so that the kernel ABI
stays unchanged); the functionality of this lock is taken over by
other locks:
* i_dquot is protected by i_lock; note that this covers only the
pointer itself, the contents of the struct are protected by dq_data_lock.
* Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
* Small changes in __dquot_initialize() to avoid unnecessary
dqget()/dqput() calls.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 171 ++++++++++++++++++++++++++-------------------------
fs/quota/quota.c | 6 +-
fs/stat.c | 7 ++-
fs/super.c | 1 -
include/linux/fs.h | 1 +
5 files changed, 97 insertions(+), 89 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9cd5f63..99394b2 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -83,26 +83,21 @@
/*
* There are three quota SMP locks. dq_list_lock protects all lists with quotas
* and quota formats.
- * dq_data_lock protects data from dq_dqb and also mem_dqinfo structures and
- * also guards consistency of dquot->dq_dqb with inode->i_blocks, i_bytes.
- * i_blocks and i_bytes updates itself are guarded by i_lock acquired directly
- * in inode_add_bytes() and inode_sub_bytes(). dq_state_lock protects
- * modifications of quota state (on quotaon and quotaoff) and readers who care
- * about latest values take it as well.
- *
- * The spinlock ordering is hence: dq_data_lock > dq_list_lock > i_lock,
+ * dq_data_lock protects data from dq_dqb and also mem_dqinfo structures.
+ * dq_state_lock protects modifications of quota state (on quotaon and quotaoff)
+ * and readers who care about latest values take it as well.
+ *
+ * The spinlock ordering is hence: i_lock > dq_data_lock > dq_list_lock,
* dq_list_lock > dq_state_lock
*
* Note that some things (eg. sb pointer, type, id) doesn't change during
* the life of the dquot structure and so needn't to be protected by a lock
*
- * Any operation working on dquots via inode pointers must hold dqptr_sem. If
- * operation is just reading pointers from inode (or not using them at all) the
- * read lock is enough. If pointers are altered function must hold write lock.
+ * Any operation working on dquots via inode pointers must hold i_lock.
* Special care needs to be taken about S_NOQUOTA inode flag (marking that
* inode is a quota file). Functions adding pointers from inode to dquots have
- * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
- * have to do all pointer modifications before dropping dqptr_sem. This makes
+ * to check this flag under i_lock and then (if S_NOQUOTA is not set) they
+ * have to do all pointer modifications before dropping i_lock. This makes
* sure they cannot race with quotaon which first sets S_NOQUOTA flag and
* then drops all pointers to dquots from an inode.
*
@@ -116,16 +111,9 @@
* spinlock to internal buffers before writing.
*
* Lock ordering (including related VFS locks) is the following:
- * dqonoff_mutex > i_mutex > journal_lock > dqptr_sem > dquot->dq_lock >
- * dqio_mutex
+ * i_mutex > dqonoff_sem > journal_lock > dquot->dq_lock > dqio_mutex
* dqonoff_mutex > i_mutex comes from dquot_quota_sync, dquot_enable, etc.
- * The lock ordering of dqptr_sem imposed by quota code is only dqonoff_sem >
- * dqptr_sem. But filesystem has to count with the fact that functions such as
- * dquot_alloc_space() acquire dqptr_sem and they usually have to be called
- * from inside a transaction to keep filesystem consistency after a crash. Also
- * filesystems usually want to do some IO on dquot from ->mark_dirty which is
- * called with dqptr_sem held.
- */
+ */

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_list_lock);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_state_lock);
@@ -974,7 +962,6 @@ static inline int dqput_blocks(struct dquot *dquot)
/*
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
- * We can't race with anybody because we hold dqptr_sem for writing...
*/
static int remove_inode_dquot_ref(struct inode *inode, int type,
struct list_head *tofree_head)
@@ -1035,13 +1022,15 @@ static void remove_dquot_ref(struct super_block *sb, int type,
* We have to scan also I_NEW inodes because they can already
* have quota pointer initialized. Luckily, we need to touch
* only quota pointers and these have separate locking
- * (dqptr_sem).
+ * (i_lock).
*/
+ spin_lock(&inode->i_lock);
if (!IS_NOQUOTA(inode)) {
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
remove_inode_dquot_ref(inode, type, tofree_head);
}
+ spin_unlock(&inode->i_lock);
}
spin_unlock(&inode_sb_list_lock);
#ifdef CONFIG_QUOTA_DEBUG
@@ -1059,9 +1048,7 @@ static void drop_dquot_ref(struct super_block *sb, int type)
LIST_HEAD(tofree_head);

if (sb->dq_op) {
- down_write(&sb_dqopt(sb)->dqptr_sem);
remove_dquot_ref(sb, type, &tofree_head);
- up_write(&sb_dqopt(sb)->dqptr_sem);
put_dquot_list(&tofree_head);
}
}
@@ -1392,25 +1379,27 @@ static int dquot_active(const struct inode *inode)
/*
* Initialize quota pointers in inode
*
- * We do things in a bit complicated way but by that we avoid calling
- * dqget() and thus filesystem callbacks under dqptr_sem.
- *
* It is better to call this function outside of any transaction as it
* might need a lot of space in journal for dquot structure allocation.
*/
static void __dquot_initialize(struct inode *inode, int type)
{
- int cnt;
- struct dquot *got[MAXQUOTAS];
+ int cnt, dq_get = 0;
+ struct dquot *got[MAXQUOTAS] = { NULL, NULL };
struct super_block *sb = inode->i_sb;
qsize_t rsv;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

- /* First get references to structures we might need. */
+ /* In most case, the i_dquot should have been initialized, except
+ * the newly allocated one. We'd always try to skip the dqget() and
+ * dqput() calls to avoid unnecessary global lock contention. */
+ if (!(inode->i_state & I_NEW))
+ goto init_idquot;
+
+get_dquots:
+ dq_get = 1;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
struct kqid qid;
got[cnt] = NULL;
@@ -1426,8 +1415,8 @@ static void __dquot_initialize(struct inode *inode, int type)
}
got[cnt] = dqget(sb, qid);
}
-
- down_write(&sb_dqopt(sb)->dqptr_sem);
+init_idquot:
+ spin_lock(&inode->i_lock);
if (IS_NOQUOTA(inode))
goto out_err;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1437,9 +1426,13 @@ static void __dquot_initialize(struct inode *inode, int type)
if (!sb_has_quota_active(sb, cnt))
continue;
/* We could race with quotaon or dqget() could have failed */
- if (!got[cnt])
+ if (!got[cnt] && dq_get)
continue;
if (!inode->i_dquot[cnt]) {
+ if (dq_get == 0) {
+ spin_unlock(&inode->i_lock);
+ goto get_dquots;
+ }
inode->i_dquot[cnt] = got[cnt];
got[cnt] = NULL;
/*
@@ -1455,7 +1448,7 @@ static void __dquot_initialize(struct inode *inode, int type)
}
}
out_err:
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_unlock(&inode->i_lock);
/* Drop unused references */
dqput_all(got);
}
@@ -1474,12 +1467,12 @@ static void __dquot_drop(struct inode *inode)
int cnt;
struct dquot *put[MAXQUOTAS];

- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
put[cnt] = inode->i_dquot[cnt];
inode->i_dquot[cnt] = NULL;
}
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&inode->i_lock);
dqput_all(put);
}

@@ -1519,36 +1512,57 @@ static qsize_t *inode_reserved_space(struct inode * inode)
return inode->i_sb->dq_op->get_reserved_space(inode);
}

+static inline void __inode_add_rsv_space(struct inode *inode, qsize_t number)
+{
+ *inode_reserved_space(inode) += number;
+}
+
void inode_add_rsv_space(struct inode *inode, qsize_t number)
{
spin_lock(&inode->i_lock);
- *inode_reserved_space(inode) += number;
+ __inode_add_rsv_space(inode, number);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(inode_add_rsv_space);

+static inline void __inode_claim_rsv_space(struct inode *inode, qsize_t number)
+{
+ *inode_reserved_space(inode) -= number;
+ __inode_add_bytes(inode, number);
+}
+
void inode_claim_rsv_space(struct inode *inode, qsize_t number)
{
spin_lock(&inode->i_lock);
- *inode_reserved_space(inode) -= number;
- __inode_add_bytes(inode, number);
+ __inode_claim_rsv_space(inode, number);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(inode_claim_rsv_space);

-void inode_reclaim_rsv_space(struct inode *inode, qsize_t number)
+static inline void __inode_reclaim_rsv_space(struct inode *inode,
+ qsize_t number)
{
- spin_lock(&inode->i_lock);
*inode_reserved_space(inode) += number;
__inode_sub_bytes(inode, number);
+}
+
+void inode_reclaim_rsv_space(struct inode *inode, qsize_t number)
+{
+ spin_lock(&inode->i_lock);
+ __inode_reclaim_rsv_space(inode, number);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(inode_reclaim_rsv_space);

+static inline void __inode_sub_rsv_space(struct inode *inode, qsize_t number)
+{
+ *inode_reserved_space(inode) -= number;
+}
+
void inode_sub_rsv_space(struct inode *inode, qsize_t number)
{
spin_lock(&inode->i_lock);
- *inode_reserved_space(inode) -= number;
+ __inode_sub_rsv_space(inode, number);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(inode_sub_rsv_space);
@@ -1559,9 +1573,8 @@ static qsize_t inode_get_rsv_space(struct inode *inode)

if (!inode->i_sb->dq_op->get_reserved_space)
return 0;
- spin_lock(&inode->i_lock);
+
ret = *inode_reserved_space(inode);
- spin_unlock(&inode->i_lock);
return ret;
}

@@ -1569,17 +1582,17 @@ static void inode_incr_space(struct inode *inode, qsize_t number,
int reserve)
{
if (reserve)
- inode_add_rsv_space(inode, number);
+ __inode_add_rsv_space(inode, number);
else
- inode_add_bytes(inode, number);
+ __inode_add_bytes(inode, number);
}

static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
{
if (reserve)
- inode_sub_rsv_space(inode, number);
+ __inode_sub_rsv_space(inode, number);
else
- inode_sub_bytes(inode, number);
+ __inode_sub_bytes(inode, number);
}

/*
@@ -1602,10 +1615,6 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
struct dquot **dquots = inode->i_dquot;
int reserve = flags & DQUOT_SPACE_RESERVE;

- /*
- * First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex
- */
if (!dquot_active(inode)) {
inode_incr_space(inode, number, reserve);
goto out;
@@ -1614,7 +1623,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1623,6 +1632,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
!(flags & DQUOT_SPACE_WARN), &warn[cnt]);
if (ret && !(flags & DQUOT_SPACE_NOFAIL)) {
spin_unlock(&dq_data_lock);
+ spin_unlock(&inode->i_lock);
goto out_flush_warn;
}
}
@@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
}
inode_incr_space(inode, number, reserve);
spin_unlock(&dq_data_lock);
+ spin_unlock(&inode->i_lock);

if (reserve)
goto out_flush_warn;
mark_all_dquot_dirty(dquots);
out_flush_warn:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn);
out:
return ret;
@@ -1657,13 +1667,12 @@ int dquot_alloc_inode(const struct inode *inode)
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return 0;
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;
- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ spin_lock((spinlock_t *)&inode->i_lock);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1681,9 +1690,9 @@ int dquot_alloc_inode(const struct inode *inode)

warn_put_all:
spin_unlock(&dq_data_lock);
+ spin_unlock((spinlock_t *)&inode->i_lock);
if (ret == 0)
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn);
return ret;
}
@@ -1701,7 +1710,7 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
return 0;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1710,10 +1719,10 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
number);
}
/* Update inode bytes */
- inode_claim_rsv_space(inode, number);
+ __inode_claim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
+ spin_unlock(&inode->i_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
return 0;
}
EXPORT_SYMBOL(dquot_claim_space_nodirty);
@@ -1730,7 +1739,7 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1739,10 +1748,10 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
number);
}
/* Update inode bytes */
- inode_reclaim_rsv_space(inode, number);
+ __inode_reclaim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
+ spin_unlock(&inode->i_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
return;
}
EXPORT_SYMBOL(dquot_reclaim_space_nodirty);
@@ -1757,14 +1766,12 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
struct dquot **dquots = inode->i_dquot;
int reserve = flags & DQUOT_SPACE_RESERVE;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode)) {
inode_decr_space(inode, number, reserve);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1782,12 +1789,12 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
}
inode_decr_space(inode, number, reserve);
spin_unlock(&dq_data_lock);
+ spin_unlock(&inode->i_lock);

if (reserve)
- goto out_unlock;
+ goto out;
mark_all_dquot_dirty(dquots);
-out_unlock:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+out:
flush_warnings(warn);
}
EXPORT_SYMBOL(__dquot_free_space);
@@ -1801,12 +1808,10 @@ void dquot_free_inode(const struct inode *inode)
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock((spinlock_t *)&inode->i_lock);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1820,8 +1825,8 @@ void dquot_free_inode(const struct inode *inode)
dquot_decr_inodes(dquots[cnt], 1);
}
spin_unlock(&dq_data_lock);
+ spin_unlock((spinlock_t *)&inode->i_lock);
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn);
}
EXPORT_SYMBOL(dquot_free_inode);
@@ -1847,8 +1852,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
struct dquot_warn warn_from_inodes[MAXQUOTAS];
struct dquot_warn warn_from_space[MAXQUOTAS];

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (IS_NOQUOTA(inode))
return 0;
/* Initialize the arrays */
@@ -1857,9 +1860,9 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
}
- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&inode->i_lock);
if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&inode->i_lock);
return 0;
}
spin_lock(&dq_data_lock);
@@ -1916,7 +1919,7 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
inode->i_dquot[cnt] = transfer_to[cnt];
}
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&inode->i_lock);

mark_all_dquot_dirty(transfer_from);
mark_all_dquot_dirty(transfer_to);
@@ -1930,7 +1933,7 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
return 0;
over_quota:
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&inode->i_lock);
flush_warnings(warn_to);
return ret;
}
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 2b363e2..e4851cb 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
{
__u32 fmt;

- down_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
if (!sb_has_quota_active(sb, type)) {
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
return -ESRCH;
}
fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
if (copy_to_user(addr, &fmt, sizeof(fmt)))
return -EFAULT;
return 0;
diff --git a/fs/stat.c b/fs/stat.c
index ae0c3ce..b0e6898 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -488,12 +488,17 @@ void inode_sub_bytes(struct inode *inode, loff_t bytes)

EXPORT_SYMBOL(inode_sub_bytes);

+loff_t __inode_get_bytes(struct inode *inode)
+{
+ return (((loff_t)inode->i_blocks) << 9) + inode->i_bytes;
+}
+
loff_t inode_get_bytes(struct inode *inode)
{
loff_t ret;

spin_lock(&inode->i_lock);
- ret = (((loff_t)inode->i_blocks) << 9) + inode->i_bytes;
+ ret = __inode_get_bytes(inode);
spin_unlock(&inode->i_lock);
return ret;
}
diff --git a/fs/super.c b/fs/super.c
index 48377f7..a97aecf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -214,7 +214,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
mutex_init(&s->s_dquot.dqio_mutex);
mutex_init(&s->s_dquot.dqonoff_mutex);
- init_rwsem(&s->s_dquot.dqptr_sem);
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..cd2f427 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2518,6 +2518,7 @@ void __inode_add_bytes(struct inode *inode, loff_t bytes);
void inode_add_bytes(struct inode *inode, loff_t bytes);
void __inode_sub_bytes(struct inode *inode, loff_t bytes);
void inode_sub_bytes(struct inode *inode, loff_t bytes);
+loff_t __inode_get_bytes(struct inode *inode);
loff_t inode_get_bytes(struct inode *inode);
void inode_set_bytes(struct inode *inode, loff_t bytes);

--
1.7.1



2014-05-22 13:25:56

by Jan Kara

Subject: Re: [PATCH] quota: remove dqptr_sem for scalability

Hello,

thanks for the work! I have some comments below.

On Thu 22-05-14 18:47:22, Niu Yawei wrote:
> There are several global locks in the VFS quota code which hurts
> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>
> This patch tries to make the VFS quota code scalable with minimal changes.
>
> Following tests (mdtest & dbench) were running over ext4 fs in a
> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
> the patch relieved the lock congestion a lot.
>
Snipped performance results - thanks for those, but first let's concentrate
on the correctness side of things.

> [PATCH] quota: remove dqptr_sem for scalability
>
> Remove dqptr_sem (but kept in struct quota_info to keep kernel ABI
> unchanged), and the functionality of this lock is implemented by
> other locks:
> * i_dquot is protected by i_lock, however only this pointer, the
> content of this struct is by dq_data_lock.
> * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
> * Small changes in __dquot_initialize() to avoid unnecessary
> dqget()/dqput() calls.
Each of these three steps should be a separate patch please.

Now regarding each of these steps: Using i_lock for protection of dquot
pointers doesn't quite work. You have e.g.:
> @@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
> }
> inode_incr_space(inode, number, reserve);
> spin_unlock(&dq_data_lock);
> + spin_unlock(&inode->i_lock);
>
> if (reserve)
> goto out_flush_warn;
> mark_all_dquot_dirty(dquots);
> out_flush_warn:
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> flush_warnings(warn);
> out:
> return ret;
So you release protection of dquot pointers from inode before calling
mark_all_dquot_dirty(). So dquot pointers can be removed by
remove_inode_dquot_ref() while mark_all_dquot_dirty() works on them. That's
wrong and can lead to use after free.

Quota code uses dqptr_sem to provide exclusion for three cases:
* dquot_init()
* dquot_transfer()
* various places just reading dquot pointers to update allocation
information
* remove_dquot_ref() (called from quotaoff code)

Now exclusion against remove_dquot_ref() is relatively easy to deal with.
We can create srcu for dquots, whoever looks at dquot pointers from inode
takes srcu_read_lock() (so you basically convert read side of dqptr_sem
and write side in dquot_init() and dquot_transfer() to that) and use
synchronize_srcu() in remove_dquot_ref() to make sure all users are done
before starting to remove dquot pointers. You'll need to move
dquot_active() checks inside srcu_read_lock() to make this reliable but that
should be easy.

What remains to deal with is the exclusion between the remaining places and
dquot_init(). The dq_data_lock spinlock should already provide the necessary
exclusion between readers of allocation pointers and dquot_transfer() (that
spinlock would actually be a good candidate for conversion to a per-inode
one; using i_lock for this should noticeably reduce the contention, but
that's the next step). dquot_init() doesn't take the spinlock so far, but
we can probably make it take it for simplicity for now.

> static void __dquot_initialize(struct inode *inode, int type)
> {
> - int cnt;
> - struct dquot *got[MAXQUOTAS];
> + int cnt, dq_get = 0;
> + struct dquot *got[MAXQUOTAS] = { NULL, NULL };
> struct super_block *sb = inode->i_sb;
> qsize_t rsv;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode))
> return;
>
> - /* First get references to structures we might need. */
> + /* In most case, the i_dquot should have been initialized, except
> + * the newly allocated one. We'd always try to skip the dqget() and
> + * dqput() calls to avoid unnecessary global lock contention. */
> + if (!(inode->i_state & I_NEW))
> + goto init_idquot;
The optimization is a good idea but dquot_init() is often called for
!I_NEW inodes when the initialization is necessary. So I'd rather first
check whether the relevant i_dquot[] pointers are != NULL before taking any
lock. If they are, we are done; otherwise we grab references to the dquots,
take the lock and update the pointers.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-05-23 03:37:43

by Niu Yawei

Subject: Re: [PATCH] quota: remove dqptr_sem for scalability

Thanks for your review, Jan.
> Hello,
>
> thanks for the work! I have some comments below.
>
> On Thu 22-05-14 18:47:22, Niu Yawei wrote:
>> There are several global locks in the VFS quota code which hurts
>> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>>
>> This patch tries to make the VFS quota code scalable with minimal changes.
>>
>> Following tests (mdtest & dbench) were running over ext4 fs in a
>> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
>> the patch relieved the lock congestion a lot.
>>
> Snipped performance results - thanks for those but first let's concentrate
> on correctness side of things.
>
>> [PATCH] quota: remove dqptr_sem for scalability
>>
>> Remove dqptr_sem (but kept in struct quota_info to keep kernel ABI
>> unchanged), and the functionality of this lock is implemented by
>> other locks:
>> * i_dquot is protected by i_lock, however only this pointer, the
>> content of this struct is by dq_data_lock.
>> * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
>> * Small changes in __dquot_initialize() to avoid unnecessary
>> dqget()/dqput() calls.
> Each of these three steps should be a separate patch please.
Ok, sure.
>
> Now regarding each of these steps: Using i_lock for protection of dquot
> pointers doesn't quite work. You have e.g.:
>> @@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>> }
>> inode_incr_space(inode, number, reserve);
>> spin_unlock(&dq_data_lock);
>> + spin_unlock(&inode->i_lock);
>>
>> if (reserve)
>> goto out_flush_warn;
>> mark_all_dquot_dirty(dquots);
>> out_flush_warn:
>> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> flush_warnings(warn);
>> out:
>> return ret;
> So you release protection of dquot pointers from inode before calling
> mark_all_dquot_dirty(). So dquot pointers can be removed by
> remove_inode_dquot_ref() while mark_all_dquot_dirty() works on them. That's
> wrong and can lead to use after free.
Indeed, I didn't realise that the code taking dquot references had been
removed from these functions; is it ok to add it back?
>
> Quota code uses dqptr_sem to provide exclusion for three cases:
> * dquot_init()
> * dquot_transfer()
> * various places just reading dquot pointers to update allocation
> information
> * remove_dquot_ref() (called from quotaoff code)
>
> Now exclusion against remove_dquot_ref() is relatively easy to deal with.
> We can create srcu for dquots, whoever looks at dquot pointers from inode
> takes srcu_read_lock() (so you basically convert read side of dqptr_sem
> and write side in dquot_init() and dquot_transfer() to that) and use
> synchronize_srcu() in remove_dquot_ref() to make sure all users are done
> before starting to remove dquot pointers. You'll need to move
> dquot_active() checks inside srcu_read_lock() to make this reliable but that
> should be easy.
Ok, but adding back the reference-counting code looks more straightforward
to me; may I add it back?
> What remains to deal with is an exclusion between the remaining places.
> dquot_init(). dq_data_lock spinlock should already provide the necessary
> exclusion between readers of allocation pointers and dquot_transfer() (that
> spinlock would actually be a good candidate to a conversion to per-inode
> one - using i_lock for this should noticeably reduce the contention - but
> that's the next step). dquot_init() doesn't take the spinlock so far but
> probably we can make it to take it for simplicity for now.
Using a global lock in dquot_initialize() would drop performance a lot
(because it can be called several times for a single create/unlink
operation), so I'm afraid we have to use the per-inode lock (i_lock) to
serialize them.
>
>> static void __dquot_initialize(struct inode *inode, int type)
>> {
>> - int cnt;
>> - struct dquot *got[MAXQUOTAS];
>> + int cnt, dq_get = 0;
>> + struct dquot *got[MAXQUOTAS] = { NULL, NULL };
>> struct super_block *sb = inode->i_sb;
>> qsize_t rsv;
>>
>> - /* First test before acquiring mutex - solves deadlocks when we
>> - * re-enter the quota code and are already holding the mutex */
>> if (!dquot_active(inode))
>> return;
>>
>> - /* First get references to structures we might need. */
>> + /* In most case, the i_dquot should have been initialized, except
>> + * the newly allocated one. We'd always try to skip the dqget() and
>> + * dqput() calls to avoid unnecessary global lock contention. */
>> + if (!(inode->i_state & I_NEW))
>> + goto init_idquot;
> The optimization is a good idea but dquot_init() is often called for
> !I_NEW inodes when the initialization is necessary. So I'd rather first
> check whether relevant i_dquot[] pointers are != NULL before taking any
> lock. If yes, we are done, otherwise we grab pointers to dquots, take the
> lock and update the pointers.
Ok, thank you.
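Jan's check-first scheme amounts to double-checked initialization: test the pointers without any lock, grab references outside the lock (since dqget() may block), then recheck and install under the lock, dropping any references that lost the race. A minimal userspace C sketch, with hypothetical `toy_*` names standing in for dqget()/dqput() and a global mutex standing in for whichever lock is chosen:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXQUOTAS 2

struct dquot { int refs; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct dquot *i_dquot[MAXQUOTAS];

static struct dquot *toy_dqget(void)
{
	struct dquot *dq = malloc(sizeof(*dq));
	dq->refs = 1;
	return dq;
}

static void toy_dqput(struct dquot *dq) { free(dq); }

/* Returns the number of pointers actually installed. */
static int toy_initialize(void)
{
	struct dquot *got[MAXQUOTAS] = { NULL, NULL };
	int cnt, need = 0, installed = 0;

	/* 1. Lockless fast path: if every pointer is set, do nothing. */
	for (cnt = 0; cnt < MAXQUOTAS; cnt++)
		if (i_dquot[cnt] == NULL)
			need = 1;
	if (!need)
		return 0;

	/* 2. Grab references outside the lock (dqget may block). */
	for (cnt = 0; cnt < MAXQUOTAS; cnt++)
		if (i_dquot[cnt] == NULL)
			got[cnt] = toy_dqget();

	/* 3. Recheck and install under the lock. */
	pthread_mutex_lock(&lock);
	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
		if (i_dquot[cnt] == NULL && got[cnt]) {
			i_dquot[cnt] = got[cnt];
			got[cnt] = NULL;
			installed++;
		}
	}
	pthread_mutex_unlock(&lock);

	/* 4. Drop references that raced with another initializer. */
	for (cnt = 0; cnt < MAXQUOTAS; cnt++)
		if (got[cnt])
			toy_dqput(got[cnt]);
	return installed;
}
```

The recheck in step 3 is essential: between the lockless test and taking the lock, another thread may already have installed a pointer.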
>
> Honza


2014-05-23 04:02:28

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability

On 5/22/14, 5:47 AM, Niu Yawei wrote:
> There are several global locks in the VFS quota code which hurts
> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>
> This patch tries to make the VFS quota code scalable with minimal changes.
>
> Following tests (mdtest & dbench) were running over ext4 fs in a
> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
> the patch relieved the lock congestion a lot.
>

Just noticed this patch - FWIW, Lustre has a
"quota-replace-dqptr-sem-sles11sp2.patch" that they apply:

http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=lustre/kernel_patches/patches/quota-replace-dqptr-sem-sles11sp2.patch;h=c880dac83473f48cac96dc467ea76f64a74fe5dd;hb=HEAD

which might be interesting if you're looking at this.

(Or maybe it's doing the same thing; TBH I have not looked at
either patch, I just remembered that it existed...)

-eric

2014-05-23 05:22:13

by Niu Yawei

[permalink] [raw]
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability

On 2014/5/23 12:02, Eric Sandeen wrote:
> On 5/22/14, 5:47 AM, Niu Yawei wrote:
>> There are several global locks in the VFS quota code which hurts
>> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>>
>> This patch tries to make the VFS quota code scalable with minimal changes.
>>
>> Following tests (mdtest & dbench) were running over ext4 fs in a
>> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
>> the patch relieved the lock congestion a lot.
>>
> Just noticed this patch - FWIW, Lustre has a
> "quota-replace-dqptr-sem-sles11sp2.patch" that they apply:
Yes, I'm Lustre developer and trying to push the patch upstream. :)
>
> http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=lustre/kernel_patches/patches/quota-replace-dqptr-sem-sles11sp2.patch;h=c880dac83473f48cac96dc467ea76f64a74fe5dd;hb=HEAD
>
> which might be interesting if you're looking at this.
>
> (Or maybe it's doing the same thing; TBH I have not looked at
> either patch, I just remembered that it existed...)
>
> -eric

2014-05-23 13:02:37

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability

On 5/23/14, 12:22 AM, Niu Yawei wrote:
> On 2014/5/23 12:02, Eric Sandeen wrote:
>> On 5/22/14, 5:47 AM, Niu Yawei wrote:
>>> There are several global locks in the VFS quota code which hurts
>>> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>>>
>>> This patch tries to make the VFS quota code scalable with minimal changes.
>>>
>>> Following tests (mdtest & dbench) were running over ext4 fs in a
>>> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
>>> the patch relieved the lock congestion a lot.
>>>
>> Just noticed this patch - FWIW, Lustre has a
>> "quota-replace-dqptr-sem-sles11sp2.patch" that they apply:
> Yes, I'm Lustre developer and trying to push the patch upstream. :)

I'm sorry. I took a quick glance at both and somehow didn't realize they
were the same. That's what I get for sending emails too late at night.

Sorry for the noise! Carry on... ;) And thanks for doing this!

-Eric

2014-05-27 10:10:53

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 1/3] quota: protect Q_GETFMT by dqonoff_mutex

Remove dqptr_sem to make the quota code scalable (but keep it in
struct quota_info so the kernel ABI is unchanged): protect Q_GETFMT
by dqonoff_mutex instead of dqptr_sem.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/quota.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 2b363e2..e4851cb 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
{
__u32 fmt;

- down_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
if (!sb_has_quota_active(sb, type)) {
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
return -ESRCH;
}
fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
if (copy_to_user(addr, &fmt, sizeof(fmt)))
return -EFAULT;
return 0;
--
1.7.1



2014-05-27 10:12:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] quota: protect Q_GETFMT by dqonoff_mutex

On Tue, May 27, 2014 at 06:10:53PM +0800, Niu Yawei wrote:
> Remove dqptr_sem to make quota code scalable (but kept in struct
> quota_info to keep kernel ABI unchanged):

There is no stable kernel ABI to be kept; please adjust your patch
accordingly.


2014-05-27 10:13:48

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 2/3] quota: avoid unnecessary dqget()/dqput() calls

Avoid unnecessary dqget()/dqput() calls in __dquot_initialize(),
which would otherwise introduce global lock contention.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9cd5f63..dc6f711 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1400,7 +1400,7 @@ static int dquot_active(const struct inode *inode)
*/
static void __dquot_initialize(struct inode *inode, int type)
{
- int cnt;
+ int cnt, dq_get = 0;
struct dquot *got[MAXQUOTAS];
struct super_block *sb = inode->i_sb;
qsize_t rsv;
@@ -1416,6 +1416,13 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = NULL;
if (type != -1 && cnt != type)
continue;
+ /* The i_dquot should have been initialized in most cases;
+ * check it without locking here to avoid unnecessary
+ * dqget()/dqput() calls. */
+ if (inode->i_dquot[cnt])
+ continue;
+ dq_get = 1;
+
switch (cnt) {
case USRQUOTA:
qid = make_kqid_uid(inode->i_uid);
@@ -1427,6 +1434,10 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = dqget(sb, qid);
}

+ /* all required i_dquot pointers have been initialized */
+ if (!dq_get)
+ return;
+
down_write(&sb_dqopt(sb)->dqptr_sem);
if (IS_NOQUOTA(inode))
goto out_err;
--
1.7.1



2014-05-27 10:15:29

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 3/3] quota: remove dqptr_sem

Remove dqptr_sem to make the quota code scalable (but keep it in
struct quota_info so the kernel ABI is unchanged): accessing
inode->i_dquot is now protected by dquot_srcu, and changes to
inode->i_dquot are serialized by dq_data_lock.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 105 +++++++++++++++++++++---------------------------------
fs/super.c | 1 -
2 files changed, 41 insertions(+), 65 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index dc6f711..b86c88b 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -96,13 +96,15 @@
* Note that some things (eg. sb pointer, type, id) doesn't change during
* the life of the dquot structure and so needn't to be protected by a lock
*
- * Any operation working on dquots via inode pointers must hold dqptr_sem. If
- * operation is just reading pointers from inode (or not using them at all) the
- * read lock is enough. If pointers are altered function must hold write lock.
+ * Operations accessing dquots via inode pointers are protected by dquot_srcu.
+ * Any operation reading the pointers needs srcu_read_lock(&dquot_srcu), and
+ * synchronize_srcu(&dquot_srcu) is called before clearing pointers to avoid
+ * use-after-free. dq_data_lock is used to serialize the pointer setting and
+ * clearing operations.
* Special care needs to be taken about S_NOQUOTA inode flag (marking that
* inode is a quota file). Functions adding pointers from inode to dquots have
- * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
- * have to do all pointer modifications before dropping dqptr_sem. This makes
+ * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
+ * have to do all pointer modifications before dropping dq_data_lock. This makes
* sure they cannot race with quotaon which first sets S_NOQUOTA flag and
* then drops all pointers to dquots from an inode.
*
@@ -116,21 +118,15 @@
* spinlock to internal buffers before writing.
*
* Lock ordering (including related VFS locks) is the following:
- * dqonoff_mutex > i_mutex > journal_lock > dqptr_sem > dquot->dq_lock >
- * dqio_mutex
+ * dqonoff_mutex > i_mutex > journal_lock > dquot->dq_lock > dqio_mutex
* dqonoff_mutex > i_mutex comes from dquot_quota_sync, dquot_enable, etc.
- * The lock ordering of dqptr_sem imposed by quota code is only dqonoff_sem >
- * dqptr_sem. But filesystem has to count with the fact that functions such as
- * dquot_alloc_space() acquire dqptr_sem and they usually have to be called
- * from inside a transaction to keep filesystem consistency after a crash. Also
- * filesystems usually want to do some IO on dquot from ->mark_dirty which is
- * called with dqptr_sem held.
*/

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_list_lock);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_state_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_data_lock);
EXPORT_SYMBOL(dq_data_lock);
+DEFINE_STATIC_SRCU(dquot_srcu);

void __quota_error(struct super_block *sb, const char *func,
const char *fmt, ...)
@@ -974,7 +970,6 @@ static inline int dqput_blocks(struct dquot *dquot)
/*
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
- * We can't race with anybody because we hold dqptr_sem for writing...
*/
static int remove_inode_dquot_ref(struct inode *inode, int type,
struct list_head *tofree_head)
@@ -1035,13 +1030,15 @@ static void remove_dquot_ref(struct super_block *sb, int type,
* We have to scan also I_NEW inodes because they can already
* have quota pointer initialized. Luckily, we need to touch
* only quota pointers and these have separate locking
- * (dqptr_sem).
+ * (dq_data_lock).
*/
+ spin_lock(&dq_data_lock);
if (!IS_NOQUOTA(inode)) {
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
remove_inode_dquot_ref(inode, type, tofree_head);
}
+ spin_unlock(&dq_data_lock);
}
spin_unlock(&inode_sb_list_lock);
#ifdef CONFIG_QUOTA_DEBUG
@@ -1059,9 +1056,8 @@ static void drop_dquot_ref(struct super_block *sb, int type)
LIST_HEAD(tofree_head);

if (sb->dq_op) {
- down_write(&sb_dqopt(sb)->dqptr_sem);
remove_dquot_ref(sb, type, &tofree_head);
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ synchronize_srcu(&dquot_srcu);
put_dquot_list(&tofree_head);
}
}
@@ -1392,9 +1388,6 @@ static int dquot_active(const struct inode *inode)
/*
* Initialize quota pointers in inode
*
- * We do things in a bit complicated way but by that we avoid calling
- * dqget() and thus filesystem callbacks under dqptr_sem.
- *
* It is better to call this function outside of any transaction as it
* might need a lot of space in journal for dquot structure allocation.
*/
@@ -1405,8 +1398,6 @@ static void __dquot_initialize(struct inode *inode, int type)
struct super_block *sb = inode->i_sb;
qsize_t rsv;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

@@ -1438,7 +1429,7 @@ static void __dquot_initialize(struct inode *inode, int type)
if (!dq_get)
return;

- down_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode))
goto out_err;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1458,15 +1449,12 @@ static void __dquot_initialize(struct inode *inode, int type)
* did a write before quota was turned on
*/
rsv = inode_get_rsv_space(inode);
- if (unlikely(rsv)) {
- spin_lock(&dq_data_lock);
+ if (unlikely(rsv))
dquot_resv_space(inode->i_dquot[cnt], rsv);
- spin_unlock(&dq_data_lock);
- }
}
}
out_err:
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
/* Drop unused references */
dqput_all(got);
}
@@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
int cnt;
struct dquot *put[MAXQUOTAS];

- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
put[cnt] = inode->i_dquot[cnt];
inode->i_dquot[cnt] = NULL;
}
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
+ synchronize_srcu(&dquot_srcu);
dqput_all(put);
}

@@ -1608,15 +1597,11 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
*/
int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
int reserve = flags & DQUOT_SPACE_RESERVE;

- /*
- * First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex
- */
if (!dquot_active(inode)) {
inode_incr_space(inode, number, reserve);
goto out;
@@ -1625,7 +1610,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1652,7 +1637,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
goto out_flush_warn;
mark_all_dquot_dirty(dquots);
out_flush_warn:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
out:
return ret;
@@ -1664,17 +1649,16 @@ EXPORT_SYMBOL(__dquot_alloc_space);
*/
int dquot_alloc_inode(const struct inode *inode)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return 0;
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;
- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1694,7 +1678,7 @@ warn_put_all:
spin_unlock(&dq_data_lock);
if (ret == 0)
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
return ret;
}
@@ -1705,14 +1689,14 @@ EXPORT_SYMBOL(dquot_alloc_inode);
*/
int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_claim_rsv_space(inode, number);
return 0;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1724,7 +1708,7 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
inode_claim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return 0;
}
EXPORT_SYMBOL(dquot_claim_space_nodirty);
@@ -1734,14 +1718,14 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
*/
void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_reclaim_rsv_space(inode, number);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1753,7 +1737,7 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
inode_reclaim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return;
}
EXPORT_SYMBOL(dquot_reclaim_space_nodirty);
@@ -1766,16 +1750,14 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
- int reserve = flags & DQUOT_SPACE_RESERVE;
+ int reserve = flags & DQUOT_SPACE_RESERVE, index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode)) {
inode_decr_space(inode, number, reserve);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1798,7 +1780,7 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
goto out_unlock;
mark_all_dquot_dirty(dquots);
out_unlock:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(__dquot_free_space);
@@ -1811,13 +1793,12 @@ void dquot_free_inode(const struct inode *inode)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;
+ int index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1832,7 +1813,7 @@ void dquot_free_inode(const struct inode *inode)
}
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(dquot_free_inode);
@@ -1858,8 +1839,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
struct dquot_warn warn_from_inodes[MAXQUOTAS];
struct dquot_warn warn_from_space[MAXQUOTAS];

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (IS_NOQUOTA(inode))
return 0;
/* Initialize the arrays */
@@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
}
- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
return 0;
}
- spin_lock(&dq_data_lock);
cur_space = inode_get_bytes(inode);
rsv_space = inode_get_rsv_space(inode);
space = cur_space + rsv_space;
@@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
inode->i_dquot[cnt] = transfer_to[cnt];
}
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);

mark_all_dquot_dirty(transfer_from);
mark_all_dquot_dirty(transfer_to);
@@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
return 0;
over_quota:
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn_to);
return ret;
}
diff --git a/fs/super.c b/fs/super.c
index 48377f7..a97aecf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -214,7 +214,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
mutex_init(&s->s_dquot.dqio_mutex);
mutex_init(&s->s_dquot.dqonoff_mutex);
- init_rwsem(&s->s_dquot.dqptr_sem);
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
--
1.7.1
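The patch's read/update scheme can be illustrated with a toy userspace grace-period mechanism: readers register in the current epoch's counter, and the updater flips the epoch and waits for old-epoch readers to drain before freeing. This is only a sketch of the idea under simplifying assumptions; the kernel's SRCU is far more careful about memory ordering and per-CPU counters, and the `toy_*` names are invented for illustration:

```c
#include <stdatomic.h>
#include <sched.h>
#include <stdlib.h>

/* Toy two-counter grace-period scheme in the spirit of SRCU. */
static atomic_int reader_count[2];
static atomic_int epoch;

static int toy_srcu_read_lock(void)
{
	int e = atomic_load(&epoch) & 1;   /* register in current epoch */
	atomic_fetch_add(&reader_count[e], 1);
	return e;
}

static void toy_srcu_read_unlock(int e)
{
	atomic_fetch_sub(&reader_count[e], 1);
}

static void toy_synchronize_srcu(void)
{
	int old = atomic_fetch_add(&epoch, 1) & 1;  /* flip epoch */
	while (atomic_load(&reader_count[old]) != 0)
		sched_yield();   /* wait out pre-existing readers */
}

/* Updater pattern from the patch: clear the pointer, wait for a grace
 * period, only then free -- any reader that saw the old pointer has
 * finished its read-side section by the time the object is dropped. */
static int *shared;

static void toy_drop(void)
{
	int *old = shared;
	shared = NULL;            /* done under dq_data_lock in the patch */
	toy_synchronize_srcu();   /* synchronize_srcu(&dquot_srcu) */
	free(old);                /* dqput_all() in the patch */
}
```

In the patch itself, the writer side is __dquot_drop(): clear i_dquot[] under dq_data_lock, call synchronize_srcu(&dquot_srcu), then dqput_all().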



2014-05-27 10:28:13

by Niu Yawei

[permalink] [raw]
Subject: Re: [PATCH 1/3] quota: protect Q_GETFMT by dqonoff_mutex

Ok, thanks.
> On Tue, May 27, 2014 at 06:10:53PM +0800, Niu Yawei wrote:
>> Remove dqptr_sem to make quota code scalable (but kept in struct
>> quota_info to keep kernel ABI unchanged):
> There is not stable kernel ABI to be kept, please adjust your patch for
> that.
>


2014-05-28 01:52:12

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 1/3 v2] quota: protect Q_GETFMT by dqonoff_mutex

Remove dqptr_sem to make the quota code scalable: protect
Q_GETFMT by dqonoff_mutex instead of dqptr_sem.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/quota.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 2b363e2..e4851cb 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
{
__u32 fmt;

- down_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
if (!sb_has_quota_active(sb, type)) {
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
return -ESRCH;
}
fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
if (copy_to_user(addr, &fmt, sizeof(fmt)))
return -EFAULT;
return 0;
--
1.7.1



2014-05-28 01:53:32

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 2/3 v2] quota: avoid unnecessary dqget()/dqput() calls

Avoid unnecessary dqget()/dqput() calls in __dquot_initialize(),
which would otherwise introduce global lock contention.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9cd5f63..dc6f711 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1400,7 +1400,7 @@ static int dquot_active(const struct inode *inode)
*/
static void __dquot_initialize(struct inode *inode, int type)
{
- int cnt;
+ int cnt, dq_get = 0;
struct dquot *got[MAXQUOTAS];
struct super_block *sb = inode->i_sb;
qsize_t rsv;
@@ -1416,6 +1416,13 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = NULL;
if (type != -1 && cnt != type)
continue;
+ /* The i_dquot should have been initialized in most cases;
+ * check it without locking here to avoid unnecessary
+ * dqget()/dqput() calls. */
+ if (inode->i_dquot[cnt])
+ continue;
+ dq_get = 1;
+
switch (cnt) {
case USRQUOTA:
qid = make_kqid_uid(inode->i_uid);
@@ -1427,6 +1434,10 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = dqget(sb, qid);
}

+ /* all required i_dquot pointers have been initialized */
+ if (!dq_get)
+ return;
+
down_write(&sb_dqopt(sb)->dqptr_sem);
if (IS_NOQUOTA(inode))
goto out_err;
--
1.7.1



2014-05-28 01:55:04

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 3/3 v2] quota: remove dqptr_sem

Remove dqptr_sem to make the quota code scalable: accessing
inode->i_dquot is now protected by dquot_srcu, and changes to
inode->i_dquot are serialized by dq_data_lock.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------
fs/super.c | 1 -
include/linux/quota.h | 1 -
3 files changed, 41 insertions(+), 66 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index dc6f711..b86c88b 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -96,13 +96,15 @@
* Note that some things (eg. sb pointer, type, id) doesn't change during
* the life of the dquot structure and so needn't to be protected by a lock
*
- * Any operation working on dquots via inode pointers must hold dqptr_sem. If
- * operation is just reading pointers from inode (or not using them at all) the
- * read lock is enough. If pointers are altered function must hold write lock.
+ * Operations accessing dquots via inode pointers are protected by dquot_srcu.
+ * Any operation reading the pointers needs srcu_read_lock(&dquot_srcu), and
+ * synchronize_srcu(&dquot_srcu) is called before clearing pointers to avoid
+ * use-after-free. dq_data_lock is used to serialize the pointer setting and
+ * clearing operations.
* Special care needs to be taken about S_NOQUOTA inode flag (marking that
* inode is a quota file). Functions adding pointers from inode to dquots have
- * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
- * have to do all pointer modifications before dropping dqptr_sem. This makes
+ * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
+ * have to do all pointer modifications before dropping dq_data_lock. This makes
* sure they cannot race with quotaon which first sets S_NOQUOTA flag and
* then drops all pointers to dquots from an inode.
*
@@ -116,21 +118,15 @@
* spinlock to internal buffers before writing.
*
* Lock ordering (including related VFS locks) is the following:
- * dqonoff_mutex > i_mutex > journal_lock > dqptr_sem > dquot->dq_lock >
- * dqio_mutex
+ * dqonoff_mutex > i_mutex > journal_lock > dquot->dq_lock > dqio_mutex
* dqonoff_mutex > i_mutex comes from dquot_quota_sync, dquot_enable, etc.
- * The lock ordering of dqptr_sem imposed by quota code is only dqonoff_sem >
- * dqptr_sem. But filesystem has to count with the fact that functions such as
- * dquot_alloc_space() acquire dqptr_sem and they usually have to be called
- * from inside a transaction to keep filesystem consistency after a crash. Also
- * filesystems usually want to do some IO on dquot from ->mark_dirty which is
- * called with dqptr_sem held.
*/

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_list_lock);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_state_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_data_lock);
EXPORT_SYMBOL(dq_data_lock);
+DEFINE_STATIC_SRCU(dquot_srcu);

void __quota_error(struct super_block *sb, const char *func,
const char *fmt, ...)
@@ -974,7 +970,6 @@ static inline int dqput_blocks(struct dquot *dquot)
/*
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
- * We can't race with anybody because we hold dqptr_sem for writing...
*/
static int remove_inode_dquot_ref(struct inode *inode, int type,
struct list_head *tofree_head)
@@ -1035,13 +1030,15 @@ static void remove_dquot_ref(struct super_block *sb, int type,
* We have to scan also I_NEW inodes because they can already
* have quota pointer initialized. Luckily, we need to touch
* only quota pointers and these have separate locking
- * (dqptr_sem).
+ * (dq_data_lock).
*/
+ spin_lock(&dq_data_lock);
if (!IS_NOQUOTA(inode)) {
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
remove_inode_dquot_ref(inode, type, tofree_head);
}
+ spin_unlock(&dq_data_lock);
}
spin_unlock(&inode_sb_list_lock);
#ifdef CONFIG_QUOTA_DEBUG
@@ -1059,9 +1056,8 @@ static void drop_dquot_ref(struct super_block *sb, int type)
LIST_HEAD(tofree_head);

if (sb->dq_op) {
- down_write(&sb_dqopt(sb)->dqptr_sem);
remove_dquot_ref(sb, type, &tofree_head);
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ synchronize_srcu(&dquot_srcu);
put_dquot_list(&tofree_head);
}
}
@@ -1392,9 +1388,6 @@ static int dquot_active(const struct inode *inode)
/*
* Initialize quota pointers in inode
*
- * We do things in a bit complicated way but by that we avoid calling
- * dqget() and thus filesystem callbacks under dqptr_sem.
- *
* It is better to call this function outside of any transaction as it
* might need a lot of space in journal for dquot structure allocation.
*/
@@ -1405,8 +1398,6 @@ static void __dquot_initialize(struct inode *inode, int type)
struct super_block *sb = inode->i_sb;
qsize_t rsv;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

@@ -1438,7 +1429,7 @@ static void __dquot_initialize(struct inode *inode, int type)
if (!dq_get)
return;

- down_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode))
goto out_err;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1458,15 +1449,12 @@ static void __dquot_initialize(struct inode *inode, int type)
* did a write before quota was turned on
*/
rsv = inode_get_rsv_space(inode);
- if (unlikely(rsv)) {
- spin_lock(&dq_data_lock);
+ if (unlikely(rsv))
dquot_resv_space(inode->i_dquot[cnt], rsv);
- spin_unlock(&dq_data_lock);
- }
}
}
out_err:
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
/* Drop unused references */
dqput_all(got);
}
@@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
int cnt;
struct dquot *put[MAXQUOTAS];

- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
put[cnt] = inode->i_dquot[cnt];
inode->i_dquot[cnt] = NULL;
}
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
+ synchronize_srcu(&dquot_srcu);
dqput_all(put);
}

@@ -1608,15 +1597,11 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
*/
int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
int reserve = flags & DQUOT_SPACE_RESERVE;

- /*
- * First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex
- */
if (!dquot_active(inode)) {
inode_incr_space(inode, number, reserve);
goto out;
@@ -1625,7 +1610,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1652,7 +1637,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
goto out_flush_warn;
mark_all_dquot_dirty(dquots);
out_flush_warn:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
out:
return ret;
@@ -1664,17 +1649,16 @@ EXPORT_SYMBOL(__dquot_alloc_space);
*/
int dquot_alloc_inode(const struct inode *inode)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return 0;
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;
- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1694,7 +1678,7 @@ warn_put_all:
spin_unlock(&dq_data_lock);
if (ret == 0)
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
return ret;
}
@@ -1705,14 +1689,14 @@ EXPORT_SYMBOL(dquot_alloc_inode);
*/
int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_claim_rsv_space(inode, number);
return 0;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1724,7 +1708,7 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
inode_claim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return 0;
}
EXPORT_SYMBOL(dquot_claim_space_nodirty);
@@ -1734,14 +1718,14 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
*/
void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_reclaim_rsv_space(inode, number);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1753,7 +1737,7 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
inode_reclaim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return;
}
EXPORT_SYMBOL(dquot_reclaim_space_nodirty);
@@ -1766,16 +1750,14 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
- int reserve = flags & DQUOT_SPACE_RESERVE;
+ int reserve = flags & DQUOT_SPACE_RESERVE, index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode)) {
inode_decr_space(inode, number, reserve);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1798,7 +1780,7 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
goto out_unlock;
mark_all_dquot_dirty(dquots);
out_unlock:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(__dquot_free_space);
@@ -1811,13 +1793,12 @@ void dquot_free_inode(const struct inode *inode)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;
+ int index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1832,7 +1813,7 @@ void dquot_free_inode(const struct inode *inode)
}
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(dquot_free_inode);
@@ -1858,8 +1839,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
struct dquot_warn warn_from_inodes[MAXQUOTAS];
struct dquot_warn warn_from_space[MAXQUOTAS];

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (IS_NOQUOTA(inode))
return 0;
/* Initialize the arrays */
@@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
}
- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
return 0;
}
- spin_lock(&dq_data_lock);
cur_space = inode_get_bytes(inode);
rsv_space = inode_get_rsv_space(inode);
space = cur_space + rsv_space;
@@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
inode->i_dquot[cnt] = transfer_to[cnt];
}
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);

mark_all_dquot_dirty(transfer_from);
mark_all_dquot_dirty(transfer_to);
@@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
return 0;
over_quota:
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn_to);
return ret;
}
diff --git a/fs/super.c b/fs/super.c
index 48377f7..a97aecf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -214,7 +214,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
mutex_init(&s->s_dquot.dqio_mutex);
mutex_init(&s->s_dquot.dqonoff_mutex);
- init_rwsem(&s->s_dquot.dqptr_sem);
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
diff --git a/include/linux/quota.h b/include/linux/quota.h
index cc7494a..a1358ce 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -389,7 +389,6 @@ struct quota_info {
unsigned int flags; /* Flags for diskquotas on this device */
struct mutex dqio_mutex; /* lock device while I/O in progress */
struct mutex dqonoff_mutex; /* Serialize quotaon & quotaoff */
- struct rw_semaphore dqptr_sem; /* serialize ops using quota_info struct, pointers from inode to dquots */
struct inode *files[MAXQUOTAS]; /* inodes of quotafiles */
struct mem_dqinfo info[MAXQUOTAS]; /* Information for each quota type */
const struct quota_format_ops *ops[MAXQUOTAS]; /* Operations for each type */
--
1.7.1



2014-05-28 02:01:06

by Niu Yawei

Subject: Re: [PATCH 3/3 v2] quota: remove dqptr_sem

The v2 patchset removes dqptr_sem entirely (unlike v1, which kept it in
quota_info). I measured the performance of both the v1 and v2 patches;
the improvements are the same as with the first all-in-one patch I sent.
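The locking scheme the patch adopts (readers take srcu_read_lock(&dquot_srcu) around inode->i_dquot accesses; the updater clears the pointers and then calls synchronize_srcu() before dropping the dquot references) can be sketched as a toy userspace model. All names below are illustrative, not kernel APIs, and a busy-wait on a reader count stands in for the real SRCU grace period:

```c
/* Toy single-process model of the srcu_read_lock() /
 * synchronize_srcu() ordering the patch relies on.
 * Illustrative names only, not the kernel's SRCU implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static atomic_int readers;   /* models the SRCU read-side counter */
static int *shared_ptr;      /* models one inode->i_dquot[] slot */

static int toy_read_lock(void)
{
	atomic_fetch_add(&readers, 1);
	return 0;                /* stands in for the srcu index */
}

static void toy_read_unlock(int idx)
{
	(void)idx;
	atomic_fetch_sub(&readers, 1);
}

/* Updater side: clear the pointer first (serialized by dq_data_lock in
 * the real code), then wait until every reader that could still see the
 * old pointer has finished, and only then hand the object back so it
 * can be released. */
static int *toy_drop_pointer(void)
{
	int *old = shared_ptr;
	shared_ptr = NULL;
	while (atomic_load(&readers))   /* models synchronize_srcu() */
		;
	return old;                     /* now safe to free / dqput() */
}
```

An updater that skipped the wait could free the object while a reader still dereferences it; the wait after clearing the pointer is exactly what makes the dqput in the real code safe.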
> Remove dqptr_sem to make the quota code scalable: accessing
> inode->i_dquot is now protected by dquot_srcu, and changing
> inode->i_dquot is now serialized by dq_data_lock.
>
> Signed-off-by: Lai Siyao <[email protected]>
> Signed-off-by: Niu Yawei <[email protected]>
> ---
> fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------
> fs/super.c | 1 -
> include/linux/quota.h | 1 -
> 3 files changed, 41 insertions(+), 66 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index dc6f711..b86c88b 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -96,13 +96,15 @@
> * Note that some things (eg. sb pointer, type, id) doesn't change during
> * the life of the dquot structure and so needn't to be protected by a lock
> *
> - * Any operation working on dquots via inode pointers must hold dqptr_sem. If
> - * operation is just reading pointers from inode (or not using them at all) the
> - * read lock is enough. If pointers are altered function must hold write lock.
> + * Operation accessing dquots via inode pointers are protected by dquot_srcu.
> + * Operation of reading pointer needs srcu_read_lock(&dquot_srcu), and
> + * synchronize_srcu(&dquot_srcu) is called before clear pointers to avoid
> + * use after free. dq_data_lock is used to serialize the pointer setting and
> + * clearing operations.
> * Special care needs to be taken about S_NOQUOTA inode flag (marking that
> * inode is a quota file). Functions adding pointers from inode to dquots have
> - * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
> - * have to do all pointer modifications before dropping dqptr_sem. This makes
> + * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
> + * have to do all pointer modifications before dropping dq_data_lock. This makes
> * sure they cannot race with quotaon which first sets S_NOQUOTA flag and
> * then drops all pointers to dquots from an inode.
> *
> @@ -116,21 +118,15 @@
> * spinlock to internal buffers before writing.
> *
> * Lock ordering (including related VFS locks) is the following:
> - * dqonoff_mutex > i_mutex > journal_lock > dqptr_sem > dquot->dq_lock >
> - * dqio_mutex
> + * dqonoff_mutex > i_mutex > journal_lock > dquot->dq_lock > dqio_mutex
> * dqonoff_mutex > i_mutex comes from dquot_quota_sync, dquot_enable, etc.
> - * The lock ordering of dqptr_sem imposed by quota code is only dqonoff_sem >
> - * dqptr_sem. But filesystem has to count with the fact that functions such as
> - * dquot_alloc_space() acquire dqptr_sem and they usually have to be called
> - * from inside a transaction to keep filesystem consistency after a crash. Also
> - * filesystems usually want to do some IO on dquot from ->mark_dirty which is
> - * called with dqptr_sem held.
> */
>
> static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_list_lock);
> static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_state_lock);
> __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_data_lock);
> EXPORT_SYMBOL(dq_data_lock);
> +DEFINE_STATIC_SRCU(dquot_srcu);
>
> void __quota_error(struct super_block *sb, const char *func,
> const char *fmt, ...)
> @@ -974,7 +970,6 @@ static inline int dqput_blocks(struct dquot *dquot)
> /*
> * Remove references to dquots from inode and add dquot to list for freeing
> * if we have the last reference to dquot
> - * We can't race with anybody because we hold dqptr_sem for writing...
> */
> static int remove_inode_dquot_ref(struct inode *inode, int type,
> struct list_head *tofree_head)
> @@ -1035,13 +1030,15 @@ static void remove_dquot_ref(struct super_block *sb, int type,
> * We have to scan also I_NEW inodes because they can already
> * have quota pointer initialized. Luckily, we need to touch
> * only quota pointers and these have separate locking
> - * (dqptr_sem).
> + * (dq_data_lock).
> */
> + spin_lock(&dq_data_lock);
> if (!IS_NOQUOTA(inode)) {
> if (unlikely(inode_get_rsv_space(inode) > 0))
> reserved = 1;
> remove_inode_dquot_ref(inode, type, tofree_head);
> }
> + spin_unlock(&dq_data_lock);
> }
> spin_unlock(&inode_sb_list_lock);
> #ifdef CONFIG_QUOTA_DEBUG
> @@ -1059,9 +1056,8 @@ static void drop_dquot_ref(struct super_block *sb, int type)
> LIST_HEAD(tofree_head);
>
> if (sb->dq_op) {
> - down_write(&sb_dqopt(sb)->dqptr_sem);
> remove_dquot_ref(sb, type, &tofree_head);
> - up_write(&sb_dqopt(sb)->dqptr_sem);
> + synchronize_srcu(&dquot_srcu);
> put_dquot_list(&tofree_head);
> }
> }
> @@ -1392,9 +1388,6 @@ static int dquot_active(const struct inode *inode)
> /*
> * Initialize quota pointers in inode
> *
> - * We do things in a bit complicated way but by that we avoid calling
> - * dqget() and thus filesystem callbacks under dqptr_sem.
> - *
> * It is better to call this function outside of any transaction as it
> * might need a lot of space in journal for dquot structure allocation.
> */
> @@ -1405,8 +1398,6 @@ static void __dquot_initialize(struct inode *inode, int type)
> struct super_block *sb = inode->i_sb;
> qsize_t rsv;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode))
> return;
>
> @@ -1438,7 +1429,7 @@ static void __dquot_initialize(struct inode *inode, int type)
> if (!dq_get)
> return;
>
> - down_write(&sb_dqopt(sb)->dqptr_sem);
> + spin_lock(&dq_data_lock);
> if (IS_NOQUOTA(inode))
> goto out_err;
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> @@ -1458,15 +1449,12 @@ static void __dquot_initialize(struct inode *inode, int type)
> * did a write before quota was turned on
> */
> rsv = inode_get_rsv_space(inode);
> - if (unlikely(rsv)) {
> - spin_lock(&dq_data_lock);
> + if (unlikely(rsv))
> dquot_resv_space(inode->i_dquot[cnt], rsv);
> - spin_unlock(&dq_data_lock);
> - }
> }
> }
> out_err:
> - up_write(&sb_dqopt(sb)->dqptr_sem);
> + spin_unlock(&dq_data_lock);
> /* Drop unused references */
> dqput_all(got);
> }
> @@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
> int cnt;
> struct dquot *put[MAXQUOTAS];
>
> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> put[cnt] = inode->i_dquot[cnt];
> inode->i_dquot[cnt] = NULL;
> }
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_unlock(&dq_data_lock);
> + synchronize_srcu(&dquot_srcu);
> dqput_all(put);
> }
>
> @@ -1608,15 +1597,11 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
> */
> int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
> {
> - int cnt, ret = 0;
> + int cnt, ret = 0, index;
> struct dquot_warn warn[MAXQUOTAS];
> struct dquot **dquots = inode->i_dquot;
> int reserve = flags & DQUOT_SPACE_RESERVE;
>
> - /*
> - * First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex
> - */
> if (!dquot_active(inode)) {
> inode_incr_space(inode, number, reserve);
> goto out;
> @@ -1625,7 +1610,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
> for (cnt = 0; cnt < MAXQUOTAS; cnt++)
> warn[cnt].w_type = QUOTA_NL_NOWARN;
>
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> if (!dquots[cnt])
> @@ -1652,7 +1637,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
> goto out_flush_warn;
> mark_all_dquot_dirty(dquots);
> out_flush_warn:
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> flush_warnings(warn);
> out:
> return ret;
> @@ -1664,17 +1649,16 @@ EXPORT_SYMBOL(__dquot_alloc_space);
> */
> int dquot_alloc_inode(const struct inode *inode)
> {
> - int cnt, ret = 0;
> + int cnt, ret = 0, index;
> struct dquot_warn warn[MAXQUOTAS];
> struct dquot * const *dquots = inode->i_dquot;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode))
> return 0;
> for (cnt = 0; cnt < MAXQUOTAS; cnt++)
> warn[cnt].w_type = QUOTA_NL_NOWARN;
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> +
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> if (!dquots[cnt])
> @@ -1694,7 +1678,7 @@ warn_put_all:
> spin_unlock(&dq_data_lock);
> if (ret == 0)
> mark_all_dquot_dirty(dquots);
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> flush_warnings(warn);
> return ret;
> }
> @@ -1705,14 +1689,14 @@ EXPORT_SYMBOL(dquot_alloc_inode);
> */
> int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
> {
> - int cnt;
> + int cnt, index;
>
> if (!dquot_active(inode)) {
> inode_claim_rsv_space(inode, number);
> return 0;
> }
>
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> /* Claim reserved quotas to allocated quotas */
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> @@ -1724,7 +1708,7 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
> inode_claim_rsv_space(inode, number);
> spin_unlock(&dq_data_lock);
> mark_all_dquot_dirty(inode->i_dquot);
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> return 0;
> }
> EXPORT_SYMBOL(dquot_claim_space_nodirty);
> @@ -1734,14 +1718,14 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
> */
> void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
> {
> - int cnt;
> + int cnt, index;
>
> if (!dquot_active(inode)) {
> inode_reclaim_rsv_space(inode, number);
> return;
> }
>
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> /* Claim reserved quotas to allocated quotas */
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> @@ -1753,7 +1737,7 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
> inode_reclaim_rsv_space(inode, number);
> spin_unlock(&dq_data_lock);
> mark_all_dquot_dirty(inode->i_dquot);
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> return;
> }
> EXPORT_SYMBOL(dquot_reclaim_space_nodirty);
> @@ -1766,16 +1750,14 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
> unsigned int cnt;
> struct dquot_warn warn[MAXQUOTAS];
> struct dquot **dquots = inode->i_dquot;
> - int reserve = flags & DQUOT_SPACE_RESERVE;
> + int reserve = flags & DQUOT_SPACE_RESERVE, index;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode)) {
> inode_decr_space(inode, number, reserve);
> return;
> }
>
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> int wtype;
> @@ -1798,7 +1780,7 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
> goto out_unlock;
> mark_all_dquot_dirty(dquots);
> out_unlock:
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> flush_warnings(warn);
> }
> EXPORT_SYMBOL(__dquot_free_space);
> @@ -1811,13 +1793,12 @@ void dquot_free_inode(const struct inode *inode)
> unsigned int cnt;
> struct dquot_warn warn[MAXQUOTAS];
> struct dquot * const *dquots = inode->i_dquot;
> + int index;
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (!dquot_active(inode))
> return;
>
> - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + index = srcu_read_lock(&dquot_srcu);
> spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> int wtype;
> @@ -1832,7 +1813,7 @@ void dquot_free_inode(const struct inode *inode)
> }
> spin_unlock(&dq_data_lock);
> mark_all_dquot_dirty(dquots);
> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + srcu_read_unlock(&dquot_srcu, index);
> flush_warnings(warn);
> }
> EXPORT_SYMBOL(dquot_free_inode);
> @@ -1858,8 +1839,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> struct dquot_warn warn_from_inodes[MAXQUOTAS];
> struct dquot_warn warn_from_space[MAXQUOTAS];
>
> - /* First test before acquiring mutex - solves deadlocks when we
> - * re-enter the quota code and are already holding the mutex */
> if (IS_NOQUOTA(inode))
> return 0;
> /* Initialize the arrays */
> @@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
> warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
> }
> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> +
> + spin_lock(&dq_data_lock);
> if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_unlock(&dq_data_lock);
> return 0;
> }
> - spin_lock(&dq_data_lock);
> cur_space = inode_get_bytes(inode);
> rsv_space = inode_get_rsv_space(inode);
> space = cur_space + rsv_space;
> @@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> inode->i_dquot[cnt] = transfer_to[cnt];
> }
> spin_unlock(&dq_data_lock);
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>
> mark_all_dquot_dirty(transfer_from);
> mark_all_dquot_dirty(transfer_to);
> @@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> return 0;
> over_quota:
> spin_unlock(&dq_data_lock);
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> flush_warnings(warn_to);
> return ret;
> }
> diff --git a/fs/super.c b/fs/super.c
> index 48377f7..a97aecf 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -214,7 +214,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
> lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
> mutex_init(&s->s_dquot.dqio_mutex);
> mutex_init(&s->s_dquot.dqonoff_mutex);
> - init_rwsem(&s->s_dquot.dqptr_sem);
> s->s_maxbytes = MAX_NON_LFS;
> s->s_op = &default_op;
> s->s_time_gran = 1000000000;
> diff --git a/include/linux/quota.h b/include/linux/quota.h
> index cc7494a..a1358ce 100644
> --- a/include/linux/quota.h
> +++ b/include/linux/quota.h
> @@ -389,7 +389,6 @@ struct quota_info {
> unsigned int flags; /* Flags for diskquotas on this device */
> struct mutex dqio_mutex; /* lock device while I/O in progress */
> struct mutex dqonoff_mutex; /* Serialize quotaon & quotaoff */
> - struct rw_semaphore dqptr_sem; /* serialize ops using quota_info struct, pointers from inode to dquots */
> struct inode *files[MAXQUOTAS]; /* inodes of quotafiles */
> struct mem_dqinfo info[MAXQUOTAS]; /* Information for each quota type */
> const struct quota_format_ops *ops[MAXQUOTAS]; /* Operations for each type */


2014-06-02 07:32:45

by Jan Kara

Subject: Re: [PATCH 1/3 v2] quota: protect Q_GETFMT by dqonoff_mutex

On Wed 28-05-14 09:52:19, Niu Yawei wrote:
> Remove dqptr_sem to make quota code scalable: protect
> Q_GETFMT with dqonoff_mutex instead of dqptr_sem.
I'd make the changelog more verbose. Otherwise the patch looks good to
me. Something like:

dqptr_sem will go away. Protect Q_GETFMT quotactl by dqonoff_mutex instead.
This is also enough to make sure quota info will not go away while we are
looking at it.

Honza
>
> Signed-off-by: Lai Siyao <[email protected]>
> Signed-off-by: Niu Yawei <[email protected]>
> ---
> fs/quota/quota.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 2b363e2..e4851cb 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
> {
> __u32 fmt;
>
> - down_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> if (!sb_has_quota_active(sb, type)) {
> - up_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> return -ESRCH;
> }
> fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
> - up_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> if (copy_to_user(addr, &fmt, sizeof(fmt)))
> return -EFAULT;
> return 0;
> --
> 1.7.1
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-06-02 07:42:47

by Jan Kara

Subject: Re: [PATCH 2/3 v2] quota: avoid unnecessary dqget()/dqput() calls

On Wed 28-05-14 09:53:39, Niu Yawei wrote:
> Avoid unnecessary dqget()/dqput() calls in __dquot_initialize();
> they would otherwise introduce global lock contention.
The patch looks good. Just two minor nits below.

Honza

> Signed-off-by: Lai Siyao <[email protected]>
> Signed-off-by: Niu Yawei <[email protected]>
> ---
> fs/quota/dquot.c | 13 ++++++++++++-
> 1 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 9cd5f63..dc6f711 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1400,7 +1400,7 @@ static int dquot_active(const struct inode *inode)
> */
> static void __dquot_initialize(struct inode *inode, int type)
> {
> - int cnt;
> + int cnt, dq_get = 0;
> struct dquot *got[MAXQUOTAS];
> struct super_block *sb = inode->i_sb;
> qsize_t rsv;
> @@ -1416,6 +1416,13 @@ static void __dquot_initialize(struct inode *inode, int type)
> got[cnt] = NULL;
> if (type != -1 && cnt != type)
> continue;
> + /* the i_dquot should have been initialized in most case,
cases ^^^
> + * we check it without locking here to avoid unnecessary
> + * dqget()/dqput() calls. */
Please fix the comment style - scripts/checkpatch.pl is your friend.

> + if (inode->i_dquot[cnt])
> + continue;
> + dq_get = 1;
init_needed or something like that would be a better name I think.

> +
> switch (cnt) {
> case USRQUOTA:
> qid = make_kqid_uid(inode->i_uid);
> @@ -1427,6 +1434,10 @@ static void __dquot_initialize(struct inode *inode, int type)
> got[cnt] = dqget(sb, qid);
> }
>
> + /* all required i_dquot has been initialized */
> + if (!dq_get)
> + return;
> +
> down_write(&sb_dqopt(sb)->dqptr_sem);
> if (IS_NOQUOTA(inode))
> goto out_err;
> --
> 1.7.1
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-06-02 08:34:42

by Jan Kara

Subject: Re: [PATCH 3/3 v2] quota: remove dqptr_sem

On Wed 28-05-14 09:55:10, Niu Yawei wrote:
> Remove dqptr_sem to make the quota code scalable: accessing
> inode->i_dquot is now protected by dquot_srcu, and changing
> inode->i_dquot is now serialized by dq_data_lock.
The patch is mostly fine. Just some minor comments below.

Honza

> Signed-off-by: Lai Siyao <[email protected]>
> Signed-off-by: Niu Yawei <[email protected]>
> ---
> fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------
> fs/super.c | 1 -
> include/linux/quota.h | 1 -
> 3 files changed, 41 insertions(+), 66 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index dc6f711..b86c88b 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -96,13 +96,15 @@
> * Note that some things (eg. sb pointer, type, id) doesn't change during
> * the life of the dquot structure and so needn't to be protected by a lock
> *
> - * Any operation working on dquots via inode pointers must hold dqptr_sem. If
> - * operation is just reading pointers from inode (or not using them at all) the
> - * read lock is enough. If pointers are altered function must hold write lock.
> + * Operation accessing dquots via inode pointers are protected by dquot_srcu.
> + * Operation of reading pointer needs srcu_read_lock(&dquot_srcu), and
> + * synchronize_srcu(&dquot_srcu) is called before clear pointers to avoid
This is not actually precise. It should be:
and synchronize_srcu(&dquot_srcu) is called after clearing pointers from
inode and before dropping dquot references to avoid use of dquots after
they are freed.

Now that we have the rule spelled out exactly, I think we should update
what remove_inode_dquot_ref() does. It should do something like:

if (list_empty(&dquot->dq_free)) {
spin_lock(&dq_list_lock);
/*
* The inode still has reference to dquot so it can't be in the
* free list
*/
list_add(&dquot->dq_free, tofree_head);
spin_unlock(&dq_list_lock);
} else {
/*
* Dquot is already in a list to put so we won't drop the last
* reference here.
*/
dqput(dquot);
}

Although in practice this should be mostly the same as the current code,
this makes it more obvious that we keep one reference to each dquot from
inodes until after we call synchronize_srcu(). And you can make this change
as a separate patch before the dqptr_sem removal.
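The invariant in the suggestion above (the tofree list, not the inode, ends up owning exactly one of the references the inodes held) can be condensed into a toy model. The flag below stands in for the list_empty(&dquot->dq_free) check, and the types are illustrative, not the kernel's:

```c
/* Toy model of the reference-keeping rule: the first inode reference we
 * encounter is handed over to the tofree list; any further inode
 * reference is dropped immediately, because the list already owns one
 * reference that survives until after the grace period. */
#include <assert.h>

struct toy_dquot {
	int refcount;   /* references held by inodes + the tofree list */
	int on_tofree;  /* models !list_empty(&dquot->dq_free) */
};

static void toy_remove_inode_ref(struct toy_dquot *dq)
{
	if (!dq->on_tofree) {
		/* Hand this inode's reference over to the tofree list;
		 * refcount is unchanged, only ownership moves. */
		dq->on_tofree = 1;
	} else {
		/* The tofree list already owns a reference, so this
		 * cannot be the last one; drop it now (dqput()). */
		dq->refcount--;
	}
}
```

Since the list keeps one reference alive, the final dqput happens only from put_dquot_list(), i.e. after synchronize_srcu() has returned.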

> + * use after free. dq_data_lock is used to serialize the pointer setting and
> + * clearing operations.
> * Special care needs to be taken about S_NOQUOTA inode flag (marking that
> * inode is a quota file). Functions adding pointers from inode to dquots have
> - * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
> - * have to do all pointer modifications before dropping dqptr_sem. This makes
> + * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
> + * have to do all pointer modifications before dropping dq_data_lock. This makes
> * sure they cannot race with quotaon which first sets S_NOQUOTA flag and
> * then drops all pointers to dquots from an inode.
> *
...
> @@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
> int cnt;
> struct dquot *put[MAXQUOTAS];
>
> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_lock(&dq_data_lock);
> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> put[cnt] = inode->i_dquot[cnt];
> inode->i_dquot[cnt] = NULL;
> }
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_unlock(&dq_data_lock);
> + synchronize_srcu(&dquot_srcu);
> dqput_all(put);
> }
You don't have to call synchronize_srcu() here. There can be no other
users of the inode when __dquot_drop() is called, so no one should be using
the inode's dquot pointers either. Probably we should document this
assumption before dquot_drop().

> @@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
> warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
> }
> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> +
> + spin_lock(&dq_data_lock);
> if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> + spin_unlock(&dq_data_lock);
> return 0;
> }
> - spin_lock(&dq_data_lock);
> cur_space = inode_get_bytes(inode);
> rsv_space = inode_get_rsv_space(inode);
> space = cur_space + rsv_space;
> @@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> inode->i_dquot[cnt] = transfer_to[cnt];
> }
> spin_unlock(&dq_data_lock);
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>
> mark_all_dquot_dirty(transfer_from);
> mark_all_dquot_dirty(transfer_to);
> @@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
> return 0;
> over_quota:
> spin_unlock(&dq_data_lock);
> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> flush_warnings(warn_to);
> return ret;
Hmm, you are missing SRCU protection in __dquot_transfer()... We are
holding extra dquot references here, so we are fine, but it really deserves
a comment somewhere in the header before the function.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-06-03 09:51:44

by Niu Yawei

Subject: Re: [PATCH 3/3 v2] quota: remove dqptr_sem

Thanks for the review, Honza.
> On Wed 28-05-14 09:55:10, Niu Yawei wrote:
>> Remove dqptr_sem to make the quota code scalable: accessing
>> inode->i_dquot is now protected by dquot_srcu, and changing
>> inode->i_dquot is now serialized by dq_data_lock.
> The patch is mostly fine. Just some minor comments below.
>
> Honza
>
>> Signed-off-by: Lai Siyao <[email protected]>
>> Signed-off-by: Niu Yawei <[email protected]>
>> ---
>> fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------
>> fs/super.c | 1 -
>> include/linux/quota.h | 1 -
>> 3 files changed, 41 insertions(+), 66 deletions(-)
>>
>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>> index dc6f711..b86c88b 100644
>> --- a/fs/quota/dquot.c
>> +++ b/fs/quota/dquot.c
>> @@ -96,13 +96,15 @@
>> * Note that some things (eg. sb pointer, type, id) doesn't change during
>> * the life of the dquot structure and so needn't to be protected by a lock
>> *
>> - * Any operation working on dquots via inode pointers must hold dqptr_sem. If
>> - * operation is just reading pointers from inode (or not using them at all) the
>> - * read lock is enough. If pointers are altered function must hold write lock.
>> + * Operation accessing dquots via inode pointers are protected by dquot_srcu.
>> + * Operation of reading pointer needs srcu_read_lock(&dquot_srcu), and
>> + * synchronize_srcu(&dquot_srcu) is called before clear pointers to avoid
> This is not actually precise. It should be:
> and synchronize_srcu(&dquot_srcu) is called after clearing pointers from
> inode and before dropping dquot references to avoid use of dquots after
> they are freed.
>
> Now that we have the rule spelled out exactly, I think we should update
> what remove_inode_dquot_ref() does. It should do something like:
>
> if (list_empty(&dquot->dq_free)) {
> spin_lock(&dq_list_lock);
> /*
> * The inode still has reference to dquot so it can't be in the
> * free list
> */
> list_add(&dquot->dq_free, tofree_head);
> spin_unlock(&dq_list_lock);
> } else {
> /*
> * Dquot is already in a list to put so we won't drop the last
> * reference here.
> */
> dqput(dquot);
> }
>
> Although in practice this should be mostly the same as the current code
> this makes it more obvious we keep one reference to each dquot from inodes
> until after we call synchronize_srcu(). And you can make this change as a
> separate patch before the dqptr_sem removal.
I don't quite follow this: under what condition would dq_free be non-empty?
I think it could happen when the dquot was put on tofree_head earlier, then
found by dqget() and became in-use again, right? But then won't this race
with drop_dquot_ref() -> put_dquot_list()? Actually, it looks to me like
the old version of remove_inode_dquot_ref() has the same race. Did I miss
anything?

Another concern: in dqcache_shrink_scan(), we scan the free_dquots list
without holding dq_list_lock; won't this race with dqget()/dqput()?
>> + * use after free. dq_data_lock is used to serialize the pointer setting and
>> + * clearing operations.
>> * Special care needs to be taken about S_NOQUOTA inode flag (marking that
>> * inode is a quota file). Functions adding pointers from inode to dquots have
>> - * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
>> - * have to do all pointer modifications before dropping dqptr_sem. This makes
>> + * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
>> + * have to do all pointer modifications before dropping dq_data_lock. This makes
>> * sure they cannot race with quotaon which first sets S_NOQUOTA flag and
>> * then drops all pointers to dquots from an inode.
>> *
> ...
>> @@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
>> int cnt;
>> struct dquot *put[MAXQUOTAS];
>>
>> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> + spin_lock(&dq_data_lock);
>> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
>> put[cnt] = inode->i_dquot[cnt];
>> inode->i_dquot[cnt] = NULL;
>> }
>> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> + spin_unlock(&dq_data_lock);
>> + synchronize_srcu(&dquot_srcu);
>> dqput_all(put);
>> }
> You don't have to call sychronize_srcu() here. There can be no other
> users of the inode when __dquot_drop() is called. So noone should be using
> inode dquot pointers as well. Probably we should document this assumption
> before dquot_drop().
>
I'm fine with removing this and adding comments before this function,
but I wonder whether it would be safer to call an additional
synchronize_srcu() here, in case someone uses this function for some
other purpose in the future.
>> @@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
>> warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
>> warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
>> }
>> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> +
>> + spin_lock(&dq_data_lock);
>> if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
>> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> + spin_unlock(&dq_data_lock);
>> return 0;
>> }
>> - spin_lock(&dq_data_lock);
>> cur_space = inode_get_bytes(inode);
>> rsv_space = inode_get_rsv_space(inode);
>> space = cur_space + rsv_space;
>> @@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
>> inode->i_dquot[cnt] = transfer_to[cnt];
>> }
>> spin_unlock(&dq_data_lock);
>> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>>
>> mark_all_dquot_dirty(transfer_from);
>> mark_all_dquot_dirty(transfer_to);
>> @@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
>> return 0;
>> over_quota:
>> spin_unlock(&dq_data_lock);
>> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> flush_warnings(warn_to);
>> return ret;
> Hum, you are missing srcu protection in __dquot_transfer()... Now we are
> holding extra dquot references here so we are fine but it really deserves a
> comment somewhere in the header before the function.
Yes, we are holding references there. I'll add comments to explain it. Thanks.
>
> Honza


2014-06-03 15:43:08

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 3/3 v2] quota: remove dqptr_sem

On Tue 03-06-14 17:51:44, Niu Yawei wrote:
> Thanks for the review, Honza.
> > On Wed 28-05-14 09:55:10, Niu Yawei wrote:
> >> Remove dqptr_sem to make the quota code scalable: access to
> >> inode->i_dquot is now protected by dquot_srcu, and changes to
> >> inode->i_dquot are serialized by dq_data_lock.
> > The patch is mostly fine. Just some minor comments below.
> >
> > Honza
> >
> >> Signed-off-by: Lai Siyao <[email protected]>
> >> Signed-off-by: Niu Yawei <[email protected]>
> >> ---
> >> fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------
> >> fs/super.c | 1 -
> >> include/linux/quota.h | 1 -
> >> 3 files changed, 41 insertions(+), 66 deletions(-)
> >>
> >> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> >> index dc6f711..b86c88b 100644
> >> --- a/fs/quota/dquot.c
> >> +++ b/fs/quota/dquot.c
> >> @@ -96,13 +96,15 @@
> >> * Note that some things (eg. sb pointer, type, id) doesn't change during
> >> * the life of the dquot structure and so needn't to be protected by a lock
> >> *
> >> - * Any operation working on dquots via inode pointers must hold dqptr_sem. If
> >> - * operation is just reading pointers from inode (or not using them at all) the
> >> - * read lock is enough. If pointers are altered function must hold write lock.
> >> + * Operation accessing dquots via inode pointers are protected by dquot_srcu.
> >> + * Operation of reading pointer needs srcu_read_lock(&dquot_srcu), and
> >> + * synchronize_srcu(&dquot_srcu) is called before clear pointers to avoid
> > This is not actually precise. It should be:
> > and synchronize_srcu(&dquot_srcu) is called after clearing pointers from
> > inode and before dropping dquot references to avoid use of dquots after
> > they are freed.
> >
> > Now that we have the rule spelled out exactly, I think we should update
> > what remove_inode_dquot_ref() does. It should do something like:
> >
> > if (list_empty(&dquot->dq_free)) {
> > spin_lock(&dq_list_lock);
> > /*
> > * The inode still has reference to dquot so it can't be in the
> > * free list
> > */
> > list_add(&dquot->dq_free, tofree_head);
> > spin_unlock(&dq_list_lock);
> > } else {
> > /*
> > * Dquot is already in a list to put so we won't drop the last
> > * reference here.
> > */
> > dqput(dquot);
> > }
> >
> > Although in practice this should be mostly the same as the current code
> > this makes it more obvious we keep one reference to each dquot from inodes
> > until after we call synchronize_srcu(). And you can make this change as a
> > separate patch before the dqptr_sem removal.
> I don't quite follow this: under which condition is dq_free not empty?
If we already added the dquot to tofree_head. Don't forget that there are
likely many references to one dquot from different inodes. And we want to
add dquot to the list just once.

> I think it could be that the dquot was put on tofree_head before, and it
> was then found by dqget() and became in-use again, right?
This cannot really happen - by the time remove_inode_dquot_ref() runs we
have quota type marked as inactive and so dqget() will refuse to return any
references to dquots of that type.

> Then won't this race with drop_dquot_ref() -> put_dquot_list()?
> Actually, it looks to me like the old version of
> remove_inode_dquot_ref() has the same race. Did I miss anything?
>
> My other concern is: in dqcache_shrink_scan(), we scan the free_dquots
> list without holding dq_list_lock; won't this race with dqget()/dqput()?
Yes, that's a bug introduced by commit
1ab6c4997e04a00c50c6d786c2f046adc0d1f5de. Good spotting!
dqcache_shrink_scan() should hold dq_list_lock all the time it runs. Will
you add a fix to your series so that you get credit?

> >> + * use after free. dq_data_lock is used to serialize the pointer setting and
> >> + * clearing operations.
> >> * Special care needs to be taken about S_NOQUOTA inode flag (marking that
> >> * inode is a quota file). Functions adding pointers from inode to dquots have
> >> - * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
> >> - * have to do all pointer modifications before dropping dqptr_sem. This makes
> >> + * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
> >> + * have to do all pointer modifications before dropping dq_data_lock. This makes
> >> * sure they cannot race with quotaon which first sets S_NOQUOTA flag and
> >> * then drops all pointers to dquots from an inode.
> >> *
> > ...
> >> @@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode)
> >> int cnt;
> >> struct dquot *put[MAXQUOTAS];
> >>
> >> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> >> + spin_lock(&dq_data_lock);
> >> for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> >> put[cnt] = inode->i_dquot[cnt];
> >> inode->i_dquot[cnt] = NULL;
> >> }
> >> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
> >> + spin_unlock(&dq_data_lock);
> >> + synchronize_srcu(&dquot_srcu);
> >> dqput_all(put);
> >> }
> > You don't have to call sychronize_srcu() here. There can be no other
> > users of the inode when __dquot_drop() is called. So noone should be using
> > inode dquot pointers as well. Probably we should document this assumption
> > before dquot_drop().
> >
> I'm fine with removing this and adding comments before this function,
> but I wonder whether it would be safer to call an additional
> synchronize_srcu() here, in case someone uses this function for some
> other purpose in the future.
Well, synchronize_srcu() is quite expensive and would get called when
evicting each inode with quota initialized. So I think that would be too
expensive a safety measure... You can probably add there:
WARN_ON(!(inode->i_flags & (I_NEW | I_FREEING)));
to catch unexpected users.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-06-04 04:19:16

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex

Subject: [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex

dqptr_sem will go away. Protect Q_GETFMT quotactl by
dqonoff_mutex instead. This is also enough to make sure
quota info will not go away while we are looking at it.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/quota.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 2b363e2..e4851cb 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
{
__u32 fmt;

- down_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
if (!sb_has_quota_active(sb, type)) {
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
return -ESRCH;
}
fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
- up_read(&sb_dqopt(sb)->dqptr_sem);
+ mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
if (copy_to_user(addr, &fmt, sizeof(fmt)))
return -EFAULT;
return 0;
--
1.7.1



2014-06-04 04:20:30

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 2/5] quota: avoid unnecessary dqget()/dqput() calls

Avoid unnecessary dqget()/dqput() calls in __dquot_initialize(),
that will introduce global lock contention otherwise.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9cd5f63..a00201d 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1400,7 +1400,7 @@ static int dquot_active(const struct inode *inode)
*/
static void __dquot_initialize(struct inode *inode, int type)
{
- int cnt;
+ int cnt, init_needed = 0;
struct dquot *got[MAXQUOTAS];
struct super_block *sb = inode->i_sb;
qsize_t rsv;
@@ -1416,6 +1416,13 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = NULL;
if (type != -1 && cnt != type)
continue;
+ /* The i_dquot should have been initialized in most cases;
+ * we check it without locking here to avoid unnecessary
+ * dqget()/dqput() calls. */
+ if (inode->i_dquot[cnt])
+ continue;
+ init_needed = 1;
+
switch (cnt) {
case USRQUOTA:
qid = make_kqid_uid(inode->i_uid);
@@ -1427,6 +1434,10 @@ static void __dquot_initialize(struct inode *inode, int type)
got[cnt] = dqget(sb, qid);
}

+ /* All required i_dquot have been initialized */
+ if (!init_needed)
+ return;
+
down_write(&sb_dqopt(sb)->dqptr_sem);
if (IS_NOQUOTA(inode))
goto out_err;
--
1.7.1



2014-06-04 04:21:30

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 3/5] quota: simplify remove_inode_dquot_ref()

Simplify remove_inode_dquot_ref() to make it more obvious that we now
keep one reference to each dquot on behalf of the inodes.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 51 +++++++++++++++++++--------------------------------
1 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index a00201d..51b5763 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -731,7 +731,6 @@ static struct shrinker dqcache_shrinker = {

/*
* Put reference to dquot
- * NOTE: If you change this function please check whether dqput_blocks() works right...
*/
void dqput(struct dquot *dquot)
{
@@ -961,46 +960,34 @@ static void add_dquot_ref(struct super_block *sb, int type)
}

/*
- * Return 0 if dqput() won't block.
- * (note that 1 doesn't necessarily mean blocking)
- */
-static inline int dqput_blocks(struct dquot *dquot)
-{
- if (atomic_read(&dquot->dq_count) <= 1)
- return 1;
- return 0;
-}
-
-/*
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
* We can't race with anybody because we hold dqptr_sem for writing...
*/
-static int remove_inode_dquot_ref(struct inode *inode, int type,
- struct list_head *tofree_head)
+static void remove_inode_dquot_ref(struct inode *inode, int type,
+ struct list_head *tofree_head)
{
struct dquot *dquot = inode->i_dquot[type];

inode->i_dquot[type] = NULL;
- if (dquot) {
- if (dqput_blocks(dquot)) {
-#ifdef CONFIG_QUOTA_DEBUG
- if (atomic_read(&dquot->dq_count) != 1)
- quota_error(inode->i_sb, "Adding dquot with "
- "dq_count %d to dispose list",
- atomic_read(&dquot->dq_count));
-#endif
- spin_lock(&dq_list_lock);
- /* As dquot must have currently users it can't be on
- * the free list... */
- list_add(&dquot->dq_free, tofree_head);
- spin_unlock(&dq_list_lock);
- return 1;
- }
- else
- dqput(dquot); /* We have guaranteed we won't block */
+ if (!dquot)
+ return;
+
+ if (list_empty(&dquot->dq_free)) {
+ /*
+ * The inode still has reference to dquot so it can't be in the
+ * free list
+ */
+ spin_lock(&dq_list_lock);
+ list_add(&dquot->dq_free, tofree_head);
+ spin_unlock(&dq_list_lock);
+ } else {
+ /*
+ * Dquot is already in a list to put so we won't drop the last
+ * reference here.
+ */
+ dqput(dquot);
}
- return 0;
}

/*
--
1.7.1



2014-06-04 04:22:13

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 4/5] quota: missing lock in dqcache_shrink_scan()

dqcache_shrink_scan() should hold dq_list_lock to protect the
scan of the free_dquots list.

Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 51b5763..a94d464 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -702,6 +702,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
struct dquot *dquot;
unsigned long freed = 0;

+ spin_lock(&dq_list_lock);
head = free_dquots.prev;
while (head != &free_dquots && sc->nr_to_scan) {
dquot = list_entry(head, struct dquot, dq_free);
@@ -713,6 +714,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
freed++;
head = free_dquots.prev;
}
+ spin_unlock(&dq_list_lock);
return freed;
}

--
1.7.1



2014-06-04 04:23:22

by Niu Yawei

[permalink] [raw]
Subject: [PATCH 5/5] quota: remove dqptr_sem

Remove dqptr_sem to make the quota code scalable: access to
inode->i_dquot is now protected by dquot_srcu, and changes to
inode->i_dquot are serialized by dq_data_lock.

Signed-off-by: Lai Siyao <[email protected]>
Signed-off-by: Niu Yawei <[email protected]>
---
fs/quota/dquot.c | 114 +++++++++++++++++++++----------------------------
fs/super.c | 1 -
include/linux/quota.h | 1 -
3 files changed, 49 insertions(+), 67 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index a94d464..5d2e02c 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -96,13 +96,16 @@
* Note that some things (eg. sb pointer, type, id) doesn't change during
* the life of the dquot structure and so needn't to be protected by a lock
*
- * Any operation working on dquots via inode pointers must hold dqptr_sem. If
- * operation is just reading pointers from inode (or not using them at all) the
- * read lock is enough. If pointers are altered function must hold write lock.
+ * Operations accessing dquots via inode pointers are protected by dquot_srcu.
+ * Any operation reading the pointers needs srcu_read_lock(&dquot_srcu), and
+ * synchronize_srcu(&dquot_srcu) is called after clearing pointers from
+ * inode and before dropping dquot references to avoid use of dquots after
+ * they are freed. dq_data_lock is used to serialize the pointer setting and
+ * clearing operations.
* Special care needs to be taken about S_NOQUOTA inode flag (marking that
* inode is a quota file). Functions adding pointers from inode to dquots have
- * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they
- * have to do all pointer modifications before dropping dqptr_sem. This makes
+ * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they
+ * have to do all pointer modifications before dropping dq_data_lock. This makes
* sure they cannot race with quotaon which first sets S_NOQUOTA flag and
* then drops all pointers to dquots from an inode.
*
@@ -116,21 +119,15 @@
* spinlock to internal buffers before writing.
*
* Lock ordering (including related VFS locks) is the following:
- * dqonoff_mutex > i_mutex > journal_lock > dqptr_sem > dquot->dq_lock >
- * dqio_mutex
+ * dqonoff_mutex > i_mutex > journal_lock > dquot->dq_lock > dqio_mutex
* dqonoff_mutex > i_mutex comes from dquot_quota_sync, dquot_enable, etc.
- * The lock ordering of dqptr_sem imposed by quota code is only dqonoff_sem >
- * dqptr_sem. But filesystem has to count with the fact that functions such as
- * dquot_alloc_space() acquire dqptr_sem and they usually have to be called
- * from inside a transaction to keep filesystem consistency after a crash. Also
- * filesystems usually want to do some IO on dquot from ->mark_dirty which is
- * called with dqptr_sem held.
*/

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_list_lock);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_state_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dq_data_lock);
EXPORT_SYMBOL(dq_data_lock);
+DEFINE_STATIC_SRCU(dquot_srcu);

void __quota_error(struct super_block *sb, const char *func,
const char *fmt, ...)
@@ -964,7 +961,6 @@ static void add_dquot_ref(struct super_block *sb, int type)
/*
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
- * We can't race with anybody because we hold dqptr_sem for writing...
*/
static void remove_inode_dquot_ref(struct inode *inode, int type,
struct list_head *tofree_head)
@@ -1024,13 +1020,15 @@ static void remove_dquot_ref(struct super_block *sb, int type,
* We have to scan also I_NEW inodes because they can already
* have quota pointer initialized. Luckily, we need to touch
* only quota pointers and these have separate locking
- * (dqptr_sem).
+ * (dq_data_lock).
*/
+ spin_lock(&dq_data_lock);
if (!IS_NOQUOTA(inode)) {
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
remove_inode_dquot_ref(inode, type, tofree_head);
}
+ spin_unlock(&dq_data_lock);
}
spin_unlock(&inode_sb_list_lock);
#ifdef CONFIG_QUOTA_DEBUG
@@ -1048,9 +1046,8 @@ static void drop_dquot_ref(struct super_block *sb, int type)
LIST_HEAD(tofree_head);

if (sb->dq_op) {
- down_write(&sb_dqopt(sb)->dqptr_sem);
remove_dquot_ref(sb, type, &tofree_head);
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ synchronize_srcu(&dquot_srcu);
put_dquot_list(&tofree_head);
}
}
@@ -1381,9 +1378,6 @@ static int dquot_active(const struct inode *inode)
/*
* Initialize quota pointers in inode
*
- * We do things in a bit complicated way but by that we avoid calling
- * dqget() and thus filesystem callbacks under dqptr_sem.
- *
* It is better to call this function outside of any transaction as it
* might need a lot of space in journal for dquot structure allocation.
*/
@@ -1394,8 +1388,6 @@ static void __dquot_initialize(struct inode *inode, int type)
struct super_block *sb = inode->i_sb;
qsize_t rsv;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

@@ -1427,7 +1419,7 @@ static void __dquot_initialize(struct inode *inode, int type)
if (!init_needed)
return;

- down_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode))
goto out_err;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1447,15 +1439,12 @@ static void __dquot_initialize(struct inode *inode, int type)
* did a write before quota was turned on
*/
rsv = inode_get_rsv_space(inode);
- if (unlikely(rsv)) {
- spin_lock(&dq_data_lock);
+ if (unlikely(rsv))
dquot_resv_space(inode->i_dquot[cnt], rsv);
- spin_unlock(&dq_data_lock);
- }
}
}
out_err:
- up_write(&sb_dqopt(sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
/* Drop unused references */
dqput_all(got);
}
@@ -1467,19 +1456,24 @@ void dquot_initialize(struct inode *inode)
EXPORT_SYMBOL(dquot_initialize);

/*
- * Release all quotas referenced by inode
+ * Release all quotas referenced by inode.
+ *
+ * This function is only called on inode free or when converting
+ * a file to a quota file; in both cases there are no other users
+ * of the i_dquot, so we need not call synchronize_srcu() after
+ * clearing i_dquot.
*/
static void __dquot_drop(struct inode *inode)
{
int cnt;
struct dquot *put[MAXQUOTAS];

- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
put[cnt] = inode->i_dquot[cnt];
inode->i_dquot[cnt] = NULL;
}
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
dqput_all(put);
}

@@ -1597,15 +1591,11 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
*/
int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
int reserve = flags & DQUOT_SPACE_RESERVE;

- /*
- * First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex
- */
if (!dquot_active(inode)) {
inode_incr_space(inode, number, reserve);
goto out;
@@ -1614,7 +1604,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1641,7 +1631,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
goto out_flush_warn;
mark_all_dquot_dirty(dquots);
out_flush_warn:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
out:
return ret;
@@ -1653,17 +1643,16 @@ EXPORT_SYMBOL(__dquot_alloc_space);
*/
int dquot_alloc_inode(const struct inode *inode)
{
- int cnt, ret = 0;
+ int cnt, ret = 0, index;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return 0;
for (cnt = 0; cnt < MAXQUOTAS; cnt++)
warn[cnt].w_type = QUOTA_NL_NOWARN;
- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (!dquots[cnt])
@@ -1683,7 +1672,7 @@ warn_put_all:
spin_unlock(&dq_data_lock);
if (ret == 0)
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
return ret;
}
@@ -1694,14 +1683,14 @@ EXPORT_SYMBOL(dquot_alloc_inode);
*/
int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_claim_rsv_space(inode, number);
return 0;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1713,7 +1702,7 @@ int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
inode_claim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return 0;
}
EXPORT_SYMBOL(dquot_claim_space_nodirty);
@@ -1723,14 +1712,14 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
*/
void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
{
- int cnt;
+ int cnt, index;

if (!dquot_active(inode)) {
inode_reclaim_rsv_space(inode, number);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
/* Claim reserved quotas to allocated quotas */
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
@@ -1742,7 +1731,7 @@ void dquot_reclaim_space_nodirty(struct inode *inode, qsize_t number)
inode_reclaim_rsv_space(inode, number);
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(inode->i_dquot);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
return;
}
EXPORT_SYMBOL(dquot_reclaim_space_nodirty);
@@ -1755,16 +1744,14 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot **dquots = inode->i_dquot;
- int reserve = flags & DQUOT_SPACE_RESERVE;
+ int reserve = flags & DQUOT_SPACE_RESERVE, index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode)) {
inode_decr_space(inode, number, reserve);
return;
}

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1787,7 +1774,7 @@ void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
goto out_unlock;
mark_all_dquot_dirty(dquots);
out_unlock:
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(__dquot_free_space);
@@ -1800,13 +1787,12 @@ void dquot_free_inode(const struct inode *inode)
unsigned int cnt;
struct dquot_warn warn[MAXQUOTAS];
struct dquot * const *dquots = inode->i_dquot;
+ int index;

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (!dquot_active(inode))
return;

- down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ index = srcu_read_lock(&dquot_srcu);
spin_lock(&dq_data_lock);
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
int wtype;
@@ -1821,7 +1807,7 @@ void dquot_free_inode(const struct inode *inode)
}
spin_unlock(&dq_data_lock);
mark_all_dquot_dirty(dquots);
- up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ srcu_read_unlock(&dquot_srcu, index);
flush_warnings(warn);
}
EXPORT_SYMBOL(dquot_free_inode);
@@ -1835,6 +1821,8 @@ EXPORT_SYMBOL(dquot_free_inode);
* This operation can block, but only after everything is updated
* A transaction must be started when entering this function.
*
+ * We are holding references on transfer_from & transfer_to, so there is
+ * no need to protect them by srcu_read_lock().
*/
int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
{
@@ -1847,8 +1835,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
struct dquot_warn warn_from_inodes[MAXQUOTAS];
struct dquot_warn warn_from_space[MAXQUOTAS];

- /* First test before acquiring mutex - solves deadlocks when we
- * re-enter the quota code and are already holding the mutex */
if (IS_NOQUOTA(inode))
return 0;
/* Initialize the arrays */
@@ -1857,12 +1843,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN;
warn_from_space[cnt].w_type = QUOTA_NL_NOWARN;
}
- down_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+
+ spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode)) { /* File without quota accounting? */
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
+ spin_unlock(&dq_data_lock);
return 0;
}
- spin_lock(&dq_data_lock);
cur_space = inode_get_bytes(inode);
rsv_space = inode_get_rsv_space(inode);
space = cur_space + rsv_space;
@@ -1916,7 +1902,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
inode->i_dquot[cnt] = transfer_to[cnt];
}
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);

mark_all_dquot_dirty(transfer_from);
mark_all_dquot_dirty(transfer_to);
@@ -1930,7 +1915,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to)
return 0;
over_quota:
spin_unlock(&dq_data_lock);
- up_write(&sb_dqopt(inode->i_sb)->dqptr_sem);
flush_warnings(warn_to);
return ret;
}
diff --git a/fs/super.c b/fs/super.c
index 48377f7..a97aecf 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -214,7 +214,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
mutex_init(&s->s_dquot.dqio_mutex);
mutex_init(&s->s_dquot.dqonoff_mutex);
- init_rwsem(&s->s_dquot.dqptr_sem);
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
diff --git a/include/linux/quota.h b/include/linux/quota.h
index cc7494a..a1358ce 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -389,7 +389,6 @@ struct quota_info {
unsigned int flags; /* Flags for diskquotas on this device */
struct mutex dqio_mutex; /* lock device while I/O in progress */
struct mutex dqonoff_mutex; /* Serialize quotaon & quotaoff */
- struct rw_semaphore dqptr_sem; /* serialize ops using quota_info struct, pointers from inode to dquots */
struct inode *files[MAXQUOTAS]; /* inodes of quotafiles */
struct mem_dqinfo info[MAXQUOTAS]; /* Information for each quota type */
const struct quota_format_ops *ops[MAXQUOTAS]; /* Operations for each type */
--
1.7.1



2014-06-04 15:36:12

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex

On Wed 04-06-14 12:19:12, Niu Yawei wrote:
> Subject: [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex
>
> dqptr_sem will go away. Protect Q_GETFMT quotactl by
> dqonoff_mutex instead. This is also enough to make sure
> quota info will not go away while we are looking at it.
>
> Signed-off-by: Lai Siyao <[email protected]>
> Signed-off-by: Niu Yawei <[email protected]>
Thanks! I've merged all five patches to my tree. I will give them
some testing. I won't push them in this merge window, but they should
go in in the next one.

Honza
> ---
> fs/quota/quota.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 2b363e2..e4851cb 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -79,13 +79,13 @@ static int quota_getfmt(struct super_block *sb, int type, void __user *addr)
> {
> __u32 fmt;
>
> - down_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> if (!sb_has_quota_active(sb, type)) {
> - up_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> return -ESRCH;
> }
> fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id;
> - up_read(&sb_dqopt(sb)->dqptr_sem);
> + mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> if (copy_to_user(addr, &fmt, sizeof(fmt)))
> return -EFAULT;
> return 0;
> --
> 1.7.1
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR