2023-06-28 13:31:01

by Baokun Li

Subject: [PATCH v2 0/7] quota: fix race condition between dqput() and dquot_mark_dquot_dirty()

Hello Honza,

This series uses dquot_srcu to avoid the race condition between
dqput() and dquot_mark_dquot_dirty(). I ran a 24+ hour fault-injection
stress test (6 VMs, 4 test threads per VM) and found no problems.
I also tested performance on the latest mainline (6aeadf7896bf): the
patch set causes no performance degradation, and even shows a slight
improvement.

V1->V2:
Modify the solution to use dquot_srcu.

Baokun Li (7):
quota: factor out dquot_write_dquot()
quota: add new global dquot list releasing_dquots
quota: rename dquot_active() to inode_dquot_active()
quota: add new helper dquot_active()
quota: fix dqput() to follow the guarantees dquot_srcu should provide
quota: simplify drop_dquot_ref()
quota: remove unused function put_dquot_list()

fs/quota/dquot.c | 237 +++++++++++++++++++++++++++--------------------
1 file changed, 134 insertions(+), 103 deletions(-)

--
2.31.1



2023-06-28 13:31:06

by Baokun Li

Subject: [PATCH v2 6/7] quota: simplify drop_dquot_ref()

Now that dqput() on the last reference count calls
synchronize_srcu(&dquot_srcu) in quota_release_workfn() to ensure that
no other user can still be using the dquot after the last reference is
dropped, there is no need to call synchronize_srcu(&dquot_srcu) in
drop_dquot_ref() as well. Remove the corresponding logic to simplify
the code.

Signed-off-by: Baokun Li <[email protected]>
---
fs/quota/dquot.c | 33 ++++++---------------------------
1 file changed, 6 insertions(+), 27 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index e8e702ac64e5..df028fb2ce72 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1090,8 +1090,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
* Remove references to dquots from inode and add dquot to list for freeing
* if we have the last reference to dquot
*/
-static void remove_inode_dquot_ref(struct inode *inode, int type,
- struct list_head *tofree_head)
+static void remove_inode_dquot_ref(struct inode *inode, int type)
{
struct dquot **dquots = i_dquot(inode);
struct dquot *dquot = dquots[type];
@@ -1100,21 +1099,7 @@ static void remove_inode_dquot_ref(struct inode *inode, int type,
return;

dquots[type] = NULL;
- if (list_empty(&dquot->dq_free)) {
- /*
- * The inode still has reference to dquot so it can't be in the
- * free list
- */
- spin_lock(&dq_list_lock);
- list_add(&dquot->dq_free, tofree_head);
- spin_unlock(&dq_list_lock);
- } else {
- /*
- * Dquot is already in a list to put so we won't drop the last
- * reference here.
- */
- dqput(dquot);
- }
+ dqput(dquot);
}

/*
@@ -1137,8 +1122,7 @@ static void put_dquot_list(struct list_head *tofree_head)
}
}

-static void remove_dquot_ref(struct super_block *sb, int type,
- struct list_head *tofree_head)
+static void remove_dquot_ref(struct super_block *sb, int type)
{
struct inode *inode;
#ifdef CONFIG_QUOTA_DEBUG
@@ -1159,7 +1143,7 @@ static void remove_dquot_ref(struct super_block *sb, int type,
if (unlikely(inode_get_rsv_space(inode) > 0))
reserved = 1;
#endif
- remove_inode_dquot_ref(inode, type, tofree_head);
+ remove_inode_dquot_ref(inode, type);
}
spin_unlock(&dq_data_lock);
}
@@ -1176,13 +1160,8 @@ static void remove_dquot_ref(struct super_block *sb, int type,
/* Gather all references from inodes and drop them */
static void drop_dquot_ref(struct super_block *sb, int type)
{
- LIST_HEAD(tofree_head);
-
- if (sb->dq_op) {
- remove_dquot_ref(sb, type, &tofree_head);
- synchronize_srcu(&dquot_srcu);
- put_dquot_list(&tofree_head);
- }
+ if (sb->dq_op)
+ remove_dquot_ref(sb, type);
}

static inline
--
2.31.1


2023-06-28 13:31:12

by Baokun Li

Subject: [PATCH v2 5/7] quota: fix dqput() to follow the guarantees dquot_srcu should provide

dquot_mark_dquot_dirty() uses dquot references from the inode and
should be protected by dquot_srcu. The quota_off code takes care to
call synchronize_srcu(&dquot_srcu) so that dquot references are not
dropped while they are still in use by other users. But
dquot_transfer() breaks this assumption: it may drop the last
reference of a dquot and add it to free_dquots while other users are
still using the dquot, as shown in the function graph below:

        cpu1                cpu2
_________________|_________________
wb_do_writeback         CHOWN(1)
 ...
ext4_da_update_reserve_space
 dquot_claim_block
  ...
   dquot_mark_dquot_dirty // try to dirty old quota
    test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
    if (test_bit(DQ_MOD_B, &dquot->dq_flags))
    // test no dirty, wait dq_list_lock
                    ...
                    dquot_transfer
                     __dquot_transfer
                      dqput_all(transfer_from) // rls old dquot
                       dqput // last dqput
                        dquot_release
                         clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
                        atomic_dec(&dquot->dq_count)
                        put_dquot_last(dquot)
                         list_add_tail(&dquot->dq_free, &free_dquots)
                         // add the dquot to free_dquots
    if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
     add dqi_dirty_list // add released dquot to dirty_list

This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.

As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
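
As a rough userspace sketch of that batching idea (the names below are
illustrative stand-ins, not the kernel implementation): a dqput() that
merely parks its dquot and queues work lets one worker pass amortize
the grace period over the whole batch.

```c
#include <assert.h>

/* Illustrative userspace sketch (not kernel code): dqput() only parks
 * the dquot on a releasing list and queues work, so a single worker
 * pass pays the cost of synchronize_srcu() once per batch instead of
 * once per dqput() call. */
#define MAX_BATCH 16

static int releasing[MAX_BATCH]; /* stand-in for releasing_dquots */
static int nreleasing;
static int grace_periods;        /* synchronize_srcu() invocations */
static int freed;                /* dquots moved on to free_dquots */

static void dqput_sketch(int dq_id)
{
	/* cheap: no synchronize_srcu() here, just park and queue work */
	releasing[nreleasing++] = dq_id;
}

static void quota_release_work_sketch(void)
{
	grace_periods++;      /* one grace period for the whole batch */
	freed += nreleasing;  /* final cleanup of every parked dquot */
	nreleasing = 0;
}
```

With this split, several dqput() calls followed by one worker run cost
a single grace period rather than one per call.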

Fixes: 4580b30ea887 ("quota: Do not dirty bad dquots")
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
---
fs/quota/dquot.c | 85 ++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 71 insertions(+), 14 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 09787e4f6a00..e8e702ac64e5 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -270,6 +270,9 @@ static qsize_t inode_get_rsv_space(struct inode *inode);
static qsize_t __inode_get_rsv_space(struct inode *inode);
static int __dquot_initialize(struct inode *inode, int type);

+static void quota_release_workfn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(quota_release_work, quota_release_workfn);
+
static inline unsigned int
hashfn(const struct super_block *sb, struct kqid qid)
{
@@ -569,6 +572,8 @@ static void invalidate_dquots(struct super_block *sb, int type)
struct dquot *dquot, *tmp;

restart:
+ flush_delayed_work(&quota_release_work);
+
spin_lock(&dq_list_lock);
list_for_each_entry_safe(dquot, tmp, &inuse_list, dq_inuse) {
if (dquot->dq_sb != sb)
@@ -577,6 +582,12 @@ static void invalidate_dquots(struct super_block *sb, int type)
continue;
/* Wait for dquot users */
if (atomic_read(&dquot->dq_count)) {
+ /* dquot in releasing_dquots, flush and retry */
+ if (!list_empty(&dquot->dq_free)) {
+ spin_unlock(&dq_list_lock);
+ goto restart;
+ }
+
atomic_inc(&dquot->dq_count);
spin_unlock(&dq_list_lock);
/*
@@ -760,6 +771,8 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
struct dquot *dquot;
unsigned long freed = 0;

+ flush_delayed_work(&quota_release_work);
+
spin_lock(&dq_list_lock);
while (!list_empty(&free_dquots) && sc->nr_to_scan) {
dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
@@ -787,6 +800,60 @@ static struct shrinker dqcache_shrinker = {
.seeks = DEFAULT_SEEKS,
};

+/*
+ * Safely release dquot and put reference to dquot.
+ */
+static void quota_release_workfn(struct work_struct *work)
+{
+ struct dquot *dquot;
+ struct list_head rls_head;
+
+ spin_lock(&dq_list_lock);
+ /* Exchange the list head to avoid livelock. */
+ list_replace_init(&releasing_dquots, &rls_head);
+ spin_unlock(&dq_list_lock);
+
+restart:
+ synchronize_srcu(&dquot_srcu);
+ spin_lock(&dq_list_lock);
+ while (!list_empty(&rls_head)) {
+ dquot = list_first_entry(&rls_head, struct dquot, dq_free);
+ if (dquot_dirty(dquot)) {
+ spin_unlock(&dq_list_lock);
+ /* Commit dquot before releasing */
+ dquot_write_dquot(dquot);
+ goto restart;
+ }
+ /* Always clear DQ_ACTIVE_B, unless racing with dqget() */
+ if (dquot_active(dquot)) {
+ spin_unlock(&dq_list_lock);
+ dquot->dq_sb->dq_op->release_dquot(dquot);
+ spin_lock(&dq_list_lock);
+ }
+ /*
+ * During the execution of dquot_release() outside the
+ * dq_list_lock, another process may have completed
+ * dqget()/dqput()/mark_dirty().
+ */
+ if (atomic_read(&dquot->dq_count) == 1 &&
+ (dquot_active(dquot) || dquot_dirty(dquot))) {
+ spin_unlock(&dq_list_lock);
+ goto restart;
+ }
+ /*
+ * Now it is safe to remove this dquot from releasing_dquots
+ * and reduce its reference count.
+ */
+ remove_free_dquot(dquot);
+ atomic_dec(&dquot->dq_count);
+
+ /* We may be racing with some other dqget(). */
+ if (!atomic_read(&dquot->dq_count))
+ put_dquot_last(dquot);
+ }
+ spin_unlock(&dq_list_lock);
+}
+
/*
* Put reference to dquot
*/
@@ -803,7 +870,7 @@ void dqput(struct dquot *dquot)
}
#endif
dqstats_inc(DQST_DROPS);
-we_slept:
+
spin_lock(&dq_list_lock);
if (atomic_read(&dquot->dq_count) > 1) {
/* We have more than one user... nothing to do */
@@ -815,25 +882,15 @@ void dqput(struct dquot *dquot)
spin_unlock(&dq_list_lock);
return;
}
+
/* Need to release dquot? */
- if (dquot_dirty(dquot)) {
- spin_unlock(&dq_list_lock);
- /* Commit dquot before releasing */
- dquot_write_dquot(dquot);
- goto we_slept;
- }
- if (dquot_active(dquot)) {
- spin_unlock(&dq_list_lock);
- dquot->dq_sb->dq_op->release_dquot(dquot);
- goto we_slept;
- }
- atomic_dec(&dquot->dq_count);
#ifdef CONFIG_QUOTA_DEBUG
/* sanity check */
BUG_ON(!list_empty(&dquot->dq_free));
#endif
- put_dquot_last(dquot);
+ put_releasing_dquots(dquot);
spin_unlock(&dq_list_lock);
+ queue_delayed_work(system_unbound_wq, &quota_release_work, 1);
}
EXPORT_SYMBOL(dqput);

--
2.31.1


2023-06-28 13:31:13

by Baokun Li

Subject: [PATCH v2 2/7] quota: add new global dquot list releasing_dquots

Add a new global dquot list that obeys the following rules:

1). A dquot is added to this list when its last reference count is about
to be dropped.
2). The reference count of the dquot in the list is greater than or equal
to 1 (due to a possible race with dqget()).
3). When a dquot is removed from this list, a reference count is always
subtracted, and if the reference count is then 0, the dquot is added
to the free_dquots list.

This list is used to safely perform the final cleanup before releasing
the last reference count: it avoids various contention issues caused by
performing cleanup directly in dqput(), and avoids the performance impact
of calling synchronize_srcu(&dquot_srcu) directly in dqput(). This patch
only defines the list and implements the corresponding helper functions;
later patches will use them.
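
The rules above can be viewed as a small state machine. A hypothetical
single-threaded C model (names mirror the patch, but none of this is
the actual kernel code) that follows rules 1)-3), including the
dqget() race mentioned in rule 2):

```c
#include <assert.h>

/* Hypothetical single-threaded model of the rules above; the names
 * mirror the patch but this is not the kernel implementation. */
enum dq_list { ON_NONE, ON_RELEASING, ON_FREE };

struct dq {
	int refs;           /* models atomic dq_count */
	enum dq_list where; /* which list dq_free currently links onto */
};

/* Rule 1: the last dqput() parks the dquot on releasing_dquots,
 * keeping refs at 1 (rule 2) so racing users remain safe. */
static void dqput_model(struct dq *d)
{
	if (d->refs > 1) {
		d->refs--;
		return;
	}
	d->where = ON_RELEASING;
}

/* dqget() racing with release takes a fresh reference. */
static void dqget_model(struct dq *d)
{
	d->refs++;
	if (d->where == ON_FREE)
		d->where = ON_NONE;
}

/* Rule 3: removal from releasing_dquots always drops one reference;
 * only a dquot that is then unreferenced moves to free_dquots. */
static void release_work_model(struct dq *d)
{
	if (d->where != ON_RELEASING)
		return;
	/* the synchronize_srcu(&dquot_srcu) step is elided here */
	d->refs--;
	d->where = (d->refs == 0) ? ON_FREE : ON_NONE;
}
```

In this model a dqget() that arrives while the dquot sits on the
releasing list keeps it off free_dquots, which is exactly what rule 3)
requires.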

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
---
fs/quota/dquot.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 108ba9f1e420..a8b43b5b5623 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -226,12 +226,21 @@ static void put_quota_format(struct quota_format_type *fmt)
/*
* Dquot List Management:
* The quota code uses four lists for dquot management: the inuse_list,
- * free_dquots, dqi_dirty_list, and dquot_hash[] array. A single dquot
- * structure may be on some of those lists, depending on its current state.
+ * releasing_dquots, free_dquots, dqi_dirty_list, and dquot_hash[] array.
+ * A single dquot structure may be on some of those lists, depending on
+ * its current state.
*
* All dquots are placed to the end of inuse_list when first created, and this
* list is used for invalidate operation, which must look at every dquot.
*
+ * When the last reference of a dquot will be dropped, the dquot will be
+ * added to releasing_dquots. We'd then queue work item which would call
+ * synchronize_srcu() and after that perform the final cleanup of all the
+ * dquots on the list. Both releasing_dquots and free_dquots use the
+ * dq_free list_head in the dquot struct. when a dquot is removed from
+ * releasing_dquots, a reference count is always subtracted, and if
+ * dq_count == 0 at that point, the dquot will be added to the free_dquots.
+ *
* Unused dquots (dq_count == 0) are added to the free_dquots list when freed,
* and this list is searched whenever we need an available dquot. Dquots are
* removed from the list as soon as they are used again, and
@@ -250,6 +259,7 @@ static void put_quota_format(struct quota_format_type *fmt)

static LIST_HEAD(inuse_list);
static LIST_HEAD(free_dquots);
+static LIST_HEAD(releasing_dquots);
static unsigned int dq_hash_bits, dq_hash_mask;
static struct hlist_head *dquot_hash;

@@ -305,6 +315,13 @@ static inline void put_dquot_last(struct dquot *dquot)
dqstats_inc(DQST_FREE_DQUOTS);
}

+static inline void put_releasing_dquots(struct dquot *dquot)
+{
+ list_add_tail(&dquot->dq_free, &releasing_dquots);
+ /* dquot will be moved to free_dquots during shrink. */
+ dqstats_inc(DQST_FREE_DQUOTS);
+}
+
static inline void remove_free_dquot(struct dquot *dquot)
{
if (list_empty(&dquot->dq_free))
--
2.31.1


2023-06-28 13:31:16

by Baokun Li

Subject: [PATCH v2 4/7] quota: add new helper dquot_active()

Add a new helper function, dquot_active(), to make the code more concise.

Signed-off-by: Baokun Li <[email protected]>
---
fs/quota/dquot.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index b21f5e888482..09787e4f6a00 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -353,6 +353,11 @@ static void wait_on_dquot(struct dquot *dquot)
mutex_unlock(&dquot->dq_lock);
}

+static inline int dquot_active(struct dquot *dquot)
+{
+ return test_bit(DQ_ACTIVE_B, &dquot->dq_flags);
+}
+
static inline int dquot_dirty(struct dquot *dquot)
{
return test_bit(DQ_MOD_B, &dquot->dq_flags);
@@ -368,14 +373,14 @@ int dquot_mark_dquot_dirty(struct dquot *dquot)
{
int ret = 1;

- if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
+ if (!dquot_active(dquot))
return 0;

if (sb_dqopt(dquot->dq_sb)->flags & DQUOT_NOLIST_DIRTY)
return test_and_set_bit(DQ_MOD_B, &dquot->dq_flags);

/* If quota is dirty already, we don't have to acquire dq_list_lock */
- if (test_bit(DQ_MOD_B, &dquot->dq_flags))
+ if (dquot_dirty(dquot))
return 1;

spin_lock(&dq_list_lock);
@@ -457,7 +462,7 @@ int dquot_acquire(struct dquot *dquot)
smp_mb__before_atomic();
set_bit(DQ_READ_B, &dquot->dq_flags);
/* Instantiate dquot if needed */
- if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags) && !dquot->dq_off) {
+ if (!dquot_active(dquot) && !dquot->dq_off) {
ret = dqopt->ops[dquot->dq_id.type]->commit_dqblk(dquot);
/* Write the info if needed */
if (info_dirty(&dqopt->info[dquot->dq_id.type])) {
@@ -499,7 +504,7 @@ int dquot_commit(struct dquot *dquot)
goto out_lock;
/* Inactive dquot can be only if there was error during read/init
* => we have better not writing it */
- if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
+ if (dquot_active(dquot))
ret = dqopt->ops[dquot->dq_id.type]->commit_dqblk(dquot);
else
ret = -EIO;
@@ -614,7 +619,7 @@ int dquot_scan_active(struct super_block *sb,

spin_lock(&dq_list_lock);
list_for_each_entry(dquot, &inuse_list, dq_inuse) {
- if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
+ if (!dquot_active(dquot))
continue;
if (dquot->dq_sb != sb)
continue;
@@ -629,7 +634,7 @@ int dquot_scan_active(struct super_block *sb,
* outstanding call and recheck the DQ_ACTIVE_B after that.
*/
wait_on_dquot(dquot);
- if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
+ if (dquot_active(dquot)) {
ret = fn(dquot, priv);
if (ret < 0)
goto out;
@@ -680,7 +685,7 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
dquot = list_first_entry(&dirty, struct dquot,
dq_dirty);

- WARN_ON(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags));
+ WARN_ON(!dquot_active(dquot));

/* Now we have active dquot from which someone is
* holding reference so we can safely just increase
@@ -817,7 +822,7 @@ void dqput(struct dquot *dquot)
dquot_write_dquot(dquot);
goto we_slept;
}
- if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
+ if (dquot_active(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
goto we_slept;
@@ -918,7 +923,7 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid)
* already finished or it will be canceled due to dq_count > 1 test */
wait_on_dquot(dquot);
/* Read the dquot / allocate space in quota file */
- if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
+ if (!dquot_active(dquot)) {
int err;

err = sb->dq_op->acquire_dquot(dquot);
--
2.31.1


2023-06-28 13:31:23

by Baokun Li

Subject: [PATCH v2 7/7] quota: remove unused function put_dquot_list()

Now that no one is calling put_dquot_list(), remove it.

Signed-off-by: Baokun Li <[email protected]>
---
fs/quota/dquot.c | 20 --------------------
1 file changed, 20 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index df028fb2ce72..ba0125473be3 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1102,26 +1102,6 @@ static void remove_inode_dquot_ref(struct inode *inode, int type)
dqput(dquot);
}

-/*
- * Free list of dquots
- * Dquots are removed from inodes and no new references can be got so we are
- * the only ones holding reference
- */
-static void put_dquot_list(struct list_head *tofree_head)
-{
- struct list_head *act_head;
- struct dquot *dquot;
-
- act_head = tofree_head->next;
- while (act_head != tofree_head) {
- dquot = list_entry(act_head, struct dquot, dq_free);
- act_head = act_head->next;
- /* Remove dquot from the list so we won't have problems... */
- list_del_init(&dquot->dq_free);
- dqput(dquot);
- }
-}
-
static void remove_dquot_ref(struct super_block *sb, int type)
{
struct inode *inode;
--
2.31.1


2023-06-29 10:31:45

by Jan Kara

Subject: Re: [PATCH v2 2/7] quota: add new global dquot list releasing_dquots

On Wed 28-06-23 21:21:50, Baokun Li wrote:
> Add a new global dquot list that obeys the following rules:
>
> 1). A dquot is added to this list when its last reference count is about
> to be dropped.
> 2). The reference count of the dquot in the list is greater than or equal
> to 1 ( due to possible race with dqget()).
> 3). When a dquot is removed from this list, a reference count is always
> subtracted, and if the reference count is then 0, the dquot is added
> to the free_dquots list.
>
> This list is used to safely perform the final cleanup before releasing
> the last reference count, to avoid various contention issues caused by
> performing cleanup directly in dqput(), and to avoid the performance impact
> caused by calling synchronize_srcu(&dquot_srcu) directly in dqput(). Here
> it is just defining the list and implementing the corresponding operation
> function, which we will use later.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Baokun Li <[email protected]>

I think you can merge this patch with patch 5. It is not like separating
this bit helps in review or anything...

> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 108ba9f1e420..a8b43b5b5623 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -226,12 +226,21 @@ static void put_quota_format(struct quota_format_type *fmt)
> /*
> * Dquot List Management:
> * The quota code uses four lists for dquot management: the inuse_list,
^^^ five now :)

> - * free_dquots, dqi_dirty_list, and dquot_hash[] array. A single dquot
> - * structure may be on some of those lists, depending on its current state.
> + * releasing_dquots, free_dquots, dqi_dirty_list, and dquot_hash[] array.
> + * A single dquot structure may be on some of those lists, depending on
> + * its current state.
> *
> * All dquots are placed to the end of inuse_list when first created, and this
> * list is used for invalidate operation, which must look at every dquot.
> *
> + * When the last reference of a dquot will be dropped, the dquot will be
> + * added to releasing_dquots. We'd then queue work item which would call
> + * synchronize_srcu() and after that perform the final cleanup of all the
> + * dquots on the list. Both releasing_dquots and free_dquots use the
> + * dq_free list_head in the dquot struct. when a dquot is removed from
^^^ Capital W please

> + * releasing_dquots, a reference count is always subtracted, and if
> + * dq_count == 0 at that point, the dquot will be added to the free_dquots.
> + *

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-29 11:07:56

by Jan Kara

Subject: Re: [PATCH v2 5/7] quota: fix dqput() to follow the guarantees dquot_srcu should provide

On Wed 28-06-23 21:21:53, Baokun Li wrote:
> The dquot_mark_dquot_dirty() using dquot references from the inode
> should be protected by dquot_srcu. quota_off code takes care to call
> synchronize_srcu(&dquot_srcu) to not drop dquot references while they
> are used by other users. But dquot_transfer() breaks this assumption.
> We call dquot_transfer() to drop the last reference of dquot and add
> it to free_dquots, but there may still be other users using the dquot
> at this time, as shown in the function graph below:
>
>         cpu1                cpu2
> _________________|_________________
> wb_do_writeback         CHOWN(1)
>  ...
> ext4_da_update_reserve_space
>  dquot_claim_block
>   ...
>    dquot_mark_dquot_dirty // try to dirty old quota
>     test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
>     if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>     // test no dirty, wait dq_list_lock
>                     ...
>                     dquot_transfer
>                      __dquot_transfer
>                       dqput_all(transfer_from) // rls old dquot
>                        dqput // last dqput
>                         dquot_release
>                          clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
>                         atomic_dec(&dquot->dq_count)
>                         put_dquot_last(dquot)
>                          list_add_tail(&dquot->dq_free, &free_dquots)
>                          // add the dquot to free_dquots
>     if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
>      add dqi_dirty_list // add released dquot to dirty_list
>
> This can cause various issues, such as dquot being destroyed by
> dqcache_shrink_scan() after being added to free_dquots, which can trigger
> a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
> and then to dirty_list, it is added to free_dquots again after
> dquot_writeback_dquots() is executed, which causes the free_dquots list to
> be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
> freeing dquot twice.
>
> As Honza said, we need to fix dquot_transfer() to follow the guarantees
> dquot_srcu should provide. But calling synchronize_srcu() directly from
> dquot_transfer() is too expensive (and mostly unnecessary). So we add
> dquot whose last reference should be dropped to the new global dquot
> list releasing_dquots, and then queue work item which would call
> synchronize_srcu() and after that perform the final cleanup of all the
> dquots on releasing_dquots.
>
> Fixes: 4580b30ea887 ("quota: Do not dirty bad dquots")
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Baokun Li <[email protected]>
> ---
> fs/quota/dquot.c | 85 ++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 71 insertions(+), 14 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 09787e4f6a00..e8e702ac64e5 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -270,6 +270,9 @@ static qsize_t inode_get_rsv_space(struct inode *inode);
> static qsize_t __inode_get_rsv_space(struct inode *inode);
> static int __dquot_initialize(struct inode *inode, int type);
>
> +static void quota_release_workfn(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(quota_release_work, quota_release_workfn);
> +
> static inline unsigned int
> hashfn(const struct super_block *sb, struct kqid qid)
> {
> @@ -569,6 +572,8 @@ static void invalidate_dquots(struct super_block *sb, int type)
> struct dquot *dquot, *tmp;
>
> restart:
> + flush_delayed_work(&quota_release_work);
> +
> spin_lock(&dq_list_lock);
> list_for_each_entry_safe(dquot, tmp, &inuse_list, dq_inuse) {
> if (dquot->dq_sb != sb)
> @@ -577,6 +582,12 @@ static void invalidate_dquots(struct super_block *sb, int type)
> continue;
> /* Wait for dquot users */
> if (atomic_read(&dquot->dq_count)) {
> + /* dquot in releasing_dquots, flush and retry */
> + if (!list_empty(&dquot->dq_free)) {
> + spin_unlock(&dq_list_lock);
> + goto restart;
> + }
> +
> atomic_inc(&dquot->dq_count);
> spin_unlock(&dq_list_lock);
> /*
> @@ -760,6 +771,8 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> struct dquot *dquot;
> unsigned long freed = 0;
>
> + flush_delayed_work(&quota_release_work);
> +

I would not flush the work here. Sure, it can make more dquots available
for reclaim but I think it is more important for the shrinker to not wait
on srcu period as shrinker can be called very frequently under memory
pressure.

> spin_lock(&dq_list_lock);
> while (!list_empty(&free_dquots) && sc->nr_to_scan) {
> dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
> @@ -787,6 +800,60 @@ static struct shrinker dqcache_shrinker = {
> .seeks = DEFAULT_SEEKS,
> };
>
> +/*
> + * Safely release dquot and put reference to dquot.
> + */
> +static void quota_release_workfn(struct work_struct *work)
> +{
> + struct dquot *dquot;
> + struct list_head rls_head;
> +
> + spin_lock(&dq_list_lock);
> + /* Exchange the list head to avoid livelock. */
> + list_replace_init(&releasing_dquots, &rls_head);
> + spin_unlock(&dq_list_lock);
> +
> +restart:
> + synchronize_srcu(&dquot_srcu);
> + spin_lock(&dq_list_lock);
> + while (!list_empty(&rls_head)) {

I think the logic below needs a bit more work. Firstly, I think that
dqget() should remove dquots from the releasing_dquots list - basically just
replace the:

	if (!atomic_read(&dquot->dq_count))
		remove_free_dquot(dquot);

with

	/* Dquot on releasing_dquots list? Drop ref kept by that list. */
	if (atomic_read(&dquot->dq_count) == 1 && !list_empty(&dquot->dq_free))
		atomic_dec(&dquot->dq_count);
	remove_free_dquot(dquot);
	atomic_inc(&dquot->dq_count);

That way we are sure that while we are holding dq_list_lock, all dquots on
the rls_head list have dq_count == 1.

> + dquot = list_first_entry(&rls_head, struct dquot, dq_free);
> + if (dquot_dirty(dquot)) {
> + spin_unlock(&dq_list_lock);
> + /* Commit dquot before releasing */
> + dquot_write_dquot(dquot);
> + goto restart;
> + }
> + /* Always clear DQ_ACTIVE_B, unless racing with dqget() */
> + if (dquot_active(dquot)) {
> + spin_unlock(&dq_list_lock);
> + dquot->dq_sb->dq_op->release_dquot(dquot);

I'd just go to restart here to make the logic simple. Forward progress is
guaranteed anyway and it isn't really much less efficient.

> + spin_lock(&dq_list_lock);
> + }
> + /*
> + * During the execution of dquot_release() outside the
> + * dq_list_lock, another process may have completed
> + * dqget()/dqput()/mark_dirty().
> + */
> + if (atomic_read(&dquot->dq_count) == 1 &&
> + (dquot_active(dquot) || dquot_dirty(dquot))) {
> + spin_unlock(&dq_list_lock);
> + goto restart;
> + }

This can be dropped then...

> + /*
> + * Now it is safe to remove this dquot from releasing_dquots
> + * and reduce its reference count.
> + */
> + remove_free_dquot(dquot);
> + atomic_dec(&dquot->dq_count);
> +
> + /* We may be racing with some other dqget(). */
> + if (!atomic_read(&dquot->dq_count))

This condition can also be dropped then.

> + put_dquot_last(dquot);
> + }
> + spin_unlock(&dq_list_lock);
> +}
> +

The rest looks good.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-29 11:15:35

by Jan Kara

Subject: Re: [PATCH v2 7/7] quota: remove unused function put_dquot_list()

On Wed 28-06-23 21:21:55, Baokun Li wrote:
> Now that no one is calling put_dquot_list(), remove it.
>
> Signed-off-by: Baokun Li <[email protected]>

I guess you can merge this with patch 6. When there was only a single user,
there's no good reason to separate the removal of the unused function...

Honza

> ---
> fs/quota/dquot.c | 20 --------------------
> 1 file changed, 20 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index df028fb2ce72..ba0125473be3 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1102,26 +1102,6 @@ static void remove_inode_dquot_ref(struct inode *inode, int type)
> dqput(dquot);
> }
>
> -/*
> - * Free list of dquots
> - * Dquots are removed from inodes and no new references can be got so we are
> - * the only ones holding reference
> - */
> -static void put_dquot_list(struct list_head *tofree_head)
> -{
> - struct list_head *act_head;
> - struct dquot *dquot;
> -
> - act_head = tofree_head->next;
> - while (act_head != tofree_head) {
> - dquot = list_entry(act_head, struct dquot, dq_free);
> - act_head = act_head->next;
> - /* Remove dquot from the list so we won't have problems... */
> - list_del_init(&dquot->dq_free);
> - dqput(dquot);
> - }
> -}
> -
> static void remove_dquot_ref(struct super_block *sb, int type)
> {
> struct inode *inode;
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-29 11:20:32

by Jan Kara

Subject: Re: [PATCH v2 6/7] quota: simplify drop_dquot_ref()

On Wed 28-06-23 21:21:54, Baokun Li wrote:
> Now when dqput() drops the last reference count, it will call
> synchronize_srcu(&dquot_srcu) in quota_release_workfn() to ensure that
> no other user will use the dquot after the last reference count is dropped,
> so we don't need to call synchronize_srcu(&dquot_srcu) in drop_dquot_ref()
> and remove the corresponding logic directly to simplify the code.

Nice simplification! It is also important that dqput() now cannot sleep
which was another reason for the logic with tofree_head in
remove_inode_dquot_ref(). Probably this is good to mention in the
changelog.

> Signed-off-by: Baokun Li <[email protected]>
> ---
> fs/quota/dquot.c | 33 ++++++---------------------------
> 1 file changed, 6 insertions(+), 27 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index e8e702ac64e5..df028fb2ce72 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1090,8 +1090,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
> * Remove references to dquots from inode and add dquot to list for freeing
> * if we have the last reference to dquot
> */
> -static void remove_inode_dquot_ref(struct inode *inode, int type,
> - struct list_head *tofree_head)
> +static void remove_inode_dquot_ref(struct inode *inode, int type)
> {
> struct dquot **dquots = i_dquot(inode);
> struct dquot *dquot = dquots[type];
> @@ -1100,21 +1099,7 @@ static void remove_inode_dquot_ref(struct inode *inode, int type,
> return;
>
> dquots[type] = NULL;
> - if (list_empty(&dquot->dq_free)) {
> - /*
> - * The inode still has reference to dquot so it can't be in the
> - * free list
> - */
> - spin_lock(&dq_list_lock);
> - list_add(&dquot->dq_free, tofree_head);
> - spin_unlock(&dq_list_lock);
> - } else {
> - /*
> - * Dquot is already in a list to put so we won't drop the last
> - * reference here.
> - */
> - dqput(dquot);
> - }
> + dqput(dquot);
> }

I think you can also just drop remove_inode_dquot_ref() as it is trivial
now and inline it at its only callsite...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-29 11:23:33

by Baokun Li

Subject: Re: [PATCH v2 2/7] quota: add new global dquot list releasing_dquots

On 2023/6/29 18:29, Jan Kara wrote:
> On Wed 28-06-23 21:21:50, Baokun Li wrote:
>> Add a new global dquot list that obeys the following rules:
>>
>> 1). A dquot is added to this list when its last reference count is about
>> to be dropped.
>> 2). The reference count of the dquot in the list is greater than or equal
>> to 1 ( due to possible race with dqget()).
>> 3). When a dquot is removed from this list, a reference count is always
>> subtracted, and if the reference count is then 0, the dquot is added
>> to the free_dquots list.
>>
>> This list is used to safely perform the final cleanup before releasing
>> the last reference count, to avoid various contention issues caused by
>> performing cleanup directly in dqput(), and to avoid the performance impact
>> caused by calling synchronize_srcu(&dquot_srcu) directly in dqput(). Here
>> it is just defining the list and implementing the corresponding operation
>> function, which we will use later.
>>
>> Suggested-by: Jan Kara <[email protected]>
>> Signed-off-by: Baokun Li <[email protected]>
> I think you can merge this patch with patch 5. It is not like separating
> this bit helps in review or anything...
OK, I just didn't want to cram too much into patch 5. I will merge this
patch into patch 5 in the next version.
>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>> index 108ba9f1e420..a8b43b5b5623 100644
>> --- a/fs/quota/dquot.c
>> +++ b/fs/quota/dquot.c
>> @@ -226,12 +226,21 @@ static void put_quota_format(struct quota_format_type *fmt)
>> /*
>> * Dquot List Management:
>> * The quota code uses four lists for dquot management: the inuse_list,
> ^^^ five now :)
Yes, indeed, I forgot to correct this here.
>
>> - * free_dquots, dqi_dirty_list, and dquot_hash[] array. A single dquot
>> - * structure may be on some of those lists, depending on its current state.
>> + * releasing_dquots, free_dquots, dqi_dirty_list, and dquot_hash[] array.
>> + * A single dquot structure may be on some of those lists, depending on
>> + * its current state.
>> *
>> * All dquots are placed to the end of inuse_list when first created, and this
>> * list is used for invalidate operation, which must look at every dquot.
>> *
>> + * When the last reference of a dquot will be dropped, the dquot will be
>> + * added to releasing_dquots. We'd then queue work item which would call
>> + * synchronize_srcu() and after that perform the final cleanup of all the
>> + * dquots on the list. Both releasing_dquots and free_dquots use the
>> + * dq_free list_head in the dquot struct. when a dquot is removed from
> ^^^ Capital W please
Good catch! Very sorry for the oversight here.
>> + * releasing_dquots, a reference count is always subtracted, and if
>> + * dq_count == 0 at that point, the dquot will be added to the free_dquots.
>> + *
> Honza
Thank you very much for your careful review! I will fix those in the
next version!

--
With Best Regards,
Baokun Li

2023-06-29 12:07:37

by Baokun Li

[permalink] [raw]
Subject: Re: [PATCH v2 5/7] quota: fix dqput() to follow the guarantees dquot_srcu should provide

On 2023/6/29 18:59, Jan Kara wrote:
> On Wed 28-06-23 21:21:53, Baokun Li wrote:
>> @@ -760,6 +771,8 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>> struct dquot *dquot;
>> unsigned long freed = 0;
>>
>> + flush_delayed_work(&quota_release_work);
>> +
> I would not flush the work here. Sure, it can make more dquots available
> for reclaim but I think it is more important for the shrinker to not wait
> on srcu period as shrinker can be called very frequently under memory
> pressure.
This is because I want to use remove_free_dquot() directly, and if I no
longer flush here, then DQST_FREE_DQUOTS will not be accurate.
Since that's the case, I'll remove the flush here and make
remove_free_dquot() decide whether to increase DQST_FREE_DQUOTS.
>> spin_lock(&dq_list_lock);
>> while (!list_empty(&free_dquots) && sc->nr_to_scan) {
>> dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
>> @@ -787,6 +800,60 @@ static struct shrinker dqcache_shrinker = {
>> .seeks = DEFAULT_SEEKS,
>> };
>>
>> +/*
>> + * Safely release dquot and put reference to dquot.
>> + */
>> +static void quota_release_workfn(struct work_struct *work)
>> +{
>> + struct dquot *dquot;
>> + struct list_head rls_head;
>> +
>> + spin_lock(&dq_list_lock);
>> + /* Exchange the list head to avoid livelock. */
>> + list_replace_init(&releasing_dquots, &rls_head);
>> + spin_unlock(&dq_list_lock);
>> +
>> +restart:
>> + synchronize_srcu(&dquot_srcu);
>> + spin_lock(&dq_list_lock);
>> + while (!list_empty(&rls_head)) {
> I think the logic below needs a bit more work. Firstly, I think that
> dqget() should removing dquots from releasing_dquots list - basically just
> replace the:
> if (!atomic_read(&dquot->dq_count))
> remove_free_dquot(dquot);
> with
> /* Dquot on releasing_dquots list? Drop ref kept by that list. */
> if (atomic_read(&dquot->dq_count) == 1 && !list_empty(&dquot->dq_free))
> atomic_dec(&dquot->dq_count);
> remove_free_dquot(dquot);
> atomic_inc(&dquot->dq_count);
>
> That way we are sure that while we are holding dq_list_lock, all dquots on
> rls_head list have dq_count == 1.
I wrote it this way at first, but that would have been problematic, so I
ended up dropping the dq_count == 1 constraint for dquots on
releasing_dquots. With the following sequence, we would get a bad dquot
directly:

quota_release_workfn
 spin_lock(&dq_list_lock)
 dquot = list_first_entry(&rls_head, struct dquot, dq_free)
 spin_unlock(&dq_list_lock)
 dquot->dq_sb->dq_op->release_dquot(dquot)
 release_dquot
       dqget
        atomic_dec(&dquot->dq_count)
        remove_free_dquot(dquot)
        atomic_inc(&dquot->dq_count)
        spin_unlock(&dq_list_lock)
        wait_on_dquot(dquot)
        if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
        // still active
 mutex_lock(&dquot->dq_lock)
 dquot_is_busy(dquot)
  atomic_read(&dquot->dq_count) > 1
 clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
 mutex_unlock(&dquot->dq_lock)

Removing dquot from releasing_dquots and its reduced reference count
will cause dquot_is_busy() in dquot_release to fail. wait_on_dquot(dquot)
in dqget would have no effect. This is also the reason why I did not restart
at dquot_active. Adding dquot to releasing_dquots only in dqput() and
removing dquot from releasing_dquots only in quota_release_workfn() is
a simple and effective way to ensure consistency.


>> + dquot = list_first_entry(&rls_head, struct dquot, dq_free);
>> + if (dquot_dirty(dquot)) {
>> + spin_unlock(&dq_list_lock);
>> + /* Commit dquot before releasing */
>> + dquot_write_dquot(dquot);
>> + goto restart;
>> + }
>> + /* Always clear DQ_ACTIVE_B, unless racing with dqget() */
>> + if (dquot_active(dquot)) {
>> + spin_unlock(&dq_list_lock);
>> + dquot->dq_sb->dq_op->release_dquot(dquot);
> I'd just go to restart here to make the logic simple. Forward progress is
> guaranteed anyway and it isn't really much less efficient.
>
>
> The rest looks good.
>
> Honza
Thanks!
--
With Best Regards,
Baokun Li

2023-06-29 12:14:08

by Baokun Li

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] quota: simplify drop_dquot_ref()

On 2023/6/29 19:08, Jan Kara wrote:
> On Wed 28-06-23 21:21:54, Baokun Li wrote:
>> Now when dqput() drops the last reference count, it will call
>> synchronize_srcu(&dquot_srcu) in quota_release_workfn() to ensure that
>> no other user will use the dquot after the last reference count is dropped,
>> so we don't need to call synchronize_srcu(&dquot_srcu) in drop_dquot_ref()
>> and remove the corresponding logic directly to simplify the code.
> Nice simplification! It is also important that dqput() now cannot sleep
> which was another reason for the logic with tofree_head in
> remove_inode_dquot_ref().

I don't understand this sentence very well, so I would appreciate it
if you could explain it in detail.

> Probably this is good to mention in the
> changelog.
>
>> Signed-off-by: Baokun Li <[email protected]>
>> ---
>> fs/quota/dquot.c | 33 ++++++---------------------------
>> 1 file changed, 6 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>> index e8e702ac64e5..df028fb2ce72 100644
>> --- a/fs/quota/dquot.c
>> +++ b/fs/quota/dquot.c
>> @@ -1090,8 +1090,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
>> * Remove references to dquots from inode and add dquot to list for freeing
>> * if we have the last reference to dquot
>> */
>> -static void remove_inode_dquot_ref(struct inode *inode, int type,
>> - struct list_head *tofree_head)
>> +static void remove_inode_dquot_ref(struct inode *inode, int type)
>> {
>> struct dquot **dquots = i_dquot(inode);
>> struct dquot *dquot = dquots[type];
>> @@ -1100,21 +1099,7 @@ static void remove_inode_dquot_ref(struct inode *inode, int type,
>> return;
>>
>> dquots[type] = NULL;
>> - if (list_empty(&dquot->dq_free)) {
>> - /*
>> - * The inode still has reference to dquot so it can't be in the
>> - * free list
>> - */
>> - spin_lock(&dq_list_lock);
>> - list_add(&dquot->dq_free, tofree_head);
>> - spin_unlock(&dq_list_lock);
>> - } else {
>> - /*
>> - * Dquot is already in a list to put so we won't drop the last
>> - * reference here.
>> - */
>> - dqput(dquot);
>> - }
>> + dqput(dquot);
>> }
> I think you can also just drop remove_inode_dquot_ref() as it is trivial
> now and inline it at its only callsite...
>
> Honza
Sure, I'll just remove it in the next version and inline the only
remaining code into remove_dquot_ref().

Thanks!
--
With Best Regards,
Baokun Li

2023-06-29 12:20:41

by Baokun Li

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] quota: remove unused function put_dquot_list()

On 2023/6/29 19:05, Jan Kara wrote:
> On Wed 28-06-23 21:21:55, Baokun Li wrote:
>> Now that no one is calling put_dquot_list(), remove it.
>>
>> Signed-off-by: Baokun Li <[email protected]>
> I guess you can merge this with patch 6. When there was only a single user,
> there's no good reason to separate the removal of the unused function...
>
> Honza
I got a compiler warning when building after finishing patch 6, which is
how this patch came about. I will merge this patch into patch 6 in the
next version.
>> ---
>> fs/quota/dquot.c | 20 --------------------
>> 1 file changed, 20 deletions(-)
>>
>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>> index df028fb2ce72..ba0125473be3 100644
>> --- a/fs/quota/dquot.c
>> +++ b/fs/quota/dquot.c
>> @@ -1102,26 +1102,6 @@ static void remove_inode_dquot_ref(struct inode *inode, int type)
>> dqput(dquot);
>> }
>>
>> -/*
>> - * Free list of dquots
>> - * Dquots are removed from inodes and no new references can be got so we are
>> - * the only ones holding reference
>> - */
>> -static void put_dquot_list(struct list_head *tofree_head)
>> -{
>> - struct list_head *act_head;
>> - struct dquot *dquot;
>> -
>> - act_head = tofree_head->next;
>> - while (act_head != tofree_head) {
>> - dquot = list_entry(act_head, struct dquot, dq_free);
>> - act_head = act_head->next;
>> - /* Remove dquot from the list so we won't have problems... */
>> - list_del_init(&dquot->dq_free);
>> - dqput(dquot);
>> - }
>> -}
>> -
>> static void remove_dquot_ref(struct super_block *sb, int type)
>> {
>> struct inode *inode;
>> --
>> 2.31.1
>>
Thanks!
--
With Best Regards,
Baokun Li

2023-06-29 14:17:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] quota: simplify drop_dquot_ref()

On Thu 29-06-23 20:13:05, Baokun Li wrote:
> On 2023/6/29 19:08, Jan Kara wrote:
> > On Wed 28-06-23 21:21:54, Baokun Li wrote:
> > > Now when dqput() drops the last reference count, it will call
> > > synchronize_srcu(&dquot_srcu) in quota_release_workfn() to ensure that
> > > no other user will use the dquot after the last reference count is dropped,
> > > so we don't need to call synchronize_srcu(&dquot_srcu) in drop_dquot_ref()
> > > and remove the corresponding logic directly to simplify the code.
> > Nice simplification! It is also important that dqput() now cannot sleep
> > which was another reason for the logic with tofree_head in
> > remove_inode_dquot_ref().
>
> I don't understand this sentence very well, so I would appreciate it
>
> if you could explain it in detail.

OK, let me phrase it in a "changelog" way :):

remove_inode_dquot_ref() currently does not release the last dquot
reference but instead adds the dquot to tofree_head list. This is because
dqput() can sleep while dropping of the last dquot reference (writing back
the dquot and calling ->release_dquot()) and that must not happen under
dq_list_lock. Now that dqput() queues the final dquot cleanup into a
workqueue, remove_inode_dquot_ref() can call dqput() unconditionally
and we can significantly simplify it.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-29 14:29:52

by Baokun Li

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] quota: simplify drop_dquot_ref()

On 2023/6/29 22:09, Jan Kara wrote:
> On Thu 29-06-23 20:13:05, Baokun Li wrote:
>> On 2023/6/29 19:08, Jan Kara wrote:
>>> On Wed 28-06-23 21:21:54, Baokun Li wrote:
>>>> Now when dqput() drops the last reference count, it will call
>>>> synchronize_srcu(&dquot_srcu) in quota_release_workfn() to ensure that
>>>> no other user will use the dquot after the last reference count is dropped,
>>>> so we don't need to call synchronize_srcu(&dquot_srcu) in drop_dquot_ref()
>>>> and remove the corresponding logic directly to simplify the code.
>>> Nice simplification! It is also important that dqput() now cannot sleep
>>> which was another reason for the logic with tofree_head in
>>> remove_inode_dquot_ref().
>> I don't understand this sentence very well, so I would appreciate it
>>
>> if you could explain it in detail.
> OK, let me phrase it in a "changelog" way :):
>
> remove_inode_dquot_ref() currently does not release the last dquot
> reference but instead adds the dquot to tofree_head list. This is because
> dqput() can sleep while dropping of the last dquot reference (writing back
> the dquot and calling ->release_dquot()) and that must not happen under
> dq_list_lock. Now that dqput() queues the final dquot cleanup into a
> workqueue, remove_inode_dquot_ref() can call dqput() unconditionally
> and we can significantly simplify it.
>
> Honza
I suddenly understand what you mean: dqput() now has no operation that
can possibly sleep, so it can be called freely under a spinlock.

I was confused because I had understood it as saying that dqput() cannot
be called in a context that might sleep.

Thank you very much for the detailed explanation!
Now there is only the problem in patch 5.

Thanks!
--
With Best Regards,
Baokun Li

2023-06-29 14:38:20

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 5/7] quota: fix dqput() to follow the guarantees dquot_srcu should provide

On Thu 29-06-23 19:47:08, Baokun Li wrote:
> On 2023/6/29 18:59, Jan Kara wrote:
> > On Wed 28-06-23 21:21:53, Baokun Li wrote:
> > > @@ -760,6 +771,8 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > struct dquot *dquot;
> > > unsigned long freed = 0;
> > > + flush_delayed_work(&quota_release_work);
> > > +
> > I would not flush the work here. Sure, it can make more dquots available
> > for reclaim but I think it is more important for the shrinker to not wait
> > on srcu period as shrinker can be called very frequently under memory
> > pressure.
> This is because I want to use remove_free_dquot() directly, and if I don't
> do
> flush here anymore, then DQST_FREE_DQUOTS will not be accurate.
> Since that's the case, I'll remove the flush here and add a determination
> to remove_free_dquot() whether to increase DQST_FREE_DQUOTS.

OK.

> > > spin_lock(&dq_list_lock);
> > > while (!list_empty(&free_dquots) && sc->nr_to_scan) {
> > > dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
> > > @@ -787,6 +800,60 @@ static struct shrinker dqcache_shrinker = {
> > > .seeks = DEFAULT_SEEKS,
> > > };
> > > +/*
> > > + * Safely release dquot and put reference to dquot.
> > > + */
> > > +static void quota_release_workfn(struct work_struct *work)
> > > +{
> > > + struct dquot *dquot;
> > > + struct list_head rls_head;
> > > +
> > > + spin_lock(&dq_list_lock);
> > > + /* Exchange the list head to avoid livelock. */
> > > + list_replace_init(&releasing_dquots, &rls_head);
> > > + spin_unlock(&dq_list_lock);
> > > +
> > > +restart:
> > > + synchronize_srcu(&dquot_srcu);
> > > + spin_lock(&dq_list_lock);
> > > + while (!list_empty(&rls_head)) {
> > I think the logic below needs a bit more work. Firstly, I think that
> > dqget() should removing dquots from releasing_dquots list - basically just
> > replace the:
> > if (!atomic_read(&dquot->dq_count))
> > remove_free_dquot(dquot);
> > with
> > /* Dquot on releasing_dquots list? Drop ref kept by that list. */
> > if (atomic_read(&dquot->dq_count) == 1 && !list_empty(&dquot->dq_free))
> > atomic_dec(&dquot->dq_count);
> > remove_free_dquot(dquot);
> > atomic_inc(&dquot->dq_count);
> >
> > That way we are sure that while we are holding dq_list_lock, all dquots on
> > rls_head list have dq_count == 1.
> I wrote it this way at first, but that would have been problematic, so I
> ended up dropping the dq_count == 1 constraint for dquots on
> releasing_dquots. Like the following, we will get a bad dquot directly:
>
> quota_release_workfn
>  spin_lock(&dq_list_lock)
>  dquot = list_first_entry(&rls_head, struct dquot, dq_free)
>  spin_unlock(&dq_list_lock)
>  dquot->dq_sb->dq_op->release_dquot(dquot)
>  release_dquot
>        dqget
>         atomic_dec(&dquot->dq_count)
>         remove_free_dquot(dquot)
>         atomic_inc(&dquot->dq_count)
>         spin_unlock(&dq_list_lock)
>         wait_on_dquot(dquot)
>         if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
>         // still active
>  mutex_lock(&dquot->dq_lock)
>  dquot_is_busy(dquot)
>   atomic_read(&dquot->dq_count) > 1
>  clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
>  mutex_unlock(&dquot->dq_lock)
>
> Removing dquot from releasing_dquots and its reduced reference count
> will cause dquot_is_busy() in dquot_release to fail. wait_on_dquot(dquot)
> in dqget would have no effect. This is also the reason why I did not restart
> at dquot_active. Adding dquot to releasing_dquots only in dqput() and
> removing dquot from releasing_dquots only in quota_release_workfn() is
> a simple and effective way to ensure consistency.

Indeed, that's a good point. Still, can't we simplify the loop like:

while (!list_empty(&rls_head)) {
dquot = list_first_entry(&rls_head, struct dquot, dq_free);
/* Dquot got used again? */
if (atomic_read(&dquot->dq_count) > 1) {
atomic_dec(&dquot->dq_count);
remove_free_dquot(dquot);
continue;
}
if (dquot_dirty(dquot)) {
keep what you had
}
if (dquot_active(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
goto restart;
}
/* Dquot is inactive and clean, we can move it to free list */
atomic_dec(&dquot->dq_count);
remove_free_dquot(dquot);
put_dquot_last(dquot);
}

What do you think?
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-30 07:50:32

by Baokun Li

[permalink] [raw]
Subject: Re: [PATCH v2 5/7] quota: fix dqput() to follow the guarantees dquot_srcu should provide

On 2023/6/29 22:33, Jan Kara wrote:
> On Thu 29-06-23 19:47:08, Baokun Li wrote:
>> On 2023/6/29 18:59, Jan Kara wrote:
>>> On Wed 28-06-23 21:21:53, Baokun Li wrote:
>>>> @@ -760,6 +771,8 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>>>> struct dquot *dquot;
>>>> unsigned long freed = 0;
>>>> + flush_delayed_work(&quota_release_work);
>>>> +
>>> I would not flush the work here. Sure, it can make more dquots available
>>> for reclaim but I think it is more important for the shrinker to not wait
>>> on srcu period as shrinker can be called very frequently under memory
>>> pressure.
>> This is because I want to use remove_free_dquot() directly, and if I don't
>> do
>> flush here anymore, then DQST_FREE_DQUOTS will not be accurate.
>> Since that's the case, I'll remove the flush here and add a determination
>> to remove_free_dquot() whether to increase DQST_FREE_DQUOTS.
> OK.
>
>>>> spin_lock(&dq_list_lock);
>>>> while (!list_empty(&free_dquots) && sc->nr_to_scan) {
>>>> dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
>>>> @@ -787,6 +800,60 @@ static struct shrinker dqcache_shrinker = {
>>>> .seeks = DEFAULT_SEEKS,
>>>> };
>>>> +/*
>>>> + * Safely release dquot and put reference to dquot.
>>>> + */
>>>> +static void quota_release_workfn(struct work_struct *work)
>>>> +{
>>>> + struct dquot *dquot;
>>>> + struct list_head rls_head;
>>>> +
>>>> + spin_lock(&dq_list_lock);
>>>> + /* Exchange the list head to avoid livelock. */
>>>> + list_replace_init(&releasing_dquots, &rls_head);
>>>> + spin_unlock(&dq_list_lock);
>>>> +
>>>> +restart:
>>>> + synchronize_srcu(&dquot_srcu);
>>>> + spin_lock(&dq_list_lock);
>>>> + while (!list_empty(&rls_head)) {
>>> I think the logic below needs a bit more work. Firstly, I think that
>>> dqget() should removing dquots from releasing_dquots list - basically just
>>> replace the:
>>> if (!atomic_read(&dquot->dq_count))
>>> remove_free_dquot(dquot);
>>> with
>>> /* Dquot on releasing_dquots list? Drop ref kept by that list. */
>>> if (atomic_read(&dquot->dq_count) == 1 && !list_empty(&dquot->dq_free))
>>> atomic_dec(&dquot->dq_count);
>>> remove_free_dquot(dquot);
>>> atomic_inc(&dquot->dq_count);
>>>
>>> That way we are sure that while we are holding dq_list_lock, all dquots on
>>> rls_head list have dq_count == 1.
>> I wrote it this way at first, but that would have been problematic, so I
>> ended up dropping the dq_count == 1 constraint for dquots on
>> releasing_dquots. Like the following, we will get a bad dquot directly:
>>
>> quota_release_workfn
>>  spin_lock(&dq_list_lock)
>>  dquot = list_first_entry(&rls_head, struct dquot, dq_free)
>>  spin_unlock(&dq_list_lock)
>>  dquot->dq_sb->dq_op->release_dquot(dquot)
>>  release_dquot
>>        dqget
>>         atomic_dec(&dquot->dq_count)
>>         remove_free_dquot(dquot)
>>         atomic_inc(&dquot->dq_count)
>>         spin_unlock(&dq_list_lock)
>>         wait_on_dquot(dquot)
>>         if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
>>         // still active
>>  mutex_lock(&dquot->dq_lock)
>>  dquot_is_busy(dquot)
>>   atomic_read(&dquot->dq_count) > 1
>>  clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
>>  mutex_unlock(&dquot->dq_lock)
>>
>> Removing dquot from releasing_dquots and its reduced reference count
>> will cause dquot_is_busy() in dquot_release to fail. wait_on_dquot(dquot)
>> in dqget would have no effect. This is also the reason why I did not restart
>> at dquot_active. Adding dquot to releasing_dquots only in dqput() and
>> removing dquot from releasing_dquots only in quota_release_workfn() is
>> a simple and effective way to ensure consistency.
> Indeed, that's a good point. Still cannot we simplify the loop like:
>
> while (!list_empty(&rls_head)) {
> dquot = list_first_entry(&rls_head, struct dquot, dq_free);
> /* Dquot got used again? */
> if (atomic_read(&dquot->dq_count) > 1) {
> atomic_dec(&dquot->dq_count);
> remove_free_dquot(dquot);
> continue;
> }
> if (dquot_dirty(dquot)) {
> keep what you had
> }
> if (dquot_active(dquot)) {
> spin_unlock(&dq_list_lock);
> dquot->dq_sb->dq_op->release_dquot(dquot);
> goto restart;
> }
> /* Dquot is inactive and clean, we can move it to free list */
> atomic_dec(&dquot->dq_count);
> remove_free_dquot(dquot);
> put_dquot_last(dquot);
> }
>
> What do you think?
> Honza
This looks great and the code is much cleaner. I'll send out the
next version later containing your suggested changes!

Thank you so much for your patient review!
--
With Best Regards,
Baokun Li