2022-06-14 16:10:54

by Jan Kara

[permalink] [raw]
Subject: [PATCH 01/10] mbcache: Don't reclaim used entries

Do not reclaim entries that are currently used by somebody from a
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect pending increment of xattr block
refcount.

CC: [email protected]
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <[email protected]>
---
fs/mbcache.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..cfc28129fb6f 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
while (nr_to_scan-- && !list_empty(&cache->c_list)) {
entry = list_first_entry(&cache->c_list,
struct mb_cache_entry, e_list);
- if (entry->e_referenced) {
+ if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
entry->e_referenced = 0;
list_move_tail(&entry->e_list, &cache->c_list);
continue;
@@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
spin_unlock(&cache->c_list_lock);
head = mb_cache_entry_head(cache, entry->e_key);
hlist_bl_lock(head);
+ /* Now a reliable check if the entry didn't get used... */
+ if (atomic_read(&entry->e_refcnt) > 2) {
+ hlist_bl_unlock(head);
+ spin_lock(&cache->c_list_lock);
+ list_add_tail(&entry->e_list, &cache->c_list);
+ cache->c_entry_count++;
+ continue;
+ }
if (!hlist_bl_unhashed(&entry->e_hash_list)) {
hlist_bl_del_init(&entry->e_hash_list);
atomic_dec(&entry->e_refcnt);
--
2.35.3


2022-06-16 14:36:21

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On 22/06/14 06:05PM, Jan Kara wrote:
> Do not reclaim entries that are currently used by somebody from a
> shrinker. Firstly, these entries are likely useful. Secondly, we will
> need to keep such entries to protect pending increment of xattr block
> refcount.

Trying to review the patch series to best of my knowledge, so kindly excuse my
silly queries along the way.

>
> CC: [email protected]
> Fixes: 82939d7999df ("ext4: convert to mbcache2")
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/mbcache.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/mbcache.c b/fs/mbcache.c
> index 97c54d3a2227..cfc28129fb6f 100644
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> entry = list_first_entry(&cache->c_list,
> struct mb_cache_entry, e_list);
> - if (entry->e_referenced) {
> + if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {

Sure, yes, the above "||" conditions looks good.
i.e. if the refcnt is above 2, then we should move the entry to the tail of LRU.
Because that means that there is another user of this entry which might have
incremented the refcnt.

> entry->e_referenced = 0;
> list_move_tail(&entry->e_list, &cache->c_list);
> continue;
> @@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> spin_unlock(&cache->c_list_lock);
> head = mb_cache_entry_head(cache, entry->e_key);
> hlist_bl_lock(head);
> + /* Now a reliable check if the entry didn't get used... */

But not sure why this is more reliable? Anytime we add or remove the entry,
we first always do the list operation and then increment or decrement the
refcnt "atomically".

So could you please help in understanding why will this be more reliable?

-ritesh


> + if (atomic_read(&entry->e_refcnt) > 2) {
> + hlist_bl_unlock(head);
> + spin_lock(&cache->c_list_lock);
> + list_add_tail(&entry->e_list, &cache->c_list);
> + cache->c_entry_count++;
> + continue;
> + }
> if (!hlist_bl_unhashed(&entry->e_hash_list)) {
> hlist_bl_del_init(&entry->e_hash_list);
> atomic_dec(&entry->e_refcnt);
> --
> 2.35.3
>

2022-06-16 17:29:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 01/10] mbcache: Don't reclaim used entries

On Thu 16-06-22 19:52:12, Ritesh Harjani wrote:
> > CC: [email protected]
> > Fixes: 82939d7999df ("ext4: convert to mbcache2")
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/mbcache.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/mbcache.c b/fs/mbcache.c
> > index 97c54d3a2227..cfc28129fb6f 100644
> > --- a/fs/mbcache.c
> > +++ b/fs/mbcache.c
> > @@ -288,7 +288,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> > entry = list_first_entry(&cache->c_list,
> > struct mb_cache_entry, e_list);
> > - if (entry->e_referenced) {
> > + if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) {
>
> Sure, yes, the above "||" conditions looks good.
> i.e. if the refcnt is above 2, then we should move the entry to the tail of LRU.
> Because that means that there is another user of this entry which might have
> incremented the refcnt.
>
> > entry->e_referenced = 0;
> > list_move_tail(&entry->e_list, &cache->c_list);
> > continue;
> > @@ -302,6 +302,14 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
> > spin_unlock(&cache->c_list_lock);
> > head = mb_cache_entry_head(cache, entry->e_key);
> > hlist_bl_lock(head);
> > + /* Now a reliable check if the entry didn't get used... */
>
> But not sure why this is more reliable? Anytime we add or remove the entry,
> we first always do the list operation and then increment or decrement the
> refcnt "atomically".
>
> So could you please help in understanding why will this be more reliable?

It is reliable in the sense that while we hold hlist_bl_lock() there can be
no new references acquired (they get acquired only through the hash table
lookup) and so here we can "atomically" do "check entry is unused and
remove it from the hash".

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR