2018-03-29 12:08:54

by Andreas Gruenbacher

[permalink] [raw]
Subject: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

Here's a second version of the patch (now a patch set) to eliminate
rhashtable_walk_peek in gfs2.

The first patch introduces lockref_put_not_zero, the inverse of
lockref_get_not_zero.

The second patch eliminates rhashtable_walk_peek in gfs2. In
gfs2_glock_iter_next, the new lockref function from patch one is used to
drop a lockref count as long as the count doesn't drop to zero. This is
almost always the case; if there is a risk of dropping the last
reference, we must defer that to a work queue because dropping the last
reference may sleep.

Thanks,
Andreas

Andreas Gruenbacher (2):
lockref: Add lockref_put_not_zero
gfs2: Stop using rhashtable_walk_peek

fs/gfs2/glock.c | 47 ++++++++++++++++++++++++++++-------------------
include/linux/lockref.h | 1 +
lib/lockref.c | 28 ++++++++++++++++++++++++++++
3 files changed, 57 insertions(+), 19 deletions(-)

--
2.14.3



2018-03-29 12:07:50

by Andreas Gruenbacher

[permalink] [raw]
Subject: [PATCH v2 1/2] lockref: Add lockref_put_not_zero

Put a lockref unless the lockref is dead or its count would become zero.
This is the same as lockref_put_or_lock except that the lock is never
left held.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
include/linux/lockref.h | 1 +
lib/lockref.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+)

diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 2eac32095113..99f17cc8e163 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -37,6 +37,7 @@ struct lockref {
extern void lockref_get(struct lockref *);
extern int lockref_put_return(struct lockref *);
extern int lockref_get_not_zero(struct lockref *);
+extern int lockref_put_not_zero(struct lockref *);
extern int lockref_get_or_lock(struct lockref *);
extern int lockref_put_or_lock(struct lockref *);

diff --git a/lib/lockref.c b/lib/lockref.c
index 47169ed7e964..3d468b53d4c9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -80,6 +80,34 @@ int lockref_get_not_zero(struct lockref *lockref)
}
EXPORT_SYMBOL(lockref_get_not_zero);

+/**
+ * lockref_put_not_zero - Decrements count unless count <= 1 before decrement
+ * @lockref: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count would become zero
+ */
+int lockref_put_not_zero(struct lockref *lockref)
+{
+ int retval;
+
+ CMPXCHG_LOOP(
+ new.count--;
+ if (old.count <= 1)
+ return 0;
+ ,
+ return 1;
+ );
+
+ spin_lock(&lockref->lock);
+ retval = 0;
+ if (lockref->count > 1) {
+ lockref->count--;
+ retval = 1;
+ }
+ spin_unlock(&lockref->lock);
+ return retval;
+}
+EXPORT_SYMBOL(lockref_put_not_zero);
+
/**
* lockref_get_or_lock - Increments count unless the count is 0 or dead
* @lockref: pointer to lockref structure
--
2.14.3


2018-03-29 12:07:51

by Andreas Gruenbacher

[permalink] [raw]
Subject: [PATCH v2 2/2] gfs2: Stop using rhashtable_walk_peek

Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump. Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/glock.c | 47 ++++++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 82fb5583445c..097bd3c0f270 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1923,28 +1923,37 @@ void gfs2_glock_exit(void)

static void gfs2_glock_iter_next(struct gfs2_glock_iter *gi, loff_t n)
{
- if (n == 0)
- gi->gl = rhashtable_walk_peek(&gi->hti);
- else {
- gi->gl = rhashtable_walk_next(&gi->hti);
- n--;
+ struct gfs2_glock *gl = gi->gl;
+
+ if (gl) {
+ if (n == 0)
+ return;
+ if (!lockref_put_not_zero(&gl->gl_lockref))
+ gfs2_glock_queue_put(gl);
}
for (;;) {
- if (IS_ERR_OR_NULL(gi->gl)) {
- if (!gi->gl)
- return;
- if (PTR_ERR(gi->gl) != -EAGAIN) {
- gi->gl = NULL;
- return;
+ gl = rhashtable_walk_next(&gi->hti);
+ if (IS_ERR_OR_NULL(gl)) {
+ if (gl == ERR_PTR(-EAGAIN)) {
+ n = 1;
+ continue;
}
- n = 0;
- } else if (gi->sdp == gi->gl->gl_name.ln_sbd &&
- !__lockref_is_dead(&gi->gl->gl_lockref)) {
- if (!n--)
- break;
+ gl = NULL;
+ break;
+ }
+ if (gl->gl_name.ln_sbd != gi->sdp)
+ continue;
+ if (n <= 1) {
+ if (!lockref_get_not_dead(&gl->gl_lockref))
+ continue;
+ break;
+ } else {
+ if (__lockref_is_dead(&gl->gl_lockref))
+ continue;
+ n--;
}
- gi->gl = rhashtable_walk_next(&gi->hti);
}
+ gi->gl = gl;
}

static void *gfs2_glock_seq_start(struct seq_file *seq, loff_t *pos)
@@ -1988,7 +1997,6 @@ static void gfs2_glock_seq_stop(struct seq_file *seq, void *iter_ptr)
{
struct gfs2_glock_iter *gi = seq->private;

- gi->gl = NULL;
rhashtable_walk_stop(&gi->hti);
}

@@ -2076,7 +2084,8 @@ static int gfs2_glocks_release(struct inode *inode, struct file *file)
struct seq_file *seq = file->private_data;
struct gfs2_glock_iter *gi = seq->private;

- gi->gl = NULL;
+ if (gi->gl)
+ gfs2_glock_put(gi->gl);
rhashtable_walk_exit(&gi->hti);
return seq_release_private(inode, file);
}
--
2.14.3


2018-03-29 12:26:05

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

Hi,

Can we solve the problem another way, by not taking refs on the glocks
when we are iterating over them for the debugfs files? I assume that is
the main issue here.

We didn't used to take refs since the rcu locking was enough during the
walk itself. We used to only keep track of the hash bucket and offset
within the bucket when we dropped the rcu lock between calls to the
iterator. I may have lost track of why that approach did not work?

Steve.


On 29/03/18 13:06, Andreas Gruenbacher wrote:
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
>
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
>
> The second patch eliminates rhashtable_walk_peek in gfs2. In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero. This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.
>
> Thanks,
> Andreas
>
> Andreas Gruenbacher (2):
> lockref: Add lockref_put_not_zero
> gfs2: Stop using rhashtable_walk_peek
>
> fs/gfs2/glock.c | 47 ++++++++++++++++++++++++++++-------------------
> include/linux/lockref.h | 1 +
> lib/lockref.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 57 insertions(+), 19 deletions(-)
>


2018-03-29 12:37:18

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
>
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
>
> The second patch eliminates rhashtable_walk_peek in gfs2. In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero. This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.

In light of Neil's latest patch, do we still need this?

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-03-29 13:14:19

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On 29 March 2018 at 14:24, Steven Whitehouse <[email protected]> wrote:
> Hi,
>
> Can we solve the problem another way, by not taking refs on the glocks when
> we are iterating over them for the debugfs files? I assume that is the main
> issue here.
>
> We didn't used to take refs since the rcu locking was enough during the walk
> itself. We used to only keep track of the hash bucket and offset within the
> bucket when we dropped the rcu lock between calls to the iterator. I may
> have lost track of why that approach did not work?

That doesn't work because when a glock doesn't fit into one read, we
need to make sure that the next read will continue with the same glock
or else we'll end up with a corrupted dump. And rhashtable_walk_peek
cannot guarantee that.

I've done some minimal performance testing and the additional ref
taking only impacted the performance in the 10% range or less, so it
doesn't really matter.

Andreas

2018-03-29 13:17:23

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On 29 March 2018 at 14:35, Herbert Xu <[email protected]> wrote:
> On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
>> Here's a second version of the patch (now a patch set) to eliminate
>> rhashtable_walk_peek in gfs2.
>>
>> The first patch introduces lockref_put_not_zero, the inverse of
>> lockref_get_not_zero.
>>
>> The second patch eliminates rhashtable_walk_peek in gfs2. In
>> gfs2_glock_iter_next, the new lockref function from patch one is used to
>> drop a lockref count as long as the count doesn't drop to zero. This is
>> almost always the case; if there is a risk of dropping the last
>> reference, we must defer that to a work queue because dropping the last
>> reference may sleep.
>
> In light of Neil's latest patch, do we still need this?

For all I know, Neil's latest plan is to get rhashtable_walk_peek
replaced and removed because it is unfixable. This patch removes the
one and only user.

Thanks,
Andreas

2018-03-29 15:43:26

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>
> For all I know, Neil's latest plan is to get rhashtable_walk_peek
> replaced and removed because it is unfixable. This patch removes the
> one and only user.

His latest patch makes rhashtable_walk_peek stable in the face of
removals.

https://patchwork.ozlabs.org/patch/892534/

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-03-29 16:54:09

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On 29 March 2018 at 17:41, Herbert Xu <[email protected]> wrote:
> On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>>
>> For all I know, Neil's latest plan is to get rhashtable_walk_peek
>> replaced and removed because it is unfixable. This patch removes the
>> one and only user.
>
> His latest patch makes rhashtable_walk_peek stable in the face of
> removals.
>
> https://patchwork.ozlabs.org/patch/892534/

Ok, I can slightly update my patch description. The problem still
remains that glocks can be deleted from the rhashtable between
stop/start, and that needs to be fixed in gfs2. Once that's done,
keeping track of the current glock comes for free and we won't need
rhashtable_walk_peek anymore.

Should rhashtable_walk_peek be kept around even if there are no more
users? I have my doubts.

Thanks,
Andreas

2018-03-29 17:09:26

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Thu, Mar 29, 2018 at 06:52:34PM +0200, Andreas Gruenbacher wrote:
>
> Should rhashtable_walk_peek be kept around even if there are no more
> users? I have my doubts.

Absolutely. All netlink dumps using rhashtable_walk_next are buggy
and need to switch over to rhashtable_walk_peek. As otherwise
the object that triggers the out-of-space condition will be skipped
upon resumption.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-04-03 03:43:00

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Fri, Mar 30 2018, Herbert Xu wrote:

> On Thu, Mar 29, 2018 at 06:52:34PM +0200, Andreas Gruenbacher wrote:
>>
>> Should rhashtable_walk_peek be kept around even if there are no more
>> users? I have my doubts.
>
> Absolutely. All netlink dumps using rhashtable_walk_next are buggy
> and need to switch over to rhashtable_walk_peek. As otherwise
> the object that triggers the out-of-space condition will be skipped
> upon resumption.

Do we really need a rhashtable_walk_peek() interface?
I imagine that a seqfile ->start function can do:

if (*ppos == 0 && last_pos != 0) {
rhashtable_walk_exit(&iter);
rhashtable_walk_enter(&table, &iter);
last_pos = 0;
}
rhashtable_walk_start(&iter);
if (*ppos == last_pos && iter.p)
return iter.p;
last_pos = *ppos;
return rhashtable_walk_next(&iter)


and the ->next function just does

last_pos = *ppos;
*ppos += 1;
do p = rhashtable_walk_next(&iter); while (IS_ERR(p));
return p;

It might be OK to have a function call instead of expecting people to
use iter.p directly.

static inline void *rhashtable_walk_prev(struct rhashtable_iter *iter)
{
return iter->p;
}

Thoughts?

Thanks,
NeilBrown


Attachments:
signature.asc (847.00 B)

2018-04-03 04:07:30

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Tue, Apr 03, 2018 at 01:41:26PM +1000, NeilBrown wrote:
>
> Do we really need a rhashtable_walk_peek() interface?
> I imagine that a seqfile ->start function can do:
>
> if (*ppos == 0 && last_pos != 0) {
> rhashtable_walk_exit(&iter);
> rhashtable_walk_enter(&table, &iter);
> last_pos = 0;
> }
> rhashtable_walk_start(&iter);
> if (*ppos == last_pos && iter.p)
> return iter.p;
> last_pos = *ppos;

We don't want users poking into the internals of iter. If you're
suggesting we could simplify rhashtable_walk_peek to just this after
your patch then yes possibly. You also need to ensure that not only
seqfs users continue to work but also netlink dump users which are
slightly different.

> It might be OK to have a function call instead of expecting people to
> use iter.p directly.

Yes that would definitely be the preferred option.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-04-04 15:48:18

by Bob Peterson

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

----- Original Message -----
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
>
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
>
> The second patch eliminates rhashtable_walk_peek in gfs2. In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero. This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.
>
> Thanks,
> Andreas
>
> Andreas Gruenbacher (2):
> lockref: Add lockref_put_not_zero
> gfs2: Stop using rhashtable_walk_peek
>
> fs/gfs2/glock.c | 47 ++++++++++++++++++++++++++++-------------------
> include/linux/lockref.h | 1 +
> lib/lockref.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 57 insertions(+), 19 deletions(-)
>
> --
> 2.14.3

Hi,

The patches look good. The big question is whether to add them to this
merge window while it's still open. Opinions?

Acked-by: Bob Peterson <[email protected]>

Regards,

Bob Peterson

2018-04-04 15:52:08

by Herbert Xu

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

On Wed, Apr 04, 2018 at 11:46:28AM -0400, Bob Peterson wrote:
>
> The patches look good. The big question is whether to add them to this
> merge window while it's still open. Opinions?

We're still hashing out the rhashtable interface so I don't think
now is the time to rush things.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2018-04-12 17:02:34

by Bob Peterson

[permalink] [raw]
Subject: Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

----- Original Message -----
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
>
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
>
> The second patch eliminates rhashtable_walk_peek in gfs2. In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero. This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.
>
> Thanks,
> Andreas
>
> Andreas Gruenbacher (2):
> lockref: Add lockref_put_not_zero
> gfs2: Stop using rhashtable_walk_peek
>
> fs/gfs2/glock.c | 47 ++++++++++++++++++++++++++++-------------------
> include/linux/lockref.h | 1 +
> lib/lockref.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 57 insertions(+), 19 deletions(-)
>
> --
> 2.14.3

Hi,

Thanks. These two patches are now pushed to the for-next branch of the linux-gfs2 tree:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?h=for-next&id=450b1f6f56350c630e795f240dc5a77aa8aa2419
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?h=for-next&id=3fd5d3ad35dc44aaf0f28d60cc0eb75887bff54d

Regards,

Bob Peterson
Red Hat File Systems