2015-04-30 16:33:10

by Steven Rostedt

Subject: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

Running a test on a large CPU count box with xfs, I hit a live lock
with the following backtraces on several CPUs:

Call Trace:
[<ffffffff812c34f8>] __const_udelay+0x28/0x30
[<ffffffffa033ab9a>] xfs_icsb_lock_cntr+0x2a/0x40 [xfs]
[<ffffffffa033c871>] xfs_icsb_modify_counters+0x71/0x280 [xfs]
[<ffffffffa03413e1>] xfs_trans_reserve+0x171/0x210 [xfs]
[<ffffffffa0378cfd>] xfs_create+0x24d/0x6f0 [xfs]
[<ffffffff8124c8eb>] ? avc_has_perm_flags+0xfb/0x1e0
[<ffffffffa0336eeb>] xfs_vn_mknod+0xbb/0x1e0 [xfs]
[<ffffffffa0337043>] xfs_vn_create+0x13/0x20 [xfs]
[<ffffffff811b0edd>] vfs_create+0xcd/0x130
[<ffffffff811b21ef>] do_last+0xb8f/0x1240
[<ffffffff811b39b2>] path_openat+0xc2/0x490

Looking at the code I see it was stuck at:

STATIC void
xfs_icsb_lock_cntr(
    xfs_icsb_cnts_t *icsbp)
{
    while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) {
        ndelay(1000);
    }
}

I'm not sure why it does the ndelay() and not just a cpu_relax(), but
that's beside the point. In xfs_icsb_modify_counters() the code is
fine: there's a preempt_disable() called when taking this bit spinlock
and a preempt_enable() after it is released.
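
For context, the fast path there looks roughly like this (a simplified
sketch, paraphrased from memory rather than quoted verbatim; error
handling and the disabled-counter slow path are left out):

STATIC int
xfs_icsb_modify_counters(
    xfs_mount_t *mp,
    xfs_sb_field_t field,
    int64_t delta,
    int rsvd)
{
    xfs_icsb_cnts_t *icsbp;

    /* Preemption is disabled before the per-CPU bit lock is taken,
     * so the lock holder cannot be scheduled away while other CPUs
     * spin in xfs_icsb_lock_cntr(). */
    preempt_disable();
    icsbp = this_cpu_ptr(mp->m_sb_cnts);

    xfs_icsb_lock_cntr(icsbp);
    /* ... apply delta to the per-CPU counter ... */
    xfs_icsb_unlock_cntr(icsbp);

    preempt_enable();
    return 0;
}
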
The issue is that not all locations are protected by preempt_disable()
when PREEMPT_RT is set, namely the places that grab all of the per-CPU
cntr locks:

STATIC void
xfs_icsb_lock_all_counters(
    xfs_mount_t *mp)
{
    xfs_icsb_cnts_t *cntp;
    int i;

    for_each_online_cpu(i) {
        cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
        xfs_icsb_lock_cntr(cntp);
    }
}

STATIC void
xfs_icsb_disable_counter()
{
[...]
xfs_icsb_lock_all_counters(mp);
[...]
xfs_icsb_unlock_all_counters(mp);
}


STATIC void
xfs_icsb_balance_counter_locked()
{
[...]
xfs_icsb_disable_counter();
[...]
}

STATIC void
xfs_icsb_balance_counter(
    xfs_mount_t *mp,
    xfs_sb_field_t fields,
    int min_per_cpu)
{
    spin_lock(&mp->m_sb_lock);
    xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu);
    spin_unlock(&mp->m_sb_lock);
}

Now, when PREEMPT_RT is not enabled, that spin_lock() disables
preemption. But for PREEMPT_RT, it does not. I was not able to capture
the state of all tasks on my test box, but I'm assuming that some task
called xfs_icsb_lock_all_counters(), was preempted by an RT task, and
could not finish, causing all callers of that lock to block
indefinitely.

Looking at all users of xfs_icsb_lock_all_counters(), they are leaf
functions and do not call anything that may block on PREEMPT_RT. I
believe the proper fix here is to simply disable preemption in
xfs_icsb_lock_all_counters() when PREEMPT_RT is enabled.
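
For anyone reading this without an -rt tree handy:
preempt_disable_rt()/preempt_enable_rt() are helpers carried in the -rt
patch set, not in mainline. From memory (a sketch of the idea, not a
verbatim copy of the -rt definition), they expand to:

#ifdef CONFIG_PREEMPT_RT_FULL
# define preempt_disable_rt()    preempt_disable()
# define preempt_enable_rt()     preempt_enable()
#else
# define preempt_disable_rt()    barrier()
# define preempt_enable_rt()     barrier()
#endif

So the change below only adds overhead for PREEMPT_RT_FULL builds; for
everyone else it compiles away.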

Signed-off-by: Steven Rostedt <[email protected]>
---
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 51435dbce9c4..dbaa1ce3f308 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1660,6 +1660,12 @@ xfs_icsb_lock_all_counters(
xfs_icsb_cnts_t *cntp;
int i;

+ /*
+ * In PREEMPT_RT, preemption is not disabled here, and it
+ * must be to take the xfs_icsb_lock_cntr.
+ */
+ preempt_disable_rt();
+
for_each_online_cpu(i) {
cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
xfs_icsb_lock_cntr(cntp);
@@ -1677,6 +1683,8 @@ xfs_icsb_unlock_all_counters(
cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
xfs_icsb_unlock_cntr(cntp);
}
+
+ preempt_enable_rt();
}

STATIC void


2015-04-30 18:07:35

by Peter Zijlstra

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, Apr 30, 2015 at 12:33:03PM -0400, Steven Rostedt wrote:
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 51435dbce9c4..dbaa1ce3f308 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1660,6 +1660,12 @@ xfs_icsb_lock_all_counters(
> xfs_icsb_cnts_t *cntp;
> int i;
>
> + /*
> + * In PREEMPT_RT, preemption is not disabled here, and it
> + * must be to take the xfs_icsb_lock_cntr.
> + */
> + preempt_disable_rt();
> +
> for_each_online_cpu(i) {
> cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
> xfs_icsb_lock_cntr(cntp);
> @@ -1677,6 +1683,8 @@ xfs_icsb_unlock_all_counters(
> cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
> xfs_icsb_unlock_cntr(cntp);
> }
> +
> + preempt_enable_rt();
> }

The irony: this is distinctly non-deterministic code you're putting
under an RT-specific preempt_disable ;-)

2015-04-30 18:32:35

by Steven Rostedt

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, 30 Apr 2015 20:07:21 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, Apr 30, 2015 at 12:33:03PM -0400, Steven Rostedt wrote:
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 51435dbce9c4..dbaa1ce3f308 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -1660,6 +1660,12 @@ xfs_icsb_lock_all_counters(
> > xfs_icsb_cnts_t *cntp;
> > int i;
> >
> > + /*
> > + * In PREEMPT_RT, preemption is not disabled here, and it
> > + * must be to take the xfs_icsb_lock_cntr.
> > + */
> > + preempt_disable_rt();
> > +
> > for_each_online_cpu(i) {
> > cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
> > xfs_icsb_lock_cntr(cntp);
> > @@ -1677,6 +1683,8 @@ xfs_icsb_unlock_all_counters(
> > cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
> > xfs_icsb_unlock_cntr(cntp);
> > }
> > +
> > + preempt_enable_rt();
> > }
>
> The irony: this is distinctly non-deterministic code you're putting
> under an RT-specific preempt_disable ;-)

I know :-(

Unfortunately, an RT-behaving fix would be much more invasive and would
probably require the help of the xfs folks. For now, this just prevents
a live lock that can happen and halt the system, where it becomes a
deterministic catastrophe.

-- Steve

2015-04-30 18:37:00

by Steven Rostedt

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, 30 Apr 2015 11:33:27 -0700
Christoph Hellwig <[email protected]> wrote:

> FYI, this code is gone in 4.1-rc.

:-)

Well, when we port -rt to 4.1 that will be the answer!

But for now, we need to band-aid the older versions of the kernel that
we do support.

-- Steve

2015-04-30 18:40:32

by Austin Schuh

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, Apr 30, 2015 at 11:32 AM, Steven Rostedt <[email protected]> wrote:
> On Thu, 30 Apr 2015 20:07:21 +0200
> Peter Zijlstra <[email protected]> wrote:
>> The irony: this is distinctly non-deterministic code you're putting
>> under an RT-specific preempt_disable ;-)
>
> I know :-(
>
> Unfortunately, an RT-behaving fix would be much more invasive and would
> probably require the help of the xfs folks. For now, this just prevents
> a live lock that can happen and halt the system, where it becomes a
> deterministic catastrophe.
>
> -- Steve

Would it work to instead create a lock to replace the
preempt_enable_rt/preempt_disable_rt pair in XFS?

2015-04-30 19:07:28

by Steven Rostedt

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, 30 Apr 2015 11:40:07 -0700
Austin Schuh <[email protected]> wrote:

> On Thu, Apr 30, 2015 at 11:32 AM, Steven Rostedt <[email protected]> wrote:
> > On Thu, 30 Apr 2015 20:07:21 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >> The irony: this is distinctly non-deterministic code you're putting
> >> under an RT-specific preempt_disable ;-)
> >
> > I know :-(
> >
> > Unfortunately, an RT-behaving fix would be much more invasive and would
> > probably require the help of the xfs folks. For now, this just prevents
> > a live lock that can happen and halt the system, where it becomes a
> > deterministic catastrophe.
> >
> > -- Steve
>
> Would it work to instead create a lock to replace the
> preempt_enable_rt/preempt_disable_rt pair in XFS?

It would need to replace not just the place where the preempt_disable
and preempt_enable are done, but all of the per-CPU bit spin locks (the
XFS_ICSB_FLAG_LOCK bit in icsbp->icsb_flags).

If we can replace them with an rtmutex (a plain spin_lock() in the
vanilla kernel), then that would work.
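
Something like the following is what I have in mind (completely
untested, just sketching the idea):

typedef struct xfs_icsb_cnts {
    uint64_t        icsb_fdblocks;
    uint64_t        icsb_ifree;
    uint64_t        icsb_icount;
    unsigned long   icsb_flags;     /* "disabled" state only */
    spinlock_t      icsb_lock;      /* replaces XFS_ICSB_FLAG_LOCK */
} xfs_icsb_cnts_t;

STATIC void
xfs_icsb_lock_cntr(
    xfs_icsb_cnts_t *icsbp)
{
    spin_lock(&icsbp->icsb_lock);
}

STATIC void
xfs_icsb_unlock_cntr(
    xfs_icsb_cnts_t *icsbp)
{
    spin_unlock(&icsbp->icsb_lock);
}

On -rt the spinlock_t becomes a sleeping rtmutex, so a preempted holder
just blocks the waiters instead of live locking them, and on vanilla
kernels it stays a real spinlock.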

-- Steve

2015-05-04 00:51:53

by Dave Chinner

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Thu, Apr 30, 2015 at 12:33:03PM -0400, Steven Rostedt wrote:
> Running a test on a large CPU count box with xfs, I hit a live lock
> with the following backtraces on several CPUs:
>
> Call Trace:
> [<ffffffff812c34f8>] __const_udelay+0x28/0x30
> [<ffffffffa033ab9a>] xfs_icsb_lock_cntr+0x2a/0x40 [xfs]
> [<ffffffffa033c871>] xfs_icsb_modify_counters+0x71/0x280 [xfs]
> [<ffffffffa03413e1>] xfs_trans_reserve+0x171/0x210 [xfs]
> [<ffffffffa0378cfd>] xfs_create+0x24d/0x6f0 [xfs]
> [<ffffffff8124c8eb>] ? avc_has_perm_flags+0xfb/0x1e0
> [<ffffffffa0336eeb>] xfs_vn_mknod+0xbb/0x1e0 [xfs]
> [<ffffffffa0337043>] xfs_vn_create+0x13/0x20 [xfs]
> [<ffffffff811b0edd>] vfs_create+0xcd/0x130
> [<ffffffff811b21ef>] do_last+0xb8f/0x1240
> [<ffffffff811b39b2>] path_openat+0xc2/0x490
>
> Looking at the code I see it was stuck at:
>
> STATIC void
> xfs_icsb_lock_cntr(
> xfs_icsb_cnts_t *icsbp)
> {
> while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) {
> ndelay(1000);
> }
> }
>
> I'm not sure why it does the ndelay() and not just a cpu_relax(), but

Because the code was written long before cpu_relax() existed, just
like it was written long before the generic percpu counter code was
added...
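
(For anyone unfamiliar with it, that generic code is the
<linux/percpu_counter.h> API; roughly, and from memory:

    struct percpu_counter fdblocks;

    percpu_counter_init(&fdblocks, 0, GFP_KERNEL);
    percpu_counter_add(&fdblocks, delta);       /* fast, per-cpu */
    total = percpu_counter_sum(&fdblocks);      /* slow, exact */

so the per-cpu batching and the lock protecting the shared total are
handled by common code instead of being open coded in XFS.)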

....

> Now, when PREEMPT_RT is not enabled, that spin_lock() disables
> preemption. But for PREEMPT_RT, it does not. I was not able to capture
> the state of all tasks on my test box, but I'm assuming that some task
> called xfs_icsb_lock_all_counters(), was preempted by an RT task, and
> could not finish, causing all callers of that lock to block
> indefinitely.
>
> Looking at all users of xfs_icsb_lock_all_counters(), they are leaf
> functions and do not call anything that may block on PREEMPT_RT. I
> believe the proper fix here is to simply disable preemption in
> xfs_icsb_lock_all_counters() when PREEMPT_RT is enabled.

RT is going to have other performance problems that are probably
going to negate the scalability this code provides. If you want a
hack that you can easily backport (as this code now uses the generic
percpu counters) then have a look at fs/xfs/xfs_linux.h:

/*
* Feature macros (disable/enable)
*/
#ifdef CONFIG_SMP
#define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
#else
#undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
#endif

You can turn off all that per-cpu code simply by:

-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT)

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-05-04 14:14:46

by Steven Rostedt

Subject: Re: [PATCH][RT] xfs: Disable preemption when grabbing all icsb counter locks

On Mon, 4 May 2015 10:48:44 +1000
Dave Chinner <[email protected]> wrote:


> > I'm not sure why it does the ndelay() and not just a cpu_relax(), but
>
> Because the code was written long before cpu_relax() existed, just
> like it was written long before the generic percpu counter code was
> added...

Ah, legacy code.

>
> ....
>
> > Now, when PREEMPT_RT is not enabled, that spin_lock() disables
> > preemption. But for PREEMPT_RT, it does not. I was not able to capture
> > the state of all tasks on my test box, but I'm assuming that some task
> > called xfs_icsb_lock_all_counters(), was preempted by an RT task, and
> > could not finish, causing all callers of that lock to block
> > indefinitely.
> >
> > Looking at all users of xfs_icsb_lock_all_counters(), they are leaf
> > functions and do not call anything that may block on PREEMPT_RT. I
> > believe the proper fix here is to simply disable preemption in
> > xfs_icsb_lock_all_counters() when PREEMPT_RT is enabled.
>
> RT is going to have other performance problems that are probably
> going to negate the scalability this code provides. If you want a
> hack that you can easily backport (as this code now uses the generic
> percpu counters) then have a look at fs/xfs/xfs_linux.h:
>
> /*
> * Feature macros (disable/enable)
> */
> #ifdef CONFIG_SMP
> #define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
> #else
> #undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
> #endif
>
> You can turn off all that per-cpu code simply by:
>
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT)

Oh, I think I like this the best.

Thanks for the advice.

-- Steve

2015-05-13 15:36:38

by Steven Rostedt

Subject: [PATCH][RT] xfs: Disable percpu SB on PREEMPT_RT_FULL

Running a test on a large CPU count box with xfs, I hit a live lock
with the following backtraces on several CPUs:

Call Trace:
[<ffffffff812c34f8>] __const_udelay+0x28/0x30
[<ffffffffa033ab9a>] xfs_icsb_lock_cntr+0x2a/0x40 [xfs]
[<ffffffffa033c871>] xfs_icsb_modify_counters+0x71/0x280 [xfs]
[<ffffffffa03413e1>] xfs_trans_reserve+0x171/0x210 [xfs]
[<ffffffffa0378cfd>] xfs_create+0x24d/0x6f0 [xfs]
[<ffffffff8124c8eb>] ? avc_has_perm_flags+0xfb/0x1e0
[<ffffffffa0336eeb>] xfs_vn_mknod+0xbb/0x1e0 [xfs]
[<ffffffffa0337043>] xfs_vn_create+0x13/0x20 [xfs]
[<ffffffff811b0edd>] vfs_create+0xcd/0x130
[<ffffffff811b21ef>] do_last+0xb8f/0x1240
[<ffffffff811b39b2>] path_openat+0xc2/0x490

Looking at the code I see it was stuck at:

STATIC void
xfs_icsb_lock_cntr(
    xfs_icsb_cnts_t *icsbp)
{
    while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) {
        ndelay(1000);
    }
}

In xfs_icsb_modify_counters() the code is fine: there's a
preempt_disable() called when taking this bit spinlock and a
preempt_enable() after it is released. The issue is that not all
locations are protected by preempt_disable() when PREEMPT_RT is set,
namely the places that grab all of the per-CPU cntr locks:

STATIC void
xfs_icsb_lock_all_counters(
    xfs_mount_t *mp)
{
    xfs_icsb_cnts_t *cntp;
    int i;

    for_each_online_cpu(i) {
        cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
        xfs_icsb_lock_cntr(cntp);
    }
}

STATIC void
xfs_icsb_disable_counter()
{
[...]
xfs_icsb_lock_all_counters(mp);
[...]
xfs_icsb_unlock_all_counters(mp);
}


STATIC void
xfs_icsb_balance_counter_locked()
{
[...]
xfs_icsb_disable_counter();
[...]
}

STATIC void
xfs_icsb_balance_counter(
    xfs_mount_t *mp,
    xfs_sb_field_t fields,
    int min_per_cpu)
{
    spin_lock(&mp->m_sb_lock);
    xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu);
    spin_unlock(&mp->m_sb_lock);
}

Now, when PREEMPT_RT is not enabled, that spin_lock() disables
preemption. But for PREEMPT_RT, it does not. I was not able to capture
the state of all tasks on my test box, but I'm assuming that some task
called xfs_icsb_lock_all_counters(), was preempted by an RT task, and
could not finish, causing all callers of that lock to block
indefinitely.

Dave Chinner has stated that the scalability of that code will probably
be negated by PREEMPT_RT, and that it is probably best to just disable
the code in question. Also, this code has been rewritten in newer kernels.
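
With HAVE_PERCPU_SB undefined, superblock counter updates fall back to
the same path a !SMP build uses: xfs_mod_incore_sb() takes
mp->m_sb_lock around the update, and on -rt that spin_lock() is a
sleeping lock, so it cannot live lock the way the per-CPU bit locks
can. Roughly (a sketch from memory, not the verbatim source):

int
xfs_mod_incore_sb(
    xfs_mount_t *mp,
    xfs_sb_field_t field,
    int64_t delta,
    int rsvd)
{
    int status;

    spin_lock(&mp->m_sb_lock);
    status = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd);
    spin_unlock(&mp->m_sb_lock);

    return status;
}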

Link: http://lkml.kernel.org/r/20150504004844.GA21261@dastard

Suggested-by: Dave Chinner <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 825249d2dfc1..c43cb979a46d 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -132,7 +132,7 @@ typedef __uint64_t __psunsigned_t;
/*
* Feature macros (disable/enable)
*/
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT_FULL)
#define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */
#else
#undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */

by Sebastian Andrzej Siewior

Subject: Re: [PATCH][RT] xfs: Disable percpu SB on PREEMPT_RT_FULL

* Steven Rostedt | 2015-05-13 11:36:32 [-0400]:

>Running a test on a large CPU count box with xfs, I hit a live lock
>with the following backtraces on several CPUs:
Applied.

Sebastian