2015-05-12 23:52:41

by Dave Chinner

Subject: [PATCH 0/2 v2] percpu_counter: xfs requires custom compare batch size

Hi folks,

This is v2 of the regression fix for the new generic per-cpu
superblock counter code in XFS. The problems fixed arise because the
custom batch sizes XFS uses for counter increments and decrements
exceed the "accurate compare" bounds in percpu_counter_compare(),
resulting in incorrect comparisons being made.

This regression was introduced in 4.1-rc1 and it requires a small
tweak to the percpu counter infrastructure to fix, hence the two
patches.

Comments welcome!

-Dave.


2015-05-12 23:52:50

by Dave Chinner

Subject: [PATCH 1/2] percpu_counter: batch size aware __percpu_counter_compare()

From: Dave Chinner <[email protected]>

XFS uses non-standard batch sizes to avoid frequent global
counter updates on its allocated inode counters, as they increment
or decrement in batches of 64 inodes. Hence the standard percpu
counter batch of 32 means that the counter is effectively a global
counter. Currently XFS uses a batch size of 128 so that it doesn't
take the global lock on every single modification.

However, XFS also needs to compare accurately against zero, which
means we need to use percpu_counter_compare(). That function has a
hard-coded batch size of 32, so it spuriously fails to detect when
it is supposed to use precise comparisons, and hence the accounting
goes wrong.

Add __percpu_counter_compare() to take a custom batch size so we can
use it sanely in XFS and factor percpu_counter_compare() to use it.

Signed-off-by: Dave Chinner <[email protected]>
---
include/linux/percpu_counter.h | 13 ++++++++++++-
lib/percpu_counter.c | 6 +++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 50e5009..4c82e60 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -41,7 +41,12 @@ void percpu_counter_destroy(struct percpu_counter *fbc);
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
-int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
+int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+
+static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
+{
+	return __percpu_counter_compare(fbc, rhs, percpu_counter_batch);
+}
 
 static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
@@ -116,6 +121,12 @@ static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 		return 0;
 }
 
+static inline int
+percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
+{
+	return percpu_counter_compare(fbc, rhs);
+}
+
 static inline void
 percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 48144cd..f051d69 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -197,13 +197,13 @@ static int percpu_counter_hotcpu_callback(struct notifier_block *nb,
  * Compare counter against given value.
  * Return 1 if greater, 0 if equal and -1 if less
  */
-int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
+int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
 {
 	s64 count;
 
 	count = percpu_counter_read(fbc);
 	/* Check to see if rough count will be sufficient for comparison */
-	if (abs(count - rhs) > (percpu_counter_batch*num_online_cpus())) {
+	if (abs(count - rhs) > (batch * num_online_cpus())) {
 		if (count > rhs)
 			return 1;
 		else
@@ -218,7 +218,7 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 	else
 		return 0;
 }
-EXPORT_SYMBOL(percpu_counter_compare);
+EXPORT_SYMBOL(__percpu_counter_compare);
 
 static int __init percpu_counter_startup(void)
 {
--
2.0.0

2015-05-12 23:52:46

by Dave Chinner

Subject: [PATCH 2/2] xfs: inode and free block counters need to use __percpu_counter_compare

From: Dave Chinner <[email protected]>

Because the counters use a custom batch size, the comparison
functions need to be aware of that batch size, otherwise the
comparison does not work correctly. This leads to ASSERT failures
on generic/027 like this:

XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
------------[ cut here ]------------
....
Call Trace:
[<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
[<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
[<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
[<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
[<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
[<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
[<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
[<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
[<ffffffff811f3958>] evict+0xb8/0x190
[<ffffffff811f433b>] iput+0x18b/0x1f0
[<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
[<ffffffff811d548a>] ? filp_close+0x5a/0x80
[<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
[<ffffffff81e0892e>] system_call_fastpath+0x12/0x71

This is a regression introduced by commit 501ab32 ("xfs: use generic
percpu counters for inode counter").

This patch fixes the same problem for both the inode counter and the
free block counter in the superblocks.

Signed-off-by: Dave Chinner <[email protected]>
---
fs/xfs/xfs_mount.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 02f827f..461e791 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1100,14 +1100,18 @@ xfs_log_sbcount(xfs_mount_t *mp)
 	return xfs_sync_sb(mp, true);
 }
 
+/*
+ * Deltas for the inode count are +/-64, hence we use a large batch size
+ * of 128 so we don't need to take the counter lock on every update.
+ */
+#define XFS_ICOUNT_BATCH 128
 int
 xfs_mod_icount(
 	struct xfs_mount *mp,
 	int64_t delta)
 {
-	/* deltas are +/-64, hence the large batch size of 128. */
-	__percpu_counter_add(&mp->m_icount, delta, 128);
-	if (percpu_counter_compare(&mp->m_icount, 0) < 0) {
+	__percpu_counter_add(&mp->m_icount, delta, XFS_ICOUNT_BATCH);
+	if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) {
 		ASSERT(0);
 		percpu_counter_add(&mp->m_icount, -delta);
 		return -EINVAL;
@@ -1129,6 +1133,14 @@ xfs_mod_ifree(
 	return 0;
 }
 
+/*
+ * Deltas for the block count can vary from 1 to very large, but lock contention
+ * only occurs on frequent small block count updates such as in the delayed
+ * allocation path for buffered writes (page a time updates). Hence we set
+ * a large batch count (1024) to minimise global counter updates except when
+ * we get near to ENOSPC and we have to be very accurate with our updates.
+ */
+#define XFS_FDBLOCKS_BATCH 1024
 int
 xfs_mod_fdblocks(
 	struct xfs_mount *mp,
@@ -1167,25 +1179,19 @@ xfs_mod_fdblocks(
 	 * Taking blocks away, need to be more accurate the closer we
 	 * are to zero.
 	 *
-	 * batch size is set to a maximum of 1024 blocks - if we are
-	 * allocating of freeing extents larger than this then we aren't
-	 * going to be hammering the counter lock so a lock per update
-	 * is not a problem.
-	 *
 	 * If the counter has a value of less than 2 * max batch size,
 	 * then make everything serialise as we are real close to
 	 * ENOSPC.
 	 */
-#define __BATCH 1024
-	if (percpu_counter_compare(&mp->m_fdblocks, 2 * __BATCH) < 0)
+	if (__percpu_counter_compare(&mp->m_fdblocks, 2 * XFS_FDBLOCKS_BATCH,
+				     XFS_FDBLOCKS_BATCH) < 0)
 		batch = 1;
 	else
-		batch = __BATCH;
-#undef __BATCH
+		batch = XFS_FDBLOCKS_BATCH;
 
 	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
-	if (percpu_counter_compare(&mp->m_fdblocks,
-				   XFS_ALLOC_SET_ASIDE(mp)) >= 0) {
+	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
+				     XFS_FDBLOCKS_BATCH) >= 0) {
 		/* we had space! */
 		return 0;
 	}
--
2.0.0

2015-05-13 13:59:23

by Tejun Heo

Subject: Re: [PATCH 1/2] percpu_counter: batch size aware __percpu_counter_compare()

Hello, Dave.

On Wed, May 13, 2015 at 09:52:33AM +1000, Dave Chinner wrote:
> @@ -116,6 +121,12 @@ static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
> return 0;
> }
>
> +static inline int
> +percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
> +{
> + return percpu_counter_compare(fbc, rhs);
> +}

I don't think this is right. Looks fine to me otherwise.

Thanks.

--
tejun

2015-05-14 00:56:15

by Dave Chinner

Subject: Re: [PATCH 1/2] percpu_counter: batch size aware __percpu_counter_compare()

On Wed, May 13, 2015 at 09:59:19AM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Wed, May 13, 2015 at 09:52:33AM +1000, Dave Chinner wrote:
> > @@ -116,6 +121,12 @@ static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
> > return 0;
> > }
> >
> > +static inline int
> > +percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
> > +{
> > + return percpu_counter_compare(fbc, rhs);
> > +}
>
> I don't think this is right. Looks fine to me otherwise.

Ah, no, it's not. My bad, stale patch. Corrected version below.

Cheers,

Dave.
--
Dave Chinner
[email protected]


percpu_counter: batch size aware __percpu_counter_compare()

From: Dave Chinner <[email protected]>

XFS uses non-standard batch sizes to avoid frequent global
counter updates on its allocated inode counters, as they increment
or decrement in batches of 64 inodes. Hence the standard percpu
counter batch of 32 means that the counter is effectively a global
counter. Currently XFS uses a batch size of 128 so that it doesn't
take the global lock on every single modification.

However, XFS also needs to compare accurately against zero, which
means we need to use percpu_counter_compare(). That function has a
hard-coded batch size of 32, so it spuriously fails to detect when
it is supposed to use precise comparisons, and hence the accounting
goes wrong.

Add __percpu_counter_compare() to take a custom batch size so we can
use it sanely in XFS and factor percpu_counter_compare() to use it.

Signed-off-by: Dave Chinner <[email protected]>
---
include/linux/percpu_counter.h | 13 ++++++++++++-
lib/percpu_counter.c | 6 +++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 50e5009..84a1094 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -41,7 +41,12 @@ void percpu_counter_destroy(struct percpu_counter *fbc);
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
-int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
+int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+
+static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
+{
+	return __percpu_counter_compare(fbc, rhs, percpu_counter_batch);
+}
 
 static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
@@ -116,6 +121,12 @@ static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 		return 0;
 }
 
+static inline int
+__percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
+{
+	return percpu_counter_compare(fbc, rhs);
+}
+
 static inline void
 percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 48144cd..f051d69 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -197,13 +197,13 @@ static int percpu_counter_hotcpu_callback(struct notifier_block *nb,
  * Compare counter against given value.
  * Return 1 if greater, 0 if equal and -1 if less
  */
-int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
+int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
 {
 	s64 count;
 
 	count = percpu_counter_read(fbc);
 	/* Check to see if rough count will be sufficient for comparison */
-	if (abs(count - rhs) > (percpu_counter_batch*num_online_cpus())) {
+	if (abs(count - rhs) > (batch * num_online_cpus())) {
 		if (count > rhs)
 			return 1;
 		else
@@ -218,7 +218,7 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 	else
 		return 0;
 }
-EXPORT_SYMBOL(percpu_counter_compare);
+EXPORT_SYMBOL(__percpu_counter_compare);
 
 static int __init percpu_counter_startup(void)
 {

2015-05-14 14:21:51

by Brian Foster

Subject: Re: [PATCH 2/2] xfs: inode and free block counters need to use __percpu_counter_compare

On Wed, May 13, 2015 at 09:52:34AM +1000, Dave Chinner wrote:
> From: Dave Chinner <[email protected]>
>
> Because the counters use a custom batch size, the comparison
> functions need to be aware of that batch size, otherwise the
> comparison does not work correctly. This leads to ASSERT failures
> on generic/027 like this:
>
> XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
> ------------[ cut here ]------------
> ....
> Call Trace:
> [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
> [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
> [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
> [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
> [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
> [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
> [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
> [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
> [<ffffffff811f3958>] evict+0xb8/0x190
> [<ffffffff811f433b>] iput+0x18b/0x1f0
> [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
> [<ffffffff811d548a>] ? filp_close+0x5a/0x80
> [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
> [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71
>
> This is a regression introduced by commit 501ab32 ("xfs: use generic
> percpu counters for inode counter").
>
> This patch fixes the same problem for both the inode counter and the
> free block counter in the superblocks.
>
> Signed-off-by: Dave Chinner <[email protected]>
> ---

Reviewed-by: Brian Foster <[email protected]>

> fs/xfs/xfs_mount.c | 34 ++++++++++++++++++++--------------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 02f827f..461e791 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1100,14 +1100,18 @@ xfs_log_sbcount(xfs_mount_t *mp)
> return xfs_sync_sb(mp, true);
> }
>
> +/*
> + * Deltas for the inode count are +/-64, hence we use a large batch size
> + * of 128 so we don't need to take the counter lock on every update.
> + */
> +#define XFS_ICOUNT_BATCH 128
> int
> xfs_mod_icount(
> struct xfs_mount *mp,
> int64_t delta)
> {
> - /* deltas are +/-64, hence the large batch size of 128. */
> - __percpu_counter_add(&mp->m_icount, delta, 128);
> - if (percpu_counter_compare(&mp->m_icount, 0) < 0) {
> + __percpu_counter_add(&mp->m_icount, delta, XFS_ICOUNT_BATCH);
> + if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) {
> ASSERT(0);
> percpu_counter_add(&mp->m_icount, -delta);
> return -EINVAL;
> @@ -1129,6 +1133,14 @@ xfs_mod_ifree(
> return 0;
> }
>
> +/*
> + * Deltas for the block count can vary from 1 to very large, but lock contention
> + * only occurs on frequent small block count updates such as in the delayed
> + * allocation path for buffered writes (page a time updates). Hence we set
> + * a large batch count (1024) to minimise global counter updates except when
> + * we get near to ENOSPC and we have to be very accurate with our updates.
> + */
> +#define XFS_FDBLOCKS_BATCH 1024
> int
> xfs_mod_fdblocks(
> struct xfs_mount *mp,
> @@ -1167,25 +1179,19 @@ xfs_mod_fdblocks(
> * Taking blocks away, need to be more accurate the closer we
> * are to zero.
> *
> - * batch size is set to a maximum of 1024 blocks - if we are
> - * allocating of freeing extents larger than this then we aren't
> - * going to be hammering the counter lock so a lock per update
> - * is not a problem.
> - *
> * If the counter has a value of less than 2 * max batch size,
> * then make everything serialise as we are real close to
> * ENOSPC.
> */
> -#define __BATCH 1024
> - if (percpu_counter_compare(&mp->m_fdblocks, 2 * __BATCH) < 0)
> + if (__percpu_counter_compare(&mp->m_fdblocks, 2 * XFS_FDBLOCKS_BATCH,
> + XFS_FDBLOCKS_BATCH) < 0)
> batch = 1;
> else
> - batch = __BATCH;
> -#undef __BATCH
> + batch = XFS_FDBLOCKS_BATCH;
>
> __percpu_counter_add(&mp->m_fdblocks, delta, batch);
> - if (percpu_counter_compare(&mp->m_fdblocks,
> - XFS_ALLOC_SET_ASIDE(mp)) >= 0) {
> + if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
> + XFS_FDBLOCKS_BATCH) >= 0) {
> /* we had space! */
> return 0;
> }
> --
> 2.0.0

2015-05-14 15:02:35

by Tejun Heo

Subject: Re: [PATCH 1/2] percpu_counter: batch size aware __percpu_counter_compare()

On Thu, May 14, 2015 at 10:55:53AM +1000, Dave Chinner wrote:
> percpu_counter: batch size aware __percpu_counter_compare()
>
> From: Dave Chinner <[email protected]>
>
> XFS uses non-standard batch sizes to avoid frequent global
> counter updates on its allocated inode counters, as they increment
> or decrement in batches of 64 inodes. Hence the standard percpu
> counter batch of 32 means that the counter is effectively a global
> counter. Currently XFS uses a batch size of 128 so that it doesn't
> take the global lock on every single modification.
>
> However, XFS also needs to compare accurately against zero, which
> means we need to use percpu_counter_compare(). That function has a
> hard-coded batch size of 32, so it spuriously fails to detect when
> it is supposed to use precise comparisons, and hence the accounting
> goes wrong.
>
> Add __percpu_counter_compare() to take a custom batch size so we can
> use it sanely in XFS and factor percpu_counter_compare() to use it.
>
> Signed-off-by: Dave Chinner <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Please feel free to route the patch however you see fit.

Thanks.

--
tejun