2016-04-18 06:34:05

by Davidlohr Bueso

Subject: [PATCH -tip 1/3] locking/pvqspinlock: Fix div by 0 in qstats

While playing with such statistics I ran into the following
splat on a VM when opening pv_hash_hops:

[ 25.267962] divide error: 0000 [#1] SMP
...
[ 25.268807] CPU: 17 PID: 1018 Comm: cat Not tainted 4.6.0-rc3-debug+ #2
[ 25.268853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151208_145348-sheep05 04/01/2014
[ 25.268930] task: ffff88029a10c040 ti: ffff8800b1e1c000 task.ti: ffff8800b1e1c000
[ 25.268979] RIP: 0010:[<ffffffff810b61fe>] [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
[ 25.269043] RSP: 0018:ffff8800b1e1fde8 EFLAGS: 00010246
[ 25.269081] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 25.269128] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81911640
[ 25.269175] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
[ 25.269223] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000df00
[ 25.269270] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800b1e1ff28
[ 25.269319] FS: 00007f3bd2f88700(0000) GS:ffff8802b5240000(0000) knlGS:0000000000000000
[ 25.269372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.269411] CR2: 00007f3bd2ddc008 CR3: 000000002c5f0000 CR4: 00000000000406e0
[ 25.269467] Stack:
[ 25.269487] 00007f3bd2ddd000 0000000000020000 ffffffff811cad7c ffffffff8119750c
[ 25.269549] ffff88002d08e000 ffffea0000b07140 ffffea0000b07140 ffff88002d08e000
[ 25.269609] ffffffff8118d3b9 ffffffff811937a9 00000000102a46cb ffff8802992beb00
[ 25.269670] Call Trace:
[ 25.269698] [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
[ 25.269745] [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
[ 25.269791] [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
[ 25.269834] [<ffffffff811937a9>] ? do_mmap+0x449/0x550
[ 25.269872] [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
[ 25.270506] [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
[ 25.271093] [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
[ 25.271677] [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
[ 25.272294] [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 25.272904] Code: 00 48 8b 74 24 50 65 48 33 34 25 28 00 00 00 0f 85 b7 00 00 00 48 83 c4 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 ee 4c 89 f0 31 d2 <48> f7 f6 48 d1 ed 48 8d 5c 24 10 48 8d 3c 92 48 89 c1 31 d2 48
[ 25.274347] RIP [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
[ 25.274885] RSP <ffff8800b1e1fde8>
[ 25.275457] ---[ end trace 8558569eb04480ab ]---

Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
as the qstat_pv_latency_wake case already does. If nothing else, a
zero count can come from resetting the statistics, so having 0 kicks
is perfectly valid in this context.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/qspinlock_stat.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index eb2a2c9bc3fc..d734b7502001 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -136,10 +136,12 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf,
}

if (counter == qstat_pv_hash_hops) {
- u64 frac;
+ u64 frac = 0;

- frac = 100ULL * do_div(stat, kicks);
- frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+ if (kicks) {
+ frac = 100ULL * do_div(stat, kicks);
+ frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+ }

/*
* Return a X.XX decimal number
--
2.8.1


2016-04-18 06:34:08

by Davidlohr Bueso

Subject: [PATCH -tip 3/3] locking/pvqspinlock: Robustify init_qspinlock_stat()

Specifically around the debugfs file creation calls. I have no idea
if they could ever fail, but this is core code (debug aside), so
let's at least check the return values and report anything fishy.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/qspinlock_stat.h | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index 72722334237a..ddcd653c942c 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -225,12 +225,18 @@ static int __init init_qspinlock_stat(void)
* performance.
*/
for (i = 0; i < qstat_num; i++)
- debugfs_create_file(qstat_names[i], 0400, d_qstat,
- (void *)(long)i, &fops_qstat);
+ if (!debugfs_create_file(qstat_names[i], 0400, d_qstat,
+ (void *)(long)i, &fops_qstat))
+ goto fail;
+
+ if (!debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
+ (void *)(long)qstat_reset_cnts, &fops_qstat))
+ goto fail;

- debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
- (void *)(long)qstat_reset_cnts, &fops_qstat);
return 0;
+fail:
+ debugfs_remove_recursive(d_qstat);
+ return 1;
}
fs_initcall(init_qspinlock_stat);

--
2.8.1

2016-04-18 06:34:03

by Davidlohr Bueso

Subject: [PATCH -tip 2/3] locking/pvqspinlock: Avoid double resetting of stats

... remove the redundant second iteration; this is most
likely a copy/paste buglet.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/qspinlock_stat.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index d734b7502001..72722334237a 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -191,8 +191,6 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf,

for (i = 0 ; i < qstat_num; i++)
WRITE_ONCE(ptr[i], 0);
- for (i = 0 ; i < qstat_num; i++)
- WRITE_ONCE(ptr[i], 0);
}
return count;
}
--
2.8.1

2016-04-18 16:16:59

by Davidlohr Bueso

Subject: Re: [PATCH -tip 3/3] locking/pvqspinlock: Robustify init_qspinlock_stat()

On Sun, 17 Apr 2016, Davidlohr Bueso wrote:

>diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
>index 72722334237a..ddcd653c942c 100644
>--- a/kernel/locking/qspinlock_stat.h
>+++ b/kernel/locking/qspinlock_stat.h
>@@ -225,12 +225,18 @@ static int __init init_qspinlock_stat(void)
> * performance.
> */
> for (i = 0; i < qstat_num; i++)
>- debugfs_create_file(qstat_names[i], 0400, d_qstat,
>- (void *)(long)i, &fops_qstat);
>+ if (!debugfs_create_file(qstat_names[i], 0400, d_qstat,
>+ (void *)(long)i, &fops_qstat))
>+ goto fail;
>+
>+ if (!debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
>+ (void *)(long)qstat_reset_cnts, &fops_qstat))
>+ goto fail;
>
>- debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
>- (void *)(long)qstat_reset_cnts, &fops_qstat);
> return 0;
>+fail:
>+ debugfs_remove_recursive(d_qstat);
>+ return 1;

Hmm, actually if debugfs_create_dir() fails first, we still return 0.
Waiman, did you mean to return 1 there?

Thanks,
Davidlohr

2016-04-18 19:34:45

by Waiman Long

Subject: Re: [PATCH -tip 1/3] locking/pvqspinlock: Fix div by 0 in qstats

On 04/18/2016 02:31 AM, Davidlohr Bueso wrote:
> While playing with such statistics I ran into the following
> splat on a VM when opening pv_hash_hops:
>
> [ 25.267962] divide error: 0000 [#1] SMP
> ...
> [ 25.268807] CPU: 17 PID: 1018 Comm: cat Not tainted 4.6.0-rc3-debug+ #2
> [ 25.268853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151208_145348-sheep05 04/01/2014
> [ 25.268930] task: ffff88029a10c040 ti: ffff8800b1e1c000 task.ti: ffff8800b1e1c000
> [ 25.268979] RIP: 0010:[<ffffffff810b61fe>] [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
> [ 25.269043] RSP: 0018:ffff8800b1e1fde8 EFLAGS: 00010246
> [ 25.269081] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 25.269128] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81911640
> [ 25.269175] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
> [ 25.269223] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000df00
> [ 25.269270] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800b1e1ff28
> [ 25.269319] FS: 00007f3bd2f88700(0000) GS:ffff8802b5240000(0000) knlGS:0000000000000000
> [ 25.269372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 25.269411] CR2: 00007f3bd2ddc008 CR3: 000000002c5f0000 CR4: 00000000000406e0
> [ 25.269467] Stack:
> [ 25.269487] 00007f3bd2ddd000 0000000000020000 ffffffff811cad7c ffffffff8119750c
> [ 25.269549] ffff88002d08e000 ffffea0000b07140 ffffea0000b07140 ffff88002d08e000
> [ 25.269609] ffffffff8118d3b9 ffffffff811937a9 00000000102a46cb ffff8802992beb00
> [ 25.269670] Call Trace:
> [ 25.269698] [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
> [ 25.269745] [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
> [ 25.269791] [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
> [ 25.269834] [<ffffffff811937a9>] ? do_mmap+0x449/0x550
> [ 25.269872] [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
> [ 25.270506] [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
> [ 25.271093] [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
> [ 25.271677] [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
> [ 25.272294] [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
> [ 25.272904] Code: 00 48 8b 74 24 50 65 48 33 34 25 28 00 00 00 0f 85 b7 00 00 00 48 83 c4 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 ee 4c 89 f0 31 d2<48> f7 f6 48 d1 ed 48 8d 5c 24 10 48 8d 3c 92 48 89 c1 31 d2 48
> [ 25.274347] RIP [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
> [ 25.274885] RSP<ffff8800b1e1fde8>
> [ 25.275457] ---[ end trace 8558569eb04480ab ]---
>
> Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
> similarly to what the qstat_pv_latency_wake case does, as if nothing
> else, this can come from resetting the statistics, thus having 0 kicks
> should be quite valid in this context.
>
> Signed-off-by: Davidlohr Bueso<[email protected]>
> ---
> kernel/locking/qspinlock_stat.h | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
> index eb2a2c9bc3fc..d734b7502001 100644
> --- a/kernel/locking/qspinlock_stat.h
> +++ b/kernel/locking/qspinlock_stat.h
> @@ -136,10 +136,12 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf,
> }
>
> if (counter == qstat_pv_hash_hops) {
> - u64 frac;
> + u64 frac = 0;
>
> - frac = 100ULL * do_div(stat, kicks);
> - frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
> + if (kicks) {
> + frac = 100ULL * do_div(stat, kicks);
> + frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
> + }
>
> /*
> * Return a X.XX decimal number

Reviewed-by: Waiman Long <[email protected]>

Cheers,
Longman

2016-04-18 19:40:46

by Waiman Long

Subject: Re: [PATCH -tip 3/3] locking/pvqspinlock: Robustify init_qspinlock_stat()

On 04/18/2016 02:31 AM, Davidlohr Bueso wrote:
> Specifically around the debugfs file creation calls,
> I have no idea if they could ever possibly fail, but
> this is core code (debug aside) so lets at least
> check the return value and inform anything fishy.
>
> Signed-off-by: Davidlohr Bueso<[email protected]>
> ---
> kernel/locking/qspinlock_stat.h | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
> index 72722334237a..ddcd653c942c 100644
> --- a/kernel/locking/qspinlock_stat.h
> +++ b/kernel/locking/qspinlock_stat.h
> @@ -225,12 +225,18 @@ static int __init init_qspinlock_stat(void)
> * performance.
> */
> for (i = 0; i< qstat_num; i++)
> - debugfs_create_file(qstat_names[i], 0400, d_qstat,
> - (void *)(long)i,&fops_qstat);
> + if (!debugfs_create_file(qstat_names[i], 0400, d_qstat,
> + (void *)(long)i,&fops_qstat))
> + goto fail;
> +
> + if (!debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
> + (void *)(long)qstat_reset_cnts,&fops_qstat))
> + goto fail;
>
> - debugfs_create_file(qstat_names[qstat_reset_cnts], 0200, d_qstat,
> - (void *)(long)qstat_reset_cnts,&fops_qstat);
> return 0;
> +fail:
> + debugfs_remove_recursive(d_qstat);
> + return 1;
> }
> fs_initcall(init_qspinlock_stat);
>

Reviewed-by: Waiman Long <[email protected]>

Cheers,
Longman

2016-04-18 20:14:41

by Waiman Long

Subject: Re: [PATCH -tip 2/3] locking/pvqspinlock: Avoid double resetting of stats

On 04/18/2016 02:31 AM, Davidlohr Bueso wrote:
> ... remove the redundant second iteration, this is most
> likely a copy/past buglet.
>
> Signed-off-by: Davidlohr Bueso<[email protected]>
> ---
> kernel/locking/qspinlock_stat.h | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
> index d734b7502001..72722334237a 100644
> --- a/kernel/locking/qspinlock_stat.h
> +++ b/kernel/locking/qspinlock_stat.h
> @@ -191,8 +191,6 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf,
>
> for (i = 0 ; i< qstat_num; i++)
> WRITE_ONCE(ptr[i], 0);
> - for (i = 0 ; i< qstat_num; i++)
> - WRITE_ONCE(ptr[i], 0);
> }
> return count;
> }

The double write is done on purpose. As the statistics count update
isn't atomic, there is a very small chance (p) that clearing the count
may happen in the middle of a read-modify-write bus transaction. Doing
a double write reduces that chance further to p^2. This isn't failsafe,
but I think it is good enough.

However, I don't mind eliminating the double write either, as we can
always view the statistics counts after a reset to make sure they are
properly cleared.

Cheers,
Longman

Subject: [tip:locking/urgent] locking/pvqspinlock: Fix division by zero in qstat_read()

Commit-ID: 6687659568e2ec5b3ac24b39c5d26ce8b9d90434
Gitweb: http://git.kernel.org/tip/6687659568e2ec5b3ac24b39c5d26ce8b9d90434
Author: Davidlohr Bueso <[email protected]>
AuthorDate: Sun, 17 Apr 2016 23:31:41 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 19 Apr 2016 10:49:19 +0200

locking/pvqspinlock: Fix division by zero in qstat_read()

While playing with the qstat statistics (in <debugfs>/qlockstat/) I ran into
the following splat on a VM when opening pv_hash_hops:

divide error: 0000 [#1] SMP
...
RIP: 0010:[<ffffffff810b61fe>] [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
...
Call Trace:
[<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
[<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
[<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
[<ffffffff811937a9>] ? do_mmap+0x449/0x550
[<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
[<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
[<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
[<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
[<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
as the qstat_pv_latency_wake case already does. If nothing else, a
zero count can come from resetting the statistics, so having 0 kicks
is perfectly valid in this context.

Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Waiman Long <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/qspinlock_stat.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index eb2a2c9..d734b75 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -136,10 +136,12 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf,
}

if (counter == qstat_pv_hash_hops) {
- u64 frac;
+ u64 frac = 0;

- frac = 100ULL * do_div(stat, kicks);
- frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+ if (kicks) {
+ frac = 100ULL * do_div(stat, kicks);
+ frac = DIV_ROUND_CLOSEST_ULL(frac, kicks);
+ }

/*
* Return a X.XX decimal number