2023-04-06 15:30:21

by Chris Hyser

Subject: [PATCH v4] sched/numa: Fix divide by zero for sysctl_numa_balancing_scan_size.

Commit 6419265899d9 ("sched/fair: Fix division by zero
sysctl_numa_balancing_scan_size") prevented a divide by zero by using
sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
value of zero. When moved from a sysctl to a debugfs file, this checking
was lost.

This patch puts zero checking back in place.

Cc: [email protected]
Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Tested-by: Chen Yu <[email protected]>
Signed-off-by: Chris Hyser <[email protected]>
---
kernel/sched/debug.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1637b65ba07a..cc6a0172a598 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -278,6 +278,48 @@ static const struct file_operations sched_dynamic_fops = {

#endif /* CONFIG_PREEMPT_DYNAMIC */

+#ifdef CONFIG_NUMA_BALANCING
+
+static ssize_t sched_numa_scan_write(struct file *filp, const char __user *ubuf,
+				     size_t cnt, loff_t *ppos)
+{
+	int err;
+	unsigned int scan_size;
+
+	err = kstrtouint_from_user(ubuf, cnt, 10, &scan_size);
+	if (err)
+		return err;
+
+	if (!scan_size)
+		return -EINVAL;
+
+	sysctl_numa_balancing_scan_size = scan_size;
+
+	*ppos += cnt;
+	return cnt;
+}
+
+static int sched_numa_scan_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%d\n", sysctl_numa_balancing_scan_size);
+	return 0;
+}
+
+static int sched_numa_scan_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_numa_scan_show, NULL);
+}
+
+static const struct file_operations sched_numa_scan_fops = {
+	.open		= sched_numa_scan_open,
+	.write		= sched_numa_scan_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_NUMA_BALANCING */
+
__read_mostly bool sched_debug_verbose;

static const struct seq_operations sched_debug_sops;
@@ -332,7 +374,7 @@ static __init int sched_init_debug(void)
debugfs_create_u32("scan_delay_ms", 0644, numa, &sysctl_numa_balancing_scan_delay);
debugfs_create_u32("scan_period_min_ms", 0644, numa, &sysctl_numa_balancing_scan_period_min);
debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
- debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
+ debugfs_create_file("scan_size_mb", 0644, numa, NULL, &sched_numa_scan_fops);
debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold);
#endif

--
2.31.1


2023-04-29 16:41:06

by Peter Zijlstra

Subject: Re: [PATCH v4] sched/numa: Fix divide by zero for sysctl_numa_balancing_scan_size.

On Thu, Apr 06, 2023 at 11:26:33AM -0400, chris hyser wrote:
> Commit 6419265899d9 ("sched/fair: Fix division by zero
> sysctl_numa_balancing_scan_size") prevented a divide by zero by using
> sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
> value of zero. When moved from a sysctl to a debugfs file, this checking
> was lost.
>
> This patch puts zero checking back in place.
>
> Cc: [email protected]
> Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
> Tested-by: Chen Yu <[email protected]>
> Signed-off-by: Chris Hyser <[email protected]>

I suppose.. but is it really worth the hassle? I mean, this is debug
stuff, just don't write 0 in then?

If we do find we want this (why?!) then should we not invest in a better
debugfs_create_u32_minmax() or something so that we don't get to add 40+
lines for everything we want to add limits on?

2023-05-01 16:31:12

by Chris Hyser

Subject: Re: [PATCH v4] sched/numa: Fix divide by zero for sysctl_numa_balancing_scan_size.

On 4/29/23 10:56, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 11:26:33AM -0400, chris hyser wrote:
>> Commit 6419265899d9 ("sched/fair: Fix division by zero
>> sysctl_numa_balancing_scan_size") prevented a divide by zero by using
>> sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
>> value of zero. When moved from a sysctl to a debugfs file, this checking
>> was lost.
>>
>> This patch puts zero checking back in place.
>>
>> Cc: [email protected]
>> Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
>> Tested-by: Chen Yu <[email protected]>
>> Signed-off-by: Chris Hyser <[email protected]>
>
> I suppose.. but is it really worth the hassle? I mean, this is debug
> stuff, just don't write 0 in then?

My understanding of the history is that this was always debug; someone
found the divide by zero, and a convenient sysctl mechanism was used to
fix it. It also cleaned up a little compiler weirdness. I did not see
any justification in the discussion of that patch's inclusion other than
the nasty stack trace you get when the machine dies after writing a
zero. It is a major inconvenience, completely preventable, and
technically a regression, but as you point out, the new fix is a lot
more code.

In terms of actually wanting to fix this, I'm a bit confused. It clearly
was worth fixing the first time around (it has your sign-off), and the
only thing that has changed is that that fix no longer works.

>
> If we do find we want this (why?!) then should we not invest in a better
> debugfs_create_u32_minmax() or something so that we don't get to add 40+
> lines for everything we want to add limits on?

I will look at a way to greatly simplify the bounds checking here as you
suggest.


-chrish

2023-05-01 19:07:37

by Peter Zijlstra

Subject: Re: [PATCH v4] sched/numa: Fix divide by zero for sysctl_numa_balancing_scan_size.

On Mon, May 01, 2023 at 12:21:17PM -0400, chris hyser wrote:
> In terms of actually wanting to fix this, I'm a bit confused. It clearly was
> worth fixing the first time around (it has your sign-off), and the only
> thing that has changed is that that fix no longer works.

Well, the amount of effort to fix it has dramatically increased, 40+
extra lines vs 2 extra lines.

> > If we do find we want this (why?!) then should we not invest in a better
> > debugfs_create_u32_minmax() or something so that we don't get to add 40+
> > lines for everything we want to add limits on?
>
> I will look at a way to greatly simplify the bounds checking here as you
> suggest.

Thanks, that might make it all a lot nicer indeed.
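
For illustration only, the range check that a debugfs_create_u32_minmax()
style helper would have to perform could look roughly like the sketch
below. This is userspace C, not kernel code: struct bounded_u32 and
bounded_u32_set() are hypothetical names, strtoul() stands in for
kstrtouint_from_user(), and no such debugfs helper is assumed to exist.
Out-of-range values get -EINVAL, matching what the patch above returns
for zero.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical attribute: a u32 variable plus its inclusive bounds. */
struct bounded_u32 {
	unsigned int *val;
	unsigned int min;
	unsigned int max;
};

/*
 * Parse a decimal string (optionally newline-terminated) and store it
 * only if it lies in [min, max]. Returns 0 on success and -EINVAL on
 * malformed or out-of-range input; *val is untouched on failure.
 */
static int bounded_u32_set(const struct bounded_u32 *attr, const char *buf)
{
	char *end;
	unsigned long tmp;

	/* strtoul() would accept whitespace and a sign; insist on a digit. */
	if (*buf < '0' || *buf > '9')
		return -EINVAL;

	errno = 0;
	tmp = strtoul(buf, &end, 10);
	if (errno)
		return -EINVAL;

	/* Allow the single trailing newline that "echo" appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (tmp > UINT_MAX || tmp < attr->min || tmp > attr->max)
		return -EINVAL;

	*attr->val = (unsigned int)tmp;
	return 0;
}
```

With min set to 1 and max set to UINT_MAX, writing "0" is rejected and
the stored value is left alone, which is the behaviour the 40+ line
version implements by hand for scan_size_mb.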