Some of the error paths inside do_map_benchmark() are not handled properly.
The node id validation is also insufficient, and there is an out-of-bounds
access in the NUMA_NO_NODE case.
Address these issues with the following patches.
Changelog
v2:
* Rework kthread-related error handling patch (thanks to Robin Murphy) and
merge patches 1/2 and 2/2 from v1 into a single patch - 1/4 in v2.
* Three additional patches:
* * Avoid needless copy_to_user if benchmark fails (suggested by Barry Song).
* * Fix node id validation (found while testing the previous ones).
* * Handle NUMA_NO_NODE correctly.
v1:
* https://lore.kernel.org/linux-iommu/[email protected]/
Fedor Pchelkin (4):
dma-mapping: benchmark: fix up kthread-related error handling
dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
dma-mapping: benchmark: fix node id validation
dma-mapping: benchmark: handle NUMA_NO_NODE correctly
kernel/dma/map_benchmark.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
--
2.45.0
If do_map_benchmark() has failed, there is nothing useful to copy back
to userspace.
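For context (outside the hunk below), the copy back to userspace happens
after the switch statement, so returning early on error skips it. A rough,
paraphrased sketch of the resulting flow in map_benchmark_ioctl() - not a
verbatim excerpt:

	switch (cmd) {
	case DMA_MAP_BENCHMARK:
		/* ... validate parameters, save old_dma_mask ... */
		ret = do_map_benchmark(map);

		/* restore the dma_mask changed by the benchmark */
		dma_set_mask(map->dev, old_dma_mask);

		if (ret)
			return ret;	/* nothing useful to report back */
		break;
	default:
		return -EINVAL;
	}

	/* reached only on success */
	if (copy_to_user(argp, &map->bparam, sizeof(map->bparam)))
		return -EFAULT;

	return ret;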
Suggested-by: Barry Song <[email protected]>
Signed-off-by: Fedor Pchelkin <[email protected]>
---
kernel/dma/map_benchmark.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 2478957cf9f8..a6edb1ef98c8 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -256,6 +256,9 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
* dma_mask changed by benchmark
*/
dma_set_mask(map->dev, old_dma_mask);
+
+ if (ret)
+ return ret;
break;
default:
return -EINVAL;
--
2.45.0
kthread creation failure is not handled properly inside do_map_benchmark().
The put_task_struct() calls on the error path are supposed to balance the
get_task_struct() calls which only happen after all the kthreads have been
successfully created. Roll back using kthread_stop() for the already created
kthreads in case of such a failure.

In the normal case, call kthread_stop_put() to gracefully stop the kthreads
and put their task refcounts. This should be done for all started kthreads.
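To make the intended end state easier to follow, here is a condensed,
paraphrased sketch of the thread setup/teardown in do_map_benchmark() with
this patch applied (names and elided parts are approximate, not a verbatim
excerpt):

	for (i = 0; i < threads; i++) {
		tsk[i] = kthread_create_on_node(...);
		if (IS_ERR(tsk[i])) {
			ret = PTR_ERR(tsk[i]);
			/* roll back only the kthreads created so far */
			while (--i >= 0)
				kthread_stop(tsk[i]);
			goto out;
		}
	}

	/* ... get_task_struct() + wake_up_process() each thread, sleep ... */

	/* stop every started kthread and drop its task reference */
	for (i = 0; i < threads; i++) {
		int kthread_ret = kthread_stop_put(tsk[i]);

		if (kthread_ret)
			ret = kthread_ret;
	}

	if (ret)
		goto out;
	/* ... */
out:
	put_device(map->dev);	/* no separate put_task_struct() loop anymore */
	kfree(tsk);
	return ret;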
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Suggested-by: Robin Murphy <[email protected]>
Signed-off-by: Fedor Pchelkin <[email protected]>
---
kernel/dma/map_benchmark.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 02205ab53b7e..2478957cf9f8 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -118,6 +118,8 @@ static int do_map_benchmark(struct map_benchmark_data *map)
if (IS_ERR(tsk[i])) {
pr_err("create dma_map thread failed\n");
ret = PTR_ERR(tsk[i]);
+ while (--i >= 0)
+ kthread_stop(tsk[i]);
goto out;
}
@@ -139,13 +141,17 @@ static int do_map_benchmark(struct map_benchmark_data *map)
msleep_interruptible(map->bparam.seconds * 1000);
- /* wait for the completion of benchmark threads */
+ /* wait for the completion of all started benchmark threads */
for (i = 0; i < threads; i++) {
- ret = kthread_stop(tsk[i]);
- if (ret)
- goto out;
+ int kthread_ret = kthread_stop_put(tsk[i]);
+
+ if (kthread_ret)
+ ret = kthread_ret;
}
+ if (ret)
+ goto out;
+
loops = atomic64_read(&map->loops);
if (likely(loops > 0)) {
u64 map_variance, unmap_variance;
@@ -170,8 +176,6 @@ static int do_map_benchmark(struct map_benchmark_data *map)
}
out:
- for (i = 0; i < threads; i++)
- put_task_struct(tsk[i]);
put_device(map->dev);
kfree(tsk);
return ret;
--
2.45.0
While validating node ids in map_benchmark_ioctl(), node_possible() may
be given an invalid argument outside of the [0, MAX_NUMNODES-1] range,
leading to:
BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.
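For reference, the wild read follows from how node_possible() is
implemented; paraphrased from include/linux/nodemask.h, the call roughly
expands as:

	node_possible(node)
	  -> node_state(node, N_POSSIBLE)
	  -> node_isset(node, node_states[N_POSSIBLE])
	  -> test_bit(node, node_states[N_POSSIBLE].bits)

The underlying bitmap is only MAX_NUMNODES bits wide, so test_bit() with a
node id outside [0, MAX_NUMNODES-1] reads unrelated memory - exactly the
wild-memory-access KASAN complains about above.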
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
---
kernel/dma/map_benchmark.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index a6edb1ef98c8..9f6c15f3f168 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -212,7 +212,8 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
}
if (map->bparam.node != NUMA_NO_NODE &&
- !node_possible(map->bparam.node)) {
+ (map->bparam.node < 0 || map->bparam.node >= MAX_NUMNODES ||
+ !node_possible(map->bparam.node))) {
pr_err("invalid numa node\n");
return -EINVAL;
}
--
2.45.0
cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark()
resulting in the following sanitizer report:
UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask [64][1]'
CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
ubsan_epilogue (lib/ubsan.c:232)
__ubsan_handle_out_of_bounds (lib/ubsan.c:429)
cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline]
do_map_benchmark (kernel/dma/map_benchmark.c:104)
map_benchmark_ioctl (kernel/dma/map_benchmark.c:246)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Use cpumask_of_node() in place when binding a kernel thread to a cpuset
of a particular node.
Note that the provided node id is checked inside map_benchmark_ioctl().
It's just a NUMA_NO_NODE case which is not handled properly later.
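For reference, the non-debug x86 helper looks roughly like this
(paraphrased from arch/x86/include/asm/topology.h, not a verbatim excerpt):

	static inline const struct cpumask *cpumask_of_node(int node)
	{
		return node_to_cpumask_map[node];
	}

so merely evaluating it with node == NUMA_NO_NODE (-1) is already an
out-of-bounds array subscript, even though the resulting pointer is never
dereferenced. Calling it only under the node != NUMA_NO_NODE check avoids
that.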
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <[email protected]>
---
kernel/dma/map_benchmark.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 9f6c15f3f168..4950e0b622b1 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -101,7 +101,6 @@ static int do_map_benchmark(struct map_benchmark_data *map)
struct task_struct **tsk;
int threads = map->bparam.threads;
int node = map->bparam.node;
- const cpumask_t *cpu_mask = cpumask_of_node(node);
u64 loops;
int ret = 0;
int i;
@@ -124,7 +123,7 @@ static int do_map_benchmark(struct map_benchmark_data *map)
}
if (node != NUMA_NO_NODE)
- kthread_bind_mask(tsk[i], cpu_mask);
+ kthread_bind_mask(tsk[i], cpumask_of_node(node));
}
/* clear the old value in the previous benchmark */
--
2.45.0
On Sat, May 4, 2024 at 11:48 PM Fedor Pchelkin <[email protected]> wrote:
>
> cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark()
> resulting in the following sanitizer report:
>
> UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
> index -1 is out of range for type 'cpumask [64][1]'
> CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:117)
> ubsan_epilogue (lib/ubsan.c:232)
> __ubsan_handle_out_of_bounds (lib/ubsan.c:429)
> cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline]
> do_map_benchmark (kernel/dma/map_benchmark.c:104)
> map_benchmark_ioctl (kernel/dma/map_benchmark.c:246)
> full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
> __x64_sys_ioctl (fs/ioctl.c:890)
> do_syscall_64 (arch/x86/entry/common.c:83)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>
> Use cpumask_of_node() in place when binding a kernel thread to a cpuset
> of a particular node.
>
> Note that the provided node id is checked inside map_benchmark_ioctl().
> It's just a NUMA_NO_NODE case which is not handled properly later.
>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
> Signed-off-by: Fedor Pchelkin <[email protected]>
We've never actually accessed the content of cpu_mask when node
equals NUMA_NO_NODE because we always have the condition if (node !=
NUMA_NO_NODE) before accessing it. Therefore, the reported failure isn't
actually an issue. However, the underlying concept is correct, hence,
Acked-by: Barry Song <[email protected]>
I also noticed that some arches have fixed up cpumask_of_node() itself:

#define cpumask_of_node(node)	((node) == -1 ?		\
				 cpu_all_mask :		\
				 &hub_data(node)->h_cpus)
> ---
> kernel/dma/map_benchmark.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> index 9f6c15f3f168..4950e0b622b1 100644
> --- a/kernel/dma/map_benchmark.c
> +++ b/kernel/dma/map_benchmark.c
> @@ -101,7 +101,6 @@ static int do_map_benchmark(struct map_benchmark_data *map)
> struct task_struct **tsk;
> int threads = map->bparam.threads;
> int node = map->bparam.node;
> - const cpumask_t *cpu_mask = cpumask_of_node(node);
> u64 loops;
> int ret = 0;
> int i;
> @@ -124,7 +123,7 @@ static int do_map_benchmark(struct map_benchmark_data *map)
> }
>
> if (node != NUMA_NO_NODE)
> - kthread_bind_mask(tsk[i], cpu_mask);
> + kthread_bind_mask(tsk[i], cpumask_of_node(node));
> }
>
> /* clear the old value in the previous benchmark */
> --
> 2.45.0
>
On 2024-05-04 12:47 pm, Fedor Pchelkin wrote:
> kthread creation failure is not handled properly inside do_map_benchmark().
> The put_task_struct() calls on the error path are supposed to balance the
> get_task_struct() calls which only happen after all the kthreads have been
> successfully created. Roll back using kthread_stop() for the already created
> kthreads in case of such a failure.
>
> In the normal case, call kthread_stop_put() to gracefully stop the kthreads
> and put their task refcounts. This should be done for all started kthreads.
Although strictly there were two overlapping bugs here, I'd agree that
it really doesn't seem worth the bother of trying to distinguish them.
I'm far from a kthread expert, but as best I can tell this looks like it
achieves a sensible final state. Modulo one nit below,
Reviewed-by: Robin Murphy <[email protected]>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
> Suggested-by: Robin Murphy <[email protected]>
> Signed-off-by: Fedor Pchelkin <[email protected]>
> ---
> kernel/dma/map_benchmark.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> index 02205ab53b7e..2478957cf9f8 100644
> --- a/kernel/dma/map_benchmark.c
> +++ b/kernel/dma/map_benchmark.c
> @@ -118,6 +118,8 @@ static int do_map_benchmark(struct map_benchmark_data *map)
> if (IS_ERR(tsk[i])) {
> pr_err("create dma_map thread failed\n");
> ret = PTR_ERR(tsk[i]);
> + while (--i >= 0)
> + kthread_stop(tsk[i]);
I think this means we'd end up actually starting the threads purely to
get them to see the KTHREAD_SHOULD_STOP flag and exit again? Not that
I'm too fussed about optimising an unexpected error path, however I
can't help but wonder if we might only need a put_task_struct() if we
can safely assume that the threads have never been started at this point.
Thanks,
Robin.
> goto out;
> }
>
> @@ -139,13 +141,17 @@ static int do_map_benchmark(struct map_benchmark_data *map)
>
> msleep_interruptible(map->bparam.seconds * 1000);
>
> - /* wait for the completion of benchmark threads */
> + /* wait for the completion of all started benchmark threads */
> for (i = 0; i < threads; i++) {
> - ret = kthread_stop(tsk[i]);
> - if (ret)
> - goto out;
> + int kthread_ret = kthread_stop_put(tsk[i]);
> +
> + if (kthread_ret)
> + ret = kthread_ret;
> }
>
> + if (ret)
> + goto out;
> +
> loops = atomic64_read(&map->loops);
> if (likely(loops > 0)) {
> u64 map_variance, unmap_variance;
> @@ -170,8 +176,6 @@ static int do_map_benchmark(struct map_benchmark_data *map)
> }
>
> out:
> - for (i = 0; i < threads; i++)
> - put_task_struct(tsk[i]);
> put_device(map->dev);
> kfree(tsk);
> return ret;
On 2024-05-04 12:47 pm, Fedor Pchelkin wrote:
> While validating node ids in map_benchmark_ioctl(), node_possible() may
> be given an invalid argument outside of the [0, MAX_NUMNODES-1] range,
> leading to:
>
> BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
> Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
> CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:117)
> kasan_report (mm/kasan/report.c:603)
> kasan_check_range (mm/kasan/generic.c:189)
> variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
> arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
> _test_bit (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
> node_state (include/linux/nodemask.h:423) [inline]
> map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
> full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
> __x64_sys_ioctl (fs/ioctl.c:890)
> do_syscall_64 (arch/x86/entry/common.c:83)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>
> Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
> special valid case meaning that benchmarking kthreads won't be bound to a
> cpuset of a given node.
Makes sense, and the style of the check looks like a reasonably common
pattern.
Reviewed-by: Robin Murphy <[email protected]>
> Found by Linux Verification Center (linuxtesting.org).
>
> Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
> Signed-off-by: Fedor Pchelkin <[email protected]>
> ---
> kernel/dma/map_benchmark.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> index a6edb1ef98c8..9f6c15f3f168 100644
> --- a/kernel/dma/map_benchmark.c
> +++ b/kernel/dma/map_benchmark.c
> @@ -212,7 +212,8 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
> }
>
> if (map->bparam.node != NUMA_NO_NODE &&
> - !node_possible(map->bparam.node)) {
> + (map->bparam.node < 0 || map->bparam.node >= MAX_NUMNODES ||
> + !node_possible(map->bparam.node))) {
> pr_err("invalid numa node\n");
> return -EINVAL;
> }
On 2024-05-04 12:47 pm, Fedor Pchelkin wrote:
> If do_map_benchmark() has failed, there is nothing useful to copy back
> to userspace.
I guess there could be some valid partial data if for instance it failed
due to OOM in the middle of running, but the standard tool is still
going to ignore that if the ioctl() returns an error, so meh.
Acked-by: Robin Murphy <[email protected]>
> Suggested-by: Barry Song <[email protected]>
> Signed-off-by: Fedor Pchelkin <[email protected]>
> ---
> kernel/dma/map_benchmark.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> index 2478957cf9f8..a6edb1ef98c8 100644
> --- a/kernel/dma/map_benchmark.c
> +++ b/kernel/dma/map_benchmark.c
> @@ -256,6 +256,9 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
> * dma_mask changed by benchmark
> */
> dma_set_mask(map->dev, old_dma_mask);
> +
> + if (ret)
> + return ret;
> break;
> default:
> return -EINVAL;
Hi,
Thanks for the review of the series!
Robin Murphy wrote:
> On 2024-05-04 12:47 pm, Fedor Pchelkin wrote:
> > kthread creation failure is not handled properly inside do_map_benchmark().
> > The put_task_struct() calls on the error path are supposed to balance the
> > get_task_struct() calls which only happen after all the kthreads have been
> > successfully created. Roll back using kthread_stop() for the already created
> > kthreads in case of such a failure.
> >
> > In the normal case, call kthread_stop_put() to gracefully stop the kthreads
> > and put their task refcounts. This should be done for all started kthreads.
>
> Although strictly there were two overlapping bugs here, I'd agree that
> it really doesn't seem worth the bother of trying to distinguish them.
> I'm far from a kthread expert, but as best I can tell this looks like it
> achieves a sensible final state. Modulo one nit below,
>
> Reviewed-by: Robin Murphy <[email protected]>
>
> > Found by Linux Verification Center (linuxtesting.org).
> >
> > Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
> > Suggested-by: Robin Murphy <[email protected]>
> > Signed-off-by: Fedor Pchelkin <[email protected]>
> > ---
> > kernel/dma/map_benchmark.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> > index 02205ab53b7e..2478957cf9f8 100644
> > --- a/kernel/dma/map_benchmark.c
> > +++ b/kernel/dma/map_benchmark.c
> > @@ -118,6 +118,8 @@ static int do_map_benchmark(struct map_benchmark_data *map)
> > if (IS_ERR(tsk[i])) {
> > pr_err("create dma_map thread failed\n");
> > ret = PTR_ERR(tsk[i]);
> > + while (--i >= 0)
> > + kthread_stop(tsk[i]);
>
> I think this means we'd end up actually starting the threads purely to
> get them to see the KTHREAD_SHOULD_STOP flag and exit again? Not that
> I'm too fussed about optimising an unexpected error path, however I
> can't help but wonder if we might only need a put_task_struct() if we
> can safely assume that the threads have never been started at this point.
The threads have already been started at this point by
kthread_create_on_node(), but are put to uninterruptible sleep: please see
the kthread() function. They just have to be explicitly woken up, find that
the KTHREAD_SHOULD_STOP flag is set, and perform do_exit() in order to
terminate and release all resources. The thread_fn won't be executed in
that case.

I feel there is no more convenient way of doing this than calling
kthread_stop(). And the comment for kthread_stop() actually implies that it
is intended to work also just after kthread creation.

Calling put_task_struct() there would hit WARN_ON(!tsk->exit_state).
I'd say the last call to this function should be made after (or as a result
of) do_exit().
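For reference, a heavily simplified sketch of the relevant part of
kthread() (paraphrased from kernel/kthread.c, not a verbatim excerpt):

	static int kthread(void *_create)
	{
		/* ... setup ... */

		/* park until the first wake_up_process() */
		__set_current_state(TASK_UNINTERRUPTIBLE);
		schedule();

		ret = -EINTR;
		if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags))
			ret = threadfn(data);	/* skipped when stop was already requested */

		kthread_exit(ret);	/* ends up in do_exit() */
	}

So kthread_stop() on a freshly created, never-woken kthread simply wakes it
up, lets it observe KTHREAD_SHOULD_STOP and exit without ever running
thread_fn.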
Thanks,
applied to the dma-mapping tree for 6.10.