2020-09-24 11:24:58

by Sumera Priyadarsini

Subject: [PATCH 0/2] Improve Coccinelle Parallelisation

Coccinelle utilises all available threads to implement parallelisation.
However, this results in a decrease in performance.

This patchset aims to improve performance by modifying coccicheck to
use at most one thread per core by default.

Sumera Priyadarsini (2):
scripts: coccicheck: Change default value for parallelism
Documentation: Coccinelle: Modify parallelisation information in docs

Documentation/dev-tools/coccinelle.rst | 4 ++--
scripts/coccicheck | 5 +++++
2 files changed, 7 insertions(+), 2 deletions(-)

--
2.25.1


2020-09-24 11:27:52

by Sumera Priyadarsini

Subject: [PATCH 1/2] scripts: coccicheck: Change default value for parallelism

By default, coccicheck utilizes all available threads to implement
parallelisation. However, when all available threads are used,
a decrease in performance is noted. The elapsed time is minimum
when at most one thread per core is used.

For example, on benchmarking the semantic patch kfree.cocci for
usb/serial using hyperfine, the outputs obtained for J=5 and J=2
are 1.32 and 1.90 times faster than those for J=10 and J=9
respectively for two separate runs. For the larger drivers/staging
directory, minimum elapsed time is obtained for J=3 which is 1.86
times faster than that for J=12. The optimal J value does not
exceed 6 in any of the test runs. The benchmarks are run on a machine
with 6 cores, with 2 threads per core, i.e., 12 hyperthreads in all.

To improve performance, modify coccicheck to use at most one
thread per core by default.
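
A benchmark of the kind described above could be set up roughly as follows.
This is only a sketch: the thread does not show the exact command that was
used, so the make invocation, the path to kfree.cocci and the helper name
bench_cmd are assumptions. It relies on hyperfine's --parameter-scan option,
which reruns the command once per value and substitutes it for {J}.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): bench_cmd DIR MAX_J
# Prints a hyperfine command that times coccicheck once for every
# J value from 1 to MAX_J on the given directory.
# Assumption: kfree.cocci lives at scripts/coccinelle/free/kfree.cocci
# and the command is run from the top of a kernel source tree.
bench_cmd() {
    dir=$1
    max_j=$2
    printf '%s\n' "hyperfine --parameter-scan J 1 $max_j 'make coccicheck COCCI=scripts/coccinelle/free/kfree.cocci M=$dir MODE=report J={J}'"
}

# 12 matches the hyperthread count of the machine described above.
bench_cmd drivers/usb/serial 12
```

The command is printed rather than executed so that it can be inspected
first; piping the output to sh from the kernel tree root would run it.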

Signed-off-by: Sumera Priyadarsini <[email protected]>

---
Changes in V2:
- Change commit message as suggested by Julia Lawall
Changes in V3:
- Use J/2 as optimal value for machines with more
than 8 hyperthreads as well.
---
scripts/coccicheck | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/scripts/coccicheck b/scripts/coccicheck
index e04d328210ac..a72aa6c037ff 100755
--- a/scripts/coccicheck
+++ b/scripts/coccicheck
@@ -75,8 +75,13 @@ else
OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
fi

+ # Use only one thread per core by default if hyperthreading is enabled
+ THREADS_PER_CORE=$(lscpu | grep "Thread(s) per core: " | tr -cd [:digit:])
if [ -z "$J" ]; then
NPROC=$(getconf _NPROCESSORS_ONLN)
+ if [ $THREADS_PER_CORE -gt 1 -a $NPROC -gt 2 ] ; then
+ NPROC=$((NPROC/2))
+ fi
else
NPROC="$J"
fi
--
2.25.1
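
The effect of the new condition can be checked without touching a real
machine. The sketch below restates the test from the diff as a function that
takes the two values as arguments; default_jobs is a hypothetical name used
only for illustration, and the logic follows the condition as posted in this
version of the patch (more than one thread per core and NPROC greater than 2).

```shell
#!/bin/sh
# Hypothetical helper (not part of scripts/coccicheck):
#   default_jobs THREADS_PER_CORE NPROC
# Prints the job count coccicheck would pick when J is unset,
# following the condition in the posted diff.
default_jobs() {
    threads_per_core=${1:-1}   # treat a missing value as "no SMT"
    nproc=$2
    if [ "$threads_per_core" -gt 1 ] && [ "$nproc" -gt 2 ]; then
        echo $((nproc / 2))    # one job per core, assuming 2 threads per core
    else
        echo "$nproc"          # small or non-SMT machine: use every CPU
    fi
}

# Worked values for a few machine shapes.
for pair in "1 4" "2 2" "2 4" "2 12"; do
    set -- $pair
    echo "threads/core=$1 nproc=$2 -> J=$(default_jobs "$1" "$2")"
done
```

On the 6-core, 12-hyperthread benchmark machine this gives J=6, which matches
the observation that the optimal J never exceeded 6. Note that the division
by 2 is only exact for two threads per core; machines with wider SMT would
still run more than one job per core.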

2020-09-24 11:29:56

by Sumera Priyadarsini

Subject: [PATCH 2/2] Documentation: Coccinelle: Modify parallelisation information in docs

This patchset modifies coccicheck to use at most one thread per core by
default for optimal performance. Modify documentation in coccinelle.rst
to reflect the same.

Signed-off-by: Sumera Priyadarsini <[email protected]>
---
Documentation/dev-tools/coccinelle.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst
index 74c5e6aeeff5..a27a4867018c 100644
--- a/Documentation/dev-tools/coccinelle.rst
+++ b/Documentation/dev-tools/coccinelle.rst
@@ -130,8 +130,8 @@ To enable verbose messages set the V= variable, for example::
Coccinelle parallelization
--------------------------

-By default, coccicheck tries to run as parallel as possible. To change
-the parallelism, set the J= variable. For example, to run across 4 CPUs::
+By default, coccicheck uses at most only one thread per core of the system.
+To change the parallelism, set the J= variable. For example, to run across 4 CPUs::

make coccicheck MODE=report J=4

--
2.25.1

2020-09-27 20:57:04

by Julia Lawall

Subject: Re: [Cocci] [PATCH 2/2] Documentation: Coccinelle: Modify parallelisation information in docs



On Thu, 24 Sep 2020, Sumera Priyadarsini wrote:

> This patchset modifies coccicheck to use at most one thread per core by
> default for optimal performance. Modify documentation in coccinelle.rst
> to reflect the same.

It would be good for the documentation to mention that this only occurs if
the machine has more than two cores (and more than 4 hardware threads).

julia


>
> Signed-off-by: Sumera Priyadarsini <[email protected]>
> ---
> Documentation/dev-tools/coccinelle.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst
> index 74c5e6aeeff5..a27a4867018c 100644
> --- a/Documentation/dev-tools/coccinelle.rst
> +++ b/Documentation/dev-tools/coccinelle.rst
> @@ -130,8 +130,8 @@ To enable verbose messages set the V= variable, for example::
> Coccinelle parallelization
> --------------------------
>
> -By default, coccicheck tries to run as parallel as possible. To change
> -the parallelism, set the J= variable. For example, to run across 4 CPUs::
> +By default, coccicheck uses at most only one thread per core of the system.
> +To change the parallelism, set the J= variable. For example, to run across 4 CPUs::
>
> make coccicheck MODE=report J=4
>
> --
> 2.25.1
>
> _______________________________________________
> Cocci mailing list
> [email protected]
> https://systeme.lip6.fr/mailman/listinfo/cocci
>

2020-09-27 20:59:49

by Julia Lawall

Subject: Re: [Cocci] [PATCH 1/2] scripts: coccicheck: Change default value for parallelism



On Thu, 24 Sep 2020, Sumera Priyadarsini wrote:

> By default, coccicheck utilizes all available threads to implement
> parallelisation. However, when all available threads are used,
> a decrease in performance is noted. The elapsed time is minimum
> when at most one thread per core is used.
>
> For example, on benchmarking the semantic patch kfree.cocci for
> usb/serial using hyperfine, the outputs obtained for J=5 and J=2
> are 1.32 and 1.90 times faster than those for J=10 and J=9
> respectively for two separate runs. For the larger drivers/staging
> directory, minimum elapsed time is obtained for J=3 which is 1.86
> times faster than that for J=12. The optimal J value does not
> exceed 6 in any of the test runs. The benchmarks are run on a machine
> with 6 cores, with 2 threads per core, i.e., 12 hyperthreads in all.
>
> To improve performance, modify coccicheck to use at most only
> one thread per core by default.
>
> Signed-off-by: Sumera Priyadarsini <[email protected]>

I have applied this one, so just the patch on the documentation needs to
be improved.

julia

>
> ---
> Changes in V2:
> - Change commit message as suggested by Julia Lawall
> Changes in V3:
> - Use J/2 as optimal value for machines with more
> than 8 hyperthreads as well.
> ---
> scripts/coccicheck | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/scripts/coccicheck b/scripts/coccicheck
> index e04d328210ac..a72aa6c037ff 100755
> --- a/scripts/coccicheck
> +++ b/scripts/coccicheck
> @@ -75,8 +75,13 @@ else
> OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
> fi
>
> + # Use only one thread per core by default if hyperthreading is enabled
> + THREADS_PER_CORE=$(lscpu | grep "Thread(s) per core: " | tr -cd [:digit:])
> if [ -z "$J" ]; then
> NPROC=$(getconf _NPROCESSORS_ONLN)
> + if [ $THREADS_PER_CORE -gt 1 -a $NPROC -gt 2 ] ; then
> + NPROC=$((NPROC/2))
> + fi
> else
> NPROC="$J"
> fi
> --
> 2.25.1
>