2021-08-24 15:45:33

by Marcelo Tosatti

[permalink] [raw]
Subject: [patch V3 2/8] add prctl task isolation prctl docs and samples

Add documentation and userspace sample code for prctl
task isolation interface.

Signed-off-by: Marcelo Tosatti <[email protected]>

---
Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
samples/Kconfig | 7
samples/Makefile | 1
samples/task_isolation/Makefile | 9 +
samples/task_isolation/task_isol.c | 83 +++++++++
samples/task_isolation/task_isol.h | 9 +
samples/task_isolation/task_isol_userloop.c | 56 ++++++
7 files changed, 376 insertions(+)

Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/userspace-api/task_isolation.rst
@@ -0,0 +1,281 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Task isolation prctl interface
+===============================
+
+Certain types of applications benefit from running uninterrupted by
+background OS activities. Realtime systems and high-bandwidth networking
+applications with user-space drivers can fall into the category.
+
+To create an OS noise free environment for the application, this
+interface allows userspace to inform the kernel the start and
+end of the latency sensitive application section (with configurable
+system behaviour for that section).
+
+Note: the prctl interface is independent of nohz_full=.
+
+The prctl options are:
+
+
+ - PR_ISOL_FEAT: Retrieve supported features.
+ - PR_ISOL_GET: Retrieve task isolation parameters.
+ - PR_ISOL_SET: Set task isolation parameters.
+ - PR_ISOL_CTRL_GET: Retrieve task isolation state.
+ - PR_ISOL_CTRL_SET: Set task isolation state.
+ - PR_ISOL_GET_INT: Retrieve internal parameters.
+ - PR_ISOL_SET_INT: Retrieve internal parameters.
+
+
+
+Inheritance of the isolation parameters and state, across
+fork(2) and clone(2), can be changed via
+PR_ISOL_GET_INT/PR_ISOL_SET_INT.
+
+At a high-level, task isolation is divided in two steps:
+
+1. Configuration.
+2. Activation.
+
+Section "Userspace support" describes how to use
+task isolation.
+
+In terms of the interface, the sequence of steps to activate
+task isolation are:
+
+1. Retrieve supported task isolation features (PR_ISOL_FEAT).
+2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
+3. Activate or deactivate task isolation features
+ (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
+4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT).
+
+This interface is based on ideas and code from the
+task isolation patchset from Alex Belits:
+https://lwn.net/Articles/816298/
+
+--------------------
+Feature description
+--------------------
+
+ - ``ISOL_F_QUIESCE``
+
+ This feature allows quiescing select kernel activities on
+ return from system calls.
+
+---------------------
+Interface description
+---------------------
+
+**PR_ISOL_FEAT**:
+
+ Returns the supported features and feature
+ capabilities, as a bitmask::
+
+ prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies whether to return
+ supported features (if zero), or feature capabilities
+ (if not zero). Possible non-zero values for 'feat' are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ Returns a bitmask containing which kernel
+ activities are supported for quiescing.
+
+ Features and its capabilities are defined at include/uapi/linux/task_isolation.h.
+
+**PR_ISOL_GET**:
+
+ Retrieve task isolation feature configuration.
+ The general format is::
+
+ prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies whether to return
+ configured features (if zero), or individual feature
+ configuration (if not zero). Possible non-zero
+ values for 'feat' are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ Returns a bitmask containing which kernel
+ activities are enabled for quiescing.
+
+
+**PR_ISOL_SET**:
+
+ Configures task isolation features. The general format is::
+
+ prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies which feature to configure.
+ Possible values for feat are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ The 'arg3' argument is a bitmask specifying which
+ kernel activities to quiesce. Possible bit sets are:
+
+ - ``ISOL_F_QUIESCE_VMSTATS``
+
+ VM statistics are maintained in per-CPU counters to
+ improve performance. When a CPU modifies a VM statistic,
+ this modification is kept in the per-CPU counter.
+ Certain activities require a global count, which
+ involves requesting each CPU to flush its local counters
+ to the global VM counters.
+
+ This flush is implemented via a workqueue item, which
+ might schedule a workqueue on isolated CPUs.
+
+ To avoid this interruption, task isolation can be
+ configured to, upon return from system calls, synchronize
+ the per-CPU counters to global counters, thus avoiding
+ the interruption.
+
+ To ensure the application returns to userspace
+ with no modified per-CPU counters, its necessary to
+ use mlockall() in addition to this isolcpus flag.
+
+**PR_ISOL_CTRL_GET**:
+
+ Retrieve task isolation control.
+
+ prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
+
+ Returns which isolation features are active.
+
+**PR_ISOL_CTRL_SET**:
+
+ Activates/deactivates task isolation control.
+
+ prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+
+ The 'mask' argument specifies which features
+ to activate (bit set) or deactivate (bit clear).
+
+ For ISOL_F_QUIESCE, quiescing of background activities
+ happens on return to userspace from the
+ prctl(PR_ISOL_CTRL_SET) call, and on return from
+ subsequent system calls.
+
+ Quiescing can be adjusted (while active) by
+ prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
+
+**PR_ISOL_GET_INT**:
+
+ Retrieves task isolation internal parameters.
+
+ The general format is::
+
+ prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
+
+ The 'cmd' argument specifies which parameter to configure.
+ Possible values for cmd are:
+
+ - ``INHERIT_CFG``:
+
+ Retrieve inheritance configuration.
+
+ The 'arg3' argument is a pointer to a struct
+ inherit_control::
+
+ struct task_isol_inherit_control {
+ __u8 inherit_mask;
+ __u8 pad[7];
+ };
+
+ See PR_ISOL_SET_INT description below for meaning
+ of structure fields.
+
+**PR_ISOL_SET_INT**:
+
+ Sets task isolation internal parameters.
+
+ The general format is::
+
+ prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
+
+ The 'cmd' argument specifies which parameter to configure.
+ Possible values for cmd are:
+
+ - ``INHERIT_CFG``:
+
+ Set inheritance configuration when a new task
+ is created via fork and clone.
+
+ The 'arg3' argument is a pointer to a struct
+ inherit_control::
+
+ struct task_isol_inherit_control {
+ __u8 inherit_mask;
+ __u8 pad[7];
+ };
+
+ inherit_mask is a bitmask that specifies which part
+ of task isolation to be inherited:
+
+ - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
+ This is the stated written via prctl(PR_ISOL_SET, ...).
+
+ - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
+ (requires ISOL_INHERIT_CONF to be set). The new task
+ should behave, right after fork/clone, in the same manner
+ as the parent task _after_ it executed:
+
+ prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+
+ with a valid mask.
+
+==================
+Userspace support
+==================
+
+Task isolation is divided in two main steps: configuration and activation.
+
+Each step can be performed by an external tool or the latency sensitive
+application itself. util-linux contains the "chisol" tool for this
+purpose.
+
+This results in three combinations:
+
+1. Both configuration and activation performed by the
+latency sensitive application.
+Allows fine grained control of what task isolation
+features are enabled and when (see samples section below).
+
+2. Only activation can be performed by the latency sensitive app
+(and configuration performed by chisol).
+This allows the admin/user to control task isolation parameters,
+and applications have to be modified only once.
+
+3. Configuration and activation performed by an external tool.
+This allows unmodified applications to take advantage of
+task isolation. Activation is performed by the "-a" option
+of chisol.
+
+========
+Examples
+========
+
+The ``samples/task_isolation/`` directory contains sample
+code.
+
+This is a snippet of code to activate task isolation if
+it has been previously configured (by chisol for example)::
+
+ #include <sys/prctl.h>
+
+ #ifdef PR_ISOL_GET
+ ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
+ if (ret != -1) {
+ unsigned long mask = ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET");
+ return ret;
+ }
+ }
+ #endif
+
Index: linux-2.6/samples/task_isolation/task_isol.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include <errno.h>
+#include "task_isol.h"
+
+#ifdef PR_ISOL_FEAT
+int task_isol_setup(void)
+{
+ int ret;
+ int errnosv;
+
+ /* Retrieve supported task isolation features */
+ ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_FEAT");
+ return ret;
+ }
+ printf("supported features bitmask: 0x%x\n", ret);
+
+ /* Retrieve supported ISOL_F_QUIESCE bits */
+ ret = prctl(PR_ISOL_FEAT, ISOL_F_QUIESCE, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_FEAT (ISOL_F_QUIESCE)");
+ return ret;
+ }
+ printf("supported ISOL_F_QUIESCE bits: 0x%x\n", ret);
+
+ /* Do not configure task isolation attributes if already set */
+ ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
+ errnosv = errno;
+ if (ret != -1) {
+ printf("Task isolation parameters already configured!\n");
+ return ret;
+ }
+ if (ret == -1 && errnosv != ENODATA) {
+ perror("prctl PR_ISOL_GET");
+ return ret;
+ }
+ ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS,
+ 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_SET");
+ return ret;
+ }
+ return ISOL_F_QUIESCE;
+}
+
+int task_isol_ctrl_set(unsigned long mask)
+{
+ int ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
+ return -1;
+ }
+
+ return 0;
+}
+
+#else
+
+int task_isol_setup(void)
+{
+ return 0;
+}
+
+int task_isol_ctrl_set(unsigned long mask)
+{
+ return 0;
+}
+#endif
+
+
Index: linux-2.6/samples/task_isolation/task_isol.h
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_ISOL_H
+#define __TASK_ISOL_H
+
+int task_isol_setup(void);
+
+int task_isol_ctrl_set(unsigned long mask);
+
+#endif /* __TASK_ISOL_H */
Index: linux-2.6/samples/task_isolation/task_isol_userloop.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol_userloop.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include "task_isol.h"
+
+int main(void)
+{
+ int ret;
+ void *buf = malloc(4096);
+ unsigned long mask;
+
+ memset(buf, 1, 4096);
+ ret = mlock(buf, 4096);
+ if (ret) {
+ perror("mlock");
+ return EXIT_FAILURE;
+ }
+
+ ret = task_isol_setup();
+ if (ret)
+ return EXIT_FAILURE;
+ mask = ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
+ return EXIT_FAILURE;
+ }
+
+#define NR_LOOPS 999999999
+#define NR_PRINT 100000000
+ /* busy loop */
+ while (ret < NR_LOOPS) {
+ memset(buf, 0, 4096);
+ ret = ret+1;
+ if (!(ret % NR_PRINT))
+ printf("loops=%d of %d\n", ret, NR_LOOPS);
+ }
+
+ ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (0)");
+ exit(0);
+ }
+
+ return EXIT_SUCCESS;
+}
+
Index: linux-2.6/samples/Kconfig
===================================================================
--- linux-2.6.orig/samples/Kconfig
+++ linux-2.6/samples/Kconfig
@@ -223,4 +223,11 @@ config SAMPLE_WATCH_QUEUE
Build example userspace program to use the new mount_notify(),
sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function.

+config SAMPLE_TASK_ISOLATION
+ bool "task isolation sample"
+ depends on CC_CAN_LINK && HEADERS_INSTALL
+ help
+ Build example userspace program to use prctl task isolation
+ interface.
+
endif # SAMPLES
Index: linux-2.6/samples/Makefile
===================================================================
--- linux-2.6.orig/samples/Makefile
+++ linux-2.6/samples/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/
subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
+subdir-$(CONFIG_SAMPLE_TASK_ISOLATION) += task_isolation
Index: linux-2.6/samples/task_isolation/Makefile
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+userprogs-always-y += task_isol_userloop
+task_isol_userloop-objs := task_isol.o task_isol_userloop.o
+
+userccflags += -I usr/include
+
+
+#$(CC) $^ -o $@



2021-08-26 10:06:34

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Tue, Aug 24, 2021 at 12:24:25PM -0300, Marcelo Tosatti wrote:
> Add documentation and userspace sample code for prctl
> task isolation interface.
>
> Signed-off-by: Marcelo Tosatti <[email protected]>
>
> ---
> Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
> samples/Kconfig | 7
> samples/Makefile | 1
> samples/task_isolation/Makefile | 9 +
> samples/task_isolation/task_isol.c | 83 +++++++++
> samples/task_isolation/task_isol.h | 9 +
> samples/task_isolation/task_isol_userloop.c | 56 ++++++
> 7 files changed, 376 insertions(+)
>
> Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> ===================================================================
> --- /dev/null
> +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> @@ -0,0 +1,281 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Task isolation prctl interface
> +===============================
> +
> +Certain types of applications benefit from running uninterrupted by
> +background OS activities. Realtime systems and high-bandwidth networking
> +applications with user-space drivers can fall into the category.
> +
> +To create an OS noise free environment for the application, this
> +interface allows userspace to inform the kernel the start and
> +end of the latency sensitive application section (with configurable
> +system behaviour for that section).
> +
> +Note: the prctl interface is independent of nohz_full=.
> +
> +The prctl options are:
> +
> +
> + - PR_ISOL_FEAT: Retrieve supported features.
> + - PR_ISOL_GET: Retrieve task isolation parameters.
> + - PR_ISOL_SET: Set task isolation parameters.
> + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> + - PR_ISOL_CTRL_SET: Set task isolation state.
> + - PR_ISOL_GET_INT: Retrieve internal parameters.
> + - PR_ISOL_SET_INT: Retrieve internal parameters.

There should be some short summary here to explain the difference
between parameter, state, and internal parameter. I personally have
no clue so far.

> +
> +
> +Inheritance of the isolation parameters and state, across
> +fork(2) and clone(2), can be changed via
> +PR_ISOL_GET_INT/PR_ISOL_SET_INT.
> +
> +At a high-level, task isolation is divided in two steps:
> +
> +1. Configuration.
> +2. Activation.
> +
> +Section "Userspace support" describes how to use
> +task isolation.
> +
> +In terms of the interface, the sequence of steps to activate
> +task isolation are:
> +
> +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> +3. Activate or deactivate task isolation features
> + (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> +4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT).
> +
> +This interface is based on ideas and code from the
> +task isolation patchset from Alex Belits:
> +https://lwn.net/Articles/816298/
> +
> +--------------------
> +Feature description
> +--------------------
> +
> + - ``ISOL_F_QUIESCE``
> +
> + This feature allows quiescing select kernel activities on
> + return from system calls.
> +
> +---------------------
> +Interface description
> +---------------------
> +
> +**PR_ISOL_FEAT**:
> +
> + Returns the supported features and feature
> + capabilities, as a bitmask::
> +
> + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> +
> + The 'feat' argument specifies whether to return
> + supported features (if zero), or feature capabilities
> + (if not zero). Possible non-zero values for 'feat' are:
> +
> + - ``ISOL_F_QUIESCE``:
> +
> + Returns a bitmask containing which kernel
> + activities are supported for quiescing.
> +
> + Features and its capabilities are defined at
> include/uapi/linux/task_isolation.h.

Preferably have feat a parameter name. We never know if we want
to extend it in the future.

> +
> +**PR_ISOL_GET**:
> +
> + Retrieve task isolation feature configuration.
> + The general format is::
> +
> + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> +
> + The 'feat' argument specifies whether to return
> + configured features (if zero), or individual feature
> + configuration (if not zero).

You might need to elaborate a bit on the "feat" behaviour difference.

> + values for 'feat' are:
> +
> + - ``ISOL_F_QUIESCE``:
> +
> + Returns a bitmask containing which kernel
> + activities are enabled for quiescing.
> +
> +
> +**PR_ISOL_SET**:
> +
> + Configures task isolation features. The general format is::
> +
> + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> +
> + The 'feat' argument specifies which feature to configure.
> + Possible values for feat are:
> +
> + - ``ISOL_F_QUIESCE``:
> +
> + The 'arg3' argument is a bitmask specifying which
> + kernel activities to quiesce. Possible bit sets are:
> +
> + - ``ISOL_F_QUIESCE_VMSTATS``
> +
> + VM statistics are maintained in per-CPU counters to
> + improve performance. When a CPU modifies a VM statistic,
> + this modification is kept in the per-CPU counter.
> + Certain activities require a global count, which
> + involves requesting each CPU to flush its local counters
> + to the global VM counters.
> +
> + This flush is implemented via a workqueue item, which
> + might schedule a workqueue on isolated CPUs.
> +
> + To avoid this interruption, task isolation can be
> + configured to, upon return from system calls, synchronize
> + the per-CPU counters to global counters, thus avoiding
> + the interruption.
> +
> + To ensure the application returns to userspace
> + with no modified per-CPU counters, its necessary to
> + use mlockall() in addition to this isolcpus flag.

So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
on all subsequent return to userspace, right?

> +
> +**PR_ISOL_CTRL_GET**:
> +
> + Retrieve task isolation control.
> +
> + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> +
> + Returns which isolation features are active.
> +
> +**PR_ISOL_CTRL_SET**:
> +
> + Activates/deactivates task isolation control.
> +
> + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> +
> + The 'mask' argument specifies which features
> + to activate (bit set) or deactivate (bit clear).
> +
> + For ISOL_F_QUIESCE, quiescing of background activities
> + happens on return to userspace from the
> + prctl(PR_ISOL_CTRL_SET) call, and on return from
> + subsequent system calls.

Now I'm lost again on the difference with PR_ISOL_SET

> +
> + Quiescing can be adjusted (while active) by
> + prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> +
> +**PR_ISOL_GET_INT**:
> +
> + Retrieves task isolation internal parameters.
> +
> + The general format is::
> +
> + prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> +
> + The 'cmd' argument specifies which parameter to configure.
> + Possible values for cmd are:
> +
> + - ``INHERIT_CFG``:
> +
> + Retrieve inheritance configuration.
> +
> + The 'arg3' argument is a pointer to a struct
> + inherit_control::
> +
> + struct task_isol_inherit_control {
> + __u8 inherit_mask;
> + __u8 pad[7];
> + };
> +
> + See PR_ISOL_SET_INT description below for meaning
> + of structure fields.
> +
> +**PR_ISOL_SET_INT**:
> +
> + Sets task isolation internal parameters.
> +
> + The general format is::
> +
> + prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> +
> + The 'cmd' argument specifies which parameter to configure.
> + Possible values for cmd are:
> +
> + - ``INHERIT_CFG``:
> +
> + Set inheritance configuration when a new task
> + is created via fork and clone.
> +
> + The 'arg3' argument is a pointer to a struct
> + inherit_control::
> +
> + struct task_isol_inherit_control {
> + __u8 inherit_mask;
> + __u8 pad[7];
> + };
> +
> + inherit_mask is a bitmask that specifies which part
> + of task isolation to be inherited:
> +
> + - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> + This is the stated written via prctl(PR_ISOL_SET, ...).
> +
> + - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> + (requires ISOL_INHERIT_CONF to be set). The new task
> + should behave, right after fork/clone, in the same manner
> + as the parent task _after_ it executed:
> +
> + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> +
> + with a valid mask.

I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
Arguably having a whole prctl for that makes it easier to extend. But then
PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
noise, right?

Thanks.

2021-08-26 12:39:49

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

Hi Frederic,

On Thu, Aug 26, 2021 at 11:59:58AM +0200, Frederic Weisbecker wrote:
> On Tue, Aug 24, 2021 at 12:24:25PM -0300, Marcelo Tosatti wrote:
> > Add documentation and userspace sample code for prctl
> > task isolation interface.
> >
> > Signed-off-by: Marcelo Tosatti <[email protected]>
> >
> > ---
> > Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
> > samples/Kconfig | 7
> > samples/Makefile | 1
> > samples/task_isolation/Makefile | 9 +
> > samples/task_isolation/task_isol.c | 83 +++++++++
> > samples/task_isolation/task_isol.h | 9 +
> > samples/task_isolation/task_isol_userloop.c | 56 ++++++
> > 7 files changed, 376 insertions(+)
> >
> > Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> > @@ -0,0 +1,281 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Task isolation prctl interface
> > +===============================
> > +
> > +Certain types of applications benefit from running uninterrupted by
> > +background OS activities. Realtime systems and high-bandwidth networking
> > +applications with user-space drivers can fall into the category.
> > +
> > +To create an OS noise free environment for the application, this
> > +interface allows userspace to inform the kernel the start and
> > +end of the latency sensitive application section (with configurable
> > +system behaviour for that section).
> > +
> > +Note: the prctl interface is independent of nohz_full=.
> > +
> > +The prctl options are:
> > +
> > +
> > + - PR_ISOL_FEAT: Retrieve supported features.
> > + - PR_ISOL_GET: Retrieve task isolation parameters.
> > + - PR_ISOL_SET: Set task isolation parameters.
> > + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > + - PR_ISOL_CTRL_SET: Set task isolation state.
> > + - PR_ISOL_GET_INT: Retrieve internal parameters.
> > + - PR_ISOL_SET_INT: Retrieve internal parameters.
>
> There should be some short summary here to explain the difference
> between parameter, state, and internal parameter. I personally have
> no clue so far.

Yes, those have been written without clear definitions and can be
improved (it makes sense to me, so please indicate what is not
clear to you). So:

* "Feature": a generic name for a task isolation feature.
Examples of features could be logging, new operating modes (syscalls
disallowed), userspace notifications, etc. One feature is quiescing.

* "Parameter": a specific choice from a given set of possible choices
that dictate how the particular feature in question should act.

* "Internal parameter": A parameter (as in above) but not related to
task isolation features themselves, but to "internal characteristics"
(well, there is only one example of internal parameter so far
and that is inheritance across clone/fork).

Maybe "internal parameter" is a bad name and something different should
be used instead ?

Should i add the description aboves to the document file?

> > +Inheritance of the isolation parameters and state, across
> > +fork(2) and clone(2), can be changed via
> > +PR_ISOL_GET_INT/PR_ISOL_SET_INT.
> > +
> > +At a high-level, task isolation is divided in two steps:
> > +
> > +1. Configuration.
> > +2. Activation.
> > +
> > +Section "Userspace support" describes how to use
> > +task isolation.
> > +
> > +In terms of the interface, the sequence of steps to activate
> > +task isolation are:
> > +
> > +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> > +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> > +3. Activate or deactivate task isolation features
> > + (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> > +4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT).
> > +
> > +This interface is based on ideas and code from the
> > +task isolation patchset from Alex Belits:
> > +https://lwn.net/Articles/816298/
> > +
> > +--------------------
> > +Feature description
> > +--------------------
> > +
> > + - ``ISOL_F_QUIESCE``
> > +
> > + This feature allows quiescing select kernel activities on
> > + return from system calls.
> > +
> > +---------------------
> > +Interface description
> > +---------------------
> > +
> > +**PR_ISOL_FEAT**:
> > +
> > + Returns the supported features and feature
> > + capabilities, as a bitmask::
> > +
> > + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies whether to return
> > + supported features (if zero), or feature capabilities
> > + (if not zero). Possible non-zero values for 'feat' are:
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + Returns a bitmask containing which kernel
> > + activities are supported for quiescing.
> > +
> > + Features and its capabilities are defined at
> > include/uapi/linux/task_isolation.h.
>
> Preferably have feat a parameter name. We never know if we want
> to extend it in the future.

It is a parameter name:

prctl(PR_ISOL_FEAT, feat-A, arg3, arg4, arg5);

prctl(PR_ISOL_FEAT, feat-B, arg3, arg4, arg5);

And yes, the idea is that new features can be added.

So unless i misunderstood you, there are no changes necessary here.

> > +
> > +**PR_ISOL_GET**:
> > +
> > + Retrieve task isolation feature configuration.
> > + The general format is::
> > +
> > + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies whether to return
> > + configured features (if zero), or individual feature
> > + configuration (if not zero).
>
> You might need to elaborate a bit on the "feat" behaviour difference.

Not sure what you mean? There is only one "feat" yet, which is
ISOL_F_QUIESCE:

> > + values for 'feat' are:
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + Returns a bitmask containing which kernel
> > + activities are enabled for quiescing.

If one were to add a new feature, he would add:

- ``ISOL_F_FEATURE2``:

Returns a ...

Unclear what improvement are you suggesting?

> > +**PR_ISOL_SET**:
> > +
> > + Configures task isolation features. The general format is::
> > +
> > + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies which feature to configure.
> > + Possible values for feat are:
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + The 'arg3' argument is a bitmask specifying which
> > + kernel activities to quiesce. Possible bit sets are:
> > +
> > + - ``ISOL_F_QUIESCE_VMSTATS``
> > +
> > + VM statistics are maintained in per-CPU counters to
> > + improve performance. When a CPU modifies a VM statistic,
> > + this modification is kept in the per-CPU counter.
> > + Certain activities require a global count, which
> > + involves requesting each CPU to flush its local counters
> > + to the global VM counters.
> > +
> > + This flush is implemented via a workqueue item, which
> > + might schedule a workqueue on isolated CPUs.
> > +
> > + To avoid this interruption, task isolation can be
> > + configured to, upon return from system calls, synchronize
> > + the per-CPU counters to global counters, thus avoiding
> > + the interruption.
> > +
> > + To ensure the application returns to userspace
> > + with no modified per-CPU counters, its necessary to
> > + use mlockall() in addition to this isolcpus flag.
>
> So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
> on all subsequent return to userspace, right?

Yes. Hum, i think i dropped that clarification (by mistake). Will re-add
it.

> > +
> > +**PR_ISOL_CTRL_GET**:
> > +
> > + Retrieve task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > +
> > + Returns which isolation features are active.
> > +
> > +**PR_ISOL_CTRL_SET**:
> > +
> > + Activates/deactivates task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > +
> > + The 'mask' argument specifies which features
> > + to activate (bit set) or deactivate (bit clear).
> > +
> > + For ISOL_F_QUIESCE, quiescing of background activities
> > + happens on return to userspace from the
> > + prctl(PR_ISOL_CTRL_SET) call, and on return from
> > + subsequent system calls.
>
> Now I'm lost again on the difference with PR_ISOL_SET

PR_ISOL_SET configures the features parameters.

PR_ISOL_CTRL_SET _activates_ task isolation.

You earlier wrote:

"I would rather decouple the above with, for modes:

PR_TASK_ISOLATION_SET
PR_TASK_ISOLATION_GET

And for oneshot requests:

PR_TASK_ISOLATION_REQUEST"

Now we have PR_ISOL_SET/PR_ISOL_GET (to configure the parameters of
task isolation features), and PR_ISOL_CTRL_SET to activate that
isolation (and we pass a bitmask to PR_ISOL_CTRL_SET indicating what
features should be active). How the particular features behave
is determined at PR_ISOL_SET time.

This allows the administrator to, via chisol, configure task isolation:

+
+ if (argc - optind < 1) {
+ warnx(_("bad usage"));
+ errtryhelp(EXIT_FAILURE);
+ }
+
+ if (quiesce_act_mask) {
+ ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_act_mask, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_SET");
+ exit(EXIT_FAILURE);
+ }
+ }

And the application, which has to be modified only once with:

+#ifdef PR_ISOL_GET
+ ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
+ if (ret != -1) {
+ unsigned long mask = ret;
+
+ TEST0(prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0));
+ }
+#endif
+
frc(&ts2);
do {
workload_fn(t->dst_buf, t->src_buf, g.workload_mem_size);

Makes sense?

(BTW, please let me know how the wording would be so it is
easier to understand... if it is not simple to you, it won't
be simple to others as well).

> > +
> > + Quiescing can be adjusted (while active) by
> > + prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> > +
> > +**PR_ISOL_GET_INT**:
> > +
> > + Retrieves task isolation internal parameters.
> > +
> > + The general format is::
> > +
> > + prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> > +
> > + The 'cmd' argument specifies which parameter to configure.
> > + Possible values for cmd are:
> > +
> > + - ``INHERIT_CFG``:
> > +
> > + Retrieve inheritance configuration.
> > +
> > + The 'arg3' argument is a pointer to a struct
> > + inherit_control::
> > +
> > + struct task_isol_inherit_control {
> > + __u8 inherit_mask;
> > + __u8 pad[7];
> > + };
> > +
> > + See PR_ISOL_SET_INT description below for meaning
> > + of structure fields.
> > +
> > +**PR_ISOL_SET_INT**:
> > +
> > + Sets task isolation internal parameters.
> > +
> > + The general format is::
> > +
> > + prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> > +
> > + The 'cmd' argument specifies which parameter to configure.
> > + Possible values for cmd are:
> > +
> > + - ``INHERIT_CFG``:
> > +
> > + Set inheritance configuration when a new task
> > + is created via fork and clone.
> > +
> > + The 'arg3' argument is a pointer to a struct
> > + inherit_control::
> > +
> > + struct task_isol_inherit_control {
> > + __u8 inherit_mask;
> > + __u8 pad[7];
> > + };
> > +
> > + inherit_mask is a bitmask that specifies which part
> > + of task isolation to be inherited:
> > +
> > + - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> > + This is the stated written via prctl(PR_ISOL_SET, ...).
> > +
> > + - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> > + (requires ISOL_INHERIT_CONF to be set). The new task
> > + should behave, right after fork/clone, in the same manner
> > + as the parent task _after_ it executed:
> > +
> > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > +
> > + with a valid mask.
>
> I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
> Arguably having a whole prctl for that makes it easier to extend. But then
> PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
> noise, right?

It has to be called before PR_ISOL_CTRL_SET, yes.

Decided on a separate prctl because the inheritance control
is not a feature itself: it acts on all features (or how task isolation
features are inherited across fork/clone).

So yes, first idea was to "lets add this to PR_ISOL_SET", but then it
became weird to have something that controls the features as a feature
itself... It would be ISOL_F_INHERIT_CONTROL. Can change to that, if
you prefer.

2021-08-26 19:17:45

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Thu, 26 Aug 2021, Marcelo Tosatti wrote:

> Decided on a separate prctl because the inheritance control
> is not a feature itself: it acts on all features (or how task isolation
> features are inherited across fork/clone).

I am having a hard time imagening use cases for such a feature since I
usally see special code sections optimized to run without OS jitter and
not whole processes. AFAICT You would not want to have any of these on
because they cause performance regression if you must do syscalls related
to process startup and termination.

Since we are adding docs: Could we have some sample use cases for
when these features are useful?

2021-08-26 20:40:01

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Thu, Aug 26, 2021 at 09:15:57PM +0200, Christoph Lameter wrote:
> On Thu, 26 Aug 2021, Marcelo Tosatti wrote:
>
> > Decided on a separate prctl because the inheritance control
> > is not a feature itself: it acts on all features (or how task isolation
> > features are inherited across fork/clone).
>
> I am having a hard time imagening use cases for such a feature since I
> usally see special code sections optimized to run without OS jitter and
> not whole processes. AFAICT You would not want to have any of these on
> because they cause performance regression if you must do syscalls related
> to process startup and termination.

The documentation has:

+==================
+Userspace support
+==================
+
+Task isolation is divided in two main steps: configuration and activation.
+
+Each step can be performed by an external tool or the latency sensitive
+application itself. util-linux contains the "chisol" tool for this
+purpose.
+
+This results in three combinations:
+
+1. Both configuration and activation performed by the
+latency sensitive application.
+Allows fine grained control of what task isolation
+features are enabled and when (see samples section below).
+
+2. Only activation can be performed by the latency sensitive app
+(and configuration performed by chisol).
+This allows the admin/user to control task isolation parameters,
+and applications have to be modified only once.
+
+3. Configuration and activation performed by an external tool.
+This allows unmodified applications to take advantage of
+task isolation. Activation is performed by the "-a" option
+of chisol.

The util-linux patch changelog has:

util-linux: add chisol tool to configure task isolation

Add chisol tool to configure task isolation. See chisol -h
for details.

For example, to launch a version of oslat that activates
task isolation:

chisol -q vmstat_sync -I conf ./oslat -f 1 -c 5 -D 5m

-q vmstat_sync: enable quiescing of per-CPU vmstats
-I conf: inherit task isolation configuration.

====

So you can _configure_ the parameters of task isolation outside
your application, but activation (just before the realtime loop),
can be done inside it (which requires modification of the
application). See the oslat/cyclictest patches.

So to answer your question: chisol allows task isolation to be
configured and activated externally from a latency sensitive app
(which at this moment means every system call, after activation,
will sync the vmstats on return to userspace... which obviously
slow things down). But you can then run unmodified applications
(while paying the cost of slower startup times).

Activation of different features before exec'ing a new application
will of course depend on what the feature does...

> Since we are adding docs: Could we have some sample use cases for
> when these features are useful?

There is one sample at samples/task_isolation/task_isol_userloop.c,
do you mean more samples would be useful ? (there are two i know of:
the one you mentioned and the one Thomas mentioned).

2021-08-27 13:14:33

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Thu, Aug 26, 2021 at 09:11:31AM -0300, Marcelo Tosatti wrote:
> Hi Frederic,
>
> On Thu, Aug 26, 2021 at 11:59:58AM +0200, Frederic Weisbecker wrote:
> > > +Note: the prctl interface is independent of nohz_full=.
> > > +
> > > +The prctl options are:
> > > +
> > > +
> > > + - PR_ISOL_FEAT: Retrieve supported features.
> > > + - PR_ISOL_GET: Retrieve task isolation parameters.
> > > + - PR_ISOL_SET: Set task isolation parameters.
> > > + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > > + - PR_ISOL_CTRL_SET: Set task isolation state.
> > > + - PR_ISOL_GET_INT: Retrieve internal parameters.
> > > + - PR_ISOL_SET_INT: Retrieve internal parameters.
> >
> > There should be some short summary here to explain the difference
> > between parameter, state, and internal parameter. I personally have
> > no clue so far.
>
> Yes, those have been written without clear definitions and can be
> improved (it makes sense to me, so please indicate what is not
> clear to you). So:
>
> * "Feature": a generic name for a task isolation feature.
> Examples of features could be logging, new operating modes (syscalls
> disallowed), userspace notifications, etc. One feature is quiescing.
>
> * "Parameter": a specific choice from a given set of possible choices
> that dictate how the particular feature in question should act.
>
> * "Internal parameter": A parameter (as in above) but not related to
> task isolation features themselves, but to "internal characteristics"
> (well, there is only one example of internal parameter so far
> and that is inheritance across clone/fork).
>
> Maybe "internal parameter" is a bad name and something different should
> be used instead ?
>
> Should i add the description aboves to the document file?

Ok so to make things clearer, may I suggest:

s/PR_ISOL_FEAT/PR_ISOL_GET_FEAT

to make it more obvious that we are not going to write or configure something.

Also:

s/PR_ISOL_SET/PR_ISOL_CFG_SET or s/PR_ISOL_SET/PR_ISOL_PARAM_SET
s/PR_ISOL_GET/PR_ISOL_CFG_GET or s/PR_ISOL_GET/PR_ISOL_PARAM_GET

because SET or GET alone are too general. I first thought they were the
activation interface whereas they are only the configuration stage.

And then PR_ISOL_CTRL_GET/SET look good. Although perhaps
PR_ISOL_ACTIVATE_SET/GET would probably be clearer. Or even this is where
the trimmed name PR_ISOL_SET/GET would make sense.

> > > +---------------------
> > > +Interface description
> > > +---------------------
> > > +
> > > +**PR_ISOL_FEAT**:
> > > +
> > > + Returns the supported features and feature
> > > + capabilities, as a bitmask::
> > > +
> > > + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > > +
> > > + The 'feat' argument specifies whether to return
> > > + supported features (if zero), or feature capabilities
> > > + (if not zero). Possible non-zero values for 'feat' are:
> > > +
> > > + - ``ISOL_F_QUIESCE``:
> > > +
> > > + Returns a bitmask containing which kernel
> > > + activities are supported for quiescing.
> > > +
> > > + Features and its capabilities are defined at
> > > include/uapi/linux/task_isolation.h.
> >
> > Preferably have feat a parameter name. We never know if we want
> > to extend it in the future.
>
> It is a parameter name:
>
> prctl(PR_ISOL_FEAT, feat-A, arg3, arg4, arg5);
>
> prctl(PR_ISOL_FEAT, feat-B, arg3, arg4, arg5);
>
> And yes, the idea is that new features can be added.
>
> So unless i misunderstood you, there are no changes necessary here.

Ok, indeed that was my misunderstanding.

> > > +**PR_ISOL_GET**:
> > > +
> > > + Retrieve task isolation feature configuration.
> > > + The general format is::
> > > +
> > > + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > > +
> > > + The 'feat' argument specifies whether to return
> > > + configured features (if zero), or individual feature
> > > + configuration (if not zero).
> >
> > You might need to elaborate a bit on the "feat" behaviour difference.
>
> Not sure what you mean? There is only one "feat" yet, which is
> ISOL_F_QUIESCE:

Sorry my misunderstanding again. So if I understand correctly prctl(PR_ISOL_GET,
0, ...) returns a mask of all features that have been configured through
PR_ISOL_SET(), right?

> > > +**PR_ISOL_SET**:
> > > +
> > > + Configures task isolation features. The general format is::
> > > +
> > > + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > > +
> > > + The 'feat' argument specifies which feature to configure.
> > > + Possible values for feat are:
> > > +
> > > + - ``ISOL_F_QUIESCE``:
> > > +
> > > + The 'arg3' argument is a bitmask specifying which
> > > + kernel activities to quiesce. Possible bit sets are:
> > > +
> > > + - ``ISOL_F_QUIESCE_VMSTATS``
> > > +
> > > + VM statistics are maintained in per-CPU counters to
> > > + improve performance. When a CPU modifies a VM statistic,
> > > + this modification is kept in the per-CPU counter.
> > > + Certain activities require a global count, which
> > > + involves requesting each CPU to flush its local counters
> > > + to the global VM counters.
> > > +
> > > + This flush is implemented via a workqueue item, which
> > > + might schedule a workqueue on isolated CPUs.
> > > +
> > > + To avoid this interruption, task isolation can be
> > > + configured to, upon return from system calls, synchronize
> > > + the per-CPU counters to global counters, thus avoiding
> > > + the interruption.
> > > +
> > > + To ensure the application returns to userspace
> > > + with no modified per-CPU counters, its necessary to
> > > + use mlockall() in addition to this isolcpus flag.
> >
> > So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
> > on all subsequent return to userspace, right?
>
> Yes. Hum, i think i dropped that clarification (by mistake). Will re-add
> it.

So how are we going to implement oneshot quiescing? As in quiescing only upon
the return of a given prctl().

Maybe using a feature something like ISOL_F_QUIESCE_ONCE?

But then suppose I do this:

prctl(PR_ISOL_SET, ISOL_F_QUIESCE_ONCE, ISOL_F_QUIESCE_VMSTATS, ...)
prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE_ONCE, ...) //will quiesce on this return only
prctl(PR_ISOL_CTRL_GET, ...)

What should PR_ISOL_CTRL_GET return above? Probably nothing.

>
> > > +
> > > +**PR_ISOL_CTRL_GET**:
> > > +
> > > + Retrieve task isolation control.
> > > +
> > > + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > > +
> > > + Returns which isolation features are active.
> > > +
> > > +**PR_ISOL_CTRL_SET**:
> > > +
> > > + Activates/deactivates task isolation control.
> > > +
> > > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > +
> > > + The 'mask' argument specifies which features
> > > + to activate (bit set) or deactivate (bit clear).
> > > +
> > > + For ISOL_F_QUIESCE, quiescing of background activities
> > > + happens on return to userspace from the
> > > + prctl(PR_ISOL_CTRL_SET) call, and on return from
> > > + subsequent system calls.
> >
> > Now I'm lost again on the difference with PR_ISOL_SET
>
> PR_ISOL_SET configures the features parameters.
>
> PR_ISOL_CTRL_SET _activates_ task isolation.
>
> You earlier wrote:
>
> "I would rather decouple the above with, for modes:
>
> PR_TASK_ISOLATION_SET
> PR_TASK_ISOLATION_GET
>
> And for oneshot requests:
>
> PR_TASK_ISOLATION_REQUEST"
>
> Now we have PR_ISOL_SET/PR_ISOL_GET (to configure the parameters of
> task isolation features), and PR_ISOL_CTRL_SET to activate that
> isolation (and we pass a bitmask to PR_ISOL_CTRL_SET indicating what
> features should be active). How the particular features behave
> is determined at PR_ISOL_SET time.

I guess that makes sense. This way we can quiesce everything in one go
instead of issuing a prctl() for each features, which adds further noise.
Sounds proper.

>
> This allows the administrator to, via chisol, configure task isolation:
>
> +
> + if (argc - optind < 1) {
> + warnx(_("bad usage"));
> + errtryhelp(EXIT_FAILURE);
> + }
> +
> + if (quiesce_act_mask) {
> + ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_act_mask, 0, 0);
> + if (ret == -1) {
> + perror("prctl PR_ISOL_SET");
> + exit(EXIT_FAILURE);
> + }
> + }
>
> And the application, which has to be modified only once with:
>
> +#ifdef PR_ISOL_GET
> + ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
> + if (ret != -1) {
> + unsigned long mask = ret;
> +
> + TEST0(prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0));
> + }
> +#endif
> +
> frc(&ts2);
> do {
> workload_fn(t->dst_buf, t->src_buf, g.workload_mem_size);
>
> Makes sense?

Yes! Btw you might want to fetch the mask of PR_ISOL_GET into the
second parameter instead of using the return value which is only
32 bits or prctl() and the highest significant bit is even reserved
for the error.

> > > +**PR_ISOL_GET_INT**:
> > > +
> > > + Retrieves task isolation internal parameters.
> > > +
> > > + The general format is::
> > > +
> > > + prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> > > +
> > > + The 'cmd' argument specifies which parameter to configure.
> > > + Possible values for cmd are:
> > > +
> > > + - ``INHERIT_CFG``:
> > > +
> > > + Retrieve inheritance configuration.
> > > +
> > > + The 'arg3' argument is a pointer to a struct
> > > + inherit_control::
> > > +
> > > + struct task_isol_inherit_control {
> > > + __u8 inherit_mask;
> > > + __u8 pad[7];
> > > + };
> > > +
> > > + See PR_ISOL_SET_INT description below for meaning
> > > + of structure fields.
> > > +
> > > +**PR_ISOL_SET_INT**:
> > > +
> > > + Sets task isolation internal parameters.
> > > +
> > > + The general format is::
> > > +
> > > + prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> > > +
> > > + The 'cmd' argument specifies which parameter to configure.
> > > + Possible values for cmd are:
> > > +
> > > + - ``INHERIT_CFG``:
> > > +
> > > + Set inheritance configuration when a new task
> > > + is created via fork and clone.
> > > +
> > > + The 'arg3' argument is a pointer to a struct
> > > + inherit_control::
> > > +
> > > + struct task_isol_inherit_control {
> > > + __u8 inherit_mask;
> > > + __u8 pad[7];
> > > + };
> > > +
> > > + inherit_mask is a bitmask that specifies which part
> > > + of task isolation to be inherited:
> > > +
> > > + - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> > > + This is the stated written via prctl(PR_ISOL_SET, ...).
> > > +
> > > + - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> > > + (requires ISOL_INHERIT_CONF to be set). The new task
> > > + should behave, right after fork/clone, in the same manner
> > > + as the parent task _after_ it executed:
> > > +
> > > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > +
> > > + with a valid mask.
> >
> > I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
> > Arguably having a whole prctl for that makes it easier to extend. But then
> > PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
> > noise, right?
>
> It has to be called before PR_ISOL_CTRL_SET, yes.
>
> Decided on a separate prctl because the inheritance control
> is not a feature itself: it acts on all features (or how task isolation
> features are inherited across fork/clone).
>
> So yes, first idea was to "lets add this to PR_ISOL_SET", but then it
> became weird to have something that controls the features as a feature
> itself... It would be ISOL_F_INHERIT_CONTROL. Can change to that, if
> you prefer.

Funny but that would work. Either way, let's keep things that way for now.
Just the naming is unfortunate.

Well that could be a clone flag after all... But what about exec()? Should we
make its inheritance tunable? Well we can still extend the interface later if
necessary for that.

Thanks.

2021-08-27 14:50:07

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Fri, Aug 27, 2021 at 03:08:20PM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 26, 2021 at 09:11:31AM -0300, Marcelo Tosatti wrote:
> > Hi Frederic,
> >
> > On Thu, Aug 26, 2021 at 11:59:58AM +0200, Frederic Weisbecker wrote:
> > > > +Note: the prctl interface is independent of nohz_full=.
> > > > +
> > > > +The prctl options are:
> > > > +
> > > > +
> > > > + - PR_ISOL_FEAT: Retrieve supported features.
> > > > + - PR_ISOL_GET: Retrieve task isolation parameters.
> > > > + - PR_ISOL_SET: Set task isolation parameters.
> > > > + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > > > + - PR_ISOL_CTRL_SET: Set task isolation state.
> > > > + - PR_ISOL_GET_INT: Retrieve internal parameters.
> > > > + - PR_ISOL_SET_INT: Retrieve internal parameters.
> > >
> > > There should be some short summary here to explain the difference
> > > between parameter, state, and internal parameter. I personally have
> > > no clue so far.
> >
> > Yes, those have been written without clear definitions and can be
> > improved (it makes sense to me, so please indicate what is not
> > clear to you). So:
> >
> > * "Feature": a generic name for a task isolation feature.
> > Examples of features could be logging, new operating modes (syscalls
> > disallowed), userspace notifications, etc. One feature is quiescing.
> >
> > * "Parameter": a specific choice from a given set of possible choices
> > that dictate how the particular feature in question should act.
> >
> > * "Internal parameter": A parameter (as in above) but not related to
> > task isolation features themselves, but to "internal characteristics"
> > (well, there is only one example of internal parameter so far
> > and that is inheritance across clone/fork).
> >
> > Maybe "internal parameter" is a bad name and something different should
> > be used instead ?
> >
> > Should i add the description aboves to the document file?
>
> Ok so to make things clearer, may I suggest:
>
> s/PR_ISOL_FEAT/PR_ISOL_GET_FEAT
>
> to make it more obvious that we are not going to write or configure something.
>
> Also:
>
> s/PR_ISOL_SET/PR_ISOL_CFG_SET or s/PR_ISOL_SET/PR_ISOL_PARAM_SET
> s/PR_ISOL_GET/PR_ISOL_CFG_GET or s/PR_ISOL_GET/PR_ISOL_PARAM_GET
>
> because SET or GET alone are too general. I first thought they were the
> activation interface whereas they are only the configuration stage.

Sounds good.

> And then PR_ISOL_CTRL_GET/SET look good. Although perhaps
> PR_ISOL_ACTIVATE_SET/GET would probably be clearer. Or even this is where
> the trimmed name PR_ISOL_SET/GET would make sense.

PR_ISOL_ACTIVATE_SET/GET indeed is quite clear. Will change.

> > > > +---------------------
> > > > +Interface description
> > > > +---------------------
> > > > +
> > > > +**PR_ISOL_FEAT**:
> > > > +
> > > > + Returns the supported features and feature
> > > > + capabilities, as a bitmask::
> > > > +
> > > > + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > > > +
> > > > + The 'feat' argument specifies whether to return
> > > > + supported features (if zero), or feature capabilities
> > > > + (if not zero). Possible non-zero values for 'feat' are:
> > > > +
> > > > + - ``ISOL_F_QUIESCE``:
> > > > +
> > > > + Returns a bitmask containing which kernel
> > > > + activities are supported for quiescing.
> > > > +
> > > > + Features and its capabilities are defined at
> > > > include/uapi/linux/task_isolation.h.
> > >
> > > Preferably have feat a parameter name. We never know if we want
> > > to extend it in the future.
> >
> > It is a parameter name:
> >
> > prctl(PR_ISOL_FEAT, feat-A, arg3, arg4, arg5);
> >
> > prctl(PR_ISOL_FEAT, feat-B, arg3, arg4, arg5);
> >
> > And yes, the idea is that new features can be added.
> >
> > So unless i misunderstood you, there are no changes necessary here.
>
> Ok, indeed that was my misunderstanding.
>
> > > > +**PR_ISOL_GET**:
> > > > +
> > > > + Retrieve task isolation feature configuration.
> > > > + The general format is::
> > > > +
> > > > + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > > > +
> > > > + The 'feat' argument specifies whether to return
> > > > + configured features (if zero), or individual feature
> > > > + configuration (if not zero).
> > >
> > > You might need to elaborate a bit on the "feat" behaviour difference.
> >
> > Not sure what you mean? There is only one "feat" yet, which is
> > ISOL_F_QUIESCE:
>
> Sorry my misunderstanding again. So if I understand correctly prctl(PR_ISOL_GET,
> 0, ...) returns a mask of all features that have been configured through
> PR_ISOL_SET(), right?

Yes.

> > > > +**PR_ISOL_SET**:
> > > > +
> > > > + Configures task isolation features. The general format is::
> > > > +
> > > > + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > > > +
> > > > + The 'feat' argument specifies which feature to configure.
> > > > + Possible values for feat are:
> > > > +
> > > > + - ``ISOL_F_QUIESCE``:
> > > > +
> > > > + The 'arg3' argument is a bitmask specifying which
> > > > + kernel activities to quiesce. Possible bit sets are:
> > > > +
> > > > + - ``ISOL_F_QUIESCE_VMSTATS``
> > > > +
> > > > + VM statistics are maintained in per-CPU counters to
> > > > + improve performance. When a CPU modifies a VM statistic,
> > > > + this modification is kept in the per-CPU counter.
> > > > + Certain activities require a global count, which
> > > > + involves requesting each CPU to flush its local counters
> > > > + to the global VM counters.
> > > > +
> > > > + This flush is implemented via a workqueue item, which
> > > > + might schedule a workqueue on isolated CPUs.
> > > > +
> > > > + To avoid this interruption, task isolation can be
> > > > + configured to, upon return from system calls, synchronize
> > > > + the per-CPU counters to global counters, thus avoiding
> > > > + the interruption.
> > > > +
> > > > + To ensure the application returns to userspace
> > > > + with no modified per-CPU counters, its necessary to
> > > > + use mlockall() in addition to this isolcpus flag.
> > >
> > > So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
> > > on all subsequent return to userspace, right?
> >
> > Yes. Hum, i think i dropped that clarification (by mistake). Will re-add
> > it.
>
> So how are we going to implement oneshot quiescing? As in quiescing only upon
> the return of a given prctl().
>
> Maybe using a feature something like ISOL_F_QUIESCE_ONCE?
>
> But then suppose I do this:
>
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE_ONCE, ISOL_F_QUIESCE_VMSTATS, ...)
> prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE_ONCE, ...) //will quiesce on this return only
> prctl(PR_ISOL_CTRL_GET, ...)
>
> What should PR_ISOL_CTRL_GET return above? Probably nothing.

Yeah, nothing.

So the "quiesce once" feature, as i understand, was suggested by
Christoph for the following type of application:

lat_loop:

do {
events = pending_events();
if (events & DATAPATH_EVENT)
process_data()
} while (!(events & UNFREQUENT_ERROR_EVENT))

syscall1()
syscall2()
...
syscallN()
goto lat_loop;

With the V3 patchset, one would have to:

prctl(PR_ISOL_SET, ISOL_F_OTHER, ...);
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...);
...
prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
lat_loop:
do {
events = pending_events();
if (events & DATAPATH_EVENT)
process_data()
} while (!(events & UNFREQUENT_ERROR_EVENT))

/* disables quiescing while executing system calls */
prctl(PR_ISOL_CTRL_SET, ISOL_F_OTHER);
syscall1()
syscall2()
...
syscallN()

/* no more system calls, enables quiescing */
prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
goto lat_loop;

But for an interface with less system calls (true "quiesce once") one could do:

prctl(PR_ISOL_SET, ISOL_F_OTHER, ...);
/* rather than do it at _CTRL_SET as you suggest, enable it at
* configuration time.
*/
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS|ISOL_F_QUIESCE_ONCE, ...);
...

lat_loop:
prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
do {
events = pending_events();
if (events & DATAPATH_EVENT)
process_data()
} while (!(events & UNFREQUENT_ERROR_EVENT))

/* disables quiescing while executing system calls */
syscall1()
syscall2()
...
syscallN()

goto lat_loop;

But see how it starts to get weird: both versions (new feature,
ISOL_F_QUIESCE_ONCE, or new "quiesce feature', ISOL_F_QUIESCE_ONCE)
are using space reserved to

"a list of different features"
or
"a list of different quiesce features".

To add something which is not either a new task isolation
feature or quiesce feature, but a separate control
(which could apply to all of features, or which one might want
to apply only to certain features, and in that case a bitmap
might be specified).

So i think adding a new parameter such as:

prctl(PR_ISOL_SET, ISOL_F_QUIESCE, CMD, arg, ...);

is a good idea. So one can have (with two commands, SET_QUIESCE
and SET_ONESHOT).

prctl(PR_ISOL_SET, ISOL_F_QUIESCE, SET_QUIESCE, ISOL_F_QUIESCE_VMSTAT);
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, SET_ONESHOT, ISOL_F_QUIESCE_VMSTAT);

Then its possible to add random commands with random parameters
(rather than be limited by a single bitmask to control quiescing).

Does that make sense?

> > > > +
> > > > +**PR_ISOL_CTRL_GET**:
> > > > +
> > > > + Retrieve task isolation control.
> > > > +
> > > > + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > > > +
> > > > + Returns which isolation features are active.
> > > > +
> > > > +**PR_ISOL_CTRL_SET**:
> > > > +
> > > > + Activates/deactivates task isolation control.
> > > > +
> > > > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > > +
> > > > + The 'mask' argument specifies which features
> > > > + to activate (bit set) or deactivate (bit clear).
> > > > +
> > > > + For ISOL_F_QUIESCE, quiescing of background activities
> > > > + happens on return to userspace from the
> > > > + prctl(PR_ISOL_CTRL_SET) call, and on return from
> > > > + subsequent system calls.
> > >
> > > Now I'm lost again on the difference with PR_ISOL_SET
> >
> > PR_ISOL_SET configures the features parameters.
> >
> > PR_ISOL_CTRL_SET _activates_ task isolation.
> >
> > You earlier wrote:
> >
> > "I would rather decouple the above with, for modes:
> >
> > PR_TASK_ISOLATION_SET
> > PR_TASK_ISOLATION_GET
> >
> > And for oneshot requests:
> >
> > PR_TASK_ISOLATION_REQUEST"
> >
> > Now we have PR_ISOL_SET/PR_ISOL_GET (to configure the parameters of
> > task isolation features), and PR_ISOL_CTRL_SET to activate that
> > isolation (and we pass a bitmask to PR_ISOL_CTRL_SET indicating what
> > features should be active). How the particular features behave
> > is determined at PR_ISOL_SET time.
>
> I guess that makes sense. This way we can quiesce everything in one go
> instead of issuing a prctl() for each features, which adds further noise.
> Sounds proper.
>
> >
> > This allows the administrator to, via chisol, configure task isolation:
> >
> > +
> > + if (argc - optind < 1) {
> > + warnx(_("bad usage"));
> > + errtryhelp(EXIT_FAILURE);
> > + }
> > +
> > + if (quiesce_act_mask) {
> > + ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_act_mask, 0, 0);
> > + if (ret == -1) {
> > + perror("prctl PR_ISOL_SET");
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> >
> > And the application, which has to be modified only once with:
> >
> > +#ifdef PR_ISOL_GET
> > + ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
> > + if (ret != -1) {
> > + unsigned long mask = ret;
> > +
> > + TEST0(prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0));
> > + }
> > +#endif
> > +
> > frc(&ts2);
> > do {
> > workload_fn(t->dst_buf, t->src_buf, g.workload_mem_size);
> >
> > Makes sense?
>
> Yes! Btw you might want to fetch the mask of PR_ISOL_GET into the
> second parameter instead of using the return value which is only
> 32 bits or prctl() and the highest significant bit is even reserved
> for the error.

Would be good to do this for all cases, so you can extend the
struct (or pad it).
>
> > > > +**PR_ISOL_GET_INT**:
> > > > +
> > > > + Retrieves task isolation internal parameters.
> > > > +
> > > > + The general format is::
> > > > +
> > > > + prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> > > > +
> > > > + The 'cmd' argument specifies which parameter to configure.
> > > > + Possible values for cmd are:
> > > > +
> > > > + - ``INHERIT_CFG``:
> > > > +
> > > > + Retrieve inheritance configuration.
> > > > +
> > > > + The 'arg3' argument is a pointer to a struct
> > > > + inherit_control::
> > > > +
> > > > + struct task_isol_inherit_control {
> > > > + __u8 inherit_mask;
> > > > + __u8 pad[7];
> > > > + };
> > > > +
> > > > + See PR_ISOL_SET_INT description below for meaning
> > > > + of structure fields.
> > > > +
> > > > +**PR_ISOL_SET_INT**:
> > > > +
> > > > + Sets task isolation internal parameters.
> > > > +
> > > > + The general format is::
> > > > +
> > > > + prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> > > > +
> > > > + The 'cmd' argument specifies which parameter to configure.
> > > > + Possible values for cmd are:
> > > > +
> > > > + - ``INHERIT_CFG``:
> > > > +
> > > > + Set inheritance configuration when a new task
> > > > + is created via fork and clone.
> > > > +
> > > > + The 'arg3' argument is a pointer to a struct
> > > > + inherit_control::
> > > > +
> > > > + struct task_isol_inherit_control {
> > > > + __u8 inherit_mask;
> > > > + __u8 pad[7];
> > > > + };
> > > > +
> > > > + inherit_mask is a bitmask that specifies which part
> > > > + of task isolation to be inherited:
> > > > +
> > > > + - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> > > > + This is the stated written via prctl(PR_ISOL_SET, ...).
> > > > +
> > > > + - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> > > > + (requires ISOL_INHERIT_CONF to be set). The new task
> > > > + should behave, right after fork/clone, in the same manner
> > > > + as the parent task _after_ it executed:
> > > > +
> > > > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > > +
> > > > + with a valid mask.
> > >
> > > I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
> > > Arguably having a whole prctl for that makes it easier to extend. But then
> > > PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
> > > noise, right?
> >
> > It has to be called before PR_ISOL_CTRL_SET, yes.
> >
> > Decided on a separate prctl because the inheritance control
> > is not a feature itself: it acts on all features (or how task isolation
> > features are inherited across fork/clone).
> >
> > So yes, first idea was to "lets add this to PR_ISOL_SET", but then it
> > became weird to have something that controls the features as a feature
> > itself... It would be ISOL_F_INHERIT_CONTROL. Can change to that, if
> > you prefer.
>
> Funny but that would work. Either way, let's keep things that way for now.
> Just the naming is unfortunate.
>
> Well that could be a clone flag after all...

Yes, it could as well. But there are no more bits for older clone
interfaces, and clone3 seems to be problematic (moreover, this
interface would need backporting to older kernels).

> But what about exec()? Should we
> make its inheritance tunable? Well we can still extend the interface later if
> necessary for that.

Hum, not sure... Yes, can extend it later.

Thanks!

2021-08-30 11:40:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Fri, Aug 27, 2021 at 11:44:16AM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 27, 2021 at 03:08:20PM +0200, Frederic Weisbecker wrote:
> > Ok so to make things clearer, may I suggest:
> >
> > s/PR_ISOL_FEAT/PR_ISOL_GET_FEAT

<nit>
In fact PR_ISOL_FEAT_GET so that it follows the same naming convention
than PR_ISOL_CFG_GET/PR_ISOL_PARAM_GET and PR_ISOL_ACTIVATE_GET


> > But then suppose I do this:
> >
> > prctl(PR_ISOL_SET, ISOL_F_QUIESCE_ONCE, ISOL_F_QUIESCE_VMSTATS, ...)
> > prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE_ONCE, ...) //will quiesce on this return only
> > prctl(PR_ISOL_CTRL_GET, ...)
> >
> > What should PR_ISOL_CTRL_GET return above? Probably nothing.
>
> Yeah, nothing.
>
> So the "quiesce once" feature, as i understand, was suggested by
> Christoph for the following type of application:
>
> lat_loop:
>
> do {
> events = pending_events();
> if (events & DATAPATH_EVENT)
> process_data()
> } while (!(events & UNFREQUENT_ERROR_EVENT))
>
> syscall1()
> syscall2()
> ...
> syscallN()
> goto lat_loop;
>
> With the V3 patchset, one would have to:
>
> prctl(PR_ISOL_SET, ISOL_F_OTHER, ...);
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...);
> ...
> prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
> lat_loop:
> do {
> events = pending_events();
> if (events & DATAPATH_EVENT)
> process_data()
> } while (!(events & UNFREQUENT_ERROR_EVENT))
>
> /* disables quiescing while executing system calls */
> prctl(PR_ISOL_CTRL_SET, ISOL_F_OTHER);
> syscall1()
> syscall2()
> ...
> syscallN()
>
> /* no more system calls, enables quiescing */
> prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
> goto lat_loop;
>
> But for an interface with less system calls (true "quiesce once") one could do:
>
> prctl(PR_ISOL_SET, ISOL_F_OTHER, ...);
> /* rather than do it at _CTRL_SET as you suggest, enable it at
> * configuration time.
> */
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS|ISOL_F_QUIESCE_ONCE, ...);
> ...
>
> lat_loop:
> prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE|ISOL_F_OTHER);
> do {
> events = pending_events();
> if (events & DATAPATH_EVENT)
> process_data()
> } while (!(events & UNFREQUENT_ERROR_EVENT))
>
> /* disables quiescing while executing system calls */
> syscall1()
> syscall2()
> ...
> syscallN()
>
> goto lat_loop;
>
> But see how it starts to get weird: both versions (new feature,
> ISOL_F_QUIESCE_ONCE, or new "quiesce feature', ISOL_F_QUIESCE_ONCE)
> are using space reserved to
>
> "a list of different features"
> or
> "a list of different quiesce features".
>
> To add something which is not either a new task isolation
> feature or quiesce feature, but a separate control
> (which could apply to all of features, or which one might want
> to apply only to certain features, and in that case a bitmap
> might be specified).
>
> So i think adding a new parameter such as:
>
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE, CMD, arg, ...);
>
> is a good idea. So one can have (with two commands, SET_QUIESCE
> and SET_ONESHOT).
>
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE, SET_QUIESCE, ISOL_F_QUIESCE_VMSTAT);
> prctl(PR_ISOL_SET, ISOL_F_QUIESCE, SET_ONESHOT, ISOL_F_QUIESCE_VMSTAT);
>
> Then its possible to add random commands with random parameters
> (rather than be limited by a single bitmask to control quiescing).
>
> Does that make sense?

We can but it means that the ONESHOT property applies to all ISOL_F_QUIESCE
features. So you can't, for example, quiesce ISOL_F_QUIESCE_VMSTAT only once
and quiesce ISOL_F_QUIESCE_FOO all the time.

I have no idea if it matters or not but be aware of limitations.

> > > +#ifdef PR_ISOL_GET
> > > + ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
> > > + if (ret != -1) {
> > > + unsigned long mask = ret;
> > > +
> > > + TEST0(prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0));
> > > + }
> > > +#endif
> > > +
> > > frc(&ts2);
> > > do {
> > > workload_fn(t->dst_buf, t->src_buf, g.workload_mem_size);
> > >
> > > Makes sense?
> >
> > Yes! Btw you might want to fetch the mask of PR_ISOL_GET into the
> > second parameter instead of using the return value which is only
> > 32 bits or prctl() and the highest significant bit is even reserved
> > for the error.
>
> Would be good to do this for all cases, so you can extend the
> struct (or pad it).

Yep.

> > Funny but that would work. Either way, let's keep things that way for now.
> > Just the naming is unfortunate.
> >
> > Well that could be a clone flag after all...
>
> Yes, it could as well. But there are no more bits for older clone
> interfaces, and clone3 seems to be problematic (moreover, this
> interface would need backporting to older kernels).

Another way to go is to use all the features as a mask in PR_ISOL_CFG_SET:

prctl(PR_ISOL_FEAT_GET, 0, &all_features, ...)
prctl(PR_ISOL_CFG_SET, &all_features, PR_ISOL_INHERIT...)

or simply:

prctl(PR_ISOL_CFG_SET, -1, PR_ISOL_INHERIT...)

or even:

prctl(PR_ISOL_CFG_SET, 0, PR_ISOL_INHERIT...)

Thanks.

2021-09-01 17:40:20

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Wed, Sep 01, 2021 at 09:11:56AM -0400, Nitesh Lal wrote:
> On Tue, Aug 24, 2021 at 11:42 AM Marcelo Tosatti <[email protected]> wrote:
> >
> > Add documentation and userspace sample code for prctl
> > task isolation interface.
> >
> > Signed-off-by: Marcelo Tosatti <[email protected]>
> >
> > ---
> > Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
> > samples/Kconfig | 7
> > samples/Makefile | 1
> > samples/task_isolation/Makefile | 9 +
> > samples/task_isolation/task_isol.c | 83 +++++++++
> > samples/task_isolation/task_isol.h | 9 +
> > samples/task_isolation/task_isol_userloop.c | 56 ++++++
> > 7 files changed, 376 insertions(+)
>
> [...]
>
> > + if (ret) {
> > + perror("mlock");
> > + return EXIT_FAILURE;
> > + }
> > +
> > + ret = task_isol_setup();
> > + if (ret)
> > + return EXIT_FAILURE;
>
> The above check condition should be 'ret == -1', isn't it?

task_isol_setup returns 0 on success, so fail to see the point
of testing for ret == -1 rather than ret != 0 ?

2021-09-01 17:52:58

by Nitesh Lal

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Wed, Sep 1, 2021 at 1:38 PM Marcelo Tosatti <[email protected]> wrote:
>
> On Wed, Sep 01, 2021 at 09:11:56AM -0400, Nitesh Lal wrote:
> > On Tue, Aug 24, 2021 at 11:42 AM Marcelo Tosatti <[email protected]> wrote:
> > >
> > > Add documentation and userspace sample code for prctl
> > > task isolation interface.
> > >
> > > Signed-off-by: Marcelo Tosatti <[email protected]>
> > >
> > > ---
> > > Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
> > > samples/Kconfig | 7
> > > samples/Makefile | 1
> > > samples/task_isolation/Makefile | 9 +
> > > samples/task_isolation/task_isol.c | 83 +++++++++
> > > samples/task_isolation/task_isol.h | 9 +
> > > samples/task_isolation/task_isol_userloop.c | 56 ++++++
> > > 7 files changed, 376 insertions(+)
> >
> > [...]
> >
> > > + if (ret) {
> > > + perror("mlock");
> > > + return EXIT_FAILURE;
> > > + }
> > > +
> > > + ret = task_isol_setup();
> > > + if (ret)
> > > + return EXIT_FAILURE;
> >
> > The above check condition should be 'ret == -1', isn't it?
>
> task_isol_setup returns 0 on success, so fail to see the point
> of testing for ret == -1 rather than ret != 0 ?
>

Hmm, could be something that I misinterpreted.
I will double-check.

--
Thanks
Nitesh

2021-09-01 22:13:05

by Nitesh Lal

[permalink] [raw]
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples

On Tue, Aug 24, 2021 at 11:42 AM Marcelo Tosatti <[email protected]> wrote:
>
> Add documentation and userspace sample code for prctl
> task isolation interface.
>
> Signed-off-by: Marcelo Tosatti <[email protected]>
>
> ---
> Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
> samples/Kconfig | 7
> samples/Makefile | 1
> samples/task_isolation/Makefile | 9 +
> samples/task_isolation/task_isol.c | 83 +++++++++
> samples/task_isolation/task_isol.h | 9 +
> samples/task_isolation/task_isol_userloop.c | 56 ++++++
> 7 files changed, 376 insertions(+)

[...]

> + if (ret) {
> + perror("mlock");
> + return EXIT_FAILURE;
> + }
> +
> + ret = task_isol_setup();
> + if (ret)
> + return EXIT_FAILURE;

The above check condition should be 'ret == -1', isn't it?

--
Thanks
Nitesh