2021-03-11 14:19:52

by Thomas Gleixner

Subject: [patch V2 0/3] signals: Allow caching one sigqueue object per task

This is a follow-up to the initial submission, which can be found here:

https://lore.kernel.org/r/[email protected]

Signal sending requires a kmem cache allocation at the sender side and the
receiver hands it back to the kmem cache when consuming the signal.

This works pretty well even for realtime workloads, except when the kmem
cache allocation has to take the slow path, which is rare but does
happen.

Preempt-RT carries a patch which allows caching of one sigqueue object per
task. The object is not preallocated. It's cached when the task receives a
signal. The cache is freed when the task exits.
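
The lifecycle described above (nothing preallocated, one object kept when
a signal is consumed, cache dropped at exit) can be sketched in plain
userspace C. This is only an illustration, not the code from the patch:
the struct and function names are hypothetical, malloc()/free() stand in
for the kmem cache, locking is left out, and having the sender take the
object from the target's slot is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the patch itself. */
struct sigqueue_sketch {
	int signo;
};

struct task_sketch {
	/* At most one cached object; NULL until a signal was consumed. */
	struct sigqueue_sketch *sigqueue_cache;
};

/*
 * Sender side: use the cached object of the target task if there is one,
 * otherwise fall back to the allocator (malloc() stands in for the kmem
 * cache, which is where the slow path could be hit).
 */
static struct sigqueue_sketch *sketch_sigqueue_alloc(struct task_sketch *t,
						     int signo)
{
	struct sigqueue_sketch *q = t->sigqueue_cache;

	if (q)
		t->sigqueue_cache = NULL;	/* slot is now empty */
	else
		q = malloc(sizeof(*q));

	if (q)
		q->signo = signo;
	return q;
}

/*
 * Receiver side: after consuming the signal, keep the object if the slot
 * is empty, otherwise hand it back to the allocator.
 */
static void sketch_sigqueue_free(struct task_sketch *t,
				 struct sigqueue_sketch *q)
{
	if (!t->sigqueue_cache)
		t->sigqueue_cache = q;
	else
		free(q);
}

/* Task exit: release whatever is still cached. */
static void sketch_exit_task(struct task_sketch *t)
{
	free(t->sigqueue_cache);
	t->sigqueue_cache = NULL;
}
```

In this sketch the first signal still pays for an allocation; only later
signals to the same task can reuse the cached object and skip the
allocator.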

The memory overhead for a standard distro setup is pretty small. After boot
there are fewer than 10 objects cached in about 1500 tasks. The speedup from
sending a signal with a cached sigqueue object is small (~3us per signal)
and almost invisible, but for signal-heavy workloads it's definitely
measurable, and for the targeted realtime workloads it solves a real-world
latency issue.

Changes vs V1:

- the caching is now unconditional
- drop the pointless cmpxchg
- split the patch up for better readability
- add a proper analysis to the changelog vs. the impact and benefits

Thanks,

tglx
---
include/linux/sched.h | 1 +
include/linux/signal.h | 1 +
kernel/exit.c | 3 +--
kernel/fork.c | 1 +
kernel/signal.c | 44 +++++++++++++++++++++++++++++++++-----------
5 files changed, 37 insertions(+), 13 deletions(-)


2021-03-11 21:17:25

by Eric W. Biederman

Subject: Re: [patch V2 0/3] signals: Allow caching one sigqueue object per task

Thomas Gleixner <[email protected]> writes:

> This is a follow-up to the initial submission, which can be found here:
>
> https://lore.kernel.org/r/[email protected]
>
> Signal sending requires a kmem cache allocation at the sender side and the
> receiver hands it back to the kmem cache when consuming the signal.
>
> This works pretty well even for realtime workloads except for the case when
> the kmem cache allocation has to go into the slow path which is rare but
> happens.
>
> Preempt-RT carries a patch which allows caching of one sigqueue object per
> task. The object is not preallocated. It's cached when the task receives a
> signal. The cache is freed when the task exits.

I am probably skimming fast and missed your explanation but is there
a reason the caching is per task (aka thread) and not per signal_struct
(aka process)?

My sense is most signal delivery is per process. Are there realtime workloads
that extensively use pthread_sigqueue? The ordinary sigqueue interface
only allows targeting a process.
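
For readers who do not have the two userspace interfaces in mind, the
difference can be sketched as below. sigqueue() takes a pid and so
targets a process; pthread_sigqueue() is a GNU extension that takes a
pthread_t and targets one thread in the calling process. The wrapper
names are hypothetical and only there for illustration; older glibc
versions may need -pthread at link time.

```c
#define _GNU_SOURCE		/* for pthread_sigqueue() */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Process-directed: the kernel picks a thread of the target process
 * that does not block the signal. Returns 0 on success, -1 on error
 * with errno set.
 */
static int queue_to_process(pid_t pid, int signo, int payload)
{
	union sigval value = { .sival_int = payload };

	return sigqueue(pid, signo, value);
}

/*
 * Thread-directed: the signal is queued to exactly the given thread of
 * the calling process. Returns 0 on success or an error number.
 */
static int queue_to_thread(pthread_t thread, int signo, int payload)
{
	union sigval value = { .sival_int = payload };

	return pthread_sigqueue(thread, signo, value);
}
```

Both calls queue a siginfo with a payload, so both need a queue entry on
the sending side; they differ only in whether the signal ends up pending
for the whole process or for one specific thread.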

Mostly I am just trying to get a sense of the workloads that are
improved by this.

Eric

2021-03-12 20:04:01

by Thomas Gleixner

Subject: Re: [patch V2 0/3] signals: Allow caching one sigqueue object per task

On Thu, Mar 11 2021 at 15:13, Eric W. Biederman wrote:
> Thomas Gleixner <[email protected]> writes:
>
>> This is a follow-up to the initial submission, which can be found here:
>>
>> https://lore.kernel.org/r/[email protected]
>>
>> Signal sending requires a kmem cache allocation at the sender side and the
>> receiver hands it back to the kmem cache when consuming the signal.
>>
>> This works pretty well even for realtime workloads except for the case when
>> the kmem cache allocation has to go into the slow path which is rare but
>> happens.
>>
>> Preempt-RT carries a patch which allows caching of one sigqueue object per
>> task. The object is not preallocated. It's cached when the task receives a
>> signal. The cache is freed when the task exits.
>
> I am probably skimming fast and missed your explanation but is there
> a reason the caching is per task (aka thread) and not per signal_struct
> (aka process)?
>
> My sense is most signal delivery is per process. Are there realtime workloads
> that extensively use pthread_sigqueue? The ordinary sigqueue interface
> only allows targeting a process.

Unfortunately they use both. The majority is probably process-based.

Thanks,

tglx