For isolated CPUs, we'd like to skip awakening ktimersoftd from the
sched tick (the switch to and then back from ktimersoftd takes 10us in
virtualized environments, which, in addition to other OS overhead,
exceeds telco requirements for packet forwarding for 5G).
The patch "timers: do not raise softirq unconditionally" from Thomas
attempts to address that by checking, in the sched tick, whether it's
necessary to raise the timer softirq. Unfortunately, it attempts to grab
the tvec base spinlock, which generates the issue described in the patch
"Revert "timers: do not raise softirq unconditionally"".
tvec_base->lock protects addition of timers to the wheel versus
timer interrupt execution.
This patch does not grab the tvec base spinlock from irq context,
but rather performs a lockless access to base->pending_map.
It handles the race between timer addition and timer interrupt
execution by unconditionally (in case of isolated CPUs) raising the
timer softirq after making sure the updated bitmap is visible
on remote CPUs.
This patchset reduces cyclictest latency from 25us to 14us
on my testbox.
On Mon, Apr 15, 2019 at 05:12:13PM -0300, Marcelo Tosatti wrote:
> For isolated CPUs, we'd like to skip awakening ktimersoftd
> [...]
Patch against v5.0.5-rt3 branch of linux-rt-devel.
On Mon, Apr 15, 2019 at 05:12:13PM -0300, Marcelo Tosatti wrote:
> For isolated CPUs, we'd like to skip awakening ktimersoftd
> [...]
Ping?
On 5/6/19 5:22 AM, Marcelo Tosatti wrote:
> On Mon, Apr 15, 2019 at 05:12:13PM -0300, Marcelo Tosatti wrote:
>> For isolated CPUs, we'd like to skip awakening ktimersoftd
>> [...]
>
> Ping?
>
Hi Marcelo,
I've been running your patches with lockdep and other debug options and did
not find any problems. Also, I did not find any timing regressions in tests
with PREEMPT_RT...
-- Daniel
On Mon, 6 May 2019, Marcelo Tosatti wrote:
> On Mon, Apr 15, 2019 at 05:12:13PM -0300, Marcelo Tosatti wrote:
> > It handles the the race between timer addition and timer interrupt
> > execution by unconditionally (in case of isolated CPUs) raising the
> > timer softirq after making sure the updated bitmap is visible
> > on remote CPUs.
> >
> > This patchset reduces cyclictest latency from 25us to 14us
> > on my testbox.
>
> Ping?
On my list ....
Hi,
I had a look at the queue and have several questions about your
implementation.
First of all, I had some trouble understanding your commit messages, so I
first had to read the code and then tried to understand the commit
messages. It is easier if it works the other way round.
On Mon, 15 Apr 2019, Marcelo Tosatti wrote:
> For isolated CPUs, we'd like to skip awakening ktimersoftd
> [...]
You would like to prevent raising the timer softirq in general from the
sched tick for isolated CPUs? Or you would like to prevent raising the
timer softirq if no pending timer is available?
Nevertheless, this change is not PREEMPT_RT specific. It is a
NOHZ-dependent change, so it would be nice if the queue were against
mainline. But please correct me if I'm wrong.
[...]
> This patchset reduces cyclictest latency from 25us to 14us
> on my testbox.
>
A lot of information is missing: What does your environment look like for
this test? What is your workload?
Did you also run other tests?
Thanks,
Anna-Maria
On Wed, May 29, 2019 at 04:52:37PM +0200, Anna-Maria Gleixner wrote:
> Hi,
>
> I had a look at the queue and have several questions about your
> implementation.
>
> First of all, I had some trouble understanding your commit messages, so I
> first had to read the code and then tried to understand the commit
> messages. It is easier if it works the other way round.
Right. The commit message seemed descriptive to me, but I should probably
try to improve the explanation in it.
> On Mon, 15 Apr 2019, Marcelo Tosatti wrote:
>
> > For isolated CPUs, we'd like to skip awakening ktimersoftd
> > [...]
>
> You would like to prevent raising the timer softirq in general from the
> sched tick for isolated CPUs? Or you would like to prevent raising the
> timer softirq if no pending timer is available?
With DPDK and CPU isolation, there are no low resolution timers running on
the isolated CPUs, so preventing the raising of the timer softirq when no
pending timer is available is enough.
> Nevertheless, this change is not PREEMPT_RT specific. It is a
> NOHZ-dependent change, so it would be nice if the queue were against
> mainline. But please correct me if I'm wrong.
Since it was based on the idea of
https://lore.kernel.org/patchwork/patch/446045/
which was an -RT patch, I decided to submit it against the -RT kernel.
But I believe it can be merged through the mainline kernel queue as well.
> [...]
>
> > This patchset reduces cyclictest latency from 25us to 14us
> > on my testbox.
> >
>
> A lot of information is missing: What does your environment look like for
> this test? What is your workload?
x86 hardware with KVM virtualization, running the -RT kernel on both host
and guest. Relevant host kernel command line:
isolcpus=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,14
intel_pstate=disable nosoftlockup nohz=on
nohz_full=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,14
rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,14
Relevant guest kernel command line:
isolcpus=1 nosoftlockup nohz=on nohz_full=1
rcu_nocbs=1
The workload in question is cyclictest (the production software is DPDK,
but cyclictest is used to gather latency data).
> Did you also run other tests?
Yes, I have a custom module stress test that attempts to hit the race.
Daniel (CC'ed) ran it under a kernel with debugging enabled, with no
apparent problems.
One downside of this patch is that it sometimes raises the softirq
unnecessarily. However, the impact is limited to isolated CPUs.