2023-07-26 00:31:09

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH 1/5] rcutorture: Fix stuttering races and other issues

The stuttering code isn't functioning as expected. Ideally, it should
pause the torture threads for a designated period before resuming. Yet,
it fails to halt the test for the correct duration. Additionally, a race
condition exists, potentially causing the stuttering code to pause for
an extended period if the 'spt' variable is non-zero due to the stutter
orchestration thread's inadequate CPU time.

Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
This happens as the stuttering code may run within a softirq due to RCU
callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
thus obstructing RCU's progress. This situation triggers a warning
message in the logs:

[ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9

This warning suggests that an RCU torture object, although invisible to
RCU readers, couldn't make it past the pipe array and be freed -- a
strong indication that there weren't enough grace periods during the
stutter interval.

To address these issues, this patch sets the "stutter end" time to an
absolute point in the future set by the main stutter thread. This is
then used for waiting in stutter_wait(). While the stutter thread still
defines this absolute time, the waiters' waiting logic doesn't rely on
the stutter thread receiving sufficient CPU time to halt the stuttering
as the halting is now self-controlled.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/torture.c | 46 +++++++++++++---------------------------------
1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/kernel/torture.c b/kernel/torture.c
index 68dba4ecab5c..63f8f2a7d960 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
* suddenly applied to or removed from the system.
*/
static struct task_struct *stutter_task;
-static int stutter_pause_test;
+static ktime_t stutter_till_abs_time;
static int stutter;
static int stutter_gap;

@@ -729,30 +729,17 @@ static int stutter_gap;
*/
bool stutter_wait(const char *title)
{
- unsigned int i = 0;
bool ret = false;
- int spt;
+ ktime_t now_ns, till_ns;

cond_resched_tasks_rcu_qs();
- spt = READ_ONCE(stutter_pause_test);
- for (; spt; spt = READ_ONCE(stutter_pause_test)) {
- if (!ret && !rt_task(current)) {
- sched_set_normal(current, MAX_NICE);
- ret = true;
- }
- if (spt == 1) {
- torture_hrtimeout_jiffies(1, NULL);
- } else if (spt == 2) {
- while (READ_ONCE(stutter_pause_test)) {
- if (!(i++ & 0xffff))
- torture_hrtimeout_us(10, 0, NULL);
- cond_resched();
- }
- } else {
- torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
- }
- torture_shutdown_absorb(title);
+ now_ns = ktime_get();
+ till_ns = READ_ONCE(stutter_till_abs_time);
+ if (till_ns && ktime_before(now_ns, till_ns)) {
+ torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);
+ ret = true;
}
+ torture_shutdown_absorb(title);
return ret;
}
EXPORT_SYMBOL_GPL(stutter_wait);
@@ -763,23 +750,16 @@ EXPORT_SYMBOL_GPL(stutter_wait);
*/
static int torture_stutter(void *arg)
{
- DEFINE_TORTURE_RANDOM(rand);
- int wtime;
+ ktime_t till_ns;

VERBOSE_TOROUT_STRING("torture_stutter task started");
do {
if (!torture_must_stop() && stutter > 1) {
- wtime = stutter;
- if (stutter > 2) {
- WRITE_ONCE(stutter_pause_test, 1);
- wtime = stutter - 3;
- torture_hrtimeout_jiffies(wtime, &rand);
- wtime = 2;
- }
- WRITE_ONCE(stutter_pause_test, 2);
- torture_hrtimeout_jiffies(wtime, NULL);
+ till_ns = ktime_add_ns(ktime_get(),
+ jiffies_to_nsecs(stutter));
+ WRITE_ONCE(stutter_till_abs_time, till_ns);
+ torture_hrtimeout_jiffies(stutter - 1, NULL);
}
- WRITE_ONCE(stutter_pause_test, 0);
if (!torture_must_stop())
torture_hrtimeout_jiffies(stutter_gap, NULL);
torture_shutdown_absorb("torture_stutter");
--
2.41.0.487.g6d72f3e995-goog



2023-07-26 21:46:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 1/5] rcutorture: Fix stuttering races and other issues

On Tue, Jul 25, 2023 at 11:29:06PM +0000, Joel Fernandes (Google) wrote:
> The stuttering code isn't functioning as expected. Ideally, it should
> pause the torture threads for a designated period before resuming. Yet,
> it fails to halt the test for the correct duration. Additionally, a race
> condition exists, potentially causing the stuttering code to pause for
> an extended period if the 'spt' variable is non-zero due to the stutter
> orchestration thread's inadequate CPU time.
>
> Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
> This happens as the stuttering code may run within a softirq due to RCU
> callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
> thus obstructing RCU's progress. This situation triggers a warning
> message in the logs:
>
> [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
>
> This warning suggests that an RCU torture object, although invisible to
> RCU readers, couldn't make it past the pipe array and be freed -- a
> strong indication that there weren't enough grace periods during the
> stutter interval.
>
> To address these issues, this patch sets the "stutter end" time to an
> absolute point in the future set by the main stutter thread. This is
> then used for waiting in stutter_wait(). While the stutter thread still
> defines this absolute time, the waiters' waiting logic doesn't rely on
> the stutter thread receiving sufficient CPU time to halt the stuttering
> as the halting is now self-controlled.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/torture.c | 46 +++++++++++++---------------------------------
> 1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/torture.c b/kernel/torture.c
> index 68dba4ecab5c..63f8f2a7d960 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
> * suddenly applied to or removed from the system.
> */
> static struct task_struct *stutter_task;
> -static int stutter_pause_test;
> +static ktime_t stutter_till_abs_time;
> static int stutter;
> static int stutter_gap;
>
> @@ -729,30 +729,17 @@ static int stutter_gap;
> */
> bool stutter_wait(const char *title)
> {
> - unsigned int i = 0;
> bool ret = false;
> - int spt;
> + ktime_t now_ns, till_ns;
>
> cond_resched_tasks_rcu_qs();
> - spt = READ_ONCE(stutter_pause_test);
> - for (; spt; spt = READ_ONCE(stutter_pause_test)) {
> - if (!ret && !rt_task(current)) {
> - sched_set_normal(current, MAX_NICE);
> - ret = true;
> - }
> - if (spt == 1) {
> - torture_hrtimeout_jiffies(1, NULL);
> - } else if (spt == 2) {
> - while (READ_ONCE(stutter_pause_test)) {
> - if (!(i++ & 0xffff))
> - torture_hrtimeout_us(10, 0, NULL);
> - cond_resched();
> - }
> - } else {
> - torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
> - }
> - torture_shutdown_absorb(title);
> + now_ns = ktime_get();
> + till_ns = READ_ONCE(stutter_till_abs_time);
> + if (till_ns && ktime_before(now_ns, till_ns)) {
> + torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);

This ktime_sub() is roughly cancelled out by a ktime_add_safe() in
__hrtimer_start_range_ns(). Perhaps torture_hrtimeout_ns() needs to
take a mode argument as in the patch at the end of this email, allowing
you to ditch that ktime_sub() in favor of HRTIMER_MODE_ABS.

Thanx, Paul

> + ret = true;
> }
> + torture_shutdown_absorb(title);
> return ret;
> }
> EXPORT_SYMBOL_GPL(stutter_wait);
> @@ -763,23 +750,16 @@ EXPORT_SYMBOL_GPL(stutter_wait);
> */
> static int torture_stutter(void *arg)
> {
> - DEFINE_TORTURE_RANDOM(rand);
> - int wtime;
> + ktime_t till_ns;
>
> VERBOSE_TOROUT_STRING("torture_stutter task started");
> do {
> if (!torture_must_stop() && stutter > 1) {
> - wtime = stutter;
> - if (stutter > 2) {
> - WRITE_ONCE(stutter_pause_test, 1);
> - wtime = stutter - 3;
> - torture_hrtimeout_jiffies(wtime, &rand);
> - wtime = 2;
> - }
> - WRITE_ONCE(stutter_pause_test, 2);
> - torture_hrtimeout_jiffies(wtime, NULL);
> + till_ns = ktime_add_ns(ktime_get(),
> + jiffies_to_nsecs(stutter));
> + WRITE_ONCE(stutter_till_abs_time, till_ns);
> + torture_hrtimeout_jiffies(stutter - 1, NULL);
> }
> - WRITE_ONCE(stutter_pause_test, 0);
> if (!torture_must_stop())
> torture_hrtimeout_jiffies(stutter_gap, NULL);
> torture_shutdown_absorb("torture_stutter");

------------------------------------------------------------------------

commit 6445f014c3d4d20a0b69bd97d3b76a222820612f
Author: Paul E. McKenney <[email protected]>
Date: Wed Jul 26 13:57:03 2023 -0700

torture: Make torture_hrtimeout_ns() take an hrtimer mode parameter

The current torture-test sleeps are waiting for a duration, but there
are situations where it is better to wait for an absolute time, for
example, when ending a stutter interval. This commit therefore adds
an hrtimer mode parameter to torture_hrtimeout_ns(). Why not also the
other torture_hrtimeout_*() functions? The theory is that most absolute
times will be in nanoseconds, especially not (say) jiffies.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/torture.h b/include/linux/torture.h
index bb466eec01e4..017f0f710815 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -81,7 +81,8 @@ static inline void torture_random_init(struct torture_random_state *trsp)
}

/* Definitions for high-resolution-timer sleeps. */
-int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, struct torture_random_state *trsp);
+int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, const enum hrtimer_mode mode,
+ struct torture_random_state *trsp);
int torture_hrtimeout_us(u32 baset_us, u32 fuzzt_ns, struct torture_random_state *trsp);
int torture_hrtimeout_ms(u32 baset_ms, u32 fuzzt_us, struct torture_random_state *trsp);
int torture_hrtimeout_jiffies(u32 baset_j, struct torture_random_state *trsp);
diff --git a/kernel/torture.c b/kernel/torture.c
index 68dba4ecab5c..6ba62e5993e7 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -87,14 +87,15 @@ EXPORT_SYMBOL_GPL(verbose_torout_sleep);
* nanosecond random fuzz. This function and its friends desynchronize
* testing from the timer wheel.
*/
-int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, struct torture_random_state *trsp)
+int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, const enum hrtimer_mode mode,
+ struct torture_random_state *trsp)
{
ktime_t hto = baset_ns;

if (trsp)
hto += torture_random(trsp) % fuzzt_ns;
set_current_state(TASK_IDLE);
- return schedule_hrtimeout(&hto, HRTIMER_MODE_REL);
+ return schedule_hrtimeout(&hto, mode);
}
EXPORT_SYMBOL_GPL(torture_hrtimeout_ns);

@@ -106,7 +107,7 @@ int torture_hrtimeout_us(u32 baset_us, u32 fuzzt_ns, struct torture_random_state
{
ktime_t baset_ns = baset_us * NSEC_PER_USEC;

- return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp);
+ return torture_hrtimeout_ns(baset_ns, fuzzt_ns, HRTIMER_MODE_REL, trsp);
}
EXPORT_SYMBOL_GPL(torture_hrtimeout_us);

@@ -123,7 +124,7 @@ int torture_hrtimeout_ms(u32 baset_ms, u32 fuzzt_us, struct torture_random_state
fuzzt_ns = (u32)~0U;
else
fuzzt_ns = fuzzt_us * NSEC_PER_USEC;
- return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp);
+ return torture_hrtimeout_ns(baset_ns, fuzzt_ns, HRTIMER_MODE_REL, trsp);
}
EXPORT_SYMBOL_GPL(torture_hrtimeout_ms);

@@ -136,7 +137,7 @@ int torture_hrtimeout_jiffies(u32 baset_j, struct torture_random_state *trsp)
{
ktime_t baset_ns = jiffies_to_nsecs(baset_j);

- return torture_hrtimeout_ns(baset_ns, jiffies_to_nsecs(1), trsp);
+ return torture_hrtimeout_ns(baset_ns, jiffies_to_nsecs(1), HRTIMER_MODE_REL, trsp);
}
EXPORT_SYMBOL_GPL(torture_hrtimeout_jiffies);

@@ -153,7 +154,7 @@ int torture_hrtimeout_s(u32 baset_s, u32 fuzzt_ms, struct torture_random_state *
fuzzt_ns = (u32)~0U;
else
fuzzt_ns = fuzzt_ms * NSEC_PER_MSEC;
- return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp);
+ return torture_hrtimeout_ns(baset_ns, fuzzt_ns, HRTIMER_MODE_REL, trsp);
}
EXPORT_SYMBOL_GPL(torture_hrtimeout_s);


2023-07-27 03:17:20

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/5] rcutorture: Fix stuttering races and other issues

On Wed, Jul 26, 2023 at 4:59 PM Paul E. McKenney <[email protected]> wrote:
>
> On Tue, Jul 25, 2023 at 11:29:06PM +0000, Joel Fernandes (Google) wrote:
> > The stuttering code isn't functioning as expected. Ideally, it should
> > pause the torture threads for a designated period before resuming. Yet,
> > it fails to halt the test for the correct duration. Additionally, a race
> > condition exists, potentially causing the stuttering code to pause for
> > an extended period if the 'spt' variable is non-zero due to the stutter
> > orchestration thread's inadequate CPU time.
> >
> > Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
> > This happens as the stuttering code may run within a softirq due to RCU
> > callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
> > thus obstructing RCU's progress. This situation triggers a warning
> > message in the logs:
> >
> > [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
> >
> > This warning suggests that an RCU torture object, although invisible to
> > RCU readers, couldn't make it past the pipe array and be freed -- a
> > strong indication that there weren't enough grace periods during the
> > stutter interval.
> >
> > To address these issues, this patch sets the "stutter end" time to an
> > absolute point in the future set by the main stutter thread. This is
> > then used for waiting in stutter_wait(). While the stutter thread still
> > defines this absolute time, the waiters' waiting logic doesn't rely on
> > the stutter thread receiving sufficient CPU time to halt the stuttering
> > as the halting is now self-controlled.
> >
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/torture.c | 46 +++++++++++++---------------------------------
> > 1 file changed, 13 insertions(+), 33 deletions(-)
> >
> > diff --git a/kernel/torture.c b/kernel/torture.c
> > index 68dba4ecab5c..63f8f2a7d960 100644
> > --- a/kernel/torture.c
> > +++ b/kernel/torture.c
> > @@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
> > * suddenly applied to or removed from the system.
> > */
> > static struct task_struct *stutter_task;
> > -static int stutter_pause_test;
> > +static ktime_t stutter_till_abs_time;
> > static int stutter;
> > static int stutter_gap;
> >
> > @@ -729,30 +729,17 @@ static int stutter_gap;
> > */
> > bool stutter_wait(const char *title)
> > {
> > - unsigned int i = 0;
> > bool ret = false;
> > - int spt;
> > + ktime_t now_ns, till_ns;
> >
> > cond_resched_tasks_rcu_qs();
> > - spt = READ_ONCE(stutter_pause_test);
> > - for (; spt; spt = READ_ONCE(stutter_pause_test)) {
> > - if (!ret && !rt_task(current)) {
> > - sched_set_normal(current, MAX_NICE);
> > - ret = true;
> > - }
> > - if (spt == 1) {
> > - torture_hrtimeout_jiffies(1, NULL);
> > - } else if (spt == 2) {
> > - while (READ_ONCE(stutter_pause_test)) {
> > - if (!(i++ & 0xffff))
> > - torture_hrtimeout_us(10, 0, NULL);
> > - cond_resched();
> > - }
> > - } else {
> > - torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
> > - }
> > - torture_shutdown_absorb(title);
> > + now_ns = ktime_get();
> > + till_ns = READ_ONCE(stutter_till_abs_time);
> > + if (till_ns && ktime_before(now_ns, till_ns)) {
> > + torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);
>
> This ktime_sub() is roughly cancelled out by a ktime_add_safe() in
> __hrtimer_start_range_ns().

Yes, functionally it is the same but your suggestion is more robust I think.

> Perhaps torture_hrtimeout_ns() needs to
> take a mode argument as in the patch at the end of this email, allowing
> you to ditch that ktime_sub() in favor of HRTIMER_MODE_ABS.

Sure, or we can add a new API and keep the default as relative?

Or have 2 APIs:
torture_hrtimeout_relative_ns();

and:
torture_hrtimeout_absolute_ns();

That makes it more readable IMHO.

Also, do you want me to make both changes (API and usage) in the same
patch? Or were you planning to have a separate patch yourself in -dev
which I can use? Let me know either way, and then I'll refresh the
patch.

thanks,

- Joel

[..]

2023-07-27 04:40:59

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 1/5] rcutorture: Fix stuttering races and other issues

On Wed, Jul 26, 2023 at 11:01:40PM -0400, Joel Fernandes wrote:
> On Wed, Jul 26, 2023 at 4:59 PM Paul E. McKenney <[email protected]> wrote:
> >
> > On Tue, Jul 25, 2023 at 11:29:06PM +0000, Joel Fernandes (Google) wrote:
> > > The stuttering code isn't functioning as expected. Ideally, it should
> > > pause the torture threads for a designated period before resuming. Yet,
> > > it fails to halt the test for the correct duration. Additionally, a race
> > > condition exists, potentially causing the stuttering code to pause for
> > > an extended period if the 'spt' variable is non-zero due to the stutter
> > > orchestration thread's inadequate CPU time.
> > >
> > > Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
> > > This happens as the stuttering code may run within a softirq due to RCU
> > > callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
> > > thus obstructing RCU's progress. This situation triggers a warning
> > > message in the logs:
> > >
> > > [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
> > >
> > > This warning suggests that an RCU torture object, although invisible to
> > > RCU readers, couldn't make it past the pipe array and be freed -- a
> > > strong indication that there weren't enough grace periods during the
> > > stutter interval.
> > >
> > > To address these issues, this patch sets the "stutter end" time to an
> > > absolute point in the future set by the main stutter thread. This is
> > > then used for waiting in stutter_wait(). While the stutter thread still
> > > defines this absolute time, the waiters' waiting logic doesn't rely on
> > > the stutter thread receiving sufficient CPU time to halt the stuttering
> > > as the halting is now self-controlled.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > ---
> > > kernel/torture.c | 46 +++++++++++++---------------------------------
> > > 1 file changed, 13 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/kernel/torture.c b/kernel/torture.c
> > > index 68dba4ecab5c..63f8f2a7d960 100644
> > > --- a/kernel/torture.c
> > > +++ b/kernel/torture.c
> > > @@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
> > > * suddenly applied to or removed from the system.
> > > */
> > > static struct task_struct *stutter_task;
> > > -static int stutter_pause_test;
> > > +static ktime_t stutter_till_abs_time;
> > > static int stutter;
> > > static int stutter_gap;
> > >
> > > @@ -729,30 +729,17 @@ static int stutter_gap;
> > > */
> > > bool stutter_wait(const char *title)
> > > {
> > > - unsigned int i = 0;
> > > bool ret = false;
> > > - int spt;
> > > + ktime_t now_ns, till_ns;
> > >
> > > cond_resched_tasks_rcu_qs();
> > > - spt = READ_ONCE(stutter_pause_test);
> > > - for (; spt; spt = READ_ONCE(stutter_pause_test)) {
> > > - if (!ret && !rt_task(current)) {
> > > - sched_set_normal(current, MAX_NICE);
> > > - ret = true;
> > > - }
> > > - if (spt == 1) {
> > > - torture_hrtimeout_jiffies(1, NULL);
> > > - } else if (spt == 2) {
> > > - while (READ_ONCE(stutter_pause_test)) {
> > > - if (!(i++ & 0xffff))
> > > - torture_hrtimeout_us(10, 0, NULL);
> > > - cond_resched();
> > > - }
> > > - } else {
> > > - torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
> > > - }
> > > - torture_shutdown_absorb(title);
> > > + now_ns = ktime_get();
> > > + till_ns = READ_ONCE(stutter_till_abs_time);
> > > + if (till_ns && ktime_before(now_ns, till_ns)) {
> > > + torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);
> >
> > This ktime_sub() is roughly cancelled out by a ktime_add_safe() in
> > __hrtimer_start_range_ns().
>
> Yes, functionally it is the same but your suggestion is more robust I think.
>
> > Perhaps torture_hrtimeout_ns() needs to
> > take a mode argument as in the patch at the end of this email, allowing
> > you to ditch that ktime_sub() in favor of HRTIMER_MODE_ABS.
>
> Sure, or we can add a new API and keep the default as relative?
>
> Or have 2 APIs:
> torture_hrtimeout_relative_ns();
>
> and:
> torture_hrtimeout_absolute_ns();
>
> That makes it more readable IMHO.
>
> Also, do you want me to make both changes (API and usage) in the same
> patch? Or were you planning to have a separate patch yourself in -dev
> which I can use? Let me know either way, and then I'll refresh the
> patch.

I queued the patch on the -rcu tree's "dev" branch. It turns out that
torture_hrtimeout_ns() isn't called very many times, so adding the
parameter was straightforward. Plus the compiler might well optimize
it away anyway.

Thanx, Paul

2023-07-27 05:21:22

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH 1/5] rcutorture: Fix stuttering races and other issues

On 7/27/23 00:01, Paul E. McKenney wrote:
> On Wed, Jul 26, 2023 at 11:01:40PM -0400, Joel Fernandes wrote:
>> On Wed, Jul 26, 2023 at 4:59 PM Paul E. McKenney <[email protected]> wrote:
>>>
>>> On Tue, Jul 25, 2023 at 11:29:06PM +0000, Joel Fernandes (Google) wrote:
>>>> The stuttering code isn't functioning as expected. Ideally, it should
>>>> pause the torture threads for a designated period before resuming. Yet,
>>>> it fails to halt the test for the correct duration. Additionally, a race
>>>> condition exists, potentially causing the stuttering code to pause for
>>>> an extended period if the 'spt' variable is non-zero due to the stutter
>>>> orchestration thread's inadequate CPU time.
>>>>
>>>> Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
>>>> This happens as the stuttering code may run within a softirq due to RCU
>>>> callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
>>>> thus obstructing RCU's progress. This situation triggers a warning
>>>> message in the logs:
>>>>
>>>> [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
>>>>
>>>> This warning suggests that an RCU torture object, although invisible to
>>>> RCU readers, couldn't make it past the pipe array and be freed -- a
>>>> strong indication that there weren't enough grace periods during the
>>>> stutter interval.
>>>>
>>>> To address these issues, this patch sets the "stutter end" time to an
>>>> absolute point in the future set by the main stutter thread. This is
>>>> then used for waiting in stutter_wait(). While the stutter thread still
>>>> defines this absolute time, the waiters' waiting logic doesn't rely on
>>>> the stutter thread receiving sufficient CPU time to halt the stuttering
>>>> as the halting is now self-controlled.
>>>>
>>>> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>>>> ---
>>>> kernel/torture.c | 46 +++++++++++++---------------------------------
>>>> 1 file changed, 13 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/kernel/torture.c b/kernel/torture.c
>>>> index 68dba4ecab5c..63f8f2a7d960 100644
>>>> --- a/kernel/torture.c
>>>> +++ b/kernel/torture.c
>>>> @@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
>>>> * suddenly applied to or removed from the system.
>>>> */
>>>> static struct task_struct *stutter_task;
>>>> -static int stutter_pause_test;
>>>> +static ktime_t stutter_till_abs_time;
>>>> static int stutter;
>>>> static int stutter_gap;
>>>>
>>>> @@ -729,30 +729,17 @@ static int stutter_gap;
>>>> */
>>>> bool stutter_wait(const char *title)
>>>> {
>>>> - unsigned int i = 0;
>>>> bool ret = false;
>>>> - int spt;
>>>> + ktime_t now_ns, till_ns;
>>>>
>>>> cond_resched_tasks_rcu_qs();
>>>> - spt = READ_ONCE(stutter_pause_test);
>>>> - for (; spt; spt = READ_ONCE(stutter_pause_test)) {
>>>> - if (!ret && !rt_task(current)) {
>>>> - sched_set_normal(current, MAX_NICE);
>>>> - ret = true;
>>>> - }
>>>> - if (spt == 1) {
>>>> - torture_hrtimeout_jiffies(1, NULL);
>>>> - } else if (spt == 2) {
>>>> - while (READ_ONCE(stutter_pause_test)) {
>>>> - if (!(i++ & 0xffff))
>>>> - torture_hrtimeout_us(10, 0, NULL);
>>>> - cond_resched();
>>>> - }
>>>> - } else {
>>>> - torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
>>>> - }
>>>> - torture_shutdown_absorb(title);
>>>> + now_ns = ktime_get();
>>>> + till_ns = READ_ONCE(stutter_till_abs_time);
>>>> + if (till_ns && ktime_before(now_ns, till_ns)) {
>>>> + torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);
>>>
>>> This ktime_sub() is roughly cancelled out by a ktime_add_safe() in
>>> __hrtimer_start_range_ns().
>>
>> Yes, functionally it is the same but your suggestion is more robust I think.
>>
>>> Perhaps torture_hrtimeout_ns() needs to
>>> take a mode argument as in the patch at the end of this email, allowing
>>> you to ditch that ktime_sub() in favor of HRTIMER_MODE_ABS.
>>
>> Sure, or we can add a new API and keep the default as relative?
>>
>> Or have 2 APIs:
>> torture_hrtimeout_relative_ns();
>>
>> and:
>> torture_hrtimeout_absolute_ns();
>>
>> That makes it more readable IMHO.
>>
>> Also, do you want me to make both changes (API and usage) in the same
>> patch? Or were you planning to have a separate patch yourself in -dev
>> which I can use? Let me know either way, and then I'll refresh the
>> patch.
>
> I queued the patch on the -rcu tree's "dev" branch. It turns out that
> torture_hrtimeout_ns() isn't called very many times, so adding the
> parameter was straightforward. Plus the compiler might well optimize
> it away anyway.

Ok sounds good, I will make use of it in this patch and send it again after testing.

thanks,

- Joel