We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
is called, no matter whether it actually be scheduled or not. However,
it currently doesn't report the quiescent state when the task enters
into __schedule() as it's called with preempt = true. So make it report
the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
called.
And in TINY_RCU, even though the quiescent state of rcu_bh also should
be reported when the tick interrupt comes from user, it doesn't. So make
it reported.
Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
reported when the tick interrupt comes from not only user but also idle,
as an extended quiescent state.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/rcupdate.h | 4 ++--
kernel/rcu/tiny.c | 6 +++---
kernel/rcu/tree.c | 4 ++--
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ee8cf5fc..7432261 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
*/
#define cond_resched_tasks_rcu_qs() \
do { \
- if (!cond_resched()) \
- rcu_note_voluntary_context_switch_lite(current); \
+ rcu_note_voluntary_context_switch_lite(current); \
+ cond_resched(); \
} while (0)
/*
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index a64eee0..68d2332 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -120,12 +120,12 @@ void rcu_bh_qs(void)
*/
void rcu_check_callbacks(int user)
{
- if (user)
+ if (user) {
rcu_sched_qs();
- else if (!in_softirq())
rcu_bh_qs();
- if (user)
rcu_note_voluntary_context_switch(current);
+ } else if (!in_softirq())
+ rcu_bh_qs();
}
/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 36075dd..1abe29a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2595,6 +2595,7 @@ void rcu_check_callbacks(int user)
rcu_sched_qs();
rcu_bh_qs();
+ rcu_note_voluntary_context_switch(current);
} else if (!in_softirq()) {
@@ -2610,8 +2611,7 @@ void rcu_check_callbacks(int user)
rcu_preempt_check_callbacks();
if (rcu_pending())
invoke_rcu_core();
- if (user)
- rcu_note_voluntary_context_switch(current);
+
trace_rcu_utilization(TPS("End scheduler-tick"));
}
--
1.9.1
Hello folks,
I think I wrote the title in a misleading way.
Please change the title to something else such as,
"rcu: Report a quiescent state when it's in the state" or,
"rcu: Add points reporting quiescent states where proper" or so on.
On 2018-05-11 오후 5:30, Byungchul Park wrote:
> We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> is called, no matter whether it actually be scheduled or not. However,
> it currently doesn't report the quiescent state when the task enters
> into __schedule() as it's called with preempt = true. So make it report
> the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> called.
>
> And in TINY_RCU, even though the quiescent state of rcu_bh also should
> be reported when the tick interrupt comes from user, it doesn't. So make
> it reported.
>
> Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> reported when the tick interrupt comes from not only user but also idle,
> as an extended quiescent state.
>
> Signed-off-by: Byungchul Park <[email protected]>
> ---
> include/linux/rcupdate.h | 4 ++--
> kernel/rcu/tiny.c | 6 +++---
> kernel/rcu/tree.c | 4 ++--
> 3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index ee8cf5fc..7432261 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> */
> #define cond_resched_tasks_rcu_qs() \
> do { \
> - if (!cond_resched()) \
> - rcu_note_voluntary_context_switch_lite(current); \
> + rcu_note_voluntary_context_switch_lite(current); \
> + cond_resched(); \
> } while (0)
>
> /*
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index a64eee0..68d2332 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -120,12 +120,12 @@ void rcu_bh_qs(void)
> */
> void rcu_check_callbacks(int user)
> {
> - if (user)
> + if (user) {
> rcu_sched_qs();
> - else if (!in_softirq())
> rcu_bh_qs();
> - if (user)
> rcu_note_voluntary_context_switch(current);
> + } else if (!in_softirq())
> + rcu_bh_qs();
> }
>
> /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 36075dd..1abe29a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2595,6 +2595,7 @@ void rcu_check_callbacks(int user)
>
> rcu_sched_qs();
> rcu_bh_qs();
> + rcu_note_voluntary_context_switch(current);
>
> } else if (!in_softirq()) {
>
> @@ -2610,8 +2611,7 @@ void rcu_check_callbacks(int user)
> rcu_preempt_check_callbacks();
> if (rcu_pending())
> invoke_rcu_core();
> - if (user)
> - rcu_note_voluntary_context_switch(current);
> +
> trace_rcu_utilization(TPS("End scheduler-tick"));
> }
>
>
--
Thanks,
Byungchul
On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> Hello folks,
>
> I think I wrote the title in a misleading way.
>
> Please change the title to something else such as,
> "rcu: Report a quiescent state when it's in the state" or,
> "rcu: Add points reporting quiescent states where proper" or so on.
>
> On 2018-05-11 오후 5:30, Byungchul Park wrote:
> >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> >is called, no matter whether it actually be scheduled or not. However,
> >it currently doesn't report the quiescent state when the task enters
> >into __schedule() as it's called with preempt = true. So make it report
> >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> >called.
> >
> >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> >be reported when the tick interrupt comes from user, it doesn't. So make
> >it reported.
> >
> >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> >reported when the tick interrupt comes from not only user but also idle,
> >as an extended quiescent state.
> >
> >Signed-off-by: Byungchul Park <[email protected]>
> >---
> > include/linux/rcupdate.h | 4 ++--
> > kernel/rcu/tiny.c | 6 +++---
> > kernel/rcu/tree.c | 4 ++--
> > 3 files changed, 7 insertions(+), 7 deletions(-)
> >
> >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >index ee8cf5fc..7432261 100644
> >--- a/include/linux/rcupdate.h
> >+++ b/include/linux/rcupdate.h
> >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > */
> > #define cond_resched_tasks_rcu_qs() \
> > do { \
> >- if (!cond_resched()) \
> >- rcu_note_voluntary_context_switch_lite(current); \
> >+ rcu_note_voluntary_context_switch_lite(current); \
> >+ cond_resched(); \
Ah, good point.
Peter, I have to ask... Why is "cond_resched()" considered a preemption
while "schedule()" is not?
> > } while (0)
> > /*
> >diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> >index a64eee0..68d2332 100644
> >--- a/kernel/rcu/tiny.c
> >+++ b/kernel/rcu/tiny.c
> >@@ -120,12 +120,12 @@ void rcu_bh_qs(void)
> > */
> > void rcu_check_callbacks(int user)
> > {
> >- if (user)
> >+ if (user) {
> > rcu_sched_qs();
> >- else if (!in_softirq())
> > rcu_bh_qs();
> >- if (user)
> > rcu_note_voluntary_context_switch(current);
> >+ } else if (!in_softirq())
> >+ rcu_bh_qs();
> > }
> > /*
> >diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >index 36075dd..1abe29a 100644
> >--- a/kernel/rcu/tree.c
> >+++ b/kernel/rcu/tree.c
> >@@ -2595,6 +2595,7 @@ void rcu_check_callbacks(int user)
> > rcu_sched_qs();
> > rcu_bh_qs();
> >+ rcu_note_voluntary_context_switch(current);
> > } else if (!in_softirq()) {
> >@@ -2610,8 +2611,7 @@ void rcu_check_callbacks(int user)
> > rcu_preempt_check_callbacks();
> > if (rcu_pending())
> > invoke_rcu_core();
> >- if (user)
> >- rcu_note_voluntary_context_switch(current);
> >+
I recall that I had some reason for wanting this down here, but
do not recall the reason itself. I will try testing this patch
to see if rcutorture reminds me.
Thanx, Paul
> > trace_rcu_utilization(TPS("End scheduler-tick"));
> > }
> >
>
> --
> Thanks,
> Byungchul
>
On Fri, 11 May 2018 09:17:46 -0700
"Paul E. McKenney" <[email protected]> wrote:
> > >index ee8cf5fc..7432261 100644
> > >--- a/include/linux/rcupdate.h
> > >+++ b/include/linux/rcupdate.h
> > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > */
> > > #define cond_resched_tasks_rcu_qs() \
> > > do { \
> > >- if (!cond_resched()) \
> > >- rcu_note_voluntary_context_switch_lite(current); \
> > >+ rcu_note_voluntary_context_switch_lite(current); \
> > >+ cond_resched(); \
>
> Ah, good point.
>
> Peter, I have to ask... Why is "cond_resched()" considered a preemption
> while "schedule()" is not?
I would argue that cond_resched() not be considered a preemption.
Although, it may be called a "preemption point". A place that can be
preempted, but may not be. Maybe that's the answer. schedule() will
always schedule (even though it may pick the same task to run, but
not guaranteed to), where as, cond_resched() will only schedule if the
conditions are right. And maybe that's not really a "voluntary
schedule", although I think that can be argued against.
-- Steve
On Fri, 11 May 2018 12:23:21 -0400
Steven Rostedt <[email protected]> wrote:
> On Fri, 11 May 2018 09:17:46 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > > >index ee8cf5fc..7432261 100644
> > > >--- a/include/linux/rcupdate.h
> > > >+++ b/include/linux/rcupdate.h
> > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > */
> > > > #define cond_resched_tasks_rcu_qs() \
> > > > do { \
> > > >- if (!cond_resched()) \
> > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > >+ cond_resched(); \
> >
> > Ah, good point.
> >
> > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > while "schedule()" is not?
>
> I would argue that cond_resched() not be considered a preemption.
> Although, it may be called a "preemption point". A place that can be
> preempted, but may not be. Maybe that's the answer. schedule() will
> always schedule (even though it may pick the same task to run, but
> not guaranteed to), where as, cond_resched() will only schedule if the
> conditions are right. And maybe that's not really a "voluntary
> schedule", although I think that can be argued against.
>
I would also say that one should never call schedule() directly without
changing its state to something other than TASK_RUNNING. Hence, calling
schedule directly is saying you are ready to sleep. But that is not the
case with cond_resched() which should always be called with the state
as TASK_RUNNING.
-- Steve
On Fri, 11 May 2018 12:25:28 -0400
Steven Rostedt <[email protected]> wrote:
> I would also say that one should never call schedule() directly without
> changing its state to something other than TASK_RUNNING. Hence, calling
> schedule directly is saying you are ready to sleep. But that is not the
> case with cond_resched() which should always be called with the state
> as TASK_RUNNING.
To continue this, with tracing, when a task is scheduled out in the
RUNNING state, it is considered preempted, otherwise it is not.
-- Steve
On Fri, May 11, 2018 at 12:27:12PM -0400, Steven Rostedt wrote:
> On Fri, 11 May 2018 12:25:28 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > I would also say that one should never call schedule() directly without
> > changing its state to something other than TASK_RUNNING. Hence, calling
> > schedule directly is saying you are ready to sleep. But that is not the
> > case with cond_resched() which should always be called with the state
> > as TASK_RUNNING.
>
> To continue this, with tracing, when a task is scheduled out in the
> RUNNING state, it is considered preempted, otherwise it is not.
I suppose another option would be for cond_resched_tasks_rcu_qs() to set
(and later clear) a per-CPU variable that causes rcu_note_context_switch()
to ignore its "preempt" parameter. Byungchul's approach seems more
straightforward, though.
Thanx, Paul
On Fri, 11 May 2018 10:27:35 -0700
"Paul E. McKenney" <[email protected]> wrote:
> On Fri, May 11, 2018 at 12:27:12PM -0400, Steven Rostedt wrote:
> > On Fri, 11 May 2018 12:25:28 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > I would also say that one should never call schedule() directly without
> > > changing its state to something other than TASK_RUNNING. Hence, calling
> > > schedule directly is saying you are ready to sleep. But that is not the
> > > case with cond_resched() which should always be called with the state
> > > as TASK_RUNNING.
> >
> > To continue this, with tracing, when a task is scheduled out in the
> > RUNNING state, it is considered preempted, otherwise it is not.
>
> I suppose another option would be for cond_resched_tasks_rcu_qs() to set
> (and later clear) a per-CPU variable that causes rcu_note_context_switch()
> to ignore its "preempt" parameter. Byungchul's approach seems more
> straightforward, though.
I agree that I prefer Byungchul's approach better ;-)
-- Steve
On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > Hello folks,
> >
> > I think I wrote the title in a misleading way.
> >
> > Please change the title to something else such as,
> > "rcu: Report a quiescent state when it's in the state" or,
> > "rcu: Add points reporting quiescent states where proper" or so on.
> >
> > On 2018-05-11 오후 5:30, Byungchul Park wrote:
> > >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> > >is called, no matter whether it actually be scheduled or not. However,
> > >it currently doesn't report the quiescent state when the task enters
> > >into __schedule() as it's called with preempt = true. So make it report
> > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > >called.
> > >
> > >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> > >be reported when the tick interrupt comes from user, it doesn't. So make
> > >it reported.
> > >
> > >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > >reported when the tick interrupt comes from not only user but also idle,
> > >as an extended quiescent state.
> > >
> > >Signed-off-by: Byungchul Park <[email protected]>
> > >---
> > > include/linux/rcupdate.h | 4 ++--
> > > kernel/rcu/tiny.c | 6 +++---
> > > kernel/rcu/tree.c | 4 ++--
> > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > >
> > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > >index ee8cf5fc..7432261 100644
> > >--- a/include/linux/rcupdate.h
> > >+++ b/include/linux/rcupdate.h
> > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > */
> > > #define cond_resched_tasks_rcu_qs() \
> > > do { \
> > >- if (!cond_resched()) \
> > >- rcu_note_voluntary_context_switch_lite(current); \
> > >+ rcu_note_voluntary_context_switch_lite(current); \
> > >+ cond_resched(); \
>
> Ah, good point.
>
> Peter, I have to ask... Why is "cond_resched()" considered a preemption
> while "schedule()" is not?
Infact something interesting I inferred from the __schedule loop related to
your question:
switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
assume that switch_count reflects whether the context switch is involuntary
or voluntary,
task-running-state preempt switch_count
0 (running) 1 involuntary
0 0 involuntary
1 0 voluntary
1 1 involuntary
According to the above table, both the task's running state and the preempt
parameter to __schedule should be used together to determine if the switch is
a voluntary one or not.
So this code in rcu_note_context_switch should really be:
if (!preempt && !(current->state & TASK_RUNNING))
rcu_note_voluntary_context_switch_lite(current);
According to the above table, cond_resched always classifies as an
involuntary switch which makes sense to me. Even though cond_resched is
explicitly called, its still sort of involuntary in the sense its not called
into the scheduler for sleeping, but rather for seeing if something else can
run instead (a preemption point). Infact none of the task deactivation in the
__schedule loop will run if cond_resched is used.
I agree that if schedule was called directly but with TASK_RUNNING=1, then
that could probably be classified an involuntary switch too...
Also since we're deciding to call rcu_note_voluntary_context_switch_lite
unconditionally, then IMO this comment on that macro:
/*
* Note a voluntary context switch for RCU-tasks benefit. This is a
* macro rather than an inline function to avoid #include hell.
*/
#ifdef CONFIG_TASKS_RCU
#define rcu_note_voluntary_context_switch_lite(t)
Should be changed to:
/*
* Note a attempt to perform a voluntary context switch for RCU-tasks
* benefit. This is called even in situations where a context switch
* didn't really happen even though it was requested. This is a
* macro rather than an inline function to avoid #include hell.
*/
#ifdef CONFIG_TASKS_RCU
#define rcu_note_voluntary_context_switch_lite(t)
Right?
Correct me if I'm wrong about anything, thanks,
- Joel
On Fri, May 11, 2018 at 03:41:38PM -0700, Joel Fernandes wrote:
> On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> > On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > > Hello folks,
> > >
> > > I think I wrote the title in a misleading way.
> > >
> > > Please change the title to something else such as,
> > > "rcu: Report a quiescent state when it's in the state" or,
> > > "rcu: Add points reporting quiescent states where proper" or so on.
> > >
> > > On 2018-05-11 오후 5:30, Byungchul Park wrote:
> > > >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> > > >is called, no matter whether it actually be scheduled or not. However,
> > > >it currently doesn't report the quiescent state when the task enters
> > > >into __schedule() as it's called with preempt = true. So make it report
> > > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > > >called.
> > > >
> > > >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> > > >be reported when the tick interrupt comes from user, it doesn't. So make
> > > >it reported.
> > > >
> > > >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > > >reported when the tick interrupt comes from not only user but also idle,
> > > >as an extended quiescent state.
> > > >
> > > >Signed-off-by: Byungchul Park <[email protected]>
> > > >---
> > > > include/linux/rcupdate.h | 4 ++--
> > > > kernel/rcu/tiny.c | 6 +++---
> > > > kernel/rcu/tree.c | 4 ++--
> > > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > >index ee8cf5fc..7432261 100644
> > > >--- a/include/linux/rcupdate.h
> > > >+++ b/include/linux/rcupdate.h
> > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > */
> > > > #define cond_resched_tasks_rcu_qs() \
> > > > do { \
> > > >- if (!cond_resched()) \
> > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > >+ cond_resched(); \
> >
> > Ah, good point.
> >
> > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > while "schedule()" is not?
>
> Infact something interesting I inferred from the __schedule loop related to
> your question:
>
> switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> assume that switch_count reflects whether the context switch is involuntary
> or voluntary,
>
> task-running-state preempt switch_count
> 0 (running) 1 involuntary
> 0 0 involuntary
> 1 0 voluntary
> 1 1 involuntary
>
> According to the above table, both the task's running state and the preempt
> parameter to __schedule should be used together to determine if the switch is
> a voluntary one or not.
>
> So this code in rcu_note_context_switch should really be:
> if (!preempt && !(current->state & TASK_RUNNING))
> rcu_note_voluntary_context_switch_lite(current);
>
> According to the above table, cond_resched always classifies as an
> involuntary switch which makes sense to me. Even though cond_resched is
> explicitly called, its still sort of involuntary in the sense its not called
> into the scheduler for sleeping, but rather for seeing if something else can
> run instead (a preemption point). Infact none of the task deactivation in the
> __schedule loop will run if cond_resched is used.
>
> I agree that if schedule was called directly but with TASK_RUNNING=1, then
> that could probably be classified an involuntary switch too...
>
> Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> unconditionally, then IMO this comment on that macro:
>
> /*
> * Note a voluntary context switch for RCU-tasks benefit. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Should be changed to:
>
> /*
> * Note a attempt to perform a voluntary context switch for RCU-tasks
> * benefit. This is called even in situations where a context switch
> * didn't really happen even though it was requested. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Right?
>
> Correct me if I'm wrong about anything, thanks,
The starting point for me is that Tasks RCU is a special-purpose mechanism
for freeing trampolines in PREEMPT=y kernels. The approach is to arrange
for the trampoline to be inaccessible to future execution, wait for a
tasks-RCU grace period, then free the trampoline. So a tasks-RCU grace
period must wait until all tasks have spent at least some time outside
of a trampoline. My understanding is that trampolines cannot contain
preemption points, such as cond_resched() and cond_resched_tasks_rcu_qs(),
so we want to count them as quiescent states regardless of whether or
not any associated context switch is counted as involuntary.
What situations lead to the second line of your table above?
The sched_yield() system call, but trampolines don't do system calls,
either, as far as I know.
So it looks to me like that test can leave out the TASK_RUNNING check.
Or am I missing something subtle?
Thanx, Paul
On Fri, May 11, 2018 at 10:08:24PM -0700, Paul E. McKenney wrote:
> On Fri, May 11, 2018 at 03:41:38PM -0700, Joel Fernandes wrote:
> > On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> > > On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > > > Hello folks,
> > > >
> > > > I think I wrote the title in a misleading way.
> > > >
> > > > Please change the title to something else such as,
> > > > "rcu: Report a quiescent state when it's in the state" or,
> > > > "rcu: Add points reporting quiescent states where proper" or so on.
> > > >
> > > > On 2018-05-11 오후 5:30, Byungchul Park wrote:
> > > > >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> > > > >is called, no matter whether it actually be scheduled or not. However,
> > > > >it currently doesn't report the quiescent state when the task enters
> > > > >into __schedule() as it's called with preempt = true. So make it report
> > > > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > > > >called.
> > > > >
> > > > >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> > > > >be reported when the tick interrupt comes from user, it doesn't. So make
> > > > >it reported.
> > > > >
> > > > >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > > > >reported when the tick interrupt comes from not only user but also idle,
> > > > >as an extended quiescent state.
> > > > >
> > > > >Signed-off-by: Byungchul Park <[email protected]>
> > > > >---
> > > > > include/linux/rcupdate.h | 4 ++--
> > > > > kernel/rcu/tiny.c | 6 +++---
> > > > > kernel/rcu/tree.c | 4 ++--
> > > > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > > > >
> > > > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > >index ee8cf5fc..7432261 100644
> > > > >--- a/include/linux/rcupdate.h
> > > > >+++ b/include/linux/rcupdate.h
> > > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > > */
> > > > > #define cond_resched_tasks_rcu_qs() \
> > > > > do { \
> > > > >- if (!cond_resched()) \
> > > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > > >+ cond_resched(); \
> > >
> > > Ah, good point.
> > >
> > > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > > while "schedule()" is not?
> >
> > Infact something interesting I inferred from the __schedule loop related to
> > your question:
> >
> > switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> > assume that switch_count reflects whether the context switch is involuntary
> > or voluntary,
> >
> > task-running-state preempt switch_count
> > 0 (running) 1 involuntary
> > 0 0 involuntary
> > 1 0 voluntary
> > 1 1 involuntary
> >
> > According to the above table, both the task's running state and the preempt
> > parameter to __schedule should be used together to determine if the switch is
> > a voluntary one or not.
> >
> > So this code in rcu_note_context_switch should really be:
> > if (!preempt && !(current->state & TASK_RUNNING))
I should have writte here- !preempt && current->state
> > rcu_note_voluntary_context_switch_lite(current);
> >
> > According to the above table, cond_resched always classifies as an
> > involuntary switch which makes sense to me. Even though cond_resched is
> > explicitly called, its still sort of involuntary in the sense its not called
> > into the scheduler for sleeping, but rather for seeing if something else can
> > run instead (a preemption point). Infact none of the task deactivation in the
> > __schedule loop will run if cond_resched is used.
> >
> > I agree that if schedule was called directly but with TASK_RUNNING=1, then
> > that could probably be classified an involuntary switch too...
> >
> > Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> > unconditionally, then IMO this comment on that macro:
> >
> > /*
> > * Note a voluntary context switch for RCU-tasks benefit. This is a
> > * macro rather than an inline function to avoid #include hell.
> > */
> > #ifdef CONFIG_TASKS_RCU
> > #define rcu_note_voluntary_context_switch_lite(t)
> >
> > Should be changed to:
> >
> > /*
> > * Note a attempt to perform a voluntary context switch for RCU-tasks
> > * benefit. This is called even in situations where a context switch
> > * didn't really happen even though it was requested. This is a
> > * macro rather than an inline function to avoid #include hell.
> > */
> > #ifdef CONFIG_TASKS_RCU
> > #define rcu_note_voluntary_context_switch_lite(t)
> >
> > Right?
> >
> > Correct me if I'm wrong about anything, thanks,
>
> The starting point for me is that Tasks RCU is a special-purpose mechanism
> for freeing trampolines in PREEMPT=y kernels. The approach is to arrange
> for the trampoline to be inaccessible to future execution, wait for a
> tasks-RCU grace period, then free the trampoline. So a tasks-RCU grace
> period must wait until all tasks have spent at least some time outside
> of a trampoline. My understanding is that trampolines cannot contain
> preemption points, such as cond_resched() and cond_resched_tasks_rcu_qs(),
> so we want to count them as quiescent states regardless of whether or
> not any associated context switch is counted as involuntary.
>
> What situations lead to the second line of your table above?
> The sched_yield() system call, but trampolines don't do system calls,
> either, as far as I know.
>
> So it looks to me like that test can leave out the TASK_RUNNING check.
I don't know much about tasks-RCU to comment more, sorry. Probably a few more
reading nights for me to catch up with that. Its possible the check is not
needed and tasks-RCU can survive without it, but I was thinking from a
correctness and future-proofing stand point... I generally don't like
inconsistencies. The check in the __schedule loop is as:
if (!preempt && prev->state) {
....
// switch_count = voluntary context switch counter pointer
....
} else {
....
// switch_count = involuntary context switch counter pointer
....
}
// context switch really happening
if (prev != next) {
....
++switch_count;
}
The first conditional if (!preempt...) above is what I was referring to which
also checks the state.
Also this issue aside, I was more trying to answer your question about why
schedule() is or isn't a preemption point, by sharing the table but I
possibly caused more confusion, sorry :-(. I'll let Peter and Steven chime in
since they know more than me about that and will just shutup and listen
instead of being more noisy.. :-D
thanks,
- Joel
On Fri, May 11, 2018 at 11:30:37PM -0700, Joel Fernandes wrote:
> On Fri, May 11, 2018 at 10:08:24PM -0700, Paul E. McKenney wrote:
> > On Fri, May 11, 2018 at 03:41:38PM -0700, Joel Fernandes wrote:
> > > On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> > > > On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > > > > Hello folks,
> > > > >
> > > > > I think I wrote the title in a misleading way.
> > > > >
> > > > > Please change the title to something else such as,
> > > > > "rcu: Report a quiescent state when it's in the state" or,
> > > > > "rcu: Add points reporting quiescent states where proper" or so on.
> > > > >
> > > > > On 2018-05-11 오후 5:30, Byungchul Park wrote:
> > > > > >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> > > > > >is called, no matter whether it actually be scheduled or not. However,
> > > > > >it currently doesn't report the quiescent state when the task enters
> > > > > >into __schedule() as it's called with preempt = true. So make it report
> > > > > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > > > > >called.
> > > > > >
> > > > > >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> > > > > >be reported when the tick interrupt comes from user, it doesn't. So make
> > > > > >it reported.
> > > > > >
> > > > > >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > > > > >reported when the tick interrupt comes from not only user but also idle,
> > > > > >as an extended quiescent state.
> > > > > >
> > > > > >Signed-off-by: Byungchul Park <[email protected]>
> > > > > >---
> > > > > > include/linux/rcupdate.h | 4 ++--
> > > > > > kernel/rcu/tiny.c | 6 +++---
> > > > > > kernel/rcu/tree.c | 4 ++--
> > > > > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > > > > >
> > > > > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > > >index ee8cf5fc..7432261 100644
> > > > > >--- a/include/linux/rcupdate.h
> > > > > >+++ b/include/linux/rcupdate.h
> > > > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > > > */
> > > > > > #define cond_resched_tasks_rcu_qs() \
> > > > > > do { \
> > > > > >- if (!cond_resched()) \
> > > > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > > > >+ cond_resched(); \
> > > >
> > > > Ah, good point.
> > > >
> > > > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > > > while "schedule()" is not?
> > >
> > > Infact something interesting I inferred from the __schedule loop related to
> > > your question:
> > >
> > > switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> > > assume that switch_count reflects whether the context switch is involuntary
> > > or voluntary,
> > >
> > > task-running-state preempt switch_count
> > > 0 (running) 1 involuntary
> > > 0 0 involuntary
> > > 1 0 voluntary
> > > 1 1 involuntary
> > >
> > > According to the above table, both the task's running state and the preempt
> > > parameter to __schedule should be used together to determine if the switch is
> > > a voluntary one or not.
> > >
> > > So this code in rcu_note_context_switch should really be:
> > > if (!preempt && !(current->state & TASK_RUNNING))
>
> I should have writte here- !preempt && current->state
>
> > > rcu_note_voluntary_context_switch_lite(current);
> > >
> > > According to the above table, cond_resched always classifies as an
> > > involuntary switch which makes sense to me. Even though cond_resched is
> > > explicitly called, its still sort of involuntary in the sense its not called
> > > into the scheduler for sleeping, but rather for seeing if something else can
> > > run instead (a preemption point). Infact none of the task deactivation in the
> > > __schedule loop will run if cond_resched is used.
> > >
> > > I agree that if schedule was called directly but with TASK_RUNNING=1, then
> > > that could probably be classified an involuntary switch too...
> > >
> > > Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> > > unconditionally, then IMO this comment on that macro:
> > >
> > > /*
> > > * Note a voluntary context switch for RCU-tasks benefit. This is a
> > > * macro rather than an inline function to avoid #include hell.
> > > */
> > > #ifdef CONFIG_TASKS_RCU
> > > #define rcu_note_voluntary_context_switch_lite(t)
> > >
> > > Should be changed to:
> > >
> > > /*
> > > * Note a attempt to perform a voluntary context switch for RCU-tasks
> > > * benefit. This is called even in situations where a context switch
> > > * didn't really happen even though it was requested. This is a
> > > * macro rather than an inline function to avoid #include hell.
> > > */
> > > #ifdef CONFIG_TASKS_RCU
> > > #define rcu_note_voluntary_context_switch_lite(t)
> > >
> > > Right?
> > >
> > > Correct me if I'm wrong about anything, thanks,
> >
> > The starting point for me is that Tasks RCU is a special-purpose mechanism
> > for freeing trampolines in PREEMPT=y kernels. The approach is to arrange
> > for the trampoline to be inaccessible to future execution, wait for a
> > tasks-RCU grace period, then free the trampoline. So a tasks-RCU grace
> > period must wait until all tasks have spent at least some time outside
> > of a trampoline. My understanding is that trampolines cannot contain
> > preemption points, such as cond_resched() and cond_resched_tasks_rcu_qs(),
> > so we want to count them as quiescent states regardless of whether or
> > not any associated context switch is counted as involuntary.
> >
> > What situations lead to the second line of your table above?
> > The sched_yield() system call, but trampolines don't do system calls,
> > either, as far as I know.
> >
> > So it looks to me like that test can leave out the TASK_RUNNING check.
>
> I don't know much about tasks-RCU to comment more, sorry. Probably a few more
> reading nights for me to catch up with that. Its possible the check is not
> needed and tasks-RCU can survive without it, but I was thinking from a
> correctness and future-proofing stand point... I generally don't like
> inconsistencies. The check in the __schedule loop is as:
>
> if (!preempt && prev->state) {
> ....
> // switch_count = voluntary context switch counter pointer
> ....
> } else {
> ....
>
> // switch_count = involuntary context switch counter pointer
> ....
> }
>
> // context switch really happening
> if (prev != next) {
> ....
> ++switch_count;
> }
>
> The first conditional if (!preempt...) above is what I was referring to which
> also checks the state.
>
> Also this issue aside, I was more trying to answer your question about why
> schedule() is or isn't a preemption point, by sharing the table but I
> possibly caused more confusion, sorry :-(. I'll let Peter and Steven chime in
> since they know more than me about that and will just shutup and listen
> instead of being more noisy.. :-D
Don't get me wrong, this discussion was quite useful to me. We probably
need to at least change the comments, and perhaps the code as well. But
I agree that we need input from Peter and Steven to make much more forward
progress.
Thanx, Paul
On Sat, 12 May 2018 07:41:19 -0700
"Paul E. McKenney" <[email protected]> wrote:
> Don't get me wrong, this discussion was quite useful to me. We probably
> need to at least change the comments, and perhaps the code as well. But
> I agree that we need input from Peter and Steven to make much more forward
> progress.
It's the weekend so I skimmed more than read this thread, but I will
just add this.
The table Joel posted is interesting, and perhaps we should keep things
consistent with that. But that said, with respect to task-RCU, as
nothing on a trampoline should ever call cond_resched() (and perhaps I
should add code in lockdep that verifies this), we just want a
quiescent state that tells us that the task has left the trampoline. A
cond_resched() should be one of those points that does.
It really has nothing to do with scheduling or preemption. The issue is
that if a task is on a trampoline and gets preempted, there's no
knowing when it is off that trampoline where we can free it. We need to
have places in the kernel that we know is a quiescent state to move
task-RCU forward. cond_resched() seems to be one of them. schedule
itself can not be, because it can be called from an interrupt preempting
a task while it is on the trampoline.
-- Steve
On Sat, May 12, 2018 at 07:41:19AM -0700, Paul E. McKenney wrote:
> On Fri, May 11, 2018 at 11:30:37PM -0700, Joel Fernandes wrote:
> > On Fri, May 11, 2018 at 10:08:24PM -0700, Paul E. McKenney wrote:
> > > On Fri, May 11, 2018 at 03:41:38PM -0700, Joel Fernandes wrote:
> > > > On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> > > > > > Hello folks,
> > > > > >
> > > > > > I think I wrote the title in a misleading way.
> > > > > >
> > > > > > Please change the title to something else such as,
> > > > > > "rcu: Report a quiescent state when it's in the state" or,
> > > > > > "rcu: Add points reporting quiescent states where proper" or so on.
> > > > > >
> > > > > > On 2018-05-11 오후 5:30, Byungchul Park wrote:
> > > > > > >We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> > > > > > >is called, no matter whether it actually be scheduled or not. However,
> > > > > > >it currently doesn't report the quiescent state when the task enters
> > > > > > >into __schedule() as it's called with preempt = true. So make it report
> > > > > > >the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> > > > > > >called.
> > > > > > >
> > > > > > >And in TINY_RCU, even though the quiescent state of rcu_bh also should
> > > > > > >be reported when the tick interrupt comes from user, it doesn't. So make
> > > > > > >it reported.
> > > > > > >
> > > > > > >Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> > > > > > >reported when the tick interrupt comes from not only user but also idle,
> > > > > > >as an extended quiescent state.
> > > > > > >
> > > > > > >Signed-off-by: Byungchul Park <[email protected]>
> > > > > > >---
> > > > > > > include/linux/rcupdate.h | 4 ++--
> > > > > > > kernel/rcu/tiny.c | 6 +++---
> > > > > > > kernel/rcu/tree.c | 4 ++--
> > > > > > > 3 files changed, 7 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > > > >index ee8cf5fc..7432261 100644
> > > > > > >--- a/include/linux/rcupdate.h
> > > > > > >+++ b/include/linux/rcupdate.h
> > > > > > >@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> > > > > > > */
> > > > > > > #define cond_resched_tasks_rcu_qs() \
> > > > > > > do { \
> > > > > > >- if (!cond_resched()) \
> > > > > > >- rcu_note_voluntary_context_switch_lite(current); \
> > > > > > >+ rcu_note_voluntary_context_switch_lite(current); \
> > > > > > >+ cond_resched(); \
> > > > >
> > > > > Ah, good point.
> > > > >
> > > > > Peter, I have to ask... Why is "cond_resched()" considered a preemption
> > > > > while "schedule()" is not?
> > > >
> > > > Infact something interesting I inferred from the __schedule loop related to
> > > > your question:
> > > >
> > > > switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> > > > assume that switch_count reflects whether the context switch is involuntary
> > > > or voluntary,
> > > >
> > > > task-running-state preempt switch_count
> > > > 0 (running) 1 involuntary
> > > > 0 0 involuntary
> > > > 1 0 voluntary
> > > > 1 1 involuntary
> > > >
> > > > According to the above table, both the task's running state and the preempt
> > > > parameter to __schedule should be used together to determine if the switch is
> > > > a voluntary one or not.
> > > >
> > > > So this code in rcu_note_context_switch should really be:
> > > > if (!preempt && !(current->state & TASK_RUNNING))
> >
> > I should have writte here- !preempt && current->state
> >
> > > > rcu_note_voluntary_context_switch_lite(current);
> > > >
> > > > According to the above table, cond_resched always classifies as an
> > > > involuntary switch which makes sense to me. Even though cond_resched is
> > > > explicitly called, its still sort of involuntary in the sense its not called
> > > > into the scheduler for sleeping, but rather for seeing if something else can
> > > > run instead (a preemption point). Infact none of the task deactivation in the
> > > > __schedule loop will run if cond_resched is used.
> > > >
> > > > I agree that if schedule was called directly but with TASK_RUNNING=1, then
> > > > that could probably be classified an involuntary switch too...
> > > >
> > > > Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> > > > unconditionally, then IMO this comment on that macro:
> > > >
> > > > /*
> > > > * Note a voluntary context switch for RCU-tasks benefit. This is a
> > > > * macro rather than an inline function to avoid #include hell.
> > > > */
> > > > #ifdef CONFIG_TASKS_RCU
> > > > #define rcu_note_voluntary_context_switch_lite(t)
> > > >
> > > > Should be changed to:
> > > >
> > > > /*
> > > > * Note a attempt to perform a voluntary context switch for RCU-tasks
> > > > * benefit. This is called even in situations where a context switch
> > > > * didn't really happen even though it was requested. This is a
> > > > * macro rather than an inline function to avoid #include hell.
> > > > */
> > > > #ifdef CONFIG_TASKS_RCU
> > > > #define rcu_note_voluntary_context_switch_lite(t)
> > > >
> > > > Right?
> > > >
> > > > Correct me if I'm wrong about anything, thanks,
> > >
> > > The starting point for me is that Tasks RCU is a special-purpose mechanism
> > > for freeing trampolines in PREEMPT=y kernels. The approach is to arrange
> > > for the trampoline to be inaccessible to future execution, wait for a
> > > tasks-RCU grace period, then free the trampoline. So a tasks-RCU grace
> > > period must wait until all tasks have spent at least some time outside
> > > of a trampoline. My understanding is that trampolines cannot contain
> > > preemption points, such as cond_resched() and cond_resched_tasks_rcu_qs(),
> > > so we want to count them as quiescent states regardless of whether or
> > > not any associated context switch is counted as involuntary.
> > >
> > > What situations lead to the second line of your table above?
> > > The sched_yield() system call, but trampolines don't do system calls,
> > > either, as far as I know.
> > >
> > > So it looks to me like that test can leave out the TASK_RUNNING check.
> >
> > I don't know much about tasks-RCU to comment more, sorry. Probably a few more
> > reading nights for me to catch up with that. Its possible the check is not
> > needed and tasks-RCU can survive without it, but I was thinking from a
> > correctness and future-proofing stand point... I generally don't like
> > inconsistencies. The check in the __schedule loop is as:
> >
> > if (!preempt && prev->state) {
> > ....
> > // switch_count = voluntary context switch counter pointer
> > ....
> > } else {
> > ....
> >
> > // switch_count = involuntary context switch counter pointer
> > ....
> > }
> >
> > // context switch really happening
> > if (prev != next) {
> > ....
> > ++switch_count;
> > }
> >
> > The first conditional if (!preempt...) above is what I was referring to which
> > also checks the state.
> >
> > Also this issue aside, I was more trying to answer your question about why
> > schedule() is or isn't a preemption point, by sharing the table but I
> > possibly caused more confusion, sorry :-(. I'll let Peter and Steven chime in
> > since they know more than me about that and will just shutup and listen
> > instead of being more noisy.. :-D
>
> Don't get me wrong, this discussion was quite useful to me. We probably
Cool, I'm glad, thanks.
> need to at least change the comments, and perhaps the code as well. But
Ok. I can make this kind of a comment in my clean up patch series just to
clarify how the macro is used for.
> I agree that we need input from Peter and Steven to make much more forward
> progress.
I saw Steven's email just now about cond_resched and stuff and it makes sense.
thanks,
- Joel
On 2018-05-12 오전 7:41, Joel Fernandes wrote:
> On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
>> On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
>>> Hello folks,
>>>
>>> I think I wrote the title in a misleading way.
>>>
>>> Please change the title to something else such as,
>>> "rcu: Report a quiescent state when it's in the state" or,
>>> "rcu: Add points reporting quiescent states where proper" or so on.
>>>
>>> On 2018-05-11 오후 5:30, Byungchul Park wrote:
>>>> We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
>>>> is called, no matter whether it actually be scheduled or not. However,
>>>> it currently doesn't report the quiescent state when the task enters
>>>> into __schedule() as it's called with preempt = true. So make it report
>>>> the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
>>>> called.
>>>>
>>>> And in TINY_RCU, even though the quiescent state of rcu_bh also should
>>>> be reported when the tick interrupt comes from user, it doesn't. So make
>>>> it reported.
>>>>
>>>> Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
>>>> reported when the tick interrupt comes from not only user but also idle,
>>>> as an extended quiescent state.
>>>>
>>>> Signed-off-by: Byungchul Park <[email protected]>
>>>> ---
>>>> include/linux/rcupdate.h | 4 ++--
>>>> kernel/rcu/tiny.c | 6 +++---
>>>> kernel/rcu/tree.c | 4 ++--
>>>> 3 files changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>>> index ee8cf5fc..7432261 100644
>>>> --- a/include/linux/rcupdate.h
>>>> +++ b/include/linux/rcupdate.h
>>>> @@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
>>>> */
>>>> #define cond_resched_tasks_rcu_qs() \
>>>> do { \
>>>> - if (!cond_resched()) \
>>>> - rcu_note_voluntary_context_switch_lite(current); \
>>>> + rcu_note_voluntary_context_switch_lite(current); \
>>>> + cond_resched(); \
>>
>> Ah, good point.
>>
>> Peter, I have to ask... Why is "cond_resched()" considered a preemption
>> while "schedule()" is not?
>
> Infact something interesting I inferred from the __schedule loop related to
> your question:
>
> switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> assume that switch_count reflects whether the context switch is involuntary
> or voluntary,
>
> task-running-state preempt switch_count
> 0 (running) 1 involuntary
> 0 0 involuntary
> 1 0 voluntary
> 1 1 involuntary
>
> According to the above table, both the task's running state and the preempt
> parameter to __schedule should be used together to determine if the switch is
> a voluntary one or not.
>
> So this code in rcu_note_context_switch should really be:
> if (!preempt && !(current->state & TASK_RUNNING))
> rcu_note_voluntary_context_switch_lite(current);
>
> According to the above table, cond_resched always classifies as an
> involuntary switch which makes sense to me. Even though cond_resched is
Hello guys,
The classification for nivcsw/nvcsw used in scheduler core, Joel, you
showed us is different from that used in when we distinguish between
non preemption/voluntary preemption/preemption/full and so on, even
they use the same word, "voluntary" though.
The name, rcu_note_voluntary_context_switch_lite() used in RCU has
a lot to do with the latter, the term of preemption. Furthermore, I
think the function should be called even when calling schedule() for
sleep as well. I think it would be better to change the function
name to something else to prevent confusing, it's up to Paul tho. :)
> explicitly called, its still sort of involuntary in the sense its not called
> into the scheduler for sleeping, but rather for seeing if something else can
> run instead (a preemption point). Infact none of the task deactivation in the
> __schedule loop will run if cond_resched is used.
>
> I agree that if schedule was called directly but with TASK_RUNNING=1, then
> that could probably be classified an involuntary switch too...
>
> Also since we're deciding to call rcu_note_voluntary_context_switch_lite
> unconditionally, then IMO this comment on that macro:
>
> /*
> * Note a voluntary context switch for RCU-tasks benefit. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Should be changed to:
>
> /*
> * Note a attempt to perform a voluntary context switch for RCU-tasks
> * benefit. This is called even in situations where a context switch
> * didn't really happen even though it was requested. This is a
> * macro rather than an inline function to avoid #include hell.
> */
> #ifdef CONFIG_TASKS_RCU
> #define rcu_note_voluntary_context_switch_lite(t)
>
> Right?
>
> Correct me if I'm wrong about anything, thanks,
>
> - Joel
>
>
--
Thanks,
Byungchul
On 2018-05-13 오전 2:26, Steven Rostedt wrote:
> On Sat, 12 May 2018 07:41:19 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>> Don't get me wrong, this discussion was quite useful to me. We probably
>> need to at least change the comments, and perhaps the code as well. But
>> I agree that we need input from Peter and Steven to make much more forward
>> progress.
>
> It's the weekend so I skimmed more than read this thread, but I will
> just add this.
>
> The table Joel posted is interesting, and perhaps we should keep things
> consistent with that. But that said, with respect to task-RCU, as
> nothing on a trampoline should ever call cond_resched() (and perhaps I
> should add code in lockdep that verifies this), we just want a
> quiescent state that tells us that the task has left the trampoline. A
> cond_resched() should be one of those points that does.
>
> It really has nothing to do with scheduling or preemption. The issue is
> that if a task is on a trampoline and gets preempted, there's no
> knowing when it is off that trampoline where we can free it. We need to
> have places in the kernel that we know is a quiescent state to move
> task-RCU forward. cond_resched() seems to be one of them. schedule
> itself can not be, because it can be called from an interrupt preempting
> a task while it is on the trampoline.
Exactly. I think Steven explained how we should consider them exactly.
> -- Steve
--
Thanks,
Byungchul
On Mon, May 14, 2018 at 11:59 AM, Byungchul Park <[email protected]> wrote:
>
>
> On 2018-05-12 오전 7:41, Joel Fernandes wrote:
>>
>> On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
>>>
>>> On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
>>>>
>>>> Hello folks,
>>>>
>>>> I think I wrote the title in a misleading way.
>>>>
>>>> Please change the title to something else such as,
>>>> "rcu: Report a quiescent state when it's in the state" or,
>>>> "rcu: Add points reporting quiescent states where proper" or so on.
>>>>
>>>> On 2018-05-11 오후 5:30, Byungchul Park wrote:
>>>>>
>>>>> We expect a quiescent state of TASKS_RCU when
>>>>> cond_resched_tasks_rcu_qs()
>>>>> is called, no matter whether it actually be scheduled or not. However,
>>>>> it currently doesn't report the quiescent state when the task enters
>>>>> into __schedule() as it's called with preempt = true. So make it report
>>>>> the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
>>>>> called.
>>>>>
>>>>> And in TINY_RCU, even though the quiescent state of rcu_bh also should
>>>>> be reported when the tick interrupt comes from user, it doesn't. So
>>>>> make
>>>>> it reported.
>>>>>
>>>>> Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
>>>>> reported when the tick interrupt comes from not only user but also
>>>>> idle,
>>>>> as an extended quiescent state.
>>>>>
>>>>> Signed-off-by: Byungchul Park <[email protected]>
>>>>> ---
>>>>> include/linux/rcupdate.h | 4 ++--
>>>>> kernel/rcu/tiny.c | 6 +++---
>>>>> kernel/rcu/tree.c | 4 ++--
>>>>> 3 files changed, 7 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>>>> index ee8cf5fc..7432261 100644
>>>>> --- a/include/linux/rcupdate.h
>>>>> +++ b/include/linux/rcupdate.h
>>>>> @@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
>>>>> */
>>>>> #define cond_resched_tasks_rcu_qs() \
>>>>> do { \
>>>>> - if (!cond_resched()) \
>>>>> - rcu_note_voluntary_context_switch_lite(current); \
>>>>> + rcu_note_voluntary_context_switch_lite(current); \
>>>>> + cond_resched(); \
>>>
>>>
>>> Ah, good point.
>>>
>>> Peter, I have to ask... Why is "cond_resched()" considered a preemption
>>> while "schedule()" is not?
>>
>>
>> Infact something interesting I inferred from the __schedule loop related
>> to
>> your question:
>>
>> switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
>> assume that switch_count reflects whether the context switch is
>> involuntary
>> or voluntary,
>> task-running-state preempt
>> switch_count
>> 0 (running) 1 involuntary
>> 0 0 involuntary
>> 1 0 voluntary
>> 1 1 involuntary
>>
>> According to the above table, both the task's running state and the
>> preempt
>> parameter to __schedule should be used together to determine if the switch
>> is
>> a voluntary one or not.
>>
>> So this code in rcu_note_context_switch should really be:
>> if (!preempt && !(current->state & TASK_RUNNING))
>> rcu_note_voluntary_context_switch_lite(current);
>>
>> According to the above table, cond_resched always classifies as an
>> involuntary switch which makes sense to me. Even though cond_resched is
>
>
> Hello guys,
>
> The classification for nivcsw/nvcsw used in scheduler core, Joel, you
> showed us is different from that used in when we distinguish between
> non preemption/voluntary preemption/preemption/full and so on, even
> they use the same word, "voluntary" though.
>
> The name, rcu_note_voluntary_context_switch_lite() used in RCU has
> a lot to do with the latter, the term of preemption. Furthermore, I
> think the function should be called even when calling schedule() for
> sleep as well. I think it would be better to change the function
> name to something else to prevent confusing, it's up to Paul tho. :)
Let me explain more what I did earlier.
In the scheduler core when classifying nivcsw/nvcsw, they classify the
tries as "voluntary", which go to the inactivate state i.e. sleep through
a normal path w/o any disturbed e.g. by interrupt preemption.
However, in RCU, it's for indicating the places trying to explicitly call
scheduler which are quiescent states anyway for TASKS_RCU. Any
explicit tries including voluntary preemption points are the cases.
That 's why I said they have different meaning from each other. But
anyway I also think it would be much better if we can make them
consistent by renaming or modifying both scheduler and rcu code.
>> explicitly called, its still sort of involuntary in the sense its not
>> called
>> into the scheduler for sleeping, but rather for seeing if something else
>> can
>> run instead (a preemption point). Infact none of the task deactivation in
>> the
>> __schedule loop will run if cond_resched is used.
>>
>> I agree that if schedule was called directly but with TASK_RUNNING=1, then
>> that could probably be classified an involuntary switch too...
>>
>> Also since we're deciding to call rcu_note_voluntary_context_switch_lite
>> unconditionally, then IMO this comment on that macro:
>>
>> /*
>> * Note a voluntary context switch for RCU-tasks benefit. This is a
>> * macro rather than an inline function to avoid #include hell.
>> */
>> #ifdef CONFIG_TASKS_RCU
>> #define rcu_note_voluntary_context_switch_lite(t)
>>
>> Should be changed to:
>>
>> /*
>> * Note a attempt to perform a voluntary context switch for RCU-tasks
>> * benefit. This is called even in situations where a context switch
>> * didn't really happen even though it was requested. This is a
>> * macro rather than an inline function to avoid #include hell.
>> */
>> #ifdef CONFIG_TASKS_RCU
>> #define rcu_note_voluntary_context_switch_lite(t)
>>
>> Right?
>>
>> Correct me if I'm wrong about anything, thanks,
>>
>> - Joel
>>
>>
>
> --
> Thanks,
> Byungchul
--
Thanks,
Byungchul
On Mon, May 14, 2018 at 11:59:41AM +0900, Byungchul Park wrote:
> On 2018-05-12 오전 7:41, Joel Fernandes wrote:
> >On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> >>On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> >>>Hello folks,
> >>>
> >>>I think I wrote the title in a misleading way.
> >>>
> >>>Please change the title to something else such as,
> >>>"rcu: Report a quiescent state when it's in the state" or,
> >>>"rcu: Add points reporting quiescent states where proper" or so on.
> >>>
> >>>On 2018-05-11 오후 5:30, Byungchul Park wrote:
> >>>>We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> >>>>is called, no matter whether it actually be scheduled or not. However,
> >>>>it currently doesn't report the quiescent state when the task enters
> >>>>into __schedule() as it's called with preempt = true. So make it report
> >>>>the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> >>>>called.
> >>>>
> >>>>And in TINY_RCU, even though the quiescent state of rcu_bh also should
> >>>>be reported when the tick interrupt comes from user, it doesn't. So make
> >>>>it reported.
> >>>>
> >>>>Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> >>>>reported when the tick interrupt comes from not only user but also idle,
> >>>>as an extended quiescent state.
> >>>>
> >>>>Signed-off-by: Byungchul Park <[email protected]>
> >>>>---
> >>>> include/linux/rcupdate.h | 4 ++--
> >>>> kernel/rcu/tiny.c | 6 +++---
> >>>> kernel/rcu/tree.c | 4 ++--
> >>>> 3 files changed, 7 insertions(+), 7 deletions(-)
> >>>>
> >>>>diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >>>>index ee8cf5fc..7432261 100644
> >>>>--- a/include/linux/rcupdate.h
> >>>>+++ b/include/linux/rcupdate.h
> >>>>@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> >>>> */
> >>>> #define cond_resched_tasks_rcu_qs() \
> >>>> do { \
> >>>>- if (!cond_resched()) \
> >>>>- rcu_note_voluntary_context_switch_lite(current); \
> >>>>+ rcu_note_voluntary_context_switch_lite(current); \
> >>>>+ cond_resched(); \
> >>
> >>Ah, good point.
> >>
> >>Peter, I have to ask... Why is "cond_resched()" considered a preemption
> >>while "schedule()" is not?
> >
> >Infact something interesting I inferred from the __schedule loop related to
> >your question:
> >
> >switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> >assume that switch_count reflects whether the context switch is involuntary
> >or voluntary,
> >task-running-state preempt switch_count
> >0 (running) 1 involuntary
> >0 0 involuntary
> >1 0 voluntary
> >1 1 involuntary
> >
> >According to the above table, both the task's running state and the preempt
> >parameter to __schedule should be used together to determine if the switch is
> >a voluntary one or not.
> >
> >So this code in rcu_note_context_switch should really be:
> >if (!preempt && !(current->state & TASK_RUNNING))
> > rcu_note_voluntary_context_switch_lite(current);
> >
> >According to the above table, cond_resched always classifies as an
> >involuntary switch which makes sense to me. Even though cond_resched is
>
> Hello guys,
>
> The classification for nivcsw/nvcsw used in scheduler core, Joel, you
> showed us is different from that used in when we distinguish between
> non preemption/voluntary preemption/preemption/full and so on, even
> they use the same word, "voluntary" though.
>
> The name, rcu_note_voluntary_context_switch_lite() used in RCU has
> a lot to do with the latter, the term of preemption. Furthermore, I
> think the function should be called even when calling schedule() for
> sleep as well. I think it would be better to change the function
> name to something else to prevent confusing, it's up to Paul tho. :)
Given what it currently does, the name should be rcu_tasks_qs() to go
along with rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs(). Much as
I would like cond_resched() to be an RCU-tasks quiescent state, it is
nothingness for PREEMPT=y kernels, and Peter has indicated a strong
interest in having it remain so. But I did update a few comments.
I left rcu_note_voluntary_context_switch() alone because it should be
disappearing entirely Real Soon Now.
Please see patch below.
Thanx, Paul
PS. Oddly enough, the recent patch removing the "if" from
cond_resched_tasks_rcu_qs() is (technically speaking) pointless.
If the kernel contains RCU-tasks, it must be preemptible, which
means that cond_resched() unconditionally returns false, which
in turn means that rcu_note_voluntary_context_switch_lite() was
unconditionally invoked.
Simiarly, in non-preemptible kernels, where cond_resched()
might well return true, rcu_note_voluntary_context_switch_lite()
is a no-op.
So that patch had no effect, but I am keeping it due to the
improved readability. I should probably update its commit log,
though. ;-)
------------------------------------------------------------------------
commit 5b39fc0d9bc6c806cb42ed546c37655689b4dbdd
Author: Paul E. McKenney <[email protected]>
Date: Mon May 14 13:52:27 2018 -0700
rcu: Improve RCU-tasks naming and comments
The naming and comments associated with some RCU-tasks code make
the faulty assumption that context switches due to cond_resched()
are voluntary. As several people pointed out, this is not the case.
This commit therefore updates function names and comments to better
reflect current reality.
Reported-by: Byungchul Park <[email protected]>
Reported-by: Joel Fernandes <[email protected]>
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 028d07ce198a..713b93af26f3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -159,11 +159,11 @@ static inline void rcu_init_nohz(void) { }
} while (0)
/*
- * Note a voluntary context switch for RCU-tasks benefit. This is a
- * macro rather than an inline function to avoid #include hell.
+ * Note a quasi-voluntary context switch for RCU-tasks's benefit.
+ * This is a macro rather than an inline function to avoid #include hell.
*/
#ifdef CONFIG_TASKS_RCU
-#define rcu_note_voluntary_context_switch_lite(t) \
+#define rcu_tasks_qs(t) \
do { \
if (READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \
@@ -171,14 +171,14 @@ static inline void rcu_init_nohz(void) { }
#define rcu_note_voluntary_context_switch(t) \
do { \
rcu_all_qs(); \
- rcu_note_voluntary_context_switch_lite(t); \
+ rcu_tasks_qs(t); \
} while (0)
void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks(void);
void exit_tasks_rcu_start(void);
void exit_tasks_rcu_finish(void);
#else /* #ifdef CONFIG_TASKS_RCU */
-#define rcu_note_voluntary_context_switch_lite(t) do { } while (0)
+#define rcu_tasks_qs(t) do { } while (0)
#define rcu_note_voluntary_context_switch(t) rcu_all_qs()
#define call_rcu_tasks call_rcu_sched
#define synchronize_rcu_tasks synchronize_sched
@@ -195,7 +195,7 @@ static inline void exit_tasks_rcu_finish(void) { }
*/
#define cond_resched_tasks_rcu_qs() \
do { \
- rcu_note_voluntary_context_switch_lite(current); \
+ rcu_tasks_qs(current); \
cond_resched(); \
} while (0)
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec35e34..79409dbb5478 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -93,7 +93,7 @@ static inline void kfree_call_rcu(struct rcu_head *head,
#define rcu_note_context_switch(preempt) \
do { \
rcu_sched_qs(); \
- rcu_note_voluntary_context_switch_lite(current); \
+ rcu_tasks_qs(current); \
} while (0)
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3826ce90fd6e..4e96761ce367 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -456,7 +456,7 @@ void rcu_note_context_switch(bool preempt)
rcu_momentary_dyntick_idle();
this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
if (!preempt)
- rcu_note_voluntary_context_switch_lite(current);
+ rcu_tasks_qs(current);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c230a60ece4..5783bdf86e5a 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -507,14 +507,15 @@ early_initcall(check_cpu_stall_init);
#ifdef CONFIG_TASKS_RCU
/*
- * Simple variant of RCU whose quiescent states are voluntary context switch,
- * user-space execution, and idle. As such, grace periods can take one good
- * long time. There are no read-side primitives similar to rcu_read_lock()
- * and rcu_read_unlock() because this implementation is intended to get
- * the system into a safe state for some of the manipulations involved in
- * tracing and the like. Finally, this implementation does not support
- * high call_rcu_tasks() rates from multiple CPUs. If this is required,
- * per-CPU callback lists will be needed.
+ * Simple variant of RCU whose quiescent states are voluntary context
+ * switch, cond_resched_rcu_qs(), user-space execution, and idle.
+ * As such, grace periods can take one good long time. There are no
+ * read-side primitives similar to rcu_read_lock() and rcu_read_unlock()
+ * because this implementation is intended to get the system into a safe
+ * state for some of the manipulations involved in tracing and the like.
+ * Finally, this implementation does not support high call_rcu_tasks()
+ * rates from multiple CPUs. If this is required, per-CPU callback lists
+ * will be needed.
*/
/* Global list of callbacks and associated lock. */
@@ -542,11 +543,11 @@ static struct task_struct *rcu_tasks_kthread_ptr;
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_tasks() assumes
* that the read-side critical sections end at a voluntary context
- * switch (not a preemption!), entry into idle, or transition to usermode
- * execution. As such, there are no read-side primitives analogous to
- * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
- * to determine that all tasks have passed through a safe state, not so
- * much for data-strcuture synchronization.
+ * switch (not a preemption!), cond_resched_rcu_qs(), entry into idle,
+ * or transition to usermode execution. As such, there are no read-side
+ * primitives analogous to rcu_read_lock() and rcu_read_unlock() because
+ * this primitive is intended to determine that all tasks have passed
+ * through a safe state, not so much for data-strcuture synchronization.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
On 2018-05-15 06:04, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 11:59:41AM +0900, Byungchul Park wrote:
>> On 2018-05-12 오전 7:41, Joel Fernandes wrote:
>>> On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
>>>> On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
>>>>> Hello folks,
>>>>>
>>>>> I think I wrote the title in a misleading way.
>>>>>
>>>>> Please change the title to something else such as,
>>>>> "rcu: Report a quiescent state when it's in the state" or,
>>>>> "rcu: Add points reporting quiescent states where proper" or so on.
>>>>>
>>>>> On 2018-05-11 오후 5:30, Byungchul Park wrote:
>>>>>> We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
>>>>>> is called, no matter whether it actually be scheduled or not. However,
>>>>>> it currently doesn't report the quiescent state when the task enters
>>>>>> into __schedule() as it's called with preempt = true. So make it report
>>>>>> the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
>>>>>> called.
>>>>>>
>>>>>> And in TINY_RCU, even though the quiescent state of rcu_bh also should
>>>>>> be reported when the tick interrupt comes from user, it doesn't. So make
>>>>>> it reported.
>>>>>>
>>>>>> Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
>>>>>> reported when the tick interrupt comes from not only user but also idle,
>>>>>> as an extended quiescent state.
>>>>>>
>>>>>> Signed-off-by: Byungchul Park <[email protected]>
>>>>>> ---
>>>>>> include/linux/rcupdate.h | 4 ++--
>>>>>> kernel/rcu/tiny.c | 6 +++---
>>>>>> kernel/rcu/tree.c | 4 ++--
>>>>>> 3 files changed, 7 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>>>>> index ee8cf5fc..7432261 100644
>>>>>> --- a/include/linux/rcupdate.h
>>>>>> +++ b/include/linux/rcupdate.h
>>>>>> @@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
>>>>>> */
>>>>>> #define cond_resched_tasks_rcu_qs() \
>>>>>> do { \
>>>>>> - if (!cond_resched()) \
>>>>>> - rcu_note_voluntary_context_switch_lite(current); \
>>>>>> + rcu_note_voluntary_context_switch_lite(current); \
>>>>>> + cond_resched(); \
>>>>
>>>> Ah, good point.
>>>>
>>>> Peter, I have to ask... Why is "cond_resched()" considered a preemption
>>>> while "schedule()" is not?
>>>
>>> Infact something interesting I inferred from the __schedule loop related to
>>> your question:
>>>
>>> switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
>>> assume that switch_count reflects whether the context switch is involuntary
>>> or voluntary,
>>> task-running-state preempt switch_count
>>> 0 (running) 1 involuntary
>>> 0 0 involuntary
>>> 1 0 voluntary
>>> 1 1 involuntary
>>>
>>> According to the above table, both the task's running state and the preempt
>>> parameter to __schedule should be used together to determine if the switch is
>>> a voluntary one or not.
>>>
>>> So this code in rcu_note_context_switch should really be:
>>> if (!preempt && !(current->state & TASK_RUNNING))
>>> rcu_note_voluntary_context_switch_lite(current);
>>>
>>> According to the above table, cond_resched always classifies as an
>>> involuntary switch which makes sense to me. Even though cond_resched is
>>
>> Hello guys,
>>
>> The classification for nivcsw/nvcsw used in scheduler core, Joel, you
>> showed us is different from that used in when we distinguish between
>> non preemption/voluntary preemption/preemption/full and so on, even
>> they use the same word, "voluntary" though.
>>
>> The name, rcu_note_voluntary_context_switch_lite() used in RCU has
>> a lot to do with the latter, the term of preemption. Furthermore, I
>> think the function should be called even when calling schedule() for
>> sleep as well. I think it would be better to change the function
>> name to something else to prevent confusing, it's up to Paul tho. :)
>
> Given what it currently does, the name should be rcu_tasks_qs() to go
> along with rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs(). Much as
> I would like cond_resched() to be an RCU-tasks quiescent state, it is
> nothingness for PREEMPT=y kernels, and Peter has indicated a strong
> interest in having it remain so. But I did update a few comments.
>
> I left rcu_note_voluntary_context_switch() alone because it should be
> disappearing entirely Real Soon Now.
>
> Please see patch below.
>
> Thanx, Paul
>
> PS. Oddly enough, the recent patch removing the "if" from
> cond_resched_tasks_rcu_qs() is (technically speaking) pointless.
> If the kernel contains RCU-tasks, it must be preemptible, which
> means that cond_resched() unconditionally returns false, which
> in turn means that rcu_note_voluntary_context_switch_lite() was
> unconditionally invoked.
>
> Simiarly, in non-preemptible kernels, where cond_resched()
> might well return true, rcu_note_voluntary_context_switch_lite()
> is a no-op.
Interesting. Right. Thanks for your explanation. :)
--
Thanks,
Byungchul