2019-10-04 15:00:26

by Joel Fernandes

Subject: [PATCH] Remove GP_REPLAY state from rcu_sync

From: Joel Fernandes <[email protected]>

Please consider this an RFC for discussion only. I just want to discuss
why the GP_REPLAY state is needed at all.

Here's the intention AFAICS:
When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
we wait for a grace period before transitioning to GP_IDLE. In the
meanwhile, if we receive another rcu_sync_exit(), then we want to wait
for another GP to account for that.

Drawing some timing diagrams, it looks like this:

Legend:
rse = rcu_sync_enter
rsx = rcu_sync_exit
i = GP_IDLE
x = GP_EXIT
r = GP_REPLAY
e = GP_ENTER
p = GP_PASSED
rx = GP_REPLAY changes to GP_EXIT

GP num = The GP we are on.

note: A GP passes between the states:
e and p
x and i
x and rx
rx and i

In a simple case, the timing and states look like:
time
----------------------------------------->
GP num        1111111        2222222
GP state  i   e      p       x      i
CPU0 :      rse            rsx

However we can enter the replay state like this:
time
------------------------------------------------------------->
GP num        1111111        2222222222222222222223333333
GP state  i   e      p       x        r          rx      i
CPU0 :      rse            rsx
CPU1 :                         rse  rsx

Due to the second rse + rsx, we had to wait for another GP.

I believe the rationale is, if another rsx happens, another GP has to
happen.

But this is not always true if you consider the following events:

time
--------------------------------------------------------------------->
GP num        111111         2222222222222222222222222222222223333333
GP state  i   e     p        x        r                      rx       i
CPU0 :      rse            rsx
CPU1 :                         rse  rsx
CPU2 :                                   rse  rsx

Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
and 2 for the rcu_sync_exit(s).

However, we had 3 rcu_sync_exit()s, not 2. In other words, the
rcu_sync_exit() got batched.

So my point here is, rcu_sync_exit() does not really always cause a new
GP to happen and we can end up having the rcu_sync_exit()s as batched
and sharing the same grace period.

Then what is the point of the GP_REPLAY state at all if it does not
always wait for a new GP? Taking a step back, why did we intend to have
to wait for a new GP if another rcu_sync_exit() comes while one is still
in progress?

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/sync.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index d4558ab7a07d..4f3aad67992c 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -10,7 +10,7 @@
 #include <linux/rcu_sync.h>
 #include <linux/sched.h>
 
-enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
+enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
 
 #define rss_lock	gp_wait.lock
 
@@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
 		 */
 		WRITE_ONCE(rsp->gp_state, GP_PASSED);
 		wake_up_locked(&rsp->gp_wait);
-	} else if (rsp->gp_state == GP_REPLAY) {
-		/*
-		 * A new rcu_sync_exit() has happened; requeue the callback to
-		 * catch a later GP.
-		 */
-		WRITE_ONCE(rsp->gp_state, GP_EXIT);
-		rcu_sync_call(rsp);
 	} else {
 		/*
 		 * We're at least a GP after the last rcu_sync_exit(); eveybody
@@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
+	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
 
 	spin_lock_irq(&rsp->rss_lock);
 	if (!--rsp->gp_count) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);
-		} else if (rsp->gp_state == GP_EXIT) {
-			WRITE_ONCE(rsp->gp_state, GP_REPLAY);
 		}
 	}
 	spin_unlock_irq(&rsp->rss_lock);
--
2.23.0.581.g78d2f28ef7-goog


2019-10-04 15:42:44

by Oleg Nesterov

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

On 10/04, Joel Fernandes (Google) wrote:
>
> But this is not always true if you consider the following events:

I'm afraid I missed your point, but...

> --------------------------------------------------------------------->
> GP num        111111         2222222222222222222222222222222223333333
> GP state  i   e     p        x        r                      rx       i
> CPU0 :      rse            rsx
> CPU1 :                         rse  rsx
> CPU2 :                                   rse  rsx
>
> Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> and 2 for the rcu_sync_exit(s).

But this is fine?

We only need to ensure that we have a full GP pass between the "last"
rcu_sync_exit() and GP_XXX -> GP_IDLE transition.

> However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> rcu_sync_exit() got batched.
>
> So my point here is, rcu_sync_exit() does not really always cause a new
> GP to happen

See above, it should not.

> Then what is the point of the GP_REPLAY state at all if it does not
> always wait for a new GP?

Again, I don't understand... GP_REPLAY ensures that we will have a full GP
before rcu_sync_func() sets GP_IDLE, note that it does another "recursive"
call_rcu() if it sees GP_REPLAY.

> Taking a step back, why did we intend to have
> to wait for a new GP if another rcu_sync_exit() comes while one is still
> in progress?

To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
do rcu_sync_exit(), then it must also see all memory changes you did before
rcu_sync_exit().

Oleg.

2019-10-04 16:38:04

by Joel Fernandes

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

Hi Oleg,

On Fri, Oct 04, 2019 at 05:41:03PM +0200, Oleg Nesterov wrote:
> On 10/04, Joel Fernandes (Google) wrote:
> >
> > But this is not always true if you consider the following events:
>
> I'm afraid I missed your point, but...
>
> > --------------------------------------------------------------------->
> > GP num        111111         2222222222222222222222222222222223333333
> > GP state  i   e     p        x        r                      rx       i
> > CPU0 :      rse            rsx
> > CPU1 :                         rse  rsx
> > CPU2 :                                   rse  rsx
> >
> > Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> > and 2 for the rcu_sync_exit(s).
>
> But this is fine?
>
> We only need to ensure that we have a full GP pass between the "last"
> rcu_sync_exit() and GP_XXX -> GP_IDLE transition.
>
> > However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> > rcu_sync_exit() got batched.
> >
> > So my point here is, rcu_sync_exit() does not really always cause a new
> > GP to happen
>
> See above, it should not.

Ok, I understand now. The point is to wait for a full GP, not necessarily
start a new one on each exit.

> > Then what is the point of the GP_REPLAY state at all if it does not
> > always wait for a new GP?
>
> Again, I don't understand... GP_REPLAY ensures that we will have a full GP
> before rcu_sync_func() sets GP_IDLE, note that it does another "recursive"
> call_rcu() if it sees GP_REPLAY.

Ok, got it.

> > Taking a step back, why did we intend to have
> > to wait for a new GP if another rcu_sync_exit() comes while one is still
> > in progress?
>
> To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
> do rcu_sync_exit(), then it must also see all memory changes you did before
> rcu_sync_exit().

Would this not be better implemented using memory barriers, than starting new
grace periods just for memory ordering? A memory barrier is lighter than
having to go through a grace period. So something like: if the state is
already GP_EXIT, then rcu_sync_exit() issues a memory barrier instead of
replaying. But if state is GP_PASSED, then wait for a grace period. Or, do
you see a situation where this will not work?

thanks,

- Joel

2019-10-04 19:29:40

by kernel test robot

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

Hi "Joel,

I love your patch! Yet something to improve:

[auto build test ERROR on rcu/dev]
[cannot apply to v5.4-rc1 next-20191004]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Joel-Fernandes-Google/Remove-GP_REPLAY-state-from-rcu_sync/20191005-024257
base: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

kernel/rcu/sync.c: In function 'rcu_sync_dtor':
>> kernel/rcu/sync.c:187:23: error: 'GP_REPLAY' undeclared (first use in this function)
if (rsp->gp_state == GP_REPLAY)
^~~~~~~~~
kernel/rcu/sync.c:187:23: note: each undeclared identifier is reported only once for each function it appears in

vim +/GP_REPLAY +187 kernel/rcu/sync.c

07899a6e5f5613 Oleg Nesterov 2015-08-21 174
07899a6e5f5613 Oleg Nesterov 2015-08-21 175 /**
07899a6e5f5613 Oleg Nesterov 2015-08-21 176 * rcu_sync_dtor() - Clean up an rcu_sync structure
07899a6e5f5613 Oleg Nesterov 2015-08-21 177 * @rsp: Pointer to rcu_sync structure to be cleaned up
07899a6e5f5613 Oleg Nesterov 2015-08-21 178 */
07899a6e5f5613 Oleg Nesterov 2015-08-21 179 void rcu_sync_dtor(struct rcu_sync *rsp)
07899a6e5f5613 Oleg Nesterov 2015-08-21 180 {
89da3b94bb9741 Oleg Nesterov 2019-04-25 181 int gp_state;
07899a6e5f5613 Oleg Nesterov 2015-08-21 182
89da3b94bb9741 Oleg Nesterov 2019-04-25 183 WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
89da3b94bb9741 Oleg Nesterov 2019-04-25 184 WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
07899a6e5f5613 Oleg Nesterov 2015-08-21 185
07899a6e5f5613 Oleg Nesterov 2015-08-21 186 spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov 2019-04-25 @187 if (rsp->gp_state == GP_REPLAY)

:::::: The code at line 187 was first introduced by commit
:::::: 89da3b94bb97417ca2c5b0ce3a28643819030247 rcu/sync: Simplify the state machine

:::::: TO: Oleg Nesterov <[email protected]>
:::::: CC: Paul E. McKenney <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2019-10-04 22:07:38

by kernel test robot

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

Hi "Joel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on rcu/dev]
[cannot apply to v5.4-rc1 next-20191004]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Joel-Fernandes-Google/Remove-GP_REPLAY-state-from-rcu_sync/20191005-024257
base: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
config: x86_64-randconfig-b002-201939 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from include/linux/export.h:44:0,
from include/linux/linkage.h:7,
from include/linux/kernel.h:8,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/rcu_sync.h:13,
from kernel//rcu/sync.c:10:
kernel//rcu/sync.c: In function 'rcu_sync_dtor':
kernel//rcu/sync.c:187:23: error: 'GP_REPLAY' undeclared (first use in this function)
if (rsp->gp_state == GP_REPLAY)
^
include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
>> kernel//rcu/sync.c:187:2: note: in expansion of macro 'if'
if (rsp->gp_state == GP_REPLAY)
^~
kernel//rcu/sync.c:187:23: note: each undeclared identifier is reported only once for each function it appears in
if (rsp->gp_state == GP_REPLAY)
^
include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
#define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
^~~~
>> kernel//rcu/sync.c:187:2: note: in expansion of macro 'if'
if (rsp->gp_state == GP_REPLAY)
^~

vim +/if +187 kernel//rcu/sync.c

cc44ca848f5e51 Oleg Nesterov 2015-08-21 @10 #include <linux/rcu_sync.h>
cc44ca848f5e51 Oleg Nesterov 2015-08-21 11 #include <linux/sched.h>
cc44ca848f5e51 Oleg Nesterov 2015-08-21 12
6d1a4c2dfe7bb0 Joel Fernandes 2019-10-04 13 enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
cc44ca848f5e51 Oleg Nesterov 2015-08-21 14
cc44ca848f5e51 Oleg Nesterov 2015-08-21 15 #define rss_lock gp_wait.lock
cc44ca848f5e51 Oleg Nesterov 2015-08-21 16
cc44ca848f5e51 Oleg Nesterov 2015-08-21 17 /**
cc44ca848f5e51 Oleg Nesterov 2015-08-21 18 * rcu_sync_init() - Initialize an rcu_sync structure
cc44ca848f5e51 Oleg Nesterov 2015-08-21 19 * @rsp: Pointer to rcu_sync structure to be initialized
cc44ca848f5e51 Oleg Nesterov 2015-08-21 20 */
95bf33b55ff446 Oleg Nesterov 2019-04-23 21 void rcu_sync_init(struct rcu_sync *rsp)
cc44ca848f5e51 Oleg Nesterov 2015-08-21 22 {
cc44ca848f5e51 Oleg Nesterov 2015-08-21 23 memset(rsp, 0, sizeof(*rsp));
cc44ca848f5e51 Oleg Nesterov 2015-08-21 24 init_waitqueue_head(&rsp->gp_wait);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 25 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 26
3942a9bd7b5842 Peter Zijlstra 2016-08-11 27 /**
27fdb35fe99011 Paul E. McKenney 2017-10-19 28 * rcu_sync_enter_start - Force readers onto slow path for multiple updates
27fdb35fe99011 Paul E. McKenney 2017-10-19 29 * @rsp: Pointer to rcu_sync structure to use for synchronization
27fdb35fe99011 Paul E. McKenney 2017-10-19 30 *
3942a9bd7b5842 Peter Zijlstra 2016-08-11 31 * Must be called after rcu_sync_init() and before first use.
3942a9bd7b5842 Peter Zijlstra 2016-08-11 32 *
3942a9bd7b5842 Peter Zijlstra 2016-08-11 33 * Ensures rcu_sync_is_idle() returns false and rcu_sync_{enter,exit}()
3942a9bd7b5842 Peter Zijlstra 2016-08-11 34 * pairs turn into NO-OPs.
3942a9bd7b5842 Peter Zijlstra 2016-08-11 35 */
3942a9bd7b5842 Peter Zijlstra 2016-08-11 36 void rcu_sync_enter_start(struct rcu_sync *rsp)
3942a9bd7b5842 Peter Zijlstra 2016-08-11 37 {
3942a9bd7b5842 Peter Zijlstra 2016-08-11 38 rsp->gp_count++;
3942a9bd7b5842 Peter Zijlstra 2016-08-11 39 rsp->gp_state = GP_PASSED;
3942a9bd7b5842 Peter Zijlstra 2016-08-11 40 }
3942a9bd7b5842 Peter Zijlstra 2016-08-11 41
cc44ca848f5e51 Oleg Nesterov 2015-08-21 42
89da3b94bb9741 Oleg Nesterov 2019-04-25 43 static void rcu_sync_func(struct rcu_head *rhp);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 44
89da3b94bb9741 Oleg Nesterov 2019-04-25 45 static void rcu_sync_call(struct rcu_sync *rsp)
89da3b94bb9741 Oleg Nesterov 2019-04-25 46 {
89da3b94bb9741 Oleg Nesterov 2019-04-25 47 call_rcu(&rsp->cb_head, rcu_sync_func);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 48 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 49
cc44ca848f5e51 Oleg Nesterov 2015-08-21 50 /**
cc44ca848f5e51 Oleg Nesterov 2015-08-21 51 * rcu_sync_func() - Callback function managing reader access to fastpath
27fdb35fe99011 Paul E. McKenney 2017-10-19 52 * @rhp: Pointer to rcu_head in rcu_sync structure to use for synchronization
cc44ca848f5e51 Oleg Nesterov 2015-08-21 53 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 54 * This function is passed to call_rcu() function by rcu_sync_enter() and
cc44ca848f5e51 Oleg Nesterov 2015-08-21 55 * rcu_sync_exit(), so that it is invoked after a grace period following the
89da3b94bb9741 Oleg Nesterov 2019-04-25 56 * that invocation of enter/exit.
89da3b94bb9741 Oleg Nesterov 2019-04-25 57 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 58 * If it is called by rcu_sync_enter() it signals that all the readers were
89da3b94bb9741 Oleg Nesterov 2019-04-25 59 * switched onto slow path.
89da3b94bb9741 Oleg Nesterov 2019-04-25 60 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 61 * If it is called by rcu_sync_exit() it takes action based on events that
cc44ca848f5e51 Oleg Nesterov 2015-08-21 62 * have taken place in the meantime, so that closely spaced rcu_sync_enter()
cc44ca848f5e51 Oleg Nesterov 2015-08-21 63 * and rcu_sync_exit() pairs need not wait for a grace period.
cc44ca848f5e51 Oleg Nesterov 2015-08-21 64 *
cc44ca848f5e51 Oleg Nesterov 2015-08-21 65 * If another rcu_sync_enter() is invoked before the grace period
cc44ca848f5e51 Oleg Nesterov 2015-08-21 66 * ended, reset state to allow the next rcu_sync_exit() to let the
cc44ca848f5e51 Oleg Nesterov 2015-08-21 67 * readers back onto their fastpaths (after a grace period). If both
cc44ca848f5e51 Oleg Nesterov 2015-08-21 68 * another rcu_sync_enter() and its matching rcu_sync_exit() are invoked
cc44ca848f5e51 Oleg Nesterov 2015-08-21 69 * before the grace period ended, re-invoke call_rcu() on behalf of that
cc44ca848f5e51 Oleg Nesterov 2015-08-21 70 * rcu_sync_exit(). Otherwise, set all state back to idle so that readers
cc44ca848f5e51 Oleg Nesterov 2015-08-21 71 * can again use their fastpaths.
cc44ca848f5e51 Oleg Nesterov 2015-08-21 72 */
27fdb35fe99011 Paul E. McKenney 2017-10-19 73 static void rcu_sync_func(struct rcu_head *rhp)
cc44ca848f5e51 Oleg Nesterov 2015-08-21 74 {
27fdb35fe99011 Paul E. McKenney 2017-10-19 75 struct rcu_sync *rsp = container_of(rhp, struct rcu_sync, cb_head);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 76 unsigned long flags;
cc44ca848f5e51 Oleg Nesterov 2015-08-21 77
89da3b94bb9741 Oleg Nesterov 2019-04-25 78 WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
89da3b94bb9741 Oleg Nesterov 2019-04-25 79 WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 80
cc44ca848f5e51 Oleg Nesterov 2015-08-21 81 spin_lock_irqsave(&rsp->rss_lock, flags);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 82 if (rsp->gp_count) {
cc44ca848f5e51 Oleg Nesterov 2015-08-21 83 /*
89da3b94bb9741 Oleg Nesterov 2019-04-25 84 * We're at least a GP after the GP_IDLE->GP_ENTER transition.
cc44ca848f5e51 Oleg Nesterov 2015-08-21 85 */
89da3b94bb9741 Oleg Nesterov 2019-04-25 86 WRITE_ONCE(rsp->gp_state, GP_PASSED);
89da3b94bb9741 Oleg Nesterov 2019-04-25 87 wake_up_locked(&rsp->gp_wait);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 88 } else {
cc44ca848f5e51 Oleg Nesterov 2015-08-21 89 /*
89da3b94bb9741 Oleg Nesterov 2019-04-25 90 * We're at least a GP after the last rcu_sync_exit(); eveybody
89da3b94bb9741 Oleg Nesterov 2019-04-25 91 * will now have observed the write side critical section.
89da3b94bb9741 Oleg Nesterov 2019-04-25 92 * Let 'em rip!.
cc44ca848f5e51 Oleg Nesterov 2015-08-21 93 */
89da3b94bb9741 Oleg Nesterov 2019-04-25 94 WRITE_ONCE(rsp->gp_state, GP_IDLE);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 95 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 96 spin_unlock_irqrestore(&rsp->rss_lock, flags);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 97 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 98
cc44ca848f5e51 Oleg Nesterov 2015-08-21 99 /**
89da3b94bb9741 Oleg Nesterov 2019-04-25 100 * rcu_sync_enter() - Force readers onto slowpath
89da3b94bb9741 Oleg Nesterov 2019-04-25 101 * @rsp: Pointer to rcu_sync structure to use for synchronization
89da3b94bb9741 Oleg Nesterov 2019-04-25 102 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 103 * This function is used by updaters who need readers to make use of
89da3b94bb9741 Oleg Nesterov 2019-04-25 104 * a slowpath during the update. After this function returns, all
89da3b94bb9741 Oleg Nesterov 2019-04-25 105 * subsequent calls to rcu_sync_is_idle() will return false, which
89da3b94bb9741 Oleg Nesterov 2019-04-25 106 * tells readers to stay off their fastpaths. A later call to
89da3b94bb9741 Oleg Nesterov 2019-04-25 107 * rcu_sync_exit() re-enables reader slowpaths.
89da3b94bb9741 Oleg Nesterov 2019-04-25 108 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 109 * When called in isolation, rcu_sync_enter() must wait for a grace
89da3b94bb9741 Oleg Nesterov 2019-04-25 110 * period, however, closely spaced calls to rcu_sync_enter() can
89da3b94bb9741 Oleg Nesterov 2019-04-25 111 * optimize away the grace-period wait via a state machine implemented
89da3b94bb9741 Oleg Nesterov 2019-04-25 112 * by rcu_sync_enter(), rcu_sync_exit(), and rcu_sync_func().
89da3b94bb9741 Oleg Nesterov 2019-04-25 113 */
89da3b94bb9741 Oleg Nesterov 2019-04-25 114 void rcu_sync_enter(struct rcu_sync *rsp)
89da3b94bb9741 Oleg Nesterov 2019-04-25 115 {
89da3b94bb9741 Oleg Nesterov 2019-04-25 116 int gp_state;
89da3b94bb9741 Oleg Nesterov 2019-04-25 117
89da3b94bb9741 Oleg Nesterov 2019-04-25 118 spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov 2019-04-25 119 gp_state = rsp->gp_state;
89da3b94bb9741 Oleg Nesterov 2019-04-25 120 if (gp_state == GP_IDLE) {
89da3b94bb9741 Oleg Nesterov 2019-04-25 121 WRITE_ONCE(rsp->gp_state, GP_ENTER);
89da3b94bb9741 Oleg Nesterov 2019-04-25 122 WARN_ON_ONCE(rsp->gp_count);
89da3b94bb9741 Oleg Nesterov 2019-04-25 123 /*
89da3b94bb9741 Oleg Nesterov 2019-04-25 124 * Note that we could simply do rcu_sync_call(rsp) here and
89da3b94bb9741 Oleg Nesterov 2019-04-25 125 * avoid the "if (gp_state == GP_IDLE)" block below.
89da3b94bb9741 Oleg Nesterov 2019-04-25 126 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 127 * However, synchronize_rcu() can be faster if rcu_expedited
89da3b94bb9741 Oleg Nesterov 2019-04-25 128 * or rcu_blocking_is_gp() is true.
89da3b94bb9741 Oleg Nesterov 2019-04-25 129 *
89da3b94bb9741 Oleg Nesterov 2019-04-25 130 * Another reason is that we can't wait for rcu callback if
89da3b94bb9741 Oleg Nesterov 2019-04-25 131 * we are called at early boot time but this shouldn't happen.
89da3b94bb9741 Oleg Nesterov 2019-04-25 132 */
89da3b94bb9741 Oleg Nesterov 2019-04-25 133 }
89da3b94bb9741 Oleg Nesterov 2019-04-25 134 rsp->gp_count++;
89da3b94bb9741 Oleg Nesterov 2019-04-25 135 spin_unlock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov 2019-04-25 136
89da3b94bb9741 Oleg Nesterov 2019-04-25 137 if (gp_state == GP_IDLE) {
89da3b94bb9741 Oleg Nesterov 2019-04-25 138 /*
89da3b94bb9741 Oleg Nesterov 2019-04-25 139 * See the comment above, this simply does the "synchronous"
89da3b94bb9741 Oleg Nesterov 2019-04-25 140 * call_rcu(rcu_sync_func) which does GP_ENTER -> GP_PASSED.
89da3b94bb9741 Oleg Nesterov 2019-04-25 141 */
89da3b94bb9741 Oleg Nesterov 2019-04-25 142 synchronize_rcu();
89da3b94bb9741 Oleg Nesterov 2019-04-25 143 rcu_sync_func(&rsp->cb_head);
89da3b94bb9741 Oleg Nesterov 2019-04-25 144 /* Not really needed, wait_event() would see GP_PASSED. */
89da3b94bb9741 Oleg Nesterov 2019-04-25 145 return;
89da3b94bb9741 Oleg Nesterov 2019-04-25 146 }
89da3b94bb9741 Oleg Nesterov 2019-04-25 147
89da3b94bb9741 Oleg Nesterov 2019-04-25 148 wait_event(rsp->gp_wait, READ_ONCE(rsp->gp_state) >= GP_PASSED);
89da3b94bb9741 Oleg Nesterov 2019-04-25 149 }
89da3b94bb9741 Oleg Nesterov 2019-04-25 150
89da3b94bb9741 Oleg Nesterov 2019-04-25 151 /**
89da3b94bb9741 Oleg Nesterov 2019-04-25 152 * rcu_sync_exit() - Allow readers back onto fast path after grace period
cc44ca848f5e51 Oleg Nesterov 2015-08-21 153 * @rsp: Pointer to rcu_sync structure to use for synchronization
cc44ca848f5e51 Oleg Nesterov 2015-08-21 154 *
cc44ca848f5e51 Oleg Nesterov 2015-08-21 155 * This function is used by updaters who have completed, and can therefore
cc44ca848f5e51 Oleg Nesterov 2015-08-21 156 * now allow readers to make use of their fastpaths after a grace period
cc44ca848f5e51 Oleg Nesterov 2015-08-21 157 * has elapsed. After this grace period has completed, all subsequent
cc44ca848f5e51 Oleg Nesterov 2015-08-21 158 * calls to rcu_sync_is_idle() will return true, which tells readers that
cc44ca848f5e51 Oleg Nesterov 2015-08-21 159 * they can once again use their fastpaths.
cc44ca848f5e51 Oleg Nesterov 2015-08-21 160 */
cc44ca848f5e51 Oleg Nesterov 2015-08-21 161 void rcu_sync_exit(struct rcu_sync *rsp)
cc44ca848f5e51 Oleg Nesterov 2015-08-21 162 {
6d1a4c2dfe7bb0 Joel Fernandes 2019-10-04 163 WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
89da3b94bb9741 Oleg Nesterov 2019-04-25 164
cc44ca848f5e51 Oleg Nesterov 2015-08-21 165 spin_lock_irq(&rsp->rss_lock);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 166 if (!--rsp->gp_count) {
89da3b94bb9741 Oleg Nesterov 2019-04-25 167 if (rsp->gp_state == GP_PASSED) {
89da3b94bb9741 Oleg Nesterov 2019-04-25 168 WRITE_ONCE(rsp->gp_state, GP_EXIT);
89da3b94bb9741 Oleg Nesterov 2019-04-25 169 rcu_sync_call(rsp);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 170 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 171 }
cc44ca848f5e51 Oleg Nesterov 2015-08-21 172 spin_unlock_irq(&rsp->rss_lock);
cc44ca848f5e51 Oleg Nesterov 2015-08-21 173 }
07899a6e5f5613 Oleg Nesterov 2015-08-21 174
07899a6e5f5613 Oleg Nesterov 2015-08-21 175 /**
07899a6e5f5613 Oleg Nesterov 2015-08-21 176 * rcu_sync_dtor() - Clean up an rcu_sync structure
07899a6e5f5613 Oleg Nesterov 2015-08-21 177 * @rsp: Pointer to rcu_sync structure to be cleaned up
07899a6e5f5613 Oleg Nesterov 2015-08-21 178 */
07899a6e5f5613 Oleg Nesterov 2015-08-21 179 void rcu_sync_dtor(struct rcu_sync *rsp)
07899a6e5f5613 Oleg Nesterov 2015-08-21 180 {
89da3b94bb9741 Oleg Nesterov 2019-04-25 181 int gp_state;
07899a6e5f5613 Oleg Nesterov 2015-08-21 182
89da3b94bb9741 Oleg Nesterov 2019-04-25 183 WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
89da3b94bb9741 Oleg Nesterov 2019-04-25 184 WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
07899a6e5f5613 Oleg Nesterov 2015-08-21 185
07899a6e5f5613 Oleg Nesterov 2015-08-21 186 spin_lock_irq(&rsp->rss_lock);
89da3b94bb9741 Oleg Nesterov 2019-04-25 @187 if (rsp->gp_state == GP_REPLAY)

:::::: The code at line 187 was first introduced by commit
:::::: 89da3b94bb97417ca2c5b0ce3a28643819030247 rcu/sync: Simplify the state machine

:::::: TO: Oleg Nesterov <[email protected]>
:::::: CC: Paul E. McKenney <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2019-10-07 14:12:01

by Oleg Nesterov

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

On 10/04, Joel Fernandes wrote:
>
> On Fri, Oct 04, 2019 at 05:41:03PM +0200, Oleg Nesterov wrote:
> > On 10/04, Joel Fernandes (Google) wrote:
> > >
> > > Taking a step back, why did we intend to have
> > > to wait for a new GP if another rcu_sync_exit() comes while one is still
> > > in progress?
> >
> > To ensure that if another CPU sees rcu_sync_is_idle() (GP_IDLE) after you
> > do rcu_sync_exit(), then it must also see all memory changes you did before
> > rcu_sync_exit().
>
> Would this not be better implemented using memory barriers, than starting new
> grace periods just for memory ordering? A memory barrier is lighter than
> having to go through a grace period. So something like: if the state is
> already GP_EXIT, then rcu_sync_exit() issues a memory barrier instead of
> replaying. But if state is GP_PASSED, then wait for a grace period.

But these 2 cases do not differ. If we can use mb() if GP_EXIT, then we can
do the same if state == GP_PASSED and just move the state to GP_IDLE, and
remove both GP_PASSED/GP_REPLAY states.

However, in this case the readers will need the barrier too, and rcu_sync_enter()
will _always_ need to block (wait for GP).

rcu_sync.c is "equivalent" to the following implementation:


struct rcu_sync_struct {
	atomic_t writers;
};

bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
{
	return atomic_read(&rss->writers) == 0;
}

void rcu_sync_enter(struct rcu_sync_struct *rss)
{
	atomic_inc(&rss->writers);
	synchronize_rcu();
}

void rcu_sync_exit(struct rcu_sync_struct *rss)
{
	synchronize_rcu();
	atomic_dec(&rss->writers);
}

except

- rcu_sync_exit() never blocks

- synchronize_rcu/call_rcu is called only if it is really needed.
In particular, if 2 writers come in a row the 2nd one will not
block in _enter().

Oleg.

2020-01-16 23:27:15

by Joel Fernandes

Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

On Fri, Oct 4, 2019 at 11:41 AM Oleg Nesterov <[email protected]> wrote:
>
> On 10/04, Joel Fernandes (Google) wrote:
> >
> > But this is not always true if you consider the following events:
>
> I'm afraid I missed your point, but...
>
> > --------------------------------------------------------------------->
> > GP num        111111         2222222222222222222222222222222223333333
> > GP state  i   e     p        x        r                      rx       i
> > CPU0 :      rse            rsx
> > CPU1 :                         rse  rsx
> > CPU2 :                                   rse  rsx
> >
> > Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> > and 2 for the rcu_sync_exit(s).
>
> But this is fine?
>
> We only need to ensure that we have a full GP pass between the "last"
> rcu_sync_exit() and GP_XXX -> GP_IDLE transition.
>
> > However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> > rcu_sync_exit() got batched.
> >
> > So my point here is, rcu_sync_exit() does not really always cause a new
> > GP to happen
>
> See above, it should not.
>
> > Then what is the point of the GP_REPLAY state at all if it does not
> > always wait for a new GP?
>
> Again, I don't understand... GP_REPLAY ensures that we will have a full GP
> before rcu_sync_func() sets GP_IDLE, note that it does another "recursive"
> call_rcu() if it sees GP_REPLAY.

I finally got back to this (meanwhile life, job things happened). You
are right, only the last one needs a full GP and it does get one here.
Probably a comment in rcu_sync_exit() explaining this might help the
future reader.

Basically you are saying, if rcu_sync_exit() happens and GP_REPLAY is
already set, we need not worry about starting a new GP because
GP_REPLAY->GP_EXIT->GP_IDLE transition will involve a full GP anyway.
And only if GP_EXIT is already set must we set GP_REPLAY and wait for
a full GP. This ensures the last rcu_sync_exit() gets a full GP. I
think that was what I was missing: for some reason I thought that
every rcu_sync_exit() needs to start a full GP.

thanks!

- Joel