2013-04-26 01:45:33

by Simon Horman

Subject: [PATCH 0/2] sched: Add cond_resched_rcu_lock() helper

Add a helper for use in loops which read data protected by RCU and
may have a large number of iterations. One such example is dumping the list
of connections known to IPVS: ip_vs_conn_array() and ip_vs_conn_seq_next().

This series also updates the two ip_vs functions mentioned above
to use the helper.

As suggested by Eric Dumazet.

Simon Horman (2):
sched: Add cond_resched_rcu_lock() helper
ipvs: Use cond_resched_rcu_lock() helper when dumping connections

include/linux/sched.h | 9 +++++++++
net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
2 files changed, 11 insertions(+), 4 deletions(-)

--
1.8.2.1


2013-04-26 01:45:31

by Simon Horman

Subject: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

This avoids the situation where a dump of a large number of connections
may prevent scheduling for a long time while also avoiding excessive
calls to rcu_read_unlock() and rcu_read_lock().

Cc: Eric Dumazet <[email protected]>
Cc: Julian Anastasov <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
---
net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a083bda..42a7b33 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 				return cp;
 			}
 		}
-		rcu_read_unlock();
-		rcu_read_lock();
+		cond_resched_rcu_lock();
 	}
 
 	return NULL;
@@ -1015,8 +1014,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 			iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 		}
-		rcu_read_unlock();
-		rcu_read_lock();
+		cond_resched_rcu_lock();
 	}
 	iter->l = NULL;
 	return NULL;
--
1.8.2.1

2013-04-26 01:45:30

by Simon Horman

Subject: [PATCH 1/2] sched: Add cond_resched_rcu_lock() helper

This is intended for use in loops which read data protected by RCU and may
have a large number of iterations. One such example is dumping the list of
connections known to IPVS: ip_vs_conn_array() and ip_vs_conn_seq_next().

As suggested by Eric Dumazet.

Cc: Eric Dumazet <[email protected]>
Cc: Julian Anastasov <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
---
include/linux/sched.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e692a02..7eec4c7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2787,3 +2787,12 @@ static inline unsigned long rlimit_max(unsigned int limit)
 }
 
 #endif
+
+static inline void cond_resched_rcu_lock(void)
+{
+	if (need_resched()) {
+		rcu_read_unlock();
+		cond_resched();
+		rcu_read_lock();
+	}
+}
--
1.8.2.1
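
For context, the intended usage pattern is a long walk over an RCU-protected
hash table that yields between buckets. A minimal sketch follows, with
hypothetical names — CONN_TAB_SIZE, struct conn, conn_tab and conn_find() are
stand-ins invented for illustration, not the actual ipvs code:

	#define CONN_TAB_SIZE	256

	struct conn {
		struct hlist_node	c_list;
		atomic_t		refcnt;
		int			key;
	};

	static struct hlist_head conn_tab[CONN_TAB_SIZE];

	static struct conn *conn_find(int key)
	{
		struct conn *cp;
		int idx;

		rcu_read_lock();
		for (idx = 0; idx < CONN_TAB_SIZE; idx++) {
			hlist_for_each_entry_rcu(cp, &conn_tab[idx], c_list) {
				if (cp->key == key &&
				    atomic_inc_not_zero(&cp->refcnt)) {
					rcu_read_unlock();
					return cp;	/* now reference-counted */
				}
			}
			/* End of bucket: no pointer is carried across the
			 * break, so briefly leaving the RCU read-side
			 * section is safe.
			 */
			cond_resched_rcu_lock();
		}
		rcu_read_unlock();
		return NULL;
	}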

2013-04-26 06:14:10

by Ingo Molnar

Subject: Re: [PATCH 1/2] sched: Add cond_resched_rcu_lock() helper


* Simon Horman <[email protected]> wrote:

> This is intended for use in loops which read data protected by RCU and may
> have a large number of iterations. One such example is dumping the list of
> connections known to IPVS: ip_vs_conn_array() and ip_vs_conn_seq_next().
>
> As suggested by Eric Dumazet.
>
> Cc: Eric Dumazet <[email protected]>
> Cc: Julian Anastasov <[email protected]>
> Signed-off-by: Simon Horman <[email protected]>
> ---
> include/linux/sched.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e692a02..7eec4c7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2787,3 +2787,12 @@ static inline unsigned long rlimit_max(unsigned int limit)
> }
>
> #endif
> +
> +static inline void cond_resched_rcu_lock(void)
> +{
> +	if (need_resched()) {
> +		rcu_read_unlock();
> +		cond_resched();
> +		rcu_read_lock();
> +	}
> +}

Acked-by: Ingo Molnar <[email protected]>

Ingo

2013-04-26 06:15:40

by Ingo Molnar

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections


* Simon Horman <[email protected]> wrote:

> This avoids the situation where a dump of a large number of connections
> may prevent scheduling for a long time while also avoiding excessive
> calls to rcu_read_unlock() and rcu_read_lock().
>
> Cc: Eric Dumazet <[email protected]>
> Cc: Julian Anastasov <[email protected]>
> Signed-off-by: Simon Horman <[email protected]>
> ---
> net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index a083bda..42a7b33 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c
> @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> return cp;
> }
> }
> - rcu_read_unlock();
> - rcu_read_lock();
> + cond_resched_rcu_lock();
> }
>
> return NULL;
> @@ -1015,8 +1014,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> iter->l = &ip_vs_conn_tab[idx];
> return cp;
> }
> - rcu_read_unlock();
> - rcu_read_lock();
> + cond_resched_rcu_lock();

Feel free to route this via the networking tree.

Note that this change isn't a pure clean-up but has functional effects as
well: on !PREEMPT or PREEMPT_VOLUNTARY kernels it will add in a potential
cond_resched() - while previously it would only rcu unlock and relock
(which in itself isn't scheduling).

This should probably be pointed out in the changelog.

Thanks,

Ingo

2013-04-26 08:05:18

by Peter Zijlstra

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:

> @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> return cp;
> }
> }
> - rcu_read_unlock();
> - rcu_read_lock();
> + cond_resched_rcu_lock();
> }


While I agree with the sentiment, I do find it a somewhat dangerous construct in
that it might become far too easy to keep an RCU reference over this break and
thus violate the RCU premise.

Is there anything that can detect this? Sparse / Coccinelle / smatch? If so it
would be great to add this to these checkers.
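
Concretely, the misuse being described would look something like the sketch
below (struct foo, foo_head and process() are made-up names for illustration,
not code from the patch):

	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(foo_head);	/* valid only inside this RCU section */
	while (p) {
		process(p);
		/* BUG: if need_resched() was set, the helper just left the
		 * RCU read-side section for a moment, so a grace period may
		 * have elapsed and p may already have been freed ...
		 */
		cond_resched_rcu_lock();
		/* ... making the next dereference a potential use-after-free. */
		p = rcu_dereference(p->next);
	}
	rcu_read_unlock();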

2013-04-26 15:46:09

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
>
> > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > return cp;
> > }
> > }
> > - rcu_read_unlock();
> > - rcu_read_lock();
> > + cond_resched_rcu_lock();
> > }
>
>
> While I agree with the sentiment, I do find it a somewhat dangerous construct in
> that it might become far too easy to keep an RCU reference over this break and
> thus violate the RCU premise.
>
> Is there anything that can detect this? Sparse / Coccinelle / smatch? If so it
> would be great to add this to these checkers.

I have done some crude coccinelle patterns in the past, but they are
subject to false positives (from when you transfer the pointer from
RCU protection to reference-count protection) and also false negatives
(when you atomically increment some statistic unrelated to protection).

I could imagine maintaining a per-thread count of the number of outermost
RCU read-side critical sections at runtime, and then associating that
counter with a given pointer at rcu_dereference() time, but this would
require either compiler magic or an API for using a pointer returned
by rcu_dereference(). This API could in theory be enforced by sparse.

Dhaval Giani might have some ideas as well, adding him to CC.

Thanx, Paul

2013-04-26 15:59:13

by Eric Dumazet

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, 2013-04-26 at 08:45 -0700, Paul E. McKenney wrote:

> I have done some crude coccinelle patterns in the past, but they are
> subject to false positives (from when you transfer the pointer from
> RCU protection to reference-count protection) and also false negatives
> (when you atomically increment some statistic unrelated to protection).
>
> I could imagine maintaining a per-thread count of the number of outermost
> RCU read-side critical sections at runtime, and then associating that
> counter with a given pointer at rcu_dereference() time, but this would
> require either compiler magic or an API for using a pointer returned
> by rcu_dereference(). This API could in theory be enforced by sparse.
>
> Dhaval Giani might have some ideas as well, adding him to CC.


We had this fix the other day, because the tcp prequeue code hit this check:

static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
	/* If refdst was not refcounted, check we still are in a
	 * rcu_read_lock section
	 */
	WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
		!rcu_read_lock_held() &&
		!rcu_read_lock_bh_held());
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

( http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=093162553c33e9479283e107b4431378271c735d )

Problem is, the RCU-protected pointer was escaping the RCU lock and was
then used in another thread.


What would be cool (but maybe expensive) would be to get a cookie from
rcu_read_lock(), and check the cookie at rcu_dereference(). These
cookies would have system-wide scope to catch any kind of error.

Because a per-thread counter would not catch the following problem:

rcu_read_lock();

ptr = rcu_dereference(x);
if (!ptr)
	return NULL;
...

rcu_read_unlock();

...
rcu_read_lock();
/* no reload of x, ptr might now be stale/freed */
if (ptr->field) { ... }


2013-04-26 16:35:12

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 08:59:07AM -0700, Eric Dumazet wrote:
> On Fri, 2013-04-26 at 08:45 -0700, Paul E. McKenney wrote:
>
> > I have done some crude coccinelle patterns in the past, but they are
> > subject to false positives (from when you transfer the pointer from
> > RCU protection to reference-count protection) and also false negatives
> > (when you atomically increment some statistic unrelated to protection).
> >
> > I could imagine maintaining a per-thread count of the number of outermost
> > RCU read-side critical sections at runtime, and then associating that
> > counter with a given pointer at rcu_dereference() time, but this would
> > require either compiler magic or an API for using a pointer returned
> > by rcu_dereference(). This API could in theory be enforced by sparse.
> >
> > Dhaval Giani might have some ideas as well, adding him to CC.
>
>
> We had this fix the other day, because the tcp prequeue code hit this check:
>
> static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
> {
> 	/* If refdst was not refcounted, check we still are in a
> 	 * rcu_read_lock section
> 	 */
> 	WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) &&
> 		!rcu_read_lock_held() &&
> 		!rcu_read_lock_bh_held());
> 	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
> }
>
> ( http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=093162553c33e9479283e107b4431378271c735d )
>
> Problem is, the RCU-protected pointer was escaping the RCU lock and was
> then used in another thread.

All the way to some other thread? That is a serious escape! ;-)

> What would be cool (but maybe expensive) would be to get a cookie from
> rcu_read_lock(), and check the cookie at rcu_dereference(). These
> cookies would have system-wide scope to catch any kind of error.

I suspect that your cookie and my counter are quite similar.

> Because a per-thread counter would not catch the following problem:
>
> rcu_read_lock();
>
> ptr = rcu_dereference(x);
> if (!ptr)
> 	return NULL;
> ...
>
> rcu_read_unlock();
>
> ...
> rcu_read_lock();
> /* no reload of x, ptr might now be stale/freed */
> if (ptr->field) { ... }

Well, that is why I needed to appeal to compiler magic or an API
extension. Either way, the pointer returned from rcu_dereference()
must be tagged with the ID of the outermost rcu_read_lock() that
covers it. Then that tag must be checked against that of the outermost
rcu_read_lock() when it is dereferenced. If we introduced yet another
set of RCU API members (shudder!) with which to wrapper your
ptr->field use, we could associate additional storage with the
pointer to hold the cookie/counter. And then use sparse to enforce
use of the new API.

Of course, if we were using C++, we could define a template for pointers
returned by rcu_dereference() in order to make this work. Now, where
did I put that asbestos suit, you know, the one with the titanium
pinstripes? ;-)

But even that has some "interesting" issues. To see this, let's
modify your example a bit:

rcu_read_lock();

...

rcu_read_lock_bh();

ptr = rcu_dereference(x);
if (!ptr)
	return NULL;
...

rcu_read_unlock_bh();

...
/* no reload of x, ptr might now be stale/freed */
if (ptr->field) { ... }

...

rcu_read_unlock();


Now, is it the rcu_read_lock() or the rcu_read_lock_bh() that protects
the rcu_dereference()?

Thanx, Paul

2013-04-26 17:21:27

by Peter Zijlstra

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 08:45:47AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
> >
> > > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > > return cp;
> > > }
> > > }
> > > - rcu_read_unlock();
> > > - rcu_read_lock();
> > > + cond_resched_rcu_lock();
> > > }
> >
> >
> > While I agree with the sentiment, I do find it a somewhat dangerous construct in
> > that it might become far too easy to keep an RCU reference over this break and
> > thus violate the RCU premise.
> >
> > Is there anything that can detect this? Sparse / Coccinelle / smatch? If so it
> > would be great to add this to these checkers.
>
> I have done some crude coccinelle patterns in the past, but they are
> subject to false positives (from when you transfer the pointer from
> RCU protection to reference-count protection) and also false negatives
> (when you atomically increment some statistic unrelated to protection).
>
> I could imagine maintaining a per-thread count of the number of outermost
> RCU read-side critical sections at runtime, and then associating that
> counter with a given pointer at rcu_dereference() time, but this would
> require either compiler magic or an API for using a pointer returned
> by rcu_dereference(). This API could in theory be enforced by sparse.

Luckily cond_resched_rcu_lock() will typically only occur within loops, and
loops tend to be contained in a single source file.

This would suggest a simple static checker should be able to tell without too
much magic, right? All it needs to do is track pointers returned from
rcu_dereference*() and see if they're used after cond_resched_rcu_lock().

Also, cond_resched_rcu_lock() will only drop a single level of RCU refs, so
that should be easier still.

2013-04-26 17:48:40

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 07:19:49PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 26, 2013 at 08:45:47AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 26, 2013 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 26, 2013 at 10:45:08AM +0900, Simon Horman wrote:
> > >
> > > > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > > > return cp;
> > > > }
> > > > }
> > > > - rcu_read_unlock();
> > > > - rcu_read_lock();
> > > > + cond_resched_rcu_lock();
> > > > }
> > >
> > >
> > > While I agree with the sentiment, I do find it a somewhat dangerous construct in
> > > that it might become far too easy to keep an RCU reference over this break and
> > > thus violate the RCU premise.
> > >
> > > Is there anything that can detect this? Sparse / Coccinelle / smatch? If so it
> > > would be great to add this to these checkers.
> >
> > I have done some crude coccinelle patterns in the past, but they are
> > subject to false positives (from when you transfer the pointer from
> > RCU protection to reference-count protection) and also false negatives
> > (when you atomically increment some statistic unrelated to protection).
> >
> > I could imagine maintaining a per-thread count of the number of outermost
> > RCU read-side critical sections at runtime, and then associating that
> > counter with a given pointer at rcu_dereference() time, but this would
> > require either compiler magic or an API for using a pointer returned
> > by rcu_dereference(). This API could in theory be enforced by sparse.
>
> Luckily cond_resched_rcu_lock() will typically only occur within loops, and
> loops tend to be contained in a single source file.
>
> This would suggest a simple static checker should be able to tell without too
> much magic, right? All it needs to do is track pointers returned from
> rcu_dereference*() and see if they're used after cond_resched_rcu_lock().
>
> Also, cond_resched_rcu_lock() will only drop a single level of RCU refs, so
> that should be easier still.

Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
will be difficult to validate. For one thing, until there are a lot of
them, manual inspection is quite possible. So feel free to apply my
Acked-by to the patch.

But it is definitely not too early to start thinking about how best to
automatically validate this sort of thing!

Thanx, Paul

2013-04-26 18:27:01

by Eric Dumazet

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:

> Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> will be difficult to validate. For one thing, until there are a lot of
> them, manual inspection is quite possible. So feel free to apply my
> Acked-by to the patch.

One question: if some thread(s) are calling rcu_barrier() and waiting
for us to exit an rcu_read_lock() section, is need_resched() enough
to allow breaking the section?

If not, maybe we should not test need_resched() at all.

rcu_read_unlock();
cond_resched();
rcu_read_lock();


2013-04-26 19:05:00

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 11:26:55AM -0700, Eric Dumazet wrote:
> On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
>
> > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > will be difficult to validate. For one thing, until there are a lot of
> > them, manual inspection is quite possible. So feel free to apply my
> > Acked-by to the patch.
>
> One question: if some thread(s) are calling rcu_barrier() and waiting
> for us to exit an rcu_read_lock() section, is need_resched() enough
> to allow breaking the section?
>
> If not, maybe we should not test need_resched() at all.
>
> rcu_read_unlock();
> cond_resched();
> rcu_read_lock();

A call to rcu_barrier() only blocks on already-queued RCU callbacks, so if
there are no RCU callbacks queued in the system, it need not block at all.

But it might need to wait on some callbacks, and thus might need to
wait for a grace period. So, is cond_resched() sufficient?
Currently, it depends:

1. CONFIG_TINY_RCU: Here cond_resched() doesn't do anything unless
there is at least one other process that is at an appropriate
priority level. So if the system has absolutely nothing else
to do other than run the in-kernel loop containing the
cond_resched_rcu_lock(), the grace period will never end.

But as soon as some other process wakes up, there will be a
context switch and the grace period will end. Unless you
are running at some high real-time priority, in which case
either throttling kicks in after a second or so or you get
what you deserve. ;-)

So for any reasonable workload, cond_resched() will eventually
suffice.

2. CONFIG_TREE_RCU without adaptive ticks (which is not yet in
tree): Same as #1, except that there is a greater chance
that the eventual wakeup might happen on some other CPU.

3. CONFIG_TREE_RCU with adaptive ticks (once it makes it into
mainline): After a few jiffies, RCU will kick the offending
CPU, which will turn on the scheduling-clock interrupt.
This won't end the grace period, but the kick could do a
bit more if needed.

4. CONFIG_TREE_PREEMPT_RCU: When the next scheduling-clock
interrupt notices that it happened in an RCU read-side
critical section and that there is a grace period pending,
it will set a flag in the task structure. The next
rcu_read_unlock() will report a quiescent state to the
RCU core.

So perhaps RCU should do a bit more in cases #2 and #3. It used to
send a resched IPI in this case, but if there is no reason to
reschedule, the resched IPI does nothing. In the worst case, I
can fire up a prio 99 kthread on each CPU and send that kthread a
wakeup from RCU's rcu_gp_fqs() code.

Other thoughts?

Thanx, Paul

2013-04-27 07:20:13

by Peter Zijlstra

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

> In the worst case, I
> can fire up a prio 99 kthread on each CPU and send that kthread a
> wakeup from RCU's rcu_gp_fqs() code.

You just know that's going to be _so_ popular ;-)

2013-04-27 11:40:10

by Julian Anastasov

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections


Hello,

On Fri, 26 Apr 2013, Eric Dumazet wrote:

> On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
>
> > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > will be difficult to validate. For one thing, until there are a lot of
> > them, manual inspection is quite possible. So feel free to apply my
> > Acked-by to the patch.
>
> One question: if some thread(s) are calling rcu_barrier() and waiting
> for us to exit an rcu_read_lock() section, is need_resched() enough
> to allow breaking the section?
>
> If not, maybe we should not test need_resched() at all.
>
> rcu_read_unlock();
> cond_resched();
> rcu_read_lock();

So, I assume, to help realtime kernels and rcu_barrier
it is not a good idea to guard rcu_read_unlock with checks.
I see that rcu_read_unlock will try to reschedule in the
!CONFIG_PREEMPT_RCU case (via preempt_enable); can we
use ifdefs to avoid a double TIF_NEED_RESCHED check?:

rcu_read_unlock();
#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
cond_resched();
#endif
rcu_read_lock();

Regards

--
Julian Anastasov <[email protected]>

2013-04-27 16:17:41

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Sat, Apr 27, 2013 at 09:18:15AM +0200, Peter Zijlstra wrote:
> > In the worst case, I
> > can fire up a prio 99 kthread on each CPU and send that kthread a
> > wakeup from RCU's rcu_gp_fqs() code.
>
> You just know that's going to be _so_ popular ;-)

;-) ;-) ;-)

I must confess that I would prefer a somewhat less heavy-handed approach.

Thanx, Paul

2013-04-27 16:21:02

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 26 Apr 2013, Eric Dumazet wrote:
>
> > On Fri, 2013-04-26 at 10:48 -0700, Paul E. McKenney wrote:
> >
> > > Don't get me wrong, I am not opposing cond_resched_rcu_lock() because it
> > > will be difficult to validate. For one thing, until there are a lot of
> > > them, manual inspection is quite possible. So feel free to apply my
> > > Acked-by to the patch.
> >
> > One question: if some thread(s) are calling rcu_barrier() and waiting
> > for us to exit an rcu_read_lock() section, is need_resched() enough
> > to allow breaking the section?
> >
> > If not, maybe we should not test need_resched() at all.
> >
> > rcu_read_unlock();
> > cond_resched();
> > rcu_read_lock();
>
> So, I assume, to help realtime kernels and rcu_barrier
> it is not a good idea to guard rcu_read_unlock with checks.
> I see that rcu_read_unlock will try to reschedule in the
> !CONFIG_PREEMPT_RCU case (via preempt_enable); can we
> use ifdefs to avoid a double TIF_NEED_RESCHED check?:
>
> rcu_read_unlock();
> #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)

I would instead suggest something like:

#ifndef CONFIG_PREEMPT_RCU

But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
needed.

Thanx, Paul

> cond_resched();
> #endif
> rcu_read_lock();
>
> Regards
>
> --
> Julian Anastasov <[email protected]>
>

2013-04-29 21:12:32

by Julian Anastasov

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections


Hello,

On Sat, 27 Apr 2013, Paul E. McKenney wrote:

> On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
> >
> > So, I assume, to help realtime kernels and rcu_barrier
> > it is not a good idea to guard rcu_read_unlock with checks.
> > I see that rcu_read_unlock will try to reschedule in the
> > !CONFIG_PREEMPT_RCU case (via preempt_enable); can we
> > use ifdefs to avoid a double TIF_NEED_RESCHED check?:
> >
> > rcu_read_unlock();
> > #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
>
> I would instead suggest something like:
>
> #ifndef CONFIG_PREEMPT_RCU
>
> But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> needed.

Hm, is this correct? If I follow the ifdefs,
preempt_schedule is called when CONFIG_PREEMPT is
defined _and_ CONFIG_PREEMPT_RCU is not defined.
Your example for CONFIG_PREEMPT_RCU is the opposite of this?

> Thanx, Paul
>
> > cond_resched();
> > #endif
> > rcu_read_lock();

Regards

--
Julian Anastasov <[email protected]>

2013-04-29 21:30:41

by Paul E. McKenney

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Tue, Apr 30, 2013 at 12:08:18AM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 27 Apr 2013, Paul E. McKenney wrote:
>
> > On Sat, Apr 27, 2013 at 02:32:48PM +0300, Julian Anastasov wrote:
> > >
> > > So, I assume, to help realtime kernels and rcu_barrier
> > > it is not a good idea to guard rcu_read_unlock with checks.
> > > I see that rcu_read_unlock will try to reschedule in the
> > > !CONFIG_PREEMPT_RCU case (via preempt_enable); can we
> > > use ifdefs to avoid a double TIF_NEED_RESCHED check?:
> > >
> > > rcu_read_unlock();
> > > #if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_RCU)
> >
> > I would instead suggest something like:
> >
> > #ifndef CONFIG_PREEMPT_RCU
> >
> > But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> > needed.
>
> Hm, is this correct? If I follow the ifdefs,
> preempt_schedule is called when CONFIG_PREEMPT is
> defined _and_ CONFIG_PREEMPT_RCU is not defined.
> Your example for CONFIG_PREEMPT_RCU is the opposite of this?

Yep, I really did intend to say "#ifndef CONFIG_PREEMPT_RCU".

A couple of things to keep in mind:

1. Although rcu_read_unlock() does map to preempt_enable() for
CONFIG_TINY_RCU and CONFIG_TREE_RCU, the current Kconfig refuses
to allow either CONFIG_TINY_RCU or CONFIG_TREE_RCU to be selected
if CONFIG_PREEMPT=y.

2. In the CONFIG_PREEMPT_RCU case, __rcu_read_unlock() will check
to see if the RCU core needs to be informed, so there is no
need to invoke cond_resched() in that case.

3. If we drop your "|| defined(CONFIG_PREEMPT_RCU)", we get an
almost-synonym for my "#ifndef CONFIG_PREEMPT_RCU". The "almost"
applies to older kernels due to the possibility of having a
CONFIG_TINY_PREEMPT_RCU kernel -- but this possibility is going
away soon.

Make sense?

Thanx, Paul

> > > cond_resched();
> > > #endif
> > > rcu_read_lock();
>
> Regards
>
> --
> Julian Anastasov <[email protected]>
>

2013-04-29 23:11:20

by Julian Anastasov

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections


Hello,

On Mon, 29 Apr 2013, Paul E. McKenney wrote:

> On Tue, Apr 30, 2013 at 12:08:18AM +0300, Julian Anastasov wrote:
> >
> > Hello,
> >
> > On Sat, 27 Apr 2013, Paul E. McKenney wrote:
> >
> > > I would instead suggest something like:
> > >
> > > #ifndef CONFIG_PREEMPT_RCU
> > >
> > > But yes, in the CONFIG_PREEMPT_RCU case, the cond_resched() is not
> > > needed.
> >
> > Hm, is this correct? If I follow the ifdefs,
> > preempt_schedule is called when CONFIG_PREEMPT is
> > defined _and_ CONFIG_PREEMPT_RCU is not defined.
> > Your example for CONFIG_PREEMPT_RCU is the opposite of this?
>
> Yep, I really did intend to say "#ifndef CONFIG_PREEMPT_RCU".
>
> A couple of things to keep in mind:
>
> 1. Although rcu_read_unlock() does map to preempt_enable() for
> CONFIG_TINY_RCU and CONFIG_TREE_RCU, the current Kconfig refuses
> to allow either CONFIG_TINY_RCU or CONFIG_TREE_RCU to be selected
> if CONFIG_PREEMPT=y.

I see, CONFIG_PREEMPT_RCU depends on CONFIG_PREEMPT

> 2. In the CONFIG_PREEMPT_RCU case, __rcu_read_unlock() will check
> to see if the RCU core needs to be informed, so there is no
> need to invoke cond_resched() in that case.

OK

> 3. If we drop your "|| defined(CONFIG_PREEMPT_RCU)", we get an
> almost-synonym for my "#ifndef CONFIG_PREEMPT_RCU". The "almost"
> applies to older kernels due to the possibility of having a
> CONFIG_TINY_PREEMPT_RCU kernel -- but this possibility is going
> away soon.
>
> Make sense?

Yes, thanks for the explanation!

Simon, so let's do it as suggested by Eric and Paul:

rcu_read_unlock();
#ifndef CONFIG_PREEMPT_RCU
cond_resched();
#endif
rcu_read_lock();

Regards

--
Julian Anastasov <[email protected]>
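
Folding that back into the helper from patch 1/2 gives something like the
following sketch of where the thread ended up (a sketch of the consensus
above, not necessarily the exact form eventually merged):

	static inline void cond_resched_rcu_lock(void)
	{
		rcu_read_unlock();
	#ifndef CONFIG_PREEMPT_RCU
		cond_resched();
	#endif
		rcu_read_lock();
	}

This drops the need_resched() test entirely, as Eric suggested, and skips the
redundant cond_resched() on CONFIG_PREEMPT_RCU kernels, where rcu_read_unlock()
itself already informs the RCU core as needed.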

2013-04-30 02:45:57

by Simon Horman

Subject: Re: [PATCH 2/2] ipvs: Use cond_resched_rcu_lock() helper when dumping connections

On Fri, Apr 26, 2013 at 08:15:34AM +0200, Ingo Molnar wrote:
>
> * Simon Horman <[email protected]> wrote:
>
> > This avoids the situation where a dump of a large number of connections
> > may prevent scheduling for a long time while also avoiding excessive
> > calls to rcu_read_unlock() and rcu_read_lock().
> >
> > Cc: Eric Dumazet <[email protected]>
> > Cc: Julian Anastasov <[email protected]>
> > Signed-off-by: Simon Horman <[email protected]>
> > ---
> > net/netfilter/ipvs/ip_vs_conn.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> > index a083bda..42a7b33 100644
> > --- a/net/netfilter/ipvs/ip_vs_conn.c
> > +++ b/net/netfilter/ipvs/ip_vs_conn.c
> > @@ -975,8 +975,7 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
> > return cp;
> > }
> > }
> > - rcu_read_unlock();
> > - rcu_read_lock();
> > + cond_resched_rcu_lock();
> > }
> >
> > return NULL;
> > @@ -1015,8 +1014,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > iter->l = &ip_vs_conn_tab[idx];
> > return cp;
> > }
> > - rcu_read_unlock();
> > - rcu_read_lock();
> > + cond_resched_rcu_lock();
>
> Feel free to route this via the networking tree.
>
> Note that this change isn't a pure clean-up but has functional effects as
> well: on !PREEMPT or PREEMPT_VOLUNTARY kernels it will add in a potential
> cond_resched() - while previously it would only rcu unlock and relock
> (which in itself isn't scheduling).
>
> This should probably be pointed out in the changelog.

Thanks, I have added some text which I will include in v2.