2019-11-25 12:28:12

by Qais Yousef

[permalink] [raw]
Subject: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

The core device API performs extra housekeeping bits that are missing
from directly calling cpu_up/down.

See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
serialization during LPM") for an example description of what might go
wrong.

This also prepares to make cpu_up/down a private interface for anything
but the cpu subsystem.

Signed-off-by: Qais Yousef <[email protected]>
CC: Davidlohr Bueso <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: Josh Triplett <[email protected]>
CC: [email protected]
---
kernel/torture.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/torture.c b/kernel/torture.c
index 7c13f5558b71..12115024feb2 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -97,7 +97,9 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
torture_type, cpu);
starttime = jiffies;
(*n_offl_attempts)++;
- ret = cpu_down(cpu);
+ lock_device_hotplug();
+ ret = device_offline(get_cpu_device(cpu));
+ unlock_device_hotplug();
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
@@ -148,7 +150,9 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
torture_type, cpu);
starttime = jiffies;
(*n_onl_attempts)++;
- ret = cpu_up(cpu);
+ lock_device_hotplug();
+ ret = device_online(get_cpu_device(cpu));
+ unlock_device_hotplug();
if (ret) {
if (verbose)
pr_alert("%s" TORTURE_FLAG
@@ -192,17 +196,20 @@ torture_onoff(void *arg)
for_each_online_cpu(cpu)
maxcpu = cpu;
WARN_ON(maxcpu < 0);
- if (!IS_MODULE(CONFIG_TORTURE_TEST))
+ if (!IS_MODULE(CONFIG_TORTURE_TEST)) {
+ lock_device_hotplug();
for_each_possible_cpu(cpu) {
if (cpu_online(cpu))
continue;
- ret = cpu_up(cpu);
+ ret = device_online(get_cpu_device(cpu));
if (ret && verbose) {
pr_alert("%s" TORTURE_FLAG
"%s: Initial online %d: errno %d\n",
__func__, torture_type, cpu, ret);
}
}
+ unlock_device_hotplug();
+ }

if (maxcpu == 0) {
VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled");
--
2.17.1


2019-11-27 21:49:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> The core device API performs extra housekeeping bits that are missing
> from directly calling cpu_up/down.
>
> See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> serialization during LPM") for an example description of what might go
> wrong.
>
> This also prepares to make cpu_up/down a private interface for anything
> but the cpu subsystem.
>
> Signed-off-by: Qais Yousef <[email protected]>
> CC: Davidlohr Bueso <[email protected]>
> CC: "Paul E. McKenney" <[email protected]>
> CC: Josh Triplett <[email protected]>
> CC: [email protected]

Looks fine from an rcutorture viewpoint, but why not provide an API
that pulled lock_device_hotplug() and unlock_device_hotplug() into the
online/offline calls?

Thanx, Paul

> ---
> kernel/torture.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/torture.c b/kernel/torture.c
> index 7c13f5558b71..12115024feb2 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -97,7 +97,9 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
> torture_type, cpu);
> starttime = jiffies;
> (*n_offl_attempts)++;
> - ret = cpu_down(cpu);
> + lock_device_hotplug();
> + ret = device_offline(get_cpu_device(cpu));
> + unlock_device_hotplug();
> if (ret) {
> if (verbose)
> pr_alert("%s" TORTURE_FLAG
> @@ -148,7 +150,9 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
> torture_type, cpu);
> starttime = jiffies;
> (*n_onl_attempts)++;
> - ret = cpu_up(cpu);
> + lock_device_hotplug();
> + ret = device_online(get_cpu_device(cpu));
> + unlock_device_hotplug();
> if (ret) {
> if (verbose)
> pr_alert("%s" TORTURE_FLAG
> @@ -192,17 +196,20 @@ torture_onoff(void *arg)
> for_each_online_cpu(cpu)
> maxcpu = cpu;
> WARN_ON(maxcpu < 0);
> - if (!IS_MODULE(CONFIG_TORTURE_TEST))
> + if (!IS_MODULE(CONFIG_TORTURE_TEST)) {
> + lock_device_hotplug();
> for_each_possible_cpu(cpu) {
> if (cpu_online(cpu))
> continue;
> - ret = cpu_up(cpu);
> + ret = device_online(get_cpu_device(cpu));
> if (ret && verbose) {
> pr_alert("%s" TORTURE_FLAG
> "%s: Initial online %d: errno %d\n",
> __func__, torture_type, cpu, ret);
> }
> }
> + unlock_device_hotplug();
> + }
>
> if (maxcpu == 0) {
> VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled");
> --
> 2.17.1
>

2019-11-28 16:57:45

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On 11/27/19 13:47, Paul E. McKenney wrote:
> On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > The core device API performs extra housekeeping bits that are missing
> > from directly calling cpu_up/down.
> >
> > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > serialization during LPM") for an example description of what might go
> > wrong.
> >
> > This also prepares to make cpu_up/down a private interface for anything
> > but the cpu subsystem.
> >
> > Signed-off-by: Qais Yousef <[email protected]>
> > CC: Davidlohr Bueso <[email protected]>
> > CC: "Paul E. McKenney" <[email protected]>
> > CC: Josh Triplett <[email protected]>
> > CC: [email protected]
>
> Looks fine from an rcutorture viewpoint, but why not provide an API
> that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> online/offline calls?

I *think* the right way to do what you say is by doing lock_device_hotplug()
inside device_{online, offline}() - which affects all drivers not just the CPU.

And even then, I think we need to refcount it so nested calls won't deadlock.

I don't know if this can break any rule or not. If Greg thinks it's okay I'd be
happy to post some patches that do just that.

Thanks

--
Qais Yousef

>
> Thanx, Paul
>
> > ---
> > kernel/torture.c | 15 +++++++++++----
> > 1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/torture.c b/kernel/torture.c
> > index 7c13f5558b71..12115024feb2 100644
> > --- a/kernel/torture.c
> > +++ b/kernel/torture.c
> > @@ -97,7 +97,9 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
> > torture_type, cpu);
> > starttime = jiffies;
> > (*n_offl_attempts)++;
> > - ret = cpu_down(cpu);
> > + lock_device_hotplug();
> > + ret = device_offline(get_cpu_device(cpu));
> > + unlock_device_hotplug();
> > if (ret) {
> > if (verbose)
> > pr_alert("%s" TORTURE_FLAG
> > @@ -148,7 +150,9 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
> > torture_type, cpu);
> > starttime = jiffies;
> > (*n_onl_attempts)++;
> > - ret = cpu_up(cpu);
> > + lock_device_hotplug();
> > + ret = device_online(get_cpu_device(cpu));
> > + unlock_device_hotplug();
> > if (ret) {
> > if (verbose)
> > pr_alert("%s" TORTURE_FLAG
> > @@ -192,17 +196,20 @@ torture_onoff(void *arg)
> > for_each_online_cpu(cpu)
> > maxcpu = cpu;
> > WARN_ON(maxcpu < 0);
> > - if (!IS_MODULE(CONFIG_TORTURE_TEST))
> > + if (!IS_MODULE(CONFIG_TORTURE_TEST)) {
> > + lock_device_hotplug();
> > for_each_possible_cpu(cpu) {
> > if (cpu_online(cpu))
> > continue;
> > - ret = cpu_up(cpu);
> > + ret = device_online(get_cpu_device(cpu));
> > if (ret && verbose) {
> > pr_alert("%s" TORTURE_FLAG
> > "%s: Initial online %d: errno %d\n",
> > __func__, torture_type, cpu, ret);
> > }
> > }
> > + unlock_device_hotplug();
> > + }
> >
> > if (maxcpu == 0) {
> > VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled");
> > --
> > 2.17.1
> >

2019-11-28 17:04:48

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On 11/28/19 16:56, Qais Yousef wrote:
> On 11/27/19 13:47, Paul E. McKenney wrote:
> > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > The core device API performs extra housekeeping bits that are missing
> > > from directly calling cpu_up/down.
> > >
> > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > serialization during LPM") for an example description of what might go
> > > wrong.
> > >
> > > This also prepares to make cpu_up/down a private interface for anything
> > > but the cpu subsystem.
> > >
> > > Signed-off-by: Qais Yousef <[email protected]>
> > > CC: Davidlohr Bueso <[email protected]>
> > > CC: "Paul E. McKenney" <[email protected]>
> > > CC: Josh Triplett <[email protected]>
> > > CC: [email protected]
> >
> > Looks fine from an rcutorture viewpoint, but why not provide an API
> > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > online/offline calls?
>
> I *think* the right way to do what you say is by doing lock_device_hotplug()
> inside device_{online, offline}() - which affects all drivers not just the CPU.
>
> And even then, I think we need to refcount it so nested calls won't deadlock.

Forget that. I don't think nesting here makes actually any sense.

Thanks

--
Qais Yousef

2019-11-28 21:08:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On Thu, Nov 28, 2019 at 05:00:26PM +0000, Qais Yousef wrote:
> On 11/28/19 16:56, Qais Yousef wrote:
> > On 11/27/19 13:47, Paul E. McKenney wrote:
> > > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > > The core device API performs extra housekeeping bits that are missing
> > > > from directly calling cpu_up/down.
> > > >
> > > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > > serialization during LPM") for an example description of what might go
> > > > wrong.
> > > >
> > > > This also prepares to make cpu_up/down a private interface for anything
> > > > but the cpu subsystem.
> > > >
> > > > Signed-off-by: Qais Yousef <[email protected]>
> > > > CC: Davidlohr Bueso <[email protected]>
> > > > CC: "Paul E. McKenney" <[email protected]>
> > > > CC: Josh Triplett <[email protected]>
> > > > CC: [email protected]
> > >
> > > Looks fine from an rcutorture viewpoint, but why not provide an API
> > > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > > online/offline calls?
> >
> > I *think* the right way to do what you say is by doing lock_device_hotplug()
> > inside device_{online, offline}() - which affects all drivers not just the CPU.

Or there could be a CPU-specific wrapper function that did the needed
locking. (Whether this is worth it or not of course depends on the
number of invocations.)

Thanx, Paul

2019-11-29 09:15:18

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On 11/28/19 13:02, Paul E. McKenney wrote:
> On Thu, Nov 28, 2019 at 05:00:26PM +0000, Qais Yousef wrote:
> > On 11/28/19 16:56, Qais Yousef wrote:
> > > On 11/27/19 13:47, Paul E. McKenney wrote:
> > > > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > > > The core device API performs extra housekeeping bits that are missing
> > > > > from directly calling cpu_up/down.
> > > > >
> > > > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > > > serialization during LPM") for an example description of what might go
> > > > > wrong.
> > > > >
> > > > > This also prepares to make cpu_up/down a private interface for anything
> > > > > but the cpu subsystem.
> > > > >
> > > > > Signed-off-by: Qais Yousef <[email protected]>
> > > > > CC: Davidlohr Bueso <[email protected]>
> > > > > CC: "Paul E. McKenney" <[email protected]>
> > > > > CC: Josh Triplett <[email protected]>
> > > > > CC: [email protected]
> > > >
> > > > Looks fine from an rcutorture viewpoint, but why not provide an API
> > > > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > > > online/offline calls?
> > >
> > > I *think* the right way to do what you say is by doing lock_device_hotplug()
> > > inside device_{online, offline}() - which affects all drivers not just the CPU.
>
> Or there could be a CPU-specific wrapper function that did the needed
> locking. (Whether this is worth it or not of course depends on the
> number of invocations.)

Okay I see what you mean now. driver/base/memory.c have {add,remove}_memory()
that does what you say. I think we can replicate this in driver/base/cpu.c too.

I can certainly do that, better as an improvement on top as I need to audit the
code to make sure the critical sections weren't relying on this lock to protect
something else beside the online/offline operation.

Thanks!

--
Qais Yousef

2019-11-29 20:42:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On Fri, Nov 29, 2019 at 09:13:45AM +0000, Qais Yousef wrote:
> On 11/28/19 13:02, Paul E. McKenney wrote:
> > On Thu, Nov 28, 2019 at 05:00:26PM +0000, Qais Yousef wrote:
> > > On 11/28/19 16:56, Qais Yousef wrote:
> > > > On 11/27/19 13:47, Paul E. McKenney wrote:
> > > > > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > > > > The core device API performs extra housekeeping bits that are missing
> > > > > > from directly calling cpu_up/down.
> > > > > >
> > > > > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > > > > serialization during LPM") for an example description of what might go
> > > > > > wrong.
> > > > > >
> > > > > > This also prepares to make cpu_up/down a private interface for anything
> > > > > > but the cpu subsystem.
> > > > > >
> > > > > > Signed-off-by: Qais Yousef <[email protected]>
> > > > > > CC: Davidlohr Bueso <[email protected]>
> > > > > > CC: "Paul E. McKenney" <[email protected]>
> > > > > > CC: Josh Triplett <[email protected]>
> > > > > > CC: [email protected]
> > > > >
> > > > > Looks fine from an rcutorture viewpoint, but why not provide an API
> > > > > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > > > > online/offline calls?
> > > >
> > > > I *think* the right way to do what you say is by doing lock_device_hotplug()
> > > > inside device_{online, offline}() - which affects all drivers not just the CPU.
> >
> > Or there could be a CPU-specific wrapper function that did the needed
> > locking. (Whether this is worth it or not of course depends on the
> > number of invocations.)
>
> Okay I see what you mean now. driver/base/memory.c have {add,remove}_memory()
> that does what you say. I think we can replicate this in driver/base/cpu.c too.
>
> I can certainly do that, better as an improvement on top as I need to audit the
> code to make sure the critical sections weren't relying on this lock to protect
> something else beside the online/offline operation.

Works for me!

Thanx, Paul

2020-02-20 15:33:01

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On 11/29/19 12:38, Paul E. McKenney wrote:
> On Fri, Nov 29, 2019 at 09:13:45AM +0000, Qais Yousef wrote:
> > On 11/28/19 13:02, Paul E. McKenney wrote:
> > > On Thu, Nov 28, 2019 at 05:00:26PM +0000, Qais Yousef wrote:
> > > > On 11/28/19 16:56, Qais Yousef wrote:
> > > > > On 11/27/19 13:47, Paul E. McKenney wrote:
> > > > > > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > > > > > The core device API performs extra housekeeping bits that are missing
> > > > > > > from directly calling cpu_up/down.
> > > > > > >
> > > > > > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > > > > > serialization during LPM") for an example description of what might go
> > > > > > > wrong.
> > > > > > >
> > > > > > > This also prepares to make cpu_up/down a private interface for anything
> > > > > > > but the cpu subsystem.
> > > > > > >
> > > > > > > Signed-off-by: Qais Yousef <[email protected]>
> > > > > > > CC: Davidlohr Bueso <[email protected]>
> > > > > > > CC: "Paul E. McKenney" <[email protected]>
> > > > > > > CC: Josh Triplett <[email protected]>
> > > > > > > CC: [email protected]
> > > > > >
> > > > > > Looks fine from an rcutorture viewpoint, but why not provide an API
> > > > > > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > > > > > online/offline calls?
> > > > >
> > > > > I *think* the right way to do what you say is by doing lock_device_hotplug()
> > > > > inside device_{online, offline}() - which affects all drivers not just the CPU.
> > >
> > > Or there could be a CPU-specific wrapper function that did the needed
> > > locking. (Whether this is worth it or not of course depends on the
> > > number of invocations.)
> >
> > Okay I see what you mean now. driver/base/memory.c have {add,remove}_memory()
> > that does what you say. I think we can replicate this in driver/base/cpu.c too.
> >
> > I can certainly do that, better as an improvement on top as I need to audit the
> > code to make sure the critical sections weren't relying on this lock to protect
> > something else beside the online/offline operation.
>
> Works for me!

I'm taking that as reviewed-by, which I'll add to v3. Please shout if you still
need to have a look further.

Once this is taken I'll add the suggested API!

Thanks

--
Qais Yousef

2020-02-21 00:28:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On Thu, Feb 20, 2020 at 03:31:59PM +0000, Qais Yousef wrote:
> On 11/29/19 12:38, Paul E. McKenney wrote:
> > On Fri, Nov 29, 2019 at 09:13:45AM +0000, Qais Yousef wrote:
> > > On 11/28/19 13:02, Paul E. McKenney wrote:
> > > > On Thu, Nov 28, 2019 at 05:00:26PM +0000, Qais Yousef wrote:
> > > > > On 11/28/19 16:56, Qais Yousef wrote:
> > > > > > On 11/27/19 13:47, Paul E. McKenney wrote:
> > > > > > > On Mon, Nov 25, 2019 at 11:27:52AM +0000, Qais Yousef wrote:
> > > > > > > > The core device API performs extra housekeeping bits that are missing
> > > > > > > > from directly calling cpu_up/down.
> > > > > > > >
> > > > > > > > See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
> > > > > > > > serialization during LPM") for an example description of what might go
> > > > > > > > wrong.
> > > > > > > >
> > > > > > > > This also prepares to make cpu_up/down a private interface for anything
> > > > > > > > but the cpu subsystem.
> > > > > > > >
> > > > > > > > Signed-off-by: Qais Yousef <[email protected]>
> > > > > > > > CC: Davidlohr Bueso <[email protected]>
> > > > > > > > CC: "Paul E. McKenney" <[email protected]>
> > > > > > > > CC: Josh Triplett <[email protected]>
> > > > > > > > CC: [email protected]
> > > > > > >
> > > > > > > Looks fine from an rcutorture viewpoint, but why not provide an API
> > > > > > > that pulled lock_device_hotplug() and unlock_device_hotplug() into the
> > > > > > > online/offline calls?
> > > > > >
> > > > > > I *think* the right way to do what you say is by doing lock_device_hotplug()
> > > > > > inside device_{online, offline}() - which affects all drivers not just the CPU.
> > > >
> > > > Or there could be a CPU-specific wrapper function that did the needed
> > > > locking. (Whether this is worth it or not of course depends on the
> > > > number of invocations.)
> > >
> > > Okay I see what you mean now. driver/base/memory.c have {add,remove}_memory()
> > > that does what you say. I think we can replicate this in driver/base/cpu.c too.
> > >
> > > I can certainly do that, better as an improvement on top as I need to audit the
> > > code to make sure the critical sections weren't relying on this lock to protect
> > > something else beside the online/offline operation.
> >
> > Works for me!
>
> I'm taking that as reviewed-by, which I'll add to v3. Please shout if you still
> need to have a look further.
>
> Once this is taken I'll add the suggested API!

OK, I will bite...

Why not right now?

Thanx, Paul

2020-02-21 09:36:25

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On 02/20/20 16:26, Paul E. McKenney wrote:
> > I'm taking that as reviewed-by, which I'll add to v3. Please shout if you still
> > need to have a look further.
> >
> > Once this is taken I'll add the suggested API!
>
> OK, I will bite...
>
> Why not right now?

Sigh. Good question. Probably I'm just being lame; it just felt the series is a
bit fragile spanning that many archs and was wary introducing some extra
changes on top will make it even harder to get merged soon.

Let me go and do it. You're probably right and it shouldn't really create a big
ripple on the series.

Thanks!

--
Qais Yousef

2020-02-21 20:39:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] torture: Replace cpu_up/down with device_online/offline

On Fri, Feb 21, 2020 at 09:35:05AM +0000, Qais Yousef wrote:
> On 02/20/20 16:26, Paul E. McKenney wrote:
> > > I'm taking that as reviewed-by, which I'll add to v3. Please shout if you still
> > > need to have a look further.
> > >
> > > Once this is taken I'll add the suggested API!
> >
> > OK, I will bite...
> >
> > Why not right now?
>
> Sigh. Good question. Probably I'm just being lame; it just felt the series is a
> bit fragile spanning that many archs and was wary introducing some extra
> changes on top will make it even harder to get merged soon.
>
> Let me go and do it. You're probably right and it shouldn't really create a big
> ripple on the series.

Very good, thank you!

Thanx, Paul