2007-05-12 18:17:19

by Rafael J. Wysocki

Subject: [RFD] Freezing of kernel threads

Hi,

Having considered the issue of freezing (or not freezing) kernel threads for
quite a while, I tend to agree with Linus that we shouldn't be freezing as many
kernel threads as we currently freeze, but there's one thing that he doesn't
seem to take into account. Namely, there may be some kernel threads that
actually *want* (or need) to be frozen. :-)

For suspend, these may be kernel threads that otherwise would need some
special synchronization with some device drivers' .suspend() and .resume()
callbacks, or fs-related kernel threads. For CPU hotplug, they might be
kernel threads that otherwise would interfere with the removal or addition of
CPUs. Still, in each case there seems to be a limited group of kernel threads
that may want to be frozen, while the remaining kernel threads shouldn't be
involved in any freezer-related actions at all. However, we currently require
all kernel threads to take part in the freezing mechanism, which doesn't seem to
be correct.

It seems to me that, instead of asking *all* kernel threads to either set
PF_NOFREEZE or call try_to_freeze() in a suitable place, we should introduce
some mechanism allowing the kernel threads that *want* to freeze to register
with the freezer. For example, we can use a per-task flag, say PF_FREEZABLE,
to indicate that a given task is a kernel thread which wants (or needs) to be
frozen (for simplicity, I'll call it a 'freezable thread' from now on).
Alternatively, we can use a list of freezable kernel threads. Regardless of
what actual mechanism is used, the freezer will only set TIF_FREEZE for
freezable kernel threads and these threads will be expected to call
try_to_freeze(). The remaining kernel threads (I'll call them 'normal') won't
need to do anything.
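
To illustrate, here is a minimal sketch of what such a registration
interface might look like; the helper names and the flag value are made
up for the example, not an existing API:

#define PF_FREEZABLE    0x00800000      /* hypothetical per-task flag */

/* Called by a kernel thread that wants (or needs) to be frozen. */
static inline void freezer_register(void)
{
        current->flags |= PF_FREEZABLE;
}

/* To be called before the thread stops. */
static inline void freezer_unregister(void)
{
        current->flags &= ~PF_FREEZABLE;
}

/* The freezer would then only target registered threads: */
static int freeze_task(struct task_struct *p)
{
        if (!(p->flags & PF_FREEZABLE))
                return 0;       /* 'normal' thread, left in peace */
        set_tsk_thread_flag(p, TIF_FREEZE);
        return 1;       /* thread is expected to call try_to_freeze() */
}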

If this approach is used, the entire freezing-related complexity will only
affect the freezable kernel threads. Moreover, it seems that we can precisely
define and document the rules to be followed by the freezable kernel threads
(for example, IMO it's reasonable to require a freezable kernel thread to
unregister from the freezer before it stops etc.) while the normal kernel
threads can be left in peace.

It also seems that we can transition from the current approach to the one
outlined above in quite a straightforward way. Namely, for this purpose we
need to identify the kernel threads that should be freezable, make them use the
new interface, and remove PF_NOFREEZE and try_to_freeze() from the remaining
kernel threads. Later, if testing shows that we've overlooked some kernel
threads which need to be made freezable, we'll be able to add the "ability to
freeze" to these threads quite easily.

Of course, that would also require us to rewrite the freezer itself quite a
bit, but IMO it's worth doing.

Thoughts?

Rafael


2007-05-12 18:26:20

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sat, 12 May 2007, Rafael J. Wysocki wrote:
>
> Of course, that would also require us to rewrite the freezer itself quite a
> bit, but IMO it's worth doing.
>
> Thoughts?

I'd much prefer it. One of the reasons I hate the freezer so much is that
it ends up affecting things it really has no business affecting. Why
should a random kernel thread have to have code NOT to care? So it makes
much more sense to default to "I don't care about the freezer", and then
people who do care can say so and add their own code.

That said, I also suspect that suspend should depend less on the freezer
in the first place, and depend more on just shutting up specific actions.

The freezer is kind of a blunt instrument that just stops everything,
without actually understanding *what* it stops. As a result (since it
really doesn't know what it's doing), it ends up having all the issues
with "ok, I don't know what I'm doing, but that guy says I shouldn't do
it, so I won't".

Blunt instruments are often _easier_ (compare with the global kernel lock
in SMP), but they end up being very inflexible and hard to get rid of
later when you want to do something more intelligent.

Linus

2007-05-12 19:17:18

by Pavel Machek

Subject: Re: [RFD] Freezing of kernel threads

Hi!

> Having considered the issue of freezing (or not freezing) kernel threads for
> quite a while, I tend to agree with Linus that we shouldn't be freezing as many
> kernel threads as we currently freeze, but there's one thing that he doesn't
> seem to take into account. Namely, there may be some kernel threads that
> actually *want* (or need) to be frozen. :-)
>
> For suspend, these may be kernel threads that otherwise would need some
> special synchronization with some device drivers' .suspend() and .resume()
> callbacks, or fs-related kernel threads. For CPU hotplug, they might be
> kernel threads that otherwise would interfere with the removal or addition of
> CPUs. Still, in each case there seems to be a limited group of kernel threads
> that may want to be frozen, while the remaining kernel threads shouldn't be
> involved in any freezer-related actions at all. However, we currently require
> all kernel threads to take part in the freezing mechanism, which doesn't seem to
> be correct.

Well.. It is actually a feature.

These days, you have to either add PF_NOFREEZE explicitly (which
hopefully means you know what you are doing) or add try_to_freeze() at
a specific place.

If you fail to do that, we'll see freezer failure, quickly, and catch
the simple bug.
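
The pattern in question looks roughly like this in a freezable kernel
thread (a sketch; do_periodic_work() is a made-up stand-in for the
thread's real job):

static int my_kthread(void *data)
{
        while (!kthread_should_stop()) {
                try_to_freeze();        /* park here during suspend */
                do_periodic_work();
                schedule_timeout_interruptible(HZ);
        }
        return 0;
}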

I'm afraid that if we default to "do not freeze by default", someone
will just port zfs, will know nothing about suspend, and his
"write management info somewhere periodically" thread will corrupt
people's disks.

OTOH... perhaps suspend/resume is a common enough operation that people
_know_ they need to stop their writing? ...no, I do not think we are
there yet.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Subject: Re: [RFD] Freezing of kernel threads

On Sat, May 12, 2007 at 08:17:31PM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> Having considered the issue of freezing (or not freezing) kernel threads for
> quite a while, I tend to agree with Linus that we shouldn't be freezing as many
> kernel threads as we currently freeze, but there's one thing that he doesn't
> seem to take into account. Namely, there may be some kernel threads that
> actually *want* (or need) to be frozen. :-)
>
> For suspend, these may be kernel threads that otherwise would need some
> special synchronization with some device drivers' .suspend() and .resume()
> callbacks, or fs-related kernel threads. For CPU hotplug, they might be
> kernel threads that otherwise would interfere with the removal or addition of
> CPUs.

While it is definitely a good idea, since it would reduce the
freeze time, it also raises some concerns, at least from a cpu-hotplug
perspective.

Having been through the whole global lock_cpu_hotplug()
and the per-subsystem cpu-hotplug-lock mess, I have observed one other
thing about these locks. Not only might they result in deadlocks, but
they also make the code more difficult to maintain.

For example, a function foo() which does

for_each_online_cpu(cpu) {
        /* do something */
}

needs hotplug protection. Thus any call chain leading to foo() needs to
be wrapped with some sort of cpu-hotplug mutex. This is what made it
more difficult to maintain, since we had to audit the code every time
someone created a new call chain leading to foo(), as the sketch below
illustrates.
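
Sketching it (the lock and helper names are made up for illustration):

static DEFINE_MUTEX(cpu_hotplug_mutex);

static void do_something(int cpu) { }           /* made-up per-cpu work */

static void foo(void)
{
        int cpu;

        /* Caller must hold cpu_hotplug_mutex. */
        for_each_online_cpu(cpu)
                do_something(cpu);
}

static void audited_caller(void)
{
        mutex_lock(&cpu_hotplug_mutex);         /* audited and wrapped */
        foo();
        mutex_unlock(&cpu_hotplug_mutex);
}

static void new_caller(void)
{
        foo();  /* new call chain: compiles fine, but races with hotplug */
}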

With the "freeze-limited-kernel-threads" scheme, we would still need
to perform this audit, since we now have to freeze only those kernel
threads which *may* end up calling foo().

So at the moment, I have some doubts if this scheme would help us solve
the maintainability problem which the locks had created. It definitely
needs some more thinking.

That said, I agree that coarse-grained protection, though effective,
may not be flexible and may not yield the expected performance.
E.g., waiting for kernel threads which have nothing to do with
cpu-hotplug to freeze only means more latency for the cpu-hotplug operation.

> Still, in each case there seems to be a limited group of kernel threads
> that may want to be frozen, while the remaining kernel threads shouldn't be
> involved in any freezer-related actions at all. However, we currently require
> all kernel threads to take part in the freezing mechanism, which doesn't seem to
> be correct.
>
> It seems to me that, instead of asking *all* kernel threads to either set
> PF_NOFREEZE or call try_to_freeze() in a suitable place, we should introduce
> some mechanism allowing the kernel threads that *want* to freeze to register
> with the freezer. For example, we can use a per-task flag, say PF_FREEZABLE,
> to indicate that a given task is a kernel thread which wants (or needs) to be
> frozen (for simplicity, I'll call it a 'freezable thread' from now on).
> Alternatively, we can use a list of freezable kernel threads. Regardless of
> what actual mechanism is used, the freezer will only set TIF_FREEZE for
> freezable kernel threads and these threads will be expected to call
> try_to_freeze(). The remaining kernel threads (I'll call them 'normal') won't
> need to do anything.
>
> If this approach is used, the entire freezing-related complexity will only
> affect the freezable kernel threads. Moreover, it seems that we can precisely
> define and document the rules to be followed by the freezable kernel threads
> (for example, IMO it's reasonable to require a freezable kernel thread to
> unregister from the freezer before it stops etc.) while the normal kernel
> threads can be left in peace.
>
> It also seems that we can transition from the current approach to the one
> outlined above in quite a straightforward way. Namely, for this purpose we
> need to identify the kernel threads that should be freezable, make them use the
> new interface, and remove PF_NOFREEZE and try_to_freeze() from the remaining
> kernel threads. Later, if testing shows that we've overlooked some kernel
> threads which need to be made freezable, we'll be able to add the "ability to
> freeze" to these threads quite easily.
>
> Of course, that would also require us to rewrite the freezer itself quite a
> bit, but IMO it's worth doing.
>
> Thoughts?
>
> Rafael
Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-12 19:41:26

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sat, 12 May 2007, Pavel Machek wrote:
>
> If you fail to do that, we'll see freezer failure, quickly, and catch
> the simple bug.

"It's a feature to add crap to drivers and subsystems that don't care!"

That's a novel thing.

Maybe we could add other features too. Like an "IM_NOT_AN_IDIOT" flag that
every driver has to set in its "struct device", so that the device
manager knows that it wasn't written by an idiot.

> I'm afraid that if we default to "do not freeze by default", someone
> will just port zfs, will know nothing about suspend, and his
> "write management info somewhere periodically" thread will corrupt
> people's disks.

Right. That's what the IM_NOT_AN_IDIOT thing will do too - it will
guarantee that only smart people do the right thing.

Great idea. We'll get a better kernel in no time!

Linus

2007-05-12 19:58:40

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sun, 13 May 2007, Gautham R Shenoy wrote:
>
> With the "freeze-limited-kernel-threads" scheme, we would still need
> to perform this audit, since we now have to freeze only those kernel
> threads which *may* end up calling foo().

What the *heck* is wrong with just adding proper locking or other
synchronization?

Why the hell do people think that they don't need locking for things?

In other words, you're just adding crap to make up for other crap. That's
*not* how we do kernel development. I would seriously suggest that people
who like that approach ask MS for a position on _their_ kernel team.

The fact is, if you want to avoid problems, you do so by having clear and
clean approaches. THAT is how you avoid bugs. Not by adding crap.

I've several times suggested that if you actually want to stop the
machine, you can do so at the scheduler level.

So stop talking about freezing when you're talking about CPU hotplug. The
two have _nothing_ to do with each other, and if there is something this
whole discussion has absolutely convinced me about, it's that they will
never have anything to do with each other.

If you worry about people doing

for_each_online_cpu()

in the presence of hotplug, here are a couple of simple solutions:

- Just teach for_each_online_cpu() about locking. It's not a new concept.
We have it all over the kernel.

Maybe the macro itself can take whatever locks it needs, and maybe it's
as simple as a rule saying: "you must hold spinlock XYZ when you do
that" (see the sketch after this list). But it has _nothing_ to do with
freezing all threads.

- make the rule be that you cannot sleep in the above macro, and make the
rule be that CPU hotplug uses the same mechanisms that module unload
already does!
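
A sketch of the first option, with made-up names standing in for
"spinlock XYZ" and the per-cpu work:

static DEFINE_SPINLOCK(cpu_online_lock);

static void touch_cpu_data(int cpu) { }         /* must not sleep */

void example_walk(void)
{
        int cpu;

        spin_lock(&cpu_online_lock);    /* the rule: hold this for the walk */
        for_each_online_cpu(cpu)
                touch_cpu_data(cpu);
        spin_unlock(&cpu_online_lock);
}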

In other words, we already have *better* mechanisms for cpu hotplug than
the freezer. We have mechanisms that there is absolutely zero controversy
about.

IOW, have you actually looked at "stop_machine_run()", which people use
much more than the freezer, and which has never really caused any issues,
and didn't result in us adding *crap* to every single kernel thread?

Really. The arguments for the freezer are CRAP. Why cannot people just
admit it? We have *better* things available, that don't *need* any of this
crap. The "stop_machine()" stuff literally ends up just running a
high-priority thread on each CPU, and doesn't need any special support
anywhere.

AND IT JUST WORKS.

(Btw, I'm pretty sure it could be used for hibernation too: just stop the
IO at a higher level first, sync, then stop_machine_run() etc. But I don't
care. What I care about is that the freezer() insanity doesn't _spread_).

Linus

2007-05-12 20:12:48

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sat, 12 May 2007, Linus Torvalds wrote:
>
> - make the rule be that you cannot sleep in the above macro, and make the
> rule be that CPU hotplug uses the same mechanisms that module unload
> already does!

Side note: we obviously already do the stop_machine_run thing for CPU
down, so doing it for CPU bringup too would seem to just be a good thing.
And it means that the locking for CPU's disappearing is the same as the
locking rules for CPU's appearing: you just want to make sure to disable
preemption (which you already have to do, to make the "CPU went away" case
work out).

So how about just making sure "__cpu_up()" gets called with the same
stop_machine_run() logic? Maybe it's not *quite* a one-liner change, but
it should come fairly close...
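
Roughly like this (a sketch using the current stop_machine_run(fn, data,
cpu) interface, and glossing over the fact that parts of __cpu_up() may
need to sleep):

static int do_cpu_up(void *data)
{
        unsigned int cpu = *(unsigned int *)data;

        /* All other CPUs are held in stop_machine while this runs. */
        return __cpu_up(cpu);
}

int cpu_up_stopped(unsigned int cpu)    /* made-up wrapper name */
{
        return stop_machine_run(do_cpu_up, &cpu, NR_CPUS);
}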

Linus

2007-05-12 21:10:34

by Rafael J. Wysocki

Subject: Re: [RFD] Freezing of kernel threads

On Saturday, 12 May 2007 20:25, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Rafael J. Wysocki wrote:
> >
> > Of course, that would also require us to rewrite the freezer itself quite a
> > bit, but IMO it's worth doing.
> >
> > Thoughts?
>
> I'd much prefer it. One of the reasons I hate the freezer so much is that
> it ends up affecting things it really has no business affecting. Why
> should a random kernel thread have to have code NOT to care? So it makes
> much more sense to default to "I don't care about the freezer", and then
> people who do care can say so and add their own code.
>
> That said, I also suspect that suspend should depend less on the freezer
> in the first place, and depend more on just shutting up specific actions.

Well, I think that could be the next step.

> The freezer is kind of a blunt instrument that just stops everything,
> without actually understanding *what* it stops. As a result (since it
> really doesn't know what it's doing), it ends up having all the issues
> with "ok, I don't know what I'm doing, but that guy says I shouldn't do
> it, so I won't".
>
> Blunt instruments are often _easier_ (compare with the global kernel lock
> in SMP), but they end up being very inflexible and hard to get rid of
> later when you want to do something more intelligent.

I generally agree, but as far as the hibernation/suspend is concerned, I'm not
at the point of doing things more intelligently yet. Someone who's been
developing the kernel for the last 15 years surely can do this, but not me.

OTOH, I think I can change the freezer so that it works as I outlined in the
previous message and I'd like to be able to make some progress in a direction
that's more-or-less acceptable to everyone.

Frankly, I don't see much point in developing things that you object to very
strongly, because that ultimately would lead to a loss of time, mine as well as
yours.

Greetings,
Rafael

2007-05-12 22:04:36

by Rafael J. Wysocki

Subject: Re: [RFD] Freezing of kernel threads

Hi,

On Saturday, 12 May 2007 21:17, Pavel Machek wrote:
> Hi!
>
> > Having considered the issue of freezing (or not freezing) kernel threads for
> > quite a while, I tend to agree with Linus that we shouldn't be freezing as many
> > kernel threads as we currently freeze, but there's one thing that he doesn't
> > seem to take into account. Namely, there may be some kernel threads that
> > actually *want* (or need) to be frozen. :-)
> >
> > For suspend, these may be kernel threads that otherwise would need some
> > special synchronization with some device drivers' .suspend() and .resume()
> > callbacks, or fs-related kernel threads. For CPU hotplug, they might be
> > kernel threads that otherwise would interfere with the removal or addition of
> > CPUs. Still, in each case there seems to be a limited group of kernel threads
> > that may want to be frozen, while the remaining kernel threads shouldn't be
> > involved in any freezer-related actions at all. However, we currently require
> > all kernel threads to take part in the freezing mechanism, which doesn't seem to
> > be correct.
>
> Well.. It is actually a feature.
>
> These days, you have to either add PF_NOFREEZE explicitly (which
> hopefully means you know what you are doing) or add try_to_freeze() at
> a specific place.
>
> If you fail to do that, we'll see freezer failure, quickly, and catch
> the simple bug.

Well, not if PF_NOFREEZE is used.

> I'm afraid that if we default to "do not freeze by default", someone
> will just port zfs, will know nothing about suspend, and his
> "write management info somewhere periodically" thread will corrupt
> people's disks.

We'll have exactly the same result right now if said developer uses
PF_NOFREEZE.

> OTOH... perhaps suspend/resume is a common enough operation that people
> _know_ they need to stop their writing? ...no, I do not think we are
> there yet.

We never will be there if we don't change the default.

Besides, if a kernel thread calls I_want_to_be_frozen_during_suspend() and
then try_to_freeze(), this seems to be a bit more meaningful than just calling
try_to_freeze() which (almost) everybody does.

Greetings,
Rafael

2007-05-12 22:14:55

by Pavel Machek

Subject: Re: [RFD] Freezing of kernel threads

Hi!

> > If you fail to do that, we'll see freezer failure, quickly, and catch
> > the simple bug.
>
> "It's a feature to add crap to drivers and subsystems that don't care!"
>
> That's a novel thing.
>
> Maybe we could add other features too. Like an "IM_NOT_AN_IDIOT" flag that
> every driver has to set in its "struct device", so that the device
> manager knows that it wasn't written by an idiot.

Well, it was initially a SOMEONE_REVIEWED_THIS flag (and thus pretty
necessary for the initial merge). Now it still serves as a
SOMEONE_REVIEWED_THIS flag, but I guess you could call it an
IM_NOT_AN_IDIOT flag, too.

Well, we can go the other way around, but that means more careful
reviewing of incoming code. Yes, we can do it.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Subject: Re: [RFD] Freezing of kernel threads

Hi Linus,

I apologise for citing 'for_each_online_cpu()' as the reason, when
I actually was thinking about 'online_cpu_map'. That only proves
that I should not reply to stuff when I am sleepy!

Anyway, coming back to the freezer cpu-hotplug part: cpu-hotplug locking
has been broken for almost a year now, primarily because of its global
locking scheme, which is fairly easy to break.

The freezer was considered as a solution only recently.
It was the last resort, not the first one.

I have already posted three RFCs to fix the cpu-hotplug locking
problem earlier, and none of them used the freezer :-) And I remember
cc'ing you on all these mails.

So, to answer your question of why the synchronization
primitives don't work, here's a gist of what we tried before
resorting to the freezer, just in case you missed it. I am sorry for
writing such a long e-mail. But if the problem were so simple,
it would (or should) have been solved by now.

RFC #1: Use get_hot_cpus()/put_hot_cpus(), which follow the
well-known refcounting model. I believe this is the most
natural solution to this problem, since we do not want to
lock cpus. We want to keep a reference to them so that they don't
come or go away while we are in some per-cpu critical section.

So a cpu-hotplug-aware function can use it as follows:

foo()
{
        get_hot_cpus();
        some_data = function_of(cpu_online_map);

        /* We will use some_data here. So we prefer that
         * cpus don't come up and go down while we are
         * operating in this section. And yeah, we might
         * also sleep!
         */

        put_hot_cpus();
}

There are only 2 simple rules:

* In get_hot_cpus(), we check if a cpu-hotplug operation is
underway, in which case we sleep. Otherwise, we bump up the
refcount and return.

* A cpu-hotplug operation cannot proceed until the refcount
is/goes to zero.
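
A sketch of those two rules (names and details are illustrative, not the
actual RFC code; the writer side would set hotplug_in_progress under the
same mutex and wait for the refcount to drain):

static DEFINE_MUTEX(hotplug_state_mutex);
static int hotplug_in_progress;
static int hotcpu_refcount;
static DECLARE_WAIT_QUEUE_HEAD(hotcpu_wq);

void get_hot_cpus(void)
{
        mutex_lock(&hotplug_state_mutex);
        /* Rule 1: sleep while a hotplug operation is under way. */
        while (hotplug_in_progress) {
                mutex_unlock(&hotplug_state_mutex);
                wait_event(hotcpu_wq, !hotplug_in_progress);
                mutex_lock(&hotplug_state_mutex);
        }
        hotcpu_refcount++;
        mutex_unlock(&hotplug_state_mutex);
}

void put_hot_cpus(void)
{
        mutex_lock(&hotplug_state_mutex);
        /* Rule 2: the hotplug writer may proceed once this hits zero. */
        if (!--hotcpu_refcount)
                wake_up(&hotcpu_wq);
        mutex_unlock(&hotplug_state_mutex);
}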

RFC #2: A scalable version of the above which eliminated any barriers and
atomic instructions on the get_hot_cpus() fastpath.
Too bad, this one did not get any review at all. Bad timing? Maybe.

RFC #3: Per-subsystem mutexes. Let each cpu-hotplug-aware subsystem
maintain its own hot_cpu_mutex to protect its cpu-hotplug-critical
data. Something like

foo()
{
        mutex_lock(&my_hot_cpu_mutex);
        some_data = function_of(cpu_online_map);

        /* We use some_data here. So we prefer that
         * cpus don't come up and go down while we are
         * operating in this section. And yeah, we might
         * also sleep!
         */

        mutex_unlock(&my_hot_cpu_mutex);
}

my_hotplug_cpu_callback(int cpu, int event)
{
        switch (event) {
        case CPU_LOCK_ACQUIRE:
                /*
                 * We are about to begin a cpu-hotplug operation,
                 * so make sure that code like foo() doesn't run
                 * while we do this.
                 */
                mutex_lock(&my_hot_cpu_mutex);
                break;

        case CPU_DOWN_PREPARE:
        case CPU_DOWN_FAILED:
        case CPU_DEAD:
                /* Subsystem-specific handling goes here. */
                break;

        case CPU_LOCK_RELEASE:
                /*
                 * Cpu hotplug is done. Release the
                 * mutex so that foo() can run.
                 */
                mutex_unlock(&my_hot_cpu_mutex);
                break;
        }
}

and

cpu_down(int cpu)
{
        /* Send notifications for CPU_LOCK_ACQUIRE. */
        notifier_blocking_chain(cpu, CPU_LOCK_ACQUIRE);

        /*
         * None of the cpu-hotplug-aware subsystems are running in a
         * critical section, so we are safe to perform a cpu-hotplug
         * operation. So let's do it here.
         */

        /* Send notifications for CPU_LOCK_RELEASE. */
        notifier_blocking_chain(cpu, CPU_LOCK_RELEASE);
}

This is the commit 6f7cc11aa6c7d5002e16096c7590944daece70ed which you
were referring to in one of your previous mails.

So why won't any of these work???

RFC #3, though nice and transactional, doesn't work because foo()
can be called from the cpu-hotplug callback path
(i.e., from a subsystem's CPU_DOWN_PREPARE / CPU_DEAD / CPU_UP_PREPARE /
CPU_ONLINE call path).

This will definitely result in a deadlock, since we would have
already acquired my_hot_cpu_mutex in CPU_LOCK_ACQUIRE.

The case in point being the preempt rcu implementation of
synchronize_sched(), which has to call sched_setaffinity(), which
acquires sched_hotcpu_mutex.

Now consider update_sched_domains(), which is a cpu-hotplug callback function:

CPU_LOCK_ACQUIRE:
  |-> mutex_lock(&sched_hotcpu_mutex);

CPU_DOWN_PREPARE:
  |-> update_sched_domains()
        |-> detach_destroy_domains()
              |-> synchronize_sched()
                    |-> sched_setaffinity()
                          |-> mutex_lock(&sched_hotcpu_mutex);

A deadlock!

Of course, we can do

__sched_setaffinity()
{
        /* Caller must acquire sched_hotcpu_mutex before calling this. */
}

sched_setaffinity()
{
        mutex_lock(&sched_hotcpu_mutex);
        __sched_setaffinity();
        mutex_unlock(&sched_hotcpu_mutex);
}

However, now the caller (say update_sched_domains()) has to wonder
whether to call the locked function or the raw one.

I am pretty confident that, as the code evolves, we will encounter more such
dependencies leading to deadlocks. So should we go for two variants of
the same function, one locked and one lockless? I personally feel that
this approach will only uglify the code.
Probably a matter of taste, something which I guess I am yet to
acquire. So I won't argue more.

RFC #1 and #2 DO work. But the discussions in the thread
http://lkml.org/lkml/2007/1/26/282 gave me the impression
that we would be better off without any code audits to
make the code paths cpu-hotplug safe. I would leave it for others
to shed more light here.

So it is not that we cannot do cpu-hotplug without the freezer.
We have done it in the past and we definitely can do it in
the future as well.
But many believe that, by using the freezer for cpu hotplug,
a) stupid people will have to try really hard to break cpu-hotplug.
b) We can avoid a lot of ugly and redundant checks in the code
paths to ensure that a cpu is still there, or that a new cpu has
not come up.
c) We can avoid locks to protect data in frequently accessed codepaths
from cpu hotplug, an operation which we do like once a year or so.

All this is, of course, with the assumption that the freezer is bug-free,
which at the moment it is not. That's why we are working on it.

And regarding stop_machine_run(), the freezer is not a replacement for
it. No, the freezer is just a replacement for lock_cpu_hotplug() or
whatever locking mechanism we are currently using to ensure
that the whole code path from CPU_DOWN_PREPARE to
CPU_DEAD / CPU_DOWN_FAILED (and the corresponding CPU_* notifiers
for cpu up) is atomic.

In _cpu_down(), we use stop_machine_run() to
-> disable the interrupts on the cpu to be removed,
-> clear the cpu's bit from cpu_online_map, and
-> schedule a high-priority SCHED_FIFO idle thread on the cpu to be
offlined so that we can fake the cpu's death.

It clearly doesn't provide us any protection in sections like
CPU_DOWN_PREPARE/CPU_DEAD.
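
In outline (a simplified sketch; the notify_*() wrappers for the notifier
calls are made up and error handling is trimmed):

static int _cpu_down_outline(unsigned int cpu)
{
        int err;

        notify_down_prepare(cpu);       /* CPU_DOWN_PREPARE: NOT atomic */

        /* Only this step runs with the whole machine stopped. */
        err = stop_machine_run(take_cpu_down, NULL, cpu);
        if (err) {
                notify_down_failed(cpu);        /* CPU_DOWN_FAILED: NOT atomic */
                return err;
        }

        notify_dead(cpu);               /* CPU_DEAD: NOT atomic either */
        return 0;
}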

And yeah, we can make for_each_online_cpu() take some mutex/lock
which will *hold off* the hotplug operation. But for_each_online_cpu()
is not the only place where we reference cpu_online_map.
There are a zillion other places. We need a better scheme to address
all of them.

If you have come down to this point, I hope you have understood why
we went the freezer way. If you still feel that we can solve it using
a simpler, cleaner, better method, I am all for it.

Why, even I would like to see this problem fixed, as much as you do,
if not more :-)

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-13 16:35:22

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sun, 13 May 2007, Gautham R Shenoy wrote:
>
> RFC #1: Use get_hot_cpus()/put_hot_cpus(), which follow the
> well-known refcounting model.

Yes. And using the "preempt count" as a refcount is fairly natural, no?
We do already depend on that in many code-paths.

It also has the advantage that, since it's not a *blocking* lock, it's fairly
easy to code around (ie, since it nests, it avoids the kinds of nasty
deadlocks we had with cpufreq, which had totally insane calling semantics
where different people all wanted the lock).

Of course, a real nesting lock could be used to the same effect.

> RFC #1 and #2 DO work. But the discussions in the thread
> http://lkml.org/lkml/2007/1/26/282 gave me the impression
> that we would be better off without any code audits to
> make the code paths cpu-hotplug safe. I would leave it for others
> to shed more light here.

Well, I hope that _this_ discussion about the freezer has convinced you
that there are no more fundamental problems with #1/#2 than with using the
freezer.

The freezer really needs even *more* code auditing, since it's almost
impossible to see which thread depends on some other thread. There's a
real reason why most kernel threads disable freezing.

At least with locking, the code paths you need to audit are all
actually relevant to cpu hotplug, and you can easily add dynamic
_automated_ tests like "if you use online_cpu_map, you'd better hold the
preempt lock".

For example, since all users of cpu_online_map should be pure *readers*
(apart from a couple of cases that actually bring up a CPU), you can do
things like

#define cpu_online_map check_cpu_online_map()

static inline cpumask_t check_cpu_online_map(void)
{
        WARN_ON(!preempt_safe());       /* or whatever lock we decide on */
        return __real_cpu_online_map;
}

and it will nicely catch things like that.

Linus

2007-05-13 20:03:38

by Rafael J. Wysocki

Subject: Re: [RFD] Freezing of kernel threads

On Sunday, 13 May 2007 18:33, Linus Torvalds wrote:
>
> On Sun, 13 May 2007, Gautham R Shenoy wrote:
> >
> > RFC #1: Use get_hot_cpus()/put_hot_cpus(), which follow the
> > well-known refcounting model.
>
> Yes. And using the "preempt count" as a refcount is fairly natural, no?
> We do already depend on that in many code-paths.
>
> It also has the advantage that, since it's not a *blocking* lock, it's fairly
> easy to code around (ie, since it nests, it avoids the kinds of nasty
> deadlocks we had with cpufreq, which had totally insane calling semantics
> where different people all wanted the lock).
>
> Of course, a real nesting lock could be used to the same effect.
>
> > RFC #1 and #2 DO work. But the discussions in the thread
> > http://lkml.org/lkml/2007/1/26/282 gave me the impression
> > that we would be better off without any code audits to
> > make the code paths cpu-hotplug safe. I would leave it for others
> > to shed more light here.
>
> Well, I hope that _this_ discussion about the freezer has convinced you
> that there are no more fundamental problems with #1/#2 than with using the
> freezer.
>
> The freezer really needs even *more* code auditing, since it's almost
> impossible to see which thread depends on some other thread. There's a
> real reason why most kernel threads disable freezing.

Well, for the current -git we have:

rafael@albercik:~/src/linux-2.6> grep -r -I -l try_to_freeze * \
| grep -v signal.c | grep -v freezer.h | grep -v process.c | wc
45 45 1186

Most of these are calls from kernel threads.

At the same time we have:

rafael@albercik:~/src/linux-2.6> grep -r -I -l PF_NOFREEZE * \
| grep -v sched.h | grep -v process.c | grep -v freezer.h | wc
23 23 559

I wouldn't call that a majority. Moreover:

rafael@albercik:~/src/linux-2.6> grep -r -I -l PF_NOFREEZE drivers/* | wc
9 9 238
rafael@albercik:~/src/linux-2.6> grep -r -I -l try_to_freeze drivers/* | wc
27 27 790

That, BTW, is why I was (and I still am) afraid to stop freezing kernel threads
just like that.

Of course it's possible to look at these 45 files and see if the kernel threads
in there really need to be freezable and I'm going to do this, but this is a
different thing.

Besides, the problems with interdependencies that we've had recently are
related specifically to the CPU hotplug. To be precise, they are related to the
CPU hotplug notifiers that try to stop kernel threads which may be frozen.
The other interdependencies don't lead to freezer-related problems.

Greetings,
Rafael

2007-05-13 20:50:29

by Linus Torvalds

Subject: Re: [RFD] Freezing of kernel threads



On Sun, 13 May 2007, Rafael J. Wysocki wrote:
>
> Besides, the problems with interdependencies that we've had recently are
> related specifically to the CPU hotplug. To be precise, they are related to the
> CPU hotplug notifiers that try to stop kernel threads which may be frozen.
> The other interdependencies don't lead to freezer-related problems.

Sure they do. We've always had interdependencies and freezer-related
problems. That's why PF_NOFREEZE exists in the first place.

Claiming this is somehow CPU hotplug-related is disingenuous, since the
freezer hasn't even been used for that historically, yet clearly we *do*
have the NOFREEZE stuff.

Those kernel threads tend to exist for a reason.

Linus

2007-05-13 21:09:50

by Rafael J. Wysocki

Subject: Re: [RFD] Freezing of kernel threads

On Sunday, 13 May 2007 22:49, Linus Torvalds wrote:
>
> On Sun, 13 May 2007, Rafael J. Wysocki wrote:
> >
> > Besides, the problems with interdependencies that we've had recently are
> > related specifically to the CPU hotplug. To be precise, they are related to the
> > CPU hotplug notifiers that try to stop kernel threads which may be frozen.
> > The other interdependencies don't lead to freezer-related problems.
>
> Sure they do. We've always had interdependencies and freezer-related
> problems. That's why PF_NOFREEZE exists in the first place.

Okay, we've already discussed that. I was referring to the recent problems
and I should have added "at present" to the last sentence. :-)

> Claiming this is somehow CPU hotplug-related is disingenuous, since the
> freezer hasn't even been used for that historically, yet clearly we *do*
> have the NOFREEZE stuff.
>
> Those kernel threads tend to exist for a reason.

Agreed.

Greetings,
Rafael

2007-05-14 06:11:10

by Srivatsa Vaddagiri

Subject: Re: [RFD] Freezing of kernel threads

On Sun, May 13, 2007 at 09:33:41AM -0700, Linus Torvalds wrote:
> On Sun, 13 May 2007, Gautham R Shenoy wrote:
> > RFC #1: Use get_hot_cpus()/put_hot_cpus(), which follow the
> > well-known refcounting model.
>
> Yes. And using the "preempt count" as a refcount is fairly natural, no?
> We do already depend on that in many code-paths.
>
> It also has the advantage that, since it's not a *blocking* lock, [...]

get/put_hot_cpus() was intended to be similar to, but not the same as,
get/put_cpu().

One difference is that get_hot_cpus() has to be a blocking lock. It has to
block when there is a cpu_down/up operation already in progress; otherwise it
isn't of much help in serializing readers and writers. Note that a cpu_down/up
is marked in progress *before* the first notifier (CPU_DOWN/UP_PREPARE) is
sent, not just when the cpu_online_map bitmap is changed.

Because it can be a blocking call, there needs to be associated
machinery to wake up sleeping readers/writers.

The other complication get/put_hotcpu() had was dealing with
write-followed-by-read lock attempt by the *same* thread (whilst doing
cpu_down/up). IIRC this was triggered by some callback processing in CPU_DEAD
or CPU_DOWN_PREPARE.


cpu_down()
|- take write lock
|- CPU_DOWN_PREPARE
| |- foo() wants a read_lock

Stupid as it sounds, it was really found to be happening! Gautham, do you
recall who that foo() was? Somebody in cpufreq I guess ..

Tackling that requires some state bit in task_struct to educate
read_lock to be a no-op if the write lock is already held by the same thread.

In summary, get/put_hot_cpu() will need to be (slightly) more complex than
something like get/put_cpu(). Perhaps this complexity was what put Andrew off
when he suggested the use of the freezer (http://lkml.org/lkml/2006/11/1/400).

> For example, since all users of cpu_online_map should be pure *readers*
> (apart from a couple of cases that actually bring up a CPU), you can do
> things like
>
> #define cpu_online_map check_cpu_online_map()
>
> static inline cpumask_t check_cpu_online_map(void)
> {
>         WARN_ON(!preempt_safe());       /* or whatever lock we decide on */
>         return __real_cpu_online_map;
> }

I remember Rusty had a similar function to check for unsafe references
to cpu_online_map way back when cpu hotplug was being developed. It would
be a good idea to reintroduce it.

> and it will nicely catch things like that.

--
Regards,
vatsa

Subject: Re: [RFD] Freezing of kernel threads

On Mon, May 14, 2007 at 11:48:46AM +0530, Srivatsa Vaddagiri wrote:
>
> The other complication get/put_hotcpu() had was dealing with
> write-followed-by-read lock attempt by the *same* thread (whilst doing
> cpu_down/up). IIRC this was triggered by some callback processing in CPU_DEAD
> or CPU_DOWN_PREPARE.
>
>
> cpu_down()
> |- take write lock
> |- CPU_DOWN_PREPARE
> | |- foo() wants a read_lock
>
> Stupid as it sounds, it was really found to be happening! Gautham, do you
> recall who that foo() was? Somebody in cpufreq I guess ..

IIRC, it was a problem with ondemand. While handling CPU_DEAD, the ondemand
code would call destroy_workqueue(), which tried flushing the workqueue, which
once upon a time took lock_cpu_hotplug(), before Oleg and Andrew cleaned
that up.

Of course, cpufreq works fine now after Venki's patches, which
just nullify the reference to the policy structure of the cpu being
removed during CPU_DOWN_PREPARE by calling __cpufreq_remove_dev()
instead of handling it in CPU_DEAD.

However, as we have discovered, without freezing all the threads, it
is inadvisable to call flush_workqueue from a cpu-hotplug callback
path.

>
> Tackling that requires some state bit in task_struct to educate
> read_lock to be a no-op if the write lock is already held by the same thread.
>

That should not be difficult, right?
Since we have only one writer at a time, we can store the writer's
task_struct in, say, active_writer, and in the reader slowpath allow
the read if current == active_writer.
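
A sketch of that check (collapsing the refcounting to a single mutex for
brevity; the names are illustrative):

static struct task_struct *active_writer;
static DEFINE_MUTEX(hotcpu_mutex);

void get_hot_cpus(void)
{
        /* The writer re-entering a read-side path is allowed through. */
        if (current == active_writer)
                return;
        mutex_lock(&hotcpu_mutex);
}

void put_hot_cpus(void)
{
        if (current == active_writer)
                return;
        mutex_unlock(&hotcpu_mutex);
}

void cpu_hotplug_begin(void)    /* CPU_LOCK_ACQUIRE time */
{
        mutex_lock(&hotcpu_mutex);
        active_writer = current;
}

void cpu_hotplug_done(void)     /* CPU_LOCK_RELEASE time */
{
        active_writer = NULL;
        mutex_unlock(&hotcpu_mutex);
}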

> In summary, get/put_hot_cpu() will need to be (slightly) more complex than
> something like get/put_cpu(). Perhaps this complexity was what put Andrew off
> when he suggested the use of the freezer (http://lkml.org/lkml/2006/11/1/400).
>
> > For example, since all users of cpu_online_map should be pure *readers*
> > (apart from a couple of cases that actually bring up a CPU), you can do
> > things like
> >
> > #define cpu_online_map check_cpu_online_map()
> >
> > static inline cpumask_t check_cpu_online_map(void)
> > {
> >         WARN_ON(!preempt_safe());       /* or whatever lock we decide on */
> >         return __real_cpu_online_map;
> > }
>
> I remember Rusty had a similar function to check for unsafe references
> to cpu_online_map way back when cpu hotplug was being developed. It would
> be a good idea to reintroduce it.
>

Yes. However, there are places where people keep a local copy of
cpu_online_map, so any access to this local copy is also not
cpu-hotplug safe, no?

> > and it will nicely catch things like that.
>
> --
> Regards,
> vatsa

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-14 07:43:09

by Dipankar Sarma

Subject: Re: [RFD] Freezing of kernel threads

On Mon, May 14, 2007 at 11:48:46AM +0530, Srivatsa Vaddagiri wrote:
> On Sun, May 13, 2007 at 09:33:41AM -0700, Linus Torvalds wrote:
> > On Sun, 13 May 2007, Gautham R Shenoy wrote:
> > > RFC #1: Use get_hot_cpus()/put_hot_cpus(), which follow the
> > > well-known refcounting model.
> >
> > Yes. And using the "preempt count" as a refcount is fairly natural, no?
> > We do already depend on that in many code-paths.
>
>
> Tackling that requires some state bit in task_struct to educate
> read_lock to be a no-op if the write lock is already held by the same thread.
>
> In summary, get/put_hot_cpu() will need to be (slightly) more complex than
> something like get/put_cpu(). Perhaps this complexity was what put Andrew off
> when he suggested the use of the freezer (http://lkml.org/lkml/2006/11/1/400).

At least with get_hot_cpu()/put_hot_cpu(), the complexity is limited
to the implementation of those interfaces and not spread all over the
kernel. Having worked on CPU hotplug in the past, I have no doubt
that this is the simplest *locking model* for cpu
hotplug events.


> > For example, since all users of cpu_online_map should be pure *readers*
> > (apart from a couple of cases that actually bring up a CPU), you can do
> > things like
> >
> > #define cpu_online_map check_cpu_online_map()
> >
> > static inline cpumask_t check_cpu_online_map(void)
> > {
> >         WARN_ON(!preempt_safe());       /* or whatever lock we decide on */
> >         return __real_cpu_online_map;
> > }
>
> I remember Rusty had a similar function to check for unsafe references
> to cpu_online_map way back when cpu hotplug was being developed. It would
> be a good idea to reintroduce it.

One possibility is to have a generation counter for all cpu events
and save it in cpumask_t (or some debug version of it) every
time we snapshot the online cpu mask. We can then
put checks in the places where we access these snapshots to see
whether the saved generation counter matches the current generation.
If it doesn't, that would indicate the saved cpumask was not "updated"
during cpu-hotplug events. In the cpu-hotplug event handlers, we can
force all saved online cpumasks to be "updated". Such debug
code may help detect the locking violators.
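
A sketch of such a debug scheme (all names made up):

typedef struct {
        cpumask_t mask;
        unsigned long generation;       /* hotplug generation at snapshot */
} debug_cpumask_t;

static unsigned long cpu_hotplug_generation;    /* bumped on every cpu event */

static inline void debug_cpus_copy(debug_cpumask_t *snap)
{
        snap->mask = cpu_online_map;
        snap->generation = cpu_hotplug_generation;
}

static inline cpumask_t debug_cpus_read(debug_cpumask_t *snap)
{
        /* A stale snapshot means a locking violator: warn about it. */
        WARN_ON(snap->generation != cpu_hotplug_generation);
        return snap->mask;
}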

Thanks
Dipankar

2007-05-14 10:02:36

by Rafael J. Wysocki

Subject: Re: [RFD] Freezing of kernel threads

On Monday, 14 May 2007 09:26, Gautham R Shenoy wrote:
> On Mon, May 14, 2007 at 11:48:46AM +0530, Srivatsa Vaddagiri wrote:
> >
> > The other complication get/put_hotcpu() had was dealing with
> > write-followed-by-read lock attempt by the *same* thread (whilst doing
> > cpu_down/up). IIRC this was triggered by some callback processing in CPU_DEAD
> > or CPU_DOWN_PREPARE.
> >
> >
> > cpu_down()
> > |- take write lock
> > |- CPU_DOWN_PREPARE
> > | |- foo() wants a read_lock
> >
> > Stupid as it sounds, it was really found to be happening! Gautham, do you
> > recall who that foo() was? Somebody in cpufreq I guess ..
>
> IIRC, it was a problem with ondemand. While handling CPU_DEAD, the ondemand
> code would call destroy_workqueue(), which tried flushing the workqueue, which
> once upon a time took lock_cpu_hotplug(), before Oleg and Andrew cleaned
> that up.
>
> Of course, cpufreq works fine now after Venki's patches, which
> just nullify the reference to the policy structure of the cpu being
> removed during CPU_DOWN_PREPARE by calling __cpufreq_remove_dev()
> instead of handling it in CPU_DEAD.
>
> However, as we have discovered, without freezing all the threads, it
> is inadvisable to call flush_workqueue from a cpu-hotplug callback
> path.

Please see my recent patch at http://lkml.org/lkml/2007/5/14/7 .
It's not exactly the same thing, but I think the trick in there might be
useful.

Greetings,
Rafael