Even after applying some fixes posted by Chirag and Peter Z, I'm still
getting some messages in my log like this
BUG: sleeping function called from invalid context swapper(0) at
kernel/rtmutex.c:743
in_atomic():1 [00000001], irqs_disabled():1
Pid: 0, comm: swapper Tainted: G W 2.6.26.1-rt1.jk #2
Call Trace:
[<ffffffff802305d3>] __might_sleep+0x12d/0x132
[<ffffffff8046cdbe>] __rt_spin_lock+0x34/0x7d
[<ffffffff8046ce15>] rt_spin_lock+0xe/0x10
[<ffffffff802532e5>] pm_qos_requirement+0x1f/0x3c
[<ffffffff803e1b7f>] menu_select+0x7b/0x9c
[<ffffffff8020b1be>] ? default_idle+0x0/0x5a
[<ffffffff8020b1be>] ? default_idle+0x0/0x5a
[<ffffffff803e0b4b>] cpuidle_idle_call+0x68/0xd8
[<ffffffff803e0ae3>] ? cpuidle_idle_call+0x0/0xd8
[<ffffffff8020b1be>] ? default_idle+0x0/0x5a
[<ffffffff8020b333>] cpu_idle+0xb2/0x12d
[<ffffffff80466af0>] start_secondary+0x186/0x18b
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff8020b39c>] .... cpu_idle+0x11b/0x12d
.....[<ffffffff80466af0>] .. ( <= start_secondary+0x186/0x18b)
The following simple patch makes the messages disappear. There may be a
better, more fine-grained solution, but the problem is also that all the
functions are designed to use the same lock.
On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
> Even after applying some fixes posted by Chirag and Peter Z, I'm still
> getting some messages in my log like this
snip
>
> The following simple patch makes the messages disappear. There may be a
> better, more fine-grained solution, but the problem is also that all the
> functions are designed to use the same lock.
Hmm, I think you're right - it's called from the idle routine, so we can't
go about sleeping there.
The only trouble I have is with kernel/pm_qos_params.c:update_target()'s
use of this lock - that is decidedly not O(1).
Mark, would it be possible to split that lock in two, one lock
protecting pm_qos_array[], and one lock protecting the
requirements.list ?
[ NOTE: this is the -rt kernel we're talking about ]
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Index: linux-2.6.26.1-jk-rt1/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.26.1-jk-rt1.orig/kernel/pm_qos_params.c
> +++ linux-2.6.26.1-jk-rt1/kernel/pm_qos_params.c
> @@ -110,7 +110,7 @@ static struct pm_qos_object *pm_qos_arra
> &network_throughput_pm_qos
> };
>
> -static DEFINE_SPINLOCK(pm_qos_lock);
> +static DEFINE_RAW_SPINLOCK(pm_qos_lock);
>
> static ssize_t pm_qos_power_write(struct file *filp, const char
> __user *buf,
> size_t count, loff_t *f_pos);
>
On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
> > Even after applying some fixes posted by Chirag and Peter Z, I'm still
> > getting some messages in my log like this
>
snip
> >
> > The following simple patch makes the messages disappear. There may be a
> > better, more fine-grained solution, but the problem is also that all the
> > functions are designed to use the same lock.
>
> Hmm, I think you're right - it's called from the idle routine, so we can't
> go about sleeping there.
>
> The only trouble I have is with kernel/pm_qos_params.c:update_target()'s
> use of this lock - that is decidedly not O(1).
>
> Mark, would it be possible to split that lock in two, one lock
> protecting pm_qos_array[], and one lock protecting the
> requirements.list ?
very likely, but I'm not sure how it will help.
the fine-grained locking I had initially worked out on pm_qos was to have
a lock per pm_qos_object, that would be used for accessing the
requirement_list and the target_value. But that isn't what you are
asking about, is it?
Is what you want a pm_qos_requirements_list_lock and a
pm_qos_target_value_lock for each pm_qos_object instance?
I guess it would work, but besides giving the code spinlock diarrhea,
would it really help solve the issue you are seeing?
--mgross
snip
On Tue, 2008-08-05 at 13:49 -0700, mark gross wrote:
> On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
> > On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
snip
> >
> > Hmm, I think you're right - it's called from the idle routine, so we can't
> > go about sleeping there.
> >
> > The only trouble I have is with kernel/pm_qos_params.c:update_target()'s
> > use of this lock - that is decidedly not O(1).
> >
> > Mark, would it be possible to split that lock in two, one lock
> > protecting pm_qos_array[], and one lock protecting the
> > requirements.list ?
>
> very likely, but I'm not sure how it will help.
>
> the fine-grained locking I had initially worked out on pm_qos was to have
> a lock per pm_qos_object, that would be used for accessing the
> requirement_list and the target_value. But that isn't what you are
> asking about, is it?
>
> Is what you want a pm_qos_requirements_list_lock and a
> pm_qos_target_value_lock for each pm_qos_object instance?
>
> I guess it would work, but besides giving the code spinlock diarrhea,
> would it really help solve the issue you are seeing?
The problem is that on -rt, spinlocks turn into mutexes, and the above
BUG tells us that the idle loop might end up scheduling due to trying to
take this lock.
Now, the way I read the code, pm_qos_lock protects multiple things:
- pm_qos_array[target]->target_value
- &pm_qos_array[pm_qos_class]->requirements.list
Now, the thing is, we could turn the lock back into a real spinlock
(raw_spinlock_t), but the loops in e.g. update_target() are not O(1) and
could thus cause serious preempt-off latencies.
My question was - and having had a second look at the code I think the
answer is yes - would it be possible to guard the list using a sleeping
lock and protect the target_value using a (raw) spinlock?
OTOH, just reading a (word-aligned, word-sized) value doesn't normally
require serialization, especially if the update site is already
serialized by other means.
So could we perhaps remove the lock usage from pm_qos_requirement()?
That too would solve the issue.
- Peter
On Tue, Aug 5, 2008 at 11:09 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2008-08-05 at 13:49 -0700, mark gross wrote:
>> On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
>> > On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
snip
How about this patch? Like Peter suggests, it adds a raw spinlock only
for the target value. I'm currently running with it, but still
testing; comments are appreciated.
Thanks
On Wed, Aug 6, 2008 at 12:18 AM, John Kacur <[email protected]> wrote:
> On Tue, Aug 5, 2008 at 11:09 PM, Peter Zijlstra <[email protected]> wrote:
snip
>
> How about this patch? Like Peter suggests, it adds a raw spinlock only
> for the target value. I'm currently running with it, but still
> testing; comments are appreciated.
>
> Thanks
>
I have been running with this patch for quite a while now without any
problems, so please apply. If Mark is able to remove the lock
altogether at some point in the future then we can remove this patch.
On Wed, Aug 06, 2008 at 12:18:08AM +0200, John Kacur wrote:
> On Tue, Aug 5, 2008 at 11:09 PM, Peter Zijlstra <[email protected]> wrote:
> > On Tue, 2008-08-05 at 13:49 -0700, mark gross wrote:
> >> On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
> >> > On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
snip
>
> How about this patch? Like Peter suggests, it adds a raw spinlock only
> for the target value. I'm currently running with it, but still
> testing; comments are appreciated.
>
> Thanks
> pm_qos_requirement-fix
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Add a raw spinlock for the target value.
>
>
> Index: linux-2.6.26.1-rt1.jk/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.26.1-rt1.jk.orig/kernel/pm_qos_params.c
> +++ linux-2.6.26.1-rt1.jk/kernel/pm_qos_params.c
> @@ -111,6 +111,7 @@ static struct pm_qos_object *pm_qos_arra
> };
>
> static DEFINE_SPINLOCK(pm_qos_lock);
> +static DEFINE_RAW_SPINLOCK(pm_qos_rawlock);
>
> static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> size_t count, loff_t *f_pos);
> @@ -149,13 +150,15 @@ static void update_target(int target)
> extreme_value = pm_qos_array[target]->comparitor(
> extreme_value, node->value);
> }
> + spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_lock_irqsave(&pm_qos_rawlock, flags);
> if (pm_qos_array[target]->target_value != extreme_value) {
> call_notifier = 1;
> pm_qos_array[target]->target_value = extreme_value;
> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
> pm_qos_array[target]->target_value);
> }
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_unlock_irqrestore(&pm_qos_rawlock, flags);
>
> if (call_notifier)
> blocking_notifier_call_chain(pm_qos_array[target]->notifiers,
> @@ -195,9 +198,9 @@ int pm_qos_requirement(int pm_qos_class)
> int ret_val;
> unsigned long flags;
>
> - spin_lock_irqsave(&pm_qos_lock, flags);
> + spin_lock_irqsave(&pm_qos_rawlock, flags);
> ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_unlock_irqrestore(&pm_qos_rawlock, flags);
>
> return ret_val;
> }
As long as RAW_SPINLOCK compiles to a normal spinlock for non-RT-preempt
kernels I don't see a problem, as the change is almost a no-op for
non-RT kernels.
Signed-off-by: mark gross <[email protected]>
Should I send an updated patch that includes a change to the comment
block regarding the locking design after this patch or instead of it?
--mgross
On Wed, Aug 13, 2008 at 12:49 AM, mark gross <[email protected]> wrote:
> On Wed, Aug 06, 2008 at 12:18:08AM +0200, John Kacur wrote:
>> On Tue, Aug 5, 2008 at 11:09 PM, Peter Zijlstra <[email protected]> wrote:
>> > On Tue, 2008-08-05 at 13:49 -0700, mark gross wrote:
>> >> On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
>> >> > On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
snip
>>
>> How about this patch? Like Peter suggests, it adds a raw spinlock only
>> for the target value. I'm currently running with it, but still
>> testing; comments are appreciated.
>>
>> Thanks
>
snip
>
> > As long as RAW_SPINLOCK compiles to a normal spinlock for non-RT-preempt
> > kernels I don't see a problem, as the change is almost a no-op for
> > non-RT kernels.
Correct, kernels with the rt patch that are configured to be non-rt
change the raw_spinlock to a normal spinlock. This patch still applies
only to rt kernels.
>
> Signed-off-by: mark gross <[email protected]>
>
> Should I send an updated patch that includes a change to the comment
> block regarding the locking design after this patch or instead of it?
>
I've updated my patch to include changes to the comment block about
the locking design. I've also added your Signed-off-by line. (Maybe
Acked-by: would be more appropriate?)
Now that I've separated locking of the target value from the other
locks, Peter's question still remains: could the lock protecting the
target value be dropped from mainline, which would allow us to drop this
patch altogether from rt? For now the safe thing to do is to keep it
protected, but could you explain why it is needed? (It may very well
be.)
Thank you.
(updated patch attached)
On Wed, Aug 13, 2008 at 10:24:54AM +0200, John Kacur wrote:
> On Wed, Aug 13, 2008 at 12:49 AM, mark gross <[email protected]> wrote:
> > On Wed, Aug 06, 2008 at 12:18:08AM +0200, John Kacur wrote:
> >> On Tue, Aug 5, 2008 at 11:09 PM, Peter Zijlstra <[email protected]> wrote:
> >> > On Tue, 2008-08-05 at 13:49 -0700, mark gross wrote:
> >> >> On Tue, Aug 05, 2008 at 09:25:01AM +0200, Peter Zijlstra wrote:
> >> >> > On Mon, 2008-08-04 at 22:52 +0200, John Kacur wrote:
> >> >> > > Even after applying some fixes posted by Chirag and Peter Z, I'm still
> >> >> > > getting some messages in my log like this
> >> >> >
> >> >> > > BUG: sleeping function called from invalid context swapper(0) at
> >> >> > > kernel/rtmutex.c:743
> >> >> > > in_atomic():1 [00000001], irqs_disabled():1
> >> >> > > Pid: 0, comm: swapper Tainted: G W 2.6.26.1-rt1.jk #2
> >> >> > >
> >> >> > > Call Trace:
> >> >> > > [<ffffffff802305d3>] __might_sleep+0x12d/0x132
> >> >> > > [<ffffffff8046cdbe>] __rt_spin_lock+0x34/0x7d
> >> >> > > [<ffffffff8046ce15>] rt_spin_lock+0xe/0x10
> >> >> > > [<ffffffff802532e5>] pm_qos_requirement+0x1f/0x3c
> >> >> > > [<ffffffff803e1b7f>] menu_select+0x7b/0x9c
> >> >> > > [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
> >> >> > > [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
> >> >> > > [<ffffffff803e0b4b>] cpuidle_idle_call+0x68/0xd8
> >> >> > > [<ffffffff803e0ae3>] ? cpuidle_idle_call+0x0/0xd8
> >> >> > > [<ffffffff8020b1be>] ? default_idle+0x0/0x5a
> >> >> > > [<ffffffff8020b333>] cpu_idle+0xb2/0x12d
> >> >> > > [<ffffffff80466af0>] start_secondary+0x186/0x18b
> >> >> > >
> >> >> > > ---------------------------
> >> >> > > | preempt count: 00000001 ]
snip
> >> + spin_unlock_irqrestore(&pm_qos_rawlock, flags);
> >>
> >> return ret_val;
> >> }
> >
> > As long as RAW_SPINLOCK compiles to a normal spinlock for non-RT preempt
> > kernels I don't see a problem, as the change is almost a no-op for
> > non-RT kernels.
>
> Correct, kernels with the rt patch that are configured to be non-rt
> change the raw_spinlock to a normal spinlock. This patch still applies
> to rt kernels only.
I was confused about this point, as I thought I saw raw_spinlock defined
in the mainline tree.
> >
> > Signed-off-by: mark gross <[email protected]>
> >
> > Should I send an updated patch that includes a change to the comment
> > block regarding the locking design after this patch or instead of it?
> >
>
> I've updated my patch to include changes to the comment block about
> the locking design. I've also added your Signed-off-by line. (Maybe
> Acked-by: would be more appropriate?)
thanks,
Ok, see below for Acked-by: line.
>
> Now that I've separated locking of the target value from the other
> locks, Peter's question still remains. Could the lock protecting
> target be dropped from mainline which would allow us to drop this
> patch altogether from rt? For now the safe thing to do is keep it
> protected, but could you explain why it is needed? (it may very well
> be)
>
This code is doing list deletions, insertions and walks / element
updates in a multi-threaded environment where many processes on many
CPUs can be adding, removing and updating PM_QOS requests, a lot (tm).
So I think I still need the locking for the list walking and list
element updating code on face value. I'm not super good with lists;
perhaps there is a trick to protecting the lists better than the way
I'm doing it.
Keeping a lock around the different "target_value"s may not be so
important. It's just a 32-bit scalar value, and perhaps we can make it
an atomic type? That way we lose the raw_spinlock.
Acked-by: mark gross <[email protected]>
> Thank You.
> (updated patch attached)
> pm_qos_requirement-fix
> Add a raw_spinlock_t for target. target is modified in pm_qos_requirement
> called by idle so it cannot be allowed to sleep.
>
> Signed-off-by: John Kacur <jkacur at gmail dot com>
> Signed-off-by: mark gross <[email protected]>
>
> Index: linux-2.6.26.1-rt1.jk/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.26.1-rt1.jk.orig/kernel/pm_qos_params.c
> +++ linux-2.6.26.1-rt1.jk/kernel/pm_qos_params.c
> @@ -42,9 +42,10 @@
> #include <linux/uaccess.h>
>
> /*
> - * locking rule: all changes to target_value or requirements or notifiers lists
> + * locking rule: all changes to requirements or notifiers lists
> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> - * held, taken with _irqsave. One lock to rule them all
> + * held, taken with _irqsave. target is locked by pm_qos_rawlock because it
> + * is modified in pm_qos_requirement called from idle and cannot sleep.
Actually, the target is only getting read by CPUIDLE from idle. It
shouldn't get changed from the idle context.
--mgross
> */
> struct requirement_list {
> struct list_head list;
> @@ -111,6 +112,7 @@ static struct pm_qos_object *pm_qos_arra
> };
>
> static DEFINE_SPINLOCK(pm_qos_lock);
> +static DEFINE_RAW_SPINLOCK(pm_qos_rawlock);
>
> static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> size_t count, loff_t *f_pos);
> @@ -149,13 +151,15 @@ static void update_target(int target)
> extreme_value = pm_qos_array[target]->comparitor(
> extreme_value, node->value);
> }
> + spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_lock_irqsave(&pm_qos_rawlock, flags);
> if (pm_qos_array[target]->target_value != extreme_value) {
> call_notifier = 1;
> pm_qos_array[target]->target_value = extreme_value;
> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
> pm_qos_array[target]->target_value);
> }
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_unlock_irqrestore(&pm_qos_rawlock, flags);
>
> if (call_notifier)
> blocking_notifier_call_chain(pm_qos_array[target]->notifiers,
> @@ -195,9 +199,12 @@ int pm_qos_requirement(int pm_qos_class)
> int ret_val;
> unsigned long flags;
>
> - spin_lock_irqsave(&pm_qos_lock, flags);
> + /*
> + * pm_qos_requirement is called from idle, so it cannot sleep
> + */
> + spin_lock_irqsave(&pm_qos_rawlock, flags);
> ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> + spin_unlock_irqrestore(&pm_qos_rawlock, flags);
>
> return ret_val;
> }
On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> Keeping a lock around the different "target_value"s may not be so
> important. Its just a 32bit scaler value, and perhaps we can make it an
> atomic type? That way we loose the raw_spinlock.
My suggestion was to keep the locking for the write side - so as to
avoid stuff stomping on one another, but drop the read side as:
spin_lock
foo = var;
spin_unlock
return foo;
is kinda useless, it doesn't actually serialize against the usage of
foo, that is, once it gets used, var might already have acquired a new
value.
The only thing it would protect is reading var, but since that is a
machine-sized read, it's atomic anyway (assuming it's naturally aligned).
So no need for atomic_t (its read side is just a read too), just drop
the whole lock usage from pm_qos_requirement().
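Peter's point can be illustrated with a user-space sketch (names are hypothetical; a pthread mutex stands in for pm_qos_lock). The write side keeps the lock so concurrent updates don't stomp on each other, while the read side is a single aligned word load that a lock would not make any more useful.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for pm_qos_lock and target_value (hypothetical names). */
static pthread_mutex_t qos_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int target_value = 2000;

/* Write side stays locked: serializes concurrent updaters. */
void qos_set_target(int v)
{
    pthread_mutex_lock(&qos_lock);
    target_value = v;
    pthread_mutex_unlock(&qos_lock);
}

/* Read side: one aligned machine-word load.  Taking the lock here
 * would only protect the load itself -- by the time the caller uses
 * the returned snapshot, the variable may have changed anyway. */
int qos_get_target(void)
{
    return target_value;
}
```

This is a sketch of the argument, not the pm_qos code itself; the kernel version simply drops the lock/unlock pair around the read.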
On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
>
>> Keeping a lock around the different "target_value"s may not be so
>> important. Its just a 32bit scaler value, and perhaps we can make it an
>> atomic type? That way we loose the raw_spinlock.
>
> My suggestion was to keep the locking for the write side - so as to
> avoid stuff stomping on one another, but drop the read side as:
>
> spin_lock
> foo = var;
> spin_unlock
> return foo;
>
> is kinda useless, it doesn't actually serialize against the usage of
> foo, that is, once it gets used, var might already have acquired a new
> value.
>
> The only thing it would protect is reading var, but since that is a
> machine sized read, its atomic anyway (assuming its naturally aligned).
>
> So no need for atomic_t (its read-side is just a read too), just drop
> the whole lock usage from pq_qos_requirement().
>
Thanks Peter.
Mark, is the following patch ok with you? This should be applied to
mainline, and then after that no special patches are necessary for
real-time.
Thanks
John Kacur
On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
> On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> >
> >> Keeping a lock around the different "target_value"s may not be so
> >> important. Its just a 32bit scaler value, and perhaps we can make it an
> >> atomic type? That way we loose the raw_spinlock.
> >
> > My suggestion was to keep the locking for the write side - so as to
> > avoid stuff stomping on one another, but drop the read side as:
> >
> > spin_lock
> > foo = var;
> > spin_unlock
> > return foo;
> >
> > is kinda useless, it doesn't actually serialize against the usage of
> > foo, that is, once it gets used, var might already have acquired a new
> > value.
> >
> > The only thing it would protect is reading var, but since that is a
> > machine sized read, its atomic anyway (assuming its naturally aligned).
> >
> > So no need for atomic_t (its read-side is just a read too), just drop
> > the whole lock usage from pq_qos_requirement().
> >
>
> Thanks Peter.
>
> Mark, is the following patch ok with you? This should be applied to
> mainline, and then after that no special patches are necessary for
> real-time.
It looks OK to me; do I need to add a compiler declaration to the
structure to make sure the target_value is word aligned?
thanks,
--mgross
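On common ABIs no extra declaration should be needed: the compiler already places an s32 member on a 4-byte boundary, and this can be verified at compile time rather than by adding an attribute. A user-space sketch with a hypothetical, simplified struct layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pm_qos_object (hypothetical layout,
 * only the fields relevant to the alignment question). */
struct qos_object {
    char *name;
    int32_t default_value;
    int32_t target_value;   /* the field read without the lock */
};

/* If this compiles, the lockless read of target_value is a single
 * naturally aligned access on this ABI. */
_Static_assert(offsetof(struct qos_object, target_value)
               % _Alignof(int32_t) == 0,
               "target_value is naturally aligned");
```

The static assertion documents the assumption in the code instead of relying on a comment.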
>
> Thanks
>
> John Kacur
> Subject: Remove unnecessary lock in pm_qos_requirement
>
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Index: linux-2.6/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.orig/kernel/pm_qos_params.c
> +++ linux-2.6/kernel/pm_qos_params.c
> @@ -193,14 +193,7 @@ static int find_pm_qos_object_by_minor(i
> */
> int pm_qos_requirement(int pm_qos_class)
> {
> - int ret_val;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return ret_val;
> + return pm_qos_array[pm_qos_class]->target_value;
> }
> EXPORT_SYMBOL_GPL(pm_qos_requirement);
>
On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
> On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> >
> >> Keeping a lock around the different "target_value"s may not be so
> >> important. Its just a 32bit scaler value, and perhaps we can make it an
> >> atomic type? That way we loose the raw_spinlock.
> >
> > My suggestion was to keep the locking for the write side - so as to
> > avoid stuff stomping on one another, but drop the read side as:
> >
> > spin_lock
> > foo = var;
> > spin_unlock
> > return foo;
> >
> > is kinda useless, it doesn't actually serialize against the usage of
> > foo, that is, once it gets used, var might already have acquired a new
> > value.
> >
> > The only thing it would protect is reading var, but since that is a
> > machine sized read, its atomic anyway (assuming its naturally aligned).
> >
> > So no need for atomic_t (its read-side is just a read too), just drop
> > the whole lock usage from pq_qos_requirement().
> >
>
> Thanks Peter.
>
> Mark, is the following patch ok with you? This should be applied to
> mainline, and then after that no special patches are necessary for
> real-time.
I've been thinking about this patch, and I worry that making the use
of this lock asymmetric WRT reads and writes to the storage address
hurts readability.
I would rather make the variable an atomic. What do you think about
that?
--mgross
>
> Thanks
>
> John Kacur
> Subject: Remove unnecessary lock in pm_qos_requirement
>
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Index: linux-2.6/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.orig/kernel/pm_qos_params.c
> +++ linux-2.6/kernel/pm_qos_params.c
> @@ -193,14 +193,7 @@ static int find_pm_qos_object_by_minor(i
> */
> int pm_qos_requirement(int pm_qos_class)
> {
> - int ret_val;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return ret_val;
> + return pm_qos_array[pm_qos_class]->target_value;
> }
> EXPORT_SYMBOL_GPL(pm_qos_requirement);
>
On Mon, 2008-08-25 at 09:34 -0700, mark gross wrote:
> On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
> > On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> > > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> > >
> > >> Keeping a lock around the different "target_value"s may not be so
> > >> important. Its just a 32bit scaler value, and perhaps we can make it an
> > >> atomic type? That way we loose the raw_spinlock.
> > >
> > > My suggestion was to keep the locking for the write side - so as to
> > > avoid stuff stomping on one another, but drop the read side as:
> > >
> > > spin_lock
> > > foo = var;
> > > spin_unlock
> > > return foo;
> > >
> > > is kinda useless, it doesn't actually serialize against the usage of
> > > foo, that is, once it gets used, var might already have acquired a new
> > > value.
> > >
> > > The only thing it would protect is reading var, but since that is a
> > > machine sized read, its atomic anyway (assuming its naturally aligned).
> > >
> > > So no need for atomic_t (its read-side is just a read too), just drop
> > > the whole lock usage from pq_qos_requirement().
> > >
> >
> > Thanks Peter.
> >
> > Mark, is the following patch ok with you? This should be applied to
> > mainline, and then after that no special patches are necessary for
> > real-time.
>
> I've been thinking about this patch and I worry that the readability
> from making the use of this lock asymmetric WRT reads and writes to the
> storage address is bothersome.
>
> I would rather make the variable an atomic. What do you think about
> that?
It would make the write side more expensive, as we already have the two
atomic operations for the lock and unlock, this would add a third.
Then again, I doubt that this is really a fast path.
OTOH, a simple comment could clarify the situation for the reader.
Up to you I guess ;-)
On Mon, Aug 25, 2008 at 6:35 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2008-08-25 at 09:34 -0700, mark gross wrote:
>> On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
>> > On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
>> > > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
>> > >
>> > >> Keeping a lock around the different "target_value"s may not be so
>> > >> important. Its just a 32bit scaler value, and perhaps we can make it an
>> > >> atomic type? That way we loose the raw_spinlock.
>> > >
>> > > My suggestion was to keep the locking for the write side - so as to
>> > > avoid stuff stomping on one another, but drop the read side as:
>> > >
>> > > spin_lock
>> > > foo = var;
>> > > spin_unlock
>> > > return foo;
>> > >
>> > > is kinda useless, it doesn't actually serialize against the usage of
>> > > foo, that is, once it gets used, var might already have acquired a new
>> > > value.
>> > >
>> > > The only thing it would protect is reading var, but since that is a
>> > > machine sized read, its atomic anyway (assuming its naturally aligned).
>> > >
>> > > So no need for atomic_t (its read-side is just a read too), just drop
>> > > the whole lock usage from pq_qos_requirement().
>> > >
>> >
>> > Thanks Peter.
>> >
>> > Mark, is the following patch ok with you? This should be applied to
>> > mainline, and then after that no special patches are necessary for
>> > real-time.
>>
>> I've been thinking about this patch and I worry that the readability
>> from making the use of this lock asymmetric WRT reads and writes to the
>> storage address is bothersome.
>>
>> I would rather make the variable an atomic. What do you think about
>> that?
>
> It would make the write side more expensive, as we already have the two
> atomic operations for the lock and unlock, this would add a third.
>
> Then again, I doubt that this is really a fast path.
>
> OTOH, a simple comment could clarify the situation for the reader.
>
> Up to you I guess ;-)
>
Personally I agree with Peter: a simple comment would clarify the
situation, and it seems quite silly to me to add complexity in the name
of symmetry. That is not my definition of readability. Nevertheless, I
offer up solution number 3 here if that would please everyone more.
Attached is a patch that changes the target value to an atomic
variable, as suggested by Arjan. To summarize:
Three solutions, all of which solve the problem:
1. Add a raw spinlock around target value only. This makes the raw
spinlock area very small, and is converted to a normal spinlock for
non-preempt-rt.
2. Remove the spinlock altogether in pm_qos_requirement since the
simple read is already atomic. Advantage - smallest patch and realtime
doesn't require a special patch once this is included in mainline. I
like this one the best.
3. make target_value atomic_t. Advantage - symmetry, some people find
this more readable. The patch is larger than the above solution but as
above, no special patch is required for realtime once this is included
in mainline. Solution three is in the attached patch. Comments are
appreciated as always.
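Solution 3 maps naturally onto C11 atomics in a user-space sketch (atomic_int standing in for the kernel's atomic_t; function bodies simplified and values hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>

/* atomic_int plays the role of the patch's atomic_t target_value. */
static atomic_int target_value = 2000;

void update_target(int extreme_value)
{
    /* In the real patch this check-and-set runs under pm_qos_lock;
     * the atomic store keeps the lockless reader coherent. */
    if (atomic_load(&target_value) != extreme_value)
        atomic_store(&target_value, extreme_value);
}

/* Called from idle: a plain atomic load, nothing that can sleep. */
int pm_qos_requirement(void)
{
    return atomic_load(&target_value);
}
```

The read side of solutions 2 and 3 is the same machine instruction; option 3 just makes the intent explicit in the type.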
On Tue, Aug 26, 2008 at 10:48:13AM +0200, John Kacur wrote:
> On Mon, Aug 25, 2008 at 6:35 PM, Peter Zijlstra <[email protected]> wrote:
> > On Mon, 2008-08-25 at 09:34 -0700, mark gross wrote:
> >> On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
> >> > On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> >> > > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> >> > >
> >> > >> Keeping a lock around the different "target_value"s may not be so
> >> > >> important. Its just a 32bit scaler value, and perhaps we can make it an
> >> > >> atomic type? That way we loose the raw_spinlock.
> >> > >
> >> > > My suggestion was to keep the locking for the write side - so as to
> >> > > avoid stuff stomping on one another, but drop the read side as:
> >> > >
> >> > > spin_lock
> >> > > foo = var;
> >> > > spin_unlock
> >> > > return foo;
> >> > >
> >> > > is kinda useless, it doesn't actually serialize against the usage of
> >> > > foo, that is, once it gets used, var might already have acquired a new
> >> > > value.
> >> > >
> >> > > The only thing it would protect is reading var, but since that is a
> >> > > machine sized read, its atomic anyway (assuming its naturally aligned).
> >> > >
> >> > > So no need for atomic_t (its read-side is just a read too), just drop
> >> > > the whole lock usage from pq_qos_requirement().
> >> > >
> >> >
> >> > Thanks Peter.
> >> >
> >> > Mark, is the following patch ok with you? This should be applied to
> >> > mainline, and then after that no special patches are necessary for
> >> > real-time.
> >>
> >> I've been thinking about this patch and I worry that the readability
> >> from making the use of this lock asymmetric WRT reads and writes to the
> >> storage address is bothersome.
> >>
> >> I would rather make the variable an atomic. What do you think about
> >> that?
> >
> > It would make the write side more expensive, as we already have the two
> > atomic operations for the lock and unlock, this would add a third.
> >
> > Then again, I doubt that this is really a fast path.
> >
> > OTOH, a simple comment could clarify the situation for the reader.
> >
> > Up to you I guess ;-)
> >
>
> Personally I agree with Peter, a simple comment would clarify the
> situation, it seems quite silly to me to add complexity in the name of
> symmetry. This is not my definition of readability. Never-the-less I
> offer up solution number 3 here if that would please everyone more.
> Attached is a patch that changes the target value to an atomic
> variable as suggested by Arjan. To summarize.
>
> 3 Sol'ns - all of which solve the problem.
> 1. Add a raw spinlock around target value only. This makes the raw
> spinlock area very small, and is converted to a normal spinlock for
> non-preempt-rt.
> 2. Remove the spinlock altogether in pm_qos_requirement since the
> simple read is already atomic. Advantage - smallest patch and realtime
> doesn't require a special patch once this is included in mainline. I
> like this one the best.
> 3. make target_value atomic_t. Advantage - symmetry, some people find
> this more readable. The patch is larger than the above solution but as
> above, no special patch is required for realtime once this is included
> in mainline. Solution three is in the attached patch. Comments are
> appreciated as always.
Thank you! FWIW I'm really on the fence between options 2 and 3.
> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
> This is necessary for real-time since pm_qos_requirement is called by idle and
> cannot be allowed to sleep.
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Index: linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.26.3-rt3.orig/kernel/pm_qos_params.c
> +++ linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> @@ -42,7 +42,7 @@
> #include <linux/uaccess.h>
>
> /*
> - * locking rule: all changes to target_value or requirements or notifiers lists
> + * locking rule: all changes to requirements or notifiers lists
> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> * held, taken with _irqsave. One lock to rule them all
> */
> @@ -65,7 +65,7 @@ struct pm_qos_object {
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> s32 default_value;
> - s32 target_value;
> + atomic_t target_value;
> s32 (*comparitor)(s32, s32);
> };
>
> @@ -76,7 +76,7 @@ static struct pm_qos_object cpu_dma_pm_q
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -86,7 +86,7 @@ static struct pm_qos_object network_lat_
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -98,7 +98,7 @@ static struct pm_qos_object network_thro
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> .default_value = 0,
> - .target_value = 0,
> + .target_value = ATOMIC_INIT(0),
> .comparitor = max_compare
> };
>
> @@ -149,13 +149,14 @@ static void update_target(int target)
> extreme_value = pm_qos_array[target]->comparitor(
> extreme_value, node->value);
> }
> - if (pm_qos_array[target]->target_value != extreme_value) {
> + spin_unlock_irqrestore(&pm_qos_lock, flags);
> +
Do we want to move the unlock before the setting of the target_value?
This feels wrong to me; the option 2 patch didn't do this.
Couldn't we have a race from two CPUs hitting update_target at the same
time with different values if we drop the lock before the target_value
is set?
--mgross
> + if (atomic_read(&pm_qos_array[target]->target_value) != extreme_value) {
> call_notifier = 1;
> - pm_qos_array[target]->target_value = extreme_value;
> + atomic_set(&pm_qos_array[target]->target_value, extreme_value);
> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
> - pm_qos_array[target]->target_value);
> + atomic_read(&pm_qos_array[target]->target_value));
> }
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> if (call_notifier)
> blocking_notifier_call_chain(pm_qos_array[target]->notifiers,
> @@ -193,11 +194,8 @@ static int find_pm_qos_object_by_minor(i
> int pm_qos_requirement(int pm_qos_class)
> {
> int ret_val;
> - unsigned long flags;
>
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> + ret_val = atomic_read(&pm_qos_array[pm_qos_class]->target_value);
>
> return ret_val;
> }
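The race Mark raises above can be simulated deterministically in user-space C: if the check and the store happen after the lock is dropped, two updaters can both read the old target and then store in either order, so a stale value can overwrite a newer one. The interleaving below is hand-scheduled and the values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int target_value = 0;

/* Hand-scheduled interleaving of two CPUs doing check-then-store
 * outside the lock.  Both observe the old target (0), both decide to
 * update, and the final value depends only on store order: the newer
 * update (7) is lost to the stale one (3). */
void demo_lost_update(void)
{
    int cpu0_seen = atomic_load(&target_value);  /* CPU0 reads 0 */
    int cpu1_seen = atomic_load(&target_value);  /* CPU1 reads 0 too */

    if (cpu1_seen != 7)
        atomic_store(&target_value, 7);          /* CPU1: newest value */
    if (cpu0_seen != 3)
        atomic_store(&target_value, 3);          /* CPU0: stale overwrite */
}
```

Keeping the compare-and-set inside pm_qos_lock, as the original code did, rules this interleaving out.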
On Tue, Aug 26, 2008 at 6:18 PM, mark gross <[email protected]> wrote:
> On Tue, Aug 26, 2008 at 10:48:13AM +0200, John Kacur wrote:
>> On Mon, Aug 25, 2008 at 6:35 PM, Peter Zijlstra <[email protected]> wrote:
>> > On Mon, 2008-08-25 at 09:34 -0700, mark gross wrote:
>> >> On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
>> >> > On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
>> >> > > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
>> >> > >
>> >> > >> Keeping a lock around the different "target_value"s may not be so
>> >> > >> important. Its just a 32bit scaler value, and perhaps we can make it an
>> >> > >> atomic type? That way we loose the raw_spinlock.
>> >> > >
>> >> > > My suggestion was to keep the locking for the write side - so as to
>> >> > > avoid stuff stomping on one another, but drop the read side as:
>> >> > >
>> >> > > spin_lock
>> >> > > foo = var;
>> >> > > spin_unlock
>> >> > > return foo;
>> >> > >
>> >> > > is kinda useless, it doesn't actually serialize against the usage of
>> >> > > foo, that is, once it gets used, var might already have acquired a new
>> >> > > value.
>> >> > >
>> >> > > The only thing it would protect is reading var, but since that is a
>> >> > > machine sized read, its atomic anyway (assuming its naturally aligned).
>> >> > >
>> >> > > So no need for atomic_t (its read-side is just a read too), just drop
>> >> > > the whole lock usage from pq_qos_requirement().
>> >> > >
>> >> >
>> >> > Thanks Peter.
>> >> >
>> >> > Mark, is the following patch ok with you? This should be applied to
>> >> > mainline, and then after that no special patches are necessary for
>> >> > real-time.
>> >>
>> >> I've been thinking about this patch and I worry that the readability
>> >> from making the use of this lock asymmetric WRT reads and writes to the
>> >> storage address is bothersome.
>> >>
>> >> I would rather make the variable an atomic. What do you think about
>> >> that?
>> >
>> > It would make the write side more expensive, as we already have the two
>> > atomic operations for the lock and unlock, this would add a third.
>> >
>> > Then again, I doubt that this is really a fast path.
>> >
>> > OTOH, a simple comment could clarify the situation for the reader.
>> >
>> > Up to you I guess ;-)
>> >
>>
>> Personally I agree with Peter, a simple comment would clarify the
>> situation, it seems quite silly to me to add complexity in the name of
>> symmetry. This is not my definition of readability. Never-the-less I
>> offer up solution number 3 here if that would please everyone more.
>> Attached is a patch that changes the target value to an atomic
>> variable as suggested by Arjan. To summarize.
>>
>> 3 Sol'ns - all of which solve the problem.
>> 1. Add a raw spinlock around target value only. This makes the raw
>> spinlock area very small, and is converted to a normal spinlock for
>> non-preempt-rt.
>> 2. Remove the spinlock altogether in pm_qos_requirement since the
>> simple read is already atomic. Advantage - smallest patch and realtime
>> doesn't require a special patch once this is included in mainline. I
>> like this one the best.
>> 3. make target_value atomic_t. Advantage - symmetry, some people find
>> this more readable. The patch is larger than the above solution but as
>> above, no special patch is required for realtime once this is included
>> in mainline. Solution three is in the attached patch. Comments are
>> appreciated as always.
>
> Thank you! FWIW I'm really on the fence between option 2 and 3.
>
>> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
>> This is necessary for real-time since pm_qos_requirement is called by idle and
>> cannot be allowed to sleep.
>> Signed-off-by: John Kacur <jkacur at gmail dot com>
>>
>> Index: linux-2.6.26.3-rt3/kernel/pm_qos_params.c
>> ===================================================================
>> --- linux-2.6.26.3-rt3.orig/kernel/pm_qos_params.c
>> +++ linux-2.6.26.3-rt3/kernel/pm_qos_params.c
>> @@ -42,7 +42,7 @@
>> #include <linux/uaccess.h>
>>
>> /*
>> - * locking rule: all changes to target_value or requirements or notifiers lists
>> + * locking rule: all changes to requirements or notifiers lists
>> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
>> * held, taken with _irqsave. One lock to rule them all
>> */
>> @@ -65,7 +65,7 @@ struct pm_qos_object {
>> struct miscdevice pm_qos_power_miscdev;
>> char *name;
>> s32 default_value;
>> - s32 target_value;
>> + atomic_t target_value;
>> s32 (*comparitor)(s32, s32);
>> };
>>
>> @@ -76,7 +76,7 @@ static struct pm_qos_object cpu_dma_pm_q
>> .notifiers = &cpu_dma_lat_notifier,
>> .name = "cpu_dma_latency",
>> .default_value = 2000 * USEC_PER_SEC,
>> - .target_value = 2000 * USEC_PER_SEC,
>> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
>> .comparitor = min_compare
>> };
>>
>> @@ -86,7 +86,7 @@ static struct pm_qos_object network_lat_
>> .notifiers = &network_lat_notifier,
>> .name = "network_latency",
>> .default_value = 2000 * USEC_PER_SEC,
>> - .target_value = 2000 * USEC_PER_SEC,
>> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
>> .comparitor = min_compare
>> };
>>
>> @@ -98,7 +98,7 @@ static struct pm_qos_object network_thro
>> .notifiers = &network_throughput_notifier,
>> .name = "network_throughput",
>> .default_value = 0,
>> - .target_value = 0,
>> + .target_value = ATOMIC_INIT(0),
>> .comparitor = max_compare
>> };
>>
>> @@ -149,13 +149,14 @@ static void update_target(int target)
>> extreme_value = pm_qos_array[target]->comparitor(
>> extreme_value, node->value);
>> }
>> - if (pm_qos_array[target]->target_value != extreme_value) {
>> + spin_unlock_irqrestore(&pm_qos_lock, flags);
>> +
>
> do we want to move the unlock before the setting of the target_value?
> This feels wrong to me, the option 2 patch didn't do this.
>
> couldn't we have a race from 2 cpu's hitting update_target at the same
> time with different values if we drop the lock before the target_value
> is set?
I think you are right, since atomicity doesn't have anything to do with
ordering. Good catch; putting the unlock back where it was before,
new patch attached. (Also shortened up pm_qos_requirement.)
---SNIP----
John
On Tue, Aug 26, 2008 at 07:45:37PM +0200, John Kacur wrote:
> On Tue, Aug 26, 2008 at 6:18 PM, mark gross <[email protected]> wrote:
> > On Tue, Aug 26, 2008 at 10:48:13AM +0200, John Kacur wrote:
> >> On Mon, Aug 25, 2008 at 6:35 PM, Peter Zijlstra <[email protected]> wrote:
> >> > On Mon, 2008-08-25 at 09:34 -0700, mark gross wrote:
> >> >> On Fri, Aug 15, 2008 at 12:51:11AM +0200, John Kacur wrote:
> >> >> > On Thu, Aug 14, 2008 at 7:48 PM, Peter Zijlstra <[email protected]> wrote:
> >> >> > > On Thu, 2008-08-14 at 08:52 -0700, mark gross wrote:
> >> >> > >
> >> >> > >> Keeping a lock around the different "target_value"s may not be so
> >> >> > >> important. Its just a 32bit scaler value, and perhaps we can make it an
> >> >> > >> atomic type? That way we loose the raw_spinlock.
> >> >> > >
> >> >> > > My suggestion was to keep the locking for the write side - so as to
> >> >> > > avoid stuff stomping on one another, but drop the read side as:
> >> >> > >
> >> >> > > spin_lock
> >> >> > > foo = var;
> >> >> > > spin_unlock
> >> >> > > return foo;
> >> >> > >
> >> >> > > is kinda useless, it doesn't actually serialize against the usage of
> >> >> > > foo, that is, once it gets used, var might already have acquired a new
> >> >> > > value.
> >> >> > >
> >> >> > > The only thing it would protect is reading var, but since that is a
> >> >> > > machine sized read, its atomic anyway (assuming its naturally aligned).
> >> >> > >
> >> >> > > So no need for atomic_t (its read-side is just a read too), just drop
> >> >> > > the whole lock usage from pm_qos_requirement().
> >> >> > >
> >> >> >
> >> >> > Thanks Peter.
> >> >> >
> >> >> > Mark, is the following patch ok with you? This should be applied to
> >> >> > mainline, and then after that no special patches are necessary for
> >> >> > real-time.
> >> >>
> >> >> I've been thinking about this patch, and I worry that the asymmetry it
> >> >> introduces in how this lock is used WRT reads and writes to the
> >> >> storage address is bad for readability.
> >> >>
> >> >> I would rather make the variable an atomic. What do you think about
> >> >> that?
> >> >
> >> > It would make the write side more expensive, as we already have the two
> >> > atomic operations for the lock and unlock, this would add a third.
> >> >
> >> > Then again, I doubt that this is really a fast path.
> >> >
> >> > OTOH, a simple comment could clarify the situation for the reader.
> >> >
> >> > Up to you I guess ;-)
> >> >
> >>
> >> Personally I agree with Peter: a simple comment would clarify the
> >> situation, and it seems quite silly to me to add complexity in the name
> >> of symmetry. That is not my definition of readability. Nevertheless, I
> >> offer up solution number 3 here if that would please everyone more.
> >> Attached is a patch that changes the target value to an atomic
> >> variable as suggested by Arjan. To summarize.
> >>
> >> Three solutions - all of which solve the problem:
> >> 1. Add a raw spinlock around target value only. This makes the raw
> >> spinlock area very small, and is converted to a normal spinlock for
> >> non-preempt-rt.
> >> 2. Remove the spinlock altogether in pm_qos_requirement since the
> >> simple read is already atomic. Advantage - smallest patch and realtime
> >> doesn't require a special patch once this is included in mainline. I
> >> like this one the best.
> >> 3. make target_value atomic_t. Advantage - symmetry, some people find
> >> this more readable. The patch is larger than the above solution but as
> >> above, no special patch is required for realtime once this is included
> >> in mainline. Solution three is in the attached patch. Comments are
> >> appreciated as always.
> >
> > Thank you! FWIW I'm really on the fence between option 2 and 3.
> >
> >> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
> >> This is necessary for real-time since pm_qos_requirement is called by idle and
> >> cannot be allowed to sleep.
> >> Signed-off-by: John Kacur <jkacur at gmail dot com>
> >>
> >> Index: linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> >> ===================================================================
> >> --- linux-2.6.26.3-rt3.orig/kernel/pm_qos_params.c
> >> +++ linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> >> @@ -42,7 +42,7 @@
> >> #include <linux/uaccess.h>
> >>
> >> /*
> >> - * locking rule: all changes to target_value or requirements or notifiers lists
> >> + * locking rule: all changes to requirements or notifiers lists
> >> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> >> * held, taken with _irqsave. One lock to rule them all
> >> */
> >> @@ -65,7 +65,7 @@ struct pm_qos_object {
> >> struct miscdevice pm_qos_power_miscdev;
> >> char *name;
> >> s32 default_value;
> >> - s32 target_value;
> >> + atomic_t target_value;
> >> s32 (*comparitor)(s32, s32);
> >> };
> >>
> >> @@ -76,7 +76,7 @@ static struct pm_qos_object cpu_dma_pm_q
> >> .notifiers = &cpu_dma_lat_notifier,
> >> .name = "cpu_dma_latency",
> >> .default_value = 2000 * USEC_PER_SEC,
> >> - .target_value = 2000 * USEC_PER_SEC,
> >> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> >> .comparitor = min_compare
> >> };
> >>
> >> @@ -86,7 +86,7 @@ static struct pm_qos_object network_lat_
> >> .notifiers = &network_lat_notifier,
> >> .name = "network_latency",
> >> .default_value = 2000 * USEC_PER_SEC,
> >> - .target_value = 2000 * USEC_PER_SEC,
> >> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> >> .comparitor = min_compare
> >> };
> >>
> >> @@ -98,7 +98,7 @@ static struct pm_qos_object network_thro
> >> .notifiers = &network_throughput_notifier,
> >> .name = "network_throughput",
> >> .default_value = 0,
> >> - .target_value = 0,
> >> + .target_value = ATOMIC_INIT(0),
> >> .comparitor = max_compare
> >> };
> >>
> >> @@ -149,13 +149,14 @@ static void update_target(int target)
> >> extreme_value = pm_qos_array[target]->comparitor(
> >> extreme_value, node->value);
> >> }
> >> - if (pm_qos_array[target]->target_value != extreme_value) {
> >> + spin_unlock_irqrestore(&pm_qos_lock, flags);
> >> +
> >
> > do we want to move the unlock before the setting of the target_value?
> > This feels wrong to me, the option 2 patch didn't do this.
> >
> > couldn't we have a race from 2 CPUs hitting update_target at the same
> > time with different values, if we drop the lock before the target_value
> > is set?
>
> I think you are right, since atomicity doesn't have anything to do with
> ordering. Good catch. Putting the unlock back where it was before;
> new patch attached. (Also shortened up pm_qos_requirement.)
>
> ---SNIP----
>
> John
Looks good; it's running without error on the non-rt kernel.
Signed-off-by: mark gross <[email protected]>
FWIW I'll send this as a patch to Andrew in a moment.
--mgross
> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
> This is necessary for real-time since pm_qos_requirement is called by idle and
> cannot be allowed to sleep.
> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> Index: linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.26.3-rt3.orig/kernel/pm_qos_params.c
> +++ linux-2.6.26.3-rt3/kernel/pm_qos_params.c
> @@ -42,7 +42,7 @@
> #include <linux/uaccess.h>
>
> /*
> - * locking rule: all changes to target_value or requirements or notifiers lists
> + * locking rule: all changes to requirements or notifiers lists
> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> * held, taken with _irqsave. One lock to rule them all
> */
> @@ -65,7 +65,7 @@ struct pm_qos_object {
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> s32 default_value;
> - s32 target_value;
> + atomic_t target_value;
> s32 (*comparitor)(s32, s32);
> };
>
> @@ -76,7 +76,7 @@ static struct pm_qos_object cpu_dma_pm_q
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -86,7 +86,7 @@ static struct pm_qos_object network_lat_
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -98,7 +98,7 @@ static struct pm_qos_object network_thro
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> .default_value = 0,
> - .target_value = 0,
> + .target_value = ATOMIC_INIT(0),
> .comparitor = max_compare
> };
>
> @@ -149,11 +149,11 @@ static void update_target(int target)
> extreme_value = pm_qos_array[target]->comparitor(
> extreme_value, node->value);
> }
> - if (pm_qos_array[target]->target_value != extreme_value) {
> + if (atomic_read(&pm_qos_array[target]->target_value) != extreme_value) {
> call_notifier = 1;
> - pm_qos_array[target]->target_value = extreme_value;
> + atomic_set(&pm_qos_array[target]->target_value, extreme_value);
> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
> - pm_qos_array[target]->target_value);
> + atomic_read(&pm_qos_array[target]->target_value));
> }
> spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> @@ -192,14 +192,7 @@ static int find_pm_qos_object_by_minor(i
> */
> int pm_qos_requirement(int pm_qos_class)
> {
> - int ret_val;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return ret_val;
> + return atomic_read(&pm_qos_array[pm_qos_class]->target_value);
> }
> EXPORT_SYMBOL_GPL(pm_qos_requirement);
>
From: John Kacur <jkacur at gmail dot com>
Patch to make PM_QOS and CPU_IDLE play nicer when run with the
RT-Preempt kernel. CPU_IDLE polls the target_value of some of the
pm_qos parameters from the idle loop, causing sleeping-lock
warnings. Changing target_value to an atomic type avoids this issue.
Signed-off-by: mark gross <[email protected]>
Thanks,
--mgross
Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
This is necessary for real-time since pm_qos_requirement is called by idle and
cannot be allowed to sleep.
Signed-off-by: John Kacur <jkacur at gmail dot com>
Index: linux-2.6/kernel/pm_qos_params.c
===================================================================
--- linux-2.6.orig/kernel/pm_qos_params.c 2008-08-08 07:52:01.000000000 -0700
+++ linux-2.6/kernel/pm_qos_params.c 2008-08-28 12:14:55.000000000 -0700
@@ -43,7 +43,7 @@
#include <linux/uaccess.h>
/*
- * locking rule: all changes to target_value or requirements or notifiers lists
+ * locking rule: all changes to requirements or notifiers lists
* or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
* held, taken with _irqsave. One lock to rule them all
*/
@@ -66,7 +66,7 @@
struct miscdevice pm_qos_power_miscdev;
char *name;
s32 default_value;
- s32 target_value;
+ atomic_t target_value;
s32 (*comparitor)(s32, s32);
};
@@ -77,7 +77,7 @@
.notifiers = &cpu_dma_lat_notifier,
.name = "cpu_dma_latency",
.default_value = 2000 * USEC_PER_SEC,
- .target_value = 2000 * USEC_PER_SEC,
+ .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
.comparitor = min_compare
};
@@ -87,7 +87,7 @@
.notifiers = &network_lat_notifier,
.name = "network_latency",
.default_value = 2000 * USEC_PER_SEC,
- .target_value = 2000 * USEC_PER_SEC,
+ .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
.comparitor = min_compare
};
@@ -99,7 +99,7 @@
.notifiers = &network_throughput_notifier,
.name = "network_throughput",
.default_value = 0,
- .target_value = 0,
+ .target_value = ATOMIC_INIT(0),
.comparitor = max_compare
};
@@ -150,11 +150,11 @@
extreme_value = pm_qos_array[target]->comparitor(
extreme_value, node->value);
}
- if (pm_qos_array[target]->target_value != extreme_value) {
+ if (atomic_read(&pm_qos_array[target]->target_value) != extreme_value) {
call_notifier = 1;
- pm_qos_array[target]->target_value = extreme_value;
+ atomic_set(&pm_qos_array[target]->target_value, extreme_value);
pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
- pm_qos_array[target]->target_value);
+ atomic_read(&pm_qos_array[target]->target_value));
}
spin_unlock_irqrestore(&pm_qos_lock, flags);
@@ -193,14 +193,7 @@
*/
int pm_qos_requirement(int pm_qos_class)
{
- int ret_val;
- unsigned long flags;
-
- spin_lock_irqsave(&pm_qos_lock, flags);
- ret_val = pm_qos_array[pm_qos_class]->target_value;
- spin_unlock_irqrestore(&pm_qos_lock, flags);
-
- return ret_val;
+ return atomic_read(&pm_qos_array[pm_qos_class]->target_value);
}
EXPORT_SYMBOL_GPL(pm_qos_requirement);
On Thu, 28 Aug 2008 12:44:20 -0700
mark gross <[email protected]> wrote:
> From: John Kacur <jkacur at gmail dot com>
I tend to deobfuscate email addresses when preparing the changelogs.
Not sure why, really, and I feel a bit guilty about doing it (hey, you
can forward the resulting spam to me if you like). One reason I guess
is to avoid breaking all the emails which my scripts will automatically
send out as the patch goes through its little lifecycle. Some of which
can sometimes be quite important.
> Patch to make PM_QOS and CPU_IDLE play nicer when run with the
> RT-Preempt kernel. CPU_IDLE polls the target_value of some of the
> pm_qos parameters from the idle loop, causing sleeping-lock
> warnings. Changing target_value to an atomic type avoids this issue.
>
> Signed-off-by: mark gross <[email protected]>
>
> Thanks,
>
> --mgross
>
>
> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
> This is necessary for real-time since pm_qos_requirement is called by idle and
> cannot be allowed to sleep.
> Signed-off-by: John Kacur <jkacur at gmail dot com>
As usual when I get two different changelogs, I create a masterpiece
which is better than either of the originals!
> Index: linux-2.6/kernel/pm_qos_params.c
> ===================================================================
> --- linux-2.6.orig/kernel/pm_qos_params.c 2008-08-08 07:52:01.000000000 -0700
> +++ linux-2.6/kernel/pm_qos_params.c 2008-08-28 12:14:55.000000000 -0700
> @@ -43,7 +43,7 @@
> #include <linux/uaccess.h>
>
> /*
> - * locking rule: all changes to target_value or requirements or notifiers lists
> + * locking rule: all changes to requirements or notifiers lists
> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
> * held, taken with _irqsave. One lock to rule them all
> */
> @@ -66,7 +66,7 @@
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> s32 default_value;
> - s32 target_value;
> + atomic_t target_value;
> s32 (*comparitor)(s32, s32);
> };
>
> @@ -77,7 +77,7 @@
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -87,7 +87,7 @@
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> .default_value = 2000 * USEC_PER_SEC,
> - .target_value = 2000 * USEC_PER_SEC,
> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
> .comparitor = min_compare
> };
>
> @@ -99,7 +99,7 @@
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> .default_value = 0,
> - .target_value = 0,
> + .target_value = ATOMIC_INIT(0),
> .comparitor = max_compare
> };
>
> @@ -150,11 +150,11 @@
> extreme_value = pm_qos_array[target]->comparitor(
> extreme_value, node->value);
> }
> - if (pm_qos_array[target]->target_value != extreme_value) {
> + if (atomic_read(&pm_qos_array[target]->target_value) != extreme_value) {
> call_notifier = 1;
> - pm_qos_array[target]->target_value = extreme_value;
> + atomic_set(&pm_qos_array[target]->target_value, extreme_value);
> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
> - pm_qos_array[target]->target_value);
> + atomic_read(&pm_qos_array[target]->target_value));
> }
> spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> @@ -193,14 +193,7 @@
> */
> int pm_qos_requirement(int pm_qos_class)
> {
> - int ret_val;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - ret_val = pm_qos_array[pm_qos_class]->target_value;
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return ret_val;
> + return atomic_read(&pm_qos_array[pm_qos_class]->target_value);
> }
> EXPORT_SYMBOL_GPL(pm_qos_requirement);
OK, so I've stared and stared. This patch doesn't do anything.
I'm suspecting that the rt patch might be doing something silly, like
thinking that it needs to take pm_qos_lock to read a single word from
memory, or something like that.
But this patch is simply not understandable given the amount of
information which you guys have provided.
Help?
On Fri, Aug 29, 2008 at 2:32 AM, Andrew Morton
<[email protected]> wrote:
> On Thu, 28 Aug 2008 12:44:20 -0700
> mark gross <[email protected]> wrote:
>
>> From: John Kacur <jkacur at gmail dot com>
>
> I tend to deobfuscate email addresses when preparing the changelogs.
>
> Not sure why, really, and I feel a bit guilty about doing it (hey, you
> can forward the resulting spam to me if you like). One reason I guess
> is to avoid breaking all the emails which my scripts will automatically
> send out as the patch goes through its little lifecycle. Some of which
> can sometimes be quite important.
>
>> Patch to make PM_QOS and CPU_IDLE play nicer when run with the
>> RT-Preempt kernel. CPU_IDLE polls the target_value of some of the
>> pm_qos parameters from the idle loop, causing sleeping-lock
>> warnings. Changing target_value to an atomic type avoids this issue.
>>
>> Signed-off-by: mark gross <[email protected]>
>>
>> Thanks,
>>
>> --mgross
>>
>>
>> Remove the spinlock in pm_qos_requirement by making target_value an atomic type.
>> This is necessary for real-time since pm_qos_requirement is called by idle and
>> cannot be allowed to sleep.
>> Signed-off-by: John Kacur <jkacur at gmail dot com>
>
> As usual when I get two different changelogs, I create a masterpiece
> which is better than either of the originals!
>
>> Index: linux-2.6/kernel/pm_qos_params.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/pm_qos_params.c 2008-08-08 07:52:01.000000000 -0700
>> +++ linux-2.6/kernel/pm_qos_params.c 2008-08-28 12:14:55.000000000 -0700
>> @@ -43,7 +43,7 @@
>> #include <linux/uaccess.h>
>>
>> /*
>> - * locking rule: all changes to target_value or requirements or notifiers lists
>> + * locking rule: all changes to requirements or notifiers lists
>> * or pm_qos_object list and pm_qos_objects need to happen with pm_qos_lock
>> * held, taken with _irqsave. One lock to rule them all
>> */
>> @@ -66,7 +66,7 @@
>> struct miscdevice pm_qos_power_miscdev;
>> char *name;
>> s32 default_value;
>> - s32 target_value;
>> + atomic_t target_value;
>> s32 (*comparitor)(s32, s32);
>> };
>>
>> @@ -77,7 +77,7 @@
>> .notifiers = &cpu_dma_lat_notifier,
>> .name = "cpu_dma_latency",
>> .default_value = 2000 * USEC_PER_SEC,
>> - .target_value = 2000 * USEC_PER_SEC,
>> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
>> .comparitor = min_compare
>> };
>>
>> @@ -87,7 +87,7 @@
>> .notifiers = &network_lat_notifier,
>> .name = "network_latency",
>> .default_value = 2000 * USEC_PER_SEC,
>> - .target_value = 2000 * USEC_PER_SEC,
>> + .target_value = ATOMIC_INIT(2000 * USEC_PER_SEC),
>> .comparitor = min_compare
>> };
>>
>> @@ -99,7 +99,7 @@
>> .notifiers = &network_throughput_notifier,
>> .name = "network_throughput",
>> .default_value = 0,
>> - .target_value = 0,
>> + .target_value = ATOMIC_INIT(0),
>> .comparitor = max_compare
>> };
>>
>> @@ -150,11 +150,11 @@
>> extreme_value = pm_qos_array[target]->comparitor(
>> extreme_value, node->value);
>> }
>> - if (pm_qos_array[target]->target_value != extreme_value) {
>> + if (atomic_read(&pm_qos_array[target]->target_value) != extreme_value) {
>> call_notifier = 1;
>> - pm_qos_array[target]->target_value = extreme_value;
>> + atomic_set(&pm_qos_array[target]->target_value, extreme_value);
>> pr_debug(KERN_ERR "new target for qos %d is %d\n", target,
>> - pm_qos_array[target]->target_value);
>> + atomic_read(&pm_qos_array[target]->target_value));
>> }
>> spin_unlock_irqrestore(&pm_qos_lock, flags);
>>
>> @@ -193,14 +193,7 @@
>> */
>> int pm_qos_requirement(int pm_qos_class)
>> {
>> - int ret_val;
>> - unsigned long flags;
>> -
>> - spin_lock_irqsave(&pm_qos_lock, flags);
>> - ret_val = pm_qos_array[pm_qos_class]->target_value;
>> - spin_unlock_irqrestore(&pm_qos_lock, flags);
>> -
>> - return ret_val;
>> + return atomic_read(&pm_qos_array[pm_qos_class]->target_value);
>> }
>> EXPORT_SYMBOL_GPL(pm_qos_requirement);
>
> OK, so I've stared and stared. This patch doesn't do anything.
>
> I'm suspecting that the rt patch might be doing something silly, like
> thinking that it needs to take pm_qos_lock to read a single word from
> memory, or something like that.
>
> But this patch is simply not understandable given the amount of
> information which you guys have provided.
>
Hi Andrew
The purpose of the patch is to remove the spin_lock around the read in
the function pm_qos_requirement, since spinlocks can sleep in -rt and
this function is called from idle. I did propose a patch that simply
removed the spinlock since, as Peter Zijlstra and others (yourself
included) pointed out, the lock isn't needed to read a single word
from memory. Although that patch worked fine, Arjan objected to it
because it was not symmetric: a reader of the code might wonder why
the value was locked in other functions but not here. He suggested
that changing it to an atomic type might be less confusing. The
alternative, which I actually like better, is to simply remove the
lock in pm_qos_requirement and add a comment for readers of the code.
John
On Fri, 29 Aug 2008, John Kacur wrote:
>
> Hi Andrew
>
> The purpose of the patch is to remove the spin_lock around the read in
> the function pm_qos_requirement - since spinlocks can sleep in -rt and
> this function is called from idle. I did propose a patch, that simply
> removed the spinlock since, as Peter Zijlstra and others (yourself) pointed
> out, the lock isn't needed to read a single word from memory. Although
> the patch worked fine, Arjan objected to it, because it was not
> symmetric. A reader of the code might wonder why it was locked in
> other functions but not here. He suggested changing it to an atomic
> type might be less confusing. The alternative which I like better
> actually is to simply remove the lock in pm_qos_requirement and add a
> comment for readers of the code.
This is a much better changelog, and is what should have been written in
the first place ;-)
-- Steve