2014-04-09 02:52:42

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH RT 1/2] rwsem-rt: Do not allow readers to nest

From: "Steven Rostedt (Red Hat)" <[email protected]>

The readers of mainline rwsems are not allowed to nest, the rwsems in the
PREEMPT_RT kernel should not nest either.

Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/rwsem_rt.h | 1 -
kernel/rt.c | 37 ++++++++-----------------------------
2 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/include/linux/rwsem_rt.h b/include/linux/rwsem_rt.h
index e94d945..a81151c 100644
--- a/include/linux/rwsem_rt.h
+++ b/include/linux/rwsem_rt.h
@@ -20,7 +20,6 @@

struct rw_semaphore {
struct rt_mutex lock;
- int read_depth;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
diff --git a/kernel/rt.c b/kernel/rt.c
index 5d17727..bb72347 100644
--- a/kernel/rt.c
+++ b/kernel/rt.c
@@ -316,10 +316,8 @@ EXPORT_SYMBOL(rt_up_write);

void rt_up_read(struct rw_semaphore *rwsem)
{
- if (--rwsem->read_depth == 0) {
- rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
- rt_mutex_unlock(&rwsem->lock);
- }
+ rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
+ rt_mutex_unlock(&rwsem->lock);
}
EXPORT_SYMBOL(rt_up_read);

@@ -330,7 +328,6 @@ EXPORT_SYMBOL(rt_up_read);
void rt_downgrade_write(struct rw_semaphore *rwsem)
{
BUG_ON(rt_mutex_owner(&rwsem->lock) != current);
- rwsem->read_depth = 1;
}
EXPORT_SYMBOL(rt_downgrade_write);

@@ -367,37 +364,20 @@ void rt_down_write_nested_lock(struct rw_semaphore *rwsem,

int rt_down_read_trylock(struct rw_semaphore *rwsem)
{
- struct rt_mutex *lock = &rwsem->lock;
- int ret = 1;
-
- /*
- * recursive read locks succeed when current owns the rwsem,
- * but not when read_depth == 0 which means that the rwsem is
- * write locked.
- */
- if (rt_mutex_owner(lock) != current) {
- ret = rt_mutex_trylock(&rwsem->lock);
- if (ret)
- rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
- } else if (!rwsem->read_depth) {
- ret = 0;
- }
+ int ret;

+ ret = rt_mutex_trylock(&rwsem->lock);
if (ret)
- rwsem->read_depth++;
+ rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
+
return ret;
}
EXPORT_SYMBOL(rt_down_read_trylock);

static void __rt_down_read(struct rw_semaphore *rwsem, int subclass)
{
- struct rt_mutex *lock = &rwsem->lock;
-
- if (rt_mutex_owner(lock) != current) {
- rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
- rt_mutex_lock(&rwsem->lock);
- }
- rwsem->read_depth++;
+ rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
+ rt_mutex_lock(&rwsem->lock);
}

void rt_down_read(struct rw_semaphore *rwsem)
@@ -422,7 +402,6 @@ void __rt_rwsem_init(struct rw_semaphore *rwsem, const char *name,
debug_check_no_locks_freed((void *)rwsem, sizeof(*rwsem));
lockdep_init_map(&rwsem->dep_map, name, key, 0);
#endif
- rwsem->read_depth = 0;
rwsem->lock.save_state = 0;
}
EXPORT_SYMBOL(__rt_rwsem_init);
--
1.8.5.3


Subject: Re: [PATCH RT 1/2] rwsem-rt: Do not allow readers to nest

* Steven Rostedt | 2014-04-08 22:47:01 [-0400]:

>From: "Steven Rostedt (Red Hat)" <[email protected]>
>
>The readers of mainline rwsems are not allowed to nest, the rwsems in the
>PREEMPT_RT kernel should not nest either.

I applied this and this is the reason why cpufreq isn't working. What I
see in cpufreq is:
| test.sh-788 [004] ....... 61.416288: store: down_read_try
| test.sh-788 [004] ....... 61.416296: cpufreq_cpu_get: down_read_try
| test.sh-788 [004] ....... 61.416301: cpufreq_cpu_put.part.6: up_read
| test.sh-788 [004] ....... 61.416332: store: up_read

as you see, one code path takes the read path of rw_sema twice.

Looking at the generic implementation, we have:
|#define RWSEM_UNLOCKED_VALUE 0x00000000L
|#define RWSEM_ACTIVE_BIAS 0x00000001L
|#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)

| static inline int __down_read_trylock(struct rw_semaphore *sem)
| {
| long tmp;
|
| while ((tmp = sem->count) >= 0) {
| if (tmp == cmpxchg(&sem->count, tmp,
| tmp + RWSEM_ACTIVE_READ_BIAS)) {
| return 1;
| }
| }
| return 0;
| }

While sem->count is >= 0 we loop and take the semaphore. So we can have
five readers at once. The first writer would set count to a negative
value resulting in trylock failure.

|static inline void __down_read(struct rw_semaphore *sem)
|{
| if (unlikely(atomic_long_inc_return((atomic_long_t*)&sem->count) <= 0))
| rwsem_down_read_failed(sem);
|}

Here the same thing but without cmpxchg(). _If_ after an increment the
value is negative then we take slowpath. Otherwise we have the lock.

I think I'm going to revert this patch. Where is it written that
multiple readers of a RW-semaphore can not nest? According to the code
we can even have multiple readers without nesting (two+ processes may
take a reader lock).

Sebastian

2015-02-18 20:13:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH RT 1/2] rwsem-rt: Do not allow readers to nest

On Wed, 18 Feb 2015 20:57:10 +0100
Sebastian Andrzej Siewior <[email protected]> wrote:

> * Steven Rostedt | 2014-04-08 22:47:01 [-0400]:
>
> >From: "Steven Rostedt (Red Hat)" <[email protected]>
> >
> >The readers of mainline rwsems are not allowed to nest, the rwsems in the
> >PREEMPT_RT kernel should not nest either.
>
> I applied this and this is the reason why cpufreq isn't working. What I
> see in cpufreq is:
> | test.sh-788 [004] ....... 61.416288: store: down_read_try
> | test.sh-788 [004] ....... 61.416296: cpufreq_cpu_get: down_read_try
> | test.sh-788 [004] ....... 61.416301: cpufreq_cpu_put.part.6: up_read
> | test.sh-788 [004] ....... 61.416332: store: up_read
>
> as you see, one code path takes the read path of rw_sema twice.
>
> Looking at the generic implementation, we have:
> |#define RWSEM_UNLOCKED_VALUE 0x00000000L
> |#define RWSEM_ACTIVE_BIAS 0x00000001L
> |#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
>
> | static inline int __down_read_trylock(struct rw_semaphore *sem)
> | {
> | long tmp;
> |
> | while ((tmp = sem->count) >= 0) {
> | if (tmp == cmpxchg(&sem->count, tmp,
> | tmp + RWSEM_ACTIVE_READ_BIAS)) {
> | return 1;
> | }
> | }
> | return 0;
> | }
>
> While sem->count is >= 0 we loop and take the semaphore. So we can have
> five readers at once. The first writer would set count to a negative
> value resulting in trylock failure.
>
> |static inline void __down_read(struct rw_semaphore *sem)
> |{
> | if (unlikely(atomic_long_inc_return((atomic_long_t*)&sem->count) <= 0))
> | rwsem_down_read_failed(sem);
> |}
>
> Here the same thing but without cmpxchg(). _If_ after an increment the
> value is negative then we take slowpath. Otherwise we have the lock.

OK, so I need to make it so it can nest with trylock. I have to look at
the patch again because it has been a while.

>
> I think I'm going to revert this patch. Where is it written that
> multiple readers of a RW-semaphore can not nest? According to the code
> we can even have multiple readers without nesting (two+ processes may
> take a reader lock).

An RW sem must not do two down_read()s on the same lock (it's fine for
a trylock if it has a fail safe for it). The reason is, the second
down_read() will block if there's a writer waiting. Thus you are
guaranteed a deadlock if you have the lock for read, a write comes in
and waits, and you grab the RW sem again, because it will block, and
the writer is waiting for the reader to release. Thus you have a
deadlock.

I'll have to revisit this. I also need to revisit the multi readers
(although Thomas hates it, but he even admitted there's a better way to
do it. Now only if I could remember what that was ;-)

Thanks,

-- Steve

2015-02-20 05:07:58

by Jason Low

[permalink] [raw]
Subject: Re: [PATCH RT 1/2] rwsem-rt: Do not allow readers to nest

On Wed, Feb 18, 2015 at 12:13 PM, Steven Rostedt <[email protected]> wrote:
> On Wed, 18 Feb 2015 20:57:10 +0100
> Sebastian Andrzej Siewior <[email protected]> wrote:
>
>> * Steven Rostedt | 2014-04-08 22:47:01 [-0400]:
>>
>> >From: "Steven Rostedt (Red Hat)" <[email protected]>
>> >
>> >The readers of mainline rwsems are not allowed to nest, the rwsems in the
>> >PREEMPT_RT kernel should not nest either.
>>
>> I applied this and this is the reason why cpufreq isn't working. What I
>> see in cpufreq is:
>> | test.sh-788 [004] ....... 61.416288: store: down_read_try
>> | test.sh-788 [004] ....... 61.416296: cpufreq_cpu_get: down_read_try
>> | test.sh-788 [004] ....... 61.416301: cpufreq_cpu_put.part.6: up_read
>> | test.sh-788 [004] ....... 61.416332: store: up_read
>>
>> as you see, one code path takes the read path of rw_sema twice.
>>
>> Looking at the generic implementation, we have:
>> |#define RWSEM_UNLOCKED_VALUE 0x00000000L
>> |#define RWSEM_ACTIVE_BIAS 0x00000001L
>> |#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
>>
>> | static inline int __down_read_trylock(struct rw_semaphore *sem)
>> | {
>> | long tmp;
>> |
>> | while ((tmp = sem->count) >= 0) {
>> | if (tmp == cmpxchg(&sem->count, tmp,
>> | tmp + RWSEM_ACTIVE_READ_BIAS)) {
>> | return 1;
>> | }
>> | }
>> | return 0;
>> | }
>>
>> While sem->count is >= 0 we loop and take the semaphore. So we can have
>> five readers at once. The first writer would set count to a negative
>> value resulting in trylock failure.
>>
>> |static inline void __down_read(struct rw_semaphore *sem)
>> |{
>> | if (unlikely(atomic_long_inc_return((atomic_long_t*)&sem->count) <= 0))
>> | rwsem_down_read_failed(sem);
>> |}
>>
>> Here the same thing but without cmpxchg(). _If_ after an increment the
>> value is negative then we take slowpath. Otherwise we have the lock.
>
> OK, so I need to make it so it can nest with trylock. I have to look at
> the patch again because it has been a while.

When we reported this a few months ago, Thomas provided the following
patch to fix the issue (which essentially reverted the patch) and
appeared to be agreed on:

https://lkml.org/lkml/2014/11/5/844

Subject: Re: [PATCH RT 1/2] rwsem-rt: Do not allow readers to nest

On 02/18/2015 09:13 PM, Steven Rostedt wrote:

>> Here the same thing but without cmpxchg(). _If_ after an increment the
>> value is negative then we take slowpath. Otherwise we have the lock.
>
> OK, so I need to make it so it can nest with trylock. I have to look at
> the patch again because it has been a while.

I have reverted the patch and can confirm that cpufreq works again.

I did some testing on vanilla and -RT:
- down_read(l) + down_read(l)
this triggers a lockdep warning about a possible deadlock the lock is
obtained.

- down_read(l) + down_read_trylock()
this passes without a warning.

So I think we good now.

> An RW sem must not do two down_read()s on the same lock (it's fine for
> a trylock if it has a fail safe for it). The reason is, the second
> down_read() will block if there's a writer waiting. Thus you are
> guaranteed a deadlock if you have the lock for read, a write comes in
> and waits, and you grab the RW sem again, because it will block, and
> the writer is waiting for the reader to release. Thus you have a
> deadlock.

I fully understand. However nesting is allowed according to the code in
vanilla and now again in -RT. Lockdep complains properly so we should
catch people doing it wrong in both trees.

> I'll have to revisit this. I also need to revisit the multi readers
> (although Thomas hates it, but he even admitted there's a better way to
> do it. Now only if I could remember what that was ;-)

Okay. For now I keep the revert since it looks sane and simple.

>
> Thanks,
>
> -- Steve

Sebastian