I'm getting tons of this, and X fails to start
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_DEBUG_PREEMPT=y
BUG: sleeping function called from invalid context at kernel/rwsem.c:47
in_atomic():1, irqs_disabled():0
no locks held by X/5879.
[<c012bcb1>] down_write+0x15/0x50
[<c01b53af>] do_shmat+0x235/0x3a0
[<c0106be2>] sys_ipc+0x146/0x263
[<c0102892>] sysenter_past_esp+0xa7/0xb5
[<c0102856>] sysenter_past_esp+0x6b/0xb5
=======================
BUG: scheduling while atomic: X/5879/0x00000005
no locks held by X/5879.
irq event stamp: 401318
hardirqs last enabled at (401317): [<c02d96c8>] __mutex_unlock_slowpath+0xb4/0x170
hardirqs last disabled at (401318): [<c010286f>] sysenter_past_esp+0x84/0xb5
softirqs last enabled at (397628): [<c01051ce>] do_softirq+0xa6/0xf9
softirqs last disabled at (397585): [<c01051ce>] do_softirq+0xa6/0xf9
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0106bf7>] sys_ipc+0x15b/0x263
[<c01029ae>] work_resched+0x5/0x31
=======================
note: X[5879] exited with preempt_count 7
BUG: scheduling while atomic: X/5879/0x10000008
no locks held by X/5879.
[<c02d86fe>] __sched_text_start+0x2de/0x384
[<c0114ad5>] __cond_resched+0x21/0x3b
[<c02d8c21>] cond_resched+0x2a/0x31
[<c014c877>] unmap_vmas+0x518/0x560
[<c014f5d6>] exit_mmap+0x6f/0x104
[<c0115d41>] mmput+0x3b/0xbe
[<c011ae16>] do_exit+0x142/0x870
[<c01114cb>] do_page_fault+0x0/0x675
[<c011b569>] do_group_exit+0x25/0x6b
[<c01114cb>] do_page_fault+0x0/0x675
[<c01221c5>] get_signal_to_deliver+0x286/0x439
[<c0120e00>] force_sig_info+0x80/0x8a
[<c01114cb>] do_page_fault+0x0/0x675
[<c0101e0e>] do_notify_resume+0x79/0x6bc
[<c015025a>] vma_link+0x97/0xd5
[<c015025a>] vma_link+0x97/0xd5
[<c0111754>] do_page_fault+0x289/0x675
[<c0102892>] sysenter_past_esp+0xa7/0xb5
[<c01114cb>] do_page_fault+0x0/0x675
[<c01029ed>] work_notifysig+0x13/0x1a
...
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
> no locks held by X/5879.
> [<c012bcb1>] down_write+0x15/0x50
> [<c01b53af>] do_shmat+0x235/0x3a0
> [<c0106be2>] sys_ipc+0x146/0x263
> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> [<c0102856>] sysenter_past_esp+0x6b/0xb5
Someone got their locking imbalanced. Seems that I lost the suitable config
settings to catch that.
Hang about while I do bisection search #800.
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
> no locks held by X/5879.
> [<c012bcb1>] down_write+0x15/0x50
> [<c01b53af>] do_shmat+0x235/0x3a0
> [<c0106be2>] sys_ipc+0x146/0x263
> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> [<c0102856>] sysenter_past_esp+0x6b/0xb5
> =======================
Here's a bug:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}
_
but it still doesn't fix it.
Nadia, please review Documentation/SubmitChecklist. It contains stuff
which would have prevented this.
Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
>
>
>>I'm getting tons of this, and X fails to start
>>
>>CONFIG_SYSVIPC=y
>>CONFIG_SYSVIPC_SYSCTL=y
>># CONFIG_PREEMPT_NONE is not set
>># CONFIG_PREEMPT_VOLUNTARY is not set
>>CONFIG_PREEMPT=y
>>CONFIG_PREEMPT_BKL=y
>>CONFIG_DEBUG_PREEMPT=y
>>
>>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
>>in_atomic():1, irqs_disabled():0
>>no locks held by X/5879.
>> [<c012bcb1>] down_write+0x15/0x50
>> [<c01b53af>] do_shmat+0x235/0x3a0
>> [<c0106be2>] sys_ipc+0x146/0x263
>> [<c0102892>] sysenter_past_esp+0xa7/0xb5
>> [<c0102856>] sysenter_past_esp+0x6b/0xb5
>> =======================
>
>
> Here's a bug:
>
> --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> +++ a/ipc/util.c
> @@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
> -
> + rcu_read_unlock();
> return out;
> }
>
Andrew,
Actually, the rcu_read_lock() is released in ipc_unlock(). So I think we
shouldn't add an rcu_read_unlock() before leaving ipc_lock().
This is a part that has not changed since the ref code.
I'm looking also on my side.
Regards,
Nadia
On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
> I'm getting tons of this, and X fails to start
>
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> in_atomic():1, irqs_disabled():0
OK, this fixes the locking here:
--- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
+++ a/ipc/util.c
@@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, struc
spin_lock_init(&new->lock);
new->deleted = 0;
- rcu_read_lock();
spin_lock(&new->lock);
return id;
}
@@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
-
+ rcu_read_unlock();
return out;
}
_
But I'm not very confident in what's happening in there. The patches take
away a _lot_ of the RCU locking, and it's unclear what's protecting the idr
tree. Is it rcu, or is it the mutex? The code seems confused and the
comments are all wrong.
On Tue, Sep 18, 2007 at 03:27:27AM -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
>
> > I'm getting tons of this, and X fails to start
> >
> > CONFIG_SYSVIPC=y
> > CONFIG_SYSVIPC_SYSCTL=y
> > # CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > CONFIG_PREEMPT=y
> > CONFIG_PREEMPT_BKL=y
> > CONFIG_DEBUG_PREEMPT=y
> >
> > BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> > in_atomic():1, irqs_disabled():0
>
> OK, this fixes the locking here:
Ditto. Thanks!
[puts mental note to read IPC patches]
On Tue, 18 Sep 2007 12:30:52 +0200 Nadia Derbey <[email protected]> wrote:
> Andrew Morton wrote:
> > On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
> >
> >
> >>I'm getting tons of this, and X fails to start
> >>
> >>CONFIG_SYSVIPC=y
> >>CONFIG_SYSVIPC_SYSCTL=y
> >># CONFIG_PREEMPT_NONE is not set
> >># CONFIG_PREEMPT_VOLUNTARY is not set
> >>CONFIG_PREEMPT=y
> >>CONFIG_PREEMPT_BKL=y
> >>CONFIG_DEBUG_PREEMPT=y
> >>
> >>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
> >>in_atomic():1, irqs_disabled():0
> >>no locks held by X/5879.
> >> [<c012bcb1>] down_write+0x15/0x50
> >> [<c01b53af>] do_shmat+0x235/0x3a0
> >> [<c0106be2>] sys_ipc+0x146/0x263
> >> [<c0102892>] sysenter_past_esp+0xa7/0xb5
> >> [<c0102856>] sysenter_past_esp+0x6b/0xb5
> >> =======================
> >
> >
> > Here's a bug:
> >
> > --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> > +++ a/ipc/util.c
> > @@ -691,7 +691,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> > rcu_read_unlock();
> > return ERR_PTR(-EINVAL);
> > }
> > -
> > + rcu_read_unlock();
> > return out;
> > }
> >
>
> Andrew,
>
> Actually the rcu_read_lock is released in ipc_unlock().
I agree it should be, but it isn't.
> So I think we
> shouldn't add an rcu_read_unlock() before leaving ipc_lock().
> This is a part that has not changed since the ref code.
>
Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
a bit dirty and we might choose to not do that.
Would be interested in knowing the locking rules in there. For example,
this:
/**
* ipc_findkey - find a key in an ipc identifier set
* @ids: Identifier set
* @key: The key to find
*
* Requires ipc_ids.mutex locked.
* Returns the LOCKED pointer to the ipc structure if found or NULL
* if not.
* If key is found ipc contains its ipc structure
*/
appears to be hopelessly out of date?
Andrew Morton wrote:
> On Tue, 18 Sep 2007 13:17:28 +0400 Alexey Dobriyan <[email protected]> wrote:
>
>
>>I'm getting tons of this, and X fails to start
>>
>>CONFIG_SYSVIPC=y
>>CONFIG_SYSVIPC_SYSCTL=y
>># CONFIG_PREEMPT_NONE is not set
>># CONFIG_PREEMPT_VOLUNTARY is not set
>>CONFIG_PREEMPT=y
>>CONFIG_PREEMPT_BKL=y
>>CONFIG_DEBUG_PREEMPT=y
>>
>>BUG: sleeping function called from invalid context at kernel/rwsem.c:47
>>in_atomic():1, irqs_disabled():0
>
>
> OK, this fixes the locking here:
>
> --- a/ipc/util.c~ipc-integrate-ipc_checkid-into-ipc_lock-fix-2
> +++ a/ipc/util.c
> @@ -295,7 +295,6 @@ int ipc_addid(struct ipc_ids* ids, struc
>
> spin_lock_init(&new->lock);
> new->deleted = 0;
> - rcu_read_lock();
> spin_lock(&new->lock);
> return id;
> }
> @@ -691,7 +690,7 @@ struct kern_ipc_perm *ipc_lock(struct ip
> rcu_read_unlock();
> return ERR_PTR(-EINVAL);
> }
> -
> + rcu_read_unlock();
> return out;
> }
>
Well, reviewing the code I found another place where the
rcu_read_unlock() was missing.
I'm so sorry for the inconvenience. It's true that I should have tested
with CONFIG_PREEMPT=y :-(
Now, the ltp tests pass even with this option set...
In the attachment you'll find a patch that
1) adds the missing rcu_read_unlock()
2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
exactly as it was done in the ref code.
Regards,
Nadia
On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> <[email protected]> wrote:
>
> > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > a bit dirty and we might choose to not do that.
>
> Not true for the preemptible-rcu work. All such sites should be fixed,
> or at the very least heavily annotated.
What he said!!!
Thanx, Paul
On Tue, 18 Sep 2007 16:55:19 +0200 Nadia Derbey <[email protected]> wrote:
> This patch fixes the missing rcu_read_(un)lock in the ipc code
Thanks. Could you please check the code comments in ipc/util.c for
accuracy and completeness sometime?
On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <[email protected]> wrote:
> On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > <[email protected]> wrote:
> >
> > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > a bit dirty and we might choose to not do that.
> >
> > Not true for the preemptible-rcu work. All such sites should be fixed,
> > or at the very least heavily annotated.
>
> What he said!!!
>
What he said!
How are you going to find all such sites?
On Tue, Sep 18, 2007 at 09:57:15AM -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <[email protected]> wrote:
>
> > On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > > <[email protected]> wrote:
> > >
> > > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > > a bit dirty and we might choose to not do that.
> > >
> > > Not true for the preemptible-rcu work. All such sites should be fixed,
> > > or at the very least heavily annotated.
> >
> > What he said!!!
> >
>
> What he said!
>
> How are you going to find all such sites?
A first step would be to look for patches in -rt that add rcu_read_lock()
or friends. A second step would be to look for rcu_dereference() without
an enclosing rcu_read_lock(), rcu_read_lock_bh(), or preempt_disable().
Beyond a certain point, this reduces to the problem of finding places
where spin_lock() was omitted, right?
Thanx, Paul
On Tue, 18 Sep 2007 11:29:24 -0700 "Paul E. McKenney"
<[email protected]> wrote:
> On Tue, Sep 18, 2007 at 09:57:15AM -0700, Andrew Morton wrote:
> > On Tue, 18 Sep 2007 09:13:37 -0700 "Paul E. McKenney" <[email protected]> wrote:
> >
> > > On Tue, Sep 18, 2007 at 02:24:51PM +0200, Peter Zijlstra wrote:
> > > > On Tue, 18 Sep 2007 03:34:00 -0700 Andrew Morton
> > > > <[email protected]> wrote:
> > > >
> > > > > Well, it was an optimisation. spin_lock() implies rcu_read_lock(). That's
> > > > > a bit dirty and we might choose to not do that.
> > > >
> > > > Not true for the preemptible-rcu work. All such sites should be fixed,
> > > > or at the very least heavily annotated.
> > >
> > > What he said!!!
> > >
> >
> > What he said!
> >
> > How are you going to find all such sites?
>
> A first step would be to look for patches in -rt that add rcu_read_lock()
> or friends. A second step would be to look for rcu_dereference() without
> an enclosing rcu_read_lock(), rcu_read_lock_bh(), or preempt_disable().
I could perhaps do that with the rcu_read_lock lockdep annotation, if
I add a check for holding rcu_read_lock in rcu_dereference().
warn when rcu_dereference() is used outside of rcu_read_lock()
[ generates a _lot_ of output when booted ]
Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/lockdep.h | 3 ++
include/linux/rcupdate.h | 5 ++++
kernel/lockdep.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -303,6 +303,8 @@ extern void lock_acquire(struct lockdep_
extern void lock_release(struct lockdep_map *lock, int nested,
unsigned long ip);
+extern int lock_is_held(struct lockdep_map *lock);
+
# define INIT_LOCKDEP .lockdep_recursion = 0,
#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
@@ -319,6 +321,7 @@ static inline void lockdep_on(void)
# define lock_acquire(l, s, t, r, c, i) do { } while (0)
# define lock_release(l, n, i) do { } while (0)
+# define lock_is_held(l) (0)
# define lockdep_init() do { } while (0)
# define lockdep_info() do { } while (0)
# define lockdep_init_map(lock, name, key, sub) do { (void)(key); } while (0)
Index: linux-2.6/include/linux/rcupdate.h
===================================================================
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -138,9 +138,11 @@ extern int rcu_needs_cpu(int cpu);
extern struct lockdep_map rcu_lock_map;
# define rcu_read_acquire() lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+# define rcu_read_held() WARN_ON_ONCE(!lock_is_held(&rcu_lock_map))
#else
# define rcu_read_acquire() do { } while (0)
# define rcu_read_release() do { } while (0)
+# define rcu_read_held() do { } while (0)
#endif
/**
@@ -216,6 +218,7 @@ extern struct lockdep_map rcu_lock_map;
do { \
local_bh_disable(); \
__acquire(RCU_BH); \
+ rcu_read_acquire(); \
} while(0)
/*
@@ -225,6 +228,7 @@ extern struct lockdep_map rcu_lock_map;
*/
#define rcu_read_unlock_bh() \
do { \
+ rcu_read_release(); \
__release(RCU_BH); \
local_bh_enable(); \
} while(0)
@@ -254,6 +258,7 @@ extern struct lockdep_map rcu_lock_map;
#define rcu_dereference(p) ({ \
typeof(p) _________p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); \
+ rcu_read_held(); \
(_________p1); \
})
Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c
+++ linux-2.6/kernel/lockdep.c
@@ -2624,6 +2624,36 @@ static int lock_release_nested(struct ta
return 1;
}
+static int __lock_is_held(struct lockdep_map *lock)
+{
+ struct task_struct *curr = current;
+ struct held_lock *hlock, *prev_hlock;
+ unsigned int depth;
+ int i;
+
+ /*
+ * Check whether the lock exists in the current stack
+ * of held locks:
+ */
+ depth = curr->lockdep_depth;
+ if (DEBUG_LOCKS_WARN_ON(!depth))
+ return 0;
+
+ prev_hlock = NULL;
+ for (i = depth-1; i >= 0; i--) {
+ hlock = curr->held_locks + i;
+ /*
+ * We must not cross into another context:
+ */
+ if (prev_hlock && prev_hlock->irq_context != hlock->irq_context)
+ break;
+ if (hlock->instance == lock)
+ return 1;
+ prev_hlock = hlock;
+ }
+ return 0;
+}
+
/*
* Remove the lock to the list of currently held locks - this gets
* called on mutex_unlock()/spin_unlock*() (or on a failed
@@ -2727,6 +2757,29 @@ void lock_release(struct lockdep_map *lo
EXPORT_SYMBOL_GPL(lock_release);
+int lock_is_held(struct lockdep_map *lock)
+{
+ int ret = 0;
+ unsigned long flags;
+
+ if (unlikely(!lock_stat && !prove_locking))
+ return 0;
+
+ if (unlikely(current->lockdep_recursion))
+ return -EBUSY;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ ret = __lock_is_held(lock);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(lock_is_held);
+
#ifdef CONFIG_LOCK_STAT
static int
print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
Lockdep annotate rcu_read_lock() so that it will find unbalanced
locking.
Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/lockdep.h | 7 +++++++
include/linux/rcupdate.h | 12 ++++++++++++
kernel/rcupdate.c | 8 ++++++++
3 files changed, 27 insertions(+)
Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
struct lock_class_key *key, int subclass);
/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), }
+
+/*
* Reinitialize a lock key - for cases where there is special locking or
* special initialization of locks so that the validator gets the scope
* of dependencies wrong: they are either too broad (they need a class-split)
Index: linux-2.6/include/linux/rcupdate.h
===================================================================
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -41,6 +41,7 @@
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
+#include <linux/lockdep.h>
/**
* struct rcu_head - callback structure for use with RCU
@@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int
extern int rcu_pending(int cpu);
extern int rcu_needs_cpu(int cpu);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern struct lockdep_map rcu_lock_map;
+# define rcu_read_acquire() lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
+# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+#else
+# define rcu_read_acquire() do { } while (0)
+# define rcu_read_release() do { } while (0)
+#endif
+
/**
* rcu_read_lock - mark the beginning of an RCU read-side critical section.
*
@@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
do { \
preempt_disable(); \
__acquire(RCU); \
+ rcu_read_acquire(); \
} while(0)
/**
@@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
*/
#define rcu_read_unlock() \
do { \
+ rcu_read_release(); \
__release(RCU); \
preempt_enable(); \
} while(0)
Index: linux-2.6/kernel/rcupdate.c
===================================================================
--- linux-2.6.orig/kernel/rcupdate.c
+++ linux-2.6/kernel/rcupdate.c
@@ -48,6 +48,14 @@
#include <linux/cpu.h>
#include <linux/mutex.h>
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key rcu_lock_key;
+struct lockdep_map rcu_lock_map =
+ STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+
+EXPORT_SYMBOL_GPL(rcu_lock_map);
+#endif
+
/* Definition for rcupdate control block. */
static struct rcu_ctrlblk rcu_ctrlblk = {
.cur = -300,
On Tue, Sep 18, 2007 at 10:27:01PM +0200, Peter Zijlstra wrote:
>
>
> warn when rcu_dereference() is used outside of rcu_read_lock()
Cool!!!
> [ generates a _lot_ of output when booted ]
I bet! If you create an annotation for rcu_read_lock_bh()
and rcu_read_unlock_bh() like you did for rcu_read_lock() and
rcu_read_unlock(), I suspect that much of the noise will go away.
Of course, preempt_disable() / preempt_enable() is a bit of a two-edged
sword here. It is OK to do rcu_dereference() under explicit
preempt_disable(), but not OK under the implicit preempt_disable()
implied by spin_lock() in CONFIG_PREEMPT. One way to handle this
would be to have _rcu() wrappers for preempt_disable() and
preempt_enable() that are annotated.
There is probably a better way... Can't think of one right off, though...
Thanx, Paul
On 18-09-2007 16:55, Nadia Derbey wrote:
...
>
> Well, reviewing the code I found another place where the
> rcu_read_unlock() was missing.
> I'm so sorry for the inconvenience. It's true that I should have tested
> with CONFIG_PREEMPT=y :-(
> Now, the ltp tests pass even with this option set...
>
> In the attachment you'll find a patch that
> 1) adds the missing rcu_read_unlock()
> 2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
> taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
> exactly as it was done in the ref code.
BTW, probably I miss something, but I wonder, how this RCU is working
here. E.g. in msg.c do_msgsnd() there is:
msq = msg_lock_check(ns, msqid);
...
msg_unlock(msq);
schedule();
ipc_lock_by_ptr(&msq->q_perm);
Since msg_lock_check() gets msq with ipc_lock_check() under
rcu_read_lock(), and then msg_unlock(msq) (i.e. ipc_unlock()) does
rcu_read_unlock(), is it still valid to use msq with
ipc_lock_by_ptr()?
Regards,
Jarek P.
Jarek Poplawski wrote:
> On 18-09-2007 16:55, Nadia Derbey wrote:
> ...
>
>>Well, reviewing the code I found another place where the
>>rcu_read_unlock() was missing.
>>I'm so sorry for the inconvenience. It's true that I should have tested
>>with CONFIG_PREEMPT=y :-(
>>Now, the ltp tests pass even with this option set...
>>
>>In the attachment you'll find a patch that
>>1) adds the missing rcu_read_unlock()
>>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
>>exactly as it was done in the ref code.
>
>
> BTW, probably I miss something, but I wonder, how this RCU is working
> here. E.g. in msg.c do_msgsnd() there is:
>
> msq = msg_lock_check(ns, msqid);
> ...
>
> msg_unlock(msq);
> schedule();
>
> ipc_lock_by_ptr(&msq->q_perm);
>
> Since msq_lock_check() gets msq with ipc_lock_check() under
> rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
> with rcu_read_unlock(), is it valid to use this with
> ipc_lock_by_ptr() yet?
Before calling msg_unlock() they call ipc_rcu_getref(), which increments a
refcount in the rcu header for the msg structure. This guarantees that
the structure won't be freed before they relock it. Once the
structure is relocked by ipc_lock_by_ptr(), they do the symmetric
operation, i.e. ipc_rcu_putref().
Regards,
Nadia
On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
> >On 18-09-2007 16:55, Nadia Derbey wrote:
> >...
> >
> >>Well, reviewing the code I found another place where the
> >>rcu_read_unlock() was missing.
> >>I'm so sorry for the inconvenience. It's true that I should have tested
> >>with CONFIG_PREEMPT=y :-(
> >>Now, the ltp tests pass even with this option set...
> >>
> >>In the attachment you'll find a patch that
> >>1) adds the missing rcu_read_unlock()
> >>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
> >>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
> >>exactly as it was done in the ref code.
> >
> >
> >BTW, probably I miss something, but I wonder, how this RCU is working
> >here. E.g. in msg.c do_msgsnd() there is:
> >
> >msq = msg_lock_check(ns, msqid);
> >...
> >
> >msg_unlock(msq);
> >schedule();
> >
> >ipc_lock_by_ptr(&msq->q_perm);
> >
> >Since msg_lock_check() gets msq with ipc_lock_check() under
> >rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
> >with rcu_read_unlock(), is it valid to use this with
> >ipc_lock_by_ptr() yet?
>
> Before calling msg_unlock() they call ipc_rcu_getref(), which increments a
> refcount in the rcu header for the msg structure. This guarantees that
> the structure won't be freed before they relock it. Once the
> structure is relocked by ipc_lock_by_ptr(), they do the symmetric
> operation, i.e. ipc_rcu_putref().
Yes, I found this later too - sorry for bothering. I was misled
by code like this:
struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
struct kern_ipc_perm *out;
int lid = ipcid_to_idx(id);
rcu_read_lock();
out = idr_find(&ids->ipcs_idr, lid);
if (out == NULL) {
rcu_read_unlock();
return ERR_PTR(-EINVAL);
}
which seems to suggest "out" is an RCU protected pointer, so, I
thought these refcounts were for something else. But, after looking
at how it's used it turns out to be ~90% wrong: probably 9 out of 10
places use refcounting around this, so these rcu_read_locks() don't
work here at all. So, probably I miss something again, but IMHO,
these rcu_read_locks/unlocks could be removed here or in
ipc_lock_by_ptr() and it should be enough to use them directly, where
really needed, e.g., in msg.c do_msgrcv().
BTW, I've found this comment, which, at least for me, explains very
good, what's going on here:
/* Lockless receive, part 3:
* Acquire the queue spinlock.
*/
ipc_lock_by_ptr(&msq->q_perm);
Thanks,
Jarek P.
On Thu, Sep 20, 2007 at 09:28:21AM +0200, Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
...
> > Before calling msg_unlock() they call ipc_rcu_getref(), which increments a
> > refcount in the rcu header for the msg structure. This guarantees that
> > the structure won't be freed before they relock it. Once the
> > structure is relocked by ipc_lock_by_ptr(), they do the symmetric
> > operation, i.e. ipc_rcu_putref().
...
> which seems to suggest "out" is an RCU protected pointer, so, I
> thought these refcounts were for something else. But, after looking
> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> places use refcounting around this, so these rcu_read_locks() don't
> work here at all. So, probably I miss something again, but IMHO,
> these rcu_read_locks/unlocks could be removed here or in
...
...So I missed it again: after all, this RCU protection does work before
and after the refcounting.
Sorry,
Jarek P.
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
>>Jarek Poplawski wrote:
>>
>>>On 18-09-2007 16:55, Nadia Derbey wrote:
>>>...
>>>
>>>
>>>>Well, reviewing the code I found another place where the
>>>>rcu_read_unlock() was missing.
>>>>I'm so sorry for the inconvenience. It's true that I should have tested
>>>>with CONFIG_PREEMPT=y :-(
>>>>Now, the ltp tests pass even with this option set...
>>>>
>>>>In the attachment you'll find a patch that
>>>>1) adds the missing rcu_read_unlock()
>>>>2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>>>taken in ipc_lock() / ipc_lock_by_ptr() and released in ipc_unlock(),
>>>>exactly as it was done in the ref code.
>>>
>>>
>>>BTW, probably I miss something, but I wonder, how this RCU is working
>>>here. E.g. in msg.c do_msgsnd() there is:
>>>
>>>msq = msg_lock_check(ns, msqid);
>>>...
>>>
>>>msg_unlock(msq);
>>>schedule();
>>>
>>>ipc_lock_by_ptr(&msq->q_perm);
>>>
>>>Since msg_lock_check() gets msq with ipc_lock_check() under
>>>rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
>>>with rcu_read_unlock(), is it valid to use this with
>>>ipc_lock_by_ptr() yet?
>>
>>Before calling msg_unlock() they call ipc_rcu_getref() that increments a
>>refcount in the rcu header for the msg structure. This guarantees that
>>the structure won't be freed before they relock it. Once the
>>structure is relocked by ipc_lock_by_ptr(), they do the symmetric
>>operation, i.e., ipc_rcu_putref().
>
>
> Yes, I've found this later too - sorry for bothering. I was misled
> by code like this:
>
> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
> {
>         struct kern_ipc_perm *out;
>         int lid = ipcid_to_idx(id);
>
>         rcu_read_lock();
>         out = idr_find(&ids->ipcs_idr, lid);
>         if (out == NULL) {
>                 rcu_read_unlock();
>                 return ERR_PTR(-EINVAL);
>         }
>
> which seems to suggest "out" is an RCU protected pointer, so, I
> thought these refcounts were for something else. But, after looking
> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> places use refcounting around this,
Actually, ipc_lock() is called most of the time without the
ipc_ids.mutex held and without refcounting (maybe you didn't look at
msg_lock(), sem_lock() and shm_lock() too).
So I think disabling preemption is needed, isn't it?
> so, these rcu_read_locks() don't
> work here at all. So, probably I miss something again, but IMHO,
> these rcu_read_locks/unlocks could be removed here or in
> ipc_lock_by_ptr() and it should be enough to use them directly, where
> really needed, e.g., in msg.c do_msgrcv().
>
I have to check for the ipc_lock_by_ptr(): maybe you're right!
Regards,
Nadia
Nadia Derbey wrote:
> Jarek Poplawski wrote:
>
>> On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>>
>>> Jarek Poplawski wrote:
>>>
>>>> On 18-09-2007 16:55, Nadia Derbey wrote:
>>>> ...
>>>>
>>>>
>>>>> Well, reviewing the code I found another place where the
>>>>> rcu_read_unlock() was missing.
>>>>> I'm so sorry for the inconvenience. It's true that I should have
>>>>> tested with CONFIG_PREEMPT=y :-(
>>>>> Now, the ltp tests pass even with this option set...
>>>>>
>>>>> In attachment you'll find a patch that
>>>>> 1) adds the missing rcu_read_unlock()
>>>>> 2) replaces Andrew's fix with a new one: the rcu_read_lock() is now
>>>>> taken in ipc_lock() / ipc_lock_by_ptr() and released in
>>>>> ipc_unlock(), exactly as it was done in the ref code.
>>>>
>>>>
>>>>
>>>> BTW, probably I miss something, but I wonder, how this RCU is working
>>>> here. E.g. in msg.c do_msgsnd() there is:
>>>>
>>>> msq = msg_lock_check(ns, msqid);
>>>> ...
>>>>
>>>> msg_unlock(msq);
>>>> schedule();
>>>>
>>>> ipc_lock_by_ptr(&msq->q_perm);
>>>>
>>>> Since msg_lock_check() gets msq with ipc_lock_check() under
>>>> rcu_read_lock(), and then goes msg_unlock(msq) (i.e. ipc_unlock())
>>>> with rcu_read_unlock(), is it valid to use this with
>>>> ipc_lock_by_ptr() yet?
>>>
>>>
>>> Before calling msg_unlock() they call ipc_rcu_getref() that
>>> increments a refcount in the rcu header for the msg structure. This
>>> guarantees that the structure won't be freed before they relock
>>> it. Once the structure is relocked by ipc_lock_by_ptr(), they do the
>>> symmetric operation, i.e., ipc_rcu_putref().
>>
>>
>>
>> Yes, I've found this later too - sorry for bothering. I was misled
>> by code like this:
>>
>> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
>> {
>>         struct kern_ipc_perm *out;
>>         int lid = ipcid_to_idx(id);
>>
>>         rcu_read_lock();
>>         out = idr_find(&ids->ipcs_idr, lid);
>>         if (out == NULL) {
>>                 rcu_read_unlock();
>>                 return ERR_PTR(-EINVAL);
>>         }
>>
>> which seems to suggest "out" is an RCU protected pointer, so, I
>> thought these refcounts were for something else. But, after looking
>> at how it's used it turns out to be ~90% wrong: probably 9 out of 10
>> places use refcounting around this,
>
>
> Actually, ipc_lock() is called most of the time without the
> ipc_ids.mutex held and without refcounting (maybe you didn't look at
> msg_lock(), sem_lock() and shm_lock() too).
> So I think disabling preemption is needed, isn't it?
>
>> so, these rcu_read_locks() don't
>> work here at all. So, probably I miss something again, but IMHO,
>> these rcu_read_locks/unlocks could be removed here or in
>> ipc_lock_by_ptr() and it should be enough to use them directly, where
>> really needed, e.g., in msg.c do_msgrcv().
>>
>
> I have to check for the ipc_lock_by_ptr(): maybe you're right!
>
So, here is the ipc_lock_by_ptr() status:
1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
call it with a refcount held.
==> no rcu read section needed.
2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
ipc_ids mutex lock.
==> no rcu read section needed.
3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
with a refcount held.
==> rcu read section + some more checks needed once the spinlock is
taken.
So I completely agree with you: we might remove the rcu_read_lock() from
ipc_lock_by_ptr() and explicitly call it when needed (actually, it is
already explicitly called in do_msgrcv()).
Regards,
Nadia
On Thu, Sep 20, 2007 at 10:52:43AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >which seems to suggest "out" is an RCU protected pointer, so, I
> >thought these refcounts were for something else. But, after looking
> >at how it's used it turns out to be ~90% wrong: probably 9 out of 10
> >places use refcounting around this,
>
> Actually, ipc_lock() is called most of the time without the
> ipc_ids.mutex held and without refcounting (maybe you didn't look at
> msg_lock(), sem_lock() and shm_lock() too).
Yes, you are 100% right and I'm a 90% liar, 10% blind (maybe backward
too).
> So I think disabling preemption is needed, isn't it?
Do I have to tell the truth...?
>
> >so, these rcu_read_locks() don't
> >work here at all. So, probably I miss something again, but IMHO,
> >these rcu_read_locks/unlocks could be removed here or in
> >ipc_lock_by_ptr() and it should be enough to use them directly, where
> >really needed, e.g., in msg.c do_msgrcv().
> >
>
> I have to check for the ipc_lock_by_ptr(): maybe you're right!
Since this whole locking scheme is really quite a puzzle, and needs
more than one or two looks to figure it out, I'd better stop trying to
discredit myself more.
Anyway, it looks to me like the most sophisticated way of achieving
locklessness I've seen so far. I hope there is still some gain with
this RCU + refcounting vs. simple locking. (It seems somebody had
similar doubts writing the "Lockless receive, part 1:" comment in
do_msgrcv(); and probably again I'm very wrong, but this checking of
the validity of an RCU protected structure with an r_msg value, which
is done to avoid refcounting, looks not very different and has some
cost too.)
Regards,
Jarek P.
On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
...
> So, here is the ipc_lock_by_ptr() status:
> 1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
> call it inside a refcounting.
> ==> no rcu read section needed.
>
> 2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
> ipc_ids mutex lock.
> ==> no rcu read section needed.
>
> 3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
> under refcounting
> ==> rcu read section + some more checks needed once the spinlock is
> taken.
>
> So I completely agree with you: we might remove the rcu_read_lock() from
> the ipc_lock_by_ptr() and explicitly call it when needed (actually, it
> is already explicitly called in do_msgrcv()).
Impossible!
I have to look at this once more in the evening (and try to recall
what I saw yesterday and have forgotten today...).
Cheers,
Jarek P.
On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
> Nadia Derbey wrote:
> >Jarek Poplawski wrote:
> >
> >>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
...
> >Actually, ipc_lock() is called most of the time without the
> >ipc_ids.mutex held and without refcounting (maybe you didn't look for
> >the msg_lock() sem_lock() and shm_lock() too).
> >So I think disabling preemption is needed, isn't it?
> >
> >>so, these rcu_read_locks() don't
> >>work here at all. So, probably I miss something again, but IMHO,
> >>these rcu_read_locks/unlocks could be removed here or in
> >>ipc_lock_by_ptr() and it should be enough to use them directly, where
> >>really needed, e.g., in msg.c do_msgrcv().
> >>
> >
> >I have to check for the ipc_lock_by_ptr(): may be you're right!
> >
>
> So, here is the ipc_lock_by_ptr() status:
> 1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
> call it inside a refcounting.
> ==> no rcu read section needed.
>
> 2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
> ipc_ids mutex lock.
> ==> no rcu read section needed.
>
> 3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
> under refcounting
> ==> rcu read section + some more checks needed once the spinlock is
> taken.
>
> So I completely agree with you: we might remove the rcu_read_lock() from
> the ipc_lock_by_ptr() and explicitly call it when needed (actually, it
> is already explicitly called in do_msgrcv()).
Yes, IMHO, it should be at least more readable when we can see where
this RCU is really needed.
But, after a third look, I have a few more doubts (btw., 3 looks are
still not enough for me with this code, so I certainly can miss many
things here, and, alas, I managed to see the util and msg code only):
1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
but it's probably wrong: they call idr_find() with ipc_ids pointer
which needs this mutex, just like in similar code in: ipc_findkey(),
ipc_get_maxid() or sysvipc_find_ipc().
2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
safe (memory barriers): it's not atomic, so locking is needed, but
e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
3. Probably similar problem is possible with msr_d.r_msg which is
read in do_msgrcv() under rcu_read_lock() only.
Regards,
Jarek P.
Andrew Morton wrote:
> On Tue, 18 Sep 2007 16:55:19 +0200 Nadia Derbey <[email protected]> wrote:
>
>
>>This patch fixes the missing rcu_read_(un)lock in the ipc code
>
>
> Thanks. Could you please check the code comments in ipc/util.c for
> accuracy and completeness sometime?
>
Done - see attachment.
Hope I didn't forget / add some wrong stuff ;-)
BTW this patch also fixes a missing lock in shm_get_stat().
Regards,
Nadia
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
>
>>Nadia Derbey wrote:
>>
>>>Jarek Poplawski wrote:
>>>
>>>
>>>>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
> ...
>
>>>Actually, ipc_lock() is called most of the time without the
>>>ipc_ids.mutex held and without refcounting (maybe you didn't look for
>>>the msg_lock() sem_lock() and shm_lock() too).
>>>So I think disabling preemption is needed, isn't it?
>>>
>>>
>>>>so, these rcu_read_locks() don't
>>>>work here at all. So, probably I miss something again, but IMHO,
>>>>these rcu_read_locks/unlocks could be removed here or in
>>>>ipc_lock_by_ptr() and it should be enough to use them directly, where
>>>>really needed, e.g., in msg.c do_msgrcv().
>>>>
>>>
>>>I have to check for the ipc_lock_by_ptr(): may be you're right!
>>>
>>
>>So, here is the ipc_lock_by_ptr() status:
>>1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
>>call it inside a refcounting.
>> ==> no rcu read section needed.
>>
>>2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
>>ipc_ids mutex lock.
>> ==> no rcu read section needed.
>>
>>3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
>>under refcounting
>> ==> rcu read section + some more checks needed once the spinlock is
>> taken.
>>
>>So I completely agree with you: we might remove the rcu_read_lock() from
>>the ipc_lock_by_ptr() and explicitly call it when needed (actually, it
>>is already explicitly called in do_msgrcv()).
>
>
> Yes, IMHO, it should be at least more readable when we can see where
> this RCU is really needed.
>
> But, after a third look, I have a few more doubts (btw., 3 looks are
> still not enough for me with this code, so I certainly can miss many
> things here, and, alas, I managed to see the util and msg code only):
>
> 1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
> but it's probably wrong: they call idr_find() with ipc_ids pointer
> which needs this mutex, just like in similar code in: ipc_findkey(),
> ipc_get_maxid() or sysvipc_find_ipc().
>
> 2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> safe (memory barriers): it's not atomic, so locking is needed, but
> e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
OK, but freeque(), freeary() and shm_destroy() are special cases:
we have the following path:

mutex_lock(ipc_ids.mutex)
...
ipc_lock(ipcp)
        ... do whatever cleaning is needed ...
        ipc_rmid(ipcp)
ipc_unlock(ipcp)
...
ipc_rcu_putref(ipcp)

Once the rmid has been done, the ipc structure is considered not
visible anymore from the user side ==> any syscall called with the
corresponding id will return invalid.
The only thing that could happen is that this structure be reused for a
newly allocated ipc structure. But this cannot happen either, since we
are under the ipc_ids mutex lock.
Am I wrong?
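The special-case lifetime above can be traced with a small C model. This is a sketch under heavy assumptions: the idr, the ipc_ids mutex and the RCU grace period are all collapsed into a single-threaded model with a plain `ids` array and a nonatomic refcount, so only the visibility/lifetime ordering is illustrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Model of the freeque()-style teardown: once ipc_rmid() has run
 * (under the ipc_ids mutex in the kernel), the id can no longer be
 * looked up, so no new reference can appear, and the final
 * ipc_rcu_putref() frees the object.  Names mirror ipc/util.c but
 * everything here is a simplified stand-in. */

struct ipc_obj {
	int refcount;        /* nonatomic, as in the 2007-era getref/putref */
	bool deleted;
};

#define MAX_IDS 4
static struct ipc_obj *ids[MAX_IDS];   /* stands in for ipc_ids.ipcs_idr */

static struct ipc_obj *ipc_findkey(int id)
{
	return (id >= 0 && id < MAX_IDS) ? ids[id] : NULL;
}

static void ipc_rmid(int id)
{
	ids[id]->deleted = true;   /* the user-visible id dies here */
	ids[id] = NULL;            /* later lookups return NULL */
}

/* Returns true when this put dropped the last reference and freed. */
static bool ipc_rcu_putref(struct ipc_obj *obj)
{
	if (--obj->refcount == 0) {
		free(obj);         /* real code defers this past a grace period */
		return true;
	}
	return false;
}

static int ipc_new(void)
{
	struct ipc_obj *obj = calloc(1, sizeof(*obj));
	obj->refcount = 1;         /* the table's reference */
	for (int id = 0; id < MAX_IDS; id++)
		if (!ids[id]) { ids[id] = obj; return id; }
	free(obj);
	return -1;
}
```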
Answers to the other questions in separate e-mails
Regards,
Nadia
On Fri, Sep 21, 2007 at 12:11:15PM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> >safe (memory barriers): it's not atomic, so locking is needed, but
> >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
>
> OK, but freeque(), freeary() and shm_destroy() are special cases:
> we have the following path:
>
> mutex_lock(ipc_ids.mutex)
> ...
> ipc_lock(ipcp)
> ... do whatever cleaning is needed ...
> ipc_rmid(ipcp)
> ipc_unlock(ipcp)
> ....
> ipc_rcu_putref(ipcp)
>
> Once the rmid has been done the ipc structure is considered as not
> visible anymore from the user side ==> any syscall called with the
> corresponding id will return invalid.
> The only thing that could happen is that this structure be reused for a
> newly allocated ipc structure. But this too cannot happen since we are
> under the ipc_ids mutex lock.
>
> Am I wrong?
I hope not! But then it would probably be another logical trick:
ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
if it's used in do_msgsnd() there should be a risk that something can
do this kfree at this moment, and it seems freeque() is the only one
which both can do this and cares for this refcount. Then, e.g., if
any of them does ipc_rcu_getref() a bit later and sees old (cached)
value - kfree can be skipped forever. On the other hand, if there is
no such possibility because of other reasons, all this refcounting
looks like a code beautifier only...
Thanks,
Jarek P.
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
...
> any of them does ipc_rcu_getref() a bit later and sees old (cached)
Should be:
"any of them does ipc_rcu_putref() a bit later and sees old (cached)"
Sorry,
Jarek P.
On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
...
> I hope not! But, then it would be probably another logical trick:
> ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
> if it's used in do_msgsnd() there should be a risk something can do
> this kfree at this moment, and it seems freeque() is the only one,
> which both: can do this and cares for this refcount. Then, e.g., if
> any of them does ipc_rcu_getref() a bit later and sees old (cached)
> value - kfree can be skipped forever. [...]
After rethinking, this scenario seems to be wrong or very improbable
(I'm not sure of all the ways "if (--container...)" could be compiled),
so there should be no such risk - a double kfree/vfree is more probable,
so no danger. More likely is such a refcount abuse: ipc_rcu_getref() in
do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
pointer acquired by do_msgsnd() before freeque() started); then,
after schedule(), do_msgsnd() can work with a kfreed msg_queue structure
(at least considering classic RCU).
Regards,
Jarek P.
On Mon, Sep 24, 2007 at 08:54:07AM +0200, Jarek Poplawski wrote:
> After rethinking, this scenario seems to be wrong or very improbable
> (I'm not sure of all the ways "if (--container...)" could be compiled),
> so there should be no such risk - a double kfree/vfree is more probable,
> so no danger. More likely is such a refcount abuse: ipc_rcu_getref() in
> do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
> pointer acquired by do_msgsnd() before freeque() started); then,
> after schedule(), do_msgsnd() can work with a kfreed msg_queue structure
> (at least considering classic RCU).
I see this scenario is impossible as well, so you were right,
it's all right at this point.
Jarek P.
Jarek Poplawski wrote:
> On Fri, Sep 21, 2007 at 01:03:47PM +0200, Jarek Poplawski wrote:
> ...
>
>>I hope not! But, then it would be probably another logical trick:
>>ipc_rcu_getref/putref() seems to prevent kfreeing of a structure, so
>>if it's used in do_msgsnd() there should be a risk something can do
>>this kfree at this moment, and it seems freeque() is the only one,
>>which both: can do this and cares for this refcount. Then, e.g., if
>>any of them does ipc_rcu_getref() a bit later and sees old (cached)
>>value - kfree can be skipped forever. [...]
>
>
> After rethinking, this scenario seems to be wrong or very improbable
> (I'm not sure of all the ways "if (--container...)" could be compiled),
> so there should be no such risk - a double kfree/vfree is more probable,
> so no danger. More likely is such a refcount abuse: ipc_rcu_getref() in
> do_msgsnd() done a bit after ipc_rcu_putref() in freeque() (msq
> pointer acquired by do_msgsnd() before freeque() started); then,
> after schedule(), do_msgsnd() can work with a kfreed msg_queue structure
> (at least considering classic RCU).
>
If msgsnd() acquires the pointer first, it does it under lock +
rcu_getref(). ==> refcount = 2
When schedule() is called, if freeque() takes the pointer it will call
msg_rmid(), which sets the deleted field in the msg queue. When the lock
is released by freeque(), we have either 1) or 2):
1) freeque()'s putref called 1st ==> refcount = 1
Then msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then msgsnd()'s putref is called ==> refcount = 0
But this is done under RCU lock, so there should be no problem
Then the deleted field is checked ==> return
2) msgsnd()'s lock_by_ptr() is called ==> rcu lock
Then we don't mind in which order the other operations are done,
since we are under rcu_lock: the structure won't disappear till we test
the deleted field.
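Ordering 1) can be checked with a minimal model. This is an illustration only: a plain counter replaces ipc_rcu_getref/putref(), and a `freed` counter replaces the RCU-deferred kfree(), so the lifetime is observable; the locks and schedule() are left out of the single-threaded trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the refcount ordering in the do_msgsnd() vs. freeque()
 * race: the sender holds an extra reference across schedule(),
 * freeque() drops the table's reference first, and the sender's own
 * putref is what finally frees the queue - after the sender has
 * rechecked the deleted flag.  All names are simplified stand-ins. */

static int freed;                  /* counts "kfree" calls */

struct msg_queue {
	int refcount;
	bool deleted;
};

static void ipc_rcu_getref(struct msg_queue *msq)
{
	msq->refcount++;
}

static void ipc_rcu_putref(struct msg_queue *msq)
{
	if (--msq->refcount == 0)
		freed++;           /* real code: free after an RCU grace period */
}

static void msg_rmid(struct msg_queue *msq)
{
	msq->deleted = true;       /* relockers must check this and bail out */
}
```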
Regards,
Nadia
Jarek Poplawski wrote:
> On Thu, Sep 20, 2007 at 03:08:42PM +0200, Nadia Derbey wrote:
>
>>Nadia Derbey wrote:
>>
>>>Jarek Poplawski wrote:
>>>
>>>
>>>>On Thu, Sep 20, 2007 at 08:24:58AM +0200, Nadia Derbey wrote:
>
> ...
>
>>>Actually, ipc_lock() is called most of the time without the
>>>ipc_ids.mutex held and without refcounting (maybe you didn't look for
>>>the msg_lock() sem_lock() and shm_lock() too).
>>>So I think disabling preemption is needed, isn't it?
>>>
>>>
>>>>so, these rcu_read_locks() don't
>>>>work here at all. So, probably I miss something again, but IMHO,
>>>>these rcu_read_locks/unlocks could be removed here or in
>>>>ipc_lock_by_ptr() and it should be enough to use them directly, where
>>>>really needed, e.g., in msg.c do_msgrcv().
>>>>
>>>
>>>I have to check for the ipc_lock_by_ptr(): may be you're right!
>>>
>>
>>So, here is the ipc_lock_by_ptr() status:
>>1) do_msgsnd(), semctl_main(GETALL), semctl_main(SETALL) and find_undo()
>>call it inside a refcounting.
>> ==> no rcu read section needed.
>>
>>2) *_exit_ns(), ipc_findkey() and sysvipc_find_ipc() call it under the
>>ipc_ids mutex lock.
>> ==> no rcu read section needed.
>>
>>3) do_msgrcv() is the only path where ipc_lock_by_ptr() is not called
>>under refcounting
>> ==> rcu read section + some more checks needed once the spinlock is
>> taken.
>>
>>So I completely agree with you: we might remove the rcu_read_lock() from
>>the ipc_lock_by_ptr() and explicitly call it when needed (actually, it
>>is already explicitly called in do_msgrcv()).
>
>
> Yes, IMHO, it should be at least more readable when we can see where
> this RCU is really needed.
>
> But, after a third look, I have a few more doubts (btw., 3 looks are
> still not enough for me with this code, so I certainly can miss many
> things here, and, alas, I managed to see the util and msg code only):
>
Jarek,
I'm realizing I didn't give you an answer to issues #1 and #3. Sorry
for that!
> 1. ipc_lock() and ipc_lock_check() are used without ipc_ids.mutex,
> but it's probably wrong: they call idr_find() with ipc_ids pointer
> which needs this mutex, just like in similar code in: ipc_findkey(),
> ipc_get_maxid() or sysvipc_find_ipc().
I think you're completely right: the rcu_read_lock() is not enough in
this case.
I have to solve this issue, while keeping the original way the ipc
developers have done it: I think they didn't want to take the mutex lock
for every single operation. E.g. sending a message to a given message
queue shouldn't prevent creating new message queues.
I'll come up with a solution.
>
> 2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> safe (memory barriers): it's not atomic, so locking is needed, but
> e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
>
> 3. Probably similar problem is possible with msr_d.r_msg which is
> read in do_msgrcv() under rcu_read_lock() only.
>
I think here they have avoided refcounting by using r_msg:
r_msg is initialized to -EAGAIN before releasing the msq lock. If
freeque() is called, it sets r_msg to -EIDRM (see expunge_all(-EIDRM)).
Setting r_msg is always done under the msq lock (expunge_all() /
pipelined_send()).
Since rcu_read_lock() is called right after schedule(), they are sure
the msq pointer is still valid when they re-lock it once a msg is
present in the receive queue.
Please tell me if I'm not clear ;-)
Regards,
Nadia
On Mon, Sep 24, 2007 at 11:50:23AM +0200, Nadia Derbey wrote:
> Jarek Poplawski wrote:
...
> >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> >safe (memory barriers): it's not atomic, so locking is needed, but
> >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
> >
> >3. Probably similar problem is possible with msr_d.r_msg which is
> >read in do_msgrcv() under rcu_read_lock() only.
> >
>
> I think here they have avoided refcounting by using r_msg:
> r_msg is initialized to -EAGAIN before releasing the msq lock. If
> freeque() is called, it sets r_msg to -EIDRM (see expunge_all(-EIDRM)).
> Setting r_msg is always done under the msq lock (expunge_all() /
> pipelined_send()).
> Since rcu_read_lock() is called right after schedule(), they are sure
> the msq pointer is still valid when they re-lock it once a msg is
> present in the receive queue.
>
> Please tell me if I'm not clear ;-)
All clear!
Except... this r_msg is still unclear to me. Since it's read without
the msq lock, I doubt this first check after schedule() is of any
value. A comment, "Lockless receive, part 2", tells about some safety
against a race with pipelined_send() and expunge_all() when in
wake_up_process(), but how can we be sure this r_msg is not just about
to be changed, with this wake_up_process() running, while the "while"
check still sees ERR_PTR(-EAGAIN) instead of NULL?
Thanks & sorry for delay,
Jarek P.
On Tue, Sep 25, 2007 at 01:47:05PM +0200, Jarek Poplawski wrote:
> On Mon, Sep 24, 2007 at 11:50:23AM +0200, Nadia Derbey wrote:
> > Jarek Poplawski wrote:
> ...
> > >2. I'm not sure this refcounting with ipc_rcu_getref/putref is SMP
> > >safe (memory barriers): it's not atomic, so locking is needed, but
> > >e.g. in do_msgsnd() kern_ipc_perm lock is used for this, while
> > >freeque() calls ipc_rcu_putref() with ipc_ids mutex only.
> > >
> > >3. Probably similar problem is possible with msr_d.r_msg which is
> > >read in do_msgrcv() under rcu_read_lock() only.
> > >
> >
> > I think here they have avoided refcounting by using r_msg:
> > r_msg is initialized to -EAGAIN before releasing the msq lock. If
> > freeque() is called, it sets r_msg to -EIDRM (see expunge_all(-EIDRM)).
> > Setting r_msg is always done under the msq lock (expunge_all() /
> > pipelined_send()).
> > Since rcu_read_lock() is called right after schedule(), they are sure
> > the msq pointer is still valid when they re-lock it once a msg is
> > present in the receive queue.
> >
> > Please tell me if I'm not clear ;-)
>
> All clear!
>
> Except... this r_msg is still unclear to me. Since it's read without
> the msq lock, I doubt this first check after schedule() is of any
> value. A comment, "Lockless receive, part 2", tells about some safety
> against a race with pipelined_send() and expunge_all() when in
> wake_up_process(), but how can we be sure this r_msg is not just about
> to be changed, with this wake_up_process() running, while the "while"
> check still sees ERR_PTR(-EAGAIN) instead of NULL?
OOPS!
At last I've found enough time to look at this more closely, plus the
right comment in sem.c, and it looks like it's all right and really
well thought out, with all variants foreseen. So, I withdraw this
problem nr 3 too.
Best regards,
Jarek P.