Hello!
This series provides miscellaneous fixes:
1. Removes the CONFIG_RCU_CPU_STALL_VERBOSE, so that RCU CPU stall
warnings are henceforth unconditionally verbose. (Another RCU
Kconfig variable bites the dust!)
2. Allow 1- and 2-byte smp_load_acquire() and smp_store_release().
This has the side effect of removing support for pre-EV56 Alpha
CPUs, and the official Alpha maintainers have thus far been
silent on this issue.
3. Use rcu_dereference() for accessing struct mapped_device,
courtesy of Pranith Kumar.
4. Annotate the dm_table's ->map field with __rcu, courtesy of
Pranith Kumar.
5. Add sparse checking for use of RCU_INIT_POINTER() on a non-__rcu
pointer, courtesy of Pranith Kumar.
6. Prevent cond_resched_rcu_qs() from doing gratuitous call to
rcu_note_voluntary_context_switch().
7. Update the list of rcu_read_unlock() potential deadlocks, courtesy
of Oleg Nesterov.
8. Bind rcu_tasks_kthread() to housekeeping CPUs in CONFIG_NO_HZ_FULL
builds.
9. Provide lockless_dereference() for times when you want
rcu_dereference(), but without the sparse and lockdep-RCU noise.
Thanx, Paul
------------------------------------------------------------------------
b/Documentation/RCU/stallwarn.txt | 6 --
b/drivers/md/dm.c | 10 ++--
b/include/linux/compiler.h | 2
b/include/linux/rcupdate.h | 24 ++++++++--
b/kernel/rcu/tree_plugin.h | 13 -----
b/kernel/rcu/update.c | 3 -
b/lib/Kconfig.debug | 12 -----
b/tools/testing/selftests/rcutorture/configs/rcu/TREE01 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE02 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE03 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE04 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE05 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE06 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE07 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE08 | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1
b/tools/testing/selftests/rcutorture/configs/rcu/TREE09 | 1
b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 3 -
19 files changed, 30 insertions(+), 54 deletions(-)
From: Pranith Kumar <[email protected]>
Add a sparse check when RCU_INIT_POINTER() is used to assign a non __rcu
annotated pointer.
Signed-off-by: Pranith Kumar <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a4a819ffb2d1..a033d8b55773 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1047,6 +1047,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
*/
#define RCU_INIT_POINTER(p, v) \
do { \
+ rcu_dereference_sparse(p, __rcu); \
p = RCU_INITIALIZER(v); \
} while (0)
--
1.8.1.5
From: Pranith Kumar <[email protected]>
Annotate the map field with __rcu since this is a rcu pointer which is checked
by sparse.
Signed-off-by: Pranith Kumar <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
drivers/md/dm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e7399362722e..3372b8378830 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -140,7 +140,7 @@ struct mapped_device {
* Use dm_get_live_table{_fast} or take suspend_lock for
* dereference.
*/
- struct dm_table *map;
+ struct dm_table __rcu *map;
struct list_head table_devices;
struct mutex table_devices_lock;
--
1.8.1.5
From: "Paul E. McKenney" <[email protected]>
The current implementation of cond_resched_rcu_qs() can invoke
rcu_note_voluntary_context_switch() twice in the should_resched()
case, once via the call to __schedule() and once directly. However, as
noted by Joe Lawrence in a patch to the team subsystem, cond_resched()
returns an indication as to whether or not the call to __schedule()
actually happened. This commit therefore changes cond_resched_rcu_qs()
so as to invoke rcu_note_voluntary_context_switch() only when __schedule()
was not called.
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a033d8b55773..36ea3ba5c516 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -348,8 +348,8 @@ extern struct srcu_struct tasks_rcu_exit_srcu;
*/
#define cond_resched_rcu_qs() \
do { \
- rcu_note_voluntary_context_switch(current); \
- cond_resched(); \
+ if (!cond_resched()) \
+ rcu_note_voluntary_context_switch(current); \
} while (0)
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
--
1.8.1.5
From: "Paul E. McKenney" <[email protected]>
Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.
Reported-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index ae6942a84a0d..423fd0478f8a 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})
+
+/**
* rcu_assign_pointer() - assign to RCU-protected pointer
* @p: pointer to assign to
* @v: value to assign (publish)
--
1.8.1.5
From: Pranith Kumar <[email protected]>
Got Paul's email wrong the first time.
The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
while accessing it.
Signed-off-by: Pranith Kumar <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
drivers/md/dm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 58f3927fd7cc..e7399362722e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2332,7 +2332,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
merge_is_optional = dm_table_merge_is_optional(t);
- old_map = md->map;
+ old_map = rcu_dereference(md->map);
rcu_assign_pointer(md->map, t);
md->immutable_target_type = dm_table_get_immutable_target_type(t);
@@ -2351,7 +2351,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
*/
static struct dm_table *__unbind(struct mapped_device *md)
{
- struct dm_table *map = md->map;
+ struct dm_table *map = rcu_dereference(md->map);
if (!map)
return NULL;
@@ -2745,7 +2745,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
goto out_unlock;
}
- map = md->map;
+ map = rcu_dereference(md->map);
/*
* DMF_NOFLUSH_SUSPENDING must be set before presuspend.
@@ -2839,7 +2839,7 @@ int dm_resume(struct mapped_device *md)
if (!dm_suspended_md(md))
goto out;
- map = md->map;
+ map = rcu_dereference(md->map);
if (!map || !dm_table_get_size(map))
goto out;
--
1.8.1.5
From: "Paul E. McKenney" <[email protected]>
This commit affines rcu_tasks_kthread() to the housekeeping CPUs
in CONFIG_NO_HZ_FULL builds. This is just a default, so systems
administrators are free to put this kthread somewhere else if they wish.
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/update.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 3ef8ba58694e..8a39e68ff8e0 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -531,7 +531,8 @@ static int __noreturn rcu_tasks_kthread(void *arg)
struct rcu_head *next;
LIST_HEAD(rcu_tasks_holdouts);
- /* FIXME: Add housekeeping affinity. */
+ /* Run on housekeeping CPUs by default. Sysadm can move if desired. */
+ housekeeping_affine(current);
/*
* Each pass through the following loop makes one check for
--
1.8.1.5
From: Oleg Nesterov <[email protected]>
The comment above rcu_read_unlock() explains the potential deadlock
if the caller holds one of the locks taken by rt_mutex_unlock() paths,
but it is not clear from this documentation that any lock which can
be taken from interrupt can lead to deadlock as well and we need to
take rt_mutex_lock() into account too.
The problem is that rt_mutex_lock() takes wait_lock without disabling
irqs, and thus an interrupt taking some LOCK can obviously race with
rcu_read_unlock_special() called with the same LOCK held.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 36ea3ba5c516..ae6942a84a0d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -887,7 +887,9 @@ static inline void rcu_read_lock(void)
* Unfortunately, this function acquires the scheduler's runqueue and
* priority-inheritance spinlocks. This means that deadlock could result
* if the caller of rcu_read_unlock() already holds one of these locks or
- * any lock that is ever acquired while holding them.
+ * any lock that is ever acquired while holding them; or any lock which
+ * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
+ * does not disable irqs while taking ->wait_lock.
*
* That said, RCU readers are never priority boosted unless they were
* preempted. Therefore, one way to avoid deadlock is to make sure
--
1.8.1.5
From: "Paul E. McKenney" <[email protected]>
The CONFIG_RCU_CPU_STALL_VERBOSE Kconfig parameter causes preemptible
RCU's CPU stall warnings to dump out any preempted tasks that are blocking
the current RCU grace period. This information is useful, and the default
has been CONFIG_RCU_CPU_STALL_VERBOSE=y for some years. It is therefore
time for this commit to remove this Kconfig parameter, so that future
kernel builds will always act as if CONFIG_RCU_CPU_STALL_VERBOSE=y.
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/stallwarn.txt | 6 ------
kernel/rcu/tree_plugin.h | 13 -------------
lib/Kconfig.debug | 12 ------------
tools/testing/selftests/rcutorture/configs/rcu/TREE01 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE02 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE02-T | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE03 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE04 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE05 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE06 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE07 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE08 | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE08-T | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TREE09 | 1 -
tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 3 +--
15 files changed, 1 insertion(+), 44 deletions(-)
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index ef5a2fd4ff70..1560efbcc759 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -26,12 +26,6 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
Stall-warning messages may be enabled and disabled completely via
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
-CONFIG_RCU_CPU_STALL_VERBOSE
-
- This kernel configuration parameter causes the stall warning to
- also dump the stacks of any tasks that are blocking the current
- RCU-preempt grace period.
-
CONFIG_RCU_CPU_STALL_INFO
This kernel configuration parameter causes the stall warning to
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c1d7f27bd38f..d062f4d6f037 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -72,9 +72,6 @@ static void __init rcu_bootup_announce_oddness(void)
#ifdef CONFIG_RCU_TORTURE_TEST_RUNNABLE
pr_info("\tRCU torture testing starts during boot.\n");
#endif
-#if defined(CONFIG_TREE_PREEMPT_RCU) && !defined(CONFIG_RCU_CPU_STALL_VERBOSE)
- pr_info("\tDump stacks of tasks blocking RCU-preempt GP.\n");
-#endif
#if defined(CONFIG_RCU_CPU_STALL_INFO)
pr_info("\tAdditional per-CPU info printed with stalls.\n");
#endif
@@ -415,8 +412,6 @@ void rcu_read_unlock_special(struct task_struct *t)
}
}
-#ifdef CONFIG_RCU_CPU_STALL_VERBOSE
-
/*
* Dump detailed information for all tasks blocking the current RCU
* grace period on the specified rcu_node structure.
@@ -451,14 +446,6 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp)
rcu_print_detail_task_stall_rnp(rnp);
}
-#else /* #ifdef CONFIG_RCU_CPU_STALL_VERBOSE */
-
-static void rcu_print_detail_task_stall(struct rcu_state *rsp)
-{
-}
-
-#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_VERBOSE */
-
#ifdef CONFIG_RCU_CPU_STALL_INFO
static void rcu_print_task_stall_begin(struct rcu_node *rnp)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4e35a5d767ed..04e54cd3e948 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1238,18 +1238,6 @@ config RCU_CPU_STALL_TIMEOUT
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
-config RCU_CPU_STALL_VERBOSE
- bool "Print additional per-task information for RCU_CPU_STALL_DETECTOR"
- depends on TREE_PREEMPT_RCU
- default y
- help
- This option causes RCU to printk detailed per-task information
- for any tasks that are stalling the current RCU grace period.
-
- Say N if you are unsure.
-
- Say Y if you want to enable such checks.
-
config RCU_CPU_STALL_INFO
bool "Print additional diagnostics on RCU CPU stall"
depends on (TREE_RCU || TREE_PREEMPT_RCU) && DEBUG_KERNEL
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
index 38e3895759dd..a04420e7ac9b 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
@@ -14,6 +14,5 @@ CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ZERO=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
index ea119ba2f7d4..a4f8bfbdd71d 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
@@ -19,6 +19,5 @@ CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
index 19cf9485f48a..fe19aaf8f20c 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
@@ -19,6 +19,5 @@ CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
index f4567fb3e332..a2a9a9bcd1cd 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
@@ -15,7 +15,6 @@ CONFIG_RCU_FANOUT_EXACT=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=2
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
index 0a262fbb0c12..0f84db35b36d 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
@@ -19,5 +19,4 @@ CONFIG_RCU_FANOUT_EXACT=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=y
-CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
index 3a06b97e9a73..212e3bfd2b2a 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
@@ -19,5 +19,4 @@ CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE06 b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
index 8f084cca91bf..7eee63b44218 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE06
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
@@ -20,5 +20,4 @@ CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
index 8f1017666aa7..92a97fa97dec 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
@@ -19,5 +19,4 @@ CONFIG_RCU_FANOUT_EXACT=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=y
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
index 69a2e255bf98..316aa6cedce5 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
@@ -19,6 +19,5 @@ CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
index a0f32fb8f17e..55daf08038e8 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
@@ -19,6 +19,5 @@ CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE09 b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
index b7a62a540ad1..13081a87586c 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE09
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
@@ -14,6 +14,5 @@ CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
-CONFIG_RCU_CPU_STALL_VERBOSE=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 3e588db86a17..0dcde4c70945 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -16,8 +16,7 @@ CONFIG_PROVE_LOCKING -- Do all but two, covering CONFIG_PROVE_RCU and not.
CONFIG_PROVE_RCU -- Do all but one under CONFIG_PROVE_LOCKING.
CONFIG_RCU_BOOST -- one of TREE_PREEMPT_RCU.
CONFIG_RCU_BOOST_PRIO -- set to 2 for _BOOST testing.
-CONFIG_RCU_CPU_STALL_INFO -- do one with and without _VERBOSE.
-CONFIG_RCU_CPU_STALL_VERBOSE -- do one with and without _INFO.
+CONFIG_RCU_CPU_STALL_INFO -- Do one.
CONFIG_RCU_FANOUT -- Cover hierarchy as currently, but overlap with others.
CONFIG_RCU_FANOUT_EXACT -- Do one.
CONFIG_RCU_FANOUT_LEAF -- Do one non-default.
--
1.8.1.5
From: "Paul E. McKenney" <[email protected]>
CPUs without single-byte and double-byte loads and stores place some
"interesting" requirements on concurrent code. For example (adapted
from Peter Hurley's test code), suppose we have the following structure:
struct foo {
spinlock_t lock1;
spinlock_t lock2;
char a; /* Protected by lock1. */
char b; /* Protected by lock2. */
};
struct foo *foop;
Of course, it is common (and good) practice to place data protected
by different locks in separate cache lines. However, if the locks are
rarely acquired (for example, only in rare error cases), and there are
a great many instances of the data structure, then memory footprint can
trump false-sharing concerns, so that it can be better to place them in
the same cache cache line as above.
But if the CPU does not support single-byte loads and stores, a store
to foop->a will do a non-atomic read-modify-write operation on foop->b,
which will come as a nasty surprise to someone holding foop->lock2. So we
now require CPUs to support single-byte and double-byte loads and stores.
Therefore, this commit adjusts the definition of __native_word() to allow
these sizes to be used by smp_load_acquire() and smp_store_release().
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index d5ad7b1118fc..934a834ab9f9 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -311,7 +311,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
/* Is this type a native word size -- useful for atomic operations */
#ifndef __native_word
-# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
#endif
/* Compile time object size, -1 for unknown */
--
1.8.1.5
On Tue, Oct 28, 2014 at 03:10:02PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> Although rcu_dereference() and friends can be used in situations where
> object lifetimes are being managed by something other than RCU, the
> resulting sparse and lockdep-RCU noise can be annoying. This commit
> therefore supplies a lockless_dereference(), which provides the
> protection for dereferences without the RCU-related debugging noise.
>
> Reported-by: Al Viro <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---
> +#define lockless_dereference(p) \
> +({ \
> + typeof(p) _________p1 = ACCESS_ONCE(p); \
> + smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> + (_________p1); \
> +})
Should we not have at least a single user along with this?
On Wed, Oct 29, 2014 at 11:57:04AM +0100, Peter Zijlstra wrote:
> On Tue, Oct 28, 2014 at 03:10:02PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > Although rcu_dereference() and friends can be used in situations where
> > object lifetimes are being managed by something other than RCU, the
> > resulting sparse and lockdep-RCU noise can be annoying. This commit
> > therefore supplies a lockless_dereference(), which provides the
> > protection for dereferences without the RCU-related debugging noise.
> >
> > Reported-by: Al Viro <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
>
> > +#define lockless_dereference(p) \
> > +({ \
> > + typeof(p) _________p1 = ACCESS_ONCE(p); \
> > + smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > + (_________p1); \
> > +})
>
> Should we not have at least a single user along with this?
And we do. In fact, Al Viro has pulled this into his vfs.git tree and
so I will be dropping this patch in favor of his.
Thanx, Paul
On 10/29, Paul E. McKenney wrote:
>
> On Wed, Oct 29, 2014 at 11:57:04AM +0100, Peter Zijlstra wrote:
> > On Tue, Oct 28, 2014 at 03:10:02PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <[email protected]>
> > >
> > > Although rcu_dereference() and friends can be used in situations where
> > > object lifetimes are being managed by something other than RCU, the
> > > resulting sparse and lockdep-RCU noise can be annoying. This commit
> > > therefore supplies a lockless_dereference(), which provides the
> > > protection for dereferences without the RCU-related debugging noise.
> > >
> > > Reported-by: Al Viro <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > ---
> >
> > > +#define lockless_dereference(p) \
> > > +({ \
> > > + typeof(p) _________p1 = ACCESS_ONCE(p); \
> > > + smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > + (_________p1); \
> > > +})
> >
> > Should we not have at least a single user along with this?
>
> And we do. In fact, Al Viro has pulled this into his vfs.git tree and
> so I will be dropping this patch in favor of his.
And it seems that most of smp_read_barrier_depends() users can be changed
to use this helper.
Oleg.
On Wed, Oct 29, 2014 at 08:15:19PM +0100, Oleg Nesterov wrote:
> On 10/29, Paul E. McKenney wrote:
> >
> > On Wed, Oct 29, 2014 at 11:57:04AM +0100, Peter Zijlstra wrote:
> > > On Tue, Oct 28, 2014 at 03:10:02PM -0700, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <[email protected]>
> > > >
> > > > Although rcu_dereference() and friends can be used in situations where
> > > > object lifetimes are being managed by something other than RCU, the
> > > > resulting sparse and lockdep-RCU noise can be annoying. This commit
> > > > therefore supplies a lockless_dereference(), which provides the
> > > > protection for dereferences without the RCU-related debugging noise.
> > > >
> > > > Reported-by: Al Viro <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > ---
> > >
> > > > +#define lockless_dereference(p) \
> > > > +({ \
> > > > + typeof(p) _________p1 = ACCESS_ONCE(p); \
> > > > + smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > + (_________p1); \
> > > > +})
> > >
> > > Should we not have at least a single user along with this?
> >
> > And we do. In fact, Al Viro has pulled this into his vfs.git tree and
> > so I will be dropping this patch in favor of his.
>
> And it seems that most of smp_read_barrier_depends() users can be changed
> to use this helper.
Good point! I guess I should have done this some time ago. ;-)
Thanx, Paul
On Tue, Oct 28, 2014 at 03:09:56PM -0700, Paul E. McKenney wrote:
> From: Pranith Kumar <[email protected]>
>
> Got Paul's email wrong the first time.
>
> The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
> while accessing it.
>
> Signed-off-by: Pranith Kumar <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
On current -next I see this:
[ 6.388264] ===============================
[ 6.389571] [ INFO: suspicious RCU usage. ]
[ 6.390869] 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2 Not tainted
[ 6.392185] -------------------------------
[ 6.393479] /home/kas/git/public/linux/drivers/md/dm.c:2853 suspicious rcu_dereference_check() usage!
[ 6.394801]
other info that might help us debug this:
[ 6.398714]
rcu_scheduler_active = 1, debug_locks = 0
[ 6.401247] 1 lock held by cryptsetup/159:
[ 6.402522] #0: (&md->suspend_lock/1){+.+...}, at: [<ffffffff8179dd1d>] dm_suspend+0x3d/0x140
[ 6.403848]
stack backtrace:
[ 6.406448] CPU: 3 PID: 159 Comm: cryptsetup Not tainted 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2
[ 6.407726] Hardware name: LENOVO 3460CC6/3460CC6, BIOS G6ET93WW (2.53 ) 02/04/2013
[ 6.408982] 0000000000000001 ffff8800d3ac7c38 ffffffff81b00bbd 0000000000000011
[ 6.410249] ffff8800d301a560 ffff8800d3ac7c68 ffffffff81153be7 ffff8800d3928800
[ 6.411548] ffff8800d3928970 ffff8800d3928aa0 0000000000000001 ffff8800d3ac7ca8
[ 6.412780] Call Trace:
[ 6.413980] [<ffffffff81b00bbd>] dump_stack+0x4c/0x6e
[ 6.415178] [<ffffffff81153be7>] lockdep_rcu_suspicious+0xe7/0x120
[ 6.416364] [<ffffffff8179de1d>] dm_suspend+0x13d/0x140
[ 6.417535] [<ffffffff817a2d80>] ? table_load+0x340/0x340
[ 6.418749] [<ffffffff817a2f2b>] dev_suspend+0x1ab/0x260
[ 6.419901] [<ffffffff817a2d80>] ? table_load+0x340/0x340
[ 6.421038] [<ffffffff817a3781>] ctl_ioctl+0x251/0x540
[ 6.422164] [<ffffffff812b6415>] ? mntput_no_expire+0x5/0x360
[ 6.423280] [<ffffffff817a3a83>] dm_ctl_ioctl+0x13/0x20
[ 6.424389] [<ffffffff812a6f98>] do_vfs_ioctl+0x308/0x540
[ 6.425515] [<ffffffff81175d2d>] ? rcu_read_lock_held+0x6d/0x70
[ 6.426601] [<ffffffff812b324e>] ? __fget_light+0xbe/0xd0
[ 6.427686] [<ffffffff812a7251>] SyS_ioctl+0x81/0xa0
[ 6.428760] [<ffffffff81b0d6ad>] system_call_fastpath+0x16/0x1b
--
Kirill A. Shutemov
On 11/21/2014 08:31 AM, Kirill A. Shutemov wrote:
> On Tue, Oct 28, 2014 at 03:09:56PM -0700, Paul E. McKenney wrote:
>> From: Pranith Kumar <[email protected]>
>>
>> Got Paul's email wrong the first time.
>>
>> The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
>> while accessing it.
>>
>> Signed-off-by: Pranith Kumar <[email protected]>
>> Signed-off-by: Paul E. McKenney <[email protected]>
>
> On current -next I see this:
>
> [ 6.388264] ===============================
> [ 6.389571] [ INFO: suspicious RCU usage. ]
> [ 6.390869] 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2 Not tainted
> [ 6.392185] -------------------------------
> [ 6.393479] /home/kas/git/public/linux/drivers/md/dm.c:2853 suspicious rcu_dereference_check() usage!
> [ 6.394801]
> other info that might help us debug this:
>
Hi Kirill,
We are dereferencing an RCU pointer with the suspend_lock held which is causing this warning.
Can you please check if the following patch helps? Thanks!
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a0ece87..e584e66 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2830,7 +2830,7 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
*/
int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
{
- struct dm_table *map = NULL;
+ struct dm_table *map = rcu_dereference(md->map);
int r = 0;
retry:
@@ -2850,8 +2850,6 @@ retry:
goto retry;
}
- map = rcu_dereference(md->map);
-
r = __dm_suspend(md, map, suspend_flags, TASK_INTERRUPTIBLE);
if (r)
goto out_unlock;
> [ 6.398714]
> rcu_scheduler_active = 1, debug_locks = 0
> [ 6.401247] 1 lock held by cryptsetup/159:
> [ 6.402522] #0: (&md->suspend_lock/1){+.+...}, at: [<ffffffff8179dd1d>] dm_suspend+0x3d/0x140
> [ 6.403848]
> stack backtrace:
> [ 6.406448] CPU: 3 PID: 159 Comm: cryptsetup Not tainted 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2
> [ 6.407726] Hardware name: LENOVO 3460CC6/3460CC6, BIOS G6ET93WW (2.53 ) 02/04/2013
> [ 6.408982] 0000000000000001 ffff8800d3ac7c38 ffffffff81b00bbd 0000000000000011
> [ 6.410249] ffff8800d301a560 ffff8800d3ac7c68 ffffffff81153be7 ffff8800d3928800
> [ 6.411548] ffff8800d3928970 ffff8800d3928aa0 0000000000000001 ffff8800d3ac7ca8
> [ 6.412780] Call Trace:
> [ 6.413980] [<ffffffff81b00bbd>] dump_stack+0x4c/0x6e
> [ 6.415178] [<ffffffff81153be7>] lockdep_rcu_suspicious+0xe7/0x120
> [ 6.416364] [<ffffffff8179de1d>] dm_suspend+0x13d/0x140
> [ 6.417535] [<ffffffff817a2d80>] ? table_load+0x340/0x340
> [ 6.418749] [<ffffffff817a2f2b>] dev_suspend+0x1ab/0x260
> [ 6.419901] [<ffffffff817a2d80>] ? table_load+0x340/0x340
> [ 6.421038] [<ffffffff817a3781>] ctl_ioctl+0x251/0x540
> [ 6.422164] [<ffffffff812b6415>] ? mntput_no_expire+0x5/0x360
> [ 6.423280] [<ffffffff817a3a83>] dm_ctl_ioctl+0x13/0x20
> [ 6.424389] [<ffffffff812a6f98>] do_vfs_ioctl+0x308/0x540
> [ 6.425515] [<ffffffff81175d2d>] ? rcu_read_lock_held+0x6d/0x70
> [ 6.426601] [<ffffffff812b324e>] ? __fget_light+0xbe/0xd0
> [ 6.427686] [<ffffffff812a7251>] SyS_ioctl+0x81/0xa0
> [ 6.428760] [<ffffffff81b0d6ad>] system_call_fastpath+0x16/0x1b
>
On Fri, Nov 21, 2014 at 09:30:36AM -0500, Pranith Kumar wrote:
> On 11/21/2014 08:31 AM, Kirill A. Shutemov wrote:
> > On Tue, Oct 28, 2014 at 03:09:56PM -0700, Paul E. McKenney wrote:
> >> From: Pranith Kumar <[email protected]>
> >>
> >> Got Paul's email wrong the first time.
> >>
> >> The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
> >> while accessing it.
> >>
> >> Signed-off-by: Pranith Kumar <[email protected]>
> >> Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > On current -next I see this:
> >
> > [ 6.388264] ===============================
> > [ 6.389571] [ INFO: suspicious RCU usage. ]
> > [ 6.390869] 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2 Not tainted
> > [ 6.392185] -------------------------------
> > [ 6.393479] /home/kas/git/public/linux/drivers/md/dm.c:2853 suspicious rcu_dereference_check() usage!
> > [ 6.394801]
> > other info that might help us debug this:
> >
>
> Hi Kirill,
>
> We are dereferencing an RCU pointer with the suspend_lock held which is causing this warning.
>
> Can you please check if the following patch helps? Thanks!
Nope. The same issue.
IIUC, the problem is that you dereference pointer outside rcu_read_lock()
section, not that suspend_lock is held.
--
Kirill A. Shutemov
On Fri, Nov 21, 2014 at 9:58 AM, Kirill A. Shutemov
<[email protected]> wrote:
> On Fri, Nov 21, 2014 at 09:30:36AM -0500, Pranith Kumar wrote:
>> On 11/21/2014 08:31 AM, Kirill A. Shutemov wrote:
>> > On Tue, Oct 28, 2014 at 03:09:56PM -0700, Paul E. McKenney wrote:
>> >> From: Pranith Kumar <[email protected]>
>> >>
>> >> Got Paul's email wrong the first time.
>> >>
>> >> The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference()
>> >> while accessing it.
>> >>
>> >> Signed-off-by: Pranith Kumar <[email protected]>
>> >> Signed-off-by: Paul E. McKenney <[email protected]>
>> >
>> > On current -next I see this:
>> >
>> > [ 6.388264] ===============================
>> > [ 6.389571] [ INFO: suspicious RCU usage. ]
>> > [ 6.390869] 3.18.0-rc5-next-20141121-08303-g44cae4530372 #2 Not tainted
>> > [ 6.392185] -------------------------------
>> > [ 6.393479] /home/kas/git/public/linux/drivers/md/dm.c:2853 suspicious rcu_dereference_check() usage!
>> > [ 6.394801]
>> > other info that might help us debug this:
>> >
>>
>> Hi Kirill,
>>
>> We are dereferencing an RCU pointer with the suspend_lock held which is causing this warning.
>>
>> Can you please check if the following patch helps? Thanks!
>
> Nope. The same issue.
>
> IIUC, the problem is that you dereference pointer outside rcu_read_lock()
> section, not that suspend_lock is held.
>
I am not sure we should be taking rcu_read_lock() there as I am not
sure how long that critical section might last. Can someone who is
more familiar with the code take a look?
I will try to look for a solution too in the mean time.
Thanks!
--
Pranith
On Sun, 2014-11-23 at 07:21 -0500, Pranith Kumar wrote:
> I am not sure we should be taking rcu_read_lock() there as I am not
> sure how long that critical section might last. Can someone who is
> more familiar with the code take a look?
>
> I will try to look for a solution too in the mean time.
I am sending a fix.
From: Eric Dumazet <[email protected]>
rcu_dereference() should be used in sections protected by rcu_read_lock.
For writers, holding some kind of mutex or lock,
rcu_dereference_protected() is the way to go, adding explicit lockdep
bits.
In __unbind(), although there is no mutex or lock held, we are about
to free the mapped device, so can use the constant '1' instead of a
lockdep_is_held()
Reported-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
Cc: Pranith Kumar <[email protected]>
---
drivers/md/dm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a0ece87ad426..2e0aa2273382 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2335,7 +2335,8 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
merge_is_optional = dm_table_merge_is_optional(t);
- old_map = rcu_dereference(md->map);
+ old_map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
rcu_assign_pointer(md->map, t);
md->immutable_target_type = dm_table_get_immutable_target_type(t);
@@ -2355,7 +2356,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
*/
static struct dm_table *__unbind(struct mapped_device *md)
{
- struct dm_table *map = rcu_dereference(md->map);
+ struct dm_table *map = rcu_dereference_protected(md->map, 1);
if (!map)
return NULL;
@@ -2850,7 +2851,8 @@ retry:
goto retry;
}
- map = rcu_dereference(md->map);
+ map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
r = __dm_suspend(md, map, suspend_flags, TASK_INTERRUPTIBLE);
if (r)
On Sun, Nov 23, 2014 at 11:40 AM, Eric Dumazet <[email protected]> wrote:
> From: Eric Dumazet <[email protected]>
>
> rcu_dereference() should be used in sections protected by rcu_read_lock.
>
> For writers, holding some kind of mutex or lock,
> rcu_dereference_protected() is the way to go, adding explicit lockdep
> bits.
>
> In __unbind(), although there is no mutex or lock held, we are about
> to free the mapped device, so can use the constant '1' instead of a
> lockdep_is_held()
That isn't true. dm_hash_remove_all() -- which calls dm_destroy --
holds _hash_lock. Why leave __unbind() brittle in the face of future
DM locking changes?
> Reported-by: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
> Cc: Pranith Kumar <[email protected]>
Hi Eric,
I'll pick this up once I get clarification for why your __unbind
change is safe.. but it really would've helped if you cc'd
[email protected] or myself directly (not a single person that you
cc'd actively maintains DM).
Hopefully these DM rcu "fixes" are finished after this.
Thanks,
Mike
On Sun, 2014-11-23 at 11:53 -0500, Mike Snitzer wrote:
> On Sun, Nov 23, 2014 at 11:40 AM, Eric Dumazet <[email protected]> wrote:
> > From: Eric Dumazet <[email protected]>
> >
> > rcu_dereference() should be used in sections protected by rcu_read_lock.
> >
> > For writers, holding some kind of mutex or lock,
> > rcu_dereference_protected() is the way to go, adding explicit lockdep
> > bits.
> >
> > In __unbind(), although there is no mutex or lock held, we are about
> > to free the mapped device, so can use the constant '1' instead of a
> > lockdep_is_held()
>
> That isn't true. dm_hash_remove_all() -- which calls dm_destroy --
> holds _hash_lock. Why leave __unbind() brittle in the face of future
> DM locking changes?
>
Well, tell me. Before the 33423974bfc1 patch there was no protection.
If really you are about to delete an object, you have to be sure no one
is going to use it.
rcu_dereference_protected(X, 1) is how we express this thing, there is
nothing wrong here.
Fact that you hold a lock at this point is irrelevant and wont protect
the bug from happening. If you believe so, then you are wrong.
> > Reported-by: Kirill A. Shutemov <[email protected]>
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
> > Cc: Pranith Kumar <[email protected]>
>
> Hi Eric,
>
> I'll pick this up once I get clarification for why your __unbind
> change is safe.. but it really would've helped if you cc'd
> [email protected] or myself directly (not a single person that you
> cc'd actively maintains DM).
>
Hmm, my mailer complained because the mail had too many recipients
already. I did a 'reply' on the original thread.
> Hopefully these DM rcu "fixes" are finished after this.
You added a Signed-off-by on 33423974bfc1, not me.
Kirill gave the report 2 days ago and so far nobody fixed it.
I will send a v2 because other rcu_dereference() need to be changed as
well.
From: Eric Dumazet <[email protected]>
rcu_dereference() should be used in sections protected by rcu_read_lock.
For writers, holding some kind of mutex or lock,
rcu_dereference_protected() is the way to go, adding explicit lockdep
bits.
In __unbind(), we are the last user of this mapped device, so can use
the constant '1' instead of a lockdep_is_held(), not consistent with
other uses of rcu_dereference_protected() which use md->suspend_lock
mutex.
Reported-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
Cc: Pranith Kumar <[email protected]>
Cc: Mike Snitzer <[email protected]>
---
v2: changed all buggy rcu_dereference()
BTW, having a mutex named suspend_lock is ugly, IMO.
drivers/md/dm.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a0ece87ad426..5919d933bce9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2335,7 +2335,8 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
merge_is_optional = dm_table_merge_is_optional(t);
- old_map = rcu_dereference(md->map);
+ old_map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
rcu_assign_pointer(md->map, t);
md->immutable_target_type = dm_table_get_immutable_target_type(t);
@@ -2355,7 +2356,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
*/
static struct dm_table *__unbind(struct mapped_device *md)
{
- struct dm_table *map = rcu_dereference(md->map);
+ struct dm_table *map = rcu_dereference_protected(md->map, 1);
if (!map)
return NULL;
@@ -2850,7 +2851,8 @@ retry:
goto retry;
}
- map = rcu_dereference(md->map);
+ map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
r = __dm_suspend(md, map, suspend_flags, TASK_INTERRUPTIBLE);
if (r)
@@ -2908,7 +2910,8 @@ retry:
goto retry;
}
- map = rcu_dereference(md->map);
+ map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
if (!map || !dm_table_get_size(map))
goto out;
@@ -2943,7 +2946,8 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
return; /* nest suspend */
}
- map = rcu_dereference(md->map);
+ map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
/*
* Using TASK_UNINTERRUPTIBLE because only NOFLUSH internal suspend is
On Sun, Nov 23 2014 at 12:31pm -0500,
Eric Dumazet <[email protected]> wrote:
> On Sun, 2014-11-23 at 11:53 -0500, Mike Snitzer wrote:
> > On Sun, Nov 23, 2014 at 11:40 AM, Eric Dumazet <[email protected]> wrote:
> > > From: Eric Dumazet <[email protected]>
> > >
> > > rcu_dereference() should be used in sections protected by rcu_read_lock.
> > >
> > > For writers, holding some kind of mutex or lock,
> > > rcu_dereference_protected() is the way to go, adding explicit lockdep
> > > bits.
> > >
> > > In __unbind(), although there is no mutex or lock held, we are about
> > > to free the mapped device, so can use the constant '1' instead of a
> > > lockdep_is_held()
> >
> > That isn't true. dm_hash_remove_all() -- which calls dm_destroy --
> > holds _hash_lock. Why leave __unbind() brittle in the face of future
> > DM locking changes?
> >
>
> Well, tell me. Before the 33423974bfc1 patch there was no protection.
Wasn't protected by _hash_lock or wasn't protected by use of rcu_deference()?
> If really you are about to delete an object, you have to be sure no one
> is going to use it.
>
> rcu_dereference_protected(X, 1) is how we express this thing, there is
> nothing wrong here.
>
> Fact that you hold a lock at this point is irrelevant and wont protect
> the bug from happening. If you believe so, then you are wrong.
My asking a question about the validity of your assertion given that
assertion wasn't 100% correct is perfectly fair no? I just want to make
sure the change is accurately described. Asking that question also
doesn't imply I felt you don't know what you're doing. Fact is I really
trust you to be a _very_ capable developer.
But exactly which "bug" are we talking about? A theoretical bug or a
reported bug that caused DM to fail? So far all of these supposed rcu
deference fixes have _never_ substantiated with an actual bug (Other
than a splat from autochecking performed by rcu_dereference_check).
In fact the very first "fix" from 33423974bfc1 _seems_ to just be the
by-product from rcu janitor efforts. I'm happy others are looking after
rcu consumers but pretty sure all of this churn is what is causing these
"bugs".
But maybe I'm mistaken and all these changes shoul dbe cc'ing
[email protected]?
> > > Reported-by: Kirill A. Shutemov <[email protected]>
> > > Signed-off-by: Eric Dumazet <[email protected]>
> > > Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
> > > Cc: Pranith Kumar <[email protected]>
> >
> > Hi Eric,
> >
> > I'll pick this up once I get clarification for why your __unbind
> > change is safe.. but it really would've helped if you cc'd
> > [email protected] or myself directly (not a single person that you
> > cc'd actively maintains DM).
> >
>
> Hmm, my mailer complained because the mail had too many recipients
> already. I did a 'reply' on the original thread.
It's OK, I missed the report from 2 days ago too (wasn't cc'd etc).
> > Hopefully these DM rcu "fixes" are finished after this.
>
> You added a Signed-off-by on 33423974bfc1, not me.
Right, I don't pretend to keep my finger on the pulse of rcu API as much
as I probably should. But I'm pretty sure Paul McKenney does and he
also provided his Signed-off-by -- and I really trust Paul on rcu stuff.
> Kirill gave the report 2 days ago and so far nobody fixed it.
>
> I will send a v2 because other rcu_dereference() need to be changed as
> well.
I'm still wondering if they _need_ to be changed -- we aren't already
protected in these paths due to DM's existing locking (via suspend_lock
or _hash_lock, etc)? But I'll circle back to try to properly understand
the need shortly.
Anyway, all being said: I appreciate your help here. Not liking that we
got to baiting one another; I'll refrain from doing so in the future.
On Sun, Nov 23 2014 at 12:34pm -0500,
Eric Dumazet <[email protected]> wrote:
> From: Eric Dumazet <[email protected]>
>
> rcu_dereference() should be used in sections protected by rcu_read_lock.
>
> For writers, holding some kind of mutex or lock,
> rcu_dereference_protected() is the way to go, adding explicit lockdep
> bits.
>
> In __unbind(), we are the last user of this mapped device, so can use
> the constant '1' instead of a lockdep_is_held(), not consistent with
> other uses of rcu_dereference_protected() which use md->suspend_lock
> mutex.
>
> Reported-by: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Fixes: 33423974bfc1 ("dm: Use rcu_dereference() for accessing rcu pointer")
> Cc: Pranith Kumar <[email protected]>
> Cc: Mike Snitzer <[email protected]>
Thanks, I've staged this for 3.19:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=a12f5d48bdfeb5fe10157ac01c3de29269f457c6