LinuxLists.cc - [PATCH 01/10] fault injection: fix a few documentation errors

2016-10-16 15:56:28

Subject: [PATCH 01/10] fault injection: fix a few documentation errors

Signed-off-by: Vegard Nossum <[email protected]>
---
Documentation/fault-injection/fault-injection.txt | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
index 415484f..2ef8531 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -102,15 +102,15 @@ configuration of fault-injection capabilities.
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:

Format: { 'Y' | 'N' }
- default is 'N', setting it to 'Y' won't inject failures into
+ default is 'Y', setting it to 'N' will also inject failures into
highmem/user allocations.

- /sys/kernel/debug/failslab/ignore-gfp-wait:
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:

Format: { 'Y' | 'N' }
- default is 'N', setting it to 'Y' will inject failures
- only into non-sleep allocations (GFP_ATOMIC allocations).
+ default is 'Y', setting it to 'N' will also inject failures into
+ allocations that may sleep (non-GFP_ATOMIC allocations).

- /sys/kernel/debug/fail_page_alloc/min-order:

@@ -141,7 +141,7 @@ o #include <linux/fault-inject.h>

o define the fault attributes

- DECLARE_FAULT_INJECTION(name);
+ DECLARE_FAULT_ATTR(name);

Please see the definition of struct fault_attr in fault-inject.h
for details.
--
2.10.0.479.g221bd91

2016-10-16 15:56:30

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 02/10] fault injection: fix Kconfig menu

We need an explicit dependency on FAULT_INJECTION in order to keep
FAIL_MMC_REQUEST (and subsequent entries) inside the FAULT_INJECTION
menu.

Fixes: 28ff4fda9e5b ("mmc: kconfig: replace FAULT_INJECTION with FAULT_INJECTION_DEBUG_FS")
Cc: Adrien Schildknecht <[email protected]>
Cc: Ulf Hansson <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
lib/Kconfig.debug | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 33bc56c..d7cc65a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1641,7 +1641,8 @@ config FAIL_IO_TIMEOUT

config FAIL_MMC_REQUEST
bool "Fault-injection capability for MMC IO"
- depends on FAULT_INJECTION_DEBUG_FS && MMC
+ depends on FAULT_INJECTION && MMC
+ depends on FAULT_INJECTION_DEBUG_FS
help
Provide fault-injection capability for MMC IO.
This will make the mmc core return data errors. This is
--
2.10.0.479.g221bd91

2016-10-16 15:56:49

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 10/10] fault injection: inject faults in new/rare callchains

Before this patch, fault injection uses a combination of randomness and
frequency to determine where to inject faults. The problem with this is
that code paths which are executed very rarely get proportional amounts
of faults injected.

A better heuristic is to look at the actual callchain leading up to the
possible failure point; if we see a callchain that we've never seen up
until this point, chances are it's a rare one and we should definitely
inject a fault here (since we might not get the chance again later).

This uses a probabilistic set structure (similar to a bloom filter) to
determine whether we have seen a particular callchain before by hashing
the stack trace and atomically testing/setting a bit corresponding to
the current callchain.

There is a possibility of false negatives (i.e. we think we have seen a
particular callchain before when in fact we haven't, therefore we don't
inject a fault where we should have). We might use some sort of random
seed here, but the additional complexity doesn't seem worth it to me.

This finds a lot more bugs than just plain fault injection.

To enable at run-time:

echo Y > /sys/kernel/debug/fail*/fail_new_callsites

Signed-off-by: Vegard Nossum <[email protected]>

---

v2: turned this into a runtime knob as suggested by Akinobu Mita and
bumped the hashtable size from 32 to 128 KiB (16 to 64 KiB on 32-bit).
---
Documentation/fault-injection/fault-injection.txt | 8 +++++
include/linux/fault-inject.h | 2 ++
lib/Kconfig.debug | 7 +++-
lib/fault-inject.c | 40 ++++++++++++++++++++---
4 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
index 2ef8531..1cbfbbb 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -81,6 +81,14 @@ configuration of fault-injection capabilities.
Any positive value limits failures to only processes indicated by
/proc/<pid>/make-it-fail==1.

+- /sys/kernel/debug/fail*/fail_new_callsites:
+
+ Format: { 'Y' | 'N' }
+ A value of 'Y' will cause a callsite which has never been seen
+ before (since the last boot) to fail immediately. This can be
+ useful for triggering fault injection for very rare or hard-to-hit
+ call chains.
+
- /sys/kernel/debug/fail*/require-start:
- /sys/kernel/debug/fail*/require-end:
- /sys/kernel/debug/fail*/reject-start:
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 9f4956d..022da94 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -19,6 +19,7 @@ struct fault_attr {
atomic_t space;
unsigned long verbose;
bool task_filter;
+ bool fail_new_callsites;
unsigned long stacktrace_depth;
unsigned long require_start;
unsigned long require_end;
@@ -34,6 +35,7 @@ struct fault_attr {
.interval = 1, \
.times = ATOMIC_INIT(1), \
.require_end = ULONG_MAX, \
+ .fail_new_callsites = false, \
.stacktrace_depth = 32, \
.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
.verbose = 2, \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1f13d02..d06a51d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1699,7 +1699,12 @@ config FAULT_INJECTION_STACKTRACE_FILTER
select STACKTRACE
select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !SCORE
help
- Provide stacktrace filter for fault-injection capabilities
+ Provide stacktrace filter for fault-injection capabilities.
+
+ This also allows you to enable fault-injection for new callsites,
+ i.e. a forced fault the first time a given call chain is seen.
+ This can be useful for triggering fault injection for very rare
+ or hard-to-hit call chains.

config LATENCYTOP
bool "Latency measuring infrastructure"
diff --git a/lib/fault-inject.c b/lib/fault-inject.c
index adba7c9..a401b14 100644
--- a/lib/fault-inject.c
+++ b/lib/fault-inject.c
@@ -63,7 +63,7 @@ static bool fail_task(struct fault_attr *attr, struct task_struct *task)

#ifdef CONFIG_FAULT_INJECTION_STACKTRACE_FILTER

-static bool fail_stacktrace(struct fault_attr *attr)
+static bool fail_stacktrace(struct fault_attr *attr, unsigned int *hash)
{
struct stack_trace trace;
int depth = attr->stacktrace_depth;
@@ -88,12 +88,20 @@ static bool fail_stacktrace(struct fault_attr *attr)
entries[n] < attr->require_end)
found = true;
}
+
+ if (IS_ENABLED(CONFIG_FAULT_INJECTION_AT_NEW_CALLSITES)) {
+ const char *start = (const char *) &entries[0];
+ const char *end = (const char *) &entries[trace.nr_entries];
+
+ *hash = full_name_hash(0, start, end - start);
+ }
+
return found;
}

#else

-static inline bool fail_stacktrace(struct fault_attr *attr)
+static inline bool fail_stacktrace(struct fault_attr *attr, unsigned int *hash)
{
return true;
}
@@ -134,6 +142,8 @@ static bool __fail(struct fault_attr *attr)

bool should_fail(struct fault_attr *attr, ssize_t size)
{
+ unsigned int hash = 0;
+
/* No need to check any other properties if the probability is 0 */
if (attr->probability == 0)
return false;
@@ -149,6 +159,25 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
return false;
}

+ if (!fail_stacktrace(attr, &hash))
+ return false;
+
+ if (IS_ENABLED(CONFIG_FAULT_INJECTION_STACKTRACE_FILTER) &&
+ attr->fail_new_callsites) {
+ static unsigned long seen_hashtable[16 * 1024];
+
+ hash &= 8 * sizeof(seen_hashtable) - 1;
+ if (!test_and_set_bit(hash & (BITS_PER_LONG - 1),
+ &seen_hashtable[hash / BITS_PER_LONG]))
+ {
+ /*
+ * If it's the first time we see this stacktrace, fail it
+ * without a second thought.
+ */
+ goto fail;
+ }
+ }
+
if (attr->interval > 1) {
attr->count++;
if (attr->count % attr->interval)
@@ -158,9 +187,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
if (attr->probability <= prandom_u32() % 100)
return false;

- if (!fail_stacktrace(attr))
- return false;
-
+fail:
return __fail(attr);
}
EXPORT_SYMBOL_GPL(should_fail);
@@ -241,6 +268,9 @@ struct dentry *fault_create_debugfs_attr(const char *name,

#ifdef CONFIG_FAULT_INJECTION_STACKTRACE_FILTER

+ if (!debugfs_create_bool("fail_new_callsites", mode, dir,
+ &attr->fail_new_callsites))
+ goto fail;
if (!debugfs_create_stacktrace_depth("stacktrace-depth", mode, dir,
&attr->stacktrace_depth))
goto fail;
--
2.10.0.479.g221bd91

2016-10-16 15:57:03

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 07/10] fault injection: down_trylock() fault injection

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
kernel/locking/semaphore.c | 26 ++++++++++++++++++++++++++
lib/Kconfig.debug | 6 ++++++
2 files changed, 32 insertions(+)

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index b8120ab..367bae3 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -25,6 +25,7 @@
* semaphore. If it's zero, there may be tasks waiting on the wait_list.
*/

+#include <linux/fault-inject.h>
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/export.h>
@@ -114,6 +115,28 @@ int down_killable(struct semaphore *sem)
}
EXPORT_SYMBOL(down_killable);

+#ifdef CONFIG_FAIL_SEMAPHORE
+DECLARE_FAULT_ATTR(fail_semaphore);
+
+static int __init fail_semaphore_debugfs(void)
+{
+ struct dentry *dir = fault_create_debugfs_attr("fail_semaphore",
+ NULL, &fail_semaphore);
+ return PTR_ERR_OR_ZERO(dir);
+}
+late_initcall(fail_semaphore_debugfs);
+
+static inline bool should_fail_semaphore(struct semaphore *sem)
+{
+ return should_fail(&fail_semaphore, 1);
+}
+#else
+static inline bool should_fail_semaphore(struct semaphore *sem)
+{
+ return false;
+}
+#endif
+
/**
* down_trylock - try to acquire the semaphore, without waiting
* @sem: the semaphore to be acquired
@@ -132,6 +155,9 @@ int down_trylock(struct semaphore *sem)
unsigned long flags;
int count;

+ if (should_fail_semaphore(sem))
+ return 1;
+
raw_spin_lock_irqsave(&sem->lock, flags);
count = sem->count - 1;
if (likely(count >= 0))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 12ad8eb..1e134de 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1669,6 +1669,12 @@ config FAIL_RT_MUTEX
help
Provide fault-injection capability for rt_mutex_trylock().

+config FAIL_SEMAPHORE
+ bool "Fault-injection capability for semaphores"
+ depends on FAULT_INJECTION
+ help
+ Provide fault-injection capability for down_trylock().
+
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
--
2.10.0.479.g221bd91

2016-10-16 15:57:12

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 09/10] fault injection: spin_trylock() fault injection

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
include/linux/spinlock.h | 12 ++++++++++++
kernel/locking/spinlock.c | 19 +++++++++++++++++++
lib/Kconfig.debug | 6 ++++++
3 files changed, 37 insertions(+)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..2fd1fee 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -307,8 +307,20 @@ static __always_inline void spin_lock_bh(spinlock_t *lock)
raw_spin_lock_bh(&lock->rlock);
}

+#if defined(CONFIG_SMP) && defined(CONFIG_FAIL_SPINLOCK)
+extern bool should_fail_spinlock(spinlock_t *lock);
+#else
+static __always_inline bool should_fail_spinlock(spinlock_t *lock)
+{
+ return false;
+}
+#endif
+
static __always_inline int spin_trylock(spinlock_t *lock)
{
+ if (should_fail_spinlock(lock))
+ return 0;
+
return raw_spin_trylock(&lock->rlock);
}

diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index db3ccb1..d57800f 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -14,6 +14,7 @@
* frame contact the architecture maintainers.
*/

+#include <linux/fault-inject.h>
#include <linux/linkage.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>
@@ -405,3 +406,21 @@ notrace int in_lock_functions(unsigned long addr)
&& addr < (unsigned long)__lock_text_end;
}
EXPORT_SYMBOL(in_lock_functions);
+
+#ifdef CONFIG_FAIL_SPINLOCK
+static DECLARE_FAULT_ATTR(fail_spinlock);
+
+static int __init fail_spinlock_debugfs(void)
+{
+ struct dentry *dir = fault_create_debugfs_attr("fail_spinlock",
+ NULL, &fail_spinlock);
+ return PTR_ERR_OR_ZERO(dir);
+}
+late_initcall(fail_spinlock_debugfs);
+
+bool should_fail_spinlock(spinlock_t *lock)
+{
+ return should_fail(&fail_spinlock, 1);
+}
+
+#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 342f0a1f..1f13d02 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1681,6 +1681,12 @@ config FAIL_RW_SEMAPHORE
help
Provide fault-injection capability for down_{read,write}_trylock().

+config FAIL_SPINLOCK
+ bool "Fault-injection capability for spinlocks"
+ depends on FAULT_INJECTION
+ help
+ Provide fault-injection capability for spin_trylock().
+
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
--
2.10.0.479.g221bd91

2016-10-16 15:57:23

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 04/10] fault injection: prevent recursive fault injection

If something we call in the fail_dump() code path tries to acquire a
resource that might fail (due to fault injection), then we should not
try to recurse back into the fault injection code.

I've seen this happen with the console semaphore in the upcoming
semaphore trylock fault injection code.

Cc: Hillf Danton <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
lib/fault-inject.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/lib/fault-inject.c b/lib/fault-inject.c
index 6a823a5..adba7c9 100644
--- a/lib/fault-inject.c
+++ b/lib/fault-inject.c
@@ -100,6 +100,33 @@ static inline bool fail_stacktrace(struct fault_attr *attr)

#endif /* CONFIG_FAULT_INJECTION_STACKTRACE_FILTER */

+static DEFINE_PER_CPU(int, fault_active);
+
+static bool __fail(struct fault_attr *attr)
+{
+ bool ret = false;
+
+ /*
+ * Prevent recursive fault injection (this could happen if for
+ * example printing the fault would itself run some code that
+ * could fail)
+ */
+ preempt_disable();
+ if (unlikely(__this_cpu_inc_return(fault_active) != 1))
+ goto out;
+
+ ret = true;
+ fail_dump(attr);
+
+ if (atomic_read(&attr->times) != -1)
+ atomic_dec_not_zero(&attr->times);
+
+out:
+ __this_cpu_dec(fault_active);
+ preempt_enable();
+ return ret;
+}
+
/*
* This code is stolen from failmalloc-1.0
* http://www.nongnu.org/failmalloc/
@@ -134,12 +161,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
if (!fail_stacktrace(attr))
return false;

- fail_dump(attr);
-
- if (atomic_read(&attr->times) != -1)
- atomic_dec_not_zero(&attr->times);
-
- return true;
+ return __fail(attr);
}
EXPORT_SYMBOL_GPL(should_fail);

--
2.10.0.479.g221bd91

2016-10-16 15:57:33

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 06/10] fault injection: rt_mutex_trylock() fault injection

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
kernel/locking/rtmutex.c | 26 ++++++++++++++++++++++++++
lib/Kconfig.debug | 6 ++++++
2 files changed, 32 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 1ec0f48..22ff6da 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -10,6 +10,7 @@
*
* See Documentation/locking/rt-mutex-design.txt for details.
*/
+#include <linux/fault-inject.h>
#include <linux/spinlock.h>
#include <linux/export.h>
#include <linux/sched.h>
@@ -1465,6 +1466,28 @@ rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout)
}
EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);

+#ifdef CONFIG_FAIL_RT_MUTEX
+DECLARE_FAULT_ATTR(fail_rtmutex);
+
+static int __init fail_rtmutex_debugfs(void)
+{
+ struct dentry *dir = fault_create_debugfs_attr("fail_rtmutex",
+ NULL, &fail_rtmutex);
+ return PTR_ERR_OR_ZERO(dir);
+}
+late_initcall(fail_rtmutex_debugfs);
+
+static inline bool should_fail_rtmutex(struct rt_mutex *lock)
+{
+ return should_fail(&fail_rtmutex, 1);
+}
+#else
+static inline bool should_fail_rtmutex(struct rt_mutex *lock)
+{
+ return false;
+}
+#endif
+
/**
* rt_mutex_trylock - try to lock a rt_mutex
*
@@ -1481,6 +1504,9 @@ int __sched rt_mutex_trylock(struct rt_mutex *lock)
if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
return 0;

+ if (should_fail_rtmutex(lock))
+ return 0;
+
return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
}
EXPORT_SYMBOL_GPL(rt_mutex_trylock);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7e9a9b2e..12ad8eb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1663,6 +1663,12 @@ config FAIL_MUTEX
help
Provide fault-injection capability for mutex_trylock().

+config FAIL_RT_MUTEX
+ bool "Fault-injection capability for RT mutexes"
+ depends on FAULT_INJECTION && RT_MUTEXES
+ help
+ Provide fault-injection capability for rt_mutex_trylock().
+
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
--
2.10.0.479.g221bd91

2016-10-16 15:57:43

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 03/10] fault injection: allow FAULT_INJECTION_STACKTRACE_FILTER on X86_64

This config option was apparently disallowed on x86_64 because of
problems with the stack unwinder:

commit 6d690dcac92a84f98fd774862628ff871b713660
Author: Akinobu Mita <[email protected]>
Date: Sat May 12 10:36:53 2007 -0700

fault injection: disable stacktrace filter for x86-64

However, I observe no problems whatsoever with this today and in fact
it's been very useful for debugging. Let's allow it again.

Cc: Akinobu Mita <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
lib/Kconfig.debug | 1 -
1 file changed, 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d7cc65a..861fc1d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1666,7 +1666,6 @@ config FAULT_INJECTION_DEBUG_FS
config FAULT_INJECTION_STACKTRACE_FILTER
bool "stacktrace filter for fault-injection capabilities"
depends on FAULT_INJECTION_DEBUG_FS && STACKTRACE_SUPPORT
- depends on !X86_64
select STACKTRACE
select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !SCORE
help
--
2.10.0.479.g221bd91

2016-10-16 15:57:56

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 08/10] fault injection: down_{read,write}_trylock() fault injection

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
kernel/locking/rwsem.c | 35 +++++++++++++++++++++++++++++++++--
lib/Kconfig.debug | 6 ++++++
2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 45ba475..e52ffd3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -4,6 +4,7 @@
* Derived from asm-i386/semaphore.h
*/

+#include <linux/fault-inject.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/sched.h>
@@ -13,6 +14,28 @@

#include "rwsem.h"

+#ifdef CONFIG_FAIL_RW_SEMAPHORE
+DECLARE_FAULT_ATTR(fail_rwsem);
+
+static int __init fail_rwsem_debugfs(void)
+{
+ struct dentry *dir = fault_create_debugfs_attr("fail_rwsem",
+ NULL, &fail_rwsem);
+ return PTR_ERR_OR_ZERO(dir);
+}
+late_initcall(fail_rwsem_debugfs);
+
+static inline bool should_fail_rwsem(struct rw_semaphore *sem)
+{
+ return should_fail(&fail_rwsem, 1);
+}
+#else
+static inline bool should_fail_rwsem(struct rw_semaphore *sem)
+{
+ return false;
+}
+#endif
+
/*
* lock for reading
*/
@@ -32,8 +55,12 @@ EXPORT_SYMBOL(down_read);
*/
int down_read_trylock(struct rw_semaphore *sem)
{
- int ret = __down_read_trylock(sem);
+ int ret;
+
+ if (should_fail_rwsem(sem))
+ return 0;

+ ret = __down_read_trylock(sem);
if (ret == 1) {
rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
rwsem_set_reader_owned(sem);
@@ -81,8 +108,12 @@ EXPORT_SYMBOL(down_write_killable);
*/
int down_write_trylock(struct rw_semaphore *sem)
{
- int ret = __down_write_trylock(sem);
+ int ret;
+
+ if (should_fail_rwsem(sem))
+ return 0;

+ ret = __down_write_trylock(sem);
if (ret == 1) {
rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_);
rwsem_set_owner(sem);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e134de..342f0a1f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1675,6 +1675,12 @@ config FAIL_SEMAPHORE
help
Provide fault-injection capability for down_trylock().

+config FAIL_RW_SEMAPHORE
+ bool "Fault-injection capability for RW semaphores"
+ depends on FAULT_INJECTION
+ help
+ Provide fault-injection capability for down_{read,write}_trylock().
+
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
--
2.10.0.479.g221bd91

2016-10-16 15:58:44

by Vegard Nossum

[permalink] [raw]

Subject: [PATCH 05/10] fault injection: mutex_trylock() fault injection

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
kernel/locking/mutex.c | 32 ++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 6 ++++++
2 files changed, 38 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..0651ca6 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -17,6 +17,7 @@
*
* Also see Documentation/locking/mutex-design.txt.
*/
+#include <linux/fault-inject.h>
#include <linux/mutex.h>
#include <linux/ww_mutex.h>
#include <linux/sched.h>
@@ -887,6 +888,34 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
return prev == 1;
}

+#ifdef CONFIG_FAIL_MUTEX
+DECLARE_FAULT_ATTR(fail_mutex);
+
+static int __init setup_fail_mutex(char *str)
+{
+ return setup_fault_attr(&fail_mutex, str);
+}
+__setup("fail_mutex=", setup_fail_mutex);
+
+static int __init fail_mutex_debugfs(void)
+{
+ struct dentry *dir = fault_create_debugfs_attr("fail_mutex",
+ NULL, &fail_mutex);
+ return PTR_ERR_OR_ZERO(dir);
+}
+late_initcall(fail_mutex_debugfs);
+
+static inline bool should_fail_mutex(struct mutex *lock)
+{
+ return should_fail(&fail_mutex, 1);
+}
+#else
+static inline bool should_fail_mutex(struct mutex *lock)
+{
+ return false;
+}
+#endif
+
/**
* mutex_trylock - try to acquire the mutex, without waiting
* @lock: the mutex to be acquired
@@ -905,6 +934,9 @@ int __sched mutex_trylock(struct mutex *lock)
{
int ret;

+ if (should_fail_mutex(lock))
+ return 0;
+
ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
if (ret)
mutex_set_owner(lock);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 861fc1d..7e9a9b2e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1657,6 +1657,12 @@ config FAIL_FUTEX
help
Provide fault-injection capability for futexes.

+config FAIL_MUTEX
+ bool "Fault-injection capability for mutexes"
+ depends on FAULT_INJECTION
+ help
+ Provide fault-injection capability for mutex_trylock().
+
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
--
2.10.0.479.g221bd91

2016-10-16 16:27:25

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 05/10] fault injection: mutex_trylock() fault injection

On Sun, Oct 16, 2016 at 05:56:07PM +0200, Vegard Nossum wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

NAK, -ENOCHANGELOG

2016-10-16 16:29:09

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 06/10] fault injection: rt_mutex_trylock() fault injection

On Sun, Oct 16, 2016 at 05:56:08PM +0200, Vegard Nossum wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

NAK, -ENOCHANGELOG

2016-10-16 16:29:20

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 08/10] fault injection: down_{read,write}_trylock() fault injection

On Sun, Oct 16, 2016 at 05:56:10PM +0200, Vegard Nossum wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

Guess what... NAK

2016-10-16 16:29:22

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 07/10] fault injection: down_trylock() fault injection

On Sun, Oct 16, 2016 at 05:56:09PM +0200, Vegard Nossum wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

Very tiring this..

NAK -ENOCHANGELOG

2016-10-16 16:29:58

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 09/10] fault injection: spin_trylock() fault injection

On Sun, Oct 16, 2016 at 05:56:11PM +0200, Vegard Nossum wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

*sigh*

2016-10-17 04:03:53

by Hillf Danton

[permalink] [raw]

Subject: Re: [PATCH 04/10] fault injection: prevent recursive fault injection

On Sunday, October 16, 2016 11:56 PM Vegard Nossum wrote:
>
> If something we call in the fail_dump() code path tries to acquire a
> resource that might fail (due to fault injection), then we should not
> try to recurse back into the fault injection code.
>
> I've seen this happen with the console semaphore in the upcoming
> semaphore trylock fault injection code.
>
> Cc: Hillf Danton <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>
> ---
> lib/fault-inject.c | 34 ++++++++++++++++++++++++++++------
> 1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/lib/fault-inject.c b/lib/fault-inject.c
> index 6a823a5..adba7c9 100644
> --- a/lib/fault-inject.c
> +++ b/lib/fault-inject.c
> @@ -100,6 +100,33 @@ static inline bool fail_stacktrace(struct fault_attr *attr)
>
> #endif /* CONFIG_FAULT_INJECTION_STACKTRACE_FILTER */
>
> +static DEFINE_PER_CPU(int, fault_active);
> +
> +static bool __fail(struct fault_attr *attr)
> +{
> + bool ret = false;
> +
> + /*
> + * Prevent recursive fault injection (this could happen if for
> + * example printing the fault would itself run some code that
> + * could fail)
> + */
> + preempt_disable();
> + if (unlikely(__this_cpu_inc_return(fault_active) != 1))
> + goto out;
> +
> + ret = true;
> + fail_dump(attr);
> +
> + if (atomic_read(&attr->times) != -1)
> + atomic_dec_not_zero(&attr->times);
> +
> +out:
> + __this_cpu_dec(fault_active);
> + preempt_enable();

Given no see other patches in this set, I could easily miss
anything important and correct me please.

Though added, there are no words in the change log about
preempt, and my wonder again is: can we add it in contexts
like the fast path of buddy allocator?

thanks
Hillf
> + return ret;
> +}
> +
> /*
> * This code is stolen from failmalloc-1.0
> * http://www.nongnu.org/failmalloc/
> @@ -134,12 +161,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
> if (!fail_stacktrace(attr))
> return false;
>
> - fail_dump(attr);
> -
> - if (atomic_read(&attr->times) != -1)
> - atomic_dec_not_zero(&attr->times);
> -
> - return true;
> + return __fail(attr);
> }
> EXPORT_SYMBOL_GPL(should_fail);
>
> --
> 2.10.0.479.g221bd91

2016-10-18 09:32:43

by Ulf Hansson

[permalink] [raw]

Subject: Re: [PATCH 02/10] fault injection: fix Kconfig menu

On 16 October 2016 at 17:56, Vegard Nossum <[email protected]> wrote:
> We need an explicit dependency on FAULT_INJECTION in order to keep
> FAIL_MMC_REQUEST (and subsequent entries) inside the FAULT_INJECTION
> menu.
>
> Fixes: 28ff4fda9e5b ("mmc: kconfig: replace FAULT_INJECTION with FAULT_INJECTION_DEBUG_FS")
> Cc: Adrien Schildknecht <[email protected]>
> Cc: Ulf Hansson <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>

Acked-by: Ulf Hansson <[email protected]>

Kind regards
Uffe

> ---
> lib/Kconfig.debug | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 33bc56c..d7cc65a 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1641,7 +1641,8 @@ config FAIL_IO_TIMEOUT
>
> config FAIL_MMC_REQUEST
> bool "Fault-injection capability for MMC IO"
> - depends on FAULT_INJECTION_DEBUG_FS && MMC
> + depends on FAULT_INJECTION && MMC
> + depends on FAULT_INJECTION_DEBUG_FS
> help
> Provide fault-injection capability for MMC IO.
> This will make the mmc core return data errors. This is
> --
> 2.10.0.479.g221bd91
>

2016-10-21 00:32:42

by Akinobu Mita

[permalink] [raw]

Subject: Re: [PATCH 10/10] fault injection: inject faults in new/rare callchains

2016-10-17 0:56 GMT+09:00 Vegard Nossum <[email protected]>:
> Before this patch, fault injection uses a combination of randomness and
> frequency to determine where to inject faults. The problem with this is
> that code paths which are executed very rarely get proportional amounts
> of faults injected.
>
> A better heuristic is to look at the actual callchain leading up to the
> possible failure point; if we see a callchain that we've never seen up
> until this point, chances are it's a rare one and we should definitely
> inject a fault here (since we might not get the chance again later).
>
> This uses a probabilistic set structure (similar to a bloom filter) to
> determine whether we have seen a particular callchain before by hashing
> the stack trace and atomically testing/setting a bit corresponding to
> the current callchain.
>
> There is a possibility of false negatives (i.e. we think we have seen a
> particular callchain before when in fact we haven't, therefore we don't
> inject a fault where we should have). We might use some sort of random
> seed here, but the additional complexity doesn't seem worth it to me.
>
> This finds a lot more bugs than just plain fault injection.
>
> To enable at run-time:
>
> echo Y > /sys/kernel/debug/fail*/fail_new_callsites
>
> Signed-off-by: Vegard Nossum <[email protected]>
>
> ---
>
> v2: turned this into a runtime knob as suggested by Akinobu Mita and
> bumped the hashtable size from 32 to 128 KiB (16 to 64 KiB on 32-bit).
> ---
> Documentation/fault-injection/fault-injection.txt | 8 +++++
> include/linux/fault-inject.h | 2 ++
> lib/Kconfig.debug | 7 +++-
> lib/fault-inject.c | 40 ++++++++++++++++++++---
> 4 files changed, 51 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
> index 2ef8531..1cbfbbb 100644
> --- a/Documentation/fault-injection/fault-injection.txt
> +++ b/Documentation/fault-injection/fault-injection.txt
> @@ -81,6 +81,14 @@ configuration of fault-injection capabilities.
> Any positive value limits failures to only processes indicated by
> /proc/<pid>/make-it-fail==1.
>
> +- /sys/kernel/debug/fail*/fail_new_callsites:
> +
> + Format: { 'Y' | 'N' }
> + A value of 'Y' will cause a callsite which has never been seen
> + before (since the last boot) to fail immediately. This can be
> + useful for triggering fault injection for very rare or hard-to-hit
> + call chains.
> +
> - /sys/kernel/debug/fail*/require-start:
> - /sys/kernel/debug/fail*/require-end:
> - /sys/kernel/debug/fail*/reject-start:
> diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
> index 9f4956d..022da94 100644
> --- a/include/linux/fault-inject.h
> +++ b/include/linux/fault-inject.h
> @@ -19,6 +19,7 @@ struct fault_attr {
> atomic_t space;
> unsigned long verbose;
> bool task_filter;
> + bool fail_new_callsites;
> unsigned long stacktrace_depth;
> unsigned long require_start;
> unsigned long require_end;
> @@ -34,6 +35,7 @@ struct fault_attr {
> .interval = 1, \
> .times = ATOMIC_INIT(1), \
> .require_end = ULONG_MAX, \
> + .fail_new_callsites = false, \
> .stacktrace_depth = 32, \
> .ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
> .verbose = 2, \
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 1f13d02..d06a51d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1699,7 +1699,12 @@ config FAULT_INJECTION_STACKTRACE_FILTER
> select STACKTRACE
> select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !SCORE
> help
> - Provide stacktrace filter for fault-injection capabilities
> + Provide stacktrace filter for fault-injection capabilities.
> +
> + This also allows you to enable fault-injection for new callsites,
> + i.e. a forced fault the first time a given call chain is seen.
> + This can be useful for triggering fault injection for very rare
> + or hard-to-hit call chains.
>
> config LATENCYTOP
> bool "Latency measuring infrastructure"
> diff --git a/lib/fault-inject.c b/lib/fault-inject.c
> index adba7c9..a401b14 100644
> --- a/lib/fault-inject.c
> +++ b/lib/fault-inject.c
> @@ -63,7 +63,7 @@ static bool fail_task(struct fault_attr *attr, struct task_struct *task)
>
> #ifdef CONFIG_FAULT_INJECTION_STACKTRACE_FILTER
>
> -static bool fail_stacktrace(struct fault_attr *attr)
> +static bool fail_stacktrace(struct fault_attr *attr, unsigned int *hash)
> {
> struct stack_trace trace;
> int depth = attr->stacktrace_depth;
> @@ -88,12 +88,20 @@ static bool fail_stacktrace(struct fault_attr *attr)
> entries[n] < attr->require_end)
> found = true;
> }
> +
> + if (IS_ENABLED(CONFIG_FAULT_INJECTION_AT_NEW_CALLSITES)) {
> + const char *start = (const char *) &entries[0];
> + const char *end = (const char *) &entries[trace.nr_entries];
> +
> + *hash = full_name_hash(0, start, end - start);
> + }
> +
> return found;
> }
>
> #else
>
> -static inline bool fail_stacktrace(struct fault_attr *attr)
> +static inline bool fail_stacktrace(struct fault_attr *attr, unsigned int *hash)
> {
> return true;
> }
> @@ -134,6 +142,8 @@ static bool __fail(struct fault_attr *attr)
>
> bool should_fail(struct fault_attr *attr, ssize_t size)
> {
> + unsigned int hash = 0;
> +
> /* No need to check any other properties if the probability is 0 */
> if (attr->probability == 0)
> return false;
> @@ -149,6 +159,25 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
> return false;
> }
>
> + if (!fail_stacktrace(attr, &hash))
> + return false;
> +

Why is this call moved from the end of should_fail() to here?
This changes the behavior of fault injection when stacktrace filter
is enabled. So please prepare as another patch for this change with
a good reason.

> + if (IS_ENABLED(CONFIG_FAULT_INJECTION_STACKTRACE_FILTER) &&
> + attr->fail_new_callsites) {
> + static unsigned long seen_hashtable[16 * 1024];
> +

Please use DECLARE_BITMAP macro.

> + hash &= 8 * sizeof(seen_hashtable) - 1;
> + if (!test_and_set_bit(hash & (BITS_PER_LONG - 1),
> + &seen_hashtable[hash / BITS_PER_LONG]))
> + {

It is unnecessary to pass adjusted bit number and bitmap offset to
test_and_set_bit(). (i.e. test_and_set_bit(hash, seen_hashtable))

> + /*
> + * If it's the first time we see this stacktrace, fail it
> + * without a second thought.
> + */
> + goto fail;
> + }
> + }
> +
> if (attr->interval > 1) {
> attr->count++;
> if (attr->count % attr->interval)
> @@ -158,9 +187,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
> if (attr->probability <= prandom_u32() % 100)
> return false;
>
> - if (!fail_stacktrace(attr))
> - return false;
> -
> +fail:
> return __fail(attr);
> }
> EXPORT_SYMBOL_GPL(should_fail);
> @@ -241,6 +268,9 @@ struct dentry *fault_create_debugfs_attr(const char *name,
>
> #ifdef CONFIG_FAULT_INJECTION_STACKTRACE_FILTER
>
> + if (!debugfs_create_bool("fail_new_callsites", mode, dir,
> + &attr->fail_new_callsites))
> + goto fail;
> if (!debugfs_create_stacktrace_depth("stacktrace-depth", mode, dir,
> &attr->stacktrace_depth))
> goto fail;
> --
> 2.10.0.479.g221bd91
>