2019-05-20 21:41:17

by Daniel Vetter

Subject: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.
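
For context, roughly the only call site that is expected to see such a
failure, sketched from memory of the 5.2-era oom reaper in
mm/oom_kill.c (abbreviated, not the verbatim source):

	struct mmu_notifier_range range;
	struct mmu_gather tlb;

	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
				vma->vm_start, vma->vm_end);
	tlb_gather_mmu(&tlb, mm, range.start, range.end);
	/* the _nonblock variant is the only one allowed to fail ... */
	if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
		tlb_finish_mmu(&tlb, range.start, range.end);
		ret = false;	/* ... and the reaper just retries later */
		continue;
	}
	unmap_page_range(&tlb, vma, range.start, range.end, NULL);
	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb, range.start, range.end);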

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

v3: Rebase on top of Glisse's arg rework.

v4: More rebase on top of Glisse reworking everything.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
mm/mmu_notifier.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ee36068077b6..c05e406a7cd7 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!mmu_notifier_range_blockable(range) ? "non-" : "");
+ if (!mmu_notifier_range_blockable(range))
+ pr_warn("%pS callback failure not allowed\n",
+ mn->ops->invalidate_range_start);
ret = _ret;
}
}
--
2.20.1



2019-05-20 21:42:02

by Daniel Vetter

Subject: [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start

This is a similar idea to the fs_reclaim fake lockdep lock. It's
fairly easy to provoke a specific notifier to be run on a specific
range: Just prep it, and then munmap() it.
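
As a concrete illustration, a minimal userspace sketch of that trick;
register_with_driver() is a hypothetical stand-in for whatever makes
the driver install its notifier (e.g. an i915 userptr ioctl):

	#include <stddef.h>
	#include <sys/mman.h>

	void register_with_driver(void *buf, size_t len); /* hypothetical */

	int main(void)
	{
		size_t sz = 2 << 20;
		void *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		/* driver installs its mmu notifier covering buf */
		register_with_driver(buf, sz);
		/* forces invalidate_range_start on exactly [buf, buf + sz) */
		munmap(buf, sz);
		return 0;
	}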

A bit harder, but still doable, is to provoke the mmu notifiers for
all the various callchains that might lead to them. But hitting both
at the same time reliably is really hard, especially when you want to
exercise paths like direct reclaim or compaction, where it's not
easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep
to see a lot more dependencies, without having to actually hit them
in a single callchain while testing.
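
To make that concrete, here's a hedged sketch of two hypothetical
driver paths whose interaction lockdep can now connect, even when each
path is only ever hit in a separate test run (driver_lock and both
functions are made up for illustration):

	#include <linux/mmu_notifier.h>
	#include <linux/mutex.h>
	#include <linux/slab.h>

	static DEFINE_MUTEX(driver_lock);

	static int my_invalidate_range_start(struct mmu_notifier *mn,
					     const struct mmu_notifier_range *range)
	{
		mutex_lock(&driver_lock);	/* records map -> driver_lock */
		/* ... throw out the driver's mappings for this range ... */
		mutex_unlock(&driver_lock);
		return 0;
	}

	static void my_other_path(void)
	{
		void *p;

		mutex_lock(&driver_lock);
		/* the allocation can go into direct reclaim, which can call
		 * mmu notifiers, i.e. acquire the map: records
		 * driver_lock -> map */
		p = kmalloc(64, GFP_KERNEL);
		kfree(p);
		mutex_unlock(&driver_lock);
	}

Once lockdep has seen both edges it reports the inversion, without the
allocation ever having to recurse through reclaim into the notifier in
a single callchain.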

Aside: Since I typed this to test i915 mmu notifiers I've only rolled
this out for the invalidate_range_start callback. If there's
interest, we should probably roll this out to all of them. But my
understanding of core mm is seriously lacking, and I'm not clear on
whether we need a lockdep map for each callback, or whether some can
be shared.

v2: Use lock_map_acquire/release() like fs_reclaim, to avoid confusion
with this being a real mutex (Chris Wilson).

v3: Rebase on top of Glisse's arg rework.

Cc: Chris Wilson <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/mmu_notifier.h | 6 ++++++
mm/mmu_notifier.c | 7 +++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b6c004bd9f6a..9dd38c32fc53 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -42,6 +42,10 @@ enum mmu_notifier_event {

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
/*
* The mmu notifier_mm structure is allocated and installed in
* mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -310,10 +314,12 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
static inline void
mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
{
+ lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
if (mm_has_notifiers(range->mm)) {
range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
__mmu_notifier_invalidate_range_start(range);
}
+ lock_map_release(&__mmu_notifier_invalidate_range_start_map);
}

static inline int
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index a09e737711d5..33bdaddfb9b1 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
/* global SRCU for all MMs */
DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+ .name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
/*
* This function allows mmu_notifier::release callback to delay a call to
* a function that will free appropriate resources. The function must be
--
2.20.1


2019-05-20 21:42:16

by Daniel Vetter

Subject: [PATCH 3/4] mm, notifier: Catch sleeping/blocking for !blockable

We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply buried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.
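
As an example of the kind of cheating this catches, a hedged sketch of
a hypothetical driver callback (my_driver_mutex is made up):

	static DEFINE_MUTEX(my_driver_mutex);	/* hypothetical driver lock */

	static int sleepy_invalidate_range_start(struct mmu_notifier *mn,
						 const struct mmu_notifier_range *range)
	{
		/* mutex_lock() contains a might_sleep(), which with this
		 * patch splats when we're called with
		 * !mmu_notifier_range_blockable(); the correct pattern would
		 * be mutex_trylock() and returning -EAGAIN on failure. */
		mutex_lock(&my_driver_mutex);
		/* ... */
		mutex_unlock(&my_driver_mutex);
		return 0;
	}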

Inspired by an i915 patch series which did exactly that, because the
rules haven't been entirely clear to us.

v2: Use the shiny new non_block_start/end annotations instead of
abusing preempt_disable/enable.

v3: Rebase on top of Glisse's arg rework.

v4: Rebase on top of more Glisse rework.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
mm/mmu_notifier.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index c05e406a7cd7..a09e737711d5 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -176,7 +176,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
id = srcu_read_lock(&srcu);
hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
if (mn->ops->invalidate_range_start) {
- int _ret = mn->ops->invalidate_range_start(mn, range);
+ int _ret;
+
+ if (!mmu_notifier_range_blockable(range))
+ non_block_start();
+ _ret = mn->ops->invalidate_range_start(mn, range);
+ if (!mmu_notifier_range_blockable(range))
+ non_block_end();
if (_ret) {
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
--
2.20.1


2019-05-20 21:43:35

by Daniel Vetter

Subject: [PATCH 2/4] kernel.h: Add non_block_start/end()

In some special cases we must not block, but there's no spinlock,
preempt-off, irqs-off or similar critical section already in place
that would arm the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress. Quoting Michal:

"The notifier is called from quite a restricted context - oom_reaper -
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially."

Peter also asked whether we want to catch spinlocks on top, but Michal
said those are less of a problem because spinlocks can't have an
indirect dependency upon the page allocator and hence close the loop
with the oom reaper.
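
A minimal usage sketch (do_work_that_must_not_block() is just a
placeholder):

	non_block_start();
	/* anything in here that hits might_sleep() - or, with this patch,
	 * that actually calls schedule() - now splats under
	 * CONFIG_DEBUG_ATOMIC_SLEEP */
	do_work_that_must_not_block();
	non_block_end();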

Suggested by Michal Hocko.

v2:
- Improve commit message (Michal)
- Also check in schedule, not just might_sleep (Peter)

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Cc: Masahiro Yamada <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: [email protected]
Acked-by: Christian König <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/kernel.h | 10 +++++++++-
include/linux/sched.h | 4 ++++
kernel/sched/core.c | 19 ++++++++++++++-----
3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 74b1ee9027f5..b5f2c2ff0eab 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -214,7 +214,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
* might_sleep - annotation for functions that can sleep
*
* this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
*
* This is a useful debugging help to be able to catch problems early and not
* be bitten later when the calling function happens to sleep when it is not
@@ -230,6 +232,10 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
# define cant_sleep() \
do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
# define sched_annotate_sleep() (current->task_state_change = 0)
+# define non_block_start() \
+ do { current->non_block_count++; } while (0)
+# define non_block_end() \
+ do { WARN_ON(current->non_block_count-- == 0); } while (0)
#else
static inline void ___might_sleep(const char *file, int line,
int preempt_offset) { }
@@ -238,6 +244,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
# define might_sleep() do { might_resched(); } while (0)
# define cant_sleep() do { } while (0)
# define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
#endif

#define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..7f5b293e72df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -908,6 +908,10 @@ struct task_struct {
struct mutex_waiter *blocked_on;
#endif

+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ int non_block_count;
+#endif
+
#ifdef CONFIG_TRACE_IRQFLAGS
unsigned int irq_events;
unsigned long hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 102dfcf0a29a..dd08d423947d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3264,13 +3264,22 @@ static noinline void __schedule_bug(struct task_struct *prev)
/*
* Various schedule()-time debugging checks and statistics:
*/
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
{
#ifdef CONFIG_SCHED_STACK_END_CHECK
if (task_stack_end_corrupted(prev))
panic("corrupted stack end detected inside scheduler\n");
#endif

+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ if (!preempt && prev->state && prev->non_block_count) {
+ printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
prev->comm, prev->pid, prev->non_block_count);
+ dump_stack();
+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+ }
+#endif
+
if (unlikely(in_atomic_preempt_off())) {
__schedule_bug(prev);
preempt_count_set(PREEMPT_DISABLED);
@@ -3377,7 +3386,7 @@ static void __sched notrace __schedule(bool preempt)
rq = cpu_rq(cpu);
prev = rq->curr;

- schedule_debug(prev);
+ schedule_debug(prev, preempt);

if (sched_feat(HRTICK))
hrtick_clear(rq);
@@ -6102,7 +6111,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
rcu_sleep_check();

if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
- !is_idle_task(current)) ||
+ !is_idle_task(current) && !current->non_block_count) ||
system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
oops_in_progress)
return;
@@ -6118,8 +6127,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
"BUG: sleeping function called from invalid context at %s:%d\n",
file, line);
printk(KERN_ERR
- "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
- in_atomic(), irqs_disabled(),
+ "in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+ in_atomic(), irqs_disabled(), current->non_block_count,
current->pid, current->comm);

if (task_stack_end_corrupted(current))
--
2.20.1


2019-05-21 14:50:52

by Daniel Vetter

Subject: [PATCH] kernel.h: Add non_block_start/end()

In some special cases we must not block, but there's no spinlock,
preempt-off, irqs-off or similar critical section already in place
that would arm the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress. Quoting Michal:

"The notifier is called from quite a restricted context - oom_reaper -
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially."

Peter also asked whether we want to catch spinlocks on top, but Michal
said those are less of a problem because spinlocks can't have an
indirect dependency upon the page allocator and hence close the loop
with the oom reaper.

Suggested by Michal Hocko.

v2:
- Improve commit message (Michal)
- Also check in schedule, not just might_sleep (Peter)

v3: It works better when I actually squash in the fixup I had lying
around :-/

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Cc: Masahiro Yamada <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: [email protected]
Acked-by: Christian König <[email protected]> (v1)
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/kernel.h | 10 +++++++++-
include/linux/sched.h | 4 ++++
kernel/sched/core.c | 19 ++++++++++++++-----
3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 74b1ee9027f5..b5f2c2ff0eab 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -214,7 +214,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
* might_sleep - annotation for functions that can sleep
*
* this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
*
* This is a useful debugging help to be able to catch problems early and not
* be bitten later when the calling function happens to sleep when it is not
@@ -230,6 +232,10 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
# define cant_sleep() \
do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
# define sched_annotate_sleep() (current->task_state_change = 0)
+# define non_block_start() \
+ do { current->non_block_count++; } while (0)
+# define non_block_end() \
+ do { WARN_ON(current->non_block_count-- == 0); } while (0)
#else
static inline void ___might_sleep(const char *file, int line,
int preempt_offset) { }
@@ -238,6 +244,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
# define might_sleep() do { might_resched(); } while (0)
# define cant_sleep() do { } while (0)
# define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
#endif

#define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..7f5b293e72df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -908,6 +908,10 @@ struct task_struct {
struct mutex_waiter *blocked_on;
#endif

+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ int non_block_count;
+#endif
+
#ifdef CONFIG_TRACE_IRQFLAGS
unsigned int irq_events;
unsigned long hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 102dfcf0a29a..ed7755a28465 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3264,13 +3264,22 @@ static noinline void __schedule_bug(struct task_struct *prev)
/*
* Various schedule()-time debugging checks and statistics:
*/
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
{
#ifdef CONFIG_SCHED_STACK_END_CHECK
if (task_stack_end_corrupted(prev))
panic("corrupted stack end detected inside scheduler\n");
#endif

+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+ if (!preempt && prev->state && prev->non_block_count) {
+ printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
+ prev->comm, prev->pid, prev->non_block_count);
+ dump_stack();
+ add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+ }
+#endif
+
if (unlikely(in_atomic_preempt_off())) {
__schedule_bug(prev);
preempt_count_set(PREEMPT_DISABLED);
@@ -3377,7 +3386,7 @@ static void __sched notrace __schedule(bool preempt)
rq = cpu_rq(cpu);
prev = rq->curr;

- schedule_debug(prev);
+ schedule_debug(prev, preempt);

if (sched_feat(HRTICK))
hrtick_clear(rq);
@@ -6102,7 +6111,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
rcu_sleep_check();

if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
- !is_idle_task(current)) ||
+ !is_idle_task(current) && !current->non_block_count) ||
system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
oops_in_progress)
return;
@@ -6118,8 +6127,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
"BUG: sleeping function called from invalid context at %s:%d\n",
file, line);
printk(KERN_ERR
- "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
- in_atomic(), irqs_disabled(),
+ "in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+ in_atomic(), irqs_disabled(), current->non_block_count,
current->pid, current->comm);

if (task_stack_end_corrupted(current))
--
2.20.1


2019-05-21 15:35:05

by Jerome Glisse

Subject: Re: [PATCH 3/4] mm, notifier: Catch sleeping/blocking for !blockable

On Mon, May 20, 2019 at 11:39:44PM +0200, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
>
> Inspired by an i915 patch series which did exactly that, because the
> rules haven't been entirely clear to us.
>
> v2: Use the shiny new non_block_start/end annotations instead of
> abusing preempt_disable/enable.
>
> v3: Rebase on top of Glisse's arg rework.
>
> v4: Rebase on top of more Glisse rework.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: "Christian K?nig" <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "J?r?me Glisse" <[email protected]>
> Cc: [email protected]
> Reviewed-by: Christian K?nig <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> mm/mmu_notifier.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index c05e406a7cd7..a09e737711d5 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -176,7 +176,13 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, hlist) {
> if (mn->ops->invalidate_range_start) {
> - int _ret = mn->ops->invalidate_range_start(mn, range);
> + int _ret;
> +
> + if (!mmu_notifier_range_blockable(range))
> + non_block_start();
> + _ret = mn->ops->invalidate_range_start(mn, range);
> + if (!mmu_notifier_range_blockable(range))
> + non_block_end();

This is a matter of taste so feel free to ignore it, as maybe others
will dislike what I prefer even more:

+ if (!mmu_notifier_range_blockable(range)) {
+ non_block_start();
+ _ret = mn->ops->invalidate_range_start(mn, range);
+ non_block_end();
+ } else {
+ _ret = mn->ops->invalidate_range_start(mn, range);
+ }

If only we had predication on the CPU like on the GPU :)

In any case:

Reviewed-by: Jérôme Glisse <[email protected]>


> if (_ret) {
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> --
> 2.20.1
>

2019-05-21 15:42:49

by Jerome Glisse

Subject: Re: [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start

On Mon, May 20, 2019 at 11:39:45PM +0200, Daniel Vetter wrote:
> This is a similar idea to the fs_reclaim fake lockdep lock. It's
> fairly easy to provoke a specific notifier to be run on a specific
> range: Just prep it, and then munmap() it.
>
> A bit harder, but still doable, is to provoke the mmu notifiers for
> all the various callchains that might lead to them. But hitting both
> at the same time reliably is really hard, especially when you want to
> exercise paths like direct reclaim or compaction, where it's not
> easy to control what exactly will be unmapped.
>
> By introducing a lockdep map to tie them all together we allow lockdep
> to see a lot more dependencies, without having to actually hit them
> in a single callchain while testing.
>
> Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> this out for the invalidate_range_start callback. If there's
> interest, we should probably roll this out to all of them. But my
> understanding of core mm is seriously lacking, and I'm not clear on
> whether we need a lockdep map for each callback, or whether some can
> be shared.

I need to read more on lockdep but it is legal to have mmu notifier
invalidations nested within each other. For instance when you munmap you
might split a huge pmd and that will trigger a second invalidate range
while the munmap one is not done yet. Would that trigger the lockdep
here?

Worst case I can think of is 2 invalidate_range_start chains one after
the other. I don't think you can trigger a 3-level nesting but maybe.

Cheers,
Jérôme

2019-05-21 15:47:29

by Jerome Glisse

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
>
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
>
> v3: Rebase on top of Glisse's arg rework.
>
> v4: More rebase on top of Glisse reworking everything.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: "Christian K?nig" <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "J?r?me Glisse" <[email protected]>
> Cc: [email protected]
> Cc: Paolo Bonzini <[email protected]>
> Reviewed-by: Christian K?nig <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>

Reviewed-by: J?r?me Glisse <[email protected]>

> ---
> mm/mmu_notifier.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ee36068077b6..c05e406a7cd7 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> !mmu_notifier_range_blockable(range) ? "non-" : "");
> + if (!mmu_notifier_range_blockable(range))
> + pr_warn("%pS callback failure not allowed\n",
> + mn->ops->invalidate_range_start);
> ret = _ret;
> }
> }
> --
> 2.20.1
>

2019-05-21 16:02:21

by Daniel Vetter

Subject: Re: [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, May 21, 2019 at 5:41 PM Jerome Glisse <[email protected]> wrote:
>
> On Mon, May 20, 2019 at 11:39:45PM +0200, Daniel Vetter wrote:
> > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > fairly easy to provoke a specific notifier to be run on a specific
> > range: Just prep it, and then munmap() it.
> >
> > A bit harder, but still doable, is to provoke the mmu notifiers for
> > all the various callchains that might lead to them. But hitting both
> > at the same time reliably is really hard, especially when you want to
> > exercise paths like direct reclaim or compaction, where it's not
> > easy to control what exactly will be unmapped.
> >
> > By introducing a lockdep map to tie them all together we allow lockdep
> > to see a lot more dependencies, without having to actually hit them
> > in a single callchain while testing.
> >
> > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > this out for the invalidate_range_start callback. If there's
> > interest, we should probably roll this out to all of them. But my
> > understanding of core mm is seriously lacking, and I'm not clear on
> > whether we need a lockdep map for each callback, or whether some can
> > be shared.
>
> I need to read more on lockdep but it is legal to have mmu notifier
> invalidations nested within each other. For instance when you munmap you
> might split a huge pmd and that will trigger a second invalidate range
> while the munmap one is not done yet. Would that trigger the lockdep
> here?

Depends on how it's nesting. I'm wrapping the annotation just around
the individual mmu notifier callback, so if the nesting is just
- munmap starts
- invalidate_range_start #1
- we noticed that there's a huge pmd we need to split
- invalidate_range_start #2
- invalidate_range_end #2
- invalidate_range_end #1
- munmap is done

But if otoh it's ok to trigger the 2nd invalidate range from within an
mmu_notifier->invalidate_range_start callback, then lockdep will be
pissed about that.

> Worst case I can think of is 2 invalidate_range_start chains one after
> the other. I don't think you can trigger a 3-level nesting but maybe.

Lockdep has special nesting annotations. I think it'd be more an issue
of getting those funneled through the entire call chain, assuming we
really need that.
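
If real nesting did turn out to be needed, a hedged sketch of what the
inner annotation could look like, using 5.2-era lockdep signatures
(untested, and funneling the nesting information down to this point is
the actual hard part):

	/* inner acquisition of the same fake lock class; without the
	 * SINGLE_DEPTH_NESTING subclass lockdep would report it as
	 * recursive locking */
	lock_acquire(&__mmu_notifier_invalidate_range_start_map,
		     SINGLE_DEPTH_NESTING, 0, 0, 1, NULL, _THIS_IP_);
	/* ... nested invalidate_range_start callbacks ... */
	lock_release(&__mmu_notifier_invalidate_range_start_map, 0, _THIS_IP_);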
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-05-21 16:34:58

by Jerome Glisse

Subject: Re: [PATCH 4/4] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, May 21, 2019 at 06:00:36PM +0200, Daniel Vetter wrote:
> On Tue, May 21, 2019 at 5:41 PM Jerome Glisse <[email protected]> wrote:
> >
> > On Mon, May 20, 2019 at 11:39:45PM +0200, Daniel Vetter wrote:
> > > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > > fairly easy to provoke a specific notifier to be run on a specific
> > > range: Just prep it, and then munmap() it.
> > >
> > > A bit harder, but still doable, is to provoke the mmu notifiers for
> > > all the various callchains that might lead to them. But hitting both
> > > at the same time reliably is really hard, especially when you want to
> > > exercise paths like direct reclaim or compaction, where it's not
> > > easy to control what exactly will be unmapped.
> > >
> > > By introducing a lockdep map to tie them all together we allow lockdep
> > > to see a lot more dependencies, without having to actually hit them
> > > in a single callchain while testing.
> > >
> > > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > > this out for the invalidate_range_start callback. If there's
> > > interest, we should probably roll this out to all of them. But my
> > > understanding of core mm is seriously lacking, and I'm not clear on
> > > whether we need a lockdep map for each callback, or whether some can
> > > be shared.
> >
> > I need to read more on lockdep but it is legal to have mmu notifier
> > invalidations nested within each other. For instance when you munmap you
> > might split a huge pmd and that will trigger a second invalidate range
> > while the munmap one is not done yet. Would that trigger the lockdep
> > here?
>
> Depends on how it's nesting. I'm wrapping the annotation just around
> the individual mmu notifier callback, so if the nesting is just
> - munmap starts
> - invalidate_range_start #1
> - we noticed that there's a huge pmd we need to split
> - invalidate_range_start #2
> - invalidate_range_end #2
> - invalidate_range_end #1
> - munmap is done

Yeah this is how it looks. All the callbacks from range_start #1 would
happen before range_start #2 happens, so we should be fine.

>
> But if otoh it's ok to trigger the 2nd invalidate range from within an
> mmu_notifier->invalidate_range_start callback, then lockdep will be
> pissed about that.

No, it would be illegal for a callback to do that. There is no existing
callback that does that, at least AFAIK. So we can just say that it
is illegal; I don't see the point of allowing it.

>
> > Worst case I can think of is 2 invalidate_range_start chains one after
> > the other. I don't think you can trigger a 3-level nesting but maybe.
>
> Lockdep has special nesting annotations. I think it'd be more an issue
> of getting those funneled through the entire call chain, assuming we
> really need that.

I think we are fine. So this patch looks good.

Reviewed-by: Jérôme Glisse <[email protected]>

2019-06-18 15:22:45

by Daniel Vetter

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
> >
> > Inspired by some confusion we had discussing i915 mmu notifiers and
> > whether we could use the newly-introduced return value to handle some
> > corner cases. Until we realized that these are only for when a task
> > has been killed by the oom reaper.
> >
> > An alternative approach would be to split the callback into two
> > versions, one with the int return value, and the other with void
> > return value like in older kernels. But that's a lot more churn for
> > fairly little gain I think.
> >
> > Summary from the m-l discussion on why we want something at warning
> > level: This allows automated tooling in CI to catch bugs without
> > humans having to look at everything. If we just upgrade the existing
> > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > one will ever spot the problem since it's lost in the massive amounts
> > of overall dmesg noise.
> >
> > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > the problematic case (Michal Hocko).
> >
> > v3: Rebase on top of Glisse's arg rework.
> >
> > v4: More rebase on top of Glisse reworking everything.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: "Christian K?nig" <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: "J?r?me Glisse" <[email protected]>
> > Cc: [email protected]
> > Cc: Paolo Bonzini <[email protected]>
> > Reviewed-by: Christian K?nig <[email protected]>
> > Signed-off-by: Daniel Vetter <[email protected]>
>
> Reviewed-by: J?r?me Glisse <[email protected]>

-mm folks, is this (entire series of 4 patches) planned to land in the 5.3
merge window? Or do you want more reviews/testing/polish?

I think with all the hmm rework going on, a bit more validation and checks
in this tricky area would help.

Thanks, Daniel

>
> > ---
> > mm/mmu_notifier.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index ee36068077b6..c05e406a7cd7 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -181,6 +181,9 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> > pr_info("%pS callback failed with %d in %sblockable context.\n",
> > mn->ops->invalidate_range_start, _ret,
> > !mmu_notifier_range_blockable(range) ? "non-" : "");
> > + if (!mmu_notifier_range_blockable(range))
> > + pr_warn("%pS callback failure not allowed\n",
> > + mn->ops->invalidate_range_start);
> > ret = _ret;
> > }
> > }
> > --
> > 2.20.1
> >

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2019-06-19 16:52:33

by Jason Gunthorpe

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > Just a bit of paranoia, since if we start pushing this deep into
> > > callchains it's hard to spot all places where an mmu notifier
> > > implementation might fail when it's not allowed to.
> > >
> > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > whether we could use the newly-introduced return value to handle some
> > > corner cases. Until we realized that these are only for when a task
> > > has been killed by the oom reaper.
> > >
> > > An alternative approach would be to split the callback into two
> > > versions, one with the int return value, and the other with void
> > > return value like in older kernels. But that's a lot more churn for
> > > fairly little gain I think.
> > >
> > > Summary from the m-l discussion on why we want something at warning
> > > level: This allows automated tooling in CI to catch bugs without
> > > humans having to look at everything. If we just upgrade the existing
> > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > one will ever spot the problem since it's lost in the massive amounts
> > > of overall dmesg noise.
> > >
> > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > the problematic case (Michal Hocko).

I disagree with this v2 note: the WARN_ON/WARN will trigger checkers
like syzkaller to report a bug, while a random pr_warn probably will
not.

I do agree the backtrace is not useful here, but we don't have a
warn-no-backtrace version..

IMHO, kernel/driver bugs should always be reported by WARN &
friends. We never expect to see the print, so why do we care how big
it is?

Also note that WARN integrates an unlikely() into it so the codegen is
automatically a bit more optimal that the if & pr_warn combination.
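
(For reference, roughly the generic definition in
include/asm-generic/bug.h - paraphrased, and architectures can
override it:)

	#define WARN_ON(condition) ({				\
		int __ret_warn_on = !!(condition);		\
		if (unlikely(__ret_warn_on))			\
			__WARN();				\
		unlikely(__ret_warn_on);			\
	})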

Jason

2019-06-19 19:57:55

by Daniel Vetter

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <[email protected]> wrote:
> On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > callchains it's hard to spot all places where an mmu notifier
> > > > implementation might fail when it's not allowed to.
> > > >
> > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > whether we could use the newly-introduced return value to handle some
> > > > corner cases. Until we realized that these are only for when a task
> > > > has been killed by the oom reaper.
> > > >
> > > > An alternative approach would be to split the callback into two
> > > > versions, one with the int return value, and the other with void
> > > > return value like in older kernels. But that's a lot more churn for
> > > > fairly little gain I think.
> > > >
> > > > Summary from the m-l discussion on why we want something at warning
> > > > level: This allows automated tooling in CI to catch bugs without
> > > > humans having to look at everything. If we just upgrade the existing
> > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > one will ever spot the problem since it's lost in the massive amounts
> > > > of overall dmesg noise.
> > > >
> > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > the problematic case (Michal Hocko).
>
> I disagree with this v2 note: the WARN_ON/WARN will trigger checkers
> like syzkaller to report a bug, while a random pr_warn probably will
> not.
>
> I do agree the backtrace is not useful here, but we don't have a
> warn-no-backtrace version..
>
> IMHO, kernel/driver bugs should always be reported by WARN &
> friends. We never expect to see the print, so why do we care how big
> it is?
>
> Also note that WARN integrates an unlikely() into it so the codegen is
> automatically a bit more optimal that the if & pr_warn combination.

Where do you see the difference between a WARN without backtrace and a
pr_warn? They're both dumped at the same log-level ...

I can easily throw an unlikely around this here if that's the only
thing that's blocking the merge.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-06-19 20:16:17

by Jason Gunthorpe

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <[email protected]> wrote:
> > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > implementation might fail when it's not allowed to.
> > > > >
> > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > whether we could use the newly-introduced return value to handle some
> > > > > corner cases. Until we realized that these are only for when a task
> > > > > has been killed by the oom reaper.
> > > > >
> > > > > An alternative approach would be to split the callback into two
> > > > > versions, one with the int return value, and the other with void
> > > > > return value like in older kernels. But that's a lot more churn for
> > > > > fairly little gain I think.
> > > > >
> > > > > Summary from the m-l discussion on why we want something at warning
> > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > humans having to look at everything. If we just upgrade the existing
> > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > of overall dmesg noise.
> > > > >
> > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > the problematic case (Michal Hocko).
> >
> > I disagree with this v2 note: the WARN_ON/WARN will trigger checkers
> > like syzkaller to report a bug, while a random pr_warn probably will
> > not.
> >
> > I do agree the backtrace is not useful here, but we don't have a
> > warn-no-backtrace version..
> >
> > IMHO, kernel/driver bugs should always be reported by WARN &
> > friends. We never expect to see the print, so why do we care how big
> > it is?
> >
> > Also note that WARN integrates an unlikely() into it so the codegen is
> > automatically a bit more optimal that the if & pr_warn combination.
>
> Where do you see the difference between a WARN without backtrace and a
> pr_warn? They're both dumped at the same log-level ...

WARN panics the kernel when you set

/proc/sys/kernel/panic_on_warn

So auto testing tools can set that and get a clean detection that the
kernel has failed the test in some way.

Otherwise you are left with frail/ugly grepping of dmesg.

Jason

2019-06-19 20:19:20

by Daniel Vetter

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <[email protected]> wrote:
> On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <[email protected]> wrote:
> > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > implementation might fail when it's not allowed to.
> > > > > >
> > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > has been killed by the oom reaper.
> > > > > >
> > > > > > An alternative approach would be to split the callback into two
> > > > > > versions, one with the int return value, and the other with void
> > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > fairly little gain I think.
> > > > > >
> > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > of overall dmesg noise.
> > > > > >
> > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > the problematic case (Michal Hocko).
> > >
> > > I disagree with this v2 note: the WARN_ON/WARN will trigger checkers
> > > like syzkaller to report a bug, while a random pr_warn probably will
> > > not.
> > >
> > > I do agree the backtrace is not useful here, but we don't have a
> > > warn-no-backtrace version..
> > >
> > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > friends. We never expect to see the print, so why do we care how big
> > > it is?
> > >
> > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > automatically a bit more optimal that the if & pr_warn combination.
> >
> > Where do you see the difference between a WARN without backtrace and a
> > pr_warn? They're both dumped at the same log-level ...
>
> WARN panics the kernel when you set
>
> /proc/sys/kernel/panic_on_warn
>
> So auto testing tools can set that and get a clean detection that the
> kernel has failed the test in some way.
>
> Otherwise you are left with frail/ugly grepping of dmesg.

Hm right.

Anyway, I'm happy to repaint the bikeshed in any color that's desired,
if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
longer (need to find a bit of time, plus it'll definitely attract more
comments).

Michal?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-06-19 20:43:20

by Jason Gunthorpe

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Wed, Jun 19, 2019 at 10:18:43PM +0200, Daniel Vetter wrote:
> On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <[email protected]> wrote:
> > On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <[email protected]> wrote:
> > > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > > implementation might fail when it's not allowed to.
> > > > > > >
> > > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > > has been killed by the oom reaper.
> > > > > > >
> > > > > > > An alternative approach would be to split the callback into two
> > > > > > > versions, one with the int return value, and the other with void
> > > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > > fairly little gain I think.
> > > > > > >
> > > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > > of overall dmesg noise.
> > > > > > >
> > > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > > the problematic case (Michal Hocko).
> > > >
> > > > I disagree with this v2 note: the WARN_ON/WARN will trigger checkers
> > > > like syzkaller to report a bug, while a random pr_warn probably will
> > > > not.
> > > >
> > > > I do agree the backtrace is not useful here, but we don't have a
> > > > warn-no-backtrace version..
> > > >
> > > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > > friends. We never expect to see the print, so why do we care how big
> > > > it is?
> > > >
> > > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > > automatically a bit more optimal that the if & pr_warn combination.
> > >
> > > Where do you see the difference between a WARN without backtrace and a
> > > pr_warn? They're both dumped at the same log-level ...
> >
> > WARN panics the kernel when you set
> >
> > /proc/sys/kernel/panic_on_warn
> >
> > So auto testing tools can set that and get a clean detection that the
> > kernel has failed the test in some way.
> >
> > Otherwise you are left with frail/ugly grepping of dmesg.
>
> Hm right.
>
> Anyway, I'm happy to repaint the bikeshed in any color that's desired,
> if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
> longer (need to find a bit of time, plus it'll definitely attract more
> comments).

I was actually just writing something very similar when looking at the
hmm things..

Also, is the test backwards?

mmu_notifier_range_blockable() == true means the callback must return
zero

mmu_notifier_range_blockable() == false means the callback can return
0 or -EAGAIN.

Suggest this:

pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!mmu_notifier_range_blockable(range) ? "non-" : "");
+ WARN_ON(mmu_notifier_range_blockable(range) ||
+ _ret != -EAGAIN);
ret = _ret;
}
}

To express the API invariant.

Jason

2019-06-19 21:21:05

by Daniel Vetter

Subject: Re: [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

On Wed, Jun 19, 2019 at 10:42 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Wed, Jun 19, 2019 at 10:18:43PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <[email protected]> wrote:
> > > On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> > > > On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <[email protected]> wrote:
> > > > > On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > > > > On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> > > > > > > On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:
> > > > > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > > > > implementation might fail when it's not allowed to.
> > > > > > > >
> > > > > > > > Inspired by some confusion we had discussing i915 mmu notifiers and
> > > > > > > > whether we could use the newly-introduced return value to handle some
> > > > > > > > corner cases. Until we realized that these are only for when a task
> > > > > > > > has been killed by the oom reaper.
> > > > > > > >
> > > > > > > > An alternative approach would be to split the callback into two
> > > > > > > > versions, one with the int return value, and the other with void
> > > > > > > > return value like in older kernels. But that's a lot more churn for
> > > > > > > > fairly little gain I think.
> > > > > > > >
> > > > > > > > Summary from the m-l discussion on why we want something at warning
> > > > > > > > level: This allows automated tooling in CI to catch bugs without
> > > > > > > > humans having to look at everything. If we just upgrade the existing
> > > > > > > > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > > > > > > > one will ever spot the problem since it's lost in the massive amounts
> > > > > > > > of overall dmesg noise.
> > > > > > > >
> > > > > > > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > > > > > > the problematic case (Michal Hocko).
> > > > >
> > > > > I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> > > > > like syzkaller to report a bug, while a random pr_warn probably will
> > > > > not.
> > > > >
> > > > > I do agree the backtrace is not useful here, but we don't have a
> > > > > warn-no-backtrace version..
> > > > >
> > > > > IMHO, kernel/driver bugs should always be reported by WARN &
> > > > > friends. We never expect to see the print, so why do we care how big
> > > > > it is?
> > > > >
> > > > > Also note that WARN integrates an unlikely() into it so the codegen is
> > > > > automatically a bit more optimal that the if & pr_warn combination.
> > > >
> > > > Where do you make a difference between a WARN without backtrace and a
> > > > pr_warn? They're both dumped at the same log-level ...
> > >
> > > WARN panics the kernel when you set
> > >
> > > /proc/sys/kernel/panic_on_warn
> > >
> > > So auto testing tools can set that and get a clean detection that the
> > > kernel has failed the test in some way.
> > >
> > > Otherwise you are left with frail/ugly grepping of dmesg.
> >
> > Hm right.
> >
> > Anyway, I'm happy to repaint the bikeshed in any color that's desired,
> > if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
> > longer (need to find a bit of time, plus it'll definitely attract more
> > comments).
>
> I was actually just writing something very similar when looking at the
> hmm things..
>
> Also, is the test backwards?

Yes, in the last rebase I screwed things up :-/
-Daniel

> mmu_notifier_range_blockable() == true means the callback must return
> zero
>
> mmu_notifier_range_blockable() == false means the callback can return
> 0 or -EAGAIN.
>
> Suggest this:
>
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> !mmu_notifier_range_blockable(range) ? "non-" : "");
> + WARN_ON(mmu_notifier_range_blockable(range) ||
> + _ret != -EAGAIN);
> ret = _ret;
> }
> }
>
> To express the API invariant.
>
> Jason



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch