2018-11-24 03:17:47

by Daniel Vetter

Subject: [PATCH 0/3] RFC: mmu notifier debug checks

Hi all,

We're having some good fun with the i915 mmu notifier (it deadlocks), and
I think it'd be very useful to have a bunch more runtime debug checks to
catch screw-ups.

I'm also working on some lockdep improvements in gpu code (better
annotations and stuff like that). Together with this series here this
seems to catch a lot of bugs pretty much instantly, which previously took
hours/days of CI workloads to reproduce. Plus now you get nice backtraces
and the kernel keeps working, whereas without this it's real deadlocks
with piles of stuck processes (the deadlock needed at least 3 processes,
but generally it took more to close the loop, plus everyone piling in on
top).

If this looks like a good idea I'm happy to polish it for merging.

Thanks, Daniel

Daniel Vetter (3):
mm: Check if mmu notifier callbacks are allowed to fail
mm, notifier: Catch sleeping/blocking for !blockable
mm, notifier: Add a lockdep map for invalidate_range_start

include/linux/mmu_notifier.h | 7 +++++++
mm/mmu_notifier.c | 17 ++++++++++++++++-
2 files changed, 23 insertions(+), 1 deletion(-)

--
2.19.1



2018-11-24 03:18:30

by Daniel Vetter

Subject: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
mm/mmu_notifier.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..59e102589a25 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
!blockable ? "non-" : "");
+ WARN(blockable,"%pS callback failure not allowed\n",
+ mn->ops->invalidate_range_start);
ret = _ret;
}
}
--
2.19.1


2018-11-24 03:20:55

by Daniel Vetter

Subject: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply buried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
mm/mmu_notifier.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
id = srcu_read_lock(&srcu);
hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
if (mn->ops->invalidate_range_start) {
- int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+ int _ret;
+
+ if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+ preempt_disable();
+ _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+ if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+ preempt_enable();
if (_ret) {
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
--
2.19.1


2018-11-24 06:51:57

by Daniel Vetter

Subject: [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

This is a similar idea to the fs_reclaim fake lockdep lock. It's
fairly easy to provoke a specific notifier to be run on a specific
range: Just prep it, and then munmap() it.

A bit harder, but still doable, is to provoke the mmu notifiers for
all the various callchains that might lead to them. But both at the
same time is really hard to reliably hit, especially when you want to
exercise paths like direct reclaim or compaction, where it's not
easy to control what exactly will be unmapped.

By introducing a lockdep map to tie them all together we allow lockdep
to see a lot more dependencies, without having to actually hit them
in a single callchain while testing.

Aside: Since I typed this to test i915 mmu notifiers I've only rolled
this out for the invalidate_range_start callback. If there's
interest, we should probably roll this out to all of them. But my
understanding of core mm is seriously lacking, and I'm not clear on
whether we need a lockdep map for each callback, or whether some can
be shared.
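
To illustrate why a shared map helps, here is a toy userspace model of
lock-order tracking. It is not lockdep itself, and every name in it
(model_acquire, DRIVER_LOCK, NOTIFIER_MAP, ...) is made up for this sketch.
One path takes a driver lock inside the notifier map; a second, unrelated
path enters the notifier map while holding the driver lock. The model flags
the inversion even though the two paths never ran in the same callchain:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes for this sketch. */
enum { DRIVER_LOCK = 0, NOTIFIER_MAP = 1 };

/* dep[a][b] is true once "b was acquired while a was held" has been seen. */
static bool dep[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_CLASSES];
static int nheld;

/* Depth-first search over the recorded dependency edges. */
static bool reachable_from(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable_from(next, to, seen))
			return true;
	return false;
}

static bool reachable(int from, int to)
{
	bool seen[MAX_CLASSES] = { false };

	return reachable_from(from, to, seen);
}

/*
 * Record an acquisition. Returns false if the new edge closes a cycle,
 * i.e. if cls already (transitively) leads back to a lock we hold.
 */
static bool model_acquire(int cls)
{
	bool ok = true;

	assert(nheld < MAX_CLASSES && cls >= 0 && cls < MAX_CLASSES);
	for (int i = 0; i < nheld; i++) {
		if (reachable(cls, held[i]))
			ok = false;
		dep[held[i]][cls] = true;
	}
	held[nheld++] = cls;
	return ok;
}

/* Release the most recently acquired lock. */
static void model_release(void)
{
	assert(nheld > 0);
	nheld--;
}
```

In this model the first path only teaches the graph the edge
NOTIFIER_MAP -> DRIVER_LOCK; the report fires later, on the first attempt to
take them in the opposite order.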

Cc: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/mmu_notifier.h | 7 +++++++
mm/mmu_notifier.c | 7 +++++++
2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 9893a6432adf..a39ba218dbbe 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -12,6 +12,10 @@ struct mmu_notifier_ops;

#ifdef CONFIG_MMU_NOTIFIER

+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
+#endif
+
/*
* The mmu notifier_mm structure is allocated and installed in
* mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
unsigned long start, unsigned long end)
{
+ mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
+ _RET_IP_);
if (mm_has_notifiers(mm))
__mmu_notifier_invalidate_range_start(mm, start, end, true);
+ mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_);
}

static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 4d282cfb296e..c6e797927376 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -23,6 +23,13 @@
/* global SRCU for all MMs */
DEFINE_STATIC_SRCU(srcu);

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
+ .name = "mmu_notifier_invalidate_range_start"
+};
+EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
+#endif
+
/*
* This function allows mmu_notifier::release callback to delay a call to
* a function that will free appropriate resources. The function must be
--
2.19.1


2018-11-24 06:53:08

by Chris Wilson

Subject: Re: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Quoting Daniel Vetter (2018-11-22 16:51:04)
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.

Most callers could handle the failure correctly. It looks like the
failure was not propagated for convenience.
-Chris

2018-11-24 07:13:03

by Christian König

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

Am 22.11.18 um 17:51 schrieb Daniel Vetter:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: [email protected]
> Cc: Paolo Bonzini <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>

Acked-by: Christian König <[email protected]>

> ---
> mm/mmu_notifier.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..59e102589a25 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> !blockable ? "non-" : "");
> + WARN(blockable,"%pS callback failure not allowed\n",
> + mn->ops->invalidate_range_start);
> ret = _ret;
> }
> }

2018-11-24 07:13:42

by Christian König

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

Am 22.11.18 um 17:51 schrieb Daniel Vetter:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> mm/mmu_notifier.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 59e102589a25..4d282cfb296e 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> if (mn->ops->invalidate_range_start) {
> - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + int _ret;
> +
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_disable();
> + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_enable();

Just for the sake of better documenting this how about adding this to
include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond)    for((cond) ? preempt_disable() :
(void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of
errors).

Christian.

> if (_ret) {
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,

2018-11-24 08:12:46

by Daniel Vetter

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
> Am 22.11.18 um 17:51 schrieb Daniel Vetter:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: "Christian König" <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: "Jérôme Glisse" <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Daniel Vetter <[email protected]>
> > ---
> > mm/mmu_notifier.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > id = srcu_read_lock(&srcu);
> > hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> > if (mn->ops->invalidate_range_start) {
> > - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > + int _ret;
> > +
> > + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > + preempt_disable();
> > + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > + preempt_enable();
>
> Just for the sake of better documenting this how about adding this to
> include/linux/kernel.h right next to might_sleep():
>
> #define disallow_sleeping_if(cond)    for((cond) ? preempt_disable() :
> (void)0; (cond); preempt_disable())
>
> (Just from the back of my head, might contain peanuts and/or hints of
> errors).

I think these magic for blocks aren't used in the kernel. goto breaks
them, and we use goto a lot. I think a disallow/allow_sleep() pair with
the conditional preempt_disable/enable() calls would be nice though. I can
do that if the overall idea sticks.
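
A rough userspace sketch of what such a pair could look like. The helper
names disallow_sleep_if()/allow_sleep_if() and the counter standing in for
the preempt count are assumptions for illustration only; in the kernel the
helpers would wrap preempt_disable()/preempt_enable(), and might_sleep()
would do the reporting:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel preempt count (assumption). */
static int model_preempt_count;

/* Hypothetical helper: would call preempt_disable() in the kernel. */
static void disallow_sleep_if(bool cond)
{
	if (cond)
		model_preempt_count++;
}

/* Hypothetical helper: would call preempt_enable() in the kernel. */
static void allow_sleep_if(bool cond)
{
	if (cond)
		model_preempt_count--;
}

/* Stand-in for might_sleep(): true when a sleep here would be reported. */
static bool sleep_would_warn(void)
{
	return model_preempt_count > 0;
}

static bool last_would_warn;

/* Sample callback that records whether sleeping would have been flagged. */
static int sample_callback(bool blockable)
{
	(void)blockable;
	last_would_warn = sleep_would_warn();
	return 0;
}

/* Mirrors the call site in the patch: guard only the !blockable case. */
static int call_notifier(int (*cb)(bool), bool blockable)
{
	int ret;

	disallow_sleep_if(!blockable);
	ret = cb(blockable);
	allow_sleep_if(!blockable);

	/* The pair must balance on every exit path. */
	assert(model_preempt_count >= 0);
	return ret;
}
```

Because these are plain calls rather than a for-block, an error path that
uses goto only has to make sure it passes through the allow_sleep_if() call.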
-Daniel

>
> Christian.
>
> > if (_ret) {
> > pr_info("%pS callback failed with %d in %sblockable context.\n",
> > mn->ops->invalidate_range_start, _ret,
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-24 08:13:23

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:
> Quoting Daniel Vetter (2018-11-22 16:51:04)
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
>
> Most callers could handle the failure correctly. It looks like the
> failure was not propagated for convenience.

I have no idea whether the mm is semantically ok if pte shootdown doesn't
work for all sorts of strange reasons. From the commit that introduced the
error code it sounded like this was very much only ok in the limited case
of an already killed process, in the oom killer path, where it's really
only about trying to free any kind of memory. And where the process is
gone already, so semantics of what exactly happens don't matter that much
anymore.

And even if a lot more paths could support some kind of error recovery
(they'd need to restart stuff, at least for your i915 patch to work I
think), as long as we have paths where that's not allowed I think it's
good to catch any bugs where a nonzero errno is erroneously returned.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-24 08:23:10

by Christian König

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

Am 23.11.18 um 09:46 schrieb Daniel Vetter:
> On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
>> Am 22.11.18 um 17:51 schrieb Daniel Vetter:
>>> We need to make sure implementations don't cheat and don't have a
>>> possible schedule/blocking point deeply buried where review can't
>>> catch it.
>>>
>>> I'm not sure whether this is the best way to make sure all the
>>> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
>>> But it gets the job done.
>>>
>>> Cc: Andrew Morton <[email protected]>
>>> Cc: Michal Hocko <[email protected]>
>>> Cc: David Rientjes <[email protected]>
>>> Cc: "Christian König" <[email protected]>
>>> Cc: Daniel Vetter <[email protected]>
>>> Cc: "Jérôme Glisse" <[email protected]>
>>> Cc: [email protected]
>>> Signed-off-by: Daniel Vetter <[email protected]>
>>> ---
>>> mm/mmu_notifier.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
>>> index 59e102589a25..4d282cfb296e 100644
>>> --- a/mm/mmu_notifier.c
>>> +++ b/mm/mmu_notifier.c
>>> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>>> id = srcu_read_lock(&srcu);
>>> hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>>> if (mn->ops->invalidate_range_start) {
>>> - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
>>> + int _ret;
>>> +
>>> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
>>> + preempt_disable();
>>> + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
>>> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
>>> + preempt_enable();
>> Just for the sake of better documenting this how about adding this to
>> include/linux/kernel.h right next to might_sleep():
>>
>> #define disallow_sleeping_if(cond)    for((cond) ? preempt_disable() :
>> (void)0; (cond); preempt_disable())
>>
>> (Just from the back of my head, might contain peanuts and/or hints of
>> errors).
> I think these magic for blocks aren't used in the kernel. goto breaks
> them, and we use goto a lot.

Yeah, good argument.

> I think a disallow/allow_sleep() pair with
> the conditional preempt_disable/enable() calls would be nice though. I can
> do that if the overall idea sticks.

Sounds like a good idea to me as well.

Christian.

> -Daniel
>
>> Christian.
>>
>>> if (_ret) {
>>> pr_info("%pS callback failed with %d in %sblockable context.\n",
>>> mn->ops->invalidate_range_start, _ret,


2018-11-24 08:27:11

by Michal Hocko

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config
behavior much different. So is this really worth it? Has this already
discovered any existing bug?

> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> mm/mmu_notifier.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 59e102589a25..4d282cfb296e 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> if (mn->ops->invalidate_range_start) {
> - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + int _ret;
> +
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_disable();
> + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_enable();
> if (_ret) {
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> --
> 2.19.1
>

--
Michal Hocko
SUSE Labs

2018-11-24 08:27:20

by Michal Hocko

Subject: Re: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 09:49:34, Daniel Vetter wrote:
> On Thu, Nov 22, 2018 at 04:53:34PM +0000, Chris Wilson wrote:
> > Quoting Daniel Vetter (2018-11-22 16:51:04)
> > > Just a bit of paranoia, since if we start pushing this deep into
> > > callchains it's hard to spot all places where an mmu notifier
> > > implementation might fail when it's not allowed to.
> >
> > Most callers could handle the failure correctly. It looks like the
> > failure was not propagated for convenience.
>
> I have no idea whether the mm is semantically ok if pte shootdown doesn't
> work for all sorts of strange reasons. From the commit that introduced the
> error code it sounded like this was very much only ok in the limited case
> of an already killed process, in the oom killer path, where it's really
> only about trying to free any kind of memory. And where the process is
> gone already, so semantics of what exactly happens don't matter that much
> anymore.

Yes this was indeed the case. There is still the exit path which would
do the rest of the work so we are not leaving anything behind.
--
Michal Hocko
SUSE Labs

2018-11-24 08:27:23

by Michal Hocko

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.

What does WARN give you beyond the existing pr_info? Is the
backtrace really that interesting?

> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: [email protected]
> Cc: Paolo Bonzini <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> mm/mmu_notifier.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..59e102589a25 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> !blockable ? "non-" : "");
> + WARN(blockable,"%pS callback failure not allowed\n",
> + mn->ops->invalidate_range_start);
> ret = _ret;
> }
> }
> --
> 2.19.1
>

--
Michal Hocko
SUSE Labs

2018-11-24 08:31:55

by Daniel Vetter

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
> On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
>
> What does WARN give you beyond the existing pr_info? Is the
> backtrace really that interesting?

Automated tools have to ignore everything at info level (there's too much
of that). I guess I could do something like

	if (blockable)
		pr_warn(...)
	else
		pr_info(...)

WARN() is simply my go-to tool for getting something at warning level
dumped into dmesg. But I think the pr_warn with the callback function
should be enough indeed.
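
Spelled out as a small userspace model (the enum, failure_level() and
format_failure() are hypothetical names for this sketch; in the kernel the
two branches would simply be pr_warn() and pr_info() with the existing
format string):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel log levels used by pr_info()/pr_warn(). */
enum model_level { MODEL_INFO, MODEL_WARN };

/*
 * A failing callback is only a bug when the caller allowed blocking;
 * in the non-blockable case failing is the expected way to back out.
 */
static enum model_level failure_level(bool blockable)
{
	return blockable ? MODEL_WARN : MODEL_INFO;
}

/* Format the message with a level prefix instead of printing to dmesg. */
static int format_failure(char *buf, size_t len, const char *cb_name,
			  int err, bool blockable)
{
	assert(buf && len > 0);
	return snprintf(buf, len,
			"%s: %s callback failed with %d in %sblockable context.",
			failure_level(blockable) == MODEL_WARN ? "warn" : "info",
			cb_name, err, blockable ? "" : "non-");
}
```

With this split, tooling that filters on warning level and above sees only
the failures that are not allowed.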

If you wonder where all the info level stuff happens that we have to
ignore: suspend/resume is a primary culprit (fairly important for
gfx/desktops), but there's a bunch of other places. Even if we ignore
everything at info and below we still need filters because some drivers
are a bit too trigger-happy (i915 definitely included I guess, so everyone
contributes to this problem).

Cheers, Daniel

>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: "Christian König" <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: "Jérôme Glisse" <[email protected]>
> > Cc: [email protected]
> > Cc: Paolo Bonzini <[email protected]>
> > Signed-off-by: Daniel Vetter <[email protected]>
> > ---
> > mm/mmu_notifier.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 5119ff846769..59e102589a25 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > pr_info("%pS callback failed with %d in %sblockable context.\n",
> > mn->ops->invalidate_range_start, _ret,
> > !blockable ? "non-" : "");
> > + WARN(blockable,"%pS callback failure not allowed\n",
> > + mn->ops->invalidate_range_start);
> > ret = _ret;
> > }
> > }
> > --
> > 2.19.1
> >
>
> --
> Michal Hocko
> SUSE Labs

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-24 08:32:15

by Daniel Vetter

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
>
> Yeah, it is quite ugly. Especially because it makes DEBUG config
> > behavior much different. So is this really worth it? Has this already
> discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in CI
(oom is just way too unpredictable to even try). I'd kinda like to also add
some debug interface so I can provoke an oom kill of a specially prepared
process, to make sure we can reliably exercise this path without killing
the kernel accidentally. We do similar tricks for our shrinker already.

There have been patches floating around with this kind of bug I think, and the
call chains we're dealing with are fairly deep. I don't trust review to reliably
catch this kind of fail, that's why I'm looking into tools to better
validate this stuff to augment review.

And yes it's ugly :-/

Wrt the behavior difference: I guess we could put another counter into the
task struct, and change might_sleep() to check it. All under
CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable
side effect. My worry with that is that people will spot it, and abuse it
in creative ways that do affect semantics. See horrors like
drm_can_sleep() (and I'm sure gfx folks are not the only ones who
seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Thanks, Daniel

>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: "Christian König" <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: "Jérôme Glisse" <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Daniel Vetter <[email protected]>
> > ---
> > mm/mmu_notifier.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > id = srcu_read_lock(&srcu);
> > hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> > if (mn->ops->invalidate_range_start) {
> > - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > + int _ret;
> > +
> > + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > + preempt_disable();
> > + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > + preempt_enable();
> > if (_ret) {
> > pr_info("%pS callback failed with %d in %sblockable context.\n",
> > mn->ops->invalidate_range_start, _ret,
> > --
> > 2.19.1
> >
>
> --
> Michal Hocko
> SUSE Labs

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-24 08:32:44

by Michal Hocko

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> > On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > > We need to make sure implementations don't cheat and don't have a
> > > possible schedule/blocking point deeply buried where review can't
> > > catch it.
> > >
> > > I'm not sure whether this is the best way to make sure all the
> > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > But it gets the job done.
> >
> > Yeah, it is quite ugly. Especially because it makes DEBUG config
> > > behavior much different. So is this really worth it? Has this already
> > discovered any existing bug?
>
> Given that we need an oom trigger to hit this we're not hitting this in CI
> (oom is just way too unpredictable to even try). I'd kinda like to also add
> some debug interface so I can provoke an oom kill of a specially prepared
> process, to make sure we can reliably exercise this path without killing
> the kernel accidentally. We do similar tricks for our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via
sysrq and you should get a predictable oom invocation and execution.
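
A minimal sketch of the first half of that recipe, assuming the test
process marks itself as the preferred victim. The helper names are made up
for this example; the path is a parameter so the helper can be tried
against an ordinary file first:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an oom_score_adj value to a procfs-style file. The kernel accepts
 * values from -1000 to 1000; 1000 makes the task the preferred victim.
 * Returns 0 on success, -1 on error.
 */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	if (value < -1000 || value > 1000)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the value back, e.g. to confirm the setting took effect. */
static int read_oom_score_adj(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

The prepared process would call
write_oom_score_adj("/proc/self/oom_score_adj", 1000) before setting up its
mappings. The oom killer can then be invoked from a root shell with
"echo f > /proc/sysrq-trigger", provided sysrq is enabled.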

[...]
> Wrt the behavior difference: I guess we could put another counter into the
> task struct, and change might_sleep() to check it. All under
> CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable
> side effect. My worry with that is that people will spot it, and abuse it
> in creative ways that do affect semantics. See horrors like
> drm_can_sleep() (and I'm sure gfx folks are not the only ones who
> seriously lacked taste here).
>
> Up to the experts really how to best paint this shed I think.

Actually I like a way to say non_block_{begin,end} and might_sleep
firing inside that context.
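
A userspace model of that shape, as a sketch only: the struct, the global
standing in for the current task and model_might_sleep() are assumptions,
and only the non_block_{begin,end} names come from the suggestion above.
Unlike the preempt_disable() approach it leaves the preempt count alone, so
debug and non-debug builds schedule the same way:

```c
#include <assert.h>

/* Stand-in for the per-task counter that would live in the task struct. */
struct model_task {
	int non_block_count;
};

static struct model_task model_current;
static int sleep_reports;

/* Mark the start of a section in which sleeping is a bug. */
static void non_block_begin(void)
{
	model_current.non_block_count++;
}

/* Mark the end of that section; begin/end calls must balance. */
static void non_block_end(void)
{
	model_current.non_block_count--;
	assert(model_current.non_block_count >= 0);
}

/* Stand-in for might_sleep(): report only inside a non-block section. */
static void model_might_sleep(void)
{
	if (model_current.non_block_count > 0)
		sleep_reports++;
}
```

A counter rather than a flag lets such sections nest.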
--
Michal Hocko
SUSE Labs

2018-11-24 08:33:44

by Michal Hocko

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
> > On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
> > > Just a bit of paranoia, since if we start pushing this deep into
> > > callchains it's hard to spot all places where an mmu notifier
> > > implementation might fail when it's not allowed to.
> >
> > What does WARN give you beyond the existing pr_info? Is the
> > backtrace really that interesting?
>
> Automated tools have to ignore everything at info level (there's too much
> of that). I guess I could do something like
>
> if (blockable)
> pr_warn(...)
> else
> pr_info(...)
>
> WARN() is simply my go-to tool for getting something at warning level
> dumped into dmesg. But I think the pr_warn with the callback function
> should be enough indeed.

I wouldn't mind s@pr_info@pr_warn@

> If you wonder where all the info level stuff happens that we have to
> ignore: suspend/resume is a primary culprit (fairly important for
> gfx/desktops), but there's a bunch of other places. Even if we ignore
> everything at info and below we still need filters because some drivers
> are a bit too trigger-happy (i915 definitely included I guess, so everyone
> contributes to this problem).

Thanks for the clarification.
--
Michal Hocko
SUSE Labs

2018-11-24 08:34:38

by Daniel Vetter

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko <[email protected]> wrote:
> On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
> > On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
> > > On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
> > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > callchains it's hard to spot all places where an mmu notifier
> > > > implementation might fail when it's not allowed to.
> > >
> > > What does WARN give you more than the existing pr_info? Is really
> > > backtrace that interesting?
> >
> > Automated tools have to ignore everything at info level (there's too much
> > of that). I guess I could do something like
> >
> > if (blockable)
> > pr_warn(...)
> > else
> > pr_info(...)
> >
> > WARN() is simply my goto tool for getting something at warning level
> > dumped into dmesg. But I think the pr_warn with the callback function
> > should be enough indeed.
>
> I wouldn't mind s@pr_info@pr_warn@

Well that's too much, because then it would misfire in the oom
testcase, where failing is ok (desirable even, we want to avoid
blocking after all). So it needs to be a switch (or else we need to
filter it in results, and that's a bit of a maintenance headache from a
CI pov).
-Daniel
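
A minimal userspace sketch of the switch discussed above: the failure is
reported at warning level only when the caller allowed blocking, and at
info level otherwise. The enum, the function names and the message format
are hypothetical; in the kernel the two branches would be pr_warn() and
pr_info().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical log levels standing in for the kernel's pr_warn()/pr_info(). */
enum log_level { LOG_LEVEL_INFO, LOG_LEVEL_WARN };

/*
 * Pick the level for a failed notifier callback: failing is a bug when the
 * caller allowed blocking, but expected (e.g. in the oom path) when it did not.
 */
static enum log_level notifier_failure_level(bool blockable)
{
	return blockable ? LOG_LEVEL_WARN : LOG_LEVEL_INFO;
}

/* Format the report into buf and return the level that was chosen. */
static enum log_level report_notifier_failure(char *buf, size_t len,
					      const char *callback, int err,
					      bool blockable)
{
	enum log_level level = notifier_failure_level(blockable);

	assert(buf && len > 0 && callback);
	snprintf(buf, len, "%s: %s callback failed with %d in %sblockable context",
		 level == LOG_LEVEL_WARN ? "warn" : "info", callback, err,
		 blockable ? "" : "non-");
	return level;
}
```

With this shape a CI filter only has to look at warning level and above,
and the oom testcase stays quiet.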

> > If you wonder where all the info level stuff happens that we have to
> > ignore: suspend/resume is a primary culprit (fairly important for
> > gfx/desktops), but there's a bunch of other places. Even if we ignore
> > everything at info and below we still need filters because some drivers
> > are a bit too trigger-happy (i915 definitely included I guess, so everyone
> > contributes to this problem).
>
> Thanks for the clarification.
> --
> Michal Hocko
> SUSE Labs



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2018-11-24 08:34:55

by Tvrtko Ursulin

Subject: Re: [Intel-gfx] [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable


On 23/11/2018 13:12, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko <[email protected]> wrote:
>>
>> On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
>>> On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
>>>> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
>>>>> We need to make sure implementations don't cheat and don't have a
>>>>> possible schedule/blocking point deeply buried where review can't
>>>>> catch it.
>>>>>
>>>>> I'm not sure whether this is the best way to make sure all the
>>>>> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
>>>>> But it gets the job done.
>>>>
>>>> Yeah, it is quite ugly. Especially because it makes DEBUG config
>>>> behavior much different. So is this really worth it? Has this already
>>>> discovered any existing bug?
>>>
>>> Given that we need an oom trigger to hit this we're not hitting this in CI
>>> (oom is just way too unpredictable to even try). I'd kinda like to also add
>>> some debug interface so I can provoke an oom kill of a specially prepared
>>> process, to make sure we can reliably exercise this path without killing
>>> the kernel accidentally. We do similar tricks for our shrinker already.
>>
>> Create a task with oom_score_adj = 1000 and trigger the oom killer via
>> sysrq and you should get a predictable oom invocation and execution.
>
> Ah right. We kinda do that already in an attempt to get the tests
> killed without the runner, for accidental oom. Just didn't think about
> this in the context of intentionally firing the oom. I'll try whether
> I can bake up some new subtest in our userptr/mmu-notifier testcases.

Very handy trick - I think I will look into applying it in the shrinker
area as well.

Regards,

Tvrtko

2018-11-24 08:35:18

by Michal Hocko

Subject: Re: [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

On Fri 23-11-18 14:15:11, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 1:43 PM Michal Hocko <[email protected]> wrote:
> > On Fri 23-11-18 13:30:57, Daniel Vetter wrote:
> > > On Fri, Nov 23, 2018 at 12:15:57PM +0100, Michal Hocko wrote:
> > > > On Thu 22-11-18 17:51:04, Daniel Vetter wrote:
> > > > > Just a bit of paranoia, since if we start pushing this deep into
> > > > > callchains it's hard to spot all places where an mmu notifier
> > > > > implementation might fail when it's not allowed to.
> > > >
> > > > What does WARN give you more than the existing pr_info? Is really
> > > > backtrace that interesting?
> > >
> > > Automated tools have to ignore everything at info level (there's too much
> > > of that). I guess I could do something like
> > >
> > > if (blockable)
> > > pr_warn(...)
> > > else
> > > pr_info(...)
> > >
> > > WARN() is simply my goto tool for getting something at warning level
> > > dumped into dmesg. But I think the pr_warn with the callback function
> > > should be enough indeed.
> >
> > I wouldn't mind s@pr_info@pr_warn@
>
> Well that's too much, because then it would misfire in the oom
> testcase, where failing is ok (desirable even, we want to avoid
> blocking after all). So it needs to be a switch (or else we need to
> filter it in results, and that's a bit of a maintenance headache from a
> CI pov).

I thought the failures should be rare enough that warning about them can
actually be useful. E.g. in the oom case we can live with the failure
because we want to release _some_ memory, but knowing about a callback that
prevents us from going the full way might be interesting.

But I do not really feel strongly about this. I find WARN a bit of an abuse
because the trace is unlikely to help us much. If you want to make
the verbosity depend on the blockable context then I will surely not
stand in the way.

--
Michal Hocko
SUSE Labs

2018-11-24 08:35:45

by Daniel Vetter

Subject: Re: [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko <[email protected]> wrote:
>
> On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
> > On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> > > On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > > > We need to make sure implementations don't cheat and don't have a
> > > > possible schedule/blocking point deeply buried where review can't
> > > > catch it.
> > > >
> > > > I'm not sure whether this is the best way to make sure all the
> > > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > > But it gets the job done.
> > >
> > > Yeah, it is quite ugly. Especially because it makes DEBUG config
> > > behavior much different. So is this really worth it? Has this already
> > > discovered any existing bug?
> >
> > Given that we need an oom trigger to hit this we're not hitting this in CI
> > (oom is just way too unpredictable to even try). I'd kinda like to also add
> > some debug interface so I can provoke an oom kill of a specially prepared
> > process, to make sure we can reliably exercise this path without killing
> > the kernel accidentally. We do similar tricks for our shrinker already.
>
> Create a task with oom_score_adj = 1000 and trigger the oom killer via
> sysrq and you should get a predictable oom invocation and execution.

Ah right. We kinda do that already in an attempt to get the tests
killed without the runner, for accidental oom. Just didn't think about
this in the context of intentionally firing the oom. I'll try whether
I can bake up some new subtest in our userptr/mmu-notifier testcases.
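
A rough userspace sketch of the trick described above, assuming root
privileges and an enabled sysrq interface. The helper names are
hypothetical; the paths are taken as parameters so the helpers can be
tried against ordinary files first. On a real system they would be
"/proc/self/oom_score_adj" and "/proc/sysrq-trigger" (writing "f" there
asks the kernel to run the oom killer).

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a file; returns 0 on success, -1 on failure. */
static int write_string(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Mark the calling process as the preferred oom victim.
 * score_path is normally "/proc/self/oom_score_adj".
 */
static int make_self_oom_victim(const char *score_path)
{
	return write_string(score_path, "1000");
}

/*
 * Ask the kernel to run the oom killer once.
 * trigger_path is normally "/proc/sysrq-trigger".
 */
static int trigger_oom_killer(const char *trigger_path)
{
	return write_string(trigger_path, "f");
}
```

A subtest would fork a child that maps the userptr object, calls
make_self_oom_victim(), and then has the parent call
trigger_oom_killer() so the notifier runs in the oom path.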

> [...]
> > Wrt the behavior difference: I guess we could put another counter into the
> > task struct, and change might_sleep() to check it. All under
> > CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable
> > side effect. My worry with that is that people will spot it, and abuse it
> > in creative ways that do affect semantics. See horrors like
> > drm_can_sleep() (and I'm sure gfx folks are not the only ones who
> > seriously lacked taste here).
> >
> > Up to the experts really how to best paint this shed I think.
>
> Actually I like a way to say non_block_{begin,end} and might_sleep
> firing inside that context.

Ok, I'll respin with these (introduced in a separate patch).
-Daniel
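
A toy userspace model of the non_block_{begin,end} idea, purely as a
sketch: a counter marks the non-blocking section and a might_sleep()-style
check fires while it is non-zero. All names here are hypothetical, and the
plain static counter stands in for what would be a per-task field under
CONFIG_DEBUG_ATOMIC_SLEEP in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Nesting depth of non-blocking sections (per task in a real version). */
static int non_block_count;
/* How many sleep attempts were caught inside a non-blocking section. */
static int sleep_violations;

static void non_block_begin(void)
{
	non_block_count++;
}

static void non_block_end(void)
{
	assert(non_block_count > 0);
	non_block_count--;
}

/* Model of might_sleep(): returns true if sleeping is allowed here. */
static bool might_sleep_check(void)
{
	if (non_block_count > 0) {
		sleep_violations++;
		return false;
	}
	return true;
}

/* Hypothetical callback hiding a blocking point deep in its callchain. */
static int sloppy_callback(void)
{
	might_sleep_check();
	return 0;
}

/* Run a callback the way a !blockable invalidation would. */
static int call_nonblocking(int (*cb)(void))
{
	int ret;

	non_block_begin();
	ret = cb();
	non_block_end();
	return ret;
}
```

Unlike disabling preemption, the counter only affects the debug check, so
the DEBUG and non-DEBUG configurations behave the same otherwise.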
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2018-11-27 07:50:34

by Daniel Vetter

Subject: Re: [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
> This is a similar idea to the fs_reclaim fake lockdep lock. It's
> fairly easy to provoke a specific notifier to be run on a specific
> range: Just prep it, and then munmap() it.
>
> A bit harder, but still doable, is to provoke the mmu notifiers for
> all the various callchains that might lead to them. But both at the
> same time is really hard to reliably hit, especially when you want to
> exercise paths like direct reclaim or compaction, where it's not
> easy to control what exactly will be unmapped.
>
> By introducing a lockdep map to tie them all together we allow lockdep
> to see a lot more dependencies, without having to actually hit them
> in a single callchain while testing.
>
> Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> this out for the invalidate_range_start callback. If there's
> interest, we should probably roll this out to all of them. But my
> understanding of core mm is seriously lacking, and I'm not clear on
> whether we need a lockdep map for each callback, or whether some can
> be shared.
>
> Cc: Andrew Morton <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: [email protected]
> Signed-off-by: Daniel Vetter <[email protected]>

Any comments on this one here? This is really the main ingredient for
catching deadlocks in mmu notifier callbacks. The other two patches are
more the icing on the cake.

Thanks, Daniel

> ---
> include/linux/mmu_notifier.h | 7 +++++++
> mm/mmu_notifier.c | 7 +++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 9893a6432adf..a39ba218dbbe 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -12,6 +12,10 @@ struct mmu_notifier_ops;
>
> #ifdef CONFIG_MMU_NOTIFIER
>
> +#ifdef CONFIG_LOCKDEP
> +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
> +#endif
> +
> /*
> * The mmu notifier_mm structure is allocated and installed in
> * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
> @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
> static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> unsigned long start, unsigned long end)
> {
> + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
> + _RET_IP_);
> if (mm_has_notifiers(mm))
> __mmu_notifier_invalidate_range_start(mm, start, end, true);
> + mutex_release(&__mmu_notifier_invalidate_range_start_map, 1, _RET_IP_);
> }
>
> static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 4d282cfb296e..c6e797927376 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -23,6 +23,13 @@
> /* global SRCU for all MMs */
> DEFINE_STATIC_SRCU(srcu);
>
> +#ifdef CONFIG_LOCKDEP
> +struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
> + .name = "mmu_notifier_invalidate_range_start"
> +};
> +EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start_map);
> +#endif
> +
> /*
> * This function allows mmu_notifier::release callback to delay a call to
> * a function that will free appropriate resources. The function must be
> --
> 2.19.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-27 16:53:41

by Chris Wilson

Subject: Re: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

Quoting Daniel Vetter (2018-11-27 07:49:18)
> On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
> > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > fairly easy to provoke a specific notifier to be run on a specific
> > range: Just prep it, and then munmap() it.
> >
> > A bit harder, but still doable, is to provoke the mmu notifiers for
> > all the various callchains that might lead to them. But both at the
> > same time is really hard to reliably hit, especially when you want to
> > exercise paths like direct reclaim or compaction, where it's not
> > easy to control what exactly will be unmapped.
> >
> > By introducing a lockdep map to tie them all together we allow lockdep
> > to see a lot more dependencies, without having to actually hit them
> > in a single callchain while testing.
> >
> > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > this out for the invalidate_range_start callback. If there's
> > interest, we should probably roll this out to all of them. But my
> > understanding of core mm is seriously lacking, and I'm not clear on
> > whether we need a lockdep map for each callback, or whether some can
> > be shared.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: "Jérôme Glisse" <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: "Christian König" <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > Cc: Mike Rapoport <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Daniel Vetter <[email protected]>
>
> Any comments on this one here? This is really the main ingredient for
> catching deadlocks in mmu notifier callbacks. The other two patches are
> more the icing on the cake.
>
> Thanks, Daniel
>
> > ---
> > include/linux/mmu_notifier.h | 7 +++++++
> > mm/mmu_notifier.c | 7 +++++++
> > 2 files changed, 14 insertions(+)
> >
> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index 9893a6432adf..a39ba218dbbe 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -12,6 +12,10 @@ struct mmu_notifier_ops;
> >
> > #ifdef CONFIG_MMU_NOTIFIER
> >
> > +#ifdef CONFIG_LOCKDEP
> > +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
> > +#endif
> > +
> > /*
> > * The mmu notifier_mm structure is allocated and installed in
> > * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
> > @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
> > static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > unsigned long start, unsigned long end)
> > {
> > + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
> > + _RET_IP_);

Would not lock_acquire_shared() be more appropriate, i.e. treat this as
a rwsem_acquire_read()?
-Chris

2018-11-27 17:44:16

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, Nov 27, 2018 at 05:33:58PM +0000, Chris Wilson wrote:
> Quoting Daniel Vetter (2018-11-27 17:28:43)
> > On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson <[email protected]> wrote:
> > >
> > > Quoting Daniel Vetter (2018-11-27 07:49:18)
> > > > On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
> > > > > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > > > > fairly easy to provoke a specific notifier to be run on a specific
> > > > > range: Just prep it, and then munmap() it.
> > > > >
> > > > > A bit harder, but still doable, is to provoke the mmu notifiers for
> > > > > all the various callchains that might lead to them. But both at the
> > > > > same time is really hard to reliably hit, especially when you want to
> > > > > exercise paths like direct reclaim or compaction, where it's not
> > > > > easy to control what exactly will be unmapped.
> > > > >
> > > > > By introducing a lockdep map to tie them all together we allow lockdep
> > > > > to see a lot more dependencies, without having to actually hit them
> > > > > in a single callchain while testing.
> > > > >
> > > > > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > > > > this out for the invalidate_range_start callback. If there's
> > > > > interest, we should probably roll this out to all of them. But my
> > > > > understanding of core mm is seriously lacking, and I'm not clear on
> > > > > whether we need a lockdep map for each callback, or whether some can
> > > > > be shared.
> > > > >
> > > > > Cc: Andrew Morton <[email protected]>
> > > > > Cc: David Rientjes <[email protected]>
> > > > > Cc: "Jérôme Glisse" <[email protected]>
> > > > > Cc: Michal Hocko <[email protected]>
> > > > > Cc: "Christian König" <[email protected]>
> > > > > Cc: Greg Kroah-Hartman <[email protected]>
> > > > > Cc: Daniel Vetter <[email protected]>
> > > > > Cc: Mike Rapoport <[email protected]>
> > > > > Cc: [email protected]
> > > > > Signed-off-by: Daniel Vetter <[email protected]>
> > > >
> > > > Any comments on this one here? This is really the main ingredient for
> > > > catching deadlocks in mmu notifier callbacks. The other two patches are
> > > > more the icing on the cake.
> > > >
> > > > Thanks, Daniel
> > > >
> > > > > ---
> > > > > include/linux/mmu_notifier.h | 7 +++++++
> > > > > mm/mmu_notifier.c | 7 +++++++
> > > > > 2 files changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > > > > index 9893a6432adf..a39ba218dbbe 100644
> > > > > --- a/include/linux/mmu_notifier.h
> > > > > +++ b/include/linux/mmu_notifier.h
> > > > > @@ -12,6 +12,10 @@ struct mmu_notifier_ops;
> > > > >
> > > > > #ifdef CONFIG_MMU_NOTIFIER
> > > > >
> > > > > +#ifdef CONFIG_LOCKDEP
> > > > > +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
> > > > > +#endif
> > > > > +
> > > > > /*
> > > > > * The mmu notifier_mm structure is allocated and installed in
> > > > > * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
> > > > > @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
> > > > > static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > > > unsigned long start, unsigned long end)
> > > > > {
> > > > > + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
> > > > > + _RET_IP_);
> > >
> > > Would not lock_acquire_shared() be more appropriate, i.e. treat this as
> > > a rwsem_acquire_read()?
> >
> > read lock critical sections can't create any dependencies against any
> > other read lock critical section of the same lock. Switching this to a
> > read lock would just render the annotation pointless (if you don't
> > include at least some write lock critical section somewhere, but I
> > have no idea where you'd do that). A read lock that you only ever take
> > for reading essentially doesn't do anything at all.
> >
> > So not clear on why you're suggesting this?
>
> Just that it's not acting as a mutex, so emulating one looks wrong.

Ok, I think switching to lock_map_acquire/release should address that.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-11-27 19:28:05

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson <[email protected]> wrote:
>
> Quoting Daniel Vetter (2018-11-27 07:49:18)
> > On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
> > > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > > fairly easy to provoke a specific notifier to be run on a specific
> > > range: Just prep it, and then munmap() it.
> > >
> > > A bit harder, but still doable, is to provoke the mmu notifiers for
> > > all the various callchains that might lead to them. But both at the
> > > same time is really hard to reliably hit, especially when you want to
> > > exercise paths like direct reclaim or compaction, where it's not
> > > easy to control what exactly will be unmapped.
> > >
> > > By introducing a lockdep map to tie them all together we allow lockdep
> > > to see a lot more dependencies, without having to actually hit them
> > > in a single callchain while testing.
> > >
> > > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > > this out for the invalidate_range_start callback. If there's
> > > interest, we should probably roll this out to all of them. But my
> > > understanding of core mm is seriously lacking, and I'm not clear on
> > > whether we need a lockdep map for each callback, or whether some can
> > > be shared.
> > >
> > > Cc: Andrew Morton <[email protected]>
> > > Cc: David Rientjes <[email protected]>
> > > Cc: "Jérôme Glisse" <[email protected]>
> > > Cc: Michal Hocko <[email protected]>
> > > Cc: "Christian König" <[email protected]>
> > > Cc: Greg Kroah-Hartman <[email protected]>
> > > Cc: Daniel Vetter <[email protected]>
> > > Cc: Mike Rapoport <[email protected]>
> > > Cc: [email protected]
> > > Signed-off-by: Daniel Vetter <[email protected]>
> >
> > Any comments on this one here? This is really the main ingredient for
> > catching deadlocks in mmu notifier callbacks. The other two patches are
> > more the icing on the cake.
> >
> > Thanks, Daniel
> >
> > > ---
> > > include/linux/mmu_notifier.h | 7 +++++++
> > > mm/mmu_notifier.c | 7 +++++++
> > > 2 files changed, 14 insertions(+)
> > >
> > > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > > index 9893a6432adf..a39ba218dbbe 100644
> > > --- a/include/linux/mmu_notifier.h
> > > +++ b/include/linux/mmu_notifier.h
> > > @@ -12,6 +12,10 @@ struct mmu_notifier_ops;
> > >
> > > #ifdef CONFIG_MMU_NOTIFIER
> > >
> > > +#ifdef CONFIG_LOCKDEP
> > > +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
> > > +#endif
> > > +
> > > /*
> > > * The mmu notifier_mm structure is allocated and installed in
> > > * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
> > > @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
> > > static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > unsigned long start, unsigned long end)
> > > {
> > > + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
> > > + _RET_IP_);
>
> Would not lock_acquire_shared() be more appropriate, i.e. treat this as
> a rwsem_acquire_read()?

read lock critical sections can't create any dependencies against any
other read lock critical section of the same lock. Switching this to a
read lock would just render the annotation pointless (if you don't
include at least some write lock critical section somewhere, but I
have no idea where you'd do that). A read lock that you only ever take
for reading essentially doesn't do anything at all.

So not clear on why you're suggesting this?

It's the exact same idea as fs_reclaim: inserting a fake lock to
tie all possible callchains to a given function together with all
possible callchains from that function. Of course this is only valid
if all NxM combinations could happen in theory. For fs_reclaim that's
true because direct reclaim can pick anything it wants to
shrink/evict. For mmu notifier that's true as long as we assume any
mmu notifier can be in use by any process, which only depends upon
sufficiently contrived/evil userspace.

I guess I could use lock_map_acquire/release() wrappers for this like
fs_reclaim does, which would be a bit clearer.
-Daniel
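
To illustrate why a single fake lock ties the two sets of callchains
together, here is a toy userspace dependency tracker. It is only a sketch
of the idea and not how lockdep is implemented; the lock class names and
helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes: one driver lock and the fake notifier map. */
enum { DRIVER_LOCK, NOTIFIER_MAP };

/* dep[a][b] is true once class b was acquired while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is there a recorded chain from -> ... -> to? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired while held". Returns false if the new edge would close
 * a cycle, i.e. some other callchain already took the two in reverse order.
 */
static bool record_dependency(int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

One test run that enters the notifier while holding the driver lock
records record_dependency(DRIVER_LOCK, NOTIFIER_MAP). A completely
separate run where the callback takes the driver lock then calls
record_dependency(NOTIFIER_MAP, DRIVER_LOCK), which is rejected as an
inversion even though the two callchains never met in one test.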
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2018-11-27 19:29:08

by Chris Wilson

Subject: Re: [Intel-gfx] [PATCH 3/3] mm, notifier: Add a lockdep map for invalidate_range_start

Quoting Daniel Vetter (2018-11-27 17:28:43)
> On Tue, Nov 27, 2018 at 5:50 PM Chris Wilson <[email protected]> wrote:
> >
> > Quoting Daniel Vetter (2018-11-27 07:49:18)
> > > On Thu, Nov 22, 2018 at 05:51:06PM +0100, Daniel Vetter wrote:
> > > > This is a similar idea to the fs_reclaim fake lockdep lock. It's
> > > > fairly easy to provoke a specific notifier to be run on a specific
> > > > range: Just prep it, and then munmap() it.
> > > >
> > > > A bit harder, but still doable, is to provoke the mmu notifiers for
> > > > all the various callchains that might lead to them. But both at the
> > > > same time is really hard to reliably hit, especially when you want to
> > > > exercise paths like direct reclaim or compaction, where it's not
> > > > easy to control what exactly will be unmapped.
> > > >
> > > > By introducing a lockdep map to tie them all together we allow lockdep
> > > > to see a lot more dependencies, without having to actually hit them
> > > > in a single callchain while testing.
> > > >
> > > > Aside: Since I typed this to test i915 mmu notifiers I've only rolled
> > > > this out for the invalidate_range_start callback. If there's
> > > > interest, we should probably roll this out to all of them. But my
> > > > understanding of core mm is seriously lacking, and I'm not clear on
> > > > whether we need a lockdep map for each callback, or whether some can
> > > > be shared.
> > > >
> > > > Cc: Andrew Morton <[email protected]>
> > > > Cc: David Rientjes <[email protected]>
> > > > Cc: "Jérôme Glisse" <[email protected]>
> > > > Cc: Michal Hocko <[email protected]>
> > > > Cc: "Christian König" <[email protected]>
> > > > Cc: Greg Kroah-Hartman <[email protected]>
> > > > Cc: Daniel Vetter <[email protected]>
> > > > Cc: Mike Rapoport <[email protected]>
> > > > Cc: [email protected]
> > > > Signed-off-by: Daniel Vetter <[email protected]>
> > >
> > > Any comments on this one here? This is really the main ingredient for
> > > catching deadlocks in mmu notifier callbacks. The other two patches are
> > > more the icing on the cake.
> > >
> > > Thanks, Daniel
> > >
> > > > ---
> > > > include/linux/mmu_notifier.h | 7 +++++++
> > > > mm/mmu_notifier.c | 7 +++++++
> > > > 2 files changed, 14 insertions(+)
> > > >
> > > > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > > > index 9893a6432adf..a39ba218dbbe 100644
> > > > --- a/include/linux/mmu_notifier.h
> > > > +++ b/include/linux/mmu_notifier.h
> > > > @@ -12,6 +12,10 @@ struct mmu_notifier_ops;
> > > >
> > > > #ifdef CONFIG_MMU_NOTIFIER
> > > >
> > > > +#ifdef CONFIG_LOCKDEP
> > > > +extern struct lockdep_map __mmu_notifier_invalidate_range_start_map;
> > > > +#endif
> > > > +
> > > > /*
> > > > * The mmu notifier_mm structure is allocated and installed in
> > > > * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
> > > > @@ -267,8 +271,11 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
> > > > static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> > > > unsigned long start, unsigned long end)
> > > > {
> > > > + mutex_acquire(&__mmu_notifier_invalidate_range_start_map, 0, 0,
> > > > + _RET_IP_);
> >
> > Would not lock_acquire_shared() be more appropriate, i.e. treat this as
> > a rwsem_acquire_read()?
>
> read lock critical sections can't create any dependencies against any
> other read lock critical section of the same lock. Switching this to a
> read lock would just render the annotation pointless (if you don't
> include at least some write lock critical section somewhere, but I
> have no idea where you'd do that). A read lock that you only ever take
> for reading essentially doesn't do anything at all.
>
> So not clear on why you're suggesting this?

Just that it's not acting as a mutex, so emulating one looks wrong.
-Chris