2017-03-12 19:28:02

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

The trouble we have is that we can't really test all the shrinker
recursion stuff exhaustively in BAT because any kind of thrashing
stress test just takes too long.

But that leaves a really big gap open, since shrinker recursions are
one of the most annoying bugs. Now lockdep already has support for
checking allocation deadlocks:

- Direct reclaim paths are marked up with
lockdep_set_current_reclaim_state() and
lockdep_clear_current_reclaim_state().

- Any allocation paths are marked with lockdep_trace_alloc().

If we simply mark up our debugfs with the reclaim annotations, any
code and locks taken in there will automatically complete the picture
with any allocation paths we already have, as long as we have a simple
testcase in BAT which throws out a few objects using this interface.
Not stress test or thrashing needed at all.

v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module.

Cc: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Reviewed-by: Chris Wilson <[email protected]> (v1)
Signed-off-by: Daniel Vetter <[email protected]>

--

Peter/Ingo,

We want this to validate the i915 shrinker locking in our fast tests
without thrashing badly (that takes too long, we can only thrash in
the extended runs). Can you pls take a look and if it's ok ack for
merging through drm-intel.git?

Thanks, Daniel
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
kernel/locking/lockdep.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 82fb005a5e22..0f1d6c4a212b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4273,11 +4273,13 @@ i915_drop_caches_set(void *data, u64 val)
if (val & (DROP_RETIRE | DROP_ACTIVE))
i915_gem_retire_requests(dev_priv);

+ lockdep_set_current_reclaim_state(GFP_KERNEL);
if (val & DROP_BOUND)
i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);

if (val & DROP_UNBOUND)
i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_UNBOUND);
+ lockdep_clear_current_reclaim_state();

if (val & DROP_SHRINK_ALL)
i915_gem_shrink_all(dev_priv);
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 12e38c213b70..508cbf31d43e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3856,11 +3856,13 @@ void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
{
current->lockdep_reclaim_gfp = gfp_mask;
}
+EXPORT_SYMBOL_GPL(lockdep_set_current_reclaim_state);

void lockdep_clear_current_reclaim_state(void)
{
current->lockdep_reclaim_gfp = 0;
}
+EXPORT_SYMBOL_GPL(lockdep_clear_current_reclaim_state);

#ifdef CONFIG_LOCK_STAT
static int
--
2.11.0


2017-03-12 19:58:53

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

On Sun, Mar 12, 2017 at 08:27:16PM +0100, Daniel Vetter wrote:
> The trouble we have is that we can't really test all the shrinker
> recursion stuff exhaustively in BAT because any kind of thrashing
> stress test just takes too long.
>
> But that leaves a really big gap open, since shrinker recursions are
> one of the most annoying bugs. Now lockdep already has support for
> checking allocation deadlocks:
>
> - Direct reclaim paths are marked up with
> lockdep_set_current_reclaim_state() and
> lockdep_clear_current_reclaim_state().
>
> - Any allocation paths are marked with lockdep_trace_alloc().
>
> If we simply mark up our debugfs with the reclaim annotations, any
> code and locks taken in there will automatically complete the picture
> with any allocation paths we already have, as long as we have a simple
> testcase in BAT which throws out a few objects using this interface.
> Not stress test or thrashing needed at all.
>
> v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module.
>
> Cc: Chris Wilson <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> Reviewed-by: Chris Wilson <[email protected]> (v1)
> Signed-off-by: Daniel Vetter <[email protected]>
>
> --
>
> Peter/Ingo,
>
> We want this to validate the i915 shrinker locking in our fast tests
> without thrashing badly (that takes too long, we can only thrash in
> the extended runs). Can you pls take a look and if it's ok ack for
> merging through drm-intel.git?
>
> Thanks, Daniel
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
> kernel/locking/lockdep.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 82fb005a5e22..0f1d6c4a212b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4273,11 +4273,13 @@ i915_drop_caches_set(void *data, u64 val)
> if (val & (DROP_RETIRE | DROP_ACTIVE))
> i915_gem_retire_requests(dev_priv);
>
> + lockdep_set_current_reclaim_state(GFP_KERNEL);
> if (val & DROP_BOUND)
> i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);
>
> if (val & DROP_UNBOUND)
> i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_UNBOUND);
> + lockdep_clear_current_reclaim_state();
>
> if (val & DROP_SHRINK_ALL)
> i915_gem_shrink_all(dev_priv);

Best to move the clear to here.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2017-03-12 20:54:00

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

The trouble we have is that we can't really test all the shrinker
recursion stuff exhaustively in BAT because any kind of thrashing
stress test just takes too long.

But that leaves a really big gap open, since shrinker recursions are
one of the most annoying bugs. Now lockdep already has support for
checking allocation deadlocks:

- Direct reclaim paths are marked up with
lockdep_set_current_reclaim_state() and
lockdep_clear_current_reclaim_state().

- Any allocation paths are marked with lockdep_trace_alloc().

If we simply mark up our debugfs with the reclaim annotations, any
code and locks taken in there will automatically complete the picture
with any allocation paths we already have, as long as we have a simple
testcase in BAT which throws out a few objects using this interface.
Not stress test or thrashing needed at all.

v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module.

v3: Fixup rebase fail (spotted by Chris).

Cc: Chris Wilson <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>

--

Peter/Ingo,

We want this to validate the i915 shrinker locking in our fast tests
without thrashing badly (that takes too long, we can only thrash in
the extended runs). Can you pls take a look and if it's ok ack for
merging through drm-intel.git?

Thanks, Daniel
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
kernel/locking/lockdep.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 82fb005a5e22..fbe761a3f5bd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4273,6 +4273,7 @@ i915_drop_caches_set(void *data, u64 val)
if (val & (DROP_RETIRE | DROP_ACTIVE))
i915_gem_retire_requests(dev_priv);

+ lockdep_set_current_reclaim_state(GFP_KERNEL);
if (val & DROP_BOUND)
i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);

@@ -4281,6 +4282,7 @@ i915_drop_caches_set(void *data, u64 val)

if (val & DROP_SHRINK_ALL)
i915_gem_shrink_all(dev_priv);
+ lockdep_clear_current_reclaim_state();

unlock:
mutex_unlock(&dev->struct_mutex);
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 12e38c213b70..508cbf31d43e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3856,11 +3856,13 @@ void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
{
current->lockdep_reclaim_gfp = gfp_mask;
}
+EXPORT_SYMBOL_GPL(lockdep_set_current_reclaim_state);

void lockdep_clear_current_reclaim_state(void)
{
current->lockdep_reclaim_gfp = 0;
}
+EXPORT_SYMBOL_GPL(lockdep_clear_current_reclaim_state);

#ifdef CONFIG_LOCK_STAT
static int
--
2.11.0

2017-03-13 08:02:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

On Sun, Mar 12, 2017 at 09:53:40PM +0100, Daniel Vetter wrote:

> Peter/Ingo,
>
> We want this to validate the i915 shrinker locking in our fast tests
> without thrashing badly (that takes too long, we can only thrash in
> the extended runs). Can you pls take a look and if it's ok ack for
> merging through drm-intel.git?

Hurm, I was going to rework all that soonish; have a look here:

https://lkml.kernel.org/r/[email protected]

The immediate problem is that I made the annotation private to mm/
there, I suppose I could fix that.

> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
> kernel/locking/lockdep.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 82fb005a5e22..fbe761a3f5bd 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4273,6 +4273,7 @@ i915_drop_caches_set(void *data, u64 val)
> if (val & (DROP_RETIRE | DROP_ACTIVE))
> i915_gem_retire_requests(dev_priv);
>
> + lockdep_set_current_reclaim_state(GFP_KERNEL);
> if (val & DROP_BOUND)
> i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);
>
> @@ -4281,6 +4282,7 @@ i915_drop_caches_set(void *data, u64 val)
>
> if (val & DROP_SHRINK_ALL)
> i915_gem_shrink_all(dev_priv);
> + lockdep_clear_current_reclaim_state();
>
> unlock:
> mutex_unlock(&dev->struct_mutex);
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 12e38c213b70..508cbf31d43e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3856,11 +3856,13 @@ void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
> {
> current->lockdep_reclaim_gfp = gfp_mask;
> }
> +EXPORT_SYMBOL_GPL(lockdep_set_current_reclaim_state);
>
> void lockdep_clear_current_reclaim_state(void)
> {
> current->lockdep_reclaim_gfp = 0;
> }
> +EXPORT_SYMBOL_GPL(lockdep_clear_current_reclaim_state);
>
> #ifdef CONFIG_LOCK_STAT
> static int
> --
> 2.11.0
>

2017-03-13 08:15:36

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

On Mon, Mar 13, 2017 at 09:01:57AM +0100, Peter Zijlstra wrote:
> On Sun, Mar 12, 2017 at 09:53:40PM +0100, Daniel Vetter wrote:
>
> > Peter/Ingo,
> >
> > We want this to validate the i915 shrinker locking in our fast tests
> > without thrashing badly (that takes too long, we can only thrash in
> > the extended runs). Can you pls take a look and if it's ok ack for
> > merging through drm-intel.git?
>
> Hurm, I was going to rework all that soonish; have a look here:
>
> https://lkml.kernel.org/r/[email protected]
>
> The immediate problem is that I made the annotation private to mm/
> there, I suppose I could fix that.

Yeah, we'd really like to have that, and even when switched to a
lockdep_map instead of reusing the context stuff the semantic interface
would be the same (and I think we should keep the gfp_flags stuff, in case
someone adds a nesting lockdep map for GFP_IO).

Do you want a topic branch with just this patch (the shrink_all is new so
there will be a conflict and we can't mege it through one tree alone) so
that you can refactor things with i915 included?

Thanks, Daniel

>
> > ---
> > drivers/gpu/drm/i915/i915_debugfs.c | 2 ++
> > kernel/locking/lockdep.c | 2 ++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 82fb005a5e22..fbe761a3f5bd 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -4273,6 +4273,7 @@ i915_drop_caches_set(void *data, u64 val)
> > if (val & (DROP_RETIRE | DROP_ACTIVE))
> > i915_gem_retire_requests(dev_priv);
> >
> > + lockdep_set_current_reclaim_state(GFP_KERNEL);
> > if (val & DROP_BOUND)
> > i915_gem_shrink(dev_priv, LONG_MAX, I915_SHRINK_BOUND);
> >
> > @@ -4281,6 +4282,7 @@ i915_drop_caches_set(void *data, u64 val)
> >
> > if (val & DROP_SHRINK_ALL)
> > i915_gem_shrink_all(dev_priv);
> > + lockdep_clear_current_reclaim_state();
> >
> > unlock:
> > mutex_unlock(&dev->struct_mutex);
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 12e38c213b70..508cbf31d43e 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -3856,11 +3856,13 @@ void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
> > {
> > current->lockdep_reclaim_gfp = gfp_mask;
> > }
> > +EXPORT_SYMBOL_GPL(lockdep_set_current_reclaim_state);
> >
> > void lockdep_clear_current_reclaim_state(void)
> > {
> > current->lockdep_reclaim_gfp = 0;
> > }
> > +EXPORT_SYMBOL_GPL(lockdep_clear_current_reclaim_state);
> >
> > #ifdef CONFIG_LOCK_STAT
> > static int
> > --
> > 2.11.0
> >

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2017-03-13 08:30:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

On Mon, Mar 13, 2017 at 09:15:17AM +0100, Daniel Vetter wrote:
> On Mon, Mar 13, 2017 at 09:01:57AM +0100, Peter Zijlstra wrote:
> > On Sun, Mar 12, 2017 at 09:53:40PM +0100, Daniel Vetter wrote:
> >
> > > Peter/Ingo,
> > >
> > > We want this to validate the i915 shrinker locking in our fast tests
> > > without thrashing badly (that takes too long, we can only thrash in
> > > the extended runs). Can you pls take a look and if it's ok ack for
> > > merging through drm-intel.git?
> >
> > Hurm, I was going to rework all that soonish; have a look here:
> >
> > https://lkml.kernel.org/r/[email protected]
> >
> > The immediate problem is that I made the annotation private to mm/
> > there, I suppose I could fix that.
>
> Yeah, we'd really like to have that, and even when switched to a
> lockdep_map instead of reusing the context stuff the semantic interface
> would be the same (and I think we should keep the gfp_flags stuff, in case
> someone adds a nesting lockdep map for GFP_IO).
>
> Do you want a topic branch with just this patch (the shrink_all is new so
> there will be a conflict and we can't mege it through one tree alone) so
> that you can refactor things with i915 included?

Just take your patch; I'll sort it out when I get time for things and
take i915 along for the ride.

Acked-by: Peter Zijlstra (Intel) <[email protected]>

Thanks!

2017-03-14 09:10:08

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH] drm/i915: annote drop_caches debugfs interface with lockdep

On Mon, Mar 13, 2017 at 09:30:41AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 13, 2017 at 09:15:17AM +0100, Daniel Vetter wrote:
> > On Mon, Mar 13, 2017 at 09:01:57AM +0100, Peter Zijlstra wrote:
> > > On Sun, Mar 12, 2017 at 09:53:40PM +0100, Daniel Vetter wrote:
> > >
> > > > Peter/Ingo,
> > > >
> > > > We want this to validate the i915 shrinker locking in our fast tests
> > > > without thrashing badly (that takes too long, we can only thrash in
> > > > the extended runs). Can you pls take a look and if it's ok ack for
> > > > merging through drm-intel.git?
> > >
> > > Hurm, I was going to rework all that soonish; have a look here:
> > >
> > > https://lkml.kernel.org/r/[email protected]
> > >
> > > The immediate problem is that I made the annotation private to mm/
> > > there, I suppose I could fix that.
> >
> > Yeah, we'd really like to have that, and even when switched to a
> > lockdep_map instead of reusing the context stuff the semantic interface
> > would be the same (and I think we should keep the gfp_flags stuff, in case
> > someone adds a nesting lockdep map for GFP_IO).
> >
> > Do you want a topic branch with just this patch (the shrink_all is new so
> > there will be a conflict and we can't mege it through one tree alone) so
> > that you can refactor things with i915 included?
>
> Just take your patch; I'll sort it out when I get time for things and
> take i915 along for the ride.
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>

Thanks, queued in drm-intel for 4.12 with Chris' irc r-b confirmation.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch