2013-09-12 15:57:22

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH 1/2] drm/i915: kill set_need_resched

This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terribly locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.

Reported-by: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d80f33d..3871060 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1389,14 +1389,11 @@ out:
if (i915_terminally_wedged(&dev_priv->gpu_error))
return VM_FAULT_SIGBUS;
case -EAGAIN:
- /* Give the error handler a chance to run and move the
- * objects off the GPU active list. Next time we service the
- * fault, we should be able to transition the page into the
- * GTT without touching the GPU (and so avoid further
- * EIO/EGAIN). If the GPU is wedged, then there is no issue
- * with coherency, just lost writes.
+ /*
+ * EAGAIN means the gpu is hung and we'll wait for the error
+ * handler to reset everything when re-faulting in
+ * i915_mutex_lock_interruptible.
*/
- set_need_resched();
case 0:
case -ERESTARTSYS:
case -EINTR:
--
1.8.4.rc3


2013-09-12 15:57:24

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH 2/2] drm/udl: rip out set_need_resched

This very much looks like copypasta from drm/i915's fault handler.
It was used there to duct-tape over issues around gpu reset handling.

Since that can't ever happen for udl and there's seemingly no other
reason for this just drop it.

Reported-by: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Dave Airlie <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/udl/udl_gem.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c
index 8dbe9d0..8bf6461 100644
--- a/drivers/gpu/drm/udl/udl_gem.c
+++ b/drivers/gpu/drm/udl/udl_gem.c
@@ -97,7 +97,6 @@ int udl_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
ret = vm_insert_page(vma, (unsigned long)vmf->virtual_address, page);
switch (ret) {
case -EAGAIN:
- set_need_resched();
case 0:
case -ERESTARTSYS:
return VM_FAULT_NOPAGE;
--
1.8.4.rc3

2013-09-12 15:59:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm/i915: kill set_need_resched

On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote:
> This is just a remnant from the old days when our reset handling was
> horribly racy, suffered from terribly locking issues and often happily
> live-locked. Those days are now gone so we can drop the hacks and just
> rip the reschedule-point out.
>
> Reported-by: Peter Zijlstra <[email protected]>

Thanks!

Acked-by: Peter Zijlstra <[email protected]>

2013-09-12 16:00:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] drm/udl: rip out set_need_resched

On Thu, Sep 12, 2013 at 05:57:29PM +0200, Daniel Vetter wrote:
> This very much looks like copypasta from drm/i915's fault handler.
> It was used there to duct-tape over issues around gpu reset handling.
>
> Since that can't ever happen for udl and there's seemingly no other
> reason for this just drop it.
>
> Reported-by: Peter Zijlstra <[email protected]>

Thanks!

Acked-by: Peter Zijlstra <[email protected]>

2013-09-12 20:41:38

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 1/2] drm/i915: kill set_need_resched

On Thu, Sep 12, 2013 at 05:59:51PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 12, 2013 at 05:57:28PM +0200, Daniel Vetter wrote:
> > This is just a remnant from the old days when our reset handling was
> > horribly racy, suffered from terribly locking issues and often happily
> > live-locked. Those days are now gone so we can drop the hacks and just
> > rip the reschedule-point out.
> >
> > Reported-by: Peter Zijlstra <[email protected]>
>
> Thanks!
>
> Acked-by: Peter Zijlstra <[email protected]>

Merged to drm-intel-fixes with Chris' irc-review that we indeed don't need
this any more to duct-tape over bugs.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-09-13 00:59:18

by Rob Clark

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: kill set_need_resched

hmm, looks like I cargo-cult'd the same into msm.

I guess in i915 (and ttm) case, the issue arises due to need for CPU
access to buffer via GTT? In which case I should be safe to drop the
set_need_resched() as well? (Since CPU always has direct access to the
pages.) Or am I missing something about the original issue that
necessitated set_need_resched()?

BR,
-R


On Thu, Sep 12, 2013 at 11:57 AM, Daniel Vetter <[email protected]> wrote:
> This is just a remnant from the old days when our reset handling was
> horribly racy, suffered from terribly locking issues and often happily
> live-locked. Those days are now gone so we can drop the hacks and just
> rip the reschedule-point out.
>
> Reported-by: Peter Zijlstra <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d80f33d..3871060 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1389,14 +1389,11 @@ out:
> if (i915_terminally_wedged(&dev_priv->gpu_error))
> return VM_FAULT_SIGBUS;
> case -EAGAIN:
> - /* Give the error handler a chance to run and move the
> - * objects off the GPU active list. Next time we service the
> - * fault, we should be able to transition the page into the
> - * GTT without touching the GPU (and so avoid further
> - * EIO/EGAIN). If the GPU is wedged, then there is no issue
> - * with coherency, just lost writes.
> + /*
> + * EAGAIN means the gpu is hung and we'll wait for the error
> + * handler to reset everything when re-faulting in
> + * i915_mutex_lock_interruptible.
> */
> - set_need_resched();
> case 0:
> case -ERESTARTSYS:
> case -EINTR:
> --
> 1.8.4.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

2013-09-13 08:46:37

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: kill set_need_resched

On Fri, Sep 13, 2013 at 2:59 AM, Rob Clark <[email protected]> wrote:
> I guess in i915 (and ttm) case, the issue arises due to need for CPU
> access to buffer via GTT? In which case I should be safe to drop the
> set_need_resched() as well? (Since CPU always has direct access to the
> pages.) Or am I missing something about the original issue that
> necessitated set_need_resched()?

For drm/i915 the _only_ reason we've had it was to avoid life-locking
with our gpu reset work when the gpu hung. We've fixed that properly
now by using a wait-queue to stall when a gpu reset is pending and
proper locking in the gpu reset handler (plus tons of evil tests to
make sure it doesn't break, there's rather fragile lock-dropping and
tricky ordering involved). So if you don't have i915's broken gpu
reset handling from yonder you don't need our cargo-cult.

ttm's usage with a trylock+yield is a different form of duct-tape to
paper over locking inversions between copy_*_user callsites and the
pagefault handler.

In any case there's no way it actually works properly ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch