2016-03-07 02:59:15

by Erik Andersen

Subject: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

The following patch to radeon_sa_bo_new that
went into 3.10.99

commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
Author: Nicolai Hähnle <[email protected]>
Date: Fri Feb 5 14:35:53 2016 -0500
drm/radeon: hold reference to fences in radeon_sa_bo_new
commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.

is triggering an Oops for me right when xscreensaver
first began doing 3D stuff. After reverting this
patch, xscreensaver has been happily running 3D stuff.

Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP

Mar 6 18:00:43 sage kernel: Stack:
Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
Mar 6 18:00:43 sage kernel: Call Trace:
Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6

$ lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]

-Erik

--
Erik B. Andersen
--This message was written using 73% post-consumer electrons--


2016-03-07 20:47:08

by Greg KH

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
> The following patch to radeon_sa_bo_new that
> went into 3.10.99
>
> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
> Author: Nicolai Hähnle <[email protected]>
> Date: Fri Feb 5 14:35:53 2016 -0500
> drm/radeon: hold reference to fences in radeon_sa_bo_new
> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
>
> is triggering an Oops for me right when xscreensaver
> first began doing 3D stuff. After reverting this
> patch, xscreensaver has been happily running 3D stuff.
>
> Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
> Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
> Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
>
> Mar 6 18:00:43 sage kernel: Stack:
> Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
> Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
> Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
> Mar 6 18:00:43 sage kernel: Call Trace:
> Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
> Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
> Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
> Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
> Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
> Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
> Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
> Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
> Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
> Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
>
> $ lspci | grep VGA
> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]

Next time, please cc: the people responsible for that patch as well...

I can revert it, but maybe something else is going on here? Do you have
this same problem on 3.14, and 4.5-rc7?

thanks,

greg k-h

2016-03-07 21:07:10

by Christian König

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
> On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
>> The following patch to radeon_sa_bo_new that
>> went into 3.10.99
>>
>> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
>> Author: Nicolai Hähnle <[email protected]>
>> Date: Fri Feb 5 14:35:53 2016 -0500
>> drm/radeon: hold reference to fences in radeon_sa_bo_new
>> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
>>
>> is triggering an Oops for me right when xscreensaver
>> first began doing 3D stuff. After reverting this
>> patch, xscreensaver has been happily running 3D stuff.
>>
>> Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
>> Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
>> Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
>>
>> Mar 6 18:00:43 sage kernel: Stack:
>> Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
>> Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
>> Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
>> Mar 6 18:00:43 sage kernel: Call Trace:
>> Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
>> Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
>> Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
>> Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
>> Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
>> Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
>> Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
>> Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
>> Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
>> Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
>>
>> $ lspci | grep VGA
>> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
> Next time, please cc: the people responsible for that patch as well...
>
> I can revert it, but maybe something else is going on here? Do you have
> this same problem on 3.14, and 4.5-rc7?

Hi Greg,

Yes, that's an already known issue. Feel free to revert that one for now.

I have it on my TODO list to provide a fixed patch for older kernels, but
that can take a while.

For background: Nicolai's patch is correct, but it assumes that
radeon_fence_unref() can safely take NULL as the fence, which is not the
case for older kernels.

Regards,
Christian.

>
> thanks,
>
> greg k-h

2016-03-07 22:58:58

by Greg KH

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
> Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
> >On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
> >>The following patch to radeon_sa_bo_new that
> >>went into 3.10.99
> >>
> >> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
> >> Author: Nicolai Hähnle <[email protected]>
> >> Date: Fri Feb 5 14:35:53 2016 -0500
> >> drm/radeon: hold reference to fences in radeon_sa_bo_new
> >> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
> >>
> >>is triggering an Oops for me right when xscreensaver
> >>first began doing 3D stuff. After reverting this
> >>patch, xscreensaver has been happily running 3D stuff.
> >>
> >>Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >>Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
> >>Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
> >>Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
> >>
> >>Mar 6 18:00:43 sage kernel: Stack:
> >>Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
> >>Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
> >>Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
> >>Mar 6 18:00:43 sage kernel: Call Trace:
> >>Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
> >>Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
> >>Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
> >>Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
> >>Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
> >>Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
> >>Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
> >>Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
> >>Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
> >>Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
> >>
> >>$ lspci | grep VGA
> >>03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>[AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
> >Next time, please cc: the people responsible for that patch as well...
> >
> >I can revert it, but maybe something else is going on here? Do you have
> >this same problem on 3.14, and 4.5-rc7?
>
> Hi Greg,
>
> yes that's an already known issue. Feel free to revert that one for now.
>
> I got it on my TODO list to provide a fixed patch for older kernel, but that
> can take a while.
>
> For the background Nicolai's patch is correct, but assumes that
> radeon_fence_unref() can safely take NULL as the fence which is not the case
> for older kernels.

Ok, thanks, now reverted.

greg k-h

2016-03-09 13:56:28

by Luis Henriques

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
> On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
> > Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
> > >On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
> > >>The following patch to radeon_sa_bo_new that
> > >>went into 3.10.99
> > >>
> > >> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
> > >> Author: Nicolai Hähnle <[email protected]>
> > >> Date: Fri Feb 5 14:35:53 2016 -0500
> > >> drm/radeon: hold reference to fences in radeon_sa_bo_new
> > >> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
> > >>
> > >>is triggering an Oops for me right when xscreensaver
> > >>first began doing 3D stuff. After reverting this
> > >>patch, xscreensaver has been happily running 3D stuff.
> > >>
> > >>Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > >>Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
> > >>Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
> > >>Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
> > >>
> > >>Mar 6 18:00:43 sage kernel: Stack:
> > >>Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
> > >>Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
> > >>Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
> > >>Mar 6 18:00:43 sage kernel: Call Trace:
> > >>Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
> > >>Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
> > >>Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
> > >>Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
> > >>Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
> > >>Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
> > >>Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
> > >>Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
> > >>Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
> > >>Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
> > >>
> > >>$ lspci | grep VGA
> > >>03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > >>[AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
> > >Next time, please cc: the people responsible for that patch as well...
> > >
> > >I can revert it, but maybe something else is going on here? Do you have
> > >this same problem on 3.14, and 4.5-rc7?
> >
> > Hi Greg,
> >
> > yes that's an already known issue. Feel free to revert that one for now.
> >
> > I got it on my TODO list to provide a fixed patch for older kernel, but that
> > can take a while.
> >
> > For the background Nicolai's patch is correct, but assumes that
> > radeon_fence_unref() can safely take NULL as the fence which is not the case
> > for older kernels.
>
> Ok, thanks, now reverted.
>

And it looks like a few more kernels may be affected as well. I'll
revert it from the 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
the CC list.

Cheers,
--
Luís

> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-03-09 16:39:11

by Greg KH

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On Wed, Mar 09, 2016 at 11:31:54AM -0500, Nicolai Hähnle wrote:
> On 09.03.2016 08:56, Luis Henriques wrote:
> >On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
> >>On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
> >>>Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
> >>>>On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
> >>>>>The following patch to radeon_sa_bo_new that
> >>>>>went into 3.10.99
> >>>>>
> >>>>> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
> >>>>> Author: Nicolai Hähnle <[email protected]>
> >>>>> Date: Fri Feb 5 14:35:53 2016 -0500
> >>>>> drm/radeon: hold reference to fences in radeon_sa_bo_new
> >>>>> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
> >>>>>
> >>>>>is triggering an Oops for me right when xscreensaver
> >>>>>first began doing 3D stuff. After reverting this
> >>>>>patch, xscreensaver has been happily running 3D stuff.
> >>>>>
> >>>>>Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >>>>>Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
> >>>>>Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
> >>>>>
> >>>>>Mar 6 18:00:43 sage kernel: Stack:
> >>>>>Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
> >>>>>Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
> >>>>>Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
> >>>>>Mar 6 18:00:43 sage kernel: Call Trace:
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
> >>>>>Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
> >>>>>
> >>>>>$ lspci | grep VGA
> >>>>>03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>>>>[AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
> >>>>Next time, please cc: the people responsible for that patch as well...
> >>>>
> >>>>I can revert it, but maybe something else is going on here? Do you have
> >>>>this same problem on 3.14, and 4.5-rc7?
> >>>
> >>>Hi Greg,
> >>>
> >>>yes that's an already known issue. Feel free to revert that one for now.
> >>>
> >>>I got it on my TODO list to provide a fixed patch for older kernel, but that
> >>>can take a while.
> >>>
> >>>For the background Nicolai's patch is correct, but assumes that
> >>>radeon_fence_unref() can safely take NULL as the fence which is not the case
> >>>for older kernels.
>
> Actually, the call to radeon_fence_ref() is the culprit.
>
> >>
> >>Ok, thanks, now reverted.
> >>
> >
> >And looks like a few more kernels may be affected as well. I'll
> >revert it from 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
> >the CC list.
>
> Kernels that contain commit 954605ca "drm/radeon: use common fence
> implementation for fences, v4" are safe, older kernels require a
> NULL-pointer check around the call to radeon_fence_ref.
>
> This means kernels 3.17 and older are affected and need the additional NULL
> pointer check that I've sent out already on a different thread (I'm
> attaching it again, hoping that Erik gets a chance to test it).
>
> It would be nice to get a confirmation that this really does fix the
> observed bug, then I can prepare a fixed version of the patch for 3.17 and
> older (i.e. squash the original bad commit with the attached patch).

Don't "squash" anything together, just send the needed patches
backported, we want to keep things to match Linus's tree as much as
possible.

thanks,

greg k-h

2016-03-09 19:06:00

by Nicolai Hähnle

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On 09.03.2016 08:56, Luis Henriques wrote:
> On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
>> On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
>>> Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
>>>> On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
>>>>> The following patch to radeon_sa_bo_new that
>>>>> went into 3.10.99
>>>>>
>>>>> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
>>>>> Author: Nicolai Hähnle <[email protected]>
>>>>> Date: Fri Feb 5 14:35:53 2016 -0500
>>>>> drm/radeon: hold reference to fences in radeon_sa_bo_new
>>>>> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
>>>>>
>>>>> is triggering an Oops for me right when xscreensaver
>>>>> first began doing 3D stuff. After reverting this
>>>>> patch, xscreensaver has been happily running 3D stuff.
>>>>>
>>>>> Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>>>> Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
>>>>> Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
>>>>>
>>>>> Mar 6 18:00:43 sage kernel: Stack:
>>>>> Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
>>>>> Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
>>>>> Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
>>>>> Mar 6 18:00:43 sage kernel: Call Trace:
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
>>>>>
>>>>> $ lspci | grep VGA
>>>>> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>>>> [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
>>>> Next time, please cc: the people responsible for that patch as well...
>>>>
>>>> I can revert it, but maybe something else is going on here? Do you have
>>>> this same problem on 3.14, and 4.5-rc7?
>>>
>>> Hi Greg,
>>>
>>> yes that's an already known issue. Feel free to revert that one for now.
>>>
>>> I got it on my TODO list to provide a fixed patch for older kernel, but that
>>> can take a while.
>>>
>>> For the background Nicolai's patch is correct, but assumes that
>>> radeon_fence_unref() can safely take NULL as the fence which is not the case
>>> for older kernels.

Actually, the call to radeon_fence_ref() is the culprit.

>>
>> Ok, thanks, now reverted.
>>
>
> And looks like a few more kernels may be affected as well. I'll
> revert it from 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
> the CC list.

Kernels that contain commit 954605ca "drm/radeon: use common fence
implementation for fences, v4" are safe, older kernels require a
NULL-pointer check around the call to radeon_fence_ref.
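
As an illustration only (this is not the attached patch, and none of the
names below come from the radeon driver), here is a minimal userspace C
sketch of what a NULL-pointer check around a reference-taking helper can
look like. struct demo_fence, demo_fence_ref() and demo_fence_ref_guarded()
are hypothetical stand-ins; the real kernel structures and locking are
assumed away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted fence object.
 * This is NOT the real struct radeon_fence. */
struct demo_fence {
    int refcount;
};

/* Mimics a "ref" helper that does not tolerate NULL:
 * it dereferences its argument unconditionally. */
static struct demo_fence *demo_fence_ref(struct demo_fence *fence)
{
    fence->refcount++;
    return fence;
}

/* Caller-side guard: only take a reference when a fence exists.
 * A NULL fence is passed through unchanged instead of being dereferenced. */
static struct demo_fence *demo_fence_ref_guarded(struct demo_fence *fence)
{
    if (fence == NULL)
        return NULL;
    return demo_fence_ref(fence);
}
```

The guard sits in a caller-side wrapper rather than in the helper itself,
which mirrors the idea of checking before the call on kernels where the
helper does not accept NULL.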

This means kernels 3.17 and older are affected and need the additional
NULL pointer check that I've sent out already on a different thread (I'm
attaching it again, hoping that Erik gets a chance to test it).

It would be nice to get a confirmation that this really does fix the
observed bug, then I can prepare a fixed version of the patch for 3.17
and older (i.e. squash the original bad commit with the attached patch).

Cheers,
Nicolai

>
> Cheers,
> --
> Luís
>
>> greg k-h


Attachments:
0001-drm-radeon-guard-call-to-radeon_fence_ref-against-NU.patch (1.41 kB)

2016-03-14 12:33:30

by Jiri Slaby

Subject: Re: Oops in 3.10.99 -- NULL pointer dereference in radeon_fence_ref

On 03/09/2016, 02:56 PM, Luis Henriques wrote:
> On Mon, Mar 07, 2016 at 02:58:51PM -0800, Greg Kroah-Hartman wrote:
>> On Mon, Mar 07, 2016 at 10:06:47PM +0100, Christian König wrote:
>>> Am 07.03.2016 um 21:46 schrieb Greg Kroah-Hartman:
>>>> On Sun, Mar 06, 2016 at 07:50:14PM -0700, Erik Andersen wrote:
>>>>> The following patch to radeon_sa_bo_new that
>>>>> went into 3.10.99
>>>>>
>>>>> commit 8d5e1e5af0c667545c202e8f4051f77aa3bf31b7
>>>>> Author: Nicolai Hähnle <[email protected]>
>>>>> Date: Fri Feb 5 14:35:53 2016 -0500
>>>>> drm/radeon: hold reference to fences in radeon_sa_bo_new
>>>>> commit f6ff4f67cdf8455d0a4226eeeaf5af17c37d05eb upstream.
>>>>>
>>>>> is triggering an Oops for me right when xscreensaver
>>>>> first began doing 3D stuff. After reverting this
>>>>> patch, xscreensaver has been happily running 3D stuff.
>>>>>
>>>>> Mar 6 18:00:43 sage kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>>>> Mar 6 18:00:43 sage kernel: IP: [<ffffffffa010345d>] radeon_fence_ref+0xd/0x50 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: PGD 799e1d067 PUD 819186067 PMD 0
>>>>> Mar 6 18:00:43 sage kernel: Oops: 0002 [#1] SMP
>>>>>
>>>>> Mar 6 18:00:43 sage kernel: Stack:
>>>>> Mar 6 18:00:43 sage kernel: ffffffffa01607ec ffff88108a4e8000 ffff88108a4e8000 ffff880888fbc000
>>>>> Mar 6 18:00:43 sage kernel: ffff880ecbf11c78 0000fe2001000006 0000000000000000 0020000000000100
>>>>> Mar 6 18:00:43 sage kernel: 00000000000d1200 ffff880ecbf11c14 0000000000000000 0000000000000000
>>>>> Mar 6 18:00:43 sage kernel: Call Trace:
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa01607ec>] ? radeon_sa_bo_new+0x2ac/0x4f0 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa005fc9d>] ? ttm_eu_list_ref_sub+0x3d/0x60 [ttm]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa0117c49>] radeon_ib_get+0x39/0x110 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa011a4ea>] radeon_cs_ioctl+0x69a/0xa70 [radeon]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffffa008e2d2>] drm_ioctl+0x512/0x650 [drm]
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff810a46e1>] ? do_futex+0x111/0xc30
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff81182a45>] do_vfs_ioctl+0x305/0x520
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff8107cd39>] ? vtime_account_user+0x69/0x80
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff81182ce1>] SyS_ioctl+0x81/0xa0
>>>>> Mar 6 18:00:43 sage kernel: [<ffffffff8178210f>] tracesys+0xe1/0xe6
>>>>>
>>>>> $ lspci | grep VGA
>>>>> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>>>> [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
>>>> Next time, please cc: the people responsible for that patch as well...
>>>>
>>>> I can revert it, but maybe something else is going on here? Do you have
>>>> this same problem on 3.14, and 4.5-rc7?
>>>
>>> Hi Greg,
>>>
>>> yes that's an already known issue. Feel free to revert that one for now.
>>>
>>> I got it on my TODO list to provide a fixed patch for older kernel, but that
>>> can take a while.
>>>
>>> For the background Nicolai's patch is correct, but assumes that
>>> radeon_fence_unref() can safely take NULL as the fence which is not the case
>>> for older kernels.
>>
>> Ok, thanks, now reverted.
>>
>
> And looks like a few more kernels may be affected as well. I'll
> revert it from 3.16 kernel, and I'm adding Kamal, Sasha and Jiri to
> the CC list.

Reverted from 3.12. Thanks!

--
js
suse labs