2010-02-04 01:28:41

by Andy Isaacson

[permalink] [raw]
Subject: [2.6.33-rc6-git regression] idr fix breaks Xorg

On my Dell Latitude e4300 commit 859ddf0974 ("idr: fix a critical
misallocation bug") causes Xorg to segfault with the following
backtrace:

...
(II) intel(0): Initializing HW Cursor
(II) intel(0): No memory allocations

Backtrace:
0: /usr/bin/X11/X(xorg_backtrace+0x26) [0x4f00c6]
1: /usr/bin/X11/X(xf86SigHandler+0x41) [0x4852c1]
2: /lib/libc.so.6 [0x7fbdad071530]
3: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab8a1cbd]
4: /usr/lib/xorg/modules/drivers//intel_drv.so(gen4_render_state_init+0x416) [0x
7fbdab8a2a36]
5: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab87d3e8]
6: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab87e6fb]
7: /usr/bin/X11/X(AddScreen+0x1d4) [0x4337c4]
8: /usr/bin/X11/X(InitOutput+0x76f) [0x46f27f]
9: /usr/bin/X11/X(main+0x1fe) [0x433ece]
10: /lib/libc.so.6(__libc_start_main+0xfd) [0x7fbdad05cabd]
11: /usr/bin/X11/X [0x433509]
Saw signal 11. Server aborting.

Reverting commit 859ddf0974 causes X to start working again -- tested on
top of c80d292f.

I'm running Karmic 64-bit with

xserver-xorg-core 2:1.6.4-2ubuntu4.1
xserver-xorg-video-intel 2:2.9.0-1ubuntu2.1
libdrm-intel1 2.4.14-1ubuntu1
libdrm2 2.4.14-1ubuntu1

More information (dmesg, lspci, full dpkg, etc) is available at
http://web.hexapodia.org/~adi/bugs/201002-xorg-segv/

I've attached my kconfig as that seems like the most likely difference
between my config and others who are (presumably?) not seeing this
crash. (Gzipped, alas, because AFAIK 70KB will exceed vger's size
liimit.)

-andy


Attachments:
(No filename) (1.56 kB)
kconfig.gz (16.32 kB)
Download all attachments

2010-02-04 03:05:15

by Tejun Heo

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

On 02/04/2010 10:28 AM, Andy Isaacson wrote:
> On my Dell Latitude e4300 commit 859ddf0974 ("idr: fix a critical
> misallocation bug") causes Xorg to segfault with the following
> backtrace:

Does the following patch make any difference?

diff --git a/lib/idr.c b/lib/idr.c
index ba7d37c..a96c604 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -140,7 +140,8 @@ static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
id = *starting_id;
restart:
p = idp->top;
- l = p->layer;
+ l = idp->layers;
+ pa[l--] = NULL;
while (1) {
/*
* We run around this while until we reach the leaf node...


--
tejun

2010-02-04 07:08:53

by Andy Isaacson

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

On Thu, Feb 04, 2010 at 12:11:41PM +0900, Tejun Heo wrote:
> On 02/04/2010 10:28 AM, Andy Isaacson wrote:
> > On my Dell Latitude e4300 commit 859ddf0974 ("idr: fix a critical
> > misallocation bug") causes Xorg to segfault with the following
> > backtrace:
>
> Does the following patch make any difference?
>
> diff --git a/lib/idr.c b/lib/idr.c
> index ba7d37c..a96c604 100644
> --- a/lib/idr.c
> +++ b/lib/idr.c
> @@ -140,7 +140,8 @@ static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
> id = *starting_id;
> restart:
> p = idp->top;
> - l = p->layer;
> + l = idp->layers;
> + pa[l--] = NULL;
> while (1) {
> /*
> * We run around this while until we reach the leaf node...

Alas, no -- with that patch applied Xorg still segfaults in the same
place.

-andy

2010-02-04 07:56:49

by Andy Isaacson

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

On Wed, Feb 03, 2010 at 05:28:37PM -0800, Andy Isaacson wrote:
> On my Dell Latitude e4300 commit 859ddf0974 ("idr: fix a critical
> misallocation bug") causes Xorg to segfault with the following
> backtrace:
>
> ...
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): No memory allocations
>
> Backtrace:
> 0: /usr/bin/X11/X(xorg_backtrace+0x26) [0x4f00c6]
> 1: /usr/bin/X11/X(xf86SigHandler+0x41) [0x4852c1]
> 2: /lib/libc.so.6 [0x7fbdad071530]
> 3: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab8a1cbd]
> 4: /usr/lib/xorg/modules/drivers//intel_drv.so(gen4_render_state_init+0x416) [0x
> 7fbdab8a2a36]
> 5: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab87d3e8]
> 6: /usr/lib/xorg/modules/drivers//intel_drv.so [0x7fbdab87e6fb]
> 7: /usr/bin/X11/X(AddScreen+0x1d4) [0x4337c4]
> 8: /usr/bin/X11/X(InitOutput+0x76f) [0x46f27f]
> 9: /usr/bin/X11/X(main+0x1fe) [0x433ece]
> 10: /lib/libc.so.6(__libc_start_main+0xfd) [0x7fbdad05cabd]
> 11: /usr/bin/X11/X [0x433509]
> Saw signal 11. Server aborting.

strace perhaps shows some more illuminating results; I suspect the
SEGV is due to a failed ioctl. (This is a different run than the above,
obviously.)

1265267921.566623 ioctl(8, 0xc020645e, 0x7fffe2196980) = 0
1265267921.566944 ioctl(8, 0x400c645f, 0x7fffe21969a0) = 0
1265267921.567272 brk(0x20e7000) = 0x20e7000
1265267921.567602 ioctl(8, 0x40046460, 0x7fffe21969a0) = 0
1265267921.567922 ioctl(8, 0xc010645b, 0x7fffe2196990) = 0
1265267921.568269 ioctl(8, 0xc020645e, 0x7fffe2196980) = -1 EBADF (Bad file descriptor)
1265267921.568649 write(2, "../../../libdrm/intel/intel_bufmgr_gem.c:637: Error mapping buffer 1073741824 (gen4 WM state): Bad file descriptor .\n", 117) = 117
1265267921.569039 --- SIGSEGV (Segmentation fault) @ 0 (0) ---

The full strace is at
http://web.hexapodia.org/~adi/bugs/201002-xorg-segv/Xorg.strace

-andy

2010-02-04 08:17:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

Hello,

On 02/04/2010 04:56 PM, Andy Isaacson wrote:
> 1265267921.568269 ioctl(8, 0xc020645e, 0x7fffe2196980) = -1 EBADF (Bad file descriptor)

Hmm... -EBADF? I suppose it doesn't mean that the fd is invalid in
this case but that the mapped object can't be found for some reason?
Can anyone more familiar with the subsystem explain what's going on?

> 1265267921.568649 write(2, "../../../libdrm/intel/intel_bufmgr_gem.c:637: Error mapping buffer 1073741824 (gen4 WM state): Bad file descriptor .\n", 117) = 117
> 1265267921.569039 --- SIGSEGV (Segmentation fault) @ 0 (0) ---

I'll forward the fore mentioned fix as it at least fixes one reported
failure.

Thanks.

--
tejun

2010-02-04 08:45:56

by Tejun Heo

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

Hello,

On 02/04/2010 05:40 PM, Dave Airlie wrote:
> Hmm at this late stage, maybe revert first? since the old idr code works fine
> with the subsystems in question.
>
> The drm idr code usage isn't anything crazy, the EBADF is the return code
> from the mmap ioctl when it calls the idr lookup function for a handle.
>
> The lookup function is just an idr_find inside a spinlock.

Yeap, you're right. I'll send a patch to revert it.

Thanks.

--
tejun

2010-02-04 08:48:12

by Dave Airlie

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

On Thu, Feb 4, 2010 at 6:16 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 02/04/2010 04:56 PM, Andy Isaacson wrote:
>> 1265267921.568269 ioctl(8, 0xc020645e, 0x7fffe2196980) = -1 EBADF (Bad file descriptor)
>
> Hmm... -EBADF? ?I suppose it doesn't mean that the fd is invalid in
> this case but that the mapped object can't be found for some reason?
> Can anyone more familiar with the subsystem explain what's going on?
>
>> 1265267921.568649 write(2, "../../../libdrm/intel/intel_bufmgr_gem.c:637: Error mapping buffer 1073741824 (gen4 WM state): Bad file descriptor .\n", 117) = 117
>> 1265267921.569039 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>
> I'll forward the fore mentioned fix as it at least fixes one reported
> failure.

Hmm at this late stage, maybe revert first? since the old idr code works fine
with the subsystems in question.

The drm idr code usage isn't anything crazy, the EBADF is the return code
from the mmap ioctl when it calls the idr lookup function for a handle.

The lookup function is just an idr_find inside a spinlock.

Dave.

2010-02-04 09:39:24

by Chris Wilson

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

On Thu, 04 Feb 2010 17:16:56 +0900, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On 02/04/2010 04:56 PM, Andy Isaacson wrote:
> > 1265267921.568269 ioctl(8, 0xc020645e, 0x7fffe2196980) = -1 EBADF (Bad file descriptor)
>
> Hmm... -EBADF? I suppose it doesn't mean that the fd is invalid in
> this case but that the mapped object can't be found for some reason?
> Can anyone more familiar with the subsystem explain what's going on?

Correct, in this case we pass in an 'fd' via the ioctl that we wish to
mmap. idr is then used to translate that handle back to one of our buffer
objects, which is then prepared for mapping.

In this context, it is the immediate lookup of the handle following
allocation that is failing. The critical bit of code lives in
drivers/gpu/drm/drm_gem.c:

int
drm_gem_handle_create(struct drm_file *file_priv,
struct drm_gem_object *obj,
u32 *handlep)
{
int ret;

/*
* Get the user-visible handle using idr.
*/
again:
/* ensure there is space available to allocate a handle */
if (idr_pre_get(&file_priv->object_idr, GFP_KERNEL) == 0)
return -ENOMEM;

/* do the allocation under our spinlock */
spin_lock(&file_priv->table_lock);
ret = idr_get_new_above(&file_priv->object_idr, obj, 1, (int
*)handlep);
spin_unlock(&file_priv->table_lock);
if (ret == -EAGAIN)
goto again;

if (ret != 0)
return ret;

drm_gem_object_handle_reference(obj);
return 0;
}

struct drm_gem_object *
drm_gem_object_lookup(struct drm_device *dev, struct drm_file *filp,
u32 handle)
{
struct drm_gem_object *obj;

spin_lock(&filp->table_lock);

/* Check if we currently have a reference on the object */
obj = idr_find(&filp->object_idr, handle);
if (obj == NULL) {
spin_unlock(&filp->table_lock);
return NULL;
}

drm_gem_object_reference(obj);

spin_unlock(&filp->table_lock);

return obj;
}

Can you see any problems here?

>
> > 1265267921.568649 write(2, "../../../libdrm/intel/intel_bufmgr_gem.c:637: Error mapping buffer 1073741824 (gen4 WM state): Bad file descriptor .\n", 117) = 117
> > 1265267921.569039 --- SIGSEGV (Segmentation fault) @ 0 (0) ---

This SEGV is just lazy error handling in the userspace driver; the
impossible just happened.
-ickle

--
Chris Wilson, Intel Open Source Technology Centre

2010-02-11 08:44:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [2.6.33-rc6-git regression] idr fix breaks Xorg

Hello,

Can you please test the following patch on top of the current
mainline? Thanks.

diff --git a/lib/idr.c b/lib/idr.c
index 1cac726..0dc7822 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -156,10 +156,12 @@ static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;

/* if already at the top layer, we need to grow */
- if (!(p = pa[l])) {
+ if (id >= 1 << (idp->layers * IDR_BITS)) {
*starting_id = id;
return IDR_NEED_TO_GROW;
}
+ p = pa[l];
+ BUG_ON(!p);

/* If we need to go up one layer, continue the
* loop; otherwise, restart from the top.

--
tejun