2002-02-02 21:55:28

by Lars Christensen

Subject: 2.4.17 agpgart process hang on crash


Hi. I have experienced a problem with the combination of kernel-2.4.17,
the kernel agpgart module and NVIDIA-supplied drivers. I don't know which
is the cause of the problem.

Symptoms: Whenever an OpenGL application crashes (segfault etc.), the
process hangs and can't be killed. Responds to no signals (not even 9). ps
-ef hangs, it seems, when the crashed process is to be listed (some other
processes are listed first).

Hardware: AMD Athlon 1.333 GHz, ASUS M266 motherboard (AMD761 AGP
chipset), NVIDIA GeForce2 MX400 gfx card.

The mem=nopentium option has no effect on the problem, but it doesn't
occur if I use the NVIDIA AGP drivers or kernel 2.4.16 agp drivers. I am
not able to test the 2.4.17 agpgart with other 3D hardware than NVIDIA.


--
Lars Christensen, [email protected]


2002-02-02 22:05:20

by Alan

Subject: Re: 2.4.17 agpgart process hang on crash

> Hi. I have experienced a problem with the combination of kernel-2.4.17,
> the kernel agpgart module and NVIDIA supplied drivers. I don't know which
> is the cause of the problem.
>

Please report problems with the nvidia drivers loaded to nvidia. They have
the kernel source, we do not have their source code. Only they can help
you.

Alan

2002-02-02 22:30:02

by Lars Christensen

Subject: Re: 2.4.17 agpgart process hang on crash

On Sat, 2 Feb 2002, Alan Cox wrote:

> > Hi. I have experienced a problem with the combination of kernel-2.4.17,
> > the kernel agpgart module and NVIDIA supplied drivers. I don't know which
> > is the cause of the problem.
> >
>
> Please report problems with the nvidia drivers loaded to nvidia. They have
> the kernel source, we do not have their source code. Only they can help
> you.

I am sorry -- my initial testing wasn't thorough enough. Now, booting
to single-user, without any drivers loaded, I can reproduce the bug:

modprobe agpgart # loads fine, AMD 761 chipset found
ulimit -c unlimited # the hang only occurs if core files are written
./testgart &
pkill -ABRT testgart # before testgart ends

Both the testgart and pkill processes hang. Nothing will kill them.
"pkill pkill" hangs too :)

Testgart is the one by Jeff Hartmann.

So the NVIDIA drivers don't seem to be causing this. Note that with
ulimit -c 0, testgart terminates normally, printing "Aborted".
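
The dependence on `ulimit -c` can also be controlled from inside a test program. The following is only a sketch, assuming a POSIX system; the helper names are hypothetical and are not part of testgart. It raises the soft core-file limit as far as the hard limit allows (the unprivileged counterpart of `ulimit -c unlimited`) or drops it to zero.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Set the soft core-file size limit to zero so no core is written.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Returns 1 if the current soft limit permits a core file,
 * 0 if it does not, -1 on error. (Hypothetical helper.) */
int core_dumps_enabled(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}
```

Calling `disable_core_dumps()` before the abort would correspond to the `ulimit -c 0` case, where testgart exits cleanly.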

--
Lars Christensen, [email protected]

2002-02-02 22:33:23

by Andrew Morton

Subject: Re: 2.4.17 agpgart process hang on crash

Lars Christensen wrote:
>
> Hi. I have experienced a problem with the combination of kernel-2.4.17,
> the kernel agpgart module and NVIDIA supplied drivers. I don't know which
> is the cause of the problem.
>
> Symptoms: Whenever an OpenGL application crashes (segfault etc.), the
> process hangs and can't be killed. Responds to no signals (not even 9). ps
> -ef hangs, it seems, when the crashed process is to be listed (some other
> processes are listed first).
>
> Hardware: AMD Athlon 1.333 GHz, ASUS M266 motherboard (AMD761 AGP
> chipset), NVIDIA GeForce2 MX400 gfx card.
>
> The mem=nopentium option has no effect on the problem, but it doesn't
> occur if I use the NVIDIA AGP drivers or kernel 2.4.16 agp drivers. I am
> not able to test the 2.4.17 agpgart with other 3D hardware than NVIDIA.
>

This is possibly because the crashing application tries to dump
core, and the kernel gets a fault accessing the video card's
mapping, and deadlocks over the recursive attempt to take mmap_sem.

Please apply this patch:

http://www.zip.com.au/~akpm/linux/2.4/2.4.18-pre7/fbmem-mmap.patch

and send a report back.

-

2002-02-02 23:14:18

by Lars Christensen

Subject: Re: 2.4.17 agpgart process hang on crash

On Sat, 2 Feb 2002, Andrew Morton wrote:

> Lars Christensen wrote:
> >
> > Hi. I have experienced a problem with the combination of kernel-2.4.17,
> > the kernel agpgart module and NVIDIA supplied drivers. I don't know which
> > is the cause of the problem.
> >
> > Symptoms: Whenever an OpenGL application crashes (segfault etc.), the
> > process hangs and can't be killed. Responds to no signals (not even 9). ps
> > -ef hangs, it seems, when the crashed process is to be listed (some other
> > processes are listed first).
> >
> > Hardware: AMD Athlon 1.333 GHz, ASUS M266 motherboard (AMD761 AGP
> > chipset), NVIDIA GeForce2 MX400 gfx card.
> >
> > The mem=nopentium option has no effect on the problem, but it doesn't
> > occur if I use the NVIDIA AGP drivers or kernel 2.4.16 agp drivers. I am
> > not able to test the 2.4.17 agpgart with other 3D hardware than NVIDIA.
> >
>
> This is possibly because the crashing application tries to dump
> core, and the kernel gets a fault accessing the video card's
> mapping, and deadlocks over the recursive attempt to take mmap_sem.
>
> Please apply this patch:
>
> http://www.zip.com.au/~akpm/linux/2.4/2.4.18-pre7/fbmem-mmap.patch
>
> and send a report back.

No luck. Still hangs (e.g. with ./testgart & pkill -ABRT testgart), with
and without that patch and with and without 2.4.18-pre7. Does seem to
happen when dumping core--it doesn't happen with core dumping disabled.

--
Lars Christensen, [email protected]

2002-02-02 23:33:12

by Andrew Morton

Subject: Re: 2.4.17 agpgart process hang on crash

Lars Christensen wrote:
>
> No luck. Still hangs (e.g. with ./testgart & pkill -ABRT testgart), with
> and without that patch and with and without 2.4.18-pre7. Does seem to
> happen when dumping core--it doesn't happen with core dumping disabled.
>

This one, please:

--- linux-2.4.18-pre7/drivers/char/agp/agpgart_fe.c Sun Aug 12 10:38:48 2001
+++ linux-akpm/drivers/char/agp/agpgart_fe.c Sat Feb 2 15:29:49 2002
@@ -605,19 +605,18 @@ static int agp_mmap(struct file *file, s
 	agp_client *client;
 	agp_file_private *priv = (agp_file_private *) file->private_data;
 	agp_kern_info kerninfo;
+	int ret = -EPERM;

 	lock_kernel();
 	AGP_LOCK();

 	if (agp_fe.backend_acquired != TRUE) {
-		AGP_UNLOCK();
-		unlock_kernel();
-		return -EPERM;
+		ret = -EPERM;
+		goto out;
 	}
 	if (!(test_bit(AGP_FF_IS_VALID, &priv->access_flags))) {
-		AGP_UNLOCK();
-		unlock_kernel();
-		return -EPERM;
+		ret = -EPERM;
+		goto out;
 	}
 	agp_copy_info(&kerninfo);
 	size = vma->vm_end - vma->vm_start;
@@ -627,52 +626,46 @@ static int agp_mmap(struct file *file, s

 	if (test_bit(AGP_FF_IS_CLIENT, &priv->access_flags)) {
 		if ((size + offset) > current_size) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 		client = agp_find_client_by_pid(current->pid);

 		if (client == NULL) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EPERM;
+			ret = -EPERM;
+			goto out;
 		}
 		if (!agp_find_seg_in_client(client, offset,
 					    size, vma->vm_page_prot)) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 		if (remap_page_range(vma->vm_start,
 				     (kerninfo.aper_base + offset),
 				     size, vma->vm_page_prot)) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EAGAIN;
-		}
-		AGP_UNLOCK();
-		unlock_kernel();
-		return 0;
+			ret = -EAGAIN;
+			goto out;
+		}
+		ret = 0;
+		goto out;
 	}
 	if (test_bit(AGP_FF_IS_CONTROLLER, &priv->access_flags)) {
 		if (size != current_size) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 		if (remap_page_range(vma->vm_start, kerninfo.aper_base,
 				     size, vma->vm_page_prot)) {
-			AGP_UNLOCK();
-			unlock_kernel();
-			return -EAGAIN;
-		}
-		AGP_UNLOCK();
-		unlock_kernel();
-		return 0;
+			ret = -EAGAIN;
+			goto out;
+		}
+		ret = 0;
 	}
+out:
 	AGP_UNLOCK();
 	unlock_kernel();
+	if (ret == 0)
+		vma->vm_flags |= VM_IO;
 	return -EPERM;
 }
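
The restructuring in this patch is the common single-exit cleanup pattern: every error path sets a result and jumps to one label that releases the locks, and the flag is set only on success. The sketch below shows that shape in standalone C. `struct region`, `REGION_IO`, `map_region` and the toy lock are hypothetical stand-ins for the vma, `VM_IO`, `agp_mmap` and the kernel locks. Note that the function has to end with `return ret;` for the pattern to work; with a constant return value the caller would never see success.

```c
#include <errno.h>
#include <stddef.h>

#define REGION_IO 0x1u          /* hypothetical stand-in for VM_IO */

struct region {                 /* hypothetical stand-in for the vma */
    size_t size;
    unsigned int flags;
};

static int lock_depth;          /* toy lock: counts unmatched lock calls */

static void toy_lock(void)   { lock_depth++; }
static void toy_unlock(void) { lock_depth--; }

/* Validate and "map" a region. All exits go through the out label,
 * so the lock is released exactly once on every path. */
int map_region(struct region *r, int acquired, size_t aperture_size)
{
    int ret = -EPERM;

    toy_lock();

    if (!acquired)
        goto out;               /* ret is still -EPERM */
    if (r->size != aperture_size) {
        ret = -EINVAL;
        goto out;
    }
    ret = 0;
out:
    toy_unlock();
    if (ret == 0)
        r->flags |= REGION_IO;  /* mark as I/O memory only on success */
    return ret;
}

/* Returns 1 if every toy_lock() so far was matched by a toy_unlock(). */
int locks_balanced(void)
{
    return lock_depth == 0;
}
```

As far as I understand the 2.4 core dump code, mappings flagged `VM_IO` are skipped when the core file is written, which is presumably why the patch sets the flag; treat that as background rather than something stated in this thread.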

2002-02-03 00:13:28

by Lars Christensen

Subject: Re: 2.4.17 agpgart process hang on crash

On Sat, 2 Feb 2002, Andrew Morton wrote:

> Lars Christensen wrote:
> >
> > No luck. Still hangs (e.g. with ./testgart & pkill -ABRT testgart), with
> > and without that patch and with and without 2.4.18-pre7. Does seem to
> > happen when dumping core--it doesn't happen with core dumping disabled.
> >
>
> This one, please:
>
> --- linux-2.4.18-pre7/drivers/char/agp/agpgart_fe.c Sun Aug 12 10:38:48 2001
> +++ linux-akpm/drivers/char/agp/agpgart_fe.c Sat Feb 2 15:29:49 2002
> @@ -605,19 +605,18 @@ static int agp_mmap(struct file *file, s

<snip>

> +	if (ret == 0)
> +		vma->vm_flags |= VM_IO;
>  	return -EPERM;
>  }

> Sorry - make the last statement `return ret;'

Better. The process dumps core now, but ps -ef hangs after printing a few
processes. Also, with an app running with a window open in X, the window
stays, so apparently the process isn't gone.

--
Lars Christensen, [email protected]