2011-04-05 00:03:17

by Gabriel Paubert

Subject: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Hi,

I've had the following funny crashes on PPC machines, with
cataleptic X server as a consequence:

kernel: [drm] Setting GART location based on new memory map
kernel: Oops: Exception in kernel mode, sig: 4 [#1]
kernel: CHRP
kernel: last sysfs file: /sys/devices/pci0001:01/0001:01:08.0/resource
kernel: NIP: c05648fc LR: c0226f58 CTR: 00000008
kernel: REGS: ddb53d20 TRAP: 0700 Not tainted (2.6.38)
kernel: MSR: 00089032 <EE,ME,IR,DR> CR: 48044482 XER: 00000000
kernel: TASK = ddab12b0[3040] 'Xorg' THREAD: ddb52000
kernel: GPR00: c0226f34 ddb53dd0 ddab12b0 00000000 c0509e6c 00000000 00000000 00000000
kernel: GPR08: 00000000 00000000 00000000 00000000 28044488 101f3d8c bf8166b4 00002c00
kernel: GPR16: 101b9458 1006f1a0 101ebe0c 00000001 101ebe08 00000000 df9efc20 df9efc00
kernel: GPR24: c0591e54 80546440 ddacf660 df9efc00 c0506048 c0480210 00a00000 df9ef800
kernel: NIP [c05648fc] platform_device_register_resndata+0x4/0xa4
kernel: LR [c0226f58] radeon_cp_init+0xd08/0x10c4
kernel: Call Trace:
kernel: [ddb53dd0] [c0226f34] radeon_cp_init+0xce4/0x10c4 (unreliable)
kernel: [ddb53df0] [c020801c] drm_ioctl+0x2c0/0x3e4
kernel: [ddb53eb0] [c0091264] do_vfs_ioctl+0x674/0x710
kernel: [ddb53f10] [c0091340] sys_ioctl+0x40/0x70
kernel: [ddb53f40] [c00111a8] ret_from_syscall+0x0/0x38
kernel: --- Exception: c01 at 0xfc54a78
kernel: LR = 0xfc549dc
kernel: Instruction dump:
kernel: 736f2e31 32002f75 73722f6c 69622f6c 6962786b 6c617669 65722e73 6f2e3132
kernel: 006c6962 786b6266 696c652e 736f2e31 <002f7573> 722f6c69 622f6c69 62786b62
kernel: ---[ end trace ed79daba161e31d9 ]---

As you can see, the processor is trying to execute ASCII strings like
"/usr/lib/libxkb" and has trouble digesting them :-)
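The instruction dump can be checked by decoding its hex words as bytes in the order printed (PowerPC is big-endian). This is a minimal Python sketch; the helper name decode_dump is only illustrative, and the words are copied from the oops above with the <...> marker around the faulting word removed:

```python
# Words from the "Instruction dump" lines of the oops; the word at the
# faulting address (marked <002f7573> in the log) is included unmarked.
DUMP = (
    "736f2e31 32002f75 73722f6c 69622f6c 6962786b 6c617669 65722e73 6f2e3132 "
    "006c6962 786b6266 696c652e 736f2e31 002f7573 722f6c69 622f6c69 62786b62"
)

def decode_dump(dump):
    """Concatenate the hex words in order and split the bytes on NUL."""
    raw = bytes.fromhex(dump.replace(" ", ""))
    return [part.decode("ascii") for part in raw.split(b"\x00")]

strings = decode_dump(DUMP)
for s in strings:
    print(s)
```

The decoded pieces are "so.12", "/usr/lib/libxklavier.so.12", "libxkbfile.so.1" and "/usr/lib/libxkb", i.e. library path names rather than PowerPC instructions.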

The backtrace is actually missing radeon_cp_init_microcode and radeon_do_init_cp
which are inlined inside radeon_cp_init.

The trouble is that radeon_cp_init_microcode calls platform_device_register_simple,
which is a simple inline wrapper around platform_device_register_resndata. Since I
have a non-modular kernel, the code of that function was freed along with the rest
of the init memory after boot, and has since been overwritten with something looking
like a list of filenames.

For now I have locally reverted 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8
which simply added an __init_or_module section attribute to
platform_device_register_resndata, and X is up again...

Now it may be that it is the ioctl that does not have the right to do
this. Actually I thought that the name radeon_cp that is registered there
would appear somewhere under /sys (or /proc) but failed to find it...

Regards,
Gabriel


2011-04-05 10:39:47

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, 2011-04-05 at 01:52 +0200, Gabriel Paubert wrote:
>
> Actually I thought that the name radeon_cp that is registered there
> would appear somewhere under /sys (or /proc) but failed to find it...

FWIW the radeon_cp* functions are in drivers/gpu/drm/radeon.


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-06 08:41:17

by Uwe Kleine-König

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Hi Gabriel,

On Tue, Apr 05, 2011 at 01:52:59AM +0200, Gabriel Paubert wrote:
> I've had the following funny crashes on PPC machines, with
> cataleptic X server as a consequence:
>
> kernel: [drm] Setting GART location based on new memory map
> kernel: Oops: Exception in kernel mode, sig: 4 [#1]
> kernel: CHRP
> kernel: last sysfs file: /sys/devices/pci0001:01/0001:01:08.0/resource
> kernel: NIP: c05648fc LR: c0226f58 CTR: 00000008
> kernel: REGS: ddb53d20 TRAP: 0700 Not tainted (2.6.38)
> kernel: MSR: 00089032 <EE,ME,IR,DR> CR: 48044482 XER: 00000000
> kernel: TASK = ddab12b0[3040] 'Xorg' THREAD: ddb52000
> kernel: GPR00: c0226f34 ddb53dd0 ddab12b0 00000000 c0509e6c 00000000 00000000 00000000
> kernel: GPR08: 00000000 00000000 00000000 00000000 28044488 101f3d8c bf8166b4 00002c00
> kernel: GPR16: 101b9458 1006f1a0 101ebe0c 00000001 101ebe08 00000000 df9efc20 df9efc00
> kernel: GPR24: c0591e54 80546440 ddacf660 df9efc00 c0506048 c0480210 00a00000 df9ef800
> kernel: NIP [c05648fc] platform_device_register_resndata+0x4/0xa4
> kernel: LR [c0226f58] radeon_cp_init+0xd08/0x10c4
> kernel: Call Trace:
> kernel: [ddb53dd0] [c0226f34] radeon_cp_init+0xce4/0x10c4 (unreliable)
> kernel: [ddb53df0] [c020801c] drm_ioctl+0x2c0/0x3e4
> kernel: [ddb53eb0] [c0091264] do_vfs_ioctl+0x674/0x710
> kernel: [ddb53f10] [c0091340] sys_ioctl+0x40/0x70
> kernel: [ddb53f40] [c00111a8] ret_from_syscall+0x0/0x38
> kernel: --- Exception: c01 at 0xfc54a78
> kernel: LR = 0xfc549dc
> kernel: Instruction dump:
> kernel: 736f2e31 32002f75 73722f6c 69622f6c 6962786b 6c617669 65722e73 6f2e3132
> kernel: 006c6962 786b6266 696c652e 736f2e31 <002f7573> 722f6c69 622f6c69 62786b62
> kernel: ---[ end trace ed79daba161e31d9 ]---
>
> As you can see, the processor is trying to execute ASCII strings like
> "/usr/lib/libxkb" and has trouble digesting them :-)
>
> The backtrace is actually missing radeon_cp_init_microcode and radeon_do_init_cp
> which are inlined inside radeon_cp_init.
>
> The trouble is that radeon_cp_init_microcode calls platform_device_register_simple
> which is a simple inline wrapper around platform_device_register_resndata, which
> happens to be already freed and overwritten with something looking like a list
> of filenames, since I have a non modular kernel.
>
> For now I have locally reverted 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8
> which simply added an __init_or_module section attribute to
> platform_device_register_resndata, and X is up again...
>
> Now it may be that it is the ioctl that does not have the right to do
> this. Actually I thought that the name radeon_cp that is registered there
> would appear somewhere under /sys (or /proc) but failed to find it...
I don't know for sure, but it looks strange to me that an ioctl can
register a device. But the fear of such code in the kernel made me
choose not to squash 737a3bb941 into 44f28bdea094. So my POV is that if
the maintainer of the radeon driver thinks registering the device is OK,
reverting 737a3bb9416 is fine for me.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-04-06 08:46:58

by Dave Airlie

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

2011/4/6 Uwe Kleine-König <[email protected]>:
> Hi Gabriel,
>
> On Tue, Apr 05, 2011 at 01:52:59AM +0200, Gabriel Paubert wrote:
>> I've had the following funny crashes on PPC machines, with
>> cataleptic X server as a consequence:
>>
>> kernel: [drm] Setting GART location based on new memory map
>> kernel: Oops: Exception in kernel mode, sig: 4 [#1]
>> kernel: CHRP
>> kernel: last sysfs file: /sys/devices/pci0001:01/0001:01:08.0/resource
>> kernel: NIP: c05648fc LR: c0226f58 CTR: 00000008
>> kernel: REGS: ddb53d20 TRAP: 0700 Not tainted (2.6.38)
>> kernel: MSR: 00089032 <EE,ME,IR,DR> CR: 48044482 XER: 00000000
>> kernel: TASK = ddab12b0[3040] 'Xorg' THREAD: ddb52000
>> kernel: GPR00: c0226f34 ddb53dd0 ddab12b0 00000000 c0509e6c 00000000 00000000 00000000
>> kernel: GPR08: 00000000 00000000 00000000 00000000 28044488 101f3d8c bf8166b4 00002c00
>> kernel: GPR16: 101b9458 1006f1a0 101ebe0c 00000001 101ebe08 00000000 df9efc20 df9efc00
>> kernel: GPR24: c0591e54 80546440 ddacf660 df9efc00 c0506048 c0480210 00a00000 df9ef800
>> kernel: NIP [c05648fc] platform_device_register_resndata+0x4/0xa4
>> kernel: LR [c0226f58] radeon_cp_init+0xd08/0x10c4
>> kernel: Call Trace:
>> kernel: [ddb53dd0] [c0226f34] radeon_cp_init+0xce4/0x10c4 (unreliable)
>> kernel: [ddb53df0] [c020801c] drm_ioctl+0x2c0/0x3e4
>> kernel: [ddb53eb0] [c0091264] do_vfs_ioctl+0x674/0x710
>> kernel: [ddb53f10] [c0091340] sys_ioctl+0x40/0x70
>> kernel: [ddb53f40] [c00111a8] ret_from_syscall+0x0/0x38
>> kernel: --- Exception: c01 at 0xfc54a78
>> kernel: LR = 0xfc549dc
>> kernel: Instruction dump:
>> kernel: 736f2e31 32002f75 73722f6c 69622f6c 6962786b 6c617669 65722e73 6f2e3132
>> kernel: 006c6962 786b6266 696c652e 736f2e31 <002f7573> 722f6c69 622f6c69 62786b62
>> kernel: ---[ end trace ed79daba161e31d9 ]---
>>
>> As you can see, the processor is trying to execute ASCII strings like
>> "/usr/lib/libxkb" and has trouble digesting them :-)
>>
>> The backtrace is actually missing radeon_cp_init_microcode and radeon_do_init_cp
>> which are inlined inside radeon_cp_init.
>>
>> The trouble is that radeon_cp_init_microcode calls platform_device_register_simple
>> which is a simple inline wrapper around platform_device_register_resndata, which
>> happens to be already freed and overwritten with something looking like a list
>> of filenames, since I have a non modular kernel.
>>
>> For now I have locally reverted 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8
>> which simply added an __init_or_module section attribute to
>> platform_device_register_resndata, and X is up again...
>>
>> Now it may be that it is the ioctl that does not have the right to do
>> this. Actually I thought that the name radeon_cp that is registered there
>> would appear somewhere under /sys (or /proc) but failed to find it...
> I don't know for sure, but it looks strange to me that an ioctl can
> register a device. But the fear of such code in the kernel made me
> choose not to squash 737a3bb941 into 44f28bdea094. So my POV is that if
> the maintainer of the radeon driver thinks registering the device is OK,
> reverting 737a3bb9416 is fine for me.

This is the old DRM driver for radeon, which relies on userspace to
start X, which then calls the kernel to initialise the hardware. Due to
this model, there is no device we can hang off (the PCI device might
already be bound to fbdev), so we are forced to create a platform
device to load the firmware.

So it's ugly, but unless someone can suggest a better device to hang
things off, I don't know of another way.

Dave.

2011-04-06 20:43:39

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, Apr 06, 2011 at 06:46:55PM +1000, Dave Airlie wrote:
> 2011/4/6 Uwe Kleine-König <[email protected]>:
> > Hi Gabriel,
> >
> > On Tue, Apr 05, 2011 at 01:52:59AM +0200, Gabriel Paubert wrote:
> >> I've had the following funny crashes on PPC machines, with
> >> cataleptic X server as a consequence:
> >>
> >> kernel: [drm] Setting GART location based on new memory map
> >> kernel: Oops: Exception in kernel mode, sig: 4 [#1]
> >> kernel: CHRP
> >> kernel: last sysfs file: /sys/devices/pci0001:01/0001:01:08.0/resource
> >> kernel: NIP: c05648fc LR: c0226f58 CTR: 00000008
> >> kernel: REGS: ddb53d20 TRAP: 0700 Not tainted (2.6.38)
> >> kernel: MSR: 00089032 <EE,ME,IR,DR> CR: 48044482 XER: 00000000
> >> kernel: TASK = ddab12b0[3040] 'Xorg' THREAD: ddb52000
> >> kernel: GPR00: c0226f34 ddb53dd0 ddab12b0 00000000 c0509e6c 00000000 00000000 00000000
> >> kernel: GPR08: 00000000 00000000 00000000 00000000 28044488 101f3d8c bf8166b4 00002c00
> >> kernel: GPR16: 101b9458 1006f1a0 101ebe0c 00000001 101ebe08 00000000 df9efc20 df9efc00
> >> kernel: GPR24: c0591e54 80546440 ddacf660 df9efc00 c0506048 c0480210 00a00000 df9ef800
> >> kernel: NIP [c05648fc] platform_device_register_resndata+0x4/0xa4
> >> kernel: LR [c0226f58] radeon_cp_init+0xd08/0x10c4
> >> kernel: Call Trace:
> >> kernel: [ddb53dd0] [c0226f34] radeon_cp_init+0xce4/0x10c4 (unreliable)
> >> kernel: [ddb53df0] [c020801c] drm_ioctl+0x2c0/0x3e4
> >> kernel: [ddb53eb0] [c0091264] do_vfs_ioctl+0x674/0x710
> >> kernel: [ddb53f10] [c0091340] sys_ioctl+0x40/0x70
> >> kernel: [ddb53f40] [c00111a8] ret_from_syscall+0x0/0x38
> >> kernel: --- Exception: c01 at 0xfc54a78
> >> kernel: LR = 0xfc549dc
> >> kernel: Instruction dump:
> >> kernel: 736f2e31 32002f75 73722f6c 69622f6c 6962786b 6c617669 65722e73 6f2e3132
> >> kernel: 006c6962 786b6266 696c652e 736f2e31 <002f7573> 722f6c69 622f6c69 62786b62
> >> kernel: ---[ end trace ed79daba161e31d9 ]---
> >>
> >> As you can see, the processor is trying to execute ASCII strings like
> >> "/usr/lib/libxkb" and has trouble digesting them :-)
> >>
> >> The backtrace is actually missing radeon_cp_init_microcode and radeon_do_init_cp
> >> which are inlined inside radeon_cp_init.
> >>
> >> The trouble is that radeon_cp_init_microcode calls platform_device_register_simple
> >> which is a simple inline wrapper around platform_device_register_resndata, which
> >> happens to be already freed and overwritten with something looking like a list
> >> of filenames, since I have a non modular kernel.
> >>
> >> For now I have locally reverted 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8
> >> which simply added an __init_or_module section attribute to
> >> platform_device_register_resndata, and X is up again...
> >>
> >> Now it may be that it is the ioctl that does not have the right to do
> >> this. Actually I thought that the name radeon_cp that is registered there
> >> would appear somewhere under /sys (or /proc) but failed to find it...
> > I don't know for sure, but it looks strange to me that an ioctl can
> > register a device. But the fear of such code in the kernel made me
> > choose not to squash 737a3bb941 into 44f28bdea094. So my POV is that if
> > the maintainer of the radeon driver thinks registering the device is OK,
> > reverting 737a3bb9416 is fine for me.
>
> This is the old DRM driver for radeon, which relies on userspace to
> start X, which then calls the kernel to initialise the hardware. Due to
> this model, there is no device we can hang off (the PCI device might
> already be bound to fbdev), so we are forced to create a platform
> device to load the firmware.
>
> So it's ugly, but unless someone can suggest a better device to hang
> things off, I don't know of another way.
>

The problem is that, at least on one of my machines, the new driver
does not work: the system hangs (apparently solid, but it's before
networking starts up and I've not yet hooked up a serial console),
after the "radeon: ib pool ready" message.

With the old driver, I've found some combinations of configuration
options that work. They all fail when DRM_RADEON_KMS is enabled.

Gabriel

2011-04-07 11:25:57

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Hi Dave,

> This is the old DRM driver for radeon, which relies on userspace to
> start X then calls the kernel

Actually, even the old DRM driver occasionally hangs on this machine.
I suspect a missing barrier, but I might be completely off base.

The system stays up; only X uses 100% of one core, and according to
gdb it is stuck here:

(gdb) info stack
#0 0x0fbafb08 in ioctl () from /lib/libc.so.6
#1 0x0f7be1c8 in drmDMA () from /usr/lib/libdrm.so.2
#2 0x0f65330c in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#3 0x0f65380c in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#4 0x0f6f89b8 in ?? () from /usr/lib/xorg/modules/drivers/radeon_drv.so
#5 0x0f562538 in ?? () from /usr/lib/xorg/modules/libexa.so
#6 0x0f56298c in ?? () from /usr/lib/xorg/modules/libexa.so
#7 0x0f56351c in ?? () from /usr/lib/xorg/modules/libexa.so
#8 0x0f55fba0 in ?? () from /usr/lib/xorg/modules/libexa.so
#9 0x0f56ab18 in ?? () from /usr/lib/xorg/modules/libexa.so
#10 0x0f56b810 in ?? () from /usr/lib/xorg/modules/libexa.so
#11 0x100f168c in ?? ()
#12 0x100df0fc in CompositePicture ()
#13 0x0f56a748 in ?? () from /usr/lib/xorg/modules/libexa.so
#14 0x100dee08 in CompositeTrapezoids ()
#15 0x100eb318 in ?? ()
#16 0x100e3ae8 in ?? ()
#17 0x1004a1f0 in ?? ()
#18 0x1001d0d4 in ?? ()
#19 0x0faea63c in ?? () from /lib/libc.so.6
#20 0x0faea800 in __libc_start_main () from /lib/libc.so.6
#21 0x00000000 in ?? ()


I don't know how to get more details.

Regards,
Gabriel

2011-04-07 11:33:57

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Hi Dave,

sorry, in my previous message I forgot the strace
output, which is an infinite loop of the following:

--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
ioctl(7, 0xc0286429, 0xffdf9bb8) = -1 EBUSY (Device or resource busy)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
ioctl(7, 0xc0286429, 0xffdf9bb8) = -1 EBUSY (Device or resource busy)

Note: fd 7 is /dev/dri/card0.
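The request number in the strace output can be split into its fields. This is a rough Python sketch that assumes the PowerPC ioctl layout (8-bit number, 8-bit type, 13-bit size, 3-bit direction with READ=2 and WRITE=4); the helper name decode_ioctl is made up for illustration:

```python
# Field layout of an ioctl request number on PowerPC (assumption: 8-bit nr,
# 8-bit type, 13-bit size, 3-bit direction with NONE=1, READ=2, WRITE=4).
NR_BITS, TYPE_BITS, SIZE_BITS = 8, 8, 13
IOC_READ, IOC_WRITE = 2, 4

def decode_ioctl(request):
    """Split a 32-bit ioctl request number into its component fields."""
    nr = request & ((1 << NR_BITS) - 1)
    type_ = (request >> NR_BITS) & ((1 << TYPE_BITS) - 1)
    size = (request >> (NR_BITS + TYPE_BITS)) & ((1 << SIZE_BITS) - 1)
    direction = request >> (NR_BITS + TYPE_BITS + SIZE_BITS)
    return {
        "nr": nr,
        "type": chr(type_),
        "size": size,
        "read": bool(direction & IOC_READ),
        "write": bool(direction & IOC_WRITE),
    }

fields = decode_ioctl(0xC0286429)
print(fields)
```

Under these assumptions the request is a read/write ioctl of type 'd' (the DRM ioctl type), number 0x29, with a 40-byte argument. That number should correspond to DRM_IOCTL_DMA in drm.h, which would match the drmDMA frame in the gdb backtrace from the previous message.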

Regards,
Gabriel

2011-04-07 14:04:56

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, 2011-04-06 at 22:43 +0200, Gabriel Paubert wrote:
>
> The problem is that, at least on one of my machines, the new driver
> does not work: the system hangs (apparently solid, but it's before
> networking starts up and I've not yet hooked up a serial console),
> after the "radeon: ib pool ready" message.

Does radeon.agpmode=-1 radeon.no_wb=1 help?

You might be able to get more information via netconsole if you prevent
the radeon module from loading automatically (or load it with
radeon.modeset=0 first) and then load it e.g. via ssh with modeset=1.

It would be interesting to see at least all agp/drm/radeon related
kernel messages before the problem occurs.


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-11 13:31:45

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Thu, Apr 07, 2011 at 04:04:35PM +0200, Michel Dänzer wrote:
> On Wed, 2011-04-06 at 22:43 +0200, Gabriel Paubert wrote:
> >
> > The problem is that, at least on one of my machines, the new driver
> > does not work: the system hangs (apparently solid, but it's before
> > networking starts up and I've not yet hooked up a serial console),
> > after the "radeon: ib pool ready" message.
>
> Does radeon.agpmode=-1 radeon.no_wb=1 help?
>
> You might be able to get more information via netconsole if you prevent
> the radeon module from loading automatically (or load it with
> radeon.modeset=0 first) and then load it e.g. via ssh with modeset=1.

Loading the module with modeset=1 results in insmod blocked in
kernel state (not consuming CPU cycles either). The last kernel
message is always the same (ib pool ready). This seems to be
independent of agpmode and no_wb. The kernel messages when
loading the driver are:

kernel: [drm] radeon kernel modesetting enabled.
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
kernel: [drm] register mmio base: 0xE8000000
kernel: [drm] register mmio size: 65536
kernel: radeon 0000:f1:00.0: Invalid ROM contents
kernel: ATOM BIOS: X1650PRO
kernel: [drm] Generation 2 PCI interface, using max accessible memory
kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: irq: irq 9 on host /mpic mapped to virtual irq 24
kernel: u3msi: allocated virq 0x18 (hw 0x9) addr 0xf8004090
kernel: radeon 0000:f1:00.0: radeon: using MSI.
kernel: [drm] radeon: irq initialized.
kernel: [drm] Detected VRAM RAM=512M, BAR=256M
kernel: [drm] RAM width 128bits DDR
kernel: [TTM] Zone kernel: Available graphics memory: 1002914 kiB.
kernel: [TTM] Initializing pool allocator.
kernel: [drm] radeon: 512M of VRAM memory ready
kernel: [drm] radeon: 512M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB enabled
kernel: [drm] Loading R500 Microcode
kernel: [drm] radeon: ring at 0x0000000020001000
kernel: [drm] ring test succeeded in 6 usecs
kernel: [drm] radeon: ib pool ready.


For now, with modeset=0, agpmode=-1 and no_wb=1, the driver
seems to work. But it sometimes took hours to fail, so some
more waiting is needed.

Regards,
Gabriel

2011-04-11 15:33:32

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

[ Adding the dri-devel list ]

On Mon, 2011-04-11 at 15:31 +0200, Gabriel Paubert wrote:
> On Thu, Apr 07, 2011 at 04:04:35PM +0200, Michel Dänzer wrote:
> > On Wed, 2011-04-06 at 22:43 +0200, Gabriel Paubert wrote:
> > >
> > > The problem is that, at least on one of my machines, the new driver
> > > does not work: the system hangs (apparently solid, but it's before
> > > networking starts up and I've not yet hooked up a serial console),
> > > after the "radeon: ib pool ready" message.
> >
> > Does radeon.agpmode=-1 radeon.no_wb=1 help?
> >
> > You might be able to get more information via netconsole if you prevent
> > the radeon module from loading automatically (or load it with
> > radeon.modeset=0 first) and then load it e.g. via ssh with modeset=1.
>
> Loading the module with modeset=1 results in insmod blocked in
> kernel state (not consuming CPU cycles either). The last kernel
> message is always the same (ib pool ready). This seems to be
> independent of agpmode and no_wb. The kernel messages when
> loading the driver are:
>
> kernel: [drm] radeon kernel modesetting enabled.
> kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
> kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
> kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
> kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
> kernel: [drm] register mmio base: 0xE8000000
> kernel: [drm] register mmio size: 65536
> kernel: radeon 0000:f1:00.0: Invalid ROM contents
> kernel: ATOM BIOS: X1650PRO
> kernel: [drm] Generation 2 PCI interface, using max accessible memory
> kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
> kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
> kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> kernel: [drm] Driver supports precise vblank timestamp query.
> kernel: irq: irq 9 on host /mpic mapped to virtual irq 24
> kernel: u3msi: allocated virq 0x18 (hw 0x9) addr 0xf8004090
> kernel: radeon 0000:f1:00.0: radeon: using MSI.

Have you ruled out any MSI related problems? I think the IRQ not working
could explain the symptoms...

> kernel: [drm] radeon: irq initialized.
> kernel: [drm] Detected VRAM RAM=512M, BAR=256M
> kernel: [drm] RAM width 128bits DDR
> kernel: [TTM] Zone kernel: Available graphics memory: 1002914 kiB.
> kernel: [TTM] Initializing pool allocator.
> kernel: [drm] radeon: 512M of VRAM memory ready
> kernel: [drm] radeon: 512M of GTT memory ready.
> kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
> kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
> kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
> kernel: radeon 0000:f1:00.0: WB enabled

Make sure this line changes to 'WB disabled' with no_wb=1. There's a
writeback endianness bug with modeset=1, see
http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
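As a generic illustration of what an endianness mismatch on written-back values looks like (this is not the radeon code, and it assumes the device stores 32-bit values in little-endian byte order while the CPU reads them as big-endian without swapping; the fence value is made up):

```python
import struct

def cpu_view_without_swap(value):
    """Bytes stored little-endian by the device, read back as big-endian."""
    written = struct.pack("<I", value)      # byte order the device stores
    return struct.unpack(">I", written)[0]  # what an unswapped BE read sees

fence = 0x00000001  # hypothetical sequence number written back by the GPU
seen = cpu_view_without_swap(fence)
print(hex(seen))
```

With such a mismatch, a driver comparing the value it reads against the sequence number it expects would see a byte-swapped number, so a wait on that value might never be satisfied.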

> kernel: [drm] Loading R500 Microcode
> kernel: [drm] radeon: ring at 0x0000000020001000
> kernel: [drm] ring test succeeded in 6 usecs
> kernel: [drm] radeon: ib pool ready.


> For now, with modeset=0, agpmode=-1 and no_wb=1, the driver
> seems to work.

The agpmode and no_wb options only have an effect with modeset=1, and
you don't seem to be using AGP anyway. :)


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-12 11:30:46

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Hi Michel,

On Mon, Apr 11, 2011 at 05:32:43PM +0200, Michel Dänzer wrote:
> [ Adding the dri-devel list ]
>
> Have you ruled out any MSI related problems? I think the IRQ not working
> could explain the symptoms...

Booting with MSI disabled does not change anything. Actually on this
machine the Ethernet (tigon3) uses MSI and everything is fine. OTOH,
on my home PC (dual core Athlon64, 4 1/2 years old), MSI has never worked.

> Make sure this line changes to 'WB disabled' with no_wb=1. There's a
> writeback endianness bug with modeset=1, see
> http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
>

With no_wb=1 the driver goes a bit further but the X server ends
up in an infinite ioctl loop and the logs are:

kernel: [drm] radeon kernel modesetting enabled.
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
kernel: [drm] register mmio base: 0xE8000000
kernel: [drm] register mmio size: 65536
kernel: radeon 0000:f1:00.0: Invalid ROM contents
kernel: ATOM BIOS: X1650PRO
kernel: [drm] Generation 2 PCI interface, using max accessible memory
kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
kernel: [drm] Driver supports precise vblank timestamp query.
kernel: [drm] radeon: irq initialized.
kernel: [drm] Detected VRAM RAM=512M, BAR=256M
kernel: [drm] RAM width 128bits DDR
kernel: [TTM] Zone kernel: Available graphics memory: 1003018 kiB.
kernel: [TTM] Initializing pool allocator.
kernel: [drm] radeon: 512M of VRAM memory ready
kernel: [drm] radeon: 512M of GTT memory ready.
kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
kernel: radeon 0000:f1:00.0: WB disabled
kernel: [drm] Loading R500 Microcode
kernel: [drm] radeon: ring at 0x0000000020001000
kernel: [drm] ring test succeeded in 6 usecs
kernel: [drm] radeon: ib pool ready.
kernel: [drm] ib test succeeded in 0 usecs
kernel: [drm] Radeon Display Connectors
kernel: [drm] Connector 0:
kernel: [drm] DVI-I
kernel: [drm] HPD1
kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
kernel: [drm] Encoders:
kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
kernel: [drm] DFP1: INTERNAL_KLDSCP_TMDS1
kernel: [drm] Connector 1:
kernel: [drm] S-video
kernel: [drm] Encoders:
kernel: [drm] TV1: INTERNAL_KLDSCP_DAC2
kernel: [drm] Connector 2:
kernel: [drm] DVI-I
kernel: [drm] HPD2
kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
kernel: [drm] Encoders:
kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
kernel: [drm] DFP3: INTERNAL_LVTM1
kernel: [drm] Possible lm63 thermal controller at 0x4c
kernel: [drm] fb mappable at 0xC00C0000
kernel: [drm] vram apper at 0xC0000000
kernel: [drm] size 9216000
kernel: [drm] fb depth is 24
kernel: [drm] pitch is 7680
kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
kernel: fb1: radeondrmfb frame buffer device
kernel: drm: registered panic notifier
kernel: [drm] Initialized radeon 2.8.0 20080528 for 0000:f1:00.0 on minor 0
kernel: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id

There is only one display connected and it is to the first DVI connector, BTW.
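A few of the numbers in the log can be cross-checked with simple arithmetic. This is a small Python sketch; the 4 KiB page size and the 4 bytes per pixel are assumptions, not values printed in the log:

```python
PAGE_SIZE = 4096             # assumption: 4 KiB CPU and GPU pages

gtt_bytes = 512 * 1024 * 1024           # "512M of GTT memory ready"
gart_pages = gtt_bytes // PAGE_SIZE     # log: "num cpu pages 131072"

fb_size = 9216000            # "[drm] size 9216000"
pitch = 7680                 # "[drm] pitch is 7680" (bytes per scanline)
bytes_per_pixel = 4          # assumption: depth 24 stored as 32 bpp

rows = fb_size // pitch
pixels_per_row = pitch // bytes_per_pixel
print(gart_pages, rows, pixels_per_row)
```

The GART page count matches the log, and the framebuffer size is consistent with a 1920x1200 mode, provided the pitch contains no padding.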

Regards,
Gabriel

2011-04-12 11:46:47

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, 2011-04-12 at 13:30 +0200, Gabriel Paubert wrote:
>
> On Mon, Apr 11, 2011 at 05:32:43PM +0200, Michel Dänzer wrote:
> >
> > Have you ruled out any MSI related problems? I think the IRQ not working
> > could explain the symptoms...
>
> Booting with MSI disabled does not change anything. Actually on this
> machine the Ethernet (tigon3) uses MSI and everything is fine. OTOH,
> on my home PC (dual core Athlon64, 4 1/2 years old), MSI has never worked.

Okay, the fact that no_wb helps probably rules out an IRQ problem anyway.


> > Make sure this line changes to 'WB disabled' with no_wb=1. There's a
> > writeback endianness bug with modeset=1, see
> > http://lists.freedesktop.org/archives/dri-devel/2011-April/009960.html .
> >
>
> With no_wb=1 the driver goes a bit further but the X server ends
> up in an infinite ioctl loop and the logs are:

Which ioctl does it loop on? Please provide the Xorg.0.log file as well.


> kernel: [drm] radeon kernel modesetting enabled.
> kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
> kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
> kernel: [drm] initializing kernel modesetting (RV530 0x1002:0x71C7).
> kernel: radeon 0000:f1:00.0: Using 64-bit DMA iommu bypass
> kernel: [drm] register mmio base: 0xE8000000
> kernel: [drm] register mmio size: 65536
> kernel: radeon 0000:f1:00.0: Invalid ROM contents
> kernel: ATOM BIOS: X1650PRO
> kernel: [drm] Generation 2 PCI interface, using max accessible memory
> kernel: radeon 0000:f1:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
> kernel: radeon 0000:f1:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
> kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> kernel: [drm] Driver supports precise vblank timestamp query.
> kernel: [drm] radeon: irq initialized.
> kernel: [drm] Detected VRAM RAM=512M, BAR=256M
> kernel: [drm] RAM width 128bits DDR
> kernel: [TTM] Zone kernel: Available graphics memory: 1003018 kiB.
> kernel: [TTM] Initializing pool allocator.
> kernel: [drm] radeon: 512M of VRAM memory ready
> kernel: [drm] radeon: 512M of GTT memory ready.
> kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
> kernel: [drm] radeon: 1 quad pipes, 2 z pipes initialized.
> kernel: [drm] PCIE GART of 512M enabled (table at 0x00040000).
> kernel: radeon 0000:f1:00.0: WB disabled
> kernel: [drm] Loading R500 Microcode
> kernel: [drm] radeon: ring at 0x0000000020001000
> kernel: [drm] ring test succeeded in 6 usecs
> kernel: [drm] radeon: ib pool ready.
> kernel: [drm] ib test succeeded in 0 usecs
> kernel: [drm] Radeon Display Connectors
> kernel: [drm] Connector 0:
> kernel: [drm] DVI-I
> kernel: [drm] HPD1
> kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
> kernel: [drm] Encoders:
> kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
> kernel: [drm] DFP1: INTERNAL_KLDSCP_TMDS1
> kernel: [drm] Connector 1:
> kernel: [drm] S-video
> kernel: [drm] Encoders:
> kernel: [drm] TV1: INTERNAL_KLDSCP_DAC2
> kernel: [drm] Connector 2:
> kernel: [drm] DVI-I
> kernel: [drm] HPD2
> kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
> kernel: [drm] Encoders:
> kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
> kernel: [drm] DFP3: INTERNAL_LVTM1
> kernel: [drm] Possible lm63 thermal controller at 0x4c
> kernel: [drm] fb mappable at 0xC00C0000
> kernel: [drm] vram apper at 0xC0000000
> kernel: [drm] size 9216000
> kernel: [drm] fb depth is 24
> kernel: [drm] pitch is 7680
> kernel: checking generic (c0000000 140000) vs hw (c0000000 10000000)
> kernel: fb: conflicting fb hw usage radeondrmfb vs OFfb vga,Displa - removing generic driver
> kernel: fb1: radeondrmfb frame buffer device

Hmm, I think this should say fb0, but that should only matter for
console, not X.

> kernel: drm: registered panic notifier
> kernel: [drm] Initialized radeon 2.8.0 20080528 for 0000:f1:00.0 on minor 0
> kernel: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id

BTW, if your kernel contains commit
69a07f0b117a40fcc1a479358d8e1f41793617f2, can you check whether reverting
it helps?


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-12 12:01:33

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> >
> > With no_wb=1 the driver goes a bit further but the X server ends
> > up in an infinite ioctl loop and the logs are:
>
> Which ioctl does it loop on? Please provide the Xorg.0.log file as well.

From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
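
(A minimal sketch of where 0x64 would come from, assuming DRM_COMMAND_BASE is 0x40 in drm.h and DRM_RADEON_GEM_WAIT_IDLE is 0x24 in radeon_drm.h. Both values are assumptions to be checked against the headers, and the helper names are made up for illustration.)

```python
# Assumed values, to be verified against drm.h and radeon_drm.h.
DRM_COMMAND_BASE = 0x40          # start of the driver-private ioctl range
DRM_RADEON_GEM_WAIT_IDLE = 0x24  # radeon-private command index


def drm_ioctl_nr(driver_cmd):
    """Ioctl number of a driver-private command (hypothetical helper)."""
    return DRM_COMMAND_BASE + driver_cmd


def ioc_nr(request):
    """Low 8 bits of a full Linux ioctl request word hold the ioctl number."""
    return request & 0xFF


print(hex(drm_ioctl_nr(DRM_RADEON_GEM_WAIT_IDLE)))
```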

The Xorg.0.log from the previous boot is attached.

Gabriel


Attachments:
(No filename) (398.00 B)
Xorg.0.log.old (26.72 kB)

2011-04-12 17:29:50

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
> On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > >
> > > With no_wb=1 the driver goes a bit further but the X server ends
> > > up in an infinite ioctl loop and the logs are:
> >
> > Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
>
> From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.

Note that it's normal for this ioctl to be called every time before the
GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl
always returns an error, this may not indicate a problem on its own.


> The Xorg.0.log from the previous boot is attached.

I don't see any obvious problems in it. Can you describe the symptoms of
the problem you're having with X a bit more?

One thing I notice is that the X server/driver are rather oldish. Maybe
you can try newer versions from testing, sid or even experimental to see
if that makes any difference.


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-13 07:59:55

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
> On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
> > On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > > >
> > > > With no_wb=1 the driver goes a bit further but the X server ends
> > > > up in an infinite ioctl loop and the logs are:
> > >
> > > Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
> >
> > From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
>
> Note that it's normal for this ioctl to be called every time before the
> GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl
> always returns an error, this may not indicate a problem on its own.

It seems to be an infinite loop, always returning EINTR because
of regular SIGALRM delivery.

>
>
> > The Xorg.0.log from the previous boot is attached.
>
> I don't see any obvious problems in it. Can you describe the symptoms of
> the problem you're having with X a bit more?

Well, X is dead, or rather in an infinite ioctl loop as described above.
IIRC, the display enters a power-down mode and there is nothing to see.

>
> One thing I notice is that the X server/driver are rather oldish. Maybe
> you can try newer versions from testing, sid or even experimental to see
> if that makes any difference.

I lack time to do it until early May (being away for 2 weeks starting on
Friday and busy on urgent things). I'm indeed on Debian stable (Squeeze),
which is rather recent and the machine is about 2 1/2 years old.

Gabriel

2011-04-13 08:02:14

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> BTW, if your kernel contains commit
> 69a07f0b117a40fcc1a479358d8e1f41793617f2, can you try if reverting that
> helps?

My kernel is pristine 2.6.38 and does not include this commit
(was introduced before 2.6.39-rc1 according to gitk).

Gabriel

2011-04-13 08:12:58

by Uwe Kleine-König

Subject: small git lesson [Was: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?]

On Wed, Apr 13, 2011 at 10:02:04AM +0200, Gabriel Paubert wrote:
> On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > BTW, if your kernel contains commit
> > 69a07f0b117a40fcc1a479358d8e1f41793617f2, can you try if reverting that
> > helps?
>
> My kernel is pristine 2.6.38 and does not include this commit
> (was introduced before 2.6.39-rc1 according to gitk).
gitk is not the best tool to find this out.

$ git name-rev --refs=refs/tags/v2.6\* 69a07f0b117a40fcc1a479358d8e1f41793617f2
69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4

so it was introduced just before -rc2.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-04-13 08:18:01

by Benjamin Herrenschmidt

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
>
> Well, X is dead, or rather in an infinite ioctl loop as described
> above.
> IIRC, the display enters a power-down mode and there is nothing to
> see.

So basically the card crashed. There's about an infinite amount of
reasons why radeons do so, sometimes it has to do with them not liking
what you ate that day...

The only thing I can see that could be of use would be a bisect

Cheers,
Ben.

2011-04-13 08:59:37

by Andreas Schwab

Subject: Re: small git lesson [Was: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?]

Uwe Kleine-König writes:

> $ git name-rev --refs=refs/tags/v2.6\* 69a07f0b117a40fcc1a479358d8e1f41793617f2
> 69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4
>
> so it was introduced just before -rc2.

$ git tag --contains 69a07f0b117a40fcc1a479358d8e1f41793617f2
v2.6.39-rc1
v2.6.39-rc2

Andreas.

--
Andreas Schwab
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."

2011-04-13 10:01:59

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, Apr 13, 2011 at 06:16:13PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
> >
> > Well, X is dead, or rather in an infinite ioctl loop as described
> > above.
> > IIRC, the display enters a power-down mode and there is nothing to
> > see.
>
> So basically the card crashed. There's about an infinite amount of
> reasons why radeons do so, sometimes it has to do with them not liking
> what you ate that day...
>
> The only thing I can see that could be of use would be a bisect

Bisecting for something which I have never got to work (radeon with
KMS) on this machine is something I don't know how to do...

Note that radeon without KMS also always ends up crashing, but it
may take hours. The only case where the machine works reliably is
when glxinfo claims that it is using software rendering.

Regards,
Gabriel

2011-04-13 10:31:51

by Gabriel Paubert

Subject: Re: small git lesson [Was: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?]

On Wed, Apr 13, 2011 at 10:59:14AM +0200, Andreas Schwab wrote:
> Uwe Kleine-König writes:
>
> > $ git name-rev --refs=refs/tags/v2.6\* 69a07f0b117a40fcc1a479358d8e1f41793617f2
> > 69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4
> >
> > so it was introduced just before -rc2.
>
> $ git tag --contains 69a07f0b117a40fcc1a479358d8e1f41793617f2
> v2.6.39-rc1
> v2.6.39-rc2
>

So who is right? I think it was before rc1.

Anyway I'm aware that there are other git commands, although for the option
details I often have to have a look at the man page.

However in this case the main reason to fire up gitk was to have a quick look
at the patch and its context, and I simply reported the "Precedes" line
in the display, which is 2.6.39-rc1. It also follows v2.6.37-rc2, which means
that it has been quite a long time outside the main tree.

Gabriel

2011-04-13 12:12:57

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
> On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
> > On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
> > > On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > > > >
> > > > > With no_wb=1 the driver goes a bit further but the X server ends
> > > > > up in an infinite ioctl loop and the logs are:
> > > >
> > > > Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
> > >
> > > From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
> >
> > Note that it's normal for this ioctl to be called every time before the
> > GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl
> > always returns an error, this may not indicate a problem on its own.
>
> It seems to be an infinite loop, always returning EINTR because
> of regular SIGALRM delivery.

That does sound like the GPU locks up. Do you get any messages in dmesg
about lockups and attempts to reset the GPU at any time?


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-13 12:17:13

by Uwe Kleine-König

Subject: Re: small git lesson [Was: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?]

Hello Gabriel
On Wed, Apr 13, 2011 at 12:31:44PM +0200, Gabriel Paubert wrote:
> On Wed, Apr 13, 2011 at 10:59:14AM +0200, Andreas Schwab wrote:
> > Uwe Kleine-König writes:
> >
> > > $ git name-rev --refs=refs/tags/v2.6\* 69a07f0b117a40fcc1a479358d8e1f41793617f2
> > > 69a07f0b117a40fcc1a479358d8e1f41793617f2 tags/v2.6.39-rc2~3^2~43^2~4
> > >
> > > so it was introduced just before -rc2.
> >
> > $ git tag --contains 69a07f0b117a40fcc1a479358d8e1f41793617f2
> > v2.6.39-rc1
> > v2.6.39-rc2
> >
>
> So who is right? I think it was before rc1.
Yep, correct. I interpreted the output of git name-rev to mean it's not
included in a tag earlier than v2.6.39-rc2, but actually that's wrong.
It's just that it's easier (for some definition of easy) to reach the
commit in question from v2.6.39-rc2 than from v2.6.39-rc1.
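
To illustrate, a small sketch (plain Python, helper names made up) that splits a name-rev result into its steps. It only shows that the name is a path walked back from the chosen tag, so it says nothing about whether an earlier tag also contains the commit:

```python
import re


def parse_name_rev(name):
    """Split a name-rev style name into (base ref, [(operator, count), ...])."""
    m = re.match(r"^([^~^]+)((?:[~^]\d*)*)$", name)
    if m is None:
        raise ValueError("not a name-rev style name: %r" % name)
    base, suffix = m.groups()
    # A bare '~' or '^' without a number means 1.
    steps = [(op, int(n) if n else 1)
             for op, n in re.findall(r"([~^])(\d*)", suffix)]
    return base, steps


def explain(name):
    """Describe the walk from the base ref to the named commit."""
    base, steps = parse_name_rev(name)
    lines = ["start at " + base]
    for op, n in steps:
        if op == "~":
            lines.append("go back %d commit(s) along first parents" % n)
        else:
            lines.append("step to parent number %d of that commit" % n)
    return lines


for line in explain("tags/v2.6.39-rc2~3^2~43^2~4"):
    print(line)
```

For the question "which tags contain this commit", git tag --contains (as Andreas showed) gives the answer directly.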

> However in this case the main reason to fire up gitk was to have a quick look
> at the patch and its context, and I simply reported the "Precedes" line
> in the display, which is 2.6.39-rc1. It also follows v2.6.37-rc2, which means
> that it has been quite a long time outside the main tree.
I think this conclusion isn't valid in general. (E.g. in git itself a
bug-fix is often done on top of the commit that introduced the bug and then
merged into master. Still the bugfix might be new.) But looking at the
AuthorDate of 69a07f0b117a seems to support your statement.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-04-13 12:28:01

by Gabriel Paubert

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, Apr 13, 2011 at 02:12:16PM +0200, Michel Dänzer wrote:
> On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
> > On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
> > > On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
> > > > On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > > > > >
> > > > > > With no_wb=1 the driver goes a bit further but the X server ends
> > > > > > up in an infinite ioctl loop and the logs are:
> > > > >
> > > > > Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
> > > >
> > > > From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
> > >
> > > Note that it's normal for this ioctl to be called every time before the
> > > GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl
> > > always returns an error, this may not indicate a problem on its own.
> >
> > It seems to be an infinite loop, always returning EINTR because
> > of regular SIGALRM delivery.
>
> That does sound like the GPU locks up. Do you get any messages in dmesg
> about lockups and attempts to reset the GPU at any time?

No.

Gabriel

2011-04-13 14:16:00

by Michel Dänzer

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

On Wed, 2011-04-13 at 14:27 +0200, Gabriel Paubert wrote:
> On Wed, Apr 13, 2011 at 02:12:16PM +0200, Michel Dänzer wrote:
> > On Wed, 2011-04-13 at 09:59 +0200, Gabriel Paubert wrote:
> > > On Tue, Apr 12, 2011 at 07:29:22PM +0200, Michel Dänzer wrote:
> > > > On Tue, 2011-04-12 at 14:00 +0200, Gabriel Paubert wrote:
> > > > > On Tue, Apr 12, 2011 at 01:46:10PM +0200, Michel Dänzer wrote:
> > > > > > >
> > > > > > > With no_wb=1 the driver goes a bit further but the X server ends
> > > > > > > up in an infinite ioctl loop and the logs are:
> > > > > >
> > > > > > Which ioctl does it loop on? Please provide the Xorg.0.log file as well.
> > > > >
> > > > > From memory, the code was 0x64, which is DRM_RADEON_GEM_WAIT_IDLE.
> > > >
> > > > Note that it's normal for this ioctl to be called every time before the
> > > > GPU accessible pixmap memory is accessed by the CPU. Unless the ioctl
> > > > always returns an error, this may not indicate a problem on its own.
> > >
> > > It seems to be an infinite loop, always returning EINTR because
> > > of regular SIGALRM delivery.
> >
> > That does sound like the GPU locks up. Do you get any messages in dmesg
> > about lockups and attempts to reset the GPU at any time?
>
> No.

Hmm, I guess the constant SIGALRMs might prevent the lockup detection
from kicking in... Maybe you can try starting the X server with
-dumbSched to see if that gets things along any further, but in the end
there's probably no way around figuring out what causes the lockup and
fixing that anyway.
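
A toy model of that guess, in Python, with made-up numbers (500 ms timeout, 20 ms SIGALRM period) and assuming the lockup check only runs once a wait has lasted its full timeout without being interrupted. This is not the radeon code, just the arithmetic of a wait that is restarted from scratch on every EINTR:

```python
def calls_until_timeout(timeout_ms, signal_period_ms, max_calls,
                        resume_remaining=False):
    """Return the number of the call whose wait reaches its timeout, or None.

    signal_period_ms=None models a server with no periodic signal at all.
    resume_remaining=True models a caller that retries with the time left
    instead of the full timeout (hypothetical alternative, for comparison).
    """
    remaining = timeout_ms
    for call in range(1, max_calls + 1):
        if signal_period_ms is None or remaining < signal_period_ms:
            return call  # wait ran to completion, lockup check could run
        # The signal arrives first: the call fails with EINTR and is retried.
        if resume_remaining:
            remaining -= signal_period_ms
        # Otherwise the next call starts over with the full timeout.
    return None


print(calls_until_timeout(500, 20, 1000))        # restarted on every signal
print(calls_until_timeout(500, None, 1000))      # no periodic signal
print(calls_until_timeout(500, 20, 1000, True))  # retry with remaining time
```

In this model the restarted wait never reaches its timeout as long as the signal period is shorter than the timeout, which is the situation -dumbSched would be expected to avoid.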


--
Earthling Michel Dänzer | http://www.vmware.com
Libre software enthusiast | Debian, X and DRI developer

2011-04-13 20:29:03

by Andy Furniss

Subject: Re: Revert 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 ?

Michel Dänzer wrote:

>>> That does sound like the GPU locks up. Do you get any messages in dmesg
>>> about lockups and attempts to reset the GPU at any time?
>>
>> No.
>
> Hmm, I guess the constant SIGALRMs might prevent the lockup detection
> from kicking in... Maybe you can try starting the X server with
> -dumbSched to see if that gets things along any further, but in the end
> there's probably no way around figuring out what causes the lockup and
> fixing that anyway.

I have an old AGP box that locks with 600g + agpgart - It used to give
GPU lockup to dmesg/log, but (I only test it occasionally) it doesn't
anymore. I can still sysrq OK.

I wonder if something changed in recent months in the drm/whatever code
that has changed/blocked the logging.