2010-11-10 19:29:01

by Andrew Lutomirski

[permalink] [raw]
Subject: Severe reproducible nouveau breakage in 2.6.36 (and maybe .35)

Hi all-

Somewhere between 2.6.34-fedora-whatever and 2.6.36, Nouveau became
extremely broken on my hardware. It appears to be triggered by a bug
in my monitor (HP LP2475w), which causes the monitor to disappear from
DVI when it goes to sleep. Every time the console blanks (in X or
otherwise AFAICT) the system crashes oddly but unrecoverably. This is
100% reproducible by Ctrl-Alt-F2 followed by
'echo 1 >/sys/class/graphics/fb0/blank' *from SSH* and waiting a few seconds
for the monitor to go to sleep, but it also happens if I just walk
away from the computer long enough for it to blank itself. This is
present on F14's kernel and on 2.6.36 from kernel.org. This may or
may not be related to the unreproducible crashes that I used to get
rarely on 2.6.34.
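For scripting the reproduction (for example from an SSH session in a test harness), the blanking step is just a write to the fbdev sysfs attribute. The following is a minimal sketch. It assumes fb0 is the nouveau framebuffer, as the startup log below shows. The helper name fb_set_blank is hypothetical. After the write, the monitor still has to be left alone for a few seconds so that it actually goes to sleep.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Equivalent of `echo LEVEL > BLANK_PATH`, e.g. BLANK_PATH =
 * "/sys/class/graphics/fb0/blank" and LEVEL = 1 as in the reproduction recipe.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
int fb_set_blank(const char *blank_path, int level)
{
    assert(blank_path != NULL);

    FILE *f = fopen(blank_path, "w");
    if (f == NULL)
        return -1;

    /* sysfs attributes take plain decimal text, the same bytes echo writes */
    int ok = fprintf(f, "%d\n", level) > 0;

    /* stdio buffers the write; a rejected sysfs write is reported by fclose */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```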

The symptoms are:

- netconsole becomes very unreliable. (This makes it rather hard to
get any good debugging info because I don't have a real serial port.)
- system doesn't answer pings. userspace seems dead as well.
- capslock will work intermittently
- the lockup detector doesn't say anything.
- After a few seconds, the system thinks that the TSC is massively
unstable and switches clocksources. (I think this is because the
clocksource watchdog fails to get scheduled for a while and then somehow
ends up running and thinking it detected a clocksource failure.)
- SysRq-c will give me my console back and spew (useless?) garbage.
Usually it also causes a panic and I get nothing else out of the
system.
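One plausible way to make the clocksource guess concrete is shown below. This mechanism is an assumption and is not established in this thread. Suppose the watchdog's reference clock is a narrow counter, such as the 24-bit ACPI PM timer (3.579545 MHz, which wraps roughly every 4.7 s). A stall longer than one wrap then makes the reference appear to have advanced far less than the TSC, so a healthy TSC gets flagged as unstable. The 62.5 ms threshold used in the checks is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * How many nanoseconds a reference counter of `bits` width running at
 * `ref_hz` *appears* to have advanced after `elapsed_ns` of real time.
 * The watchdog only sees (now - last) & mask, so any full wraps vanish.
 */
static uint64_t ref_observed_ns(uint64_t elapsed_ns, uint64_t ref_hz, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);
    uint64_t ticks = elapsed_ns * ref_hz / NSEC_PER_SEC; /* true tick count */
    return (ticks & mask) * NSEC_PER_SEC / ref_hz;       /* wrapped view */
}

/*
 * Returns 1 if a perfectly good TSC would be judged unstable: the TSC
 * reports the true interval, the reference reports the wrapped one, and
 * their difference exceeds threshold_ns.
 */
int tsc_looks_unstable(uint64_t elapsed_ns, uint64_t ref_hz, unsigned bits,
                       uint64_t threshold_ns)
{
    assert(ref_hz > 0 && bits > 0 && bits <= 64);

    uint64_t tsc_ns = elapsed_ns; /* assume the TSC itself is fine */
    uint64_t ref_ns = ref_observed_ns(elapsed_ns, ref_hz, bits);
    uint64_t skew = tsc_ns > ref_ns ? tsc_ns - ref_ns : ref_ns - tsc_ns;
    return skew > threshold_ns;
}
```

With those numbers, a 0.5 s interval passes. A 6 s stall against a 24-bit counter is flagged. The same stall against a 32-bit counter is not flagged, because that counter does not wrap in 6 s.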

The most recent time I triggered this, I got an amazing amount of
console spew about unexpected NMIs. None of it made it to serial
console, and the part left on the screen was so far down as to be
pretty much useless. lockdep shows nothing interesting (or at least
nothing interesting that stays on the screen long enough for me to
read).

The best hint I have is from this patch (sorry for whitespace damage):

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..6823a4d 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,6 +1014,8 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
uint32_t unplug_mask, plug_mask, change_mask;
uint32_t hpd0, hpd1 = 0;

+ printk(KERN_ERR "in nv50_display_irq_hotplug_bh\n");
+
hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
if (dev_priv->chipset >= 0x90)
hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
@@ -1062,6 +1064,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
if (dev_priv->chipset >= 0x90)
nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));

+ printk(KERN_ERR "about to drm_helper_hpd_irq_event\n");
drm_helper_hpd_irq_event(dev);
}

@@ -1072,6 +1075,7 @@ nv50_display_irq_handler(struct drm_device *dev)
uint32_t delayed = 0;

if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
+ printk(KERN_ERR "nv50 got hpd irq\n");
if (!work_pending(&dev_priv->hpd_work))
queue_work(dev_priv->wq, &dev_priv->hpd_work);
}

which spews "nv50 got hpd irq" once the display blanks.

Nouveau startup says:

[ 15.646535] nouveau 0000:04:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
[ 15.646540] nouveau 0000:04:00.0: setting latency timer to 64
[ 15.650606] [drm] nouveau 0000:04:00.0: Detected an NV50 generation card (0x086f00a2)
[ 15.657126] [drm] nouveau 0000:04:00.0: Attempting to load BIOS image from PRAMIN
[ 15.714410] [drm] nouveau 0000:04:00.0: ... appears to be valid
[ 15.714413] [drm] nouveau 0000:04:00.0: BIT BIOS found
[ 15.714415] [drm] nouveau 0000:04:00.0: Bios version 60.86.5b.00
[ 15.714418] [drm] nouveau 0000:04:00.0: TMDS table version 2.0
[ 15.714420] [drm] nouveau 0000:04:00.0: Found Display Configuration Block version 4.0
[ 15.714423] [drm] nouveau 0000:04:00.0: Raw DCB entry 0: 02011300 00000028
[ 15.714425] [drm] nouveau 0000:04:00.0: Raw DCB entry 1: 01011302 00000010
[ 15.714427] [drm] nouveau 0000:04:00.0: Raw DCB entry 2: 01000310 00000028
[ 15.714429] [drm] nouveau 0000:04:00.0: Raw DCB entry 3: 02000312 00000010
[ 15.714430] [drm] nouveau 0000:04:00.0: Raw DCB entry 4: 0000000e 00000000
[ 15.714433] [drm] nouveau 0000:04:00.0: DCB connector table: VHER 0x40 5 14 2
[ 15.714435] [drm] nouveau 0000:04:00.0: 0: 0x00002030: type 0x30 idx 0 tag 0x08
[ 15.714438] [drm] nouveau 0000:04:00.0: 1: 0x00001130: type 0x30 idx 1 tag 0x07
[ 15.714441] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 0 at offset 0xC34B
[ 15.740011] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 1 at offset 0xC6B5
[ 15.758892] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 2 at offset 0xD2F6
[ 15.758903] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 3 at offset 0xD3E8
[ 15.760960] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 4 at offset 0xD5E2
[ 15.760965] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table at offset 0xD647
[ 15.781884] [drm] nouveau 0000:04:00.0: 0xD647: Condition still not met after 20ms, skipping following opcodes
[ 15.781953] [drm] nouveau 0000:04:00.0: Detected 256MiB VRAM
[ 15.873252] [TTM] Zone kernel: Available graphics memory: 3055420 kiB.
[ 15.873256] [TTM] Zone dma32: Available graphics memory: 2097152 kiB.
[ 15.873259] [TTM] Initializing pool allocator.
[ 15.948218] [drm] nouveau 0000:04:00.0: 512 MiB GART (aperture)
[ 15.983208] [drm] nouveau 0000:04:00.0: Allocating FIFO number 1
[ 15.998872] [drm] nouveau 0000:04:00.0: nouveau_channel_alloc: initialised FIFO 1
[ 16.158101] [drm] nouveau 0000:04:00.0: allocated 1920x1200 fb: 0x40230000, bo ffff8801b48a5000
[ 16.158315] fbcon: nouveaufb (fb0) is primary device
[ 16.165464] Console: switching to colour frame buffer device 240x75
[ 16.168574] fb0: nouveaufb frame buffer device
[ 16.168576] drm: registered panic notifier
[ 16.168601] [drm] Initialized nouveau 0.0.16 20090420 for 0000:04:00.0 on minor 0


2010-11-10 20:06:31

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: Severe reproducible nouveau breakage in 2.6.36 (and maybe .35)

On Wed, Nov 10, 2010 at 2:28 PM, Andrew Lutomirski <[email protected]> wrote:
> Hi all-
>
> Somewhere between 2.6.34-fedora-whatever and 2.6.36, Nouveau became
> extremely broken on my hardware. It appears to be triggered by a bug
> in my monitor (HP LP2475w), which causes the monitor to disappear from
> DVI when it goes to sleep. Every time the console blanks (in X or
> otherwise AFAICT) the system crashes oddly but unrecoverably. This is
> 100% reproducible by Ctrl-Alt-F2 followed by
> 'echo 1 >/sys/class/graphics/fb0/blank' *from SSH* and waiting a few seconds
> for the monitor to go to sleep, but it also happens if I just walk
> away from the computer long enough for it to blank itself. This is
> present on F14's kernel and on 2.6.36 from kernel.org. This may or
> may not be related to the unreproducible crashes that I used to get
> rarely on 2.6.34.

>
> The best hint I have is from this patch (sorry for whitespace damage):
>

>
> which spews "nv50 got hpd irq" once the display blanks.

I tracked it down. The interrupt code in 2.6.36 is totally broken ---
it acknowledges the interrupt *in the bottom half*. This might work
by accident if the bottom half gets queued on a different CPU, but
something probably changed (concurrency-managed workqueues?) that makes
the BH end up on the same CPU. The hotplug interrupt then fires over and
over on that CPU, the BH never gets to run, and there goes a CPU.

Then the clocksource watchdog hits and takes the whole system down
when it calls stop_machine, which also gets starved on that cpu.

Patch coming.

--Andy

2010-11-10 21:26:37

by Andrew Lutomirski

[permalink] [raw]
Subject: [PATCH 0/2] Fix nouveau-related freezes

Nouveau takes down my system quite reliably when any hotplug event occurs.
The bug happens because the IRQ handler didn't acknowledge the hotplug
state until the bottom half ran, so the card immediately raised a new
interrupt. The resulting interrupt storm monopolizes that CPU, so the
bottom half (which would acknowledge it) never runs.

Even with this fix, a lot of the IRQ code looks rather broken.

This is tested on 2.6.36 (and makes the system stable for me), but it also
applies cleanly to 2.6.37 (untested, but surely also necessary). Fedora 14's
2.6.35 kernels seem to have the same problem for me, so I suspect that 2.6.35
needs this fix as well. (All of my tests are on an NV50 card.)

Andy Lutomirski (2):
Use existing defines for NV50 hotplug registers
nouveau: Acknowledge HPD irq in handler, not bottom half

drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
drivers/gpu/drm/nouveau/nv50_display.c | 21 +++++++++++++++------
3 files changed, 21 insertions(+), 6 deletions(-)

--
1.7.3.2

From 8055e8485f28491fe6219c512e379b4b89bcd465 Mon Sep 17 00:00:00 2001
Message-Id: <8055e8485f28491fe6219c512e379b4b89bcd465.1289423199.git.luto@mit.edu>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
From: Andy Lutomirski <[email protected]>
Date: Wed, 10 Nov 2010 14:49:12 -0500
Subject: [PATCH 1/2] Use existing defines for NV50 hotplug registers

This doesn't change behavior at all (the defines have the same values as
the magic numbers), but it makes the code a lot easier to understand.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: <[email protected]>
---
drivers/gpu/drm/nouveau/nv50_display.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..83a7d27 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -453,8 +453,8 @@ static int nv50_display_disable(struct drm_device *dev)
nv_wr32(dev, NV50_PDISPLAY_INTR_EN, 0x00000000);

/* disable hotplug interrupts */
- nv_wr32(dev, 0xe054, 0xffffffff);
- nv_wr32(dev, 0xe050, 0x00000000);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, 0xffffffff);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_INTR, 0x00000000);
if (dev_priv->chipset >= 0x90) {
nv_wr32(dev, 0xe074, 0xffffffff);
nv_wr32(dev, 0xe070, 0x00000000);
@@ -1014,7 +1014,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
uint32_t unplug_mask, plug_mask, change_mask;
uint32_t hpd0, hpd1 = 0;

- hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
+ hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
if (dev_priv->chipset >= 0x90)
hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);

@@ -1058,7 +1058,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
}

- nv_wr32(dev, 0xe054, nv_rd32(dev, 0xe054));
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
if (dev_priv->chipset >= 0x90)
nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));

--
1.7.3.2


From cb559f4c96f82d5bf0c132b3330aecd4885a0dda Mon Sep 17 00:00:00 2001
Message-Id: <cb559f4c96f82d5bf0c132b3330aecd4885a0dda.1289423199.git.luto@mit.edu>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
From: Andy Lutomirski <[email protected]>
Date: Wed, 10 Nov 2010 15:08:39 -0500
Subject: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

The old code generated an interrupt storm bad enough to completely
take down my system.

This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
uses another register that isn't described, and I don't have that hardware
to test.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
drivers/gpu/drm/nouveau/nv50_display.c | 17 +++++++++++++----
3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b1be617..b6c62cc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -531,6 +531,11 @@ struct drm_nouveau_private {
struct work_struct irq_work;
struct work_struct hpd_work;

+ struct {
+ spinlock_t lock;
+ uint32_t hpd0_bits;
+ } hpd_state;
+
struct list_head vbl_waiting;

struct {
diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
index 794b0ee..b62a601 100644
--- a/drivers/gpu/drm/nouveau/nouveau_irq.c
+++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
@@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
if (dev_priv->card_type >= NV_50) {
INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
+ spin_lock_init(&dev_priv->hpd_state.lock);
INIT_LIST_HEAD(&dev_priv->vbl_waiting);
}
}
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 83a7d27..0df08e3 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
uint32_t unplug_mask, plug_mask, change_mask;
uint32_t hpd0, hpd1 = 0;

- hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
+ spin_lock_irq(&dev_priv->hpd_state.lock);
+ hpd0 = dev_priv->hpd_state.hpd0_bits;
+ dev_priv->hpd_state.hpd0_bits = 0;
+ spin_unlock_irq(&dev_priv->hpd_state.lock);
+
+ hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
if (dev_priv->chipset >= 0x90)
hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);

@@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
}

- nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
if (dev_priv->chipset >= 0x90)
nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));

@@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
uint32_t delayed = 0;

if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
- if (!work_pending(&dev_priv->hpd_work))
- queue_work(dev_priv->wq, &dev_priv->hpd_work);
+ uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
+ spin_lock(&dev_priv->hpd_state.lock);
+ dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
+ spin_unlock(&dev_priv->hpd_state.lock);
+
+ queue_work(dev_priv->wq, &dev_priv->hpd_work);
}

while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
--
1.7.3.2
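Patch 2/2 acknowledges the status register in the hard-IRQ handler. It then hands the acknowledged bits to the work item through a small shared word. The user-space sketch below mimics that hand-off, with two substitutions that are both assumptions:

- A plain variable with write-1-to-clear semantics stands in for the register behind NV50_PCONNECTOR_HOTPLUG_CTRL. That register appears to be write-1-to-clear, judging by the old `nv_wr32(dev, reg, nv_rd32(dev, reg))` ack.
- C11 atomics (atomic_fetch_or / atomic_exchange) replace the patch's spinlock. For a single 32-bit word, they give the same no-lost-bits property.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the HOTPLUG_CTRL status register (0xe054). */
static uint32_t fake_hpd_status;

static uint32_t hpd_status_read(void) { return fake_hpd_status; }
static void hpd_status_write(uint32_t v) { fake_hpd_status &= ~v; } /* W1C */

/* Bits acked in hard-IRQ context that the bottom half has not consumed yet. */
static _Atomic uint32_t hpd0_pending;

/* Hard-IRQ side: ack first so the line drops, then record what fired. */
void hpd_irq(void)
{
    uint32_t bits = hpd_status_read();
    hpd_status_write(bits);
    assert((hpd_status_read() & bits) == 0); /* acked bits are now clear */
    atomic_fetch_or(&hpd0_pending, bits);
    /* queue_work() would follow here */
}

/* Bottom-half side: take everything accumulated so far in one step,
 * then keep only enabled connectors (mirrors hpd0 &= HOTPLUG_INTR). */
uint32_t hpd_bottom_half(uint32_t enabled_mask)
{
    uint32_t bits = atomic_exchange(&hpd0_pending, 0);
    return bits & enabled_mask;
}
```

In the patch itself, the handler takes the lock with plain spin_lock() because it already runs in hard-IRQ context. The work item uses spin_lock_irq() so that the handler cannot fire on the same CPU and spin on a lock that CPU already holds. Dropping the work_pending() check is harmless, because queue_work() already does nothing when the work item is still pending.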

2010-11-10 21:32:21

by Andrew Lutomirski

[permalink] [raw]
Subject: [PATCH 0/2] Fix nouveau-related freezes

[sorry for resend -- apparently git-send-email doesn't like mbox files]

Nouveau takes down my system quite reliably when any hotplug event occurs.
The bug happens because the IRQ handler didn't acknowledge the hotplug
state until the bottom half ran, so the card immediately raised a new
interrupt. The resulting interrupt storm monopolizes that CPU, so the
bottom half (which would acknowledge it) never runs.

Even with this fix, a lot of the IRQ code looks rather broken.

This is tested on 2.6.36 (and makes the system stable for me), but it also
applies cleanly to 2.6.37 (untested, but surely also necessary). Fedora 14's
2.6.35 kernels seem to have the same problem for me, so I suspect that 2.6.35
needs this fix as well. (All of my tests are on an NV50 card.)

Andy Lutomirski (2):
Use existing defines for NV50 hotplug registers
nouveau: Acknowledge HPD irq in handler, not bottom half

drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
drivers/gpu/drm/nouveau/nv50_display.c | 21 +++++++++++++++------
3 files changed, 21 insertions(+), 6 deletions(-)

--
1.7.3.2

2010-11-10 21:32:26

by Andrew Lutomirski

[permalink] [raw]
Subject: [PATCH 1/2] Use existing defines for NV50 hotplug registers

This doesn't change behavior at all (the defines have the same values as
the magic numbers), but it makes the code a lot easier to understand.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: <[email protected]>
---
drivers/gpu/drm/nouveau/nv50_display.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..83a7d27 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -453,8 +453,8 @@ static int nv50_display_disable(struct drm_device *dev)
nv_wr32(dev, NV50_PDISPLAY_INTR_EN, 0x00000000);

/* disable hotplug interrupts */
- nv_wr32(dev, 0xe054, 0xffffffff);
- nv_wr32(dev, 0xe050, 0x00000000);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, 0xffffffff);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_INTR, 0x00000000);
if (dev_priv->chipset >= 0x90) {
nv_wr32(dev, 0xe074, 0xffffffff);
nv_wr32(dev, 0xe070, 0x00000000);
@@ -1014,7 +1014,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
uint32_t unplug_mask, plug_mask, change_mask;
uint32_t hpd0, hpd1 = 0;

- hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
+ hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
if (dev_priv->chipset >= 0x90)
hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);

@@ -1058,7 +1058,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
}

- nv_wr32(dev, 0xe054, nv_rd32(dev, 0xe054));
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
if (dev_priv->chipset >= 0x90)
nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));

--
1.7.3.2

2010-11-10 21:47:38

by Andrew Lutomirski

[permalink] [raw]
Subject: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

The old code generated an interrupt storm bad enough to completely
take down my system.

This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
uses another register that isn't described, and I don't have that hardware
to test.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
drivers/gpu/drm/nouveau/nv50_display.c | 17 +++++++++++++----
3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b1be617..b6c62cc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -531,6 +531,11 @@ struct drm_nouveau_private {
struct work_struct irq_work;
struct work_struct hpd_work;

+ struct {
+ spinlock_t lock;
+ uint32_t hpd0_bits;
+ } hpd_state;
+
struct list_head vbl_waiting;

struct {
diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
index 794b0ee..b62a601 100644
--- a/drivers/gpu/drm/nouveau/nouveau_irq.c
+++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
@@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
if (dev_priv->card_type >= NV_50) {
INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
+ spin_lock_init(&dev_priv->hpd_state.lock);
INIT_LIST_HEAD(&dev_priv->vbl_waiting);
}
}
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 83a7d27..0df08e3 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
uint32_t unplug_mask, plug_mask, change_mask;
uint32_t hpd0, hpd1 = 0;

- hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
+ spin_lock_irq(&dev_priv->hpd_state.lock);
+ hpd0 = dev_priv->hpd_state.hpd0_bits;
+ dev_priv->hpd_state.hpd0_bits = 0;
+ spin_unlock_irq(&dev_priv->hpd_state.lock);
+
+ hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
if (dev_priv->chipset >= 0x90)
hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);

@@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
}

- nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
if (dev_priv->chipset >= 0x90)
nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));

@@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
uint32_t delayed = 0;

if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
- if (!work_pending(&dev_priv->hpd_work))
- queue_work(dev_priv->wq, &dev_priv->hpd_work);
+ uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
+ nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
+ spin_lock(&dev_priv->hpd_state.lock);
+ dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
+ spin_unlock(&dev_priv->hpd_state.lock);
+
+ queue_work(dev_priv->wq, &dev_priv->hpd_work);
}

while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
--
1.7.3.2

2010-11-10 22:11:03

by Ben Skeggs

[permalink] [raw]
Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
> The old code generated an interrupt storm bad enough to completely
> take down my system.
>
> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
> uses another register that isn't described, and I don't have that hardware
> to test.
Thanks for looking at this. I'll take a closer look at the problem
today and see what I can come up with that'll work with the newer
hardware too.

Ben.
>
> Signed-off-by: Andy Lutomirski <[email protected]>
> Cc: <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
> drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
> drivers/gpu/drm/nouveau/nv50_display.c | 17 +++++++++++++----
> 3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index b1be617..b6c62cc 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> @@ -531,6 +531,11 @@ struct drm_nouveau_private {
> struct work_struct irq_work;
> struct work_struct hpd_work;
>
> + struct {
> + spinlock_t lock;
> + uint32_t hpd0_bits;
> + } hpd_state;
> +
> struct list_head vbl_waiting;
>
> struct {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
> index 794b0ee..b62a601 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
> if (dev_priv->card_type >= NV_50) {
> INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
> INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
> + spin_lock_init(&dev_priv->hpd_state.lock);
> INIT_LIST_HEAD(&dev_priv->vbl_waiting);
> }
> }
> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
> index 83a7d27..0df08e3 100644
> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
> uint32_t unplug_mask, plug_mask, change_mask;
> uint32_t hpd0, hpd1 = 0;
>
> - hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
> + spin_lock_irq(&dev_priv->hpd_state.lock);
> + hpd0 = dev_priv->hpd_state.hpd0_bits;
> + dev_priv->hpd_state.hpd0_bits = 0;
> + spin_unlock_irq(&dev_priv->hpd_state.lock);
> +
> + hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
> if (dev_priv->chipset >= 0x90)
> hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
>
> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
> helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
> }
>
> - nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
> if (dev_priv->chipset >= 0x90)
> nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
>
> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
> uint32_t delayed = 0;
>
> if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
> - if (!work_pending(&dev_priv->hpd_work))
> - queue_work(dev_priv->wq, &dev_priv->hpd_work);
> + uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
> + nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
> + spin_lock(&dev_priv->hpd_state.lock);
> + dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
> + spin_unlock(&dev_priv->hpd_state.lock);
> +
> + queue_work(dev_priv->wq, &dev_priv->hpd_work);
> }
>
> while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {

2010-11-10 22:25:51

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
> On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
>> The old code generated an interrupt storm bad enough to completely
>> take down my system.
>>
>> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
>> uses another register that isn't described, and I don't have that hardware
>> to test.
> Thanks for looking at this. I'll take a closer look at the problem
> today and see what I can come up with that'll work with the newer
> hardware too.

It should be as simple as adding an hpd1 field to the hpd_state and
making exactly the same change. (It would be nice to put the register
definitions into nouveau_regs.h as well -- I didn't really want to
muck around with a bunch of magic numbers that I can't test.)

I tried writing 0xffffffff to the display IRQ control in the handler
to explicitly acknowledge the IRQ, but either I did it wrong or it had
no effect.

I imagine that this explains the unreproducible crashes I had on F13 as well.

--Andy

>
> Ben.
>>
>> Signed-off-by: Andy Lutomirski <[email protected]>
>> Cc: <[email protected]>
>> ---
>> ?drivers/gpu/drm/nouveau/nouveau_drv.h ?| ? ?5 +++++
>> ?drivers/gpu/drm/nouveau/nouveau_irq.c ?| ? ?1 +
>> ?drivers/gpu/drm/nouveau/nv50_display.c | ? 17 +++++++++++++----
>> ?3 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
>> index b1be617..b6c62cc 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
>> @@ -531,6 +531,11 @@ struct drm_nouveau_private {
>> ? ? ? struct work_struct irq_work;
>> ? ? ? struct work_struct hpd_work;
>>
>> + ? ? struct {
>> + ? ? ? ? ? ? spinlock_t lock;
>> + ? ? ? ? ? ? uint32_t hpd0_bits;
>> + ? ? } hpd_state;
>> +
>> ? ? ? struct list_head vbl_waiting;
>>
>> ? ? ? struct {
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
>> index 794b0ee..b62a601 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
>> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
>> ? ? ? if (dev_priv->card_type >= NV_50) {
>> ? ? ? ? ? ? ? INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
>> ? ? ? ? ? ? ? INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
>> + ? ? ? ? ? ? spin_lock_init(&dev_priv->hpd_state.lock);
>> ? ? ? ? ? ? ? INIT_LIST_HEAD(&dev_priv->vbl_waiting);
>> ? ? ? }
>> ?}
>> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
>> index 83a7d27..0df08e3 100644
>> --- a/drivers/gpu/drm/nouveau/nv50_display.c
>> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
>> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>> ? ? ? uint32_t unplug_mask, plug_mask, change_mask;
>> ? ? ? uint32_t hpd0, hpd1 = 0;
>>
>> - ? ? hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
>> + ? ? spin_lock_irq(&dev_priv->hpd_state.lock);
>> + ? ? hpd0 = dev_priv->hpd_state.hpd0_bits;
>> + ? ? dev_priv->hpd_state.hpd0_bits = 0;
>> + ? ? spin_unlock_irq(&dev_priv->hpd_state.lock);
>> +
>> + ? ? hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
>> ? ? ? if (dev_priv->chipset >= 0x90)
>> ? ? ? ? ? ? ? hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
>>
>> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>> ? ? ? ? ? ? ? ? ? ? ? helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
>> ? ? ? }
>>
>> - ? ? nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
>> ? ? ? if (dev_priv->chipset >= 0x90)
>> ? ? ? ? ? ? ? nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
>>
>> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
>> ? ? ? uint32_t delayed = 0;
>>
>> ? ? ? if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
>> - ? ? ? ? ? ? if (!work_pending(&dev_priv->hpd_work))
>> - ? ? ? ? ? ? ? ? ? ? queue_work(dev_priv->wq, &dev_priv->hpd_work);
>> + ? ? ? ? ? ? uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
>> + ? ? ? ? ? ? nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
>> + ? ? ? ? ? ? spin_lock(&dev_priv->hpd_state.lock);
>> + ? ? ? ? ? ? dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
>> + ? ? ? ? ? ? spin_unlock(&dev_priv->hpd_state.lock);
>> +
>> + ? ? ? ? ? ? queue_work(dev_priv->wq, &dev_priv->hpd_work);
>> ? ? ? }
>>
>> ? ? ? while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
>
>
>

2010-11-10 22:35:50

by Ben Skeggs

[permalink] [raw]
Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
> >> The old code generated an interrupt storm bad enough to completely
> >> take down my system.
> >>
> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
> >> uses another register that isn't described, and I don't have that hardware
> >> to test.
> > Thanks for looking at this. I'll take a closer look at the problem
> > today and see what I can come up with that'll work with the newer
> > hardware too.
>
> It should be as simple as adding an hpd1 field to the hpd_state and
> making exactly the same change. (It would be nice to put the register
> definitions into nouveau_regs.h as well -- I didn't really want to
> muck around with a bunch of magic numbers that I can't test.)
Yes, it is. I can confirm the problem on another card, but it doesn't
actually cause any crashes here. If you can rework the patch to support
the newer chips too, that'd be great.

As for magic numbers, the register names for those regs are wrong
anyway. The joy of reverse-engineering the support. It doesn't really
matter if you want to stick to them or go back to "magic" numbers.

Ben.

>
> I tried writing 0xffffffff to the display IRQ control in the handler
> to explicitly acknowledge the IRQ, but either I did it wrong or it had
> no effect.
>
> I imagine that this explains the unreproducible crashes I had on F13 as well.
>
> --Andy
>
> >
> > Ben.
> >>
> >> Signed-off-by: Andy Lutomirski <[email protected]>
> >> Cc: <[email protected]>
> >> ---
> >> drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++
> >> drivers/gpu/drm/nouveau/nouveau_irq.c | 1 +
> >> drivers/gpu/drm/nouveau/nv50_display.c | 17 +++++++++++++----
> >> 3 files changed, 19 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
> >> index b1be617..b6c62cc 100644
> >> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> >> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> >> @@ -531,6 +531,11 @@ struct drm_nouveau_private {
> >> struct work_struct irq_work;
> >> struct work_struct hpd_work;
> >>
> >> + struct {
> >> + spinlock_t lock;
> >> + uint32_t hpd0_bits;
> >> + } hpd_state;
> >> +
> >> struct list_head vbl_waiting;
> >>
> >> struct {
> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
> >> index 794b0ee..b62a601 100644
> >> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c
> >> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
> >> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
> >> if (dev_priv->card_type >= NV_50) {
> >> INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
> >> INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
> >> + spin_lock_init(&dev_priv->hpd_state.lock);
> >> INIT_LIST_HEAD(&dev_priv->vbl_waiting);
> >> }
> >> }
> >> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
> >> index 83a7d27..0df08e3 100644
> >> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> >> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> >> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
> >> uint32_t unplug_mask, plug_mask, change_mask;
> >> uint32_t hpd0, hpd1 = 0;
> >>
> >> - hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
> >> + spin_lock_irq(&dev_priv->hpd_state.lock);
> >> + hpd0 = dev_priv->hpd_state.hpd0_bits;
> >> + dev_priv->hpd_state.hpd0_bits = 0;
> >> + spin_unlock_irq(&dev_priv->hpd_state.lock);
> >> +
> >> + hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
> >> if (dev_priv->chipset >= 0x90)
> >> hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
> >>
> >> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
> >> helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
> >> }
> >>
> >> - nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
> >> if (dev_priv->chipset >= 0x90)
> >> nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
> >>
> >> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
> >> uint32_t delayed = 0;
> >>
> >> if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
> >> - if (!work_pending(&dev_priv->hpd_work))
> >> - queue_work(dev_priv->wq, &dev_priv->hpd_work);
> >> + uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
> >> + nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
> >> + spin_lock(&dev_priv->hpd_state.lock);
> >> + dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
> >> + spin_unlock(&dev_priv->hpd_state.lock);
> >> +
> >> + queue_work(dev_priv->wq, &dev_priv->hpd_work);
> >> }
> >>
> >> while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
> >
> >
> >
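The difference the patch above makes (acknowledging the hotplug status in nv50_display_irq_handler instead of in the bottom half) can be illustrated with a toy userspace model. This is not nouveau code. It assumes, for illustration only, that the status register is write-1-to-clear and that the interrupt line stays asserted while any status bit is set. Under those assumptions a handler that only queues work never clears the source, so the CPU keeps re-entering it and never gets back to process context where the bottom half would run. The names sim_dev, run_irq and the two handler functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy device (hypothetical): one hotplug status register, assumed to be
 * write-1-to-clear, and an IRQ line that stays asserted while any status
 * bit is set (level-triggered). */
struct sim_dev {
	uint32_t hpd_status;  /* pending hotplug bits */
	bool work_queued;     /* stand-in for work_pending(&hpd_work) */
};

static bool irq_asserted(const struct sim_dev *d)
{
	return d->hpd_status != 0;
}

/* Old scheme: only queue the bottom half; the ack happens there. */
static void handler_deferred_ack(struct sim_dev *d)
{
	if (!d->work_queued)
		d->work_queued = true;
}

/* Patched scheme: snapshot the status, ack it, save the bits for later. */
static void handler_immediate_ack(struct sim_dev *d, uint32_t *pending)
{
	uint32_t bits = d->hpd_status;

	d->hpd_status &= ~bits;  /* models writing the bits back (W1C) */
	*pending |= bits;
	d->work_queued = true;
}

/* Count handler entries before the CPU could return to process context,
 * where the bottom half runs. 'limit' caps what would otherwise be an
 * endless loop. */
static unsigned run_irq(struct sim_dev *d, bool immediate_ack,
			uint32_t *pending, unsigned limit)
{
	unsigned entries = 0;

	assert(limit > 0);
	while (irq_asserted(d) && entries < limit) {
		if (immediate_ack)
			handler_immediate_ack(d, pending);
		else
			handler_deferred_ack(d);
		entries++;
	}
	return entries;
}
```

Starting from a status of 0x1 with limit = 1000, the deferred-ack run returns 1000 and leaves the status bit set, while the immediate-ack run returns 1 and moves the bit into *pending for the bottom half to consume.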

2010-11-10 22:51:59

by Andrew Lutomirski

Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <[email protected]> wrote:
> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
>> >> The old code generated an interrupt storm bad enough to completely
>> >> take down my system.
>> >>
>> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
>> >> uses another register that isn't described, and I don't have that hardware
>> >> to test.
>> > Thanks for looking at this. I'll take a closer look at the problem
>> > today and see what I can come up with too, that'll work with the newer
>> > hardware too.
>>
>> It should be as simple as adding an hpd1 field to the hpd_state and
>> making exactly the same change. (It would be nice to put the register
>> definitions into nouveau_regs.h as well -- I didn't really want to
>> muck around with a bunch of magic numbers that I can't test.)
> Yes, it is. I can confirm the problem on another card, but it doesn't
> actually cause any crashes here. If you can rework the patch to support
> the newer chips too, that'd be great.
>
> As for magic numbers, the register names for those regs are wrong
> anyway. The joy of reverse-engineering the support. It doesn't really
> matter if you want to stick to them or go back to "magic" numbers.

That explains why INTR and CTRL seemed backwards :) I'll leave the
magic numbers for the 0xe07? stuff.

Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back.

Patch to follow after I boot and test it here.

--Andy
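A minimal userspace sketch of the rework discussed above, assuming write-1-to-clear status registers: hpd_state gains an hpd1 field, the top half acknowledges both banks right away, and the bottom half takes the accumulated bits and applies the enable mask, mirroring the patch's "hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR)". Applying the mask in the bottom half is only one possible reading of the "& enabled_bits" fix. C11 atomics replace the kernel spinlock so the sketch runs outside the kernel; hpd_bank, hpd_irq and hpd_bh are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated register pair for one hotplug bank. Status is
 * assumed to be write-1-to-clear; bank 1 exists only on chipset >= 0x90. */
struct hpd_bank {
	uint32_t enable;  /* interrupt-enable bits */
	uint32_t status;  /* pending hotplug bits */
};

/* hpd_state with the added hpd1 field. C11 atomics stand in for the
 * spinlock used by the kernel patch. */
struct hpd_state {
	_Atomic uint32_t hpd0_bits;
	_Atomic uint32_t hpd1_bits;
};

/* Read the status and write the same bits back to acknowledge them. */
static uint32_t hpd_read_and_ack(struct hpd_bank *bank)
{
	uint32_t bits = bank->status;

	bank->status &= ~bits;  /* models nv_wr32(status_reg, bits) under W1C */
	return bits;
}

/* Top half: acknowledge both banks immediately and accumulate the bits. */
static void hpd_irq(struct hpd_bank banks[2], bool has_bank1,
		    struct hpd_state *st)
{
	atomic_fetch_or(&st->hpd0_bits, hpd_read_and_ack(&banks[0]));
	if (has_bank1)
		atomic_fetch_or(&st->hpd1_bits, hpd_read_and_ack(&banks[1]));
	/* the real handler would queue_work() here */
}

/* Bottom half: take and clear the accumulated bits, then keep only those
 * whose interrupts are enabled. */
static void hpd_bh(const struct hpd_bank banks[2], bool has_bank1,
		   struct hpd_state *st, uint32_t *hpd0, uint32_t *hpd1)
{
	assert(hpd0 && hpd1);
	*hpd0 = atomic_exchange(&st->hpd0_bits, 0) & banks[0].enable;
	*hpd1 = 0;
	if (has_bank1)
		*hpd1 = atomic_exchange(&st->hpd1_bits, 0) & banks[1].enable;
}
```

In the kernel, the fetch-or corresponds to the spin_lock()/spin_unlock() section in nv50_display_irq_handler and the exchange to the spin_lock_irq() section in nv50_display_irq_hotplug_bh, as in the patch. Because the handler ORs into the accumulator, bits from several interrupts that arrive before the work item runs are all preserved.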

2010-11-10 22:55:32

by Maarten Maathuis

Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <[email protected]> wrote:
>> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
>>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
>>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
>>> >> The old code generated an interrupt storm bad enough to completely
>>> >> take down my system.
>>> >>
>>> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
>>> >> uses another register that isn't described, and I don't have that hardware
>>> >> to test.
>>> > Thanks for looking at this. I'll take a closer look at the problem
>>> > today and see what I can come up with too, that'll work with the newer
>>> > hardware too.
>>>
>>> It should be as simple as adding an hpd1 field to the hpd_state and
>>> making exactly the same change. (It would be nice to put the register
>>> definitions into nouveau_regs.h as well -- I didn't really want to
>>> muck around with a bunch of magic numbers that I can't test.)
>> Yes, it is. I can confirm the problem on another card, but it doesn't
>> actually cause any crashes here. If you can rework the patch to support
>> the newer chips too, that'd be great.
>>
>> As for magic numbers, the register names for those regs are wrong
>> anyway. The joy of reverse-engineering the support. It doesn't really
>> matter if you want to stick to them or go back to "magic" numbers.
>
> That explains why INTR and CTRL seemed backwards :) I'll leave the
> magic numbers for the 0xe07? stuff.

Perhaps remove the bad definitions from the reg file, or rename them
to UNKsomething?

>
> Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back.
>
> Patch to follow after I boot and test it here.
>
> --Andy
> _______________________________________________
> dri-devel mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>



--
Far away from the primal instinct, the song seems to fade away, the
river get wider between your thoughts and the things we do and say.

2010-11-10 22:59:21

by Ben Skeggs

Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, 2010-11-10 at 17:51 -0500, Andrew Lutomirski wrote:
> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <[email protected]> wrote:
> > On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
> >> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
> >> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
> >> >> The old code generated an interrupt storm bad enough to completely
> >> >> take down my system.
> >> >>
> >> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
> >> >> uses another register that isn't described, and I don't have that hardware
> >> >> to test.
> >> > Thanks for looking at this. I'll take a closer look at the problem
> >> > today and see what I can come up with too, that'll work with the newer
> >> > hardware too.
> >>
> >> It should be as simple as adding an hpd1 field to the hpd_state and
> >> making exactly the same change. (It would be nice to put the register
> >> definitions into nouveau_regs.h as well -- I didn't really want to
> >> muck around with a bunch of magic numbers that I can't test.)
> > Yes, it is. I can confirm the problem on another card, but it doesn't
> > actually cause any crashes here. If you can rework the patch to support
> > the newer chips too, that'd be great.
> >
> > As for magic numbers, the register names for those regs are wrong
> > anyway. The joy of reverse-engineering the support. It doesn't really
> > matter if you want to stick to them or go back to "magic" numbers.
>
> That explains why INTR and CTRL seemed backwards :) I'll leave the
> magic numbers for the 0xe07? stuff.
That sounds good; it'll all get a cleanup at some point and be switched to
"proper" names (well, our best guess; you'd have to ask NVIDIA about the
real ones).

Ben.
>
> Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back.
>
> Patch to follow after I boot and test it here.
>
> --Andy

2010-11-10 23:01:59

by Andrew Lutomirski

Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, Nov 10, 2010 at 5:55 PM, Maarten Maathuis <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <[email protected]> wrote:
>> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <[email protected]> wrote:
>>> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
>>>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
>>>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
>>>> >> The old code generated an interrupt storm bad enough to completely
>>>> >> take down my system.
>>>> >>
>>>> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
>>>> >> uses another register that isn't described, and I don't have that hardware
>>>> >> to test.
>>>> > Thanks for looking at this. I'll take a closer look at the problem
>>>> > today and see what I can come up with too, that'll work with the newer
>>>> > hardware too.
>>>>
>>>> It should be as simple as adding an hpd1 field to the hpd_state and
>>>> making exactly the same change. (It would be nice to put the register
>>>> definitions into nouveau_regs.h as well -- I didn't really want to
>>>> muck around with a bunch of magic numbers that I can't test.)
>>> Yes, it is. I can confirm the problem on another card, but it doesn't
>>> actually cause any crashes here. If you can rework the patch to support
>>> the newer chips too, that'd be great.
>>>
>>> As for magic numbers, the register names for those regs are wrong
>>> anyway. The joy of reverse-engineering the support. It doesn't really
>>> matter if you want to stick to them or go back to "magic" numbers.
>>
>> That explains why INTR and CTRL seemed backwards :) I'll leave the
>> magic numbers for the 0xe07? stuff.
>
> Perhaps remove the bad definitions from the reg file, or rename them
> to UNKsomething?

Well, they're known. One is hotplug detect enable (unless the code is
wrong) and the other is hotplug interrupt status.



--Andy

2010-11-10 23:12:35

by Ben Skeggs

Subject: Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

On Wed, 2010-11-10 at 18:01 -0500, Andrew Lutomirski wrote:
> On Wed, Nov 10, 2010 at 5:55 PM, Maarten Maathuis <[email protected]> wrote:
> > On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <[email protected]> wrote:
> >> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <[email protected]> wrote:
> >>> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote:
> >>>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <[email protected]> wrote:
> >>>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
> >>>> >> The old code generated an interrupt storm bad enough to completely
> >>>> >> take down my system.
> >>>> >>
> >>>> >> This only fixes the bits that are defined in nouveau_regs.h. Newer hardware
> >>>> >> uses another register that isn't described, and I don't have that hardware
> >>>> >> to test.
> >>>> > Thanks for looking at this. I'll take a closer look at the problem
> >>>> > today and see what I can come up with too, that'll work with the newer
> >>>> > hardware too.
> >>>>
> >>>> It should be as simple as adding an hpd1 field to the hpd_state and
> >>>> making exactly the same change. (It would be nice to put the register
> >>>> definitions into nouveau_regs.h as well -- I didn't really want to
> >>>> muck around with a bunch of magic numbers that I can't test.)
> >>> Yes, it is. I can confirm the problem on another card, but it doesn't
> >>> actually cause any crashes here. If you can rework the patch to support
> >>> the newer chips too, that'd be great.
> >>>
> >>> As for magic numbers, the register names for those regs are wrong
> >>> anyway. The joy of reverse-engineering the support. It doesn't really
> >>> matter if you want to stick to them or go back to "magic" numbers.
> >>
> >> That explains why INTR and CTRL seemed backwards :) I'll leave the
> >> magic numbers for the 0xe07? stuff.
> >
> > Perhaps remove the bad definitions from the reg file, or rename them
> > to UNKsomething?
>
> Well, they're known. One is hotplug detect enable (unless the code is
> wrong) and the other is hotplug interrupt status.
That's also not correct; if anything, the most accurate names so far
would probably be:

#define NV_PGPIO_INTR_EN_0 0xe050
#define NV_PGPIO_INTR_0 0xe054
#define NV_PGPIO_INTR_EN_1 0xe070
#define NV_PGPIO_INTR_1 0xe074

PGPIO is a guess, and there's other stuff in that range too, but it's
definitely *not* PCONNECTOR.
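For illustration, Ben's proposed names drop straight into the patch's hpd1 computation. The sketch below uses a hypothetical array in place of MMIO: rd32/wr32 are stand-ins for nv_rd32/nv_wr32, and wr32 is a plain store that does not model acknowledge semantics. pgpio_pending(NV_PGPIO_INTR_1, NV_PGPIO_INTR_EN_1) is then the named form of "nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070)".

```c
#include <assert.h>
#include <stdint.h>

/* Ben's best-guess names from this thread; "PGPIO" itself is a guess. */
#define NV_PGPIO_INTR_EN_0 0xe050
#define NV_PGPIO_INTR_0    0xe054
#define NV_PGPIO_INTR_EN_1 0xe070
#define NV_PGPIO_INTR_1    0xe074

/* Hypothetical stand-in for the card's MMIO space, indexed by byte offset. */
static uint32_t mmio[0x10000 / 4];

static uint32_t rd32(uint32_t reg)
{
	assert(reg / 4 < sizeof(mmio) / sizeof(mmio[0]));
	return mmio[reg / 4];
}

/* Plain store; does not model write-1-to-clear or any other side effect. */
static void wr32(uint32_t reg, uint32_t val)
{
	assert(reg / 4 < sizeof(mmio) / sizeof(mmio[0]));
	mmio[reg / 4] = val;
}

/* Pending bits that are also enabled for one bank. With bank 1 this is
 * the named form of the patch's
 * "hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070)". */
static uint32_t pgpio_pending(uint32_t intr_reg, uint32_t en_reg)
{
	return rd32(intr_reg) & rd32(en_reg);
}
```

With an enable mask of 0x0f and a status of 0x31 in bank 1, pgpio_pending(NV_PGPIO_INTR_1, NV_PGPIO_INTR_EN_1) yields 0x01, while the untouched bank 0 yields 0.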

Anyway, this doesn't matter. Any change in names can happen in
nouveau git and make its way to Linus from there; the fix for nouveau
git is already going to be different enough from what'll apply to
Linus' tree right now. My opinion is: let's just fix the bug in
mainline (without register naming) and fix the naming etc. in nouveau
git.

Ben.
>
>
>
> --Andy