2008-10-16 07:00:17

by Dave Airlie

Subject: [git pull] agp patches for 2.6.28-rc1.


Hi Linus,

Please pull the 'agp-next' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6.git agp-next

This just contains some suspend/resume fixes for the SiS, Intel memory
sizing fix for new hw, and support for more memory types on the nvidia
AGP.

Dave.

drivers/char/agp/amd-k7-agp.c | 38 ++++++++++++++++++++++++++++++++------
drivers/char/agp/intel-agp.c | 18 ++++++++++--------
drivers/char/agp/nvidia-agp.c | 22 ++++++++++++++++++----
3 files changed, 60 insertions(+), 18 deletions(-)

commit a64d2b37c2259e169759c1701ac565f0a11dc0ea
Author: Thomas Hellstrom <thomas-at-tungstengraphics-dot-com>
Date: Wed Sep 10 14:13:33 2008 +0200

agp/nvidia: Support agp user-memory on nvidia agp.

This adds user memory support required for TTM to the nvidia AGP driver.

Signed-off-by: Dave Airlie <[email protected]>

commit 2a32c3c894bcd3b3f8cc7e23f5ecbebca4a9f8e8
Author: Stuart Bennett <[email protected]>
Date: Tue Aug 12 15:19:18 2008 +0100

agp/amd-k7: Suspend support for AMD K7 GART driver

Reinitialize bridge registers after suspend, but avoid repeating the ioremap

Tested and works on AMD761

Signed-off-by: Stuart Bennett <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

commit 44d494417278e49f5b42bd3ded1801b6d2254db8
Author: Keith Packard <[email protected]>
Date: Tue Oct 14 17:18:45 2008 -0700

agp/intel: Reduce extraneous PCI posting reads during init

Instead of doing a posting read after each GTT entry update, do a single one
at the end of the writes. This should reduce boot time a tiny amount by
avoiding a lot of extra uncached reads.
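A minimal user-space sketch of the batching idea follows. It assumes an ordinary `volatile` array stands in for the ioremapped GTT; the kernel driver works through MMIO accessors such as `writel()`/`readl()` instead. `fake_gtt`, `gtt_write()`, `gtt_read()` and the two fill functions are hypothetical names. The sketch only counts how many uncached reads each strategy issues:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ioremapped GTT. volatile keeps the
 * compiler from dropping or merging the stores and loads. */
#define GTT_ENTRIES 16
static volatile uint32_t fake_gtt[GTT_ENTRIES];
static unsigned posting_reads;      /* uncached reads issued so far */

static void gtt_write(size_t idx, uint32_t val)
{
    fake_gtt[idx] = val;            /* models a posted MMIO write */
}

static uint32_t gtt_read(size_t idx)
{
    posting_reads++;                /* each read is a slow round trip */
    return fake_gtt[idx];
}

/* Before: a posting read after every single GTT entry. */
static void fill_gtt_per_entry(uint32_t base, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        gtt_write(i, base + ((uint32_t)i << 12));
        (void)gtt_read(i);
    }
}

/* After: write all entries, then one posting read of the last one.
 * Assumes the bus keeps posted writes in order, so reading the last
 * entry back also flushes every earlier write. */
static void fill_gtt_batched(uint32_t base, size_t n)
{
    for (size_t i = 0; i < n; i++)
        gtt_write(i, base + ((uint32_t)i << 12));
    if (n > 0)
        (void)gtt_read(n - 1);
}
```

Filling 16 entries costs 16 uncached reads with the per-entry version and a single read with the batched one. The GTT contents end up the same either way.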

Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

commit 82e14a6215cbc9804ecc35281e973c6c8ce22fe7
Author: Eric Anholt <[email protected]>
Date: Tue Oct 14 11:28:58 2008 -0700

agp: Fix stolen memory counting on G4X.

On the GM45, the amount of stolen memory mapped to the GTT was underestimated:
we actually had 508KB more available, since the GTT doesn't take from
stolen memory. On the non-GM45 G4X, we overestimated how much stolen was
mapped to the GTT by 4KB, resulting in GPU page faults when that page was
accessed.
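The counting rule can be sketched as follows. This is a hypothetical helper modeled on the `range` logic in the xf86-video-intel diff quoted later in this thread, not the kernel's actual code. The 32 MB stolen size and 512 KB GTT size used below are assumed example values.

```c
/* Hypothetical helper: number of 4 KB GTT entries that map graphics
 * stolen memory.
 *   stolen_kb    - stolen memory size in KB
 *   gtt_kb       - GTT size in KB
 *   separate_gtt - nonzero for G4X-class parts, whose GTT is not carved
 *                  out of graphics stolen memory
 * Pre-G4X: the GTT plus 4 KB is subtracted from stolen memory.
 * G4X: only the 4 KB that is never mapped to the GTT is subtracted. */
static unsigned long stolen_gtt_entries(unsigned long stolen_kb,
                                        unsigned long gtt_kb,
                                        int separate_gtt)
{
    unsigned long reserved_kb = separate_gtt ? 4 : gtt_kb + 4;

    if (stolen_kb <= reserved_kb)
        return 0;
    return (stolen_kb - reserved_kb) / 4;   /* one entry per 4 KB page */
}
```

With 32768 KB of stolen memory and a 512 KB GTT, `stolen_gtt_entries(32768, 512, 0)` gives 8063 entries and `stolen_gtt_entries(32768, 512, 1)` gives 8191. That is a difference of 128 entries (512 KB). If the kernel and the X driver pick different branches, they disagree about where stolen memory ends.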

This update requires a corresponding update to xf86-video-intel to work
correctly.

Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>


2008-10-19 23:11:31

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi Dave,

On Thursday 16 October 2008 08:59:57 Dave Airlie wrote:
> Please pull the 'agp-next' branch from
> ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6.git
> agp-next
>
> This just contains some suspend/resume fixes for the SiS, Intel memory
> sizing fix for new hw, and support for more memory types on the nvidia
> AGP.
One of those patches breaks X on my laptop with:

(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x01f7f000 (pgoffset 8063)
(WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
at offset 0x1f7f000 failed (Invalid argument)

Fatal server error:
Couldn't bind memory for exa offscreen
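The byte offset in the log converts to the reported page offset by dividing by the 4 KB GTT page size. Below is a small sketch of that arithmetic; the helper names are hypothetical.

```c
/* GTT pages are 4 KB. */
#define GTT_PAGE_SIZE 4096UL

/* Byte offset within the aperture -> GTT page offset. */
static unsigned long offset_to_pgoffset(unsigned long offset)
{
    return offset / GTT_PAGE_SIZE;
}

/* Byte offset -> KB, for comparison against memory sizes. */
static unsigned long offset_to_kb(unsigned long offset)
{
    return offset / 1024;
}
```

`offset_to_pgoffset(0x01f7f000)` is 8063, matching the log, and `offset_to_kb(0x01f7f000)` is 32252, i.e. 32 MB minus 516 KB. Assume the laptop has 32 MB of stolen memory and a 512 KB GTT; these values are assumptions and are not confirmed in the thread. Under that assumption, page 8063 is where a driver that still subtracts the GTT plus 4 KB would place its first free page. A kernel counting the G4X way would instead treat pages below 8191 as stolen, which would plausibly explain the "Invalid argument".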

(Found via bisect with some guessing)

If necessary I can bisect further, but I guess you know what the problem is.

Andres


Xorg is: 1.5.2-2
Xorg Intel: 2.4.1-1ubuntu9




Attachments:
(No filename) (824.00 B)
lspci-vv (16.02 kB)
Xorg.0.log (17.17 kB)
bisect-config (65.59 kB)
dmesg (59.66 kB)

2008-10-19 23:25:33

by Dave Airlie

Subject: Re: [git pull] agp patches for 2.6.28-rc1.


>
> On Thursday 16 October 2008 08:59:57 Dave Airlie wrote:
> > Please pull the 'agp-next' branch from
> > ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6.git
> > agp-next
> >
> > This just contains some suspend/resume fixes for the SiS, Intel memory
> > sizing fix for new hw, and support for more memory types on the nvidia
> > AGP.
> One of those patches breaks X on my laptop with:
>
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x01f7f000 (pgoffset 8063)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> at offset 0x1f7f000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for exa offscreen
>
> (Found via bisect with some guessing)
>
> If necessary I can bisect further, but I guess you know what the problem is.
>
> Andres
>
>
> Xorg is: 1.5.2-2
> Xorg Intel: 2.4.1-1ubuntu9
>

What type of laptop is it, and what GPU does it have?
If distros are carrying GM45 support, we might be in trouble.

Dave.

2008-10-19 23:44:55

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi Dave,

On Monday 20 October 2008 01:25:22 Dave Airlie wrote:
> What type of laptop is it, and what GPU does it have?
> If distros are carrying GM45 support, we might be in trouble.
It's a ThinkPad T500 with hybrid ATI/Intel graphics, but the ATI card is disabled
in the BIOS (the Intel card is not recognized properly if it's enabled; I haven't
started trying to track this down).
Vendor classifies the intel card as GMA 4500 MHD, and X says:
(II) intel(0): Integrated Graphics Chipset: Intel(R) Mobile Intel® GM45 Express
(--) intel(0): Chipset: "Mobile Intel® GM45 Express Chipset"

Andres

2008-10-20 00:20:06

by Keith Packard

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

On Mon, 2008-10-20 at 01:44 +0200, Andres Freund wrote:
> Hi Dave,
>
> On Monday 20 October 2008 01:25:22 Dave Airlie wrote:
> > What type of laptop is it, and what GPU does it have?
> > If distros are carrying GM45 support, we might be in trouble.

The GM45/G45 support in both kernel and user space was horribly broken,
causing the kernel to either overwrite stolen entries with freshly
allocated pages, and potentially corrupt the system, or leave some GTT
entries uninitialized and cause the hardware to lock up.

You appear to have a happy situation where this bug isn't obviously
breaking things. We couldn't find any machines where even simple 2D
graphics was stable for very long.

We applied a patch to both kernel and 2D driver (as both end up
computing the size of the GTT stolen area for historical reasons) to fix
this mistake. Updating your 2D driver to 2.4.98 should get you the other
half of the fix.

> It's a ThinkPad T500 with hybrid ATI/Intel graphics, but the ATI card is disabled
> in the BIOS (the Intel card is not recognized properly if it's enabled; I haven't
> started trying to track this down).

Yeah, I'm getting information about how 'hybrid' graphics hardware
works.

--
[email protected]


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-10-20 00:34:39

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi,

On Monday 20 October 2008 02:19:37 you wrote:
> The GM45/G45 support in both kernel and user space was horribly broken,
> causing the kernel to either overwrite stolen entries with freshly
> allocated pages, and potentially corrupt the system, or leave some GTT
> entries uninitialized and cause the hardware to lock up.
I had some crashes with 3D, but none in 2D as far as I know.

> You appear to have a happy situation where this bug isn't obviously
> breaking things. We couldn't find any machines where even simple 2D
> graphics was stable for very long.
Interesting. I had the system running for some days without problems related
to graphics.

> We applied a patch to both kernel and 2D driver (as both end up
> computing the size of the GTT stolen area for historical reasons) to fix
> this mistake. Updating your 2D driver to 2.4.98 should get you the other
> half of the fix.
OK, will try, and will also report this to Ubuntu and maybe other distributions...

> Yeah, I'm getting information about how 'hybrid' graphics hardware
> works.
If you need some information/testing/whatever...

Andres

2008-10-20 01:00:24

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

On Monday 20 October 2008 02:19:37 Keith Packard wrote:
> On Mon, 2008-10-20 at 01:44 +0200, Andres Freund wrote:
> > Hi Dave,
> >
> > On Monday 20 October 2008 01:25:22 Dave Airlie wrote:
> > > What type of laptop is it, and what GPU does it have?
> > > If distros are carrying GM45 support, we might be in trouble.
>
> The GM45/G45 support in both kernel and user space was horribly broken,
> causing the kernel to either overwrite stolen entries with freshly
> allocated pages, and potentially corrupt the system, or leave some GTT
> entries uninitialized and cause the hardware to lock up.
>
> You appear to have a happy situation where this bug isn't obviously
> breaking things. We couldn't find any machines where even simple 2D
> graphics was stable for very long.
>
> We applied a patch to both kernel and 2D driver (as both end up
> computing the size of the GTT stolen area for historical reasons) to fix
> this mistake. Updating your 2D driver to 2.4.98 should get you the other
> half of the fix.
Hm. But still, there is at least one distribution (Ubuntu Intrepid) which will
probably ship 2.4.1 in its stable version soon (it seems unlikely that
they will update to an unstable version just before a release).
That means this driver will be quite widespread...
Is it acceptable for the kernel ABI to break that radically and that quickly?

Andres

2008-10-20 01:04:26

by Dave Airlie

Subject: Re: [git pull] agp patches for 2.6.28-rc1.


> Hm. But still, there is at least one distribution (Ubuntu Intrepid) which will
> probably ship 2.4.1 in its stable version soon (it seems unlikely that
> they will update to an unstable version just before a release).
> That means this driver will be quite widespread...

Intel will give the fixes to Ubuntu to roll into 2.4.x hopefully.

> Is it acceptable for the kernel ABI to break that radically and that quickly?
>

The problem is this isn't a kernel ABI at all. This is two pieces of code
which are doing the exact same thing to a piece of hardware, one from the
kernel and one from userspace. When they disagree, things break; however,
sometimes even when they agree, things are broken. The solution is proper
kernel graphics drivers; however, that future is further away.

Dave.

> Andres
>
>

2008-10-20 01:18:47

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi,

On Monday 20 October 2008 03:04:16 Dave Airlie wrote:
> > Hm. But still, there is at least one distribution (Ubuntu Intrepid) which
> > will probably ship 2.4.1 in its stable version soon (it seems
> > unlikely that they will update to an unstable version just before a
> > release). That means this driver will be quite widespread...
> Intel will give the fixes to Ubuntu to roll into 2.4.x hopefully.
I opened a bug report on their bug tracker to make them aware of the issue:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/286182

> > Is it acceptable for the kernel ABI to break that radically and that quickly?
> The problem is this isn't a kernel ABI at all. This is two pieces of code
> which are doing the exact same thing to a piece of hardware, one from the
> kernel and one from userspace. When they disagree, things break; however,
> sometimes even when they agree, things are broken.
Hm, I understand what you are saying, but I don't see a fundamental difference
from an ABI here. But it doesn't matter anyway...


Andres

2008-10-20 01:36:06

by Keith Packard

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

On Mon, 2008-10-20 at 03:00 +0200, Andres Freund wrote:

> Hm. But still, there is at least one distribution (Ubuntu Intrepid) which will
> probably ship 2.4.1 in its stable version soon (it seems unlikely that
> they will update to an unstable version just before a release).

We can backport the fix (it's tiny) to the 2.4 2D driver.

> That means this driver will be quite widespread...
> Is it acceptable for the kernel ABI to break that radically and that quickly?

We tested a pile of hardware and didn't find any GM45s that worked, so
we assumed they were all broken and that fixing the bug wouldn't cause
any working configurations to stop working.

We can hack up the kernel so the old X server just gets a WARN_ON
instead of breaking. This is a bit worrying though; the "fix" would let
user space continue to mis-program the hardware.
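The trade-off described above can be sketched as a strict versus lenient check. The sketch assumes that the kernel rejects binds starting inside the range it counts as stolen, as the "Invalid argument" in the X log suggests. The function name, the `lenient` flag and the warning text are all hypothetical. A real `WARN_ON` is not modeled, only the policy choice:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical bind check.
 *   pg_start:       first GTT page user space asks to bind
 *   stolen_entries: GTT entries the kernel believes map stolen memory
 *   lenient:        0 = refuse overlapping binds (assumed model of the
 *                   fixed kernel), 1 = only warn, WARN_ON-style */
static int bind_pages(unsigned long pg_start, unsigned long stolen_entries,
                      int lenient)
{
    if (pg_start < stolen_entries) {
        if (!lenient)
            return -EINVAL;        /* X reports "Invalid argument" */
        fprintf(stderr,
                "warning: bind at page %lu overlaps %lu stolen entries\n",
                pg_start, stolen_entries);
        /* falls through: the overlapping stolen entries get overwritten */
    }
    return 0;                      /* a real driver would now write the GTT */
}
```

The lenient branch is exactly the concern raised here. It returns success, so X starts, but the overlapping stolen entries are still overwritten and the hardware stays mis-programmed.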

My concern here is that a common failure mode with this bug was to lock
up the graphics hardware and require a reboot. Having the X server fail
to start and leave the system in text mode where new packages can be
installed seems like a better mode than making the system hang during
boot.

--
[email protected]



2008-10-20 01:59:08

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi,

On Monday 20 October 2008 03:35:21 Keith Packard wrote:
> On Mon, 2008-10-20 at 03:00 +0200, Andres Freund wrote:
> > Hm. But still, there is at least one distribution (Ubuntu Intrepid) which
> > will probably ship 2.4.1 in its stable version soon (it seems
> > unlikely that they will update to an unstable version just before a
> > release).
> We can backport the fix (it's tiny) to the 2.4 2D driver.
It's basically only this, right?

diff --git a/src/i830_driver.c b/src/i830_driver.c
index c1d61f4..eaf5d27 100644
--- a/src/i830_driver.c
+++ b/src/i830_driver.c
@@ -502,8 +502,8 @@ I830DetectMemory(ScrnInfoPtr pScrn)
range = gtt_size + 4;

/* new 4 series hardware has seperate GTT stolen with GFX stolen */
- if (IS_G4X(pI830))
- range = 0;
+ if (IS_G4X(pI830) || IS_GM45(pI830))
+ range = 4;

if (IS_I85X(pI830) || IS_I865G(pI830) || IS_I9XX(pI830)) {
switch (gmch_ctrl & I855_GMCH_GMS_MASK) {

Bearing in mind that I have absolutely no idea about the code and all the related
stuff: do I understand correctly that this will also cause problems if the
kernel doesn't have the related fix?

> > That means this driver will be quite widespread...
> > Is it acceptable for the kernel ABI to break that radically and that quickly?
> We tested a pile of hardware and didn't find any GM45s that worked, so
> we assumed they were all broken and that fixing the bug wouldn't cause
> any working configurations to stop working.
Seems sensible from your side.

> We can hack up the kernel so the old X server just gets a WARN_ON
> instead of breaking. This is a bit worrying though; the "fix" would let
> user space continue to mis-program the hardware.
If it's really only that small a portion of hardware... At least Lenovo's T500/T400
series, which contains the same hardware as mine, is probably not really
widespread yet (the laptop is 4 weeks old or so and wasn't available before).

> My concern here is that a common failure mode with this bug was to lock
> up the graphics hardware and require a reboot. Having the X server fail
> to start and leave the system in text mode where new packages can be
> installed seems like a better mode than making the system hang during
> boot.
Right.

Andres

2008-10-20 02:19:31

by Keith Packard

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

On Mon, 2008-10-20 at 03:58 +0200, Andres Freund wrote:

> It's basically only this, right?

Yup.

> Bearing in mind that I have absolutely no idea about the code and all the related
> stuff: do I understand correctly that this will also cause problems if the
> kernel doesn't have the related fix?

Yes, on this hardware, the kernel is smashing the last few stolen
entries with a bad value, so running a new driver against an old kernel
will fail as well. There's not a lot we can do about this direction
though; until we have all of this hardware management in one place,
we're stuck trying to synchronize fixes across two code bases.

> If it's really only that small a portion of hardware... At least Lenovo's T500/T400
> series, which contains the same hardware as mine, is probably not really
> widespread yet (the laptop is 4 weeks old or so and wasn't available before).

I haven't even received my x200s yet as they've just entered production.
The problem here was that the prototype hardware had a different
behavior than what is now shipping, so we didn't catch this mistake
until the first production machines were tested.

If you care to try the above fix against your current driver sources,
that would be helpful for us.

--
[email protected]



2008-10-20 02:27:32

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi,

On Monday 20 October 2008 04:19:07 you wrote:
> On Mon, 2008-10-20 at 03:58 +0200, Andres Freund wrote:
> > It's basically only this, right?
> > Bearing in mind that I have absolutely no idea about the code and all the
> > related stuff: do I understand correctly that this will also cause
> > problems if the kernel doesn't have the related fix?
> Yes, on this hardware, the kernel is smashing the last few stolen
> entries with a bad value, so running a new driver against an old kernel
> will fail as well. There's not a lot we can do about this direction
> though; until we have all of this hardware management in one place,
> we're stuck trying to synchronize fixes across two code bases.
Ugly situation. So the distribution can't really do anything, because it will
be stuck either way.

> > If it's really only that small a portion of hardware... At least Lenovo's
> > T500/T400 series, which contains the same hardware as mine, is probably
> > not really widespread yet (the laptop is 4 weeks old or so and wasn't
> > available before).
> I haven't even received my x200s yet as they've just entered production.
> The problem here was that the prototype hardware had a different
> behavior than what is now shipping, so we didn't catch this mistake
> until the first production machines were tested.
Life is fun.

> If you care to try the above fix against your current driver sources,
> that would be helpful for us.
Will do so tomorrow morning; it's 4am here ;-) and I've finished the work I had to
do tonight...


Greetings,

Andres

2008-10-20 09:50:49

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

Hi Keith,

On Monday 20 October 2008 04:19:07 Keith Packard wrote:
> If you care to try the above fix against your current driver sources,
> that would be helpful for us.
Ok, tested with 2.4.1 and current linus git
(0cfd81031a26717fe14380d18275f8e217571615) and it works without problems so
far.

I have a good reason to say no to bisection requests now ;-) switching drivers
every bisection run is annoying (unfortunately I could just cherry-pick the fix
and rebase, but who knows that...)

Andres

2008-10-20 10:29:30

by Andres Freund

Subject: Re: [git pull] agp patches for 2.6.28-rc1.

On Monday 20 October 2008 11:50:37 Andres Freund wrote:
> Ok, tested with 2.4.1 and current linus git
> (0cfd81031a26717fe14380d18275f8e217571615) and it works without problems so
> far.
Ahem, this is with the patch from commit 4dd00681dd0f9fce8dfd4592b46418edbbd2eeb4
applied...

Andres