2006-12-14 04:23:02

by Daniel Drake

Subject: amd64 agpgart aperture base value

Hi Dave,

I'm working on a solution for
http://bugzilla.kernel.org/show_bug.cgi?id=6350

Certain BIOSes are screwing with the K8 aperture base value. However,
these systems work after booting into Windows and then rebooting into Linux.

It originally appeared to be a bug specific to ASRock motherboards based
on nForce3, but further reports have shown that this bug also manifests
on ASUS+nForce3 and ASUS+VIA boards.

The BIOS sets some high bits at offset 0x94 in the PCI config space of
the northbridge, which falls under AMD64_GARTAPERTUREBASE.

My current approach to a solution involves identifying the buggy systems
by southbridge, and then fixing the northbridge in a PCI quirk. However,
as more systems are uncovered, I don't feel so good about this
approach.

In amd64-agp.c, would it be dangerous to remove the "aperture base > 4G"
thing and instead simply only read the rightmost 7 bits to ensure the
aperture base is always in range? (This is coming from someone with
little AGPGART understanding...)
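For readers following along, here is a minimal C sketch of what the proposed masking does, assuming the register layout that Linux's aperture.c decodes (bits 14:0 of offset 0x94 holding aperture address bits 39:25, i.e. in 32MB units). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of register 0x94: bits 14:0 hold aperture address
 * bits 39:25, so the full decode can name any 32MB-aligned address
 * below 1TB. */
static uint64_t aperture_base_full(uint32_t reg)
{
    return (uint64_t)(reg & 0x7fffu) << 25;
}

/* The proposal: keep only bits 6:0 (address bits 31:25). This forces
 * the base below 4GB; the largest possible result is 0x7f << 25,
 * i.e. 0xfe000000. */
static uint64_t aperture_base_low7(uint32_t reg)
{
    return (uint64_t)(reg & 0x7fu) << 25;
}
```

The two decodes agree only when bits 14:7 of the register are zero; otherwise the masked value names a different physical address than the one the BIOS programmed.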

Alternatively do you have other suggestions for how the problem might be
solved better?

Thanks!
Daniel


2006-12-14 13:22:37

by Dave Jones

Subject: Re: amd64 agpgart aperture base value

On Wed, Dec 13, 2006 at 10:47:32PM -0500, Daniel Drake wrote:

> In amd64-agp.c, would it be dangerous to remove the "aperture base > 4G"
> thing and instead simply only read the rightmost 7 bits to ensure the
> aperture base is always in range? (This is coming from someone with
> little AGPGART understanding...)

Ignoring the high bits is the wrong thing to be doing.
The BIOS placed the aperture in one place, and by masking bits, you're going
to be assuming it's somewhere else, and scribbling over who knows what.

> Alternatively do you have other suggestions for how the problem might be
> solved better?

If the aperture is placed above 4G, we should deal with it. Currently, we
don't. (See the AGP patches Linus merged just before 2.6.19 was released
that work around this for intel-agp).

Just needs someone to find the time to write the code to do it, and test it.

Dave

--
http://www.codemonkey.org.uk

2006-12-14 23:35:14

by Daniel Drake

Subject: Re: amd64 agpgart aperture base value

Dave Jones wrote:
> On Wed, Dec 13, 2006 at 10:47:32PM -0500, Daniel Drake wrote:
>
> > In amd64-agp.c, would it be dangerous to remove the "aperture base > 4G"
> > thing and instead simply only read the rightmost 7 bits to ensure the
> > aperture base is always in range? (This is coming from someone with
> > little AGPGART understanding...)
>
> Ignoring the high bits is the wrong thing to be doing.
> The BIOS placed the aperture in one place, and by masking bits, you're going
> to be assuming it's somewhere else, and scribbling over who knows what.

So, you think that the aperture moving to a different location on every
boot is what the BIOS desires? Is it normal for it to move so much?

The current patch drops the upper bits and results in the aperture
always being in the same place, and this appears to work. If the BIOS
did really put the aperture beyond 4GB but my patch is making Linux put
it somewhere else, does it surprise you that things are still working
smoothly?

Is it even possible for the aperture to start beyond 4GB when the system
has less than 4GB of RAM?

> If the aperture is placed above 4G, we should deal with it. Currently, we
> don't. (See the AGP patches Linus merged just before 2.6.19 was released
> that work around this for intel-agp).
>
> Just needs someone to find the time to write the code to do it, and test it.

Looks like some understanding of AGP is required too. I'll have a closer
look another time.

Thanks,
Daniel

2006-12-15 00:02:55

by Dave Jones

Subject: Re: amd64 agpgart aperture base value

On Thu, Dec 14, 2006 at 06:35:30PM -0500, Daniel Drake wrote:

> So, you think that the aperture moving to a different location on every
> boot is what the BIOS desires? Is it normal for it to move so much?

Beats me. I gave up trying to understand BIOS authors' motivations years ago.

> The current patch drops the upper bits and results in the aperture
> always being in the same place, and this appears to work. If the BIOS
> did really put the aperture beyond 4GB but my patch is making Linux put
> it somewhere else, does it surprise you that things are still working
> smoothly?

Does it survive a run of testgart when masking out the high bits?
It could be that you're right, and the upper bits being reported really
are garbage.

> Is it even possible for the aperture to start beyond 4GB when the system
> has less than 4GB of RAM?

The amount of RAM is irrelevant; it can appear anywhere in the address space,
which on 64-bit is pretty darned huge. The aperture isn't backed by RAM,
it's a 'virtual window' of sorts. When you write to an address in that range, it
gets transparently remapped to somewhere else in the address space.
The window is the 'aperture', where it remaps to is controlled by a translation
table called the GATT (which does live in real memory).

That's pretty much all there is to AGP. It's just a really dumb MMU of sorts.
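A toy model of that "dumb MMU" may help. This is a sketch under simplifying assumptions: 4 KiB GART pages, a GATT stored as plain page-aligned physical addresses (real entries also carry flag bits), and a hypothetical struct aperture that is not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12                           /* assume 4 KiB pages */
#define GART_PAGE_MASK  ((UINT64_C(1) << GART_PAGE_SHIFT) - 1)

/* Hypothetical aperture description: a window of bus addresses plus
 * the GATT, an array in real memory with one entry per aperture page. */
struct aperture {
    uint64_t base;         /* bus address where the window starts */
    uint64_t size;         /* window size in bytes */
    const uint64_t *gatt;  /* gatt[i] = physical address backing page i */
};

/* Translate an address inside the window to the physical address the
 * access is remapped to: find the page, look it up, keep the offset. */
static uint64_t gart_translate(const struct aperture *ap, uint64_t addr)
{
    assert(addr >= ap->base && addr - ap->base < ap->size);
    uint64_t offset = addr - ap->base;               /* offset into window */
    uint64_t page   = offset >> GART_PAGE_SHIFT;     /* which GATT entry */
    return ap->gatt[page] | (offset & GART_PAGE_MASK);
}
```

Note that nothing in the window itself is RAM; only the GATT array and the pages it points to are.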

Dave

2006-12-18 11:39:33

by Eric W. Biederman

Subject: Re: amd64 agpgart aperture base value

Dave Jones writes:

> On Thu, Dec 14, 2006 at 06:35:30PM -0500, Daniel Drake wrote:
>
> > So, you think that the aperture moving to a different location on every
> > boot is what the BIOS desires? Is it normal for it to move so much?
>
> Beats me. I gave up trying to understand BIOS authors' motivations years ago.
>
> > The current patch drops the upper bits and results in the aperture
> > always being in the same place, and this appears to work. If the BIOS
> > did really put the aperture beyond 4GB but my patch is making Linux put
> > it somewhere else, does it surprise you that things are still working
> > smoothly?
>
> Does it survive a run of testgart when masking out the high bits?
> It could be that you're right, and the upper bits being reported really
> are garbage.
>
> > Is it even possible for the aperture to start beyond 4GB when the system
> > has less than 4GB of RAM?
>
> The amount of RAM is irrelevant; it can appear anywhere in the address space,
> which on 64-bit is pretty darned huge. The aperture isn't backed by RAM,
> it's a 'virtual window' of sorts. When you write to an address in that range, it
> gets transparently remapped to somewhere else in the address space.
> The window is the 'aperture', where it remaps to is controlled by a translation
> table called the GATT (which does live in real memory).
>
> That's pretty much all there is to AGP. It's just a really dumb MMU of sorts.

Well, I just took a quick look, and it looks like there is a bug in amd64-agp.c.
It isn't masking off the reserved high bits of the register at 0x94, and it
isn't being very careful with the promotion to 64 bits.
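The two problems can be illustrated in C. This is a hedged sketch, modeled loosely on the pattern described rather than copied from amd64-agp.c; the function and macro names are hypothetical, and the valid-bit mask assumes bits 14:0 of register 0x94 carry address bits 39:25.

```c
#include <stdint.h>

#define GART_BASE_FIELD_MASK 0x7fffu  /* assumed: only bits 14:0 are valid */

/* The pattern described: tmp is 32 bits wide, so the shift is done in
 * 32-bit arithmetic and address bits 32..39 fall off before the result
 * is widened; reserved bits are never masked either. */
static uint64_t aperture_base_buggy(uint32_t tmp)
{
    uint64_t aperturebase = tmp << 25;
    return aperturebase;
}

/* Mask off reserved bits 31:15 first, then widen and shift in 64 bits. */
static uint64_t aperture_base_fixed(uint32_t tmp)
{
    return (uint64_t)(tmp & GART_BASE_FIELD_MASK) << 25;
}
```

For a register value of 0x81 (an aperture at 0x102000000), the 32-bit shift silently yields 0x2000000, while the masked, widened version keeps the address above 4GB. Widening without masking would instead turn any set reserved bit into a bogus address beyond 40 bits, so the two fixes belong together.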

However, that does not appear to be the problem, as the base addresses you are
seeing are only 40 bits long, and you are fixing it in the register
before the code in amd64-agp.c runs.

So I do agree that it appears the BIOS is letting the upper address bits
float and giving you a 32-bit value.

Fixing this with a board-specific PCI quirk is questionable, but it may
be OK. A reliable fix is probably, if the address is sufficiently questionable,
to allocate a new aperture ourselves and scream that the BIOS messed up.
arch/x86_64/kernel/aperture.c appears to do that when we use the AGP aperture
for an IOMMU.
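One way to express the "sufficiently questionable" test is sketched below. The checks are assumptions, not what aperture.c actually implements: reserved bits set in register 0x94, a zero base, or a base not aligned to the (power-of-two) aperture size. A false result is where the caller would print a warning and fall back to allocating its own aperture; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define GART_BASE_FIELD_MASK 0x7fffu   /* assumed: only bits 14:0 are valid */

/* Hypothetical check: should we trust the aperture the BIOS programmed?
 * base_reg is the raw value of register 0x94; size is the aperture size
 * in bytes and must be a power of two. */
static bool bios_aperture_looks_sane(uint32_t base_reg, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);

    uint64_t base = (uint64_t)(base_reg & GART_BASE_FIELD_MASK) << 25;

    if (base_reg & ~GART_BASE_FIELD_MASK) {
        fprintf(stderr, "GART: BIOS set reserved bits in 0x94 (0x%08x)\n",
                (unsigned)base_reg);
        return false;
    }
    if (base == 0) {
        fprintf(stderr, "GART: BIOS left the aperture base at 0\n");
        return false;
    }
    if (base & (size - 1)) {
        fprintf(stderr, "GART: aperture base not aligned to its size\n");
        return false;
    }
    return true;
}
```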

I don't think an AGP aperture above 32 bits is actually very interesting
in practice, as most AGP cards are only 32-bit and so won't be able to use it.
And we are talking bus addresses here.
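The 32-bit point can be made concrete with a small check: a card that can only emit 32-bit bus addresses can use the aperture only if the entire window, up to its last byte, lies below 4GB. A minimal sketch; the dma_mask parameter mirrors the kernel's notion of a device DMA mask, but the function itself is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* dma_mask is the highest bus address the device can generate
 * (0xffffffff for a 32-bit card). The whole window must fit under it. */
static bool aperture_reachable(uint64_t base, uint64_t size, uint64_t dma_mask)
{
    if (size == 0 || base + size < base)   /* empty, or wraps around */
        return false;
    return base + size - 1 <= dma_mask;    /* compare the last byte */
}
```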

Eric