2004-06-02 18:01:21

by Andi Kleen

Subject: Re: GART Error 11

Arthur Perry <[email protected]> writes:

> Hello,
>
> Oops. Sorry I have made a mistake in all of my statements below.
> It was after 5pm yesterday, and it was a long day...
> It's not offset 0x44 that we are interested in.
> My listings were at offset 0x48, which is MCA NB Status Low Register.
> Sorry, did not mean to confuse anybody.

I would recommend just reading the MC* MSRs via /dev/msr.
Accessing the northbridge directly for MCE information has various
quirks, and I removed that completely in the 2.6 driver.
The MSRs contain the same information.

-Andi
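
As an illustration of the suggestion above, here is a minimal sketch of reading a machine-check status MSR through the Linux msr character device and decoding its flag bits. It is not taken from the thread. The helper names (mc_status_msr, read_msr, decode_status) are hypothetical, the device path /dev/cpu/N/msr assumes the msr driver is loaded and the script runs as root, and the choice of bank 4 for the northbridge is an assumption that should be checked against the processor documentation.

```python
import os
import struct

MC0_STATUS = 0x401  # MSR number of the bank 0 machine-check status register
NB_BANK = 4         # assumption: the northbridge reports through bank 4


def mc_status_msr(bank):
    """Return the MSR number of MCi_STATUS (banks are spaced 4 MSRs apart)."""
    return MC0_STATUS + 4 * bank


def read_msr(msr, cpu=0, path=None):
    """Read one 64-bit MSR; the msr device uses the MSR number as file offset."""
    path = path or "/dev/cpu/%d/msr" % cpu
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError("short read from %s" % path)
    return struct.unpack("<Q", data)[0]


def decode_status(value):
    """Split an MCi_STATUS value into its architectural flag bits."""
    return {
        "valid": bool((value >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((value >> 62) & 1),     # OVER: an earlier error was lost
        "uncorrected": bool((value >> 61) & 1),  # UC
        "enabled": bool((value >> 60) & 1),      # EN
        "addr_valid": bool((value >> 58) & 1),   # ADDRV: MCi_ADDR is meaningful
        "error_code": value & 0xFFFF,            # MCA error code, bits 15:0
        "low32": value & 0xFFFFFFFF,             # the "status low" half
    }


if __name__ == "__main__":
    status = read_msr(mc_status_msr(NB_BANK))
    print("MC%d_STATUS = %#018x" % (NB_BANK, status))
    print(decode_status(status))
```

The low32 field is the half to compare against the value seen at config-space offset 0x48, on the assumption that the two views mirror each other as stated above.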


2004-06-02 18:34:44

by Arthur Perry

Subject: Re: GART Error 11

Thanks Andi!
I did not realize there were quirks associated with reading this directly from PCI config space.

Perhaps someone can tell me this:
Does anybody know if there is any documented information about the differences between AGP driver versions 0.99 and 0.100?
I know I can just read the source, but there must be a list of known bugs and what has been addressed by the newer version, right?

The reason I ask is that both RedHat and SuSE are still using the 0.99 AGP driver.
RedHat Enterprise 3.0's 2.4.21-9.0.1EL kernel and SuSE's 2.4.19 kernel have this in common, and I am seeing such GART errors only with their kernels.
The mainline kernel 2.4.27-pre4, which uses GART 0.100, does not produce this failure condition.

Please let me know if I am going in the wrong direction, but I am going to patch RedHat's kernel with the AGP 0.100 driver and see if the problem does indeed go away.
I'll do the same with SuSE.
If it does, then I have found the root cause of this particular problem, and I can then raise it with the specific distributors.

Thanks!
Best Regards,
Arthur Perry


On Wed, 2 Jun 2004, Andi Kleen wrote:

> Arthur Perry <[email protected]> writes:
>
> > Hello,
> >
> > Oops. Sorry I have made a mistake in all of my statements below.
> > It was after 5pm yesterday, and it was a long day...
> > It's not offset 0x44 that we are interested in.
> > My listings were at offset 0x48, which is MCA NB Status Low Register.
> > Sorry, did not mean to confuse anybody.
>
> I would recommend just reading the MC* MSRs via /dev/msr.
> Accessing the northbridge directly for MCE information has various
> quirks, and I removed that completely in the 2.6 driver.
> The MSRs contain the same information.
>
> -Andi

2004-06-02 19:48:40

by Andi Kleen

Subject: Re: GART Error 11

On Wed, Jun 02, 2004 at 02:35:33PM -0400, Arthur Perry wrote:
> Thanks Andi!
> I did not realize there were quirks associated with reading this directly from PCI config space.
>
> Perhaps someone can tell me this:
> Does anybody know if there is any documented information about the differences between AGP driver versions 0.99 and 0.100?
> I know I can just read the source, but there must be a list of known bugs and what has been addressed by the newer version, right?

You can read the BitKeeper logs at http://linux.bkbits.net
for the file in question.

Version numbers for kernel subsystems are often meaningless
because there are lots of changes without version number changes.


> The reason I ask is that both RedHat and SuSE are still using the 0.99 AGP driver.
> RedHat Enterprise 3.0's 2.4.21-9.0.1EL kernel and SuSE's 2.4.19 kernel have this in common, and I am seeing such GART errors only with their kernels.

Don't use the 2.4.19 kernel; use the SLES8-SP3 kernel. It will likely
fix this: there was a fix in this area, which also went into 2.4 mainline.
I don't know about RH kernels.

-Andi