2007-05-10 05:59:45

by Zhao Forrest

Subject: A question about GART aperture unmap

Hi,

The following is extracted from function gart_iommu_init()
......
/*
 * Unmap the IOMMU part of the GART. The alias of the page is
 * always mapped with cache enabled and there is no full cache
 * coherency across the GART remapping. The unmapping avoids
 * automatic prefetches from the CPU allocating cache lines in
 * there. All CPU accesses are done via the direct mapping to
 * the backing memory. The GART address is only used by PCI
 * devices.
 */
clear_kernel_mapping((unsigned long)__va(iommu_bus_base), iommu_size);
......

On my AMD-based system, the GART aperture is reserved as:
Mapping aperture over 65536 KB of RAM @ 4000000
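
For reference, the physical range implied by that log line can be worked
out as below. This is only a user-space sketch, not kernel code: both
helper names are hypothetical, and ASSUMED_PAGE_OFFSET is an assumed
direct-mapping base for x86-64 kernels of this era that should be checked
against the tree in use. Per the comment above, the kernel unmaps only the
IOMMU part of the aperture, which this sketch does not try to compute.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed direct-mapping base; this value is not taken from the thread. */
#define ASSUMED_PAGE_OFFSET 0xffff810000000000ULL

/* Hypothetical helper: last physical byte covered by an aperture that
 * starts at 'base' and is 'size_kb' kilobytes long. */
static uint64_t aperture_last_byte(uint64_t base, uint64_t size_kb)
{
    assert(size_kb > 0);
    return base + size_kb * 1024 - 1;
}

/* Hypothetical model of __va(): physical address to its alias in the
 * kernel direct mapping, under the assumed base above. */
static uint64_t direct_map_alias(uint64_t phys)
{
    return ASSUMED_PAGE_OFFSET + phys;
}
```

With base 0x4000000 and 65536 KB, aperture_last_byte() gives 0x7ffffff, so
the aperture covers physical 0x4000000 through 0x7ffffff. Under the assumed
base, the cached alias of the first aperture byte is 0xffff810004000000.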

After I commented out the clear_kernel_mapping() line, the system would
hit a sync flood and reset from time to time. With the
clear_kernel_mapping() line in place, no system reset happened.

As we know, CPU prefetch never crosses a page boundary, and in this
case the page size is 4M. The aperture start address is 4000000 (hex),
which is 4M-aligned. So I think CPU prefetch cannot touch the range
reserved for the GART aperture.
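
The alignment argument can be checked with a few lines of C. This is a
sketch only: the helper names are hypothetical, the 4M page size is the
figure given in this message, and the 64-byte cache line is an assumption.
It models nothing more than a prefetcher that stops at page boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if 'addr' is a multiple of 'page_size',
 * which must be a power of two. */
static bool is_page_aligned(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/* Hypothetical helper: true if two addresses lie in the same page. */
static bool same_page(uint64_t a, uint64_t b, uint64_t page_size)
{
    assert(page_size != 0);
    return (a / page_size) == (b / page_size);
}
```

With a 4M page size (0x400000), is_page_aligned(0x4000000, 0x400000) is
true. The last 64-byte line below the aperture starts at 0x3ffffc0, and
same_page(0x3ffffc0, 0x4000000, 0x400000) is false, so a prefetcher that
never leaves the current page would not run from that line into the
aperture.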

My question is: in which cases would a CPU prefetch touch the address
range reserved for the GART aperture?

Thanks,
Forrest


2007-05-10 09:49:21

by Andi Kleen

Subject: Re: [discuss] A question about GART aperture unmap

> After I commented out the clear_kernel_mapping() line, the system would
> hit a sync flood and reset from time to time. With the
> clear_kernel_mapping() line in place, no system reset happened.

Hmm, that should not happen. Normally the problems fixed by
this are expected to be very rare and you also should not get
sync flood, but just cache line corruption. While it might be possible
for random corruption to then cause sync flood that should be again
rather unlikely. If it's repeatable quickly something else
must be wrong.

>
> As we know, CPU prefetch never crosses a page boundary, and in this

That only applies to sequential prefetch. But speculative execution can
prefetch pretty much any address. That is why the clear_kernel_mapping is
needed.
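
To illustrate the distinction, here is a minimal sketch of code in which
a speculative load could touch an address the program never reads
architecturally. The function is hypothetical and not taken from the
kernel, and whether a given CPU actually speculates here depends on its
microarchitecture and branch predictor state.

```c
#include <stddef.h>

/*
 * Hypothetical example. When 'valid' is 0, 'p' may hold a stale or
 * arbitrary value. Architecturally the load happens only if valid != 0,
 * but a CPU that mispredicts the branch may issue the load
 * speculatively before the misprediction is detected.
 */
static long read_if_valid(const long *p, int valid)
{
    if (valid)
        return *p;      /* may also be issued speculatively */
    return -1;          /* architectural result when 'p' is not valid */
}
```

If 'p' happened to hold an address inside a still-mapped cached alias of
the aperture, such a speculative load could allocate a cache line there,
which is the case the kernel comment describes. With the mapping cleared
there is no translation for the access to use.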

-Andi

2007-05-10 10:00:29

by Zhao Forrest

Subject: Re: [discuss] A question about GART aperture unmap

> >
> > As we know, CPU prefetch never crosses a page boundary, and in this
>
> That only applies to sequential prefetch. But speculative execution can
> prefetch pretty much any address. That is why the clear_kernel_mapping is
> needed.

The BIOS setup has a "Speculative TLB Reload" option. Is that the same
thing as the speculative execution you mentioned? Even with
"Speculative TLB Reload" disabled in the BIOS, we still saw the sync
flood and reset when clear_kernel_mapping() was commented out.

Forrest

2007-05-10 10:09:53

by Zhao Forrest

Subject: Re: [discuss] A question about GART aperture unmap

On 5/10/07, Andi Kleen wrote:
> > After I commented out the clear_kernel_mapping() line, the system would
> > hit a sync flood and reset from time to time. With the
> > clear_kernel_mapping() line in place, no system reset happened.
>
> Hmm, that should not happen. Normally the problems fixed by
> this are expected to be very rare and you also should not get
> sync flood, but just cache line corruption. While it might be possible
> for random corruption to then cause sync flood that should be again
> rather unlikely. If it's repeatable quickly something else
> must be wrong.

The IPMI SEL log recorded a GART error:
3f01 | OEM record e0 | 1800000000f60000010005001b GART

Forrest

2007-05-10 10:19:30

by Zhao Forrest

Subject: Re: [discuss] A question about GART aperture unmap

On 5/10/07, Andi Kleen wrote:
> > After I commented out the clear_kernel_mapping() line, the system would
> > hit a sync flood and reset from time to time. With the
> > clear_kernel_mapping() line in place, no system reset happened.
>
> Hmm, that should not happen. Normally the problems fixed by
> this are expected to be very rare and you also should not get
> sync flood, but just cache line corruption. While it might be possible
> for random corruption to then cause sync flood that should be again
> rather unlikely. If it's repeatable quickly something else
> must be wrong.

Sorry, I missed the last sentence. The bug is reproduced by copying a
large (8G) file while compiling a Linux kernel, running both for about
1 to 2 days.

Forrest

2007-05-10 11:06:30

by Andi Kleen

Subject: Re: [discuss] A question about GART aperture unmap

> Sorry, I missed the last sentence. The bug is reproduced by copying a
> large (8G) file while compiling a Linux kernel, running both for about
> 1 to 2 days.

OK, that is closer to what I would expect. I don't know what causes the
sync flood, but data corruption at least is expected without the
clear_kernel_mapping(). Just don't comment it out if you care about
your data.

-Andi