2008-08-04 16:30:52

by Rene Herman

Subject: AGP and PAT (induced?) problem (on AMD family 6)

Hi Dave.

A while ago I sent a message about long AGP delays upon starting and
exiting X:

http://marc.info/?l=linux-kernel&m=121647129632110&w=2

There was no reply (if that was due to the linux.ie address, could you
perhaps update it in MAINTAINERS?) but today Shaohua Li posted a patch
that made me wonder about PAT in this context:

http://marc.info/?l=linux-kernel&m=121783222306075&w=2
http://marc.info/?l=linux-kernel&m=121783222406078&w=2
http://marc.info/?l=linux-kernel&m=121783222406081&w=2

His patch does not solve anything appreciable for me -- the delays are
still as described in that previous post, except that (with Option
"AGPSize" "64") the delays upon exiting X are now sometimes as bad as a
full 12 seconds.

What _does_ solve this though is booting with the "nopat" command line
parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
With "nopat", there's no problem to be seen anymore -- exiting X
specifically is instantaneous.

With or without PAT, my /proc/mtrr is always:

reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1

under X joined by:

reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2

This is a machine with 768MB RAM, the AGP aperture set to 64MB and a
32MB Matrox Millennium G550 AGP card. More detail is in the previous post.

Is this something inherent to PAT? Inherent to PAT on AMD family 6?
Inherent to DRM/AGP with PAT? On AMD family 6?

This is probably fairly important to get sorted because, although I
don't know what's where at the moment, the last thing I saw was a patch
in x86/tip that enabled PAT on many more models, including all of AMD.

For reference, /proc/cpuinfo:

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 7
model name : AMD Duron(tm) Processor
stepping : 1
cpu MHz : 1313.094
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 2628.89
clflush size : 32
power management: ts

and the PAT enabler patch that I apply locally to 2.6.26:

diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c b/arch/x86/kernel/cpu/addon_cpuid_features.c
index c2e1ce3..8992282 100644
--- a/arch/x86/kernel/cpu/addon_cpuid_features.c
+++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
@@ -55,7 +55,7 @@ void __cpuinit validate_pat_support(struct cpuinfo_x86 *c)
 {
 	switch (c->x86_vendor) {
 	case X86_VENDOR_AMD:
-		if (c->x86 >= 0xf && c->x86 <= 0x11)
+		if (c->x86 == 6 || c->x86 >= 0xf)
 			return;
 		break;
 	case X86_VENDOR_INTEL:

Rene.


Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
> What _does_ solve this though is booting with the "nopat" command line
> parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
> With "nopat", there's no problem to be seen anymore -- exiting X
> specifically is instantaneous.
>
> With or without PAT, my /proc/mtrr is always:
>
> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
>
> under X joined by:
>
> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2

To get some more debug data, can you please retest with latest kernel
(2.6.27-rc2) using "debugpat" kernel option and provide dmesg
output plus contents of <debugfs>/x86/pat_memtype_list?


Thanks,

Andreas

2008-08-06 20:58:11

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 06-08-08 15:51, Andreas Herrmann wrote:

> On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
>> What _does_ solve this though is booting with the "nopat" command line
>> parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
>> With "nopat", there's no problem to be seen anymore -- exiting X
>> specifically is instantaneous.
>>
>> With or without PAT, my /proc/mtrr is always:
>>
>> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
>> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
>> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
>>
>> under X joined by:
>>
>> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2
>
> To get some more debug data, can you please retest with latest kernel
> (2.6.27-rc2)

Problem present on vanilla -rc2.

> using "debugpat" kernel option and provide dmesg output

No... my kernel message buffer isn't large enough for that :-(

Right, I guess I now know where the delay is coming from. I suppose this
is not expected. dmesg as captured after starting X and without
"debugpat" at:

http://members.home.nl/rene.herman/pat/dmesg.x

Truncated dmesg with "debugpat":

http://members.home.nl/rene.herman/pat/dmesg.x.debugpat

> plus contents of <debugfs>/x86/pat_memtype_list?

Before starting X (1K):

http://members.home.nl/rene.herman/pat/pat_memtype_list.console.debugpat

After starting X (625K):

http://members.home.nl/rene.herman/pat/pat_memtype_list.x.debugpat

(This is with 64MB AGP memory)

More data:

http://members.home.nl/rene.herman/pat/config-2.6.27-rc2-current
http://members.home.nl/rene.herman/pat/xorg.conf
http://members.home.nl/rene.herman/pat/Xorg.0.log

Thanks,
Rene

2008-08-11 09:46:37

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

<ping>

Please note this is a 2.6.27 problem (given that PAT isn't enabled by
default, not a _pure_ regression I guess, but still).

I also still don't know if you (Andreas), Dave or Yinghai should be the
To: on this but given that you've been the only one to react at all...

On 06-08-08 22:57, Rene Herman wrote:

> On 06-08-08 15:51, Andreas Herrmann wrote:
>
>> On Mon, Aug 04, 2008 at 06:30:32PM +0200, Rene Herman wrote:
>>> What _does_ solve this though is booting with the "nopat" command
>>> line parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron
>>> myself. With "nopat", there's no problem to be seen anymore --
>>> exiting X specifically is instantaneous.
>>>
>>> With or without PAT, my /proc/mtrr is always:
>>>
>>> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
>>> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
>>> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
>>>
>>> under X joined by:
>>>
>>> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2
>>
>> To get some more debug data, can you please retest with latest kernel
>> (2.6.27-rc2)
>
> Problem present on vanilla -rc2.
>
>> using "debugpat" kernel option and provide dmesg output
>
> No... my kernel message buffer isn't large enough for that :-(
>
> Right, I guess I now know where the delay is coming from. I suppose this
> is not expected. dmesg as captured after starting X and without
> "debugpat" at:
>
> http://members.home.nl/rene.herman/pat/dmesg.x
>
> Truncated dmesg with "debugpat":
>
> http://members.home.nl/rene.herman/pat/dmesg.x.debugpat
>
>> plus contents of <debugfs>/x86/pat_memtype_list?
>
> Before starting X (1K):
>
> http://members.home.nl/rene.herman/pat/pat_memtype_list.console.debugpat
>
> After starting X (625K):
>
> http://members.home.nl/rene.herman/pat/pat_memtype_list.x.debugpat
>
> (This is with 64MB AGP memory)
>
> More data:
>
> http://members.home.nl/rene.herman/pat/config-2.6.27-rc2-current
> http://members.home.nl/rene.herman/pat/xorg.conf
> http://members.home.nl/rene.herman/pat/Xorg.0.log

2008-08-15 14:22:54

by Ingo Molnar

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)


(more people Cc:-ed)

* Rene Herman <[email protected]> wrote:

> Hi Dave.
>
> A while ago I sent a message about long AGP delays upon starting and
> exiting X:
>
> http://marc.info/?l=linux-kernel&m=121647129632110&w=2
>
> There was no reply (if that was due to the linux.ie address, could you
> perhaps update it in MAINTAINERS?) but today Shaohua Li posted a patch
> that made me wonder about PAT in this context:
>
> http://marc.info/?l=linux-kernel&m=121783222306075&w=2
> http://marc.info/?l=linux-kernel&m=121783222406078&w=2
> http://marc.info/?l=linux-kernel&m=121783222406081&w=2
>
> His patch does not solve anything appreciable for me -- the delays are
> still as described in that previous post, with an exception for (with
> Option "AGSize" "64") delays upon exiting X that are now sometimes as
> bad as a full 12 seconds.
>
> What _does_ solve this though is booting with the "nopat" command line
> parameter. I'm on 2.6.26.1 and have enabled PAT for my AMD Duron myself.
> With "nopat", there's no problem to be seen anymore -- exiting X
> specifically is instantaneous.
>
> With or without PAT, my /proc/mtrr is always:
>
> reg00: base=0x00000000 ( 0MB), size= 512MB: write-back, count=1
> reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
> reg02: base=0xe8000000 (3712MB), size= 64MB: write-combining, count=1
>
> under X joined by:
>
> reg03: base=0xe4000000 (3648MB), size= 32MB: write-combining, count=2
>
> This is a machine with 768M, the AGP aperture set to 64MB and a 32MB
> Matrox Millenium G550 AGP card. More detail in previous post.
>
> Is this something inherent to PAT? Inherent to PAT on AMD family 6?
> Inherent to DRM/AGP with PAT? On AMD family 6?
>
> This is probably fairly important to get sorted because although I don't
> know what's where at the moment, last I saw was a patch in x86/tip that
> enabled PAT on many more models including all of AMD.
>
> For reference, /proc/cpuinfo:
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 7
> model name : AMD Duron(tm) Processor
> stepping : 1
> cpu MHz : 1313.094
> cache size : 64 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
> bogomips : 2628.89
> clflush size : 32
> power management: ts
>
> and the PAT enabler patch that I apply locally to 2.6.26:
>
> diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c
> b/arch/x86/kernel/cpu/addon_cpuid_features.c
> index c2e1ce3..8992282 100644
> --- a/arch/x86/kernel/cpu/addon_cpuid_features.c
> +++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
> @@ -55,7 +55,7 @@ void __cpuinit validate_pat_support(struct cpuinfo_x86 *c)
> {
> switch (c->x86_vendor) {
> case X86_VENDOR_AMD:
> - if (c->x86 >= 0xf && c->x86 <= 0x11)
> + if (c->x86 == 6 || c->x86 >= 0xf)
> return;
> break;
> case X86_VENDOR_INTEL:

agreed - a 12+ second wait suggests some rather fundamental breakage. Did
we go back to uncached for some critical display area that makes X start
up (shut down) that slowly? Did we mark the BIOS uncacheable perhaps,
causing X to execute BIOS code very slowly?

Ingo

2008-08-15 15:24:12

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 15-08-08 16:22, Ingo Molnar wrote:

> (more people Cc:-ed)

Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449

> agreed - +12 seconds wait suggest some rather fundamental breakage.
> Did we go back to uncached for some critical display area that makes
> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
> perhaps, causing X to execute BIOS code very slowly?

Quite a lot "uncached-minus" in those lists. I am desperately trying to
avoid a clue about mostly anything graphics related so, "I dunno".

I haven't just disabled PAT yet (although I was about to just do so) and
am available for testing.

Rene.

2008-08-19 10:10:35

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 15-08-08 17:24, Rene Herman wrote:

> On 15-08-08 16:22, Ingo Molnar wrote:
>
>> (more people Cc:-ed)
>
> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
>
>> agreed - +12 seconds wait suggest some rather fundamental breakage.
>> Did we go back to uncached for some critical display area that makes
>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
>> perhaps, causing X to execute BIOS code very slowly?
>
> Quite a lot "uncached-minus" in those lists. I am desperately trying to
> avoid a clue about mostly anything graphics related so, "I dunno".
>
> I haven't just disabled PAT yet (although I was about to just do so) and
> am available for testing.

<waiting with bated breath>

Additional observation with respect to the first versus subsequent shutdowns:

With Option "AGPSize" "64", and booted with "nopat", X startup (from
startx<enter> to a functional desktop) is approximately 5 seconds and
shutdown is 1 second; take these as the calibration times.

Booted without "nopat", X startup seems to alternate between 10+ and 16+
seconds and for shutdown -- the first shutdown after boot takes some 14
seconds total, subsequent shutdowns settle at around 5 seconds.

Rene.

2008-08-19 10:27:29

by Ingo Molnar

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)


* Rene Herman <[email protected]> wrote:

> On 15-08-08 17:24, Rene Herman wrote:
>
>> On 15-08-08 16:22, Ingo Molnar wrote:
>>
>>> (more people Cc:-ed)
>>
>> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
>>
>>> agreed - +12 seconds wait suggest some rather fundamental breakage.
>>> Did we go back to uncached for some critical display area that makes
>>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
>>> perhaps, causing X to execute BIOS code very slowly?
>>
>> Quite a lot "uncached-minus" in those lists. I am desperately trying to
>> avoid a clue about mostly anything graphics related so, "I dunno".
>>
>> I haven't just disabled PAT yet (although I was about to just do so)
>> and am available for testing.
>
> <waiting with bated breath>
>
> Additional observation with respect to first,next shutdown:
>
> With Option "AGPSize" "64", and booted with "nopat", X startup (from
> startx<enter> to functional desktop) is approximately 5 seconds,
> shutdown is 1 second as calibration times.
>
> Booted without "nopat", X startup seems to alternate between 10+ and
> 16+ seconds and for shutdown -- the first shutdown after boot takes
> some 14 seconds total, subsequent shutdowns settle at around 5
> seconds.

would it be possible to start up and shut down X in the slow case via
strace, by doing something like this:

strace -f -ttt -TTT -o trace.log startx

and see which system calls (or other activities) took suspiciously long?

Ingo

2008-08-19 14:19:18

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 19-08-08 12:26, Ingo Molnar wrote:

> * Rene Herman <[email protected]> wrote:
>
>> On 15-08-08 17:24, Rene Herman wrote:
>>
>>> On 15-08-08 16:22, Ingo Molnar wrote:
>>>
>>>> (more people Cc:-ed)
>>> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
>>>
>>>> agreed - +12 seconds wait suggest some rather fundamental breakage.
>>>> Did we go back to uncached for some critical display area that makes
>>>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
>>>> perhaps, causing X to execute BIOS code very slowly?
>>> Quite a lot "uncached-minus" in those lists. I am desperately trying to
>>> avoid a clue about mostly anything graphics related so, "I dunno".
>>>
>>> I haven't just disabled PAT yet (although I was about to just do so)
>>> and am available for testing.
>> <waiting with bated breath>
>>
>> Additional observation with respect to first,next shutdown:
>>
>> With Option "AGPSize" "64", and booted with "nopat", X startup (from
>> startx<enter> to functional desktop) is approximately 5 seconds,
>> shutdown is 1 second as calibration times.
>>
>> Booted without "nopat", X startup seems to alternate between 10+ and
>> 16+ seconds and for shutdown -- the first shutdown after boot takes
>> some 14 seconds total, subsequent shutdowns settle at around 5
>> seconds.
>
> would it be possible to start up and shut down X in the slow case via
> strace, by doing something like this:
>
> strace -f -ttt -TTT -o trace.log startx
>
> and see which system calls (or other activities) took suspiciously long?

It wouldn't, it seems. Root X (needed for the strace) works fine, but
started this way it hangs indefinitely.

I believe the 14 seconds for first shutdown to 5 later might be telling.
Sounds like something might have fixed up uncached entries.

I'd really like a reply from the AGP or PAT side right about now.

Rene.

2008-08-19 19:08:15

by Pallipadi, Venkatesh

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:
> On 19-08-08 12:26, Ingo Molnar wrote:
>
> > * Rene Herman <[email protected]> wrote:
> >
> >> On 15-08-08 17:24, Rene Herman wrote:
> >>
> >>> On 15-08-08 16:22, Ingo Molnar wrote:
> >>>
> >>>> (more people Cc:-ed)
> >>> Thank you. Additional information at http://lkml.org/lkml/2008/8/6/449
> >>>
> >>>> agreed - +12 seconds wait suggest some rather fundamental breakage.
> >>>> Did we go back to uncached for some critical display area that makes
> >>>> X start up (shut down) that slowly? Did we mark the BIOS uncacheable
> >>>> perhaps, causing X to execute BIOS code very slowly?
> >>> Quite a lot "uncached-minus" in those lists. I am desperately trying to
> >>> avoid a clue about mostly anything graphics related so, "I dunno".
> >>>
> >>> I haven't just disabled PAT yet (although I was about to just do so)
> >>> and am available for testing.
> >> <waiting with bated breath>
> >>
> >> Additional observation with respect to first,next shutdown:
> >>
> >> With Option "AGPSize" "64", and booted with "nopat", X startup (from
> >> startx<enter> to functional desktop) is approximately 5 seconds,
> >> shutdown is 1 second as calibration times.
> >>
> >> Booted without "nopat", X startup seems to alternate between 10+ and
> >> 16+ seconds and for shutdown -- the first shutdown after boot takes
> >> some 14 seconds total, subsequent shutdowns settle at around 5
> >> seconds.
> >
> > would it be possible to start up and shut down X in the slow case via
> > strace, by doing something like this:
> >
> > strace -f -ttt -TTT -o trace.log startx
> >
> > and see which system calls (or other activities) took suspiciously long?
>
> It wouldn't it seems. Root X (needed for the strace) works fine but
> started this way hangs indefinitely.
>
> I believe the 14 seconds for first shutdown to 5 later might be telling.
> Sounds like something might have fixed up uncached entries.
>
> I'd really like a reply from the AGP or PAT side right about now.
>

Hmm. Looks like there are more than 16000 entries in the PAT list!

This delay may be due to the overhead of parsing this linked list every
time for a new entry, rather than any problem with the cache setting itself.

I am working on a patch to optimize this pat list parsing for the simple case.
Should be able to send it out later today, for testing.
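
(To make the cost concrete, a quick toy model -- ordinary userspace C,
not the actual pat.c code -- of inserting one 4K page at a time into a
list that is re-walked from the head, for a 64MB aperture:)

/* Toy model only: cost of building a sorted list one 4K page at a time,
 * the way sequential reserve_memtype() calls do here. */
#include <stdio.h>

int main(void)
{
	unsigned long long steps = 0;
	unsigned long n = 64UL * 1024 * 1024 / 4096;	/* 64MB aperture, 4K pages */
	unsigned long i;

	/* each insert walks the list up to its insertion point */
	for (i = 0; i < n; i++)
		steps += i;

	printf("%lu entries, ~%llu list nodes visited in total\n", n, steps);
	return 0;
}

That is on the order of 10^8 node visits for 16K entries, which is at
least in the right ballpark for multi-second delays.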

Thanks,
Venki

2008-08-19 19:21:30

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 19-08-08 21:07, Venki Pallipadi wrote:

> On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:

>> I believe the 14 seconds for first shutdown to 5 later might be
>> telling. Sounds like something might have fixed up uncached
>> entries.
>>
>> I'd really like a reply from the AGP or PAT side right about now.
>
> Hmm. Looks like there are more than 16000 entries in the PAT list!
>
> This delay may be due to the overhead of parsing this linked list
> everytime for a new entry, rather than any problem with cache setting
> itself.
>
> I am working on a patch to optimize this pat list parsing for the
> simple case. Should be able to send it out later today, for testing.

Thanks for the reply. It's with 64MB of AGP memory which I guess is at
the low end these days. Would your reply mean that basically everyone on
2.6.27 should now be experiencing this?

I noticed it was PAT related due to Shaohua Li's:

http://marc.info/?l=linux-kernel&m=121783222306075&w=2

which lists very different times (patch there did not help any).

As another aside -- probably not surprising, but I earlier also tried
both unmounting and completely compiling out debugfs, just in case I
was seeing a debugging-related symptom. No help either.

It's evening here so I'll probably not be able to test until tomorrow.

Rene.

2008-08-19 23:28:18

by Pallipadi, Venkatesh

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Tue, Aug 19, 2008 at 12:22:10PM -0700, Rene Herman wrote:
> On 19-08-08 21:07, Venki Pallipadi wrote:
>
> > On Tue, Aug 19, 2008 at 07:19:44AM -0700, Rene Herman wrote:
>
> >> I believe the 14 seconds for first shutdown to 5 later might be
> >> telling. Sounds like something might have fixed up uncached
> >> entries.
> >>
> >> I'd really like a reply from the AGP or PAT side right about now.
> >
> > Hmm. Looks like there are more than 16000 entries in the PAT list!
> >
> > This delay may be due to the overhead of parsing this linked list
> > everytime for a new entry, rather than any problem with cache setting
> > itself.
> >
> > I am working on a patch to optimize this pat list parsing for the
> > simple case. Should be able to send it out later today, for testing.
>
> Thanks for the reply. It's with 64MB of AGP memory which I guess is at
> the low end these days. Would your reply mean that basically everyone on
> 2.6.27 should now be experiencing this?
>
> I noticed it was PAT related due to Shaohua Li's:
>
> http://marc.info/?l=linux-kernel&m=121783222306075&w=2
>
> which lists very different times (patch there did not help any).
>
> As another by the way, probably not surprising but I earlier also tried
> both unmounting and completely compiling out debugfs just in case I
> was seeing a debugging related sysmptom. No help either.
>
> It's evening here so I'll probably not be able to test until tomorrow.
>

Below is the patch I am testing. Let me know if this patch helps.

Thanks,
Venki


Test patch. Adds cached_entry to list add routine, in order to speed up the
lookup for sequential reserve_memtype calls.

Signed-off-by: Venkatesh Pallipadi <[email protected]>

---
arch/x86/mm/pat.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c 2008-08-19 15:21:07.000000000 -0700
+++ linux-2.6/arch/x86/mm/pat.c 2008-08-19 16:00:52.000000000 -0700
@@ -207,6 +207,9 @@ static int chk_conflict(struct memtype *
 	return -EBUSY;
 }
 
+static struct memtype *cached_entry;
+static u64 cached_start;
+
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -280,11 +283,17 @@ int reserve_memtype(u64 start, u64 end,
 
 	spin_lock(&memtype_lock);
 
+	if (cached_entry && start >= cached_start)
+		entry = cached_entry;
+	else
+		entry = list_entry(&memtype_list, struct memtype, nd);
+
 	/* Search for existing mapping that overlaps the current range */
 	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
+	list_for_each_entry_continue(entry, &memtype_list, nd) {
 		if (end <= entry->start) {
 			where = entry->nd.prev;
+			cached_entry = list_entry(where, struct memtype, nd);
 			break;
 		} else if (start <= entry->start) { /* end > entry->start */
 			err = chk_conflict(new, entry, new_type);
@@ -292,6 +301,8 @@ int reserve_memtype(u64 start, u64 end,
 				dprintk("Overlap at 0x%Lx-0x%Lx\n",
 					entry->start, entry->end);
 				where = entry->nd.prev;
+				cached_entry = list_entry(where,
+						struct memtype, nd);
 			}
 			break;
 		} else if (start < entry->end) { /* start > entry->start */
@@ -299,7 +310,20 @@ int reserve_memtype(u64 start, u64 end,
 			if (!err) {
 				dprintk("Overlap at 0x%Lx-0x%Lx\n",
 					entry->start, entry->end);
-				where = &entry->nd;
+				cached_entry = list_entry(entry->nd.prev,
+						struct memtype, nd);
+
+				/*
+				 * Move to right position in the linked
+				 * list to add this new entry
+				 */
+				list_for_each_entry_continue(entry,
+						&memtype_list, nd) {
+					if (start <= entry->start) {
+						where = entry->nd.prev;
+						break;
+					}
+				}
 			}
 			break;
 		}
@@ -314,6 +338,8 @@ int reserve_memtype(u64 start, u64 end,
 		return err;
 	}
 
+	cached_start = start;
+
 	if (where)
 		list_add(&new->nd, where);
 	else
@@ -343,6 +369,9 @@ int free_memtype(u64 start, u64 end)
 	spin_lock(&memtype_lock);
 	list_for_each_entry(entry, &memtype_list, nd) {
 		if (entry->start == start && entry->end == end) {
+			if (cached_entry == entry || cached_start == start)
+				cached_entry = NULL;
+
 			list_del(&entry->nd);
 			kfree(entry);
 			err = 0;

2008-08-20 10:05:18

by Ingo Molnar

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)


* Venki Pallipadi <[email protected]> wrote:

> > I'd really like a reply from the AGP or PAT side right about now.
>
> Hmm. Looks like there are more than 16000 entries in the PAT list!

hm, btw., why is that?

Ingo

2008-08-20 10:10:32

by Ingo Molnar

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)


* Venki Pallipadi <[email protected]> wrote:

> Below is the patch I am testing. Let me know if this patch helps.

i've queued this fix up in tip/x86/urgent for more testing - as ~10
second delays are serious enough to warrant a quick fix.

Rene, you might want to try tip/master, which has this integrated as
well:

http://people.redhat.com/mingo/tip.git/README

Ingo

2008-08-20 10:49:23

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 20-08-08 12:04, Ingo Molnar wrote:

> * Venki Pallipadi <[email protected]> wrote:
>
>>> I'd really like a reply from the AGP or PAT side right about now.
>> Hmm. Looks like there are more than 16000 entries in the PAT list!
>
> hm, btw., why is that?

Because 64M of AGP memory divided by 4K pages is 16K. That is, the
underlying problem seems to be AGP drivers using order 0 allocations.
I'm looking.

Do note also that this means that Venki's change would not constitute a
correct/final fix. Sure, caching the last entry speeds up traversing a
16K-entry list, but the issue is that there shouldn't be a 16K-entry
list in the first place -- whether through AGP, or maybe even by
coalescing entries in the PAT list if that's at all possible (I guess
it's not really).

Even if such a more fundamental fix isn't (easily) available, the PAT
code already comments that the list, which is sorted by ->start value,
is expected to be short, and should be turned into an rbtree if it isn't,
which might be slightly less of a band-aid.
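
(For illustration, a minimal sketch of what an rbtree-backed version
could look like against the in-kernel rbtree API; the structure and
function names below are made up and this is not a proposed patch:)

/* Sketch only: memtype ranges keyed by ->start in an rbtree instead of
 * a sorted list, making insert/lookup O(log n). Names are hypothetical. */
#include <linux/rbtree.h>
#include <linux/types.h>
#include <linux/errno.h>

struct memtype_rb {
	u64		start;
	u64		end;
	unsigned long	type;
	struct rb_node	node;
};

static struct rb_root memtype_root = RB_ROOT;

static int memtype_rb_insert(struct memtype_rb *new)
{
	struct rb_node **link = &memtype_root.rb_node;
	struct rb_node *parent = NULL;

	while (*link) {
		struct memtype_rb *cur = rb_entry(*link, struct memtype_rb, node);

		parent = *link;
		if (new->end <= cur->start)
			link = &(*link)->rb_left;
		else if (new->start >= cur->end)
			link = &(*link)->rb_right;
		else
			return -EBUSY;	/* overlap: conflict handling goes here */
	}

	rb_link_node(&new->node, parent, link);
	rb_insert_color(&new->node, &memtype_root);
	return 0;
}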

Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
seems, so I've added Dave Jones for a possible comment from the AGP side.
If I'm reading this right up to now, many AGP drivers (among which my
amd-k7-agp) are still affected.

In the short run and if I'm not just mistaken, the best fix might be to
make PAT dependent on not having a dumb AGP driver (but as said, still
looking).

Note that my chipset is capable of a 2G AGP aperture. That's 512K pages
if fully used, 256K for 1G, 128K for 512M, ...
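
(Spelling the numbers out -- trivial userspace C, one PAT entry per 4K
page of aperture:)

/* One memtype entry per 4K page for various aperture sizes. */
#include <stdio.h>

int main(void)
{
	unsigned long mb[] = { 64, 512, 1024, 2048 };
	int i;

	for (i = 0; i < 4; i++)
		printf("%4luMB aperture -> %4luK pages/entries\n",
		       mb[i], mb[i] * 1024 / 4 / 1024);
	return 0;
}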

Rene.

2008-08-20 14:26:46

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 20-08-08 12:50, Rene Herman wrote:

> On 20-08-08 12:04, Ingo Molnar wrote:
>
>> * Venki Pallipadi <[email protected]> wrote:
>>
>>>> I'd really like a reply from the AGP or PAT side right about now.
>>> Hmm. Looks like there are more than 16000 entries in the PAT list!
>>
>> hm, btw., why is that?
>
> Because 64M of AGP memory divided by 4K pages is 16K. That is, the
> underlying problem seems to be AGP drivers using order 0 allocations.
> I'm looking.
>
> Do note also that this means that Venki's change would not constitite a
> correct/final fix. Sure, caching the last entry speeds up traversing a
> 16K entry list but the issue is that there shouldn't be a 16K entry
> list. Through AGP, or maybe even by coalescing entries in the PAT list
> if that's at all possible (I guess it's not really).
>
> Even if such a more fundamental fix isn't (easily) available, the PAT
> code already comments that the list, which is sorted by ->start value,
> is expected to be short, and should be turned into an rbtree if it isn't
> which might be slightly less of a bandaid.
>
> Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
> seems so I've added Dave Jones for a possible comment from the AGP side.
> If I'm reading this right upto now, still many AGP driver (among which
> my amd-k7-agp) are affected.

This was based on a wrong reading; I was looking at the GATT allocation.

I'm giving up looking until someone can tell me whether or not those 16K
entries are expected though. I have just one AGP card in a PAT capable
machine.

How many entries in /debug/x86/pat_memtype_list are there on other AGP
systems with Option "AGPSize" "64" in their xorg.conf:"Device" section
(and their AGP aperture set to 64M or bigger in the BIOS)?

Rene.

2008-08-20 19:41:55

by Pallipadi, Venkatesh

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Wed, Aug 20, 2008 at 07:27:22AM -0700, Rene Herman wrote:
> On 20-08-08 12:50, Rene Herman wrote:
>
> > On 20-08-08 12:04, Ingo Molnar wrote:
> >
> >> * Venki Pallipadi <[email protected]> wrote:
> >>
> >>>> I'd really like a reply from the AGP or PAT side right about now.
> >>> Hmm. Looks like there are more than 16000 entries in the PAT list!
> >>
> >> hm, btw., why is that?
> >
> > Because 64M of AGP memory divided by 4K pages is 16K. That is, the
> > underlying problem seems to be AGP drivers using order 0 allocations.
> > I'm looking.
> >
> > Do note also that this means that Venki's change would not constitite a
> > correct/final fix. Sure, caching the last entry speeds up traversing a
> > 16K entry list but the issue is that there shouldn't be a 16K entry
> > list. Through AGP, or maybe even by coalescing entries in the PAT list
> > if that's at all possible (I guess it's not really).
> >
> > Even if such a more fundamental fix isn't (easily) available, the PAT
> > code already comments that the list, which is sorted by ->start value,
> > is expected to be short, and should be turned into an rbtree if it isn't
> > which might be slightly less of a bandaid.
> >
> > Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
> > seems so I've added Dave Jones for a possible comment from the AGP side.
> > If I'm reading this right upto now, still many AGP driver (among which
> > my amd-k7-agp) are affected.
>
> This was based on a wrong reading; I was looking at the GATT allocation.
>
> I'm giving up looking until someone can tell me whether or not those 16K
> entries are expected though. I have just one AGP card in a PAT capable
> machine.
>

OK. I have reproduced this list size issue locally, and this order 1
allocation and the set_memory_uc on that allocation are actually coming from
agp_allocate_memory() -> agp_generic_alloc_page() -> map_page_into_agp().
agp_allocate_memory breaks higher order page requests into order 1 allocs.

On my system I see multiple agp_allocate_memory requests for nrpages
8841, 1020, 16, 2160, 2160, 8192. Together they end up resulting in
more than 22K entries in PAT pages.

Thanks,
Venki

2008-08-20 21:02:22

by Dave Airlie

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Wed, Aug 20, 2008 at 8:50 PM, Rene Herman <[email protected]> wrote:
> On 20-08-08 12:04, Ingo Molnar wrote:
>
>> * Venki Pallipadi <[email protected]> wrote:
>>
>>>> I'd really like a reply from the AGP or PAT side right about now.
>>>
>>> Hmm. Looks like there are more than 16000 entries in the PAT list!
>>
>> hm, btw., why is that?
>
> Because 64M of AGP memory divided by 4K pages is 16K. That is, the
> underlying problem seems to be AGP drivers using order 0 allocations. I'm
> looking.
>
> Do note also that this means that Venki's change would not constitite a
> correct/final fix. Sure, caching the last entry speeds up traversing a 16K
> entry list but the issue is that there shouldn't be a 16K entry list.
> Through AGP, or maybe even by coalescing entries in the PAT list if that's
> at all possible (I guess it's not really).
>
> Even if such a more fundamental fix isn't (easily) available, the PAT code
> already comments that the list, which is sorted by ->start value, is
> expected to be short, and should be turned into an rbtree if it isn't which
> might be slightly less of a bandaid.
>
> Dave Airlie (as the MAINTAINERS entry) can't be arsed to answer email it
> seems so I've added Dave Jones for a possible comment from the AGP side.
> If I'm reading this right upto now, still many AGP driver (among which my
> amd-k7-agp) are affected.

I haven't anything to add; I'm the maintainer, not the author, and all
the people who wrote the offending code were already involved.

Dave.
>
> In the short run and if I'm not just mistaken, the best fix might be to make
> PAT dependent on not having a dumb AGP driver (but as said, still looking).
>
> Note that my chipset is capable of a 2G AGP aperture. That's 512K pages if
> fully used, 256K for 1G, 128K for 512M, ...
>
> Rene.
>

2008-08-20 21:16:17

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 20-08-08 23:02, Dave Airlie wrote:

> On Wed, Aug 20, 2008 at 8:50 PM, Rene Herman
> <[email protected]> wrote:
>> On 20-08-08 12:04, Ingo Molnar wrote:
>>
>>> * Venki Pallipadi <[email protected]> wrote:
>>>
>>>>> I'd really like a reply from the AGP or PAT side right about
>>>>> now.
>>>> Hmm. Looks like there are more than 16000 entries in the PAT
>>>> list!
>>> hm, btw., why is that?
>> Because 64M of AGP memory divided by 4K pages is 16K. That is, the
>> underlying problem seems to be AGP drivers using order 0
>> allocations. I'm looking.

[ ... ]

> I haven't anything to add, I'm the maintainer not the author, all the
> people who wrote the offending code were already involved.

The underlying problem is the order 0 allocations (agp_allocate_memory
--> agp_generic_alloc_page) where each single page is set uncached
individually, creating a PAT entry.

Non-order-0 allocations generally would of course help. That's very much
AGP internal -- do you feel that's the way to go?

All the current AGP drivers except sgi-agp use agp_generic_alloc_page().

I'm doing a quick local hack to collect pages in agp_allocate_memory()
into regions and set the regions (generally 1) UC in one fell swoop, but
I don't know if that's safe (and it feels like a rather poor hack anyway).
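
(Roughly the idea, as an untested sketch -- the helper name is made up,
error unwinding is omitted, and whether deferring the per-page UC
conversion like this is actually safe is exactly the open question:)

/* Hypothetical sketch: mark an array of already-allocated pages UC,
 * coalescing physically contiguous runs into a single set_memory_uc()
 * call per run instead of one call (and one PAT entry) per page. */
#include <linux/mm.h>
#include <asm/cacheflush.h>

static int agp_set_pages_uc_coalesced(struct page **pages, int count)
{
	int i = 0;

	while (i < count) {
		unsigned long start = (unsigned long)page_address(pages[i]);
		int run = 1;

		/* extend the run while the next page is physically adjacent */
		while (i + run < count &&
		       page_to_pfn(pages[i + run]) ==
		       page_to_pfn(pages[i + run - 1]) + 1)
			run++;

		set_memory_uc(start, run);	/* one memtype entry per run */
		i += run;
	}
	return 0;
}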

(not to mention that it's time for bed again).

Rene.

2008-08-20 21:40:06

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On 20-08-08 21:41, Venki Pallipadi wrote:

> OK. I have reproduced this list size issue locally and this order 1
> allocation and set_memory_uc on that allocation is actually coming
> from agp_allocate_memory() -> agp_generic_alloc_page() ->
> map_page_into_agp() agp_allocate_memory breaks higher order page
> requests into order 1 allocs.
>
> On my system I see multiple agp_allocate_memory requests for nrpages
> 8841, 1020, 16, 2160, 2160, 8192. Together they end up resulting in
> more than 22K entries in PAT pages.

Okay, thanks for the confirmation.

Now, how to fix...

Firstly, it seems we can conclude that any expectation of a short PAT
list is simply destroyed by AGP. I believe the best thing might be to
look into "fixing" AGP rather than PAT for now?

In a sense the entire purpose of the AGP GART is collecting non-contiguous
pages, but given that in practice it's generally still just one or at most
a few regions, going to multi-page allocs sounds most appetising to me.

All in-tree AGP drivers except sgi-agp use agp_generic_alloc_page(), ali
via m1541_alloc_page and i460 via i460_alloc_page.

Rene.

2008-08-20 21:47:00

by Dave Airlie

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Thu, Aug 21, 2008 at 7:40 AM, Rene Herman <[email protected]> wrote:
> On 20-08-08 21:41, Venki Pallipadi wrote:
>
>> OK. I have reproduced this list size issue locally and this order 1
>> allocation and set_memory_uc on that allocation is actually coming
>> from agp_allocate_memory() -> agp_generic_alloc_page() ->
>> map_page_into_agp() agp_allocate_memory breaks higher order page
>> requests into order 1 allocs.
>>
>> On my system I see multiple agp_allocate_memory requests for nrpages 8841,
>> 1020, 16, 2160, 2160, 8192. Together they end up resulting in more than 22K
>> entries in PAT pages.
>
> Okay, thanks for the confirmation.
>
> Now, how to fix...
>
> Firstly, it seems we can conclude that any expectancy of a short PAT list is
> simply destroyed by AGP. I believe the best thing migh be to look into
> "fixing" AGP rather than PAT for now?
>
> In a sense the entire purpose of the AGP GART is collecting non contiguous
> pages but given that in practice it's generally still just one or at most a
> few regions, going to multi-page allocs sounds most appetising to me.
>
> All in tree AGP drivers except sgi-agp use agp_generic_alloc_page(), ali via
> m1541_alloc_page and i460 via i460_alloc_page.

In the future we will be getting more, smaller AGP allocs, so the other
problem needs a fix as well.

http://git.kernel.org/?p=linux/kernel/git/airlied/agp-2.6.git;a=shortlog;h=agp-pageattr2

contains some code I started on before that moves the interfaces around.
Shaohua has been looking at it, as it needs the changes to the set_pages
interface as well, which is where I ran out of time/steam last time.

However, with alloc/free pages we could change to a higher-order
allocation function, as long as it fell back to lower orders internally.
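
(Along those lines, a hedged sketch -- the function name is made up and
how the pages get wired into the agp_memory structure is glossed over;
it only shows the "try a large order, fall back" part, handing the
result back as order-0 pages:)

/* Hypothetical sketch: allocate 'count' pages, preferring higher-order
 * blocks and falling back to smaller orders (down to order 0) when the
 * allocator cannot satisfy the larger request. */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/errno.h>

static int agp_alloc_pages_fallback(struct page **pages, int count)
{
	int filled = 0;

	while (filled < count) {
		unsigned int order = get_order((count - filled) * PAGE_SIZE);
		struct page *page;
		int i;

		/* don't overshoot the request or the allocator's limit */
		if (order > MAX_ORDER - 1)
			order = MAX_ORDER - 1;
		while (order && (1 << order) > count - filled)
			order--;

		for (;;) {
			page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, order);
			if (page || !order)
				break;
			order--;		/* fall back to a smaller order */
		}
		if (!page)
			return -ENOMEM;		/* even order 0 failed */

		split_page(page, order);	/* callers keep freeing order-0 pages */
		for (i = 0; i < (1 << order); i++)
			pages[filled++] = page + i;
	}
	return 0;
}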

Dave.

>
> Rene.
>

2008-08-20 22:16:45

by Pallipadi, Venkatesh

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Wed, Aug 20, 2008 at 02:46:49PM -0700, Dave Airlie wrote:
> On Thu, Aug 21, 2008 at 7:40 AM, Rene Herman <[email protected]> wrote:
> > On 20-08-08 21:41, Venki Pallipadi wrote:
> >
> >> OK. I have reproduced this list size issue locally and this order 1
> >> allocation and set_memory_uc on that allocation is actually coming
> >> from agp_allocate_memory() -> agp_generic_alloc_page() ->
> >> map_page_into_agp() agp_allocate_memory breaks higher order page
> >> requests into order 1 allocs.
> >>
> >> On my system I see multiple agp_allocate_memory requests for nrpages 8841,
> >> 1020, 16, 2160, 2160, 8192. Together they end up resulting in more than 22K
> >> entries in PAT pages.
> >
> > Okay, thanks for the confirmation.
> >
> > Now, how to fix...
> >
> > Firstly, it seems we can conclude that any expectancy of a short PAT list is
> > simply destroyed by AGP. I believe the best thing migh be to look into
> > "fixing" AGP rather than PAT for now?
> >
> > In a sense the entire purpose of the AGP GART is collecting non contiguous
> > pages but given that in practice it's generally still just one or at most a
> > few regions, going to multi-page allocs sounds most appetising to me.
> >
> > All in tree AGP drivers except sgi-agp use agp_generic_alloc_page(), ali via
> > m1541_alloc_page and i460 via i460_alloc_page.
>
> In the future we will be getting more smaller AGP allocs, so the other
> problem needs a fix as well.
>
> http://git.kernel.org/?p=linux/kernel/git/airlied/agp-2.6.git;a=shortlog;h=agp-pageattr2
>
> contains some code I started on before that moves the interfaces
> around, Shaohua has been looking at
> it as it needs the changes to the set_pages interface as well, which
> is where I ran out of time/steam last time.
>
> However with alloc/free pages we could change to a higher order
> allocation function as long as it fell back to lower
> orders internally.
>

Yes. At least during bootup, we should be able to get higher order pages.
And if we cannot get that, AGP can internally fall back to zero order
pages. That will reduce the number of times set_memory_* gets called,
even without PAT.

We are also looking at changing reserve_memtype in PAT to not use a linked
list for RAM-backed pages and to track them in the page struct instead.
That way we will be using the list only for reserved regions, which should
still be small in number, and the AGP set_memory_* calls will not have the
list overhead.

BTW, Rene: Did the patch from yesterday, caching the last list add, help
in eliminating the slowdown in X startup?

Thanks,
Venki

2008-08-21 03:42:33

by Andi Kleen

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

Venki Pallipadi <[email protected]> writes:
>
> We are also looking at changing the reserve_memtype in PAT, not to use linked
> list for RAM backed pages and track them in page struct.

Back when I hacked on this I explicitly chose not to do this because
it would make it impossible to put any normal anonymous pages into
the PAT list. While that's not done today, there's no reason it couldn't
be done in the future.

Also, it doesn't fix the scalability of the data structure anyway
(a list is a list), it just saves some memory.

-Andi

2008-08-21 12:06:55

by Ingo Molnar

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)


* Venki Pallipadi <[email protected]> wrote:

> BTW, Rene: Did the patch from yday, caching the last list add, help in
> eliminating the slowdown in X startup?

Would be nice to test tip/master - it has both that patch included and
the latest pageattr-array API (with enablement in AGP drivers) patchset,
done by Shaohua Li, based on Dave's original patch.

Ingo

2008-08-21 17:15:15

by Rene Herman

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

From c7f1e98ee8796ca2812363e88dc9ffbf9020528b Mon Sep 17 00:00:00 2001
From: Rene Herman <[email protected]>
Date: Thu, 21 Aug 2008 19:06:36 +0200
Subject: [PATCH] AGP: make AGP drivers use new generic_alloc_pages() interface

The new agp_generic_alloc_pages() interface uses the (also new) pageattr
array interface API. This makes all AGP drivers that up to now used
agp_generic_{alloc,destroy}_page() use it.

Signed-off-by: Rene Herman <[email protected]>
---
drivers/char/agp/alpha-agp.c | 2 ++
drivers/char/agp/amd-k7-agp.c | 2 ++
drivers/char/agp/amd64-agp.c | 2 ++
drivers/char/agp/ati-agp.c | 2 ++
drivers/char/agp/efficeon-agp.c | 2 ++
drivers/char/agp/hp-agp.c | 2 ++
drivers/char/agp/i460-agp.c | 2 ++
drivers/char/agp/nvidia-agp.c | 2 ++
drivers/char/agp/parisc-agp.c | 2 ++
drivers/char/agp/sis-agp.c | 2 ++
drivers/char/agp/sworks-agp.c | 2 ++
drivers/char/agp/uninorth-agp.c | 4 ++++
drivers/char/agp/via-agp.c | 4 ++++
13 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c
index 5da89f6..5ea4da8 100644
--- a/drivers/char/agp/alpha-agp.c
+++ b/drivers/char/agp/alpha-agp.c
@@ -143,7 +143,9 @@ struct agp_bridge_driver alpha_core_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
index e280531..603a986 100644
--- a/drivers/char/agp/amd-k7-agp.c
+++ b/drivers/char/agp/amd-k7-agp.c
@@ -386,7 +386,9 @@ static const struct agp_bridge_driver amd_irongate_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index 7495c52..2812ee2 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -224,7 +224,9 @@ static const struct agp_bridge_driver amd_8151_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/ati-agp.c b/drivers/char/agp/ati-agp.c
index 6ecbcaf..ae2791b 100644
--- a/drivers/char/agp/ati-agp.c
+++ b/drivers/char/agp/ati-agp.c
@@ -418,7 +418,9 @@ static const struct agp_bridge_driver ati_generic_bridge = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/efficeon-agp.c b/drivers/char/agp/efficeon-agp.c
index 8ca6f26..453543a 100644
--- a/drivers/char/agp/efficeon-agp.c
+++ b/drivers/char/agp/efficeon-agp.c
@@ -335,7 +335,9 @@ static const struct agp_bridge_driver efficeon_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/hp-agp.c b/drivers/char/agp/hp-agp.c
index 80d7317..183ac3f 100644
--- a/drivers/char/agp/hp-agp.c
+++ b/drivers/char/agp/hp-agp.c
@@ -435,7 +435,9 @@ const struct agp_bridge_driver hp_zx1_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
diff --git a/drivers/char/agp/i460-agp.c b/drivers/char/agp/i460-agp.c
index e587eeb..10da687 100644
--- a/drivers/char/agp/i460-agp.c
+++ b/drivers/char/agp/i460-agp.c
@@ -575,7 +575,9 @@ const struct agp_bridge_driver intel_i460_driver = {
.insert_memory = i460_insert_memory_small_io_page,
.remove_memory = i460_remove_memory_small_io_page,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
#endif
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
diff --git a/drivers/char/agp/nvidia-agp.c b/drivers/char/agp/nvidia-agp.c
index eaceb61..dc70d37 100644
--- a/drivers/char/agp/nvidia-agp.c
+++ b/drivers/char/agp/nvidia-agp.c
@@ -312,7 +312,9 @@ static const struct agp_bridge_driver nvidia_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/parisc-agp.c b/drivers/char/agp/parisc-agp.c
index 8c42dcc..f2492ec 100644
--- a/drivers/char/agp/parisc-agp.c
+++ b/drivers/char/agp/parisc-agp.c
@@ -224,7 +224,9 @@ static const struct agp_bridge_driver parisc_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
diff --git a/drivers/char/agp/sis-agp.c b/drivers/char/agp/sis-agp.c
index 2587ef9..6c3837a 100644
--- a/drivers/char/agp/sis-agp.c
+++ b/drivers/char/agp/sis-agp.c
@@ -140,7 +140,9 @@ static struct agp_bridge_driver sis_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/sworks-agp.c b/drivers/char/agp/sworks-agp.c
index 2fb27fe..6224df8 100644
--- a/drivers/char/agp/sworks-agp.c
+++ b/drivers/char/agp/sworks-agp.c
@@ -437,7 +437,9 @@ static const struct agp_bridge_driver sworks_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

diff --git a/drivers/char/agp/uninorth-agp.c b/drivers/char/agp/uninorth-agp.c
index eef7270..2accc97 100644
--- a/drivers/char/agp/uninorth-agp.c
+++ b/drivers/char/agp/uninorth-agp.c
@@ -509,7 +509,9 @@ const struct agp_bridge_driver uninorth_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
@@ -534,7 +536,9 @@ const struct agp_bridge_driver u3_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
.needs_scratch_page = true,
diff --git a/drivers/char/agp/via-agp.c b/drivers/char/agp/via-agp.c
index 7b36476..9f4d49e 100644
--- a/drivers/char/agp/via-agp.c
+++ b/drivers/char/agp/via-agp.c
@@ -190,7 +190,9 @@ static const struct agp_bridge_driver via_agp3_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

@@ -214,7 +216,9 @@ static const struct agp_bridge_driver via_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};

--
1.5.5


Attachments:
0001-AGP-make-AGP-drivers-use-new-generic_alloc_pages.patch (9.61 kB)

2008-08-21 21:13:19

by Suresh Siddha

Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

On Wed, Aug 20, 2008 at 08:42:18PM -0700, Andi Kleen wrote:
> Venki Pallipadi <[email protected]> writes:
> >
> > We are also looking at changing the reserve_memtype in PAT, not to use linked
> > list for RAM backed pages and track them in page struct.
>
> Back when I hacked on this I explicitely chose to not do this because
> it would make it impossible to put any normal anonymous pages into
> the PAT list. While that's not done today there's no reason it couldn't
> be done in the future.

Andi, we are planning to add a couple of page flags which will track
the memory attribute of the page. We need to do some checks, like
allowing the memory attribute of the page to be changed only if it is
not mapped anywhere and not on the free lists (like in the X driver case,
where they allocate the page and then change the attribute). Similarly,
in generic -mm, we need to ensure that the page, before it gets added to
the free list, has the right memory attribute, etc. If the driver is
exposing this page with a special attribute, then it is the driver's
responsibility to use the same attribute across all the mappings.
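
(Conceptually something like the following -- a hand-wavy sketch using
a borrowed example bit, not the eventual implementation; it only
illustrates the "refuse unless unmapped and allocated" checks described
above:)

/* Rough sketch only: track "this RAM page has a non-WB memtype" in
 * struct page instead of the global list, leaving the list for reserved
 * (non-RAM) regions. A real version needs more than one bit (WC vs UC-)
 * and proper atomicity rules; PG_arch_1 is used here purely as an example. */
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/bitops.h>
#include <linux/errno.h>

static int reserve_ram_page_type(struct page *page)
{
	/* only an unmapped, allocated page may have its attribute changed
	 * (the X/AGP case: pages straight from alloc_page()) */
	if (page_mapped(page) || page_count(page) == 0)
		return -EINVAL;

	if (test_and_set_bit(PG_arch_1, &page->flags))
		return -EBUSY;		/* a type was already claimed */
	return 0;
}

static void free_ram_page_type(struct page *page)
{
	clear_bit(PG_arch_1, &page->flags);
}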

Is there a reason why this won't work with anonymous pages? Can you please
elaborate.

> Also it doesn't fix the scalability of the data structure anyways
> (a list is a list), just saves some memory.

With this, we will track only the reserved regions using the linked list,
and typically these reserved regions will be small in number (they may be
huge contiguous chunks, but the total number of such chunks will be
reasonably small).

thanks,
suresh

2008-08-21 22:09:28

by Rene Herman

Subject: [PATCH] x86: {reserve,free}_memtype() take a physical address

From 48b9d479e149dffac24f98f9491174fdfc19d26b Mon Sep 17 00:00:00 2001
From: Rene Herman <[email protected]>
Date: Thu, 21 Aug 2008 23:32:25 +0200
Subject: [PATCH] x86: {reserve,free}_memtype() take a physical address

The new set_memory_array_{uc,wb}() pass virtual addresses to
{reserve,free}_memtype() it seems.

Signed-off-by: Rene Herman <[email protected]>
---
arch/x86/mm/pageattr.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index b351c4f..d49e4db 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -947,7 +947,7 @@ int set_memory_array_uc(unsigned long *addr, int addrinarray)
 	 * for now UC MINUS. see comments in ioremap_nocache()
 	 */
 	for (i = 0; i < addrinarray; i++) {
-		if (reserve_memtype(addr[i], addr[i] + PAGE_SIZE,
+		if (reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE,
 				    _PAGE_CACHE_UC_MINUS, NULL))
 			goto out;
 	}
@@ -956,7 +956,7 @@ int set_memory_array_uc(unsigned long *addr, int addrinarray)
 				__pgprot(_PAGE_CACHE_UC_MINUS), 1);
 out:
 	while (--i >= 0)
-		free_memtype(addr[i], addr[i] + PAGE_SIZE);
+		free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);
 	return -EINVAL;
 }
 EXPORT_SYMBOL(set_memory_array_uc);
@@ -998,7 +998,7 @@ int set_memory_array_wb(unsigned long *addr, int addrinarray)
 {
 	int i;
 	for (i = 0; i < addrinarray; i++)
-		free_memtype(addr[i], addr[i] + PAGE_SIZE);
+		free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);
 
 	return change_page_attr_clear(addr, addrinarray,
 					__pgprot(_PAGE_CACHE_MASK), 1);
--
1.5.5


Attachments:
0001-x86-reverve-free-_memtype-take-a-physical-addres.patch (1.59 kB)

2008-08-21 22:16:58

by Pallipadi, Venkatesh

Subject: RE: [PATCH] x86: {reserve,free}_memtype() take a physical address


>-----Original Message-----
>From: Rene Herman [mailto:[email protected]]
>Sent: Thursday, August 21, 2008 3:10 PM
>To: Ingo Molnar; Li, Shaohua
>Cc: Pallipadi, Venkatesh; Dave Airlie; Yinghai Lu; Andreas
>Herrmann; Arjan van de Ven; Linux Kernel; Siddha, Suresh B;
>Thomas Gleixner; H. Peter Anvin; Dave Jones
>Subject: [PATCH] x86: {reverve,free}_memtype() take a physical address
>
>On 21-08-08 19:15, Rene Herman wrote:
>
>> On 21-08-08 14:06, Ingo Molnar wrote:
>
>>> Would be nice to test tip/master - it has both that patch
>included and
>>> the latest pageattr-array API (with enablement in AGP drivers)
>>> patchset, done by Shaohua Li, based on Dave's original patch.
>>
>> That patch by itself doesn't help any -- the new
>set_memory_array_uc()
>> still calls reserve_memtype() for each single page in the array.
>
>Worse yet, it appears to be broken. {reserve,free}_memtype()
>expect phys
>addresses but it's being passed virtual ones...
>
>Rene.
>

Yes. Noticed that too and sent a patch here for x86/tip.

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/2270.html

It is not as critical as it sounds, as only set_memory_uc sets PAT
bits for RAM pages. Most other users (devmem mmap, ioremap, pci)
set PAT bits on reserved memory. And there will not be conflicts
across RAM and reserved regions. Regardless, this was a stupid
bug that we had missed earlier.

Thanks,
Venki

2008-08-21 22:25:59

by Rene Herman

Subject: Re: [PATCH] x86: {reserve,free}_memtype() take a physical address

On 22-08-08 00:16, Pallipadi, Venkatesh wrote:

> Yes. Noticed that too and sent a patch here for x86/tip.
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/2270.html
>
> It is not as critical as it sounds, as only set_memory_uc sets PAT
> bits for RAM pages. Most other users (devmem mmap, ioremap, pci)
> set PAT bits on the reserved memory, and there will not be conflicts
> across RAM and reserved regions. Regardless, this was a stupid
> bug that we had missed earlier.

And unfortunately I don't think the above fully fixes it for AGP. __pa()
gets the real physical address and the memtypes should be on the GART
remapped physical addresses it seems.

Rene.

2008-08-21 22:58:28

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: [PATCH] x86: {reverve,free}_memtype() take a physical address



>-----Original Message-----
>From: Rene Herman [mailto:[email protected]]
>Sent: Thursday, August 21, 2008 3:27 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Li, Shaohua; Dave Airlie; Yinghai Lu; Andreas
>Herrmann; Arjan van de Ven; Linux Kernel; Siddha, Suresh B;
>Thomas Gleixner; H. Peter Anvin; Dave Jones
>Subject: Re: [PATCH] x86: {reverve,free}_memtype() take a
>physical address
>
>On 22-08-08 00:16, Pallipadi, Venkatesh wrote:
>
>> Yes. Noticed that too and sent a patch here for x86/tip.
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/2270.html
>>
>> It is not as critical as it sounds, as only set_memory_uc sets PAT
>> bits for RAM pages. Most other users (devmem mmap, ioremap, pci)
>> set PAT bits on the reserved memory, and there will not be conflicts
>> across RAM and reserved regions. Regardless, this was a stupid
>> bug that we had missed earlier.
>
>And unfortunately I don't think the above fully fixes it for AGP. __pa()
>gets the real physical address and the memtypes should be on the GART
>remapped physical addresses it seems.
>

The page being marked uncached here is the page obtained from alloc_page().
We are not really marking the GART physical address as uncacheable, and
that page returned from alloc_page is what we are tracking with
reserve and free.
IOW, the tracking is only there to keep CPU accesses consistent across
different va->pa mappings and across different CPUs, and has nothing to do
with the GART physical address here.

Thanks,
Venki
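
In other words, two different addresses are in play: the real physical
address of the backing page (what the CPU sees through its kernel mapping,
and what PAT tracks), and the aperture address that the GART translates to
it. A rough sketch of the distinction; gart_map_page() is a hypothetical
stand-in for the chipset-specific GART programming:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <asm/cacheflush.h>
#include <asm/io.h>

/* Hypothetical stand-in for the chipset-specific GART setup. */
extern void gart_map_page(unsigned long gart_offset, u64 backing_phys);

static struct page *alloc_agp_backing_page(unsigned long gart_offset)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return NULL;

	/* PAT tracks the real physical address behind this kernel
	 * virtual address -- the CPU side of the page. */
	if (set_memory_uc((unsigned long)page_address(page), 1)) {
		__free_page(page);
		return NULL;
	}

	/* The aperture address is a separate translation done by the
	 * GART; it is not what reserve_memtype() records. */
	gart_map_page(gart_offset, page_to_phys(page));
	return page;
}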

2008-08-21 23:01:22

by Rene Herman

[permalink] [raw]
Subject: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

From b5dc6e481b38cf4e7792bcb9a8f5dd9aab0e5590 Mon Sep 17 00:00:00 2001
From: Rene Herman <[email protected]>
Date: Fri, 22 Aug 2008 00:56:00 +0200
Subject: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

---
arch/x86/mm/pageattr.c | 38 ++++++++++++++++++++++++++++++++------
1 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d49e4db..a2a497a 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -942,21 +942,38 @@ EXPORT_SYMBOL(set_memory_uc);

int set_memory_array_uc(unsigned long *addr, int addrinarray)
{
+ unsigned long start;
+ unsigned long end;
int i;
/*
* for now UC MINUS. see comments in ioremap_nocache()
*/
for (i = 0; i < addrinarray; i++) {
- if (reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE,
- _PAGE_CACHE_UC_MINUS, NULL))
+ start = __pa(addr[i]);
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL))
goto out;
}

return change_page_attr_set(addr, addrinarray,
__pgprot(_PAGE_CACHE_UC_MINUS), 1);
out:
- while (--i >= 0)
- free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);
+ for (i = 0; i < addrinarray; i++) {
+ unsigned long tmp = __pa(addr[i]);
+
+ if (tmp == start)
+ break;
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(tmp, end);
+ }
return -EINVAL;
}
EXPORT_SYMBOL(set_memory_array_uc);
@@ -997,9 +1014,18 @@ EXPORT_SYMBOL(set_memory_wb);
int set_memory_array_wb(unsigned long *addr, int addrinarray)
{
int i;
- for (i = 0; i < addrinarray; i++)
- free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);

+ for (i = 0; i < addrinarray; i++) {
+ unsigned long start = __pa(addr[i]);
+ unsigned long end;
+
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(start, end);
+ }
return change_page_attr_clear(addr, addrinarray,
__pgprot(_PAGE_CACHE_MASK), 1);
}
--
1.5.5


Attachments:
0001-x86-have-set_memory_array_-uc-wb-coalesce-memtypes.patch (2.18 kB)

2008-08-21 23:05:38

by Rene Herman

[permalink] [raw]
Subject: Re: [PATCH] x86: {reverve,free}_memtype() take a physical address

On 22-08-08 00:57, Pallipadi, Venkatesh wrote:
>
>> -----Original Message-----
>> From: Rene Herman [mailto:[email protected]]
>> Sent: Thursday, August 21, 2008 3:27 PM
>> To: Pallipadi, Venkatesh
>> Cc: Ingo Molnar; Li, Shaohua; Dave Airlie; Yinghai Lu; Andreas
>> Herrmann; Arjan van de Ven; Linux Kernel; Siddha, Suresh B;
>> Thomas Gleixner; H. Peter Anvin; Dave Jones
>> Subject: Re: [PATCH] x86: {reverve,free}_memtype() take a
>> physical address
>>
>> On 22-08-08 00:16, Pallipadi, Venkatesh wrote:
>>
>>> Yes. Noticed that too and sent a patch here for x86/tip.
>>>
>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0808.2/2270.html
>>>
>>> It is not as critical as it sounds, as only set_memory_uc sets PAT
>>> bits for RAM pages. Most other users (devmem mmap, ioremap, pci)
>>> set PAT bits on the reserved memory, and there will not be conflicts
>>> across RAM and reserved regions. Regardless, this was a stupid
>>> bug that we had missed earlier.
>> And unfortunately I don't think the above fully fixes it for AGP. __pa()
>> gets the real physical address and the memtypes should be on the GART
>> remapped physical addresses it seems.
>>
>
> The page being marked uncached here is the page obtained from alloc_page().
> We are not really marking the GART physical address as uncacheable, and
> that page returned from alloc_page is what we are tracking with
> reserve and free.
> IOW, the tracking is only there to keep CPU accesses consistent across
> different va->pa mappings and across different CPUs, and has nothing to do
> with the GART physical address here.

Okay, if you say so... before this array change to AGP it _used_ to be
that the GART addresses were in the memtype list, but I'll take your
word for that being okay.

Rene

2008-08-22 02:10:30

by Andi Kleen

[permalink] [raw]
Subject: Re: AGP and PAT (induced?) problem (on AMD family 6)

> Andi, we are planning to add a couple of page flags which will track

Page flags are unfortunately still scarce, at least on 32-bit.
It's a little better now than it used to be (at some point
they were nearly out before some were reclaimed), but adding a lot
of flags is still difficult.

In the interest of full disclosure, I need at least two for other
work too.

On 64-bit adding a lot of new flags is fine, but 64-bit-only
solutions are not good in this case.

> the memory attribute of the page. We need to do some checks like:
> allow the memory attribute of the page to be changed only if it is not
> mapped anywhere and not on the free lists (like in the X driver case,
> where they allocate the page and then change the attribute). Similarly,
> in generic -mm, we need to ensure that the page, before it gets added to the
> free list, has the right memory attribute etc.

You want to handle that in __free_pages?

I would have thought that should be handled in some higher level
function which could just check the memattr.

> If the driver is exposing this page with a special attribute, then it is
> the driver's responsibility to use the same attribute across all the mappings.
>
> Is there a reason why this won't work with anonymous pages? Can you please
> elaborate.

The issue is just if you reuse the two list heads in struct page
because they're already used by the

Adding flags does not conflict here of course.

> > Also it doesn't fix the scalability of the data structure anyways
> > (a list is a list), just saves some memory.
>
> With this, we will track only the reserved regions using the linked list
> and typically these reserved regions will be small in number (they may be huge
> contiguous chunks, but the total number of such chunks will be reasonably small).

Reserved region defined how exactly?

-Andi
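
The kind of check being discussed might look roughly like the sketch
below. page_memtype(), set_page_memtype() and free_tracked_page() are
hypothetical names standing in for the per-page attribute state (flags or
otherwise) under discussion; the point is only where such state would be
consulted before a page goes back to the free lists:

#include <linux/mm.h>
#include <asm/cacheflush.h>
#include <asm/pgtable.h>

/* Hypothetical per-page attribute accessors, as would be backed by the
 * new page flags (or another per-page field) being discussed. */
extern int page_memtype(struct page *page);		/* _PAGE_CACHE_* */
extern void set_page_memtype(struct page *page, int type);

/* Sketch: make sure a page is write-back again before it is freed, so
 * that pages with a stale UC/WC attribute never reach the free lists. */
static void free_tracked_page(struct page *page)
{
	if (page_memtype(page) != _PAGE_CACHE_WB) {
		set_memory_wb((unsigned long)page_address(page), 1);
		set_page_memtype(page, _PAGE_CACHE_WB);
	}
	__free_page(page);
}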

2008-08-22 04:16:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.


* Rene Herman <[email protected]> wrote:

> Actually, might as well simply reconstruct the memtype list at free
> time I guess. How is this for a coalescing version of the array
> functions?

impressive! Rarely do we get this much bang for such a low linecount :-)

> NOTE: I am posting this because I'm going to bed but haven't stared
> comfortably long at this and might be buggy. Compiles, boots and
> provides me with:
>
> root@7ixe4:~# wc -l /debug/x86/pat_memtype_list
> 53 /debug/x86/pat_memtype_list
>
> otherwise (down from 16384+).
>
> <snore>

cool!

I'd do this in v2.6.27 but i forced myself to be reasonable and applied
your patches to tip/x86/pat instead, for tentative v2.6.28 merging
(assuming it all passes testing, etc.):

# 9a79f4f: x86: {reverve,free}_memtype() take a physical address
# c5e147c: x86: have set_memory_array_{uc,wb} coalesce memtypes.
# 5f310b6: agp: enable optimized agp_alloc_pages methods

( note that i flipped them around a bit and have put your
enable-agp_alloc_pages()-widely patch last, so that we get better
bisection behavior. )

The frontside cache itself is in x86/urgent:

# 80c5e73: x86: fix Xorg startup/shutdown slowdown with PAT

... and should at least solve the symptom that you've hit in practice
(the slowdown), without changing the underlying PAT machinery. (which
would be way too dangerous for v2.6.27)

And it's all merged up in tip/master, you might want to test that too to
check whether all the pieces fit together nicely.

Tens of thousands of page-granular memtypes were Not Nice.

Venki, Suresh, Shaohua, Dave, Arjan - any observations about this line
of action?

Ingo

2008-08-22 19:08:33

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: Re: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

On Thu, Aug 21, 2008 at 09:15:44PM -0700, Ingo Molnar wrote:
>
> * Rene Herman <[email protected]> wrote:
>
> > Actually, might as well simply reconstruct the memtype list at free
> > time I guess. How is this for a coalescing version of the array
> > functions?
>
> impressive! Rarely do we get this much bang for such a low linecount :-)
>
> > NOTE: I am posting this because I'm going to bed but haven't stared
> > comfortably long at this and might be buggy. Compiles, boots and
> > provides me with:
> >
> > root@7ixe4:~# wc -l /debug/x86/pat_memtype_list
> > 53 /debug/x86/pat_memtype_list
> >
> > otherwise (down from 16384+).
> >
> > <snore>
>
> cool!
>
> I'd do this in v2.6.27 but i forced myself to be reasonable and applied
> your patches to tip/x86/pat instead, for tentative v2.6.28 merging
> (assuming it all passes testing, etc.):
>
> # 9a79f4f: x86: {reverve,free}_memtype() take a physical address
> # c5e147c: x86: have set_memory_array_{uc,wb} coalesce memtypes.
> # 5f310b6: agp: enable optimized agp_alloc_pages methods
>
> ( note that i flipped them around a bit and have put your
> enable-agp_alloc_pages()-widely patch last, so that we get better
> bisection behavior. )
>
> The frontside cache itself is in x86/urgent:
>
> # 80c5e73: x86: fix Xorg startup/shutdown slowdown with PAT
>
> ... and should at least solve the symptom that you've hit in practice
> (the slowdown), without changing the underlying PAT machinery. (which
> would be way too dangerous for v2.6.27)
>
> And it's all merged up in tip/master, you might want to test that too to
> check whether all the pieces fit together nicely.
>
> Tens of thousands of page-granular memtypes were Not Nice.
>
> Venki, Suresh, Shaohua, Dave, Arjan - any observations about this line
> of action?

The concern I have here is that the coalescing is not guaranteed to work.
We may still end up with horrible worst-case latency, even though this
improves the normal case (boot the system, start X, exit X, reboot the system).
It depends on how pages are allocated, how much memory there is in the
system, what else is running, etc.
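
The dependence on the allocation pattern follows directly from the
coalescing rule: only physically adjacent pages fold into one entry, so
the worst case is still one entry per page. A small, self-contained
counting sketch (plain C, mirroring the coalescing loop in the patch):

#include <stddef.h>

#define PAGE_SIZE 4096UL	/* userspace illustration; 4 KiB pages assumed */

/* Count how many ranges remain after folding physically adjacent pages
 * together, the way set_memory_array_{uc,wb}() do above.  With fully
 * scattered pages this still returns n -- one entry per page. */
static size_t count_coalesced_ranges(const unsigned long *pa, size_t n)
{
	size_t i, ranges = 0;

	for (i = 0; i < n; i++) {
		ranges++;
		while (i + 1 < n && pa[i] + PAGE_SIZE == pa[i + 1])
			i++;
	}
	return ranges;
}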

Here on my test system, without this coalescing change I see

[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
19528

With the coalescing change I see
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
135

quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
985
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1468
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1749
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
1916
quit and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
2085

untar a kernel tar.bz2 and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
2737

compile the kernel and restart X
[root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
3089


I think this is a good workaround for now. But for the long run, we still need to
look at other ways of eliminating this overhead (like using the page struct
approach that Suresh mentioned in the other thread).


Also, there seems to be a bug in the error path of the patch: the inner
coalescing loop there computes `end' from the leftover `start' of the range
that failed to reserve, instead of from the current range's own start (tmp).
Below should fix it.

Thanks,
Venki

Fix the start addr for free_memtype calls in the error path.

Signed-off-by: Venkatesh Pallipadi <[email protected]>

---
arch/x86/mm/pageattr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: t/arch/x86/mm/pageattr.c
===================================================================
--- t.orig/arch/x86/mm/pageattr.c 2008-08-22 10:38:35.000000000 -0700
+++ t/arch/x86/mm/pageattr.c 2008-08-22 11:27:27.000000000 -0700
@@ -967,7 +967,7 @@ out:

if (tmp == start)
break;
- for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ for (end = tmp + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
if (end != __pa(addr[i + 1]))
break;
i++;

2008-08-22 20:02:31

by Rene Herman

[permalink] [raw]
Subject: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

From 1a9bfbada3769e1bd9eecddd43ade9ebc4671c3d Mon Sep 17 00:00:00 2001
From: Rene Herman <[email protected]>
Date: Fri, 22 Aug 2008 21:27:45 +0200
Subject: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

Have set_memory_array_{uc,wb}() coalesce memtype entries so as to
avoid accumulating needlessly many of them.

Especially in the case of AGP with its order 0 allocations for the
AGP memory the memtype list otherwise ends up with tens of thousands
of entries, slowing processing to a crawl. With this (and the former
changes to make AGP use this interface) AGP memory will just take as
many entries as needed -- which often means just one.

Note that the error path in set_memory_array_uc() just reconstructs
the entries from start again: no need to keep an expensive list of
regions around for an error condition which isn't going to happen.

Signed-off-by: Rene Herman <[email protected]>
---
arch/x86/mm/pageattr.c | 38 ++++++++++++++++++++++++++++++++------
1 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d49e4db..c7563b6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -942,21 +942,38 @@ EXPORT_SYMBOL(set_memory_uc);

int set_memory_array_uc(unsigned long *addr, int addrinarray)
{
+ unsigned long start;
+ unsigned long end;
int i;
/*
* for now UC MINUS. see comments in ioremap_nocache()
*/
for (i = 0; i < addrinarray; i++) {
- if (reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE,
- _PAGE_CACHE_UC_MINUS, NULL))
+ start = __pa(addr[i]);
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL))
goto out;
}

return change_page_attr_set(addr, addrinarray,
__pgprot(_PAGE_CACHE_UC_MINUS), 1);
out:
- while (--i >= 0)
- free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);
+ for (i = 0; i < addrinarray; i++) {
+ unsigned long tmp = __pa(addr[i]);
+
+ if (tmp == start)
+ break;
+ for (end = tmp + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(tmp, end);
+ }
return -EINVAL;
}
EXPORT_SYMBOL(set_memory_array_uc);
@@ -997,9 +1014,18 @@ EXPORT_SYMBOL(set_memory_wb);
int set_memory_array_wb(unsigned long *addr, int addrinarray)
{
int i;
- for (i = 0; i < addrinarray; i++)
- free_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE);

+ for (i = 0; i < addrinarray; i++) {
+ unsigned long start = __pa(addr[i]);
+ unsigned long end;
+
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(start, end);
+ }
return change_page_attr_clear(addr, addrinarray,
__pgprot(_PAGE_CACHE_MASK), 1);
}
--
1.5.5


Attachments:
0001-x86-have-set_memory_array_-uc-wb-coalesce-memtypes.patch (2.85 kB)

2008-08-22 20:14:32

by Rene Herman

[permalink] [raw]
Subject: Re: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.

On 22-08-08 21:08, Venki Pallipadi wrote:

> On Thu, Aug 21, 2008 at 09:15:44PM -0700, Ingo Molnar wrote:

>> Venki, Suresh, Shaohua, Dave, Arjan - any observations about this
>> line of action?
>
> The concern I have here is that the coalescing is not guaranteed to
> work. We may still end up with horrible worst-case latency, even
> though this improves the normal case (boot the system, start X, exit
> X, reboot the system). It depends on how pages are allocated, how much
> memory there is in the system, what else is running, etc.

Yes, I agree. Independent of the current trigger, PAT wants a more
scalable approach and yes, the worst case is still single-page entries.

That worst case is the guaranteed case now though, so I do feel it's a
generic fix. After all, there wouldn't seem to be a reason to _not_
coalesce in set_memory_array_{uc,wb}().

> Here on my test system, without this coalescing change I see
>
> [root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
> 19528
>
> With the coalescing change I see
> [root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
> 135
>
> quit and restart X
> [root@localhost ~]# cat /proc/sys/debug/x86/pat_memtype_list | wc -l
> 985

[ constantly growing number of entries ]

Yes, absolutely right, PAT definitely needs something other than the
simple linked list. I do believe we also want the coalescing change
though - it seems to make sense regardless of the trigger, and it's only
a little code.

> I think this is a good workaround for now. But for the long run, we still need to
> look at other ways of eliminating this overhead (like using the page struct
> approach that Suresh mentioned in the other thread).
>
>
> Also, there seems to be a bug in the error path of the patch. Below should
> fix it.

Ah, yes, thanks, just sent out a final version with this fixed as well.

Rene.

2008-08-23 15:34:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: have set_memory_array_{uc,wb} coalesce memtypes.


* Rene Herman <[email protected]> wrote:

>> Also, there seems to be a bug in the error path of the patch. Below
>> should fix it.
>
> Ah, yes, thanks, just sent out a final version with this fixed as
> well.

applied to tip/x86/pat, thanks.

Ingo