2008-03-25 22:29:22

by Andrew Morton

Subject: Re: [Bug 10328] New: [regression] performance drop for glx

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 25 Mar 2008 15:11:15 -0700 (PDT)
[email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10328
>
> Summary: [regression] performance drop for glx
> Product: Memory Management
> Version: 2.5
> KernelVersion: 2.6.25-rc6
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: [email protected]
> ReportedBy: [email protected]
>
>
> After commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 I experience some
> graphics-related performance issues.
>
> I used glxgears for testing.
> before this patch: 1281.005 FPS
> and after: 765.000 FPS

It nearly halved.

> The latest tested commit, a4083c9271e0a697278e089f2c0b9a95363ada0a,
> still has bad performance.
>
> I use a Pentium D with 2GB RAM; graphics: i945G, ICH7
>

That's

: commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5
: Author: Siddha, Suresh B <[email protected]>
: Date: Wed Jan 30 13:33:43 2008 +0100
:
: x86: set strong uncacheable where UC is really desired
:
: Also use _PAGE_PWT for all the mappings which need uncache mapping.
: Instead of existing PAT2 which is UC- (and can be overwritten by MTRRs),
: we now use PAT3 which is strong uncacheable.
:
: This makes it consistent with pgprot_noncached()
:
: Signed-off-by: Suresh Siddha <[email protected]>
: Signed-off-by: Ingo Molnar <[email protected]>
: Signed-off-by: Thomas Gleixner <[email protected]>
:


2008-03-26 00:42:45

by Suresh Siddha

Subject: Re: [Bug 10328] New: [regression] performance drop for glx

On Tue, Mar 25, 2008 at 03:28:09PM -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT)
> [email protected] wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=10328
> >
> > Summary: [regression] performance drop for glx
> >
> > After commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 I experience some
> > graphics-related performance issues.
> >
> > I used glxgears for testing.
> > before this patch: 1281.005 FPS
> > and after: 765.000 FPS
>
> It nearly halved.
>
> > The latest tested commit, a4083c9271e0a697278e089f2c0b9a95363ada0a,
> > still has bad performance.
> >
> > I use a Pentium D with 2GB RAM; graphics: i945G, ICH7
> >
>
> That's
>
> : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5
> : Author: Siddha, Suresh B <[email protected]>
> : Date: Wed Jan 30 13:33:43 2008 +0100
> :
> : x86: set strong uncacheable where UC is really desired
> :
> : Also use _PAGE_PWT for all the mappings which need uncache mapping.
> : Instead of existing PAT2 which is UC- (and can be overwritten by MTRRs),
> : we now use PAT3 which is strong uncacheable.
> :
> : This makes it consistent with pgprot_noncached()

Alexey, Can you please try the appended patch?

Andrew, can you please push the appended patch for 2.6.25? Thanks.
---

fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.

The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
fb drivers to use this new ioremap_wc() API.

We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.

Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Venkatesh Pallipadi <[email protected]>
Cc: Arjan van de Ven <[email protected]>
---

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4afaba0..794895c 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -137,7 +137,11 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size,
 	switch (mode) {
 	case IOR_MODE_UNCACHED:
 	default:
-		prot = PAGE_KERNEL_NOCACHE;
+		/*
+		 * FIXME: we will use UC MINUS for now, as video fb drivers
+		 * depend on it. Upcoming ioremap_wc() will fix this behavior.
+		 */
+		prot = PAGE_KERNEL_UC_MINUS;
 		break;
 	case IOR_MODE_CACHED:
 		prot = PAGE_KERNEL;
diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 174b877..9cf472a 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -85,6 +85,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW)
 #define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE)
@@ -101,6 +102,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX)
 #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_UC_MINUS MAKE_GLOBAL(__PAGE_KERNEL_UC_MINUS)
 #define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE)
 #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
 #define PAGE_KERNEL_LARGE_EXEC MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC)

2008-03-26 04:42:34

by Arjan van de Ven

Subject: Re: [Bug 10328] New: [regression] performance drop for glx

Suresh Siddha wrote:
> On Tue, Mar 25, 2008 at 03:28:09PM -0700, Andrew Morton wrote:
>> (switched to email. Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT)
>> [email protected] wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=10328
>>>
>>> Summary: [regression] performance drop for glx
>>>
>>> After commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 I experience some
>>> graphics-related performance issues.
>>>
>>> I used glxgears for testing.
>>> before this patch: 1281.005 FPS
>>> and after: 765.000 FPS
>> It nearly halved.
>>
>>> The latest tested commit, a4083c9271e0a697278e089f2c0b9a95363ada0a,
>>> still has bad performance.
>>>
>>> I use a Pentium D with 2GB RAM; graphics: i945G, ICH7
>>>
>> That's
>>
>> : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5
>> : Author: Siddha, Suresh B <[email protected]>
>> : Date: Wed Jan 30 13:33:43 2008 +0100
>> :
>> : x86: set strong uncacheable where UC is really desired
>> :
>> : Also use _PAGE_PWT for all the mappings which need uncache mapping.
>> : Instead of existing PAT2 which is UC- (and can be overwritten by MTRRs),
>> : we now use PAT3 which is strong uncacheable.
>> :
>> : This makes it consistent with pgprot_noncached()
>
> Alexey, Can you please try the appended patch?
>
> Andrew, can you please push the appended patch for 2.6.25? Thanks.
> ---
>
> fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
> WC attribute. Recent changes in page attribute code made both
> ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
> breaks the graphics performance, as the effective memory type is UC instead
> of expected WC.
>
> The correct way to fix this is to add ioremap_wc() (which uses UC- in the
> absence of PAT kernel support and WC with PAT) and change all the
> fb drivers to use this new ioremap_wc() API.
>
> We can take this correct and longer route for post 2.6.25. For now,
> revert back to the UC- behavior for ioremap/ioremap_nocache.

I would still like to add an ioremap_wc() even in 2.6.25, even if for now it is
identical to ioremap_nocache(). Better to get the right API in place as soon as possible.

2008-03-26 05:29:49

by Ingo Molnar

Subject: Re: [Bug 10328] New: [regression] performance drop for glx


* Suresh Siddha <[email protected]> wrote:

> fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add
> with WC attribute. Recent changes in page attribute code made both
> ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-).
> This breaks the graphics performance, as the effective memory type is
> UC instead of expected WC.
>
> The correct way to fix this is to add ioremap_wc() (which uses UC- in
> the absence of PAT kernel support and WC with PAT) and change all the
> fb drivers to use this new ioremap_wc() API.
>
> We can take this correct and longer route for post 2.6.25. For now,
> revert back to the UC- behavior for ioremap/ioremap_nocache.

thanks Suresh, applied.

Ingo

2008-03-26 17:55:26

by Suresh Siddha

Subject: Re: [Bug 10328] New: [regression] performance drop for glx

On Wed, Mar 26, 2008 at 06:29:11AM +0100, Ingo Molnar wrote:
>
> * Suresh Siddha <[email protected]> wrote:
>
> > fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add
> > with WC attribute. Recent changes in page attribute code made both
> > ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-).
> > This breaks the graphics performance, as the effective memory type is
> > UC instead of expected WC.
> >
> > The correct way to fix this is to add ioremap_wc() (which uses UC- in
> > the absence of PAT kernel support and WC with PAT) and change all the
> > fb drivers to use this new ioremap_wc() API.
> >
> > We can take this correct and longer route for post 2.6.25. For now,
> > revert back to the UC- behavior for ioremap/ioremap_nocache.
>
> thanks Suresh, applied.

Well, we need to take care of set_memory_uc() as well, as the previous version
didn't fix Alexey's issue.

Alexey, can you please test and ack this, before Andrew/Ingo can push relevant
bits to their trees. Thanks.
---

fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.

The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
drivers to use this new ioremap_wc() API.

We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.

Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Venkatesh Pallipadi <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Alexey Fisher <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 4afaba0..794895c 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -137,7 +137,11 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size,
 	switch (mode) {
 	case IOR_MODE_UNCACHED:
 	default:
-		prot = PAGE_KERNEL_NOCACHE;
+		/*
+		 * FIXME: we will use UC MINUS for now, as video fb drivers
+		 * depend on it. Upcoming ioremap_wc() will fix this behavior.
+		 */
+		prot = PAGE_KERNEL_UC_MINUS;
 		break;
 	case IOR_MODE_CACHED:
 		prot = PAGE_KERNEL;
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 14e48b5..7b79f6b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -771,7 +771,7 @@ static inline int change_page_attr_clear(unsigned long addr, int numpages,
 int set_memory_uc(unsigned long addr, int numpages)
 {
 	return change_page_attr_set(addr, numpages,
-				    __pgprot(_PAGE_PCD | _PAGE_PWT));
+				    __pgprot(_PAGE_PCD));
 }
 EXPORT_SYMBOL(set_memory_uc);

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 174b877..9cf472a 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -85,6 +85,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW)
 #define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD)
 #define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER)
 #define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
 #define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE)
@@ -101,6 +102,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX)
 #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_UC_MINUS MAKE_GLOBAL(__PAGE_KERNEL_UC_MINUS)
 #define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE)
 #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
 #define PAGE_KERNEL_LARGE_EXEC MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC)