2014-10-27 23:09:44

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 0/7] Support Write-Through mapping on x86

This patchset adds support of Write-Through (WT) mapping on x86.
The study below shows that using WT mapping may be useful for
non-volatile memory.

http://www.hpl.hp.com/techreports/2012/HPL-2012-236.pdf

This patchset applies on top of the Juergen's patchset below,
which provides the basis of the PAT management.

https://lkml.org/lkml/2014/10/27/71

All new/modified interfaces have been tested.

v4:
- Added set_memory_wt() by adding WT support of regular memory.

v3:
- Dropped the set_memory_wt() patch. (Andy Lutomirski)
- Refactored the !pat_enabled handling. (H. Peter Anvin,
Andy Lutomirski)
- Added the picture of PTE encoding. (Konrad Rzeszutek Wilk)

v2:
- Changed WT to use slot 7 of the PAT MSR. (H. Peter Anvin,
Andy Lutomirski)
- Changed to have conservative checks to exclude all Pentium 2, 3,
M, and 4 families. (Ingo Molnar, Henrique de Moraes Holschuh,
Andy Lutomirski)
- Updated documentation to cover WT interfaces and usages.
(Andy Lutomirski, Yigal Korman)

---
Toshi Kani (7):
1/7 x86, mm, pat: Set WT to PA7 slot of PAT MSR
2/7 x86, mm, pat: Change reserve_memtype() to handle WT
3/7 x86, mm, asm-gen: Add ioremap_wt() for WT
4/7 x86, mm, pat: Add pgprot_writethrough() for WT
5/7 x86, mm, pat: Refactor !pat_enable handling
6/7 x86, mm, asm: Add WT support to set_page_memtype()
7/7 x86, mm: Add set_memory_wt() for WT

---
Documentation/x86/pat.txt | 13 ++-
arch/x86/include/asm/cacheflush.h | 34 ++++---
arch/x86/include/asm/io.h | 2 +
arch/x86/include/asm/pgtable_types.h | 3 +
arch/x86/mm/init.c | 6 +-
arch/x86/mm/iomap_32.c | 18 ++--
arch/x86/mm/ioremap.c | 26 +++++-
arch/x86/mm/pageattr.c | 61 ++++++++++--
arch/x86/mm/pat.c | 174 +++++++++++++++++++++--------------
include/asm-generic/io.h | 4 +
include/asm-generic/iomap.h | 4 +
include/asm-generic/pgtable.h | 4 +
12 files changed, 233 insertions(+), 116 deletions(-)


2014-10-27 23:09:51

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 2/7] x86, mm, pat: Change reserve_memtype() to handle WT

This patch changes reserve_memtype() to handle the WT cache mode.
When PAT is not enabled, it continues to set UC- to *new_type for
any non-WB request.

When a target range is RAM, reserve_ram_pages_type() fails for WT
for now. This function may not reserve a RAM range for WT since
reserve_ram_pages_type() uses the page flags limited to three memory
types, WB, WC and UC.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/include/asm/cacheflush.h | 4 ++++
arch/x86/mm/pat.c | 16 +++++++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 157644b..c912680 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -53,6 +53,10 @@ static inline void set_page_memtype(struct page *pg,
case _PAGE_CACHE_MODE_WB:
memtype_flags = _PGMT_WB;
break;
+ case _PAGE_CACHE_MODE_WT:
+ case _PAGE_CACHE_MODE_WP:
+ pr_err("set_page_memtype: unsupported cachemode %d\n", memtype);
+ BUG();
default:
memtype_flags = _PGMT_DEFAULT;
break;
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index db687c3..a214f5a 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -289,6 +289,8 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)

/*
* For RAM pages, we use page flags to mark the pages with appropriate type.
+ * The page flags are currently limited to three types, WB, WC and UC. Hence,
+ * any request to WT or WP will fail with -EINVAL.
* Here we do two pass:
* - Find the memtype of all the pages in the range, look for any conflicts
* - In case of no conflicts, set the new memtype for pages in the range
@@ -300,6 +302,13 @@ static int reserve_ram_pages_type(u64 start, u64 end,
struct page *page;
u64 pfn;

+ if ((req_type == _PAGE_CACHE_MODE_WT) ||
+ (req_type == _PAGE_CACHE_MODE_WP)) {
+ if (new_type)
+ *new_type = _PAGE_CACHE_MODE_UC_MINUS;
+ return -EINVAL;
+ }
+
if (req_type == _PAGE_CACHE_MODE_UC) {
/* We do not support strong UC */
WARN_ON_ONCE(1);
@@ -349,6 +358,7 @@ static int free_ram_pages_type(u64 start, u64 end)
* - _PAGE_CACHE_MODE_WC
* - _PAGE_CACHE_MODE_UC_MINUS
* - _PAGE_CACHE_MODE_UC
+ * - _PAGE_CACHE_MODE_WT
*
* If new_type is NULL, function will return an error if it cannot reserve the
* region with req_type. If new_type is non-NULL, function will return
@@ -368,10 +378,10 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
if (!pat_enabled) {
/* This is identical to page table setting without PAT */
if (new_type) {
- if (req_type == _PAGE_CACHE_MODE_WC)
- *new_type = _PAGE_CACHE_MODE_UC_MINUS;
+ if (req_type == _PAGE_CACHE_MODE_WB)
+ *new_type = _PAGE_CACHE_MODE_WB;
else
- *new_type = req_type;
+ *new_type = _PAGE_CACHE_MODE_UC_MINUS;
}
return 0;
}

2014-10-27 23:09:54

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

This patch adds pgprot_writethrough() for setting WT to a given
pgprot_t.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 3 +++
arch/x86/mm/pat.c | 10 ++++++++++
include/asm-generic/pgtable.h | 4 ++++
3 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 31c66c6..eebf2a1 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -407,6 +407,9 @@ extern int nx_enabled;
#define pgprot_writecombine pgprot_writecombine
extern pgprot_t pgprot_writecombine(pgprot_t prot);

+#define pgprot_writethrough pgprot_writethrough
+extern pgprot_t pgprot_writethrough(pgprot_t prot);
+
/* Indicate that x86 has its own track and untrack pfn vma functions */
#define __HAVE_PFNMAP_TRACKING

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index a214f5a..a0264d3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
}
EXPORT_SYMBOL_GPL(pgprot_writecombine);

+pgprot_t pgprot_writethrough(pgprot_t prot)
+{
+ if (pat_enabled)
+ return __pgprot(pgprot_val(prot) |
+ cachemode2protval(_PAGE_CACHE_MODE_WT));
+ else
+ return pgprot_noncached(prot);
+}
+EXPORT_SYMBOL_GPL(pgprot_writethrough);
+
#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)

static struct memtype *memtype_get_idx(loff_t pos)
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 752e30d..9ad756e 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -249,6 +249,10 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
#define pgprot_writecombine pgprot_noncached
#endif

+#ifndef pgprot_writethrough
+#define pgprot_writethrough pgprot_noncached
+#endif
+
#ifndef pgprot_device
#define pgprot_device pgprot_noncached
#endif

2014-10-27 23:13:00

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 7/7] x86, mm: Add set_memory_wt() for WT

This patch adds set_memory_wt(), set_memory_array_wt() and
set_pages_array_wt() for setting specified range(s) of the
regular memory to WT.

Signed-off-by: Toshi Kani <[email protected]>
---
Documentation/x86/pat.txt | 9 ++++--
arch/x86/include/asm/cacheflush.h | 6 +++-
arch/x86/mm/pageattr.c | 58 +++++++++++++++++++++++++++++++++----
3 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index be7b8c2..bf4339c 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -46,6 +46,9 @@ set_memory_uc | UC- | -- | -- |
set_memory_wc | WC | -- | -- |
set_memory_wb | | | |
| | | |
+set_memory_wt | WT | -- | -- |
+ set_memory_wb | | | |
+ | | | |
pci sysfs resource | -- | -- | UC- |
| | | |
pci sysfs resource_wc | -- | -- | WC |
@@ -117,8 +120,8 @@ can be more restrictive, in case of any existing aliasing for that address.
For example: If there is an existing uncached mapping, a new ioremap_wc can
return uncached mapping in place of write-combine requested.

-set_memory_[uc|wc] and set_memory_wb should be used in pairs, where driver will
-first make a region uc or wc and switch it back to wb after use.
+set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
+will first make a region uc, wc or wt and switch it back to wb after use.

Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
@@ -126,7 +129,7 @@ interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.

-Drivers should use set_memory_[uc|wc] to set access type for RAM ranges.
+Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.


PAT debugging
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index a561171..7578e0b 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -81,7 +81,7 @@ static inline void set_page_memtype(struct page *pg,
/*
* The set_memory_* API can be used to change various attributes of a virtual
* address range. The attributes include:
- * Cachability : UnCached, WriteCombining, WriteBack
+ * Cachability : UnCached, WriteCombining, WriteThrough, WriteBack
* Executability : eXeutable, NoteXecutable
* Read/Write : ReadOnly, ReadWrite
* Presence : NotPresent
@@ -108,9 +108,11 @@ static inline void set_page_memtype(struct page *pg,

int _set_memory_uc(unsigned long addr, int numpages);
int _set_memory_wc(unsigned long addr, int numpages);
+int _set_memory_wt(unsigned long addr, int numpages);
int _set_memory_wb(unsigned long addr, int numpages);
int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wc(unsigned long addr, int numpages);
+int set_memory_wt(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);
@@ -121,10 +123,12 @@ int set_memory_4k(unsigned long addr, int numpages);

int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
+int set_memory_array_wt(unsigned long *addr, int addrinarray);
int set_memory_array_wb(unsigned long *addr, int addrinarray);

int set_pages_array_uc(struct page **pages, int addrinarray);
int set_pages_array_wc(struct page **pages, int addrinarray);
+int set_pages_array_wt(struct page **pages, int addrinarray);
int set_pages_array_wb(struct page **pages, int addrinarray);

/*
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 34f870d..6ea685b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1484,12 +1484,10 @@ EXPORT_SYMBOL(set_memory_uc);
static int _set_memory_array(unsigned long *addr, int addrinarray,
enum page_cache_mode new_type)
{
+ enum page_cache_mode set_type;
int i, j;
int ret;

- /*
- * for now UC MINUS. see comments in ioremap_nocache()
- */
for (i = 0; i < addrinarray; i++) {
ret = reserve_memtype(__pa(addr[i]), __pa(addr[i]) + PAGE_SIZE,
new_type, NULL);
@@ -1497,9 +1495,12 @@ static int _set_memory_array(unsigned long *addr, int addrinarray,
goto out_free;
}

+ /* If WC, set to UC- first and then WC */
+ set_type = (new_type == _PAGE_CACHE_MODE_WC) ?
+ _PAGE_CACHE_MODE_UC_MINUS : new_type;
+
ret = change_page_attr_set(addr, addrinarray,
- cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS),
- 1);
+ cachemode2pgprot(set_type), 1);

if (!ret && new_type == _PAGE_CACHE_MODE_WC)
ret = change_page_attr_set_clr(addr, addrinarray,
@@ -1531,6 +1532,12 @@ int set_memory_array_wc(unsigned long *addr, int addrinarray)
}
EXPORT_SYMBOL(set_memory_array_wc);

+int set_memory_array_wt(unsigned long *addr, int addrinarray)
+{
+ return _set_memory_array(addr, addrinarray, _PAGE_CACHE_MODE_WT);
+}
+EXPORT_SYMBOL(set_memory_array_wt);
+
int _set_memory_wc(unsigned long addr, int numpages)
{
int ret;
@@ -1571,6 +1578,34 @@ out_err:
}
EXPORT_SYMBOL(set_memory_wc);

+int _set_memory_wt(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(&addr, numpages,
+ cachemode2pgprot(_PAGE_CACHE_MODE_WT), 0);
+}
+
+int set_memory_wt(unsigned long addr, int numpages)
+{
+ int ret;
+
+ ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
+ _PAGE_CACHE_MODE_WT, NULL);
+ if (ret)
+ goto out_err;
+
+ ret = _set_memory_wt(addr, numpages);
+ if (ret)
+ goto out_free;
+
+ return 0;
+
+out_free:
+ free_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE);
+out_err:
+ return ret;
+}
+EXPORT_SYMBOL(set_memory_wt);
+
int _set_memory_wb(unsigned long addr, int numpages)
{
/* WB cache mode is hard wired to all cache attribute bits being 0 */
@@ -1663,6 +1698,7 @@ static int _set_pages_array(struct page **pages, int addrinarray,
{
unsigned long start;
unsigned long end;
+ enum page_cache_mode set_type;
int i;
int free_idx;
int ret;
@@ -1676,8 +1712,12 @@ static int _set_pages_array(struct page **pages, int addrinarray,
goto err_out;
}

+ /* If WC, set to UC- first and then WC */
+ set_type = (new_type == _PAGE_CACHE_MODE_WC) ?
+ _PAGE_CACHE_MODE_UC_MINUS : new_type;
+
ret = cpa_set_pages_array(pages, addrinarray,
- cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS));
+ cachemode2pgprot(set_type));
if (!ret && new_type == _PAGE_CACHE_MODE_WC)
ret = change_page_attr_set_clr(NULL, addrinarray,
cachemode2pgprot(
@@ -1711,6 +1751,12 @@ int set_pages_array_wc(struct page **pages, int addrinarray)
}
EXPORT_SYMBOL(set_pages_array_wc);

+int set_pages_array_wt(struct page **pages, int addrinarray)
+{
+ return _set_pages_array(pages, addrinarray, _PAGE_CACHE_MODE_WT);
+}
+EXPORT_SYMBOL(set_pages_array_wt);
+
int set_pages_wb(struct page *page, int numpages)
{
unsigned long addr = (unsigned long)page_address(page);

2014-10-27 23:13:20

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 6/7] x86, mm, asm: Add WT support to set_page_memtype()

As set_memory_wb() calls set_page_memtype() with -1, _PGMT_DEFAULT is
solely used for tracking the WB type. _PGMT_WB is defined but unused.
Hence, this patch renames _PGMT_DEFAULT to _PGMT_WB to clarify its
usage, and releases the slot used by _PGMT_WB before. As a result,
set_memory_wb() is changed to call set_page_memtype() with _PGMT_WB,
and set_page_memtype() handles any undefined type as a bug.

This patch then defines _PGMT_WT to the released slot. This enables
set_page_memtype() to track the WT type.

Signed-off-by: Toshi Kani <[email protected]>
---
arch/x86/include/asm/cacheflush.h | 28 ++++++++++++++--------------
arch/x86/mm/pat.c | 20 ++++++--------------
2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index c912680..a561171 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -7,18 +7,18 @@

#ifdef CONFIG_X86_PAT
/*
- * X86 PAT uses page flags WC and Uncached together to keep track of
- * memory type of pages that have backing page struct. X86 PAT supports 3
- * different memory types, _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC and
- * _PAGE_CACHE_MODE_UC_MINUS and fourth state where page's memory type has not
- * been changed from its default (value of -1 used to denote this).
+ * X86 PAT uses page flags arch_1 and uncached together to keep track of
+ * memory type of pages that have backing page struct. X86 PAT supports 4
+ * different memory types, _PAGE_CACHE_MODE_WT, _PAGE_CACHE_MODE_WC,
+ * _PAGE_CACHE_MODE_UC_MINUS and _PAGE_CACHE_MODE_WB where page's memory
+ * type has not been changed from its default.
* Note we do not support _PAGE_CACHE_MODE_UC here.
*/

-#define _PGMT_DEFAULT 0
+#define _PGMT_WB 0 /* default */
#define _PGMT_WC (1UL << PG_arch_1)
#define _PGMT_UC_MINUS (1UL << PG_uncached)
-#define _PGMT_WB (1UL << PG_uncached | 1UL << PG_arch_1)
+#define _PGMT_WT (1UL << PG_uncached | 1UL << PG_arch_1)
#define _PGMT_MASK (1UL << PG_uncached | 1UL << PG_arch_1)
#define _PGMT_CLEAR_MASK (~_PGMT_MASK)

@@ -26,14 +26,14 @@ static inline enum page_cache_mode get_page_memtype(struct page *pg)
{
unsigned long pg_flags = pg->flags & _PGMT_MASK;

- if (pg_flags == _PGMT_DEFAULT)
- return -1;
+ if (pg_flags == _PGMT_WB)
+ return _PAGE_CACHE_MODE_WB;
else if (pg_flags == _PGMT_WC)
return _PAGE_CACHE_MODE_WC;
else if (pg_flags == _PGMT_UC_MINUS)
return _PAGE_CACHE_MODE_UC_MINUS;
else
- return _PAGE_CACHE_MODE_WB;
+ return _PAGE_CACHE_MODE_WT;
}

static inline void set_page_memtype(struct page *pg,
@@ -50,16 +50,16 @@ static inline void set_page_memtype(struct page *pg,
case _PAGE_CACHE_MODE_UC_MINUS:
memtype_flags = _PGMT_UC_MINUS;
break;
+ case _PAGE_CACHE_MODE_WT:
+ memtype_flags = _PGMT_WT;
+ break;
case _PAGE_CACHE_MODE_WB:
memtype_flags = _PGMT_WB;
break;
- case _PAGE_CACHE_MODE_WT:
case _PAGE_CACHE_MODE_WP:
+ default:
pr_err("set_page_memtype: unsupported cachemode %d\n", memtype);
BUG();
- default:
- memtype_flags = _PGMT_DEFAULT;
- break;
}

do {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 648b885..4f61483 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -311,8 +311,8 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)

/*
* For RAM pages, we use page flags to mark the pages with appropriate type.
- * The page flags are currently limited to three types, WB, WC and UC. Hence,
- * any request to WT or WP will fail with -EINVAL.
+ * The page flags are limited to four memtypes, WB (default), WC, WT and UC.
+ * A new memtype can only be set to the default memtype WB.
* Here we do two pass:
* - Find the memtype of all the pages in the range, look for any conflicts
* - In case of no conflicts, set the new memtype for pages in the range
@@ -324,8 +324,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
struct page *page;
u64 pfn;

- if ((req_type == _PAGE_CACHE_MODE_WT) ||
- (req_type == _PAGE_CACHE_MODE_WP)) {
+ if (req_type == _PAGE_CACHE_MODE_WP) {
if (new_type)
*new_type = _PAGE_CACHE_MODE_UC_MINUS;
return -EINVAL;
@@ -342,7 +341,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,

page = pfn_to_page(pfn);
type = get_page_memtype(page);
- if (type != -1) {
+ if (type != _PAGE_CACHE_MODE_WB) {
pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
start, end - 1, type, req_type);
if (new_type)
@@ -369,7 +368,7 @@ static int free_ram_pages_type(u64 start, u64 end)

for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
page = pfn_to_page(pfn);
- set_page_memtype(page, -1);
+ set_page_memtype(page, _PAGE_CACHE_MODE_WB);
}
return 0;
}
@@ -498,7 +497,7 @@ int free_memtype(u64 start, u64 end)
* @paddr: physical address of which memory type needs to be looked up
*
* Returns _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC, _PAGE_CACHE_MODE_UC_MINUS
- * or _PAGE_CACHE_MODE_UC
+ * or _PAGE_CACHE_MODE_WT.
*/
static enum page_cache_mode lookup_memtype(u64 paddr)
{
@@ -512,13 +511,6 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
struct page *page;
page = pfn_to_page(paddr >> PAGE_SHIFT);
rettype = get_page_memtype(page);
- /*
- * -1 from get_page_memtype() implies RAM page is in its
- * default state and not reserved, and hence of type WB
- */
- if (rettype == -1)
- rettype = _PAGE_CACHE_MODE_WB;
-
return rettype;
}

2014-10-27 23:13:45

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 3/7] x86, mm, asm-gen: Add ioremap_wt() for WT

This patch adds ioremap_wt() for creating WT mapping on x86.
It follows the same model as ioremap_wc() for multi-architecture
support. ARCH_HAS_IOREMAP_WT is defined in the x86 version of
io.h to indicate that ioremap_wt() is implemented on x86.

Also update the PAT documentation file to cover ioremap_wt().

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
Documentation/x86/pat.txt | 4 +++-
arch/x86/include/asm/io.h | 2 ++
arch/x86/mm/ioremap.c | 24 ++++++++++++++++++++++++
include/asm-generic/io.h | 4 ++++
include/asm-generic/iomap.h | 4 ++++
5 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9f..be7b8c2 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -12,7 +12,7 @@ virtual addresses.

PAT allows for different types of memory attributes. The most commonly used
ones that will be supported at this time are Write-back, Uncached,
-Write-combined and Uncached Minus.
+Write-combined, Write-through and Uncached Minus.


PAT APIs
@@ -38,6 +38,8 @@ ioremap_nocache | -- | UC- | UC- |
| | | |
ioremap_wc | -- | -- | WC |
| | | |
+ioremap_wt | -- | -- | WT |
+ | | | |
set_memory_uc | UC- | -- | -- |
set_memory_wb | | | |
| | | |
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 71b9e65..c813c86 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -35,6 +35,7 @@
*/

#define ARCH_HAS_IOREMAP_WC
+#define ARCH_HAS_IOREMAP_WT

#include <linux/string.h>
#include <linux/compiler.h>
@@ -316,6 +317,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
enum page_cache_mode pcm);
extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_wt(resource_size_t offset, unsigned long size);

extern bool is_early_ioremap_ptep(pte_t *ptep);

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 8832e51..b096988 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -167,6 +167,10 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
prot = __pgprot(pgprot_val(prot) |
cachemode2protval(_PAGE_CACHE_MODE_WC));
break;
+ case _PAGE_CACHE_MODE_WT:
+ prot = __pgprot(pgprot_val(prot) |
+ cachemode2protval(_PAGE_CACHE_MODE_WT));
+ break;
case _PAGE_CACHE_MODE_WB:
break;
}
@@ -261,6 +265,26 @@ void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
}
EXPORT_SYMBOL(ioremap_wc);

+/**
+ * ioremap_wt - map memory into CPU space write through
+ * @phys_addr: bus address of the memory
+ * @size: size of the resource to map
+ *
+ * This version of ioremap ensures that the memory is marked write through.
+ * Write through writes data into memory while keeping the cache up-to-date.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
+{
+ if (pat_enabled)
+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
+ __builtin_return_address(0));
+ else
+ return ioremap_nocache(phys_addr, size);
+}
+EXPORT_SYMBOL(ioremap_wt);
+
void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
{
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index b8fdc57..9dd07bf 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -322,6 +322,10 @@ static inline void __iomem *ioremap(phys_addr_t offset, unsigned long size)
#define ioremap_wc ioremap_nocache
#endif

+#ifndef ioremap_wt
+#define ioremap_wt ioremap_nocache
+#endif
+
static inline void iounmap(void __iomem *addr)
{
}
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 1b41011..d8f8622 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -66,6 +66,10 @@ extern void ioport_unmap(void __iomem *);
#define ioremap_wc ioremap_nocache
#endif

+#ifndef ARCH_HAS_IOREMAP_WT
+#define ioremap_wt ioremap_nocache
+#endif
+
#ifdef CONFIG_PCI
/* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */
struct pci_dev;

2014-10-27 23:13:44

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 5/7] x86, mm, pat: Refactor !pat_enabled handling

This patch refactors the !pat_enabled handling code and integrates
this case into the PAT abstraction code. The PAT table is emulated
by corresponding to the two cache attribute bits, PWT (Write Through)
and PCD (Cache Disable). The emulated PAT table is also the same as
the BIOS default setup in case the system has PAT but "nopat" boot
option is specified.

As a result of this change, cache aliasing is checked for all cases
including !pat_enabled.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
---
arch/x86/mm/init.c | 6 ++-
arch/x86/mm/iomap_32.c | 18 +++------
arch/x86/mm/ioremap.c | 10 +----
arch/x86/mm/pageattr.c | 3 --
arch/x86/mm/pat.c | 92 ++++++++++++++++++++----------------------------
5 files changed, 50 insertions(+), 79 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 82b41d5..2e147c8 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -37,7 +37,7 @@
*/
uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
[_PAGE_CACHE_MODE_WB] = 0,
- [_PAGE_CACHE_MODE_WC] = _PAGE_PWT,
+ [_PAGE_CACHE_MODE_WC] = _PAGE_PCD,
[_PAGE_CACHE_MODE_UC_MINUS] = _PAGE_PCD,
[_PAGE_CACHE_MODE_UC] = _PAGE_PCD | _PAGE_PWT,
[_PAGE_CACHE_MODE_WT] = _PAGE_PCD,
@@ -46,11 +46,11 @@ uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
EXPORT_SYMBOL_GPL(__cachemode2pte_tbl);
uint8_t __pte2cachemode_tbl[8] = {
[__pte2cm_idx(0)] = _PAGE_CACHE_MODE_WB,
- [__pte2cm_idx(_PAGE_PWT)] = _PAGE_CACHE_MODE_WC,
+ [__pte2cm_idx(_PAGE_PWT)] = _PAGE_CACHE_MODE_UC_MINUS,
[__pte2cm_idx(_PAGE_PCD)] = _PAGE_CACHE_MODE_UC_MINUS,
[__pte2cm_idx(_PAGE_PWT | _PAGE_PCD)] = _PAGE_CACHE_MODE_UC,
[__pte2cm_idx(_PAGE_PAT)] = _PAGE_CACHE_MODE_WB,
- [__pte2cm_idx(_PAGE_PWT | _PAGE_PAT)] = _PAGE_CACHE_MODE_WC,
+ [__pte2cm_idx(_PAGE_PWT | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
[__pte2cm_idx(_PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
[__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC,
};
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index ee58a0b..96aa8bf 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -70,29 +70,23 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
return (void *)vaddr;
}

-/*
- * Map 'pfn' using protections 'prot'
- */
-#define __PAGE_KERNEL_WC (__PAGE_KERNEL | \
- cachemode2protval(_PAGE_CACHE_MODE_WC))
-
void __iomem *
iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
{
/*
- * For non-PAT systems, promote PAGE_KERNEL_WC to PAGE_KERNEL_UC_MINUS.
- * PAGE_KERNEL_WC maps to PWT, which translates to uncached if the
- * MTRR is UC or WC. UC_MINUS gets the real intention, of the
- * user, which is "WC if the MTRR is WC, UC if you can't do that."
+ * For non-PAT systems, translate non-WB request to UC- just in
+ * case the caller set the PWT bit to prot directly without using
+ * pgprot_writecombine(). UC- translates to uncached if the MTRR
+ * is UC or WC. UC- gets the real intention, of the user, which is
+ * "WC if the MTRR is WC, UC if you can't do that."
*/
- if (!pat_enabled && pgprot_val(prot) == __PAGE_KERNEL_WC)
+ if (!pat_enabled && pgprot2cachemode(prot) != _PAGE_CACHE_MODE_WB)
prot = __pgprot(__PAGE_KERNEL |
cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));

return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
}
EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
-#undef __PAGE_KERNEL_WC

void
iounmap_atomic(void __iomem *kvaddr)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index b096988..f83e483 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -257,11 +257,8 @@ EXPORT_SYMBOL(ioremap_nocache);
*/
void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
- if (pat_enabled)
- return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
__builtin_return_address(0));
- else
- return ioremap_nocache(phys_addr, size);
}
EXPORT_SYMBOL(ioremap_wc);

@@ -277,11 +274,8 @@ EXPORT_SYMBOL(ioremap_wc);
*/
void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
{
- if (pat_enabled)
- return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
+ return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
__builtin_return_address(0));
- else
- return ioremap_nocache(phys_addr, size);
}
EXPORT_SYMBOL(ioremap_wt);

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6917b39..34f870d 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1553,9 +1553,6 @@ int set_memory_wc(unsigned long addr, int numpages)
{
int ret;

- if (!pat_enabled)
- return set_memory_uc(addr, numpages);
-
ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
_PAGE_CACHE_MODE_WC, NULL);
if (ret)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index a0264d3..648b885 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -135,28 +135,48 @@ void pat_init(void)
bool boot_cpu = !boot_pat_state;
struct cpuinfo_x86 *c = &boot_cpu_data;

- if (!pat_enabled)
- return;
-
if (!cpu_has_pat) {
if (!boot_pat_state) {
pat_disable("PAT not supported by CPU.");
- return;
- } else {
+ } else if (pat_enabled) {
/*
* If this happens we are on a secondary CPU, but
* switched to PAT on the boot CPU. We have no way to
* undo PAT.
*/
- printk(KERN_ERR "PAT enabled, "
+ pr_err("PAT enabled, "
"but not supported by secondary CPU\n");
BUG();
}
}

- if ((c->x86_vendor == X86_VENDOR_INTEL) &&
- (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
- ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
+ if (!pat_enabled) {
+ /*
+ * No PAT. Emulate the PAT table by corresponding to the two
+ * cache bits, PWT (Write Through) and PCD (Cache Disable).
+ * This is also the same as the BIOS default setup in case
+ * the system has PAT but "nopat" boot option is specified.
+ *
+ * PTE encoding used in Linux:
+ * PCD
+ * |PWT PAT
+ * || slot
+ * 00 0 WB : _PAGE_CACHE_MODE_WB
+ * 01 1 WT : _PAGE_CACHE_MODE_WT
+ * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
+ * 11 3 UC : _PAGE_CACHE_MODE_UC
+ *
+ * NOTE: When WC or WP is used, it is redirected to UC- per
+ * the default setup in __cachemode2pte_tbl[].
+ */
+ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
+ PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ if (!boot_pat_state)
+ boot_pat_state = pat;
+
+ } else if ((c->x86_vendor == X86_VENDOR_INTEL) &&
+ (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
+ ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
/*
* PAT support with the lower four entries. Intel Pentium 2,
* 3, M, and 4 are affected by PAT errata, which makes the
@@ -203,11 +223,13 @@ void pat_init(void)
PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
}

- /* Boot CPU check */
- if (!boot_pat_state)
- rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+ if (pat_enabled) {
+ /* Boot CPU check */
+ if (!boot_pat_state)
+ rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);

- wrmsrl(MSR_IA32_CR_PAT, pat);
+ wrmsrl(MSR_IA32_CR_PAT, pat);
+ }

if (boot_cpu)
pat_init_cache_modes();
@@ -375,17 +397,6 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,

BUG_ON(start >= end); /* end is exclusive */

- if (!pat_enabled) {
- /* This is identical to page table setting without PAT */
- if (new_type) {
- if (req_type == _PAGE_CACHE_MODE_WB)
- *new_type = _PAGE_CACHE_MODE_WB;
- else
- *new_type = _PAGE_CACHE_MODE_UC_MINUS;
- }
- return 0;
- }
-
/* Low ISA region is always mapped WB in page table. No need to track */
if (x86_platform.is_untracked_pat_range(start, end)) {
if (new_type)
@@ -450,9 +461,6 @@ int free_memtype(u64 start, u64 end)
int is_range_ram;
struct memtype *entry;

- if (!pat_enabled)
- return 0;
-
/* Low ISA region is always mapped WB. No need to track */
if (x86_platform.is_untracked_pat_range(start, end))
return 0;
@@ -489,8 +497,6 @@ int free_memtype(u64 start, u64 end)
* lookup_memtype - Looksup the memory type for a physical address
* @paddr: physical address of which memory type needs to be looked up
*
- * Only to be called when PAT is enabled
- *
* Returns _PAGE_CACHE_MODE_WB, _PAGE_CACHE_MODE_WC, _PAGE_CACHE_MODE_UC_MINUS
* or _PAGE_CACHE_MODE_UC
*/
@@ -591,16 +597,13 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
return 1;
}
#else
-/* This check is needed to avoid cache aliasing when PAT is enabled */
+/* This check is needed to avoid cache aliasing */
static inline int range_is_allowed(unsigned long pfn, unsigned long size)
{
u64 from = ((u64)pfn) << PAGE_SHIFT;
u64 to = from + size;
u64 cursor = from;

- if (!pat_enabled)
- return 1;
-
while (cursor < to) {
if (!devmem_is_allowed(pfn)) {
printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx]\n",
@@ -704,9 +707,6 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
* the type requested matches the type of first page in the range.
*/
if (is_ram) {
- if (!pat_enabled)
- return 0;
-
pcm = lookup_memtype(paddr);
if (want_pcm != pcm) {
printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
@@ -819,9 +819,6 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
return ret;
}

- if (!pat_enabled)
- return 0;
-
/*
* For anything smaller than the vma size we set prot based on the
* lookup.
@@ -847,9 +844,6 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
{
enum page_cache_mode pcm;

- if (!pat_enabled)
- return 0;
-
/* Set prot based on lookup */
pcm = lookup_memtype((resource_size_t)pfn << PAGE_SHIFT);
*prot = __pgprot((pgprot_val(vma->vm_page_prot) & (~_PAGE_CACHE_MASK)) |
@@ -888,21 +882,15 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,

pgprot_t pgprot_writecombine(pgprot_t prot)
{
- if (pat_enabled)
- return __pgprot(pgprot_val(prot) |
+ return __pgprot(pgprot_val(prot) |
cachemode2protval(_PAGE_CACHE_MODE_WC));
- else
- return pgprot_noncached(prot);
}
EXPORT_SYMBOL_GPL(pgprot_writecombine);

pgprot_t pgprot_writethrough(pgprot_t prot)
{
- if (pat_enabled)
- return __pgprot(pgprot_val(prot) |
+ return __pgprot(pgprot_val(prot) |
cachemode2protval(_PAGE_CACHE_MODE_WT));
- else
- return pgprot_noncached(prot);
}
EXPORT_SYMBOL_GPL(pgprot_writethrough);

@@ -981,10 +969,8 @@ static const struct file_operations memtype_fops = {

static int __init pat_memtype_list_init(void)
{
- if (pat_enabled) {
- debugfs_create_file("pat_memtype_list", S_IRUSR,
+ debugfs_create_file("pat_memtype_list", S_IRUSR,
arch_debugfs_dir, NULL, &memtype_fops);
- }
return 0;
}

2014-10-27 23:14:27

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

This patch sets WT to the PA7 slot in the PAT MSR when the processor
is not affected by the PAT errata. The PA7 slot is chosen to further
minimize the risk of using the PAT bit as the PA3 slot is UC and is
not currently used.

The following Intel processors are affected by the PAT errata.

errata cpuid
----------------------------------------------------
Pentium 2, A52 family 0x6, model 0x5
Pentium 3, E27 family 0x6, model 0x7, 0x8
Pentium 3 Xenon, G26 family 0x6, model 0x7, 0x8, 0xa
Pentium M, Y26 family 0x6, model 0x9
Pentium M 90nm, X9 family 0x6, model 0xd
Pentium 4, N46 family 0xf, model 0x0

Instead of making sharp boundary checks, this patch makes conservative
checks to exclude all Pentium 2, 3, M and 4 family processors. For
such processors, _PAGE_CACHE_MODE_WT is redirected to UC- per the
default setup in __cachemode2pte_tbl[].

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
---
arch/x86/mm/pat.c | 64 +++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index ff31851..db687c3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -133,6 +133,7 @@ void pat_init(void)
{
u64 pat;
bool boot_cpu = !boot_pat_state;
+ struct cpuinfo_x86 *c = &boot_cpu_data;

if (!pat_enabled)
return;
@@ -153,21 +154,54 @@ void pat_init(void)
}
}

- /* Set PWT to Write-Combining. All other bits stay the same */
- /*
- * PTE encoding used in Linux:
- * PAT
- * |PCD
- * ||PWT
- * |||
- * 000 WB _PAGE_CACHE_WB
- * 001 WC _PAGE_CACHE_WC
- * 010 UC- _PAGE_CACHE_UC_MINUS
- * 011 UC _PAGE_CACHE_UC
- * PAT bit unused
- */
- pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
+ if ((c->x86_vendor == X86_VENDOR_INTEL) &&
+ (((c->x86 == 0x6) && (c->x86_model <= 0xd)) ||
+ ((c->x86 == 0xf) && (c->x86_model <= 0x6)))) {
+ /*
+ * PAT support with the lower four entries. Intel Pentium 2,
+ * 3, M, and 4 are affected by PAT errata, which makes the
+ * upper four entries unusable. We do not use the upper four
+ * entries for all the affected processor families for safe.
+ *
+ * PTE encoding used in Linux:
+ * PAT
+ * |PCD
+ * ||PWT PAT
+ * ||| slot
+ * 000 0 WB : _PAGE_CACHE_MODE_WB
+ * 001 1 WC : _PAGE_CACHE_MODE_WC
+ * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
+ * 011 3 UC : _PAGE_CACHE_MODE_UC
+ * PAT bit unused
+ *
+ * NOTE: When WT or WP is used, it is redirected to UC- per
+ * the default setup in __cachemode2pte_tbl[].
+ */
+ pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
+ PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
+ } else {
+ /*
+ * PAT full support. WT is set to slot 7, which minimizes
+ * the risk of using the PAT bit as slot 3 is UC and is
+ * currently unused. Slot 4 should remain as reserved.
+ *
+ * PTE encoding used in Linux:
+ * PAT
+ * |PCD
+ * ||PWT PAT
+ * ||| slot
+ * 000 0 WB : _PAGE_CACHE_MODE_WB
+ * 001 1 WC : _PAGE_CACHE_MODE_WC
+ * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
+ * 011 3 UC : _PAGE_CACHE_MODE_UC
+ * 100 4 <reserved>
+ * 101 5 <reserved>
+ * 110 6 <reserved>
+ * 111 7 WT : _PAGE_CACHE_MODE_WT
+ */
+ pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
+ PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ }

/* Boot CPU check */
if (!boot_pat_state)

2014-11-03 17:14:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, 27 Oct 2014, Toshi Kani wrote:
> + } else {
> + /*
> + * PAT full support. WT is set to slot 7, which minimizes
> + * the risk of using the PAT bit as slot 3 is UC and is
> + * currently unused. Slot 4 should remain as reserved.

This comment makes no sense. What minimizes which risk and what has
this to do with slot 3 and slot 4?

> + *
> + * PTE encoding used in Linux:
> + * PAT
> + * |PCD
> + * ||PWT PAT
> + * ||| slot
> + * 000 0 WB : _PAGE_CACHE_MODE_WB
> + * 001 1 WC : _PAGE_CACHE_MODE_WC
> + * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> + * 011 3 UC : _PAGE_CACHE_MODE_UC
> + * 100 4 <reserved>
> + * 101 5 <reserved>
> + * 110 6 <reserved>

Well, they are still mapped to WB/WC/UC_MINUS ....

> + * 111 7 WT : _PAGE_CACHE_MODE_WT
> + */
> + pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
> + PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
> + }

Thanks,

tglx

2014-11-03 18:02:06

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, 2014-11-03 at 18:14 +0100, Thomas Gleixner wrote:
> On Mon, 27 Oct 2014, Toshi Kani wrote:
> > + } else {
> > + /*
> > + * PAT full support. WT is set to slot 7, which minimizes
> > + * the risk of using the PAT bit as slot 3 is UC and is
> > + * currently unused. Slot 4 should remain as reserved.
>
> This comment makes no sense. What minimizes which risk and what has
> this to do with slot 3 and slot 4?

This is for precaution. Since the patch enables the PAT bit the first
time, it was suggested that we keep slot 4 reserved and set it to WB.
The PAT bit still has no effect to slot 0/1/2 (WB/WC/UC-) after this
patch. Slot 7 is the safest slot since slot 3 (UC) is unused today.

https://lkml.org/lkml/2014/9/4/691
https://lkml.org/lkml/2014/9/5/394

> > + *
> > + * PTE encoding used in Linux:
> > + * PAT
> > + * |PCD
> > + * ||PWT PAT
> > + * ||| slot
> > + * 000 0 WB : _PAGE_CACHE_MODE_WB
> > + * 001 1 WC : _PAGE_CACHE_MODE_WC
> > + * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> > + * 011 3 UC : _PAGE_CACHE_MODE_UC
> > + * 100 4 <reserved>
> > + * 101 5 <reserved>
> > + * 110 6 <reserved>
>
> Well, they are still mapped to WB/WC/UC_MINUS ....

Right, the reserved slots are also initialized with their safe values.
However, the macros _PAGE_CACHE_MODE_XXX only refer to the slots
specified above.

> > + * 111 7 WT : _PAGE_CACHE_MODE_WT
> > + */
> > + pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
> > + PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
> > + }
>
> Thanks,

Thanks for the review!
-Toshi

2014-11-03 18:09:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, Nov 3, 2014 at 9:47 AM, Toshi Kani <[email protected]> wrote:
> On Mon, 2014-11-03 at 18:14 +0100, Thomas Gleixner wrote:
>> On Mon, 27 Oct 2014, Toshi Kani wrote:
>> > + } else {
>> > + /*
>> > + * PAT full support. WT is set to slot 7, which minimizes
>> > + * the risk of using the PAT bit as slot 3 is UC and is
>> > + * currently unused. Slot 4 should remain as reserved.
>>
>> This comment makes no sense. What minimizes which risk and what has
>> this to do with slot 3 and slot 4?
>
> This is for precaution. Since the patch enables the PAT bit the first
> time, it was suggested that we keep slot 4 reserved and set it to WB.
> The PAT bit still has no effect to slot 0/1/2 (WB/WC/UC-) after this
> patch. Slot 7 is the safest slot since slot 3 (UC) is unused today.
>
> https://lkml.org/lkml/2014/9/4/691
> https://lkml.org/lkml/2014/9/5/394
>

I would clarify the comment, since this really has nothing to do with
slot 3 being unused. How about:

We put WT in slot 7 to improve robustness in the presence of errata
that might cause the high PAT bit to be ignored. This way a buggy
slot 7 access will hit slot 3, and slot 3 is UC, so at worst we lose
performance without causing a correctness issue. Pentium 4 erratum
N46 is an example of such an erratum, although we try not to use PAT
at all on affected CPUs.

--Andy


>> > + *
>> > + * PTE encoding used in Linux:
>> > + * PAT
>> > + * |PCD
>> > + * ||PWT PAT
>> > + * ||| slot
>> > + * 000 0 WB : _PAGE_CACHE_MODE_WB
>> > + * 001 1 WC : _PAGE_CACHE_MODE_WC
>> > + * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
>> > + * 011 3 UC : _PAGE_CACHE_MODE_UC
>> > + * 100 4 <reserved>
>> > + * 101 5 <reserved>
>> > + * 110 6 <reserved>
>>
>> Well, they are still mapped to WB/WC/UC_MINUS ....
>
> Right, the reserved slots are also initialized with their safe values.
> However, the macros _PAGE_CACHE_MODE_XXX only refer to the slots
> specified above.
>
>> > + * 111 7 WT : _PAGE_CACHE_MODE_WT
>> > + */
>> > + pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
>> > + PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
>> > + }
>>
>> Thanks,
>
> Thanks for the review!
> -Toshi
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-11-03 18:15:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, 3 Nov 2014, Andy Lutomirski wrote:
> On Mon, Nov 3, 2014 at 9:47 AM, Toshi Kani <[email protected]> wrote:
> > On Mon, 2014-11-03 at 18:14 +0100, Thomas Gleixner wrote:
> >> On Mon, 27 Oct 2014, Toshi Kani wrote:
> >> > + } else {
> >> > + /*
> >> > + * PAT full support. WT is set to slot 7, which minimizes
> >> > + * the risk of using the PAT bit as slot 3 is UC and is
> >> > + * currently unused. Slot 4 should remain as reserved.
> >>
> >> This comment makes no sense. What minimizes which risk and what has
> >> this to do with slot 3 and slot 4?
> >
> > This is for precaution. Since the patch enables the PAT bit the first
> > time, it was suggested that we keep slot 4 reserved and set it to WB.
> > The PAT bit still has no effect to slot 0/1/2 (WB/WC/UC-) after this
> > patch. Slot 7 is the safest slot since slot 3 (UC) is unused today.
> >
> > https://lkml.org/lkml/2014/9/4/691
> > https://lkml.org/lkml/2014/9/5/394
> >
>
> I would clarify the comment, since this really has nothing to do with
> slot 3 being unused. How about:
>
> We put WT in slot 7 to improve robustness in the presence of errata
> that might cause the high PAT bit to be ignored. This way a buggy
> slot 7 access will hit slot 3, and slot 3 is UC, so at worst we lose
> performance without causing a correctness issue. Pentium 4 erratum
> N46 is an example of such an erratum, although we try not to use PAT
> at all on affected CPUs.

Indeed. That makes a lot more sense.

> >> > + *
> >> > + * PTE encoding used in Linux:
> >> > + * PAT
> >> > + * |PCD
> >> > + * ||PWT PAT
> >> > + * ||| slot
> >> > + * 000 0 WB : _PAGE_CACHE_MODE_WB
> >> > + * 001 1 WC : _PAGE_CACHE_MODE_WC
> >> > + * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> >> > + * 011 3 UC : _PAGE_CACHE_MODE_UC
> >> > + * 100 4 <reserved>
> >> > + * 101 5 <reserved>
> >> > + * 110 6 <reserved>
> >>
> >> Well, they are still mapped to WB/WC/UC_MINUS ....
> >
> > Right, the reserved slots are also initialized with their safe values.
> > However, the macros _PAGE_CACHE_MODE_XXX only refer to the slots
> > specified above.

Then the table should reflect this, i.e.: reserved, but mapped to XX

And a comment below that explaining WHY we map the reserved slots.

Thanks,

tglx

2014-11-03 18:15:50

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, 2014-11-03 at 10:08 -0800, Andy Lutomirski wrote:
> On Mon, Nov 3, 2014 at 9:47 AM, Toshi Kani <[email protected]> wrote:
> > On Mon, 2014-11-03 at 18:14 +0100, Thomas Gleixner wrote:
> >> On Mon, 27 Oct 2014, Toshi Kani wrote:
> >> > + } else {
> >> > + /*
> >> > + * PAT full support. WT is set to slot 7, which minimizes
> >> > + * the risk of using the PAT bit as slot 3 is UC and is
> >> > + * currently unused. Slot 4 should remain as reserved.
> >>
> >> This comment makes no sense. What minimizes which risk and what has
> >> this to do with slot 3 and slot 4?
> >
> > This is for precaution. Since the patch enables the PAT bit the first
> > time, it was suggested that we keep slot 4 reserved and set it to WB.
> > The PAT bit still has no effect to slot 0/1/2 (WB/WC/UC-) after this
> > patch. Slot 7 is the safest slot since slot 3 (UC) is unused today.
> >
> > https://lkml.org/lkml/2014/9/4/691
> > https://lkml.org/lkml/2014/9/5/394
> >
>
> I would clarify the comment, since this really has nothing to do with
> slot 3 being unused.

Right.

> How about:
>
> We put WT in slot 7 to improve robustness in the presence of errata
> that might cause the high PAT bit to be ignored. This way a buggy
> slot 7 access will hit slot 3, and slot 3 is UC, so at worst we lose
> performance without causing a correctness issue. Pentium 4 erratum
> N46 is an example of such an erratum, although we try not to use PAT
> at all on affected CPUs.

That looks much better. :-) I will update the comment.

Thanks!
-Toshi

2014-11-03 18:23:00

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86, mm, pat: Set WT to PA7 slot of PAT MSR

On Mon, 2014-11-03 at 19:15 +0100, Thomas Gleixner wrote:
> On Mon, 3 Nov 2014, Andy Lutomirski wrote:
> > On Mon, Nov 3, 2014 at 9:47 AM, Toshi Kani <[email protected]> wrote:
> > > On Mon, 2014-11-03 at 18:14 +0100, Thomas Gleixner wrote:
> > >> On Mon, 27 Oct 2014, Toshi Kani wrote:
> > >> > + } else {
> > >> > + /*
> > >> > + * PAT full support. WT is set to slot 7, which minimizes
> > >> > + * the risk of using the PAT bit as slot 3 is UC and is
> > >> > + * currently unused. Slot 4 should remain as reserved.
> > >>
> > >> This comment makes no sense. What minimizes which risk and what has
> > >> this to do with slot 3 and slot 4?
> > >
> > > This is for precaution. Since the patch enables the PAT bit the first
> > > time, it was suggested that we keep slot 4 reserved and set it to WB.
> > > The PAT bit still has no effect to slot 0/1/2 (WB/WC/UC-) after this
> > > patch. Slot 7 is the safest slot since slot 3 (UC) is unused today.
> > >
> > > https://lkml.org/lkml/2014/9/4/691
> > > https://lkml.org/lkml/2014/9/5/394
> > >
> >
> > I would clarify the comment, since this really has nothing to do with
> > slot 3 being unused. How about:
> >
> > We put WT in slot 7 to improve robustness in the presence of errata
> > that might cause the high PAT bit to be ignored. This way a buggy
> > slot 7 access will hit slot 3, and slot 3 is UC, so at worst we lose
> > performance without causing a correctness issue. Pentium 4 erratum
> > N46 is an example of such an erratum, although we try not to use PAT
> > at all on affected CPUs.
>
> Indeed. That makes a lot more sense.
>
> > >> > + *
> > >> > + * PTE encoding used in Linux:
> > >> > + * PAT
> > >> > + * |PCD
> > >> > + * ||PWT PAT
> > >> > + * ||| slot
> > >> > + * 000 0 WB : _PAGE_CACHE_MODE_WB
> > >> > + * 001 1 WC : _PAGE_CACHE_MODE_WC
> > >> > + * 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
> > >> > + * 011 3 UC : _PAGE_CACHE_MODE_UC
> > >> > + * 100 4 <reserved>
> > >> > + * 101 5 <reserved>
> > >> > + * 110 6 <reserved>
> > >>
> > >> Well, they are still mapped to WB/WC/UC_MINUS ....
> > >
> > > Right, the reserved slots are also initialized with their safe values.
> > > However, the macros _PAGE_CACHE_MODE_XXX only refer to the slots
> > > specified above.
>
> Then the table should reflect this, i.e.: reserved, but mapped to XX
>
> And a comment below that explaining WHY we map the reserved slots.

Yes, I will update the table and add a comment.

Thanks,
-Toshi

2014-11-03 18:27:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 2/7] x86, mm, pat: Change reserve_memtype() to handle WT

On Mon, 27 Oct 2014, Toshi Kani wrote:
> This patch changes reserve_memtype() to handle the WT cache mode.
> When PAT is not enabled, it continues to set UC- to *new_type for
> any non-WB request.
>
> When a target range is RAM, reserve_ram_pages_type() fails for WT
> for now. This function may not reserve a RAM range for WT since
> reserve_ram_pages_type() uses the page flags limited to three memory
> types, WB, WC and UC.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/include/asm/cacheflush.h | 4 ++++
> arch/x86/mm/pat.c | 16 +++++++++++++---
> 2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index 157644b..c912680 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -53,6 +53,10 @@ static inline void set_page_memtype(struct page *pg,
> case _PAGE_CACHE_MODE_WB:
> memtype_flags = _PGMT_WB;
> break;
> + case _PAGE_CACHE_MODE_WT:
> + case _PAGE_CACHE_MODE_WP:
> + pr_err("set_page_memtype: unsupported cachemode %d\n", memtype);
> + BUG();

You already catch the cases with the hunk below at the entry of
reserve_ram_pages_type(). So what's the point of the BUG()?

If you are worried about other usage sites: This function should not
at all be in arch/x86/include/asm/cacheflush.h. It's solely used by
PAT, so we really should move it there before changing it.

> default:
> memtype_flags = _PGMT_DEFAULT;
> break;
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index db687c3..a214f5a 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -289,6 +289,8 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)
>
> /*
> * For RAM pages, we use page flags to mark the pages with appropriate type.
> + * The page flags are currently limited to three types, WB, WC and UC. Hence,
> + * any request to WT or WP will fail with -EINVAL.
> * Here we do two pass:
> * - Find the memtype of all the pages in the range, look for any conflicts
> * - In case of no conflicts, set the new memtype for pages in the range
> @@ -300,6 +302,13 @@ static int reserve_ram_pages_type(u64 start, u64 end,
> struct page *page;
> u64 pfn;
>
> + if ((req_type == _PAGE_CACHE_MODE_WT) ||
> + (req_type == _PAGE_CACHE_MODE_WP)) {
> + if (new_type)
> + *new_type = _PAGE_CACHE_MODE_UC_MINUS;
> + return -EINVAL;
> + }
> +
> if (req_type == _PAGE_CACHE_MODE_UC) {
> /* We do not support strong UC */
> WARN_ON_ONCE(1);
> @@ -349,6 +358,7 @@ static int free_ram_pages_type(u64 start, u64 end)
> * - _PAGE_CACHE_MODE_WC
> * - _PAGE_CACHE_MODE_UC_MINUS
> * - _PAGE_CACHE_MODE_UC
> + * - _PAGE_CACHE_MODE_WT
> *
> * If new_type is NULL, function will return an error if it cannot reserve the
> * region with req_type. If new_type is non-NULL, function will return
> @@ -368,10 +378,10 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
> if (!pat_enabled) {
> /* This is identical to page table setting without PAT */
> if (new_type) {
> - if (req_type == _PAGE_CACHE_MODE_WC)
> - *new_type = _PAGE_CACHE_MODE_UC_MINUS;
> + if (req_type == _PAGE_CACHE_MODE_WB)
> + *new_type = _PAGE_CACHE_MODE_WB;
> else
> - *new_type = req_type;
> + *new_type = _PAGE_CACHE_MODE_UC_MINUS;

So until now we supported WB, UC- and UC and mapped WC to UC-. Now we
map everything except WB to UC-

Why feels that wrong without a comment explaining it?

Thanks,

tglx

2014-11-03 19:01:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] x86, mm, pat: Refactor !pat_enabled handling

On Mon, 27 Oct 2014, Toshi Kani wrote:
> diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
> index ee58a0b..96aa8bf 100644
> --- a/arch/x86/mm/iomap_32.c
> +++ b/arch/x86/mm/iomap_32.c
> @@ -70,29 +70,23 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
> return (void *)vaddr;
> }
>
> -/*
> - * Map 'pfn' using protections 'prot'
> - */
> -#define __PAGE_KERNEL_WC (__PAGE_KERNEL | \
> - cachemode2protval(_PAGE_CACHE_MODE_WC))
> -
> void __iomem *
> iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
> {
> /*
> - * For non-PAT systems, promote PAGE_KERNEL_WC to PAGE_KERNEL_UC_MINUS.
> - * PAGE_KERNEL_WC maps to PWT, which translates to uncached if the
> - * MTRR is UC or WC. UC_MINUS gets the real intention, of the
> - * user, which is "WC if the MTRR is WC, UC if you can't do that."
> + * For non-PAT systems, translate non-WB request to UC- just in
> + * case the caller set the PWT bit to prot directly without using
> + * pgprot_writecombine(). UC- translates to uncached if the MTRR
> + * is UC or WC. UC- gets the real intention, of the user, which is
> + * "WC if the MTRR is WC, UC if you can't do that."
> */
> - if (!pat_enabled && pgprot_val(prot) == __PAGE_KERNEL_WC)
> + if (!pat_enabled && pgprot2cachemode(prot) != _PAGE_CACHE_MODE_WB)
> prot = __pgprot(__PAGE_KERNEL |
> cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
>
> return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
> }
> EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
> -#undef __PAGE_KERNEL_WC

Rejects. Please update against Juergens latest.

Thanks,

tglx

2014-11-03 19:11:32

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 2/7] x86, mm, pat: Change reserve_memtype() to handle WT

On Mon, 2014-11-03 at 19:27 +0100, Thomas Gleixner wrote:
> On Mon, 27 Oct 2014, Toshi Kani wrote:
> > This patch changes reserve_memtype() to handle the WT cache mode.
> > When PAT is not enabled, it continues to set UC- to *new_type for
> > any non-WB request.
> >
> > When a target range is RAM, reserve_ram_pages_type() fails for WT
> > for now. This function may not reserve a RAM range for WT since
> > reserve_ram_pages_type() uses the page flags limited to three memory
> > types, WB, WC and UC.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
> > Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/include/asm/cacheflush.h | 4 ++++
> > arch/x86/mm/pat.c | 16 +++++++++++++---
> > 2 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> > index 157644b..c912680 100644
> > --- a/arch/x86/include/asm/cacheflush.h
> > +++ b/arch/x86/include/asm/cacheflush.h
> > @@ -53,6 +53,10 @@ static inline void set_page_memtype(struct page *pg,
> > case _PAGE_CACHE_MODE_WB:
> > memtype_flags = _PGMT_WB;
> > break;
> > + case _PAGE_CACHE_MODE_WT:
> > + case _PAGE_CACHE_MODE_WP:
> > + pr_err("set_page_memtype: unsupported cachemode %d\n", memtype);
> > + BUG();
>
> You already catch the cases with the hunk below at the entry of
> reserve_ram_pages_type(). So what's the point of the BUG()?
>
> If you are worried about other usage sites: This function should not
> at all be in arch/x86/include/asm/cacheflush.h. It's solely used by
> PAT, so we really should move it there before changing it.

Yes, I was worried about other usage. I agree that this function should
belong to PAT. I will move it to pat.c before changing it, and will
remove BUG() from this patch.

> > default:
> > memtype_flags = _PGMT_DEFAULT;
> > break;
> > diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> > index db687c3..a214f5a 100644
> > --- a/arch/x86/mm/pat.c
> > +++ b/arch/x86/mm/pat.c
> > @@ -289,6 +289,8 @@ static int pat_pagerange_is_ram(resource_size_t start, resource_size_t end)
> >
> > /*
> > * For RAM pages, we use page flags to mark the pages with appropriate type.
> > + * The page flags are currently limited to three types, WB, WC and UC. Hence,
> > + * any request to WT or WP will fail with -EINVAL.
> > * Here we do two pass:
> > * - Find the memtype of all the pages in the range, look for any conflicts
> > * - In case of no conflicts, set the new memtype for pages in the range
> > @@ -300,6 +302,13 @@ static int reserve_ram_pages_type(u64 start, u64 end,
> > struct page *page;
> > u64 pfn;
> >
> > + if ((req_type == _PAGE_CACHE_MODE_WT) ||
> > + (req_type == _PAGE_CACHE_MODE_WP)) {
> > + if (new_type)
> > + *new_type = _PAGE_CACHE_MODE_UC_MINUS;
> > + return -EINVAL;
> > + }
> > +
> > if (req_type == _PAGE_CACHE_MODE_UC) {
> > /* We do not support strong UC */
> > WARN_ON_ONCE(1);
> > @@ -349,6 +358,7 @@ static int free_ram_pages_type(u64 start, u64 end)
> > * - _PAGE_CACHE_MODE_WC
> > * - _PAGE_CACHE_MODE_UC_MINUS
> > * - _PAGE_CACHE_MODE_UC
> > + * - _PAGE_CACHE_MODE_WT
> > *
> > * If new_type is NULL, function will return an error if it cannot reserve the
> > * region with req_type. If new_type is non-NULL, function will return
> > @@ -368,10 +378,10 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
> > if (!pat_enabled) {
> > /* This is identical to page table setting without PAT */
> > if (new_type) {
> > - if (req_type == _PAGE_CACHE_MODE_WC)
> > - *new_type = _PAGE_CACHE_MODE_UC_MINUS;
> > + if (req_type == _PAGE_CACHE_MODE_WB)
> > + *new_type = _PAGE_CACHE_MODE_WB;
> > else
> > - *new_type = req_type;
> > + *new_type = _PAGE_CACHE_MODE_UC_MINUS;
>
> So until now we supported WB, UC- and UC and mapped WC to UC-. Now we
> map everything except WB to UC-
> Why feels that wrong without a comment explaining it?

The case of !pat_enable only supports WB and UC-. Previously, WC was
the only type that needed to be mapped to UC-. The code now handles it
properly for any other types. Yes, I will add a comment for this.

Thanks,
-Toshi

2014-11-03 19:23:47

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] x86, mm, pat: Refactor !pat_enabled handling

On Mon, 2014-11-03 at 20:01 +0100, Thomas Gleixner wrote:
> On Mon, 27 Oct 2014, Toshi Kani wrote:
> > diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
> > index ee58a0b..96aa8bf 100644
> > --- a/arch/x86/mm/iomap_32.c
> > +++ b/arch/x86/mm/iomap_32.c
> > @@ -70,29 +70,23 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
> > return (void *)vaddr;
> > }
> >
> > -/*
> > - * Map 'pfn' using protections 'prot'
> > - */
> > -#define __PAGE_KERNEL_WC (__PAGE_KERNEL | \
> > - cachemode2protval(_PAGE_CACHE_MODE_WC))
> > -
> > void __iomem *
> > iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
> > {
> > /*
> > - * For non-PAT systems, promote PAGE_KERNEL_WC to PAGE_KERNEL_UC_MINUS.
> > - * PAGE_KERNEL_WC maps to PWT, which translates to uncached if the
> > - * MTRR is UC or WC. UC_MINUS gets the real intention, of the
> > - * user, which is "WC if the MTRR is WC, UC if you can't do that."
> > + * For non-PAT systems, translate non-WB request to UC- just in
> > + * case the caller set the PWT bit to prot directly without using
> > + * pgprot_writecombine(). UC- translates to uncached if the MTRR
> > + * is UC or WC. UC- gets the real intention, of the user, which is
> > + * "WC if the MTRR is WC, UC if you can't do that."
> > */
> > - if (!pat_enabled && pgprot_val(prot) == __PAGE_KERNEL_WC)
> > + if (!pat_enabled && pgprot2cachemode(prot) != _PAGE_CACHE_MODE_WB)
> > prot = __pgprot(__PAGE_KERNEL |
> > cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
> >
> > return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
> > }
> > EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
> > -#undef __PAGE_KERNEL_WC
>
> Rejects. Please update against Juergens latest.

Yes, I have addressed this merge conflict in my next version. I will
submit it shortly after all other comments are addressed. :-)

Thanks,
-Toshi

Subject: RE: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT



> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Kani, Toshimitsu
> Sent: Monday, 27 October, 2014 5:56 PM
> To: [email protected]; [email protected]; [email protected]; akpm@linux-
> foundation.org; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Kani,
> Toshimitsu
> Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
> WT
>
> This patch adds pgprot_writethrough() for setting WT to a given
> pgprot_t.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
...
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index a214f5a..a0264d3 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
> }
> EXPORT_SYMBOL_GPL(pgprot_writecombine);
>
> +pgprot_t pgprot_writethrough(pgprot_t prot)
> +{
> + if (pat_enabled)
> + return __pgprot(pgprot_val(prot) |
> + cachemode2protval(_PAGE_CACHE_MODE_WT));
> + else
> + return pgprot_noncached(prot);
> +}
> +EXPORT_SYMBOL_GPL(pgprot_writethrough);
...

Would you be willing to use EXPORT_SYMBOL for the new
pgprot_writethrough function to provide more flexibility
for modules to utilize the new feature? In x86/mm, 18 of 60
current exports are GPL and 42 are not GPL.

---
Rob Elliott HP Server Storage

2014-11-03 22:29:09

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, 2014-11-03 at 22:10 +0000, Elliott, Robert (Server Storage)
wrote:
:
> > Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
> > WT
> >
> > This patch adds pgprot_writethrough() for setting WT to a given
> > pgprot_t.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
> > Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> ...
> > diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> > index a214f5a..a0264d3 100644
> > --- a/arch/x86/mm/pat.c
> > +++ b/arch/x86/mm/pat.c
> > @@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
> > }
> > EXPORT_SYMBOL_GPL(pgprot_writecombine);
> >
> > +pgprot_t pgprot_writethrough(pgprot_t prot)
> > +{
> > + if (pat_enabled)
> > + return __pgprot(pgprot_val(prot) |
> > + cachemode2protval(_PAGE_CACHE_MODE_WT));
> > + else
> > + return pgprot_noncached(prot);
> > +}
> > +EXPORT_SYMBOL_GPL(pgprot_writethrough);
> ...
>
> Would you be willing to use EXPORT_SYMBOL for the new
> pgprot_writethrough function to provide more flexibility
> for modules to utilize the new feature? In x86/mm, 18 of 60
> current exports are GPL and 42 are not GPL.

I simply used EXPORT_SYMBOL_GPL() since pgprot_writecombine() used
it. :-) This interface is intended to be used along with
remap_pfn_range() and ioremap_prot(), which are both exported with
EXPORT_SYMBOL(). So, it seems reasonable to export it with
EXPORT_SYMBOL() as well. I will make this change.

Thanks,
-Toshi

2014-11-03 22:54:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, 3 Nov 2014, Toshi Kani wrote:
> On Mon, 2014-11-03 at 22:10 +0000, Elliott, Robert (Server Storage)
> wrote:
> :
> > > Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
> > > WT
> > >
> > > This patch adds pgprot_writethrough() for setting WT to a given
> > > pgprot_t.
> > >
> > > Signed-off-by: Toshi Kani <[email protected]>
> > > Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> > ...
> > > diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> > > index a214f5a..a0264d3 100644
> > > --- a/arch/x86/mm/pat.c
> > > +++ b/arch/x86/mm/pat.c
> > > @@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
> > > }
> > > EXPORT_SYMBOL_GPL(pgprot_writecombine);
> > >
> > > +pgprot_t pgprot_writethrough(pgprot_t prot)
> > > +{
> > > + if (pat_enabled)
> > > + return __pgprot(pgprot_val(prot) |
> > > + cachemode2protval(_PAGE_CACHE_MODE_WT));
> > > + else
> > > + return pgprot_noncached(prot);
> > > +}
> > > +EXPORT_SYMBOL_GPL(pgprot_writethrough);
> > ...
> >
> > Would you be willing to use EXPORT_SYMBOL for the new
> > pgprot_writethrough function to provide more flexibility
> > for modules to utilize the new feature? In x86/mm, 18 of 60
> > current exports are GPL and 42 are not GPL.
>
> I simply used EXPORT_SYMBOL_GPL() since pgprot_writecombine() used
> it. :-) This interface is intended to be used along with
> remap_pfn_range() and ioremap_prot(), which are both exported with
> EXPORT_SYMBOL(). So, it seems reasonable to export it with
> EXPORT_SYMBOL() as well. I will make this change.

NAK.

This is new functionality and we really have no reason to give the GPL
circumventors access to it.

Thanks,

tglx

2014-11-03 23:01:23

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, Nov 3, 2014 at 2:53 PM, Thomas Gleixner <[email protected]> wrote:
> On Mon, 3 Nov 2014, Toshi Kani wrote:
>> On Mon, 2014-11-03 at 22:10 +0000, Elliott, Robert (Server Storage)
>> wrote:
>> :
>> > > Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
>> > > WT
>> > >
>> > > This patch adds pgprot_writethrough() for setting WT to a given
>> > > pgprot_t.
>> > >
>> > > Signed-off-by: Toshi Kani <[email protected]>
>> > > Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
>> > ...
>> > > diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
>> > > index a214f5a..a0264d3 100644
>> > > --- a/arch/x86/mm/pat.c
>> > > +++ b/arch/x86/mm/pat.c
>> > > @@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
>> > > }
>> > > EXPORT_SYMBOL_GPL(pgprot_writecombine);
>> > >
>> > > +pgprot_t pgprot_writethrough(pgprot_t prot)
>> > > +{
>> > > + if (pat_enabled)
>> > > + return __pgprot(pgprot_val(prot) |
>> > > + cachemode2protval(_PAGE_CACHE_MODE_WT));
>> > > + else
>> > > + return pgprot_noncached(prot);
>> > > +}
>> > > +EXPORT_SYMBOL_GPL(pgprot_writethrough);
>> > ...
>> >
>> > Would you be willing to use EXPORT_SYMBOL for the new
>> > pgprot_writethrough function to provide more flexibility
>> > for modules to utilize the new feature? In x86/mm, 18 of 60
>> > current exports are GPL and 42 are not GPL.
>>
>> I simply used EXPORT_SYMBOL_GPL() since pgprot_writecombine() used
>> it. :-) This interface is intended to be used along with
>> remap_pfn_range() and ioremap_prot(), which are both exported with
>> EXPORT_SYMBOL(). So, it seems reasonable to export it with
>> EXPORT_SYMBOL() as well. I will make this change.
>
> NAK.
>
> This is new functionality and we really have no reason to give the GPL
> circumventors access to it.

I have mixed feelings about this.

On the one hand, I agree with your sentiment.

On the other hand, I thought that _GPL was supposed to be more about
whether the thing using it is inherently a derived work of the Linux
kernel. Since WT is an Intel concept, not a Linux concept, then I
think that this is a hard argument to make.

Not that I mind encouraging HP to GPL everything. Although my
experiences so far with HP servers have been so uniformly negative
that I really just want to stay far away from anything storage-related
by HP for several years, so I'm very unlikely to own an affected piece
of hardware any time soon. (Sorry, HP.)

--Andy

2014-11-03 23:32:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, 3 Nov 2014, Andy Lutomirski wrote:
> On Mon, Nov 3, 2014 at 2:53 PM, Thomas Gleixner <[email protected]> wrote:
> > On Mon, 3 Nov 2014, Toshi Kani wrote:
> >> On Mon, 2014-11-03 at 22:10 +0000, Elliott, Robert (Server Storage)
> >> wrote:
> >> > > +EXPORT_SYMBOL_GPL(pgprot_writethrough);
> >> > ...
> >> >
> >> > Would you be willing to use EXPORT_SYMBOL for the new
> >> > pgprot_writethrough function to provide more flexibility
> >> > for modules to utilize the new feature? In x86/mm, 18 of 60
> >> > current exports are GPL and 42 are not GPL.
> >>
> >> I simply used EXPORT_SYMBOL_GPL() since pgprot_writecombine() used
> >> it. :-) This interface is intended to be used along with
> >> remap_pfn_range() and ioremap_prot(), which are both exported with
> >> EXPORT_SYMBOL(). So, it seems reasonable to export it with
> >> EXPORT_SYMBOL() as well. I will make this change.
> >
> > NAK.
> >
> > This is new functionality and we really have no reason to give the GPL
> > circumventors access to it.
>
> I have mixed feelings about this.
>
> On the one hand, I agree with your sentiment.
>
> On the other hand, I thought that _GPL was supposed to be more about
> whether the thing using it is inherently a derived work of the Linux
> kernel. Since WT is an Intel concept, not a Linux concept, then I
> think that this is a hard argument to make.

If your argument stands then almost nothing in Linux which is related
to hardware can stand on its own as it is always dependent on the
underlying hardware. But that's not true. The software support for
that particular hardware feature is always Linux specific.

The point about derived work, which Linus made, is that the GPL might
not necessarily apply to something which was developed completely
independent of Linux in the first place and then adopted via a wrapper
layer. This applies pretty much to the odd binary graphics drivers
which got retrofitted with a Linux interface by wrapping the existing
binary blob.

We have always accomodated with this by not changing the replacement
interfaces for something with was EXPORT_SYMBOL to
EXPORT_SYMBOL_GPL. Though we have forced binary blobs away from
abusing stuff by removing such exports; google for the removal of the
init_mm export.

But that does not mean that we are obliged to expose new Linux
infrastucture which supports existing Intel hardware properties to
drivers which prefer to be closed for whatever reason.

Quite the contrary. We want to expose these new features to the fair
players. The HP driver can live with the less performant modes and if
it wants to use WT, that's none of our problems at all.

Thanks,

tglx

2014-11-04 01:04:39

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, 2014-11-03 at 23:53 +0100, Thomas Gleixner wrote:
> On Mon, 3 Nov 2014, Toshi Kani wrote:
> > On Mon, 2014-11-03 at 22:10 +0000, Elliott, Robert (Server Storage)
> > wrote:
> > :
> > > > Subject: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
> > > > WT
> > > >
> > > > This patch adds pgprot_writethrough() for setting WT to a given
> > > > pgprot_t.
> > > >
> > > > Signed-off-by: Toshi Kani <[email protected]>
> > > > Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> > > ...
> > > > diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> > > > index a214f5a..a0264d3 100644
> > > > --- a/arch/x86/mm/pat.c
> > > > +++ b/arch/x86/mm/pat.c
> > > > @@ -896,6 +896,16 @@ pgprot_t pgprot_writecombine(pgprot_t prot)
> > > > }
> > > > EXPORT_SYMBOL_GPL(pgprot_writecombine);
> > > >
> > > > +pgprot_t pgprot_writethrough(pgprot_t prot)
> > > > +{
> > > > + if (pat_enabled)
> > > > + return __pgprot(pgprot_val(prot) |
> > > > + cachemode2protval(_PAGE_CACHE_MODE_WT));
> > > > + else
> > > > + return pgprot_noncached(prot);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(pgprot_writethrough);
> > > ...
> > >
> > > Would you be willing to use EXPORT_SYMBOL for the new
> > > pgprot_writethrough function to provide more flexibility
> > > for modules to utilize the new feature? In x86/mm, 18 of 60
> > > current exports are GPL and 42 are not GPL.
> >
> > I simply used EXPORT_SYMBOL_GPL() since pgprot_writecombine() used
> > it. :-) This interface is intended to be used along with
> > remap_pfn_range() and ioremap_prot(), which are both exported with
> > EXPORT_SYMBOL(). So, it seems reasonable to export it with
> > EXPORT_SYMBOL() as well. I will make this change.
>
> NAK.
>
> This is new functionality and we really have no reason to give the GPL
> circumventors access to it.

Thanks for the background info about EXPORT_SYMBOL. I will keep it no
change.

-Toshi

Subject: RE: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT



> -----Original Message-----
> From: Andy Lutomirski [mailto:[email protected]]
> Sent: Monday, November 03, 2014 5:01 PM
> To: Thomas Gleixner
> Cc: Kani, Toshimitsu; Elliott, Robert (Server Storage); [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
> WT
>
> On Mon, Nov 3, 2014 at 2:53 PM, Thomas Gleixner <[email protected]>
> wrote:
...
> On the other hand, I thought that _GPL was supposed to be more about
> whether the thing using it is inherently a derived work of the Linux
> kernel. Since WT is an Intel concept, not a Linux concept, then I
> think that this is a hard argument to make.

IBM System/360 Model 85 (1968) had write-through (i.e., store-through)
caching. Intel might claim Write Combining, though.



????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-11-04 15:23:15

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for WT

On Mon, Nov 3, 2014 at 7:34 PM, Elliott, Robert (Server Storage)
<[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:[email protected]]
>> Sent: Monday, November 03, 2014 5:01 PM
>> To: Thomas Gleixner
>> Cc: Kani, Toshimitsu; Elliott, Robert (Server Storage); [email protected];
>> [email protected]; [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH v4 4/7] x86, mm, pat: Add pgprot_writethrough() for
>> WT
>>
>> On Mon, Nov 3, 2014 at 2:53 PM, Thomas Gleixner <[email protected]>
>> wrote:
> ...
>> On the other hand, I thought that _GPL was supposed to be more about
>> whether the thing using it is inherently a derived work of the Linux
>> kernel. Since WT is an Intel concept, not a Linux concept, then I
>> think that this is a hard argument to make.
>
> IBM System/360 Model 85 (1968) had write-through (i.e., store-through)
> caching. Intel might claim Write Combining, though.
>

Arguably WC is, and was, mostly a hack to enable full cacheline writes
without an instruction to do it directly. x86 has such an instruction
now, so WC is less necessary.

In any event, my point wasn't that Intel should get any particular
credit here; it's that this is really a straightforward interface to
program a hardware feature that predates the interface.

--Andy