2015-02-26 13:16:10

by Lars Persson

[permalink] [raw]
Subject: [PATCH 0/2] New fix for icache coherency race

This patch set proposes an improved fix for the race condition that
originally was fixed in commit 2a4a8b1e5d9d ("MIPS: Remove race window
in page fault handling").

I have used the flush_icache_page API that is marked as deprecated in
Documentation/cachetlb.txt. There are strong reasons to keep this API
because it is not possible to implement an efficient and race-free
lazy flushing using the other APIs.

You can refer to a discussion about the same issue in arch/arm where
they chose to implement the solution in set_pte_at. In arch/mips we
could not do this because we lack information about the executability
of the mapping in set_pte_at() and thus we would have to flush all
pages to be safe.

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/030915.html

Lars Persson (2):
Revert "MIPS: Remove race window in page fault handling"
MIPS: Fix race condition in lazy cache flushing.

arch/mips/include/asm/cacheflush.h | 35 ++++++++++++++++++++---------------
arch/mips/include/asm/pgtable.h | 10 ++++++----
arch/mips/mm/cache.c | 27 ++++++++-------------------
3 files changed, 34 insertions(+), 38 deletions(-)

--
1.7.10.4


2015-02-26 13:16:38

by Lars Persson

[permalink] [raw]
Subject: [PATCH 1/2] Revert "MIPS: Remove race window in page fault handling"

Revert commit 2a4a8b1e5d9d ("MIPS: Remove race window in page fault
handling") because it increased the number of flushed dcache pages and
became a performance problem for some workloads.

Signed-off-by: Lars Persson <[email protected]>
---
arch/mips/include/asm/pgtable.h | 10 ++++++----
arch/mips/mm/cache.c | 27 ++++++++-------------------
2 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index bef782c..bd6d1cf 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -127,12 +127,9 @@ do { \
} \
} while(0)

-
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
- pte_t pteval);
-
#if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)

+
#define pte_none(pte) (!(((pte).pte_low | (pte).pte_high) & ~_PAGE_GLOBAL))
#define pte_present(pte) ((pte).pte_low & _PAGE_PRESENT)

@@ -154,6 +151,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
}
}
}
+#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)

static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
@@ -192,6 +190,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
}
#endif
}
+#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)

static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
@@ -407,12 +406,15 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)

extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
pte_t pte);
+extern void __update_cache(struct vm_area_struct *vma, unsigned long address,
+ pte_t pte);

static inline void update_mmu_cache(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
pte_t pte = *ptep;
__update_tlb(vma, address, pte);
+ __update_cache(vma, address, pte);
}

static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 7e3ea77..f7b91d3 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -119,36 +119,25 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)

EXPORT_SYMBOL(__flush_anon_page);

-static void mips_flush_dcache_from_pte(pte_t pteval, unsigned long address)
+void __update_cache(struct vm_area_struct *vma, unsigned long address,
+ pte_t pte)
{
struct page *page;
- unsigned long pfn = pte_pfn(pteval);
+ unsigned long pfn, addr;
+ int exec = (vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc;

+ pfn = pte_pfn(pte);
if (unlikely(!pfn_valid(pfn)))
return;
-
page = pfn_to_page(pfn);
if (page_mapping(page) && Page_dcache_dirty(page)) {
- unsigned long page_addr = (unsigned long) page_address(page);
-
- if (!cpu_has_ic_fills_f_dc ||
- pages_do_alias(page_addr, address & PAGE_MASK))
- flush_data_cache_page(page_addr);
+ addr = (unsigned long) page_address(page);
+ if (exec || pages_do_alias(addr, address & PAGE_MASK))
+ flush_data_cache_page(addr);
ClearPageDcacheDirty(page);
}
}

-void set_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pteval)
-{
- if (cpu_has_dc_aliases || !cpu_has_ic_fills_f_dc) {
- if (pte_present(pteval))
- mips_flush_dcache_from_pte(pteval, addr);
- }
-
- set_pte(ptep, pteval);
-}
-
unsigned long _page_cachable_default;
EXPORT_SYMBOL(_page_cachable_default);

--
1.7.10.4

2015-02-26 13:16:11

by Lars Persson

[permalink] [raw]
Subject: [PATCH 2/2] MIPS: Fix race condition in lazy cache flushing.

The lazy cache flushing implemented in the MIPS kernel suffers from a
race condition that is exposed by do_set_pte() in mm/memory.c.

A pre-condition is a file-system that writes to the page from the CPU
in its readpage method and then calls flush_dcache_page(). One example
is ubifs. Another pre-condition is that the dcache flush is postponed
in __flush_dcache_page().

Upon a page fault for an executable mapping not existing in the
page-cache, the following will happen:
1. Write to the page
2. flush_dcache_page
3. flush_icache_page
4. set_pte_at
5. update_mmu_cache (commits the flush of a dcache-dirty page)

Between steps 4 and 5 another thread can hit the same page and it will
encounter a valid pte. Because the data still is in the L1 dcache the CPU
will fetch stale data from L2 into the icache and execute garbage.

This fix moves the commit of the cache flush to step 3 to close the
race window. It also reduces the amount of flushes on non-executable
mappings because we never enter __flush_dcache_page() for non-aliasing
CPUs.

Regressions can occur in drivers that mistakenly relies on the
flush_dcache_page() in get_user_pages() for DMA operations.

Signed-off-by: Lars Persson <[email protected]>
---
arch/mips/include/asm/cacheflush.h | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index e08381a..37d5cf9 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -29,6 +29,20 @@
* - flush_icache_all() flush the entire instruction cache
* - flush_data_cache_page() flushes a page from the data cache
*/
+
+ /*
+ * This flag is used to indicate that the page pointed to by a pte
+ * is dirty and requires cleaning before returning it to the user.
+ */
+#define PG_dcache_dirty PG_arch_1
+
+#define Page_dcache_dirty(page) \
+ test_bit(PG_dcache_dirty, &(page)->flags)
+#define SetPageDcacheDirty(page) \
+ set_bit(PG_dcache_dirty, &(page)->flags)
+#define ClearPageDcacheDirty(page) \
+ clear_bit(PG_dcache_dirty, &(page)->flags)
+
extern void (*flush_cache_all)(void);
extern void (*__flush_cache_all)(void);
extern void (*flush_cache_mm)(struct mm_struct *mm);
@@ -41,9 +55,10 @@ extern void __flush_dcache_page(struct page *page);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
static inline void flush_dcache_page(struct page *page)
{
- if (cpu_has_dc_aliases || !cpu_has_ic_fills_f_dc)
+ if (cpu_has_dc_aliases)
__flush_dcache_page(page);
-
+ else if (!cpu_has_ic_fills_f_dc)
+ SetPageDcacheDirty(page);
}

#define flush_dcache_mmap_lock(mapping) do { } while (0)
@@ -61,6 +76,9 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
static inline void flush_icache_page(struct vm_area_struct *vma,
struct page *page)
{
+ if (!cpu_has_ic_fills_f_dc && (vma->vm_flags & VM_EXEC) &&
+ Page_dcache_dirty(page))
+ __flush_dcache_page(page);
}

extern void (*flush_icache_range)(unsigned long start, unsigned long end);
@@ -95,19 +113,6 @@ extern void (*flush_icache_all)(void);
extern void (*local_flush_data_cache_page)(void * addr);
extern void (*flush_data_cache_page)(unsigned long addr);

-/*
- * This flag is used to indicate that the page pointed to by a pte
- * is dirty and requires cleaning before returning it to the user.
- */
-#define PG_dcache_dirty PG_arch_1
-
-#define Page_dcache_dirty(page) \
- test_bit(PG_dcache_dirty, &(page)->flags)
-#define SetPageDcacheDirty(page) \
- set_bit(PG_dcache_dirty, &(page)->flags)
-#define ClearPageDcacheDirty(page) \
- clear_bit(PG_dcache_dirty, &(page)->flags)
-
/* Run kernel code uncached, useful for cache probing functions. */
unsigned long run_uncached(void *func);

--
1.7.10.4

2015-04-08 22:53:30

by Paul Burton

[permalink] [raw]
Subject: Re: [PATCH 0/2] New fix for icache coherency race

On Thu, Feb 26, 2015 at 02:16:01PM +0100, Lars Persson wrote:
> This patch set proposes an improved fix for the race condition that
> originally was fixed in commit 2a4a8b1e5d9d ("MIPS: Remove race window
> in page fault handling").
>
> I have used the flush_icache_page API that is marked as deprecated in
> Documentation/cachetlb.txt. There are strong reasons to keep this API
> because it is not possible to implement an efficient and race-free
> lazy flushing using the other APIs.
>
> You can refer to a discussion about the same issue in arch/arm where
> they chose to implement the solution in set_pte_at. In arch/mips we
> could not do this because we lack information about the executability
> of the mapping in set_pte_at() and thus we would have to flush all
> pages to be safe.
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/030915.html
>
> Lars Persson (2):
> Revert "MIPS: Remove race window in page fault handling"
> MIPS: Fix race condition in lazy cache flushing.

FYI these 2 patches prevent a linux-next based kernel booting on the
Ingenic JZ4780-based CI20 board. I've not yet tried on the in-tree
JZ4740-based qi_lb60 to see whether it's also affected, nor have I yet
figured out what's going wrong. I'll hopefully dig into it tomorrow, but
just a heads up!

The boot failure (using an initramfs, so no DMA or much I/O at all):

[ 4.618013] Freeing unused kernel memory: 5160K (803d6000 - 808e0000)
[ 4.625211] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 80027924, ra == 8001db10
[ 4.635881] Oops[#1]:
[ 4.638149] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc7-next-20150408+ #334
[ 4.645354] task: 8dc39568 ti: 8dc3a000 task.ti: 8dc3a000
[ 4.650736] $ 0 : 00000000 00000001 00000001 00001000
[ 4.655965] $ 4 : 00000000 00000001 808e0000 8df0b610
[ 4.639347] $ 8 : 00000000 8d8f8160 00000000 00000000
[ 4.644575] $12 : 00000000 00000118 00000040 ffff0000
[ 4.649802] $16 : 81c649fc 77984000 00000004 8df013f8
[ 4.655030] $20 : 81c649fc 00000000 00000000 00040000
[ 4.638413] $24 : ff000000 80027920
[ 4.643642] $28 : 8dc3a000 8dc3bd38 ffffffbf 8001db10
[ 4.648871] Hi : 3036f946
[ 4.651740] Lo : eeea49fc
[ 4.654620] epc : 80027924 r4k_blast_dcache_page_dc32+0x4/0x9c
[ 4.638763] Not tainted
[ 4.641550] ra : 8001db10 __update_cache+0xa4/0xd0
[ 4.646585] Status: 10000403 KERNEL EXL IE
[ 4.650767] Cause : 00800008
[ 4.653634] BadVA : 00000000
[ 4.656503] PrId : 3ee1024f (Ingenic JZRISC)
[ 4.639002] Process init (pid: 1, threadinfo=8dc3a000, task=8dc39568, tls=00000000)
[ 4.646636] Stack : 3f916478 67caba37 58047621 77984000 58047621 77984000 8df0b610 800b777c
00400000 00031ec0 00000001 00400000 8d8f8178 8ddae840 8de8fbdc 81c649fc
8df013f8 8dc3be00 8de8fbe0 8008f7ac 8df09e38 8de9be40 fffffff8 8deeb000
00000000 00000000 00000000 00000040 803b9924 8dcd9000 77984000 800e2200
803ca320 800e1474 8df013f8 00000610 77984b50 00000000 8df00778 8df0b610
...
[ 4.638523] Call Trace:
[ 4.640962] [<80027924>] r4k_blast_dcache_page_dc32+0x4/0x9c
[ 4.646608] [<8001db10>] __update_cache+0xa4/0xd0
[ 4.651305] [<800b777c>] do_set_pte+0x14c/0x174
[ 4.655826] [<8008f7ac>] filemap_map_pages+0x2ac/0x384
[ 4.639107] [<800b7a54>] handle_mm_fault+0x2b0/0x1020
[ 4.644148] [<8001f460>] __do_page_fault+0x160/0x470
[ 4.649102] [<80013e24>] resume_userspace_check+0x0/0x10
[ 4.654395]
[ 4.655876]
Code: 03e00008 00000000 24831000 <bc950000> bc950020 bc950040 bc950060 bc950080 bc9500a0
[ 4.643978] ---[ end trace 672ef517bf5944f0 ]---
[ 4.648581] Fatal exception: panic in 5 seconds

The next-20140408 based CI20 branch (currently including reverts of
these 2 patches) if anyone wants to reproduce:

https://github.com/paulburton/linux/tree/wip-ci20-v4.1

Thanks,
Paul

> arch/mips/include/asm/cacheflush.h | 35 ++++++++++++++++++++---------------
> arch/mips/include/asm/pgtable.h | 10 ++++++----
> arch/mips/mm/cache.c | 27 ++++++++-------------------
> 3 files changed, 34 insertions(+), 38 deletions(-)
>
> --
> 1.7.10.4
>


Attachments:
(No filename) (4.30 kB)
signature.asc (819.00 B)
Digital signature
Download all attachments

2015-04-09 10:54:27

by Lars Persson

[permalink] [raw]
Subject: Re: [PATCH 0/2] New fix for icache coherency race

Hi Paul

I guess you have run into the limitation of __update_cache() not supporting highmem pages. That could be caused by an omission in my patch that forgets to clear the PG_dcache_dirty flag in flush_icache_page. I will submit an incremental patch for this.

- Lars

> 9 apr 2015 kl. 00:53 skrev Paul Burton <[email protected]>:
>
>> On Thu, Feb 26, 2015 at 02:16:01PM +0100, Lars Persson wrote:
>> This patch set proposes an improved fix for the race condition that
>> originally was fixed in commit 2a4a8b1e5d9d ("MIPS: Remove race window
>> in page fault handling").
>>
>> I have used the flush_icache_page API that is marked as deprecated in
>> Documentation/cachetlb.txt. There are strong reasons to keep this API
>> because it is not possible to implement an efficient and race-free
>> lazy flushing using the other APIs.
>>
>> You can refer to a discussion about the same issue in arch/arm where
>> they chose to implement the solution in set_pte_at. In arch/mips we
>> could not do this because we lack information about the executability
>> of the mapping in set_pte_at() and thus we would have to flush all
>> pages to be safe.
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/030915.html
>>
>> Lars Persson (2):
>> Revert "MIPS: Remove race window in page fault handling"
>> MIPS: Fix race condition in lazy cache flushing.
>
> FYI these 2 patches prevent a linux-next based kernel booting on the
> Ingenic JZ4780-based CI20 board. I've not yet tried on the in-tree
> JZ4740-based qi_lb60 to see whether it's also affected, nor have I yet
> figured out what's going wrong. I'll hopefully dig into it tomorrow, but
> just a heads up!
>
> The boot failure (using an initramfs, so no DMA or much I/O at all):
>
> [ 4.618013] Freeing unused kernel memory: 5160K (803d6000 - 808e0000)
> [ 4.625211] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 80027924, ra == 8001db10
> [ 4.635881] Oops[#1]:
> [ 4.638149] CPU: 0 PID: 1 Comm: init Not tainted 4.0.0-rc7-next-20150408+ #334
> [ 4.645354] task: 8dc39568 ti: 8dc3a000 task.ti: 8dc3a000
> [ 4.650736] $ 0 : 00000000 00000001 00000001 00001000
> [ 4.655965] $ 4 : 00000000 00000001 808e0000 8df0b610
> [ 4.639347] $ 8 : 00000000 8d8f8160 00000000 00000000
> [ 4.644575] $12 : 00000000 00000118 00000040 ffff0000
> [ 4.649802] $16 : 81c649fc 77984000 00000004 8df013f8
> [ 4.655030] $20 : 81c649fc 00000000 00000000 00040000
> [ 4.638413] $24 : ff000000 80027920
> [ 4.643642] $28 : 8dc3a000 8dc3bd38 ffffffbf 8001db10
> [ 4.648871] Hi : 3036f946
> [ 4.651740] Lo : eeea49fc
> [ 4.654620] epc : 80027924 r4k_blast_dcache_page_dc32+0x4/0x9c
> [ 4.638763] Not tainted
> [ 4.641550] ra : 8001db10 __update_cache+0xa4/0xd0
> [ 4.646585] Status: 10000403 KERNEL EXL IE
> [ 4.650767] Cause : 00800008
> [ 4.653634] BadVA : 00000000
> [ 4.656503] PrId : 3ee1024f (Ingenic JZRISC)
> [ 4.639002] Process init (pid: 1, threadinfo=8dc3a000, task=8dc39568, tls=00000000)
> [ 4.646636] Stack : 3f916478 67caba37 58047621 77984000 58047621 77984000 8df0b610 800b777c
> 00400000 00031ec0 00000001 00400000 8d8f8178 8ddae840 8de8fbdc 81c649fc
> 8df013f8 8dc3be00 8de8fbe0 8008f7ac 8df09e38 8de9be40 fffffff8 8deeb000
> 00000000 00000000 00000000 00000040 803b9924 8dcd9000 77984000 800e2200
> 803ca320 800e1474 8df013f8 00000610 77984b50 00000000 8df00778 8df0b610
> ...
> [ 4.638523] Call Trace:
> [ 4.640962] [<80027924>] r4k_blast_dcache_page_dc32+0x4/0x9c
> [ 4.646608] [<8001db10>] __update_cache+0xa4/0xd0
> [ 4.651305] [<800b777c>] do_set_pte+0x14c/0x174
> [ 4.655826] [<8008f7ac>] filemap_map_pages+0x2ac/0x384
> [ 4.639107] [<800b7a54>] handle_mm_fault+0x2b0/0x1020
> [ 4.644148] [<8001f460>] __do_page_fault+0x160/0x470
> [ 4.649102] [<80013e24>] resume_userspace_check+0x0/0x10
> [ 4.654395]
> [ 4.655876]
> Code: 03e00008 00000000 24831000 <bc950000> bc950020 bc950040 bc950060 bc950080 bc9500a0
> [ 4.643978] ---[ end trace 672ef517bf5944f0 ]---
> [ 4.648581] Fatal exception: panic in 5 seconds
>
> The next-20140408 based CI20 branch (currently including reverts of
> these 2 patches) if anyone wants to reproduce:
>
> https://github.com/paulburton/linux/tree/wip-ci20-v4.1
>
> Thanks,
> Paul
>
>> arch/mips/include/asm/cacheflush.h | 35 ++++++++++++++++++++---------------
>> arch/mips/include/asm/pgtable.h | 10 ++++++----
>> arch/mips/mm/cache.c | 27 ++++++++-------------------
>> 3 files changed, 34 insertions(+), 38 deletions(-)
>>
>> --
>> 1.7.10.4
>>