2020-02-15 11:51:01

by Jan Kiszka

Subject: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

From: Jan Kiszka <[email protected]>

Those are not backed by page structs, and pte_page is returning an
invalid pointer.

Signed-off-by: Jan Kiszka <[email protected]>
---
 arch/riscv/mm/cacheflush.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..9ee2c1a387cc 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
 {
 	struct page *page = pte_page(pte);
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!pfn_valid(pte_pfn(pte)) ||
+	    !test_and_set_bit(PG_dcache_clean, &page->flags))
 		flush_icache_all();
 }
 #endif /* CONFIG_MMU */
--
2.16.4
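
For readers following the thread, the logic of this patch can be sketched in a small user-space model. This is only an illustration under stated assumptions: every `model_*` name, the RAM base PFN and the page count are hypothetical and are not the kernel's definitions. Unlike the kernel's `pte_page`, which computes a pointer outside the memory map for such a PFN, the model returns NULL so the bad case is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model values, not taken from the kernel. */
#define MODEL_RAM_BASE_PFN    0x80000UL
#define MODEL_RAM_PAGES       16UL
#define MODEL_PG_DCACHE_CLEAN 0U

struct model_page { unsigned long flags; };

/* Only PFNs inside "RAM" have a page descriptor. */
struct model_page model_mem_map[MODEL_RAM_PAGES];
unsigned int model_flush_count;

bool model_pfn_valid(unsigned long pfn)
{
    return pfn >= MODEL_RAM_BASE_PFN &&
           pfn - MODEL_RAM_BASE_PFN < MODEL_RAM_PAGES;
}

/* Assumption: return NULL where the kernel would yield an invalid pointer. */
struct model_page *model_pfn_to_page(unsigned long pfn)
{
    if (!model_pfn_valid(pfn))
        return NULL;
    return &model_mem_map[pfn - MODEL_RAM_BASE_PFN];
}

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit value. */
bool model_test_and_set_bit(unsigned int nr, unsigned long *word)
{
    unsigned long mask = 1UL << nr;
    bool old;

    assert(word != NULL);
    old = (*word & mask) != 0;
    *word |= mask;
    return old;
}

void model_flush_icache_all(void)
{
    model_flush_count++;
}

/*
 * Mirrors the patched condition: a PFN without a page descriptor always
 * triggers a full flush, and the descriptor is never dereferenced because
 * the || short-circuits.
 */
void model_flush_icache_pfn(unsigned long pfn)
{
    struct model_page *page = model_pfn_to_page(pfn);

    if (!model_pfn_valid(pfn) ||
        !model_test_and_set_bit(MODEL_PG_DCACHE_CLEAN, &page->flags))
        model_flush_icache_all();
}
```

In this model, a RAM-backed PFN is flushed once and then remembered as clean, while a PFN outside the memory map is flushed on every call because there is nowhere to record its state.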


2020-02-16 14:42:27

by Alexandre Ghiti

Subject: Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

Hi Jan,

On 2/15/20 6:49 AM, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> Those are not backed by page structs, and pte_page is returning an
> invalid pointer.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> ---
>  arch/riscv/mm/cacheflush.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 8930ab7278e6..9ee2c1a387cc 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!pfn_valid(pte_pfn(pte)) ||
> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>  		flush_icache_all();
>  }
>  #endif /* CONFIG_MMU */
> --
> 2.16.4
>
>

When did you encounter such a situation, i.e. executable code that is
not backed by a struct page?

RISC-V uses the generic implementation of ioremap, and the way
_PAGE_IOREMAP is defined does not allow mapping an executable memory region
with ioremap, so I'd like to understand how we end up in
flush_icache_pte for an executable region not backed by any struct page.

Thanks,

Alex

2020-02-16 16:06:26

by Jan Kiszka

Subject: Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

On 16.02.20 15:41, Alex Ghiti wrote:
> Hi Jan,
>
> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>> From: Jan Kiszka <[email protected]>
>>
>> Those are not backed by page structs, and pte_page is returning an
>> invalid pointer.
>>
>> Signed-off-by: Jan Kiszka <[email protected]>
>> ---
>>  arch/riscv/mm/cacheflush.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> index 8930ab7278e6..9ee2c1a387cc 100644
>> --- a/arch/riscv/mm/cacheflush.c
>> +++ b/arch/riscv/mm/cacheflush.c
>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>  {
>>  	struct page *page = pte_page(pte);
>>
>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>> +	if (!pfn_valid(pte_pfn(pte)) ||
>> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>>  		flush_icache_all();
>>  }
>>  #endif /* CONFIG_MMU */
>> --
>> 2.16.4
>>
>>
>
> When did you encounter such a situation ? i.e. executable code that is
> not backed by struct page ?
>
> Riscv uses the generic implementation of ioremap and the way
> _PAGE_IOREMAP is defined does not allow to map executable memory region
> using ioremap, so I'm interested to understand how we end up in
> flush_icache_pte for an executable region not backed by any struct page.

You can create executable mappings of memory that Linux does not
initially consider as RAM via ioremap_prot or ioremap_page_range. We are
using that in Jailhouse to load the hypervisor code into reserved memory
that is ioremapped for the purpose. Works fine on x86, arm and arm64.

Jan

2020-02-16 19:58:01

by Alexandre Ghiti

Subject: Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

On 2/16/20 11:05 AM, Jan Kiszka wrote:
> On 16.02.20 15:41, Alex Ghiti wrote:
>> Hi Jan,
>>
>> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>>> From: Jan Kiszka <[email protected]>
>>>
>>> Those are not backed by page structs, and pte_page is returning an
>>> invalid pointer.
>>>
>>> Signed-off-by: Jan Kiszka <[email protected]>
>>> ---
>>>  arch/riscv/mm/cacheflush.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>>> index 8930ab7278e6..9ee2c1a387cc 100644
>>> --- a/arch/riscv/mm/cacheflush.c
>>> +++ b/arch/riscv/mm/cacheflush.c
>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>>  {
>>>  	struct page *page = pte_page(pte);
>>>
>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>> +	if (!pfn_valid(pte_pfn(pte)) ||
>>> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>>>  		flush_icache_all();
>>>  }
>>>  #endif /* CONFIG_MMU */
>>> --
>>> 2.16.4
>>>
>>>
>>
>> When did you encounter such a situation ? i.e. executable code that is
>> not backed by struct page ?
>>
>> Riscv uses the generic implementation of ioremap and the way
>> _PAGE_IOREMAP is defined does not allow to map executable memory region
>> using ioremap, so I'm interested to understand how we end up in
>> flush_icache_pte for an executable region not backed by any struct page.
>
> You can create executable mappings of memory that Linux does not
> initially consider as RAM via ioremap_prot or ioremap_page_range. We are
> using that in Jailhouse to load the hypervisor code into reserved memory
> that is ioremapped for the purpose. Works fine on x86, arm and arm64.
>
> Jan

Ok thanks, I had missed this API.

Regarding your patch, I find it odd to do anything at all if the pfn is
invalid: we could have garbage in the pte pointing to an invalid region,
for example (I admit that flushing the icache would not be
catastrophic in that situation).

I'm not saying I will come up with a better solution, but I'll take a deeper
look tomorrow.

Alex

2020-02-20 05:50:11

by Alexandre Ghiti

Subject: Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

Hi Jan,

On 2/16/20 2:56 PM, Alex Ghiti wrote:
> On 2/16/20 11:05 AM, Jan Kiszka wrote:
>> On 16.02.20 15:41, Alex Ghiti wrote:
>>> Hi Jan,
>>>
>>> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>>>> From: Jan Kiszka <[email protected]>
>>>>
>>>> Those are not backed by page structs, and pte_page is returning an
>>>> invalid pointer.
>>>>
>>>> Signed-off-by: Jan Kiszka <[email protected]>
>>>> ---
>>>>  arch/riscv/mm/cacheflush.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>>>> index 8930ab7278e6..9ee2c1a387cc 100644
>>>> --- a/arch/riscv/mm/cacheflush.c
>>>> +++ b/arch/riscv/mm/cacheflush.c
>>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>>>  {
>>>>  	struct page *page = pte_page(pte);
>>>>
>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>> +	if (!pfn_valid(pte_pfn(pte)) ||
>>>> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>  		flush_icache_all();
>>>>  }
>>>>  #endif /* CONFIG_MMU */
>>>> --
>>>> 2.16.4
>>>>
>>>>
>>>
>>> When did you encounter such a situation ? i.e. executable code that is
>>> not backed by struct page ?
>>>
>>> Riscv uses the generic implementation of ioremap and the way
>>> _PAGE_IOREMAP is defined does not allow to map executable memory region
>>> using ioremap, so I'm interested to understand how we end up in
>>> flush_icache_pte for an executable region not backed by any struct page.
>>
>> You can create executable mappings of memory that Linux does not
>> initially consider as RAM via ioremap_prot or ioremap_page_range. We are
>> using that in Jailhouse to load the hypervisor code into reserved memory
>> that is ioremapped for the purpose. Works fine on x86, arm and arm64.
>>
>> Jan
>
> Ok thanks, I had missed this API.
>
> Regarding your patch, I find it weird to do anything if the pfn is
> invalid, we could have garbage in pte pointing to an invalid region for
> example (I admit that the effect of flushing the icache would not be
> catastrophic in that situation).
>
> I'm not saying I will come with a better solution but I'll take a deeper
> look tomorrow.
>
> Alex
>

I took a look at the Jailhouse driver. After loading the hypervisor into
the ioremapped region, it explicitly ensures icache/dcache consistency
by calling flush_icache_range here:

https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505

There seems to be an implicit (?) rule that in-kernel code
modification must handle icache/dcache consistency itself:

In the arm64 definition of set_pte_at, the icache and dcache are not
synced when the pte is a kernel pte:

https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271

MIPS does the same:

https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137

So, funnily enough, I'd do the opposite of what you have done, the MIPS way:

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 8930ab7278e6..c90c8bb49109 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte)
 {
 	struct page *page = pte_page(pte);
 
+	if (unlikely(!pfn_valid(pte_pfn(pte))))
+		return;
+
 	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
 		flush_icache_all();
 }

What do you think?

Alex
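
The alternative proposed in this message can be sketched the same way in user space. This is a hedged illustration, not kernel code: all `sim_*` names, the RAM base PFN and the loader function are hypothetical. It assumes, as the message argues, that whoever writes code into a region without page descriptors issues the explicit range flush itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model values, not taken from the kernel. */
#define SIM_RAM_BASE_PFN    0x80000UL
#define SIM_RAM_PAGES       16UL
#define SIM_PG_DCACHE_CLEAN 1UL /* bit mask in sim_page.flags */

struct sim_page { unsigned long flags; };

struct sim_page sim_mem_map[SIM_RAM_PAGES];
unsigned int sim_full_flushes;  /* counts modelled flush_icache_all() calls */
unsigned int sim_range_flushes; /* counts modelled explicit range flushes */

bool sim_pfn_valid(unsigned long pfn)
{
    return pfn >= SIM_RAM_BASE_PFN &&
           pfn - SIM_RAM_BASE_PFN < SIM_RAM_PAGES;
}

void sim_flush_icache_all(void)
{
    sim_full_flushes++;
}

/* Stand-in for the explicit flush the loading driver performs. */
void sim_flush_icache_range(unsigned long start, unsigned long end)
{
    assert(start <= end);
    sim_range_flushes++;
}

/* Early return: do nothing for a PFN that has no page descriptor. */
void sim_flush_icache_pfn(unsigned long pfn)
{
    struct sim_page *page;

    if (!sim_pfn_valid(pfn))
        return;

    page = &sim_mem_map[pfn - SIM_RAM_BASE_PFN];
    if (!(page->flags & SIM_PG_DCACHE_CLEAN)) {
        page->flags |= SIM_PG_DCACHE_CLEAN;
        sim_flush_icache_all();
    }
}

/* Hypothetical loader: map, copy code, then flush the range explicitly. */
void sim_load_code(unsigned long pfn, unsigned long start, unsigned long size)
{
    sim_flush_icache_pfn(pfn); /* page-table hook: no-op for this PFN */
    /* ... the code would be copied into the mapping here ... */
    sim_flush_icache_range(start, start + size);
}
```

With this variant the page-table hook never touches a missing page descriptor and never flushes on its behalf; consistency for the ioremapped region depends entirely on the caller's explicit flush.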

2020-02-20 06:40:10

by Jan Kiszka

Subject: Re: [PATCH v2 3/3] riscv: Fix crash when flushing executable ioremap regions

On 20.02.20 06:49, Alex Ghiti wrote:
> Hi Jan,
>
> On 2/16/20 2:56 PM, Alex Ghiti wrote:
>> On 2/16/20 11:05 AM, Jan Kiszka wrote:
>>> On 16.02.20 15:41, Alex Ghiti wrote:
>>>> Hi Jan,
>>>>
>>>> On 2/15/20 6:49 AM, Jan Kiszka wrote:
>>>>> From: Jan Kiszka <[email protected]>
>>>>>
>>>>> Those are not backed by page structs, and pte_page is returning an
>>>>> invalid pointer.
>>>>>
>>>>> Signed-off-by: Jan Kiszka <[email protected]>
>>>>> ---
>>>>>  arch/riscv/mm/cacheflush.c | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>>>>> index 8930ab7278e6..9ee2c1a387cc 100644
>>>>> --- a/arch/riscv/mm/cacheflush.c
>>>>> +++ b/arch/riscv/mm/cacheflush.c
>>>>> @@ -84,7 +84,8 @@ void flush_icache_pte(pte_t pte)
>>>>>  {
>>>>>  	struct page *page = pte_page(pte);
>>>>>
>>>>> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>> +	if (!pfn_valid(pte_pfn(pte)) ||
>>>>> +	    !test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>  		flush_icache_all();
>>>>>  }
>>>>>  #endif /* CONFIG_MMU */
>>>>> --
>>>>> 2.16.4
>>>>>
>>>>>
>>>>
>>>> When did you encounter such a situation ? i.e. executable code that is
>>>> not backed by struct page ?
>>>>
>>>> Riscv uses the generic implementation of ioremap and the way
>>>> _PAGE_IOREMAP is defined does not allow to map executable memory region
>>>> using ioremap, so I'm interested to understand how we end up in
>>>> flush_icache_pte for an executable region not backed by any struct
>>>> page.
>>>
>>> You can create executable mappings of memory that Linux does not
>>> initially consider as RAM via ioremap_prot or ioremap_page_range. We are
>>> using that in Jailhouse to load the hypervisor code into reserved memory
>>> that is ioremapped for the purpose. Works fine on x86, arm and arm64.
>>>
>>> Jan
>>
>> Ok thanks, I had missed this API.
>>
>> Regarding your patch, I find it weird to do anything if the pfn is
>> invalid, we could have garbage in pte pointing to an invalid region
>> for example (I admit that the effect of flushing the icache would not
>> be catastrophic in that situation).
>>
>> I'm not saying I will come with a better solution but I'll take a
>> deeper look tomorrow.
>>
>> Alex
>>
>
> I took a look at the Jailhouse driver. After loading the hypervisor into
> the ioremapped region, it explicitly ensures icache/dcache consistency
> by calling flush_icache_range here:
>
> https://github.com/siemens/jailhouse/blob/master/driver/main.c#L505
>

Yeah, the arm64 port needed this.

> There seems to be an implicit (?) rule that states that in-kernel code
> modification must handle icache/dcache consistency:
>
> In arm64 set_pte_at definition, they do not sync icache/dcache when the
> pte is kernel:
>
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/pgtable.h#L271
>
>
> In mips, they do the same:
>
> https://elixir.bootlin.com/linux/latest/source/arch/mips/mm/cache.c#L137
>
> So funnily, I'd do the contrary of what you have done, the mips way:
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 8930ab7278e6..c90c8bb49109 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -84,6 +84,9 @@ void flush_icache_pte(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>
> +	if (unlikely(!pfn_valid(pte_pfn(pte))))
> +		return;
> +
>  	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>  		flush_icache_all();
>  }
>
> What do you think ?
>

I wouldn't mind doing it like above. I suspect that became the common
simple pattern because no one expected a use case like Jailhouse's.
But I'm far from an expert in mm topics in the kernel.

Jan