2022-09-08 17:19:04

by Andrew Bresticker

Subject: [PATCH] riscv: Allow PROT_WRITE-only mmap()

Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() return EINVAL if PROT_WRITE was set without
PROT_READ with the justification that a write-only PTE is considered a
reserved PTE permission bit pattern in the privileged spec. This check
is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
inconsistent with other architectures that don't support write-only PTEs,
creating a potential software portability issue. Just remove the check
altogether and let PROT_WRITE imply PROT_READ as is the case on other
architectures.

Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
Signed-off-by: Andrew Bresticker <[email protected]>
---
arch/riscv/kernel/sys_riscv.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 571556bb9261..5d3f2fbeb33c 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;
 
-	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
-		return -EINVAL;
-
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
--
2.25.1
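
The protection_map argument in the commit message can be illustrated with a small model. The sketch below is not the kernel's actual table: the SK_* names are hypothetical stand-ins for the VM_* flags and PTE R/W/X bits, and the table only models the idea for shared mappings (private mappings additionally involve copy-on-write, which is ignored here). It shows a lookup in which a write-only request resolves to the same PTE bits as read+write, so the reserved "W without R" encoding is never produced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's VM_READ/VM_WRITE/VM_EXEC flags. */
enum { SK_VM_READ = 1, SK_VM_WRITE = 2, SK_VM_EXEC = 4 };

/* Hypothetical stand-ins for the R/W/X permission bits of a PTE. */
enum { SK_PTE_R = 1, SK_PTE_W = 2, SK_PTE_X = 4 };

/*
 * Simplified protection_map-style table, indexed by the requested VM_* bits.
 * Every entry that asks for write access also carries the read bit.
 */
static const unsigned sk_protection_map[8] = {
	[0]                                     = 0,
	[SK_VM_READ]                            = SK_PTE_R,
	[SK_VM_WRITE]                           = SK_PTE_R | SK_PTE_W,
	[SK_VM_READ | SK_VM_WRITE]              = SK_PTE_R | SK_PTE_W,
	[SK_VM_EXEC]                            = SK_PTE_X,
	[SK_VM_EXEC | SK_VM_READ]               = SK_PTE_R | SK_PTE_X,
	[SK_VM_EXEC | SK_VM_WRITE]              = SK_PTE_R | SK_PTE_W | SK_PTE_X,
	[SK_VM_EXEC | SK_VM_READ | SK_VM_WRITE] = SK_PTE_R | SK_PTE_W | SK_PTE_X,
};

/* Look up the PTE permission bits for a set of VM_* flags. */
static unsigned sk_pte_bits(unsigned vm_flags)
{
	return sk_protection_map[vm_flags & 7u];
}

/* A PTE with W set but R clear is the reserved pattern discussed above. */
static bool sk_pte_is_reserved(unsigned pte)
{
	return (pte & SK_PTE_W) && !(pte & SK_PTE_R);
}

/* Walk the whole table and confirm no entry uses the reserved pattern. */
static void sk_check_table(void)
{
	for (unsigned i = 0; i < 8; i++)
		assert(!sk_pte_is_reserved(sk_pte_bits(i)));
}
```

With a table shaped like this, rejecting PROT_WRITE-only requests in mmap() is not needed to keep reserved PTE encodings out of the page tables.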


2022-09-08 17:24:24

by SS JieJi

Subject: Re: [PATCH] riscv: Allow PROT_WRITE-only mmap()

> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> inconsistent with other architectures that don't support write-only PTEs,
> creating a potential software portability issue.

I don't believe that the check is unnecessary. The missing check was
discovered in a real-world scenario while we were fixing a libaio test
failure on RISC-V [1]. A minimal reproducible example is uploaded to
https://fars.ee/1sPb, showing *inconsistent* read results on -r- pages
before/after a write attempt performed by the kernel.

[1]: https://pagure.io/libaio/blob/1b18bfafc6a2f7b9fa2c6be77a95afed8b7be448/f/harness/cases/5.t

> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> - return -EINVAL;
> -

Just to mention, this revert patch is removing the check of exec
without read (--x), too.

2022-09-08 17:37:36

by SS JieJi

Subject: Re: [PATCH] riscv: Allow PROT_WRITE-only mmap()

> https://fars.ee/1sPb, showing *inconsistent* read results on -r- pages
> before/after a write attempt performed by the kernel.

That said, maybe prohibiting mmap-ing of -w- pages is not the best fix
for this issue. If -w- pages are irreplaceable for some use cases (and
hence need to be allowed), I'd suggest we at least fix the read-result
inconsistency somewhere else rather than simply reverting the patch.

Yours, Pan Ruizhe

2022-09-08 19:35:52

by Andrew Bresticker

Subject: Re: [PATCH] riscv: Allow PROT_WRITE-only mmap()

On Thu, Sep 8, 2022 at 1:28 PM SS JieJi <[email protected]> wrote:
>
> > https://fars.ee/1sPb, showing *inconsistent* read results on -r- pages
> > before/after a write attempt performed by the kernel.
>
> That said, maybe prohibiting mmap-ing of -w- pages is not the best fix
> for this issue. If -w- pages are irreplaceable for some use cases (and
> hence need to be allowed), I'd suggest we at least fix the read-result
> inconsistency somewhere else rather than simply reverting the patch.

Ah, this is because do_page_fault() also needs to be made aware of
write-implying-read. Will send a v2 shortly.

-Andrew

>
> Yours, Pan Ruizhe

2022-09-08 19:38:21

by Andrew Bresticker

Subject: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() return EINVAL if PROT_WRITE was set without
PROT_READ with the justification that a write-only PTE is considered a
reserved PTE permission bit pattern in the privileged spec. This check
is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
inconsistent with other architectures that don't support write-only PTEs,
creating a potential software portability issue. Just remove the check
altogether and let PROT_WRITE imply PROT_READ as is the case on other
architectures.

Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
disallowed prior to the aforementioned commit; PROT_READ is implied in
such mappings as well.

Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
Signed-off-by: Andrew Bresticker <[email protected]>
---
v1 -> v2: Update access_error() to account for write-implies-read
---
arch/riscv/kernel/sys_riscv.c | 3 ---
arch/riscv/mm/fault.c | 3 ++-
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 571556bb9261..5d3f2fbeb33c 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
 		return -EINVAL;
 
-	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
-		return -EINVAL;
-
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 			       offset >> (PAGE_SHIFT - page_shift_offset));
 }
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index f2fbd1400b7c..d86f7cebd4a7 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
 		}
 		break;
 	case EXC_LOAD_PAGE_FAULT:
-		if (!(vma->vm_flags & VM_READ)) {
+		/* Write implies read */
+		if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
 			return true;
 		}
 		break;
--
2.25.1
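
The effect of the fault.c hunk can be seen in isolation with a small model. The sketch below is not kernel code: the SK_VM_* constants and function names are hypothetical, and only the load-fault branch of access_error() is modeled. One plausible reading of the reported inconsistency is that, with the old check, a load that faults on a write-only VMA is rejected, while a load that hits an already-installed read+write PTE never reaches the check at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's VM_READ/VM_WRITE/VM_EXEC flags. */
enum { SK_VM_READ = 1, SK_VM_WRITE = 2, SK_VM_EXEC = 4 };

/* Load-fault check before v2: only VM_READ makes a load legal. */
static bool load_fault_is_error_old(unsigned long vm_flags)
{
	return !(vm_flags & SK_VM_READ);
}

/* Load-fault check after v2: write implies read. */
static bool load_fault_is_error_new(unsigned long vm_flags)
{
	return !(vm_flags & (SK_VM_READ | SK_VM_WRITE));
}

/* Compare the two checks on a write-only VMA. */
static void check_write_only_vma(void)
{
	/* Old check: a faulting load on a write-only VMA is an error. */
	assert(load_fault_is_error_old(SK_VM_WRITE));
	/* New check: the same load is allowed, matching the R|W PTE. */
	assert(!load_fault_is_error_new(SK_VM_WRITE));
}
```

Exec-only and no-access VMAs are still rejected by the new check, so only the write-only (and write+exec) cases change.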

2022-09-08 20:01:47

by SS JieJi

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

The v2 patch looks great.

> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> - return -EINVAL;
> -
This also removes the check for --x pages, which used to be present in
previous versions (before the submission of the to-be-reverted patch).
Is this intended? Thanks!

2022-09-09 03:05:45

by Celeste Liu

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

On 2022/9/9 02:50, Andrew Bresticker wrote:
> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
> invalid") made mmap() return EINVAL if PROT_WRITE was set without
> PROT_READ with the justification that a write-only PTE is considered a
> reserved PTE permission bit pattern in the privileged spec. This check
> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> inconsistent with other architectures that don't support write-only PTEs,
> creating a potential software portability issue. Just remove the check
> altogether and let PROT_WRITE imply PROT_READ as is the case on other
> architectures.
>
> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
> disallowed prior to the aforementioned commit; PROT_READ is implied in
> such mappings as well.
>
> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
> Signed-off-by: Andrew Bresticker <[email protected]>
> ---
> v1 -> v2: Update access_error() to account for write-implies-read
> ---
> arch/riscv/kernel/sys_riscv.c | 3 ---
> arch/riscv/mm/fault.c | 3 ++-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 571556bb9261..5d3f2fbeb33c 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
> if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
> return -EINVAL;
>
> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> - return -EINVAL;
> -
> return ksys_mmap_pgoff(addr, len, prot, flags, fd,
> offset >> (PAGE_SHIFT - page_shift_offset));
> }
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index f2fbd1400b7c..d86f7cebd4a7 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
> }
> break;
> case EXC_LOAD_PAGE_FAULT:
> - if (!(vma->vm_flags & VM_READ)) {
> + /* Write implies read */
> + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
> return true;
> }
> break;

Hi, this did solve the problem and achieved consistency between
architectures, but I have a question.

Such a change defines behavior for a state that should not exist.
If a future RISC-V spec defines a different behavior for that state
(RVI has a history of not caring about downstream, as with Zicsr and
Zifencei), it will create inconsistencies, which is bad.

If we reject the "write but not read" state, the user gets the most direct
response: the state is not allowed, so they do not and cannot rely
on its behavior. This keeps applications consistent over time
if the spec defines the behavior in the future, but it loses
consistency across architectures.

How do you think this situation should be handled properly?

Yours,
Celeste Liu

2022-09-09 11:52:10

by Celeste Liu

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

On 2022/9/9 11:01, Celeste Liu wrote:
> Hi, this did solve the problem and achieved consistency between
> architectures, but I have a question.
>
> Such a change defines behavior for a state that should not exist.
> If a future RISC-V spec defines a different behavior for that state
> (RVI has a history of not caring about downstream, as with Zicsr and
> Zifencei), it will create inconsistencies, which is bad.
>
> If we reject the "write but not read" state, the user gets the most direct
> response: the state is not allowed, so they do not and cannot rely
> on its behavior. This keeps applications consistent over time
> if the spec defines the behavior in the future, but it loses
> consistency across architectures.
>
> How do you think this situation should be handled properly?
>
> Yours,
> Celeste Liu

Oops!

I found a mistake in my previous understanding: PTE permissions != VMA permissions.
So your modification makes sense. No matter how we map the requested
permissions to PTEs, as long as we don't use the reserved permission combinations,
the behavior is reasonable and independent of the architecture's definition
of PTEs.

But I think this mapping relationship should be well documented. If we have
such a mapping behavior on all architectures, then we should change this line in
the mmap documentation
    On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ.
to apply to all architectures. From my reading of the code, every vm_get_page_prot()
implementation uses the protection_map lookup, which provides this behavior.

Yours,
Celeste Liu



2022-09-09 15:31:28

by Andrew Bresticker

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

On Fri, Sep 9, 2022 at 7:42 AM Coelacanthus <[email protected]> wrote:
>
> Oops!
>
> I found a mistake in my previous understanding: PTE permissions != VMA permissions.
> So your modification makes sense. No matter how we map the requested
> permissions to PTEs, as long as we don't use the reserved permission combinations,
> the behavior is reasonable and independent of the architecture's definition
> of PTEs.
>
> But I think this mapping relationship should be well documented. If we have
> such a mapping behavior on all architectures, then we should change this line in
> the mmap documentation
>     On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ.
> to apply to all architectures. From my reading of the code, every vm_get_page_prot()
> implementation uses the protection_map lookup, which provides this behavior.

I think leaving PROT_WRITE-implies-PROT_READ documented as
architecture-dependent is reasonable, but of course portable programs
shouldn't rely on this behavior. There are CPUs out there that support
write-only mappings -- MIPS with RI/XI comes to mind and indeed
mmap(PROT_WRITE) on such CPUs results in write-only mappings.

-Andrew

>
> Yours,
> Celeste Liu

2022-09-09 16:37:26

by Celeste Liu

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

On 2022/9/9 23:16, Andrew Bresticker wrote:
> I think leaving PROT_WRITE-implies-PROT_READ documented as
> architecture-dependent is reasonable, but of course portable programs
> shouldn't rely on this behavior. There are CPUs out there that support
> write-only mappings -- MIPS with RI/XI comes to mind and indeed
> mmap(PROT_WRITE) on such CPUs results in write-only mappings.
>
> -Andrew
>

OK, I have no further questions. This patch looks good to me.

This feature shouldn't be relied upon indeed, as it depends on the specific
hardware implementation.

Thanks for your explanation!

Yours,
Celeste Liu

2022-09-09 19:04:36

by Atish Patra

Subject: Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

On Thu, Sep 8, 2022 at 11:50 AM Andrew Bresticker <[email protected]> wrote:
>
> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
> invalid") made mmap() return EINVAL if PROT_WRITE was set without
> PROT_READ with the justification that a write-only PTE is considered a
> reserved PTE permission bit pattern in the privileged spec. This check
> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> inconsistent with other architectures that don't support write-only PTEs,
> creating a potential software portability issue. Just remove the check
> altogether and let PROT_WRITE imply PROT_READ as is the case on other
> architectures.
>
> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
> disallowed prior to the aforementioned commit; PROT_READ is implied in
> such mappings as well.
>
> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
> Signed-off-by: Andrew Bresticker <[email protected]>
> ---
> v1 -> v2: Update access_error() to account for write-implies-read
> ---
> arch/riscv/kernel/sys_riscv.c | 3 ---
> arch/riscv/mm/fault.c | 3 ++-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 571556bb9261..5d3f2fbeb33c 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
> if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
> return -EINVAL;
>
> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> - return -EINVAL;
> -
> return ksys_mmap_pgoff(addr, len, prot, flags, fd,
> offset >> (PAGE_SHIFT - page_shift_offset));
> }
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index f2fbd1400b7c..d86f7cebd4a7 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
> }
> break;
> case EXC_LOAD_PAGE_FAULT:
> - if (!(vma->vm_flags & VM_READ)) {
> + /* Write implies read */
> + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
> return true;
> }
> break;

This should be a separate patch with commit text about VMA permissions.

> --
> 2.25.1
>

Otherwise, lgtm.

Reviewed-by: Atish Patra <[email protected]>

--
Regards,
Atish