Shadow stack memory is writable only in very specific, controlled ways.
However, since it is writable, the kernel treats it as such. As a result
there remain many ways for userspace to trigger the kernel to write to
shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
little less exposed, block writable GUPs for shadow stack VMAs.
Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections.
Signed-off-by: Rick Edgecombe <[email protected]>
---
v2:
- New patch
arch/x86/include/asm/pgtable.h | 3 +++
mm/gup.c | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7a769c4dbc1c..2e6a5ee70034 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
{
unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
+ if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
+ return 0;
+
if (write)
need_pte_bits |= _PAGE_RW;
diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..56da98f3335c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1043,7 +1043,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
return -EFAULT;
if (write) {
- if (!(vm_flags & VM_WRITE)) {
+ if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
/*
--
2.17.1
On 9/29/22 15:29, Rick Edgecombe wrote:
> @@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
> {
> unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
>
> + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
> + return 0;
Do we not have a helper for this? Seems a bit messy to open-code these
shadow-stack permissions. Definitely at least needs a comment.
On Fri, 2022-09-30 at 12:16 -0700, Dave Hansen wrote:
> On 9/29/22 15:29, Rick Edgecombe wrote:
> > @@ -1633,6 +1633,9 @@ static inline bool
> > __pte_access_permitted(unsigned long pteval, bool write)
> > {
> > unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> >
> > + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> > _PAGE_DIRTY)
> > + return 0;
>
> Do we not have a helper for this? Seems a bit messy to open-code
> these
> shadow-stack permissions. Definitely at least needs a comment.
It's because pteval is an unsigned long. We could create a pte_t, and
use the helpers, but then we would be using pte_foo() on pmd's, etc. So
probably comment is the better option?
On 9/30/22 13:30, Edgecombe, Rick P wrote:
> On Fri, 2022-09-30 at 12:16 -0700, Dave Hansen wrote:
>> On 9/29/22 15:29, Rick Edgecombe wrote:
>>> @@ -1633,6 +1633,9 @@ static inline bool
>>> __pte_access_permitted(unsigned long pteval, bool write)
>>> {
>>> unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
>>>
>>> + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
>>> _PAGE_DIRTY)
>>> + return 0;
>> Do we not have a helper for this? Seems a bit messy to open-code
>> these
>> shadow-stack permissions. Definitely at least needs a comment.
> It's because pteval is an unsigned long. We could create a pte_t, and
> use the helpers, but then we would be using pte_foo() on pmd's, etc. So
> probably comment is the better option?
Yeah, a comment is probably best.
This is one of those "generic" page table functions that doesn't work
well with the p{te,md,ud}_* types. It's either this or cast over to a
pteval_t for pmd/pud and pretend this is a pte-only function.
On Fri, Sep 30, 2022 at 9:16 PM Dave Hansen <[email protected]> wrote:
> On 9/29/22 15:29, Rick Edgecombe wrote:
> > @@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
> > {
> > unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> >
> > + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
> > + return 0;
>
> Do we not have a helper for this? Seems a bit messy to open-code these
> shadow-stack permissions. Definitely at least needs a comment.
FWIW, if you look at more context around this diff, the function looks
like this:
static inline bool __pte_access_permitted(unsigned long pteval, bool write)
{
unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
+ if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
+ return 0;
+
if (write)
need_pte_bits |= _PAGE_RW;
if ((pteval & need_pte_bits) != need_pte_bits)
return 0;
return __pkru_allows_pkey(pte_flags_pkey(pteval), write);
}
So I think this change is actually a no-op - the only thing it does is
to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
below will always return 0 if !_PAGE_RW, unless I'm misreading it? And
this is the only patch in the series that touches this function, so
it's not like this becomes necessary with a later patch in the series
either.
Should this check go in anyway for clarity reasons, or should this
instead be a comment explaining that __pte_access_permitted() behaves
just like the hardware access check, which means shadow pages are
treated as readonly?
On Sat, 2022-10-01 at 01:00 +0200, Jann Horn wrote:
> On Fri, Sep 30, 2022 at 9:16 PM Dave Hansen <[email protected]>
> wrote:
> > On 9/29/22 15:29, Rick Edgecombe wrote:
> > > @@ -1633,6 +1633,9 @@ static inline bool
> > > __pte_access_permitted(unsigned long pteval, bool write)
> > > {
> > > unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> > >
> > > + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> > > _PAGE_DIRTY)
> > > + return 0;
> >
> > Do we not have a helper for this? Seems a bit messy to open-code
> > these
> > shadow-stack permissions. Definitely at least needs a comment.
>
> FWIW, if you look at more context around this diff, the function
> looks
> like this:
>
> static inline bool __pte_access_permitted(unsigned long pteval, bool
> write)
> {
> unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
>
> + if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> _PAGE_DIRTY)
> + return 0;
> +
> if (write)
> need_pte_bits |= _PAGE_RW;
>
> if ((pteval & need_pte_bits) != need_pte_bits)
> return 0;
>
> return __pkru_allows_pkey(pte_flags_pkey(pteval), write);
> }
>
> So I think this change is actually a no-op - the only thing it does
> is
> to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
> below will always return 0 if !_PAGE_RW, unless I'm misreading it?
> And
> this is the only patch in the series that touches this function, so
> it's not like this becomes necessary with a later patch in the series
> either.
>
> Should this check go in anyway for clarity reasons, or should this
> instead be a comment explaining that __pte_access_permitted() behaves
> just like the hardware access check, which means shadow pages are
> treated as readonly?
Thanks Jann, I was just realizing the same thing. Yes, I think it
doesn't do anything. I can add a comment of why there is no check, but
otherwise the check seems like unnecessary work.
On Sat, Oct 1, 2022 at 1:00 AM Jann Horn <[email protected]> wrote:
> So I think this change is actually a no-op - the only thing it does is
> to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
> below will always return 0 if !_PAGE_RW, unless I'm misreading it?
Er, to be precise, it will always return 0 if write==1 and !_PAGE_RW.
On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
> [...]
> Still allow FOLL_FORCE to write through shadow stack protections, as it
> does for read-only protections.
As I asked in the cover letter: why do we need to add this for shstk? It
was a mistake for general memory. :P
> [...]
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..56da98f3335c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1043,7 +1043,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> return -EFAULT;
>
> if (write) {
> - if (!(vm_flags & VM_WRITE)) {
> + if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
> if (!(gup_flags & FOLL_FORCE))
> return -EFAULT;
> /*
How about this instead:
return -EFAULT;
if (write) {
+ if (vm_flags & VM_SHADOW_STACK)
+ return -EFAULT;
if (!(vm_flags & VM_WRITE)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
--
Kees Cook
On 10/3/22 11:39, Kees Cook wrote:
> On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
>> [...]
>> Still allow FOLL_FORCE to write through shadow stack protections, as it
>> does for read-only protections.
>
> As I asked in the cover letter: why do we need to add this for shstk? It
> was a mistake for general memory. :P
For debuggers, which use FOLL_FORCE, quite intentionally, to modify
text. And once a debugger has ptrace write access to a target, shadow
stacks provide exactly no protection -- ptrace can modify text and all
registers.
But /proc/.../mem may be a different story, and I'd be okay with having
FOLL_PROC_MEM for legacy compatibility via /proc/.../mem and not
allowing that to access shadow stacks. This does seem like it may not
be very useful, though.
On Mon, Oct 03, 2022 at 03:49:18PM -0700, Andy Lutomirski wrote:
> On 10/3/22 11:39, Kees Cook wrote:
> > On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
> > > [...]
> > > Still allow FOLL_FORCE to write through shadow stack protections, as it
> > > does for read-only protections.
> >
> > As I asked in the cover letter: why do we need to add this for shstk? It
> > was a mistake for general memory. :P
>
> For debuggers, which use FOLL_FORCE, quite intentionally, to modify text.
> And once a debugger has ptrace write access to a target, shadow stacks
> provide exactly no protection -- ptrace can modify text and all registers.
i.e. via ptrace? Yeah, I grudgingly accept the ptrace need for
FOLL_FORCE.
> But /proc/.../mem may be a different story, and I'd be okay with having
> FOLL_PROC_MEM for legacy compatibility via /proc/.../mem and not allowing
> that to access shadow stacks. This does seem like it may not be very
> useful, though.
I *really* don't like the /mem use of FOLL_FORCE, though. I think the
rationale has been "using PTRACE_POKE is too slow". Again, I can live
with it, I was just hoping we could avoid expanding that questionable
behavior, especially since it's a bypass of WRSS.
--
Kees Cook