This series is the result of a rather absurd amount of discussion over
how to get SGX to play nice with LSM policies, without having to resort
to evil shenanigans or put undue burden on userspace. The discussion
definitely wandered into completely insane territory at times, but I
think/hope we ended up with something reasonable.
The basic gist of the approach is to require userspace to declare what
protections are maximally allowed for any given page, e.g. add a flags
field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}. LSMs
can then adjust the allowed protections, e.g. clear ALLOW_EXEC to prevent
ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses
MAY_{READ,WRITE,EXEC}.
ALLOW_EXEC is used to deny hings like loading an enclave from a noexec
file system or from a file without EXECUTE permissions, e.g. without
the ALLOW_EXEC concept, on SGX2 hardware (regardless of kernel support)
userspace could EADD from a noexec file using read-only permissions,
and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's
EXECMOD (or EXECMEM).
This is very much an RFC series. It's only compile tested, likely has
obvious bugs, the SELinux patch could be completely harebrained, etc...
My goal at this point is to get feedback at a macro level, e.g. is the
core concept viable/acceptable, are there objection to hooking
mprotect(), etc...
Andy and Cedric, hopefully this aligns with your general expectations
based on our last discussion.
Lastly, I added a patch to allow userspace to add multiple pages in a
single ioctl(). It's obviously not directly related to the security
stuff, but the idea tangentially came up during earlier discussions and
it's something I think the UAPI should provide (it's a tiny change).
Since I was modifying the UAPI anyways, I threw it in.
Sean Christopherson (9):
x86/sgx: Remove unused local variable in sgx_encl_release()
x86/sgx: Do not naturally align MAP_FIXED address
x86/sgx: Allow userspace to add multiple pages in single ioctl()
mm: Introduce vm_ops->mprotect()
x86/sgx: Restrict mapping without an enclave page to PROT_NONE
x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
x86/sgx: Enforce noexec filesystem restriction for enclaves
LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
security/selinux: Add enclave_load() implementation
arch/x86/include/uapi/asm/sgx.h | 30 ++++--
arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +++++++++++++++++--------
arch/x86/kernel/cpu/sgx/driver/main.c | 13 ++-
arch/x86/kernel/cpu/sgx/encl.c | 31 +++++-
arch/x86/kernel/cpu/sgx/encl.h | 4 +
include/linux/lsm_hooks.h | 16 +++
include/linux/mm.h | 2 +
include/linux/security.h | 2 +
mm/mprotect.c | 15 ++-
security/security.c | 8 ++
security/selinux/hooks.c | 85 +++++++++++++++
11 files changed, 290 insertions(+), 59 deletions(-)
--
2.21.0
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/cpu/sgx/encl.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 7216bdf07bd0..f23ea0fbaa47 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -463,7 +463,6 @@ EXPORT_SYMBOL_GPL(sgx_encl_destroy);
void sgx_encl_release(struct kref *ref)
{
struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
- struct sgx_encl_mm *encl_mm;
if (encl->pm_notifier.notifier_call)
unregister_pm_notifier(&encl->pm_notifier);
--
2.21.0
...to support (the equivalent) of existing Linux Security Module
functionality.
Because SGX manually manages EPC memory, all enclave VMAs are backed by
the same vm_file, i.e. /dev/sgx/enclave, so that SGX can implement the
necessary hooks to move pages in/out of the EPC. And because EPC pages
for any given enclave are fundamentally shared between processes, i.e.
CoW semantics are not possible with EPC pages, /dev/sgx/enclave must
always be MAP_SHARED. Lastly, all real world enclaves will need read,
write and execute permissions to EPC pages. As a result, SGX does not
play nice with existing LSM behavior as it is impossible to apply
policies to enclaves with any reasonable granularity, e.g. an LSM can
deny access to EPC altogether, but can't deny potentially dangerous
behavior such as mapping pages RW->RW or RWX.
To give LSMs enough information to implement their policies without
having to resort to ugly things, e.g. holding a reference to the vm_file
of each enclave page, require userspace to explicitly state the allowed
protections for each page (region), i.e. take ALLOW_{READ,WRITE,EXEC}
in the ADD_PAGES ioctl.
The ALLOW_* flags will be passed to LSMs so that they can make informed
decisions when the enclave is being built, i.e. when the source vm_file
is available. For example, SELinux's EXECMOD permission can be
required if an enclave is requesting both ALLOW_WRITE and ALLOW_EXEC.
Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
a la the standard VM_MAY{READ,WRITE,EXEC} flags.
The ALLOW_EXEC flag also has a second (important) use in that it can
be used to prevent loading an enclave from a noexec file system, on
SGX2 hardware (regardless of kernel support for SGX2), userspace could
EADD from a noexec path using read-only permissions and later mprotect()
and ENCLU[EMODPE] the page to gain execute permissions. By requiring
ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
building the enclave.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/uapi/asm/sgx.h | 9 ++++++++-
arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +++++++++++++++++------
arch/x86/kernel/cpu/sgx/encl.c | 2 +-
arch/x86/kernel/cpu/sgx/encl.h | 1 +
4 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 4a12d6abbcb7..4489e92fa0dc 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,11 @@ struct sgx_enclave_create {
__u64 src;
};
+/* Supported flags for struct sgx_enclave_add_pages. */
+#define SGX_ALLOW_READ VM_READ
+#define SGX_ALLOW_WRITE VM_WRITE
+#define SGX_ALLOW_EXEC VM_EXEC
+
/**
* struct sgx_enclave_add_pages - parameter structure for the
* %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
@@ -39,6 +44,7 @@ struct sgx_enclave_create {
* @secinfo: address for the SECINFO data (common to all pages)
* @nr_pages: number of pages (must be virtually contiguous)
* @mrmask: bitmask for the measured 256 byte chunks (common to all pages)
+ * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages)
*/
struct sgx_enclave_add_pages {
__u64 addr;
@@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
__u64 secinfo;
__u32 nr_pages;
__u16 mrmask;
-} __attribute__((__packed__));
+ __u16 flags;
+};
/**
* struct sgx_enclave_init - parameter structure for the
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 6acfcbdeca9a..c30acd3fbbdd 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -235,7 +235,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs,
}
static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
- unsigned long addr)
+ unsigned long addr,
+ unsigned long allowed_prot)
{
struct sgx_encl_page *encl_page;
int ret;
@@ -247,6 +248,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
return ERR_PTR(-ENOMEM);
encl_page->desc = addr;
encl_page->encl = encl;
+ encl_page->allowed_prot = allowed_prot;
ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc),
encl_page);
if (ret) {
@@ -530,7 +532,7 @@ static int sgx_encl_queue_page(struct sgx_encl *encl,
static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
void *data, struct sgx_secinfo *secinfo,
- unsigned int mrmask)
+ unsigned int mrmask, unsigned long allowed_prot)
{
u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
struct sgx_encl_page *encl_page;
@@ -556,7 +558,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
goto out;
}
- encl_page = sgx_encl_page_alloc(encl, addr);
+ encl_page = sgx_encl_page_alloc(encl, addr, allowed_prot);
if (IS_ERR(encl_page)) {
ret = PTR_ERR(encl_page);
goto out;
@@ -576,12 +578,20 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
unsigned long src, struct sgx_secinfo *secinfo,
- unsigned int mrmask)
+ unsigned int mrmask, unsigned int flags)
{
+ unsigned long prot = secinfo->flags & (VM_READ | VM_WRITE | VM_EXEC);
+ unsigned long allowed_prot = flags & (VM_READ | VM_WRITE | VM_EXEC);
struct page *data_page;
void *data;
int ret;
+ BUILD_BUG_ON(SGX_SECINFO_R != VM_READ || SGX_SECINFO_W != VM_WRITE ||
+ SGX_SECINFO_X != VM_EXEC);
+
+ if (prot & ~allowed_prot)
+ return -EACCES;
+
data_page = alloc_page(GFP_HIGHUSER);
if (!data_page)
return -ENOMEM;
@@ -593,7 +603,8 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
goto out;
}
- ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask);
+ ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask,
+ allowed_prot);
out:
kunmap(data_page);
__free_page(data_page);
@@ -645,7 +656,7 @@ static long sgx_ioc_enclave_add_pages(struct file *filep, unsigned int cmd,
ret = sgx_encl_add_page(encl, addp->addr + i*PAGE_SIZE,
addp->src + i*PAGE_SIZE,
- &secinfo, addp->mrmask);
+ &secinfo, addp->mrmask, addp->flags);
}
return ret;
}
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 955d4f430adc..e5847571a265 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -249,7 +249,7 @@ int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
for (addr = start; addr < end; addr += PAGE_SIZE) {
page = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
- if (!page)
+ if (!page || (prot & ~page->allowed_prot))
return -EACCES;
}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 6e310e3b3fff..7cca076a4987 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -41,6 +41,7 @@ enum sgx_encl_page_desc {
struct sgx_encl_page {
unsigned long desc;
+ unsigned long allowed_prot;
struct sgx_epc_page *epc_page;
struct sgx_va_page *va_page;
struct sgx_encl *encl;
--
2.21.0
SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
tracked and enforced by the CPU using a base+mask approach, similar to
how hardware range registers such as the variable MTRRs. As a result,
the ELRANGE must be naturally sized and aligned.
To reduce boilerplate code that would be needed in every userspace
enclave loader, the SGX driver naturally aligns the mmap() address and
also requires the range to be naturally sized. Unfortunately, SGX fails
to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
if userspace is attempting to map a small slice of an existing enclave.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index afe844aa81d6..129d356aff30 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -79,7 +79,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
unsigned long pgoff,
unsigned long flags)
{
- if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
+ if (flags & MAP_PRIVATE)
+ return -EINVAL;
+
+ if (flags & MAP_FIXED)
+ return addr;
+
+ if (len < 2 * PAGE_SIZE || len & (len - 1))
return -EINVAL;
addr = current->mm->get_unmapped_area(file, addr, 2 * len, pgoff,
--
2.21.0
Hi Sean,
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
>
> This series is the result of a rather absurd amount of discussion over how to get SGX to play
> nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
> userspace. The discussion definitely wandered into completely insane territory at times, but
> I think/hope we ended up with something reasonable.
>
> The basic gist of the approach is to require userspace to declare what protections are
> maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
> takes ALLOW_{READ,WRITE,EXEC}. LSMs can then adjust the allowed protections, e.g. clear
> ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
>
> ALLOW_EXEC is used to deny hings like loading an enclave from a noexec file system or from a
> file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
> (regardless of kernel support) userspace could EADD from a noexec file using read-only
> permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
>
> ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
>
> This is very much an RFC series. It's only compile tested, likely has obvious bugs, the
> SELinux patch could be completely harebrained, etc...
> My goal at this point is to get feedback at a macro level, e.g. is the core concept
> viable/acceptable, are there objection to hooking mprotect(), etc...
>
> Andy and Cedric, hopefully this aligns with your general expectations based on our last
> discussion.
I couldn't understand the real intentions of ALLOW_* flags until I saw them in code. I have to say C is more expressive than English in that regard :)
Generally I agree with your direction but think ALLOW_* flags are completely internal to LSM because they can be both produced and consumed inside an LSM module. So spilling them into SGX driver and also user mode code makes the solution ugly and in some cases impractical because not every enclave host process has a priori knowledge on whether or not an enclave page would be EMODPE'd at runtime.
Theoretically speaking, what you really need is a per page flag (let's name it WRITTEN?) indicating whether a page has ever been written to (or more precisely, granted PROT_WRITE), which will be used to decide whether to grant PROT_EXEC when requested in future. Given the fact that all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture that flag by itself instead of asking user mode code to provide it.
That said, here is the summary of what I think is a better approach.
* In hook security_file_alloc(), if @file is an enclave, allocate some data structure to store for every page, the WRITTEN flag as described above. WRITTEN is cleared initially for all pages.
Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
* In hook security_mmap_file(), if @file is an enclave, make sure @prot can only be PROT_NONE. This is to force all protection changes to go through security_file_mprotect().
* In the newly introduced hook security_enclave_load(), set WRITTEN for pages that are requested PROT_WRITE.
* In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up and use WRITTEN flags for all pages within @vma, along with other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide on allowing/rejecting @prot.
* In hook security_file_free(), if @file is an enclave, free storage allocated for WRITTEN flags.
I'll try to make more detailed comments in my replies to individual patches sometime tomorrow.
>
> Lastly, I added a patch to allow userspace to add multiple pages in a single ioctl(). It's
> obviously not directly related to the security stuff, but the idea tangentially came up during
> earlier discussions and it's something I think the UAPI should provide (it's a tiny change).
> Since I was modifying the UAPI anyways, I threw it in.
>
> Sean Christopherson (9):
> x86/sgx: Remove unused local variable in sgx_encl_release()
> x86/sgx: Do not naturally align MAP_FIXED address
> x86/sgx: Allow userspace to add multiple pages in single ioctl()
> mm: Introduce vm_ops->mprotect()
> x86/sgx: Restrict mapping without an enclave page to PROT_NONE
> x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
> x86/sgx: Enforce noexec filesystem restriction for enclaves
> LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
> security/selinux: Add enclave_load() implementation
>
> arch/x86/include/uapi/asm/sgx.h | 30 ++++--
> arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +++++++++++++++++--------
> arch/x86/kernel/cpu/sgx/driver/main.c | 13 ++-
> arch/x86/kernel/cpu/sgx/encl.c | 31 +++++-
> arch/x86/kernel/cpu/sgx/encl.h | 4 +
> include/linux/lsm_hooks.h | 16 +++
> include/linux/mm.h | 2 +
> include/linux/security.h | 2 +
> mm/mprotect.c | 15 ++-
> security/security.c | 8 ++
> security/selinux/hooks.c | 85 +++++++++++++++
> 11 files changed, 290 insertions(+), 59 deletions(-)
>
> --
> 2.21.0
-Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
>
> ...to support (the equivalent) of existing Linux Security Module functionality.
>
> Because SGX manually manages EPC memory, all enclave VMAs are backed by the same vm_file,
> i.e. /dev/sgx/enclave, so that SGX can implement the necessary hooks to move pages in/out
> of the EPC. And because EPC pages for any given enclave are fundamentally shared between
> processes, i.e.
> CoW semantics are not possible with EPC pages, /dev/sgx/enclave must always be MAP_SHARED.
> Lastly, all real world enclaves will need read, write and execute permissions to EPC pages.
> As a result, SGX does not play nice with existing LSM behavior as it is impossible to
> apply policies to enclaves with any reasonable granularity, e.g. an LSM can deny access to
> EPC altogether, but can't deny potentially dangerous behavior such as mapping pages RW->RW
> or RWX.
>
> To give LSMs enough information to implement their policies without having to resort to
> ugly things, e.g. holding a reference to the vm_file of each enclave page, require
> userspace to explicitly state the allowed protections for each page (region), i.e. take
> ALLOW_{READ,WRITE,EXEC} in the ADD_PAGES ioctl.
>
> The ALLOW_* flags will be passed to LSMs so that they can make informed decisions when the
> enclave is being built, i.e. when the source vm_file is available. For example, SELinux's
> EXECMOD permission can be required if an enclave is requesting both ALLOW_WRITE and
> ALLOW_EXEC.
>
> Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections, a la the standard
> VM_MAY{READ,WRITE,EXEC} flags.
>
> The ALLOW_EXEC flag also has a second (important) use in that it can be used to prevent
> loading an enclave from a noexec file system, on
> SGX2 hardware (regardless of kernel support for SGX2), userspace could EADD from a noexec
> path using read-only permissions and later mprotect() and ENCLU[EMODPE] the page to gain
> execute permissions. By requiring ALLOW_EXEC up front, SGX will be able to enforce noexec
> paths when building the enclave.
ALLOW_* flags shall be kept internal to LSM.
This patch is completely unnecessary.
On 6/2/19 3:29 AM, Xing, Cedric wrote:
> Hi Sean,
>
>> From: Christopherson, Sean J
>> Sent: Friday, May 31, 2019 4:32 PM
>>
>> This series is the result of a rather absurd amount of discussion over how to get SGX to play
>> nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
>> userspace. The discussion definitely wandered into completely insane territory at times, but
>> I think/hope we ended up with something reasonable.
>>
>> The basic gist of the approach is to require userspace to declare what protections are
>> maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
>> takes ALLOW_{READ,WRITE,EXEC}. LSMs can then adjust the allowed protections, e.g. clear
>> ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
>> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
>>
>> ALLOW_EXEC is used to deny hings like loading an enclave from a noexec file system or from a
>> file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
>> (regardless of kernel support) userspace could EADD from a noexec file using read-only
>> permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
>>
>> ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
>>
>> This is very much an RFC series. It's only compile tested, likely has obvious bugs, the
>> SELinux patch could be completely harebrained, etc...
>> My goal at this point is to get feedback at a macro level, e.g. is the core concept
>> viable/acceptable, are there objection to hooking mprotect(), etc...
>>
>> Andy and Cedric, hopefully this aligns with your general expectations based on our last
>> discussion.
>
> I couldn't understand the real intentions of ALLOW_* flags until I saw them in code. I have to say C is more expressive than English in that regard :)
>
> Generally I agree with your direction but think ALLOW_* flags are completely internal to LSM because they can be both produced and consumed inside an LSM module. So spilling them into SGX driver and also user mode code makes the solution ugly and in some cases impractical because not every enclave host process has a priori knowledge on whether or not an enclave page would be EMODPE'd at runtime.
>
> Theoretically speaking, what you really need is a per page flag (let's name it WRITTEN?) indicating whether a page has ever been written to (or more precisely, granted PROT_WRITE), which will be used to decide whether to grant PROT_EXEC when requested in future. Given the fact that all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture that flag by itself instead of asking user mode code to provide it.
>
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data structure to store for every page, the WRITTEN flag as described above. WRITTEN is cleared initially for all pages.
> Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can only be PROT_NONE. This is to force all protection changes to go through security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages that are requested PROT_WRITE.
> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up and use WRITTEN flags for all pages within @vma, along with other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide on
allowing/rejecting @prot.
At this point we have no knowledge of the source vma/file, right? So
what do we check FILE__EXECUTE and/or FILE__EXECMOD against?
vma->vm_file at this point is /dev/sgx/enclave, right?
> * In hook security_file_free(), if @file is an enclave, free storage allocated for WRITTEN flags.
>
> I'll try to make more detailed comments in my replies to individual patches sometime tomorrow.
>
>>
>> Lastly, I added a patch to allow userspace to add multiple pages in a single ioctl(). It's
>> obviously not directly related to the security stuff, but the idea tangentially came up during
>> earlier discussions and it's something I think the UAPI should provide (it's a tiny change).
>> Since I was modifying the UAPI anyways, I threw it in.
>>
>> Sean Christopherson (9):
>> x86/sgx: Remove unused local variable in sgx_encl_release()
>> x86/sgx: Do not naturally align MAP_FIXED address
>> x86/sgx: Allow userspace to add multiple pages in single ioctl()
>> mm: Introduce vm_ops->mprotect()
>> x86/sgx: Restrict mapping without an enclave page to PROT_NONE
>> x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
>> x86/sgx: Enforce noexec filesystem restriction for enclaves
>> LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
>> security/selinux: Add enclave_load() implementation
>>
>> arch/x86/include/uapi/asm/sgx.h | 30 ++++--
>> arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +++++++++++++++++--------
>> arch/x86/kernel/cpu/sgx/driver/main.c | 13 ++-
>> arch/x86/kernel/cpu/sgx/encl.c | 31 +++++-
>> arch/x86/kernel/cpu/sgx/encl.h | 4 +
>> include/linux/lsm_hooks.h | 16 +++
>> include/linux/mm.h | 2 +
>> include/linux/security.h | 2 +
>> mm/mprotect.c | 15 ++-
>> security/security.c | 8 ++
>> security/selinux/hooks.c | 85 +++++++++++++++
>> 11 files changed, 290 insertions(+), 59 deletions(-)
>>
>> --
>> 2.21.0
>
> -Cedric
>
On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> Hi Sean,
>
> > From: Christopherson, Sean J
> > Sent: Friday, May 31, 2019 4:32 PM
> >
> > This series is the result of a rather absurd amount of discussion over how to get SGX to play
> > nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
> > userspace. The discussion definitely wandered into completely insane territory at times, but
> > I think/hope we ended up with something reasonable.
> >
> > The basic gist of the approach is to require userspace to declare what protections are
> > maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
> > takes ALLOW_{READ,WRITE,EXEC}. LSMs can then adjust the allowed protections, e.g. clear
> > ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
> > via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> >
> > ALLOW_EXEC is used to deny hings like loading an enclave from a noexec file system or from a
> > file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
> > (regardless of kernel support) userspace could EADD from a noexec file using read-only
> > permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
> >
> > ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
> >
> > This is very much an RFC series. It's only compile tested, likely has obvious bugs, the
> > SELinux patch could be completely harebrained, etc...
> > My goal at this point is to get feedback at a macro level, e.g. is the core concept
> > viable/acceptable, are there objection to hooking mprotect(), etc...
> >
> > Andy and Cedric, hopefully this aligns with your general expectations based on our last
> > discussion.
>
> I couldn't understand the real intentions of ALLOW_* flags until I saw them
> in code. I have to say C is more expressive than English in that regard :)
>
> Generally I agree with your direction but think ALLOW_* flags are completely
> internal to LSM because they can be both produced and consumed inside an LSM
> module. So spilling them into SGX driver and also user mode code makes the
> solution ugly and in some cases impractical because not every enclave host
> process has a priori knowledge on whether or not an enclave page would be
> EMODPE'd at runtime.
In this case, the host process should tag *all* pages it *might* convert
to executable as ALLOW_EXEC. LSMs can (and should/will) be written in
such a way that denying ALLOW_EXEC is fatal to the enclave if and only if
the enclave actually attempts mprotect(PROT_EXEC).
Take the SELinux path for example. The only scenario in which PROT_WRITE
is cleared from @allowed_prot is if the page *starts* with PROT_EXEC.
If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
then PROT_EXEC will be cleared from @allowed_prot.
As Stephen pointed out, auditing the denials on @allowed_prot means the
log will contain false positives of a sort. But this is more of a noise
issue than true false positives. E.g. there are three possible outcomes
for the enclave.
- The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
Requesting ALLOW_EXEC is either a straightforward a userspace bug or
a poorly written generic enclave loader.
- The enclave conditionally performs EMODPE[PROT_EXEC]. In this case
the denial is a true false positive.
- The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
The audit log will be noisy, but viewed as a whole the denials aren't
false positives.
The potential for noisy audit logs and/or false positives is unfortunate,
but it's (by far) the lesser of many evils.
> Theoretically speaking, what you really need is a per page flag (let's name
> it WRITTEN?) indicating whether a page has ever been written to (or more
> precisely, granted PROT_WRITE), which will be used to decide whether to grant
> PROT_EXEC when requested in future. Given the fact that all mprotect() goes
> through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture
> that flag by itself instead of asking user mode code to provide it.
>
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data
> structure to store for every page, the WRITTEN flag as described above.
> WRITTEN is cleared initially for all pages.
This would effectively require *every* LSM to duplicate the SGX driver's
functionality, e.g. track per-page metadata, implement locking to prevent
races between multiple mm structs, etc...
> Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can
> only be PROT_NONE. This is to force all protection changes to go through
> security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages
> that are requested PROT_WRITE.
How would an LSM associate a page with a specific enclave? vma->vm_file
will point always point at /dev/sgx/enclave. vma->vm_mm is useless
because we're allowing multiple processes to map a single enclave, not to
mention that by mm would require holding a reference to the mm.
> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up
> and use WRITTEN flags for all pages within @vma, along with other global
> flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide
> on allowing/rejecting @prot.
vma->vm_file will always be /dev/sgx/enclave at this point, which means
LSMs don't have the necessary anchor back to the source file, e.g. to
enforce FILE__EXECUTE. The noexec file system case is also unaddressed.
> * In hook security_file_free(), if @file is an enclave, free storage
> allocated for WRITTEN flags.
> From: [email protected] [mailto:linux-sgx-
> [email protected]] On Behalf Of Stephen Smalley
> Sent: Monday, June 03, 2019 10:47 AM
>
> On 6/2/19 3:29 AM, Xing, Cedric wrote:
> > Hi Sean,
> >
> >> From: Christopherson, Sean J
> >> Sent: Friday, May 31, 2019 4:32 PM
> >>
> >> This series is the result of a rather absurd amount of discussion
> >> over how to get SGX to play nice with LSM policies, without having to
> >> resort to evil shenanigans or put undue burden on userspace. The
> >> discussion definitely wandered into completely insane territory at
> times, but I think/hope we ended up with something reasonable.
> >>
> >> The basic gist of the approach is to require userspace to declare
> >> what protections are maximally allowed for any given page, e.g. add a
> >> flags field for loading enclave pages that takes
> >> ALLOW_{READ,WRITE,EXEC}. LSMs can then adjust the allowed
> >> protections, e.g. clear ALLOW_EXEC to prevent ever mapping the page
> with PROT_EXEC. SGX enforces the allowed perms via a new mprotect()
> vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> >>
> >> ALLOW_EXEC is used to deny hings like loading an enclave from a
> >> noexec file system or from a file without EXECUTE permissions, e.g.
> >> without the ALLOW_EXEC concept, on SGX2 hardware (regardless of
> >> kernel support) userspace could EADD from a noexec file using read-
> only permissions, and later use mprotect() and ENCLU[EMODPE] to gain
> execute permissions.
> >>
> >> ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce
> SELinux's EXECMOD (or EXECMEM).
> >>
> >> This is very much an RFC series. It's only compile tested, likely
> >> has obvious bugs, the SELinux patch could be completely harebrained,
> etc...
> >> My goal at this point is to get feedback at a macro level, e.g. is
> >> the core concept viable/acceptable, are there objection to hooking
> mprotect(), etc...
> >>
> >> Andy and Cedric, hopefully this aligns with your general expectations
> >> based on our last discussion.
> >
> > I couldn't understand the real intentions of ALLOW_* flags until I saw
> > them in code. I have to say C is more expressive than English in that
> > regard :)
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> completely internal to LSM because they can be both produced and
> consumed inside an LSM module. So spilling them into SGX driver and also
> user mode code makes the solution ugly and in some cases impractical
> because not every enclave host process has a priori knowledge on whether
> or not an enclave page would be EMODPE'd at runtime.
> >
> > Theoretically speaking, what you really need is a per page flag (let's
> name it WRITTEN?) indicating whether a page has ever been written to (or
> more precisely, granted PROT_WRITE), which will be used to decide
> whether to grant PROT_EXEC when requested in future. Given the fact that
> all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's
> easy for LSM to capture that flag by itself instead of asking user mode
> code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> data structure to store for every page, the WRITTEN flag as described
> above. WRITTEN is cleared initially for all pages.
> > Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can only be PROT_NONE. This is to force all protection changes to
> go through security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages that are requested PROT_WRITE.
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> > look up and use WRITTEN flags for all pages within @vma, along with
> > other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of
> > SELinux) to decide on
> allowing/rejecting @prot.
>
> At this point we have no knowledge of the source vma/file, right? So
> what do we check FILE__EXECUTE and/or FILE__EXECMOD against?
> vma->vm_file at this point is /dev/sgx/enclave, right?
My apology to the confusions here.
Yes, vma->vm_file is always /dev/sgx/enclave, but each open("/dev/sgx/enclave") returns a *new* file struct (let's denote it as @enclave_fd) that uniquely identifies one enclave instance, and the expectation is that @enclave_fd->f_security would be used by LSM to store enclave specific information, including ALLOW_* flags and whatever deemed appropriate by an LSM module.
In the case of SELinux, and if the choice is to use FILE__EXECMOD of .sigstruct file to authorize RW->RX at runtime, then SELinux could cache that flag in @enclave_fd->f_security upon security_enclave_init().
> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 10:16 AM
>
> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > Hi Sean,
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> > completely internal to LSM because they can be both produced and
> > consumed inside an LSM module. So spilling them into SGX driver and
> > also user mode code makes the solution ugly and in some cases
> > impractical because not every enclave host process has a priori
> > knowledge on whether or not an enclave page would be EMODPE'd at
> runtime.
>
> In this case, the host process should tag *all* pages it *might* convert
> to executable as ALLOW_EXEC. LSMs can (and should/will) be written in
> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> if the enclave actually attempts mprotect(PROT_EXEC).
What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent those pages to start with PROT_EXEC?
Anyway, my point is that it is unnecessary even if it works.
>
> Take the SELinux path for example. The only scenario in which
> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> PROT_EXEC.
> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> then PROT_EXEC will be cleared from @allowed_prot.
>
> As Stephen pointed out, auditing the denials on @allowed_prot means the
> log will contain false positives of a sort. But this is more of a noise
> issue than true false positives. E.g. there are three possible outcomes
> for the enclave.
>
> - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> Requesting ALLOW_EXEC is either a straightforward a userspace bug or
> a poorly written generic enclave loader.
>
> - The enclave conditionally performs EMODPE[PROT_EXEC]. In this case
> the denial is a true false positive.
>
> - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
> on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
> The audit log will be noisy, but viewed as a whole the denials
> aren't
> false positives.
What I was talking about was EMODPE[PROT_WRITE] on an RX page.
>
> The potential for noisy audit logs and/or false positives is unfortunate,
> but it's (by far) the lesser of many evils.
>
> > Theoretically speaking, what you really need is a per page flag (let's
> > name it WRITTEN?) indicating whether a page has ever been written to
> > (or more precisely, granted PROT_WRITE), which will be used to decide
> > whether to grant PROT_EXEC when requested in future. Given the fact
> > that all mprotect() goes through LSM and mmap() is limited to
> > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> asking user mode code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> data
> > structure to store for every page, the WRITTEN flag as described
> above.
> > WRITTEN is cleared initially for all pages.
>
> This would effectively require *every* LSM to duplicate the SGX driver's
> functionality, e.g. track per-page metadata, implement locking to
> prevent races between multiple mm structs, etc...
Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no difference than PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if other LSMs than SELinux may adopt the same/similar approach.
If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.
>
> > Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can
> > only be PROT_NONE. This is to force all protection changes to go
> through
> > security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages
> > that are requested PROT_WRITE.
>
> How would an LSM associate a page with a specific enclave? vma->vm_file
> will point always point at /dev/sgx/enclave. vma->vm_mm is useless
> because we're allowing multiple processes to map a single enclave, not
> to mention that by mm would require holding a reference to the mm.
Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.
>
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> look up
> > and use WRITTEN flags for all pages within @vma, along with other
> global
> > flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
> to decide
> > on allowing/rejecting @prot.
>
> vma->vm_file will always be /dev/sgx/enclave at this point, which means
> LSMs don't have the necessary anchor back to the source file, e.g. to
> enforce FILE__EXECUTE. The noexec file system case is also unaddressed.
vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().
The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.
>
> > * In hook security_file_free(), if @file is an enclave, free storage
> > allocated for WRITTEN flags.
On Mon, Jun 03, 2019 at 11:30:54AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Monday, June 03, 2019 10:16 AM
> >
> > On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > > Hi Sean,
> > >
> > > Generally I agree with your direction but think ALLOW_* flags are
> > > completely internal to LSM because they can be both produced and
> > > consumed inside an LSM module. So spilling them into SGX driver and
> > > also user mode code makes the solution ugly and in some cases
> > > impractical because not every enclave host process has a priori
> > > knowledge on whether or not an enclave page would be EMODPE'd at
> > runtime.
> >
> > In this case, the host process should tag *all* pages it *might* convert
> > to executable as ALLOW_EXEC. LSMs can (and should/will) be written in
> > such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> > if the enclave actually attempts mprotect(PROT_EXEC).
>
> What if those pages contain self-modifying code but the host doesn't know
> ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it
> prevent those pages to start with PROT_EXEC?
Without ALLOW_WRITE+ALLOW_EXEC, the enclave would build and launch, but
fail at mprotect(..., PROT_WRITE), e.g. when it attempted to gain write
access to do self-modifying code. And it would would fail irrespective of
LSM restrictions.
> Anyway, my point is that it is unnecessary even if it works.
Unnecessary in an ideal world, yes. Realistically, it's the least bad
option.
> > Take the SELinux path for example. The only scenario in which
> > PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> > PROT_EXEC.
> > If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> > then PROT_EXEC will be cleared from @allowed_prot.
> >
> > As Stephen pointed out, auditing the denials on @allowed_prot means the
> > log will contain false positives of a sort. But this is more of a noise
> > issue than true false positives. E.g. there are three possible outcomes
> > for the enclave.
> >
> > - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> > Requesting ALLOW_EXEC is either a straightforward a userspace bug or
> > a poorly written generic enclave loader.
> >
> > - The enclave conditionally performs EMODPE[PROT_EXEC]. In this case
> > the denial is a true false positive.
> >
> > - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
> > on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
> > The audit log will be noisy, but viewed as a whole the denials
> > aren't
> > false positives.
>
> What I was talking about was EMODPE[PROT_WRITE] on an RX page.
As above, mprotect(..., PROT_WRITE) would fail without ALLOW_WRITE.
> > The potential for noisy audit logs and/or false positives is unfortunate,
> > but it's (by far) the lesser of many evils.
> >
> > > Theoretically speaking, what you really need is a per page flag (let's
> > > name it WRITTEN?) indicating whether a page has ever been written to
> > > (or more precisely, granted PROT_WRITE), which will be used to decide
> > > whether to grant PROT_EXEC when requested in future. Given the fact
> > > that all mprotect() goes through LSM and mmap() is limited to
> > > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> > asking user mode code to provide it.
> > >
> > > That said, here is the summary of what I think is a better approach.
> > > * In hook security_file_alloc(), if @file is an enclave, allocate some
> > data
> > > structure to store for every page, the WRITTEN flag as described
> > above.
> > > WRITTEN is cleared initially for all pages.
> >
> > This would effectively require *every* LSM to duplicate the SGX driver's
> > functionality, e.g. track per-page metadata, implement locking to
> > prevent races between multiple mm structs, etc...
>
> Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no
> difference than PROCESS__* or FILE__* flags, which are just artifacts to
> assist particular LSMs in decision making. They are never considered part of
> the LSM interface, even if other LSMs than SELinux may adopt the same/similar
> approach.
No, the flags are tracked and managed by SGX. We are not dictating LSM
behavior in any way, e.g. an LSM could completely ignore @allowed_prot and
nothing would break.
> If code duplication is what you are worrying about, you can put them in a
> library, or implement/export them in some new file (maybe
> security/enclave.c?) as utility functions.
Code duplication is the least of my concerns. Tracking file pointers
would require a global list/tree of some form, along with a locking and/or
RCU scheme to protect accesses to that container. Another lock would be
needed to prevent races between mprotect() calls from different processes.
> But spilling them into user mode is what I think is unacceptable.
Why is it unacceptable? There's effectively no cost to userspace for SGX1.
The ALLOW_* flags only come into play in the event of a noexec or LSM
restriction, i.e. worst case scenario an enclave that wants to do arbitrary
self-modifying code can declare RWX on everything.
On Fri, May 31, 2019 at 04:31:50PM -0700, Sean Christopherson wrote:
> This series is the result of a rather absurd amount of discussion over
> how to get SGX to play nice with LSM policies, without having to resort
> to evil shenanigans or put undue burden on userspace. The discussion
> definitely wandered into completely insane territory at times, but I
> think/hope we ended up with something reasonable.
By definition this is a broken series because it does not apply to
mainline. Even RFC series should at least apply. Would be better idea to
discuss design ideas and use snippets instead. Now you have to take
original v20 and apply to these patches to evaluate anything.
> The basic gist of the approach is to require userspace to declare what
> protections are maximally allowed for any given page, e.g. add a flags
> field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}. LSMs
> can then adjust the allowed protections, e.g. clear ALLOW_EXEC to prevent
> ever mapping the page with PROT_EXEC. SGX enforces the allowed perms
> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses
> MAY_{READ,WRITE,EXEC}.
mprotect() does not use MAY_{READ,WRITE,EXEC} constants. It uses
VM_MAY{READ,WRITE,EXEC,SHARED} constants.
What are ALLOW_{READ,WRITE,EXEC} and how they are used? What does the
hook do and why it is in vm_ops and not in file_operations? Are they
arguments to the ioctl or internal variables that are set based on
SECINFO?
/Jarkko
On Fri, May 31, 2019 at 04:31:51PM -0700, Sean Christopherson wrote:
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
How this patch is essential to demonstrate anything?
/Jarkko
On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> tracked and enforced by the CPU using a base+mask approach, similar to
> how hardware range registers such as the variable MTRRs. As a result,
> the ELRANGE must be naturally sized and aligned.
>
> To reduce boilerplate code that would be needed in every userspace
> enclave loader, the SGX driver naturally aligns the mmap() address and
> also requires the range to be naturally sized. Unfortunately, SGX fails
> to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> if userspace is attempting to map a small slice of an existing enclave.
>
> Signed-off-by: Sean Christopherson <[email protected]>
Why you want to allow mmap() to be called multiple times? mmap() could
be allowed only once with PROT_NONE and denied afterwards. Is this for
sending fd to another process that would map already existing enclave?
I don't see any checks for whether the is enclave underneath. Also, I
think that in all cases mmap() callback should allow only PROT_NONE
as permissions for clarity even if it could called multiple times.
/Jarkko
On 6/3/19 2:30 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Monday, June 03, 2019 10:16 AM
>>
>> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
>>> Hi Sean,
>>>
>>> Generally I agree with your direction but think ALLOW_* flags are
>>> completely internal to LSM because they can be both produced and
>>> consumed inside an LSM module. So spilling them into SGX driver and
>>> also user mode code makes the solution ugly and in some cases
>>> impractical because not every enclave host process has a priori
>>> knowledge on whether or not an enclave page would be EMODPE'd at
>> runtime.
>>
>> In this case, the host process should tag *all* pages it *might* convert
>> to executable as ALLOW_EXEC. LSMs can (and should/will) be written in
>> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
>> if the enclave actually attempts mprotect(PROT_EXEC).
>
> What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent those pages to start with PROT_EXEC?
>
> Anyway, my point is that it is unnecessary even if it works.
>
>>
>> Take the SELinux path for example. The only scenario in which
>> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
>> PROT_EXEC.
>> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
>> then PROT_EXEC will be cleared from @allowed_prot.
>>
>> As Stephen pointed out, auditing the denials on @allowed_prot means the
>> log will contain false positives of a sort. But this is more of a noise
>> issue than true false positives. E.g. there are three possible outcomes
>> for the enclave.
>>
>> - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
>> Requesting ALLOW_EXEC is either a straightforward a userspace bug or
>> a poorly written generic enclave loader.
>>
>> - The enclave conditionally performs EMODPE[PROT_EXEC]. In this case
>> the denial is a true false positive.
>>
>> - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
>> on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
>> The audit log will be noisy, but viewed as a whole the denials
>> aren't
>> false positives.
>
> What I was talking about was EMODPE[PROT_WRITE] on an RX page.
>
>>
>> The potential for noisy audit logs and/or false positives is unfortunate,
>> but it's (by far) the lesser of many evils.
>>
>>> Theoretically speaking, what you really need is a per page flag (let's
>>> name it WRITTEN?) indicating whether a page has ever been written to
>>> (or more precisely, granted PROT_WRITE), which will be used to decide
>>> whether to grant PROT_EXEC when requested in future. Given the fact
>>> that all mprotect() goes through LSM and mmap() is limited to
>>> PROT_NONE, it's easy for LSM to capture that flag by itself instead of
>> asking user mode code to provide it.
>>>
>>> That said, here is the summary of what I think is a better approach.
>>> * In hook security_file_alloc(), if @file is an enclave, allocate some
>> data
>>> structure to store for every page, the WRITTEN flag as described
>> above.
>>> WRITTEN is cleared initially for all pages.
>>
>> This would effectively require *every* LSM to duplicate the SGX driver's
>> functionality, e.g. track per-page metadata, implement locking to
>> prevent races between multiple mm structs, etc...
>
> Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no difference than PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if other LSMs than SELinux may adopt the same/similar approach.
>
> If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.
>
>>
>>> Open: Given a file of type struct file *, how to tell if it is an
>> enclave (i.e. /dev/sgx/enclave)?
>>> * In hook security_mmap_file(), if @file is an enclave, make sure
>> @prot can
>>> only be PROT_NONE. This is to force all protection changes to go
>> through
>>> security_file_mprotect().
>>> * In the newly introduced hook security_enclave_load(), set WRITTEN
>> for pages
>>> that are requested PROT_WRITE.
>>
>> How would an LSM associate a page with a specific enclave? vma->vm_file
>> will point always point at /dev/sgx/enclave. vma->vm_mm is useless
>> because we're allowing multiple processes to map a single enclave, not
>> to mention that by mm would require holding a reference to the mm.
>
> Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.
>
>>
>>> * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
>> look up
>>> and use WRITTEN flags for all pages within @vma, along with other
>> global
>>> flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
>> to decide
>>> on allowing/rejecting @prot.
>>
>> vma->vm_file will always be /dev/sgx/enclave at this point, which means
>> LSMs don't have the necessary anchor back to the source file, e.g. to
>> enforce FILE__EXECUTE. The noexec file system case is also unaddressed.
>
> vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().
The RFC series seemed to dispense with the use of the sigstruct file and
just used the source file throughout IIUC. That allowed for reuse of
FILE__* permissions without ambiguity rather than introducing separate
ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of
all checks.
Regardless, IIUC, your approach requires that we always check
FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load()
irrespective of prot so that we can save the result in the f_security
for later use by the mprotect hook. This may generate many spurious
audit messages for cases where PROT_EXEC will never be requested, and
users will be prone to just always allowing it since they cannot tell
when it was actually needed.
>
> The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.
>
>>
>>> * In hook security_file_free(), if @file is an enclave, free storage
>>> allocated for WRITTEN flags.
On Fri, May 31, 2019 at 04:31:56PM -0700, Sean Christopherson wrote:
> ...to support (the equivalent) of existing Linux Security Module
> functionality.
Long and short descriptions should be separate. Also this does not
make any sense. LSM is a framework with a set of hook to make access
decisions and there various implementations of it.
How this replicates LSMs and why that even would be a goal?
My guess is that you are trying to do something else. I'm just saying
that the idea to do equivalent of LSMs to another subsystems would be
insane if it was done.
> always be MAP_SHARED. Lastly, all real world enclaves will need read,
> write and execute permissions to EPC pages. As a result, SGX does not
> play nice with existing LSM behavior as it is impossible to apply
> policies to enclaves with any reasonable granularity, e.g. an LSM can
> deny access to EPC altogether, but can't deny potentially dangerous
> behavior such as mapping pages RW->RW or RWX.
The mapping must be shared given that it is iomem but why enclave pages
would need RWX for all pages? The information that is missing from this
paragraph is the explanation why an LSM could not deny dangerous
behavior in PTE level.
> To give LSMs enough information to implement their policies without
> having to resort to ugly things, e.g. holding a reference to the vm_file
> of each enclave page, require userspace to explicitly state the allowed
> protections for each page (region), i.e. take ALLOW_{READ,WRITE,EXEC}
> in the ADD_PAGES ioctl.
I would keep descriptions such as "ugly things" away from commit
messages as it is easy way to be not clear and explicit what you are
trying to say.
> The ALLOW_* flags will be passed to LSMs so that they can make informed
> decisions when the enclave is being built, i.e. when the source vm_file
> is available. For example, SELinux's EXECMOD permission can be
> required if an enclave is requesting both ALLOW_WRITE and ALLOW_EXEC.
There should be some explanation what ALLOW_* flag are. It is now like
as it was in common knowledge. SECINFO already has protection flags to
name an example and without any explanation all of this is just very
confusing.
This should address SECINFO and ALLOW_* relationship and differences.
> Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
> a la the standard VM_MAY{READ,WRITE,EXEC} flags.
>
> The ALLOW_EXEC flag also has a second (important) use in that it can
> be used to prevent loading an enclave from a noexec file system, on
> SGX2 hardware (regardless of kernel support for SGX2), userspace could
> EADD from a noexec path using read-only permissions and later mprotect()
> and ENCLU[EMODPE] the page to gain execute permissions. By requiring
> ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
> building the enclave.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/uapi/asm/sgx.h | 9 ++++++++-
> arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +++++++++++++++++------
> arch/x86/kernel/cpu/sgx/encl.c | 2 +-
> arch/x86/kernel/cpu/sgx/encl.h | 1 +
> 4 files changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 4a12d6abbcb7..4489e92fa0dc 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -31,6 +31,11 @@ struct sgx_enclave_create {
> __u64 src;
> };
>
> +/* Supported flags for struct sgx_enclave_add_pages. */
> +#define SGX_ALLOW_READ VM_READ
> +#define SGX_ALLOW_WRITE VM_WRITE
> +#define SGX_ALLOW_EXEC VM_EXEC
Why these flags are even defined if they are the same as VM_* flags?
> +
> /**
> * struct sgx_enclave_add_pages - parameter structure for the
> * %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
> @@ -39,6 +44,7 @@ struct sgx_enclave_create {
> * @secinfo: address for the SECINFO data (common to all pages)
> * @nr_pages: number of pages (must be virtually contiguous)
> * @mrmask: bitmask for the measured 256 byte chunks (common to all pages)
> + * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages)
> */
> struct sgx_enclave_add_pages {
> __u64 addr;
> @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
> __u64 secinfo;
> __u32 nr_pages;
> __u16 mrmask;
> -} __attribute__((__packed__));
> + __u16 flags;
> +};
>
> /**
> * struct sgx_enclave_init - parameter structure for the
> diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> index 6acfcbdeca9a..c30acd3fbbdd 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> @@ -235,7 +235,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs,
> }
>
> static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
> - unsigned long addr)
> + unsigned long addr,
> + unsigned long allowed_prot)
> {
> struct sgx_encl_page *encl_page;
> int ret;
> @@ -247,6 +248,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
> return ERR_PTR(-ENOMEM);
> encl_page->desc = addr;
> encl_page->encl = encl;
> + encl_page->allowed_prot = allowed_prot;
> ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc),
> encl_page);
> if (ret) {
> @@ -530,7 +532,7 @@ static int sgx_encl_queue_page(struct sgx_encl *encl,
>
> static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> void *data, struct sgx_secinfo *secinfo,
> - unsigned int mrmask)
> + unsigned int mrmask, unsigned long allowed_prot)
> {
> u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK;
> struct sgx_encl_page *encl_page;
> @@ -556,7 +558,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> goto out;
> }
>
> - encl_page = sgx_encl_page_alloc(encl, addr);
> + encl_page = sgx_encl_page_alloc(encl, addr, allowed_prot);
> if (IS_ERR(encl_page)) {
> ret = PTR_ERR(encl_page);
> goto out;
> @@ -576,12 +578,20 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
>
> static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> unsigned long src, struct sgx_secinfo *secinfo,
> - unsigned int mrmask)
> + unsigned int mrmask, unsigned int flags)
> {
> + unsigned long prot = secinfo->flags & (VM_READ | VM_WRITE | VM_EXEC);
Even if the secinfo flags have the exactly the same values you should
not do this as they are kind of from different type. This is confusing
to read.
> + unsigned long allowed_prot = flags & (VM_READ | VM_WRITE | VM_EXEC);
Why you take the trouble defining those macros and do not then use them
even yourself?
> struct page *data_page;
> void *data;
> int ret;
>
> + BUILD_BUG_ON(SGX_SECINFO_R != VM_READ || SGX_SECINFO_W != VM_WRITE ||
> + SGX_SECINFO_X != VM_EXEC);
Why this check?
> +
> + if (prot & ~allowed_prot)
> + return -EACCES;
> +
> data_page = alloc_page(GFP_HIGHUSER);
> if (!data_page)
> return -ENOMEM;
> @@ -593,7 +603,8 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> goto out;
> }
>
> - ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask);
> + ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask,
> + allowed_prot);
> out:
> kunmap(data_page);
> __free_page(data_page);
> @@ -645,7 +656,7 @@ static long sgx_ioc_enclave_add_pages(struct file *filep, unsigned int cmd,
>
> ret = sgx_encl_add_page(encl, addp->addr + i*PAGE_SIZE,
> addp->src + i*PAGE_SIZE,
> - &secinfo, addp->mrmask);
> + &secinfo, addp->mrmask, addp->flags);
> }
> return ret;
> }
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 955d4f430adc..e5847571a265 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -249,7 +249,7 @@ int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
>
> for (addr = start; addr < end; addr += PAGE_SIZE) {
> page = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
> - if (!page)
> + if (!page || (prot & ~page->allowed_prot))
> return -EACCES;
> }
However this goes it would be good idea to have only ony patch in the
patch set that fully defines this function. Impossible to review
properly with this split.
>
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index 6e310e3b3fff..7cca076a4987 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -41,6 +41,7 @@ enum sgx_encl_page_desc {
>
> struct sgx_encl_page {
> unsigned long desc;
> + unsigned long allowed_prot;
> struct sgx_epc_page *epc_page;
> struct sgx_va_page *va_page;
> struct sgx_encl *encl;
> --
> 2.21.0
>
This patch left me very confused. I don't get it.
/Jarkko
On Tue, Jun 04, 2019 at 11:33:44AM -0400, Stephen Smalley wrote:
> The RFC series seemed to dispense with the use of the sigstruct file and
> just used the source file throughout IIUC. That allowed for reuse of
> FILE__* permissions without ambiguity rather than introducing separate
> ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of all
> checks.
Drat, I meant to explicitly call that out in the cover letter. Yes, the
concept of using sigstruct as a proxy was dropped for this RFC. The
primary motivation was to avoid having to take a hold a reference to the
sigstruct file for the lifetime of the enclave, and in general so that
userspace isn't forced to put sigstruct into a file.
> Regardless, IIUC, your approach requires that we always check FILE__EXECMOD,
> and FILE__EXECUTE up front during security_enclave_load() irrespective of
> prot so that we can save the result in the f_security for later use by the
> mprotect hook.
Correct, this approach requires up front checks.
> This may generate many spurious audit messages for cases
> where PROT_EXEC will never be requested, and users will be prone to just
> always allowing it since they cannot tell when it was actually needed.
Userspace will be able to understand when PROT_EXEC is actually needed
as mprotect() will (eventually) fail. Of course that assumes userspace
is being intelligent and isn't blindly declaring permissions they don't
need, e.g. declaring RWX on all pages even though the enclave never
actually maps a RWX or RW->RX page.
One thought for handling this in a more user friendly fashion would be
to immediately return -EACCES instead of modifying @allowed_prot. An
enclave that truly needs the permission would fail immediately.
An enclave loader that wants/needs to speculatively declare PROT_EXEC,
e.g. because the exact requirements of the enclave are unknown, could
handle -EACCESS gracefully by retrying the SGX ioctl() with different
@allowed_prot, e.g.:
region.flags = SGX_ALLOW_READ | SGX_ALLOW_WRITE | SGX_ALLOW_EXEC;
ret = ioctl(fd, SGX_IOC_ENCLAVE_ADD_REGION, ®ion);
if (ret && errno == EACCES && !(prot & PROT_EXEC)) {
region.flags &= ~SGX_ALLOW_EXEC;
ret = ioctl(fd, SGX_IOC_ENCLAVE_ADD_REGION, ®ion);
}
This type of enclave loader would still generate spurious audit messages,
but the spurious messages would be limited to enclave loaders that are
deliberately probing the allowed permissions.
> >The noexec case should be addressed in IOC_ADD_PAGES by testing
> >@source_vma->vm_flags & VM_MAYEXEC.
> >
> >>
> >>>* In hook security_file_free(), if @file is an enclave, free storage
> >>> allocated for WRITTEN flags.
>
On Tue, Jun 04, 2019 at 07:23:06PM +0300, Jarkko Sakkinen wrote:
> On Fri, May 31, 2019 at 04:31:56PM -0700, Sean Christopherson wrote:
> > ...to support (the equivalent) of existing Linux Security Module
> > functionality.
>
> Long and short descriptions should be separate. Also this does not
> make any sense. LSM is a framework with a set of hook to make access
> decisions and there various implementations of it.
>
> How this replicates LSMs and why that even would be a goal?
>
> My guess is that you are trying to do something else. I'm just saying
> that the idea to do equivalent of LSMs to another subsystems would be
> insane if it was done.
Heh, yeah, it's not duplicating LSM functionality. What I was trying to
say is that this patch allows LSMs to implement policies that are
equivalent to their existing functionality, e.g. paves the way to add
security_enclave_load() as an equivalent to security_file_mprotect().
> > always be MAP_SHARED. Lastly, all real world enclaves will need read,
> > write and execute permissions to EPC pages. As a result, SGX does not
> > play nice with existing LSM behavior as it is impossible to apply
> > policies to enclaves with any reasonable granularity, e.g. an LSM can
> > deny access to EPC altogether, but can't deny potentially dangerous
> > behavior such as mapping pages RW->RW or RWX.
>
> The mapping must be shared given that it is iomem but why enclave pages
> would need RWX for all pages? The information that is missing from this
> paragraph is the explanation why an LSM could not deny dangerous
> behavior in PTE level.
I'll add that.
> > To give LSMs enough information to implement their policies without
> > having to resort to ugly things, e.g. holding a reference to the vm_file
> > of each enclave page, require userspace to explicitly state the allowed
> > protections for each page (region), i.e. take ALLOW_{READ,WRITE,EXEC}
> > in the ADD_PAGES ioctl.
>
> I would keep descriptions such as "ugly things" away from commit
> messages as it is easy way to be not clear and explicit what you are
> trying to say.
>
> > The ALLOW_* flags will be passed to LSMs so that they can make informed
> > decisions when the enclave is being built, i.e. when the source vm_file
> > is available. For example, SELinux's EXECMOD permission can be
> > required if an enclave is requesting both ALLOW_WRITE and ALLOW_EXEC.
>
> There should be some explanation what ALLOW_* flag are. It is now like
> as it was in common knowledge. SECINFO already has protection flags to
> name an example and without any explanation all of this is just very
> confusing.
Noted.
> This should address SECINFO and ALLOW_* relationship and differences.
>
> > Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
> > a la the standard VM_MAY{READ,WRITE,EXEC} flags.
> >
> > The ALLOW_EXEC flag also has a second (important) use in that it can
> > be used to prevent loading an enclave from a noexec file system, on
> > SGX2 hardware (regardless of kernel support for SGX2), userspace could
> > EADD from a noexec path using read-only permissions and later mprotect()
> > and ENCLU[EMODPE] the page to gain execute permissions. By requiring
> > ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
> > building the enclave.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/include/uapi/asm/sgx.h | 9 ++++++++-
> > arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +++++++++++++++++------
> > arch/x86/kernel/cpu/sgx/encl.c | 2 +-
> > arch/x86/kernel/cpu/sgx/encl.h | 1 +
> > 4 files changed, 27 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> > index 4a12d6abbcb7..4489e92fa0dc 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -31,6 +31,11 @@ struct sgx_enclave_create {
> > __u64 src;
> > };
> >
> > +/* Supported flags for struct sgx_enclave_add_pages. */
> > +#define SGX_ALLOW_READ VM_READ
> > +#define SGX_ALLOW_WRITE VM_WRITE
> > +#define SGX_ALLOW_EXEC VM_EXEC
>
> Why these flags are even defined if they are the same as VM_* flags?
Brain fart. Flags can just take PROT_{READ,WRITE,EXEC}.
> > +
> > /**
> > * struct sgx_enclave_add_pages - parameter structure for the
> > * %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
> > @@ -39,6 +44,7 @@ struct sgx_enclave_create {
> > * @secinfo: address for the SECINFO data (common to all pages)
> > * @nr_pages: number of pages (must be virtually contiguous)
> > * @mrmask: bitmask for the measured 256 byte chunks (common to all pages)
> > + * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages)
> > */
> > struct sgx_enclave_add_pages {
> > __u64 addr;
> > @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
> > __u64 secinfo;
> > __u32 nr_pages;
> > __u16 mrmask;
> > -} __attribute__((__packed__));
> > + __u16 flags;
> > +};
> >
...
> > @@ -576,12 +578,20 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> >
> > static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> > unsigned long src, struct sgx_secinfo *secinfo,
> > - unsigned int mrmask)
> > + unsigned int mrmask, unsigned int flags)
> > {
> > + unsigned long prot = secinfo->flags & (VM_READ | VM_WRITE | VM_EXEC);
>
> Even if the secinfo flags have the exactly the same values you should
> not do this as they are kind of from different type. This is confusing
> to read.
I can add a dummy helper to translate flags and encapsulate the below
assert.
> > + unsigned long allowed_prot = flags & (VM_READ | VM_WRITE | VM_EXEC);
>
> Why you take the trouble defining those macros and do not then use them
> even yourself?
The original thought was to define them for userspace, but that's broken
because VM_* aren't defined for userspace.
> > struct page *data_page;
> > void *data;
> > int ret;
> >
> > + BUILD_BUG_ON(SGX_SECINFO_R != VM_READ || SGX_SECINFO_W != VM_WRITE ||
> > + SGX_SECINFO_X != VM_EXEC);
>
> Why this check?
To assert that the hardware defined SECINFO flags are interchangeable with
Linux's software defined flags, i.e. don't need to be translated.
>
> > +
> > + if (prot & ~allowed_prot)
> > + return -EACCES;
> > +
> > data_page = alloc_page(GFP_HIGHUSER);
> > if (!data_page)
> > return -ENOMEM;
> > @@ -593,7 +603,8 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> > goto out;
> > }
> >
> > - ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask);
> > + ret = __sgx_encl_add_page(encl, addr, data, secinfo, mrmask,
> > + allowed_prot);
> > out:
> > kunmap(data_page);
> > __free_page(data_page);
> > @@ -645,7 +656,7 @@ static long sgx_ioc_enclave_add_pages(struct file *filep, unsigned int cmd,
> >
> > ret = sgx_encl_add_page(encl, addp->addr + i*PAGE_SIZE,
> > addp->src + i*PAGE_SIZE,
> > - &secinfo, addp->mrmask);
> > + &secinfo, addp->mrmask, addp->flags);
> > }
> > return ret;
> > }
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 955d4f430adc..e5847571a265 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -249,7 +249,7 @@ int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
> >
> > for (addr = start; addr < end; addr += PAGE_SIZE) {
> > page = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
> > - if (!page)
> > + if (!page || (prot & ~page->allowed_prot))
> > return -EACCES;
> > }
>
> However this goes it would be good idea to have only ony patch in the
> patch set that fully defines this function. Impossible to review
> properly with this split.
Sorry, I don't understand what you're suggesting.
>
> >
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index 6e310e3b3fff..7cca076a4987 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -41,6 +41,7 @@ enum sgx_encl_page_desc {
> >
> > struct sgx_encl_page {
> > unsigned long desc;
> > + unsigned long allowed_prot;
> > struct sgx_epc_page *epc_page;
> > struct sgx_va_page *va_page;
> > struct sgx_encl *encl;
> > --
> > 2.21.0
> >
>
> This patch left me very confused. I don't get it.
>
> /Jarkko
On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> > tracked and enforced by the CPU using a base+mask approach, similar to
> > how hardware range registers such as the variable MTRRs. As a result,
> > the ELRANGE must be naturally sized and aligned.
> >
> > To reduce boilerplate code that would be needed in every userspace
> > enclave loader, the SGX driver naturally aligns the mmap() address and
> > also requires the range to be naturally sized. Unfortunately, SGX fails
> > to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> > if userspace is attempting to map a small slice of an existing enclave.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> Why you want to allow mmap() to be called multiple times? mmap() could
> be allowed only once with PROT_NONE and denied afterwards. Is this for
> sending fd to another process that would map already existing enclave?
>
> I don't see any checks for whether the is enclave underneath. Also, I
> think that in all cases mmap() callback should allow only PROT_NONE
> as permissions for clarity even if it could called multiple times.
>
What's the advantage to only allowing PROT_NONE? The idea here is to
allow a PROT_NONE map followed by some replacemets that overlay it for
the individual segments. Admittedly, mprotect() can do the same
thing, but disallowing mmap() seems at least a bit surprising.
On Fri, May 31, 2019 at 4:32 PM Sean Christopherson
<[email protected]> wrote:
>
> ...to support (the equivalent) of existing Linux Security Module
> functionality.
>
> Because SGX manually manages EPC memory, all enclave VMAs are backed by
> the same vm_file, i.e. /dev/sgx/enclave, so that SGX can implement the
> necessary hooks to move pages in/out of the EPC. And because EPC pages
> for any given enclave are fundamentally shared between processes, i.e.
> CoW semantics are not possible with EPC pages, /dev/sgx/enclave must
> always be MAP_SHARED. Lastly, all real world enclaves will need read,
> write and execute permissions to EPC pages. As a result, SGX does not
> play nice with existing LSM behavior as it is impossible to apply
> policies to enclaves with any reasonable granularity, e.g. an LSM can
> deny access to EPC altogether, but can't deny potentially dangerous
> behavior such as mapping pages RW->RW or RWX.
>
> To give LSMs enough information to implement their policies without
> having to resort to ugly things, e.g. holding a reference to the vm_file
> of each enclave page, require userspace to explicitly state the allowed
> protections for each page (region), i.e. take ALLOW_{READ,WRITE,EXEC}
> in the ADD_PAGES ioctl.
>
> The ALLOW_* flags will be passed to LSMs so that they can make informed
> decisions when the enclave is being built, i.e. when the source vm_file
> is available. For example, SELinux's EXECMOD permission can be
> required if an enclave is requesting both ALLOW_WRITE and ALLOW_EXEC.
>
> Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections,
> a la the standard VM_MAY{READ,WRITE,EXEC} flags.
>
> The ALLOW_EXEC flag also has a second (important) use in that it can
> be used to prevent loading an enclave from a noexec file system, on
> SGX2 hardware (regardless of kernel support for SGX2), userspace could
> EADD from a noexec path using read-only permissions and later mprotect()
> and ENCLU[EMODPE] the page to gain execute permissions. By requiring
> ALLOW_EXEC up front, SGX will be able to enforce noexec paths when
> building the enclave.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/uapi/asm/sgx.h | 9 ++++++++-
> arch/x86/kernel/cpu/sgx/driver/ioctl.c | 23 +++++++++++++++++------
> arch/x86/kernel/cpu/sgx/encl.c | 2 +-
> arch/x86/kernel/cpu/sgx/encl.h | 1 +
> 4 files changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 4a12d6abbcb7..4489e92fa0dc 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -31,6 +31,11 @@ struct sgx_enclave_create {
> __u64 src;
> };
>
> +/* Supported flags for struct sgx_enclave_add_pages. */
> +#define SGX_ALLOW_READ VM_READ
> +#define SGX_ALLOW_WRITE VM_WRITE
> +#define SGX_ALLOW_EXEC VM_EXEC
> +
> /**
> * struct sgx_enclave_add_pages - parameter structure for the
> * %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
> @@ -39,6 +44,7 @@ struct sgx_enclave_create {
> * @secinfo: address for the SECINFO data (common to all pages)
> * @nr_pages: number of pages (must be virtually contiguous)
> * @mrmask: bitmask for the measured 256 byte chunks (common to all pages)
> + * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all pages)
> */
> struct sgx_enclave_add_pages {
> __u64 addr;
> @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
> __u64 secinfo;
> __u32 nr_pages;
> __u16 mrmask;
> -} __attribute__((__packed__));
> + __u16 flags;
Minor nit: I would use at least u32 for flags. It's not like the size
saving is important.
> static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> unsigned long src, struct sgx_secinfo *secinfo,
> - unsigned int mrmask)
> + unsigned int mrmask, unsigned int flags)
> {
> + unsigned long prot = secinfo->flags & (VM_READ | VM_WRITE | VM_EXEC);
> + unsigned long allowed_prot = flags & (VM_READ | VM_WRITE | VM_EXEC);
...
> + if (prot & ~allowed_prot)
> + return -EACCES;
This seems like a well-intentioned sanity check, but it doesn't really
accomplish anything with SGX2, so I tend to think it would be better
to omit it.
Hi Stephen,
> From: [email protected] [mailto:linux-sgx-
> [email protected]] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 04, 2019 8:34 AM
>
> On 6/3/19 2:30 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Monday, June 03, 2019 10:16 AM
> >>
> >> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> >>> Hi Sean,
> >>>
> >>> Generally I agree with your direction but think ALLOW_* flags are
> >>> completely internal to LSM because they can be both produced and
> >>> consumed inside an LSM module. So spilling them into SGX driver and
> >>> also user mode code makes the solution ugly and in some cases
> >>> impractical because not every enclave host process has a priori
> >>> knowledge on whether or not an enclave page would be EMODPE'd at
> >> runtime.
> >>
> >> In this case, the host process should tag *all* pages it *might*
> convert
> >> to executable as ALLOW_EXEC. LSMs can (and should/will) be written
> in
> >> such a way that denying ALLOW_EXEC is fatal to the enclave if and
> only
> >> if the enclave actually attempts mprotect(PROT_EXEC).
> >
> > What if those pages contain self-modifying code but the host doesn't
> know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD?
> Then would it prevent those pages to start with PROT_EXEC?
> >
> > Anyway, my point is that it is unnecessary even if it works.
> >
> >>
> >> Take the SELinux path for example. The only scenario in which
> >> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> >> PROT_EXEC.
> >> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> >> then PROT_EXEC will be cleared from @allowed_prot.
> >>
> >> As Stephen pointed out, auditing the denials on @allowed_prot means
> the
> >> log will contain false positives of a sort. But this is more of a
> noise
> >> issue than true false positives. E.g. there are three possible
> outcomes
> >> for the enclave.
> >>
> >> - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> >> Requesting ALLOW_EXEC is either a straightforward a userspace
> bug or
> >> a poorly written generic enclave loader.
> >>
> >> - The enclave conditionally performs EMODPE[PROT_EXEC]. In this
> case
> >> the denial is a true false positive.
> >>
> >> - The enclave does EMODPE[PROT_EXEC] and its host userspace then
> fails
> >> on mprotect(PROT_EXEC), i.e. the LSM denial is working as
> intended.
> >> The audit log will be noisy, but viewed as a whole the denials
> >> aren't
> >> false positives.
> >
> > What I was talking about was EMODPE[PROT_WRITE] on an RX page.
> >
> >>
> >> The potential for noisy audit logs and/or false positives is
> unfortunate,
> >> but it's (by far) the lesser of many evils.
> >>
> >>> Theoretically speaking, what you really need is a per page flag
> (let's
> >>> name it WRITTEN?) indicating whether a page has ever been written to
> >>> (or more precisely, granted PROT_WRITE), which will be used to
> decide
> >>> whether to grant PROT_EXEC when requested in future. Given the fact
> >>> that all mprotect() goes through LSM and mmap() is limited to
> >>> PROT_NONE, it's easy for LSM to capture that flag by itself instead
> of
> >> asking user mode code to provide it.
> >>>
> >>> That said, here is the summary of what I think is a better approach.
> >>> * In hook security_file_alloc(), if @file is an enclave, allocate
> some
> >> data
> >>> structure to store for every page, the WRITTEN flag as described
> >> above.
> >>> WRITTEN is cleared initially for all pages.
> >>
> >> This would effectively require *every* LSM to duplicate the SGX
> driver's
> >> functionality, e.g. track per-page metadata, implement locking to
> >> prevent races between multiple mm structs, etc...
> >
> > Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_*
> are no difference than PROCESS__* or FILE__* flags, which are just
> artifacts to assist particular LSMs in decision making. They are never
> considered part of the LSM interface, even if other LSMs than SELinux
> may adopt the same/similar approach.
> >
> > If code duplication is what you are worrying about, you can put them
> in a library, or implement/export them in some new file (maybe
> security/enclave.c?) as utility functions. But spilling them into user
> mode is what I think is unacceptable.
> >
> >>
> >>> Open: Given a file of type struct file *, how to tell if it is an
> >> enclave (i.e. /dev/sgx/enclave)?
> >>> * In hook security_mmap_file(), if @file is an enclave, make sure
> >> @prot can
> >>> only be PROT_NONE. This is to force all protection changes to go
> >> through
> >>> security_file_mprotect().
> >>> * In the newly introduced hook security_enclave_load(), set WRITTEN
> >> for pages
> >>> that are requested PROT_WRITE.
> >>
> >> How would an LSM associate a page with a specific enclave? vma-
> >vm_file
> >> will point always point at /dev/sgx/enclave. vma->vm_mm is useless
> >> because we're allowing multiple processes to map a single enclave,
> not
> >> to mention that by mm would require holding a reference to the mm.
> >
> > Each open("/dev/sgx/enclave") syscall creates a *new* instance of
> struct file to uniquely identify one enclave instance. What I mean is
> @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.
> >
> >>
> >>> * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> >> look up
> >>> and use WRITTEN flags for all pages within @vma, along with other
> >> global
> >>> flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
> >> to decide
> >>> on allowing/rejecting @prot.
> >>
> >> vma->vm_file will always be /dev/sgx/enclave at this point, which
> means
> >> LSMs don't have the necessary anchor back to the source file, e.g. to
> >> enforce FILE__EXECUTE. The noexec file system case is also
> unaddressed.
> >
> > vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is
> checked by security_enclave_load() using @source_vma->vm_file. Once a
> page has been EADD'ed, whether to allow RW->RX depends on .sigstruct
> file (more precisely, the file backing SIGSTRUCT), whose FILE__*
> attributes could be cached in vma->vm_file->f_security by
> security_enclave_init().
>
> The RFC series seemed to dispense with the use of the sigstruct file and
> just used the source file throughout IIUC. That allowed for reuse of
> FILE__* permissions without ambiguity rather than introducing separate
> ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of
> all checks.
That's right. But that's not the distinction between Sean's patch and my proposal.
My point here is, from the perspective of LSM architecture, LSM hooks shall be defined to pass adequate information to allow *all* possible implementations to make reasonable decisions. In particular, all parameters to EADD (i.e. target linear address, SECINFO, source page) could affect (current and future) decisions but Sean's definition of security_enclave_load() passes only the source, which limits the possibility of other implementations. Another point I'm trying to make is, different LSM implementations may need different information at any given decision point, therefore it's *not* possible to always pass "right" information at "right" time. And that's why I think .f_security was added to struct file to allow stateful LSMs. Sean's patch however is trying the opposite, as ALLOW_* flags should otherwise be part of internal state of LSMs, but they are spilled into SGX driver and also userspace.
>
> Regardless, IIUC, your approach requires that we always check
> FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load()
> irrespective of prot so that we can save the result in the f_security
> for later use by the mprotect hook. This may generate many spurious
> audit messages for cases where PROT_EXEC will never be requested, and
> users will be prone to just always allowing it since they cannot tell
> when it was actually needed.
Yes and no.
For those not following this discussion closely, here's the prototype of security_enclave_load() that I proposed in one of my earlier emails.
int security_enclave_load(struct file *enclave_fd, unsigned long linear_address, unsigned long nr_pages, int prot, struct vm_area_struct *source_vma);
@enclave_fd identifies the enclave to which new pages are being added.
@linear_address/@nr_pages specifies the linear range of pages being added.
@prot specifies the initial protection of those newly added pages. It is taken from the vma covering the target range.
@source_vma covers the source pages in the case of EADD. An LSM is expected to make sure security_file_mprotect(source_vma, prot, prot) would succeed before checking anything else, unless @source_vma is NULL, indicating pages are being EAUG'ed. In all cases, LSM is expected to "remember" @prot for all those pages to be checked in future security_file_mprotect() invocations.
Architecture wise, the idea here is for SGX driver to pass in all information relevant for an LSM's decision.
Implementation wise, LSM may allow PROT_EXEC depending on FILE__EXECUTE of the source file (@source_vma->vm_file), or the sigstruct file (will be provided to LSM at security_enclave_init), or /dev/sgx/enclave. It makes most sense to me to use the source file, hence the check would most likely be done here. For future security_file_mprotect(), the source file's FILE__EXECMOD could also be cached here, or it could use sigstruct file's FILE__EXECMOD instead. Given the fact that no EPC pages could be accessed before EINIT, the major purpose of security_enclave_load() is for LSM to cache whatever information deemed appropriate for the pages being EADD'ed (i.e. @source_vma != NULL) so that it has necessary information to make decisions in security_file_mprotect() in future. And in that regard the parameter @prot is unnecessary but I decided to have it here for 2 reasons: 1) Pages may be EADD'ed within an existing VMA (so no upcoming mprotect) so LSM's decision is needed right away and 2) @source_vma may not be able to mprotect(@prot) and in that case it'd be better to fail EADD instead of failing at mprotect() later.
>
> >
> > The noexec case should be addressed in IOC_ADD_PAGES by testing
> @source_vma->vm_flags & VM_MAYEXEC.
> >
> >>
> >>> * In hook security_file_free(), if @file is an enclave, free
> storage
> >>> allocated for WRITTEN flags.
> From: [email protected] [mailto:linux-sgx-
> [email protected]] On Behalf Of Andy Lutomirski
> Sent: Tuesday, June 04, 2019 1:16 PM
>
> On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
> <[email protected]> wrote:
> >
> > On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that
> > > is tracked and enforced by the CPU using a base+mask approach,
> > > similar to how hardware range registers such as the variable MTRRs.
> > > As a result, the ELRANGE must be naturally sized and aligned.
> > >
> > > To reduce boilerplate code that would be needed in every userspace
> > > enclave loader, the SGX driver naturally aligns the mmap() address
> > > and also requires the range to be naturally sized. Unfortunately,
> > > SGX fails to grant a waiver to the MAP_FIXED case, e.g. incorrectly
> > > rejects mmap() if userspace is attempting to map a small slice of an
> existing enclave.
> > >
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > Why you want to allow mmap() to be called multiple times? mmap() could
> > be allowed only once with PROT_NONE and denied afterwards. Is this for
> > sending fd to another process that would map already existing enclave?
> >
> > I don't see any checks for whether the is enclave underneath. Also, I
> > think that in all cases mmap() callback should allow only PROT_NONE as
> > permissions for clarity even if it could called multiple times.
> >
>
> What's the advantage to only allowing PROT_NONE? The idea here is to
> allow a PROT_NONE map followed by some replacemets that overlay it for
> the individual segments. Admittedly, mprotect() can do the same thing,
> but disallowing mmap() seems at least a bit surprising.
Disallowing mmap() is not only surprising but also unnecessary.
A bit off topic here. This mmap()/mprotect() discussion reminds me a question (guess for Jarkko): Now that vma->vm_file->private_data keeps a pointer to the enclave, why do we store it again in vma->vm_private? It isn't a big deal but non-NULL vm_private does prevent mprotect() from merging adjacent VMAs.
> From: Christopherson, Sean J
> Sent: Saturday, June 01, 2019 02:32
>
> /**
> * struct sgx_enclave_add_pages - parameter structure for the
> * %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
> @@ -39,6 +44,7 @@ struct sgx_enclave_create {
> * @secinfo: address for the SECINFO data (common to all pages)
> * @nr_pages: number of pages (must be virtually contiguous)
> * @mrmask: bitmask for the measured 256 byte chunks (common to all
> pages)
> + * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all
> pages)
> */
> struct sgx_enclave_add_pages {
> __u64 addr;
> @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
> __u64 secinfo;
> __u32 nr_pages;
> __u16 mrmask;
> -} __attribute__((__packed__));
> + __u16 flags;
> +};
You are adding a flags member. The secinfo structure has already a flags member in it.
Why do you need both - they are both coming from user mode. What kind of scenario would
require having different values. Seems confusing.
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
On Tue, Jun 04, 2019 at 03:10:22PM -0700, Xing, Cedric wrote:
> A bit off topic here. This mmap()/mprotect() discussion reminds me a question
> (guess for Jarkko): Now that vma->vm_file->private_data keeps a pointer to
> the enclave, why do we store it again in vma->vm_private? It isn't a big deal
> but non-NULL vm_private does prevent mprotect() from merging adjacent VMAs.
vma->vm_ops->close also prevents merging, and we need that to refcount the
enclave and mm.
We also rely on nullifying vma->vm_private_data in the unlikely event that
adding a new mm to the enclave fails on kzalloc().
On Tue, Jun 04, 2019 at 09:45:14AM -0700, Sean Christopherson wrote:
> Heh, yeah, it's not duplicating LSM functionality. What I was trying to
> say is that this patch allows LSMs to implement policies that are
> equivalent to their existing functionality, e.g. paves the way to add
> security_enclave_load() as an equivalent to security_file_mprotect().
I would suggest describing explicitly in the commit message what you
want to do, which you said here e.g. "I do this because I want to add
LSM hooks". This also relevant information for the LKM discussion.
Lets see how the next version looks like now that you have some
feedback.
In the whole scope of the patch set, in order to make it more
readable, I'll give following suggestions on how it is organized:
1. Leave out anything that is not strictly necessary (cosmetic
fix, batch operation if possible). Better to focus one thing at
a time.
2. Try to organize it so that each function is fully defined in
the scope of one patch even if it would mean larger patches.
3. Do not add one call site helpers unless there is a good
reason to do so. A good reason would be something like needing
to extensive work in error rollback, which would make the
caller a mess.
/Jarkko
On Tue, Jun 04, 2019 at 01:16:04PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
> <[email protected]> wrote:
> >
> > On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> > > tracked and enforced by the CPU using a base+mask approach, similar to
> > > how hardware range registers such as the variable MTRRs. As a result,
> > > the ELRANGE must be naturally sized and aligned.
> > >
> > > To reduce boilerplate code that would be needed in every userspace
> > > enclave loader, the SGX driver naturally aligns the mmap() address and
> > > also requires the range to be naturally sized. Unfortunately, SGX fails
> > > to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> > > if userspace is attempting to map a small slice of an existing enclave.
> > >
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > Why you want to allow mmap() to be called multiple times? mmap() could
> > be allowed only once with PROT_NONE and denied afterwards. Is this for
> > sending fd to another process that would map already existing enclave?
> >
> > I don't see any checks for whether the is enclave underneath. Also, I
> > think that in all cases mmap() callback should allow only PROT_NONE
> > as permissions for clarity even if it could called multiple times.
> >
>
> What's the advantage to only allowing PROT_NONE? The idea here is to
> allow a PROT_NONE map followed by some replacemets that overlay it for
> the individual segments. Admittedly, mprotect() can do the same
> thing, but disallowing mmap() seems at least a bit surprising.
I was merely wondering if it is specifically for the application where a
client process would mmap(MAP_FIXED) an enclave created by a server
process.
/Jarkko
On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> a pointer to the enclave, why do we store it again in vma->vm_private?
> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> from merging adjacent VMAs.
Same semantics as with a regular mmap i.e. you can close the file and
still use the mapping.
/Jarkko
> On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <[email protected]> wrote:
>
>> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
>> A bit off topic here. This mmap()/mprotect() discussion reminds me a
>> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
>> a pointer to the enclave, why do we store it again in vma->vm_private?
>> It isn't a big deal but non-NULL vm_private does prevent mprotect()
>> from merging adjacent VMAs.
>
> Same semantics as with a regular mmap i.e. you can close the file and
> still use the mapping.
>
>
The file should be properly refcounted — vm_file should not go away while it’s mapped.
On Wed, Jun 05, 2019 at 04:10:44AM -0700, Ayoun, Serge wrote:
> > From: Christopherson, Sean J
> > Sent: Saturday, June 01, 2019 02:32
> >
> > /**
> > * struct sgx_enclave_add_pages - parameter structure for the
> > * %SGX_IOC_ENCLAVE_ADD_PAGES ioctl
> > @@ -39,6 +44,7 @@ struct sgx_enclave_create {
> > * @secinfo: address for the SECINFO data (common to all pages)
> > * @nr_pages: number of pages (must be virtually contiguous)
> > * @mrmask: bitmask for the measured 256 byte chunks (common to all
> > pages)
> > + * @flags: flags, e.g. SGX_ALLOW_{READ,WRITE,EXEC} (common to all
> > pages)
> > */
> > struct sgx_enclave_add_pages {
> > __u64 addr;
> > @@ -46,7 +52,8 @@ struct sgx_enclave_add_pages {
> > __u64 secinfo;
> > __u32 nr_pages;
> > __u16 mrmask;
> > -} __attribute__((__packed__));
> > + __u16 flags;
> > +};
>
> You are adding a flags member. The secinfo structure has already a flags member in it.
> Why do you need both - they are both coming from user mode. What kind of scenario would
> require having different values. Seems confusing.
The format of SECINFO.FLAGS is hardware defined, e.g. we can't add a flag
to tag the page as being a zero page for optimization purposes, at least
not without breaking future compatibility or doing tricky overloading.
If you're specifically talking about SECINFO.FLAGS.RWX, due to SGX2 there
are scenarios where userspace will initially want the page to be RW, and
will later want to convert the page to RX. Making decisions based solely
on the initial EPCM permissions would either create a security hole or
force SGX to track "dirty" pages along with a implementing a pre-check
scheme for LSMs (or restricting LSMs to tieing permissions to the host
process and not the enclave).
On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
>
>
> > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <[email protected]> wrote:
> >
> >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> >> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> >> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> >> a pointer to the enclave, why do we store it again in vma->vm_private?
> >> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> >> from merging adjacent VMAs.
> >
> > Same semantics as with a regular mmap i.e. you can close the file and
> > still use the mapping.
> >
> >
>
> The file should be properly refcounted — vm_file should not go away while it’s mapped.
Right, makes sense. It is easy one to change essentially just removing
internal refcount from sgx_encl and using file for the same. I'll update
this to my tree along with the changes to remove LKM/ACPI bits ASAP.
/Jarkko
On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <[email protected]> wrote:
> > >
> > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > >> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> > >> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> > >> a pointer to the enclave, why do we store it again in vma->vm_private?
> > >> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> > >> from merging adjacent VMAs.
> > >
> > > Same semantics as with a regular mmap i.e. you can close the file and
> > > still use the mapping.
> > >
> > >
> >
> > The file should be properly refcounted — vm_file should not go away while it’s mapped.
mm already takes care of that so vm_file does not go away. Still
we need an internal refcount for enclaves to synchronize with the
swapper. In summary nothing needs to be done.
> Right, makes sense. It is easy one to change essentially just removing
> internal refcount from sgx_encl and using file for the same. I'll update
> this to my tree along with the changes to remove LKM/ACPI bits ASAP.
/Jarkko
> From: Jarkko Sakkinen [mailto:[email protected]]
> Sent: Thursday, June 13, 2019 6:48 AM
>
> On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen
> <[email protected]> wrote:
> > > >
> > > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > > >> A bit off topic here. This mmap()/mprotect() discussion reminds
> > > >> me a question (guess for Jarkko): Now that
> > > >> vma->vm_file->private_data keeps a pointer to the enclave, why do
> we store it again in vma->vm_private?
> > > >> It isn't a big deal but non-NULL vm_private does prevent
> > > >> mprotect() from merging adjacent VMAs.
> > > >
> > > > Same semantics as with a regular mmap i.e. you can close the file
> > > > and still use the mapping.
> > > >
> > > >
> > >
> > > The file should be properly refcounted — vm_file should not go away
> while it’s mapped.
>
> mm already takes care of that so vm_file does not go away. Still we need
> an internal refcount for enclaves to synchronize with the swapper. In
> summary nothing needs to be done.
I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks the vma, which in turn reference counts vma->vm_file. Why is the internal refcount still needed?
>
> > Right, makes sense. It is easy one to change essentially just removing
> > internal refcount from sgx_encl and using file for the same. I'll
> > update this to my tree along with the changes to remove LKM/ACPI bits
> ASAP.
>
> /Jarkko
On Thu, Jun 13, 2019 at 09:47:06AM -0700, Xing, Cedric wrote:
> > From: Jarkko Sakkinen [mailto:[email protected]]
> > Sent: Thursday, June 13, 2019 6:48 AM
> >
> > On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > > >
> > > >
> > > > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen
> > <[email protected]> wrote:
> > > > >
> > > > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > > > >> A bit off topic here. This mmap()/mprotect() discussion reminds
> > > > >> me a question (guess for Jarkko): Now that
> > > > >> vma->vm_file->private_data keeps a pointer to the enclave, why do
> > we store it again in vma->vm_private?
> > > > >> It isn't a big deal but non-NULL vm_private does prevent
> > > > >> mprotect() from merging adjacent VMAs.
> > > > >
> > > > > Same semantics as with a regular mmap i.e. you can close the file
> > > > > and still use the mapping.
> > > > >
> > > > >
> > > >
> > > > The file should be properly refcounted — vm_file should not go away
> > while it’s mapped.
> >
> > mm already takes care of that so vm_file does not go away. Still we need
> > an internal refcount for enclaves to synchronize with the swapper. In
> > summary nothing needs to be done.
>
> I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks
> the vma, which in turn reference counts vma->vm_file. Why is the internal
> refcount still needed?
mmap_sem is only held when reclaim is touching PTEs, e.g. to test/clear
its accessed bit and to zap the PTE. The liveliness of the enclave needs
to be guaranteed for the entire duration of reclaim, e.g. we can't have
the enclave disappearing when we go to do EWB. It's also worth nothing
that a single reclaim may operate on more than one mmap_sem, as enclaves
can be shared across processes (mm_structs).
On Thu, Jun 13, 2019 at 10:14:51AM -0700, Sean Christopherson wrote:
> > I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks
> > the vma, which in turn reference counts vma->vm_file. Why is the internal
> > refcount still needed?
>
> mmap_sem is only held when reclaim is touching PTEs, e.g. to test/clear
> its accessed bit and to zap the PTE. The liveliness of the enclave needs
> to be guaranteed for the entire duration of reclaim, e.g. we can't have
> the enclave disappearing when we go to do EWB. It's also worth nothing
> that a single reclaim may operate on more than one mmap_sem, as enclaves
> can be shared across processes (mm_structs).
Anyway, the takeaway I got from this is that encl->refcount does not
need to be updated for VMAs (sent a patch to linux-sgx that I plan
merge).
/Jarkko