2022-02-09 04:09:01

by Reinette Chatre

Subject: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

In the initial (SGX1) version of SGX, pages in an enclave need to be
created with permissions that support all usages of the pages, from the
time the enclave is initialized until it is unloaded. For example,
pages used by a JIT compiler or when code needs to otherwise be
relocated need to always have RWX permissions.

SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
and can be used to restrict the EPCM permissions of regular enclave
pages within an initialized enclave.

Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
restricting EPCM permissions. With this ioctl() the user specifies
a page range and the permissions to be applied to all pages in
the provided range. After checking the new permissions (more detail
below) the page table entries are reset and any new page
table entries will contain the new, restricted, permissions.
ENCLS[EMODPR] is run to restrict the EPCM permissions followed by
the ENCLS[ETRACK] flow that will ensure no cached
linear-to-physical address mappings to the changed pages remain.

It is possible for the permission change request to fail on any
page within the provided range, either with an error encountered
by the kernel or by the SGX hardware while running
ENCLS[EMODPR]. To support partial success the ioctl() returns an
error code based on failures encountered by the kernel as well
as two result output parameters: one for the number of bytes (a
multiple of the page size) that were successfully changed and one
for the SGX return code.
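
As a rough user space illustration of the partial-success contract described
above, the sketch below advances a request past the bytes reported in the
count output so that the remainder can be retried. The helper name
restrict_perm_advance() is hypothetical and not part of this patch; the
struct layout mirrors the uapi struct added here, using uint64_t in place
of __u64 so the sketch builds outside the kernel tree.

```c
#include <stdint.h>

/* Mirrors struct sgx_enclave_restrict_perm from this patch. */
struct sgx_enclave_restrict_perm {
	uint64_t offset;
	uint64_t length;
	uint64_t secinfo;
	uint64_t result;	/* output: ENCLS[EMODPR] result code */
	uint64_t count;		/* output: bytes successfully changed */
};

/*
 * Hypothetical helper (an assumption, not part of the patch): after a
 * call that changed only part of the range, move the request past the
 * bytes already changed so that the remainder can be retried.
 *
 * Returns 1 when nothing is left to do, 0 when a retry is needed and
 * -1 when count is inconsistent with the request.
 */
static int restrict_perm_advance(struct sgx_enclave_restrict_perm *p)
{
	if (p->count > p->length)
		return -1;

	p->offset += p->count;
	p->length -= p->count;

	/* The ioctl() handler rejects non-zero output fields on input. */
	p->result = 0;
	p->count = 0;

	return p->length == 0;
}
```

A caller would inspect the error code and the result field first, and only
then decide whether retrying the remaining range makes sense.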

Checking user provided new permissions
======================================

Enclave page permission changes need to be approached with care and
for this reason permission changes are only allowed if the new
permissions are the same or more restrictive than the vetted
permissions. No additional checking is done to ensure that the
permissions are actually being restricted. This is because the
enclave may have relaxed the EPCM permissions from within
the enclave without letting the kernel know. An attempt to relax
permissions using this call will be ignored by the hardware.

For example, together with the support for relaxing of EPCM permissions,
enclave pages added with the vetted permissions in brackets below
are allowed to have permissions as follows:
* (RWX) => RW => R => RX => RWX
* (RW) => R => RW
* (RX) => R => RX
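
The rule behind these examples is a bitmask subset test, the same shape as
the vm_max_prot_bits check in the patch below. A minimal standalone sketch
follows; the PERM_* names and values are placeholders chosen for
illustration (the kernel code works with VM_READ, VM_WRITE and VM_EXEC),
and perm_within_vetted() is a hypothetical name.

```c
#include <stdbool.h>

/* Placeholder permission bits, for illustration only. */
#define PERM_R 0x1UL
#define PERM_W 0x2UL
#define PERM_X 0x4UL

/* True when every requested bit is also set in the vetted permissions. */
static bool perm_within_vetted(unsigned long vetted, unsigned long requested)
{
	return (vetted & requested) == requested;
}
```

With this test a page vetted as RW may move between R and RW but is never
allowed RX, which matches the second example above.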

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since V1:
- Change terminology to use "relax" instead of "extend" to refer to
  the case when enclave page permissions are added (Dave).
- Use ioctl() in commit message (Dave).
- Add examples on what permissions would be allowed (Dave).
- Split enclave page permission changes into two ioctl()s, one for
  permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
  and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
  (Jarkko).
- In support of the ioctl() name change the following names have been
  changed:
    struct sgx_page_modp -> struct sgx_enclave_restrict_perm
    sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
    sgx_page_modp() -> sgx_enclave_restrict_perm()
- ioctl() takes entire secinfo as input instead of
  page permissions only (Jarkko).
- Fix kernel-doc to include () in function name.
- Create and use utility for the ETRACK flow.
- Fixups in comments
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Make explicit which members of struct sgx_enclave_restrict_perm
  are for output (Dave).

 arch/x86/include/uapi/asm/sgx.h |  21 +++
 arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
 arch/x86/kernel/cpu/sgx/encl.h  |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
 4 files changed, 255 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 5c678b27bb72..b0ffb80bc67f 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,8 @@ enum sgx_page_flags {
 	_IO(SGX_MAGIC, 0x04)
 #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
 	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
+#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
+	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)

 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
 	__u64 count;
 };

+/**
+ * struct sgx_enclave_restrict_perm - parameters for ioctl
+ *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @offset: starting page offset (page aligned relative to enclave base
+ *          address defined in SECS)
+ * @length: length of memory (multiple of the page size)
+ * @secinfo: address for the SECINFO data containing the new permission bits
+ *           for pages in range described by @offset and @length
+ * @result: (output) SGX result code of ENCLS[EMODPR] function
+ * @count: (output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_restrict_perm {
+	__u64 offset;
+	__u64 length;
+	__u64 secinfo;
+	__u64 result;
+	__u64 count;
+};
+
 struct sgx_enclave_run;

 /**
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 8da813504249..a5d4a7efb986 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	return epc_page;
 }

-static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
-						unsigned long addr)
+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr)
 {
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index cb9f16d457ac..848a28d28d3d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
 void sgx_encl_free_epc_page(struct sgx_epc_page *page);

+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr);
+
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9cc6af404bf6..23bdf558b231 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }

+/*
+ * Some SGX functions require that no cached linear-to-physical address
+ * mappings are present before they can succeed. Collaborate with
+ * hardware via ENCLS[ETRACK] to ensure that all cached
+ * linear-to-physical address mappings belonging to all threads of
+ * the enclave are cleared. See sgx_encl_cpumask() for details.
+ */
+static int sgx_enclave_etrack(struct sgx_encl *encl)
+{
+	void *epc_virt;
+	int ret;
+
+	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
+	ret = __etrack(epc_virt);
+	if (ret) {
+		/*
+		 * ETRACK only fails when there is an OS issue. For
+		 * example, two consecutive ETRACKs were sent without
+		 * a completed IPI in between.
+		 */
+		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
+		/*
+		 * Send IPIs to kick CPUs out of the enclave and
+		 * try ETRACK again.
+		 */
+		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+		ret = __etrack(epc_virt);
+		if (ret) {
+			pr_err_once("ETRACK repeat returned %d (0x%x)",
+				    ret, ret);
+			return -EFAULT;
+		}
+	}
+	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+
+	return 0;
+}
+
+/**
+ * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
+ * @encl: Enclave to which the pages belong.
+ * @modp: Checked parameters from user on which pages need modifying.
+ * @secinfo_perm: New (validated) permission bits.
+ *
+ * Return:
+ * - 0: Success.
+ * - -errno: Otherwise.
+ */
+static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
+				      struct sgx_enclave_restrict_perm *modp,
+				      u64 secinfo_perm)
+{
+	unsigned long vm_prot, run_prot_restore;
+	struct sgx_encl_page *entry;
+	struct sgx_secinfo secinfo;
+	unsigned long addr;
+	unsigned long c;
+	void *epc_virt;
+	int ret;
+
+	memset(&secinfo, 0, sizeof(secinfo));
+	secinfo.flags = secinfo_perm;
+
+	vm_prot = vm_prot_from_secinfo(secinfo_perm);
+
+	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+		addr = encl->base + modp->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Changing EPCM permissions is only supported on regular
+		 * SGX pages. Attempting this change on other pages will
+		 * result in #PF.
+		 */
+		if (entry->type != SGX_PAGE_TYPE_REG) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/*
+		 * Do not verify if current runtime protection bits are what
+		 * is being requested. The enclave may have relaxed EPCM
+		 * permissions without letting the kernel know and
+		 * thus permission restriction may still be needed even if
+		 * from the kernel's perspective the permissions are unchanged.
+		 */
+
+		/* New permissions should never exceed vetted permissions. */
+		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		/* Make sure page stays around while releasing mutex. */
+		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			ret = -EAGAIN;
+			goto out_unlock;
+		}
+
+		/*
+		 * Change runtime protection before zapping PTEs to ensure
+		 * any new #PF uses new permissions. EPCM permissions (if
+		 * needed) not changed yet.
+		 */
+		run_prot_restore = entry->vm_run_prot_bits;
+		entry->vm_run_prot_bits = vm_prot;
+
+		mutex_unlock(&encl->lock);
+		/*
+		 * Do not keep encl->lock because of dependency on
+		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 */
+		sgx_zap_enclave_ptes(encl, addr);
+
+		mutex_lock(&encl->lock);
+
+		/* Change EPCM permissions. */
+		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+		ret = __emodpr(&secinfo, epc_virt);
+		if (encls_faulted(ret)) {
+			/*
+			 * All possible faults should be avoidable:
+			 * parameters have been checked, will only change
+			 * permissions of a regular page, and no concurrent
+			 * SGX1/SGX2 ENCLS instructions since these
+			 * are protected with mutex.
+			 */
+			pr_err_once("EMODPR encountered exception %d\n",
+				    ENCLS_TRAPNR(ret));
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+		if (encls_failed(ret)) {
+			modp->result = ret;
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+
+		ret = sgx_enclave_etrack(encl);
+		if (ret) {
+			ret = -EFAULT;
+			goto out_reclaim;
+		}
+
+		sgx_mark_page_reclaimable(entry->epc_page);
+		mutex_unlock(&encl->lock);
+	}
+
+	ret = 0;
+	goto out;
+
+out_prot_restore:
+	entry->vm_run_prot_bits = run_prot_restore;
+out_reclaim:
+	sgx_mark_page_reclaimable(entry->epc_page);
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	modp->count = c;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_restrict_perm() - handler for
+ *                                   %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @encl: an enclave pointer
+ * @arg: userspace pointer to a &struct sgx_enclave_restrict_perm
+ *       instance
+ *
+ * SGX2 distinguishes between relaxing and restricting the enclave page
+ * permissions maintained by the hardware (EPCM permissions) of pages
+ * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
+ *
+ * EPCM permissions cannot be restricted from within the enclave; the enclave
+ * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
+ * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
+ * will be ignored by the hardware.
+ *
+ * Enclave page permissions are not allowed to exceed the maximum vetted
+ * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
+ *
+ * Return:
+ * - 0: Success
+ * - -errno: Otherwise
+ */
+static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
+					  void __user *arg)
+{
+	struct sgx_enclave_restrict_perm params;
+	u64 secinfo_perm;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
+					 &secinfo_perm);
+	if (ret)
+		return ret;
+
+	if (params.result || params.count)
+		return -EINVAL;
+
+	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
{
 	struct sgx_encl *encl = filep->private_data;
@@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
 		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
+		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
--
2.25.1



2022-02-21 02:36:46

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 5c678b27bb72..b0ffb80bc67f 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> _IO(SGX_MAGIC, 0x04)
> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>
> /**
> * struct sgx_enclave_create - parameter structure for the
> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> __u64 count;
> };
>
> +/**
> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> + * @offset: starting page offset (page aligned relative to enclave base
> + * address defined in SECS)
> + * @length: length of memory (multiple of the page size)
> + * @secinfo: address for the SECINFO data containing the new permission bits
> + * for pages in range described by @offset and @length
> + * @result: (output) SGX result code of ENCLS[EMODPR] function
> + * @count: (output) bytes successfully changed (multiple of page size)
> + */
> +struct sgx_enclave_restrict_perm {
> + __u64 offset;
> + __u64 length;
> + __u64 secinfo;
> + __u64 result;
> + __u64 count;
> +};
> +
> struct sgx_enclave_run;
>

Just a suggestion but these might be a bit less cluttered explanations of
the fields:

/// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
#[repr(C)]
pub struct RelaxPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the relaxed permissions
    secinfo: u64,
    /// Out: length of the address range successfully changed
    count: u64,
}

/// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
#[repr(C)]
pub struct RestrictPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the restricted permissions
    secinfo: u64,
    /// Out: ENCLS[EMODPR] return value
    result: u64,
    /// Out: length of the address range successfully changed
    count: u64,
}

I can live with the current ones too but I rewrote them so that I can
quickly make sense of the fields later. It's Rust code but the point is
the documentation...

Also, it should not be too much trouble to use the structs in user space
code even if the struct names are struct sgx_enclave_relax_permissions and
struct sgx_enclave_restrict_permissions, given that you most likely have
exactly one call site in the run-time.

Other than that, looks quite good.

BR, Jarkko

2022-02-22 19:25:01

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:

...

>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>> index 5c678b27bb72..b0ffb80bc67f 100644
>> --- a/arch/x86/include/uapi/asm/sgx.h
>> +++ b/arch/x86/include/uapi/asm/sgx.h
>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>> _IO(SGX_MAGIC, 0x04)
>> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
>> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>>
>> /**
>> * struct sgx_enclave_create - parameter structure for the
>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>> __u64 count;
>> };
>>
>> +/**
>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>> + * @offset: starting page offset (page aligned relative to enclave base
>> + * address defined in SECS)
>> + * @length: length of memory (multiple of the page size)
>> + * @secinfo: address for the SECINFO data containing the new permission bits
>> + * for pages in range described by @offset and @length
>> + * @result: (output) SGX result code of ENCLS[EMODPR] function
>> + * @count: (output) bytes successfully changed (multiple of page size)
>> + */
>> +struct sgx_enclave_restrict_perm {
>> + __u64 offset;
>> + __u64 length;
>> + __u64 secinfo;
>> + __u64 result;
>> + __u64 count;
>> +};
>> +
>> struct sgx_enclave_run;
>>
>> /**

...

>
> Just a suggestion but these might be a bit less cluttered explanations of
> the fields:
>
> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RelaxPermissions {
> /// In: starting page offset
> offset: u64,
> /// In: length of the address range (multiple of the page size)
> length: u64,
> /// In: SECINFO containing the relaxed permissions
> secinfo: u64,
> /// Out: length of the address range successfully changed
> count: u64,
> };
>
> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RestrictPermissions {
> /// In: starting page offset
> offset: u64,
> /// In: length of the address range (multiple of the page size)
> length: u64,
> /// In: SECINFO containing the restricted permissions
> secinfo: u64,
> /// Out: ENCLS[EMODPR] return value
> result: u64,
> /// Out: length of the address range successfully changed
> count: u64,
> };

In your proposal you shorten the descriptions from the current implementation.
I do consider the removed information valuable since I believe that it helps
users understand the kernel interface requirements without needing to be
familiar with or dig into the kernel code to understand how the provided data
is used.

For example, you shorten offset to "starting page offset", but what was removed
was the requirement that this offset has to be page aligned and what the offset
is relative to. I do believe summarizing these requirements upfront helps
a user space developer by not needing to dig through kernel code later
in order to understand why an -EINVAL was received.
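
To make the point concrete, here is a rough user space pre-check that a
runtime could run before issuing the ioctl(). It is only a sketch: the body
of sgx_validate_offset_length() is not shown in this thread, so the exact
checks below are assumptions derived from the kernel-doc wording, and the
names range_looks_valid() and EXAMPLE_PAGE_SIZE are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Assumed requirements, taken from the kernel-doc text only:
 * - offset is page aligned and relative to the enclave base address
 * - length is a non-zero multiple of the page size
 * - the range lies within the enclave
 */
static bool range_looks_valid(uint64_t offset, uint64_t length,
			      uint64_t encl_size)
{
	if (offset % EXAMPLE_PAGE_SIZE)
		return false;

	if (!length || length % EXAMPLE_PAGE_SIZE)
		return false;

	/* Written to avoid overflow in offset + length. */
	if (offset > encl_size || length > encl_size - offset)
		return false;

	return true;
}
```

A request that fails a check like this is the kind that I would expect to
come back as -EINVAL.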


> I can live with the current ones too but I rewrote them so that I can
> quickly make sense of the fields later. It's Rust code but the point is
> the documentation...

Since you do seem to be ok with the current descriptions I would prefer
to keep them.

> Also, it should not be too much trouble to use the struct in user space
> code even if the struct names are struct sgx_enclave_relax_permissions and
> struct sgx_enclave_restrict_permissions, given that you most likely have
> exactly single call-site in the run-time.

Are you requesting that I make the following name changes?
struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions

If so, do you want the function names also written out in this way?
sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

> Other than that, looks quite good.

Thank you very much for reviewing and testing this work.

Reinette

2022-02-23 16:36:01

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
>
> ...
>
> >> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >> index 5c678b27bb72..b0ffb80bc67f 100644
> >> --- a/arch/x86/include/uapi/asm/sgx.h
> >> +++ b/arch/x86/include/uapi/asm/sgx.h
> >> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >> _IO(SGX_MAGIC, 0x04)
> >> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> >> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> >> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >>
> >> /**
> >> * struct sgx_enclave_create - parameter structure for the
> >> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >> __u64 count;
> >> };
> >>
> >> +/**
> >> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> >> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >> + * @offset: starting page offset (page aligned relative to enclave base
> >> + * address defined in SECS)
> >> + * @length: length of memory (multiple of the page size)
> >> + * @secinfo: address for the SECINFO data containing the new permission bits
> >> + * for pages in range described by @offset and @length
> >> + * @result: (output) SGX result code of ENCLS[EMODPR] function
> >> + * @count: (output) bytes successfully changed (multiple of page size)
> >> + */
> >> +struct sgx_enclave_restrict_perm {
> >> + __u64 offset;
> >> + __u64 length;
> >> + __u64 secinfo;
> >> + __u64 result;
> >> + __u64 count;
> >> +};
> >> +
> >> struct sgx_enclave_run;
> >>
> >> /**
>
> ...
>
> >
> > Just a suggestion but these might be a bit less cluttered explanations of
> > the fields:
> >
> > /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RelaxPermissions {
> > /// In: starting page offset
> > offset: u64,
> > /// In: length of the address range (multiple of the page size)
> > length: u64,
> > /// In: SECINFO containing the relaxed permissions
> > secinfo: u64,
> > /// Out: length of the address range successfully changed
> > count: u64,
> > };
> >
> > /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RestrictPermissions {
> > /// In: starting page offset
> > offset: u64,
> > /// In: length of the address range (multiple of the page size)
> > length: u64,
> > /// In: SECINFO containing the restricted permissions
> > secinfo: u64,
> > /// In: ENCLU[EMODPR] return value
> > result: u64,
> > /// Out: length of the address range successfully changed
> > count: u64,
> > };
>
> In your proposal you shorten the descriptions from the current implementation.
> I do consider the removed information valuable since I believe that it helps
> users understand the kernel interface requirements without needing to be
> familiar with or dig into the kernel code to understand how the provided data
> is used.
>
> For example, you shorten offset to "starting page offset", but what was removed
> was the requirement that this offset has to be page aligned and what the offset
> is relative to. I do believe summarizing these requirements upfront helps
> a user space developer by not needing to dig through kernel code later
> in order to understand why an -EINVAL was received.
>
>
> > I can live with the current ones too but I rewrote them so that I can
> > quickly make sense of the fields later. It's Rust code but the point is
> > the documentation...
>
> Since you do seem to be ok with the current descriptions I would prefer
> to keep them.

Yeah, they are fine to me.

> > Also, it should not be too much trouble to use the struct in user space
> > code even if the struct names are struct sgx_enclave_relax_permissions and
> > struct sgx_enclave_restrict_permissions, given that you most likely have
> > exactly single call-site in the run-time.
>
> Are you requesting that I make the following name changes?
> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
>
> If so, do you want the function names also written out in this way?
> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

Yes, unless you have a specific reason to shorten them :-)

> > Other than that, looks quite good.
>
> Thank you very much for reviewing and testing this work.

NP

> Reinette

BR, Jarkko

2022-02-24 00:42:27

by Dhanraj, Vijay

Subject: RE: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi All,

Regarding the recent update of splitting the page permissions change request into two IOCTLS (RELAX and RESTRICT), can we combine them into one? That is, revert to how it was done in the v1 version?

Why? Currently in Gramine (a library OS for unmodified applications, https://gramineproject.io/) with the new proposed change, one needs to store the page permission for each page or range of pages. And for every request of `mmap` or `mprotect`, Gramine would have to do a lookup of the page permissions for the request range and then call the respective IOCTL either RESTRICT or RELAX. This seems a little overwhelming.

Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do an `EACCEPT` irrespective of RELAX or RESTRICT page permission request? With this approach, we can avoid storing page permissions and simplify the implementation.

I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows to do TLB shootdowns which might not be needed for RELAX IOCTL but I am not sure what will be the performance impact. Is there any data point to see the performance impact?

Thanks,
-Vijay

> -----Original Message-----
> From: Jarkko Sakkinen <[email protected]>
> Sent: Sunday, February 20, 2022 4:50 PM
> To: Reinette Chatre <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page
> permissions
>
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> > In the initial (SGX1) version of SGX, pages in an enclave need to be
> > created with permissions that support all usages of the pages, from
> > the time the enclave is initialized until it is unloaded. For example,
> > pages used by a JIT compiler or when code needs to otherwise be
> > relocated need to always have RWX permissions.
> >
> > SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
> > and can be used to restrict the EPCM permissions of regular enclave
> > pages within an initialized enclave.
> >
> > Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
> > restricting EPCM permissions. With this ioctl() the user specifies a
> > page range and the permissions to be applied to all pages in the
> > provided range. After checking the new permissions (more detail
> > below) the page table entries are reset and any new page table entries
> > will contain the new, restricted, permissions.
> > ENCLS[EMODPR] is run to restrict the EPCM permissions followed by the
> > ENCLS[ETRACK] flow that will ensure no cached linear-to-physical
> > address mappings to the changed pages remain.
> >
> > It is possible for the permission change request to fail on any page
> > within the provided range, either with an error encountered by the
> > kernel or by the SGX hardware while running ENCLS[EMODPR]. To support
> > partial success the ioctl() returns an error code based on failures
> > encountered by the kernel as well as two result output parameters: one
> > for the number of pages that were successfully changed and one for the
> > SGX return code.
> >
> > Checking user provided new permissions
> > ======================================
> >
> > Enclave page permission changes need to be approached with care and
> > for this reason permission changes are only allowed if the new
> > permissions are the same or more restrictive that the vetted
> > permissions. No additional checking is done to ensure that the
> > permissions are actually being restricted. This is because the enclave
> > may have relaxed the EPCM permissions from within the enclave without
> > letting the kernel know. An attempt to relax permissions using this
> > call will be ignored by the hardware.
> >
> > For example, together with the support for relaxing of EPCM
> > permissions, enclave pages added with the vetted permissions in
> > brackets below are allowed to have permissions as follows:
> > * (RWX) => RW => R => RX => RWX
> > * (RW) => R => RW
> > * (RX) => R => RX
> >
> > Signed-off-by: Reinette Chatre <[email protected]>
> > ---
> > Changes since V1:
> > - Change terminology to use "relax" instead of "extend" to refer to
> > the case when enclave page permissions are added (Dave).
> > - Use ioctl() in commit message (Dave).
> > - Add examples on what permissions would be allowed (Dave).
> > - Split enclave page permission changes into two ioctl()s, one for
> > permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
> > and one for permission relaxing
> (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
> > (Jarkko).
> > - In support of the ioctl() name change the following names have been
> > changed:
> > struct sgx_page_modp -> struct sgx_enclave_restrict_perm
> > sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
> > sgx_page_modp() -> sgx_enclave_restrict_perm()
> > - ioctl() takes entire secinfo as input instead of
> > page permissions only (Jarkko).
> > - Fix kernel-doc to include () in function name.
> > - Create and use utility for the ETRACK flow.
> > - Fixups in comments
> > - Move kernel-doc to function that provides documentation for
> > Documentation/x86/sgx.rst.
> > - Remove redundant comment.
> > - Make explicit which members of struct sgx_enclave_restrict_perm
> > are for output (Dave).
> >
> > arch/x86/include/uapi/asm/sgx.h | 21 +++
> > arch/x86/kernel/cpu/sgx/encl.c | 4 +-
> > arch/x86/kernel/cpu/sgx/encl.h | 3 +
> > arch/x86/kernel/cpu/sgx/ioctl.c | 229
> > ++++++++++++++++++++++++++++++++
> > 4 files changed, 255 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/uapi/asm/sgx.h
> > b/arch/x86/include/uapi/asm/sgx.h index 5c678b27bb72..b0ffb80bc67f
> > 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -31,6 +31,8 @@ enum sgx_page_flags {
> > _IO(SGX_MAGIC, 0x04)
> > #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> > _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> > +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> > + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >
> > /**
> > * struct sgx_enclave_create - parameter structure for the @@ -95,6
> > +97,25 @@ struct sgx_enclave_relax_perm {
> > __u64 count;
> > };
> >
> > +/**
> > + * struct sgx_enclave_restrict_perm - parameters for ioctl
> > + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > + * @offset: starting page offset (page aligned relative to enclave base
> > + * address defined in SECS)
> > + * @length: length of memory (multiple of the page size)
> > + * @secinfo: address for the SECINFO data containing the new permission
> bits
> > + * for pages in range described by @offset and @length
> > + * @result: (output) SGX result code of ENCLS[EMODPR] function
> > + * @count: (output) bytes successfully changed (multiple of page size)
> > + */
> > +struct sgx_enclave_restrict_perm {
> > + __u64 offset;
> > + __u64 length;
> > + __u64 secinfo;
> > + __u64 result;
> > + __u64 count;
> > +};
> > +
> > struct sgx_enclave_run;
> >
> > /**
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c index 8da813504249..a5d4a7efb986
> > 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct
> sgx_encl_page *encl_page,
> > return epc_page;
> > }
> >
> > -static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > - unsigned long addr)
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > + unsigned long addr)
> > {
> > struct sgx_epc_page *epc_page;
> > struct sgx_encl_page *entry;
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h
> > b/arch/x86/kernel/cpu/sgx/encl.h index cb9f16d457ac..848a28d28d3d
> > 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page
> *va_page,
> > unsigned int offset); bool sgx_va_page_full(struct sgx_va_page
> > *va_page); void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > + unsigned long addr);
> > +
> > #endif /* _X86_ENCL_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c
> > b/arch/x86/kernel/cpu/sgx/ioctl.c index 9cc6af404bf6..23bdf558b231
> > 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct
> sgx_encl *encl, void __user *arg)
> > return ret;
> > }
> >
> > +/*
> > + * Some SGX functions require that no cached linear-to-physical
> > +address
> > + * mappings are present before they can succeed. Collaborate with
> > + * hardware via ENCLS[ETRACK] to ensure that all cached
> > + * linear-to-physical address mappings belonging to all threads of
> > + * the enclave are cleared. See sgx_encl_cpumask() for details.
> > + */
> > +static int sgx_enclave_etrack(struct sgx_encl *encl) {
> > + void *epc_virt;
> > + int ret;
> > +
> > + epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
> > + ret = __etrack(epc_virt);
> > + if (ret) {
> > + /*
> > + * ETRACK only fails when there is an OS issue. For
> > + * example, two consecutive ETRACK was sent without
> > + * completed IPI between.
> > + */
> > + pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
> > + /*
> > + * Send IPIs to kick CPUs out of the enclave and
> > + * try ETRACK again.
> > + */
> > + on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb,
> NULL, 1);
> > + ret = __etrack(epc_virt);
> > + if (ret) {
> > + pr_err_once("ETRACK repeat returned %d (0x%x)",
> > + ret, ret);
> > + return -EFAULT;
> > + }
> > + }
> > + on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS
> view
> > + * @encl: Enclave to which the pages belong.
> > + * @modp: Checked parameters from user on which pages need
> modifying.
> > + * @secinfo_perm: New (validated) permission bits.
> > + *
> > + * Return:
> > + * - 0: Success.
> > + * - -errno: Otherwise.
> > + */
> > +static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
> > + struct sgx_enclave_restrict_perm *modp,
> > + u64 secinfo_perm)
> > +{
> > + unsigned long vm_prot, run_prot_restore;
> > + struct sgx_encl_page *entry;
> > + struct sgx_secinfo secinfo;
> > + unsigned long addr;
> > + unsigned long c;
> > + void *epc_virt;
> > + int ret;
> > +
> > + memset(&secinfo, 0, sizeof(secinfo));
> > + secinfo.flags = secinfo_perm;
> > +
> > + vm_prot = vm_prot_from_secinfo(secinfo_perm);
> > +
> > + for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> > + addr = encl->base + modp->offset + c;
> > +
> > + mutex_lock(&encl->lock);
> > +
> > + entry = sgx_encl_load_page(encl, addr);
> > + if (IS_ERR(entry)) {
> > + ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -
> EFAULT;
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * Changing EPCM permissions is only supported on regular
> > + * SGX pages. Attempting this change on other pages will
> > + * result in #PF.
> > + */
> > + if (entry->type != SGX_PAGE_TYPE_REG) {
> > + ret = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * Do not verify if current runtime protection bits are what
> > + * is being requested. The enclave may have relaxed EPCM
> > + * permissions calls without letting the kernel know and
> > + * thus permission restriction may still be needed even if
> > + * from the kernel's perspective the permissions are
> unchanged.
> > + */
> > +
> > + /* New permissions should never exceed vetted
> permissions. */
> > + if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > +
> > + /* Make sure page stays around while releasing mutex. */
> > + if (sgx_unmark_page_reclaimable(entry->epc_page)) {
> > + ret = -EAGAIN;
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * Change runtime protection before zapping PTEs to ensure
> > + * any new #PF uses new permissions. EPCM permissions (if
> > + * needed) not changed yet.
> > + */
> > + run_prot_restore = entry->vm_run_prot_bits;
> > + entry->vm_run_prot_bits = vm_prot;
> > +
> > + mutex_unlock(&encl->lock);
> > + /*
> > + * Do not keep encl->lock because of dependency on
> > + * mmap_lock acquired in sgx_zap_enclave_ptes().
> > + */
> > + sgx_zap_enclave_ptes(encl, addr);
> > +
> > + mutex_lock(&encl->lock);
> > +
> > + /* Change EPCM permissions. */
> > + epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
> > + ret = __emodpr(&secinfo, epc_virt);
> > + if (encls_faulted(ret)) {
> > + /*
> > + * All possible faults should be avoidable:
> > + * parameters have been checked, will only change
> > + * permissions of a regular page, and no concurrent
> > + * SGX1/SGX2 ENCLS instructions since these
> > + * are protected with mutex.
> > + */
> > + pr_err_once("EMODPR encountered exception
> %d\n",
> > + ENCLS_TRAPNR(ret));
> > + ret = -EFAULT;
> > + goto out_prot_restore;
> > + }
> > + if (encls_failed(ret)) {
> > + modp->result = ret;
> > + ret = -EFAULT;
> > + goto out_prot_restore;
> > + }
> > +
> > + ret = sgx_enclave_etrack(encl);
> > + if (ret) {
> > + ret = -EFAULT;
> > + goto out_reclaim;
> > + }
> > +
> > + sgx_mark_page_reclaimable(entry->epc_page);
> > + mutex_unlock(&encl->lock);
> > + }
> > +
> > + ret = 0;
> > + goto out;
> > +
> > +out_prot_restore:
> > + entry->vm_run_prot_bits = run_prot_restore;
> > +out_reclaim:
> > + sgx_mark_page_reclaimable(entry->epc_page);
> > +out_unlock:
> > + mutex_unlock(&encl->lock);
> > +out:
> > + modp->count = c;
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * sgx_ioc_enclave_restrict_perm() - handler for
> > + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > + * @encl: an enclave pointer
> > + * @arg: userspace pointer to a &struct sgx_enclave_restrict_perm
> > + * instance
> > + *
> > + * SGX2 distinguishes between relaxing and restricting the enclave
> > +page
> > + * permissions maintained by the hardware (EPCM permissions) of pages
> > + * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
> > + *
> > + * EPCM permissions cannot be restricted from within the enclave, the
> > +enclave
> > + * requires the kernel to run the privileged level 0 instructions
> > +ENCLS[EMODPR]
> > + * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this
> > +call
> > + * will be ignored by the hardware.
> > + *
> > + * Enclave page permissions are not allowed to exceed the maximum
> > +vetted
> > + * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
> > + *
> > + * Return:
> > + * - 0: Success
> > + * - -errno: Otherwise
> > + */
> > +static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
> > + void __user *arg)
> > +{
> > + struct sgx_enclave_restrict_perm params;
> > + u64 secinfo_perm;
> > + long ret;
> > +
> > + ret = sgx_ioc_sgx2_ready(encl);
> > + if (ret)
> > + return ret;
> > +
> > + if (copy_from_user(&params, arg, sizeof(params)))
> > + return -EFAULT;
> > +
> > + if (sgx_validate_offset_length(encl, params.offset, params.length))
> > + return -EINVAL;
> > +
> > + ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
> > + &secinfo_perm);
> > + if (ret)
> > + return ret;
> > +
> > + if (params.result || params.count)
> > + return -EINVAL;
> > +
> > + ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
> > +
> > + if (copy_to_user(arg, &params, sizeof(params)))
> > + return -EFAULT;
> > +
> > + return ret;
> > +}
> > +
> > long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long
> > arg) {
> > struct sgx_encl *encl = filep->private_data; @@ -918,6 +1144,9 @@
> > long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> > case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
> > ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
> > break;
> > + case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
> > + ret = sgx_ioc_enclave_restrict_perm(encl, (void __user
> *)arg);
> > + break;
> > default:
> > ret = -ENOIOCTLCMD;
> > break;
> > --
> > 2.25.1
> >
>
> Just a suggestion but these might be a bit less cluttered explanations of
> the fields:
>
> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RelaxPermissions {
> /// In: starting page offset
> offset: u64,
> /// In: length of the address range (multiple of the page size)
> length: u64,
> /// In: SECINFO containing the relaxed permissions
> secinfo: u64,
> /// Out: length of the address range successfully changed
> count: u64,
> };
>
> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RestrictPermissions {
> /// In: starting page offset
> offset: u64,
> /// In: length of the address range (multiple of the page size)
> length: u64,
> /// In: SECINFO containing the restricted permissions
> secinfo: u64,
> /// In: ENCLU[EMODPR] return value
> result: u64,
> /// Out: length of the address range successfully changed
> count: u64,
> };
>
> I can live with the current ones too but I rewrote them so that I can
> quickly make sense of the fields later. It's Rust code but the point is
> the documentation...
>
> Also, it should not be too much trouble to use the struct in user space
> code even if the struct names are struct sgx_enclave_relax_permissions and
> struct sgx_enclave_restrict_permissions, given that you most likely have
> exactly single call-site in the run-time.
>
> Other than that, looks quite good.
>
> BR, Jarkko

2022-02-24 01:04:22

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote:
> On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
>>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
>>
>> ...
>>
>>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>>>> index 5c678b27bb72..b0ffb80bc67f 100644
>>>> --- a/arch/x86/include/uapi/asm/sgx.h
>>>> +++ b/arch/x86/include/uapi/asm/sgx.h
>>>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>>>> _IO(SGX_MAGIC, 0x04)
>>>> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>>>> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
>>>> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>>>>
>>>> /**
>>>> * struct sgx_enclave_create - parameter structure for the
>>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>>>> __u64 count;
>>>> };
>>>>
>>>> +/**
>>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
>>>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>>>> + * @offset: starting page offset (page aligned relative to enclave base
>>>> + * address defined in SECS)
>>>> + * @length: length of memory (multiple of the page size)
>>>> + * @secinfo: address for the SECINFO data containing the new permission bits
>>>> + * for pages in range described by @offset and @length
>>>> + * @result: (output) SGX result code of ENCLS[EMODPR] function
>>>> + * @count: (output) bytes successfully changed (multiple of page size)
>>>> + */
>>>> +struct sgx_enclave_restrict_perm {
>>>> + __u64 offset;
>>>> + __u64 length;
>>>> + __u64 secinfo;
>>>> + __u64 result;
>>>> + __u64 count;
>>>> +};
>>>> +
>>>> struct sgx_enclave_run;
>>>>
>>>> /**
>>
>> ...
>>
>>>
>>> Just a suggestion but these might be a bit less cluttered explanations of
>>> the fields:
>>>
>>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
>>> #[repr(C)]
>>> pub struct RelaxPermissions {
>>> /// In: starting page offset
>>> offset: u64,
>>> /// In: length of the address range (multiple of the page size)
>>> length: u64,
>>> /// In: SECINFO containing the relaxed permissions
>>> secinfo: u64,
>>> /// Out: length of the address range successfully changed
>>> count: u64,
>>> };
>>>
>>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
>>> #[repr(C)]
>>> pub struct RestrictPermissions {
>>> /// In: starting page offset
>>> offset: u64,
>>> /// In: length of the address range (multiple of the page size)
>>> length: u64,
>>> /// In: SECINFO containing the restricted permissions
>>> secinfo: u64,
>>> /// In: ENCLU[EMODPR] return value
>>> result: u64,
>>> /// Out: length of the address range successfully changed
>>> count: u64,
>>> };
>>
>> In your proposal you shorten the descriptions from the current implementation.
>> I do consider the removed information valuable since I believe that it helps
>> users understand the kernel interface requirements without needing to be
>> familiar with or dig into the kernel code to understand how the provided data
>> is used.
>>
>> For example, you shorten offset to "starting page offset", but what was removed
>> was the requirement that this offset has to be page aligned and what the offset
>> is relative to. I do believe summarizing these requirements upfront helps
>> a user space developer by not needing to dig through kernel code later
>> in order to understand why an -EINVAL was received.
>>
>>
>>> I can live with the current ones too but I rewrote them so that I can
>>> quickly make sense of the fields later. It's Rust code but the point is
>>> the documentation...
>>
>> Since you do seem to be ok with the current descriptions I would prefer
>> to keep them.
>
> Yeah, they are fine to me.
>
>>> Also, it should not be too much trouble to use the struct in user space
>>> code even if the struct names are struct sgx_enclave_relax_permissions and
>>> struct sgx_enclave_restrict_permissions, given that you most likely have
>>> exactly single call-site in the run-time.
>>
>> Are you requesting that I make the following name changes?
>> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
>> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
>>
>> If so, do you want the function names also written out in this way?
>> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
>> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
>> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
>> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()
>
> Yes, unless you have a specific reason to shorten them :-)

Just aesthetic reasons ... having a long function name can look unbalanced
if it has many parameters and if the parameters themselves are long it
becomes hard to keep to the required line length.

Even so, it does look as though the longest ones can be made to work within 80
characters:
sgx_enclave_restrict_permissions(...
struct sgx_enclave_restrict_permissions *modp,
...)

Another (aesthetic) consequence would be that, for example, the core sgx_ioctl()
would now have some branches spanning more lines than others, so it would not look
as neat as it does now (this is subjective, I know).

Apart from the aesthetic reasons I do not have another reason not to make the
change and I will do so in the next version.

Reinette

2022-02-24 01:42:38

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Vijay,

On 2/23/2022 11:21 AM, Dhanraj, Vijay wrote:
> Hi All,
>
> Regarding the recent update of splitting the page permissions change request
> into two IOCTLS (RELAX and RESTRICT), can we combine them into one? That is,
> revert to how it was done in the v1 version?

While V1 did have a single ioctl() to handle both relaxing and restricting
permissions it never was possible for the kernel to distinguish what the
user intended. For this reason, even though there was a single ioctl() in V1,
it implemented permission restriction while supporting permission
relaxing as a side effect since the PTEs are flushed and new PTEs will
support the new permission. A consequence was that the V1 SGX_IOC_PAGE_MODP
required ENCLU[EACCEPT] from within the enclave even if it was only intended
to be used to relax permissions. SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS in
V2 is exactly the same as SGX_IOC_PAGE_MODP of V1.

>
> Why? Currently in Gramine (a library OS for unmodified applications,
> https://gramineproject.io/) with the new proposed change, one needs
> to store the page permission for each page or range of pages. And for
> every request of `mmap` or `mprotect`, Gramine would have to do a lookup
> of the page permissions for the request range and then call the respective
> IOCTL either RESTRICT or RELAX. This seems a little overwhelming.

Gramine would also need to know when to enter the enclave to run EMODPE, which
goes hand in hand with running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
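
To make the bookkeeping under discussion concrete, the decision a runtime
would have to make per range could be sketched as below. This is a rough
illustration only: ex_classify() and the EX_* names are hypothetical, and it
assumes the runtime tracks the current permissions using the SECINFO
permission bit layout (R = bit 0, W = bit 1, X = bit 2).

```c
#include <assert.h>
#include <stdint.h>

/* Permission bits as laid out in SECINFO.FLAGS. */
#define EX_PERM_R 0x1ULL
#define EX_PERM_W 0x2ULL
#define EX_PERM_X 0x4ULL
#define EX_PERM_MASK (EX_PERM_R | EX_PERM_W | EX_PERM_X)

enum ex_perm_action {
	EX_PERM_NONE,     /* requested permissions already in place */
	EX_PERM_RELAX,    /* EMODPE in enclave + SGX_IOC_ENCLAVE_RELAX_PERMISSIONS */
	EX_PERM_RESTRICT, /* SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS + EACCEPT */
	EX_PERM_BOTH,     /* some bits are added while others are removed */
};

/* Hypothetical helper: compare tracked and requested permissions. */
static enum ex_perm_action ex_classify(uint64_t cur, uint64_t req)
{
	uint64_t added = req & ~cur;   /* bits that need relaxing */
	uint64_t removed = cur & ~req; /* bits that need restricting */

	assert(!(cur & ~EX_PERM_MASK) && !(req & ~EX_PERM_MASK));

	if (added && removed)
		return EX_PERM_BOTH;
	if (added)
		return EX_PERM_RELAX;
	if (removed)
		return EX_PERM_RESTRICT;
	return EX_PERM_NONE;
}
```

For example, going from RW to R classifies as EX_PERM_RESTRICT, while going
from R to RW classifies as EX_PERM_RELAX and would not need the EMODPR,
ETRACK and EACCEPT steps.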

>
> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do an
> `EACCEPT` irrespective of RELAX or RESTRICT page permission request? With this
> approach, we can avoid storing page permissions and simplify the implementation.

This should be possible with the current implementation, similar to the previous
implementation, but it is not optimal if EMODPE followed by
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is all that is needed.

>
> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows to do
> TLB shootdowns which might not be needed for RELAX IOCTL but I am not sure what
> will be the performance impact. Is there any data point to see the performance impact?

It can be worse than just that. EMODPR requires the EPC page to be present
and thus the page would need to be loaded from swap and decrypted if it
is not present. This may also mean that existing EPC pages need to be
swapped out (first blocked, then encrypted to backing storage, then the
ETRACK flow followed by IPIs to ensure there are no more references to that
page) ... before there is space available for the needed page to be loaded and
decrypted.

That only takes care of the EMODPR ... which as you state needs
to be followed by the ETRACK flow and IPIs.

The above is also just for the OS portion - after that there is the
EACCEPT that needs to be run from within the enclave for every page whether
permissions were relaxed or restricted. This would be dependent on the
implementation - whether the enclave is entered once per EACCEPT or once
for all EACCEPTs.

All of the above would be unnecessary if permissions were just relaxed from
within the enclave, with SGX_IOC_ENCLAVE_RELAX_PERMISSIONS used to
perform the OS actions.

The performance impact should be easy to determine: run both ioctl()s
and compare how long they take. Since you are asking about Gramine this may be
best to do in that environment but I can attempt something on your behalf by
using the existing SGX selftest infrastructure.

As an experiment I modified the existing "unclobbered_vdso_oversubscribed_remove"
test case that currently runs the SGX_IOC_ENCLAVE_MODIFY_TYPE on a large memory
region to instead run ioctl()s SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS. In my test I ran these ioctl()s on a 4GB
memory range to amplify any performance impact since I was just measuring it
by printing timestamps from user space.

My results showed that:
* Running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS on the 4GB region took less than a second.
  No EACCEPT is needed from user space.

* Running SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS on the 4GB region took about 20 seconds.
* Running EACCEPT on each enclave page took an additional 20 seconds. (Please note that
  this uses a suboptimal way of entering the enclave for each EACCEPT; it
  would be more efficient to enter the enclave once and run EACCEPT for each page without
  exiting the enclave.)

The performance impact seems significant to me.
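
For anyone who wants to repeat this kind of user-space timestamp measurement,
the following is a minimal sketch of a timing helper. The timed_op_fn callback
type and time_operation() are hypothetical names introduced here; the callback
is assumed to wrap whatever is being measured (for example one ioctl() call
over the whole region) and is not taken from the SGX selftests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Hypothetical callback type: wraps the operation under test, e.g. a single
 * ioctl() on the enclave fd. Returns 0 on success.
 */
typedef int (*timed_op_fn)(void *arg);

/* Run op once, store its duration in *seconds and return op's result. */
int time_operation(timed_op_fn op, void *arg, double *seconds)
{
	struct timespec start, end;
	int ret;

	assert(op != NULL && seconds != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = op(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);

	*seconds = elapsed_seconds(&start, &end);
	return ret;
}
```

For scale, and assuming 4 KiB enclave pages, a 4GB region is 1,048,576 pages,
so about 20 seconds for the whole region works out to roughly 19 microseconds
per page.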

Reinette

2022-02-28 13:16:20

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Feb 23, 2022 at 11:55:03AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote:
> > On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> >>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> >>
> >> ...
> >>
> >>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >>>> index 5c678b27bb72..b0ffb80bc67f 100644
> >>>> --- a/arch/x86/include/uapi/asm/sgx.h
> >>>> +++ b/arch/x86/include/uapi/asm/sgx.h
> >>>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >>>> _IO(SGX_MAGIC, 0x04)
> >>>> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >>>> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> >>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> >>>> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >>>>
> >>>> /**
> >>>> * struct sgx_enclave_create - parameter structure for the
> >>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >>>> __u64 count;
> >>>> };
> >>>>
> >>>> +/**
> >>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> >>>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >>>> + * @offset: starting page offset (page aligned relative to enclave base
> >>>> + * address defined in SECS)
> >>>> + * @length: length of memory (multiple of the page size)
> >>>> + * @secinfo: address for the SECINFO data containing the new permission bits
> >>>> + * for pages in range described by @offset and @length
> >>>> + * @result: (output) SGX result code of ENCLS[EMODPR] function
> >>>> + * @count: (output) bytes successfully changed (multiple of page size)
> >>>> + */
> >>>> +struct sgx_enclave_restrict_perm {
> >>>> + __u64 offset;
> >>>> + __u64 length;
> >>>> + __u64 secinfo;
> >>>> + __u64 result;
> >>>> + __u64 count;
> >>>> +};
> >>>> +
> >>>> struct sgx_enclave_run;
> >>>>
> >>>> /**
> >>
> >> ...
> >>
> >>>
> >>> Just a suggestion but these might be a bit less cluttered explanations of
> >>> the fields:
> >>>
> >>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> >>> #[repr(C)]
> >>> pub struct RelaxPermissions {
> >>> /// In: starting page offset
> >>> offset: u64,
> >>> /// In: length of the address range (multiple of the page size)
> >>> length: u64,
> >>> /// In: SECINFO containing the relaxed permissions
> >>> secinfo: u64,
> >>> /// Out: length of the address range successfully changed
> >>> count: u64,
> >>> };
> >>>
> >>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> >>> #[repr(C)]
> >>> pub struct RestrictPermissions {
> >>> /// In: starting page offset
> >>> offset: u64,
> >>> /// In: length of the address range (multiple of the page size)
> >>> length: u64,
> >>> /// In: SECINFO containing the restricted permissions
> >>> secinfo: u64,
> >>> /// Out: ENCLS[EMODPR] return value
> >>> result: u64,
> >>> /// Out: length of the address range successfully changed
> >>> count: u64,
> >>> };
> >>
> >> In your proposal you shorten the descriptions from the current implementation.
> >> I do consider the removed information valuable since I believe that it helps
> >> users understand the kernel interface requirements without needing to be
> >> familiar with or dig into the kernel code to understand how the provided data
> >> is used.
> >>
> >> For example, you shorten offset to "starting page offset", but what was removed
> >> was the requirement that this offset has to be page aligned and what the offset
> >> is relative to. I do believe summarizing these requirements upfront helps
> >> a user space developer by not needing to dig through kernel code later
> >> in order to understand why an -EINVAL was received.
> >>
> >>
> >>> I can live with the current ones too but I rewrote them so that I can
> >>> quickly make sense of the fields later. It's Rust code but the point is
> >>> the documentation...
> >>
> >> Since you do seem to be ok with the current descriptions I would prefer
> >> to keep them.
> >
> > Yeah, they are fine to me.
> >
> >>> Also, it should not be too much trouble to use the struct in user space
> >>> code even if the struct names are struct sgx_enclave_relax_permissions and
> >>> struct sgx_enclave_restrict_permissions, given that you most likely have
> >>> exactly single call-site in the run-time.
> >>
> >> Are you requesting that I make the following name changes?
> >> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
> >> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
> >>
> >> If so, do you want the function names also written out in this way?
> >> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
> >> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
> >> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
> >> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()
> >
> > Yes, unless you have a specific reason to shorten them :-)
>
> Just aesthetic reasons ... having a long function name can look unbalanced
> if it has many parameters and if the parameters themselves are long it
> becomes hard to keep to the required line length.
>
> Even so, it does look as though the longest ones can be made to work within 80
> characters:
> sgx_enclave_restrict_permissions(...
> struct sgx_enclave_restrict_permissions *modp,
> ...)
>
> Other (aesthetic) consequence would be, for example, the core sgx_ioctl() would
> now have some branches span more lines than other so it would not look as neat as
> now (this is subjective I know).
>
> Apart from the aesthetic reasons I do not have another reason not to make the
> change and I will do so in the next version.

IMHO, for a single call site, aesthetic reasons such as alignment are less
important than a no-brainer function name.

BR, Jarkko

2022-02-28 13:31:32

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> Hi All,
>
> Regarding the recent update of splitting the page permissions change
> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> one? That is, revert to how it was done in the v1 version?

They are logically separate complex functionalities:

1. "restrict" calls EMODPR and requires EACCEPT
2. "relax" increases permissions up to vetted ("EADD") and could be
combined with EMODPE called inside enclave.

I don't think it is a good idea.

BR, Jarkko

2022-02-28 17:36:03

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Feb 28, 2022 at 01:25:07PM +0100, Jarkko Sakkinen wrote:
> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > Hi All,
> >
> > Regarding the recent update of splitting the page permissions change
> > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > one? That is, revert to how it was done in the v1 version?
>
> They are logically separate complex functionalities:
>
> 1. "restrict" calls EMODPR and requires EACCEPT
> 2. "relax" increases permissions up to vetted ("EADD") and could be
> combined with EMODPE called inside enclave.
>
> I don't think it is a good idea.

I.e. in the microarchitecture there is no EMODP but two different flows,
and thus it is not sane to act as if there were one with that kind of ioctl.
This way the interface is as granular as the hardware is, and I think that is
common sense.

Combining them would make about as much sense as combining ECREATE/EADD/EINIT
into a single multi-function ioctl. User space often needs to have at least
some logically distinct flows for these anyway.

BR, Jarkko

2022-02-28 18:06:36

by Dave Hansen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On 2/28/22 04:24, Jarkko Sakkinen wrote:
>> Regarding the recent update of splitting the page permissions change
>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>> one? That is, revert to how it was done in the v1 version?
> They are logically separate complex functionalities:
>
> 1. "restrict" calls EMODPR and requires EACCEPT
> 2. "relax" increases permissions up to vetted ("EADD") and could be
> combined with EMODPE called inside enclave.

It would be great to have a _slightly_ better justification than that.
Existing permission interfaces like chmod or mprotect() don't have this
asymmetry.

I think you're saying that the underlying hardware implementation is
asymmetric, so the interface should be too. I don't find that argument
very convincing. If the hardware interface is arcane and we can make it
look more sane in the ioctl() layer, we should do that, asymmetry or not.

If we can't make it any more sane, let's say why the ioctl() must or
should be asymmetric.

The SGX2 page permission mechanism is horribly counter intuitive.
*Everybody* that looks at it thinks that it's wrong. That means that we
have a lot of work ahead of us to explain the interfaces that get
layered on top.

2022-02-28 18:47:25

by Dhanraj, Vijay

Subject: RE: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

> On 2/28/22 04:24, Jarkko Sakkinen wrote:
> >> Regarding the recent update of splitting the page permissions change
> >> request into two IOCTLS (RELAX and RESTRICT), can we combine them
> >> into one? That is, revert to how it was done in the v1 version?
> > They are logically separate complex functionalities:
> >
> > 1. "restrict" calls EMODPR and requires EACCEPT 2. "relax" increases
> > permissions up to vetted ("EADD") and could be
> > combined with EMODPE called inside enclave.
>
> It would be great to have a _slightly_ better justification than that.
> Existing permission interfaces like chmod or mprotect() don't have this
> asymmetry.
>
> I think you're saying that the underlying hardware implementation is
> asymmetric, so the interface should be too. I don't find that argument very
> convincing. If the hardware interface is arcane and we can make it look more
> sane in the ioctl() layer, we should that, asymmetry or not.
>

Very nice analogy with `mprotect`, and I agree with this. It would be simpler from
the user space point of view if we could abstract this and maintain a single interface
to relax or restrict permissions. But if the committee feels that having two IOCTLs is
the way to go, then we will modify Gramine to adopt this approach.


2022-03-01 13:58:23

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Tue, Mar 01, 2022 at 02:26:48PM +0100, Jarkko Sakkinen wrote:
> On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote:
> > On 2/28/22 04:24, Jarkko Sakkinen wrote:
> > >> Regarding the recent update of splitting the page permissions change
> > >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > >> one? That is, revert to how it was done in the v1 version?
> > > They are logically separate complex functionalities:
> > >
> > > 1. "restrict" calls EMODPR and requires EACCEPT
> > > 2. "relax" increases permissions up to vetted ("EADD") and could be
> > > combined with EMODPE called inside enclave.
> >
> > It would be great to have a _slightly_ better justification than that.
> > Existing permission interfaces like chmod or mprotect() don't have this
> > asymmetry.
> >
> > I think you're saying that the underlying hardware implementation is
> > asymmetric, so the interface should be too. I don't find that argument
> > very convincing. If the hardware interface is arcane and we can make it
> > look more sane in the ioctl() layer, we should that, asymmetry or not.
>
> That is my argument, yes.
>
> > If we can't make it any more sane, let's say why the ioctl() must or
> > should be asymmetric.
>
> Perhaps underling this asymmetry in kdoc would be enough.
>
> > The SGX2 page permission mechanism is horribly counter intuitive.
> > *Everybody* that looks at it thinks that it's wrong. That means that we
> > have a lot of work ahead of us to explain the interfaces that get
> > layered on top.
>
> I fully agree on this :-)
>
> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> obviously new RX pages are now out of the picture:
>
>
> /*
> * Adding a regular page that is architecturally allowed to only
> * be created with RW permissions.
> * TBD: Interface with user space policy to support max permissions
> * of RWX.
> */
> prot = PROT_READ | PROT_WRITE;
> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>
> If that TBD is left out to the final version the page augmentation has a
> risk of a API bottleneck, and that risk can realize then also in the page
> permission ioctls.
>
> I.e. now any review comment is based on not fully known territory, we have
> one known unknown, and some unknown unknowns from unpredictable effect to
> future API changes.

I think the best way to move forward would be to do EAUGs explicitly with
an ioctl that could also include secinfo for permissions. Then you can
easily do the rest with EACCEPTCOPY inside the enclave.

Putting EAUG in the #PF handler and calling it implicitly is just too flaky and
hard to make deterministic for e.g. a JIT compiler in our use case (not to
mention that JIT is not possible at all because of the inability to create RX pages).

BR, Jarkko

2022-03-01 19:48:18

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>> obviously new RX pages are now out of the picture:
>>
>>
>> /*
>> * Adding a regular page that is architecturally allowed to only
>> * be created with RW permissions.
>> * TBD: Interface with user space policy to support max permissions
>> * of RWX.
>> */
>> prot = PROT_READ | PROT_WRITE;
>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>
>> If that TBD is left out to the final version the page augmentation has a
>> risk of a API bottleneck, and that risk can realize then also in the page
>> permission ioctls.
>>
>> I.e. now any review comment is based on not fully known territory, we have
>> one known unknown, and some unknown unknowns from unpredictable effect to
>> future API changes.

The plan to complete the "TBD" in the above snippet was to follow this work
with user policy integration at this location. On a high level the plan was
for this to look something like:


	/*
	 * Adding a regular page that is architecturally allowed to only
	 * be created with RW permissions.
	 * Interface with user space policy to support max permissions
	 * of RWX.
	 */
	prot = PROT_READ | PROT_WRITE;
	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);

	if (user space policy allows RWX on dynamically added pages)
		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
	else
		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);

The work that follows this series aimed to do the integration with user
space policy.
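
To make the intent of the pseudocode above a little more concrete, here is a
small user-space sketch of the same decision. It is only an illustration under
stated assumptions: struct augmented_page_prot, calc_augmented_prot() and
prot_within_max() are hypothetical names, the policy is reduced to a single
boolean, and plain PROT_* bits are used instead of the VM_* flags that
calc_vm_prot_bits() produces in the kernel.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the two per-page fields in the snippet above. */
struct augmented_page_prot {
	int run_prot;	/* permissions the page starts out with */
	int max_prot;	/* ceiling for any later permission change */
};

/*
 * A dynamically added page always starts as RW. The maximum it may later
 * reach depends on whether the (assumed) user space policy allows RWX.
 */
struct augmented_page_prot calc_augmented_prot(bool policy_allows_rwx)
{
	struct augmented_page_prot p;

	p.run_prot = PROT_READ | PROT_WRITE;
	if (policy_allows_rwx)
		p.max_prot = PROT_READ | PROT_WRITE | PROT_EXEC;
	else
		p.max_prot = PROT_READ | PROT_WRITE;
	return p;
}

/* A requested protection is acceptable only if it adds nothing beyond max. */
bool prot_within_max(int requested, int max_prot)
{
	return (requested & ~max_prot) == 0;
}
```

With the policy flag cleared, a later request for RX on such a page would be
rejected by prot_within_max(); with the flag set it would be accepted.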

> I think the best way to move forward would be to do EAUG's explicitly with
> an ioctl that could also include secinfo for permissions. Then you can
> easily do the rest with EACCEPTCOPY inside the enclave.

SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
this purpose. It already includes SECINFO which may also be useful if
needing to later support EAUG of PT_SS* pages.

How this could work is that user space calls SGX_IOC_ENCLAVE_ADD_PAGES
after enclave initialization on any memory region within the enclave where
pages are planned to be added dynamically. This ioctl() calls EAUG to add the
new pages with RW permissions and their vm_max_prot_bits can be set to the
permissions found in the included SECINFO. This will support later EACCEPTCOPY
as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.

The big question is whether communicating user policy after enclave initialization
via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
appreciate a confirmation on this direction considering the significant history
behind this topic.

> Putting EAUG to the #PF handler and implicitly call it just too flakky and
> hard to make deterministic for e.g. JIT compiler in our use case (not to
> mention that JIT is not possible at all because inability to do RX pages).

In this series this is indeed not possible because it lacks the user policy
integration. JIT will be possible after user policy integration.

Reinette

2022-03-02 02:52:04

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote:
> On 2/28/22 04:24, Jarkko Sakkinen wrote:
> >> Regarding the recent update of splitting the page permissions change
> >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> >> one? That is, revert to how it was done in the v1 version?
> > They are logically separate complex functionalities:
> >
> > 1. "restrict" calls EMODPR and requires EACCEPT
> > 2. "relax" increases permissions up to vetted ("EADD") and could be
> > combined with EMODPE called inside enclave.
>
> It would be great to have a _slightly_ better justification than that.
> Existing permission interfaces like chmod or mprotect() don't have this
> asymmetry.
>
> I think you're saying that the underlying hardware implementation is
> asymmetric, so the interface should be too. I don't find that argument
> very convincing. If the hardware interface is arcane and we can make it
> look more sane in the ioctl() layer, we should that, asymmetry or not.

That is my argument, yes.

> If we can't make it any more sane, let's say why the ioctl() must or
> should be asymmetric.

Perhaps underlining this asymmetry in kdoc would be enough.

> The SGX2 page permission mechanism is horribly counter intuitive.
> *Everybody* that looks at it thinks that it's wrong. That means that we
> have a lot of work ahead of us to explain the interfaces that get
> layered on top.

I fully agree on this :-)

With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
obviously new RX pages are now out of the picture:


	/*
	 * Adding a regular page that is architecturally allowed to only
	 * be created with RW permissions.
	 * TBD: Interface with user space policy to support max permissions
	 * of RWX.
	 */
	prot = PROT_READ | PROT_WRITE;
	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;

If that TBD is left in the final version, page augmentation risks becoming
an API bottleneck, and that risk can then also materialize in the page
permission ioctls.

I.e. any review comment is now based on not fully known territory: we have
one known unknown, and some unknown unknowns from the unpredictable effect on
future API changes.

BR, Jarkko

2022-03-02 03:43:15

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> >> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
> >> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> >> obviously new RX pages are now out of the picture:
> >>
> >>
> >> /*
> >> * Adding a regular page that is architecturally allowed to only
> >> * be created with RW permissions.
> >> * TBD: Interface with user space policy to support max permissions
> >> * of RWX.
> >> */
> >> prot = PROT_READ | PROT_WRITE;
> >> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> >> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> >>
> >> If that TBD is left out to the final version the page augmentation has a
> >> risk of a API bottleneck, and that risk can realize then also in the page
> >> permission ioctls.
> >>
> >> I.e. now any review comment is based on not fully known territory, we have
> >> one known unknown, and some unknown unknowns from unpredictable effect to
> >> future API changes.
>
> The plan to complete the "TBD" in the above snippet was to follow this work
> with user policy integration at this location. On a high level the plan was
> for this to look something like:
>
>
> /*
> * Adding a regular page that is architecturally allowed to only
> * be created with RW permissions.
> * Interface with user space policy to support max permissions
> * of RWX.
> */
> prot = PROT_READ | PROT_WRITE;
> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>
> if (user space policy allows RWX on dynamically added pages)
> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
> else
> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
>
> The work that follows this series aimed to do the integration with user
> space policy.

What exactly do you mean by "user space policy" anyway? I'm sorry but I
just don't fully understand this.

It's too big of a risk to accept this series without X taken care of. A patch
series should have neither TODO nor TBD comments IMHO. I don't want to ack
a series based on speculation about what might happen in the future.

> > I think the best way to move forward would be to do EAUG's explicitly with
> > an ioctl that could also include secinfo for permissions. Then you can
> > easily do the rest with EACCEPTCOPY inside the enclave.
>
> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> this purpose. It already includes SECINFO which may also be useful if
> needing to later support EAUG of PT_SS* pages.

You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.

And if there is a plan to extend SGX_IOC_ENCLAVE_ADD_PAGES, what is this weird
thing added to the #PF handler? Why is it added at all then?

> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> after enclave initialization on any memory region within the enclave where
> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
> new pages with RW permissions and their vm_max_prot_bits can be set to the
> permissions found in the included SECINFO. This will support later EACCEPTCOPY
> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS

I don't like this type of re-use of the existing API.

> The big question is whether communicating user policy after enclave initialization
> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
> appreciate a confirmation on this direction considering the significant history
> behind this topic.

I have no idea because I don't know what user space policy is.

> > Putting EAUG to the #PF handler and implicitly call it just too flakky and
> > hard to make deterministic for e.g. JIT compiler in our use case (not to
> > mention that JIT is not possible at all because inability to do RX pages).
>
> In this series this is indeed not possible because it lacks the user policy
> integration. JIT will be possible after user policy integration.

As it stands, I don't see what this series can be used for in practice.

The majority of practical use cases for EDMM boil down to having a way to add
new executable code (not just Enarx).

> Reinette

BR, Jarkko

2022-03-02 03:46:31

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > The work that follows this series aimed to do the integration with user
> > space policy.
>
> What do you mean by "user space policy" anyway exactly? I'm sorry but I
> just don't fully understand this.
>
> It's too big of a risk to accept this series without X taken care of. Patch
> series should neither have TODO nor TBD comments IMHO. I don't want to ack
> a series based on speculation what might happen in the future.

If I accept this, then I'm kind of pre-acking code when I have no idea what
it looks like, whether it can be acked, or whether I am doing the right thing
for the kernel by acking this.

It's unfortunately a force majeure situation for me. I simply could not ack
this, whether I wanted to or not.

BR, Jarkko

2022-03-02 16:37:51

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Mar 02, 2022 at 03:11:06AM +0100, Jarkko Sakkinen wrote:
> On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > > The work that follows this series aimed to do the integration with user
> > > space policy.
> >
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > just don't fully understand this.
> >
> > It's too big of a risk to accept this series without X taken care of. Patch
> > series should neither have TODO nor TBD comments IMHO. I don't want to ack
> > a series based on speculation what might happen in the future.
>
> If I accept this, then I'm kind of pre-acking code that I have no idea what
> it looks like, can it be acked, or am I doing the right thing for the
> kernel by acking this.
>
> It's unfortunately force majeure situation for me. I simply could not ack
> this, whether I want it or not.

I'd actually leave the permission change madness completely out of this
patch set, as we all know it is a crazy beast of microarchitecture. For
user space, having that is less critical than having executable pages.

With EAUG/EACCEPTCOPY alone you can already populate an enclave with any
permissions you had in mind. Augmenting alone would be a logically consistent
patch set that is actually usable for many workloads.

Right now there is half-broken augmenting (this is even written down in the TBD
comment) and complex code for EMODPR and EMODT that is usable only for
kselftests and not much else until there is fully working augmenting.

This way we get an actually sound patch set that is easy to review and apply
to the mainline. It is also far easier for you to iterate on a smaller
set of patches.

After this it is much easier to start looking at the remaining functionality,
and at the same time the augmenting part can be stress tested with real-world
code, and it will mature quickly.

This whole thing *really* needs a serious U-turn in how it is delivered
upstream. Sometimes it is better just to admit that this didn't start
off on the right foot.

BR, Jarkko

2022-03-03 00:49:45

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
>>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>>>> obviously new RX pages are now out of the picture:
>>>>
>>>>
>>>> /*
>>>> * Adding a regular page that is architecturally allowed to only
>>>> * be created with RW permissions.
>>>> * TBD: Interface with user space policy to support max permissions
>>>> * of RWX.
>>>> */
>>>> prot = PROT_READ | PROT_WRITE;
>>>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>>>
>>>> If that TBD is left out to the final version the page augmentation has a
>>>> risk of a API bottleneck, and that risk can realize then also in the page
>>>> permission ioctls.
>>>>
>>>> I.e. now any review comment is based on not fully known territory, we have
>>>> one known unknown, and some unknown unknowns from unpredictable effect to
>>>> future API changes.
>>
>> The plan to complete the "TBD" in the above snippet was to follow this work
>> with user policy integration at this location. On a high level the plan was
>> for this to look something like:
>>
>>
>> /*
>> * Adding a regular page that is architecturally allowed to only
>> * be created with RW permissions.
>> * Interface with user space policy to support max permissions
>> * of RWX.
>> */
>> prot = PROT_READ | PROT_WRITE;
>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>
>> if (user space policy allows RWX on dynamically added pages)
>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
>> else
>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
>>
>> The work that follows this series aimed to do the integration with user
>> space policy.
>
> What do you mean by "user space policy" anyway exactly? I'm sorry but I
> just don't fully understand this.

My apologies - I just assumed that you would need no reminder about this contentious
part of SGX history. Essentially it means that, yes, the kernel could theoretically
permit any kind of access to any file/page, but some accesses are known to generally
be a bad idea - like making memory executable as well as writable - and thus there
are additional checks based on what user space permits before the kernel allows
such accesses.

For example,
mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()

User policy and SGX has seen significant discussion. Some notable threads:
https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
https://lore.kernel.org/linux-security-module/[email protected]/

> It's too big of a risk to accept this series without X taken care of. Patch
> series should neither have TODO nor TBD comments IMHO. I don't want to ack
> a series based on speculation what might happen in the future.

ok

>
>>> I think the best way to move forward would be to do EAUG's explicitly with
>>> an ioctl that could also include secinfo for permissions. Then you can
>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>
>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>> this purpose. It already includes SECINFO which may also be useful if
>> needing to later support EAUG of PT_SS* pages.
>
> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.

I could, yes.

> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird
> thing added to the #PF handler? Why is it added at all then?

I was just speculating in my response, there is no plan to extend
SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).

>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>> after enclave initialization on any memory region within the enclave where
>> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
>> new pages with RW permissions and their vm_max_prot_bits can be set to the
>> permissions found in the included SECINFO. This will support later EACCEPTCOPY
>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>
> I don't like this type of re-use of the existing API.

I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
considering the user policy question (above) and performance trade-off (more below).

>
>> The big question is whether communicating user policy after enclave initialization
>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>> appreciate a confirmation on this direction considering the significant history
>> behind this topic.
>
> I have no idea because I don't know what is user space policy.

This discussion is about some enclave usages needing RWX permissions
on dynamically added enclave pages. RWX permissions on dynamically added pages are
not something that should blindly be allowed for all SGX enclaves; instead the user
needs to explicitly allow specific enclaves to have such an ability. This is equivalent
to (but not the same as) what exists in Linux today with LSM. As seen in
mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect(), Linux is able to make
files and memory both writable and executable, but it would only do so for those
files and memory that the LSM (which is how user policy is communicated, like SELinux)
indicates are allowed, not blindly for all files and all memory.
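
As a rough illustration of the gating described above (a sketch only: the
helper eaug_max_prot() and its policy_allows_rwx argument are hypothetical
and not part of this series, and plain PROT_* values stand in for the VM_*
bits that calc_vm_prot_bits() would produce):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * Hypothetical helper: returns the maximum protection a dynamically
 * added (EAUG) page may ever be given. policy_allows_rwx stands in for
 * whatever answer the user policy mechanism (an LSM, for example) gives.
 */
static unsigned long eaug_max_prot(bool policy_allows_rwx)
{
	/* EAUG pages can architecturally only be created with RW. */
	unsigned long prot = PROT_READ | PROT_WRITE;

	/* Executable is added to the maximum only when policy permits it. */
	if (policy_allows_rwx)
		prot |= PROT_EXEC;

	/* The RW creation permissions must always remain permitted. */
	assert((prot & (PROT_READ | PROT_WRITE)) == (PROT_READ | PROT_WRITE));
	return prot;
}
```

The point of the sketch is only that the default stays RW and that the
executable bit is an explicit opt-in decided outside the enclave.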

>>> Putting EAUG to the #PF handler and implicitly call it just too flakky and
>>> hard to make deterministic for e.g. JIT compiler in our use case (not to
>>> mention that JIT is not possible at all because inability to do RX pages).

I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic, but as
far as I can tell it would have a performance impact since it would require that all
memory that may be needed by the enclave be pre-allocated from outside the enclave, and
not just dynamically allocated from within the enclave at the time it is needed.

Would such a performance impact be acceptable?

>> In this series this is indeed not possible because it lacks the user policy
>> integration. JIT will be possible after user policy integration.
>
> Like this I don't what this series can be used in practice.
>
> Majority of practical use cases for EDMM boil down to having a way to add
> new executable code (not just Enarx).
>

Understood.

On 3/1/2022 8:03 PM, Jarkko Sakkinen wrote:
> I'd actually to leave out permission change madness completely out of this
> patch set, as we all know it is a grazy beast of microarchitecture. For
> user space having that is less critical than having executable pages.
>
> Simply with EAUG/EACCEPTCOPY you can already populate enclave with any
> permissions you had in mind. Augmenting alone would be logically consistent
> patch set that is actually usable for many workloads.

Support for permission changes is required in order for dynamically added
pages (EAUG pages) to be made executable. Yes, you could give
a dynamically added page executable EPCM permissions using EACCEPTCOPY,
but the kernel is still required to make the PTE executable.

> Now there is half-broken augmenting (this is even writtend down to the TBD
> comment) and complex code for EMODPR and EMODT that is usable only for
> kselftests and not much else before there is fully working augmenting.
>
> This way we get actually sound patch set that is easy to review and apply
> to the mainline. It is also factors easier for you to iterate a smaller
> set of patches.
>
> After this it is so much easier to start to look at remaining functionality,
> and at the same time augmenting part can be stress tested with real-world
> code and it will mature quickly.
>
> This whole thing *really* needs a serious U-turn on how it is delivered to
> the upstream. Sometimes it is better just to admit that this didn't start
> with the right foot.

As mentioned above, from what I understand the support for (as you state) the
"majority of practical use cases" on dynamically added pages does require
supporting permission changes also. It thus seems to me that it would help
in consuming this feature if dynamic addition of pages and permission changes
are presented together. The SGX2 functionality that remains after that is the
changing of page type, which forms part of the page removal flow. In this
regard I also find that presenting the page addition flow at the same time
as the page removal flow would make these features easier to consume. I
think supporting the addition of pages and leaving page removal to
"future work" would be similarly frustrating to consume.

Reinette

2022-03-03 17:11:11

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi all,

On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
<[email protected]> wrote:

> Hi Jarkko,
>
> On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>>> Hi Jarkko,
>>>
>>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>>>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version
>>>>> of
>>>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>>>>> obviously new RX pages are now out of the picture:
>>>>>
>>>>>
>>>>> /*
>>>>> * Adding a regular page that is architecturally allowed to only
>>>>> * be created with RW permissions.
>>>>> * TBD: Interface with user space policy to support max permissions
>>>>> * of RWX.
>>>>> */
>>>>> prot = PROT_READ | PROT_WRITE;
>>>>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>>> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>>>>
>>>>> If that TBD is left out to the final version the page augmentation
>>>>> has a
>>>>> risk of a API bottleneck, and that risk can realize then also in the
>>>>> page
>>>>> permission ioctls.
>>>>>
>>>>> I.e. now any review comment is based on not fully known territory,
>>>>> we have
>>>>> one known unknown, and some unknown unknowns from unpredictable
>>>>> effect to
>>>>> future API changes.
>>>
>>> The plan to complete the "TBD" in the above snippet was to follow this
>>> work
>>> with user policy integration at this location. On a high level the
>>> plan was
>>> for this to look something like:
>>>
>>>
>>> /*
>>> * Adding a regular page that is architecturally allowed to only
>>> * be created with RW permissions.
>>> * Interface with user space policy to support max permissions
>>> * of RWX.
>>> */
>>> prot = PROT_READ | PROT_WRITE;
>>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>
>>> if (user space policy allows RWX on dynamically added pages)
>>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
>>> else
>>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
>>>
>>> The work that follows this series aimed to do the integration with user
>>> space policy.
>>
>> What do you mean by "user space policy" anyway exactly? I'm sorry but I
>> just don't fully understand this.
>
> My apologies - I just assumed that you would need no reminder about this
> contentious
> part of SGX history. Essentially it means that, yes, the kernel could
> theoretically
> permit any kind of access to any file/page, but some accesses are known
> to generally
> be a bad idea - like making memory executable as well as writable - and
> thus there
> are additional checks based on what user space permits before the kernel
> allows
> such accesses.
>
> For example,
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>
> User policy and SGX has seen significant discussion. Some notable
> threads:
> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> https://lore.kernel.org/linux-security-module/[email protected]/
>
>> It's too big of a risk to accept this series without X taken care of.
>> Patch
>> series should neither have TODO nor TBD comments IMHO. I don't want to
>> ack
>> a series based on speculation what might happen in the future.
>
> ok
>
>>
>>>> I think the best way to move forward would be to do EAUG's explicitly
>>>> with
>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>
>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>> this purpose. It already includes SECINFO which may also be useful if
>>> needing to later support EAUG of PT_SS* pages.
>>
>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a
>> day.
>
> I could, yes.
>
>> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this
>> weird
>> thing added to the #PF handler? Why is it added at all then?
>
> I was just speculating in my response, there is no plan to extend
> SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>
>>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>>> after enclave initialization on any memory region within the enclave
>>> where
>>> pages are planned to be added dynamically. This ioctl() calls EAUG to
>>> add the
>>> new pages with RW permissions and their vm_max_prot_bits can be set to
>>> the
>>> permissions found in the included SECINFO. This will support later
>>> EACCEPTCOPY
>>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>>
>> I don't like this type of re-use of the existing API.
>
> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus
> after
> considering the user policy question (above) and performance trade-off
> (more below).
>
>>
>>> The big question is whether communicating user policy after enclave
>>> initialization
>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all?
>>> I would
>>> appreciate a confirmation on this direction considering the
>>> significant history
>>> behind this topic.
>>
>> I have no idea because I don't know what is user space policy.
>
> This discussion is about some enclave usages needing RWX permissions
> on dynamically added enclave pages. RWX permissions on dynamically added
> pages is
> not something that should blindly be allowed for all SGX enclaves but
> instead the user
> needs to explicitly allow specific enclaves to have such ability. This
> is equivalent
> to (but not the same as) what exists in Linux today with LSM. As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able
> to make
> files and memory be both writable and executable, but it would only do
> so for those
> files and memory that the LSM (which is how user policy is communicated,
> like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.
>
>>>> Putting EAUG to the #PF handler and implicitly call it just too
>>>> flakky and
>>>> hard to make deterministic for e.g. JIT compiler in our use case (not
>>>> to
>>>> mention that JIT is not possible at all because inability to do RX
>>>> pages).
>
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic
> but from
> what I understand it would have a performance impact since it would
> require all memory
> that may be needed by the enclave be pre-allocated from outside the
> enclave and not
> just dynamically allocated from within the enclave at the time it is
> needed.
>
> Would such a performance impact be acceptable?
>

User space won't always have enough info to decide whether the pages should be
EAUG'd immediately. In some cases (shared libraries, JVM for example) lots
of code/data pages can be mapped but never actually touched. One
enclave/process does not know if any other more important enclave/process
would need the EPC.

It should be for the kernel to make the final decision as it has the overall
picture of the system EPC usage and availability.

User space can provide a hint (similar to MAP_POPULATE) to the kernel that the
mmap'd area will soon be needed and the kernel should EAUG as soon as it sees
fit based on current system usage. Or the kernel could implement some policy to
avoid the #PF triggered by EACCEPT, for example, if the system has a ton of free
EPC relative to the amount requested by mmap at the time.

BR
Haitao

2022-03-04 02:21:34

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/3/2022 3:12 PM, Jarkko Sakkinen wrote:
> On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote:
>>> What do you mean by "user space policy" anyway exactly? I'm sorry but I
>>> just don't fully understand this.
>>
>> My apologies - I just assumed that you would need no reminder about this contentious
>> part of SGX history. Essentially it means that, yes, the kernel could theoretically
>> permit any kind of access to any file/page, but some accesses are known to generally
>> be a bad idea - like making memory executable as well as writable - and thus there
>> are additional checks based on what user space permits before the kernel allows
>> such accesses.
>
> The device files are limited by a GID (in systemd upstream), which is a
> "user policy".
>
> What you want to add and why augmentation cannot be made complete before
> the unknown factor is added to the access control?

After studying this part of SGX history I learned that unfortunately none of the
existing user policy controls have been found to be a perfect fit for enclaves.
Current user policy type permissions are associated with files and processes and
enclaves have properties of both. One process can execute multiple enclaves and
only one/some of those enclaves may require to execute dirty pages. Associating
a permission to execute dirty pages with the process, and thus giving that ability
to all of its enclaves, is not ideal. Similarly, the file /dev/sgx_enclave can
represent multiple enclaves used by multiple processes and a file permission is
similarly too broad.

What I was planning to propose and discuss after the SGX2 core enabling was
an ability for user space to uniquely identify enclaves that require the
ability to execute dirty pages. This identification can be specified by using
enclave properties like MRENCLAVE and MRSIGNER. Executing dirty pages would
only be allowed for these specific enclaves identified to require this ability.
A solution like this is possible using the kernel's keys subsystem by introducing
a new "enclave_execdirty" key that contains these properties. I have this working
as a PoC.

Perhaps the SGX_IOC_ENCLAVE_AUGMENT_PAGES that you propose can also be seen as
a solution to support user space policy, except that it is more fine grained
in that it is used to identify specific memory ranges within specific enclaves that
are allowed to execute dirty pages. What do you think?
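
To make the identification idea concrete, a minimal sketch of an allow-list
lookup follows. Everything in it is an assumption made for illustration:
struct execdirty_entry and execdirty_allowed() are hypothetical names, the
list is a plain array rather than anything backed by the keys subsystem, and
requiring both MRENCLAVE and MRSIGNER to match is just one possible rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEASUREMENT_SIZE 32 /* MRENCLAVE and MRSIGNER are 256-bit values */

/* Hypothetical allow-list entry; not an existing kernel structure. */
struct execdirty_entry {
	uint8_t mrenclave[MEASUREMENT_SIZE];
	uint8_t mrsigner[MEASUREMENT_SIZE];
};

/* Returns true only if both measurements match one entry in the list. */
static bool execdirty_allowed(const struct execdirty_entry *list, size_t n,
			      const uint8_t *mrenclave, const uint8_t *mrsigner)
{
	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!memcmp(list[i].mrenclave, mrenclave, MEASUREMENT_SIZE) &&
		    !memcmp(list[i].mrsigner, mrsigner, MEASUREMENT_SIZE))
			return true;
	}
	return false; /* default: executing dirty pages is not allowed */
}
```

The default answer is "not allowed", so an enclave gains the ability only
when user space has explicitly listed it.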

>>>>> I think the best way to move forward would be to do EAUG's explicitly with
>>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>>
>>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>>> this purpose. It already includes SECINFO which may also be useful if
>>>> needing to later support EAUG of PT_SS* pages.
>>>
>>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
>>
>> I could, yes.
>
> And this enables EACCEPTCOPY pattern nicely.
>
> E.g. you can implement mmap() with EAUG and then EACCEPTCOPY feeded with
> permissions and a zero page:
>
> 1. enclave calls back to host to do mmap()
> 2. host does eaug on given range and enter back to enclave.
> 3. enclave does eacceptcopy with given permissions and a zero page.
>
>>> I don't like this type of re-use of the existing API.
>>
>> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
>> considering the user policy question (above) and performance trade-off (more below).
>
> Ok.
>
> If adding this would be a bottleneck it would be already persistent int
> "add pages", so whatever limitation there might be, it already exist.

Currently this checking is built in as part of "add pages", for example, user
space is prevented from circumventing existing protections on the source pages
with the "vma->vm_flags & VM_MAYEXEC" check in __sgx_encl_add_page().

Further, there is trust here in that the pages added before enclave
initialization are accompanied by their secinfo with the permissions of
the pages and those values are included in the measurement (MRENCLAVE) of
the final enclave. The maximum permissions any enclave page
specified during "add pages" may have is "locked down" during this time.

Permissions of EAUG pages are not included in the MRENCLAVE of the enclave
and there is no backing memory that can be referenced to learn what is already
allowed.

It is possible that some of the code dynamically loaded into the enclave
could indeed be buggy or malicious, so an effort should be made to allow
execution of dirty pages only for those enclaves identified as requiring that ability.

> Thus, logically, that could be safely added without worrying about user
> policies all that much...
>
>>
>>>
>>>> The big question is whether communicating user policy after enclave initialization
>>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>>>> appreciate a confirmation on this direction considering the significant history
>>>> behind this topic.
>>>
>>> I have no idea because I don't know what is user space policy.
>>
>> This discussion is about some enclave usages needing RWX permissions
>> on dynamically added enclave pages. RWX permissions on dynamically added pages is
>
> I'm not sure if that is actually necessary, if you use EAUG-EACCEPTCOPY
> type of pattern. Please correct if I'm wrong.

This only takes EPCM permissions into account. The issue comes in when the kernel
needs to determine whether it should allow the PTEs pointing to these pages to be
executable.

To elaborate on your example: to use dynamically added RWX pages,
EAUG->EACCEPTCOPY->SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is required, and
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will only permit PTE permissions that are allowed.
In the driver sgx_encl_page->vm_max_prot_bits dictates what permissions are allowed
and SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will return EPERM if an attempt is made
to relax permissions beyond that.

When considering the user space policy integration, sgx_encl_page->vm_max_prot_bits
will be initialized to reflect allowed permissions (RWX if the enclave is so allowed).
In this way EAUG pages can be made executable using SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
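
The bound check described above can be sketched as follows. This is not the
driver code: struct page_perms and relax_permissions() are hypothetical
stand-ins, only the two field names come from the snippet quoted earlier in
this thread, and PROT_* values are used directly instead of VM_* bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the relevant part of struct sgx_encl_page. */
struct page_perms {
	unsigned long vm_max_prot_bits; /* upper bound, fixed when page is added */
	unsigned long vm_run_prot_bits; /* permissions PTEs may currently carry */
};

/* Returns 0 on success or -EPERM if the request exceeds the maximum. */
static int relax_permissions(struct page_perms *page, unsigned long requested)
{
	assert(page != NULL);

	/* Any requested bit outside the maximum is refused outright. */
	if (requested & ~page->vm_max_prot_bits)
		return -EPERM;

	page->vm_run_prot_bits = requested;
	return 0;
}
```

With vm_max_prot_bits left at RW, a request that includes PROT_EXEC fails
with -EPERM and the page is unchanged; once the maximum is RWX the same
request can succeed.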

>> not something that should blindly be allowed for all SGX enclaves but instead the user
>> needs to explicitly allow specific enclaves to have such ability. This is equivalent
>> to (but not the same as) what exists in Linux today with LSM. As seen in
>> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
>> files and memory be both writable and executable, but it would only do so for those
>> files and memory that the LSM (which is how user policy is communicated, like SELinux)
>> indicates it is allowed, not blindly do so for all files and all memory.
>
> We could also potentially make LSM hooks to ioctls, if that is ever needed.

Could you please elaborate?

>
> And as I said earlier, EAUG ioctl does not make things any worse they might
> be.

I hope my earlier comments noting the differences with adding pages shine some light here.

>
>>>>> Putting EAUG to the #PF handler and implicitly call it just too flakky and
>>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to
>>>>> mention that JIT is not possible at all because inability to do RX pages).
>>
>> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
>> what I understand it would have a performance impact since it would require all memory
>> that may be needed by the enclave be pre-allocated from outside the enclave and not
>> just dynamically allocated from within the enclave at the time it is needed.
>>
>> Would such a performance impact be acceptable?
>
> IMHO yes because bad behaving enclave can cause the same issue anyway,
> and more indeterministic manner.

With EAUG pages supported in the page fault handler it is possible to support
both usages, especially now that Dave has provided guidance on how to
support MAP_POPULATE. As I understand it, when MAP_POPULATE is supported, a usage
needing deterministic behavior can pre-fault all the EAUG pages, while
usages that map a lot of memory that will mostly go unused are also supported.


Reinette

2022-03-04 05:20:29

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote:
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > just don't fully understand this.
>
> My apologies - I just assumed that you would need no reminder about this contentious
> part of SGX history. Essentially it means that, yes, the kernel could theoretically
> permit any kind of access to any file/page, but some accesses are known to generally
> be a bad idea - like making memory executable as well as writable - and thus there
> are additional checks based on what user space permits before the kernel allows
> such accesses.

The device files are limited by a GID (in systemd upstream), which is a
"user policy".

What do you want to add, and why can augmentation not be made complete before
the unknown factor is added to the access control?

> >>> I think the best way to move forward would be to do EAUG's explicitly with
> >>> an ioctl that could also include secinfo for permissions. Then you can
> >>> easily do the rest with EACCEPTCOPY inside the enclave.
> >>
> >> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> >> this purpose. It already includes SECINFO which may also be useful if
> >> needing to later support EAUG of PT_SS* pages.
> >
> > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
>
> I could, yes.

And this enables EACCEPTCOPY pattern nicely.

E.g. you can implement mmap() with EAUG and then EACCEPTCOPY fed with
permissions and a zero page:

1. enclave calls back to host to do mmap()
2. host does EAUG on the given range and enters back into the enclave.
3. enclave does EACCEPTCOPY with the given permissions and a zero page.
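
The ordering of steps 2 and 3 can be modelled with a toy state machine. This
is a sketch under loud assumptions: it runs entirely in ordinary user space,
the toy_* names are invented for illustration, and it only mimics the
pending-then-accepted bookkeeping of the EPCM, not the real instructions or
the PTE permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

enum toy_state { TOY_ABSENT, TOY_PENDING, TOY_ACCEPTED };

/* Toy model of one enclave page's EPCM state; purely illustrative. */
struct toy_page {
	enum toy_state state;
	int perms;
};

/* Step 2, host side: models EAUG. The page appears as pending with RW. */
static int toy_eaug(struct toy_page *page)
{
	assert(page != NULL);
	if (page->state != TOY_ABSENT)
		return -1;
	page->state = TOY_PENDING;
	page->perms = PROT_READ | PROT_WRITE;
	return 0;
}

/* Step 3, enclave side: models EACCEPTCOPY with the requested permissions. */
static int toy_eacceptcopy(struct toy_page *page, int perms)
{
	assert(page != NULL);
	if (page->state != TOY_PENDING)
		return -1; /* only a pending page can be accepted */
	page->state = TOY_ACCEPTED;
	page->perms = perms;
	return 0;
}
```

In this model accepting a page that was never augmented fails, which is the
ordering constraint the three steps rely on.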

> > I don't like this type of re-use of the existing API.
>
> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
> considering the user policy question (above) and performance trade-off (more below).

Ok.

If adding this would be a bottleneck it would already be present in
"add pages", so whatever limitation there might be, it already exists.

Thus, logically, that could be safely added without worrying about user
policies all that much...

>
> >
> >> The big question is whether communicating user policy after enclave initialization
> >> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
> >> appreciate a confirmation on this direction considering the significant history
> >> behind this topic.
> >
> > I have no idea because I don't know what is user space policy.
>
> This discussion is about some enclave usages needing RWX permissions
> on dynamically added enclave pages. RWX permissions on dynamically added pages is

I'm not sure if that is actually necessary, if you use EAUG-EACCEPTCOPY
type of pattern. Please correct if I'm wrong.

> not something that should blindly be allowed for all SGX enclaves but instead the user
> needs to explicitly allow specific enclaves to have such ability. This is equivalent
> to (but not the same as) what exists in Linux today with LSM. As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
> files and memory be both writable and executable, but it would only do so for those
> files and memory that the LSM (which is how user policy is communicated, like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.

We could also potentially make LSM hooks to ioctls, if that is ever needed.

And as I said earlier, an EAUG ioctl does not make things any worse than they
might be.

> >>> Putting EAUG to the #PF handler and implicitly call it just too flakky and
> >>> hard to make deterministic for e.g. JIT compiler in our use case (not to
> >>> mention that JIT is not possible at all because inability to do RX pages).
>
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
> what I understand it would have a performance impact since it would require all memory
> that may be needed by the enclave be pre-allocated from outside the enclave and not
> just dynamically allocated from within the enclave at the time it is needed.
>
> Would such a performance impact be acceptable?

IMHO yes, because a badly behaving enclave can cause the same issue anyway,
and in a more nondeterministic manner.

BR, Jarkko

2022-03-04 06:47:33

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Haitao,

On 3/3/2022 8:08 AM, Haitao Huang wrote:
> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre <[email protected]> wrote:
>> On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>>> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>>>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:

...

>>>>> I think the best way to move forward would be to do EAUG's explicitly with
>>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>>
>>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>>> this purpose. It already includes SECINFO which may also be useful if
>>>> needing to later support EAUG of PT_SS* pages.
>>>
>>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
>>
>> I could, yes.
>>
>>> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird
>>> thing added to the #PF handler? Why is it added at all then?
>>
>> I was just speculating in my response, there is no plan to extend
>> SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>>
>>>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>>>> after enclave initialization on any memory region within the enclave where
>>>> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
>>>> new pages with RW permissions and their vm_max_prot_bits can be set to the
>>>> permissions found in the included SECINFO. This will support later EACCEPTCOPY
>>>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>>>
>>> I don't like this type of re-use of the existing API.
>>
>> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
>> considering the user policy question (above) and performance trade-off (more below).
>>
>>>
>>>> The big question is whether communicating user policy after enclave initialization
>>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>>>> appreciate a confirmation on this direction considering the significant history
>>>> behind this topic.
>>>
>>> I have no idea because I don't know what is user space policy.
>>
>> This discussion is about some enclave usages needing RWX permissions
>> on dynamically added enclave pages. RWX permissions on dynamically added pages is
>> not something that should blindly be allowed for all SGX enclaves but instead the user
>> needs to explicitly allow specific enclaves to have such ability. This is equivalent
>> to (but not the same as) what exists in Linux today with LSM. As seen in
>> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
>> files and memory be both writable and executable, but it would only do so for those
>> files and memory that the LSM (which is how user policy is communicated, like SELinux)
>> indicates it is allowed, not blindly do so for all files and all memory.
>>
>>>>> Putting EAUG to the #PF handler and implicitly call it just too flakky and
>>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to
>>>>> mention that JIT is not possible at all because inability to do RX pages).
>>
>> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
>> what I understand it would have a performance impact since it would require all memory
>> that may be needed by the enclave be pre-allocated from outside the enclave and not
>> just dynamically allocated from within the enclave at the time it is needed.
>>
>> Would such a performance impact be acceptable?
>>
>
> User space won't always have enough info to decide whether the pages to be EAUG'd immediately. In some cases (shared libraries, JVM for example) lots of code/data pages can be mapped but never actually touched. One enclave/process does not know if any other more important enclave/process would need the EPC.
>
> It should be for kernel to make the final decision as it has overall picture of the system EPC usage and availability.
>
> User space can provide a hint (similar to MAP_POPULATE) to kernel that the mmap'd area will soon be needed and kernel should EAUG as soon as it sees fit based on current system usage. Or kernel implement some policy to avoid #PF triggered by EACCEPT, for example, if the system has ton of free EPC relative to the requested by mmap at the time.
>

mmap(...,...,...,MAP_POPULATE,...,...) would be most fitting and
ideal since it would enable user space to indicate that the pages would
be needed soon and the kernel can then prefault the pages. This is already
desirable in the current implementation to avoid the first page fault on
pages added via SGX_IOC_ENCLAVE_ADD_PAGES.

Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
then I believe that SGX would benefit.
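
For reference, the prefault hint looks like this on an ordinary anonymous
mapping. This is only a sketch of the mmap() flag itself: it deliberately
does not touch /dev/sgx_enclave, since SGX VMAs do not support MAP_POPULATE
as noted above, and map_prefaulted() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_POPULATE and MAP_ANONYMOUS under strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and ask the
 * kernel to populate the page tables up front instead of on first touch.
 * Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *p;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

A caller that needs deterministic behavior would request the whole range
this way, while a caller that maps much more than it expects to use would
simply omit the flag.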

Reinette

2022-03-04 07:08:41

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions


On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <[email protected]>
wrote:

> On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> Hi all,
>>
>> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> <[email protected]> wrote:
>>
>> > Hi Jarkko,
>> >
>> > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> > > > Hi Jarkko,
>> > > >
>> > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of this
>> > > > > > version of EACCEPT @ chat.enarx.dev) it is possible to make
>> > > > > > R and RX pages but obviously new RX pages are now out of the
>> > > > > > picture:
>> > > > > >
>> > > > > >
>> > > > > > /*
>> > > > > > * Adding a regular page that is architecturally allowed to only
>> > > > > > * be created with RW permissions.
>> > > > > > * TBD: Interface with user space policy to support max permissions
>> > > > > > * of RWX.
>> > > > > > */
>> > > > > > prot = PROT_READ | PROT_WRITE;
>> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>> > > > > >
>> > > > > > If that TBD is left out to the final version the page
>> > > > > > augmentation has a
>> > > > > > risk of a API bottleneck, and that risk can realize then
>> > > > > > also in the page
>> > > > > > permission ioctls.
>> > > > > >
>> > > > > > I.e. now any review comment is based on not fully known
>> > > > > > territory, we have
>> > > > > > one known unknown, and some unknown unknowns from
>> > > > > > unpredictable effect to
>> > > > > > future API changes.
>> > > >
>> > > > The plan to complete the "TBD" in the above snippet was to
>> > > > follow this work
>> > > > with user policy integration at this location. On a high level
>> > > > the plan was
>> > > > for this to look something like:
>> > > >
>> > > >
>> > > > /*
>> > > > * Adding a regular page that is architecturally allowed to only
>> > > > * be created with RW permissions.
>> > > > * Interface with user space policy to support max permissions
>> > > > * of RWX.
>> > > > */
>> > > > prot = PROT_READ | PROT_WRITE;
>> > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > >
>> > > > if (user space policy allows RWX on dynamically added
>> pages)
>> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > else
>> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > PROT_WRITE, 0);
>> > > >
>> > > > The work that follows this series aimed to do the integration
>> with user
>> > > > space policy.
>> > >
>> > > What do you mean by "user space policy" anyway exactly? I'm sorry
>> but I
>> > > just don't fully understand this.
>> >
>> > My apologies - I just assumed that you would need no reminder about
>> this
>> > contentious
>> > part of SGX history. Essentially it means that, yes, the kernel could
>> > theoretically
>> > permit any kind of access to any file/page, but some accesses are
>> known
>> > to generally
>> > be a bad idea - like making memory executable as well as writable -
>> and
>> > thus there
>> > are additional checks based on what user space permits before the
>> kernel
>> > allows
>> > such accesses.
>> >
>> > For example,
>> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> >
>> > User policy and SGX has seen significant discussion. Some notable
>> > threads:
>> >
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> >
>> https://lore.kernel.org/linux-security-module/[email protected]/
>> >
>> > > It's too big of a risk to accept this series without X taken care
>> > > of. Patch
>> > > series should neither have TODO nor TBD comments IMHO. I don't want
>> > > to ack
>> > > a series based on speculation what might happen in the future.
>> >
>> > ok
>> >
>> > >
>> > > > > I think the best way to move forward would be to do EAUG's
>> > > > > explicitly with
>> > > > > an ioctl that could also include secinfo for permissions. Then
>> you can
>> > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > >
>> > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
>> used for
>> > > > this purpose. It already includes SECINFO which may also be
>> useful if
>> > > > needing to later support EAUG of PT_SS* pages.
>> > >
>> > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
>> > > a day.
>> >
>> > I could, yes.
>> >
>> > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
>> > > this weird
>> > > thing added to the #PF handler? Why is it added at all then?
>> >
>> > I was just speculating in my response, there is no plan to extend
>> > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> >
>> > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > after enclave initialization on any memory region within the
>> > > > enclave where
>> > > > pages are planned to be added dynamically. This ioctl() calls
>> > > > EAUG to add the
>> > > > new pages with RW permissions and their vm_max_prot_bits can be
>> > > > set to the
>> > > > permissions found in the included SECINFO. This will support
>> > > > later EACCEPTCOPY
>> > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > >
>> > > I don't like this type of re-use of the existing API.
>> >
>> > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
>> consensus
>> > after
>> > considering the user policy question (above) and performance trade-off
>> > (more below).
>> >
>> > >
>> > > > The big question is whether communicating user policy after
>> > > > enclave initialization
>> > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
>> > > > to all? I would
>> > > > appreciate a confirmation on this direction considering the
>> > > > significant history
>> > > > behind this topic.
>> > >
>> > > I have no idea because I don't know what is user space policy.
>> >
>> > This discussion is about some enclave usages needing RWX permissions
>> > on dynamically added enclave pages. RWX permissions on dynamically
>> added
>> > pages is
>> > not something that should blindly be allowed for all SGX enclaves but
>> > instead the user
>> > needs to explicitly allow specific enclaves to have such ability. This
>> > is equivalent
>> > to (but not the same as) what exists in Linux today with LSM. As seen
>> in
>> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is
>> able
>> > to make
>> > files and memory be both writable and executable, but it would only do
>> > so for those
>> > files and memory that the LSM (which is how user policy is
>> communicated,
>> > like SELinux)
>> > indicates it is allowed, not blindly do so for all files and all
>> memory.
>> >
>> > > > > Putting EAUG to the #PF handler and implicitly call it just
>> > > > > too flaky and
>> > > > > hard to make deterministic for e.g. JIT compiler in our use
>> > > > > case (not to
>> > > > > mention that JIT is not possible at all because inability to
>> > > > > do RX pages).
>> >
>> > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
>> deterministic
>> > but from
>> > what I understand it would have a performance impact since it would
>> > require all memory
>> > that may be needed by the enclave be pre-allocated from outside the
>> > enclave and not
>> > just dynamically allocated from within the enclave at the time it is
>> > needed.
>> >
>> > Would such a performance impact be acceptable?
>> >
>>
>> User space won't always have enough info to decide whether the pages to
>> be
>> EAUG'd immediately. In some cases (shared libraries, JVM for example)
>> lots
>> of code/data pages can be mapped but never actually touched. One
>> enclave/process does not know if any other more important
>> enclave/process
>> would need the EPC.
>>
>> It should be for kernel to make the final decision as it has overall
>> picture
>> of the system EPC usage and availability.
>
> EAUG ioctl does not give better capabilities for user space to waste
> EPC given that EADD ioctl already exists, i.e. your argument is logically
> incorrect.

The point of adding EAUG is to allow more efficient use of EPC pages.
Without EAUG, enclaves have to EADD everything upfront into EPC, consuming
a predetermined number of EPC pages, some of which may not be used at all.
With EAUG, enclaves should be able to load a minimal set of pages to get
started, with further pages added on #PF as they are actually accessed.

Obviously, as you pointed out, for some usages it makes more sense to
pre-EAUG (EAUG before #PF). But your proposal of supporting only pre-EAUG
here essentially makes EAUG behave almost the same as EADD. If the current
implementation with EAUG on #PF can also use MAP_POPULATE for pre-EAUG
(seems possible based on Dave's comments), then it is flexible enough to
cover all cases and allows the kernel to optimize allocation of EPC pages.
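
To make the trade-off concrete, here is a toy bookkeeping model of the two
paths. It is purely illustrative: struct epc_model, model_touch() and
model_prepopulate() are names invented for this sketch and do not
correspond to anything in the SGX driver. It only counts how many EPC
pages end up backing an enclave range under each approach.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ENCLAVE_PAGES 64 /* hypothetical enclave size, in pages */

/* Toy state: which enclave pages have an EPC page behind them. */
struct epc_model {
	bool backed[MODEL_ENCLAVE_PAGES];
	size_t epc_used;
};

static void model_reset(struct epc_model *m)
{
	memset(m, 0, sizeof(*m));
}

/* Stand-in for "EAUG one page": back it if it is not backed yet. */
static void model_eaug(struct epc_model *m, size_t idx)
{
	if (idx < MODEL_ENCLAVE_PAGES && !m->backed[idx]) {
		m->backed[idx] = true;
		m->epc_used++;
	}
}

/* On-demand path: the first access faults and the page is added then. */
static void model_touch(struct epc_model *m, size_t idx)
{
	model_eaug(m, idx);
}

/* Pre-EAUG path (ioctl or MAP_POPULATE style): back a range up front. */
static void model_prepopulate(struct epc_model *m, size_t first, size_t count)
{
	for (size_t i = 0; i < count; i++)
		model_eaug(m, first + i);
}
```

In this model an enclave that maps 64 pages but only ever touches a few of
them consumes only those few EPC pages on the on-demand path, whereas
model_prepopulate() over the whole range consumes all 64 regardless of
use. The model ignores reclaim and the cost of each #PF, which is the
other side of the argument.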

Thanks
Haitao

2022-03-04 07:28:41

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> Hi all,
>
> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> <[email protected]> wrote:
>
> > Hi Jarkko,
> >
> > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > Hi Jarkko,
> > > >
> > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > this version of
> > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> > > > > > obviously new RX pages are now out of the picture:
> > > > > >
> > > > > >
> > > > > > /*
> > > > > > * Adding a regular page that is architecturally allowed to only
> > > > > > * be created with RW permissions.
> > > > > > * TBD: Interface with user space policy to support max permissions
> > > > > > * of RWX.
> > > > > > */
> > > > > > prot = PROT_READ | PROT_WRITE;
> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > > > >
> > > > > > If that TBD is left out to the final version the page
> > > > > > augmentation has a
> > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > also in the page
> > > > > > permission ioctls.
> > > > > >
> > > > > > I.e. now any review comment is based on not fully known
> > > > > > territory, we have
> > > > > > one known unknown, and some unknown unknowns from
> > > > > > unpredictable effect to
> > > > > > future API changes.
> > > >
> > > > The plan to complete the "TBD" in the above snippet was to
> > > > follow this work
> > > > with user policy integration at this location. On a high level
> > > > the plan was
> > > > for this to look something like:
> > > >
> > > >
> > > > /*
> > > > * Adding a regular page that is architecturally allowed to only
> > > > * be created with RW permissions.
> > > > * Interface with user space policy to support max permissions
> > > > * of RWX.
> > > > */
> > > > prot = PROT_READ | PROT_WRITE;
> > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > >
> > > > if (user space policy allows RWX on dynamically added pages)
> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > PROT_WRITE | PROT_EXEC, 0);
> > > > else
> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > PROT_WRITE, 0);
> > > >
> > > > The work that follows this series aimed to do the integration with user
> > > > space policy.
> > >
> > > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > > just don't fully understand this.
> >
> > My apologies - I just assumed that you would need no reminder about this
> > contentious
> > part of SGX history. Essentially it means that, yes, the kernel could
> > theoretically
> > permit any kind of access to any file/page, but some accesses are known
> > to generally
> > be a bad idea - like making memory executable as well as writable - and
> > thus there
> > are additional checks based on what user space permits before the kernel
> > allows
> > such accesses.
> >
> > For example,
> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> >
> > User policy and SGX has seen significant discussion. Some notable
> > threads:
> > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > https://lore.kernel.org/linux-security-module/[email protected]/
> >
> > > It's too big of a risk to accept this series without X taken care
> > > of. Patch
> > > series should neither have TODO nor TBD comments IMHO. I don't want
> > > to ack
> > > a series based on speculation what might happen in the future.
> >
> > ok
> >
> > >
> > > > > I think the best way to move forward would be to do EAUG's
> > > > > explicitly with
> > > > > an ioctl that could also include secinfo for permissions. Then you can
> > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > >
> > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> > > > this purpose. It already includes SECINFO which may also be useful if
> > > > needing to later support EAUG of PT_SS* pages.
> > >
> > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
> > > a day.
> >
> > I could, yes.
> >
> > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > this weird
> > > thing added to the #PF handler? Why is it added at all then?
> >
> > I was just speculating in my response, there is no plan to extend
> > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> >
> > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> > > > after enclave initialization on any memory region within the
> > > > enclave where
> > > > pages are planned to be added dynamically. This ioctl() calls
> > > > EAUG to add the
> > > > new pages with RW permissions and their vm_max_prot_bits can be
> > > > set to the
> > > > permissions found in the included SECINFO. This will support
> > > > later EACCEPTCOPY
> > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > >
> > > I don't like this type of re-use of the existing API.
> >
> > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus
> > after
> > considering the user policy question (above) and performance trade-off
> > (more below).
> >
> > >
> > > > The big question is whether communicating user policy after
> > > > enclave initialization
> > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > to all? I would
> > > > appreciate a confirmation on this direction considering the
> > > > significant history
> > > > behind this topic.
> > >
> > > I have no idea because I don't know what is user space policy.
> >
> > This discussion is about some enclave usages needing RWX permissions
> > on dynamically added enclave pages. RWX permissions on dynamically added
> > pages is
> > not something that should blindly be allowed for all SGX enclaves but
> > instead the user
> > needs to explicitly allow specific enclaves to have such ability. This
> > is equivalent
> > to (but not the same as) what exists in Linux today with LSM. As seen in
> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able
> > to make
> > files and memory be both writable and executable, but it would only do
> > so for those
> > files and memory that the LSM (which is how user policy is communicated,
> > like SELinux)
> > indicates it is allowed, not blindly do so for all files and all memory.
> >
> > > > > Putting EAUG to the #PF handler and implicitly call it just
> > > > > > too flaky and
> > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > case (not to
> > > > > mention that JIT is not possible at all because inability to
> > > > > do RX pages).
> >
> > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic
> > but from
> > what I understand it would have a performance impact since it would
> > require all memory
> > that may be needed by the enclave be pre-allocated from outside the
> > enclave and not
> > just dynamically allocated from within the enclave at the time it is
> > needed.
> >
> > Would such a performance impact be acceptable?
> >
>
> User space won't always have enough info to decide whether the pages to be
> EAUG'd immediately. In some cases (shared libraries, JVM for example) lots
> of code/data pages can be mapped but never actually touched. One
> enclave/process does not know if any other more important enclave/process
> would need the EPC.
>
> It should be for kernel to make the final decision as it has overall picture
> of the system EPC usage and availability.

An EAUG ioctl does not give user space any more ability to waste EPC
than the existing EADD ioctl already does, i.e. your argument is
logically incorrect.

BR, Jarkko

2022-03-04 10:44:25

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On 3/3/22 13:23, Reinette Chatre wrote:
> Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> then I believe that SGX would benefit.

Some Intel folks asked for this quite a while ago. I think it's
entirely doable: add a new vm_ops->populate() function that will allow
ignoring VM_IO|VM_PFNMAP if present.

Or, if nobody wants to waste all of the vm_ops space, just add an
arch_vma_populate() or something which can call over into SGX.

I'll happily review the patches if anyone can put such a beast together.
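
As a rough picture of the first option, the following is a user space
model, not kernel code. Every name in it (struct model_vma, struct
model_vm_ops, model_populate_vma(), model_sgx_populate() and the MODEL_*
constants) is hypothetical and exists only for this sketch; the real hook,
its signature and its placement would be decided by whoever writes the
patches. The model assumes, per the quoted text above, that populate is
skipped for VM_IO|VM_PFNMAP VMAs unless the VMA supplies its own hook.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_VM_PFNMAP 0x1u /* stand-in flag values, not the kernel's */
#define MODEL_VM_IO     0x2u

struct model_vma;

/* Hypothetical per-VMA operations with an optional populate hook. */
struct model_vm_ops {
	long (*populate)(struct model_vma *vma, unsigned long start,
			 unsigned long end);
};

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned int vm_flags;
	const struct model_vm_ops *vm_ops;
	unsigned long populated; /* pages prefaulted so far */
};

/* Returns the number of pages populated for this VMA. */
static long model_populate_vma(struct model_vma *vma)
{
	if (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)) {
		/* Special mapping: only populate if the owner opts in. */
		if (vma->vm_ops && vma->vm_ops->populate)
			return vma->vm_ops->populate(vma, vma->vm_start,
						     vma->vm_end);
		return 0; /* skipped */
	}

	/* Ordinary VMA: pretend the generic code faults in every page. */
	vma->populated = (vma->vm_end - vma->vm_start) / MODEL_PAGE_SIZE;
	return (long)vma->populated;
}

/* What an SGX-provided hook might do in this model: back the range. */
static long model_sgx_populate(struct model_vma *vma, unsigned long start,
			       unsigned long end)
{
	vma->populated = (end - start) / MODEL_PAGE_SIZE;
	return (long)vma->populated;
}
```

The arch_vma_populate() alternative would move the same opt-in decision
out of the per-VMA ops table and into a single architecture callback.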

2022-03-04 13:13:18

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
>
> On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> > > Hi all,
> > >
> > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> > > <[email protected]> wrote:
> > >
> > > > Hi Jarkko,
> > > >
> > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > > > this version of
> > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
> > > pages but
> > > > > > > > obviously new RX pages are now out of the picture:
> > > > > > > >
> > > > > > > >
> > > > > > > > /*
> > > > > > > > * Adding a regular page that is architecturally allowed
> > > to only
> > > > > > > > * be created with RW permissions.
> > > > > > > > * TBD: Interface with user space policy to support max
> > > permissions
> > > > > > > > * of RWX.
> > > > > > > > */
> > > > > > > > prot = PROT_READ | PROT_WRITE;
> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > > > > > >
> > > > > > > > If that TBD is left out to the final version the page
> > > > > > > > augmentation has a
> > > > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > > > also in the page
> > > > > > > > permission ioctls.
> > > > > > > >
> > > > > > > > I.e. now any review comment is based on not fully known
> > > > > > > > territory, we have
> > > > > > > > one known unknown, and some unknown unknowns from
> > > > > > > > unpredictable effect to
> > > > > > > > future API changes.
> > > > > >
> > > > > > The plan to complete the "TBD" in the above snippet was to
> > > > > > follow this work
> > > > > > with user policy integration at this location. On a high level
> > > > > > the plan was
> > > > > > for this to look something like:
> > > > > >
> > > > > >
> > > > > > /*
> > > > > > * Adding a regular page that is architecturally allowed to only
> > > > > > * be created with RW permissions.
> > > > > > * Interface with user space policy to support max permissions
> > > > > > * of RWX.
> > > > > > */
> > > > > > prot = PROT_READ | PROT_WRITE;
> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > >
> > > > > > if (user space policy allows RWX on dynamically added
> > > pages)
> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > PROT_WRITE | PROT_EXEC, 0);
> > > > > > else
> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > PROT_WRITE, 0);
> > > > > >
> > > > > > The work that follows this series aimed to do the integration
> > > with user
> > > > > > space policy.
> > > > >
> > > > > What do you mean by "user space policy" anyway exactly? I'm
> > > sorry but I
> > > > > just don't fully understand this.
> > > >
> > > > My apologies - I just assumed that you would need no reminder
> > > about this
> > > > contentious
> > > > part of SGX history. Essentially it means that, yes, the kernel could
> > > > theoretically
> > > > permit any kind of access to any file/page, but some accesses are
> > > known
> > > > to generally
> > > > be a bad idea - like making memory executable as well as writable
> > > - and
> > > > thus there
> > > > are additional checks based on what user space permits before the
> > > kernel
> > > > allows
> > > > such accesses.
> > > >
> > > > For example,
> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> > > >
> > > > User policy and SGX has seen significant discussion. Some notable
> > > > threads:
> > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > > > https://lore.kernel.org/linux-security-module/[email protected]/
> > > >
> > > > > It's too big of a risk to accept this series without X taken care
> > > > > of. Patch
> > > > > series should neither have TODO nor TBD comments IMHO. I don't want
> > > > > to ack
> > > > > a series based on speculation what might happen in the future.
> > > >
> > > > ok
> > > >
> > > > >
> > > > > > > I think the best way to move forward would be to do EAUG's
> > > > > > > explicitly with
> > > > > > > an ioctl that could also include secinfo for permissions.
> > > Then you can
> > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > > > >
> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
> > > used for
> > > > > > this purpose. It already includes SECINFO which may also be
> > > useful if
> > > > > > needing to later support EAUG of PT_SS* pages.
> > > > >
> > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
> > > > > a day.
> > > >
> > > > I could, yes.
> > > >
> > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > > > this weird
> > > > > thing added to the #PF handler? Why is it added at all then?
> > > >
> > > > I was just speculating in my response, there is no plan to extend
> > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> > > >
> > > > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> > > > > > after enclave initialization on any memory region within the
> > > > > > enclave where
> > > > > > pages are planned to be added dynamically. This ioctl() calls
> > > > > > EAUG to add the
> > > > > > new pages with RW permissions and their vm_max_prot_bits can be
> > > > > > set to the
> > > > > > permissions found in the included SECINFO. This will support
> > > > > > later EACCEPTCOPY
> > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > > > >
> > > > > I don't like this type of re-use of the existing API.
> > > >
> > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
> > > consensus
> > > > after
> > > > considering the user policy question (above) and performance trade-off
> > > > (more below).
> > > >
> > > > >
> > > > > > The big question is whether communicating user policy after
> > > > > > enclave initialization
> > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > > > to all? I would
> > > > > > appreciate a confirmation on this direction considering the
> > > > > > significant history
> > > > > > behind this topic.
> > > > >
> > > > > I have no idea because I don't know what is user space policy.
> > > >
> > > > This discussion is about some enclave usages needing RWX permissions
> > > > on dynamically added enclave pages. RWX permissions on dynamically
> > > added
> > > > pages is
> > > > not something that should blindly be allowed for all SGX enclaves but
> > > > instead the user
> > > > needs to explicitly allow specific enclaves to have such ability. This
> > > > is equivalent
> > > > to (but not the same as) what exists in Linux today with LSM. As
> > > seen in
> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
> > > is able
> > > > to make
> > > > files and memory be both writable and executable, but it would only do
> > > > so for those
> > > > files and memory that the LSM (which is how user policy is
> > > communicated,
> > > > like SELinux)
> > > > indicates it is allowed, not blindly do so for all files and all
> > > memory.
> > > >
> > > > > > > Putting EAUG to the #PF handler and implicitly call it just
> > > > > > > > too flaky and
> > > > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > > > case (not to
> > > > > > > mention that JIT is not possible at all because inability to
> > > > > > > do RX pages).
> > > >
> > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
> > > deterministic
> > > > but from
> > > > what I understand it would have a performance impact since it would
> > > > require all memory
> > > > that may be needed by the enclave be pre-allocated from outside the
> > > > enclave and not
> > > > just dynamically allocated from within the enclave at the time it is
> > > > needed.
> > > >
> > > > Would such a performance impact be acceptable?
> > > >
> > >
> > > User space won't always have enough info to decide whether the pages
> > > to be
> > > EAUG'd immediately. In some cases (shared libraries, JVM for
> > > example) lots
> > > of code/data pages can be mapped but never actually touched. One
> > > enclave/process does not know if any other more important
> > > enclave/process
> > > would need the EPC.
> > >
> > > It should be for kernel to make the final decision as it has overall
> > > picture
> > > of the system EPC usage and availability.
> >
> > EAUG ioctl does not give better capabilities for user space to waste
> > EPC given that EADD ioctl already exists, i.e. your argument is logically
> > incorrect.
>
> The point of adding EAUG is to allow more efficient use of EPC pages.
> Without EAUG, enclaves have to EADD everything upfront into EPC, consuming
> predetermined number of EPC pages, some of which may not be used at all.
> With EAUG, enclaves should be able to load minimal pages to get started,
> pages added on #PF as they are actually accessed.
>
> Obviously as you pointed out, some usages make more sense to pre-EAUG (EAUG
> before #PF). But your proposal of supporting only pre-EAUG here essentially
> makes EAUG behave almost the same as EADD. If the current implementation
> with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
> based on Dave's comments), then it is flexible to cover all cases and allow
> kernel to optimize allocation of EPC pages.

There is not even a working #PF-based implementation in existence, and your
argument has too many ifs for my taste.

Reinette, can you squash this fixup into your patch set and send v3, so
that we get a working implementation that can be benchmarked against e.g.
an ioctl-based version:

https://lore.kernel.org/linux-sgx/[email protected]/T/#u

This also objectively fixes some performance issues, e.g. EMODPE can
simply be used without any round trips (v2 requires the relax ioctl).

BR, Jarkko

2022-03-04 19:55:09

by Haitao Huang

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko

On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <[email protected]>
wrote:

> On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
>>
>> On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <[email protected]>
>> wrote:
>>
>> > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> > > Hi all,
>> > >
>> > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> > > <[email protected]> wrote:
>> > >
>> > > > Hi Jarkko,
>> > > >
>> > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> > > > > > Hi Jarkko,
>> > > > > >
>> > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
>> > > > > > > > this version of
>> > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
>> > > pages but
>> > > > > > > > obviously new RX pages are now out of the picture:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > /*
>> > > > > > > > * Adding a regular page that is architecturally allowed
>> > > to only
>> > > > > > > > * be created with RW permissions.
>> > > > > > > > * TBD: Interface with user space policy to support max
>> > > permissions
>> > > > > > > > * of RWX.
>> > > > > > > > */
>> > > > > > > > prot = PROT_READ | PROT_WRITE;
>> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > > > > encl_page->vm_max_prot_bits =
>> encl_page->vm_run_prot_bits;
>> > > > > > > >
>> > > > > > > > If that TBD is left out to the final version the page
>> > > > > > > > augmentation has a
>> > > > > > > > risk of a API bottleneck, and that risk can realize then
>> > > > > > > > also in the page
>> > > > > > > > permission ioctls.
>> > > > > > > >
>> > > > > > > > I.e. now any review comment is based on not fully known
>> > > > > > > > territory, we have
>> > > > > > > > one known unknown, and some unknown unknowns from
>> > > > > > > > unpredictable effect to
>> > > > > > > > future API changes.
>> > > > > >
>> > > > > > The plan to complete the "TBD" in the above snippet was to
>> > > > > > follow this work
>> > > > > > with user policy integration at this location. On a high level
>> > > > > > the plan was
>> > > > > > for this to look something like:
>> > > > > >
>> > > > > >
>> > > > > > /*
>> > > > > > * Adding a regular page that is architecturally allowed to
>> only
>> > > > > > * be created with RW permissions.
>> > > > > > * Interface with user space policy to support max
>> permissions
>> > > > > > * of RWX.
>> > > > > > */
>> > > > > > prot = PROT_READ | PROT_WRITE;
>> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > >
>> > > > > > if (user space policy allows RWX on dynamically added
>> > > pages)
>> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > > > else
>> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > > > PROT_WRITE, 0);
>> > > > > >
>> > > > > > The work that follows this series aimed to do the integration
>> > > with user
>> > > > > > space policy.
>> > > > >
>> > > > > What do you mean by "user space policy" anyway exactly? I'm
>> > > sorry but I
>> > > > > just don't fully understand this.
>> > > >
>> > > > My apologies - I just assumed that you would need no reminder
>> > > about this
>> > > > contentious
>> > > > part of SGX history. Essentially it means that, yes, the kernel
>> could
>> > > > theoretically
>> > > > permit any kind of access to any file/page, but some accesses are
>> > > known
>> > > > to generally
>> > > > be a bad idea - like making memory executable as well as writable
>> > > - and
>> > > > thus there
>> > > > are additional checks based on what user space permits before the
>> > > kernel
>> > > > allows
>> > > > such accesses.
>> > > >
>> > > > For example,
>> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> > > >
>> > > > User policy and SGX has seen significant discussion. Some notable
>> > > > threads:
>> > > >
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> > > >
>> https://lore.kernel.org/linux-security-module/[email protected]/
>> > > >
>> > > > > It's too big of a risk to accept this series without X taken
>> care
>> > > > > of. Patch
>> > > > > series should neither have TODO nor TBD comments IMHO. I don't
>> want
>> > > > > to ack
>> > > > > a series based on speculation what might happen in the future.
>> > > >
>> > > > ok
>> > > >
>> > > > >
>> > > > > > > I think the best way to move forward would be to do EAUG's
>> > > > > > > explicitly with
>> > > > > > > an ioctl that could also include secinfo for permissions.
>> > > Then you can
>> > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > > > >
>> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
>> > > used for
>> > > > > > this purpose. It already includes SECINFO which may also be
>> > > useful if
>> > > > > > needing to later support EAUG of PT_SS* pages.
>> > > > >
>> > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and
>> call it
>> > > > > a day.
>> > > >
>> > > > I could, yes.
>> > > >
>> > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
>> > > > > this weird
>> > > > > thing added to the #PF handler? Why is it added at all then?
>> > > >
>> > > > I was just speculating in my response, there is no plan to extend
>> > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> > > >
>> > > > > > How this could work is user space calls
>> SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > > > after enclave initialization on any memory region within the
>> > > > > > enclave where
>> > > > > > pages are planned to be added dynamically. This ioctl() calls
>> > > > > > EAUG to add the
>> > > > > > new pages with RW permissions and their vm_max_prot_bits can
>> be
>> > > > > > set to the
>> > > > > > permissions found in the included SECINFO. This will support
>> > > > > > later EACCEPTCOPY
>> > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > > > >
>> > > > > I don't like this type of re-use of the existing API.
>> > > >
>> > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
>> > > consensus
>> > > > after
>> > > > considering the user policy question (above) and performance
>> trade-off
>> > > > (more below).
>> > > >
>> > > > >
>> > > > > > The big question is whether communicating user policy after
>> > > > > > enclave initialization
>> > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
>> > > > > > to all? I would
>> > > > > > appreciate a confirmation on this direction considering the
>> > > > > > significant history
>> > > > > > behind this topic.
>> > > > >
>> > > > > I have no idea because I don't know what is user space policy.
>> > > >
>> > > > This discussion is about some enclave usages needing RWX
>> permissions
>> > > > on dynamically added enclave pages. RWX permissions on dynamically
>> > > added
>> > > > pages is
>> > > > not something that should blindly be allowed for all SGX enclaves
>> but
>> > > > instead the user
>> > > > needs to explicitly allow specific enclaves to have such ability.
>> This
>> > > > is equivalent
>> > > > to (but not the same as) what exists in Linux today with LSM. As
>> > > seen in
>> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
>> > > is able
>> > > > to make
>> > > > files and memory be both writable and executable, but it would
>> only do
>> > > > so for those
>> > > > files and memory that the LSM (which is how user policy is
>> > > communicated,
>> > > > like SELinux)
>> > > > indicates it is allowed, not blindly do so for all files and all
>> > > memory.
>> > > >
>> > > > > > > Putting EAUG to the #PF handler and implicitly call it just
>> > > > > > > too flakky and
>> > > > > > > hard to make deterministic for e.g. JIT compiler in our use
>> > > > > > > case (not to
>> > > > > > > mention that JIT is not possible at all because inability to
>> > > > > > > do RX pages).
>> > > >
>> > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
>> > > deterministic
>> > > > but from
>> > > > what I understand it would have a performance impact since it
>> would
>> > > > require all memory
>> > > > that may be needed by the enclave be pre-allocated from outside
>> the
>> > > > enclave and not
>> > > > just dynamically allocated from within the enclave at the time it
>> is
>> > > > needed.
>> > > >
>> > > > Would such a performance impact be acceptable?
>> > > >
>> > >
>> > > User space won't always have enough info to decide whether the pages
>> > > to be
>> > > EAUG'd immediately. In some cases (shared libraries, JVM for
>> > > example) lots
>> > > of code/data pages can be mapped but never actually touched. One
>> > > enclave/process does not know if any other more important
>> > > enclave/process
>> > > would need the EPC.
>> > >
>> > > It should be for kernel to make the final decision as it has overall
>> > > picture
>> > > of the system EPC usage and availability.
>> >
>> > EAUG ioctl does not give better capabilities for user space to waste
>> > EPC given that EADD ioctl already exists, i.e. your argument is
>> logically
>> > incorrect.
>>
>> The point of adding EAUG is to allow more efficient use of EPC pages.
>> Without EAUG, enclaves have to EADD everything upfront into EPC,
>> consuming
>> predetermined number of EPC pages, some of which may not be used at all.
>> With EAUG, enclaves should be able to load minimal pages to get started,
>> pages added on #PF as they are actually accessed.
>>
>> Obviously as you pointed out, some usages make more sense to pre-EAUG
>> (EAUG
>> before #PF). But your proposal of supporting only pre-EAUG here
>> essentially
>> makes EAUG behave almost the same as EADD. If the current
>> implementation
>> with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
>> based on Dave's comments), then it is flxible to cover all cases and
>> allow
>> kernel to optimize allocation of EPC pages.
>
> There is no even a working #PF based implementation in existance, and
> your
> argument has too many if's for my taste.

1) If you mean that no user space implements this kind of solution, read
this section; otherwise, skip to 2) below, which is only a couple of
sentences.

If you are willing to look, there is already an implementation in our SDK
that does heap and stack expansion on demand on #PF. Enclaves may not know
their heap/stack size up front, so we implemented these features to make
EPC usage more efficient. I don't know why normal processes can add RAM on
#PF, but enclaves adding EPC on #PF is such an unacceptable concept to
you. And the kernel already does that for EPC swapping when a #PF happens
on a swapped-out EPC page.
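
To make the EPC-usage point concrete, here is a minimal sketch. It is a toy
user space model, not kernel or SDK code: struct toy_enclave, toy_commit_all()
and toy_touch() are hypothetical names invented for illustration. It only
counts how many pages end up backed when everything is added up front versus
when pages are added on first access.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one flag per enclave page, true once EPC backs it. */
struct toy_enclave {
	bool *committed;	/* caller-provided array of npages flags */
	size_t npages;		/* size of the enclave range in pages */
	size_t epc_used;	/* EPC pages currently consumed */
};

/* Models adding everything up front: every page consumes EPC. */
static void toy_commit_all(struct toy_enclave *e)
{
	for (size_t i = 0; i < e->npages; i++) {
		if (!e->committed[i]) {
			e->committed[i] = true;
			e->epc_used++;
		}
	}
}

/* Models an access: an unbacked page is added on the (simulated) #PF. */
static void toy_touch(struct toy_enclave *e, size_t page)
{
	if (page < e->npages && !e->committed[page]) {
		e->committed[page] = true;
		e->epc_used++;
	}
}
```

In this model, an enclave that maps 8 pages but only ever touches 2 of them
consumes 2 pages with toy_touch() and 8 pages with toy_commit_all().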

Our implementation has gone through several rounds; the latest is
here: https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
implemented in the original OOT-driver-based SDK. Customers are using it
and find it useful. I think this is a critical feature that many other
runtimes will also need.

2)
It's OK for you to request additional support for your usage, and I agree
it is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
unnecessary. The current implementation can be extended to support your
usage. What's the reason you think MAP_POPULATE won't work for you?

BR
Haitao

2022-03-05 02:50:07

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote:
> Hi Jarkko
>
> On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
> > >
> > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <[email protected]>
> > > wrote:
> > >
> > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> > > > > Hi all,
> > > > >
> > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > > > > > Hi Jarkko,
> > > > > > > >
> > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > > > > > this version of
> > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
> > > > > pages but
> > > > > > > > > > obviously new RX pages are now out of the picture:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > > * Adding a regular page that is architecturally allowed
> > > > > to only
> > > > > > > > > > * be created with RW permissions.
> > > > > > > > > > * TBD: Interface with user space policy to support max
> > > > > permissions
> > > > > > > > > > * of RWX.
> > > > > > > > > > */
> > > > > > > > > > prot = PROT_READ | PROT_WRITE;
> > > > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > > > > encl_page->vm_max_prot_bits =
> > > encl_page->vm_run_prot_bits;
> > > > > > > > > >
> > > > > > > > > > If that TBD is left out to the final version the page
> > > > > > > > > > augmentation has a
> > > > > > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > > > > > also in the page
> > > > > > > > > > permission ioctls.
> > > > > > > > > >
> > > > > > > > > > I.e. now any review comment is based on not fully known
> > > > > > > > > > territory, we have
> > > > > > > > > > one known unknown, and some unknown unknowns from
> > > > > > > > > > unpredictable effect to
> > > > > > > > > > future API changes.
> > > > > > > >
> > > > > > > > The plan to complete the "TBD" in the above snippet was to
> > > > > > > > follow this work
> > > > > > > > with user policy integration at this location. On a high level
> > > > > > > > the plan was
> > > > > > > > for this to look something like:
> > > > > > > >
> > > > > > > >
> > > > > > > > /*
> > > > > > > > * Adding a regular page that is architecturally allowed
> > > to only
> > > > > > > > * be created with RW permissions.
> > > > > > > > * Interface with user space policy to support max
> > > permissions
> > > > > > > > * of RWX.
> > > > > > > > */
> > > > > > > > prot = PROT_READ | PROT_WRITE;
> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > >
> > > > > > > > if (user space policy allows RWX on dynamically added
> > > > > pages)
> > > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > > > PROT_WRITE | PROT_EXEC, 0);
> > > > > > > > else
> > > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > > > PROT_WRITE, 0);
> > > > > > > >
> > > > > > > > The work that follows this series aimed to do the integration
> > > > > with user
> > > > > > > > space policy.
> > > > > > >
> > > > > > > What do you mean by "user space policy" anyway exactly? I'm
> > > > > sorry but I
> > > > > > > just don't fully understand this.
> > > > > >
> > > > > > My apologies - I just assumed that you would need no reminder
> > > > > about this
> > > > > > contentious
> > > > > > part of SGX history. Essentially it means that, yes, the
> > > kernel could
> > > > > > theoretically
> > > > > > permit any kind of access to any file/page, but some accesses are
> > > > > known
> > > > > > to generally
> > > > > > be a bad idea - like making memory executable as well as writable
> > > > > - and
> > > > > > thus there
> > > > > > are additional checks based on what user space permits before the
> > > > > kernel
> > > > > > allows
> > > > > > such accesses.
> > > > > >
> > > > > > For example,
> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> > > > > >
> > > > > > User policy and SGX has seen significant discussion. Some notable
> > > > > > threads:
> > > > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > > > > > https://lore.kernel.org/linux-security-module/[email protected]/
> > > > > >
> > > > > > > It's too big of a risk to accept this series without X taken
> > > care
> > > > > > > of. Patch
> > > > > > > series should neither have TODO nor TBD comments IMHO. I
> > > don't want
> > > > > > > to ack
> > > > > > > a series based on speculation what might happen in the future.
> > > > > >
> > > > > > ok
> > > > > >
> > > > > > >
> > > > > > > > > I think the best way to move forward would be to do EAUG's
> > > > > > > > > explicitly with
> > > > > > > > > an ioctl that could also include secinfo for permissions.
> > > > > Then you can
> > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > > > > > >
> > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
> > > > > used for
> > > > > > > > this purpose. It already includes SECINFO which may also be
> > > > > useful if
> > > > > > > > needing to later support EAUG of PT_SS* pages.
> > > > > > >
> > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and
> > > call it
> > > > > > > a day.
> > > > > >
> > > > > > I could, yes.
> > > > > >
> > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > > > > > this weird
> > > > > > > thing added to the #PF handler? Why is it added at all then?
> > > > > >
> > > > > > I was just speculating in my response, there is no plan to extend
> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> > > > > >
> > > > > > > > How this could work is user space calls
> > > SGX_IOC_ENCLAVE_ADD_PAGES
> > > > > > > > after enclave initialization on any memory region within the
> > > > > > > > enclave where
> > > > > > > > pages are planned to be added dynamically. This ioctl() calls
> > > > > > > > EAUG to add the
> > > > > > > > new pages with RW permissions and their vm_max_prot_bits
> > > can be
> > > > > > > > set to the
> > > > > > > > permissions found in the included SECINFO. This will support
> > > > > > > > later EACCEPTCOPY
> > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > > > > > >
> > > > > > > I don't like this type of re-use of the existing API.
> > > > > >
> > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
> > > > > consensus
> > > > > > after
> > > > > > considering the user policy question (above) and performance
> > > trade-off
> > > > > > (more below).
> > > > > >
> > > > > > >
> > > > > > > > The big question is whether communicating user policy after
> > > > > > > > enclave initialization
> > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > > > > > to all? I would
> > > > > > > > appreciate a confirmation on this direction considering the
> > > > > > > > significant history
> > > > > > > > behind this topic.
> > > > > > >
> > > > > > > I have no idea because I don't know what is user space policy.
> > > > > >
> > > > > > This discussion is about some enclave usages needing RWX
> > > permissions
> > > > > > on dynamically added enclave pages. RWX permissions on dynamically
> > > > > added
> > > > > > pages is
> > > > > > not something that should blindly be allowed for all SGX
> > > enclaves but
> > > > > > instead the user
> > > > > > needs to explicitly allow specific enclaves to have such
> > > ability. This
> > > > > > is equivalent
> > > > > > to (but not the same as) what exists in Linux today with LSM. As
> > > > > seen in
> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
> > > > > is able
> > > > > > to make
> > > > > > files and memory be both writable and executable, but it would
> > > only do
> > > > > > so for those
> > > > > > files and memory that the LSM (which is how user policy is
> > > > > communicated,
> > > > > > like SELinux)
> > > > > > indicates it is allowed, not blindly do so for all files and all
> > > > > memory.
> > > > > >
> > > > > > > > > Putting EAUG to the #PF handler and implicitly call it just
> > > > > > > > > too flakky and
> > > > > > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > > > > > case (not to
> > > > > > > > > mention that JIT is not possible at all because inability to
> > > > > > > > > do RX pages).
> > > > > >
> > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
> > > > > deterministic
> > > > > > but from
> > > > > > what I understand it would have a performance impact since it
> > > would
> > > > > > require all memory
> > > > > > that may be needed by the enclave be pre-allocated from
> > > outside the
> > > > > > enclave and not
> > > > > > just dynamically allocated from within the enclave at the time
> > > it is
> > > > > > needed.
> > > > > >
> > > > > > Would such a performance impact be acceptable?
> > > > > >
> > > > >
> > > > > User space won't always have enough info to decide whether the pages
> > > > > to be
> > > > > EAUG'd immediately. In some cases (shared libraries, JVM for
> > > > > example) lots
> > > > > of code/data pages can be mapped but never actually touched. One
> > > > > enclave/process does not know if any other more important
> > > > > enclave/process
> > > > > would need the EPC.
> > > > >
> > > > > It should be for kernel to make the final decision as it has overall
> > > > > picture
> > > > > of the system EPC usage and availability.
> > > >
> > > > EAUG ioctl does not give better capabilities for user space to waste
> > > > EPC given that EADD ioctl already exists, i.e. your argument is
> > > logically
> > > > incorrect.
> > >
> > > The point of adding EAUG is to allow more efficient use of EPC pages.
> > > Without EAUG, enclaves have to EADD everything upfront into EPC,
> > > consuming
> > > predetermined number of EPC pages, some of which may not be used at all.
> > > With EAUG, enclaves should be able to load minimal pages to get started,
> > > pages added on #PF as they are actually accessed.
> > >
> > > Obviously as you pointed out, some usages make more sense to
> > > pre-EAUG (EAUG
> > > before #PF). But your proposal of supporting only pre-EAUG here
> > > essentially
> > > makes EAUG behave almost the same as EADD. If the current
> > > implementation
> > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
> > > based on Dave's comments), then it is flxible to cover all cases and
> > > allow
> > > kernel to optimize allocation of EPC pages.
> >
> > There is no even a working #PF based implementation in existance, and
> > your
> > argument has too many if's for my taste.
>
> 1) if you mean no user space is implementing this kind of solution, read
> this section, otherwise, skip to 2) below which is only couple of sentences.
>
> If you are willing to look, there is already implementation in our SDK to do
> heap and stack expansion on demand on #PF. Enclaves may not know heap/stack
> size up front, we have implemented these features to make EPC usage more
> efficient. I don't know why normal processes can add RAM on #PF, but
> enclaves adding EPC on #PF becomes so unacceptable concept to you. And the
> kernel does that for EPC swapping already when #PF happens on a swapped out
> EPC page.

It adds O(n) round-trips for an mmap() emulation, which can be done in O(1)
round-trips with an ioctl.
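
A back-of-the-envelope sketch of that count, assuming 4096-byte pages and
assuming that every first access to an unbacked page faults separately. The
toy_* helpers are hypothetical and only do the arithmetic; they do not call
into any SGX interface.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Number of pages needed to cover len bytes. */
size_t toy_pages(size_t len)
{
	return (len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Fault-driven: one round-trip per page touched for the first time. */
size_t toy_round_trips_fault(size_t len)
{
	return toy_pages(len);
}

/* Range-based request: one round-trip for any non-empty range. */
size_t toy_round_trips_ioctl(size_t len)
{
	return len ? 1 : 0;
}
```

For a 16-page region this gives 16 round-trips in the fault-driven case and 1
in the range-based case, which is the O(n) versus O(1) difference.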

> Our implementation has gone through several rounds, the latest is
> here:https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
> implemented in original OOT driver based SDK implementation. Customers are
> using it and found them useful. I think this is a critical feature that many
> other runtimes will also need.

I'm not sure what the common sense argument here is.

> 2)
> It's OK for you to request additional support for your usage and I agree it
> is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
> unnecessary. Current implementation can be extended to support your usage.
> What's the reason you think MAP_POPULATE won't work for you?

I do not recall taking a stand on MAP_POPULATE.

> BR
> Haitao

BR, Jarkko

2022-03-05 05:26:19

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Sorry, I missed this.

On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
>
> Some Intel folks asked for this quite a while ago. I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.

I'm sorry, but I don't understand what you mean by ignoring here,
i.e. I cannot fully comprehend the last sentence.

And would the vm_ops->populate() be called right after the existing ones
involved with the VMA creation process?

> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
>
> I'll happily review the patches if anyone can put such a beast together.

I'll start with vm_ops->populate() and check the feedback first for
that.

BR, Jarkko

2022-03-06 11:13:24

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Sun, Mar 06, 2022 at 02:15:32AM +0200, Jarkko Sakkinen wrote:
> On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> > Sorry, I missed this.
> >
> > On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > > On 3/3/22 13:23, Reinette Chatre wrote:
> > > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > > then I believe that SGX would benefit.
> > >
> > > Some Intel folks asked for this quite a while ago. I think it's
> > > entirely doable: add a new vm_ops->populate() function that will allow
> > > ignoring VM_IO|VM_PFNMAP if present.
> >
> > I'm sorry what I don't understand what you mean by ignoring here,
> > i.e. cannot fully comprehend the last sentece.
> >
> > And would the vm_ops->populate() be called right after the existing ones
> > involved with the VMA creation process?
> >
> > > Or, if nobody wants to waste all of the vm_ops space, just add an
> > > arch_vma_populate() or something which can call over into SGX.
> > >
> > > I'll happily review the patches if anyone can put such a beast together.
> >
> > I'll start with vm_ops->populate() and check the feedback first for
> > that.
>
> I would instead extend populate() in file_operations into:
>
> int (*populate)(struct file *, struct vm_area_struct *, bool populate);
>
> This does not add to memory consumption.

Ugh, mixing my words, sorry :-) I meant:

int (*mmap)(struct file *, struct vm_area_struct *, bool populate);
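
A minimal user space mock of how such a callback could behave, purely as an
illustration. struct toy_file, struct toy_vma, struct toy_file_operations and
toy_sgx_mmap() are hypothetical stand-ins, not the kernel's struct file,
struct vm_area_struct or struct file_operations, and no such signature exists
in the kernel today:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structures are far larger. */
struct toy_file {
	size_t pages_backed;	/* pages that already have backing */
};

struct toy_vma {
	size_t start_page;
	size_t npages;
};

struct toy_file_operations {
	int (*mmap)(struct toy_file *, struct toy_vma *, bool populate);
};

static int toy_sgx_mmap(struct toy_file *file, struct toy_vma *vma,
			bool populate)
{
	if (!vma->npages)
		return -1;	/* reject an empty mapping */

	if (populate)
		file->pages_backed += vma->npages;	/* back the range now */

	/* Without populate, pages stay unbacked until first access. */
	return 0;
}

static const struct toy_file_operations toy_sgx_fops = {
	.mmap = toy_sgx_mmap,
};
```

The point of the extra argument in this mock is that the driver learns at
mmap() time whether the caller asked for the range to be populated, so no
additional callback pointer is needed.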

BR, Jarkko

2022-03-06 20:28:03

by Haitao Huang

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, 04 Mar 2022 19:02:28 -0600, Jarkko Sakkinen <[email protected]>
wrote:

> On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote:
>> Hi Jarkko
>>
>> On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <[email protected]>
>> wrote:
>>
>> > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
>> > >
>> > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen
>> <[email protected]>
>> > > wrote:
>> > >
>> > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> > > > > <[email protected]> wrote:
>> > > > >
>> > > > > > Hi Jarkko,
>> > > > > >
>> > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre
>> wrote:
>> > > > > > > > Hi Jarkko,
>> > > > > > > >
>> > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
>> > > > > > > > > > this version of
>> > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R
>> and RX
>> > > > > pages but
>> > > > > > > > > > obviously new RX pages are now out of the picture:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > /*
>> > > > > > > > > > * Adding a regular page that is architecturally
>> allowed
>> > > > > to only
>> > > > > > > > > > * be created with RW permissions.
>> > > > > > > > > > * TBD: Interface with user space policy to support
>> max
>> > > > > permissions
>> > > > > > > > > > * of RWX.
>> > > > > > > > > > */
>> > > > > > > > > > prot = PROT_READ | PROT_WRITE;
>> > > > > > > > > > encl_page->vm_run_prot_bits =
>> calc_vm_prot_bits(prot, 0);
>> > > > > > > > > > encl_page->vm_max_prot_bits =
>> > > encl_page->vm_run_prot_bits;
>> > > > > > > > > >
>> > > > > > > > > > If that TBD is left out to the final version the page
>> > > > > > > > > > augmentation has a
>> > > > > > > > > > risk of a API bottleneck, and that risk can realize
>> then
>> > > > > > > > > > also in the page
>> > > > > > > > > > permission ioctls.
>> > > > > > > > > >
>> > > > > > > > > > I.e. now any review comment is based on not fully
>> known
>> > > > > > > > > > territory, we have
>> > > > > > > > > > one known unknown, and some unknown unknowns from
>> > > > > > > > > > unpredictable effect to
>> > > > > > > > > > future API changes.
>> > > > > > > >
>> > > > > > > > The plan to complete the "TBD" in the above snippet was to
>> > > > > > > > follow this work
>> > > > > > > > with user policy integration at this location. On a high
>> level
>> > > > > > > > the plan was
>> > > > > > > > for this to look something like:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > /*
>> > > > > > > > * Adding a regular page that is architecturally allowed
>> > > to only
>> > > > > > > > * be created with RW permissions.
>> > > > > > > > * Interface with user space policy to support max
>> > > permissions
>> > > > > > > > * of RWX.
>> > > > > > > > */
>> > > > > > > > prot = PROT_READ | PROT_WRITE;
>> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot,
>> 0);
>> > > > > > > >
>> > > > > > > > if (user space policy allows RWX on dynamically
>> added
>> > > > > pages)
>> > > > > > > > encl_page->vm_max_prot_bits =
>> calc_vm_prot_bits(PROT_READ |
>> > > > > > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > > > > > else
>> > > > > > > > encl_page->vm_max_prot_bits =
>> calc_vm_prot_bits(PROT_READ |
>> > > > > > > > PROT_WRITE, 0);
>> > > > > > > >
>> > > > > > > > The work that follows this series aimed to do the
>> integration
>> > > > > with user
>> > > > > > > > space policy.
>> > > > > > >
>> > > > > > > What do you mean by "user space policy" anyway exactly? I'm
>> > > > > sorry but I
>> > > > > > > just don't fully understand this.
>> > > > > >
>> > > > > > My apologies - I just assumed that you would need no reminder
>> > > > > about this
>> > > > > > contentious
>> > > > > > part of SGX history. Essentially it means that, yes, the
>> > > kernel could
>> > > > > > theoretically
>> > > > > > permit any kind of access to any file/page, but some accesses
>> are
>> > > > > known
>> > > > > > to generally
>> > > > > > be a bad idea - like making memory executable as well as
>> writable
>> > > > > - and
>> > > > > > thus there
>> > > > > > are additional checks based on what user space permits before
>> the
>> > > > > kernel
>> > > > > > allows
>> > > > > > such accesses.
>> > > > > >
>> > > > > > For example,
>> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> > > > > >
>> > > > > > User policy and SGX has seen significant discussion. Some
>> notable
>> > > > > > threads:
>> > > > > >
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> > > > > >
>> https://lore.kernel.org/linux-security-module/[email protected]/
>> > > > > >
>> > > > > > > It's too big of a risk to accept this series without X taken
>> > > care
>> > > > > > > of. Patch
>> > > > > > > series should neither have TODO nor TBD comments IMHO. I
>> > > don't want
>> > > > > > > to ack
>> > > > > > > a series based on speculation what might happen in the
>> future.
>> > > > > >
>> > > > > > ok
>> > > > > >
>> > > > > > >
>> > > > > > > > > I think the best way to move forward would be to do
>> EAUG's
>> > > > > > > > > explicitly with
>> > > > > > > > > an ioctl that could also include secinfo for
>> permissions.
>> > > > > Then you can
>> > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > > > > > >
>> > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could
>> possibly be
>> > > > > used for
>> > > > > > > > this purpose. It already includes SECINFO which may also
>> be
>> > > > > useful if
>> > > > > > > > needing to later support EAUG of PT_SS* pages.
>> > > > > > >
>> > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and
>> > > call it
>> > > > > > > a day.
>> > > > > >
>> > > > > > I could, yes.
>> > > > > >
>> > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES
>> what is
>> > > > > > > this weird
>> > > > > > > thing added to the #PF handler? Why is it added at all then?
>> > > > > >
>> > > > > > I was just speculating in my response, there is no plan to
>> extend
>> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> > > > > >
>> > > > > > > > How this could work is user space calls
>> > > SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > > > > > after enclave initialization on any memory region within
>> the
>> > > > > > > > enclave where
>> > > > > > > > pages are planned to be added dynamically. This ioctl()
>> calls
>> > > > > > > > EAUG to add the
>> > > > > > > > new pages with RW permissions and their vm_max_prot_bits
>> > > can be
>> > > > > > > > set to the
>> > > > > > > > permissions found in the included SECINFO. This will
>> support
>> > > > > > > > later EACCEPTCOPY
>> > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > > > > > >
>> > > > > > > I don't like this type of re-use of the existing API.
>> > > > > >
>> > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
>> > > > > consensus
>> > > > > > after
>> > > > > > considering the user policy question (above) and performance
>> > > trade-off
>> > > > > > (more below).
>> > > > > >
>> > > > > > >
>> > > > > > > > The big question is whether communicating user policy
>> after
>> > > > > > > > enclave initialization
>> > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is
>> acceptable
>> > > > > > > > to all? I would
>> > > > > > > > appreciate a confirmation on this direction considering
>> the
>> > > > > > > > significant history
>> > > > > > > > behind this topic.
>> > > > > > >
>> > > > > > > I have no idea because I don't know what is user space
>> policy.
>> > > > > >
>> > > > > > This discussion is about some enclave usages needing RWX
>> > > permissions
>> > > > > > on dynamically added enclave pages. RWX permissions on
>> dynamically
>> > > > > added
>> > > > > > pages is
>> > > > > > not something that should blindly be allowed for all SGX
>> > > enclaves but
>> > > > > > instead the user
>> > > > > > needs to explicitly allow specific enclaves to have such
>> > > ability. This
>> > > > > > is equivalent
>> > > > > > to (but not the same as) what exists in Linux today with LSM.
>> As
>> > > > > seen in
>> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> Linux
>> > > > > is able
>> > > > > > to make
>> > > > > > files and memory be both writable and executable, but it would
>> > > only do
>> > > > > > so for those
>> > > > > > files and memory that the LSM (which is how user policy is
>> > > > > communicated,
>> > > > > > like SELinux)
>> > > > > > indicates it is allowed, not blindly do so for all files and
>> all
>> > > > > memory.
>> > > > > >
>> > > > > > > > > Putting EAUG to the #PF handler and implicitly call it
>> just
>> > > > > > > > > too flakky and
>> > > > > > > > > hard to make deterministic for e.g. JIT compiler in our
>> use
>> > > > > > > > > case (not to
>> > > > > > > > > mention that JIT is not possible at all because
>> inability to
>> > > > > > > > > do RX pages).
>> > > > > >
>> > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
>> > > > > deterministic
>> > > > > > but from
>> > > > > > what I understand it would have a performance impact since it
>> > > would
>> > > > > > require all memory
>> > > > > > that may be needed by the enclave be pre-allocated from
>> > > outside the
>> > > > > > enclave and not
>> > > > > > just dynamically allocated from within the enclave at the time
>> > > it is
>> > > > > > needed.
>> > > > > >
>> > > > > > Would such a performance impact be acceptable?
>> > > > > >
>> > > > >
>> > > > > User space won't always have enough info to decide whether the
>> pages
>> > > > > to be
>> > > > > EAUG'd immediately. In some cases (shared libraries, JVM for
>> > > > > example) lots
>> > > > > of code/data pages can be mapped but never actually touched. One
>> > > > > enclave/process does not know if any other more important
>> > > > > enclave/process
>> > > > > would need the EPC.
>> > > > >
>> > > > > It should be for kernel to make the final decision as it has
>> overall
>> > > > > picture
>> > > > > of the system EPC usage and availability.
>> > > >
>> > > > EAUG ioctl does not give better capabilities for user space to
>> waste
>> > > > EPC given that EADD ioctl already exists, i.e. your argument is
>> > > logically
>> > > > incorrect.
>> > >
>> > > The point of adding EAUG is to allow more efficient use of EPC
>> pages.
>> > > Without EAUG, enclaves have to EADD everything upfront into EPC,
>> > > consuming
>> > > predetermined number of EPC pages, some of which may not be used at
>> all.
>> > > With EAUG, enclaves should be able to load minimal pages to get
>> started,
>> > > pages added on #PF as they are actually accessed.
>> > >
>> > > Obviously as you pointed out, some usages make more sense to
>> > > pre-EAUG (EAUG
>> > > before #PF). But your proposal of supporting only pre-EAUG here
>> > > essentially
>> > > makes EAUG behave almost the same as EADD. If the current
>> > > implementation
>> > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems
>> possible
>> > > based on Dave's comments), then it is flexible to cover all cases and
>> > > allow
>> > > kernel to optimize allocation of EPC pages.
>> >
>> > There is not even a working #PF based implementation in existence, and
>> > your
>> > argument has too many if's for my taste.
>>
>> 1) if you mean no user space is implementing this kind of solution, read
>> this section, otherwise, skip to 2) below which is only couple of
>> sentences.
>>
>> If you are willing to look, there is already implementation in our SDK
>> to do
>> heap and stack expansion on demand on #PF. Enclaves may not know
>> heap/stack
>> size up front, we have implemented these features to make EPC usage more
>> efficient. I don't know why normal processes can add RAM on #PF, but
>> enclaves adding EPC on #PF becomes so unacceptable concept to you. And
>> the
>> kernel does that for EPC swapping already when #PF happens on a swapped
>> out
>> EPC page.
>
> It adds O(n) round-trips for an mmap() emulation, which can be done in
> O(1)
> round-trips with a ioctl.
>
>> Our implementation has gone through several rounds, the latest is
>> here:https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was
>> also
>> implemented in original OOT driver based SDK implementation. Customers
>> are
>> using it and found them useful. I think this is a critical feature that
>> many
>> other runtimes will also need.
>
> I'm not sure what the common sense argument here is.
>
My (wrong) assumption was that you were disabling EAUG on #PF totally; all I
was saying is that EAUG on #PF is critical for many usages and that disabling
it requires good justification.

But you are expecting an ioctl call for each #PF for those usages:
https://lore.kernel.org/linux-sgx/[email protected]/#t. IIUC, that's
better than disabling it totally but less than optimal. (I have not checked
all call sequences in detail to be sure it would work for all our cases.)


>> 2)
>> It's OK for you to request additional support for your usage and I
>> agree it
>> is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
>> unnecessary. Current implementation can be extended to support your
>> usage.
>> What's the reason you think MAP_POPULATE won't work for you?
>
> I do not recall taking stand on MAP_POPULATE.

Thanks for looking into that. Like I said, that should cover all usages.

2022-03-07 01:34:09

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> Sorry, I missed this.
>
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> >
> > Some Intel folks asked for this quite a while ago. I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
>
> I'm sorry what I don't understand what you mean by ignoring here,
> i.e. cannot fully comprehend the last sentence.
>
> And would the vm_ops->populate() be called right after the existing ones
> involved with the VMA creation process?
>
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> >
> > I'll happily review the patches if anyone can put such a beast together.
>
> I'll start with vm_ops->populate() and check the feedback first for
> that.

I would instead extend populate() in file_operations into:

int (*populate)(struct file *, struct vm_area_struct *, bool populate);

This does not add to memory consumption.

BR, Jarkko

2022-03-10 06:23:37

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
>
> Some Intel folks asked for this quite a while ago. I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.
>
> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
>
> I'll happily review the patches if anyone can put such a beast together.

Everyone would be better off, if EAUG's were done unconditionally for
mmap() after initialization. Nice property is that this needs no core mm
changes.

The resource saving argument is at least a bit weak because you might use
EMODPR for the address range anyway. So you end up doing things just
slower. And to have good confidentiality, you actually probably want to
clear also dynamically added pages with EACCEPTCOPY (and zero page) when
you take them into use.

I find it also a bit worrying that enclave has direct access to allocate
kernel resources and trigger ring-0 opcode. I don't like that part at
all. syscall/ioctl sets the correct barrier, as the host side should be
and is the resource manager, not the enclave.

BR, Jarkko

2022-03-10 10:34:40

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> Hi All,
>
> Regarding the recent update of splitting the page permissions change
> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> one? That is, revert to how it was done in the v1 version?
>
> Why? Currently in Gramine (a library OS for unmodified applications,
> https://gramineproject.io/) with the new proposed change, one needs to
> store the page permission for each page or range of pages. And for every
> request of `mmap` or `mprotect`, Gramine would have to do a lookup of the
> page permissions for the request range and then call the respective IOCTL
> either RESTRICT or RELAX. This seems a little overwhelming.
>
> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> With this approach, we can avoid storing page permissions and simplify
> the implementation.
>
> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows
> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> not sure what will be the performance impact. Is there any data point to
> see the performance impact?
>
> Thanks,
> -Vijay

This should get better in the next version. "relax" is gone. And for
dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
internal vm_max_prot_bits is set to RWX.

I patched the existing series eno

For Enarx I'm using the following patterns.

Shim mmap() handler:
1. Ask host for mmap() syscall.
2. Construct secinfo matching the protection bits.
3. For each page in the address range: EACCEPTCOPY with a
zero page.

Shim mprotect() handler:
1. Ask host for mprotect() syscall.
2. For each page in the address range: EACCEPT with PROT_NONE
secinfo and EMODPE with the secinfo having the prot bits.

Backend mprotect() handler:
1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
range with PROT_NONE.
2. Invoke real mprotect() syscall.

Not super-complicated.

That is the safest way to change permissions, i.e. use EMODPR only to reset
the permissions, and EMODPE as EMODP. Then the page is always either
completely inaccessible or has the correct permissions.

Any other ways to use EMODPR are a bit questionable. That's why I tend to
think that it would be better for the kernel to provide only a limited
version of it to reset the permissions. Most other uses will most likely be
misuse. IMHO there is only one legit pattern to use it, i.e. the "least
racy" pattern.

I would replace SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS with
SGX_IOC_ENCLAVE_RESET_PERMISSIONS that resets pages to PROT_NONE or embed
this straight into mprotect().

BR, Jarkko

2022-03-10 11:32:22

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 10, 2022 at 07:43:42AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> >
> > Some Intel folks asked for this quite a while ago. I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
> >
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> >
> > I'll happily review the patches if anyone can put such a beast together.
>
> Everyone would be better off, if EAUG's were done unconditionally for
> mmap() after initialization. Nice property is that this needs no core mm
> changes.
>
> The resource saving argument is at least a bit weak because you might use
> EMODPR for the address range anyway. So you end up doing things just
> slower. And to have good confidentiality, you actually probably want to
> clear also dynamically added pages with EACCEPTCOPY (and zero page) when
> you take them into use.
>
> I find it also a bit worrying that enclave has direct access to allocate
> kernel resources and trigger ring-0 opcode. I don't like that part at
> all. syscall/ioctl sets the correct barrier, as the host side should be
> and is the resource manager, not the enclave.

Actually, this should be ABI compatible too. I'd expect all kselftests
to continue to work as they are.

BR, Jarkko

2022-03-11 10:27:00

by Haitao Huang

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko

I have some trouble understanding the sequences below.

On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <[email protected]>
wrote:

> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
>> Hi All,
>>
>> Regarding the recent update of splitting the page permissions change
>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>> one? That is, revert to how it was done in the v1 version?
>>
>> Why? Currently in Gramine (a library OS for unmodified applications,
>> https://gramineproject.io/) with the new proposed change, one needs to
>> store the page permission for each page or range of pages. And for every
>> request of `mmap` or `mprotect`, Gramine would have to do a lookup of
>> the
>> page permissions for the request range and then call the respective
>> IOCTL
>> either RESTRICT or RELAX. This seems a little overwhelming.
>>
>> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
>> With this approach, we can avoid storing page permissions and simplify
>> the implementation.
>>
>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
>> flows
>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
>> not sure what will be the performance impact. Is there any data point to
>> see the performance impact?
>>
>> Thanks,
>> -Vijay
>
> This should get better in the next version. "relax" is gone. And for
> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> internal vm_max_prot_bits is set to RWX.
>
> I patched the existing series eno
>
> For Enarx I'm using the following patterns.
>
> Shim mmap() handler:
> 1. Ask host for mmap() syscall.
> 2. Construct secinfo matching the protection bits.
> 3. For each page in the address range: EACCEPTCOPY with a
> zero page.

For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
So this only works for mmap(..., RW) or mmap(...,RWX).

So that gives you pages with RW/RWX.

To change permissions of any of those pages from RW/RWX to R/RX, you need
to call the ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. You
can't just do EMODPE.

So for RW->R, you either:

1) EMODPR(EPCM.NONE)
2) EACCEPT(EPCM.NONE)
3) EMODPE(R) -- not sure this would work as the spec says EMODPE requires
   "Read access permitted by enclave"

or:

1) EMODPR(EPCM.PROT_R)
2) EACCEPT(EPCM.PROT_R)
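
The two RW->R sequences above can be compared with a toy model. This is only a sketch under assumptions taken from this thread: EMODPR can only remove permissions, and both EACCEPT and EMODPE need the page to be readable by the enclave. All toy_* names and TOY_* bit values are hypothetical; nothing here is kernel or SDK code, and real hardware checks much more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits for the model only. */
#define TOY_R 0x1u
#define TOY_W 0x2u
#define TOY_X 0x4u

struct toy_page {
	uint8_t perm;	/* modeled EPCM permission bits */
	bool pr;	/* a restriction is waiting for EACCEPT */
};

/* Model of EMODPR: permissions can only shrink, page awaits EACCEPT. */
void toy_emodpr(struct toy_page *p, uint8_t perm)
{
	p->perm &= perm;
	p->pr = true;
}

/* Model of EACCEPT: assumed to need read access and matching permissions. */
bool toy_eaccept(struct toy_page *p, uint8_t perm)
{
	if (!(p->perm & TOY_R) || !p->pr || p->perm != perm)
		return false;
	p->pr = false;
	return true;
}

/* Model of EMODPE: assumed to need read access, can only add permissions. */
bool toy_emodpe(struct toy_page *p, uint8_t perm)
{
	if (!(p->perm & TOY_R))
		return false;
	p->perm |= perm;
	return true;
}

/* First sequence: EMODPR(NONE), EACCEPT(NONE), EMODPE(R). */
bool toy_rw_to_r_via_none(void)
{
	struct toy_page p = { .perm = TOY_R | TOY_W, .pr = false };

	toy_emodpr(&p, 0);
	return toy_eaccept(&p, 0) && toy_emodpe(&p, TOY_R);
}

/* Second sequence: EMODPR(R), EACCEPT(R). */
bool toy_rw_to_r_direct(void)
{
	struct toy_page p = { .perm = TOY_R | TOY_W, .pr = false };

	toy_emodpr(&p, TOY_R);
	return toy_eaccept(&p, TOY_R) && p.perm == TOY_R;
}
```

Under these assumptions toy_rw_to_r_via_none() returns false, because the page is no longer readable when EACCEPT is attempted, while toy_rw_to_r_direct() returns true.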


> Shim mprotect() handler:
> 1. Ask host for mprotect() syscall.
> 2. For each page in the address range: EACCEPT with PROT_NONE
> secinfo and EMODPE with the secinfo having the prot bits.

EACCEPT requires PTE.R. And EAUG'd pages will always be initialized with
EPCM.RW, so EACCEPT(EPCM.PROT_NONE) will fail with
SGX_PAGE_ATTRIBUTES_MISMATCH.


> Backend mprotect() handler:
> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> range with PROT_NONE.
> 2. Invoke real mprotect() syscall.
>
Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
pages.

> Not super-complicated.
>
> That is the safest way to changes permissions i.e. use EMODPR only to
> reset
> the permissions, and EMODPE as EMODP. Then the page is always either
> inaccessible completely or with the correct permissions.
>
> Any other ways to use EMODPR are a bit questionable. That's why I tend to
> think that it would be better to kernel provide only limited version of
> it
> to reset the permissions. Most of the other use will be most likely
> mis-use. IMHO there is only one legit pattern to use it, i.e. "least
> racy" pattern.
>

I don't see it as "racy" if you copy some data into an RW page and reduce it
to R.
From the kernel's point of view the only difference is EMODPR(NONE) vs
EMODPR(R).

It's more efficient to do just EMODPR(R) than EMODPR(NONE)+ EMODPE(R).


Thanks
Haitao

2022-03-11 20:42:52

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:

> I do not believe that you encountered the #GP documented above because that
> check is already present in the current implementation of
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
>
> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
> if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
> return -EINVAL;
>
> It does return EINVAL which is the catch-all error code used to represent
> invalid input from user space. I am not convinced that EACCES should be used
> instead though, EACCES means "Permission denied", which is not the case here.
> The case here is just an invalid request.
>
> It currently does not prevent the user from setting PROT_NONE though, which
> EMODPR does seem to allow.
>
> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?
>
> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.

Yes, I think we are on the same page with this.

But there is another thing.

As EAUG is taken care of by the page fault handler, so should EMODPR be. It
makes the developer experience a whole lot easier when you don't have to call
back to the host and ask it to execute EMODPR for the range.

It's also a huge inconsistency in this patch set that they are handled
differently.

And it creates a concurrency case for user space that is complicated to say
the least, i.e. dividing the work of executing EMODPR between the host and
the enclave implementation is a nightmare scenario. On the other hand this is
trivial to sort out in the kernel.

So what it means is that, in one way or another, mprotect() needs to be the
melting point for both. This can be called a mandatory requirement, however
this patch set is done, not least because of managing concurrency between
kernel and user space.

You can get that done by these steps:

1. Unmap PTE's in mprotect() flow.
2. In #PF handler, EMODPR with R set.

This is a clear API for the enclave developer because you know what state the
pages are in after mprotect(), and what you still need to do to them. Only
the syscall then needs to be performed by the host side.

BR, Jarkko

2022-03-11 21:27:45

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/11/2022 4:16 AM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
>> On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
>>> Hi Jarkko
>>>
>>> I have some trouble understanding the sequences below.
>>>
>>> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <[email protected]>
>>> wrote:
>>>
>>>> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
>>>>> Hi All,
>>>>>
>>>>> Regarding the recent update of splitting the page permissions change
>>>>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>>>>> one? That is, revert to how it was done in the v1 version?
>>>>>
>>>>> Why? Currently in Gramine (a library OS for unmodified applications,
>>>>> https://gramineproject.io/) with the new proposed change, one needs to
>>>>> store the page permission for each page or range of pages. And for every
>>>>> request of `mmap` or `mprotect`, Gramine would have to do a lookup
>>>>> of the
>>>>> page permissions for the request range and then call the respective
>>>>> IOCTL
>>>>> either RESTRICT or RELAX. This seems a little overwhelming.
>>>>>
>>>>> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
>>>>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
>>>>> With this approach, we can avoid storing page permissions and simplify
>>>>> the implementation.
>>>>>
>>>>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
>>>>> flows
>>>>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
>>>>> not sure what will be the performance impact. Is there any data point to
>>>>> see the performance impact?
>>>>>
>>>>> Thanks,
>>>>> -Vijay
>>>>
>>>> This should get better in the next version. "relax" is gone. And for
>>>> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
>>>> internal vm_max_prot_bits is set to RWX.
>>>>
>>>> I patched the existing series eno
>>>>
>>>> For Enarx I'm using the following patterns.
>>>>
>>>> Shim mmap() handler:
>>>> 1. Ask host for mmap() syscall.
>>>> 2. Construct secinfo matching the protection bits.
>>>> 3. For each page in the address range: EACCEPTCOPY with a
>>>> zero page.
>>>
>>> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
>>> So this only works for mmap(..., RW) or mmap(...,RWX).
>>
>> I use it only with EAUG.
>>
>>> So that gives you pages with RW/RWX.
>>>
>>> To change permissions of any of those pages from RW/RWX to R/RX , you need
>>> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
>>> just do EMODPE.
>>>
>>> so for RW->R, you either:
>>>
>>> 1)EMODPR(EPCM.NONE)
>>> 2)EACCEPT(EPCM.NONE)
>>> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
>>> access permitted by enclave"
>>>
>>> or:
>>>
>>> 1)EMODPR(EPCM.PROT_R)
>>> 2)EACCEPT(EPCM.PROT_R)
>>
>> I checked from SDM and you're correct.
>>
>> Then the appropriate thing is to reset to R.
>>
>>>> Shim mprotect() handler:
>>>> 1. Ask host for mprotect() syscall.
>>>> 2. For each page in the address range: EACCEPT with PROT_NONE
>>>> secinfo and EMODPE with the secinfo having the prot bits.
>>>
>>> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
>>> EPCM.RW,
>>> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
>>
>> Ditto.
>>
>>>> Backend mprotect() handler:
>>>> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
>>>> range with PROT_NONE.
>>>> 2. Invoke real mprotect() syscall.
>>>>
>>> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
>>> pages.
>>
>> Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
>>
>> Reinette, the ioctl should already check that either R or W is set in
>> secinfo and return -EACCES.
>>
>> I.e.
>>
>> (* Check for misconfigured SECINFO flags*)
>> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
>> (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
>> THEN #GP(0); FI;
>>
>> I was testing this and wondering why my enclave #GP's, and then I checked
>> SDM after reading Haitao's response. So clearly check in kernel side is
>> needed.

I do not believe that you encountered the #GP documented above because that
check is already present in the current implementation of
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:

sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
		return -EINVAL;

It does return EINVAL, which is the catch-all error code used to represent
invalid input from user space. I am not convinced that EACCES should be used
instead though: EACCES means "Permission denied", which is not the case here.
The case here is just an invalid request.

It currently does not prevent the user from setting PROT_NONE though, which
EMODPR does seem to allow.

I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
This motivates that EMODPR->PROT_NONE should not be allowed since it would
not be possible to relax permissions (run EMODPE) after that. Even so, I
also found in the SDM that EACCEPT has the note "Read access permitted
by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
from that perspective either since the enclave will not be able to
EACCEPT the change. Does that match your understanding?

I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.
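
A minimal sketch of what the combined validation could look like once the check for R is added. It is not the patch itself: toy_check_restrict_perm() and the TOY_SECINFO_* values are hypothetical stand-ins for illustration, and the real helper operates on a user-provided secinfo rather than a bare bitmask.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the secinfo permission bits. */
#define TOY_SECINFO_R 0x1u
#define TOY_SECINFO_W 0x2u
#define TOY_SECINFO_X 0x4u
#define TOY_SECINFO_PERM_MASK \
	(TOY_SECINFO_R | TOY_SECINFO_W | TOY_SECINFO_X)

/* Returns 0 if perm is acceptable for a restrict request, else -EINVAL. */
int toy_check_restrict_perm(uint64_t perm)
{
	/* Reject bits outside of R, W and X. */
	if (perm & ~(uint64_t)TOY_SECINFO_PERM_MASK)
		return -EINVAL;

	/* Existing check: W without R is an invalid combination. */
	if ((perm & TOY_SECINFO_W) && !(perm & TOY_SECINFO_R))
		return -EINVAL;

	/*
	 * Proposed check: always require R so that the enclave can still
	 * EACCEPT the change and later run EMODPE. This also covers the
	 * W-without-R case; both are kept to mirror the discussion.
	 */
	if (!(perm & TOY_SECINFO_R))
		return -EINVAL;

	return 0;
}
```

With this sketch a request for no permissions at all is rejected in the same way as W without R, so the caller sees EINVAL in both cases.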

> I would consider also adding such check "add pages". It's our least common
> denominator.
>
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Adding pages without permissions to an enclave does not seem practical. I
do not know if there are such usages. I can add this as a separate change for
consideration.

Reinette

2022-03-11 21:27:48

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/11/2022 10:11 AM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>
>> I do not believe that you encountered the #GP documented above because that
>> check is already present in the current implementation of
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
>>
>> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
>> if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
>> return -EINVAL;
>>
>> It does return EINVAL which is the catch-all error code used to represent
>> invalid input from user space. I am not convinced that EACCES should be used
>> instead though, EACCES means "Permission denied", which is not the case here.
>> The case here is just an invalid request.
>>
>> It currently does not prevent the user from setting PROT_NONE though, which
>> EMODPR does seem to allow.
>>
>> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
>> This motivates that EMODPR->PROT_NONE should not be allowed since it would
>> not be possible to relax permissions (run EMODPE) after that. Even so, I
>> also found in the SDM that EACCEPT has the note "Read access permitted
>> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
>> from that perspective either since the enclave will not be able to
>> EACCEPT the change. Does that match your understanding?
>>
>> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.
>
> Yes, I think we are in the same line with this.
>
> But there is another thing.
>
> As EAUG is taken care by the page handler so should EMODPR. It makes the
> developer experience whole a lot easier when you don't have to back call
> to host and ask it to execute EMODPR for the range.
>
> It's also a huge inconsistency in this patch set that they are handled
> differently.
>
> And it creates a concurrency case for user space that is complicated to say
> the least, i.e. divided work between host and enclave implementation to
> execute EMODPR is a nightmare scenario. On the other hand this is trivial
> to sort out in kernel.

EMODPR has possible failures due to state that is managed by the user space
runtime. Being able to communicate accurate EMODPR error codes to user space
runtime is helpful to the runtime in supporting its management of the enclave
memory. Accurate EMODPR error codes can be communicated when using an ioctl(),
not when run from within a page fault handler.

> So what it means is that, in one way or another, mprotect() needs to be the
> melting point for both.

mprotect() is the syscall to modify VMA permissions. EPCM permissions are
different from VMA permissions and they are currently treated differently
by the kernel.

Moving EPCM permission changes to mprotect() forces EPCM permissions to be
the same as VMA permissions. That is a significant change. It is also
inconsistent since EPCM permission changes cannot be managed completely
from the kernel since the kernel can only ever restrict permissions.

> This can be called mandatory requirement, however
> this patch set it done, not least because of managing concurrency between
> kernel and user space.
>
> You can get that done by these steps:
>
> 1. Unmap PTE's in mprotect() flow.
> 2. In #PF handler, EMODPR with R set.

There is also the very significant ETRACK flow that
needs to be run after EMODPR. The implications of sending IPIs to all
CPUs that may be running in an enclave while in a page fault handler need
to be considered. Page faults should be as fast as possible.

If this is considered then this tremendous impact on the page fault handler
should be managed and avoided as much as possible - but how will the page
fault handler even know when it should run EMODPR? The enclave can run
EMODPE from within the enclave at any time without any insight from the
kernel so the only way to have accurate permissions would then be to
run EMODPR on _every_ page fault, which is obviously a non-starter due
to the significant impact (EMODPR and ETRACK) and blast radius (IPIs).

Trying to run EMODPR earlier, during the mprotect() call itself, is also
full of obstacles. The mprotect() call may result in VMAs being split, which
is an operation that can fail, and it would be followed by the EMODPR-ETRACK
flows that can also fail (without being able to undo the VMA splits).
Because the EMODPR-ETRACK flows can fail, it is also not possible here to
communicate accurately to user space, since there is now a whole page range
to consider: mprotect() cannot communicate (a) which pages caused the
failure, and (b) what failure was encountered. This is possible when using
the ioctl().
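
As an illustration of how an ioctl()-style interface can report partial success over a range, here is a sketch. The struct layout, the toy_* names and the status value are all hypothetical and are not the ABI of this patch; the only thing taken from the patch description is the idea of returning an error together with a count of successfully changed pages and a hardware result code.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical request: inputs describe the range, outputs the outcome. */
struct toy_restrict_req {
	uint64_t offset;	/* in: start of the range */
	uint64_t length;	/* in: length of the range in bytes */
	uint64_t result;	/* out: status reported for the failing page */
	uint64_t count;		/* out: number of pages changed successfully */
};

/* Per-page operation: returns 0 on success, a non-zero status on failure. */
typedef uint64_t (*toy_page_op)(uint64_t page_offset);

/* Walk the range page by page and stop at the first failure. */
int toy_restrict_range(struct toy_restrict_req *req, toy_page_op op)
{
	uint64_t done;

	req->result = 0;
	req->count = 0;

	for (done = 0; done < req->length; done += TOY_PAGE_SIZE) {
		uint64_t status = op(req->offset + done);

		if (status) {
			/* The caller learns where and why it stopped. */
			req->result = status;
			return -1;
		}
		req->count++;
	}

	return 0;
}

/* Demo operation: pretend the third page of the enclave fails. */
uint64_t toy_fail_third_page(uint64_t page_offset)
{
	return page_offset == 2 * TOY_PAGE_SIZE ? 7 : 0;
}
```

A caller that sees a failure can use count to locate the first page that was not changed and result to decide how to recover, which is information a plain mprotect() return value has no room for.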


> This clear API for enclave developer because you know in what state pages
> are after mprotect(), and what you need to still do to them. Only the
> syscall needs to be them performed by the host side.

Supporting permission restriction in an ioctl() enables the runtime to manage
the enclave memory without needing to map it.

I have considered the idea of supporting the permission restriction with
mprotect() but as you can see in this response I did not find it to be
practical.

Reinette

2022-03-11 22:19:12

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 11, 2022 at 02:16:47PM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> > > Hi Jarkko
> > >
> > > I have some trouble understanding the sequences below.
> > >
> > > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <[email protected]>
> > > wrote:
> > >
> > > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > > > Hi All,
> > > > >
> > > > > Regarding the recent update of splitting the page permissions change
> > > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > > > one? That is, revert to how it was done in the v1 version?
> > > > >
> > > > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > > > https://gramineproject.io/) with the new proposed change, one needs to
> > > > > store the page permission for each page or range of pages. And for every
> > > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > > > of the
> > > > > page permissions for the request range and then call the respective
> > > > > IOCTL
> > > > > either RESTRICT or RELAX. This seems a little overwhelming.
> > > > >
> > > > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> > > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > > > With this approach, we can avoid storing page permissions and simplify
> > > > > the implementation.
> > > > >
> > > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > > > flows
> > > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > > > not sure what will be the performance impact. Is there any data point to
> > > > > see the performance impact?
> > > > >
> > > > > Thanks,
> > > > > -Vijay
> > > >
> > > > This should get better in the next version. "relax" is gone. And for
> > > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > > > internal vm_max_prot_bits is set to RWX.
> > > >
> > > > I patched the existing series eno
> > > >
> > > > For Enarx I'm using the following patterns.
> > > >
> > > > Shim mmap() handler:
> > > > 1. Ask host for mmap() syscall.
> > > > 2. Construct secinfo matching the protection bits.
> > > > 3. For each page in the address range: EACCEPTCOPY with a
> > > > zero page.
> > >
> > > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> > > So this only works for mmap(..., RW) or mmap(...,RWX).
> >
> > I use it only with EAUG.
> >
> > > So that gives you pages with RW/RWX.
> > >
> > > To change permissions of any of those pages from RW/RWX to R/RX , you need
> > > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> > > just do EMODPE.
> > >
> > > so for RW->R, you either:
> > >
> > > 1)EMODPR(EPCM.NONE)
> > > 2)EACCEPT(EPCM.NONE)
> > > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> > > access permitted by enclave"
> > >
> > > or:
> > >
> > > 1)EMODPR(EPCM.PROT_R)
> > > 2)EACCEPT(EPCM.PROT_R)
> >
> > I checked from SDM and you're correct.
> >
> > Then the appropriate thing is to reset to R.
> >
> > > > Shim mprotect() handler:
> > > > 1. Ask host for mprotect() syscall.
> > > > 2. For each page in the address range: EACCEPT with PROT_NONE
> > > > secinfo and EMODPE with the secinfo having the prot bits.
> > >
> > > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> > > EPCM.RW,
> > > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
> >
> > Ditto.
> >
> > > > Backend mprotect() handler:
> > > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> > > > range with PROT_NONE.
> > > > 2. Invoke real mprotect() syscall.
> > > >
> > > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> > > pages.
> >
> > Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
> >
> > Reinette, the ioctl should already check that either R or W is set in
> > secinfo and return -EACCES.
> >
> > I.e.
> >
> > (* Check for misconfigured SECINFO flags*)
> > IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
> > (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
> > THEN #GP(0); FI;
> >
> > I was testing this and wondering why my enclave #GP's, and then I checked
> > SDM after reading Haitao's response. So clearly check in kernel side is
> > needed.
>
> I would consider also adding such check "add pages". It's our least common
> denominator.
>
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Since EAUG is already done in the #PF handler, EMODPR must be too. Otherwise
we do things inconsistently [*]. Having one in the #PF handler and the other
in an ioctl is unacceptable.

Moving EMODPR to the #PF handler would be trivial:

1. In the mprotect() callback, unmap the PTEs for the range.
2. In the #PF handler, EMODPR with read permissions.

This is something that user space could understand. The only API ever
required for permission changes would be EMODPE. You could basically
implement the whole thing for EPCM inside the enclave with no ioctls
required.

That would leave only these ioctls in the series:
1. SGX_IOC_ENCLAVE_MODIFY_TYPE
2. SGX_IOC_ENCLAVE_REMOVE_PAGES

[*] For me, sticking to the #PF handler for EAUG is fine for the first
mainline version. The API side is far more critical.

BR, Jarkko

2022-03-11 23:10:50

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> > Hi Jarkko
> >
> > I have some trouble understanding the sequences below.
> >
> > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <[email protected]>
> > wrote:
> >
> > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > > Hi All,
> > > >
> > > > Regarding the recent update of splitting the page permissions change
> > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > > one? That is, revert to how it was done in the v1 version?
> > > >
> > > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > > https://gramineproject.io/) with the new proposed change, one needs to
> > > > store the page permission for each page or range of pages. And for every
> > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > > of the
> > > > page permissions for the request range and then call the respective
> > > > IOCTL
> > > > either RESTRICT or RELAX. This seems a little overwhelming.
> > > >
> > > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > > With this approach, we can avoid storing page permissions and simplify
> > > > the implementation.
> > > >
> > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > > flows
> > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > > not sure what will be the performance impact. Is there any data point to
> > > > see the performance impact?
> > > >
> > > > Thanks,
> > > > -Vijay
> > >
> > > This should get better in the next version. "relax" is gone. And for
> > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > > internal vm_max_prot_bits is set to RWX.
> > >
> > > I patched the existing series eno
> > >
> > > For Enarx I'm using the following patterns.
> > >
> > > Shim mmap() handler:
> > > 1. Ask host for mmap() syscall.
> > > 2. Construct secinfo matching the protection bits.
> > > 3. For each page in the address range: EACCEPTCOPY with a
> > > zero page.
> >
> > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> > So this only works for mmap(..., RW) or mmap(...,RWX).
>
> I use it only with EAUG.
>
> > So that gives you pages with RW/RWX.
> >
> > To change permissions of any of those pages from RW/RWX to R/RX , you need
> > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> > just do EMODPE.
> >
> > so for RW->R, you either:
> >
> > 1)EMODPR(EPCM.NONE)
> > 2)EACCEPT(EPCM.NONE)
> > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> > access permitted by enclave"
> >
> > or:
> >
> > 1)EMODPR(EPCM.PROT_R)
> > 2)EACCEPT(EPCM.PROT_R)
>
> I checked from SDM and you're correct.
>
> Then the appropriate thing is to reset to R.
>
> > > Shim mprotect() handler:
> > > 1. Ask host for mprotect() syscall.
> > > 2. For each page in the address range: EACCEPT with PROT_NONE
> > > secinfo and EMODPE with the secinfo having the prot bits.
> >
> > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> > EPCM.RW,
> > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
>
> Ditto.
>
> > > Backend mprotect() handler:
> > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> > > range with PROT_NONE.
> > > 2. Invoke real mprotect() syscall.
> > >
> > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> > pages.
>
> Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
>
> Reinette, the ioctl should already check that either R or W is set in
> secinfo and return -EACCES.
>
> I.e.
>
> (* Check for misconfigured SECINFO flags*)
> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
> (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
> THEN #GP(0); FI;
>
> I was testing this and wondering why my enclave #GP's, and then I checked
> SDM after reading Haitao's response. So clearly check in kernel side is
> needed.

I would also consider adding such a check to "add pages". It's our least
common denominator.

If we can assume that at least R is there for every enclave page, then that
gives an invariant that enables EMODPR with R all the time.

BR, Jarkko

2022-03-11 23:29:53

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> Hi Jarkko
>
> I have some trouble understanding the sequences below.
>
> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > Hi All,
> > >
> > > Regarding the recent update of splitting the page permissions change
> > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > one? That is, revert to how it was done in the v1 version?
> > >
> > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > https://gramineproject.io/) with the new proposed change, one needs to
> > > store the page permission for each page or range of pages. And for every
> > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > of the
> > > page permissions for the request range and then call the respective
> > > IOCTL
> > > either RESTRICT or RELAX. This seems a little overwhelming.
> > >
> > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > With this approach, we can avoid storing page permissions and simplify
> > > the implementation.
> > >
> > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > flows
> > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > not sure what will be the performance impact. Is there any data point to
> > > see the performance impact?
> > >
> > > Thanks,
> > > -Vijay
> >
> > This should get better in the next version. "relax" is gone. And for
> > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > internal vm_max_prot_bits is set to RWX.
> >
> > I patched the existing series eno
> >
> > For Enarx I'm using the following patterns.
> >
> > Shim mmap() handler:
> > 1. Ask host for mmap() syscall.
> > 2. Construct secinfo matching the protection bits.
> > 3. For each page in the address range: EACCEPTCOPY with a
> > zero page.
>
> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> So this only works for mmap(..., RW) or mmap(...,RWX).

I use it only with EAUG.

> So that gives you pages with RW/RWX.
>
> To change permissions of any of those pages from RW/RWX to R/RX , you need
> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> just do EMODPE.
>
> so for RW->R, you either:
>
> 1)EMODPR(EPCM.NONE)
> 2)EACCEPT(EPCM.NONE)
> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> access permitted by enclave"
>
> or:
>
> 1)EMODPR(EPCM.PROT_R)
> 2)EACCEPT(EPCM.PROT_R)

I checked from SDM and you're correct.

Then the appropriate thing is to reset to R.

> > Shim mprotect() handler:
> > 1. Ask host for mprotect() syscall.
> > 2. For each page in the address range: EACCEPT with PROT_NONE
> > secinfo and EMODPE with the secinfo having the prot bits.
>
> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> EPCM.RW,
> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.

Ditto.

> > Backend mprotect() handler:
> > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> > range with PROT_NONE.
> > 2. Invoke real mprotect() syscall.
> >
> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> pages.

Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.

Reinette, the ioctl should already check that either R or W is set in
secinfo and return -EACCES.

I.e.

(* Check for misconfigured SECINFO flags*)
IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
(SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
THEN #GP(0); FI;

I was testing this and wondering why my enclave #GP's, and then I checked
the SDM after reading Haitao's response. So a check on the kernel side is
clearly needed.
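
As a rough illustration of such a check (not code from the patch series), the
sketch below mirrors the SDM condition quoted above. The bit names, the
helper name restrict_perm_check() and the choice of -EINVAL for unknown bits
are assumptions made for this example; only the -EACCES return for "W without
R" comes from the text.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative permission bits; the names are assumptions for this sketch. */
#define SKETCH_PERM_R    (UINT64_C(1) << 0)
#define SKETCH_PERM_W    (UINT64_C(1) << 1)
#define SKETCH_PERM_X    (UINT64_C(1) << 2)
#define SKETCH_PERM_MASK (SKETCH_PERM_R | SKETCH_PERM_W | SKETCH_PERM_X)

/*
 * Hypothetical helper: reject a request that the quoted SDM check would
 * turn into #GP(0), before ENCLS[EMODPR] is ever executed.
 */
static inline int restrict_perm_check(uint64_t perm)
{
	/* Stand-in for "reserved fields are not zero". */
	if (perm & ~SKETCH_PERM_MASK)
		return -EINVAL;

	/* FLAGS.R is 0 and FLAGS.W is not 0. */
	if (!(perm & SKETCH_PERM_R) && (perm & SKETCH_PERM_W))
		return -EACCES;

	return 0;
}
```

A stricter policy that requires R on every request would replace the second
test with a plain check for the R bit.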

BR, Jarkko

2022-03-14 07:32:30

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 05:45:48AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > > Supporting permission restriction in an ioctl() enables the runtime to manage
> > > the enclave memory without needing to map it.
> >
> > Which is opposite what you do in EAUG. You can also augment pages without
> > needing the map them. Sure you get that capability, but it is quite useless
> > in practice.
>
> Essentially you are tuning for a niche artifical use case over the common
> case that most people end up doing. It makes no sense.

Also it is important to remember why EMODPR is there: not to provide a
useful control mechanism or to enable interesting applications for SGX. It's
there because of hardware constraints. Therefore it should be used
accordingly, and its interface should certainly not be fully exposed to user
space.

Without hardware constraints, we would have only in-enclave EMODP.

It is essentially a reset mechanism for EPCM, no more and no less. Therefore,
it should be used as such: pick a *fixed* value to which the EPCM permissions
of the mapped range are reset. I think PROT_READ is the sanest choice of the
available options. Then, EMODPE can be used for the most part just like "EMODP".
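
The reset-then-extend idea can be sketched with a toy model of one page's
EPCM permission bits. This is a simplified illustration, not kernel or
enclave code: the struct and function names are invented for the example, the
model assumes EMODPR only clears bits and EMODPE only sets them, and EACCEPT
and ETRACK are left out.

```c
#include <stdint.h>

#define EPCM_R (1u << 0)
#define EPCM_W (1u << 1)
#define EPCM_X (1u << 2)

/* Toy model of one page's EPCM permission bits (hypothetical type). */
struct epcm_model {
	uint8_t perm;
};

/* Modelled EMODPR: can only clear bits, new = old & requested. */
static inline void model_emodpr(struct epcm_model *p, uint8_t requested)
{
	p->perm &= requested;
}

/* Modelled EMODPE: can only set bits, new = old | requested. */
static inline void model_emodpe(struct epcm_model *p, uint8_t requested)
{
	p->perm |= requested;
}

/*
 * mprotect()-like change: the kernel side resets to the fixed value R,
 * then the enclave extends to the permissions it actually wants.
 */
static inline uint8_t model_change_perm(struct epcm_model *p, uint8_t wanted)
{
	model_emodpr(p, EPCM_R);		/* fixed reset value */
	model_emodpe(p, wanted | EPCM_R);	/* R is always kept */
	return p->perm;
}
```

With this model a page that starts as RW ends up as RX after
model_change_perm(&page, EPCM_R | EPCM_X), without the caller ever choosing
the value passed to the modelled EMODPR.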

Please do not fully expose EMODPR to user space. It's a Pandora's box
of misbehaviour and a way of shooting yourself in the foot.

BR, Jarkko

2022-03-14 07:58:27

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> Supporting permission restriction in an ioctl() enables the runtime to manage
> the enclave memory without needing to map it.

Which is the opposite of what you do in EAUG. You can also augment pages
without needing to map them. Sure, you get that capability, but it is quite
useless in practice.

> I have considered the idea of supporting the permission restriction with
> mprotect() but as you can see in this response I did not find it to be
> practical.

Where is it practical? What is your application? How is it practical to
delegate the concurrency management of a split mprotect() to user space?
How do we get rid of a useless up-call to the host?

> Reinette

BR, Jarkko

2022-03-14 12:34:53

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> >
> > > I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> > > This motivates that EMODPR->PROT_NONE should not be allowed since it would
> > > not be possible to relax permissions (run EMODPE) after that. Even so, I
> > > also found in the SDM that EACCEPT has the note "Read access permitted
> > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> > > from that perspective either since the enclave will not be able to
> > > EACCEPT the change. Does that match your understanding?
> >
> > Yes, PROT_NONE should not be allowed.
> >
> > This is however the real problem.
> >
> > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > simply unacceptable. It also requires more concurrency management from
> > user space run-time, which would be heck a lot easier to do in the kernel.
> >
> > If you really want EMODPR as ioctl, then for consistencys sake, then EAUG
> > should be too. Like this when things go opposite directions, this patch set
> > plain and simply will not work out.
> >
> > I would pick EAUG's strategy from these two as it requires half the back
> > calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
> > either in the #PF handler or as part of mprotect(), which ever suits you
> > best.
> >
> > I'll try demonstrate this with two examples.
> >
> > mmap() could go something like this() (simplified):
> > 1. Execution #UD's to SYSCALL.
> > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > 3. Enclave up-calls host's mmap().
> > 4. Loops the range with EACCEPTCOPY.
> >
> > mprotect() has to be done like this:
> > 1. Execution #UD's to SYSCALL.
> > 2. Host calls enclave's mprotect() handler.
> > 3. Enclave up-calls host's mprotect().
> > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> > 3. Loops the range with EACCEPT.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 5. Loops the range with EACCEPT + EMODPE.
>
> > This is just terrible IMHO. I hope these examples bring some insight.

E.g. in Enarx we have to add a special up-call (a so-called enarxcall, in the
intermediate layer that we call sallyport, which provides a shared buffer to
communicate with the enclave) just for resetting the range with PROT_READ.
It feels very redundant, adds ugly cruft, and is the complete opposite of the
strategy you've chosen for EAUG, which I think is the correct choice as far
as the API is concerned.

BR, Jarkko

2022-03-14 12:45:25

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>
> > I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> > This motivates that EMODPR->PROT_NONE should not be allowed since it would
> > not be possible to relax permissions (run EMODPE) after that. Even so, I
> > also found in the SDM that EACCEPT has the note "Read access permitted
> > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> > from that perspective either since the enclave will not be able to
> > EACCEPT the change. Does that match your understanding?
>
> Yes, PROT_NONE should not be allowed.
>
> This is however the real problem.
>
> The current kernel patch set has inconsistent API and EMODPR ioctl is
> simply unacceptable. It also requires more concurrency management from
> user space run-time, which would be heck a lot easier to do in the kernel.
>
> If you really want EMODPR as ioctl, then for consistencys sake, then EAUG
> should be too. Like this when things go opposite directions, this patch set
> plain and simply will not work out.
>
> I would pick EAUG's strategy from these two as it requires half the back
> calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
> either in the #PF handler or as part of mprotect(), which ever suits you
> best.
>
> I'll try demonstrate this with two examples.
>
> mmap() could go something like this() (simplified):
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mmap() handler with mmap() parameters.
> 3. Enclave up-calls host's mmap().
> 4. Loops the range with EACCEPTCOPY.
>
> mprotect() has to be done like this:
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mprotect() handler.
> 3. Enclave up-calls host's mprotect().
> 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> 3. Loops the range with EACCEPT.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5. Loops the range with EACCEPT + EMODPE.

> This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko

2022-03-14 17:24:03

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>> Supporting permission restriction in an ioctl() enables the runtime to manage
>> the enclave memory without needing to map it.
>
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.
>
>> I have considered the idea of supporting the permission restriction with
>> mprotect() but as you can see in this response I did not find it to be
>> practical.
>
> Where is it practical? What is your application? How is it practical to
> delegate the concurrency management of a split mprotect() to user space?
> How do we get rid off a useless up-call to the host?
>

The email you responded to contained many obstacles against using mprotect()
but you chose to ignore them and snipped them all from your response. Could
you please address the issues instead of dismissing them?

Reinette

2022-03-14 23:12:33

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:

> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?

Yes, PROT_NONE should not be allowed.

This is however the real problem.

The current kernel patch set has an inconsistent API, and the EMODPR ioctl is
simply unacceptable. It also requires more concurrency management from the
user space run-time, which would be a heck of a lot easier to do in the kernel.

If you really want EMODPR as an ioctl, then for consistency's sake EAUG
should be one too. As it stands, with the two going in opposite directions,
this patch set plainly and simply will not work out.

I would pick EAUG's strategy of the two, as it requires half the calls
back to the host from an enclave. I.e. please combine mprotect() and EMODPR,
either in the #PF handler or as part of mprotect(), whichever suits you
best.

I'll try to demonstrate this with two examples.

mmap() could go something like this (simplified):
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mmap() handler with mmap() parameters.
3. Enclave up-calls host's mmap().
4. Loops the range with EACCEPTCOPY.

mprotect() has to be done like this:
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mprotect() handler.
3. Enclave up-calls host's mprotect().
4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
3. Loops the range with EACCEPT.

This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko

2022-03-15 05:12:13

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko

On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <[email protected]>
wrote:

> On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> >
>> > > I saw Haitao's note that EMODPE requires "Read access permitted by
>> enclave".
>> > > This motivates that EMODPR->PROT_NONE should not be allowed since
>> it would
>> > > not be possible to relax permissions (run EMODPE) after that. Even
>> so, I
>> > > also found in the SDM that EACCEPT has the note "Read access
>> permitted
>> > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not
>> practical
>> > > from that perspective either since the enclave will not be able to
>> > > EACCEPT the change. Does that match your understanding?
>> >
>> > Yes, PROT_NONE should not be allowed.
>> >
>> > This is however the real problem.
>> >
>> > The current kernel patch set has inconsistent API and EMODPR ioctl is
>> > simply unacceptable. It also requires more concurrency management
>> from
>> > user space run-time, which would be heck a lot easier to do in the
>> kernel.
>> >
>> > If you really want EMODPR as ioctl, then for consistencys sake, then
>> EAUG
>> > should be too. Like this when things go opposite directions, this
>> patch set
>> > plain and simply will not work out.
>> >
>> > I would pick EAUG's strategy from these two as it requires half the
>> back
>> > calls to host from an enclave. I.e. please combine mprotect() and
>> EMODPR,
>> > either in the #PF handler or as part of mprotect(), which ever suits
>> you
>> > best.
>> >
>> > I'll try demonstrate this with two examples.
>> >
>> > mmap() could go something like this() (simplified):
>> > 1. Execution #UD's to SYSCALL.
>> > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > 3. Enclave up-calls host's mmap().
>> > 4. Loops the range with EACCEPTCOPY.
>> >
>> > mprotect() has to be done like this:
>> > 1. Execution #UD's to SYSCALL.
>> > 2. Host calls enclave's mprotect() handler.
>> > 3. Enclave up-calls host's mprotect().
>> > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.

I assume up-calls here are what we call ocalls in our implementation,
i.e. the calls an enclave makes to the untrusted side via EEXIT.

If so, can your implementation combine these two up-calls into one, so that
the host side just does the ioctl() and mprotect() to the kernel? If so,
would that address your concern about extra up-calls?
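
A minimal sketch of such a combined host-side handler follows. Everything in
it is hypothetical: the host_ops callbacks stand in for the real ioctl() and
mprotect() calls so that the example does not depend on any particular ioctl
structure layout, and the fake_* callbacks exist only to show the call order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side operations, injected by the caller. */
struct host_ops {
	int (*restrict_perm)(void *ctx, uintptr_t addr, size_t len, int prot);
	int (*mprotect)(void *ctx, uintptr_t addr, size_t len, int prot);
	void *ctx;
};

/*
 * Handle one up-call (ocall) from the enclave: restrict the EPCM
 * permissions first, then update the VMA/PTE permissions.
 * Returns the first error encountered, or 0.
 */
static inline int host_handle_mprotect(const struct host_ops *ops,
				       uintptr_t addr, size_t len, int prot)
{
	int ret;

	ret = ops->restrict_perm(ops->ctx, addr, len, prot);
	if (ret)
		return ret;

	return ops->mprotect(ops->ctx, addr, len, prot);
}

/* Stand-in callbacks for demonstration: they only record the call order. */
struct call_log {
	int n;
	char order[4];
};

static inline int fake_restrict(void *ctx, uintptr_t addr, size_t len, int prot)
{
	struct call_log *log = ctx;

	(void)addr; (void)len; (void)prot;
	log->order[log->n++] = 'R';
	return 0;
}

static inline int fake_mprotect(void *ctx, uintptr_t addr, size_t len, int prot)
{
	struct call_log *log = ctx;

	(void)addr; (void)len; (void)prot;
	log->order[log->n++] = 'M';
	return 0;
}
```

The enclave would then leave via EEXIT once per permission change instead of
twice, and still loop over the range with EACCEPT afterwards.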


>> > 3. Loops the range with EACCEPT.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 5. Loops the range with EACCEPT + EMODPE.
>>
>> > This is just terrible IMHO. I hope these examples bring some insight.
>
> E.g. in Enarx we have to add a special up-call (so called enarxcall in
> intermediate that we call sallyport, which provides shared buffer to
> communicate with the enclave) just for reseting the range with PROT_READ.
> Feel very redundant, adds ugly cruft and is completely opposite strategy
> to
> what you've chosen to do with EAUG, which is I think correct choice as
> far
> as API is concerned.

The problem with EMODPR on #PF is that the kernel needs to know which
permissions the enclave requested at the time of the #PF. So the enclave has
to make at least one call to the kernel (again via an ocall in our case, I
assume an up-call in your case) to make the change.

The enclave runtime may not know the permissions until upper-layer
application code (a JIT or some kind of code loader) makes the decision to
change them. And the ocalls/up-calls can only be done at that time, not
upfront as with mmap, which is only used to reserve ranges.

I also see this model as consistent with what the kernel does for regular
memory mappings: adding physical pages on #PF or pre-fault, and changing PTE
permissions only after mprotect is called.

I would agree with/prefer combining mprotect and the ioctl() for EMODPR, but
Reinette pointed out some issues above with managing VMAs and handling
errors in that approach.

BR
Haitao

2022-03-15 16:00:26

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > Supporting permission restriction in an ioctl() enables the runtime to manage
> > the enclave memory without needing to map it.
>
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.

Essentially you are tuning for a niche artificial use case over the common
case that most people end up doing. It makes no sense.

BR, Jarkko

2022-03-17 06:41:47

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >> Supporting permission restriction in an ioctl() enables the runtime to manage
> >> the enclave memory without needing to map it.
> >
> > Which is opposite what you do in EAUG. You can also augment pages without
> > needing the map them. Sure you get that capability, but it is quite useless
> > in practice.
> >
> >> I have considered the idea of supporting the permission restriction with
> >> mprotect() but as you can see in this response I did not find it to be
> >> practical.
> >
> > Where is it practical? What is your application? How is it practical to
> > delegate the concurrency management of a split mprotect() to user space?
> > How do we get rid off a useless up-call to the host?
> >
>
> The email you responded to contained many obstacles against using mprotect()
> but you chose to ignore them and snipped them all from your response. Could
> you please address the issues instead of dismissing them?

I did read the whole email but did not see anything that would make a case
for a fully exposed EMODPR, or for being asymmetrical with how EAUG works.

I had the same discussion with Haitao about PROT_NONE earlier, and am
fully aware that PROT_READ is required.

BR, Jarkko

2022-03-17 06:41:49

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> I also see this model as consistent to what kernel does for regular memory
> mappings: adding physical pages on #PF or pre-fault and changing PTE
> permissions only after mprotect is called.

And you were against this in EAUG's case. As in EAUG's case,
EMODPR could be done as part of the mprotect() flow.

BR, Jarkko

2022-03-17 06:47:20

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> Hi Jarkko
>
> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > >
> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > by enclave".
> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > since it would
> > > > > not be possible to relax permissions (run EMODPE) after that.
> > > Even so, I
> > > > > also found in the SDM that EACCEPT has the note "Read access
> > > permitted
> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > not practical
> > > > > from that perspective either since the enclave will not be able to
> > > > > EACCEPT the change. Does that match your understanding?
> > > >
> > > > Yes, PROT_NONE should not be allowed.
> > > >
> > > > This is however the real problem.
> > > >
> > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > simply unacceptable. It also requires more concurrency management
> > > from
> > > > user space run-time, which would be heck a lot easier to do in the
> > > kernel.
> > > >
> > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > then EAUG
> > > > should be too. Like this when things go opposite directions, this
> > > patch set
> > > > plain and simply will not work out.
> > > >
> > > > I would pick EAUG's strategy from these two as it requires half
> > > the back
> > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > EMODPR,
> > > > either in the #PF handler or as part of mprotect(), which ever
> > > suits you
> > > > best.
> > > >
> > > > I'll try demonstrate this with two examples.
> > > >
> > > > mmap() could go something like this() (simplified):
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > 3. Enclave up-calls host's mmap().
> > > > 4. Loops the range with EACCEPTCOPY.
> > > >
> > > > mprotect() has to be done like this:
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mprotect() handler.
> > > > 3. Enclave up-calls host's mprotect().
> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
>
> I assume up-calls here are ocalls as we call them in our implementation,
> which are the calls enclave make to untrusted side via EEXIT.
>
> If so, can your implementation combine this two up-calls into one, then host
> side just do ioctl() and mprotect to kernel? If so, would that address your
> concern about extra up-calls?
>
>
> > > > 3. Loops the range with EACCEPT.
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 5. Loops the range with EACCEPT + EMODPE.
> > >
> > > > This is just terrible IMHO. I hope these examples bring some insight.
> >
> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > intermediate that we call sallyport, which provides shared buffer to
> > communicate with the enclave) just for reseting the range with PROT_READ.
> > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > to
> > what you've chosen to do with EAUG, which is I think correct choice as
> > far
> > as API is concerned.
>
> The problem with EMODPR on #PF is that kernel needs to know what permissions
> requested from enclave at the time of #PF. So enclave has to make at least
> one call to kernel (again via ocall in our case, I assume up-call in your
> case) to make the change.

Your security scheme is broken if permissions are requested from outside the
enclave, i.e. if the hostile environment controls the permissions. The request
should always come from the enclave, and the enclave uses EACCEPT* to validate
that what was given to EMODPR, EAUG and EMODT matches its expectations.

An upper layer application should never be in charge, and a broken
security scheme should never be supported.

If EMODPR unconditionally sets the permissions to PROT_READ, the enclave is
able to validate this fact and can then use EMODPE to set the appropriate
permissions.
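
As a rough illustration of the flow proposed above, the following C sketch models a single page's EPCM entry: the kernel side resets the permissions to read-only, the enclave checks the result with an EACCEPT-like step, and then extends the permissions with an EMODPE-like step. Everything here is a toy model: the `toy_*` names, the `PERM_*` bits and the `struct toy_epcm` layout are assumptions made for illustration and are not the kernel's or the SDM's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy permission bits (hypothetical, not the kernel's SGX_SECINFO_* macros). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Toy stand-in for one page's EPCM entry. */
struct toy_epcm {
	uint8_t perms;   /* current R/W/X bits */
	bool pr_pending; /* a restriction is waiting for the enclave to accept it */
};

/* Host/kernel side: model of a fixed "reset to read-only" EMODPR. */
static void toy_emodpr_reset_to_read(struct toy_epcm *e)
{
	e->perms &= PERM_R;   /* a restriction can only clear bits */
	e->pr_pending = true;
}

/*
 * Enclave side: model of EACCEPT. It succeeds only if the page is readable,
 * a restriction is pending, and the entry matches what the enclave expects.
 */
static bool toy_eaccept(struct toy_epcm *e, uint8_t expected)
{
	if (!(e->perms & PERM_R) || !e->pr_pending || e->perms != expected)
		return false;
	e->pr_pending = false;
	return true;
}

/* Enclave side: model of EMODPE. It can only add bits to a readable page. */
static bool toy_emodpe(struct toy_epcm *e, uint8_t extra)
{
	if (!(e->perms & PERM_R))
		return false;
	e->perms |= extra;
	return true;
}

/* The whole flow: reset to read-only, verify, then extend to the wanted bits. */
static bool toy_change_perms(struct toy_epcm *e, uint8_t wanted)
{
	toy_emodpr_reset_to_read(e);
	if (!toy_eaccept(e, PERM_R))
		return false;
	return toy_emodpe(e, wanted);
}
```

With a page that starts as RW, `toy_change_perms(&e, PERM_R | PERM_X)` leaves the entry at RX in this model. If the host side never performed the reset, a direct `toy_eaccept()` call would fail, which is how the enclave would notice.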

BR, Jarkko

2022-03-17 07:46:10

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > Hi Jarkko
> >
> > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <[email protected]>
> > wrote:
> >
> > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > >
> > > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > > by enclave".
> > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > since it would
> > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > Even so, I
> > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > permitted
> > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > not practical
> > > > > > from that perspective either since the enclave will not be able to
> > > > > > EACCEPT the change. Does that match your understanding?
> > > > >
> > > > > Yes, PROT_NONE should not be allowed.
> > > > >
> > > > > This is however the real problem.
> > > > >
> > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > > simply unacceptable. It also requires more concurrency management
> > > > from
> > > > > user space run-time, which would be heck a lot easier to do in the
> > > > kernel.
> > > > >
> > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > then EAUG
> > > > > should be too. Like this when things go opposite directions, this
> > > > patch set
> > > > > plain and simply will not work out.
> > > > >
> > > > > I would pick EAUG's strategy from these two as it requires half
> > > > the back
> > > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > > EMODPR,
> > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > suits you
> > > > > best.
> > > > >
> > > > > I'll try demonstrate this with two examples.
> > > > >
> > > > > mmap() could go something like this() (simplified):
> > > > > 1. Execution #UD's to SYSCALL.
> > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > 3. Enclave up-calls host's mmap().
> > > > > 4. Loops the range with EACCEPTCOPY.
> > > > >
> > > > > mprotect() has to be done like this:
> > > > > 1. Execution #UD's to SYSCALL.
> > > > > 2. Host calls enclave's mprotect() handler.
> > > > > 3. Enclave up-calls host's mprotect().
> > > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> >
> > I assume up-calls here are ocalls as we call them in our implementation,
> > which are the calls enclave make to untrusted side via EEXIT.
> >
> > If so, can your implementation combine this two up-calls into one, then host
> > side just do ioctl() and mprotect to kernel? If so, would that address your
> > concern about extra up-calls?
> >
> >
> > > > > 3. Loops the range with EACCEPT.
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > 5. Loops the range with EACCEPT + EMODPE.
> > > >
> > > > > This is just terrible IMHO. I hope these examples bring some insight.
> > >
> > > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > > intermediate that we call sallyport, which provides shared buffer to
> > > communicate with the enclave) just for reseting the range with PROT_READ.
> > > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > > to
> > > what you've chosen to do with EAUG, which is I think correct choice as
> > > far
> > > as API is concerned.
> >
> > The problem with EMODPR on #PF is that kernel needs to know what permissions
> > requested from enclave at the time of #PF. So enclave has to make at least
> > one call to kernel (again via ocall in our case, I assume up-call in your
> > case) to make the change.
>
> The #PF handler should do unconditionally EMODPR with PROT_READ.

Or mprotect() could do it, as long as secinfo contains PROT_READ. I no longer
care much about this detail because it does not affect the uapi.

Using EMODPR as a permission control mechanism is a ridiculous idea, and
I cannot commit to maintaining a broken uapi.

BR, Jarkko

2022-03-17 07:53:50

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> Hi Jarkko
>
> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > >
> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > by enclave".
> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > since it would
> > > > > not be possible to relax permissions (run EMODPE) after that.
> > > Even so, I
> > > > > also found in the SDM that EACCEPT has the note "Read access
> > > permitted
> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > not practical
> > > > > from that perspective either since the enclave will not be able to
> > > > > EACCEPT the change. Does that match your understanding?
> > > >
> > > > Yes, PROT_NONE should not be allowed.
> > > >
> > > > This is however the real problem.
> > > >
> > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > simply unacceptable. It also requires more concurrency management
> > > from
> > > > user space run-time, which would be heck a lot easier to do in the
> > > kernel.
> > > >
> > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > then EAUG
> > > > should be too. Like this when things go opposite directions, this
> > > patch set
> > > > plain and simply will not work out.
> > > >
> > > > I would pick EAUG's strategy from these two as it requires half
> > > the back
> > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > EMODPR,
> > > > either in the #PF handler or as part of mprotect(), which ever
> > > suits you
> > > > best.
> > > >
> > > > I'll try demonstrate this with two examples.
> > > >
> > > > mmap() could go something like this() (simplified):
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > 3. Enclave up-calls host's mmap().
> > > > 4. Loops the range with EACCEPTCOPY.
> > > >
> > > > mprotect() has to be done like this:
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mprotect() handler.
> > > > 3. Enclave up-calls host's mprotect().
> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
>
> I assume up-calls here are ocalls as we call them in our implementation,
> which are the calls enclave make to untrusted side via EEXIT.
>
> If so, can your implementation combine this two up-calls into one, then host
> side just do ioctl() and mprotect to kernel? If so, would that address your
> concern about extra up-calls?
>
>
> > > > 3. Loops the range with EACCEPT.
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 5. Loops the range with EACCEPT + EMODPE.
> > >
> > > > This is just terrible IMHO. I hope these examples bring some insight.
> >
> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > intermediate that we call sallyport, which provides shared buffer to
> > communicate with the enclave) just for reseting the range with PROT_READ.
> > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > to
> > what you've chosen to do with EAUG, which is I think correct choice as
> > far
> > as API is concerned.
>
> The problem with EMODPR on #PF is that kernel needs to know what permissions
> requested from enclave at the time of #PF. So enclave has to make at least
> one call to kernel (again via ocall in our case, I assume up-call in your
> case) to make the change.

The #PF handler should unconditionally do EMODPR with PROT_READ.

BR, Jarkko

2022-03-17 16:13:53

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi

On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <[email protected]>
wrote:

> On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
>> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> > Hi Jarkko
>> >
>> > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
>> <[email protected]>
>> > wrote:
>> >
>> > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> > > > >
>> > > > > > I saw Haitao's note that EMODPE requires "Read access
>> permitted
>> > > > by enclave".
>> > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
>> > > > since it would
>> > > > > > not be possible to relax permissions (run EMODPE) after that.
>> > > > Even so, I
>> > > > > > also found in the SDM that EACCEPT has the note "Read access
>> > > > permitted
>> > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
>> > > > not practical
>> > > > > > from that perspective either since the enclave will not be
>> able to
>> > > > > > EACCEPT the change. Does that match your understanding?
>> > > > >
>> > > > > Yes, PROT_NONE should not be allowed.
>> > > > >
>> > > > > This is however the real problem.
>> > > > >
>> > > > > The current kernel patch set has inconsistent API and EMODPR
>> ioctl is
>> > > > > simply unacceptable. It also requires more concurrency
>> management
>> > > > from
>> > > > > user space run-time, which would be heck a lot easier to do in
>> the
>> > > > kernel.
>> > > > >
>> > > > > If you really want EMODPR as ioctl, then for consistencys sake,
>> > > > then EAUG
>> > > > > should be too. Like this when things go opposite directions,
>> this
>> > > > patch set
>> > > > > plain and simply will not work out.
>> > > > >
>> > > > > I would pick EAUG's strategy from these two as it requires half
>> > > > the back
>> > > > > calls to host from an enclave. I.e. please combine mprotect()
>> and
>> > > > EMODPR,
>> > > > > either in the #PF handler or as part of mprotect(), which ever
>> > > > suits you
>> > > > > best.
>> > > > >
>> > > > > I'll try demonstrate this with two examples.
>> > > > >
>> > > > > mmap() could go something like this() (simplified):
>> > > > > 1. Execution #UD's to SYSCALL.
>> > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > > > > 3. Enclave up-calls host's mmap().
>> > > > > 4. Loops the range with EACCEPTCOPY.
>> > > > >
>> > > > > mprotect() has to be done like this:
>> > > > > 1. Execution #UD's to SYSCALL.
>> > > > > 2. Host calls enclave's mprotect() handler.
>> > > > > 3. Enclave up-calls host's mprotect().
>> > > > > 4. Enclave up-calls host's ioctl() to
>> SGX_IOC_ENCLAVE_PERMISSIONS.
>> >
>> > I assume up-calls here are ocalls as we call them in our
>> implementation,
>> > which are the calls enclave make to untrusted side via EEXIT.
>> >
>> > If so, can your implementation combine this two up-calls into one,
>> then host
>> > side just do ioctl() and mprotect to kernel? If so, would that
>> address your
>> > concern about extra up-calls?
>> >
>> >
>> > > > > 3. Loops the range with EACCEPT.
>> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > > > 5. Loops the range with EACCEPT + EMODPE.
>> > > >
>> > > > > This is just terrible IMHO. I hope these examples bring some
>> insight.
>> > >
>> > > E.g. in Enarx we have to add a special up-call (so called enarxcall
>> in
>> > > intermediate that we call sallyport, which provides shared buffer to
>> > > communicate with the enclave) just for reseting the range with
>> PROT_READ.
>> > > Feel very redundant, adds ugly cruft and is completely opposite
>> strategy
>> > > to
>> > > what you've chosen to do with EAUG, which is I think correct choice
>> as
>> > > far
>> > > as API is concerned.
>> >
>> > The problem with EMODPR on #PF is that kernel needs to know what
>> permissions
>> > requested from enclave at the time of #PF. So enclave has to make at
>> least
>> > one call to kernel (again via ocall in our case, I assume up-call in
>> your
>> > case) to make the change.
>>
>> The #PF handler should do unconditionally EMODPR with PROT_READ.
>
> Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> this detail hugely anymore because it does not affect uapi.
>
> Using EMODPR as a permission control mechanism is a ridiculous idea, and
> I cannot commit to maintain a broken uapi.
>

Jarkko, how would automatically forcing PROT_READ on #PF work for this
sequence?

1) EAUG a page (has to be RW)
2) EACCEPT(RW)
3) enclave copies some data to page
4) enclave wants to change permission to R

If you are proposing mprotect, then as I indicated earlier, please address
concerns raised by Reinette:
https://lore.kernel.org/linux-sgx/[email protected]/



Thanks
Haitao

2022-03-17 16:14:58

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, 16 Mar 2022 23:37:26 -0500, Jarkko Sakkinen <[email protected]>
wrote:

> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> I also see this model as consistent to what kernel does for regular
>> memory
>> mappings: adding physical pages on #PF or pre-fault and changing PTE
>> permissions only after mprotect is called.
>
> And you were against this in EAUG's case. As in the EAUG's case
> EMODPR could be done as part of the mprotect() flow.
>

I preferred not to do automatic/unconditional EAUG during mmap.
Here I think automatic/unconditional EMODPR(PROT_READ) on #PF would not
work for all cases. See my reply to your other email.

Thanks
Haitao

2022-03-17 20:41:44

by Haitao Huang

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, 16 Mar 2022 23:34:39 -0500, Jarkko Sakkinen <[email protected]>
wrote:

> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> Hi Jarkko
>>
>> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <[email protected]>
>> wrote:
>>
>> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> > > >
>> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
>> > > by enclave".
>> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
>> > > since it would
>> > > > > not be possible to relax permissions (run EMODPE) after that.
>> > > Even so, I
>> > > > > also found in the SDM that EACCEPT has the note "Read access
>> > > permitted
>> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
>> > > not practical
>> > > > > from that perspective either since the enclave will not be able
>> to
>> > > > > EACCEPT the change. Does that match your understanding?
>> > > >
>> > > > Yes, PROT_NONE should not be allowed.
>> > > >
>> > > > This is however the real problem.
>> > > >
>> > > > The current kernel patch set has inconsistent API and EMODPR
>> ioctl is
>> > > > simply unacceptable. It also requires more concurrency management
>> > > from
>> > > > user space run-time, which would be heck a lot easier to do in the
>> > > kernel.
>> > > >
>> > > > If you really want EMODPR as ioctl, then for consistencys sake,
>> > > then EAUG
>> > > > should be too. Like this when things go opposite directions, this
>> > > patch set
>> > > > plain and simply will not work out.
>> > > >
>> > > > I would pick EAUG's strategy from these two as it requires half
>> > > the back
>> > > > calls to host from an enclave. I.e. please combine mprotect() and
>> > > EMODPR,
>> > > > either in the #PF handler or as part of mprotect(), which ever
>> > > suits you
>> > > > best.
>> > > >
>> > > > I'll try demonstrate this with two examples.
>> > > >
>> > > > mmap() could go something like this() (simplified):
>> > > > 1. Execution #UD's to SYSCALL.
>> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > > > 3. Enclave up-calls host's mmap().
>> > > > 4. Loops the range with EACCEPTCOPY.
>> > > >
>> > > > mprotect() has to be done like this:
>> > > > 1. Execution #UD's to SYSCALL.
>> > > > 2. Host calls enclave's mprotect() handler.
>> > > > 3. Enclave up-calls host's mprotect().
>> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
>>
>> I assume up-calls here are ocalls as we call them in our implementation,
>> which are the calls enclave make to untrusted side via EEXIT.
>>
>> If so, can your implementation combine this two up-calls into one, then
>> host
>> side just do ioctl() and mprotect to kernel? If so, would that address
>> your
>> concern about extra up-calls?
>>
>>
>> > > > 3. Loops the range with EACCEPT.
>> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > > 5. Loops the range with EACCEPT + EMODPE.
>> > >
>> > > > This is just terrible IMHO. I hope these examples bring some
>> insight.
>> >
>> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
>> > intermediate that we call sallyport, which provides shared buffer to
>> > communicate with the enclave) just for reseting the range with
>> PROT_READ.
>> > Feel very redundant, adds ugly cruft and is completely opposite
>> strategy
>> > to
>> > what you've chosen to do with EAUG, which is I think correct choice as
>> > far
>> > as API is concerned.
>>
>> The problem with EMODPR on #PF is that kernel needs to know what
>> permissions
>> requested from enclave at the time of #PF. So enclave has to make at
>> least
>> one call to kernel (again via ocall in our case, I assume up-call in
>> your
>> case) to make the change.
>
> Your security scheme is broken if permissions are requested outside the
> enclave, i.e. the hostile environment controls the permissions. That
> should
> always come from the enclave and enclave uses EACCEPT* to validate that
> what was given to EMODPR, EAUG and EMODT matches its expectations.
>
> Upper layer application should never be in charge, and a broken
> security scheme should never be supported.
>
By upper layer in this case I mean code inside the enclave.
The enclave can always use EACCEPT to verify permissions and is in full
control of EPCM permissions.
The kernel (or code outside the enclave invoking the kernel) would only be
able to reduce EPCM permissions, and as you know the enclave can always EMODPE.
So this is not related to enclave security.
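
The restrict-only argument can be sketched with a small C model, assuming (as described in this thread) that EMODPR can only remove permission bits, EMODPE can only add them, and both EACCEPT and EMODPE need the page to be readable by the enclave. The `model_*` functions and `EPCM_*` bits are hypothetical names for illustration; they are not real instruction wrappers or kernel macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy EPCM permission bits (hypothetical names). */
enum { EPCM_R = 1u << 0, EPCM_W = 1u << 1, EPCM_X = 1u << 2 };

/*
 * Model of the kernel-side restriction: the result is the intersection of
 * the current and requested bits, so it can never add a permission.
 */
static uint8_t model_emodpr(uint8_t epcm, uint8_t requested)
{
	return epcm & requested;
}

/*
 * Model of the enclave-side extension: adds bits, but only if the page is
 * readable by the enclave. Returns false when the request is refused.
 */
static bool model_emodpe(uint8_t *epcm, uint8_t requested)
{
	if (!(*epcm & EPCM_R))
		return false;
	*epcm |= requested;
	return true;
}

/*
 * Model of the enclave-side check: the page must be readable and its
 * permissions must match exactly what the enclave expects.
 */
static bool model_eaccept(uint8_t epcm, uint8_t expected)
{
	return (epcm & EPCM_R) && epcm == expected;
}
```

In this model a request from outside the enclave that asks for more bits than the page already has leaves the page unchanged, while a restriction to no permissions leaves the enclave unable to either accept the change or extend the permissions again. That matches the earlier observation in the thread that EMODPR->PROT_NONE should not be allowed.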

Haitao

2022-03-17 22:05:17

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> Hi
>
> On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <[email protected]>
> wrote:
>
> > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > Hi Jarkko
> > > >
> > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > <[email protected]>
> > > > wrote:
> > > >
> > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > >
> > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > permitted
> > > > > > by enclave".
> > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > since it would
> > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > Even so, I
> > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > permitted
> > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > not practical
> > > > > > > > from that perspective either since the enclave will not be
> > > able to
> > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > >
> > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > >
> > > > > > > This is however the real problem.
> > > > > > >
> > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > ioctl is
> > > > > > > simply unacceptable. It also requires more concurrency
> > > management
> > > > > > from
> > > > > > > user space run-time, which would be heck a lot easier to do
> > > in the
> > > > > > kernel.
> > > > > > >
> > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > then EAUG
> > > > > > > should be too. Like this when things go opposite directions,
> > > this
> > > > > > patch set
> > > > > > > plain and simply will not work out.
> > > > > > >
> > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > the back
> > > > > > > calls to host from an enclave. I.e. please combine
> > > mprotect() and
> > > > > > EMODPR,
> > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > suits you
> > > > > > > best.
> > > > > > >
> > > > > > > I'll try demonstrate this with two examples.
> > > > > > >
> > > > > > > mmap() could go something like this() (simplified):
> > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > >
> > > > > > > mprotect() has to be done like this:
> > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > 4. Enclave up-calls host's ioctl() to
> > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > >
> > > > I assume up-calls here are ocalls as we call them in our
> > > implementation,
> > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > >
> > > > If so, can your implementation combine this two up-calls into one,
> > > then host
> > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > address your
> > > > concern about extra up-calls?
> > > >
> > > >
> > > > > > > 3. Loops the range with EACCEPT.
> > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > 5. Loops the range with EACCEPT + EMODPE.
> > > > > >
> > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > insight.
> > > > >
> > > > > E.g. in Enarx we have to add a special up-call (so called
> > > enarxcall in
> > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > communicate with the enclave) just for reseting the range with
> > > PROT_READ.
> > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > strategy
> > > > > to
> > > > > what you've chosen to do with EAUG, which is I think correct
> > > choice as
> > > > > far
> > > > > as API is concerned.
> > > >
> > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > permissions
> > > > requested from enclave at the time of #PF. So enclave has to make
> > > at least
> > > > one call to kernel (again via ocall in our case, I assume up-call
> > > in your
> > > > case) to make the change.
> > >
> > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> >
> > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > this detail hugely anymore because it does not affect uapi.
> >
> > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > I cannot commit to maintain a broken uapi.
> >
>
> Jarkko, how would automatically forcing PROT_READ on #PF work for this
> sequence?
>
> 1) EAUG a page (has to be RW)
> 2) EACCEPT(RW)
> 3) enclave copies some data to page
> 4) enclave wants to change permission to R
>
> If you are proposing mprotect, then as I indicated earlier, please address
> concerns raised by Reinette:
> https://lore.kernel.org/linux-sgx/[email protected]/

For EAUG you can choose between the #PF handler and having it as part of
mmap(), with the same uapi.

For EMODPR, the #PF handler would clearly be tricky, but nothing prevents
resetting the permissions as part of the mprotect() flow, which is trivial.

One good reason to have a fixed EMODPR is that e.g. properly emulating
mprotect() is almost undoable if you don't do it otherwise. Specifically,
consider the scenario where your address range spans multiple adjacent
VMAs. Even without EMODPR that is a complex enough scenario that you really
don't want to ask for more trouble than using EMODPR in a super
conservative manner.

Having EMODPR fully exposed will only make the API more difficult to use,
with extra round-trips. If you want ring-0 instructions fully exposed,
please don't use a kernel. There is a bunch of hardware features in Intel
CPUs for which Linux does not provide 1:1, wide-open interfaces.

BR, Jarkko

2022-03-17 22:05:43

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> > Hi
> >
> > On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <[email protected]>
> > wrote:
> >
> > > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > > Hi Jarkko
> > > > >
> > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > > <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > > >
> > > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > > permitted
> > > > > > > by enclave".
> > > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > > since it would
> > > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > > Even so, I
> > > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > > permitted
> > > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > > not practical
> > > > > > > > > from that perspective either since the enclave will not be
> > > > able to
> > > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > > >
> > > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > > >
> > > > > > > > This is however the real problem.
> > > > > > > >
> > > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > > ioctl is
> > > > > > > > simply unacceptable. It also requires more concurrency
> > > > management
> > > > > > > from
> > > > > > > > user space run-time, which would be heck a lot easier to do
> > > > in the
> > > > > > > kernel.
> > > > > > > >
> > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > > then EAUG
> > > > > > > > should be too. Like this when things go opposite directions,
> > > > this
> > > > > > > patch set
> > > > > > > > plain and simply will not work out.
> > > > > > > >
> > > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > > the back
> > > > > > > > calls to host from an enclave. I.e. please combine
> > > > mprotect() and
> > > > > > > EMODPR,
> > > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > > suits you
> > > > > > > > best.
> > > > > > > >
> > > > > > > > I'll try demonstrate this with two examples.
> > > > > > > >
> > > > > > > > mmap() could go something like this() (simplified):
> > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > > >
> > > > > > > > mprotect() has to be done like this:
> > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > > 4. Enclave up-calls host's ioctl() to
> > > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > > >
> > > > > I assume up-calls here are ocalls as we call them in our
> > > > implementation,
> > > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > > >
> > > > > If so, can your implementation combine this two up-calls into one,
> > > > then host
> > > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > > address your
> > > > > concern about extra up-calls?
> > > > >
> > > > >
> > > > > > > > 3. Loops the range with EACCEPT.
> > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > 5. Loops the range with EACCEPT + EMODPE.
> > > > > > >
> > > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > > insight.
> > > > > >
> > > > > > E.g. in Enarx we have to add a special up-call (so called
> > > > enarxcall in
> > > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > > communicate with the enclave) just for reseting the range with
> > > > PROT_READ.
> > > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > > strategy
> > > > > > to
> > > > > > what you've chosen to do with EAUG, which is I think correct
> > > > choice as
> > > > > > far
> > > > > > as API is concerned.
> > > > >
> > > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > > permissions
> > > > > requested from enclave at the time of #PF. So enclave has to make
> > > > at least
> > > > > one call to kernel (again via ocall in our case, I assume up-call
> > > > in your
> > > > > case) to make the change.
> > > >
> > > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> > >
> > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > > this detail hugely anymore because it does not affect uapi.
> > >
> > > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > > I cannot commit to maintain a broken uapi.
> > >
> >
> > Jarkko, how would automatically forcing PROT_READ on #PF work for this
> > sequence?
> >
> > 1) EAUG a page (has to be RW)
> > 2) EACCEPT(RW)
> > 3) enclave copies some data to page
> > 4) enclave wants to change permission to R
> >
> > If you are proposing mprotect, then as I indicated earlier, please address
> > concerns raised by Reinette:
> > https://lore.kernel.org/linux-sgx/[email protected]/
>
> For EAUG you can choose between #PF handler and having it as part of
> mmap() with the same uapi.
>
> For EMODPR clearly #PF handler would be tricky but nothing prevents
> resetting the permissions as part of mprotect() flow, which is trivial.
>
> One good reason to have a fixed EMODPR is that e.g. emulating properly
> mprotect() is almost undoable if you don't do it otherwise. Specifically

s/don't//g

> the scenario where your address range spans through multiple adjacent
> VMAs. It's even without EMODPR complex enough scenario that you really
> don't want to ask yourself for more trouble than use EMODPR in a super
> conservative manner.
>
> Having EMODPR fully exposed will only make more difficult API to do with
> extra round-trips. If you want to use ring-0 instructions fully exposed,
> please don't use a kernel. There's a bunch of hardware features in Intel
> CPUs for which Linux does not provide 1:1 all wide open interfaces.
>
> BR, Jarkko

2022-03-17 22:15:54

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>> the enclave memory without needing to map it.
>>>
>>> Which is opposite what you do in EAUG. You can also augment pages without
>>> needing the map them. Sure you get that capability, but it is quite useless
>>> in practice.
>>>
>>>> I have considered the idea of supporting the permission restriction with
>>>> mprotect() but as you can see in this response I did not find it to be
>>>> practical.
>>>
>>> Where is it practical? What is your application? How is it practical to
>>> delegate the concurrency management of a split mprotect() to user space?
>>> How do we get rid off a useless up-call to the host?
>>>
>>
>> The email you responded to contained many obstacles against using mprotect()
>> but you chose to ignore them and snipped them all from your response. Could
>> you please address the issues instead of dismissing them?
>
> I did read the whole email but did not see anything that would make a case
> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.

I believe that on its own each obstacle I shared with you is significant enough
to not follow that approach. You simply respond that I am just not making a
case without acknowledging any obstacle or providing a reason why the obstacles
are not valid.

To help me understand your view, could you please respond to each of the
obstacles I list below and explain why it is not an issue?


1) ABI change:
mprotect() is currently supported to modify VMA permissions
irrespective of EPCM permissions. Supporting EPCM permission
changes with mprotect() would change this behavior.
For example, currently it is possible to have RW enclave
memory and support multiple tasks accessing the memory. Two
tasks can map the memory RW and later one can run mprotect()
to reduce the VMA permissions to read-only without impacting
the access of the other task.
By moving EPCM permission changes to mprotect() this usage
will no longer be supported and current behavior will change.

2) Only half EPCM permission management:
Moving to mprotect() as a way to set EPCM permissions is
not a clear interface for EPCM permission management because
the kernel can only restrict permissions. Even so, the kernel
has no insight into the current EPCM permissions and thus whether they
actually need to be restricted so every mprotect() call,
all except RWX, will need to be treated as a permission
restriction with all the implementation obstacles
that accompany it (more below).

There are two possible ways to implement permission restriction
as triggered by mprotect(), (a) during the mprotect() call or
(b) during a subsequent #PF (as suggested by you), each has
its own obstacles.

3) mprotect() implementation

When the user calls mprotect() the expectation is that the
call will either succeed or fail. If the call fails the user
expects the system to be unchanged. This is not possible if
permission restriction is done as part of mprotect().

(a) mprotect() may span multiple VMAs and involves VMA splits
that (from what I understand) cannot be undone. SGX memory
does not support VMA merges. If any SGX function
(EMODPR or ETRACK on any page) done after a VMA split fails
then the user will be left with fragmented memory.

(b) The EMODPR/ETRACK pair can fail on any of the pages provided
by the mprotect() call. If there is a failure then the
kernel cannot undo a previously executed EMODPR since the kernel
cannot run EMODPE. The EPCM permissions are thus left in an inconsistent
state since some of the pages would have changed EPCM permissions
and mprotect() does not have a mechanism to communicate
partial success.
The partial success is needed to communicate to user space
(i) which pages need EACCEPT, (ii) which pages need to be
in new request (although user space does not have information
to help the new request succeed - see below).

(c) User space runtime has control over management of EPC memory
and accurate failure information would help it to do so.
Knowing the error code of the EMODPR failure would help
user space to take appropriate action. For example, EMODPR
can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
to learn that it needs to run EACCEPT on that page before
the EMODPR can succeed. Alternatively, if it learns that the
return is "SGX_EPC_PAGE_CONFLICT" then it could determine
that some other part of the runtime attempted an ENCLU
function on that page.
It is not possible to provide such detailed errors to user
space with mprotect().


4) #PF implementation

(a) There is more to restricting permissions than just running
ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
also initiate the ETRACK flow to ensure that any thread within
the enclave is interrupted by sending an IPI to the CPU,
this includes the thread that just triggered the #PF.

(b) Second consideration of the EMODPR and ETRACK flow is that
this has a large "blast radius" in that any thread in the
enclave needs to be interrupted. #PFs may arrive at any time
so setting up a page range where a fault into any page in the
page range will trigger enclave exits for all threads is
a significant yet random impact. I believe it would be better
to update all pages in the range at the same time and in this
way contain the impact of this significant EMODPR/ETRACK/IPIs
flow.

(c) How will the page fault handler know when EMODPR/ETRACK should
be run? Consider that the page fault handler can be called
significantly later than the mprotect() call and that
user space can call EMODPE any time to extend permissions.
This implies that EMODPR/ETRACK/IPIs should be run during
*every* page fault, irrespective of mprotect().

(d) If a page is in pending or modified state then EMODPR will
always fail. This is something that needs to be fixed by
user space runtime but the page fault will not be able
to communicate this.
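
To illustrate 3(b) and 3(c), here is a minimal sketch in C of how a
runtime could act on the two outputs an ioctl() is able to return: the
number of pages successfully changed and the SGX return code. All names
prefixed with ex_/EX_ are hypothetical and the enum values are placeholders,
not the real SGX return codes or the uapi structure. The sketch only splits
the range and picks a next step.

```c
#include <stdint.h>

#define EX_PAGE_SIZE 4096ULL

/* Stand-ins for the SGX return codes named above; values are placeholders. */
enum ex_sgx_code {
	EX_SGX_OK = 0,
	EX_SGX_EPC_PAGE_CONFLICT,
	EX_SGX_PAGE_NOT_MODIFIABLE,
};

/* Hypothetical next step for the first page that was not restricted. */
enum ex_action {
	EX_DONE,               /* the whole range was restricted */
	EX_EACCEPT_THEN_RETRY, /* page needs EACCEPT before EMODPR can succeed */
	EX_SYNC_THEN_RETRY,    /* another ENCLU function touched the page */
	EX_GIVE_UP,            /* unknown failure */
};

struct ex_resume_plan {
	uint64_t accept_offset; /* already restricted: enclave must EACCEPT */
	uint64_t accept_length;
	uint64_t retry_offset;  /* not yet restricted: goes into a new request */
	uint64_t retry_length;
	enum ex_action action;
};

/* Split [offset, offset + length) using the partial-success outputs. */
static struct ex_resume_plan ex_plan_resume(uint64_t offset, uint64_t length,
					    uint64_t pages_done,
					    enum ex_sgx_code result)
{
	uint64_t done_bytes = pages_done * EX_PAGE_SIZE;
	struct ex_resume_plan plan;

	if (done_bytes > length)
		done_bytes = length;

	plan.accept_offset = offset;
	plan.accept_length = done_bytes;
	plan.retry_offset = offset + done_bytes;
	plan.retry_length = length - done_bytes;

	if (plan.retry_length == 0)
		plan.action = EX_DONE;
	else if (result == EX_SGX_PAGE_NOT_MODIFIABLE)
		plan.action = EX_EACCEPT_THEN_RETRY;
	else if (result == EX_SGX_EPC_PAGE_CONFLICT)
		plan.action = EX_SYNC_THEN_RETRY;
	else
		plan.action = EX_GIVE_UP;

	return plan;
}
```

With mprotect() only a single errno would be available, so neither the
split point nor the reason for the failure could be derived this way.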

Considering the above, could you please provide clear guidance on
how you envision permission restriction to be supported by mprotect()?

Reinette

2022-03-17 23:02:12

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Fri, Mar 18, 2022 at 12:00:17AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> > > Hi
> > >
> > > On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <[email protected]>
> > > wrote:
> > >
> > > > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > > > Hi Jarkko
> > > > > >
> > > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > > > <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > > > >
> > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > > > permitted
> > > > > > > > by enclave".
> > > > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > > > since it would
> > > > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > > > Even so, I
> > > > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > > > permitted
> > > > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > > > not practical
> > > > > > > > > > from that perspective either since the enclave will not be
> > > > > able to
> > > > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > > > >
> > > > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > > > >
> > > > > > > > > This is however the real problem.
> > > > > > > > >
> > > > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > > > ioctl is
> > > > > > > > > simply unacceptable. It also requires more concurrency
> > > > > management
> > > > > > > > from
> > > > > > > > > user space run-time, which would be heck a lot easier to do
> > > > > in the
> > > > > > > > kernel.
> > > > > > > > >
> > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > > > then EAUG
> > > > > > > > > should be too. Like this when things go opposite directions,
> > > > > this
> > > > > > > > patch set
> > > > > > > > > plain and simply will not work out.
> > > > > > > > >
> > > > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > > > the back
> > > > > > > > > calls to host from an enclave. I.e. please combine
> > > > > mprotect() and
> > > > > > > > EMODPR,
> > > > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > > > suits you
> > > > > > > > > best.
> > > > > > > > >
> > > > > > > > > I'll try demonstrate this with two examples.
> > > > > > > > >
> > > > > > > > > mmap() could go something like this() (simplified):
> > > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > > > >
> > > > > > > > > mprotect() has to be done like this:
> > > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > > > 4. Enclave up-calls host's ioctl() to
> > > > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > > > >
> > > > > > I assume up-calls here are ocalls as we call them in our
> > > > > implementation,
> > > > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > > > >
> > > > > > If so, can your implementation combine this two up-calls into one,
> > > > > then host
> > > > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > > > address your
> > > > > > concern about extra up-calls?
> > > > > >
> > > > > >
> > > > > > > > > 3. Loops the range with EACCEPT.
> > > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > 5. Loops the range with EACCEPT + EMODPE.
> > > > > > > >
> > > > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > > > insight.
> > > > > > >
> > > > > > > E.g. in Enarx we have to add a special up-call (so called
> > > > > enarxcall in
> > > > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > > > communicate with the enclave) just for reseting the range with
> > > > > PROT_READ.
> > > > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > > > strategy
> > > > > > > to
> > > > > > > what you've chosen to do with EAUG, which is I think correct
> > > > > choice as
> > > > > > > far
> > > > > > > as API is concerned.
> > > > > >
> > > > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > > > permissions
> > > > > > requested from enclave at the time of #PF. So enclave has to make
> > > > > at least
> > > > > > one call to kernel (again via ocall in our case, I assume up-call
> > > > > in your
> > > > > > case) to make the change.
> > > > >
> > > > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> > > >
> > > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > > > this detail hugely anymore because it does not affect uapi.
> > > >
> > > > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > > > I cannot commit to maintain a broken uapi.
> > > >
> > >
> > > Jarkko, how would automatically forcing PROT_READ on #PF work for this
> > > sequence?
> > >
> > > 1) EAUG a page (has to be RW)
> > > 2) EACCEPT(RW)
> > > 3) enclave copies some data to page
> > > 4) enclave wants to change permission to R
> > >
> > > If you are proposing mprotect, then as I indicated earlier, please address
> > > concerns raised by Reinette:
> > > https://lore.kernel.org/linux-sgx/[email protected]/
> >
> > For EAUG you can choose between #PF handler and having it as part of
> > mmap() with the same uapi.
> >
> > For EMODPR clearly #PF handler would be tricky but nothing prevents
> > resetting the permissions as part of mprotect() flow, which is trivial.
> >
> > One good reason to have a fixed EMODPR is that e.g. emulating properly
> > mprotect() is almost undoable if you don't do it otherwise. Specifically
>
> s/don't//g
>
> > the scenario where your address range spans through multiple adjacent
> > VMAs. It's even without EMODPR complex enough scenario that you really
> > don't want to ask yourself for more trouble than use EMODPR in a super
> > conservative manner.
> >
> > Having EMODPR fully exposed will only make more difficult API to do with
> > extra round-trips. If you want to use ring-0 instructions fully exposed,
> > please don't use a kernel. There's a bunch of hardware features in Intel
> > CPUs for which Linux does not provide 1:1 all wide open interfaces.

I've now run a tweaked SGX2 v2 patch set [*] for 1.5 weeks and I'm really
confident about its stability. My laptop has not crashed a single
time. For the EAUG portion I'll probably be ready to give Reviewed-by tags
sooner rather than later, because the API works just great.

Just want to note that it is not the internals that I'm concerned
about. For v3 I'd suggest that it is sent as you see fit, without getting
stuck on EMODPR.

What I'll do, once I get it, is that I'll construct a small well-defined
patch or perhaps patch set, which shows how I would change the EMODPR part.

[*] I run it on my 2020 XPS13 laptop, which is SGX2 capable, and created a
CI setup that periodically produces automated kernel package builds of
it for Arch Linux: https://github.com/jarkkojs/aur-linux-sgx/actions.
It's the distro kernel with the same config, Reinette's patches on top, and
my tweaks on top of them. When v3 comes out, I'll update the kernel
version and replace the v2+ patches with them.

BR, Jarkko

2022-03-17 23:41:46

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>> the enclave memory without needing to map it.
> >>>
> >>> Which is opposite what you do in EAUG. You can also augment pages without
> >>> needing the map them. Sure you get that capability, but it is quite useless
> >>> in practice.
> >>>
> >>>> I have considered the idea of supporting the permission restriction with
> >>>> mprotect() but as you can see in this response I did not find it to be
> >>>> practical.
> >>>
> >>> Where is it practical? What is your application? How is it practical to
> >>> delegate the concurrency management of a split mprotect() to user space?
> >>> How do we get rid off a useless up-call to the host?
> >>>
> >>
> >> The email you responded to contained many obstacles against using mprotect()
> >> but you chose to ignore them and snipped them all from your response. Could
> >> you please address the issues instead of dismissing them?
> >
> > I did read the whole email but did not see anything that would make a case
> > for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>
> I believe that on its own each obstacle I shared with you is significant enough
> to not follow that approach. You simply respond that I am just not making a
> case without acknowledging any obstacle or providing a reason why the obstacles
> are not valid.
>
> To help me understand your view, could you please respond to each of the
> obstacles I list below and how it is not an issue?
>
>
> 1) ABI change:
> mprotect() is currently supported to modify VMA permissions
> irrespective of EPCM permissions. Supporting EPCM permission
> changes with mprotect() would change this behavior.
> For example, currently it is possible to have RW enclave
> memory and support multiple tasks accessing the memory. Two
> tasks can map the memory RW and later one can run mprotect()
> to reduce the VMA permissions to read-only without impacting
> the access of the other task.
> By moving EPCM permission changes to mprotect() this usage
> will no longer be supported and current behavior will change.

Your concurrency scenario is somewhat artificial. Obviously you need to
synchronize somehow, and breaking something that could be done with one
system call into two separate ones is not going to help with that. On the
contrary, it will add yet another layer of difficulty.

mprotect() controls PTE permissions, not EPCM permissions. Having this
division is the cornerstone of any sort of confidential computing.
That's why EACCEPT and EACCEPTCOPY exist.

There is no "current behaviour" yet because there is no mainline code, i.e.
that is an easy one to address.

> 2) Only half EPCM permission management:
> Moving to mprotect() as a way to set EPCM permissions is
> not a clear interface for EPCM permission management because
> the kernel can only restrict permissions. Even so, the kernel
> has no insight into the current EPCM permissions and thus whether they
> actually need to be restricted so every mprotect() call,
> all except RWX, will need to be treated as a permission
> restriction with all the implementation obstacles
> that accompany it (more below).
>
> There are two possible ways to implement permission restriction
> as triggered by mprotect(), (a) during the mprotect() call or
> (b) during a subsequent #PF (as suggested by you), each has
> its own obstacles.

I would also have preferred for EAUG to be bundled unconditionally into the
mmap() flow. I've merely said that I don't care whether it is a part of the
mprotect() flow or in the #PF handler, as long as the feature is not
uncontrolled chaos. In the mprotect() case, at least, it is probably easier
to implement it directly as part of mprotect().

The kernel is not the most trusted party in confidential computing
scenarios. It is one of the adversaries. And SGX is designed in such a way
that the enclave controls the EPCM and the kernel controls the PTEs. By
trying to artificially limit this you don't bring security; you only
block implementing applications based on SGX2.

We can ditch the whole SGX, if the point is that kernel controls what
happens inside enclave. Normal VMAs are much more capable for that purpose,
and kernel has full control over them with e.g. PTEs.

>
> 3) mprotect() implementation
>
> When the user calls mprotect() the expectation is that the
> call will either succeed or fail. If the call fails the user
> expects the system to be unchanged. This is not possible if
> permission restriction is done as part of mprotect().
>
> (a) mprotect() may span multiple VMAs and involves VMA splits
> that (from what I understand) cannot be undone. SGX memory
> does not support VMA merges. If any SGX function
> (EMODPR or ETRACK on any page) done after a VMA split fails
> then the user will be left with fragmented memory.

Oh well, SGX does not even support syscalls, if we go to this level of
arguments. And you are trying to sort this out with an even more flaky
interface, rather than a stable EPCM reset to the read state.

I've been implementing this exact feature lately and the only realistic way
to do it without many corner cases is to first use the current ioctl to reset
the range to READ in the EPCM, and then set the appropriate permissions with
EMODPE.
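
As a minimal sketch of that two-step flow, the C model below treats EPCM
permissions as a bitmask, with EMODPR only able to clear bits and EMODPE
only able to set them. The ex_ names are hypothetical, and the model leaves
out the ETRACK and EACCEPT steps that a real flow needs between the two
instructions.

```c
#include <stdbool.h>
#include <stdint.h>

enum { EX_PERM_R = 1, EX_PERM_W = 2, EX_PERM_X = 4 };

/* Model of ENCLS[EMODPR]: can only remove permission bits. */
static uint8_t ex_model_emodpr(uint8_t epcm, uint8_t requested)
{
	return (uint8_t)(epcm & requested);
}

/* Model of ENCLU[EMODPE]: can only add permission bits. */
static uint8_t ex_model_emodpe(uint8_t epcm, uint8_t requested)
{
	return (uint8_t)(epcm | requested);
}

/*
 * Reach an arbitrary target by resetting to READ and then extending.
 * Targets without READ are rejected, matching the PROT_NONE discussion
 * earlier in the thread.
 */
static bool ex_set_perms_via_reset(uint8_t *epcm, uint8_t target)
{
	if (!(target & EX_PERM_R))
		return false;

	/* Step 1 (kernel, via the ioctl): restrict the page to READ. */
	*epcm = ex_model_emodpr(*epcm, EX_PERM_R);

	/* A real flow would run ETRACK here and the enclave would EACCEPT. */

	/* Step 2 (enclave): extend from READ to the requested permissions. */
	*epcm = ex_model_emodpe(*epcm, target);
	return true;
}
```

Starting from RW, a target of RX ends up as exactly RX because W is dropped
in step 1 and X is added in step 2.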


> (b) The EMODPR/ETRACK pair can fail on any of the pages provided
> by the mprotect() call. If there is a failure then the
> kernel cannot undo previously executed EMODPR since the kernel
> cannot run EMODPE. The EPCM permissions are thus left in inconsistent
> state since some of the pages would have changed EPCM permissions
> and mprotect() does not have mechanism to communicate
> partial success.
> The partial success is needed to communicate to user space
> (i) which pages need EACCEPT, (ii) which pages need to be
> in new request (although user space does not have information
> to help the new request succeed - see below).

It's true, but how common is that? Return e.g. -EIO, and the run-time will
re-build the enclave. That happens all the time with SGX anyway, for
various reasons (e.g. VM migration, S3 and whatnot). It's only important
that you know when this happens.

>
> (c) User space runtime has control over management of EPC memory
> and accurate failure information would help it to do so.
> Knowing the error code of the EMODPR failure would help
> user space to take appropriate action. For example, EMODPR
> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
> to learn that it needs to run EACCEPT on that page before
> the EMODPR can succeed. Alternatively, if it learns that the
> return is "SGX_EPC_PAGE_CONFLICT" then it could determine
> that some other part of the runtime attempted an ENCLU
> function on that page.
> It is not possible to provide such detailed errors to user
> space with mprotect().

Actually the user space run-time is also an adversary. The kernel and user
space can e.g. kill the enclave or limit it with PTEs, but the EPCM is
beyond them *after* initialization. The whole point is to be able
to put e.g. containers into an untrusted cloud.
>
>
> 4) #PF implementation
>
> (a) There is more to restricting permissions than just running
> ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
> also initiate the ETRACK flow to ensure that any thread within
> the enclave is interrupted by sending an IPI to the CPU,
> this includes the thread that just triggered the #PF.
>
> (b) Second consideration of the EMODPR and ETRACK flow is that
> this has a large "blast radius" in that any thread in the
> enclave needs to be interrupted. #PFs may arrive at any time
> so setting up a page range where a fault into any page in the
> page range will trigger enclave exits for all threads is
> a significant yet random impact. I believe it would be better
> to update all pages in the range at the same time and in this
> way contain the impact of this significant EMODPR/ETRACK/IPIs
> flow.
>
> (c) How will the page fault handler know when EMODPR/ETRACK should
> be run? Consider that the page fault handler can be called
> significantly later than the mprotect() call and that
> user space can call EMODPE any time to extend permissions.
> This implies that EMODPR/ETRACK/IPIs should be run during
> *every* page fault, irrespective of mprotect().
>
> (d) If a page is in pending or modified state then EMODPR will
> always fail. This is something that needs to be fixed by
> user space runtime but the page fault will not be able
> to communicate this.
>
> Considering the above, could you please provide clear guidance on
> how you envision permission restriction to be supported by mprotect()?

I'm not specifically driving a #PF implementation, but because it was so
important for EAUG, I said that I'm fine with a #PF based implementation.

Personally, I would do both EAUG and EMODPR as part of the mmap() and
mprotect() flows (e.g. to catch that partial success and return that -EIO),
but either works for me. The API is more of a concern than the
internals.

>
> Reinette

BR, Jarkko

2022-03-18 00:55:05

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>>>> the enclave memory without needing to map it.
>>>>>
>>>>> Which is opposite what you do in EAUG. You can also augment pages without
>>>>> needing the map them. Sure you get that capability, but it is quite useless
>>>>> in practice.
>>>>>
>>>>>> I have considered the idea of supporting the permission restriction with
>>>>>> mprotect() but as you can see in this response I did not find it to be
>>>>>> practical.
>>>>>
>>>>> Where is it practical? What is your application? How is it practical to
>>>>> delegate the concurrency management of a split mprotect() to user space?
>>>>> How do we get rid off a useless up-call to the host?
>>>>>
>>>>
>>>> The email you responded to contained many obstacles against using mprotect()
>>>> but you chose to ignore them and snipped them all from your response. Could
>>>> you please address the issues instead of dismissing them?
>>>
>>> I did read the whole email but did not see anything that would make a case
>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>>
>> I believe that on its own each obstacle I shared with you is significant enough
>> to not follow that approach. You simply respond that I am just not making a
>> case without acknowledging any obstacle or providing a reason why the obstacles
>> are not valid.
>>
>> To help me understand your view, could you please respond to each of the
>> obstacles I list below and how it is not an issue?
>>
>>
>> 1) ABI change:
>> mprotect() is currently supported to modify VMA permissions
>> irrespective of EPCM permissions. Supporting EPCM permission
>> changes with mprotect() would change this behavior.
>> For example, currently it is possible to have RW enclave
>> memory and support multiple tasks accessing the memory. Two
>> tasks can map the memory RW and later one can run mprotect()
>> to reduce the VMA permissions to read-only without impacting
>> the access of the other task.
>> By moving EPCM permission changes to mprotect() this usage
>> will no longer be supported and current behavior will change.
>
> Your concurrency scenario is somewhat artificial. Obviously you need to
> synchronize somehow, and breaking something that could be done with one
> system call into two separates is not going to help with that. On the
> contrary, it will add a yet one more difficulty layer.

This is about supporting multiple threads in a single enclave, they can
all have their own memory mappings based on the needs. This is currently
supported in mainline as part of SGX1.

>
> mprotect() controls PTE permissions, not EPCM permissions. It is the corner
> stone to do any sort of confidential computing to have this division.
> That's why EACCEPT and EACCEPTCOPY exist.

Right, mprotect() controls PTE permissions but now you are requesting it
to control EPCM permissions also.

There is only one permission field in the mprotect() API so this implies
that you request VMA and EPCM permissions to be in sync. This is new
behavior - different from the current mainline behavior.

>
> There is no "current behaviour" yet because there is no mainline code, i.e.
> that is easy one to address.

What I described is the current behavior in mainline code. It is the
current SGX1 behavior. Running the environment I described on an SGX2
system with the mprotect() behavior you propose would result in new
behavior: some threads would encounter page faults with the SGX error
code even though they run without issue on an SGX1 system.
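
A toy model of this scenario, as a sketch only: one enclave page has a
single EPCM permission set shared by everyone, while each of two tasks has
its own VMA permissions, and effective access is taken as the intersection
of the two. The ex_ names are hypothetical and the model ignores everything
except the permission bits.

```c
#include <stdint.h>

enum { EX_PERM_R = 1, EX_PERM_W = 2, EX_PERM_X = 4 };

struct ex_page {
	uint8_t epcm;   /* one EPCM entry, shared by all mappings */
	uint8_t vma[2]; /* per-task VMA permissions */
};

/* Access a task really gets: both the VMA and the EPCM must allow it. */
static uint8_t ex_effective(const struct ex_page *p, int task)
{
	return (uint8_t)(p->vma[task] & p->epcm);
}

/* Current behavior: mprotect() changes only the caller's VMA. */
static void ex_mprotect_vma_only(struct ex_page *p, int task, uint8_t prot)
{
	p->vma[task] = prot;
}

/* Proposed behavior: mprotect() also restricts the shared EPCM entry. */
static void ex_mprotect_coupled(struct ex_page *p, int task, uint8_t prot)
{
	p->vma[task] = prot;
	p->epcm = (uint8_t)(p->epcm & prot);
}
```

With both tasks mapped RW, a read-only ex_mprotect_vma_only() by task 0
leaves task 1 with RW, whereas ex_mprotect_coupled() drops task 1 to
read-only as well, so its next write would fault.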

I do consider this an ABI change. It should be addressed
before using mprotect() for EPCM permissions can be considered.

Please do provide your opinion about the ABI change.

>> 2) Only half EPCM permission management:
>> Moving to mprotect() as a way to set EPCM permissions is
>> not a clear interface for EPCM permission management because
>> the kernel can only restrict permissions. Even so, the kernel
>> has no insight into the current EPCM permissions and thus whether they
>> actually need to be restricted so every mprotect() call,
>> all except RWX, will need to be treated as a permission
>> restriction with all the implementation obstacles
>> that accompany it (more below).
>>
>> There are two possible ways to implement permission restriction
>> as triggered by mprotect(), (a) during the mprotect() call or
>> (b) during a subsequent #PF (as suggested by you), each has
>> its own obstacles.
>
> I would have prefered also for EAUG to bundle it unconditionally to mmap()
> flow. I've merely said that I don't care whether it is a part of mprotect()
> flow or in the #PF handler, as long as the feature is not uncontrolled
> chaos. Probably at least in mprotect() case it is easier flow to implement
> it directly as part of mprotect().
>
> Kernel is not the most trusted party in the confidential computing
> scenarios. It is one of the adversaries. And SGX is designed in the way
> that enclave controls EPCMD database and kernel PTEs. By trying to
> artificially limit this you don't bring security, other than trying to
> block implementing applications based on SGX2.

I do not follow your argument. How is implementing EPCM permission restriction
with an ioctl() limiting anything?

>
> We can ditch the whole SGX, if the point is that kernel controls what
> happens inside enclave. Normal VMAs are much more capable for that purpose,
> and kernel has full control over them with e.g. PTEs.
>
>>
>> 3) mprotect() implementation
>>
>> When the user calls mprotect() the expectation is that the
>> call will either succeed or fail. If the call fails the user
>> expects the system to be unchanged. This is not possible if
>> permission restriction is done as part of mprotect().
>>
>> (a) mprotect() may span multiple VMAs and involves VMA splits
>> that (from what I understand) cannot be undone. SGX memory
>> does not support VMA merges. If any SGX function
>> (EMODPR or ETRACK on any page) done after a VMA split fails
>> then the user will be left with fragmented memory.
>
> Oh well, SGX does not even support syscalls, if we go this level of
> arguments. And you are trying to sort this out with even more flakky
> interface, rather than stable EPCM reset to read state.

I did not find your answer on how to handle this obstacle. Are you
saying that leaving the user with fragmented memory and inconsistent
state is acceptable?

Could you please elaborate? I am trying to understand how to support
this permission restriction with mprotect() and I get stuck on the scenario
where VMAs need to be split - this has to be handled if we go this route.

If it is possible to integrate with mprotect() then I can do so but I
do not see how to do so yet and here I mention one issue and you
again just dismiss it. If we are not able to handle this then it is
indeed mprotect() that will be the "flakky interface" and we should
stick with the ioctl().
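
The fragmentation concern in 3(a) can be sketched with a toy model. This is
a minimal illustration, not kernel code: struct toy_vma, toy_split() and
toy_mprotect() are hypothetical names, and the sgx_step_fails flag stands in
for a failing EMODPR or ETRACK. It assumes, as stated above, that a split is
not undone and that SGX VMAs are not merged.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VMAS 8

/* Hypothetical stand-ins for a VMA and an address space. */
struct toy_vma { unsigned long start, end; unsigned prot; };
struct toy_mm  { struct toy_vma vma[TOY_MAX_VMAS]; size_t n; };

/* Split VMA i at addr. Returns 0 on success or if no split is needed. */
static int toy_split(struct toy_mm *mm, size_t i, unsigned long addr)
{
	if (addr <= mm->vma[i].start || addr >= mm->vma[i].end)
		return 0;	/* addr is already a boundary */
	if (mm->n == TOY_MAX_VMAS)
		return -1;	/* no room for another VMA */

	/* Shift the tail up by one slot, then cut VMA i in two. */
	for (size_t j = mm->n; j > i + 1; j--)
		mm->vma[j] = mm->vma[j - 1];
	mm->vma[i + 1] = mm->vma[i];
	mm->vma[i + 1].start = addr;
	mm->vma[i].end = addr;
	mm->n++;
	return 0;
}

/*
 * Toy mprotect() on a range inside a single VMA: split first, then run
 * the (simulated) SGX step. A failure returns -1 but keeps the splits.
 */
static int toy_mprotect(struct toy_mm *mm, unsigned long start,
			unsigned long end, unsigned prot, int sgx_step_fails)
{
	size_t i = 0;

	assert(mm && start < end);
	while (i < mm->n &&
	       !(mm->vma[i].start <= start && end <= mm->vma[i].end))
		i++;
	if (i == mm->n)
		return -1;	/* range not covered by one VMA */

	if (toy_split(mm, i, start))
		return -1;
	if (mm->vma[i].start != start)
		i++;		/* the range now begins in the upper half */
	if (toy_split(mm, i, end))
		return -1;

	if (sgx_step_fails)
		return -1;	/* nothing merges the pieces back */

	mm->vma[i].prot = prot;
	return 0;
}
```

Starting from one VMA covering [0x0, 0x4000), a failed toy_mprotect() on
[0x1000, 0x2000) returns -1 yet leaves three VMAs behind, all still carrying
the old protection.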


> I've been implementing this exact feature lately and only realistic way to
> do it without many corner cases is first use the current ioctl to reset the
> range to READ in EPCM, and with EMODPE set the appropriate permissions.

This is supported in the current implementation with the
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().

>
>
>> (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>> by the mprotect() call. If there is a failure then the
>> kernel cannot undo previously executed EMODPR since the kernel
>> cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>> state since some of the pages would have changed EPCM permissions
>> and mprotect() does not have mechanism to communicate
>> partial success.
>> The partial success is needed to communicate to user space
>> (i) which pages need EACCEPT, (ii) which pages need to be
>> in new request (although user space does not have information
>> to help the new request succeed - see below).
>
> It's true but how common is that?

The kernel needs to handle all scenarios, whether it is common or not.

> Return e.g. -EIO, and run-time will
> re-build the enclave. That anyway happens all the time with SGX for
> various reasons (e.g. VM migration, S3 and whatnot). It's only important
> that you know when this happens.

Please confirm: you support a user space implementation using mprotect()
that can leave the system in inconsistent state?


>> (c) User space runtime has control over management of EPC memory
>> and accurate failure information would help it to do so.
>> Knowing the error code of the EMODPR failure would help
>> user space to take appropriate action. For example, EMODPR
>> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>> to learn that it needs to run EACCEPT on that page before
>> the EMODPR can succeed. Alternatively, if it learns that the
>> return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>> that some other part of the runtime attempted an ENCLU
>> function on that page.
>> It is not possible to provide such detailed errors to user
>> space with mprotect().
>
> Actually user space run-time is also an adversary. Kernel and user
> space can e.g. kill the enclave or limit it with PTEs but EPCM is
> beyond them *after* initialization. The whole point is to be able
> to put e.g. containers to untrusted cloud.

You seem to be saying that while the kernel could help the
runtime to manage the enclave it should not. Is this correct?

There may be scenarios where an enclave could repair itself during runtime,
for example by running EACCEPT on a page that had a PENDING bit set.
This information is provided to the runtime with the
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
implementation the kernel cannot provide this information and thus
forces the enclave to be torn down and rebuilt to recover.

Is this (using mprotect()) the kernel implementation you prefer?
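
The difference in what the two interfaces can report can be sketched as
follows. This is a minimal model under stated assumptions, not the patch
itself: the struct layout, the TOY_* result codes and toy_restrict_range()
are hypothetical, and a page with its pending flag set simply stands in for
a page on which EMODPR would fail until the enclave runs EACCEPT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_PROT_R 0x1u
#define TOY_PROT_W 0x2u
#define TOY_PROT_X 0x4u

/* Hypothetical result codes; illustrative names, not the kernel's. */
enum toy_result { TOY_OK = 0, TOY_PAGE_NOT_MODIFIABLE, TOY_EPC_PAGE_CONFLICT };

/* Toy stand-in for one EPCM entry. */
struct toy_epcm_page {
	uint8_t perms;		/* TOY_PROT_* bits */
	uint8_t pending;	/* nonzero: page still needs EACCEPT */
};

/* Toy stand-in for the ioctl() argument: a range in, two results out. */
struct toy_restrict_req {
	uint64_t offset;	/* in: byte offset of the first page */
	uint64_t length;	/* in: byte length of the range */
	uint8_t  perms;		/* in: permissions to restrict to */
	int      result;	/* out: code of the first failure */
	uint64_t count;		/* out: bytes successfully changed */
};

/*
 * Restrict pages in order and stop at the first failure.
 * Returns 0 on full success and -1 otherwise; on partial success
 * req->count and req->result say how far it got and why it stopped.
 */
static int toy_restrict_range(struct toy_epcm_page *pages, size_t npages,
			      struct toy_restrict_req *req)
{
	size_t first, n;

	assert(pages && req);
	first = req->offset / TOY_PAGE_SIZE;
	n = req->length / TOY_PAGE_SIZE;
	req->result = TOY_OK;
	req->count = 0;

	if (first > npages || n > npages - first)
		return -1;	/* range is outside the toy enclave */

	for (size_t i = first; i < first + n; i++) {
		if (pages[i].pending) {
			req->result = TOY_PAGE_NOT_MODIFIABLE;
			return -1;
		}
		pages[i].perms &= req->perms;	/* restriction only clears bits */
		req->count += TOY_PAGE_SIZE;
	}
	return 0;
}
```

With the count and result a runtime can tell which pages were changed and
which page needs attention before retrying. A bare -1 with errno, which is
all mprotect() could return, carries neither.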

>> 4) #PF implementation
>>
>> (a) There is more to restricting permissions than just running
>> ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
>> also initiate the ETRACK flow to ensure that any thread within
>> the enclave is interrupted by sending an IPI to the CPU,
>> this includes the thread that just triggered the #PF.
>>
>> (b) Second consideration of the EMODPR and ETRACK flow is that
>> this has a large "blast radius" in that any thread in the
>> enclave needs to be interrupted. #PFs may arrive at any time
>> so setting up a page range where a fault into any page in the
>> page range will trigger enclave exits for all threads is
>> a significant yet random impact. I believe it would be better
>> to update all pages in the range at the same time and in this
>> way contain the impact of this significant EMODPR/ETRACK/IPIs
>> flow.
>>
>> (c) How will the page fault handler know when EMODPR/ETRACK should
>> be run? Consider that the page fault handler can be called
>> significantly later than the mprotect() call and that
>> user space can call EMODPE any time to extend permissions.
>> This implies that EMODPR/ETRACK/IPIs should be run during
>> *every* page fault, irrespective of mprotect().
>>
>> (d) If a page is in pending or modified state then EMODPR will
>> always fail. This is something that needs to be fixed by
>> user space runtime but the page fault will not be able
>> to communicate this.
>>
>> Considering the above, could you please provide clear guidance on
>> how you envision permission restriction to be supported by mprotect()?
>
> I'm not specifically driving #PF implementation but because it was so
> important for EAUG, I said that I'm fine with #PF based implementation.
>
> Personally, I would do both EAUG and EMODPR as part of mmap() and
> mprotect() (e.g. to catch that partial success and return that -EIO)
> flow but either works for me. The API is more of a concern than the
> internals.

Are you now requesting EMODPR as part of mmap() also? Could you
please elaborate how mmap() and mprotect() can handle partial success?

Reinette

2022-03-21 09:59:20

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> >>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>>>> the enclave memory without needing to map it.
> >>>>>
> >>>>> Which is opposite what you do in EAUG. You can also augment pages without
> >>>>> needing to map them. Sure you get that capability, but it is quite useless
> >>>>> in practice.
> >>>>>
> >>>>>> I have considered the idea of supporting the permission restriction with
> >>>>>> mprotect() but as you can see in this response I did not find it to be
> >>>>>> practical.
> >>>>>
> >>>>> Where is it practical? What is your application? How is it practical to
> >>>>> delegate the concurrency management of a split mprotect() to user space?
> >>>>> How do we get rid of a useless up-call to the host?
> >>>>>
> >>>>
> >>>> The email you responded to contained many obstacles against using mprotect()
> >>>> but you chose to ignore them and snipped them all from your response. Could
> >>>> you please address the issues instead of dismissing them?
> >>>
> >>> I did read the whole email but did not see anything that would make a case
> >>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> >>
> >> I believe that on its own each obstacle I shared with you is significant enough
> >> to not follow that approach. You simply respond that I am just not making a
> >> case without acknowledging any obstacle or providing a reason why the obstacles
> >> are not valid.
> >>
> >> To help me understand your view, could you please respond to each of the
> >> obstacles I list below and how it is not an issue?
> >>
> >>
> >> 1) ABI change:
> >> mprotect() is currently supported to modify VMA permissions
> >> irrespective of EPCM permissions. Supporting EPCM permission
> >> changes with mprotect() would change this behavior.
> >> For example, currently it is possible to have RW enclave
> >> memory and support multiple tasks accessing the memory. Two
> >> tasks can map the memory RW and later one can run mprotect()
> >> to reduce the VMA permissions to read-only without impacting
> >> the access of the other task.
> >> By moving EPCM permission changes to mprotect() this usage
> >> will no longer be supported and current behavior will change.
> >
> > Your concurrency scenario is somewhat artificial. Obviously you need to
> > synchronize somehow, and breaking something that could be done with one
> > system call into two separates is not going to help with that. On the
> > contrary, it will add a yet one more difficulty layer.
>
> This is about supporting multiple threads in a single enclave, they can
> all have their own memory mappings based on the needs. This is currently
> supported in mainline as part of SGX1.
>
> >
> > mprotect() controls PTE permissions, not EPCM permissions. It is the corner
> > stone to do any sort of confidential computing to have this division.
> > That's why EACCEPT and EACCEPTCOPY exist.
>
> Right, mprotect() controls PTE permissions but now you are requesting it
> to control EPCM permissions also.
>
> There is only one permission field in the mprotect() API so this implies
> that you request VMA and EPCM permissions to be in sync. This is new
> behavior - different from the current mainline behavior.

Not true. mprotect() should reset the EPCM permissions to a fixed PROT_READ
with EMODPR. The enclave can then use EMODPE to set the permissions.
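
The reset-then-extend flow can be sketched as bit operations. This is a
minimal model, not an implementation: toy_emodpr(), toy_emodpe() and
toy_set_perms() are hypothetical names, and the only properties assumed are
that EMODPR can only remove permission bits and EMODPE can only add them.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PROT_R 0x1u
#define TOY_PROT_W 0x2u
#define TOY_PROT_X 0x4u

/* EMODPR-like step (run by the kernel): can only clear permission bits. */
static uint8_t toy_emodpr(uint8_t epcm, uint8_t requested)
{
	return epcm & requested;
}

/* EMODPE-like step (run inside the enclave): can only set permission bits. */
static uint8_t toy_emodpe(uint8_t epcm, uint8_t requested)
{
	return epcm | requested;
}

/*
 * Reach an arbitrary permission set in two steps: reset to read-only,
 * then extend to what is wanted. The wanted set must include read.
 */
static uint8_t toy_set_perms(uint8_t epcm, uint8_t wanted)
{
	assert(wanted & TOY_PROT_R);
	epcm = toy_emodpr(epcm, TOY_PROT_R);
	return toy_emodpe(epcm, wanted);
}
```

Going from RW to RX this way needs no knowledge of the current EPCM
permissions: the reset removes W and the extension adds X.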

>
> >
> > There is no "current behaviour" yet because there is no mainline code, i.e.
> > that is easy one to address.
>
> What I described is the current behavior in mainline code. It is the
> current SGX1 behavior. Running an environment as I described on a SGX2
> system with the mprotect() behavior you propose will see new behavior
> with some threads encountering page faults with SGX error
> code when it could run without issue on SGX1 system.
>
> I do consider this an ABI change. It should be addressed
> before using mprotect() for EPCM permissions can be considered.
>
> Please do provide your opinion about the ABI change.

With SGX1 there's no meaningful use for mprotect() after EINIT. This
would of course be applicable only after EINIT, not before. We have a flag
to check whether the enclave has been initialized.

>
> >> 2) Only half EPCM permission management:
> >> Moving to mprotect() as a way to set EPCM permissions is
> >> not a clear interface for EPCM permission management because
> >> the kernel can only restrict permissions. Even so, the kernel
> >> has no insight into the current EPCM permissions and thus whether they
> >> actually need to be restricted so every mprotect() call,
> >> all except RWX, will need to be treated as a permission
> >> restriction with all the implementation obstacles
> >> that accompany it (more below).
> >>
> >> There are two possible ways to implement permission restriction
> >> as triggered by mprotect(), (a) during the mprotect() call or
> >> (b) during a subsequent #PF (as suggested by you), each has
> >> its own obstacles.
> >
> > I would have preferred also for EAUG to bundle it unconditionally to mmap()
> > flow. I've merely said that I don't care whether it is a part of mprotect()
> > flow or in the #PF handler, as long as the feature is not uncontrolled
> > chaos. Probably at least in mprotect() case it is easier flow to implement
> > it directly as part of mprotect().
> >
> > Kernel is not the most trusted party in the confidential computing
> > scenarios. It is one of the adversaries. And SGX is designed in the way
> > that enclave controls EPCMD database and kernel PTEs. By trying to
> > artificially limit this you don't bring security, other than trying to
> > block implementing applications based on SGX2.
>
> I do not follow your argument. How is implementing EPCM permission restriction
> with an ioctl() limiting anything?

If you use minimal permissions with EMODPR, it frees EMODPE to be used
as if it were EMODP, which is great.

>
> >
> > We can ditch the whole SGX, if the point is that kernel controls what
> > happens inside enclave. Normal VMAs are much more capable for that purpose,
> > and kernel has full control over them with e.g. PTEs.
> >
> >>
> >> 3) mprotect() implementation
> >>
> >> When the user calls mprotect() the expectation is that the
> >> call will either succeed or fail. If the call fails the user
> >> expects the system to be unchanged. This is not possible if
> >> permission restriction is done as part of mprotect().
> >>
> >> (a) mprotect() may span multiple VMAs and involves VMA splits
> >> that (from what I understand) cannot be undone. SGX memory
> >> does not support VMA merges. If any SGX function
> >> (EMODPR or ETRACK on any page) done after a VMA split fails
> >> then the user will be left with fragmented memory.
> >
> > Oh well, SGX does not even support syscalls, if we go this level of
> > arguments. And you are trying to sort this out with even more flakky
> > interface, rather than stable EPCM reset to read state.
>
> I did not find your answer on how to handle this obstacle. Are you
> saying that leaving the user with fragmented memory and inconsistent
> state is acceptable?
>
> Could you please elaborate? I am trying to understand how to support
> this permission restriction with mprotect() and I get stuck on the scenario
> where VMAs need to be split - this has to be handled if we go this route.
>
> If it is possible to integrate with mprotect() then I can do so but I
> do not see how to do so yet and here I mention one issue and you
> again just dismiss it. If we are not able to handle this then it is
> indeed mprotect() that will be the "flakky interface" and we should
> stick with the ioctl().

It's flaky because you have to pair every single mprotect() with an
ioctl() that unconditionally sets PROT_READ. It is also worse
concurrency-wise, because mprotect() can do both with mmap_sem held. It adds
an extra, useless round trip to the kernel.

>
>
> > I've been implementing this exact feature lately and only realistic way to
> > do it without many corner cases is first use the current ioctl to reset the
> > range to READ in EPCM, and with EMODPE set the appropriate permissions.
>
> This is supported in the current implementation with the
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().
>
> >
> >
> >> (b) The EMODPR/ETRACK pair can fail on any of the pages provided
> >> by the mprotect() call. If there is a failure then the
> >> kernel cannot undo previously executed EMODPR since the kernel
> >> cannot run EMODPE. The EPCM permissions are thus left in inconsistent
> >> state since some of the pages would have changed EPCM permissions
> >> and mprotect() does not have mechanism to communicate
> >> partial success.
> >> The partial success is needed to communicate to user space
> >> (i) which pages need EACCEPT, (ii) which pages need to be
> >> in new request (although user space does not have information
> >> to help the new request succeed - see below).
> >
> > It's true but how common is that?
>
> The kernel needs to handle all scenarios, whether it is common or not.

This is not true. The kernel needs to provide a meaningful interface to the
hardware that does not allow user space to do stupid things. We do not provide
a 1:1 interface to every single hardware interface. Allowing the use of EMODPE
actually does provide full control of the permissions. That should be
enough.

>
> > Return e.g. -EIO, and run-time will
> > re-build the enclave. That anyway happens all the time with SGX for
> > various reasons (e.g. VM migration, S3 and whatnot). It's only important
> > that you know when this happens.
>
> Please confirm: you support a user space implementation using mprotect()
> that can leave the system in inconsistent state?

It actually does not leave kernel structures in an inconsistent state, so it's
all fine. Partial success is almost nonexistent unless there is an actual bug
in the run-time. It's the same as with files, sockets etc. If partial success
happens, user space is probably already in an inconsistent state.

I'm not sure how "system" is defined here so I cannot give a definitive
yes/no answer.

User space kicking itself in the foot is not something that the kernel usually
has to take extra measures for.

>
>
> >> (c) User space runtime has control over management of EPC memory
> >> and accurate failure information would help it to do so.
> >> Knowing the error code of the EMODPR failure would help
> >> user space to take appropriate action. For example, EMODPR
> >> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
> >> to learn that it needs to run EACCEPT on that page before
> >> the EMODPR can succeed. Alternatively, if it learns that the
> >> return is "SGX_EPC_PAGE_CONFLICT" then it could determine
> >> that some other part of the runtime attempted an ENCLU
> >> function on that page.
> >> It is not possible to provide such detailed errors to user
> >> space with mprotect().
> >
> > Actually user space run-time is also an adversary. Kernel and user
> > space can e.g. kill the enclave or limit it with PTEs but EPCM is
> > beyond them *after* initialization. The whole point is to be able
> > to put e.g. containers to untrusted cloud.
>
> You seem to be saying that while the kernel could help the
> runtime to manage the enclave it should not. Is this correct?
>
> There may be scenarios where an enclave could repair itself during runtime,
> for example by running EACCEPT on a page that had a PENDING bit set.
> This information is provided to the runtime with the
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
> implementation the kernel cannot provide this information and thus
> forces the enclave to be torn down and rebuilt to recover.
>
> Is this (using mprotect()) the kernel implementation you prefer?

If there is partial success it's a bug, not a legitimate scenario for a
well-behaved run-time.

>
> >> 4) #PF implementation
> >>
> >> (a) There is more to restricting permissions than just running
> >> ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
> >> also initiate the ETRACK flow to ensure that any thread within
> >> the enclave is interrupted by sending an IPI to the CPU,
> >> this includes the thread that just triggered the #PF.
> >>
> >> (b) Second consideration of the EMODPR and ETRACK flow is that
> >> this has a large "blast radius" in that any thread in the
> >> enclave needs to be interrupted. #PFs may arrive at any time
> >> so setting up a page range where a fault into any page in the
> >> page range will trigger enclave exits for all threads is
> >> a significant yet random impact. I believe it would be better
> >> to update all pages in the range at the same time and in this
> >> way contain the impact of this significant EMODPR/ETRACK/IPIs
> >> flow.
> >>
> >> (c) How will the page fault handler know when EMODPR/ETRACK should
> >> be run? Consider that the page fault handler can be called
> >> significantly later than the mprotect() call and that
> >> user space can call EMODPE any time to extend permissions.
> >> This implies that EMODPR/ETRACK/IPIs should be run during
> >> *every* page fault, irrespective of mprotect().
> >>
> >> (d) If a page is in pending or modified state then EMODPR will
> >> always fail. This is something that needs to be fixed by
> >> user space runtime but the page fault will not be able
> >> to communicate this.
> >>
> >> Considering the above, could you please provide clear guidance on
> >> how you envision permission restriction to be supported by mprotect()?
> >
> > I'm not specifically driving #PF implementation but because it was so
> > important for EAUG, I said that I'm fine with #PF based implementation.
> >
> > Personally, I would do both EAUG and EMODPR as part of mmap() and
> > mprotect() (e.g. to catch that partial success and return that -EIO)
> > flow but either works for me. The API is more of a concern than the
> > internals.
>
> Are you now requesting EMODPR as part of mmap() also? Could you
> please elaborate how mmap() and mprotect() can handle partial success?

Nope, I was just pointing out that EAUG is #PF based but could also have
been implemented as part of the mmap() flow. API-wise it is symmetrical.

BR, Jarkko

2022-03-28 23:54:08

by Reinette Chatre

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

Hi Jarkko,

On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
>>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
>>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>>>>>> Hi Jarkko,
>>>>>>
>>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>>>>>> the enclave memory without needing to map it.
>>>>>>>
>>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
>>>>>>> needing to map them. Sure you get that capability, but it is quite useless
>>>>>>> in practice.
>>>>>>>
>>>>>>>> I have considered the idea of supporting the permission restriction with
>>>>>>>> mprotect() but as you can see in this response I did not find it to be
>>>>>>>> practical.
>>>>>>>
>>>>>>> Where is it practical? What is your application? How is it practical to
>>>>>>> delegate the concurrency management of a split mprotect() to user space?
>>>>>>> How do we get rid of a useless up-call to the host?
>>>>>>>
>>>>>>
>>>>>> The email you responded to contained many obstacles against using mprotect()
>>>>>> but you chose to ignore them and snipped them all from your response. Could
>>>>>> you please address the issues instead of dismissing them?
>>>>>
>>>>> I did read the whole email but did not see anything that would make a case
>>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>>>>
>>>> I believe that on its own each obstacle I shared with you is significant enough
>>>> to not follow that approach. You simply respond that I am just not making a
>>>> case without acknowledging any obstacle or providing a reason why the obstacles
>>>> are not valid.
>>>>
>>>> To help me understand your view, could you please respond to each of the
>>>> obstacles I list below and how it is not an issue?
>>>>
>>>>
>>>> 1) ABI change:
>>>> mprotect() is currently supported to modify VMA permissions
>>>> irrespective of EPCM permissions. Supporting EPCM permission
>>>> changes with mprotect() would change this behavior.
>>>> For example, currently it is possible to have RW enclave
>>>> memory and support multiple tasks accessing the memory. Two
>>>> tasks can map the memory RW and later one can run mprotect()
>>>> to reduce the VMA permissions to read-only without impacting
>>>> the access of the other task.
>>>> By moving EPCM permission changes to mprotect() this usage
>>>> will no longer be supported and current behavior will change.
>>>
>>> Your concurrency scenario is somewhat artificial. Obviously you need to
>>> synchronize somehow, and breaking something that could be done with one
>>> system call into two separates is not going to help with that. On the
>>> contrary, it will add a yet one more difficulty layer.
>>
>> This is about supporting multiple threads in a single enclave, they can
>> all have their own memory mappings based on the needs. This is currently
>> supported in mainline as part of SGX1.


Could you please comment on the above?

>>
>>>
>>> mprotect() controls PTE permissions, not EPCM permissions. It is the corner
>>> stone to do any sort of confidential computing to have this division.
>>> That's why EACCEPT and EACCEPTCOPY exist.
>>
>> Right, mprotect() controls PTE permissions but now you are requesting it
>> to control EPCM permissions also.
>>
>> There is only one permission field in the mprotect() API so this implies
>> that you request VMA and EPCM permissions to be in sync. This is new
>> behavior - different from the current mainline behavior.
>
> Not true. mprotect() should reset the EPCM permissions to a fixed PROT_READ
> with EMODPR. The enclave can then use EMODPE to set the permissions.

I think that I am starting to decipher what your vision is. If I understand
correctly mprotect() would serve a double purpose:
a) modify VMA permissions exactly as is done in SGX1 (no consideration of EPCM
permissions and only limitation is that VMA permissions are not allowed to
exceed vm_max_prot_bits)
b) EPCM permissions are _always_ restricted to PROT_READ irrespective of
VMA permissions requested (new)

Is this correct?

With mprotect() always resetting EPCM to be PROT_READ there is no new sync
between VMA and EPCM permissions.
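
The double purpose described in (a) and (b) can be sketched as a toy model.
This is only an illustration of the proposal as summarized here, not an
implementation: struct toy_encl_page and toy_sgx_mprotect() are hypothetical,
and the -EACCES return for exceeding vm_max_prot_bits is an assumption.

```c
#include <assert.h>
#include <errno.h>

#define TOY_PROT_R 0x1u
#define TOY_PROT_W 0x2u
#define TOY_PROT_X 0x4u

/* Hypothetical per-page state: the VMA ceiling plus both permission sets. */
struct toy_encl_page {
	unsigned vm_max_prot_bits;	/* ceiling for VMA permissions */
	unsigned vma_prot;		/* what the mapping allows */
	unsigned epcm_prot;		/* what the EPCM allows */
};

/*
 * Toy mprotect() under the proposal:
 * (a) VMA permissions follow the request, capped by vm_max_prot_bits;
 * (b) EPCM permissions are always reset to read-only, whatever was asked.
 */
static int toy_sgx_mprotect(struct toy_encl_page *p, unsigned prot)
{
	assert(p);
	if (prot & ~p->vm_max_prot_bits)
		return -EACCES;		/* request exceeds the ceiling */

	p->vma_prot = prot;		/* (a) same as SGX1 */
	p->epcm_prot &= TOY_PROT_R;	/* (b) new: unconditional EPCM reset */
	return 0;
}
```

After a successful call asking for RW, the mapping allows RW while the EPCM
allows only R, so the two are deliberately out of sync until the enclave
extends the EPCM permissions with EMODPE.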

>>> There is no "current behaviour" yet because there is no mainline code, i.e.
>>> that is easy one to address.
>>
>> What I described is the current behavior in mainline code. It is the
>> current SGX1 behavior. Running an environment as I described on a SGX2
>> system with the mprotect() behavior you propose will see new behavior
>> with some threads encountering page faults with SGX error
>> code when it could run without issue on SGX1 system.
>>
>> I do consider this an ABI change. It should be addressed
>> before using mprotect() for EPCM permissions can be considered.
>>
>> Please do provide your opinion about the ABI change.
>
> With SGX1 there's no meaningful use for mprotect() after EINIT. This
> would of course be applicable only after EINIT, not before. We have a flag
> to check whether the enclave has been initialized.

I interpret your comment to mean that the ABI change is acceptable since
existing usages of mprotect() after EINIT are not meaningful.

>>>> 2) Only half EPCM permission management:
>>>> Moving to mprotect() as a way to set EPCM permissions is
>>>> not a clear interface for EPCM permission management because
>>>> the kernel can only restrict permissions. Even so, the kernel
>>>> has no insight into the current EPCM permissions and thus whether they
>>>> actually need to be restricted so every mprotect() call,
>>>> all except RWX, will need to be treated as a permission
>>>> restriction with all the implementation obstacles
>>>> that accompany it (more below).
>>>>
>>>> There are two possible ways to implement permission restriction
>>>> as triggered by mprotect(), (a) during the mprotect() call or
>>>> (b) during a subsequent #PF (as suggested by you), each has
>>>> its own obstacles.
>>>
>>> I would have preferred also for EAUG to bundle it unconditionally to mmap()
>>> flow. I've merely said that I don't care whether it is a part of mprotect()
>>> flow or in the #PF handler, as long as the feature is not uncontrolled
>>> chaos. Probably at least in mprotect() case it is easier flow to implement
>>> it directly as part of mprotect().
>>>
>>> Kernel is not the most trusted party in the confidential computing
>>> scenarios. It is one of the adversaries. And SGX is designed in the way
>>> that enclave controls EPCMD database and kernel PTEs. By trying to
>>> artificially limit this you don't bring security, other than trying to
>>> block implementing applications based on SGX2.
>>
>> I do not follow your argument. How is implementing EPCM permission restriction
>> with an ioctl() limiting anything?
>
> If you use minimal permissions with EMODPR, it frees EMODPE to be used
> as if it were EMODP, which is great.

Understood.

>
>>
>>>
>>> We can ditch the whole SGX, if the point is that kernel controls what
>>> happens inside enclave. Normal VMAs are much more capable for that purpose,
>>> and kernel has full control over them with e.g. PTEs.
>>>
>>>>
>>>> 3) mprotect() implementation
>>>>
>>>> When the user calls mprotect() the expectation is that the
>>>> call will either succeed or fail. If the call fails the user
>>>> expects the system to be unchanged. This is not possible if
>>>> permission restriction is done as part of mprotect().
>>>>
>>>> (a) mprotect() may span multiple VMAs and involves VMA splits
>>>> that (from what I understand) cannot be undone. SGX memory
>>>> does not support VMA merges. If any SGX function
>>>> (EMODPR or ETRACK on any page) done after a VMA split fails
>>>> then the user will be left with fragmented memory.
>>>
>>> Oh well, SGX does not even support syscalls, if we go this level of
>>> arguments. And you are trying to sort this out with even more flakky
>>> interface, rather than stable EPCM reset to read state.
>>
>> I did not find your answer on how to handle this obstacle. Are you
>> saying that leaving the user with fragmented memory and inconsistent
>> state is acceptable?
>>
>> Could you please elaborate? I am trying to understand how to support
>> this permission restriction with mprotect() and I get stuck on the scenario
>> where VMAs need to be split - this has to be handled if we go this route.
>>
>> If it is possible to integrate with mprotect() then I can do so but I
>> do not see how to do so yet and here I mention one issue and you
>> again just dismiss it. If we are not able to handle this then it is
>> indeed mprotect() that will be the "flakky interface" and we should
>> stick with the ioctl().
>
> It's flaky because you have to pair every single mprotect() with an
> ioctl() that unconditionally sets PROT_READ. It is also worse
> concurrency-wise, because mprotect() can do both with mmap_sem held. It adds
> an extra, useless round trip to the kernel.

This still does not address my concern regarding possible fragmented memory.
Are you considering fragmented memory to be in the same category as the
inconsistent state mentioned below? (That it is a consequence of a bug in
the run-time?)

>>> I've been implementing this exact feature lately and only realistic way to
>>> do it without many corner cases is first use the current ioctl to reset the
>>> range to READ in EPCM, and with EMODPE set the appropriate permissions.
>>
>> This is supported in the current implementation with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().
>>
>>>
>>>
>>>> (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>>>> by the mprotect() call. If there is a failure then the
>>>> kernel cannot undo previously executed EMODPR since the kernel
>>>> cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>>>> state since some of the pages would have changed EPCM permissions
>>>> and mprotect() does not have mechanism to communicate
>>>> partial success.
>>>> The partial success is needed to communicate to user space
>>>> (i) which pages need EACCEPT, (ii) which pages need to be
>>>> in new request (although user space does not have information
>>>> to help the new request succeed - see below).
>>>
>>> It's true but how common is that?
>>
>> The kernel needs to handle all scenarios, whether it is common or not.
>
> This is not true. The kernel needs to provide a meaningful interface to the
> hardware that does not allow user space to do stupid things. We do not provide
> a 1:1 interface to every single hardware interface. Allowing the use of EMODPE
> actually does provide full control of the permissions. That should be
> enough.

I was not proposing that the kernel "provides a 1:1 interface for every single
hardware interface".

My comment was that the kernel needs to handle all user space scenarios.

It is possible that an enclave page is in a state where EMODPR can fail
because of something that needs to be fixed from within the enclave or run-time,
for example, clearing an EPCM.PENDING bit. The kernel needs to handle such
scenarios. I understand from your explanations that run-time handling of
such scenarios is not a goal or requirement and that they should instead
always require an enclave re-build.

>>> Return e.g. -EIO, and run-time will
>>> re-build the enclave. That anyway happens all the time with SGX for
>>> various reasons (e.g. VM migration, S3 and whatnot). It's only important
>>> that you know when this happens.
>>
>> Please confirm: you support a user space implementation using mprotect()
>> that can leave the system in an inconsistent state?
>
> It actually does not leave kernel structures in an inconsistent state, so
> it's all fine. Partial success is almost nonexistent unless there is an
> actual bug in the run-time. It's the same as with files, sockets, etc. If
> partial success happens, user space is probably already in an inconsistent
> state.
>
> I'm not sure how "system" is defined here so I cannot give a definitive
> yes/no answer.
>
> User space kicking itself in the foot is not something that the kernel
> usually has to take extra measures for.

I am not against allowing user space to kick itself. I was of the opinion
that it would be helpful if the kernel could provide information that lets
user space salvage itself instead of always forcing it to re-build. You make
it clear here and below that this is not a goal or requirement.
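
A minimal sketch, not part of the patch, of how a run-time might act on the
two outputs discussed in this thread (the number of pages successfully
changed and the SGX return code). The names `plan_recovery` and `Recovery`
are hypothetical, the numeric values of the return codes are placeholders
rather than the kernel's definitions, and the suggested actions only follow
the examples given in (c) below:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PAGE_SIZE = 4096

# Stand-ins for the SGX return codes named in this thread. The numeric
# values are placeholders for illustration, not the kernel's definitions.
SGX_SUCCESS = 0
SGX_EPC_PAGE_CONFLICT = 1
SGX_PAGE_NOT_MODIFIABLE = 2

Range = Tuple[int, int]  # (offset, length), both in bytes


@dataclass
class Recovery:
    eaccept: Optional[Range]  # pages whose EPCM permissions were changed
    retry: Optional[Range]    # pages that still need a new request
    hint: str                 # suggested next step for the run-time


def plan_recovery(offset: int, length: int, pages_changed: int,
                  sgx_result: int) -> Recovery:
    """Split a request into the part that succeeded and the part that did not."""
    done = pages_changed * PAGE_SIZE
    if length % PAGE_SIZE or not 0 <= done <= length:
        raise ValueError("inconsistent range or page count")

    # Pages already restricted need EACCEPT from within the enclave.
    eaccept = (offset, done) if done else None
    # Everything after the last changed page has to be requested again.
    retry = (offset + done, length - done) if done < length else None

    if retry is None:
        hint = "complete"
    elif sgx_result == SGX_PAGE_NOT_MODIFIABLE:
        hint = "run EACCEPT on the failing page, then retry"
    elif sgx_result == SGX_EPC_PAGE_CONFLICT:
        hint = "an ENCLU function raced on the page; serialize and retry"
    else:
        hint = "no recovery known; rebuild the enclave"
    return Recovery(eaccept, retry, hint)
```

For a four-page request at offset 0x2000 where two pages were changed before
a "page not modifiable" failure, this yields an EACCEPT range of
(0x2000, 0x2000) and a retry range of (0x4000, 0x2000). With mprotect() only
the final "rebuild" branch would be reachable, since neither output is
available to user space.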

>>>> (c) User space runtime has control over management of EPC memory
>>>> and accurate failure information would help it to do so.
>>>> Knowing the error code of the EMODPR failure would help
>>>> user space to take appropriate action. For example, EMODPR
>>>> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>>>> to learn that it needs to run EACCEPT on that page before
>>>> the EMODPR can succeed. Alternatively, if it learns that the
>>>> return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>>>> that some other part of the runtime attempted an ENCLU
>>>> function on that page.
>>>> It is not possible to provide such detailed errors to user
>>>> space with mprotect().
>>>
>>> Actually user space run-time is also an adversary. Kernel and user
>>> space can e.g. kill the enclave or limit it with PTEs but EPCM is
>>> beyond them *after* initialization. The whole point is to be able
>>> to put e.g. containers to untrusted cloud.
>>
>> You seem to be saying that while the kernel could help the
>> runtime to manage the enclave it should not. Is this correct?
>>
>> There may be scenarios where an enclave could repair itself during runtime,
>> for example by running EACCEPT on a page that had a PENDING bit set.
>> This information is provided to the runtime with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
>> implementation the kernel cannot provide this information and thus
>> forces the enclave to be torn down and rebuilt to recover.
>>
>> Is this (using mprotect()) the kernel implementation you prefer?
>
> If there is partial success it's a bug, not a legitimate scenario for a
> well-behaved run-time.

ok

Reinette

2022-03-30 19:14:59

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Wed, Mar 30, 2022 at 06:00:30PM +0300, Jarkko Sakkinen wrote:
> On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> >
> > On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> > >> Hi Jarkko,
> > >>
> > >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> > >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> > >>>> Hi Jarkko,
> > >>>>
> > >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> > >>>>>> Hi Jarkko,
> > >>>>>>
> > >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> > >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> > >>>>>>>> the enclave memory without needing to map it.
> > >>>>>>>
> > >>>>>>> Which is the opposite of what you do in EAUG. You can also augment pages
> > >>>>>>> without needing to map them. Sure, you get that capability, but it is quite
> > >>>>>>> useless in practice.
> > >>>>>>>
> > >>>>>>>> I have considered the idea of supporting the permission restriction with
> > >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> > >>>>>>>> practical.
> > >>>>>>>
> > >>>>>>> Where is it practical? What is your application? How is it practical to
> > >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> > >>>>>>> How do we get rid of a useless up-call to the host?
> > >>>>>>>
> > >>>>>>
> > >>>>>> The email you responded to contained many obstacles against using mprotect()
> > >>>>>> but you chose to ignore them and snipped them all from your response. Could
> > >>>>>> you please address the issues instead of dismissing them?
> > >>>>>
> > >>>>> I did read the whole email but did not see anything that would make a case
> > >>>>> for fully exposed EMODPR, or for having it asymmetrical to how EAUG works.
> > >>>>
> > >>>> I believe that on its own each obstacle I shared with you is significant enough
> > >>>> to not follow that approach. You simply respond that I am just not making a
> > >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> > >>>> are not valid.
> > >>>>
> > >>>> To help me understand your view, could you please respond to each of the
> > >>>> obstacles I list below and how it is not an issue?
> > >>>>
> > >>>>
> > >>>> 1) ABI change:
> > >>>> mprotect() is currently supported to modify VMA permissions
> > >>>> irrespective of EPCM permissions. Supporting EPCM permission
> > >>>> changes with mprotect() would change this behavior.
> > >>>> For example, currently it is possible to have RW enclave
> > >>>> memory and support multiple tasks accessing the memory. Two
> > >>>> tasks can map the memory RW and later one can run mprotect()
> > >>>> to reduce the VMA permissions to read-only without impacting
> > >>>> the access of the other task.
> > >>>> By moving EPCM permission changes to mprotect() this usage
> > >>>> will no longer be supported and current behavior will change.
> > >>>
> > >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> > >>> synchronize somehow, and breaking something that could be done with one
> > >>> system call into two separate ones is not going to help with that. On the
> > >>> contrary, it will add yet one more layer of difficulty.
> > >>
> > >> This is about supporting multiple threads in a single enclave, they can
> > >> all have their own memory mappings based on the needs. This is currently
> > >> supported in mainline as part of SGX1.
> >
> >
> > Could you please comment on the above?
>
>
> I've probably spent over two weeks of my life addressing concerns, to the
> point that I feel as if I were implementing this feature (which could be
> a faster way to get it done).
>
> So I'll just wait for the next version, see what it is like, and give my
> feedback based on that. It's not really my problem to address every
> possible concern.

Once v3 is out, I'll check what I think is right and what is wrong,
and I might send some fixups and see where that leads. I think that
is a more constructive way to move forward. Repeating the same
arguments leads nowhere.

BR, Jarkko

2022-03-31 03:34:06

by Jarkko Sakkinen

Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions

On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >>>>>> Hi Jarkko,
> >>>>>>
> >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>>>>>> the enclave memory without needing to map it.
> >>>>>>>
> >>>>>>> Which is the opposite of what you do in EAUG. You can also augment pages
> >>>>>>> without needing to map them. Sure, you get that capability, but it is quite
> >>>>>>> useless in practice.
> >>>>>>>
> >>>>>>>> I have considered the idea of supporting the permission restriction with
> >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> >>>>>>>> practical.
> >>>>>>>
> >>>>>>> Where is it practical? What is your application? How is it practical to
> >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> >>>>>>> How do we get rid of a useless up-call to the host?
> >>>>>>>
> >>>>>>
> >>>>>> The email you responded to contained many obstacles against using mprotect()
> >>>>>> but you chose to ignore them and snipped them all from your response. Could
> >>>>>> you please address the issues instead of dismissing them?
> >>>>>
> >>>>> I did read the whole email but did not see anything that would make a case
> >>>>> for fully exposed EMODPR, or for having it asymmetrical to how EAUG works.
> >>>>
> >>>> I believe that on its own each obstacle I shared with you is significant enough
> >>>> to not follow that approach. You simply respond that I am just not making a
> >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> >>>> are not valid.
> >>>>
> >>>> To help me understand your view, could you please respond to each of the
> >>>> obstacles I list below and how it is not an issue?
> >>>>
> >>>>
> >>>> 1) ABI change:
> >>>> mprotect() is currently supported to modify VMA permissions
> >>>> irrespective of EPCM permissions. Supporting EPCM permission
> >>>> changes with mprotect() would change this behavior.
> >>>> For example, currently it is possible to have RW enclave
> >>>> memory and support multiple tasks accessing the memory. Two
> >>>> tasks can map the memory RW and later one can run mprotect()
> >>>> to reduce the VMA permissions to read-only without impacting
> >>>> the access of the other task.
> >>>> By moving EPCM permission changes to mprotect() this usage
> >>>> will no longer be supported and current behavior will change.
> >>>
> >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> >>> synchronize somehow, and breaking something that could be done with one
> >>> system call into two separate ones is not going to help with that. On the
> >>> contrary, it will add yet one more layer of difficulty.
> >>
> >> This is about supporting multiple threads in a single enclave, they can
> >> all have their own memory mappings based on the needs. This is currently
> >> supported in mainline as part of SGX1.
>
>
> Could you please comment on the above?


I've probably spent over two weeks of my life addressing concerns, to the
point that I feel as if I were implementing this feature (which could be
a faster way to get it done).

So I'll just wait for the next version, see what it is like, and give my
feedback based on that. It's not really my problem to address every
possible concern.

BR, Jarkko