2022-05-10 19:38:47

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH V5 15/31] x86/sgx: Support restricting of enclave page permissions

In the initial (SGX1) version of SGX, pages in an enclave need to be
created with permissions that support all usages of the pages, from the
time the enclave is initialized until it is unloaded. For example,
pages used by a JIT compiler or when code needs to otherwise be
relocated need to always have RWX permissions.

SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
and can be used to restrict the EPCM permissions of regular enclave
pages within an initialized enclave.

Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
restricting EPCM permissions. With this ioctl() the user specifies
a page range and the EPCM permissions to be applied to all pages in
the provided range. ENCLS[EMODPR] is run to restrict the EPCM
permissions followed by the ENCLS[ETRACK] flow that will ensure
no cached linear-to-physical address mappings to the changed
pages remain.

It is possible for the permission change request to fail on any
page within the provided range, either with an error encountered
by the kernel or by the SGX hardware while running
ENCLS[EMODPR]. To support partial success the ioctl() returns an
error code based on failures encountered by the kernel as well
as two result output parameters: one for the number of pages
that were successfully changed and one for the SGX return code.

The page table entry permissions are not impacted by the EPCM
permission changes. VMAs and PTEs will continue to allow the
maximum vetted permissions determined at the time the pages
are added to the enclave. The SGX error code in a page fault
will indicate if it was an EPCM permission check that prevented
an access attempt.

No checking is done to ensure that the permissions are actually
being restricted. This is because the enclave may have relaxed
the EPCM permissions from within the enclave without the kernel
knowing. An attempt to relax permissions using this call will
be ignored by the hardware.

Reviewed-by: Jarkko Sakkinen <[email protected]>
Tested-by: Jarkko Sakkinen <[email protected]>
Tested-by: Haitao Huang <[email protected]>
Tested-by: Vijay Dhanraj <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since V4:
- Add Vijay's Tested-by tag.
- Add Jarkko's Reviewed-by and Tested-by tags.
- Add Haitao's Tested-by tag.
- Revert change that required minimum of read permissions since
requesting zero permissions is valid. Return earlier
parameter check to ensure read permission accompanies write
permission. (Vijay)

Changes since V3:
- User provides only new permissions, replacing secinfo. (Jarkko)
- Expand comments of sgx_enclave_etrack() to document the
context in which the function is safe to use.

Changes since V2:
- Include the sgx_ioc_sgx2_ready() utility
that previously was in "x86/sgx: Support relaxing of enclave page
permissions" that is removed from the next version.
- Few renames requested by Jarkko:
struct sgx_enclave_restrict_perm ->
struct sgx_enclave_restrict_permissions
sgx_enclave_restrict_perm() ->
sgx_enclave_restrict_permissions()
sgx_ioc_enclave_restrict_perm() ->
sgx_ioc_enclave_restrict_permissions()
- Make EPCM permissions independent from kernel view of
permissions. (Jarkko)
- Remove attempt at runtime tracking of EPCM permissions
(sgx_encl_page->vm_run_prot_bits).
- Do not flush page table entries - they are no longer impacted by
EPCM permission changes.
- Modify changelog to reflect new architecture.
- Ensure at least PROT_READ is requested - enclave requires read
access to the page for commands like EMODPE and EACCEPT. (Jarkko)

Changes since V1:
- Change terminology to use "relax" instead of "extend" to refer to
the case when enclave page permissions are added (Dave).
- Use ioctl() in commit message (Dave).
- Add examples on what permissions would be allowed (Dave).
- Split enclave page permission changes into two ioctl()s, one for
permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
(Jarkko).
- In support of the ioctl() name change the following names have been
changed:
struct sgx_page_modp -> struct sgx_enclave_restrict_perm
sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
sgx_page_modp() -> sgx_enclave_restrict_perm()
- ioctl() takes entire secinfo as input instead of
page permissions only (Jarkko).
- Fix kernel-doc to include () in function name.
- Create and use utility for the ETRACK flow.
- Fixups in comments
- Move kernel-doc to function that provides documentation for
Documentation/x86/sgx.rst.
- Remove redundant comment.
- Make explicit which members of struct sgx_enclave_restrict_perm
are for output (Dave).

arch/x86/include/uapi/asm/sgx.h | 21 ++++
arch/x86/kernel/cpu/sgx/ioctl.c | 216 ++++++++++++++++++++++++++++++++
2 files changed, 237 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index f4b81587e90b..82648c006470 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -29,6 +29,8 @@ enum sgx_page_flags {
_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
#define SGX_IOC_VEPC_REMOVE_ALL \
_IO(SGX_MAGIC, 0x04)
+#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
+ _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_restrict_permissions)

/**
* struct sgx_enclave_create - parameter structure for the
@@ -76,6 +78,25 @@ struct sgx_enclave_provision {
__u64 fd;
};

+/**
+ * struct sgx_enclave_restrict_permissions - parameters for ioctl
+ * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @offset: starting page offset (page aligned relative to enclave base
+ * address defined in SECS)
+ * @length: length of memory (multiple of the page size)
+ * @permissions:new permission bits for pages in range described by @offset
+ * and @length
+ * @result: (output) SGX result code of ENCLS[EMODPR] function
+ * @count: (output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_restrict_permissions {
+ __u64 offset;
+ __u64 length;
+ __u64 permissions;
+ __u64 result;
+ __u64 count;
+};
+
struct sgx_enclave_run;

/**
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d41aa204761..720188d86ed4 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -660,6 +660,218 @@ static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
return sgx_set_attribute(&encl->attributes_mask, params.fd);
}

+/*
+ * Ensure enclave is ready for SGX2 functions. Readiness is checked
+ * by ensuring the hardware supports SGX2 and the enclave is initialized
+ * and thus able to handle requests to modify pages within it.
+ */
+static int sgx_ioc_sgx2_ready(struct sgx_encl *encl)
+{
+ if (!(cpu_feature_enabled(X86_FEATURE_SGX2)))
+ return -ENODEV;
+
+ if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+ return -EINVAL;
+
+ return 0;
+}
+
+/*
+ * Some SGX functions require that no cached linear-to-physical address
+ * mappings are present before they can succeed. Collaborate with
+ * hardware via ENCLS[ETRACK] to ensure that all cached
+ * linear-to-physical address mappings belonging to all threads of
+ * the enclave are cleared. See sgx_encl_cpumask() for details.
+ *
+ * Must be called with enclave's mutex held from the time the
+ * SGX function requiring that no cached linear-to-physical mappings
+ * are present is executed until this ETRACK flow is complete.
+ */
+static int sgx_enclave_etrack(struct sgx_encl *encl)
+{
+ void *epc_virt;
+ int ret;
+
+ epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
+ ret = __etrack(epc_virt);
+ if (ret) {
+ /*
+ * ETRACK only fails when there is an OS issue. For
+ * example, two consecutive ETRACK was sent without
+ * completed IPI between.
+ */
+ pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
+ /*
+ * Send IPIs to kick CPUs out of the enclave and
+ * try ETRACK again.
+ */
+ on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+ ret = __etrack(epc_virt);
+ if (ret) {
+ pr_err_once("ETRACK repeat returned %d (0x%x)",
+ ret, ret);
+ return -EFAULT;
+ }
+ }
+ on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+
+ return 0;
+}
+
+/**
+ * sgx_enclave_restrict_permissions() - Restrict EPCM permissions
+ * @encl: Enclave to which the pages belong.
+ * @modp: Checked parameters from user on which pages need modifying and
+ * their new permissions.
+ *
+ * Return:
+ * - 0: Success.
+ * - -errno: Otherwise.
+ */
+static long
+sgx_enclave_restrict_permissions(struct sgx_encl *encl,
+ struct sgx_enclave_restrict_permissions *modp)
+{
+ struct sgx_encl_page *entry;
+ struct sgx_secinfo secinfo;
+ unsigned long addr;
+ unsigned long c;
+ void *epc_virt;
+ int ret;
+
+ memset(&secinfo, 0, sizeof(secinfo));
+ secinfo.flags = modp->permissions & SGX_SECINFO_PERMISSION_MASK;
+
+ for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+ addr = encl->base + modp->offset + c;
+
+ mutex_lock(&encl->lock);
+
+ entry = sgx_encl_load_page(encl, addr);
+ if (IS_ERR(entry)) {
+ ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+ goto out_unlock;
+ }
+
+ /*
+ * Changing EPCM permissions is only supported on regular
+ * SGX pages. Attempting this change on other pages will
+ * result in #PF.
+ */
+ if (entry->type != SGX_PAGE_TYPE_REG) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /*
+ * Apart from ensuring that read-access remains, do not verify
+ * the permission bits requested. Kernel has no control over
+ * how EPCM permissions can be relaxed from within the enclave.
+ * ENCLS[EMODPR] can only remove existing EPCM permissions,
+ * attempting to set new permissions will be ignored by the
+ * hardware.
+ */
+
+ /* Change EPCM permissions. */
+ epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+ ret = __emodpr(&secinfo, epc_virt);
+ if (encls_faulted(ret)) {
+ /*
+ * All possible faults should be avoidable:
+ * parameters have been checked, will only change
+ * permissions of a regular page, and no concurrent
+ * SGX1/SGX2 ENCLS instructions since these
+ * are protected with mutex.
+ */
+ pr_err_once("EMODPR encountered exception %d\n",
+ ENCLS_TRAPNR(ret));
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+ if (encls_failed(ret)) {
+ modp->result = ret;
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ ret = sgx_enclave_etrack(encl);
+ if (ret) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ mutex_unlock(&encl->lock);
+ }
+
+ ret = 0;
+ goto out;
+
+out_unlock:
+ mutex_unlock(&encl->lock);
+out:
+ modp->count = c;
+
+ return ret;
+}
+
+/**
+ * sgx_ioc_enclave_restrict_permissions() - handler for
+ * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @encl: an enclave pointer
+ * @arg: userspace pointer to a &struct sgx_enclave_restrict_permissions
+ * instance
+ *
+ * SGX2 distinguishes between relaxing and restricting the enclave page
+ * permissions maintained by the hardware (EPCM permissions) of pages
+ * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
+ *
+ * EPCM permissions cannot be restricted from within the enclave, the enclave
+ * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
+ * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
+ * will be ignored by the hardware.
+ *
+ * Return:
+ * - 0: Success
+ * - -errno: Otherwise
+ */
+static long sgx_ioc_enclave_restrict_permissions(struct sgx_encl *encl,
+ void __user *arg)
+{
+ struct sgx_enclave_restrict_permissions params;
+ long ret;
+
+ ret = sgx_ioc_sgx2_ready(encl);
+ if (ret)
+ return ret;
+
+ if (copy_from_user(&params, arg, sizeof(params)))
+ return -EFAULT;
+
+ if (sgx_validate_offset_length(encl, params.offset, params.length))
+ return -EINVAL;
+
+ if (params.permissions & ~SGX_SECINFO_PERMISSION_MASK)
+ return -EINVAL;
+
+ /*
+ * Fail early if invalid permissions requested to prevent ENCLS[EMODPR]
+ * from faulting later when the CPU does the same check.
+ */
+ if ((params.permissions & SGX_SECINFO_W) &&
+ !(params.permissions & SGX_SECINFO_R))
+ return -EINVAL;
+
+ if (params.result || params.count)
+ return -EINVAL;
+
+ ret = sgx_enclave_restrict_permissions(encl, &params);
+
+ if (copy_to_user(arg, &params, sizeof(params)))
+ return -EFAULT;
+
+ return ret;
+}
+
long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
{
struct sgx_encl *encl = filep->private_data;
@@ -681,6 +893,10 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
case SGX_IOC_ENCLAVE_PROVISION:
ret = sgx_ioc_enclave_provision(encl, (void __user *)arg);
break;
+ case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
+ ret = sgx_ioc_enclave_restrict_permissions(encl,
+ (void __user *)arg);
+ break;
default:
ret = -ENOIOCTLCMD;
break;
--
2.25.1



2022-09-01 16:34:34

by zhubojun

[permalink] [raw]
Subject: Re: [PATCH V5 15/31] x86/sgx: Support restricting of enclave page permissions

Hi, Reinette, thanks for your great contribution for EDMM Linux kernel patch. I am trying to follow the newest patch now, and I have some questions on it.

It seems that `sgx_enclave_restrict_permissions()` is able to do permission restrictions for multiple enclave’s pages. After driver invokes ENCLS[EMODPR] to restrict the page’s permission, it should then invoke ENCLS[ETRACK] and send IPIs to ensure stale TLB entries have been flushed. Only in this way, ENCLU[EACCEPT] inside enclave can only succeed.

Current implementation invokes `sgx_enclave_etrack(encl)` after every `__emodpr(…)` in the for loop. My question is:

Can we move the `sgx_enclave_etrack(encl)` out of the for loop? After doing so, `sgx_enclave_etrack(encl)` is invoked **one** time for multiple enclave pages’ permission restriction, instead of N times (N = `modp -> length / PAGE_SIZE`). We may gain some performance optimization from it.

Please correct my if my understanding is incorrect. Looking forward to your reply and Thanks for your time!

BR,
Bojun

2022-09-02 15:55:46

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH V5 15/31] x86/sgx: Support restricting of enclave page permissions

Hi Bojun

On 9/1/2022 8:55 AM, zhubojun wrote:
> It seems that `sgx_enclave_restrict_permissions()` is able to do
> permission restrictions for multiple enclave’s pages. After driver
> invokes ENCLS[EMODPR] to restrict the page’s permission, it should
> then invoke ENCLS[ETRACK] and send IPIs to ensure stale TLB entries
> have been flushed. Only in this way, ENCLU[EACCEPT] inside enclave
> can only succeed.

Correct.

>
> Current implementation invokes `sgx_enclave_etrack(encl)` after every
> `__emodpr(…)` in the for loop. My question is:
>
> Can we move the `sgx_enclave_etrack(encl)` out of the for loop? After
> doing so, `sgx_enclave_etrack(encl)` is invoked **one** time for
> multiple enclave pages’ permission restriction, instead of N times (N
> = `modp -> length / PAGE_SIZE`). We may gain some performance
> optimization from it.

How important is the performance of page permission restriction? How
about the performance of page type modification?

>
> Please correct my if my understanding is incorrect. Looking forward
> to your reply and Thanks for your time!

From the hardware perspective, a single ETRACK can be run after
EMODPR is run on a range of pages.

Some things to keep in mind when considering making this change:

Note that the enclave's mutex is obtained and released every time
an enclave page is modified. This is needed to (a) avoid softlockups
when modifying a large range of pages and (b) give the reclaimer
opportunity make space to load pages that will be modified.

Moving the ETRACK flow out of the for loop implies that the mutex would
be released between the time the page is modified and ETRACK flow is run
(with enclave mutex held). It is thus possible for other changes
to be made or attempted on a page between the time it is modified
and the ETRACK flow. The possible interactions between different
page modifications (both initiated from user space and the OS via
the reclaimer) need to be studied if it is considered to split
this flow in two parts.

With the ETRACK flow done while the enclave page being modified is
loaded there is a guarantee that the SECS page is loaded also. When
the ETRACK flow is isolated there needs to be changes to ensure
that the SECS page is loaded.

It needs to be considered how errors will be communicated to user
space and how possible inconsistent state could affect user space. In
support of partial success the ioctl() returns a count indicating
how many pages were successfully modified. With the configuration
and ETRACK done per page and their failures handled, the meaning
of this count is clear. This needs to be considered because it is
not possible for the kernel to undo an EMODPR. So if all (or some of) the
EMODPRs succeed but the final ETRACK fails for some reason then
the successful EMODPR cannot be undone yet all will be considered
failed? How should this be reported to user space? Variations may be ok
since EMODPR can be repeated from what I can tell but I envision scenarios
where some pages in a range may have their permissions restricted
(and thus have EPCM.PR set) but which pages have this state is not
clear to user space. I don't know what would be acceptable here.
Looking at the EACCEPT flow in the SDM it does not seem as though
EPCM.PR is one of the EPC page settings that are verified.

Reinette



2022-09-07 02:51:45

by zhubojun

[permalink] [raw]
Subject: Re: [PATCH V5 15/31] x86/sgx: Support restricting of enclave page permissions

Hi, Reinette. Sorry for late reply! Appreciate for you detailed explanation!

> On Sep 2, 2022, at 23:22, Reinette Chatre <[email protected]> wrote:
>
> How important is the performance of page permission restriction? How
> about the performance of page type modification?

Enclave applications may need to change its page permissions at runtime.
If they change page permissions frequently, it may introduce substantial
overhead, compared to applications running on native Linux.
(`mprotect()` is more lightweight compared to restricting and extending
page permissions inside enclave)

But I have not profiled the detailed of the page permission restriction’s
performance. So I’m not sure how much benefit we will gain for performance
if we move the ETRACK flow outside the `for loop`.

> From the hardware perspective, a single ETRACK can be run after
> EMODPR is run on a range of pages.
>
> Some things to keep in mind when considering making this change:
>
> Note that the enclave's mutex is obtained and released every time
> an enclave page is modified. This is needed to (a) avoid softlockups
> when modifying a large range of pages and (b) give the reclaimer
> opportunity make space to load pages that will be modified.
>
> Moving the ETRACK flow out of the for loop implies that the mutex would
> be released between the time the page is modified and ETRACK flow is run
> (with enclave mutex held). It is thus possible for other changes
> to be made or attempted on a page between the time it is modified
> and the ETRACK flow. The possible interactions between different
> page modifications (both initiated from user space and the OS via
> the reclaimer) need to be studied if it is considered to split
> this flow in two parts.
>
> With the ETRACK flow done while the enclave page being modified is
> loaded there is a guarantee that the SECS page is loaded also. When
> the ETRACK flow is isolated there needs to be changes to ensure
> that the SECS page is loaded.

Thanks for pointing out such case which I have not considered yet!

> It needs to be considered how errors will be communicated to user
> space and how possible inconsistent state could affect user space. In
> support of partial success the ioctl() returns a count indicating
> how many pages were successfully modified. With the configuration
> and ETRACK done per page and their failures handled, the meaning
> of this count is clear. This needs to be considered because it is
> not possible for the kernel to undo an EMODPR. So if all (or some of) the
> EMODPRs succeed but the final ETRACK fails for some reason then
> the successful EMODPR cannot be undone yet all will be considered
> failed? How should this be reported to user space? Variations may be ok
> since EMODPR can be repeated from what I can tell but I envision scenarios
> where some pages in a range may have their permissions restricted
> (and thus have EPCM.PR set) but which pages have this state is not
> clear to user space. I don't know what would be acceptable here.
> Looking at the EACCEPT flow in the SDM it does not seem as though
> EPCM.PR is one of the EPC page settings that are verified.
>
> Reinette

I agree with you. It is hard to handle when there the final ETRACK fails
but EMODPR succeeds.

Thanks for showing the case I have not considered but
needs thinking deeply!

BR,
Bojun


2022-09-07 03:01:24

by zhubojun

[permalink] [raw]
Subject: Re: [PATCH V5 15/31] x86/sgx: Support restricting of enclave page permissions

Hi, Reinette. Sorry for late reply! Appreciate for you detailed explanation!

> On Sep 2, 2022, at 23:22, Reinette Chatre <[email protected]> wrote:
>
> How important is the performance of page permission restriction? How
> about the performance of page type modification?

Enclave applications may need to change its page permissions at runtime.
If they change page permissions frequently, it may introduce substantial
overhead, compared to applications running on native Linux.
(`mprotect()` is more lightweight compared to restricting and extending
page permissions inside enclave)

But I have not profiled the detailed of the page permission restriction’s
performance. So I’m not sure how much benefit we will gain for performance
if we move the ETRACK flow outside the `for loop`.

> From the hardware perspective, a single ETRACK can be run after
> EMODPR is run on a range of pages.
>
> Some things to keep in mind when considering making this change:
>
> Note that the enclave's mutex is obtained and released every time
> an enclave page is modified. This is needed to (a) avoid softlockups
> when modifying a large range of pages and (b) give the reclaimer
> opportunity make space to load pages that will be modified.
>
> Moving the ETRACK flow out of the for loop implies that the mutex would
> be released between the time the page is modified and ETRACK flow is run
> (with enclave mutex held). It is thus possible for other changes
> to be made or attempted on a page between the time it is modified
> and the ETRACK flow. The possible interactions between different
> page modifications (both initiated from user space and the OS via
> the reclaimer) need to be studied if it is considered to split
> this flow in two parts.
>
> With the ETRACK flow done while the enclave page being modified is
> loaded there is a guarantee that the SECS page is loaded also. When
> the ETRACK flow is isolated there needs to be changes to ensure
> that the SECS page is loaded.

Thanks for pointing out such case which I have not considered yet!

> It needs to be considered how errors will be communicated to user
> space and how possible inconsistent state could affect user space. In
> support of partial success the ioctl() returns a count indicating
> how many pages were successfully modified. With the configuration
> and ETRACK done per page and their failures handled, the meaning
> of this count is clear. This needs to be considered because it is
> not possible for the kernel to undo an EMODPR. So if all (or some of) the
> EMODPRs succeed but the final ETRACK fails for some reason then
> the successful EMODPR cannot be undone yet all will be considered
> failed? How should this be reported to user space? Variations may be ok
> since EMODPR can be repeated from what I can tell but I envision scenarios
> where some pages in a range may have their permissions restricted
> (and thus have EPCM.PR set) but which pages have this state is not
> clear to user space. I don't know what would be acceptable here.
> Looking at the EACCEPT flow in the SDM it does not seem as though
> EPCM.PR is one of the EPC page settings that are verified.
>
> Reinette

I agree with you. It is hard to handle when there the final ETRACK fails
but EMODPR succeeds.

Thanks for showing the case I have not considered but
needs thinking deeply!

BR,
Bojun