Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
From:   "Xing, Cedric" <cedric.xing@intel.com>
To:     "Christopherson, Sean J" <sean.j.christopherson@intel.com>
CC:     Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
        Andy Lutomirski <luto@kernel.org>,
        Stephen Smalley <sds@tycho.nsa.gov>,
        James Morris <jmorris@namei.org>,
        "Serge E . Hallyn" <serge@hallyn.com>,
        LSM List <linux-security-module@vger.kernel.org>,
        Paul Moore <paul@paul-moore.com>,
        Eric Paris <eparis@parisplace.org>,
        "selinux@vger.kernel.org" <selinux@vger.kernel.org>,
        Jethro Beekman <jethro@fortanix.com>,
        "Hansen, Dave" <dave.hansen@intel.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        "Linus Torvalds" <torvalds@linux-foundation.org>,
        LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
        "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        "nhorman@redhat.com" <nhorman@redhat.com>,
        "npmccallum@redhat.com" <npmccallum@redhat.com>,
        "Ayoun, Serge" <serge.ayoun@intel.com>,
        "Katz-zamir, Shay" <shay.katz-zamir@intel.com>,
        "Huang, Haitao" <haitao.huang@intel.com>,
        Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
        "Svahn, Kai" <kai.svahn@intel.com>, Borislav Petkov <bp@alien8.de>,
        Josh Triplett <josh@joshtriplett.org>,
        "Huang, Kai" <kai.huang@intel.com>,
        David Rientjes <rientjes@google.com>,
        "Roberts, William C" <william.c.roberts@intel.com>,
        "Tricca, Philip B" <philip.b.tricca@intel.com>
Subject: RE: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM
Thread-Topic: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM
Thread-Index: AQHVGAkeu//pWDttTk+JsIicrUDwLaaG/FIQgAOo7oD//5fa8A==
Date:   Mon, 3 Jun 2019 18:30:54 +0000
Message-ID: <960B34DE67B9E140824F1DCDEC400C0F654ED042@ORSMSX116.amr.corp.intel.com>
References: <20190531233159.30992-1-sean.j.christopherson@intel.com>
 <960B34DE67B9E140824F1DCDEC400C0F654EC5FD@ORSMSX116.amr.corp.intel.com>
 <20190603171549.GE13384@linux.intel.com>
In-Reply-To: <20190603171549.GE13384@linux.intel.com>
Accept-Language: en-US
Content-Language: en-US
dlp-product: dlpe-windows
dlp-version: 11.2.0.6
dlp-reaction: no-action
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 10:16 AM
> 
> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > Hi Sean,
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> > completely internal to LSM because they can be both produced and
> > consumed inside an LSM module. So spilling them into SGX driver and
> > also user mode code makes the solution ugly and in some cases
> > impractical because not every enclave host process has a priori
> > knowledge on whether or not an enclave page would be EMODPE'd at
> runtime.
> 
> In this case, the host process should tag *all* pages it *might* convert
> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> if the enclave actually attempts mprotect(PROT_EXEC).

What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent those pages to start with PROT_EXEC?

Anyway, my point is that it is unnecessary even if it works.

> 
> Take the SELinux path for example.  The only scenario in which
> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> PROT_EXEC.
> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> then PROT_EXEC will be cleared from @allowed_prot.
> 
> As Stephen pointed out, auditing the denials on @allowed_prot means the
> log will contain false positives of a sort.  But this is more of a noise
> issue than true false positives.  E.g. there are three possible outcomes
> for the enclave.
> 
>   - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
>     Requesting ALLOW_EXEC is either a straightforward a userspace bug or
>     a poorly written generic enclave loader.
> 
>   - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
>     the denial is a true false positive.
> 
>   - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
>     on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
>     The audit log will be noisy, but viewed as a whole the denials
> aren't
>     false positives.

What I was talking about was EMODPE[PROT_WRITE] on an RX page.

> 
> The potential for noisy audit logs and/or false positives is unfortunate,
> but it's (by far) the lesser of many evils.
> 
> > Theoretically speaking, what you really need is a per page flag (let's
> > name it WRITTEN?) indicating whether a page has ever been written to
> > (or more precisely, granted PROT_WRITE), which will be used to decide
> > whether to grant PROT_EXEC when requested in future. Given the fact
> > that all mprotect() goes through LSM and mmap() is limited to
> > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> asking user mode code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> data
> >   structure to store for every page, the WRITTEN flag as described
> above.
> >   WRITTEN is cleared initially for all pages.
> 
> This would effectively require *every* LSM to duplicate the SGX driver's
> functionality, e.g. track per-page metadata, implement locking to
> prevent races between multiple mm structs, etc...

Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no difference than PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if other LSMs than SELinux may adopt the same/similar approach.

If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.

> 
> >   Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can
> >   only be PROT_NONE. This is to force all protection changes to go
> through
> >   security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages
> >   that are requested PROT_WRITE.
> 
> How would an LSM associate a page with a specific enclave?  vma->vm_file
> will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
> because we're allowing multiple processes to map a single enclave, not
> to mention that by mm would require holding a reference to the mm.

Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.

> 
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> look up
> >   and use WRITTEN flags for all pages within @vma, along with other
> global
> >   flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
> to decide
> >   on allowing/rejecting @prot.
> 
> vma->vm_file will always be /dev/sgx/enclave at this point, which means
> LSMs don't have the necessary anchor back to the source file, e.g. to
> enforce FILE__EXECUTE.  The noexec file system case is also unaddressed.

vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().
 
The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.

> 
> > * In hook security_file_free(), if @file is an  enclave, free storage
> >   allocated for WRITTEN flags.