Date: Fri, 18 Sep 2020 16:53:37 -0700
From: Sean Christopherson
To: Andy Lutomirski
Cc: Jarkko Sakkinen, X86 ML, linux-sgx@vger.kernel.org, LKML, Linux-MM,
    Andrew Morton, Matthew Wilcox, Jethro Beekman, Darren Kenny,
    Andy Shevchenko, asapek@google.com, Borislav Petkov, "Xing, Cedric",
    chenalexchen@google.com, Conrad Parker, cyhanish@google.com,
    Dave Hansen, "Huang, Haitao", Josh Triplett, "Huang, Kai",
    "Svahn, Kai", Keith Moyer, Christian Ludloff, Neil Horman,
    Nathaniel McCallum, Patrick Uiterwijk, David Rientjes,
    Thomas Gleixner, yaozhangx@google.com
Subject: Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()
Message-ID: <20200918235337.GA21189@sjchrist-ice>
References: <20200915112842.897265-1-jarkko.sakkinen@linux.intel.com>
    <20200915112842.897265-11-jarkko.sakkinen@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
> On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen wrote:
> >
> > From: Sean Christopherson
> >
> > Add vm_ops()->mprotect() for additional constraints for a VMA.
> >
> > Intel Software Guard eXtensions (SGX) will use this callback to add two
> > constraints:
> >
> > 1. Verify that the address range does not have holes: each page address
> >    must be filled with an enclave page.
> > 2. Verify that VMA permissions won't surpass the permissions of any enclave
> >    page within the address range. Enclave cryptographically sealed
> >    permissions for each page address that set the upper limit for possible
> >    VMA permissions. Not respecting this can cause #GP's to be emitted.

Side note, #GP is wrong.  EPCM violations are #PFs.  Skylake CPUs #GP, but
that's technically an errata.  But this isn't the real motivation, e.g.
userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
simply adds another flavor.

> It's been awhile since I looked at this. Can you remind us: is this
> just preventing userspace from shooting itself in the foot or is this
> something more important?

Something more important: it's used to prevent userspace from circumventing
a noexec filesystem by loading code into an enclave, and to give the kernel
the option of adding enclave-specific LSM policies in the future.

The source file (if one exists) for the enclave is long gone when the
enclave is actually mmap()'d and mprotect()'d.  To enforce noexec, the
requested permissions for a given page are snapshotted when the page is
added to the enclave, i.e. when the enclave is built.  Enclave pages that
will be executable must originate from a MAYEXEC VMA, e.g. the source page
can't come from a noexec file system.  The ->mprotect() hook allows SGX to
reject mprotect() if userspace is declaring permissions beyond what is
allowed, e.g. trying to map an enclave page with EXEC permissions when the
page was added to the enclave without EXEC.

Future LSM policies have a similar need due to vm_file always pointing at
/dev/sgx/enclave, e.g. policies couldn't be attached to a specific enclave.
->mprotect() again allows enforcing permissions at map time that were
checked at enclave build time, e.g. via an LSM hook.

Deferring ->mprotect() until LSM support is added (if it ever is) would be
problematic due to SGX2.  With SGX2, userspace can extend permissions of an
enclave page (for the CPU's EPC Map entry, not the kernel's page tables)
without bouncing through the kernel.  Without ->mprotect() enforcement,
userspace could do EADD(RW) -> mprotect(RWX) -> EMODPE(X) to gain W+X.  We
want to disallow such a flow now, i.e. force userspace to do EADD(RW,X), so
that the hypothetical LSM hook would have all information at EADD(), i.e.
would be aware of the EXEC permission, without creating divergent behavior
based on whether or not an LSM is active.
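
As a rough illustration of the checks described above, here is a userspace
toy model in C.  It is a sketch, not the code from the patch: the names
toy_encl, toy_add_page(), toy_may_mprotect() and the P_R/P_W/P_X bits are all
hypothetical, the enclave is reduced to a fixed array of page slots, and the
-1 return value only means "rejected" (it does not claim a particular errno).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical permission bits, stand-ins for read/write/exec. */
enum { P_R = 1 << 0, P_W = 1 << 1, P_X = 1 << 2 };

#define ENCL_PAGES 8

/* Toy enclave: one slot per page address in the enclave range. */
struct toy_encl {
	unsigned char present[ENCL_PAGES];  /* 1 if a page was added here */
	unsigned char max_prot[ENCL_PAGES]; /* permissions snapshotted at add time */
};

/*
 * Model of adding a page while the enclave is built.  src_may_exec models
 * whether the source VMA was MAYEXEC; a page that is meant to be executable
 * is refused if its source could not be executed (e.g. noexec filesystem).
 */
int toy_add_page(struct toy_encl *e, size_t idx, unsigned int prot,
		 int src_may_exec)
{
	assert(idx < ENCL_PAGES);
	if ((prot & P_X) && !src_may_exec)
		return -1;
	e->present[idx] = 1;
	e->max_prot[idx] = (unsigned char)prot;
	return 0;
}

/*
 * Model of the ->mprotect() check for pages [start, end): reject holes and
 * reject any requested bit that was not declared when the page was added.
 */
int toy_may_mprotect(const struct toy_encl *e, size_t start, size_t end,
		     unsigned int prot)
{
	assert(start <= end && end <= ENCL_PAGES);
	for (size_t i = start; i < end; i++) {
		if (!e->present[i])
			return -1; /* hole in the range */
		if (prot & ~(unsigned int)e->max_prot[i])
			return -1; /* exceeds the build-time snapshot */
	}
	return 0;
}
```

In this model a page added with P_R | P_W can never be mapped with P_X later,
so the EADD(RW) -> mprotect(RWX) step of the flow above is rejected and
userspace has to declare EXEC when the page is added.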