Date: Mon, 16 Oct 2023 16:18:28 +0100
From: Matthew Wilcox
To: jeffxu@chromium.org
Cc: akpm@linux-foundation.org, keescook@chromium.org, sroettger@google.com,
	jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, jannh@google.com, surenb@google.com,
	alex.sierra@amd.com, apopple@nvidia.com, aneesh.kumar@linux.ibm.com,
	axelrasmussen@google.com, ben@decadent.org.uk, catalin.marinas@arm.com,
	david@redhat.com, dwmw@amazon.co.uk, ying.huang@intel.com,
	hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
	wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com,
	torvalds@linux-foundation.org, lstoakes@gmail.com, mawupeng1@huawei.com,
	linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com,
	peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io,
	vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
	zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
References: <20231016143828.647848-1-jeffxu@chromium.org>
In-Reply-To: <20231016143828.647848-1-jeffxu@chromium.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> Modern CPUs support memory permissions such as RW and NX bits. Linux has
> supported NX since the release of kernel version 2.6.8 in August 2004 [1].

This seems like a confusing way to introduce the subject.  Here, you're
talking about page permissions, whereas (as far as I can tell), mseal()
is about making _virtual_ addresses immutable, for some value of
immutable.

> Memory sealing additionally protects the mapping itself against
> modifications. This is useful to mitigate memory corruption issues where
> a corrupted pointer is passed to a memory management syscall. For example,
> such an attacker primitive can break control-flow integrity guarantees
> since read-only memory that is supposed to be trusted can become writable
> or .text pages can get remapped. Memory sealing can automatically be
> applied by the runtime loader to seal .text and .rodata pages and
> applications can additionally seal security critical data at runtime.
> A similar feature already exists in the XNU kernel with the
> VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> Also, Chrome wants to adopt this feature for their CFI work [2] and this
> patchset has been designed to be compatible with the Chrome use case.

This [2] seems very generic and wide-ranging, not helpful.  [5] was more
useful to understand what you're trying to do.

> The new mseal() is an architecture independent syscall, and with
> following signature:
>
> mseal(void addr, size_t len, unsigned int types, unsigned int flags)
>
> addr/len: memory range. Must be continuous/allocated memory, or else
> mseal() will fail and no VMA is updated. For details on acceptable
> arguments, please refer to comments in mseal.c. Those are also fully
> covered by the selftest.

Mmm.
So when you say "continuous/allocated" what you really mean is
"Must have contiguous VMAs" rather than "All pages in this range must
be populated", yes?

> types: bit mask to specify which syscall to seal, currently they are:
> MM_SEAL_MSEAL 0x1
> MM_SEAL_MPROTECT 0x2
> MM_SEAL_MUNMAP 0x4
> MM_SEAL_MMAP 0x8
> MM_SEAL_MREMAP 0x10

I don't understand why we want this level of granularity.  The OpenBSD
and XNU examples just say "This must be immutable*".  For values of
immutable that allow downgrading access (e.g. RW to RO or RX to RO),
but not upgrading access (RW->RX, RO->*, RX->RW).

> Each bit represents sealing for one specific syscall type, e.g.
> MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> is that the API is extendable, i.e. when needed, the sealing can be
> extended to madvise, mlock, etc. Backward compatibility is also easy.

Honestly, it feels too flexible.  Why not just two flags to mprotect()
-- PROT_IMMUTABLE and PROT_DOWNGRADABLE?  I can see a use for that --
maybe for some things we want to be able to downgrade and for other
things, we don't.

I'd like to see some discussion of how this interacts with mprotect().
As far as I can tell, the intent is to lock the protections/existence
of the mapping, and not to force memory to stay in core.  So it's fine
for the kernel to swap out the page and set up a PTE as a swap entry.
It's also fine for the kernel to mark PTEs as RO to catch page faults;
we're concerned with the LOGICAL permissions, and not the page tables.
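
To make the downgrade/upgrade distinction above concrete, here is a
minimal userspace sketch of the check such a two-flag scheme might
perform.  Everything in it is an assumption for illustration only: the
enum seal_policy, its SEAL_* values and both helper functions are
hypothetical names, not anything from the patchset or from mprotect().
It treats a change as a downgrade when it only removes PROT_* bits, so
RO -> PROT_NONE counts as a downgrade here, which may or may not be the
rule we would actually want.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical policies standing in for the two proposed flags. */
enum seal_policy {
	SEAL_IMMUTABLE,		/* no change of protection at all */
	SEAL_DOWNGRADABLE,	/* protection bits may only be removed */
};

/*
 * Hypothetical helper: true when going from old_prot to new_prot adds
 * no access bit, i.e. new_prot is a subset of old_prot.
 */
static inline bool prot_is_downgrade(int old_prot, int new_prot)
{
	const int mask = PROT_READ | PROT_WRITE | PROT_EXEC;

	return ((new_prot & mask) & ~(old_prot & mask)) == 0;
}

/*
 * Hypothetical helper: would a sealed mapping with the given policy
 * accept a protection change from old_prot to new_prot?
 */
static inline bool seal_allows(enum seal_policy policy,
			       int old_prot, int new_prot)
{
	if (policy == SEAL_IMMUTABLE)
		return new_prot == old_prot;

	return prot_is_downgrade(old_prot, new_prot);
}
```

With these definitions, RW->RO and RX->RO pass prot_is_downgrade(),
while RW->RX, RO->RW and RX->RW are rejected because each adds a bit
the mapping did not have.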