References: <20231016143828.647848-1-jeffxu@chromium.org>
From: Jeff Xu
Date: Tue, 17 Oct 2023 14:33:25 -0700
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Pedro Falcato
Cc: Matthew Wilcox, jeffxu@chromium.org, akpm@linux-foundation.org,
    keescook@chromium.org, sroettger@google.com, jorgelo@chromium.org,
    groeck@chromium.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
    surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
    aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
    ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com,
    dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com,
    joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com,
    Liam.Howlett@oracle.com, torvalds@linux-foundation.org,
    lstoakes@gmail.com, mawupeng1@huawei.com, linmiaohe@huawei.com,
    namit@vmware.com, peterx@redhat.com, peterz@infradead.org,
    ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz,
    xiujianfeng@huawei.com, yu.ma@intel.com, zhangpeng362@huawei.com,
    dave.hansen@intel.com, luto@kernel.org, linux-hardening@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 17, 2023 at 8:30 AM Pedro Falcato wrote:
>
> On Mon, Oct 16, 2023 at 4:18 PM Matthew Wilcox wrote:
> >
> > On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> > > Modern CPUs support memory permissions such as RW and NX bits. Linux has
> > > supported NX since the release of kernel version 2.6.8 in August 2004 [1].
> >
> > This seems like a confusing way to introduce the subject. Here, you're
> > talking about page permissions, whereas (as far as I can tell), mseal() is
> > about making _virtual_ addresses immutable, for some value of immutable.
> >
> > > Memory sealing additionally protects the mapping itself against
> > > modifications. This is useful to mitigate memory corruption issues where
> > > a corrupted pointer is passed to a memory management syscall.
> > > For example, such an attacker primitive can break control-flow
> > > integrity guarantees since read-only memory that is supposed to be
> > > trusted can become writable or .text pages can get remapped. Memory
> > > sealing can automatically be applied by the runtime loader to seal .text
> > > and .rodata pages and applications can additionally seal security
> > > critical data at runtime.
> > > A similar feature already exists in the XNU kernel with the
> > > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > > patchset has been designed to be compatible with the Chrome use case.
> >
> > This [2] seems very generic and wide-ranging, not helpful. [5] was more
> > useful to understand what you're trying to do.
> >
> > > The new mseal() is an architecture independent syscall, and with
> > > following signature:
> > >
> > > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> > >
> > > addr/len: memory range. Must be continuous/allocated memory, or else
> > > mseal() will fail and no VMA is updated. For details on acceptable
> > > arguments, please refer to comments in mseal.c. Those are also fully
> > > covered by the selftest.
> >
> > Mmm. So when you say "continuous/allocated" what you really mean is
> > "Must have contiguous VMAs" rather than "All pages in this range must
> > be populated", yes?
> >
> > > types: bit mask to specify which syscall to seal, currently they are:
> > > MM_SEAL_MSEAL 0x1
> > > MM_SEAL_MPROTECT 0x2
> > > MM_SEAL_MUNMAP 0x4
> > > MM_SEAL_MMAP 0x8
> > > MM_SEAL_MREMAP 0x10
> >
> > I don't understand why we want this level of granularity. The OpenBSD
> > and XNU examples just say "This must be immutable*". For values of
> > immutable that allow downgrading access (eg RW to RO or RX to RO),
> > but not upgrading access (RW->RX, RO->*, RX->RW).
> >
> > > Each bit represents sealing for one specific syscall type, e.g.
> > > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > > is that the API is extendable, i.e. when needed, the sealing can be
> > > extended to madvise, mlock, etc. Backward compatibility is also easy.
> >
> > Honestly, it feels too flexible. Why not just two flags to mprotect()
> > -- PROT_IMMUTABLE and PROT_DOWNGRADABLE. I can see a use for that --
> > maybe for some things we want to be able to downgrade and for other
> > things, we don't.
>
> I think it's worth pointing out that this suggestion (with PROT_*)
> could easily integrate with mmap() and as such allow for one-shot
> mmap() + mseal().
> If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> definitely sounds like a performance win as we halve the number of
> syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
> ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> like common patterns.
>
Yes. mmap() can support sealing as well, so that memory is allocated as
immutable from the beginning.
This is orthogonal to mseal() though.
In the case of ld.so, iiuc, memory can first be allocated as W, then later
changed to RO, for example, during symbol resolution.
The important point is that the application can decide what type of
sealing it wants, and when to apply it. There needs to be an API for
that; it can be mseal() or mprotect2() or mimmutable(), the naming is not
important to me.

mprotect() in Linux has the following signature:
int mprotect(void addr[.len], size_t len, int prot);
The prot bitmasks are all taken here.
I have not checked the prot field in mmap(); there might be bits left,
and even if not, we could have mmap2(), so that is not an issue.

Thanks
-Jeff

> --
> Pedro
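
A minimal sketch of the ld.so-style sequence described above (map
writable, write, downgrade to read-only, and only then seal), assuming
the MM_SEAL_* values quoted from the cover letter. seal_stub() and
load_then_seal() are hypothetical names used only for illustration;
seal_stub() performs no sealing and merely marks the point where a real
sealing call would go, so the code runs on any kernel:

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Seal-type bits as listed in the quoted RFC cover letter. */
#define MM_SEAL_MSEAL    0x1
#define MM_SEAL_MPROTECT 0x2
#define MM_SEAL_MUNMAP   0x4
#define MM_SEAL_MMAP     0x8
#define MM_SEAL_MREMAP   0x10

/*
 * HYPOTHETICAL stand-in for the proposed sealing call. It does NOT seal
 * anything; it only validates its arguments and returns 0 or -1.
 */
static int seal_stub(void *addr, size_t len, unsigned int types)
{
    if (addr == NULL || len == 0 || types == 0)
        return -1;
    return 0;
}

/*
 * Map a region read-write, fill it, drop it to read-only, then request
 * sealing. Returns 0 on success and -1 on any failure.
 */
int load_then_seal(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Writable phase, standing in for relocation/symbol resolution. */
    memset(p, 0x90, len);
    assert(p[0] == 0x90);

    /* Downgrade to read-only before sealing. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The application picks which operations to seal, and when. */
    int rc = seal_stub(p, len, MM_SEAL_MPROTECT | MM_SEAL_MUNMAP |
                               MM_SEAL_MMAP | MM_SEAL_MREMAP);

    /* The stub does not seal, so cleanup still succeeds in this sketch. */
    munmap(p, len);
    return rc;
}
```

load_then_seal(4096) returns 0 when all three steps succeed. With a
real seal that includes MM_SEAL_MUNMAP, the final munmap() would
presumably be denied, going by the cover letter's description; that is
why the sealing step is only a stub here.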