From: Jeff Xu
Date: Wed, 18 Oct 2023 11:20:02 -0700
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
References: <20231016143828.647848-1-jeffxu@chromium.org>
To: Pedro Falcato
Cc: Matthew Wilcox, jeffxu@chromium.org, akpm@linux-foundation.org,
    keescook@chromium.org, sroettger@google.com, jorgelo@chromium.org,
    groeck@chromium.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
    surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
    aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk,
    catalin.marinas@arm.com, david@redhat.com,
    dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com,
    joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com,
    Liam.Howlett@oracle.com, torvalds@linux-foundation.org, lstoakes@gmail.com,
    mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
    peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
    shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
    zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
    linux-hardening@vger.kernel.org

On Tue, Oct 17, 2023 at 3:35 PM Pedro Falcato wrote:
>
> > >
> > > I think it's worth pointing out that this suggestion (with PROT_*)
> > > could easily integrate with mmap() and as such allow for one-shot
> > > mmap() + mseal().
> > > If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> > > definitely sounds like a performance win as we halve the number of
> > > syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
> > > ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> > > like common patterns.
> > >
> > Yes. mmap() can support sealing as well, and memory is allocated as
> > immutable from the beginning.
> > This is orthogonal to mseal() though.
>
> I don't see how this can be orthogonal to mseal().
> In the case we opt for adding PROT_ bits, we should more or less only
> need to adapt calc_vm_prot_bits(), and the rest should work without
> issues.
> vma merging won't merge vmas with different prots. The current
> interfaces (mmap and mprotect) would work just fine.
> In this case, mseal() or mimmutable() would only be needed if you need
> to set immutability over a range of VMAs with different permissions.
>
Agreed. By "orthogonal" I meant that we can have two APIs, mmap() and
mseal()/mprotect(): we can't rely on mmap() alone without
mseal()/mprotect()/mimmutable(), because sealing can also be applied
after the memory is initially created.

> Note: modifications should look kinda like this: https://godbolt.org/z/Tbjjd14Pe
> The only annoying wrench in my plans here is that we have effectively
> run out of vm_flags bits in 32-bit architectures, so this approach as
> I described is not compatible with 32-bit.
>
> > In case of ld.so, iiuc, memory can be first allocated as W, then later
> > changed to RO, for example, during symbol resolution.
> > The important point is that the application can decide what type of
> > sealing it wants, and when to apply it. There needs to be an api(),
> > that can be mseal() or mprotect2() or mimmutable(), the naming is not
> > important to me.
> >
> > mprotect() in linux have the following signature:
> > int mprotect(void addr[.len], size_t len, int prot);
> > the prot bitmasks are all taken here.
> > I have not checked the prot field in mmap(), there might be bits left,
> > even not, we could have mmap2(), so that is not an issue.
>
> I don't see what you mean. We have plenty of prot bits left (32-bits,
> and we seem to have around 8 different bits used).
> And even if we didn't, prot is the same in mprotect and mmap and mmap2 :)
>
> The only issue seems to be that 32-bit ran out of vm_flags, but that
> can probably be worked around if need be.
>
Ah, you are right about this: vm_flags is full, but prot in mprotect() is not.
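For reference, the W-then-RO sequence described above for ld.so can be reproduced with today's mmap() and mprotect(). This is a minimal sketch assuming Linux and an anonymous private mapping; reenable_write_after_ro() is a hypothetical helper, not part of any API. It illustrates why sealing is discussed at all: once the page has been made read-only, nothing stops an unsealed mapping from being made writable again.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= flags */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one page RW, fill it, drop it to RO with
 * mprotect(), then try to make it writable again. Returns the result of
 * that last mprotect(): 0 means the "read-only" page became writable
 * again; -1 means it was refused or a setup step failed.
 */
int reenable_write_after_ro(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    strcpy(p, "resolved"); /* e.g. data a loader fills in */

    /* W -> RO once the data is final, as in the ld.so case. */
    if (mprotect(p, len, PROT_READ) != 0) {
        perror("mprotect(PROT_READ)");
        munmap(p, len);
        return -1;
    }

    /* The mapping is not sealed, so this is expected to succeed. */
    int rc = mprotect(p, len, PROT_READ | PROT_WRITE);

    munmap(p, len);
    return rc;
}
```

On a current Linux kernel the helper is expected to return 0. Under a sealing scheme like the one discussed in this thread, the second mprotect() is the call a sealed mapping would be expected to reject.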
Apologies for my earlier mistake and the confusion it caused.

There is a subtle difference in semantics between mprotect() and mseal().
Each time mprotect() is called, the kernel takes the full set of RWX bits
and updates vm_flags accordingly. In other words, the application sets or
clears each of R, W and X, and the kernel applies them as given.

In the mseal() case, the kernel remembers which seal types were applied
previously, so the application doesn't need to repeat all existing seal
types in the next mseal() call. Once a seal type is applied, it can't be
removed.

So if we want to use mprotect() for sealing, developers need to think of
the sealing bits differently from the rest of the prot bits. It is a
different programming model, which may or may not be obvious to developers.

There is also a difference in input checking and error handling. For
mseal(), if the given address range contains a gap (unallocated memory),
or if one of the VMAs is sealed with the MM_SEAL_SEAL flag, none of the
VMAs is updated. For mprotect(), some VMAs can be updated before an error
is hit on a later VMA. IMO it makes sense for mseal() and mprotect() to
differ here: mseal() only checks the VMA structs, so it is fast, whereas
mprotect() goes deeper, down to the hardware page tables.

Because of these two differences, I personally feel a new syscall might be
worth the effort. That said, mmap() + mprotect() is workable for me if
that is what the community wants.

Thanks,
-Jeff
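To make the two differences above concrete (replace-vs-accumulate semantics, and partial-vs-all-or-nothing updates), here is a toy user-space model in C. It is a sketch under stated assumptions, not the RFC's implementation and not kernel code: struct vma_model, model_mprotect(), model_mseal() and the TOY_* bits are hypothetical names, TOY_SEAL_PROT stands in for a seal type that blocks prot changes, and ranges are assumed to start and end on VMA boundaries (no VMA splitting is modeled).

```c
#include <stddef.h>

/* Toy bit values for this model only; not kernel or uapi constants. */
#define TOY_PROT_R 0x1u
#define TOY_PROT_W 0x2u
#define TOY_PROT_X 0x4u

#define TOY_SEAL_SEAL 0x1u /* loosely modeled on MM_SEAL_SEAL: no more mseal() */
#define TOY_SEAL_PROT 0x2u /* hypothetical: reject prot changes */

/*
 * One mapping. Arrays of these are sorted by start address, and ranges
 * passed to the model are assumed to start and end on VMA boundaries.
 */
struct vma_model {
    unsigned long start, end; /* [start, end) */
    unsigned prot;            /* current RWX bits */
    unsigned seals;           /* accumulated seal types */
};

/*
 * mprotect()-style: walk the VMAs in order and overwrite the whole RWX set
 * on each one. The first gap or prot-sealed VMA stops the walk, but VMAs
 * already visited stay modified (partial update).
 */
int model_mprotect(struct vma_model *v, size_t n,
                   unsigned long start, unsigned long end, unsigned prot)
{
    unsigned long cur = start;

    if (start >= end)
        return -1;
    for (size_t i = 0; i < n && cur < end; i++) {
        if (v[i].end <= cur)
            continue;                 /* VMA lies before the range */
        if (v[i].start != cur || (v[i].seals & TOY_SEAL_PROT))
            return -1;                /* gap or sealed: earlier VMAs changed */
        v[i].prot = prot;             /* replace, not OR */
        cur = v[i].end;
    }
    return cur >= end ? 0 : -1;       /* trailing gap counts as an error */
}

/*
 * mseal()-style: validate the whole range first (no gaps, no VMA already
 * carrying TOY_SEAL_SEAL) and only then OR the new types into every VMA.
 * Either all VMAs in the range are updated or none are.
 */
int model_mseal(struct vma_model *v, size_t n,
                unsigned long start, unsigned long end, unsigned types)
{
    unsigned long cur = start;
    size_t first = n, last = 0;

    if (start >= end)
        return -1;
    for (size_t i = 0; i < n && cur < end; i++) {   /* pass 1: check only */
        if (v[i].end <= cur)
            continue;
        if (v[i].start != cur || (v[i].seals & TOY_SEAL_SEAL))
            return -1;                /* nothing has been modified yet */
        if (first == n)
            first = i;
        last = i;
        cur = v[i].end;
    }
    if (cur < end)
        return -1;                    /* trailing gap */

    for (size_t i = first; i <= last; i++)          /* pass 2: apply */
        v[i].seals |= types;          /* seals accumulate, never cleared */
    return 0;
}
```

With three VMAs covering [0x1000, 0x3000) and [0x4000, 0x5000), an mprotect()-style call over [0x1000, 0x5000) returns an error but leaves the first two VMAs already changed, while an mseal()-style call over the same range changes nothing. A later mseal()-style call only has to pass the new seal type, since earlier ones are kept, and once TOY_SEAL_SEAL is set on a VMA, further mseal()-style calls covering it fail.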