Date: Thu, 22 Feb 2024 18:57:44 +0100
From: Petr Tesařík
To: Dave Hansen
Cc: Petr Tesarik, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen,
 "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)", "H. Peter Anvin",
 Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li, Arnd Bergmann,
 Andrew Morton, Rick Edgecombe, Kees Cook, "Masami Hiramatsu (Google)",
 Pengfei Xu, Josh Poimboeuf, Ze Gao, "Kirill A. Shutemov", Kai Huang,
 David Woodhouse, Brian Gerst, Jason Gunthorpe, Joerg Roedel,
 "Mike Rapoport (IBM)", Tina Zhang, Jacob Pan, "open list:DOCUMENTATION",
 open list, Roberto Sassu, John Johansen, Paul Moore, James Morris,
 "Serge E. Hallyn", apparmor@lists.ubuntu.com,
 linux-security-module@vger.kernel.org, Petr Tesarik
Subject: Re: [RFC 4/5] sbm: fix up calls to dynamic memory allocators
Message-ID: <20240222185744.509e4958@meshulam.tesarici.cz>
References: <20240222131230.635-1-petrtesarik@huaweicloud.com>
 <20240222131230.635-5-petrtesarik@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 22 Feb 2024 07:51:00 -0800 Dave Hansen wrote:

> On 2/22/24 05:12, Petr Tesarik wrote:
> >  static const struct sbm_fixup fixups[] =
> >  {
> > +	/* kmalloc() and friends */
> > +	{ kmalloc_trace, proxy_alloc3 },
> > +	{ __kmalloc, proxy_alloc1 },
> > +	{ __kmalloc_node, proxy_alloc1 },
> > +	{ __kmalloc_node_track_caller, proxy_alloc1 },
> > +	{ kmalloc_large, proxy_alloc1 },
> > +	{ kmalloc_large_node, proxy_alloc1 },
> > +	{ krealloc, proxy_alloc2 },
> > +	{ kfree, proxy_free },
> > +
> > +	/* vmalloc() and friends */
> > +	{ vmalloc, proxy_alloc1 },
> > +	{ __vmalloc, proxy_alloc1 },
> > +	{ __vmalloc_node, proxy_alloc1 },
> > +	{ vzalloc, proxy_alloc1 },
> > +	{ vfree, proxy_free },
> > +
> >  	{ }
> >  };
>
> Petr, thanks for sending this. This _is_ a pretty concise example of
> what it means to convert kernel code to run in your sandbox mode. But,
> from me, it's still "no thanks".
>
> Establishing and maintaining this proxy list will be painful. Folks
> will change the code to call something new and break this *constantly*.
>
> That goes for infrastructure like the allocators and for individual
> sandbox instances like apparmor.

Understood. OTOH the proxy list is here for the PoC so I could send
something that builds and runs without making it an overly big patch
series. As explained in patch 5/5, the goal is not to make a global
list.
Instead, each instance should define what it needs and thereby define
its own policy for interfacing with the rest of the kernel. To give an
example, these AppArmor fixups would be added only to the sandbox which
runs aa_unpack(), but not to the one which runs unpack_to_rootfs(),
which is another PoC I did (but it required porting more patches).

If more fixups are needed after you change your code, you know you've
just added a new dependency. It's then up to you to decide whether it
was intentional.

> It's also telling that sandboxing a bit of apparmor took four fixups.
> That tells me we're probably still only looking at the tip of the iceberg
> if we were to convert a bunch more sites.

Yes, it is the cost paid for getting code and data flows under control.

In your opinion, this kind of memory safety is not worth the effort of
explicitly defining the interface between a sandboxed component and the
rest of the kernel, because it increases maintenance costs. Correct?

> That's on top of everything I was concerned about before.

Good. I think I understand the new concern, but everything you were
concerned about before is still not quite clear to me. I'll try to
summarize the points:

* Running code in ring-0 is inherently safer than running code in
  ring-3.

  Since what I'm trying to do is protect kernel data structures from
  memory safety bugs in another part of the kernel, it roughly
  translates to: "Kernel data structures are better protected from
  rogue kernel modules than from userspace applications." This cannot
  possibly be what you are trying to say.

* SMAP, SMEP and/or LASS can somehow protect one part of the kernel
  from memory safety bugs in another part of the kernel.

  I somehow can't see how that is the case. I have always thought that:

  * SMEP prevents the kernel from executing code in user pages.
  * SMAP prevents the kernel from reading from or writing to user pages.
  * LASS does pretty much the same job as SMEP+SMAP, but instead of
    using page table protection bits, it uses the highest bit of the
    virtual address because that's much faster.

* Hardware designers are adding (other) hardware security defenses to
  ring-0 that are not applied to ring-3.

  Could you give an example of these other security defenses, please?

* Ring-3 is more exposed to attacks.

  This statement sounds a bit too vague on its own. What attack vectors
  are we talking about? The primary attack vector that SBM is trying to
  address is the exploitation of kernel code vulnerabilities triggered
  by data from sources outside the kernel (boot loader, userspace, etc.).

H. Peter Anvin added a few other points:

* SBM has all the downsides of a microkernel without the upsides.

  I can only guess what the downsides and upsides would be...

  One notorious downside is performance. Agreed, there is some
  overhead. I'm not promoting SBM for time-critical operations. But
  compared to user-mode helpers (which were suggested as an alternative
  for one of the proposed scenarios), the overhead of SBM is at least an
  order of magnitude lower.

  IPC and the need to define how servers interact with each other is
  another downside I can think of. Yes, there is a bit of it in SBM, as
  you have correctly noted above.

* SBM introduces architectural changes that are most definitely *very*
  harmful both to maintainers and users.

  It is very difficult to learn anything from this statement. Could you
  give some examples of how SBM harms either group, please?

* SBM feels like paravirtualization all over again.

  All right, hpa, you've had lots of pain with paravirtualization. I
  feel for you; I've had my share of it too. Can you imagine how much
  trouble I could have spared myself in the libkdumpfile project if I
  hadn't had to deal with the difference between "physical addresses"
  and "machine addresses"? However, this is hardly a relevant point.
  The Linux kernel community is respected for making decisions based on
  facts, not feelings.

* SBM exposes kernel memory to user space.

  This is a misunderstanding. Sandbox mode does not share anything at
  all with user mode. It does share some CPU state with kernel mode, but
  not with user mode. If "user space" was intended to mean "Ring-3",
  then it doesn't explain why that is a really bad idea.

* SBM is not needed, because there is already eBPF.

  Well, yes, but I believe they work on different levels. For example,
  eBPF needs a verifier to ensure memory safety. If you run the eBPF
  code itself in a sandbox instead, that verifier is not needed, because
  memory safety is enforced by CPU hardware.

When hpa says that SandBox Mode is "an enormous step in the wrong
direction", I want to understand why this direction is wrong, so I can
take a step in the right direction next time.

So far there has been only one objective concern: the need to track
code (and data) dependencies explicitly. AFAICS this is an inherent
drawback of any kind of program decomposition.

Is decomposition considered harmful?

Petr T
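The per-instance fixup tables described above (one table per sandbox instead of a global list) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not code from the SBM series. struct fixup, sandbox_call(), the fake_*() stubs and both tables are hypothetical. The proxy_alloc() and proxy_free() here are also not the functions from the patch and have a different signature. They only count and forward calls instead of leaving sandbox mode. The vmalloc entry in the rootfs table is made up for the example. Each instance carries its own sentinel-terminated table, so a call the instance never declared is reported as an undeclared dependency instead of being allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type, used only to identify call targets. */
typedef void (*fn_ptr)(void);
/* Hypothetical proxy type: runs @target on behalf of sandboxed code. */
typedef void (*proxy_fn)(fn_ptr target);

/* Hypothetical per-instance fixup entry: an allowed call target and
 * the proxy that handles calls to it. */
struct fixup {
	fn_ptr target;
	proxy_fn proxy;
};

/* Counters showing which real function was reached, and via which proxy. */
static int kmalloc_calls, kfree_calls, vmalloc_calls;
static int proxied_allocs, proxied_frees;

/* Stand-ins for the real allocators (hypothetical). */
static void fake_kmalloc(void) { kmalloc_calls++; }
static void fake_kfree(void) { kfree_calls++; }
static void fake_vmalloc(void) { vmalloc_calls++; }

/* A real proxy would leave sandbox mode and marshal arguments and
 * return values; these ones only count and forward the call. */
static void proxy_alloc(fn_ptr target) { proxied_allocs++; target(); }
static void proxy_free(fn_ptr target) { proxied_frees++; target(); }

/* Instance A (think: the sandbox around aa_unpack()). */
static const struct fixup policy_fixups[] = {
	{ fake_kmalloc, proxy_alloc },
	{ fake_kfree, proxy_free },
	{ NULL, NULL }		/* sentinel, like "{ }" in the patch */
};

/* Instance B (think: unpack_to_rootfs()) declares only vmalloc. */
static const struct fixup rootfs_fixups[] = {
	{ fake_vmalloc, proxy_alloc },
	{ NULL, NULL }
};

/* Route a call made by sandboxed code through @table. Returns 0 if the
 * instance declared @target, or -1 for an undeclared dependency. */
static int sandbox_call(const struct fixup *table, fn_ptr target)
{
	assert(table != NULL);
	for (; table->target; table++) {
		if (table->target == target) {
			table->proxy(target);
			return 0;
		}
	}
	return -1;
}
```

With these tables, sandbox_call(policy_fixups, fake_kmalloc) goes through proxy_alloc() and returns 0. By contrast, sandbox_call(rootfs_fixups, fake_kmalloc) returns -1, because that instance never declared the dependency. Keeping one table per instance is what makes such a failure point at a new dependency of that specific sandbox.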