References: <20240301213442.198443-1-adrian.ratiu@collabora.com>
 <20240304-zugute-abtragen-d499556390b3@brauner>
 <202403040943.9545EBE5@keescook>
 <20240305-attentat-robust-b0da8137b7df@brauner>
 <202403050134.784D787337@keescook>
 <20240305-kontakt-ticken-77fc8f02be1d@brauner>
 <20240305-gremien-faucht-29973b61fb57@brauner>
In-Reply-To: <20240305-gremien-faucht-29973b61fb57@brauner>
From: Matt Denton
Date: Wed, 6 Mar 2024 02:49:21 -0800
Subject: Re: [PATCH v2] proc: allow restricting /proc/pid/mem writes
To: Christian Brauner
Cc: Kees Cook, Matthew Denton, Adrian Ratiu, linux-fsdevel@vger.kernel.org,
 kernel@collabora.com, linux-security-module@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Guenter Roeck,
 Doug Anderson, Jann Horn, Andrew Morton, Randy Dunlap, Mike Frysinger

The SECCOMP_RET_USER_NOTIF sandbox is partially implemented, but the
reason we needed it (glibc blocking signals during certain syscalls we
wanted to emulate) got reverted, and we haven't had any important issues
with the SECCOMP_RET_TRAP sandbox since then.

/proc/pid/mem was always restricted on ChromeOS, so the plan was to use
process_vm_readv() and process_vm_writev() in the unsandboxed broker
process. We knew about the pid race of course, but this would be far
from the only place where Chrome would be potentially vulnerable to the
race, so it didn't seem any worse. We did need to use process_vm_writev()
for some syscalls; for example, emulating stat() required us to write
into the supervised process.

On Tue, Mar 5, 2024 at 3:03 AM Christian Brauner wrote:
>
> On Tue, Mar 05, 2024 at 10:58:31AM +0100, Christian Brauner wrote:
> > On Tue, Mar 05, 2024 at 01:41:29AM -0800, Kees Cook wrote:
> > > On Tue, Mar 05, 2024 at 09:59:47AM +0100, Christian Brauner wrote:
> > > > > > Uhm, this will break the seccomp notifier, no? So you can't turn on
> > > > > > SECURITY_PROC_MEM_RESTRICT_WRITE when you want to use the seccomp
> > > > > > notifier to do system call interception and rewrite memory locations of
> > > > > > the calling task, no? Which is very much relied upon in various
> > > > > > container managers and possibly other security tools.
> > > > > >
> > > > > > Which means that you can't turn this on in any of the regular distros.
> > > > >
> > > > > FWIW, it's a run-time toggle, but yes, let's make sure this works
> > > > > correctly.
> > > > >
> > > > > > So you need to either account for the calling task being a seccomp
> > > > > > supervisor for the task whose memory it is trying to access or you need
> > > > > > to provide a migration path by adding an API that lets callers perform
> > > > > > these writes through the seccomp notifier.
> > > > >
> > > > > How do seccomp supervisors that use USER_NOTIF do those kinds of
> > > > > memory writes currently? I thought they were actually using ptrace?
> > > > > Everything I'm familiar with is just using SECCOMP_IOCTL_NOTIF_ADDFD,
> > > > > and not doing fancy memory pokes.
> > > >
> > > > For example, incus has a seccomp supervisor such that each container
> > > > gets its own goroutine that is responsible for handling system call
> > > > interception.
> > > >
> > > > If a container is started the container runtime connects to an AF_UNIX
> > > > socket to register with the seccomp supervisor. It stays connected until
> > > > it stops. Every time a system call is performed that is registered in the
> > > > seccomp notifier filter the container runtime will send an AF_UNIX
> > > > message to the seccomp supervisor. This will include the following fds:
> > > >
> > > > - the pidfd of the task that performed the system call (we should
> > > >   actually replace this with SO_PEERPIDFD now that we have that)
> > > > - the fd of the task's memory to /proc//mem
> > > >
> > > > The seccomp supervisor will then perform the system call interception
> > > > including the required memory reads and writes.
> > >
> > > Okay, so the patch would very much break that. Some questions, though:
> > > - why not use process_vm_writev()?
> >
> > Because it's inherently racy as I've explained in an earlier mail in
> > this thread.
> > Opening /proc//mem we can guard via:
> >
> > // Assume we hold @pidfd for supervised process
> >
> > int fd_mem = open("/proc/$pid/mem", O_RDWR);
> >
> > if (pidfd_send_signal(pidfd, 0, ...) == 0)
> >         write(fd_mem, ...);
> >
> > But we can't exactly do:
> >
> > process_vm_writev(pid, WRITE_TO_MEMORY, ...);
> > if (pidfd_send_signal(pidfd, 0, ...) == 0)
> >         write(fd_mem, ...);
> >
> > That's always racy. The process might have been reaped before we even
> > call pidfd_send_signal() and we're writing to some random process
> > memory.
> >
> > If we wanted to support this we'd need to implement a proposal I had a
> > while ago:
> >
> > #define PROCESS_VM_RW_PIDFD (1 << 0)
> >
> > process_vm_readv(pidfd, ..., PROCESS_VM_RW_PIDFD);
> > process_vm_writev(pidfd, ..., PROCESS_VM_RW_PIDFD);
> >
> > which is similar to what we did for waitid(P_PIDFD, pidfd, ...)
> >
> > That would make it possible to use a pidfd instead of a pid in the two
> > system calls. Then we can get rid of the raciness and actually use those
> > system calls. As they are now, we can't.
>
> What, btw, is the Linux sandbox on Chromium doing? Did they finally move
> away from SECCOMP_RET_TRAP to SECCOMP_RET_USER_NOTIF? I see:
>
> https://issues.chromium.org/issues/40145101
>
> Whatever became of this?
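
For reference, the guarded open that Christian describes in the quoted
text can be sketched in C roughly as below. This is only an illustration
under assumptions, not code from incus or Chromium: open_mem_checked() and
write_remote() are hypothetical helper names, and pidfd_send_signal() is
invoked through syscall(2) in case the libc in use has no wrapper for it.
The idea is to open the mem file by pid first and only then confirm, via
the pidfd, that the supervised process is still alive, so that a recycled
pid is detected before anything is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open /proc/<pid>/mem for the process that @pidfd
 * refers to. Returns an open fd, or -1 with errno set.
 */
int open_mem_checked(int pidfd, pid_t pid)
{
    char path[64];
    int fd, saved_errno;

    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
    fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /*
     * Signal 0 delivers nothing; it only checks that the process behind
     * @pidfd still exists. If it does, @pid cannot have been recycled
     * before the open() above, so @fd refers to the intended process.
     */
    if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) != 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: write @len bytes from @buf to address @addr in the
 * supervised process through an fd returned by open_mem_checked().
 */
ssize_t write_remote(int fd_mem, uintptr_t addr, const void *buf, size_t len)
{
    return pwrite(fd_mem, buf, len, (off_t)addr);
}
```

A supervisor would obtain the pidfd beforehand (for example with
pidfd_open(2) or from the container runtime, as described in the quoted
text), call open_mem_checked() once per supervised task, and then use
write_remote() for each emulated syscall that has to store results in the
task's memory.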