Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp1359641ybx; Thu, 7 Nov 2019 10:52:30 -0800 (PST) X-Google-Smtp-Source: APXvYqwTPy2crE6GsuxFXqQRb/ropah+ao3PLQlbwJ8McNgvQ6XVwpeY7VuOPiqVVgCN0wAwp1Yt X-Received: by 2002:a17:906:3919:: with SMTP id f25mr4725384eje.210.1573152750014; Thu, 07 Nov 2019 10:52:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573152750; cv=none; d=google.com; s=arc-20160816; b=0Fw8cwjVrltXviawGHIcmIrV2FlkFdIsFnAhn5ZrwZPfwKCGfNBkxH51qDI+qjWvC/ F3jDS25a/2FcmAhFfvB78EajzMRTI+5VsPy88BaTgu80+3V//k1Elss/bFGOXrBF0sTJ CH088Vi0+9WpQ+DoaGzkmECcGqzlq8lBfztN0GGvgkDoY8ctSlZuroW4gAO3NgKLV/4f bi0DXsEDeHkN+rg/p4d8DEtasnX5pc2pdH3OmDxn4AsBL2i5imkvGpuYiYJi1z9Fz29c Bb/aOmkbYxEc7DVW9cLx3bSaKrijoBtNimOJnL/3FoRqVMcKZKFx+nfmat6eotemhmmu 1wIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=GkjehHi85xXIeAmqIsUNqDZH/jMBWhdTR0KFAEgoRF8=; b=K0BtFtyJVC/zWog2ny655lbiGVpkf/iX39vrbTMw/YkecWYlwiBmpvSutWdEXtrNWJ p4nlcPGNFws3FjWIVn3Rneo4Tv74YAV3Swu8F7hMqfDiqi+ul5TnyYDTA9kSxngH3OvY BSsma6URpf5RPw1fVLcENlRkJBlYjZGt8qcfpRvF+qWCy17tVr9EM3+XrimK6yfFMb+G DKEqBjSqAxCuaQkVosqKYXoVWW9+mADblW2vvJuqlTH1k88p/sgMUZSy1aWkuv64Diy2 xf7Zs/skc4ct85w2xmFca1oige8OQTy9vQd9Qn0EpNIj2rYIcVZL8ymRTDXI7Docz65H 2JkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=imXRr7Gk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b7si2192503ede.12.2019.11.07.10.52.06; Thu, 07 Nov 2019 10:52:29 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=imXRr7Gk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727414AbfKGSvG (ORCPT + 99 others); Thu, 7 Nov 2019 13:51:06 -0500 Received: from mail-lf1-f68.google.com ([209.85.167.68]:34309 "EHLO mail-lf1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725990AbfKGSvF (ORCPT ); Thu, 7 Nov 2019 13:51:05 -0500 Received: by mail-lf1-f68.google.com with SMTP id f5so2435276lfp.1 for ; Thu, 07 Nov 2019 10:51:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=GkjehHi85xXIeAmqIsUNqDZH/jMBWhdTR0KFAEgoRF8=; b=imXRr7GkNBwE8ssccen69v7UhBqSwcwGow4PL5eAl5Sy1Trg5deD9ZqHW5uxKlPXC1 uG/d+bxygT/JQ9w+Y5zg1sql1eG626hOr0sakmOa65SQ5aX0CgesEw4UkCFKIaSts5n2 SNaz5EtU2ZimdRB9GsKU4CSZq+3JLRfpLiRCBVyimidejxy2kx3Y7tPyb7lSol8dPrZU ld//gWqzftCmma1zPviuPalW3bRZ7YwvpSYsJfqD/jVZYWB99pmo4CibY2Lqw+pYxH9z 2wZfQr8bbFfDFeGcTtZda30u7mG1zAJp4s7qt3sZKLo+G9BF3YNezXUKnyjoDhs3sQk3 MCGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=GkjehHi85xXIeAmqIsUNqDZH/jMBWhdTR0KFAEgoRF8=; b=D8rvQ0iiM9pQCQl3pEnyVBq0ybpdnYHOFcckhsvD0IIdEBpT9J41dKNap6Cy9LPhj5 RYSTEAX0xJttvSlSgcVUB9BNf/PVK2yX5udaC31BvrKbctPa8LAe1PEDNIcWWnUa9rSJ 1SvbXwwgOVHLl2hJQDDYBerYRU6H3/1k2KwKuwdUL93kp2PunOKY7unS3D4dyt4ItMqG POnmvuyd4nZ+Mc8kO7stMDll7qHMwhpuapZwXafResEHdGyR8xKYLAFx9jw8CbLTpADX x75zYHOzWKTBOwYyWQ4W5PHDjVLEyBHoYpTFNoKAMXhhDZJ94jbLXQy5OlpE5bjBnvAU yZVQ== X-Gm-Message-State: APjAAAWcV4cExm9E7a2OX0AM0CwKgQT4ojg18TBTL9iaqnr1evH5c/3K hnJgj/jSyDEU/wPkTfVQP1Tv6XAU6mFkeLn7/Glprg== X-Received: by 2002:ac2:5453:: with SMTP id d19mr3539669lfn.181.1573152662534; Thu, 07 Nov 2019 10:51:02 -0800 (PST) MIME-Version: 1.0 References: <1572967777-8812-1-git-send-email-rppt@linux.ibm.com> <1572967777-8812-2-git-send-email-rppt@linux.ibm.com> <20191105162424.GH30717@redhat.com> <20191107083902.GB3247@linux.ibm.com> <20191107153801.GF17896@redhat.com> <20191107182259.GK17896@redhat.com> In-Reply-To: <20191107182259.GK17896@redhat.com> From: Daniel Colascione Date: Thu, 7 Nov 2019 10:50:26 -0800 Message-ID: Subject: Re: [PATCH 1/1] userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK To: Andrea Arcangeli Cc: Mike Rapoport , Andy Lutomirski , linux-kernel , Andrew Morton , Jann Horn , Linus Torvalds , Lokesh Gidra , Nick Kralevich , Nosh Minwalla , Pavel Emelyanov , Tim Murray , Linux API , linux-mm Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 7, 2019 at 10:23 AM Andrea Arcangeli wrote: > > On Thu, Nov 07, 2019 at 08:15:53AM -0800, Daniel Colascione wrote: > > You're already paying for bounds checking. Receiving a message via a > > datagram socket is basically the same thing as what UFFD's read is > > doing anyway. > > Except it's synchronous and there are no dynamic allocations required > in uffd, while af_netlink and af_unix both all deal with queue of > events in skbs dynamically allocated. Do you have any evidence that skb allocation is a significant cost compared to a page fault and schedule? Regardless: if you don't want to use skbs, don't. My point is that recvmsg is the ideal interface for UFFD and i'm agnostic on the implementation of this interface. > And should then eventfd also become a netlink then? I mean uffd was > supposed to work like eventfd except it requires specialized events. You've raised eventfd as a model for UFFD on several occasions. I don't think eventfd is a good reference point. An eventfd is a single object with 64 bits of state. It can notify interested parties in changes to that state. Eventfd does not provide a queue. UFFD, however, *is* a queue. It provides an arbitrary number of state change notifications to a reader. In this way, UFFD is much more like a socket than it's like eventfd. That is, eventfd is about level-change notifications, but UFFD is about sending messages. > > Programs generally don't go calling recvmsg() on random FDs they get > > from the outside world. They do call read() on those FDs, which is why > > That programs generally don't do something only means the attack is > less probable. > > Programs generally aren't suid. Programs generally don't use > SCM_RIGHTS. Programs generally don't ignore the retval of > open/socket/uffd syscalls. Programs generally don't make assumptions > on the fd ID after one of those syscalls that install fds. > > If all programs generally do the right thing (where the most important > is to not make assumptions on the fd IDs and to check all syscall > retvals), there was never an issue to begin with even in uffd API. "The right thing" is a matter of contracts. If a program calls read() and behaves as if read() has only the effects read() is documented to have, that means that from the kernel's point of view, the program is doing the right thing. That you think certain practices are more prudent than others is irrelevant here. UFFD is a violation of read()'s *contract* and so if programs break after calling read(), it's the *kernel*'s fault. > > read() having unexpected side effects is terrible. > > If having unexpected side effects in read is "terrible" (i.e. I > personally prefer to use terms like terrible when there's at least > something that can be exploited in practice, not for theoretical > issues) for an SCM_RIGHTS receiving daemon, I just don't see how the > exact same unexpected (still theoretical) side effects in recvmsg with > an unexpected nested cmsg->cmsg_type == SCM_RIGHTS message being > returned, isn't terrible too. If a program calls recvmsg on an FD of unknown provenance, it *must* be prepared to receive file descriptors via SCM_RIGHTS. If it doesn't, it's a bug. The contract the kernel makes with userspace for recvmsg() includes the possibility of creating file descriptors. The contract the kernel makes with userspace for read() does not ordinarily involve creating file descriptors, so if the kernel does in fact do that, it's the kernel's problem. > > If you call it with a non-empty ancillary data buffer, you know to > > react to what you get. You're *opting into* the possibility of getting > > file descriptors. Sure, it's theoretically possible that a program > > calls recvmsg on random FDs it gets from unknown sources, sees > > SCM_RIGHTS unexpectedly, and just the SCM_RIGHTS message and its FD > > payload, but that's an outright bug, while calling read() on stdin is > > no bug. > > I'm not talking about stdin and suid. recvmsg might mitigate the > concern for suid (not certain, depends on the suid, if it's generally > doing what you expect most suid to be doing or not), I was talking > about the SCM_RIGHTS receiving daemon instead, the "worse" more > concerning case than the suid. > > I quote below Andy's relevant email: > > ====== > It's worse if SCM_RIGHTS is involved. > ====== > > Not all software will do this after calling recvmsg: > > if (cmsg->cmsg_type == SCM_RIGHTS) { > /* oops we got attacked and an fd was involountarily installed > because we received another AF_UNIX from a malicious attacker > in control of the other end of the SCM_RIGHTS-receiving > AF_UNIX connection instead of our expected socket family > which doesn't even support SCM_RIGHTS so we would never have > noticed an fd was installed after recvmsg */ > } If a program omits this code after calling recvmsg on a file descriptor of unknown provenance and the program breaks, it's the program's fault. It's reasonable to epect that recvmsg might create file descriptors if you call it on an unknown FD. It's unreasonable to expect a program to consider the possibility of read() creating file descriptors because read isn't documented to do that. >