Date: Wed, 17 Jan 2018 18:52:32 +0000
From: Al Viro
To: Alan Cox
Cc: Linus Torvalds, Dan Williams, Linux Kernel Mailing List,
	linux-arch@vger.kernel.org, Andi Kleen, Kees Cook,
	kernel-hardening@lists.openwall.com, Greg Kroah-Hartman,
	the arch/x86 maintainers, Ingo Molnar, "H. Peter Anvin",
	Thomas Gleixner, Andrew Morton
Subject: Re: [PATCH v3 8/9] x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
Message-ID: <20180117185232.GW13338@ZenIV.linux.org.uk>
References: <151586744180.5820.13215059696964205856.stgit@dwillia2-desk3.amr.corp.intel.com>
	<151586748981.5820.14559543798744763404.stgit@dwillia2-desk3.amr.corp.intel.com>
	<1516198646.4184.13.camel@linux.intel.com>
In-Reply-To: <1516198646.4184.13.camel@linux.intel.com>

On Wed, Jan 17, 2018 at 02:17:26PM +0000, Alan Cox wrote:
> On Tue, 2018-01-16 at 14:41 -0800, Linus Torvalds wrote:
> >
> > On Jan 16, 2018 14:23, "Dan Williams" wrote:
> > > That said, for get_user specifically, can we do something even
> > > cheaper. Dave H. reminds me that any valid user pointer that gets
> > > past the address limit check will have the high bit clear. So
> > > instead of calculating a mask, just unconditionally clear the high
> > > bit. It seems worse case userspace can speculatively leak something
> > > that's already in its address space.
> >
> > That's not at all true.
> >
> > The address may be a kernel address. That's the whole point of
> > 'set_fs()'.
>
> Can we kill off the remaining users of set_fs() ?

Not easily.  They tend to come in pairs (the usual pattern is get_fs(),
save the result, set_fs(something), do work, set_fs(saved)), and
counting each such area as a single instance we have (in my tree right
now) 121 locations.  Some could be killed (and will eventually be - the
number of set_fs()/access_ok()/__{get,put}_user()/__copy_...() call
sites has been seriously decreasing during the last couple of years),
but some are really hard to kill off.
How, for example, would you deal with this one:

/*
 * Receive a datagram from a UDP socket.
 */
static int svc_udp_recvfrom(struct svc_rqst *rqstp)
{
	struct svc_sock	*svsk =
		container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
	struct svc_serv	*serv = svsk->sk_xprt.xpt_server;
	struct sk_buff	*skb;
	union {
		struct cmsghdr	hdr;
		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
	} buffer;
	struct cmsghdr *cmh = &buffer.hdr;
	struct msghdr msg = {
		.msg_name = svc_addr(rqstp),
		.msg_control = cmh,
		.msg_controllen = sizeof(buffer),
		.msg_flags = MSG_DONTWAIT,
	};
...
	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
			     0, 0, MSG_PEEK | MSG_DONTWAIT);

With kernel_recvmsg() (and in my tree the above is its last surviving
caller) being

int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
		   struct kvec *vec, size_t num, size_t size, int flags)
{
	mm_segment_t oldfs = get_fs();
	int result;

	iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, vec, num, size);
	set_fs(KERNEL_DS);
	result = sock_recvmsg(sock, msg, flags);
	set_fs(oldfs);
	return result;
}
EXPORT_SYMBOL(kernel_recvmsg);

We are asking for recvmsg() with zero data length; what we really want
is ->msg_control.  And _that_ is why we need that set_fs() - we want the
damn thing to go into a local variable.

But note that filling ->msg_control will happen in put_cmsg(), called
from ip_cmsg_recv_pktinfo(), called from ip_cmsg_recv_offset(), called
from udp_recvmsg(), called from sock_recvmsg_nosec(), called from
sock_recvmsg().  Or in another path in case of IPv6.

Sure, we can arrange for propagation of that all the way down those
call chains.  My preference would be to try and mark that (one and
only) case in ->msg_flags, so that put_cmsg() would be able to check.
___sys_recvmsg() sets that as

	msg_sys->msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT);

so we ought to be free to use any bit other than those two.  Since
put_cmsg() already checks ->msg_flags, that shouldn't add too much
overhead.
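To make the ->msg_flags idea concrete, here is a rough userland toy of
it, not kernel code.  Everything prefixed sk_/SK_ is made up for the
illustration: the flag values are arbitrary, SK_CMSG_KERNEL is a
hypothetical bit, and sk_copy_to_user() is only a memcpy() stand-in.
Unlike the real put_cmsg(), the toy simply refuses data that does not
fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy flag bits for this sketch only; they are NOT the kernel's values. */
#define SK_CMSG_CLOEXEC	0x1u	/* stands in for MSG_CMSG_CLOEXEC */
#define SK_CMSG_COMPAT	0x2u	/* stands in for MSG_CMSG_COMPAT */
#define SK_CMSG_KERNEL	0x4u	/* hypothetical: control buffer is kernel memory */

struct sk_msghdr {
	void		*msg_control;
	size_t		msg_controllen;
	unsigned int	msg_flags;
};

/* Counters so a caller can see which path was taken. */
static int sk_user_copies;
static int sk_kernel_copies;

/* Stand-in for copy_to_user(); in userland it is just memcpy(). */
static int sk_copy_to_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	sk_user_copies++;
	return 0;
}

/* Models the masking done at syscall entry: only two bits survive. */
static unsigned int sk_syscall_msg_flags(unsigned int flags)
{
	return flags & (SK_CMSG_CLOEXEC | SK_CMSG_COMPAT);
}

/* Toy put_cmsg(): pick the copy routine from a bit in ->msg_flags. */
static int sk_put_cmsg(struct sk_msghdr *msg, const void *data, size_t len)
{
	assert(msg != NULL && data != NULL);
	if (len > msg->msg_controllen)
		return -1;	/* simplification: the sketch just refuses */

	if (msg->msg_flags & SK_CMSG_KERNEL) {
		memcpy(msg->msg_control, data, len);
		sk_kernel_copies++;
	} else if (sk_copy_to_user(msg->msg_control, data, len)) {
		return -1;
	}

	/* Advance past what was written, as a cursor into the buffer. */
	msg->msg_control = (char *)msg->msg_control + len;
	msg->msg_controllen -= len;
	return 0;
}
```

The point of sk_syscall_msg_flags() is that a caller coming through
the syscall path can never end up with the kernel-buffer bit set, so
only an in-kernel caller that sets it explicitly takes the memcpy()
branch.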
But then we'll need to do something to prevent speculative execution
straying down that way, won't we?  I'm not saying it can't be done, but
quite a few of the remaining call sites will take serious work.

Incidentally, what about copy_to_iter() and friends?  They check
iov_iter flavour and go either into the "copy to kernel buffer" or
"copy to userland" paths.  Do we need to deal with mispredictions
there?  We are calling a bunch of those on read()...
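For readers who have not looked at that code, the flavour check can be
pictured with the userland toy below.  It is only a sketch under loud
assumptions: struct sk_iter, the two SK_ITER_* flavours and
sk_copy_to_iter() are invented names, both branches are plain memcpy(),
and the real iov_iter has more flavours and segments.  It shows where
the branch sits; it does not say whether that branch needs hardening.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy iterator flavours; invented for this sketch. */
enum sk_iter_type { SK_ITER_USER, SK_ITER_KVEC };

struct sk_iter {
	enum sk_iter_type type;
	char	*buf;	/* where the next byte goes */
	size_t	count;	/* bytes of room left */
};

/* Counters so a caller can see which path was taken. */
static int sk_iter_user_copies;
static int sk_iter_kernel_copies;

/* Toy copy_to_iter(): returns bytes copied, dispatching on flavour. */
static size_t sk_copy_to_iter(const void *src, size_t len, struct sk_iter *i)
{
	assert(src != NULL && i != NULL);
	if (len > i->count)
		len = i->count;	/* never write past the remaining room */

	if (i->type == SK_ITER_KVEC) {
		/* "copy to kernel buffer" path */
		memcpy(i->buf, src, len);
		sk_iter_kernel_copies++;
	} else {
		/* stand-in for the "copy to userland" path */
		memcpy(i->buf, src, len);
		sk_iter_user_copies++;
	}

	i->buf += len;
	i->count -= len;
	return len;
}
```

In the toy the two branches do the same thing; in the kernel they do
not, and the choice between them is an ordinary conditional on the
iterator's type, which is what the question about mispredictions is
aimed at.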