Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp579856pxy; Wed, 5 May 2021 08:49:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyFLiLQUlM+QJGCUEz1CrXCJUzrVJ7CgRO9jC3+F4RS8BN/vGHW+AJ8V9aqJWELyWqvbom1 X-Received: by 2002:aa7:d952:: with SMTP id l18mr32361242eds.83.1620229798954; Wed, 05 May 2021 08:49:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620229798; cv=none; d=google.com; s=arc-20160816; b=IRotnzNWLndSEPjHPF8XBp56T8W/MdYt6x6qW6wPS/wpK4CgZ/zbPev3ylmS1mevIX eWNxbmd46WAViV+YXSzyOY4hxcQ9CDLiyFEh0wcZrR22K28AH1SMSyNWnOYTDRqigqqe /lAELCd0qd4qC3CzU+In00qNsvUKo9vuGiNP/7g7HQ3fJfjcXAhT/NvKqa6kDzX1aYj4 5biJIHQaj6mjPhFFhf0udRVwOYs/og8UjufhhzyQzKU7bAP4XqVXvWIfvjIu5OXxgYkh vxAntoR5Jl6NAs+ftnN9CbzHhe0wGrBO7OwE5NGYjtRjsU9f8LEWdG9bOK+C2l9Ia2RR b0Nw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=U8/tUoiR6KfK6O5N9yG65qBra7gG3TFHMqmfmLj4uSk=; b=V9QfAXCv33gXpGP1I9iskEpvNy1rVzOcyiqizZWBDa9jFxG4ryAMPmITJ7yyy6ePmW dlDqrIDVyZISe1fwyj1+WTJsaB866bcCfWsfUuVv5+CwAxKIe12MdyYGLO3bOusT+Ehl VOEkmjt7LCl00adpZRx76/iISaiHC++25LKwP7aOILxfEcHccnLvmo/Hlz8+mqQH7fhz hVyPLZxFKrCWH68EbfoLYIZTMVxKRqvoTcf+Vhi+qPLUqB6ijmqp3/uIOQNdqXLqsa+Y HqKlq/yHFIlsI5tK2g/nWXeLdj+uDOKZPNfRWnJUJPzuP3p5bWzFDYjFRTTVJ7nEmfh9 DYuA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ka11si5493196ejc.663.2021.05.05.08.49.34; Wed, 05 May 2021 08:49:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233594AbhEEPqY (ORCPT + 99 others); Wed, 5 May 2021 11:46:24 -0400 Received: from foss.arm.com ([217.140.110.172]:46518 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233466AbhEEPqX (ORCPT ); Wed, 5 May 2021 11:46:23 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 337AC1FB; Wed, 5 May 2021 08:45:26 -0700 (PDT) Received: from C02TD0UTHF1T.local (unknown [10.57.28.242]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BDFDA3F718; Wed, 5 May 2021 08:45:23 -0700 (PDT) Date: Wed, 5 May 2021 16:45:21 +0100 From: Mark Rutland To: David Laight Cc: Josh Poimboeuf , Al Viro , "x86@kernel.org" , "linux-kernel@vger.kernel.org" , Linus Torvalds , Will Deacon , Dan Williams , Andrea Arcangeli , Waiman Long , Peter Zijlstra , Thomas Gleixner , Andrew Cooper , Andy Lutomirski , Christoph Hellwig , Borislav Petkov Subject: Re: [PATCH v4 3/4] x86/uaccess: Use pointer masking to limit uaccess speculation Message-ID: <20210505154521.GD5605@C02TD0UTHF1T.local> References: <5ba93cdbf35ab40264a9265fc24575a9b2f813b3.1620186182.git.jpoimboe@redhat.com> <20210505142542.GC5605@C02TD0UTHF1T.local> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 05, 2021 at 02:49:53PM +0000, David Laight wrote: > From: Mark Rutland > > Sent: 05 May 2021 15:26 > ... > > > +/* > > > + * Sanitize a user pointer such that it becomes NULL if it's not a valid user > > > + * pointer. This prevents speculatively dereferencing a user-controlled > > > + * pointer to kernel space if access_ok() speculatively returns true. This > > > + * should be done *after* access_ok(), to avoid affecting error handling > > > + * behavior. > > > + */ > > > +#define mask_user_ptr(ptr) \ > > > +({ \ > > > + unsigned long _ptr = (__force unsigned long)ptr; \ > > > + unsigned long mask; \ > > > + \ > > > + asm volatile("cmp %[max], %[_ptr]\n\t" \ > > > + "sbb %[mask], %[mask]\n\t" \ > > > + : [mask] "=r" (mask) \ > > > + : [_ptr] "r" (_ptr), \ > > > + [max] "r" (TASK_SIZE_MAX) \ > > > + : "cc"); \ > > > + \ > > > + mask &= _ptr; \ > > > + ((typeof(ptr)) mask); \ > > > +}) > > > > On arm64 we needed to have a sequence here because the addr_limit used > > to be variable, but now that we've removed set_fs() and split the > > user/kernel access routines we could simplify that to an AND with an > > immediate mask to force all pointers into the user half of the address > > space. IIUC x86_64 could do the same, and I think that was roughly what > > David was suggesting. > > Something like that :-) > > For 64bit you can either unconditionally mask the user address > (to clear a few high bits) or mask with a calculated value > if the address is invalid. > The former is almost certainly better. Sure; I was thinking of the former as arm64 does the latter today. > The other thing is that a valid length has to be less than > the TASK_SIZE_MAX. > Provided there are 2 zero bits at the top of every user address > you can check 'addr | size < limit' and know that 'addr + size' > won't wrap into kernel space. I see. The size concern is interesting, and I'm not sure whether it practically matters. If the size crosses the user/kernel gap, then for this to (potentially) be a problem the CPU must speculate an access past the gap before it takes the exception for the first access that hits the gap. With that in mind: * If the CPU cannot wildly mispredict an iteration of a uaccess loop (e.g. issues iterations in-order), then it would need to speculate accesses for the entire length of the gap without having raised an exception. For arm64 that's at least 2^56 bytes, which even with SVE's 256-bit vectors that's 2^40 accesses. I think it's impractical for a CPU to speculate a window this large before taking an exception. * If the CPU can wildly mispredict an iteration of a uaccess loop (e.g. do this non-sequentially and generate offests wildly), then it can go past the validated size boundary anyway, and we'd have to mask the pointer immediately prior to the access. Beyond value prediction, I'm not sure how this could happen given the way we build those loops. ... so for architectures with large user/kernel gaps I'm not sure that it's necessary to check the size up-front. On arm64 we also have a second defence as our uaccess primitives use "unprivileged load/store" instructions LDTR and STTR, which use the user permissions even when executed in kernel mode. So on CPUs where permissions are respected under speculation these cannot access kernel memory. Thanks, Mark.