Subject: Re: [RFC PATCH 00/16] PTI support for x86-32
From: "H. Peter Anvin"
To: Linus Torvalds
Cc: Nadav Amit, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    the arch/x86 maintainers, LKML, "open list:MEMORY MANAGEMENT",
    Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
    Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
    Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
    Greg KH, Will Deacon, "Liguori, Anthony", Daniel Gruss,
    Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long
Date: Mon, 22 Jan 2018 13:10:19 -0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/22/18 12:14, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 6:20 PM, wrote:
>>
>> No idea about Intel, but at least on Transmeta CPUs the limit check
>> was asynchronous with the access.
>
> Yes, but TMTA had a really odd uarch and didn't check segment limits
> natively.
>

Only on TM3000 ("Wilma") and TM5000 ("Fred"), not on TM8000 ("Astro").
Astro might in fact have been more synchronous than most modern machines
(see below).

> When you do it in hardware, the limit check is actually fairly natural
> to do early rather than late (since it acts on the linear address
> _before_ base add and TLB lookup).
>
> So it's not like it can't be done late, but there are reasons why a
> traditional microarchitecture might always end up doing the limit
> check early and so segmentation might be a good defense against
> meltdown on 32-bit Intel.

I will try to investigate, but as you can imagine the amount of
bandwidth I might be able to get on this is definitely going to be
limited.

All of the below is generic discussion that almost certainly can be
found in some form in Hennessy & Patterson, and so I don't have to worry
about giving away Intel secrets:

It isn't really true that it is natural to check this early. One of the
most fundamental frequency limiters in a modern CPU architecture
(meaning anything from the last 20 years or so) has been the
data-dependent AGU-D$-AGU loop. Note that this doesn't even include the
TLB: the TLB is looked up in parallel with the D$, and if the result was
*either* a cache-TLB mismatch or a TLB miss, the result is prevented from
committing.
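
To make the parallel lookup concrete, here is a minimal Python sketch of
a load whose data is available before translation has been confirmed. It
is a toy model only: the dictionary-based `dcache` and `tlb`, the
`LoadResult` type, and the 4 KiB page size are all assumptions made for
illustration and do not describe any real design.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class LoadResult:
    data: Optional[int]  # value forwarded speculatively to dependent ops
    may_commit: bool     # False: the load must not commit as-is


def load(vaddr: int,
         dcache: Dict[int, Tuple[int, int]],
         tlb: Dict[int, int]) -> LoadResult:
    """Probe a (hypothetical) D$ and TLB with the same virtual address.

    dcache maps a virtual address to (physical page tag, data);
    tlb maps a virtual page number to a physical page number.
    """
    vpn = vaddr >> PAGE_SHIFT
    line = dcache.get(vaddr)  # D$ probe: does not wait for translation
    ppn = tlb.get(vpn)        # TLB probe: conceptually in parallel

    # Data is forwarded as soon as the D$ delivers it.
    data = line[1] if line is not None else None

    # Commit is allowed only if the TLB hit and its result matches the tag.
    tag_ok = line is not None and ppn is not None and line[0] == ppn
    return LoadResult(data=data, may_commit=tag_ok)
```

In this model a TLB miss or a tag mismatch still hands data to dependent
operations; it only clears `may_commit`, which is the distinction the
paragraph above draws.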

In the case of the x86, the AGU receives up to three sources plus the
segment base, and if possible given the target process and gates
available, it might be designed to have a unified 4-input adder, with the
3-input case for limit checks being done separately.

Misses, and even more so exceptions (which are far less frequent than
misses), are demoted to a slower path where the goal is to prevent commit
rather than trying to race to be in the data path. So although it is
natural to *issue* the load and the limit check at the same time, the
limit check is still going to be deferred. Whether or not it is
permitted to be fully asynchronous with the load is probably a tradeoff
of timing requirements vs. complexity. At least theoretically, one could
imagine a machine which would take the trap after the speculative
machine had already chased the pointer loop several levels down; this
would most likely mean separate uops to allow for the existing
out-of-order machine to do the bookkeeping.

	-hpa
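
A minimal Python sketch of the split described above, under stated
assumptions: a 32-bit, expand-up data segment, a dictionary standing in
for memory, and the hypothetical helper names `agu` and
`speculative_load`. It only illustrates the idea that the 4-input sum
feeds the load while the 3-input sum feeds a check that gates commit; it
is not a model of any particular CPU.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addressing


def agu(seg_base, seg_limit, base, index_scaled, disp, size=4):
    """Return (linear_address, limit_ok) for one memory operand."""
    # Fast path: unified 4-input add produces the linear address.
    linear = (seg_base + base + index_scaled + disp) & MASK32

    # Separate path: 3-input add produces the offset within the segment,
    # which is compared against the limit (expand-up segment assumed).
    offset = (base + index_scaled + disp) & MASK32
    limit_ok = offset + (size - 1) <= seg_limit
    return linear, limit_ok


def speculative_load(memory, seg_base, seg_limit, base,
                     index_scaled=0, disp=0):
    """Return (forwarded, committed) values for a load.

    The load is issued without waiting for the limit check; the check
    only decides whether the result may commit.
    """
    linear, limit_ok = agu(seg_base, seg_limit, base, index_scaled, disp)
    forwarded = memory.get(linear)                # seen by dependent uops
    committed = forwarded if limit_ok else None   # architecturally visible
    return forwarded, committed
```

For example, with a segment base of 0x1000 and a limit of 0xFF, an
offset of 0x4000 fails the limit check, yet in this model the value at
linear address 0x5000 is still forwarded to dependent operations before
the fault is taken.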