From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds
Cc: Dan Williams, Linux Kernel Mailing List, linux-arch@vger.kernel.org, Andi Kleen, Kees Cook, kernel-hardening@lists.openwall.com, Greg Kroah-Hartman, "the arch/x86 maintainers", Ingo Molnar, Al Viro, "H. Peter Anvin", Thomas Gleixner, Andrew Morton, Alan Cox
Subject: Re: [PATCH v3 8/9] x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
Date: Sat, 13 Jan 2018 14:22:17 -0600
Message-ID: <87inc5zeyu.fsf@xmission.com>
In-Reply-To: (Linus Torvalds's message of "Sat, 13 Jan 2018 11:33:50 -0800")
References: <151586744180.5820.13215059696964205856.stgit@dwillia2-desk3.amr.corp.intel.com> <151586748981.5820.14559543798744763404.stgit@dwillia2-desk3.amr.corp.intel.com>

Linus Torvalds writes:

> On Sat, Jan 13, 2018 at 11:05 AM, Linus Torvalds
> wrote:
>>
>> I _know_ that lfence is expensive as hell on P4, for example.
>>
>> Yes, yes, "sbb" is often more expensive than most ALU instructions,
>> and Agner Fog says it has a 10-cycle latency on Prescott (which is
>> outrageous, but being one or two cycles more due to the flags
>> generation is normal). So the sbb/and may certainly add a few cycles
>> to the critical path, but on Prescott "lfence" is *50* cycles
>> according to those same tables by Agner Fog.
>
> Side note: I don't think P4 is really relevant for a performance
> discussion, I was just giving it as an example where we do know actual
> cycles.
>
> I'm much more interested in modern Intel big-core CPU's, and just
> wondering whether somebody could ask an architect.
>
> Because I _suspect_ the answer from a CPU architect would be: "Christ,
> the sbb/and sequence is much better because it doesn't have any extra
> serialization", but maybe I'm wrong, and people feel that lfence is
> particularly easy to do right without any real downside.

As an educated observer, it seems like the cmpq/sbb/and sequence is an
improvement because it moves the dependency from one end of the cpu
pipeline to another.  If any cpu does data speculation on anything
other than branch targets, that sequence could still be susceptible to
speculation.

From the AMD patches it appears that lfence is becoming a serializing
instruction, which in principle is much more expensive.

Also, do we have alternatives for these sequences, so that if we run on
an in-order atom (or 386 or 486) where speculation does not occur we
can avoid the cost?

Eric
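
For readers following the thread, here is a minimal C sketch of the idea
behind the cmpq/sbb/and sequence discussed above: the bounds check result
is turned into an all-ones or all-zero mask and ANDed into the index, so
the load depends on the comparison through data rather than only through
a predicted branch.  This is not the code from the patch.  The names
index_mask() and load_clamped() are hypothetical, the sketch assumes both
values fit in LONG_MAX, and a C compiler is free to emit something other
than cmp/sbb (or even a branch), which is why real kernel code would
likely need inline asm for this.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: returns ~0UL when index < size and 0 otherwise.
 * Mirrors what "cmp; sbb reg,reg" yields: the borrow from index - size
 * smeared across the whole word.  Only valid while index and size are
 * both <= LONG_MAX.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	/* The top bit of the wrapped difference is the borrow. */
	unsigned long borrow =
		(index - size) >> (sizeof(unsigned long) * CHAR_BIT - 1);

	/* 0 - 1 wraps to all ones; 0 - 0 stays zero. */
	return 0UL - borrow;
}

/*
 * Hypothetical caller: the architectural bounds check stays, and the
 * mask forces the index to 0 on a path where the check was mispredicted.
 */
static int load_clamped(const int *table, unsigned long size,
			unsigned long index)
{
	assert(size <= (unsigned long)LONG_MAX);

	if (index >= size)
		return -1;

	index &= index_mask(index, size);
	return table[index];
}
```

With a four-entry table, load_clamped(table, 4, 2) reads table[2], while
an index of 4 or more is rejected by the branch and, independently of the
branch, would have been masked down to 0.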