From: Dexuan Cui
To: "linux-x86_64@vger.kernel.org", "Thomas Gleixner", Ingo Molnar, "H. Peter Anvin", David Howells, "Paul E. McKenney"
CC: "linux-kernel@vger.kernel.org"
Subject: x86 memory barrier: why does Linux prefer MFENCE to Locked ADD?
Date: Thu, 3 Mar 2016 14:33:15 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

My understanding of arch/x86/include/asm/barrier.h is that Linux clearly
prefers {L,S,M}FENCE: a locked ADD is used only on x86_32 platforms that
don't support XMM2 (SSE2).

However, a locked ADD is widely reported to be much faster than the FENCE
instructions, even on modern Intel CPUs like Haswell. For example, please
see these three sources:

"11.5.1 Locked Instructions as Memory Barriers
Optimization
Use locked instructions to implement Store/Store and Store/Load barriers."
http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf

"lock addl %(rsp), 0 is a better solution for StoreLoad barrier":
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

"...locked instruction are more efficient barriers...":
http://www.pvk.ca/Blog/2014/10/19/performance-optimisation-~-writing-an-essay/

I also found that FreeBSD prefers a locked ADD.

So I'm curious why Linux prefers MFENCE. I guess I may be missing
something. I tried to google the question, but didn't find an answer.

Thanks,
-- Dexuan
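
For readers following along, the two barrier styles being compared can be
sketched roughly as below. This is a user-space illustration only, assuming
GCC/Clang extended inline assembly. The function names are made up for this
note and are not the kernel's macros, and the non-x86 branch is just a
fallback so the snippet builds elsewhere.

```c
/* Hypothetical names; not the definitions in arch/x86/include/asm/barrier.h. */

/* Full barrier using MFENCE (needs SSE2, always present on x86_64). */
static inline void barrier_mfence(void)
{
#if defined(__x86_64__) || defined(__SSE2__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* fallback so the sketch builds on other targets */
#endif
}

/* Full barrier using a locked ADD of zero to the word at the stack pointer.
 * Adding 0 leaves the stack contents unchanged; only the ordering effect
 * of the LOCK prefix is wanted. */
static inline void barrier_locked_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* fallback so the sketch builds on other targets */
#endif
}

/* Store/Load pattern: the store to flag_a must be ordered before the
 * following load of flag_b, which is the case a full barrier is for. */
static volatile int flag_a;
static volatile int flag_b;

static int store_then_load(void (*full_barrier)(void))
{
    flag_a = 1;
    full_barrier();
    return flag_b;
}
```

Either function can be passed to store_then_load() to order the store before
the later load. Which one is cheaper on a given CPU is exactly the open
question here, and this sketch makes no claim about it.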