From: "Kani, Toshimitsu"
To: Dan Williams
CC: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86@kernel.org,
 linux-kernel@vger.kernel.org, stable@vger.kernel.org, Christoph Hellwig,
 Jeff Moyer, Ingo Molnar, Al Viro, "H. Peter Anvin", Thomas Gleixner,
 Ross Zwisler
Subject: RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
Date: Mon, 10 Apr 2017 22:47:02 +0000
References: <149161025237.38725.13508986873214668503.stgit@dwillia2-desk3.amr.corp.intel.com>
> >> > The clflush here flushes for the cacheline size.  So, we do not need
> >> > to flush the same cacheline again when the unaligned tail is in the
> >> > same line.
> >>
> >> Ok, makes sense.  Last question, can't we reduce the check to be:
> >>
> >>     if ((bytes > flushed) && ((bytes - flushed) & 3))
> >>
> >> ...since if 'bytes' was 4-byte aligned we would have performed
> >> non-temporal stores.
> >
> > That is not documented behavior of copy_user_nocache, but as long as
> > the pmem version of copy_user_nocache follows the same implemented
> > behavior, yes, that works.
>
> Hmm, sorry this comment confuses me, I'm only referring to the current
> version of __copy_user_nocache, not the new pmem version.  The way I
> read the current code, we only ever jump to the cached copy loop
> (.L_1b_cache_copy_loop) if the trailing byte-count is 4-byte
> misaligned.

Yes, you are right, and that is how the code is implemented.  I added
this trailing 4-byte handling for the >=8B case, which is shared with the
<8B case, since it was easy to do; I considered it a bonus.  To state
that it handles 4-byte alignment for the >=8B case as well, this function
would also need to handle a 4-byte-aligned destination; otherwise it
would be inconsistent.  Since I did not see much point in supporting such
a case, I simply documented in the Note that 8-byte alignment is required
for the >=8B case.

Thanks,
-Toshi