From: Dan Williams
Date: Mon, 10 Apr 2017 15:09:58 -0700
Subject: Re: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
To: "Kani, Toshimitsu"
Cc: "linux-nvdimm@lists.01.org", Jan Kara, Matthew Wilcox, "x86@kernel.org",
 "linux-kernel@vger.kernel.org", "stable@vger.kernel.org", Christoph Hellwig,
 Jeff Moyer, Ingo Molnar, Al Viro, "H. Peter Anvin", Thomas Gleixner,
 Ross Zwisler

On Mon, Apr 10, 2017 at 2:45 PM, Kani, Toshimitsu wrote:
>> On Mon, Apr 10, 2017 at 2:28 PM, Kani, Toshimitsu wrote:
>> >> > Thanks for the update. I think the alignment check should be based on
>> >> > the following note in copy_user_nocache.
>> >> >
>> >> >  * Note: Cached memory copy is used when destination or size is not
>> >> >  * naturally aligned. That is:
>> >> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
>> >> >  *  - Require 4-byte alignment when size is 4 bytes.
>> >> >
>> >> > So, I think the code may be something like this. I also made the
>> >> > following changes:
>> >>
>> >> Thanks!
>> >>
>> >> > - Mask with 7, not 8.
>> >>
>> >> Yes, good catch.
>> >>
>> >> > - ALIGN with cacheline size, instead of 8.
>> >> > - Add (bytes > flushed) test since calculation with unsigned long
>> >> >   still results in a negative value (as a positive value).
>> >> >
>> >> >   if (bytes < 8) {
>> >> >           if ((dest & 3) || (bytes != 4))
>> >> >                   arch_wb_cache_pmem(addr, 1);
>> >> >   } else {
>> >> >           if (dest & 7) {
>> >> >                   dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
>> >>
>> >> Why align the destination to the next cacheline? As far as I can see
>> >> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
>> >> to the next 8-byte boundary.
>> >
>> > The clflush here flushes for the cacheline size. So, we do not need to flush
>> > the same cacheline again when the unaligned tail is in the same line.
>>
>> Ok, makes sense. Last question, can't we reduce the check to be:
>>
>>     if ((bytes > flushed) && ((bytes - flushed) & 3))
>>
>> ...since if 'bytes' was 4-byte aligned we would have performed
>> non-temporal stores.
>
> That is not documented behavior of copy_user_nocache, but as long as the pmem
> version of copy_user_nocache follows the same implemented behavior, yes, that
> works.

Hmm, sorry, this comment confuses me. I'm only referring to the current
version of __copy_user_nocache, not the new pmem version. The way I
read the current code, we only ever jump to the cached copy loop
(.L_1b_cache_copy_loop) if the trailing byte-count is 4-byte
misaligned.
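[Editor's sketch] For readers following the exchange, here is a rough C sketch of the flush logic being debated, stitched together from the copy_user_nocache note and Toshi's snippet. It is illustrative only and not the patch under review: the function name flush_unaligned_edges() is invented for this sketch, the include locations assume the 4.11-era x86 tree, and arch_wb_cache_pmem(), ALIGN() and boot_cpu_data.x86_clflush_size are used as they appear in the thread.

	#include <linux/kernel.h>	/* ALIGN() */
	#include <asm/processor.h>	/* boot_cpu_data */
	#include <asm/pmem.h>		/* arch_wb_cache_pmem(), per the thread */

	/*
	 * Illustrative sketch only -- not the actual patch.  Assumes
	 * arch_wb_cache_pmem(addr, size) writes back the cacheline(s)
	 * covering [addr, addr + size).
	 */
	static void flush_unaligned_edges(void *addr, size_t bytes)
	{
		unsigned long dest = (unsigned long) addr;
		unsigned long flushed;

		if (bytes < 8) {
			/*
			 * __copy_user_nocache() only uses a 4-byte
			 * non-temporal store here when the destination is
			 * 4-byte aligned and the size is exactly 4; any
			 * other small copy goes through cached stores.
			 */
			if ((dest & 3) || (bytes != 4))
				arch_wb_cache_pmem(addr, 1);
		} else {
			if (dest & 7) {
				/*
				 * The leading partial word is copied with
				 * cached stores.  Flush that line once;
				 * rounding 'dest' up to the next cacheline
				 * (not just the next 8-byte boundary)
				 * avoids flushing the same line again if
				 * the unaligned tail lands in it.
				 */
				dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
				arch_wb_cache_pmem(addr, 1);
			}

			/*
			 * 'flushed' can exceed 'bytes' for a short,
			 * misaligned copy, so the (bytes > flushed) test
			 * guards the unsigned subtraction against
			 * wrapping.  Masking with 7 follows the documented
			 * rule (8-byte alignment required for sizes >= 8);
			 * the thread debates whether masking with 3 would
			 * suffice given that the current implementation
			 * only falls back to cached stores for a trailing
			 * 1-3 byte remainder.
			 */
			flushed = dest - (unsigned long) addr;
			if (bytes > flushed && ((bytes - flushed) & 7))
				arch_wb_cache_pmem(addr + bytes - 1, 1);
		}
	}

In a pmem copy path this would run after the non-temporal copy returns, so only the cachelines touched by the cached head and tail stores get written back; whether the tail mask can be relaxed from 7 to 3 is exactly the open question in the exchange above.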