Subject: [PATCH v2 1/9] x86, memcpy_mcsafe: remove loop unrolling
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: x86@kernel.org, Ingo Molnar, Borislav Petkov, Tony Luck, Al Viro,
 Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, Andrew Morton,
 Linus Torvalds, hch@lst.de, linux-kernel@vger.kernel.org,
 tony.luck@intel.com, linux-fsdevel@vger.kernel.org
Date: Wed, 02 May 2018 21:58:46 -0700
Message-ID: <152532352111.17218.8776729190476488445.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <152532351517.17218.3583455156840230837.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <152532351517.17218.3583455156840230837.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

In preparation for teaching memcpy_mcsafe() to return 'bytes remaining'
rather than pass / fail, simplify the implementation to remove loop
unrolling. The unrolling complicates the fault handling for negligible
benefit given modern CPUs perform loop stream detection.

Cc:
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Al Viro
Cc: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Andrew Morton
Reported-by: Linus Torvalds
Signed-off-by: Dan Williams
---
 arch/x86/include/asm/string_64.h |    4 +--
 arch/x86/lib/memcpy_64.S         |   59 ++++++--------------------------------
 2 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 533f74c300c2..4752f8984923 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -116,7 +116,7 @@ int strcmp(const char *cs, const char *ct);
 #endif
 
 #define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check int memcpy_mcsafe_unrolled(void *dst, const void *src, size_t cnt);
+__must_check int __memcpy_mcsafe(void *dst, const void *src, size_t cnt);
 DECLARE_STATIC_KEY_FALSE(mcsafe_key);
 
 /**
@@ -138,7 +138,7 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt)
 {
 #ifdef CONFIG_X86_MCE
 	if (static_branch_unlikely(&mcsafe_key))
-		return memcpy_mcsafe_unrolled(dst, src, cnt);
+		return __memcpy_mcsafe(dst, src, cnt);
 	else
 #endif
 		memcpy(dst, src, cnt);
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 9a53a06e5a3e..54c971892db5 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -184,11 +184,11 @@ ENDPROC(memcpy_orig)
 
 #ifndef CONFIG_UML
 /*
- * memcpy_mcsafe_unrolled - memory copy with machine check exception handling
+ * __memcpy_mcsafe - memory copy with machine check exception handling
  * Note that we only catch machine checks when reading the source addresses.
  * Writes to target are posted and don't generate machine checks.
  */
-ENTRY(memcpy_mcsafe_unrolled)
+ENTRY(__memcpy_mcsafe)
 	cmpl $8, %edx
 	/* Less than 8 bytes? Go to byte copy loop */
 	jb .L_no_whole_words
@@ -213,49 +213,18 @@ ENTRY(memcpy_mcsafe_unrolled)
 	jnz .L_copy_leading_bytes
 
 .L_8byte_aligned:
-	/* Figure out how many whole cache lines (64-bytes) to copy */
-	movl %edx, %ecx
-	andl $63, %edx
-	shrl $6, %ecx
-	jz .L_no_whole_cache_lines
-
-	/* Loop copying whole cache lines */
-.L_cache_w0: movq (%rsi), %r8
-.L_cache_w1: movq 1*8(%rsi), %r9
-.L_cache_w2: movq 2*8(%rsi), %r10
-.L_cache_w3: movq 3*8(%rsi), %r11
-	movq %r8, (%rdi)
-	movq %r9, 1*8(%rdi)
-	movq %r10, 2*8(%rdi)
-	movq %r11, 3*8(%rdi)
-.L_cache_w4: movq 4*8(%rsi), %r8
-.L_cache_w5: movq 5*8(%rsi), %r9
-.L_cache_w6: movq 6*8(%rsi), %r10
-.L_cache_w7: movq 7*8(%rsi), %r11
-	movq %r8, 4*8(%rdi)
-	movq %r9, 5*8(%rdi)
-	movq %r10, 6*8(%rdi)
-	movq %r11, 7*8(%rdi)
-	leaq 64(%rsi), %rsi
-	leaq 64(%rdi), %rdi
-	decl %ecx
-	jnz .L_cache_w0
-
-	/* Are there any trailing 8-byte words? */
-.L_no_whole_cache_lines:
 	movl %edx, %ecx
 	andl $7, %edx
 	shrl $3, %ecx
 	jz .L_no_whole_words
 
-	/* Copy trailing words */
-.L_copy_trailing_words:
+.L_copy_words:
 	movq (%rsi), %r8
-	mov %r8, (%rdi)
-	leaq 8(%rsi), %rsi
-	leaq 8(%rdi), %rdi
+	movq %r8, (%rdi)
+	addq $8, %rsi
+	addq $8, %rdi
 	decl %ecx
-	jnz .L_copy_trailing_words
+	jnz .L_copy_words
 
 	/* Any trailing bytes? */
 .L_no_whole_words:
@@ -276,8 +245,8 @@ ENTRY(memcpy_mcsafe_unrolled)
 .L_done_memcpy_trap:
 	xorq %rax, %rax
 	ret
-ENDPROC(memcpy_mcsafe_unrolled)
-EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
+ENDPROC(__memcpy_mcsafe)
+EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
 
 	.section .fixup, "ax"
 	/* Return -EFAULT for any failure */
@@ -288,14 +257,6 @@ EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 	.previous
 
 	_ASM_EXTABLE_FAULT(.L_copy_leading_bytes, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w0, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w1, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w2, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w3, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w4, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w5, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w6, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w7, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_copy_trailing_words, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_copy_words, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_copy_trailing_bytes, .L_memcpy_mcsafe_fail)
 #endif
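
For readers who want the post-patch control flow at a glance, here is a rough user-space C model of the three copy phases that remain once the cache-line loop is gone: leading bytes, whole 8-byte words, trailing bytes. It is only a sketch. The name copy_words_sketch() is hypothetical, the source-alignment step is assumed from the surrounding code that the hunks above do not show, and nothing here models machine check recovery or the exception table, so it always returns 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space model of the simplified copy loop.
 * Not the kernel routine: there is no #MC handling and no fixup path.
 */
static int copy_words_sketch(void *dst, const void *src, size_t cnt)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t words;

	if (cnt >= 8) {
		/* Leading bytes: advance until the source is 8-byte aligned
		 * (assumed to mirror .L_copy_leading_bytes). */
		while (((uintptr_t)s & 7) != 0) {
			*d++ = *s++;
			cnt--;
		}

		/* Whole words: one 8-byte load and store per iteration,
		 * mirroring the single .L_copy_words loop. */
		words = cnt >> 3;
		cnt &= 7;
		while (words--) {
			uint64_t w;

			memcpy(&w, s, sizeof(w));	/* load from source */
			memcpy(d, &w, sizeof(w));	/* store to destination */
			s += 8;
			d += 8;
		}
	}

	/* Trailing bytes, or the whole copy when cnt < 8 */
	while (cnt--)
		*d++ = *s++;

	return 0;
}
```

With a single word loop there is one faulting load per phase, which is why the exception table in the patch shrinks to three entries.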