Date: Fri, 19 Jun 2020 16:40:35 -0000
From: "tip-bot2 for Matt Fleming"
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/urgent] x86/asm/64: Align start of __clear_user() loop to 16-bytes
Cc: Matt Fleming, Borislav Petkov, x86, LKML
In-Reply-To: <20200618102002.30034-1-matt@codeblueprint.co.uk>
References: <20200618102002.30034-1-matt@codeblueprint.co.uk>
Message-ID: <159258483578.16989.3987549539950250015.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     bb5570ad3b54e7930997aec76ab68256d5236d94
Gitweb:        https://git.kernel.org/tip/bb5570ad3b54e7930997aec76ab68256d5236d94
Author:        Matt Fleming
AuthorDate:    Thu, 18 Jun 2020 11:20:02 +01:00
Committer:     Borislav Petkov
CommitterDate: Fri, 19 Jun 2020 18:32:11 +02:00

x86/asm/64: Align start of __clear_user() loop to 16-bytes

x86 CPUs can suffer severe performance drops if a tight loop, such as
the ones in __clear_user(), straddles a 16-byte instruction fetch
window, or worse, a 64-byte cacheline. This issue was discovered in the
SUSE kernel with the following commit,

  1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and
caused the 8-byte copy loop in __clear_user() to be split across a
64-byte cacheline.

Aligning the start of the loop to 16 bytes makes it fit neatly inside a
single instruction fetch window again and restores the performance of
__clear_user(), which is used heavily when reading from /dev/zero.
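
The straddling condition described above can be illustrated with a small
userspace sketch. It is not part of the patch: the helper names
straddles() and align_up() are hypothetical, and any start offsets
passed to them are made up for illustration. Only the 15-byte size comes
from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): true if the byte range
 * [start, start + len) crosses a multiple of `window`.
 * `window` must be a power of two, e.g. 16 for an instruction fetch
 * window or 64 for a cacheline.
 */
static inline bool straddles(uintptr_t start, size_t len, uintptr_t window)
{
	assert(window != 0 && (window & (window - 1)) == 0);
	if (len == 0)
		return false;

	/* Compare the window base of the first and the last byte. */
	uintptr_t first = start & ~(window - 1);
	uintptr_t last = (start + len - 1) & ~(window - 1);

	return first != last;
}

/*
 * Hypothetical helper: round `addr` up to the next multiple of `align`
 * (a power of two). This mirrors where ".align 16" would place the
 * next instruction on x86.
 */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

For example, a 15-byte loop assumed to start at offset 0x3a crosses both
a 16-byte and a 64-byte boundary, so straddles(0x3a, 15, 64) is true.
After align_up(0x3a, 16) moves the start to 0x40, the same 15 bytes fit
inside one 16-byte window and straddles(0x40, 15, 16) is false.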

Here are some numbers from running libmicro's read_z* and pread_z*
microbenchmarks which read from /dev/zero:

  Zen 1 (Naples)

  libmicro-file
                                     5.7.0-rc6           5.7.0-rc6           5.7.0-rc6
                                              revert-1153933703d9+            align16+
  Time mean95-pread_z100k     9.9195 (  0.00%)    5.9856 ( 39.66%)    5.9938 ( 39.58%)
  Time mean95-pread_z10k      1.1378 (  0.00%)    0.7450 ( 34.52%)    0.7467 ( 34.38%)
  Time mean95-pread_z1k       0.2623 (  0.00%)    0.2251 ( 14.18%)    0.2252 ( 14.15%)
  Time mean95-pread_zw100k    9.9974 (  0.00%)    6.0648 ( 39.34%)    6.0756 ( 39.23%)
  Time mean95-read_z100k      9.8940 (  0.00%)    5.9885 ( 39.47%)    5.9994 ( 39.36%)
  Time mean95-read_z10k       1.1394 (  0.00%)    0.7483 ( 34.33%)    0.7482 ( 34.33%)

Note that this doesn't affect Haswell or Broadwell microarchitectures
which seem to avoid the alignment issue by executing the loop straight
out of the Loop Stream Detector (verified using perf events).

Fixes: 1153933703d9 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming
Signed-off-by: Borislav Petkov
Cc: # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
---
 arch/x86/lib/usercopy_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index fff28c6..b0dfac3 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -24,6 +24,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
 	asm volatile(
 		" testq %[size8],%[size8]\n"
 		" jz 4f\n"
+		" .align 16\n"
 		"0: movq $0,(%[dst])\n"
 		" addq $8,%[dst]\n"
 		" decl %%ecx ; jnz 0b\n"
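
As a reading aid for the libmicro table above: the percentages in
parentheses appear to be the reduction in mean time relative to the
unpatched 5.7.0-rc6 column. That interpretation is an assumption based
on the numbers themselves, and gain_percent() is a hypothetical helper,
not something taken from libmicro or the kernel tree.

```c
/*
 * Hypothetical helper: relative time reduction, in percent, of a
 * patched kernel versus the baseline. Lower times are better, so a
 * positive result means the patched kernel is faster.
 */
static inline double gain_percent(double baseline, double patched)
{
	return (baseline - patched) / baseline * 100.0;
}
```

Feeding in the pread_z100k row, gain_percent(9.9195, 5.9938) comes out
at roughly 39.58, matching the align16+ column. Some rows differ from
the table in the last digit, presumably because the table was computed
from unrounded means.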