Date: Thu, 22 Nov 2018 12:13:41 +0100
From: Ingo Molnar
To: Linus Torvalds
Cc: pabeni@redhat.com, Jens Axboe, Thomas Gleixner, Ingo Molnar, bp@alien8.de, Peter Anvin, the arch/x86 maintainers, Andrew Morton, Andrew Lutomirski, Peter Zijlstra, dvlasenk@redhat.com, brgerst@gmail.com, Linux List Kernel Mailing
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
Message-ID: <20181122111341.GA107459@gmail.com>
In-Reply-To: <20181122103231.GA102790@gmail.com>

* Ingo Molnar wrote:

> The kernel text size reduction with Jens's patch is small but real:
>
>       text        data         bss         dec       hex  filename
>   19572694    11516934    19873888    50963516   309a43c  vmlinux.before
>   19572468    11516934    19873888    50963290   309a35a  vmlinux.after
>
> But I checked the disassembly, and it's not a real win, the new code is
> actually more complex than the old one, as expected, but GCC (7.3.0) does
> some particularly stupid things which bloat the generated code.

So I dug into this some more:

1)

Firstly I tracked down GCC bloating the might_fault() checks and the
related out-of-line code exception handling which bloats the full
generated function.

2)

But even with that complication eliminated, there's a size reduction when
Jens's patch is applied, which is puzzling:

  19563640    11516790    19882080    50962510   309a04e  vmlinux.before
  19563274    11516790    19882080    50962144   3099ee0  vmlinux.after

but this is entirely due to the .altinstructions section being counted as
the 'text' part of the vmlinux - while in reality it's not:

3)

The _real_ part of the vmlinux gets bloated by Jens's patch:

  ffffffff81000000 <_stext>:

  before:  ffffffff81b0e5e0 <__clear_user>:
  after:   ffffffff81b0e670 <__clear_user>:

I.e. we get an e5e0 => e670 bloat, as expected.

In the config I tested, a later section of the kernel image first aligns
away the bloat:

  before:  ffffffff82fa6321 <.altinstr_aux>:
  after:   ffffffff82fa6321 <.altinstr_aux>:

and then artificially debloats the modified kernel via the
altinstructions section:

  before:  Disassembly of section .exit.text: ffffffff83160798
  after:   Disassembly of section .exit.text: ffffffff83160608

Note that there's a third level of obfuscation here: Jens's patch actually
*adds* a new altinstructions statement:

  +       /*
  +        * For smaller copies, don't use ERMS as it's slower.
  +        */
  +       if (len < 128) {
  +               alternative_call(copy_user_generic_unrolled,
  +                                copy_user_generic_string, X86_FEATURE_REP_GOOD,
  +                                ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
  +                                            "=d" (len)),
  +                                "1" (to), "2" (from), "3" (len)
  +                                : "memory", "rcx", "r8", "r9", "r10", "r11");
  +               return ret;
  +       }
  +
          /*
           * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
           * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
           * Otherwise, use copy_user_generic_unrolled.
           */
          alternative_call_2(copy_user_generic_unrolled,
  -                        copy_user_generic_string,
  -                        X86_FEATURE_REP_GOOD,
  -                        copy_user_enhanced_fast_string,
  -                        X86_FEATURE_ERMS,
  +                        copy_user_generic_string, X86_FEATURE_REP_GOOD,
  +                        copy_user_enhanced_fast_string, X86_FEATURE_ERMS,
                           ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
                                       "=d" (len)),
                           "1" (to), "2" (from), "3" (len)

So how can this change possibly result in a *smaller* altinstructions
section?

4)

The reason is GCC's somewhat broken __builtin_constant_p() logic, which
leaves ~10% of the constant call sites actually active, but which are then
optimized by GCC's later stages, and the alternative_call_2() gets
optimized out and replaced with the alternative_call() call.

This is where Jens's patch 'debloats' the vmlinux and confuses the 'size'
utility and gains its code reduction street cred.

Note to self: watch out for patches that change altinstructions and don't
make premature vmlinux size impact assumptions. :-)

Thanks,

	Ingo
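
P.S. For anyone who wants to re-check the arithmetic behind points 2) and 3), here is a minimal Python sketch. It only subtracts the figures quoted in this mail; the variable names are illustrative and not part of any kernel tooling, and it says nothing about why the sections moved.

```python
# Cross-check of the numbers quoted in this mail. Every value below is
# copied from the 'size' output or the disassembly addresses above; the
# variable names are made up for illustration.

# 'text' column reported by 'size', first measurement
text_before_1, text_after_1 = 19572694, 19572468

# 'text' column after the might_fault() bloat was eliminated
text_before_2, text_after_2 = 19563640, 19563274

# Address of __clear_user in the real text section
clear_user_before = 0xFFFFFFFF81B0E5E0
clear_user_after = 0xFFFFFFFF81B0E670

# Start address of .exit.text as quoted above
exit_text_before = 0xFFFFFFFF83160798
exit_text_after = 0xFFFFFFFF83160608

# What 'size' claims the patch saves
apparent_saving_1 = text_before_1 - text_after_1
apparent_saving_2 = text_before_2 - text_after_2

# How far __clear_user moved up, i.e. growth of the code placed before it
real_text_growth = clear_user_after - clear_user_before

# How much earlier .exit.text starts in the patched kernel
later_shrink = exit_text_before - exit_text_after

print(f"'size' text delta, first run:   {apparent_saving_1} bytes")
print(f"'size' text delta, second run:  {apparent_saving_2} bytes")
print(f"__clear_user shift:             {real_text_growth:#x} ({real_text_growth} bytes)")
print(f".exit.text starts earlier by:   {later_shrink:#x} ({later_shrink} bytes)")
```

With these inputs the 'size' deltas come out as 226 and 366 bytes, while __clear_user moves up by 0x90 (144) bytes and .exit.text starts 0x190 (400) bytes earlier, which matches the picture of real text growing while a later section shrinks.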