Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp2382175pxa; Mon, 3 Aug 2020 14:58:33 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyaCJLo004L90wyxBnnG+8D07wHdxoXppmoFZVWYTc5cPO3Ymt4uD342NX9Ol6M/B+9zVj5 X-Received: by 2002:a05:6402:1855:: with SMTP id v21mr7740972edy.355.1596491913513; Mon, 03 Aug 2020 14:58:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596491913; cv=none; d=google.com; s=arc-20160816; b=aTkMCZU6PEV2dJxzBl86sejmXZ9etFqYVKEwpke+1mCfpEMEPuAweUrac6Bud3IAsJ noneA5VnOI4RczTRvvqyRV2NaSDi/hpzb9H7oJCKG8ZTclJZfskcYOWZynjkhle1/kzd 8OBl1a66D0XDOECYJUzM2j0wZqp+h9YmDEKdUZnHMzHhAsYP4rvWEjCiWXP92Ay6BZyQ v2JREv5chhb4lZdjKBMgchz5ZqiSHIh5ez+A1NfvWR2GnZ2DmPwRePqmORST4X+8ExEK o6nepSEgV4dbOdbcxJHP1RWEPr4YBztaDqsG/f/Mc+gc4+Csg3SUee1DnMomhNqO6eII FTxg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:date:from :dkim-signature; bh=9/CYFaXh8bThql6xS8bKAvJejb77mKP97Fn6eGg2abM=; b=nZHgTicSdvO8dazCyckWJIeqwqyvNXf+rBIZIJNq8+qGiJx8XWVVRYnoovmxjL7THv DZh7NCyPKaCRcmISYB+OTJQ2k6yZYEQ9aM50SfDkvWvdA+xQ1cnRdz4gO03A2WCaZRid gQi+qRZRUq1Jdry6ZrJiYtZBKhN6I7iuTYZpI1z48CMtUJ4TC74eO9jXleEAVYo0B7lq kIjv1zcrLzEWwz2OAMy2XjglG+m42WPUnzQcgGiT2g1xFwYQj/Qaubpgxk3fS2P4srqS qCzdCRlEdHFlO4sEgLtAEFPvIKEC9IZF8Tx48bsiz1EJ/DrNdCb+hPsvGtypI7omryMq gBaQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=XLFUj46j; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e24si9776470eja.210.2020.08.03.14.58.11; Mon, 03 Aug 2020 14:58:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=XLFUj46j; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728028AbgHCV5w (ORCPT + 99 others); Mon, 3 Aug 2020 17:57:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726239AbgHCV5u (ORCPT ); Mon, 3 Aug 2020 17:57:50 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65527C06174A for ; Mon, 3 Aug 2020 14:57:50 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id l6so36647463qkc.6 for ; Mon, 03 Aug 2020 14:57:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:date:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=9/CYFaXh8bThql6xS8bKAvJejb77mKP97Fn6eGg2abM=; b=XLFUj46jwBoJ1+zmfy7YEAh45GxdkUSQ/u/LZITjdY7/IFm0dF7RZp4jDemVhDodwC bV5Z4UrwnjUGV7ZdgLSqhWKqRYpwRpVmjaySXSJBAn4QDJLDqHX2JRENsAnNOoq5fXH2 eA6aR16r01dagR20o+lLbrcKvQT1XlWre9f/nmSvQRN8+Ch8E0X4PCSN2n9gs/V/E7Sp +Q6j0i8J78/8QmN6XwW28lYKOqs8TDUM3mUlPce1h/2wUBUmS653skqqNG+pTnRG3bCi 7HI6pjpxK8cWXmJIVbTgznf0Zhn9z8O9/jm62BnMHkefGZjsum0rU0V5TVt7eFBHeu6l eGHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:date:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=9/CYFaXh8bThql6xS8bKAvJejb77mKP97Fn6eGg2abM=; b=LR2cDElT5JaHQOexSAKF5h95i8gJjK1WlKrjY7gjsz8jQzjfC6x2gHMSP6Z2Ps2n7l gm6gWrMusBHg0W3ZUns5yKfBwx5uWWco7Jk63vlSEdNVJONJWrCbN947JzIuAz4J32g1 NIeo2cuGDdX/YjF2zYU6c3yWqjaT4knQDkIsbsRkp3q5+129huo2X7/R5odsBxUT5k2s Dyx1ESPjUGZcyK4bt6z0vOGcL26mpwJ14P9ONotMMJ7UKtLzLa0T2Uyy6O175pRMOCzc jweHxKR4IR0Qn3JDP4qGzMH6NJ/ftWmj6+niKYqgZBHnSLBcC6hGJMDOscNmooQwAb36 qBLg== X-Gm-Message-State: AOAM5322mzVz/x0FqBPZc2IO4/3BAV3Z1ftFfIKFTXn6ofOlIDLsqTXx 03mQdq+aLuIXR9i+fvK66vM= X-Received: by 2002:a37:7245:: with SMTP id n66mr7113938qkc.83.1596491869458; Mon, 03 Aug 2020 14:57:49 -0700 (PDT) Received: from rani.riverdale.lan ([2001:470:1f07:5f3::b55f]) by smtp.gmail.com with ESMTPSA id 3sm21724598qkm.117.2020.08.03.14.57.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 14:57:49 -0700 (PDT) From: Arvind Sankar X-Google-Original-From: Arvind Sankar Date: Mon, 3 Aug 2020 17:57:47 -0400 To: Nick Terrell Cc: Ingo Molnar , linux-kernel@vger.kernel.org, x86@kernel.org, Kernel Team , Nick Terrell , Yann Collet , Gao Xiang , Sven Schmidt <4sschmid@informatik.uni-hamburg.de>, Andrew Morton , Greg Kroah-Hartman , Linus Torvalds Subject: Re: [PATCH] lz4: Fix kernel decompression speed Message-ID: <20200803215747.GA1644409@rani.riverdale.lan> References: <20200803194022.2966806-1-nickrterrell@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200803194022.2966806-1-nickrterrell@gmail.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 03, 2020 at 12:40:22PM -0700, Nick Terrell wrote: > From: Nick Terrell > > This patch replaces all memcpy() calls with LZ4_memcpy() which calls > __builtin_memcpy() so the compiler can inline it. > > LZ4 relies heavily on memcpy() with a constant size being inlined. In > x86 and i386 pre-boot environments memcpy() cannot be inlined because > memcpy() doesn't get defined as __builtin_memcpy(). > > An equivalent patch has been applied upstream so that the next import > won't lose this change [1]. > > I've measured the kernel decompression speed using QEMU before and after > this patch for the x86_64 and i386 architectures. The speed-up is about > 10x as shown below. > > Code Arch Kernel Size Time Speed > v5.8 x86_64 11504832 B 148 ms 79 MB/s > patch x86_64 11503872 B 13 ms 885 MB/s > v5.8 i386 9621216 B 91 ms 106 MB/s > patch i386 9620224 B 10 ms 962 MB/s > > I also measured the time to decompress the initramfs on x86_64, i386, > and arm. All three show the same decompression speed before and after, > as expected. > > [1] https://github.com/lz4/lz4/pull/890 > Hi Nick, would you be able to test the below patch's performance to verify it gives the same speedup? It removes the #undef in misc.c which causes the decompressors to not use the builtin version. It should be equivalent to yours except for applying it to all the decompressors. Thanks. From 10f8d939fc367e3127e2d72ba099678debcae422 Mon Sep 17 00:00:00 2001 From: Arvind Sankar Date: Mon, 3 Aug 2020 17:07:37 -0400 Subject: [PATCH] x86/boot/compressed: Use builtin mem functions for decompressor Since commits c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions") fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c") the decompressor stub has been using the compiler's builtin memcpy, memset and memcmp functions, _except_ where it would likely have the largest impact, in the decompression code itself. Remove the #undef's of memcpy and memset in misc.c so that the decompressor code also uses the compiler builtins. The rationale given in the comment doesn't really apply: just because some functions use the out-of-line version is no reason to not use the builtin version in the rest. Replace the comment with an explanation of why memzero and memmove are being #define'd. Drop the suggestion to #undef in boot/string.h as well: the out-of-line versions are not really optimized versions, they're generic code that's good enough for the preboot environment. The compiler will likely generate better code for constant-size memcpy/memset/memcmp if it is allowed to. Signed-off-by: Arvind Sankar --- arch/x86/boot/compressed/misc.c | 7 ++----- arch/x86/boot/string.h | 5 +---- 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 9652d5c2afda..0c74a6e526b6 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -30,12 +30,9 @@ #define STATIC static /* - * Use normal definitions of mem*() from string.c. There are already - * included header files which expect a definition of memset() and by - * the time we define memset macro, it is too late. + * Provide definitions of memzero and memmove as some of the decompressors will + * try to define their own functions if these are not defined as macros. */ -#undef memcpy -#undef memset #define memzero(s, n) memset((s), 0, (n)) #define memmove memmove diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h index 995f7b7ad512..a232da487cd2 100644 --- a/arch/x86/boot/string.h +++ b/arch/x86/boot/string.h @@ -11,10 +11,7 @@ void *memcpy(void *dst, const void *src, size_t len); void *memset(void *dst, int c, size_t len); int memcmp(const void *s1, const void *s2, size_t len); -/* - * Access builtin version by default. If one needs to use optimized version, - * do "undef memcpy" in .c file and link against right string.c - */ +/* Access builtin version by default. */ #define memcpy(d,s,l) __builtin_memcpy(d,s,l) #define memset(d,c,l) __builtin_memset(d,c,l) #define memcmp __builtin_memcmp -- 2.26.2