Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp2515363pxa; Mon, 3 Aug 2020 18:59:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxyu+WaxXnvAXxx5fcgQvSuaf3FwF7zUeUNe8qqMDivik45VL069EVzEJG9UmL0dRq5z64a X-Received: by 2002:a17:906:960a:: with SMTP id s10mr18935232ejx.60.1596506355306; Mon, 03 Aug 2020 18:59:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596506355; cv=none; d=google.com; s=arc-20160816; b=Fgh/8+vyCIrT6Xg9l+ZV9yVj/ozMbMmWRWAvleXi0Phn7ywhM4k4Wg/gyMAeJe41nE ZmZ3bLolrLrhom7zJ7JJKMA9hmxnVxyynUDCJbuCrk+gIGvLcgd7rxG+TY6IKUF6KsHm xUhRDs+1pJkHTIdm7GExRyijrw9He/M8A4qoSMBIwd+tNa0GWAnr/ogw1C0ROgrfJ/WQ vEQwilWWpw/YIN0dgxixtKf8shBnaQ2PNXsd/6KzclD0/J0evKl/QLvBv/usrUblvnDi 94Z5UeFCU0osZS0PTcCzqOAtxpn+SGpTXKqZBRU0j5GU4980iPmsA6uE7vBsjPX3jMEe nTKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:date:from :dkim-signature; bh=Ah+PGP8o/FyvN4G1VIxPhld3lHPU5Vf3p/CKy9L7JHU=; b=rTZzxbGCw7xZsZSTe4G3ddm5PQrIA0ndAqKa3yTTxiLPadjnS561PUx6Eio5xBfKwS 4c6GwP5WQcJQov91f5uAOG+/Ja4JCqWfI6E8WrR2xVCW5s696EQCbpiLMq6Djwg3Cc/A noNIu+DZxNf1Vj72TpaDPF4e1CaKOWvcXxfVzV7mu/jnWvOUa0tlwEazC4lI2HpQEm/H 8Zcwvj6tYB7yNdRG6ph17bcR4CSBhW6XWhNpxYbz46MmZGcKdz2NMt1zcrp677gPgBSP seIzfHnZBxAHSDIBmAjmvlVegzZv+tOSNEvJccB1HsCfnP4sNd1PIpuIO8ZwHq+VhLdp xcBw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b="rqh/K0Z8"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id x9si11552669edj.121.2020.08.03.18.58.53; Mon, 03 Aug 2020 18:59:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b="rqh/K0Z8"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726823AbgHDB46 (ORCPT + 99 others); Mon, 3 Aug 2020 21:56:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726276AbgHDB46 (ORCPT ); Mon, 3 Aug 2020 21:56:58 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57C71C06174A for ; Mon, 3 Aug 2020 18:56:58 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id b25so29836362qto.2 for ; Mon, 03 Aug 2020 18:56:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:date:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Ah+PGP8o/FyvN4G1VIxPhld3lHPU5Vf3p/CKy9L7JHU=; b=rqh/K0Z8o4cRv/wOXznVBNzevV2V7Bh/f9d4Lhgj36/+jmZgclumX4v86fggvH70lh MPHYaNd+ARKYE81oPUXg3zpmFjoIGL8esu+97xyTg1VVsBLLS+DYcz4cNM8jbtxM0yMW Uarj6gDquGsCwurCHH4fGkU/vlDrIn0MfdA5bSTvi0yXVAOOOEimkTNJGt5PA8Q2B+c+ TdQ50aaVMqc+WUmnoeECBIKuv4kcWFI9iCWIWQEV3C2E4Zo3xyVjd/9XH1WYaipgfwlJ iiDnNZDKY0d6WlSdlXd4maNP6qYcMpcN2dk0QMxBNA11TzC3eso/XHIWtKbTxzMr/n1E X0Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:from:date:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=Ah+PGP8o/FyvN4G1VIxPhld3lHPU5Vf3p/CKy9L7JHU=; b=oz2M6uYxybzkYEjQ1y1U+8X3KN6iRiJJAl4I7IvsNepIDx+y1t+sWy3+Gm7mwHVhg4 +J7HixMIZQxWYhNZx6MPbon9I9muiIDcYAwcK3RaFl92x+BhYqkeeNDPXCfZdLzYcfCI oTDR0d3F5QgrpMLadD0bVKrUwXIYyolVF0+l6W7aHhbA1rzfeKztSzNq/B2DvvqLBRNv zF8A6fuJkzwfwbeAE8jfpyoBTlpPKdyV+DyMn+d0Cub0zoSMKshEtvSNT4hLEXylSicA hKVpnw2JQntMQFj9inbLH4r8V00z1laT71S3mTIbRbROVQFBeSXPeYXbGyTvd0FmWtSh Mw5w== X-Gm-Message-State: AOAM5320mnfOZNpfToZgByxS6GRyYeJ05mKPw7jLM4TB4pvzR8kSAWwQ pZQgF6dJgprN3n2CgMtUj7g= X-Received: by 2002:ac8:1e0e:: with SMTP id n14mr19245064qtl.109.1596506217315; Mon, 03 Aug 2020 18:56:57 -0700 (PDT) Received: from rani.riverdale.lan ([2001:470:1f07:5f3::b55f]) by smtp.gmail.com with ESMTPSA id y3sm24273096qtj.55.2020.08.03.18.56.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Aug 2020 18:56:56 -0700 (PDT) From: Arvind Sankar X-Google-Original-From: Arvind Sankar Date: Mon, 3 Aug 2020 21:56:54 -0400 To: Nick Terrell Cc: Arvind Sankar , Nick Terrell , Ingo Molnar , Linux Kernel Mailing List , X86 ML , Kernel Team , Yann Collet , Gao Xiang , Sven Schmidt <4sschmid@informatik.uni-hamburg.de>, Andrew Morton , Greg Kroah-Hartman , Linus Torvalds Subject: Re: [PATCH] lz4: Fix kernel decompression speed Message-ID: <20200804015654.GA1943218@rani.riverdale.lan> References: <20200803194022.2966806-1-nickrterrell@gmail.com> <20200803215747.GA1644409@rani.riverdale.lan> <3961E1BD-8F58-4240-A3B3-B7032A405B42@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <3961E1BD-8F58-4240-A3B3-B7032A405B42@fb.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 03, 2020 at 10:55:01PM +0000, Nick Terrell wrote: > > > > On Aug 3, 2020, at 2:57 PM, Arvind Sankar wrote: > > > > On Mon, Aug 03, 2020 at 12:40:22PM -0700, Nick Terrell wrote: > >> From: Nick Terrell > >> > >> This patch replaces all memcpy() calls with LZ4_memcpy() which calls > >> __builtin_memcpy() so the compiler can inline it. > >> > >> LZ4 relies heavily on memcpy() with a constant size being inlined. In > >> x86 and i386 pre-boot environments memcpy() cannot be inlined because > >> memcpy() doesn't get defined as __builtin_memcpy(). > >> > >> An equivalent patch has been applied upstream so that the next import > >> won't lose this change [1]. > >> > >> I've measured the kernel decompression speed using QEMU before and after > >> this patch for the x86_64 and i386 architectures. The speed-up is about > >> 10x as shown below. > >> > >> Code Arch Kernel Size Time Speed > >> v5.8 x86_64 11504832 B 148 ms 79 MB/s > >> patch x86_64 11503872 B 13 ms 885 MB/s > >> v5.8 i386 9621216 B 91 ms 106 MB/s > >> patch i386 9620224 B 10 ms 962 MB/s > >> > >> I also measured the time to decompress the initramfs on x86_64, i386, > >> and arm. All three show the same decompression speed before and after, > >> as expected. > >> > >> [1] https://github.com/lz4/lz4/pull/890 > >> > > > > Hi Nick, would you be able to test the below patch's performance to > > verify it gives the same speedup? It removes the #undef in misc.c which > > causes the decompressors to not use the builtin version. It should be > > equivalent to yours except for applying it to all the decompressors. > > > > Thanks. > > I will measure it. I would expect it to provide the same speed up. It would be great to fix > the problem for x86/i386 in general. Thanks. I tried using RDTSC to get some timings under QEMU, and I get similar speedup as you have for LZ4, and around 15-20% or so for ZSTD (on 64-bit) -- I see that ZSTD_copy8 is already using __builtin_memcpy, but there must be more that can be optimized? There's a couple 1/2-byte sized copies in huf_decompress.c. For the rest of the algos there seems to be no difference, likely because some casual grepping doesn't show any constant-size copies in the other algorithms.