Subject: Re: [PATCH 0/6] lib/lzo: performance improvements
From: "Markus F.X.J. Oberhumer"
Organization: oberhumer.com
To: Dave Rodgman, "linux-kernel@vger.kernel.org"
Cc: nd, "herbert@gondor.apana.org.au", "davem@davemloft.net", Matt Sealey, "nitingupta910@gmail.com", "rpurdie@openedhand.com", "minchan@kernel.org", "sergey.senozhatsky.work@gmail.com", Sonny Rao
Date: Wed, 21 Nov 2018 14:44:49 +0100
Message-ID: <5BF56151.5090201@oberhumer.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Dave,

thanks for your patch set. Just some initial comments:

I think the three patches

  [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm
  [PATCH 3/6] lib/lzo: 64-bit CTZ on Arm aarch64
  [PATCH 4/6] lib/lzo: fast 8-byte copy on arm64

should be applied in any case - could you please make an extra pull request out of these and try to get them merged as fast as possible. Thanks.

The first patch

  [PATCH 1/6] lib/lzo: clean-up by introducing COPY16

does not really affect the resulting code at the moment, but please note that in one case the actual copy unit is not allowed to be greater than 8 bytes (which the name "COPY16" might suggest it is). So this needs more work, like an extra COPY16_BY_8() macro.

As for your "lzo-rle" improvements, I'll have a look.

Please note that the first byte value 17 is actually valid when using external dictionaries ("lzo1x_decompress_dict_safe()" in the LZO source code). While this functionality is not present in the Linux kernel at the moment, it might be worrisome wrt future enhancements.

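A minimal sketch of why the copy unit can matter, written in Python rather than kernel C. It assumes the restricted case is a match copy whose source and destination overlap at a distance of 8 to 15 bytes; the message does not name the case, so that is an assumption. The function names are hypothetical, and each "unit" models one wide load followed by one wide store.

```python
def copy_in_units(buf: bytearray, dst: int, src: int, length: int, unit: int) -> None:
    """Copy `length` bytes from buf[src:] to buf[dst:] in chunks of `unit` bytes.

    Each chunk is read completely before it is written, which models a
    single wide load followed by a single wide store.
    `length` is assumed to be a multiple of `unit`.
    """
    for off in range(0, length, unit):
        chunk = bytes(buf[src + off:src + off + unit])   # wide load
        buf[dst + off:dst + off + unit] = chunk          # wide store


def reference_copy(buf: bytearray, dst: int, src: int, length: int) -> None:
    """Byte-by-byte forward copy: the semantics an LZ-style match copy needs."""
    for i in range(length):
        buf[dst + i] = buf[src + i]


def demo(unit: int) -> bytes:
    # 8 bytes of history followed by 16 bytes of not-yet-written output.
    # The match copies 16 bytes from distance 8, so source and destination overlap.
    buf = bytearray(b"ABCDEFGH" + bytes(16))
    copy_in_units(buf, dst=8, src=0, length=16, unit=unit)
    return bytes(buf)


if __name__ == "__main__":
    print(demo(8))    # two 8-byte units: second unit re-reads what the first wrote
    print(demo(16))   # one 16-byte unit: reads 8 bytes that were not written yet
```

With two 8-byte units the second load sees the bytes stored by the first, so the pattern repeats correctly; a single 16-byte unit loads the second half before it has been produced. A COPY16_BY_8()-style helper would keep the first behaviour while still moving 16 bytes per call.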
Finally, I'm wondering whether your chart comparisons just compare the "lzo-rle" patch or also include the ARM64 improvements - I cannot understand where a 20% speedup should come from if you have 0% zeros.

Cheers,
Markus

On 2018-11-21 13:06, Dave Rodgman wrote:
> This patch series introduces performance improvements for lzo.
>
> The improvements fall into two categories: general Arm-specific optimisations
> (e.g., more efficient memory access); and the introduction of a special case
> for handling runs of zeros (which is a common case for zram) using run-length
> encoding.
>
> The introduction of RLE modifies the bitstream such that it can't be decoded
> by old versions of lzo (the new lzo-rle can correctly decode old bitstreams).
> To avoid possible issues where data is persisted on disk (e.g., squashfs), the
> final patch in this series separates lzo-rle into a separate algorithm
> alongside lzo, so that the new lzo-rle is (by default) only used for zram and
> must be explicitly selected for other use-cases. This final patch could be
> omitted if the consensus is that we'd rather avoid proliferation of lzo
> variants.
>
> Overall, performance is improved by around 1.1 - 4.8x (data-dependent: data
> with many zero runs shows higher improvement). Under real-world testing with
> zram, time spent in (de)compression during swapping is reduced by around 27%.
> The graph below shows the weighted round-trip throughput of lzo, lz4 and
> lzo-rle, for randomly generated 4k chunks of data with varying levels of
> entropy. (To calculate weighted round-trip throughput, compression performance
> is emphasised to reflect the fact that zram does around 2.25x more compression
> than decompression. Results and overall trends are fairly similar for
> unweighted.)
>
> https://drive.google.com/file/d/18GU4pgRVCLNN7wXxynz-8R2ygrY2IdyE/view
>
> Contributors:
> Dave Rodgman
> Matt Sealey
>

--
Markus Oberhumer, , http://www.oberhumer.com/
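
The quoted cover letter mentions a weighted round-trip throughput but does not give the formula. One plausible reading, sketched below, is a weighted harmonic mean in which compression counts 2.25 times as much as decompression. The formula, the function name and the sample figures are all assumptions made for illustration, not values taken from the patch series or its graph.

```python
def weighted_round_trip_throughput(comp_mb_s: float,
                                   decomp_mb_s: float,
                                   comp_weight: float = 2.25) -> float:
    """Weighted harmonic mean of compression and decompression throughput.

    Assumption: for every 1 MB decompressed, `comp_weight` MB are compressed,
    so the result is total data handled divided by total time spent.
    """
    total_mb = comp_weight + 1.0
    total_seconds = comp_weight / comp_mb_s + 1.0 / decomp_mb_s
    return total_mb / total_seconds


if __name__ == "__main__":
    # Hypothetical throughputs in MB/s, chosen only to show the effect of the weight.
    weighted = weighted_round_trip_throughput(100.0, 400.0)
    unweighted = weighted_round_trip_throughput(100.0, 400.0, comp_weight=1.0)
    print(f"weighted:   {weighted:.1f} MB/s")
    print(f"unweighted: {unweighted:.1f} MB/s")
```

Under this reading, the slower direction (usually compression) dominates the weighted figure more strongly than it does the unweighted one.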