From: Dave Rodgman
To: "Markus F.X.J. Oberhumer", "linux-kernel@vger.kernel.org"
CC: nd, "herbert@gondor.apana.org.au", "davem@davemloft.net", Matt Sealey,
 "nitingupta910@gmail.com", "rpurdie@openedhand.com", "minchan@kernel.org",
 "sergey.senozhatsky.work@gmail.com", Sonny Rao
Subject: Re: [PATCH 0/6] lib/lzo: performance improvements
Date: Wed, 21 Nov 2018 16:55:01 +0000
Message-ID: <992dd863-0143-38c9-6f6d-7cb1bb6fd15d@arm.com>
References: <5BF56151.5090201@oberhumer.com>
In-Reply-To: <5BF56151.5090201@oberhumer.com>

On 21/11/2018 1:44 pm, Markus F.X.J. Oberhumer wrote:
> I think the three patches
>
>   [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm
>   [PATCH 3/6] lib/lzo: 64-bit CTZ on Arm aarch64
>   [PATCH 4/6] lib/lzo: fast 8-byte copy on arm64
>
> should be applied in any case - could you please make an extra
> pull request out of these and try to get them merged as fast
> as possible. Thanks.

The three patches you mention give around 10-25% performance uplift
(mostly on compression). I'll look at generating a pull request for these.

>   [PATCH 1/6] lib/lzo: clean-up by introducing COPY16
>
> does not really affect the resulting code at the moment, but please
> note that in one case the actual copy unit is not allowed to
> be greater than 8 bytes (which might be implied by the name "COPY16").
> So this needs more work like an extra COPY16_BY_8() macro.

I'll leave Matt to comment on this one, as it's his patch.

> As for your "lzo-rle" improvements I'll have a look.
>
> Please note that the first byte value 17 is actually valid when using
> external dictionaries ("lzo1x_decompress_dict_safe()" in the LZO source
> code). While this functionality is not present in the Linux kernel at
> the moment it might be worrisome wrt future enhancements.

I wasn't aware of the external dictionary concern. Do you have any
suggestions for an alternative instruction that we could use instead
that would not be used by the existing lzo algorithm at the start of the
stream? If there isn't anything suitable, then we'd have to choose
between backwards compatibility (not a huge issue, if lzo-rle were to be
kept as a separate algorithm to lzo, but certainly nice to have) vs.
allowing for the possibility of introducing external dictionaries in future.

> Finally I'm wondering if your chart comparisons just compare the "lzo-rle"
> patch or also include the ARM64 improvements - I cannot understand where a
> 20% speedup should come from if you have 0% zeros.

The chart does indeed include the other improvements, so this is where
the performance uplift on the left hand side of the chart (i.e., random
data) comes from.

Thanks for taking a look at this.

Dave

>
> Cheers,
> Markus
>
>
>
> On 2018-11-21 13:06, Dave Rodgman wrote:
>> This patch series introduces performance improvements for lzo.
>>
>> The improvements fall into two categories: general Arm-specific optimisations
>> (e.g., more efficient memory access); and the introduction of a special case
>> for handling runs of zeros (which is a common case for zram) using run-length
>> encoding.
>>
>> The introduction of RLE modifies the bitstream such that it can't be decoded
>> by old versions of lzo (the new lzo-rle can correctly decode old bitstreams).
>> To avoid possible issues where data is persisted on disk (e.g., squashfs), the
>> final patch in this series separates lzo-rle into a separate algorithm
>> alongside lzo, so that the new lzo-rle is (by default) only used for zram and
>> must be explicitly selected for other use-cases. This final patch could be
>> omitted if the consensus is that we'd rather avoid proliferation of lzo
>> variants.
>>
>> Overall, performance is improved by around 1.1 - 4.8x (data-dependent: data
>> with many zero runs shows higher improvement). Under real-world testing with
>> zram, time spent in (de)compression during swapping is reduced by around 27%.
>> The graph below shows the weighted round-trip throughput of lzo, lz4 and
>> lzo-rle, for randomly generated 4k chunks of data with varying levels of
>> entropy. (To calculate weighted round-trip throughput, compression performance
>> is emphasised to reflect the fact that zram does around 2.25x more compression
>> than decompression. Results and overall trends are fairly similar for
>> unweighted.)
>>
>> https://drive.google.com/file/d/18GU4pgRVCLNN7wXxynz-8R2ygrY2IdyE/view
>>
>> Contributors:
>> Dave Rodgman
>> Matt Sealey
>