From: Dave Rodgman
To: "linux-kernel@vger.kernel.org", "akpm@linux-foundation.org"
CC: "herbert@gondor.apana.org.au", "davem@davemloft.net", Matt Sealey,
 "nitingupta910@gmail.com", "markus@oberhumer.com", "minchan@kernel.org",
 "sergey.senozhatsky.work@gmail.com", "sonnyrao@google.com",
 "gregkh@linuxfoundation.org", nd, "sfr@canb.auug.org.au"
Subject: [PATCH 2/8] lib/lzo: clean-up by introducing COPY16
Date: Fri, 30 Nov 2018 14:26:23 +0000
Message-ID: <20181130142600.13782-3-dave.rodgman@arm.com>
References: <20181130142600.13782-1-dave.rodgman@arm.com>
In-Reply-To: <20181130142600.13782-1-dave.rodgman@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Matt Sealey

Most compilers should be able to merge adjacent loads/stores of sizes
which are smaller than, but add up to, a
multiple of a machine word size (in effect a memcpy() of a constant
amount). However, the semantics of the macro are that it only does the
copy; the pointer increment is in the calling code, hence we see

	*a = *b
	a += 8
	b += 8
	*a = *b
	a += 8
	b += 8

This introduces a dependency between the two groups of statements, which
seems to defeat the compiler's optimizer and makes it generate some very
strange sequences of addition and subtraction of address offsets (i.e.
the generated code is overcomplicated).

Since COPY8 is only ever used to copy amounts of 16 bytes (in pairs),
just define COPY16 as COPY8,COPY8. We keep the COPY8 definition, to
preserve the unaligned accesses to machine-sized words that the original
code intends; we just don't use it in the code proper.

COPY16 then gives us code like:

	*a = *b
	*(a+8) = *(b+8)
	a += 16
	b += 16

This seems to allow compilers to generate much better code by using base
register writeback or simply positively incrementing offsets, which
seems to improve performance. It is, at least, fewer instructions to do
the same job.

Link: http://lkml.kernel.org/r/20181127161913.23863-3-dave.rodgman@arm.com
Signed-off-by: Matt Sealey
Signed-off-by: Dave Rodgman
Cc: David S. Miller
Cc: Greg Kroah-Hartman
Cc: Herbert Xu
Cc: Markus F.X.J. Oberhumer
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Richard Purdie
Cc: Sergey Senozhatsky
Cc: Sonny Rao
Signed-off-by: Andrew Morton
Signed-off-by: Stephen Rothwell
---
 lib/lzo/lzo1x_compress.c        |  9 +++------
 lib/lzo/lzo1x_decompress_safe.c | 18 ++++++------------
 lib/lzo/lzodefs.h               |  3 +++
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c
index 236eb21167b5..82fb5571ce5e 100644
--- a/lib/lzo/lzo1x_compress.c
+++ b/lib/lzo/lzo1x_compress.c
@@ -60,8 +60,7 @@ lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
 				op += t;
 			} else if (t <= 16) {
 				*op++ = (t - 3);
-				COPY8(op, ii);
-				COPY8(op + 8, ii + 8);
+				COPY16(op, ii);
 				op += t;
 			} else {
 				if (t <= 18) {
@@ -76,8 +75,7 @@ lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
 					*op++ = tt;
 				}
 				do {
-					COPY8(op, ii);
-					COPY8(op + 8, ii + 8);
+					COPY16(op, ii);
 					op += 16;
 					ii += 16;
 					t -= 16;
@@ -255,8 +253,7 @@ int lzo1x_1_compress(const unsigned char *in, size_t in_len,
 			*op++ = tt;
 		}
 		if (t >= 16) do {
-			COPY8(op, ii);
-			COPY8(op + 8, ii + 8);
+			COPY16(op, ii);
 			op += 16;
 			ii += 16;
 			t -= 16;
diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index a1c387f6afba..aa95d3066b7d 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -86,12 +86,9 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 					const unsigned char *ie = ip + t;
 					unsigned char *oe = op + t;
 					do {
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
-						COPY8(op, ip);
-						op += 8;
-						ip += 8;
+						COPY16(op, ip);
+						op += 16;
+						ip += 16;
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -187,12 +184,9 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
 			unsigned char *oe = op + t;
 			if (likely(HAVE_OP(t + 15))) {
 				do {
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
-					COPY8(op, m_pos);
-					op += 8;
-					m_pos += 8;
+					COPY16(op, m_pos);
+					op += 16;
+					m_pos += 16;
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 497f9c9f03a8..e1b3cf6459a9 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -23,6 +23,9 @@
 		COPY4(dst, src); COPY4((dst) + 4, (src) + 4)
 #endif
 
+#define COPY16(dst, src) \
+	do { COPY8(dst, src); COPY8((dst) + 8, (src) + 8); } while (0)
+
 #if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
 #error "conflicting endian definitions"
 #elif defined(CONFIG_X86_64)
-- 
2.17.1
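
For readers who want to compare the two copy patterns from the commit
message outside the kernel, here is a minimal user-space sketch. It is
not the kernel code: COPY8 is approximated with a constant-size memcpy()
as a stand-in for the kernel's unaligned-access helpers, and the function
names copy_run_copy8() and copy_run_copy16() are hypothetical, chosen
only for this illustration. Both functions round the copy up to a
multiple of 16 bytes, so the caller is assumed to provide up to 15 bytes
of slack in both buffers, mirroring the HAVE_IP(t + 15) / HAVE_OP(t + 15)
checks around the loops in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: memcpy() of a constant 8 bytes stands in for the kernel's
 * unaligned load/store helpers, which are not available in user space.
 */
#define COPY8(dst, src)  memcpy((dst), (src), 8)

/* Same shape as the COPY16 definition added to lzodefs.h by the patch. */
#define COPY16(dst, src) \
	do { COPY8(dst, src); COPY8((dst) + 8, (src) + 8); } while (0)

/*
 * Old pattern: the pointers are incremented between the two 8-byte
 * copies, so the second copy depends on the updated pointers.
 * Copies t bytes rounded up to a multiple of 16 (hypothetical helper).
 */
static void copy_run_copy8(unsigned char *op, const unsigned char *ip,
			   size_t t)
{
	const unsigned char *ie = ip + t;

	assert(t > 0);	/* the do/while body always runs at least once */
	do {
		COPY8(op, ip);
		op += 8;
		ip += 8;
		COPY8(op, ip);
		op += 8;
		ip += 8;
	} while (ip < ie);
}

/*
 * New pattern: two copies at constant offsets 0 and 8 from the same
 * base pointers, followed by a single increment of 16.
 */
static void copy_run_copy16(unsigned char *op, const unsigned char *ip,
			    size_t t)
{
	const unsigned char *ie = ip + t;

	assert(t > 0);
	do {
		COPY16(op, ip);
		op += 16;
		ip += 16;
	} while (ip < ie);
}
```

Both helpers write the same bytes; the only difference is how the
addresses are formed. Compiling the two with optimization enabled and
comparing the generated assembly for a given target is one way to check
whether the effect described in the commit message shows up with a
particular compiler.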