From: George Spelvin
Date: Sat, 16 Mar 2019 02:43:19 +0000
Subject: [PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller
To: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org,
    Andrew Morton
Cc: George Spelvin, Andrey Abramov, Geert Uytterhoeven, Daniel Wagner,
    Rasmus Villemoes, Don Mullis, Dave Chinner, Andy Shevchenko

v1->v2: Various spelling, naming and code style cleanups.
        Generally positive and no negative responses to the
        goals and algorithms used.

I'm running these patches, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, on the machine I'm sending this from.
I have tweaked the comments further, but I have verified that the
compiled object code is identical to a snapshot I took when I rebooted.

As far as I'm concerned, this is ready to be merged.  As there is no
owner in MAINTAINERS, I was thinking of sending it via AKPM, like the
recent lib/lzo changes.  Andrew, is that okay with you?

Because CONFIG_RETPOLINE has made indirect calls much more expensive,
I thought I'd try to reduce the number made by the library sort
functions.

The first three patches apply to lib/sort.c.

Patch #1 is a simple optimization.  The built-in swap has special cases
for aligned 4- and 8-byte objects.  But those are almost never used;
most calls to sort() work on larger structures, which fall back to the
byte-at-a-time loop.  This generalizes them to aligned *multiples* of 4
and 8 bytes.  (If nothing else, it saves an awful lot of energy by not
thrashing the store buffers as much.)

Patch #2 grabs a juicy piece of low-hanging fruit.  I agree that
nice simple solid heapsort is preferable to more complex algorithms
(sorry, Andrey), but it's possible to implement heapsort with far fewer
comparisons (a 50% reduction asymptotically, 25-40% for realistic
sizes) than the way it's been done up to now.  And with some care, the
code ends up smaller, as well.  This is the "big win" patch.

Patch #3 adds the same sort of indirect call bypass that has been added
to the net code of late.  The great majority of the callers use the
built-in swap functions, so replace the indirect call to swap_func with
a (highly predictable) series of if() statements.  Rather surprisingly,
this decreased code size, as the swap functions were inlined and their
prologue & epilogue code eliminated.

lib/list_sort.c is a bit trickier, as merge sort is already close to
optimal, and we don't want to introduce triumphs of theory over
practicality like the Ford-Johnson merge-insertion sort.
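To make the lib/sort.c swap changes in patches #1 and #3 concrete, here
is a minimal user-space sketch.  It is not the kernel code: the names
swap_kind, choose_swap() and do_swap() are hypothetical, and using
memcpy() for word access is an illustrative choice.  The idea shown is
that a strategy is picked once from the element size and base alignment,
and each swap then goes through a short chain of predictable branches
instead of a function pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags for the built-in swap strategies (illustrative names). */
enum swap_kind { SWAP_BYTES, SWAP_WORDS_32, SWAP_WORDS_64 };

/*
 * Choose a strategy once per sort: word-wise swapping is usable when both
 * the base address and the element size are multiples of the word size.
 */
static enum swap_kind choose_swap(const void *base, size_t size)
{
	uintptr_t bits = (uintptr_t)base | (uintptr_t)size;

	if ((bits & 7) == 0)
		return SWAP_WORDS_64;
	if ((bits & 3) == 0)
		return SWAP_WORDS_32;
	return SWAP_BYTES;
}

/* Swap n bytes, 8 at a time; n must be a nonzero multiple of 8. */
static void swap_words_64(unsigned char *a, unsigned char *b, size_t n)
{
	do {
		uint64_t t, u;

		n -= 8;
		memcpy(&t, a + n, 8);	/* memcpy sidesteps aliasing rules */
		memcpy(&u, b + n, 8);
		memcpy(a + n, &u, 8);
		memcpy(b + n, &t, 8);
	} while (n);
}

/* Swap n bytes, 4 at a time; n must be a nonzero multiple of 4. */
static void swap_words_32(unsigned char *a, unsigned char *b, size_t n)
{
	do {
		uint32_t t, u;

		n -= 4;
		memcpy(&t, a + n, 4);
		memcpy(&u, b + n, 4);
		memcpy(a + n, &u, 4);
		memcpy(b + n, &t, 4);
	} while (n);
}

/* Byte-at-a-time fallback; n must be nonzero. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t n)
{
	do {
		unsigned char t = a[--n];

		a[n] = b[n];
		b[n] = t;
	} while (n);
}

/* Dispatch with plain branches rather than an indirect call. */
static void do_swap(void *a, void *b, size_t size, enum swap_kind kind)
{
	if (kind == SWAP_WORDS_64)
		swap_words_64(a, b, size);
	else if (kind == SWAP_WORDS_32)
		swap_words_32(a, b, size);
	else
		swap_bytes(a, b, size);
}
```

A caller would compute choose_swap(base, size) once before the sort loop
and pass the result to every do_swap() call.  Because the strategy is a
plain value tested with if(), a compiler is free to inline the helpers
at the call site, which matches the size reduction reported for patch #3.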
Patch #4, without changing the algorithm, chops 32% off the code size
and removes the part[MAX_LIST_LENGTH_BITS+1] pointer array (and the
corresponding upper limit on efficiently sortable input size).

Patch #5 improves the algorithm.  The previous code is already optimal
for power-of-two (or slightly smaller) size inputs, but when the input
size is just over a power of 2, there's a very unbalanced final merge.

There are, in the literature, several algorithms which solve this, but
they all depend on the "breadth-first" merge order which was replaced
by commit 835cc0c8477f with a more cache-friendly "depth-first" order.
Some hard thinking came up with a depth-first algorithm which defers
merges as little as possible while avoiding bad merges.  This saves
0.2*n compares, averaged over all sizes.

The code size increase is minimal (64 bytes on x86-64, reducing the net
savings to 26%), but the comments expanded significantly to document
the clever algorithm.

TESTING NOTES: I have some ugly user-space benchmarking code
which I used for testing before moving this code into the kernel.
Shout if you want a copy.

I'm running this code right now, with CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since
the last round of minor edits to quell checkpatch.  I figure there
will be at least one round of comments and final testing.

George Spelvin (5):
  lib/sort: Make swap functions more generic
  lib/sort: Use more efficient bottom-up heapsort variant
  lib/sort: Avoid indirect calls to built-in swap
  lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS
  lib/list_sort: Optimize number of calls to comparison function

 include/linux/list_sort.h |   1 +
 lib/list_sort.c           | 244 +++++++++++++++++++++++++---------
 lib/sort.c                | 266 +++++++++++++++++++++++++++++---------
 3 files changed, 387 insertions(+), 124 deletions(-)

-- 
2.20.1