Date: Tue, 19 Mar 2019 16:01:45 +0200
From: Andy Shevchenko
To: George Spelvin
Cc: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org,
 Andrew Morton, Andrey Abramov, Geert Uytterhoeven, Daniel Wagner,
 Rasmus Villemoes, Don Mullis, Dave Chinner
Subject: Re: [RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller
Message-ID: <20190319140145.GH9224@smile.fi.intel.com>
Organization: Intel
 Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

On Tue, Mar 19, 2019 at 08:15:56AM +0000, George Spelvin wrote:
> (Resend because the earlier send had GIT_AUTHOR_DATE in the e-mail
> headers and got filed with last month's archives. And probably
> tripped a few spam filters.)

I got both.

> v1->v2: Various spelling, naming and code style cleanups.
> Generally positive and no negative responses to the
> goals and algorithms used.
>
> I'm running these patches, with CONFIG_TEST_SORT and
> CONFIG_TEST_LIST_SORT, on the machine I'm sending this from.
> I have tweaked the comments further, but I have verified
> the compiled object code is identical to a snapshot I took
> when I rebooted.
>
> As far as I'm concerned, this is ready to be merged.
> As there is no owner in MAINTAINERS, I was thinking of
> sending it via AKPM, like the recent lib/lzo changes.
> Andrew, is that okay with you?

Thanks for this improvement. I like it when academic work is used in
real life!

FWIW,
Reviewed-by: Andy Shevchenko

> Because CONFIG_RETPOLINE has made indirect calls much more expensive,
> I thought I'd try to reduce the number made by the library sort
> functions.
>
> The first three patches apply to lib/sort.c.
>
> Patch #1 is a simple optimization. The built-in swap has special cases
> for aligned 4- and 8-byte objects. But those are almost never used;
> most calls to sort() work on larger structures, which fall back to the
> byte-at-a-time loop. This generalizes them to aligned *multiples* of 4
> and 8 bytes. (If nothing else, it saves an awful lot of energy by not
> thrashing the store buffers as much.)
>
> Patch #2 grabs a juicy piece of low-hanging fruit.
> I agree that
> nice simple solid heapsort is preferable to more complex algorithms
> (sorry, Andrey), but it's possible to implement heapsort with far fewer
> comparisons (50% asymptotically, 25-40% reduction for realistic sizes)
> than the way it's been done up to now. And with some care, the code
> ends up smaller, as well. This is the "big win" patch.
>
> Patch #3 adds the same sort of indirect call bypass that has been added
> to the net code of late. The great majority of the callers use the
> built-in swap functions, so replace the indirect call to swap_func with
> a (highly predictable) series of if() statements. Rather surprisingly,
> this decreased code size, as the swap functions were inlined and their
> prologue & epilogue code eliminated.
>
> lib/list_sort.c is a bit trickier, as merge sort is already close to
> optimal, and we don't want to introduce triumphs of theory over
> practicality like the Ford-Johnson merge-insertion sort.
>
> Patch #4, without changing the algorithm, chops 32% off the code size
> and removes the part[MAX_LIST_LENGTH+1] pointer array (and the
> corresponding upper limit on efficiently sortable input size).
>
> Patch #5 improves the algorithm. The previous code is already optimal
> for power-of-two (or slightly smaller) size inputs, but when the input
> size is just over a power of 2, there's a very unbalanced final merge.
>
> There are, in the literature, several algorithms which solve this, but
> they all depend on the "breadth-first" merge order which was replaced
> by commit 835cc0c8477f with a more cache-friendly "depth-first" order.
> Some hard thinking came up with a depth-first algorithm which defers
> merges as little as possible while avoiding bad merges. This saves
> 0.2*n compares, averaged over all sizes.
>
> The code size increase is minimal (64 bytes on x86-64, reducing the net
> savings to 26%), but the comments expanded significantly to document
> the clever algorithm.
>
> TESTING NOTES: I have some ugly user-space benchmarking code
> which I used for testing before moving this code into the kernel.
> Shout if you want a copy.
>
> I'm running this code right now, with CONFIG_TEST_SORT and
> CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since
> the last round of minor edits to quell checkpatch. I figure there
> will be at least one round of comments and final testing.
>
> George Spelvin (5):
>   lib/sort: Make swap functions more generic
>   lib/sort: Use more efficient bottom-up heapsort variant
>   lib/sort: Avoid indirect calls to built-in swap
>   lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS
>   lib/list_sort: Optimize number of calls to comparison function
>
>  include/linux/list_sort.h |   1 +
>  lib/list_sort.c           | 244 +++++++++++++++++++++++++---------
>  lib/sort.c                | 266 +++++++++++++++++++++++++++++---------
>  3 files changed, 387 insertions(+), 124 deletions(-)
>
> --
> 2.20.1

-- 
With Best Regards,
Andy Shevchenko