From: George Spelvin
Date: Tue, 19 Mar 2019 08:15:57 +0000
Subject: [RESEND PATCH v2 1/5] lib/sort: Make swap functions more generic
To: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org,
	Andrew Morton
Cc: George Spelvin, Andrey Abramov, Geert Uytterhoeven, Daniel Wagner,
	Rasmus Villemoes, Don Mullis, Dave Chinner, Andy Shevchenko

Rather than having special-case swap functions for 4- and 8-byte objects,
special-case aligned multiples of 4 or 8 bytes.  This speeds up most
users of sort() by avoiding fallback to the byte copy loop.
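For illustration only (this snippet is not part of the patch), the new
dispatch rule can be sketched in ordinary userspace C as below.
swap_kind() and buf are invented names; the real test is is_aligned()
in the diff, which ignores the base address when
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Roughly the new dispatch rule: a swap width is usable when both the
 * element size and the base address are multiples of it.
 */
static const char *swap_kind(const void *base, size_t size)
{
	uintptr_t lsbits = size | (uintptr_t)base;

	if ((lsbits & 7) == 0)
		return "64-bit words";
	if ((lsbits & 3) == 0)
		return "32-bit words";
	return "bytes";	/* old code took this path for any size other than 4 or 8 */
}

int main(void)
{
	uint64_t buf[10];	/* 80 bytes, at least 8-byte aligned */

	/* 40-byte elements (e.g. struct acpi_fan_fps): a multiple of 8 */
	printf("40-byte elements: %s\n", swap_kind(buf, 40));
	/* 6-byte elements: not a multiple of 4, byte-at-a-time fallback */
	printf(" 6-byte elements: %s\n", swap_kind(buf, 6));
	return 0;
}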
Despite what commit ca96ab859ab4 ("lib/sort: Add 64 bit swap function")
claims, very few users of sort() sort pointers (or pointer-sized
objects); most sort structures containing at least two words.
(E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte
struct acpi_fan_fps.)

The functions also got renamed to reflect the fact that they support
multiple words.  In the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.

x86-64 code size 872 -> 886 bytes (+14)

Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Feedback-from: Andy Shevchenko
Feedback-from: Rasmus Villemoes
Feedback-from: Geert Uytterhoeven
---
 lib/sort.c | 135 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 105 insertions(+), 30 deletions(-)

diff --git a/lib/sort.c b/lib/sort.c
index d6b7a202b0b6..ec79eac85e21 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -11,35 +11,108 @@
 #include <linux/export.h>
 #include <linux/sort.h>
 
-static int alignment_ok(const void *base, int align)
+/**
+ * is_aligned - is this pointer & size okay for word-wide copying?
+ * @base: pointer to data
+ * @size: size of each element
+ * @align: required alignment (typically 4 or 8)
+ *
+ * Returns true if elements can be copied using word loads and stores.
+ * The size must be a multiple of the alignment, and the base address must
+ * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
+ *
+ * For some reason, gcc doesn't know to optimize "if (a & mask || b & mask)"
+ * to "if ((a | b) & mask)", so we do that by hand.
+ */
+__attribute_const__ __always_inline
+static bool is_aligned(const void *base, size_t size, unsigned char align)
 {
-	return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
-		((unsigned long)base & (align - 1)) == 0;
+	unsigned char lsbits = (unsigned char)size;
+
+	(void)base;
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	lsbits |= (unsigned char)(uintptr_t)base;
+#endif
+	return (lsbits & (align - 1)) == 0;
 }
 
-static void u32_swap(void *a, void *b, int size)
+/**
+ * swap_words_32 - swap two elements in 32-bit chunks
+ * @a, @b: pointers to the elements
+ * @size: element size (must be a multiple of 4)
+ *
+ * Exchange the two objects in memory.  This exploits base+index addressing,
+ * which basically all CPUs have, to minimize loop overhead computations.
+ *
+ * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the
+ * bottom of the loop, even though the zero flag is still valid from the
+ * subtract (since the intervening mov instructions don't alter the flags).
+ * Gcc 8.1.0 doesn't have that problem.
+ */
+static void swap_words_32(void *a, void *b, int size)
 {
-	u32 t = *(u32 *)a;
-	*(u32 *)a = *(u32 *)b;
-	*(u32 *)b = t;
-}
-
-static void u64_swap(void *a, void *b, int size)
-{
-	u64 t = *(u64 *)a;
-	*(u64 *)a = *(u64 *)b;
-	*(u64 *)b = t;
-}
-
-static void generic_swap(void *a, void *b, int size)
-{
-	char t;
+	size_t n = (unsigned int)size;
 
 	do {
-		t = *(char *)a;
-		*(char *)a++ = *(char *)b;
-		*(char *)b++ = t;
-	} while (--size > 0);
+		u32 t = *(u32 *)(a + (n -= 4));
+		*(u32 *)(a + n) = *(u32 *)(b + n);
+		*(u32 *)(b + n) = t;
+	} while (n);
+}
+
+/**
+ * swap_words_64 - swap two elements in 64-bit chunks
+ * @a, @b: pointers to the elements
+ * @size: element size (must be a multiple of 8)
+ *
+ * Exchange the two objects in memory.  This exploits base+index
+ * addressing, which basically all CPUs have, to minimize loop overhead
+ * computations.
+ *
+ * We'd like to use 64-bit loads if possible.  If they're not, emulating
+ * one requires base+index+4 addressing which x86 has but most other
+ * processors do not.  If CONFIG_64BIT, we definitely have 64-bit loads,
+ * but it's possible to have 64-bit loads without 64-bit pointers (e.g.
+ * x32 ABI).  Are there any cases the kernel needs to worry about?
+ */
+static void swap_words_64(void *a, void *b, int size)
+{
+	size_t n = (unsigned int)size;
+
+	do {
+#ifdef CONFIG_64BIT
+		u64 t = *(u64 *)(a + (n -= 8));
+		*(u64 *)(a + n) = *(u64 *)(b + n);
+		*(u64 *)(b + n) = t;
+#else
+		/* Use two 32-bit transfers to avoid base+index+4 addressing */
+		u32 t = *(u32 *)(a + (n -= 4));
+		*(u32 *)(a + n) = *(u32 *)(b + n);
+		*(u32 *)(b + n) = t;
+
+		t = *(u32 *)(a + (n -= 4));
+		*(u32 *)(a + n) = *(u32 *)(b + n);
+		*(u32 *)(b + n) = t;
+#endif
+	} while (n);
+}
+
+/**
+ * swap_bytes - swap two elements a byte at a time
+ * @a, @b: pointers to the elements
+ * @size: element size
+ *
+ * This is the fallback if alignment doesn't allow using larger chunks.
+ */
+static void swap_bytes(void *a, void *b, int size)
+{
+	size_t n = (unsigned int)size;
+
+	do {
+		char t = ((char *)a)[--n];
+		((char *)a)[n] = ((char *)b)[n];
+		((char *)b)[n] = t;
+	} while (n);
 }
 
 /**
@@ -50,8 +123,10 @@ static void generic_swap(void *a, void *b, int size)
  * @cmp_func: pointer to comparison function
  * @swap_func: pointer to swap function or NULL
  *
- * This function does a heapsort on the given array. You may provide a
- * swap_func function optimized to your element type.
+ * This function does a heapsort on the given array.  You may provide
+ * a swap_func function if you need to do something more than a memory
+ * copy (e.g. fix up pointers or auxiliary data), but the built-in swap
+ * isn't usually a bottleneck.
  *
  * Sorting time is O(n log n) both on average and worst-case. While
  * qsort is about 20% faster on average, it suffers from exploitable
@@ -67,12 +142,12 @@ void sort(void *base, size_t num, size_t size,
 	int i = (num/2 - 1) * size, n = num * size, c, r;
 
 	if (!swap_func) {
-		if (size == 4 && alignment_ok(base, 4))
-			swap_func = u32_swap;
-		else if (size == 8 && alignment_ok(base, 8))
-			swap_func = u64_swap;
+		if (is_aligned(base, size, 8))
+			swap_func = swap_words_64;
+		else if (is_aligned(base, size, 4))
+			swap_func = swap_words_32;
 		else
-			swap_func = generic_swap;
+			swap_func = swap_bytes;
 	}
 
 	/* heapify */
-- 
2.20.1
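A usage sketch (not part of the patch): with swap_func == NULL, sort()
now picks the word-wide swap whenever the element size, and on
strict-alignment machines the base address, is a multiple of 8 or 4.
struct foo_range, foo_range_cmp() and sort_foo_ranges() below are
invented names for illustration:

#include <linux/sort.h>
#include <linux/types.h>

struct foo_range {		/* 16 bytes: a multiple of 8 */
	u64 start;
	u64 len;
};

static int foo_range_cmp(const void *a, const void *b)
{
	const struct foo_range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return -1;
	return ra->start > rb->start;
}

static void sort_foo_ranges(struct foo_range *r, size_t n)
{
	/* swap_func == NULL: sort() selects swap_words_64 via is_aligned() */
	sort(r, n, sizeof(*r), foo_range_cmp, NULL);
}

Callers that keep auxiliary pointers into the array still want to pass
their own swap_func, per the updated kernel-doc in the patch.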