Received: by 2002:ac0:950e:0:0:0:0:0 with SMTP id f14csp173265imc; Fri, 15 Mar 2019 20:52:38 -0700 (PDT) X-Google-Smtp-Source: APXvYqwFCOVicpFkHhiV3/jgceEqm0w0qFrbJ8x1vo8T+i4gttU6AWQnm/BuT5xjlDGKfJoyXk3u X-Received: by 2002:a17:902:b196:: with SMTP id s22mr6134906plr.33.1552708358560; Fri, 15 Mar 2019 20:52:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1552708358; cv=none; d=google.com; s=arc-20160816; b=GRxlBCT5vECXx8PCikh4ESGdAaGXC6dG366wv+XqQlV+uvoLxd1dTxpqIZRpoWCy+V 7SiUxjPpBuTvtxa/mgmljFu9GmWdnahqduidcc0B3Q7K3Hnx7r/GVY1mfr9Z48kNkro6 6a1VBLxHNYQs9fbbdryC6ia0OuW8h0N3otCqW9HsiuW+TBi9tRMtO+2mnVmbt+5lZz2Q K2yk8lalwq4Fn4JTTmsKAS+9sEBu087IFzwCuEiz2a0rBBa51vxqnpL8wwLPzBuv4DVN c8j0zKRPTMNojzqvGqTU0hFUhlQ/sgZFeg1zbIgDmv2w7l7rk0R5pTCPlFniK2U5+FLS JX3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:date:from:references :in-reply-to:message-id; bh=4LlbMWW8l7ynska9XsC7zp7gqsH7cFv//DsF3lkihb8=; b=ybxnWiS0DoCXl4KUmpHKQDMRGv31kZocSm0sYSVB4S1yT1yysTmHJgLjhAiKx4pNQr S4SrdscYjFN4n7z9D5eF6m3V5zPQqlDvn7N0zB3wirjJsjjkzbLF+LVBjNasMRzVCKZZ OIiVam/Q7hTYB5b/rTtTEuKE2E3I+qDsCezLBfFQMGFvk9+n3dtjRiu/l9ewusbGTK51 NU3MHb/rTJHriCkhKWbD7Utlyg5AKrsSARdAxz1F80DivrwEKrV5uvME2WHvXY9rPFxa yCJ69nWJ2Sn0WBAWV1FUIiIwJ/bUMCO/cS87/22fGnv2+ve2n2JvOB7+EhHKnwxJTTrt VwXw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s74si1850669pfs.24.2019.03.15.20.52.24; Fri, 15 Mar 2019 20:52:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726959AbfCPDvn (ORCPT + 99 others); Fri, 15 Mar 2019 23:51:43 -0400 Received: from mx.sdf.org ([205.166.94.20]:60526 "EHLO mx.sdf.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726204AbfCPDvm (ORCPT ); Fri, 15 Mar 2019 23:51:42 -0400 Received: from sdf.org (IDENT:lkml@sdf.lonestar.org [205.166.94.16]) by mx.sdf.org (8.15.2/8.14.5) with ESMTPS id x2G3pHtl018228 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256 bits) verified NO); Sat, 16 Mar 2019 03:51:17 GMT Received: (from lkml@localhost) by sdf.org (8.15.2/8.12.8/Submit) id x2G3pHq1016177; Sat, 16 Mar 2019 03:51:17 GMT Message-Id: In-Reply-To: References: From: George Spelvin Date: Thu, 21 Feb 2019 06:30:28 +0000 Subject: [PATCH v2 1/5] lib/sort: Make swap functions more generic To: linux-kernel@vger.kernel.org, kernel-janitors@vger.kernel.org, Andrew Morton Cc: George Spelvin , Andrey Abramov , Geert Uytterhoeven , Daniel Wagner , Rasmus Villemoes , Don Mullis , Dave Chinner , Andy Shevchenko Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Rather than having special-case swap functions for 4- and 8-byte objects, special-case aligned multiples of 4 or 8 bytes. This speeds up most users of sort() by avoiding fallback to the byte copy loop. Despite what commit ca96ab859ab4 ("lib/sort: Add 64 bit swap function") claims, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) The functions also got renamed to reflect the fact that they support multiple words. In the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Andy Shevchenko Feedback-from: Rasmus Villemoes Feedback-from: Geert Uytterhoeven --- lib/sort.c | 135 +++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 105 insertions(+), 30 deletions(-) diff --git a/lib/sort.c b/lib/sort.c index d6b7a202b0b6..ec79eac85e21 100644 --- a/lib/sort.c +++ b/lib/sort.c @@ -11,35 +11,108 @@ #include #include -static int alignment_ok(const void *base, int align) +/** + * is_aligned - is this pointer & size okay for word-wide copying? + * @base: pointer to data + * @size: size of each element + * @align: required aignment (typically 4 or 8) + * + * Returns true if elements can be copied using word loads and stores. + * The size must be a multiple of the alignment, and the base address must + * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. + * + * For some reason, gcc doesn't know to optimize "if (a & mask || b & mask)" + * to "if ((a | b) & mask)", so we do that by hand. + */ +__attribute_const__ __always_inline +static bool is_aligned(const void *base, size_t size, unsigned char align) { - return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || - ((unsigned long)base & (align - 1)) == 0; + unsigned char lsbits = (unsigned char)size; + + (void)base; +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + lsbits |= (unsigned char)(uintptr_t)base; +#endif + return (lsbits & (align - 1)) == 0; } -static void u32_swap(void *a, void *b, int size) +/** + * swap_words_32 - swap two elements in 32-bit chunks + * @a, @b: pointers to the elements + * @size: element size (must be a multiple of 4) + * + * Exchange the two objects in memory. This exploits base+index addressing, + * which basically all CPUs have, to minimize loop overhead computations. + * + * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the + * bottom of the loop, even though the zero flag is stil valid from the + * subtract (since the intervening mov instructions don't alter the flags). + * Gcc 8.1.0 doesn't have that problem. + */ +static void swap_words_32(void *a, void *b, int size) { - u32 t = *(u32 *)a; - *(u32 *)a = *(u32 *)b; - *(u32 *)b = t; -} - -static void u64_swap(void *a, void *b, int size) -{ - u64 t = *(u64 *)a; - *(u64 *)a = *(u64 *)b; - *(u64 *)b = t; -} - -static void generic_swap(void *a, void *b, int size) -{ - char t; + size_t n = (unsigned int)size; do { - t = *(char *)a; - *(char *)a++ = *(char *)b; - *(char *)b++ = t; - } while (--size > 0); + u32 t = *(u32 *)(a + (n -= 4)); + *(u32 *)(a + n) = *(u32 *)(b + n); + *(u32 *)(b + n) = t; + } while (n); +} + +/** + * swap_words_64 - swap two elements in 64-bit chunks + * @a, @b: pointers to the elements + * @size: element size (must be a multiple of 8) + * + * Exchange the two objects in memory. This exploits base+index + * addressing, which basically all CPUs have, to minimize loop overhead + * computations. + * + * We'd like to use 64-bit loads if possible. If they're not, emulating + * one requires base+index+4 addressing which x86 has but most other + * processors do not. If CONFIG_64BIT, we definitely have 64-bit loads, + * but it's possible to have 64-bit loads without 64-bit pointers (e.g. + * x32 ABI). Are there any cases the kernel needs to worry about? + */ +static void swap_words_64(void *a, void *b, int size) +{ + size_t n = (unsigned int)size; + + do { +#ifdef CONFIG_64BIT + u64 t = *(u64 *)(a + (n -= 8)); + *(u64 *)(a + n) = *(u64 *)(b + n); + *(u64 *)(b + n) = t; +#else + /* Use two 32-bit transfers to avoid base+index+4 addressing */ + u32 t = *(u32 *)(a + (n -= 4)); + *(u32 *)(a + n) = *(u32 *)(b + n); + *(u32 *)(b + n) = t; + + t = *(u32 *)(a + (n -= 4)); + *(u32 *)(a + n) = *(u32 *)(b + n); + *(u32 *)(b + n) = t; +#endif + } while (n); +} + +/** + * swap_bytes - swap two elements a byte at a time + * @a, @b: pointers to the elements + * @size: element size + * + * This is the fallback if alignment doesn't allow using larger chunks. + */ +static void swap_bytes(void *a, void *b, int size) +{ + size_t n = (unsigned int)size; + + do { + char t = ((char *)a)[--n]; + ((char *)a)[n] = ((char *)b)[n]; + ((char *)b)[n] = t; + } while (n); } /** @@ -50,8 +123,10 @@ static void generic_swap(void *a, void *b, int size) * @cmp_func: pointer to comparison function * @swap_func: pointer to swap function or NULL * - * This function does a heapsort on the given array. You may provide a - * swap_func function optimized to your element type. + * This function does a heapsort on the given array. You may provide + * a swap_func function if you need to do something more than a memory + * copy (e.g. fix up pointers or auxiliary data), but the built-in swap + * isn't usually a bottleneck. * * Sorting time is O(n log n) both on average and worst-case. While * qsort is about 20% faster on average, it suffers from exploitable @@ -67,12 +142,12 @@ void sort(void *base, size_t num, size_t size, int i = (num/2 - 1) * size, n = num * size, c, r; if (!swap_func) { - if (size == 4 && alignment_ok(base, 4)) - swap_func = u32_swap; - else if (size == 8 && alignment_ok(base, 8)) - swap_func = u64_swap; + if (is_aligned(base, size, 8)) + swap_func = swap_words_64; + else if (is_aligned(base, size, 4)) + swap_func = swap_words_32; else - swap_func = generic_swap; + swap_func = swap_bytes; } /* heapify */ -- 2.20.1