From: Yu-Jen Chang
To: ak@linux.intel.com, jdike@linux.intel.com
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	linux-hardening@vger.kernel.org, richard@nod.at,
	anton.ivanov@cambridgegreys.com, johannes@sipsolutions.net,
	linux-um@lists.infradead.org, jserv@ccns.ncku.edu.tw, Yu-Jen Chang
Subject: [PATCH 1/2] x86/lib: Optimize memchr()
Date: Sat, 28 May 2022 16:12:35 +0800
Message-Id: <20220528081236.3020-2-arthurchang09@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
 <20220528081236.3020-1-arthurchang09@gmail.com>
References: <20220528081236.3020-1-arthurchang09@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The original assembly version of memchr() compares one byte at a time,
which does not make full use of the 64-bit registers on x86_64 CPUs. We
use word-wise comparison instead, so that 8 characters are compared at
once on x86_64.

First, we align the input and then use word-wise comparison to find the
first 64-bit word that contains the target byte. Second, we compare
every byte in that word to get the result.

We created two files to measure performance. In the first file, the
target character is preceded by 10 characters on average. In the second
file, it is preceded by at least 1000 characters. Our implementation of
memchr() is slightly faster in the first test and nearly 4x faster than
the original implementation in the second test.
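
For readers who want to try the idea outside the kernel, the two phases
described above can be sketched in portable userspace C. This is only an
illustration under stated assumptions, not the code in this patch: the
names broadcast_byte(), word_has_byte(), memchr_wordwise() and
memchr_wordwise_selfcheck() are hypothetical, the word loop is plain C
rather than inline assembly, and the word is loaded with memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all 8 lanes: 'a' -> 0x6161616161616161. */
static uint64_t broadcast_byte(unsigned char c)
{
	return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/*
 * Non-zero if any byte lane of @word equals the byte replicated in @mask.
 * The XOR turns matching lanes into 0x00; (x + consta) & ~x & constb is
 * then non-zero exactly when x contains a zero lane.
 */
static uint64_t word_has_byte(uint64_t word, uint64_t mask)
{
	const uint64_t consta = UINT64_C(0xFEFEFEFEFEFEFEFF);
	const uint64_t constb = UINT64_C(0x8080808080808080);
	uint64_t x = word ^ mask;

	return (x + consta) & ~x & constb;
}

/* Hypothetical userspace counterpart of the routine described above. */
static void *memchr_wordwise(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	const unsigned char d = (unsigned char)c;
	const uint64_t mask = broadcast_byte(d);

	/* Byte-wise prologue until p is 8-byte aligned. */
	while (n && ((uintptr_t)p % sizeof(uint64_t)) != 0) {
		if (*p == d)
			return (void *)p;
		p++;
		n--;
	}

	/* Word loop: skip 8 bytes at a time while no lane matches. */
	while (n >= sizeof(uint64_t)) {
		uint64_t word;

		memcpy(&word, p, sizeof(word));
		if (word_has_byte(word, mask))
			break;
		p += sizeof(word);
		n -= sizeof(word);
	}

	/* Byte-wise tail: locates the match inside the flagged word. */
	while (n--) {
		if (*p == d)
			return (void *)p;
		p++;
	}
	return NULL;
}

/* Compare against libc memchr(); returns 0 when every case agrees. */
static int memchr_wordwise_selfcheck(void)
{
	unsigned char buf[64];
	size_t i, off, len;
	int c;

	assert(broadcast_byte('a') == UINT64_C(0x6161616161616161));

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (unsigned char)(i + 1);	/* distinct values 1..64 */

	for (off = 0; off < 9; off++)
		for (len = 0; off + len <= sizeof(buf); len++)
			for (c = 0; c <= 65; c++)
				if (memchr_wordwise(buf + off, c, len) !=
				    memchr(buf + off, c, len))
					return -1;
	return 0;
}
```

Calling memchr_wordwise_selfcheck() from a small test program exercises
unaligned starts, short lengths and absent bytes. The sketch keeps the
byte-wise tail so that the word test only has to answer "is the byte
somewhere in this word", which the formula does without false positives.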

Signed-off-by: Yu-Jen Chang
Signed-off-by: Ching-Chun (Jim) Huang
---
 arch/x86/include/asm/string_64.h |  3 ++
 arch/x86/lib/Makefile            |  1 +
 arch/x86/lib/string_64.c         | 78 ++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 arch/x86/lib/string_64.c

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 6e450827f..edce657e0 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -14,6 +14,9 @@
 extern void *memcpy(void *to, const void *from, size_t len);
 extern void *__memcpy(void *to, const void *from, size_t len);
 
+#define __HAVE_ARCH_MEMCHR
+extern void *memchr(const void *cs, int c, size_t length);
+
 #define __HAVE_ARCH_MEMSET
 void *memset(void *s, int c, size_t n);
 void *__memset(void *s, int c, size_t n);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f76747862..4d530e559 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -69,5 +69,6 @@ else
         lib-y += clear_page_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
         lib-y += copy_user_64.o
+        lib-y += string_64.o
         lib-y += cmpxchg16b_emu.o
 endif
diff --git a/arch/x86/lib/string_64.c b/arch/x86/lib/string_64.c
new file mode 100644
index 000000000..4e067d5be
--- /dev/null
+++ b/arch/x86/lib/string_64.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+
+/* How many bytes are loaded each iteration of the word compare loop */
+#define LBLOCKSIZE (sizeof(long))
+
+#ifdef __HAVE_ARCH_MEMCHR
+
+void *memchr(const void *cs, int c, size_t length)
+{
+	const unsigned char *src = (const unsigned char *)cs, d = c;
+
+	while (!IS_ALIGNED((long)src, sizeof(long))) {
+		if (!length--)
+			return NULL;
+		if (*src == d)
+			return (void *)src;
+		src++;
+	}
+	if (length >= LBLOCKSIZE) {
+		unsigned long mask = d << 8 | d;
+		unsigned int i = 32;
+		long xor, data;
+		const long consta = 0xFEFEFEFEFEFEFEFF,
+			   constb = 0x8080808080808080;
+
+		/*
+		 * Create an 8-byte mask for word-wise comparison.
+		 * For example, a mask for 'a' is 0x6161616161616161.
+		 */
+
+		mask |= mask << 16;
+		for (i = 32; i < LBLOCKSIZE * 8; i <<= 1)
+			mask |= mask << i;
+		/*
+		 * We perform the word-wise comparison as follows:
+		 * 1. XOR the long word at @src with @mask
+		 *    and put the result into @xor.
+		 * 2. Add @consta to @xor.
+		 * 3. Compute ~@xor & @constb.
+		 * 4. AND the results of steps 2 and 3.
+		 *
+		 * Step 1 produces a 0 byte in the long word if
+		 * there is at least one target byte in it.
+		 *
+		 * Steps 2 to 4 detect whether the long word
+		 * contains a 0 byte.
+		 */
+		asm volatile("1:\n\t"
+			     "movq (%0),%1\n\t"
+			     "xorq %6,%1\n\t"
+			     "lea (%1,%4), %2\n\t"
+			     "notq %1\n\t"
+			     "andq %5,%1\n\t"
+			     "testq %1,%2\n\t"
+			     "jne 2f\n\t"
+			     "add $8,%0\n\t"
+			     "sub $8,%3\n\t"
+			     "cmp $7,%3\n\t"
+			     "ja 1b\n\t"
+			     "2:\n\t"
+			     : "=D"(src), "=r"(xor), "=r"(data), "=r"(length)
+			     : "r"(consta), "r"(constb), "r"(mask), "0"(src),
+			       "1"(xor), "2"(data), "3"(length)
+			     : "memory", "cc");
+	}
+
+	while (length--) {
+		if (*src == d)
+			return (void *)src;
+		src++;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(memchr);
+#endif
-- 
2.25.1