Message-ID: <75e3bb4f88fa43097540f3e2023df8388def5719.camel@perches.com>
Subject: Re: [PATCH 2/2] lib/string.c: Optimize memchr()
From: Joe Perches
To: Yu-Jen Chang, andy@kernel.org, akinobu.mita@gmail.com
Cc: jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org
Date: Sun, 10 Jul 2022 08:16:17 -0700
In-Reply-To: <20220710142822.52539-3-arthurchang09@gmail.com>
References: <20220710142822.52539-1-arthurchang09@gmail.com>
 <20220710142822.52539-3-arthurchang09@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 2022-07-10 at 22:28 +0800, Yu-Jen Chang wrote:
> The original version of memchr() is implemented with the byte-wise
> comparing technique, which does not fully use 64-bits or 32-bits
> registers in CPU. We use word-wide comparing so that 8 characters
> can be compared at the same time on CPU. This code is based on
> David Laight's implementation.
>
> We create two files to measure the performance. The first file
> contains on average 10 characters ahead the target character.
> The second file contains at least 1000 characters ahead the
> target character. Our implementation of “memchr()” is slightly
> better in the first test and nearly 4x faster than the original
> implementation in the second test.

It seems you did not test this with 32-bit compilers, as there
are 64-bit constants without a ULL suffix.

> diff --git a/lib/string.c b/lib/string.c
[]
> @@ -905,21 +905,35 @@ EXPORT_SYMBOL(strnstr);
>  #ifndef __HAVE_ARCH_MEMCHR
>  /**
>   * memchr - Find a character in an area of memory.
> - * @s: The memory area
> + * @p: The memory area
>   * @c: The byte to search for
> - * @n: The size of the area.
> + * @length: The size of the area.
>   *
>   * returns the address of the first occurrence of @c, or %NULL
>   * if @c is not found
>   */
> -void *memchr(const void *s, int c, size_t n)
> +void *memchr(const void *p, int c, unsigned long length)
>  {
> -	const unsigned char *p = s;
> -	while (n-- != 0) {
> -		if ((unsigned char)c == *p++) {
> -			return (void *)(p - 1);
> +	u64 mask, val;
> +	const void *end = p + length;
> +
> +	c &= 0xff;
> +	if (p <= end - 8) {
> +		mask = c;
> +		MEMCHR_MASK_GEN(mask);
> +
> +		for (; p <= end - 8; p += 8) {
> +			val = *(u64 *)p ^ mask;
> +			if ((val + 0xfefefefefefefeffu) &
> +			    (~val & 0x8080808080808080u))

here.
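
For reference, the word-wide test in the quoted hunk can be sketched in
userspace C roughly as below. This is an illustration only, not the patch:
the names memchr_wordwise(), has_zero_byte(), ONES and HIGHS are made up
here, the mask is built by multiplication instead of the patch's
MEMCHR_MASK_GEN() (whose definition is not part of this hunk), and memcpy()
stands in for the direct u64 load. The constants are spelled with
UINT64_C() so their width is explicit on every target; in kernel code a
ULL suffix would serve the same purpose. Adding 0xfefefefefefefeff is the
same as subtracting 0x0101010101010101 modulo 2^64, which is the form used
in the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; none of these come from the patch. */
#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of v is 0x00. */
static uint64_t has_zero_byte(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Assumed sketch of a word-wide memchr(), 8 bytes per iteration. */
static void *memchr_wordwise(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	/* Replicate the target byte into all 8 byte lanes. */
	uint64_t mask = ONES * (unsigned char)c;

	while (n >= 8) {
		uint64_t word;

		/* memcpy() avoids alignment and aliasing problems. */
		memcpy(&word, p, sizeof(word));
		/* A byte equal to c becomes 0x00 after the XOR. */
		if (has_zero_byte(word ^ mask))
			break;	/* the match is within these 8 bytes */
		p += 8;
		n -= 8;
	}

	/* Byte-wise scan: pinpoints the match and covers the tail. */
	while (n--) {
		if (*p == (unsigned char)c)
			return (void *)p;
		p++;
	}
	return NULL;
}
```

The sketch falls back to a byte-wise scan once a word is flagged, which
keeps it independent of byte order; the quoted hunk ends before showing
how the patch itself locates the matching byte.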