Date: Tue, 30 Jan 2024 14:07:37 +0200
From: Nick Kossifidis
Subject: Re: [PATCH 3/3] riscv: optimized memset
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Matteo Croce
References: <20240128111013.2450-1-jszhang@kernel.org> <20240128111013.2450-4-jszhang@kernel.org>
In-Reply-To: <20240128111013.2450-4-jszhang@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/28/24 13:10, Jisheng Zhang wrote:
> diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
> index 20677c8067da..022edda68f1c 100644
> --- a/arch/riscv/lib/string.c
> +++ b/arch/riscv/lib/string.c
> @@ -144,3 +144,44 @@ void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmov
>  EXPORT_SYMBOL(memmove);
>  void *__pi_memmove(void *dest, const void *src, size_t count) __alias(__memmove);
>  void *__pi___memmove(void *dest, const void *src, size_t count) __alias(__memmove);
> +
> +void *__memset(void *s, int c, size_t count)
> +{
> +	union types dest = { .as_u8 = s };
> +
> +	if (count >= MIN_THRESHOLD) {
> +		unsigned long cu = (unsigned long)c;
> +
> +		/* Compose an ulong with 'c' repeated 4/8 times */
> +#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
> +		cu *= 0x0101010101010101UL;
> +#else
> +		cu |= cu << 8;
> +		cu |= cu << 16;
> +		/* Suppress warning on 32 bit machines */
> +		cu |= (cu << 16) << 16;
> +#endif

I guess you could check against __SIZEOF_LONG__ here instead of using
the double shift to suppress the warning.

> +		if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
> +			/*
> +			 * Fill the buffer one byte at time until
> +			 * the destination is word aligned.
> +			 */
> +			for (; count && dest.as_uptr & WORD_MASK; count--)
> +				*dest.as_u8++ = c;
> +		}
> +
> +		/* Copy using the largest size allowed */
> +		for (; count >= BYTES_LONG; count -= BYTES_LONG)
> +			*dest.as_ulong++ = cu;
> +	}
> +
> +	/* copy the remainder */
> +	while (count--)
> +		*dest.as_u8++ = c;
> +
> +	return s;
> +}
> +EXPORT_SYMBOL(__memset);

BTW a similar approach could be used for memchr, e.g.:

#if __SIZEOF_LONG__ == 8
#define HAS_ZERO(_x) (((_x) - 0x0101010101010101ULL) & ~(_x) & 0x8080808080808080ULL)
#else
#define HAS_ZERO(_x) (((_x) - 0x01010101UL) & ~(_x) & 0x80808080UL)
#endif

void *
memchr(const void *src_ptr, int c, size_t len)
{
	union const_data src = { .as_bytes = src_ptr };
	unsigned char byte = (unsigned char) c;
	/* Start from the truncated byte so the high bits of c stay out of the mask */
	unsigned long mask = byte;
	size_t remaining = len;

	/* Nothing to do */
	if (!src_ptr || !len)
		return NULL;

	if (len < 2 * WORD_SIZE)
		goto trailing;

	/* Replicate the byte across the whole word */
	mask |= mask << 8;
	mask |= mask << 16;
#if __SIZEOF_LONG__ == 8
	mask |= mask << 32;
#endif

	/* Search by byte up to the src's alignment boundary */
	for (; src.as_uptr & WORD_MASK; remaining--, src.as_bytes++) {
		if (*src.as_bytes == byte)
			return (void *) src.as_bytes;
	}

	/* Search word by word using the mask */
	for (; remaining >= WORD_SIZE; remaining -= WORD_SIZE, src.as_ulong++) {
		unsigned long check = *src.as_ulong ^ mask;

		/* A zero byte in check means this word contains the target byte */
		if (HAS_ZERO(check))
			break;
	}

trailing:
	/* Scan the remaining bytes, including the word that matched above */
	for (; remaining > 0; remaining--, src.as_bytes++) {
		if (*src.as_bytes == byte)
			return (void *) src.as_bytes;
	}

	return NULL;
}

Regards,
Nick
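
For illustration, here is a minimal user-space sketch of the two ideas discussed in this mail: guarding the final shift with __SIZEOF_LONG__ rather than the double shift, and the HAS_ZERO trick for finding a byte within a word. The helper names repeat_byte() and has_byte() are hypothetical and are not part of the patch or of the kernel; __SIZEOF_LONG__ is assumed to be provided by the compiler, as GCC and Clang do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the patch): replicate one byte across a long. */
static unsigned long repeat_byte(int c)
{
	/* Keep only the low 8 bits so a negative or wide 'c' cannot leak in */
	unsigned long cu = (unsigned char)c;

	cu |= cu << 8;
	cu |= cu << 16;
#if __SIZEOF_LONG__ == 8
	/* Only compiled when long is 64 bit, so no shift-width warning on 32 bit */
	cu |= cu << 32;
#endif
	return cu;
}

/* Hypothetical helper: returns non-zero if any byte of 'word' equals 'c'. */
static int has_byte(unsigned long word, int c)
{
	const unsigned long ones = repeat_byte(0x01);
	const unsigned long highs = repeat_byte(0x80);
	/* XOR with the replicated byte turns every matching byte into zero */
	unsigned long x = word ^ repeat_byte(c);

	/* Same expression as HAS_ZERO: true iff some byte of x is zero */
	return ((x - ones) & ~x & highs) != 0;
}
```

With these helpers, has_byte(repeat_byte('x'), 'x') is non-zero while has_byte(repeat_byte('x'), 'y') is zero. The test only says whether a word contains the byte, not where, which is why the memchr sketch above falls back to a byte-by-byte scan after breaking out of the word loop.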