Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp1504752pxv; Fri, 2 Jul 2021 05:34:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxPWfyteCMiYQZ+kAZA6ytKftsWc79d/KJkhbqQ04pTYmCZg9qp9hMB5GJqN8B4ksqhLPuR X-Received: by 2002:a17:906:3402:: with SMTP id c2mr5054703ejb.213.1625229249696; Fri, 02 Jul 2021 05:34:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1625229249; cv=none; d=google.com; s=arc-20160816; b=GsX9sQS3aBqhxKsOkngSBiP6GTFNrujJhvQTu/a0F1lBvnqqcXnw65zZmiw9rhcwv5 cMNC2dd9PFe+TG422u1WYDBz/dPc0tUPlYs0vyDrqgwA7HOI/12LAV9X5k/GtvvriCYN QNuWWa1zu7oy3CQCJ0YiP9YDwfy8sSRbCBdi3Uf4o2TaWabb0lCSxBR26ayvYriJaHib Ojp0RSBbpw2iH5Ml7Lljt4+tzgIP5y7+nCi3UNrdgt6TJQclUon0EAkAm8aLiUYvdHpf nkUcbAN7iYF+riKvUATcu5KcQVCATYRzz0yvAkyghQ41lAfWOItyDaxoRV57wSA9KhIG Rf7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=o6ZDZDCvhi/aYSH4+o4WB0jbVOX9Yf7JZyqDclGvSvg=; b=fvZNzcNah6t4I1mvufgTAfrEWhu13a8mpOqod2oBDlsSpYSnuW06qxGJb23inlardo OHFxIxbZqWzLb+L98DSnB5zzEiI+0UZLm6C2toICXZjIj94GcRVAUESBX72lUCibvtpP WaR9pGPi5MwD77VJZZmj3SPC/eX5Lxr/fdtfW2pebxaFeU5CQCjdWAYPef0JB8xh8f03 Ky/ws/fD7p/fQ8aAggWFvsdh3x5NoOaLyzser4ZfwciV7XlnuxBEbB0CCb/XkIpjsdNt mPJDgiXeFQjmLmXcRSo6q0aP0q9lyNl4ESYcXGildRrNP0BUvLcRjnCKwUOWQaPVMFII JHMQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id hr12si3210680ejc.222.2021.07.02.05.33.43; Fri, 02 Jul 2021 05:34:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232226AbhGBMfB (ORCPT + 99 others); Fri, 2 Jul 2021 08:35:01 -0400 Received: from mail-ej1-f52.google.com ([209.85.218.52]:39597 "EHLO mail-ej1-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232228AbhGBMfA (ORCPT ); Fri, 2 Jul 2021 08:35:00 -0400 Received: by mail-ej1-f52.google.com with SMTP id hp26so3136983ejc.6; Fri, 02 Jul 2021 05:32:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=o6ZDZDCvhi/aYSH4+o4WB0jbVOX9Yf7JZyqDclGvSvg=; b=RxLVEXFe1ZOqlo95ZCKImSbLkM93C3aTVIrD1jipfq0PYcGC6u0bMuOMcnaWXq/A03 emTFg+3AwnvEffV2+CLG8riG2g1Zj1qlpm3ivBKQa3sWSCMbsKTcYeu9UxM3r13b0bcF Y0E2XzP6u2se5IvtQfeyULy3o2/HfsHkudXhGO8e8GxCl5uDkwo25DVa0++FYb0WqGm6 mUjDmOMxf1Xho+0OLnDAnSH5aaZA+Ey9eXSbLYmhqqTT5as+tV68+j7T0HM8/IvALLe/ 64uIpW2YYHQdoVZUzIcOu7puQzd+g4ut+uSIi6MaLkkfun4/aE0ACtYSm4HwD3StG3Z6 xFow== X-Gm-Message-State: AOAM530GPvH1ItIC2Dn/8RJW6xAGRaPAEMzFaYX02BXsVQKvhB1k5rN3 8t0sNg9dumy6lJM4lu+y2F7uZ/ab2nIN4Q== X-Received: by 2002:a17:906:9b86:: with SMTP id dd6mr4909110ejc.100.1625229142346; Fri, 02 Jul 2021 05:32:22 -0700 (PDT) Received: from msft-t490s.fritz.box (host-80-182-89-242.retail.telecomitalia.it. [80.182.89.242]) by smtp.gmail.com with ESMTPSA id c3sm1290189edy.0.2021.07.02.05.32.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Jul 2021 05:32:21 -0700 (PDT) From: Matteo Croce To: linux-kernel@vger.kernel.org, Nick Kossifidis , Guo Ren , Christoph Hellwig , David Laight , Palmer Dabbelt , Emil Renner Berthing , Drew Fustini Cc: linux-arch@vger.kernel.org, Andrew Morton , Nick Desaulniers , linux-riscv@lists.infradead.org Subject: [PATCH v2 3/3] lib/string: optimized memset Date: Fri, 2 Jul 2021 14:31:53 +0200 Message-Id: <20210702123153.14093-4-mcroce@linux.microsoft.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210702123153.14093-1-mcroce@linux.microsoft.com> References: <20210702123153.14093-1-mcroce@linux.microsoft.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Matteo Croce The generic memset is defined as a byte at time write. This is always safe, but it's slower than a 4 byte or even 8 byte write. Write a generic memset which fills the data one byte at time until the destination is aligned, then fills using the largest size allowed, and finally fills the remaining data one byte at time. On a RISC-V machine the speed goes from 140 Mb/s to 241 Mb/s, and this the binary size increase according to bloat-o-meter: Function old new delta memset 32 148 +116 Signed-off-by: Matteo Croce --- lib/string.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/lib/string.c b/lib/string.c index 108b83c34cec..264821f0e795 100644 --- a/lib/string.c +++ b/lib/string.c @@ -810,10 +810,38 @@ EXPORT_SYMBOL(__sysfs_match_string); */ void *memset(void *s, int c, size_t count) { - char *xs = s; + union types dest = { .as_u8 = s }; + if (count >= MIN_THRESHOLD) { + unsigned long cu = (unsigned long)c; + + /* Compose an ulong with 'c' repeated 4/8 times */ +#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER + cu *= 0x0101010101010101UL; +#else + cu |= cu << 8; + cu |= cu << 16; + /* Suppress warning on 32 bit machines */ + cu |= (cu << 16) << 16; +#endif + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) { + /* + * Fill the buffer one byte at time until + * the destination is word aligned. + */ + for (; count && dest.as_uptr & WORD_MASK; count--) + *dest.as_u8++ = c; + } + + /* Copy using the largest size allowed */ + for (; count >= BYTES_LONG; count -= BYTES_LONG) + *dest.as_ulong++ = cu; + } + + /* copy the remainder */ while (count--) - *xs++ = c; + *dest.as_u8++ = c; + return s; } EXPORT_SYMBOL(memset); -- 2.31.1