From: zhangfei
To: zhang_fei_0403@163.com
Cc: ajones@ventanamicro.com, aou@eecs.berkeley.edu,
    conor.dooley@microchip.com, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, palmer@dabbelt.com,
    paul.walmsley@sifive.com, zhangfei@nj.iscas.ac.cn
Subject: [PATCH v2 2/2] RISC-V: lib: Optimize memset performance
Date: Thu, 11 May 2023 09:34:53 +0800
Message-Id: <20230511013453.3275-1-zhang_fei_0403@163.com>
In-Reply-To: <20230511012604.3222-1-zhang_fei_0403@163.com>
References: <20230511012604.3222-1-zhang_fei_0403@163.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: zhangfei

Optimize memset for sizes below 16 bytes. Instead of looping over the
buffer one byte at a time, store from both the head and the tail of the
buffer with minimal branching. This lets the store instructions execute
in parallel and reduces the number of jumps, which is significantly
faster than the byte-by-byte loop. Additional size checks between the
groups of stores avoid redundant stores.

Signed-off-by: Fei Zhang
---
 arch/riscv/lib/memset.S | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
index e613c5c27998..452764bc9900 100644
--- a/arch/riscv/lib/memset.S
+++ b/arch/riscv/lib/memset.S
@@ -106,9 +106,43 @@ WEAK(memset)
 	beqz a2, 6f
 	add a3, t0, a2
 5:
-	sb a1, 0(t0)
-	addi t0, t0, 1
-	bltu t0, a3, 5b
+	/* fill head and tail with minimal branching */
+	sb a1, 0(t0)
+	sb a1, -1(a3)
+	li a4, 2
+	bgeu a4, a2, 6f
+
+	sb a1, 1(t0)
+	sb a1, 2(t0)
+	sb a1, -2(a3)
+	sb a1, -3(a3)
+	li a4, 6
+	bgeu a4, a2, 6f
+
+	/*
+	 * Adding additional detection to avoid
+	 * redundant stores can lead
+	 * to better performance
+	 */
+	sb a1, 3(t0)
+	sb a1, -4(a3)
+	li a4, 8
+	bgeu a4, a2, 6f
+
+	sb a1, 4(t0)
+	sb a1, -5(a3)
+	li a4, 10
+	bgeu a4, a2, 6f
+
+	sb a1, 5(t0)
+	sb a1, 6(t0)
+	sb a1, -6(a3)
+	sb a1, -7(a3)
+	li a4, 14
+	bgeu a4, a2, 6f
+
+	/* store the last byte */
+	sb a1, 7(t0)
 6:
 	ret
 END(__memset)
-- 
2.33.0
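
For anyone who wants to check the store pattern without RISC-V hardware,
the new tail path can be modelled in plain C. This is only a sketch under
stated assumptions, not kernel code: the names tail_fill_model() and
tail_fill_model_ok() are made up for illustration, and the model assumes
the tail path is entered with fewer than 16 bytes remaining, as the commit
message describes. p, end, n and c stand in for t0, a3, a2 and a1. It
says nothing about performance, only about which bytes get written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * C model of the patched tail path (label 5 in memset.S).
 * Hypothetical helper for illustration only.
 * Precondition: n < 16. n == 0 returns early, like "beqz a2, 6f".
 */
static void tail_fill_model(unsigned char *p, unsigned char c, size_t n)
{
	unsigned char *end = p + n;	/* add a3, t0, a2 */

	assert(n < 16);
	if (n == 0)
		return;

	/* first and last byte: enough for n = 1..2 */
	p[0] = c;
	end[-1] = c;
	if (n <= 2)
		return;

	/* two more from each side: enough for n = 3..6 */
	p[1] = c;
	p[2] = c;
	end[-2] = c;
	end[-3] = c;
	if (n <= 6)
		return;

	/* one more from each side: enough for n = 7..8 */
	p[3] = c;
	end[-4] = c;
	if (n <= 8)
		return;

	/* one more from each side: enough for n = 9..10 */
	p[4] = c;
	end[-5] = c;
	if (n <= 10)
		return;

	/* two more from each side: enough for n = 11..14 */
	p[5] = c;
	p[6] = c;
	end[-6] = c;
	end[-7] = c;
	if (n <= 14)
		return;

	/* n == 15: byte 7 is the only one not yet covered */
	p[7] = c;
}

/*
 * Returns 1 if, for every n in 0..15, exactly the bytes [0, n) are set
 * and the guard bytes on both sides are untouched; returns 0 otherwise.
 */
static int tail_fill_model_ok(void)
{
	for (size_t n = 0; n < 16; n++) {
		unsigned char buf[32];

		memset(buf, 0xAA, sizeof(buf));
		tail_fill_model(buf + 8, 0x55, n);

		for (size_t i = 0; i < sizeof(buf); i++) {
			unsigned char want =
				(i >= 8 && i < 8 + n) ? 0x55 : 0xAA;

			if (buf[i] != want)
				return 0;
		}
	}
	return 1;
}
```

Calling tail_fill_model_ok() from a small test program exercises every
size the tail path can see. For the shorter sizes in each group the head
and tail stores overlap, so some bytes are written twice; the size checks
between groups only bound how much of that overlap happens.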