Date: Tue, 9 May 2023 11:16:33 +0200
From: Andrew Jones
To: zhangfei
Cc: aou@eecs.berkeley.edu, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, palmer@dabbelt.com, paul.walmsley@sifive.com, zhangfei@nj.iscas.ac.cn
Subject: Re: [PATCH] riscv: Optimize memset
Message-ID: <20230509-b0dc346928ddc8d2b5690f67@orel>
In-Reply-To: <20230509022207.3700-3-zhang_fei_0403@163.com>
On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> From: zhangfei
>
> > > 5:
> > > -	sb	a1, 0(t0)
> > > -	addi	t0, t0, 1
> > > -	bltu	t0, a3, 5b
> > > +	sb	a1, 0(t0)
> > > +	sb	a1, -1(a3)
> > > +	li	a4, 2
> > > +	bgeu	a4, a2, 6f
> > > +
> > > +	sb	a1, 1(t0)
> > > +	sb	a1, 2(t0)
> > > +	sb	a1, -2(a3)
> > > +	sb	a1, -3(a3)
> > > +	li	a4, 6
> > > +	bgeu	a4, a2, 6f
> > > +
> > > +	sb	a1, 3(t0)
> > > +	sb	a1, -4(a3)
> > > +	li	a4, 8
> > > +	bgeu	a4, a2, 6f
> >
> > Why is this check here?
>
> Hi,
>
> I filled the head and tail with minimal branching. Each conditional ensures
> that all the subsequently used offsets are well-defined and within the dest
> region.

I know. You trimmed my comment, so I'll quote myself here:

"""
After the check of a2 against 6 above we know that offsets 6(t0)
and -7(a3) are safe. Are we trying to avoid too many redundant stores
with these additional checks?
"""

So, again: why the additional check against 8 above and, in the part you
trimmed, the check against 10?

>
> Although this approach may result in redundant stores, compared to
> byte-by-byte stores it allows the store instructions to execute in
> parallel and reduces the number of jumps.

I understood that when I read the code, but text like this should go in
the commit message so that people don't have to think their way through
it.

>
> I used the code linked below [1] for performance testing, and commented out
> the calls to the Arm-specific memset in it so that it runs properly on the
> RISC-V platform.
>
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
>
> The test platform was a RISC-V SiFive U74. The results are as follows:
>
> Before optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
>
> Medium memset (bytes/ns):
> memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
>
> After optimization
> ---------------------
> Random memset (bytes/ns):
> memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
>
> Medium memset (bytes/ns):
> memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
>
> Large memset (bytes/ns):
> memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
>
> The results show that memset performance improves significantly for sizes
> around 8B, from 0.18 bytes/ns to 0.27 bytes/ns (1.5x), while the other sizes
> stay within 0.01 bytes/ns of the baseline.

And these benchmark results belong in the cover letter, which this
series is missing.

Thanks,
drew