From: zhangfei
To: ajones@ventanamicro.com
Cc: aou@eecs.berkeley.edu, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, palmer@dabbelt.com, paul.walmsley@sifive.com, zhang_fei_0403@163.com, zhangfei@nj.iscas.ac.cn
Subject: Re: [PATCH] riscv: Optimize memset
Date: Thu, 11 May 2023 09:42:43 +0800
Message-Id: <20230511014243.3336-1-zhang_fei_0403@163.com>
In-Reply-To: <20230510-0adf0b2a2956ca1cd426a2d2@orel>

From: zhangfei

On Wed, May 10, 2023 at 14:58:22PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2023 at 11:52:43AM +0800, zhangfei wrote:
> > From: zhangfei
> >
> > On Tue, May 09, 2023 11:16:33AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> > > >
> > > > Hi,
> > > >
> > > > I filled head and tail with minimal branching. Each conditional ensures that
> > > > all the subsequently used offsets are well-defined and in the dest region.
> > >
> > > I know. You trimmed my comment, so I'll quote myself, here
> > >
> > > """
> > > After the check of a2 against 6 above we know that offsets 6(t0)
> > > and -7(a3) are safe. Are we trying to avoid too may redundant
> > > stores with these additional checks?
> > > """
> > >
> > > So, again.
> > > Why the additional check against 8 above and, the one you
> > > trimmed, checking 10?
> >
> > Hi,
> >
> > These additional checks are to avoid too many redundant stores.
> >
> > Adding a check for more than 8 bytes is because after the loop
> > segment '3' comes out, the remaining bytes are less than 8 bytes,
> > which also avoids redundant stores.
>
> So the benchmarks showed these additional checks were necessary to avoid
> making memset worse? Please add comments to the code explaining the
> purpose of the checks.

Hi,

As you suspected, omitting these additional checks makes memset worse.
When I removed the checks against 8 and 10 above, the benchmarks showed
memset throughput dropping to 0.21 bytes/ns at 8B. That is still better
than storing byte by byte, but with the additional checks in place the
result improves to 0.27 bytes/ns.

I apologize for the confusing replies in my previous emails. I have
reorganized the patch as v2 and sent it to you. Please reply under the
latest patch.

Thanks,
Fei Zhang
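
For readers following the thread without the patch at hand, the head/tail fill with extra length checks discussed above can be sketched in C. This is only an illustration of the general technique, not the RISC-V assembly from the patch: the function names small_memset and self_check are hypothetical, the thresholds (2, 6, 8) are chosen for this sketch and do not correspond to the patch's checks against 6, 8 and 10, and the byte loop merely stands in for a word-sized store loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the kernel routine: fill n bytes using
 * overlapping head and tail stores. Each length check guarantees that
 * every offset used after it lies inside the destination, and returning
 * early for short lengths avoids issuing further redundant stores.
 */
static void *small_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    unsigned char v = (unsigned char)c;

    if (n == 0)
        return dest;
    s[0] = v;
    s[n - 1] = v;
    if (n <= 2)            /* 1..2 bytes are fully covered */
        return dest;
    s[1] = v;
    s[2] = v;
    s[n - 2] = v;
    s[n - 3] = v;
    if (n <= 6)            /* 3..6 bytes are fully covered */
        return dest;
    s[3] = v;
    s[n - 4] = v;
    if (n <= 8)            /* 7..8 bytes are fully covered */
        return dest;

    /* Stand-in for the word-sized loop: fill the uncovered middle. */
    for (size_t i = 4; i < n - 4; i++)
        s[i] = v;
    return dest;
}

/* Check every length from 0 to 16 with guard bytes on both sides. */
static void self_check(void)
{
    for (size_t n = 0; n <= 16; n++) {
        unsigned char buf[18] = {0};

        small_memset(buf + 1, 0xAB, n);
        assert(buf[0] == 0);              /* nothing written before dest */
        for (size_t i = 0; i < n; i++)
            assert(buf[1 + i] == 0xAB);   /* every byte filled */
        assert(buf[1 + n] == 0);          /* nothing written past the end */
    }
}
```

Without the early returns the short cases would still be correct as long as every offset stays in range, but they would execute more overlapping stores, which is the cost the additional checks are meant to avoid.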