Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp5355592iog; Wed, 22 Jun 2022 18:16:10 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sD1tOPvCHaNUcVM3KBGNzvvPCCaYMkUoFdcoza/dLUSr0kO7Q/PpattuoBPjfbNfKnVDJa X-Received: by 2002:a17:906:7a56:b0:722:df69:3bd5 with SMTP id i22-20020a1709067a5600b00722df693bd5mr5582566ejo.581.1655946970273; Wed, 22 Jun 2022 18:16:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655946970; cv=none; d=google.com; s=arc-20160816; b=mihS1h8udzwwtuWB5j6OebgsLH4crihCsNo3JGihtix5b8RIssARMaIA2cxnyyfRTE oC6qxb+Ks6TzFPbIMMEWtQ5BcURPHC6nQSEUyplBq6DBdFq/TxS4nFIwbEejTt4Y5qaI sfXP7ptm8a85E9FPvV8f6sFRp7DYDahpo+2wqrmpCc+ig6ICiSWso1EM86K2KWAtXB3c st83oCBTOGP8imAVFo98XAq/FXSLvY5NVUvzcs73vD5rIOqWmy9SJ019j+IshofKsmKK PgDG45GN3rZAQjfaduZ8gk1Kzg1CA+QULML/jehoAJkzmvDEAacuuFt7XoRWZy3sgiGZ KMoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=y2aWYAHMJo28dKXNX+NpxzdGV8bYf4jTIQQ0Wc60lAk=; b=pxsS5dfgTRptr1shYdrC+YlMMkRLxWx+BntQqksjvNeBfGr0t0HRjeSDGcGXrT6nuJ 2Bnl6adUiCGMHESu5bOFC7cpnE12x1RHEH4QDaIgkT46WjgIPKpKixkJmwoQv1F+OJ+K J9AMbYqtNRoim23uC9lRxNMchXOMHVsuqCrVMV4UIz+pGgAGboIDrvp369Na8Maf0PVF t9Kbmy5rgOKnDJ1tp2S405wQaee+hAAnDmVgj6KU1dm1SRdmWo+bAmUY7FB3dJIwI1IK 29I6tuN062qBil1SjIfMkvFFUdMwzBmAwZ1oR8rL5+LzorlstzaYclPma/kpl5poJBCl m2Tw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b14-20020aa7cd0e000000b004357523b6a3si13677048edw.550.2022.06.22.18.15.44; Wed, 22 Jun 2022 18:16:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237986AbiFWAnq (ORCPT + 99 others); Wed, 22 Jun 2022 20:43:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376872AbiFWAnk (ORCPT ); Wed, 22 Jun 2022 20:43:40 -0400 Received: from loongson.cn (mail.loongson.cn [114.242.206.163]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id EFD792FFDC for ; Wed, 22 Jun 2022 17:43:38 -0700 (PDT) Received: from [10.130.0.135] (unknown [113.200.148.30]) by mail.loongson.cn (Coremail) with SMTP id AQAAf9DxL983t7NiK3ZVAA--.30427S3; Thu, 23 Jun 2022 08:43:36 +0800 (CST) Subject: Re: [PATCH v2 2/2] LoongArch: No need to call RESTORE_ALL_AND_RET for all syscalls To: Huacai Chen References: <1655806074-17454-1-git-send-email-yangtiezhu@loongson.cn> <1655806074-17454-3-git-send-email-yangtiezhu@loongson.cn> Cc: WANG Xuerui , Xuefeng Li , Jianmin Lv , Jun Yi , Rui Wang , LKML , Jiaxun Yang From: Tiezhu Yang Message-ID: <1a431c16-ecef-f7e9-4c4f-936e4bb3aeea@loongson.cn> Date: Thu, 23 Jun 2022 08:43:35 +0800 User-Agent: Mozilla/5.0 (X11; Linux mips64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-CM-TRANSID: AQAAf9DxL983t7NiK3ZVAA--.30427S3 X-Coremail-Antispam: 1UD129KBjvJXoW3XryruFykJrWxCr1kZry3twb_yoWxKrWDpF WxAFnakF4DWry8Ar9a9F18WrZ0y3Z3WF45Gr4UCFWxCw1v93sxXryvvFWUKF1qgw4FkF4j qa4Fq3s2gws8t3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUvm14x267AKxVW8JVW5JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK02 1l84ACjcxK6xIIjxv20xvE14v26r1I6r4UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4j 6F4UM28EF7xvwVC2z280aVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv6xkF7I0E14v26r4UJV WxJr1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E 2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJV W8JwACjcxG0xvEwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc7I2V7IY0VAS07AlzVAY IcxG8wCY02Avz4vE14v_twCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8Jw C20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAF wI0_Jw0_GFylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjx v20xvEc7CjxVAFwI0_Jr0_Gr1lIxAIcVCF04k26cxKx2IYs7xG6rWUJVWrZr1UMIIF0xvE x4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Jr0_GrUvcSsGvfC2KfnxnU UI43ZEXa7VUjg18DUUUUU== X-CM-SenderInfo: p1dqw3xlh2x3gn0dqz5rrqw2lrqou0/ X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NICE_REPLY_A, SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Cc Jiaxun Yang On 06/22/2022 06:01 PM, Huacai Chen wrote: > Hi, Tiezhu, > > On Tue, Jun 21, 2022 at 6:08 PM Tiezhu Yang wrote: >> >> In handle_syscall, it is unnecessary to call RESTORE_ALL_AND_RET >> for all syscalls. >> >> (1) rt_sigreturn call RESTORE_ALL_AND_RET. >> (2) The other syscalls call RESTORE_STATIC_SOME_SP_AND_RET. >> >> This patch only adds the minimal changes as simple as possible >> to reduce the code complexity, at the same time, it can reduce >> many load instructions. >> >> Here are the test environments: >> >> Hardware: Loongson-LS3A5000-7A1000-1w-A2101 >> Firmware: UDK2018-LoongArch-A2101-pre-beta8 [1] >> System: loongarch64-clfs-system-5.0 [2] >> >> The system passed functional testing used with the following >> test case without and with this patch: >> >> git clone https://github.com/hevz/sigaction-test.git >> cd sigaction-test >> make check >> >> Additionally, use UnixBench syscall to test the performance: >> >> git clone https://github.com/kdlucas/byte-unixbench.git >> cd byte-unixbench/UnixBench/ >> make >> pgms/syscall 600 >> >> In order to avoid the performance impact, add init=/bin/bash >> to the boot cmdline. >> >> Here is the test result, the bigger the better, it shows about >> 1.2% gain tested with close, getpid and exec [3]: >> >> duration without_this_patch with_this_patch >> 600 s 626558267 lps 634244079 lps >> >> [1] https://github.com/loongson/Firmware/tree/main/5000Series/PC/A2101 >> [2] https://github.com/sunhaiyong1978/CLFS-for-LoongArch/releases/tag/5.0 >> [3] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/syscall.c > I test your patch and the whole UnixBench result is like this: > > Before patch, single thread: > > System Benchmarks Index Values BASELINE RESULT INDEX > Dhrystone 2 using register variables 116700.0 9235787.7 791.4 > Double-Precision Whetstone 55.0 2758.7 501.6 > Execl Throughput 43.0 2386.8 555.1 > File Copy 1024 bufsize 2000 maxblocks 3960.0 191752.0 484.2 > File Copy 256 bufsize 500 maxblocks 1655.0 78737.9 475.8 > File Copy 4096 bufsize 8000 maxblocks 5800.0 297402.5 512.8 > Pipe Throughput 12440.0 353658.1 284.3 > Pipe-based Context Switching 4000.0 120140.8 300.4 > Process Creation 126.0 5735.0 455.2 > Shell Scripts (1 concurrent) 42.4 2701.5 637.1 > Shell Scripts (8 concurrent) 6.0 894.9 1491.5 > System Call Overhead 15000.0 557467.4 371.6 > ======== > System Benchmarks Index Score 516.1 > > After patch, single thread: > > System Benchmarks Index Values BASELINE RESULT INDEX > Dhrystone 2 using register variables 116700.0 9235688.9 791.4 > Double-Precision Whetstone 55.0 2758.7 501.6 > Execl Throughput 43.0 2377.8 553.0 > File Copy 1024 bufsize 2000 maxblocks 3960.0 192545.5 486.2 > File Copy 256 bufsize 500 maxblocks 1655.0 79735.0 481.8 > File Copy 4096 bufsize 8000 maxblocks 5800.0 299621.9 516.6 > Pipe Throughput 12440.0 354969.1 285.3 > Pipe-based Context Switching 4000.0 118307.5 295.8 > Process Creation 126.0 5757.0 456.9 > Shell Scripts (1 concurrent) 42.4 2695.4 635.7 > Shell Scripts (8 concurrent) 6.0 894.4 1490.6 > System Call Overhead 15000.0 563582.7 375.7 > ======== > System Benchmarks Index Score 517.0 > > Before patch, multi-threads: > > System Benchmarks Index Values BASELINE RESULT INDEX > Dhrystone 2 using register variables 116700.0 36943633.4 3165.7 > Double-Precision Whetstone 55.0 11035.8 2006.5 > Execl Throughput 43.0 8800.1 2046.5 > File Copy 1024 bufsize 2000 maxblocks 3960.0 277638.3 701.1 > File Copy 256 bufsize 500 maxblocks 1655.0 92530.5 559.1 > File Copy 4096 bufsize 8000 maxblocks 5800.0 524344.3 904.0 > Pipe Throughput 12440.0 1359237.2 1092.6 > Pipe-based Context Switching 4000.0 571511.4 1428.8 > Process Creation 126.0 20823.3 1652.6 > Shell Scripts (1 concurrent) 42.4 6883.9 1623.6 > Shell Scripts (8 concurrent) 6.0 981.7 1636.1 > System Call Overhead 15000.0 2029539.8 1353.0 > ======== > System Benchmarks Index Score 1367.4 > > After patch, multi-threads: > > System Benchmarks Index Values BASELINE RESULT INDEX > Dhrystone 2 using register variables 116700.0 36943793.6 3165.7 > Double-Precision Whetstone 55.0 11035.5 2006.4 > Execl Throughput 43.0 8768.3 2039.1 > File Copy 1024 bufsize 2000 maxblocks 3960.0 277962.9 701.9 > File Copy 256 bufsize 500 maxblocks 1655.0 92059.7 556.3 > File Copy 4096 bufsize 8000 maxblocks 5800.0 525937.5 906.8 > Pipe Throughput 12440.0 1361566.6 1094.5 > Pipe-based Context Switching 4000.0 575835.4 1439.6 > Process Creation 126.0 20426.4 1621.1 > Shell Scripts (1 concurrent) 42.4 6877.5 1622.0 > Shell Scripts (8 concurrent) 6.0 980.3 1633.8 > System Call Overhead 15000.0 2049771.6 1366.5 > ======== > System Benchmarks Index Score 1366.6 > > From my point of view, the benefit is negligible. There is another way to look at what is going on. This patch is related with syscall, I prefer to observe "System Call Overhead" in the test results. Here are the INDEX of "System Call Overhead" in your test results: thread before_patch after_patch gain single 371.6 375.7 1.103% multi 1353.0 1366.5 0.998% For now, I would like to wait for other people's review. If the conclusion is the optimization is meaningless, I am fine with ignoring this patch. Thanks, Tiezhu > > > Huacai > >> >> Signed-off-by: Tiezhu Yang >> --- >> arch/loongarch/include/asm/stackframe.h | 5 +++++ >> arch/loongarch/kernel/entry.S | 15 +++++++++++++++ >> 2 files changed, 20 insertions(+) >> >> diff --git a/arch/loongarch/include/asm/stackframe.h b/arch/loongarch/include/asm/stackframe.h >> index 4ca9530..551ab8f 100644 >> --- a/arch/loongarch/include/asm/stackframe.h >> +++ b/arch/loongarch/include/asm/stackframe.h >> @@ -216,4 +216,9 @@ >> RESTORE_SP_AND_RET \docfi >> .endm >> >> + .macro RESTORE_STATIC_SOME_SP_AND_RET docfi=0 >> + RESTORE_STATIC \docfi >> + RESTORE_SOME \docfi >> + RESTORE_SP_AND_RET \docfi >> + .endm >> #endif /* _ASM_STACKFRAME_H */ >> diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S >> index d5b3dbc..c764c99 100644 >> --- a/arch/loongarch/kernel/entry.S >> +++ b/arch/loongarch/kernel/entry.S >> @@ -14,6 +14,7 @@ >> #include >> #include >> #include >> +#include >> >> .text >> .cfi_sections .debug_frame >> @@ -62,9 +63,23 @@ SYM_FUNC_START(handle_syscall) >> li.d tp, ~_THREAD_MASK >> and tp, tp, sp >> >> + /* Syscall number held in a7, we can store it in TI_SYSCALL. */ >> + LONG_S a7, tp, TI_SYSCALL >> + >> move a0, sp >> bl do_syscall >> >> + /* >> + * Syscall number held in a7 which is stored in TI_SYSCALL. >> + * rt_sigreturn call RESTORE_ALL_AND_RET. >> + * The other syscalls call RESTORE_STATIC_SOME_SP_AND_RET. >> + */ >> + LONG_L t3, tp, TI_SYSCALL >> + li.w t4, __NR_rt_sigreturn >> + beq t3, t4, 1f >> + >> + RESTORE_STATIC_SOME_SP_AND_RET >> +1: >> RESTORE_ALL_AND_RET >> SYM_FUNC_END(handle_syscall) >> >> -- >> 2.1.0 >>