Subject: Re: [bug-report] possible performance problem in ret_to_user_from_irq
To: Jens Axboe, "Russell King (Oracle)"
References: <7ecb8f3c-2aeb-a905-0d4a-aa768b9649b5@huawei.com> <50a5ebdb-4107-26cc-a2f6-da551d99ff38@kernel.dk> <1ecb9b0c-1103-650a-e32a-93110466b2ae@kernel.dk>
From: Hui Tang
Date: Wed, 4 Jan 2023 09:31:39 +0800
In-Reply-To: <1ecb9b0c-1103-650a-e32a-93110466b2ae@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/1/3 22:59, Jens Axboe wrote:
> On 1/3/23 7:34 AM, Russell King (Oracle) wrote:
>> On Tue, Jan 03, 2023 at 07:25:26AM -0700, Jens Axboe wrote:
>>> On 1/3/23 3:06 AM, Russell King (Oracle) wrote:
>>>> On Mon, Dec 26, 2022 at 04:45:20PM +0800, Hui Tang wrote:
>>>>> Hi folks.
>>>>>
>>>>> I found a performance problem which is introduced by commit
>>>>> 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL").
>>>>> After the commit, any bit in the range of 0..15 will cause
>>>>> do_work_pending() to be invoked. More frequent invocation of
>>>>> do_work_pending() can result in worse performance.
>>>>>
>>>>> Some of the tests I've done are as follows:
>>>>> lmbench test                      base      with patch
>>>>> ./lat_ctx -P 1 -s 0  2            7.3167    11.04
>>>>> ./lat_ctx -P 1 -s 16 2            8.0467    14.5367
>>>>> ./lat_ctx -P 1 -s 64 2            7.8667    11.43
>>>>> ./lat_ctx -P 1 -s 16 16           16.47     18.3667
>>>>> ./lat_pipe -P 1                   28.1671   44.7904
>>>>>
>>>>> libMicro-0.4.1 test               base      with patch
>>>>> ./cascade_cond -E -C 200\
>>>>>  -L -S -W -N "c_cond_1" -I 100    286.3333  358
>>>>>
>>>>> When I adjust the bit test, the performance problem is gone.
>>>>> -	movs	r1, r1, lsl #16
>>>>> +	ldr	r2, =#_TIF_WORK_MASK
>>>>> +	tst	r1, r2
>>>>>
>>>>> Does anyone have a good suggestion for this problem?
>>>>> Should we just test _TIF_WORK_MASK, as before?
>>>>
>>>> I think it should be fine - but I would suggest re-organising the
>>>> TIF definitions so that those TIF bits that shouldn't trigger
>>>> do_work_pending are not in the first 16 bits.
>>>>
>>>> Note that all four bits in _TIF_SYSCALL_WORK need to stay within
>>>> an 8-bit even-bit-aligned range, so the value is suitable for an
>>>> immediate assembly constant.
>>>>
>>>> I'd suggest moving the TIF definitions for 20 to 19, and 4..7 to
>>>> 20..23, and then 8 to 4.
>>>
>>> Like this?
>>>
>>> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
>>> index aecc403b2880..7f092cb55a41 100644
>>> --- a/arch/arm/include/asm/thread_info.h
>>> +++ b/arch/arm/include/asm/thread_info.h
>>> @@ -128,15 +128,16 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
>>>  #define TIF_NEED_RESCHED	1	/* rescheduling necessary */
>>>  #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
>>>  #define TIF_UPROBE		3	/* breakpointed or singlestepping */
>>> -#define TIF_SYSCALL_TRACE	4	/* syscall trace active */
>>> -#define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
>>> -#define TIF_SYSCALL_TRACEPOINT	6	/* syscall tracepoint instrumentation */
>>> -#define TIF_SECCOMP		7	/* seccomp syscall filtering active */
>>> -#define TIF_NOTIFY_SIGNAL	8	/* signal notifications exist */
>>> +#define TIF_NOTIFY_SIGNAL	4	/* signal notifications exist */
>>>
>>>  #define TIF_USING_IWMMXT	17
>>>  #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
>>> -#define TIF_RESTORE_SIGMASK	20
>>> +#define TIF_RESTORE_SIGMASK	19
>>> +#define TIF_SYSCALL_TRACE	20	/* syscall trace active */
>>> +#define TIF_SYSCALL_AUDIT	21	/* syscall auditing active */
>>> +#define TIF_SYSCALL_TRACEPOINT	22	/* syscall tracepoint instrumentation */
>>> +#define TIF_SECCOMP		23	/* seccomp syscall filtering active */
>>> +
>>>
>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>
>> Yep, LGTM, thanks.
>
> Hui Tang, can you give it a whirl? Just checked and it applies to
> 5.10-stable as well, just with a slight offset.

Okay, I'll test it today.
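
For reference, here is a rough user-space C model of the two checks discussed in this thread; it is only a sketch, not kernel code. The helper names (triggers_by_shift, triggers_by_mask, is_arm_immediate) and the OLD_/NEW_ prefixes are made up for illustration. It assumes TIF_SIGPENDING is bit 0 and that _TIF_WORK_MASK is the OR of SIGPENDING, NEED_RESCHED, NOTIFY_RESUME, UPROBE and NOTIFY_SIGNAL; neither is visible in the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout before the proposed diff (TIF_SIGPENDING = 0 is an assumption). */
#define OLD_TIF_WORK_MASK    ((1u << 0) | (1u << 1) | (1u << 2) | (1u << 3) | (1u << 8))
#define OLD_TIF_SYSCALL_WORK ((1u << 4) | (1u << 5) | (1u << 6) | (1u << 7))

/* Bit layout after the proposed diff: NOTIFY_SIGNAL at 4, syscall bits at 20..23. */
#define NEW_TIF_WORK_MASK    ((1u << 0) | (1u << 1) | (1u << 2) | (1u << 3) | (1u << 4))
#define NEW_TIF_SYSCALL_WORK (0xfu << 20)

/* Models "movs r1, r1, lsl #16": non-zero result when any of bits 0..15 is set. */
static inline bool triggers_by_shift(uint32_t flags)
{
	return (uint32_t)(flags << 16) != 0;
}

/* Models "tst r1, <mask>": non-zero result only when a bit in the mask is set. */
static inline bool triggers_by_mask(uint32_t flags, uint32_t mask)
{
	return (flags & mask) != 0;
}

/*
 * An A32 data-processing immediate is an 8-bit value rotated right by an
 * even amount. Rotating the candidate left by each even amount undoes such
 * a rotation; the value is encodable if any result fits in 8 bits.
 */
static inline bool is_arm_immediate(uint32_t v)
{
	for (unsigned int rot = 0; rot < 32; rot += 2) {
		uint32_t r = rot ? ((v << rot) | (v >> (32 - rot))) : v;

		if (r <= 0xffu)
			return true;
	}
	return false;
}
```

Under these assumptions, a task with only the old TIF_SYSCALL_TRACE (bit 4) set passes the shift check but not the mask check, which matches the extra do_work_pending() calls described above. With the new layout the syscall bits sit above bit 15, so the shift check only sees work bits. The old mask value 0x10f spans nine bits and is not encodable as an immediate in this model, while 0x1f and 0xf << 20 both are.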