Received: by 2002:a5d:925a:0:0:0:0:0 with SMTP id e26csp1268644iol; Sun, 12 Jun 2022 10:44:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxXfb1AK14+RJD/y2xo/mNDz3qVYfabBmJ/kubeI25vKUr3goQN9G/tZqcQjQaHCSUDNqCs X-Received: by 2002:a17:907:a422:b0:705:e944:fd3e with SMTP id sg34-20020a170907a42200b00705e944fd3emr48561092ejc.309.1655055898289; Sun, 12 Jun 2022 10:44:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655055898; cv=none; d=google.com; s=arc-20160816; b=bICcBoqnOd09ug3KreMJ+JnkoBYMiwW7g1eGE5/25RkgEmzAIEBdDchrVBC3WnEMR8 YZBq0B51nf9rAztQo92WUxhWji0VWXHmZnIkR9ZgZxygXXjQetpDiMbcRaOFqnnOCkC6 yyHJDOWKUUX53sk3puQtNqKqKm8g6uckkdgCPHRy1nhIXIJTudKqkRMLeSN2CA9CadmA YvbVIrO8Mt+zXEl6/M1grwhrcqREUX2td1gvhshoGGWstpS7muJgEqzEbJpbvWjr1IHC xtINz+uPwP2TNqxSDddwKeGXC4KmJUSCCXogQXuQi3uQAhKrupHMvIha5P+Knedc6vCp +Dkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=wZ/eV1jqglmOmAYJbmw1e8ljIsQkdqiGZdHIEfXwMgk=; b=T6bPhSEqPvVfCk64Wq6DY1bQCkLFI0lTfEmLJacNsD4l0G9ZV32fjr9JIR8QpQBxdv 7hBGSoH006oXGplOVHTvrpeh4awHxChijdAlqNBGu0Pa3NYaK9BpVqe9dgEXHdJe2l4c FZIs3WdSDyBXc8D4aLKpR6IFOxcPKN4arp2khBaUrkKtsqVA1zDxGRZgPVifnPppMLhI GfBR5erTl/l8BvGvcESoJLRgw3O0DGhbhnK2MtwO0yw24QAWy+H5f3L3+Re1SsZMf2tw QBCYSu3cCVOQmTN9OloJUHEVyt66S53Os/wlEGV6UBigyR8h3Yy73qrykP0IUofHpup5 KYyg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="sYAsqi/3"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ho34-20020a1709070ea200b00705973ded91si6869461ejc.453.2022.06.12.10.44.33; Sun, 12 Jun 2022 10:44:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="sYAsqi/3"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231604AbiFLQkH (ORCPT + 99 others); Sun, 12 Jun 2022 12:40:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229473AbiFLQkF (ORCPT ); Sun, 12 Jun 2022 12:40:05 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E393E1839E; Sun, 12 Jun 2022 09:40:03 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6D36660FBA; Sun, 12 Jun 2022 16:40:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B96DDC34115; Sun, 12 Jun 2022 16:40:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655052002; bh=Xn/ixlJG6/c6tH5N7rI3NzTGuLWavqpfoz8W2lZkvdI=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=sYAsqi/3j5Z4GCn3cArS9fH8B+f8lr6B0O1z2newgr5qdXxBE+zV+P/iQYPms+4I1 RVSWeCWmclZzQIKrNgdHyTMl1uFelPmE/sL2RItTCas5dPsUiSm8youldX2sXnBFlu OV54dgR555KBs/Nd7N+L0lmAfa8rQj3GVo8YMLnd5HwdH37pwQ1hHR9lI/F/B+fF8S 9kAdoLYL26OU/l32qfFkZqLA+VvsfeeHwyXDrfo7i6Gg8/Fu8TRRw+0gpNlEqgm+lX S4WPo4fjJ9OLgKhW0K9K/bwmkf4XSSc2GIhohhsqY6O1U5I/afRAU0ICFcYmvP168J sT++Uu6oeqwSQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 526575C02F9; Sun, 12 Jun 2022 09:40:02 -0700 (PDT) Date: Sun, 12 Jun 2022 09:40:02 -0700 From: "Paul E. McKenney" To: "zhangfei.gao@foxmail.com" Cc: Zhangfei Gao , "linux-kernel@vger.kernel.org" , rcu@vger.kernel.org, Lai Jiangshan , Josh Triplett , Mathieu Desnoyers , Matthew Wilcox , Shameerali Kolothum Thodi , mtosatti@redhat.com, pbonzini@redhat.com, sheng.yang@intel.com Subject: Re: Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking readers from consuming CPU) cause qemu boot slow Message-ID: <20220612164002.GA1242564@paulmck-ThinkPad-P17-Gen-1> Reply-To: paulmck@kernel.org References: <20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org> <20220611165956.GO1790663@paulmck-ThinkPad-P17-Gen-1> <20220612133631.GQ1790663@paulmck-ThinkPad-P17-Gen-1> <20220612162029.GR1790663@paulmck-ThinkPad-P17-Gen-1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220612162029.GR1790663@paulmck-ThinkPad-P17-Gen-1> X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 12, 2022 at 09:20:29AM -0700, Paul E. McKenney wrote: > On Sun, Jun 12, 2022 at 10:59:30PM +0800, zhangfei.gao@foxmail.com wrote: > > Hi, Paul > > > > On 2022/6/12 下午9:36, Paul E. McKenney wrote: > > > On Sun, Jun 12, 2022 at 03:40:30PM +0800, zhangfei.gao@foxmail.com wrote: > > > > Hi, Paul > > > > > > > > On 2022/6/12 上午12:59, Paul E. McKenney wrote: > > > > > On Sun, Jun 12, 2022 at 12:32:59AM +0800, Zhangfei Gao wrote: > > > > > > Hi, Paul > > > > > > > > > > > > When verifying qemu with acpi rmr feature on v5.19-rc1, the guest kernel > > > > > > stuck for several minutes. > > > > > Stuck for several minutes but then continues normally? Or stuck for > > > > > several minutes before you kill qemu? > > > > qemu boot stuck for several minutes, then guest can bootup normally, just > > > > slower. > > > > > And I have to ask... What happened without the ACPI RMR feature? > > > > If no ACPI, qemu boot quickly without stuck. > > > > build/aarch64-softmmu/qemu-system-aarch64 -machine > > > > virt,gic-version=3,iommu=smmuv3 \ > > > > -enable-kvm -cpu host -m 1024 \ > > > > -kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \ > > > > "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off" > > > > > > > > Adding acpi=force & -bios QEMU_EFI.fd, qemu boot stuck for several minutes. > > > > > > > > By the way, my hardware platform is aarch64. > > > Thank you for the information! The problem is excessive delay rather > > > than a hang, and it is configuration-dependent. Good to know! > > > > > > > Only change this can solve the stuck issue. > > > > > > > > --- a/kernel/rcu/srcutree.c > > > > +++ b/kernel/rcu/srcutree.c > > > > @@ -524,6 +524,10 @@ static unsigned long srcu_get_delay(struct srcu_struct > > > > *ssp) > > > >  { > > > >         unsigned long jbase = SRCU_INTERVAL; > > > > > > > > +       if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), > > > > READ_ONCE(ssp->srcu_gp_seq_needed_exp))) > > > > +               return 0; > > > > +       return SRCU_INTERVAL; > > > I am glad that you have a workaround for this problem, but this change > > > would re-introduce the problem that commit 282d8998e997 ("srcu: Prevent > > > expedited GPs and blocking readers from consuming CPU") was intended > > > to fix. For one example, your change can prevent kernel live patching > > > from applying a patch. So something else is needed. > > Understand, it is just debug where has issue. > > > > > > Does changing the value of SRCU_MAX_INTERVAL to (say) 3 decrease the delay > > > significantly? (This is not a fix, either, but instead a debug check.) > > No use. > > OK, that indicates that you have a very large number of invocations > of synchronize_srcu() or synchronize_srcu_expedited() instead of only > a few that take a very long time each. > > > > Your change always returns zero if another SRCU grace period is needed. > > > Let's look at the callers of srcu_get_delay(): > > > > > > o cleanup_srcu_struct() uses it to check whether there is an > > > expedited grace period pending, leaking the srcu_struct if so. > > > This should not affect boot delay. (Unless you are invoking > > > init_srcu_struct() and cleanup_srcu_struct() really really > > > often.) > > > > > > o srcu_gp_end() uses it to determine whether or not to allow > > > a one-jiffy delay before invoking callbacks at the end of > > > a grace period. > > > > > > o srcu_funnel_gp_start() uses it to determine whether or not to > > > allow a one-jiffy delay before starting the process of checking > > > for the end of an SRCU grace period. > > > > > > o try_check_zero() uses it to add an additional short delay > > > (instead of a long delay) between checks of reader state. > > > > > > o process_srcu() uses it to calculate the long delay between > > > checks of reader state. > > > > > > These add one-jiffy delays, except for process_srcu(), which adds a delay > > > of up to 10 jiffies. Even given HZ=100 (as opposed to the HZ=1000 that > > > I normally use), this requires thousands of such delays to add up to the > > > several minutes that you are seeing. (In theory, the delays could also > > > be due to SRCU readers, except that in that case adjusting timeouts in > > > the grace-period processing would not make things go faster.) > > > > > > So, does acpi=force & -bios QEMU_EFI.fd add SRCU grace periods? If so, > > > it would be very good make sure that this code is using SRCU efficiently. > > > One way to check would be to put a printk() into synchronize_srcu(), > > > though maintaining a counter and printing (say) every 1000th invocation > > > might be easier on the console output. > > > > good idea. > > Glad you like it. ;-) > > > > > > > And on 5.18, there is no such problem. > > > > > > > > > > > > After revert this patch, the issue solved. > > > > > > Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking readers from > > > > > > consuming CPU) > > > > > > > > > > > > > > > > > > qemu cmd: > > > > > > build/aarch64-softmmu/qemu-system-aarch64 -machine > > > > > > virt,gic-version=3,iommu=smmuv3 \ > > > > > > -enable-kvm -cpu host -m 1024 \ > > > > > > -kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \ > > > > > > "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off acpi=force" \ > > > > > > -bios QEMU_EFI.fd > > > > > > > > > > > > log: > > > > > > InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040 > > > > > > add-symbol-file /home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll > > > > > > 0x75459000 > > > > > > Loading driver at 0x00075458000 EntryPoint=0x00075459058 IScsiDxe.efi > > > > > > InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7AA4DE98 > > > > > > ProtectUefiImageCommon - 0x7AA4D040 > > > > > >   - 0x0000000075458000 - 0x000000000003F000 > > > > > > SetUefiImageMemoryAttributes - 0x0000000075458000 - 0x0000000000001000 > > > > > > (0x0000000000004008) > > > > > > SetUefiImageMemoryAttributes - 0x0000000075459000 - 0x000000000003B000 > > > > > > (0x0000000000020008) > > > > > > SetUefiImageMemoryAttributes - 0x0000000075494000 - 0x0000000000003000 > > > > > > (0x0000000000004008) > > > > > > InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952C8 > > > > > > InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358 > > > > > > InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370 > > > > > > InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952F8 > > > > > > InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358 > > > > > > InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370 > > > > > > InstallProtocolInterface: 59324945-EC44-4C0D-B1CD-9DB139DF070C 75495348 > > > > > > InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 754953E8 > > > > > > InstallProtocolInterface: 330D4706-F2A0-4E4F-A369-B66FA8D54385 7AA4D728 > > > > > > > > > > > > > > > > > > Not sure it is either reported or solved. > > > > > This is the first I have heard of it, so thank you for reporting it. > > > > > > > > > > Do you have a way of collecting something sysrq-t output? > > > > Do you mean "echo t > /proc/sysrq-trigger", > > > > There are too much output and kernel dump can not stop. > > > OK. What other tools do you have to work out what is happening during > > > temporary hangs such as this one? > > > > > > The question to be answered: "Is there usually at least one task waiting > > > in synchronize_srcu() during these hangs, and if so, which srcu_struct > > > is passed to those synchronize_srcu() calls?" > > > > As you suggested, add print in __synchronize_srcu, 1000 times print once. > > > > With acpi=force & -bios QEMU_EFI.fd > > > > When qemu stuck in > > InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040 > > add-symbol-file /home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll > > 0x75459000 > > > > The print in  __synchronize_srcu is print from 0 t0 9001. > > Now that is what I call a large number of calls! > > > [   94.271350] gzf __synchronize_srcu loop=1001 > > .... > > > > [  222.621659]  __synchronize_srcu loop=9001 > > [  222.621664] CPU: 96 PID: 2294 Comm: qemu-system-aar Not tainted > > 5.19.0-rc1-15071-g697f40b5235f-dirty #615 > > [  222.621666] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS > > 2280-V2 CS V5.B133.01 03/25/2021 > > [  222.621667] Call trace: > > [  222.621668]  dump_backtrace+0xe4/0xf0 > > [  222.621670] show_stack+0x20/0x70 > > [  222.621672] dump_stack_lvl+0x8c/0xb8 > > [  222.621674]  dump_stack+0x18/0x34 > > [  222.621676] __synchronize_srcu+0x120/0x128 > > [  222.621678] synchronize_srcu_expedited+0x2c/0x40 > > [  222.621680] kvm_swap_active_memslots+0x130/0x198 > > [  222.621683] kvm_activate_memslot+0x40/0x68 > > [  222.621684]  kvm_set_memslot+0x184/0x3b0 > > [  222.621686]  __kvm_set_memory_region+0x288/0x438 > > [  222.621688] kvm_set_memory_region+0x3c/0x60 > > This is KVM setting up one mapping in your IORT RMR memory, correct? > (As in I/O remapping table reserved memory regions.) > > > [  222.621689]  kvm_vm_ioctl+0x5a0/0x13e0 > > And this ioctl() is qemu telling KVM to do so, correct? > > > [  222.621691]  __arm64_sys_ioctl+0xb0/0xf8 > > [  222.621693]  invoke_syscall+0x4c/0x110 > > [  222.621696] el0_svc_common.constprop.0+0x68/0x128 > > [  222.621698]  do_el0_svc+0x34/0xc0 > > [  222.621701]  el0_svc+0x30/0x98 > > [  222.621704]  el0t_64_sync_handler+0xb8/0xc0 > > [  222.621706]  el0t_64_sync+0x18c/0x190 > > > > > > If no acpi=force, no print at all, 1000 times one print. > > OK, this certainly explains the slowdown, both from adding the IORT RMR > and from the commit that you bisected to. The answers to a few questions > might help illuminate a path towards a fix: > > Do these reserved memory regions really need to be allocated separately? > (For example, are they really all non-contiguous? If not, that is, if > there are a lot of contiguous memory regions, could you sort the IORT > by address and do one ioctl() for each set of contiguous memory regions?) > > Are all of these reserved memory regions set up before init is spawned? > > Are all of these reserved memory regions set up while there is only a > single vCPU up and running? > > Is the SRCU grace period really needed in this case? (I freely confess > to not being all that familiar with KVM.) Oh, and there was a similar many-requests problem with networking many years ago. This was solved by adding a new syscall/ioctl()/whatever that permitted many requests to be presented to the kernel with a single system call. Could a new ioctl() be introduced that requested a large number of these memory regions in one go so as to make each call to synchronize_rcu_expedited() cover a useful fraction of your 9000+ requests? Adding a few of the KVM guys on CC for their thoughts. Thanx, Paul