Date: Tue, 16 Jan 2024 11:13:15 +0000
Message-ID: <86v87t8ras.wl-maz@kernel.org>
From: Marc Zyngier
To: "sundongxu (A)"
Subject: Re: [bug report] GICv4.1: VM performance degradation due to not trapping vCPU WFI
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 16 Jan 2024 03:26:08 +0000,
"sundongxu (A)" wrote:
>
> Hi Guys,
>
> We found a problem about GICv4/4.1, for example:
> We use QEMU to start a VM (4 vCPUs and 8G memory), VM disk was
> configured with virtio, and the network is configured with vhost-net,
> the CPU affinity of the vCPU and emulator is as follows, in VM xml:
>
> Running Mysql in the VM, and sysbench (Mysql benchmark) on the host,
> the performance index is tps, the higher the better.
> If the host only enabled GICv3, the tps will be around 1400.
> If the host enabled GICv4.1, other configurations remain unchanged, the
> tps will be around 40.
>
> We found that when the host enabled GICv4.1, because vSGI is directly
> injected to VM, and most time vCPU exclusively occupy the pCPU, vCPU
> will not trap when executing the WFI instruction. Then from the host
> view, the CPU usage of vCPU0~vCPU3 is almost 100%. When running mysql
> service in VM, the vhost-net and qemu processes also need to obtain
> enough CPU time, but unfortunately these processes cannot get that much
> time (for example, only GICv3 enabled, the cpu usage of vhost-net is
> about 43%, but with GICv4.1 enabled, it becomes 0~2%). During the test,
> it was found that vhost-net sleeps and wakes up very frequently. When
> vhost-net wakes up, it often cannot obtain CPU in time (because of
> wake-up preemption check). After waking up, vhost-net will usually run
> for a short period of time before going to sleep again.

Can you elaborate on this preemption check issue?

>
> If the host enabled GICv4.1, and force vCPU to trap when executing WFI,
> the tps will be around 1400.
>
> On the other hand, when vCPU executes WFI instruction without trapping,
> the vcpu wake-up delay will be significantly improved. For example, the
> result of running cyclictest in VM:
> WFI trap     6us
> WFI no trap  2us
>
> Currently, I add a KVM module parameter to control whether the vCPU
> traps (by set or clear HCR_TWI) when executing the WFI instruction with
> host enabled GICv4/4.1, and by default, vCPU traps are set.
>
> Or, it there a better way?

As you found out, KVM has an adaptive way of dealing with HCR_TWI,
turning it off when the vcpu is alone in the run queue, which means it
doesn't compete with any other thread. How come the other threads
don't register as being runnable?

Effectively, we apply the same principle to vSGIs as to vLPIs, and it
was found that this heuristic was pretty beneficial to vLPIs. I'm a
bit surprised that vSGIs are so different in their usage pattern.

Does it help if you move your "emulatorpin" to some other physical
CPUs?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
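
For readers following the thread, the adaptive HCR_TWI behaviour discussed above can be approximated by a small user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: `struct vcpu_model`, `vcpu_model_load()`, `wfi_traps()` and the `force_wfi_trap` knob (standing in for the module parameter proposed in the report) are hypothetical names, and the run-queue check is reduced to a boolean supplied by the caller. Only the HCR_TWI bit position (bit 13 of HCR_EL2) is taken from the Arm architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* WFI trap bit in HCR_EL2 (bit 13 per the Arm architecture). */
#define HCR_TWI (UINT64_C(1) << 13)

/* Hypothetical, simplified stand-in for per-vCPU state. */
struct vcpu_model {
	uint64_t hcr_el2;        /* modelled trap configuration */
	unsigned int vlpi_count; /* directly injected vLPIs for this vCPU */
	bool vsgi_enabled;       /* GICv4.1 direct vSGI injection in use */
};

/* Hypothetical override modelling the proposed module parameter. */
static bool force_wfi_trap;

/*
 * Model of the heuristic: stop trapping WFI only when the vCPU is
 * alone on its run queue and interrupts can be injected directly
 * (vLPIs or vSGIs); otherwise trap WFI so the host can schedule
 * something else.
 */
static void vcpu_model_load(struct vcpu_model *v, bool alone_on_runqueue)
{
	bool direct_injection = v->vlpi_count > 0 || v->vsgi_enabled;

	if (alone_on_runqueue && direct_injection && !force_wfi_trap)
		v->hcr_el2 &= ~HCR_TWI;
	else
		v->hcr_el2 |= HCR_TWI;
}

/* Returns true when a guest WFI would trap to the hypervisor. */
static bool wfi_traps(const struct vcpu_model *v)
{
	return (v->hcr_el2 & HCR_TWI) != 0;
}
```

In this model the reported symptom corresponds to `alone_on_runqueue` being true even though vhost-net and the emulator threads need the same physical CPU, which is why the question of how those threads register as runnable matters.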