From: Wanpeng Li
Date: Wed, 3 Jul 2019 08:47:40 +0800
Subject: Re: [PATCH v5 0/4] KVM: LAPIC: Implement Exitless Timer
To: Marcelo Tosatti
Cc: Paolo Bonzini, LKML, kvm, Radim Krčmář
References: <1561110002-4438-1-git-send-email-wanpengli@tencent.com> <1fbd236a-f7f9-e66a-e08c-bf2bac901d15@redhat.com> <20190702222330.GB26621@amt.cnet>
In-Reply-To: <20190702222330.GB26621@amt.cnet>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 3 Jul 2019 at 06:23, Marcelo Tosatti wrote:
>
> On Tue, Jul 02, 2019 at 06:38:56PM +0200, Paolo Bonzini wrote:
> > On 21/06/19 11:39, Wanpeng Li wrote:
> > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > programming timer in guest and the emulated timer fires incur vmexits.
> > > This patchset tries to avoid vmexit which is incurred by the emulated
> > > timer fires in dedicated instance scenario.
> > >
> > > When nohz_full is enabled in dedicated instances scenario, the unpinned
> > > timer will be moved to the nearest busy housekeepers after commit
> > > 9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit
> > > 444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However,
> > > KVM always makes lapic timer pinned to the pCPU which vCPU residents, the
> > > reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer
> > > pinned). Actually, these emulated timers can be offload to the housekeeping
> > > cpus since APICv is really common in recent years. The guest timer interrupt
> > > is injected by posted-interrupt which is delivered by housekeeping cpu
> > > once the emulated timer fires.
> > >
> > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > mode, ~3% redis performance benefit can be observed on Skylake server.
> >
> > Marcelo,
> >
> > does this patch work for you or can you still see the oops?
>
> Hi Paolo,
>
> No more oopses with kvm/queue. Can you include:

Cool, thanks for confirming, Marcelo!
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -124,8 +124,7 @@ static inline u32 kvm_x2apic_id(struct k
>
>  bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>  {
> -	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> -		kvm_hlt_in_guest(vcpu->kvm);
> +	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>
> However, for some reason (hrtimer subsystems responsability) with cyclictest -i 200
> on the guest, the timer runs on the local CPU:
>
> CPU 1/KVM-9454 [003] d..2 881.674196: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674200: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d.h. 881.674387: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 881.674393: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674395: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674399: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d.h. 881.674586: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 881.674593: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674595: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674599: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d.h. 881.674787: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 881.674793: get_nohz_timer_target: get_nohz_timer_target 3->0
> CPU 1/KVM-9454 [003] d..2 881.674795: get_nohz_timer_target: get_nohz_timer_target 3->0
>
> But on boot:
>
> CPU 1/KVM-9454 [003] d..2 578.625394: get_nohz_timer_target: get_nohz_timer_target 3->0
> <idle>-0 [000] d.h1 578.626390: apic_timer_fn <-__hrtimer_run_queues
> <idle>-0 [000] d.h1 578.626394: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 578.626401: get_nohz_timer_target: get_nohz_timer_target 3->0
> <idle>-0 [000] d.h1 578.628397: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 578.628407: get_nohz_timer_target: get_nohz_timer_target 3->0
> <idle>-0 [000] d.h1 578.631403: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 578.631413: get_nohz_timer_target: get_nohz_timer_target 3->0
> <idle>-0 [000] d.h1 578.635409: apic_timer_fn <-__hrtimer_run_queues
> CPU 1/KVM-9454 [003] d..2 578.635419: get_nohz_timer_target: get_nohz_timer_target 3->0
> <idle>-0 [000] d.h1 578.640415: apic_timer_fn <-__hrtimer_run_queues

You have an idle housekeeping CPU (CPU 0) here. In production
environments, however, most housekeeping CPUs are kept busy so that
no money is wasted on idle cores. get_nohz_timer_target() will pick a
busy housekeeping CPU, but the timer migration fails if the timer
would be the first expiring timer on the new target (see the comment
above switch_hrtimer_base()).

Please try

    taskset -c 0 stress --cpu 1

on your host. You should then observe (through /proc/timer_list)
apic_timer_fn running on CPU 0 most of the time and only sporadically
on the local CPU.

Regards,
Wanpeng Li
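
For readers following the thread, the migration rule described above can be modelled in a few lines of userspace C. This is only a rough sketch, not the kernel code: struct cpu_model, pick_target() and place_timer() are hypothetical names made up for illustration, and the real logic in get_nohz_timer_target() and switch_hrtimer_base() handles more cases (scheduler domains, running callbacks, locking).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-CPU timer state. */
struct cpu_model {
	bool housekeeping;      /* CPU is not isolated by nohz_full */
	bool idle;              /* CPU currently has nothing to run */
	uint64_t expires_next;  /* earliest event already programmed (ns) */
};

/*
 * Pick a migration target: stay local if the local CPU is a busy
 * housekeeper, otherwise prefer a busy housekeeping CPU, otherwise
 * fall back to any housekeeping CPU.
 */
int pick_target(const struct cpu_model *cpus, int ncpus, int local)
{
	int fallback = -1;

	if (cpus[local].housekeeping && !cpus[local].idle)
		return local;

	for (int i = 0; i < ncpus; i++) {
		if (!cpus[i].housekeeping)
			continue;
		if (!cpus[i].idle)
			return i;
		if (fallback < 0)
			fallback = i;
	}

	if (cpus[local].housekeeping || fallback < 0)
		return local;
	return fallback;
}

/*
 * Decide where the timer ends up. The move is refused when the timer
 * would expire before the earliest event on the target, because the
 * remote CPU's clock event device cannot be reprogrammed from here.
 */
int place_timer(const struct cpu_model *cpus, int ncpus, int local,
		uint64_t expires)
{
	int target = pick_target(cpus, ncpus, local);

	if (target != local && expires < cpus[target].expires_next)
		return local;
	return target;
}
```

In this model, with CPU 0 idle and its next event 4 ms away, a timer due in 200 us that is armed from isolated CPU 3 stays on CPU 3. Once CPU 0 is busy and has an event due in 100 us, the same timer is placed on CPU 0. The numbers are illustrative only.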