Date: Mon, 21 Aug 2017 17:44:49 +0200
From: Radim Krčmář <rkrcmar@redhat.com>
To: Konrad Rzeszutek Wilk
Cc: Lan Tianyu, David Hildenbrand, pbonzini@redhat.com, tglx@linutronix.de,
    mingo@redhat.com, hpa@zytor.com, x86@kernel.org, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM/x86: Increase max vcpu number to 352
Message-ID: <20170821154445.GE20100@flask>
In-Reply-To: <20170818135758.GE11671@char.us.oracle.com>

2017-08-18 09:57-0400, Konrad Rzeszutek Wilk:
> On Tue, Aug 15, 2017 at 06:13:29PM +0200, Radim
> Krčmář wrote:
> > (Missed this mail before my last reply.)
> >
> > 2017-08-15 10:10-0400, Konrad Rzeszutek Wilk:
> > > On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> > > > On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> > > > > Migration with 352 CPUs all being busy dirtying memory and also poking
> > > > > at various I/O ports (say all of them dirtying the VGA) is no problem?
> > > >
> > > > This depends on what kind of workload is running during migration. I
> > > > think this may affect service downtime, since there may be a lot of
> > > > dirty memory data to transfer after stopping the vcpus. This also
> > > > depends on how the user sets "migrate_set_downtime" for qemu. But I
> > > > don't think increasing the vcpu count will break the migration
> > > > function.
> > >
> > > OK, so let me take a step back.
> > >
> > > I see this nice 'supported' CPU count that is exposed in the kvm module.
> > >
> > > Then there is QEMU throwing out a warning if you crank up the CPU count
> > > above that number.
> >
> > I find the range between "recommended max" and "hard max" VCPU count
> > confusing at best ... IIUC, it was there because KVM internals had
> > problems with scaling, and we will hit more in the future because some
> > loops are still linear in VCPU count.
>
> Is that documented somewhere? There are some folks who would be interested
> in looking at that if it were known what exactly to look for.

Not really. Documentation/virtual/kvm/api.txt says:

  The recommended max_vcpus value can be retrieved using the
  KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

and "recommended" is not explained any further. We can only state that
the value has no connection with userspace functionality, because it is
provided by KVM.

Red Hat was raising KVM_CAP_NR_VCPUS after testing on a machine that had
enough physical cores. (PLE had to be slightly optimized when going to
240.)
> > The exposed value doesn't say whether migration will work, because that
> > is a userspace thing and we're not aware of bottlenecks on the KVM side.
> >
> > > Red Hat's web pages talk about CPU count as well.
> > >
> > > And I am assuming all of those are around what has been tested and
> > > what has been shown to work. And one of those test cases surely must
> > > be migration.
> >
> > Right, Red Hat will only allow/support what it has tested, even if
> > upstream has a practically unlimited count. I think the upstream number
> > used to be raised by Red Hat, which is why upstream isn't at the hard
> > implementation limit ...
>
> Aim for the sky! Perhaps then let's crank it up to 4096 upstream and let
> each vendor/distro/cloud decide the right number based on their
> testing.

And hit the ceiling. :) NR_CPUS seems like a good number upstream.