Date: Mon, 24 Apr 2017 22:03:46 +0200
From: Radim Krčmář <rkrcmar@redhat.com>
To: Cornelia Huck
Cc: David Hildenbrand, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Christoffer Dall, Marc Zyngier, Paolo Bonzini, Christian Borntraeger,
    James Hogan, Paul Mackerras, Alexander Graf
Subject: Re: [PATCH 0/4] KVM: add KVM_CREATE_VM2 to allow dynamic kvm->vcpus array
Message-ID: <20170424200346.GD5713@potion>
References: <20170413201951.11939-1-rkrcmar@redhat.com> <20170418142903.44973c86.cornelia.huck@de.ibm.com>
In-Reply-To: <20170418142903.44973c86.cornelia.huck@de.ibm.com>

2017-04-18 14:29+0200, Cornelia Huck:
> On Tue, 18 Apr 2017 13:11:55 +0200
> David Hildenbrand wrote:
>> On 13.04.2017 22:19, Radim Krčmář wrote:
>> > new KVM_MAX_CONFIGURABLE_VCPUS, probably directly to INT_MAX/KVM_VCPU_ID, so we
>> > don't have to worry about it for a while.
>> >
>> > PPC should be interested in this as they set KVM_MAX_VCPUS to NR_CPUS
>> > and probably waste a few pages for every guest this way.
>>
>> As we just store pointers, this should be a maximum of 4 pages for ppc
>> (4k pages). Is this really worth yet another VM creation ioctl? Is
>> there not a nicer way to handle this internally?
>>
>> An alternative might be to simply realloc the array when it reaches a
>> certain size (on VCPU creation, maybe protecting the pointer via rcu).
>> But not sure if something like that could work.
>
> I like that idea better, if it does work (I think it should be doable).
> If we just double the array size every time we run out of space, we
> should be able to do this with few reallocations. That also has the
> advantage of being transparent to user space (other than the increased
> number of vcpus).

Yes, reallocating would work with protection against use-after-free, and
RCU fits well.  (Readers don't hold any lock we could piggyback on.)

I didn't go for it because of C: the kvm_for_each_vcpu macro would be
less robust if it included the locking around its body -- nested fors
are susceptible to return/goto errors inside the loop body -- and we'd
need to obfuscate several existing users of that pattern.  Open-coding
the protection everywhere pollutes the code too, IMO.
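Roughly, the RCU variant I considered would look like this -- just a
sketch with made-up names (kvm->vcpu_array and friends don't exist),
not something from this series:

struct kvm_vcpu_array {
        u32 capacity;
        struct kvm_vcpu *vcpus[];       /* flexible array of pointers */
};

/* Update side, under kvm->lock in VCPU creation: allocate a bigger
 * copy, publish it, wait for readers, free the old one.  Doubling the
 * capacity keeps reallocations rare. */
static int grow_vcpu_array(struct kvm *kvm, u32 needed)
{
        struct kvm_vcpu_array *old, *new;
        u32 cap;

        old = rcu_dereference_protected(kvm->vcpu_array,
                                        lockdep_is_held(&kvm->lock));
        cap = old ? old->capacity : 0;
        if (needed <= cap)
                return 0;

        cap = max(needed, cap ? 2 * cap : 16U);
        new = kzalloc(sizeof(*new) + cap * sizeof(new->vcpus[0]),
                      GFP_KERNEL);
        if (!new)
                return -ENOMEM;

        new->capacity = cap;
        if (old)
                memcpy(new->vcpus, old->vcpus,
                       old->capacity * sizeof(old->vcpus[0]));

        rcu_assign_pointer(kvm->vcpu_array, new);
        synchronize_rcu();      /* wait until no reader can still see
                                 * the old array */
        kfree(old);
        return 0;
}

/* Read side: only the array is protected here -- the VCPUs themselves
 * are not freed before VM destruction, so that is enough. */
static struct kvm_vcpu *kvm_get_vcpu_rcu(struct kvm *kvm, u32 i)
{
        struct kvm_vcpu_array *a;
        struct kvm_vcpu *vcpu = NULL;

        rcu_read_lock();
        a = rcu_dereference(kvm->vcpu_array);
        if (a && i < a->capacity)
                vcpu = a->vcpus[i];
        rcu_read_unlock();
        return vcpu;
}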
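And this is the robustness problem with hiding the locking inside the
iterator macro; kvm_get_vcpu_raw() stands for a hypothetical unlocked
lookup:

/* Hypothetical self-locking iterator, NOT proposed anywhere: the comma
 * operators take rcu_read_lock() before the first iteration and drop
 * it when the loop terminates normally ... */
#define kvm_for_each_vcpu_hidden(idx, vcpup, kvm)                      \
        for (rcu_read_lock(), (idx) = 0;                               \
             ((vcpup) = kvm_get_vcpu_raw(kvm, (idx))) != NULL ||       \
             (rcu_read_unlock(), 0);                                   \
             (idx)++)

/* ... but any return/goto/break in the body skips the unlock and
 * leaks an RCU read-side critical section: */
kvm_for_each_vcpu_hidden(i, vcpu, kvm)
        if (vcpu->vcpu_id == id)
                return vcpu;    /* BUG: still inside rcu_read_lock() */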
A lock-less list would solve those problems, but we access the VCPUs by
index, which makes a list suboptimal in the other direction: using the
list for kvm_for_each_vcpu and adding an RCU-protected array for
kvm_get_vcpu and kvm_get_vcpu_by_id looks like over-engineering, as we
wouldn't save memory, performance, or lines of code by doing that.

I didn't see a way to untangle kvm->vcpus that would allow a nice
runtime-dynamic variant.

We currently don't need to pass more information at VM creation time
either, so I was also thinking of hijacking the parameter to
KVM_CREATE_VM for a factor-of-2 VCPU count (20 bits would last a
while), but that would already be a new interface, and a new ioctl that
does a superset of the old one seemed much better.  (A sketch of that
encoding is at the bottom of this mail.)

I agree that the idea is questionable.  I'll redo the series and bump
KVM_MAX_VCPUS unless you think that the dynamic allocation could be
done nicely.  (The memory saving is a minuscule fraction of a VM's
size, and if we do big increments in KVM_MAX_VCPUS, the motivation is
gone.)

Thanks.
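For completeness, the KVM_CREATE_VM hijack could have looked something
like this -- the bit layout and the macro names are purely made up, one
possible reading of "20 bits", and nothing implements them:

/* Low bits keep today's machine type, the next 20 bits carry the
 * requested VCPU count, 0 meaning "use KVM_MAX_VCPUS as before". */
#define KVM_VM_TYPE_MASK        0x00000fffUL
#define KVM_VM_MAX_VCPUS_SHIFT  12
#define KVM_VM_MAX_VCPUS_MASK   (0xfffffUL << KVM_VM_MAX_VCPUS_SHIFT)

/* userspace: default machine type, room for up to 4096 VCPUs */
vm_fd = ioctl(kvm_fd, KVM_CREATE_VM,
              0 | (4096UL << KVM_VM_MAX_VCPUS_SHIFT));

/* kernel, in kvm_dev_ioctl_create_vm(type): */
max_vcpus = (type & KVM_VM_MAX_VCPUS_MASK) >> KVM_VM_MAX_VCPUS_SHIFT;
if (!max_vcpus)
        max_vcpus = KVM_MAX_VCPUS;                      /* old behavior */
else
        max_vcpus = roundup_pow_of_two(max_vcpus);      /* "factor of 2" */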