Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754905AbbFQKhz (ORCPT ); Wed, 17 Jun 2015 06:37:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39702 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751136AbbFQKhp (ORCPT ); Wed, 17 Jun 2015 06:37:45 -0400 Date: Wed, 17 Jun 2015 12:37:42 +0200 From: Igor Mammedov To: "Michael S. Tsirkin" Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com Subject: Re: [PATCH 3/5] vhost: support upto 509 memory regions Message-ID: <20150617123742.5c3fec30@nial.brq.redhat.com> In-Reply-To: <20150617110711-mutt-send-email-mst@redhat.com> References: <1434472419-148742-1-git-send-email-imammedo@redhat.com> <1434472419-148742-4-git-send-email-imammedo@redhat.com> <20150616231201-mutt-send-email-mst@redhat.com> <20150617000056.481e4a96@igors-macbook-pro.local> <20150617083202-mutt-send-email-mst@redhat.com> <20150617092802.5c8d8475@igors-macbook-pro.local> <20150617092910-mutt-send-email-mst@redhat.com> <20150617105421.71751f44@nial.brq.redhat.com> <20150617110711-mutt-send-email-mst@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6429 Lines: 149 On Wed, 17 Jun 2015 12:11:09 +0200 "Michael S. Tsirkin" wrote: > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote: > > On Wed, 17 Jun 2015 09:39:06 +0200 > > "Michael S. Tsirkin" wrote: > > > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote: > > > > On Wed, 17 Jun 2015 08:34:26 +0200 > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote: > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200 > > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote: > > > > > > > > since commit > > > > > > > > 1d4e7e3 kvm: x86: increase user memory slots to 509 > > > > > > > > > > > > > > > > it became possible to use a bigger amount of memory > > > > > > > > slots, which is used by memory hotplug for > > > > > > > > registering hotplugged memory. > > > > > > > > However QEMU crashes if it's used with more than ~60 > > > > > > > > pc-dimm devices and vhost-net since host kernel > > > > > > > > in module vhost-net refuses to accept more than 65 > > > > > > > > memory regions. > > > > > > > > > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509 > > > > > > > > > > > > > > It was 64, not 65. > > > > > > > > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net. > > > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > > > > > > > Still thinking about this: can you reorder this to > > > > > > > be the last patch in the series please? > > > > > > sure > > > > > > > > > > > > > > > > > > > > Also - 509? > > > > > > userspace memory slots in terms of KVM, I made it match > > > > > > KVM's allotment of memory slots for userspace side. > > > > > > > > > > Maybe KVM has its reasons for this #. I don't see > > > > > why we need to match this exactly. > > > > np, I can cap it at safe 300 slots but it's unlikely that it > > > > would take cut off 1 extra hop since it's capped by QEMU > > > > at 256+[initial fragmented memory] > > > > > > But what's the point? We allocate 32 bytes per slot. > > > 300*32 = 9600 which is more than 8K, so we are doing > > > an order-3 allocation anyway. > > > If we could cap it at 8K (256 slots) that would make sense > > > since we could avoid wasting vmalloc space. > > 256 is amount of hotpluggable slots and there is no way > > to predict how initial memory would be fragmented > > (i.e. amount of slots it would take), if we guess wrong > > we are back to square one with crashing userspace. > > So I'd stay consistent with KVM's limit 509 since > > it's only limit, i.e. not actual amount of allocated slots. > > > > > I'm still not very happy with the whole approach, > > > giving userspace ability allocate 4 whole pages > > > of kernel memory like this. > > I'm working in parallel so that userspace won't take so > > many slots but it won't prevent its current versions > > crashing due to kernel limitation. > > Right but at least it's not a regression. If we promise userspace to > support a ton of regions, we can't take it back later, and I'm concerned > about the memory usage. > > I think it's already safe to merge the binary lookup patches, and maybe > cache and vmalloc, so that the remaining patch will be small. it isn't regression with switching to binary search and increasing slots to 509 either performance wise it's more on improvment side. And I was thinking about memory usage as well, that's why I've dropped faster radix tree in favor of more compact array, can't do better on kernel side of fix. Yes we will give userspace to ability to use more slots/and lock up more memory if it's not able to consolidate memory regions but that leaves an option for user to run guest with vhost performance vs crashing it at runtime. userspace/targets that could consolidate memory regions should do so and I'm working on that as well but that doesn't mean that users shouldn't have a choice. So far it's kernel limitation and this patch fixes crashes that users see now, with the rest of patches enabling performance not to regress. > > > > > > > > > > I think if we are changing this, it'd be nice to > > > > > > > create a way for userspace to discover the support > > > > > > > and the # of regions supported. > > > > > > That was my first idea before extending KVM's memslots > > > > > > to teach kernel to tell qemu this number so that QEMU > > > > > > at least would be able to check if new memory slot could > > > > > > be added but I was redirected to a more simple solution > > > > > > of just extending vs everdoing things. > > > > > > Currently QEMU supports upto ~250 memslots so 509 > > > > > > is about twice high we need it so it should work for near > > > > > > future > > > > > > > > > > Yes but old kernels are still around. Would be nice if you > > > > > can detect them. > > > > > > > > > > > but eventually we might still teach kernel and QEMU > > > > > > to make things more robust. > > > > > > > > > > A new ioctl would be easy to add, I think it's a good > > > > > idea generally. > > > > I can try to do something like this on top of this series. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > drivers/vhost/vhost.c | 2 +- > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > > > > > index 99931a0..6a18c92 100644 > > > > > > > > --- a/drivers/vhost/vhost.c > > > > > > > > +++ b/drivers/vhost/vhost.c > > > > > > > > @@ -30,7 +30,7 @@ > > > > > > > > #include "vhost.h" > > > > > > > > > > > > > > > > enum { > > > > > > > > - VHOST_MEMORY_MAX_NREGIONS = 64, > > > > > > > > + VHOST_MEMORY_MAX_NREGIONS = 509, > > > > > > > > VHOST_MEMORY_F_LOG = 0x1, > > > > > > > > }; > > > > > > > > > > > > > > > > -- > > > > > > > > 1.8.3.1 > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/