Date: Wed, 17 Jun 2015 00:19:15 +0200
From: Igor Mammedov
To: "Michael S. Tsirkin"
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH 0/5] vhost: support upto 509 memory regions
Message-ID: <20150617001915.23f062b0@igors-macbook-pro.local>
In-Reply-To: <20150616231505-mutt-send-email-mst@redhat.com>
References: <1434472419-148742-1-git-send-email-imammedo@redhat.com>
 <20150616231505-mutt-send-email-mst@redhat.com>

On Tue, 16 Jun 2015 23:16:07 +0200
"Michael S. Tsirkin" wrote:

> On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote:
> > This series extends vhost to support up to 509 memory regions,
> > and adds some vhost translate_desc() performance improvements
> > so it won't regress when memslots are increased to 509.
> >
> > It fixes a running VM crashing during memory hotplug due to
> > vhost refusing to accept more than 64 memory regions.
> >
> > It's a host-kernel-side-only fix to make things work with QEMU
> > versions that support memory hotplug. But I'll continue to work
> > on a QEMU-side solution to reduce the number of memory regions
> > and make things even better.
>
> I'm concerned userspace work will be harder; in particular,
> performance gains will be harder to measure.

it appears so, so far.

> How about a flag to disable caching?
I've tried to measure the cost of a cache miss, but without much luck:
the difference between the version with the cache and the one with
caching removed was within the margin of error (±10ns), i.e. not
measurable on my 5min/10*10^6 test workload.

Also I'm concerned that adding an extra fetch+branch for checking the
flag would make things worse on the likely cache-hit path, so I'd
avoid it if possible.

Or do you mean a simple global per-module flag to disable it, wrapped
in a static key, so that skipping the cache is a cheap jump?

> > Performance-wise, for a guest with (in my case) 3 memory regions
> > and netperf's UDP_RR workload, translate_desc() execution time
> > as a share of the total workload is:
> >
> > Memory      |1G RAM|cached|non cached
> > regions #   |   3  |  53  |    53
> > ------------------------------------
> > upstream    | 0.3% |   -  |   3.5%
> > ------------------------------------
> > this series | 0.2% | 0.5% |   0.7%
> >
> > where the "non cached" column reflects a thrashing workload
> > with a constant cache miss. More details on timing are in the
> > respective patches.
> >
> > Igor Mammedov (5):
> >   vhost: use binary search instead of linear in find_region()
> >   vhost: extend memory regions allocation to vmalloc
> >   vhost: support upto 509 memory regions
> >   vhost: add per VQ memory region caching
> >   vhost: translate_desc: optimization for desc.len < region size
> >
> >  drivers/vhost/vhost.c | 95 +++++++++++++++++++++++++++++++++++++--------------
> >  drivers/vhost/vhost.h |  1 +
> >  2 files changed, 71 insertions(+), 25 deletions(-)
> >
> > --
> > 1.8.3.1