Date: Wed, 02 Mar 2011 15:34:52 +0200
From: Avi Kivity
To: Marcelo Tosatti
CC: Alex Williamson, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, xiaoguangrong@cn.fujitsu.com
Subject: Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memory slots using wbtree
Message-ID: <4D6E477C.7050303@redhat.com>
In-Reply-To: <20110301194703.GA7736@amt.cnet>

On 03/01/2011 09:47 PM, Marcelo Tosatti wrote:
> On Sun, Feb 27, 2011 at 11:54:29AM +0200, Avi Kivity wrote:
> > On 02/24/2011 07:35 PM, Alex Williamson wrote:
> > > On Thu, 2011-02-24 at 12:06 +0200, Avi Kivity wrote:
> > > > On 02/23/2011 09:28 PM, Alex Williamson wrote:
> > > > > I had forgotten about <1M mem, so actually the slot configuration was:
> > > > >
> > > > > 0: <1M
> > > > > 1: 1M - 3.5G
> > > > > 2: 4G+
> > > > >
> > > > > I stacked the deck in favor of the static array (0: 4G+, 1: 1M-3.5G,
> > > > > 2: <1M), and got these kernbench results:
> > > > >
> > > > >            base (stdev)     reorder (stdev)  wbtree (stdev)
> > > > > --------+-----------------+----------------+----------------+
> > > > > Elapsed |  42.809 (0.19)  |  42.160 (0.22) |  42.305 (0.23) |
> > > > > User    | 115.709 (0.22)  | 114.358 (0.40) | 114.720 (0.31) |
> > > > > System  |  41.605 (0.14)  |  40.741 (0.22) |  40.924 (0.20) |
> > > > > %cpu    |  366.9 (1.45)   |  367.4 (1.17)  |  367.6 (1.51)  |
> > > > > context | 7272.3 (68.6)   | 7248.1 (89.7)  | 7249.5 (97.8)  |
> > > > > sleeps  | 14826.2 (110.6) | 14780.7 (86.9) | 14798.5 (63.0) |
> > > > >
> > > > > So, wbtree is only slightly behind reordering, and the standard
> > > > > deviation suggests the runs are mostly within the noise of each
> > > > > other.  Thanks,
> > > >
> > > > Doesn't this indicate we should use reordering, instead of a new
> > > > data structure?
> > >
> > > The original problem that brought this on was scaling.  The re-ordered
> > > array still has O(N) scaling while the tree should have ~O(log N) (note
> > > that it currently doesn't because it needs a compaction algorithm added
> > > after insert and remove).  So yes, it's hard to beat the results of a
> > > test that hammers on the first couple of entries of a sorted array, but
> > > I think the tree gives better-than-current performance, and more
> > > predictable performance as the slot count grows.
> >
> > Scaling doesn't matter, only actual performance.  Even a guest with
> > 512 slots would still hammer only on the first few slots, since
> > these will contain the bulk of memory.
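
(To make the "hammer only on the first few slots" point concrete: the lookup
we are arguing about is just a linear scan over the slot array.  A simplified
sketch -- the struct layout and names below are made up for illustration, this
is not the real gfn_to_memslot():)

/*
 * Illustrative only: with the array ordered so the large RAM slots come
 * first, almost every lookup returns after one or two iterations even
 * though the worst case is still O(N).
 */
struct memslot {
	unsigned long base_gfn;	/* first guest frame number in the slot */
	unsigned long npages;	/* slot size in pages */
};

static struct memslot *find_slot(struct memslot *slots, int nslots,
				 unsigned long gfn)
{
	int i;

	for (i = 0; i < nslots; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return &slots[i];

	return NULL;	/* miss: no slot backs this gfn (emulated mmio) */
}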
> > > If we knew when we were searching for which type of data, it would
> > > perhaps be nice if we could use a sorted array for guest memory (since
> > > it's nicely bounded into a small number of large chunks), and a tree
> > > for mmio (where we expect the scaling to be a factor).  Thanks,
> >
> > We have three types of memory:
> >
> > - RAM - a few large slots
> > - mapped mmio (for device assignment) - possibly many small slots
> > - non-mapped mmio (for emulated devices) - no slots
> >
> > The first two are handled in exactly the same way - they're just
> > memory slots.  We expect a lot more hits into the RAM slots, since
> > they're much bigger.  But by far the majority of faults will be for
> > the third category - mapped memory will be hit once per page, then
> > handled by hardware until Linux memory management does something
> > about the page, which should hopefully be rare (with device
> > assignment, rare == never, since those pages are pinned).
> >
> > Therefore our optimization priorities should be
> >
> > - complete miss into the slot list
> > - hit into the RAM slots
> > - hit into the other slots (trailing far behind)
>
> Whatever ordering is considered optimal in one workload can be suboptimal
> in another.  The binary search reduces the number of slots inspected in
> the average case.  Using slot size as the weight favours the slots most
> likely to be hit.

It's really difficult to come up with a workload that causes many hits to
small slots.

> > Of course worst-case performance matters.  For example, we might
> > (not sure) be searching the list with the mmu spinlock held.
> >
> > I think we still have a bit to go before we can justify the new data
> > structure.
>
> Intensive IDE disk IO on a guest with lots of assigned network devices, 3%
> improvement on netperf with rtl8139, 1% improvement on kernbench?
>
> I fail to see the justification for not using it.

By itself it's great, but the miss cache will cause the code to be called
very rarely.  So I prefer the sorted array, which is simpler (and faster for
the few-large-slots case).
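
To spell out what the sorted array means in practice: the array is reordered
by slot size when a slot is registered, so the scan above hits the big RAM
slots first.  Roughly (again only a sketch under that assumption, reusing
struct memslot from the sketch above; this is not the actual reordering
patch):

/*
 * Illustrative only: keep the array ordered by size, largest first, when a
 * slot is added.  Assumes the caller has ensured there is room for one more
 * entry.
 */
static void add_slot_sorted(struct memslot *slots, int *nslots,
			    const struct memslot *new)
{
	int i = *nslots;

	/* shift smaller slots down until the right position is found */
	while (i > 0 && slots[i - 1].npages < new->npages) {
		slots[i] = slots[i - 1];
		i--;
	}
	slots[i] = *new;
	(*nslots)++;
}

Slot registration is rare, so the extra work on insert doesn't matter; the
hot lookup path stays a trivial scan.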