Date: Mon, 22 Jun 2009 13:41:19 -0700 (PDT)
From: Dan Magenheimer
To: Martin Schwidefsky
Cc: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com,
    npiggin@suse.de, chris.mason@oracle.com, kurt.hackel@oracle.com,
    dave.mccracken@oracle.com, Avi Kivity, jeremy@goop.org,
    Rik van Riel, alan@lxorguk.ukuu.org.uk, Rusty Russell, akpm@osdl.org,
    Marcelo Tosatti, Balbir Singh, tmem-devel@oss.oracle.com,
    sunil.mushran@oracle.com, linux-mm@kvack.org, Himanshu Raj
Subject: RE: [RFC] transcendent memory for Linux

> > Tmem has some similarity to IBM's Collaborative Memory Management,
> > but creates more of a partnership between the kernel and the
> > "privileged entity" and is not very invasive. Tmem may be
> > applicable for KVM and containers; there is some disagreement on
> > the extent of its value.
> > Tmem is highly complementary to ballooning
> > (aka page granularity hot plug) and memory deduplication (aka
> > transparent content-based page sharing) but still has value
> > when neither is present.

Hi Martin --

Thanks much for taking the time to reply!

> The basic idea seems to be that you reduce the amount of memory
> available to the guest and as a compensation give the guest some
> tmem, no?

That's mostly right. Tmem's primary role is to help guests that have
had their available memory reduced (via ballooning, hotplug, or some
future mechanism). However, tmem also provides a way to offer
otherwise unused-by-the-hypervisor ("fallow") memory to a guest,
essentially expanding a guest kernel's page cache if no other guest
is using the RAM anyway.

And "as a compensation GIVE the guest some tmem" is misleading,
because tmem (at least ephemeral tmem) is never "given" to a guest.
A better word might be "loaned" or "rented". The guest gets to use
some tmem for a while, but if it doesn't use it effectively, the
memory is "repossessed" (or the guest is "evicted" from using that
memory) transparently so that it can be used more effectively
elsewhere.

> If that is the case then the effect of tmem is somewhat
> comparable to the volatile page cache pages.

There is definitely some similarity in that both provide useful
information to the hypervisor. In CMM's case, the guest passively
provides info; in tmem's case, it actively provides info and makes
use of the info within the kernel, not just in the hypervisor, which
is why I described it as "more of a partnership".

> The big advantage of this approach is its simplicity, but there
> are down sides as well:
> 1) You need to copy the data between the tmem pool and the page
> cache. At least temporarily there are two copies of the same
> page around. That increases the total amount of used memory.

Certainly this is theoretically true, but I think the increase is
small and transient.
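A minimal sketch in C of the ephemeral ("loaned") semantics and the transient double copy discussed above. All names here (`eph_pool`, `eph_put`, `eph_get`, `eph_repossess`), the tiny fixed capacity, and the FIFO repossession rule are hypothetical illustrations, not the actual tmem or Xen interface: a put copies the page into memory the guest does not own, the pool can reclaim entries without telling the guest, and a get may therefore miss, in which case the kernel would read the page from disk as it does today.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define POOL_SLOTS 4 /* hypothetical: tiny so that repossession is easy to see */

/* One cached page, keyed by a hypothetical (object, index) pair. */
struct eph_entry {
    int used;
    unsigned long obj, index;
    unsigned long seq;                 /* put order, for FIFO repossession */
    unsigned char data[PAGE_SIZE];     /* pool-owned copy of the page */
};

struct eph_pool {
    struct eph_entry slot[POOL_SLOTS];
    unsigned long next_seq;
};

static struct eph_entry *eph_find(struct eph_pool *p, unsigned long obj,
                                  unsigned long index)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p->slot[i].used && p->slot[i].obj == obj &&
            p->slot[i].index == index)
            return &p->slot[i];
    return NULL;
}

static struct eph_entry *eph_free_slot(struct eph_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (!p->slot[i].used)
            return &p->slot[i];
    return NULL;
}

/* "Hypervisor" side: take back the oldest entry. The guest is not told;
 * it simply misses on a later get. */
static void eph_repossess(struct eph_pool *p)
{
    struct eph_entry *victim = NULL;
    for (int i = 0; i < POOL_SLOTS; i++)
        if (p->slot[i].used && (!victim || p->slot[i].seq < victim->seq))
            victim = &p->slot[i];
    if (victim)
        victim->used = 0;
}

/* Guest side: called just before the kernel reuses a clean page frame.
 * The data is copied, so two copies exist until the frame is reused. */
static void eph_put(struct eph_pool *p, unsigned long obj,
                    unsigned long index, const unsigned char *page)
{
    struct eph_entry *e = eph_find(p, obj, index);
    if (!e) {
        e = eph_free_slot(p);
        if (!e) {                      /* pool full: repossess the oldest */
            eph_repossess(p);
            e = eph_free_slot(p);
        }
        assert(e != NULL);
        e->used = 1;
        e->obj = obj;
        e->index = index;
    }
    e->seq = p->next_seq++;            /* a re-put counts as the newest */
    memcpy(e->data, page, PAGE_SIZE);
}

/* Guest side, on refault: returns 1 and copies the page back on a hit,
 * 0 on a miss (the caller then reads from disk as usual). */
static int eph_get(struct eph_pool *p, unsigned long obj,
                   unsigned long index, unsigned char *page)
{
    struct eph_entry *e = eph_find(p, obj, index);
    if (!e)
        return 0;
    memcpy(page, e->data, PAGE_SIZE);
    return 1;
}
```

Because the guest never holds a pointer into pool memory, repossessing an entry requires no changes to guest mappings or TLB entries; the price in this sketch is one memcpy() on each put and on each successful get.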
The kernel only puts the page into precache when it has decided to
use that page for another purpose (due to memory pressure). Until it
actually "reprovisions" the page, the data is briefly duplicated. On
the other hand, copying eliminates the need for fancy games with
virtual mappings and TLB entries. Copying appears to be getting much
faster on recent CPUs; I'm not sure whether this is also true of TLB
operations.

> 2) The guest has a smaller memory size. Either the memory is
> large enough for the working set size in which case tmem is
> ineffective...

Yes, if the kernel has memory to "waste" (e.g., it never refaults and
never swaps), tmem is ineffective. The goal of tmem is to optimize
memory usage across an environment where multiple users (guests)
contend for a limited resource (RAM). If your environment always has
enough RAM for every guest and there's never any contention, you
don't want tmem... but I'd assert you've wasted money in your data
center by buying too much RAM!

> or the working set does not fit which increases
> the memory pressure and the cpu cycles spent in the mm code.

True; this is where preswap is useful. Without tmem/preswap, "does
not fit" means swapping to disk or refaulting is required. Preswap
alleviates the memory pressure by using tmem to essentially swap to
"magic memory", and precache reduces the need for refaulting.

> 3) There is an additional turning knob, the size of the tmem pool
> for the guest. I see the need for a clever algorithm to determine
> the size for the different tmem pools.

Yes, some policy in the hypervisor is still required, essentially a
"memory scheduler". The working implementation (in Xen) uses FIFO,
modified by admin-configurable "weight" values to allow QoS and
avoid DoS.

> Overall I would say it's worthwhile to investigate the performance
> impacts of the approach.

Thanks.
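On the "memory scheduler" from point 3: below is a hypothetical sketch in C of one way FIFO could be combined with per-guest weights. It is an assumption for illustration only, not the policy Xen actually implements; `struct guest`, `guest_push`, `evict_one`, and `MAX_PAGES` are invented names. When the hypervisor needs a page back, it picks the guest holding the most ephemeral pages per unit of weight and evicts that guest's oldest page.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 64 /* hypothetical per-guest queue capacity */

/* Per-guest ephemeral tmem state (hypothetical): a FIFO ring of page ids
 * plus an admin-configured weight. */
struct guest {
    unsigned weight;            /* admin-configurable; should be > 0 */
    unsigned long fifo[MAX_PAGES];
    size_t head;                /* index of the oldest page */
    size_t count;               /* pages currently held */
};

/* Record a newly put page at the tail of the guest's FIFO.
 * Returns 0 on success, -1 if this guest's queue is full. */
static int guest_push(struct guest *g, unsigned long page_id)
{
    if (g->count == MAX_PAGES)
        return -1;
    g->fifo[(g->head + g->count) % MAX_PAGES] = page_id;
    g->count++;
    return 0;
}

/* Choose the guest with the highest count/weight ratio, i.e. furthest above
 * its weighted share, and evict its oldest page into *evicted.
 * Ratios are compared by cross-multiplying to avoid division.
 * Returns the victim guest's index, or -1 if no guest holds any pages. */
static int evict_one(struct guest *g, size_t n, unsigned long *evicted)
{
    int victim = -1;

    assert(evicted != NULL);
    for (size_t i = 0; i < n; i++) {
        if (g[i].count == 0)
            continue;
        if (victim < 0 ||
            (unsigned long long)g[i].count * g[victim].weight >
            (unsigned long long)g[victim].count * g[i].weight)
            victim = (int)i;
    }
    if (victim < 0)
        return -1;

    struct guest *v = &g[victim];
    *evicted = v->fifo[v->head];          /* FIFO: oldest page goes first */
    v->head = (v->head + 1) % MAX_PAGES;
    v->count--;
    return victim;
}
```

Under this rule, a guest that floods tmem with puts raises its own pages-per-weight ratio and so is the first to lose pages, which is one way weights could provide QoS and keep a single guest from starving the others.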
I'd appreciate any thoughts or experience you have in this area
(on-list or off-list), as I don't think there are any adequate
benchmarks that aren't either myopic for a complex environment or
contrived (and thus misleading) to prove an isolated point. I would
also guess that tmem is more beneficial on recent multi-core
processors, and more costly on older chips.

Thanks again,
Dan