Date: Wed, 23 Jan 2008 13:04:46 +0100
From: Andrea Arcangeli
To: Robin Holt
Cc: Avi Kivity, Christoph Lameter, Izik Eidus, Andrew Morton,
    Nick Piggin, kvm-devel@lists.sourceforge.net,
    Benjamin Herrenschmidt, steiner@sgi.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    daniel.blueman@quadrics.com, Hugh Dickins
Subject: Re: [kvm-devel] [PATCH] export notifier #1
Message-ID: <20080123120446.GF15848@v2.random>
References: <20080117193252.GC24131@v2.random>
    <20080121125204.GJ6970@v2.random> <4795F9D2.1050503@qumranet.com>
    <20080122144332.GE7331@v2.random> <20080122200858.GB15848@v2.random>
    <20080122223139.GD15848@v2.random> <479716AD.5070708@qumranet.com>
    <20080123105246.GG26420@sgi.com>
In-Reply-To: <20080123105246.GG26420@sgi.com>

On Wed, Jan 23, 2008 at 04:52:47AM -0600, Robin Holt wrote:
> But 100 callouts holding spinlocks will not work for our implementation
> and even if the callouts are made with spinlocks released, we would very
> strongly prefer a single callout which messages the range to the other
> side.

But you take the physical address and turn into mm+va with your rmap...

> > Also, our rmap key for finding the spte is keyed on (mm, va). I imagine
> > most RDMA cards are similar.
>
> For our RDMA rmap, it is based upon physical address.

so why do you turn it into mm+va?

> >> There is only the need to walk twice for pages that are marked Exported.
> >> And the double walk is only necessary if the exporter does not have its
> >> own rmap. The cross partition thing that we are doing has such an rmap and
> >> its a matter of walking the exporters rmap to clear out the external
> >> references and then we walk the local rmaps. All once.
> >>
> >
> > The problem is that external mmus need a reverse mapping structure to
> > locate their ptes. We can't expand struct page so we need to base it on mm
> > + va.
>
> Our rmap takes a physical address and turns it into mm+va.

Why don't you stick to mm+va and use get_user_pages and let the VM do
the swapins etc...?

> > Can they wait on that bit?
>
> PageLocked(page) should work, right? We already have a backoff
> mechanism so we expect to be able to adapt it to include a
> PageLocked(page) check.

It's not PageLocked but wait_on_page___not___exported() called on the
master node. Plus nothing in the VM of the master node calls
SetPageExported... good luck to make it work (KVM swapping OTOH works
like a charm already w/o the backwards secondary-TLB-flushing order).
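
The difference between per-page callouts and a single range callout, with
an rmap keyed on (mm, va), can be illustrated with a toy userspace model.
This is only a sketch under stated assumptions, not kernel code: ToyRmap,
map_page, invalidate_page, invalidate_range and the 4096-byte page size
are hypothetical names and values chosen for illustration, and they do not
correspond to any interface in the patch under discussion.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyRmap:
    """Toy reverse map for a secondary MMU, keyed on (mm, va)."""

    def __init__(self):
        self.sptes = {}    # (mm, va) -> opaque secondary pte
        self.callouts = 0  # number of invalidate callouts received

    def map_page(self, mm, va, spte):
        """Record one external mapping for (mm, va)."""
        self.sptes[(mm, va)] = spte

    def invalidate_page(self, mm, va):
        """Per-page callout: drops at most one mapping per call."""
        self.callouts += 1
        return self.sptes.pop((mm, va), None) is not None

    def invalidate_range(self, mm, start, end):
        """Single callout covering the half-open range [start, end)."""
        self.callouts += 1
        victims = [k for k in self.sptes
                   if k[0] == mm and start <= k[1] < end]
        for key in victims:
            del self.sptes[key]
        return len(victims)


def demo(npages=100, base=0x10000):
    """Tear down npages mappings both ways and compare callout counts."""
    per_page, ranged = ToyRmap(), ToyRmap()
    for i in range(npages):
        va = base + i * PAGE_SIZE
        per_page.map_page("mm0", va, i)
        ranged.map_page("mm0", va, i)

    # Hypothetical per-page teardown: one callout for every page.
    for i in range(npages):
        per_page.invalidate_page("mm0", base + i * PAGE_SIZE)

    # Hypothetical range teardown: one callout for the whole region.
    dropped = ranged.invalidate_range("mm0", base, base + npages * PAGE_SIZE)
    return per_page.callouts, ranged.callouts, dropped, len(ranged.sptes)
```

With the defaults, demo() tears down the same 100 mappings with 100
callouts in the per-page case and one callout in the range case. The model
ignores locking entirely, which is the actual point of contention in the
thread.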