Date: Tue, 18 Aug 2015 15:06:36 -0400
From: Jerome Glisse
To: Dan Williams
Cc: "linux-kernel@vger.kernel.org", Boaz Harrosh, Rik van Riel, "linux-nvdimm@lists.01.org", Dave Hansen, david, Ingo Molnar, Linux MM, Ingo Molnar, Mel Gorman, "H. Peter Anvin", Ross Zwisler, "torvalds@linux-foundation.org", Christoph Hellwig
Subject: Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory"
Message-ID: <20150818190634.GB7424@gmail.com>
References: <20150813035005.36913.77364.stgit@otcpl-skl-sds-2.jf.intel.com> <20150814213714.GA3265@gmail.com> <20150814220605.GB3265@gmail.com> <20150817214554.GA5976@gmail.com> <20150818165532.GA7424@gmail.com>

On Tue, Aug 18, 2015 at 10:23:38AM -0700, Dan Williams wrote:
> On Tue, Aug 18, 2015 at 9:55 AM, Jerome Glisse wrote:
> > On Mon, Aug 17, 2015 at 05:46:43PM -0700, Dan Williams wrote:
> >> On Mon, Aug 17, 2015 at 2:45 PM, Jerome Glisse wrote:
> >> > On Fri, Aug 14, 2015 at 07:11:27PM -0700, Dan Williams wrote:
> >> >> Although it does not offer perfect protection if device memory is at a
> >> >> physically lower address than RAM, skipping the update of these
> >> >> variables does seem to be what we want.  For example /dev/mem would
> >> >> fail to allow write access to persistent memory if it fails a
> >> >> valid_phys_addr_range() check.
> >> >> Since /dev/mem does not know how to
> >> >> write to PMEM in a reliably persistent way, it should not treat a
> >> >> PMEM-pfn like RAM.
> >> >
> >> > So attached is a patch that should keep ZONE_DEVICE out of consideration
> >> > for the buddy allocator. You might also want to keep pages reserved and not
> >> > free inside the zone; you could replace generic_online_page() using
> >> > set_online_page_callback() while hotplugging device memory.
> >> >
> >>
> >> Hmm, are we already protected by the fact that ZONE_DEVICE is not
> >> represented in the GFP_ZONEMASK?
> >
> > Yeah, it seems you are right: high_zoneidx (which is derived using gfp_zone())
> > will always limit which zones are considered. I thought that under memory
> > pressure it would go over all of the zonelist entries and eventually consider
> > the device zone. But it doesn't seem to be that way.
> >
> > Keeping the device zone out of the zonelist might still be a good idea, if
> > only to avoid pointless iteration for the page allocator. Unless someone can
> > think of a reason why this would be bad.
> >
>
> The other question I have is whether disabling ZONE_DMA is a realistic
> tradeoff for enabling ZONE_DEVICE?  I.e. can ZONE_DMA default to off
> going forward, lose some ISA device support, or do we need to figure
> out how to enable > 4 zones.

That requires some auditing. From a quick look, it seems to matter for
the s390 arch, and there are still a few drivers that use it. I think we
can forget about the ISA bus; I would be surprised if you could still
run a recent kernel on a computer that has an ISA bus.

Though maybe you don't need a new ZONE_DEVICE, and all you need is a
valid struct page for this device memory, with those pages kept unusable
by the general memory allocator. There are surely other ways to achieve
that, like marking them all as reserved when you hotplug them.
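
To make the two mechanisms discussed in this thread concrete (the
high_zoneidx limit on which zones are walked, and keeping pages reserved
so they are never handed out), here is a rough userspace toy model in C.
It is not kernel code and not the patch mentioned above: every toy_*
name is made up for this sketch, and the real allocator walks zonelists
and free lists rather than a flat array of pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names mimic, but are not, the kernel's types. */
enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_DEVICE, TOY_NR_ZONES };

struct toy_page {
	enum toy_zone zone;
	bool reserved;   /* assumption: stands in for a "reserved" page flag */
	bool allocated;
};

/*
 * Return the first page the toy allocator may hand out, or NULL.
 * Pages in zones above high_zoneidx are never considered, and
 * reserved pages are skipped even when their zone is eligible.
 */
static struct toy_page *toy_alloc(struct toy_page *pages, size_t n,
				  enum toy_zone high_zoneidx)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (p->zone > high_zoneidx)
			continue;	/* zone excluded by the request */
		if (p->reserved || p->allocated)
			continue;	/* valid page, but not available */
		p->allocated = true;
		return p;
	}
	return NULL;
}
```

With a request limited to TOY_ZONE_NORMAL, a page in TOY_ZONE_DEVICE is
never returned, and a reserved page is never returned regardless of its
zone, which is the effect the last paragraph above is after without
needing a dedicated zone.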

Cheers,
Jérôme