Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756562AbbFRUdq (ORCPT ); Thu, 18 Jun 2015 16:33:46 -0400 Received: from mga11.intel.com ([192.55.52.93]:9038 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751501AbbFRUdh (ORCPT ); Thu, 18 Jun 2015 16:33:37 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,641,1427785200"; d="scan'208";a="749233959" Date: Thu, 18 Jun 2015 13:33:35 -0700 From: "Luck, Tony" To: Vlastimil Babka Cc: Xishi Qiu , Andrew Morton , nao.horiguchi@gmail.com, Yinghai Lu , "H. Peter Anvin" , Thomas Gleixner , mingo@elte.hu, Xiexiuqi , Hanjun Guo , Linux MM , LKML Subject: Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations Message-ID: <20150618203335.GA3829@agluck-desk.sc.intel.com> References: <55704A7E.5030507@huawei.com> <557FD5F8.10903@suse.cz> <557FDB9B.1090105@huawei.com> <557FF06A.3020000@suse.cz> <55821D85.3070208@huawei.com> <55825DF0.9090903@suse.cz> <55829149.60807@huawei.com> <5582959E.4080402@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5582959E.4080402@suse.cz> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2678 Lines: 62 On Thu, Jun 18, 2015 at 11:55:42AM +0200, Vlastimil Babka wrote: > >>>If there are many mirror regions in one node, then it will be many holes in the > >>>normal zone, is this fine? > >> > >>Yeah, it doesn't matter how many holes there are. > > > >So mirror zone and normal zone will span each other, right? > > > >e.g. node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal), 16-24G(mirror), 24-28G(normal) ... > >normal: start=4G, size=28-4=24G, > >mirror: start=8G, size=24-8=16G, > > Yes, that works. It's somewhat unfortunate wrt performance that the hardware > does it like this though. With current Xeon h/w you can have one mirrored range per memory controller ... and there are two memory controllers on a cpu socket, so two mirrored ranges per node. So a map might look like: SKT0: MC0: 0-2G Mirrored (but we may want to ignore mirror here to keep it for ZONE_DMA) SKT0: MC0: 2G-4G No memory ... I/O mapping area SKT0: MC0: 4G-34G Not mirrored SKT0: MC1: 34G-40G Mirrored SKT0: MC1: 40G-66G Not mirrored SKT1: MC0: 66G-70G Mirror SKT1: MC0: 70G-98G Not Mirrored SKT1: MC1: 98G-102G Mirror SKT1: MC1: 102G-130G Not Mirrored ... and so on. > >I think zone is defined according to the special address range, like 16M(DMA), 4G(DMA32), > > Traditionally yes. But then there is ZONE_MOVABLE, this year's LSF/MM we > discussed (and didn't outright deny) ZONE_CMA... > I'm not saying others will favour the new zone approach though, it's just my > opinion that it might be a better option than a new migratetype. If we are going to have lots of zones ... then perhaps we will need a fast way to look at a "struct page" and decide which zone it belongs to. Complicated math on the address deosn't sound ideal. If the complex zone model is just for 64-bit, are there enough bits available in page->flags (3 bits for 8 options ... which we are close to filling now ... 4 bits for future breathing room). > >and is it appropriate to add a new mirror zone with a volatile physical address? > > By "volatile" you mean what, that the example above would change > dynamically? That would be rather challenging... If we hot-add another cpu together with on die memory controllers connected to more memory ... then some of the new memory might be mirrored. Current h/w doesn't allow mirrored areas to grow/shrink (though if there are a lot of errors we may break a mirror so a whole range could lose the mirror attribute). -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/