From: "Luck, Tony"
To: Mel Gorman, Ingo Molnar
CC: Xishi Qiu, Andrew Morton, "H. Peter Anvin", Hanjun Guo, Xiexiuqi, "leon@leon.nu", Kamezawa Hiroyuki, "Hansen, Dave", Naoya Horiguchi, Vlastimil Babka, Linux MM, LKML
Subject: RE: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Date: Tue, 30 Jun 2015 18:12:35 +0000
Message-ID: <3908561D78D1C84285E8C5FCA982C28F32AA1974@ORSMSX114.amr.corp.intel.com>
References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> <20150630115353.GB6812@suse.de>
In-Reply-To: <20150630115353.GB6812@suse.de>

> Sounds logical. In that case, bootmem awareness would be crucial.
> Enabling support in just the page allocator is too late.

Andrew already applied some patches from me that I think covered
bootmem mirror allocations:

commit fc6daaf93151877748f8096af6b3fddb147f22d6
    mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute

commit a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95
    mm/memblock: allocate boot time data structures from mirrored memory

commit b05b9f5f9dcf593a0e9327676b78e6c17b4218e8
    x86, mirror: x86 enabling - find mirrored memory ranges

If I missed something, please let me know.

>> In that sense 'protecting' all kernel allocations is natural: we don't know how to
>> recover from faults that affect kernel memory.
>>
>
> It potentially uses all mirrored memory on memory that does not need that
> sort of guarantee. For example, if there was a MC on memory backing the
> inode cache then potentially that is recoverable as long as the inodes
> were not dirty.

Right now this is hard to do. On Intel we get a broadcast machine check
that may catch bystander cpus holding locks that we might need in order
to look at kernel structures and decide what we just lost.

That may get easier with local machine check (only the logical cpu that
tried to consume the corrupt data gets the machine check ... patches for
Linux are in for basic support of this ... waiting for h/w that does it).

> That's a minor detail as the kernel could later protect
> only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
> MC in kernel space could be distinguished from non-fatal checks.

So the immediate use case is large memory servers (hundred+ Gbytes to
TBytes) running some applications that use most of memory in user mode
(like a database). We mirror enough memory to cover *all* the kernel
allocations so that a bad memory access will be fixed from the mirror
for kernel, or result in SIGBUS to a process for user page ... either
way we don't crash the system.

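To make the intended outcome concrete, here is a rough sketch of that policy as a decision table. It is plain userspace C, not kernel code: every name in it (page_owner, ue_outcome, wants_mirror, uncorrected_error_outcome) is made up for illustration, and the "kernel page outside the mirror crashes" branch is an assumption that simply restates why we want all kernel allocations covered.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in the kernel. */
enum page_owner { OWNER_KERNEL, OWNER_USER };

enum ue_outcome {
    UE_FIXED_FROM_MIRROR,   /* the good copy is supplied from the mirror */
    UE_SIGBUS_TO_PROCESS,   /* only the process using the page is affected */
    UE_SYSTEM_CRASH         /* assumed result: kernel data lost, no recovery */
};

/* Placement policy: kernel allocations go to mirrored memory, user pages do not. */
static bool wants_mirror(enum page_owner owner)
{
    return owner == OWNER_KERNEL;
}

/* What an uncorrected memory error on a page leads to under this policy. */
static enum ue_outcome uncorrected_error_outcome(enum page_owner owner, bool mirrored)
{
    if (mirrored)
        return UE_FIXED_FROM_MIRROR;
    if (owner == OWNER_USER)
        return UE_SIGBUS_TO_PROCESS;
    return UE_SYSTEM_CRASH;
}
```

With every kernel page placed according to wants_mirror(), the UE_SYSTEM_CRASH branch is never reached, which is the whole point of sizing the mirror to cover all kernel allocations.
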
Perhaps in the future we might find some places in the kernel where we
can cover a lot of memory without too many code changes ... e.g. things
like pagecopy(). At that time we'd have to think about allocation
priorities.

-Tony