Date: Tue, 30 Jun 2015 12:53:53 +0100
From: Mel Gorman
To: Ingo Molnar
Cc: Xishi Qiu, Andrew Morton, "H. Peter Anvin", "Luck, Tony",
    Hanjun Guo, Xiexiuqi, leon@leon.nu, Kamezawa Hiroyuki, Dave Hansen,
    Naoya Horiguchi, Vlastimil Babka, Linux MM, LKML
Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Message-ID: <20150630115353.GB6812@suse.de>
References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com>
In-Reply-To: <20150630104654.GA24932@gmail.com>

On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote:
>
> * Mel Gorman wrote:
>
> > [...]
> >
> > Basically, overall I feel this series is the wrong approach but not knowing who
> > the users are makes it much harder to judge. I strongly suspect that if
> > mirrored memory is to be properly used then it needs to be available before the
> > page allocator is even active. Once active, there needs to be controlled access
> > for allocation requests that are really critical to mirror and not just all
> > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be
> > alterations to the bootmem allocator and access to an explicit reserve that is
> > not accounted for as "free memory" and accessed via an explicit GFP flag.
>
> So I think the main goal is to avoid kernel crashes when a #MC memory fault
> arrives on a piece of memory that is owned by the kernel.
>

Sounds logical. In that case, bootmem awareness would be crucial.
Enabling support in just the page allocator is too late.

> In that sense 'protecting' all kernel allocations is natural: we don't know how to
> recover from faults that affect kernel memory.
>

It potentially uses all mirrored memory on memory that does not need
that sort of guarantee. For example, if there was an MC on memory backing
the inode cache then potentially that is recoverable as long as the inodes
were not dirty. That's a minor detail as the kernel could later protect
only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
MC in kernel space could be distinguished from non-fatal checks.

Bootmem awareness is much more important either way. If that was addressed
then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that
is only used for MIGRATE_UNMOVABLE allocations and never for user-space.
That misses MIGRATE_RECLAIMABLE so if that is required then we need
something else that both preserves fragmentation avoidance and avoids
introducing loads of new migratetypes. Reclaim-related issues could be
partially avoided by forbidding use from userspace and accounting for the
size of MIGRATE_UNMOVABLE_MIRROR during watermark checks.

> We do know how to recover from faults that affect user-space memory alone.
>
> So if a mechanism is in place that prioritizes 3 groups of allocators:
>
>   - non-recoverable memory (kernel allocations mostly)
>

So bootmem at the very least followed by MIGRATE_UNMOVABLE requests whether
they are accounted for by zones or MIGRATE_TYPES.

>   - high priority user memory (critical apps that must never fail)
>

This one is problematic with a MIGRATE_TYPE-based approach such as the one
in this series. If a high priority application requires memory and
MIGRATE_MIRROR is full then some of it must be reclaimed.
With a MIGRATE_TYPE approach, the kernel may reclaim a lot of unnecessary
memory trying to free some MIGRATE_MIRROR memory with no guarantee of
success. It'll look like unnecessary thrashing from userspace but will be
difficult to diagnose as reclaim stats are per-zone based. Dealing with
this needs either a zone-based approach or a lot of surgery to reclaim
(similar to what the node-based LRU series does actually when it skips
pages when the caller requires lowmem pages).

--
Mel Gorman
SUSE Labs
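
The idea of accounting for the size of a mirrored reserve during watermark
checks can be illustrated with a small userspace toy in C. This is only a
sketch under stated assumptions, not kernel code: struct sketch_zone,
SKETCH_ALLOC_MIRROR, sketch_watermark_ok() and sketch_alloc() are all
hypothetical names invented for illustration, and the real watermark check
also deals with allocation orders, lowmem reserves and per-migratetype free
lists. The sketch shows one thing only: requests without the explicit flag
never see the mirrored pages as free memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request flag for this sketch; not an existing GFP flag. */
#define SKETCH_ALLOC_MIRROR 0x1u

/* Toy zone state; the field names are assumptions for illustration. */
struct sketch_zone {
	unsigned long free_pages;        /* all free pages, mirrored included */
	unsigned long free_mirror_pages; /* free pages in the mirrored reserve */
	unsigned long watermark;         /* pages that must stay free */
};

/*
 * Watermark check that discounts the mirrored reserve for any request
 * that is not explicitly allowed to use it.
 */
bool sketch_watermark_ok(const struct sketch_zone *z,
			 unsigned long nr_pages, unsigned int flags)
{
	unsigned long usable = z->free_pages;

	/* Invariant: mirrored free pages are a subset of all free pages. */
	assert(z->free_mirror_pages <= z->free_pages);

	/* Ordinary requests must not count the reserve as free memory. */
	if (!(flags & SKETCH_ALLOC_MIRROR))
		usable -= z->free_mirror_pages;

	if (usable < nr_pages)
		return false;

	return usable - nr_pages >= z->watermark;
}

/*
 * Toy allocation: only counters are updated. Flagged requests drain the
 * mirrored reserve first, which is a simplification chosen for the sketch.
 */
bool sketch_alloc(struct sketch_zone *z, unsigned long nr_pages,
		  unsigned int flags)
{
	if (!sketch_watermark_ok(z, nr_pages, flags))
		return false;

	if (flags & SKETCH_ALLOC_MIRROR) {
		unsigned long from_mirror = nr_pages < z->free_mirror_pages ?
					    nr_pages : z->free_mirror_pages;

		z->free_mirror_pages -= from_mirror;
	}

	z->free_pages -= nr_pages;
	return true;
}
```

With 100 free pages of which 40 are mirrored and a watermark of 20, an
unflagged request for 41 pages fails the check while a flagged request for
80 pages passes it, so pressure from ordinary allocations triggers reclaim
before the reserve is touched.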