Date: Fri, 31 Jul 2015 09:43:50 +0100
From: Mel Gorman
To: Vlastimil Babka
Cc: Mel Gorman, Linux-MM, Johannes Weiner, Rik van Riel, Pintu Kumar,
 Xishi Qiu, Gioh Kim, LKML
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order
 atomic allocations on demand
Message-ID: <20150731084350.GF5840@techsingularity.net>
In-Reply-To: <55BB31C4.3000107@suse.cz>
References: <1437379219-9160-1-git-send-email-mgorman@suse.com>
 <1437379219-9160-10-git-send-email-mgorman@suse.com>
 <55B8BA75.9090903@suse.cz>
 <20150729125334.GC19352@techsingularity.net>
 <55BB31C4.3000107@suse.cz>

On Fri, Jul 31, 2015 at 10:28:52AM +0200, Vlastimil Babka wrote:
> On 07/29/2015 02:53 PM, Mel Gorman wrote:
> >On Wed, Jul 29, 2015 at 01:35:17PM +0200, Vlastimil Babka wrote:
> >>On 07/20/2015 10:00 AM, Mel Gorman wrote:
> >>>+/*
> >>>+ * Used when an allocation is about to fail under memory pressure. This
> >>>+ * potentially hurts the reliability of high-order allocations when under
> >>>+ * intense memory pressure but failed atomic allocations should be easier
> >>>+ * to recover from than an OOM.
> >>>+ */
> >>>+static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> >>>+{
> >>>+	struct zonelist *zonelist = ac->zonelist;
> >>>+	unsigned long flags;
> >>>+	struct zoneref *z;
> >>>+	struct zone *zone;
> >>>+	struct page *page;
> >>>+	int order;
> >>>+
> >>>+	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
> >>>+								ac->nodemask) {
> >>
> >>This fixed order might bias some zones over others wrt unreserving. Is it OK?
> >
> >I could not think of a situation where it mattered. It'll always prefer
> >the highest zone over lower zones. Allocation requests that can use any
> >zone do not care. Allocation requests that are limited to lower zones
> >are protected as long as possible.
> 
> Hmm... allocation requests will follow the fair zone policy and thus the
> highatomic reservations will be spread fairly among all zones? Unless the
> allocations require lower zones of course.
> 

Does that matter?

> But for unreservations, normal/high allocations failing under memory
> pressure will lead to unreserving highatomic pageblocks first in the
> higher zones and only then the lower zones, and that was my concern. But
> it's true that failing allocations that require lower zones will lead to
> unreserving the lower zones, so it might be ok in the end.
> 

Again, I don't think it matters.

> >
> >>
> >>>+		/* Preserve at least one pageblock */
> >>>+		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
> >>>+			continue;
> >>>+
> >>>+		spin_lock_irqsave(&zone->lock, flags);
> >>>+		for (order = 0; order < MAX_ORDER; order++) {
> >>
> >>Would it make more sense to look in descending order for a higher chance
> >>of unreserving a pageblock that's mostly free? Like the traditional page
> >>stealing does?
> >>
> >
> >I don't think it's worth the search cost. Traditional page stealing is
> >searching because it's trying to minimise events that cause external
> >fragmentation. Here we'd gain very little. We are under some memory
> >pressure here; if enough pages are not free then another one will get
> >freed shortly. Either way, I doubt the difference is measurable.
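
For reference, the descending-order variant under discussion only reverses
the direction of the scan. A minimal, untested sketch, reusing the locals
from the hunk above; the move_freepages_block() call is an assumption based
on what the rest of the patch does after retyping the pageblock:

		/*
		 * Sketch only: scan from the highest order down so that a
		 * mostly-free reserved pageblock is found first.
		 */
		for (order = MAX_ORDER - 1; order >= 0; order--) {
			struct free_area *area = &(zone->free_area[order]);

			if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
				continue;

			page = list_entry(area->free_list[MIGRATE_HIGHATOMIC].next,
						struct page, lru);

			zone->nr_reserved_highatomic -= pageblock_nr_pages;
			set_pageblock_migratetype(page, ac->migratetype);
			/* Assumed follow-up, as elsewhere in the patch */
			move_freepages_block(zone, page, ac->migratetype);
			break;
		}

The only extra cost is walking up to MAX_ORDER mostly-empty free lists,
which is the search cost being dismissed above.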
> Hmm, I guess...
> 
> >>>+			struct free_area *area = &(zone->free_area[order]);
> >>>+
> >>>+			if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
> >>>+				continue;
> >>>+
> >>>+			page = list_entry(area->free_list[MIGRATE_HIGHATOMIC].next,
> >>>+						struct page, lru);
> >>>+
> >>>+			zone->nr_reserved_highatomic -= pageblock_nr_pages;
> >>>+			set_pageblock_migratetype(page, ac->migratetype);
> >>
> >>Would it make more sense to assume MIGRATE_UNMOVABLE, as high-order
> >>allocations present in the pageblock typically would be, and apply the
> >>traditional page stealing heuristics to decide if it should be changed
> >>to ac->migratetype (if that differs)?
> >>
> >
> >Superb spot. I had to think about this for a while; initially I thought
> >your suggestion was a no-brainer and obviously the right thing to do.
> >
> >On the pro side, it preserves the fragmentation logic because it'll force
> >the normal page stealing logic to be applied.
> >
> >On the con side, we may reassign the pageblock twice -- once to
> >MIGRATE_UNMOVABLE and once to ac->migratetype. That in itself does not
> >matter, but the second con is that we inadvertently increase the number
> >of unmovable blocks in some cases.
> >
> >Let's say we default to MIGRATE_UNMOVABLE, ac->migratetype is
> >MIGRATE_MOVABLE and there are enough free pages to satisfy the allocation
> >but not steal the whole pageblock. The end result is that we have a new
> >unmovable pageblock that may not be necessary. The next unmovable
> >allocation placed there is potentially there forever. The key observation
> >is that previously the pageblock could have held short-lived high-order
> >allocations and so could be completely free soon if it was assigned
> >MIGRATE_MOVABLE. This may not apply when SLUB is using high-order
> >allocations but the point still holds.
> 
> Yeah, I see the point. The obvious counterexample is a pageblock that we
> designate as MOVABLE and yet it contains some long-lived unmovable
> allocation. More unmovable allocations could lead to choosing another
> movable block as a fallback, while if we marked this pageblock as
> unmovable, they could go there and not increase fragmentation.
> 
> The problem is, we can't know which one is the case. I've toyed with the
> idea of MIGRATE_MIXED blocks for cases where the heuristics decide that
> e.g. an UNMOVABLE block is empty enough to change it to MOVABLE, but it
> may still contain some unmovable allocations. Such pageblocks should be
> preferred fallbacks for future unmovable allocations before truly
> pristine movable pageblocks.
> 

I tried that once upon a time. The number of blocks simply increased over
time and there was no sensible way to recover from it. I never found a
solution that worked for very long.

> >>>+
> >>>+	/*
> >>>+	 * If the caller is not atomic then discount the reserves. This will
> >>>+	 * over-estimate the size of the atomic reserve but it avoids a search.
> >>>+	 */
> >>>+	if (likely(!(alloc_flags & ALLOC_HARDER)))
> >>>+		free_pages -= z->nr_reserved_highatomic;
> >>
> >>Hm, so in the case where the maximum of 10% reserved blocks is already
> >>full, we deny the allocation access to another 10% of the memory and
> >>push it to reclaim. This seems rather excessive.
> >
> >It's necessary. If normal callers can use it then the reserve fills with
> >normal pages, the memory gets fragmented and high-order atomic allocations
> >fail due to fragmentation. Similarly, the number of MIGRATE_HIGHATOMIC
> >pageblocks cannot be unbounded or everything else will be continually
> >pushed into reclaim even if there is plenty of memory free.
> 
> I understand denying normal allocations access to highatomic reserves via
> watermarks is necessary. But my concern is that for each reserved
> pageblock we effectively deny up to two pageblocks' worth of pages to
> normal allocations. One pageblock is marked as MIGRATE_HIGHATOMIC, and
> once it becomes full, free_pages above is decreased twice - once by the
> pageblock becoming full, and then again by subtracting
> z->nr_reserved_highatomic. This extra gap is still usable by highatomic
> allocations, but they will also potentially mark more pageblocks
> highatomic and further increase the gap. In the worst case we have 10% of
> pageblocks marked highatomic and full, and another 10% that is only
> usable by highatomic allocations (but won't be marked as such), and if no
> more highatomic allocations come then the 10% is wasted.

Similar to what I said to Joonsoo, I'll drop the size of the reserves in
the next revision. It brings things back in line with MIGRATE_RESERVE in
terms of the amount of memory we reserve, while still removing the need
for high-order watermark checks. It means that the rate of atomic
high-order allocation failure will remain the same after the series as
before, but that should not matter.
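
To make "drop the size of the reserves" concrete, below is a minimal,
untested sketch of what a capped reserve-side helper could look like. The
signature is simplified and the one-pageblock-plus-1% cap is purely an
illustrative assumption, not a figure the next revision is committed to:

/*
 * Sketch only: mark a pageblock MIGRATE_HIGHATOMIC unless the zone has
 * already reserved its (illustrative) maximum.
 */
static void reserve_highatomic_pageblock(struct page *page, struct zone *zone)
{
	unsigned long max_managed, flags;

	/* Illustrative cap: one pageblock plus roughly 1% of the zone */
	max_managed = pageblock_nr_pages + zone->managed_pages / 100;
	if (zone->nr_reserved_highatomic >= max_managed)
		return;

	spin_lock_irqsave(&zone->lock, flags);

	/* Recheck under the zone lock */
	if (zone->nr_reserved_highatomic >= max_managed)
		goto out_unlock;

	if (get_pageblock_migratetype(page) != MIGRATE_HIGHATOMIC) {
		zone->nr_reserved_highatomic += pageblock_nr_pages;
		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
	}

out_unlock:
	spin_unlock_irqrestore(&zone->lock, flags);
}

With a cap in that region, the worst case described above shrinks from two
10% gaps to roughly 1% of the zone reserved plus the same amount discounted
from free_pages for normal allocations.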

-- 
Mel Gorman
SUSE Labs