Message-ID: <53C65E92.2000606@suse.cz>
Date: Wed, 16 Jul 2014 13:14:26 +0200
From: Vlastimil Babka
To: Joonsoo Kim
CC: Andrew Morton, "Kirill A. Shutemov", Rik van Riel, Peter Zijlstra,
    Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
    Zhang Yanfei, "Srivatsa S. Bhat", Tang Chen, Naoya Horiguchi,
    Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
    Michal Nazarewicz, Laura Abbott, Heesub Shin, "Aneesh Kumar K.V",
    Ritesh Harjani, t.stanislaws@samsung.com, Gioh Kim,
    linux-mm@kvack.org, Lisa Du, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/10] fix freepage count problems due to memory isolation
In-Reply-To: <20140716084333.GA20359@js1304-P5Q-DELUXE>

On 07/16/2014 10:43 AM, Joonsoo Kim wrote:
>> I think your plan of multiple parallel CMA allocations (and thus
>> multiple parallel isolations) is also possible. The isolate pcplists
>> can be shared by pages coming from multiple parallel isolations.
>> But the flush operation needs pfn start/end parameters to flush only
>> the pages belonging to the given isolation. That might mean somewhat
>> inefficient list traversal, but I don't think it's a problem.
>
> I think that special pcplist would cause a problem if we have to check
> the pfn range. If there are too many pages on this pcplist, moving them
> from the pcplist to the isolate freelist takes too long in irq context,
> and the system could break. This operation cannot easily be stopped,
> because it is initiated by an IPI from another cpu, and the initiator
> of the IPI expects that all pages on the other cpus' pcplists have
> been moved properly when on_each_cpu() returns.
>
> And, if there are that many pages, serious lock contention would occur.

Hm, I see. So what if it wasn't a special pcplist, but a special "free
list" where the pages would just be linked together as on a pcplist,
regardless of order, and would not be merged until the CPU that drives
the memory isolation process decides it is safe to flush them away?
That would remove the need for IPIs and provide the same guarantees,
I think.

> Anyway, my idea's key point is using PageIsolated() to distinguish
> isolated pages, instead of using PageBuddy(). If a page is PageIsolated(),

Is PageIsolated() a completely new page flag? Those are a limited
resource, so I would expect some resistance to such an approach. Or is
it a new special page->_mapcount value? That could maybe work.

> it isn't handled as a freepage although it is in the buddy allocator.
> During free, a page with MIGRATE_ISOLATE migratetype will be marked
> PageIsolated() and won't be merged or counted as a freepage.

OK. Preventing wrong merging is the key point, and this should work.

> When we move pages from the normal buddy list to the isolate buddy
> list, we check PageBuddy() and subtract the number of PageBuddy() pages

Do we really need to check PageBuddy()? Could a page get marked
PageIsolated() but still go to the normal list instead of the isolate
list?

> from the freepage count.
> And, change the page from PageBuddy() to PageIsolated(), since it is
> handled as an isolated page at this point. In this way, the freepage
> count will be correct.
>
> Unisolation can be done by a similar approach.
>
> I made a prototype of this approach, and it isn't intrusive to the
> core allocator compared to my previous patchset.
>
> Make sense?

I think so :)

> Thanks.