Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751476AbcCICT2 (ORCPT ); Tue, 8 Mar 2016 21:19:28 -0500
Received: from szxga01-in.huawei.com ([58.251.152.64]:63589 "EHLO
        szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750766AbcCICTU (ORCPT );
        Tue, 8 Mar 2016 21:19:20 -0500
Message-ID: <56DF87E6.10703@huawei.com>
Date: Wed, 9 Mar 2016 10:18:14 +0800
From: Xishi Qiu
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Joonsoo Kim
CC: Joonsoo Kim , Vlastimil Babka , Hanjun Guo , Laura Abbott ,
        "linux-arm-kernel@lists.infradead.org" ,
        "linux-kernel@vger.kernel.org" , Andrew Morton , Sasha Levin ,
        Laura Abbott , Catalin Marinas , Will Deacon , Arnd Bergmann ,
        "thunder.leizhen@huawei.com" , dingtinahong , ,
        "linux-mm@kvack.org"
Subject: Re: Suspicious error for CMA stress test
References: <56D79284.3030009@redhat.com> <56D832BD.5080305@huawei.com>
        <20160304020232.GA12036@js1304-P5Q-DELUXE>
        <20160304043232.GC12036@js1304-P5Q-DELUXE>
        <56D92595.60709@huawei.com>
        <20160304063807.GA13317@js1304-P5Q-DELUXE>
        <56D93ABE.9070406@huawei.com>
        <20160307043442.GB24602@js1304-P5Q-DELUXE>
        <56DD7B20.1020508@suse.cz>
        <20160308074816.GA31471@js1304-P5Q-DELUXE>
        <56DEAD3D.5090706@huawei.com>
In-Reply-To:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.177.25.179]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0),
        refid=str=0001.0A020203.56DF87FD.015D,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0,
        ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: ed41f6e7d2a049fd2d64899ea62c1475
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2406
Lines: 69

On 2016/3/8 23:36, Joonsoo Kim wrote:

> 2016-03-08 19:45 GMT+09:00 Xishi Qiu :
>> On 2016/3/8 15:48, Joonsoo Kim wrote:
>>
>>> On Mon, Mar 07, 2016 at 01:59:12PM +0100, Vlastimil Babka wrote:
>>>> On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>>>>> Sad to hear that.
>>>>>>>
>>>>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>>>>
>>>>>>
>>>>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>>>
>>>> I thought that CMA regions/operations (and isolation IIRC?) were
>>>> supposed to be MAX_ORDER aligned exactly to prevent needing these
>>>> extra checks for buddy merging. So what's wrong?
>>>
>>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>>> partialy isolated block exists. If MAX_ORDER is 11 and
>>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>>> aligned block and I can think following scenario because pageblock
>>> (un)isolation would be done one by one.
>>>
>>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>>> MIGRATE_ISOLATE, respectively.
>>>
>>
>> Hi Joonsoo,
>>
>>> CC -> IC -> II (Isolation)
>>
>>> II -> CI -> CC (Un-isolation)
>>>
>>> If some pages are freed at this intermediate state such as IC or CI,
>>> that page could be merged to the other page that is resident on
>>> different type of pageblock and it will cause wrong freepage count.
>>>
>>
>> Isolation will appear when do cma alloc, so there are two following threads.
>>
>> C(free)C(used) -> start_isolate_page_range -> I(free)C(used) -> I(free)I(someone free it) -> undo_isolate_page_range -> C(free)C(free)
>> so free cma is 2M -> 0M -> 0M -> 4M, the increased 2M was freed by someone.
>
> Your example is correct one but think about following one.
> C(free)C(used) -> start_isolate_page_range -> I(free)C(used) ->
> I(free)**C**(someone free it) -> undo_isolate_page_range ->
> C(free)C(free)
>
> it would be 2M -> 0M -> 2M -> 6M.
> When we do I(free)C(someone free it), CMA freepage is added
> because it is on CMA pageblock. But, bad merging happens and
> 4M buddy is made and it is in isolate buddy list.
> Later, when we do undo_isolation, this 4M buddy is moved to
> CMA buddy list and 4M is added to CMA freepage counter so
> total is 6M.
>

Hi Joonsoo,

I know the cause of the problem now, thank you very much.

> Thanks.
>
> .
>
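
The counter arithmetic in the two scenarios quoted above can be replayed with a small toy model. This is only an illustrative sketch, not kernel code: the function name `simulate`, its argument, and the single-counter bookkeeping are assumptions made for the example, and the 2M pageblock size is taken from the figures in the thread. It tracks just the CMA free counter and the amount of memory sitting on the isolate buddy list.

```python
PAGEBLOCK_MB = 2  # size of one pageblock, as used in the thread (hypothetical constant)


def simulate(second_block_type_at_free):
    """Replay C(free)C(used) through isolation, a free, and un-isolation.

    second_block_type_at_free is 'I' if the second pageblock was already
    isolated when its page was freed, or 'C' if it was still MIGRATE_CMA.
    Returns the history of the CMA free counter in MB.
    """
    counter = PAGEBLOCK_MB      # first pageblock is free, second is in use
    history = [counter]
    on_isolate_list = 0         # MB of free memory on the isolate buddy list

    # start_isolate_page_range: the free first pageblock leaves CMA accounting.
    counter -= PAGEBLOCK_MB
    on_isolate_list += PAGEBLOCK_MB
    history.append(counter)

    # Someone frees the page in the second pageblock.
    if second_block_type_at_free == 'C':
        # The pageblock is still CMA, so the free is counted as CMA free.
        counter += PAGEBLOCK_MB
    # In both cases the freed page merges with its isolated buddy, so the
    # merged 4M buddy ends up on the isolate buddy list.
    on_isolate_list += PAGEBLOCK_MB
    history.append(counter)

    # undo_isolate_page_range: the merged buddy moves back to the CMA list
    # and its full size is added to the counter.
    counter += on_isolate_list
    history.append(counter)
    return history


print(simulate('I'))  # freed while the second pageblock was isolated
print(simulate('C'))  # freed while the second pageblock was still CMA
```

In this model the first call reproduces the 2M -> 0M -> 0M -> 4M sequence and the second the 2M -> 0M -> 2M -> 6M sequence, where the extra 2M comes from the freed pageblock being counted once at free time and again when the merged buddy is moved back.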