From: Pavel Tatashin
Date: Tue, 30 Jan 2018 13:11:06 -0500
Subject: Re: Memory hotplug not increasing the total RAM
To: Michal Hocko
Cc: Bharata B Rao, Linux Kernel Mailing List, Linux Memory Management List, Andrew Morton
In-Reply-To: <20180130101141.GW21609@dhcp22.suse.cz>
References: <20180130083006.GB1245@in.ibm.com>
 <20180130091600.GA26445@dhcp22.suse.cz>
 <20180130092815.GR21609@dhcp22.suse.cz>
 <20180130095345.GC1245@in.ibm.com>
 <20180130101141.GW21609@dhcp22.suse.cz>

Hi Michal,

Thank you for taking care of the problem. The patch may introduce a small
performance regression during normal boot, as we add a branch to a hot
initialization path. But it fixes a current problem, so:

Reviewed-by: Pavel Tatashin

However, I think we should change the hotplug code to also not touch the
memmap area until the struct pages are initialized. Currently, we loop
through the struct pages several times during memory hotplug:

1. memset(0) in sparse_add_one_section()
2. loop in __add_section() to do: set_page_node(page, nid); and
   SetPageReserved(page);
3. loop in pages_correctly_reserved() to check that PageReserved is set.
4. loop in memmap_init_zone() to call __init_single_pfn()

Every time we loop through the struct pages we lose the cached data, as
they are massive. I suggest getting rid of loops 1-3, keeping only loop 4,
and at the end of memmap_init_zone(), after the __init_single_pfn() calls,
doing:

	if (context == MEMMAP_HOTPLUG)
		SetPageReserved(page);

Hopefully the compiler will optimize these two lines into a conditional
move instruction and therefore not add any new branches. (A rough,
userspace-only sketch of this single-pass scheme appears after the quoted
patch below.) This change would also enable a future optimization of
multithreading memory hotplug, if that is ever needed.

Thank you,
Pavel

On Tue, Jan 30, 2018 at 5:11 AM, Michal Hocko wrote:
> [Cc Andrew - thread starts here
> http://lkml.kernel.org/r/20180130083006.GB1245@in.ibm.com]
>
> On Tue 30-01-18 15:23:45, Bharata B Rao wrote:
>> On Tue, Jan 30, 2018 at 10:28:15AM +0100, Michal Hocko wrote:
>> > On Tue 30-01-18 10:16:00, Michal Hocko wrote:
>> > > On Tue 30-01-18 14:00:06, Bharata B Rao wrote:
>> > > > Hi,
>> > > >
>> > > > With the latest upstream, I see that memory hotplug is not working
>> > > > as expected. The hotplugged memory isn't seen to increase the total
>> > > > RAM pages. This has been observed with both x86 and Power guests.
>> > > >
>> > > > 1. Memory hotplug code initially marks pages as PageReserved via
>> > > >    __add_section().
>> > > > 2. Later the struct page gets cleared in __init_single_page().
>> > > > 3. Next, online_pages_range() increments totalram_pages only when
>> > > >    PageReserved is set.
>> > >
>> > > You are right. I had completely forgotten about this late struct page
>> > > initialization during onlining. Memory hotplug really doesn't want the
>> > > zeroing. Let me think about a fix.
>> >
>> > Could you test with the following, please? Not an act of beauty, but
>> > we are initializing the memmap in sparse_add_one_section for memory
>> > hotplug. I hate how this is different from the early initialization
>> > case, but there is quite a long route to unify those two... So a quick
>> > fix should be as follows.
>>
>> Tested on a Power guest; this fixes the issue. I can now see the total
>> memory size increasing after hotplug.
>
> Thanks for your quick testing. Here we go with the fix.
>
> From d60b333d4048a84c3172829ec24706c761a7bd44 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Tue, 30 Jan 2018 11:02:18 +0100
> Subject: [PATCH] mm, memory_hotplug: fix memmap initialization
>
> Bharata has noticed that onlining newly added memory doesn't increase
> the total memory, pointing to f7f99100d8d9 ("mm: stop zeroing memory
> during allocation in vmemmap") as the culprit. That commit changed the
> way the memory for memmaps is initialized and moved it from allocation
> time to initialization time. This works properly for the early memmap
> init path.
>
> It doesn't work for memory hotplug, though, because there we need to
> mark the pages as reserved when the sparsemem section is created and
> only later initialize them completely during onlining. memmap_init_zone
> is called in the early stage of onlining. With the current code it
> calls __init_single_page, which zeroes the whole struct page, and
> therefore online_pages_range skips those pages.
>
> Fix this by skipping mm_zero_struct_page in __init_single_page for the
> memory hotplug path. This is quite ugly, but unifying the early init
> and memory hotplug init paths is a large project. Make sure we plug the
> regression at least.
>
> Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
> Cc: stable
> Reported-and-Tested-by: Bharata B Rao
> Signed-off-by: Michal Hocko
> ---
>  mm/page_alloc.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6129f989223a..f548f50c1f3c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1178,9 +1178,10 @@ static void free_one_page(struct zone *zone,
>  }
>
>  static void __meminit __init_single_page(struct page *page, unsigned long pfn,
> -				unsigned long zone, int nid)
> +				unsigned long zone, int nid, bool zero)
>  {
> -	mm_zero_struct_page(page);
> +	if (zero)
> +		mm_zero_struct_page(page);
>  	set_page_links(page, zone, nid, pfn);
>  	init_page_count(page);
>  	page_mapcount_reset(page);
> @@ -1195,9 +1196,9 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>  }
>
>  static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
> -					int nid)
> +					int nid, bool zero)
>  {
> -	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
> +	return __init_single_page(pfn_to_page(pfn), pfn, zone, nid, zero);
>  }
>
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> @@ -1218,7 +1219,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
>  		if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
>  			break;
>  	}
> -	__init_single_pfn(pfn, zid, nid);
> +	__init_single_pfn(pfn, zid, nid, true);
>  }
>  #else
>  static inline void init_reserved_page(unsigned long pfn)
> @@ -1535,7 +1536,7 @@ static unsigned long __init deferred_init_pages(int nid, int zid,
>  		} else {
>  			page++;
>  		}
> -		__init_single_page(page, pfn, zid, nid);
> +		__init_single_page(page, pfn, zid, nid, true);
>  		nr_pages++;
>  	}
>  	return (nr_pages);
> @@ -5400,15 +5401,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  		 * can be created for invalid pages (for alignment)
>  		 * check here not to call set_pageblock_migratetype() against
>  		 * pfn out of zone.
> +		 *
> +		 * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
> +		 * because this is done early in sparse_add_one_section
>  		 */
>  		if (!(pfn & (pageblock_nr_pages - 1))) {
>  			struct page *page = pfn_to_page(pfn);
>
> -			__init_single_page(page, pfn, zone, nid);
> +			__init_single_page(page, pfn, zone, nid,
> +					context != MEMMAP_HOTPLUG);
>  			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>  			cond_resched();
>  		} else {
> -			__init_single_pfn(pfn, zone, nid);
> +			__init_single_pfn(pfn, zone, nid,
> +					context != MEMMAP_HOTPLUG);
>  		}
>  	}
>  }
> --
> 2.15.1
>
> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
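
For readers less familiar with the mm internals, here is a minimal,
userspace-only sketch of the single-pass scheme Pavel suggests above. It is
a toy model, not kernel code: struct page, PG_reserved, MEMMAP_HOTPLUG,
init_single_page and memmap_init_zone_model below merely mimic kernel names.
The point it illustrates is that each page descriptor is written exactly
once, and the reserved bit is set in that same pass when the context is
hotplug, instead of in separate loops beforehand.

/* Toy model of the suggested single-pass hotplug memmap init (not kernel code). */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum memmap_context { MEMMAP_EARLY, MEMMAP_HOTPLUG };

/* Simplified stand-in for the kernel's struct page. */
struct page {
	unsigned long flags;
	int node;
};

#define PG_reserved	(1UL << 0)

/* Zero and initialize one page descriptor -- the only time it is touched. */
static void init_single_page(struct page *page, int nid)
{
	memset(page, 0, sizeof(*page));
	page->node = nid;
}

/*
 * One pass over the section's memmap: initialize every descriptor and,
 * for the hotplug path, mark it reserved in the same pass rather than
 * in a separate earlier loop.
 */
static void memmap_init_zone_model(struct page *pages, size_t nr, int nid,
				   enum memmap_context context)
{
	for (size_t i = 0; i < nr; i++) {
		init_single_page(&pages[i], nid);
		if (context == MEMMAP_HOTPLUG)
			pages[i].flags |= PG_reserved;
	}
}

int main(void)
{
	struct page section[8];

	memmap_init_zone_model(section, 8, 0, MEMMAP_HOTPLUG);
	assert(section[0].flags & PG_reserved);		/* hotplugged pages end up reserved */

	memmap_init_zone_model(section, 8, 0, MEMMAP_EARLY);
	assert(!(section[0].flags & PG_reserved));	/* early init leaves them clear */
	return 0;
}

Whether the compiler turns the branch into a conditional move, as Pavel
hopes, depends on the target and flags; the sketch only illustrates the
data flow, not the generated code.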