Subject: Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
To: Anshuman Khandual, "Aneesh Kumar K.V", linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20170419032759.29700-1-khandual@linux.vnet.ibm.com> <877f2ghqaf.fsf@skywalker.in.ibm.com>
Cc: n-horiguchi@ah.jp.nec.com, akpm@linux-foundation.org
From: Anshuman Khandual
Date: Thu, 20 Apr 2017 10:35:14 +0530
Message-Id: <893ecbd7-e9fa-7a54-fc62-43f8a5b8107f@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/19/2017 12:12 PM, Anshuman Khandual wrote:
> On 04/19/2017 11:50 AM, Aneesh Kumar K.V wrote:
>> Anshuman Khandual writes:
>>
>>> Though migrating gigantic HugeTLB pages does not sound much like real
>>> world use case, they can be affected by memory errors.
>>> Hence migration at the PGD level HugeTLB pages should be supported just
>>> to enable soft and hard offline use cases.
>> In that case do we want to isolated the entire 16GB range ? Should we
>> just dequeue the page from hugepage pool convert them to regular 64K
>> pages and then isolate the 64K that had memory error ?
> Though its a better thing to do, assuming that we can actually dequeue
> the huge page and push it to the buddy allocator as normal 64K pages
> (need to check on this as the original allocation happened from the
> memblock instead of the buddy allocator, guess it should be possible
> given that we do similar stuff during memory hot plug). In that case
> we will also have to consider the same for the PMD based HugeTLB pages
> as well or it should be only for these gigantic huge pages ?

If we look at the code inside soft_offline_huge_page(), there are two
cases. If the source huge page has been freed to the active_freelist,
we mark the *entire* huge page as poisoned. If the huge page has been
released back to the buddy allocator, only the page in question is
marked poisoned, not the entire huge page. This part was added with
commit a49ecbcd7 ("mm/memory-failure.c: recheck PageHuge() after
hugetlb page migrate successfully").

But when I look at how migrate_pages() handles huge pages, it always
calls putback_active_hugepage() after a successful migration, which
releases the huge page back to the active list, not to the buddy
allocator. I am not sure the second half of the 'if' block ever gets
executed at all.

I am starting to wonder what's the point of releasing the huge page to
the active list in migrate_pages() when we then go and mark the entire
huge page as *poisoned* and put it in a dangling state (page->lru
pointing to itself) in which it cannot be allocated anyway.

After migrate_pages() is successful and the source huge page is
released to the active list, we can just mark the single normal page
as poisoned, get the source page from the active list and free it to
the buddy allocator. This should take care of both PMD and PGD based
huge pages.

----------------------------------------------------------------------
	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
				MIGRATE_SYNC, MR_MEMORY_FAILURE);
	if (ret) {
		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
			pfn, ret, page->flags);
		/*
		 * We know that soft_offline_huge_page() tries to migrate
		 * only one hugepage pointed to by hpage, so we need not
		 * run through the pagelist here.
		 */
		putback_active_hugepage(hpage);
		if (ret > 0)
			ret = -EIO;
	} else {
		/* overcommit hugetlb page will be freed to buddy */
		if (PageHuge(page)) {
			set_page_hwpoison_huge_page(hpage);
			dequeue_hwpoisoned_huge_page(hpage);
			num_poisoned_pages_add(1 << compound_order(hpage));
		} else {
			SetPageHWPoison(page);
			num_poisoned_pages_inc();
		}
	}
----------------------------------------------------------------------
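
To make the difference in accounting concrete, here is a small
userspace model of the two policies discussed above. It is only a
sketch and not kernel code: every name in it (toy_hugepage,
toy_poison_whole(), toy_poison_single() and so on) is made up for
illustration, and the order of 18 used in the note below is simply the
16GB huge page size divided by the 64K base page size mentioned
earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model; none of these names are kernel interfaces. */
struct toy_hugepage {
	unsigned int order;	/* log2 of the number of base pages */
	bool *poisoned;		/* one flag per base page */
};

/* Number of base pages backing the huge page. */
size_t toy_nr_subpages(const struct toy_hugepage *hp)
{
	return (size_t)1 << hp->order;
}

/* Allocate a huge page model with all base pages healthy. */
struct toy_hugepage *toy_hugepage_alloc(unsigned int order)
{
	struct toy_hugepage *hp = malloc(sizeof(*hp));

	if (!hp)
		return NULL;
	hp->order = order;
	hp->poisoned = calloc((size_t)1 << order, sizeof(*hp->poisoned));
	if (!hp->poisoned) {
		free(hp);
		return NULL;
	}
	return hp;
}

void toy_hugepage_free(struct toy_hugepage *hp)
{
	if (!hp)
		return;
	free(hp->poisoned);
	free(hp);
}

/*
 * Policy 1: mark every base page of the huge page as poisoned.
 * Returns the number of base pages taken out of service.
 */
size_t toy_poison_whole(struct toy_hugepage *hp)
{
	size_t i, n = toy_nr_subpages(hp);

	for (i = 0; i < n; i++)
		hp->poisoned[i] = true;
	return n;
}

/*
 * Policy 2: mark only the failing base page; the remaining base
 * pages are assumed to go back to the allocator as normal pages.
 * Returns the number of base pages taken out of service.
 */
size_t toy_poison_single(struct toy_hugepage *hp, size_t bad)
{
	assert(bad < toy_nr_subpages(hp));
	hp->poisoned[bad] = true;
	return 1;
}

/* Count the base pages that are still usable. */
size_t toy_count_usable(const struct toy_hugepage *hp)
{
	size_t i, usable = 0, n = toy_nr_subpages(hp);

	for (i = 0; i < n; i++)
		if (!hp->poisoned[i])
			usable++;
	return usable;
}
```

With order 18, toy_poison_whole() takes all 262144 base pages out of
service, whereas toy_poison_single() takes out one and leaves 262143
usable.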