From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Michal Hocko, Oscar Salvador, David Hildenbrand, Naoya Horiguchi, Andrew Morton, Linus Torvalds
Subject: [PATCH 4.20 05/65] hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
Date: Fri, 11 Jan 2019 15:14:51 +0100
Message-Id: <20190111131056.878077905@linuxfoundation.org>
X-Mailer: git-send-email
 2.20.1
In-Reply-To: <20190111131055.331350141@linuxfoundation.org>
References: <20190111131055.331350141@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.20-stable review patch. If anyone has any objections, please let me know.

------------------

From: Michal Hocko

commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offline from succeeding on a 4.4 base kernel. The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing. There are two problems with that. First of all,
it is dubious to migrate the poisoned page because we know that accessing
that memory may fail. Secondly, it doesn't make any sense to migrate
potentially broken content and preserve the memory corruption over to a
new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simple testcase

===

int main(void)
{
	int ret;
	int i;
	int fd;
	char *array = malloc(4096);
	char *array_locked = malloc(4096);

	fd = open("/tmp/data", O_RDONLY);
	read(fd, array, 4095);

	for (i = 0; i < 4096; i++)
		array_locked[i] = 'd';

	ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
	if (ret)
		perror("mlock");

	sleep (20);

	ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
	if (ret)
		perror("madvise");

	for (i = 0; i < 4096; i++)
		array_locked[i] = 'd';

	return 0;
}
===

+ offline this memory.
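
For readers who want to try the scenario, here is a minimal userspace sketch of the same idea. It is not Oscar's original program. It assumes Linux with CONFIG_MEMORY_FAILURE and a caller holding CAP_SYS_ADMIN. It allocates a page-aligned buffer with posix_memalign() instead of rounding a malloc() pointer with the kernel-only PAGE_ALIGN macro, passes the page size rather than the size of a pointer to mlock(), and leaves out the read of /tmp/data. The names page_align_up() and lock_and_poison_page() are hypothetical. Injecting a memory error is destructive for the affected page, so this should only ever be run on a test machine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for madvise() and MADV_HWPOISON in glibc headers */
#endif

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100	/* Linux uapi value, used only if libc hides it */
#endif

/* Round addr up to a multiple of page_size, which must be a power of two. */
uintptr_t page_align_up(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (addr + page_size - 1) & ~(page_size - 1);
}

/*
 * Hypothetical helper: fault in and lock one page, wait delay_sec seconds,
 * then ask the kernel to treat the page as hardware-poisoned.
 * Returns 0 on success or a negative errno value.
 */
int lock_and_poison_page(unsigned int delay_sec)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *buf = NULL;
	int err;

	if (page_size <= 0)
		return -EINVAL;

	/* posix_memalign() returns the error number directly. */
	err = posix_memalign(&buf, (size_t)page_size, (size_t)page_size);
	if (err)
		return -err;

	/* Touch the page so that it is actually populated. */
	memset(buf, 'd', (size_t)page_size);

	if (mlock(buf, (size_t)page_size)) {
		err = -errno;
		free(buf);
		return err;
	}

	sleep(delay_sec);

	/* Needs CAP_SYS_ADMIN; on failure the page is still healthy. */
	if (madvise(buf, (size_t)page_size, MADV_HWPOISON)) {
		err = -errno;
		munlock(buf, (size_t)page_size);
		free(buf);
		return err;
	}

	/* The buffer is deliberately neither touched nor freed after poisoning. */
	return 0;
}
```

Calling lock_and_poison_page(20) from a privileged process and then offlining the memory block that contains the page roughly corresponds to the "+ offline this memory" step above. An unprivileged caller is expected to get -EPERM back from the madvise() call.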
In 4.4 kernels he saw the hwpoisoned page being returned to the LRU
list

kernel:  [] dump_trace+0x59/0x340
kernel:  [] show_stack_log_lvl+0xea/0x170
kernel:  [] show_stack+0x21/0x40
kernel:  [] dump_stack+0x5c/0x7c
kernel:  [] warn_slowpath_common+0x81/0xb0
kernel:  [] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [] pagevec_lru_move_fn+0xad/0x100
kernel:  [] __lru_cache_add+0x6c/0xb0
kernel:  [] add_to_page_cache_lru+0x46/0x70
kernel:  [] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [] __do_page_cache_readahead+0x177/0x200
kernel:  [] ondemand_readahead+0x168/0x2a0
kernel:  [] generic_file_read_iter+0x41f/0x660
kernel:  [] __vfs_read+0xcd/0x140
kernel:  [] vfs_read+0x7a/0x120
kernel:  [] kernel_read+0x3b/0x50
kernel:  [] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [] do_execve+0x28/0x30
kernel:  [] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [] ret_from_fork+0x55/0x80

And the latter confuses the hotremove path because the kernel attempts to
migrate an LRU page and that fails due to an elevated reference count. It
is quite possible that the reuse of the HWPoisoned page is some kind of
fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different. The page
doesn't seem to have the LRU bit set but isolate_movable_page simply fails
and do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds the same set of
pages over and over again.

Fix both cases by explicitly checking HWPoisoned pages before we even try
to get a reference on the page, and try to unmap it if it is still mapped.
As explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.
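
The ordering described above can be illustrated with a small userspace model. This is a toy sketch and not kernel code: struct toy_page, enum toy_action and toy_classify_page() are hypothetical names, the flags are plain booleans, and the model assumes that unmapping always succeeds, which the real try_to_unmap() does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct page properties the fix looks at. */
struct toy_page {
	bool hwpoison;	/* models PageHWPoison() */
	bool lru;	/* models PageLRU() */
	bool mapped;	/* models page_mapped() */
	int refcount;	/* models the page reference count */
};

enum toy_action {
	TOY_SKIP_HWPOISON,	/* poisoned: never handed to migration */
	TOY_SKIP_NO_REF,	/* refcount is zero: no reference can be taken */
	TOY_TRY_MIGRATE,	/* reference taken, page goes on to isolation */
};

/*
 * Decide what one iteration of the migration loop does with a page.
 * *warned is set when a poisoned page is unexpectedly found on the LRU.
 */
enum toy_action toy_classify_page(struct toy_page *page, bool *warned)
{
	/* The poison check comes first, before any reference is taken. */
	if (page->hwpoison) {
		if (page->lru) {
			*warned = true;		/* stands in for WARN_ON(PageLRU(page)) */
			page->lru = false;	/* stands in for isolate_lru_page() */
		}
		if (page->mapped)
			page->mapped = false;	/* simplification: unmap always works */
		return TOY_SKIP_HWPOISON;
	}

	/* Models get_page_unless_zero() failing on a page with no references. */
	if (page->refcount == 0)
		return TOY_SKIP_NO_REF;

	page->refcount++;
	return TOY_TRY_MIGRATE;
}
```

Because the poison check runs before any reference is taken, the elevated reference count of a poisoned page never reaches the migration code in this model, and the page is left unmapped rather than put back for another pass.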
Naoya has noted the following:

: Theoretically no such guarantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also implies
: that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res = -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reviewed-by: Oscar Salvador
Debugged-by: Oscar Salvador
Tested-by: Oscar Salvador
Acked-by: David Hildenbrand
Acked-by: Naoya Horiguchi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

---
 mm/memory_hotplug.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1369,6 +1370,21 @@ do_migrate_range(unsigned long start_pfn
 			pfn = page_to_pfn(compound_head(page))
 				+ hpage_nr_pages(page) - 1;
 
+		/*
+		 * HWPoison pages have elevated reference counts so the migration would
+		 * fail on them. It also doesn't make any sense to migrate them in the
+		 * first place. Still try to unmap such a page in case it is still mapped
+		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
+		 * the unmap as the catch all safety net).
+		 */
+		if (PageHWPoison(page)) {
+			if (WARN_ON(PageLRU(page)))
+				isolate_lru_page(page);
+			if (page_mapped(page))
+				try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);
+			continue;
+		}
+
 		if (!get_page_unless_zero(page))
 			continue;
 		/*