From: kys@exchange.microsoft.com
To: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org, devel@linuxdriverproject.org, olaf@aepfle.de, apw@canonical.com, vkuznets@redhat.com, jasowang@redhat.com, leann.ogasawara@canonical.com, marcelo.cerri@canonical.com, sthemmin@microsoft.com
Cc: "K . Y . 
 Srinivasan"
Subject: [PATCH V2 11/12] hv_balloon: fix bugs in num_pages_onlined accounting
Date: Sun, 4 Mar 2018 22:17:21 -0700
Message-Id: <20180305051722.19157-11-kys@exchange.microsoft.com>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180305051722.19157-1-kys@exchange.microsoft.com>
References: <20180305051539.19079-1-kys@exchange.microsoft.com> <20180305051722.19157-1-kys@exchange.microsoft.com>
Reply-To: kys@microsoft.com
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Vitaly Kuznetsov

Our num_pages_onlined accounting is buggy:
1) In case we're offlining a memory block which was present at boot (e.g.
   when there was no hotplug at all) we subtract 32k from 0 and as
   num_pages_onlined is unsigned get a very big positive number.

2) Commit 6df8d9aaf3af ("Drivers: hv: balloon: Correctly update onlined page
   count") made num_pages_onlined counter accurate on onlining but totally
   incorrect on offlining for partly populated regions: no matter how many
   pages were onlined and what was actually added to num_pages_onlined
   counter we always subtract the full region (32k) so again,
   num_pages_onlined can wrap around zero. By onlining/offlining the same
   partly populated region multiple times we can make the situation worse.

Solve these issues by doing accurate accounting on offlining: walk HAS
list, check for covered range and gaps.

Fixes: 6df8d9aaf3af ("Drivers: hv: balloon: Correctly update onlined page count")
Signed-off-by: Vitaly Kuznetsov
Signed-off-by: K. Y. Srinivasan
---
 drivers/hv/hv_balloon.c | 82 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 69 insertions(+), 13 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5b8e1ad1bcfe..7514a32d0de4 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -576,11 +576,65 @@ static struct hv_dynmem_device dm_device;
 static void post_status(struct hv_dynmem_device *dm);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
+				     unsigned long pfn)
+{
+	struct hv_hotadd_gap *gap;
+
+	/* The page is not backed. */
+	if ((pfn < has->covered_start_pfn) || (pfn >= has->covered_end_pfn))
+		return false;
+
+	/* Check for gaps. */
+	list_for_each_entry(gap, &has->gap_list, list) {
+		if ((pfn >= gap->start_pfn) && (pfn < gap->end_pfn))
+			return false;
+	}
+
+	return true;
+}
+
+static unsigned long hv_page_offline_check(unsigned long start_pfn,
+					   unsigned long nr_pages)
+{
+	unsigned long pfn = start_pfn, count = 0;
+	struct hv_hotadd_state *has;
+	bool found;
+
+	while (pfn < start_pfn + nr_pages) {
+		/*
+		 * Search for HAS which covers the pfn and when we find one
+		 * count how many consecutive PFNs are covered.
+		 */
+		found = false;
+		list_for_each_entry(has, &dm_device.ha_region_list, list) {
+			while ((pfn >= has->start_pfn) &&
+			       (pfn < has->end_pfn) &&
+			       (pfn < start_pfn + nr_pages)) {
+				found = true;
+				if (has_pfn_is_backed(has, pfn))
+					count++;
+				pfn++;
+			}
+		}
+
+		/*
+		 * This PFN is not in any HAS (e.g. we're offlining a region
+		 * which was present at boot), no need to account for it. Go
+		 * to the next one.
+		 */
+		if (!found)
+			pfn++;
+	}
+
+	return count;
+}
+
 static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 			      void *v)
 {
 	struct memory_notify *mem = (struct memory_notify *)v;
-	unsigned long flags;
+	unsigned long flags, pfn_count;
 
 	switch (val) {
 	case MEM_ONLINE:
@@ -593,7 +647,19 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 
 	case MEM_OFFLINE:
 		spin_lock_irqsave(&dm_device.ha_lock, flags);
-		dm_device.num_pages_onlined -= mem->nr_pages;
+		pfn_count = hv_page_offline_check(mem->start_pfn,
+						  mem->nr_pages);
+		if (pfn_count <= dm_device.num_pages_onlined) {
+			dm_device.num_pages_onlined -= pfn_count;
+		} else {
+			/*
+			 * We're offlining more pages than we managed to online.
+			 * This is unexpected. In any case don't let
+			 * num_pages_onlined wrap around zero.
+			 */
+			WARN_ON_ONCE(1);
+			dm_device.num_pages_onlined = 0;
+		}
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 		break;
 	case MEM_GOING_ONLINE:
@@ -612,19 +678,9 @@ static struct notifier_block hv_memory_nb = {
 /* Check if the particular page is backed and can be onlined and online it. */
 static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
 {
-	struct hv_hotadd_gap *gap;
-	unsigned long pfn = page_to_pfn(pg);
-
-	/* The page is not backed. */
-	if ((pfn < has->covered_start_pfn) || (pfn >= has->covered_end_pfn))
+	if (!has_pfn_is_backed(has, page_to_pfn(pg)))
 		return;
 
-	/* Check for gaps. */
-	list_for_each_entry(gap, &has->gap_list, list) {
-		if ((pfn >= gap->start_pfn) && (pfn < gap->end_pfn))
-			return;
-	}
-
 	/* This frame is currently backed; online the page. */
 	__online_page_set_limits(pg);
 	__online_page_increment_counters(pg);
-- 
2.15.1
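
For readers who want to experiment with the accounting outside the kernel, here is a minimal userspace sketch of the idea in the patch above: count only the PFNs that lie inside a hot-add region's covered range and outside its gaps, then subtract that count without letting the unsigned counter wrap. The names `struct gap`, `struct region`, `pfn_is_backed`, `count_backed` and `sub_clamped` are hypothetical stand-ins and not the driver's API. Plain arrays replace the kernel's linked lists, there is no locking, and the walk checks one PFN at a time rather than consuming runs of PFNs per region as the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct hv_hotadd_gap. */
struct gap {
	unsigned long start_pfn, end_pfn;	/* [start_pfn, end_pfn) */
};

/* Hypothetical, simplified stand-in for struct hv_hotadd_state. */
struct region {
	unsigned long start_pfn, end_pfn;	/* whole hot-add region */
	unsigned long covered_start_pfn;	/* populated part: start */
	unsigned long covered_end_pfn;		/* populated part: end */
	const struct gap *gaps;			/* holes in the covered part */
	size_t nr_gaps;
};

/* A PFN is backed if it is inside the covered range and in no gap. */
static bool pfn_is_backed(const struct region *r, unsigned long pfn)
{
	if (pfn < r->covered_start_pfn || pfn >= r->covered_end_pfn)
		return false;

	for (size_t i = 0; i < r->nr_gaps; i++) {
		if (pfn >= r->gaps[i].start_pfn && pfn < r->gaps[i].end_pfn)
			return false;
	}

	return true;
}

/*
 * Count backed PFNs in [start_pfn, start_pfn + nr_pages). PFNs that
 * belong to no region (e.g. memory present at boot) contribute nothing.
 */
static unsigned long count_backed(const struct region *regions,
				  size_t nr_regions,
				  unsigned long start_pfn,
				  unsigned long nr_pages)
{
	unsigned long count = 0;

	for (unsigned long pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
		for (size_t i = 0; i < nr_regions; i++) {
			const struct region *r = &regions[i];

			if (pfn >= r->start_pfn && pfn < r->end_pfn) {
				if (pfn_is_backed(r, pfn))
					count++;
				break;	/* first matching region decides */
			}
		}
	}

	return count;
}

/* Subtract without wrapping: clamp at zero if more is offlined than onlined. */
static unsigned long sub_clamped(unsigned long onlined, unsigned long offlined)
{
	return offlined <= onlined ? onlined - offlined : 0;
}
```

With a region spanning PFNs 100..200, covered 100..150, and a gap at 110..120, offlining the whole region removes only the backed pages from the counter, and offlining a block that no region describes removes nothing, so neither case described in the commit message can push the counter below zero.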