From: Wei Wang
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
    akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, pbonzini@redhat.com, wei.w.wang@intel.com,
    liliang.opensource@gmail.com, yang.zhang.wz@gmail.com, quan.xu0@gmail.com,
    nilal@redhat.com, riel@redhat.com, peterx@redhat.com
Subject: [PATCH v33 1/4] mm: add a function to get free page blocks
Date: Fri, 15 Jun 2018 12:43:10 +0800
Message-Id: <1529037793-35521-2-git-send-email-wei.w.wang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1529037793-35521-1-git-send-email-wei.w.wang@intel.com>
References: <1529037793-35521-1-git-send-email-wei.w.wang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a function to get free page blocks from a free page list.
The obtained free page blocks are only hints about free pages, because
there is no guarantee that they are still on the free page list after the
function returns.

One example use of this function is to accelerate live migration by
skipping the transfer of free pages reported by the guest. A popular
method used by the hypervisor to track which parts of memory are written
during live migration is to write-protect all the guest memory. So, pages
that are hinted as free but are written after this function returns will
be captured by the hypervisor and added to the next round of memory
transfer.

Suggested-by: Linus Torvalds
Signed-off-by: Wei Wang
Signed-off-by: Liang Li
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Michael S. Tsirkin
Cc: Linus Torvalds
---
 include/linux/mm.h |  1 +
 mm/page_alloc.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e49388..c58b4e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2002,6 +2002,7 @@ extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07b3c23..7c816d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5043,6 +5043,58 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+/**
+ * get_from_free_page_list - get free page blocks from a free page list
+ * @order: the order of the free page list to check
+ * @buf: the array to store the physical addresses of the free page blocks
+ * @size: the array size
+ *
+ * This function offers hints about free pages. There is no guarantee that
+ * the obtained free pages are still on the free page list after the function
+ * returns. pfn_to_page on the obtained free pages is strongly discouraged
+ * and if there is an absolute need for that, make sure to contact MM people
+ * to discuss potential problems.
+ *
+ * The addresses are currently stored to the array in little endian. This
+ * avoids the overhead of converting endianness by the caller who needs data
+ * in the little endian format. Big endian support can be added on demand in
+ * the future.
+ *
+ * Return the number of free page blocks obtained from the free page list.
+ * The maximum number of free page blocks that can be obtained is limited to
+ * the caller's array size.
+ */
+uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t size)
+{
+	struct zone *zone;
+	enum migratetype mt;
+	struct page *page;
+	struct list_head *list;
+	unsigned long addr, flags;
+	uint32_t index = 0;
+
+	for_each_populated_zone(zone) {
+		spin_lock_irqsave(&zone->lock, flags);
+		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+			list = &zone->free_area[order].free_list[mt];
+			list_for_each_entry(page, list, lru) {
+				addr = page_to_pfn(page) << PAGE_SHIFT;
+				if (likely(index < size)) {
+					buf[index++] = cpu_to_le64(addr);
+				} else {
+					spin_unlock_irqrestore(&zone->lock,
+							       flags);
+					return index;
+				}
+			}
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+
+	return index;
+}
+EXPORT_SYMBOL_GPL(get_from_free_page_list);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
2.7.4
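
Illustration (not part of the patch): the hint semantics described in the
commit message can be modelled in user space. The sketch below is only a
minimal model under stated assumptions. Every model_* name is hypothetical,
a plain array of PFNs stands in for one free list, 4 KiB pages are assumed,
and the consumer that clears "needs transfer" bits is an assumed stand-in
for whatever the hypervisor side actually does. It shows the bounded copy
into the caller's array, the little-endian storage, and how one hinted
block of 2^order pages might be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical helper: lay out v in little-endian byte order on any host. */
static uint64_t model_cpu_to_le64(uint64_t v)
{
	uint64_t out = 0;
	unsigned char *p = (unsigned char *)&out;

	for (int i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (8 * i));
	return out;
}

/* Hypothetical helper: read a little-endian value back into host order. */
static uint64_t model_le64_to_cpu(uint64_t v)
{
	const unsigned char *p = (const unsigned char *)&v;
	uint64_t out = 0;

	for (int i = 0; i < 8; i++)
		out |= (uint64_t)p[i] << (8 * i);
	return out;
}

/*
 * Model of the bounded copy: free_pfns stands in for one free list.
 * Physical addresses are stored until the caller's array is full, and
 * the number of entries written is returned.
 */
static uint32_t model_get_free_blocks(const uint64_t *free_pfns, size_t nr_free,
				      uint64_t buf[], uint32_t size)
{
	uint32_t index = 0;

	assert(buf != NULL || size == 0);
	for (size_t i = 0; i < nr_free && index < size; i++)
		buf[index++] = model_cpu_to_le64(free_pfns[i] << MODEL_PAGE_SHIFT);
	return index;
}

/*
 * Assumed consumer: clear the "needs transfer" flag of every page that
 * belongs to a hinted block of 2^order pages. Pages written later would
 * have to be re-marked by dirty tracking, which is not modelled here.
 */
static void model_skip_hinted(uint8_t *need_transfer, size_t nr_pages,
			      const uint64_t buf[], uint32_t n, int order)
{
	for (uint32_t i = 0; i < n; i++) {
		uint64_t pfn = model_le64_to_cpu(buf[i]) >> MODEL_PAGE_SHIFT;

		for (uint64_t j = 0; j < (1ULL << order); j++) {
			if (pfn + j < nr_pages)
				need_transfer[pfn + j] = 0;
		}
	}
}
```

With three free blocks and a two-entry array, the model returns 2 and the
third block is simply not reported, which mirrors the "limited to the
caller's array size" rule in the kernel-doc comment above.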