Date: Thu, 3 Aug 2017 12:44:18 +0200
From: Michal Hocko
To: Wei Wang
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, mst@redhat.com,
	mawilcox@microsoft.com, akpm@linux-foundation.org,
	virtio-dev@lists.oasis-open.org, david@redhat.com,
	cornelia.huck@de.ibm.com, mgorman@techsingularity.net,
	aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
	quan.xu@aliyun.com
Subject: Re: [PATCH v13 4/5] mm: support reporting free page blocks
Message-ID: <20170803104417.GI12521@dhcp22.suse.cz>
In-Reply-To: <5982FE07.3040207@intel.com>
References: <1501742299-4369-1-git-send-email-wei.w.wang@intel.com>
	<1501742299-4369-5-git-send-email-wei.w.wang@intel.com>
	<20170803091151.GF12521@dhcp22.suse.cz>
	<5982FE07.3040207@intel.com>

On Thu 03-08-17 18:42:15, Wei Wang wrote:
> On 08/03/2017 05:11 PM, Michal Hocko wrote:
> >On Thu 03-08-17 14:38:18, Wei Wang wrote:
[...]
> >>+static int report_free_page_block(struct zone *zone, unsigned int order,
> >>+				unsigned int migratetype, struct page **page)
> >This is just too ugly and wrong actually. Never provide struct page
> >pointers outside of the zone->lock. What I've had in mind was to simply
> >walk free lists of the suitable order and call the callback for each one.
> >Something as simple as
> >
> >	for (i = 0; i < MAX_NR_ZONES; i++) {
> >		struct zone *zone = &pgdat->node_zones[i];
> >
> >		if (!populated_zone(zone))
> >			continue;
> >		spin_lock_irqsave(&zone->lock, flags);
> >		for (order = min_order; order < MAX_ORDER; ++order) {
> >			struct free_area *free_area = &zone->free_area[order];
> >			enum migratetype mt;
> >			struct page *page;
> >
> >			if (!free_area->nr_free)
> >				continue;
> >
> >			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
> >				list_for_each_entry(page,
> >						&free_area->free_list[mt], lru) {
> >
> >					pfn = page_to_pfn(page);
> >					visit(opaque2, pfn, 1 << order);
> >				}
> >			}
> >		}
> >
> >		spin_unlock_irqrestore(&zone->lock, flags);
> >	}
> >
> >[...]
>
> I think the above would hold the lock for too long. That's why we
> prefer to take one free page block each time, and taking them one by
> one also doesn't make a difference for the performance that we need.

I think you should start with the simple approach and improve it
incrementally if it turns out not to be optimal. I really detest taking
struct pages outside of the lock. You never know what might happen
after the lock is dropped. E.g. can you race with memory hot-remove?

> The struct page is used as a "state" to get the next free page block.
> It is only used by an internal implementation of a function in mm (not
> seen by the outside caller). Would this be OK?
> If not, how about a pfn - we can also pass a pfn into the function, do
> pfn_to_page each time the function starts, and then page_to_pfn when
> it returns.
No, just do not try to play tricks with struct pages which might have
gone away.
-- 
Michal Hocko
SUSE Labs
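
Put together, the suggestion quoted above amounts to something like the
following self-contained sketch. It is only an illustration: the name
walk_free_page_blocks, the visit callback and its signature are
assumptions carried over from the quoted snippet, not an existing
kernel API.

	#include <linux/mm.h>
	#include <linux/mmzone.h>
	#include <linux/spinlock.h>

	/*
	 * Walk all free lists of order >= min_order and report each free
	 * block to the caller as a (pfn, nr_pages) range.  struct page is
	 * only dereferenced with zone->lock held; the callback sees plain
	 * pfn ranges, so nothing here outlives the lock.  The callback
	 * must not sleep because the lock is held with IRQs disabled.
	 */
	static void walk_free_page_blocks(void *opaque, unsigned int min_order,
					  void (*visit)(void *opaque,
							unsigned long pfn,
							unsigned long nr_pages))
	{
		struct zone *zone;
		struct page *page;
		unsigned long flags;
		unsigned int order;
		int mt;

		for_each_populated_zone(zone) {
			spin_lock_irqsave(&zone->lock, flags);
			for (order = min_order; order < MAX_ORDER; order++) {
				struct free_area *area = &zone->free_area[order];

				/* Skip empty orders without touching lists. */
				if (!area->nr_free)
					continue;

				for (mt = 0; mt < MIGRATE_TYPES; mt++)
					list_for_each_entry(page,
							&area->free_list[mt],
							lru)
						visit(opaque,
						      page_to_pfn(page),
						      1UL << order);
			}
			spin_unlock_irqrestore(&zone->lock, flags);
		}
	}

Because the callback only ever receives pfn ranges computed under
zone->lock, there is no struct page left to go stale once the lock is
dropped; the reported ranges are hints that the consumer has to treat
as immediately out of date (a block may be allocated, or hot-removed,
right after the walk moves on).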