Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753294AbcJMJXv (ORCPT ); Thu, 13 Oct 2016 05:23:51 -0400
Received: from mx2.suse.de ([195.135.220.15]:44894 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753142AbcJMJXn (ORCPT ); Thu, 13 Oct 2016 05:23:43 -0400
Subject: Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list
To: js1304@gmail.com, Andrew Morton
References: <1476346102-26928-1-git-send-email-iamjoonsoo.kim@lge.com>
        <1476346102-26928-2-git-send-email-iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner , Mel Gorman , linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, Joonsoo Kim
From: Vlastimil Babka
Message-ID: <15d0cf1a-4b73-470d-208f-be7b0ebb48ba@suse.cz>
Date: Thu, 13 Oct 2016 11:04:39 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <1476346102-26928-2-git-send-email-iamjoonsoo.kim@lge.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3711
Lines: 88

On 10/13/2016 10:08 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim
>
> Currently, freeing page can stay longer in the buddy list if next higher
> order page is in the buddy list in order to help coalescence. However,
> it doesn't work for the simplest sequential free case. For example, think
> about the situation that 8 consecutive pages are freed in sequential
> order.
>
> page 0: attached at the head of order 0 list
> page 1: merged with page 0, attached at the head of order 1 list
> page 2: attached at the tail of order 0 list
> page 3: merged with page 2 and then merged with page 0, attached at
>  the head of order 2 list
> page 4: attached at the head of order 0 list
> page 5: merged with page 4, attached at the tail of order 1 list
> page 6: attached at the tail of order 0 list
> page 7: merged with page 6 and then merged with page 4. Lastly, merged
>  with page 0 and we get order 3 freepage.
>
> With excluding page 0 case, there are three cases that freeing page is
> attached at the head of buddy list in this example and if just one
> corresponding ordered allocation request comes at that moment, this page
> in being a high order page will be allocated and we would fail to make
> order-3 freepage.
>
> Allocation usually happens in sequential order and free also does. So, it

Are you sure this is true except right after the system is freshly booted?
As soon as memory becomes fragmented, a stream of order-0 allocations will
likely grab pages randomly from all over the place, and it's unlikely to
recover except for small orders.

> would be important to detect such a situation and to give some chance
> to be coalesced.
>
> I think that simple and effective heuristic about this case is just
> attaching freeing page at the tail of the buddy list unconditionally.
> If freeing isn't merged during one rotation, it would be actual
> fragmentation and we don't need to care about it for coalescence.

I'm not against removing this heuristic, but not without some benchmarks.
The disadvantage of putting pages at the tail of the lists is that they
become cache-cold before they are allocated again. We should check how
large that problem is.

> Signed-off-by: Joonsoo Kim
> ---
>  mm/page_alloc.c | 25 ++-----------------------
>  1 file changed, 2 insertions(+), 23 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1790391..c4f7d05 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -858,29 +858,8 @@ static inline void __free_one_page(struct page *page,
>  done_merging:
>  	set_page_order(page, order);
>
> -	/*
> -	 * If this is not the largest possible page, check if the buddy
> -	 * of the next-highest order is free. If it is, it's possible
> -	 * that pages are being freed that will coalesce soon. In case,
> -	 * that is happening, add the free page to the tail of the list
> -	 * so it's less likely to be used soon and more likely to be merged
> -	 * as a higher order page
> -	 */
> -	if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
> -		struct page *higher_page, *higher_buddy;
> -		combined_idx = buddy_idx & page_idx;
> -		higher_page = page + (combined_idx - page_idx);
> -		buddy_idx = __find_buddy_index(combined_idx, order + 1);
> -		higher_buddy = higher_page + (buddy_idx - combined_idx);
> -		if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
> -			list_add_tail(&page->lru,
> -				&zone->free_area[order].free_list[migratetype]);
> -			goto out;
> -		}
> -	}
> -
> -	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
> -out:
> +	list_add_tail(&page->lru,
> +		&zone->free_area[order].free_list[migratetype]);
>  	zone->free_area[order].nr_free++;
>  }
>
>
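
For reference, the eight-page walk-through quoted above and the head/tail
decision made by the removed hunk can be reproduced with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: the
names toy_init(), toy_free_page(), toy_free_order[], TOY_NPAGES and
TOY_MAX_ORDER are all hypothetical, the "free lists" are reduced to one
array entry per block, and pfn_valid_within(), zones and migratetypes are
ignored. The buddy index is computed as idx ^ (1 << order), which is what
__find_buddy_index() does in the kernel of that time.

```c
#include <assert.h>

#define TOY_NPAGES    8	/* hypothetical: a single order-3 range of pages */
#define TOY_MAX_ORDER 4	/* hypothetical: orders 0..3, so the top order is 3 */

/* toy_free_order[i]: order of the free block starting at page i, or -1. */
static int toy_free_order[TOY_NPAGES];

static void toy_init(void)
{
	for (int i = 0; i < TOY_NPAGES; i++)
		toy_free_order[i] = -1;
}

/* True if a free block of exactly this order starts at idx. */
static int toy_block_is_free(unsigned int idx, int order)
{
	return idx < TOY_NPAGES && toy_free_order[idx] == order;
}

/*
 * Free page idx, merging with free buddies as far as possible.
 * Stores the resulting block order in *out_order and returns 1 if the
 * removed heuristic would have used list_add_tail(), 0 for list_add().
 */
static int toy_free_page(unsigned int idx, int *out_order)
{
	int order = 0;

	assert(idx < TOY_NPAGES);

	/* Merge loop: climb one order per free buddy found. */
	while (order < TOY_MAX_ORDER - 1) {
		unsigned int buddy = idx ^ (1u << order);

		if (!toy_block_is_free(buddy, order))
			break;
		toy_free_order[buddy] = -1;	/* buddy leaves its list */
		idx &= buddy;			/* start of the merged block */
		order++;
	}
	toy_free_order[idx] = order;
	*out_order = order;

	/* Removed heuristic: is the buddy of the next-higher block free? */
	if (order < TOY_MAX_ORDER - 2) {
		unsigned int buddy = idx ^ (1u << order);
		unsigned int combined = idx & buddy;
		unsigned int higher_buddy = combined ^ (1u << (order + 1));

		return toy_block_is_free(higher_buddy, order + 1);
	}
	return 0;
}
```

Calling toy_init() and then toy_free_page() for pages 0 through 7 in order
gives head, head, tail, head, head, tail, tail for pages 0 to 6, and page 7
merges all the way up to a single order-3 block, which matches the quoted
list. The model says nothing about cache hotness or about what happens on a
fragmented system, which is the part that needs benchmarks.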