Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753281AbcCBKFL (ORCPT ); Wed, 2 Mar 2016 05:05:11 -0500 Received: from mx2.suse.de ([195.135.220.15]:37520 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750843AbcCBKFD (ORCPT ); Wed, 2 Mar 2016 05:05:03 -0500 Subject: Re: [PATCH v2 4/5] mm, kswapd: replace kswapd compaction with waking up kcompactd To: Joonsoo Kim References: <1454938691-2197-1-git-send-email-vbabka@suse.cz> <1454938691-2197-5-git-send-email-vbabka@suse.cz> <20160302063322.GB32695@js1304-P5Q-DELUXE> Cc: linux-mm@kvack.org, Andrew Morton , linux-kernel@vger.kernel.org, Andrea Arcangeli , "Kirill A. Shutemov" , Rik van Riel , Mel Gorman , David Rientjes , Michal Hocko , Johannes Weiner From: Vlastimil Babka Message-ID: <56D6BACB.7060005@suse.cz> Date: Wed, 2 Mar 2016 11:04:59 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160302063322.GB32695@js1304-P5Q-DELUXE> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5938 Lines: 131 On 03/02/2016 07:33 AM, Joonsoo Kim wrote: >> >> 4.5-rc1 4.5-rc1 >> 3-test 4-test >> Minor Faults 106940795 106582816 >> Major Faults 829 813 >> Swap Ins 482 311 >> Swap Outs 6278 5598 >> Allocation stalls 128 184 >> DMA allocs 145 32 >> DMA32 allocs 74646161 74843238 >> Normal allocs 26090955 25886668 >> Movable allocs 0 0 >> Direct pages scanned 32938 31429 >> Kswapd pages scanned 2183166 2185293 >> Kswapd pages reclaimed 2152359 2134389 >> Direct pages reclaimed 32735 31234 >> Kswapd efficiency 98% 97% >> Kswapd velocity 1243.877 1228.666 >> Direct efficiency 99% 99% >> Direct velocity 18.767 17.671 >> Percentage direct scans 1% 1% >> Zone normal velocity 299.981 291.409 >> Zone dma32 velocity 962.522 954.928 >> Zone dma velocity 0.142 0.000 >> Page writes by reclaim 6278.800 5598.600 >> Page writes file 0 0 >> Page writes anon 6278 5598 >> Page reclaim immediate 93 96 >> Sector Reads 4357114 4307161 >> Sector Writes 11053628 11053091 >> Page rescued immediate 0 0 >> Slabs scanned 1592829 1555770 >> Direct inode steals 1557 2025 >> Kswapd inode steals 46056 45418 >> Kswapd skipped wait 0 0 >> THP fault alloc 579 614 >> THP collapse alloc 304 324 >> THP splits 0 0 >> THP fault fallback 793 730 >> THP collapse fail 11 14 >> Compaction stalls 1013 959 >> Compaction success 92 69 >> Compaction failures 920 890 >> Page migrate success 238457 662054 >> Page migrate failure 23021 32846 >> Compaction pages isolated 504695 1370326 >> Compaction migrate scanned 661390 7025772 >> Compaction free scanned 13476658 73302642 >> Compaction cost 262 762 >> >> After this patch we see improvements in allocation success rate (especially for >> phase 3) along with increased compaction activity. The compaction stalls >> (direct compaction) in the interfering kernel builds (probably THP's) also >> decreased somewhat to kcompactd activity, yet THP alloc successes improved a >> bit. > > Why you did the test with THP? THP interferes result of main test so > it would be better not to enable it. Hmm I've always left it enabled. It makes for a more realistic interference and would also show unintended regressions in that closely related area. > And, this patch increased compaction activity (10 times for migrate scanned) > may be due to resetting skip block information. Note that kswapd compaction activity was completely non-existent for reasons outlined in the changelog. > Isn't is better to disable it > for this patch to work as similar as possible that kswapd does and re-enable it > on next patch? If something goes bad, it can simply be reverted. > > Look like it is even not mentioned in the description. Yeah skip block information is discussed in the next patch, which mentions that it's being reset and why. I think it makes more sense, as when kswapd reclaims from low watermark to high, potentially many pageblocks have new free pages and the skip bits are obsolete. Next, kcompactd is separate thread, so it doesn't stall allocations (or kswapd reclaim) by its activity. Personally I hope that one day we can get rid of the skip bits completely. They can make the stats look apparently nicer, but I think their effect is nearly random. >> @@ -3066,8 +3071,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining, >> */ >> static bool kswapd_shrink_zone(struct zone *zone, >> int classzone_idx, >> - struct scan_control *sc, >> - unsigned long *nr_attempted) >> + struct scan_control *sc) >> { >> int testorder = sc->order; > > You can remove testorder completely. Hm right, thanks. >> -static unsigned long balance_pgdat(pg_data_t *pgdat, int order, >> - int *classzone_idx) >> +static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx) >> { >> int i; >> int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ >> @@ -3166,9 +3155,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, >> count_vm_event(PAGEOUTRUN); >> >> do { >> - unsigned long nr_attempted = 0; >> bool raise_priority = true; >> - bool pgdat_needs_compaction = (order > 0); >> >> sc.nr_reclaimed = 0; >> >> @@ -3203,7 +3190,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, >> break; >> } >> >> - if (!zone_balanced(zone, order, 0, 0)) { >> + if (!zone_balanced(zone, order, true, 0, 0)) { > > Should we use highorder = true? We eventually skip to reclaim in the > kswapd_shrink_zone() when zone_balanced(,,false,,) is true. Hmm right. I probably thought that the value of end_zone -> balanced_classzone_idx would be important when waking kcompactd, but it's not used, so it's causing just some wasted CPU cycles. Thanks for the reviews!