Date: Wed, 29 Nov 2017 15:41:46 +0900
From: Joonsoo Kim
To: Vlastimil Babka
Cc: Johannes Weiner, Andrew Morton, Mel Gorman, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm, compaction: direct freepage allocation for async direct compaction
Message-ID: <20171129064146.GD8125@js1304-P5Q-DELUXE>

On Wed, Nov 22, 2017 at 03:52:55PM +0100, Vlastimil Babka wrote:
> On 11/22/2017 03:33 PM, Johannes Weiner wrote:
> > From: Vlastimil Babka
> >
> > The goal of direct compaction is to quickly make a high-order page available
> > for the pending allocation. The free page scanner can add significant latency
> > when searching for migration targets, although for compaction to succeed, the
> > only important constraint on the target free pages is that they must not come
> > from the same order-aligned block as the migrated pages.
> >
> > This patch therefore makes direct async compaction allocate freepages directly
> > from freelists. Pages that do come from the same block (which we cannot simply
> > exclude from the freelist allocation) are put on a separate list and released
> > only after migration to allow them to merge.
> >
> > In addition to the reduced stall, another advantage is that we split larger
> > free pages for migration targets only when smaller pages are depleted, while
> > the free scanner can split pages up to (order - 1) as it encounters them.
> > However, this approach likely sacrifices some of the long-term
> > anti-fragmentation features of a thorough compaction, so we limit the direct
> > allocation approach to direct async compaction.
> >
> > For observational purposes, the patch introduces two new counters to
> > /proc/vmstat. compact_free_direct_alloc counts how many pages were allocated
> > directly without scanning, and compact_free_direct_miss counts the subset of
> > these allocations that were from the wrong range and had to be held on the
> > separate list.
> >
> > Signed-off-by: Vlastimil Babka
> > Signed-off-by: Johannes Weiner
> > ---
> >
> > Hi. I'm resending this because we've been struggling with the cost of
> > compaction in our fleet, and this patch helps substantially.
> >
> > On 128G+ machines, we have seen isolate_freepages_block() eat up 40%
> > of the CPU cycles and scan up to a billion PFNs per minute.
> > Not in a spike, but continuously, to service higher-order allocations from
> > the network stack, fork (non-vmap stacks), THP, etc. during regular
> > operation.
> >
> > I've been running this patch on a handful of less-affected but still
> > pretty bad machines for a week, and the results look pretty great:
> >
> > http://cmpxchg.org/compactdirectalloc/compactdirectalloc.png
>
> Thanks a lot, that's very encouraging!
>
> >
> > Note the two different scales - otherwise the compact_free_direct
> > lines wouldn't be visible. The free scanner peaks close to 10M pages
> > checked per minute, whereas the direct allocations peak at under 180
> > per minute, direct misses at 50.
> >
> > The work doesn't increase over this period, which is a good sign that
> > long-term we're not trending toward worse fragmentation.
> >
> > There was an outstanding concern from Joonsoo regarding this patch -
> > https://marc.info/?l=linux-mm&m=146035962702122&w=2 - although that
> > didn't seem to affect us much in practice.
>
> That concern would be easy to fix, but I was also concerned that if there
> are multiple direct compactions in parallel, they might keep too many free
> pages isolated away. Recently I resumed work on this and came up with a
> different approach, where I put the pages immediately back on the tail of
> the free lists. The downside might be more "direct misses".
> Also, I no longer plan to restrict this to async compaction, because
> if it's a better way, we should use it everywhere. So here's how it
> looks now (only briefly tested); we could compare and pick the better
> approach, or go with the older one for now and potentially change it later.

IMHO, "good bye free scanner" is the way to go. My major concern is that
the co-existence of two different compaction algorithms makes the system
behaviour less predictable and makes debugging hard.
And this compaction stall is an immediate, real problem that has been reported
many times, unlike the theoretical long-term fragmentation that the current
freepage scanner tries to prevent.

Thanks.
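The two ways of handling a "direct miss" discussed in this thread (holding the page on a separate list until migration finishes, as in the original patch, versus putting it straight back on the tail of the free list, as in Vlastimil's revised approach) can be compared in a minimal user-space sketch in C. It is a toy model under stated assumptions, not the kernel implementation: the free list is a plain ring of order-0 PFNs rather than the buddy allocator's per-order lists, no buddy merging happens, and every identifier (struct freelist, direct_alloc_target(), HOLD_UNTIL_MIGRATED, PUT_BACK_TAIL and the rest) is hypothetical. The two counters only mirror the meaning the changelog gives to compact_free_direct_alloc and compact_free_direct_miss.

```c
/*
 * Toy user-space model with hypothetical names throughout; NOT kernel code.
 * A "free list" here is a ring buffer of order-0 PFNs, whereas real
 * compaction works on the buddy allocator's per-order free lists.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FREE 64

struct freelist {
	unsigned long pfn[MAX_FREE];
	size_t head;	/* index of the next page to hand out */
	size_t count;	/* number of free pages currently in the ring */
};

struct direct_stats {
	unsigned long direct_alloc;	/* mirrors the meaning of compact_free_direct_alloc */
	unsigned long direct_miss;	/* mirrors the meaning of compact_free_direct_miss */
};

enum miss_policy {
	HOLD_UNTIL_MIGRATED,	/* original patch: keep misses on a separate list */
	PUT_BACK_TAIL,		/* revised idea: return misses to the list tail */
};

/* Does pfn fall in the same 2^order-aligned block as the page being migrated? */
static bool same_block(unsigned long pfn, unsigned long src_pfn, unsigned int order)
{
	return (pfn >> order) == (src_pfn >> order);
}

static unsigned long fl_pop_head(struct freelist *fl)
{
	assert(fl->count > 0);
	unsigned long pfn = fl->pfn[fl->head];
	fl->head = (fl->head + 1) % MAX_FREE;
	fl->count--;
	return pfn;
}

static void fl_push_tail(struct freelist *fl, unsigned long pfn)
{
	assert(fl->count < MAX_FREE);
	fl->pfn[(fl->head + fl->count) % MAX_FREE] = pfn;
	fl->count++;
}

/*
 * Take a migration target straight from the free list instead of scanning.
 * Every page taken counts as a direct allocation; pages from the block being
 * compacted are misses. held[] needs room for MAX_FREE entries.
 */
static bool direct_alloc_target(struct freelist *fl, unsigned long src_pfn,
				unsigned int order, enum miss_policy policy,
				unsigned long *held, size_t *nheld,
				struct direct_stats *st, unsigned long *out)
{
	size_t tries = fl->count;	/* look at each currently free page at most once */

	while (tries-- > 0) {
		unsigned long pfn = fl_pop_head(fl);

		st->direct_alloc++;
		if (!same_block(pfn, src_pfn, order)) {
			*out = pfn;
			return true;
		}
		st->direct_miss++;
		if (policy == HOLD_UNTIL_MIGRATED)
			held[(*nheld)++] = pfn;	/* isolated until migration finishes */
		else
			fl_push_tail(fl, pfn);	/* immediately free again, at the tail */
	}
	return false;
}

/* After migration, give held pages back (the toy ring does no buddy merging). */
static void release_held(struct freelist *fl, unsigned long *held, size_t *nheld)
{
	while (*nheld > 0)
		fl_push_tail(fl, held[--(*nheld)]);
}
```

For example, with free PFNs 512, 513, 1024 and 2048 and a page at PFN 512 being migrated with order 9 (block 512..1023), both policies hand out PFN 1024 after two misses. HOLD_UNTIL_MIGRATED keeps 512 and 513 isolated until release_held() is called, while PUT_BACK_TAIL leaves them on the list, where a later call may pick them up and count them as misses again. That is one way to see why the tail variant avoids keeping too many pages isolated under parallel compaction but may show more "direct misses".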