Date: Tue, 20 Mar 2018 09:39:50 +0100
From: Michal Hocko
To: David Rientjes
Cc: "Li,Rongqing", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, hannes@cmpxchg.org, Andrey Ryabinin
Subject: Re: Re: Re: [PATCH] mm/memcontrol.c: speed up to force empty a memory cgroup
Message-ID: <20180320083950.GD23100@dhcp22.suse.cz>
References: <1521448170-19482-1-git-send-email-lirongqing@baidu.com>
	<20180319085355.GQ23100@dhcp22.suse.cz>
	<2AD939572F25A448A3AE3CAEA61328C23745764B@BC-MAIL-M28.internal.baidu.com>
	<20180319103756.GV23100@dhcp22.suse.cz>
	<2AD939572F25A448A3AE3CAEA61328C2374589DC@BC-MAIL-M28.internal.baidu.com>
On Mon 19-03-18 10:51:57, David Rientjes wrote:
> On Mon, 19 Mar 2018, Li,Rongqing wrote:
>
> > > > Although SWAP_CLUSTER_MAX is used at the lower level, the call
> > > > stack of try_to_free_mem_cgroup_pages is long, so increasing
> > > > nr_to_reclaim reduces the number of calls to do_try_to_free_pages,
> > > > shrink_zones, and shrink_node:
> > > >
> > > > mem_cgroup_resize_limit
> > > >  ---> try_to_free_mem_cgroup_pages: .nr_to_reclaim = max(1024, SWAP_CLUSTER_MAX)
> > > >   ---> do_try_to_free_pages
> > > >    ---> shrink_zones
> > > >     ---> shrink_node
> > > >      ---> shrink_node_memcg
> > > >       ---> shrink_list        <------- the loop happens here [times = 1024/32]
> > > >        ---> shrink_page_list
> > >
> > > Can you actually measure this to be the culprit? We should rethink
> > > our call path if it is too complicated/deep to perform well.
> > > Adding arbitrary batch sizes doesn't sound like a good way to go to me.
> >
> > Ok, I will try.
>
> Looping in mem_cgroup_resize_limit(), which takes memcg_limit_mutex on
> every iteration and thereby contends with limit changes in other cgroups
> (on our systems, thousands), while calling try_to_free_mem_cgroup_pages()
> with less than SWAP_CLUSTER_MAX is lame.

Well, if the global lock is a bottleneck in your deployments then we can
come up with something more clever, e.g. per-hierarchy locking, or even
drop the lock for the reclaim altogether. If we reclaim in
SWAP_CLUSTER_MAX batches then the potential over-reclaim risk is quite
low when multiple users are shrinking the same (sub)hierarchy.

> It would probably be best to limit the nr_pages to the amount that needs
> to be reclaimed, though, rather than over-reclaiming.

How do you achieve that? The charging path is not synchronized with the
shrinking one at all.

> If you wanted to be invasive, you could change page_counter_limit() to
> return the count - limit, fix up the callers that look for -EBUSY, and
> then use max(val, SWAP_CLUSTER_MAX) as your nr_pages.

I am not sure I understand.

-- 
Michal Hocko
SUSE Labs
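
For readers following the thread, below is a minimal user-space sketch of
the retry loop under discussion. This is an illustration, not kernel code:
the names mirror mem_cgroup_resize_limit() and
try_to_free_mem_cgroup_pages(), the reclaim itself is simulated, and the
max(count - limit, SWAP_CLUSTER_MAX) batch policy is the suggestion from
this thread rather than anything merged. It shows how the per-call batch
size determines the number of reclaim calls (and thus mutex round-trips)
needed to bring usage down to a new limit.

/* Compile with: cc -o resize_sketch resize_sketch.c */
#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL	/* the kernel's low-level reclaim batch */

static unsigned long usage;	/* simulated memcg page counter */

/* Stand-in for try_to_free_mem_cgroup_pages(): frees up to nr pages. */
static unsigned long simulated_reclaim(unsigned long nr_to_reclaim)
{
	unsigned long freed = nr_to_reclaim < usage ? nr_to_reclaim : usage;

	usage -= freed;
	return freed;
}

/* Count how many reclaim calls shrink usage from start down to limit,
 * mimicking the loop in mem_cgroup_resize_limit(). */
static unsigned long resize_limit(unsigned long start, unsigned long limit,
				  unsigned long batch)
{
	unsigned long calls = 0;

	usage = start;
	while (usage > limit) {
		if (!simulated_reclaim(batch))
			break;	/* the kernel would eventually give up with -EBUSY */
		calls++;
	}
	return calls;
}

int main(void)
{
	unsigned long start = 1UL << 20;	/* 1M pages above the new limit */
	unsigned long limit = 0;
	unsigned long need = start - limit;
	unsigned long batch = need > SWAP_CLUSTER_MAX ? need : SWAP_CLUSTER_MAX;

	printf("batch = SWAP_CLUSTER_MAX:            %lu calls\n",
	       resize_limit(start, limit, SWAP_CLUSTER_MAX));
	printf("batch = max(count - limit, cluster): %lu calls\n",
	       resize_limit(start, limit, batch));
	return 0;
}

With these assumed numbers the fixed SWAP_CLUSTER_MAX batch takes 32768
calls while the demand-sized batch takes one, which is the contention
trade-off David and Michal are weighing against the over-reclaim risk.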