Date: Mon, 29 Jul 2019 12:33:07 +0200
From: Michal Hocko
To: Konstantin Khlebnikov
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	Vladimir Davydov, Johannes Weiner
Subject: Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop
Message-ID: <20190729103307.GG9330@dhcp22.suse.cz>
References: <156431697805.3170.6377599347542228221.stgit@buzz>
 <20190729091738.GF9330@dhcp22.suse.cz>
 <3d6fc779-2081-ba4b-22cf-be701d617bb4@yandex-team.ru>
In-Reply-To: <3d6fc779-2081-ba4b-22cf-be701d617bb4@yandex-team.ru>

On Mon 29-07-19 12:40:29, Konstantin Khlebnikov wrote:
> On 29.07.2019
12:17, Michal Hocko wrote:
> > On Sun 28-07-19 15:29:38, Konstantin Khlebnikov wrote:
> > > High memory limit in memory cgroup allows batching memory reclaim and
> > > deferring it until returning to userland. This moves it out of any locks.
> > >
> > > A fixed gap between the high and max limits works pretty well (we are
> > > using 64 * NR_CPUS pages) except when one syscall allocates tons of
> > > memory. This affects all other tasks in the cgroup because they might
> > > hit the max memory limit in awkward places and/or under hot locks.
> > >
> > > For example, mmap with MAP_POPULATE or MAP_LOCKED might allocate a lot
> > > of pages and push memory cgroup usage far beyond the high memory limit.
> > >
> > > This patch uses the halfway point between the high and max limits as a
> > > threshold: memory reclaim starts there when mem_cgroup_handle_over_high()
> > > is called with the argument only_severe = true; otherwise reclaim is
> > > deferred until returning to userland. If the high limit isn't set,
> > > nothing changes.
> > >
> > > Now a long-running get_user_pages will periodically reclaim cgroup
> > > memory. Other possible targets are the generic file read/write iter loops.
> >
> > I do see how gup can lead to a large high limit excess, but could you be
> > more specific about why that is a problem? We should be reclaiming a
> > similar number of pages cumulatively.
>
> A large gup might push usage close to the limit and keep it there for some
> time. As a result, concurrent allocations will enter direct reclaim right
> at charging much more frequently.

Yes, this is indeed possible. On the other hand, even reclaim from the
charge path doesn't really prevent that from happening, because the context
might get preempted or blocked on locks. So I guess we need more detailed
information about an actual, real-world visible problem here.
> Right now, deferred reclaim after passing the high limit works like a
> distributed memcg kswapd which reclaims memory in the "background" and
> prevents completely synchronous direct reclaim.
>
> Maybe somebody has plans for a real kswapd for memcg?

I am not aware of any. The primary problem back then was that we simply
cannot have a kernel thread per memcg, because that doesn't scale. Using
kthreads and a dynamic pool of threads tends to be quite tricky as well -
e.g. proper accounting, and scaling again.

> I've put mem_cgroup_handle_over_high in gup next to cond_resched(), and
> later that gave me the idea that this is a good place for running any
> deferred work, like a bottom half for tasks. Right now this happens
> only when switching into userspace.

I am not against pushing high-limit memory reclaim into the charge path in
principle. I just want to hear how big a problem this really is in
practice. If this is mostly a theoretical problem, then I would rather
stick with the existing code.
-- 
Michal Hocko
SUSE Labs