Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp3121741ybh; Mon, 5 Aug 2019 12:26:25 -0700 (PDT) X-Google-Smtp-Source: APXvYqxmgbjVQ25KbLZ3+BuRgQ1HkUe/TaPzzN5/J2O4oH8/qF5oXtYEPQKop+Oin46GZAGPJnIj X-Received: by 2002:a17:902:aa83:: with SMTP id d3mr141299789plr.74.1565033185148; Mon, 05 Aug 2019 12:26:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565033185; cv=none; d=google.com; s=arc-20160816; b=hD41kMvN4nBFS+C2BWtlPaOAXhoYarAFsyM0Oke/QxNG4Tpz8hxYU4lSGLHrUkj7N3 cjeOwLx8SuFFozGDLkJRTKwk6Pl96jlNC778Ssutmmgq++yJa72KQ+NL5SQ3KKwLAkQr iFofJe6zKm/vYeSZAlv0bsjC2zrODYNQNE1T/I00BAb6LuOrURcx0SOKFhJ+1EXwwqBD QNljlS9kx86GBYB3opeTIG1PhkLyzVeWBA+lc3iQdXpXYdFM/uUUHqxxHKHJZ532J27F IwHFEhv5r+n7Zcq+iNzNulT8vEX7ESX6dgbY8/zJqvntXIGJHNLxmZpRG+0FThanimx4 6Xyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=jBhZHZxcR0OIAcxY7/57CQdVnxqW0BKtiYJig9O330A=; b=jBIWabK5I9GG2+IyRKmgdSG1NkpUztWME54CwFJKOuuszuiVMmAOVGxgU9oXJXanwK z1v7IjE/QUbJFcZkFD+/rycro6NhRBAsb9a5zgIiQw1U9z3akZwZrPuT3QcBRwiTrI52 1jMu9330leXXI5Ah1L3Ah98jscECGXRXF39MGWHZBWSqtFstRIr7lKfzK4Odar3y5TLK PYHk0x7Qqk4EQgj4H4ppaNRIoNisZ4/NPchTWHR9lEjBnZCi/Szjdb+SCSzH4T0/QdZ5 coT0w5devqh4iA/IDcl68T7eMdRJ6vsCSXmsXoPqzqNHqjETfSXg8oGjIpPzQTkih/bW k04g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=OWoEZJcf; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t3si45814199pfh.218.2019.08.05.12.26.08; Mon, 05 Aug 2019 12:26:25 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=OWoEZJcf; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730371AbfHETZL (ORCPT + 99 others); Mon, 5 Aug 2019 15:25:11 -0400 Received: from mail-yw1-f66.google.com ([209.85.161.66]:33508 "EHLO mail-yw1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730011AbfHETZK (ORCPT ); Mon, 5 Aug 2019 15:25:10 -0400 Received: by mail-yw1-f66.google.com with SMTP id l124so30007398ywd.0 for ; Mon, 05 Aug 2019 12:25:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=jBhZHZxcR0OIAcxY7/57CQdVnxqW0BKtiYJig9O330A=; b=OWoEZJcfzeB9SYxPFawo056U4q1al2l7k+0gU3JECGIOX5fJ9AIdUCnznZLyUznYRF Tt203xpJn7adxpO28UHjqgDOoQm2q85rp40ZFW+XF8uvBcrlSXeStp3k0U3tihqqdSZG HqAu/l+qBcDLyzGHEVoJraaaC6fsenAwxprN829u8nILdwy08WqWvqliXoPBXr2Y/7JM b7Nf9zKAJdeZoKdGBMxeicWw3wXpN10PUPyvRrn6Hzdwvuh0Ss0RCTr5DRAAeRPt0GnF uzYpyuws4/j/4HUokCCBpgk9/3CI8BxdwHhbi3hJWUq/6jl9NtTOOd8QPD+zqw5w5f6u nx+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=jBhZHZxcR0OIAcxY7/57CQdVnxqW0BKtiYJig9O330A=; b=pGbbxbTS5UGXu58wLrjEmNROq6kJeZp3JqlWzJ/QhGC/VAX4VGb0ndMUO4uDF677ZZ 8vcU9ZRFKrZn3C1rcjZspgWtmbkcxlAVw5biWEN3bSN5oYL6PGddQYrl0KunG8k6bFh7 ZhUAZyM6RwilzPkQp9R0W3vGGFMBjfioyOd0geGneg/Ldcozei66C2xxv8tI7DLn0f9p uO4p0BqkG8MWzSoHMlmx5NN2gZLCZXVf8pDmh5lyF9xYV2cT8Y5XjXEcrL292wZaMFYz 7sge37pT5y3YxHtq0dW7qa4yF/vOLWd5WmrHDZGIFR4Xtz+DfrzIF2Baad9xs4uaLDwo sZvA== X-Gm-Message-State: APjAAAVRnKUMycAPk4gScj559ksO3dSHRsEZiNRmYPtgcVp2gtUDBz/1 yZ8QDDvRIQ0FEvWGqXm0dXb5cVE8cYPG0SqyHMqUDQ== X-Received: by 2002:a0d:cb42:: with SMTP id n63mr37267027ywd.205.1565033109503; Mon, 05 Aug 2019 12:25:09 -0700 (PDT) MIME-Version: 1.0 References: <156431697805.3170.6377599347542228221.stgit@buzz> <20190729091738.GF9330@dhcp22.suse.cz> <3d6fc779-2081-ba4b-22cf-be701d617bb4@yandex-team.ru> <20190729103307.GG9330@dhcp22.suse.cz> <20190729184850.GH9330@dhcp22.suse.cz> <20190802093507.GF6461@dhcp22.suse.cz> <20190805143239.GS7597@dhcp22.suse.cz> In-Reply-To: <20190805143239.GS7597@dhcp22.suse.cz> From: Shakeel Butt Date: Mon, 5 Aug 2019 12:24:58 -0700 Message-ID: Subject: Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop To: Michal Hocko Cc: Yang Shi , Konstantin Khlebnikov , Linux MM , Linux Kernel Mailing List , Cgroups , Vladimir Davydov , Johannes Weiner Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 5, 2019 at 7:32 AM Michal Hocko wrote: > > On Fri 02-08-19 11:56:28, Yang Shi wrote: > > On Fri, Aug 2, 2019 at 2:35 AM Michal Hocko wrote: > > > > > > On Thu 01-08-19 14:00:51, Yang Shi wrote: > > > > On Mon, Jul 29, 2019 at 11:48 AM Michal Hocko wrote: > > > > > > > > > > On Mon 29-07-19 10:28:43, Yang Shi wrote: > > > > > [...] > > > > > > I don't worry too much about scale since the scale issue is not unique > > > > > > to background reclaim, direct reclaim may run into the same problem. > > > > > > > > > > Just to clarify. By scaling problem I mean 1:1 kswapd thread to memcg. > > > > > You can have thousands of memcgs and I do not think we really do want > > > > > to create one kswapd for each. Once we have a kswapd thread pool then we > > > > > get into a tricky land where a determinism/fairness would be non trivial > > > > > to achieve. Direct reclaim, on the other hand is bound by the workload > > > > > itself. > > > > > > > > Yes, I agree thread pool would introduce more latency than dedicated > > > > kswapd thread. But, it looks not that bad in our test. When memory > > > > allocation is fast, even though dedicated kswapd thread can't catch > > > > up. So, such background reclaim is best effort, not guaranteed. > > > > > > > > I don't quite get what you mean about fairness. Do you mean they may > > > > spend excessive cpu time then cause other processes starvation? I > > > > think this could be mitigated by properly organizing and setting > > > > groups. But, I agree this is tricky. > > > > > > No, I meant that the cost of reclaiming a unit of charges (e.g. > > > SWAP_CLUSTER_MAX) is not constant and depends on the state of the memory > > > on LRUs. Therefore any thread pool mechanism would lead to unfair > > > reclaim and non-deterministic behavior. > > > > Yes, the cost depends on the state of pages, but I still don't quite > > understand what does "unfair" refer to in this context. Do you mean > > some cgroups may reclaim much more than others? > > > Or the work may take too long so it can't not serve other cgroups in time? > > exactly. > How about allowing the users to implement their own user space kswapd? A memcg interface similar to MADV_PAGEOUT. Users can register for MEMCG_HIGH notification (it needs some modification) and on receiving the notification, the uswapd (User's kswapd) will trigger reclaim through memory.pageout (or memory.try_to_free_pages). One can argue why not just use MADV_PAGEOUT? In real workload, a job can be a combination of different sub-jobs and most probably may not know the importance of the memory layout of the tasks of the sub-jobs. So, a memcg level interface makes more sense there. Shakeel