From: Shakeel Butt
Date: Wed, 26 Feb 2020 12:25:33 -0800
Subject: Re: [PATCH] mm: memcontrol: asynchronous reclaim for memory.high
To: Johannes Weiner, Yang Shi
Cc: Andrew Morton, Michal Hocko, Tejun Heo, Roman Gushchin, Linux MM,
 Cgroups, LKML, Kernel Team
In-Reply-To: <20200219181219.54356-1-hannes@cmpxchg.org>

On Wed, Feb 19, 2020 at 10:12 AM Johannes Weiner wrote:
>
> We have received regression reports from users whose workloads moved
> into containers and subsequently encountered new latencies. For some
> users these were a nuisance, but for some it meant missing their SLA
> response times. We tracked those delays down to cgroup limits, which
> inject direct reclaim stalls into the workload where previously all
> reclaim was handled by kswapd.
>
> This patch adds asynchronous reclaim to the memory.high cgroup limit
> while keeping direct reclaim as a fallback. In our testing, this
> eliminated all direct reclaim from the affected workload.
>
> memory.high has a grace buffer of about 4% between when it becomes
> exceeded and when allocating threads get throttled. We can use the
> same buffer for the async reclaimer to operate in. If the worker
> cannot keep up and the grace buffer is exceeded, allocating threads
> will fall back to direct reclaim before getting throttled.
>
> For irq-context, there's already async memory.high enforcement. Re-use
> that work item for all allocating contexts, but switch it to the
> unbound workqueue so reclaim work doesn't compete with the workload.
> The work item is per cgroup, which means the workqueue infrastructure
> will create at maximum one worker thread per reclaiming cgroup.
>
> Signed-off-by: Johannes Weiner
> ---
>  mm/memcontrol.c | 60 +++++++++++++++++++++++++++++++++++++------------
>  mm/vmscan.c     | 10 +++++++--

This reminds me of the per-memcg kswapd proposal from LSFMM 2018
(https://lwn.net/Articles/753162/).

If I understand this correctly, the use case is that the job, instead
of direct reclaiming (potentially in latency-sensitive tasks), prefers
a background, non-latency-sensitive task to do the reclaim.

I am wondering if we can use the memory.high notification along with a
new memcg interface (like memory.try_to_free_pages) to implement a
userspace background reclaimer; a rough sketch is at the end of this
mail. That would resolve the CPU accounting concerns, as the userspace
background reclaimer can share the CPU cost with the task.

One concern with this approach is that the memory.high notification
may arrive too late, after the latency-sensitive task has already
stalled. We could either introduce a threshold notification or another
notification-only limit like memory.near_high, which can be set based
on the job's rate of allocations; when the usage hits this limit, just
notify userspace.

Shakeel
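
PS: to make the userspace reclaimer idea concrete, here is a minimal
sketch. Watching memory.events via poll(POLLPRI) is the existing
cgroup v2 notification mechanism; memory.try_to_free_pages does not
exist today, it is the interface proposed above, and the cgroup path
and batch size are only placeholders:

/*
 * Userspace background reclaimer sketch.  memory.try_to_free_pages is
 * the proposed, not-yet-existing interface; /sys/fs/cgroup/job and the
 * reclaim batch size are placeholders.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CG "/sys/fs/cgroup/job"

int main(void)
{
        char buf[64];
        struct pollfd pfd = { .events = POLLPRI };

        /* cgroup v2 raises POLLPRI on memory.events when a counter changes */
        pfd.fd = open(CG "/memory.events", O_RDONLY);
        if (pfd.fd < 0)
                return 1;

        for (;;) {
                if (poll(&pfd, 1, -1) <= 0 || !(pfd.revents & POLLPRI))
                        continue;
                /* re-read the file to arm the next notification */
                pread(pfd.fd, buf, sizeof(buf), 0);

                /*
                 * Proposed interface: ask the kernel to reclaim a batch
                 * of pages from this cgroup, so the reclaim CPU time is
                 * charged to this (low-priority) task rather than to a
                 * latency-sensitive allocating task.
                 */
                int fd = open(CG "/memory.try_to_free_pages", O_WRONLY);
                if (fd < 0)
                        continue;
                snprintf(buf, sizeof(buf), "%d", 1024); /* nr_pages */
                write(fd, buf, strlen(buf));
                close(fd);
        }
}

If we additionally had memory.near_high, the loop above would poll
that limit's notification instead and start reclaiming before the
workload ever reaches memory.high.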