Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp3343850pxb; Thu, 10 Feb 2022 19:17:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJx1ttjBSfU7lKsUD9ew9T6SEhB5NL/WSJZEXtzXdZZVWOsKxHWV5wSgD/q+zS14+RgsXYOy X-Received: by 2002:aa7:d654:: with SMTP id v20mr11622374edr.146.1644549433799; Thu, 10 Feb 2022 19:17:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644549433; cv=none; d=google.com; s=arc-20160816; b=m0w9gvNCcwEg/5ppky09Vn3la9v3AySFjpdTP4WAt4MZmvGr4kDi14jg8qSJin3eUh 5FAvvOPkACB6WR99W4kpQJBZua3bXxxTHd5MSeZdxiiAToIxa4czsI4S/U+eSiKmnL2n HV4wiftNoVkYFjnl5F2NJ286qeWmuQKvwcWlMl8LR9UmKhbOI7OJ7QkwBTeWGJyyna7A PJleW0h5T8krytlfZ1XoxmGR3inj2hehhZdF3g5bzOij/EAlnMfupYTGsye0rB7BU85W JLZ6qWXi16JzZdfLdDwI70hNXDPC4Bq9HhRqOiweCD5ONQ7pMhPBxruHm1piYGkLreT9 S6eg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=FYlkrxPZ2Z2zJ0ERqrS7qUlWPElIVoGvIlqioJZlkEg=; b=GL0++AQSnbj8GCKnufbCuvjAY86YkQ7pLWTQ/+5mG+HIbtFqBZ2hl5HVBi9jyOvT1/ DHePboI3evIYsa+q76nFJb/An89vj3um3TeywafhOqawVKfMc7fqlLEUBUqqHDxcudlO BCIMiBWcdktAYy8vL5JulB/Nqs40rAMgEI7exvzbApRndUhMEV4bm70nAvrvzmAtoGht col6aRNCp4hkzSABervVzacqMNME++KEBximKekhVObwVfaAHBXxRLq+tnlSJODOIH3L ntU1MYpzcR1MXMv7wFMEjUJT7bDuud3xpojLReukqycm2zaEN26Cbk3GBJ4upOVo1yRW znnw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=N7sJdoUS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hr3si11745388ejc.955.2022.02.10.19.16.48; Thu, 10 Feb 2022 19:17:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=N7sJdoUS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345932AbiBJXxP (ORCPT + 99 others); Thu, 10 Feb 2022 18:53:15 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:54342 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345927AbiBJXxN (ORCPT ); Thu, 10 Feb 2022 18:53:13 -0500 Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E416B5F75 for ; Thu, 10 Feb 2022 15:53:13 -0800 (PST) Received: by mail-lf1-x135.google.com with SMTP id u6so13398603lfc.3 for ; Thu, 10 Feb 2022 15:53:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=FYlkrxPZ2Z2zJ0ERqrS7qUlWPElIVoGvIlqioJZlkEg=; b=N7sJdoUSvYyq4MjeAPZ+AFQb0pTRwxjWTrsPhX0XMSt6+2JQTgiCBZcetXZs1nn0iF 4Mnb+PdRGBlJgMhfrAdhn5bYjANfKDHCNuCeLZDHMFckT6QMOsgNCIMfFzeekLGylJ5P wNUQvGaZVE164HcLoLdHVIv4PMt/RYLPqLzyYPWKs2LWg0XHv4Cng2Bg0iIqBVVl9dhC PuoIWkNoA7GZVZcRSnk/dg6fjwvxyIJ1DSLaTAgIgroZT3KUMkI62C/BmXmLWdbltMZD FtofcFN19iXXP4W5nA4N2/9YVNIv3zJSO3wYrw2lnv6lXYCFZED9GP4KMjf5FAU7UHK2 JrEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=FYlkrxPZ2Z2zJ0ERqrS7qUlWPElIVoGvIlqioJZlkEg=; b=MCf8M5p+G3O7QKJS3FSK2gMTxyOYWnW2UJ1V9Pr/a4gbCAZlvgxlypRg5lKnh35NU+ cY+4GclN7tTsWaFzGKZ01YcNA7MPyy+Ol7gbRLKMZKYfwqVSywaafA+CD7IMzYK5V6U5 NfIGBDjsV9CnfFEoi5/bSxjVPwu2g7upnXbs+1rd+Ae1Z8LZkhfhdPkrDXtW4UiON/pL NtYoE1uPhmx3FjaP+Vw74/ffZgYjikgZBHXwUM/+0qoJ55J4jVwvqr2YDbzAkGPS4CFv jNZU1SkhjaGvxvtMkLFuQ7Gjz+8Y0WKJKfiIQHeZveNxI33m3SW7kaw22GXILWkiMkYG TfkA== X-Gm-Message-State: AOAM530rZE26nQj0U2tqrfDR/fDCKEjQd1l3tBB4bBIQQyPlz+W4jzDZ KqkNh6Zj9knEvqu30ehXW+WSMaxF+Osp3IlWILtw0w== X-Received: by 2002:a19:9144:: with SMTP id y4mr6846910lfj.494.1644537192004; Thu, 10 Feb 2022 15:53:12 -0800 (PST) MIME-Version: 1.0 References: <20220210081437.1884008-1-shakeelb@google.com> <20220210081437.1884008-5-shakeelb@google.com> In-Reply-To: From: Shakeel Butt Date: Thu, 10 Feb 2022 15:53:00 -0800 Message-ID: Subject: Re: [PATCH 4/4] memcg: synchronously enforce memory.high To: Roman Gushchin Cc: Johannes Weiner , Michal Hocko , Chris Down , Andrew Morton , Cgroups , Linux MM , LKML Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 10, 2022 at 3:29 PM Roman Gushchin wrote: > > On Thu, Feb 10, 2022 at 02:22:36PM -0800, Shakeel Butt wrote: > > On Thu, Feb 10, 2022 at 12:15 PM Roman Gushchin wrote: > > > > > [...] > > > > > > Has this approach been extensively tested in the production? > > > > > > Injecting sleeps at return-to-userspace moment is safe in terms of priority > > > inversions: a slowed down task will unlikely affect the rest of the system. > > > > > > It way less predictable for a random allocation in the kernel mode, what if > > > the task is already holding a system-wide resource? > > > > > > Someone might argue that it's not better than a system-wide memory shortage > > > and the same allocation might go into a direct reclaim anyway, but with > > > the way how memory.high is used it will happen way more often. > > > > > > > Thanks for the review. > > > > This patchset is tested in the test environment for now and I do plan > > to test this in production but that is a slow process and will take > > some time. > > > > Let me answer the main concern you have raised i.e. the safety of > > throttling a task synchronously in the charge code path. Please note > > that synchronous memory reclaim and oom-killing can already cause the > > priority inversion issues you have mentioned. The way we usually > > tackle such issues are through userspace controllers. For example oomd > > is the userspace solution for catering such issues related to > > oom-killing. Here we have a similar userspace daemon monitoring the > > workload and deciding if it should let the workload grow or kill it. > > > > Now should we keep the current high limit enforcement implementation > > and let it be ineffective for some real workloads or should we make > > the enforcement more robust and let the userspace tackle some corner > > case priority inversion issues. I think we should follow the second > > option as we already have precedence of doing the same for reclaim and > > oom-killing. > > Well, in a theory it sounds good and I have no intention to oppose the > idea. However in practice we might easily get quite serious problems. > So I think we should be extra careful here. In the end we don't want to > pull and then revert this patch. > > The difference between the system-wide direct reclaim and this case is that > usually kswapd is doing a good job of refilling the empty buffer, so we don't > usually work in the circumstances of the global memory shortage. And when we do, > often it's not working out quite well, this is why oomd and other similar > solutions are required. >. > Another option is to use your approach only for special cases (e.g. huge > allocations) and keep the existing approach for most other allocations. > These are not necessarily huge allocations and can be a large number of small allocations. However I think we can make this idea work by checking current->memcg_nr_pages_over_high. If order(current->memcg_nr_pages_over_high) is, let's say, larger than PAGE_ALLOC_COSTLY_ORDER, then throttle synchronously. WDYT?