From: Vasily Averin
Subject: [PATCH memcg v2] memcg: prohibit unconditional exceeding the limit of dying tasks
To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton, Tetsuo Handa
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel@openvz.org
Message-ID: <817a6ce2-4da9-72ac-c5b9-edd398d28a15@virtuozzo.com>
Date: Tue, 14 Sep 2021 13:10:04 +0300

The kernel currently allows dying tasks to exceed the memcg limits.
The allocation is expected to be the last one, and the memory it
occupies is expected to be freed soon. This is not always true, because
the allocation can be part of a huge vmalloc allocation. Once allowed,
such allocations repeat over and over again. Moreover, the lifetime of
the allocated object can differ from the lifetime of the dying task.
Multiple such allocations running concurrently can not only overuse the
memcg limit, but can also lead to a global out-of-memory condition and,
in the worst case, cause the host to panic.

This patch removes the checks that forced dying tasks to exceed the
memcg limit. It also breaks an endless loop for tasks bypassed by the
oom killer. In addition, it renames the should_force_charge() helper to
task_is_dying(), because its use no longer leads to a forced charge.

Suggested-by: Michal Hocko
Signed-off-by: Vasily Averin
---
 mm/memcontrol.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 389b5766e74f..707f6640edda 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -234,7 +234,7 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-static inline bool should_force_charge(void)
+static inline bool task_is_dying(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
 		(current->flags & PF_EXITING);
@@ -1607,7 +1607,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	ret = task_is_dying() || out_of_memory(&oc);
 
 unlock:
 	mutex_unlock(&oom_lock);
@@ -2588,6 +2588,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct page_counter *counter;
 	enum oom_status oom_status;
 	unsigned long nr_reclaimed;
+	bool passed_oom = false;
 	bool may_swap = true;
 	bool drained = false;
 	unsigned long pflags;
@@ -2622,15 +2623,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_ATOMIC)
 		goto force;
 
-	/*
-	 * Unlike in global OOM situations, memcg is not in a physical
-	 * memory shortage. Allow dying and OOM-killed tasks to
-	 * bypass the last charges so that they can exit quickly and
-	 * free their memory.
-	 */
-	if (unlikely(should_force_charge()))
-		goto force;
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2688,8 +2680,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	if (fatal_signal_pending(current))
-		goto force;
+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
 
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
@@ -2698,14 +2691,10 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 */
 	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
 		       get_order(nr_pages * PAGE_SIZE));
-	switch (oom_status) {
-	case OOM_SUCCESS:
+	if (oom_status == OOM_SUCCESS) {
+		passed_oom = true;
 		nr_retries = MAX_RECLAIM_RETRIES;
 		goto retry;
-	case OOM_FAILED:
-		goto force;
-	default:
-		goto nomem;
 	}
 nomem:
 	if (!(gfp_mask & __GFP_NOFAIL))
-- 
2.31.1
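
For readers who want to trace the new retry path outside the kernel, here
is a small user-space model of it. This is a sketch under stated
assumptions, not kernel code: model_memcg, model_out_of_memory() and
model_try_charge() are hypothetical names, reclaim is assumed to free
nothing, and each OOM kill is assumed to free a fixed number of pages. As
I read the diff, mem_cgroup_out_of_memory() reports success for a dying
task without freeing anything, so once the unconditional "goto force" is
gone the passed_oom flag is what stops such a task from retrying forever.

```c
#include <stdbool.h>

#define MODEL_MAX_RETRIES 3	/* stand-in for MAX_RECLAIM_RETRIES */

enum { MODEL_OK = 0, MODEL_NOMEM = -1 };

/* Hypothetical, simplified stand-in for a memory cgroup. */
struct model_memcg {
	long usage;		/* pages currently charged */
	long limit;		/* hard limit in pages */
	int victims;		/* tasks the model OOM killer may still kill */
	long pages_per_kill;	/* assumed pages freed by each kill */
};

/* Models: ret = task_is_dying() || out_of_memory(&oc); */
static bool model_out_of_memory(struct model_memcg *cg, bool dying)
{
	if (dying)
		return true;	/* reported as success, nothing is freed */
	if (cg->victims == 0)
		return false;
	cg->victims--;
	cg->usage -= cg->pages_per_kill;
	if (cg->usage < 0)
		cg->usage = 0;
	return true;
}

/* Models the tail of the charge path after the patch. */
static int model_try_charge(struct model_memcg *cg, long nr_pages,
			    bool dying, bool nofail)
{
	bool passed_oom = false;
	int nr_retries = MODEL_MAX_RETRIES;

retry:
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return MODEL_OK;
	}

	/* Reclaim is modelled as freeing nothing; just use up the retries. */
	if (nr_retries-- > 0)
		goto retry;

	/* Avoid an endless loop for tasks bypassed by the oom killer. */
	if (passed_oom && dying)
		goto nomem;

	if (model_out_of_memory(cg, dying)) {
		passed_oom = true;
		nr_retries = MODEL_MAX_RETRIES;
		goto retry;
	}
nomem:
	if (!nofail)
		return MODEL_NOMEM;
	/* Only a no-fail request may still exceed the limit in this model. */
	cg->usage += nr_pages;
	return MODEL_OK;
}
```

In this model, a dying task charging one page to a full group gets
MODEL_NOMEM and leaves usage untouched, while a live task succeeds only if
the model OOM killer still has a victim to free pages from.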