Subject: Re: [PATCH memcg 0/1] false global OOM triggered by memcg-limited task
From: Vasily Averin
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Roman Gushchin,
    Uladzislau Rezki, Vlastimil Babka, Shakeel Butt, Mel Gorman,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel@openvz.org
Date: Tue, 19 Oct 2021 22:09:19 +0300
Message-ID: <3c76e2d7-e545-ef34-b2c3-a5f63b1eff51@virtuozzo.com>

On 19.10.2021 17:13, Michal Hocko wrote:
> On Tue 19-10-21 16:26:50, Vasily Averin wrote:
>> On 19.10.2021 15:04, Michal Hocko wrote:
>>> On Tue 19-10-21 13:54:42, Michal Hocko wrote:
>>>> On Tue 19-10-21 13:30:06, Vasily Averin wrote:
>>>>> On 19.10.2021 11:49, Michal Hocko wrote:
>>>>>> On Tue 19-10-21 09:30:18, Vasily Averin wrote:
>>>>>> [...]
>>>>>>> With my patch ("memcg: prohibit unconditional exceeding the limit of dying tasks") try_charge_memcg() can fail:
>>>>>>> a) due to fatal signal
>>>>>>> b) when mem_cgroup_oom -> mem_cgroup_out_of_memory -> out_of_memory() returns false (when select_bad_process() found nothing)
>>>>>>>
>>>>>>> To handle a) we can follow your suggestion and skip execution of out_of_memory() in pagefault_out_of_memory()
>>>>>>> To handle b) we can go to retry: if mem_cgroup_oom() returns OOM_FAILED.
>>>>>
>>>>>> How is b) possible without current being killed? Do we allow remote
>>>>>> charging?
>>>>>
>>>>> out_of_memory for memcg_oom
>>>>>  select_bad_process
>>>>>   mem_cgroup_scan_tasks
>>>>>    oom_evaluate_task
>>>>>     oom_badness
>>>>>
>>>>>         /*
>>>>>          * Do not even consider tasks which are explicitly marked oom
>>>>>          * unkillable or have been already oom reaped or the are in
>>>>>          * the middle of vfork
>>>>>          */
>>>>>         adj = (long)p->signal->oom_score_adj;
>>>>>         if (adj == OOM_SCORE_ADJ_MIN ||
>>>>>                         test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
>>>>>                         in_vfork(p)) {
>>>>>                 task_unlock(p);
>>>>>                 return LONG_MIN;
>>>>>         }
>>>>>
>>>>> This time we handle a userspace page fault, so we cannot be a kernel thread,
>>>>> and cannot be in_vfork().
>>>>> However, the task can be marked as oom unkillable,
>>>>> i.e. have p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN
>>>>
>>>> You are right. I am not sure there is a way out of this though. The task
>>>> can only retry for ever in this case. There is nothing actionable here.
>>>> We cannot kill the task and there is no other way to release the memory.
>>>
>>> Btw. don't we force the charge in that case?
>>
>> We should force the charge for allocations from inside the page fault handler,
>> to prevent an endless cycle of retried page faults.
>> However, we should not do it for allocations from task context,
>> to prevent memcg-limited vmalloc-eaters from consuming all host memory.
>
> I don't see a big difference between those two. Because the #PF could
> result in the very same situation depleting all the memory by
> overcharging. A different behavior just leads to a confusion and
> unexpected behavior. E.g. in the past we only triggered memcg OOM killer
> from the #PF path and failed the charge otherwise. That is something
> different but it shows problems we haven't anticipated and had user
> visible problems. See 29ef680ae7c2 ("memcg, oom: move out_of_memory back
> to the charge path").

In this case I think we should fail this allocation.
It's better not to allow overcharge, neither in #PF nor in regular allocations.
However, this failure will trigger a false global OOM in pagefault_out_of_memory(),
and we need to find some way to prevent it.

Thank you,
	Vasily Averin
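
For readers following case b) in the thread, here is a minimal userspace sketch of why select_bad_process() can come back empty when every task in the memcg is oom unkillable. It is a toy model under stated assumptions, not kernel code: struct toy_task and the toy_* functions are hypothetical names invented for illustration, the scoring is reduced to a bare page count, and only the eligibility check quoted from oom_badness() is modelled. OOM_SCORE_ADJ_MIN uses the kernel's value of -1000.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's OOM_SCORE_ADJ_MIN. */
#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical, heavily simplified stand-in for a task in the memcg. */
struct toy_task {
	long oom_score_adj;	/* models p->signal->oom_score_adj */
	bool oom_skip;		/* models MMF_OOM_SKIP being set */
	bool in_vfork;		/* models in_vfork(p) */
	long rss_pages;		/* toy score; real scoring is more involved */
};

/* Models only the eligibility check quoted from oom_badness(). */
long toy_badness(const struct toy_task *p)
{
	if (p->oom_score_adj == OOM_SCORE_ADJ_MIN || p->oom_skip || p->in_vfork)
		return LONG_MIN;
	return p->rss_pages;
}

/* Models select_bad_process(): index of the chosen victim, or -1 if none. */
int toy_select_bad_process(const struct toy_task *tasks, size_t n)
{
	int victim = -1;
	long best = LONG_MIN;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		long points = toy_badness(&tasks[i]);

		/* Ineligible tasks and lower scores are skipped. */
		if (points == LONG_MIN || points < best)
			continue;
		best = points;
		victim = (int)i;
	}
	return victim;
}

/* Case b): the memcg OOM killer makes progress only if a victim exists. */
bool toy_memcg_oom_can_progress(const struct toy_task *tasks, size_t n)
{
	return toy_select_bad_process(tasks, n) >= 0;
}
```

With a memcg holding a single task whose oom_score_adj is OOM_SCORE_ADJ_MIN, toy_select_bad_process() returns -1 and toy_memcg_oom_can_progress() returns false. That is the situation where, in the discussion above, the charge fails without anything being killed and the page fault path must avoid escalating to a global OOM.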