Date: Tue, 19 Jun 2018 10:33:16 +0200
From: Michal Hocko
To: David Rientjes
Cc: Andrew Morton, Tetsuo Handa, "Aneesh Kumar K.V",
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: fix unnecessary killing of additional processes
Message-ID: <20180619083316.GB13685@dhcp22.suse.cz>
References: <20180615065541.GA24039@dhcp22.suse.cz>

On Fri 15-06-18 16:15:39, David Rientjes wrote:
[...]
> I'd be happy to make this timeout configurable, however, and default
> it to perhaps one second as the blockable mmu notifier timeout in your
> own code does. I find it somewhat sad that we'd need a sysctl for this,
> but if that will appease you and it will help to move this into -mm
> then we can do that.

No. This has been nacked in the past and I do not see anything different
from back then.

> > Other than that I've already pointed to a more robust solution. If you
> > are reluctant to try it out I will do, but introducing a timeout is just
> > papering over the real problem. Maybe we will not reach the state that
> > _all_ the memory is reapable but we definitely should try to make as
> > much as possible to be reapable and I do not see any fundamental
> > problems in that direction.
>
> You introduced the timeout already, I'm sure you realized yourself that
> the oom reaper sets MMF_OOM_SKIP much too early. Trying to grab
> mm->mmap_sem 10 times in a row with HZ/10 sleeps in between is a timeout.

Yes, it is. And it is a timeout based on some feedback: the lock is held,
so retry later, but do not retry forever. We can do the same with
blockable mmu notifiers. We currently give up right away. I was proposing
to add a can_sleep parameter to mmu_notifier_invalidate_range_start and
have it return EAGAIN if it would block. That would allow us to simply
retry on EAGAIN, like we already do for the mmap_sem.
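Roughly, what I have in mind would look like the sketch below. To be
clear, this is only an illustration: the can_sleep argument, the -EAGAIN
return from mmu_notifier_invalidate_range_start and the whole-address-space
range are assumptions for the sake of the example, not the current
in-tree interface.

	#include <linux/mm.h>
	#include <linux/mmu_notifier.h>
	#include <linux/oom.h>
	#include <linux/sched/mm.h>

	/*
	 * Sketch only, loosely modeled on __oom_reap_task_mm(). The
	 * can_sleep parameter and the -EAGAIN return of the
	 * invalidate_range_start call are the proposed (hypothetical)
	 * interface, not what exists today.
	 */
	static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
	{
		/* Same back-off we already use: a contended mmap_sem means "retry". */
		if (!down_read_trylock(&mm->mmap_sem))
			return false;

		/*
		 * Ask the notifiers not to sleep. If any of them would have
		 * to block, fail with -EAGAIN and let the caller retry later
		 * instead of setting MMF_OOM_SKIP right away.
		 */
		if (mmu_notifier_invalidate_range_start(mm, 0, -1UL,
					/* can_sleep */ false) == -EAGAIN) {
			up_read(&mm->mmap_sem);
			return false;
		}

		/* ... tear down the reapable mappings here ... */

		mmu_notifier_invalidate_range_end(mm, 0, -1UL);
		up_read(&mm->mmap_sem);
		return true;
	}

oom_reap_task() already retries for MAX_OOM_REAP_RETRIES rounds with a
short sleep in between, so a nonblocking failure here would simply feed
into the existing back-off rather than into any wall-clock timeout.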
[...]
> The reproducer on powerpc is very simple. Do an mmap() and mlock() the
> length. Fork one 120MB process that does that and two 60MB processes
> that do that in a 128MB memcg.

And again, to solve this we just need to teach the oom_reaper to handle
mlocked memory. There shouldn't be any fundamental reason why that would
be impossible AFAICS. A timeout is not a solution! (For reference, a
quick sketch of that reproducer is below my signature.)

[...]
> It's inappropriate to merge code that oom kills many processes
> unnecessarily when one happens to be mlocked or have blockable mmu
> notifiers or when mm->mmap_sem can't be grabbed fast enough but forward
> progress is actually being made. It's a regression, and it impacts real
> users. Insisting that we fix the problem you introduced by making all mmu
> notifiers unblockable and mlocked memory can always be reaped and
> mm->mmap_sem can always be grabbed within a second is irresponsible.

Well, the lack of real-world bug reports doesn't really back your story
here. I have asked about non-artificial workloads suffering from this and
your responses were quite nonspecific, to say the least. And I do insist
on coming up with a reasonable solution rather than random hacks; jeez,
the oom killer was full of these. As I've said, if you are not willing to
work on a proper solution, I will, but my nack on this patch holds until
it is clear that there is no other way around existing, real-world
problems.
--
Michal Hocko
SUSE Labs
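P.S. The reproducer above, the way I read it, as a rough sketch. It
assumes a memcg with its limit already set to 128MB (e.g. via
memory.limit_in_bytes) and that the program is started inside that memcg;
the sizes match the report.

	/*
	 * Sketch of the reproducer described above. The memcg setup
	 * itself is not shown.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/wait.h>

	/* mmap() and mlock() the given length, then sit on it. */
	static void hog(size_t bytes)
	{
		void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED || mlock(p, bytes)) {
			perror("mmap/mlock");
			exit(1);
		}
		/* mlock() has faulted the pages in; just keep them around. */
		pause();
		exit(0);
	}

	int main(void)
	{
		const size_t mb = 1024 * 1024;

		/* One 120MB hog and two 60MB hogs inside the 128MB memcg. */
		if (fork() == 0)
			hog(120 * mb);
		if (fork() == 0)
			hog(60 * mb);
		if (fork() == 0)
			hog(60 * mb);

		while (wait(NULL) > 0)
			;
		return 0;
	}

With 120MB + 2x60MB of mlocked anonymous memory against a 128MB limit,
the memcg OOM killer has to pick victims whose memory is entirely
mlocked, which is exactly the case the oom_reaper currently gives up on.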