Date: Thu, 19 Apr 2018 13:04:19 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: rientjes@google.com, akpm@linux-foundation.org, aarcange@redhat.com,
	guro@fb.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap
Message-ID: <20180419110419.GQ17484@dhcp22.suse.cz>
References: <20180418075051.GO17484@dhcp22.suse.cz>
	<20180419063556.GK17484@dhcp22.suse.cz>
	<201804191945.BBF87517.FVMLOQFOHSFJOt@I-love.SAKURA.ne.jp>
In-Reply-To: <201804191945.BBF87517.FVMLOQFOHSFJOt@I-love.SAKURA.ne.jp>
On Thu 19-04-18 19:45:46, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > exit_mmap() does not block before set_bit(MMF_OOM_SKIP) once it is
> > > entered.
> >
> > Not true. munlock_vma_pages_all might take page_lock which can have
> > unpredictable dependencies. This is the reason why we are ruling out
> > mlocked VMAs in the first place when reaping the address space.
>
> Wow! Then,
>
> > While you are correct, strictly speaking, because unmap_vmas can race
> > with the oom reaper. With the lock held during the whole operation we
> > can indeed trigger the back off in the oom reaper. It will keep retrying,
> > but the tear down can take quite some time. This is a fair argument. On
> > the other hand your lock protocol introduces the MMF_OOM_SKIP problem
> > I've mentioned above and that really worries me. The primary objective
> > of the reaper is to guarantee forward progress without relying on any
> > externalities. We might kill another OOM victim but that is safer than
> > a lockup.
>
> the current code has a possibility that the OOM reaper is disturbed by
> unpredictable dependencies, like I worried that
>
>   I think that there is a possibility that the OOM reaper tries to reclaim
>   mlocked pages as soon as exit_mmap() has cleared the VM_LOCKED flag by
>   calling munlock_vma_pages_all().
>
> when the current approach was proposed. We currently have the MMF_OOM_SKIP
> problem. We need to teach the OOM reaper to stop reaping as soon as the
> victim enters exit_mmap(). Maybe let the OOM reaper poll for progress
> (e.g. none of the get_mm_counter(mm, *) counters decreased for the last
> second)?

Can we start simple and build more elaborate heuristics on top _please_?
In other words, holding the mmap_sem for write for oom victims in
exit_mmap should handle the problem. We can then enhance this to probe
for progress, or add any other clever tricks, if we find out that the
race happens too often and we kill more than necessary. Let's not repeat
the error of trying to be too clever from the beginning, as we did
previously. This area is just too subtle and obviously error prone.
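
Something along the following lines is what I have in mind. This is a
completely untested sketch rather than a patch: the exact placement of
the hunks inside exit_mmap() is approximate, and the reaper side is only
quoted to show the back off it already does, so take the details with a
grain of salt.

	/* mm/mmap.c - untested sketch, hunk placement is approximate */
	void exit_mmap(struct mm_struct *mm)
	{
		struct mmu_gather tlb;
		struct vm_area_struct *vma;
		...
		if (mm_is_oom_victim(mm)) {
			/*
			 * Block the oom reaper for the whole address space
			 * teardown: its down_read_trylock fails and it backs
			 * off and retries instead of racing with the
			 * unmap_vmas/free_pgtables below.
			 */
			down_write(&mm->mmap_sem);
		}

		unmap_vmas(&tlb, vma, 0, -1);
		free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
		tlb_finish_mmu(&tlb, 0, -1);
		...
		if (mm_is_oom_victim(mm)) {
			/* The address space is gone, the reaper may skip us now. */
			set_bit(MMF_OOM_SKIP, &mm->flags);
			up_write(&mm->mmap_sem);
		}
	}

	/* mm/oom_kill.c - the reaper side already backs off like this */
	static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
	{
		if (!down_read_trylock(&mm->mmap_sem))
			return false;	/* oom_reap_task() will retry */
		...
	}

The nice thing is that the forward progress guarantee stays trivial:
either the reaper manages to reclaim the address space itself, or
exit_mmap() finishes the teardown and sets MMF_OOM_SKIP on its behalf.

--
Michal Hocko
SUSE Labs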