Date: Fri, 16 Aug 2019 10:24:28 +0200
From: Michal Hocko
To: Andrew Morton
Cc: Daniel Vetter, LKML, linux-mm@kvack.org, DRI Development,
    Intel Graphics Development, Jason Gunthorpe, Peter Zijlstra,
    Ingo Molnar, David Rientjes, Christian König, Jérôme Glisse,
    Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
    Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190816082428.GB27790@dhcp22.suse.cz>

On Thu 15-08-19 15:15:09, Andrew Morton wrote:
> On Thu, 15 Aug 2019 10:44:29 +0200 Michal Hocko wrote:
>
> > > I continue to struggle with this.  It introduces a new kernel state
> > > "running preemptibly but must not call schedule()".  How does this make
> > > any sense?
> > >
> > > Perhaps a much, much more detailed description of the oom_reaper
> > > situation would help out.
> >
> > The primary point here is that there is a demand of non blockable mmu
> > notifiers to be called when the oom reaper tears down the address space.
> > As the oom reaper is the primary guarantee of the oom handling forward
> > progress it cannot be blocked on anything that might depend on blockable
> > memory allocations. These are not really easy to track because they
> > might be indirect - e.g. notifier blocks on a lock which other context
> > holds while allocating memory or waiting for a flusher that needs memory
> > to perform its work. If such a blocking state happens that we can end up
> > in a silent hang with an unusable machine.
> > Now we hope for reasonable implementations of mmu notifiers (strong
> > words I know ;) and this should be relatively simple and effective catch
> > all tool to detect something suspicious is going on.
> >
> > Does that make the situation more clear?
>
> Yes, thanks, much.  Maybe a code comment along the lines of
>
>   This is on behalf of the oom reaper, specifically when it is
>   calling the mmu notifiers.  The problem is that if the notifier were
>   to block on, for example, mutex_lock() and if the process which holds
>   that mutex were to perform a sleeping memory allocation, the oom
>   reaper is now blocked on completion of that memory allocation.

    reaper is now blocked on completion of that memory allocation
    and cannot provide the guarantee of the OOM forward progress.

OK.

> btw, do we need task_struct.non_block_count?  Perhaps the oom reaper
> thread could set a new PF_NONBLOCK (which would be more general than
> PF_OOM_REAPER).  If we run out of PF_ flags, use (current == oom_reaper_th).

Well, I do not have a strong opinion here. A simple check for the value
seems to be trivial. There are quite a few holes in task_struct where
this counter can hide without increasing the size of the structure.
-- 
Michal Hocko
SUSE Labs
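
For readers following the thread, the per-task counter discussed above can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not the code from the patch: struct task_model, its warnings field and might_sleep_ok() are made-up names for illustration, and the real check lives in the kernel's might_sleep machinery rather than in a helper that returns a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for task_struct; only the debug state is modelled. */
struct task_model {
    unsigned int non_block_count; /* nesting depth of non-blocking sections */
    unsigned int warnings;        /* number of times a check has fired */
};

/* Enter a section in which the task must not sleep. Sections may nest. */
static void non_block_start(struct task_model *t)
{
    assert(t != NULL);
    t->non_block_count++;
}

/* Leave a section. Returns false and records a warning if unbalanced. */
static bool non_block_end(struct task_model *t)
{
    assert(t != NULL);
    if (t->non_block_count == 0) {
        t->warnings++;
        fprintf(stderr, "unbalanced non_block_end()\n");
        return false;
    }
    t->non_block_count--;
    return true;
}

/*
 * Model of the check a sleeping primitive would run before blocking.
 * Returns true if sleeping is allowed, false (plus a warning) otherwise.
 */
static bool might_sleep_ok(struct task_model *t)
{
    assert(t != NULL);
    if (t->non_block_count != 0) {
        t->warnings++;
        fprintf(stderr, "sleeping call in non-blocking section (depth %u)\n",
                t->non_block_count);
        return false;
    }
    return true;
}
```

In this model the oom reaper would call non_block_start() before invoking the notifiers and non_block_end() afterwards, so a notifier that reaches a sleeping primitive trips might_sleep_ok(). One difference from a single PF_ flag bit is that a counter records nesting depth, which a lone bit cannot do without extra save/restore logic.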