In-Reply-To: <20180403135607.GC5501@dhcp22.suse.cz>
References: <1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com>
 <20180330102038.2378925b@gandalf.local.home>
 <20180403110612.GM5501@dhcp22.suse.cz>
 <20180403075158.0c0a2795@gandalf.local.home>
 <20180403121614.GV5501@dhcp22.suse.cz>
 <20180403082348.28cd3c1c@gandalf.local.home>
 <20180403123514.GX5501@dhcp22.suse.cz>
 <20180403093245.43e7e77c@gandalf.local.home>
 <20180403135607.GC5501@dhcp22.suse.cz>
From: Zhaoyang Huang
Date: Wed, 4 Apr 2018 10:58:39 +0800
Subject: Re: [PATCH v1] kernel/trace:check the val against the available mem
To: Michal Hocko
Cc: Steven Rostedt, Ingo Molnar, linux-kernel@vger.kernel.org,
 kernel-patch-test@lists.linaro.org, Andrew Morton, Joel Fernandes,
 linux-mm@kvack.org, Vlastimil Babka
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 3, 2018 at 9:56 PM, Michal Hocko wrote:
> On Tue 03-04-18 09:32:45, Steven Rostedt wrote:
>> On Tue, 3 Apr 2018 14:35:14 +0200
>> Michal Hocko wrote:
> [...]
>> > Being clever is OK if it doesn't add tricky code. And relying on
>> > si_mem_available is definitely tricky and obscure.
>>
>> Can we get the mm subsystem to provide a better method to know if an
>> allocation will possibly succeed or not before trying it?
>> It doesn't have to be free of races. Just "if I allocate this many
>> pages right now, will it work?" If that changes from the time it asks
>> to the time it allocates, that's fine. I'm not trying to prevent OOM
>> from ever triggering. I just don't want it to trigger consistently.
>
> How do you do that without an actual allocation request? And more
> fundamentally, what if your _particular_ request is just fine but it
> will get us so close to the OOM edge that the next legit allocation
> request simply goes OOM? There is simply no sane interface I can think
> of that would satisfy a safe/sensible "will it cause OOM" semantic.
>
The point is that an app which tries to allocate a size over the line
will itself escape the OOM killer and let other, innocent processes be
sacrificed. The one you mention above, however, may well be selected by
the OOM killer that a subsequent failed allocation triggers.

>> > > Perhaps I should try to allocate a large group of pages with
>> > > RETRY_MAYFAIL, and if that fails go back to NORETRY, with the thinking
>> > > that the large allocation may reclaim some memory that would allow the
>> > > NORETRY to succeed with smaller allocations (one page at a time)?
>> >
>> > That again relies on subtle dependencies of the current
>> > implementation. So I would rather ask whether this is something that
>> > really deserves special treatment. If admin asks for a buffer of a
>> > certain size then try to do so. If we get OOM then bad luck you cannot
>> > get large memory buffers for free...
>>
>> That is not acceptable to me nor to the people asking for this.
>>
>> The problem is known. The ring buffer allocates memory page by page,
>> and this can allow it to easily take all memory in the system before it
>> fails to allocate and frees everything it had done.
>
> Then do not allow buffers that are too large. How often do you need
> buffers that are larger than few megs or small % of the available
> memory?
> Consuming an excessive amount of memory just to trace a workload
> which will need some memory on its own sounds just dubious to me.
>
>> If you don't like the use of si_mem_available() I'll do the larger
>> pages method. Yes it depends on the current implementation of memory
>> allocation. It will depend on RETRY_MAYFAIL trying to allocate a large
>> number of pages, and fail if it can't (leaving memory for other
>> allocations to succeed).
>>
>> The allocation of the ring buffer isn't critical. It can fail to
>> expand, and we can tell the user -ENOMEM. I originally had NORETRY
>> because I'd rather have it fail than cause an OOM. But there's folks
>> (like Joel) that want it to succeed when there's available memory in
>> page caches.
>
> Then implement a retry logic on top of NORETRY. You can control how hard
> to retry to satisfy the request yourself. You still risk that your
> allocation will get us close to OOM for _somebody_ else though.
>
>> I'm fine if the admin shoots herself in the foot if the ring buffer
>> gets big enough to start causing OOMs, but I don't want it to cause
>> OOMs if there's not even enough memory to fulfill the ring buffer size
>> itself.
>
> I simply do not see the difference between the two. Both have the same
> deadly effect in the end. The direct OOM has an arguable advantage that
> the effect is immediate rather than subtle with potential performance
> side effects until the machine OOMs after crawling for quite some time.
>
> --
> Michal Hocko
> SUSE Labs
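
For reference, the "retry logic on top of NORETRY" idea quoted above could
look roughly like the userspace sketch below. It is only a simulation and
not kernel code: struct sim_pool, sim_alloc_page_noretry(), sim_reclaim(),
sim_free_page() and alloc_buffer() are all hypothetical names made up for
illustration, and the "pool" merely stands in for the page allocator. It
shows a page-by-page allocation where each page gets a bounded number of
retries, and where a failure frees everything taken so far and returns
-ENOMEM instead of pushing on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical stand-in for the page allocator's state. */
struct sim_pool {
	size_t free_pages;   /* pages that can be handed out right now */
	size_t reclaimable;  /* pages a reclaim pass could free (think page cache) */
};

/* One attempt that never retries: fail if nothing is free right now. */
static void *sim_alloc_page_noretry(struct sim_pool *pool)
{
	void *page;

	if (pool->free_pages == 0)
		return NULL;
	page = malloc(SIM_PAGE_SIZE);
	if (page)
		pool->free_pages--;
	return page;
}

/* Simulated reclaim: move up to 'batch' pages from reclaimable to free. */
static void sim_reclaim(struct sim_pool *pool, size_t batch)
{
	size_t n = pool->reclaimable < batch ? pool->reclaimable : batch;

	pool->reclaimable -= n;
	pool->free_pages += n;
}

static void sim_free_page(struct sim_pool *pool, void *page)
{
	free(page);
	pool->free_pages++;
}

/*
 * Allocate nr_pages one page at a time. Each page gets at most
 * max_retries extra attempts, with a small reclaim pass before each
 * retry. On failure, every page taken so far is given back and
 * -ENOMEM is returned, so the caller never keeps a partial buffer.
 */
static int alloc_buffer(struct sim_pool *pool, void **pages,
			size_t nr_pages, unsigned int max_retries)
{
	size_t i;

	assert(pool && pages);

	for (i = 0; i < nr_pages; i++) {
		unsigned int tries = 0;

		while (!(pages[i] = sim_alloc_page_noretry(pool))) {
			if (tries++ >= max_retries)
				goto rollback;
			sim_reclaim(pool, 1);
		}
	}
	return 0;

rollback:
	/* pages[0..i-1] were allocated; release them in reverse order. */
	while (i-- > 0) {
		sim_free_page(pool, pages[i]);
		pages[i] = NULL;
	}
	return -ENOMEM;
}
```

The caller picks max_retries, which is the "control how hard to retry"
knob. As noted in the thread, a request that succeeds this way can still
leave the system close to OOM for somebody else; the sketch does nothing
about that.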