Subject: Re: [RFC] Per file OOM badness
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Christian.Koenig@amd.com, linux-mm@kvack.org, amd-gfx@lists.freedesktop.org, Roman Gushchin
References: <1516294072-17841-1-git-send-email-andrey.grodzovsky@amd.com> <20180118170006.GG6584@dhcp22.suse.cz> <20180123152659.GA21817@castle.DHCP.thefacebook.com>
 <20180123153631.GR1526@dhcp22.suse.cz> <20180124092847.GI1526@dhcp22.suse.cz>
From: Michel Dänzer
Message-ID: <583f328e-ff46-c6a4-8548-064259995766@daenzer.net>
Date: Wed, 24 Jan 2018 11:27:15 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To: <20180124092847.GI1526@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8
Content-Language: en-CA
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-01-24 10:28 AM, Michal Hocko wrote:
> On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
>> On 2018-01-23 04:36 PM, Michal Hocko wrote:
>>> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>>>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_archives_dri-2Ddevel_2015-2DSeptember_089778.html&d=DwIDAw&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=R-JIQjy8rqmH5qD581_VYL0Q7cpWSITKOnBCE-3LI8U&s=QZGqKpKuJ2BtioFGSy8_721owcWJ0J6c6d4jywOwN4w&
>>>>> Here is the origin cover letter text
>>>>> : I'm currently working on the issue that when device drivers allocate memory on
>>>>> : behalf of an application the OOM killer usually doesn't knew about that unless
>>>>> : the application also get this memory mapped into their address space.
>>>>> :
>>>>> : This is especially annoying for graphics drivers where a lot of the VRAM
>>>>> : usually isn't CPU accessible and so doesn't make sense to map into the
>>>>> : address space of the process using it.
>>>>> :
>>>>> : The problem now is that when an application starts to use a lot of VRAM those
>>>>> : buffers objects sooner or later get swapped out to system memory, but when we
>>>>> : now run into an out of memory situation the OOM killer obviously doesn't knew
>>>>> : anything about that memory and so usually kills the wrong process.
>>>>> :
>>>>> : The following set of patches tries to address this problem by introducing a per
>>>>> : file OOM badness score, which device drivers can use to give the OOM killer a
>>>>> : hint how many resources are bound to a file descriptor so that it can make
>>>>> : better decisions which process to kill.
>>>>> :
>>>>> : So question at every one: What do you think about this approach?
>>>>> :
>>>>> : My biggest concern right now is the patches are messing with a core kernel
>>>>> : structure (adding a field to struct file). Any better idea? I'm considering
>>>>> : to put a callback into file_ops instead.
>>>>
>>>> Hello!
>>>>
>>>> I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?
>>>
>>> I do not think so. The problem is that the allocating context is not
>>> identical with the end consumer.
>>
>> That's actually not really true. Even in cases where a BO is shared with
>> a different process, it is still used at least occasionally in the
>> process which allocated it as well. Otherwise there would be no point in
>> sharing it between processes.
>
> OK, but somebody has to be made responsible. Otherwise you are just
> killing a process which doesn't really release any memory.
>
>> There should be no problem if the memory of a shared BO is accounted for
>> in each process sharing it. It might be nice to scale each process'
>> "debt" by 1 / (number of processes sharing it) if possible, but in the
>> worst case accounting it fully in each process should be fine.
>
> So how exactly then helps to kill one of those processes? The memory
> stays pinned behind or do I still misunderstand?

Fundamentally, the memory is only released once all references to the
BOs are dropped. That's true no matter how the memory is accounted for
between the processes referencing the BO.

In practice, this should be fine:

1. The amount of memory used for shared BOs is normally small compared
to the amount of memory used for non-shared BOs (and other things). So
regardless of how shared BOs are accounted for, the OOM killer should
first target the process which is responsible for more memory overall.

2. If the OOM killer kills a process which is sharing BOs with another
process, this should result in the other process dropping its references
to the BOs as well, at which point the memory is released.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
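
The accounting argued for in this thread can be sketched as a toy
user-space model. This is only an illustration under stated assumptions,
not kernel code and not the RFC's patch: BufferObject, Process, badness
and kill are hypothetical names, sizes are in arbitrary units, and a
killed process is assumed to drop all of its BO references at once.

```python
from dataclasses import dataclass, field


@dataclass
class BufferObject:
    """Toy stand-in for a driver buffer object (BO); hypothetical."""
    size: int
    holders: set = field(default_factory=set)  # names of processes holding a reference


@dataclass
class Process:
    """Toy process; private_bytes is what the OOM killer already sees."""
    name: str
    private_bytes: int


def badness(proc, bos, scale_shared=True):
    """Private memory plus the process's share of every BO it references.

    scale_shared=True charges size / (number of holders);
    scale_shared=False charges the full size to every holder.
    """
    total = proc.private_bytes
    for bo in bos:
        if proc.name in bo.holders:
            total += bo.size / len(bo.holders) if scale_shared else bo.size
    return total


def kill(proc, bos):
    """Drop the victim's references and return the bytes actually released."""
    freed = proc.private_bytes
    for bo in bos:
        if proc.name in bo.holders:
            bo.holders.discard(proc.name)
            if not bo.holders:  # last reference gone: the BO memory is released
                freed += bo.size
    return freed


# Made-up numbers: one large private BO, one small BO shared by both processes.
procs = [Process("compositor", 50), Process("game", 100)]
bos = [BufferObject(400, {"game"}), BufferObject(20, {"game", "compositor"})]

victim_scaled = max(procs, key=lambda p: badness(p, bos))
victim_unscaled = max(procs, key=lambda p: badness(p, bos, scale_shared=False))
freed = kill(victim_scaled, bos)
```

With these made-up numbers both accounting variants pick the same
victim, because the shared BO is small next to the non-shared one. The
shared BO itself stays allocated until the remaining holder drops its
reference as well.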