Date: Fri, 31 May 2019 16:03:32 +0200
From: Michal Hocko
To: Minchan Kim
Cc: Andrew Morton, linux-mm, LKML, linux-api@vger.kernel.org,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon,
    jannh@google.com, oleg@redhat.com, christian@brauner.io,
    oleksandr@redhat.com, hdanton@sina.com
Subject: Re: [RFCv2 1/6] mm: introduce MADV_COLD
Message-ID: <20190531140332.GT6896@dhcp22.suse.cz>
In-Reply-To: <20190531133904.GC195463@google.com>
References: <20190531064313.193437-1-minchan@kernel.org>
    <20190531064313.193437-2-minchan@kernel.org>
    <20190531084752.GI6896@dhcp22.suse.cz>
    <20190531133904.GC195463@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 31-05-19 22:39:04, Minchan Kim wrote:
> On Fri, May 31, 2019 at 10:47:52AM +0200, Michal Hocko wrote:
> > On Fri 31-05-19 15:43:08, Minchan Kim wrote:
> > > When a process expects no accesses to a certain memory range, it could
> > > give a hint to the kernel that the pages can be reclaimed when memory
> > > pressure happens but the data should be preserved for future use. This
> > > could reduce workingset eviction and so end up increasing performance.
> > >
> > > This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
> > > MADV_COLD can be used by a process to mark a memory range as not
> > > expected to be used in the near future. The hint can help the kernel
> > > decide which pages to evict early during memory pressure.
> > >
> > > Internally, it works by deactivating pages from the active list to the
> > > head of the inactive list if the page is private. The inactive list
> > > could be full of used-once pages, which are the first candidates for
> > > reclaim; that is also why MADV_FREE moves pages to the head of the
> > > inactive LRU list. Therefore, when memory pressure happens, the hinted
> > > pages will be reclaimed earlier than other active pages unless they are
> > > accessed again before then.
> >
> > [I am intentionally not looking at the implementation because the points
> > below should be clear from the changelog - sorry about nagging ;)]
> >
> > What kind of pages can be deactivated? Anonymous/File backed.
> > Private/shared? If shared, are there any restrictions?
>
> Both file and private pages can be deactivated, each from its active LRU
> to the corresponding inactive LRU, if the page is mapped at most once.
> In other words,
>
> 	if (page_mapcount(page) <= 1)
> 		deactivate_page(page);

Why do we restrict this to pages that are mapped only once?

> > Are there any restrictions on mappings? E.g. what would be the effect of
> > this operation on a hugetlbfs mapping?
>
> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vmas will be skipped, like MADV_FREE|DONTNEED.

OK, documenting that this is restricted to the same vmas as
MADV_FREE|DONTNEED would be really useful.

> >
> > Also you are talking about inactive LRU but what kind of LRU is that? Is
> > it the anonymous LRU? If yes, don't we have the same problem as with the
>
> active file page -> inactive file LRU
> active anon page -> inactive anon LRU
>
> > early MADV_FREE implementation, when a large page cache means that
> > deactivated anonymous memory doesn't get reclaimed anytime soon? Or,
> > worse, never when there is no swap available?
>
> I think MADV_COLD has slightly different semantics from MADV_FREE.
> MADV_FREE means it's okay to discard the page under memory pressure
> because its content is *garbage*. Furthermore, freeing such pages has
> almost zero overhead since we don't need to swap them out, and a later
> access causes only a minor fault. Thus, it makes sense to put those
> freeable pages in the inactive file LRU to compete with other used-once
> pages.
>
> However, MADV_COLD doesn't mean the content is garbage, and freeing such
> pages requires a swap-out (and a swap-in on a later access). So it is
> better to move them to the inactive anon LRU list, not the file LRU.
> Furthermore, that avoids unnecessary scanning of those cold anonymous
> pages if the system doesn't have a swap device.

Please document this if it is really the desired semantic, because then
you have the same set of problems as we had with the early MADV_FREE
implementation mentioned above.

-- 
Michal Hocko
SUSE Labs