From: Daniel Drake
To: mhocko@kernel.org
Cc: hannes@cmpxchg.org, linux-mm@kvack.org, linux@endlessm.com, linux-kernel@vger.kernel.org
Subject: Making direct reclaim fail when thrashing
Date: Fri, 27 Jul 2018 11:21:43 -0500
Message-Id: <20180727162143.26466-1-drake@endlessm.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

Split from the thread
  [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
where we were discussing if/how to make the direct reclaim codepath fail
if we're excessively thrashing, so that the OOM killer might step in.
This is potentially desirable when the thrashing is so bad that the UI
stops responding, causing the user to pull the plug.

On Tue, Jul 17, 2018 at 7:23 AM, Michal Hocko wrote:
> mm/workingset.c allows for tracking when an actual page got evicted.
> workingset_refault tells us whether a give filemap fault is a recent
> refault and activates the page if that is the case. So what you need is
> to note how many refaulted pages we have on the active LRU list. If that
> is a large part of the list and if the inactive list is really small
> then we know we are trashing. This all sounds much easier than it will
> eventually turn out to be of course but I didn't really get to play with
> this much.

Apologies in advance for any silly mistakes or terrible code that follows,
as I am not familiar with this part of the kernel.

As mentioned in my last mail, knowing whether a page on the active list
was refaulted into place appears to be non-trivial, because the eviction
information is lost upon refault (it was stored in the page cache shadow
entry).

Here I'm experimenting by adding another tag to the page cache radix tree,
tagging pages that were activated in the refault path. Then, in
get_scan_count, I'm checking how many active pages have that tag, and also
looking at the sizes of the active and inactive lists.

It comes with a performance hit (probably due to looping over the whole
active list and doing lots of locking?) but I figured it might serve as
one step forward.

The results are not exactly what I would expect. Upon launching 20
processes that each allocate and memset 100MB of RAM, exhausting all RAM
(with no swap available), the kernel starts thrashing and I get numbers
like:

 get_scan_count lru1 active=422714 inactive=19595 refaulted=0
 get_scan_count lru3 active=832 inactive=757 refaulted=21

Lots of active anonymous pages (lru1), and none refaulted, which is
perhaps not surprising because it can't swap them out with no swap
available. But there are only a few file pages on the lists (lru3), and
only a tiny number of refaulted ones, which doesn't line up with your
suggestion of detecting when a large part of the active list is made up
of refaulted pages.

Any further suggestions appreciated.
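To make the check under discussion concrete, here is a minimal userspace
sketch of the "large part of the active list is refaulted and the inactive
list is really small" test, fed with the kind of numbers printed above. It
is not kernel code. The struct, the function name and both thresholds
(at least half of the active list refaulted, inactive list under 1/8 of the
active list) are assumptions made up for illustration; the thread does not
propose specific values.

```c
#include <stdbool.h>

/* One active/inactive LRU pair, in pages (hypothetical helper type). */
struct lru_sample {
	unsigned long active;
	unsigned long inactive;
	unsigned long refaulted;	/* active pages carrying the refault tag */
};

/*
 * Illustrative thresholds only, not taken from the thread:
 *   "large part of the active list" -> refaulted >= active / 2
 *   "really small inactive list"    -> inactive  <  active / 8
 */
#define REFAULT_DIVISOR		2UL
#define INACTIVE_DIVISOR	8UL

static bool looks_like_thrashing(const struct lru_sample *s)
{
	bool mostly_refaulted;
	bool tiny_inactive;

	if (s->active == 0)
		return false;

	/* Multiply instead of dividing to avoid losing precision. */
	mostly_refaulted = s->refaulted * REFAULT_DIVISOR >= s->active;
	tiny_inactive = s->inactive * INACTIVE_DIVISOR < s->active;

	return mostly_refaulted && tiny_inactive;
}
```

With these assumed thresholds, neither sample above is flagged: lru1 has a
small inactive list but no refaulted pages, and lru3 has few refaulted
pages and an inactive list nearly as large as the active one.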
Thanks
Daniel
---
 include/linux/fs.h         |  1 +
 include/linux/radix-tree.h |  2 +-
 mm/filemap.c               |  2 ++
 mm/vmscan.c                | 37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index d85ac9d24bb3..45f94ffd1c67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,6 +462,7 @@ struct block_device {
 #define PAGECACHE_TAG_DIRTY	0
 #define PAGECACHE_TAG_WRITEBACK	1
 #define PAGECACHE_TAG_TOWRITE	2
+#define PAGECACHE_TAG_REFAULTED	3
 
 int mapping_tagged(struct address_space *mapping, int tag);
 
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 34149e8b5f73..86eccb71ef7e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -65,7 +65,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
+#define RADIX_TREE_MAX_TAGS 4
 
 #ifndef RADIX_TREE_MAP_SHIFT
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
diff --git a/mm/filemap.c b/mm/filemap.c
index 250f675dcfb2..9a686570dc75 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -917,6 +917,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		 */
 		if (!(gfp_mask & __GFP_WRITE) &&
 		    shadow && workingset_refault(shadow)) {
+			radix_tree_tag_set(&mapping->i_pages, page_index(page),
+					   PAGECACHE_TAG_REFAULTED);
 			SetPageActive(page);
 			workingset_activation(page);
 		} else
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..79bc810b43bb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,6 +2102,30 @@ enum scan_balance {
 	SCAN_FILE,
 };
 
+
+static int count_refaulted(struct lruvec *lruvec, enum lru_list lru) {
+	int nr_refaulted = 0;
+	struct page *page;
+
+	list_for_each_entry(page, &lruvec->lists[lru], lru) {
+		/* Lookup page cache entry from page following the approach
+		 * taken in __set_page_dirty_nobuffers */
+		unsigned long flags;
+		struct address_space *mapping = page_mapping(page);
+		if (!mapping)
+			continue;
+
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		BUG_ON(page_mapping(page) != mapping);
+		nr_refaulted += radix_tree_tag_get(&mapping->i_pages,
+						   page_index(page),
+						   PAGECACHE_TAG_REFAULTED);
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+	}
+
+	return nr_refaulted;
+}
+
 /*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
@@ -2270,6 +2294,19 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			unsigned long size;
 			unsigned long scan;
 
+			if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+				int nr_refaulted;
+				unsigned long inactive, active;
+
+				nr_refaulted = count_refaulted(lruvec, lru);
+				active = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
+				inactive = lruvec_lru_size(lruvec, lru - 1,
+							   sc->reclaim_idx);
+				pr_err("get_scan_count lru%d active=%ld inactive=%ld "
+				       "refaulted=%d\n",
+				       lru, active, inactive, nr_refaulted);
+			}
+
 			size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
 			scan = size >> sc->priority;
 			/*
-- 
2.17.1
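
For reference, the workload described in the message (several processes
that each allocate a buffer and memset it) could be approximated with a
userspace helper like the sketch below. The actual test program is not
shown in this mail, so run_workers(), its parameters and the 0xa5 fill
byte are all hypothetical. Each child touches every page of its buffer,
optionally holds it for a while so that the allocations overlap, and the
parent reports how many children exited cleanly.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork nproc children. Each child allocates `bytes` bytes, memsets the
 * whole buffer so every page is really backed by memory, sleeps for
 * hold_seconds while keeping the buffer, then exits.
 * Returns the number of children that exited with status 0.
 */
static int run_workers(int nproc, size_t bytes, unsigned int hold_seconds)
{
	int started = 0;
	int ok = 0;

	if (nproc <= 0 || bytes == 0)
		return 0;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;		/* could not fork; wait for the rest */
		if (pid == 0) {
			unsigned char *buf = malloc(bytes);
			volatile unsigned char *p = buf;

			if (!buf)
				_exit(1);
			memset(buf, 0xa5, bytes);
			sleep(hold_seconds);
			/* Volatile read back so the writes are not elided. */
			_exit(p[bytes - 1] == 0xa5 ? 0 : 1);
		}
		started++;
	}

	for (int i = 0; i < started; i++) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}

	return ok;
}
```

Calling run_workers(20, 100UL << 20, 60) would roughly match the
"20 processes, 100MB each" description, with the 60 second hold being an
arbitrary choice. On a machine with little RAM and no swap this can make
the system unresponsive, which is the situation the mail is about, so it
should only be tried on a disposable test box.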