Subject: Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
To: Hugh Dickins, Andrew Morton
Cc: mhocko@kernel.org, vbabka@suse.cz, hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kirill Tkhai, Andrea Arcangeli
References: <1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com> <20181220144513.bf099a67c1140865f496011f@linux-foundation.org>
From: Yang Shi
Message-ID: <575fdffe-abfa-e52b-7b91-97e5e6ffb4bb@linux.alibaba.com>
Date: Thu, 20 Dec 2018 22:33:26 -0800

On 12/20/18 10:04 PM, Hugh Dickins wrote:
> On Thu, 20 Dec 2018, Andrew Morton wrote:
>> Is anyone interested in reviewing this? Seems somewhat serious.
>> Thanks.
> Somewhat serious, but no need to rush.
>
>> From: Yang Shi
>> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
>>
>> When running a stress test, we occasionally run into the below hang issue:
> Artificial load presumably.
>
>> INFO: task ksmd:205 blocked for more than 360 seconds.
>> Tainted: G E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
> 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
> ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
>
> The patch below is more economical than Andrea's, but I don't think
> a second workaround should be added, unless Andrea's is shown to be
> insufficient, even with its ksm_max_page_sharing tuned down to suit.
>
> Yang, please try to reproduce on upstream, or backport Andrea's to
> 4.9-stable - thanks.

I believe Andrea's commit could work around this problem too, by limiting
the number of pages sharing one KSM page.

However, IMHO, even if only a few hundred pages share one KSM page, it
still does not seem worth reclaiming it in direct reclaim at low
priority. According to Andrea's commit log, it still takes a few msec to
walk the rmap for 256 shared pages.

Thanks,
Yang

>
> Hugh
>
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> ksmd D 0 205 2 0x00000000
>> ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
>> ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
>> 00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
>> Call Trace:
>> [] ? __schedule+0x258/0x720
>> [] ? do_flush_tlb_all+0x30/0x30
>> [] ? free_cpumask_var+0x9/0x10
>> [] schedule+0x36/0x80
>> [] schedule_timeout+0x206/0x4b0
>> [] ? native_flush_tlb_others+0x11f/0x180
>> [] ? ktime_get+0x40/0xb0
>> [] io_schedule_timeout+0xda/0x170
>> [] ? bit_wait+0x60/0x60
>> [] bit_wait_io+0x1b/0x60
>> [] __wait_on_bit_lock+0x59/0xc0
>> [] __lock_page+0x86/0xa0
>> [] ? wake_atomic_t_function+0x60/0x60
>> [] ksm_scan_thread+0xeb9/0x1430
>> [] ? prepare_to_wait_event+0x100/0x100
>> [] ? try_to_merge_with_ksm_page+0x850/0x850
>> [] kthread+0xe6/0x100
>> [] ? kthread_park+0x60/0x60
>> [] ret_from_fork+0x46/0x60
>>
>> ksmd found a suitable KSM page on the stable tree and is trying to lock
>> it. But it is locked by the direct reclaim path which is walking the
>> page's rmap to get the number of referenced PTEs.
>>
>> The KSM page rmap walk needs to iterate all rmap_items of the page and all
>> rmap anon_vmas of each rmap_item. So it may take (# rmap_item * #
>> children processes) loops. This number of loops might be very large in
>> the worst case, and may take a long time.
>>
>> Typically, direct reclaim will not intend to reclaim too many pages, and
>> it is latency sensitive. So it is not worth doing the long ksm page rmap
>> walk to reclaim just one page.
>>
>> Skip KSM pages in direct reclaim if the reclaim priority is low, but still
>> try to reclaim KSM pages with high priority.
>>
>> Link: http://lkml.kernel.org/r/1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com
>> Signed-off-by: Yang Shi
>> Cc: Vlastimil Babka
>> Cc: Johannes Weiner
>> Cc: Hugh Dickins
>> Cc: Michal Hocko
>> Cc: Andrea Arcangeli
>> Signed-off-by: Andrew Morton
>> ---
>>
>>  mm/vmscan.c |   23 +++++++++++++++++++++--
>>  1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> --- a/mm/vmscan.c~mm-vmscan-skip-ksm-page-in-direct-reclaim-if-priority-is-low
>> +++ a/mm/vmscan.c
>> @@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(st
>>  			}
>>  		}
>>  
>> -		if (!force_reclaim)
>> -			references = page_check_references(page, sc);
>> +		if (!force_reclaim) {
>> +			/*
>> +			 * Don't try to reclaim KSM page in direct reclaim if
>> +			 * the priority is not high enough.
>> +			 */
>> +			if (PageKsm(page) && !current_is_kswapd() &&
>> +			    sc->priority > (DEF_PRIORITY - 2))
>> +				references = PAGEREF_KEEP;
>> +			else
>> +				references = page_check_references(page, sc);
>> +		}
>>  
>>  		switch (references) {
>>  		case PAGEREF_ACTIVATE:
>> @@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned
>>  			}
>>  		}
>>  
>> +		/*
>> +		 * Skip KSM page in direct reclaim if priority is not
>> +		 * high enough.
>> +		 */
>> +		if (PageKsm(page) && !current_is_kswapd() &&
>> +		    sc->priority > (DEF_PRIORITY - 2)) {
>> +			putback_lru_page(page);
>> +			continue;
>> +		}
>> +
>>  		if (page_referenced(page, 0, sc->target_mem_cgroup,
>>  				    &vm_flags)) {
>>  			nr_rotated += hpage_nr_pages(page);
>> _
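
For readers following the quoted patch, here is a minimal userspace sketch of the skip condition and of the loop-count estimate from the commit message. It is only a model, not kernel code: struct reclaim_ctx, skip_ksm_page() and rmap_walk_loops() are hypothetical names made up for illustration, and the two booleans stand in for PageKsm() and current_is_kswapd(). DEF_PRIORITY is 12 in mainline, and sc->priority counts down from it as reclaim escalates, so "low priority" in the patch corresponds to the numerically highest values, 12 and 11.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's DEF_PRIORITY (include/linux/mmzone.h). */
#define DEF_PRIORITY 12

/* Hypothetical stand-in for the reclaim state the patch consults. */
struct reclaim_ctx {
	int priority;	/* models sc->priority: 12 = least pressure, 0 = most */
	bool is_kswapd;	/* models current_is_kswapd() */
};

/*
 * Model of the check added by the quoted patch: skip a KSM page only
 * in direct reclaim (not kswapd) and only while the priority value is
 * still above DEF_PRIORITY - 2.
 */
static bool skip_ksm_page(const struct reclaim_ctx *ctx, bool page_is_ksm)
{
	assert(ctx->priority >= 0 && ctx->priority <= DEF_PRIORITY);

	return page_is_ksm && !ctx->is_kswapd &&
	       ctx->priority > (DEF_PRIORITY - 2);
}

/*
 * Worst-case loop estimate from the commit message:
 * (# rmap_item * # children processes).
 */
static unsigned long rmap_walk_loops(unsigned long nr_rmap_items,
				     unsigned long nr_children)
{
	return nr_rmap_items * nr_children;
}
```

Under this model a direct reclaimer skips KSM pages on its first two passes and starts considering them once the priority value drops to 10, while kswapd never skips them.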