References: <1554348617-12897-1-git-send-email-huangzhaoyang@gmail.com> <20190404071512.GE12864@dhcp22.suse.cz>
In-Reply-To: <20190404071512.GE12864@dhcp22.suse.cz>
From: Zhaoyang Huang
Date: Fri, 5 Apr 2019 11:13:32 +0800
Subject: Re: [PATCH] mm:workingset use real time to judge activity of the file page
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Joonsoo Kim, David Rientjes, Zhaoyang Huang, Roman Gushchin, Jeff Layton, Matthew Wilcox, "open list:MEMORY MANAGEMENT", LKML, Pavel Tatashin, Johannes Weiner, geng.ren@unisoc.com
X-Mailing-List: linux-kernel@vger.kernel.org

Resending this via the right mailing list, with the comments by ZY rewritten.

On Thu, Apr 4, 2019 at 3:15 PM Michal Hocko wrote:
>
> [Fixup email for Pavel and add Johannes]
>
> On Thu 04-04-19 11:30:17, Zhaoyang Huang wrote:
> > From: Zhaoyang Huang
> >
> > In the previous implementation, the number of refault pages is used
> > for judging the refault period of each page, which is not precise, as
> > eviction of other files affects the current cache a lot.
> > We introduce a timestamp into the workingset's entry and a refault ratio
> > to measure the file page's activity. It helps to decrease the effect
> > of other files (the average refault ratio can reflect the view of the
> > whole system's memory).
> > The patch is tested on an Android system, which can be described as
> > comparing the launch time of an application between a huge memory
> > consumption. The result is that launch time decreases by 50% and the
> > page faults during the test decrease by 80%.
> >
I don't understand what exactly you're saying here, can you please
elaborate?

The reason it's using distances instead of absolute time is because
the ordering of the LRU is relative and not based on absolute time.

E.g. if a page is accessed every 500ms, it depends on all other pages
to determine whether this page is at the head or the tail of the LRU.

So when you refault, in order to determine the relative position of
the refaulted page in the LRU, you have to compare it to how fast that
LRU is moving. The absolute refault time, or the average time between
refaults, is not comparable to what's already in memory.

comment by ZY:
With the current implementation, it is hard to evaluate the refault
period when a large number of file pages are dropped within a short
time, which may be caused by a high-order allocation or by continuous
single-page allocations in kswapd. In that scenario, a page that has a
big refault_distance is wrongly deemed INACTIVE, so it is reclaimed
earlier than it should be, which leads to page thrashing. So we
introduce 'avg_refault_time' and 'refault_ratio' to judge whether the
refault distance accumulated over time or was caused by a tight
reclaim. That is to say, a big refault_distance built up over a long
time is still treated as inactive, as the result of comparing its
refault time with the ideal time, avg_refault_time:

    avg_refault_time = delta_lru_reclaimed_pages / avg_refault_ratio
    refault_ratio    = lru->inactive_ages / time
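To make the comparison above concrete, here is a minimal userspace sketch of the decision, not the kernel code. It assumes that delta_lru_reclaimed_pages corresponds to refault_distance, as in the quoted diff, and that time is counted in ticks since start. The function name, its parameters, and the guard against a zero ratio are hypothetical additions for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check described above; not kernel API.
 *
 *   refault_ratio    = inactive_age / now            (pages evicted per tick)
 *   avg_refault_time = refault_distance / refault_ratio
 *
 * The page counts as active when it came back no slower than the
 * average time the LRU needs to evict refault_distance pages.
 */
static bool refault_is_active(unsigned long refault_distance,
                              unsigned long inactive_age,
                              unsigned long now,
                              unsigned long evicted_at)
{
    unsigned long refault_time = now - evicted_at;
    unsigned long refault_ratio;
    unsigned long avg_refault_time;

    if (now == 0)
        return false;           /* no elapsed time to average over */

    refault_ratio = inactive_age / now;
    if (refault_ratio == 0)
        return false;           /* assumption: too little eviction to judge */

    avg_refault_time = refault_distance / refault_ratio;
    return refault_time <= avg_refault_time;
}
```

For example, with 10000 evictions over 1000 ticks the ratio is 10 pages per tick, so a refault distance of 500 pages gives an average time of 50 ticks: a page that returns after 40 ticks is treated as active, one that returns after 100 ticks is not. The integer division means the ratio drops to zero whenever fewer pages have been evicted than ticks have elapsed, which is why the sketch guards that case explicitly.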
> > Signed-off-by: Zhaoyang Huang
> > ---
> >  include/linux/mmzone.h |  2 ++
> >  mm/workingset.c        | 24 +++++++++++++++++-------
> >  2 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 32699b2..c38ba0a 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -240,6 +240,8 @@ struct lruvec {
> >  	atomic_long_t inactive_age;
> >  	/* Refaults at the time of last reclaim cycle */
> >  	unsigned long refaults;
> > +	atomic_long_t refaults_ratio;
> > +	atomic_long_t prev_fault;
> >  #ifdef CONFIG_MEMCG
> >  	struct pglist_data *pgdat;
> >  #endif
> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index 40ee02c..6361853 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -159,7 +159,7 @@
> >  			 NODES_SHIFT + \
> >  			 MEM_CGROUP_ID_SHIFT)
> >  #define EVICTION_MASK (~0UL >> EVICTION_SHIFT)
> > -
> > +#define EVICTION_JIFFIES (BITS_PER_LONG >> 3)
> >  /*
> >   * Eviction timestamps need to be able to cover the full range of
> >   * actionable refaults. However, bits are tight in the radix tree
> > @@ -175,18 +175,22 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
> >  	eviction >>= bucket_order;
> >  	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
> >  	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
> > +	eviction = (eviction << EVICTION_JIFFIES) | (jiffies >> EVICTION_JIFFIES);
> >  	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
> >
> >  	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
> >  }
> >
> >  static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
> > -			  unsigned long *evictionp)
> > +			  unsigned long *evictionp, unsigned long *prev_jiffp)
> >  {
> >  	unsigned long entry = (unsigned long)shadow;
> >  	int memcgid, nid;
> > +	unsigned long prev_jiff;
> >
> >  	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
> > +	entry >>= EVICTION_JIFFIES;
> > +	prev_jiff = (entry & ((1UL << EVICTION_JIFFIES) - 1)) << EVICTION_JIFFIES;
> >  	nid = entry & ((1UL << NODES_SHIFT) - 1);
> >  	entry >>= NODES_SHIFT;
> >  	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
> > @@ -195,6 +199,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
> >  	*memcgidp = memcgid;
> >  	*pgdat = NODE_DATA(nid);
> >  	*evictionp = entry << bucket_order;
> > +	*prev_jiffp = prev_jiff;
> >  }
> >
> >  /**
> > @@ -242,8 +247,12 @@ bool workingset_refault(void *shadow)
> >  	unsigned long refault;
> >  	struct pglist_data *pgdat;
> >  	int memcgid;
> > +	unsigned long refault_ratio;
> > +	unsigned long prev_jiff;
> > +	unsigned long avg_refault_time;
> > +	unsigned long refault_time;
> >
> > -	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
> > +	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &prev_jiff);
> >
> >  	rcu_read_lock();
> >  	/*
> > @@ -288,10 +297,11 @@ bool workingset_refault(void *shadow)
> >  	 * list is not a problem.
> >  	 */
> >  	refault_distance = (refault - eviction) & EVICTION_MASK;
> > -
> >  	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
> > -
> > -	if (refault_distance <= active_file) {
> > +	lruvec->refaults_ratio = atomic_long_read(&lruvec->inactive_age) / jiffies;
> > +	refault_time = jiffies - prev_jiff;
> > +	avg_refault_time = refault_distance / lruvec->refaults_ratio;
> > +	if (refault_time <= avg_refault_time) {
> >  		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
> >  		rcu_read_unlock();
> >  		return true;
> > @@ -521,7 +531,7 @@ static int __init workingset_init(void)
> >  	 * some more pages at runtime, so keep working with up to
> >  	 * double the initial memory by using totalram_pages as-is.
> >  	 */
> > -	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
> > +	timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT - EVICTION_JIFFIES;
> >  	max_order = fls_long(totalram_pages - 1);
> >  	if (max_order > timestamp_bits)
> >  		bucket_order = max_order - timestamp_bits;
> > --
> > 1.9.1
>
> --
> Michal Hocko
> SUSE Labs
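
As a side note on the packing step in the quoted diff, the following is a minimal userspace sketch of how a coarse timestamp can share one word with an eviction counter. The field widths, macro names, and function names are hypothetical and do not match the kernel's shadow-entry layout. Unlike the quoted diff, the sketch masks the stamp to its field width before OR-ing it in, and reads the field before shifting it out; that difference is my reading of the diff, not something confirmed in the thread.

```c
#include <stdint.h>

/* Hypothetical layout for illustration only; not the kernel's shadow entry. */
#define STAMP_BITS   8u   /* width of the coarse time field */
#define STAMP_SHIFT  8u   /* low tick bits dropped before storing */
#define STAMP_MASK   ((UINT64_C(1) << STAMP_BITS) - 1)

/*
 * Pack an eviction counter together with a coarse timestamp.
 * Assumes eviction fits in (64 - STAMP_BITS) bits.
 */
static uint64_t pack_entry(uint64_t eviction, uint64_t now_ticks)
{
    /* Mask so the stamp cannot spill into the eviction bits. */
    uint64_t stamp = (now_ticks >> STAMP_SHIFT) & STAMP_MASK;

    return (eviction << STAMP_BITS) | stamp;
}

/* Read the stamp field first, then shift it out to recover the counter. */
static void unpack_entry(uint64_t entry, uint64_t *eviction,
                         uint64_t *stamp_ticks)
{
    *stamp_ticks = (entry & STAMP_MASK) << STAMP_SHIFT;
    *eviction = entry >> STAMP_BITS;
}

/* Ticks elapsed since the stamp, tolerating wraparound of the short field. */
static uint64_t ticks_since(uint64_t stamp_ticks, uint64_t now_ticks)
{
    uint64_t window = UINT64_C(1) << (STAMP_BITS + STAMP_SHIFT);
    uint64_t granule_mask = (UINT64_C(1) << STAMP_SHIFT) - 1;
    uint64_t now_coarse = now_ticks & (window - 1) & ~granule_mask;

    return (now_coarse - stamp_ticks) & (window - 1);
}
```

With these hypothetical widths the stamp has a granularity of 256 ticks and wraps every 65536 ticks, so any interval longer than that window is indistinguishable from a shorter one. Every bit given to the stamp is also a bit taken from the eviction counter, which is the trade-off the timestamp_bits change in the quoted diff reflects.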