From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Kuo-Hsin Yang,
 Johannes Weiner, Michal Hocko, Sonny Rao, Mel Gorman, Rik van Riel,
 Vladimir Davydov, Minchan Kim, Andrew Morton, Linus Torvalds
Subject: [PATCH 4.19 50/50] mm: vmscan: scan anonymous pages on file refaults
Date: Fri, 26 Jul 2019 17:25:25 +0200
Message-Id: <20190726152306.041848000@linuxfoundation.org>
In-Reply-To: <20190726152300.760439618@linuxfoundation.org>
References: <20190726152300.760439618@linuxfoundation.org>

From: Kuo-Hsin Yang

commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa upstream.

When file refaults are detected and there are many inactive file pages,
the system never reclaims anonymous pages; the file pages are dropped
aggressively even when there are still a lot of cold anonymous pages,
and the system thrashes.  This issue impacts the performance of
applications with large executables, e.g. chrome.

With this patch, when a file refault is detected, inactive_list_is_low()
always returns true for file pages in get_scan_count() to enable
scanning of anonymous pages.

The problem can be reproduced by the following test program.

---8<---
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>

void fallocate_file(const char *filename, off_t size)
{
	struct stat st;
	int fd;

	if (!stat(filename, &st) && st.st_size >= size)
		return;

	fd = open(filename, O_WRONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("create file");
		exit(1);
	}
	if (posix_fallocate(fd, 0, size)) {
		perror("fallocate");
		exit(1);
	}
	close(fd);
}

long *alloc_anon(long size)
{
	long *start = malloc(size);
	memset(start, 1, size);
	return start;
}

long access_file(const char *filename, long size, long rounds)
{
	int fd, i;
	volatile char *start1, *end1, *start2;
	const int page_size = getpagesize();
	long sum = 0;

	fd = open(filename, O_RDONLY);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	/*
	 * Some applications, e.g. chrome, use a lot of executable file
	 * pages, map some of the pages with PROT_EXEC flag to simulate
	 * the behavior.
	 */
	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
		      fd, 0);
	if (start1 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	end1 = start1 + size / 2;

	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
	if (start2 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (i = 0; i < rounds; ++i) {
		struct timeval before, after;
		volatile char *ptr1 = start1, *ptr2 = start2;

		gettimeofday(&before, NULL);
		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
			sum += *ptr1 + *ptr2;
		gettimeofday(&after, NULL);

		printf("File access time, round %d: %f (sec)\n", i,
		       (after.tv_sec - before.tv_sec) +
		       (after.tv_usec - before.tv_usec) / 1000000.0);
	}

	return sum;
}

int main(int argc, char *argv[])
{
	const long MB = 1024 * 1024;
	long anon_mb, file_mb, file_rounds;
	const char filename[] = "large";
	long *ret1;
	long ret2;

	if (argc != 4) {
		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n");
		exit(0);
	}
	anon_mb = atoi(argv[1]);
	file_mb = atoi(argv[2]);
	file_rounds = atoi(argv[3]);

	fallocate_file(filename, file_mb * MB);
	printf("Allocate %ld MB anonymous pages\n", anon_mb);
	ret1 = alloc_anon(anon_mb * MB);
	printf("Access %ld MB file pages\n", file_mb);
	ret2 = access_file(filename, file_mb * MB, file_rounds);
	printf("Print result to prevent optimization: %ld\n",
	       *ret1 + ret2);
	return 0;
}
---8<---
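The reproducer needs no special build setup; a minimal sketch of building
and running it (the thrash.c file name and compiler flags are assumptions,
not part of the original posting, and the arguments match the usage string
above):

$ gcc -O2 -o thrash thrash.c
$ ./thrash ANON_MB FILE_MB FILE_ROUNDS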
Running the test program on a 2 GB RAM VM with kernel 5.2.0-rc5, the
program fills RAM with 2048 MB of anonymous memory and accesses a 200 MB
file 10 times.

Without this patch, the file cache is dropped aggressively and every
access to the file is served from disk.

$ ./thrash 2048 200 10
Allocate 2048 MB anonymous pages
Access 200 MB file pages
File access time, round 0: 2.489316 (sec)
File access time, round 1: 2.581277 (sec)
File access time, round 2: 2.487624 (sec)
File access time, round 3: 2.449100 (sec)
File access time, round 4: 2.420423 (sec)
File access time, round 5: 2.343411 (sec)
File access time, round 6: 2.454833 (sec)
File access time, round 7: 2.483398 (sec)
File access time, round 8: 2.572701 (sec)
File access time, round 9: 2.493014 (sec)

With this patch, these file pages can be cached.

$ ./thrash 2048 200 10
Allocate 2048 MB anonymous pages
Access 200 MB file pages
File access time, round 0: 2.475189 (sec)
File access time, round 1: 2.440777 (sec)
File access time, round 2: 2.411671 (sec)
File access time, round 3: 1.955267 (sec)
File access time, round 4: 0.029924 (sec)
File access time, round 5: 0.000808 (sec)
File access time, round 6: 0.000771 (sec)
File access time, round 7: 0.000746 (sec)
File access time, round 8: 0.000738 (sec)
File access time, round 9: 0.000747 (sec)

I checked the swap-out stats during the test [1]: 19006 pages were
swapped out with this patch versus 3418 pages without it.  There is more
swap-out, but I think it is within a reasonable range when the
file-backed data set does not fit into memory.

$ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS PROCESSES
Allocate 2000 MB anonymous pages
active_anon: 1613644, inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB)
pswpout: 7972443, pgpgin: 478615246
Access 100 MB executable file pages
Access 2100 MB regular file pages
File access time, round 0: 12.165, (sec)
active_anon: 1433788, inactive_anon: 478116, active_file: 17896, inactive_file: 24328 (kB)
File access time, round 1: 11.493, (sec)
active_anon: 1430576, inactive_anon: 477144, active_file: 25440, inactive_file: 26172 (kB)
File access time, round 2: 11.455, (sec)
active_anon: 1427436, inactive_anon: 476060, active_file: 21112, inactive_file: 28808 (kB)
File access time, round 3: 11.454, (sec)
active_anon: 1420444, inactive_anon: 473632, active_file: 23216, inactive_file: 35036 (kB)
File access time, round 4: 11.479, (sec)
active_anon: 1413964, inactive_anon: 471460, active_file: 31728, inactive_file: 32224 (kB)
pswpout: 7991449 (+ 19006), pgpgin: 489924366 (+ 11309120)

With 4 processes accessing non-overlapping parts of a large file, 30316
pages were swapped out with this patch versus 5152 pages without it.  The
swap-out number is small compared to pgpgin.
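The pswpout and pgpgin numbers above come from /proc/vmstat; the script
actually used for these runs is the one linked at [1].  As an
illustration only, a minimal sampler that prints just those two counters
before and after a run could look like the hypothetical sketch below (it
is not the script from [1]):

/*
 * Hypothetical helper: print the pswpout and pgpgin counters from
 * /proc/vmstat so the deltas across a test run can be compared.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	unsigned long long val;
	FILE *fp = fopen("/proc/vmstat", "r");

	if (!fp) {
		perror("fopen /proc/vmstat");
		return 1;
	}
	/* Each line of /proc/vmstat is "<counter> <value>". */
	while (fscanf(fp, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "pswpout") || !strcmp(name, "pgpgin"))
			printf("%s: %llu\n", name, val);
	}
	fclose(fp);
	return 0;
}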
[1]: https://github.com/vovo/testing/blob/master/mem_thrash.c

Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com
Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty")
Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty")
Signed-off-by: Kuo-Hsin Yang
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Sonny Rao
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Vladimir Davydov
Cc: Minchan Kim
Cc: <stable@vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
[backported to 4.14.y, 4.19.y, 5.1.y: adjust context]
Signed-off-by: Kuo-Hsin Yang
Signed-off-by: Greg Kroah-Hartman
---
 mm/vmscan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2190,7 +2190,7 @@ static void shrink_active_list(unsigned
  *    10TB     320        32GB
  */
 static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
-				 struct scan_control *sc, bool actual_reclaim)
+				 struct scan_control *sc, bool trace)
 {
 	enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -2216,7 +2216,7 @@ static bool inactive_list_is_low(struct
 	 * rid of the stale workingset quickly.
 	 */
 	refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
-	if (file && actual_reclaim && lruvec->refaults != refaults) {
+	if (file && lruvec->refaults != refaults) {
 		inactive_ratio = 0;
 	} else {
 		gb = (inactive + active) >> (30 - PAGE_SHIFT);
@@ -2226,7 +2226,7 @@ static bool inactive_list_is_low(struct
 		inactive_ratio = 1;
 	}
 
-	if (actual_reclaim)
+	if (trace)
 		trace_mm_vmscan_inactive_list_is_low(pgdat->node_id, sc->reclaim_idx,
 			lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
 			lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,