Received: by 2002:a25:f815:0:0:0:0:0 with SMTP id u21csp2952420ybd; Thu, 27 Jun 2019 23:52:19 -0700 (PDT) X-Google-Smtp-Source: APXvYqwJRKrvW1BLxNAnqWCx+EzX1jMEexsJQ2S6I0Jl341d1VUKGyRYEwrdYGyQaAg7b0U6fIXC X-Received: by 2002:a65:48c3:: with SMTP id o3mr4672286pgs.70.1561704738419; Thu, 27 Jun 2019 23:52:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561704738; cv=none; d=google.com; s=arc-20160816; b=TrOEaxfrHOXTL9I5q7FdSqnFiHPYMyUGEKKzhJNx3yEFYQ3S5E5J3wHBBNvcZBg0Rh L/0hyrUb3f7f6YMOxPbqInnjbdnPcA5wJWtpXmQiCw59uTjE6r8/oV+PppM5Wz+37xgl pSw+HCsF83l6dckE5UVahfGWF3cL3L2UkqIM3G0BGtqL1bix67MI3rvQuLlGQSAyJcEN bCajOIIAG/ZlRufWKVrBhJebz+W0nGf5DBh+1GhdVcjtpZgOtLoBQzxAGKjeTirhK2Im lKxJIeQwUuHrv52nY37x/jT6ggsxvz3yIEaiL2nc5xP1bTsQFThNwQp67AlFJIpP2rBO W8fA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=mCstpKpvgbBC+V+ZGiWFlCxOPjghNzZbXr1kJOJkGEc=; b=0TFUE/tRY843Xo0PJiHMdJPoHHogPALfnBWKYL4pFIo8TJUR5Bgn7+VKFak+DlFL5R sN2GvHZkdqKLoyqzknb7zigBsfzMQly3B3nhG5n6i5oqDiGoXgOQrKkxja51R/k5qY36 A6VXoPhgKYFl/dhrxLAQci+Wxex88yssOLZ8EK4W0oIkStgVCYyqJXXWVRIxTy/m1/hC 6wlpRc9+VL+jUJmDBa5l91A3TQt05NDshDAsQk02bsPUjP27KfVbv5kDSpj9MhHxULGa u3nmYxQ8Mmk5oUGWr/0eN+61ko5u5BmmD14UVCxvPulz8VcCRJ6JoubpDTqMx0+QomyD lvUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=Cv6URamx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 68si1334908ple.89.2019.06.27.23.52.02; Thu, 27 Jun 2019 23:52:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=Cv6URamx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727100AbfF1Gvp (ORCPT + 99 others); Fri, 28 Jun 2019 02:51:45 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:43390 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726827AbfF1Gvp (ORCPT ); Fri, 28 Jun 2019 02:51:45 -0400 Received: by mail-pl1-f196.google.com with SMTP id cl9so2679620plb.10 for ; Thu, 27 Jun 2019 23:51:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=mCstpKpvgbBC+V+ZGiWFlCxOPjghNzZbXr1kJOJkGEc=; b=Cv6URamx2iez5Bj7TuIlhxGqHMOHEygYhuPAvLwOBofBDUglWIMw6QdOGPE73NQ6+i jav6fRvR0NxwWAYc3M4mcV4KkwniIE0+DN9BgNIy7ng1I8LL3V9c861VT0Mkbmx7mC9N XGLVsuuq500ozM+G8Ui+mQjISaOIxPaU3cOkGDCtd9Vyy9OgRtlfuNKeA+I7aE3nz2Hr taBSqBi3qcGKuPWxaNJoWNCupfRrMuMfUeLAzTyq3HwfoaGB9JSEheS21EIa6ByWe/jg oQ3malVO0XGQYEpnPI6g+Au35JqstM4w6ueSgeJmsm2dmTDleG4RBR1U+TdoyM29PAjK AFXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=mCstpKpvgbBC+V+ZGiWFlCxOPjghNzZbXr1kJOJkGEc=; b=r8fbvhPJgCv8uF0qDm+BstF3cy7wHU+l5L4rDpv+1Ru5qEEMW2R8UVOl003aTl5Qfo eee3utr/X2MceOGjpC+AzSlQuiNzztdASxZBbjPHwVeLiWsRgEbMVUxW0LJywxG0BLFj masm+gZoHp/63C71R/oSeIFerennC8cekAf5vJojmHZCFogBZD4zrRAv1ukqy7B++lRD Zx0lc/CyixRlwN6/2g0temG3Gc6cJAr5CGLnWRa3CHdvvjEEHTwz6sMHBE0FcfL+IQli pMdQK0BGfUCWldjpx3H3cej6jHHKjZIG1MKuBYMNaNMZa2sm4cxIipocyUuAvV9c5oPg BPaw== X-Gm-Message-State: APjAAAV8uTbcFqRm6DL9XOK2u+aLa6G6FMboD5XkyQ9STLT2/9YOPcaC f+wCWfrWdw3cyWN1se2J3j7asMfe X-Received: by 2002:a17:902:542:: with SMTP id 60mr9722394plf.68.1561704703972; Thu, 27 Jun 2019 23:51:43 -0700 (PDT) Received: from google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id a21sm1303140pfi.27.2019.06.27.23.51.40 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 27 Jun 2019 23:51:42 -0700 (PDT) Date: Fri, 28 Jun 2019 15:51:38 +0900 From: Minchan Kim To: Johannes Weiner Cc: Kuo-Hsin Yang , Andrew Morton , Michal Hocko , Sonny Rao , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] mm: vmscan: fix not scanning anonymous pages when detecting file refaults Message-ID: <20190628065138.GA251482@google.com> References: <20190619080835.GA68312@google.com> <20190627184123.GA11181@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190627184123.GA11181@cmpxchg.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Johannes, On Thu, Jun 27, 2019 at 02:41:23PM -0400, Johannes Weiner wrote: > On Wed, Jun 19, 2019 at 04:08:35PM +0800, Kuo-Hsin Yang wrote: > > When file refaults are detected and there are many inactive file pages, > > the system never reclaim anonymous pages, the file pages are dropped > > aggressively when there are still a lot of cold anonymous pages and > > system thrashes. This issue impacts the performance of applications with > > large executable, e.g. chrome. > > > > When file refaults are detected. inactive_list_is_low() may return > > different values depends on the actual_reclaim parameter, the following > > 2 conditions could be satisfied at the same time. > > > > 1) inactive_list_is_low() returns false in get_scan_count() to trigger > > scanning file lists only. > > 2) inactive_list_is_low() returns true in shrink_list() to allow > > scanning active file list. > > > > In that case vmscan would only scan file lists, and as active file list > > is also scanned, inactive_list_is_low() may keep returning false in > > get_scan_count() until file cache is very low. > > > > Before commit 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in > > cache workingset transition"), inactive_list_is_low() never returns > > different value in get_scan_count() and shrink_list() in one > > shrink_node_memcg() run. The original design should be that when > > inactive_list_is_low() returns false for file lists, vmscan only scan > > inactive file list. As only inactive file list is scanned, > > inactive_list_is_low() would soon return true. > > > > This patch makes the return value of inactive_list_is_low() independent > > of actual_reclaim. > > > > The problem can be reproduced by the following test program. > > > > ---8<--- > > void fallocate_file(const char *filename, off_t size) > > { > > struct stat st; > > int fd; > > > > if (!stat(filename, &st) && st.st_size >= size) > > return; > > > > fd = open(filename, O_WRONLY | O_CREAT, 0600); > > if (fd < 0) { > > perror("create file"); > > exit(1); > > } > > if (posix_fallocate(fd, 0, size)) { > > perror("fallocate"); > > exit(1); > > } > > close(fd); > > } > > > > long *alloc_anon(long size) > > { > > long *start = malloc(size); > > memset(start, 1, size); > > return start; > > } > > > > long access_file(const char *filename, long size, long rounds) > > { > > int fd, i; > > volatile char *start1, *end1, *start2; > > const int page_size = getpagesize(); > > long sum = 0; > > > > fd = open(filename, O_RDONLY); > > if (fd == -1) { > > perror("open"); > > exit(1); > > } > > > > /* > > * Some applications, e.g. chrome, use a lot of executable file > > * pages, map some of the pages with PROT_EXEC flag to simulate > > * the behavior. > > */ > > start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED, > > fd, 0); > > if (start1 == MAP_FAILED) { > > perror("mmap"); > > exit(1); > > } > > end1 = start1 + size / 2; > > > > start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2); > > if (start2 == MAP_FAILED) { > > perror("mmap"); > > exit(1); > > } > > > > for (i = 0; i < rounds; ++i) { > > struct timeval before, after; > > volatile char *ptr1 = start1, *ptr2 = start2; > > gettimeofday(&before, NULL); > > for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size) > > sum += *ptr1 + *ptr2; > > gettimeofday(&after, NULL); > > printf("File access time, round %d: %f (sec)\n", i, > > (after.tv_sec - before.tv_sec) + > > (after.tv_usec - before.tv_usec) / 1000000.0); > > } > > return sum; > > } > > > > int main(int argc, char *argv[]) > > { > > const long MB = 1024 * 1024; > > long anon_mb, file_mb, file_rounds; > > const char filename[] = "large"; > > long *ret1; > > long ret2; > > > > if (argc != 4) { > > printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n"); > > exit(0); > > } > > anon_mb = atoi(argv[1]); > > file_mb = atoi(argv[2]); > > file_rounds = atoi(argv[3]); > > > > fallocate_file(filename, file_mb * MB); > > printf("Allocate %ld MB anonymous pages\n", anon_mb); > > ret1 = alloc_anon(anon_mb * MB); > > printf("Access %ld MB file pages\n", file_mb); > > ret2 = access_file(filename, file_mb * MB, file_rounds); > > printf("Print result to prevent optimization: %ld\n", > > *ret1 + ret2); > > return 0; > > } > > ---8<--- > > > > Running the test program on 2GB RAM VM with kernel 5.2.0-rc5, the > > program fills ram with 2048 MB memory, access a 200 MB file for 10 > > times. Without this patch, the file cache is dropped aggresively and > > every access to the file is from disk. > > > > $ ./thrash 2048 200 10 > > Allocate 2048 MB anonymous pages > > Access 200 MB file pages > > File access time, round 0: 2.489316 (sec) > > File access time, round 1: 2.581277 (sec) > > File access time, round 2: 2.487624 (sec) > > File access time, round 3: 2.449100 (sec) > > File access time, round 4: 2.420423 (sec) > > File access time, round 5: 2.343411 (sec) > > File access time, round 6: 2.454833 (sec) > > File access time, round 7: 2.483398 (sec) > > File access time, round 8: 2.572701 (sec) > > File access time, round 9: 2.493014 (sec) > > > > With this patch, these file pages can be cached. > > > > $ ./thrash 2048 200 10 > > Allocate 2048 MB anonymous pages > > Access 200 MB file pages > > File access time, round 0: 2.475189 (sec) > > File access time, round 1: 2.440777 (sec) > > File access time, round 2: 2.411671 (sec) > > File access time, round 3: 1.955267 (sec) > > File access time, round 4: 0.029924 (sec) > > File access time, round 5: 0.000808 (sec) > > File access time, round 6: 0.000771 (sec) > > File access time, round 7: 0.000746 (sec) > > File access time, round 8: 0.000738 (sec) > > File access time, round 9: 0.000747 (sec) > > > > Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition") > > Signed-off-by: Kuo-Hsin Yang > > Acked-by: Johannes Weiner > > Your change makes sense - we should indeed not force cache trimming > only while the page cache is experiencing refaults. > > I can't say I fully understand the changelog, though. The problem of I guess the point of the patch is "actual_reclaim" paramter made divergency to balance file vs. anon LRU in get_scan_count. Thus, it ends up scanning file LRU active/inactive list at file thrashing state. So, Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition") would make sense to me since it introduces the parameter. > forcing cache trimming while there is enough page cache is older than > the commit you refer to. It could be argued that this commit is > incomplete - it could have added refault detection not just to > inactive:active file balancing, but also the file:anon balancing; but > it didn't *cause* this problem. > > Shouldn't this be > > Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty") > Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty") That would affect, too but it would be trouble to have stable backport since we don't have refault machinery in there.