Subject: Re: Detecting page cache trashing state
To: Johannes Weiner
Cc: Taras Kondratiuk, Michal Hocko, linux-mm@kvack.org,
    xe-linux-external@cisco.com, linux-kernel@vger.kernel.org
References: <150543458765.3781.10192373650821598320@takondra-t460s>
    <20170915143619.2ifgex2jxck2xt5u@dhcp22.suse.cz>
    <150549651001.4512.15084374619358055097@takondra-t460s>
    <20170918163434.GA11236@cmpxchg.org>
    <20171025175424.GA14039@cmpxchg.org>
From: "Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco)"
Date: Fri, 27 Oct 2017 23:19:02 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <20171025175424.GA14039@cmpxchg.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-Auto-Response-Suppress: DR, OOF, AutoReply
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Johannes,

On 10/25/2017 08:54 PM, Johannes Weiner wrote:
> Hi Ruslan,
>
> sorry about the delayed response, I missed the new activity in this
> older thread.
>
> On Thu, Sep 28, 2017 at 06:49:07PM +0300, Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco) wrote:
>> Hi Johannes,
>>
>> Hopefully I was able to rebase the patch on top v4.9.26 (latest
>> supported version by us right now) and test a bit.
>> The overall idea definitely looks promising, although I have one
>> question on usage.
>> Will it be able to account the time which processes spend on handling
>> major page faults (including fs and iowait time) of refaulting page?
> That's the main thing it should measure! :)
>
> The lock_page() and wait_on_page_locked() calls are where iowaits
> happen on a cache miss. If those are refaults, they'll be counted.
>
>> As we have one big application which code space occupies big amount
>> of place in page cache, when the system under heavy memory usage will
>> reclaim some of it, the application will start constantly thrashing.
>> Since it code is placed on squashfs it spends whole CPU time
>> decompressing the pages and seem memdelay counters are not detecting
>> this situation.
>> Here are some counters to indicate this:
>>
>> 19:02:44  CPU  %user  %nice  %system  %iowait  %steal  %idle
>> 19:02:45  all   0.00   0.00   100.00     0.00    0.00   0.00
>>
>> 19:02:44  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
>> 19:02:45  15284.00       0.00   428.00    352.00  19990.00       0.00       0.00   15802.00    0.00
>>
>> And as nobody actively allocating memory anymore looks like memdelay
>> counters are not actively incremented:
>>
>> [:~]$ cat /proc/memdelay
>> 268035776
>> 6.13 5.43 3.58
>> 1.90 1.89 1.26
> How does it correlate with /proc/vmstat::workingset_activate during
> that time? It only counts thrashing time of refaults it can actively
> detect.

The workingset counters are growing quite actively too. Here are
some numbers per second:

workingset_refault      8201
workingset_activate      389
workingset_restore       187
workingset_nodereclaim   313

> Btw, how many CPUs does this system have? There is a bug in this
> version on how idle time is aggregated across multiple CPUs. The error
> compounds with the number of CPUs in the system.

The system has 2 CPU cores.

> I'm attaching 3 bugfixes that go on top of what you have. There might
> be some conflicts, but they should be minor variable naming issues.
>

I will test with your patches and get back to you.

Thanks,
Ruslan
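
For reference, per-second workingset numbers like the ones quoted in this mail can be obtained by reading /proc/vmstat twice and dividing the difference by the sampling interval. The following is only a minimal sketch under that assumption: the helper names (parse_vmstat, per_second, sample) are made up for illustration, it reports whichever workingset_* counters the running kernel happens to expose, and it is not necessarily how the figures in this thread were collected.

```python
import time

PREFIX = "workingset_"  # counters of interest; an assumption for this sketch


def parse_vmstat(text, prefix=PREFIX):
    """Return {name: value} for "name value" lines whose name starts with prefix."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith(prefix):
            counters[fields[0]] = int(fields[1])
    return counters


def per_second(before, after, interval):
    """Per-second growth of every counter present in both snapshots."""
    return {
        name: (after[name] - before[name]) / interval
        for name in after
        if name in before
    }


def sample(path="/proc/vmstat", interval=1.0):
    """Take two snapshots of the file at `path`, `interval` seconds apart."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return per_second(before, after, interval)
```

Calling sample() on a Linux system returns a dict such as {"workingset_refault": ..., "workingset_activate": ...} with rates for the last interval. Comparing those rates against the memdelay output over the same window is one way to check whether detected refaults and accounted delay time move together.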