Subject: Re: Detecting page cache trashing state
To: Taras Kondratiuk, linux-mm@kvack.org
Cc: xe-linux-external@cisco.com, Ruslan Ruslichenko, linux-kernel@vger.kernel.org
References: <150543458765.3781.10192373650821598320@takondra-t460s> <150549350270.4512.4357187826510021894@takondra-t460s>
From: Daniel Walker
Message-ID: <35118dbb-6a03-aa84-a005-aafa4b9929c7@cisco.com>
Date: Fri, 15 Sep 2017 10:31:27 -0700
In-Reply-To: <150549350270.4512.4357187826510021894@takondra-t460s>

On 09/15/2017 09:38 AM, Taras Kondratiuk wrote:
> Quoting Daniel Walker (2017-09-15 07:22:27)
>> On 09/14/2017 05:16 PM, Taras Kondratiuk wrote:
>>> Hi
>>>
>>> In our devices under low memory conditions we often get into a trashing
>>> state when system spends most of the time re-reading pages of .text
>>> sections from a file system (squashfs in our case). Working set doesn't
>>> fit into available page cache, so it is expected. The issue is that
>>> OOM killer doesn't get triggered because there is still memory for
>>> reclaiming. System may stuck in this state for a quite some time and
>>> usually dies because of watchdogs.
>>>
>>> We are trying to detect such trashing state early to take some
>>> preventive actions. It should be a pretty common issue, but for now we
>>> haven't find any existing VM/IO statistics that can reliably detect such
>>> state.
>>>
>>> Most of metrics provide absolute values: number/rate of page faults,
>>> rate of IO operations, number of stolen pages, etc. For a specific
>>> device configuration we can determine threshold values for those
>>> parameters that will detect trashing state, but it is not feasible for
>>> hundreds of device configurations.
>>>
>>> We are looking for some relative metric like "percent of CPU time spent
>>> handling major page faults". With such relative metric we could use a
>>> common threshold across all devices. For now we have added such metric
>>> to /proc/stat in our kernel, but we would like to find some mechanism
>>> available in upstream kernel.
>>>
>>> Has somebody faced similar issue? How are you solving it?
>>
>> Did you make any attempt to tune swappiness ?
>>
>> Documentation/sysctl/vm.txt
>>
>> swappiness
>>
>> This control is used to define how aggressive the kernel will swap
>> memory pages. Higher values will increase agressiveness, lower values
>> decrease the amount of swap.
>>
>> The default value is 60.
>> =======================================================
>>
>> Since your using squashfs I would guess that's going to act like swap.
>> The default tune of 60 is most likely for x86 servers which may not be a
>> good value for some other device.
> Swap is disabled in our systems, so anonymous pages can't be evicted.
> As per my understanding swappiness tune is irrelevant.
>
> Even with enabled swap swappiness tune can't help much in this case. If
> working set doesn't fit into available page cache we will hit the same
> trashing state.

I think it's our lack of understanding of how the VM works.
If the system has no swap, then the system shouldn't start evicting
pages unless you have 100% memory utilization; at that point the only
place for those pages to go is back into the backing store, squashfs in
this case. What you're suggesting is that there is still free memory,
which means something must be evicting pages more aggressively than
waiting until 100% utilization.

Maybe someone more knowledgeable about the VM subsystem can clear this
up.

Daniel
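
For anyone following the thread who wants to look at the absolute
counters Taras mentions, here is a minimal sketch in Python that samples
the pgmajfault counter from /proc/vmstat and turns it into a rate. It is
only an illustration: the function names are made up for this example,
and it reports major faults per second, not the relative "percent of CPU
time" metric asked for above, so a threshold on it would still be
device-specific.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def major_fault_rate(before, after, seconds):
    """Major page faults per second between two counter snapshots."""
    return (after["pgmajfault"] - before["pgmajfault"]) / seconds


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def sample_major_fault_rate(interval=1.0):
    """Hypothetical helper: take two snapshots `interval` seconds apart."""
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    return major_fault_rate(first, second, interval)
```

Calling sample_major_fault_rate(1.0) in a loop on a device under memory
pressure should show the rate climbing as .text pages get re-read from
squashfs, which at least helps confirm the state described above.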