Date: Mon, 27 Nov 2017 11:18:35 +0900
From: Minchan Kim <minchan@kernel.org>
To: "Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco)"
Cc: Johannes Weiner, Taras Kondratiuk, Michal Hocko, linux-mm@kvack.org,
	xe-linux-external@cisco.com, linux-kernel@vger.kernel.org
Subject: Re: Detecting page cache trashing state
Message-ID: <20171127021835.GA27255@bbox>
References: <150543458765.3781.10192373650821598320@takondra-t460s>
 <20170915143619.2ifgex2jxck2xt5u@dhcp22.suse.cz>
 <150549651001.4512.15084374619358055097@takondra-t460s>
 <20170918163434.GA11236@cmpxchg.org>
 <20171025175424.GA14039@cmpxchg.org>

Hello,

On Mon, Nov 20, 2017 at 09:40:56PM +0200, Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco) wrote:
> Hi Johannes,
>
> I tested with your patches but the situation is still mostly the same.
>
> I spent some time debugging and found that the problem is squashfs-specific
> (probably affects some other filesystems too).
> The point is that the I/O wait for squashfs reads happens inside the
> squashfs readpage() callback.
> Here is a trace of the page fault handling to illustrate this:
>
>  1)               |  handle_mm_fault() {
>  1)               |    filemap_fault() {
>  1)               |      __do_page_cache_readahead()
>  1)               |        add_to_page_cache_lru()
>  1)               |        squashfs_readpage() {
>  1)               |          squashfs_readpage_block() {
>  1)               |            squashfs_get_datablock() {
>  1)               |              squashfs_cache_get() {
>  1)               |                squashfs_read_data() {
>  1)               |                  ll_rw_block() {
>  1)               |                    submit_bh_wbc.isra.42()
>  1)               |                  __wait_on_buffer() {
>  1)               |                    io_schedule() {
>  ------------------------------------------
>  0)  kworker-79   =>  <idle>-0
>  ------------------------------------------
>  0)   0.382 us    |  blk_complete_request();
>  0)               |  blk_done_softirq() {
>  0)               |    blk_update_request() {
>  0)               |      end_buffer_read_sync()
>  0) + 38.559 us   |    }
>  0) + 48.367 us   |  }
>  ------------------------------------------
>  0)  kworker-79   =>  memhog-781
>  ------------------------------------------
>  0) ! 278.848 us  |                    }
>  0) ! 279.612 us  |                  }
>  0)               |                  squashfs_decompress() {
>  0) # 4919.082 us |                    squashfs_xz_uncompress();
>  0) # 4919.864 us |                  }
>  0) # 5479.212 us |                } /* squashfs_read_data */
>  0) # 5479.749 us |              } /* squashfs_cache_get */
>  0) # 5480.177 us |            } /* squashfs_get_datablock */
>  0)               |            squashfs_copy_cache() {
>  0)   0.057 us    |              unlock_page();
>  0) ! 142.773 us  |            }
>  0) # 5624.113 us |          } /* squashfs_readpage_block */
>  0) # 5628.814 us |        } /* squashfs_readpage */
>  0) # 5665.097 us |      } /* __do_page_cache_readahead */
>  0) # 5667.437 us |    } /* filemap_fault */
>  0) # 5672.880 us |  } /* handle_mm_fault */
>
> As you can see, squashfs_read_data() schedules the I/O via ll_rw_block()
> and then waits for it to finish inside wait_on_buffer().
> After that the read buffer is decompressed and the page is unlocked inside
> the squashfs_readpage() handler.
>
> Thus, by the time filemap_fault() calls lock_page_or_retry(), the page is
> already uptodate and unlocked, wait_on_page_bit() is never called, and the
> time spent on the read/decompression is not accounted.

A weakness of the current approach is that it relies on the page lock.
That means it cannot work with synchronous devices like DAX, zram and so
on, I think.

Johannes, can we add memdelay_enter() to every fault handler's prologue and
then check in its epilogue whether the faulted page is a workingset page?
If it was, we can accumulate the time spent there. That would work with
synchronous devices, especially zram, without hacking individual
filesystems like squashfs.
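For illustration, a minimal sketch of that prologue/epilogue idea could look
roughly like the following. This is not actual kernel code: the wrapper and
service_fault() are stand-ins, memdelay_enter()/memdelay_leave() are used
with the same calling convention as in the readahead workaround quoted
further below, and PageWorkingset() is the flag that workaround also tests.

/*
 * Sketch only: account a fault as a memory stall when the page we
 * faulted on turns out to be recently evicted workingset, without
 * depending on wait_on_page_bit()/the page lock.
 */
static int handle_fault_accounted(struct vm_fault *vmf)
{
	unsigned long mdflags;
	int ret;

	/* Prologue: start the clock on every fault. */
	memdelay_enter(&mdflags);

	ret = service_fault(vmf);	/* whatever actually services the fault */

	/*
	 * Epilogue: fold the elapsed time into the memdelay state only if
	 * the faulted page is workingset, i.e. a thrashing refault rather
	 * than a cold first-touch fault.  Whether an unaccounted fault can
	 * simply skip memdelay_leave(), or needs a "discard" variant, is
	 * one of the details this sketch glosses over.
	 */
	if (vmf->page && PageWorkingset(vmf->page))
		memdelay_leave(&mdflags);

	return ret;
}

This would catch the squashfs case above as well, since the readpage-side
wait and decompression all happen between the prologue and the epilogue,
regardless of which context ends up unlocking the page.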
I think the page fault handler, kswapd and direct reclaim would cover most
cases of *real* memory pressure, while the [un]lock_page friends would also
account superfluous cases; filesystems, for example, can call them easily
without any memory pressure.

> Tried to apply quick workaround for test:
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index c4ca702..5e2be2b 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -126,9 +126,21 @@ static int read_pages(struct address_space *mapping, struct file *filp,
>
>  	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
>  		struct page *page = lru_to_page(pages);
> +		bool refault = false;
> +		unsigned long mdflags;
> +
>  		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, mapping, page->index, gfp))
> +		if (!add_to_page_cache_lru(page, mapping, page->index, gfp)) {
> +			if (!PageUptodate(page) && PageWorkingset(page)) {
> +				memdelay_enter(&mdflags);
> +				refault = true;
> +			}
> +
>  			mapping->a_ops->readpage(filp, page);
> +
> +			if (refault)
> +				memdelay_leave(&mdflags);
> +		}
>  		put_page(page);