Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp8014701ybi; Tue, 23 Jul 2019 01:17:36 -0700 (PDT) X-Google-Smtp-Source: APXvYqzlLf/K52j/3J5oG2jdDe0RnpYPaq58bdBMRGhmfL4UIzC605jHjNRK2ZNJ2e+MYORw4iaY X-Received: by 2002:a63:3805:: with SMTP id f5mr42439245pga.272.1563869855837; Tue, 23 Jul 2019 01:17:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563869855; cv=none; d=google.com; s=arc-20160816; b=tdg6svLrJKmbxwrOFx70vE78WdAdyaSvtkHw/5KqgnTmOLNXbc1nOy/nE6qtKjxqax Q2GFtwIPMeUL6gkb6rsAZPj68WETMWm3Fn+VexHLX82GS34z0kdWsR66X3FITP7rWMoh vQkb4K7hx16s6/JYp26IPzavGra/B1Qj71wFO28WRRSvXyaDypOsAGHWF24JK/8AgFsK QYhooaKeSRDNipwoaQux/qsI6OhL0y+BHNCXut+dLbxNYn/fPKQdJzpeaT+zWwRIfLg9 XmhYEn9nKBNM+R/CPhCWMnZbpb0TzIcaEyriudlphWfBXvLuXmFqaRilxL7iKIIozKnh ltSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=85QgoVCMoUBXEV57x4G0PDzl0QhwTAJz3PR0yob8+QQ=; b=j9Hto0XSULeW21Oy58Dcp7Npvx89s45owOG8bim5+Q0Eel34Y7wvGLqSpJrMfgSLG+ hg0mpmsrFScMxHcfhbia+p3Sva6p7ne0Jx+iXROnRB8obzci0k/GJQX5FwIOrkthH+Dq E+TgXdyVx9P3Lcc6LLcAstz5N/Y71bs8Ll6UxOtkbYwxyKwMivvfmJkXuAu5z1DRCfmv 6Sm/yGjcl62q39AF/I5JdTDmCF0hNYJXcVu4HEjYap7E7m8lIz7xc+c404D49Jj1Jwdh BhxvL3QAk+eldFRbGN2elUvr2V+yIjCAe70FMfzAbyfLsTbjcljPV7nsvSUA8w4yYQoZ MM7w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=U09JlmRF; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z185si10584688pfz.248.2019.07.23.01.17.17; Tue, 23 Jul 2019 01:17:33 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=U09JlmRF; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731046AbfGVXfa (ORCPT + 99 others); Mon, 22 Jul 2019 19:35:30 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:39958 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731019AbfGVXfa (ORCPT ); Mon, 22 Jul 2019 19:35:30 -0400 Received: by mail-pg1-f193.google.com with SMTP id w10so18405042pgj.7 for ; Mon, 22 Jul 2019 16:35:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=85QgoVCMoUBXEV57x4G0PDzl0QhwTAJz3PR0yob8+QQ=; b=U09JlmRF+1e0SXt1nVtPE+wsgbeVT+ODWkwh5EDZK/HI2W80i+GnSjYreZoZXSI25X H/SgYIii0FLjAEyz5fr1WEYrj2sBL7nya8427TlnXTltUbOXOAlyW3PNZsaBZvxVlt9p RgLWM5tK8Sx6N9INgAByIFLu9Ox5yaiWxUh//d2f8VulJfzD/2OGyTqkfx1tsYqpi2aZ 9pTGAYQZrPIxM09wjYq9BG2H+3Xk4U/1Xw1EXLWHs6+EyDzpF9FgQvWLD1cdX6f1bApk +PT94ubpctMbcTGgGWiIX4I+Bn5Ly+Usy4Fm5gUH2puNJ6ZQxnG4w0K/QY7JYtZ7JJqP eD4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=85QgoVCMoUBXEV57x4G0PDzl0QhwTAJz3PR0yob8+QQ=; b=FYfhI5gI+RmVOPY2/zIynYZMYg9EyY9CAVNu5aod1/EjB1lbUO5GG7xtpJ4DvamHsG HGHDY8hk4caWUxpvr/fxVRtDt02nq2Iq7RkdKc1jg9URNCPEJGwlqplwNMBfCVE+SUgx mlLkEJOZ45kI5Ey8JNohNZqE6Z9KOCoHQGdZM6nWMBaR4LIaWlHhTBrN3OMPD2rrPv8/ akl2bFICO4BST9ctb5N4I0CVFUm2IT6JGhZTdF831cQvnrWXAL79lhSnxYDE4CsCkHHF pQLnr3QrMgjkiAww/TXCgwg5ksbg/3W4qWoWH3r42A3QxMrbvcVSv48Qi0U08xwK++d2 y8kA== X-Gm-Message-State: APjAAAX/rU9ivF5DflmNXH/vmaz9qdrkt74Ckvz61MzZG9syUng5TauL mF07Ni9FPLMGQrLTvC7AxS8= X-Received: by 2002:a63:188:: with SMTP id 130mr72665111pgb.231.1563838529829; Mon, 22 Jul 2019 16:35:29 -0700 (PDT) Received: from localhost ([2620:10d:c091:500::d1c7]) by smtp.gmail.com with ESMTPSA id l6sm40554336pga.72.2019.07.22.16.35.28 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 22 Jul 2019 16:35:29 -0700 (PDT) Date: Mon, 22 Jul 2019 19:35:27 -0400 From: Johannes Weiner To: Andrew Morton Cc: linux-mm@kvack.org, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] psi: annotate refault stalls from IO submission Message-ID: <20190722233527.GA21594@cmpxchg.org> References: <20190722201337.19180-1-hannes@cmpxchg.org> <20190722152607.dd175a9d517a5f6af06a8bdc@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190722152607.dd175a9d517a5f6af06a8bdc@linux-foundation.org> User-Agent: Mutt/1.12.0 (2019-05-25) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Mon, Jul 22, 2019 at 03:26:07PM -0700, Andrew Morton wrote: > On Mon, 22 Jul 2019 16:13:37 -0400 Johannes Weiner wrote: > > > psi tracks the time tasks wait for refaulting pages to become > > uptodate, but it does not track the time spent submitting the IO. The > > submission part can be significant if backing storage is contended or > > when cgroup throttling (io.latency) is in effect - a lot of time is > > spent in submit_bio(). In that case, we underreport memory pressure. > > It's a somewhat broad patch. How significant is this problem in the > real world? Can we be confident that the end-user benefit is worth the > code changes? The error scales with how aggressively IO is throttled compared to the device's capability. For example, we have system maintenance software throttled down pretty hard on IO compared to the workload. When memory is contended, the system software starts thrashing cache, but since the backing device is actually pretty fast, the majority of "io time" is from injected throttling delays during submit_bio(). As a result we barely see memory pressure, when the reality is that there is almost no progress due to the thrashing and we should be killing misbehaving stuff. > > Annotate the submit_bio() paths (or the indirection through readpage) > > for refaults and swapin to get proper psi coverage of delays there. > > > > Signed-off-by: Johannes Weiner > > --- > > fs/btrfs/extent_io.c | 14 ++++++++++++-- > > fs/ext4/readpage.c | 9 +++++++++ > > fs/f2fs/data.c | 8 ++++++++ > > fs/mpage.c | 9 +++++++++ > > mm/filemap.c | 20 ++++++++++++++++++++ > > mm/page_io.c | 11 ++++++++--- > > mm/readahead.c | 24 +++++++++++++++++++++++- > > We touch three filesystems. Why these three? Are all other > filesystems OK or will they need work as well? These are the ones that I found open-coding add_to_page_cache_lru() followed by submit_bio() instead of going through generic code like mpage, use read_cache_pages(), implement ->readpage only. > > @@ -2753,11 +2763,14 @@ static struct page *do_read_cache_page(struct address_space *mapping, > > void *data, > > gfp_t gfp) > > { > > + bool refault = false; > > struct page *page; > > int err; > > repeat: > > page = find_get_page(mapping, index); > > if (!page) { > > + unsigned long pflags; > > + > > That was a bit odd. This? > > --- a/mm/filemap.c~psi-annotate-refault-stalls-from-io-submission-fix > +++ a/mm/filemap.c > @@ -2815,12 +2815,12 @@ static struct page *do_read_cache_page(s > void *data, > gfp_t gfp) > { > - bool refault = false; > struct page *page; > int err; > repeat: > page = find_get_page(mapping, index); > if (!page) { > + bool refault = false; > unsigned long pflags; > > page = __page_cache_alloc(gfp); > _ > It's so that when we jump to 'filler:' from outside the branch, the 'refault' variable is initialized from the first time through: bool refault = false; struct page *page; page = find_get_page(mapping, index); if (!page) { __page_cache_alloc() add_to_page_cache_lru() refault = PageWorkingset(page); filler: if (refault) psi_memstall_enter(&pflags); readpage() if (refault) psi_memstall_leave(&pflags); } lock_page() if (PageUptodate()) goto out; goto filler;