From: Greg Thelen <gthelen@google.com>
To: Hugh Dickins, Andrew Morton, Kirill Tkhai
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Greg Thelen,
    stable@vger.kernel.org
Subject: [PATCH] shmem, memcg: enable memcg aware shrinker
Date: Sun, 31 May 2020 20:22:04 -0700
Message-Id: <20200601032204.124624-1-gthelen@google.com>

Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
shrinkers during memcg shrink_slab()") a memcg aware shrinker is only
called when the per-memcg per-node shrinker_map indicates that the
shrinker may have objects to release to the memcg and node.

shmem_unused_huge_count and shmem_unused_huge_scan support the per-tmpfs
shrinker which advertises per memcg and numa awareness.  The shmem
shrinker releases memory by splitting hugepages that extend beyond
i_size.

Shmem does not currently set bits in shrinker_map.  So, starting with
b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
pressure.  This leads to undeserved memcg OOM kills.

Example that reliably sees memcg OOM kill in unpatched kernel:
  FS=/tmp/fs
  CONTAINER=/cgroup/memory/tmpfs_shrinker
  mkdir -p $FS
  mount -t tmpfs -o huge=always nodev $FS
  # Create 1000 MB container, which shouldn't suffer OOM.
  mkdir $CONTAINER
  echo 1000M > $CONTAINER/memory.limit_in_bytes
  echo $BASHPID >> $CONTAINER/cgroup.procs
  # Create 4000 files.  Ideally each file uses 4k data page + a little
  # metadata.  Assume 8k total per-file, 32MB (4000*8k) should easily
  # fit within container's 1000 MB.  But if data pages use 2MB
  # hugepages (due to aggressive huge=always) then files consume 8GB,
  # which hits memcg 1000 MB limit.
  for i in {1..4000}; do
    echo . > $FS/$i
  done

v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
aware") maintains the per-node per-memcg shrinker bitmap for THP
shrinker.  But there's no such logic in shmem.  Make shmem set the
per-memcg per-node shrinker bits when it modifies inodes to have
shrinkable pages.
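For reference, the gating introduced by b0dedc49a2da works roughly as
below.  This is a simplified sketch of the walk in mm/vmscan.c's
shrink_slab_memcg(), not the exact upstream code; variable names follow
the v4.19-era source:

	/*
	 * Sketch: memcg reclaim only visits shrinkers whose bit is set
	 * in the per-memcg per-node shrinker_map.
	 */
	map = rcu_dereference(memcg->nodeinfo[nid]->shrinker_map);
	for_each_set_bit(i, map->map, shrinker_nr_max) {
		struct shrinker *shrinker = idr_find(&shrinker_idr, i);

		/*
		 * A shrinker that never calls memcg_set_shrinker_bit(),
		 * as shmem before this patch, never reaches this point,
		 * so it is never asked to reclaim on behalf of the memcg.
		 */
		do_shrink_slab(&sc, shrinker, priority);
	}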
Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/shmem.c | 61 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..e11090f78cb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
+/*
+ * Expose inode and optional page to shrinker as having a possibly splittable
+ * hugepage that reaches beyond i_size.
+ */
+static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
+			       struct inode *inode, struct page *page)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	spin_lock(&sbinfo->shrinklist_lock);
+	/*
+	 * _careful to defend against unlocked access to ->shrink_list in
+	 * shmem_unused_huge_shrink()
+	 */
+	if (list_empty_careful(&info->shrinklist)) {
+		list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
+		sbinfo->shrinklist_len++;
+	}
+	spin_unlock(&sbinfo->shrinklist_lock);
+
+#ifdef CONFIG_MEMCG
+	if (page && PageTransHuge(page))
+		memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
+				       inode->i_sb->s_shrink.id);
+#endif
+}
+
 static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
@@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			 * to shrink under memory pressure.
 			 */
 			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-				spin_lock(&sbinfo->shrinklist_lock);
-				/*
-				 * _careful to defend against unlocked access to
-				 * ->shrink_list in shmem_unused_huge_shrink()
-				 */
-				if (list_empty_careful(&info->shrinklist)) {
-					list_add_tail(&info->shrinklist,
-							&sbinfo->shrinklist);
-					sbinfo->shrinklist_len++;
-				}
-				spin_unlock(&sbinfo->shrinklist_lock);
+				struct page *page;
+
+				page = find_get_page(inode->i_mapping,
+					(newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
+				shmem_shrinker_add(sbinfo, inode, page);
+				if (page)
+					put_page(page);
 			}
 		}
 	}
@@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	if (PageTransHuge(page) &&
 			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 			hindex + HPAGE_PMD_NR - 1) {
-		/*
-		 * Part of the huge page is beyond i_size: subject
-		 * to shrink under memory pressure.
-		 */
-		spin_lock(&sbinfo->shrinklist_lock);
-		/*
-		 * _careful to defend against unlocked access to
-		 * ->shrink_list in shmem_unused_huge_shrink()
-		 */
-		if (list_empty_careful(&info->shrinklist)) {
-			list_add_tail(&info->shrinklist,
-					&sbinfo->shrinklist);
-			sbinfo->shrinklist_len++;
-		}
-		spin_unlock(&sbinfo->shrinklist_lock);
+		shmem_shrinker_add(sbinfo, inode, page);
 	}
 
 	/*
-- 
2.27.0.rc0.183.gde8f92d652-goog