Date: Tue, 17 Aug 2021 01:17:41 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Hugh Dickins, Shakeel Butt, "Kirill A. Shutemov", Yang Shi,
    Miaohe Lin, Mike Kravetz, Michal Hocko, Rik van Riel, Matthew Wilcox,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 6/9] huge tmpfs: SGP_NOALLOC to stop collapse_file() on race
Message-ID: <1355343b-acf-4653-ef79-6aee40214ac5@google.com>

khugepaged's collapse_file() currently uses SGP_NOHUGE to tell
shmem_getpage() not to try allocating a huge page, in the very unlikely
event that a racing hole-punch removes the swapped or fallocated page as
soon as i_pages lock is dropped.

We want to consolidate shmem's huge decisions, removing SGP_HUGE and
SGP_NOHUGE; but cannot quite persuade ourselves that it's okay to regress
the protection in this case - Yang Shi points out that the huge page
would remain indefinitely, charged to root instead of the intended memcg.

collapse_file() should not even allocate a small page in this case: why
proceed if someone is punching a hole?  SGP_READ is almost the right flag
here, except that it optimizes away from a fallocated page, with NULL to
tell caller to fill with zeroes (like a hole); whereas collapse_file()'s
sequence relies on using a cache page.  Add SGP_NOALLOC just for this.

There are too many consecutive "if (page"s there in shmem_getpage_gfp():
group it better; and fix the outdated "bring it back from swap" comment.

Signed-off-by: Hugh Dickins
---
 include/linux/shmem_fs.h |  1 +
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 29 +++++++++++++++++------------
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9b7f7ac52351..7d97b15a2f7a 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -94,6 +94,7 @@ extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
+	SGP_NOALLOC,	/* similar, but fail on hole or use fallocated page */
 	SGP_CACHE,	/* don't exceed i_size, may allocate page */
 	SGP_NOHUGE,	/* like SGP_CACHE, but no huge pages */
 	SGP_HUGE,	/* like SGP_CACHE, huge pages preferred */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0412be08fa2..045cc579f724 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1721,7 +1721,7 @@ static void collapse_file(struct mm_struct *mm,
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
 				if (shmem_getpage(mapping->host, index, &page,
-						  SGP_NOHUGE)) {
+						  SGP_NOALLOC)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
diff --git a/mm/shmem.c b/mm/shmem.c
index 740d48ef1eb5..226ac3a911e9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1871,26 +1871,31 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		return error;
 	}
 
-	if (page)
+	if (page) {
 		hindex = page->index;
-	if (page && sgp == SGP_WRITE)
-		mark_page_accessed(page);
-
-	/* fallocated page? */
-	if (page && !PageUptodate(page)) {
+		if (sgp == SGP_WRITE)
+			mark_page_accessed(page);
+		if (PageUptodate(page))
+			goto out;
+		/* fallocated page */
 		if (sgp != SGP_READ)
 			goto clear;
 		unlock_page(page);
 		put_page(page);
-		page = NULL;
-		hindex = index;
 	}
-	if (page || sgp == SGP_READ)
-		goto out;
 
 	/*
-	 * Fast cache lookup did not find it:
-	 * bring it back from swap or allocate.
+	 * SGP_READ: succeed on hole, with NULL page, letting caller zero.
+	 * SGP_NOALLOC: fail on hole, with NULL page, letting caller fail.
+	 */
+	*pagep = NULL;
+	if (sgp == SGP_READ)
+		return 0;
+	if (sgp == SGP_NOALLOC)
+		return -ENOENT;
+
+	/*
+	 * Fast cache lookup and swap lookup did not find it: allocate.
 	 */
 
 	if (vma && userfaultfd_missing(vma)) {
-- 
2.26.2
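
For readers following the regrouped "if (page" logic, here is a small user-space sketch of the decision the mm/shmem.c hunk makes after the cache and swap lookups. It is only a model under stated assumptions: sgp_decide(), enum sgp_outcome and its values are hypothetical names invented for illustration, the enum is trimmed to the flags relevant here, and swap-in, locking and refcounting are all left out. It is not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical user-space model; only the flag names come from the patch. */
enum sgp_type { SGP_READ, SGP_NOALLOC, SGP_CACHE, SGP_WRITE };

/* Illustrative outcomes standing in for the kernel's gotos and returns. */
enum sgp_outcome {
	SGP_USE_PAGE,	/* uptodate page found: "goto out" */
	SGP_CLEAR_PAGE,	/* fallocated page is zeroed and used: "goto clear" */
	SGP_NULL_OK,	/* return 0 with NULL page, caller fills with zeroes */
	SGP_FAIL_NOENT,	/* return -ENOENT with NULL page */
	SGP_ALLOCATE,	/* fall through to page allocation */
};

/*
 * found:    a page was found in the page cache (or brought back from swap)
 * uptodate: that page holds valid data (false means a fallocated page)
 */
static enum sgp_outcome sgp_decide(bool found, bool uptodate,
				   enum sgp_type sgp)
{
	if (found) {
		if (uptodate)
			return SGP_USE_PAGE;
		/* fallocated page */
		if (sgp != SGP_READ)
			return SGP_CLEAR_PAGE;
		/* SGP_READ drops the fallocated page and treats it as a hole */
	}
	if (sgp == SGP_READ)
		return SGP_NULL_OK;
	if (sgp == SGP_NOALLOC)
		return SGP_FAIL_NOENT;
	return SGP_ALLOCATE;
}
```

In this model, sgp_decide(false, false, SGP_NOALLOC) corresponds to the racing hole-punch case that collapse_file() now fails on, while sgp_decide(true, false, SGP_NOALLOC) still takes the fallocated page from the cache, which is where SGP_NOALLOC differs from SGP_READ.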