Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp28094806rwd; Tue, 4 Jul 2023 12:52:13 -0700 (PDT) X-Google-Smtp-Source: APBJJlExiE7PJ01ehE4ZMfLT+slSVQxCsuEGqM/XNNvZ00kK74x7WAieiYU3PTW5nAjqrIp28RTa X-Received: by 2002:a17:903:234e:b0:1b8:a894:407c with SMTP id c14-20020a170903234e00b001b8a894407cmr247022plh.22.1688500333371; Tue, 04 Jul 2023 12:52:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688500333; cv=none; d=google.com; s=arc-20160816; b=GTIY1H879XbN8oHsLIiqgyid588XVGAHVnuYQyr32GK724T5mNeaXQ/0IZuz3X7c2t Y8kLBqaJjpvsZOpb1loI2eIi+5zBFPlikpAQHsgcWbu+bzw4BBXktJJm22CWV10NQsvj CX+amT51oBBbWbMHkdiTSLbI7kl9HR9lx/CF0hhCCO+lAF5ave6AScXcWBtXnkJ/+E0K BwyVOLRwoT0kRlvPM0uZXmdKXFvps/7KkZ4l8L6SFNbv7i5jn+653lWbFJ18m4h+4d3n jSJOpEOFhpX/WdtFHAmGU+V0CdcmlvfNqrapWvd7BixcrncnmV9qc5q9jzTHnoZcEC4r XHEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=A4IKew0kN3UlZ/53V8vTAFpGpJXOcLX/p8Q1Xyl4NZo=; fh=XLJLg8WXdhwGrMSzs3MfkZczrcBsqkmvHzzfT+Z54/s=; b=cXvq0/SSR89rNR+kX3XaqUDV1TyexizDlALYIkJgDQTyTqgLZMa7nmoGqNFfJMsBRa 1a8MWeXdm6jUTkaxwwzRLPdI/Hp9YB1w0a1LCkCBmL9Sgn1RZB1sgbw7PpvW+xorNIWu w+VmXvHsso5T2kKtU7tB/9Y4AIOdNmE4565OlhwX3LMMXValDNZWYOwsKqFVJK0xh1Q4 eqgNpq1GUMWAejbI7RNX/7JUJSTcR5+rDNuyMr07Nc4iifjtpRYf6dVud1uVfu6chvw4 6Y7ZY51KzV1oH6enN0uhw8ywnCy3NC6vSwAFmVEwvE29Sqt3aNrnbN43q5JkESjR+WDX T8Zg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=mlsaLhBH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i14-20020a17090332ce00b001b8867a3e83si7822704plr.407.2023.07.04.12.52.00; Tue, 04 Jul 2023 12:52:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=mlsaLhBH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231364AbjGDTgG (ORCPT + 99 others); Tue, 4 Jul 2023 15:36:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229451AbjGDTgG (ORCPT ); Tue, 4 Jul 2023 15:36:06 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6693710C9 for ; Tue, 4 Jul 2023 12:36:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=A4IKew0kN3UlZ/53V8vTAFpGpJXOcLX/p8Q1Xyl4NZo=; b=mlsaLhBHoQBje1BQR2ufPx/gJu UzG8a+yaHAe6vPT4WeTAGvPuFNMNPyFZoYJ55MgU5Ug/vlosGKQZsNtOBdqn+YwjpGUr2MXiTAjlf Ti4Ia8ZHlWFTZIvUgXYK4SBqVM3XYYUgr+xlmMvtz7oMpkrreg+TjKMkgDPrlIqhUx8rqtWf8dzAB 3zZyVZi0oQZlETbkSTSMhilFRevsNZFfh+EcMhJPaTRerfqzrhT5bkXFpxHh8EKKn1YBuUXji9N2J 7dp4+XKpR1W7DL6ydA5sRIfTa5shVt1UtoEwa1DqOtLG2qbqb5rUQsPuZT78YWMtTbpZLyGMjpzUp 6wdAWIjw==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1qGlod-009PfR-Mx; Tue, 04 Jul 2023 19:35:59 +0000 Date: Tue, 4 Jul 2023 20:35:59 +0100 From: Matthew Wilcox To: Sidhartha Kumar Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org Subject: Re: [PATCH 1/4] mm/memory: convert do_page_mkwrite() to use folios Message-ID: References: <20230703055850.227169-1-sidhartha.kumar@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230703055850.227169-1-sidhartha.kumar@oracle.com> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jul 02, 2023 at 10:58:47PM -0700, Sidhartha Kumar wrote: > @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf) > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) > return ret; > if (unlikely(!(ret & VM_FAULT_LOCKED))) { > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > + folio_lock(folio); > + if (!folio_mapping(folio)) { > + folio_unlock(folio); I promised to explain this better once I had time, and I have time now. folio->mapping is used for a multitude of purposes, unfortunately. Maybe some future work will reduce that, but for now, These Are The Rules. If the folio is marked as being Slab, it's used for something else. The folio does not belong to an address space (nor can it be mapped, so we're not going to see it here, but sometimes we see it in other contexts where we call folio_mapping()). The bottom two bits are used as PAGE_MAPPING_FLAGS. If they're both 0, this folio belongs to a file, and the rest of folio->mapping is a pointer to a struct address_space. Or they're both 0 because the whole thing is NULL. More on that below. If the bottom two bits are 01b, this is an anonymous folio, and folio->mapping is actually a pointer to an anon_vma (which is not the same thing as an anon vma). If the bottom two bits are 10b, this is a Movable page (anon & file memory is also movable, but this is different). The folio->mapping points to a struct movable_operations. If the bottom two bits are 11b, this is a KSM allocation, and folio->mapping points to a struct ksm_stable_node. When we remove a folio from the page cache, we reset folio->mapping to NULL. We often remove folios from the page cache before their refcount drops to zero (the common case is to look up the folio in the page cache, which grabs a reference, remove the folio from the page cache which decrements the refcount, then put the folio which might be the last refcount). So it's entirely possible to see a folio in this function with a NULL mapping; that means it's been removed from the file through a truncate or siimlar, and we need to fail the mkwrite. Userspace is about to get a segfault. If you find all of that confusing, well, I agree, and I'm trying to simplify it. So, with all that background, what's going on here? Part of the "modern" protocol for handling page faults is to lock the folio in vm_ops->page_mkwrite. But we still support (... why?) drivers that haven't been updated. They return 0 on success instead of VM_FAULT_LOCKED. So we take the lock for them, then check that the folio wasn't truncated, and bail out if it looks like it was. If we have a really old-school driver that has allocated a page, mapped it to userspace, and set page->mapping to be, eg, Movable, by calling folio_mapping() instead of folio->mapping, we'll end up seeing NULL instead of a non-NULL value, mistakenly believe it to have been truncated and enter an endless loop. Am I being paranoid here? Maybe! Drivers should have been updated by now. The "modern" way was introduced in 2007 (commit d0217ac04ca6), so it'd be nice to turn this into a WARN_ON_ONCE so drivers fix their code. There are only ~30 implementations of page_mkwrite in the kernel, so it might not take too long to check everything's OK.