Date: Sat, 10 Oct 2020 16:09:01 +0100
From: Matthew Wilcox
To: Hugh Dickins
Cc: Andrew Morton, Linus Torvalds, Song Liu, "Kirill A. Shutemov",
    Yang Shi, Denis Lisov, Qian Cai, Suren Baghdasaryan, David Rientjes,
    Minchan Kim, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Message-ID: <20201010150901.GX20115@casper.infradead.org>

On Fri, Oct 09, 2020 at 08:07:59PM -0700, Hugh Dickins wrote:
> There have been elusive reports of filemap_fault() hitting its
> VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
> with CONFIG_READ_ONLY_THP_FOR_FS=y.
>
> Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
> CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
> without NUMA reuses the same huge page after collapse_file() failed
> (whereas NUMA targets its allocation to the respective node each time).
> And most of us were usually testing with CONFIG_NUMA=y kernels.

Good catch. There have been at least three bugs in recent times which
can cause this VM_BUG_ON_PAGE() to trigger: this one; one where swapping
out a THP led to all 512 entries pointing to the same non-huge page on
swapin (fixed in -mm); and one that I introduced for a few weeks in -mm,
where failing to split a THP would lead to random tree corruption due to
a non-zeroed node being freed to the slab cache. There may yet be a
fourth. I've seen the VM_BUG_ON_PAGE() trigger occasionally in recent
testing, so I'll add this patch and see if it disappears.

> Instead, non-NUMA khugepaged_prealloc_page() release the old page
> if anyone else has a reference to it (1% of cases when I tested).

I think this is a good way to fix the problem. We could also change
khugepaged to insert a frozen page, ensuring that find_get_entry() would
spin until the collapse has succeeded or the page was removed from the
cache again.
But I have no problem with this approach.

I want to note that this is a silent data corruption for reads.
generic_file_buffered_read() has a reference to the page, so this patch
will fix it, but before this patch it could have been copying the wrong
data to userspace.

Reviewed-by: Matthew Wilcox (Oracle)
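
For readers following the thread, the reuse race and the "release the old page if anyone else has a reference to it" policy can be sketched as a small userspace toy model. This is not kernel code and not the patch itself: struct toy_page, toy_put(), toy_prealloc(), toy_offset_matches() and toy_scenario() are hypothetical names invented for illustration, and the refcounting is deliberately simplified.

```c
/* Userspace toy model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_page {
	int refcount;        /* references held, including khugepaged's own */
	unsigned long index; /* file offset the page currently claims to cover */
};

/* Drop one reference and free the page when the last one goes away. */
void toy_put(struct toy_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		free(page);
}

/* Models the check that fired: does the page really cover this offset? */
bool toy_offset_matches(const struct toy_page *page, unsigned long offset)
{
	return page->index == offset;
}

/*
 * Models the non-NUMA preallocation policy described in the thread:
 * reuse the cached huge page only if nobody else holds a reference,
 * otherwise drop our reference and allocate a fresh page.
 */
struct toy_page *toy_prealloc(struct toy_page **cached)
{
	if (*cached && (*cached)->refcount > 1) {
		toy_put(*cached); /* the other holder frees it later */
		*cached = NULL;
	}
	if (!*cached) {
		*cached = calloc(1, sizeof(**cached));
		if (*cached)
			(*cached)->refcount = 1;
	}
	return *cached;
}

/*
 * One failed collapse at offset 0 followed by a second attempt at
 * offset 512, with a racing reader still holding the first page.
 * Returns true if the reader's page still matches offset 0.
 */
bool toy_scenario(bool release_if_shared)
{
	struct toy_page *cached = NULL;
	struct toy_page *hpage = toy_prealloc(&cached);
	struct toy_page *reader, *next;
	bool ok;

	assert(hpage);
	hpage->index = 0;   /* briefly visible in the page cache at offset 0 */
	reader = hpage;
	reader->refcount++; /* racing lookup takes a reference */

	/* The collapse fails; khugepaged moves on to offset 512. */
	next = release_if_shared ? toy_prealloc(&cached) : cached;
	assert(next);
	next->index = 512;

	ok = toy_offset_matches(reader, 0);

	toy_put(reader);
	toy_put(next);
	return ok;
}
```

Calling toy_scenario(false) models the old behaviour, where the reader ends up holding a page that now claims a different offset; toy_scenario(true) models the release-if-shared policy, where the reader's page is left alone and a fresh page is used for the next attempt.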