Date: Sun, 2 Aug 2020 17:35:23 -0700 (PDT)
From: Hugh Dickins
To: "Kirill A. Shutemov"
cc: Hugh Dickins, Andrew Morton, "Kirill A. Shutemov", Andrea Arcangeli, Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
In-Reply-To: <20200802214408.patvlf3sghro3nhi@box>
References: <20200802214408.patvlf3sghro3nhi@box>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 3 Aug 2020, Kirill A.
Shutemov wrote:
> On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > Only once have I seen this scenario (and forgot even to notice what
> > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > pmd:00000000, pte values corresponding to data in physical page 0.
> >
> > The pte mappings being zapped in this case were supposed to be from a
> > huge page of ext4 text (but could as well have been shmem): my belief
> > is that it was racing with collapse_file()'s retract_page_tables(),
> > found *pmd pointing to a page table, locked it, but *pmd had become
> > 0 by the time start_pte was decided.
> >
> > In most cases, that possibility is excluded by holding mmap lock;
> > but exit_mmap() proceeds without mmap lock. Most of what's run by
> > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > do so, for example. But retract_page_tables() did not: fix that
> > (using an mm variable instead of vma->vm_mm repeatedly).
>
> Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> i_mmap lock, no? Unlinking a VMA requires it.

Ah, my wording is misleading, yes. That comment
"(using an mm variable instead of vma->vm_mm repeatedly)"
was nothing more than a note, that the patch is bigger than it could
be, because I decided to use an mm variable, instead of vma->vm_mm
repeatedly. But it looks as if I'm saying there used to be a need
for READ_ONCE() or something, and by using the mm variable I was
fixing the problem.

No, sorry: delete that line now the point is made: the mm variable is
just a patch detail, it's not important.

The fix (as the subject suggested) is for retract_page_tables() to check
khugepaged_test_exit(), after acquiring mmap lock, before doing anything
to the page table.
Getting the mmap lock serializes with __mmput(),
which briefly takes and drops it in __khugepaged_exit(); then the
khugepaged_test_exit() check on mm_users makes sure we don't touch the
page table once exit_mmap() might reach it, since exit_mmap() will be
proceeding without mmap lock, not expecting anyone to be racing with it.

(I devised that protocol for ksmd, then Andrea adopted it for khugepaged:
back then it was important for these daemons to have a hold on the mm,
without an actual reference to mm_users, because that would prevent
the OOM killer from reaching exit_mmap(). Nowadays with the OOM reaper,
it's probably less crucial to avoid mm_users, but I think still worthwhile.)

Thanks a lot for looking at these patches so quickly,
Hugh
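
For readers who want to see the shape of the protocol described above, here is a minimal userspace sketch in C. It is not kernel code and not the patch itself. Every name in it (struct model_mm, model_test_exit(), model_retract(), model_exit()) is hypothetical, and a plain pthread mutex stands in for the mmap lock as a simplifying assumption. It only illustrates the ordering: the exit side drops the last user, briefly takes and drops the lock, then tears down unlocked; the daemon side takes the lock and tests for exit before touching anything.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the few mm fields that matter here. */
struct model_mm {
    atomic_int mm_users;       /* models mm->mm_users */
    pthread_mutex_t mmap_lock; /* models the mmap lock (assumption: a mutex) */
    bool tables_present;       /* false once the exit side has torn down */
    int retracted;             /* times the daemon side touched the tables */
};

static void model_mm_init(struct model_mm *mm)
{
    atomic_init(&mm->mm_users, 1);
    pthread_mutex_init(&mm->mmap_lock, NULL);
    mm->tables_present = true;
    mm->retracted = 0;
}

/* Models the khugepaged_test_exit() idea: true once the last user is gone. */
static bool model_test_exit(struct model_mm *mm)
{
    return atomic_load(&mm->mm_users) == 0;
}

/*
 * Models the daemon side: take the lock, then test for exit before
 * touching the tables. Returns true only if the tables were touched.
 */
static bool model_retract(struct model_mm *mm)
{
    bool touched = false;

    pthread_mutex_lock(&mm->mmap_lock);
    if (!model_test_exit(mm)) {
        mm->retracted++;
        touched = true;
    }
    pthread_mutex_unlock(&mm->mmap_lock);
    return touched;
}

/*
 * Models the exit side: drop a user; if it was the last one, briefly
 * take and drop the lock so any daemon already inside has finished,
 * then tear down without holding the lock.
 */
static void model_exit(struct model_mm *mm)
{
    if (atomic_fetch_sub(&mm->mm_users, 1) != 1)
        return; /* other users remain */

    pthread_mutex_lock(&mm->mmap_lock);
    pthread_mutex_unlock(&mm->mmap_lock);

    mm->tables_present = false; /* unlocked teardown */
}
```

In this model, a model_retract() call that gets the lock before the exit side's lock/unlock pair completes before teardown starts, and any call that gets the lock afterwards sees mm_users at zero and backs off.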