Date: Mon, 28 Dec 2020 20:35:06 -0800 (PST)
From: Hugh Dickins
To: "Kirill A. Shutemov"
Cc: Linus Torvalds, Hugh Dickins, Matthew Wilcox, "Kirill A. Shutemov",
    Will Deacon, Linux Kernel Mailing List, Linux-MM, Linux ARM,
    Catalin Marinas, Jan Kara, Minchan Kim, Andrew Morton,
    Vinayak Menon, Android Kernel Team
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
In-Reply-To: <20201228221237.6nu75kgxq7ikxn2a@box>
References: <20201226224016.dxjmordcfj75xgte@box>
 <20201227234853.5mjyxcybucts3kbq@box>
 <20201228125352.phnj2x2ci3kwfld5@box>
 <20201228220548.57hl32mmrvvefj6q@box>
 <20201228221237.6nu75kgxq7ikxn2a@box>

Got it at last, sorry it's taken so long.

On Tue, 29 Dec 2020, Kirill A. Shutemov wrote:
> On Tue, Dec 29, 2020 at 01:05:48AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Dec 28, 2020 at 10:47:36AM -0800, Linus Torvalds wrote:
> > > On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov wrote:
> > > >
> > > > So far I only found one more pin leak and an always-true check. I don't
> > > > see how it can lead to a crash or corruption. Keep looking.

Those mods look good in themselves, but, as you expected, they made no
difference to the corruption I was seeing.

> > >
> > > Well, I noticed that the nommu.c version of filemap_map_pages() needs
> > > fixing, but that's obviously not the case Hugh sees.
> > >
> > > No, I think the problem is the
> > >
> > >         pte_unmap_unlock(vmf->pte, vmf->ptl);
> > >
> > > at the end of filemap_map_pages().
> > >
> > > Why?
> > >
> > > Because we've been updating vmf->pte as we go along:
> > >
> > >         vmf->pte += xas.xa_index - last_pgoff;
> > >
> > > and I think that by the time we get to that "pte_unmap_unlock()",
> > > vmf->pte potentially points past the edge of the page directory.
> >
> > Well, if that's true we have a bigger problem: we set up a pte entry
> > without holding the relevant PTL.
> >
> > But I *think* we should be fine here: do_fault_around() limits start_pgoff
> > and end_pgoff to stay within the page table.

Yes, Linus's patch had made no difference: the map_pages loop is safe
in that respect.
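For reference, the clamping Kirill mentions lives in do_fault_around() in
mm/memory.c.  The sketch below is paraphrased from memory, so treat the
exact expressions as approximate; the point is that end_pgoff is capped at
the last slot of the page table that vmf->address falls in, which is why
the "vmf->pte += xas.xa_index - last_pgoff" arithmetic stays within a
single PTE page, and it is also where vmf->address gets rounded down,
which is why it needs restoring once faultaround is done:

	/* Sketch of do_fault_around()'s range clamping; details approximate.
	 * nr_pages and mask derive from fault_around_bytes.
	 */
	vmf->address = max(address & mask, vmf->vma->vm_start);
	off = ((address - vmf->address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
	start_pgoff -= off;

	/*
	 * end_pgoff is either the end of the page table, the end of
	 * the vma, or nr_pages from start_pgoff, whichever is nearest,
	 * so [start_pgoff, end_pgoff] never spans two page tables.
	 */
	end_pgoff = start_pgoff -
		((vmf->address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
		PTRS_PER_PTE - 1;
	end_pgoff = min3(end_pgoff,
			 vma_pages(vmf->vma) + vmf->vma->vm_pgoff - 1,
			 start_pgoff + nr_pages - 1);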
> >
> > It made me look at the code around pte_unmap_unlock(), and I think that
> > the bug is that we have to reset vmf->address and NULLify vmf->pte once
> > we are done with faultaround:
> >
> > diff --git a/mm/memory.c b/mm/memory.c
>
> Ugh.. Wrong place. Need to sleep.
>
> I'll look into your idea tomorrow.
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 87671284de62..e4daab80ed81 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2987,6 +2987,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address,
>  	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	rcu_read_unlock();
> +	vmf->address = address;
> +	vmf->pte = NULL;
>  	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>  
>  	return ret;
> --

And that made no (noticeable) difference either. But at last I realized
that it is absolutely on the right track, just missing the couple of
early returns at the head of filemap_map_pages(): add

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3025,14 +3025,12 @@ vm_fault_t filemap_map_pages(struct vm_f
 
 	rcu_read_lock();
 	head = first_map_page(vmf, &xas, end_pgoff);
-	if (!head) {
-		rcu_read_unlock();
-		return 0;
-	}
+	if (!head)
+		goto out;
 
 	if (filemap_map_pmd(vmf, head)) {
-		rcu_read_unlock();
-		return VM_FAULT_NOPAGE;
+		ret = VM_FAULT_NOPAGE;
+		goto out;
 	}
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
@@ -3066,9 +3064,9 @@ unlock:
 		put_page(head);
 	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+out:
 	rcu_read_unlock();
 	vmf->address = address;
-	vmf->pte = NULL;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
--

and then the corruption is fixed.

It seems miraculous that the machines even booted with that bad
vmf->address going to __do_fault(): maybe that tells us what a good job
map_pages does most of the time.

You'll see I've tried removing the "vmf->pte = NULL;" there. I did
criticize earlier that vmf->pte was being left set, but I was either
thinking back to some earlier era of mm/memory.c, or else confusing it
with vmf->prealloc_pte, which is NULLed when consumed: I could not find
anywhere in mm/memory.c that now needs vmf->pte to be cleared, and I
seem to run fine without it (even on i386 HIGHPTE).

So, the mystery is solved; but I don't think any of these patches should
be applied. Without having thought through Linus's suggestions re
do_set_pte() in particular, I do think this map_pages interface is too
ugly and has given us lots of trouble: please take your time to go over
it all again, and come up with a cleaner patch. I've grown rather jaded,
and am questioning the value of the rework: I don't think I want to look
at or test another version for a week or so.

Hugh
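P.S. For anyone piecing the two diffs together: with Kirill's hunk plus
the follow-up above applied, the head and tail of filemap_map_pages()
come out shaped roughly like this (reconstructed purely from the diffs
in this mail; elided lines shown as ...):

	rcu_read_lock();
	head = first_map_page(vmf, &xas, end_pgoff);
	if (!head)
		goto out;

	if (filemap_map_pmd(vmf, head)) {
		ret = VM_FAULT_NOPAGE;
		goto out;
	}

	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, ...);
	do {
		...
	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
	pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
	rcu_read_unlock();
	/* every exit path now restores the original faulting address */
	vmf->address = address;
	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);

	return ret;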