Subject: Re: [PATCH/RFC] mm: do not drop unused pages when userfaultd is running
To: David Hildenbrand, linux-mm@kvack.org, linux-s390@vger.kernel.org
Cc: kvm@vger.kernel.org, Janosch Frank, Cornelia Huck, linux-kernel@vger.kernel.org, Martin Schwidefsky, Andrea Arcangeli
References: <20180628123916.96106-1-borntraeger@de.ibm.com> <1e470063-d56c-0a76-7a7f-2c0f0e87824b@de.ibm.com>
From: Christian Borntraeger
Date: Thu, 28 Jun 2018 16:51:14 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/28/2018 04:49 PM, David Hildenbrand wrote:
> On 28.06.2018 16:39, Christian Borntraeger wrote:
>>
>>
>> On 06/28/2018 03:18 PM, David Hildenbrand wrote:
>>> On 28.06.2018 14:39, Christian Borntraeger wrote:
>>>> KVM guests on s390 can notify the host of unused pages. This can result
>>>> in pte_unused callbacks to be true for KVM guest memory.
>>>>
>>>> If a page is unused (checked with pte_unused) we might drop this page
>>>> instead of paging it.
>>>> This can have side-effects on userfaultfd, when the
>>>> page in question was already migrated:
>>>>
>>>> The next access of that page will trigger a fault and a user fault
>>>> instead of faulting in a new and empty zero page. As QEMU does not
>>>> expect a userfault on an already migrated page this migration will fail.
>>>>
>>>> The most straightforward solution is to ignore the pte_unused hint if a
>>>> userfault context is active for this VMA.
>>>>
>>>> Cc: Martin Schwidefsky
>>>> Cc: Andrea Arcangeli
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Christian Borntraeger
>>>> ---
>>>>  mm/rmap.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 6db729dc4c50..3f3a72aa99f2 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1481,7 +1481,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>>>  				set_pte_at(mm, address, pvmw.pte, pteval);
>>>>  			}
>>>>
>>>> -		} else if (pte_unused(pteval)) {
>>>> +		} else if (pte_unused(pteval) && !vma->vm_userfaultfd_ctx.ctx) {
>>>>  			/*
>>>>  			 * The guest indicated that the page content is of no
>>>>  			 * interest anymore. Simply discard the pte, vmscan
>>>>
>>>
>>> To understand the implications better:
>>>
>>> This is like a MADV_DONTNEED from user space while a userfaultfd
>>> notifier is registered for this vma range.
>>>
>>> While we can block such calls in QEMU ("we registered it, we know it
>>> best"), we can't do the same in the kernel.
>>>
>>> These "intern MADV_DONTNEED" can actually trigger "deferred", so e.g. if
>>> the pte_unused() was set before userfaultfd has been registered, we can
>>> still get the same result, right?
>>>
>> Not sure I understand your last sentence.
>
> Rephrased: Instead trying to stop somebody from setting pte_unused will
> not work, as we might get a userfaultfd registration at some point and
> find a previously set pte_unused afterwards.

Yes, exactly. The unused value can be set before the migration.
>
>> This place here is called on the unmap (e.g. when the host tries to page out).
>> The value was transferred before (and always before) during the page table invalidation.
>> So pte_unused was always set before. This is the place where we decide if we page
>> out (and establish a swap pte) or just drop this page table entry. So if
>> no userfaultfd is registered at that point in time we are good.
>
> This certainly applies to ordinary userfaultfd we have right now.
> userfaultfd WP (write-protect) or other features to come might be
> different, but it does not seem to do any harm in case we page out
> instead of dropping it. This way we are on the safe side.

Yes.
>
> In other words: I think this is the right approach.
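
For reference, the decision discussed in this thread can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not the kernel code: the enum, the function names and the two boolean parameters are hypothetical stand-ins for pte_unused(pteval) and for a non-NULL vma->vm_userfaultfd_ctx.ctx, and every other branch of try_to_unmap_one() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names; the kernel does not define this enum. */
enum unmap_action {
	UNMAP_DROP_PTE,  /* discard the pte, page content is lost */
	UNMAP_PAGE_OUT   /* fall through to the regular paging path */
};

/*
 * Simplified model of the patched condition.
 * pte_is_unused   stands in for pte_unused(pteval).
 * uffd_ctx_active stands in for vma->vm_userfaultfd_ctx.ctx != NULL.
 */
enum unmap_action unmap_decision(bool pte_is_unused, bool uffd_ctx_active)
{
	if (pte_is_unused && !uffd_ctx_active)
		return UNMAP_DROP_PTE;
	return UNMAP_PAGE_OUT;
}

/*
 * The unused hint may be set long before userfaultfd is registered.
 * Only the state at unmap time decides what happens to the page.
 */
void unmap_decision_demo(void)
{
	bool hint = true;         /* guest marked the page unused earlier */
	bool uffd_active = false; /* nothing registered yet */

	assert(unmap_decision(hint, uffd_active) == UNMAP_DROP_PTE);

	uffd_active = true;       /* userfaultfd gets registered later */
	assert(unmap_decision(hint, uffd_active) == UNMAP_PAGE_OUT);
}
```

The second assertion in unmap_decision_demo() is the case raised above: a previously set hint is found after registration, and the page is paged out rather than dropped.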