Date: Wed, 5 Apr 2023 14:09:59 -0400
From: Peter Xu
To: Yang Shi
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Axel Rasmussen, Nadav Amit, David Hildenbrand, Andrew Morton, Andrea Arcangeli, Mike Rapoport, linux-stable
Subject: Re: [PATCH] mm/khugepaged: Check again on anon uffd-wp during isolation
References: <20230405155120.3608140-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 05, 2023 at 09:59:15AM -0700, Yang Shi wrote:
> On Wed, Apr 5, 2023 at 8:51 AM Peter Xu wrote:
> >
> > Khugepaged collapses an anonymous thp in two rounds of scans. The 2nd
> > round is done in __collapse_huge_page_isolate() after
> > hpage_collapse_scan_pmd(), and all the locks are released temporarily
> > in between. It means the pgtable can change during this phase, before
> > the 2nd round starts.
> >
> > It's logically possible that some ptes got wr-protected during this
> > phase, and we can erroneously collapse a thp without noticing that some
> > ptes are wr-protected by userfault. e1e267c7928f wanted to avoid it, but
> > it only did that for the 1st phase, not the 2nd phase.
> >
> > Since __collapse_huge_page_isolate() happens after a round of small page
> > swapins, we don't need to worry about any !present ptes: if one existed,
> > khugepaged would already have bailed out. So we only need to check
> > present ptes with the uffd-wp bit set there.
> >
> > I found this but never had a reproducer for it. I thought it was what
> > caused a bug in Muhammad's recent new pagemap ioctl work, but it turned
> > out that bug was caused by userspace instead. However, this still seems
> > to be a real bug, even with a very small race window, so it is still
> > worth fixing and copying stable.
>
> Yeah, I agree. But I got confused by userfaultfd_wp(vma) and
> pte_uffd_wp(pte). If a vma is armed with uffd wp, shall we skip the
> whole vma? If so, would it be better to just check the vma? We do
> revalidate the vma once we reacquire mmap_lock, so we should be able to
> bail out earlier.

Checking against the VMA is safe too. The difference is that the current
code still allows a thp to be collapsed as long as none of the pages in
the thp range is explicitly protected, even if the range is registered
with userfault-wp. That's also what e1e267c7928f does.

Here we have slightly different handling between anon and file thps (file
thps check against the vma flags), IMHO mostly because we don't scan
pgtables when deciding whether to collapse a shmem thp, so we kept it
simple by checking the vma flags. We could make it the same as anon, but
it might be overkill to scan the entries just for uffd-wp purposes. For
anon we always scan the pgtable anyway, so it's easier to make a more
accurate decision.

Thanks,

-- 
Peter Xu