Date: Tue, 23 May 2023 20:45:45 -0700 (PDT)
From: Hugh Dickins
To: Alistair Popple
Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Ralph Campbell, Ira Weiny,
    Steven Price, SeongJae Park, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
    Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu,
    Thomas Hellstrom, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 24/31] mm/migrate_device: allow pte_offset_map_lock() to fail
In-Reply-To: <877csz943s.fsf@nvidia.com>
Message-ID: <838a5172-f7f2-43db-e990-d38b36b544a2@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> <877csz943s.fsf@nvidia.com>

On Tue, 23 May 2023, Alistair Popple wrote:
> Hugh Dickins writes:
>
> > migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after
> > splitting huge zero pmd, and the pmd_none() handling after successfully
> > splitting huge page: those are now managed inside pte_offset_map_lock(),
> > and by "goto again" when it fails.
> >
> > But the skip after unsuccessful split_huge_page() must stay: it avoids an
> > endless loop. The skip when pmd_bad()? Remove that: it will be treated
> > as a hole rather than a skip once cleared by pte_offset_map_lock(), but
> > with different timing that would be so anyway; and it's arguably best to
> > leave the pmd_bad() handling centralized there.
>
> So for a pmd_bad() the sequence would be:
>
> 1. pte_offset_map_lock() would return NULL and clear the PMD.
> 2. goto again marks the page as a migrating hole,
> 3. In migrate_vma_insert_page() a new PMD is created by pmd_alloc().
> 4. This leads to a new zero page getting mapped for the previously
>    pmd_bad() mapping.

Agreed.

>
> I'm not entirely sure what the pmd_bad() case is used for but is that
> ok? I understand that previously it was all a matter of timing, but I
> wouldn't rely on the previous code being correct in this regard either.

The pmd_bad() case is for when the pmd table got corrupted (overwritten,
cosmic rays, whatever), and that pmd entry is easily recognized as
nonsense: we try not to crash on it, but user data may have got lost.

My "timing" remark may not be accurate: I seem to be living in the past,
when we had a lot more "pmd_none_or_clear_bad()"s around than today - I
was thinking that any one of them could be racily changing the bad to none.
Though I suppose I am now making my timing remark accurate, by changing
the bad to none more often again.

Since data is liable to be lost anyway (unless the corrupted entry was
actually none before it got corrupted), it doesn't matter greatly what
we do with it (some would definitely prefer a crash, but traditionally
we don't): issue a "pmd bad" message and not get stuck in a loop is
the main thing.

>
> > migrate_vma_insert_page(): remove comment on the old pte_offset_map()
> > and old locking limitations; remove the pmd_trans_unstable() check and
> > just proceed to pte_offset_map_lock(), aborting when it fails (page has
> > now been charged to memcg, but that's so in other cases, and presumably
> > uncharged later).
>
> Correct, the non-migrating page will be freed later via put_page() which
> will uncharge the page.

Thanks for confirming, yes, it was more difficult once upon a time,
but nowadays just a matter of reaching the final put_page().

Hugh
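
For anyone following the pmd_bad() sequence discussed above, here is a rough userspace model of the "goto again" flow. It is only a sketch, not the kernel code: struct model_pmd, model_pte_offset_map_lock() and model_collect_pmd() are hypothetical stand-ins invented for illustration, and the model assumes (as described in the thread) that a bad entry is reported, cleared to none, and then seen as a hole on the retry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a pmd entry: empty, pointing at a page table,
 * or corrupted ("bad"). Not the kernel's representation. */
enum pmd_state { PMD_NONE, PMD_TABLE, PMD_BAD };

struct model_pmd {
	enum pmd_state state;
	int ptes[4];		/* stand-in for the page table contents */
};

enum collect_result { COLLECT_HOLE, COLLECT_PTES };

/* Model of the failing lookup: returns NULL when there is no usable
 * page table. A bad entry is reported and cleared to none on the way
 * out, so that a retry cannot see it as bad a second time. */
static int *model_pte_offset_map_lock(struct model_pmd *pmd)
{
	assert(pmd != NULL);
	if (pmd->state == PMD_BAD) {
		fprintf(stderr, "model: bad pmd, clearing\n");
		pmd->state = PMD_NONE;
		return NULL;
	}
	if (pmd->state == PMD_NONE)
		return NULL;
	return pmd->ptes;
}

/* Model of the collect step: a none entry is a hole; otherwise try to
 * map the page table and retry from the top if that fails. */
static enum collect_result model_collect_pmd(struct model_pmd *pmd,
					     int *retries)
{
	int *ptep;

	*retries = 0;
again:
	if (pmd->state == PMD_NONE)
		return COLLECT_HOLE;

	ptep = model_pte_offset_map_lock(pmd);
	if (!ptep) {
		(*retries)++;
		goto again;	/* entry is none now, so this terminates */
	}
	return COLLECT_PTES;
}
```

In this model a bad entry costs exactly one retry and ends up as a hole, while a valid entry is collected with no retry. The loop cannot spin because the failing lookup always leaves the entry as none before returning NULL.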