From: James Houghton
Date: Thu, 19 Jan 2023 14:35:12 -0800
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
To: Peter Xu
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, David Rientjes,
 Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
 Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
 Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <06423461-c543-56fe-cc63-cabda6871104@redhat.com>
 <6548b3b3-30c9-8f64-7d28-8a434e0a0b80@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 19, 2023 at 2:23 PM Peter Xu wrote:
>
> On Thu, Jan 19, 2023 at 02:00:32PM -0800, Mike Kravetz wrote:
> > I do not know much about the (primary) live migration use case.  My
> > guess is that page table lock contention may be an issue?  In this use
> > case, HGM is only enabled for the duration the live migration operation,
> > then a MADV_COLLAPSE is performed.  If contention is likely to be an
> > issue during this time, then yes we would need to pass around with
> > something like hugetlb_pte.
>
> I'm not aware of any such contention issue.  IMHO the migration problem is
> majorly about being too slow transferring a page being so large.  Shrinking
> the page size should resolve the major problem already here IIUC.

This will be problematic if you scale up VMs to be quite large. Google
upstreamed the "TDP MMU" for KVM/x86 that removed the need to take the
MMU lock for writing in the EPT violation path. We found that this
change is required for VMs with >200 or so vCPUs to consistently avoid
CPU soft lockups in the guest.

Requiring each UFFDIO_CONTINUE (in the post-copy path) to serialize on
the same PTL would be problematic in the same way.

>
> AFAIU 4K-only solution should only reduce any lock contention because locks
> will always be pte-level if VM_HUGETLB_HGM set.
> When walking and creating the intermediate pgtable entries we can use
> atomic ops just like generic mm, so no lock needed at all.  With
> uncertainty on the size of mappings, we'll need to take any of the
> multiple layers of locks.
>

Other than taking the HugeTLB VMA lock for reading, walking/allocating
page tables won't need any additional locking.

We take the PTL to allocate the next level down, but so does generic
mm (look at __pud_alloc, __pmd_alloc for example). Maybe I am
misunderstanding.

- James
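
For readers who do not have __pud_alloc/__pmd_alloc in front of them, the
"take the PTL to allocate the next level down" pattern mentioned above can
be sketched roughly as follows. This is a userspace model, not kernel
code: a pthread mutex stands in for the page table lock, and every name
in it (struct table, struct upper_entry, next_level_alloc) is hypothetical
and chosen only for illustration. The shape it is meant to show is:
allocate the new table outside the lock, take the lock only to check and
install, and free the speculative allocation if someone else got there
first.

```c
/* Userspace model (NOT kernel code) of the allocate-then-install-under-lock
 * pattern. All type and function names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 512

/* A lower-level table: just an array of slots in this model. */
struct table {
    void *slots[ENTRIES_PER_TABLE];
};

/* One upper-level entry plus the lock guarding installation of its child. */
struct upper_entry {
    pthread_mutex_t ptl;   /* stands in for the page table lock */
    struct table *child;   /* NULL until a lower-level table is installed */
};

/* Return the child table of 'e', allocating and installing it if needed.
 * A caller that finds the entry already populated frees its own
 * speculative allocation and uses the table that is already there. */
static struct table *next_level_alloc(struct upper_entry *e)
{
    struct table *result;
    struct table *new = calloc(1, sizeof(*new)); /* allocate outside the lock */

    assert(new != NULL);

    pthread_mutex_lock(&e->ptl);
    if (e->child == NULL) {
        e->child = new;    /* entry was empty: install our table */
        new = NULL;        /* ownership moved to the entry */
    }
    result = e->child;
    pthread_mutex_unlock(&e->ptl);

    free(new);             /* no-op if installed, frees the unused copy otherwise */
    return result;
}
```

Calling next_level_alloc() twice on the same entry returns the same table
both times; the second call's allocation is discarded. The lock is held
only across the check and the pointer store, which is why this step is
cheap on its own. The contention concern in this thread is about many
callers funnelling through one such lock, not about the cost of a single
acquisition.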