Date: Sat, 30 Sep 2023 19:54:34 -0700
From: Andrew Morton
To: riel@surriel.com
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org
Subject: Re: [PATCH v5 0/3] hugetlbfs: close race between MADV_DONTNEED and page fault
Message-Id: <20230930195434.3507483510ba7961985fbeb2@linux-foundation.org>
In-Reply-To: <20231001005659.2185316-1-riel@surriel.com>

On Sat, 30 Sep 2023 20:55:47 -0400 riel@surriel.com wrote:

> v5: somehow a __vma_private_lock(vma) test failed to make it from my tree into the v4 series, fix that
> v4: fix unmap_vmas locking issue pointed out by Mike Kravetz, and resulting lockdep fallout
> v3: fix compile error w/ lockdep and test case errors with patch 3
> v2: fix the locking bug found with the libhugetlbfs tests.
>
> Malloc libraries, like jemalloc and tcmalloc, take decisions on when
> to call madvise independently from the code in the main application.
>
> This sometimes results in the application page faulting on an address,
> right after the malloc library has shot down the backing memory with
> MADV_DONTNEED.
>
> Usually this is harmless, because we always have some 4kB pages
> sitting around to satisfy a page fault.
> However, with hugetlbfs
> systems often allocate only the exact number of huge pages that
> the application wants.
>
> Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
> any lock taken on the page fault path, which can open up the following
> race condition:
>
>        CPU 1                            CPU 2
>
>        MADV_DONTNEED
>        unmap page
>        shoot down TLB entry
>                                         page fault
>                                         fail to allocate a huge page
>                                         killed with SIGBUS
>        free page
>
> Fix that race by extending the hugetlb_vma_lock locking scheme to also
> cover private hugetlb mappings (with resv_map), and pulling the locking
> from __unmap_hugepage_final_range into helper functions called from
> zap_page_range_single. This ensures page faults stay locked out of
> the MADV_DONTNEED VMA until the huge pages have actually been freed.

Didn't we decide that [1/3] and [2/3] should be cc:stable?

> The third patch in the series is more of an RFC. Using the
> invalidate_lock instead of the hugetlb_vma_lock greatly simplifies
> the code, but at the cost of turning a per-VMA lock into a lock
> per backing hugetlbfs file, which could slow things down when
> multiple processes are mapping the same hugetlbfs file.

"could slow things down" is testable-for?

This third one I'd queue up for testing for a 6.7-rc1 merge, so I'll
split the series apart. Not a problem, but it would be a little better
if things were originally packaged that way.
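
For readers following the thread, the interleaving in the quoted cover letter can be replayed in a small single-threaded toy model. This is only an illustrative sketch and not kernel code: `ToyHugetlbVma`, `replay`, and the `take_lock` flag are hypothetical names, the pool is assumed to hold exactly one huge page, and the boolean `locked` merely stands in for the hugetlb_vma_lock that the series extends to private mappings.

```python
class ToyHugetlbVma:
    """Toy model (hypothetical) of one hugetlb mapping backed by a fixed pool."""

    def __init__(self, pool_pages=1):
        self.free_pages = pool_pages  # huge pages still available in the pool
        self.mapped = False           # is a huge page currently mapped?
        self.locked = False           # stand-in for the hugetlb_vma_lock

    def fault(self):
        """Page fault path: returns 'blocked', 'mapped', or 'SIGBUS'."""
        if self.locked:
            return "blocked"          # fault must wait for MADV_DONTNEED
        if self.mapped:
            return "mapped"
        if self.free_pages == 0:
            return "SIGBUS"           # no huge page left to satisfy the fault
        self.free_pages -= 1
        self.mapped = True
        return "mapped"

    def unmap(self, take_lock):
        """First half of MADV_DONTNEED: unmap and shoot down the TLB entry."""
        if take_lock:
            self.locked = True
        self.mapped = False           # page is unmapped but not yet freed

    def free(self):
        """Second half of MADV_DONTNEED: return the page, drop the lock."""
        self.free_pages += 1
        self.locked = False


def replay(take_lock):
    """Replay the CPU 1 / CPU 2 interleaving from the cover letter."""
    vma = ToyHugetlbVma(pool_pages=1)
    assert vma.fault() == "mapped"    # application touches its only huge page
    vma.unmap(take_lock)              # CPU 1: unmap page, shoot down TLB entry
    outcome = vma.fault()             # CPU 2: page fault
    vma.free()                        # CPU 1: free page
    if outcome == "blocked":
        outcome = vma.fault()         # CPU 2 retries once the lock is dropped
    return outcome


print(replay(take_lock=False))  # SIGBUS
print(replay(take_lock=True))   # mapped
```

Without the lock, the fault lands between the unmap and the free and finds the pool empty. With the lock held across both halves, the fault only proceeds after the page is back in the pool.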