From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, David Hildenbrand, "Huang, Ying"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
Subject: [PATCH v1] mm: swap: Fix race between free_swap_and_cache() and swapoff()
Date: Tue, 5 Mar 2024 15:13:49 +0000
Message-Id: <20240305151349.3781428-1-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from
a test case. But there has been agreement based on code review that this
is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was valid. This wasn't present in get_swap_device() so
I've added it. I couldn't find any existing get_swap_device() call sites
where this extra check would cause any false alarms.

Details of how to provoke one possible issue (thanks to David
Hildenbrand for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
..
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap
cache but before process 1 finished its call to
swap_page_trans_huge_swapped()?

--8<-----

Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---

Applies on top of v6.8-rc6 and mm-unstable (b38c34939fe4).

Thanks,
Ryan

 mm/swapfile.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..f580e6abc674 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1281,7 +1281,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	smp_rmb();
 	offset = swp_offset(entry);
 	if (offset >= si->max)
-		goto put_out;
+		goto bad_offset;
+	if (data_race(!si->swap_map[swp_offset(entry)]))
+		goto bad_free;
 
 	return si;
 
 bad_nofile:
@@ -1289,9 +1291,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 out:
 	return NULL;
 put_out:
-	pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
 	percpu_ref_put(&si->users);
 	return NULL;
+bad_offset:
+	pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
+	goto put_out;
+bad_free:
+	pr_err("%s: %s%08lx\n", __func__, Unused_offset, entry.val);
+	goto put_out;
 }
 
 static unsigned char __swap_entry_free(struct swap_info_struct *p,
@@ -1609,13 +1616,14 @@ int free_swap_and_cache(swp_entry_t entry)
 	if (non_swap_entry(entry))
 		return 1;
 
-	p = _swap_info_get(entry);
+	p = get_swap_device(entry);
 	if (p) {
 		count = __swap_entry_free(p, entry);
 		if (count == SWAP_HAS_CACHE &&
 		    !swap_page_trans_huge_swapped(p, entry))
 			__try_to_reclaim_swap(p, swp_offset(entry),
 					      TTRS_UNMAPPED | TTRS_FULL);
+		put_swap_device(p);
 	}
 	return p != NULL;
 }
-- 
2.25.1