Subject: Re: [Question] Is there a race window between swapoff vs synchronous swap_readpage
From: Miaohe Lin
To: "Huang, Ying"
CC: Linux-MM, linux-kernel, Andrew Morton, Matthew Wilcox, Yu Zhao, Shakeel Butt, Alex Shi, Minchan Kim
Date: Tue, 30 Mar 2021 19:21:54 +0800
Message-ID: <7d2126a2-e67e-cadb-d732-77f8d54a2f0c@huawei.com>
In-Reply-To: <87h7kt9ufw.fsf@yhuang6-desk1.ccr.corp.intel.com>

On 2021/3/30 11:44, Huang, Ying wrote:
> Miaohe Lin writes:
>
>> On 2021/3/30 9:57, Huang, Ying wrote:
>>> Hi, Miaohe,
>>>
>>> Miaohe Lin writes:
>>>
>>>> Hi all,
>>>> I am investigating the swap code and found the following possible race window:
>>>>
>>>> CPU 1                                        CPU 2
>>>> -----                                        -----
>>>> do_swap_page
>>>>   skip swapcache case (synchronous swap_readpage)
>>>>   alloc_page_vma
>>>>                                              swapoff
>>>>                                                release swap_file, bdev, or ...
>>>>   swap_readpage
>>>>     check sis->flags is ok
>>>>       access swap_file, bdev or ... [oops!]
>>>>                                                si->flags = 0
>>>>
>>>> The swapcache case is OK because swapoff waits on the page_lock of the
>>>> swapcache page.
>>>> Can this really happen, or am I missing something?
>>>> Any reply would be greatly appreciated. Thanks! :)
>>>
>>> This appears possible. Even for the swapcache case, we can't guarantee the
>>
>> Many thanks for the reply!
>>
>>> swap entry obtained from the page table is always valid either. The
>>
>> The page table may change at any time, so we may do some useless work.
>> But the pte_same() check handles these races correctly as long as they
>> do not result in an oops.
>>
>>> underlying swap device can be swapped off at the same time. So we use
>>> get/put_swap_device() for that. Maybe we need something similar here.
>>
>> Using get/put_swap_device() to guard swap_readpage() against swapoff
>> sounds really bad, as swap_readpage() may take a really long time. Also,
>> such a race may not be very harmful, because swapoff is usually done
>> only at system shutdown.
>> I cannot figure out a simple and robust way to fix this. Any suggestions,
>> or could anyone help get rid of this race?
>
> Some reference counting on the swap device can prevent the swap device
> from being swapped off. To minimize the performance overhead on the
> hot path, it appears we can use percpu_ref.
>

Sounds like a good idea. Many thanks for your suggestion. :)

> Best Regards,
> Huang, Ying
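A minimal userspace sketch of the reference-counting idea, in C11. It is an analogy under stated assumptions, not kernel code: struct swap_dev, dev_get(), dev_put(), swapoff_begin(), swapoff_try_finish() and read_slot() are hypothetical names, the malloc'd `backing` int stands in for swap_file/bdev, and a single shared atomic counter is used for clarity instead of percpu_ref (whose point, as suggested in the thread, is to keep the reader-side get/put cheap by using per-CPU counters). Readers pin the device before touching its resources and back off once swapoff has started; swapoff first marks the device dying and only releases resources once no reader holds a reference. The ordering relies on the default sequentially consistent atomics: a reader that observed `dying == false` must have incremented `refs` before swapoff read it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a swap device; not the kernel's swap_info_struct. */
struct swap_dev {
    atomic_int refs;    /* readers currently holding a reference */
    atomic_bool dying;  /* set once swapoff has started */
    int *backing;       /* stands in for swap_file / bdev */
};

static bool swap_dev_init(struct swap_dev *d)
{
    atomic_init(&d->refs, 0);
    atomic_init(&d->dying, false);
    d->backing = malloc(sizeof *d->backing);
    if (d->backing == NULL)
        return false;
    *d->backing = 42;   /* arbitrary payload "stored on the device" */
    return true;
}

/* Reader side: take a reference first, then re-check the device state. */
static bool dev_get(struct swap_dev *d)
{
    atomic_fetch_add(&d->refs, 1);
    if (atomic_load(&d->dying)) {
        atomic_fetch_sub(&d->refs, 1);  /* swapoff in progress: back off */
        return false;
    }
    return true;
}

static void dev_put(struct swap_dev *d)
{
    int old = atomic_fetch_sub(&d->refs, 1);
    assert(old > 0);    /* unbalanced put would be a bug */
    (void)old;
}

/* swapoff, step 1: refuse new references from now on. */
static void swapoff_begin(struct swap_dev *d)
{
    atomic_store(&d->dying, true);
}

/* swapoff, step 2: release resources only when no reader holds a
 * reference. Returns false while readers are in flight; the caller is
 * expected to retry (the kernel would sleep instead of polling). */
static bool swapoff_try_finish(struct swap_dev *d)
{
    if (atomic_load(&d->refs) != 0)
        return false;
    free(d->backing);
    d->backing = NULL;
    return true;
}

/* Synchronous read path: every access to `backing` happens under a ref. */
static bool read_slot(struct swap_dev *d, int *out)
{
    if (!dev_get(d))
        return false;
    *out = *d->backing;
    dev_put(d);
    return true;
}
```

In the kernel, one would expect percpu_ref_tryget_live(), percpu_ref_put() and percpu_ref_kill() to play roughly the roles of dev_get(), dev_put() and swapoff_begin(), with swapoff waiting for the release callback rather than polling; that mapping is an assumption, not a description of an existing patch. The non-blocking swapoff_try_finish() is only there so the sketch can be exercised from a single thread.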