Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp1406514pxb; Sun, 11 Apr 2021 18:55:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzNZDIcGrkXW3MKaGFpJLUQtF/ndCuCOWxVUI4mi8QXVvxGpgccBW97aJklRFwIqWeTvfEr X-Received: by 2002:a17:90a:528b:: with SMTP id w11mr26438102pjh.162.1618192500212; Sun, 11 Apr 2021 18:55:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618192500; cv=none; d=google.com; s=arc-20160816; b=rno7UELMh37ZOgbOEFCrCZHl/EKw86THXwZq1H68QTAADs/IGIdTGzuRi6b3U66Yr4 tMrgWk8KTV1DBQ0lzHIcelphzzmRzCnGChLLD9i15R/pcrXOEnd0ileliyKt8z1mYaHf 9+iBG+czDN2MvpQTvD3PxWvhJuAeIBP67ciRme6PyzZgJo3Gs2sOJ8i1kQyv2txkoqe/ I7V64MXi6dRg8a0GViZbNl7ppYHPlllCFHIp7vv7MoQ5QLyNlHA6iZF5SZ0MRoPKgvDs NDvpQh0O8kvhUtDdhKA/Ur+epEVUR85bYrFiRGeqWnKActW5nNbxU4PiYwDtFcRw6ISB M9OA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:message-id:in-reply-to :date:references:subject:cc:to:from:ironport-sdr:ironport-sdr; bh=OMuxHWh/xtDkNbCj4x33LN0X5oYj/6XJUz6AHXkUxuM=; b=E1MIUIR3QJoGT7TY9elTKOaA8auTXmmNuM53zHh7IgH1+ugSt7hEeDRYDuGYbrfGBx KzyIINZMT6f5tBcpv+u+x9aIuJcxR6+GsupesX5yaOd33+WZ8w9dpzB+jBnNGmBYBY4s 9gI+PqJrbcBzG+Y726q9+eEjArR7tuzFlWLBBmFR+NG1tFVKDKMt2u5JiZ+6To4EKCHs AlEfrcbuCYgeLN7rfz7LSggnikVoZlAJOODgSgWqxsPH07sefaW9tvUvirW864CFOuNI +VsOE62Ws7Ahp+W7ua4ZfzCAFAkqVhvk3d7qpLPouIZVmKBTlf03HZcEsgS6VsL31O/b tgeA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id d10si11418580pgp.182.2021.04.11.18.54.47; Sun, 11 Apr 2021 18:55:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235837AbhDLBou (ORCPT + 99 others); Sun, 11 Apr 2021 21:44:50 -0400 Received: from mga09.intel.com ([134.134.136.24]:16117 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235543AbhDLBos (ORCPT ); Sun, 11 Apr 2021 21:44:48 -0400 IronPort-SDR: WP7ihIab611jDvhYJKrkOmidfX8AtLmhScRFE5eexnwiKN+fPquVtR/wVNy6j5exPqWAVmtETq BGqcMwbhcelQ== X-IronPort-AV: E=McAfee;i="6000,8403,9951"; a="194203436" X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="194203436" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:30 -0700 IronPort-SDR: /TLdrKv6QNTpc9r6cNaJvN03c0rl7R1vpQvNcc+PWAqDmUej2sY+vRll5ALrrzHw6Wxu4tLV6A 5a3AdkRs4etQ== X-IronPort-AV: E=Sophos;i="5.82,214,1613462400"; d="scan'208";a="459970583" Received: from yhuang6-desk1.sh.intel.com (HELO yhuang6-desk1.ccr.corp.intel.com) ([10.239.13.1]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Apr 2021 18:44:26 -0700 From: "Huang, Ying" To: Miaohe Lin Cc: Tim Chen , , , , , , , , , , , , Subject: Re: [PATCH 2/5] swap: fix do_swap_page() race with swapoff References: <20210408130820.48233-1-linmiaohe@huawei.com> <20210408130820.48233-3-linmiaohe@huawei.com> <7684b3de-2824-9b1f-f033-d4bc14f9e195@linux.intel.com> <50d34b02-c155-bad7-da1f-03807ad31275@huawei.com> <995a130b-f07a-4771-1fe3-477d2f3c1e8e@linux.intel.com> Date: Mon, 12 Apr 2021 09:44:23 +0800 In-Reply-To: (Miaohe Lin's message of "Sat, 10 Apr 2021 11:17:29 +0800") Message-ID: <87r1jgwa2w.fsf@yhuang6-desk1.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Miaohe Lin writes: > On 2021/4/10 1:17, Tim Chen wrote: >> >> >> On 4/9/21 1:42 AM, Miaohe Lin wrote: >>> On 2021/4/9 5:34, Tim Chen wrote: >>>> >>>> >>>> On 4/8/21 6:08 AM, Miaohe Lin wrote: >>>>> When I was investigating the swap code, I found the below possible race >>>>> window: >>>>> >>>>> CPU 1 CPU 2 >>>>> ----- ----- >>>>> do_swap_page >>>>> synchronous swap_readpage >>>>> alloc_page_vma >>>>> swapoff >>>>> release swap_file, bdev, or ... >>>> >>> >>> Many thanks for quick review and reply! >>> >>>> Perhaps I'm missing something. The release of swap_file, bdev etc >>>> happens after we have cleared the SWP_VALID bit in si->flags in destroy_swap_extents >>>> if I read the swapoff code correctly. >>> Agree. Let's look this more close: >>> CPU1 CPU2 >>> ----- ----- >>> swap_readpage >>> if (data_race(sis->flags & SWP_FS_OPS)) { >>> swapoff >>> p->swap_file = NULL; >>> struct file *swap_file = sis->swap_file; >>> struct address_space *mapping = swap_file->f_mapping;[oops!] >>> ... >>> p->flags = 0; >>> ... >>> >>> Does this make sense for you? >> >> p->swapfile = NULL happens after the >> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence in swapoff(). >> >> So I don't think the sequence you illustrated on CPU2 is in the right order. >> That said, without get_swap_device/put_swap_device in swap_readpage, you could >> potentially blow pass synchronize_rcu() on CPU2 and causes a problem. so I think >> the problematic race looks something like the following: >> >> >> CPU1 CPU2 >> ----- ----- >> swap_readpage >> if (data_race(sis->flags & SWP_FS_OPS)) { >> swapoff >> p->flags = &= ~SWP_VALID; >> .. >> synchronize_rcu(); >> .. >> p->swap_file = NULL; >> struct file *swap_file = sis->swap_file; >> struct address_space *mapping = swap_file->f_mapping;[oops!] >> ... >> ... >> > > Agree. This is also what I meant to illustrate. And you provide a better one. Many thanks! For the pages that are swapped in through swap cache. That isn't an issue. Because the page is locked, the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked. So the race is for the fast path as follows, if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) I found it in your original patch description. But please make it more explicit to reduce the potential confusing. Best Regards, Huang, Ying