In-Reply-To: <20180216153823.ad74f1d2c157adc67ed2c970@linux-foundation.org>
References: <20180213014220.2464-1-ying.huang@intel.com> <20180213154123.9f4ef9e406ea8365ca46d9c5@linux-foundation.org> <87fu64jthz.fsf@yhuang-dev.intel.com> <20180216153823.ad74f1d2c157adc67ed2c970@linux-foundation.org>
From: huang ying
Date: Sun, 18 Feb 2018 09:06:47 +0800
Subject: Re: [PATCH -mm -v5 RESEND] mm, swap: Fix race between swapoff and some swap operations
To: Andrew Morton
Cc: "Huang, Ying", linux-mm@kvack.org, LKML, Hugh Dickins, "Paul E . McKenney", Minchan Kim, Johannes Weiner, Tim Chen, Shaohua Li, Mel Gorman, jglisse@redhat.com, Michal Hocko, Andrea Arcangeli, David Rientjes, Rik van Riel, Jan Kara, Dave Jiang, Aaron Lu
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 17, 2018 at 7:38 AM, Andrew Morton wrote:
> On Wed, 14 Feb 2018 08:38:00 +0800 "Huang\, Ying" wrote:
>
>> Andrew Morton writes:
>>
>> > On Tue, 13 Feb 2018 09:42:20 +0800 "Huang, Ying" wrote:
>> >
>> >> From: Huang Ying
>> >>
>> >> When the swapin is performed, after getting the swap entry information
>> >> from the page table, system will swap in the swap entry, without any
>> >> lock held to prevent the swap device from being swapoff.  This may
>> >> cause the race like below,
>> >
>> > Sigh.  In terms of putting all the work into the swapoff path and
>> > avoiding overheads in the hot paths, I guess this is about as good as
>> > it will get.
>> >
>> > It's a very low-priority fix so I'd prefer to keep the patch in -mm
>> > until Hugh has had an opportunity to think about it.
>> >
>> >> ...
>> >>
>> >> +/*
>> >> + * Check whether swap entry is valid in the swap device.  If so,
>> >> + * return pointer to swap_info_struct, and keep the swap entry valid
>> >> + * via preventing the swap device from being swapoff, until
>> >> + * put_swap_device() is called.  Otherwise return NULL.
>> >> + */
>> >> +struct swap_info_struct *get_swap_device(swp_entry_t entry)
>> >> +{
>> >> +	struct swap_info_struct *si;
>> >> +	unsigned long type, offset;
>> >> +
>> >> +	if (!entry.val)
>> >> +		goto out;
>> >> +	type = swp_type(entry);
>> >> +	if (type >= nr_swapfiles)
>> >> +		goto bad_nofile;
>> >> +	si = swap_info[type];
>> >> +
>> >> +	preempt_disable();
>> >
>> > This preempt_disable() is later than I'd expect.  If a well-timed race
>> > occurs, `si' could now be pointing at a defunct entry.  If that
>> > well-timed race include a swapoff AND a swapon, `si' could be pointing
>> > at the info for a new device?
>>
>> struct swap_info_struct pointed to by swap_info[] will never be freed.
>> During swapoff, we only free the memory pointed to by the fields of
>> struct swap_info_struct.  And when swapon, we will always reuse
>> swap_info[type] if it's not NULL.  So it should be safe to dereference
>> swap_info[type] with preemption enabled.
>
> That's my point.  If there's a race window during which there is a
> parallel swapoff+swapon, this swap_info_struct may now be in use for a
> different device?

Yes.  It's possible.  And the caller of get_swap_device() can live
with it if the swap_info_struct has been fully initialized.
For example, for the race in the patch description,

do_swap_page
  swapin_readahead
    __read_swap_cache_async
      swapcache_prepare
        __swap_duplicate

in __swap_duplicate(), it's possible that the swap device returned by
get_swap_device() is different from the swap device when
__swap_duplicate() calls get_swap_device().  But the swap_info_struct
has been fully initialized, so __swap_duplicate() can reference
si->swap_map[] safely.  And we will check si->swap_map[] before any
further operation.  Even if the swap entry is swapped out again for
the new swap device, we will check the page table again in
do_swap_page().  So there is no functional problem.

Best Regards,
Huang, Ying
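
The slot-reuse argument in this reply can be sketched in userspace C.
This is only an illustration under stated assumptions, not the patch
and not kernel code: the struct below is a cut-down stand-in with
made-up `valid` and `max` fields, and `sim_swapon()`, `sim_swapoff()`,
`sim_get_swap_device()` and `sim_race_demo()` are hypothetical helpers.
Locking and preemption are left out entirely; the sketch only shows
that a slot in swap_info[] outlives swapoff, that swapon reuses it, and
that a late reader therefore sees a fully initialized struct whose
swap_map[] it must recheck.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SWAPFILES 4

/* Cut-down stand-in for the kernel struct; the fields are assumptions. */
struct swap_info_struct {
    int valid;                /* set last in swapon, cleared first in swapoff */
    unsigned long max;        /* number of entries in swap_map */
    unsigned char *swap_map;  /* per-entry use counts, freed on swapoff */
};

/* The structs themselves are never freed, only the memory they point to. */
struct swap_info_struct swap_info[MAX_SWAPFILES];

/* Hypothetical helper: (re)initialize slot `type` for a new device. */
int sim_swapon(unsigned int type, unsigned long nr_entries)
{
    struct swap_info_struct *si;

    if (type >= MAX_SWAPFILES || swap_info[type].valid)
        return -1;
    si = &swap_info[type];
    si->swap_map = calloc(nr_entries, 1);
    if (!si->swap_map)
        return -1;
    si->max = nr_entries;
    si->valid = 1;            /* publish only after full initialization */
    return 0;
}

/* Hypothetical helper: free the fields, keep the struct in place. */
void sim_swapoff(unsigned int type)
{
    struct swap_info_struct *si = &swap_info[type];

    si->valid = 0;
    free(si->swap_map);
    si->swap_map = NULL;
    si->max = 0;
}

/* Return the slot only if it is initialized and `offset` is in range. */
struct swap_info_struct *sim_get_swap_device(unsigned int type,
                                             unsigned long offset)
{
    struct swap_info_struct *si;

    if (type >= MAX_SWAPFILES)
        return NULL;
    si = &swap_info[type];
    if (!si->valid || offset >= si->max)
        return NULL;
    return si;
}

/*
 * A reader holds a stale entry (type 0, offset 3) while swapoff and
 * swapon both complete.  Returns the use count the reader then sees,
 * or -1 if the lookup fails.
 */
int sim_race_demo(void)
{
    struct swap_info_struct *si;
    int seen;

    if (sim_swapon(0, 8) != 0)
        return -1;
    swap_info[0].swap_map[3] = 1;   /* the entry is in use on the old device */

    sim_swapoff(0);                 /* old device goes away ... */
    if (sim_swapon(0, 8) != 0)      /* ... and the same slot is reused */
        return -1;

    si = sim_get_swap_device(0, 3);
    if (!si) {
        sim_swapoff(0);
        return -1;
    }
    assert(si->swap_map != NULL);   /* safe to dereference: fully initialized */
    seen = si->swap_map[3];         /* recheck: the stale entry reads as unused */
    sim_swapoff(0);
    return seen;
}
```

In this sketch the late lookup succeeds because the slot was reused,
and the recheck of swap_map[] is what tells the reader that its entry
no longer refers to anything on the new device.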