Date: Fri, 15 Feb 2019 14:11:22 +0100
From: Michal Hocko
To: "Huang, Ying"
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Hugh Dickins, "Paul E. McKenney", Minchan Kim, Johannes Weiner,
 Tim Chen, Mel Gorman, Jérôme Glisse, Andrea Arcangeli, David Rientjes,
 Rik van Riel, Jan Kara, Dave Jiang, Daniel Jordan, Andrea Parri
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations
Message-ID: <20190215131122.GA4525@dhcp22.suse.cz>
References: <20190211083846.18888-1-ying.huang@intel.com>
 <20190214143318.GJ4525@dhcp22.suse.cz>
 <871s49bkaz.fsf@yhuang-dev.intel.com>
In-Reply-To: <871s49bkaz.fsf@yhuang-dev.intel.com>

On Fri 15-02-19 15:08:36, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Mon 11-02-19 16:38:46, Huang, Ying wrote:
> >> From: Huang Ying
> >>
> >> When swapin is performed, after getting the swap entry information from
> >> the page table, system will swap in the swap entry, without any lock held
> >> to prevent the swap device from being swapoff.  This may cause the race
> >> like below,
> >>
> >> CPU 1                          CPU 2
> >> -----                          -----
> >>                                do_swap_page
> >>                                  swapin_readahead
> >>                                    __read_swap_cache_async
> >> swapoff                              swapcache_prepare
> >>   p->swap_map = NULL                   __swap_duplicate
> >>                                          p->swap_map[?] /* !!! NULL pointer access */
> >>
> >> Because swapoff is usually done when system shutdown only, the race may
> >> not hit many people in practice.  But it is still a race need to be fixed.
> >>
> >> To fix the race, get_swap_device() is added to check whether the specified
> >> swap entry is valid in its swap device.  If so, it will keep the swap
> >> entry valid via preventing the swap device from being swapoff, until
> >> put_swap_device() is called.
> >>
> >> Because swapoff() is very rare code path, to make the normal path runs as
> >> fast as possible, disabling preemption + stop_machine() instead of
> >> reference count is used to implement get/put_swap_device().  From
> >> get_swap_device() to put_swap_device(), the preemption is disabled, so
> >> stop_machine() in swapoff() will wait until put_swap_device() is called.
> >>
> >> In addition to swap_map, cluster_info, etc. data structure in the struct
> >> swap_info_struct, the swap cache radix tree will be freed after swapoff,
> >> so this patch fixes the race between swap cache looking up and swapoff
> >> too.
> >>
> >> Races between some other swap cache usages protected via disabling
> >> preemption and swapoff are fixed too via calling stop_machine() between
> >> clearing PageSwapCache() and freeing swap cache data structure.
> >>
> >> Alternative implementation could be replacing disable preemption with
> >> rcu_read_lock_sched and stop_machine() with synchronize_sched().
> >
> > using stop_machine is generally discouraged. It is a gross
> > synchronization.
> >
> > Besides that, since when do we have this problem?
>
> For problem, you mean the race between swapoff and the page fault
> handler?

yes

> The problem is introduced in v4.11 when we avoid to replace
> swap_info_struct->lock with swap_cluster_info->lock in
> __swap_duplicate() if possible to improve the scalability of swap
> operations.  But because swapoff is a really rare operation, I don't
> think it's necessary to backport the fix.

Well, a lack of any bug reports would support your theory that this is
unlikely to hit in practice. Fixes tag would be nice to have regardless
though.

Thanks!
-- 
Michal Hocko
SUSE Labs