Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752110AbbHRAfy (ORCPT ); Mon, 17 Aug 2015 20:35:54 -0400 Received: from mail-pa0-f41.google.com ([209.85.220.41]:34207 "EHLO mail-pa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750975AbbHRAfv (ORCPT ); Mon, 17 Aug 2015 20:35:51 -0400 Date: Mon, 17 Aug 2015 17:34:27 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Andrew Morton cc: Andrey Konovalov , Al Viro , Hugh Dickins , Michal Hocko , Johannes Weiner , Vladimir Davydov , Jason Low , Cesar Eduardo Barros , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dmitry Vyukov , Kostya Serebryany , Alexander Potapenko Subject: [PATCH] mm: fix potential data race in SyS_swapon In-Reply-To: Message-ID: References: User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3439 Lines: 100 While running KernelThreadSanitizer (ktsan) on upstream kernel with trinity, we got a few reports from SyS_swapon, here is one of them: Read of size 8 by thread T307 (K7621): [< inlined >] SyS_swapon+0x3c0/0x1850 SYSC_swapon mm/swapfile.c:2395 [] SyS_swapon+0x3c0/0x1850 mm/swapfile.c:2345 [] ia32_do_call+0x1b/0x25 Looks like the swap_lock should be taken when iterating through the swap_info array on lines 2392 - 2401: q->swap_file may be reset to NULL by another thread before it is dereferenced for f_mapping. But why is that iteration needed at all? Doesn't the claim_swapfile() which follows do all that is needed to check for a duplicate entry - FMODE_EXCL on a bdev, testing IS_SWAPFILE under i_mutex on a regfile? Well, not quite: bd_may_claim() allows the same "holder" to claim the bdev again, so we do need to use a different holder than "sys_swapon"; and we should not replace appropriate -EBUSY by inappropriate -EINVAL. Index i was reused in a cpu loop further down: renamed cpu there. Reported-by: Andrey Konovalov Signed-off-by: Hugh Dickins --- mm/swapfile.c | 25 +++++++------------------ 1 file changed, 7 insertions(+), 18 deletions(-) --- 4.2-rc7/mm/swapfile.c 2015-07-05 19:25:02.852131158 -0700 +++ linux/mm/swapfile.c 2015-08-16 21:30:22.694123923 -0700 @@ -2143,11 +2143,10 @@ static int claim_swapfile(struct swap_in if (S_ISBLK(inode->i_mode)) { p->bdev = bdgrab(I_BDEV(inode)); error = blkdev_get(p->bdev, - FMODE_READ | FMODE_WRITE | FMODE_EXCL, - sys_swapon); + FMODE_READ | FMODE_WRITE | FMODE_EXCL, p); if (error < 0) { p->bdev = NULL; - return -EINVAL; + return error; } p->old_block_size = block_size(p->bdev); error = set_blocksize(p->bdev, PAGE_SIZE); @@ -2348,7 +2347,6 @@ SYSCALL_DEFINE2(swapon, const char __use struct filename *name; struct file *swap_file = NULL; struct address_space *mapping; - int i; int prio; int error; union swap_header *swap_header; @@ -2388,19 +2386,8 @@ SYSCALL_DEFINE2(swapon, const char __use p->swap_file = swap_file; mapping = swap_file->f_mapping; - - for (i = 0; i < nr_swapfiles; i++) { - struct swap_info_struct *q = swap_info[i]; - - if (q == p || !q->swap_file) - continue; - if (mapping == q->swap_file->f_mapping) { - error = -EBUSY; - goto bad_swap; - } - } - inode = mapping->host; + /* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */ error = claim_swapfile(p, inode); if (unlikely(error)) @@ -2433,6 +2420,8 @@ SYSCALL_DEFINE2(swapon, const char __use goto bad_swap; } if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) { + int cpu; + p->flags |= SWP_SOLIDSTATE; /* * select a random position to start with to help wear leveling @@ -2451,9 +2440,9 @@ SYSCALL_DEFINE2(swapon, const char __use error = -ENOMEM; goto bad_swap; } - for_each_possible_cpu(i) { + for_each_possible_cpu(cpu) { struct percpu_cluster *cluster; - cluster = per_cpu_ptr(p->percpu_cluster, i); + cluster = per_cpu_ptr(p->percpu_cluster, cpu); cluster_set_null(&cluster->index); } } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/