From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, Denis Kirjanov, "Josef Bacik", "Filipe Manana"
Date: Wed, 02 Oct 2019 20:06:51 +0100
Subject: [PATCH 3.16 62/87] Btrfs: fix race between readahead and device replace/removal
X-Mailing-List: linux-kernel@vger.kernel.org

3.16.75-rc1
review patch.  If anyone has any objections, please let me know.

------------------

From: Filipe Manana

commit ce7791ffee1e1ee9f97193b817c7dd1fa6746aad upstream.

The list of devices is protected by the device_list_mutex and the device
replace code, in its finishing phase correctly takes that mutex before
removing the source device from that list. However the readahead code was
iterating that list without acquiring the respective mutex leading to
crashes later on due to invalid memory accesses:

[125671.831036] general protection fault: 0000 [#1] PREEMPT SMP
[125671.832129] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis tpm ppdev evdev parport_pc psmouse sg parport processor ser
[125671.834973] CPU: 10 PID: 19603 Comm: kworker/u32:19 Tainted: G W 4.6.0-rc7-btrfs-next-29+ #1
[125671.834973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[125671.834973] Workqueue: btrfs-readahead btrfs_readahead_helper [btrfs]
[125671.834973] task: ffff8801ac520540 ti: ffff8801ac918000 task.ti: ffff8801ac918000
[125671.834973] RIP: 0010:[] [] __radix_tree_lookup+0x6a/0x105
[125671.834973] RSP: 0018:ffff8801ac91bc28 EFLAGS: 00010206
[125671.834973] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6a RCX: 0000000000000000
[125671.834973] RDX: 0000000000000000 RSI: 00000000000c1bff RDI: ffff88002ebd62a8
[125671.834973] RBP: ffff8801ac91bc70 R08: 0000000000000001 R09: 0000000000000000
[125671.834973] R10: ffff8801ac91bc70 R11: 0000000000000000 R12: ffff88002ebd62a8
[125671.834973] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000c1bff
[125671.834973] FS: 0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[125671.834973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[125671.834973] CR2: 000000000073cae4 CR3: 00000000b7723000 CR4: 00000000000006e0
[125671.834973] Stack:
[125671.834973]  0000000000000000 ffff8801422d5600 ffff8802286bbc00 0000000000000000
[125671.834973]  0000000000000001 ffff8802286bbc00 00000000000c1bff 0000000000000000
[125671.834973]  ffff88002e639eb8 ffff8801ac91bc80 ffffffff81270541 ffff8801ac91bcb0
[125671.834973] Call Trace:
[125671.834973]  [] radix_tree_lookup+0xd/0xf
[125671.834973]  [] reada_peer_zones_set_lock+0x3e/0x60 [btrfs]
[125671.834973]  [] reada_pick_zone+0x29/0x103 [btrfs]
[125671.834973]  [] reada_start_machine_worker+0x129/0x2d3 [btrfs]
[125671.834973]  [] btrfs_scrubparity_helper+0x185/0x3aa [btrfs]
[125671.834973]  [] btrfs_readahead_helper+0xe/0x10 [btrfs]
[125671.834973]  [] process_one_work+0x271/0x4e9
[125671.834973]  [] worker_thread+0x1eb/0x2c9
[125671.834973]  [] ? rescuer_thread+0x2b3/0x2b3
[125671.834973]  [] kthread+0xd4/0xdc
[125671.834973]  [] ret_from_fork+0x22/0x40
[125671.834973]  [] ? kthread_stop+0x286/0x286

So fix this by taking the device_list_mutex in the readahead code. We can't
use here the lighter approach of using a rcu_read_lock() and
rcu_read_unlock() pair together with a list_for_each_entry_rcu() call
because we end up doing calls to sleeping functions (kzalloc()) in the
respective code path.

Signed-off-by: Filipe Manana
Reviewed-by: Josef Bacik
Signed-off-by: Ben Hutchings
---
 fs/btrfs/reada.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -766,12 +766,14 @@ static void __reada_start_machine(struct
 
 	do {
 		enqueued = 0;
+		mutex_lock(&fs_devices->device_list_mutex);
 		list_for_each_entry(device, &fs_devices->devices, dev_list) {
 			if (atomic_read(&device->reada_in_flight) <
 			    MAX_IN_FLIGHT)
 				enqueued += reada_start_machine_dev(fs_info,
 								    device);
 		}
+		mutex_unlock(&fs_devices->device_list_mutex);
 		total += enqueued;
 	} while (enqueued && total < 10000);
 
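
For readers who want to experiment with the locking pattern outside the
kernel, here is a minimal userspace sketch of the same idea. It is not
the btrfs code: struct device, struct device_list, the helper names and
the value of MAX_IN_FLIGHT are hypothetical stand-ins, a pthread mutex
plays the role of device_list_mutex, and malloc() stands in for the
sleeping kzalloc() call. The point it illustrates is only that the
reader and the remover take the same lock, so the walk can never touch
a freed entry, and that the per-device work is allowed to block while
the mutex is held.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the btrfs types. */
struct device {
	int in_flight;
	struct device *next;
};

struct device_list {
	pthread_mutex_t lock;	/* plays the role of device_list_mutex */
	struct device *head;
};

#define MAX_IN_FLIGHT 6		/* illustrative threshold */

static void device_list_init(struct device_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head = NULL;
}

/* Insert a device at the head. Returns 0 on success, -1 if malloc fails. */
static int device_list_add(struct device_list *l, int in_flight)
{
	struct device *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->in_flight = in_flight;
	pthread_mutex_lock(&l->lock);
	d->next = l->head;
	l->head = d;
	pthread_mutex_unlock(&l->lock);
	return 0;
}

/* Writer side: unlink and free the head under the lock (like device
 * replace/removal). Returns 1 if a device was removed, 0 if list empty. */
static int device_list_remove_head(struct device_list *l)
{
	struct device *d;

	pthread_mutex_lock(&l->lock);
	d = l->head;
	if (d)
		l->head = d->next;
	pthread_mutex_unlock(&l->lock);
	free(d);		/* free(NULL) is a no-op */
	return d != NULL;
}

/* Reader side: walk the list with the lock held, as the patched loop
 * does. The allocation may block, which is fine under a mutex. */
static int start_readahead_all(struct device_list *l)
{
	int enqueued = 0;
	struct device *d;

	pthread_mutex_lock(&l->lock);
	for (d = l->head; d; d = d->next) {
		if (d->in_flight < MAX_IN_FLIGHT) {
			void *req = malloc(64);	/* stand-in for kzalloc() */

			if (req) {
				enqueued++;
				free(req);
			}
		}
	}
	pthread_mutex_unlock(&l->lock);
	return enqueued;
}
```

Calling start_readahead_all() from one thread while another calls
device_list_remove_head() is safe in this sketch because both serialize
on the same mutex; dropping the lock from the reader reproduces the
class of use-after-free that the oops above shows.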