Received: by 10.192.165.148 with SMTP id m20csp4323579imm; Tue, 8 May 2018 06:45:20 -0700 (PDT) X-Google-Smtp-Source: AB8JxZp/K2cErISHX+oi3Uj5bsCKGP/S8s3b/ivRsRh/SMKE1XOZZSdvzL+OynEvPt4Lyb0MDgPX X-Received: by 10.98.172.20 with SMTP id v20mr40838105pfe.101.1525787120724; Tue, 08 May 2018 06:45:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525787120; cv=none; d=google.com; s=arc-20160816; b=bEewzSqucO3Z9/UaD3ovMWe4atiAKxzNaO0p2lbhXLGpTCy3M9apYzEsgI/GQTivlc wIx7QPOVQfGYmGxjgAHF4Hli8HZPpEIE63z4KGwEJ2WT4N7jFjKe8EhikmRNahz+H8ma OvCH9jHE10YPzF8lR9eA5q7cYobmX2YD8fsonDcHIDDDatCKG2yUzyyKibjrx7pbLEfr vJbzyHxIbMKt/qKn6gVHnKKsDKric7VN6jDVTgn899Z3Jd1oNWT9t6gRnfu2uSeg5EG2 I4QSYFPKXmq0Vq8KxjZ9MqunyL8BjHwsRm5svw7I6yz+fWmrWJKOprFaIWxs12JvcN0X 99sA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:references:in-reply-to:mime-version :dkim-signature:arc-authentication-results; bh=HbICkmKSUiRapC0qeuFiznUIF6bwYOsLcR9bs5tkCe0=; b=I5LNtgveXCY1omXGIQL8pzWUVdfalk1Vzmp+kYMsb1m/9iejTlN300cLxnn+DC4fXV pMdu/RjFVHmTvuN7InL2YE9XTE45yKHi5B2Ybnog+WokjM3ZM/hDZ1/Y26ubuoQpblZS Zj6Yb2MPBqJgxxQY4qAkwXYpUHZvTyTn461F/AwUr1IZC0d4hxGiA6s67XtmXMAbvKe7 xRD15Ez0f1xCbF23KAE5PgurtRSH03WYdRmNK4snsEp9r+XEccH6Ry4AtRoq6aWAzhs8 pEFdNXc58UimVehHpjmejtJHY0yc7HCdwIXNR9RTOEKVBKA41eCFm/N75ToXTt/Y3OI2 wVuQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=becZVfu8; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n136si24331926pfd.312.2018.05.08.06.45.05; Tue, 08 May 2018 06:45:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=becZVfu8; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755211AbeEHNnt (ORCPT + 99 others); Tue, 8 May 2018 09:43:49 -0400 Received: from mail-wr0-f194.google.com ([209.85.128.194]:36142 "EHLO mail-wr0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752563AbeEHNnr (ORCPT ); Tue, 8 May 2018 09:43:47 -0400 Received: by mail-wr0-f194.google.com with SMTP id p4-v6so2080640wrh.3 for ; Tue, 08 May 2018 06:43:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=HbICkmKSUiRapC0qeuFiznUIF6bwYOsLcR9bs5tkCe0=; b=becZVfu8gGIc1fxfClrEAUK0ttFXXEOgENAHax2gD2o4+AbJqvtf+nxDU7oF3bRRgV t9skFukgWRm+N/XPTjDpqwjoVNL25f0j3omL6/d4WkkPGUpqMdl/L/C5680YD1+jbJHN GGRM37NyAD1ARN4E5Cwaa+0TcvRTYgARasQksRbN+yyv1I4UMhz8AKg1REGUol7Y3O4z YYKsmIXQZ+FP0J3NxoCLFzpEL7kDGzqJvaGElo99lXVtWtif4k4q2GAtPvPYGebEACP3 FmtabOoglmqUO31dSh8l/4oUUj6b4pll1TyYfcow8H0rdsr/v8DNC+UXEMCzJhviaJJe xZzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=HbICkmKSUiRapC0qeuFiznUIF6bwYOsLcR9bs5tkCe0=; b=bBK+hCX0S3GimAQ8Q6WU47eW595uZwMVAo+czB6pZOQ3C6l5UfZ5QZGzUU1a7fQT2U 6K0lcb3BtZPqkNDODyPNNYXoLQ8OymhDtN4TYQl+oLl0AvueVuqPitJCyrytQZ1nUoKI OPM1if8+YSZr6fWsE3FdiDXPaUcKZoKlp0cBz+IzDUt1WU8Dj277vakcEE61a05mMK8Z HY/3yvU6L2TG/g0BFZDAsg0nesU9lxAJLaUFVe9MUSf5XIbV+m2igUzVFen0hF+RYX68 vuS1w0rqmZfHVkJVZGHp03EkxRPl9J7bxoJ/+BS2pF5CBd4yXYgUa7y350nqK3mdc/JO eOLw== X-Gm-Message-State: ALQs6tBJX3a3v89Oyo+bbTrPXeTv4Lo3n5cm46/Nagn+4AYSu+YzK+2i UVOU0atJFh0+rsfh/kBlIVdV0O/NuR95kbB3SCU= X-Received: by 2002:adf:9654:: with SMTP id c20-v6mr35226167wra.89.1525787025930; Tue, 08 May 2018 06:43:45 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.174.74 with HTTP; Tue, 8 May 2018 06:43:44 -0700 (PDT) In-Reply-To: <20180222073330.36259-1-carmark.dlut@gmail.com> References: <20180222073330.36259-1-carmark.dlut@gmail.com> From: Vegard Nossum Date: Tue, 8 May 2018 15:43:44 +0200 Message-ID: Subject: Re: [PATCH] fscache: fix a kernel BUG at fs/fscache/operation.c:69! To: David Howells Cc: carmark.dlut@gmail.com, linux-cachefs@redhat.com, LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 22 February 2018 at 08:33, wrote: > From: Lei Xue > > There is a potential race in fscache operation enqueuing for reading and > copying multiple pages from cachefiles to netfs. > Under some heavy load system, it will happen very often. > > If this race occurs, an oops similar to the following is seen: > > kernel BUG at fs/fscache/operation.c:69! > invalid opcode: 0000 [#1] SMP > =E2=80=A6 > #0 [ffff883fff0838d8] machine_kexec at ffffffff81051beb > #1 [ffff883fff083938] crash_kexec at ffffffff810f2542 > #2 [ffff883fff083a08] oops_end at ffffffff8163e1a8 > #3 [ffff883fff083a30] die at ffffffff8101859b > #4 [ffff883fff083a60] do_trap at ffffffff8163d860 > #5 [ffff883fff083ab0] do_invalid_op at ffffffff81015204 > #6 [ffff883fff083b60] invalid_op at ffffffff8164701e > [exception RIP: fscache_enqueue_operation+246] > RIP: ffffffffa0b793c6 RSP: ffff883fff083c18 RFLAGS: 00010046 > RAX: 0000000000000019 RBX: ffff8832ed1a9ec0 RCX: 0000000000000006 > RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046 > RBP: ffff883fff083c20 R8: 0000000000000086 R9: 000000000000178f > R10: ffffffff816aeb00 R11: ffff883fff08392e R12: ffff8802f0525620 > R13: ffff88407ffc01d8 R14: 0000000000000000 R15: 0000000000000003 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 > #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 > #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 > #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 > > Signed-off-by: Lei Xue > --- > fs/cachefiles/rdwr.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c > index 883bc7bb12c5..9d5d13e150fb 100644 > --- a/fs/cachefiles/rdwr.c > +++ b/fs/cachefiles/rdwr.c > @@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_entry_t *w= ait, unsigned mode, > > spin_lock(&object->work_lock); > list_add_tail(&monitor->op_link, &monitor->op->to_do); > + fscache_enqueue_retrieval(monitor->op); > spin_unlock(&object->work_lock); > > - fscache_enqueue_retrieval(monitor->op); > return 0; > } Hi, Just wondering what the status of this patch is? We've been hitting a similar problem and arrived at the same patch as a potential fix for it. Our crashes look like this: WARNING: CPU: 0 PID: 120693 at kernel/workqueue.c:618 insert_work+0x5f/0x70 Modules linked in: nbd CPU: 0 PID: 120693 Comm: sh Not tainted 4.16.2-0 #1 Hardware name: Oracle Corporation Sun Fire X4800/20434, BIOS 11080200 08/12/2016 RIP: 0010:insert_work+0x5f/0x70 RSP: 0018:ffff88103fa039b8 EFLAGS: 00010046 RAX: ffff88103f443f00 RBX: ffff880187c37c00 RCX: 0000000000000005 RDX: ffff880187c37c20 RSI: ffff8807c04dec00 RDI: 0000000000000000 RBP: ffff88103fa039c8 R08: 0000000000000101 R09: 0000000000000001 R10: ffff887eee68fd40 R11: 0000000000000001 R12: ffff88503fafc600 R13: 000000000001cf60 R14: ffff880187c37c00 R15: ffff88103f443f00 FS: 0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:000000000000000= 0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f394d2780a0 CR3: 0000000bcc936000 CR4: 00000000000006f0 Call Trace: __queue_work+0x11f/0x320 queue_work_on+0x19/0x30 fscache_enqueue_operation+0x83/0x160 cachefiles_read_waiter+0xd2/0x130 __wake_up_common+0x81/0x120 __wake_up_locked_key_bookmark+0x16/0x20 wake_up_page_bit+0x97/0xe0 unlock_page+0x20/0x30 page_endio+0x21/0xa0 mpage_end_io+0x41/0x60 bio_endio+0x78/0x90 dec_pending+0x140/0x250 ? linear_status+0x40/0x40 clone_endio+0x86/0x100 bio_endio+0x78/0x90 blk_update_request+0x8d/0x2b0 scsi_end_request+0x36/0x200 scsi_io_completion+0x12a/0x5e0 scsi_finish_command+0xf2/0x150 scsi_softirq_done+0x13e/0x160 __blk_mq_complete_request+0xb8/0x180 blk_mq_complete_request+0x57/0x70 scsi_mq_done+0x10/0x20 megasas_complete_cmd+0xdf/0x620 megasas_complete_cmd_dpc+0x8f/0x100 tasklet_action+0x9a/0xb0 __do_softirq+0xbf/0x1c8 irq_exit+0x9c/0xb0 do_IRQ+0x5b/0xe0 common_interrupt+0xf/0xf RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10 RSP: 0018:ffffc900309e3cf8 EFLAGS: 00000296 ORIG_RAX: ffffffffffffffde RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000001 RDX: ffffea0006793fe0 RSI: 0000000000000296 RDI: ffff88107ffff800 RBP: ffffc900309e3cf8 R08: 0000000000000002 R09: 000000000011b912 R10: 00000000000000e7 R11: 0000000000000000 R12: ffffea0014baa000 R13: ffff88103fa1d120 R14: ffff88107fff6000 R15: ffff88107fff6000 pagevec_lru_move_fn+0xb7/0xe0 ? pagevec_move_tail_fn+0x350/0x350 __pagevec_lru_add+0x12/0x20 lru_add_drain_cpu+0xc4/0xe0 lru_add_drain+0x10/0x20 exit_mmap+0x58/0x190 ? __handle_mm_fault+0x9a4/0x1540 ? hrtimer_try_to_cancel+0x1b/0xa0 mmput+0x4e/0x100 do_exit+0x22f/0xa10 do_group_exit+0x3a/0xa0 SyS_exit_group+0x12/0x20 do_syscall_64+0x61/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7f394d325fa8 RSP: 002b:00007ffda407e668 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f394d603830 RCX: 00007f394d325fa8 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffffa8 R10: 00007f394d6097e0 R11: 0000000000000246 R12: 00007f394d603830 R13: 00007ffda407f8b2 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 4e 10 4c 89 01 e8 30 f8 ff ff f0 83 44 24 fc 00 8b 83 40 03 00 00 85 c0 75 08 f0 ff ff 48 83 c4 08 5b 5d c3 <0f> 0b eb b7 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 ---[ end trace e5ddade8fd0233c4 ]--- Some inconclusive notes, FWIW: First we get the workqueue warning where it's hitting the warning in set_work_data() where it checks that the work is not pending: static inline void set_work_data(struct work_struct *work, unsigned long da= ta, unsigned long flags) { WARN_ON_ONCE(!work_pending(work)); atomic_long_set(&work->data, data | flags | work_static(work)); } This is odd because the flag should have been set just a few moments before, in queue_work_on(): bool queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work) { bool ret =3D false; unsigned long flags; local_irq_save(flags); if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)= )) { __queue_work(cpu, wq, work); ret =3D true; } local_irq_restore(flags); return ret; } Then we have the next part giving a few more clues: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 =3D=3D 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! invalid opcode: 0000 [#1] SMP Modules linked in: nbd CPU: 1 PID: 122494 Comm: kworker/u257:3 Tainted: G W 4.16.2-0= #1 Hardware name: Oracle Corporation Sun Fire X4800/20434, BIOS 11080200 08/12/2016 Workqueue: fscache_operation fscache_op_work_func RIP: 0010:fscache_put_operation+0x1e3/0x1f0 RSP: 0018:ffffc9002ff8be08 EFLAGS: 00010296 RAX: 0000000000000019 RBX: ffff8807c04dec00 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff88103fa56398 RDI: ffff88103fa56398 RBP: ffffc9002ff8be18 R08: 0000000000000000 R09: 0000000000000771 R10: ffffffff820afc80 R11: 0000000000000770 R12: ffff8807c04dec00 R13: ffff88103f443f00 R14: 0000000000000000 R15: ffff880187c37c00 FS: 0000000000000000(0000) GS:ffff88103fa40000(0000) knlGS:000000000000000= 0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002343008 CR3: 000000000240a000 CR4: 00000000000006e0 Call Trace: fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 Code: 31 c0 e8 26 11 ef ff 48 c7 c7 92 ee 2c 82 31 c0 e8 18 11 ef ff 8b 73 40 ba 05 00 00 ad 2a 82 31 c0 e8 02 11 ef ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 8 RIP: fscache_put_operation+0x1e3/0x1f0 RSP: ffffc9002ff8be08 ---[ end trace e5ddade8fd0233c5 ]--- The assertion comes from fscache_put_operation(): ASSERTIFCMP(op->state !=3D FSCACHE_OP_ST_INITIALISED && op->state !=3D FSCACHE_OP_ST_COMPLETE, op->state, =3D=3D, FSCACHE_OP_ST_CANCELLED); It expects op->state to be either INITIALISED, COMPLETE, or CANCELLED, but it is in fact DEAD.The only place that sets it to DEAD is fscache_put_operation() itself (a few lines after this assert). So we somehow have an operation that manages to have its refcount drop to 0 twice in a row -- without hitting the assertion that checks for a positive refcount. We can tell from the first stack trace that we were just in fscache_enqueue_operation(), which has the following code: atomic_inc(&op->usage); if (!queue_work(fscache_op_wq, &op->work)) fscache_put_operation(op); The reason we don't hit the positive refcount assertion is that the atomic_inc() above probably brings it from 0 to 1. It does smell a lot like a race causing premature free + use-after-free explaining the weird diagnostics. It takes about 15-20 minutes to reproduce the crash, with the patch above it's gone, but we can test alternative patches. Vegard