References: <0000000000009a01370582c6772a@google.com> <20190226151738.GA6430@mit.edu>
In-Reply-To: <20190226151738.GA6430@mit.edu>
From: Dmitry Vyukov
Date: Wed, 27 Feb 2019 10:58:50 +0100
Subject: Re: INFO: rcu detected stall in ext4_file_write_iter
To: "Theodore Y. Ts'o", syzbot, Andreas Dilger, linux-ext4@vger.kernel.org, LKML, linux-fsdevel, syzkaller-bugs, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 26, 2019 at 4:17 PM Theodore Y. Ts'o wrote:
>
> TL;DR: This doesn't appear to be ext4 specific, and seems to involve
> an unholy combination of the perf_event_open(2) and sendfile(2) system
> calls.
>
> On Mon, Feb 25, 2019 at 10:50:05PM -0800, syzbot wrote:
> > syzbot found the following crash on:
> >
> > HEAD commit: 8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=161b71d4c00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7d19c5fe6a3f1161abb7
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103908f8c00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=105e5cd0c00000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+7d19c5fe6a3f1161abb7@syzkaller.appspotmail.com
> >
> > audit: type=1400 audit(1550814986.750:36): avc: denied { map } for
> > pid=8058 comm="syz-executor004" path="/root/syz-executor004991115"
> > dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> > tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> > hrtimer: interrupt took 42841 ns
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: (detected by 1, t=10502 jiffies, g=5873, q=2)
> > rcu: All QSes seen, last rcu_preempt kthread activity 10502
> > (4295059997-4295049495), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > syz-executor004 R running task 26448 8069 8060 0x00000000
>
> This particular repro seems to induce similar failures when I tried it
> with xfs and btrfs as well as ext4.
>
> The repro seems to involve the perf_event_open(2) and sendfile(2)
> system calls, and killing the process which is performing the
> sendfile(2). The repro also uses the sched_setattr(2) system call,
> but when I commented it out, the failure still happened, so this
> appears to be another case of "Syzkaller? We don't need to bug
> developers with a minimal test case! Open source developers are a
> free unlimited resource, after all!" :-)
>
> Commenting out the perf_event_open(2) does seem to make the problem go
> away.
>
> Since there are zillions of ways to self-DOS a Linux server without
> having to resort to exotic combination of system calls, this isn't
> something I'm going to prioritize for myself, but I'm hoping someone
> else has time and curiosity.

Peter, Ingo, do you have any updates on the
perf_event_open/sched_setattr stalls? This bug causes assorted hangs
throughout the kernel and so is nasty.

syzkaller tries to remove all syscalls from reproducers one by one.
Somehow, without sched_setattr the hang did not reproduce (a bunch of
repros have perf_event_open+sched_setattr, so they seem to be
related).

The kernel is not as simple as a single-threaded, deterministic,
fully reproducible user-space XML parsing library: many (almost all)
aspects are flaky and non-deterministic and thus require more human
intelligence. But even with perfect repros, machines still won't be
able to tell in all cases that, even though the hang happened in ext4
code, the root cause is actually another, scheduler-related system call.
So thanks for looking into this.
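
To make the "one by one" removal concrete, here is a rough sketch of
greedy reproducer minimization. It is an illustration only, not
syzkaller's actual code: the names minimize, reproduces and toy_oracle
are hypothetical, and the toy oracle simply assumes the hang needs
perf_event_open and sendfile.

```python
from typing import Callable, List, Sequence


def minimize(calls: Sequence[str],
             reproduces: Callable[[Sequence[str]], bool],
             attempts: int = 3) -> List[str]:
    """Greedily drop calls one at a time while the bug still reproduces."""
    kept = list(calls)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        # A flaky bug may need several runs before it triggers, so retry.
        if any(reproduces(candidate) for _ in range(attempts)):
            kept = candidate  # call i was not needed; same index is now the next call
        else:
            i += 1            # dropping call i lost the bug; keep it
    return kept


def toy_oracle(calls: Sequence[str]) -> bool:
    # Hypothetical stand-in for "run the program and watch for a hang":
    # assume the hang needs both perf_event_open and sendfile.
    return {"perf_event_open", "sendfile"} <= set(calls)


program = ["perf_event_open", "sched_setattr", "sendfile", "kill"]
print(minimize(program, toy_oracle))
```

With a deterministic oracle this drops everything that is not needed.
With a flaky one, any call whose removal happens to make all the
attempts miss gets kept, which is one way an unneeded sched_setattr
can survive minimization.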