Date: Tue, 9 Oct 2018 04:09:49 +0200
From: Dominique Martinet
To: syzbot
Cc: davem@davemloft.net, ericvh@gmail.com, linux-kernel@vger.kernel.org,
 lucho@ionkov.net, netdev@vger.kernel.org, rminnich@sandia.gov,
 syzkaller-bugs@googlegroups.com, v9fs-developer@lists.sourceforge.net
Subject: Re: BUG: corrupted list in p9_read_work
Message-ID: <20181009020949.GA29622@nautica>
References: <000000000000ca61cd0571178677@google.com>
 <000000000000fddb150577c15af6@google.com>
In-Reply-To: <000000000000fddb150577c15af6@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

syzbot wrote on Mon, Oct 08, 2018:
> syzbot has found a reproducer for the following crash on:
>
> HEAD commit: 0854ba5ff5c9 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1514ec06400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=88e9a8a39dc0be2d
> dashboard link: https://syzkaller.appspot.com/bug?extid=2222c34dc40b515f30dc
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10b91685400000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com
>
> list_del corruption, ffff88019ae36ee8->next is LIST_POISON1
> (dead000000000100)
> ------------[ cut here ]------------
> [...]
> list_del include/linux/list.h:125 [inline]
> p9_read_work+0xab6/0x10e0 net/9p/trans_fd.c:379

Hmm this looks very much like the report from
syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com which should have
been fixed by Tomas in 9f476d7c540cb ("net/9p/trans_fd.c: fix race by
holding the lock")...

It looks like another double list_del, looking at the code again there
actually are other ways this could happen around connection errors.
For example,
 - p9_read_work receives something and lookup works... meanwhile
 - p9_write_work fails to write and calls p9_conn_cancel, which deletes
   from the req_list without waiting for other works to finish (could
   also happen in p9_poll_mux)
 - p9_read_work finishes processing the read and deletes from list again

For this one the simplest fix would probably be to just not
list_del/call p9_client_cb at all if m->r?req->status isn't
REQ_STATUS_ERROR in p9_read_work after the "got new packet" debug print,
and frankly I think that's saner so I'll send a patch shortly doing
that, but I have zero confidence there aren't similar bugs around, the
tcp code is so messy...

Most of the syzbot reports recently have been around trans_fd which I
don't think is used much in real life, and this is not really
motivating (i.e.
I think it would probably need a more extensive rewrite but nobody
cares) :/

Dmitry, on that note, do you think syzbot could possibly test other
transports somehow? rdma or virtio cannot be faked as easily as passing
a fd around, but I'd be very interested in seeing these flayed a bit.

(I'm also curious what logic is used to generate the syz tests, the
write$P9_Rxx replies have nothing to do with what the client would
expect so it probably doesn't test very far; this test in particular
does not even get past the initial P9_TVERSION that the client would
expect immediately after mount, so it's basically only testing logic
around packet handling on error...
Or if we're accepting a RREADDIR in reply to TVERSION we have bigger
problems, and now I'm looking at it I think we just might never check
that....... I'll look at that for the next cycle)

Back to the current patch, since as I said I am not confident this is a
good enough fix for the current bug, will I get notified if the bug
happens again once the patch hits linux-next with the Reported-by tag ?
(I don't have the setup necessary to run a syz repro as there is no C
repro, and won't have much time to do that setup sorry)

> FS-Cache: N-cookie d=000000000a092700 n=00000000d8ee0022
> FS-Cache: N-key=[10] '34323935303034313132'
> FS-Cache: Duplicate cookie detected
> FS-Cache: O-cookie c=00000000911358e4 [p=000000006545c95d fl=222 nc=0 na=1]
> FS-Cache: O-cookie d=000000000a092700 n=000000007635356b
> FS-Cache: O-key=[10] '
> [...]

(on an unrelated topic, I got these FS-Cache warnings quite often when
testing with cache enabled and have no idea what they mean. I don't
normally use cache so haven't spent time looking at it, but I find these
rather worrying... If someone having a clue reads this, I'd love to hear
what they could mean and what we should look at)

Thanks,
-- 
Dominique
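
As an illustration of the double list_del sequence described in the
message above, here is a minimal user-space sketch in C. It is not the
kernel code: every name in it (struct req, list_del_checked,
conn_cancel, read_done, run_race) is a hypothetical stand-in that only
borrows the kernel's vocabulary, the two work items run one after the
other instead of concurrently, and the guard is just one reading of the
proposed fix, namely skipping the removal and the callback once the
request has already been marked as errored.

```c
#include <stddef.h>

/* Hypothetical stand-ins; they only mirror the kernel names. */
enum req_status { REQ_STATUS_SENT, REQ_STATUS_RCVD, REQ_STATUS_ERROR };

struct list_node { struct list_node *prev, *next; };

struct req {
    struct list_node req_list;
    enum req_status status;
    int callbacks;              /* times the completion callback ran */
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink a node. Returns -1 instead of crashing when the node was
 * already unlinked, which stands in for "list_del corruption". */
static int list_del_checked(struct list_node *n)
{
    if (n->next == NULL || n->prev == NULL)
        return -1;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;   /* simplified poison value */
    return 0;
}

static void client_cb(struct req *r, enum req_status s)
{
    r->status = s;
    r->callbacks++;
}

/* Error path: the write or poll side cancels the pending request. */
static int conn_cancel(struct req *r)
{
    int ret = list_del_checked(&r->req_list);

    client_cb(r, REQ_STATUS_ERROR);
    return ret;
}

/* Read path: finish a reply that was looked up earlier. With the
 * guard, a request that was cancelled in the meantime is left alone. */
static int read_done(struct req *r, int guarded)
{
    int ret;

    if (guarded && r->status == REQ_STATUS_ERROR)
        return 0;               /* already removed and completed */
    ret = list_del_checked(&r->req_list);
    client_cb(r, REQ_STATUS_RCVD);
    return ret;
}

/* Replay the sequence: cancel first, then complete the read.
 * Returns how many list removals hit an already-unlinked node. */
int run_race(int guarded)
{
    struct list_node head;
    struct req r = { .status = REQ_STATUS_SENT };
    int failures = 0;

    list_init(&head);
    list_add_tail(&r.req_list, &head);

    if (conn_cancel(&r) < 0)
        failures++;
    if (read_done(&r, guarded) < 0)
        failures++;
    return failures;
}
```

Calling run_race(0) replays the unguarded sequence and reports one bad
removal, while run_race(1) reports none because the read side backs off.
In the kernel the status check would additionally have to happen under
the same lock that protects req_list, which this sequential sketch does
not model.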