Date: Thu, 22 Dec 2016 10:50:30 -0800
From: Chris Leech
To: Dave Chinner
Cc: Linus Torvalds, Johannes Weiner, Linux Kernel Mailing List, Lee Duncan, open-iscsi@googlegroups.com, Linux SCSI List, linux-block@vger.kernel.org, Christoph Hellwig
Subject: Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0
Message-ID: <20161222185030.so4btkuzzkih3owz@straylight.hirudinean.org>
References: <20161214222411.GH4326@dastard> <20161214222953.GI4326@dastard> <20161216185906.t2wmrr6wqjdsrduw@straylight.hirudinean.org> <20161221221638.GD4758@dastard> <20161222001303.nvrtm22szn3hgxar@straylight.hirudinean.org> <20161222051322.GF4758@dastard> <20161222065012.GI4758@dastard>
In-Reply-To: <20161222065012.GI4758@dastard>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 22, 2016 at 05:50:12PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner wrote:
> > >
> > > There may be deeper issues. I just started running scalability tests
> > > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > > directory corruption reported - something I hadn't seen in the dev
> > > cycle at all.
> >
> > By "in the dev cycle", do you mean your XFS changes, or have you been
> > tracking the merge cycle at least for some testing?
>
> I mean the three months leading up to the 4.10 merge, when all the
> XFS changes were being tested against 4.9-rc kernels.
>
> The iscsi problem showed up when I updated the base kernel from
> 4.9 to 4.10-current last week to test the pullreq I was going to
> send you. I've been busy with other stuff until now, so I didn't
> upgrade my working trees again until today in the hope the iscsi
> problem had already been found and fixed.
>
> > > I unmounted the fs, mkfs'd it again, ran the
> > > workload again and about a minute in this fired:
> > >
> > > [628867.607417] ------------[ cut here ]------------
> > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220
> >
> > Well, part of the changes during the merge window were the shadow
> > entry tracking changes that came in through Andrew's tree. Adding
> > Johannes Weiner to the participants.
> >
> > > Now, this workload does not touch the page cache at all - it's
> > > entirely an XFS metadata workload, so it should not really be
> > > affecting the working set code.
> >
> > Well, I suspect that anything that creates memory pressure will end up
> > triggering the working set code, so ..
> >
> > That said, obviously memory corruption could be involved and result in
> > random issues too, but I wouldn't really expect that in this code.
> >
> > It would probably be really useful to get more data points - is the
> > problem reliably in this area, or is it going to be random and all
> > over the place.
>
> The iscsi problem is 100% reproducible. create a pair of iscsi luns,
> mkfs, run xfstests on them. iscsi fails a second after xfstests mounts
> the filesystems.
>
> The test machine I'm having all these other problems on? stable and
> steady as a rock using PMEM devices. Moment I go to use /dev/vdc
> (i.e. run load/perf benchmarks) it starts falling over left, right
> and center.

I'm not reproducing any problems with xfstests running over iscsi_tcp
right now. Two 10G luns exported from an LIO target, attached directly
to a test VM as sda/sdb and xfstests configured to use sda1/sdb1 as
TEST_DEV and SCRATCH_DEV.

The virtio scatterlist issue that popped right away for me is triggered
by an hdparm ioctl, which is being run by tuned on Fedora. And that
actually seems to happen back on 4.9 as well :(

Chris