From: Ming Lei
Date: Tue, 6 Dec 2016 17:53:46 +0800
Subject: Re: [bug report v4.8] fs/locks.c: kernel oops during posix lock stress test
To: Will Deacon
Cc: Linux FS Devel, Alexander Viro, Jeff Layton, "J. Bruce Fields", Catalin Marinas, Linux Kernel Mailing List, linux-arm-kernel, David Daney
In-Reply-To: <20161201113031.GB5813@arm.com>
References: <20161201113031.GB5813@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Will,

On Thu, Dec 1, 2016 at 7:30 PM, Will Deacon wrote:
> On Mon, Nov 28, 2016 at 11:10:14AM +0800, Ming Lei wrote:
>> When I run stress-ng via the following steps on an ARM64 dual-socket
>> system (Cavium Thunder), the kernel oops [1] can often be
>> triggered after running the stress test for several hours (sometimes
>> it may take longer):
>>
>> - git clone git://kernel.ubuntu.com/cking/stress-ng.git
>> - apply the attached patch, which just makes the POSIX file
>>   lock stress test more aggressive
>> - run the test via '~/git/stress-ng$ ./stress-ng --lockf 128 --aggressive'
>>
>>
>> From the oops log, it looks like a garbage file_lock node is obtained
>> from the linked list 'ctx->flc_posix' when the issue happens.
>>
>> BTW, the issue hasn't been observed on a single-socket Cavium Thunder
>> yet, and the same issue can be seen on Ubuntu Xenial (v4.4-based
>> kernel) too.
>
> FWIW, I've been running this on Seattle for 24 hours with your patch applied
> and not seen any problems yet.
> That said, Thomas did just fix an rt_mutex
> race which only seemed to pop up on Thunder, so you could give those
> patches a try.
>
> https://lkml.kernel.org/r/20161130205431.629977871@linutronix.de

I applied the patches to the Ubuntu Yakkety kernel (v4.8-based) and ran
the test again on a dual-socket Cavium ThunderX system, and the issue can
still be triggered. So it looks like this is not the same issue as
David Daney's.

Anyway, thank you for providing this input!

Thanks,
Ming