From: Jan Kara
Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable
Date: Tue, 23 Sep 2014 11:42:04 +0200
Message-ID: <20140923094204.GB2359@quack.suse.cz>
References: <541AD93A.70203@profihost.ag> <20140918192131.GD19520@thunk.org>
 <541B32A1.3080706@profihost.ag> <20140918194311.GE19520@thunk.org>
 <541FC817.7030401@profihost.ag> <20140922164715.GB4572@thunk.org>
 <54206AA2.1050607@profihost.ag> <20140922202004.GF4572@thunk.org>
 <54212641.9010808@profihost.ag>
Cc: Theodore Ts'o, linux-ext4@vger.kernel.org, "p.herz@profihost.ag >> Philipp Herz - Profihost AG", stable@vger.kernel.org
To: Stefan Priebe - Profihost AG
In-Reply-To: <54212641.9010808@profihost.ag>

On Tue 23-09-14 09:50:25, Stefan Priebe - Profihost AG wrote:
>
> On 22.09.2014 at 22:20, Theodore Ts'o wrote:
> > On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
> >> Hi,
> >> On 22.09.2014 18:47, Theodore Ts'o wrote:
> >>> On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >>>>> That's not the whole message; you just weren't able to capture it all.
> >>>>> How are you capturing these messages, by the way? Serial console?
> >>>>
> >>>> Sorry, this was an incomplete copy and paste on my part.
> >>>>
> >>>> Here is the complete output:
> >>>> [1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> >>>> [1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
> >>>
> >>> OK, thanks, this is a known bug: when ext4 is under heavy memory
> >>> pressure, we can end up stalling in reclaim.
> >>> This message indicates that the system got stalled for 22 seconds.
> >>> That is not good, since it hurts the interactivity of your system
> >>> and increases the long-tail latency of requests to servers running
> >>> on it, but it doesn't cause any data loss, nor will it cause any of
> >>> your processes to crash or otherwise stop functioning (except
> >>> temporarily).
> >>>
> >>> It's something that we are working on. There are patches which
> >>> Zheng Liu submitted that still need a bit of polishing, but I hope to
> >>> have it addressed soon.
> >>
> >> Thanks for your feedback. Will those patches go to stable? Is there
> >> a link to those patches?
> >
> > I'm not sure they will go to stable when they are ready, because the
> > patches are somewhat complex and so they may not apply cleanly to much
> > older kernels.
> >
> > The patches under discussion (some have been applied, others have been
> > waiting for some requested changes) can be found here:
> >
> > http://patchwork.ozlabs.org/patch/377720
> > http://patchwork.ozlabs.org/patch/377721
> > http://patchwork.ozlabs.org/patch/377722
> > http://patchwork.ozlabs.org/patch/377723
> > http://patchwork.ozlabs.org/patch/377724
> > http://patchwork.ozlabs.org/patch/377725
> > http://patchwork.ozlabs.org/patch/377727
>
> Whoa, that's a lot. Are they ALL needed to fix this?
  Yes, all of them are needed.

> No workaround possible?
  I don't know of any.

> What will Red Hat do with their 3.10 RHEL 7 kernel?
  Well, I cannot speak for the RH guys, but for SLES, if there's a customer
request, we'll just go and backport the patches...

								Honza
--
Jan Kara
SUSE Labs, CR