Date: Fri, 28 Aug 2015 20:54:04 +0800
Subject: Re: Possible memory allocation deadlock in kmem_alloc and hung task in xfs_log_commit_cil and xlog_cil_push
From: Gavin Guo
To: Dave Chinner
Cc: xfs@oss.sgi.com, linux-kernel

On Wed, Jul 8, 2015 at 7:37 AM, Dave Chinner wrote:
> On Tue, Jul 07, 2015 at 05:29:43PM +0800, Gavin Guo wrote:
>> Hi all,
>>
>> Recently, we observed that there is the error message in
>> Ubuntu-3.13.0-48.80:
>>
>> "XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)"
>>
>> repeatedly shows in the dmesg. Temporarily, our workaround is to tune the
>> parameters, such as, vfs_cache_pressure, min_free_kbytes, and dirty_ratio.
>>
>> And we also found that there are different error messages regarding the
>> hung tasks which happened in xfs_log_commit_cil and xlog_cil_push.
>>
>> The log is available at: http://paste.ubuntu.com/11835007/
>>
>> The following link seems the same problem we suffered:
>>
>> XFS hangs with XFS: possible memory allocation deadlock in kmem_alloc
>> http://oss.sgi.com/archives/xfs/2015-03/msg00172.html
>>
>> I read the mail and found that there might be some modification regarding
>> to move the memory allocation outside the ctx lock. And I also read the
>> latest patch from February of 2015 to see if there is any new change
>> about that.
>> Unfortunately, I didn't find anything regarding the change (may
>> be I'm not familiar with the XFS, so didn't find the commit). If it's
>> possible for someone who is familiar with the code to point out the commits
>> related to the bug if already exist or any status about the plan.
>
> No commits - the approach I thought we might be able to take to
> avoid the problem didn't work out. I have another idea of how we
> might solve the problem, but I haven't had a chance to prototype it
> yet.

I have been reading the code for a while and still can't figure out how to
fix it. My current understanding is that the buddy allocator is running out
of memory, so the XFS kmem_alloc() called via

xfs_log_commit_cil->
  xlog_cil_insert_items->
    xlog_cil_insert_format_items->
      kmem_zalloc

fails and gets stuck retrying in its while loop. Two other threads are
running at the same time:

1). xfs_log_commit_cil->down_read(&cil->xc_ctx_lock);
2). xlog_cil_push->down_write(&cil->xc_ctx_lock);

So both of these threads are blocked, waiting for the first kmem_zalloc()
to succeed.

Is there a way to decrease the size of the memory request? Or would it be
possible to elaborate on the idea you mentioned? I know this is a problem
that cannot be solved in a short time, and I'd like to help if there is any
possibility.

Thanks,
Gavin
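
P.S. To make the stall described above concrete, here is a rough user-space model in Python of an allocation loop that retries until it succeeds. It is only a sketch, not the kernel code: alloc_with_retry, make_flaky_allocator, the warn_every=100 interval and the max_attempts cap are all assumptions introduced for illustration (the cap exists only so that the demo terminates). As I understand the scenario above, the caller would be holding cil->xc_ctx_lock for read the whole time it loops, which is why the down_write() in xlog_cil_push would have to wait behind it.

```python
def alloc_with_retry(try_alloc, warn_every=100, max_attempts=None, log=print):
    """Call try_alloc() until it returns a buffer.

    Hypothetical model of a "retry until success" allocator:
    - try_alloc returns a buffer, or None when memory is unavailable.
    - every warn_every failed attempts a warning is logged (assumed interval).
    - max_attempts is an illustration-only escape hatch; with the default
      of None the loop never gives up.
    Returns (buffer_or_None, number_of_failed_attempts).
    """
    retries = 0
    while True:
        buf = try_alloc()
        if buf is not None:
            return buf, retries
        retries += 1
        if retries % warn_every == 0:
            log("possible memory allocation deadlock (retries=%d)" % retries)
        if max_attempts is not None and retries >= max_attempts:
            return None, retries


def make_flaky_allocator(failures, size=16):
    """Return a fake allocator that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def try_alloc():
        if state["left"] > 0:
            state["left"] -= 1
            return None  # simulate the page allocator having nothing to give
        return bytearray(size)

    return try_alloc


if __name__ == "__main__":
    warnings = []
    buf, retries = alloc_with_retry(make_flaky_allocator(250), log=warnings.append)
    print("allocated %d bytes after %d failed attempts" % (len(buf), retries))
    print("warnings logged: %d" % len(warnings))
```

In this model nothing else can make progress on behalf of the caller while it loops, so anything that the caller holds (in our case, the read side of xc_ctx_lock) stays held for as long as the allocator keeps failing.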