Received: by 2002:ac0:98c7:0:0:0:0:0 with SMTP id g7-v6csp797468imd; Thu, 1 Nov 2018 05:52:53 -0700 (PDT) X-Google-Smtp-Source: AJdET5erFaInlHOt9l4r41p+WftNOqX9mgmbjoXScpMHfA6qIYaN1XWWUMql+JLCqHgaM0KzK2Wf X-Received: by 2002:a63:f811:: with SMTP id n17mr7291991pgh.23.1541076773499; Thu, 01 Nov 2018 05:52:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1541076773; cv=none; d=google.com; s=arc-20160816; b=wRH5uU0Jdus+/2Z6HSS4+aeh4kKnnB1f7O0nLXeZTRbtpXXrz3uAyu58I1m0Tfn4Le eUsjIIp1tzQjrfW4YWyT2+jCO9MmK1Iwf36+2EWh6Hh7Qq/ZZ/1kufh9hucZM90zUJWC N2zhJM153nF6z4zTbF25UL/YaCQUS9DIGECHBTDhfmEG67uT7k0ouZQ9drvKpqG+Hd5S 9fzZ2+sKKBZdYdEBV74+l2MVz/C9xNBcgd3+lJrDD5CDyC0yiCmIyFFIENAiDT05VZZQ kIzrAU3zw4WVo9o/j5uHOs8LfCOqHIG0Vo++6rFm+uyV4ZIV/Udq30xvShmUb7eOwph8 2KRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-language:accept-language:references:message-id:date :thread-index:thread-topic:subject:cc:to:from; bh=xn50LbntvkSW7iCiP5sDP+tQExiqNYWFmnlrzdPGkBQ=; b=cZroGBWbrogPcVIBrnOx7d14kfOygFmK6ceTyG5+zN7NBlQ8lJ9tXxmd6HaABYjbio sFFuPdB8guOwLMOzfAasmQCrW7wCg3KE559F7ob/2SKLxQo71q0P8DhMFG48wEZgvkCb ie1s6rDhr131vhoZuHmXqXji5+ULYh8++y8N8ENIc6h4wy4LdOUQOZ2xqHBHmNCBkQqJ jv+fdVJgFYfsb9YqU7l14RsTB/mBkIhLLr3LsR7WCFGM2nt1D2kyo1suOWE+wYj+qTgX vq8lZrYVC/6g8kJSRuSAbOX0H8qpRlGKXoqSBCtLkwyhIioO+zi5vYYcobRpZYIeOnPk jJhw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z16-v6si30281768pga.177.2018.11.01.05.52.33; Thu, 01 Nov 2018 05:52:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728288AbeKAVxP convert rfc822-to-8bit (ORCPT + 99 others); Thu, 1 Nov 2018 17:53:15 -0400 Received: from smtp.h3c.com ([60.191.123.56]:22775 "EHLO h3cmg01-ex.h3c.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727862AbeKAVxP (ORCPT ); Thu, 1 Nov 2018 17:53:15 -0400 Received: from BJHUB02-EX.srv.huawei-3com.com (unknown [10.63.20.170]) by h3cmg01-ex.h3c.com with smtp id 139e_1880_5900c6f3_67de_4edc_961b_810788f4dcea; Thu, 01 Nov 2018 20:49:02 +0800 Received: from H3CMLB12-EX.srv.huawei-3com.com ([fe80::10fe:abde:731b:fdde]) by BJHUB02-EX.srv.huawei-3com.com ([::1]) with mapi id 14.03.0415.000; Thu, 1 Nov 2018 20:48:48 +0800 From: Changwei Ge To: Larry Chen , Joseph Qi CC: "linux-kernel@vger.kernel.org" , "ocfs2-devel@oss.oracle.com" , Andrew Morton Subject: Re: [Ocfs2-devel] [PATCH V3] ocfs2: fix dead lock caused by ocfs2_defrag_extent Thread-Topic: [Ocfs2-devel] [PATCH V3] ocfs2: fix dead lock caused by ocfs2_defrag_extent Thread-Index: AQHUcbKZagBejW0xd0Sk/bgqi0J6UA== Date: Thu, 1 Nov 2018 12:48:47 +0000 Message-ID: <63ADC13FD55D6546B7DECE290D39E37301277DE42D@H3CMLB12-EX.srv.huawei-3com.com> References: <20181101071422.14470-1-lchen@suse.com> <63ADC13FD55D6546B7DECE290D39E37301277DE397@H3CMLB12-EX.srv.huawei-3com.com> <24a08d67-dd33-7fc1-628a-af55cd2de1fe@suse.com> Accept-Language: en-US, zh-CN Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.125.20.206] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Larry, On 2018/11/1 20:39, Larry Chen wrote: > Hi Joseph, > > On 11/1/18 7:52 PM, Changwei Ge wrote: >> Hello Joseph, >> >> On 2018/11/1 17:01, Joseph Qi wrote: >>> Hi Larry, >>> >>> On 18/11/1 15:14, Larry Chen wrote: >>>> ocfs2_defrag_extent may fall into deadlock. >>>> >>>> ocfs2_ioctl_move_extents >>>> ocfs2_ioctl_move_extents >>>> ocfs2_move_extents >>>> ocfs2_defrag_extent >>>> ocfs2_lock_allocators_move_extents >>>> >>>> ocfs2_reserve_clusters >>>> inode_lock GLOBAL_BITMAP_SYSTEM_INODE >>>> >>>> __ocfs2_flush_truncate_log >>>> inode_lock GLOBAL_BITMAP_SYSTEM_INODE >>>> >>>> As backtrace shows above, ocfs2_reserve_clusters() will call inode_lock >>>> against the global bitmap if local allocator has not sufficient cluters. >>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will >>>> return success with global bitmap locked. >>>> >>>> After ocfs2_reserve_cluster(), if truncate log is full, >>>> __ocfs2_flush_truncate_log() will definitely fall into deadlock because it >>>> needs to inode_lock global bitmap, which has already been locked. >>>> >>>> To fix this bug, we could remove from ocfs2_lock_allocators_move_extents() >>>> the code which intends to lock global allocator, and put the removed code >>>> after __ocfs2_flush_truncate_log(). >>>> >>>> ocfs2_lock_allocators_move_extents() is referred by 2 places, one is here, >>>> the other does not need the data allocator context, which means this patch >>>> does not affect the caller so far. >>>> >>>> Change log: >>>> 1. Correct the function comment. >>>> 2. Remove unused argument from ocfs2_lock_meta_allocator_move_extents. >>>> >>>> Signed-off-by: Larry Chen >>>> --- >>>> fs/ocfs2/move_extents.c | 47 ++++++++++++++++++++++++++--------------------- >>>> 1 file changed, 26 insertions(+), 21 deletions(-) >>>> >> >>> IMO, here clusters_to_move is only for data_ac, since we change this >>> function to only handle meta_ac, I'm afraid clusters_to_move related >>> logic has to been changed correspondingly. >> >> I think we can't remove *clusters_to_move* from here as clusters can be reserved latter outsides this function, but we >> still have to reserve metadata(extents) in advance. >> So we need that argument. >> > > Yeah, I think clusters_to_move should be reserved, in order to keep the > original logic as it was. > > But I'm curious about why max_recs_needed should be equal to > 2 * extents_to_split + cluster_to_move? > Does that mean that each cluster might form an extent?? It's because in the worst case that the file is of great fragment and one extent record can only represent one cluster. And we might encounter extents split during merging and rotation so we also need _2 * extents_to_split_. Thanks, Changwei > > Thanks, > Larry > > >> Thanks, >> Changwei >> >>> >>> Thanks, >>> Joseph >>>> u32 extents_to_split, >>>> struct ocfs2_alloc_context **meta_ac, >>>> - struct ocfs2_alloc_context **data_ac, >>>> int extra_blocks, >>>> int *credits) >>>> { >>>> @@ -192,13 +188,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode, >>>> goto out; >>>> } >>>> >>>> - if (data_ac) { >>>> - ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac); >>>> - if (ret) { >>>> - mlog_errno(ret); >>>> - goto out; >>>> - } >>>> - } >>>> >>>> *credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el); >>>> >>>> @@ -257,10 +246,10 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context, >>>> } >>>> } >>>> >>>> - ret = ocfs2_lock_allocators_move_extents(inode, &context->et, *len, 1, >>>> - &context->meta_ac, >>>> - &context->data_ac, >>>> - extra_blocks, &credits); >>>> + ret = ocfs2_lock_meta_allocator_move_extents(inode, &context->et, >>>> + *len, 1, >>>> + &context->meta_ac, >>>> + extra_blocks, &credits); >>>> if (ret) { >>>> mlog_errno(ret); >>>> goto out; >>>> @@ -283,6 +272,21 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context, >>>> } >>>> } >>>> >>>> + /* >>>> + * Make sure ocfs2_reserve_cluster is called after >>>> + * __ocfs2_flush_truncate_log, otherwise, dead lock may happen. >>>> + * >>>> + * If ocfs2_reserve_cluster is called >>>> + * before __ocfs2_flush_truncate_log, dead lock on global bitmap >>>> + * may happen. >>>> + * >>>> + */ >>>> + ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac); >>>> + if (ret) { >>>> + mlog_errno(ret); >>>> + goto out_unlock_mutex; >>>> + } >>>> + >>>> handle = ocfs2_start_trans(osb, credits); >>>> if (IS_ERR(handle)) { >>>> ret = PTR_ERR(handle); >>>> @@ -600,9 +604,10 @@ static int ocfs2_move_extent(struct ocfs2_move_extents_context *context, >>>> } >>>> } >>>> >>>> - ret = ocfs2_lock_allocators_move_extents(inode, &context->et, len, 1, >>>> - &context->meta_ac, >>>> - NULL, extra_blocks, &credits); >>>> + ret = ocfs2_lock_meta_allocator_move_extents(inode, &context->et, >>>> + len, 1, >>>> + &context->meta_ac, >>>> + extra_blocks, &credits); >>>> if (ret) { >>>> mlog_errno(ret); >>>> goto out; >>>> >>> >>> _______________________________________________ >>> Ocfs2-devel mailing list >>> Ocfs2-devel@oss.oracle.com >>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >>> >> > >