Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp2105429yba; Wed, 3 Apr 2019 01:08:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqyL4m3/HoGMxYE+If+fXqMEIZoXqBmi+siFZRBLIrwAYBjCF+8+0xelM3JNNdPQy0ZNqCPw X-Received: by 2002:a63:db10:: with SMTP id e16mr32883850pgg.142.1554278904759; Wed, 03 Apr 2019 01:08:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554278904; cv=none; d=google.com; s=arc-20160816; b=jdR9SOjFfeKnJafb4AMTED62VTj2Yd/7LjRlC7p00OngQaqZVIh8vsDgDmiP95DYXl SIeHqwvLCX+NxZXYfAAUMs6hISo/BAynONLAj28StdB40bWT6qrLTiP6+/M2P4LyJ5GO ApkbwhiXYCB8XtI5hIyArA0S6SRzAgjwFXsQgIYX50sJw0lPjJ+TxAv91FCHVb39JrmV cukscm30SraqevBCdUJWBIE4akWI8zvQZBm7Wrq8R964w0ciL3etDvr8igXUjLsOFkoH kZg23HDnL79XPX0MHzT+dUlBvQOsY4MW0yZKRnYTFi+P6axKn+ysM5b+XpDnJLqBGgVo zLBQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=5Agr5RgQNBP49yVgv+SmZy9Wq/QvY7FSN7tPOUKkBxY=; b=S8jttROChwGOhRwuUX/c/OyCYIWGHZB7kH6T3CXy3FzLD8G0ZivaCgg27azW6Sm5Rt 8UeIGHi4uje4lIP2Q/A0a6wkYHenxaH4RV3jXRi0O201zTLhD5uYGPWIl33j/fPJ0fuu 4EsXxepSxItSMLsyHrUOAcp3+2lRaRC2oPvrsgHqH9Z/zYqXzvMFiQKkrWmT/VInkrc8 TA5ArIdy8BwYhtQ4lqJ4a5C7RjFmf9ZeIR5ik9fngTm5dZ+fYSVY+3tVDB4We3vsOp8Z pKY0sm30pHFdA1snGg+3xHyPX0hWGaxxrJy6BJrh46rnbnyOOJa2oyYUB4rWh4No/Esr jfIw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 134si13628796pga.249.2019.04.03.01.08.08; Wed, 03 Apr 2019 01:08:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728782AbfDCIHE (ORCPT + 99 others); Wed, 3 Apr 2019 04:07:04 -0400 Received: from mx2.suse.de ([195.135.220.15]:41294 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726004AbfDCIHE (ORCPT ); Wed, 3 Apr 2019 04:07:04 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CCD79AE3E; Wed, 3 Apr 2019 08:07:01 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 09E891E3FFD; Wed, 3 Apr 2019 10:07:01 +0200 (CEST) Date: Wed, 3 Apr 2019 10:07:01 +0200 From: Jan Kara To: Steve Magnani Cc: Jan Kara , "linux-kernel@vger.kernel.org" , linux-fsdevel@vger.kernel.org Subject: Re: Possible UDF locking error? Message-ID: <20190403080701.GB8836@quack2.suse.cz> References: <224e1613-b080-bd64-ef88-badcb755a233@digidescorp.com> <20190325164211.GF8308@quack2.suse.cz> <16a6ed55-e8c9-cc56-b675-e9a9de57ab66@digidescorp.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16a6ed55-e8c9-cc56-b675-e9a9de57ab66@digidescorp.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Sat 30-03-19 14:49:46, Steve Magnani wrote: > On 3/25/19 11:42 AM, Jan Kara wrote: > > Hi! > > > > On Sat 23-03-19 15:14:05, Steve Magnani wrote: > > > I have been hunting a UDF bug that occasionally results in generation > > > of an Allocation Extent Descriptor with an incorrect tagLocation. So > > > far I haven't been able to see a path through the code that could > > > cause that. But, I noticed some inconsistency in locking during > > > AED generation and wonder if it could result in random corruption. > > > > > > The function udf_update_inode() has this general pattern: > > > > > > bh = udf_tgetblk(...); // calls sb_getblk() > > > lock_buffer(bh); > > > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > > > // other code to populate FE/EFE data in the block > > > set_buffer_uptodate(bh); > > > unlock_buffer(bh); > > > mark_buffer_dirty(bh); > > > > > > This I can understand - the lock is held for as long as the buffer > > > contents are being assembled. > > > > > > In contrast, udf_setup_indirect_aext(), which constructs an AED, > > > has this sequence: > > > > > > bh = udf_tgetblk(...); // calls sb_getblk() > > > lock_buffer(bh); > > > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > > > > > > set_buffer_uptodate(bh); > > > unlock_buffer(bh); > > > mark_buffer_dirty_inode(bh); > > > > > > // other code to populate AED data in the block > > > > > > In this case the population of the block occurs without > > > the protection of the lock. > > > > > > Because the block has been marked dirty, does this mean that > > > writeback could occur at any point during population? > > Yes. Thanks for noticing this! > > > > > There is one path through udf_setup_indirect_aext() where > > > mark_buffer_dirty_inode() gets called again after population is > > > complete, which I suppose could heal a partial writeout, but there is > > > also another path in which the buffer does not get marked dirty again. > > Generally, we add new extents to the created indirect extent which dirties > > the buffer and that should fix the problem. But you are definitely right > > that the code is suspicious and should be fixed. Will you send a patch? > > I did a little archaeology to see how the code evolved to this point. It's > been like this a long time. > > I also did some research to understand why filesystems use lock_buffer() > sometimes but not others. For example, the FAT driver never calls it. I ran > across this thread from 2011: > > https://lkml.org/lkml/2011/5/16/402 > > ...from which I conclude that while it is correct in a strict sense to hold > a lock on a buffer any time its contents are being modified, performance > considerations make it preferable (or at least reasonable) to make some > modifications without a lock provided it's known that a subsequent write-out > will "fix" any potential partial write out before anyone else tries to read > the block. Understood but UDF (and neither FAT) are really that performance critical. If you look for performance, you'd certainly pick a different filesystem. UDF is mainly for data interchange so it should work reasonably for copy-in copy-out style of workloads, the rest isn't that important. So there correctness and simplicity is preferred over performance. > I doubt that UDF sees common use with DIF/DIX block devices, > which might make a decision in favor of performance a little easier. Since > the FAT driver doesn't contain Darrick's proposed changes I assume a > decision was made that performance was more important there. > > Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets > those criteria. But udf_table_free_blocks() may not dirty the AED block. > > So if this looks reasonable I will resend as a formal patch: > > --- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500 > +++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500 > @@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode > return -EIO; > lock_buffer(bh); > memset(bh->b_data, 0x00, sb->s_blocksize); > - set_buffer_uptodate(bh); > - unlock_buffer(bh); > - mark_buffer_dirty_inode(bh, inode); > aed = (struct allocExtDesc *)(bh->b_data); > if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) { > @@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode > udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block, > sizeof(struct tag)); > + set_buffer_uptodate(bh); > + unlock_buffer(bh); > + > nepos.block = neloc; > nepos.offset = sizeof(struct allocExtDesc); > nepos.bh = bh; > @@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode > } else { > __udf_add_aext(inode, epos, &nepos.block, > sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0); > + /* Make sure completed AED gets written out */ > + mark_buffer_dirty_inode(nepos.bh, inode); Why do you mark the buffer as dirty only here? I'd just mark it dirty after unlocking. If __udf_add_aext() or udf_write_aext() modify the buffer, they will mark it as dirty as well... Thanks! Honza -- Jan Kara SUSE Labs, CR