Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp3273258pxb; Wed, 13 Oct 2021 02:39:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz37zOmGbbrm+eUAqaPef4gpnKGDI+ckbleVty8WLh6pQ5a0P6akUK0WXe/vuK86J4FAl4/ X-Received: by 2002:a05:6402:2550:: with SMTP id l16mr8045201edb.229.1634117955493; Wed, 13 Oct 2021 02:39:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634117955; cv=none; d=google.com; s=arc-20160816; b=B4qsZCBxsURDSkLVrolHY2KPFbZhHr+TWfFSSOXoTcEPmD6yM1utPj8axZZbvCuc0t WOEuhPVoFbBFuVMBYSJ2fACUNX5SZM549ZshzbBL87bgclC6mupbnfDsRfs30YihzVYR 6oax1L/YpYHb1VNnkDjauEbLUQfI3AdwPV47HaMX031/PWgRRiHkrNaRZZQnXKlql+3v Cg6TB7LSdDfTHBNG+7NG26wcxl23Gy08fb2G9Yc1YoGOk3/0vch646XiC2BctTyx4O3x u4t4blCNwYABAIJag5jaYw1rnZ63rhDslOBLk5+gIXzCfY/yEvdmkdz/fVXTfr+1nuW8 gqkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:dkim-signature; bh=YxjEGZZ1prI96OAFnaOsXZL53xqr89Luesf5g0xwzvc=; b=Qv2TV0V4ybvPacGWu+6iaY/wYCQ8dWJeIelnJ9bdPFPOHmGGZ2Bh3Nj3Lwkiw8jkOS CZZxwj958zKTslGwdtS1XxOswjRkqNOLt+uX1nPMOEvlnYAWwDc5IEzq0j5B+pMp9Qkm SaKMJcDYuZglaAvRmOil2Vk2MtVp19gvADokRNXQXa0sL04lW0VR3a6My3evFveTBNE4 8twAFkPrIUbYQwgs+pI56PMDffhisUSv264oEfC99xAOUNzr91SfQ9Eiy4lxKqGGzYvN k8ljW0MIKnM7IpxRd9hoewfI44JQCui93eMbdZXeDJyWN6rR6ED3Yw/lnr5PemKIL0ww CLvA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=wMOoFPoG; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ne29si2681920ejc.157.2021.10.13.02.38.51; Wed, 13 Oct 2021 02:39:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=wMOoFPoG; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238986AbhJMJkv (ORCPT + 99 others); Wed, 13 Oct 2021 05:40:51 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:49614 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238388AbhJMJkv (ORCPT ); Wed, 13 Oct 2021 05:40:51 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3168F21A63; Wed, 13 Oct 2021 09:38:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1634117927; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YxjEGZZ1prI96OAFnaOsXZL53xqr89Luesf5g0xwzvc=; b=wMOoFPoGcozMXIDZZg5MQ+yQjOhvCmgGFo5xJ+gJfe3aA2f8PT0R3kB7yKiblCZpuliiox 3O+ac2MoOgtXpZIvrJCoT8KyDzXMxDEKOc1wBZs7wjNiYk3mde8LaMGkeoAJ2vUqpPa1Px WECW95hlU7ti6sjlZLCTV7Lss0izXrI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1634117927; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YxjEGZZ1prI96OAFnaOsXZL53xqr89Luesf5g0xwzvc=; b=9IZCJSmB+5AAZGerpBYokk4amIXaeOhmnTBRhxu/0ZKJ+6I6s9/5+EuDuUEJ5kv3CaQ4C1 WRS3LwT2zUl0O9CQ== Received: from quack2.suse.cz (unknown [10.100.224.230]) by relay2.suse.de (Postfix) with ESMTP id 21DCDA3B85; Wed, 13 Oct 2021 09:38:47 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 268831E11B6; Wed, 13 Oct 2021 11:38:47 +0200 (CEST) Date: Wed, 13 Oct 2021 11:38:47 +0200 From: Jan Kara To: yebin Cc: Jan Kara , tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH -next v2 2/6] ext4: introduce last_check_time record previous check time Message-ID: <20211013093847.GB19200@quack2.suse.cz> References: <20210911090059.1876456-1-yebin10@huawei.com> <20210911090059.1876456-3-yebin10@huawei.com> <20211007123100.GG12712@quack2.suse.cz> <615FA55B.5070404@huawei.com> <615FAF27.8070000@huawei.com> <20211012084727.GF9697@quack2.suse.cz> <61657590.2050407@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <61657590.2050407@huawei.com> User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Tue 12-10-21 19:46:24, yebin wrote: > On 2021/10/12 16:47, Jan Kara wrote: > > On Fri 08-10-21 10:38:31, yebin wrote: > > > On 2021/10/8 9:56, yebin wrote: > > > > On 2021/10/7 20:31, Jan Kara wrote: > > > > > On Sat 11-09-21 17:00:55, Ye Bin wrote: > > > > > > kmmpd: > > > > > > ... > > > > > > diff = jiffies - last_update_time; > > > > > > if (diff > mmp_check_interval * HZ) { > > > > > > ... > > > > > > As "mmp_check_interval = 2 * mmp_update_interval", 'diff' always little > > > > > > than 'mmp_update_interval', so there will never trigger detection. > > > > > > Introduce last_check_time record previous check time. > > > > > > > > > > > > Signed-off-by: Ye Bin > > > > > I think the check is there only for the case where write_mmp_block() + > > > > > sleep took longer than mmp_check_interval. I agree that should rarely > > > > > happen but on a really busy system it is possible and in that case > > > > > we would > > > > > miss updating mmp block for too long and so another node could have > > > > > started > > > > > using the filesystem. I actually don't see a reason why kmmpd should be > > > > > checking the block each mmp_check_interval as you do - > > > > > mmp_check_interval > > > > > is just for ext4_multi_mount_protect() to know how long it should wait > > > > > before considering mmp block stale... Am I missing something? > > > > > > > > > > Honza > > > > I'm sorry, I didn't understand the detection mechanism here before. Now > > > > I understand > > > > the detection mechanism here. > > > > As you said, it's just an abnormal protection. There's really no problem. > > > > > > > Yeah, i did test as following steps > > > hostA hostB > > > mount > > > ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN > > > delay 5s after label "skip" so hostB will see seq is > > > EXT4_MMP_SEQ_CLEAN > > > mount > > > ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN > > > run kmmpd > > > run kmmpd > > > > > > Actually,in this situation kmmpd will not detect confliction. > > > In ext4_multi_mount_protect function we write mmp data first and wait > > > 'wait_time * HZ' seconds, > > > read mmp data do check. Most of the time, If 'wait_time' is zero, it can pass > > > check. > > But how can be wait_time zero? As far as I'm reading the code, wait_time > > must be at least EXT4_MMP_MIN_CHECK_INTERVAL... > > > > Honza > int ext4_multi_mount_protect(struct super_block *sb, > ext4_fsblk_t mmp_block) > { > struct ext4_super_block *es = EXT4_SB(sb)->s_es; > struct buffer_head *bh = NULL; > struct mmp_struct *mmp = NULL; > u32 seq; > unsigned int mmp_check_interval = > le16_to_cpu(es->s_mmp_update_interval); > unsigned int wait_time = 0; --> wait_time is > equal with zero > int retval; > > if (mmp_block < le32_to_cpu(es->s_first_data_block) || > mmp_block >= ext4_blocks_count(es)) { > ext4_warning(sb, "Invalid MMP block in superblock"); > goto failed; > } > > retval = read_mmp_block(sb, &bh, mmp_block); > if (retval) > goto failed; > > mmp = (struct mmp_struct *)(bh->b_data); > > if (mmp_check_interval < EXT4_MMP_MIN_CHECK_INTERVAL) > mmp_check_interval = EXT4_MMP_MIN_CHECK_INTERVAL; > > /* > * If check_interval in MMP block is larger, use that instead of > * update_interval from the superblock. > */ > if (le16_to_cpu(mmp->mmp_check_interval) > mmp_check_interval) > mmp_check_interval = le16_to_cpu(mmp->mmp_check_interval); > > seq = le32_to_cpu(mmp->mmp_seq); > if (seq == EXT4_MMP_SEQ_CLEAN) --> If hostA and hostB mount the > same block device at the same time, > --> HostA and hostB maybe get 'seq' with the same value EXT4_MMP_SEQ_CLEAN. > goto skip; Oh, I see. Thanks for explanation. > ... > skip: > /* > * write a new random sequence number. > */ > seq = mmp_new_seq(); > mmp->mmp_seq = cpu_to_le32(seq); > > retval = write_mmp_block(sb, bh); > if (retval) > goto failed; > > /* > * wait for MMP interval and check mmp_seq. > */ > if (schedule_timeout_interruptible(HZ * wait_time) != 0) { > --> If seq is equal with EXT4_MMP_SEQ_CLEAN, wait_time is zero. > ext4_warning(sb, "MMP startup interrupted, failing mount"); > goto failed; > } > > retval = read_mmp_block(sb, &bh, mmp_block); -->We may get the same > data with which we wrote, so we can't detect conflict at here. OK, I see. So the race in ext4_multi_mount_protect() goes like: hostA hostB read_mmp_block() read_mmp_block() - sees EXT4_MMP_SEQ_CLEAN - sees EXT4_MMP_SEQ_CLEAN write_mmp_block() wait_time == 0 -> no wait read_mmp_block() - all OK, mount write_mmp_block() wait_time == 0 -> no wait read_mmp_block() - all OK, mount Do I get it right? Actually, if we passed seq we wrote in ext4_multi_mount_protect() to kmmpd (probably in sb), then kmmpd would notice the conflict on its first invocation but still that would be a bit late because there would be a time window where hostA and hostB would be both using the fs. We could reduce the likelyhood of this race by always waiting in ext4_multi_mount_protect() between write & read but I guess that is undesirable as it would slow down all clean mounts. Ted? Honza -- Jan Kara SUSE Labs, CR