Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp8021372pxb; Fri, 19 Feb 2021 05:37:25 -0800 (PST) X-Google-Smtp-Source: ABdhPJy5eQgMOF1ADcT6Knk87VQfIvPWVWOWgFwkUcLi0OfTW7Bis3aHzw0QM+WNlxOvgriLkYXz X-Received: by 2002:a50:e1c9:: with SMTP id m9mr9641322edl.307.1613741845669; Fri, 19 Feb 2021 05:37:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613741845; cv=none; d=google.com; s=arc-20160816; b=bl4j2IfPlOcC5hpwijL6VgItzu9xGc649f4quJ91i+RXume/R+DpiYJYl//pJeHAxR du/5W8+y91Eea8rS2gPvIKHYbhHs2zDycWoC2yGNIOTgVxOZH+vmWJ+aCgngX8uPaPsl qjQ5h2sgIb4Ac79xdIECYjRZvhiTgGgO34Yj26ZbmO0avgULfjCwKpat7v2zvAMpBBQq m/HcqtEDzGxBL9POn0I82m+RwhK3OgBXetFn2XlGgiRgB9DUjuYpkTeHYxuSe8u6/Yr3 0/uE7mrFub7sux3eC4CdUH7/7YcALRTGxvtmfmRJNmX7UoA5eUcXJ8GXy6US4uJh9SOF qyFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=mcoXhqBIUqG6vfkydd43rkdHFNv9STub+t6Kng0JGuY=; b=qvZNHyg90nGUFSwaZlTQq86vJ+HtL38Cm1Tro5b8fT4FikZKn1Saw5PH0luNSM1wc5 3xmHQp7yoahUxgR/D3cEn/IY9UI/HoYYzTLyZxKgGVGRNRVRncFoxIj2YXQnhXNloZXR CZaUsDDWIkcuKug/5iuerx9yF8gTJCrBYrbv57qvstsD7Lyodl2eTsjFjv7YvyarcRNE 5m8KV8jgX7VpX8pgb+4i+SEmQvsQF07QAc4oKJUEdmB0Z1sjl8ljv7KfsoYti1V4aZ7V rNufQG1zpoYkNpcVvCBHKhXiGFxYMOZHQlJMgw5q8p166BiaJLcl6s+3kVUORYMOQmM2 LQDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=M9VjBZUS; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k16si6076694edq.147.2021.02.19.05.36.57; Fri, 19 Feb 2021 05:37:25 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=M9VjBZUS; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229636AbhBSNgl (ORCPT + 99 others); Fri, 19 Feb 2021 08:36:41 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:33481 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229553AbhBSNgk (ORCPT ); Fri, 19 Feb 2021 08:36:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613741713; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mcoXhqBIUqG6vfkydd43rkdHFNv9STub+t6Kng0JGuY=; b=M9VjBZUSP1r+OqftIYrB2AYUs/olHE83V7gFD0LXaMbBC0bJXbRJL0Ij1mxguh9tVhZZI2 CcqbPlTokUcgJuJecsFx4dWKkSniDs6a+Ea2MJRjQwuZHUTVvgYYaIXZw7dW9x4w69wx7v OSQ/jDWSoEQ8OXbSjhyUTe4r3TiQICo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-416-shBcZ05sPZGj0O3E1IjIew-1; Fri, 19 Feb 2021 08:35:09 -0500 X-MC-Unique: shBcZ05sPZGj0O3E1IjIew-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AF6DE803F49; Fri, 19 Feb 2021 13:35:08 +0000 (UTC) Received: from work (unknown [10.40.194.236]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7C305189C4; Fri, 19 Feb 2021 13:35:04 +0000 (UTC) Date: Fri, 19 Feb 2021 14:34:59 +0100 From: Lukas Czerner To: Alexey Lyashkov Cc: Andreas Dilger , Artem Blagodarenko , linux-ext4 , Eric Sandeen Subject: Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file Message-ID: <20210219133459.vezgrlkjpmaizvb4@work> References: <20210212093719.162065-1-lczerner@redhat.com> <20210218095146.265302-1-lczerner@redhat.com> <99A17D19-8764-4027-8B1E-E7ADBE5E2CEE@gmail.com> <20210219105713.uu2mywenytfd2e5j@work> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote: > Lukas, > > because e2fsprogs have an bad assumption about IO size for the O_DIRECT case. > and because library uses a code like > >> > set_block_size(1k); > seek(fs, 1); > read_block(); > >>> > which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report. > > Reference to patch. > https://patchwork.ozlabs.org/project/linux-ext4/patch/20201023112659.1559-1-artem.blagodarenko@gmail.com/ Alright, I skimmed through your patch proposal and I am not sure I completely understand the problem because you have not provided the code adding O_DIRECT support for e2image. However I think that it is a reasonable assumption to make that there is not going to be a file system on a block device such that the fs blocksize is smaller than device sector size. You can't create such fs with mke2fs and you can't mount such file system either. All that said I can now see that there is a problem in case of mke2fs and debugfs when used with O_DIRECT (-D) on a file system image with 1k block size stored on a file in the host file system on the block device with sector size larger than 1k (...I am getting Inception flashbacks now) In fact I can confirm that indeed, both mke2fs and debugfs will fail in such scenario. The question is whether we care enough to support O_DIRECT in such situations. Personally I don't care enough about this. However it would be nice to at least have a check (probably in ext2fs_open2, unix_open_channel or such) and notify user about the problem. Note that this conversation does not affect my patch since ext2fs_mmp_read() does not use the unix_io infrastructure. -Lukas > > Alex > > > 19 февр. 2021 г., в 13:57, Lukas Czerner написал(а): > > > > On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote: > >> Andreas, > >> > >> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ? > >> Instead of fixing an O_DIRECT access with patches sends early. > > > > Why would it not work at all ? This is a fix for a specific problem and > > I am not currently aware of ony other problems e2fsprogs should have > > with 4k sector size drives. Do you have a specific problem in mind ? > > > > Thanks! > > -Lukas > > > >> > >> > >> Alex > >> > >>> 19 февр. 2021 г., в 1:20, Andreas Dilger написал(а): > >>> > >>> On Feb 18, 2021, at 2:51 AM, Lukas Czerner wrote: > >>>> > >>>> Currently the mmp block is read using O_DIRECT to avoid any caching that > >>>> may be done by the VM. However when working with regular files this > >>>> creates alignment issues when the device of the host file system has > >>>> sector size larger than the blocksize of the file system in the file > >>>> we're working with. > >>>> > >>>> This can be reproduced with t_mmp_fail test when run on the device with > >>>> 4k sector size because the mke2fs fails when trying to read the mmp > >>>> block. > >>>> > >>>> Fix it by disabling O_DIRECT when working with regular files. I don't > >>>> think there is any risk of doing so since the file system layer, unlike > >>>> shared block device, should guarantee cache consistency. > >>>> > >>>> Signed-off-by: Lukas Czerner > >>>> Reviewed-by: Eric Sandeen > >>> > >>> Reviewed-by: Andreas Dilger > >>> > >>>> --- > >>>> v2: Fix comment - it avoids problems when the sector size is larger not > >>>> smaller than blocksize > >>>> > >>>> lib/ext2fs/mmp.c | 22 +++++++++++----------- > >>>> 1 file changed, 11 insertions(+), 11 deletions(-) > >>>> > >>>> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c > >>>> index c21ae272..cca2873b 100644 > >>>> --- a/lib/ext2fs/mmp.c > >>>> +++ b/lib/ext2fs/mmp.c > >>>> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf) > >>>> * regardless of how the io_manager is doing reads, to avoid caching of > >>>> * the MMP block by the io_manager or the VM. It needs to be fresh. */ > >>>> if (fs->mmp_fd <= 0) { > >>>> + struct stat st; > >>>> int flags = O_RDWR | O_DIRECT; > >>>> > >>>> -retry: > >>>> + /* > >>>> + * There is no reason for using O_DIRECT if we're working with > >>>> + * regular file. Disabling it also avoids problems with > >>>> + * alignment when the device of the host file system has sector > >>>> + * size larger than blocksize of the fs we're working with. > >>>> + */ > >>>> + if (stat(fs->device_name, &st) == 0 && > >>>> + S_ISREG(st.st_mode)) > >>>> + flags &= ~O_DIRECT; > >>>> + > >>>> fs->mmp_fd = open(fs->device_name, flags); > >>>> if (fs->mmp_fd < 0) { > >>>> - struct stat st; > >>>> - > >>>> - /* Avoid O_DIRECT for filesystem image files if open > >>>> - * fails, since it breaks when running on tmpfs. */ > >>>> - if (errno == EINVAL && (flags & O_DIRECT) && > >>>> - stat(fs->device_name, &st) == 0 && > >>>> - S_ISREG(st.st_mode)) { > >>>> - flags &= ~O_DIRECT; > >>>> - goto retry; > >>>> - } > >>>> retval = EXT2_ET_MMP_OPEN_DIRECT; > >>>> goto out; > >>>> } > >>>> -- > >>>> 2.26.2 > >>>> > >>> > >>> > >>> Cheers, Andreas > >>> > >>> > >>> > >>> > >>> > >> > > >