Date: Wed, 21 Oct 2020 15:55:55 +0300
From: Sergei Shtepa
To: Matthew Wilcox
CC: Damien Le Moal, axboe@kernel.dk, viro@zeniv.linux.org.uk,
    hch@infradead.org, darrick.wong@oracle.com, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, rjw@rjwysocki.net, len.brown@intel.com,
    pavel@ucw.cz, akpm@linux-foundation.org, Johannes Thumshirn,
    ming.lei@redhat.com, jack@suse.cz, tj@kernel.org,
    gustavo@embeddedor.com, bvanassche@acm.org, osandov@fb.com,
    koct9i@gmail.com, steve@sk2.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH 1/2] Block layer filter - second version
Message-ID: <20201021125555.GE20749@veeam.com>
References: <1603271049-20681-1-git-send-email-sergei.shtepa@veeam.com>
    <1603271049-20681-2-git-send-email-sergei.shtepa@veeam.com>
    <20201021114438.GK20115@casper.infradead.org>
In-Reply-To: <20201021114438.GK20115@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The 10/21/2020 14:44, Matthew Wilcox wrote:
> On Wed, Oct 21, 2020 at 09:21:36AM +0000, Damien Le Moal wrote:
> > > + * submit_bio_direct - submit a bio to the block device layer for I/O
> > > + * bypass filter.
> > > + * @bio: The bio describing the location in memory and on the device.
> > > *
> > > + * Description:
> >
> > You don't need this line.
> >
> > > + * This is a version of submit_bio() that shall only be used for I/O
> > > + * that cannot be intercepted by block layer filters.
> > > + * All file systems and other upper level users of the block layer
> > > + * should use submit_bio() instead.
> > > + * Use this function to access the swap partition and directly access
> > > + * the block device file.
>
> I don't understand why O_DIRECT gets to bypass the block filter.
> Nor do I understand why anybody would place a block filter on the swap
> device. But if somebody did place a filter on the swap device, why
> should swap be able to bypass the filter?
>

I am very happy to hear such a question: you are really trying to
understand the algorithm.

Yes, intercepting the swap partition is absurd. But we can't guarantee
that a filter won't intercept swap. Swap operation is tied to the memory
allocation logic. If swap on the block device is accessed during a
memory allocation made by the filter, a deadlock occurs.
We could allow filters to occasionally shoot themselves in the foot,
especially under high load, but I think it's better not to.

"directly access" here does not mean O_DIRECT. It means (I think)
reading directly from the device file, as in "dd if=/dev/sda1".
As for intercepting direct reads, I don't know how to do it right.

The problem is that __blkdev_direct_IO() in fs/block_dev.c uses qc,
the value returned by submit_bio(). This value is used further down in
the call blk_poll(bdev_get_queue(dev), qc, true).
A filter cannot return a meaningful blk_qc_t value when it intercepts a
request, because at that point it does not know which queue the request
will end up in.

If submit_bio() always returned BLK_QC_T_NONE, I think the algorithm in
__blkdev_direct_IO() would not work correctly.
If we need to intercept direct access to a block device, we would at
least have to rework __blkdev_direct_IO() and get rid of blk_poll.
I'm not sure that is necessary yet.

-- 
Sergei Shtepa
Veeam Software developer.
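
To make the blk_qc_t point above concrete, here is a minimal user-space
sketch of the cookie problem. It is only a model, not kernel code: every
name starting with model_ is hypothetical, the four-slot queue is an
arbitrary assumption, and the early return for an invalid cookie is my
understanding of how a poller would behave, not a quote of blk_poll().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for blk_qc_t and BLK_QC_T_NONE (model only). */
typedef unsigned int model_qc_t;
#define MODEL_QC_T_NONE ((model_qc_t)-1)
#define MODEL_SLOTS 4u

/* A toy queue: one completion flag per in-flight slot. */
struct model_queue {
	bool done[MODEL_SLOTS];
};

/* Normal path: the request lands on a queue, so the slot index can
 * serve as the cookie the caller polls on later. */
model_qc_t model_submit(struct model_queue *q, unsigned int slot)
{
	assert(slot < MODEL_SLOTS);
	q->done[slot] = false;
	return slot;
}

/* Filtered path: the filter holds the request and has not chosen a
 * queue yet, so the only value it can hand back is "no cookie". */
model_qc_t model_submit_filtered(void)
{
	return MODEL_QC_T_NONE;
}

/* The device finishes the request in the given slot. */
void model_complete(struct model_queue *q, unsigned int slot)
{
	assert(slot < MODEL_SLOTS);
	q->done[slot] = true;
}

/* Poll for one request: 1 if it has completed, 0 otherwise.
 * Without a valid cookie there is nothing to look up, so the poller
 * can never observe the completion (assumed behaviour). */
int model_poll(const struct model_queue *q, model_qc_t cookie)
{
	if (cookie == MODEL_QC_T_NONE)
		return 0;
	return q->done[cookie] ? 1 : 0;
}
```

In this model a caller that polls on the value returned by
model_submit_filtered() always gets 0, even after the underlying request
has completed, which is the kind of situation a polling caller such as
__blkdev_direct_IO() would have to be reworked to handle.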