Date: Wed, 13 Jul 2022 04:56:04 -0700
From: Christoph Hellwig
To: Sergei Shtepa
Cc: Christoph Hellwig, axboe@kernel.dk, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 01/20] block, blk_filter: enable block device filters
References: <1655135593-1900-1-git-send-email-sergei.shtepa@veeam.com>
	<1655135593-1900-2-git-send-email-sergei.shtepa@veeam.com>

On Fri, Jul 08, 2022 at 12:45:33PM +0200, Sergei Shtepa wrote:
> 1. Work at the partition or disk level?
> At the user level, programs operate on block devices.
> In fact, the "disk" entity makes sense only at the kernel level.
> When the user chooses which block devices to back up and which not,
> he operates on mount points, which are converted into block devices,
> i.e. partitions. Therefore, it is better to handle the bio before it
> is remapped to the disk.
> If the filtering is performed after remapping, then we will be forced
> to apply a filter to the entire disk, or to complicate the filtering
> algorithm by calculating which range of sectors a bio is addressed
> to. And what if a bio crosses a partition boundary...
> Filtering at the block device level seems to me the simpler solution.
> But this is not the biggest problem.

Note that bi_bdev still points to the partition the bio was submitted
to. So we could still do the filtering after blk_partition_remap has
been called; the filter driver just needs to be careful about how it
interprets the sector numbers.
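For example (untested, and the filter hook itself is hypothetical here,
since there is no in-tree filter API yet; get_start_sect() and
bdev_is_partition() are existing helpers), the sector math after the
remap could look roughly like this:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Hypothetical filter hook, assumed to be called after
 * blk_partition_remap() has made bi_iter.bi_sector relative to the
 * whole disk.  bi_bdev still points to the partition the bio was
 * originally submitted to, so the partition-relative offset can be
 * recovered from the partition start.
 */
static void example_filter_bio(struct bio *bio)
{
	sector_t disk_sector = bio->bi_iter.bi_sector;
	sector_t part_sector = disk_sector;

	if (bdev_is_partition(bio->bi_bdev))
		part_sector -= get_start_sect(bio->bi_bdev);

	/*
	 * Filter based on the partition-relative range
	 * [part_sector, part_sector + bio_sectors(bio)).
	 */
}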
> 2. Can the filter sleep or postpone bio processing to the worker thread?

I think all of the above is fine for normal submit_bio based drivers.

> The problem is in the implementation of the COW algorithm.
> If I send a bio to read a chunk (one bio), and then pass a write bio,
> then with some probability I am reading partially overwritten data.
> Writing overtakes reading. And the REQ_SYNC and REQ_PREFLUSH flags
> don't help.
> Maybe it's a disk driver issue, or a hypervisor, or a NAS, or a RAID,
> or maybe normal behavior. I don't know. Although maybe I'm not working
> with the flags correctly. I have seen the comments on patch 11/20, but
> I am not sure that the fixes will solve this problem.
> But because of this, I have to postpone the write until the read
> completes.

In the I/O stack there really isn't any ordering. While such a general
reordering looks a bit odd to me, it is absolutely always possible.

> 2.1 The easiest way to solve the problem is to block the writer's
> thread with a semaphore. And for a bio with the REQ_NOWAIT flag,
> complete processing with bio_wouldblock_error(). This is the solution
> currently being used.

This sounds ok. The other option would be to put the write on hold and
only queue it up from the read completion (or rather from a workqueue
kicked off by the read completion). But this is basically the same,
just without blocking the I/O submitter, so we could do the semaphore
first and optimize later as needed.

> If I am blocked by the q->q_usage_counter counter, then I will not be
> able to execute the COW in the context of the current thread due to
> deadlocks. I will have to use a scheme with an additional worker
> thread. Bio filtering will become much more complicated.

q_usage_counter itself doesn't really block you from doing anything.
You can still sleep inside of it, and most drivers do that.
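To illustrate the scheme from 2.1 above, here is a rough, untested
sketch of how the write path could gate on an in-flight chunk read,
failing REQ_NOWAIT bios with bio_wouldblock_error() instead of
sleeping. The cow_chunk/cow_gate_write names are made up for
illustration; only REQ_NOWAIT, bio_wouldblock_error() and the
completion API are existing kernel interfaces. It uses a completion
rather than a semaphore, but the idea is the same:

#include <linux/bio.h>
#include <linux/completion.h>

/* Hypothetical per-chunk COW state. */
struct cow_chunk {
	struct completion copied;	/* completed by the COW read's endio */
	bool needs_copy;		/* original data not yet read out */
};

/*
 * Called for an incoming write that overlaps @chunk.  Returns true if
 * the caller may submit the write, false if the bio was completed.
 */
static bool cow_gate_write(struct cow_chunk *chunk, struct bio *bio)
{
	if (!READ_ONCE(chunk->needs_copy))
		return true;			/* already copied, pass through */

	if (bio->bi_opf & REQ_NOWAIT) {
		bio_wouldblock_error(bio);	/* complete with BLK_STS_AGAIN */
		return false;
	}

	/* Block the submitter until the COW read has finished. */
	wait_for_completion(&chunk->copied);
	return true;
}

The workqueue variant mentioned above would instead queue the bio on a
list here and have the read completion handler resubmit it, which
avoids blocking the submitter at the cost of a bit more plumbing.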