Subject: Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
To: Christoph Hellwig, Douglas Gilbert
Cc: axboe@kernel.dk, martin.petersen@oracle.com, Johannes Thumshirn,
    Benjamin Block, linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
From: Boaz Harrosh
Message-ID: <406d1a96-2a97-2e35-e52e-22525555fc09@electrozaur.com>
Date: Sun, 23 Dec 2018 14:57:07 +0200
In-Reply-To: <20181220072656.GA10011@lst.de>

On 20/12/18 09:26, Christoph Hellwig wrote:
> On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>>> 1) reduce the size of every kernel with block layer support, and
>>>    even more for every kernel with scsi support
>>
>> By proposing the removal of bidi support from the block layer, it isn't
>> just the SCSI subsystem that will be impacted. Those NVMe documents
>> that you referred me to earlier in the year, in the command tables
>> in 1.3c and earlier you have noticed the 2 bit direction field and
>> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
>> the fact that NVMe's 64 byte command format has provision for 4
>> (not 2) independent data transfers (data + meta, for each direction).
>> Surely NVMe will sooner or later take advantage of those ... a
>> command like READ GATHERED comes to mind.
>
> NVMe on the other hand does not have support for separate read and write
> buffers as in the current SCSI bidi support, as it encodes the data
> transfers in that SQE. So IFF NVMe does bidi commands it would have
> to use a single buffer for data in/out,

There is no such thing as a "buffer". There is at first a bio, and after
virtual-to-iommu mapping a scatter-gather list. All of these are currently
governed by a struct request. request, bio, and sgl each have a single
direction, and all APIs expect a single direction.

All BIDI did was to say: let's not change any API or structure, but just
use two of them at the same time. The only ones any the wiser are the very
high-level user and the very low-level HW driver, like iscsi. All the
middleware was never touched.

In the view of a bidi target, say an osd, the whole stream looks like a
single "buffer" on the wire, where some of it is read and some of it is
written to.

> which can be easily done

?? Did you try? It will take much more than an additional pointer, sir.

> in the block layer without the current bidi support that chains
> two struct request instances for data in and data out.
>

That was the whole trick of not changing a single API or structure:
just have two of the same thing we already know how to handle.

>>> 2) reduce the size of the critical struct request structure by
>>>    128 bits, thus reducing the memory used by every blk-mq driver
>>>    significantly, never mind the cache effects
>>
>> Hmm, one pointer (that is null in the non-bidi case) should be enough,
>> that's 64 or 32 bits.
>
> Due to the way we use request chaining we need two fields at the
> moment. ->special and ->next_rq.

No! ->special has nothing to do with bidi. ->special is a field to be used
by LLDs only, and is not to be touched by the block layer, transports, or
high-level users.

A request has the single ->next_rq for bidi, and that could be eliminated
by sharing space with the elevator info. Do you want a patch?
(So in effect it can be taking 0 bytes, and yes, a little bit of code.)

> If we'd refactor the whole thing
> for the basically non-existent user we could indeed probably get it
> down to a single pointer.
>
>> While on the subject of bidi, the order of transfers: is the data-out
>> (to the target) always before the data-in or is it the target device
>> that decides (depending on the semantics of the command) who is first?
>
> The way I read SAM data needs to be transferred to the device for
> processing first, then the processing occurs and then it is transferred
> out, so the order seems fixed.
>

Not sure what the "SAM" above is. But for most of the BIDI commands I know,
osd and otherwise, the order is command specific, and many times it is done
in parallel: read some bits, then write some bits, rinse and repeat ...

(You see, in scsi the whole OUT buffer is part of the actual CDB, so in
effect any READ is a BIDI. The novelty here is the variable sizes and the
SW stack memory targets for the different operations.)

>>
>> Doug Gilbert
>>
>> *** there could already be vendor specific bidi NVMe commands out
>> there (ditto for SCSI)
>
> For NVMe they'd need to transfer data in and out in the same buffer
> to sort of work, and even then only if we don't happen to be bounce
> buffering using swiotlb, or using a network transport. Similarly for
> SCSI only iSCSI at the moment supports bidi CDBs, so we could have
> applications using vendor specific bidi commands on iSCSI, which
> is exactly what we're trying to find out, but it is a bit of a very
> niche use case.
>

Again, bidi works NOW. I have not yet seen the big gain of throwing it out.

Jai Maa
Boaz
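
For illustration, the ->next_rq space-sharing idea above could be sketched
roughly as follows. This is a simplified stand-alone C sketch, not the
kernel's actual struct request: the names sketch_request, sketch_elv_info,
SKETCH_RQF_BIDI, sketch_pair_bidi and sketch_bidi_peer are all hypothetical,
and it assumes a bidi request never goes through the I/O scheduler, so the
two union members are never live at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: when set, next_rq is the live member of the union. */
#define SKETCH_RQF_BIDI 0x1u

/* Hypothetical stand-in for the per-request I/O scheduler (elevator) data. */
struct sketch_elv_info {
    void *priv[2];
};

/* Heavily simplified request; NOT the kernel's struct request. */
struct sketch_request {
    unsigned int flags;
    union {
        struct sketch_elv_info elv;      /* ordinary, scheduled requests */
        struct sketch_request *next_rq;  /* bidi: the paired data-in request */
    };
};

/* Chain a data-out request to its data-in peer. */
static void sketch_pair_bidi(struct sketch_request *out,
                             struct sketch_request *in)
{
    assert(out != NULL && in != NULL && out != in);
    out->flags |= SKETCH_RQF_BIDI;
    out->next_rq = in;
}

/* Return the data-in peer, or NULL for a single-direction request. */
static struct sketch_request *sketch_bidi_peer(const struct sketch_request *rq)
{
    return (rq->flags & SKETCH_RQF_BIDI) ? rq->next_rq : NULL;
}
```

Because next_rq overlaps the scheduler data, the bidi pointer adds nothing
to the size of the structure in this sketch; the cost is the flag check in
the helper, which is the "little bit of code" part.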