Date: Wed, 13 Dec 2023 09:25:38 +0800
From: Ming Lei
To: John Garry
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
	jack@suse.cz, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
	jaswin@linux.ibm.com, bvanassche@acm.org, Himanshu Madhani
Subject: Re: [PATCH v2 01/16] block: Add atomic write operations to request_queue limits
References: <20231212110844.19698-1-john.g.garry@oracle.com>
	<20231212110844.19698-2-john.g.garry@oracle.com>
In-Reply-To: <20231212110844.19698-2-john.g.garry@oracle.com>

On Tue, Dec 12, 2023 at 11:08:29AM +0000, John Garry wrote:
> From: Himanshu Madhani
>
> Add the following limits:
> - atomic_write_boundary_bytes
> - atomic_write_max_bytes
> - atomic_write_unit_max_bytes
> - atomic_write_unit_min_bytes
>
> All atomic writes limits are initialised to 0 to indicate no atomic write
> support. Stacked devices are just not supported either for now.
>
> Signed-off-by: Himanshu Madhani
> #jpg: Heavy rewrite
> Signed-off-by: John Garry
> ---
>  Documentation/ABI/stable/sysfs-block | 47 ++++++++++++++++++++++
>  block/blk-settings.c                 | 60 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                    | 33 +++++++++++++++
>  include/linux/blkdev.h               | 37 +++++++++++++++++
>  4 files changed, 177 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 1fe9a553c37b..ba81a081522f 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -21,6 +21,53 @@ Description:
>  		device is offset from the internal allocation unit's
>  		natural alignment.
>
> +What:		/sys/block//atomic_write_max_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter specifies the maximum atomic write
> +		size reported by the device. This parameter is relevant
> +		for merging of writes, where a merged atomic write
> +		operation must not exceed this number of bytes.
> +		The atomic_write_max_bytes may exceed the value in
> +		atomic_write_unit_max_bytes if atomic_write_max_bytes
> +		is not a power-of-two or atomic_write_unit_max_bytes is
> +		limited by some queue limits, such as max_segments.
> +
> +
> +What:		/sys/block//atomic_write_unit_min_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter specifies the smallest block which can
> +		be written atomically with an atomic write operation. All
> +		atomic write operations must begin at a
> +		atomic_write_unit_min boundary and must be multiples of
> +		atomic_write_unit_min. This value must be a power-of-two.
> +
> +
> +What:		/sys/block//atomic_write_unit_max_bytes
> +Date:		January 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter defines the largest block which can be
> +		written atomically with an atomic write operation. This
> +		value must be a multiple of atomic_write_unit_min and must
> +		be a power-of-two.
> +
> +
> +What:		/sys/block//atomic_write_boundary_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] A device may need to internally split I/Os which
> +		straddle a given logical block address boundary. In that
> +		case a single atomic write operation will be processed as
> +		one of more sub-operations which each complete atomically.
> +		This parameter specifies the size in bytes of the atomic
> +		boundary if one is reported by the device. This value must
> +		be a power-of-two.
> +
>
>  What:		/sys/block//diskseq
>  Date:		February 2021
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 0046b447268f..d151be394c98 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -59,6 +59,10 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> +	lim->atomic_write_unit_min_sectors = 0;
> +	lim->atomic_write_unit_max_sectors = 0;
> +	lim->atomic_write_max_sectors = 0;
> +	lim->atomic_write_boundary_sectors = 0;

Can we move these four into a single structure and set them up through
a single API? Then cross-validation can be done in that API.

>  }
>
>  /**
> @@ -183,6 +187,62 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>
> +/**
> + * blk_queue_atomic_write_max_bytes - set max bytes supported by
> + * the device for atomic write operations.
> + * @q: the request queue for the device
> + * @size: maximum bytes supported
> + */
> +void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> +				      unsigned int bytes)
> +{
> +	q->limits.atomic_write_max_sectors = bytes >> SECTOR_SHIFT;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_max_bytes);

What if the driver doesn't call it but does support atomic writes?

I guess the default max sectors should be atomic_write_unit_max_sectors
if the feature is enabled.

> +
> +/**
> + * blk_queue_atomic_write_boundary_bytes - Device's logical block address space
> + * which an atomic write should not cross.
> + * @q: the request queue for the device
> + * @bytes: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_boundary_bytes(struct request_queue *q,
> +					   unsigned int bytes)
> +{
> +	q->limits.atomic_write_boundary_sectors = bytes >> SECTOR_SHIFT;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_boundary_bytes);

Should the default atomic_write_boundary_sectors be
atomic_write_unit_max_sectors when atomic writes are enabled?

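To make the two defaulting questions concrete, here is a rough userspace
sketch, not code from the patch. The struct and the helper name
atomic_write_apply_defaults() are hypothetical, and "feature enabled" is
assumed to mean a non-zero atomic_write_unit_max_sectors:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the four new queue_limits fields (512-byte sectors). */
struct atomic_write_limits {
	unsigned int unit_min_sectors;
	unsigned int unit_max_sectors;
	unsigned int max_sectors;
	unsigned int boundary_sectors;
};

/*
 * Hypothetical helper: fill in the fields a driver left at 0, but only when
 * atomic writes are enabled (assumed here to mean unit_max_sectors != 0).
 */
static void atomic_write_apply_defaults(struct atomic_write_limits *lim)
{
	assert(lim != NULL);

	if (!lim->unit_max_sectors)
		return;	/* feature not enabled, keep everything at 0 */

	/* Driver never set a merge limit: fall back to the largest unit. */
	if (!lim->max_sectors)
		lim->max_sectors = lim->unit_max_sectors;

	/* Driver never set a boundary: fall back to the largest unit. */
	if (!lim->boundary_sectors)
		lim->boundary_sectors = lim->unit_max_sectors;
}
```

With defaults applied in one place, a driver that only reports the unit
min/max still ends up with usable max and boundary values.
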
> +
> +/**
> + * blk_queue_atomic_write_unit_min_sectors - smallest unit that can be written
> + * atomically to the device.
> + * @q: the request queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_min_sectors(struct request_queue *q,
> +					     unsigned int sectors)
> +{
> +	struct queue_limits *limits = &q->limits;
> +
> +	limits->atomic_write_unit_min_sectors = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_min_sectors);

atomic_write_unit_min_sectors should be >= (physical block size >> 9),
given that the minimum atomic write unit is the physical sector for all
disks.

> +
> +/*
> + * blk_queue_atomic_write_unit_max_sectors - largest unit that can be written
> + * atomically to the device.
> + * @q: the request queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_max_sectors(struct request_queue *q,
> +					     unsigned int sectors)
> +{
> +	struct queue_limits *limits = &q->limits;
> +
> +	limits->atomic_write_unit_max_sectors = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_max_sectors);

atomic_write_unit_max_sectors should be >= atomic_write_unit_min_sectors.

Thanks,
Ming
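
For illustration only, the single-structure idea with cross-validation
could look roughly like the userspace sketch below. The struct, the
function atomic_write_limits_set() and its physical_block_size parameter
are hypothetical names, not part of the patch. The checks are the ones
raised in this review plus the power-of-two rules from the quoted sysfs
documentation:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Hypothetical grouping of the four limits, all in 512-byte sectors. */
struct atomic_write_limits {
	unsigned int unit_min_sectors;
	unsigned int unit_max_sectors;
	unsigned int max_sectors;
	unsigned int boundary_sectors;
};

static bool is_pow2(unsigned int n)
{
	return n && !(n & (n - 1));
}

/*
 * Hypothetical single entry point: validate all four values against each
 * other and against the physical block size, then commit them together.
 * On failure *dst is left untouched and -EINVAL is returned.
 */
static int atomic_write_limits_set(struct atomic_write_limits *dst,
				   const struct atomic_write_limits *src,
				   unsigned int physical_block_size)
{
	assert(dst != NULL && src != NULL);

	/* Unit min/max must be powers of two (quoted sysfs documentation). */
	if (!is_pow2(src->unit_min_sectors) || !is_pow2(src->unit_max_sectors))
		return -EINVAL;

	/* The minimum unit cannot be smaller than one physical block. */
	if (src->unit_min_sectors < (physical_block_size >> SECTOR_SHIFT))
		return -EINVAL;

	/* The maximum unit cannot be smaller than the minimum unit. */
	if (src->unit_max_sectors < src->unit_min_sectors)
		return -EINVAL;

	/* A boundary is optional (0), but if present must be a power of two. */
	if (src->boundary_sectors && !is_pow2(src->boundary_sectors))
		return -EINVAL;

	*dst = *src;
	return 0;
}
```

Committing the values only after every check passes means a driver can
never leave the queue with a half-updated, inconsistent set of limits.
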