This series consolidates the documentation for /sys/block/<disk>/queue/
into Documentation/ABI/, where it is supposed to go (as per Greg KH:
https://lore.kernel.org/r/[email protected]).
This series also updates MAINTAINERS to associate the block
documentation with the block layer.
This series applies to linux-block/for-next.
Eric Biggers (7):
docs: sysfs-block: sort alphabetically
docs: sysfs-block: add contact for nomerges
docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
docs: sysfs-block: document stable_writes
docs: sysfs-block: document virt_boundary_mask
docs: block: remove queue-sysfs.rst
MAINTAINERS: add entries for block layer documentation
Documentation/ABI/testing/sysfs-block | 766 ++++++++++++++++++--------
Documentation/block/index.rst | 1 -
Documentation/block/queue-sysfs.rst | 321 -----------
MAINTAINERS | 2 +
4 files changed, 545 insertions(+), 545 deletions(-)
delete mode 100644 Documentation/block/queue-sysfs.rst
base-commit: c2626d30f312afc341158e07bf088f5a23b4eeeb
--
2.34.1
From: Eric Biggers <[email protected]>
/sys/block/<disk>/queue/stable_writes is completely undocumented.
Document it.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-block | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 94711edc6529d..5d8b187e1ec54 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -515,6 +515,16 @@ Description:
scheduler module, if it isn't already present in the system.
+What: /sys/block/<disk>/queue/stable_writes
+Date: September 2020
+Contact: [email protected]
+Description:
+ [RW] If the device requires that memory must not be modified
+ while it is being written out to disk, this file will contain
+ '1'. Otherwise it will contain '0'. This file is writable for
+ testing purposes.
+
+
What: /sys/block/<disk>/queue/throttle_sample_time
Date: March 2017
Contact: [email protected]
--
2.34.1
From: Eric Biggers <[email protected]>
/sys/block/<disk>/queue/virt_boundary_mask is completely undocumented.
Document it.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-block | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 5d8b187e1ec54..8cc795476244c 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -536,6 +536,17 @@ Description:
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
+What: /sys/block/<disk>/queue/virt_boundary_mask
+Date: April 2021
+Contact: [email protected]
+Description:
+ [RO] This file shows the I/O segment alignment mask for the
+ block device. I/O requests to this device will be split between
+ segments wherever either the end of the previous segment or the
+ beginning of the current segment is not aligned to
+ virt_boundary_mask + 1 bytes.
+
+
What: /sys/block/<disk>/queue/wbt_lat_usec
Date: November 2016
Contact: [email protected]
--
2.34.1
From: Eric Biggers <[email protected]>
The nomerges file was missing a "Contact" entry. Use linux-block.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-block | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 9febd53a5ebe8..c70fce6b76c17 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -242,7 +242,7 @@ Description:
What: /sys/block/<disk>/queue/nomerges
Date: January 2010
-Contact:
+Contact: [email protected]
Description:
Standard I/O elevator operations include attempts to
merge contiguous I/Os. For known random I/O loads these
--
2.34.1
From: Eric Biggers <[email protected]>
sysfs documentation is supposed to go in Documentation/ABI/.
However, /sys/block/<disk>/queue/* are documented in
Documentation/block/queue-sysfs.rst, and sometimes redundantly in
Documentation/ABI/testing/sysfs-block too.
Let's consolidate this documentation into Documentation/ABI/.
Therefore, copy the relevant docs from queue-sysfs.rst into sysfs-block.
This primarily means adding the 25 missing files that were documented in
queue-sysfs.rst only, as well as mentioning the RO/RW status of files.
Documentation/ABI/ requires "Date" and "Contact" fields. For the Date
fields, I used the date of the commit which added support for each file.
For the "Contact" fields, I used linux-block.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-block | 481 ++++++++++++++++++++------
1 file changed, 380 insertions(+), 101 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c70fce6b76c17..94711edc6529d 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -46,7 +46,7 @@ Description:
The value type is unsigned int.
Cf. Documentation/block/stat.rst which contains a single value for
requests in flight.
- This is related to nr_requests in Documentation/block/queue-sysfs.rst
+ This is related to /sys/block/<disk>/queue/nr_requests
and for SCSI device also its queue_depth.
@@ -134,207 +134,486 @@ Description:
same as the format of /sys/block/<disk>/stat.
+What: /sys/block/<disk>/queue/add_random
+Date: June 2010
+Contact: [email protected]
+Description:
+ [RW] This file allows to turn off the disk entropy contribution.
+ Default value of this file is '1'(on).
+
+
What: /sys/block/<disk>/queue/chunk_sectors
Date: September 2016
Contact: Hannes Reinecke <[email protected]>
Description:
- chunk_sectors has different meaning depending on the type
+ [RO] chunk_sectors has different meaning depending on the type
of the disk. For a RAID device (dm-raid), chunk_sectors
- indicates the size in 512B sectors of the RAID volume
- stripe segment. For a zoned block device, either
- host-aware or host-managed, chunk_sectors indicates the
- size in 512B sectors of the zones of the device, with
- the eventual exception of the last zone of the device
- which may be smaller.
+ indicates the size in 512B sectors of the RAID volume stripe
+ segment. For a zoned block device, either host-aware or
+ host-managed, chunk_sectors indicates the size in 512B sectors
+ of the zones of the device, with the eventual exception of the
+ last zone of the device which may be smaller.
+
+What: /sys/block/<disk>/queue/dax
+Date: June 2016
+Contact: [email protected]
+Description:
+ [RO] This file indicates whether the device supports Direct
+ Access (DAX), used by CPU-addressable storage to bypass the
+ pagecache. It shows '1' if true, '0' if not.
What: /sys/block/<disk>/queue/discard_granularity
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
Description:
- Devices that support discard functionality may
- internally allocate space using units that are bigger
- than the logical block size. The discard_granularity
- parameter indicates the size of the internal allocation
- unit in bytes if reported by the device. Otherwise the
- discard_granularity will be set to match the device's
- physical block size. A discard_granularity of 0 means
- that the device does not support discard functionality.
+ [RO] Devices that support discard functionality may internally
+ allocate space using units that are bigger than the logical
+ block size. The discard_granularity parameter indicates the size
+ of the internal allocation unit in bytes if reported by the
+ device. Otherwise the discard_granularity will be set to match
+ the device's physical block size. A discard_granularity of 0
+ means that the device does not support discard functionality.
What: /sys/block/<disk>/queue/discard_max_bytes
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
Description:
- Devices that support discard functionality may have
- internal limits on the number of bytes that can be
- trimmed or unmapped in a single operation. Some storage
- protocols also have inherent limits on the number of
- blocks that can be described in a single command. The
- discard_max_bytes parameter is set by the device driver
- to the maximum number of bytes that can be discarded in
- a single operation. Discard requests issued to the
- device must not exceed this limit. A discard_max_bytes
- value of 0 means that the device does not support
- discard functionality.
+ [RW] While discard_max_hw_bytes is the hardware limit for the
+ device, this setting is the software limit. Some devices exhibit
+ large latencies when large discards are issued, setting this
+ value lower will make Linux issue smaller discards and
+ potentially help reduce latencies induced by large discard
+ operations.
+
+
+What: /sys/block/<disk>/queue/discard_max_hw_bytes
+Date: July 2015
+Contact: [email protected]
+Description:
+ [RO] Devices that support discard functionality may have
+ internal limits on the number of bytes that can be trimmed or
+ unmapped in a single operation. The `discard_max_hw_bytes`
+ parameter is set by the device driver to the maximum number of
+ bytes that can be discarded in a single operation. Discard
+ requests issued to the device must not exceed this limit. A
+ `discard_max_hw_bytes` value of 0 means that the device does not
+ support discard functionality.
What: /sys/block/<disk>/queue/discard_zeroes_data
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
Description:
- Will always return 0. Don't rely on any specific behavior
+ [RO] Will always return 0. Don't rely on any specific behavior
for discards, and don't read this file.
+What: /sys/block/<disk>/queue/fua
+Date: May 2018
+Contact: [email protected]
+Description:
+ [RO] Whether or not the block driver supports the FUA flag for
+ write requests. FUA stands for Force Unit Access. If the FUA
+ flag is set that means that write requests must bypass the
+ volatile cache of the storage device.
+
+
+What: /sys/block/<disk>/queue/hw_sector_size
+Date: January 2008
+Contact: [email protected]
+Description:
+ [RO] This is the hardware sector size of the device, in bytes.
+
+
+What: /sys/block/<disk>/queue/independent_access_ranges/
+Date: October 2021
+Contact: [email protected]
+Description:
+ [RO] The presence of this sub-directory of the
+ /sys/block/xxx/queue/ directory indicates that the device is
+ capable of executing requests targeting different sector ranges
+ in parallel. For instance, single LUN multi-actuator hard-disks
+ will have an independent_access_ranges directory if the device
+ correctly advertizes the sector ranges of its actuators.
+
+ The independent_access_ranges directory contains one directory
+ per access range, with each range described using the sector
+ (RO) attribute file to indicate the first sector of the range
+ and the nr_sectors (RO) attribute file to indicate the total
+ number of sectors in the range starting from the first sector of
+ the range. For example, a dual-actuator hard-disk will have the
+ following independent_access_ranges entries.::
+
+ $ tree /sys/block/<disk>/queue/independent_access_ranges/
+ /sys/block/<disk>/queue/independent_access_ranges/
+ |-- 0
+ | |-- nr_sectors
+ | `-- sector
+ `-- 1
+ |-- nr_sectors
+ `-- sector
+
+ The sector and nr_sectors attributes use 512B sector unit,
+ regardless of the actual block size of the device. Independent
+ access ranges do not overlap and include all sectors within the
+ device capacity. The access ranges are numbered in increasing
+ order of the range start sector, that is, the sector attribute
+ of range 0 always has the value 0.
+
+
+What: /sys/block/<disk>/queue/io_poll
+Date: November 2015
+Contact: [email protected]
+Description:
+ [RW] When read, this file shows whether polling is enabled (1)
+ or disabled (0). Writing '0' to this file will disable polling
+ for this device. Writing any non-zero value will enable this
+ feature.
+
+
+What: /sys/block/<disk>/queue/io_poll_delay
+Date: November 2016
+Contact: [email protected]
+Description:
+ [RW] If polling is enabled, this controls what kind of polling
+ will be performed. It defaults to -1, which is classic polling.
+ In this mode, the CPU will repeatedly ask for completions
+ without giving up any time. If set to 0, a hybrid polling mode
+ is used, where the kernel will attempt to make an educated guess
+ at when the IO will complete. Based on this guess, the kernel
+ will put the process issuing IO to sleep for an amount of time,
+ before entering a classic poll loop. This mode might be a little
+ slower than pure classic polling, but it will be more efficient.
+ If set to a value larger than 0, the kernel will put the process
+ issuing IO to sleep for this amount of microseconds before
+ entering classic polling.
+
+
What: /sys/block/<disk>/queue/io_timeout
Date: November 2018
Contact: Weiping Zhang <[email protected]>
Description:
- io_timeout is the request timeout in milliseconds. If a request
- does not complete in this time then the block driver timeout
- handler is invoked. That timeout handler can decide to retry
- the request, to fail it or to start a device recovery strategy.
+ [RW] io_timeout is the request timeout in milliseconds. If a
+ request does not complete in this time then the block driver
+ timeout handler is invoked. That timeout handler can decide to
+ retry the request, to fail it or to start a device recovery
+ strategy.
+
+
+What: /sys/block/<disk>/queue/iostats
+Date: January 2009
+Contact: [email protected]
+Description:
+ [RW] This file is used to control (on/off) the iostats
+ accounting of the disk.
What: /sys/block/<disk>/queue/logical_block_size
Date: May 2009
Contact: Martin K. Petersen <[email protected]>
Description:
- This is the smallest unit the storage device can
- address. It is typically 512 bytes.
+ [RO] This is the smallest unit the storage device can address.
+ It is typically 512 bytes.
What: /sys/block/<disk>/queue/max_active_zones
Date: July 2020
Contact: Niklas Cassel <[email protected]>
Description:
- For zoned block devices (zoned attribute indicating
+ [RO] For zoned block devices (zoned attribute indicating
"host-managed" or "host-aware"), the sum of zones belonging to
any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
is limited by this value. If this value is 0, there is no limit.
+ If the host attempts to exceed this limit, the driver should
+ report this error with BLK_STS_ZONE_ACTIVE_RESOURCE, which user
+ space may see as the EOVERFLOW errno.
+
+
+What: /sys/block/<disk>/queue/max_discard_segments
+Date: February 2017
+Contact: [email protected]
+Description:
+ [RO] The maximum number of DMA scatter/gather entries in a
+ discard request.
+
+
+What: /sys/block/<disk>/queue/max_hw_sectors_kb
+Date: September 2004
+Contact: [email protected]
+Description:
+ [RO] This is the maximum number of kilobytes supported in a
+ single data transfer.
+
+
+What: /sys/block/<disk>/queue/max_integrity_segments
+Date: September 2010
+Contact: [email protected]
+Description:
+ [RO] Maximum number of elements in a DMA scatter/gather list
+ with integrity data that will be submitted by the block layer
+ core to the associated block driver.
+
What: /sys/block/<disk>/queue/max_open_zones
Date: July 2020
Contact: Niklas Cassel <[email protected]>
Description:
- For zoned block devices (zoned attribute indicating
+ [RO] For zoned block devices (zoned attribute indicating
"host-managed" or "host-aware"), the sum of zones belonging to
- any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
- is limited by this value. If this value is 0, there is no limit.
+ any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN, is
+ limited by this value. If this value is 0, there is no limit.
+
+
+What: /sys/block/<disk>/queue/max_sectors_kb
+Date: September 2004
+Contact: [email protected]
+Description:
+ [RW] This is the maximum number of kilobytes that the block
+ layer will allow for a filesystem request. Must be smaller than
+ or equal to the maximum size allowed by the hardware.
+
+
+What: /sys/block/<disk>/queue/max_segment_size
+Date: March 2010
+Contact: [email protected]
+Description:
+ [RO] Maximum size in bytes of a single element in a DMA
+ scatter/gather list.
+
+
+What: /sys/block/<disk>/queue/max_segments
+Date: March 2010
+Contact: [email protected]
+Description:
+ [RO] Maximum number of elements in a DMA scatter/gather list
+ that is submitted to the associated block driver.
What: /sys/block/<disk>/queue/minimum_io_size
Date: April 2009
Contact: Martin K. Petersen <[email protected]>
Description:
- Storage devices may report a granularity or preferred
- minimum I/O size which is the smallest request the
- device can perform without incurring a performance
- penalty. For disk drives this is often the physical
- block size. For RAID arrays it is often the stripe
- chunk size. A properly aligned multiple of
- minimum_io_size is the preferred request size for
- workloads where a high number of I/O operations is
- desired.
+ [RO] Storage devices may report a granularity or preferred
+ minimum I/O size which is the smallest request the device can
+ perform without incurring a performance penalty. For disk
+ drives this is often the physical block size. For RAID arrays
+ it is often the stripe chunk size. A properly aligned multiple
+ of minimum_io_size is the preferred request size for workloads
+ where a high number of I/O operations is desired.
What: /sys/block/<disk>/queue/nomerges
Date: January 2010
Contact: [email protected]
Description:
- Standard I/O elevator operations include attempts to
- merge contiguous I/Os. For known random I/O loads these
- attempts will always fail and result in extra cycles
- being spent in the kernel. This allows one to turn off
- this behavior on one of two ways: When set to 1, complex
- merge checks are disabled, but the simple one-shot merges
- with the previous I/O request are enabled. When set to 2,
- all merge tries are disabled. The default value is 0 -
- which enables all types of merge tries.
+ [RW] Standard I/O elevator operations include attempts to merge
+ contiguous I/Os. For known random I/O loads these attempts will
+ always fail and result in extra cycles being spent in the
+ kernel. This allows one to turn off this behavior on one of two
+ ways: When set to 1, complex merge checks are disabled, but the
+ simple one-shot merges with the previous I/O request are
+ enabled. When set to 2, all merge tries are disabled. The
+ default value is 0 - which enables all types of merge tries.
+
+
+What: /sys/block/<disk>/queue/nr_requests
+Date: July 2003
+Contact: [email protected]
+Description:
+ [RW] This controls how many requests may be allocated in the
+ block layer for read or write requests. Note that the total
+ allocated number may be twice this amount, since it applies only
+ to reads or writes (not the accumulated sum).
+
+ To avoid priority inversion through request starvation, a
+ request queue maintains a separate request pool per each cgroup
+ when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
+ each such per-block-cgroup request pool. IOW, if there are N
+ block cgroups, each request queue may have up to N request
+ pools, each independently regulated by nr_requests.
What: /sys/block/<disk>/queue/nr_zones
Date: November 2018
Contact: Damien Le Moal <[email protected]>
Description:
- nr_zones indicates the total number of zones of a zoned block
- device ("host-aware" or "host-managed" zone model). For regular
- block devices, the value is always 0.
+ [RO] nr_zones indicates the total number of zones of a zoned
+ block device ("host-aware" or "host-managed" zone model). For
+ regular block devices, the value is always 0.
What: /sys/block/<disk>/queue/optimal_io_size
Date: April 2009
Contact: Martin K. Petersen <[email protected]>
Description:
- Storage devices may report an optimal I/O size, which is
- the device's preferred unit for sustained I/O. This is
- rarely reported for disk drives. For RAID arrays it is
- usually the stripe width or the internal track size. A
- properly aligned multiple of optimal_io_size is the
- preferred request size for workloads where sustained
- throughput is desired. If no optimal I/O size is
- reported this file contains 0.
+ [RO] Storage devices may report an optimal I/O size, which is
+ the device's preferred unit for sustained I/O. This is rarely
+ reported for disk drives. For RAID arrays it is usually the
+ stripe width or the internal track size. A properly aligned
+ multiple of optimal_io_size is the preferred request size for
+ workloads where sustained throughput is desired. If no optimal
+ I/O size is reported this file contains 0.
What: /sys/block/<disk>/queue/physical_block_size
Date: May 2009
Contact: Martin K. Petersen <[email protected]>
Description:
- This is the smallest unit a physical storage device can
- write atomically. It is usually the same as the logical
- block size but may be bigger. One example is SATA
- drives with 4KB sectors that expose a 512-byte logical
- block size to the operating system. For stacked block
- devices the physical_block_size variable contains the
- maximum physical_block_size of the component devices.
+ [RO] This is the smallest unit a physical storage device can
+ write atomically. It is usually the same as the logical block
+ size but may be bigger. One example is SATA drives with 4KB
+ sectors that expose a 512-byte logical block size to the
+ operating system. For stacked block devices the
+ physical_block_size variable contains the maximum
+ physical_block_size of the component devices.
+
+
+What: /sys/block/<disk>/queue/read_ahead_kb
+Date: May 2004
+Contact: [email protected]
+Description:
+ [RW] Maximum number of kilobytes to read-ahead for filesystems
+ on this block device.
+
+
+What: /sys/block/<disk>/queue/rotational
+Date: January 2009
+Contact: [email protected]
+Description:
+ [RW] This file is used to stat if the device is of rotational
+ type or non-rotational type.
+
+
+What: /sys/block/<disk>/queue/rq_affinity
+Date: September 2008
+Contact: [email protected]
+Description:
+ [RW] If this option is '1', the block layer will migrate request
+ completions to the cpu "group" that originally submitted the
+ request. For some workloads this provides a significant
+ reduction in CPU cycles due to caching effects.
+
+ For storage configurations that need to maximize distribution of
+ completion processing setting this option to '2' forces the
+ completion to run on the requesting cpu (bypassing the "group"
+ aggregation logic).
+
+
+What: /sys/block/<disk>/queue/scheduler
+Date: October 2004
+Contact: [email protected]
+Description:
+ [RW] When read, this file will display the current and available
+ IO schedulers for this block device. The currently active IO
+ scheduler will be enclosed in [] brackets. Writing an IO
+ scheduler name to this file will switch control of this block
+ device to that new IO scheduler. Note that writing an IO
+ scheduler name to this file will attempt to load that IO
+ scheduler module, if it isn't already present in the system.
+
+
+What: /sys/block/<disk>/queue/throttle_sample_time
+Date: March 2017
+Contact: [email protected]
+Description:
+ [RW] This is the time window that blk-throttle samples data, in
+ millisecond. blk-throttle makes decision based on the
+ samplings. Lower time means cgroups have more smooth throughput,
+ but higher CPU overhead. This exists only when
+ CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
+
+
+What: /sys/block/<disk>/queue/wbt_lat_usec
+Date: November 2016
+Contact: [email protected]
+Description:
+ [RW] If the device is registered for writeback throttling, then
+ this file shows the target minimum read latency. If this latency
+ is exceeded in a given window of time (see wb_window_usec), then
+ the writeback throttling will start scaling back writes. Writing
+ a value of '0' to this file disables the feature. Writing a
+ value of '-1' to this file resets the value to the default
+ setting.
+
+
+What: /sys/block/<disk>/queue/write_cache
+Date: April 2016
+Contact: [email protected]
+Description:
+ [RW] When read, this file will display whether the device has
+ write back caching enabled or not. It will return "write back"
+ for the former case, and "write through" for the latter. Writing
+ to this file can change the kernels view of the device, but it
+ doesn't alter the device state. This means that it might not be
+ safe to toggle the setting from "write back" to "write through",
+ since that will also eliminate cache flushes issued by the
+ kernel.
What: /sys/block/<disk>/queue/write_same_max_bytes
Date: January 2012
Contact: Martin K. Petersen <[email protected]>
Description:
- Some devices support a write same operation in which a
+ [RO] Some devices support a write same operation in which a
single data block can be written to a range of several
- contiguous blocks on storage. This can be used to wipe
- areas on disk or to initialize drives in a RAID
- configuration. write_same_max_bytes indicates how many
- bytes can be written in a single write same command. If
- write_same_max_bytes is 0, write same is not supported
- by the device.
+ contiguous blocks on storage. This can be used to wipe areas on
+ disk or to initialize drives in a RAID configuration.
+ write_same_max_bytes indicates how many bytes can be written in
+ a single write same command. If write_same_max_bytes is 0, write
+ same is not supported by the device.
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
Date: November 2016
Contact: Chaitanya Kulkarni <[email protected]>
Description:
- Devices that support write zeroes operation in which a
- single request can be issued to zero out the range of
- contiguous blocks on storage without having any payload
- in the request. This can be used to optimize writing zeroes
- to the devices. write_zeroes_max_bytes indicates how many
- bytes can be written in a single write zeroes command. If
- write_zeroes_max_bytes is 0, write zeroes is not supported
- by the device.
+ [RO] Devices that support write zeroes operation in which a
+ single request can be issued to zero out the range of contiguous
+ blocks on storage without having any payload in the request.
+ This can be used to optimize writing zeroes to the devices.
+ write_zeroes_max_bytes indicates how many bytes can be written
+ in a single write zeroes command. If write_zeroes_max_bytes is
+ 0, write zeroes is not supported by the device.
+
+
+What: /sys/block/<disk>/queue/zone_append_max_bytes
+Date: May 2020
+Contact: [email protected]
+Description:
+ [RO] This is the maximum number of bytes that can be written to
+ a sequential zone of a zoned block device using a zone append
+ write operation (REQ_OP_ZONE_APPEND). This value is always 0 for
+ regular block devices.
+
+
+What: /sys/block/<disk>/queue/zone_write_granularity
+Date: January 2021
+Contact: [email protected]
+Description:
+ [RO] This indicates the alignment constraint, in bytes, for
+ write operations in sequential zones of zoned block devices
+ (devices with a zoned attributed that reports "host-managed" or
+ "host-aware"). This value is always 0 for regular block devices.
What: /sys/block/<disk>/queue/zoned
Date: September 2016
Contact: Damien Le Moal <[email protected]>
Description:
- zoned indicates if the device is a zoned block device
- and the zone model of the device if it is indeed zoned.
- The possible values indicated by zoned are "none" for
- regular block devices and "host-aware" or "host-managed"
- for zoned block devices. The characteristics of
- host-aware and host-managed zoned block devices are
- described in the ZBC (Zoned Block Commands) and ZAC
- (Zoned Device ATA Command Set) standards. These standards
- also define the "drive-managed" zone model. However,
- since drive-managed zoned block devices do not support
- zone commands, they will be treated as regular block
- devices and zoned will report "none".
+ [RO] zoned indicates if the device is a zoned block device and
+ the zone model of the device if it is indeed zoned. The
+ possible values indicated by zoned are "none" for regular block
+ devices and "host-aware" or "host-managed" for zoned block
+ devices. The characteristics of host-aware and host-managed
+ zoned block devices are described in the ZBC (Zoned Block
+ Commands) and ZAC (Zoned Device ATA Command Set) standards.
+ These standards also define the "drive-managed" zone model.
+ However, since drive-managed zoned block devices do not support
+ zone commands, they will be treated as regular block devices and
+ zoned will report "none".
What: /sys/block/<disk>/stat
--
2.34.1
From: Eric Biggers <[email protected]>
This has been replaced by Documentation/ABI/testing/sysfs-block, which
is the correct place for sysfs documentation.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/block/index.rst | 1 -
Documentation/block/queue-sysfs.rst | 321 ----------------------------
2 files changed, 322 deletions(-)
delete mode 100644 Documentation/block/queue-sysfs.rst
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
index 86dcf7159f990..3a41495dd77b5 100644
--- a/Documentation/block/index.rst
+++ b/Documentation/block/index.rst
@@ -20,7 +20,6 @@ Block
kyber-iosched
null_blk
pr
- queue-sysfs
request
stat
switching-sched
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
deleted file mode 100644
index 3f569d5324857..0000000000000
--- a/Documentation/block/queue-sysfs.rst
+++ /dev/null
@@ -1,321 +0,0 @@
-=================
-Queue sysfs files
-=================
-
-This text file will detail the queue files that are located in the sysfs tree
-for each block device. Note that stacked devices typically do not export
-any settings, since their queue merely functions as a remapping target.
-These files are the ones found in the /sys/block/xxx/queue/ directory.
-
-Files denoted with a RO postfix are readonly and the RW postfix means
-read-write.
-
-add_random (RW)
----------------
-This file allows to turn off the disk entropy contribution. Default
-value of this file is '1'(on).
-
-chunk_sectors (RO)
-------------------
-This has different meaning depending on the type of the block device.
-For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
-of the RAID volume stripe segment. For a zoned block device, either host-aware
-or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
-of the device, with the eventual exception of the last zone of the device which
-may be smaller.
-
-dax (RO)
---------
-This file indicates whether the device supports Direct Access (DAX),
-used by CPU-addressable storage to bypass the pagecache. It shows '1'
-if true, '0' if not.
-
-discard_granularity (RO)
-------------------------
-This shows the size of internal allocation of the device in bytes, if
-reported by the device. A value of '0' means device does not support
-the discard functionality.
-
-discard_max_hw_bytes (RO)
--------------------------
-Devices that support discard functionality may have internal limits on
-the number of bytes that can be trimmed or unmapped in a single operation.
-The `discard_max_hw_bytes` parameter is set by the device driver to the
-maximum number of bytes that can be discarded in a single operation.
-Discard requests issued to the device must not exceed this limit.
-A `discard_max_hw_bytes` value of 0 means that the device does not support
-discard functionality.
-
-discard_max_bytes (RW)
-----------------------
-While discard_max_hw_bytes is the hardware limit for the device, this
-setting is the software limit. Some devices exhibit large latencies when
-large discards are issued, setting this value lower will make Linux issue
-smaller discards and potentially help reduce latencies induced by large
-discard operations.
-
-discard_zeroes_data (RO)
-------------------------
-Obsolete. Always zero.
-
-fua (RO)
---------
-Whether or not the block driver supports the FUA flag for write requests.
-FUA stands for Force Unit Access. If the FUA flag is set that means that
-write requests must bypass the volatile cache of the storage device.
-
-hw_sector_size (RO)
--------------------
-This is the hardware sector size of the device, in bytes.
-
-io_poll (RW)
-------------
-When read, this file shows whether polling is enabled (1) or disabled
-(0). Writing '0' to this file will disable polling for this device.
-Writing any non-zero value will enable this feature.
-
-io_poll_delay (RW)
-------------------
-If polling is enabled, this controls what kind of polling will be
-performed. It defaults to -1, which is classic polling. In this mode,
-the CPU will repeatedly ask for completions without giving up any time.
-If set to 0, a hybrid polling mode is used, where the kernel will attempt
-to make an educated guess at when the IO will complete. Based on this
-guess, the kernel will put the process issuing IO to sleep for an amount
-of time, before entering a classic poll loop. This mode might be a
-little slower than pure classic polling, but it will be more efficient.
-If set to a value larger than 0, the kernel will put the process issuing
-IO to sleep for this amount of microseconds before entering classic
-polling.
-
-io_timeout (RW)
----------------
-io_timeout is the request timeout in milliseconds. If a request does not
-complete in this time then the block driver timeout handler is invoked.
-That timeout handler can decide to retry the request, to fail it or to start
-a device recovery strategy.
-
-iostats (RW)
--------------
-This file is used to control (on/off) the iostats accounting of the
-disk.
-
-logical_block_size (RO)
------------------------
-This is the logical block size of the device, in bytes.
-
-max_discard_segments (RO)
--------------------------
-The maximum number of DMA scatter/gather entries in a discard request.
-
-max_hw_sectors_kb (RO)
-----------------------
-This is the maximum number of kilobytes supported in a single data transfer.
-
-max_integrity_segments (RO)
----------------------------
-Maximum number of elements in a DMA scatter/gather list with integrity
-data that will be submitted by the block layer core to the associated
-block driver.
-
-max_active_zones (RO)
----------------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), the sum of zones belonging to any of the zone states:
-EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
-If this value is 0, there is no limit.
-
-If the host attempts to exceed this limit, the driver should report this error
-with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
-errno.
-
-max_open_zones (RO)
--------------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), the sum of zones belonging to any of the zone states:
-EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
-If this value is 0, there is no limit.
-
-If the host attempts to exceed this limit, the driver should report this error
-with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
-errno.
-
-max_sectors_kb (RW)
--------------------
-This is the maximum number of kilobytes that the block layer will allow
-for a filesystem request. Must be smaller than or equal to the maximum
-size allowed by the hardware.
-
-max_segments (RO)
------------------
-Maximum number of elements in a DMA scatter/gather list that is submitted
-to the associated block driver.
-
-max_segment_size (RO)
----------------------
-Maximum size in bytes of a single element in a DMA scatter/gather list.
-
-minimum_io_size (RO)
---------------------
-This is the smallest preferred IO size reported by the device.
-
-nomerges (RW)
--------------
-This enables the user to disable the lookup logic involved with IO
-merging requests in the block layer. By default (0) all merges are
-enabled. When set to 1 only simple one-hit merges will be tried. When
-set to 2 no merge algorithms will be tried (including one-hit or more
-complex tree/hash lookups).
-
-nr_requests (RW)
-----------------
-This controls how many requests may be allocated in the block layer for
-read or write requests. Note that the total allocated number may be twice
-this amount, since it applies only to reads or writes (not the accumulated
-sum).
-
-To avoid priority inversion through request starvation, a request
-queue maintains a separate request pool per each cgroup when
-CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
-per-block-cgroup request pool. IOW, if there are N block cgroups,
-each request queue may have up to N request pools, each independently
-regulated by nr_requests.
-
-nr_zones (RO)
--------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), this indicates the total number of zones of the device.
-This is always 0 for regular block devices.
-
-optimal_io_size (RO)
---------------------
-This is the optimal IO size reported by the device.
-
-physical_block_size (RO)
-------------------------
-This is the physical block size of device, in bytes.
-
-read_ahead_kb (RW)
-------------------
-Maximum number of kilobytes to read-ahead for filesystems on this block
-device.
-
-rotational (RW)
----------------
-This file is used to stat if the device is of rotational type or
-non-rotational type.
-
-rq_affinity (RW)
-----------------
-If this option is '1', the block layer will migrate request completions to the
-cpu "group" that originally submitted the request. For some workloads this
-provides a significant reduction in CPU cycles due to caching effects.
-
-For storage configurations that need to maximize distribution of completion
-processing setting this option to '2' forces the completion to run on the
-requesting cpu (bypassing the "group" aggregation logic).
-
-scheduler (RW)
---------------
-When read, this file will display the current and available IO schedulers
-for this block device. The currently active IO scheduler will be enclosed
-in [] brackets. Writing an IO scheduler name to this file will switch
-control of this block device to that new IO scheduler. Note that writing
-an IO scheduler name to this file will attempt to load that IO scheduler
-module, if it isn't already present in the system.
-
-write_cache (RW)
-----------------
-When read, this file will display whether the device has write back
-caching enabled or not. It will return "write back" for the former
-case, and "write through" for the latter. Writing to this file can
-change the kernels view of the device, but it doesn't alter the
-device state. This means that it might not be safe to toggle the
-setting from "write back" to "write through", since that will also
-eliminate cache flushes issued by the kernel.
-
-write_same_max_bytes (RO)
--------------------------
-This is the number of bytes the device can write in a single write-same
-command. A value of '0' means write-same is not supported by this
-device.
-
-wbt_lat_usec (RW)
------------------
-If the device is registered for writeback throttling, then this file shows
-the target minimum read latency. If this latency is exceeded in a given
-window of time (see wb_window_usec), then the writeback throttling will start
-scaling back writes. Writing a value of '0' to this file disables the
-feature. Writing a value of '-1' to this file resets the value to the
-default setting.
-
-throttle_sample_time (RW)
--------------------------
-This is the time window that blk-throttle samples data, in millisecond.
-blk-throttle makes decision based on the samplings. Lower time means cgroups
-have more smooth throughput, but higher CPU overhead. This exists only when
-CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
-
-write_zeroes_max_bytes (RO)
----------------------------
-For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
-bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
-is not supported.
-
-zone_append_max_bytes (RO)
---------------------------
-This is the maximum number of bytes that can be written to a sequential
-zone of a zoned block device using a zone append write operation
-(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices.
-
-zoned (RO)
-----------
-This indicates if the device is a zoned block device and the zone model of the
-device if it is indeed zoned. The possible values indicated by zoned are
-"none" for regular block devices and "host-aware" or "host-managed" for zoned
-block devices. The characteristics of host-aware and host-managed zoned block
-devices are described in the ZBC (Zoned Block Commands) and ZAC
-(Zoned Device ATA Command Set) standards. These standards also define the
-"drive-managed" zone model. However, since drive-managed zoned block devices
-do not support zone commands, they will be treated as regular block devices
-and zoned will report "none".
-
-zone_write_granularity (RO)
----------------------------
-This indicates the alignment constraint, in bytes, for write operations in
-sequential zones of zoned block devices (devices with a zoned attributed
-that reports "host-managed" or "host-aware"). This value is always 0 for
-regular block devices.
-
-independent_access_ranges (RO)
-------------------------------
-
-The presence of this sub-directory of the /sys/block/xxx/queue/ directory
-indicates that the device is capable of executing requests targeting
-different sector ranges in parallel. For instance, single LUN multi-actuator
-hard-disks will have an independent_access_ranges directory if the device
-correctly advertizes the sector ranges of its actuators.
-
-The independent_access_ranges directory contains one directory per access
-range, with each range described using the sector (RO) attribute file to
-indicate the first sector of the range and the nr_sectors (RO) attribute file
-to indicate the total number of sectors in the range starting from the first
-sector of the range. For example, a dual-actuator hard-disk will have the
-following independent_access_ranges entries.::
-
- $ tree /sys/block/<device>/queue/independent_access_ranges/
- /sys/block/<device>/queue/independent_access_ranges/
- |-- 0
- | |-- nr_sectors
- | `-- sector
- `-- 1
- |-- nr_sectors
- `-- sector
-
-The sector and nr_sectors attributes use 512B sector unit, regardless of
-the actual block size of the device. Independent access ranges do not
-overlap and include all sectors within the device capacity. The access
-ranges are numbered in increasing order of the range start sector,
-that is, the sector attribute of range 0 always has the value 0.
-
-Jens Axboe <[email protected]>, February 2009
--
2.34.1
From: Eric Biggers <[email protected]>
Sort the documentation for the files alphabetically by file path so that
there is a logical order and it's clear where to add new files.
With two small exceptions, this patch doesn't change the documentation
itself and just reorders it:
- In /sys/block/<disk>/<part>/stat, I replaced <part> with <partition>
to be consistent with the other files.
- The description for /sys/block/<disk>/<part>/stat referred to another
file "above", which I reworded.
Signed-off-by: Eric Biggers <[email protected]>
---
Documentation/ABI/testing/sysfs-block | 385 ++++++++++++++------------
1 file changed, 203 insertions(+), 182 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index b16b0c45a272e..9febd53a5ebe8 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -1,31 +1,37 @@
-What: /sys/block/<disk>/stat
-Date: February 2008
-Contact: Jerome Marchand <[email protected]>
+What: /sys/block/<disk>/alignment_offset
+Date: April 2009
+Contact: Martin K. Petersen <[email protected]>
Description:
- The /sys/block/<disk>/stat files displays the I/O
- statistics of disk <disk>. They contain 11 fields:
+ Storage devices may report a physical block size that is
+ bigger than the logical block size (for instance a drive
+ with 4KB physical sectors exposing 512-byte logical
+ blocks to the operating system). This parameter
+ indicates how many bytes the beginning of the device is
+ offset from the disk's natural alignment.
- == ==============================================
- 1 reads completed successfully
- 2 reads merged
- 3 sectors read
- 4 time spent reading (ms)
- 5 writes completed
- 6 writes merged
- 7 sectors written
- 8 time spent writing (ms)
- 9 I/Os currently in progress
- 10 time spent doing I/Os (ms)
- 11 weighted time spent doing I/Os (ms)
- 12 discards completed
- 13 discards merged
- 14 sectors discarded
- 15 time spent discarding (ms)
- 16 flush requests completed
- 17 time spent flushing (ms)
- == ==============================================
- For more details refer Documentation/admin-guide/iostats.rst
+What: /sys/block/<disk>/discard_alignment
+Date: May 2011
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ Devices that support discard functionality may
+ internally allocate space in units that are bigger than
+ the exported logical block size. The discard_alignment
+ parameter indicates how many bytes the beginning of the
+ device is offset from the internal allocation unit's
+ natural alignment.
+
+
+What: /sys/block/<disk>/diskseq
+Date: February 2021
+Contact: Matteo Croce <[email protected]>
+Description:
+ The /sys/block/<disk>/diskseq files reports the disk
+ sequence number, which is a monotonically increasing
+ number assigned to every drive.
+ Some devices, like the loop device, refresh such number
+ every time the backing file is changed.
+ The value type is 64 bit unsigned.
What: /sys/block/<disk>/inflight
@@ -44,26 +50,12 @@ Description:
and for SCSI device also its queue_depth.
-What: /sys/block/<disk>/diskseq
-Date: February 2021
-Contact: Matteo Croce <[email protected]>
-Description:
- The /sys/block/<disk>/diskseq files reports the disk
- sequence number, which is a monotonically increasing
- number assigned to every drive.
- Some devices, like the loop device, refresh such number
- every time the backing file is changed.
- The value type is 64 bit unsigned.
-
-
-What: /sys/block/<disk>/<part>/stat
-Date: February 2008
-Contact: Jerome Marchand <[email protected]>
+What: /sys/block/<disk>/integrity/device_is_integrity_capable
+Date: July 2014
+Contact: Martin K. Petersen <[email protected]>
Description:
- The /sys/block/<disk>/<part>/stat files display the
- I/O statistics of partition <part>. The format is the
- same as the above-written /sys/block/<disk>/stat
- format.
+ Indicates whether a storage device is capable of storing
+ integrity metadata. Set if the device is T10 PI-capable.
What: /sys/block/<disk>/integrity/format
@@ -74,6 +66,15 @@ Description:
E.g. T10-DIF-TYPE1-CRC.
+What: /sys/block/<disk>/integrity/protection_interval_bytes
+Date: July 2015
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ Describes the number of data bytes which are protected
+ by one integrity tuple. Typically the device's logical
+ block size.
+
+
What: /sys/block/<disk>/integrity/read_verify
Date: June 2008
Contact: Martin K. Petersen <[email protected]>
@@ -91,21 +92,6 @@ Description:
512 bytes of data.
-What: /sys/block/<disk>/integrity/device_is_integrity_capable
-Date: July 2014
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Indicates whether a storage device is capable of storing
- integrity metadata. Set if the device is T10 PI-capable.
-
-What: /sys/block/<disk>/integrity/protection_interval_bytes
-Date: July 2015
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Describes the number of data bytes which are protected
- by one integrity tuple. Typically the device's logical
- block size.
-
What: /sys/block/<disk>/integrity/write_generate
Date: June 2008
Contact: Martin K. Petersen <[email protected]>
@@ -114,16 +100,6 @@ Description:
generate checksums for write requests bound for
devices that support receiving integrity metadata.
-What: /sys/block/<disk>/alignment_offset
-Date: April 2009
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Storage devices may report a physical block size that is
- bigger than the logical block size (for instance a drive
- with 4KB physical sectors exposing 512-byte logical
- blocks to the operating system). This parameter
- indicates how many bytes the beginning of the device is
- offset from the disk's natural alignment.
What: /sys/block/<disk>/<partition>/alignment_offset
Date: April 2009
@@ -136,76 +112,6 @@ Description:
indicates how many bytes the beginning of the partition
is offset from the disk's natural alignment.
-What: /sys/block/<disk>/queue/logical_block_size
-Date: May 2009
-Contact: Martin K. Petersen <[email protected]>
-Description:
- This is the smallest unit the storage device can
- address. It is typically 512 bytes.
-
-What: /sys/block/<disk>/queue/physical_block_size
-Date: May 2009
-Contact: Martin K. Petersen <[email protected]>
-Description:
- This is the smallest unit a physical storage device can
- write atomically. It is usually the same as the logical
- block size but may be bigger. One example is SATA
- drives with 4KB sectors that expose a 512-byte logical
- block size to the operating system. For stacked block
- devices the physical_block_size variable contains the
- maximum physical_block_size of the component devices.
-
-What: /sys/block/<disk>/queue/minimum_io_size
-Date: April 2009
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Storage devices may report a granularity or preferred
- minimum I/O size which is the smallest request the
- device can perform without incurring a performance
- penalty. For disk drives this is often the physical
- block size. For RAID arrays it is often the stripe
- chunk size. A properly aligned multiple of
- minimum_io_size is the preferred request size for
- workloads where a high number of I/O operations is
- desired.
-
-What: /sys/block/<disk>/queue/optimal_io_size
-Date: April 2009
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Storage devices may report an optimal I/O size, which is
- the device's preferred unit for sustained I/O. This is
- rarely reported for disk drives. For RAID arrays it is
- usually the stripe width or the internal track size. A
- properly aligned multiple of optimal_io_size is the
- preferred request size for workloads where sustained
- throughput is desired. If no optimal I/O size is
- reported this file contains 0.
-
-What: /sys/block/<disk>/queue/nomerges
-Date: January 2010
-Contact:
-Description:
- Standard I/O elevator operations include attempts to
- merge contiguous I/Os. For known random I/O loads these
- attempts will always fail and result in extra cycles
- being spent in the kernel. This allows one to turn off
- this behavior on one of two ways: When set to 1, complex
- merge checks are disabled, but the simple one-shot merges
- with the previous I/O request are enabled. When set to 2,
- all merge tries are disabled. The default value is 0 -
- which enables all types of merge tries.
-
-What: /sys/block/<disk>/discard_alignment
-Date: May 2011
-Contact: Martin K. Petersen <[email protected]>
-Description:
- Devices that support discard functionality may
- internally allocate space in units that are bigger than
- the exported logical block size. The discard_alignment
- parameter indicates how many bytes the beginning of the
- device is offset from the internal allocation unit's
- natural alignment.
What: /sys/block/<disk>/<partition>/discard_alignment
Date: May 2011
@@ -218,6 +124,30 @@ Description:
partition is offset from the internal allocation unit's
natural alignment.
+
+What: /sys/block/<disk>/<partition>/stat
+Date: February 2008
+Contact: Jerome Marchand <[email protected]>
+Description:
+ The /sys/block/<disk>/<partition>/stat files display the
+ I/O statistics of partition <partition>. The format is the
+ same as the format of /sys/block/<disk>/stat.
+
+
+What: /sys/block/<disk>/queue/chunk_sectors
+Date: September 2016
+Contact: Hannes Reinecke <[email protected]>
+Description:
+ chunk_sectors has different meaning depending on the type
+ of the disk. For a RAID device (dm-raid), chunk_sectors
+ indicates the size in 512B sectors of the RAID volume
+ stripe segment. For a zoned block device, either
+ host-aware or host-managed, chunk_sectors indicates the
+ size in 512B sectors of the zones of the device, with
+ the eventual exception of the last zone of the device
+ which may be smaller.
+
+
What: /sys/block/<disk>/queue/discard_granularity
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
@@ -231,6 +161,7 @@ Description:
physical block size. A discard_granularity of 0 means
that the device does not support discard functionality.
+
What: /sys/block/<disk>/queue/discard_max_bytes
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
@@ -247,6 +178,7 @@ Description:
value of 0 means that the device does not support
discard functionality.
+
What: /sys/block/<disk>/queue/discard_zeroes_data
Date: May 2011
Contact: Martin K. Petersen <[email protected]>
@@ -254,6 +186,111 @@ Description:
Will always return 0. Don't rely on any specific behavior
for discards, and don't read this file.
+
+What: /sys/block/<disk>/queue/io_timeout
+Date: November 2018
+Contact: Weiping Zhang <[email protected]>
+Description:
+ io_timeout is the request timeout in milliseconds. If a request
+ does not complete in this time then the block driver timeout
+ handler is invoked. That timeout handler can decide to retry
+ the request, to fail it or to start a device recovery strategy.
+
+
+What: /sys/block/<disk>/queue/logical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ This is the smallest unit the storage device can
+ address. It is typically 512 bytes.
+
+
+What: /sys/block/<disk>/queue/max_active_zones
+Date: July 2020
+Contact: Niklas Cassel <[email protected]>
+Description:
+ For zoned block devices (zoned attribute indicating
+ "host-managed" or "host-aware"), the sum of zones belonging to
+ any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
+ is limited by this value. If this value is 0, there is no limit.
+
+
+What: /sys/block/<disk>/queue/max_open_zones
+Date: July 2020
+Contact: Niklas Cassel <[email protected]>
+Description:
+ For zoned block devices (zoned attribute indicating
+ "host-managed" or "host-aware"), the sum of zones belonging to
+ any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
+ is limited by this value. If this value is 0, there is no limit.
+
+
+What: /sys/block/<disk>/queue/minimum_io_size
+Date: April 2009
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ Storage devices may report a granularity or preferred
+ minimum I/O size which is the smallest request the
+ device can perform without incurring a performance
+ penalty. For disk drives this is often the physical
+ block size. For RAID arrays it is often the stripe
+ chunk size. A properly aligned multiple of
+ minimum_io_size is the preferred request size for
+ workloads where a high number of I/O operations is
+ desired.
+
+
+What: /sys/block/<disk>/queue/nomerges
+Date: January 2010
+Contact:
+Description:
+ Standard I/O elevator operations include attempts to
+ merge contiguous I/Os. For known random I/O loads these
+ attempts will always fail and result in extra cycles
+ being spent in the kernel. This allows one to turn off
+ this behavior on one of two ways: When set to 1, complex
+ merge checks are disabled, but the simple one-shot merges
+ with the previous I/O request are enabled. When set to 2,
+ all merge tries are disabled. The default value is 0 -
+ which enables all types of merge tries.
+
+
+What: /sys/block/<disk>/queue/nr_zones
+Date: November 2018
+Contact: Damien Le Moal <[email protected]>
+Description:
+ nr_zones indicates the total number of zones of a zoned block
+ device ("host-aware" or "host-managed" zone model). For regular
+ block devices, the value is always 0.
+
+
+What: /sys/block/<disk>/queue/optimal_io_size
+Date: April 2009
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ Storage devices may report an optimal I/O size, which is
+ the device's preferred unit for sustained I/O. This is
+ rarely reported for disk drives. For RAID arrays it is
+ usually the stripe width or the internal track size. A
+ properly aligned multiple of optimal_io_size is the
+ preferred request size for workloads where sustained
+ throughput is desired. If no optimal I/O size is
+ reported this file contains 0.
+
+
+What: /sys/block/<disk>/queue/physical_block_size
+Date: May 2009
+Contact: Martin K. Petersen <[email protected]>
+Description:
+ This is the smallest unit a physical storage device can
+ write atomically. It is usually the same as the logical
+ block size but may be bigger. One example is SATA
+ drives with 4KB sectors that expose a 512-byte logical
+ block size to the operating system. For stacked block
+ devices the physical_block_size variable contains the
+ maximum physical_block_size of the component devices.
+
+
What: /sys/block/<disk>/queue/write_same_max_bytes
Date: January 2012
Contact: Martin K. Petersen <[email protected]>
@@ -267,6 +304,7 @@ Description:
write_same_max_bytes is 0, write same is not supported
by the device.
+
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
Date: November 2016
Contact: Chaitanya Kulkarni <[email protected]>
@@ -280,6 +318,7 @@ Description:
write_zeroes_max_bytes is 0, write zeroes is not supported
by the device.
+
What: /sys/block/<disk>/queue/zoned
Date: September 2016
Contact: Damien Le Moal <[email protected]>
@@ -297,50 +336,32 @@ Description:
zone commands, they will be treated as regular block
devices and zoned will report "none".
-What: /sys/block/<disk>/queue/nr_zones
-Date: November 2018
-Contact: Damien Le Moal <[email protected]>
-Description:
- nr_zones indicates the total number of zones of a zoned block
- device ("host-aware" or "host-managed" zone model). For regular
- block devices, the value is always 0.
-What: /sys/block/<disk>/queue/max_active_zones
-Date: July 2020
-Contact: Niklas Cassel <[email protected]>
-Description:
- For zoned block devices (zoned attribute indicating
- "host-managed" or "host-aware"), the sum of zones belonging to
- any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
- is limited by this value. If this value is 0, there is no limit.
-
-What: /sys/block/<disk>/queue/max_open_zones
-Date: July 2020
-Contact: Niklas Cassel <[email protected]>
+What: /sys/block/<disk>/stat
+Date: February 2008
+Contact: Jerome Marchand <[email protected]>
Description:
- For zoned block devices (zoned attribute indicating
- "host-managed" or "host-aware"), the sum of zones belonging to
- any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
- is limited by this value. If this value is 0, there is no limit.
+ The /sys/block/<disk>/stat files displays the I/O
+ statistics of disk <disk>. They contain 11 fields:
-What: /sys/block/<disk>/queue/chunk_sectors
-Date: September 2016
-Contact: Hannes Reinecke <[email protected]>
-Description:
- chunk_sectors has different meaning depending on the type
- of the disk. For a RAID device (dm-raid), chunk_sectors
- indicates the size in 512B sectors of the RAID volume
- stripe segment. For a zoned block device, either
- host-aware or host-managed, chunk_sectors indicates the
- size in 512B sectors of the zones of the device, with
- the eventual exception of the last zone of the device
- which may be smaller.
+ == ==============================================
+ 1 reads completed successfully
+ 2 reads merged
+ 3 sectors read
+ 4 time spent reading (ms)
+ 5 writes completed
+ 6 writes merged
+ 7 sectors written
+ 8 time spent writing (ms)
+ 9 I/Os currently in progress
+ 10 time spent doing I/Os (ms)
+ 11 weighted time spent doing I/Os (ms)
+ 12 discards completed
+ 13 discards merged
+ 14 sectors discarded
+ 15 time spent discarding (ms)
+ 16 flush requests completed
+ 17 time spent flushing (ms)
+ == ==============================================
-What: /sys/block/<disk>/queue/io_timeout
-Date: November 2018
-Contact: Weiping Zhang <[email protected]>
-Description:
- io_timeout is the request timeout in milliseconds. If a request
- does not complete in this time then the block driver timeout
- handler is invoked. That timeout handler can decide to retry
- the request, to fail it or to start a device recovery strategy.
+ For more details refer Documentation/admin-guide/iostats.rst
base-commit: c2626d30f312afc341158e07bf088f5a23b4eeeb
--
2.34.1
From: Eric Biggers <[email protected]>
Include Documentation/block/ and Documentation/ABI/testing/sysfs-block
in the "BLOCK LAYER" maintainers file entry.
Signed-off-by: Eric Biggers <[email protected]>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 360e9aa0205d6..9f66238ccb991 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3380,6 +3380,8 @@ M: Jens Axboe <[email protected]>
L: [email protected]
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
+F: Documentation/ABI/testing/sysfs-block
+F: Documentation/block/
F: block/
F: drivers/block/
F: include/linux/blk*
--
2.34.1
On Wed, Dec 01, 2021 at 12:45:17AM -0800, Eric Biggers wrote:
> This series consolidates the documentation for /sys/block/<disk>/queue/
> into Documentation/ABI/, where it is supposed to go (as per Greg KH:
> https://lore.kernel.org/r/[email protected]).
>
> This series also updates MAINTAINERS to associate the block
> documentation with the block layer.
>
> This series applies to linux-block/for-next.
>
> Eric Biggers (7):
> docs: sysfs-block: sort alphabetically
> docs: sysfs-block: add contact for nomerges
> docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
> docs: sysfs-block: document stable_writes
> docs: sysfs-block: document virt_boundary_mask
> docs: block: remove queue-sysfs.rst
> MAINTAINERS: add entries for block layer documentation
Wonderful, thanks for doing this!
Reviewed-by: Greg Kroah-Hartman <[email protected]>
On 12/1/21 9:45 AM, Eric Biggers wrote:
> This series consolidates the documentation for /sys/block/<disk>/queue/
> into Documentation/ABI/, where it is supposed to go (as per Greg KH:
> https://lore.kernel.org/r/[email protected]).
>
> This series also updates MAINTAINERS to associate the block
> documentation with the block layer.
>
> This series applies to linux-block/for-next.
>
> Eric Biggers (7):
> docs: sysfs-block: sort alphabetically
> docs: sysfs-block: add contact for nomerges
> docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
> docs: sysfs-block: document stable_writes
> docs: sysfs-block: document virt_boundary_mask
> docs: block: remove queue-sysfs.rst
> MAINTAINERS: add entries for block layer documentation
>
> Documentation/ABI/testing/sysfs-block | 766 ++++++++++++++++++--------
> Documentation/block/index.rst | 1 -
> Documentation/block/queue-sysfs.rst | 321 -----------
> MAINTAINERS | 2 +
> 4 files changed, 545 insertions(+), 545 deletions(-)
> delete mode 100644 Documentation/block/queue-sysfs.rst
>
>
> base-commit: c2626d30f312afc341158e07bf088f5a23b4eeeb
>
Yay.
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
On 12/1/21 12:45 AM, Eric Biggers wrote:
> This series consolidates the documentation for /sys/block/<disk>/queue/
> into Documentation/ABI/, where it is supposed to go (as per Greg KH:
> https://lore.kernel.org/r/[email protected]).
>
> This series also updates MAINTAINERS to associate the block
> documentation with the block layer.
>
> This series applies to linux-block/for-next.
>
> Eric Biggers (7):
> docs: sysfs-block: sort alphabetically
> docs: sysfs-block: add contact for nomerges
> docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
> docs: sysfs-block: document stable_writes
> docs: sysfs-block: document virt_boundary_mask
> docs: block: remove queue-sysfs.rst
> MAINTAINERS: add entries for block layer documentation
>
> Documentation/ABI/testing/sysfs-block | 766 ++++++++++++++++++--------
> Documentation/block/index.rst | 1 -
> Documentation/block/queue-sysfs.rst | 321 -----------
> MAINTAINERS | 2 +
> 4 files changed, 545 insertions(+), 545 deletions(-)
> delete mode 100644 Documentation/block/queue-sysfs.rst
How about adding a patch that moves Documentation/ABI/testing/sysfs-block
to Documentation/ABI/stable/sysfs-block? The block layer sysfs ABI is used
widely by user space software and is considered stable.
Thanks,
Bart.
On Thu, Dec 02, 2021 at 11:32:45AM -0800, Bart Van Assche wrote:
> On 12/1/21 12:45 AM, Eric Biggers wrote:
> > This series consolidates the documentation for /sys/block/<disk>/queue/
> > into Documentation/ABI/, where it is supposed to go (as per Greg KH:
> > https://lore.kernel.org/r/[email protected]).
> >
> > This series also updates MAINTAINERS to associate the block
> > documentation with the block layer.
> >
> > This series applies to linux-block/for-next.
> >
> > Eric Biggers (7):
> > docs: sysfs-block: sort alphabetically
> > docs: sysfs-block: add contact for nomerges
> > docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
> > docs: sysfs-block: document stable_writes
> > docs: sysfs-block: document virt_boundary_mask
> > docs: block: remove queue-sysfs.rst
> > MAINTAINERS: add entries for block layer documentation
> >
> > Documentation/ABI/testing/sysfs-block | 766 ++++++++++++++++++--------
> > Documentation/block/index.rst | 1 -
> > Documentation/block/queue-sysfs.rst | 321 -----------
> > MAINTAINERS | 2 +
> > 4 files changed, 545 insertions(+), 545 deletions(-)
> > delete mode 100644 Documentation/block/queue-sysfs.rst
>
> How about adding a patch that moves Documentation/ABI/testing/sysfs-block
> to Documentation/ABI/stable/sysfs-block? The block layer sysfs ABI is used
> widely by user space software and is considered stable.
>
That would make sense. I decided not to include it in this patch series since
some of the sysfs-block files were added recently, so may not be as "stable" as
ones that have been around for 18 years, and because about 90% of the sysfs
documentation is in the "testing" directory anyway so it is not unusual. So I
felt it should be a separate change.
I think these patches should go in first, and then I can send a separate patch
that moves the file to the stable directory, if there is no objection to it.
- Eric
On Thu, Dec 02, 2021 at 01:05:39PM -0800, Eric Biggers wrote:
> On Thu, Dec 02, 2021 at 11:32:45AM -0800, Bart Van Assche wrote:
> > On 12/1/21 12:45 AM, Eric Biggers wrote:
> > > This series consolidates the documentation for /sys/block/<disk>/queue/
> > > into Documentation/ABI/, where it is supposed to go (as per Greg KH:
> > > https://lore.kernel.org/r/[email protected]).
> > >
> > > This series also updates MAINTAINERS to associate the block
> > > documentation with the block layer.
> > >
> > > This series applies to linux-block/for-next.
> > >
> > > Eric Biggers (7):
> > > docs: sysfs-block: sort alphabetically
> > > docs: sysfs-block: add contact for nomerges
> > > docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
> > > docs: sysfs-block: document stable_writes
> > > docs: sysfs-block: document virt_boundary_mask
> > > docs: block: remove queue-sysfs.rst
> > > MAINTAINERS: add entries for block layer documentation
> > >
> > > Documentation/ABI/testing/sysfs-block | 766 ++++++++++++++++++--------
> > > Documentation/block/index.rst | 1 -
> > > Documentation/block/queue-sysfs.rst | 321 -----------
> > > MAINTAINERS | 2 +
> > > 4 files changed, 545 insertions(+), 545 deletions(-)
> > > delete mode 100644 Documentation/block/queue-sysfs.rst
> >
> > How about adding a patch that moves Documentation/ABI/testing/sysfs-block
> > to Documentation/ABI/stable/sysfs-block? The block layer sysfs ABI is used
> > widely by user space software and is considered stable.
> >
>
> That would make sense. I decided not to include it in this patch series since
> some of the sysfs-block files were added recently, so may not be as "stable" as
> ones that have been around for 18 years, and because about 90% of the sysfs
> documentation is in the "testing" directory anyway so it is not unusual. So I
> felt it should be a separate change.
>
> I think these patches should go in first, and then I can send a separate patch
> that moves the file to the stable directory, if there is no objection to it.
>
Since no one has objected and this series hasn't been applied yet, I guess I'll
just go ahead and send out a new series which includes the renaming to stable.
- Eric