2020-06-23 02:36:09

by Daejun Park

Subject: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support


Changelog:

v2 -> v3
1. Add validation of the module parameter input value.
2. Change base commit from 5.8/scsi-queue to 5.9/scsi-queue.
3. Clean up unused variables and label.

v1 -> v2
1. Change the full boilerplate text to SPDX style.
2. Adopt dynamic allocation for sub-region data structure.
3. Cleanup.

NAND flash memory-based storage devices use a Flash Translation Layer (FTL)
to translate the logical addresses of I/O requests to the corresponding flash
memory addresses. Mobile storage devices typically have RAM of constrained
size and thus lack the memory to keep the whole mapping table. Therefore,
mapping tables are partially retrieved from NAND flash on demand, causing
random-read performance degradation.

To improve random read performance, JESD220-3 (HPB v1.0) proposes HPB
(Host Performance Booster), which uses host system memory as a cache for the
FTL mapping table. By using HPB, FTL data can be read from host memory
faster than from NAND flash memory.

The current version supports only DCM (device control mode).
This series consists of 4 parts to support the HPB feature.

1) UFS-feature layer
2) HPB probe and initialization process
3) READ -> HPB READ using cached map information
4) L2P (logical to physical) map management

The UFS-feature layer is an additional layer that keeps the UFS-core driver
and the UFS features from becoming entangled with each other in a single
module.
With this layer, various combinations of UFS features can be supported.
Also, when a new feature is added, modification of the UFS-core driver can
be minimized.

During the HPB probe and init process, the UFS device information is
queried. After the supported features are checked, the HPB data structures
are initialized according to the device information.

The HPB module changes a read I/O that targets an active sub-region, where
the map is cached, into an HPB READ.

The HPB module manages the L2P map using information received from the
device. For an active sub-region, the HPB module caches the map through a
ufshpb_map request. For an inactive region, the HPB module discards the L2P
map. When a write I/O occurs in an active sub-region, the associated bit in
the dirty bitmap is set to prevent stale reads.
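
As a rough userspace illustration of the map management described above (this
is not code from the series), the sketch below keeps one cached sub-region
with a per-entry dirty bitmap. The structure name, the 64-entry size, and all
function names are assumptions made for the example; the real driver's data
structures live in ufshpb.c and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size, chosen only to keep the example small. */
#define ENTRIES_PER_SUBREGION 64u

struct hpb_subregion {
	bool active;                              /* map is cached in host memory */
	uint64_t ppn[ENTRIES_PER_SUBREGION];      /* cached L2P entries */
	uint8_t dirty[ENTRIES_PER_SUBREGION / 8]; /* one bit per entry */
};

/* Device recommended caching this sub-region: load the map, clear dirty bits. */
static void subregion_activate(struct hpb_subregion *sr, const uint64_t *map)
{
	memcpy(sr->ppn, map, sizeof(sr->ppn));
	memset(sr->dirty, 0, sizeof(sr->dirty));
	sr->active = true;
}

/* Device recommended dropping the region: the cached map is discarded. */
static void subregion_inactivate(struct hpb_subregion *sr)
{
	sr->active = false;
}

/* A write makes the cached entry stale, so its dirty bit is set. */
static void subregion_mark_written(struct hpb_subregion *sr, unsigned int idx)
{
	assert(idx < ENTRIES_PER_SUBREGION);
	sr->dirty[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/*
 * Returns true and stores the cached physical page number when the read
 * could be issued as an HPB READ; returns false when a normal READ is needed.
 */
static bool subregion_lookup(const struct hpb_subregion *sr, unsigned int idx,
			     uint64_t *ppn)
{
	assert(idx < ENTRIES_PER_SUBREGION);
	if (!sr->active)
		return false;
	if (sr->dirty[idx / 8] & (1u << (idx % 8)))
		return false;
	*ppn = sr->ppn[idx];
	return true;
}
```

The point of the dirty bitmap in this sketch is that a write never has to
update the cached entry; it only disables HPB READ for that entry until the
sub-region is activated again with a fresh map.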

HPB has been shown to improve performance by 58 - 67% for random read
workloads. [1]

This patch series is based on the 5.9/scsi-queue branch.

[1]:
https://www.usenix.org/conference/hotstorage17/program/presentation/jeong

Daejun Park (5):
scsi: ufs: Add UFS feature related parameter
scsi: ufs: Add UFS feature layer
scsi: ufs: Introduce HPB module
scsi: ufs: L2P map management for HPB read
scsi: ufs: Prepare HPB read for cached sub-region

drivers/scsi/ufs/Kconfig | 9 +
drivers/scsi/ufs/Makefile | 3 +-
drivers/scsi/ufs/ufs.h | 12 +
drivers/scsi/ufs/ufsfeature.c | 148 +++
drivers/scsi/ufs/ufsfeature.h | 69 ++
drivers/scsi/ufs/ufshcd.c | 23 +-
drivers/scsi/ufs/ufshcd.h | 3 +
drivers/scsi/ufs/ufshpb.c | 1996 ++++++++++++++++++++++++++++++++++++
drivers/scsi/ufs/ufshpb.h | 234 +++++
9 files changed, 2494 insertions(+), 3 deletions(-)
create mode 100644 drivers/scsi/ufs/ufsfeature.c
create mode 100644 drivers/scsi/ufs/ufsfeature.h
create mode 100644 drivers/scsi/ufs/ufshpb.c
create mode 100644 drivers/scsi/ufs/ufshpb.h


2020-06-23 03:58:41

by Daejun Park

Subject: [RFC PATCH v3 1/5] scsi: ufs: Add UFS feature related parameter


This patch adds the parameters used by the UFS feature layer and the HPB
module.
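
For readers unfamiliar with how such descriptor offsets are consumed, here is
a small userspace sketch (not part of the patch). The offset 0x40 and the
BIT(7) support flag come from this patch, and the 0x0310 version threshold is
the one used later in the series. Treating the HPB version as a 2-byte
big-endian field is an assumption based on the gap between offsets 0x40 and
0x42, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the series. */
#define DEVICE_DESC_PARAM_HPB_VER 0x40
#define UFS_DEV_HPB_SUPPORT       (1u << 7)
#define HPB_SUPPORTED_VERSION     0x0310

/* Read a 16-bit big-endian field from a descriptor buffer (hypothetical helper). */
static uint16_t desc_get_be16(const uint8_t *desc, size_t len, size_t offset)
{
	assert(offset + 2 <= len);
	return (uint16_t)((desc[offset] << 8) | desc[offset + 1]);
}

/* Both the spec version and the feature bit must allow HPB. */
static bool hpb_is_supported(uint16_t spec_version, uint8_t feature_sup)
{
	return spec_version >= HPB_SUPPORTED_VERSION &&
	       (feature_sup & UFS_DEV_HPB_SUPPORT) != 0;
}
```

Note that bit 7 still fits in the u8 field b_ufs_feature_sup, so the mask
test works without widening the field.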

Signed-off-by: Daejun Park <[email protected]>
---
drivers/scsi/ufs/ufs.h | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/ufs/ufs.h b/drivers/scsi/ufs/ufs.h
index f8ab16f30fdc..ae557b8d3eba 100644
--- a/drivers/scsi/ufs/ufs.h
+++ b/drivers/scsi/ufs/ufs.h
@@ -122,6 +122,7 @@ enum flag_idn {
QUERY_FLAG_IDN_WB_EN = 0x0E,
QUERY_FLAG_IDN_WB_BUFF_FLUSH_EN = 0x0F,
QUERY_FLAG_IDN_WB_BUFF_FLUSH_DURING_HIBERN8 = 0x10,
+ QUERY_FLAG_IDN_HPB_RESET = 0x11,
};

/* Attribute idn for Query requests */
@@ -195,6 +196,9 @@ enum unit_desc_param {
UNIT_DESC_PARAM_PHY_MEM_RSRC_CNT = 0x18,
UNIT_DESC_PARAM_CTX_CAPABILITIES = 0x20,
UNIT_DESC_PARAM_LARGE_UNIT_SIZE_M1 = 0x22,
+ UNIT_DESC_HPB_LU_MAX_ACTIVE_REGIONS = 0x23,
+ UNIT_DESC_HPB_LU_PIN_REGION_START_OFFSET = 0x25,
+ UNIT_DESC_HPB_LU_NUM_PIN_REGIONS = 0x27,
UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS = 0x29,
};

@@ -235,6 +239,8 @@ enum device_desc_param {
DEVICE_DESC_PARAM_PSA_MAX_DATA = 0x25,
DEVICE_DESC_PARAM_PSA_TMT = 0x29,
DEVICE_DESC_PARAM_PRDCT_REV = 0x2A,
+ DEVICE_DESC_PARAM_HPB_VER = 0x40,
+ DEVICE_DESC_PARAM_HPB_CONTROL = 0x42,
DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP = 0x4F,
DEVICE_DESC_PARAM_WB_PRESRV_USRSPC_EN = 0x53,
DEVICE_DESC_PARAM_WB_TYPE = 0x54,
@@ -283,6 +289,10 @@ enum geometry_desc_param {
GEOMETRY_DESC_PARAM_ENM4_MAX_NUM_UNITS = 0x3E,
GEOMETRY_DESC_PARAM_ENM4_CAP_ADJ_FCTR = 0x42,
GEOMETRY_DESC_PARAM_OPT_LOG_BLK_SIZE = 0x44,
+ GEOMETRY_DESC_HPB_REGION_SIZE = 0x48,
+ GEOMETRY_DESC_HPB_NUMBER_LU = 0x49,
+ GEOMETRY_DESC_HPB_SUBREGION_SIZE = 0x4A,
+ GEOMETRY_DESC_HPB_DEVICE_MAX_ACTIVE_REGIONS = 0x4B,
GEOMETRY_DESC_PARAM_WB_MAX_ALLOC_UNITS = 0x4F,
GEOMETRY_DESC_PARAM_WB_MAX_WB_LUNS = 0x53,
GEOMETRY_DESC_PARAM_WB_BUFF_CAP_ADJ = 0x54,
@@ -327,6 +337,7 @@ enum {

/* Possible values for dExtendedUFSFeaturesSupport */
enum {
+ UFS_DEV_HPB_SUPPORT = BIT(7),
UFS_DEV_WRITE_BOOSTER_SUP = BIT(8),
};

@@ -537,6 +548,7 @@ struct ufs_dev_info {
u8 *model;
u16 wspecversion;
u32 clk_gating_wait_us;
+ u8 b_ufs_feature_sup;
u32 d_ext_ufs_feature_sup;
u8 b_wb_buffer_type;
u32 d_wb_alloc_units;

base-commit: 3145550a7f8b08356c8ff29feaa6c56aca12901d
--
2.17.1

2020-06-23 04:21:06

by Daejun Park

Subject: [RFC PATCH v3 2/5] scsi: ufs: Add UFS-feature layer


This patch adds a UFS feature layer to the UFS core driver.

UFS Driver data structure (struct ufs_hba)

┌--------------┐
│ UFS feature │ <-- HPB module
│ layer │ <-- other extended feature module
└--------------┘
Each extended UFS-feature module has a bus of the ufs-ext feature type.
The UFS feature layer manages the common APIs used by each extended feature
module. The APIs are a set of UFS Query requests and UFS vendor commands
related to each extended feature module.

Other extended features can also be implemented using the proposed APIs.
For example, in Write Booster, "prep_fn" can be used to guarantee the
lifetime of the UFS device by updating the amount of write I/O, and
reset/reset_host/suspend/resume can be used to manage the kernel task
that checks the lifetime of the UFS device.

The following 6 callback functions have been added to "ufshcd.c".
prep_fn: called after constructing the UPIU structure
reset: called after probing the hba
reset_host: called before ufshcd_host_reset_and_restore
suspend: called before ufshcd_suspend
resume: called after ufshcd_resume
rsp_upiu: called in ufshcd_transfer_rsp_status with SAM_STAT_GOOD state
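
The dispatch pattern behind these callbacks can be shown with a minimal
userspace sketch. It is only an illustration: the demo_* names and the
two-callback table are assumptions for the example, while the patch itself
uses struct ufsf_operation and looks the driver up via dev_get_drvdata().
The idea is that the core calls a thin wrapper unconditionally and the
wrapper does nothing when no feature driver is bound or a callback is unset.

```c
#include <assert.h>
#include <stddef.h>

struct demo_host;

/* Optional per-feature callbacks; any member may be left NULL. */
struct demo_feature_ops {
	void (*suspend)(struct demo_host *host);
	void (*resume)(struct demo_host *host);
};

struct demo_host {
	const struct demo_feature_ops *ops; /* NULL until a feature driver binds */
	int suspended;                       /* state touched by the demo callbacks */
};

/* Core-side wrappers: safe to call whether or not a feature is bound. */
static void demo_ops_suspend(struct demo_host *host)
{
	assert(host != NULL);
	if (host->ops && host->ops->suspend)
		host->ops->suspend(host);
}

static void demo_ops_resume(struct demo_host *host)
{
	assert(host != NULL);
	if (host->ops && host->ops->resume)
		host->ops->resume(host);
}

/* Feature-side callbacks used only for this demonstration. */
static void demo_feature_suspend(struct demo_host *host)
{
	host->suspended = 1;
}

static void demo_feature_resume(struct demo_host *host)
{
	host->suspended = 0;
}

static const struct demo_feature_ops demo_full_ops = {
	.suspend = demo_feature_suspend,
	.resume = demo_feature_resume,
};

/* A feature that only cares about suspend leaves .resume unset. */
static const struct demo_feature_ops demo_partial_ops = {
	.suspend = demo_feature_suspend,
};
```

With this shape, the core driver needs only one added call per hook point,
and a feature that does not implement a hook costs a pointer check.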

Signed-off-by: Daejun Park <[email protected]>
---
drivers/scsi/ufs/Makefile | 2 +-
drivers/scsi/ufs/ufsfeature.c | 148 ++++++++++++++++++++++++++++++++++
drivers/scsi/ufs/ufsfeature.h | 69 ++++++++++++++++
drivers/scsi/ufs/ufshcd.c | 17 ++++
drivers/scsi/ufs/ufshcd.h | 3 +
5 files changed, 238 insertions(+), 1 deletion(-)
create mode 100644 drivers/scsi/ufs/ufsfeature.c
create mode 100644 drivers/scsi/ufs/ufsfeature.h

diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
index f0c5b95ec9cc..433b871badfa 100644
--- a/drivers/scsi/ufs/Makefile
+++ b/drivers/scsi/ufs/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_SCSI_UFS_CDNS_PLATFORM) += cdns-pltfrm.o
obj-$(CONFIG_SCSI_UFS_QCOM) += ufs-qcom.o
obj-$(CONFIG_SCSI_UFS_EXYNOS) += ufs-exynos.o
obj-$(CONFIG_SCSI_UFSHCD) += ufshcd-core.o
-ufshcd-core-y += ufshcd.o ufs-sysfs.o
+ufshcd-core-y += ufshcd.o ufs-sysfs.o ufsfeature.o
ufshcd-core-$(CONFIG_SCSI_UFS_BSG) += ufs_bsg.o
obj-$(CONFIG_SCSI_UFSHCD_PCI) += ufshcd-pci.o
obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o
diff --git a/drivers/scsi/ufs/ufsfeature.c b/drivers/scsi/ufs/ufsfeature.c
new file mode 100644
index 000000000000..94c6be6babd3
--- /dev/null
+++ b/drivers/scsi/ufs/ufsfeature.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Universal Flash Storage Feature Support
+ *
+ * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ * Yongmyung Lee <[email protected]>
+ * Jinyoung Choi <[email protected]>
+ */
+
+#include "ufshcd.h"
+#include "ufsfeature.h"
+
+inline void ufsf_slave_configure(struct ufs_hba *hba,
+ struct scsi_device *sdev)
+{
+ /* skip well-known LU */
+ if (sdev->lun >= UFS_UPIU_MAX_UNIT_NUM_ID)
+ return;
+
+ if (!(hba->dev_info.b_ufs_feature_sup & UFS_DEV_HPB_SUPPORT))
+ return;
+
+ atomic_inc(&hba->ufsf.slave_conf_cnt);
+
+ wake_up(&hba->ufsf.sdev_wait);
+}
+
+inline void ufsf_ops_prep_fn(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.prep_fn)
+ ufshpb_drv->ufshpb_ops.prep_fn(hba, lrbp);
+}
+
+inline void ufsf_ops_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.rsp_upiu)
+ ufshpb_drv->ufshpb_ops.rsp_upiu(hba, lrbp);
+}
+
+inline void ufsf_ops_reset_host(struct ufs_hba *hba)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.reset_host)
+ ufshpb_drv->ufshpb_ops.reset_host(hba);
+}
+
+inline void ufsf_ops_reset(struct ufs_hba *hba)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.reset)
+ ufshpb_drv->ufshpb_ops.reset(hba);
+}
+
+inline void ufsf_ops_suspend(struct ufs_hba *hba)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.suspend)
+ ufshpb_drv->ufshpb_ops.suspend(hba);
+}
+
+inline void ufsf_ops_resume(struct ufs_hba *hba)
+{
+ struct ufshpb_driver *ufshpb_drv;
+
+ ufshpb_drv = dev_get_drvdata(&hba->ufsf.hpb_dev);
+
+ if (ufshpb_drv && ufshpb_drv->ufshpb_ops.resume)
+ ufshpb_drv->ufshpb_ops.resume(hba);
+}
+
+struct device_type ufshpb_dev_type = {
+ .name = "ufshpb_device"
+};
+EXPORT_SYMBOL(ufshpb_dev_type);
+
+static int ufsf_bus_match(struct device *dev,
+ struct device_driver *gendrv)
+{
+ if (dev->type == &ufshpb_dev_type)
+ return 1;
+
+ return 0;
+}
+
+struct bus_type ufsf_bus_type = {
+ .name = "ufsf_bus",
+ .match = ufsf_bus_match,
+};
+EXPORT_SYMBOL(ufsf_bus_type);
+
+static void ufsf_dev_release(struct device *dev)
+{
+ put_device(dev->parent);
+}
+
+void ufsf_scan_features(struct ufs_hba *hba)
+{
+ int ret;
+
+ init_waitqueue_head(&hba->ufsf.sdev_wait);
+ atomic_set(&hba->ufsf.slave_conf_cnt, 0);
+
+ if (hba->dev_info.wspecversion >= HPB_SUPPORTED_VERSION &&
+ (hba->dev_info.b_ufs_feature_sup & UFS_DEV_HPB_SUPPORT)) {
+ device_initialize(&hba->ufsf.hpb_dev);
+
+ hba->ufsf.hpb_dev.bus = &ufsf_bus_type;
+ hba->ufsf.hpb_dev.type = &ufshpb_dev_type;
+ hba->ufsf.hpb_dev.parent = get_device(hba->dev);
+ hba->ufsf.hpb_dev.release = ufsf_dev_release;
+
+ dev_set_name(&hba->ufsf.hpb_dev, "ufshpb");
+ ret = device_add(&hba->ufsf.hpb_dev);
+ if (ret)
+ dev_warn(hba->dev, "ufshpb: failed to add device\n");
+ }
+}
+
+static int __init ufsf_init(void)
+{
+ int ret;
+
+ ret = bus_register(&ufsf_bus_type);
+ if (ret)
+ pr_err("%s bus_register failed\n", __func__);
+
+ return ret;
+}
+device_initcall(ufsf_init);
diff --git a/drivers/scsi/ufs/ufsfeature.h b/drivers/scsi/ufs/ufsfeature.h
new file mode 100644
index 000000000000..1822d9d8e745
--- /dev/null
+++ b/drivers/scsi/ufs/ufsfeature.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Universal Flash Storage Feature Support
+ *
+ * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ * Yongmyung Lee <[email protected]>
+ * Jinyoung Choi <[email protected]>
+ */
+
+#ifndef _UFSFEATURE_H_
+#define _UFSFEATURE_H_
+
+#define HPB_SUPPORTED_VERSION 0x0310
+
+struct ufs_hba;
+struct ufshcd_lrb;
+
+/**
+ * struct ufsf_operation - UFS feature specific callbacks
+ * @prep_fn: called after construct upiu structure. The prep_fn should work
+ * properly even if it processes the same SCSI command multiple
+ * times by requeuing.
+ * @reset: called after probing hba
+ * @reset_host: called before ufshcd_host_reset_and_restore
+ * @suspend: called before ufshcd_suspend
+ * @resume: called after ufshcd_resume
+ * @rsp_upiu: called in ufshcd_transfer_rsp_status with SAM_STAT_GOOD state
+ */
+struct ufsf_operation {
+ void (*prep_fn)(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
+ void (*reset)(struct ufs_hba *hba);
+ void (*reset_host)(struct ufs_hba *hba);
+ void (*suspend)(struct ufs_hba *hba);
+ void (*resume)(struct ufs_hba *hba);
+ void (*rsp_upiu)(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
+};
+
+struct ufshpb_driver {
+ struct device_driver drv;
+ struct list_head lh_hpb_lu;
+
+ struct ufsf_operation ufshpb_ops;
+
+ /* memory management */
+ struct kmem_cache *ufshpb_mctx_cache;
+ mempool_t *ufshpb_mctx_pool;
+ mempool_t *ufshpb_page_pool;
+
+ struct workqueue_struct *ufshpb_wq;
+};
+
+struct ufsf_feature_info {
+ atomic_t slave_conf_cnt;
+ wait_queue_head_t sdev_wait;
+ struct device hpb_dev;
+};
+
+void ufsf_slave_configure(struct ufs_hba *hba, struct scsi_device *sdev);
+void ufsf_scan_features(struct ufs_hba *hba);
+void ufsf_ops_prep_fn(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
+void ufsf_ops_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
+void ufsf_ops_reset_host(struct ufs_hba *hba);
+void ufsf_ops_reset(struct ufs_hba *hba);
+void ufsf_ops_suspend(struct ufs_hba *hba);
+void ufsf_ops_resume(struct ufs_hba *hba);
+
+#endif /* End of Header */
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 52abe82a1166..082dfca8e237 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2533,6 +2533,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)

ufshcd_comp_scsi_upiu(hba, lrbp);

+ ufsf_ops_prep_fn(hba, lrbp);
+
err = ufshcd_map_sg(hba, lrbp);
if (err) {
lrbp->cmd = NULL;
@@ -4665,6 +4667,8 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
struct ufs_hba *hba = shost_priv(sdev->host);
struct request_queue *q = sdev->request_queue;

+ ufsf_slave_configure(hba, sdev);
+
blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);

if (ufshcd_is_rpm_autosuspend_allowed(hba))
@@ -4791,6 +4795,9 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
*/
pm_runtime_get_noresume(hba->dev);
}
+
+ if (scsi_status == SAM_STAT_GOOD)
+ ufsf_ops_rsp_upiu(hba, lrbp);
break;
case UPIU_TRANSACTION_REJECT_UPIU:
/* TODO: handle Reject UPIU Response */
@@ -6539,6 +6546,8 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
* Stop the host controller and complete the requests
* cleared by h/w
*/
+ ufsf_ops_reset_host(hba);
+
ufshcd_hba_stop(hba);

spin_lock_irqsave(hba->host->host_lock, flags);
@@ -6963,6 +6972,7 @@ static int ufs_get_device_desc(struct ufs_hba *hba)
/* getting Specification Version in big endian format */
dev_info->wspecversion = desc_buf[DEVICE_DESC_PARAM_SPEC_VER] << 8 |
desc_buf[DEVICE_DESC_PARAM_SPEC_VER + 1];
+ dev_info->b_ufs_feature_sup = desc_buf[DEVICE_DESC_PARAM_UFS_FEAT];

model_index = desc_buf[DEVICE_DESC_PARAM_PRDCT_NAME];

@@ -7340,6 +7350,7 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
}

ufs_bsg_probe(hba);
+ ufsf_scan_features(hba);
scsi_scan_host(hba->host);
pm_runtime_put_sync(hba->dev);

@@ -7428,6 +7439,7 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool async)
/* Enable Auto-Hibernate if configured */
ufshcd_auto_hibern8_enable(hba);

+ ufsf_ops_reset(hba);
out:

trace_ufshcd_init(dev_name(hba->dev), ret,
@@ -8185,6 +8197,8 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
req_link_state = UIC_LINK_OFF_STATE;
}

+ ufsf_ops_suspend(hba);
+
/*
* If we can't transition into any of the low power modes
* just gate the clocks.
@@ -8306,6 +8320,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
ufshcd_release(hba);
+ ufsf_ops_resume(hba);
out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
schedule_delayed_work(&hba->rpm_dev_flush_recheck_work,
@@ -8402,6 +8417,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
/* Enable Auto-Hibernate if configured */
ufshcd_auto_hibern8_enable(hba);

+ ufsf_ops_resume(hba);
+
if (hba->dev_info.b_rpm_dev_flush_capable) {
hba->dev_info.b_rpm_dev_flush_capable = false;
cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index c774012582b4..6fe5c9b3a0e7 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -46,6 +46,7 @@
#include "ufs.h"
#include "ufs_quirks.h"
#include "ufshci.h"
+#include "ufsfeature.h"

#define UFSHCD "ufshcd"
#define UFSHCD_DRIVER_VERSION "0.2"
@@ -736,6 +737,8 @@ struct ufs_hba {
bool wb_buf_flush_enabled;
bool wb_enabled;
struct delayed_work rpm_dev_flush_recheck_work;
+
+ struct ufsf_feature_info ufsf;
};

/* Returns true if clocks can be gated. Otherwise false */
--
2.17.1


2020-06-28 12:26:53

by Bean Huo

Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi Daejun

It seems you intentionally did not comment on my suggestion.
Let me provide the reason for it.

Before submitting the next version of your patch, please check your L2P
mapping HPB request submission algorithm. I did performance comparison
testing on 4KB, and there is about a 13% performance drop. The hit count
is also lower. I don't know if this is related to your current work queue
scheduling, since you didn't add a timer for each HPB request.

Thanks,

Bean



2020-06-29 18:38:20

by Avri Altman

Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support



>
> Hi Avri
>
> On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> > Hi Bean,
> > >
> > > Hi Daejun
> > >
> > > It seems you intentionally did not comment on my suggestion.
> > > Let me provide the reason for it.
> > >
> > > Before submitting the next version of your patch, please check your
> > > L2P mapping HPB request submission algorithm. I did performance
> > > comparison testing on 4KB, and there is about a 13% performance
> > > drop. The hit count is also lower. I don't know if this is related
> > > to your current work queue scheduling, since you didn't add a timer
> > > for each HPB request.
> >
> > In device control mode, the various decisions,
> > and specifically those that are causing repetitive evictions,
> > are made by the device.
> > Is this the issue that you are referring to?
> >
>
> In this device mode, if the HPB mapping table of an active region becomes
> dirty on the UFS device side, there are repeated inactivation responses,
> but that is not the reason for the condition I mentioned here.
>
> > As for the driver, do you see any issue that is causing unnecessary
> > latency?
> >
>
> Daejun's patch now uses a work queue, and whenever there is a new RSP for
> a subregion to be activated, the driver queues "work" to this work queue.
> This is actually deferred work, so we don't know when it will be
> scheduled or finished. We need to optimize it.
But those "to-do" lists are checked on every completion interrupt and on every resume.
Do you see any scenario in which the "to-be-activated" or "to-be-inactivated" work is getting starved?
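
To make the question concrete, here is a rough userspace sketch of the kind
of "to-do" list being discussed. It is not taken from ufshpb.c: the names,
the fixed capacity, and the issue callback are all assumptions for
illustration. Entries are queued when the device recommends a sub-region and
are drained whenever the drain function is called, which in the driver would
correspond to a completion or a resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 8u /* hypothetical capacity */

struct pending_list {
	unsigned int items[PENDING_MAX]; /* sub-regions awaiting a map request */
	size_t head;
	size_t count;
};

/* Response handler side: remember that the device recommended a sub-region. */
static bool pending_push(struct pending_list *pl, unsigned int subregion)
{
	if (pl->count == PENDING_MAX)
		return false;
	pl->items[(pl->head + pl->count) % PENDING_MAX] = subregion;
	pl->count++;
	return true;
}

/*
 * Drain side: issue requests in FIFO order until the list is empty or
 * issue() reports that no more requests can be sent right now.
 * Returns the number of requests issued.
 */
static size_t pending_drain(struct pending_list *pl,
			    bool (*issue)(unsigned int subregion))
{
	size_t sent = 0;

	assert(issue != NULL);
	while (pl->count > 0) {
		if (!issue(pl->items[pl->head]))
			break;
		pl->head = (pl->head + 1) % PENDING_MAX;
		pl->count--;
		sent++;
	}
	return sent;
}

/* Demo "device" that accepts a limited number of requests per drain. */
static unsigned int demo_budget;

static bool demo_issue(unsigned int subregion)
{
	(void)subregion;
	if (demo_budget == 0)
		return false;
	demo_budget--;
	return true;
}
```

In this model an entry can only be starved if the drain function stops being
called or issue() keeps failing, which is the scenario the question above is
asking about.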

2020-06-29 18:40:47

by Avri Altman

Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

If no-one else objects, maybe you can submit your patches as non-RFC for review?

Thanks,
Avri


2020-06-29 18:49:00

by Avri Altman

Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi Bean,
>
> Hi Daejun
>
> It seems you intentionally did not comment on my suggestion.
> Let me provide the reason for it.
>
> Before submitting the next version of your patch, please check your L2P
> mapping HPB request submission algorithm. I did performance comparison
> testing on 4KB, and there is about a 13% performance drop. The hit count
> is also lower. I don't know if this is related to your current work queue
> scheduling, since you didn't add a timer for each HPB request.
In device control mode, the various decisions,
and specifically those that are causing repetitive evictions,
are made by the device.
Is this the issue that you are referring to?

As for the driver, do you see any issue that is causing unnecessary latency?

Thanks,
Avri

2020-06-29 20:52:23

by Bean Huo

Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

On Mon, 2020-06-29 at 11:06 +0000, Avri Altman wrote:
> >
> > Hi Avri
> >
> > On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> > > Hi Bean,
> > > >
> > > > Hi Daejun
> > > >
> > > > It seems you intentionally did not comment on my suggestion.
> > > > Let me provide the reason for it.
> > > >
> > > > Before submitting the next version of your patch, please check
> > > > your L2P mapping HPB request submission algorithm. I did
> > > > performance comparison testing on 4KB, and there is about a 13%
> > > > performance drop. The hit count is also lower. I don't know if
> > > > this is related to your current work queue scheduling, since you
> > > > didn't add a timer for each HPB request.
> > >
> > > In device control mode, the various decisions,
> > > and specifically those that are causing repetitive evictions,
> > > are made by the device.
> > > Is this the issue that you are referring to?
> > >
> >
> > In this device mode, if the HPB mapping table of an active region
> > becomes dirty on the UFS device side, there are repeated inactivation
> > responses, but that is not the reason for the condition I mentioned
> > here.
> >
> > > As for the driver, do you see any issue that is causing
> > > unnecessary
> > > latency?
> > >
> >
> > Daejun's patch now uses a work queue, and whenever there is a new
> > RSP for a subregion to be activated, the driver queues "work" to
> > this work queue. This is actually deferred work, so we don't know
> > when it will be scheduled or finished. We need to optimize it.
>
> But those "to-do" lists are checked on every completion interrupt and
> on every resume.
> Do you see any scenario in which the "to-be-activated" or
> "to-be-inactivated" work is getting starved?
>

Let me run more test cases. I will get back to you if there are any
updates.

Thanks,
Bean


2020-06-29 20:54:27

by Daejun Park

Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

> Seems you intentionally ignored to give you comments on my suggestion.
> let me provide the reason.
Sorry! I replied to your comment (https://lkml.org/lkml/2020/6/15/1492),
but you didn't reply on that. I thought you agreed because you didn't send
any more comments.


> Before submitting your next version patch, please check your L2P
> mapping HPB reqeust submission logical algorithem. I have did
We are also reviewing the code that you submitted before.
It seems to be a performance improvement as it sends a map request directly.

> performance comparison testing on 4KB, there are about 13% performance
> drop. Also the hit count is lower. I don't know if this is related to
It is interesting that there is actually a performance improvement.
Could you share the test environment, please? However, I think stability is
important to HPB driver. We have tested our method with the real products and
the HPB 1.0 driver is based on that.
After this patch, your approach can be done as an incremental patch? I would
like to test the patch that you submitted and verify it.

> your current work queue scheduling, since you didn't add the timer for
> each HPB request.
Bart commented that it is not good to add an arbitrary timeout value
to the request (please refer to: https://lkml.org/lkml/2020/6/11/1043).
When no timer is added to the request, the block layer applies the SD
timeout as the default timeout.
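
For reference, the default that applies on a running system can be read
from sysfs. This is only a minimal sketch, assuming the UFS LUN is exposed
as a SCSI disk; the device name "sda" is a placeholder, and the helper name
show_timeout is made up for illustration. For sd devices the value is
typically 30 seconds unless it has been changed.

```sh
#!/bin/sh
# Print the request timeout (in seconds) that is used for a SCSI device
# when a request does not carry a timeout of its own.
# "sda" is a placeholder; set DEV to the disk that backs the UFS LUN.
show_timeout() {
    # $1 = path to a sysfs "timeout" attribute
    if [ -r "$1" ]; then
        printf '%s: %s s\n' "$1" "$(cat "$1")"
    else
        printf '%s: not readable\n' "$1"
    fi
}

show_timeout "/sys/block/${DEV:-sda}/device/timeout"
```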

Thanks,
Daejun

2020-06-29 21:03:25

by Bean Huo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi Avri

On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> Hi Bean,
> >
> > Hi Daejun
> >
> > Seems you intentionally ignored to give you comments on my
> > suggestion.
> > let me provide the reason.
> >
> > Before submitting your next version patch, please check your L2P
> > mapping HPB reqeust submission logical algorithem. I have did
> > performance comparison testing on 4KB, there are about 13%
> > performance
> > drop. Also the hit count is lower. I don't know if this is related
> > to
> > your current work queue scheduling, since you didn't add the timer
> > for
> > each HPB request.
>
> In device control mode, the various decisions,
> and specifically those that are causing repetitive evictions,
> are made by the device.
> Is this the issue that you are referring to?
>

For this device mode, if the HPB mapping table of an active region becomes
dirty on the UFS device side, there are repeated inactive responses, but
that is not the reason for the condition I mentioned here.

> As for the driver, do you see any issue that is causing unnecessary
> latency?
>

Daejun's patch now uses a work queue: as long as there is a new RSP for a
subregion to be activated, the driver queues "work" to this work queue.
This is deferred work, so we don't know when it will be scheduled or
finished. We need to optimize it.


Thanks,
Bean



2020-06-29 21:04:26

by Bean Huo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi Daejun

On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > Seems you intentionally ignored to give you comments on my
> > suggestion.
> > let me provide the reason.
>
> Sorry! I replied to your comment (
> https://lkml.org/lkml/2020/6/15/1492),
> but you didn't reply on that. I thought you agreed because you didn't
> send
> any more comments.
>
>
> > Before submitting your next version patch, please check your L2P
> > mapping HPB reqeust submission logical algorithem. I have did
>
> We are also reviewing the code that you submitted before.
> It seems to be a performance improvement as it sends a map request
> directly.
>
> > performance comparison testing on 4KB, there are about 13%
> > performance
> > drop. Also the hit count is lower. I don't know if this is related
> > to
>
> It is interesting that there is actually a performance improvement.
> Could you share the test environment, please? However, I think
> stability is
> important to HPB driver. We have tested our method with the real
> products and
> the HPB 1.0 driver is based on that.

I just ran the fio benchmark tool with --rw=randread, --bs=4kb and
--size=8G/10G/64G/100G, and compared the performance against the direct
submission approach.
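
A run of this kind could be scripted roughly as follows. This is only a
dry-run sketch that prints one fio command per size: the job name, the
target path, --direct=1 and --ioengine=libaio are assumptions added for
illustration, and only the randread pattern, the 4 KB block size and the
size list come from the description above.

```sh
#!/bin/sh
# Dry run: print the fio command for each test size instead of running it.
# TARGET is a placeholder; point it at a file or block device on the UFS LUN.
TARGET="${TARGET:-/path/to/testfile}"

fio_randread_cmd() {
    # $1 = size (e.g. 8G), $2 = target file or block device
    printf 'fio --name=hpb-randread-%s --filename=%s --rw=randread --bs=4k --size=%s --direct=1 --ioengine=libaio\n' \
        "$1" "$2" "$1"
}

for size in 8G 10G 64G 100G; do
    fio_randread_cmd "$size" "$TARGET"
done
```

Running the printed commands once with each submission method on the same
device would give the two sets of numbers to compare.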

> After this patch, your approach can be done as an incremental patch?
> I would
> like to test the patch that you submitted and verify it.
>
> > your current work queue scheduling, since you didn't add the timer
> > for
> > each HPB request.
>

Taking HPB 2.0 into consideration, can we submit the HPB write request
to the SCSI layer? If not, it will have to be submitted directly. So why
not use direct submission from the start? Or maybe you have a better
approach to work around this; if so, would you please share it with us?
I would appreciate it.


> There was Bart's comment that it was not good add an arbitrary
> timeout value
> to the request. (please refer to:
> https://lkml.org/lkml/2020/6/11/1043)
> When no timer is added to the request, the SD timout will be set as
> default
> timeout at the block layer.
>

I saw that, so I should add a timer in order to optimise HPB request
scheduling/completion. This is OK so far.

> Thanks,
> Daejun

Thanks,
Bean


2020-06-30 01:08:49

by Daejun Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi Bean,
> On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > Seems you intentionally ignored to give you comments on my
> > > suggestion.
> > > let me provide the reason.
> >
> > Sorry! I replied to your comment (
> > https://lkml.org/lkml/2020/6/15/1492),
> > but you didn't reply on that. I thought you agreed because you didn't
> > send
> > any more comments.
> >
> >
> > > Before submitting your next version patch, please check your L2P
> > > mapping HPB reqeust submission logical algorithem. I have did
> >
> > We are also reviewing the code that you submitted before.
> > It seems to be a performance improvement as it sends a map request
> > directly.
> >
> > > performance comparison testing on 4KB, there are about 13%
> > > performance
> > > drop. Also the hit count is lower. I don't know if this is related
> > > to
> >
> > It is interesting that there is actually a performance improvement.
> > Could you share the test environment, please? However, I think
> > stability is
> > important to HPB driver. We have tested our method with the real
> > products and
> > the HPB 1.0 driver is based on that.
>
> I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> size=8G/10G/64G/100G. and see what performance diff with the direct
> submission approach.

Thanks!

> > After this patch, your approach can be done as an incremental patch?
> > I would
> > like to test the patch that you submitted and verify it.
> >
> > > your current work queue scheduling, since you didn't add the timer
> > > for
> > > each HPB request.
> >
>
> Taking into consideration of the HPB 2.0, can we submit the HPB write
> request to the SCSI layer? if not, it will be a direct submission way.
> why not directly use direct way? or maybe you have a more advisable
> approach to work around this. would you please share with us.
> appreciate.

I am considering direct submission for the next version.
We will implement the HPB 2.0 write buffer command after the HPB 1.0 patch.

As for direct submission of HPB-related commands, including HPB write
buffer, I think we'd better discuss the right approach in depth before
moving on to the next step.

Thanks,
Daejun

2020-06-30 06:41:24

by Avri Altman

[permalink] [raw]
Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

Hi,

>
> Hi Bean,
> > On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > > Seems you intentionally ignored to give you comments on my
> > > > suggestion.
> > > > let me provide the reason.
> > >
> > > Sorry! I replied to your comment (
> > > > https://lkml.org/lkml/2020/6/15/1492),
> > > but you didn't reply on that. I thought you agreed because you didn't
> > > send
> > > any more comments.
> > >
> > >
> > > > Before submitting your next version patch, please check your L2P
> > > > mapping HPB reqeust submission logical algorithem. I have did
> > >
> > > We are also reviewing the code that you submitted before.
> > > It seems to be a performance improvement as it sends a map request
> > > directly.
> > >
> > > > performance comparison testing on 4KB, there are about 13%
> > > > performance
> > > > drop. Also the hit count is lower. I don't know if this is related
> > > > to
> > >
> > > It is interesting that there is actually a performance improvement.
> > > Could you share the test environment, please? However, I think
> > > stability is
> > > important to HPB driver. We have tested our method with the real
> > > products and
> > > the HPB 1.0 driver is based on that.
> >
> > I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> > size=8G/10G/64G/100G. and see what performance diff with the direct
> > submission approach.
>
> Thanks!
>
> > > After this patch, your approach can be done as an incremental patch?
> > > I would
> > > like to test the patch that you submitted and verify it.
> > >
> > > > your current work queue scheduling, since you didn't add the timer
> > > > for
> > > > each HPB request.
> > >
> >
> > Taking into consideration of the HPB 2.0, can we submit the HPB write
> > request to the SCSI layer? if not, it will be a direct submission way.
> > why not directly use direct way? or maybe you have a more advisable
> > approach to work around this. would you please share with us.
> > appreciate.
>
> I am considering a direct submission way for the next version.
> We will implement the write buffer command of HPB 2.0, after patching HPB
> 1.0.
>
> As for the direct submission of HPB releated command including HPB write
> buffer, I think we'd better discuss the right approach in depth before
> moving on to the next step.
I vote to stay with the current implementation because:
1) Bean is probably right about 2.0, but it's out of scope for now -
there is a long way to go before we'll need to worry about it.
2) For now, we should focus on the functional flows.
Performance issues, should such issues indeed exist, can be dealt with later. And,
3) The current code base has been running in production for more than 3 years now.
I am not so eager to dump robust, well-debugged code unless it is absolutely necessary.

Thanks,
Avri


2020-06-30 22:11:14

by Bean Huo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

On Tue, 2020-06-30 at 10:05 +0900, Daejun Park wrote:
> Hi Bean,
> > On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > > Seems you intentionally ignored to give you comments on my
> > > > suggestion.
> > > > let me provide the reason.
> > >
> > > Sorry! I replied to your comment (
> > >
https://protect2.fireeye.com/url?k=be575021-e3854728-be56db6e-0cc47a31cdf8-6c7d0e1e42762b92&q=1&u=https%3A%2F%2Flkml.org%2Flkml%2F2020%2F6%2F15%2F1492
> > > ),
> > > but you didn't reply on that. I thought you agreed because you
> > > didn't
> > > send
> > > any more comments.
> > >
> > >
> > > > Before submitting your next version patch, please check your
> > > > L2P
> > > > mapping HPB reqeust submission logical algorithem. I have did
> > >
> > > We are also reviewing the code that you submitted before.
> > > It seems to be a performance improvement as it sends a map
> > > request
> > > directly.
> > >
> > > > performance comparison testing on 4KB, there are about 13%
> > > > performance
> > > > drop. Also the hit count is lower. I don't know if this is
> > > > related
> > > > to
> > >
> > > It is interesting that there is actually a performance
> > > improvement.
> > > Could you share the test environment, please? However, I think
> > > stability is
> > > important to HPB driver. We have tested our method with the real
> > > products and
> > > the HPB 1.0 driver is based on that.
> >
> > I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> > size=8G/10G/64G/100G. and see what performance diff with the direct
> > submission approach.
>
> Thanks!
>
> > > After this patch, your approach can be done as an incremental
> > > patch?
> > > I would
> > > like to test the patch that you submitted and verify it.
> > >
> > > > your current work queue scheduling, since you didn't add the
> > > > timer
> > > > for
> > > > each HPB request.
> >
> > Taking into consideration of the HPB 2.0, can we submit the HPB
> > write
> > request to the SCSI layer? if not, it will be a direct submission
> > way.
> > why not directly use direct way? or maybe you have a more advisable
> > approach to work around this. would you please share with us.
> > appreciate.
>
> I am considering a direct submission way for the next version.
> We will implement the write buffer command of HPB 2.0, after patching
> HPB 1.0.
>
> As for the direct submission of HPB releated command including HPB
> write
> buffer, I think we'd better discuss the right approach in depth
> before
> moving on to the next step.
>

Hi Daejun,
If you need reference code, you can freely copy my code from my RFC v3
patchset. Or, if you need testing support from my side, just let me know;
I can help you test your code.

Thanks,
Bean


2020-06-30 22:13:18

by Bean Huo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

On Tue, 2020-06-30 at 06:39 +0000, Avri Altman wrote:
> Hi,
>
> >
> > Hi Bean,
> > > On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > > > Seems you intentionally ignored to give you comments on my
> > > > > suggestion.
> > > > > let me provide the reason.
> > > >
> > > > Sorry! I replied to your comment (
> > > > > https://lkml.org/lkml/2020/6/15/1492),
> > > > but you didn't reply on that. I thought you agreed because you
> > > > didn't
> > > > send
> > > > any more comments.
> > > >
> > > >
> > > > > Before submitting your next version patch, please check your
> > > > > L2P
> > > > > mapping HPB reqeust submission logical algorithem. I have did
> > > >
> > > > We are also reviewing the code that you submitted before.
> > > > It seems to be a performance improvement as it sends a map
> > > > request
> > > > directly.
> > > >
> > > > > performance comparison testing on 4KB, there are about 13%
> > > > > performance
> > > > > drop. Also the hit count is lower. I don't know if this is
> > > > > related
> > > > > to
> > > >
> > > > It is interesting that there is actually a performance
> > > > improvement.
> > > > Could you share the test environment, please? However, I think
> > > > stability is
> > > > important to HPB driver. We have tested our method with the
> > > > real
> > > > products and
> > > > the HPB 1.0 driver is based on that.
> > >
> > > I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> > > size=8G/10G/64G/100G. and see what performance diff with the
> > > direct
> > > submission approach.
> >
> > Thanks!
> >
> > > > After this patch, your approach can be done as an incremental
> > > > patch?
> > > > I would
> > > > like to test the patch that you submitted and verify it.
> > > >
> > > > > your current work queue scheduling, since you didn't add the
> > > > > timer
> > > > > for
> > > > > each HPB request.
> > >
> > > Taking into consideration of the HPB 2.0, can we submit the HPB
> > > write
> > > request to the SCSI layer? if not, it will be a direct submission
> > > way.
> > > why not directly use direct way? or maybe you have a more
> > > advisable
> > > approach to work around this. would you please share with us.
> > > appreciate.
> >
> > I am considering a direct submission way for the next version.
> > We will implement the write buffer command of HPB 2.0, after
> > patching HPB
> > 1.0.
> >
> > As for the direct submission of HPB releated command including HPB
> > write
> > buffer, I think we'd better discuss the right approach in depth
> > before
> > moving on to the next step.
>
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
> there is a long way to go before we'll need to worry about it
> 2) For now, we should focus on the functional flows.
> Performance issues, should such issues indeed exists, can be
> dealt with later. And,
> 3) The current code base is running in production for more than 3
> years now.
> I am not so eager to dump a robust, well debugged code unless it
> absolutely necessary.
>
> Thanks,
> Avri
>
>
Hi Avri,
Thanks, I appreciate that you shared your position on this topic. I don't
know how I can convince you to change your opinion.
Let me try.

1. HPB 2.0 is not out of scope.
HPB 1.0 only supports a 4KB read length, which is useless. I don't know
if there will be users who want an HPB driver that only supports a 4KB
chunk size. I think we all know that some smartphone vendors already
use HPB 2.0, even though HPB 2.0 has not been released yet. You
mentioned this in your earlier emails. HPB 1.0 is just a transitional
(limited) version; we need to think about HPB 2.0 support when we
develop the HPB 1.0 driver.
To say the least, if we don't think about HPB 2.0 support and just
focus on HPB 1.0, then after HPB 2.0 is released we will have to go back
to the starting point and redo lots of things. Why can't we fix it now
and think one step further?

2. The major goal of the HPB feature is to improve random read
performance, and the implementation flow for HPB device mode is already
very clear. I don't know which functional flows you mean.
If you mean HPB host mode, no, that is another big topic; I think we'd
better not add it to the current driver until we all agree on a final
approach.

3. Regarding how long Daejun's HPB driver has been in use, I can't easily
jump to a conclusion. But for sure, before he disclosed his HPB driver and
submitted it to the community, he made lots of changes and deletions. That
means it still needs lots of testing.


I didn't mean to disrupt the upstreaming of Daejun's patch. If Daejun can
consider HPB 2.0 support while developing the HPB 1.0 patch, that would be
great. Then we can quickly add HPB 2.0 support once the HPB 2.0 spec is
released. Think about it: who is using HPB 1.0 now?

Thanks,
Bean










2020-07-01 00:48:40

by Daejun Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support

On Tue, 2020-06-30 at 10:05 +0900, Daejun Park wrote:
> > Hi Bean,
> > > On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > > > Seems you intentionally ignored to give you comments on my
> > > > > suggestion.
> > > > > let me provide the reason.
> > > >
> > > > Sorry! I replied to your comment (
> > > >
> https://protect2.fireeye.com/url?k=be575021-e3854728-be56db6e-0cc47a31cdf8-6c7d0e1e42762b92&q=1&u=https%3A%2F%2Flkml.org%2Flkml%2F2020%2F6%2F15%2F1492
> > > > ),
> > > > but you didn't reply on that. I thought you agreed because you
> > > > didn't
> > > > send
> > > > any more comments.
> > > >
> > > >
> > > > > Before submitting your next version patch, please check your
> > > > > L2P
> > > > > mapping HPB reqeust submission logical algorithem. I have did
> > > >
> > > > We are also reviewing the code that you submitted before.
> > > > It seems to be a performance improvement as it sends a map
> > > > request
> > > > directly.
> > > >
> > > > > performance comparison testing on 4KB, there are about 13%
> > > > > performance
> > > > > drop. Also the hit count is lower. I don't know if this is
> > > > > related
> > > > > to
> > > >
> > > > It is interesting that there is actually a performance
> > > > improvement.
> > > > Could you share the test environment, please? However, I think
> > > > stability is
> > > > important to HPB driver. We have tested our method with the real
> > > > products and
> > > > the HPB 1.0 driver is based on that.
> > >
> > > I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> > > size=8G/10G/64G/100G. and see what performance diff with the direct
> > > submission approach.
> >
> > Thanks!
> >
> > > > After this patch, your approach can be done as an incremental
> > > > patch?
> > > > I would
> > > > like to test the patch that you submitted and verify it.
> > > >
> > > > > your current work queue scheduling, since you didn't add the
> > > > > timer
> > > > > for
> > > > > each HPB request.
> > >
> > > Taking into consideration of the HPB 2.0, can we submit the HPB
> > > write
> > > request to the SCSI layer? if not, it will be a direct submission
> > > way.
> > > why not directly use direct way? or maybe you have a more advisable
> > > approach to work around this. would you please share with us.
> > > appreciate.
> >
> > I am considering a direct submission way for the next version.
> > We will implement the write buffer command of HPB 2.0, after patching
> > HPB 1.0.
> >
> > As for the direct submission of HPB releated command including HPB
> > write
> > buffer, I think we'd better discuss the right approach in depth
> > before
> > moving on to the next step.
> >
>
> Hi Daejun
> If you need reference code, you can freely copy my code from my RFC v3
> patchset. or if you need my side testing support, just let me, I can
> help you test your code.
>
It will be good example code for developing HPB 2.0.

Thanks,
Daejun

2020-07-01 01:57:52

by Alim Akhtar

[permalink] [raw]
Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support



> -----Original Message-----
> From: Avri Altman <[email protected]>
> Sent: 30 June 2020 12:09
> To: [email protected]; Bean Huo <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; ALIM AKHTAR <[email protected]>
> Cc: [email protected]; [email protected]; Sang-yoon Oh
> <[email protected]>; Sung-Jun Park
> <[email protected]>; yongmyung lee
> <[email protected]>; Jinyoung CHOI <[email protected]>;
> Adel Choi <[email protected]>; BoRam Shin
> <[email protected]>
> Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster
> Support
>
> Hi,
>
> >
> > Hi Bean,
> > > On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > > > > Seems you intentionally ignored to give you comments on my
> > > > > suggestion.
> > > > > let me provide the reason.
> > > >
> > > > Sorry! I replied to your comment (
> > > > > https://lkml.org/lkml/2020/6/15/1492),
> > > > but you didn't reply on that. I thought you agreed because you
> > > > didn't send any more comments.
> > > >
> > > >
> > > > > Before submitting your next version patch, please check your L2P
> > > > > mapping HPB reqeust submission logical algorithem. I have did
> > > >
> > > > We are also reviewing the code that you submitted before.
> > > > It seems to be a performance improvement as it sends a map request
> > > > directly.
> > > >
> > > > > performance comparison testing on 4KB, there are about 13%
> > > > > performance drop. Also the hit count is lower. I don't know if
> > > > > this is related to
> > > >
> > > > It is interesting that there is actually a performance improvement.
> > > > Could you share the test environment, please? However, I think
> > > > stability is important to HPB driver. We have tested our method
> > > > with the real products and the HPB 1.0 driver is based on that.
> > >
> > > I just run fio benchmark tool with --rw=randread, --bs=4kb, --
> > > size=8G/10G/64G/100G. and see what performance diff with the direct
> > > submission approach.
> >
> > Thanks!
> >
> > > > After this patch, your approach can be done as an incremental patch?
> > > > I would
> > > > like to test the patch that you submitted and verify it.
> > > >
> > > > > your current work queue scheduling, since you didn't add the
> > > > > timer for each HPB request.
> > > >
> > >
> > > Taking into consideration of the HPB 2.0, can we submit the HPB
> > > write request to the SCSI layer? if not, it will be a direct submission way.
> > > why not directly use direct way? or maybe you have a more advisable
> > > approach to work around this. would you please share with us.
> > > appreciate.
> >
> > I am considering a direct submission way for the next version.
> > We will implement the write buffer command of HPB 2.0, after patching
> > HPB 1.0.
> >
> > As for the direct submission of HPB releated command including HPB
> > write buffer, I think we'd better discuss the right approach in depth
> > before moving on to the next step.
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
> there is a long way to go before we'll need to worry about it
> 2) For now, we should focus on the functional flows.
> Performance issues, should such issues indeed exists, can be dealt with later.
> And,
> 3) The current code base is running in production for more than 3 years now.
> I am not so eager to dump a robust, well debugged code unless it absolutely
> necessary.
>
Avri and Bean,
I think this is a good approach to take; let us add future specification
enhancements as incremental patches.

> Thanks,
> Avri
>