2023-05-01 12:42:08

by Christophe JAILLET

Subject: [PATCH 0/5] optimize some data structure in nvme

This series is a proposal to slightly optimize the memory needed for some
structures used in nvme.

This follows the discussion in [1].

Honestly, I'm not convinced that this series really brings something.
Because of the way memory allocation works, and its over-allocation to try to
avoid memory fragmentation, such limited gains are most of the time useless.

It could still help:
- removing holes in a structure can, at some point, bring its size below an
allocation threshold (this is especially true if such structures are
allocated with kcalloc() or kmalloc_array())
- it can save some space in other structures that embed it
- it can save a few cycles if the structure is memcpy()'ed or zeroed, for
example
- it can reduce cache usage

With that in mind, patch 3 is a win, patch 4 is likely a win, and the other
ones are much more theoretical.

The changes are really limited, so even if the gain is marginal, maybe it still
makes sense to merge them.

Each patch gives the layout generated by pahole before and after the patch.

[1]: https://lore.kernel.org/all/[email protected]/

Christophe JAILLET (5):
nvmet: Reorder fields in 'struct nvmet_sq'
nvmet: Reorder fields in 'struct nvme_ctrl'
nvmet: Reorder fields in 'struct nvmf_ctrl_options'
nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'
nvmet: Reorder fields in 'struct nvmefc_fcp_req'

drivers/nvme/host/auth.c | 6 +++---
drivers/nvme/host/fabrics.h | 8 ++++----
drivers/nvme/host/nvme.h | 6 +++---
drivers/nvme/target/nvmet.h | 4 ++--
include/linux/nvme-fc-driver.h | 10 +++++-----
5 files changed, 17 insertions(+), 17 deletions(-)

--
2.34.1


2023-05-01 12:42:21

by Christophe JAILLET

Subject: [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_sq' from 472 to 464
bytes when CONFIG_NVME_TARGET_AUTH is defined.

This structure is embedded into some other structures, so it helps reduce
their sizes as well.

Signed-off-by: Christophe JAILLET <[email protected]>
---
Using pahole

Before:
======
struct nvmet_sq {
struct nvmet_ctrl * ctrl; /* 0 8 */
struct percpu_ref ref; /* 8 16 */
u16 qid; /* 24 2 */
u16 size; /* 26 2 */
u32 sqhd; /* 28 4 */
bool sqhd_disabled; /* 32 1 */

/* XXX 7 bytes hole, try to pack */

struct delayed_work auth_expired_work; /* 40 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
bool authenticated; /* 224 1 */

/* XXX 1 byte hole, try to pack */

u16 dhchap_tid; /* 226 2 */
u16 dhchap_status; /* 228 2 */

/* XXX 2 bytes hole, try to pack */

int dhchap_step; /* 232 4 */

/* XXX 4 bytes hole, try to pack */

u8 * dhchap_c1; /* 240 8 */
u8 * dhchap_c2; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
u32 dhchap_s1; /* 256 4 */
u32 dhchap_s2; /* 260 4 */
u8 * dhchap_skey; /* 264 8 */
int dhchap_skey_len; /* 272 4 */

/* XXX 4 bytes hole, try to pack */

struct completion free_done; /* 280 96 */
/* --- cacheline 5 boundary (320 bytes) was 56 bytes ago --- */
struct completion confirm_done; /* 376 96 */

/* size: 472, cachelines: 8, members: 19 */
/* sum members: 454, holes: 5, sum holes: 18 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 24 bytes */
};

After:
=====
struct nvmet_sq {
struct nvmet_ctrl * ctrl; /* 0 8 */
struct percpu_ref ref; /* 8 16 */
u16 qid; /* 24 2 */
u16 size; /* 26 2 */
u32 sqhd; /* 28 4 */
bool sqhd_disabled; /* 32 1 */
bool authenticated; /* 33 1 */

/* XXX 6 bytes hole, try to pack */

struct delayed_work auth_expired_work; /* 40 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
u16 dhchap_tid; /* 224 2 */
u16 dhchap_status; /* 226 2 */
int dhchap_step; /* 228 4 */
u8 * dhchap_c1; /* 232 8 */
u8 * dhchap_c2; /* 240 8 */
u32 dhchap_s1; /* 248 4 */
u32 dhchap_s2; /* 252 4 */
/* --- cacheline 4 boundary (256 bytes) --- */
u8 * dhchap_skey; /* 256 8 */
int dhchap_skey_len; /* 264 4 */

/* XXX 4 bytes hole, try to pack */

struct completion free_done; /* 272 96 */
/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
struct completion confirm_done; /* 368 96 */

/* size: 464, cachelines: 8, members: 19 */
/* sum members: 454, holes: 2, sum holes: 10 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 16 bytes */
};
---
drivers/nvme/target/nvmet.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c50146085fb5..8cfd60f3b564 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -109,8 +109,8 @@ struct nvmet_sq {
u32 sqhd;
bool sqhd_disabled;
#ifdef CONFIG_NVME_TARGET_AUTH
- struct delayed_work auth_expired_work;
bool authenticated;
+ struct delayed_work auth_expired_work;
u16 dhchap_tid;
u16 dhchap_status;
int dhchap_step;
--
2.34.1

2023-05-01 12:42:34

by Christophe JAILLET

Subject: [PATCH 3/5] nvmet: Reorder fields in 'struct nvmf_ctrl_options'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmf_ctrl_options' from 136 to
128 bytes.

When such a structure is allocated in nvmf_create_ctrl(), because of the
way memory allocation works, 192 bytes were actually allocated for the
136-byte request.

So this saves 64 bytes per allocation, one cache line to hold the whole
structure, and a few cycles when zeroing the memory in nvmf_create_ctrl().

Signed-off-by: Christophe JAILLET <[email protected]>
---
Using pahole

Before:
======
struct nvmf_ctrl_options {
unsigned int mask; /* 0 4 */

/* XXX 4 bytes hole, try to pack */

char * transport; /* 8 8 */
char * subsysnqn; /* 16 8 */
char * traddr; /* 24 8 */
char * trsvcid; /* 32 8 */
char * host_traddr; /* 40 8 */
char * host_iface; /* 48 8 */
size_t queue_size; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
unsigned int nr_io_queues; /* 64 4 */
unsigned int reconnect_delay; /* 68 4 */
bool discovery_nqn; /* 72 1 */
bool duplicate_connect; /* 73 1 */

/* XXX 2 bytes hole, try to pack */

unsigned int kato; /* 76 4 */
struct nvmf_host * host; /* 80 8 */
int max_reconnects; /* 88 4 */

/* XXX 4 bytes hole, try to pack */

char * dhchap_secret; /* 96 8 */
char * dhchap_ctrl_secret; /* 104 8 */
bool disable_sqflow; /* 112 1 */
bool hdr_digest; /* 113 1 */
bool data_digest; /* 114 1 */

/* XXX 1 byte hole, try to pack */

unsigned int nr_write_queues; /* 116 4 */
unsigned int nr_poll_queues; /* 120 4 */
int tos; /* 124 4 */
/* --- cacheline 2 boundary (128 bytes) --- */
int fast_io_fail_tmo; /* 128 4 */

/* size: 136, cachelines: 3, members: 24 */
/* sum members: 121, holes: 4, sum holes: 11 */
/* padding: 4 */
/* last cacheline: 8 bytes */
};


After:
=====
struct nvmf_ctrl_options {
unsigned int mask; /* 0 4 */
int max_reconnects; /* 4 4 */
char * transport; /* 8 8 */
char * subsysnqn; /* 16 8 */
char * traddr; /* 24 8 */
char * trsvcid; /* 32 8 */
char * host_traddr; /* 40 8 */
char * host_iface; /* 48 8 */
size_t queue_size; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
unsigned int nr_io_queues; /* 64 4 */
unsigned int reconnect_delay; /* 68 4 */
bool discovery_nqn; /* 72 1 */
bool duplicate_connect; /* 73 1 */

/* XXX 2 bytes hole, try to pack */

unsigned int kato; /* 76 4 */
struct nvmf_host * host; /* 80 8 */
char * dhchap_secret; /* 88 8 */
char * dhchap_ctrl_secret; /* 96 8 */
bool disable_sqflow; /* 104 1 */
bool hdr_digest; /* 105 1 */
bool data_digest; /* 106 1 */

/* XXX 1 byte hole, try to pack */

unsigned int nr_write_queues; /* 108 4 */
unsigned int nr_poll_queues; /* 112 4 */
int tos; /* 116 4 */
int fast_io_fail_tmo; /* 120 4 */

/* size: 128, cachelines: 2, members: 24 */
/* sum members: 121, holes: 2, sum holes: 3 */
/* padding: 4 */
};
---
drivers/nvme/host/fabrics.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index dcac3df8a5f7..1bc57fb75825 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -77,6 +77,9 @@ enum {
* with the parsing opts enum.
* @mask: Used by the fabrics library to parse through sysfs options
* on adding a NVMe controller.
+ * @max_reconnects: maximum number of allowed reconnect attempts before removing
+ * the controller, (-1) means reconnect forever, zero means remove
+ * immediately;
* @transport: Holds the fabric transport "technology name" (for a lack of
* better description) that will be used by an NVMe controller
* being added.
@@ -96,9 +99,6 @@ enum {
* @discovery_nqn: indicates if the subsysnqn is the well-known discovery NQN.
* @kato: Keep-alive timeout.
* @host: Virtual NVMe host, contains the NQN and Host ID.
- * @max_reconnects: maximum number of allowed reconnect attempts before removing
- * the controller, (-1) means reconnect forever, zero means remove
- * immediately;
* @dhchap_secret: DH-HMAC-CHAP secret
* @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
* authentication
@@ -112,6 +112,7 @@ enum {
*/
struct nvmf_ctrl_options {
unsigned mask;
+ int max_reconnects;
char *transport;
char *subsysnqn;
char *traddr;
@@ -125,7 +126,6 @@ struct nvmf_ctrl_options {
bool duplicate_connect;
unsigned int kato;
struct nvmf_host *host;
- int max_reconnects;
char *dhchap_secret;
char *dhchap_ctrl_secret;
bool disable_sqflow;
--
2.34.1

2023-05-01 12:43:02

by Christophe JAILLET

Subject: [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_ctrl' from 5368 to 5344
bytes when all CONFIG_* are defined.

This structure is embedded into some other structures, so it helps reduce
their sizes as well.

Signed-off-by: Christophe JAILLET <[email protected]>
---
Using pahole

Before:
======
struct nvme_ctrl {
bool comp_seen; /* 0 1 */

/* XXX 3 bytes hole, try to pack */

enum nvme_ctrl_state state; /* 4 4 */
bool identified; /* 8 1 */

/* XXX 7 bytes hole, try to pack */

spinlock_t lock; /* 16 72 */
/* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
struct mutex scan_lock; /* 88 160 */
/* --- cacheline 3 boundary (192 bytes) was 56 bytes ago --- */
const struct nvme_ctrl_ops * ops; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct request_queue * admin_q; /* 256 8 */
struct request_queue * connect_q; /* 264 8 */
struct request_queue * fabrics_q; /* 272 8 */
struct device * dev; /* 280 8 */
int instance; /* 288 4 */
int numa_node; /* 292 4 */
struct blk_mq_tag_set * tagset; /* 296 8 */
struct blk_mq_tag_set * admin_tagset; /* 304 8 */
struct list_head namespaces; /* 312 16 */
/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
struct rw_semaphore namespaces_rwsem; /* 328 168 */
/* --- cacheline 7 boundary (448 bytes) was 48 bytes ago --- */
struct device ctrl_device __attribute__((__aligned__(8))); /* 496 1400 */

/* XXX last struct has 3 bytes of padding */

/* --- cacheline 29 boundary (1856 bytes) was 40 bytes ago --- */
struct device * device; /* 1896 8 */
struct device * hwmon_device; /* 1904 8 */
struct cdev cdev; /* 1912 296 */
/* --- cacheline 34 boundary (2176 bytes) was 32 bytes ago --- */
struct work_struct reset_work; /* 2208 80 */
/* --- cacheline 35 boundary (2240 bytes) was 48 bytes ago --- */
struct work_struct delete_work; /* 2288 80 */
/* --- cacheline 37 boundary (2368 bytes) --- */
wait_queue_head_t state_wq; /* 2368 88 */
/* --- cacheline 38 boundary (2432 bytes) was 24 bytes ago --- */
struct nvme_subsystem * subsys; /* 2456 8 */
struct list_head subsys_entry; /* 2464 16 */
struct opal_dev * opal_dev; /* 2480 8 */
char name[12]; /* 2488 12 */
/* --- cacheline 39 boundary (2496 bytes) was 4 bytes ago --- */
u16 cntlid; /* 2500 2 */

/* XXX 2 bytes hole, try to pack */

u32 ctrl_config; /* 2504 4 */
u16 mtfa; /* 2508 2 */

/* XXX 2 bytes hole, try to pack */

u32 queue_count; /* 2512 4 */

/* XXX 4 bytes hole, try to pack */

u64 cap; /* 2520 8 */
u32 max_hw_sectors; /* 2528 4 */
u32 max_segments; /* 2532 4 */
u32 max_integrity_segments; /* 2536 4 */
u32 max_discard_sectors; /* 2540 4 */
u32 max_discard_segments; /* 2544 4 */
u32 max_zeroes_sectors; /* 2548 4 */
u32 max_zone_append; /* 2552 4 */
u16 crdt[3]; /* 2556 6 */
/* --- cacheline 40 boundary (2560 bytes) was 2 bytes ago --- */
u16 oncs; /* 2562 2 */
u32 dmrsl; /* 2564 4 */
u16 oacs; /* 2568 2 */
u16 sqsize; /* 2570 2 */
u32 max_namespaces; /* 2572 4 */
atomic_t abort_limit; /* 2576 4 */
u8 vwc; /* 2580 1 */

/* XXX 3 bytes hole, try to pack */

u32 vs; /* 2584 4 */
u32 sgls; /* 2588 4 */
u16 kas; /* 2592 2 */
u8 npss; /* 2594 1 */
u8 apsta; /* 2595 1 */
u16 wctemp; /* 2596 2 */
u16 cctemp; /* 2598 2 */
u32 oaes; /* 2600 4 */
u32 aen_result; /* 2604 4 */
u32 ctratt; /* 2608 4 */
unsigned int shutdown_timeout; /* 2612 4 */
unsigned int kato; /* 2616 4 */
bool subsystem; /* 2620 1 */

/* XXX 3 bytes hole, try to pack */

/* --- cacheline 41 boundary (2624 bytes) --- */
long unsigned int quirks; /* 2624 8 */
struct nvme_id_power_state psd[32]; /* 2632 1024 */
/* --- cacheline 57 boundary (3648 bytes) was 8 bytes ago --- */
struct nvme_effects_log * effects; /* 3656 8 */
struct xarray cels; /* 3664 88 */
/* --- cacheline 58 boundary (3712 bytes) was 40 bytes ago --- */
struct work_struct scan_work; /* 3752 80 */
/* --- cacheline 59 boundary (3776 bytes) was 56 bytes ago --- */
struct work_struct async_event_work; /* 3832 80 */
/* --- cacheline 61 boundary (3904 bytes) was 8 bytes ago --- */
struct delayed_work ka_work; /* 3912 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 64 boundary (4096 bytes) --- */
struct delayed_work failfast_work; /* 4096 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 66 boundary (4224 bytes) was 56 bytes ago --- */
struct nvme_command ka_cmd; /* 4280 64 */
/* --- cacheline 67 boundary (4288 bytes) was 56 bytes ago --- */
struct work_struct fw_act_work; /* 4344 80 */
/* --- cacheline 69 boundary (4416 bytes) was 8 bytes ago --- */
long unsigned int events; /* 4424 8 */
u8 anacap; /* 4432 1 */
u8 anatt; /* 4433 1 */

/* XXX 2 bytes hole, try to pack */

u32 anagrpmax; /* 4436 4 */
u32 nanagrpid; /* 4440 4 */

/* XXX 4 bytes hole, try to pack */

struct mutex ana_lock; /* 4448 160 */
/* --- cacheline 72 boundary (4608 bytes) --- */
struct nvme_ana_rsp_hdr * ana_log_buf; /* 4608 8 */
size_t ana_log_size; /* 4616 8 */
struct timer_list anatt_timer; /* 4624 88 */
/* --- cacheline 73 boundary (4672 bytes) was 40 bytes ago --- */
struct work_struct ana_work; /* 4712 80 */
/* --- cacheline 74 boundary (4736 bytes) was 56 bytes ago --- */
struct work_struct dhchap_auth_work; /* 4792 80 */
/* --- cacheline 76 boundary (4864 bytes) was 8 bytes ago --- */
struct mutex dhchap_auth_mutex; /* 4872 160 */
/* --- cacheline 78 boundary (4992 bytes) was 40 bytes ago --- */
struct nvme_dhchap_queue_context * dhchap_ctxs; /* 5032 8 */
struct nvme_dhchap_key * host_key; /* 5040 8 */
struct nvme_dhchap_key * ctrl_key; /* 5048 8 */
/* --- cacheline 79 boundary (5056 bytes) --- */
u16 transaction; /* 5056 2 */

/* XXX 6 bytes hole, try to pack */

u64 ps_max_latency_us; /* 5064 8 */
bool apst_enabled; /* 5072 1 */

/* XXX 3 bytes hole, try to pack */

u32 hmpre; /* 5076 4 */
u32 hmmin; /* 5080 4 */
u32 hmminds; /* 5084 4 */
u16 hmmaxd; /* 5088 2 */

/* XXX 2 bytes hole, try to pack */

u32 ioccsz; /* 5092 4 */
u32 iorcsz; /* 5096 4 */
u16 icdoff; /* 5100 2 */
u16 maxcmd; /* 5102 2 */
int nr_reconnects; /* 5104 4 */

/* XXX 4 bytes hole, try to pack */

long unsigned int flags; /* 5112 8 */
/* --- cacheline 80 boundary (5120 bytes) --- */
struct nvmf_ctrl_options * opts; /* 5120 8 */
struct page * discard_page; /* 5128 8 */
long unsigned int discard_page_busy; /* 5136 8 */
struct nvme_fault_inject fault_inject; /* 5144 216 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 83 boundary (5312 bytes) was 48 bytes ago --- */
enum nvme_ctrl_type cntrltype; /* 5360 4 */
enum nvme_dctype dctype; /* 5364 4 */

/* size: 5368, cachelines: 84, members: 104 */
/* sum members: 5323, holes: 13, sum holes: 45 */
/* paddings: 4, sum paddings: 15 */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));


After:
=====
struct nvme_ctrl {
bool comp_seen; /* 0 1 */
bool identified; /* 1 1 */

/* XXX 2 bytes hole, try to pack */

enum nvme_ctrl_state state; /* 4 4 */
spinlock_t lock; /* 8 72 */
/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
struct mutex scan_lock; /* 80 160 */
/* --- cacheline 3 boundary (192 bytes) was 48 bytes ago --- */
const struct nvme_ctrl_ops * ops; /* 240 8 */
struct request_queue * admin_q; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct request_queue * connect_q; /* 256 8 */
struct request_queue * fabrics_q; /* 264 8 */
struct device * dev; /* 272 8 */
int instance; /* 280 4 */
int numa_node; /* 284 4 */
struct blk_mq_tag_set * tagset; /* 288 8 */
struct blk_mq_tag_set * admin_tagset; /* 296 8 */
struct list_head namespaces; /* 304 16 */
/* --- cacheline 5 boundary (320 bytes) --- */
struct rw_semaphore namespaces_rwsem; /* 320 168 */
/* --- cacheline 7 boundary (448 bytes) was 40 bytes ago --- */
struct device ctrl_device __attribute__((__aligned__(8))); /* 488 1400 */

/* XXX last struct has 3 bytes of padding */

/* --- cacheline 29 boundary (1856 bytes) was 32 bytes ago --- */
struct device * device; /* 1888 8 */
struct device * hwmon_device; /* 1896 8 */
struct cdev cdev; /* 1904 296 */
/* --- cacheline 34 boundary (2176 bytes) was 24 bytes ago --- */
struct work_struct reset_work; /* 2200 80 */
/* --- cacheline 35 boundary (2240 bytes) was 40 bytes ago --- */
struct work_struct delete_work; /* 2280 80 */
/* --- cacheline 36 boundary (2304 bytes) was 56 bytes ago --- */
wait_queue_head_t state_wq; /* 2360 88 */
/* --- cacheline 38 boundary (2432 bytes) was 16 bytes ago --- */
struct nvme_subsystem * subsys; /* 2448 8 */
struct list_head subsys_entry; /* 2456 16 */
struct opal_dev * opal_dev; /* 2472 8 */
char name[12]; /* 2480 12 */
u16 cntlid; /* 2492 2 */
u16 mtfa; /* 2494 2 */
/* --- cacheline 39 boundary (2496 bytes) --- */
u32 ctrl_config; /* 2496 4 */
u32 queue_count; /* 2500 4 */
u64 cap; /* 2504 8 */
u32 max_hw_sectors; /* 2512 4 */
u32 max_segments; /* 2516 4 */
u32 max_integrity_segments; /* 2520 4 */
u32 max_discard_sectors; /* 2524 4 */
u32 max_discard_segments; /* 2528 4 */
u32 max_zeroes_sectors; /* 2532 4 */
u32 max_zone_append; /* 2536 4 */
u16 crdt[3]; /* 2540 6 */
u16 oncs; /* 2546 2 */
u32 dmrsl; /* 2548 4 */
u16 oacs; /* 2552 2 */
u16 sqsize; /* 2554 2 */
u32 max_namespaces; /* 2556 4 */
/* --- cacheline 40 boundary (2560 bytes) --- */
atomic_t abort_limit; /* 2560 4 */
u8 vwc; /* 2564 1 */

/* XXX 3 bytes hole, try to pack */

u32 vs; /* 2568 4 */
u32 sgls; /* 2572 4 */
u16 kas; /* 2576 2 */
u8 npss; /* 2578 1 */
u8 apsta; /* 2579 1 */
u16 wctemp; /* 2580 2 */
u16 cctemp; /* 2582 2 */
u32 oaes; /* 2584 4 */
u32 aen_result; /* 2588 4 */
u32 ctratt; /* 2592 4 */
unsigned int shutdown_timeout; /* 2596 4 */
unsigned int kato; /* 2600 4 */
bool subsystem; /* 2604 1 */

/* XXX 3 bytes hole, try to pack */

long unsigned int quirks; /* 2608 8 */
struct nvme_id_power_state psd[32]; /* 2616 1024 */
/* --- cacheline 56 boundary (3584 bytes) was 56 bytes ago --- */
struct nvme_effects_log * effects; /* 3640 8 */
/* --- cacheline 57 boundary (3648 bytes) --- */
struct xarray cels; /* 3648 88 */
/* --- cacheline 58 boundary (3712 bytes) was 24 bytes ago --- */
struct work_struct scan_work; /* 3736 80 */
/* --- cacheline 59 boundary (3776 bytes) was 40 bytes ago --- */
struct work_struct async_event_work; /* 3816 80 */
/* --- cacheline 60 boundary (3840 bytes) was 56 bytes ago --- */
struct delayed_work ka_work; /* 3896 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 63 boundary (4032 bytes) was 48 bytes ago --- */
struct delayed_work failfast_work; /* 4080 184 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 66 boundary (4224 bytes) was 40 bytes ago --- */
struct nvme_command ka_cmd; /* 4264 64 */
/* --- cacheline 67 boundary (4288 bytes) was 40 bytes ago --- */
struct work_struct fw_act_work; /* 4328 80 */
/* --- cacheline 68 boundary (4352 bytes) was 56 bytes ago --- */
long unsigned int events; /* 4408 8 */
/* --- cacheline 69 boundary (4416 bytes) --- */
u8 anacap; /* 4416 1 */
u8 anatt; /* 4417 1 */

/* XXX 2 bytes hole, try to pack */

u32 anagrpmax; /* 4420 4 */
u32 nanagrpid; /* 4424 4 */

/* XXX 4 bytes hole, try to pack */

struct mutex ana_lock; /* 4432 160 */
/* --- cacheline 71 boundary (4544 bytes) was 48 bytes ago --- */
struct nvme_ana_rsp_hdr * ana_log_buf; /* 4592 8 */
size_t ana_log_size; /* 4600 8 */
/* --- cacheline 72 boundary (4608 bytes) --- */
struct timer_list anatt_timer; /* 4608 88 */
/* --- cacheline 73 boundary (4672 bytes) was 24 bytes ago --- */
struct work_struct ana_work; /* 4696 80 */
/* --- cacheline 74 boundary (4736 bytes) was 40 bytes ago --- */
struct work_struct dhchap_auth_work; /* 4776 80 */
/* --- cacheline 75 boundary (4800 bytes) was 56 bytes ago --- */
struct mutex dhchap_auth_mutex; /* 4856 160 */
/* --- cacheline 78 boundary (4992 bytes) was 24 bytes ago --- */
struct nvme_dhchap_queue_context * dhchap_ctxs; /* 5016 8 */
struct nvme_dhchap_key * host_key; /* 5024 8 */
struct nvme_dhchap_key * ctrl_key; /* 5032 8 */
u16 transaction; /* 5040 2 */

/* XXX 6 bytes hole, try to pack */

u64 ps_max_latency_us; /* 5048 8 */
/* --- cacheline 79 boundary (5056 bytes) --- */
bool apst_enabled; /* 5056 1 */

/* XXX 1 byte hole, try to pack */

u16 hmmaxd; /* 5058 2 */
u32 hmpre; /* 5060 4 */
u32 hmmin; /* 5064 4 */
u32 hmminds; /* 5068 4 */
u32 ioccsz; /* 5072 4 */
u32 iorcsz; /* 5076 4 */
u16 icdoff; /* 5080 2 */
u16 maxcmd; /* 5082 2 */
int nr_reconnects; /* 5084 4 */
long unsigned int flags; /* 5088 8 */
struct nvmf_ctrl_options * opts; /* 5096 8 */
struct page * discard_page; /* 5104 8 */
long unsigned int discard_page_busy; /* 5112 8 */
/* --- cacheline 80 boundary (5120 bytes) --- */
struct nvme_fault_inject fault_inject; /* 5120 216 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 83 boundary (5312 bytes) was 24 bytes ago --- */
enum nvme_ctrl_type cntrltype; /* 5336 4 */
enum nvme_dctype dctype; /* 5340 4 */

/* size: 5344, cachelines: 84, members: 104 */
/* sum members: 5323, holes: 7, sum holes: 21 */
/* paddings: 4, sum paddings: 15 */
/* forced alignments: 1 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
---
drivers/nvme/host/nvme.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bf46f122e9e1..53417b6439a7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -246,8 +246,8 @@ enum nvme_ctrl_flags {

struct nvme_ctrl {
bool comp_seen;
- enum nvme_ctrl_state state;
bool identified;
+ enum nvme_ctrl_state state;
spinlock_t lock;
struct mutex scan_lock;
const struct nvme_ctrl_ops *ops;
@@ -279,8 +279,8 @@ struct nvme_ctrl {
char name[12];
u16 cntlid;

- u32 ctrl_config;
u16 mtfa;
+ u32 ctrl_config;
u32 queue_count;

u64 cap;
@@ -353,10 +353,10 @@ struct nvme_ctrl {
bool apst_enabled;

/* PCIe only: */
+ u16 hmmaxd;
u32 hmpre;
u32 hmmin;
u32 hmminds;
- u16 hmmaxd;

/* Fabrics only */
u32 ioccsz;
--
2.34.1

2023-05-01 12:43:23

by Christophe JAILLET

Subject: [PATCH 5/5] nvmet: Reorder fields in 'struct nvmefc_fcp_req'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmefc_fcp_req' from
112 to 104 bytes.

This structure is embedded in some other structures (nvme_fc_fcp_op,
which itself is embedded in nvme_fcp_op_w_sgl), so it helps reduce the
size of these structures too.

Signed-off-by: Christophe JAILLET <[email protected]>
---
Using pahole

Before:
======
struct nvmefc_fcp_req {
void * cmdaddr; /* 0 8 */
void * rspaddr; /* 8 8 */
dma_addr_t cmddma; /* 16 8 */
dma_addr_t rspdma; /* 24 8 */
u16 cmdlen; /* 32 2 */
u16 rsplen; /* 34 2 */
u32 payload_length; /* 36 4 */
struct sg_table sg_table; /* 40 16 */
struct scatterlist * first_sgl; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
int sg_cnt; /* 64 4 */
enum nvmefc_fcp_datadir io_dir; /* 68 4 */
__le16 sqid; /* 72 2 */

/* XXX 6 bytes hole, try to pack */

void (*done)(struct nvmefc_fcp_req *); /* 80 8 */
void * private; /* 88 8 */
u32 transferred_length; /* 96 4 */
u16 rcv_rsplen; /* 100 2 */

/* XXX 2 bytes hole, try to pack */

u32 status; /* 104 4 */

/* size: 112, cachelines: 2, members: 17 */
/* sum members: 100, holes: 2, sum holes: 8 */
/* padding: 4 */
/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));


After:
=====
struct nvmefc_fcp_req {
void * cmdaddr; /* 0 8 */
void * rspaddr; /* 8 8 */
dma_addr_t cmddma; /* 16 8 */
dma_addr_t rspdma; /* 24 8 */
u16 cmdlen; /* 32 2 */
u16 rsplen; /* 34 2 */
u32 payload_length; /* 36 4 */
struct sg_table sg_table; /* 40 16 */
struct scatterlist * first_sgl; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
int sg_cnt; /* 64 4 */
enum nvmefc_fcp_datadir io_dir; /* 68 4 */
void (*done)(struct nvmefc_fcp_req *); /* 72 8 */
void * private; /* 80 8 */
__le16 sqid; /* 88 2 */
u16 rcv_rsplen; /* 90 2 */
u32 transferred_length; /* 92 4 */
u32 status; /* 96 4 */

/* size: 104, cachelines: 2, members: 17 */
/* padding: 4 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
---
include/linux/nvme-fc-driver.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index fa092b9be2fd..4109f1bd6128 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -185,7 +185,6 @@ enum nvmefc_fcp_datadir {
* @first_sgl: memory for 1st scatter/gather list segment for payload data
* @sg_cnt: number of elements in the scatter/gather list
* @io_dir: direction of the FCP request (see NVMEFC_FCP_xxx)
- * @sqid: The nvme SQID the command is being issued on
* @done: The callback routine the LLDD is to invoke upon completion of
* the FCP operation. req argument is the pointer to the original
* FCP IO operation.
@@ -194,12 +193,13 @@ enum nvmefc_fcp_datadir {
* while processing the operation. The length of the buffer
* corresponds to the fcprqst_priv_sz value specified in the
* nvme_fc_port_template supplied by the LLDD.
+ * @sqid: The nvme SQID the command is being issued on
*
* Values set by the LLDD indicating completion status of the FCP operation.
* Must be set prior to calling the done() callback.
+ * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
* @transferred_length: amount of payload data, in bytes, that were
* transferred. Should equal payload_length on success.
- * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
* @status: Completion status of the FCP operation. must be 0 upon success,
* negative errno value upon failure (ex: -EIO). Note: this is
* NOT a reflection of the NVME CQE completion status. Only the
@@ -219,14 +219,14 @@ struct nvmefc_fcp_req {
int sg_cnt;
enum nvmefc_fcp_datadir io_dir;

- __le16 sqid;
-
void (*done)(struct nvmefc_fcp_req *req);

void *private;

- u32 transferred_length;
+ __le16 sqid;
+
u16 rcv_rsplen;
+ u32 transferred_length;
u32 status;
} __aligned(sizeof(u64)); /* alignment for other things alloc'd with */

--
2.34.1

2023-05-01 12:43:55

by Christophe JAILLET

Subject: [PATCH 4/5] nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_dhchap_queue_context' from
416 to 400 bytes.

This structure is kvcalloc()'ed in nvme_auth_init_ctrl(), so it is likely
that the allocation can be relatively big. Saving 16 bytes per structure
can still make a slight difference.

Signed-off-by: Christophe JAILLET <[email protected]>
---
Using pahole

Before:
======
struct nvme_dhchap_queue_context {
struct list_head entry; /* 0 16 */
struct work_struct auth_work; /* 16 80 */
/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
struct nvme_ctrl * ctrl; /* 96 8 */
struct crypto_shash * shash_tfm; /* 104 8 */
struct crypto_kpp * dh_tfm; /* 112 8 */
void * buf; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
int qid; /* 128 4 */
int error; /* 132 4 */
u32 s1; /* 136 4 */
u32 s2; /* 140 4 */
u16 transaction; /* 144 2 */
u8 status; /* 146 1 */
u8 hash_id; /* 147 1 */

/* XXX 4 bytes hole, try to pack */

size_t hash_len; /* 152 8 */
u8 dhgroup_id; /* 160 1 */
u8 c1[64]; /* 161 64 */
/* --- cacheline 3 boundary (192 bytes) was 33 bytes ago --- */
u8 c2[64]; /* 225 64 */
/* --- cacheline 4 boundary (256 bytes) was 33 bytes ago --- */
u8 response[64]; /* 289 64 */

/* XXX 7 bytes hole, try to pack */

/* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
u8 * host_response; /* 360 8 */
u8 * ctrl_key; /* 368 8 */
int ctrl_key_len; /* 376 4 */

/* XXX 4 bytes hole, try to pack */

/* --- cacheline 6 boundary (384 bytes) --- */
u8 * host_key; /* 384 8 */
int host_key_len; /* 392 4 */

/* XXX 4 bytes hole, try to pack */

u8 * sess_key; /* 400 8 */
int sess_key_len; /* 408 4 */

/* size: 416, cachelines: 7, members: 25 */
/* sum members: 393, holes: 4, sum holes: 19 */
/* padding: 4 */
/* last cacheline: 32 bytes */
};


After:
=====
struct nvme_dhchap_queue_context {
struct list_head entry; /* 0 16 */
struct work_struct auth_work; /* 16 80 */
/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
struct nvme_ctrl * ctrl; /* 96 8 */
struct crypto_shash * shash_tfm; /* 104 8 */
struct crypto_kpp * dh_tfm; /* 112 8 */
void * buf; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
int qid; /* 128 4 */
int error; /* 132 4 */
u32 s1; /* 136 4 */
u32 s2; /* 140 4 */
u16 transaction; /* 144 2 */
u8 status; /* 146 1 */
u8 dhgroup_id; /* 147 1 */
u8 hash_id; /* 148 1 */

/* XXX 3 bytes hole, try to pack */

size_t hash_len; /* 152 8 */
u8 c1[64]; /* 160 64 */
/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
u8 c2[64]; /* 224 64 */
/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
u8 response[64]; /* 288 64 */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
u8 * host_response; /* 352 8 */
u8 * ctrl_key; /* 360 8 */
u8 * host_key; /* 368 8 */
u8 * sess_key; /* 376 8 */
/* --- cacheline 6 boundary (384 bytes) --- */
int ctrl_key_len; /* 384 4 */
int host_key_len; /* 388 4 */
int sess_key_len; /* 392 4 */

/* size: 400, cachelines: 7, members: 25 */
/* sum members: 393, holes: 1, sum holes: 3 */
/* padding: 4 */
/* last cacheline: 16 bytes */
};
---
drivers/nvme/host/auth.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index ea16a0aba679..daf5d144a8ea 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -30,18 +30,18 @@ struct nvme_dhchap_queue_context {
u32 s2;
u16 transaction;
u8 status;
+ u8 dhgroup_id;
u8 hash_id;
size_t hash_len;
- u8 dhgroup_id;
u8 c1[64];
u8 c2[64];
u8 response[64];
u8 *host_response;
u8 *ctrl_key;
- int ctrl_key_len;
u8 *host_key;
- int host_key_len;
u8 *sess_key;
+ int ctrl_key_len;
+ int host_key_len;
int sess_key_len;
};

--
2.34.1

2023-05-01 13:59:24

by Sagi Grimberg

Subject: Re: [PATCH 0/5] optimize some data structure in nvme


> This series is a proposal to slightly optimize the memory needed for some
> structures used in nvme.
>
> This follows the discussion in [1].
>
> Honestly, I'm not convinced that this series really brings something.
> Because of the way memory allocation works, and its over-allocation to try to
> avoid memory fragmentation, such limited gains are most of the time useless.
>
> It could still help:
> - removing holes in a structure can, at some point, bring its size below an
> allocation threshold (this is especially true if such structures are
> allocated with kcalloc() or kmalloc_array())
> - it can save some space in other structures that embed it
> - it can save a few cycles if the structure is memcpy()'ed or zeroed, for
> example
> - it can reduce cache usage
>
> With that in mind, patch 3 is a win, patch 4 is likely a win, and the other
> ones are much more theoretical.
>
> The changes are really limited, so even if the gain is marginal, maybe it still
> makes sense to merge them.

Don't see why not, given they do make the structures smaller.

Reviewed-by: Sagi Grimberg <[email protected]>

2023-05-12 20:44:16

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/5] optimize some data structure in nvme

Thanks,

the whole series looks good to me:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-05-13 01:07:54

by Chaitanya Kulkarni

[permalink] [raw]
Subject: Re: [PATCH 0/5] optimize some data structure in nvme

On 5/1/23 05:40, Christophe JAILLET wrote:
> This series is a proposal to slightly optimize the memory needed for some
> structures used in nvme.
>
> This follows the discussion in [1].
>
> Honestly, I'm not convinced that this series really brings something.
> Because of the way memory allocation works, and its over-allocation to try to
> avoid memory fragmentation, such limited gains are most of the time useless.
>
> It could still help:
> - a structure with many holes can, at some point, have its size reach a threshold
> (this is especially true if such structures are allocated with kcalloc() or
> kmalloc_array())
> - it can save some space in other structures if embedded in them
> - it can save a few cycles if the structure is memcpy()'ed or zeroed, for
> example
> - it can reduce cache usage
>
> With that in mind, patch 3 is a win, patch 4 is likely a win, the other ones are
> much more theoretical.
>
> The changes are really limited, so even if the gain is marginal, maybe it still
> makes sense to merge them.
>
> Each patch gives the layout generated by pahole before and after the patch.
>
> [1]: https://lore.kernel.org/all/[email protected]/
>
> Christophe JAILLET (5):
> nvmet: Reorder fields in 'struct nvmet_sq'
> nvmet: Reorder fields in 'struct nvme_ctrl'
> nvmet: Reorder fields in 'struct nvmf_ctrl_options'
> nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'
> nvmet: Reorder fields in 'struct nvmefc_fcp_req'
>
> drivers/nvme/host/auth.c | 6 +++---
> drivers/nvme/host/fabrics.h | 8 ++++----
> drivers/nvme/host/nvme.h | 6 +++---
> drivers/nvme/target/nvmet.h | 4 ++--
> include/linux/nvme-fc-driver.h | 10 +++++-----
> 5 files changed, 17 insertions(+), 17 deletions(-)
>

Thanks a lot for doing this and for following up on the comment on the other patch.

Looks good.

Reviewed-by: Chaitanya Kulkarni <[email protected]>

-ck


2023-05-15 14:25:52

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'

On Mon, May 01, 2023 at 02:40:26PM +0200, Christophe JAILLET wrote:
> Group some variables based on their sizes to reduce holes.
> On x86_64, this shrinks the size of 'struct nvmet_sq' from 5368 to 5344
> bytes when all CONFIG_* are defined.

This patch is changing struct nvme_ctrl but the commit log only mentions
struct nvmet_sq, which was handled in patch 1/5. I'll just fix that up
when applying.

> This structure is embedded into some other structures, so it helps reducing
> their size as well.
>
> Signed-off-by: Christophe JAILLET <[email protected]>
> ---
> Using pahole
>
> Before:
> ======
> struct nvme_ctrl {
> bool comp_seen; /* 0 1 */

2023-05-15 17:32:23

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH 0/5] optimize some data structure in nvme

Thanks, applied to nvme-6.5.

2023-05-15 17:33:34

by Christophe JAILLET

[permalink] [raw]
Subject: Re: [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'

On 15/05/2023 at 16:21, Keith Busch wrote:
> On Mon, May 01, 2023 at 02:40:26PM +0200, Christophe JAILLET wrote:
>> Group some variables based on their sizes to reduce holes.
>> On x86_64, this shrinks the size of 'struct nvmet_sq' from 5368 to 5344
>> bytes when all CONFIG_* are defined.
>
> This patch is changing struct nvme_ctrl but the commit log only mentions
> struct nvmet_sq, which was handled in patch 1/5. I'll just fix that up
> when applying.
>

Thanks for catching and fixing it :).

CJ