virtio-mem currently supports only device block sizes that span at most
a single Linux memory block. For example, gigantic pages in the hypervisor
result in a device block size of 1 GiB on x86-64 - when the Linux memory
block size is 128 MiB, we cannot support such devices and fail loading the
driver. Of course, we want to support any device block size in any Linux
VM.
Bigger device block sizes will become especially important once QEMU
supports VFIO with virtio-mem - each device block has to be mapped
separately, and the maximum number of VFIO mappings is 64k. So we usually
want device blocks in the gigabyte range when growing the VM large.
This series:
- Performs some cleanups
- Factors out existing Sub Block Mode (SBM)
- Implements memory hot(un)plug in Big Block Mode (BBM)
I need one core-mm change, to make offline_and_remove_memory() eat bigger
chunks.
This series is based on "next-20201009" and can be found at:
[email protected]:virtio-mem/linux.git virtio-mem-dbm-v1
Once some virtio-mem patches that are pending in the -mm tree are upstream
(I assume they'll land in 5.10), I'll resend based on Linus' tree.
I suggest taking this (including the MM patch - acks/reviews appreciated)
via the vhost tree once the time has come. In the meantime, I'll do more
testing.
David Hildenbrand (29):
virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
virtio-mem: simplify calculation in
virtio_mem_mb_state_prepare_next_mb()
virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
virtio-mem: generalize check for added memory
virtio-mem: generalize virtio_mem_owned_mb()
virtio-mem: generalize virtio_mem_overlaps_range()
virtio-mem: drop last_mb_id
virtio-mem: don't always trigger the workqueue when offlining memory
virtio-mem: generalize handling when memory is getting onlined
deferred
virtio-mem: use "unsigned long" for nr_pages when fake
onlining/offlining
virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
virtio-mem: factor out handling of fake-offline pages in memory
notifier
virtio-mem: retry fake-offlining via alloc_contig_range() on
ZONE_MOVABLE
virtio-mem: document Sub Block Mode (SBM)
virtio-mem: memory block states are specific to Sub Block Mode (SBM)
virtio-mem: subblock states are specific to Sub Block Mode (SBM)
virtio-mem: factor out calculation of the bit number within the
sb_states bitmap
virtio-mem: existing (un)plug functions are specific to Sub Block Mode
(SBM)
virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block
Mode (SBM)
virtio-mem: memory notifier callbacks are specific to Sub Block Mode
(SBM)
virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
virtio-mem: factor out adding/removing memory from Linux
virtio-mem: print debug messages from virtio_mem_send_*_request()
virtio-mem: Big Block Mode (BBM) memory hotplug
virtio-mem: allow to force Big Block Mode (BBM) and set the big block
size
mm/memory_hotplug: extend offline_and_remove_memory() to handle more
than one memory block
virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
drivers/virtio/virtio_mem.c | 1783 +++++++++++++++++++++++++----------
mm/memory_hotplug.c | 105 ++-
2 files changed, 1373 insertions(+), 515 deletions(-)
--
2.26.2
We don't want to add too much memory when it's not getting onlined
immediately, to avoid running OOM. Generalize the handling, to avoid
making use of memory block states. Use a threshold of 1 GiB for now.
Properly adjust the offline size when adding/removing memory. As we are
not always protected by a lock when touching the offline size, use an
atomic64_t. We don't care about races (e.g., someone offlining memory
while we are adding more), only about consistent values.
(1 GiB needs a memmap of ~16 MiB - which sounds reasonable even for
setups with little boot memory and (possibly) one virtio-mem device per
node)
We don't want to retrigger when onlining is caused immediately by our
action (e.g., adding memory which immediately gets onlined), so use a
flag to indicate if the workqueue is active and use that as an
indicator whether to trigger a retry.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 95 ++++++++++++++++++++++++-------------
1 file changed, 63 insertions(+), 32 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 8ea00f0b2ecd..cb2e8f254650 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -51,6 +51,7 @@ struct virtio_mem {
/* Workqueue that processes the plug/unplug requests. */
struct work_struct wq;
+ atomic_t wq_active;
atomic_t config_changed;
/* Virtqueue for guest->host requests. */
@@ -99,7 +100,15 @@ struct virtio_mem {
/* Summary of all memory block states. */
unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
-#define VIRTIO_MEM_NB_OFFLINE_THRESHOLD 10
+
+ /*
+ * We don't want to add too much memory if it's not getting onlined,
+ * to avoid running OOM. Besides this threshold, we allow having at
+ * least two offline blocks at a time (whichever is bigger).
+ */
+#define VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD (1024 * 1024 * 1024)
+ atomic64_t offline_size;
+ uint64_t offline_threshold;
/*
* One byte state per memory block.
@@ -393,6 +402,18 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
return 0;
}
+/*
+ * Test if we could add memory without creating too much offline memory -
+ * to avoid running OOM if memory is getting onlined deferred.
+ */
+static bool virtio_mem_could_add_memory(struct virtio_mem *vm, uint64_t size)
+{
+ if (WARN_ON_ONCE(size > vm->offline_threshold))
+ return false;
+
+ return atomic64_read(&vm->offline_size) + size <= vm->offline_threshold;
+}
+
/*
* Try to add a memory block to Linux. This will usually only fail
* if out of memory.
@@ -405,6 +426,8 @@ static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
{
const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+ const uint64_t size = memory_block_size_bytes();
+ int rc;
/*
* When force-unloading the driver and we still have memory added to
@@ -418,10 +441,13 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
}
dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
- return add_memory_driver_managed(vm->nid, addr,
- memory_block_size_bytes(),
- vm->resource_name,
- MEMHP_MERGE_RESOURCE);
+ /* Memory might get onlined immediately. */
+ atomic64_add(size, &vm->offline_size);
+ rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
+ MEMHP_MERGE_RESOURCE);
+ if (rc)
+ atomic64_sub(size, &vm->offline_size);
+ return rc;
}
/*
@@ -436,16 +462,19 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id)
{
const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+ const uint64_t size = memory_block_size_bytes();
int rc;
dev_dbg(&vm->vdev->dev, "removing memory block: %lu\n", mb_id);
- rc = remove_memory(vm->nid, addr, memory_block_size_bytes());
- if (!rc)
+ rc = remove_memory(vm->nid, addr, size);
+ if (!rc) {
+ atomic64_sub(size, &vm->offline_size);
/*
* We might have freed up memory we can now unplug, retry
* immediately instead of waiting.
*/
virtio_mem_retry(vm);
+ }
return rc;
}
@@ -461,18 +490,20 @@ static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm,
unsigned long mb_id)
{
const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id);
+ const uint64_t size = memory_block_size_bytes();
int rc;
dev_dbg(&vm->vdev->dev, "offlining and removing memory block: %lu\n",
mb_id);
- rc = offline_and_remove_memory(vm->nid, addr,
- memory_block_size_bytes());
- if (!rc)
+ rc = offline_and_remove_memory(vm->nid, addr, size);
+ if (!rc) {
+ atomic64_sub(size, &vm->offline_size);
/*
* We might have freed up memory we can now unplug, retry
* immediately instead of waiting.
*/
virtio_mem_retry(vm);
+ }
return rc;
}
@@ -555,8 +586,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
{
- unsigned long nb_offline;
-
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
virtio_mem_mb_set_state(vm, mb_id,
@@ -569,12 +598,6 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
BUG();
break;
}
- nb_offline = vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] +
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL];
-
- /* see if we can add new blocks now that we onlined one block */
- if (nb_offline == VIRTIO_MEM_NB_OFFLINE_THRESHOLD - 1)
- virtio_mem_retry(vm);
}
static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
@@ -688,6 +711,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
case MEM_OFFLINE:
virtio_mem_notify_offline(vm, mb_id);
+ atomic64_add(size, &vm->offline_size);
/*
* Trigger the workqueue. Now that we have some offline memory,
* maybe we can handle pending unplug requests.
@@ -700,6 +724,18 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
case MEM_ONLINE:
virtio_mem_notify_online(vm, mb_id);
+
+ atomic64_sub(size, &vm->offline_size);
+ /*
+ * Start adding more memory once we onlined half of our
+ * threshold. Don't trigger if it's possibly due to our action
+ * (e.g., us adding memory which gets onlined immediately from
+ * the core).
+ */
+ if (!atomic_read(&vm->wq_active) &&
+ virtio_mem_could_add_memory(vm, vm->offline_threshold / 2))
+ virtio_mem_retry(vm);
+
vm->hotplug_active = false;
mutex_unlock(&vm->hotplug_mutex);
break;
@@ -1060,18 +1096,6 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
return 0;
}
-/*
- * Don't add too many blocks that are not onlined yet to avoid running OOM.
- */
-static bool virtio_mem_too_many_mb_offline(struct virtio_mem *vm)
-{
- unsigned long nb_offline;
-
- nb_offline = vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] +
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL];
- return nb_offline >= VIRTIO_MEM_NB_OFFLINE_THRESHOLD;
-}
-
/*
* Try to plug the desired number of subblocks and add the memory block
* to Linux.
@@ -1225,7 +1249,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
/* Try to plug and add unused blocks */
virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_UNUSED) {
- if (virtio_mem_too_many_mb_offline(vm))
+ if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
return -ENOSPC;
rc = virtio_mem_mb_plug_and_add(vm, mb_id, &nb_sb);
@@ -1236,7 +1260,7 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
/* Try to prepare, plug and add new blocks */
while (nb_sb) {
- if (virtio_mem_too_many_mb_offline(vm))
+ if (!virtio_mem_could_add_memory(vm, memory_block_size_bytes()))
return -ENOSPC;
rc = virtio_mem_prepare_next_mb(vm, &mb_id);
@@ -1536,6 +1560,7 @@ static void virtio_mem_run_wq(struct work_struct *work)
if (vm->broken)
return;
+ atomic_set(&vm->wq_active, 1);
retry:
rc = 0;
@@ -1596,6 +1621,8 @@ static void virtio_mem_run_wq(struct work_struct *work)
"unknown error, marking device broken: %d\n", rc);
vm->broken = true;
}
+
+ atomic_set(&vm->wq_active, 0);
}
static enum hrtimer_restart virtio_mem_timer_expired(struct hrtimer *timer)
@@ -1704,6 +1731,10 @@ static int virtio_mem_init(struct virtio_mem *vm)
memory_block_size_bytes());
vm->next_mb_id = vm->first_mb_id;
+ /* Prepare the offline threshold - make sure we can add two blocks. */
+ vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
+ VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
+
dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
dev_info(&vm->vdev->dev, "device block size: 0x%llx",
--
2.26.2
Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
possible, so we don't have to deal with allocation orders.
Add a comment explaining why we have that restriction for now.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 670b3faf412d..78c2fbcddcf8 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -755,14 +755,15 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
*/
static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
{
- const int order = MAX_ORDER - 1;
+ const unsigned long max_nr_pages = MAX_ORDER_NR_PAGES;
int i;
/*
- * We are always called with subblock granularity, which is at least
- * aligned to MAX_ORDER - 1.
+ * We are always called at least with MAX_ORDER_NR_PAGES
+ * granularity/alignment (e.g., the way subblocks work). All pages
+ * inside such a block are alike.
*/
- for (i = 0; i < nr_pages; i += 1 << order) {
+ for (i = 0; i < nr_pages; i += max_nr_pages) {
struct page *page = pfn_to_page(pfn + i);
/*
@@ -772,14 +773,14 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages)
* alike.
*/
if (PageDirty(page)) {
- virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+ virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
false);
- generic_online_page(page, order);
+ generic_online_page(page, MAX_ORDER - 1);
} else {
- virtio_mem_clear_fake_offline(pfn + i, 1 << order,
+ virtio_mem_clear_fake_offline(pfn + i, max_nr_pages,
true);
- free_contig_range(pfn + i, 1 << order);
- adjust_managed_page_count(page, 1 << order);
+ free_contig_range(pfn + i, max_nr_pages);
+ adjust_managed_page_count(page, max_nr_pages);
}
}
}
@@ -792,7 +793,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
int sb_id;
/*
- * We exploit here that subblocks have at least MAX_ORDER - 1
+ * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
* size/alignment and that this callback is called with such a
* size/alignment. So we cannot cross subblocks and therefore
* also not memory blocks.
@@ -1675,13 +1676,15 @@ static int virtio_mem_init(struct virtio_mem *vm)
"Some memory is not addressable. This can make some memory unusable.\n");
/*
- * Calculate the subblock size:
- * - At least MAX_ORDER - 1 / pageblock_order.
- * - At least the device block size.
- * In the worst case, a single subblock per memory block.
+ * We want subblocks to span at least MAX_ORDER_NR_PAGES and
+ * pageblock_nr_pages pages. This:
+ * - Simplifies our page onlining code (virtio_mem_online_page_cb)
+ * and fake page onlining code (virtio_mem_fake_online).
+ * - Is required for now for alloc_contig_range() to work reliably -
+ * it doesn't properly handle smaller granularity on ZONE_NORMAL.
*/
- vm->subblock_size = PAGE_SIZE * 1ul << max_t(uint32_t, MAX_ORDER - 1,
- pageblock_order);
+ vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+ pageblock_nr_pages) * PAGE_SIZE;
vm->subblock_size = max_t(uint64_t, vm->device_block_size,
vm->subblock_size);
vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
--
2.26.2
No longer used, let's drop it.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 37a0e338ae4a..5c93f8a65eba 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -84,8 +84,6 @@ struct virtio_mem {
/* Id of the first memory block of this device. */
unsigned long first_mb_id;
- /* Id of the last memory block of this device. */
- unsigned long last_mb_id;
/* Id of the last usable memory block of this device. */
unsigned long last_usable_mb_id;
/* Id of the next memory bock to prepare when needed. */
@@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
memory_block_size_bytes());
vm->next_mb_id = vm->first_mb_id;
- vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
- vm->region_size) - 1;
dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
--
2.26.2
The calculation is already complicated enough, let's limit it to one
location.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 2cc497ad8298..73ff6e9ba839 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
_mb_id--) \
if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
+/*
+ * Calculate the bit number in the sb_states bitmap for the given subblock
+ * inside the given memory block.
+ */
+static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
+ unsigned long mb_id, int sb_id)
+{
+ return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+}
+
/*
* Mark all selected subblocks plugged.
*
@@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
unsigned long mb_id, int sb_id,
int count)
{
- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
__bitmap_set(vm->sbm.sb_states, bit, count);
}
@@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
unsigned long mb_id, int sb_id,
int count)
{
- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
__bitmap_clear(vm->sbm.sb_states, bit, count);
}
@@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
unsigned long mb_id, int sb_id,
int count)
{
- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
if (count == 1)
return test_bit(bit, vm->sbm.sb_states);
@@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
unsigned long mb_id, int sb_id,
int count)
{
- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
/* TODO: Helper similar to bitmap_set() */
return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
@@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
unsigned long mb_id)
{
- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
return find_next_zero_bit(vm->sbm.sb_states,
bit + vm->nb_sb_per_mb, bit) - bit;
--
2.26.2
Let's rename accordingly.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 3a772714fec9..d06c8760b337 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
return start >= vm->addr && start + size <= vm->addr + vm->region_size;
}
-static int virtio_mem_notify_going_online(struct virtio_mem *vm,
- unsigned long mb_id)
+static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
+ unsigned long mb_id)
{
switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
@@ -604,8 +604,8 @@ static int virtio_mem_notify_going_online(struct virtio_mem *vm,
return NOTIFY_BAD;
}
-static void virtio_mem_notify_offline(struct virtio_mem *vm,
- unsigned long mb_id)
+static void virtio_mem_sbm_notify_offline(struct virtio_mem *vm,
+ unsigned long mb_id)
{
switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
case VIRTIO_MEM_SBM_MB_ONLINE_PARTIAL:
@@ -622,7 +622,8 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
}
}
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
+static void virtio_mem_sbm_notify_online(struct virtio_mem *vm,
+ unsigned long mb_id)
{
switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) {
case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL:
@@ -639,8 +640,8 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
}
}
-static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
- unsigned long mb_id)
+static void virtio_mem_sbm_notify_going_offline(struct virtio_mem *vm,
+ unsigned long mb_id)
{
const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
unsigned long pfn;
@@ -655,8 +656,8 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
}
}
-static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
- unsigned long mb_id)
+static void virtio_mem_sbm_notify_cancel_offline(struct virtio_mem *vm,
+ unsigned long mb_id)
{
const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
unsigned long pfn;
@@ -716,7 +717,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- virtio_mem_notify_going_offline(vm, mb_id);
+ virtio_mem_sbm_notify_going_offline(vm, mb_id);
break;
case MEM_GOING_ONLINE:
mutex_lock(&vm->hotplug_mutex);
@@ -726,10 +727,10 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- rc = virtio_mem_notify_going_online(vm, mb_id);
+ rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
break;
case MEM_OFFLINE:
- virtio_mem_notify_offline(vm, mb_id);
+ virtio_mem_sbm_notify_offline(vm, mb_id);
atomic64_add(size, &vm->offline_size);
/*
@@ -743,7 +744,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
mutex_unlock(&vm->hotplug_mutex);
break;
case MEM_ONLINE:
- virtio_mem_notify_online(vm, mb_id);
+ virtio_mem_sbm_notify_online(vm, mb_id);
atomic64_sub(size, &vm->offline_size);
/*
@@ -762,7 +763,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
case MEM_CANCEL_OFFLINE:
if (!vm->hotplug_active)
break;
- virtio_mem_notify_cancel_offline(vm, mb_id);
+ virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
vm->hotplug_active = false;
mutex_unlock(&vm->hotplug_mutex);
break;
--
2.26.2
Let's rename and move accordingly. While at it, rename sb_bitmap to
"sb_states".
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
1 file changed, 62 insertions(+), 56 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e76d6f769aa5..2cc497ad8298 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -137,17 +137,23 @@ struct virtio_mem {
* memory in one 4 KiB page.
*/
uint8_t *mb_states;
- } sbm;
- /*
- * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
- *
- * With 4MB subblocks, we manage 128GB of memory in one page.
- */
- unsigned long *sb_bitmap;
+ /*
+ * Bitmap: one bit per subblock. Allocated similar to
+ * sbm.mb_states.
+ *
+ * A set bit means the corresponding subblock is plugged,
+ * otherwise it's unplugged.
+ *
+ * With 4 MiB subblocks, we manage 128 GiB of memory in one
+ * 4 KiB page.
+ */
+ unsigned long *sb_states;
+ } sbm;
/*
- * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
+ * Mutex that protects the sbm.mb_count, sbm.mb_states, and
+ * sbm.sb_states.
*
* When this lock is held the pointers can't change, ONLINE and
* OFFLINE blocks can't change the state and no subblocks will get
@@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
*
* Will not modify the state of the memory block.
*/
-static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
- unsigned long mb_id, int sb_id,
- int count)
+static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
+ unsigned long mb_id, int sb_id,
+ int count)
{
const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
- __bitmap_set(vm->sb_bitmap, bit, count);
+ __bitmap_set(vm->sbm.sb_states, bit, count);
}
/*
@@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
*
* Will not modify the state of the memory block.
*/
-static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
- unsigned long mb_id, int sb_id,
- int count)
+static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
+ unsigned long mb_id, int sb_id,
+ int count)
{
const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
- __bitmap_clear(vm->sb_bitmap, bit, count);
+ __bitmap_clear(vm->sbm.sb_states, bit, count);
}
/*
* Test if all selected subblocks are plugged.
*/
-static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
- unsigned long mb_id, int sb_id,
- int count)
+static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
+ unsigned long mb_id, int sb_id,
+ int count)
{
const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
if (count == 1)
- return test_bit(bit, vm->sb_bitmap);
+ return test_bit(bit, vm->sbm.sb_states);
/* TODO: Helper similar to bitmap_set() */
- return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
+ return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
bit + count;
}
/*
* Test if all selected subblocks are unplugged.
*/
-static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
- unsigned long mb_id, int sb_id,
- int count)
+static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
+ unsigned long mb_id, int sb_id,
+ int count)
{
const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
/* TODO: Helper similar to bitmap_set() */
- return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
+ return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
+ bit + count;
}
/*
* Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
* none.
*/
-static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
+static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
unsigned long mb_id)
{
const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
- return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
- bit;
+ return find_next_zero_bit(vm->sbm.sb_states,
+ bit + vm->nb_sb_per_mb, bit) - bit;
}
/*
* Prepare the subblock bitmap for the next memory block.
*/
-static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
+static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
{
const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
- unsigned long *new_sb_bitmap, *old_sb_bitmap;
+ unsigned long *new_bitmap, *old_bitmap;
- if (vm->sb_bitmap && old_pages == new_pages)
+ if (vm->sbm.sb_states && old_pages == new_pages)
return 0;
- new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
- if (!new_sb_bitmap)
+ new_bitmap = vzalloc(new_pages * PAGE_SIZE);
+ if (!new_bitmap)
return -ENOMEM;
mutex_lock(&vm->hotplug_mutex);
- if (new_sb_bitmap)
- memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
+ if (new_bitmap)
+ memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
- old_sb_bitmap = vm->sb_bitmap;
- vm->sb_bitmap = new_sb_bitmap;
+ old_bitmap = vm->sbm.sb_states;
+ vm->sbm.sb_states = new_bitmap;
mutex_unlock(&vm->hotplug_mutex);
- vfree(old_sb_bitmap);
+ vfree(old_bitmap);
return 0;
}
@@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
int sb_id;
for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
continue;
pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
sb_id * vm->subblock_size);
@@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
int sb_id;
for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
continue;
pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
sb_id * vm->subblock_size);
@@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
* If plugged, online the pages, otherwise, set them fake
* offline (PageOffline).
*/
- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
generic_online_page(page, order);
else
virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
@@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
rc = virtio_mem_send_plug_request(vm, addr, size);
if (!rc)
- virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
+ virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
return rc;
}
@@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
rc = virtio_mem_send_unplug_request(vm, addr, size);
if (!rc)
- virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
+ virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
return rc;
}
@@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
while (*nb_sb) {
/* Find the next candidate subblock */
while (sb_id >= 0 &&
- virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
+ virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
sb_id--;
if (sb_id < 0)
break;
/* Try to unplug multiple subblocks at a time */
count = 1;
while (count < *nb_sb && sb_id > 0 &&
- virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
count++;
sb_id--;
}
@@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
return rc;
/* Resize the subblock bitmap if required. */
- rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
+ rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
if (rc)
return rc;
@@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
return -EINVAL;
while (*nb_sb) {
- sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
+ sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
if (sb_id >= vm->nb_sb_per_mb)
break;
count = 1;
while (count < *nb_sb &&
sb_id + count < vm->nb_sb_per_mb &&
- !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
- 1))
+ !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
count++;
rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
@@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
virtio_mem_fake_online(pfn, nr_pages);
}
- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
if (online)
virtio_mem_sbm_set_mb_state(vm, mb_id,
VIRTIO_MEM_SBM_MB_ONLINE);
@@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
/* some subblocks might have been unplugged even on failure */
- if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
+ if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
virtio_mem_sbm_set_mb_state(vm, mb_id,
VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
if (rc)
return rc;
- if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
/*
* Remove the block from Linux - this should never fail.
* Hinder the block from getting onlined by marking it
@@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
/* If possible, try to unplug the complete block in one shot. */
if (*nb_sb >= vm->nb_sb_per_mb &&
- virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
vm->nb_sb_per_mb);
if (!rc) {
@@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
/* Find the next candidate subblock */
while (sb_id >= 0 &&
- !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
+ !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
sb_id--;
if (sb_id < 0)
break;
@@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
* remove it. This will usually not fail, as no memory is in use
* anymore - however some other notifiers might NACK the request.
*/
- if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
mutex_unlock(&vm->hotplug_mutex);
rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
mutex_lock(&vm->hotplug_mutex);
@@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
/* remove all tracking data - no locking needed */
vfree(vm->sbm.mb_states);
- vfree(vm->sb_bitmap);
+ vfree(vm->sbm.sb_states);
/* reset the device and cleanup the queues */
vdev->config->reset(vdev);
--
2.26.2
Let's add some documentation for the current mode - Sub Block Mode (SBM) -
to prepare for a new mode - Big Block Mode (BBM).
Follow-up patches will properly factor out the existing Sub Block Mode
(SBM) and implement Big Block Mode (BBM).
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index faeb759687fe..fd8685673fe4 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -27,6 +27,21 @@ static bool unplug_online = true;
module_param(unplug_online, bool, 0644);
MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
+/*
+ * virtio-mem currently supports the following modes of operation:
+ *
+ * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
+ * size of a Sub Block (SB) is determined based on the device block size, the
+ * pageblock size, and the maximum allocation granularity of the buddy.
+ * Subblocks within a Linux memory block might either be plugged or unplugged.
+ * Memory is added/removed to Linux MM in Linux memory block granularity.
+ *
+ * User space / core MM (auto onlining) is responsible for onlining added
+ * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
+ * always onlined separately, and all memory within a Linux memory block is
+ * onlined to the same zone - virtio-mem relies on this behavior.
+ */
+
enum virtio_mem_mb_state {
/* Unplugged, not added to Linux. Can be reused later. */
VIRTIO_MEM_MB_STATE_UNUSED = 0,
--
2.26.2
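The sizing rule described in the doc comment above (subblock size derived from the device block size, the pageblock size, and the maximum buddy allocation granularity) can be sketched as plain userspace C. This is a model, not driver code; the constants are illustrative x86-64 defaults (4 KiB pages, 4 MiB maximum buddy allocation, 2 MiB pageblocks, 128 MiB Linux memory blocks), and the `sketch_` names are invented for this example:

```c
#include <stdint.h>

/* Illustrative x86-64 defaults - real values come from the running kernel. */
#define SKETCH_PAGE_SIZE          4096ULL
#define SKETCH_MAX_ORDER_NR_PAGES 1024ULL  /* 4 MiB buddy granularity */
#define SKETCH_PAGEBLOCK_NR_PAGES 512ULL   /* 2 MiB pageblocks */
#define SKETCH_MEMORY_BLOCK_SIZE  (128ULL << 20)

static uint64_t sketch_max(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* A subblock must cover a pageblock and the maximum buddy allocation. */
uint64_t sketch_sb_size(uint64_t device_block_size)
{
	uint64_t sb_size = sketch_max(SKETCH_MAX_ORDER_NR_PAGES,
				      SKETCH_PAGEBLOCK_NR_PAGES) *
			   SKETCH_PAGE_SIZE;

	return sketch_max(device_block_size, sb_size);
}

/* SBM only makes sense if more than one subblock fits a memory block. */
int sketch_uses_sbm(uint64_t device_block_size)
{
	return sketch_sb_size(device_block_size) < SKETCH_MEMORY_BLOCK_SIZE;
}

uint32_t sketch_sbs_per_mb(uint64_t device_block_size)
{
	return SKETCH_MEMORY_BLOCK_SIZE / sketch_sb_size(device_block_size);
}
```

With a 2 MiB device block, the 4 MiB buddy granularity dominates, giving 32 subblocks per 128 MiB memory block; a 1 GiB device block exceeds the memory block size and falls outside SBM.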
Currently, we do not support device block sizes that exceed the Linux
memory block size. For example, having a device block size of 1 GiB (e.g.,
gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
blocks.
Let's implement Big Block Mode (BBM), whereby we add/remove at least
one Linux memory block at a time. With a 1 GiB device block size, a Big
Block (BB) will cover 8 Linux memory blocks.
We'll keep registering the online_page_callback machinery; it will be used
for safe memory hotunplug in BBM next.
Note: BBM is properly prepared for variable-sized Linux memory
blocks that we might see in the future. So we won't care how many Linux
memory blocks a big block actually spans, or how the memory notifier is
called.
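The address/id translation BBM relies on (mirroring virtio_mem_phys_to_bb_id() and virtio_mem_bb_id_to_phys() in the patch below) reduces to integer division by the big block size. A userspace sketch, assuming an illustrative 128 MiB Linux memory block size and invented `sketch_` names:

```c
#include <stdint.h>

#define SKETCH_MEMORY_BLOCK_SIZE (128ULL << 20) /* illustrative: 128 MiB */

/* Big block ids are global: addr 0 is bb_id 0 of a zero-based id space. */
uint64_t sketch_phys_to_bb_id(uint64_t bb_size, uint64_t addr)
{
	return addr / bb_size;
}

uint64_t sketch_bb_id_to_phys(uint64_t bb_size, uint64_t bb_id)
{
	return bb_id * bb_size;
}

/* How many Linux memory blocks a single big block covers. */
uint64_t sketch_mbs_per_bb(uint64_t bb_size)
{
	return bb_size / SKETCH_MEMORY_BLOCK_SIZE;
}
```

With a 1 GiB big block, sketch_mbs_per_bb() yields the 8 Linux memory blocks mentioned above.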
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
1 file changed, 402 insertions(+), 82 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e68d0d99590c..4d396ef98a92 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
/*
* virtio-mem currently supports the following modes of operation:
*
- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
* size of a Sub Block (SB) is determined based on the device block size, the
* pageblock size, and the maximum allocation granularity of the buddy.
* Subblocks within a Linux memory block might either be plugged or unplugged.
* Memory is added/removed to Linux MM in Linux memory block granularity.
*
+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
+ * Memory is added/removed to Linux MM in Big Block granularity.
+ *
+ * The mode is determined automatically based on the Linux memory block size
+ * and the device block size.
+ *
* User space / core MM (auto onlining) is responsible for onlining added
* Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
* always onlined separately, and all memory within a Linux memory block is
@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
VIRTIO_MEM_SBM_MB_COUNT
};
+/*
+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
+ */
+enum virtio_mem_bbm_bb_state {
+ /* Unplugged, not added to Linux. Can be reused later. */
+ VIRTIO_MEM_BBM_BB_UNUSED = 0,
+ /* Plugged, not added to Linux. Error on add_memory(). */
+ VIRTIO_MEM_BBM_BB_PLUGGED,
+ /* Plugged and added to Linux. */
+ VIRTIO_MEM_BBM_BB_ADDED,
+ VIRTIO_MEM_BBM_BB_COUNT
+};
+
struct virtio_mem {
struct virtio_device *vdev;
@@ -113,6 +132,9 @@ struct virtio_mem {
atomic64_t offline_size;
uint64_t offline_threshold;
+ /* If set, the driver is in SBM, otherwise in BBM. */
+ bool in_sbm;
+
struct {
/* Id of the first memory block of this device. */
unsigned long first_mb_id;
@@ -151,9 +173,27 @@ struct virtio_mem {
unsigned long *sb_states;
} sbm;
+ struct {
+ /* Id of the first big block of this device. */
+ unsigned long first_bb_id;
+ /* Id of the last usable big block of this device. */
+ unsigned long last_usable_bb_id;
+ /* Id of the next device block to prepare when needed. */
+ unsigned long next_bb_id;
+
+ /* Summary of all big block states. */
+ unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
+
+ /* One byte state per big block. See sbm.mb_states. */
+ uint8_t *bb_states;
+
+ /* The block size used for (un)plugged, adding/removing. */
+ uint64_t bb_size;
+ } bbm;
+
/*
- * Mutex that protects the sbm.mb_count, sbm.mb_states, and
- * sbm.sb_states.
+ * Mutex that protects the sbm.mb_count, sbm.mb_states,
+ * sbm.sb_states, bbm.bb_count, and bbm.bb_states
*
* When this lock is held the pointers can't change, ONLINE and
* OFFLINE blocks can't change the state and no subblocks will get
@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
return mb_id * memory_block_size_bytes();
}
+/*
+ * Calculate the big block id of a given address.
+ */
+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
+ uint64_t addr)
+{
+ return addr / vm->bbm.bb_size;
+}
+
+/*
+ * Calculate the physical start address of a given big block id.
+ */
+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
+ unsigned long bb_id)
+{
+ return bb_id * vm->bbm.bb_size;
+}
+
/*
* Calculate the subblock id of a given address.
*/
@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
return (addr - mb_addr) / vm->sbm.sb_size;
}
+/*
+ * Set the state of a big block, taking care of the state counter.
+ */
+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
+ unsigned long bb_id,
+ enum virtio_mem_bbm_bb_state state)
+{
+ const unsigned long idx = bb_id - vm->bbm.first_bb_id;
+ enum virtio_mem_bbm_bb_state old_state;
+
+ old_state = vm->bbm.bb_states[idx];
+ vm->bbm.bb_states[idx] = state;
+
+ BUG_ON(vm->bbm.bb_count[old_state] == 0);
+ vm->bbm.bb_count[old_state]--;
+ vm->bbm.bb_count[state]++;
+}
+
+/*
+ * Get the state of a big block.
+ */
+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
+ unsigned long bb_id)
+{
+ return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
+}
+
+/*
+ * Prepare the big block state array for the next big block.
+ */
+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
+{
+ unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
+ unsigned long new_bytes = old_bytes + 1;
+ int old_pages = PFN_UP(old_bytes);
+ int new_pages = PFN_UP(new_bytes);
+ uint8_t *new_array;
+
+ if (vm->bbm.bb_states && old_pages == new_pages)
+ return 0;
+
+ new_array = vzalloc(new_pages * PAGE_SIZE);
+ if (!new_array)
+ return -ENOMEM;
+
+ mutex_lock(&vm->hotplug_mutex);
+ if (vm->bbm.bb_states)
+ memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
+ vfree(vm->bbm.bb_states);
+ vm->bbm.bb_states = new_array;
+ mutex_unlock(&vm->hotplug_mutex);
+
+ return 0;
+}
+
+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
+ for (_bb_id = vm->bbm.first_bb_id; \
+ _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
+ _bb_id++) \
+ if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
+
/*
* Set the state of a memory block, taking care of the state counter.
*/
@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
return virtio_mem_add_memory(vm, addr, size);
}
+/*
+ * See virtio_mem_add_memory(): Try adding a big block.
+ */
+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+ const uint64_t size = vm->bbm.bb_size;
+
+ return virtio_mem_add_memory(vm, addr, size);
+}
+
/*
* Try removing memory from Linux. Will only fail if memory blocks aren't
* offline.
@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
struct memory_notify *mhp = arg;
const unsigned long start = PFN_PHYS(mhp->start_pfn);
const unsigned long size = PFN_PHYS(mhp->nr_pages);
- const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
int rc = NOTIFY_OK;
+ unsigned long id;
if (!virtio_mem_overlaps_range(vm, start, size))
return NOTIFY_DONE;
- /*
- * Memory is onlined/offlined in memory block granularity. We cannot
- * cross virtio-mem device boundaries and memory block boundaries. Bail
- * out if this ever changes.
- */
- if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
- !IS_ALIGNED(start, memory_block_size_bytes())))
- return NOTIFY_BAD;
+ if (vm->in_sbm) {
+ id = virtio_mem_phys_to_mb_id(start);
+ /*
+ * In SBM, we add memory in separate memory blocks - we expect
+ * it to be onlined/offlined in the same granularity. Bail out
+ * if this ever changes.
+ */
+ if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
+ !IS_ALIGNED(start, memory_block_size_bytes())))
+ return NOTIFY_BAD;
+ } else {
+ id = virtio_mem_phys_to_bb_id(vm, start);
+ /*
+ * In BBM, we only care about onlining/offlining happening
+ * within a single big block, we don't care about the
+ * actual granularity as we don't track individual Linux
+ * memory blocks.
+ */
+ if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
+ return NOTIFY_BAD;
+ }
/*
* Avoid circular locking lockdep warnings. We lock the mutex
@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- virtio_mem_sbm_notify_going_offline(vm, mb_id);
+ if (vm->in_sbm)
+ virtio_mem_sbm_notify_going_offline(vm, id);
break;
case MEM_GOING_ONLINE:
mutex_lock(&vm->hotplug_mutex);
@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
+ if (vm->in_sbm)
+ rc = virtio_mem_sbm_notify_going_online(vm, id);
break;
case MEM_OFFLINE:
- virtio_mem_sbm_notify_offline(vm, mb_id);
+ if (vm->in_sbm)
+ virtio_mem_sbm_notify_offline(vm, id);
atomic64_add(size, &vm->offline_size);
/*
@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
mutex_unlock(&vm->hotplug_mutex);
break;
case MEM_ONLINE:
- virtio_mem_sbm_notify_online(vm, mb_id);
+ if (vm->in_sbm)
+ virtio_mem_sbm_notify_online(vm, id);
atomic64_sub(size, &vm->offline_size);
/*
@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
case MEM_CANCEL_OFFLINE:
if (!vm->hotplug_active)
break;
- virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
+ if (vm->in_sbm)
+ virtio_mem_sbm_notify_cancel_offline(vm, id);
vm->hotplug_active = false;
mutex_unlock(&vm->hotplug_mutex);
break;
@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
{
const unsigned long addr = page_to_phys(page);
- const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
+ unsigned long id, sb_id;
struct virtio_mem *vm;
- int sb_id;
+ bool do_online;
- /*
- * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
- * size/alignment and that this callback is is called with such a
- * size/alignment. So we cannot cross subblocks and therefore
- * also not memory blocks.
- */
rcu_read_lock();
list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
continue;
- sb_id = virtio_mem_phys_to_sb_id(vm, addr);
- /*
- * If plugged, online the pages, otherwise, set them fake
- * offline (PageOffline).
- */
- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
+ if (vm->in_sbm) {
+ /*
+ * We exploit here that subblocks have at least
+ * MAX_ORDER_NR_PAGES size/alignment - so we cannot
+ * cross subblocks within one call.
+ */
+ id = virtio_mem_phys_to_mb_id(addr);
+ sb_id = virtio_mem_phys_to_sb_id(vm, addr);
+ do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
+ sb_id, 1);
+ } else {
+ do_online = true;
+ }
+ if (do_online)
generic_online_page(page, order);
else
virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
return rc;
}
+/*
+ * Request to unplug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+ const uint64_t size = vm->bbm.bb_size;
+
+ return virtio_mem_send_unplug_request(vm, addr, size);
+}
+
+/*
+ * Request to plug a big block.
+ *
+ * Will not modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
+{
+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
+ const uint64_t size = vm->bbm.bb_size;
+
+ return virtio_mem_send_plug_request(vm, addr, size);
+}
+
/*
* Unplug the desired number of plugged subblocks of a offline or not-added
* memory block. Will fail if any subblock cannot get unplugged (instead of
@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
return 0;
}
-/*
- * Try to plug the requested amount of memory.
- */
-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
{
uint64_t nb_sb = diff / vm->sbm.sb_size;
unsigned long mb_id;
@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
return rc;
}
+/*
+ * Plug a big block and add it to Linux.
+ *
+ * Will modify the state of the big block.
+ */
+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
+ unsigned long bb_id)
+{
+ int rc;
+
+ if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
+ VIRTIO_MEM_BBM_BB_UNUSED))
+ return -EINVAL;
+
+ rc = virtio_mem_bbm_plug_bb(vm, bb_id);
+ if (rc)
+ return rc;
+ virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
+
+ rc = virtio_mem_bbm_add_bb(vm, bb_id);
+ if (rc) {
+ if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
+ virtio_mem_bbm_set_bb_state(vm, bb_id,
+ VIRTIO_MEM_BBM_BB_UNUSED);
+ else
+ /* Retry from the main loop. */
+ virtio_mem_bbm_set_bb_state(vm, bb_id,
+ VIRTIO_MEM_BBM_BB_PLUGGED);
+ return rc;
+ }
+ return 0;
+}
+
+/*
+ * Prepare tracking data for the next big block.
+ */
+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
+ unsigned long *bb_id)
+{
+ int rc;
+
+ if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
+ return -ENOSPC;
+
+ /* Resize the big block state array if required. */
+ rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
+ if (rc)
+ return rc;
+
+ vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
+ *bb_id = vm->bbm.next_bb_id;
+ vm->bbm.next_bb_id++;
+ return 0;
+}
+
+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+ uint64_t nb_bb = diff / vm->bbm.bb_size;
+ unsigned long bb_id;
+ int rc;
+
+ if (!nb_bb)
+ return 0;
+
+ /* Try to plug and add unused big blocks */
+ virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+ return -ENOSPC;
+
+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+ if (!rc)
+ nb_bb--;
+ if (rc || !nb_bb)
+ return rc;
+ cond_resched();
+ }
+
+ /* Try to prepare, plug and add new big blocks */
+ while (nb_bb) {
+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
+ return -ENOSPC;
+
+ rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
+ if (rc)
+ return rc;
+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
+ if (!rc)
+ nb_bb--;
+ if (rc)
+ return rc;
+ cond_resched();
+ }
+
+ return 0;
+}
+
+/*
+ * Try to plug the requested amount of memory.
+ */
+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
+{
+ if (vm->in_sbm)
+ return virtio_mem_sbm_plug_request(vm, diff);
+ return virtio_mem_bbm_plug_request(vm, diff);
+}
+
/*
* Unplug the desired number of plugged subblocks of an offline memory block.
* Will fail if any subblock cannot get unplugged (instead of skipping it).
@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
return 0;
}
-/*
- * Try to unplug the requested amount of memory.
- */
-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
{
uint64_t nb_sb = diff / vm->sbm.sb_size;
unsigned long mb_id;
@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
return rc;
}
+/*
+ * Try to unplug the requested amount of memory.
+ */
+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
+{
+ if (vm->in_sbm)
+ return virtio_mem_sbm_unplug_request(vm, diff);
+ return -EBUSY;
+}
+
/*
* Try to unplug all blocks that couldn't be unplugged before, for example,
* because the hypervisor was busy.
*/
static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
{
- unsigned long mb_id;
+ unsigned long id;
int rc;
- virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
- rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
+ if (!vm->in_sbm) {
+ virtio_mem_bbm_for_each_bb(vm, id,
+ VIRTIO_MEM_BBM_BB_PLUGGED) {
+ rc = virtio_mem_bbm_unplug_bb(vm, id);
+ if (rc)
+ return rc;
+ virtio_mem_bbm_set_bb_state(vm, id,
+ VIRTIO_MEM_BBM_BB_UNUSED);
+ }
+ return 0;
+ }
+
+ virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
+ rc = virtio_mem_sbm_unplug_mb(vm, id);
if (rc)
return rc;
- virtio_mem_sbm_set_mb_state(vm, mb_id,
+ virtio_mem_sbm_set_mb_state(vm, id,
VIRTIO_MEM_SBM_MB_UNUSED);
}
@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
usable_region_size, &usable_region_size);
end_addr = vm->addr + usable_region_size;
end_addr = min(end_addr, phys_limit);
- vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
+
+ if (vm->in_sbm)
+ vm->sbm.last_usable_mb_id =
+ virtio_mem_phys_to_mb_id(end_addr) - 1;
+ else
+ vm->bbm.last_usable_bb_id =
+ virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
/* see if there is a request to change the size */
virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
static int virtio_mem_init(struct virtio_mem *vm)
{
const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
+ uint64_t sb_size, addr;
uint16_t node_id;
if (!vm->vdev->config->get) {
@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
if (vm->nid == NUMA_NO_NODE)
vm->nid = memory_add_physaddr_to_nid(vm->addr);
- /*
- * We always hotplug memory in memory block granularity. This way,
- * we have to wait for exactly one memory block to online.
- */
- if (vm->device_block_size > memory_block_size_bytes()) {
- dev_err(&vm->vdev->dev,
- "The block size is not supported (too big).\n");
- return -EINVAL;
- }
-
/* bad device setup - warn only */
if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
dev_warn(&vm->vdev->dev,
@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
* - Is required for now for alloc_contig_range() to work reliably -
* it doesn't properly handle smaller granularity on ZONE_NORMAL.
*/
- vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) * PAGE_SIZE;
- vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
- vm->sbm.sb_size);
- vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
+ sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+ pageblock_nr_pages) * PAGE_SIZE;
+ sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
+
+ if (sb_size < memory_block_size_bytes()) {
+ /* SBM: At least two subblocks per Linux memory block. */
+ vm->in_sbm = true;
+ vm->sbm.sb_size = sb_size;
+ vm->sbm.sbs_per_mb = memory_block_size_bytes() /
+ vm->sbm.sb_size;
+
+ /* Round up to the next full memory block */
+ addr = vm->addr + memory_block_size_bytes() - 1;
+ vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
+ vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+ } else {
+ /* BBM: At least one Linux memory block. */
+ vm->bbm.bb_size = vm->device_block_size;
- /* Round up to the next full memory block */
- vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
- memory_block_size_bytes());
- vm->sbm.next_mb_id = vm->sbm.first_mb_id;
+ vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
+ vm->bbm.next_bb_id = vm->bbm.first_bb_id;
+ }
/* Prepare the offline threshold - make sure we can add two blocks. */
vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
+ /* In BBM, we also want at least two big blocks. */
+ vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
+ vm->offline_threshold);
dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
(unsigned long long)vm->device_block_size);
dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
memory_block_size_bytes());
- dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
- (unsigned long long)vm->sbm.sb_size);
+ if (vm->in_sbm)
+ dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
+ (unsigned long long)vm->sbm.sb_size);
+ else
+ dev_info(&vm->vdev->dev, "big block size: 0x%llx",
+ (unsigned long long)vm->bbm.bb_size);
if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
cancel_work_sync(&vm->wq);
hrtimer_cancel(&vm->retry_timer);
- /*
- * After we unregistered our callbacks, user space can online partially
- * plugged offline blocks. Make sure to remove them.
- */
- virtio_mem_sbm_for_each_mb(vm, mb_id,
- VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
- rc = virtio_mem_sbm_remove_mb(vm, mb_id);
- BUG_ON(rc);
- virtio_mem_sbm_set_mb_state(vm, mb_id,
- VIRTIO_MEM_SBM_MB_UNUSED);
+ if (vm->in_sbm) {
+ /*
+ * After we unregistered our callbacks, user space can online
+ * partially plugged offline blocks. Make sure to remove them.
+ */
+ virtio_mem_sbm_for_each_mb(vm, mb_id,
+ VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
+ rc = virtio_mem_sbm_remove_mb(vm, mb_id);
+ BUG_ON(rc);
+ virtio_mem_sbm_set_mb_state(vm, mb_id,
+ VIRTIO_MEM_SBM_MB_UNUSED);
+ }
+ /*
+ * After we unregistered our callbacks, user space can no longer
+ * offline partially plugged online memory blocks. No need to
+ * worry about them.
+ */
}
- /*
- * After we unregistered our callbacks, user space can no longer
- * offline partially plugged online memory blocks. No need to worry
- * about them.
- */
/* unregister callbacks */
unregister_virtio_mem_device(vm);
@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
}
/* remove all tracking data - no locking needed */
- vfree(vm->sbm.mb_states);
- vfree(vm->sbm.sb_states);
+ if (vm->in_sbm) {
+ vfree(vm->sbm.mb_states);
+ vfree(vm->sbm.sb_states);
+ } else {
+ vfree(vm->bbm.bb_states);
+ }
/* reset the device and cleanup the queues */
vdev->config->reset(vdev);
--
2.26.2
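The big block state tracking added above grows one byte per big block, reallocating only when the number of backing pages changes, so most prepare calls are free. A userspace model of virtio_mem_bbm_bb_states_prepare_next_bb() - a sketch with calloc() standing in for vzalloc(), and invented `sketch_` names:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PFN_UP(x) (((x) + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE)

struct sketch_bbm {
	uint8_t *bb_states;         /* one state byte per big block */
	unsigned long first_bb_id;
	unsigned long next_bb_id;
};

/*
 * Grow the state array for one more big block. The array is sized in
 * whole pages, so reallocation only happens every PAGE_SIZE big blocks.
 * Returns 0 on success, -1 on allocation failure (the driver: -ENOMEM).
 */
int sketch_prepare_next_bb(struct sketch_bbm *bbm)
{
	unsigned long old_bytes = bbm->next_bb_id - bbm->first_bb_id;
	unsigned long old_pages = SKETCH_PFN_UP(old_bytes);
	unsigned long new_pages = SKETCH_PFN_UP(old_bytes + 1);
	uint8_t *new_array;

	if (bbm->bb_states && old_pages == new_pages)
		return 0; /* still fits into the current allocation */

	new_array = calloc(new_pages, SKETCH_PAGE_SIZE);
	if (!new_array)
		return -1;
	if (bbm->bb_states)
		memcpy(new_array, bbm->bb_states,
		       old_pages * SKETCH_PAGE_SIZE);
	free(bbm->bb_states);
	bbm->bb_states = new_array;
	return 0;
}
```

In the driver, the caller bumps next_bb_id only after the prepare step succeeds, under the hotplug mutex; the sketch leaves that bookkeeping to the caller as well.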
Let's rename "subblock_size" to "sb_size" and "nb_sb_per_mb" to
"sbs_per_mb", and move both into the "sbm" sub-structure accordingly.
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
1 file changed, 48 insertions(+), 48 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index fc2b1ff3beed..3a772714fec9 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -96,11 +96,6 @@ struct virtio_mem {
/* Maximum region size in bytes. */
uint64_t region_size;
- /* The subblock size. */
- uint64_t subblock_size;
- /* The number of subblocks per memory block. */
- uint32_t nb_sb_per_mb;
-
/* Id of the first memory block of this device. */
unsigned long first_mb_id;
/* Id of the last usable memory block of this device. */
@@ -126,6 +121,11 @@ struct virtio_mem {
uint64_t offline_threshold;
struct {
+ /* The subblock size. */
+ uint64_t sb_size;
+ /* The number of subblocks per Linux memory block. */
+ uint32_t sbs_per_mb;
+
/* Summary of all memory block states. */
unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
- return (addr - mb_addr) / vm->subblock_size;
+ return (addr - mb_addr) / vm->sbm.sb_size;
}
/*
@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
unsigned long mb_id, int sb_id)
{
- return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
+ return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
}
/*
@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
}
/*
- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
* none.
*/
static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
return find_next_zero_bit(vm->sbm.sb_states,
- bit + vm->nb_sb_per_mb, bit) - bit;
+ bit + vm->sbm.sbs_per_mb, bit) - bit;
}
/*
@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
{
const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
- const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
- const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
+ const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
+ const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
unsigned long *new_bitmap, *old_bitmap;
@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
unsigned long mb_id)
{
- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
unsigned long pfn;
int sb_id;
- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
continue;
pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size);
+ sb_id * vm->sbm.sb_size);
virtio_mem_fake_offline_going_offline(pfn, nr_pages);
}
}
@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
unsigned long mb_id)
{
- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
unsigned long pfn;
int sb_id;
- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
continue;
pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size);
+ sb_id * vm->sbm.sb_size);
virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
}
}
@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
int sb_id, int count)
{
const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size;
- const uint64_t size = count * vm->subblock_size;
+ sb_id * vm->sbm.sb_size;
+ const uint64_t size = count * vm->sbm.sb_size;
int rc;
dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
int sb_id, int count)
{
const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size;
- const uint64_t size = count * vm->subblock_size;
+ sb_id * vm->sbm.sb_size;
+ const uint64_t size = count * vm->sbm.sb_size;
int rc;
dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
int sb_id, count;
int rc;
- sb_id = vm->nb_sb_per_mb - 1;
+ sb_id = vm->sbm.sbs_per_mb - 1;
while (*nb_sb) {
/* Find the next candidate subblock */
while (sb_id >= 0 &&
@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
*/
static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
{
- uint64_t nb_sb = vm->nb_sb_per_mb;
+ uint64_t nb_sb = vm->sbm.sbs_per_mb;
return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
}
@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
unsigned long mb_id, uint64_t *nb_sb)
{
- const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
+ const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
int rc;
if (WARN_ON_ONCE(!count))
@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
* Mark the block properly offline before adding it to Linux,
* so the memory notifiers will find the block in the right state.
*/
- if (count == vm->nb_sb_per_mb)
+ if (count == vm->sbm.sbs_per_mb)
virtio_mem_sbm_set_mb_state(vm, mb_id,
VIRTIO_MEM_SBM_MB_OFFLINE);
else
@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
while (*nb_sb) {
sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
- if (sb_id >= vm->nb_sb_per_mb)
+ if (sb_id >= vm->sbm.sbs_per_mb)
break;
count = 1;
while (count < *nb_sb &&
- sb_id + count < vm->nb_sb_per_mb &&
+ sb_id + count < vm->sbm.sbs_per_mb &&
!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
count++;
@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
/* fake-online the pages if the memory block is online */
pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size);
- nr_pages = PFN_DOWN(count * vm->subblock_size);
+ sb_id * vm->sbm.sb_size);
+ nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
virtio_mem_fake_online(pfn, nr_pages);
}
- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
if (online)
virtio_mem_sbm_set_mb_state(vm, mb_id,
VIRTIO_MEM_SBM_MB_ONLINE);
@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
*/
static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
{
- uint64_t nb_sb = diff / vm->subblock_size;
+ uint64_t nb_sb = diff / vm->sbm.sb_size;
unsigned long mb_id;
int rc;
@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
/* some subblocks might have been unplugged even on failure */
- if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
+ if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
virtio_mem_sbm_set_mb_state(vm, mb_id,
VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
if (rc)
return rc;
- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
/*
* Remove the block from Linux - this should never fail.
* Hinder the block from getting onlined by marking it
@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
unsigned long mb_id, int sb_id,
int count)
{
- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
unsigned long start_pfn;
int rc;
start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
- sb_id * vm->subblock_size);
+ sb_id * vm->sbm.sb_size);
rc = virtio_mem_fake_offline(start_pfn, nr_pages);
if (rc)
@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
int rc, sb_id;
/* If possible, try to unplug the complete block in one shot. */
- if (*nb_sb >= vm->nb_sb_per_mb &&
- virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (*nb_sb >= vm->sbm.sbs_per_mb &&
+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
- vm->nb_sb_per_mb);
+ vm->sbm.sbs_per_mb);
if (!rc) {
- *nb_sb -= vm->nb_sb_per_mb;
+ *nb_sb -= vm->sbm.sbs_per_mb;
goto unplugged;
} else if (rc != -EBUSY)
return rc;
}
/* Fallback to single subblocks. */
- for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
+ for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
/* Find the next candidate subblock */
while (sb_id >= 0 &&
!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
* remove it. This will usually not fail, as no memory is in use
* anymore - however some other notifiers might NACK the request.
*/
- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
mutex_unlock(&vm->hotplug_mutex);
rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
mutex_lock(&vm->hotplug_mutex);
@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
*/
static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
{
- uint64_t nb_sb = diff / vm->subblock_size;
+ uint64_t nb_sb = diff / vm->sbm.sb_size;
unsigned long mb_id;
int rc;
@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
* - Is required for now for alloc_contig_range() to work reliably -
* it doesn't properly handle smaller granularity on ZONE_NORMAL.
*/
- vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) * PAGE_SIZE;
- vm->subblock_size = max_t(uint64_t, vm->device_block_size,
- vm->subblock_size);
- vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
+ vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
+ pageblock_nr_pages) * PAGE_SIZE;
+ vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
+ vm->sbm.sb_size);
+ vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
/* Round up to the next full memory block */
vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
memory_block_size_bytes());
dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
- (unsigned long long)vm->subblock_size);
+ (unsigned long long)vm->sbm.sb_size);
if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
--
2.26.2
On Mon, Oct 12, 2020 at 02:52:57PM +0200, David Hildenbrand wrote:
>Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
>possible, so we don't have to deal with allocation orders.
>
>Add a comment why we have that restriction for now.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
--
Wei Yang
Help you, Help me
On 12.10.20 14:53, David Hildenbrand wrote:
> Let's add some documentation for the current mode - Sub Block Mode (SBM) -
> to prepare for a new mode - Big Block Mode (BBM).
>
> Follow-up patches will properly factor out the existing Sub Block Mode
> (SBM) and implement Device Block Mode (DBM).
s/Device Block Mode (DBM)/Big Block Mode (BBM)/
--
Thanks,
David / dhildenb
On Mon, Oct 12, 2020 at 02:53:02PM +0200, David Hildenbrand wrote:
>No longer used, let's drop it.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
If the above two patches are merged:
Reviewed-by: Wei Yang <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 4 ----
> 1 file changed, 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 37a0e338ae4a..5c93f8a65eba 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -84,8 +84,6 @@ struct virtio_mem {
>
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
>- /* Id of the last memory block of this device. */
>- unsigned long last_mb_id;
> /* Id of the last usable memory block of this device. */
> unsigned long last_usable_mb_id;
> /* Id of the next memory bock to prepare when needed. */
>@@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
> memory_block_size_bytes());
> vm->next_mb_id = vm->first_mb_id;
>- vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
>- vm->region_size) - 1;
>
> dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>--
>2.26.2
--
Wei Yang
Help you, Help me
> No longer used, let's drop it.
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: Pankaj Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> drivers/virtio/virtio_mem.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 37a0e338ae4a..5c93f8a65eba 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -84,8 +84,6 @@ struct virtio_mem {
>
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
> - /* Id of the last memory block of this device. */
> - unsigned long last_mb_id;
> /* Id of the last usable memory block of this device. */
> unsigned long last_usable_mb_id;
> /* Id of the next memory bock to prepare when needed. */
> @@ -1689,8 +1687,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
> memory_block_size_bytes());
> vm->next_mb_id = vm->first_mb_id;
> - vm->last_mb_id = virtio_mem_phys_to_mb_id(vm->addr +
> - vm->region_size) - 1;
>
> dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
Reviewed-by: Pankaj Gupta <[email protected]>
On Mon, Oct 12, 2020 at 02:53:09PM +0200, David Hildenbrand wrote:
>Let's add some documentation for the current mode - Sub Block Mode (SBM) -
>to prepare for a new mode - Big Block Mode (BBM).
>
>Follow-up patches will properly factor out the existing Sub Block Mode
>(SBM) and implement Device Block Mode (DBM).
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index faeb759687fe..fd8685673fe4 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -27,6 +27,21 @@ static bool unplug_online = true;
> module_param(unplug_online, bool, 0644);
> MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>
>+/*
>+ * virtio-mem currently supports the following modes of operation:
>+ *
>+ * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ * size of a Sub Block (SB) is determined based on the device block size, the
>+ * pageblock size, and the maximum allocation granularity of the buddy.
>+ * Subblocks within a Linux memory block might either be plugged or unplugged.
>+ * Memory is added/removed to Linux MM in Linux memory block granularity.
>+ *
>+ * User space / core MM (auto onlining) is responsible for onlining added
>+ * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>+ * always onlined separately, and all memory within a Linux memory block is
>+ * onlined to the same zone - virtio-mem relies on this behavior.
>+ */
>+
> enum virtio_mem_mb_state {
> /* Unplugged, not added to Linux. Can be reused later. */
> VIRTIO_MEM_MB_STATE_UNUSED = 0,
>--
>2.26.2
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:53:12PM +0200, David Hildenbrand wrote:
>The calculation is already complicated enough, let's limit it to one
>location.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 2cc497ad8298..73ff6e9ba839 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> _mb_id--) \
> if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
>
>+/*
>+ * Calculate the bit number in the sb_states bitmap for the given subblock
>+ * inside the given memory block.
>+ */
>+static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
>+ unsigned long mb_id, int sb_id)
>+{
>+ return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+}
>+
> /*
> * Mark all selected subblocks plugged.
> *
>@@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> __bitmap_set(vm->sbm.sb_states, bit, count);
> }
>@@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> __bitmap_clear(vm->sbm.sb_states, bit, count);
> }
>@@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> if (count == 1)
> return test_bit(bit, vm->sbm.sb_states);
>@@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> /* TODO: Helper similar to bitmap_set() */
> return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
>@@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> unsigned long mb_id)
> {
>- const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
>+ const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
>
> return find_next_zero_bit(vm->sbm.sb_states,
> bit + vm->nb_sb_per_mb, bit) - bit;
>--
>2.26.2
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
> 1 file changed, 48 insertions(+), 48 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fc2b1ff3beed..3a772714fec9 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -96,11 +96,6 @@ struct virtio_mem {
> /* Maximum region size in bytes. */
> uint64_t region_size;
>
>- /* The subblock size. */
>- uint64_t subblock_size;
>- /* The number of subblocks per memory block. */
>- uint32_t nb_sb_per_mb;
>-
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
> /* Id of the last usable memory block of this device. */
>@@ -126,6 +121,11 @@ struct virtio_mem {
> uint64_t offline_threshold;
>
> struct {
>+ /* The subblock size. */
>+ uint64_t sb_size;
>+ /* The number of subblocks per Linux memory block. */
>+ uint32_t sbs_per_mb;
>+
> /* Summary of all memory block states. */
> unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
>
>@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
> const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
>
>- return (addr - mb_addr) / vm->subblock_size;
>+ return (addr - mb_addr) / vm->sbm.sb_size;
> }
>
> /*
>@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id)
> {
>- return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
> }
>
> /*
>@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> }
>
> /*
>- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
> * none.
> */
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
>
> return find_next_zero_bit(vm->sbm.sb_states,
>- bit + vm->nb_sb_per_mb, bit) - bit;
>+ bit + vm->sbm.sbs_per_mb, bit) - bit;
> }
>
> /*
>@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>- const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
>- const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
>+ const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
>+ const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
> int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> unsigned long *new_bitmap, *old_bitmap;
>@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> unsigned long mb_id)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> unsigned long pfn;
> int sb_id;
>
>- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
> virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> }
> }
>@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> unsigned long mb_id)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> unsigned long pfn;
> int sb_id;
>
>- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
> virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> }
> }
>@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> int sb_id, int count)
> {
> const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size;
>- const uint64_t size = count * vm->subblock_size;
>+ sb_id * vm->sbm.sb_size;
>+ const uint64_t size = count * vm->sbm.sb_size;
> int rc;
>
> dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
>@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> int sb_id, int count)
> {
> const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size;
>- const uint64_t size = count * vm->subblock_size;
>+ sb_id * vm->sbm.sb_size;
>+ const uint64_t size = count * vm->sbm.sb_size;
> int rc;
>
> dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
>@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> int sb_id, count;
> int rc;
>
>- sb_id = vm->nb_sb_per_mb - 1;
>+ sb_id = vm->sbm.sbs_per_mb - 1;
> while (*nb_sb) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
>@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> */
> static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
>- uint64_t nb_sb = vm->nb_sb_per_mb;
>+ uint64_t nb_sb = vm->sbm.sbs_per_mb;
>
> return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
> }
>@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> unsigned long mb_id, uint64_t *nb_sb)
> {
>- const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
>+ const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
> int rc;
>
> if (WARN_ON_ONCE(!count))
>@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> * Mark the block properly offline before adding it to Linux,
> * so the memory notifiers will find the block in the right state.
> */
>- if (count == vm->nb_sb_per_mb)
>+ if (count == vm->sbm.sbs_per_mb)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE);
> else
>@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>
> while (*nb_sb) {
> sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
>- if (sb_id >= vm->nb_sb_per_mb)
>+ if (sb_id >= vm->sbm.sbs_per_mb)
> break;
> count = 1;
> while (count < *nb_sb &&
>- sb_id + count < vm->nb_sb_per_mb &&
>+ sb_id + count < vm->sbm.sbs_per_mb &&
> !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> count++;
>
>@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>
> /* fake-online the pages if the memory block is online */
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>- nr_pages = PFN_DOWN(count * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
>+ nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
> virtio_mem_fake_online(pfn, nr_pages);
> }
>
>- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> if (online)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> */
> static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
>- uint64_t nb_sb = diff / vm->subblock_size;
>+ uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
> int rc;
>
>@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
>
> /* some subblocks might have been unplugged even on failure */
>- if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+ if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> if (rc)
> return rc;
>
>- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> /*
> * Remove the block from Linux - this should never fail.
> * Hinder the block from getting onlined by marking it
>@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
> unsigned long start_pfn;
> int rc;
>
> start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
>
> rc = virtio_mem_fake_offline(start_pfn, nr_pages);
> if (rc)
>@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> int rc, sb_id;
>
> /* If possible, try to unplug the complete block in one shot. */
>- if (*nb_sb >= vm->nb_sb_per_mb &&
>- virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (*nb_sb >= vm->sbm.sbs_per_mb &&
>+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
>- vm->nb_sb_per_mb);
>+ vm->sbm.sbs_per_mb);
> if (!rc) {
>- *nb_sb -= vm->nb_sb_per_mb;
>+ *nb_sb -= vm->sbm.sbs_per_mb;
> goto unplugged;
> } else if (rc != -EBUSY)
> return rc;
> }
>
> /* Fallback to single subblocks. */
>- for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
>+ for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
> !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> * remove it. This will usually not fail, as no memory is in use
> * anymore - however some other notifiers might NACK the request.
> */
>- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> mutex_unlock(&vm->hotplug_mutex);
> rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> mutex_lock(&vm->hotplug_mutex);
>@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> */
> static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
>- uint64_t nb_sb = diff / vm->subblock_size;
>+ uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
> int rc;
>
>@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
> * - Is required for now for alloc_contig_range() to work reliably -
> * it doesn't properly handle smaller granularity on ZONE_NORMAL.
> */
>- vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>- pageblock_nr_pages) * PAGE_SIZE;
>- vm->subblock_size = max_t(uint64_t, vm->device_block_size,
>- vm->subblock_size);
>- vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
>+ vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+ pageblock_nr_pages) * PAGE_SIZE;
>+ vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>+ vm->sbm.sb_size);
>+ vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>
> /* Round up to the next full memory block */
> vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
> dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> memory_block_size_bytes());
> dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>- (unsigned long long)vm->subblock_size);
>+ (unsigned long long)vm->sbm.sb_size);
> if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
>
>--
>2.26.2
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
One trivial suggestion: could we move this patch closer to the data structure
movement patch?
I know this would be some work, since you have changed some of the code logic,
and rebasing would take you some time.
>---
> drivers/virtio/virtio_mem.c | 96 ++++++++++++++++++-------------------
> 1 file changed, 48 insertions(+), 48 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index fc2b1ff3beed..3a772714fec9 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -96,11 +96,6 @@ struct virtio_mem {
> /* Maximum region size in bytes. */
> uint64_t region_size;
>
>- /* The subblock size. */
>- uint64_t subblock_size;
>- /* The number of subblocks per memory block. */
>- uint32_t nb_sb_per_mb;
>-
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
> /* Id of the last usable memory block of this device. */
>@@ -126,6 +121,11 @@ struct virtio_mem {
> uint64_t offline_threshold;
>
> struct {
>+ /* The subblock size. */
>+ uint64_t sb_size;
>+ /* The number of subblocks per Linux memory block. */
>+ uint32_t sbs_per_mb;
>+
> /* Summary of all memory block states. */
> unsigned long mb_count[VIRTIO_MEM_SBM_MB_COUNT];
>
>@@ -256,7 +256,7 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
> const unsigned long mb_addr = virtio_mem_mb_id_to_phys(mb_id);
>
>- return (addr - mb_addr) / vm->subblock_size;
>+ return (addr - mb_addr) / vm->sbm.sb_size;
> }
>
> /*
>@@ -334,7 +334,7 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id)
> {
>- return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>+ return (mb_id - vm->first_mb_id) * vm->sbm.sbs_per_mb + sb_id;
> }
>
> /*
>@@ -397,7 +397,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> }
>
> /*
>- * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
>+ * Find the first unplugged subblock. Returns vm->sbm.sbs_per_mb in case there is
> * none.
> */
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
>@@ -406,7 +406,7 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
>
> return find_next_zero_bit(vm->sbm.sb_states,
>- bit + vm->nb_sb_per_mb, bit) - bit;
>+ bit + vm->sbm.sbs_per_mb, bit) - bit;
> }
>
> /*
>@@ -415,8 +415,8 @@ static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
>- const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
>- const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
>+ const unsigned long old_nb_bits = old_nb_mb * vm->sbm.sbs_per_mb;
>+ const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->sbm.sbs_per_mb;
> int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> unsigned long *new_bitmap, *old_bitmap;
>@@ -642,15 +642,15 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
> static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> unsigned long mb_id)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> unsigned long pfn;
> int sb_id;
>
>- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
> virtio_mem_fake_offline_going_offline(pfn, nr_pages);
> }
> }
>@@ -658,15 +658,15 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> unsigned long mb_id)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size);
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size);
> unsigned long pfn;
> int sb_id;
>
>- for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>+ for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id++) {
> if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
> virtio_mem_fake_offline_cancel_offline(pfn, nr_pages);
> }
> }
>@@ -1079,8 +1079,8 @@ static int virtio_mem_sbm_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
> int sb_id, int count)
> {
> const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size;
>- const uint64_t size = count * vm->subblock_size;
>+ sb_id * vm->sbm.sb_size;
>+ const uint64_t size = count * vm->sbm.sb_size;
> int rc;
>
> dev_dbg(&vm->vdev->dev, "plugging memory block: %lu : %i - %i\n", mb_id,
>@@ -1100,8 +1100,8 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> int sb_id, int count)
> {
> const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size;
>- const uint64_t size = count * vm->subblock_size;
>+ sb_id * vm->sbm.sb_size;
>+ const uint64_t size = count * vm->sbm.sb_size;
> int rc;
>
> dev_dbg(&vm->vdev->dev, "unplugging memory block: %lu : %i - %i\n",
>@@ -1128,7 +1128,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> int sb_id, count;
> int rc;
>
>- sb_id = vm->nb_sb_per_mb - 1;
>+ sb_id = vm->sbm.sbs_per_mb - 1;
> while (*nb_sb) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
>@@ -1163,7 +1163,7 @@ static int virtio_mem_sbm_unplug_any_sb(struct virtio_mem *vm,
> */
> static int virtio_mem_sbm_unplug_mb(struct virtio_mem *vm, unsigned long mb_id)
> {
>- uint64_t nb_sb = vm->nb_sb_per_mb;
>+ uint64_t nb_sb = vm->sbm.sbs_per_mb;
>
> return virtio_mem_sbm_unplug_any_sb(vm, mb_id, &nb_sb);
> }
>@@ -1203,7 +1203,7 @@ static int virtio_mem_sbm_prepare_next_mb(struct virtio_mem *vm,
> static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> unsigned long mb_id, uint64_t *nb_sb)
> {
>- const int count = min_t(int, *nb_sb, vm->nb_sb_per_mb);
>+ const int count = min_t(int, *nb_sb, vm->sbm.sbs_per_mb);
> int rc;
>
> if (WARN_ON_ONCE(!count))
>@@ -1221,7 +1221,7 @@ static int virtio_mem_sbm_plug_and_add_mb(struct virtio_mem *vm,
> * Mark the block properly offline before adding it to Linux,
> * so the memory notifiers will find the block in the right state.
> */
>- if (count == vm->nb_sb_per_mb)
>+ if (count == vm->sbm.sbs_per_mb)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE);
> else
>@@ -1271,11 +1271,11 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>
> while (*nb_sb) {
> sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
>- if (sb_id >= vm->nb_sb_per_mb)
>+ if (sb_id >= vm->sbm.sbs_per_mb)
> break;
> count = 1;
> while (count < *nb_sb &&
>- sb_id + count < vm->nb_sb_per_mb &&
>+ sb_id + count < vm->sbm.sbs_per_mb &&
> !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> count++;
>
>@@ -1288,12 +1288,12 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
>
> /* fake-online the pages if the memory block is online */
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>- nr_pages = PFN_DOWN(count * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
>+ nr_pages = PFN_DOWN(count * vm->sbm.sb_size);
> virtio_mem_fake_online(pfn, nr_pages);
> }
>
>- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> if (online)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1310,7 +1310,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> */
> static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
>- uint64_t nb_sb = diff / vm->subblock_size;
>+ uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
> int rc;
>
>@@ -1393,13 +1393,13 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct virtio_mem *vm,
> rc = virtio_mem_sbm_unplug_any_sb(vm, mb_id, nb_sb);
>
> /* some subblocks might have been unplugged even on failure */
>- if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+ if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb))
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> if (rc)
> return rc;
>
>- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> /*
> * Remove the block from Linux - this should never fail.
> * Hinder the block from getting onlined by marking it
>@@ -1426,12 +1426,12 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
>- const unsigned long nr_pages = PFN_DOWN(vm->subblock_size) * count;
>+ const unsigned long nr_pages = PFN_DOWN(vm->sbm.sb_size) * count;
> unsigned long start_pfn;
> int rc;
>
> start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
>- sb_id * vm->subblock_size);
>+ sb_id * vm->sbm.sb_size);
>
> rc = virtio_mem_fake_offline(start_pfn, nr_pages);
> if (rc)
>@@ -1467,19 +1467,19 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> int rc, sb_id;
>
> /* If possible, try to unplug the complete block in one shot. */
>- if (*nb_sb >= vm->nb_sb_per_mb &&
>- virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (*nb_sb >= vm->sbm.sbs_per_mb &&
>+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> rc = virtio_mem_sbm_unplug_sb_online(vm, mb_id, 0,
>- vm->nb_sb_per_mb);
>+ vm->sbm.sbs_per_mb);
> if (!rc) {
>- *nb_sb -= vm->nb_sb_per_mb;
>+ *nb_sb -= vm->sbm.sbs_per_mb;
> goto unplugged;
> } else if (rc != -EBUSY)
> return rc;
> }
>
> /* Fallback to single subblocks. */
>- for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
>+ for (sb_id = vm->sbm.sbs_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
> !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>@@ -1501,7 +1501,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> * remove it. This will usually not fail, as no memory is in use
> * anymore - however some other notifiers might NACK the request.
> */
>- if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) {
> mutex_unlock(&vm->hotplug_mutex);
> rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> mutex_lock(&vm->hotplug_mutex);
>@@ -1518,7 +1518,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> */
> static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
>- uint64_t nb_sb = diff / vm->subblock_size;
>+ uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
> int rc;
>
>@@ -1805,11 +1805,11 @@ static int virtio_mem_init(struct virtio_mem *vm)
> * - Is required for now for alloc_contig_range() to work reliably -
> * it doesn't properly handle smaller granularity on ZONE_NORMAL.
> */
>- vm->subblock_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>- pageblock_nr_pages) * PAGE_SIZE;
>- vm->subblock_size = max_t(uint64_t, vm->device_block_size,
>- vm->subblock_size);
>- vm->nb_sb_per_mb = memory_block_size_bytes() / vm->subblock_size;
>+ vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+ pageblock_nr_pages) * PAGE_SIZE;
>+ vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>+ vm->sbm.sb_size);
>+ vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>
> /* Round up to the next full memory block */
> vm->first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>@@ -1827,7 +1827,7 @@ static int virtio_mem_init(struct virtio_mem *vm)
> dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> memory_block_size_bytes());
> dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>- (unsigned long long)vm->subblock_size);
>+ (unsigned long long)vm->sbm.sb_size);
> if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
>
>--
>2.26.2
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:53:11PM +0200, David Hildenbrand wrote:
>Let's rename and move accordingly. While at it, rename sb_bitmap to
>"sb_states".
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
OK, you separated the change into two parts.
Reviewed-by: Wei Yang <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
> 1 file changed, 62 insertions(+), 56 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e76d6f769aa5..2cc497ad8298 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -137,17 +137,23 @@ struct virtio_mem {
> * memory in one 4 KiB page.
> */
> uint8_t *mb_states;
>- } sbm;
>
>- /*
>- * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
>- *
>- * With 4MB subblocks, we manage 128GB of memory in one page.
>- */
>- unsigned long *sb_bitmap;
>+ /*
>+ * Bitmap: one bit per subblock. Allocated similar to
>+ * sbm.mb_states.
>+ *
>+ * A set bit means the corresponding subblock is plugged,
>+ * otherwise it's unplugged.
>+ *
>+ * With 4 MiB subblocks, we manage 128 GiB of memory in one
>+ * 4 KiB page.
>+ */
>+ unsigned long *sb_states;
>+ } sbm;
>
> /*
>- * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
>+ * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>+ * sbm.sb_states.
> *
> * When this lock is held the pointers can't change, ONLINE and
> * OFFLINE blocks can't change the state and no subblocks will get
>@@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> *
> * Will not modify the state of the memory block.
> */
>-static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
>- unsigned long mb_id, int sb_id,
>- int count)
>+static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
>+ unsigned long mb_id, int sb_id,
>+ int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
>- __bitmap_set(vm->sb_bitmap, bit, count);
>+ __bitmap_set(vm->sbm.sb_states, bit, count);
> }
>
> /*
>@@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
> *
> * Will not modify the state of the memory block.
> */
>-static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
>- unsigned long mb_id, int sb_id,
>- int count)
>+static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
>+ unsigned long mb_id, int sb_id,
>+ int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
>- __bitmap_clear(vm->sb_bitmap, bit, count);
>+ __bitmap_clear(vm->sbm.sb_states, bit, count);
> }
>
> /*
> * Test if all selected subblocks are plugged.
> */
>-static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
>- unsigned long mb_id, int sb_id,
>- int count)
>+static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
>+ unsigned long mb_id, int sb_id,
>+ int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> if (count == 1)
>- return test_bit(bit, vm->sb_bitmap);
>+ return test_bit(bit, vm->sbm.sb_states);
>
> /* TODO: Helper similar to bitmap_set() */
>- return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
>+ return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
> bit + count;
> }
>
> /*
> * Test if all selected subblocks are unplugged.
> */
>-static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
>- unsigned long mb_id, int sb_id,
>- int count)
>+static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
>+ unsigned long mb_id, int sb_id,
>+ int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> /* TODO: Helper similar to bitmap_set() */
>- return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
>+ return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
>+ bit + count;
> }
>
> /*
> * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
> * none.
> */
>-static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
>+static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> unsigned long mb_id)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
>
>- return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
>- bit;
>+ return find_next_zero_bit(vm->sbm.sb_states,
>+ bit + vm->nb_sb_per_mb, bit) - bit;
> }
>
> /*
> * Prepare the subblock bitmap for the next memory block.
> */
>-static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
>+static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
> const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
> const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
> int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
>- unsigned long *new_sb_bitmap, *old_sb_bitmap;
>+ unsigned long *new_bitmap, *old_bitmap;
>
>- if (vm->sb_bitmap && old_pages == new_pages)
>+ if (vm->sbm.sb_states && old_pages == new_pages)
> return 0;
>
>- new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
>- if (!new_sb_bitmap)
>+ new_bitmap = vzalloc(new_pages * PAGE_SIZE);
>+ if (!new_bitmap)
> return -ENOMEM;
>
> mutex_lock(&vm->hotplug_mutex);
>- if (new_sb_bitmap)
>- memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
>+ if (new_bitmap)
>+ memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
>
>- old_sb_bitmap = vm->sb_bitmap;
>- vm->sb_bitmap = new_sb_bitmap;
>+ old_bitmap = vm->sbm.sb_states;
>+ vm->sbm.sb_states = new_bitmap;
> mutex_unlock(&vm->hotplug_mutex);
>
>- vfree(old_sb_bitmap);
>+ vfree(old_bitmap);
> return 0;
> }
>
>@@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> int sb_id;
>
> for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> sb_id * vm->subblock_size);
>@@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> int sb_id;
>
> for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
>- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> sb_id * vm->subblock_size);
>@@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> * If plugged, online the pages, otherwise, set them fake
> * offline (PageOffline).
> */
>- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> generic_online_page(page, order);
> else
> virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
> rc = virtio_mem_send_plug_request(vm, addr, size);
> if (!rc)
>- virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
>+ virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
> return rc;
> }
>
>@@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
> rc = virtio_mem_send_unplug_request(vm, addr, size);
> if (!rc)
>- virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
>+ virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
> return rc;
> }
>
>@@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
> while (*nb_sb) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
>- virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
>+ virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
> sb_id--;
> if (sb_id < 0)
> break;
> /* Try to unplug multiple subblocks at a time */
> count = 1;
> while (count < *nb_sb && sb_id > 0 &&
>- virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
>+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
> count++;
> sb_id--;
> }
>@@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> return rc;
>
> /* Resize the subblock bitmap if required. */
>- rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
>+ rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
> if (rc)
> return rc;
>
>@@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> return -EINVAL;
>
> while (*nb_sb) {
>- sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
>+ sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
> if (sb_id >= vm->nb_sb_per_mb)
> break;
> count = 1;
> while (count < *nb_sb &&
> sb_id + count < vm->nb_sb_per_mb &&
>- !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
>- 1))
>+ !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> count++;
>
> rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
>@@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> virtio_mem_fake_online(pfn, nr_pages);
> }
>
>- if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> if (online)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_ONLINE);
>@@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
>
> /* some subblocks might have been unplugged even on failure */
>- if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
>+ if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> if (rc)
> return rc;
>
>- if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> /*
> * Remove the block from Linux - this should never fail.
> * Hinder the block from getting onlined by marking it
>@@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>
> /* If possible, try to unplug the complete block in one shot. */
> if (*nb_sb >= vm->nb_sb_per_mb &&
>- virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
> vm->nb_sb_per_mb);
> if (!rc) {
>@@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
>- !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> sb_id--;
> if (sb_id < 0)
> break;
>@@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> * remove it. This will usually not fail, as no memory is in use
> * anymore - however some other notifiers might NACK the request.
> */
>- if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
>+ if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> mutex_unlock(&vm->hotplug_mutex);
> rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> mutex_lock(&vm->hotplug_mutex);
>@@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>
> /* remove all tracking data - no locking needed */
> vfree(vm->sbm.mb_states);
>- vfree(vm->sb_bitmap);
>+ vfree(vm->sbm.sb_states);
>
> /* reset the device and cleanup the queues */
> vdev->config->reset(vdev);
>--
>2.26.2
On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>Currently, we do not support device block sizes that exceed the Linux
>memory block size. For example, having a device block size of 1 GiB (e.g.,
>gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>blocks.
>
>Let's implement Big Block Mode (BBM), whereby we add/remove at least
>one Linux memory block at a time. With a 1 GiB device block size, a Big
>Block (BB) will cover 8 Linux memory blocks.
>
>We'll keep registering the online_page_callback machinery, it will be used
>for safe memory hotunplug in BBM next.
>
>Note: BBM is properly prepared for variable-sized Linux memory
>blocks that we might see in the future. So we won't care how many Linux
>memory blocks a big block actually spans, and how the memory notifier is
>called.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Cc: Michal Hocko <[email protected]>
>Cc: Oscar Salvador <[email protected]>
>Cc: Wei Yang <[email protected]>
>Cc: Andrew Morton <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
> 1 file changed, 402 insertions(+), 82 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e68d0d99590c..4d396ef98a92 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
> /*
> * virtio-mem currently supports the following modes of operation:
> *
>- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
> * size of a Sub Block (SB) is determined based on the device block size, the
> * pageblock size, and the maximum allocation granularity of the buddy.
> * Subblocks within a Linux memory block might either be plugged or unplugged.
> * Memory is added/removed to Linux MM in Linux memory block granularity.
> *
>+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>+ * Memory is added/removed to Linux MM in Big Block granularity.
>+ *
>+ * The mode is determined automatically based on the Linux memory block size
>+ * and the device block size.
>+ *
> * User space / core MM (auto onlining) is responsible for onlining added
> * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
> * always onlined separately, and all memory within a Linux memory block is
>@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
> VIRTIO_MEM_SBM_MB_COUNT
> };
>
>+/*
>+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>+ */
>+enum virtio_mem_bbm_bb_state {
>+ /* Unplugged, not added to Linux. Can be reused later. */
>+ VIRTIO_MEM_BBM_BB_UNUSED = 0,
>+ /* Plugged, not added to Linux. Error on add_memory(). */
>+ VIRTIO_MEM_BBM_BB_PLUGGED,
>+ /* Plugged and added to Linux. */
>+ VIRTIO_MEM_BBM_BB_ADDED,
>+ VIRTIO_MEM_BBM_BB_COUNT
>+};
>+
> struct virtio_mem {
> struct virtio_device *vdev;
>
>@@ -113,6 +132,9 @@ struct virtio_mem {
> atomic64_t offline_size;
> uint64_t offline_threshold;
>
>+ /* If set, the driver is in SBM, otherwise in BBM. */
>+ bool in_sbm;
>+
> struct {
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
>@@ -151,9 +173,27 @@ struct virtio_mem {
> unsigned long *sb_states;
> } sbm;
>
>+ struct {
>+ /* Id of the first big block of this device. */
>+ unsigned long first_bb_id;
>+ /* Id of the last usable big block of this device. */
>+ unsigned long last_usable_bb_id;
>+ /* Id of the next big block to prepare when needed. */
>+ unsigned long next_bb_id;
>+
>+ /* Summary of all big block states. */
>+ unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>+
>+ /* One byte state per big block. See sbm.mb_states. */
>+ uint8_t *bb_states;
>+
>+ /* The block size used for (un)plugging and adding/removing. */
>+ uint64_t bb_size;
>+ } bbm;
Can we use a union here?
>+
> /*
>- * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>- * sbm.sb_states.
>+ * Mutex that protects the sbm.mb_count, sbm.mb_states,
>+ * sbm.sb_states, bbm.bb_count, and bbm.bb_states.
> *
> * When this lock is held the pointers can't change, ONLINE and
> * OFFLINE blocks can't change the state and no subblocks will get
>@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
> return mb_id * memory_block_size_bytes();
> }
>
>+/*
>+ * Calculate the big block id of a given address.
>+ */
>+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>+ uint64_t addr)
>+{
>+ return addr / vm->bbm.bb_size;
>+}
>+
>+/*
>+ * Calculate the physical start address of a given big block id.
>+ */
>+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ return bb_id * vm->bbm.bb_size;
>+}
>+
> /*
> * Calculate the subblock id of a given address.
> */
>@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> return (addr - mb_addr) / vm->sbm.sb_size;
> }
>
>+/*
>+ * Set the state of a big block, taking care of the state counter.
>+ */
>+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
>+ unsigned long bb_id,
>+ enum virtio_mem_bbm_bb_state state)
>+{
>+ const unsigned long idx = bb_id - vm->bbm.first_bb_id;
>+ enum virtio_mem_bbm_bb_state old_state;
>+
>+ old_state = vm->bbm.bb_states[idx];
>+ vm->bbm.bb_states[idx] = state;
>+
>+ BUG_ON(vm->bbm.bb_count[old_state] == 0);
>+ vm->bbm.bb_count[old_state]--;
>+ vm->bbm.bb_count[state]++;
>+}
>+
>+/*
>+ * Get the state of a big block.
>+ */
>+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
>+}
>+
>+/*
>+ * Prepare the big block state array for the next big block.
>+ */
>+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>+{
>+ unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
>+ unsigned long new_bytes = old_bytes + 1;
>+ int old_pages = PFN_UP(old_bytes);
>+ int new_pages = PFN_UP(new_bytes);
>+ uint8_t *new_array;
>+
>+ if (vm->bbm.bb_states && old_pages == new_pages)
>+ return 0;
>+
>+ new_array = vzalloc(new_pages * PAGE_SIZE);
>+ if (!new_array)
>+ return -ENOMEM;
>+
>+ mutex_lock(&vm->hotplug_mutex);
>+ if (vm->bbm.bb_states)
>+ memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
>+ vfree(vm->bbm.bb_states);
>+ vm->bbm.bb_states = new_array;
>+ mutex_unlock(&vm->hotplug_mutex);
>+
>+ return 0;
>+}
>+
>+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
>+ for (_bb_id = vm->bbm.first_bb_id; \
>+ _bb_id < vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
>+ _bb_id++) \
>+ if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>+
> /*
> * Set the state of a memory block, taking care of the state counter.
> */
>@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
> return virtio_mem_add_memory(vm, addr, size);
> }
>
>+/*
>+ * See virtio_mem_add_memory(): Try adding a big block.
>+ */
>+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_add_memory(vm, addr, size);
>+}
>+
> /*
> * Try removing memory from Linux. Will only fail if memory blocks aren't
> * offline.
>@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> struct memory_notify *mhp = arg;
> const unsigned long start = PFN_PHYS(mhp->start_pfn);
> const unsigned long size = PFN_PHYS(mhp->nr_pages);
>- const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
> int rc = NOTIFY_OK;
>+ unsigned long id;
>
> if (!virtio_mem_overlaps_range(vm, start, size))
> return NOTIFY_DONE;
>
>- /*
>- * Memory is onlined/offlined in memory block granularity. We cannot
>- * cross virtio-mem device boundaries and memory block boundaries. Bail
>- * out if this ever changes.
>- */
>- if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>- !IS_ALIGNED(start, memory_block_size_bytes())))
>- return NOTIFY_BAD;
>+ if (vm->in_sbm) {
>+ id = virtio_mem_phys_to_mb_id(start);
>+ /*
>+ * In SBM, we add memory in separate memory blocks - we expect
>+ * it to be onlined/offlined in the same granularity. Bail out
>+ * if this ever changes.
>+ */
>+ if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>+ !IS_ALIGNED(start, memory_block_size_bytes())))
>+ return NOTIFY_BAD;
>+ } else {
>+ id = virtio_mem_phys_to_bb_id(vm, start);
>+ /*
>+ * In BBM, we only care about onlining/offlining happening
>+ * within a single big block, we don't care about the
>+ * actual granularity as we don't track individual Linux
>+ * memory blocks.
>+ */
>+ if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
>+ return NOTIFY_BAD;
>+ }
>
> /*
> * Avoid circular locking lockdep warnings. We lock the mutex
>@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> break;
> }
> vm->hotplug_active = true;
>- virtio_mem_sbm_notify_going_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_going_offline(vm, id);
> break;
> case MEM_GOING_ONLINE:
> mutex_lock(&vm->hotplug_mutex);
>@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> break;
> }
> vm->hotplug_active = true;
>- rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
>+ if (vm->in_sbm)
>+ rc = virtio_mem_sbm_notify_going_online(vm, id);
> break;
> case MEM_OFFLINE:
>- virtio_mem_sbm_notify_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_offline(vm, id);
>
> atomic64_add(size, &vm->offline_size);
> /*
>@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> mutex_unlock(&vm->hotplug_mutex);
> break;
> case MEM_ONLINE:
>- virtio_mem_sbm_notify_online(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_online(vm, id);
>
> atomic64_sub(size, &vm->offline_size);
> /*
>@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> case MEM_CANCEL_OFFLINE:
> if (!vm->hotplug_active)
> break;
>- virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_cancel_offline(vm, id);
> vm->hotplug_active = false;
> mutex_unlock(&vm->hotplug_mutex);
> break;
>@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> const unsigned long addr = page_to_phys(page);
>- const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
>+ unsigned long id, sb_id;
> struct virtio_mem *vm;
>- int sb_id;
>+ bool do_online;
>
>- /*
>- * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
>- * size/alignment and that this callback is is called with such a
>- * size/alignment. So we cannot cross subblocks and therefore
>- * also not memory blocks.
>- */
> rcu_read_lock();
> list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> continue;
>
>- sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>- /*
>- * If plugged, online the pages, otherwise, set them fake
>- * offline (PageOffline).
>- */
>- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ if (vm->in_sbm) {
>+ /*
>+ * We exploit here that subblocks have at least
>+ * MAX_ORDER_NR_PAGES size/alignment - so we cannot
>+ * cross subblocks within one call.
>+ */
>+ id = virtio_mem_phys_to_mb_id(addr);
>+ sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>+ do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
>+ sb_id, 1);
>+ } else {
>+ do_online = true;
>+ }
>+ if (do_online)
> generic_online_page(page, order);
> else
> virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> return rc;
> }
>
>+/*
>+ * Request to unplug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_send_unplug_request(vm, addr, size);
>+}
>+
>+/*
>+ * Request to plug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_send_plug_request(vm, addr, size);
>+}
>+
> /*
> * Unplug the desired number of plugged subblocks of an offline or not-added
> * memory block. Will fail if any subblock cannot get unplugged (instead of
>@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> return 0;
> }
>
>-/*
>- * Try to plug the requested amount of memory.
>- */
>-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
> uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
>@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> return rc;
> }
>
>+/*
>+ * Plug a big block and add it to Linux.
>+ *
>+ * Will modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ int rc;
>+
>+ if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+ VIRTIO_MEM_BBM_BB_UNUSED))
>+ return -EINVAL;
>+
>+ rc = virtio_mem_bbm_plug_bb(vm, bb_id);
>+ if (rc)
>+ return rc;
>+ virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>+
>+ rc = virtio_mem_bbm_add_bb(vm, bb_id);
>+ if (rc) {
>+ if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
>+ virtio_mem_bbm_set_bb_state(vm, bb_id,
>+ VIRTIO_MEM_BBM_BB_UNUSED);
>+ else
>+ /* Retry from the main loop. */
>+ virtio_mem_bbm_set_bb_state(vm, bb_id,
>+ VIRTIO_MEM_BBM_BB_PLUGGED);
>+ return rc;
>+ }
>+ return 0;
>+}
>+
>+/*
>+ * Prepare tracking data for the next big block.
>+ */
>+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
>+ unsigned long *bb_id)
>+{
>+ int rc;
>+
>+ if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
>+ return -ENOSPC;
>+
>+ /* Resize the big block state array if required. */
>+ rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
>+ if (rc)
>+ return rc;
>+
>+ vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
>+ *bb_id = vm->bbm.next_bb_id;
>+ vm->bbm.next_bb_id++;
>+ return 0;
>+}
>+
>+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ uint64_t nb_bb = diff / vm->bbm.bb_size;
>+ unsigned long bb_id;
>+ int rc;
>+
>+ if (!nb_bb)
>+ return 0;
>+
>+ /* Try to plug and add unused big blocks */
>+ virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
>+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+ return -ENOSPC;
>+
>+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+ if (!rc)
>+ nb_bb--;
>+ if (rc || !nb_bb)
>+ return rc;
>+ cond_resched();
>+ }
>+
>+ /* Try to prepare, plug and add new big blocks */
>+ while (nb_bb) {
>+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+ return -ENOSPC;
>+
>+ rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
>+ if (rc)
>+ return rc;
>+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+ if (!rc)
>+ nb_bb--;
>+ if (rc)
>+ return rc;
>+ cond_resched();
>+ }
>+
>+ return 0;
>+}
>+
>+/*
>+ * Try to plug the requested amount of memory.
>+ */
>+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ if (vm->in_sbm)
>+ return virtio_mem_sbm_plug_request(vm, diff);
>+ return virtio_mem_bbm_plug_request(vm, diff);
>+}
>+
> /*
> * Unplug the desired number of plugged subblocks of an offline memory block.
> * Will fail if any subblock cannot get unplugged (instead of skipping it).
>@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> return 0;
> }
>
>-/*
>- * Try to unplug the requested amount of memory.
>- */
>-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
> uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
>@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> return rc;
> }
>
>+/*
>+ * Try to unplug the requested amount of memory.
>+ */
>+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ if (vm->in_sbm)
>+ return virtio_mem_sbm_unplug_request(vm, diff);
>+ return -EBUSY;
>+}
>+
> /*
> * Try to unplug all blocks that couldn't be unplugged before, for example,
> * because the hypervisor was busy.
> */
> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> {
>- unsigned long mb_id;
>+ unsigned long id;
> int rc;
>
>- virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>- rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>+ if (!vm->in_sbm) {
>+ virtio_mem_bbm_for_each_bb(vm, id,
>+ VIRTIO_MEM_BBM_BB_PLUGGED) {
>+ rc = virtio_mem_bbm_unplug_bb(vm, id);
>+ if (rc)
>+ return rc;
>+ virtio_mem_bbm_set_bb_state(vm, id,
>+ VIRTIO_MEM_BBM_BB_UNUSED);
>+ }
>+ return 0;
>+ }
>+
>+ virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>+ rc = virtio_mem_sbm_unplug_mb(vm, id);
> if (rc)
> return rc;
>- virtio_mem_sbm_set_mb_state(vm, mb_id,
>+ virtio_mem_sbm_set_mb_state(vm, id,
> VIRTIO_MEM_SBM_MB_UNUSED);
> }
>
>@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
> usable_region_size, &usable_region_size);
> end_addr = vm->addr + usable_region_size;
> end_addr = min(end_addr, phys_limit);
>- vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>+
>+ if (vm->in_sbm)
>+ vm->sbm.last_usable_mb_id =
>+ virtio_mem_phys_to_mb_id(end_addr) - 1;
>+ else
>+ vm->bbm.last_usable_bb_id =
>+ virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
>
> /* see if there is a request to change the size */
> virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
> static int virtio_mem_init(struct virtio_mem *vm)
> {
> const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>+ uint64_t sb_size, addr;
> uint16_t node_id;
>
> if (!vm->vdev->config->get) {
>@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> if (vm->nid == NUMA_NO_NODE)
> vm->nid = memory_add_physaddr_to_nid(vm->addr);
>
>- /*
>- * We always hotplug memory in memory block granularity. This way,
>- * we have to wait for exactly one memory block to online.
>- */
>- if (vm->device_block_size > memory_block_size_bytes()) {
>- dev_err(&vm->vdev->dev,
>- "The block size is not supported (too big).\n");
>- return -EINVAL;
>- }
>-
> /* bad device setup - warn only */
> if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
> dev_warn(&vm->vdev->dev,
>@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
> * - Is required for now for alloc_contig_range() to work reliably -
> * it doesn't properly handle smaller granularity on ZONE_NORMAL.
> */
>- vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>- pageblock_nr_pages) * PAGE_SIZE;
>- vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>- vm->sbm.sb_size);
>- vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>+ sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+ pageblock_nr_pages) * PAGE_SIZE;
>+ sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>+
>+ if (sb_size < memory_block_size_bytes()) {
>+ /* SBM: At least two subblocks per Linux memory block. */
>+ vm->in_sbm = true;
>+ vm->sbm.sb_size = sb_size;
>+ vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>+ vm->sbm.sb_size;
>+
>+ /* Round up to the next full memory block */
>+ addr = vm->addr + memory_block_size_bytes() - 1;
>+ vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>+ vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+ } else {
>+ /* BBM: At least one Linux memory block. */
>+ vm->bbm.bb_size = vm->device_block_size;
>
>- /* Round up to the next full memory block */
>- vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>- memory_block_size_bytes());
>- vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+ vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
>+ vm->bbm.next_bb_id = vm->bbm.first_bb_id;
>+ }
>
> /* Prepare the offline threshold - make sure we can add two blocks. */
> vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
> VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
>+ /* In BBM, we also want at least two big blocks. */
>+ vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
>+ vm->offline_threshold);
>
> dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
> (unsigned long long)vm->device_block_size);
> dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> memory_block_size_bytes());
>- dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>- (unsigned long long)vm->sbm.sb_size);
>+ if (vm->in_sbm)
>+ dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>+ (unsigned long long)vm->sbm.sb_size);
>+ else
>+ dev_info(&vm->vdev->dev, "big block size: 0x%llx",
>+ (unsigned long long)vm->bbm.bb_size);
> if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
>
>@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> cancel_work_sync(&vm->wq);
> hrtimer_cancel(&vm->retry_timer);
>
>- /*
>- * After we unregistered our callbacks, user space can online partially
>- * plugged offline blocks. Make sure to remove them.
>- */
>- virtio_mem_sbm_for_each_mb(vm, mb_id,
>- VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>- rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>- BUG_ON(rc);
>- virtio_mem_sbm_set_mb_state(vm, mb_id,
>- VIRTIO_MEM_SBM_MB_UNUSED);
>+ if (vm->in_sbm) {
>+ /*
>+ * After we unregistered our callbacks, user space can online
>+ * partially plugged offline blocks. Make sure to remove them.
>+ */
>+ virtio_mem_sbm_for_each_mb(vm, mb_id,
>+ VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>+ rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>+ BUG_ON(rc);
>+ virtio_mem_sbm_set_mb_state(vm, mb_id,
>+ VIRTIO_MEM_SBM_MB_UNUSED);
>+ }
>+ /*
>+ * After we unregistered our callbacks, user space can no longer
>+ * offline partially plugged online memory blocks. No need to
>+ * worry about them.
>+ */
> }
>- /*
>- * After we unregistered our callbacks, user space can no longer
>- * offline partially plugged online memory blocks. No need to worry
>- * about them.
>- */
>
> /* unregister callbacks */
> unregister_virtio_mem_device(vm);
>@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> }
>
> /* remove all tracking data - no locking needed */
>- vfree(vm->sbm.mb_states);
>- vfree(vm->sbm.sb_states);
>+ if (vm->in_sbm) {
>+ vfree(vm->sbm.mb_states);
>+ vfree(vm->sbm.sb_states);
>+ } else {
>+ vfree(vm->bbm.bb_states);
>+ }
>
> /* reset the device and cleanup the queues */
> vdev->config->reset(vdev);
>--
>2.26.2
--
Wei Yang
Help you, Help me
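The SBM-vs-BBM decision made in virtio_mem_init() in the patch above can be modeled in a few lines of user-space C: the subblock size is the maximum of the buddy/pageblock granularity and the device block size, and if it still fits below the Linux memory block size the driver runs in SBM, otherwise in BBM. All names and constants below are simplified stand-ins, not the kernel's:

```c
/*
 * User-space model of the SBM vs. BBM mode selection. MODEL_MIN_GRANULARITY
 * stands in for max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) * PAGE_SIZE;
 * MODEL_MEMORY_BLOCK_SIZE stands in for memory_block_size_bytes().
 */
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MEMORY_BLOCK_SIZE	(128ULL << 20)	/* 128 MiB, typical on x86-64 */
#define MODEL_MIN_GRANULARITY	(4ULL << 20)	/* illustrative placeholder */

static uint64_t model_max(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Returns true when the device would run in SBM, false for BBM. */
static bool model_selects_sbm(uint64_t device_block_size)
{
	uint64_t sb_size = model_max(MODEL_MIN_GRANULARITY, device_block_size);

	/* SBM requires at least two subblocks per Linux memory block. */
	return sb_size < MODEL_MEMORY_BLOCK_SIZE;
}
```

With these placeholder constants, a 2 MiB device block size (huge pages) selects SBM, while a 1 GiB device block size (gigantic pages) selects BBM - matching the motivation in the cover letter.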
On 16.10.20 10:53, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>
>> Cc: "Michael S. Tsirkin" <[email protected]>
>> Cc: Jason Wang <[email protected]>
>> Cc: Pankaj Gupta <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> One trivial suggestion: could we move this patch closer to the data structure
> movement patch?
>
> I know this would be some work, since you have changed some of the code logic,
> so rebasing would take some time.
You mean after patch #17?
I guess I can move patch #18 (prereq) a little further up (e.g., after
patch #15). Guess moving it in front of #19 shouldn't be too hard.
Will give it a try - if it takes too much effort, I'll leave it like this.
Thanks!
--
Thanks,
David / dhildenb
On 16.10.20 11:38, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>> Currently, we do not support device block sizes that exceed the Linux
>> memory block size. For example, having a device block size of 1 GiB (e.g.,
>> gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>> blocks.
>>
>> Let's implement Big Block Mode (BBM), whereby we add/remove at least
>> one Linux memory block at a time. With a 1 GiB device block size, a Big
>> Block (BB) will cover 8 Linux memory blocks.
>>
>> We'll keep registering the online_page_callback machinery, it will be used
>> for safe memory hotunplug in BBM next.
>>
>> Note: BBM is properly prepared for variable-sized Linux memory
>> blocks that we might see in the future. So we won't care how many Linux
>> memory blocks a big block actually spans, and how the memory notifier is
>> called.
>>
>> Cc: "Michael S. Tsirkin" <[email protected]>
>> Cc: Jason Wang <[email protected]>
>> Cc: Pankaj Gupta <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
>> 1 file changed, 402 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index e68d0d99590c..4d396ef98a92 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>> /*
>> * virtio-mem currently supports the following modes of operation:
>> *
>> - * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>> + * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>> * size of a Sub Block (SB) is determined based on the device block size, the
>> * pageblock size, and the maximum allocation granularity of the buddy.
>> * Subblocks within a Linux memory block might either be plugged or unplugged.
>> * Memory is added/removed to Linux MM in Linux memory block granularity.
>> *
>> + * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>> + * Memory is added/removed to Linux MM in Big Block granularity.
>> + *
>> + * The mode is determined automatically based on the Linux memory block size
>> + * and the device block size.
>> + *
>> * User space / core MM (auto onlining) is responsible for onlining added
>> * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>> * always onlined separately, and all memory within a Linux memory block is
>> @@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
>> VIRTIO_MEM_SBM_MB_COUNT
>> };
>>
>> +/*
>> + * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>> + */
>> +enum virtio_mem_bbm_bb_state {
>> + /* Unplugged, not added to Linux. Can be reused later. */
>> + VIRTIO_MEM_BBM_BB_UNUSED = 0,
>> + /* Plugged, not added to Linux. Error on add_memory(). */
>> + VIRTIO_MEM_BBM_BB_PLUGGED,
>> + /* Plugged and added to Linux. */
>> + VIRTIO_MEM_BBM_BB_ADDED,
>> + VIRTIO_MEM_BBM_BB_COUNT
>> +};
>> +
>> struct virtio_mem {
>> struct virtio_device *vdev;
>>
>> @@ -113,6 +132,9 @@ struct virtio_mem {
>> atomic64_t offline_size;
>> uint64_t offline_threshold;
>>
>> + /* If set, the driver is in SBM, otherwise in BBM. */
>> + bool in_sbm;
>> +
>> struct {
>> /* Id of the first memory block of this device. */
>> unsigned long first_mb_id;
>> @@ -151,9 +173,27 @@ struct virtio_mem {
>> unsigned long *sb_states;
>> } sbm;
>>
>> + struct {
>> + /* Id of the first big block of this device. */
>> + unsigned long first_bb_id;
>> + /* Id of the last usable big block of this device. */
>> + unsigned long last_usable_bb_id;
>> + /* Id of the next device block to prepare when needed. */
>> + unsigned long next_bb_id;
>> +
>> + /* Summary of all big block states. */
>> + unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>> +
>> + /* One byte state per big block. See sbm.mb_states. */
>> + uint8_t *bb_states;
>> +
>> + /* The block size used for (un)plugging and adding/removing. */
>> + uint64_t bb_size;
>> + } bbm;
>
> Can we use a union here?
As I had the same thought initially, it most probably makes sense :)
Thanks!
--
Thanks,
David / dhildenb
On Fri, Oct 16, 2020 at 03:17:06PM +0200, David Hildenbrand wrote:
>On 16.10.20 10:53, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>>
>>> Cc: "Michael S. Tsirkin" <[email protected]>
>>> Cc: Jason Wang <[email protected]>
>>> Cc: Pankaj Gupta <[email protected]>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>
>> One trivial suggestion: could we move this patch closer to the data structure
>> movement patch?
>>
>> I know this would be some work, since you have changed some of the code logic,
>> so rebasing would take some time.
>
>You mean after patch #17?
Yes
>
>I guess I can move patch #18 (prereq) a little further up (e.g., after
>patch #15). Guess moving it in front of #19 shouldn't be too hard.
>
>Will give it a try - if it takes too much effort, I'll leave it like this.
>
Not a big deal, but it would make the change more self-contained to me.
This is a big patch set. If it could be split into two parts - say, the
cleanups/logic improvements and the BBM implementation - that would be
friendlier to review.
>Thanks!
>
>--
>Thanks,
>
>David / dhildenb
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
>virtio-mem currently only supports device block sizes that span at most
>a single Linux memory block. For example, gigantic pages in the hypervisor
>result on x86-64 in a device block size of 1 GiB - when the Linux memory
>block size is 128 MiB, we cannot support such devices (we fail loading the
>driver). Of course, we want to support any device block size in any Linux
>VM.
>
>Bigger device block sizes will become especially important once supporting
>VFIO in QEMU - each device block has to be mapped separately, and the
>maximum number of mappings for VFIO is 64k. So we usually want blocks in
>the gigabyte range when wanting to grow the VM big.
>
>This series:
>- Performs some cleanups
>- Factors out existing Sub Block Mode (SBM)
>- Implements memory hot(un)plug in Big Block Mode (BBM)
>
>I need one core-mm change, to make offline_and_remove_memory() eat bigger
>chunks.
>
>This series is based on "next-20201009" and can be found at:
> [email protected]:virtio-mem/linux.git virtio-mem-dbm-v1
>
I am trying to apply this patch set, but found I can't 'git fetch' this
repo. Is there another repo from which I could apply it?
--
Wei Yang
Help you, Help me
On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
> virtio-mem currently only supports device block sizes that span at most
> a single Linux memory block. For example, gigantic pages in the hypervisor
> result on x86-64 in a device block size of 1 GiB - when the Linux memory
> block size is 128 MiB, we cannot support such devices (we fail loading the
> driver). Of course, we want to support any device block size in any Linux
> VM.
>
> Bigger device block sizes will become especially important once supporting
> VFIO in QEMU - each device block has to be mapped separately, and the
> maximum number of mappings for VFIO is 64k. So we usually want blocks in
> the gigabyte range when wanting to grow the VM big.
I guess it missed this Linux release, right? There's an mm change which did
not get an ack from the mm maintainers, so I can't merge it ...
> This series:
> - Performs some cleanups
> - Factors out existing Sub Block Mode (SBM)
> - Implements memory hot(un)plug in Big Block Mode (BBM)
>
> I need one core-mm change, to make offline_and_remove_memory() eat bigger
> chunks.
>
> This series is based on "next-20201009" and can be found at:
> [email protected]:virtio-mem/linux.git virtio-mem-dbm-v1
>
> Once some virtio-mem patches that are pending in the -mm tree are upstream
> (I guess they'll go in in 5.10), I'll resend based on Linus' tree.
> I suggest to take this (including the MM patch, acks/review please) via the
> vhost tree once time has come. In the meantime, I'll do more testing.
>
> David Hildenbrand (29):
> virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
> virtio-mem: simplify calculation in
> virtio_mem_mb_state_prepare_next_mb()
> virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
> virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
> virtio-mem: generalize check for added memory
> virtio-mem: generalize virtio_mem_owned_mb()
> virtio-mem: generalize virtio_mem_overlaps_range()
> virtio-mem: drop last_mb_id
> virtio-mem: don't always trigger the workqueue when offlining memory
> virtio-mem: generalize handling when memory is getting onlined
> deferred
> virtio-mem: use "unsigned long" for nr_pages when fake
> onlining/offlining
> virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
> virtio-mem: factor out handling of fake-offline pages in memory
> notifier
> virtio-mem: retry fake-offlining via alloc_contig_range() on
> ZONE_MOVABLE
> virito-mem: document Sub Block Mode (SBM)
> virtio-mem: memory block states are specific to Sub Block Mode (SBM)
> virito-mem: subblock states are specific to Sub Block Mode (SBM)
> virtio-mem: factor out calculation of the bit number within the
> sb_states bitmap
> virito-mem: existing (un)plug functions are specific to Sub Block Mode
> (SBM)
> virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block
> Mode (SBM)
> virtio-mem: memory notifier callbacks are specific to Sub Block Mode
> (SBM)
> virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
> virtio-mem: factor out adding/removing memory from Linux
> virtio-mem: print debug messages from virtio_mem_send_*_request()
> virtio-mem: Big Block Mode (BBM) memory hotplug
> virtio-mem: allow to force Big Block Mode (BBM) and set the big block
> size
> mm/memory_hotplug: extend offline_and_remove_memory() to handle more
> than one memory block
> virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
> virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
>
> drivers/virtio/virtio_mem.c | 1783 +++++++++++++++++++++++++----------
> mm/memory_hotplug.c | 105 ++-
> 2 files changed, 1373 insertions(+), 515 deletions(-)
>
> --
> 2.26.2
> On 18.10.2020 at 17:29, Michael S. Tsirkin <[email protected]> wrote:
>
> On Mon, Oct 12, 2020 at 02:52:54PM +0200, David Hildenbrand wrote:
>> virtio-mem currently only supports device block sizes that span at most
>> a single Linux memory block. For example, gigantic pages in the hypervisor
>> result on x86-64 in a device block size of 1 GiB - when the Linux memory
>> block size is 128 MiB, we cannot support such devices (we fail loading the
>> driver). Of course, we want to support any device block size in any Linux
>> VM.
>>
>> Bigger device block sizes will become especially important once supporting
>> VFIO in QEMU - each device block has to be mapped separately, and the
>> maximum number of mappings for VFIO is 64k. So we usually want blocks in
>> the gigabyte range when wanting to grow the VM big.
>
> I guess it missed this Linux release, right? There's an mm change which did
> not get an ack from the mm maintainers, so I can't merge it ...
No issue, I was targeting 5.11 either way! I'll resend based on Linus' tree now that all prereqs are upstream.
On Mon, Oct 12, 2020 at 02:53:15PM +0200, David Hildenbrand wrote:
>Let's rename accordingly.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index 3a772714fec9..d06c8760b337 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
> return start >= vm->addr && start + size <= vm->addr + vm->region_size;
> }
>
>-static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>- unsigned long mb_id)
>+static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
>+ unsigned long mb_id)
Looking at this patch together with "virtio-mem: Big Block Mode (BBM) memory
hotplug", I think the code is a little "complex".
The final logic of virtio_mem_memory_notifier_cb() looks like this:
virtio_mem_memory_notifier_cb()
if (vm->in_sbm)
notify_xxx()
if (vm->in_sbm)
notify_xxx()
Can we adjust it like this:
virtio_mem_memory_notifier_cb()
notify_xxx()
if (vm->in_sbm)
return
notify_xxx()
if (vm->in_sbm)
return
This style looks a little better to me.
--
Wei Yang
Help you, Help me
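The control-flow style discussed above - test the mode once and take an early return, rather than repeating the `if (vm->in_sbm)` guard before every helper call - can be sketched in plain C. The counters and function names here are illustrative stand-ins, not the kernel's:

```c
/*
 * Model of the early-return notifier dispatch: the mode is tested once up
 * front, and each mode then gets a single straight-line path.
 */
#include <stdbool.h>

/* Counters standing in for the per-mode notify_*() helpers. */
static int model_sbm_notifies;
static int model_bbm_notifies;

static void model_sbm_notify(void)
{
	model_sbm_notifies++;
}

static void model_bbm_notify(void)
{
	model_bbm_notifies++;
}

static void model_notifier_cb(bool in_sbm)
{
	if (in_sbm) {
		model_sbm_notify();
		return;
	}
	model_bbm_notify();
}
```

An equivalent alternative is to split the callback into a per-mode function (model_sbm_notifier_cb() / model_bbm_notifier_cb()) and dispatch once at the top, which keeps each mode's handling in one place.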
On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>Currently, we do not support device block sizes that exceed the Linux
>memory block size. For example, having a device block size of 1 GiB (e.g.,
>gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>blocks.
>
>Let's implement Big Block Mode (BBM), whereby we add/remove at least
>one Linux memory block at a time. With a 1 GiB device block size, a Big
>Block (BB) will cover 8 Linux memory blocks.
>
>We'll keep registering the online_page_callback machinery, it will be used
>for safe memory hotunplug in BBM next.
>
>Note: BBM is properly prepared for variable-sized Linux memory
>blocks that we might see in the future. So we won't care how many Linux
>memory blocks a big block actually spans, and how the memory notifier is
>called.
>
>Cc: "Michael S. Tsirkin" <[email protected]>
>Cc: Jason Wang <[email protected]>
>Cc: Pankaj Gupta <[email protected]>
>Cc: Michal Hocko <[email protected]>
>Cc: Oscar Salvador <[email protected]>
>Cc: Wei Yang <[email protected]>
>Cc: Andrew Morton <[email protected]>
>Signed-off-by: David Hildenbrand <[email protected]>
>---
> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
> 1 file changed, 402 insertions(+), 82 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index e68d0d99590c..4d396ef98a92 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
> /*
> * virtio-mem currently supports the following modes of operation:
> *
>- * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>+ * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
> * size of a Sub Block (SB) is determined based on the device block size, the
> * pageblock size, and the maximum allocation granularity of the buddy.
> * Subblocks within a Linux memory block might either be plugged or unplugged.
> * Memory is added/removed to Linux MM in Linux memory block granularity.
> *
>+ * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>+ * Memory is added/removed to Linux MM in Big Block granularity.
>+ *
>+ * The mode is determined automatically based on the Linux memory block size
>+ * and the device block size.
>+ *
> * User space / core MM (auto onlining) is responsible for onlining added
> * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
> * always onlined separately, and all memory within a Linux memory block is
>@@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
> VIRTIO_MEM_SBM_MB_COUNT
> };
>
>+/*
>+ * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>+ */
>+enum virtio_mem_bbm_bb_state {
>+ /* Unplugged, not added to Linux. Can be reused later. */
>+ VIRTIO_MEM_BBM_BB_UNUSED = 0,
>+ /* Plugged, not added to Linux. Error on add_memory(). */
>+ VIRTIO_MEM_BBM_BB_PLUGGED,
>+ /* Plugged and added to Linux. */
>+ VIRTIO_MEM_BBM_BB_ADDED,
>+ VIRTIO_MEM_BBM_BB_COUNT
>+};
>+
> struct virtio_mem {
> struct virtio_device *vdev;
>
>@@ -113,6 +132,9 @@ struct virtio_mem {
> atomic64_t offline_size;
> uint64_t offline_threshold;
>
>+ /* If set, the driver is in SBM, otherwise in BBM. */
>+ bool in_sbm;
>+
> struct {
> /* Id of the first memory block of this device. */
> unsigned long first_mb_id;
>@@ -151,9 +173,27 @@ struct virtio_mem {
> unsigned long *sb_states;
> } sbm;
>
>+ struct {
>+ /* Id of the first big block of this device. */
>+ unsigned long first_bb_id;
>+ /* Id of the last usable big block of this device. */
>+ unsigned long last_usable_bb_id;
>+ /* Id of the next device block to prepare when needed. */
>+ unsigned long next_bb_id;
>+
>+ /* Summary of all big block states. */
>+ unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>+
>+ /* One byte state per big block. See sbm.mb_states. */
>+ uint8_t *bb_states;
>+
>+ /* The block size used for (un)plugging and adding/removing. */
>+ uint64_t bb_size;
>+ } bbm;
>+
> /*
>- * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>- * sbm.sb_states.
>+ * Mutex that protects the sbm.mb_count, sbm.mb_states,
>+ * sbm.sb_states, bbm.bb_count, and bbm.bb_states.
> *
> * When this lock is held the pointers can't change, ONLINE and
> * OFFLINE blocks can't change the state and no subblocks will get
>@@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
> return mb_id * memory_block_size_bytes();
> }
>
>+/*
>+ * Calculate the big block id of a given address.
>+ */
>+static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>+ uint64_t addr)
>+{
>+ return addr / vm->bbm.bb_size;
>+}
>+
>+/*
>+ * Calculate the physical start address of a given big block id.
>+ */
>+static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ return bb_id * vm->bbm.bb_size;
>+}
>+
> /*
> * Calculate the subblock id of a given address.
> */
>@@ -259,6 +317,67 @@ static unsigned long virtio_mem_phys_to_sb_id(struct virtio_mem *vm,
> return (addr - mb_addr) / vm->sbm.sb_size;
> }
>
>+/*
>+ * Set the state of a big block, taking care of the state counter.
>+ */
>+static void virtio_mem_bbm_set_bb_state(struct virtio_mem *vm,
>+ unsigned long bb_id,
>+ enum virtio_mem_bbm_bb_state state)
>+{
>+ const unsigned long idx = bb_id - vm->bbm.first_bb_id;
>+ enum virtio_mem_bbm_bb_state old_state;
>+
>+ old_state = vm->bbm.bb_states[idx];
>+ vm->bbm.bb_states[idx] = state;
>+
>+ BUG_ON(vm->bbm.bb_count[old_state] == 0);
>+ vm->bbm.bb_count[old_state]--;
>+ vm->bbm.bb_count[state]++;
>+}
>+
>+/*
>+ * Get the state of a big block.
>+ */
>+static enum virtio_mem_bbm_bb_state virtio_mem_bbm_get_bb_state(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ return vm->bbm.bb_states[bb_id - vm->bbm.first_bb_id];
>+}
>+
>+/*
>+ * Prepare the big block state array for the next big block.
>+ */
>+static int virtio_mem_bbm_bb_states_prepare_next_bb(struct virtio_mem *vm)
>+{
>+ unsigned long old_bytes = vm->bbm.next_bb_id - vm->bbm.first_bb_id;
>+ unsigned long new_bytes = old_bytes + 1;
>+ int old_pages = PFN_UP(old_bytes);
>+ int new_pages = PFN_UP(new_bytes);
>+ uint8_t *new_array;
>+
>+ if (vm->bbm.bb_states && old_pages == new_pages)
>+ return 0;
>+
>+ new_array = vzalloc(new_pages * PAGE_SIZE);
>+ if (!new_array)
>+ return -ENOMEM;
>+
>+ mutex_lock(&vm->hotplug_mutex);
>+ if (vm->bbm.bb_states)
>+ memcpy(new_array, vm->bbm.bb_states, old_pages * PAGE_SIZE);
>+ vfree(vm->bbm.bb_states);
>+ vm->bbm.bb_states = new_array;
>+ mutex_unlock(&vm->hotplug_mutex);
>+
>+ return 0;
>+}
>+
>+#define virtio_mem_bbm_for_each_bb(_vm, _bb_id, _state) \
>+ for (_bb_id = _vm->bbm.first_bb_id; \
>+ _bb_id < _vm->bbm.next_bb_id && _vm->bbm.bb_count[_state]; \
>+ _bb_id++) \
>+ if (virtio_mem_bbm_get_bb_state(_vm, _bb_id) == _state)
>+
> /*
> * Set the state of a memory block, taking care of the state counter.
> */
>@@ -504,6 +623,17 @@ static int virtio_mem_sbm_add_mb(struct virtio_mem *vm, unsigned long mb_id)
> return virtio_mem_add_memory(vm, addr, size);
> }
>
>+/*
>+ * See virtio_mem_add_memory(): Try adding a big block.
>+ */
>+static int virtio_mem_bbm_add_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_add_memory(vm, addr, size);
>+}
>+
> /*
> * Try removing memory from Linux. Will only fail if memory blocks aren't
> * offline.
>@@ -731,20 +861,33 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> struct memory_notify *mhp = arg;
> const unsigned long start = PFN_PHYS(mhp->start_pfn);
> const unsigned long size = PFN_PHYS(mhp->nr_pages);
>- const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
> int rc = NOTIFY_OK;
>+ unsigned long id;
>
> if (!virtio_mem_overlaps_range(vm, start, size))
> return NOTIFY_DONE;
>
>- /*
>- * Memory is onlined/offlined in memory block granularity. We cannot
>- * cross virtio-mem device boundaries and memory block boundaries. Bail
>- * out if this ever changes.
>- */
>- if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>- !IS_ALIGNED(start, memory_block_size_bytes())))
>- return NOTIFY_BAD;
>+ if (vm->in_sbm) {
>+ id = virtio_mem_phys_to_mb_id(start);
>+ /*
>+ * In SBM, we add memory in separate memory blocks - we expect
>+ * it to be onlined/offlined in the same granularity. Bail out
>+ * if this ever changes.
>+ */
>+ if (WARN_ON_ONCE(size != memory_block_size_bytes() ||
>+ !IS_ALIGNED(start, memory_block_size_bytes())))
>+ return NOTIFY_BAD;
>+ } else {
>+ id = virtio_mem_phys_to_bb_id(vm, start);
>+ /*
>+		 * In BBM, we only care about onlining/offlining happening
>+		 * within a single big block; the actual granularity does not
>+		 * matter, as we don't track individual Linux memory blocks.
>+ */
>+ if (WARN_ON_ONCE(id != virtio_mem_phys_to_bb_id(vm, start + size - 1)))
>+ return NOTIFY_BAD;
>+ }
>
> /*
> * Avoid circular locking lockdep warnings. We lock the mutex
>@@ -763,7 +906,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> break;
> }
> vm->hotplug_active = true;
>- virtio_mem_sbm_notify_going_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_going_offline(vm, id);
> break;
> case MEM_GOING_ONLINE:
> mutex_lock(&vm->hotplug_mutex);
>@@ -773,10 +917,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> break;
> }
> vm->hotplug_active = true;
>- rc = virtio_mem_sbm_notify_going_online(vm, mb_id);
>+ if (vm->in_sbm)
>+ rc = virtio_mem_sbm_notify_going_online(vm, id);
> break;
> case MEM_OFFLINE:
>- virtio_mem_sbm_notify_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_offline(vm, id);
>
> atomic64_add(size, &vm->offline_size);
> /*
>@@ -790,7 +936,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> mutex_unlock(&vm->hotplug_mutex);
> break;
> case MEM_ONLINE:
>- virtio_mem_sbm_notify_online(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_online(vm, id);
>
> atomic64_sub(size, &vm->offline_size);
> /*
>@@ -809,7 +956,8 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
> case MEM_CANCEL_OFFLINE:
> if (!vm->hotplug_active)
> break;
>- virtio_mem_sbm_notify_cancel_offline(vm, mb_id);
>+ if (vm->in_sbm)
>+ virtio_mem_sbm_notify_cancel_offline(vm, id);
> vm->hotplug_active = false;
> mutex_unlock(&vm->hotplug_mutex);
> break;
>@@ -980,27 +1128,29 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
> static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> {
> const unsigned long addr = page_to_phys(page);
>- const unsigned long mb_id = virtio_mem_phys_to_mb_id(addr);
>+ unsigned long id, sb_id;
> struct virtio_mem *vm;
>- int sb_id;
>+ bool do_online;
>
>- /*
>- * We exploit here that subblocks have at least MAX_ORDER_NR_PAGES.
>- * size/alignment and that this callback is is called with such a
>- * size/alignment. So we cannot cross subblocks and therefore
>- * also not memory blocks.
>- */
> rcu_read_lock();
> list_for_each_entry_rcu(vm, &virtio_mem_devices, next) {
> if (!virtio_mem_contains_range(vm, addr, PFN_PHYS(1 << order)))
> continue;
>
>- sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>- /*
>- * If plugged, online the pages, otherwise, set them fake
>- * offline (PageOffline).
>- */
>- if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
>+ if (vm->in_sbm) {
>+ /*
>+ * We exploit here that subblocks have at least
>+ * MAX_ORDER_NR_PAGES size/alignment - so we cannot
>+ * cross subblocks within one call.
>+ */
>+ id = virtio_mem_phys_to_mb_id(addr);
>+ sb_id = virtio_mem_phys_to_sb_id(vm, addr);
>+ do_online = virtio_mem_sbm_test_sb_plugged(vm, id,
>+ sb_id, 1);
>+ } else {
>+ do_online = true;
>+ }
>+ if (do_online)
> generic_online_page(page, order);
> else
> virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
>@@ -1180,6 +1330,32 @@ static int virtio_mem_sbm_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
> return rc;
> }
>
>+/*
>+ * Request to unplug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_unplug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_send_unplug_request(vm, addr, size);
>+}
>+
>+/*
>+ * Request to plug a big block.
>+ *
>+ * Will not modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_bb(struct virtio_mem *vm, unsigned long bb_id)
>+{
>+ const uint64_t addr = virtio_mem_bb_id_to_phys(vm, bb_id);
>+ const uint64_t size = vm->bbm.bb_size;
>+
>+ return virtio_mem_send_plug_request(vm, addr, size);
>+}
>+
> /*
> * Unplug the desired number of plugged subblocks of an offline or not-added
> * memory block. Will fail if any subblock cannot get unplugged (instead of
>@@ -1365,10 +1541,7 @@ static int virtio_mem_sbm_plug_any_sb(struct virtio_mem *vm,
> return 0;
> }
>
>-/*
>- * Try to plug the requested amount of memory.
>- */
>-static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_plug_request(struct virtio_mem *vm, uint64_t diff)
> {
> uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
>@@ -1435,6 +1608,112 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
> return rc;
> }
>
>+/*
>+ * Plug a big block and add it to Linux.
>+ *
>+ * Will modify the state of the big block.
>+ */
>+static int virtio_mem_bbm_plug_and_add_bb(struct virtio_mem *vm,
>+ unsigned long bb_id)
>+{
>+ int rc;
>+
>+ if (WARN_ON_ONCE(virtio_mem_bbm_get_bb_state(vm, bb_id) !=
>+ VIRTIO_MEM_BBM_BB_UNUSED))
>+ return -EINVAL;
>+
>+ rc = virtio_mem_bbm_plug_bb(vm, bb_id);
>+ if (rc)
>+ return rc;
>+ virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>+
>+ rc = virtio_mem_bbm_add_bb(vm, bb_id);
>+ if (rc) {
>+ if (!virtio_mem_bbm_unplug_bb(vm, bb_id))
>+ virtio_mem_bbm_set_bb_state(vm, bb_id,
>+ VIRTIO_MEM_BBM_BB_UNUSED);
>+ else
>+ /* Retry from the main loop. */
>+ virtio_mem_bbm_set_bb_state(vm, bb_id,
>+ VIRTIO_MEM_BBM_BB_PLUGGED);
>+ return rc;
>+ }
>+ return 0;
>+}
>+
>+/*
>+ * Prepare tracking data for the next big block.
>+ */
>+static int virtio_mem_bbm_prepare_next_bb(struct virtio_mem *vm,
>+ unsigned long *bb_id)
>+{
>+ int rc;
>+
>+ if (vm->bbm.next_bb_id > vm->bbm.last_usable_bb_id)
>+ return -ENOSPC;
>+
>+ /* Resize the big block state array if required. */
>+ rc = virtio_mem_bbm_bb_states_prepare_next_bb(vm);
>+ if (rc)
>+ return rc;
>+
>+ vm->bbm.bb_count[VIRTIO_MEM_BBM_BB_UNUSED]++;
>+ *bb_id = vm->bbm.next_bb_id;
>+ vm->bbm.next_bb_id++;
>+ return 0;
>+}
>+
>+static int virtio_mem_bbm_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ uint64_t nb_bb = diff / vm->bbm.bb_size;
>+ unsigned long bb_id;
>+ int rc;
>+
>+ if (!nb_bb)
>+ return 0;
>+
>+ /* Try to plug and add unused big blocks */
>+ virtio_mem_bbm_for_each_bb(vm, bb_id, VIRTIO_MEM_BBM_BB_UNUSED) {
>+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+ return -ENOSPC;
>+
>+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+ if (!rc)
>+ nb_bb--;
>+ if (rc || !nb_bb)
>+ return rc;
>+ cond_resched();
>+ }
>+
>+ /* Try to prepare, plug and add new big blocks */
>+ while (nb_bb) {
>+ if (!virtio_mem_could_add_memory(vm, vm->bbm.bb_size))
>+ return -ENOSPC;
>+
>+ rc = virtio_mem_bbm_prepare_next_bb(vm, &bb_id);
>+ if (rc)
>+ return rc;
>+ rc = virtio_mem_bbm_plug_and_add_bb(vm, bb_id);
>+ if (!rc)
>+ nb_bb--;
>+ if (rc)
>+ return rc;
>+ cond_resched();
>+ }
>+
>+ return 0;
>+}
>+
>+/*
>+ * Try to plug the requested amount of memory.
>+ */
>+static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ if (vm->in_sbm)
>+ return virtio_mem_sbm_plug_request(vm, diff);
>+ return virtio_mem_bbm_plug_request(vm, diff);
>+}
>+
> /*
> * Unplug the desired number of plugged subblocks of an offline memory block.
> * Will fail if any subblock cannot get unplugged (instead of skipping it).
>@@ -1573,10 +1852,7 @@ static int virtio_mem_sbm_unplug_any_sb_online(struct virtio_mem *vm,
> return 0;
> }
>
>-/*
>- * Try to unplug the requested amount of memory.
>- */
>-static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
> {
> uint64_t nb_sb = diff / vm->sbm.sb_size;
> unsigned long mb_id;
>@@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
> return rc;
> }
>
>+/*
>+ * Try to unplug the requested amount of memory.
>+ */
>+static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>+{
>+ if (vm->in_sbm)
>+ return virtio_mem_sbm_unplug_request(vm, diff);
>+ return -EBUSY;
>+}
>+
> /*
> * Try to unplug all blocks that couldn't be unplugged before, for example,
> * because the hypervisor was busy.
> */
> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
> {
>- unsigned long mb_id;
>+ unsigned long id;
> int rc;
>
>- virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>- rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>+ if (!vm->in_sbm) {
>+ virtio_mem_bbm_for_each_bb(vm, id,
>+ VIRTIO_MEM_BBM_BB_PLUGGED) {
>+ rc = virtio_mem_bbm_unplug_bb(vm, id);
>+ if (rc)
>+ return rc;
>+ virtio_mem_bbm_set_bb_state(vm, id,
>+ VIRTIO_MEM_BBM_BB_UNUSED);
>+ }
>+ return 0;
>+ }
>+
>+ virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>+ rc = virtio_mem_sbm_unplug_mb(vm, id);
> if (rc)
> return rc;
>- virtio_mem_sbm_set_mb_state(vm, mb_id,
>+ virtio_mem_sbm_set_mb_state(vm, id,
> VIRTIO_MEM_SBM_MB_UNUSED);
> }
>
>@@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
> usable_region_size, &usable_region_size);
> end_addr = vm->addr + usable_region_size;
> end_addr = min(end_addr, phys_limit);
>- vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>+
>+ if (vm->in_sbm)
>+ vm->sbm.last_usable_mb_id =
>+ virtio_mem_phys_to_mb_id(end_addr) - 1;
>+ else
>+ vm->bbm.last_usable_bb_id =
>+ virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
>
> /* see if there is a request to change the size */
> virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>@@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
> static int virtio_mem_init(struct virtio_mem *vm)
> {
> const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>+ uint64_t sb_size, addr;
> uint16_t node_id;
>
> if (!vm->vdev->config->get) {
>@@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
> if (vm->nid == NUMA_NO_NODE)
> vm->nid = memory_add_physaddr_to_nid(vm->addr);
>
>- /*
>- * We always hotplug memory in memory block granularity. This way,
>- * we have to wait for exactly one memory block to online.
>- */
>- if (vm->device_block_size > memory_block_size_bytes()) {
>- dev_err(&vm->vdev->dev,
>- "The block size is not supported (too big).\n");
>- return -EINVAL;
>- }
>-
> /* bad device setup - warn only */
> if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
> dev_warn(&vm->vdev->dev,
>@@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
> * - Is required for now for alloc_contig_range() to work reliably -
> * it doesn't properly handle smaller granularity on ZONE_NORMAL.
> */
>- vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>- pageblock_nr_pages) * PAGE_SIZE;
>- vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>- vm->sbm.sb_size);
>- vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>+ sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>+ pageblock_nr_pages) * PAGE_SIZE;
>+ sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>+
>+ if (sb_size < memory_block_size_bytes()) {
>+ /* SBM: At least two subblocks per Linux memory block. */
>+ vm->in_sbm = true;
>+ vm->sbm.sb_size = sb_size;
>+ vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>+ vm->sbm.sb_size;
>+
>+ /* Round up to the next full memory block */
>+ addr = vm->addr + memory_block_size_bytes() - 1;
>+ vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>+ vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+ } else {
>+ /* BBM: At least one Linux memory block. */
>+ vm->bbm.bb_size = vm->device_block_size;
>
>- /* Round up to the next full memory block */
>- vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>- memory_block_size_bytes());
>- vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>+ vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
Per my understanding, vm->addr is not guaranteed to be bb_size aligned, right?
Why not round up to the next big block?
>+ vm->bbm.next_bb_id = vm->bbm.first_bb_id;
>+ }
>
> /* Prepare the offline threshold - make sure we can add two blocks. */
> vm->offline_threshold = max_t(uint64_t, 2 * memory_block_size_bytes(),
> VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
>+ /* In BBM, we also want at least two big blocks. */
>+ vm->offline_threshold = max_t(uint64_t, 2 * vm->bbm.bb_size,
>+ vm->offline_threshold);
>
> dev_info(&vm->vdev->dev, "start address: 0x%llx", vm->addr);
> dev_info(&vm->vdev->dev, "region size: 0x%llx", vm->region_size);
>@@ -1886,8 +2196,12 @@ static int virtio_mem_init(struct virtio_mem *vm)
> (unsigned long long)vm->device_block_size);
> dev_info(&vm->vdev->dev, "memory block size: 0x%lx",
> memory_block_size_bytes());
>- dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>- (unsigned long long)vm->sbm.sb_size);
>+ if (vm->in_sbm)
>+ dev_info(&vm->vdev->dev, "subblock size: 0x%llx",
>+ (unsigned long long)vm->sbm.sb_size);
>+ else
>+ dev_info(&vm->vdev->dev, "big block size: 0x%llx",
>+ (unsigned long long)vm->bbm.bb_size);
> if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
> dev_info(&vm->vdev->dev, "nid: %d", vm->nid);
>
>@@ -2044,22 +2358,24 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> cancel_work_sync(&vm->wq);
> hrtimer_cancel(&vm->retry_timer);
>
>- /*
>- * After we unregistered our callbacks, user space can online partially
>- * plugged offline blocks. Make sure to remove them.
>- */
>- virtio_mem_sbm_for_each_mb(vm, mb_id,
>- VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>- rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>- BUG_ON(rc);
>- virtio_mem_sbm_set_mb_state(vm, mb_id,
>- VIRTIO_MEM_SBM_MB_UNUSED);
>+ if (vm->in_sbm) {
>+ /*
>+ * After we unregistered our callbacks, user space can online
>+ * partially plugged offline blocks. Make sure to remove them.
>+ */
>+ virtio_mem_sbm_for_each_mb(vm, mb_id,
>+ VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) {
>+ rc = virtio_mem_sbm_remove_mb(vm, mb_id);
>+ BUG_ON(rc);
>+ virtio_mem_sbm_set_mb_state(vm, mb_id,
>+ VIRTIO_MEM_SBM_MB_UNUSED);
>+ }
>+ /*
>+ * After we unregistered our callbacks, user space can no longer
>+ * offline partially plugged online memory blocks. No need to
>+ * worry about them.
>+ */
> }
>- /*
>- * After we unregistered our callbacks, user space can no longer
>- * offline partially plugged online memory blocks. No need to worry
>- * about them.
>- */
>
> /* unregister callbacks */
> unregister_virtio_mem_device(vm);
>@@ -2078,8 +2394,12 @@ static void virtio_mem_remove(struct virtio_device *vdev)
> }
>
> /* remove all tracking data - no locking needed */
>- vfree(vm->sbm.mb_states);
>- vfree(vm->sbm.sb_states);
>+ if (vm->in_sbm) {
>+ vfree(vm->sbm.mb_states);
>+ vfree(vm->sbm.sb_states);
>+ } else {
>+ vfree(vm->bbm.bb_states);
>+ }
>
> /* reset the device and cleanup the queues */
> vdev->config->reset(vdev);
>--
>2.26.2
--
Wei Yang
Help you, Help me
On 19.10.20 04:26, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:19PM +0200, David Hildenbrand wrote:
>> Currently, we do not support device block sizes that exceed the Linux
>> memory block size. For example, having a device block size of 1 GiB (e.g.,
>> gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
>> blocks.
>>
>> Let's implement Big Block Mode (BBM), whereby we add/remove at least
>> one Linux memory block at a time. With a 1 GiB device block size, a Big
>> Block (BB) will cover 8 Linux memory blocks.
>>
>> We'll keep registering the online_page_callback machinery, it will be used
>> for safe memory hotunplug in BBM next.
>>
>> Note: BBM is properly prepared for variable-sized Linux memory
>> blocks that we might see in the future. So we won't care how many Linux
>> memory blocks a big block actually spans, and how the memory notifier is
>> called.
>>
>> Cc: "Michael S. Tsirkin" <[email protected]>
>> Cc: Jason Wang <[email protected]>
>> Cc: Pankaj Gupta <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> drivers/virtio/virtio_mem.c | 484 ++++++++++++++++++++++++++++++------
>> 1 file changed, 402 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index e68d0d99590c..4d396ef98a92 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -30,12 +30,18 @@ MODULE_PARM_DESC(unplug_online, "Try to unplug online memory");
>> /*
>> * virtio-mem currently supports the following modes of operation:
>> *
>> - * * Sub Block Mode (SBM): A Linux memory block spans 1..X subblocks (SB). The
>> + * * Sub Block Mode (SBM): A Linux memory block spans 2..X subblocks (SB). The
>> * size of a Sub Block (SB) is determined based on the device block size, the
>> * pageblock size, and the maximum allocation granularity of the buddy.
>> * Subblocks within a Linux memory block might either be plugged or unplugged.
>> * Memory is added/removed to Linux MM in Linux memory block granularity.
>> *
>> + * * Big Block Mode (BBM): A Big Block (BB) spans 1..X Linux memory blocks.
>> + * Memory is added/removed to Linux MM in Big Block granularity.
>> + *
>> + * The mode is determined automatically based on the Linux memory block size
>> + * and the device block size.
>> + *
>> * User space / core MM (auto onlining) is responsible for onlining added
>> * Linux memory blocks - and for selecting a zone. Linux Memory Blocks are
>> * always onlined separately, and all memory within a Linux memory block is
>> @@ -61,6 +67,19 @@ enum virtio_mem_sbm_mb_state {
>> VIRTIO_MEM_SBM_MB_COUNT
>> };
>>
>> +/*
>> + * State of a Big Block (BB) in BBM, covering 1..X Linux memory blocks.
>> + */
>> +enum virtio_mem_bbm_bb_state {
>> + /* Unplugged, not added to Linux. Can be reused later. */
>> + VIRTIO_MEM_BBM_BB_UNUSED = 0,
>> + /* Plugged, not added to Linux. Error on add_memory(). */
>> + VIRTIO_MEM_BBM_BB_PLUGGED,
>> + /* Plugged and added to Linux. */
>> + VIRTIO_MEM_BBM_BB_ADDED,
>> + VIRTIO_MEM_BBM_BB_COUNT
>> +};
>> +
>> struct virtio_mem {
>> struct virtio_device *vdev;
>>
>> @@ -113,6 +132,9 @@ struct virtio_mem {
>> atomic64_t offline_size;
>> uint64_t offline_threshold;
>>
>> + /* If set, the driver is in SBM, otherwise in BBM. */
>> + bool in_sbm;
>> +
>> struct {
>> /* Id of the first memory block of this device. */
>> unsigned long first_mb_id;
>> @@ -151,9 +173,27 @@ struct virtio_mem {
>> unsigned long *sb_states;
>> } sbm;
>>
>> + struct {
>> + /* Id of the first big block of this device. */
>> + unsigned long first_bb_id;
>> + /* Id of the last usable big block of this device. */
>> + unsigned long last_usable_bb_id;
>> +	/* Id of the next device block to prepare when needed. */
>> + unsigned long next_bb_id;
>> +
>> + /* Summary of all big block states. */
>> + unsigned long bb_count[VIRTIO_MEM_BBM_BB_COUNT];
>> +
>> + /* One byte state per big block. See sbm.mb_states. */
>> + uint8_t *bb_states;
>> +
>> +	/* The block size used for (un)plugging and adding/removing. */
>> + uint64_t bb_size;
>> + } bbm;
>> +
>> /*
>> - * Mutex that protects the sbm.mb_count, sbm.mb_states, and
>> - * sbm.sb_states.
>> + * Mutex that protects the sbm.mb_count, sbm.mb_states,
>> + * sbm.sb_states, bbm.bb_count, and bbm.bb_states
>> *
>> * When this lock is held the pointers can't change, ONLINE and
>> * OFFLINE blocks can't change the state and no subblocks will get
>> @@ -247,6 +287,24 @@ static unsigned long virtio_mem_mb_id_to_phys(unsigned long mb_id)
>> return mb_id * memory_block_size_bytes();
>> }
>>
>> +/*
>> + * Calculate the big block id of a given address.
>> + */
>> +static unsigned long virtio_mem_phys_to_bb_id(struct virtio_mem *vm,
>> + uint64_t addr)
>> +{
>> + return addr / vm->bbm.bb_size;
>> +}
>> +
>> +/*
>> + * Calculate the physical start address of a given big block id.
>> + */
>> +static uint64_t virtio_mem_bb_id_to_phys(struct virtio_mem *vm,
>> + unsigned long bb_id)
>> +{
>> + return bb_id * vm->bbm.bb_size;
>> +}
>> +
>> [...]
>> +static int virtio_mem_sbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> {
>> uint64_t nb_sb = diff / vm->sbm.sb_size;
>> unsigned long mb_id;
>> @@ -1642,20 +1918,42 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> return rc;
>> }
>>
>> +/*
>> + * Try to unplug the requested amount of memory.
>> + */
>> +static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff)
>> +{
>> + if (vm->in_sbm)
>> + return virtio_mem_sbm_unplug_request(vm, diff);
>> + return -EBUSY;
>> +}
>> +
>> /*
>> * Try to unplug all blocks that couldn't be unplugged before, for example,
>> * because the hypervisor was busy.
>> */
>> static int virtio_mem_unplug_pending_mb(struct virtio_mem *vm)
>> {
>> - unsigned long mb_id;
>> + unsigned long id;
>> int rc;
>>
>> - virtio_mem_sbm_for_each_mb(vm, mb_id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>> - rc = virtio_mem_sbm_unplug_mb(vm, mb_id);
>> + if (!vm->in_sbm) {
>> + virtio_mem_bbm_for_each_bb(vm, id,
>> + VIRTIO_MEM_BBM_BB_PLUGGED) {
>> + rc = virtio_mem_bbm_unplug_bb(vm, id);
>> + if (rc)
>> + return rc;
>> + virtio_mem_bbm_set_bb_state(vm, id,
>> + VIRTIO_MEM_BBM_BB_UNUSED);
>> + }
>> + return 0;
>> + }
>> +
>> + virtio_mem_sbm_for_each_mb(vm, id, VIRTIO_MEM_SBM_MB_PLUGGED) {
>> + rc = virtio_mem_sbm_unplug_mb(vm, id);
>> if (rc)
>> return rc;
>> - virtio_mem_sbm_set_mb_state(vm, mb_id,
>> + virtio_mem_sbm_set_mb_state(vm, id,
>> VIRTIO_MEM_SBM_MB_UNUSED);
>> }
>>
>> @@ -1681,7 +1979,13 @@ static void virtio_mem_refresh_config(struct virtio_mem *vm)
>> usable_region_size, &usable_region_size);
>> end_addr = vm->addr + usable_region_size;
>> end_addr = min(end_addr, phys_limit);
>> - vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr) - 1;
>> +
>> + if (vm->in_sbm)
>> + vm->sbm.last_usable_mb_id =
>> + virtio_mem_phys_to_mb_id(end_addr) - 1;
>> + else
>> + vm->bbm.last_usable_bb_id =
>> + virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
>>
>> /* see if there is a request to change the size */
>> virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
>> @@ -1804,6 +2108,7 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
>> static int virtio_mem_init(struct virtio_mem *vm)
>> {
>> const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
>> + uint64_t sb_size, addr;
>> uint16_t node_id;
>>
>> if (!vm->vdev->config->get) {
>> @@ -1836,16 +2141,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
>> if (vm->nid == NUMA_NO_NODE)
>> vm->nid = memory_add_physaddr_to_nid(vm->addr);
>>
>> - /*
>> - * We always hotplug memory in memory block granularity. This way,
>> - * we have to wait for exactly one memory block to online.
>> - */
>> - if (vm->device_block_size > memory_block_size_bytes()) {
>> - dev_err(&vm->vdev->dev,
>> - "The block size is not supported (too big).\n");
>> - return -EINVAL;
>> - }
>> -
>> /* bad device setup - warn only */
>> if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
>> dev_warn(&vm->vdev->dev,
>> @@ -1865,20 +2160,35 @@ static int virtio_mem_init(struct virtio_mem *vm)
>> * - Is required for now for alloc_contig_range() to work reliably -
>> * it doesn't properly handle smaller granularity on ZONE_NORMAL.
>> */
>> - vm->sbm.sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>> - pageblock_nr_pages) * PAGE_SIZE;
>> - vm->sbm.sb_size = max_t(uint64_t, vm->device_block_size,
>> - vm->sbm.sb_size);
>> - vm->sbm.sbs_per_mb = memory_block_size_bytes() / vm->sbm.sb_size;
>> + sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
>> + pageblock_nr_pages) * PAGE_SIZE;
>> + sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
>> +
>> + if (sb_size < memory_block_size_bytes()) {
>> + /* SBM: At least two subblocks per Linux memory block. */
>> + vm->in_sbm = true;
>> + vm->sbm.sb_size = sb_size;
>> + vm->sbm.sbs_per_mb = memory_block_size_bytes() /
>> + vm->sbm.sb_size;
>> +
>> + /* Round up to the next full memory block */
>> + addr = vm->addr + memory_block_size_bytes() - 1;
>> + vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
>> + vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>> + } else {
>> + /* BBM: At least one Linux memory block. */
>> + vm->bbm.bb_size = vm->device_block_size;
>>
>> - /* Round up to the next full memory block */
>> - vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(vm->addr - 1 +
>> - memory_block_size_bytes());
>> - vm->sbm.next_mb_id = vm->sbm.first_mb_id;
>> + vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, vm->addr);
>
> Per my understanding, vm->addr is not guaranteed to be bb_size aligned, right?
>
The virtio spec enforces alignment to the device block size. (QEMU is buggy
with bigger block sizes, though; a fix is on the QEMU list.)
> Why not round up to next big block?
That will implicitly be done in patch #26, where the alignment might no
longer be guaranteed.
--
Thanks,
David / dhildenb
On 18.10.20 14:41, Wei Yang wrote:
> On Fri, Oct 16, 2020 at 03:17:06PM +0200, David Hildenbrand wrote:
>> On 16.10.20 10:53, Wei Yang wrote:
>>> On Mon, Oct 12, 2020 at 02:53:14PM +0200, David Hildenbrand wrote:
>>>> Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
>>>>
>>>> Cc: "Michael S. Tsirkin" <[email protected]>
>>>> Cc: Jason Wang <[email protected]>
>>>> Cc: Pankaj Gupta <[email protected]>
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>
>>> One trivial suggestion: could we move this patch closer to the data
>>> structure movement patch?
>>>
>>> I know this would be some work, since you have changed some of the code
>>> logic, so the rebase would take you some time.
>>
>> You mean after patch #17 ?
>
> Yes
>
>>
>> I guess I can move patch #18 (prereq) a little further up (e.g., after
>> patch #15). Guess moving it in front of #19 shouldn't be too hard.
>>
>> Will give it a try - if it takes too much effort, I'll leave it like this.
>>
>
> Not a big deal, though it would make the changes more self-contained to me.
>
> This is a big patch set. If it could be split into two parts, such as bug
> fixes/logic improvements and the BBM implementation, that would be
> friendlier to review.
I'll most probably keep it as a single series, but reshuffle the patches
into
1. cleanups
2. preparations
3. BBM
That should make things easier to digest. Thanks!
--
Thanks,
David / dhildenb
On 19.10.20 03:57, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:15PM +0200, David Hildenbrand wrote:
>> Let's rename accordingly.
>>
>> Cc: "Michael S. Tsirkin" <[email protected]>
>> Cc: Jason Wang <[email protected]>
>> Cc: Pankaj Gupta <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> drivers/virtio/virtio_mem.c | 29 +++++++++++++++--------------
>> 1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>> index 3a772714fec9..d06c8760b337 100644
>> --- a/drivers/virtio/virtio_mem.c
>> +++ b/drivers/virtio/virtio_mem.c
>> @@ -589,8 +589,8 @@ static bool virtio_mem_contains_range(struct virtio_mem *vm, uint64_t start,
>> return start >= vm->addr && start + size <= vm->addr + vm->region_size;
>> }
>>
>> -static int virtio_mem_notify_going_online(struct virtio_mem *vm,
>> - unsigned long mb_id)
>> +static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm,
>> + unsigned long mb_id)
>
> Looking at this patch together with "virtio-mem: Big Block Mode (BBM)
> memory hotplug", I think the code is a little "complex".
>
> The final logic of virtio_mem_memory_notifier_cb() looks like this:
>
> virtio_mem_memory_notifier_cb()
> if (vm->in_sbm)
> notify_xxx()
> if (vm->in_sbm)
> notify_xxx()
>
> Can we adjust this like
>
> virtio_mem_memory_notifier_cb()
> notify_xxx()
> if (vm->in_sbm)
> return
> notify_xxx()
> if (vm->in_sbm)
> return
>
> This style looks a little better to me.
Then we'd lose all the shared code that runs after the mode-specific
handling, like we have in MEM_OFFLINE, MEM_ONLINE, MEM_CANCEL_OFFLINE, ...
I don't think this would improve the situation.
--
Thanks,
David / dhildenb
> Let's rename and move accordingly. While at it, rename sb_bitmap to
> "sb_states".
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: Pankaj Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> drivers/virtio/virtio_mem.c | 118 +++++++++++++++++++-----------------
> 1 file changed, 62 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index e76d6f769aa5..2cc497ad8298 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -137,17 +137,23 @@ struct virtio_mem {
> * memory in one 4 KiB page.
> */
> uint8_t *mb_states;
> - } sbm;
>
> - /*
> - * $nb_sb_per_mb bit per memory block. Handled similar to sbm.mb_states.
> - *
> - * With 4MB subblocks, we manage 128GB of memory in one page.
> - */
> - unsigned long *sb_bitmap;
> + /*
> + * Bitmap: one bit per subblock. Allocated similar to
> + * sbm.mb_states.
> + *
> + * A set bit means the corresponding subblock is plugged,
> +	 * otherwise it's unplugged.
> + *
> + * With 4 MiB subblocks, we manage 128 GiB of memory in one
> + * 4 KiB page.
> + */
> + unsigned long *sb_states;
> + } sbm;
>
> /*
> - * Mutex that protects the sbm.mb_count, sbm.mb_states, and sb_bitmap.
> + * Mutex that protects the sbm.mb_count, sbm.mb_states, and
> + * sbm.sb_states.
> *
> * When this lock is held the pointers can't change, ONLINE and
> * OFFLINE blocks can't change the state and no subblocks will get
> @@ -326,13 +332,13 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> *
> * Will not modify the state of the memory block.
> */
> -static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
> - unsigned long mb_id, int sb_id,
> - int count)
> +static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
> + unsigned long mb_id, int sb_id,
> + int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> - __bitmap_set(vm->sb_bitmap, bit, count);
> + __bitmap_set(vm->sbm.sb_states, bit, count);
> }
>
> /*
> @@ -340,86 +346,87 @@ static void virtio_mem_mb_set_sb_plugged(struct virtio_mem *vm,
> *
> * Will not modify the state of the memory block.
> */
> -static void virtio_mem_mb_set_sb_unplugged(struct virtio_mem *vm,
> - unsigned long mb_id, int sb_id,
> - int count)
> +static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
> + unsigned long mb_id, int sb_id,
> + int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> - __bitmap_clear(vm->sb_bitmap, bit, count);
> + __bitmap_clear(vm->sbm.sb_states, bit, count);
> }
>
> /*
> * Test if all selected subblocks are plugged.
> */
> -static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm,
> - unsigned long mb_id, int sb_id,
> - int count)
> +static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
> + unsigned long mb_id, int sb_id,
> + int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> if (count == 1)
> - return test_bit(bit, vm->sb_bitmap);
> + return test_bit(bit, vm->sbm.sb_states);
>
> /* TODO: Helper similar to bitmap_set() */
> - return find_next_zero_bit(vm->sb_bitmap, bit + count, bit) >=
> + return find_next_zero_bit(vm->sbm.sb_states, bit + count, bit) >=
> bit + count;
> }
>
> /*
> * Test if all selected subblocks are unplugged.
> */
> -static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm,
> - unsigned long mb_id, int sb_id,
> - int count)
> +static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> + unsigned long mb_id, int sb_id,
> + int count)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
>
> /* TODO: Helper similar to bitmap_set() */
> - return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count;
> + return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
> + bit + count;
> }
>
> /*
> * Find the first unplugged subblock. Returns vm->nb_sb_per_mb in case there is
> * none.
> */
> -static int virtio_mem_mb_first_unplugged_sb(struct virtio_mem *vm,
> +static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> unsigned long mb_id)
> {
> const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
>
> - return find_next_zero_bit(vm->sb_bitmap, bit + vm->nb_sb_per_mb, bit) -
> - bit;
> + return find_next_zero_bit(vm->sbm.sb_states,
> + bit + vm->nb_sb_per_mb, bit) - bit;
> }
>
> /*
> * Prepare the subblock bitmap for the next memory block.
> */
> -static int virtio_mem_sb_bitmap_prepare_next_mb(struct virtio_mem *vm)
> +static int virtio_mem_sbm_sb_states_prepare_next_mb(struct virtio_mem *vm)
> {
> const unsigned long old_nb_mb = vm->next_mb_id - vm->first_mb_id;
> const unsigned long old_nb_bits = old_nb_mb * vm->nb_sb_per_mb;
> const unsigned long new_nb_bits = (old_nb_mb + 1) * vm->nb_sb_per_mb;
> int old_pages = PFN_UP(BITS_TO_LONGS(old_nb_bits) * sizeof(long));
> int new_pages = PFN_UP(BITS_TO_LONGS(new_nb_bits) * sizeof(long));
> - unsigned long *new_sb_bitmap, *old_sb_bitmap;
> + unsigned long *new_bitmap, *old_bitmap;
>
> - if (vm->sb_bitmap && old_pages == new_pages)
> + if (vm->sbm.sb_states && old_pages == new_pages)
> return 0;
>
> - new_sb_bitmap = vzalloc(new_pages * PAGE_SIZE);
> - if (!new_sb_bitmap)
> + new_bitmap = vzalloc(new_pages * PAGE_SIZE);
> + if (!new_bitmap)
> return -ENOMEM;
>
> mutex_lock(&vm->hotplug_mutex);
> - if (new_sb_bitmap)
> - memcpy(new_sb_bitmap, vm->sb_bitmap, old_pages * PAGE_SIZE);
> + if (new_bitmap)
> + memcpy(new_bitmap, vm->sbm.sb_states, old_pages * PAGE_SIZE);
>
> - old_sb_bitmap = vm->sb_bitmap;
> - vm->sb_bitmap = new_sb_bitmap;
> + old_bitmap = vm->sbm.sb_states;
> + vm->sbm.sb_states = new_bitmap;
> mutex_unlock(&vm->hotplug_mutex);
>
> - vfree(old_sb_bitmap);
> + vfree(old_bitmap);
> return 0;
> }
>
> @@ -630,7 +637,7 @@ static void virtio_mem_notify_going_offline(struct virtio_mem *vm,
> int sb_id;
>
> for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> - if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> + if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> sb_id * vm->subblock_size);
> @@ -646,7 +653,7 @@ static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm,
> int sb_id;
>
> for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) {
> - if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> + if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> continue;
> pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
> sb_id * vm->subblock_size);
> @@ -936,7 +943,7 @@ static void virtio_mem_online_page_cb(struct page *page, unsigned int order)
> * If plugged, online the pages, otherwise, set them fake
> * offline (PageOffline).
> */
> - if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> + if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> generic_online_page(page, order);
> else
> virtio_mem_set_fake_offline(PFN_DOWN(addr), 1 << order,
> @@ -1071,7 +1078,7 @@ static int virtio_mem_mb_plug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
> rc = virtio_mem_send_plug_request(vm, addr, size);
> if (!rc)
> - virtio_mem_mb_set_sb_plugged(vm, mb_id, sb_id, count);
> + virtio_mem_sbm_set_sb_plugged(vm, mb_id, sb_id, count);
> return rc;
> }
>
> @@ -1092,7 +1099,7 @@ static int virtio_mem_mb_unplug_sb(struct virtio_mem *vm, unsigned long mb_id,
>
> rc = virtio_mem_send_unplug_request(vm, addr, size);
> if (!rc)
> - virtio_mem_mb_set_sb_unplugged(vm, mb_id, sb_id, count);
> + virtio_mem_sbm_set_sb_unplugged(vm, mb_id, sb_id, count);
> return rc;
> }
>
> @@ -1115,14 +1122,14 @@ static int virtio_mem_mb_unplug_any_sb(struct virtio_mem *vm,
> while (*nb_sb) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
> - virtio_mem_mb_test_sb_unplugged(vm, mb_id, sb_id, 1))
> + virtio_mem_sbm_test_sb_unplugged(vm, mb_id, sb_id, 1))
> sb_id--;
> if (sb_id < 0)
> break;
> /* Try to unplug multiple subblocks at a time */
> count = 1;
> while (count < *nb_sb && sb_id > 0 &&
> - virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
> + virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id - 1, 1)) {
> count++;
> sb_id--;
> }
> @@ -1168,7 +1175,7 @@ static int virtio_mem_prepare_next_mb(struct virtio_mem *vm,
> return rc;
>
> /* Resize the subblock bitmap if required. */
> - rc = virtio_mem_sb_bitmap_prepare_next_mb(vm);
> + rc = virtio_mem_sbm_sb_states_prepare_next_mb(vm);
> if (rc)
> return rc;
>
> @@ -1253,14 +1260,13 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> return -EINVAL;
>
> while (*nb_sb) {
> - sb_id = virtio_mem_mb_first_unplugged_sb(vm, mb_id);
> + sb_id = virtio_mem_sbm_first_unplugged_sb(vm, mb_id);
> if (sb_id >= vm->nb_sb_per_mb)
> break;
> count = 1;
> while (count < *nb_sb &&
> sb_id + count < vm->nb_sb_per_mb &&
> - !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id + count,
> - 1))
> + !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id + count, 1))
> count++;
>
> rc = virtio_mem_mb_plug_sb(vm, mb_id, sb_id, count);
> @@ -1277,7 +1283,7 @@ static int virtio_mem_mb_plug_any_sb(struct virtio_mem *vm, unsigned long mb_id,
> virtio_mem_fake_online(pfn, nr_pages);
> }
>
> - if (virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> + if (virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> if (online)
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_ONLINE);
> @@ -1377,13 +1383,13 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm,
> rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb);
>
> /* some subblocks might have been unplugged even on failure */
> - if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> + if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb))
> virtio_mem_sbm_set_mb_state(vm, mb_id,
> VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL);
> if (rc)
> return rc;
>
> - if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> + if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> /*
> * Remove the block from Linux - this should never fail.
> * Hinder the block from getting onlined by marking it
> @@ -1452,7 +1458,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
>
> /* If possible, try to unplug the complete block in one shot. */
> if (*nb_sb >= vm->nb_sb_per_mb &&
> - virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> + virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> rc = virtio_mem_mb_unplug_sb_online(vm, mb_id, 0,
> vm->nb_sb_per_mb);
> if (!rc) {
> @@ -1466,7 +1472,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> for (sb_id = vm->nb_sb_per_mb - 1; sb_id >= 0 && *nb_sb; sb_id--) {
> /* Find the next candidate subblock */
> while (sb_id >= 0 &&
> - !virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1))
> + !virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1))
> sb_id--;
> if (sb_id < 0)
> break;
> @@ -1485,7 +1491,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm,
> * remove it. This will usually not fail, as no memory is in use
> * anymore - however some other notifiers might NACK the request.
> */
> - if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> + if (virtio_mem_sbm_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) {
> mutex_unlock(&vm->hotplug_mutex);
> rc = virtio_mem_mb_offline_and_remove(vm, mb_id);
> mutex_lock(&vm->hotplug_mutex);
> @@ -2007,7 +2013,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>
> /* remove all tracking data - no locking needed */
> vfree(vm->sbm.mb_states);
> - vfree(vm->sb_bitmap);
> + vfree(vm->sbm.sb_states);
>
> /* reset the device and cleanup the queues */
> vdev->config->reset(vdev);
Reviewed-by: Pankaj Gupta <[email protected]>
> > Let's add some documentation for the current mode - Sub Block Mode (SBM) -
> > to prepare for a new mode - Big Block Mode (BBM).
> >
> > Follow-up patches will properly factor out the existing Sub Block Mode
> > (SBM) and implement Device Block Mode (DBM).
>
> s/Device Block Mode (DBM)/Big Block Mode (BBM)/
>
Reviewed-by: Pankaj Gupta <[email protected]>
> The calculation is already complicated enough, let's limit it to one
> location.
>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Jason Wang <[email protected]>
> Cc: Pankaj Gupta <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> drivers/virtio/virtio_mem.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 2cc497ad8298..73ff6e9ba839 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -327,6 +327,16 @@ static int virtio_mem_sbm_mb_states_prepare_next_mb(struct virtio_mem *vm)
> _mb_id--) \
> if (virtio_mem_sbm_get_mb_state(_vm, _mb_id) == _state)
>
> +/*
> + * Calculate the bit number in the sb_states bitmap for the given subblock
> + * inside the given memory block.
> + */
> +static int virtio_mem_sbm_sb_state_bit_nr(struct virtio_mem *vm,
> + unsigned long mb_id, int sb_id)
> +{
> + return (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> +}
> +
> /*
> * Mark all selected subblocks plugged.
> *
> @@ -336,7 +346,7 @@ static void virtio_mem_sbm_set_sb_plugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
> - const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> + const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> __bitmap_set(vm->sbm.sb_states, bit, count);
> }
> @@ -350,7 +360,7 @@ static void virtio_mem_sbm_set_sb_unplugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
> - const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> + const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> __bitmap_clear(vm->sbm.sb_states, bit, count);
> }
> @@ -362,7 +372,7 @@ static bool virtio_mem_sbm_test_sb_plugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
> - const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> + const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> if (count == 1)
> return test_bit(bit, vm->sbm.sb_states);
> @@ -379,7 +389,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> unsigned long mb_id, int sb_id,
> int count)
> {
> - const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id;
> + const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, sb_id);
>
> /* TODO: Helper similar to bitmap_set() */
> return find_next_bit(vm->sbm.sb_states, bit + count, bit) >=
> @@ -393,7 +403,7 @@ static bool virtio_mem_sbm_test_sb_unplugged(struct virtio_mem *vm,
> static int virtio_mem_sbm_first_unplugged_sb(struct virtio_mem *vm,
> unsigned long mb_id)
> {
> - const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb;
> + const int bit = virtio_mem_sbm_sb_state_bit_nr(vm, mb_id, 0);
>
> return find_next_zero_bit(vm->sbm.sb_states,
> bit + vm->nb_sb_per_mb, bit) - bit;
Agree, there are a lot of *b things; good to clean up as many as possible.
Reviewed-by: Pankaj Gupta <[email protected]>