2015-07-20 02:12:46

by KY Srinivasan

Subject: [PATCH 0/5] Drivers: hv: vmbus: Miscellaneous improvements and fixes

In addition to a bug fix and some improvements to the way we distribute channel
load amongst available CPUs, this patch set also includes an implementation of
a clocksource based on the TSC page that Hyper-V supports.

Christopher Oo (1):
Drivers: hv_vmbus: Fix signal to host condition

Dexuan Cui (1):
Drivers: hv: vmbus: Further improve CPU affiliation logic

K. Y. Srinivasan (2):
Drivers: hv: vmbus: Improve the CPU affiliation for channels
Drivers: hv: vmbus: Implement a clocksource based on the TSC page

Viresh Kumar (1):
drivers/hv: Migrate to new 'set-state' interface

arch/x86/include/uapi/asm/hyperv.h | 2 +
drivers/hv/channel_mgmt.c | 29 +++++++--
drivers/hv/hv.c | 120 ++++++++++++++++++++++++++++--------
drivers/hv/hyperv_vmbus.h | 14 ++++
drivers/hv/ring_buffer.c | 14 +---
5 files changed, 136 insertions(+), 43 deletions(-)

--
1.7.4.1


2015-07-20 02:13:38

by KY Srinivasan

Subject: [PATCH 1/5] Drivers: hv: vmbus: Improve the CPU affiliation for channels

The current code tracks the assigned CPUs within a NUMA node in the context of
the primary channel. So, if we have a VM with a single NUMA node with 8 VCPUs, we may
end up unevenly distributing the channel load. Fix the issue by tracking affiliations
globally.

Signed-off-by: K. Y. Srinivasan <[email protected]>
---
drivers/hv/channel_mgmt.c | 11 ++++++-----
drivers/hv/hv.c | 9 +++++++++
drivers/hv/hyperv_vmbus.h | 5 +++++
include/linux/hyperv.h | 1 -
4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 4506a66..f52373a 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -391,6 +391,7 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
struct vmbus_channel *primary = channel->primary_channel;
int next_node;
struct cpumask available_mask;
+ struct cpumask *alloced_mask;

for (i = IDE; i < MAX_PERF_CHN; i++) {
if (!memcmp(type_guid->b, hp_devs[i].guid,
@@ -408,7 +409,6 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
* channel, bind it to cpu 0.
*/
channel->numa_node = 0;
- cpumask_set_cpu(0, &channel->alloced_cpus_in_node);
channel->target_cpu = 0;
channel->target_vp = hv_context.vp_index[0];
return;
@@ -433,21 +433,22 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
channel->numa_node = next_node;
primary = channel;
}
+ alloced_mask = &hv_context.hv_numa_map[primary->numa_node];

- if (cpumask_weight(&primary->alloced_cpus_in_node) ==
+ if (cpumask_weight(alloced_mask) ==
cpumask_weight(cpumask_of_node(primary->numa_node))) {
/*
* We have cycled through all the CPUs in the node;
* reset the alloced map.
*/
- cpumask_clear(&primary->alloced_cpus_in_node);
+ cpumask_clear(alloced_mask);
}

- cpumask_xor(&available_mask, &primary->alloced_cpus_in_node,
+ cpumask_xor(&available_mask, alloced_mask,
cpumask_of_node(primary->numa_node));

cur_cpu = cpumask_next(-1, &available_mask);
- cpumask_set_cpu(cur_cpu, &primary->alloced_cpus_in_node);
+ cpumask_set_cpu(cur_cpu, alloced_mask);

channel->target_cpu = cur_cpu;
channel->target_vp = hv_context.vp_index[cur_cpu];
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index d3943bc..85498a7 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -329,6 +329,13 @@ int hv_synic_alloc(void)
size_t ced_size = sizeof(struct clock_event_device);
int cpu;

+ hv_context.hv_numa_map = kzalloc(sizeof(struct cpumask) * nr_node_ids,
+ GFP_ATOMIC);
+ if (hv_context.hv_numa_map == NULL) {
+ pr_err("Unable to allocate NUMA map\n");
+ goto err;
+ }
+
for_each_online_cpu(cpu) {
hv_context.event_dpc[cpu] = kmalloc(size, GFP_ATOMIC);
if (hv_context.event_dpc[cpu] == NULL) {
@@ -342,6 +349,7 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate clock event device\n");
goto err;
}
+
hv_init_clockevent_device(hv_context.clk_evt[cpu], cpu);

hv_context.synic_message_page[cpu] =
@@ -390,6 +398,7 @@ void hv_synic_free(void)
{
int cpu;

+ kfree(hv_context.hv_numa_map);
for_each_online_cpu(cpu)
hv_synic_free_cpu(cpu);
}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index cddc0c9..7bdef66 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -551,6 +551,11 @@ struct hv_context {
* Support PV clockevent device.
*/
struct clock_event_device *clk_evt[NR_CPUS];
+ /*
+ * To manage allocations in a NUMA node.
+ * Array indexed by numa node ID.
+ */
+ struct cpumask *hv_numa_map;
};

extern struct hv_context hv_context;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 30d3a1f..34cf811 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -699,7 +699,6 @@ struct vmbus_channel {
/*
* State to manage the CPU affiliation of channels.
*/
- struct cpumask alloced_cpus_in_node;
int numa_node;
/*
* Support for sub-channels. For high performance devices,
--
1.7.4.1
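
To illustrate the global tracking that patch 1/5 introduces, here is a minimal user-space sketch. It is not the kernel code: cpu_bits and pick_cpu_global() are hypothetical names, a 64-bit word stands in for struct cpumask, and the caller is assumed to keep one "alloced" word per NUMA node (the role hv_numa_map plays in the patch). The sketch computes the available set with AND-NOT, which matches the patch's cpumask_xor() as long as the allocated set only contains CPUs of that node.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits;

/*
 * Pick the next CPU for a channel on one NUMA node.
 *
 * node_cpus: CPUs that belong to the node (must be non-zero).
 * alloced:   CPUs already handed out on this node; shared by every
 *            channel on the node rather than stored per primary channel.
 *
 * Returns the chosen CPU number and records it in *alloced.
 */
static int pick_cpu_global(cpu_bits node_cpus, cpu_bits *alloced)
{
	cpu_bits available;
	int cpu = 0;

	assert(node_cpus != 0);

	/* Every CPU in the node has been used once: start a new round. */
	if ((*alloced & node_cpus) == node_cpus)
		*alloced &= ~node_cpus;

	/* CPUs of the node that have not been handed out in this round. */
	available = node_cpus & ~*alloced;

	/* 'available' is non-zero here, so this stops before bit 64. */
	while (!(available & ((cpu_bits)1 << cpu)))
		cpu++;

	*alloced |= (cpu_bits)1 << cpu;
	return cpu;
}
```

With a single node of 8 CPUs (node_cpus = 0xFF), successive calls return CPUs 0 through 7 and then wrap to 0, regardless of which channel asks.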

2015-07-20 02:13:40

by KY Srinivasan

Subject: [PATCH 2/5] Drivers: hv: vmbus: Further improve CPU affiliation logic

From: Dexuan Cui <[email protected]>

Keep track of CPU affiliations of sub-channels within the scope of the primary
channel. This will allow us to better distribute the load amongst available
CPUs.

Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
---
drivers/hv/channel_mgmt.c | 20 ++++++++++++++++++--
include/linux/hyperv.h | 1 +
2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f52373a..3ab4753 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -447,8 +447,24 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
cpumask_xor(&available_mask, alloced_mask,
cpumask_of_node(primary->numa_node));

- cur_cpu = cpumask_next(-1, &available_mask);
- cpumask_set_cpu(cur_cpu, alloced_mask);
+ cur_cpu = -1;
+ while (true) {
+ cur_cpu = cpumask_next(cur_cpu, &available_mask);
+ if (cur_cpu >= nr_cpu_ids) {
+ cur_cpu = -1;
+ cpumask_copy(&available_mask,
+ cpumask_of_node(primary->numa_node));
+ continue;
+ }
+
+ if (!cpumask_test_cpu(cur_cpu,
+ &primary->alloced_cpus_in_node)) {
+ cpumask_set_cpu(cur_cpu,
+ &primary->alloced_cpus_in_node);
+ cpumask_set_cpu(cur_cpu, alloced_mask);
+ break;
+ }
+ }

channel->target_cpu = cur_cpu;
channel->target_vp = hv_context.vp_index[cur_cpu];
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 34cf811..30d3a1f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -699,6 +699,7 @@ struct vmbus_channel {
/*
* State to manage the CPU affiliation of channels.
*/
+ struct cpumask alloced_cpus_in_node;
int numa_node;
/*
* Support for sub-channels. For high performance devices,
--
1.7.4.1
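
The two-level selection in patch 2/5 can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: cpu_bits, lowest_cpu() and pick_cpu_two_level() are hypothetical names, and a 64-bit word stands in for struct cpumask. The sketch also resets the per-primary set once it covers the whole node; that guard is an addition made here so the function always returns, and it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits;

/* Index of the lowest set bit; the caller guarantees mask != 0. */
static int lowest_cpu(cpu_bits mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!(mask & ((cpu_bits)1 << cpu)))
		cpu++;
	return cpu;
}

/*
 * node_cpus:       CPUs of the primary channel's NUMA node (non-zero).
 * node_alloced:    CPUs handed out node-wide in the current round.
 * primary_alloced: CPUs already used by this primary and its sub-channels.
 */
static int pick_cpu_two_level(cpu_bits node_cpus, cpu_bits *node_alloced,
			      cpu_bits *primary_alloced)
{
	cpu_bits candidates;
	int cpu;

	assert(node_cpus != 0);

	/* Node-wide round complete: start over, as in patch 1/5. */
	if ((*node_alloced & node_cpus) == node_cpus)
		*node_alloced &= ~node_cpus;

	/* Sketch-only guard (not in the patch): restart the per-primary round. */
	if ((*primary_alloced & node_cpus) == node_cpus)
		*primary_alloced &= ~node_cpus;

	/* Prefer CPUs unused both node-wide and by this primary. */
	candidates = node_cpus & ~*node_alloced & ~*primary_alloced;

	/* Otherwise take any node CPU this primary has not used yet. */
	if (candidates == 0)
		candidates = node_cpus & ~*primary_alloced;

	cpu = lowest_cpu(candidates);
	*node_alloced |= (cpu_bits)1 << cpu;
	*primary_alloced |= (cpu_bits)1 << cpu;
	return cpu;
}
```

The fallback step corresponds to the point in the patch where cpumask_next() runs off the end of available_mask and the mask is refilled from cpumask_of_node().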

2015-07-20 02:13:14

by KY Srinivasan

Subject: [PATCH 3/5] Drivers: hv_vmbus: Fix signal to host condition

From: Christopher Oo <[email protected]>

Fix a bug where hv_ringbuffer_read() passed the old number of bytes
available to read, instead of the expected old read index, when
calculating whether to signal the host that the ringbuffer is empty.
Since the previous write size is already saved, also change
hv_need_to_signal_on_read() to take that saved value rather than
recalculating it.

Signed-off-by: Christopher Oo <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
---
drivers/hv/ring_buffer.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 6361d12..70a1a9a 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -103,10 +103,9 @@ static bool hv_need_to_signal(u32 old_write, struct hv_ring_buffer_info *rbi)
* there is room for the producer to send the pending packet.
*/

-static bool hv_need_to_signal_on_read(u32 old_rd,
- struct hv_ring_buffer_info *rbi)
+static bool hv_need_to_signal_on_read(u32 prev_write_sz,
+ struct hv_ring_buffer_info *rbi)
{
- u32 prev_write_sz;
u32 cur_write_sz;
u32 r_size;
u32 write_loc = rbi->ring_buffer->write_index;
@@ -123,10 +122,6 @@ static bool hv_need_to_signal_on_read(u32 old_rd,
cur_write_sz = write_loc >= read_loc ? r_size - (write_loc - read_loc) :
read_loc - write_loc;

- prev_write_sz = write_loc >= old_rd ? r_size - (write_loc - old_rd) :
- old_rd - write_loc;
-
-
if ((prev_write_sz < pending_sz) && (cur_write_sz >= pending_sz))
return true;

@@ -517,7 +512,6 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info, void *buffer,
u32 next_read_location = 0;
u64 prev_indices = 0;
unsigned long flags;
- u32 old_read;

if (buflen <= 0)
return -EINVAL;
@@ -528,8 +522,6 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info, void *buffer,
&bytes_avail_toread,
&bytes_avail_towrite);

- old_read = bytes_avail_toread;
-
/* Make sure there is something to read */
if (bytes_avail_toread < buflen) {
spin_unlock_irqrestore(&inring_info->ring_lock, flags);
@@ -560,7 +552,7 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info, void *buffer,

spin_unlock_irqrestore(&inring_info->ring_lock, flags);

- *signal = hv_need_to_signal_on_read(old_read, inring_info);
+ *signal = hv_need_to_signal_on_read(bytes_avail_towrite, inring_info);

return 0;
}
--
1.7.4.1
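
The signalling condition touched by patch 3/5 can be checked with a small user-space sketch. The function names below are hypothetical and the ring is reduced to three numbers (read index, write index, size); the two expressions are taken from the diff above, everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free space in the ring as seen by the producer, using the same
 * expression the diff uses for cur_write_sz.
 */
static uint32_t ring_bytes_avail_to_write(uint32_t read_loc,
					  uint32_t write_loc,
					  uint32_t ring_size)
{
	return write_loc >= read_loc ? ring_size - (write_loc - read_loc)
				     : read_loc - write_loc;
}

/*
 * Signal only when this read is what made room for the pending packet:
 * there was too little free space before the read and enough after it.
 */
static bool ring_need_to_signal_on_read(uint32_t prev_write_sz,
					uint32_t cur_write_sz,
					uint32_t pending_sz)
{
	return prev_write_sz < pending_sz && cur_write_sz >= pending_sz;
}
```

For example, with a 4096-byte ring, the write index at 4000 and the read index moving from 100 to 1100, the free space grows from 196 to 1196 bytes, so a producer waiting for 1000 bytes is signalled. Feeding a bytes-available-to-read count in place of the old free space, as the old call site did, makes the first comparison meaningless.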

2015-07-20 02:13:39

by KY Srinivasan

Subject: [PATCH 4/5] drivers/hv: Migrate to new 'set-state' interface

From: Viresh Kumar <[email protected]>

Migrate the hv driver to the new 'set-state' interface provided by the
clockevents core; the earlier 'set-mode' interface is now marked
obsolete.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: [email protected]
Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
---
drivers/hv/hv.c | 45 +++++++++++++++++++--------------------------
1 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 85498a7..b9dd5e8 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -271,7 +271,7 @@ static int hv_ce_set_next_event(unsigned long delta,
{
cycle_t current_tick;

- WARN_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
+ WARN_ON(!clockevent_state_oneshot(evt));

rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
current_tick += delta;
@@ -279,31 +279,24 @@ static int hv_ce_set_next_event(unsigned long delta,
return 0;
}

-static void hv_ce_setmode(enum clock_event_mode mode,
- struct clock_event_device *evt)
+static int hv_ce_shutdown(struct clock_event_device *evt)
+{
+ wrmsrl(HV_X64_MSR_STIMER0_COUNT, 0);
+ wrmsrl(HV_X64_MSR_STIMER0_CONFIG, 0);
+
+ return 0;
+}
+
+static int hv_ce_set_oneshot(struct clock_event_device *evt)
{
union hv_timer_config timer_cfg;

- switch (mode) {
- case CLOCK_EVT_MODE_PERIODIC:
- /* unsupported */
- break;
-
- case CLOCK_EVT_MODE_ONESHOT:
- timer_cfg.enable = 1;
- timer_cfg.auto_enable = 1;
- timer_cfg.sintx = VMBUS_MESSAGE_SINT;
- wrmsrl(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
- break;
-
- case CLOCK_EVT_MODE_UNUSED:
- case CLOCK_EVT_MODE_SHUTDOWN:
- wrmsrl(HV_X64_MSR_STIMER0_COUNT, 0);
- wrmsrl(HV_X64_MSR_STIMER0_CONFIG, 0);
- break;
- case CLOCK_EVT_MODE_RESUME:
- break;
- }
+ timer_cfg.enable = 1;
+ timer_cfg.auto_enable = 1;
+ timer_cfg.sintx = VMBUS_MESSAGE_SINT;
+ wrmsrl(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
+
+ return 0;
}

static void hv_init_clockevent_device(struct clock_event_device *dev, int cpu)
@@ -318,7 +311,8 @@ static void hv_init_clockevent_device(struct clock_event_device *dev, int cpu)
* references to the hv_vmbus module making it impossible to unload.
*/

- dev->set_mode = hv_ce_setmode;
+ dev->set_state_shutdown = hv_ce_shutdown;
+ dev->set_state_oneshot = hv_ce_set_oneshot;
dev->set_next_event = hv_ce_set_next_event;
}

@@ -512,8 +506,7 @@ void hv_synic_cleanup(void *arg)

/* Turn off clockevent device */
if (ms_hyperv.features & HV_X64_MSR_SYNTIMER_AVAILABLE)
- hv_ce_setmode(CLOCK_EVT_MODE_SHUTDOWN,
- hv_context.clk_evt[cpu]);
+ hv_ce_shutdown(hv_context.clk_evt[cpu]);

rdmsrl(HV_X64_MSR_SINT0 + VMBUS_MESSAGE_SINT, shared_sint.as_uint64);

--
1.7.4.1
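
The shape of the change in patch 4/5, one callback per state instead of a single callback that switches on a mode, can be modelled in user space. Everything below is hypothetical (the sketch_ names, the timer_config field standing in for the STIMER0 config MSR, and the choice to return -1 for a state with no callback); it is not the clockevents core and makes no claim about how the core treats a missing callback.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state { SKETCH_SHUTDOWN, SKETCH_ONESHOT, SKETCH_PERIODIC };

struct sketch_clockevent {
	enum sketch_state state;
	unsigned long long timer_config; /* stands in for the timer config MSR */
	int (*set_state_shutdown)(struct sketch_clockevent *evt);
	int (*set_state_oneshot)(struct sketch_clockevent *evt);
	int (*set_state_periodic)(struct sketch_clockevent *evt);
};

/* Counterpart of hv_ce_shutdown(): disable the timer. */
static int sketch_shutdown(struct sketch_clockevent *evt)
{
	evt->timer_config = 0;
	return 0;
}

/* Counterpart of hv_ce_set_oneshot(): enable the timer. */
static int sketch_set_oneshot(struct sketch_clockevent *evt)
{
	evt->timer_config = 1;
	return 0;
}

/* Dispatch to the callback registered for the requested state. */
static int sketch_switch_state(struct sketch_clockevent *evt,
			       enum sketch_state state)
{
	int (*cb)(struct sketch_clockevent *evt) = NULL;
	int ret;

	switch (state) {
	case SKETCH_SHUTDOWN:
		cb = evt->set_state_shutdown;
		break;
	case SKETCH_ONESHOT:
		cb = evt->set_state_oneshot;
		break;
	case SKETCH_PERIODIC:
		cb = evt->set_state_periodic;
		break;
	}

	/* Sketch-only policy: a state without a callback is rejected. */
	if (cb == NULL)
		return -1;

	ret = cb(evt);
	if (ret == 0)
		evt->state = state;
	return ret;
}
```

A device that, like the one in the patch, registers only the shutdown and oneshot callbacks simply leaves the periodic slot empty instead of carrying an "unsupported" branch in a switch statement.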

2015-07-20 02:13:15

by KY Srinivasan

Subject: [PATCH 5/5] Drivers: hv: vmbus: Implement a clocksource based on the TSC page

The current Hyper-V clock source is based on the per-partition reference counter
and this counter is being accessed via a synthetic MSR - HV_X64_MSR_TIME_REF_COUNT.
Hyper-V has a more efficient way of computing the per-partition reference
counter value that does not involve reading a synthetic MSR. We implement
a time source based on this mechanism.

Tested-by: Vivek Yadav <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
---
arch/x86/include/uapi/asm/hyperv.h | 2 +
drivers/hv/hv.c | 66 ++++++++++++++++++++++++++++++++++++
drivers/hv/hyperv_vmbus.h | 9 +++++
3 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 8fba544..89f15e3 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -27,6 +27,8 @@
#define HV_X64_MSR_VP_RUNTIME_AVAILABLE (1 << 0)
/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
#define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1)
+/* Partition reference TSC MSR is available */
+#define HV_X64_MSR_REFERENCE_TSC_AVAILABLE (1 << 9)

/* A partition's reference time stamp counter (TSC) page */
#define HV_X64_MSR_REFERENCE_TSC 0x40000021
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index b9dd5e8..875f96b 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -130,6 +130,54 @@ static u64 do_hypercall(u64 control, void *input, void *output)
#endif /* !x86_64 */
}

+static cycle_t read_hv_clock_tsc(struct clocksource *arg)
+{
+ cycle_t current_tick;
+ struct ms_hyperv_tsc_page *tsc_pg = hv_context.tsc_page;
+
+ if (tsc_pg->tsc_sequence != -1) {
+ /*
+ * Use the tsc page to compute the value.
+ */
+
+ while (1) {
+ cycle_t tmp;
+ u32 sequence = tsc_pg->tsc_sequence;
+ u64 cur_tsc;
+ u64 scale = tsc_pg->tsc_scale;
+ s64 offset = tsc_pg->tsc_offset;
+
+ rdtscll(cur_tsc);
+ /* current_tick = ((cur_tsc *scale) >> 64) + offset */
+ asm("mulq %3"
+ : "=d" (current_tick), "=a" (tmp)
+ : "a" (cur_tsc), "r" (scale));
+
+ current_tick += offset;
+ if (tsc_pg->tsc_sequence == sequence)
+ return current_tick;
+
+ if (tsc_pg->tsc_sequence != -1)
+ continue;
+ /*
+ * Fallback using MSR method.
+ */
+ break;
+ }
+ }
+ rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
+ return current_tick;
+}
+
+static struct clocksource hyperv_cs_tsc = {
+ .name = "hyperv_clocksource_tsc_page",
+ .rating = 425,
+ .read = read_hv_clock_tsc,
+ .mask = CLOCKSOURCE_MASK(64),
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+
/*
* hv_init - Main initialization routine.
*
@@ -139,7 +187,9 @@ int hv_init(void)
{
int max_leaf;
union hv_x64_msr_hypercall_contents hypercall_msr;
+ union hv_x64_msr_hypercall_contents tsc_msr;
void *virtaddr = NULL;
+ void *va_tsc = NULL;

memset(hv_context.synic_event_page, 0, sizeof(void *) * NR_CPUS);
memset(hv_context.synic_message_page, 0,
@@ -183,6 +233,22 @@ int hv_init(void)

hv_context.hypercall_page = virtaddr;

+#ifdef CONFIG_X86_64
+ if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
+ va_tsc = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+ if (!va_tsc)
+ goto cleanup;
+ hv_context.tsc_page = va_tsc;
+
+ rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+ tsc_msr.enable = 1;
+ tsc_msr.guest_physical_address = vmalloc_to_pfn(va_tsc);
+
+ wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+ clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
+ }
+#endif
return 0;

cleanup:
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 7bdef66..4b1eb6d 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -517,6 +517,7 @@ struct hv_context {
u64 guestid;

void *hypercall_page;
+ void *tsc_page;

bool synic_initialized;

@@ -560,6 +561,14 @@ struct hv_context {

extern struct hv_context hv_context;

+struct ms_hyperv_tsc_page {
+ volatile u32 tsc_sequence;
+ u32 reserved1;
+ volatile u64 tsc_scale;
+ volatile s64 tsc_offset;
+ u64 reserved2[509];
+};
+
struct hv_ring_buffer_debug_info {
u32 current_interrupt_mask;
u32 current_read_index;
--
1.7.4.1
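
The computation in read_hv_clock_tsc(), current_tick = ((tsc * scale) >> 64) + offset guarded by a sequence check, can be exercised in user space with the sketch below. The names are hypothetical, the TSC value is passed in as a parameter instead of being read with rdtscll(), and the high half of the 128-bit product is computed portably rather than with the inline mulq. The function reports failure where the patch would fall back to the MSR. The sketch relies on volatile only and does not address memory ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of struct ms_hyperv_tsc_page. */
struct tsc_page_sketch {
	volatile uint32_t sequence;
	uint32_t reserved;
	volatile uint64_t scale;
	volatile int64_t offset;
};

/* The patch compares the u32 sequence against -1, i.e. all bits set. */
#define TSC_SEQ_INVALID UINT32_MAX

/* High 64 bits of the 128-bit product a * b, without 128-bit types. */
static uint64_t mul_u64_high(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;
	/* Carry out of the middle 64 bits of the product. */
	uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}

/*
 * Returns true and stores the reference time in *ticks when the page is
 * valid; returns false when the caller should use another time source.
 */
static bool tsc_page_read(const struct tsc_page_sketch *pg, uint64_t tsc,
			  uint64_t *ticks)
{
	for (;;) {
		uint32_t seq = pg->sequence;
		uint64_t scale, value;
		int64_t offset;

		if (seq == TSC_SEQ_INVALID)
			return false;

		scale = pg->scale;
		offset = pg->offset;
		value = mul_u64_high(tsc, scale) + (uint64_t)offset;

		/* Accept the value only if the page did not change meanwhile. */
		if (pg->sequence == seq) {
			*ticks = value;
			return true;
		}
	}
}
```

With scale = 2^63 the multiplication halves the TSC value, so a TSC reading of 1000 and an offset of 10 yields 510 ticks.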

2015-07-20 11:33:43

by Dexuan Cui

Subject: RE: [PATCH 3/5] Drivers: hv_vmbus: Fix signal to host condition

> -----Original Message-----
> From: devel On Behalf of K. Y. Srinivasan
> Sent: Monday, July 20, 2015 11:37
>
> From: Christopher Oo
>
> Fixes a bug where previously hv_ringbuffer_read would pass in the old
> number of bytes available to read instead of the expected old read index
> when calculating when to signal to the host that the ringbuffer is empty.
> Since the previous write size is already saved, also changes the
> hv_need_to_signal_on_read to use the previously read value rather than
> recalculating it.
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> @@ -560,7 +552,7 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info
> *inring_info, void *buffer,
>
> spin_unlock_irqrestore(&inring_info->ring_lock, flags);
>
> - *signal = hv_need_to_signal_on_read(old_read, inring_info);
> + *signal = hv_need_to_signal_on_read(bytes_avail_towrite, inring_info);
>
> return 0;
> }

Good catch!

-- Dexuan

2015-07-20 12:14:22

by Dexuan Cui

Subject: RE: [PATCH 5/5] Drivers: hv: vmbus: Implement a clocksource based on the TSC page

> From: devel [mailto:driverdev-devel-bounces...linuxdriverproject.org] On Behalf
> Of K. Y. Srinivasan
> Sent: Monday, July 20, 2015 11:37
>
> The current Hyper-V clock source is based on the per-partition reference counter
> and this counter is being accessed via a synthetic MSR -
> HV_X64_MSR_TIME_REF_COUNT.
> Hyper-V has a more efficient way of computing the per-partition reference
> counter value that does not involve reading a synthetic MSR. We implement
> a time source based on this mechanism.
> ...
> diff --git a/arch/x86/include/uapi/asm/hyperv.h
> @@ -183,6 +233,22 @@ int hv_init(void)
>
> hv_context.hypercall_page = virtaddr;
>
> +#ifdef CONFIG_X86_64
> + if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
> + va_tsc = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> + if (!va_tsc)
> + goto cleanup;
> + hv_context.tsc_page = va_tsc;
> +
> + rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
> +
> + tsc_msr.enable = 1;
> + tsc_msr.guest_physical_address = vmalloc_to_pfn(va_tsc);
> +
> + wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
> + clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
> + }
> +#endif

Should we disable the mechanism and vfree() the page in hv_cleanup() for kexec/kdump?

-- Dexuan

2015-07-20 18:50:56

by KY Srinivasan

Subject: RE: [PATCH 5/5] Drivers: hv: vmbus: Implement a clocksource based on the TSC page



> -----Original Message-----
> From: Dexuan Cui
> Sent: Monday, July 20, 2015 5:14 AM
> To: KY Srinivasan; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH 5/5] Drivers: hv: vmbus: Implement a clocksource based
> on the TSC page
>
> > From: devel [mailto:driverdev-devel-bounces...linuxdriverproject.org] On
> Behalf
> > Of K. Y. Srinivasan
> > Sent: Monday, July 20, 2015 11:37
> >
> > The current Hyper-V clock source is based on the per-partition reference
> counter
> > and this counter is being accessed via a synthetic MSR -
> > HV_X64_MSR_TIME_REF_COUNT.
> > Hyper-V has a more efficient way of computing the per-partition reference
> > counter value that does not involve reading a synthetic MSR. We
> implement
> > a time source based on this mechanism.
> > ...
> > diff --git a/arch/x86/include/uapi/asm/hyperv.h
> > @@ -183,6 +233,22 @@ int hv_init(void)
> >
> > hv_context.hypercall_page = virtaddr;
> >
> > +#ifdef CONFIG_X86_64
> > + if (ms_hyperv.features &
> HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
> > + va_tsc = __vmalloc(PAGE_SIZE, GFP_KERNEL,
> PAGE_KERNEL);
> > + if (!va_tsc)
> > + goto cleanup;
> > + hv_context.tsc_page = va_tsc;
> > +
> > + rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
> > +
> > + tsc_msr.enable = 1;
> > + tsc_msr.guest_physical_address = vmalloc_to_pfn(va_tsc);
> > +
> > + wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
> > + clocksource_register_hz(&hyperv_cs_tsc,
> NSEC_PER_SEC/100);
> > + }
> > +#endif
>
> Should we disable the mechanism and vfree() the page in hv_cleanup() for
> kexec/kdump?

Thanks Dexuan. I will update the patch and resend.

Regards,

K. Y
>
> -- Dexuan