2007-12-20 13:34:19

by Glauber Costa

Subject: [PATCH 0/16] lguest: introduce vcpu structure

This patch series makes room for the vcpu structure in lguest, already used in
this very same way in lguest64. It's the first part of our plan to
have lguest and lguest64 unified.

When two dogs hang out, you don't get new puppies the very next day.
Some time has to elapse. They have to grow first. In the same spirit, having these
patches does _not_ mean SMP guests can be launched (yet).
Much more work is to come, but this is the basic infrastructure.

Enjoy


2007-12-20 13:34:33

by Glauber Costa

Subject: [PATCH 01/16] introduce vcpu struct

This patch introduces a vcpu struct for lguest. In upcoming patches,
more and more fields will be moved from the lguest struct to the vcpu.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lg.h | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8692489..9723732 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -38,6 +38,13 @@ struct lguest_pages
#define CHANGED_GDT_TLS 4 /* Actually a subset of CHANGED_GDT */
#define CHANGED_ALL 3

+struct lguest;
+
+struct lguest_vcpu {
+ int vcpu_id;
+ struct lguest *lg;
+};
+
/* The private info the thread maintains about the guest. */
struct lguest
{
@@ -47,6 +54,9 @@ struct lguest
struct lguest_data __user *lguest_data;
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
+ struct lguest_vcpu vcpus[NR_CPUS];
+ unsigned int nr_vcpus;
+
u32 pfn_limit;
/* This provides the offset to the base of guest-physical
* memory in the Launcher. */
@@ -92,6 +102,11 @@ struct lguest
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};

+static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
+{
+ return container_of((vcpu - vcpu->vcpu_id), struct lguest, vcpus[0]);
+}
+
extern struct mutex lguest_lock;

/* core.c: */
--
1.5.0.6
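
The lg_of_vcpu() helper in the patch above relies on the vcpus being stored in an array inside struct lguest: subtracting vcpu_id from a vcpu pointer lands on vcpus[0], and container_of() then backs out to the enclosing structure. A minimal user-space sketch of that arithmetic follows. The container_of() macro here is a simplified stand-in for the kernel one, and MAX_VCPUS, struct guest, struct vcpu and guest_of_vcpu() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_VCPUS 4	/* hypothetical; the patch sizes the array with NR_CPUS */

struct guest;

struct vcpu {
	int vcpu_id;
	struct guest *g;
};

struct guest {
	int dead;			/* an unrelated field ahead of the array */
	struct vcpu vcpus[MAX_VCPUS];
	unsigned int nr_vcpus;
};

/* Step back vcpu_id elements to reach vcpus[0], then back out to the guest. */
static struct guest *guest_of_vcpu(struct vcpu *v)
{
	/* The arithmetic is only valid if vcpu_id is the real array index. */
	assert(v->vcpu_id >= 0 && v->vcpu_id < MAX_VCPUS);
	return container_of(v - v->vcpu_id, struct guest, vcpus[0]);
}
```

The sketch only works when vcpu_id matches the slot the vcpu actually occupies, which is why it asserts the range before doing the subtraction.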

2007-12-20 13:34:50

by Glauber Costa

Subject: [PATCH 03/16] initialize vcpu

This patch initializes the first vcpu in the initialize() routine,
which is responsible for starting the process of bringing the guest up.
Right now, since most of the fields are not yet per-vcpu, it does not
do much.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lguest_user.c | 17 +++++++++++++++++
1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 3b92a61..d1b1c26 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,6 +88,17 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
return run_guest(lg, (unsigned long __user *)user);
}

+static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
+ unsigned long start_ip)
+{
+ vcpu->vcpu_id = vcpu_id;
+
+ vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
+ vcpu->lg->nr_vcpus++;
+
+ return 0;
+}
+
/*L:020 The initialization write supplies 4 pointer sized (32 or 64 bit)
* values (in addition to the LHREQ_INITIALIZE value). These are:
*
@@ -134,6 +145,12 @@ static int initialize(struct file *file, const unsigned long __user *input)
lg->mem_base = (void __user *)(long)args[0];
lg->pfn_limit = args[1];

+ /* This is the first cpu */
+ lg->nr_vcpus = 0;
+ err = vcpu_start(&lg->vcpus[0], 0, args[3]);
+ if (err)
+ goto release_guest;
+
/* We need a complete page for the Guest registers: they are accessible
* to the Guest and we can only grant it access to whole pages. */
lg->regs_page = get_zeroed_page(GFP_KERNEL);
--
1.5.0.6

2007-12-20 13:35:15

by Glauber Costa

Subject: [PATCH 02/16] adapt lguest launcher to per-cpuness

This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread-local variable, which means that in the future we'll spawn
vcpus as threads. Right now, only the infrastructure is in place.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
Documentation/lguest/lguest.c | 24 +++++++++++++++++-------
1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 9b0e322..c406ba9 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -79,6 +79,9 @@ static void *guest_base;
/* The maximum guest physical address allowed, and maximum possible. */
static unsigned long guest_limit, guest_max;

+/* a per-cpu variable indicating whose vcpu is currently running */
+static unsigned int __thread vcpu_id;
+
/* This is our list of devices. */
struct device_list
{
@@ -554,7 +557,7 @@ static void wake_parent(int pipefd, int lguest_fd)
else
FD_CLR(-fd - 1, &devices.infds);
} else /* Send LHREQ_BREAK command. */
- write(lguest_fd, args, sizeof(args));
+ pwrite(lguest_fd, args, sizeof(args), 0);
}
}

@@ -1511,7 +1514,8 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
int readval;

/* We read from the /dev/lguest device to run the Guest. */
- readval = read(lguest_fd, &notify_addr, sizeof(notify_addr));
+ readval = pread(lguest_fd, &notify_addr,
+ sizeof(notify_addr), vcpu_id);

/* One unsigned long means the Guest did HCALL_NOTIFY */
if (readval == sizeof(notify_addr)) {
@@ -1521,17 +1525,22 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* ENOENT means the Guest died. Reading tells us why. */
} else if (errno == ENOENT) {
char reason[1024] = { 0 };
- read(lguest_fd, reason, sizeof(reason)-1);
+ pread(lguest_fd, reason, sizeof(reason)-1, vcpu_id);
errx(1, "%s", reason);
/* EAGAIN means the Waker wanted us to look at some input.
* Anything else means a bug or incompatible change. */
} else if (errno != EAGAIN)
err(1, "Running guest failed");

- /* Service input, then unset the BREAK to release the Waker. */
- handle_input(lguest_fd);
- if (write(lguest_fd, args, sizeof(args)) < 0)
- err(1, "Resetting break");
+ if (!vcpu_id) {
+ /*
+ * Service input, then unset the BREAK to
+ * release the Waker.
+ */
+ handle_input(lguest_fd);
+ if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
+ err(1, "Resetting break");
+ }
}
}
/*
@@ -1582,6 +1591,7 @@ int main(int argc, char *argv[])
devices.lastdev = &devices.dev;
devices.next_irq = 1;

+ vcpu_id = 0;
/* We need to know how much memory so we can set up the device
* descriptor and memory pages for the devices as we parse the command
* line. So we quickly look through the arguments to find the amount
--
1.5.0.6
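
One property that makes pread()/pwrite() a plausible carrier for the vcpu id is that the offset travels with each call instead of living in the descriptor's shared file position, so threads sharing one descriptor would not need to coordinate through lseek(). The sketch below shows that property on an ordinary file rather than on /dev/lguest; peek_byte() is a hypothetical helper written only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one byte at `offset` and report whether the descriptor's shared
 * file position stayed where it was. Returns the byte, or -1 on error.
 */
static int peek_byte(int fd, off_t offset, int *position_unchanged)
{
	unsigned char byte;
	off_t before;

	assert(position_unchanged != NULL);

	before = lseek(fd, 0, SEEK_CUR);
	if (before == (off_t)-1)
		return -1;

	/* The offset is an argument of the call, not descriptor state. */
	if (pread(fd, &byte, 1, offset) != 1)
		return -1;

	*position_unchanged = (lseek(fd, 0, SEEK_CUR) == before);
	return byte;
}
```

In the launcher the offset is not a byte position at all: the driver side of this series reads it back as the vcpu id.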

2007-12-20 13:35:45

by Glauber Costa

Subject: [PATCH 04/16] per-cpu run guest

This patch makes the run_guest() routine use the vcpu struct.
This is required because in an SMP guest environment there is no
longer a notion of "running the guest", but rather of "running the vcpu".

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 ++++--
drivers/lguest/lg.h | 4 ++--
drivers/lguest/lguest_user.c | 6 +++++-
drivers/lguest/x86/core.c | 16 +++++++++++-----
4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index cb4c670..70fc65e 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -174,8 +174,10 @@ void __lgwrite(struct lguest *lg, unsigned long addr, const void *b,
/*H:030 Let's jump straight to the the main loop which runs the Guest.
* Remember, this is called by the Launcher reading /dev/lguest, and we keep
* going around and around until something interesting happens. */
-int run_guest(struct lguest *lg, unsigned long __user *user)
+int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
{
+ struct lguest *lg = vcpu->lg;
+
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
@@ -226,7 +228,7 @@ int run_guest(struct lguest *lg, unsigned long __user *user)
local_irq_disable();

/* Actually run the Guest until something happens. */
- lguest_arch_run_guest(lg);
+ lguest_arch_run_guest(vcpu);

/* Now we're ready to be interrupted or moved to other CPUs */
local_irq_enable();
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 9723732..c4a0a97 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -131,7 +131,7 @@ void __lgwrite(struct lguest *, unsigned long, const void *, unsigned);
} while(0)
/* (end of memory access helper routines) :*/

-int run_guest(struct lguest *lg, unsigned long __user *user);
+int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user);

/* Helper macros to obtain the first 12 or the last 20 bits, this is only the
* first step in the migration to the kernel types. pte_pfn is already defined
@@ -182,7 +182,7 @@ void page_table_guest_data_init(struct lguest *lg);
/* <arch>/core.c: */
void lguest_arch_host_init(void);
void lguest_arch_host_fini(void);
-void lguest_arch_run_guest(struct lguest *lg);
+void lguest_arch_run_guest(struct lguest_vcpu *vcpu);
void lguest_arch_handle_trap(struct lguest *lg);
int lguest_arch_init_hypercalls(struct lguest *lg);
int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d1b1c26..894d530 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -55,11 +55,15 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
{
struct lguest *lg = file->private_data;
+ struct lguest_vcpu *vcpu = NULL;
+ unsigned int vcpu_id = *o;

/* You must write LHREQ_INITIALIZE first! */
if (!lg)
return -EINVAL;

+ vcpu = &lg->vcpus[vcpu_id];
+
/* If you're not the task which owns the Guest, go away. */
if (current != lg->tsk)
return -EPERM;
@@ -85,7 +89,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
lg->pending_notify = 0;

/* Run the Guest until something interesting happens. */
- return run_guest(lg, (unsigned long __user *)user);
+ return run_guest(vcpu, (unsigned long __user *)user);
}

static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 482aec2..0530ef3 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -73,8 +73,10 @@ static DEFINE_PER_CPU(struct lguest *, last_guest);
* since it last ran. We saw this set in interrupts_and_traps.c and
* segments.c.
*/
-static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
+static void copy_in_guest_info(struct lguest_vcpu *vcpu,
+ struct lguest_pages *pages)
{
+ struct lguest *lg = vcpu->lg;
/* Copying all this data can be quite expensive. We usually run the
* same Guest we ran last time (and that Guest hasn't run anywhere else
* meanwhile). If that's not the case, we pretend everything in the
@@ -113,14 +115,16 @@ static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
}

/* Finally: the code to actually call into the Switcher to run the Guest. */
-static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
+static void run_guest_once(struct lguest_vcpu *vcpu,
+ struct lguest_pages *pages)
{
/* This is a dummy value we need for GCC's sake. */
unsigned int clobber;
+ struct lguest *lg = vcpu->lg;

/* Copy the guest-specific information into this CPU's "struct
* lguest_pages". */
- copy_in_guest_info(lg, pages);
+ copy_in_guest_info(vcpu, pages);

/* Set the trap number to 256 (impossible value). If we fault while
* switching to the Guest (bad segment registers or bug), this will
@@ -161,8 +165,10 @@ static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)

/*H:040 This is the i386-specific code to setup and run the Guest. Interrupts
* are disabled: we own the CPU. */
-void lguest_arch_run_guest(struct lguest *lg)
+void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
+
/* Remember the awfully-named TS bit? If the Guest has asked to set it
* we set it now, so we can trap and pass that trap to the Guest if it
* uses the FPU. */
@@ -180,7 +186,7 @@ void lguest_arch_run_guest(struct lguest *lg)
/* Now we actually run the Guest. It will return when something
* interesting happens, and we can examine its registers to see what it
* was doing. */
- run_guest_once(lg, lguest_pages(raw_smp_processor_id()));
+ run_guest_once(vcpu, lguest_pages(raw_smp_processor_id()));

/* Note that the "regs" pointer contains two extra entries which are
* not really registers: a trap number which says what interrupt or
--
1.5.0.6

2007-12-20 13:36:03

by Glauber Costa

Subject: [PATCH 05/16] make write() operation smp aware

This patch makes the write() file operation SMP aware: it receives
the vcpu_id value through the offset parameter, so it knows which
vcpu we're talking to.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lguest_user.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 894d530..ae5bf4c 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -223,14 +223,21 @@ static ssize_t write(struct file *file, const char __user *in,
struct lguest *lg = file->private_data;
const unsigned long __user *input = (const unsigned long __user *)in;
unsigned long req;
+ struct lguest_vcpu *vcpu = NULL;
+ int vcpu_id = *off;

if (get_user(req, input) != 0)
return -EFAULT;
input++;

/* If you haven't initialized, you must do that first. */
- if (req != LHREQ_INITIALIZE && !lg)
- return -EINVAL;
+ if (req != LHREQ_INITIALIZE) {
+ if (!lg)
+ return -EINVAL;
+ vcpu = &lg->vcpus[vcpu_id];
+ if (!vcpu)
+ return -EINVAL;
+ }

/* Once the Guest is dead, all you can do is read() why it died. */
if (lg && lg->dead)
--
1.5.0.6
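
In the write() hunk above, vcpu is the address of an array slot, so as far as I can tell the `!vcpu` test cannot fire for any id, including one past the end of the array. One way to reject a bad id is to compare it against the number of started vcpus before indexing. The following is a user-space sketch of that idea, not part of the posted patch; MAX_VCPUS, struct guest, struct vcpu and vcpu_from_offset() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4	/* hypothetical stand-in for NR_CPUS */

struct vcpu {
	int vcpu_id;
};

struct guest {
	struct vcpu vcpus[MAX_VCPUS];
	unsigned int nr_vcpus;	/* number of vcpus started so far */
};

/*
 * Map a file offset to a vcpu. Returns NULL when the offset does not
 * name a started vcpu, so the caller can fail with -EINVAL or similar.
 */
static struct vcpu *vcpu_from_offset(struct guest *g, long long off)
{
	assert(g->nr_vcpus <= MAX_VCPUS);

	if (off < 0 || (unsigned long long)off >= g->nr_vcpus)
		return NULL;
	return &g->vcpus[off];
}
```

Checking against nr_vcpus rather than the array size also rejects slots that exist but were never started.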

2007-12-20 13:36:29

by Glauber Costa

Subject: [PATCH 06/16] make hypercalls use the vcpu struct

This patch changes the do_hcall() and do_async_hcalls() interfaces (and obviously their
callers) to take a vcpu struct. Again, a vcpu services the hypercall, not the whole
guest.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 +++---
drivers/lguest/hypercalls.c | 42 +++++++++++++++++++++++-------------------
drivers/lguest/lg.h | 16 ++++++++--------
drivers/lguest/x86/core.c | 16 ++++++++++------
4 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 70fc65e..ef35e02 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -181,8 +181,8 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
- if (lg->hcall)
- do_hypercalls(lg);
+ if (vcpu->hcall)
+ do_hypercalls(vcpu);

/* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
@@ -234,7 +234,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
local_irq_enable();

/* Now we deal with whatever happened to the Guest. */
- lguest_arch_handle_trap(lg);
+ lguest_arch_handle_trap(vcpu);
}

/* The Guest is dead => "No such file or directory" */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index b478aff..62da355 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -29,8 +29,10 @@

/*H:120 This is the core hypercall routine: where the Guest gets what it wants.
* Or gets killed. Or, in the case of LHCALL_CRASH, both. */
-static void do_hcall(struct lguest *lg, struct hcall_args *args)
+static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
{
+ struct lguest *lg = vcpu->lg;
+
switch (args->arg0) {
case LHCALL_FLUSH_ASYNC:
/* This call does nothing, except by breaking out of the Guest
@@ -91,7 +93,7 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
break;
default:
/* It should be an architecture-specific hypercall. */
- if (lguest_arch_do_hcall(lg, args))
+ if (lguest_arch_do_hcall(vcpu, args))
kill_guest(lg, "Bad hypercall %li\n", args->arg0);
}
}
@@ -104,10 +106,11 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
* Guest put them in the ring, but we also promise the Guest that they will
* happen before any normal hypercall (which is why we check this before
* checking for a normal hcall). */
-static void do_async_hcalls(struct lguest *lg)
+static void do_async_hcalls(struct lguest_vcpu *vcpu)
{
unsigned int i;
u8 st[LHCALL_RING_SIZE];
+ struct lguest *lg = vcpu->lg;

/* For simplicity, we copy the entire call status array in at once. */
if (copy_from_user(&st, &lg->lguest_data->hcall_status, sizeof(st)))
@@ -119,7 +122,7 @@ static void do_async_hcalls(struct lguest *lg)
/* We remember where we were up to from last time. This makes
* sure that the hypercalls are done in the order the Guest
* places them in the ring. */
- unsigned int n = lg->next_hcall;
+ unsigned int n = vcpu->next_hcall;

/* 0xFF means there's no call here (yet). */
if (st[n] == 0xFF)
@@ -127,8 +130,8 @@ static void do_async_hcalls(struct lguest *lg)

/* OK, we have hypercall. Increment the "next_hcall" cursor,
* and wrap back to 0 if we reach the end. */
- if (++lg->next_hcall == LHCALL_RING_SIZE)
- lg->next_hcall = 0;
+ if (++vcpu->next_hcall == LHCALL_RING_SIZE)
+ vcpu->next_hcall = 0;

/* Copy the hypercall arguments into a local copy of
* the hcall_args struct. */
@@ -139,7 +142,7 @@ static void do_async_hcalls(struct lguest *lg)
}

/* Do the hypercall, same as a normal one. */
- do_hcall(lg, &args);
+ do_hcall(vcpu, &args);

/* Mark the hypercall done. */
if (put_user(0xFF, &lg->lguest_data->hcall_status[n])) {
@@ -156,16 +159,17 @@ static void do_async_hcalls(struct lguest *lg)

/* Last of all, we look at what happens first of all. The very first time the
* Guest makes a hypercall, we end up here to set things up: */
-static void initialize(struct lguest *lg)
+static void initialize(struct lguest_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
/* You can't do anything until you're initialized. The Guest knows the
* rules, so we're unforgiving here. */
- if (lg->hcall->arg0 != LHCALL_LGUEST_INIT) {
- kill_guest(lg, "hypercall %li before INIT", lg->hcall->arg0);
+ if (vcpu->hcall->arg0 != LHCALL_LGUEST_INIT) {
+ kill_guest(lg, "hypercall %li before INIT", vcpu->hcall->arg0);
return;
}

- if (lguest_arch_init_hypercalls(lg))
+ if (lguest_arch_init_hypercalls(vcpu))
kill_guest(lg, "bad guest page %p", lg->lguest_data);

/* The Guest tells us where we're not to deliver interrupts by putting
@@ -194,27 +198,27 @@ static void initialize(struct lguest *lg)
* Remember from the Guest, hypercalls come in two flavors: normal and
* asynchronous. This file handles both of types.
*/
-void do_hypercalls(struct lguest *lg)
+void do_hypercalls(struct lguest_vcpu *vcpu)
{
/* Not initialized yet? This hypercall must do it. */
- if (unlikely(!lg->lguest_data)) {
+ if (unlikely(!vcpu->lg->lguest_data)) {
/* Set up the "struct lguest_data" */
- initialize(lg);
+ initialize(vcpu);
/* Hcall is done. */
- lg->hcall = NULL;
+ vcpu->hcall = NULL;
return;
}

/* The Guest has initialized.
*
* Look in the hypercall ring for the async hypercalls: */
- do_async_hcalls(lg);
+ do_async_hcalls(vcpu);

/* If we stopped reading the hypercall ring because the Guest did a
* NOTIFY to the Launcher, we want to return now. Otherwise we do
* the hypercall. */
- if (!lg->pending_notify) {
- do_hcall(lg, lg->hcall);
+ if (!vcpu->lg->pending_notify) {
+ do_hcall(vcpu, vcpu->hcall);
/* Tricky point: we reset the hcall pointer to mark the
* hypercall as "done". We use the hcall pointer rather than
* the trap number to indicate a hypercall is pending.
@@ -225,7 +229,7 @@ void do_hypercalls(struct lguest *lg)
* Launcher, the run_guest() loop will exit without running the
* Guest. When it comes back it would try to re-run the
* hypercall. */
- lg->hcall = NULL;
+ vcpu->hcall = NULL;
}
}

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index c4a0a97..696cdf1 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,10 @@ struct lguest;
struct lguest_vcpu {
int vcpu_id;
struct lguest *lg;
+
+ /* If a hypercall was asked for, this points to the arguments. */
+ struct hcall_args *hcall;
+ u32 next_hcall;
};

/* The private info the thread maintains about the guest. */
@@ -65,13 +69,9 @@ struct lguest
u32 cr2;
int halted;
int ts;
- u32 next_hcall;
u32 esp1;
u8 ss1;

- /* If a hypercall was asked for, this points to the arguments. */
- struct hcall_args *hcall;
-
/* Do we need to stop what we're doing and return to userspace? */
int break_out;
wait_queue_head_t break_wq;
@@ -183,9 +183,9 @@ void page_table_guest_data_init(struct lguest *lg);
void lguest_arch_host_init(void);
void lguest_arch_host_fini(void);
void lguest_arch_run_guest(struct lguest_vcpu *vcpu);
-void lguest_arch_handle_trap(struct lguest *lg);
-int lguest_arch_init_hypercalls(struct lguest *lg);
-int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args);
+void lguest_arch_handle_trap(struct lguest_vcpu *vcpu);
+int lguest_arch_init_hypercalls(struct lguest_vcpu *vcpu);
+int lguest_arch_do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args);
void lguest_arch_setup_regs(struct lguest *lg, unsigned long start);

/* <arch>/switcher.S: */
@@ -196,7 +196,7 @@ int lguest_device_init(void);
void lguest_device_remove(void);

/* hypercalls.c: */
-void do_hypercalls(struct lguest *lg);
+void do_hypercalls(struct lguest_vcpu *vcpu);
void write_timestamp(struct lguest *lg);

/*L:035
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 0530ef3..5e56629 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -285,8 +285,9 @@ static int emulate_insn(struct lguest *lg)
}

/*H:050 Once we've re-enabled interrupts, we look at why the Guest exited. */
-void lguest_arch_handle_trap(struct lguest *lg)
+void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
switch (lg->regs->trapnum) {
case 13: /* We've intercepted a General Protection Fault. */
/* Check if this was one of those annoying IN or OUT
@@ -338,7 +339,7 @@ void lguest_arch_handle_trap(struct lguest *lg)
case LGUEST_TRAP_ENTRY:
/* Our 'struct hcall_args' maps directly over our regs: we set
* up the pointer now to indicate a hypercall is pending. */
- lg->hcall = (struct hcall_args *)lg->regs;
+ vcpu->hcall = (struct hcall_args *)lg->regs;
return;
}

@@ -493,8 +494,10 @@ void __exit lguest_arch_host_fini(void)


/*H:122 The i386-specific hypercalls simply farm out to the right functions. */
-int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args)
+int lguest_arch_do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
{
+ struct lguest *lg = vcpu->lg;
+
switch (args->arg0) {
case LHCALL_LOAD_GDT:
load_guest_gdt(lg, args->arg1, args->arg2);
@@ -513,13 +516,14 @@ int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args)
}

/*H:126 i386-specific hypercall initialization: */
-int lguest_arch_init_hypercalls(struct lguest *lg)
+int lguest_arch_init_hypercalls(struct lguest_vcpu *vcpu)
{
u32 tsc_speed;
+ struct lguest *lg = vcpu->lg;

/* The pointer to the Guest's "struct lguest_data" is the only
* argument. We check that address now. */
- if (!lguest_address_ok(lg, lg->hcall->arg1, sizeof(*lg->lguest_data)))
+ if (!lguest_address_ok(lg, vcpu->hcall->arg1, sizeof(*lg->lguest_data)))
return -EFAULT;

/* Having checked it, we simply set lg->lguest_data to point straight
@@ -527,7 +531,7 @@ int lguest_arch_init_hypercalls(struct lguest *lg)
* copy_to_user/from_user from now on, instead of lgread/write. I put
* this in to show that I'm not immune to writing stupid
* optimizations. */
- lg->lguest_data = lg->mem_base + lg->hcall->arg1;
+ lg->lguest_data = lg->mem_base + vcpu->hcall->arg1;

/* We insist that the Time Stamp Counter exist and doesn't change with
* cpu frequency. Some devious chip manufacturers decided that TSC
--
1.5.0.6
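
The patch above moves the next_hcall cursor into the vcpu while the status array it walks stays in the guest-wide lguest_data. The cursor logic itself is a plain wrap-around walk over a fixed-size ring, which can be sketched in user space as below. RING_SIZE, SLOT_EMPTY, struct vcpu_ring and ring_take() are hypothetical names; the driver uses LHCALL_RING_SIZE and the literal 0xFF.

```c
#include <assert.h>

#define RING_SIZE 64	/* hypothetical; the driver uses LHCALL_RING_SIZE */
#define SLOT_EMPTY 0xFF	/* status value meaning "no call here (yet)" */

struct vcpu_ring {
	unsigned int next;	/* per-vcpu cursor, like vcpu->next_hcall */
};

/*
 * Return the index of the pending slot under the cursor and advance the
 * cursor, wrapping at the end of the ring. Returns -1 if that slot is empty.
 */
static int ring_take(struct vcpu_ring *r, unsigned char status[RING_SIZE])
{
	unsigned int n = r->next;

	assert(n < RING_SIZE);

	if (status[n] == SLOT_EMPTY)
		return -1;

	/* Advance, wrapping back to 0 when we reach the end. */
	if (++r->next == RING_SIZE)
		r->next = 0;

	/* Mark the slot done so it is not serviced twice. */
	status[n] = SLOT_EMPTY;
	return (int)n;
}
```

Because each cursor remembers where it stopped, calls are taken in the order they were placed in the ring.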

2007-12-20 13:36:46

by Glauber Costa

Subject: [PATCH 07/16] per-vcpu lguest timers

Here, I introduce per-vcpu timers. With this, we can have
local expiries, which are needed for accounting time in SMP guests.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 20 ++++++++++----------
drivers/lguest/lg.h | 10 +++++-----
drivers/lguest/lguest_user.c | 12 +++++++-----
4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 62da355..4364bc2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -78,7 +78,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
guest_set_pmd(lg, args->arg1, args->arg2);
break;
case LHCALL_SET_CLOCKEVENT:
- guest_set_clockevent(lg, args->arg1);
+ guest_set_clockevent(vcpu, args->arg1);
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 2b66f79..189d66e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -470,13 +470,13 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
* infrastructure to set a callback at that time.
*
* 0 means "turn off the clock". */
-void guest_set_clockevent(struct lguest *lg, unsigned long delta)
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta)
{
ktime_t expires;

if (unlikely(delta == 0)) {
/* Clock event device is shutting down. */
- hrtimer_cancel(&lg->hrt);
+ hrtimer_cancel(&vcpu->hrt);
return;
}

@@ -484,25 +484,25 @@ void guest_set_clockevent(struct lguest *lg, unsigned long delta)
* all the time between now and the timer interrupt it asked for. This
* is almost always the right thing to do. */
expires = ktime_add_ns(ktime_get_real(), delta);
- hrtimer_start(&lg->hrt, expires, HRTIMER_MODE_ABS);
+ hrtimer_start(&vcpu->hrt, expires, HRTIMER_MODE_ABS);
}

/* This is the function called when the Guest's timer expires. */
static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
{
- struct lguest *lg = container_of(timer, struct lguest, hrt);
+ struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);

/* Remember the first interrupt is the timer interrupt. */
- set_bit(0, lg->irqs_pending);
+ set_bit(0, vcpu->lg->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
- if (lg->halted)
- wake_up_process(lg->tsk);
+ if (vcpu->lg->halted)
+ wake_up_process(vcpu->lg->tsk);
return HRTIMER_NORESTART;
}

/* This sets up the timer for this Guest. */
-void init_clockdev(struct lguest *lg)
+void init_clockdev(struct lguest_vcpu *vcpu)
{
- hrtimer_init(&lg->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
- lg->hrt.function = clockdev_fn;
+ hrtimer_init(&vcpu->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+ vcpu->hrt.function = clockdev_fn;
}
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 696cdf1..0205409 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -47,6 +47,9 @@ struct lguest_vcpu {
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
+
+ /* Virtual clock device */
+ struct hrtimer hrt;
};

/* The private info the thread maintains about the guest. */
@@ -95,9 +98,6 @@ struct lguest

struct lguest_arch arch;

- /* Virtual clock device */
- struct hrtimer hrt;
-
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};
@@ -150,8 +150,8 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
void copy_traps(const struct lguest *lg, struct desc_struct *idt,
const unsigned long *def);
-void guest_set_clockevent(struct lguest *lg, unsigned long delta);
-void init_clockdev(struct lguest *lg);
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta);
+void init_clockdev(struct lguest_vcpu *vcpu);
bool check_syscall_vector(struct lguest *lg);
int init_interrupts(void);
void free_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index ae5bf4c..7481e82 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -97,6 +97,9 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
{
vcpu->vcpu_id = vcpu_id;

+ /* The timer for lguest's clock needs initialization. */
+ init_clockdev(vcpu);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -176,9 +179,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
* address. */
lguest_arch_setup_regs(lg, args[3]);

- /* The timer for lguest's clock needs initialization. */
- init_clockdev(lg);
-
/* We keep a pointer to the Launcher task (ie. current task) for when
* other Guests want to wake this one (inter-Guest I/O). */
lg->tsk = current;
@@ -269,6 +269,7 @@ static ssize_t write(struct file *file, const char __user *in,
static int close(struct inode *inode, struct file *file)
{
struct lguest *lg = file->private_data;
+ int i;

/* If we never successfully initialized, there's nothing to clean up */
if (!lg)
@@ -277,8 +278,9 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
- /* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
- hrtimer_cancel(&lg->hrt);
+ for (i = 0; i < lg->nr_vcpus; i++)
+ /* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
+ hrtimer_cancel(&lg->vcpus[i].hrt);
/* Free up the shadow page tables for the Guest. */
free_guest_pagetable(lg);
/* Now all the memory cleanups are done, it's safe to release the
--
1.5.0.6
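
With the hrtimer embedded in the vcpu, clockdev_fn() in the patch above recovers the vcpu from the timer pointer and then reaches guest-wide state through vcpu->lg. A user-space sketch of that callback shape follows; struct fake_timer stands in for struct hrtimer, and every name in the block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_timer {			/* stand-in for struct hrtimer */
	void (*function)(struct fake_timer *t);
};

struct guest {
	unsigned long irqs_pending;	/* one bit per virtual interrupt */
};

struct vcpu {
	struct guest *g;		/* back pointer, like vcpu->lg */
	struct fake_timer timer;	/* embedded, like vcpu->hrt */
};

/* Expiry callback: recover the owning vcpu, then flag IRQ 0 on its guest. */
static void timer_expired(struct fake_timer *t)
{
	struct vcpu *v = container_of(t, struct vcpu, timer);

	assert(v->g != NULL);
	v->g->irqs_pending |= 1UL << 0;
}

/* Wire the callback and back pointer, loosely mirroring init_clockdev(). */
static void vcpu_timer_init(struct vcpu *v, struct guest *g)
{
	v->g = g;
	v->timer.function = timer_expired;
}
```

The callback only ever sees the timer pointer, so the back pointer to the guest has to be valid by the time the timer can fire.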

2007-12-20 13:37:19

by Glauber Costa

Subject: [PATCH 08/16] per-vcpu interrupt processing.

This patch adapts interrupt processing to use the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 25 ++++++++++++++-----------
drivers/lguest/lg.h | 10 +++++-----
drivers/lguest/lguest_user.c | 7 ++++---
drivers/lguest/x86/core.c | 2 +-
5 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index ef35e02..4d0102d 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -203,7 +203,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
/* Check if there are any interrupts which can be delivered
* now: if so, this sets up the hander to be executed when we
* next run the Guest. */
- maybe_do_interrupt(lg);
+ maybe_do_interrupt(vcpu);

/* All long-lived kernel loops need to check with this horrible
* thing called the freezer. If the Host is trying to suspend,
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 189d66e..db440cb 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -60,11 +60,13 @@ static void push_guest_stack(struct lguest *lg, unsigned long *gstack, u32 val)
* We set up the stack just like the CPU does for a real interrupt, so it's
* identical for the Guest (and the standard "iret" instruction will undo
* it). */
-static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
+static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
+ int has_err)
{
unsigned long gstack, origstack;
u32 eflags, ss, irq_enable;
unsigned long virtstack;
+ struct lguest *lg = vcpu->lg;

/* There are two cases for interrupts: one where the Guest is already
* in the kernel, and a more complex one where the Guest is in
@@ -129,9 +131,10 @@ static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
*
* maybe_do_interrupt() gets called before every entry to the Guest, to see if
* we should divert the Guest to running an interrupt handler. */
-void maybe_do_interrupt(struct lguest *lg)
+void maybe_do_interrupt(struct lguest_vcpu *vcpu)
{
unsigned int irq;
+ struct lguest *lg = vcpu->lg;
DECLARE_BITMAP(blk, LGUEST_IRQS);
struct desc_struct *idt;

@@ -145,7 +148,7 @@ void maybe_do_interrupt(struct lguest *lg)
sizeof(blk)))
return;

- bitmap_andnot(blk, lg->irqs_pending, blk, LGUEST_IRQS);
+ bitmap_andnot(blk, vcpu->irqs_pending, blk, LGUEST_IRQS);

/* Find the first interrupt. */
irq = find_first_bit(blk, LGUEST_IRQS);
@@ -180,11 +183,11 @@ void maybe_do_interrupt(struct lguest *lg)
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
- clear_bit(irq, lg->irqs_pending);
+ clear_bit(irq, vcpu->irqs_pending);
/* set_guest_interrupt() takes the interrupt descriptor and a
* flag to say whether this interrupt pushes an error code onto
* the stack as well: virtual interrupts never do. */
- set_guest_interrupt(lg, idt->a, idt->b, 0);
+ set_guest_interrupt(vcpu, idt->a, idt->b, 0);
}

/* Every time we deliver an interrupt, we update the timestamp in the
@@ -245,19 +248,19 @@ static int has_err(unsigned int trap)
}

/* deliver_trap() returns true if it could deliver the trap. */
-int deliver_trap(struct lguest *lg, unsigned int num)
+int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
{
/* Trap numbers are always 8 bit, but we set an impossible trap number
* for traps inside the Switcher, so check that here. */
- if (num >= ARRAY_SIZE(lg->arch.idt))
+ if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
return 0;

/* Early on the Guest hasn't set the IDT entries (or maybe it put a
* bogus one in): if we fail here, the Guest will be killed. */
- if (!idt_present(lg->arch.idt[num].a, lg->arch.idt[num].b))
+ if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
return 0;
- set_guest_interrupt(lg, lg->arch.idt[num].a, lg->arch.idt[num].b,
- has_err(num));
+ set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
+ vcpu->lg->arch.idt[num].b, has_err(num));
return 1;
}

@@ -493,7 +496,7 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);

/* Remember the first interrupt is the timer interrupt. */
- set_bit(0, vcpu->lg->irqs_pending);
+ set_bit(0, vcpu->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
if (vcpu->lg->halted)
wake_up_process(vcpu->lg->tsk);
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 0205409..db2edd6 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -50,6 +50,9 @@ struct lguest_vcpu {

/* Virtual clock device */
struct hrtimer hrt;
+
+ /* Pending virtual interrupts */
+ DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};

/* The private info the thread maintains about the guest. */
@@ -97,9 +100,6 @@ struct lguest
const char *dead;

struct lguest_arch arch;
-
- /* Pending virtual interrupts */
- DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};

static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
@@ -141,8 +141,8 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user);
#define pgd_pfn(x) (pgd_val(x) >> PAGE_SHIFT)

/* interrupts_and_traps.c: */
-void maybe_do_interrupt(struct lguest *lg);
-int deliver_trap(struct lguest *lg, unsigned int num);
+void maybe_do_interrupt(struct lguest_vcpu *vcpu);
+int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num);
void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
void pin_stack_pages(struct lguest *lg);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 7481e82..60cf6c6 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -36,7 +36,8 @@ static int break_guest_out(struct lguest *lg, const unsigned long __user *input)

/*L:050 Sending an interrupt is done by writing LHREQ_IRQ and an interrupt
* number to /dev/lguest. */
-static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
+static int user_send_irq(struct lguest_vcpu *vcpu,
+ const unsigned long __user *input)
{
unsigned long irq;

@@ -46,7 +47,7 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
return -EINVAL;
/* Next time the Guest runs, the core code will see if it can deliver
* this interrupt. */
- set_bit(irq, lg->irqs_pending);
+ set_bit(irq, vcpu->irqs_pending);
return 0;
}

@@ -251,7 +252,7 @@ static ssize_t write(struct file *file, const char __user *in,
case LHREQ_INITIALIZE:
return initialize(file, input);
case LHREQ_IRQ:
- return user_send_irq(lg, input);
+ return user_send_irq(vcpu, input);
case LHREQ_BREAK:
return break_guest_out(lg, input);
default:
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 5e56629..3d21c6d 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -344,7 +344,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
}

/* We didn't handle the trap, so it needs to go to the Guest. */
- if (!deliver_trap(lg, lg->regs->trapnum))
+ if (!deliver_trap(vcpu, lg->regs->trapnum))
/* If the Guest doesn't have a handler (either it hasn't
* registered any yet, or it's one of the faults we don't let
* it handle), it dies with a cryptic error message. */
--
1.5.0.6

2007-12-20 13:37:49

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 09/16] map_switcher_in_guest() per-vcpu

The switcher needs to be mapped per-vcpu, because different vcpus
may have different page tables (they don't have to: threads will
share the same ones).

So our first step is to make the function receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
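The conversion pattern used here (take a vcpu, derive the guest from its
back pointer) can be sketched in user space as follows. This is only an
illustration under assumptions: the sketch_* names, SKETCH_NR_CPUS and
the pgd_index field are hypothetical, and offsetof() arithmetic stands
in for the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical; the driver uses NR_CPUS */

struct sketch_guest;

struct sketch_vcpu {
	int vcpu_id;
	struct sketch_guest *lg;	/* back pointer to the owning guest */
};

struct sketch_guest {
	int pgd_index;			/* illustrative guest-wide state */
	struct sketch_vcpu vcpus[SKETCH_NR_CPUS];
	unsigned int nr_vcpus;
};

/* Step back to element 0 of the vcpus array, then subtract the offset of
 * that array inside the guest structure. */
struct sketch_guest *sketch_guest_of(struct sketch_vcpu *vcpu)
{
	char *first = (char *)(vcpu - vcpu->vcpu_id);

	return (struct sketch_guest *)(first -
				       offsetof(struct sketch_guest, vcpus));
}

/* Fill in the id and back pointer of one slot of the array. */
void sketch_vcpu_start(struct sketch_guest *lg, int vcpu_id)
{
	struct sketch_vcpu *vcpu;

	assert(vcpu_id >= 0 && vcpu_id < SKETCH_NR_CPUS);
	vcpu = &lg->vcpus[vcpu_id];
	vcpu->vcpu_id = vcpu_id;
	vcpu->lg = sketch_guest_of(vcpu);
	lg->nr_vcpus++;
}

/* A converted function: it receives the vcpu and reaches guest-wide
 * state through the back pointer. */
int sketch_map_switcher(struct sketch_vcpu *vcpu)
{
	struct sketch_guest *lg = vcpu->lg;

	return lg->pgd_index;
}
```

Callers keep passing one pointer, and state that later becomes per-vcpu
can move without touching the signature again.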
drivers/lguest/lg.h | 3 ++-
drivers/lguest/page_tables.c | 4 +++-
drivers/lguest/x86/core.c | 2 +-
3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index db2edd6..f6e9020 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -173,7 +173,8 @@ void guest_pagetable_clear_all(struct lguest *lg);
void guest_pagetable_flush_user(struct lguest *lg);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages);
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+ struct lguest_pages *pages);
int demand_page(struct lguest *info, unsigned long cr2, int errcode);
void pin_page(struct lguest *lg, unsigned long vaddr);
unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index fffabb3..7fb8627 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -634,8 +634,10 @@ void free_guest_pagetable(struct lguest *lg)
* Guest (and not the pages for other CPUs). We have the appropriate PTE pages
* for each CPU already set up, we just need to hook them in now we know which
* Guest is about to run on this CPU. */
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages)
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+ struct lguest_pages *pages)
{
+ struct lguest *lg = vcpu->lg;
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 3d21c6d..9bf2213 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -92,7 +92,7 @@ static void copy_in_guest_info(struct lguest_vcpu *vcpu,
pages->state.host_cr3 = __pa(current->mm->pgd);
/* Set up the Guest's page tables to see this CPU's pages (and no
* other CPU's pages). */
- map_switcher_in_guest(lg, pages);
+ map_switcher_in_guest(vcpu, pages);
/* Set up the two "TSS" members which tell the CPU what stack to use
* for traps which do directly into the Guest (ie. traps at privilege
* level 1). */
--
1.5.0.6

2007-12-20 13:38:08

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 10/16] make emulate_insn receive a vcpu struct.

emulate_insn() needs to know the current eip, which will become
per-vcpu in a later patch. So this patch changes the function
prototype to receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
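As a rough illustration of why eip belongs to the vcpu, here is a
simplified user-space sketch of "recognise an I/O instruction and skip
it". It is not the driver's emulate_insn(): the sketch_* names and the
flat code buffer are hypothetical, and it only advances eip, ignoring
operand-size prefixes and the value an "in" would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_regs {
	uint32_t eip;		/* hypothetical subset of the registers */
};

struct sketch_vcpu {
	int vcpu_id;
	struct sketch_regs *regs;
};

/* "code" is a flat buffer indexed directly by eip (an assumption of this
 * sketch). Returns 1 if an in/out instruction was skipped, 0 otherwise. */
int sketch_emulate_insn(struct sketch_vcpu *vcpu, const uint8_t *code,
			size_t len)
{
	uint32_t eip = vcpu->regs->eip;
	unsigned int insnlen;

	assert(code != NULL);
	if (eip >= len)
		return 0;

	/* The low bit only selects byte vs. word/dword access. */
	switch (code[eip] & 0xFE) {
	case 0xE4:	/* in  <imm8>,%al */
	case 0xE6:	/* out %al,<imm8> */
		insnlen = 2;
		break;
	case 0xEC:	/* in  (%dx),%al */
	case 0xEE:	/* out %al,(%dx) */
		insnlen = 1;
		break;
	default:
		return 0;
	}

	if (len - eip < insnlen)
		return 0;

	/* Skip the instruction on this vcpu only. */
	vcpu->regs->eip += insnlen;
	return 1;
}
```

Two vcpus trapping at different addresses each advance their own eip,
which a single guest-wide register block could not express.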
drivers/lguest/x86/core.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9bf2213..2fb9cd3 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -220,8 +220,9 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
* When the Guest uses one of these instructions, we get a trap (General
* Protection Fault) and come here. We see if it's one of those troublesome
* instructions and skip over it. We return true if we did. */
-static int emulate_insn(struct lguest *lg)
+static int emulate_insn(struct lguest_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
u8 insn;
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
@@ -294,7 +295,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
* instructions which we need to emulate. If so, we just go
* back into the Guest after we've done it. */
if (lg->regs->errcode == 0) {
- if (emulate_insn(lg))
+ if (emulate_insn(vcpu))
return;
}
break;
--
1.5.0.6

2007-12-20 13:38:33

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 12/16] replace lguest_arch with lguest_vcpu_arch.

The fields found in lguest_arch are not really per-guest,
but per-cpu (gdt, idt, etc.). So this patch turns lguest_arch
into lguest_vcpu_arch.

It makes sense to have a per-guest per-arch struct, but this
can be addressed later, when the need arises.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
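A minimal user-space sketch of the GDT fixup operating on per-vcpu
state is given here. The bit masks are the ones used in
fixup_gdt_table(); the sketch_* names and SKETCH_GDT_ENTRIES are
hypothetical, SKETCH_GUEST_PL is assumed to be 1 from the "privilege
level 1" comment, and the ignored_gdt() check is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GDT_ENTRIES 32	/* hypothetical table size */
#define SKETCH_GUEST_PL 1	/* assumed: the Guest runs at privilege level 1 */

struct sketch_desc {
	uint32_t a, b;
};

/* The arch state now hangs off the vcpu, so every vcpu has its own GDT. */
struct sketch_vcpu_arch {
	struct sketch_desc gdt[SKETCH_GDT_ENTRIES];
};

struct sketch_vcpu {
	int vcpu_id;
	struct sketch_vcpu_arch arch;
};

void sketch_fixup_gdt(struct sketch_vcpu *vcpu, unsigned int start,
		      unsigned int end)
{
	unsigned int i;

	assert(start <= end && end <= SKETCH_GDT_ENTRIES);
	for (i = start; i < end; i++) {
		/* Privilege level left as 0: raise it to the Guest's level. */
		if ((vcpu->arch.gdt[i].b & 0x00006000) == 0)
			vcpu->arch.gdt[i].b |= (SKETCH_GUEST_PL << 13);

		/* Pre-set the "accessed" bit so the CPU never has to
		 * write to the descriptor. */
		vcpu->arch.gdt[i].b |= 0x00000100;
	}
}
```

Fixing up one vcpu's table leaves the tables of the other vcpus
untouched, which is the point of moving gdt and idt out of struct lguest.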
drivers/lguest/interrupts_and_traps.c | 29 +++++++++++----------
drivers/lguest/lg.h | 19 +++++++-------
drivers/lguest/segments.c | 43 +++++++++++++++++---------------
drivers/lguest/x86/core.c | 24 ++++++++----------
include/asm-x86/lguest.h | 2 +-
5 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 1ceff5f..b3d444a 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -180,7 +180,7 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
/* Look at the IDT entry the Guest gave us for this interrupt. The
* first 32 (FIRST_EXTERNAL_VECTOR) entries are for traps, so we skip
* over them. */
- idt = &lg->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
+ idt = &vcpu->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
@@ -253,15 +253,15 @@ int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
{
/* Trap numbers are always 8 bit, but we set an impossible trap number
* for traps inside the Switcher, so check that here. */
- if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
+ if (num >= ARRAY_SIZE(vcpu->arch.idt))
return 0;

/* Early on the Guest hasn't set the IDT entries (or maybe it put a
* bogus one in): if we fail here, the Guest will be killed. */
- if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
+ if (!idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b))
return 0;
- set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
- vcpu->lg->arch.idt[num].b, has_err(num));
+ set_guest_interrupt(vcpu, vcpu->arch.idt[num].a,
+ vcpu->arch.idt[num].b, has_err(num));
return 1;
}

@@ -387,7 +387,8 @@ static void set_trap(struct lguest *lg, struct desc_struct *trap,
*
* We saw the Guest setting Interrupt Descriptor Table (IDT) entries with the
* LHCALL_LOAD_IDT_ENTRY hypercall before: that comes here. */
-void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
+void load_guest_idt_entry(struct lguest_vcpu *vcpu,
+ unsigned int num, u32 lo, u32 hi)
{
/* Guest never handles: NMI, doublefault, spurious interrupt or
* hypercall. We ignore when it tries to set them. */
@@ -396,13 +397,13 @@ void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)

/* Mark the IDT as changed: next time the Guest runs we'll know we have
* to copy this again. */
- lg->changed |= CHANGED_IDT;
+ vcpu->lg->changed |= CHANGED_IDT;

/* Check that the Guest doesn't try to step outside the bounds. */
- if (num >= ARRAY_SIZE(lg->arch.idt))
- kill_guest(lg, "Setting idt entry %u", num);
+ if (num >= ARRAY_SIZE(vcpu->arch.idt))
+ kill_guest(vcpu->lg, "Setting idt entry %u", num);
else
- set_trap(lg, &lg->arch.idt[num], num, lo, hi);
+ set_trap(vcpu->lg, &vcpu->arch.idt[num], num, lo, hi);
}

/* The default entry for each interrupt points into the Switcher routines which
@@ -438,14 +439,14 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
/*H:240 We don't use the IDT entries in the "struct lguest" directly, instead
* we copy them into the IDT which we've set up for Guests on this CPU, just
* before we run the Guest. This routine does that copy. */
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lguest_vcpu *vcpu, struct desc_struct *idt,
const unsigned long *def)
{
unsigned int i;

/* We can simply copy the direct traps, otherwise we use the default
* ones in the Switcher: they will return to the Host. */
- for (i = 0; i < ARRAY_SIZE(lg->arch.idt); i++) {
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.idt); i++) {
/* If no Guest can ever override this trap, leave it alone. */
if (!direct_trap(i))
continue;
@@ -454,8 +455,8 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
* Interrupt gates (type 14) disable interrupts as they are
* entered, which we never let the Guest do. Not present
* entries (type 0x0) also can't go direct, of course. */
- if (idt_type(lg->arch.idt[i].a, lg->arch.idt[i].b) == 0xF)
- idt[i] = lg->arch.idt[i];
+ if (idt_type(vcpu->arch.idt[i].a, vcpu->arch.idt[i].b) == 0xF)
+ idt[i] = vcpu->arch.idt[i];
else
/* Reset it to the default. */
default_idt_entry(&idt[i], i, def[i]);
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index d05fe38..f9429ff 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lguest_vcpu {

/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
+
+ struct lguest_vcpu_arch arch;
};

/* The private info the thread maintains about the guest. */
@@ -99,8 +101,6 @@ struct lguest

/* Dead? */
const char *dead;
-
- struct lguest_arch arch;
};

static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
@@ -144,12 +144,13 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user);
/* interrupts_and_traps.c: */
void maybe_do_interrupt(struct lguest_vcpu *vcpu);
int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num);
-void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
+void load_guest_idt_entry(struct lguest_vcpu *vcpu, unsigned int i,
+ u32 low, u32 hi);
void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
void pin_stack_pages(struct lguest *lg);
void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lguest_vcpu *vcpu, struct desc_struct *idt,
const unsigned long *def);
void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta);
void init_clockdev(struct lguest_vcpu *vcpu);
@@ -159,11 +160,11 @@ void free_interrupts(void);

/* segments.c: */
void setup_default_gdt_entries(struct lguest_ro_state *state);
-void setup_guest_gdt(struct lguest *lg);
-void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num);
-void guest_load_tls(struct lguest *lg, unsigned long tls_array);
-void copy_gdt(const struct lguest *lg, struct desc_struct *gdt);
-void copy_gdt_tls(const struct lguest *lg, struct desc_struct *gdt);
+void setup_guest_gdt(struct lguest_vcpu *vcpu);
+void load_guest_gdt(struct lguest_vcpu *vcpu, unsigned long table, u32 num);
+void guest_load_tls(struct lguest_vcpu *vcpu, unsigned long tls_array);
+void copy_gdt(const struct lguest_vcpu *vcpu, struct desc_struct *gdt);
+void copy_gdt_tls(const struct lguest_vcpu *vcpu, struct desc_struct *gdt);

/* page_tables.c: */
int init_guest_pagetable(struct lguest *lg, unsigned long pgtable);
diff --git a/drivers/lguest/segments.c b/drivers/lguest/segments.c
index 9e189cb..c9608bd 100644
--- a/drivers/lguest/segments.c
+++ b/drivers/lguest/segments.c
@@ -58,7 +58,8 @@ static int ignored_gdt(unsigned int num)
* Protection Fault in the Switcher when it restores a Guest segment register
* which tries to use that entry. Then we kill the Guest for causing such a
* mess: the message will be "unhandled trap 256". */
-static void fixup_gdt_table(struct lguest *lg, unsigned start, unsigned end)
+static void fixup_gdt_table(struct lguest_vcpu *vcpu, unsigned start,
+ unsigned end)
{
unsigned int i;

@@ -71,14 +72,14 @@ static void fixup_gdt_table(struct lguest *lg, unsigned start, unsigned end)
/* Segment descriptors contain a privilege level: the Guest is
* sometimes careless and leaves this as 0, even though it's
* running at privilege level 1. If so, we fix it here. */
- if ((lg->arch.gdt[i].b & 0x00006000) == 0)
- lg->arch.gdt[i].b |= (GUEST_PL << 13);
+ if ((vcpu->arch.gdt[i].b & 0x00006000) == 0)
+ vcpu->arch.gdt[i].b |= (GUEST_PL << 13);

/* Each descriptor has an "accessed" bit. If we don't set it
* now, the CPU will try to set it when the Guest first loads
* that entry into a segment register. But the GDT isn't
* writable by the Guest, so bad things can happen. */
- lg->arch.gdt[i].b |= 0x00000100;
+ vcpu->arch.gdt[i].b |= 0x00000100;
}
}

@@ -109,31 +110,31 @@ void setup_default_gdt_entries(struct lguest_ro_state *state)

/* This routine sets up the initial Guest GDT for booting. All entries start
* as 0 (unusable). */
-void setup_guest_gdt(struct lguest *lg)
+void setup_guest_gdt(struct lguest_vcpu *vcpu)
{
/* Start with full 0-4G segments... */
- lg->arch.gdt[GDT_ENTRY_KERNEL_CS] = FULL_EXEC_SEGMENT;
- lg->arch.gdt[GDT_ENTRY_KERNEL_DS] = FULL_SEGMENT;
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_CS] = FULL_EXEC_SEGMENT;
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_DS] = FULL_SEGMENT;
/* ...except the Guest is allowed to use them, so set the privilege
* level appropriately in the flags. */
- lg->arch.gdt[GDT_ENTRY_KERNEL_CS].b |= (GUEST_PL << 13);
- lg->arch.gdt[GDT_ENTRY_KERNEL_DS].b |= (GUEST_PL << 13);
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_CS].b |= (GUEST_PL << 13);
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_DS].b |= (GUEST_PL << 13);
}

/*H:650 An optimization of copy_gdt(), for just the three "thead-local storage"
* entries. */
-void copy_gdt_tls(const struct lguest *lg, struct desc_struct *gdt)
+void copy_gdt_tls(const struct lguest_vcpu *vcpu, struct desc_struct *gdt)
{
unsigned int i;

for (i = GDT_ENTRY_TLS_MIN; i <= GDT_ENTRY_TLS_MAX; i++)
- gdt[i] = lg->arch.gdt[i];
+ gdt[i] = vcpu->arch.gdt[i];
}

/*H:640 When the Guest is run on a different CPU, or the GDT entries have
* changed, copy_gdt() is called to copy the Guest's GDT entries across to this
* CPU's GDT. */
-void copy_gdt(const struct lguest *lg, struct desc_struct *gdt)
+void copy_gdt(const struct lguest_vcpu *vcpu, struct desc_struct *gdt)
{
unsigned int i;

@@ -141,21 +142,22 @@ void copy_gdt(const struct lguest *lg, struct desc_struct *gdt)
* replaced. See ignored_gdt() above. */
for (i = 0; i < GDT_ENTRIES; i++)
if (!ignored_gdt(i))
- gdt[i] = lg->arch.gdt[i];
+ gdt[i] = vcpu->arch.gdt[i];
}

/*H:620 This is where the Guest asks us to load a new GDT (LHCALL_LOAD_GDT).
* We copy it from the Guest and tweak the entries. */
-void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num)
+void load_guest_gdt(struct lguest_vcpu *vcpu, unsigned long table, u32 num)
{
+ struct lguest *lg = vcpu->lg;
/* We assume the Guest has the same number of GDT entries as the
* Host, otherwise we'd have to dynamically allocate the Guest GDT. */
- if (num > ARRAY_SIZE(lg->arch.gdt))
+ if (num > ARRAY_SIZE(vcpu->arch.gdt))
kill_guest(lg, "too many gdt entries %i", num);

/* We read the whole thing in, then fix it up. */
- __lgread(lg, lg->arch.gdt, table, num * sizeof(lg->arch.gdt[0]));
- fixup_gdt_table(lg, 0, ARRAY_SIZE(lg->arch.gdt));
+ __lgread(lg, vcpu->arch.gdt, table, num * sizeof(vcpu->arch.gdt[0]));
+ fixup_gdt_table(vcpu, 0, ARRAY_SIZE(vcpu->arch.gdt));
/* Mark that the GDT changed so the core knows it has to copy it again,
* even if the Guest is run on the same CPU. */
lg->changed |= CHANGED_GDT;
@@ -165,12 +167,13 @@ void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num)
* Remember that this happens on every context switch, so it's worth
* optimizing. But wouldn't it be neater to have a single hypercall to cover
* both cases? */
-void guest_load_tls(struct lguest *lg, unsigned long gtls)
+void guest_load_tls(struct lguest_vcpu *vcpu, unsigned long gtls)
{
- struct desc_struct *tls = &lg->arch.gdt[GDT_ENTRY_TLS_MIN];
+ struct desc_struct *tls = &vcpu->arch.gdt[GDT_ENTRY_TLS_MIN];
+ struct lguest *lg = vcpu->lg;

__lgread(lg, tls, gtls, sizeof(*tls)*GDT_ENTRY_TLS_ENTRIES);
- fixup_gdt_table(lg, GDT_ENTRY_TLS_MIN, GDT_ENTRY_TLS_MAX+1);
+ fixup_gdt_table(vcpu, GDT_ENTRY_TLS_MIN, GDT_ENTRY_TLS_MAX+1);
/* Note that just the TLS entries have changed. */
lg->changed |= CHANGED_GDT_TLS;
}
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index a0d710e..177b9e5 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -101,14 +101,14 @@ static void copy_in_guest_info(struct lguest_vcpu *vcpu,

/* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
- copy_traps(lg, pages->state.guest_idt, default_idt_entries);
+ copy_traps(vcpu, pages->state.guest_idt, default_idt_entries);

/* Copy all GDT entries which the Guest can change. */
if (lg->changed & CHANGED_GDT)
- copy_gdt(lg, pages->state.guest_gdt);
+ copy_gdt(vcpu, pages->state.guest_gdt);
/* If only the TLS entries have changed, copy them. */
else if (lg->changed & CHANGED_GDT_TLS)
- copy_gdt_tls(lg, pages->state.guest_gdt);
+ copy_gdt_tls(vcpu, pages->state.guest_gdt);

/* Mark the Guest as unchanged for next time. */
lg->changed = 0;
@@ -198,7 +198,7 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
* re-enable interrupts an interrupt could fault and thus overwrite
* cr2, or we could even move off to a different CPU. */
if (vcpu->regs->trapnum == 14)
- lg->arch.last_pagefault = read_cr2();
+ vcpu->arch.last_pagefault = read_cr2();
/* Similarly, if we took a trap because the Guest used the FPU,
* we have to restore the FPU it expects to see. */
else if (vcpu->regs->trapnum == 7)
@@ -309,7 +309,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, lg->arch.last_pagefault,
+ if (demand_page(lg, vcpu->arch.last_pagefault,
vcpu->regs->errcode))
return;

@@ -321,7 +321,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
* happen before it's done the LHCALL_LGUEST_INIT hypercall, so
* lg->lguest_data could be NULL */
if (lg->lguest_data &&
- put_user(lg->arch.last_pagefault, &lg->lguest_data->cr2))
+ put_user(vcpu->arch.last_pagefault, &lg->lguest_data->cr2))
kill_guest(lg, "Writing cr2");
break;
case 7: /* We've intercepted a Device Not Available fault. */
@@ -352,7 +352,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
* it handle), it dies with a cryptic error message. */
kill_guest(lg, "unhandled trap %li at %#lx (%#lx)",
vcpu->regs->trapnum, vcpu->regs->eip,
- vcpu->regs->trapnum == 14 ? lg->arch.last_pagefault
+ vcpu->regs->trapnum == 14 ? vcpu->arch.last_pagefault
: vcpu->regs->errcode);
}

@@ -498,17 +498,15 @@ void __exit lguest_arch_host_fini(void)
/*H:122 The i386-specific hypercalls simply farm out to the right functions. */
int lguest_arch_do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
{
- struct lguest *lg = vcpu->lg;
-
switch (args->arg0) {
case LHCALL_LOAD_GDT:
- load_guest_gdt(lg, args->arg1, args->arg2);
+ load_guest_gdt(vcpu, args->arg1, args->arg2);
break;
case LHCALL_LOAD_IDT_ENTRY:
- load_guest_idt_entry(lg, args->arg1, args->arg2, args->arg3);
+ load_guest_idt_entry(vcpu, args->arg1, args->arg2, args->arg3);
break;
case LHCALL_LOAD_TLS:
- guest_load_tls(lg, args->arg1);
+ guest_load_tls(vcpu, args->arg1);
break;
default:
/* Bad Guest. Bad! */
@@ -589,5 +587,5 @@ void lguest_arch_setup_regs(struct lguest_vcpu *vcpu, unsigned long start)

/* There are a couple of GDT entries the Guest expects when first
* booting. */
- setup_guest_gdt(vcpu->lg);
+ setup_guest_gdt(vcpu);
}
diff --git a/include/asm-x86/lguest.h b/include/asm-x86/lguest.h
index ccd3384..4a10736 100644
--- a/include/asm-x86/lguest.h
+++ b/include/asm-x86/lguest.h
@@ -56,7 +56,7 @@ struct lguest_ro_state
struct desc_struct guest_gdt[GDT_ENTRIES];
};

-struct lguest_arch
+struct lguest_vcpu_arch
{
/* The GDT entries copied into lguest_ro_state when running. */
struct desc_struct gdt[GDT_ENTRIES];
--
1.5.0.6

2007-12-20 13:39:21

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 11/16] make registers per-vcpu

These are the most obvious per-vcpu fields: the registers.

So this patch moves them from struct lguest to struct lguest_vcpu,
and patches the places in which they are used accordingly.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
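The per-vcpu register page can be sketched in user space like this. It
is an illustration only: the sketch_* names, the register subset and
SKETCH_PAGE_SIZE are assumptions, and calloc() stands in for
get_zeroed_page(). Note that the pointer arithmetic puts the registers
in the last bytes of the page, at its highest addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

struct sketch_regs {
	uint32_t eip, cs, eflags, esp, ss;	/* hypothetical subset */
};

struct sketch_vcpu {
	unsigned char *regs_page;	/* one whole zeroed page per vcpu */
	struct sketch_regs *regs;	/* points into the end of that page */
};

/* Returns 0 on success, -1 if the page could not be allocated. */
int sketch_vcpu_alloc_regs(struct sketch_vcpu *vcpu)
{
	assert(sizeof(struct sketch_regs) <= SKETCH_PAGE_SIZE);

	vcpu->regs_page = calloc(1, SKETCH_PAGE_SIZE);
	if (!vcpu->regs_page)
		return -1;

	/* Place the register block so that it ends exactly at the end of
	 * the page. */
	vcpu->regs = (struct sketch_regs *)(vcpu->regs_page +
					    SKETCH_PAGE_SIZE -
					    sizeof(*vcpu->regs));
	return 0;
}

/* Counterpart of the free_page() call made per vcpu on close. */
void sketch_vcpu_free_regs(struct sketch_vcpu *vcpu)
{
	free(vcpu->regs_page);
	vcpu->regs_page = NULL;
	vcpu->regs = NULL;
}
```

Because allocation happens per vcpu, the teardown path has to walk all
started vcpus and free each page, as the close() hunk in this patch does.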
drivers/lguest/interrupts_and_traps.c | 29 ++++++++++++-----------
drivers/lguest/lg.h | 9 ++++---
drivers/lguest/lguest_user.c | 36 +++++++++++++++---------------
drivers/lguest/page_tables.c | 4 ++-
drivers/lguest/x86/core.c | 39 +++++++++++++++++----------------
5 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index db440cb..1ceff5f 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -71,7 +71,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
/* There are two cases for interrupts: one where the Guest is already
* in the kernel, and a more complex one where the Guest is in
* userspace. We check the privilege level to find out. */
- if ((lg->regs->ss&0x3) != GUEST_PL) {
+ if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
* hypercall: both the virtual address and the segment */
virtstack = lg->esp1;
@@ -82,12 +82,12 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
* stack: when the Guest does an "iret" back from the interrupt
* handler the CPU will notice they're dropping privilege
* levels and expect these here. */
- push_guest_stack(lg, &gstack, lg->regs->ss);
- push_guest_stack(lg, &gstack, lg->regs->esp);
+ push_guest_stack(lg, &gstack, vcpu->regs->ss);
+ push_guest_stack(lg, &gstack, vcpu->regs->esp);
} else {
/* We're staying on the same Guest (kernel) stack. */
- virtstack = lg->regs->esp;
- ss = lg->regs->ss;
+ virtstack = vcpu->regs->esp;
+ ss = vcpu->regs->ss;

origstack = gstack = guest_pa(lg, virtstack);
}
@@ -96,7 +96,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
* the "Interrupt Flag" bit is always set. We copy that bit from the
* Guest's "irq_enabled" field into the eflags word: we saw the Guest
* copy it back in "lguest_iret". */
- eflags = lg->regs->eflags;
+ eflags = vcpu->regs->eflags;
if (get_user(irq_enable, &lg->lguest_data->irq_enabled) == 0
&& !(irq_enable & X86_EFLAGS_IF))
eflags &= ~X86_EFLAGS_IF;
@@ -105,19 +105,19 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
* "eflags" word, the old code segment, and the old instruction
* pointer. */
push_guest_stack(lg, &gstack, eflags);
- push_guest_stack(lg, &gstack, lg->regs->cs);
- push_guest_stack(lg, &gstack, lg->regs->eip);
+ push_guest_stack(lg, &gstack, vcpu->regs->cs);
+ push_guest_stack(lg, &gstack, vcpu->regs->eip);

/* For the six traps which supply an error code, we push that, too. */
if (has_err)
- push_guest_stack(lg, &gstack, lg->regs->errcode);
+ push_guest_stack(lg, &gstack, vcpu->regs->errcode);

/* Now we've pushed all the old state, we change the stack, the code
* segment and the address to execute. */
- lg->regs->ss = ss;
- lg->regs->esp = virtstack + (gstack - origstack);
- lg->regs->cs = (__KERNEL_CS|GUEST_PL);
- lg->regs->eip = idt_address(lo, hi);
+ vcpu->regs->ss = ss;
+ vcpu->regs->esp = virtstack + (gstack - origstack);
+ vcpu->regs->cs = (__KERNEL_CS|GUEST_PL);
+ vcpu->regs->eip = idt_address(lo, hi);

/* There are two kinds of interrupt handlers: 0xE is an "interrupt
* gate" which expects interrupts to be disabled on entry. */
@@ -158,7 +158,8 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)

/* They may be in the middle of an iret, where they asked us never to
* deliver interrupts. */
- if (lg->regs->eip >= lg->noirq_start && lg->regs->eip < lg->noirq_end)
+ if ((vcpu->regs->eip >= lg->noirq_start) &&
+ (vcpu->regs->eip < lg->noirq_end))
return;

/* If they're halted, interrupts restart them. */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f6e9020..d05fe38 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -44,6 +44,10 @@ struct lguest_vcpu {
int vcpu_id;
struct lguest *lg;

+ /* At end of a page shared mapped over lguest_pages in guest. */
+ unsigned long regs_page;
+ struct lguest_regs *regs;
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -58,9 +62,6 @@ struct lguest_vcpu {
/* The private info the thread maintains about the guest. */
struct lguest
{
- /* At end of a page shared mapped over lguest_pages in guest. */
- unsigned long regs_page;
- struct lguest_regs *regs;
struct lguest_data __user *lguest_data;
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
@@ -187,7 +188,7 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu);
void lguest_arch_handle_trap(struct lguest_vcpu *vcpu);
int lguest_arch_init_hypercalls(struct lguest_vcpu *vcpu);
int lguest_arch_do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args);
-void lguest_arch_setup_regs(struct lguest *lg, unsigned long start);
+void lguest_arch_setup_regs(struct lguest_vcpu *vcpu, unsigned long start);

/* <arch>/switcher.S: */
extern char start_switcher_text[], end_switcher_text[], switch_to_guest[];
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 60cf6c6..4f51e25 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -101,6 +101,19 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
/* The timer for lguest's clock needs initialization. */
init_clockdev(vcpu);

+ /* We need a complete page for the Guest registers: they are accessible
+ * to the Guest and we can only grant it access to whole pages. */
+ vcpu->regs_page = get_zeroed_page(GFP_KERNEL);
+ if (!vcpu->regs_page)
+ return -ENOMEM;
+
+ /* We actually put the registers at the bottom of the page. */
+ vcpu->regs = (void *)vcpu->regs_page + PAGE_SIZE - sizeof(*vcpu->regs);
+
+ /* Now we initialize the Guest's registers, handing it the start
+ * address. */
+ lguest_arch_setup_regs(vcpu, start_ip);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -159,16 +172,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto release_guest;

- /* We need a complete page for the Guest registers: they are accessible
- * to the Guest and we can only grant it access to whole pages. */
- lg->regs_page = get_zeroed_page(GFP_KERNEL);
- if (!lg->regs_page) {
- err = -ENOMEM;
- goto release_guest;
- }
- /* We actually put the registers at the bottom of the page. */
- lg->regs = (void *)lg->regs_page + PAGE_SIZE - sizeof(*lg->regs);
-
/* Initialize the Guest's shadow page tables, using the toplevel
* address the Launcher gave us. This allocates memory, so can
* fail. */
@@ -176,10 +179,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto free_regs;

- /* Now we initialize the Guest's registers, handing it the start
- * address. */
- lguest_arch_setup_regs(lg, args[3]);
-
/* We keep a pointer to the Launcher task (ie. current task) for when
* other Guests want to wake this one (inter-Guest I/O). */
lg->tsk = current;
@@ -204,7 +203,7 @@ static int initialize(struct file *file, const unsigned long __user *input)
return sizeof(args);

free_regs:
- free_page(lg->regs_page);
+ free_page(lg->vcpus[0].regs_page);
release_guest:
kfree(lg);
unlock:
@@ -279,9 +278,12 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
- for (i = 0; i < lg->nr_vcpus; i++)
+ for (i = 0; i < lg->nr_vcpus; i++) {
/* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
hrtimer_cancel(&lg->vcpus[i].hrt);
+ /* We can free up the register page we allocated. */
+ free_page(lg->vcpus[i].regs_page);
+ }
/* Free up the shadow page tables for the Guest. */
free_guest_pagetable(lg);
/* Now all the memory cleanups are done, it's safe to release the
@@ -291,8 +293,6 @@ static int close(struct inode *inode, struct file *file)
* kmalloc()ed string, either of which is ok to hand to kfree(). */
if (!IS_ERR(lg->dead))
kfree(lg->dead);
- /* We can free up the register page we allocated. */
- free_page(lg->regs_page);
/* We clear the entire structure, which also marks it as free for the
* next user. */
memset(lg, 0, sizeof(*lg));
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index 7fb8627..8c41030 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -641,6 +641,7 @@ void map_switcher_in_guest(struct lguest_vcpu *vcpu,
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
+ unsigned long pfn;

/* Make the last PGD entry for this Guest point to the Switcher's PTE
* page for this CPU (with appropriate flags). */
@@ -655,7 +656,8 @@ void map_switcher_in_guest(struct lguest_vcpu *vcpu,
* CPU's "struct lguest_pages": if we make sure the Guest's register
* page is already mapped there, we don't have to copy them out
* again. */
- regs_pte = pfn_pte (__pa(lg->regs_page) >> PAGE_SHIFT, __pgprot(_PAGE_KERNEL));
+ pfn = __pa(vcpu->regs_page) >> PAGE_SHIFT;
+ regs_pte = pfn_pte(pfn, __pgprot(_PAGE_KERNEL));
switcher_pte_page[(unsigned long)pages/PAGE_SIZE%PTRS_PER_PTE] = regs_pte;
}
/*:*/
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 2fb9cd3..a0d710e 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -129,7 +129,7 @@ static void run_guest_once(struct lguest_vcpu *vcpu,
/* Set the trap number to 256 (impossible value). If we fault while
* switching to the Guest (bad segment registers or bug), this will
* cause us to abort the Guest. */
- lg->regs->trapnum = 256;
+ vcpu->regs->trapnum = 256;

/* Now: we push the "eflags" register on the stack, then do an "lcall".
* This is how we change from using the kernel code segment to using
@@ -197,11 +197,11 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
* bad virtual address. We have to grab this now, because once we
* re-enable interrupts an interrupt could fault and thus overwrite
* cr2, or we could even move off to a different CPU. */
- if (lg->regs->trapnum == 14)
+ if (vcpu->regs->trapnum == 14)
lg->arch.last_pagefault = read_cr2();
/* Similarly, if we took a trap because the Guest used the FPU,
* we have to restore the FPU it expects to see. */
- else if (lg->regs->trapnum == 7)
+ else if (vcpu->regs->trapnum == 7)
math_state_restore();

/* Restore SYSENTER if it's supposed to be on. */
@@ -227,12 +227,12 @@ static int emulate_insn(struct lguest_vcpu *vcpu)
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
* guest_pa just subtracts the Guest's page_offset. */
- unsigned long physaddr = guest_pa(lg, lg->regs->eip);
+ unsigned long physaddr = guest_pa(lg, vcpu->regs->eip);

/* This must be the Guest kernel trying to do something, not userspace!
* The bottom two bits of the CS segment register are the privilege
* level. */
- if ((lg->regs->cs & 3) != GUEST_PL)
+ if ((vcpu->regs->cs & 3) != GUEST_PL)
return 0;

/* Decoding x86 instructions is icky. */
@@ -275,12 +275,12 @@ static int emulate_insn(struct lguest_vcpu *vcpu)
if (in) {
/* Lower bit tells is whether it's a 16 or 32 bit access */
if (insn & 0x1)
- lg->regs->eax = 0xFFFFFFFF;
+ vcpu->regs->eax = 0xFFFFFFFF;
else
- lg->regs->eax |= (0xFFFF << shift);
+ vcpu->regs->eax |= (0xFFFF << shift);
}
/* Finally, we've "done" the instruction, so move past it. */
- lg->regs->eip += insnlen;
+ vcpu->regs->eip += insnlen;
/* Success! */
return 1;
}
@@ -289,12 +289,12 @@ static int emulate_insn(struct lguest_vcpu *vcpu)
void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
{
struct lguest *lg = vcpu->lg;
- switch (lg->regs->trapnum) {
+ switch (vcpu->regs->trapnum) {
case 13: /* We've intercepted a General Protection Fault. */
/* Check if this was one of those annoying IN or OUT
* instructions which we need to emulate. If so, we just go
* back into the Guest after we've done it. */
- if (lg->regs->errcode == 0) {
+ if (vcpu->regs->errcode == 0) {
if (emulate_insn(vcpu))
return;
}
@@ -309,7 +309,8 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, lg->arch.last_pagefault, lg->regs->errcode))
+ if (demand_page(lg, lg->arch.last_pagefault,
+ vcpu->regs->errcode))
return;

/* OK, it's really not there (or not OK): the Guest needs to
@@ -340,19 +341,19 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
case LGUEST_TRAP_ENTRY:
/* Our 'struct hcall_args' maps directly over our regs: we set
* up the pointer now to indicate a hypercall is pending. */
- vcpu->hcall = (struct hcall_args *)lg->regs;
+ vcpu->hcall = (struct hcall_args *)vcpu->regs;
return;
}

/* We didn't handle the trap, so it needs to go to the Guest. */
- if (!deliver_trap(vcpu, lg->regs->trapnum))
+ if (!deliver_trap(vcpu, vcpu->regs->trapnum))
/* If the Guest doesn't have a handler (either it hasn't
* registered any yet, or it's one of the faults we don't let
* it handle), it dies with a cryptic error message. */
kill_guest(lg, "unhandled trap %li at %#lx (%#lx)",
- lg->regs->trapnum, lg->regs->eip,
- lg->regs->trapnum == 14 ? lg->arch.last_pagefault
- : lg->regs->errcode);
+ vcpu->regs->trapnum, vcpu->regs->eip,
+ vcpu->regs->trapnum == 14 ? lg->arch.last_pagefault
+ : vcpu->regs->errcode);
}

/* Now we can look at each of the routines this calls, in increasing order of
@@ -559,9 +560,9 @@ int lguest_arch_init_hypercalls(struct lguest_vcpu *vcpu)
*
* Most of the Guest's registers are left alone: we used get_zeroed_page() to
* allocate the structure, so they will be 0. */
-void lguest_arch_setup_regs(struct lguest *lg, unsigned long start)
+void lguest_arch_setup_regs(struct lguest_vcpu *vcpu, unsigned long start)
{
- struct lguest_regs *regs = lg->regs;
+ struct lguest_regs *regs = vcpu->regs;

/* There are four "segment" registers which the Guest needs to boot:
* The "code segment" register (cs) refers to the kernel code segment
@@ -588,5 +589,5 @@ void lguest_arch_setup_regs(struct lguest *lg, unsigned long start)

/* There are a couple of GDT entries the Guest expects when first
* booting. */
- setup_guest_gdt(lg);
+ setup_guest_gdt(vcpu->lg);
}
--
1.5.0.6
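For readers following the register-page move, here is a minimal userspace sketch of the layout that vcpu_start() sets up: one zeroed page per vcpu, with the register block occupying the last bytes of that page. Every sketch_* name is hypothetical, struct sketch_regs is only a stand-in for struct lguest_regs, and aligned_alloc()/free() take the place of get_zeroed_page()/free_page(). The 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical stand-in for struct lguest_regs; the real layout differs. */
struct sketch_regs {
	unsigned long eax, eip, cs, trapnum, errcode;
};

struct sketch_vcpu {
	uintptr_t regs_page;        /* address of the whole page */
	struct sketch_regs *regs;   /* last sizeof(*regs) bytes of that page */
};

/* Userspace analogue of the allocation in vcpu_start().
 * Returns 0 on success, -1 if no memory is available. */
static int sketch_alloc_regs(struct sketch_vcpu *vcpu)
{
	void *page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

	if (!page)
		return -1;
	memset(page, 0, SKETCH_PAGE_SIZE);   /* like get_zeroed_page() */

	vcpu->regs_page = (uintptr_t)page;
	/* Same arithmetic as the patch: page + PAGE_SIZE - sizeof(*regs). */
	vcpu->regs = (struct sketch_regs *)
		((char *)page + SKETCH_PAGE_SIZE - sizeof(*vcpu->regs));
	return 0;
}

/* Userspace analogue of the per-vcpu free_page() call in close(). */
static void sketch_free_regs(struct sketch_vcpu *vcpu)
{
	free((void *)vcpu->regs_page);
	vcpu->regs_page = 0;
	vcpu->regs = NULL;
}
```

Because each vcpu owns its page, the cleanup loop in close() can free them one by one, which is what the hunk in lguest_user.c does.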

2007-12-20 13:39:40

by Glauber Costa

Subject: [PATCH 14/16] makes special fields be per-vcpu

The lguest struct has room for some fields, namely cr2, ts, esp1
and ss1, that are not really guest-wide but rather vcpu-wide.

This patch moves them into the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 10 +++++-----
drivers/lguest/interrupts_and_traps.c | 24 +++++++++++++-----------
drivers/lguest/lg.h | 18 ++++++++++--------
drivers/lguest/page_tables.c | 11 ++++++-----
drivers/lguest/x86/core.c | 10 ++++------
5 files changed, 38 insertions(+), 35 deletions(-)
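As an illustration of the change described above, the following userspace sketch models guest_set_stack() and the address arithmetic of pin_stack_pages() once esp1 and ss1 live in the vcpu. All sketch_* names are hypothetical. SKETCH_GUEST_PL = 1 and the 4096-byte page size are assumptions, and where the driver calls kill_guest() on a bad argument this sketch simply returns -1.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GUEST_PL  1       /* assumed guest privilege level */
#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

struct sketch_guest {
	unsigned int stack_pages;    /* still guest-wide after this patch */
};

struct sketch_vcpu {
	struct sketch_guest *lg;
	uint32_t esp1;               /* per-vcpu kernel stack pointer */
	uint8_t ss1;                 /* per-vcpu kernel stack segment */
};

/* Record the kernel stack for one vcpu. Returns 0 on success, -1 when
 * the driver would have killed the guest instead. */
static int sketch_set_stack(struct sketch_vcpu *vcpu, uint32_t seg,
			    uint32_t esp, unsigned int pages)
{
	/* The low two bits of the segment are its privilege level. */
	if ((seg & 0x3) != SKETCH_GUEST_PL)
		return -1;
	/* Only one or two stack pages are expected. */
	if (pages > 2)
		return -1;
	vcpu->ss1 = (uint8_t)seg;
	vcpu->esp1 = esp;
	vcpu->lg->stack_pages = pages;
	return 0;
}

/* An address inside the i-th stack page, counting down from esp1,
 * using the same expression that pin_stack_pages() passes to pin_page(). */
static uint32_t sketch_stack_page_addr(const struct sketch_vcpu *vcpu,
				       unsigned int i)
{
	return vcpu->esp1 - 1 - i * SKETCH_PAGE_SIZE;
}
```

Two vcpus of the same guest can now hold different esp1/ss1 values, while stack_pages is shared and simply reflects the most recent call.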

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 41ea2e2..c6b87ef 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -58,7 +58,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
/* FLUSH_TLB comes in two flavors, depending on the
* argument: */
if (args->arg1)
- guest_pagetable_clear_all(lg);
+ guest_pagetable_clear_all(vcpu);
else
guest_pagetable_flush_user(lg);
break;
@@ -66,10 +66,10 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
/* All these calls simply pass the arguments through to the right
* routines. */
case LHCALL_NEW_PGTABLE:
- guest_new_pagetable(lg, args->arg1);
+ guest_new_pagetable(vcpu, args->arg1);
break;
case LHCALL_SET_STACK:
- guest_set_stack(lg, args->arg1, args->arg2, args->arg3);
+ guest_set_stack(vcpu, args->arg1, args->arg2, args->arg3);
break;
case LHCALL_SET_PTE:
guest_set_pte(lg, args->arg1, args->arg2, __pte(args->arg3));
@@ -82,7 +82,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
- lg->ts = args->arg1;
+ vcpu->ts = args->arg1;
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
@@ -189,7 +189,7 @@ static void initialize(struct lguest_vcpu *vcpu)
* first write to a Guest page. This may have caused a copy-on-write
* fault, but the old page might be (read-only) in the Guest
* pagetable. */
- guest_pagetable_clear_all(lg);
+ guest_pagetable_clear_all(vcpu);
}

/*H:100
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 10c9aea..78f6210 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -74,8 +74,8 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
* hypercall: both the virtual address and the segment */
- virtstack = lg->esp1;
- ss = lg->ss1;
+ virtstack = vcpu->esp1;
+ ss = vcpu->ss1;

origstack = gstack = guest_pa(lg, virtstack);
/* We push the old stack segment and pointer onto the new
@@ -313,10 +313,11 @@ static int direct_trap(unsigned int num)
* the Guest.
*
* Which is deeply unfair, because (literally!) it wasn't the Guests' fault. */
-void pin_stack_pages(struct lguest *lg)
+void pin_stack_pages(struct lguest_vcpu *vcpu)
{
unsigned int i;

+ struct lguest *lg = vcpu->lg;
/* Depending on the CONFIG_4KSTACKS option, the Guest can have one or
* two pages of stack space. */
for (i = 0; i < lg->stack_pages; i++)
@@ -324,7 +325,7 @@ void pin_stack_pages(struct lguest *lg)
* start of the page after the kernel stack. Subtract one to
* get back onto the first stack page, and keep subtracting to
* get to the rest of the stack pages. */
- pin_page(lg, lg->esp1 - 1 - i * PAGE_SIZE);
+ pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
}

/* Direct traps also mean that we need to know whenever the Guest wants to use
@@ -335,21 +336,22 @@ void pin_stack_pages(struct lguest *lg)
*
* In Linux each process has its own kernel stack, so this happens a lot: we
* change stacks on each context switch. */
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages)
+void guest_set_stack(struct lguest_vcpu *vcpu, u32 seg, u32 esp,
+ unsigned int pages)
{
/* You are not allowed have a stack segment with privilege level 0: bad
* Guest! */
if ((seg & 0x3) != GUEST_PL)
- kill_guest(lg, "bad stack segment %i", seg);
+ kill_guest(vcpu->lg, "bad stack segment %i", seg);
/* We only expect one or two stack pages. */
if (pages > 2)
- kill_guest(lg, "bad stack pages %u", pages);
+ kill_guest(vcpu->lg, "bad stack pages %u", pages);
/* Save where the stack is, and how many pages */
- lg->ss1 = seg;
- lg->esp1 = esp;
- lg->stack_pages = pages;
+ vcpu->ss1 = seg;
+ vcpu->esp1 = esp;
+ vcpu->lg->stack_pages = pages;
/* Make sure the new stack pages are mapped */
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}

/* All this reference to mapping stacks leads us neatly into the other complex
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index b23694e..dbf70c6 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -46,6 +46,11 @@ struct lguest_vcpu {
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */

+ u32 cr2;
+ int ts;
+ u32 esp1;
+ u8 ss1;
+
/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
struct lguest_regs *regs;
@@ -80,10 +85,6 @@ struct lguest
* memory in the Launcher. */
void __user *mem_base;
unsigned long kernel_address;
- u32 cr2;
- int ts;
- u32 esp1;
- u8 ss1;

/* Bitmap of what has changed: see CHANGED_* above. */
int changed;
@@ -146,8 +147,9 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu);
int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num);
void load_guest_idt_entry(struct lguest_vcpu *vcpu, unsigned int i,
u32 low, u32 hi);
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
-void pin_stack_pages(struct lguest *lg);
+void guest_set_stack(struct lguest_vcpu *vcpu, u32 seg, u32 esp,
+ unsigned int pages);
+void pin_stack_pages(struct lguest_vcpu *vcpu);
void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
void copy_traps(const struct lguest_vcpu *vcpu, struct desc_struct *idt,
@@ -169,9 +171,9 @@ void copy_gdt_tls(const struct lguest_vcpu *vcpu, struct desc_struct *gdt);
/* page_tables.c: */
int init_guest_pagetable(struct lguest *lg, unsigned long pgtable);
void free_guest_pagetable(struct lguest *lg);
-void guest_new_pagetable(struct lguest *lg, unsigned long pgtable);
+void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable);
void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
-void guest_pagetable_clear_all(struct lguest *lg);
+void guest_pagetable_clear_all(struct lguest_vcpu *vcpu);
void guest_pagetable_flush_user(struct lguest *lg);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index 8c41030..f0f271d 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -432,9 +432,10 @@ static unsigned int new_pgdir(struct lguest *lg,
* Now we've seen all the page table setting and manipulation, let's see what
* what happens when the Guest changes page tables (ie. changes the top-level
* pgdir). This occurs on almost every context switch. */
-void guest_new_pagetable(struct lguest *lg, unsigned long pgtable)
+void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable)
{
int newpgdir, repin = 0;
+ struct lguest *lg = vcpu->lg;

/* Look to see if we have this one already. */
newpgdir = find_pgdir(lg, pgtable);
@@ -446,7 +447,7 @@ void guest_new_pagetable(struct lguest *lg, unsigned long pgtable)
lg->pgdidx = newpgdir;
/* If it was completely blank, we map in the Guest kernel stack */
if (repin)
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}

/*H:470 Finally, a routine which throws away everything: all PGD entries in all
@@ -468,11 +469,11 @@ static void release_all_pagetables(struct lguest *lg)
* mapping. Since kernel mappings are in every page table, it's easiest to
* throw them all away. This traps the Guest in amber for a while as
* everything faults back in, but it's rare. */
-void guest_pagetable_clear_all(struct lguest *lg)
+void guest_pagetable_clear_all(struct lguest_vcpu *vcpu)
{
- release_all_pagetables(lg);
+ release_all_pagetables(vcpu->lg);
/* We need the Guest kernel stack mapped again. */
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}
/*:*/
/*M:009 Since we throw away all mappings when a kernel mapping changes, our
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 177b9e5..aec2527 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -96,8 +96,8 @@ static void copy_in_guest_info(struct lguest_vcpu *vcpu,
/* Set up the two "TSS" members which tell the CPU what stack to use
* for traps which do directly into the Guest (ie. traps at privilege
* level 1). */
- pages->state.guest_tss.esp1 = lg->esp1;
- pages->state.guest_tss.ss1 = lg->ss1;
+ pages->state.guest_tss.esp1 = vcpu->esp1;
+ pages->state.guest_tss.ss1 = vcpu->ss1;

/* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
@@ -167,12 +167,10 @@ static void run_guest_once(struct lguest_vcpu *vcpu,
* are disabled: we own the CPU. */
void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
{
- struct lguest *lg = vcpu->lg;
-
/* Remember the awfully-named TS bit? If the Guest has asked to set it
* we set it now, so we can trap and pass that trap to the Guest if it
* uses the FPU. */
- if (lg->ts)
+ if (vcpu->ts)
lguest_set_ts();

/* SYSENTER is an optimized way of doing system calls. We can't allow
@@ -328,7 +326,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
/* If the Guest doesn't want to know, we already restored the
* Floating Point Unit, so we just continue without telling
* it. */
- if (!lg->ts)
+ if (!vcpu->ts)
return;
break;
case 32 ... 255:
--
1.5.0.6

2007-12-20 13:40:17

by Glauber Costa

Subject: [PATCH 13/16] per-vcpu lguest task management

lguest uses tasks to control its running behaviour (sending
breaks, controlling the halted state, etc.). In a per-vcpu environment,
each vcpu will have its own underlying task, so this patch puts the
infrastructure for that in place.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 4 +-
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 8 ++--
drivers/lguest/lg.h | 14 ++++----
drivers/lguest/lguest_user.c | 56 ++++++++++++++++++--------------
5 files changed, 45 insertions(+), 39 deletions(-)
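To picture what per-vcpu break_out and halted flags buy, here is a small single-threaded sketch of the decision run_guest() makes on each pass through its loop. It is only a model: the sketch_* names are hypothetical, and the real code also checks signals and pending interrupts and sleeps on a wait queue, none of which is reproduced here.

```c
#include <assert.h>

enum sketch_action {
	SKETCH_RUN,             /* enter the Guest */
	SKETCH_SLEEP,           /* Guest halted: schedule() until woken */
	SKETCH_RETURN_EAGAIN    /* Waker asked for a break: back to Launcher */
};

struct sketch_vcpu {
	int vcpu_id;
	int break_out;          /* set via LHREQ_BREAK for this vcpu only */
	int halted;             /* set via LHCALL_HALT for this vcpu only */
};

/* Same ordering as run_guest(): break_out is tested before halted. */
static enum sketch_action sketch_next_action(const struct sketch_vcpu *vcpu)
{
	if (vcpu->break_out)
		return SKETCH_RETURN_EAGAIN;
	if (vcpu->halted)
		return SKETCH_SLEEP;
	return SKETCH_RUN;
}
```

With the flags in struct lguest, a break or halt would have applied to every vcpu at once. With them in the vcpu, setting break_out on one vcpu leaves the others free to keep running.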

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 4d0102d..285a465 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -197,7 +197,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
return -ERESTARTSYS;

/* If Waker set break_out, return to Launcher. */
- if (lg->break_out)
+ if (vcpu->break_out)
return -EAGAIN;

/* Check if there are any interrupts which can be delivered
@@ -217,7 +217,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)

/* If the Guest asked to be stopped, we sleep. The Guest's
* clock timer or LHCALL_BREAK from the Waker will wake us. */
- if (lg->halted) {
+ if (vcpu->halted) {
set_current_state(TASK_INTERRUPTIBLE);
schedule();
continue;
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 4364bc2..41ea2e2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -86,7 +86,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
- lg->halted = 1;
+ vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
lg->pending_notify = args->arg1;
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index b3d444a..10c9aea 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -163,11 +163,11 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
return;

/* If they're halted, interrupts restart them. */
- if (lg->halted) {
+ if (vcpu->halted) {
/* Re-enable interrupts. */
if (put_user(X86_EFLAGS_IF, &lg->lguest_data->irq_enabled))
kill_guest(lg, "Re-enabling interrupts");
- lg->halted = 0;
+ vcpu->halted = 0;
} else {
/* Otherwise we check if they have interrupts disabled. */
u32 irq_enabled;
@@ -500,8 +500,8 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
/* Remember the first interrupt is the timer interrupt. */
set_bit(0, vcpu->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
- if (vcpu->lg->halted)
- wake_up_process(vcpu->lg->tsk);
+ if (vcpu->halted)
+ wake_up_process(vcpu->tsk);
return HRTIMER_NORESTART;
}

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f9429ff..b23694e 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,8 @@ struct lguest;
struct lguest_vcpu {
int vcpu_id;
struct lguest *lg;
+ struct task_struct *tsk;
+ struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */

/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
@@ -55,6 +57,11 @@ struct lguest_vcpu {
/* Virtual clock device */
struct hrtimer hrt;

+ /* Do we need to stop what we're doing and return to userspace? */
+ int break_out;
+ wait_queue_head_t break_wq;
+ int halted;
+
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);

@@ -65,8 +72,6 @@ struct lguest_vcpu {
struct lguest
{
struct lguest_data __user *lguest_data;
- struct task_struct *tsk;
- struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
struct lguest_vcpu vcpus[NR_CPUS];
unsigned int nr_vcpus;

@@ -76,15 +81,10 @@ struct lguest
void __user *mem_base;
unsigned long kernel_address;
u32 cr2;
- int halted;
int ts;
u32 esp1;
u8 ss1;

- /* Do we need to stop what we're doing and return to userspace? */
- int break_out;
- wait_queue_head_t break_wq;
-
/* Bitmap of what has changed: see CHANGED_* above. */
int changed;
struct lguest_pages *last_pages;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 4f51e25..d081db4 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -13,7 +13,8 @@
* LHREQ_BREAK and the value "1" to /dev/lguest to do this. Once the Launcher
* has done whatever needs attention, it writes LHREQ_BREAK and "0" to release
* the Waker. */
-static int break_guest_out(struct lguest *lg, const unsigned long __user *input)
+static int break_guest_out(struct lguest_vcpu *vcpu,
+ const unsigned long __user *input)
{
unsigned long on;

@@ -22,14 +23,15 @@ static int break_guest_out(struct lguest *lg, const unsigned long __user *input)
return -EFAULT;

if (on) {
- lg->break_out = 1;
+ vcpu->break_out = 1;
/* Pop it out of the Guest (may be running on different CPU) */
- wake_up_process(lg->tsk);
+ wake_up_process(vcpu->tsk);
/* Wait for them to reset it */
- return wait_event_interruptible(lg->break_wq, !lg->break_out);
+ return wait_event_interruptible(vcpu->break_wq,
+ !vcpu->break_out);
} else {
- lg->break_out = 0;
- wake_up(&lg->break_wq);
+ vcpu->break_out = 0;
+ wake_up(&vcpu->break_wq);
return 0;
}
}
@@ -66,7 +68,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
vcpu = &lg->vcpus[vcpu_id];

/* If you're not the task which owns the Guest, go away. */
- if (current != lg->tsk)
+ if (current != vcpu->tsk)
return -EPERM;

/* If the guest is already dead, we indicate why */
@@ -114,6 +116,19 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
* address. */
lguest_arch_setup_regs(vcpu, start_ip);

+ /* Initialize the queue for the waker to wait on */
+ init_waitqueue_head(&vcpu->break_wq);
+
+ /* We keep a pointer to the Launcher task (ie. current task) for when
+ * other Guests want to wake this one (inter-Guest I/O). */
+ vcpu->tsk = current;
+
+ /* We need to keep a pointer to the Launcher's memory map, because if
+ * the Launcher dies we need to clean it up. If we don't keep a
+ * reference, it is destroyed before close() is called. */
+ vcpu->mm = get_task_mm(vcpu->tsk);
+
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -179,17 +194,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto free_regs;

- /* We keep a pointer to the Launcher task (ie. current task) for when
- * other Guests want to wake this one (inter-Guest I/O). */
- lg->tsk = current;
- /* We need to keep a pointer to the Launcher's memory map, because if
- * the Launcher dies we need to clean it up. If we don't keep a
- * reference, it is destroyed before close() is called. */
- lg->mm = get_task_mm(lg->tsk);
-
- /* Initialize the queue for the waker to wait on */
- init_waitqueue_head(&lg->break_wq);
-
/* We remember which CPU's pages this Guest used last, for optimization
* when the same Guest runs on the same CPU twice. */
lg->last_pages = NULL;
@@ -244,7 +248,7 @@ static ssize_t write(struct file *file, const char __user *in,
return -ENOENT;

/* If you're not the task which owns the Guest, you can only break */
- if (lg && current != lg->tsk && req != LHREQ_BREAK)
+ if (lg && current != vcpu->tsk && req != LHREQ_BREAK)
return -EPERM;

switch (req) {
@@ -253,7 +257,7 @@ static ssize_t write(struct file *file, const char __user *in,
case LHREQ_IRQ:
return user_send_irq(vcpu, input);
case LHREQ_BREAK:
- return break_guest_out(lg, input);
+ return break_guest_out(vcpu, input);
default:
return -EINVAL;
}
@@ -278,17 +282,19 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
+
+ /* Free up the shadow page tables for the Guest. */
+ free_guest_pagetable(lg);
+
for (i = 0; i < lg->nr_vcpus; i++) {
/* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
hrtimer_cancel(&lg->vcpus[i].hrt);
/* We can free up the register page we allocated. */
free_page(lg->vcpus[i].regs_page);
+ /* Now all the memory cleanups are done, it's safe to release
+ * the Launcher's memory management structure. */
+ mmput(lg->vcpus[i].mm);
}
- /* Free up the shadow page tables for the Guest. */
- free_guest_pagetable(lg);
- /* Now all the memory cleanups are done, it's safe to release the
- * Launcher's memory management structure. */
- mmput(lg->mm);
/* If lg->dead doesn't contain an error code it will be NULL or a
* kmalloc()ed string, either of which is ok to hand to kfree(). */
if (!IS_ERR(lg->dead))
--
1.5.0.6
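The vcpu_start() hunk above keeps the line that recovers the owning guest with container_of((vcpu - vcpu_id), struct lguest, vcpus[0]). A userspace sketch of that pointer arithmetic follows. The sketch_* names and the array size of 4 are hypothetical, and the macro is a simplified container_of without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4   /* hypothetical; the driver uses NR_CPUS */

/* Simplified container_of: step back from a member to its struct. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_vcpu {
	int vcpu_id;           /* index of this vcpu in the guest's array */
};

struct sketch_guest {
	unsigned int nr_vcpus;
	struct sketch_vcpu vcpus[SKETCH_NR_CPUS];
};

/* Subtracting vcpu_id walks back to element 0 of the embedded array;
 * from there the enclosing struct is a fixed offset away. The driver
 * names the member vcpus[0], which is the same offset as vcpus. */
static struct sketch_guest *sketch_guest_of(struct sketch_vcpu *vcpu)
{
	return sketch_container_of(vcpu - vcpu->vcpu_id,
				   struct sketch_guest, vcpus);
}
```

The trick only works because the vcpus are embedded in the guest structure as an array and each vcpu_id equals its array index.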

2007-12-20 13:40:49

by Glauber Costa

Subject: [PATCH 15/16] make pending notifications per-vcpu

This patch makes the pending_notify field, used to control
pending notifications, per-vcpu instead of per-guest.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 +++---
drivers/lguest/hypercalls.c | 6 +++---
drivers/lguest/lg.h | 3 ++-
drivers/lguest/lguest_user.c | 4 ++--
4 files changed, 10 insertions(+), 9 deletions(-)
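The life cycle of pending_notify can be modelled in a few lines. In this sketch (hypothetical sketch_* names, and a guest_notifies parameter standing in for whatever the Guest does while it runs), the NOTIFY hypercall sets the field, read() reports it to the Launcher, and the following read() on the same vcpu clears it before running the Guest again.

```c
#include <assert.h>

struct sketch_vcpu {
	unsigned long pending_notify;   /* value from LHCALL_NOTIFY */
};

/* Guest side: the hypercall records the value for this vcpu only. */
static void sketch_hcall_notify(struct sketch_vcpu *vcpu, unsigned long arg)
{
	vcpu->pending_notify = arg;
}

/* Launcher side: one read() on a vcpu. A nonzero guest_notifies means
 * the Guest issues a NOTIFY during this run. Returns the number of
 * bytes "read" (the value is stored in *out), or 0 if nothing pending. */
static int sketch_read(struct sketch_vcpu *vcpu, unsigned long guest_notifies,
		       unsigned long *out)
{
	/* A notification reported by the previous read() is stale now. */
	if (vcpu->pending_notify)
		vcpu->pending_notify = 0;

	/* Stand-in for running the Guest. */
	if (guest_notifies)
		sketch_hcall_notify(vcpu, guest_notifies);

	if (vcpu->pending_notify) {
		*out = vcpu->pending_notify;
		return (int)sizeof(vcpu->pending_notify);
	}
	return 0;
}
```

Keeping the field per-vcpu means a notification raised on one vcpu is returned by that vcpu's read() and does not stop another vcpu's hypercall processing.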

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 285a465..d628515 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -186,10 +186,10 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)

/* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
- if (lg->pending_notify) {
- if (put_user(lg->pending_notify, user))
+ if (vcpu->pending_notify) {
+ if (put_user(vcpu->pending_notify, user))
return -EFAULT;
- return sizeof(lg->pending_notify);
+ return sizeof(vcpu->pending_notify);
}

/* Check for signals */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index c6b87ef..95e1062 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -89,7 +89,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
- lg->pending_notify = args->arg1;
+ vcpu->pending_notify = args->arg1;
break;
default:
/* It should be an architecture-specific hypercall. */
@@ -152,7 +152,7 @@ static void do_async_hcalls(struct lguest_vcpu *vcpu)

/* Stop doing hypercalls if they want to notify the Launcher:
* it needs to service this first. */
- if (lg->pending_notify)
+ if (vcpu->pending_notify)
break;
}
}
@@ -217,7 +217,7 @@ void do_hypercalls(struct lguest_vcpu *vcpu)
/* If we stopped reading the hypercall ring because the Guest did a
* NOTIFY to the Launcher, we want to return now. Otherwise we do
* the hypercall. */
- if (!vcpu->lg->pending_notify) {
+ if (!vcpu->pending_notify) {
do_hcall(vcpu, vcpu->hcall);
/* Tricky point: we reset the hcall pointer to mark the
* hypercall as "done". We use the hcall pointer rather than
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index dbf70c6..6faf90d 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -51,6 +51,8 @@ struct lguest_vcpu {
u32 esp1;
u8 ss1;

+ unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
+
/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
struct lguest_regs *regs;
@@ -95,7 +97,6 @@ struct lguest
struct pgdir pgdirs[4];

unsigned long noirq_start, noirq_end;
- unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */

unsigned int stack_pages;
u32 tsc_khz;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d081db4..349d69d 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,8 +88,8 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)

/* If we returned from read() last time because the Guest notified,
* clear the flag. */
- if (lg->pending_notify)
- lg->pending_notify = 0;
+ if (vcpu->pending_notify)
+ vcpu->pending_notify = 0;

/* Run the Guest until something interesting happens. */
return run_guest(vcpu, (unsigned long __user *)user);
--
1.5.0.6

2007-12-20 13:41:15

by Glauber Costa

Subject: [PATCH 16/16] per-vcpu lguest pgdir management

This patch makes the pgdir management per-vcpu. The pgdirs pool
is still guest-wide (although it will probably need to grow once we
are really executing more vcpus), but the pgdidx index is gone,
since it no longer makes sense. Instead, we use a per-vcpu
index.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 6 ++--
drivers/lguest/lg.h | 12 +++---
drivers/lguest/page_tables.c | 60 +++++++++++++++++----------------
drivers/lguest/x86/core.c | 6 ++--
5 files changed, 44 insertions(+), 42 deletions(-)
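The arrangement described above, a guest-wide pool of shadow pgdirs with a per-vcpu index into it, can be sketched as follows. The sketch_* names and the in_use flag are hypothetical. When the pool is full this sketch just returns -1, whereas the driver's new_pgdir() has its own replacement policy that is not modelled here.

```c
#include <assert.h>

#define SKETCH_NR_PGDIRS 4   /* mirrors "struct pgdir pgdirs[4]" */

struct sketch_pgdir {
	unsigned long gpgdir;    /* guest address of the top-level table */
	int in_use;
};

struct sketch_guest {
	struct sketch_pgdir pgdirs[SKETCH_NR_PGDIRS];   /* shared pool */
};

struct sketch_vcpu {
	struct sketch_guest *lg;
	int vcpu_pgd;            /* which pool slot this vcpu is using */
};

/* Point this vcpu at the shadow pgdir for gpgdir, claiming a free slot
 * if no slot holds it yet. Returns the slot index, or -1 if the pool
 * is full. */
static int sketch_new_pagetable(struct sketch_vcpu *vcpu, unsigned long gpgdir)
{
	struct sketch_pgdir *pool = vcpu->lg->pgdirs;
	int i, free_slot = -1;

	for (i = 0; i < SKETCH_NR_PGDIRS; i++) {
		if (pool[i].in_use && pool[i].gpgdir == gpgdir) {
			vcpu->vcpu_pgd = i;      /* already shadowed: reuse */
			return i;
		}
		if (!pool[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;

	pool[free_slot].gpgdir = gpgdir;
	pool[free_slot].in_use = 1;
	vcpu->vcpu_pgd = free_slot;
	return free_slot;
}
```

Two vcpus can share a slot when they run on the same guest page table, and switching one vcpu never changes the index held by another, which is why a single guest-wide pgdidx stops making sense.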

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 95e1062..f379475 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -60,7 +60,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
if (args->arg1)
guest_pagetable_clear_all(vcpu);
else
- guest_pagetable_flush_user(lg);
+ guest_pagetable_flush_user(vcpu);
break;

/* All these calls simply pass the arguments through to the right
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 78f6210..a0ac77e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -77,7 +77,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->esp1;
ss = vcpu->ss1;

- origstack = gstack = guest_pa(lg, virtstack);
+ origstack = gstack = guest_pa(vcpu, virtstack);
/* We push the old stack segment and pointer onto the new
* stack: when the Guest does an "iret" back from the interrupt
* handler the CPU will notice they're dropping privilege
@@ -89,7 +89,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->regs->esp;
ss = vcpu->regs->ss;

- origstack = gstack = guest_pa(lg, virtstack);
+ origstack = gstack = guest_pa(vcpu, virtstack);
}

/* Remember that we never let the Guest actually disable interrupts, so
@@ -325,7 +325,7 @@ void pin_stack_pages(struct lguest_vcpu *vcpu)
* start of the page after the kernel stack. Subtract one to
* get back onto the first stack page, and keep subtracting to
* get to the rest of the stack pages. */
- pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
+ pin_page(vcpu, vcpu->esp1 - 1 - i * PAGE_SIZE);
}

/* Direct traps also mean that we need to know whenever the Guest wants to use
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 6faf90d..e700408 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lguest_vcpu {
unsigned long regs_page;
struct lguest_regs *regs;

+ int vcpu_pgd; /* which pgd this vcpu is currently using */
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -92,8 +94,6 @@ struct lguest
int changed;
struct lguest_pages *last_pages;

- /* We keep a small number of these. */
- u32 pgdidx;
struct pgdir pgdirs[4];

unsigned long noirq_start, noirq_end;
@@ -175,14 +175,14 @@ void free_guest_pagetable(struct lguest *lg);
void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable);
void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
void guest_pagetable_clear_all(struct lguest_vcpu *vcpu);
-void guest_pagetable_flush_user(struct lguest *lg);
+void guest_pagetable_flush_user(struct lguest_vcpu *vcpu);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
void map_switcher_in_guest(struct lguest_vcpu *vcpu,
struct lguest_pages *pages);
-int demand_page(struct lguest *info, unsigned long cr2, int errcode);
-void pin_page(struct lguest *lg, unsigned long vaddr);
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
+int demand_page(struct lguest_vcpu *vcpu, unsigned long cr2, int errcode);
+void pin_page(struct lguest_vcpu *vcpu, unsigned long vaddr);
+unsigned long guest_pa(struct lguest_vcpu *vcpu, unsigned long vaddr);
void page_table_guest_data_init(struct lguest *lg);

/* <arch>/core.c: */
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index f0f271d..84c22d7 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -94,10 +94,10 @@ static pte_t *spte_addr(struct lguest *lg, pgd_t spgd, unsigned long vaddr)

/* These two functions just like the above two, except they access the Guest
* page tables. Hence they return a Guest address. */
-static unsigned long gpgd_addr(struct lguest *lg, unsigned long vaddr)
+static unsigned long gpgd_addr(struct lguest_vcpu *vcpu, unsigned long vaddr)
{
unsigned int index = vaddr >> (PGDIR_SHIFT);
- return lg->pgdirs[lg->pgdidx].gpgdir + index * sizeof(pgd_t);
+ return vcpu->lg->pgdirs[vcpu->vcpu_pgd].gpgdir + index * sizeof(pgd_t);
}

static unsigned long gpte_addr(struct lguest *lg,
@@ -200,22 +200,23 @@ static void check_gpgd(struct lguest *lg, pgd_t gpgd)
*
* If we fixed up the fault (ie. we mapped the address), this routine returns
* true. Otherwise, it was a real fault and we need to tell the Guest. */
-int demand_page(struct lguest *lg, unsigned long vaddr, int errcode)
+int demand_page(struct lguest_vcpu *vcpu, unsigned long vaddr, int errcode)
{
pgd_t gpgd;
pgd_t *spgd;
unsigned long gpte_ptr;
pte_t gpte;
pte_t *spte;
+ struct lguest *lg = vcpu->lg;

/* First step: get the top-level Guest page table entry. */
- gpgd = lgread(lg, gpgd_addr(lg, vaddr), pgd_t);
+ gpgd = lgread(lg, gpgd_addr(vcpu, vaddr), pgd_t);
/* Toplevel not present? We can't map it in. */
if (!(pgd_flags(gpgd) & _PAGE_PRESENT))
return 0;

/* Now look at the matching shadow entry. */
- spgd = spgd_addr(lg, lg->pgdidx, vaddr);
+ spgd = spgd_addr(lg, vcpu->vcpu_pgd, vaddr);
if (!(pgd_flags(*spgd) & _PAGE_PRESENT)) {
/* No shadow entry: allocate a new shadow PTE page. */
unsigned long ptepage = get_zeroed_page(GFP_KERNEL);
@@ -297,19 +298,19 @@ int demand_page(struct lguest *lg, unsigned long vaddr, int errcode)
*
* This is a quick version which answers the question: is this virtual address
* mapped by the shadow page tables, and is it writable? */
-static int page_writable(struct lguest *lg, unsigned long vaddr)
+static int page_writable(struct lguest_vcpu *vcpu, unsigned long vaddr)
{
pgd_t *spgd;
unsigned long flags;

/* Look at the current top level entry: is it present? */
- spgd = spgd_addr(lg, lg->pgdidx, vaddr);
+ spgd = spgd_addr(vcpu->lg, vcpu->vcpu_pgd, vaddr);
if (!(pgd_flags(*spgd) & _PAGE_PRESENT))
return 0;

/* Check the flags on the pte entry itself: it must be present and
* writable. */
- flags = pte_flags(*(spte_addr(lg, *spgd, vaddr)));
+ flags = pte_flags(*(spte_addr(vcpu->lg, *spgd, vaddr)));

return (flags & (_PAGE_PRESENT|_PAGE_RW)) == (_PAGE_PRESENT|_PAGE_RW);
}
@@ -317,10 +318,10 @@ static int page_writable(struct lguest *lg, unsigned long vaddr)
/* So, when pin_stack_pages() asks us to pin a page, we check if it's already
* in the page tables, and if not, we call demand_page() with error code 2
* (meaning "write"). */
-void pin_page(struct lguest *lg, unsigned long vaddr)
+void pin_page(struct lguest_vcpu *vcpu, unsigned long vaddr)
{
- if (!page_writable(lg, vaddr) && !demand_page(lg, vaddr, 2))
- kill_guest(lg, "bad stack page %#lx", vaddr);
+ if (!page_writable(vcpu, vaddr) && !demand_page(vcpu, vaddr, 2))
+ kill_guest(vcpu->lg, "bad stack page %#lx", vaddr);
}

/*H:450 If we chase down the release_pgd() code, it looks like this: */
@@ -358,28 +359,28 @@ static void flush_user_mappings(struct lguest *lg, int idx)
*
* The Guest has a hypercall to throw away the page tables: it's used when a
* large number of mappings have been changed. */
-void guest_pagetable_flush_user(struct lguest *lg)
+void guest_pagetable_flush_user(struct lguest_vcpu *vcpu)
{
/* Drop the userspace part of the current page table. */
- flush_user_mappings(lg, lg->pgdidx);
+ flush_user_mappings(vcpu->lg, vcpu->vcpu_pgd);
}
/*:*/

/* We walk down the guest page tables to get a guest-physical address */
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr)
+unsigned long guest_pa(struct lguest_vcpu *vcpu, unsigned long vaddr)
{
pgd_t gpgd;
pte_t gpte;

/* First step: get the top-level Guest page table entry. */
- gpgd = lgread(lg, gpgd_addr(lg, vaddr), pgd_t);
+ gpgd = lgread(vcpu->lg, gpgd_addr(vcpu, vaddr), pgd_t);
/* Toplevel not present? We can't map it in. */
if (!(pgd_flags(gpgd) & _PAGE_PRESENT))
- kill_guest(lg, "Bad address %#lx", vaddr);
+ kill_guest(vcpu->lg, "Bad address %#lx", vaddr);

- gpte = lgread(lg, gpte_addr(lg, gpgd, vaddr), pte_t);
+ gpte = lgread(vcpu->lg, gpte_addr(vcpu->lg, gpgd, vaddr), pte_t);
if (!(pte_flags(gpte) & _PAGE_PRESENT))
- kill_guest(lg, "Bad address %#lx", vaddr);
+ kill_guest(vcpu->lg, "Bad address %#lx", vaddr);

return pte_pfn(gpte) * PAGE_SIZE | (vaddr & ~PAGE_MASK);
}
@@ -399,11 +400,12 @@ static unsigned int find_pgdir(struct lguest *lg, unsigned long pgtable)
/*H:435 And this is us, creating the new page directory. If we really do
* allocate a new one (and so the kernel parts are not there), we set
* blank_pgdir. */
-static unsigned int new_pgdir(struct lguest *lg,
+static unsigned int new_pgdir(struct lguest_vcpu *vcpu,
unsigned long gpgdir,
int *blank_pgdir)
{
unsigned int next;
+ struct lguest *lg = vcpu->lg;

/* We pick one entry at random to throw out. Choosing the Least
* Recently Used might be better, but this is easy. */
@@ -413,7 +415,7 @@ static unsigned int new_pgdir(struct lguest *lg,
lg->pgdirs[next].pgdir = (pgd_t *)get_zeroed_page(GFP_KERNEL);
/* If the allocation fails, just keep using the one we have */
if (!lg->pgdirs[next].pgdir)
- next = lg->pgdidx;
+ next = vcpu->vcpu_pgd;
else
/* This is a blank page, so there are no kernel
* mappings: caller must map the stack! */
@@ -442,9 +444,9 @@ void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable)
/* If not, we allocate or mug an existing one: if it's a fresh one,
* repin gets set to 1. */
if (newpgdir == ARRAY_SIZE(lg->pgdirs))
- newpgdir = new_pgdir(lg, pgtable, &repin);
+ newpgdir = new_pgdir(vcpu, pgtable, &repin);
/* Change the current pgd index to the new one. */
- lg->pgdidx = newpgdir;
+ vcpu->vcpu_pgd = newpgdir;
/* If it was completely blank, we map in the Guest kernel stack */
if (repin)
pin_stack_pages(vcpu);
@@ -591,11 +593,11 @@ int init_guest_pagetable(struct lguest *lg, unsigned long pgtable)
{
/* We start on the first shadow page table, and give it a blank PGD
* page. */
- lg->pgdidx = 0;
- lg->pgdirs[lg->pgdidx].gpgdir = pgtable;
- lg->pgdirs[lg->pgdidx].pgdir = (pgd_t*)get_zeroed_page(GFP_KERNEL);
- if (!lg->pgdirs[lg->pgdidx].pgdir)
+ lg->pgdirs[0].gpgdir = pgtable;
+ lg->pgdirs[0].pgdir = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+ if (!lg->pgdirs[0].pgdir)
return -ENOMEM;
+ lg->vcpus[0].vcpu_pgd = 0;
return 0;
}

@@ -607,7 +609,7 @@ void page_table_guest_data_init(struct lguest *lg)
/* We tell the Guest that it can't use the top 4MB of virtual
* addresses used by the Switcher. */
|| put_user(4U*1024*1024, &lg->lguest_data->reserve_mem)
- || put_user(lg->pgdirs[lg->pgdidx].gpgdir,&lg->lguest_data->pgdir))
+ || put_user(lg->pgdirs[0].gpgdir, &lg->lguest_data->pgdir))
kill_guest(lg, "bad guest page %p", lg->lguest_data);

/* In flush_user_mappings() we loop from 0 to
@@ -638,7 +640,6 @@ void free_guest_pagetable(struct lguest *lg)
void map_switcher_in_guest(struct lguest_vcpu *vcpu,
struct lguest_pages *pages)
{
- struct lguest *lg = vcpu->lg;
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
@@ -648,7 +649,8 @@ void map_switcher_in_guest(struct lguest_vcpu *vcpu,
* page for this CPU (with appropriate flags). */
switcher_pgd = __pgd(__pa(switcher_pte_page) | _PAGE_KERNEL);

- lg->pgdirs[lg->pgdidx].pgdir[SWITCHER_PGD_INDEX] = switcher_pgd;
+ vcpu->lg->pgdirs[vcpu->vcpu_pgd].pgdir[SWITCHER_PGD_INDEX] =
+ switcher_pgd;

/* We also change the Switcher PTE page. When we're running the Guest,
* we want the Guest's "regs" page to appear where the first Switcher
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index aec2527..582eaf0 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -147,7 +147,7 @@ static void run_guest_once(struct lguest_vcpu *vcpu,
* 0-th argument above, ie "a"). %ebx contains the
* physical address of the Guest's top-level page
* directory. */
- : "0"(pages), "1"(__pa(lg->pgdirs[lg->pgdidx].pgdir))
+ : "0"(pages), "1"(__pa(lg->pgdirs[vcpu->vcpu_pgd].pgdir))
/* We tell gcc that all these registers could change,
* which means we don't have to save and restore them in
* the Switcher. */
@@ -225,7 +225,7 @@ static int emulate_insn(struct lguest_vcpu *vcpu)
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
* guest_pa just subtracts the Guest's page_offset. */
- unsigned long physaddr = guest_pa(lg, vcpu->regs->eip);
+ unsigned long physaddr = guest_pa(vcpu, vcpu->regs->eip);

/* This must be the Guest kernel trying to do something, not userspace!
* The bottom two bits of the CS segment register are the privilege
@@ -307,7 +307,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, vcpu->arch.last_pagefault,
+ if (demand_page(vcpu, vcpu->arch.last_pagefault,
vcpu->regs->errcode))
return;

--
1.5.0.6
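
The arrangement this patch describes, a guest-wide pool of page directories with each vcpu holding its own index into it, can be sketched in plain C. This is a simplified userspace model, not the driver code: the names struct guest, struct vcpu, pgd_idx, MAX_VCPUS, current_gpgdir() and switch_pgdir() are all hypothetical, and only the pool size of 4 is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PGDIRS 4   /* pool size, as in "struct pgdir pgdirs[4]" */
#define MAX_VCPUS 2   /* hypothetical limit, for illustration only */

struct guest;

/* Simplified stand-in for one shadow page directory slot. */
struct pgdir_slot {
	unsigned long gpgdir;   /* guest-physical address of the Guest's pgd */
};

struct vcpu {
	int pgd_idx;            /* which pool slot this vcpu is using */
	struct guest *g;        /* back-pointer to the owning guest */
};

struct guest {
	struct pgdir_slot pgdirs[NR_PGDIRS];   /* guest-wide pool */
	struct vcpu vcpus[MAX_VCPUS];
};

/* Return the Guest pgd address for the table this vcpu currently uses. */
static unsigned long current_gpgdir(const struct vcpu *v)
{
	assert(v->pgd_idx >= 0 && v->pgd_idx < NR_PGDIRS);
	return v->g->pgdirs[v->pgd_idx].gpgdir;
}

/* Point one vcpu at a different slot; other vcpus are unaffected. */
static void switch_pgdir(struct vcpu *v, int idx)
{
	assert(idx >= 0 && idx < NR_PGDIRS);
	v->pgd_idx = idx;
}
```

With a single guest-wide index, switching page tables for one vcpu would silently switch them for every vcpu; keeping the index in the vcpu avoids that while still sharing the pool.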

2007-12-25 23:34:20

by Rusty Russell

Subject: Re: [PATCH 01/16] introduce vcpu struct

On Friday 21 December 2007 00:33:41 Glauber de Oliveira Costa wrote:
> this patch introduces a vcpu struct for lguest. In upcoming patches,
> more and more fields will be moved from the lguest struct to the vcpu

Hi Glommer,

> +static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
> +{
> + return container_of((vcpu - vcpu->vcpu_id), struct lguest, vcpus[0]);
> +}

I think this function is a bad idea: it contains implicit UP assumptions which
aren't obvious to the caller. vcpu->lg should do the same thing, no?

Rusty,
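
One way to see the concern is to put the two lookups side by side in a small userspace model. This is only a sketch: struct guest, struct vcpu, MAX_VCPUS, guest_by_arith() and guest_by_ptr() are hypothetical names, and container_of() is re-created here with offsetof() rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4   /* hypothetical limit, for illustration only */

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct guest;

struct vcpu {
	int id;            /* index of this vcpu in guest->vcpus[] */
	struct guest *g;   /* explicit back-pointer */
};

struct guest {
	int dead;
	struct vcpu vcpus[MAX_VCPUS];
};

/* Pointer-arithmetic version: only correct if v->id matches v's array slot. */
static struct guest *guest_by_arith(struct vcpu *v)
{
	assert(v->id >= 0 && v->id < MAX_VCPUS);
	return container_of(v - v->id, struct guest, vcpus);
}

/* Back-pointer version: no assumption about layout or id. */
static struct guest *guest_by_ptr(struct vcpu *v)
{
	return v->g;
}
```

Both functions agree as long as every vcpu's id equals its position in the array. The arithmetic version quietly returns a wrong pointer as soon as that stops being true, which is the kind of hidden assumption a plain back-pointer avoids.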

2007-12-25 23:36:12

by Rusty Russell

Subject: Re: [PATCH 02/16] adapt lguest launcher to per-cpuness

On Friday 21 December 2007 00:33:42 Glauber de Oliveira Costa wrote:
> + if (!vcpu_id) {
> + /*
> + * Service input, then unset the BREAK to
> + * release the Waker.
> + */
> + handle_input(lguest_fd);
> + if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
> + err(1, "Resetting break");
> + }

I hate winged comments: those two extra lines, wasted!

Cheers,
Rusty.

2007-12-25 23:40:46

by Rusty Russell

Subject: Re: [PATCH 04/16] per-cpu run guest

On Friday 21 December 2007 00:33:44 Glauber de Oliveira Costa wrote:
> @@ -55,11 +55,15 @@ static int user_send_irq(struct lguest *lg, const
> unsigned long __user *input) static ssize_t read(struct file *file, char
> __user *user, size_t size,loff_t*o) {
> struct lguest *lg = file->private_data;
> + struct lguest_vcpu *vcpu = NULL;
> + unsigned int vcpu_id = *o;
>
> /* You must write LHREQ_INITIALIZE first! */
> if (!lg)
> return -EINVAL;
>
> + vcpu = &lg->vcpus[vcpu_id];
> +

Better do a bounds check here!

Cheers,
Rusty.
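
A bounds check of the kind asked for here might look like the following userspace sketch. The names struct guest, struct vcpu, MAX_VCPUS and vcpu_lookup() are hypothetical; the parameter off stands in for the file offset that the Launcher uses to carry the vcpu id.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4   /* hypothetical limit, for illustration only */

struct vcpu {
	unsigned int id;
};

struct guest {
	unsigned int nr_vcpus;           /* number of vcpus started so far */
	struct vcpu vcpus[MAX_VCPUS];
};

/* Return the vcpu for a caller-supplied index, or NULL if out of range.
 * The comparison is done on the full-width offset, before any narrowing,
 * so a huge offset cannot wrap around to a valid index. */
static struct vcpu *vcpu_lookup(struct guest *g, unsigned long long off)
{
	assert(g->nr_vcpus <= MAX_VCPUS);
	if (off >= g->nr_vcpus)          /* note ">=", not ">" */
		return NULL;
	return &g->vcpus[off];
}
```

Valid indexes run from 0 to nr_vcpus - 1, so the test has to reject an index equal to nr_vcpus as well; comparing with ">" would let through a slot that has not been started.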

2007-12-25 23:40:59

by Rusty Russell

Subject: Re: [PATCH 05/16] make write() operation smp aware

On Friday 21 December 2007 00:33:45 Glauber de Oliveira Costa wrote:
> --- a/drivers/lguest/lguest_user.c
> +++ b/drivers/lguest/lguest_user.c
> @@ -223,14 +223,21 @@ static ssize_t write(struct file *file, const char
...
> /* If you haven't initialized, you must do that first. */
> - if (req != LHREQ_INITIALIZE && !lg)
> - return -EINVAL;
> + if (req != LHREQ_INITIALIZE) {
> + if (!lg)
> + return -EINVAL;
> + vcpu = &lg->vcpus[vcpu_id];
> + if (!vcpu)
> + return -EINVAL;
> + }

Bounds check again...

Cheers,
Rusty.

2007-12-25 23:47:48

by Rusty Russell

Subject: Re: [PATCH 09/16] map_switcher_in_guest() per-vcpu

On Friday 21 December 2007 00:33:49 Glauber de Oliveira Costa wrote:
> The switcher needs to be mapped per-vcpu, because different vcpus
> will potentially have different page tables (they don't have to,
> because threads will share the same).
>
> So our first step is the make the function receive a vcpu struct

Hmm, I wonder if we should call it lg_vcpu: lguest_vcpu is a little long, and
in total adds a fair number of lines to the code :)

Rusty.

2007-12-25 23:48:00

by Rusty Russell

Subject: Re: [PATCH 13/16] per-vcpu lguest task management

On Friday 21 December 2007 00:33:53 Glauber de Oliveira Costa wrote:
> @@ -114,6 +116,19 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int
> vcpu_id, * address. */
> lguest_arch_setup_regs(vcpu, start_ip);
>
> + /* Initialize the queue for the waker to wait on */
> + init_waitqueue_head(&vcpu->break_wq);
> +
> + /* We keep a pointer to the Launcher task (ie. current task) for when
> + * other Guests want to wake this one (inter-Guest I/O). */
> + vcpu->tsk = current;
> +
> + /* We need to keep a pointer to the Launcher's memory map, because if
> + * the Launcher dies we need to clean it up. If we don't keep a
> + * reference, it is destroyed before close() is called. */
> + vcpu->mm = get_task_mm(vcpu->tsk);
> +
> +

Nitpick: extra line?

Cheers,
Rusty.

2007-12-25 23:54:58

by Rusty Russell

Subject: Re: [PATCH 0/16] lguest: introduce vcpu structure

On Friday 21 December 2007 00:33:40 Glauber de Oliveira Costa wrote:
> this patch makes room for the vcpu structure in lguest, already used in
> this very same way at lguest64. It's the first part of our plan to
> have lguest and lguest64 unified too.

Hi Glauber!

These patches look really solid, thanks! A few minor things, then I'll
apply them and push them for 2.6.25.

My only question is whether we should go further and vcpu-ify routines like
lgread and kill_guest, so that we can avoid more "lg" temporary variables...

> When two dogs hang out, you don't have new puppies right in the other day.
> Some time has to be elapsed. They have to grow first. In this same spirit,
> having these patches _do not_ mean smp guests can be launched (yet)
> Much more work is to come, but this is the basic infrastructure.

OK, that made me laugh...

Thanks!
Rusty.

2007-12-26 14:25:21

by Steven Rostedt

Subject: Re: [PATCH 02/16] adapt lguest launcher to per-cpuness


On Wed, 26 Dec 2007, Rusty Russell wrote:

> On Friday 21 December 2007 00:33:42 Glauber de Oliveira Costa wrote:
> > + if (!vcpu_id) {
> > + /*
> > + * Service input, then unset the BREAK to
> > + * release the Waker.
> > + */
> > + handle_input(lguest_fd);
> > + if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
> > + err(1, "Resetting break");
> > + }
>
> I hate winged comments: those two extra lines, wasted!
>

For multiple lines, wings are a Good Thing (TM). Otherwise it looks
sloppy.

/* Service input, then unset the BREAK to
* release the Waker. */

extra asterisk! ok then

/* Service input, then unset the BREAK to
release the Waker. */

Yuck, that "release" looks like it can be code, especially with parsers
that look for comments that start with some sort of /* or *

Those little wings do IMHO make the code look nicer. I know in the Linux
community, my weight compared to you is a chihuahua compared to a
St. Bernard. But in this case, I believe others think that my collar is
prettier than yours. ;-)

Some one buy Rusty a bigger hard-drive to store those extra lines.

-- Steve

2007-12-27 00:08:51

by Rusty Russell

Subject: Re: [PATCH 02/16] adapt lguest launcher to per-cpuness

On Thursday 27 December 2007 01:24:10 Steven Rostedt wrote:
> On Wed, 26 Dec 2007, Rusty Russell wrote:
> > On Friday 21 December 2007 00:33:42 Glauber de Oliveira Costa wrote:
> > > + if (!vcpu_id) {
> > > + /*
> > > + * Service input, then unset the BREAK to
> > > + * release the Waker.
> > > + */
> > > + handle_input(lguest_fd);
> > > + if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
> > > + err(1, "Resetting break");
> > > + }
> >
> > I hate winged comments: those two extra lines, wasted!
>
> For multiple lines, wings are a Good Thing (TM). Otherwise it looks
> sloppy.
>
> /* Service input, then unset the BREAK to
> * release the Waker. */
>
> extra asterisk! ok then

No, that is correct. See all the rest of the lguest comments, or ask Dave
Miller :)

Of course, if you can keep a comment concisely in one line, it's even better.
But since colorizing editors are so common, the extra wings just steal
vertical space.

> Some one buy Rusty a bigger hard-drive to store those extra lines.

More importantly, a bigger screen to hold the code :)

Cheers,
Rusty.

2008-01-06 17:34:10

by Glauber Costa

Subject: Re: [PATCH 0/16] lguest: introduce vcpu structure

On Dec 25, 2007 9:54 PM, Rusty Russell <[email protected]> wrote:
> On Friday 21 December 2007 00:33:40 Glauber de Oliveira Costa wrote:
> > this patch makes room for the vcpu structure in lguest, already used in
> > this very same way at lguest64. It's the first part of our plan to
> > have lguest and lguest64 unified too.
>
> Hi Glauber!
>
> These patches look really solid, thanks! A few minor things, then I'll
> apply them and push them for 2.6.25.

Thanks for all the comments. I was on vacation until today, and I'll
soon repost a new version that addresses all of them (that's why I'm
not answering each of them individually now; I have to look at them
carefully).

> My only question is whether we should go further and vcpu-ify routines like
> lgread and kill_guest, so that we can avoid more "lg" temporary variables...

Essentially, they don't need it, because they only touch
guest-wide state (state visible to the whole guest).
So it's more of a stylistic thing. Using the vcpu in the signature
has only one drawback:
it requires the caller to also have a pointer to a vcpu, so we may end up
using it everywhere, like falling dominoes.

Alternatively, in functions that don't currently receive a vcpu
(and don't need one), we can adopt the convention of always passing
lg->vcpus[0] to lgread, kill_guest, etc. Which one do you prefer?

> > When two dogs hang out, you don't have new puppies right in the other day.
> > Some time has to be elapsed. They have to grow first. In this same spirit,
> > having these patches _do not_ mean smp guests can be launched (yet)
> > Much more work is to come, but this is the basic infrastructure.
>
> OK, that made me laugh...
\o/
> Thanks!
> Rusty.
>
>



--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

2008-01-07 00:54:20

by Rusty Russell

Subject: Re: [PATCH 0/16] lguest: introduce vcpu structure

On Monday 07 January 2008 04:33:53 Glauber de Oliveira Costa wrote:
> On Dec 25, 2007 9:54 PM, Rusty Russell <[email protected]> wrote:
> > My only question is whether we should go further and vcpu-ify routines
> > like lgread and kill_guest, so that we can avoid more "lg" temporary
> > variables...
>
> Essentially, they don't need it, because they only touch
> globally-visible variables (visible to the guest).
> So it's more of an stylish thing. Using the vcpu in the signature can
> have only one harm:
> It needs the caller to also have a pointer to a vcpu, so we may end up
> using it everywhere, like a domino fall.
>
> Alternatively, in such functions that don't currently receive a vcpu
> (nor they need to), we can convention to always pass
> lg->vcpus[0] to lgread, kill_guest, etc. Which one do you prefer?

I'm happy with a domino effect. I don't want to see lg->vcpus[0] *anywhere*
though, because it's non-futureproof.

When I looked through these patches it seems to me that we should accept that
vcpu is now the basic guest unit, and lg exists to serve it. Otherwise I
think you can see the bones of the old UP code poking through, and that's
ugly.

Thanks!
Rusty.
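
The "domino" style preferred here, where even guest-wide helpers take the vcpu as their handle, can be sketched as follows. This is a userspace illustration only: struct guest, struct vcpu, kill_guest_from() and check_address() are hypothetical names, not the driver's actual helpers.

```c
#include <assert.h>
#include <stddef.h>

struct guest {
	const char *dead;   /* NULL while alive, else the reason it died */
};

struct vcpu {
	struct guest *g;    /* every vcpu knows its guest */
};

/* Guest-wide helper that still takes the vcpu as its handle. */
static void kill_guest_from(struct vcpu *v, const char *reason)
{
	assert(v->g != NULL);
	if (!v->g->dead)            /* keep the first reason */
		v->g->dead = reason;
}

/* A caller holding only a vcpu never needs a local "struct guest *",
 * and nothing has to reach for vcpus[0] to call the helper. */
static int check_address(struct vcpu *v, unsigned long addr,
			 unsigned long limit)
{
	if (addr >= limit) {
		kill_guest_from(v, "bad address");
		return 0;
	}
	return 1;
}
```

The cost is that every caller up the chain must have a vcpu to pass along; the benefit is that no code path hard-codes the first vcpu.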

2008-01-07 13:06:17

by Glauber Costa

Subject: [PATCH 0/16 -v2] lguest smp infrastructure

Folks,

This new series is not fundamentally different from the old one
I sent. The only difference is that it addresses the comments received,
mainly from Rusty.

enjoy!

2008-01-07 13:06:32

by Glauber Costa

Subject: [PATCH 01/16] introduce vcpu struct

this patch introduces a vcpu struct for lguest. In upcoming patches,
more and more fields will be moved from the lguest struct to the vcpu

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lg.h | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8692489..8fc1c29 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -38,6 +38,13 @@ struct lguest_pages
#define CHANGED_GDT_TLS 4 /* Actually a subset of CHANGED_GDT */
#define CHANGED_ALL 3

+struct lguest;
+
+struct lg_vcpu {
+ int vcpu_id;
+ struct lguest *lg;
+};
+
/* The private info the thread maintains about the guest. */
struct lguest
{
@@ -47,6 +54,9 @@ struct lguest
struct lguest_data __user *lguest_data;
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
+ struct lg_vcpu vcpus[NR_CPUS];
+ unsigned int nr_vcpus;
+
u32 pfn_limit;
/* This provides the offset to the base of guest-physical
* memory in the Launcher. */
--
1.5.0.6

2008-01-07 13:06:47

by Glauber Costa

Subject: [PATCH 03/16] initialize vcpu

This patch initializes the first vcpu in the initialize() routine,
which is responsible for starting the process of bringing the guest up.
Right now, since most of the fields are still not per-vcpu, it does not
do much.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lguest_user.c | 20 ++++++++++++++++++++
1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 3b92a61..34be8e7 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,6 +88,20 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
return run_guest(lg, (unsigned long __user *)user);
}

+static int vcpu_start(struct lg_vcpu *vcpu, int vcpu_id,
+ unsigned long start_ip)
+{
+ if (vcpu_id >= NR_CPUS)
+ return -EINVAL;
+
+ vcpu->vcpu_id = vcpu_id;
+
+ vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
+ vcpu->lg->nr_vcpus++;
+
+ return 0;
+}
+
/*L:020 The initialization write supplies 4 pointer sized (32 or 64 bit)
* values (in addition to the LHREQ_INITIALIZE value). These are:
*
@@ -134,6 +148,12 @@ static int initialize(struct file *file, const unsigned long __user *input)
lg->mem_base = (void __user *)(long)args[0];
lg->pfn_limit = args[1];

+ /* This is the first cpu */
+ lg->nr_vcpus = 0;
+ err = vcpu_start(&lg->vcpus[0], 0, args[3]);
+ if (err)
+ goto release_guest;
+
/* We need a complete page for the Guest registers: they are accessible
* to the Guest and we can only grant it access to whole pages. */
lg->regs_page = get_zeroed_page(GFP_KERNEL);
--
1.5.0.6

2008-01-07 13:07:01

by Glauber Costa

Subject: [PATCH 04/16] per-cpu run guest

This patch makes the run_guest() routine use the vcpu struct.
This is required because in an SMP guest environment there is no
longer a notion of "running the guest"; instead, we are "running the vcpu".

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 ++++--
drivers/lguest/lg.h | 4 ++--
drivers/lguest/lguest_user.c | 10 +++++++++-
drivers/lguest/x86/core.c | 16 +++++++++++-----
4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index cb4c670..07a4c22 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -174,8 +174,10 @@ void __lgwrite(struct lguest *lg, unsigned long addr, const void *b,
/*H:030 Let's jump straight to the the main loop which runs the Guest.
* Remember, this is called by the Launcher reading /dev/lguest, and we keep
* going around and around until something interesting happens. */
-int run_guest(struct lguest *lg, unsigned long __user *user)
+int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)
{
+ struct lguest *lg = vcpu->lg;
+
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
@@ -226,7 +228,7 @@ int run_guest(struct lguest *lg, unsigned long __user *user)
local_irq_disable();

/* Actually run the Guest until something happens. */
- lguest_arch_run_guest(lg);
+ lguest_arch_run_guest(vcpu);

/* Now we're ready to be interrupted or moved to other CPUs */
local_irq_enable();
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8fc1c29..271d214 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -126,7 +126,7 @@ void __lgwrite(struct lguest *, unsigned long, const void *, unsigned);
} while(0)
/* (end of memory access helper routines) :*/

-int run_guest(struct lguest *lg, unsigned long __user *user);
+int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user);

/* Helper macros to obtain the first 12 or the last 20 bits, this is only the
* first step in the migration to the kernel types. pte_pfn is already defined
@@ -177,7 +177,7 @@ void page_table_guest_data_init(struct lguest *lg);
/* <arch>/core.c: */
void lguest_arch_host_init(void);
void lguest_arch_host_fini(void);
-void lguest_arch_run_guest(struct lguest *lg);
+void lguest_arch_run_guest(struct lg_vcpu *vcpu);
void lguest_arch_handle_trap(struct lguest *lg);
int lguest_arch_init_hypercalls(struct lguest *lg);
int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 34be8e7..216514b 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -55,11 +55,19 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
{
struct lguest *lg = file->private_data;
+ struct lg_vcpu *vcpu = NULL;
+ unsigned int vcpu_id = *o;

/* You must write LHREQ_INITIALIZE first! */
if (!lg)
return -EINVAL;

+ /* Watch out for arbitrary vcpu indexes! */
+ if (vcpu_id >= lg->nr_vcpus)
+ return -EINVAL;
+
+ vcpu = &lg->vcpus[vcpu_id];
+
/* If you're not the task which owns the Guest, go away. */
if (current != lg->tsk)
return -EPERM;
@@ -85,7 +93,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
lg->pending_notify = 0;

/* Run the Guest until something interesting happens. */
- return run_guest(lg, (unsigned long __user *)user);
+ return run_guest(vcpu, (unsigned long __user *)user);
}

static int vcpu_start(struct lg_vcpu *vcpu, int vcpu_id,
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 482aec2..3496cd9 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -73,8 +73,10 @@ static DEFINE_PER_CPU(struct lguest *, last_guest);
* since it last ran. We saw this set in interrupts_and_traps.c and
* segments.c.
*/
-static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
+static void copy_in_guest_info(struct lg_vcpu *vcpu,
+ struct lguest_pages *pages)
{
+ struct lguest *lg = vcpu->lg;
/* Copying all this data can be quite expensive. We usually run the
* same Guest we ran last time (and that Guest hasn't run anywhere else
* meanwhile). If that's not the case, we pretend everything in the
@@ -113,14 +115,16 @@ static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
}

/* Finally: the code to actually call into the Switcher to run the Guest. */
-static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
+static void run_guest_once(struct lg_vcpu *vcpu,
+ struct lguest_pages *pages)
{
/* This is a dummy value we need for GCC's sake. */
unsigned int clobber;
+ struct lguest *lg = vcpu->lg;

/* Copy the guest-specific information into this CPU's "struct
* lguest_pages". */
- copy_in_guest_info(lg, pages);
+ copy_in_guest_info(vcpu, pages);

/* Set the trap number to 256 (impossible value). If we fault while
* switching to the Guest (bad segment registers or bug), this will
@@ -161,8 +165,10 @@ static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)

/*H:040 This is the i386-specific code to setup and run the Guest. Interrupts
* are disabled: we own the CPU. */
-void lguest_arch_run_guest(struct lguest *lg)
+void lguest_arch_run_guest(struct lg_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
+
/* Remember the awfully-named TS bit? If the Guest has asked to set it
* we set it now, so we can trap and pass that trap to the Guest if it
* uses the FPU. */
@@ -180,7 +186,7 @@ void lguest_arch_run_guest(struct lguest *lg)
/* Now we actually run the Guest. It will return when something
* interesting happens, and we can examine its registers to see what it
* was doing. */
- run_guest_once(lg, lguest_pages(raw_smp_processor_id()));
+ run_guest_once(vcpu, lguest_pages(raw_smp_processor_id()));

/* Note that the "regs" pointer contains two extra entries which are
* not really registers: a trap number which says what interrupt or
--
1.5.0.6

2008-01-07 13:07:31

by Glauber Costa

Subject: [PATCH 02/16] adapt lguest launcher to per-cpuness

This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread-local variable, which means that in the future we'll spawn
vcpus as threads. For now, only the infrastructure is in place.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
Documentation/lguest/lguest.c | 23 ++++++++++++++++-------
1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 9b0e322..4745f7e 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -79,6 +79,9 @@ static void *guest_base;
/* The maximum guest physical address allowed, and maximum possible. */
static unsigned long guest_limit, guest_max;

+/* A per-thread variable indicating which vcpu is currently running */
+static unsigned int __thread vcpu_id;
+
/* This is our list of devices. */
struct device_list
{
@@ -554,7 +557,7 @@ static void wake_parent(int pipefd, int lguest_fd)
else
FD_CLR(-fd - 1, &devices.infds);
} else /* Send LHREQ_BREAK command. */
- write(lguest_fd, args, sizeof(args));
+ pwrite(lguest_fd, args, sizeof(args), 0);
}
}

@@ -1511,7 +1514,8 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
int readval;

/* We read from the /dev/lguest device to run the Guest. */
- readval = read(lguest_fd, &notify_addr, sizeof(notify_addr));
+ readval = pread(lguest_fd, &notify_addr,
+ sizeof(notify_addr), vcpu_id);

/* One unsigned long means the Guest did HCALL_NOTIFY */
if (readval == sizeof(notify_addr)) {
@@ -1521,17 +1525,21 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* ENOENT means the Guest died. Reading tells us why. */
} else if (errno == ENOENT) {
char reason[1024] = { 0 };
- read(lguest_fd, reason, sizeof(reason)-1);
+ pread(lguest_fd, reason, sizeof(reason)-1, vcpu_id);
errx(1, "%s", reason);
/* EAGAIN means the Waker wanted us to look at some input.
* Anything else means a bug or incompatible change. */
} else if (errno != EAGAIN)
err(1, "Running guest failed");

- /* Service input, then unset the BREAK to release the Waker. */
- handle_input(lguest_fd);
- if (write(lguest_fd, args, sizeof(args)) < 0)
- err(1, "Resetting break");
+ if (!vcpu_id) {
+ /* Service input, then unset the BREAK to
+ * release the Waker. Right now, simple mechanism
+ * to issue it all to first vcpu */
+ handle_input(lguest_fd);
+ if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
+ err(1, "Resetting break");
+ }
}
}
/*
@@ -1582,6 +1590,7 @@ int main(int argc, char *argv[])
devices.lastdev = &devices.dev;
devices.next_irq = 1;

+ vcpu_id = 0;
/* We need to know how much memory so we can set up the device
* descriptor and memory pages for the devices as we parse the command
* line. So we quickly look through the arguments to find the amount
--
1.5.0.6

2008-01-07 13:07:52

by Glauber Costa

Subject: [PATCH 05/16] make write() operation smp aware

This patch makes the write() file operation SMP-aware: it receives the
vcpu_id value through the offset parameter, so it knows which vcpu it is
talking to.
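
As a rough illustration of the lookup, here is a userspace sketch that maps
an offset to a vcpu slot. The structs are stripped-down stand-ins for the
driver's, MAX_VCPUS replaces NR_CPUS, and vcpu_from_offset() is a
hypothetical helper, not a function in the driver. The sketch rejects an
offset equal to nr_vcpus, since valid indices run from 0 to nr_vcpus - 1,
and it also rejects negative offsets.

```c
#include <stddef.h>

#define MAX_VCPUS 4	/* stand-in for NR_CPUS */

struct lg_vcpu {
	int vcpu_id;
};

struct lguest {
	struct lg_vcpu vcpus[MAX_VCPUS];
	unsigned int nr_vcpus;	/* how many entries are in use */
};

/* Map the file offset to a vcpu, or NULL if it names no started vcpu. */
struct lg_vcpu *vcpu_from_offset(struct lguest *lg, long long off)
{
	if (!lg || off < 0 || (unsigned long long)off >= lg->nr_vcpus)
		return NULL;
	return &lg->vcpus[off];
}
```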

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lguest_user.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 216514b..d176004 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -230,14 +230,21 @@ static ssize_t write(struct file *file, const char __user *in,
struct lguest *lg = file->private_data;
const unsigned long __user *input = (const unsigned long __user *)in;
unsigned long req;
+ struct lg_vcpu *vcpu = NULL;
+ int vcpu_id = *off;

if (get_user(req, input) != 0)
return -EFAULT;
input++;

/* If you haven't initialized, you must do that first. */
- if (req != LHREQ_INITIALIZE && !lg)
- return -EINVAL;
+ if (req != LHREQ_INITIALIZE) {
+ if (!lg || (vcpu_id >= lg->nr_vcpus))
+ return -EINVAL;
+ vcpu = &lg->vcpus[vcpu_id];
+ if (!vcpu)
+ return -EINVAL;
+ }

/* Once the Guest is dead, all you can do is read() why it died. */
if (lg && lg->dead)
--
1.5.0.6

2008-01-07 13:08:16

by Glauber Costa

Subject: [PATCH 07/16] per-vcpu lguest timers

Here, I introduce per-vcpu timers. With this, we can have
local expiries, which are needed for accounting time in SMP guests.
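
The pattern can be sketched in userspace as follows: the timer is embedded
in the vcpu, and the expiry callback recovers its owner with
container_of(). Here struct fake_timer stands in for struct hrtimer, and
the expiries counter is a hypothetical field added only so the effect is
observable; the driver sets a pending-interrupt bit instead.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_timer {		/* stand-in for struct hrtimer */
	void (*function)(struct fake_timer *);
};

struct lg_vcpu {
	int vcpu_id;
	unsigned int expiries;	/* hypothetical: counts timer expiries */
	struct fake_timer hrt;	/* one timer per vcpu */
};

/* Expiry callback: find the owning vcpu from the embedded timer. */
void clockdev_fn(struct fake_timer *timer)
{
	struct lg_vcpu *vcpu = container_of(timer, struct lg_vcpu, hrt);

	vcpu->expiries++;
}

/* Each vcpu gets its own timer set up when it starts. */
void init_clockdev(struct lg_vcpu *vcpu)
{
	vcpu->hrt.function = clockdev_fn;
}
```

Firing one vcpu's timer touches only that vcpu's state, which is what
makes the expiry local.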

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 20 ++++++++++----------
drivers/lguest/lg.h | 10 +++++-----
drivers/lguest/lguest_user.c | 12 +++++++-----
4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 9417601..1bf133e 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -78,7 +78,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
guest_set_pmd(lg, args->arg1, args->arg2);
break;
case LHCALL_SET_CLOCKEVENT:
- guest_set_clockevent(lg, args->arg1);
+ guest_set_clockevent(vcpu, args->arg1);
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 2b66f79..3be18a6 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -470,13 +470,13 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
* infrastructure to set a callback at that time.
*
* 0 means "turn off the clock". */
-void guest_set_clockevent(struct lguest *lg, unsigned long delta)
+void guest_set_clockevent(struct lg_vcpu *vcpu, unsigned long delta)
{
ktime_t expires;

if (unlikely(delta == 0)) {
/* Clock event device is shutting down. */
- hrtimer_cancel(&lg->hrt);
+ hrtimer_cancel(&vcpu->hrt);
return;
}

@@ -484,25 +484,25 @@ void guest_set_clockevent(struct lguest *lg, unsigned long delta)
* all the time between now and the timer interrupt it asked for. This
* is almost always the right thing to do. */
expires = ktime_add_ns(ktime_get_real(), delta);
- hrtimer_start(&lg->hrt, expires, HRTIMER_MODE_ABS);
+ hrtimer_start(&vcpu->hrt, expires, HRTIMER_MODE_ABS);
}

/* This is the function called when the Guest's timer expires. */
static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
{
- struct lguest *lg = container_of(timer, struct lguest, hrt);
+ struct lg_vcpu *vcpu = container_of(timer, struct lg_vcpu, hrt);

/* Remember the first interrupt is the timer interrupt. */
- set_bit(0, lg->irqs_pending);
+ set_bit(0, vcpu->lg->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
- if (lg->halted)
- wake_up_process(lg->tsk);
+ if (vcpu->lg->halted)
+ wake_up_process(vcpu->lg->tsk);
return HRTIMER_NORESTART;
}

/* This sets up the timer for this Guest. */
-void init_clockdev(struct lguest *lg)
+void init_clockdev(struct lg_vcpu *vcpu)
{
- hrtimer_init(&lg->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
- lg->hrt.function = clockdev_fn;
+ hrtimer_init(&vcpu->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+ vcpu->hrt.function = clockdev_fn;
}
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 13a991a..9c90fd3 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -47,6 +47,9 @@ struct lg_vcpu {
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
+
+ /* Virtual clock device */
+ struct hrtimer hrt;
};

/* The private info the thread maintains about the guest. */
@@ -95,9 +98,6 @@ struct lguest

struct lguest_arch arch;

- /* Virtual clock device */
- struct hrtimer hrt;
-
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};
@@ -145,8 +145,8 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
void copy_traps(const struct lguest *lg, struct desc_struct *idt,
const unsigned long *def);
-void guest_set_clockevent(struct lguest *lg, unsigned long delta);
-void init_clockdev(struct lguest *lg);
+void guest_set_clockevent(struct lg_vcpu *vcpu, unsigned long delta);
+void init_clockdev(struct lg_vcpu *vcpu);
bool check_syscall_vector(struct lguest *lg);
int init_interrupts(void);
void free_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d176004..cd2b0bf 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -104,6 +104,9 @@ static int vcpu_start(struct lg_vcpu *vcpu, int vcpu_id,

vcpu->vcpu_id = vcpu_id;

+ /* The timer for lguest's clock needs initialization. */
+ init_clockdev(vcpu);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -183,9 +186,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
* address. */
lguest_arch_setup_regs(lg, args[3]);

- /* The timer for lguest's clock needs initialization. */
- init_clockdev(lg);
-
/* We keep a pointer to the Launcher task (ie. current task) for when
* other Guests want to wake this one (inter-Guest I/O). */
lg->tsk = current;
@@ -276,6 +276,7 @@ static ssize_t write(struct file *file, const char __user *in,
static int close(struct inode *inode, struct file *file)
{
struct lguest *lg = file->private_data;
+ int i;

/* If we never successfully initialized, there's nothing to clean up */
if (!lg)
@@ -284,8 +285,9 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
- /* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
- hrtimer_cancel(&lg->hrt);
+ for (i = 0; i < lg->nr_vcpus; i++)
+ /* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
+ hrtimer_cancel(&lg->vcpus[i].hrt);
/* Free up the shadow page tables for the Guest. */
free_guest_pagetable(lg);
/* Now all the memory cleanups are done, it's safe to release the
--
1.5.0.6

2008-01-07 13:08:35

by Glauber Costa

Subject: [PATCH 06/16] make hypercalls use the vcpu struct

This patch changes the do_hcall() and do_async_hcalls() interfaces (and
obviously their callers) to take a vcpu struct. Again, a vcpu services the
hypercall, not the whole guest.
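
To make the per-vcpu cursor concrete, here is a small userspace sketch of
the ring walk with next_hcall kept in the vcpu. RING_SIZE is an
illustrative value rather than the driver's LHCALL_RING_SIZE, the status
array is a plain local buffer instead of Guest memory, and drain_ring() is
a hypothetical name.

```c
#include <stdint.h>

#define RING_SIZE 8	/* illustrative; not the driver's LHCALL_RING_SIZE */
#define SLOT_EMPTY 0xFF	/* marker for "no call here (yet)" */

struct lg_vcpu {
	uint32_t next_hcall;	/* per-vcpu cursor into the ring */
};

/* Consume pending slots in order; returns how many were handled. */
unsigned int drain_ring(struct lg_vcpu *vcpu, uint8_t st[RING_SIZE])
{
	unsigned int i, done = 0;

	for (i = 0; i < RING_SIZE; i++) {
		unsigned int n = vcpu->next_hcall;

		if (st[n] == SLOT_EMPTY)
			break;
		/* Advance the cursor, wrapping at the end of the ring. */
		if (++vcpu->next_hcall == RING_SIZE)
			vcpu->next_hcall = 0;
		st[n] = SLOT_EMPTY;	/* mark the call done */
		done++;
	}
	return done;
}
```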

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 +++---
drivers/lguest/hypercalls.c | 42 +++++++++++++++++++++++-------------------
drivers/lguest/lg.h | 16 ++++++++--------
drivers/lguest/x86/core.c | 16 ++++++++++------
4 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 07a4c22..99f65f9 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -181,8 +181,8 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
- if (lg->hcall)
- do_hypercalls(lg);
+ if (vcpu->hcall)
+ do_hypercalls(vcpu);

/* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
@@ -234,7 +234,7 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)
local_irq_enable();

/* Now we deal with whatever happened to the Guest. */
- lguest_arch_handle_trap(lg);
+ lguest_arch_handle_trap(vcpu);
}

/* The Guest is dead => "No such file or directory" */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index b478aff..9417601 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -29,8 +29,10 @@

/*H:120 This is the core hypercall routine: where the Guest gets what it wants.
* Or gets killed. Or, in the case of LHCALL_CRASH, both. */
-static void do_hcall(struct lguest *lg, struct hcall_args *args)
+static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
{
+ struct lguest *lg = vcpu->lg;
+
switch (args->arg0) {
case LHCALL_FLUSH_ASYNC:
/* This call does nothing, except by breaking out of the Guest
@@ -91,7 +93,7 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
break;
default:
/* It should be an architecture-specific hypercall. */
- if (lguest_arch_do_hcall(lg, args))
+ if (lguest_arch_do_hcall(vcpu, args))
kill_guest(lg, "Bad hypercall %li\n", args->arg0);
}
}
@@ -104,10 +106,11 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
* Guest put them in the ring, but we also promise the Guest that they will
* happen before any normal hypercall (which is why we check this before
* checking for a normal hcall). */
-static void do_async_hcalls(struct lguest *lg)
+static void do_async_hcalls(struct lg_vcpu *vcpu)
{
unsigned int i;
u8 st[LHCALL_RING_SIZE];
+ struct lguest *lg = vcpu->lg;

/* For simplicity, we copy the entire call status array in at once. */
if (copy_from_user(&st, &lg->lguest_data->hcall_status, sizeof(st)))
@@ -119,7 +122,7 @@ static void do_async_hcalls(struct lguest *lg)
/* We remember where we were up to from last time. This makes
* sure that the hypercalls are done in the order the Guest
* places them in the ring. */
- unsigned int n = lg->next_hcall;
+ unsigned int n = vcpu->next_hcall;

/* 0xFF means there's no call here (yet). */
if (st[n] == 0xFF)
@@ -127,8 +130,8 @@ static void do_async_hcalls(struct lguest *lg)

/* OK, we have hypercall. Increment the "next_hcall" cursor,
* and wrap back to 0 if we reach the end. */
- if (++lg->next_hcall == LHCALL_RING_SIZE)
- lg->next_hcall = 0;
+ if (++vcpu->next_hcall == LHCALL_RING_SIZE)
+ vcpu->next_hcall = 0;

/* Copy the hypercall arguments into a local copy of
* the hcall_args struct. */
@@ -139,7 +142,7 @@ static void do_async_hcalls(struct lguest *lg)
}

/* Do the hypercall, same as a normal one. */
- do_hcall(lg, &args);
+ do_hcall(vcpu, &args);

/* Mark the hypercall done. */
if (put_user(0xFF, &lg->lguest_data->hcall_status[n])) {
@@ -156,16 +159,17 @@ static void do_async_hcalls(struct lguest *lg)

/* Last of all, we look at what happens first of all. The very first time the
* Guest makes a hypercall, we end up here to set things up: */
-static void initialize(struct lguest *lg)
+static void initialize(struct lg_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
/* You can't do anything until you're initialized. The Guest knows the
* rules, so we're unforgiving here. */
- if (lg->hcall->arg0 != LHCALL_LGUEST_INIT) {
- kill_guest(lg, "hypercall %li before INIT", lg->hcall->arg0);
+ if (vcpu->hcall->arg0 != LHCALL_LGUEST_INIT) {
+ kill_guest(lg, "hypercall %li before INIT", vcpu->hcall->arg0);
return;
}

- if (lguest_arch_init_hypercalls(lg))
+ if (lguest_arch_init_hypercalls(vcpu))
kill_guest(lg, "bad guest page %p", lg->lguest_data);

/* The Guest tells us where we're not to deliver interrupts by putting
@@ -194,27 +198,27 @@ static void initialize(struct lguest *lg)
* Remember from the Guest, hypercalls come in two flavors: normal and
* asynchronous. This file handles both of types.
*/
-void do_hypercalls(struct lguest *lg)
+void do_hypercalls(struct lg_vcpu *vcpu)
{
/* Not initialized yet? This hypercall must do it. */
- if (unlikely(!lg->lguest_data)) {
+ if (unlikely(!vcpu->lg->lguest_data)) {
/* Set up the "struct lguest_data" */
- initialize(lg);
+ initialize(vcpu);
/* Hcall is done. */
- lg->hcall = NULL;
+ vcpu->hcall = NULL;
return;
}

/* The Guest has initialized.
*
* Look in the hypercall ring for the async hypercalls: */
- do_async_hcalls(lg);
+ do_async_hcalls(vcpu);

/* If we stopped reading the hypercall ring because the Guest did a
* NOTIFY to the Launcher, we want to return now. Otherwise we do
* the hypercall. */
- if (!lg->pending_notify) {
- do_hcall(lg, lg->hcall);
+ if (!vcpu->lg->pending_notify) {
+ do_hcall(vcpu, vcpu->hcall);
/* Tricky point: we reset the hcall pointer to mark the
* hypercall as "done". We use the hcall pointer rather than
* the trap number to indicate a hypercall is pending.
@@ -225,7 +229,7 @@ void do_hypercalls(struct lguest *lg)
* Launcher, the run_guest() loop will exit without running the
* Guest. When it comes back it would try to re-run the
* hypercall. */
- lg->hcall = NULL;
+ vcpu->hcall = NULL;
}
}

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 271d214..13a991a 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,10 @@ struct lguest;
struct lg_vcpu {
int vcpu_id;
struct lguest *lg;
+
+ /* If a hypercall was asked for, this points to the arguments. */
+ struct hcall_args *hcall;
+ u32 next_hcall;
};

/* The private info the thread maintains about the guest. */
@@ -65,13 +69,9 @@ struct lguest
u32 cr2;
int halted;
int ts;
- u32 next_hcall;
u32 esp1;
u8 ss1;

- /* If a hypercall was asked for, this points to the arguments. */
- struct hcall_args *hcall;
-
/* Do we need to stop what we're doing and return to userspace? */
int break_out;
wait_queue_head_t break_wq;
@@ -178,9 +178,9 @@ void page_table_guest_data_init(struct lguest *lg);
void lguest_arch_host_init(void);
void lguest_arch_host_fini(void);
void lguest_arch_run_guest(struct lg_vcpu *vcpu);
-void lguest_arch_handle_trap(struct lguest *lg);
-int lguest_arch_init_hypercalls(struct lguest *lg);
-int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args);
+void lguest_arch_handle_trap(struct lg_vcpu *vcpu);
+int lguest_arch_init_hypercalls(struct lg_vcpu *vcpu);
+int lguest_arch_do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args);
void lguest_arch_setup_regs(struct lguest *lg, unsigned long start);

/* <arch>/switcher.S: */
@@ -191,7 +191,7 @@ int lguest_device_init(void);
void lguest_device_remove(void);

/* hypercalls.c: */
-void do_hypercalls(struct lguest *lg);
+void do_hypercalls(struct lg_vcpu *vcpu);
void write_timestamp(struct lguest *lg);

/*L:035
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 3496cd9..6877de2 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -285,8 +285,9 @@ static int emulate_insn(struct lguest *lg)
}

/*H:050 Once we've re-enabled interrupts, we look at why the Guest exited. */
-void lguest_arch_handle_trap(struct lguest *lg)
+void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
switch (lg->regs->trapnum) {
case 13: /* We've intercepted a General Protection Fault. */
/* Check if this was one of those annoying IN or OUT
@@ -338,7 +339,7 @@ void lguest_arch_handle_trap(struct lguest *lg)
case LGUEST_TRAP_ENTRY:
/* Our 'struct hcall_args' maps directly over our regs: we set
* up the pointer now to indicate a hypercall is pending. */
- lg->hcall = (struct hcall_args *)lg->regs;
+ vcpu->hcall = (struct hcall_args *)lg->regs;
return;
}

@@ -493,8 +494,10 @@ void __exit lguest_arch_host_fini(void)


/*H:122 The i386-specific hypercalls simply farm out to the right functions. */
-int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args)
+int lguest_arch_do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
{
+ struct lguest *lg = vcpu->lg;
+
switch (args->arg0) {
case LHCALL_LOAD_GDT:
load_guest_gdt(lg, args->arg1, args->arg2);
@@ -513,13 +516,14 @@ int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args)
}

/*H:126 i386-specific hypercall initialization: */
-int lguest_arch_init_hypercalls(struct lguest *lg)
+int lguest_arch_init_hypercalls(struct lg_vcpu *vcpu)
{
u32 tsc_speed;
+ struct lguest *lg = vcpu->lg;

/* The pointer to the Guest's "struct lguest_data" is the only
* argument. We check that address now. */
- if (!lguest_address_ok(lg, lg->hcall->arg1, sizeof(*lg->lguest_data)))
+ if (!lguest_address_ok(lg, vcpu->hcall->arg1, sizeof(*lg->lguest_data)))
return -EFAULT;

/* Having checked it, we simply set lg->lguest_data to point straight
@@ -527,7 +531,7 @@ int lguest_arch_init_hypercalls(struct lguest *lg)
* copy_to_user/from_user from now on, instead of lgread/write. I put
* this in to show that I'm not immune to writing stupid
* optimizations. */
- lg->lguest_data = lg->mem_base + lg->hcall->arg1;
+ lg->lguest_data = lg->mem_base + vcpu->hcall->arg1;

/* We insist that the Time Stamp Counter exist and doesn't change with
* cpu frequency. Some devious chip manufacturers decided that TSC
--
1.5.0.6

2008-01-07 13:08:50

by Glauber Costa

Subject: [PATCH 08/16] per-vcpu interrupt processing.

This patch adapts interrupt processing to use the vcpu struct.
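
A simplified userspace sketch of the selection step, with the pending set
held per vcpu: mask out the interrupts the Guest has blocked, then take the
lowest one left. A single 32-bit word replaces the driver's bitmap, NIRQS
is an illustrative width, and take_next_irq() is a hypothetical helper.
Unlike the driver, the sketch clears the bit without checking for an IDT
handler.

```c
#include <stdint.h>

#define NIRQS 32	/* illustrative width; one bit per virtual interrupt */

struct lg_vcpu {
	uint32_t irqs_pending;	/* pending virtual interrupts of this vcpu */
};

/* Return the lowest pending, unblocked irq and clear it; NIRQS if none. */
unsigned int take_next_irq(struct lg_vcpu *vcpu, uint32_t blocked)
{
	uint32_t ready = vcpu->irqs_pending & ~blocked;
	unsigned int irq;

	for (irq = 0; irq < NIRQS; irq++) {
		if (ready & (UINT32_C(1) << irq)) {
			vcpu->irqs_pending &= ~(UINT32_C(1) << irq);
			return irq;
		}
	}
	return NIRQS;
}
```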

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 25 ++++++++++++++-----------
drivers/lguest/lg.h | 10 +++++-----
drivers/lguest/lguest_user.c | 7 ++++---
drivers/lguest/x86/core.c | 2 +-
5 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 99f65f9..bc3b32d 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -203,7 +203,7 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)
/* Check if there are any interrupts which can be delivered
* now: if so, this sets up the hander to be executed when we
* next run the Guest. */
- maybe_do_interrupt(lg);
+ maybe_do_interrupt(vcpu);

/* All long-lived kernel loops need to check with this horrible
* thing called the freezer. If the Host is trying to suspend,
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 3be18a6..d28671b 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -60,11 +60,13 @@ static void push_guest_stack(struct lguest *lg, unsigned long *gstack, u32 val)
* We set up the stack just like the CPU does for a real interrupt, so it's
* identical for the Guest (and the standard "iret" instruction will undo
* it). */
-static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
+static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
+ int has_err)
{
unsigned long gstack, origstack;
u32 eflags, ss, irq_enable;
unsigned long virtstack;
+ struct lguest *lg = vcpu->lg;

/* There are two cases for interrupts: one where the Guest is already
* in the kernel, and a more complex one where the Guest is in
@@ -129,9 +131,10 @@ static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
*
* maybe_do_interrupt() gets called before every entry to the Guest, to see if
* we should divert the Guest to running an interrupt handler. */
-void maybe_do_interrupt(struct lguest *lg)
+void maybe_do_interrupt(struct lg_vcpu *vcpu)
{
unsigned int irq;
+ struct lguest *lg = vcpu->lg;
DECLARE_BITMAP(blk, LGUEST_IRQS);
struct desc_struct *idt;

@@ -145,7 +148,7 @@ void maybe_do_interrupt(struct lguest *lg)
sizeof(blk)))
return;

- bitmap_andnot(blk, lg->irqs_pending, blk, LGUEST_IRQS);
+ bitmap_andnot(blk, vcpu->irqs_pending, blk, LGUEST_IRQS);

/* Find the first interrupt. */
irq = find_first_bit(blk, LGUEST_IRQS);
@@ -180,11 +183,11 @@ void maybe_do_interrupt(struct lguest *lg)
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
- clear_bit(irq, lg->irqs_pending);
+ clear_bit(irq, vcpu->irqs_pending);
/* set_guest_interrupt() takes the interrupt descriptor and a
* flag to say whether this interrupt pushes an error code onto
* the stack as well: virtual interrupts never do. */
- set_guest_interrupt(lg, idt->a, idt->b, 0);
+ set_guest_interrupt(vcpu, idt->a, idt->b, 0);
}

/* Every time we deliver an interrupt, we update the timestamp in the
@@ -245,19 +248,19 @@ static int has_err(unsigned int trap)
}

/* deliver_trap() returns true if it could deliver the trap. */
-int deliver_trap(struct lguest *lg, unsigned int num)
+int deliver_trap(struct lg_vcpu *vcpu, unsigned int num)
{
/* Trap numbers are always 8 bit, but we set an impossible trap number
* for traps inside the Switcher, so check that here. */
- if (num >= ARRAY_SIZE(lg->arch.idt))
+ if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
return 0;

/* Early on the Guest hasn't set the IDT entries (or maybe it put a
* bogus one in): if we fail here, the Guest will be killed. */
- if (!idt_present(lg->arch.idt[num].a, lg->arch.idt[num].b))
+ if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
return 0;
- set_guest_interrupt(lg, lg->arch.idt[num].a, lg->arch.idt[num].b,
- has_err(num));
+ set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
+ vcpu->lg->arch.idt[num].b, has_err(num));
return 1;
}

@@ -493,7 +496,7 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
struct lg_vcpu *vcpu = container_of(timer, struct lg_vcpu, hrt);

/* Remember the first interrupt is the timer interrupt. */
- set_bit(0, vcpu->lg->irqs_pending);
+ set_bit(0, vcpu->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
if (vcpu->lg->halted)
wake_up_process(vcpu->lg->tsk);
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 9c90fd3..6c794cd 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -50,6 +50,9 @@ struct lg_vcpu {

/* Virtual clock device */
struct hrtimer hrt;
+
+ /* Pending virtual interrupts */
+ DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};

/* The private info the thread maintains about the guest. */
@@ -97,9 +100,6 @@ struct lguest
const char *dead;

struct lguest_arch arch;
-
- /* Pending virtual interrupts */
- DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
};

extern struct mutex lguest_lock;
@@ -136,8 +136,8 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user);
#define pgd_pfn(x) (pgd_val(x) >> PAGE_SHIFT)

/* interrupts_and_traps.c: */
-void maybe_do_interrupt(struct lguest *lg);
-int deliver_trap(struct lguest *lg, unsigned int num);
+void maybe_do_interrupt(struct lg_vcpu *vcpu);
+int deliver_trap(struct lg_vcpu *vcpu, unsigned int num);
void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
void pin_stack_pages(struct lguest *lg);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index cd2b0bf..abae008 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -36,7 +36,8 @@ static int break_guest_out(struct lguest *lg, const unsigned long __user *input)

/*L:050 Sending an interrupt is done by writing LHREQ_IRQ and an interrupt
* number to /dev/lguest. */
-static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
+static int user_send_irq(struct lg_vcpu *vcpu,
+ const unsigned long __user *input)
{
unsigned long irq;

@@ -46,7 +47,7 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
return -EINVAL;
/* Next time the Guest runs, the core code will see if it can deliver
* this interrupt. */
- set_bit(irq, lg->irqs_pending);
+ set_bit(irq, vcpu->irqs_pending);
return 0;
}

@@ -258,7 +259,7 @@ static ssize_t write(struct file *file, const char __user *in,
case LHREQ_INITIALIZE:
return initialize(file, input);
case LHREQ_IRQ:
- return user_send_irq(lg, input);
+ return user_send_irq(vcpu, input);
case LHREQ_BREAK:
return break_guest_out(lg, input);
default:
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 6877de2..8fbb373 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -344,7 +344,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
}

/* We didn't handle the trap, so it needs to go to the Guest. */
- if (!deliver_trap(lg, lg->regs->trapnum))
+ if (!deliver_trap(vcpu, lg->regs->trapnum))
/* If the Guest doesn't have a handler (either it hasn't
* registered any yet, or it's one of the faults we don't let
* it handle), it dies with a cryptic error message. */
--
1.5.0.6

2008-01-07 13:09:10

by Glauber Costa

Subject: [PATCH 09/16] map_switcher_in_guest() per-vcpu

The switcher needs to be mapped per-vcpu, because different vcpus
may have different page tables (they need not, since threads
share the same ones).

So our first step is to make the function receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/lg.h | 3 ++-
drivers/lguest/page_tables.c | 4 +++-
drivers/lguest/x86/core.c | 2 +-
3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 6c794cd..f871737 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -168,7 +168,8 @@ void guest_pagetable_clear_all(struct lguest *lg);
void guest_pagetable_flush_user(struct lguest *lg);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages);
+void map_switcher_in_guest(struct lg_vcpu *vcpu,
+ struct lguest_pages *pages);
int demand_page(struct lguest *info, unsigned long cr2, int errcode);
void pin_page(struct lguest *lg, unsigned long vaddr);
unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index fffabb3..c79fac2 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -634,8 +634,10 @@ void free_guest_pagetable(struct lguest *lg)
* Guest (and not the pages for other CPUs). We have the appropriate PTE pages
* for each CPU already set up, we just need to hook them in now we know which
* Guest is about to run on this CPU. */
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages)
+void map_switcher_in_guest(struct lg_vcpu *vcpu,
+ struct lguest_pages *pages)
{
+ struct lguest *lg = vcpu->lg;
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 8fbb373..125a14b 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -92,7 +92,7 @@ static void copy_in_guest_info(struct lg_vcpu *vcpu,
pages->state.host_cr3 = __pa(current->mm->pgd);
/* Set up the Guest's page tables to see this CPU's pages (and no
* other CPU's pages). */
- map_switcher_in_guest(lg, pages);
+ map_switcher_in_guest(vcpu, pages);
/* Set up the two "TSS" members which tell the CPU what stack to use
* for traps which do directly into the Guest (ie. traps at privilege
* level 1). */
--
1.5.0.6

2008-01-07 13:09:33

by Glauber Costa

Subject: [PATCH 10/16] make emulate_insn receive a vcpu struct.

emulate_insn() needs to know the current eip, which will become
a per-vcpu thing in the future. So in this patch, the function
prototype is modified to receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/x86/core.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 125a14b..b336fff 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -220,8 +220,9 @@ void lguest_arch_run_guest(struct lg_vcpu *vcpu)
* When the Guest uses one of these instructions, we get a trap (General
* Protection Fault) and come here. We see if it's one of those troublesome
* instructions and skip over it. We return true if we did. */
-static int emulate_insn(struct lguest *lg)
+static int emulate_insn(struct lg_vcpu *vcpu)
{
+ struct lguest *lg = vcpu->lg;
u8 insn;
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
@@ -294,7 +295,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
* instructions which we need to emulate. If so, we just go
* back into the Guest after we've done it. */
if (lg->regs->errcode == 0) {
- if (emulate_insn(lg))
+ if (emulate_insn(vcpu))
return;
}
break;
--
1.5.0.6

2008-01-07 13:09:48

by Glauber Costa

Subject: [PATCH 12/16] replace lguest_arch with lg_vcpu_arch.

The fields found in lguest_arch are not really per-guest,
but per-vcpu (gdt, idt, etc). So this patch turns lguest_arch
into lg_vcpu_arch.

It makes sense to have a per-guest per-arch struct, but this
can be addressed later, when the need arises.
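
For illustration only, a userspace sketch of what the move implies for
trap delivery: each vcpu carries its own IDT copy, so the range and
presence checks look at vcpu->arch instead of a guest-wide table. struct
desc stands in for struct desc_struct, IDT_ENTRIES is assumed to be 256,
and can_deliver_trap() is a hypothetical reduction of deliver_trap() to
its two checks.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define IDT_ENTRIES 256	/* assumed table size for this sketch */

struct desc {		/* stand-in for struct desc_struct */
	uint32_t a, b;
};

struct lg_vcpu_arch {
	struct desc idt[IDT_ENTRIES];	/* one IDT copy per vcpu */
};

struct lg_vcpu {
	int vcpu_id;
	struct lg_vcpu_arch arch;
};

/* An entry is usable if its present bit (bit 15 of the high word) is set. */
static int idt_present(uint32_t lo, uint32_t hi)
{
	(void)lo;
	return (hi & 0x8000) != 0;
}

/* Nonzero if this vcpu has a usable handler for the trap number. */
int can_deliver_trap(const struct lg_vcpu *vcpu, unsigned int num)
{
	if (num >= ARRAY_SIZE(vcpu->arch.idt))
		return 0;
	return idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b);
}
```

Two vcpus of the same guest can then answer differently for the same trap
number, depending on what each has loaded.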

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/interrupts_and_traps.c | 29 +++++++++++----------
drivers/lguest/lg.h | 19 +++++++-------
drivers/lguest/segments.c | 43 +++++++++++++++++---------------
drivers/lguest/x86/core.c | 24 ++++++++----------
include/asm-x86/lguest.h | 2 +-
5 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 4cc7404..f8f7efe 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -180,7 +180,7 @@ void maybe_do_interrupt(struct lg_vcpu *vcpu)
/* Look at the IDT entry the Guest gave us for this interrupt. The
* first 32 (FIRST_EXTERNAL_VECTOR) entries are for traps, so we skip
* over them. */
- idt = &lg->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
+ idt = &vcpu->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
@@ -253,15 +253,15 @@ int deliver_trap(struct lg_vcpu *vcpu, unsigned int num)
{
/* Trap numbers are always 8 bit, but we set an impossible trap number
* for traps inside the Switcher, so check that here. */
- if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
+ if (num >= ARRAY_SIZE(vcpu->arch.idt))
return 0;

/* Early on the Guest hasn't set the IDT entries (or maybe it put a
* bogus one in): if we fail here, the Guest will be killed. */
- if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
+ if (!idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b))
return 0;
- set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
- vcpu->lg->arch.idt[num].b, has_err(num));
+ set_guest_interrupt(vcpu, vcpu->arch.idt[num].a,
+ vcpu->arch.idt[num].b, has_err(num));
return 1;
}

@@ -387,7 +387,8 @@ static void set_trap(struct lguest *lg, struct desc_struct *trap,
*
* We saw the Guest setting Interrupt Descriptor Table (IDT) entries with the
* LHCALL_LOAD_IDT_ENTRY hypercall before: that comes here. */
-void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
+void load_guest_idt_entry(struct lg_vcpu *vcpu,
+ unsigned int num, u32 lo, u32 hi)
{
/* Guest never handles: NMI, doublefault, spurious interrupt or
* hypercall. We ignore when it tries to set them. */
@@ -396,13 +397,13 @@ void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)

/* Mark the IDT as changed: next time the Guest runs we'll know we have
* to copy this again. */
- lg->changed |= CHANGED_IDT;
+ vcpu->lg->changed |= CHANGED_IDT;

/* Check that the Guest doesn't try to step outside the bounds. */
- if (num >= ARRAY_SIZE(lg->arch.idt))
- kill_guest(lg, "Setting idt entry %u", num);
+ if (num >= ARRAY_SIZE(vcpu->arch.idt))
+ kill_guest(vcpu->lg, "Setting idt entry %u", num);
else
- set_trap(lg, &lg->arch.idt[num], num, lo, hi);
+ set_trap(vcpu->lg, &vcpu->arch.idt[num], num, lo, hi);
}

/* The default entry for each interrupt points into the Switcher routines which
@@ -438,14 +439,14 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
/*H:240 We don't use the IDT entries in the "struct lguest" directly, instead
* we copy them into the IDT which we've set up for Guests on this CPU, just
* before we run the Guest. This routine does that copy. */
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lg_vcpu *vcpu, struct desc_struct *idt,
const unsigned long *def)
{
unsigned int i;

/* We can simply copy the direct traps, otherwise we use the default
* ones in the Switcher: they will return to the Host. */
- for (i = 0; i < ARRAY_SIZE(lg->arch.idt); i++) {
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.idt); i++) {
/* If no Guest can ever override this trap, leave it alone. */
if (!direct_trap(i))
continue;
@@ -454,8 +455,8 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
* Interrupt gates (type 14) disable interrupts as they are
* entered, which we never let the Guest do. Not present
* entries (type 0x0) also can't go direct, of course. */
- if (idt_type(lg->arch.idt[i].a, lg->arch.idt[i].b) == 0xF)
- idt[i] = lg->arch.idt[i];
+ if (idt_type(vcpu->arch.idt[i].a, vcpu->arch.idt[i].b) == 0xF)
+ idt[i] = vcpu->arch.idt[i];
else
/* Reset it to the default. */
default_idt_entry(&idt[i], i, def[i]);
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index d8429a0..5165172 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lg_vcpu {

/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
+
+ struct lg_vcpu_arch arch;
};

/* The private info the thread maintains about the guest. */
@@ -99,8 +101,6 @@ struct lguest

/* Dead? */
const char *dead;
-
- struct lguest_arch arch;
};

extern struct mutex lguest_lock;
@@ -139,12 +139,13 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user);
/* interrupts_and_traps.c: */
void maybe_do_interrupt(struct lg_vcpu *vcpu);
int deliver_trap(struct lg_vcpu *vcpu, unsigned int num);
-void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
+void load_guest_idt_entry(struct lg_vcpu *vcpu, unsigned int i,
+ u32 low, u32 hi);
void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
void pin_stack_pages(struct lguest *lg);
void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lg_vcpu *vcpu, struct desc_struct *idt,
const unsigned long *def);
void guest_set_clockevent(struct lg_vcpu *vcpu, unsigned long delta);
void init_clockdev(struct lg_vcpu *vcpu);
@@ -154,11 +155,11 @@ void free_interrupts(void);

/* segments.c: */
void setup_default_gdt_entries(struct lguest_ro_state *state);
-void setup_guest_gdt(struct lguest *lg);
-void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num);
-void guest_load_tls(struct lguest *lg, unsigned long tls_array);
-void copy_gdt(const struct lguest *lg, struct desc_struct *gdt);
-void copy_gdt_tls(const struct lguest *lg, struct desc_struct *gdt);
+void setup_guest_gdt(struct lg_vcpu *vcpu);
+void load_guest_gdt(struct lg_vcpu *vcpu, unsigned long table, u32 num);
+void guest_load_tls(struct lg_vcpu *vcpu, unsigned long tls_array);
+void copy_gdt(const struct lg_vcpu *vcpu, struct desc_struct *gdt);
+void copy_gdt_tls(const struct lg_vcpu *vcpu, struct desc_struct *gdt);

/* page_tables.c: */
int init_guest_pagetable(struct lguest *lg, unsigned long pgtable);
diff --git a/drivers/lguest/segments.c b/drivers/lguest/segments.c
index 9e189cb..b3cca41 100644
--- a/drivers/lguest/segments.c
+++ b/drivers/lguest/segments.c
@@ -58,7 +58,8 @@ static int ignored_gdt(unsigned int num)
* Protection Fault in the Switcher when it restores a Guest segment register
* which tries to use that entry. Then we kill the Guest for causing such a
* mess: the message will be "unhandled trap 256". */
-static void fixup_gdt_table(struct lguest *lg, unsigned start, unsigned end)
+static void fixup_gdt_table(struct lg_vcpu *vcpu, unsigned start,
+ unsigned end)
{
unsigned int i;

@@ -71,14 +72,14 @@ static void fixup_gdt_table(struct lguest *lg, unsigned start, unsigned end)
/* Segment descriptors contain a privilege level: the Guest is
* sometimes careless and leaves this as 0, even though it's
* running at privilege level 1. If so, we fix it here. */
- if ((lg->arch.gdt[i].b & 0x00006000) == 0)
- lg->arch.gdt[i].b |= (GUEST_PL << 13);
+ if ((vcpu->arch.gdt[i].b & 0x00006000) == 0)
+ vcpu->arch.gdt[i].b |= (GUEST_PL << 13);

/* Each descriptor has an "accessed" bit. If we don't set it
* now, the CPU will try to set it when the Guest first loads
* that entry into a segment register. But the GDT isn't
* writable by the Guest, so bad things can happen. */
- lg->arch.gdt[i].b |= 0x00000100;
+ vcpu->arch.gdt[i].b |= 0x00000100;
}
}

@@ -109,31 +110,31 @@ void setup_default_gdt_entries(struct lguest_ro_state *state)

/* This routine sets up the initial Guest GDT for booting. All entries start
* as 0 (unusable). */
-void setup_guest_gdt(struct lguest *lg)
+void setup_guest_gdt(struct lg_vcpu *vcpu)
{
/* Start with full 0-4G segments... */
- lg->arch.gdt[GDT_ENTRY_KERNEL_CS] = FULL_EXEC_SEGMENT;
- lg->arch.gdt[GDT_ENTRY_KERNEL_DS] = FULL_SEGMENT;
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_CS] = FULL_EXEC_SEGMENT;
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_DS] = FULL_SEGMENT;
/* ...except the Guest is allowed to use them, so set the privilege
* level appropriately in the flags. */
- lg->arch.gdt[GDT_ENTRY_KERNEL_CS].b |= (GUEST_PL << 13);
- lg->arch.gdt[GDT_ENTRY_KERNEL_DS].b |= (GUEST_PL << 13);
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_CS].b |= (GUEST_PL << 13);
+ vcpu->arch.gdt[GDT_ENTRY_KERNEL_DS].b |= (GUEST_PL << 13);
}

/*H:650 An optimization of copy_gdt(), for just the three "thead-local storage"
* entries. */
-void copy_gdt_tls(const struct lguest *lg, struct desc_struct *gdt)
+void copy_gdt_tls(const struct lg_vcpu *vcpu, struct desc_struct *gdt)
{
unsigned int i;

for (i = GDT_ENTRY_TLS_MIN; i <= GDT_ENTRY_TLS_MAX; i++)
- gdt[i] = lg->arch.gdt[i];
+ gdt[i] = vcpu->arch.gdt[i];
}

/*H:640 When the Guest is run on a different CPU, or the GDT entries have
* changed, copy_gdt() is called to copy the Guest's GDT entries across to this
* CPU's GDT. */
-void copy_gdt(const struct lguest *lg, struct desc_struct *gdt)
+void copy_gdt(const struct lg_vcpu *vcpu, struct desc_struct *gdt)
{
unsigned int i;

@@ -141,21 +142,22 @@ void copy_gdt(const struct lguest *lg, struct desc_struct *gdt)
* replaced. See ignored_gdt() above. */
for (i = 0; i < GDT_ENTRIES; i++)
if (!ignored_gdt(i))
- gdt[i] = lg->arch.gdt[i];
+ gdt[i] = vcpu->arch.gdt[i];
}

/*H:620 This is where the Guest asks us to load a new GDT (LHCALL_LOAD_GDT).
* We copy it from the Guest and tweak the entries. */
-void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num)
+void load_guest_gdt(struct lg_vcpu *vcpu, unsigned long table, u32 num)
{
+ struct lguest *lg = vcpu->lg;
/* We assume the Guest has the same number of GDT entries as the
* Host, otherwise we'd have to dynamically allocate the Guest GDT. */
- if (num > ARRAY_SIZE(lg->arch.gdt))
+ if (num > ARRAY_SIZE(vcpu->arch.gdt))
kill_guest(lg, "too many gdt entries %i", num);

/* We read the whole thing in, then fix it up. */
- __lgread(lg, lg->arch.gdt, table, num * sizeof(lg->arch.gdt[0]));
- fixup_gdt_table(lg, 0, ARRAY_SIZE(lg->arch.gdt));
+ __lgread(lg, vcpu->arch.gdt, table, num * sizeof(vcpu->arch.gdt[0]));
+ fixup_gdt_table(vcpu, 0, ARRAY_SIZE(vcpu->arch.gdt));
/* Mark that the GDT changed so the core knows it has to copy it again,
* even if the Guest is run on the same CPU. */
lg->changed |= CHANGED_GDT;
@@ -165,12 +167,13 @@ void load_guest_gdt(struct lguest *lg, unsigned long table, u32 num)
* Remember that this happens on every context switch, so it's worth
* optimizing. But wouldn't it be neater to have a single hypercall to cover
* both cases? */
-void guest_load_tls(struct lguest *lg, unsigned long gtls)
+void guest_load_tls(struct lg_vcpu *vcpu, unsigned long gtls)
{
- struct desc_struct *tls = &lg->arch.gdt[GDT_ENTRY_TLS_MIN];
+ struct desc_struct *tls = &vcpu->arch.gdt[GDT_ENTRY_TLS_MIN];
+ struct lguest *lg = vcpu->lg;

__lgread(lg, tls, gtls, sizeof(*tls)*GDT_ENTRY_TLS_ENTRIES);
- fixup_gdt_table(lg, GDT_ENTRY_TLS_MIN, GDT_ENTRY_TLS_MAX+1);
+ fixup_gdt_table(vcpu, GDT_ENTRY_TLS_MIN, GDT_ENTRY_TLS_MAX+1);
/* Note that just the TLS entries have changed. */
lg->changed |= CHANGED_GDT_TLS;
}
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index f213d00..edfac30 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -101,14 +101,14 @@ static void copy_in_guest_info(struct lg_vcpu *vcpu,

/* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
- copy_traps(lg, pages->state.guest_idt, default_idt_entries);
+ copy_traps(vcpu, pages->state.guest_idt, default_idt_entries);

/* Copy all GDT entries which the Guest can change. */
if (lg->changed & CHANGED_GDT)
- copy_gdt(lg, pages->state.guest_gdt);
+ copy_gdt(vcpu, pages->state.guest_gdt);
/* If only the TLS entries have changed, copy them. */
else if (lg->changed & CHANGED_GDT_TLS)
- copy_gdt_tls(lg, pages->state.guest_gdt);
+ copy_gdt_tls(vcpu, pages->state.guest_gdt);

/* Mark the Guest as unchanged for next time. */
lg->changed = 0;
@@ -198,7 +198,7 @@ void lguest_arch_run_guest(struct lg_vcpu *vcpu)
* re-enable interrupts an interrupt could fault and thus overwrite
* cr2, or we could even move off to a different CPU. */
if (vcpu->regs->trapnum == 14)
- lg->arch.last_pagefault = read_cr2();
+ vcpu->arch.last_pagefault = read_cr2();
/* Similarly, if we took a trap because the Guest used the FPU,
* we have to restore the FPU it expects to see. */
else if (vcpu->regs->trapnum == 7)
@@ -309,7 +309,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, lg->arch.last_pagefault,
+ if (demand_page(lg, vcpu->arch.last_pagefault,
vcpu->regs->errcode))
return;

@@ -321,7 +321,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
* happen before it's done the LHCALL_LGUEST_INIT hypercall, so
* lg->lguest_data could be NULL */
if (lg->lguest_data &&
- put_user(lg->arch.last_pagefault, &lg->lguest_data->cr2))
+ put_user(vcpu->arch.last_pagefault, &lg->lguest_data->cr2))
kill_guest(lg, "Writing cr2");
break;
case 7: /* We've intercepted a Device Not Available fault. */
@@ -352,7 +352,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
* it handle), it dies with a cryptic error message. */
kill_guest(lg, "unhandled trap %li at %#lx (%#lx)",
vcpu->regs->trapnum, vcpu->regs->eip,
- vcpu->regs->trapnum == 14 ? lg->arch.last_pagefault
+ vcpu->regs->trapnum == 14 ? vcpu->arch.last_pagefault
: vcpu->regs->errcode);
}

@@ -498,17 +498,15 @@ void __exit lguest_arch_host_fini(void)
/*H:122 The i386-specific hypercalls simply farm out to the right functions. */
int lguest_arch_do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
{
- struct lguest *lg = vcpu->lg;
-
switch (args->arg0) {
case LHCALL_LOAD_GDT:
- load_guest_gdt(lg, args->arg1, args->arg2);
+ load_guest_gdt(vcpu, args->arg1, args->arg2);
break;
case LHCALL_LOAD_IDT_ENTRY:
- load_guest_idt_entry(lg, args->arg1, args->arg2, args->arg3);
+ load_guest_idt_entry(vcpu, args->arg1, args->arg2, args->arg3);
break;
case LHCALL_LOAD_TLS:
- guest_load_tls(lg, args->arg1);
+ guest_load_tls(vcpu, args->arg1);
break;
default:
/* Bad Guest. Bad! */
@@ -589,5 +587,5 @@ void lguest_arch_setup_regs(struct lg_vcpu *vcpu, unsigned long start)

/* There are a couple of GDT entries the Guest expects when first
* booting. */
- setup_guest_gdt(vcpu->lg);
+ setup_guest_gdt(vcpu);
}
diff --git a/include/asm-x86/lguest.h b/include/asm-x86/lguest.h
index ccd3384..441bb7a 100644
--- a/include/asm-x86/lguest.h
+++ b/include/asm-x86/lguest.h
@@ -56,7 +56,7 @@ struct lguest_ro_state
struct desc_struct guest_gdt[GDT_ENTRIES];
};

-struct lguest_arch
+struct lg_vcpu_arch
{
/* The GDT entries copied into lguest_ro_state when running. */
struct desc_struct gdt[GDT_ENTRIES];
--
1.5.0.6
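
Editorial note: the descriptor fixup that now operates on vcpu->arch.gdt can be illustrated with a small userspace sketch. It is a simplification, not the kernel code: struct desc and struct vcpu_arch are reduced stand-ins, NR_GDT is a hypothetical table size, and the ignored_gdt() check that the real fixup_gdt_table() performs is left out. The bit positions (privilege level in bits 13-14, accessed bit 0x100) are taken from the code in the patch above.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PL	1		/* the Guest kernel runs at privilege level 1 */
#define DPL_MASK	0x00006000u	/* privilege level bits of the high word */
#define ACCESSED_BIT	0x00000100u	/* "accessed" bit of the high word */
#define NR_GDT		32		/* hypothetical table size */

struct desc {
	uint32_t a, b;			/* low and high words of a descriptor */
};

/* Per-vcpu architecture state: each virtual CPU has its own GDT copy. */
struct vcpu_arch {
	struct desc gdt[NR_GDT];
};

/* Fix up entries in [start, end): raise a zero privilege level to
 * GUEST_PL and pre-set the accessed bit so the CPU never has to write
 * to a table the Guest cannot modify. */
static void fixup_gdt(struct vcpu_arch *arch, unsigned int start,
		      unsigned int end)
{
	unsigned int i;

	assert(start <= end && end <= NR_GDT);
	for (i = start; i < end; i++) {
		if ((arch->gdt[i].b & DPL_MASK) == 0)
			arch->gdt[i].b |= (GUEST_PL << 13);
		arch->gdt[i].b |= ACCESSED_BIT;
	}
}
```

Because the table lives in the per-vcpu struct, two virtual CPUs of the same guest could in principle carry different descriptors without interfering with each other.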

2008-01-07 13:10:18

by Glauber Costa

Subject: [PATCH 11/16] make registers per-vcpu

This is the most obvious per-vcpu field: registers.

So this patch moves them from struct lguest to struct lg_vcpu,
and patches the places in which they are used accordingly.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/interrupts_and_traps.c | 29 ++++++++++++-----------
drivers/lguest/lg.h | 9 ++++---
drivers/lguest/lguest_user.c | 36 +++++++++++++++---------------
drivers/lguest/page_tables.c | 4 ++-
drivers/lguest/x86/core.c | 39 +++++++++++++++++----------------
5 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index d28671b..4cc7404 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -71,7 +71,7 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
/* There are two cases for interrupts: one where the Guest is already
* in the kernel, and a more complex one where the Guest is in
* userspace. We check the privilege level to find out. */
- if ((lg->regs->ss&0x3) != GUEST_PL) {
+ if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
* hypercall: both the virtual address and the segment */
virtstack = lg->esp1;
@@ -82,12 +82,12 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
* stack: when the Guest does an "iret" back from the interrupt
* handler the CPU will notice they're dropping privilege
* levels and expect these here. */
- push_guest_stack(lg, &gstack, lg->regs->ss);
- push_guest_stack(lg, &gstack, lg->regs->esp);
+ push_guest_stack(lg, &gstack, vcpu->regs->ss);
+ push_guest_stack(lg, &gstack, vcpu->regs->esp);
} else {
/* We're staying on the same Guest (kernel) stack. */
- virtstack = lg->regs->esp;
- ss = lg->regs->ss;
+ virtstack = vcpu->regs->esp;
+ ss = vcpu->regs->ss;

origstack = gstack = guest_pa(lg, virtstack);
}
@@ -96,7 +96,7 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
* the "Interrupt Flag" bit is always set. We copy that bit from the
* Guest's "irq_enabled" field into the eflags word: we saw the Guest
* copy it back in "lguest_iret". */
- eflags = lg->regs->eflags;
+ eflags = vcpu->regs->eflags;
if (get_user(irq_enable, &lg->lguest_data->irq_enabled) == 0
&& !(irq_enable & X86_EFLAGS_IF))
eflags &= ~X86_EFLAGS_IF;
@@ -105,19 +105,19 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
* "eflags" word, the old code segment, and the old instruction
* pointer. */
push_guest_stack(lg, &gstack, eflags);
- push_guest_stack(lg, &gstack, lg->regs->cs);
- push_guest_stack(lg, &gstack, lg->regs->eip);
+ push_guest_stack(lg, &gstack, vcpu->regs->cs);
+ push_guest_stack(lg, &gstack, vcpu->regs->eip);

/* For the six traps which supply an error code, we push that, too. */
if (has_err)
- push_guest_stack(lg, &gstack, lg->regs->errcode);
+ push_guest_stack(lg, &gstack, vcpu->regs->errcode);

/* Now we've pushed all the old state, we change the stack, the code
* segment and the address to execute. */
- lg->regs->ss = ss;
- lg->regs->esp = virtstack + (gstack - origstack);
- lg->regs->cs = (__KERNEL_CS|GUEST_PL);
- lg->regs->eip = idt_address(lo, hi);
+ vcpu->regs->ss = ss;
+ vcpu->regs->esp = virtstack + (gstack - origstack);
+ vcpu->regs->cs = (__KERNEL_CS|GUEST_PL);
+ vcpu->regs->eip = idt_address(lo, hi);

/* There are two kinds of interrupt handlers: 0xE is an "interrupt
* gate" which expects interrupts to be disabled on entry. */
@@ -158,7 +158,8 @@ void maybe_do_interrupt(struct lg_vcpu *vcpu)

/* They may be in the middle of an iret, where they asked us never to
* deliver interrupts. */
- if (lg->regs->eip >= lg->noirq_start && lg->regs->eip < lg->noirq_end)
+ if ((vcpu->regs->eip >= lg->noirq_start) &&
+ (vcpu->regs->eip < lg->noirq_end))
return;

/* If they're halted, interrupts restart them. */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f871737..d8429a0 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -44,6 +44,10 @@ struct lg_vcpu {
int vcpu_id;
struct lguest *lg;

+ /* At end of a page shared mapped over lguest_pages in guest. */
+ unsigned long regs_page;
+ struct lguest_regs *regs;
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -58,9 +62,6 @@ struct lg_vcpu {
/* The private info the thread maintains about the guest. */
struct lguest
{
- /* At end of a page shared mapped over lguest_pages in guest. */
- unsigned long regs_page;
- struct lguest_regs *regs;
struct lguest_data __user *lguest_data;
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
@@ -182,7 +183,7 @@ void lguest_arch_run_guest(struct lg_vcpu *vcpu);
void lguest_arch_handle_trap(struct lg_vcpu *vcpu);
int lguest_arch_init_hypercalls(struct lg_vcpu *vcpu);
int lguest_arch_do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args);
-void lguest_arch_setup_regs(struct lguest *lg, unsigned long start);
+void lguest_arch_setup_regs(struct lg_vcpu *vcpu, unsigned long start);

/* <arch>/switcher.S: */
extern char start_switcher_text[], end_switcher_text[], switch_to_guest[];
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index abae008..cd68446 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -108,6 +108,19 @@ static int vcpu_start(struct lg_vcpu *vcpu, int vcpu_id,
/* The timer for lguest's clock needs initialization. */
init_clockdev(vcpu);

+ /* We need a complete page for the Guest registers: they are accessible
+ * to the Guest and we can only grant it access to whole pages. */
+ vcpu->regs_page = get_zeroed_page(GFP_KERNEL);
+ if (!vcpu->regs_page)
+ return -ENOMEM;
+
+ /* We actually put the registers at the bottom of the page. */
+ vcpu->regs = (void *)vcpu->regs_page + PAGE_SIZE - sizeof(*vcpu->regs);
+
+ /* Now we initialize the Guest's registers, handing it the start
+ * address. */
+ lguest_arch_setup_regs(vcpu, start_ip);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -166,16 +179,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto release_guest;

- /* We need a complete page for the Guest registers: they are accessible
- * to the Guest and we can only grant it access to whole pages. */
- lg->regs_page = get_zeroed_page(GFP_KERNEL);
- if (!lg->regs_page) {
- err = -ENOMEM;
- goto release_guest;
- }
- /* We actually put the registers at the bottom of the page. */
- lg->regs = (void *)lg->regs_page + PAGE_SIZE - sizeof(*lg->regs);
-
/* Initialize the Guest's shadow page tables, using the toplevel
* address the Launcher gave us. This allocates memory, so can
* fail. */
@@ -183,10 +186,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto free_regs;

- /* Now we initialize the Guest's registers, handing it the start
- * address. */
- lguest_arch_setup_regs(lg, args[3]);
-
/* We keep a pointer to the Launcher task (ie. current task) for when
* other Guests want to wake this one (inter-Guest I/O). */
lg->tsk = current;
@@ -211,7 +210,7 @@ static int initialize(struct file *file, const unsigned long __user *input)
return sizeof(args);

free_regs:
- free_page(lg->regs_page);
+ free_page(lg->vcpus[0].regs_page);
release_guest:
kfree(lg);
unlock:
@@ -286,9 +285,12 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
- for (i = 0; i < lg->nr_vcpus; i++)
+ for (i = 0; i < lg->nr_vcpus; i++) {
/* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
hrtimer_cancel(&lg->vcpus[i].hrt);
+ /* We can free up the register page we allocated. */
+ free_page(lg->vcpus[i].regs_page);
+ }
/* Free up the shadow page tables for the Guest. */
free_guest_pagetable(lg);
/* Now all the memory cleanups are done, it's safe to release the
@@ -298,8 +300,6 @@ static int close(struct inode *inode, struct file *file)
* kmalloc()ed string, either of which is ok to hand to kfree(). */
if (!IS_ERR(lg->dead))
kfree(lg->dead);
- /* We can free up the register page we allocated. */
- free_page(lg->regs_page);
/* We clear the entire structure, which also marks it as free for the
* next user. */
memset(lg, 0, sizeof(*lg));
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index c79fac2..5045325 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -641,6 +641,7 @@ void map_switcher_in_guest(struct lg_vcpu *vcpu,
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
+ unsigned long pfn;

/* Make the last PGD entry for this Guest point to the Switcher's PTE
* page for this CPU (with appropriate flags). */
@@ -655,7 +656,8 @@ void map_switcher_in_guest(struct lg_vcpu *vcpu,
* CPU's "struct lguest_pages": if we make sure the Guest's register
* page is already mapped there, we don't have to copy them out
* again. */
- regs_pte = pfn_pte (__pa(lg->regs_page) >> PAGE_SHIFT, __pgprot(_PAGE_KERNEL));
+ pfn = __pa(vcpu->regs_page) >> PAGE_SHIFT;
+ regs_pte = pfn_pte(pfn, __pgprot(_PAGE_KERNEL));
switcher_pte_page[(unsigned long)pages/PAGE_SIZE%PTRS_PER_PTE] = regs_pte;
}
/*:*/
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index b336fff..f213d00 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -129,7 +129,7 @@ static void run_guest_once(struct lg_vcpu *vcpu,
/* Set the trap number to 256 (impossible value). If we fault while
* switching to the Guest (bad segment registers or bug), this will
* cause us to abort the Guest. */
- lg->regs->trapnum = 256;
+ vcpu->regs->trapnum = 256;

/* Now: we push the "eflags" register on the stack, then do an "lcall".
* This is how we change from using the kernel code segment to using
@@ -197,11 +197,11 @@ void lguest_arch_run_guest(struct lg_vcpu *vcpu)
* bad virtual address. We have to grab this now, because once we
* re-enable interrupts an interrupt could fault and thus overwrite
* cr2, or we could even move off to a different CPU. */
- if (lg->regs->trapnum == 14)
+ if (vcpu->regs->trapnum == 14)
lg->arch.last_pagefault = read_cr2();
/* Similarly, if we took a trap because the Guest used the FPU,
* we have to restore the FPU it expects to see. */
- else if (lg->regs->trapnum == 7)
+ else if (vcpu->regs->trapnum == 7)
math_state_restore();

/* Restore SYSENTER if it's supposed to be on. */
@@ -227,12 +227,12 @@ static int emulate_insn(struct lg_vcpu *vcpu)
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
* guest_pa just subtracts the Guest's page_offset. */
- unsigned long physaddr = guest_pa(lg, lg->regs->eip);
+ unsigned long physaddr = guest_pa(lg, vcpu->regs->eip);

/* This must be the Guest kernel trying to do something, not userspace!
* The bottom two bits of the CS segment register are the privilege
* level. */
- if ((lg->regs->cs & 3) != GUEST_PL)
+ if ((vcpu->regs->cs & 3) != GUEST_PL)
return 0;

/* Decoding x86 instructions is icky. */
@@ -275,12 +275,12 @@ static int emulate_insn(struct lg_vcpu *vcpu)
if (in) {
/* Lower bit tells is whether it's a 16 or 32 bit access */
if (insn & 0x1)
- lg->regs->eax = 0xFFFFFFFF;
+ vcpu->regs->eax = 0xFFFFFFFF;
else
- lg->regs->eax |= (0xFFFF << shift);
+ vcpu->regs->eax |= (0xFFFF << shift);
}
/* Finally, we've "done" the instruction, so move past it. */
- lg->regs->eip += insnlen;
+ vcpu->regs->eip += insnlen;
/* Success! */
return 1;
}
@@ -289,12 +289,12 @@ static int emulate_insn(struct lg_vcpu *vcpu)
void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
{
struct lguest *lg = vcpu->lg;
- switch (lg->regs->trapnum) {
+ switch (vcpu->regs->trapnum) {
case 13: /* We've intercepted a General Protection Fault. */
/* Check if this was one of those annoying IN or OUT
* instructions which we need to emulate. If so, we just go
* back into the Guest after we've done it. */
- if (lg->regs->errcode == 0) {
+ if (vcpu->regs->errcode == 0) {
if (emulate_insn(vcpu))
return;
}
@@ -309,7 +309,8 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, lg->arch.last_pagefault, lg->regs->errcode))
+ if (demand_page(lg, lg->arch.last_pagefault,
+ vcpu->regs->errcode))
return;

/* OK, it's really not there (or not OK): the Guest needs to
@@ -340,19 +341,19 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
case LGUEST_TRAP_ENTRY:
/* Our 'struct hcall_args' maps directly over our regs: we set
* up the pointer now to indicate a hypercall is pending. */
- vcpu->hcall = (struct hcall_args *)lg->regs;
+ vcpu->hcall = (struct hcall_args *)vcpu->regs;
return;
}

/* We didn't handle the trap, so it needs to go to the Guest. */
- if (!deliver_trap(vcpu, lg->regs->trapnum))
+ if (!deliver_trap(vcpu, vcpu->regs->trapnum))
/* If the Guest doesn't have a handler (either it hasn't
* registered any yet, or it's one of the faults we don't let
* it handle), it dies with a cryptic error message. */
kill_guest(lg, "unhandled trap %li at %#lx (%#lx)",
- lg->regs->trapnum, lg->regs->eip,
- lg->regs->trapnum == 14 ? lg->arch.last_pagefault
- : lg->regs->errcode);
+ vcpu->regs->trapnum, vcpu->regs->eip,
+ vcpu->regs->trapnum == 14 ? lg->arch.last_pagefault
+ : vcpu->regs->errcode);
}

/* Now we can look at each of the routines this calls, in increasing order of
@@ -559,9 +560,9 @@ int lguest_arch_init_hypercalls(struct lg_vcpu *vcpu)
*
* Most of the Guest's registers are left alone: we used get_zeroed_page() to
* allocate the structure, so they will be 0. */
-void lguest_arch_setup_regs(struct lguest *lg, unsigned long start)
+void lguest_arch_setup_regs(struct lg_vcpu *vcpu, unsigned long start)
{
- struct lguest_regs *regs = lg->regs;
+ struct lguest_regs *regs = vcpu->regs;

/* There are four "segment" registers which the Guest needs to boot:
* The "code segment" register (cs) refers to the kernel code segment
@@ -588,5 +589,5 @@ void lguest_arch_setup_regs(struct lguest *lg, unsigned long start)

/* There are a couple of GDT entries the Guest expects when first
* booting. */
- setup_guest_gdt(lg);
+ setup_guest_gdt(vcpu->lg);
}
--
1.5.0.6
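
Editorial note: the per-vcpu register page set up in vcpu_start() follows a simple layout rule: allocate one zeroed page and place the register block in its last sizeof(*regs) bytes. A minimal userspace sketch of that rule follows. It assumes a 4096-byte page, uses aligned_alloc() in place of get_zeroed_page(), and uses a shortened struct regs; vcpu_alloc_regs() and vcpu_free_regs() are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u		/* assumed page size */

/* Shortened stand-in for struct lguest_regs. */
struct regs {
	uint32_t eax, ebx, ecx, edx;
	uint32_t trapnum, errcode;
	uint32_t eip, cs, eflags, esp, ss;
};

struct vcpu {
	unsigned char *regs_page;	/* whole page, shareable as a unit */
	struct regs *regs;		/* points into the end of regs_page */
};

/* Allocate a zeroed page and point regs at its final bytes. */
static int vcpu_alloc_regs(struct vcpu *v)
{
	v->regs_page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
	if (!v->regs_page)
		return -1;
	memset(v->regs_page, 0, PAGE_SIZE);
	v->regs = (struct regs *)(v->regs_page + PAGE_SIZE - sizeof(*v->regs));
	return 0;
}

/* Release the page; mirrors the per-vcpu free_page() loop in close(). */
static void vcpu_free_regs(struct vcpu *v)
{
	free(v->regs_page);
	v->regs_page = NULL;
	v->regs = NULL;
}
```

Since every vcpu owns its page, the teardown path has to free one page per vcpu, which is why the patch moves free_page() inside the loop over lg->vcpus.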

2008-01-07 13:10:45

by Glauber Costa

Subject: [PATCH 13/16] per-vcpu lguest task management

lguest uses tasks to control its running behaviour (sending breaks,
tracking the halted state, etc). In a per-vcpu environment, each vcpu
will have its own underlying task, so this patch puts the
infrastructure for that in place.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 4 +-
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 8 ++--
drivers/lguest/lg.h | 14 ++++----
drivers/lguest/lguest_user.c | 55 ++++++++++++++++++---------------
5 files changed, 44 insertions(+), 39 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index bc3b32d..847f2df 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -197,7 +197,7 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)
return -ERESTARTSYS;

/* If Waker set break_out, return to Launcher. */
- if (lg->break_out)
+ if (vcpu->break_out)
return -EAGAIN;

/* Check if there are any interrupts which can be delivered
@@ -217,7 +217,7 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)

/* If the Guest asked to be stopped, we sleep. The Guest's
* clock timer or LHCALL_BREAK from the Waker will wake us. */
- if (lg->halted) {
+ if (vcpu->halted) {
set_current_state(TASK_INTERRUPTIBLE);
schedule();
continue;
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 1bf133e..edc8cb4 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -86,7 +86,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
- lg->halted = 1;
+ vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
lg->pending_notify = args->arg1;
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index f8f7efe..c1ca198 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -163,11 +163,11 @@ void maybe_do_interrupt(struct lg_vcpu *vcpu)
return;

/* If they're halted, interrupts restart them. */
- if (lg->halted) {
+ if (vcpu->halted) {
/* Re-enable interrupts. */
if (put_user(X86_EFLAGS_IF, &lg->lguest_data->irq_enabled))
kill_guest(lg, "Re-enabling interrupts");
- lg->halted = 0;
+ vcpu->halted = 0;
} else {
/* Otherwise we check if they have interrupts disabled. */
u32 irq_enabled;
@@ -500,8 +500,8 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
/* Remember the first interrupt is the timer interrupt. */
set_bit(0, vcpu->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
- if (vcpu->lg->halted)
- wake_up_process(vcpu->lg->tsk);
+ if (vcpu->halted)
+ wake_up_process(vcpu->tsk);
return HRTIMER_NORESTART;
}

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 5165172..8c9c8df 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,8 @@ struct lguest;
struct lg_vcpu {
int vcpu_id;
struct lguest *lg;
+ struct task_struct *tsk;
+ struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */

/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
@@ -55,6 +57,11 @@ struct lg_vcpu {
/* Virtual clock device */
struct hrtimer hrt;

+ /* Do we need to stop what we're doing and return to userspace? */
+ int break_out;
+ wait_queue_head_t break_wq;
+ int halted;
+
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);

@@ -65,8 +72,6 @@ struct lg_vcpu {
struct lguest
{
struct lguest_data __user *lguest_data;
- struct task_struct *tsk;
- struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */
struct lg_vcpu vcpus[NR_CPUS];
unsigned int nr_vcpus;

@@ -76,15 +81,10 @@ struct lguest
void __user *mem_base;
unsigned long kernel_address;
u32 cr2;
- int halted;
int ts;
u32 esp1;
u8 ss1;

- /* Do we need to stop what we're doing and return to userspace? */
- int break_out;
- wait_queue_head_t break_wq;
-
/* Bitmap of what has changed: see CHANGED_* above. */
int changed;
struct lguest_pages *last_pages;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index cd68446..acc1616 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -13,7 +13,8 @@
* LHREQ_BREAK and the value "1" to /dev/lguest to do this. Once the Launcher
* has done whatever needs attention, it writes LHREQ_BREAK and "0" to release
* the Waker. */
-static int break_guest_out(struct lguest *lg, const unsigned long __user *input)
+static int break_guest_out(struct lg_vcpu *vcpu,
+ const unsigned long __user *input)
{
unsigned long on;

@@ -22,14 +23,15 @@ static int break_guest_out(struct lguest *lg, const unsigned long __user *input)
return -EFAULT;

if (on) {
- lg->break_out = 1;
+ vcpu->break_out = 1;
/* Pop it out of the Guest (may be running on different CPU) */
- wake_up_process(lg->tsk);
+ wake_up_process(vcpu->tsk);
/* Wait for them to reset it */
- return wait_event_interruptible(lg->break_wq, !lg->break_out);
+ return wait_event_interruptible(vcpu->break_wq,
+ !vcpu->break_out);
} else {
- lg->break_out = 0;
- wake_up(&lg->break_wq);
+ vcpu->break_out = 0;
+ wake_up(&vcpu->break_wq);
return 0;
}
}
@@ -70,7 +72,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
vcpu = &lg->vcpus[vcpu_id];

/* If you're not the task which owns the Guest, go away. */
- if (current != lg->tsk)
+ if (current != vcpu->tsk)
return -EPERM;

/* If the guest is already dead, we indicate why */
@@ -121,6 +123,18 @@ static int vcpu_start(struct lg_vcpu *vcpu, int vcpu_id,
* address. */
lguest_arch_setup_regs(vcpu, start_ip);

+ /* Initialize the queue for the waker to wait on */
+ init_waitqueue_head(&vcpu->break_wq);
+
+ /* We keep a pointer to the Launcher task (ie. current task) for when
+ * other Guests want to wake this one (inter-Guest I/O). */
+ vcpu->tsk = current;
+
+ /* We need to keep a pointer to the Launcher's memory map, because if
+ * the Launcher dies we need to clean it up. If we don't keep a
+ * reference, it is destroyed before close() is called. */
+ vcpu->mm = get_task_mm(vcpu->tsk);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
vcpu->lg->nr_vcpus++;

@@ -186,17 +200,6 @@ static int initialize(struct file *file, const unsigned long __user *input)
if (err)
goto free_regs;

- /* We keep a pointer to the Launcher task (ie. current task) for when
- * other Guests want to wake this one (inter-Guest I/O). */
- lg->tsk = current;
- /* We need to keep a pointer to the Launcher's memory map, because if
- * the Launcher dies we need to clean it up. If we don't keep a
- * reference, it is destroyed before close() is called. */
- lg->mm = get_task_mm(lg->tsk);
-
- /* Initialize the queue for the waker to wait on */
- init_waitqueue_head(&lg->break_wq);
-
/* We remember which CPU's pages this Guest used last, for optimization
* when the same Guest runs on the same CPU twice. */
lg->last_pages = NULL;
@@ -251,7 +254,7 @@ static ssize_t write(struct file *file, const char __user *in,
return -ENOENT;

/* If you're not the task which owns the Guest, you can only break */
- if (lg && current != lg->tsk && req != LHREQ_BREAK)
+ if (lg && current != vcpu->tsk && req != LHREQ_BREAK)
return -EPERM;

switch (req) {
@@ -260,7 +263,7 @@ static ssize_t write(struct file *file, const char __user *in,
case LHREQ_IRQ:
return user_send_irq(vcpu, input);
case LHREQ_BREAK:
- return break_guest_out(lg, input);
+ return break_guest_out(vcpu, input);
default:
return -EINVAL;
}
@@ -285,17 +288,19 @@ static int close(struct inode *inode, struct file *file)
/* We need the big lock, to protect from inter-guest I/O and other
* Launchers initializing guests. */
mutex_lock(&lguest_lock);
+
+ /* Free up the shadow page tables for the Guest. */
+ free_guest_pagetable(lg);
+
for (i = 0; i < lg->nr_vcpus; i++) {
/* Cancels the hrtimer set via LHCALL_SET_CLOCKEVENT. */
hrtimer_cancel(&lg->vcpus[i].hrt);
/* We can free up the register page we allocated. */
free_page(lg->vcpus[i].regs_page);
+ /* Now all the memory cleanups are done, it's safe to release
+ * the Launcher's memory management structure. */
+ mmput(lg->vcpus[i].mm);
}
- /* Free up the shadow page tables for the Guest. */
- free_guest_pagetable(lg);
- /* Now all the memory cleanups are done, it's safe to release the
- * Launcher's memory management structure. */
- mmput(lg->mm);
/* If lg->dead doesn't contain an error code it will be NULL or a
* kmalloc()ed string, either of which is ok to hand to kfree(). */
if (!IS_ERR(lg->dead))
--
1.5.0.6

2008-01-07 13:11:00

by Glauber Costa

Subject: [PATCH 14/16] makes special fields be per-vcpu

The lguest struct holds some fields, namely cr2, ts, esp1 and ss1,
that are not really guest-wide but rather vcpu-wide.

This patch moves them into the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 10 +++++-----
drivers/lguest/interrupts_and_traps.c | 24 +++++++++++++-----------
drivers/lguest/lg.h | 18 ++++++++++--------
drivers/lguest/page_tables.c | 11 ++++++-----
drivers/lguest/x86/core.c | 10 ++++------
5 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index edc8cb4..4a4133b 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -58,7 +58,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
/* FLUSH_TLB comes in two flavors, depending on the
* argument: */
if (args->arg1)
- guest_pagetable_clear_all(lg);
+ guest_pagetable_clear_all(vcpu);
else
guest_pagetable_flush_user(lg);
break;
@@ -66,10 +66,10 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
/* All these calls simply pass the arguments through to the right
* routines. */
case LHCALL_NEW_PGTABLE:
- guest_new_pagetable(lg, args->arg1);
+ guest_new_pagetable(vcpu, args->arg1);
break;
case LHCALL_SET_STACK:
- guest_set_stack(lg, args->arg1, args->arg2, args->arg3);
+ guest_set_stack(vcpu, args->arg1, args->arg2, args->arg3);
break;
case LHCALL_SET_PTE:
guest_set_pte(lg, args->arg1, args->arg2, __pte(args->arg3));
@@ -82,7 +82,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
- lg->ts = args->arg1;
+ vcpu->ts = args->arg1;
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
@@ -189,7 +189,7 @@ static void initialize(struct lg_vcpu *vcpu)
* first write to a Guest page. This may have caused a copy-on-write
* fault, but the old page might be (read-only) in the Guest
* pagetable. */
- guest_pagetable_clear_all(lg);
+ guest_pagetable_clear_all(vcpu);
}

/*H:100
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index c1ca198..745f3ae 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -74,8 +74,8 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
* hypercall: both the virtual address and the segment */
- virtstack = lg->esp1;
- ss = lg->ss1;
+ virtstack = vcpu->esp1;
+ ss = vcpu->ss1;

origstack = gstack = guest_pa(lg, virtstack);
/* We push the old stack segment and pointer onto the new
@@ -313,10 +313,11 @@ static int direct_trap(unsigned int num)
* the Guest.
*
* Which is deeply unfair, because (literally!) it wasn't the Guests' fault. */
-void pin_stack_pages(struct lguest *lg)
+void pin_stack_pages(struct lg_vcpu *vcpu)
{
unsigned int i;

+ struct lguest *lg = vcpu->lg;
/* Depending on the CONFIG_4KSTACKS option, the Guest can have one or
* two pages of stack space. */
for (i = 0; i < lg->stack_pages; i++)
@@ -324,7 +325,7 @@ void pin_stack_pages(struct lguest *lg)
* start of the page after the kernel stack. Subtract one to
* get back onto the first stack page, and keep subtracting to
* get to the rest of the stack pages. */
- pin_page(lg, lg->esp1 - 1 - i * PAGE_SIZE);
+ pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
}

/* Direct traps also mean that we need to know whenever the Guest wants to use
@@ -335,21 +336,22 @@ void pin_stack_pages(struct lguest *lg)
*
* In Linux each process has its own kernel stack, so this happens a lot: we
* change stacks on each context switch. */
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages)
+void guest_set_stack(struct lg_vcpu *vcpu, u32 seg, u32 esp,
+ unsigned int pages)
{
/* You are not allowed have a stack segment with privilege level 0: bad
* Guest! */
if ((seg & 0x3) != GUEST_PL)
- kill_guest(lg, "bad stack segment %i", seg);
+ kill_guest(vcpu->lg, "bad stack segment %i", seg);
/* We only expect one or two stack pages. */
if (pages > 2)
- kill_guest(lg, "bad stack pages %u", pages);
+ kill_guest(vcpu->lg, "bad stack pages %u", pages);
/* Save where the stack is, and how many pages */
- lg->ss1 = seg;
- lg->esp1 = esp;
- lg->stack_pages = pages;
+ vcpu->ss1 = seg;
+ vcpu->esp1 = esp;
+ vcpu->lg->stack_pages = pages;
/* Make sure the new stack pages are mapped */
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}

/* All this reference to mapping stacks leads us neatly into the other complex
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8c9c8df..1b3c933 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -46,6 +46,11 @@ struct lg_vcpu {
struct task_struct *tsk;
struct mm_struct *mm; /* == tsk->mm, but that becomes NULL on exit */

+ u32 cr2;
+ int ts;
+ u32 esp1;
+ u8 ss1;
+
/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
struct lguest_regs *regs;
@@ -80,10 +85,6 @@ struct lguest
* memory in the Launcher. */
void __user *mem_base;
unsigned long kernel_address;
- u32 cr2;
- int ts;
- u32 esp1;
- u8 ss1;

/* Bitmap of what has changed: see CHANGED_* above. */
int changed;
@@ -141,8 +142,9 @@ void maybe_do_interrupt(struct lg_vcpu *vcpu);
int deliver_trap(struct lg_vcpu *vcpu, unsigned int num);
void load_guest_idt_entry(struct lg_vcpu *vcpu, unsigned int i,
u32 low, u32 hi);
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages);
-void pin_stack_pages(struct lguest *lg);
+void guest_set_stack(struct lg_vcpu *vcpu, u32 seg, u32 esp,
+ unsigned int pages);
+void pin_stack_pages(struct lg_vcpu *vcpu);
void setup_default_idt_entries(struct lguest_ro_state *state,
const unsigned long *def);
void copy_traps(const struct lg_vcpu *vcpu, struct desc_struct *idt,
@@ -164,9 +166,9 @@ void copy_gdt_tls(const struct lg_vcpu *vcpu, struct desc_struct *gdt);
/* page_tables.c: */
int init_guest_pagetable(struct lguest *lg, unsigned long pgtable);
void free_guest_pagetable(struct lguest *lg);
-void guest_new_pagetable(struct lguest *lg, unsigned long pgtable);
+void guest_new_pagetable(struct lg_vcpu *vcpu, unsigned long pgtable);
void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
-void guest_pagetable_clear_all(struct lguest *lg);
+void guest_pagetable_clear_all(struct lg_vcpu *vcpu);
void guest_pagetable_flush_user(struct lguest *lg);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index 5045325..1a7ac3a 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -432,9 +432,10 @@ static unsigned int new_pgdir(struct lguest *lg,
* Now we've seen all the page table setting and manipulation, let's see what
* what happens when the Guest changes page tables (ie. changes the top-level
* pgdir). This occurs on almost every context switch. */
-void guest_new_pagetable(struct lguest *lg, unsigned long pgtable)
+void guest_new_pagetable(struct lg_vcpu *vcpu, unsigned long pgtable)
{
int newpgdir, repin = 0;
+ struct lguest *lg = vcpu->lg;

/* Look to see if we have this one already. */
newpgdir = find_pgdir(lg, pgtable);
@@ -446,7 +447,7 @@ void guest_new_pagetable(struct lguest *lg, unsigned long pgtable)
lg->pgdidx = newpgdir;
/* If it was completely blank, we map in the Guest kernel stack */
if (repin)
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}

/*H:470 Finally, a routine which throws away everything: all PGD entries in all
@@ -468,11 +469,11 @@ static void release_all_pagetables(struct lguest *lg)
* mapping. Since kernel mappings are in every page table, it's easiest to
* throw them all away. This traps the Guest in amber for a while as
* everything faults back in, but it's rare. */
-void guest_pagetable_clear_all(struct lguest *lg)
+void guest_pagetable_clear_all(struct lg_vcpu *vcpu)
{
- release_all_pagetables(lg);
+ release_all_pagetables(vcpu->lg);
/* We need the Guest kernel stack mapped again. */
- pin_stack_pages(lg);
+ pin_stack_pages(vcpu);
}
/*:*/
/*M:009 Since we throw away all mappings when a kernel mapping changes, our
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index edfac30..9a64174 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -96,8 +96,8 @@ static void copy_in_guest_info(struct lg_vcpu *vcpu,
/* Set up the two "TSS" members which tell the CPU what stack to use
* for traps which do directly into the Guest (ie. traps at privilege
* level 1). */
- pages->state.guest_tss.esp1 = lg->esp1;
- pages->state.guest_tss.ss1 = lg->ss1;
+ pages->state.guest_tss.esp1 = vcpu->esp1;
+ pages->state.guest_tss.ss1 = vcpu->ss1;

/* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
@@ -167,12 +167,10 @@ static void run_guest_once(struct lg_vcpu *vcpu,
* are disabled: we own the CPU. */
void lguest_arch_run_guest(struct lg_vcpu *vcpu)
{
- struct lguest *lg = vcpu->lg;
-
/* Remember the awfully-named TS bit? If the Guest has asked to set it
* we set it now, so we can trap and pass that trap to the Guest if it
* uses the FPU. */
- if (lg->ts)
+ if (vcpu->ts)
lguest_set_ts();

/* SYSENTER is an optimized way of doing system calls. We can't allow
@@ -328,7 +326,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
/* If the Guest doesn't want to know, we already restored the
* Floating Point Unit, so we just continue without telling
* it. */
- if (!lg->ts)
+ if (!vcpu->ts)
return;
break;
case 32 ... 255:
--
1.5.0.6

2008-01-07 13:11:30

by Glauber Costa

Subject: [PATCH 15/16] make pending notifications per-vcpu

This patch makes the pending_notify field, used to track pending
notifications, per-vcpu instead of per-guest.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/core.c | 6 +++---
drivers/lguest/hypercalls.c | 6 +++---
drivers/lguest/lg.h | 3 ++-
drivers/lguest/lguest_user.c | 4 ++--
4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 847f2df..eae5149 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -186,10 +186,10 @@ int run_guest(struct lg_vcpu *vcpu, unsigned long __user *user)

/* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
- if (lg->pending_notify) {
- if (put_user(lg->pending_notify, user))
+ if (vcpu->pending_notify) {
+ if (put_user(vcpu->pending_notify, user))
return -EFAULT;
- return sizeof(lg->pending_notify);
+ return sizeof(vcpu->pending_notify);
}

/* Check for signals */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 4a4133b..ae8c0b4 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -89,7 +89,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
- lg->pending_notify = args->arg1;
+ vcpu->pending_notify = args->arg1;
break;
default:
/* It should be an architecture-specific hypercall. */
@@ -152,7 +152,7 @@ static void do_async_hcalls(struct lg_vcpu *vcpu)

/* Stop doing hypercalls if they want to notify the Launcher:
* it needs to service this first. */
- if (lg->pending_notify)
+ if (vcpu->pending_notify)
break;
}
}
@@ -217,7 +217,7 @@ void do_hypercalls(struct lg_vcpu *vcpu)
/* If we stopped reading the hypercall ring because the Guest did a
* NOTIFY to the Launcher, we want to return now. Otherwise we do
* the hypercall. */
- if (!vcpu->lg->pending_notify) {
+ if (!vcpu->pending_notify) {
do_hcall(vcpu, vcpu->hcall);
/* Tricky point: we reset the hcall pointer to mark the
* hypercall as "done". We use the hcall pointer rather than
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 1b3c933..d33445f 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -51,6 +51,8 @@ struct lg_vcpu {
u32 esp1;
u8 ss1;

+ unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
+
/* At end of a page shared mapped over lguest_pages in guest. */
unsigned long regs_page;
struct lguest_regs *regs;
@@ -95,7 +97,6 @@ struct lguest
struct pgdir pgdirs[4];

unsigned long noirq_start, noirq_end;
- unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */

unsigned int stack_pages;
u32 tsc_khz;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index acc1616..8ac6d2b 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -92,8 +92,8 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)

/* If we returned from read() last time because the Guest notified,
* clear the flag. */
- if (lg->pending_notify)
- lg->pending_notify = 0;
+ if (vcpu->pending_notify)
+ vcpu->pending_notify = 0;

/* Run the Guest until something interesting happens. */
return run_guest(vcpu, (unsigned long __user *)user);
--
1.5.0.6

2008-01-07 13:11:45

by Glauber Costa

Subject: [PATCH 16/16] per-vcpu lguest pgdir management

This patch makes the pgdir management per-vcpu. The pgdirs pool
is still guest-wide (although it will probably need to grow once we
are really running more vcpus), but the guest-wide pgdidx index is
gone, since it no longer makes sense. Instead, each vcpu keeps its
own index.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/lguest/hypercalls.c | 2 +-
drivers/lguest/interrupts_and_traps.c | 6 ++--
drivers/lguest/lg.h | 12 +++---
drivers/lguest/page_tables.c | 60 +++++++++++++++++----------------
drivers/lguest/x86/core.c | 6 ++--
5 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index ae8c0b4..b3a1942 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -60,7 +60,7 @@ static void do_hcall(struct lg_vcpu *vcpu, struct hcall_args *args)
if (args->arg1)
guest_pagetable_clear_all(vcpu);
else
- guest_pagetable_flush_user(lg);
+ guest_pagetable_flush_user(vcpu);
break;

/* All these calls simply pass the arguments through to the right
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 745f3ae..68b403f 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -77,7 +77,7 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->esp1;
ss = vcpu->ss1;

- origstack = gstack = guest_pa(lg, virtstack);
+ origstack = gstack = guest_pa(vcpu, virtstack);
/* We push the old stack segment and pointer onto the new
* stack: when the Guest does an "iret" back from the interrupt
* handler the CPU will notice they're dropping privilege
@@ -89,7 +89,7 @@ static void set_guest_interrupt(struct lg_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->regs->esp;
ss = vcpu->regs->ss;

- origstack = gstack = guest_pa(lg, virtstack);
+ origstack = gstack = guest_pa(vcpu, virtstack);
}

/* Remember that we never let the Guest actually disable interrupts, so
@@ -325,7 +325,7 @@ void pin_stack_pages(struct lg_vcpu *vcpu)
* start of the page after the kernel stack. Subtract one to
* get back onto the first stack page, and keep subtracting to
* get to the rest of the stack pages. */
- pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
+ pin_page(vcpu, vcpu->esp1 - 1 - i * PAGE_SIZE);
}

/* Direct traps also mean that we need to know whenever the Guest wants to use
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index d33445f..6e6a69e 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lg_vcpu {
unsigned long regs_page;
struct lguest_regs *regs;

+ int vcpu_pgd; /* which pgd this vcpu is currently using */
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -92,8 +94,6 @@ struct lguest
int changed;
struct lguest_pages *last_pages;

- /* We keep a small number of these. */
- u32 pgdidx;
struct pgdir pgdirs[4];

unsigned long noirq_start, noirq_end;
@@ -170,14 +170,14 @@ void free_guest_pagetable(struct lguest *lg);
void guest_new_pagetable(struct lg_vcpu *vcpu, unsigned long pgtable);
void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
void guest_pagetable_clear_all(struct lg_vcpu *vcpu);
-void guest_pagetable_flush_user(struct lguest *lg);
+void guest_pagetable_flush_user(struct lg_vcpu *vcpu);
void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
unsigned long vaddr, pte_t val);
void map_switcher_in_guest(struct lg_vcpu *vcpu,
struct lguest_pages *pages);
-int demand_page(struct lguest *info, unsigned long cr2, int errcode);
-void pin_page(struct lguest *lg, unsigned long vaddr);
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
+int demand_page(struct lg_vcpu *vcpu, unsigned long cr2, int errcode);
+void pin_page(struct lg_vcpu *vcpu, unsigned long vaddr);
+unsigned long guest_pa(struct lg_vcpu *vcpu, unsigned long vaddr);
void page_table_guest_data_init(struct lguest *lg);

/* <arch>/core.c: */
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index 1a7ac3a..839ea27 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -94,10 +94,10 @@ static pte_t *spte_addr(struct lguest *lg, pgd_t spgd, unsigned long vaddr)

/* These two functions just like the above two, except they access the Guest
* page tables. Hence they return a Guest address. */
-static unsigned long gpgd_addr(struct lguest *lg, unsigned long vaddr)
+static unsigned long gpgd_addr(struct lg_vcpu *vcpu, unsigned long vaddr)
{
unsigned int index = vaddr >> (PGDIR_SHIFT);
- return lg->pgdirs[lg->pgdidx].gpgdir + index * sizeof(pgd_t);
+ return vcpu->lg->pgdirs[vcpu->vcpu_pgd].gpgdir + index * sizeof(pgd_t);
}

static unsigned long gpte_addr(struct lguest *lg,
@@ -200,22 +200,23 @@ static void check_gpgd(struct lguest *lg, pgd_t gpgd)
*
* If we fixed up the fault (ie. we mapped the address), this routine returns
* true. Otherwise, it was a real fault and we need to tell the Guest. */
-int demand_page(struct lguest *lg, unsigned long vaddr, int errcode)
+int demand_page(struct lg_vcpu *vcpu, unsigned long vaddr, int errcode)
{
pgd_t gpgd;
pgd_t *spgd;
unsigned long gpte_ptr;
pte_t gpte;
pte_t *spte;
+ struct lguest *lg = vcpu->lg;

/* First step: get the top-level Guest page table entry. */
- gpgd = lgread(lg, gpgd_addr(lg, vaddr), pgd_t);
+ gpgd = lgread(lg, gpgd_addr(vcpu, vaddr), pgd_t);
/* Toplevel not present? We can't map it in. */
if (!(pgd_flags(gpgd) & _PAGE_PRESENT))
return 0;

/* Now look at the matching shadow entry. */
- spgd = spgd_addr(lg, lg->pgdidx, vaddr);
+ spgd = spgd_addr(lg, vcpu->vcpu_pgd, vaddr);
if (!(pgd_flags(*spgd) & _PAGE_PRESENT)) {
/* No shadow entry: allocate a new shadow PTE page. */
unsigned long ptepage = get_zeroed_page(GFP_KERNEL);
@@ -297,19 +298,19 @@ int demand_page(struct lguest *lg, unsigned long vaddr, int errcode)
*
* This is a quick version which answers the question: is this virtual address
* mapped by the shadow page tables, and is it writable? */
-static int page_writable(struct lguest *lg, unsigned long vaddr)
+static int page_writable(struct lg_vcpu *vcpu, unsigned long vaddr)
{
pgd_t *spgd;
unsigned long flags;

/* Look at the current top level entry: is it present? */
- spgd = spgd_addr(lg, lg->pgdidx, vaddr);
+ spgd = spgd_addr(vcpu->lg, vcpu->vcpu_pgd, vaddr);
if (!(pgd_flags(*spgd) & _PAGE_PRESENT))
return 0;

/* Check the flags on the pte entry itself: it must be present and
* writable. */
- flags = pte_flags(*(spte_addr(lg, *spgd, vaddr)));
+ flags = pte_flags(*(spte_addr(vcpu->lg, *spgd, vaddr)));

return (flags & (_PAGE_PRESENT|_PAGE_RW)) == (_PAGE_PRESENT|_PAGE_RW);
}
@@ -317,10 +318,10 @@ static int page_writable(struct lguest *lg, unsigned long vaddr)
/* So, when pin_stack_pages() asks us to pin a page, we check if it's already
* in the page tables, and if not, we call demand_page() with error code 2
* (meaning "write"). */
-void pin_page(struct lguest *lg, unsigned long vaddr)
+void pin_page(struct lg_vcpu *vcpu, unsigned long vaddr)
{
- if (!page_writable(lg, vaddr) && !demand_page(lg, vaddr, 2))
- kill_guest(lg, "bad stack page %#lx", vaddr);
+ if (!page_writable(vcpu, vaddr) && !demand_page(vcpu, vaddr, 2))
+ kill_guest(vcpu->lg, "bad stack page %#lx", vaddr);
}

/*H:450 If we chase down the release_pgd() code, it looks like this: */
@@ -358,28 +359,28 @@ static void flush_user_mappings(struct lguest *lg, int idx)
*
* The Guest has a hypercall to throw away the page tables: it's used when a
* large number of mappings have been changed. */
-void guest_pagetable_flush_user(struct lguest *lg)
+void guest_pagetable_flush_user(struct lg_vcpu *vcpu)
{
/* Drop the userspace part of the current page table. */
- flush_user_mappings(lg, lg->pgdidx);
+ flush_user_mappings(vcpu->lg, vcpu->vcpu_pgd);
}
/*:*/

/* We walk down the guest page tables to get a guest-physical address */
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr)
+unsigned long guest_pa(struct lg_vcpu *vcpu, unsigned long vaddr)
{
pgd_t gpgd;
pte_t gpte;

/* First step: get the top-level Guest page table entry. */
- gpgd = lgread(lg, gpgd_addr(lg, vaddr), pgd_t);
+ gpgd = lgread(vcpu->lg, gpgd_addr(vcpu, vaddr), pgd_t);
/* Toplevel not present? We can't map it in. */
if (!(pgd_flags(gpgd) & _PAGE_PRESENT))
- kill_guest(lg, "Bad address %#lx", vaddr);
+ kill_guest(vcpu->lg, "Bad address %#lx", vaddr);

- gpte = lgread(lg, gpte_addr(lg, gpgd, vaddr), pte_t);
+ gpte = lgread(vcpu->lg, gpte_addr(vcpu->lg, gpgd, vaddr), pte_t);
if (!(pte_flags(gpte) & _PAGE_PRESENT))
- kill_guest(lg, "Bad address %#lx", vaddr);
+ kill_guest(vcpu->lg, "Bad address %#lx", vaddr);

return pte_pfn(gpte) * PAGE_SIZE | (vaddr & ~PAGE_MASK);
}
@@ -399,11 +400,12 @@ static unsigned int find_pgdir(struct lguest *lg, unsigned long pgtable)
/*H:435 And this is us, creating the new page directory. If we really do
* allocate a new one (and so the kernel parts are not there), we set
* blank_pgdir. */
-static unsigned int new_pgdir(struct lguest *lg,
+static unsigned int new_pgdir(struct lg_vcpu *vcpu,
unsigned long gpgdir,
int *blank_pgdir)
{
unsigned int next;
+ struct lguest *lg = vcpu->lg;

/* We pick one entry at random to throw out. Choosing the Least
* Recently Used might be better, but this is easy. */
@@ -413,7 +415,7 @@ static unsigned int new_pgdir(struct lguest *lg,
lg->pgdirs[next].pgdir = (pgd_t *)get_zeroed_page(GFP_KERNEL);
/* If the allocation fails, just keep using the one we have */
if (!lg->pgdirs[next].pgdir)
- next = lg->pgdidx;
+ next = vcpu->vcpu_pgd;
else
/* This is a blank page, so there are no kernel
* mappings: caller must map the stack! */
@@ -442,9 +444,9 @@ void guest_new_pagetable(struct lg_vcpu *vcpu, unsigned long pgtable)
/* If not, we allocate or mug an existing one: if it's a fresh one,
* repin gets set to 1. */
if (newpgdir == ARRAY_SIZE(lg->pgdirs))
- newpgdir = new_pgdir(lg, pgtable, &repin);
+ newpgdir = new_pgdir(vcpu, pgtable, &repin);
/* Change the current pgd index to the new one. */
- lg->pgdidx = newpgdir;
+ vcpu->vcpu_pgd = newpgdir;
/* If it was completely blank, we map in the Guest kernel stack */
if (repin)
pin_stack_pages(vcpu);
@@ -591,11 +593,11 @@ int init_guest_pagetable(struct lguest *lg, unsigned long pgtable)
{
/* We start on the first shadow page table, and give it a blank PGD
* page. */
- lg->pgdidx = 0;
- lg->pgdirs[lg->pgdidx].gpgdir = pgtable;
- lg->pgdirs[lg->pgdidx].pgdir = (pgd_t*)get_zeroed_page(GFP_KERNEL);
- if (!lg->pgdirs[lg->pgdidx].pgdir)
+ lg->pgdirs[0].gpgdir = pgtable;
+ lg->pgdirs[0].pgdir = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+ if (!lg->pgdirs[0].pgdir)
return -ENOMEM;
+ lg->vcpus[0].vcpu_pgd = 0;
return 0;
}

@@ -607,7 +609,7 @@ void page_table_guest_data_init(struct lguest *lg)
/* We tell the Guest that it can't use the top 4MB of virtual
* addresses used by the Switcher. */
|| put_user(4U*1024*1024, &lg->lguest_data->reserve_mem)
- || put_user(lg->pgdirs[lg->pgdidx].gpgdir,&lg->lguest_data->pgdir))
+ || put_user(lg->pgdirs[0].gpgdir, &lg->lguest_data->pgdir))
kill_guest(lg, "bad guest page %p", lg->lguest_data);

/* In flush_user_mappings() we loop from 0 to
@@ -638,7 +640,6 @@ void free_guest_pagetable(struct lguest *lg)
void map_switcher_in_guest(struct lg_vcpu *vcpu,
struct lguest_pages *pages)
{
- struct lguest *lg = vcpu->lg;
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
@@ -648,7 +649,8 @@ void map_switcher_in_guest(struct lg_vcpu *vcpu,
* page for this CPU (with appropriate flags). */
switcher_pgd = __pgd(__pa(switcher_pte_page) | _PAGE_KERNEL);

- lg->pgdirs[lg->pgdidx].pgdir[SWITCHER_PGD_INDEX] = switcher_pgd;
+ vcpu->lg->pgdirs[vcpu->vcpu_pgd].pgdir[SWITCHER_PGD_INDEX] =
+ switcher_pgd;

/* We also change the Switcher PTE page. When we're running the Guest,
* we want the Guest's "regs" page to appear where the first Switcher
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9a64174..b85e3de 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -147,7 +147,7 @@ static void run_guest_once(struct lg_vcpu *vcpu,
* 0-th argument above, ie "a"). %ebx contains the
* physical address of the Guest's top-level page
* directory. */
- : "0"(pages), "1"(__pa(lg->pgdirs[lg->pgdidx].pgdir))
+ : "0"(pages), "1"(__pa(lg->pgdirs[vcpu->vcpu_pgd].pgdir))
/* We tell gcc that all these registers could change,
* which means we don't have to save and restore them in
* the Switcher. */
@@ -225,7 +225,7 @@ static int emulate_insn(struct lg_vcpu *vcpu)
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
* guest_pa just subtracts the Guest's page_offset. */
- unsigned long physaddr = guest_pa(lg, vcpu->regs->eip);
+ unsigned long physaddr = guest_pa(vcpu, vcpu->regs->eip);

/* This must be the Guest kernel trying to do something, not userspace!
* The bottom two bits of the CS segment register are the privilege
@@ -307,7 +307,7 @@ void lguest_arch_handle_trap(struct lg_vcpu *vcpu)
*
* The errcode tells whether this was a read or a write, and
* whether kernel or userspace code. */
- if (demand_page(lg, vcpu->arch.last_pagefault,
+ if (demand_page(vcpu, vcpu->arch.last_pagefault,
vcpu->regs->errcode))
return;

--
1.5.0.6

2008-01-08 11:03:00

by Rusty Russell

Subject: Re: [PATCH 04/16] per-cpu run guest

On Tuesday 08 January 2008 00:05:25 Glauber de Oliveira Costa wrote:
> + /* Watch out for arbitrary vcpu indexes! */
> + if (vcpu_id > lg->nr_vcpus)
> + return -EINVAL;
> +
> + vcpu = &lg->vcpus[vcpu_id];
> +

Out-by-one error here... Fixed it for you, plus a couple of others.
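
For reference, the quoted check lets vcpu_id == lg->nr_vcpus through, which
is one past the last initialised entry if valid ids run from 0 to
nr_vcpus - 1. A minimal sketch of the stricter bound follows. The types and
names (struct guest, struct vcpu, MAX_VCPUS, lookup_vcpu) are hypothetical
stand-ins, not the real lguest structures, and this is not necessarily the
exact fix that was applied:

```c
#include <stddef.h>

#define MAX_VCPUS 4	/* hypothetical stand-in for NR_CPUS */

struct vcpu {
	unsigned int id;
};

struct guest {
	struct vcpu vcpus[MAX_VCPUS];
	unsigned int nr_vcpus;	/* number of initialised entries */
};

/* Return the vcpu for this id, or NULL if the id is out of range. */
static struct vcpu *lookup_vcpu(struct guest *g, unsigned int id)
{
	/* Valid indexes are 0 .. nr_vcpus - 1, so the test is >=, not >. */
	if (id >= g->nr_vcpus)
		return NULL;
	return &g->vcpus[id];
}
```

Making the id unsigned, as in this sketch, also rules out negative indexes
without a second comparison.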

I've applied the patches, but made one minor-but-invasive change: I didn't
want to ask you to spin the patches again!

I changed "vcpu" to "cpu" everywhere (the v is pretty redundant in this
context), which cut about a dozen lines of code out (things now fitted
again!).

I also changed "vcpu_id" to simply "id" and made it unsigned. Do you plan for
this to always be equal to the index in the vcpu array BTW? If so, we can
neaten vcpu_start (now lg_cpu_start)...
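
To illustrate the question: if the id always equals the array index, the
owning guest can be recovered from the cpu pointer alone, which is what the
container_of() line in vcpu_start relies on. A userspace sketch under that
assumption follows. The names (struct guest, struct cpu, MAX_CPUS,
guest_of_cpu) are hypothetical, and offsetof() stands in for the kernel's
container_of():

```c
#include <stddef.h>

#define MAX_CPUS 4	/* hypothetical stand-in for NR_CPUS */

struct cpu {
	unsigned int id;	/* assumed equal to the index in cpus[] */
};

struct guest {
	int other_state;
	struct cpu cpus[MAX_CPUS];
};

/* Recover the enclosing guest from a pointer to one of its cpus. */
static struct guest *guest_of_cpu(struct cpu *cpu)
{
	/* Step back id entries to reach cpus[0]... */
	struct cpu *first = cpu - cpu->id;

	/* ...then back by the array's offset to reach the guest itself. */
	return (struct guest *)((char *)first - offsetof(struct guest, cpus));
}
```

This only works while id and array index stay in lockstep; if they ever
diverge, the subtraction lands on the wrong address, so a stored
back-pointer would be the safer choice in that case.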

You can grab the latest now...

Thanks!
Rusty.