2010-12-21 21:39:32

by Konrad Rzeszutek Wilk

Subject: [RFC PATCH v1] Consider void entries in the P2M as 1-1 mapping.

Please see attached an RFC for the first set of patches that augments
how the Xen MMU deals with PFNs that point to physical devices.

Short summary: No need to troll through code to add VM_IO on mmap paths
anymore.

Long summary:
Under the Xen MMU we used to distinguish two different types of entries in
the P2M tree: a real MFN, or INVALID_P2M_ENTRY (a missing PFN - used for ballooning).
If there was a device whose PCI BAR was within the P2M, we would look
at the flags, and if _PAGE_IOMAP was passed we would just return the PFN without
consulting the P2M. We have a patch (and some auxiliary ones for other subsystems)
that sets this flag:
x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas

This patchset proposes a different way of doing this, where the patch
above and the other auxiliary ones will not be necessary.

This different approach is the one that H. Peter Anvin and Jeremy Fitzhardinge
suggested. The mechanism is to think of the void entries (so, not filled in) in the
P2M tree structure as identity (1-1) mappings. In the past we used to think of
those regions as "missing" and under the ownership of the balloon code. But
the balloon code only operates on a specific region. This region is in the last
E820 RAM page (basically any region past nr_pages is considered a balloon-type
page).

Gaps in the E820 (which are usually PCI BAR spaces) would end up
as void entries pointing to the "missing" pages.

This patchset considers the void entries as "identity", and for balloon pages
you have to explicitly set the PFNs to be "missing". This means that the void
entries are now considered 1-1, so a lookup of a PFN which falls in one of the
large gaps of the P2M space will return that same PFN. Since the E820 gaps can
cross P2M region boundaries (keep in mind that the P2M structure is a 3-level
tree), we go through the E820 gaps and reserved E820 regions and set those to be
identity. For large regions we just hook up the top (or middle) pointer
to shared "identity" pages. For smaller regions we set the leaf entries so
that pfn_to_mfn(pfn)==pfn, as sketched below.
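
To make this concrete, the lookup then boils down to roughly the following
(a simplified sketch of the get_phys_to_machine logic from patch 03 below;
bounds checks and the save/restore mfn-list bookkeeping are omitted):

unsigned long get_phys_to_machine(unsigned long pfn)
{
        unsigned topidx = p2m_top_index(pfn);
        unsigned mididx = p2m_mid_index(pfn);
        unsigned idx = p2m_index(pfn);

        /* A top or middle entry still pointing at the shared
         * "identity" pages means the whole branch is PFN==MFN. */
        if (p2m_top[topidx] == p2m_mid_identity)
                return pfn;
        if (p2m_top[topidx][mididx] == p2m_identity)
                return pfn;

        /* Otherwise the leaf holds a real MFN, INVALID_P2M_ENTRY
         * (missing/balloon), or - for small identity regions - the
         * PFN itself. */
        return p2m_top[topidx][mididx][idx];
}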

For the case of the balloon pages, the setting of the "missing" pages
is mostly already done. The initial case of carving out the last E820 region for
balloon ownership is augmented to set those PFNs to missing.


This patchset is also available under git:
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/p2m-identity

Further work:
The code in xen/mmu.c that deals with _PAGE_IOMAP can be removed, but
to guard against regressions or bugs let's take it one patchset at a time.

Also, filter out _PAGE_IOMAP on entries that are System RAM (having it
set there is wrong). With this P2M lookup we can make that determination easily.
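
One possible shape for that filter, purely as illustration (and modulo MFNs
that just happen to equal their PFN), would be to clear the flag in
pte_pfn_to_mfn for RAM-backed pages:

/* Illustrative sketch only: a RAM-backed page has a real translation
 * (pfn_to_mfn(pfn) != pfn), so _PAGE_IOMAP on it is bogus. */
if (pfn_to_mfn(pfn) != pfn)
        flags &= ~_PAGE_IOMAP;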

P.S.
You might wonder why the IDENTITY_P2M_ENTRY value is not used when writing
to the P2M, but only used as a flag. That is b/c the toolstack does not consider
that value to be correct, so instead we use INVALID_P2M_ENTRY.


2010-12-21 21:38:11

by Konrad Rzeszutek Wilk

Subject: [PATCH 08/10] xen/mmu: Bugfix. Fill the top entry page with appropriate middle layer pointers.

If we swapped over from using a p2m_mid_identity to p2m_mid_missing
(in an earlier call to set_phys_to_machine) and then started going through the
PFNs in descending order to program a new MFN (balloon worker), we would
end up in this code path. At that point we would set up a new page filled with
pointers to p2m_identity instead of p2m_missing. This had the disastrous
effect that get_phys_to_machine on that PFN would return a 1-1 mapping
instead of INVALID_P2M_ENTRY, resulting in hitting a BUG check in the balloon driver.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 92f4fec..a917439 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -480,7 +480,10 @@ static bool alloc_p2m(unsigned long pfn)
if (!mid)
return false;

- p2m_mid_init(mid, p2m_identity);
+ if (mid == p2m_mid_identity)
+ p2m_mid_init(mid, p2m_identity);
+ else
+ p2m_mid_init(mid, p2m_missing);

if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
free_p2m_page(mid);
--
1.7.1

2010-12-21 21:38:17

by Konrad Rzeszutek Wilk

Subject: [PATCH 06/10] xen/setup: Only set identity mapping in E820 regions when privileged.

We do not want to set the identity mapping on E820 reserved
regions when running as a PV guest. This is b/c the 0->ISA_END_ADDRESS region
would be considered identity and we would try to read DMI information
and fail (since pfn_to_mfn(pfn)==pfn) under PV guests.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/setup.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 752c865..34fb906 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -160,7 +160,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
if (end < start)
continue;

- if (e820->map[i].type != E820_RAM) {
+ if (xen_initial_domain() && e820->map[i].type != E820_RAM) {
for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
set_phys_to_machine(pfn, pfn);
identity += pfn - PFN_UP(start);
--
1.7.1

2010-12-21 21:38:21

by Konrad Rzeszutek Wilk

Subject: [PATCH 10/10] xen/mmu: Warn against races.

The initial bootup code uses set_phys_to_machine quite a lot, and after
bootup it is the balloon driver. The balloon driver does have a mutex
lock, so this should not be necessary - but just in case, add
a warning if we do hit this scenario.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b2b8733..86d3bd3 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -549,13 +549,15 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
idx = p2m_index(pfn);

if (mfn == INVALID_P2M_ENTRY) {
- /* If it is INVALID, swap over.. */
+ /* Balloon lock should inhibit racing, but just in case. */
if (p2m_top[topidx] == p2m_mid_identity) {
- p2m_top[topidx] = p2m_mid_missing;
+ WARN_ON(cmpxchg(&p2m_top[topidx], p2m_mid_identity,
+ p2m_mid_missing) != p2m_mid_identity);
return 1;
}
if (p2m_top[topidx][mididx] == p2m_identity) {
- p2m_top[topidx][mididx] = p2m_missing;
+ WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_identity,
+ p2m_missing) != p2m_identity);
return 1;
}
}
--
1.7.1

2010-12-21 21:38:20

by Konrad Rzeszutek Wilk

Subject: [PATCH 05/10] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.

For all regions that are not considered RAM, we do not
necessarily have to set the P2M, as the P2M now assumes that any
untouched P2M top branch (covering 134217728 pages on 64-bit) or
middle branch (covering 262144 pages on 64-bit) is identity,
meaning pfn_to_mfn(pfn)==pfn. However, not all E820 gaps are
that large, so for smaller regions and for boundary conditions we fill
out the P2M mapping with the identity mapping.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/setup.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d984d36..752c865 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -146,6 +146,34 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
return released;
}

+static unsigned long __init xen_set_identity(const struct e820map *e820)
+{
+ phys_addr_t last = 0;
+ int i;
+ unsigned long identity = 0;
+ unsigned long pfn;
+
+ for (i = 0; i < e820->nr_map; i++) {
+ phys_addr_t start = e820->map[i].addr;
+ phys_addr_t end = start + e820->map[i].size;
+
+ if (end < start)
+ continue;
+
+ if (e820->map[i].type != E820_RAM) {
+ for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
+ set_phys_to_machine(pfn, pfn);
+ identity += pfn - PFN_UP(start);
+ }
+ if (start > last && ((start - last) > 0)) {
+ for (pfn = PFN_UP(last); pfn < PFN_DOWN(start); pfn++)
+ set_phys_to_machine(pfn, pfn);
+ identity += pfn - PFN_UP(last);
+ }
+ last = end;
+ }
+ return identity;
+}
/**
* machine_specific_memory_setup - Hook for machine specific memory setup.
**/
@@ -254,6 +282,12 @@ char * __init xen_memory_setup(void)

xen_add_extra_mem(extra_pages);

+ /*
+ * Set P2M for all non-RAM pages and E820 gaps to be identity
+ * type PFNs.
+ */
+ printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n",
+ xen_set_identity(&e820));
return "Xen";
}

--
1.7.1

2010-12-21 21:38:15

by Konrad Rzeszutek Wilk

Subject: [PATCH 03/10] xen/mmu: Add the notion of IDENTITY_P2M_ENTRY.

Our P2M tree structure is three-level. In the leaf nodes
we store the Machine Frame Number (MFN) for each PFN. What this means
is that when one does pfn_to_mfn(pfn), which is used when creating
PTE entries, you get the real MFN of the hardware. When Xen sets
up a guest it initially populates an array which has descending MFN
values, like so:

idx: 0, 1, 2
[0x290F, 0x290E, 0x290D, ..]

so pfn_to_mfn(2)==0x290D. If you start and restart many guests, that list
starts looking quite random.

Anyhow, we graft this structure onto our P2M tree structure and stick
those MFNs in the leaves. But for all other leaf entries, and for any top
(root) or middle entries for which there is no corresponding MFN, we assume
that the MFN is the PFN. In other words, for example:
pfn_to_mfn(0xc0000)=0xc0000.

Note, this is a departure from how P2M previously worked. In the past, it
would give you INVALID_P2M_ENTRY, so:
pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY.

The benefit of this is that for non-RAM regions (think
PCI BARs, or ACPI spaces) we can create mappings easily b/c the
PFN value matches the MFN.

However, there is ballooning to be considered. Ballooning requires
that we keep track of INVALID_P2M_ENTRY, or even set the leaf entries
to this value. Moreover, there might be huge regions (8MB, at
the end of the E820 RAM region) for which we need to set INVALID_P2M_ENTRY.

For that we introduce two new pages: p2m_missing and p2m_mid_missing.
All entries in p2m_missing are of INVALID_P2M_ENTRY type, and
all entries in p2m_mid_missing point to p2m_missing. Whenever we
detect that we need to set INVALID_P2M_ENTRY for large areas, we swap
in those p2m_missing and/or p2m_mid_missing pages.

P.S.
We cannot set the IDENTITY_P2M_ENTRY in the P2M tree. This is b/c
the toolstack considers that invalid and would abort during
migration of the PV guest.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/include/asm/xen/page.h | 1 +
arch/x86/xen/mmu.c | 61 +++++++++++++++++++++++++++++++++++----
2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 8760cc6..4b0ee16 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -29,6 +29,7 @@ typedef struct xpaddr {

/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
#define INVALID_P2M_ENTRY (~0UL)
+#define IDENTITY_P2M_ENTRY (0UL)
#define FOREIGN_FRAME_BIT (1UL<<31)
#define FOREIGN_FRAME(m) ((m) | FOREIGN_FRAME_BIT)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d6d0276..4ba7e4e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -217,6 +217,9 @@ static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);

+static RESERVE_BRK_ARRAY(unsigned long, p2m_missing, P2M_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_missing, P2M_MID_PER_PAGE);
+
RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));

@@ -260,12 +263,12 @@ static void p2m_top_mfn_p_init(unsigned long **top)
top[i] = p2m_mid_identity_mfn;
}

-static void p2m_mid_init(unsigned long **mid)
+static void p2m_mid_init(unsigned long **mid, unsigned long *ptr)
{
unsigned i;

for (i = 0; i < P2M_MID_PER_PAGE; i++)
- mid[i] = p2m_identity;
+ mid[i] = ptr;
}

static void p2m_mid_mfn_init(unsigned long *mid)
@@ -374,11 +377,16 @@ void __init xen_build_dynamic_phys_to_machine(void)
p2m_init(p2m_identity);

p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_mid_init(p2m_mid_identity);
+ p2m_mid_init(p2m_mid_identity, p2m_identity);

p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_init(p2m_top);

+ p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_init(p2m_missing);
+
+ p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_mid_init(p2m_mid_missing, p2m_missing);
/*
* The domain builder gives us a pre-constructed p2m array in
* mfn_list for all the pages initially given to us, so we just
@@ -390,7 +398,7 @@ void __init xen_build_dynamic_phys_to_machine(void)

if (p2m_top[topidx] == p2m_mid_identity) {
unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_mid_init(mid);
+ p2m_mid_init(mid, p2m_identity);

p2m_top[topidx] = mid;
}
@@ -410,6 +418,28 @@ unsigned long get_phys_to_machine(unsigned long pfn)
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);

+ /*
+ * The INVALID_P2M_ENTRY is filled in both p2m_*identity
+ * and in p2m_*missing, so returning the INVALID_P2M_ENTRY
+ * would be wrong.
+ */
+ if (p2m_top[topidx] == p2m_mid_identity)
+ return pfn;
+
+ if (p2m_top[topidx][mididx] == p2m_identity)
+ return pfn;
+
+#if 0
+ /*
+ * These are superfluous. The p2m_missing and p2m_mid_missing
+ * both contain INVALID_P2M_ENTRY values. But this is correct
+ * and can help in understanding this code. */
+ if (p2m_top[topidx] == p2m_mid_missing)
+ return INVALID_P2M_ENTRY;
+
+ if (p2m_top[topidx][mididx] == p2m_missing)
+ return INVALID_P2M_ENTRY;
+#endif
return p2m_top[topidx][mididx][idx];
}
EXPORT_SYMBOL_GPL(get_phys_to_machine);
@@ -449,7 +479,7 @@ static bool alloc_p2m(unsigned long pfn)
if (!mid)
return false;

- p2m_mid_init(mid);
+ p2m_mid_init(mid, p2m_identity);

if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
free_p2m_page(mid);
@@ -513,9 +543,28 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);

- if (p2m_top[topidx][mididx] == p2m_identity)
+ if (mfn == INVALID_P2M_ENTRY) {
+ /* If it is INVALID, swap over.. */
+ if (p2m_top[topidx] == p2m_mid_identity) {
+ p2m_top[topidx] = p2m_mid_missing;
+ return 1;
+ }
+ if (p2m_top[topidx][mididx] == p2m_identity) {
+ p2m_top[topidx][mididx] = p2m_missing;
+ return 1;
+ }
+ }
+
+ /* And the result of the above swap over.. */
+ if (p2m_top[topidx][mididx] == p2m_missing)
return mfn == INVALID_P2M_ENTRY;

+ /* For sparse holes where the p2m leaf has a real PFN along with
+ * PCI holes, stick in the PFN as the MFN value, do not pass
+ * in the IDENTITY_P2M_ENTRY state - that value cannot be saved.
+ */
+ BUG_ON(mfn == IDENTITY_P2M_ENTRY);
+
p2m_top[topidx][mididx][idx] = mfn;

return true;
--
1.7.1

2010-12-21 21:38:55

by Konrad Rzeszutek Wilk

Subject: [PATCH 02/10] xen/p2m: change p2m_missing_* to p2m_identity_*

From: Konrad Rzeszutek Wilk <[email protected]>

We are going to alter how we think about the P2M. Most of the
P2M contains MFNs, and areas that are not populated are
considered to be "missing". Missing means that the PFN
is either not set for this guest (the guest does not have that
much memory allocated) or is under the balloon driver's ownership.

We are instead now going to think of those unpopulated
areas as "identity". Meaning that for a PFN whose lookup
hits p2m_identity we will provide the PFN
value back instead of INVALID_P2M_ENTRY. Essentially treating
those regions as PFN==MFN.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 59 ++++++++++++++++++++++++++-------------------------
1 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 44924e5..d6d0276 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -209,9 +209,9 @@ unsigned long xen_max_p2m_pfn __read_mostly;
#define MAX_P2M_PFN (P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)

/* Placeholders for holes in the address space */
-static RESERVE_BRK_ARRAY(unsigned long, p2m_missing, P2M_PER_PAGE);
-static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_missing, P2M_MID_PER_PAGE);
-static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_missing_mfn, P2M_MID_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE);
+static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_identity_mfn, P2M_MID_PER_PAGE);

static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
@@ -241,7 +241,7 @@ static void p2m_top_init(unsigned long ***top)
unsigned i;

for (i = 0; i < P2M_TOP_PER_PAGE; i++)
- top[i] = p2m_mid_missing;
+ top[i] = p2m_mid_identity;
}

static void p2m_top_mfn_init(unsigned long *top)
@@ -249,7 +249,7 @@ static void p2m_top_mfn_init(unsigned long *top)
unsigned i;

for (i = 0; i < P2M_TOP_PER_PAGE; i++)
- top[i] = virt_to_mfn(p2m_mid_missing_mfn);
+ top[i] = virt_to_mfn(p2m_mid_identity_mfn);
}

static void p2m_top_mfn_p_init(unsigned long **top)
@@ -257,7 +257,7 @@ static void p2m_top_mfn_p_init(unsigned long **top)
unsigned i;

for (i = 0; i < P2M_TOP_PER_PAGE; i++)
- top[i] = p2m_mid_missing_mfn;
+ top[i] = p2m_mid_identity_mfn;
}

static void p2m_mid_init(unsigned long **mid)
@@ -265,7 +265,7 @@ static void p2m_mid_init(unsigned long **mid)
unsigned i;

for (i = 0; i < P2M_MID_PER_PAGE; i++)
- mid[i] = p2m_missing;
+ mid[i] = p2m_identity;
}

static void p2m_mid_mfn_init(unsigned long *mid)
@@ -273,7 +273,7 @@ static void p2m_mid_mfn_init(unsigned long *mid)
unsigned i;

for (i = 0; i < P2M_MID_PER_PAGE; i++)
- mid[i] = virt_to_mfn(p2m_missing);
+ mid[i] = virt_to_mfn(p2m_identity);
}

static void p2m_init(unsigned long *p2m)
@@ -300,8 +300,8 @@ void xen_build_mfn_list_list(void)

/* Pre-initialize p2m_top_mfn to be completely missing */
if (p2m_top_mfn == NULL) {
- p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_mid_mfn_init(p2m_mid_missing_mfn);
+ p2m_mid_identity_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_mid_mfn_init(p2m_mid_identity_mfn);

p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_mfn_p_init(p2m_top_mfn_p);
@@ -310,7 +310,7 @@ void xen_build_mfn_list_list(void)
p2m_top_mfn_init(p2m_top_mfn);
} else {
/* Reinitialise, mfn's all change after migration */
- p2m_mid_mfn_init(p2m_mid_missing_mfn);
+ p2m_mid_mfn_init(p2m_mid_identity_mfn);
}

for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_PER_PAGE) {
@@ -326,15 +326,15 @@ void xen_build_mfn_list_list(void)
* they're just missing, just update the stored mfn,
* since all could have changed over a migrate.
*/
- if (mid == p2m_mid_missing) {
+ if (mid == p2m_mid_identity) {
BUG_ON(mididx);
- BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
- p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn);
+ BUG_ON(mid_mfn_p != p2m_mid_identity_mfn);
+ p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_identity_mfn);
pfn += (P2M_MID_PER_PAGE - 1) * P2M_PER_PAGE;
continue;
}

- if (mid_mfn_p == p2m_mid_missing_mfn) {
+ if (mid_mfn_p == p2m_mid_identity_mfn) {
/*
* XXX boot-time only! We should never find
* missing parts of the mfn tree after
@@ -370,11 +370,11 @@ void __init xen_build_dynamic_phys_to_machine(void)

xen_max_p2m_pfn = max_pfn;

- p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_init(p2m_missing);
+ p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_init(p2m_identity);

- p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_mid_init(p2m_mid_missing);
+ p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_mid_init(p2m_mid_identity);

p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_init(p2m_top);
@@ -388,7 +388,7 @@ void __init xen_build_dynamic_phys_to_machine(void)
unsigned topidx = p2m_top_index(pfn);
unsigned mididx = p2m_mid_index(pfn);

- if (p2m_top[topidx] == p2m_mid_missing) {
+ if (p2m_top[topidx] == p2m_mid_identity) {
unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(mid);

@@ -443,7 +443,7 @@ static bool alloc_p2m(unsigned long pfn)
top_p = &p2m_top[topidx];
mid = *top_p;

- if (mid == p2m_mid_missing) {
+ if (mid == p2m_mid_identity) {
/* Mid level is missing, allocate a new one */
mid = alloc_p2m_page();
if (!mid)
@@ -451,7 +451,7 @@ static bool alloc_p2m(unsigned long pfn)

p2m_mid_init(mid);

- if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
+ if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
free_p2m_page(mid);
}

@@ -460,9 +460,9 @@ static bool alloc_p2m(unsigned long pfn)

BUG_ON(virt_to_mfn(mid_mfn) != *top_mfn_p);

- if (mid_mfn == p2m_mid_missing_mfn) {
+ if (mid_mfn == p2m_mid_identity_mfn) {
/* Separately check the mid mfn level */
- unsigned long missing_mfn;
+ unsigned long identity_mfn;
unsigned long mid_mfn_mfn;

mid_mfn = alloc_p2m_page();
@@ -471,15 +471,16 @@ static bool alloc_p2m(unsigned long pfn)

p2m_mid_mfn_init(mid_mfn);

- missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
+ identity_mfn = virt_to_mfn(p2m_mid_identity_mfn);
mid_mfn_mfn = virt_to_mfn(mid_mfn);
- if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
+ if (cmpxchg(top_mfn_p, identity_mfn, mid_mfn_mfn) !=
+ identity_mfn)
free_p2m_page(mid_mfn);
else
p2m_top_mfn_p[topidx] = mid_mfn;
}

- if (p2m_top[topidx][mididx] == p2m_missing) {
+ if (p2m_top[topidx][mididx] == p2m_identity) {
/* p2m leaf page is missing */
unsigned long *p2m;

@@ -489,7 +490,7 @@ static bool alloc_p2m(unsigned long pfn)

p2m_init(p2m);

- if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
+ if (cmpxchg(&mid[mididx], p2m_identity, p2m) != p2m_identity)
free_p2m_page(p2m);
else
mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -512,7 +513,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);

- if (p2m_top[topidx][mididx] == p2m_missing)
+ if (p2m_top[topidx][mididx] == p2m_identity)
return mfn == INVALID_P2M_ENTRY;

p2m_top[topidx][mididx][idx] = mfn;
--
1.7.1

2010-12-21 21:38:13

by Konrad Rzeszutek Wilk

Subject: [PATCH 04/10] xen/mmu: For 1-1 mapping, automatically set _PAGE_IOMAP.

This means that for PFNs (specifically: those in any E820 gaps
or non-RAM E820 regions) that have a 1-1 mapping we set the
_PAGE_IOMAP flag.

Later on we could remove the _PAGE_IOMAP code handling, but
for right now let's keep this in so as not to introduce any bisection
failures across this patchset.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 4ba7e4e..bd02e7d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -832,6 +832,9 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
pteval_t flags = val & PTE_FLAGS_MASK;
unsigned long mfn = pfn_to_mfn(pfn);

+ if (mfn == pfn)
+ flags |= _PAGE_IOMAP;
+
/*
* If there's no mfn for the pfn, then just create an
* empty non-present pte. Unfortunately this loses
--
1.7.1

2010-12-21 21:38:09

by Konrad Rzeszutek Wilk

Subject: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

This patch prepares us for the case where void entries in the P2M
tree structure do not necessarily imply that the pages are missing.
With this, we diligently set regions that will be used by the
balloon driver to INVALID_P2M_ENTRY, putting them under the ownership
of the balloon driver.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/setup.c | 8 ++++++++
drivers/xen/balloon.c | 1 +
2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b5a7f92..d984d36 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;

static __init void xen_add_extra_mem(unsigned long pages)
{
+ unsigned long pfn;
+
u64 size = (u64)pages * PAGE_SIZE;
u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;

@@ -66,6 +68,11 @@ static __init void xen_add_extra_mem(unsigned long pages)
xen_extra_mem_size += size;

xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
+
+ for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++) {
+ BUG_ON(!set_phys_to_machine(pfn, INVALID_P2M_ENTRY));
+ BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
+ }
}

static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
@@ -105,6 +112,7 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
start, end, ret);
if (ret == 1) {
set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
len++;
}
}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 43f9f02..f82bb48 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -297,6 +297,7 @@ static int decrease_reservation(unsigned long nr_pages)
for (i = 0; i < nr_pages; i++) {
pfn = mfn_to_pfn(frame_list[i]);
set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
balloon_append(pfn_to_page(pfn));
}

--
1.7.1

2010-12-21 21:38:08

by Konrad Rzeszutek Wilk

Subject: [PATCH 09/10] xen/mmu: Be aware of p2m_[mid_|]missing when saving/restore.

We did not consider the 1-1 mappings, and during restore we would
not properly deal with them.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a917439..b2b8733 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -329,14 +329,13 @@ void xen_build_mfn_list_list(void)
* they're just missing, just update the stored mfn,
* since all could have changed over a migrate.
*/
- if (mid == p2m_mid_identity) {
+ if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
BUG_ON(mididx);
BUG_ON(mid_mfn_p != p2m_mid_identity_mfn);
p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_identity_mfn);
pfn += (P2M_MID_PER_PAGE - 1) * P2M_PER_PAGE;
continue;
}
-
if (mid_mfn_p == p2m_mid_identity_mfn) {
/*
* XXX boot-time only! We should never find
--
1.7.1

2010-12-21 21:41:19

by Konrad Rzeszutek Wilk

Subject: [PATCH 07/10] xen/mmu: Work with 1-1 mappings when allocating new top/middle entries.

We were not properly taking into account the 1-1 mappings
when a large area of memory was ballooned out.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/mmu.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index bd02e7d..92f4fec 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -464,7 +464,7 @@ static void free_p2m_page(void *p)
static bool alloc_p2m(unsigned long pfn)
{
unsigned topidx, mididx;
- unsigned long ***top_p, **mid;
+ unsigned long ***top_p, **mid, **mid_orig;
unsigned long *top_mfn_p, *mid_mfn;

topidx = p2m_top_index(pfn);
@@ -473,15 +473,16 @@ static bool alloc_p2m(unsigned long pfn)
top_p = &p2m_top[topidx];
mid = *top_p;

- if (mid == p2m_mid_identity) {
+ if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
/* Mid level is missing, allocate a new one */
+ mid_orig = mid;
mid = alloc_p2m_page();
if (!mid)
return false;

p2m_mid_init(mid, p2m_identity);

- if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
+ if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
free_p2m_page(mid);
}

@@ -510,9 +511,11 @@ static bool alloc_p2m(unsigned long pfn)
p2m_top_mfn_p[topidx] = mid_mfn;
}

- if (p2m_top[topidx][mididx] == p2m_identity) {
+ if (p2m_top[topidx][mididx] == p2m_identity ||
+ p2m_top[topidx][mididx] == p2m_missing) {
/* p2m leaf page is missing */
unsigned long *p2m;
+ unsigned long *p2m_orig = p2m_top[topidx][mididx];

p2m = alloc_p2m_page();
if (!p2m)
@@ -520,7 +523,7 @@ static bool alloc_p2m(unsigned long pfn)

p2m_init(p2m);

- if (cmpxchg(&mid[mididx], p2m_identity, p2m) != p2m_identity)
+ if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
free_p2m_page(p2m);
else
mid_mfn[mididx] = virt_to_mfn(p2m);
--
1.7.1

2010-12-21 22:19:43

by Jeremy Fitzhardinge

Subject: Re: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> This patch prepares us for the case where void entries in the P2M
> tree structure do not necessarily imply that the pages are missing.
> With this, we diligently set regions that will be used by the
> balloon driver to INVALID_P2M_ENTRY, putting them under the ownership
> of the balloon driver.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/setup.c | 8 ++++++++
> drivers/xen/balloon.c | 1 +
> 2 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index b5a7f92..d984d36 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;
>
> static __init void xen_add_extra_mem(unsigned long pages)
> {
> + unsigned long pfn;
> +
> u64 size = (u64)pages * PAGE_SIZE;
> u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;
>
> @@ -66,6 +68,11 @@ static __init void xen_add_extra_mem(unsigned long pages)
> xen_extra_mem_size += size;
>
> xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
> +
> + for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++) {
> + BUG_ON(!set_phys_to_machine(pfn, INVALID_P2M_ENTRY));

Use __set_phys_to_machine where you don't expect (or can't allow) any
allocation.

Also, I'm not a fan of hiding real side-effectful code in a BUG_ON
predicate.

> + BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
> + }
> }
>
> static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
> @@ -105,6 +112,7 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
> start, end, ret);
> if (ret == 1) {
> set_phys_to_machine(pfn, INVALID_P2M_ENTRY);

Ditto on this one (I guess I forgot, or it predates the existence of
__set_phys_to_machine).

> + BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
> len++;
> }
> }
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 43f9f02..f82bb48 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -297,6 +297,7 @@ static int decrease_reservation(unsigned long nr_pages)
> for (i = 0; i < nr_pages; i++) {
> pfn = mfn_to_pfn(frame_list[i]);
> set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> + BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);
> balloon_append(pfn_to_page(pfn));
> }
>

2010-12-21 22:29:34

by Jeremy Fitzhardinge

Subject: Re: [PATCH 04/10] xen/mmu: For 1-1 mapping, automatically set _PAGE_IOMAP.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> This means that for PFNs (specifically: those in any E820 gaps
> or non-RAM E820 regions) that have a 1-1 mapping we set the
> _PAGE_IOMAP flag.
>
> Later on we could remove the _PAGE_IOMAP code handling, but
> for right now let's keep this in so as not to introduce any bisection
> failures across this patchset.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/mmu.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 4ba7e4e..bd02e7d 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -832,6 +832,9 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> pteval_t flags = val & PTE_FLAGS_MASK;
> unsigned long mfn = pfn_to_mfn(pfn);
>
> + if (mfn == pfn)
> + flags |= _PAGE_IOMAP;

Why? Does it really make sense to set _PAGE_IOMAP if they just happen
to be the same value?

J

2010-12-21 22:34:47

by Jeremy Fitzhardinge

Subject: Re: [PATCH 05/10] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> For all regions that are not considered RAM, we do not
> necessarily have to set the P2M, as the P2M now assumes that any
> untouched P2M top branch (covering 134217728 pages on 64-bit) or
> middle branch (covering 262144 pages on 64-bit) is identity,
> meaning pfn_to_mfn(pfn)==pfn. However, not all E820 gaps are
> that large, so for smaller regions and for boundary conditions we fill
> out the P2M mapping with the identity mapping.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/setup.c | 34 ++++++++++++++++++++++++++++++++++
> 1 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index d984d36..752c865 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -146,6 +146,34 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
> return released;
> }
>
> +static unsigned long __init xen_set_identity(const struct e820map *e820)
> +{
> + phys_addr_t last = 0;
> + int i;
> + unsigned long identity = 0;
> + unsigned long pfn;
> +
> + for (i = 0; i < e820->nr_map; i++) {
> + phys_addr_t start = e820->map[i].addr;
> + phys_addr_t end = start + e820->map[i].size;
> +
> + if (end < start)
> + continue;
> +
> + if (e820->map[i].type != E820_RAM) {
> + for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> + set_phys_to_machine(pfn, pfn);
> + identity += pfn - PFN_UP(start);
> + }
> + if (start > last && ((start - last) > 0)) {
> + for (pfn = PFN_UP(last); pfn < PFN_DOWN(start); pfn++)
> + set_phys_to_machine(pfn, pfn);
> + identity += pfn - PFN_UP(last);
> + }

Couldn't you just do something like:

if (e820->map[i].type != E820_RAM)
        continue;

for (pfn = PFN_UP(last); pfn < PFN_DOWN(start); pfn++)
        set_phys_to_machine(pfn, pfn);
identity += pfn - PFN_UP(last);

last = end;

ie, handle the hole and non-RAM cases together?

Also, what happens with the p2m tree mid layers in this? If you're
doing page-by-page set_phys_to_machine, won't it end up allocating them
all? How can you optimise the "large chunks of address space are
identity" case?

It would probably be cleanest to have a set_ident_phys_to_machine(start,
end) function which can do all that.

J
> + last = end;
> + }
> + return identity;
> +}
> /**
> * machine_specific_memory_setup - Hook for machine specific memory setup.
> **/
> @@ -254,6 +282,12 @@ char * __init xen_memory_setup(void)
>
> xen_add_extra_mem(extra_pages);
>
> + /*
> + * Set P2M for all non-RAM pages and E820 gaps to be identity
> + * type PFNs.
> + */
> + printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n",
> + xen_set_identity(&e820));
> return "Xen";
> }
>

2010-12-21 22:37:05

by Jeremy Fitzhardinge

Subject: Re: [PATCH 06/10] xen/setup: Only set identity mapping in E820 regions when privileged.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> We do not want to set the identity mapping on E820 reserved
> regions when running as a PV guest. This is b/c the 0->ISA_END_ADDRESS region
> would be considered identity and we would try to read DMI information
> and fail (since pfn_to_mfn(pfn)==pfn) under PV guests.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/setup.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 752c865..34fb906 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -160,7 +160,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
> if (end < start)
> continue;
>
> - if (e820->map[i].type != E820_RAM) {
> + if (xen_initial_domain() && e820->map[i].type != E820_RAM) {
> for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> set_phys_to_machine(pfn, pfn);
> identity += pfn - PFN_UP(start);

So you're relying on the fact that the ISA area is the only non-RAM e820
entry in domU? I think it would be better to do a specific exclusion
for the ISA area rather than this.

J

2010-12-21 22:37:49

by Jeremy Fitzhardinge

Subject: Re: [PATCH 07/10] xen/mmu: Work with 1-1 mappings when allocating new top/middle entries.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> We were not properly taking into account the 1-1 mappings
> when a large area of memory was ballooned out.

Could you expand on this? What does it mean?

J

> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/mmu.c | 13 ++++++++-----
> 1 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index bd02e7d..92f4fec 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -464,7 +464,7 @@ static void free_p2m_page(void *p)
> static bool alloc_p2m(unsigned long pfn)
> {
> unsigned topidx, mididx;
> - unsigned long ***top_p, **mid;
> + unsigned long ***top_p, **mid, **mid_orig;
> unsigned long *top_mfn_p, *mid_mfn;
>
> topidx = p2m_top_index(pfn);
> @@ -473,15 +473,16 @@ static bool alloc_p2m(unsigned long pfn)
> top_p = &p2m_top[topidx];
> mid = *top_p;
>
> - if (mid == p2m_mid_identity) {
> + if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
> /* Mid level is missing, allocate a new one */
> + mid_orig = mid;
> mid = alloc_p2m_page();
> if (!mid)
> return false;
>
> p2m_mid_init(mid, p2m_identity);
>
> - if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
> + if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
> free_p2m_page(mid);
> }
>
> @@ -510,9 +511,11 @@ static bool alloc_p2m(unsigned long pfn)
> p2m_top_mfn_p[topidx] = mid_mfn;
> }
>
> - if (p2m_top[topidx][mididx] == p2m_identity) {
> + if (p2m_top[topidx][mididx] == p2m_identity ||
> + p2m_top[topidx][mididx] == p2m_missing) {
> /* p2m leaf page is missing */
> unsigned long *p2m;
> + unsigned long *p2m_orig = p2m_top[topidx][mididx];
>
> p2m = alloc_p2m_page();
> if (!p2m)
> @@ -520,7 +523,7 @@ static bool alloc_p2m(unsigned long pfn)
>
> p2m_init(p2m);
>
> - if (cmpxchg(&mid[mididx], p2m_identity, p2m) != p2m_identity)
> + if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
> free_p2m_page(p2m);
> else
> mid_mfn[mididx] = virt_to_mfn(p2m);

2010-12-21 22:38:33

by Jeremy Fitzhardinge

Subject: Re: [PATCH 08/10] xen/mmu: Bugfix. Fill the top entry page with appropriate middle layer pointers.

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> If we swapped over from using a p2m_mid_identity to p2m_mid_missing
> (in an earlier call to set_phys_to_machine) and then started going through the
> PFNs in descending order to program a new MFN (balloon worker), we would
> end up in this code path. At that point we would set up a new page filled with
> pointers to p2m_identity instead of p2m_missing. This had the disastrous
> effect that get_phys_to_machine on that PFN would return a 1-1 mapping
> instead of INVALID_P2M_ENTRY, resulting in hitting a BUG check in the balloon driver.
>
Are you going to fold this into the appropriate patch later?

> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/mmu.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 92f4fec..a917439 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -480,7 +480,10 @@ static bool alloc_p2m(unsigned long pfn)
> if (!mid)
> return false;
>
> - p2m_mid_init(mid, p2m_identity);
> + if (mid == p2m_mid_identity)
> + p2m_mid_init(mid, p2m_identity);
> + else
> + p2m_mid_init(mid, p2m_missing);
>
> if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
> free_p2m_page(mid);

2010-12-21 22:41:26

by Jeremy Fitzhardinge

Subject: Re: [PATCH 02/10] xen/p2m: change p2m_missing_* to p2m_identity_*

On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> From: Konrad Rzeszutek Wilk <[email protected]>
>
> We are going to alter how we think about P2M. Most of the
> P2M contains MFN, and areas that are not populated are
> considered to be "missing". Missing means that the PFN
> is either not set for this guest (not have that much memory
> allocated) or is under the balloon driver ownership.
>
> We are instead now going to think of those not populated
> > areas as "identity." Meaning that the PFN for which
> > we would get the p2m_identity we will provide the PFN
> > value back instead of P2M_MISSING. Essentially treating
> those regions as PFN==MFN.
>

This renames missing -> identity, but does it actually change the
functionality? Doesn't it just leave it being misnamed? It would
probably be better to fold in the actual identity implementation as well.
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/mmu.c | 59 ++++++++++++++++++++++++++-------------------------
> 1 files changed, 30 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 44924e5..d6d0276 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -209,9 +209,9 @@ unsigned long xen_max_p2m_pfn __read_mostly;
> #define MAX_P2M_PFN (P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
>
> /* Placeholders for holes in the address space */
> -static RESERVE_BRK_ARRAY(unsigned long, p2m_missing, P2M_PER_PAGE);
> -static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_missing, P2M_MID_PER_PAGE);
> -static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_missing_mfn, P2M_MID_PER_PAGE);
> +static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE);
> +static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE);
> +static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_identity_mfn, P2M_MID_PER_PAGE);
>
> static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
> static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
> @@ -241,7 +241,7 @@ static void p2m_top_init(unsigned long ***top)
> unsigned i;
>
> for (i = 0; i < P2M_TOP_PER_PAGE; i++)
> - top[i] = p2m_mid_missing;
> + top[i] = p2m_mid_identity;
> }
>
> static void p2m_top_mfn_init(unsigned long *top)
> @@ -249,7 +249,7 @@ static void p2m_top_mfn_init(unsigned long *top)
> unsigned i;
>
> for (i = 0; i < P2M_TOP_PER_PAGE; i++)
> - top[i] = virt_to_mfn(p2m_mid_missing_mfn);
> + top[i] = virt_to_mfn(p2m_mid_identity_mfn);
> }
>
> static void p2m_top_mfn_p_init(unsigned long **top)
> @@ -257,7 +257,7 @@ static void p2m_top_mfn_p_init(unsigned long **top)
> unsigned i;
>
> for (i = 0; i < P2M_TOP_PER_PAGE; i++)
> - top[i] = p2m_mid_missing_mfn;
> + top[i] = p2m_mid_identity_mfn;
> }
>
> static void p2m_mid_init(unsigned long **mid)
> @@ -265,7 +265,7 @@ static void p2m_mid_init(unsigned long **mid)
> unsigned i;
>
> for (i = 0; i < P2M_MID_PER_PAGE; i++)
> - mid[i] = p2m_missing;
> + mid[i] = p2m_identity;
> }
>
> static void p2m_mid_mfn_init(unsigned long *mid)
> @@ -273,7 +273,7 @@ static void p2m_mid_mfn_init(unsigned long *mid)
> unsigned i;
>
> for (i = 0; i < P2M_MID_PER_PAGE; i++)
> - mid[i] = virt_to_mfn(p2m_missing);
> + mid[i] = virt_to_mfn(p2m_identity);
> }
>
> static void p2m_init(unsigned long *p2m)
> @@ -300,8 +300,8 @@ void xen_build_mfn_list_list(void)
>
> /* Pre-initialize p2m_top_mfn to be completely missing */
> if (p2m_top_mfn == NULL) {
> - p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
> - p2m_mid_mfn_init(p2m_mid_missing_mfn);
> + p2m_mid_identity_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + p2m_mid_mfn_init(p2m_mid_identity_mfn);
>
> p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
> p2m_top_mfn_p_init(p2m_top_mfn_p);
> @@ -310,7 +310,7 @@ void xen_build_mfn_list_list(void)
> p2m_top_mfn_init(p2m_top_mfn);
> } else {
> /* Reinitialise, mfn's all change after migration */
> - p2m_mid_mfn_init(p2m_mid_missing_mfn);
> + p2m_mid_mfn_init(p2m_mid_identity_mfn);
> }
>
> for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_PER_PAGE) {
> @@ -326,15 +326,15 @@ void xen_build_mfn_list_list(void)
> * they're just missing, just update the stored mfn,
> * since all could have changed over a migrate.
> */
> - if (mid == p2m_mid_missing) {
> + if (mid == p2m_mid_identity) {
> BUG_ON(mididx);
> - BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
> - p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn);
> + BUG_ON(mid_mfn_p != p2m_mid_identity_mfn);
> + p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_identity_mfn);
> pfn += (P2M_MID_PER_PAGE - 1) * P2M_PER_PAGE;
> continue;
> }
>
> - if (mid_mfn_p == p2m_mid_missing_mfn) {
> + if (mid_mfn_p == p2m_mid_identity_mfn) {
> /*
> * XXX boot-time only! We should never find
> * missing parts of the mfn tree after
> @@ -370,11 +370,11 @@ void __init xen_build_dynamic_phys_to_machine(void)
>
> xen_max_p2m_pfn = max_pfn;
>
> - p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
> - p2m_init(p2m_missing);
> + p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + p2m_init(p2m_identity);
>
> - p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
> - p2m_mid_init(p2m_mid_missing);
> + p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + p2m_mid_init(p2m_mid_identity);
>
> p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
> p2m_top_init(p2m_top);
> @@ -388,7 +388,7 @@ void __init xen_build_dynamic_phys_to_machine(void)
> unsigned topidx = p2m_top_index(pfn);
> unsigned mididx = p2m_mid_index(pfn);
>
> - if (p2m_top[topidx] == p2m_mid_missing) {
> + if (p2m_top[topidx] == p2m_mid_identity) {
> unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
> p2m_mid_init(mid);
>
> @@ -443,7 +443,7 @@ static bool alloc_p2m(unsigned long pfn)
> top_p = &p2m_top[topidx];
> mid = *top_p;
>
> - if (mid == p2m_mid_missing) {
> + if (mid == p2m_mid_identity) {
> /* Mid level is missing, allocate a new one */
> mid = alloc_p2m_page();
> if (!mid)
> @@ -451,7 +451,7 @@ static bool alloc_p2m(unsigned long pfn)
>
> p2m_mid_init(mid);
>
> - if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
> + if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
> free_p2m_page(mid);
> }
>
> @@ -460,9 +460,9 @@ static bool alloc_p2m(unsigned long pfn)
>
> BUG_ON(virt_to_mfn(mid_mfn) != *top_mfn_p);
>
> - if (mid_mfn == p2m_mid_missing_mfn) {
> + if (mid_mfn == p2m_mid_identity_mfn) {
> /* Separately check the mid mfn level */
> - unsigned long missing_mfn;
> + unsigned long identity_mfn;
> unsigned long mid_mfn_mfn;
>
> mid_mfn = alloc_p2m_page();
> @@ -471,15 +471,16 @@ static bool alloc_p2m(unsigned long pfn)
>
> p2m_mid_mfn_init(mid_mfn);
>
> - missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
> + identity_mfn = virt_to_mfn(p2m_mid_identity_mfn);
> mid_mfn_mfn = virt_to_mfn(mid_mfn);
> - if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
> + if (cmpxchg(top_mfn_p, identity_mfn, mid_mfn_mfn) !=
> + identity_mfn)

Don't wrap this.

> free_p2m_page(mid_mfn);
> else
> p2m_top_mfn_p[topidx] = mid_mfn;
> }
>
> - if (p2m_top[topidx][mididx] == p2m_missing) {
> + if (p2m_top[topidx][mididx] == p2m_identity) {
> /* p2m leaf page is missing */
> unsigned long *p2m;
>
> @@ -489,7 +490,7 @@ static bool alloc_p2m(unsigned long pfn)
>
> p2m_init(p2m);
>
> - if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
> + if (cmpxchg(&mid[mididx], p2m_identity, p2m) != p2m_identity)
> free_p2m_page(p2m);
> else
> mid_mfn[mididx] = virt_to_mfn(p2m);
> @@ -512,7 +513,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> mididx = p2m_mid_index(pfn);
> idx = p2m_index(pfn);
>
> - if (p2m_top[topidx][mididx] == p2m_missing)
> + if (p2m_top[topidx][mididx] == p2m_identity)
> return mfn == INVALID_P2M_ENTRY;
>
> p2m_top[topidx][mididx][idx] = mfn;

Thanks,
J

2010-12-21 23:23:18

by H. Peter Anvin

Subject: Re: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

On 12/21/2010 02:19 PM, Jeremy Fitzhardinge wrote:
>
> Also, I'm not a fan of hiding real side-effectful code in a BUG_ON
> predicate.
>

That's an understatement. There are two schools of thought about what
that should mean if BUG_ON was suppressed at compile time.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-12-22 08:37:24

by Ian Campbell

Subject: Re: [Xen-devel] [RFC PATCH v1] Consider void entries in the P2M as 1-1 mapping.

On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> In the past we used to think of those regions as "missing" and under
> the ownership of the balloon code. But the balloon code only operates
> on a specific region. This region is in the last E820 RAM page (basically
> any region past nr_pages is considered balloon type page).

That is true at start of day but once the system is up and running the
balloon driver can make a hole for anything which can be returned by
alloc_page.

The following descriptions seem to consider this correctly but I just
wanted to clarify.

I don't think it's necessarily the last E820 RAM page either, that's
just what the tools today happen to build. In principle the tools could
push down a holey e820 (e.g. with PCI holes prepunched etc) and boot the
domain ballooned down such that the N-2, N-3 e820 RAM regions are above
nr_pages too.

> This patchset considers the void entries as "identity" and for balloon
> pages you have to set the PFNs to be "missing". This means that the
> void entries are now considered 1-1, so for PFNs which exist in large
> gaps of the P2M space will return the same PFN.

I would naively have expected that a missing entry indicated an
invalid/missing entry rather than an identity region; it just seems like
the safer default since we are (maybe) more likely to catch an
INVALID_P2M_ENTRY before handing it to the hypervisor and getting
ourselves shot.

In that case the identity regions would need to be explicitly
registered, is that harder to do?

I guess we could register any hole or explicit non-RAM region in the
e820 as identity, but do we sometimes see I/O memory above the top of the
e820, or is there some other problem I'm not thinking of?

> The xen/mmu.c code where it deals with _PAGE_IOMAP can be removed, but
> to guard against regressions or bugs lets take it one patchset at a
> time.

Could we have a WARN_ON(_PAGE_IOMAP && !PAGE_IDENTITY) (or whatever the
predicates really are) in some relevant places in mmu.c?

Ian.

2010-12-22 08:50:03

by Ian Campbell

Subject: Re: [Xen-devel] [PATCH 05/10] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.

On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> @@ -254,6 +282,12 @@ char * __init xen_memory_setup(void)
>
> xen_add_extra_mem(extra_pages);
>
> + /*
> + * Set P2M for all non-RAM pages and E820 gaps to be identity
> + * type PFNs.
> + */
> + printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n",
> + xen_set_identity(&e820));

I had to look at this patch twice to find the caller of the
xen_set_identity function here. I think spending a variable here would
be worth the improved clarity.

Ian.

2010-12-22 08:50:51

by Ian Campbell

Subject: Re: [Xen-devel] Re: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

On Tue, 2010-12-21 at 22:19 +0000, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > + BUG_ON(!set_phys_to_machine(pfn, INVALID_P2M_ENTRY));
[...]
> I'm not a fan of hiding real side-effectful code in a BUG_ON
> predicate.
>
> > + BUG_ON(get_phys_to_machine(pfn) != INVALID_P2M_ENTRY);

I'm also not a fan of this "is the function I previously called buggy"
BUG_ON.

If we aren't confident that set_phys_to_machine() is doing the right
thing, then adding a post-condition check to that function would be
the preferable option.

Ian.

2010-12-22 08:53:57

by Ian Campbell

Subject: Re: [Xen-devel] [PATCH 03/10] xen/mmu: Add the notion of IDENTITY_P2M_ENTRY.

On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> P.S.
> We cannot set the IDENTITY_P2M_ENTRY in the P2M tree. This is b/c
> the toolstack considers that invalid and would abort during
> migration of the PV guest.

What do we do instead, just write the actual identity MFN value?

Does using 1UL<<30 (or 1UL<<62 on 64 bit, note that Jeremy recently
fixed FOREIGN_FRAME_BIT too) as IDENTITY_P2M_ENTRY keep the toolstack
happy while still allowing us to distinguish identity from
happens-to-be-the-same as necessary?
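
(For concreteness, mirroring the FOREIGN_FRAME macros in asm/xen/page.h -
just a sketch of the idea, untested:

#define IDENTITY_FRAME_BIT      (1UL << (BITS_PER_LONG - 2))
#define IDENTITY_FRAME(m)       ((m) | IDENTITY_FRAME_BIT)

The leaf would then store IDENTITY_FRAME(pfn), and get_phys_to_machine
would mask the bit off before handing the MFN back.)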

To some extent 1-1 I/O regions could be considered foreign mappings (of
"memory" owned by DOMIO) so perhaps it's possible that setting the
FOREIGN_FRAME_BIT is both sufficient and correct?

Ian.

2010-12-22 14:53:41

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

On Tue, Dec 21, 2010 at 02:19:40PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > This patch prepares us for the case where void entries in the P2M
> > tree structure do not necessarily imply that the pages are missing.
> > With this, we diligently set regions that will be used by the
> > balloon driver to INVALID_P2M_ENTRY, putting them under the ownership
> > of the balloon driver.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/xen/setup.c | 8 ++++++++
> > drivers/xen/balloon.c | 1 +
> > 2 files changed, 9 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index b5a7f92..d984d36 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;
> >
> > static __init void xen_add_extra_mem(unsigned long pages)
> > {
> > + unsigned long pfn;
> > +
> > u64 size = (u64)pages * PAGE_SIZE;
> > u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;
> >
> > @@ -66,6 +68,11 @@ static __init void xen_add_extra_mem(unsigned long pages)
> > xen_extra_mem_size += size;
> >
> > xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
> > +
> > + for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++) {
> > + BUG_ON(!set_phys_to_machine(pfn, INVALID_P2M_ENTRY));
>
> Use __set_phys_to_machine where you don't expect (or can't allow) any
> allocation.

Are you OK with me then moving this check:

if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
        BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
        return true;
}

from set_phys_to_machine to __set_phys_to_machine?

>
> Also, I'm not a fan of hiding real side-effectful code in a BUG_ON
> predicate.

Ok. Will expunge these.

2010-12-22 15:00:07

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 02/10] xen/p2m: change p2m_missing_* to p2m_identity_*

On Tue, Dec 21, 2010 at 02:41:23PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > From: Konrad Rzeszutek Wilk <[email protected]>
> >
> > We are going to alter how we think about P2M. Most of the
> > P2M contains MFN, and areas that are not populated are
> > considered to be "missing". Missing means that the PFN
> > is either not set for this guest (not have that much memory
> > allocated) or is under the balloon driver ownership.
> >
> > We are instead now going to think of those not populated
> > areas as "identity." Meaning that the PFN for which
> > we would get the p2m_identity we will provide the PFN
> > value back instead of P2M_MISSING. Essentially treating
> > those regions as PFN==MFN.
> >
>
> This renames missing -> identity, but does it actually change the
> functionality? Doesn't it just leave it being misnamed? It would
> probably be better to fold in the actual identity implementation as well.

You sure? It would be a lot of changes in one patch. This patch is
a nop - so no functional changes except the name change.

Let me annotate the git tree to mention this.

> > mid_mfn_mfn = virt_to_mfn(mid_mfn);
> > - if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
> > + if (cmpxchg(top_mfn_p, identity_mfn, mid_mfn_mfn) !=
> > + identity_mfn)
>
> Don't wrap this.

Checkpatch.pl was unhappy without it. I can ignore this.

2010-12-22 15:04:36

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 04/10] xen/mmu: For 1-1 mapping, automatically set _PAGE_IOMAP.

On Tue, Dec 21, 2010 at 02:29:31PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > This means that for PFNs (specifically: those in any E820 gaps
> > or non-RAM E820 regions) that have a 1-1 mapping we set the
> > _PAGE_IOMAP flag.
> >
> > Later on we could remove the _PAGE_IOMAP code handling, but
> > for right now let's keep this in so as not to introduce any bisection
> > failures across this patchset.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/xen/mmu.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 4ba7e4e..bd02e7d 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -832,6 +832,9 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> >  		pteval_t flags = val & PTE_FLAGS_MASK;
> >  		unsigned long mfn = pfn_to_mfn(pfn);
> >
> > +		if (mfn == pfn)
> > +			flags |= _PAGE_IOMAP;
>
> Why? Does it really make sense to set _PAGE_IOMAP if they just happen
> to be the same value?

Yes. Without this, user applications such as 'dmidecode' cannot get
data.

But I think with ditching a bunch of the _PAGE_IOMAP in the xen/mmu.c we can
remove this.

I would rather keep this patch as temporary scaffolding and when the
other set of patches is ready for the _PAGE_IOMAP, ditch this one.

>
> J

2010-12-22 15:04:42

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 05/10] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.

> Couldn't you just do something like:
>
> if (e820->map[i].type != E820_RAM)

I am going to assume you meant '==' here.

> 	continue;
>
> 	for (pfn = PFN_UP(last); pfn < PFN_DOWN(start); pfn++)
> 		set_phys_to_machine(pfn, pfn);
> 	identity += pfn - PFN_UP(last);
>
> 	last = end;
>
> ie, handle the hole and non-RAM cases together?

A derivation of this does work:

last = ISA_END_ADDRESS;
for (i = 0; i < e820->nr_map; i++) {
	phys_addr_t start = e820->map[i].addr;
	phys_addr_t end = start + e820->map[i].size;

	if (end < start)
		continue;

	/* Skip over the 1MB region. */
	if (last > end)
		continue;

	if (e820->map[i].type == E820_RAM) {
		/* Without saving 'last' we would end up gobbling RAM regions. */
		last = end;
		continue;
	}

	for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++)
		set_phys_to_machine(pfn, pfn);
	identity += pfn - PFN_UP(last);

	last = end;
}


>
> Also, what happens with the p2m tree mid layers in this? If you're
> doing page-by-page set_phys_to_machine, won't it end up allocating them
> all? How can you optimise the "large chunks of address space are
> identity" case?

The issue here is that when this code path is called (xen_memory_setup),
the p2m_top[topidx][mididx] has been set to start_info->mfn_list[] (by
xen_build_dynamic_phys_to_machine).

Granted, some of these entries have been evicted (by xen_return_unused_memory),
so the regions in the mfn_list are pockmarked with INVALID_P2M_ENTRY. For those
regions we set the PFN in p2m_top[topidx][mididx][idx]. We do
not allocate anything during this pass.
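
For such an already-populated leaf page, set_phys_to_machine boils down
to a plain store. A sketch of the effect, not the literal code:

	/* Leaf page exists (it points into mfn_list), so no allocation: */
	p2m_top[topidx][mididx][idx] = pfn;	/* record the 1-1 mapping */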

In the dom0_mem=max:X (or X,max:Y, where Y>X) case, which I neglected to test,
this would actually try to allocate (whoops). Let me roll up a patch for this.

>
> It would probably be cleanest to have a set_ident_phys_to_machine(start,
> end) function which can do all that.

Not sure if it is truly needed.

2010-12-22 15:06:50

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH v1] Consider void entries in the P2M as 1-1 mapping.

On Wed, Dec 22, 2010 at 08:36:55AM +0000, Ian Campbell wrote:
> On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> > In the past we used to think of those regions as "missing" and under
> > the ownership of the balloon code. But the balloon code only operates
> > on a specific region. This region is in the last E820 RAM page (basically
> > any region past nr_pages is considered a balloon type page).
>
> That is true at start of day but once the system is up and running the
> balloon driver can make a hole for anything which can be returned by
> alloc_page.

<nods>
>
> The following descriptions seem to consider this correctly but I just
> wanted to clarify.

Yes. Thank you for thinking this one through.
>
> I don't think it's necessarily the last E820 RAM page either, that's
> just what the tools today happen to build. In principle the tools could
> push down a holey e820 (e.g. with PCI holes prepunched etc) and boot the
> domain ballooned down such that the N-2, N-3 e820 RAM regions are above
> nr_pages too.

OK, but they would be marked as E820 RAM regions, right?
>
> > This patchset considers the void entries as "identity" and for balloon
> > pages you have to set the PFNs to be "missing". This means that the
> > void entries are now considered 1-1, so PFNs which exist in large
> > gaps of the P2M space will return the same PFN.
>
> I would naively have expected that a missing entry indicated an
> invalid/missing entry rather than an identity region; it just seems like

It has. For regions that are small, or already allocated, it would
stuff INVALID_P2M_ENTRY in them. For larger areas (so more than 1MB or so),
if there has not been a top entry allocated for it, it will attach
p2m_mid_missing to it, which has pointers to p2m_missing, which in
turn is filled with INVALID_P2M_ENTRY.

> the safer default since we are (maybe) more likely to catch an
> INVALID_P2M_ENTRY before handing it to the hypervisor and getting
> ourselves shot.

When I think entry, I think the lowest level of the tree, not the
top or middle, which are the ones that are by default now considered
"identity". FYI, the p2m_identity is stuffed with INVALID_P2M_ENTRY
so if somebody does get a hold of the value there somehow without
first trying to set it, we would catch it and do this:

(xen/mmu.c, pte_pfn_to_mfn function):

/*
 * If there's no mfn for the pfn, then just create an
 * empty non-present pte.  Unfortunately this loses
 * information about the original pfn, so
 * pte_mfn_to_pfn is asymmetric.
 */
if (unlikely(mfn == INVALID_P2M_ENTRY)) {
	mfn = 0;
	flags = 0;
}


>
> In that case the identity regions would need to be explicitly
> registered, is that harder to do?

It might not be.. but it would end up in the same logic path (in
the pte_pfn_to_mfn function).

>
> I guess we could register any hole or explicit non-RAM region in the
> e820 as identity but do we sometimes see I/O memory above the top of the
> e820 or is there some other problem I'm not thinking of?

Hot plug memory is one. There are also some PCI BARs that are above
that region (but I can't remember the details). Jeremy mentioned
something about Fujitsu machines.

>
> > The xen/mmu.c code where it deals with _PAGE_IOMAP can be removed, but
> > to guard against regressions or bugs lets take it one patchset at a
> > time.
>
> Could we have a WARN_ON(_PAGE_IOMAP && !PAGE_IDENTITY) (or whatever the
> predicates really are) in some relevant places in mmu.c?

The PAGE_IDENTITY (or IDENTITY_P2M_ENTRY) is never set anywhere. We could
do this:

WARN_ON(pfn_to_mfn(pfn)==pfn && (flag & _PAGE_IOMAP))

but that would be printed all the time.

Unless I saved some extra flag (as you were alluding to earlier) along
with the MFN, and for identity mappings just returned that flag unconditionally.
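
Roughly - and this is purely hypothetical, assuming we steal a high bit
of the entry that real MFNs never use:

#define IDENTITY_FRAME_BIT	(1UL << (BITS_PER_LONG - 2))
#define IDENTITY_FRAME(pfn)	((pfn) | IDENTITY_FRAME_BIT)

	/* When marking an E820 gap or reserved region as identity: */
	p2m_top[topidx][mididx][idx] = IDENTITY_FRAME(pfn);

	/* On lookup, strip the flag and hand the PFN back: */
	if (entry & IDENTITY_FRAME_BIT)
		return entry & ~IDENTITY_FRAME_BIT;

Then the WARN_ON could key off the stored flag instead of the mfn == pfn
coincidence.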

2010-12-22 15:08:28

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 06/10] xen/setup: Only set identity mapping in E820 regions when privileged.

On Tue, Dec 21, 2010 at 02:37:01PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > We do not want to set the identity mapping on E820 reserved
> > regions when running as PV. This is b/c the 0->ISA_END_ADDRESS region
> > would be considered identity and we would try to read DMI information
> > and fail (since the pfn_to_mfn(mfn)==pfn) under PV guests.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/xen/setup.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 752c865..34fb906 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -160,7 +160,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820)
> >  		if (end < start)
> >  			continue;
> >
> > -		if (e820->map[i].type != E820_RAM) {
> > +		if (xen_initial_domain() && e820->map[i].type != E820_RAM) {
> >  			for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
> >  				set_phys_to_machine(pfn, pfn);
> >  			identity += pfn - PFN_UP(start);
>
> So you're relying on the fact that the ISA area is the only non-RAM e820
> entry in domU? I think it would be better to do a specific exclusion
> for the ISA area rather than this.

Ok. I've rolled up something along the lines of what xen_return_unused_memory
does into that function.

2010-12-22 15:10:53

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 07/10] xen/mmu: Work with 1-1 mappings when allocating new top/middle entries.

On Tue, Dec 21, 2010 at 02:37:46PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > We were not properly taking under advisement the 1-1 mappings
> > when a large area of memory was ballooned out.
>
> Could you expand on this? What does it mean?

The balloon code was going from the end of its region down, and those
regions were in the p2m_missing (and p2m_mid_missing) zone. Which is
correct, except that alloc_p2m did not know how to handle this.

So it never actually allocated the middle or entry level pages
which would contain the newly added PFNs.

I've rolled this patch into the
xen/mmu: Add the notion of identity (1-1) mapping.

patch soon to be posted.

2010-12-22 15:13:15

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 08/10] xen/mmu: Bugfix. Fill the top entry page with appropriate middle layer pointers.

On Tue, Dec 21, 2010 at 02:38:30PM -0800, Jeremy Fitzhardinge wrote:
> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > If we swapped over from using a p2m_mid_identity to p2m_mid_missing
> > (earlier call to set_phys_to_machine) and then started going through the
> > PFNs in descending order to program a new MFN (balloon worker), we would
> > end up in this code path. At that point we would set up a new page filled with
> > pointers to p2m_identity instead of p2m_missing. This had the disastrous
> > effect that get_phys_to_machine on that PFN would return a 1-1 mapping
> > instead of INVALID_P2M_ENTRY, resulting in hitting a BUG check in the balloon driver.
> >
> Are you going to fold this into the appropriate patch later?

Yes. Rolled it into
xen/mmu: Add the notion of identity (1-1) mapping.
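
For reference, the BUG check it was tripping is the balloon driver's
sanity test in increase_reservation - roughly, quoting from memory:

	BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
	       phys_to_machine_mapping_valid(pfn));

With the new page wrongly filled with pointers to p2m_identity,
get_phys_to_machine returns the PFN instead of INVALID_P2M_ENTRY, the
mapping looks "valid", and the BUG_ON fires.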

2010-12-22 15:46:34

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 01/10] xen: Make all reserved pages for the balloon be INVALID_P2M_ENTRY.

On 12/22/2010 06:53 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 21, 2010 at 02:19:40PM -0800, Jeremy Fitzhardinge wrote:
>> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
>>> This patch prepares ourselves for the case where void entries in the P2M
>>> tree structure do not necessarily imply that the pages are missing.
>>> With this, we diligently set regions that will be used by the
>>> balloon driver to be INVALID_P2M_ENTRY and under the ownership
>>> of the balloon driver.
>>>
>>> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
>>> ---
>>> arch/x86/xen/setup.c | 8 ++++++++
>>> drivers/xen/balloon.c | 1 +
>>> 2 files changed, 9 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>>> index b5a7f92..d984d36 100644
>>> --- a/arch/x86/xen/setup.c
>>> +++ b/arch/x86/xen/setup.c
>>> @@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size;
>>>
>>>  static __init void xen_add_extra_mem(unsigned long pages)
>>>  {
>>> +	unsigned long pfn;
>>> +
>>>  	u64 size = (u64)pages * PAGE_SIZE;
>>>  	u64 extra_start = xen_extra_mem_start + xen_extra_mem_size;
>>>
>>> @@ -66,6 +68,11 @@ static __init void xen_add_extra_mem(unsigned long pages)
>>>  	xen_extra_mem_size += size;
>>>
>>>  	xen_max_p2m_pfn = PFN_DOWN(extra_start + size);
>>> +
>>> +	for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++) {
>>> +		BUG_ON(!set_phys_to_machine(pfn, INVALID_P2M_ENTRY));
>> Use __set_phys_to_machine where you don't expect (or can't allow) any
>> allocation.
> Are you OK with me moving then this check:
>
> if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
> 	BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
> 	return true;
> }
> from set_phys_to_machine to __set_phys_to_machine?

Yep - not that we'll see that taken on any current or near-future Xen, I
suspect.

J

2010-12-22 17:48:19

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 07/10] xen/mmu: Work with 1-1 mappings when allocating new top/middle entries.

On Wed, Dec 22, 2010 at 08:54:31AM +0000, Ian Campbell wrote:
> On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> > We were not properly taking under advisement the 1-1 mappings
> > when a large area of memory was ballooned out.
>
> Are we lazily allocating the p2m tree nodes for regions initially
> covered by the balloon? (perhaps we have always done this and it isn't
> new with this series)

Before: Yes
Now: Yes.
>
> Would it be simpler to always populate enough tree nodes to cover the
> ballooned area as well as nr_pages at start of day and therefore avoid
> worrying about it later on (except for memory hotplug which is special
> in this way already)?

Tried that, ran out of reserved_brk space :-) But I am not sure what
we would gain from this - it is not always guaranteed that we will populate
up to the 'maxmem' region.
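
To put numbers on it (back-of-the-envelope, hypothetical 64-bit guest):

	/*
	 * One leaf page holds PAGE_SIZE / sizeof(unsigned long) = 512
	 * entries, i.e. it covers 512 * 4KiB = 2MiB of pseudo-physical
	 * space.  Pre-populating up to a 64GiB maxmem would need
	 * 64GiB / 2MiB = 32768 leaf pages = 128MiB of RESERVE_BRK.
	 */

which is why the reserved_brk approach falls over.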

2010-12-22 18:01:24

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH v1] Consider void entries in the P2M as 1-1 mapping.

> > OK, but they would be marked as E820 RAM regions, right?
>
> Yes. There's no special E820 type for ballooned out RAM.

Whew, good.
>
> > It has. For regions that are small, or already allocated, it would
> > stuff INVALID_P2M_ENTRY in them. For larger areas (so more than 1MB or so),
> > if there has not been a top entry allocated for it, it will attach
> > p2m_mid_missing to it, which has pointers to p2m_missing, which in
> > turn is filled with INVALID_P2M_ENTRY.
>
> Hrm, I think I'm probably just confused by the missing vs. invalid vs.
> void terminology and conflating it all with IDENTITY/INVALID_P2M_ENTRY
> and getting in a mess.

I should do a better job explaining this. Will attach some pictures next
time.

>
> > > the safer default since we are (maybe) more likely to catch an
> > > INVALID_P2M_ENTRY before handing it to the hypervisor and getting
> > > ourselves shot.
> >
> > When I think entry, I think the lowest level of the tree, not the
> > top or middle, which are the ones that are by default now considered
> > "identity".
>
> "now" before this series or "now" after?

After.
>
> I think the default value for a lookup of an uninitialised entry should
> be the same regardless of whether the mid levels of the tree happen to
> be filled in (or pointing to common placeholder entries) or not. Is that
> the case?

Yes. But there are no uninitialized entries. All of them are either
INVALID_P2M_ENTRY or have a PFN value (with some potential flags attached to them).

Nothing else is allowed.
>
> > FYI, the p2m_identity is stuffed with INVALID_P2M_ENTRY
> > so if somebody does get a hold of the value there somehow without
> > first trying to set it, we would catch it and do this:
>
> p2m_identity is filled with INVALID_P2M_ENTRY? No wonder I'm confused by
> the names ;-) Why isn't it either called p2m_invalid or filled with

I am using both the 'p2m_missing' and 'p2m_identity' pointers as a way
to figure out if the entries are considered missing (so up for balloon
grabs) or identity PFNs. If it is neither p2m_missing nor p2m_identity, it means
it has been allocated (probably via alloc_p2m) and contains PFNs (which
might be INVALID_P2M_ENTRY if the balloon plucks that page out, a real MFN,
or a 1-1 value if an E820 gap or reserved region falls within that entry).

The contents of both pages (p2m_missing and p2m_identity) are INVALID_P2M_ENTRY.
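
In other words, the lookup can tell the three cases apart by pointer
comparison alone. A simplified sketch of the logic described above, not
the literal code:

	unsigned long p2m_lookup(unsigned long pfn)
	{
		unsigned topidx = p2m_top_index(pfn);
		unsigned mididx = p2m_mid_index(pfn);
		unsigned idx = p2m_index(pfn);
		unsigned long *leaf = p2m_top[topidx][mididx];

		if (leaf == p2m_identity)
			return pfn;			/* 1-1 region */
		if (leaf == p2m_missing)
			return INVALID_P2M_ENTRY;	/* balloon territory */
		/* allocated leaf: real MFN, or INVALID if ballooned out */
		return leaf[idx];
	}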

> IDENTITY_P2M_ENTRY?

The value 0 would make the toolstack throw a fit during migration.
>
> > It might not be.. but it would end up in the same logic path (in
> > the pte_pfn_to_mfn function).
>
> Sure.
>
> My concern isn't about this bit but rather about what accesses to unknown
> entries return. Currently I think they return INVALID_P2M_ENTRY but you
> are proposing that they return identity instead, which seems wrong for

Correct.
> anything which isn't explicitly initialised as I/O (or identity for any
> other reason).

Aha! And this is what we are fixing. You see, a lot of drivers don't explicitly
initialize their vmaps as I/O (or do mark them VM_IO but actually use real RAM).
This makes it possible to work with those guys.


I think what you are saying is to be more conservative and only set those implicit
1-1 mappings on E820 gaps, and on non-RAM E820 regions.

Everything else should be considered missing, so that pfn_to_mfn(MAX_P2M_PFN)
would return INVALID_P2M_ENTRY instead of MAX_P2M_PFN?

>
> > >
> > > > The xen/mmu.c code where it deals with _PAGE_IOMAP can be removed, but
> > > > to guard against regressions or bugs lets take it one patchset at a
> > > > time.
> > >
> > > Could we have a WARN_ON(_PAGE_IOMAP && !PAGE_IDENTITY) (or whatever the
> > > predicates really are) in some relevant places in mmu.c?
> >
> > The PAGE_IDENTITY (or IDENTITY_P2M_ENTRY) is never set anywhere.
>
> So how is it used? I don't see it apart from in a single BUG_ON and some
> comments. Do you just rely on IDENTITY_P2M_ENTRY==0 and things being
> filled with 0 by default?

No. INVALID_P2M_ENTRY.

Now that I think of it, I am not sure why I even introduced the
IDENTITY_P2M_ENTRY. It sure is confusing.

2010-12-22 16:28:24

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 04/10] xen/mmu: For 1-1 mapping, automatically set _PAGE_IOMAP.

On Wed, 2010-12-22 at 15:02 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 21, 2010 at 02:29:31PM -0800, Jeremy Fitzhardinge wrote:
> > On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
> > > @@ -832,6 +832,9 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> > >  		pteval_t flags = val & PTE_FLAGS_MASK;
> > >  		unsigned long mfn = pfn_to_mfn(pfn);
> > >
> > > +		if (mfn == pfn)
> > > +			flags |= _PAGE_IOMAP;
> >
> > Why? Does it really make sense to set _PAGE_IOMAP if they just happen
> > to be the same value?
>
> Yes. Without this, user applications such as 'dmidecode' cannot get
> data.

I think Jeremy's point is that the "mfn == pfn" here has false
positives, since it is possible that a normal RAM page has identical mfn
and pfn by pure chance.
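
For instance (hypothetical numbers):

	unsigned long pfn = 0x5000;
	/* ordinary RAM that just happens to be backed by frame 0x5000: */
	unsigned long mfn = pfn_to_mfn(pfn);

	/* "mfn == pfn" is true, so the pte would get _PAGE_IOMAP even
	 * though this is normal memory. */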

Ian.

2010-12-22 17:05:20

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH 07/10] xen/mmu: Work with 1-1 mappings when allocating new top/middle entries.

On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> We were not properly taking under advisement the 1-1 mappings
> when a large area of memory was ballooned out.

Are we lazily allocating the p2m tree nodes for regions initially
covered by the balloon? (perhaps we have always done this and it isn't
new with this series)

Would it be simpler to always populate enough tree nodes to cover the
ballooned area as well as nr_pages at start of day and therefore avoid
worrying about it later on (except for memory hotplug which is special
in this way already)?

Ian.

>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/mmu.c | 13 ++++++++-----
> 1 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index bd02e7d..92f4fec 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -464,7 +464,7 @@ static void free_p2m_page(void *p)
>  static bool alloc_p2m(unsigned long pfn)
>  {
>  	unsigned topidx, mididx;
> -	unsigned long ***top_p, **mid;
> +	unsigned long ***top_p, **mid, **mid_orig;
>  	unsigned long *top_mfn_p, *mid_mfn;
>
>  	topidx = p2m_top_index(pfn);
> @@ -473,15 +473,16 @@ static bool alloc_p2m(unsigned long pfn)
>  	top_p = &p2m_top[topidx];
>  	mid = *top_p;
>
> -	if (mid == p2m_mid_identity) {
> +	if (mid == p2m_mid_identity || mid == p2m_mid_missing) {
>  		/* Mid level is missing, allocate a new one */
> +		mid_orig = mid;
>  		mid = alloc_p2m_page();
>  		if (!mid)
>  			return false;
>
>  		p2m_mid_init(mid, p2m_identity);
>
> -		if (cmpxchg(top_p, p2m_mid_identity, mid) != p2m_mid_identity)
> +		if (cmpxchg(top_p, mid_orig, mid) != mid_orig)
>  			free_p2m_page(mid);
>  	}
>
> @@ -510,9 +511,11 @@ static bool alloc_p2m(unsigned long pfn)
>  		p2m_top_mfn_p[topidx] = mid_mfn;
>  	}
>
> -	if (p2m_top[topidx][mididx] == p2m_identity) {
> +	if (p2m_top[topidx][mididx] == p2m_identity ||
> +	    p2m_top[topidx][mididx] == p2m_missing) {
>  		/* p2m leaf page is missing */
>  		unsigned long *p2m;
> +		unsigned long *p2m_orig = p2m_top[topidx][mididx];
>
>  		p2m = alloc_p2m_page();
>  		if (!p2m)
> @@ -520,7 +523,7 @@ static bool alloc_p2m(unsigned long pfn)
>
>  		p2m_init(p2m);
>
> -		if (cmpxchg(&mid[mididx], p2m_identity, p2m) != p2m_identity)
> +		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
>  			free_p2m_page(p2m);
>  		else
>  			mid_mfn[mididx] = virt_to_mfn(p2m);

2010-12-22 16:26:21

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH v1] Consider void entries in the P2M as 1-1 mapping.

On Wed, 2010-12-22 at 15:06 +0000, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 22, 2010 at 08:36:55AM +0000, Ian Campbell wrote:
> > On Tue, 2010-12-21 at 21:37 +0000, Konrad Rzeszutek Wilk wrote:
> > > In the past we used to think of those regions as "missing" and under
> > > the ownership of the balloon code. But the balloon code only operates
> > > on a specific region. This region is in the last E820 RAM page (basically
> > > any region past nr_pages is considered a balloon type page).
> >
> > That is true at start of day but once the system is up and running the
> > balloon driver can make a hole for anything which can be returned by
> > alloc_page.
>
> <nods>
> >
> > The following descriptions seem to consider this correctly but I just
> > wanted to clarify.
>
> Yes. Thank you for thinking this one through.
> >
> > I don't think it's necessarily the last E820 RAM page either, that's
> > just what the tools today happen to build. In principle the tools could
> > push down a holey e820 (e.g. with PCI holes prepunched etc) and boot the
> > domain ballooned down such that the N-2, N-3 e820 RAM regions are above
> > nr_pages too.
>
> OK, but they would be marked as E820 RAM regions, right?

Yes. There's no special E820 type for ballooned out RAM.

> > > This patchset considers the void entries as "identity" and for balloon
> > > pages you have to set the PFNs to be "missing". This means that the
> > > void entries are now considered 1-1, so PFNs which exist in large
> > > gaps of the P2M space will return the same PFN.
> >
> > I would naively have expected that a missing entry indicated an
> > invalid/missing entry rather than an identity region; it just seems like
>
> It has. For regions that are small, or already allocated, it would
> stuff INVALID_P2M_ENTRY in them. For larger areas (so more than 1MB or so),
> if there has not been a top entry allocated for it, it will attach
> p2m_mid_missing to it, which has pointers to p2m_missing, which in
> turn is filled with INVALID_P2M_ENTRY.

Hrm, I think I'm probably just confused by the missing vs. invalid vs.
void terminology and conflating it all with IDENTITY/INVALID_P2M_ENTRY
and getting in a mess.

> > the safer default since we are (maybe) more likely to catch an
> > INVALID_P2M_ENTRY before handing it to the hypervisor and getting
> > ourselves shot.
>
> When I think entry, I think the lowest level of the tree, not the
> top or middle, which are the ones that are by default now considered
> "identity".

"now" before this series or "now" after?

I think the default value for a lookup of an uninitialised entry should
be the same regardless of whether the mid levels of the tree happen to
be filled in (or pointing to common placeholder entries) or not. Is that
the case?

> FYI, the p2m_identity is stuffed with INVALID_P2M_ENTRY
> so if somebody does get a hold of the value there somehow without
> first trying to set it, we would catch it and do this:

p2m_identity is filled with INVALID_P2M_ENTRY? No wonder I'm confused by
the names ;-) Why isn't it either called p2m_invalid or filled with
IDENTITY_P2M_ENTRY?

> It might not be.. but it would end up in the same logic path (in
> the pte_pfn_to_mfn function).

Sure.

My concern isn't about this bit but rather about what accesses to unknown
entries return. Currently I think they return INVALID_P2M_ENTRY but you
are proposing that they return identity instead, which seems wrong for
anything which isn't explicitly initialised as I/O (or identity for any
other reason).

> >
> > > The xen/mmu.c code where it deals with _PAGE_IOMAP can be removed, but
> > > to guard against regressions or bugs lets take it one patchset at a
> > > time.
> >
> > Could we have a WARN_ON(_PAGE_IOMAP && !PAGE_IDENTITY) (or whatever the
> > predicates really are) in some relevant places in mmu.c?
>
> The PAGE_IDENTITY (or IDENTITY_P2M_ENTRY) is never set anywhere.

So how is it used? I don't see it apart from in a single BUG_ON and some
comments. Do you just rely on IDENTITY_P2M_ENTRY==0 and things being
filled with 0 by default?

> We could
> do this:
>
> WARN_ON(pfn_to_mfn(pfn)==pfn && (flag & _PAGE_IOMAP))
>
> but that would be printed all the time.
>
> Unless I saved some extra flag (as you were alluding to earlier) along
> with the MFN, and for identity mappings just returned that flag unconditionally.

2010-12-22 20:36:27

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 02/10] xen/p2m: change p2m_missing_* to p2m_identity_*

On 12/22/2010 06:59 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 21, 2010 at 02:41:23PM -0800, Jeremy Fitzhardinge wrote:
>> On 12/21/2010 01:37 PM, Konrad Rzeszutek Wilk wrote:
>>> From: Konrad Rzeszutek Wilk <[email protected]>
>>>
>>> We are going to alter how we think about P2M. Most of the
>>> P2M contains MFN, and areas that are not populated are
>>> considered to be "missing". Missing means that the PFN
>>> is either not set for this guest (it does not have that much
>>> memory allocated) or is under the balloon driver's ownership.
>>>
>>> We are instead now going to think of those not populated
>>> areas as "identity." Meaning that for a PFN for which we
>>> would get p2m_identity, we will provide the PFN value back
>>> instead of P2M_MISSING. Essentially treating those regions
>>> as PFN==MFN.
>>>
>> This renames missing -> identity, but does it actually change the
>> functionality? Doesn't it just leave it being misnamed? It would
>> probably be better to fold in the actual identity implementation as well.
> You sure? It would be a lot of changes in one patch. This patch is
> a nop - so no functional changes except the name change.
>
> Let me annotate the git tree to mention this.

Yeah, I'm in two minds. I like small single-purpose patches, but the
rename really does leave things v. misnamed. I guess it doesn't really
matter for one commit, so long as it's still bisectable (and the commit
comment makes it clear that the name is misleading).

>>>  		mid_mfn_mfn = virt_to_mfn(mid_mfn);
>>> -		if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
>>> +		if (cmpxchg(top_mfn_p, identity_mfn, mid_mfn_mfn) !=
>>> +				identity_mfn)
>> Don't wrap this.
> Checkpatch.pl was unhappy without it. I can ignore this.

Checkpatch is generally wrong on the subject of long lines.

J