I recently implemented kdump for pv-on-hvm Xen guests.
One issue remains:
The xen_balloon driver in the guest frees guest pages and gives them
back to the hypervisor. These pages are marked as mmio in the
hypervisor. When such a page is read via the /proc/vmcore interface,
the hypervisor calls the qemu-dm process. qemu-dm tries to map the
page; the attempt fails because the page is not backed by ram, and
0xff is returned. All this generates high load in dom0 because the
reads come in as 8-byte requests.
There seems to be no way to make the crash kernel aware of the state of
individual pages in the crashed kernel; it knows nothing about memory
ballooning. And doing that from within the "kernel to crash" seems error
prone. Since fragmentation will only increase over time, it would be best
if the crash kernel itself queried the state of oldmem pages.
If copy_oldmem_page() called a hook provided by the Xen pv-on-hvm
drivers to query whether the pfn to read from is really backed by ram,
the load issue could be avoided. Unfortunately, even Xen needs a new
interface to query the state of individual hvm guest pfns for the
purpose mentioned above.
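For illustration, a rough sketch of what such a check could look like in
read_from_oldmem() (the names here are only placeholders for the idea):

/* hook a Xen pv-on-hvm driver could register; > 0 ram, 0 not ram, < 0 error */
static int (*oldmem_pfn_is_ram)(unsigned long pfn);

/* in read_from_oldmem(), before copying the page: */
if (oldmem_pfn_is_ram && oldmem_pfn_is_ram(pfn) == 0)
	memset(buf, 0, nr_bytes);	/* not backed by ram, return zeros */
else
	tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);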
Another issue, slightly related, is memory hotplug.
How is this currently handled for kdump? Is there code which
automatically reconfigures the kdump kernel with the new memory ranges?
Olaf
On Thu, Apr 7, 2011 at 5:56 PM, Olaf Hering <[email protected]> wrote:
>
> I recently implemented kdump for pv-on-hvm Xen guests.
>
> One issue remains:
> The xen_balloon driver in the guest frees guest pages and gives them
> back to the hypervisor. These pages are marked as mmio in the
> hypervisor. When such a page is read via the /proc/vmcore interface,
> the hypervisor calls the qemu-dm process. qemu-dm tries to map the
> page; the attempt fails because the page is not backed by ram, and
> 0xff is returned. All this generates high load in dom0 because the
> reads come in as 8-byte requests.
>
> There seems to be no way to make the crash kernel aware of the state of
> individual pages in the crashed kernel; it knows nothing about memory
> ballooning. And doing that from within the "kernel to crash" seems error
> prone. Since fragmentation will only increase over time, it would be best
> if the crash kernel itself queried the state of oldmem pages.
>
> If copy_oldmem_page() called a hook provided by the Xen pv-on-hvm
> drivers to query whether the pfn to read from is really backed by ram,
> the load issue could be avoided. Unfortunately, even Xen needs a new
> interface to query the state of individual hvm guest pfns for the
> purpose mentioned above.
This makes sense to me; we might need a Xen-specific copy_oldmem_page()
hook and a native hook.
>
> Another issue, slightly related, is memory hotplug.
> How is this currently handled for kdump? Is there code which
> automatically reconfigures the kdump kernel with the new memory ranges?
>
No, the crashkernel memory is reserved during boot, and it is static after
that (except you can shrink this memory via /sys). Kdump isn't aware of
memory hotplug.
Thanks.
On Thu, Apr 07, Américo Wang wrote:
> On Thu, Apr 7, 2011 at 5:56 PM, Olaf Hering <[email protected]> wrote:
> > Another issue, slightly related, is memory hotplug.
> > How is this currently handled for kdump? Is there code which
> > automatically reconfigures the kdump kernel with the new memory ranges?
> >
>
> No, the crashkernel memory is reserved during boot, and it is static after
> that (except you can shrink this memory via /sys).
I meant the overall amount of memory changed by memory hotplug events,
not the small memory range for the crash kernel itself.
> Kdump isn't aware of memory hotplug.
Ok. Perhaps there are hotplug events where a helper script could run
something like 'rckdump restart'.
Olaf
On Thu, Apr 7, 2011 at 9:12 PM, Olaf Hering <[email protected]> wrote:
> On Thu, Apr 07, Américo Wang wrote:
>
>> On Thu, Apr 7, 2011 at 5:56 PM, Olaf Hering <[email protected]> wrote:
>> > Another issue, slightly related, is memory hotplug.
>> > How is this currently handled for kdump? Is there code which
>> > automatically reconfigures the kdump kernel with the new memory ranges?
>> >
>>
>> No, the crashkernel memory is reserved during boot, and it is static after
>> that (except you can shrink this memory via /sys).
>
> I meant the overall amount of memory changed by memory hotplug events,
> not the small memory range for the crash kernel itself.
>
>> Kdump isn't aware of memory hotplug.
>
> Ok. Perhaps there are hotplug events where a helper script could run
> something like 'rckdump restart'.
>
That will not work.
You need to change the kernel to make the crashkernel memory aware
of memory hotplug, which means searching again for a suitable
memory range in the newly added memory.
Thanks.
On Thu, Apr 07, Olaf Hering wrote:
> I recently implemented kdump for pv-on-hvm Xen guests.
>
> One issue remains:
> The xen_balloon driver in the guest frees guest pages and gives them
> back to the hypervisor. These pages are marked as mmio in the
> hypervisor. When such a page is read via the /proc/vmcore interface,
> the hypervisor calls the qemu-dm process. qemu-dm tries to map the
> page; the attempt fails because the page is not backed by ram, and
> 0xff is returned. All this generates high load in dom0 because the
> reads come in as 8-byte requests.
>
> There seems to be no way to make the crash kernel aware of the state of
> individual pages in the crashed kernel; it knows nothing about memory
> ballooning. And doing that from within the "kernel to crash" seems error
> prone. Since fragmentation will only increase over time, it would be best
> if the crash kernel itself queried the state of oldmem pages.
Here is a version that works for me.
A hook is called for each pfn. If the hook returns 0 the pfn is not ram,
and the buffer is zeroed, which keeps the vmcore file sparse.
Otherwise the pfn is treated as ram. In the worst case each read attempt
is handled by the qemu-dm process, which causes just some load in dom0.
This patch still lacks locking.
How can I make sure unregister_oldmem_pfn_is_ram() is not called while
the loop in read_from_oldmem() is still active?
Is there an example of similar code already in the kernel?
Olaf
---
fs/proc/vmcore.c | 29 ++++++++++++++++++++++++++---
include/linux/crash_dump.h | 5 +++++
2 files changed, 31 insertions(+), 3 deletions(-)
Index: linux-2.6.39-rc5/fs/proc/vmcore.c
===================================================================
--- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
+++ linux-2.6.39-rc5/fs/proc/vmcore.c
@@ -35,6 +35,22 @@ static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore = NULL;
+/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
+static int (*oldmem_pfn_is_ram)(unsigned long pfn);
+
+void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
+{
+ oldmem_pfn_is_ram = fn;
+}
+
+void unregister_oldmem_pfn_is_ram(void)
+{
+ oldmem_pfn_is_ram = NULL;
+}
+
+EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
+EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
+
/* Reads a page from the oldmem device from given offset. */
static ssize_t read_from_oldmem(char *buf, size_t count,
u64 *ppos, int userbuf)
@@ -42,6 +58,7 @@ static ssize_t read_from_oldmem(char *bu
unsigned long pfn, offset;
size_t nr_bytes;
ssize_t read = 0, tmp;
+ int (*fn)(unsigned long);
if (!count)
return 0;
@@ -55,9 +72,15 @@ static ssize_t read_from_oldmem(char *bu
else
nr_bytes = count;
- tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
- if (tmp < 0)
- return tmp;
+ fn = oldmem_pfn_is_ram;
+ /* if pfn is not ram, return zeros for spares dump files */
+ if (fn && fn(pfn) == 0)
+ memset(buf, 0, nr_bytes);
+ else {
+ tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
+ if (tmp < 0)
+ return tmp;
+ }
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
Index: linux-2.6.39-rc5/include/linux/crash_dump.h
===================================================================
--- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
+++ linux-2.6.39-rc5/include/linux/crash_dump.h
@@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
if (is_kdump_kernel())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
+
+#define HAVE_OLDMEM_PFN_IS_RAM 1
+extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));
+extern void unregister_oldmem_pfn_is_ram(void);
+
#else /* !CONFIG_CRASH_DUMP */
static inline int is_kdump_kernel(void) { return 0; }
#endif /* CONFIG_CRASH_DUMP */
The balloon driver in a Xen guest frees guest pages and marks them as
mmio. When the kernel crashes and the crash kernel attempts to read the
oldmem via /proc/vmcore, a read from ballooned pages will generate 100%
load in dom0 because Xen asks qemu-dm for the page content. Since the
reads come in as 8-byte requests, each ballooned page is tried 512 times.
With this change a hook can be registered which checks whether the given
pfn is really ram. The hook has to return a value > 0 for ram pages, a
value < 0 on error (for example if the hypercall is not known) and 0 for
non-ram pages.
This will reduce the time to read /proc/vmcore. Without this change a
512M guest with a 128M crashkernel region needs 200 seconds to read it;
with this change it takes just 2 seconds.
Signed-off-by: Olaf Hering <[email protected]>
---
fs/proc/vmcore.c | 50 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/crash_dump.h | 5 ++++
2 files changed, 52 insertions(+), 3 deletions(-)
Index: linux-2.6.39-rc5/fs/proc/vmcore.c
===================================================================
--- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
+++ linux-2.6.39-rc5/fs/proc/vmcore.c
@@ -18,6 +18,7 @@
#include <linux/init.h>
#include <linux/crash_dump.h>
#include <linux/list.h>
+#include <linux/wait.h>
#include <asm/uaccess.h>
#include <asm/io.h>
@@ -35,6 +36,44 @@ static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore = NULL;
+/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
+static int (*oldmem_pfn_is_ram)(unsigned long pfn);
+static DECLARE_WAIT_QUEUE_HEAD(oldmem_fn_waitq);
+static atomic_t oldmem_fn_refcount = ATOMIC_INIT(0);
+
+void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
+{
+ if (oldmem_pfn_is_ram == NULL)
+ oldmem_pfn_is_ram = fn;
+}
+
+void unregister_oldmem_pfn_is_ram(void)
+{
+ wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
+ oldmem_pfn_is_ram = NULL;
+ wmb();
+}
+
+static int pfn_is_ram(unsigned long pfn)
+{
+ int (*fn)(unsigned long);
+ /* pfn is ram unless fn() checks pagetype */
+ int ret = 1;
+
+ atomic_inc(&oldmem_fn_refcount);
+ smp_mb__after_atomic_inc();
+ fn = oldmem_pfn_is_ram;
+ if (fn)
+ ret = fn(pfn);
+ if (atomic_dec_and_test(&oldmem_fn_refcount))
+ wake_up(&oldmem_fn_waitq);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
+EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
+
/* Reads a page from the oldmem device from given offset. */
static ssize_t read_from_oldmem(char *buf, size_t count,
u64 *ppos, int userbuf)
@@ -55,9 +94,14 @@ static ssize_t read_from_oldmem(char *bu
else
nr_bytes = count;
- tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
- if (tmp < 0)
- return tmp;
+ /* if pfn is not ram, return zeros for spares dump files */
+ if (pfn_is_ram(pfn) == 0)
+ memset(buf, 0, nr_bytes);
+ else {
+ tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
+ if (tmp < 0)
+ return tmp;
+ }
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
Index: linux-2.6.39-rc5/include/linux/crash_dump.h
===================================================================
--- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
+++ linux-2.6.39-rc5/include/linux/crash_dump.h
@@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
if (is_kdump_kernel())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
+
+#define HAVE_OLDMEM_PFN_IS_RAM 1
+extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));
+extern void unregister_oldmem_pfn_is_ram(void);
+
#else /* !CONFIG_CRASH_DUMP */
static inline int is_kdump_kernel(void) { return 0; }
#endif /* CONFIG_CRASH_DUMP */
On Tue, 3 May 2011 21:08:06 +0200
Olaf Hering <[email protected]> wrote:
>
> The balloon driver in a Xen guest frees guest pages and marks them as
> mmio. When the kernel crashes and the crash kernel attempts to read the
> oldmem via /proc/vmcore, a read from ballooned pages will generate 100%
> load in dom0 because Xen asks qemu-dm for the page content. Since the
> reads come in as 8-byte requests, each ballooned page is tried 512 times.
>
> With this change a hook can be registered which checks whether the given
> pfn is really ram. The hook has to return a value > 0 for ram pages, a
> value < 0 on error (for example if the hypercall is not known) and 0 for
> non-ram pages.
>
> This will reduce the time to read /proc/vmcore. Without this change a
> 512M guest with a 128M crashkernel region needs 200 seconds to read it;
> with this change it takes just 2 seconds.
Seems reasonable, I suppose.
Is there some suitable ifdef we can put around this stuff to avoid
adding it to kernel builds which will never use it?
> ...
>
> --- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
> +++ linux-2.6.39-rc5/fs/proc/vmcore.c
> @@ -18,6 +18,7 @@
> #include <linux/init.h>
> #include <linux/crash_dump.h>
> #include <linux/list.h>
> +#include <linux/wait.h>
> #include <asm/uaccess.h>
> #include <asm/io.h>
>
> @@ -35,6 +36,44 @@ static u64 vmcore_size;
>
> static struct proc_dir_entry *proc_vmcore = NULL;
>
> +/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
> +static int (*oldmem_pfn_is_ram)(unsigned long pfn);
> +static DECLARE_WAIT_QUEUE_HEAD(oldmem_fn_waitq);
> +static atomic_t oldmem_fn_refcount = ATOMIC_INIT(0);
> +
> +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> +{
> + if (oldmem_pfn_is_ram == NULL)
> + oldmem_pfn_is_ram = fn;
> +}
This is racy, and it should return a success code. And we may as well
mark it __must_check to prevent people from cheating.
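A minimal sketch of that (illustrative only; cmpxchg is used here so the
registration race goes away as well):

int __must_check register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn))
{
	if (cmpxchg(&oldmem_pfn_is_ram, NULL, fn) != NULL)
		return -EBUSY;	/* a hook is already registered */
	return 0;
}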
> +void unregister_oldmem_pfn_is_ram(void)
> +{
> + wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> + oldmem_pfn_is_ram = NULL;
> + wmb();
> +}
I'd say we should do away with the (also racy) refcount thing.
Instead, require that callers not be using the thing when they
unregister. ie: that callers not be buggy.
> +static int pfn_is_ram(unsigned long pfn)
> +{
> + int (*fn)(unsigned long);
> + /* pfn is ram unless fn() checks pagetype */
> + int ret = 1;
> +
> + atomic_inc(&oldmem_fn_refcount);
> + smp_mb__after_atomic_inc();
> + fn = oldmem_pfn_is_ram;
> + if (fn)
> + ret = fn(pfn);
> + if (atomic_dec_and_test(&oldmem_fn_refcount))
> + wake_up(&oldmem_fn_waitq);
> +
> + return ret;
> +}
This function would have been a suitable place at which to document the
entire feature. As it stands, anyone who is reading this code won't
have any clue why it exists.
> +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
Each export should be placed immediately after the function which is
being exported, please. Checkpatch reports this. Please use checkpatch.
> /* Reads a page from the oldmem device from given offset. */
> static ssize_t read_from_oldmem(char *buf, size_t count,
> u64 *ppos, int userbuf)
> @@ -55,9 +94,14 @@ static ssize_t read_from_oldmem(char *bu
> else
> nr_bytes = count;
>
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> - if (tmp < 0)
> - return tmp;
> + /* if pfn is not ram, return zeros for spares dump files */
typo.
Also, sentences start with capital letters!
> + if (pfn_is_ram(pfn) == 0)
> + memset(buf, 0, nr_bytes);
> + else {
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> + if (tmp < 0)
> + return tmp;
> + }
> *ppos += nr_bytes;
> count -= nr_bytes;
> buf += nr_bytes;
> Index: linux-2.6.39-rc5/include/linux/crash_dump.h
> ===================================================================
> --- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
> +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
> if (is_kdump_kernel())
> elfcorehdr_addr = ELFCORE_ADDR_ERR;
> }
> +
> +#define HAVE_OLDMEM_PFN_IS_RAM 1
What's this for?
> +extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));
"unsigned long pfn" in the declaration, please. This has good
documentation value.
> +extern void unregister_oldmem_pfn_is_ram(void);
On Thu, May 05, Andrew Morton wrote:
> On Tue, 3 May 2011 21:08:06 +0200
> Olaf Hering <[email protected]> wrote:
> > This will reduce the time to read /proc/vmcore. Without this change a
> > 512M guest with a 128M crashkernel region needs 200 seconds to read it;
> > with this change it takes just 2 seconds.
>
> Seems reasonable, I suppose.
Andrew,
Thanks for your feedback.
> Is there some suitable ifdef we can put around this stuff to avoid
> adding it to kernel builds which will never use it?
The change is for pv-on-hvm guests. In this setup the (out-of-tree)
paravirtualized drivers shut down the emulated hardware and then
communicate directly with the backend.
There is no ifdef right now. I guess at some point, when xen is fully
merged, this hook can be put behind some CONFIG_XEN_PV_ON_HVM option.
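Something along these lines, perhaps (CONFIG_XEN_PV_ON_HVM is purely
hypothetical here, no such option exists yet):

#ifdef CONFIG_XEN_PV_ON_HVM	/* hypothetical config option */
	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
#endif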
> > +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> > +{
> > + if (oldmem_pfn_is_ram == NULL)
> > + oldmem_pfn_is_ram = fn;
> > +}
>
> This is racy, and it should return a success code. And we may as well
> mark it __must_check to prevent people from cheating.
I will update that part.
> > +void unregister_oldmem_pfn_is_ram(void)
> > +{
> > + wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> > + oldmem_pfn_is_ram = NULL;
> > + wmb();
> > +}
>
> I'd say we should do away with the (also racy) refcount thing.
> Instead, require that callers not be using the thing when they
> unregister. ie: that callers not be buggy.
I think oldmem_pfn_is_ram can be cleared unconditionally; the NULL check
in pfn_is_ram() below will prevent a crash.
The whole refcount thing is there to prevent a module unload while
pfn_is_ram() is calling the hook; in other words, the called code should
not go away until the hook has returned to pfn_is_ram().
Should the called code increase/decrease the module's refcount instead?
I remember there was some MODULE_INC/MODULE_DEC macro (can't remember the
exact name) at some point. What needs to be done inside the module to
prevent an unload while it's still in use? Is it __module_get/module_put
for each call of fn()?
The called function, which will go into the xen source at some point, is
shown below. HVMOP_get_mem_type was just merged into xen-unstable.
xen-unstable.hg/unmodified_drivers/linux-2.6/platform-pci/platform-pci.c
#ifdef HAVE_OLDMEM_PFN_IS_RAM
static int xen_oldmem_pfn_is_ram(unsigned long pfn)
{
	struct xen_hvm_get_mem_type a;
	int ret;

	a.domid = DOMID_SELF;
	a.pfn = pfn;
	if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
		return -ENXIO;

	switch (a.mem_type) {
	case HVMMEM_mmio_dm:
		ret = 0;
		break;
	case HVMMEM_ram_rw:
	case HVMMEM_ram_ro:
	default:
		ret = 1;
		break;
	}
	return ret;
}
#endif

static int __devinit platform_pci_init(...)
{
	/* other init stuff */
#ifdef HAVE_OLDMEM_PFN_IS_RAM
	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
#endif
	/* other init stuff */
}
Also, this xen driver has no module_exit, so for xen the unregister case
is not an issue. I haven't looked at the to-be-merged pv-on-hvm drivers;
maybe they can be properly unloaded.
> > +static int pfn_is_ram(unsigned long pfn)
> > +{
> > + int (*fn)(unsigned long);
> > + /* pfn is ram unless fn() checks pagetype */
> > + int ret = 1;
> > +
> > + atomic_inc(&oldmem_fn_refcount);
> > + smp_mb__after_atomic_inc();
> > + fn = oldmem_pfn_is_ram;
> > + if (fn)
> > + ret = fn(pfn);
> > + if (atomic_dec_and_test(&oldmem_fn_refcount))
> > + wake_up(&oldmem_fn_waitq);
> > +
> > + return ret;
> > +}
>
> This function would have been a suitable place at which to document the
> entire feature. As it stands, anyone who is reading this code won't
> have any clue why it exists.
I will add a comment.
> > +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> > +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
>
> Each export should be placed immediately after the function which is
> being exported, please. Checkpatch reports this. Please use checkpatch.
Will do.
> > +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> > @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
> > if (is_kdump_kernel())
> > elfcorehdr_addr = ELFCORE_ADDR_ERR;
> > }
> > +
> > +#define HAVE_OLDMEM_PFN_IS_RAM 1
>
> What's this for?
So that out-of-tree drivers don't fail to compile when they call that
hook unconditionally. Perhaps they could use the kernel version instead.
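For example (illustrative only; the 3.0 cutoff is just a guess at the
release that would first ship the hook):

#include <linux/version.h>

/* out-of-tree alternative: key off the kernel release instead of the
 * feature macro (3.0 is an assumption, not a confirmed merge target) */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 0)
	register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
#endif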
Olaf
The balloon driver in a Xen guest frees guest pages and marks them as
mmio. When the kernel crashes and the crash kernel attempts to read the
oldmem via /proc/vmcore, a read from ballooned pages will generate 100%
load in dom0 because Xen asks qemu-dm for the page content. Since the
reads come in as 8-byte requests, each ballooned page is tried 512 times.
With this change a hook can be registered which checks whether the given
pfn is really ram. The hook has to return a value > 0 for ram pages, a
value < 0 on error (for example if the hypercall is not known) and 0 for
non-ram pages.
This will reduce the time to read /proc/vmcore. Without this change a
512M guest with a 128M crashkernel region needs 200 seconds to read it;
with this change it takes just 2 seconds.
Signed-off-by: Olaf Hering <[email protected]>
---
v2:
remove refcounting, called function has to take care of module refcounting
add comments to pfn_is_ram()
register_oldmem_pfn_is_ram() returns -EBUSY if a function is already registered
move exports close to exported function
update prototypes to include 'pfn'
fs/proc/vmcore.c | 52 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/crash_dump.h | 5 ++++
2 files changed, 54 insertions(+), 3 deletions(-)
Index: linux-2.6.39-rc6/fs/proc/vmcore.c
===================================================================
--- linux-2.6.39-rc6.orig/fs/proc/vmcore.c
+++ linux-2.6.39-rc6/fs/proc/vmcore.c
@@ -35,6 +35,46 @@ static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore = NULL;
+/*
+ * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
+ * The called function has to take care of module refcounting.
+ */
+static int (*oldmem_pfn_is_ram)(unsigned long pfn);
+
+int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn))
+{
+ if (oldmem_pfn_is_ram)
+ return -EBUSY;
+ oldmem_pfn_is_ram = fn;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
+
+void unregister_oldmem_pfn_is_ram(void)
+{
+ oldmem_pfn_is_ram = NULL;
+ wmb();
+}
+EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
+
+static int pfn_is_ram(unsigned long pfn)
+{
+ int (*fn)(unsigned long pfn);
+ /* pfn is ram unless fn() checks pagetype */
+ int ret = 1;
+
+ /*
+ * Ask hypervisor if the pfn is really ram.
+ * A ballooned page contains no data and reading from such a page
+ * will cause high load in the hypervisor.
+ */
+ fn = oldmem_pfn_is_ram;
+ if (fn)
+ ret = fn(pfn);
+
+ return ret;
+}
+
/* Reads a page from the oldmem device from given offset. */
static ssize_t read_from_oldmem(char *buf, size_t count,
u64 *ppos, int userbuf)
@@ -55,9 +95,15 @@ static ssize_t read_from_oldmem(char *bu
else
nr_bytes = count;
- tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
- if (tmp < 0)
- return tmp;
+ /* If pfn is not ram, return zeros for sparse dump files */
+ if (pfn_is_ram(pfn) == 0)
+ memset(buf, 0, nr_bytes);
+ else {
+ tmp = copy_oldmem_page(pfn, buf, nr_bytes,
+ offset, userbuf);
+ if (tmp < 0)
+ return tmp;
+ }
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
Index: linux-2.6.39-rc6/include/linux/crash_dump.h
===================================================================
--- linux-2.6.39-rc6.orig/include/linux/crash_dump.h
+++ linux-2.6.39-rc6/include/linux/crash_dump.h
@@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
if (is_kdump_kernel())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
+
+#define HAVE_OLDMEM_PFN_IS_RAM 1
+extern int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn));
+extern void unregister_oldmem_pfn_is_ram(void);
+
#else /* !CONFIG_CRASH_DUMP */
static inline int is_kdump_kernel(void) { return 0; }
#endif /* CONFIG_CRASH_DUMP */
On Fri, 6 May 2011 12:55:46 +0200
Olaf Hering <[email protected]> wrote:
> Should the called code increase/decrease the module's refcount instead?
> I remember there was some MODULE_INC/MODULE_DEC macro (can't remember the
> exact name) at some point. What needs to be done inside the module to
> prevent an unload while it's still in use? Is it __module_get/module_put
> for each call of fn()?
A try_module_get(THIS_MODULE) in the register function will do the trick.
However it's unneeded. Documentation/DocBook/kernel-hacking.tmpl tells us
try_module_get() module_put()
These manipulate the module usage count, to protect against
removal (a module also can't be removed if another module uses one
of its exported symbols: see below). Before calling into module
code, you should call <function>try_module_get()</function> on
that module: if it fails, then the module is being removed and you
should act as if it wasn't there. Otherwise, you can safely enter
the module, and call <function>module_put()</function> when you're
finished.
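A minimal sketch of that pattern (headers omitted; 'owner' and 'fn' are
only illustrative names):

static int call_hook(struct module *owner, int (*fn)(unsigned long pfn),
		     unsigned long pfn)
{
	int ret = 1;			/* default: treat the pfn as ram */

	if (try_module_get(owner)) {	/* provider is not being removed */
		ret = fn(pfn);
		module_put(owner);
	}
	return ret;
}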
So as your module will have a reference to vmcore.c's register and unregister
functions, nothing needs to be done: the presence of the client module alone
will pin the vmcore.c module.
However it's all moot, because the fs/proc/vmcore.c code cannot
presently be built as a module and it's rather unlikely that it ever
will be.
On Fri, May 06, Andrew Morton wrote:
> So as your module will have a reference to vmcore.c's register and unregister
> functions, nothing needs to be done: the presence of the client module alone
> will pin the vmcore.c module.
I meant the other way around. Keep /proc/vmcore open and read from it,
then try to rmmod foo.ko which provides fn().
Olaf
On Fri, 6 May 2011 21:39:16 +0200
Olaf Hering <[email protected]> wrote:
> On Fri, May 06, Andrew Morton wrote:
>
> > So as your module will have a reference to vmcore.c's register and unregister
> > functions, nothing needs to be done: the presence of the client module alone
> > will pin the vmcore.c module.
>
> I meant the other way around. Keep /proc/vmcore open and read from it,
> then try to rmmod foo.ko which provides fn().
>
The client foo.ko will need to prevent itself from being unloaded while
it's actively doing stuff, yes. Typically that would be done in its
module_exit() function - wait for current activity to complete and
block new activity. The "block new activity" thing should be automatic
because nobody has any more references to anything in the module.
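A rough sketch of what that could look like in a foo.ko providing fn()
(all names are illustrative, headers omitted):

static atomic_t foo_hook_active = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(foo_hook_waitq);

static int foo_pfn_is_ram(unsigned long pfn)
{
	int ret = 1;	/* a real driver would query the hypervisor here */

	atomic_inc(&foo_hook_active);
	/* ... actual pfn type lookup ... */
	if (atomic_dec_and_test(&foo_hook_active))
		wake_up(&foo_hook_waitq);
	return ret;
}

static void __exit foo_exit(void)
{
	unregister_oldmem_pfn_is_ram();		/* block new calls */
	wait_event(foo_hook_waitq,		/* drain calls still in flight */
		   atomic_read(&foo_hook_active) == 0);
}
module_exit(foo_exit);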