Hi there,
Trying to mmap /dev/kmem with an offset I take from /boot/System.map, I
get an EIO error on a 2.6.20-rc4.
This is something that used to work on older kernels.
Had a look at mmap_kmem() in drivers/char/mem.c, and I'm wondering
whether pfn is correctly computed there: shouldn't we have something like
pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) +
__pa(vma->vm_pgoff << PAGE_SHIFT);
instead of
pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
Or may be should I substract PAGE_OFFSET from the value I get from
System.map before mmapping /dev/kmem?
Thanks for your help
Regards,
Nadia
On Thu, 18 Jan 2007, Nadia Derbey wrote:
>
> Trying to mmap /dev/kmem with an offset I take from /boot/System.map,
> I get an EIO error on a 2.6.20-rc4.
> This is something that used to work on older kernels.
>
> Had a look at mmap_kmem() in drivers/char/mem.c, and I'm wondering whether
> pfn is correctly computed there: shouldn't we have something like
>
> pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) +
> __pa(vma->vm_pgoff << PAGE_SHIFT);
>
> instead of
>
> pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
>
> Or may be should I substract PAGE_OFFSET from the value I get from System.map
> before mmapping /dev/kmem?
Sigh, you're right, 2.6.19 messed that up completely.
No, you never had to subtract PAGE_OFFSET from that address
in the past, and you shouldn't have to do so now.
Please revert the offending patch below, and then maybe Franck
can come up with a patch which preserves the original behaviour
on architectures which used to work (e.g. i386), while fixing
it for those architectures (which are they?) that did not.
I guess it's reassuring to know that not many are actually
using mmap of /dev/kmem, so you're the first to notice: thanks.
Hugh
From: Franck Bui-Huu <[email protected]>
Date: Thu, 12 Oct 2006 19:06:33 +0000 (+0200)
Subject: [PATCH] Fix up mmap_kmem
X-Git-Tag: v2.6.19-rc2^0~6
X-Git-Url: http://127.0.0.1:1234/?p=.git;a=commitdiff_plain;h=99a10a60ba9bedcf5d70ef81414d3e03816afa3f;hp=a5344a9555fffd045218aced89afd6ca0f884e10
[PATCH] Fix up mmap_kmem
vma->vm_pgoff is an pfn _offset_ relatif to the begining
of the memory start. The previous code was doing at first:
vma->vm_pgoff << PAGE_SHIFT
which results into a wrong physical address since some
platforms have a physical mem start that can be different
from 0. After that the previous call __pa() on this
wrong physical address, however __pa() is used to convert
a _virtual_ address into a physical one.
This patch rewrites this convertion. It calculates the
pfn of PAGE_OFFSET which is the pfn of the mem start
then it adds the vma->vm_pgoff to it.
It also uses virt_to_phys() instead of __pa() since the
latter shouldn't be used by drivers.
Signed-off-by: Franck Bui-Huu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 6511012..a89cb52 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -292,8 +292,8 @@ static int mmap_kmem(struct file * file,
{
unsigned long pfn;
- /* Turn a kernel-virtual address into a physical page frame */
- pfn = __pa((u64)vma->vm_pgoff << PAGE_SHIFT) >> PAGE_SHIFT;
+ /* Turn a pfn offset into an absolute pfn */
+ pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
/*
* RED-PEN: on some architectures there is more mapped memory
Hugh Dickins wrote:
> On Thu, 18 Jan 2007, Nadia Derbey wrote:
>
>>Trying to mmap /dev/kmem with an offset I take from /boot/System.map,
>>I get an EIO error on a 2.6.20-rc4.
>>This is something that used to work on older kernels.
>>
>>Had a look at mmap_kmem() in drivers/char/mem.c, and I'm wondering whether
>>pfn is correctly computed there: shouldn't we have something like
>>
>>pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) +
>> __pa(vma->vm_pgoff << PAGE_SHIFT);
>>
>>instead of
>>
>>pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
>>
>>Or may be should I substract PAGE_OFFSET from the value I get from System.map
>>before mmapping /dev/kmem?
>
>
> Sigh, you're right, 2.6.19 messed that up completely.
> No, you never had to subtract PAGE_OFFSET from that address
> in the past, and you shouldn't have to do so now.
>
> Please revert the offending patch below, and then maybe Franck
> can come up with a patch which preserves the original behaviour
> on architectures which used to work (e.g. i386), while fixing
> it for those architectures (which are they?) that did not.
>
Ok, I'll do that, thanks!
Regards,
Nadia
Hugh Dickins wrote:
> On Thu, 18 Jan 2007, Nadia Derbey wrote:
>
>>Trying to mmap /dev/kmem with an offset I take from /boot/System.map,
>>I get an EIO error on a 2.6.20-rc4.
>>This is something that used to work on older kernels.
>>
>>Had a look at mmap_kmem() in drivers/char/mem.c, and I'm wondering whether
>>pfn is correctly computed there: shouldn't we have something like
>>
>>pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) +
>> __pa(vma->vm_pgoff << PAGE_SHIFT);
>>
>>instead of
>>
>>pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
>>
>>Or may be should I substract PAGE_OFFSET from the value I get from System.map
>>before mmapping /dev/kmem?
>
>
> Sigh, you're right, 2.6.19 messed that up completely.
> No, you never had to subtract PAGE_OFFSET from that address
> in the past, and you shouldn't have to do so now.
>
> Please revert the offending patch below, and then maybe Franck
> can come up with a patch which preserves the original behaviour
> on architectures which used to work (e.g. i386), while fixing
> it for those architectures (which are they?) that did not.
>
> I guess it's reassuring to know that not many are actually
> using mmap of /dev/kmem, so you're the first to notice: thanks.
>
I find this feature very interesting from a testing perspective. Now,
since I don't like the idea of being the only one that uses a feature
(not maintained - may be even to be removed?) may be you could advice me
on a more broadly used way to get the value of a non exported kernel
variable from inside a test running in user mode? should I use /dev/mem
instead? But in that case, I should do the virtual to physical
conversion myself, right?
Regards,
Nadia
Hi,
Hugh Dickins wrote:
> On Thu, 18 Jan 2007, Nadia Derbey wrote:
>> Trying to mmap /dev/kmem with an offset I take from /boot/System.map,
>> I get an EIO error on a 2.6.20-rc4.
>> This is something that used to work on older kernels.
>>
>> Had a look at mmap_kmem() in drivers/char/mem.c, and I'm wondering whether
>> pfn is correctly computed there: shouldn't we have something like
>>
>> pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) +
>> __pa(vma->vm_pgoff << PAGE_SHIFT);
>>
>> instead of
>>
>> pfn = PFN_DOWN(virt_to_phys((void *)PAGE_OFFSET)) + vma->vm_pgoff;
>>
>> Or may be should I substract PAGE_OFFSET from the value I get from System.map
>> before mmapping /dev/kmem?
>
> Sigh, you're right, 2.6.19 messed that up completely.
> No, you never had to subtract PAGE_OFFSET from that address
> in the past, and you shouldn't have to do so now.
>
> Please revert the offending patch below, and then maybe Franck
> can come up with a patch which preserves the original behaviour
> on architectures which used to work (e.g. i386), while fixing
> it for those architectures (which are they?) that did not.
>
I've been confused by 'vma->vm_pgoff' meaning. I assumed that it was
an offset relatif to the start of the kernel address space
(PAGE_OFFSET) as the commit message I submitted explains. So doing:
fd = open("/dev/kmem", O_RDONLY);
kmem = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 4 * 4096);
actually asks for a kernel space mapping which start 4 pages after the
begining of the kernel memory space.
So yes, if the 'offset' expected by 'mmap(/dev/kmem, ..., offset)'
usage is actually a kernel virtual address the patch I submitted is
wrong. The offending line should have been something like:
pfn = PFN_DOWN(virt_to_phys((void *)(vma->vm_pgoff << PAGE_SHIFT)));
and in this case 'vma->vm_pgoff' has no sense to me. My apologizes for
this mess.
> I guess it's reassuring to know that not many are actually
> using mmap of /dev/kmem, so you're the first to notice: thanks.
>
yes it doesn't seems to be used. In my case, I was just playing with
it when I submitted the patch but have no real usages.
Franck
On Fri, 19 Jan 2007, Nadia Derbey wrote:
> Hugh Dickins wrote:
> >
> > Sigh, you're right, 2.6.19 messed that up completely.
> > No, you never had to subtract PAGE_OFFSET from that address
> > in the past, and you shouldn't have to do so now.
Whoops, I should never have said "never". Checking further back,
I found that in 2.4, and prior to 2.6.12, mmap_kmem was simply
#defined to mmap_mem, and so you would then have had to subtract
PAGE_OFFSET (well, on i386 at least) from the kernel virtual
address to get it to work.
Andi fixed that in 2.6.12. I agree with his fix: the same data should
appear at the same offset whether you use mmap or read, which was not
so before his patch (nor after Franck's). Though I do wonder whether
it was safe to change its behaviour at that stage: more evidence that
few have actually been using mmap of /dev/kmem.
> > I guess it's reassuring to know that not many are actually
> > using mmap of /dev/kmem, so you're the first to notice: thanks.
> >
> I find this feature very interesting from a testing perspective. Now, since I
> don't like the idea of being the only one that uses a feature (not maintained
> - may be even to be removed?) may be you could advice me on a more broadly
> used way to get the value of a non exported kernel variable from inside a test
> running in user mode? should I use /dev/mem instead? But in that case, I
> should do the virtual to physical conversion myself, right?
That's exactly the way I've used mmap of /dev/kmem in the past myself:
yes, a convenient way to collect stats or patch flags when investigating
something on a private kernel. There are probably more sophisticated
alternatives now (systemtap, [a-z]probes), but mmap of /dev/kmem is
pretty easy to understand, whatever its limitations.
You're right to question whether you'll be safer in the long term to
use /dev/mem instead, doing the virtual to physical conversion yourself:
I share your doubt. On the one hand, /dev/kmem is clearly underused,
with an interface which changes without anyone noticing; and Fedora
doesn't even supply the node (they'd like to get rid of /dev/mem also,
I believe - too wide an open door into the kernel - but a few tools
still use it). On the other hand, if for some reason we make the
mapping between physical and kernel virtual more complicated in
future, /dev/kmem should in theory save you from having to worry
about that (though in practice none of us has ever got around to
supporting mmap of the vmalloc area through /dev/kmem yet).
I think, if this thread provokes a call to remove support for mmap
of /dev/kmem, I'd find that hard to argue against, and you'd better
switch to /dev/mem. But personally I'd prefer it to remain.
Hugh
On Fri, 2007-01-19 at 16:33 +0000, Hugh Dickins wrote:
> Though I do wonder whether
> it was safe to change its behaviour at that stage: more evidence that
> few have actually been using mmap of /dev/kmem.
... and maybe we should just kill /dev/kmem entirely... it seems mostly
used by rootkits but very few other things, if any at all...
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
On Fri, 19 Jan 2007, Franck Bui-Huu wrote:
> Hugh Dickins wrote:
> >
> > Please revert the offending patch below, and then maybe Franck
> > can come up with a patch which preserves the original behaviour
> > on architectures which used to work (e.g. i386), while fixing
> > it for those architectures (which are they?) that did not.
> >
>
> I've been confused by 'vma->vm_pgoff' meaning. I assumed that it was
> an offset relatif to the start of the kernel address space
> (PAGE_OFFSET) as the commit message I submitted explains. So doing:
>
> fd = open("/dev/kmem", O_RDONLY);
> kmem = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 4 * 4096);
>
> actually asks for a kernel space mapping which start 4 pages after the
> begining of the kernel memory space.
I agree that the name "kmem" lends itself to that interpretation
(and the 2.4 through 2.6.11 history shows that's not a wild idea);
but read and write of /dev/kmem have always used the kernel virtual
address as offset, so mmap of /dev/kmem should be doing the same.
>
> So yes, if the 'offset' expected by 'mmap(/dev/kmem, ..., offset)'
> usage is actually a kernel virtual address the patch I submitted is
> wrong. The offending line should have been something like:
>
> pfn = PFN_DOWN(virt_to_phys((void *)(vma->vm_pgoff << PAGE_SHIFT)));
>
> and in this case 'vma->vm_pgoff' has no sense to me. My apologizes for
> this mess.
Apology surely accepted, it's a confusing area (inevitably: in a driver
for mem, the distinction between addresses and offsets become blurred).
But please note, the worst of it was, that your patch comment gave no
hint that you were knowingly changing its behaviour on the "main"
architectures: it reads as if you were simply fixing it up on a
few less popular architectures where an anomaly had been missed.
> > I guess it's reassuring to know that not many are actually
> > using mmap of /dev/kmem, so you're the first to notice: thanks.
>
> yes it doesn't seems to be used. In my case, I was just playing with
> it when I submitted the patch but have no real usages.
Have I got it right, that actually the problem you thought you were
fixing does not even exist? __pa was already doing the right thing on
all architectures, wasn't it? So we can simply ask Linus to revert your
patch? I don't think your PFN_DOWN or virt_to_phys were improvements:
though mem.c happens to live in drivers/char/, imagine it under mm/.
Hugh
On Fri, 19 Jan 2007, Arjan van de Ven wrote:
> On Fri, 2007-01-19 at 16:33 +0000, Hugh Dickins wrote:
> > Though I do wonder whether
> > it was safe to change its behaviour at that stage: more evidence that
> > few have actually been using mmap of /dev/kmem.
>
> ... and maybe we should just kill /dev/kmem entirely... it seems mostly
> used by rootkits but very few other things, if any at all...
It was discourteous of me not to CC you: I thought you might say that ;)
Though so long as /dev/mem support remains, /dev/kmem might as well?
And be kept as a CONFIG_ option under DEBUG_KERNEL thereafter?
Hugh
On Fri, 2007-01-19 at 17:12 +0000, Hugh Dickins wrote:
> On Fri, 19 Jan 2007, Arjan van de Ven wrote:
> > On Fri, 2007-01-19 at 16:33 +0000, Hugh Dickins wrote:
> > > Though I do wonder whether
> > > it was safe to change its behaviour at that stage: more evidence that
> > > few have actually been using mmap of /dev/kmem.
> >
> > ... and maybe we should just kill /dev/kmem entirely... it seems mostly
> > used by rootkits but very few other things, if any at all...
>
> It was discourteous of me not to CC you: I thought you might say that ;)
> Though so long as /dev/mem support remains, /dev/kmem might as well?
they're not the same; for a long time, /dev/mem on actual memory
returned zeros... so you couldn't use it for rootkits ;)
(that got "fixed" a while ago)
> And be kept as a CONFIG_ option under DEBUG_KERNEL thereafter?
config option is fine I suppose... I assume distros will be smart enough
to turn it off ;)
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
On Fri, 19 Jan 2007, Arjan van de Ven wrote:
> On Fri, 2007-01-19 at 17:12 +0000, Hugh Dickins wrote:
> > Though so long as /dev/mem support remains, /dev/kmem might as well?
>
> they're not the same; for a long time, /dev/mem on actual memory
> returned zeros... so you couldn't use it for rootkits ;)
> (that got "fixed" a while ago)
We fixed (or "fixed") the mmap of them both, not to give zeroes on
!PageReserved pages; but read and write were never so restricted.
(Oh, I've said "never" again - expect I'll be humiliated shortly.)
Can't rootkits work as just about as easily through read & write?
Hugh
> But personally I'd prefer it to remain.
Similar. I also got some tools who use it to read kernel variables
and doing the V->P conversion in user space would be tedious
and unreliable (e.g. for vmalloc)
-Andi
On 1/19/07, Hugh Dickins <[email protected]> wrote:
>
> Apology surely accepted, it's a confusing area (inevitably: in a driver
> for mem, the distinction between addresses and offsets become blurred).
>
> But please note, the worst of it was, that your patch comment gave no
> hint that you were knowingly changing its behaviour on the "main"
> architectures: it reads as if you were simply fixing it up on a
> few less popular architectures where an anomaly had been missed.
>
Because I was thinking that the expected behaviour was the one
implemented before 2.6.12. So I really thought to fix a bug, again
sorry for not having checked the history...
That said, it's really confusing to pass a virtual address as an
offset because:
(a) mmap() has always worked with offset not addresses;
(b) the kernel will treat this virtual address as an offset
until kmem driver convert it back to a virtual
address. And it seems that during this convertion the
lowest bits of the virtual address will be lost...
Maybe read/write behaviours should be changed to use the offset as an
offset and not as a virtual address.
> > > I guess it's reassuring to know that not many are actually
> > > using mmap of /dev/kmem, so you're the first to notice: thanks.
> >
> > yes it doesn't seems to be used. In my case, I was just playing with
> > it when I submitted the patch but have no real usages.
>
> Have I got it right, that actually the problem you thought you were
> fixing does not even exist?
yes, see above.
> __pa was already doing the right thing on
> all architectures, wasn't it? So we can simply ask Linus to revert your
> patch?
yes we can if the desired behaviour is the one introduced by 2.6.12.
> I don't think your PFN_DOWN or virt_to_phys were improvements:
> though mem.c happens to live in drivers/char/, imagine it under mm/.
>
I don't get your point here. Do you mean that virt_to_phys() is only
meant for drivers ? If so, I would have said that virt_to_phys() is
prefered once boot memory init is finished...
Franck
On Sat, 20 Jan 2007, Franck Bui-Huu wrote:
> On 1/19/07, Hugh Dickins <[email protected]> wrote:
>
> That said, it's really confusing to pass a virtual address as an
> offset because:
>
> (a) mmap() has always worked with offset not addresses;
mmap maps some offset down a backing object to some virtual address
in userspace, for some length. When the backing object is itself
memory, what would you use for the offset down that backing object?
For physical memory (/dev/mem), physical address; for virtual
memory (/dev/kmem), virtual address.
> (b) the kernel will treat this virtual address as an offset
> until kmem driver convert it back to a virtual
> address. And it seems that during this convertion the
> lowest bits of the virtual address will be lost...
mmap always demands PAGE_SIZE alignment of offset and address.
Or is it not those lowest (12 on i386) bits you're referring to
as lost? If you're expecting mmap to map kernel memory at the
same addresses as kernel memory... well, that's already done
without mmap! but not much help towards getting a userspace
mapping of kernel memory.
> Maybe read/write behaviours should be changed to use the offset as an
> offset and not as a virtual address.
They do already use the offset as an offset; so I imagine you're
suggesting they be changed to work with "offset of virtual address
from PAGE_OFFSET" instead of simple virtual address, to match the
change you made to mmap_kmem in 2.6.19.
Adding further to the confusion and incompatibility between releases,
in a (slightly) more widely used interface than the mmap, for no gain?
No, I don't think so.
> > Have I got it right, that actually the problem you thought you were
> > fixing does not even exist?
>
> yes, see above.
Thanks for the confirmation.
> > I don't think your PFN_DOWN or virt_to_phys were improvements:
> > though mem.c happens to live in drivers/char/, imagine it under mm/.
>
> I don't get your point here. Do you mean that virt_to_phys() is only
> meant for drivers ? If so, I would have said that virt_to_phys() is
> prefered once boot memory init is finished...
My point was merely that I'd much rather ask Linus (when he's back)
for a straight revert of your patch, than mess around with including
your change from __pa to virt_to_phys: that even if we prefer general
drivers to say virt_to_phys than __pa, drivers/char/mem.c is a special
case intimately bound up with mm and can be granted its present exemption.
Hugh