2005-03-21 15:32:56

by Hayim Shaul

Subject: mmap/munmap bug


Hi all,

I have an unexplained bug with mmap/munmap on 2.6.X.

I'm writing a kernel module that gives super-fast access to the network.
It does so by doing mmap thus avoiding the memcpy to/from user.

It works well for some time, but then the kernel panics with a bad_page
message (the full stack is given below).

What I understand from this message is that an unmapped page is being
unmapped again. The bug usually appears when I unmap the area from user
space.

I don't understand what I am doing wrong. I followed the example from
Linux Device Drivers (2nd ed.) and code I found under drivers/.

I also saw that there's a mapping bug in the 2.6 kernels. I'm not
convinced yet that this is the case, but if so, is there a workaround?

The relevant parts of the code are given below.

I'd appreciate any input,
Hayim.

************************************
The full panic message:

Mar 21 08:48:15 localhost kernel: Bad page state at free_hot_cold_page (in process 'noa', page c1000100)
Mar 21 08:48:15 localhost kernel: flags:0x00001014 mapping:00000000 mapcount:0 count:0
Mar 21 08:48:15 localhost kernel: Backtrace:
Mar 21 08:48:15 localhost kernel: [<c01329a5>] bad_page+0x75/0xb0
Mar 21 08:48:15 localhost kernel: [<c013308c>] free_hot_cold_page+0x5c/0xd0
Mar 21 08:48:15 localhost kernel: [<c013c6fb>] zap_pte_range+0x14b/0x270
Mar 21 08:48:15 localhost kernel: [<c013c873>] zap_pmd_range+0x53/0x70
Mar 21 08:48:15 localhost kernel: [<c013c8d3>] zap_pud_range+0x43/0x60
Mar 21 08:48:15 localhost kernel: [<c013c96e>] unmap_page_range+0x7e/0xa0
Mar 21 08:48:15 localhost kernel: [<c013ca81>] unmap_vmas+0xf1/0x200
Mar 21 08:48:15 localhost kernel: [<c0141005>] unmap_region+0x75/0xe0
Mar 21 08:48:15 localhost kernel: [<c0141303>] do_munmap+0x113/0x150
Mar 21 08:48:15 localhost kernel: [<c0141384>] sys_munmap+0x44/0x70
Mar 21 08:48:15 localhost kernel: [<c0102563>] syscall_call+0x7/0xb
Mar 21 08:48:15 localhost kernel: Trying to fix it up, but a reboot is needed

**********************************************************


static void sniffer_vma_open(struct vm_area_struct *vma) {
        printk("vma_open\n");
}

static void sniffer_vma_close(struct vm_area_struct *vma) {
        printk("vma_close\n");
}

static int proc_file_mmap(struct file *filp, struct vm_area_struct *vma)
{
        /* don't do anything here: "nopage" will fill the holes */
        vma->vm_ops = &sniffer_vm_ops;
        vma->vm_flags |= VM_RESERVED;
        sniffer_vma_open(vma);
        return 0;
}

static struct page *proc_file_nopage(struct vm_area_struct *vma,
                                     unsigned long address, int *type)
{
        struct page *page = NOPAGE_SIGBUS;

        unsigned long physaddr = ((address - vma->vm_start) >> PAGE_SHIFT) +
                                 vma->vm_pgoff;

        if (!page_should_be_mapped(my_page_bitmap, physaddr))
                return NOPAGE_SIGBUS;

        page = virt_to_page((physaddr << PAGE_SHIFT));
        // page = virt_to_page(__va(physaddr << PAGE_SHIFT)); // bug in LDD?
        get_page(page);
        return page;
}

struct vm_operations_struct sniffer_vm_ops = {
        .open = sniffer_vma_open,
        .close = sniffer_vma_close,
        .nopage = proc_file_nopage,
};

static struct file_operations File_Ops_4_Our_Proc_File = {
        .read = proc_file_read,
        .write = proc_file_write,
        .open = proc_file_open,
        .release = proc_file_close,
        .mmap = proc_file_mmap,
};
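
For anyone following the nopage handler above, its address arithmetic can be checked in user space. This is only a sketch: the 4 KiB page size and the helper name fault_to_pgoff are assumptions made for illustration, and the value it produces is a page index (what the handler calls physaddr), not a byte address.

```c
/* Assumption: 4 KiB pages, as on 32-bit x86. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper mirroring the computation in proc_file_nopage():
 * convert a faulting user address into a page index, offset by the
 * vm_pgoff value of the mapping (which is expressed in pages).
 */
static unsigned long fault_to_pgoff(unsigned long address,
                                    unsigned long vm_start,
                                    unsigned long vm_pgoff)
{
    return ((address - vm_start) >> PAGE_SHIFT) + vm_pgoff;
}
```

Shifting this index left by PAGE_SHIFT gives a byte offset or physical address, depending on what vm_pgoff meant when mmap() was called. virt_to_page() expects a kernel virtual address, which is presumably why the LDD example wraps the value in __va().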

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/


+++++++++++++++++++++++++++++++++++++++++++
This Mail Was Scanned By Mail-seCure System
at the Tel-Aviv University CC.


2005-03-21 18:34:10

by Arjan van de Ven

Subject: Re: mmap/munmap bug

On Mon, 2005-03-21 at 17:32 +0200, Hayim Shaul wrote:
> Hi all,
>
> I have an unexplained bug with mmap/munmap on 2.6.X.
>
> I'm writing a kernel module that gives super-fast access to the network.
> It does so by doing mmap thus avoiding the memcpy to/from user.

well... you are aware the network stack already supports generic zero
copy networking, right ?


2005-03-22 07:57:40

by Gleb Natapov

Subject: Re: mmap/munmap bug

On Mon, Mar 21, 2005 at 07:34:02PM +0100, Arjan van de Ven wrote:
> On Mon, 2005-03-21 at 17:32 +0200, Hayim Shaul wrote:
> > Hi all,
> >
> > I have an unexplained bug with mmap/munmap on 2.6.X.
> >
> > I'm writing a kernel module that gives super-fast access to the network.
> > It does so by doing mmap thus avoiding the memcpy to/from user.
>
> well... you are aware the network stack already supports generic zero
> copy networking, right ?
>
Does it support zero copy not only for send but also for receive? Can we
receive packets directly to userspace buffers?

--
Gleb.

2005-03-22 08:05:48

by Arjan van de Ven

Subject: Re: mmap/munmap bug

On Tue, 2005-03-22 at 09:56 +0200, Gleb Natapov wrote:
> On Mon, Mar 21, 2005 at 07:34:02PM +0100, Arjan van de Ven wrote:
> > On Mon, 2005-03-21 at 17:32 +0200, Hayim Shaul wrote:
> > > Hi all,
> > >
> > > I have an unexplained bug with mmap/munmap on 2.6.X.
> > >
> > > I'm writing a kernel module that gives super-fast access to the network.
> > > It does so by doing mmap thus avoiding the memcpy to/from user.
> >
> > well... you are aware the network stack already supports generic zero
> > copy networking, right ?
> >
> Does it support zero copy not only for send but also for receive? Can we
> receive packets directly to userspace buffers?

That it can't do currently, and without some major protocol stack rework
that's not going to be easy. If you want to help do that work,
excellent! Be sure to contact the people on the netdev mailing list, since
they are the ones who have looked at this previously.



2005-03-22 08:24:03

by Hayim Shaul

Subject: Re: mmap/munmap bug

>> Does it support zero copy not only for send but also for receive? Can we
>> receive packets directly to userspace buffers?
>
> That it can't do currently, and without some major protocol stack rework
> that's not going to be easy. If you want to help do that work,
> excellent! Be sure to contact the people on the netdev mailing list, since
> they are the ones who have looked at this previously.

My case is simpler, as the application I intend it for is similar to a NAT.
A packet comes in, its headers get a small alteration, and off it goes
again. So there's no TCP stack or anything.

What I thought of doing is to map the skbuff to user space, have the
user application alter the headers, and then send the (same) skbuff
from kernel space.

Does anything equivalent exist?
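
To make the "small alteration of the headers" concrete, here is a minimal user-space sketch of a NAT-style edit on an IPv4 header. It is not the module's code: the function names ipv4_checksum and rewrite_dst are hypothetical, and it assumes the buffer starts at the IPv4 header and that only the destination address changes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Internet checksum (RFC 1071) over an IPv4 header of `len` bytes.
 * To compute a fresh value, the checksum field (bytes 10 and 11)
 * must be zero; over a header with a valid checksum the result is 0.
 */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    if (len & 1)                        /* odd trailing byte, padded with 0 */
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical NAT-style edit: overwrite the destination address
 * (bytes 16..19 of the IPv4 header) and refresh the header checksum.
 */
static void rewrite_dst(uint8_t *hdr, size_t hdr_len, const uint8_t new_dst[4])
{
    uint16_t csum;

    memcpy(hdr + 16, new_dst, 4);
    hdr[10] = 0;
    hdr[11] = 0;
    csum = ipv4_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

Note that TCP and UDP checksums cover a pseudo-header containing the IP addresses, so a real address rewrite would also have to update the transport checksum; the sketch leaves that out.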

2005-03-22 08:34:15

by Arjan van de Ven

Subject: Re: mmap/munmap bug

On Tue, 2005-03-22 at 10:23 +0200, Hayim Shaul wrote:
> >> Does it support zero copy not only for send but also for receive? Can we
> >> receive packets directly to userspace buffers?
> >
> > That it can't do currently, and without some major protocol stack rework
> > that's not going to be easy. If you want to help do that work,
> > excellent! Be sure to contact the people on the netdev mailing list, since
> > they are the ones who have looked at this previously.
>
> My case is simpler, as the application I intend it for is similar to a NAT.
> A packet comes in, its headers get a small alteration, and off it goes
> again. So there's no TCP stack or anything.
>
> What I thought of doing is to map the skbuff to user space, have the
> user application alter the headers, and then send the (same) skbuff
> from kernel space.
>
> Does anything equivalent exist?

Yes; netfilter has facilities for this, actually, afaik.
tcpdump also uses something like this (but only in one direction): it
mmaps a ring buffer of incoming packets.
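
The ring buffer mentioned here is presumably the packet-socket receive ring (PACKET_RX_RING on an AF_PACKET socket); that identification is an assumption, since the message does not name the interface. A minimal sketch of setting one up follows. The helper names ring_geometry and open_rx_ring are hypothetical, the sizes are left to the caller, and opening the socket needs CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fill a tpacket_req for a ring of block_nr blocks. block_size should be
 * a multiple of the page size and frame_size should divide block_size.
 */
static void ring_geometry(struct tpacket_req *req, unsigned int block_size,
                          unsigned int frame_size, unsigned int block_nr)
{
    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_frame_size = frame_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
}

/*
 * Open a raw packet socket, ask the kernel for a receive ring and map it.
 * Returns the mapping (MAP_FAILED on any error) and stores the socket
 * descriptor in *fd_out on success.
 */
static void *open_rx_ring(const struct tpacket_req *req, int *fd_out)
{
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;
    void *ring;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd < 0)
        return MAP_FAILED;
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }
    *fd_out = fd;
    return ring;
}
```

The kernel writes received frames into the mapped ring and user space reads them in place, which covers the receive direction only, matching the "one direction" remark.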


2005-03-22 09:39:10

by Hayim Shaul

Subject: Re: mmap/munmap bug


The page contains the data part of the skbuff, so I guess it is used by
a slab.

Isn't this the idea of mapcount: to keep count of the number of mappings
of a page, and to free the page only when this ref-count reaches zero?

I don't mind the slab being freed and the page not. My application won't
access that page until the net driver allocates a buffer on this page and
I receive its pointer.
Will this page be used again if I keep a mapping to it?
I don't see any reason why not.
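
As a side note on the reference counting discussed here: in the kernel, get_page() raises the page's reference count (the "count" field in the bad_page output), which is tracked separately from mapcount. The toy model below is a user-space illustration only, with hypothetical names (toy_page, toy_get_page, toy_put_page); it shows why a release that is not matched by an earlier reference ends in a bad state.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: just a reference count and an error flag. */
struct toy_page {
    int count;       /* references currently held (allocator, mappings, ...) */
    bool bad_state;  /* set when a release happens with no reference left */
};

/* Take one reference, as a nopage handler does before returning a page. */
static void toy_get_page(struct toy_page *p)
{
    p->count++;
}

/*
 * Drop one reference. Returns true when this call released the last
 * reference (the page would be freed now). Dropping a reference that
 * was never taken marks the page as bad instead.
 */
static bool toy_put_page(struct toy_page *p)
{
    if (p->count <= 0) {
        p->bad_state = true;
        return false;
    }
    p->count--;
    return p->count == 0;
}
```

In the balanced case the allocator holds one reference, the mapping takes a second, and the page is freed only after both are dropped. If the reference was taken on a different page than the one actually mapped, or the owner frees the page regardless of the count, the unmap path drops a reference that no longer exists.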

On Tue, 22 Mar 2005, yingchao zhou wrote:

> Suppose the page is used by the slab allocator; when the slab is
> shrunk, the page will probably be freed twice.
> --- Hayim Shaul <[email protected]> wrote:
>>
>> It certainly is.
>>
>> But isn't get_page() supposed to increment the map_count?
>>
>> If so, then the kernel should have crashed every time I
>> mapped/unmapped.
>>
>> On Tue, 22 Mar 2005, yingchao zhou wrote:
>>
>>> The page returned from proc_file_nopage is probably used by
>>> another part of the kernel or by an application.

2005-03-22 09:44:23

by Hayim Shaul

Subject: Re: mmap/munmap bug

>>
>> What I thought of doing is to map the skbuff to user space, have the
>> user application alter the headers, and then send the (same) skbuff
>> from kernel space.
>>
>> Does anything equivalent exist?
>
> Yes; netfilter has facilities for this, actually, afaik.
> tcpdump also uses something like this (but only in one direction): it
> mmaps a ring buffer of incoming packets.

Are you referring to NF_QUEUE + libipq?
I was under the impression that it does involve a memcpy.

Also, I think that with this you cannot change routing decisions made on
the packet, although I'm not sure yet how critical this is for me.