2005-02-28 12:07:24

by Bernd Schubert

Subject: swapper: page allocation failure. order:1, mode:0x20

Oh no, not these page allocation problems again. Last summer I already posted
about page allocation errors with 2.6.7, but it seemed to me that nobody
cared. Back then we got those errors every morning during the cron jobs, and
our main file server always crashed completely.
This time it's our cluster master system, and it first happened after an
uptime of 89 days; the kernel is 2.6.9. Apart from those messages, the system
still seems to run stably.

I really beg for help here, so please, please, please help me solve this
problem. What can I do to fix it?

First a (dumb) question: what does 'page allocation failure' really mean?
Is it some kind of out-of-memory case?


Thanks a lot in advance for any help,
Bernd





Feb 28 10:04:45 hitchcock kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 28 10:04:45 hitchcock kernel:
Feb 28 10:04:45 hitchcock kernel: Call Trace:<IRQ> <ffffffff8015b0de>{__alloc_pages+878} <ffffffff8015b10e>{__get_free_pages+14}
Feb 28 10:04:45 hitchcock kernel: <ffffffff8015edc6>{kmem_getpages+38} <ffffffff803d064a>{ip_frag_create+26}
Feb 28 10:04:45 hitchcock kernel: <ffffffff8016061e>{cache_grow+190} <ffffffff80160e80>{cache_alloc_refill+560}
Feb 28 10:04:45 hitchcock kernel: <ffffffff801617e3>{__kmalloc+195} <ffffffff803b5680>{alloc_skb+64}
Feb 28 10:04:45 hitchcock kernel: <ffffffff8031727e>{tg3_alloc_rx_skb+222} <ffffffff80317553>{tg3_rx+371}
Feb 28 10:04:45 hitchcock kernel: <ffffffff80317977>{tg3_poll+183} <ffffffff803bc306>{net_rx_action+134}
Feb 28 10:04:45 hitchcock kernel: <ffffffff8013d0ab>{__do_softirq+123} <ffffffff8013d162>{do_softirq+50}
Feb 28 10:04:45 hitchcock kernel: <ffffffff801140ab>{do_IRQ+347} <ffffffff801114eb>{ret_from_intr+0}
Feb 28 10:04:45 hitchcock kernel: <EOI> <ffffffff8010f070>{default_idle+0} <ffffffff8010f094>{default_idle+36}
Feb 28 10:04:45 hitchcock kernel: <ffffffff8010f147>{cpu_idle+39}
Feb 28 10:05:41 hitchcock rpc.mountd: authenticated unmount request from beo-04:666 for /lib64 (/lib64)
Feb 28 10:04:45 hitchcock kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 28 10:07:36 hitchcock kernel:
Feb 28 10:07:36 hitchcock kernel: Call Trace:<IRQ> <ffffffff8015b0de>{__alloc_pages+878} <ffffffff8015b10e>{__get_free_pages+14}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8015edc6>{kmem_getpages+38} <ffffffff8016061e>{cache_grow+190}
Feb 28 10:07:36 hitchcock kernel: <ffffffff80160e80>{cache_alloc_refill+560} <ffffffff801617e3>{__kmalloc+195}
Feb 28 10:07:36 hitchcock kernel: <ffffffff803b5680>{alloc_skb+64} <ffffffff8031727e>{tg3_alloc_rx_skb+222}
Feb 28 10:07:36 hitchcock kernel: <ffffffff80317553>{tg3_rx+371} <ffffffff80317977>{tg3_poll+183}
Feb 28 10:07:36 hitchcock kernel: <ffffffff803bc306>{net_rx_action+134} <ffffffff8013d0ab>{__do_softirq+123}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8013d162>{do_softirq+50} <ffffffff801140ab>{do_IRQ+347}
Feb 28 10:07:36 hitchcock kernel: <ffffffff801114eb>{ret_from_intr+0} <EOI> <ffffffff8010f070>{default_idle+0}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8010f094>{default_idle+36} <ffffffff8010f147>{cpu_idle+39}
Feb 28 10:07:36 hitchcock kernel:
Feb 28 10:07:36 hitchcock kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 28 10:07:36 hitchcock kernel:
Feb 28 10:07:36 hitchcock kernel: Call Trace:<IRQ> <ffffffff8015b0de>{__alloc_pages+878} <ffffffff8015b10e>{__get_free_pages+14}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8015edc6>{kmem_getpages+38} <ffffffff8016061e>{cache_grow+190}
Feb 28 10:07:36 hitchcock kernel: <ffffffff80160e80>{cache_alloc_refill+560} <ffffffff801617e3>{__kmalloc+195}
Feb 28 10:07:36 hitchcock kernel: <ffffffff803b5680>{alloc_skb+64} <ffffffff8031727e>{tg3_alloc_rx_skb+222}
Feb 28 10:07:36 hitchcock kernel: <ffffffff80317553>{tg3_rx+371} <ffffffff80317977>{tg3_poll+183}
Feb 28 10:07:36 hitchcock kernel: <ffffffff803bc306>{net_rx_action+134} <ffffffff8013d0ab>{__do_softirq+123}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8013d162>{do_softirq+50} <ffffffff801140ab>{do_IRQ+347}
Feb 28 10:07:36 hitchcock kernel: <ffffffff801114eb>{ret_from_intr+0} <EOI> <ffffffff8010f070>{default_idle+0}
Feb 28 10:07:36 hitchcock kernel: <ffffffff8010f094>{default_idle+36} <ffffffff8010f147>{cpu_idle+39}


--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]


2005-02-28 13:31:40

by Marcel Smeets

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Hello, I had the same sort of messages and posted about them two days ago.

I am using SuSE 9.2 Pro; uname -a reports:

Linux 2.6.8-24.11-smp #1 SMP Fri Jan 14 13:01:26 UTC 2005 x86_64
x86_64 x86_64 GNU/Linux

Intel(R) Xeon(TM) CPU 2.80GHz

2 GB RAM
1 GB SWAP

and these strange kernel messages keep appearing.

It is a high-load machine with iptables running. The number of entries in
/proc/net/ip_conntrack keeps growing and never decreases. After a while
the maximum is reached. The only way to recover is to stop iptables,
unload the ip modules from the kernel, load them again and start iptables
again. Then the messages do not come back. Could it be a kernel bug?

dmesg:

swapper: page allocation failure. order:1, mode:0x20

Call Trace:<IRQ> <ffffffff8015ecef>{__alloc_pages+1135}
<ffffffff8015e830>{__get_free_pages+16}
<ffffffff80162676>{kmem_getpages+38}
<ffffffff8031e3ea>{tcp_v4_route_req+250}
<ffffffff80162a99>{cache_alloc_refill+665}
<ffffffff80162c46>{kmem_cache_alloc+54}
<ffffffff802e2f0a>{sk_alloc+58}
<ffffffff80323d99>{tcp_create_openreq_child+41}
<ffffffff803229e7>{tcp_v4_syn_recv_sock+87}
<ffffffff80323b7e>{tcp_check_req+606}
<ffffffff8030b4dc>{ip_finish_output+508}
<ffffffff80308b90>{ip_dst_output+0}
<ffffffff8030b668>{ip_output+216} <ffffffff80308bc8>{ip_dst_output+56}
<ffffffff80132f9b>{recalc_task_prio+635}
<ffffffff80308b90>{ip_dst_output+0}
<ffffffff80132f9b>{recalc_task_prio+635}
<ffffffff80133c91>{activate_task+129}
<ffffffff80137a89>{autoremove_wake_function+9}
<ffffffff80132633>{__wake_up_common+67}
<ffffffff80132c13>{__wake_up+67} <ffffffff802e2344>{sock_def_readable+68}
<ffffffff80320b79>{tcp_v4_do_rcv+233}
<ffffffffa017b034>{:ipt_state:match+36}
<ffffffff803214cb>{tcp_v4_rcv+1835}
<ffffffff80305fb9>{ip_local_deliver_finish+297}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff802f3a43>{nf_hook_slow+195}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff803062fe>{ip_local_deliver+622}
<ffffffff8030592e>{ip_rcv_finish+574} <ffffffff803056f0>{ip_rcv_finish+0}
<ffffffff803056f0>{ip_rcv_finish+0} <ffffffff802f3a43>{nf_hook_slow+195}
<ffffffff803056f0>{ip_rcv_finish+0} <ffffffff80305e34>{ip_rcv+1188}
<ffffffff802e9749>{netif_receive_skb+729}
<ffffffffa00e9c9c>{:e1000:e1000_clean+1820}
<ffffffff802e85b4>{net_rx_action+132}
<ffffffff8013dca1>{__do_softirq+113}
<ffffffff8013dd55>{do_softirq+53} <ffffffff80113e3f>{do_IRQ+335}
<ffffffff8010f540>{mwait_idle+0} <ffffffff80110cf5>{ret_from_intr+0}
<EOI> <ffffffff8010f596>{mwait_idle+86} <ffffffff8010f9ea>{cpu_idle+26}
<ffffffff804e971a>{start_kernel+490} <ffffffff804e91e0>{_sinittext+480}

swapper: page allocation failure. order:1, mode:0x20

Call Trace:<IRQ> <ffffffff8015ecef>{__alloc_pages+1135}
<ffffffff8015e830>{__get_free_pages+16}
<ffffffff80162676>{kmem_getpages+38}
<ffffffff8031e3ea>{tcp_v4_route_req+250}
<ffffffff80162a99>{cache_alloc_refill+665}
<ffffffff80162c46>{kmem_cache_alloc+54}
<ffffffff802e2f0a>{sk_alloc+58}
<ffffffff80323d99>{tcp_create_openreq_child+41}
<ffffffff803229e7>{tcp_v4_syn_recv_sock+87}
<ffffffff80323b7e>{tcp_check_req+606}
<ffffffff8030b4dc>{ip_finish_output+508}
<ffffffff80308b90>{ip_dst_output+0}
<ffffffff8030b668>{ip_output+216} <ffffffff80308bc8>{ip_dst_output+56}
<ffffffff802f3a43>{nf_hook_slow+195} <ffffffff80308b90>{ip_dst_output+0}
<ffffffff8013340a>{task_rq_lock+74}
<ffffffff80133f9b>{try_to_wake_up+747}
<ffffffff8030ad6b>{ip_queue_xmit+1211}
<ffffffff8013340a>{task_rq_lock+74}
<ffffffff8013340a>{task_rq_lock+74}
<ffffffff80132633>{__wake_up_common+67}
<ffffffff80132c13>{__wake_up+67}
<ffffffff80312996>{tcp_ack_saw_tstamp+22}
<ffffffff80314d90>{tcp_ack+1040} <ffffffff80320b79>{tcp_v4_do_rcv+233}
<ffffffffa017b034>{:ipt_state:match+36}
<ffffffff803214cb>{tcp_v4_rcv+1835}
<ffffffff80305fb9>{ip_local_deliver_finish+297}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff802f3a43>{nf_hook_slow+195}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff803062fe>{ip_local_deliver+622}
<ffffffff8030592e>{ip_rcv_finish+574}
<ffffffff803056f0>{ip_rcv_finish+0} <ffffffff803056f0>{ip_rcv_finish+0}
<ffffffff802f3a43>{nf_hook_slow+195} <ffffffff803056f0>{ip_rcv_finish+0}
<ffffffff80305e34>{ip_rcv+1188} <ffffffff802e9749>{netif_receive_skb+729}
<ffffffffa00e9c9c>{:e1000:e1000_clean+1820}
<ffffffff80133c91>{activate_task+129}
<ffffffff80133f9b>{try_to_wake_up+747}
<ffffffff802e85b4>{net_rx_action+132}
<ffffffff8013dca1>{__do_softirq+113} <ffffffff8013dd55>{do_softirq+53}
<ffffffff80113e3f>{do_IRQ+335} <ffffffff8010f540>{mwait_idle+0}
<ffffffff80110cf5>{ret_from_intr+0} <EOI>
<ffffffff8010f596>{mwait_idle+86}
<ffffffff8010f9ea>{cpu_idle+26} <ffffffff804e971a>{start_kernel+490}
<ffffffff804e91e0>{_sinittext+480}
swapper: page allocation failure. order:1, mode:0x20

Call Trace:<IRQ> <ffffffff8015ecef>{__alloc_pages+1135}
<ffffffff8015e830>{__get_free_pages+16}
<ffffffff80162676>{kmem_getpages+38}
<ffffffff8031e3ea>{tcp_v4_route_req+250}
<ffffffff80162a99>{cache_alloc_refill+665}
<ffffffff80162c46>{kmem_cache_alloc+54}
<ffffffff802e2f0a>{sk_alloc+58}
<ffffffff80323d99>{tcp_create_openreq_child+41}
<ffffffff803229e7>{tcp_v4_syn_recv_sock+87}
<ffffffff80323b7e>{tcp_check_req+606}
<ffffffff8030b4dc>{ip_finish_output+508}
<ffffffff80308b90>{ip_dst_output+0}
<ffffffff8030b668>{ip_output+216} <ffffffff80308bc8>{ip_dst_output+56}
<ffffffff802f3a43>{nf_hook_slow+195} <ffffffff80308b90>{ip_dst_output+0}
<ffffffff80132f9b>{recalc_task_prio+635}
<ffffffff80133c91>{activate_task+129}
<ffffffff8013340a>{task_rq_lock+74}
<ffffffff80132633>{__wake_up_common+67}
<ffffffff80313e1b>{tcp_fastretrans_alert+859}
<ffffffff8031536d>{tcp_ack+2541}
<ffffffff80320b79>{tcp_v4_do_rcv+233}
<ffffffffa017b034>{:ipt_state:match+36}
<ffffffff803214cb>{tcp_v4_rcv+1835}
<ffffffff80305fb9>{ip_local_deliver_finish+297}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff802f3a43>{nf_hook_slow+195}
<ffffffff80305e90>{ip_local_deliver_finish+0}
<ffffffff803062fe>{ip_local_deliver+622}
<ffffffff8030592e>{ip_rcv_finish+574} <ffffffff803056f0>{ip_rcv_finish+0}
<ffffffff803056f0>{ip_rcv_finish+0} <ffffffff802f3a43>{nf_hook_slow+195}
<ffffffff803056f0>{ip_rcv_finish+0} <ffffffff80305e34>{ip_rcv+1188}
<ffffffff802e9749>{netif_receive_skb+729}
<ffffffffa00e9c9c>{:e1000:e1000_clean+1820}
<ffffffff80132c13>{__wake_up+67} <ffffffff802e85b4>{net_rx_action+132}
<ffffffff8013dca1>{__do_softirq+113} <ffffffff8013dd55>{do_softirq+53}
<ffffffff80113e3f>{do_IRQ+335} <ffffffff8010f540>{mwait_idle+0}
<ffffffff80110cf5>{ret_from_intr+0} <EOI>
<ffffffff8010f596>{mwait_idle+86}
<ffffffff8010f9ea>{cpu_idle+26} <ffffffff804e971a>{start_kernel+490}
<ffffffff804e91e0>{_sinittext+480}



2005-02-28 15:24:32

by Benjamin L. Shi

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

We've seen these; adding the following tunables resolved the problem.
More specifically, the lower zone protection made the difference.

vm.vfs_cache_pressure = 1000
vm.lower_zone_protection = 100
vm.max_map_count = 32668
vm.min_free_kbytes = 10000



2005-02-28 15:37:08

by Ravindra Nadgauda

Subject: Signals/ Communication from kernel to user!



Hello,
We want to establish communication from a kernel module (possibly a
driver) to a user-level process.

We would like to know whether signals can be used for this purpose, or
whether there are other (better) methods of communication.

Regards,
Ravindra N.

2005-02-28 15:39:21

by Bernd Schubert

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Hello Benjamin,

On Monday 28 February 2005 16:23, Benjamin L. Shi wrote:
> We've seen these; adding the following tunables resolved the problem.
> More specifically, the lower zone protection made the difference.
>
> vm.vfs_cache_pressure=1000
> vm.lower_zone_protection=100
> vm.max_map_count = 32668
> vm.min_free_kbytes = 10000
>

Many thanks, we will test this now and set those values on all of our 2.6
systems.

Thanks a lot again,
Bernd

2005-02-28 16:51:50

by Timothy R. Chavez

Subject: Re: Signals/ Communication from kernel to user!

On Mon, 28 Feb 2005 21:06:57 +0530, Ravindra Nadgauda
<[email protected]> wrote:
>
>
> Hello,
> We wanted to establish a communication from kernel module (possibly a
> driver) to a user level process.
>
> Wanted to know whether signals can be used for this purpose OR there any
> other (better) methods of communication??

Perhaps netlink? Here's an introduction: http://qos.ittc.ku.edu/netlink/html/

>
> Regards,
> Ravindra N.

--
- Timothy R. Chavez

2005-02-28 19:17:39

by Nicholas Mc Guire

Subject: Re: Signals/ Communication from kernel to user!

>
>
> Hello,
> We wanted to establish a communication from kernel module (possibly a
> driver) to a user level process.
>
> Wanted to know whether signals can be used for this purpose OR there any
> other (better) methods of communication??
>
It's a bit brute force, but you can simply run through the task list and
signal the pid of your user-space app (example for a 2.4 kernel):

hofrat

---snip---
/*
 * Copyright 2002 Der Herr Hofrat
 * License GPL V2
 * Author [email protected]
 */
/*
 * Run through the Linux task list, search for the passed pid and send it
 * a SIGKILL. Load with insmod and the parameter pid=# to send the process
 * with pid # a kill signal.
 */

#include <linux/errno.h>   /* ESRCH */
#include <linux/kernel.h>  /* printk level */
#include <linux/module.h>  /* kernel version etc. */
#include <linux/sched.h>   /* task_struct, for_each_task, tasklist_lock */
#include <linux/signal.h>  /* signal number macros SIGHUP etc. */

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Der Herr Hofrat");
MODULE_DESCRIPTION("Signal to a user-space app from a kernel module");

int pid = 0;
MODULE_PARM(pid, "i");

int
ksignal(int pid, int signum)
{
	struct task_struct *p;
	int ret = -ESRCH;	/* returned if the requested pid is not found */

	/* run through the task list of linux until we find our pid */
	read_lock(&tasklist_lock);	/* keep the list stable while walking it */
	for_each_task(p) {
		if (p->pid == pid) {
			printk(KERN_INFO "sending signal %d for pid %d\n",
			       signum, (int)p->pid);
			/* don't have a sig_info struct to send along - pass 0 */
			ret = send_sig(signum, p, 0);
			break;
		}
	}
	read_unlock(&tasklist_lock);
	return ret;
}

int
init_module(void)
{
	/* send pid a SIGKILL */
	ksignal(pid, SIGKILL);
	return 0;
}

void
cleanup_module(void)
{
	printk(KERN_INFO "out of here\n");
}
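
On the receiving side, the user-space process has to be prepared for the
signal. SIGKILL, as sent by the module above, cannot be caught, so an actual
notification scheme would need a catchable signal such as SIGUSR1. The
following is only a minimal user-space sketch under that assumption; the
names install_notify_handler and notification_pending are hypothetical and
are not part of the original post.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t notified = 0;

/* Handler: only records that the signal arrived, nothing else. */
static void on_notify(int signum)
{
    (void)signum;
    notified = 1;
}

/* Install the SIGUSR1 handler (hypothetical helper). Returns 0 on success. */
int install_notify_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_notify;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Return 1 if a notification arrived since the last call, and clear it. */
int notification_pending(void)
{
    int was_set = notified;

    notified = 0;
    return was_set;
}
```

A process would call install_notify_handler() once at startup and then poll
notification_pending() in its main loop. A signal carries no payload beyond
its number, so any data still has to be fetched through another channel.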

2005-02-28 21:43:03

by Lee Revell

Subject: Re: Signals/ Communication from kernel to user!

On Mon, 2005-02-28 at 21:06 +0530, Ravindra Nadgauda wrote:
>
> Hello,
> We wanted to establish a communication from kernel module (possibly a
> driver) to a user level process.
>
> Wanted to know whether signals can be used for this purpose OR there any
> other (better) methods of communication??

If you need fast IPC forget about signals. They are way too slow.

The traditional UNIX way for userspace to talk to the kernel has been
ioctls. For various reasons ioctls are not highly regarded in the Linux
kernel community.

I believe the currently favored method is to have the driver create
sysfs entries which userspace read()s and write()s.

Really, your question is too vague. It would help if you said what
exactly you are trying to accomplish.

Lee

2005-03-01 01:12:38

by Robert Hancock

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Bernd Schubert wrote:
> Oh no, not these page allocation problems again. Last summer I already posted
> about page allocation errors with 2.6.7, but it seemed to me that nobody
> cared. Back then we got those errors every morning during the cron jobs, and
> our main file server always crashed completely.
> This time it's our cluster master system, and it first happened after an
> uptime of 89 days; the kernel is 2.6.9. Apart from those messages, the system
> still seems to run stably.
>
> I really beg for help here, so please, please, please help me solve this
> problem. What can I do to fix it?
>
> First a (dumb) question: what does 'page allocation failure' really mean?
> Is it some kind of out-of-memory case?
> Feb 28 10:04:45 hitchcock kernel: swapper: page allocation failure. order:1, mode:0x20
> Feb 28 10:04:45 hitchcock kernel:
> Feb 28 10:04:45 hitchcock kernel: Call Trace:<IRQ> <ffffffff8015b0de>{__alloc_pages+878} <ffffffff8015b10e>{__get_free_pages+14}
> Feb 28 10:04:45 hitchcock kernel: <ffffffff8015edc6>{kmem_getpages+38} <ffffffff803d064a>{ip_frag_create+26}
> Feb 28 10:04:45 hitchcock kernel: <ffffffff8016061e>{cache_grow+190} <ffffffff80160e80>{cache_alloc_refill+560}
> Feb 28 10:04:45 hitchcock kernel: <ffffffff801617e3>{__kmalloc+195} <ffffffff803b5680>{alloc_skb+64}
> Feb 28 10:04:45 hitchcock kernel: <ffffffff8031727e>{tg3_alloc_rx_skb+222} <ffffffff80317553>{tg3_rx+371}
> Feb 28 10:04:45 hitchcock kernel: <ffffffff80317977>{tg3_poll+183} <ffffffff803bc306>{net_rx_action+134}

Essentially the tg3 Ethernet driver is trying to allocate memory to
store a received packet, and is unable to do so. Since this is done
inside interrupt context, this allocation has to be serviced from
physical memory. Order 1 means it only wanted one page of memory, and
since that failed it looks like the system must have been awfully short
on available physical RAM. It could be some kind of kernel memory leak
or VM issue, though this condition may not be entirely unexpected in
certain cases, for example if the system has little physical RAM free at a
certain point and then a flood of network packets arrives.
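
The "mode:0x20" in the log line is the allocation's GFP flag mask, and it is
what marks the request as one that may not sleep. The sketch below decodes
such a mask. The bit values are an assumption, taken from memory of the
2.6-era include/linux/gfp.h (__GFP_WAIT = 0x10, __GFP_HIGH = 0x20, with
GFP_ATOMIC equal to __GFP_HIGH), and should be checked against the actual
kernel tree. The SKETCH_ names and helper functions are hypothetical.

```c
/* Assumed 2.6-era flag bits; verify against include/linux/gfp.h. */
#define SKETCH_GFP_WAIT 0x10u  /* caller may sleep while memory is reclaimed */
#define SKETCH_GFP_HIGH 0x20u  /* high priority, may dip into reserves */

/* Nonzero if an allocation with this mask is allowed to sleep. */
int gfp_may_sleep(unsigned int mode)
{
    return (mode & SKETCH_GFP_WAIT) != 0;
}

/* Nonzero if the mask marks a high-priority (atomic-style) request. */
int gfp_is_high_priority(unsigned int mode)
{
    return (mode & SKETCH_GFP_HIGH) != 0;
}
```

Under these assumed values, 0x20 is a high-priority request that cannot
sleep. That fits an allocation made from the network receive softirq: the
allocator cannot wait for reclaim and has to fail if no suitable free pages
are available right away.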

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/


2005-03-01 01:20:33

by Nigel Cunningham

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Hi.

On Tue, 2005-03-01 at 12:10, Robert Hancock wrote:
> Bernd Schubert wrote:
> Essentially the tg3 Ethernet driver is trying to allocate memory to
> store a received packet, and is unable to do so. Since this is done
> inside interrupt context, this allocation has to be serviced from
> physical memory. Order 1 means it only wanted one page of memory, and

Minor point, I know, but it's 2 pages of memory. If it couldn't get an
order zero page, that would be even greater hernia material!

Regards,

Nigel

> since that failed it looks like the system must have been awfully short
> on available physical RAM.. it could be some kind of kernel memory leak
> or VM issue, though this condition may not be entirely unexpected in
> certain cases, like if the system has little physical RAM free at a
> certain point and then a flood of network packets arrive.
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://softwaresuspend.berlios.de


2005-03-01 01:26:41

by Nick Piggin

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Robert Hancock wrote:

> Bernd Schubert wrote:
>
>> Oh no, not these page allocation problems again. Last summer I already
>> posted about page allocation errors with 2.6.7, but it seemed to me
>> that nobody cared. Back then we got those errors every morning
>> during the cron jobs, and our main file server always crashed
>> completely.
>> This time it's our cluster master system, and it first happened after
>> an uptime of 89 days; the kernel is 2.6.9. Apart from those messages,
>> the system still seems to run stably.
>>
>> I really beg for help here, so please, please, please help me solve
>> this problem. What can I do to fix it?
>>

You should upgrade to the newest kernel if possible. If that's not possible,
increase /proc/sys/vm/min_free_kbytes

This allocation failure really should not cause your system to crash, but
increasing min_free_kbytes will make it less likely that you will see an
allocation failure.

>> First a (dumb) question, what does 'page allocation failure' really
>> mean? Is it some out of memory case?
>> Feb 28 10:04:45 hitchcock kernel: swapper: page allocation failure.
>> order:1, mode:0x20
>> Feb 28 10:04:45 hitchcock kernel:
>> Feb 28 10:04:45 hitchcock kernel: Call Trace:<IRQ>
>> <ffffffff8015b0de>{__alloc_pages+878}
>> <ffffffff8015b10e>{__get_free_pages+14}
>> Feb 28 10:04:45 hitchcock kernel:
>> <ffffffff8015edc6>{kmem_getpages+38}
>> <ffffffff803d064a>{ip_frag_create+26}
>> Feb 28 10:04:45 hitchcock kernel:
>> <ffffffff8016061e>{cache_grow+190}
>> <ffffffff80160e80>{cache_alloc_refill+560}
>> Feb 28 10:04:45 hitchcock kernel:
>> <ffffffff801617e3>{__kmalloc+195} <ffffffff803b5680>{alloc_skb+64}
>> Feb 28 10:04:45 hitchcock kernel:
>> <ffffffff8031727e>{tg3_alloc_rx_skb+222} <ffffffff80317553>{tg3_rx+371}
>> Feb 28 10:04:45 hitchcock kernel:
>> <ffffffff80317977>{tg3_poll+183} <ffffffff803bc306>{net_rx_action+134}
>
>
> Essentially the tg3 Ethernet driver is trying to allocate memory to
> store a received packet, and is unable to do so. Since this is done
> inside interrupt context, this allocation has to be serviced from
> physical memory. Order 1 means it only wanted one page of memory, and
> since that failed it looks like the system must have been awfully
> short on available physical RAM.. it could be some kind of kernel
> memory leak or VM issue, though this condition may not be entirely
> unexpected in certain cases, like if the system has little physical
> RAM free at a certain point and then a flood of network packets arrive.
>

Yep. The reason these failures are being seen is that earlier kernels did
not reserve enough memory for GFP_ATOMIC allocations. Later kernels increased
this, and also made higher-order (i.e. greater than 0) GFP_ATOMIC allocations
more robust.
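
Before raising min_free_kbytes it helps to know the current value. The
following user-space sketch reads a single integer from a sysctl-style file
such as /proc/sys/vm/min_free_kbytes. The helper name read_sysctl_long is
hypothetical, and the sketch assumes the file holds one decimal number.

```c
#include <stdio.h>

/*
 * Read one decimal integer from a sysctl-style file (hypothetical helper).
 * Returns 0 on success and stores the number in *value, -1 on any failure.
 */
int read_sysctl_long(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int parsed;

    if (f == NULL)
        return -1;
    parsed = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}
```

Calling read_sysctl_long("/proc/sys/vm/min_free_kbytes", &kb) would then give
the reserve currently configured, which can be compared with a value such as
the 10000 suggested earlier in the thread.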


2005-03-01 01:29:42

by Robert Hancock

Subject: Re: swapper: page allocation failure. order:1, mode:0x20

Nigel Cunningham wrote:
> Hi.
>
> On Tue, 2005-03-01 at 12:10, Robert Hancock wrote:
>
>>Bernd Schubert wrote:
>>Essentially the tg3 Ethernet driver is trying to allocate memory to
>>store a received packet, and is unable to do so. Since this is done
>>inside interrupt context, this allocation has to be serviced from
>>physical memory. Order 1 means it only wanted one page of memory, and
>
>
> Minor point, I know, but it's 2 pages of memory. If it couldn't get an
> order zero page, that would be even greater hernia material!

Indeed.. off-by-one error :-)
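
The arithmetic behind this correction: an allocation of order n asks for 2^n
physically contiguous pages, so order 0 is one page and order 1 is two. A
small sketch follows, with hypothetical helper names. The page size is passed
in by the caller; the 4096 bytes used in the note after the code is an
assumption that matches the usual x86_64 configuration.

```c
/* Number of contiguous pages requested by an allocation of this order. */
unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/* Size in bytes of such an allocation for a given page size. */
unsigned long bytes_for_order(unsigned int order, unsigned long page_size)
{
    return page_size << order;
}
```

With 4096-byte pages, the failing order:1 request needed one contiguous
8192-byte block. That is harder to satisfy than a single page once free
memory is fragmented, even when individual free pages still exist.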