2009-04-03 21:25:01

by Keyoor Khristi

Subject: Issue with netlink implementation

Hi All,

We are facing an issue with the netlink implementation. We are sending
data from user space to kernel space. A kernel thread gets the data
from the netlink socket (via skb_recv_datagram). After a little
processing we give the skb to the driver. The driver sends the data and
frees the skb. Sometimes we observe that the writer thread in userspace
gets stuck writing a packet and never returns. We traced the problem to
netlink_attachskb. When a packet is sent on the netlink socket,
netlink_attachskb is called to add the skb to the queue. When there is
not enough space, the thread is added to the nlk->wait queue and calls
schedule_timeout. It never returns because no other thread wakes it.
It seems that when the driver frees the skb, the data is freed and the
receive space is made available in the destructor, but the waiting
thread is not woken. This is causing the problem we're seeing.

I think the netlink implementation in af_netlink.c can be enhanced. In
netlink_attachskb, after invoking skb_set_owner_r, we should change
skb->destructor to point to a newly added function, netlink_rfree. When
the skb is freed, netlink_rfree can call sock_rfree and wake the
threads waiting on the nlk->wait queue.

Please let me know what you think about it.
I'm not sure if I've sent this email to the correct forum. If I should
be sending it to another forum, please tell me.

Regards,
K


2009-04-04 05:06:39

by David Miller

Subject: Re: Issue with netlink implementation

From: Keyoor Khristi
Date: Sat, 4 Apr 2009 02:54:48 +0530

You might want to ask networking questions on the networking
developer list, netdev@vger.kernel.org, CC:'d.

Thank you.


2009-04-04 09:33:30

by Patrick McHardy

Subject: Re: Issue with netlink implementation


This sounds like you're not using netlink_kernel_create() to create
your netlink socket. Messages from userspace to the kernel are processed
synchronously, so your process should never end up on the wait queue if
you've set up the netlink socket in the kernel properly:

int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
		    u32 pid, int nonblock)
{
	struct sock *sk;
	int err;
	long timeo;

	skb = netlink_trim(skb, gfp_any());

	timeo = sock_sndtimeo(ssk, nonblock);
retry:
	sk = netlink_getsockbypid(ssk, pid);
	if (IS_ERR(sk)) {
		kfree_skb(skb);
		return PTR_ERR(sk);
	}
	if (netlink_is_kernel(sk))
		return netlink_unicast_kernel(sk, skb);
...

>> a kernel thread gets the data from netlink socket (via
>> skb_recv_datagram)

You need to either use netlink kernel sockets or go through
recvmsg().