2004-04-19 12:52:54

by Hariprasad Nellitheertha

Subject: Problem with Netpoll based netdumping and NAPI

Hi All,

I am facing a problem while trying to take a network dump using LKCD. My
debugging so far indicates that it is caused by having both NAPI and
netpoll enabled.

I am using LKCD on the 2.6.5 kernel and both the client and server are
i386 boxes. The dumping machine has an e100 card. I have built the kernel
with both CONFIG_E100_NAPI and CONFIG_NET_POLL_CONTROLLER (and the other
netpoll-related options) selected.

LKCD uses netpoll for its network dump implementation. The problem is that
the network dump driver never receives any packets from the card driver,
so the dump fails. With NAPI enabled, e100_intr() calls netif_rx_schedule(),
which in turn defers processing of the received packets to the
NET_RX_SOFTIRQ softirq.

During a netdump, all the other CPUs are halted and interrupts are disabled,
so the ksoftirqd thread is never scheduled and these packets are never
processed. I suspect anyone using netpoll with NAPI enabled will hit this
problem.

As a workaround, I bypassed the NAPI logic whenever e100_intr() is entered
through netpoll. However, I think a better solution would be to let the
NAPI code handle this case.

Comments and suggestions on an effective way to fix this problem would be
welcome. Thanks in advance.

Regards, Hari
--
Hariprasad Nellitheertha
Linux Technology Center
India Software Labs
IBM India, Bangalore


2004-04-19 17:43:45

by Matt Mackall

Subject: Re: Problem with Netpoll based netdumping and NAPI

[changed cc: from linux-net to netdev]

On Mon, Apr 19, 2004 at 06:21:48PM +0530, Hariprasad Nellitheertha wrote:
> Hi All,
>
> I am facing a problem while trying to network dump using LKCD. My
> debugging so far indicates that this is due to both NAPI and NETPOLL
> being enabled.
>
> I am using LKCD on the 2.6.5 kernel and both the client and server are
> i386 boxes. The dumping machine has an e100 card. I have built the kernel
> with both CONFIG_E100_NAPI and CONFIG_NET_POLL_CONTROLLER (and the other
> netpoll related options) selected.
>
> LKCD uses netpoll for its network dump implementation. The problem we see
> is that the network dump driver does not receive any packet from the
> card driver and hence dumping fails. In e100_intr(), we call
> netif_rx_schedule() if we are using the NAPI feature. netif_rx_schedule,
> in turn, ends up adding the processing of this packet to the NET_RX_SOFTIRQ
> softirq.

Netpoll is supposed to call the NAPI poll function manually, right after
calling the interrupt handler, in netpoll_poll():

	/* If scheduling is stopped, tickle NAPI bits */
	if (trapped && np->dev->poll &&
	    test_bit(__LINK_STATE_RX_SCHED, &np->dev->state))
		np->dev->poll(np->dev, &budget);

Please make sure that LKCD calls netpoll_set_trap(1), which tells netpoll
that packet scheduling is stopped.

I've tested this path primarily with tg3 and kgdb-over-ethernet, but
the e100/LKCD combination should behave in much the same way.

--
Matt Mackall : http://www.selenic.com : Linux development and consulting

2004-04-21 06:02:39

by Hariprasad Nellitheertha

Subject: Re: Problem with Netpoll based netdumping and NAPI

Hi Matt,

On Mon, Apr 19, 2004 at 12:42:54PM -0500, Matt Mackall wrote:
> [changed cc: from linux-net to netdev]
>
> Netpoll should be manually calling the NAPI poll function like this
> after calling the interrupt handler (in netpoll_poll()):
>
> /* If scheduling is stopped, tickle NAPI bits */
> if(trapped && np->dev->poll &&
> test_bit(__LINK_STATE_RX_SCHED, &np->dev->state))
> np->dev->poll(np->dev, &budget);
>
> Please ensure that LKCD is calling netpoll_set_trap(1) which tells it
> that packet scheduling is stopped.

That was indeed the problem: LKCD was not calling netpoll_set_trap().
Adding the call fixed it. Thanks a lot for your help with this.

Regards, Hari

