commit e22b886a8a43b ("sched/wait: Add might_sleep() checks") included
in today's linux-next added a check that fires on e1000 with netpoll:
BUG: sleeping function called from invalid context at kernel/irq/manage.c:104
in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: systemd
no locks held by systemd/1.
irq event stamp: 10102965
hardirqs last enabled at (10102965): [<ffffffff810cbafd>] vprintk_emit+0x2dd/0x6a0
hardirqs last disabled at (10102964): [<ffffffff810cb897>] vprintk_emit+0x77/0x6a0
softirqs last enabled at (10102342): [<ffffffff810666aa>] __do_softirq+0x27a/0x6f0
softirqs last disabled at (10102337): [<ffffffff81066e86>] irq_exit+0x56/0xe0
Preemption disabled at:[<ffffffff817de50d>] printk_emit+0x31/0x33
CPU: 1 PID: 1 Comm: systemd Not tainted 3.18.0-rc2-next-20141029-dirty #222
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014
ffffffff81a82291 ffff88001e743978 ffffffff817df31d 0000000000000000
0000000000000000 ffff88001e7439a8 ffffffff8108dfa2 ffff88001e7439a8
ffffffff81a82291 0000000000000068 0000000000000000 ffff88001e7439d8
Call Trace:
[<ffffffff817df31d>] dump_stack+0x4f/0x7c
[<ffffffff8108dfa2>] ___might_sleep+0x182/0x2b0
[<ffffffff8108e10a>] __might_sleep+0x3a/0xc0
[<ffffffff810ce358>] synchronize_irq+0x38/0xa0
[<ffffffff810ce633>] ? __disable_irq_nosync+0x43/0x70
[<ffffffff810ce690>] disable_irq+0x20/0x30
[<ffffffff815d7253>] e1000_netpoll+0x23/0x60
[<ffffffff81678d02>] netpoll_poll_dev+0x72/0x3a0
[<ffffffff817e9993>] ? _raw_spin_trylock+0x73/0x90
[<ffffffff8167920f>] ? netpoll_send_skb_on_dev+0x1df/0x2e0
[<ffffffff816791e7>] netpoll_send_skb_on_dev+0x1b7/0x2e0
[<ffffffff816795f3>] netpoll_send_udp+0x2e3/0x490
[<ffffffff815d1f61>] ? write_msg+0x51/0x140
[<ffffffff815d1fdf>] write_msg+0xcf/0x140
[<ffffffff810cadbb>] call_console_drivers.constprop.22+0x13b/0x2a0
[<ffffffff810cb6bd>] console_unlock+0x39d/0x500
[<ffffffff810cbb3e>] ? vprintk_emit+0x31e/0x6a0
[<ffffffff810cbb5c>] vprintk_emit+0x33c/0x6a0
[<ffffffff811a6c6e>] ? might_fault+0x5e/0xc0
[<ffffffff817de50d>] printk_emit+0x31/0x33
[<ffffffff810cbfad>] devkmsg_write+0xbd/0x110
[<ffffffff811f24d5>] do_iter_readv_writev+0x65/0xa0
[<ffffffff811f3b72>] do_readv_writev+0xe2/0x290
[<ffffffff810cbef0>] ? vprintk+0x30/0x30
[<ffffffff810d499d>] ? rcu_read_lock_held+0x6d/0x70
[<ffffffff812142d6>] ? __fget_light+0xc6/0xd0
[<ffffffff811f3dac>] vfs_writev+0x3c/0x50
[<ffffffff811f3eed>] SyS_writev+0x4d/0xe0
[<ffffffff817ea16d>] system_call_fastpath+0x16/0x1b
I'm able to reproduce it consistently by sending a lot of packets from
that interface while writing to /dev/kmsg with netconsole
enabled. Just writing to /dev/kmsg isn't enough.
# with ping -f or pktgen running
for i in `seq 1 20` ; do echo '............................................' > /dev/kmsg ; done
--
Sabrina
On Wed, Oct 29, 2014 at 04:56:20PM +0100, Sabrina Dubroca wrote:
> commit e22b886a8a43b ("sched/wait: Add might_sleep() checks") included
> in today's linux-next added a check that fires on e1000 with netpoll:
>
>
> BUG: sleeping function called from invalid context at kernel/irq/manage.c:104
> in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: systemd
> no locks held by systemd/1.
> irq event stamp: 10102965
> hardirqs last enabled at (10102965): [<ffffffff810cbafd>] vprintk_emit+0x2dd/0x6a0
> hardirqs last disabled at (10102964): [<ffffffff810cb897>] vprintk_emit+0x77/0x6a0
> softirqs last enabled at (10102342): [<ffffffff810666aa>] __do_softirq+0x27a/0x6f0
> softirqs last disabled at (10102337): [<ffffffff81066e86>] irq_exit+0x56/0xe0
> Preemption disabled at:[<ffffffff817de50d>] printk_emit+0x31/0x33
>
> CPU: 1 PID: 1 Comm: systemd Not tainted 3.18.0-rc2-next-20141029-dirty #222
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014
> ffffffff81a82291 ffff88001e743978 ffffffff817df31d 0000000000000000
> 0000000000000000 ffff88001e7439a8 ffffffff8108dfa2 ffff88001e7439a8
> ffffffff81a82291 0000000000000068 0000000000000000 ffff88001e7439d8
> Call Trace:
> [<ffffffff817df31d>] dump_stack+0x4f/0x7c
> [<ffffffff8108dfa2>] ___might_sleep+0x182/0x2b0
> [<ffffffff8108e10a>] __might_sleep+0x3a/0xc0
> [<ffffffff810ce358>] synchronize_irq+0x38/0xa0
> [<ffffffff810ce690>] disable_irq+0x20/0x30
> [<ffffffff815d7253>] e1000_netpoll+0x23/0x60
> [<ffffffff81678d02>] netpoll_poll_dev+0x72/0x3a0
> [<ffffffff816791e7>] netpoll_send_skb_on_dev+0x1b7/0x2e0
> [<ffffffff816795f3>] netpoll_send_udp+0x2e3/0x490
Oh cute.. not entirely sure what to do there. This only works if you
_know_ desc->threads_active will never be !0.
The best I can come up with is something like this, which avoids the
might_sleep() in the one special case.
Thomas?
---
kernel/irq/manage.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..b7cb736a8b32 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -100,8 +100,11 @@ void synchronize_irq(unsigned int irq)
* running. Now verify that no threaded handlers are
* active.
*/
- wait_event(desc->wait_for_threads,
- !atomic_read(&desc->threads_active));
+ if (atomic_read(&desc->threads_active)) {
+ might_sleep();
+ __wait_event(desc->wait_for_threads,
+ !atomic_read(&desc->threads_active));
+ }
}
}
EXPORT_SYMBOL(synchronize_irq);
On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 04:56:20PM +0100, Sabrina Dubroca wrote:
> > commit e22b886a8a43b ("sched/wait: Add might_sleep() checks") included
> > in today's linux-next added a check that fires on e1000 with netpoll:
> >
> >
> > BUG: sleeping function called from invalid context at kernel/irq/manage.c:104
> > in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: systemd
> > no locks held by systemd/1.
> > irq event stamp: 10102965
> > hardirqs last enabled at (10102965): [<ffffffff810cbafd>] vprintk_emit+0x2dd/0x6a0
> > hardirqs last disabled at (10102964): [<ffffffff810cb897>] vprintk_emit+0x77/0x6a0
> > softirqs last enabled at (10102342): [<ffffffff810666aa>] __do_softirq+0x27a/0x6f0
> > softirqs last disabled at (10102337): [<ffffffff81066e86>] irq_exit+0x56/0xe0
> > Preemption disabled at:[<ffffffff817de50d>] printk_emit+0x31/0x33
> >
> > CPU: 1 PID: 1 Comm: systemd Not tainted 3.18.0-rc2-next-20141029-dirty #222
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014
> > ffffffff81a82291 ffff88001e743978 ffffffff817df31d 0000000000000000
> > 0000000000000000 ffff88001e7439a8 ffffffff8108dfa2 ffff88001e7439a8
> > ffffffff81a82291 0000000000000068 0000000000000000 ffff88001e7439d8
> > Call Trace:
> > [<ffffffff817df31d>] dump_stack+0x4f/0x7c
> > [<ffffffff8108dfa2>] ___might_sleep+0x182/0x2b0
> > [<ffffffff8108e10a>] __might_sleep+0x3a/0xc0
> > [<ffffffff810ce358>] synchronize_irq+0x38/0xa0
> > [<ffffffff810ce690>] disable_irq+0x20/0x30
> > [<ffffffff815d7253>] e1000_netpoll+0x23/0x60
> > [<ffffffff81678d02>] netpoll_poll_dev+0x72/0x3a0
> > [<ffffffff816791e7>] netpoll_send_skb_on_dev+0x1b7/0x2e0
> > [<ffffffff816795f3>] netpoll_send_udp+0x2e3/0x490
>
> Oh cute.. not entirely sure what to do there. This only works if you
> _know_ desc->threads_active will never be !0.
>
> The best I can come up with is something like this, which avoids the
> might_sleep() in the one special case.
>
> Thomas?
Yuck. No. You are just papering over the problem.
What happens if you add 'threadirqs' to the kernel command line? Or if
the interrupt line is shared with a real threaded interrupt user?
The proper solution is to have a poll_lock for e1000 which serializes
the hardware interrupt against netpoll instead of using
disable/enable_irq().
In fact that's less expensive than the disable/enable_irq() dance and
the chance of contention is pretty low. If done right it will be a
NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
Thanks,
tglx
On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> Yuck. No. You are just papering over the problem.
>
> What happens if you add 'threadirqs' to the kernel command line? Or if
> the interrupt line is shared with a real threaded interrupt user?
>
> The proper solution is to have a poll_lock for e1000 which serializes
> the hardware interrupt against netpoll instead of using
> disable/enable_irq().
>
> In fact that's less expensive than the disable/enable_irq() dance and
> the chance of contention is pretty low. If done right it will be a
> NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
>
OK a little something like so then I suppose.. But I suspect most all
the network drivers will need this and maybe more, disable_irq() is a
popular little thing and we 'just' changed semantics on them.
---
drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
kernel/irq/manage.c | 2 +-
3 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 69707108d23c..3f48609f2318 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -323,6 +323,8 @@ struct e1000_adapter {
struct delayed_work watchdog_task;
struct delayed_work fifo_stall_task;
struct delayed_work phy_info_task;
+
+ spinlock_t irq_lock;
};
enum e1000_state_t {
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 5f6aded512f5..d12cbffe2149 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
e1000_irq_disable(adapter);
spin_lock_init(&adapter->stats_lock);
+ spin_lock_init(&adapter->irq_lock);
set_bit(__E1000_DOWN, &adapter->flags);
@@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
* @irq: interrupt number
* @data: pointer to a network interface device structure
**/
-static irqreturn_t e1000_intr(int irq, void *data)
+static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
{
- struct net_device *netdev = data;
- struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
u32 icr = er32(ICR);
@@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
return IRQ_HANDLED;
}
+static irqreturn_t e1000_intr(int irq, void *data)
+{
+ struct net_device *netdev = data;
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ irqreturn_t ret;
+
+ spin_lock(&adapter->irq_lock);
+ ret = __e1000_intr(irq, adapter);
+ spin_unlock(&adapter->irq_lock);
+
+ return ret;
+}
+
/**
* e1000_clean - NAPI Rx polling callback
* @adapter: board private structure
@@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
- disable_irq(adapter->pdev->irq);
+	spin_lock(&adapter->irq_lock);
 	e1000_intr(adapter->pdev->irq, netdev);
-	enable_irq(adapter->pdev->irq);
+	spin_unlock(&adapter->irq_lock);
}
#endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..b5a4a06bf2fd 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
* to complete before returning. If you use this function while
* holding a resource the IRQ handler may need you will deadlock.
*
- * This function may be called - with care - from IRQ context.
+ * This function may _NOT_ be called from IRQ context.
*/
void disable_irq(unsigned int irq)
{
On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> >
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> >
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> >
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> >
>
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.
Thomas- if you are fine with Peter's patch, I can get this under
testing.
>
> ---
> drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
> drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
> kernel/irq/manage.c | 2 +-
> 3 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
> struct delayed_work watchdog_task;
> struct delayed_work fifo_stall_task;
> struct delayed_work phy_info_task;
> +
> + spinlock_t irq_lock;
> };
>
> enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
> e1000_irq_disable(adapter);
>
> spin_lock_init(&adapter->stats_lock);
> + spin_lock_init(&adapter->irq_lock);
>
> set_bit(__E1000_DOWN, &adapter->flags);
>
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
> * @irq: interrupt number
> * @data: pointer to a network interface device structure
> **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
> {
> - struct net_device *netdev = data;
> - struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = &adapter->hw;
> u32 icr = er32(ICR);
>
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
> return IRQ_HANDLED;
> }
>
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> + struct net_device *netdev = data;
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + irqreturn_t ret;
> +
> + spin_lock(&adapter->irq_lock);
> + ret = __e1000_intr(irq, adapter);
> + spin_unlock(&adapter->irq_lock);
> +
> + return ret;
> +}
> +
> /**
> * e1000_clean - NAPI Rx polling callback
> * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>
> - disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock);
> 	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock);
> }
> #endif
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
> * to complete before returning. If you use this function while
> * holding a resource the IRQ handler may need you will deadlock.
> *
> - * This function may be called - with care - from IRQ context.
> + * This function may _NOT_ be called from IRQ context.
> */
> void disable_irq(unsigned int irq)
> {
On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> >
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> >
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> >
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> >
>
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.
We changed that almost 4 years ago :) What we 'just' did was to add a
prominent warning into the code.
> ---
> drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
> drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
> kernel/irq/manage.c | 2 +-
> 3 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
> struct delayed_work watchdog_task;
> struct delayed_work fifo_stall_task;
> struct delayed_work phy_info_task;
> +
> + spinlock_t irq_lock;
> };
>
> enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
> e1000_irq_disable(adapter);
>
> spin_lock_init(&adapter->stats_lock);
> + spin_lock_init(&adapter->irq_lock);
>
> set_bit(__E1000_DOWN, &adapter->flags);
>
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
> * @irq: interrupt number
> * @data: pointer to a network interface device structure
> **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
> {
> - struct net_device *netdev = data;
> - struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = &adapter->hw;
> u32 icr = er32(ICR);
>
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
> return IRQ_HANDLED;
> }
>
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> + struct net_device *netdev = data;
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + irqreturn_t ret;
> +
> + spin_lock(&adapter->irq_lock);
> + ret = __e1000_intr(irq, adapter);
> + spin_unlock(&adapter->irq_lock);
> +
> + return ret;
> +}
> +
> /**
> * e1000_clean - NAPI Rx polling callback
> * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>
> - disable_irq(adapter->pdev->irq);
> +	spin_lock(&adapter->irq_lock);
> 	e1000_intr(adapter->pdev->irq, netdev);
> -	enable_irq(adapter->pdev->irq);
> +	spin_unlock(&adapter->irq_lock);
> }
> #endif
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
> * to complete before returning. If you use this function while
> * holding a resource the IRQ handler may need you will deadlock.
> *
> - * This function may be called - with care - from IRQ context.
> + * This function may _NOT_ be called from IRQ context.
It can only be called from preemptible thread context.
Thanks,
tglx
On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
>
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > >
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > >
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > >
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > >
> >
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
>
> We changed that almost 4 years ago :) What we 'just' did was to add a
> prominent warning into the code.
You know that is the same right... they didn't know it was broken
therefore it wasn't :-), but now they need to go actually do stuff about
it, an entirely different proposition.
On Wed, 29 Oct 2014, Jeff Kirsher wrote:
> On Wed, 2014-10-29 at 20:36 +0100, Peter Zijlstra wrote:
> > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > Yuck. No. You are just papering over the problem.
> > >
> > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > the interrupt line is shared with a real threaded interrupt user?
> > >
> > > The proper solution is to have a poll_lock for e1000 which serializes
> > > the hardware interrupt against netpoll instead of using
> > > disable/enable_irq().
> > >
> > > In fact that's less expensive than the disable/enable_irq() dance and
> > > the chance of contention is pretty low. If done right it will be a
> > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > >
> >
> > OK a little something like so then I suppose.. But I suspect most all
> > the network drivers will need this and maybe more, disable_irq() is a
> > popular little thing and we 'just' changed semantics on them.
>
> Thomas- if you are fine with Peter's patch, I can get this under
> testing.
I'm fine with it except for the comment part of disable_irq(), but
that does not matter :)
One nitpick: Instead of having the lock unconditionally, I'd make it
depend on CONFIG_NET_POLL_CONTROLLER.
#ifdef CONFIG_NET_POLL_CONTROLLER
static inline void netpoll_lock(struct e1000_adapter *adapter)
{
spin_lock(&adapter->irq_lock);
}
static inline void netpoll_unlock(struct e1000_adapter *adapter)
{
spin_unlock(&adapter->irq_lock);
}
#else
static inline void netpoll_lock(struct e1000_adapter *adapter) { }
static inline void netpoll_unlock(struct e1000_adapter *adapter) { }
#endif
and use that instead of the unconditional spin[un]lock() invocations.
But that's up to you.
Thanks,
tglx
On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> >
> > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > Yuck. No. You are just papering over the problem.
> > > >
> > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > the interrupt line is shared with a real threaded interrupt user?
> > > >
> > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > the hardware interrupt against netpoll instead of using
> > > > disable/enable_irq().
> > > >
> > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > the chance of contention is pretty low. If done right it will be a
> > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > >
> > >
> > > OK a little something like so then I suppose.. But I suspect most all
> > > the network drivers will need this and maybe more, disable_irq() is a
> > > popular little thing and we 'just' changed semantics on them.
> >
> > We changed that almost 4 years ago :) What we 'just' did was to add a
> > prominent warning into the code.
>
> You know that is the same right... they didn't know it was broken
> therefore it wasn't :-), but now they need to go actually do stuff about
> it, an entirely different proposition.
Right, and of course the world and some more has the very same code
there:
poll_controller()
{
disable_irq();
dev_interrupt_handler();
enable_irq();
}
Trying to twist my brain to come up with a solution which avoids the
spinlock, but I have a hard time coming up with one.
The only thing I came up with so far is to avoid adding locks to every
driver incarnation and instead put it into struct net_device and
provide helper functions for the lock/unlock case.
That does not change the fact that we need to deal with that on a per
driver basis :(
Thanks,
tglx
On Wed, 29 Oct 2014, Thomas Gleixner wrote:
> On Wed, 29 Oct 2014, Peter Zijlstra wrote:
>
> > On Wed, Oct 29, 2014 at 08:49:03PM +0100, Thomas Gleixner wrote:
> > > On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> > >
> > > > On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > > > > Yuck. No. You are just papering over the problem.
> > > > >
> > > > > What happens if you add 'threadirqs' to the kernel command line? Or if
> > > > > the interrupt line is shared with a real threaded interrupt user?
> > > > >
> > > > > The proper solution is to have a poll_lock for e1000 which serializes
> > > > > the hardware interrupt against netpoll instead of using
> > > > > disable/enable_irq().
> > > > >
> > > > > In fact that's less expensive than the disable/enable_irq() dance and
> > > > > the chance of contention is pretty low. If done right it will be a
> > > > > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> > > > >
> > > >
> > > > OK a little something like so then I suppose.. But I suspect most all
> > > > the network drivers will need this and maybe more, disable_irq() is a
> > > > popular little thing and we 'just' changed semantics on them.
> > >
> > > We changed that almost 4 years ago :) What we 'just' did was to add a
> > > prominent warning into the code.
> >
> > You know that is the same right... they didn't know it was broken
> > therefore it wasn't :-), but now they need to go actually do stuff about
> > it, an entirely different proposition.
>
> Right, and of course the world and some more has the very same code
> there:
>
> poll_controller()
> {
> disable_irq();
> dev_interrupt_handler();
> enable_irq();
> }
>
> Trying to twist my brain to come up with a solution which avoids the
> spinlock, but I have a hard time coming up with one.
>
> The only thing I came up with so far is to avoid adding locks to every
> driver incarnation and instead put it into struct net_device and
> provide helper functions for the lock/unlock case.
>
> That does not change the fact that we need to deal with that on a per
> driver basis :(
But at least it allows to mitigate the impact by making it conditional
at a central point.
static inline void netpoll_lock(struct net_device *nd)
{
if (netpoll_active(nd))
spin_lock(&nd->netpoll_lock);
}
and let the core code make sure that activation/deactivation of
netpoll on a particular interface is serialized against the interrupt
and netpoll calls.
Not sure if it's worth the trouble, but at least it allows to deal
with it in the core instead of dealing with it on a per-driver basis.
Thanks,
tglx
On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> But at least it allows to mitigate the impact by making it conditional
> at a central point.
>
> static inline void netpoll_lock(struct net_device *nd)
> {
> if (netpoll_active(nd))
> spin_lock(&nd->netpoll_lock);
> }
branch fail vs lock might be a toss on most machines, but if we're
hitting cold cachelines we lose big.
> and let the core code make sure that activation/deactivation of
> netpoll on a particular interface is serialized against the interrupt
> and netpoll calls.
>
> Not sure if it's worth the trouble, but at least it allows to deal
> with it in the core instead of dealing with it on a per-driver basis.
Does multi-queue have one netdev per queue or does that need moar
logicz?
On Wed, 29 Oct 2014, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 09:23:42PM +0100, Thomas Gleixner wrote:
> > But at least it allows to mitigate the impact by making it conditional
> > at a central point.
> >
> > static inline void netpoll_lock(struct net_device *nd)
> > {
> > if (netpoll_active(nd))
> > spin_lock(&nd->netpoll_lock);
> > }
>
> branch fail vs lock might be a toss on most machines, but if we're
> hitting cold cachelines we lose big.
Well, if the net_device is not cache hot on irq entry you have lost
already. The extra branch/lock is not going to add much to that.
Thanks,
tglx
Hello, sorry for the delay.
2014-10-29, 20:36:03 +0100, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
> > Yuck. No. You are just papering over the problem.
> >
> > What happens if you add 'threadirqs' to the kernel command line? Or if
> > the interrupt line is shared with a real threaded interrupt user?
> >
> > The proper solution is to have a poll_lock for e1000 which serializes
> > the hardware interrupt against netpoll instead of using
> > disable/enable_irq().
> >
> > In fact that's less expensive than the disable/enable_irq() dance and
> > the chance of contention is pretty low. If done right it will be a
> > NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
> >
>
> OK a little something like so then I suppose.. But I suspect most all
> the network drivers will need this and maybe more, disable_irq() is a
> popular little thing and we 'just' changed semantics on them.
>
> ---
> drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
> drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
> kernel/irq/manage.c | 2 +-
> 3 files changed, 20 insertions(+), 6 deletions(-)
I've been running with variants of this patch, things seem ok.
As noted earlier, there are a lot of drivers doing this disable_irq +
irq_handler + enable_irq sequence. I found about 60.
Many already take a lock in the interrupt handler, and look like we
could just remove the call to disable_irq (example: cp_interrupt,
drivers/net/ethernet/realtek/8139cp.c).
Here's how I modified your patch. The locking compiles away if
CONFIG_NET_POLL_CONTROLLER=n.
I can work on converting all the drivers from disable_irq to
netpoll_irq_lock, if that's okay with you.
In igb there's also a synchronize_irq() called from the netpoll
controller (in igb_irq_disable()), I think a similar locking scheme
would work.
I also saw a few disable_irq_nosync and disable_percpu_irq. Are these
okay?
Thanks.
---
diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 69707108d23c..79444125b9bd 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -68,6 +68,7 @@
#include <linux/mii.h>
#include <linux/ethtool.h>
#include <linux/if_vlan.h>
+#include <linux/netpoll.h>
#define BAR_0 0
#define BAR_1 1
@@ -323,6 +324,8 @@ struct e1000_adapter {
struct delayed_work watchdog_task;
struct delayed_work fifo_stall_task;
struct delayed_work phy_info_task;
+
+ struct netpoll_irq_lock netpoll_lock;
};
enum e1000_state_t {
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 24f3986cfae2..5749a27e5c5e 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -32,6 +32,7 @@
#include <linux/prefetch.h>
#include <linux/bitops.h>
#include <linux/if_vlan.h>
+#include <linux/netpoll.h>
char e1000_driver_name[] = "e1000";
static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
@@ -1313,6 +1314,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
e1000_irq_disable(adapter);
spin_lock_init(&adapter->stats_lock);
+ netpoll_irq_lock_init(&adapter->netpoll_lock);
set_bit(__E1000_DOWN, &adapter->flags);
@@ -3751,10 +3753,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
* @irq: interrupt number
* @data: pointer to a network interface device structure
**/
-static irqreturn_t e1000_intr(int irq, void *data)
+static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
{
- struct net_device *netdev = data;
- struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
u32 icr = er32(ICR);
@@ -3796,6 +3796,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
return IRQ_HANDLED;
}
+static irqreturn_t e1000_intr(int irq, void *data)
+{
+ struct net_device *netdev = data;
+ struct e1000_adapter *adapter = netdev_priv(netdev);
+ irqreturn_t ret;
+
+ netpoll_irq_lock(&adapter->netpoll_lock);
+ ret = __e1000_intr(irq, adapter);
+ netpoll_irq_unlock(&adapter->netpoll_lock);
+
+ return ret;
+}
+
/**
* e1000_clean - NAPI Rx polling callback
* @adapter: board private structure
@@ -5220,9 +5233,7 @@ static void e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
- disable_irq(adapter->pdev->irq);
e1000_intr(adapter->pdev->irq, netdev);
- enable_irq(adapter->pdev->irq);
}
#endif
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index b25ee9ffdbe6..a171f1a50e0e 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -117,4 +117,35 @@ static inline bool netpoll_tx_running(struct net_device *dev)
}
#endif
+struct netpoll_irq_lock {
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ spinlock_t lock;
+#endif
+};
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static inline void netpoll_irq_lock_init(struct netpoll_irq_lock *np_lock)
+{
+ spin_lock_init(&np_lock->lock);
+}
+static inline void netpoll_irq_lock(struct netpoll_irq_lock *np_lock)
+{
+ spin_lock(&np_lock->lock);
+}
+static inline void netpoll_irq_unlock(struct netpoll_irq_lock *np_lock)
+{
+ spin_unlock(&np_lock->lock);
+}
+#else
+static inline void netpoll_irq_lock_init(struct netpoll_irq_lock *np_lock)
+{
+}
+static inline void netpoll_irq_lock(struct netpoll_irq_lock *np_lock)
+{
+}
+static inline void netpoll_irq_unlock(struct netpoll_irq_lock *np_lock)
+{
+}
+#endif
+
#endif
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index 69707108d23c..3f48609f2318 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -323,6 +323,8 @@ struct e1000_adapter {
> struct delayed_work watchdog_task;
> struct delayed_work fifo_stall_task;
> struct delayed_work phy_info_task;
> +
> + spinlock_t irq_lock;
> };
>
> enum e1000_state_t {
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 5f6aded512f5..d12cbffe2149 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -1310,6 +1310,7 @@ static int e1000_sw_init(struct e1000_adapter *adapter)
> e1000_irq_disable(adapter);
>
> spin_lock_init(&adapter->stats_lock);
> + spin_lock_init(&adapter->irq_lock);
>
> set_bit(__E1000_DOWN, &adapter->flags);
>
> @@ -3748,10 +3749,8 @@ void e1000_update_stats(struct e1000_adapter *adapter)
> * @irq: interrupt number
> * @data: pointer to a network interface device structure
> **/
> -static irqreturn_t e1000_intr(int irq, void *data)
> +static irqreturn_t __e1000_intr(int irq, struct e1000_adapter *adapter)
> {
> - struct net_device *netdev = data;
> - struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = &adapter->hw;
> u32 icr = er32(ICR);
>
> @@ -3793,6 +3792,19 @@ static irqreturn_t e1000_intr(int irq, void *data)
> return IRQ_HANDLED;
> }
>
> +static irqreturn_t e1000_intr(int irq, void *data)
> +{
> + struct net_device *netdev = data;
> + struct e1000_adapter *adapter = netdev_priv(netdev);
> + irqreturn_t ret;
> +
> + spin_lock(&adapter->irq_lock);
> + ret = __e1000_intr(irq, adapter);
> + spin_unlock(&adapter->irq_lock);
> +
> + return ret;
> +}
> +
> /**
> * e1000_clean - NAPI Rx polling callback
> * @adapter: board private structure
> @@ -5217,9 +5229,9 @@ static void e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>
> - disable_irq(adapter->pdev->irq);
> - e1000_intr(adapter->pdev->irq, netdev);
> - enable_irq(adapter->pdev->irq);
> + spin_lock(&adapter->irq_lock);
> + __e1000_intr(adapter->pdev->irq, adapter);
> + spin_unlock(&adapter->irq_lock);
> }
> #endif
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 0a9104b4608b..b5a4a06bf2fd 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -427,7 +427,7 @@ EXPORT_SYMBOL(disable_irq_nosync);
> * to complete before returning. If you use this function while
> * holding a resource the IRQ handler may need you will deadlock.
> *
> - * This function may be called - with care - from IRQ context.
> + * This function may _NOT_ be called from IRQ context.
> */
> void disable_irq(unsigned int irq)
> {
--
Sabrina
On 12/02/14 17:35, Sabrina Dubroca wrote:
> Hello, sorry for the delay.
>
> 2014-10-29, 20:36:03 +0100, Peter Zijlstra wrote:
>> On Wed, Oct 29, 2014 at 07:33:00PM +0100, Thomas Gleixner wrote:
>>> Yuck. No. You are just papering over the problem.
>>>
>>> What happens if you add 'threadirqs' to the kernel command line? Or if
>>> the interrupt line is shared with a real threaded interrupt user?
>>>
>>> The proper solution is to have a poll_lock for e1000 which serializes
>>> the hardware interrupt against netpoll instead of using
>>> disable/enable_irq().
>>>
>>> In fact that's less expensive than the disable/enable_irq() dance and
>>> the chance of contention is pretty low. If done right it will be a
>>> NOOP for the CONFIG_NET_POLL_CONTROLLER=n case.
>>>
>>
>> OK a little something like so then I suppose.. But I suspect most all
>> the network drivers will need this and maybe more, disable_irq() is a
>> popular little thing and we 'just' changed semantics on them.
>>
>> ---
>> drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
>> drivers/net/ethernet/intel/e1000/e1000_main.c | 22 +++++++++++++++++-----
>> kernel/irq/manage.c | 2 +-
>> 3 files changed, 20 insertions(+), 6 deletions(-)
>
> I've been running with variants of this patch, things seem ok.
>
> As noted earlier, there are a lot of drivers doing this disable_irq +
> irq_handler + enable_irq sequence. I found about 60.
> Many already take a lock in the interrupt handler, and look like we
> could just remove the call to disable_irq (example: cp_interrupt,
> drivers/net/ethernet/realtek/8139cp.c).
>
> Here's how I modified your patch. The locking compiles away if
> CONFIG_NET_POLL_CONTROLLER=n.
>
> I can work on converting all the drivers from disable_irq to
> netpoll_irq_lock, if that's okay with you.
>
> In igb there's also a synchronize_irq() called from the netpoll
> controller (in igb_irq_disable()), I think a similar locking scheme
> would work.
> I also saw a few disable_irq_nosync and disable_percpu_irq. These are
> okay?
>
> [ ... ]
Hello,
Earlier today I ran into the bug mentioned at the start of this thread
with kernel 3.19-rc1 and the e1000e driver. Can anyone tell me what the
latest status is?
Thanks,
Bart.