From: Cong Wang
To: "Kirill A. Shutemov"
Cc: "David S. Miller", netdev, "linux-kernel@vger.kernel.org", Thomas Graf
Subject: Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
Date: Mon, 13 Jul 2015 22:11:27 -0700
In-Reply-To: <20150713131825.GA16186@node.dhcp.inet.fi>
References: <20150713131825.GA16186@node.dhcp.inet.fi>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 13, 2015 at 6:18 AM, Kirill A. Shutemov wrote:
> Hi,
>
> This simple test-case trigers few locking asserts in kernel:
>
> #define _GNU_SOURCE
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #define SOL_NETLINK 270
>
> int main(int argc, char **argv)
> {
>         unsigned int block_size = 16 * 4096;
>         struct nl_mmap_req req = {
>                 .nm_block_size = block_size,
>                 .nm_block_nr = 64,
>                 .nm_frame_size = 16384,
>                 .nm_frame_nr = 64 * block_size / 16384,
>         };
>         unsigned int ring_size;
>         int fd;
>
>         fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>         if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
>                 exit(1);
>
>         ring_size = req.nm_block_nr * req.nm_block_size;
>         mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>         return 0;
> }
>
> +++ exited with 0 +++
> [ 2.500126] BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
> [ 2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
> [ 2.501997] 3 locks held by init/1:
> [ 2.502380] #0: (reboot_mutex){+.+...}, at: [] SyS_reboot+0xa9/0x220
> [ 2.503328] #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [] __blocking_notifier_call_chain+0x39/0x70
> [ 2.504659] #2: (rcu_callback){......}, at: [] rcu_do_batch.isra.49+0x160/0x10c0
> [ 2.505724] Preemption disabled at:[] __delay+0xf/0x20
> [ 2.506443]
> [ 2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
> [ 2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
> [ 2.508386] ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
> [ 2.509233] 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
> [ 2.510057] ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
> [ 2.510882] Call Trace:
> [ 2.511146] [] dump_stack+0x4f/0x7b
> [ 2.511763] [] ___might_sleep+0x16d/0x270
> [ 2.512476] [] __might_sleep+0x4d/0x90
> [ 2.513071] [] mutex_lock_nested+0x2f/0x430
> [ 2.513683] [] ? _raw_spin_unlock_irqrestore+0x5d/0x80
> [ 2.514385] [] ? __this_cpu_preempt_check+0x13/0x20
> [ 2.515066] [] netlink_set_ring+0x1ed/0x350
> [ 2.515694] [] ? netlink_undo_bind+0x70/0x70
> [ 2.516411] [] netlink_sock_destruct+0x80/0x150
> [ 2.517070] [] __sk_free+0x1d/0x160
> [ 2.517607] [] sk_free+0x19/0x20
> [ 2.518118] [] deferred_put_nlk_sk+0x20/0x30
> [ 2.518735] [] rcu_do_batch.isra.49+0x79c/0x10c0

Caused by:

commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7
Author: Thomas Graf
Date:   Fri Jan 2 23:00:22 2015 +0100

    netlink: Lockless lookup with RCU grace period in socket release

    Defers the release of the socket reference using call_rcu() to
    allow using an RCU read-side protected call to rhashtable_lookup()

    This restores behaviour and performance gains as previously
    introduced by e341694 ("netlink: Convert netlink_lookup() to use
    RCU protected hash table") without the side effect of severely
    delayed socket destruction.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

We can't take a mutex in an RCU callback. As the trace shows,
netlink_sock_destruct() now runs from deferred_put_nlk_sk(), and
netlink_set_ring() takes a mutex there. Perhaps we could defer the
mmap ring cleanup to a workqueue.
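
To illustrate the idea, here is a rough userspace model of the deferral, not kernel code and not a proposed patch. Every name in it (model_sock, work_item, rcu_callback_direct(), rcu_callback_deferred(), run_pending_work() and so on) is hypothetical and made up for this sketch. It assumes only that the callback runs in a context where sleeping is forbidden and that the ring teardown needs a sleeping lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */

struct work_item {
	void (*fn)(struct work_item *w);
	struct work_item *next;
};

struct model_sock {
	struct work_item cleanup_work;	/* embedded, like a work_struct */
	bool ring_mapped;
};

static bool atomic_context;		/* stands in for in_atomic() */
static int sleep_in_atomic_bugs;	/* counts would-be "BUG: sleeping..." hits */
static struct work_item *pending_head, *pending_tail;

/* Stands in for mutex_lock(): only legal where sleeping is allowed. */
static void model_mutex_lock(void)
{
	if (atomic_context)
		sleep_in_atomic_bugs++;
}

static void model_mutex_unlock(void)
{
}

/* Ring teardown needs the mutex, so it has to run where sleeping is fine. */
static void ring_cleanup(struct work_item *w)
{
	struct model_sock *sk = (struct model_sock *)
		((char *)w - offsetof(struct model_sock, cleanup_work));

	model_mutex_lock();
	sk->ring_mapped = false;
	model_mutex_unlock();
}

/* Never sleeps: it only appends the item to the pending list. */
static void queue_work_item(struct work_item *w)
{
	w->next = NULL;
	if (pending_tail)
		pending_tail->next = w;
	else
		pending_head = w;
	pending_tail = w;
}

/* Current behaviour: the callback tears the ring down directly. */
static void rcu_callback_direct(struct model_sock *sk)
{
	atomic_context = true;
	ring_cleanup(&sk->cleanup_work);
	atomic_context = false;
}

/* Suggested behaviour: the callback only queues the teardown. */
static void rcu_callback_deferred(struct model_sock *sk)
{
	atomic_context = true;
	sk->cleanup_work.fn = ring_cleanup;
	queue_work_item(&sk->cleanup_work);
	atomic_context = false;
}

/* Stands in for the worker: runs queued items outside atomic context. */
static void run_pending_work(void)
{
	assert(!atomic_context);
	while (pending_head) {
		struct work_item *w = pending_head;

		pending_head = w->next;
		if (!pending_head)
			pending_tail = NULL;
		w->fn(w);
	}
}

/* Runs both paths once and returns how many violations were counted. */
int model_demo(void)
{
	struct model_sock buggy = { .ring_mapped = true };
	struct model_sock fixed = { .ring_mapped = true };

	rcu_callback_direct(&buggy);	/* locks while atomic: one violation */
	rcu_callback_deferred(&fixed);	/* only queues: no violation */
	run_pending_work();		/* the actual cleanup happens here */

	return sleep_in_atomic_bugs;
}
```

In this model only the direct path is counted as a violation; the deferred path does its locking from run_pending_work(). A real change would also have to keep the socket alive until the work item has run, which the sketch does not attempt to model.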