Subject: Re: lost connection to test machine (4)
To: dennisszhou@gmail.com
Cc: Dmitry Vyukov, syzbot, Alexei Starovoitov, netdev, LKML, syzkaller-bugs@googlegroups.com, tj@kernel.org
References: <001a113f8734783e94056505f8fd@google.com>
From: Daniel Borkmann
Message-ID: <00c45ca8-305d-1818-e974-a9903c8494b8@iogearbox.net>
Date: Mon, 12 Feb 2018 18:00:13 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/12/2018 05:03 PM, Dmitry Vyukov wrote:
> On Mon, Feb 12, 2018 at 5:00 PM, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on bpf-next commit
>> 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +0000)
>> Merge tag 'usercopy-v4.16-rc1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
>>
>> So far this crash happened 898 times on bpf-next, net-next, upstream.
>> C reproducer is attached.
>> syzkaller reproducer is attached.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>
> The reproducer first causes several tasks spending minutes at this stack:
>
> [  110.762189] NMI backtrace for cpu 2
> [  110.762206] CPU: 2 PID: 3760 Comm: syz-executor Not tainted 4.15.0+ #96
> [  110.762210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [  110.762224] RIP: 0010:mutex_spin_on_owner+0x303/0x420
> [  110.762232] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.103 msecs
> [  110.762237] RSP: 0018:ffff88005be470e8 EFLAGS: 00000246
> [  110.762268] RAX: ffff88006ca00000 RBX: 0000000000000000 RCX: ffffffff81554165
> [  110.762275] RDX: 0000000000000001 RSI: 1ffffffff0d97884 RDI: 0000000000000000
> [  110.762281] RBP: ffff88005be47210 R08: dffffc0000000001 R09: fffffbfff0db2b75
> [  110.762286] R10: fffffbfff0db2b74 R11: ffffffff86d95ba7 R12: ffffffff86d95ba0
> [  110.762292] R13: ffffed000b7c8e25 R14: dffffc0000000000 R15: ffff880064691040
> [  110.762300] FS: 00007f84ed029700(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
> [  110.762305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  110.762311] CR2: 00007fd565f7b1b0 CR3: 000000005bddf002 CR4: 00000000001606e0
> [  110.762316] Call Trace:
> [  110.762383]  __mutex_lock.isra.1+0x97d/0x1440
> [  110.762659]  __mutex_lock_slowpath+0xe/0x10
> [  110.762668]  mutex_lock+0x3e/0x50
> [  110.762677]  pcpu_alloc+0x846/0xfe0
> [  110.762778]  __alloc_percpu_gfp+0x27/0x30
> [  110.762801]  array_map_alloc+0x484/0x690
> [  110.762832]  SyS_bpf+0xa27/0x4770
> [  110.763190]  do_syscall_64+0x297/0x760
> [  110.763260]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>
> and later machine dies with:
>
> [  191.484308] Kernel panic - not syncing: Out of memory and no killable processes...
> [  191.484308]
> [  191.485740] CPU: 3 PID: 746 Comm: kworker/3:1 Not tainted 4.15.0+ #96
> [  191.486761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [  191.488071] Workqueue: events pcpu_balance_workfn
> [  191.488821] Call Trace:
> [  191.489299]  dump_stack+0x175/0x225
> [  191.490590]  panic+0x22a/0x4be
> [  191.493061]  out_of_memory.cold.31+0x20/0x21
> [  191.496380]  __alloc_pages_slowpath+0x1d98/0x28a0
> [  191.503616]  __alloc_pages_nodemask+0x89c/0xc60
> [  191.507876]  pcpu_populate_chunk+0x1fd/0x9b0
> [  191.510114]  pcpu_balance_workfn+0x1019/0x1450
> [  191.517804]  process_one_work+0x9d5/0x1460
> [  191.522714]  worker_thread+0x1cc/0x1410
> [  191.529319]  kthread+0x304/0x3c0
>
> The original message with attachments is here:
> https://groups.google.com/d/msg/syzkaller-bugs/Km3xEZu9zzU/rO-7XuwZAgAJ

[ +Dennis, +Tejun ]

Looks like we're stuck in the percpu allocator with a key/value size of 4 bytes
each and a large number of entries (max_entries) in the reproducer in the above
link.

Could we have some __GFP_NORETRY semantics and let allocations fail instead of
triggering the OOM killer?
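
For a rough sense of the scale involved, here is a back-of-the-envelope sketch in C of how much per-CPU memory such a map could request. It assumes that the per-CPU array map rounds each value up to 8 bytes and makes one per-CPU allocation per entry, which matches the array_map_alloc -> __alloc_percpu_gfp path in the trace but should be checked against the tree in question. The helper names are hypothetical, and any max_entries or CPU count plugged in is illustrative only, since the reproducer's actual values are not quoted in this mail.

```c
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Estimated bytes of per-CPU backing store for a per-CPU array map.
 * Assumption: every entry is a separate per-CPU allocation whose size
 * is value_size rounded up to 8 bytes, replicated once per CPU.
 * Allocator metadata and chunk overhead are deliberately ignored.
 */
uint64_t percpu_array_bytes(uint32_t value_size, uint32_t max_entries,
                            uint32_t nr_cpus)
{
    return round_up_pow2(value_size, 8) * (uint64_t)max_entries * nr_cpus;
}
```

With a value size of 4 bytes, a hypothetical max_entries of 1u << 24, and 4 CPUs, percpu_array_bytes(4, 1u << 24, 4) comes to 1ull << 29 bytes, i.e. 512 MiB spread over about 16.7 million individual allocations, each of which goes through the mutex seen in the first trace. That would be consistent with both the long stalls and the eventual OOM, though the real numbers depend on the reproducer.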