From: Zhouyi Zhou
Date: Wed, 29 Nov 2017 12:54:56 +0800
Subject: Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
To: Dmitry Vyukov
Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel@vger.kernel.org
References: <1511841842-3786-1-git-send-email-zhouzhouyi@gmail.com>

Hi,

There are new discoveries! When the qlist_move_cache problem reappeared in my
environment, I used kgdb to break into qlist_move_cache and found that it is
called because of cgroup release.
I also found that libvirt allocates a memory cgroup for each qemu it starts;
on my system it looks like this:

root@ednserver3:/sys/fs/cgroup/memory/machine.slice# ls
cgroup.clone_children            cgroup.event_control             cgroup.procs
notify_on_release                tasks
memory.failcnt                   memory.force_empty               memory.limit_in_bytes
memory.max_usage_in_bytes        memory.move_charge_at_immigrate  memory.numa_stat
memory.oom_control               memory.pressure_level            memory.soft_limit_in_bytes
memory.stat                      memory.swappiness                memory.usage_in_bytes
memory.use_hierarchy
memory.kmem.failcnt              memory.kmem.limit_in_bytes       memory.kmem.max_usage_in_bytes
memory.kmem.slabinfo             memory.kmem.usage_in_bytes
memory.kmem.tcp.failcnt          memory.kmem.tcp.limit_in_bytes   memory.kmem.tcp.max_usage_in_bytes
memory.kmem.tcp.usage_in_bytes
machine-qemu\x2d491_4_30.scope   machine-qemu\x2d491_5_30.scope   machine-qemu\x2d491_6_30.scope
machine-qemu\x2d491_7_30.scope   machine-qemu\x2d491_8_30.scope   machine-qemu\x2d491_9_30.scope
machine-qemu\x2d491_10_30.scope  machine-qemu\x2d491_11_30.scope  machine-qemu\x2d491_12_30.scope
machine-qemu\x2d491_13_30.scope  machine-qemu\x2d491_17_30.scope  machine-qemu\x2d491_18_30.scope
machine-qemu\x2d491_19_30.scope  machine-qemu\x2d491_20_30.scope  machine-qemu\x2d491_21_30.scope
machine-qemu\x2d491_22_30.scope  machine-qemu\x2d491_23_30.scope  machine-qemu\x2d491_24_30.scope
machine-qemu\x2d491_25_30.scope  machine-qemu\x2d491_26_30.scope  machine-qemu\x2d491_27_30.scope
machine-qemu\x2d491_28_30.scope  machine-qemu\x2d491_29_30.scope  machine-qemu\x2d491_30_30.scope
machine-qemu\x2d491_31_30.scope  machine-qemu\x2d491_32_30.scope  machine-qemu\x2d491_33_30.scope
machine-qemu\x2d491_34_30.scope  machine-qemu\x2d491_35_30.scope  machine-qemu\x2d491_36_30.scope
machine-qemu\x2d491_37_30.scope  machine-qemu\x2d491_38_30.scope  machine-qemu\x2d491_39_30.scope
machine-qemu\x2d491_40_30.scope  machine-qemu\x2d491_41_30.scope  machine-qemu\x2d491_47_30.scope
machine-qemu\x2d491_48_30.scope  machine-qemu\x2d491_49_30.scope  machine-qemu\x2d491_50_30.scope
machine-qemu\x2d491_51_30.scope  machine-qemu\x2d491_52_30.scope  machine-qemu\x2d491_53_30.scope
machine-qemu\x2d491_54_30.scope  machine-qemu\x2d491_55_30.scope  machine-qemu\x2d491_56_30.scope
machine-qemu\x2d491_57_30.scope

and in each memory cgroup there are many slab caches:

root@ednserver3:/sys/fs/cgroup/memory/machine.slice/machine-qemu\x2d491_10_30.scope# cat memory.kmem.slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-2048              0      0   2240    3    2 : tunables   24   12    8 : slabdata      0      0      0
kmalloc-512               0      0    704   11    2 : tunables   54   27    8 : slabdata      0      0      0
skbuff_head_cache         0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0
kmalloc-1024              0      0   1216    3    1 : tunables   24   12    8 : slabdata      0      0      0
kmalloc-192               0      0    320   12    1 : tunables  120   60    8 : slabdata      0      0      0
pid                       3     21    192   21    1 : tunables  120   60    8 : slabdata      1      1      0
signal_cache              0      0   1216    3    1 : tunables   24   12    8 : slabdata      0      0      0
sighand_cache             0      0   2304    3    2 : tunables   24   12    8 : slabdata      0      0      0
fs_cache                  0      0    192   21    1 : tunables  120   60    8 : slabdata      0      0      0
files_cache               0      0    896    4    1 : tunables   54   27    8 : slabdata      0      0      0
task_delay_info           3     72    112   36    1 : tunables  120   60    8 : slabdata      2      2      0
task_struct               3      3   3840    1    1 : tunables   24   12    8 : slabdata      3      3      0
radix_tree_node           0      0    728    5    1 : tunables   54   27    8 : slabdata      0      0      0
shmem_inode_cache         2      9    848    9    2 : tunables   54   27    8 : slabdata      1      1      0
inode_cache              39     45    744    5    1 : tunables   54   27    8 : slabdata      9      9      0
ext4_inode_cache          0      0   1224    3    1 : tunables   24   12    8 : slabdata      0      0      0
sock_inode_cache          3      8    832    4    1 : tunables   54   27    8 : slabdata      2      2      0
proc_inode_cache          0      0    816    5    1 : tunables   54   27    8 : slabdata      0      0      0
dentry                   52     90    272   15    1 : tunables  120   60    8 : slabdata      6      6      0
anon_vma                140    348    136   29    1 : tunables  120   60    8 : slabdata     12     12      0
anon_vma_chain          257    468    112   36    1 : tunables  120   60    8 : slabdata     13     13      0
vm_area_struct          510    780    272   15    1 : tunables  120   60    8 : slabdata     52     52      0
mm_struct                 1      3   1280    3    1 : tunables   24   12    8 : slabdata      1      1      0
cred_jar                 12     24    320   12    1 : tunables  120   60    8 : slabdata      2      2      0
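For reference, and only as my own reading of the code: when one of these caches
is destroyed, quarantine_remove_cache ends up in qlist_move_cache, which walks
every object currently sitting in the quarantine and compares its cache pointer
against the dying cache, once per destroyed cache. Below is a minimal,
self-contained user-space sketch of that linear filter; the types and names
(struct cache, struct qnode, move_cache_sketch, ...) are invented stand-ins,
not the real mm/kasan/quarantine.c structures:

#include <stdio.h>

/* Illustrative stand-ins for struct kmem_cache and struct qlist_node. */
struct cache { const char *name; };

struct qnode {
    struct cache *cache;    /* cache the quarantined object belongs to */
    struct qnode *next;
};

struct qlist { struct qnode *head; };

/*
 * Move every node that belongs to 'cache' from 'from' to 'to'.
 * Cost: one comparison per quarantined object, repeated for every
 * cache that gets destroyed.
 */
static void move_cache_sketch(struct qlist *from, struct qlist *to,
                              struct cache *cache)
{
    struct qnode *curr = from->head;

    from->head = NULL;
    while (curr) {
        struct qnode *next = curr->next;
        struct qlist *dst = (curr->cache == cache) ? to : from;

        curr->next = dst->head;
        dst->head = curr;
        curr = next;
    }
}

int main(void)
{
    struct cache a = { "dying-cache" }, b = { "other-cache" };
    struct qnode n1 = { &a, NULL }, n2 = { &b, &n1 }, n3 = { &a, &n2 };
    struct qlist quarantine = { &n3 }, doomed = { NULL };

    move_cache_sketch(&quarantine, &doomed, &a);

    for (struct qnode *n = doomed.head; n; n = n->next)
        printf("would free an object of %s\n", n->cache->name);
    return 0;
}

With dozens of qemu cgroups, each owning many caches, this per-cache walk over
the whole quarantine adds up quickly, which seems consistent with what perf top
showed.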
So, when I shut down the libvirt guests, the slab caches belonging to those
qemu cgroups are destroyed, and each destruction has to invoke
quarantine_remove_cache. I guess that is why qlist_move_cache occupies so many
CPU cycles, and I also guess this is what makes libvirt complain (it has to
wait too long?).

Sorry for not looking into the system more deeply in the first place and for
submitting a patch in a hurry.

I would also like to propose a small suggestion to improve qlist_move_cache,
if you like: could we use some kind of hash mechanism to group the qlist_nodes
by their cache, so that we do not have to compare against every qlist_node in
the system one by one? (A rough sketch of what I mean is appended below the
quoted thread.)

Sorry for your time.
Best Wishes
Zhouyi

On Wed, Nov 29, 2017 at 7:41 AM, Zhouyi Zhou wrote:
> Hi,
>   I will try to reestablish the environment and design a proof-of-concept
> experiment.
> Cheers
>
> On Wed, Nov 29, 2017 at 1:57 AM, Dmitry Vyukov wrote:
>> On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov wrote:
>>> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou wrote:
>>>> Hi,
>>>>   By using perf top, qlist_move_cache occupying 100% CPU really did
>>>> happen in my environment yesterday, otherwise I would not have noticed
>>>> the kasan code.
>>>> Currently I have difficulty making it reappear because the frontend
>>>> guy modified some user-mode code.
>>>> What I can repeat again and again now is:
>>>> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
>>>> 1073            wmb(); /* Sync point after breakpoint */
>>>> (gdb) p quarantine_batch_size
>>>> $1 = 3601946
>>>> And by instrumenting the code, the maximum
>>>> global_quarantine[quarantine_tail].bytes reached is 6618208.
>>>
>>> On second thought, size does not matter too much because there can be
>>> large objects. Quarantine always quantizes by objects; we can't put part
>>> of an object into one batch and another part of the object into another
>>> batch. But that's not a problem, because the overhead per object is O(1).
>>> We can push a single 4MB object, overflow the target size by 4MB, and
>>> that will be fine.
>>> Either way, 6MB is not terribly much either. It should take milliseconds
>>> to process.
>>>
>>>> I do think draining the quarantine right in quarantine_put is a better
>>>> place to drain, because cache_free is fine in that context. I am
>>>> willing to do it if you think it is convenient :-)
>>
>> Andrey, do you know of any problems with draining the quarantine in push?
>> Do you have any objections?
>>
>> But it's still not completely clear to me what problem we are solving.
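Appendix: a rough user-space sketch of the hash suggestion mentioned above. All
names here (NR_BUCKETS, quarantine_put_hashed, quarantine_remove_cache_hashed,
struct qnode, ...) are invented for illustration; this is not a patch against
mm/kasan/quarantine.c and it ignores the per-CPU caches and batch-size
accounting of the real code. The idea is only that if quarantined objects were
kept on per-bucket lists hashed by their cache pointer, removing a cache would
only need to scan one bucket instead of every quarantined object:

#include <stdint.h>

#define NR_BUCKETS 64               /* illustrative bucket count */

struct cache { const char *name; };

struct qnode {
    struct cache *cache;
    struct qnode *next;
};

/* One quarantine list per bucket instead of one global list. */
static struct qnode *quarantine_hash[NR_BUCKETS];

static unsigned int cache_bucket(const struct cache *cache)
{
    /* Cheap pointer hash, good enough for a sketch. */
    return ((uintptr_t)cache >> 4) % NR_BUCKETS;
}

static void quarantine_put_hashed(struct qnode *node)
{
    unsigned int b = cache_bucket(node->cache);

    node->next = quarantine_hash[b];
    quarantine_hash[b] = node;
}

/*
 * On cache destruction only the matching bucket is scanned; objects of
 * the dying cache are unlinked onto 'to_free' for immediate freeing.
 */
static void quarantine_remove_cache_hashed(struct cache *cache,
                                           struct qnode **to_free)
{
    struct qnode **pp = &quarantine_hash[cache_bucket(cache)];

    while (*pp) {
        struct qnode *curr = *pp;

        if (curr->cache == cache) {
            *pp = curr->next;       /* unlink */
            curr->next = *to_free;
            *to_free = curr;
        } else {
            pp = &curr->next;
        }
    }
}

int main(void)
{
    struct cache dying = { "dying-cache" }, other = { "other-cache" };
    struct qnode n1 = { &dying, NULL }, n2 = { &other, NULL }, n3 = { &dying, NULL };
    struct qnode *to_free = NULL;

    quarantine_put_hashed(&n1);
    quarantine_put_hashed(&n2);
    quarantine_put_hashed(&n3);

    quarantine_remove_cache_hashed(&dying, &to_free);
    /* to_free now holds both objects of the dying cache; n2 stays quarantined. */
    return 0;
}

The obvious trade-off is that the size-based batching and per-CPU quarantines
of the real implementation would need per-bucket accounting, so this only
illustrates the lookup-cost argument, not a complete design.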