From: Maxim Patlasov
Subject: Re: [PATCH] btrfs: limit async_work allocation and worker func duration
Date: Mon, 12 Dec 2016 12:35:51 -0800
Message-ID: <2d4aaf16-b9b3-6cd9-d542-c74f00811c93@virtuozzo.com>
In-Reply-To: <20161212145443.GT12522@twin.jikos.cz>
References: <148072986343.13061.16191252239168261528.stgit@maxim-thinkpad> <20161212145443.GT12522@twin.jikos.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/12/2016 06:54 AM, David Sterba wrote:
> On Fri, Dec 02, 2016 at 05:51:36PM -0800, Maxim Patlasov wrote:
>> Problem statement: unprivileged user who has
>> read-write access to more than
>> one btrfs subvolume may easily consume all kernel memory (eventually
>> triggering oom-killer).
>>
>> Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
>>
>> [root@kteam1 ~]# cat prep.sh
>>
>> DEV=/dev/sdb
>> mkfs.btrfs -f $DEV
>> mount $DEV /mnt
>> for i in `seq 1 16`
>> do
>>     mkdir /mnt/$i
>>     btrfs subvolume create /mnt/SV_$i
>>     ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
>>     mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
>>     chmod a+rwx /mnt/$i
>> done
>>
>> [root@kteam1 ~]# sh prep.sh
>>
>> [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
>>
>> [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
>> kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0
>> kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0
>> kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0
>> kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0
>>
>> The huge numbers above come from insane number of async_work-s allocated
>> and queued by btrfs_wq_run_delayed_node.
>>
>> The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
>> works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
>> worker func (btrfs_async_run_delayed_root) processes at least
>> BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
>> works as expected while the list is almost empty. As soon as it is getting
>> bigger, worker func starts to process more than one item at a time, it takes
>> longer, and the chances to have async_works queued more than needed is getting
>> higher.
>>
>> The problem above is worsened by another flaw of delayed-inode implementation:
>> if async_work was queued in a throttling branch (number of items >=
>> BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
>> the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
>> the func occupies CPU infinitely (up to 30sec in my experiments): while the
>> func is trying to drain the list, the user activity may add more and more
>> items to the list.
> Nice analysis!
>
>> The patch fixes both problems in straightforward way: refuse queuing too
>> many works in btrfs_wq_run_delayed_node and bail out of worker func if
>> at least BTRFS_DELAYED_WRITEBACK items are processed.
>>
>> Signed-off-by: Maxim Patlasov
>> ---
>>  fs/btrfs/async-thread.c  | 8 ++++++++
>>  fs/btrfs/async-thread.h  | 1 +
>>  fs/btrfs/delayed-inode.c | 6 ++++--
>>  3 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
>> index e0f071f..29f6252 100644
>> --- a/fs/btrfs/async-thread.c
>> +++ b/fs/btrfs/async-thread.c
>> @@ -86,6 +86,14 @@ btrfs_work_owner(struct btrfs_work *work)
>>  	return work->wq->fs_info;
>>  }
>>
>> +bool btrfs_workqueue_normal_congested(struct btrfs_workqueue *wq)
>> +{
>> +	int thresh = wq->normal->thresh != NO_THRESHOLD ?
>> +		wq->normal->thresh : num_possible_cpus();
> Why not num_online_cpus? I vaguely remember we should be checking online
> cpus, but don't have the mails for reference. We use it elsewhere for
> spreading the work over cpus, but it's still not bullet proof regarding
> cpu onlining/offlining.

Thank you for the review, David! I borrowed num_possible_cpus from the
definition of WQ_UNBOUND_MAX_ACTIVE in workqueue.h, but if btrfs uses
num_online_cpus elsewhere, it must be OK as well.

Another problem, which I realized only now, is that nobody increments or
decrements wq->normal->pending if thresh == NO_THRESHOLD, so the code looks
pretty misleading: it looks as though assigning num_possible_cpus (or
num_online_cpus) to thresh matters, but the next line compares it with
"pending", which is always zero.

Since we don't have any NO_THRESHOLD users of
btrfs_workqueue_normal_congested for now, I tend to think it's better to add
a descriptive comment and simply return "false" from
btrfs_workqueue_normal_congested rather than trying to address some future
needs now. Please see v2 of the patch.

Thanks,
Maxim

>
> Otherwise looks good to me, as far as I can imagine the possible
> behaviour of the various async parameters just from reading the code.
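
To make the NO_THRESHOLD point concrete, here is a minimal userspace sketch,
not kernel code. The names model_wq, model_queue_work, model_exec_work,
model_congested_v1 and model_congested_v2 are hypothetical, the value of
NO_THRESHOLD and the "pending > thresh * 2" comparison are assumptions made
for illustration, and the CPU count is passed in as a plain integer instead of
calling num_possible_cpus() or num_online_cpus().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sentinel meaning "thresholding disabled" (value is a guess). */
#define NO_THRESHOLD (-1)

/* Hypothetical userspace stand-in for the "normal" queue; only the two
 * fields discussed in this thread are modelled. */
struct model_wq {
	int thresh;	/* NO_THRESHOLD or a positive limit */
	int pending;	/* works queued but not yet executed */
};

/* Queue one work: pending is accounted only when thresholding is on. */
static void model_queue_work(struct model_wq *wq)
{
	if (wq->thresh == NO_THRESHOLD)
		return;
	wq->pending++;
}

/* Execute one work: the decrement is skipped in the same case. */
static void model_exec_work(struct model_wq *wq)
{
	if (wq->thresh == NO_THRESHOLD)
		return;
	assert(wq->pending > 0);
	wq->pending--;
}

/* v1-style check: fall back to a CPU count when there is no threshold.
 * The "* 2" factor is an assumption made for this sketch. */
static bool model_congested_v1(const struct model_wq *wq, int ncpus)
{
	int thresh = wq->thresh != NO_THRESHOLD ? wq->thresh : ncpus;

	return wq->pending > thresh * 2;
}

/* v2-style check: state the limitation instead of pretending to compare. */
static bool model_congested_v2(const struct model_wq *wq)
{
	/* pending is never updated without a threshold, so nothing to test. */
	if (wq->thresh == NO_THRESHOLD)
		return false;

	return wq->pending > wq->thresh * 2;
}
```

In this model, with thresh == NO_THRESHOLD, queuing any number of works
leaves pending at 0, so both variants report "not congested" whatever CPU
count is used as the fallback. The v2 form only makes that visible to the
reader.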