Subject: [PATCH] btrfs: limit async_work allocation and worker func duration
From: Maxim Patlasov
Date: Fri, 2 Dec 2016 17:51:36 -0800
Message-ID: <148072986343.13061.16191252239168261528.stgit@maxim-thinkpad>
User-Agent: StGit/0.17.1-dirty
X-Mailing-List: linux-kernel@vger.kernel.org

Problem statement: unprivileged user who has read-write access to more than one
btrfs subvolume may
easily consume all kernel memory (eventually triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
	mkdir /mnt/$i
	btrfs subvolume create /mnt/SV_$i
	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
	chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0
kmalloc-128 9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0
kmalloc-128 24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0
kmalloc-128 42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata 1336070 1336070 0

The huge numbers above come from the insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node().

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as the list grows,
the worker func starts to process more than one item at a time, each run takes
longer, and the chance of having more async_works queued than needed gets
higher.

The problem above is worsened by another flaw of the delayed-inode
implementation: if an async_work was queued in the throttling branch (number
of items >= BTRFS_DELAYED_WRITEBACK), the corresponding worker func won't quit
until the number of items < BTRFS_DELAYED_BACKGROUND / 2.
So, it is possible that the func occupies the CPU for an unbounded time (up to
30 sec in my experiments): while the func is trying to drain the list, user
activity may keep adding more and more items to it.

The patch fixes both problems in a straightforward way: refuse to queue too
many works in btrfs_wq_run_delayed_node() and bail out of the worker func once
at least BTRFS_DELAYED_WRITEBACK items have been processed.

Signed-off-by: Maxim Patlasov
---
 fs/btrfs/async-thread.c  |    8 ++++++++
 fs/btrfs/async-thread.h  |    1 +
 fs/btrfs/delayed-inode.c |    6 ++++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e0f071f..29f6252 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -86,6 +86,14 @@ btrfs_work_owner(struct btrfs_work *work)
 	return work->wq->fs_info;
 }
 
+bool btrfs_workqueue_normal_congested(struct btrfs_workqueue *wq)
+{
+	int thresh = wq->normal->thresh != NO_THRESHOLD ?
+		wq->normal->thresh : num_possible_cpus();
+
+	return atomic_read(&wq->normal->pending) > thresh * 2;
+}
+
 BTRFS_WORK_HELPER(worker_helper);
 BTRFS_WORK_HELPER(delalloc_helper);
 BTRFS_WORK_HELPER(flush_delalloc_helper);
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 8e52484..1f95973 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -84,4 +84,5 @@ void btrfs_workqueue_set_max(struct btrfs_workqueue *wq, int max);
 void btrfs_set_work_high_priority(struct btrfs_work *work);
 struct btrfs_fs_info *btrfs_work_owner(struct btrfs_work *work);
 struct btrfs_fs_info *btrfs_workqueue_owner(struct __btrfs_workqueue *wq);
+bool btrfs_workqueue_normal_congested(struct btrfs_workqueue *wq);
 #endif
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 3eeb9cd..de946dd 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1356,7 +1356,8 @@ release_path:
 	total_done++;
 
 	btrfs_release_prepared_delayed_node(delayed_node);
-	if (async_work->nr == 0 || total_done < async_work->nr)
+	if ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK) ||
+	    total_done < async_work->nr)
 		goto again;
 
 free_path:
@@ -1372,7 +1373,8 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 {
 	struct btrfs_async_delayed_work *async_work;
 
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND)
+	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND ||
+	    btrfs_workqueue_normal_congested(fs_info->delayed_workers))
 		return 0;
 
 	async_work = kmalloc(sizeof(*async_work), GFP_NOFS);
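
For readers who want to reason about the two new conditions outside the kernel, the following is a minimal user-space sketch of them, not kernel code. The helper names (wq_congested, should_queue_work, worker_continues) are hypothetical, and the numeric values given to the constants are assumptions made for illustration; the patch itself only refers to the constants by name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only; the patch names these constants
 * but does not show their definitions. */
#define DELAYED_WRITEBACK   512
#define DELAYED_BACKGROUND  128
#define NO_THRESHOLD        (-1)

/* Models btrfs_workqueue_normal_congested(): the queue counts as congested
 * when more than twice the threshold of works are pending. If no threshold
 * is set, the CPU count is used instead. */
static bool wq_congested(int pending, int thresh, int ncpus)
{
	int t = thresh != NO_THRESHOLD ? thresh : ncpus;

	return pending > t * 2;
}

/* Models the new early-return test in btrfs_wq_run_delayed_node():
 * queue a new work only if there are enough delayed items and the
 * workqueue is not already congested. */
static bool should_queue_work(int items, int pending, int thresh, int ncpus)
{
	if (items < DELAYED_BACKGROUND || wq_congested(pending, thresh, ncpus))
		return false;
	return true;
}

/* Models the new "goto again" condition in the worker func. With nr == 0
 * (no explicit batch size) the loop is now capped at DELAYED_WRITEBACK
 * processed items; otherwise it runs until nr items are done. */
static bool worker_continues(int nr, int total_done)
{
	return (nr == 0 && total_done < DELAYED_WRITEBACK) || total_done < nr;
}
```

With these assumed values, a worker started with nr == 0 stops after 512 items even if user activity keeps refilling the list, and no new work is queued while more than 2 * thresh works are already pending.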