Subject: Re: [PATCHv5 3/5] ext4: mballoc: Introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
To: Ritesh Harjani, linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Jan Kara, Theodore Ts'o, "Aneesh Kumar K . V", linux-kernel@vger.kernel.org
From: Marek Szyprowski
Date: Wed, 3 Jun 2020 08:48:51 +0200
In-Reply-To: <7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com>

Hi Ritesh,

On 20.05.2020 08:40, Ritesh Harjani wrote:
> There could be a race in function ext4_mb_discard_group_preallocations()
> where the 1st thread may iterate through group's bb_prealloc_list and
> remove all the PAs and add to function's local list head.
> Now if the 2nd thread comes in to discard the group preallocations,
> it will see that the group->bb_prealloc_list is empty and will return 0.
>
> Consider for a case where we have less number of groups
> (for e.g. just group 0),
> this may even return an -ENOSPC error from ext4_mb_new_blocks()
> (where we call for ext4_mb_discard_group_preallocations()).
> But that is wrong, since 2nd thread should have waited for 1st thread
> to release all the PAs and should have retried for allocation.
> Since 1st thread was anyway going to discard the PAs.
>
> The algorithm using this percpu seq counter goes below:
> 1. We sample the percpu discard_pa_seq counter before trying for block
>    allocation in ext4_mb_new_blocks().
> 2. We increment this percpu discard_pa_seq counter when we either allocate
>    or free these blocks i.e. while marking those blocks as used/free in
>    mb_mark_used()/mb_free_blocks().
> 3. We also increment this percpu seq counter when we successfully identify
>    that the bb_prealloc_list is not empty and hence proceed for discarding
>    of those PAs inside ext4_mb_discard_group_preallocations().
>
> Now to make sure that the regular fast path of block allocation is not
> affected, as a small optimization we only sample the percpu seq counter
> on that cpu. Only when the block allocation fails and when freed blocks
> found were 0, that is when we sample percpu seq counter for all cpus using
> below function ext4_get_discard_pa_seq_sum(). This happens after making
> sure that all the PAs on grp->bb_prealloc_list got freed or if it's empty.
>
> It can be well argued that why don't just check for grp->bb_free to
> see if there are any free blocks to be allocated. So here are the two
> concerns which were discussed:-
>
> 1. If for some reason the blocks available in the group are not
>    appropriate for allocation logic (say for e.g.
>    EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then
>    the retry logic may result into infinite looping since grp->bb_free is
>    non-zero.
>
> 2. Also before preallocation was clubbed with block allocation with the
>    same ext4_lock_group() held, there were lot of races where grp->bb_free
>    could not be reliably relied upon.
> Due to above, this patch considers discard_pa_seq logic to determine if
> we should retry for block allocation. Say if there are n threads
> trying for block allocation and none of those could allocate or discard
> any of the blocks, then all of those n threads will fail the block
> allocation and return -ENOSPC error. (Since the seq counter for all of
> those will match as no block allocation/discard was done during that
> duration).
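
The retry decision described in the quoted algorithm can be modelled in
userspace. This is only a sketch under stated assumptions, not the
patch's code: NR_CPUS, the plain array standing in for the per-CPU
discard_pa_seq counters, and the helpers seq_sample_local(), seq_bump(),
seq_sum_all() and should_retry() are hypothetical names. The `strict`
flag is also an assumption of this sketch: the first sample comes from
one CPU only and cannot be compared with the all-CPU sum, so the model
always allows one retry after switching to the sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Stand-in for the per-CPU discard_pa_seq counters (assumption). */
static uint64_t discard_pa_seq[NR_CPUS];

/* Step 1: cheap sample of the counter belonging to the current CPU. */
uint64_t seq_sample_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return discard_pa_seq[cpu];
}

/* Steps 2 and 3: bump the local counter on allocate, free or PA discard. */
void seq_bump(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	discard_pa_seq[cpu]++;
}

/* Slow path: sum over all CPUs, the role the quoted text gives to
 * ext4_get_discard_pa_seq_sum(). */
uint64_t seq_sum_all(void)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += discard_pa_seq[cpu];
	return sum;
}

/*
 * Decide whether a failed allocation should be retried.
 *   freed  - blocks this thread freed by discarding PAs
 *   seq    - value sampled before the attempt, refreshed on retry
 *   strict - false while *seq still holds a single-CPU sample
 */
bool should_retry(uint64_t freed, uint64_t *seq, bool *strict)
{
	uint64_t now;

	if (freed)
		return true; /* we freed something ourselves */

	now = seq_sum_all();
	if (!*strict || now != *seq) {
		/* Either first slow-path pass, or someone else
		 * allocated/freed/discarded in the meantime. */
		*strict = true;
		*seq = now;
		return true;
	}
	return false; /* nothing changed anywhere: report -ENOSPC */
}
```

With all counters at zero, the first failed attempt switches to the
summed counter and retries once. A second failure with an unchanged sum
gives up (the -ENOSPC case), while a bump on any CPU in between triggers
another retry.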
>
> Signed-off-by: Ritesh Harjani

This patch landed in yesterday's linux-next and causes the following
WARNING/BUG on various Samsung Exynos-based boards:

 BUG: using smp_processor_id() in preemptible [00000000] code: logsave/552
 caller is ext4_mb_new_blocks+0x404/0x1300
 CPU: 3 PID: 552 Comm: logsave Tainted: G        W 5.7.0-next-20200602 #4
 Hardware name: Samsung Exynos (Flattened Device Tree)
 [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
 [] (show_stack) from [] (dump_stack+0xbc/0xe8)
 [] (dump_stack) from [] (check_preemption_disabled+0xec/0xf0)
 [] (check_preemption_disabled) from [] (ext4_mb_new_blocks+0x404/0x1300)
 [] (ext4_mb_new_blocks) from [] (ext4_ext_map_blocks+0xc7c/0x10f4)
 [] (ext4_ext_map_blocks) from [] (ext4_map_blocks+0x118/0x5a0)
 [] (ext4_map_blocks) from [] (mpage_map_and_submit_extent+0x134/0x9c0)
 [] (mpage_map_and_submit_extent) from [] (ext4_writepages+0xb18/0xcb0)
 [] (ext4_writepages) from [] (do_writepages+0x20/0x94)
 [] (do_writepages) from [] (__filemap_fdatawrite_range+0xac/0xcc)
 [] (__filemap_fdatawrite_range) from [] (filemap_flush+0x28/0x30)
 [] (filemap_flush) from [] (ext4_release_file+0x70/0xac)
 [] (ext4_release_file) from [] (__fput+0xc4/0x234)
 [] (__fput) from [] (task_work_run+0x88/0xcc)
 [] (task_work_run) from [] (do_work_pending+0x52c/0x5cc)
 [] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
 Exception stack(0xec9c1fb0 to 0xec9c1ff8)
 1fa0:                                     00000000 0044969c 0000006c 00000000
 1fc0: 00000001 0045a014 00000241 00000006 00000000 be91abb4 be91abb0 0000000c
 1fe0: 00459fd4 be91ab90 00448ed4 b6e43444 60000050 00000003

Please let me know how I can help debug this issue. The above log is
from linux-next 20200602 built with exynos_defconfig and running on an
ARM 32-bit Samsung Exynos4412-based Odroid U3 board; however, I don't
think this is an Exynos-specific issue. I probably observed it only
because exynos_defconfig has most of the debugging options enabled.

> ...
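
For readers unfamiliar with this class of warning, the check behind the
splat above can be modelled in userspace. This is a deliberately
simplified sketch, not kernel code: every model_* name is hypothetical,
the CPU number is fixed, and the real check also accepts callers with
interrupts disabled or tasks pinned to a single CPU, which the model
ignores.

```c
#include <assert.h>

static int preempt_count;   /* > 0 means preemption is disabled */
static int current_cpu = 3; /* hypothetical CPU the task runs on */
static int debug_warnings;  /* how many times the check fired */

void model_preempt_disable(void)
{
	preempt_count++;
}

void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/*
 * Checked accessor: a preemptible caller may migrate right after
 * reading the CPU number, so the debug check flags the call. The
 * kernel would print "BUG: using smp_processor_id() in preemptible".
 */
int model_smp_processor_id(void)
{
	if (preempt_count == 0)
		debug_warnings++;
	return current_cpu;
}

/* Unchecked accessor: the caller accepts a possibly stale answer. */
int model_raw_smp_processor_id(void)
{
	return current_cpu;
}

int model_warnings(void)
{
	return debug_warnings;
}
```

In this model a caller either brackets the access with
model_preempt_disable()/model_preempt_enable(), or uses the raw variant
when a slightly stale CPU number is acceptable, as it would be for a
counter that is only sampled as a hint. Which of the two fits the patch
is for the author to decide.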

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland