Subject: Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
From: zijun_hu
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, zijun_hu@htc.com, tj@kernel.org, cl@linux.com
Date: Thu, 13 Oct 2016 08:09:21 +0800
Message-ID: <57FED0B1.3010506@zoho.com>
In-Reply-To: <20161012144112.0494082cf4cbd07609d2405d@linux-foundation.org>

On 10/13/2016 05:41 AM, Andrew Morton wrote:
> On Tue, 11 Oct 2016 22:00:28 +0800 zijun_hu wrote:
>
>> As shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is derived by rounding up the number of CPUs within the group to
>> the @upa boundary; therefore, the number of units normally differs from
>> the number of CPUs when the latter isn't a multiple of @upa. However,
>> pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers are
>> equal, so the BUG_ON() may trigger a panic falsely.
>>
>> In order to fix this issue, the number of CPUs is rounded up before it is
>> compared with the number of units, and the BUG_ON() is replaced by a
>> warning plus an error return code so that the system stays alive as far
>> as possible.
>
> Under what circumstances is this triggered?  In other words, what are
> the end-user visible effects of the fix?
>

The BUG_ON() fires when the number of CPUs isn't a multiple of @upa, yet
this is a normal situation in which it should not be triggered. The aim of
this fix is to keep the BUG_ON() from firing in that case. See the original
code segments below for the reasoning.

pcpu_build_alloc_info()
{
	...
	for_each_possible_cpu(cpu)
		if (group_map[cpu] == group)
			gi->cpu_map[gi->nr_units++] = cpu;
	gi->nr_units = roundup(gi->nr_units, upa);
	/*
	 * count the CPUs belonging to this group into @gi->nr_units,
	 * then round @gi->nr_units up to a multiple of @upa
	 */
	unit += gi->nr_units;
	...
}

pcpu_page_first_chunk()
{
	...
	ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
	if (IS_ERR(ai))
		return PTR_ERR(ai);
	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());
	/*
	 * there is only one group and all CPUs belong to it, but the
	 * number of units is compared with the number of CPUs directly
	 */
	...
}

As the comments in the functions above show, ai->groups[0].nr_units should
equal roundup(num_possible_cpus(), @upa) rather than num_possible_cpus()
directly.

> I mean, this is pretty old code (isn't it?) so what are you doing that
> triggers this?
>
>

I am studying the memory management source, found this inconsistency, and
think the BUG_ON() may be triggered in this unusual but possible case. It
may be a logic error.