Subject: Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, zijun_hu@htc.com, tj@kernel.org, cl@linux.com
From: zijun_hu
Date: Thu, 13 Oct 2016 08:05:33 +0800
Message-ID: <57FECFCD.7020108@zoho.com>
In-Reply-To: <20161012144112.0494082cf4cbd07609d2405d@linux-foundation.org>
References: <57FCF07C.2020103@zoho.com> <20161012144112.0494082cf4cbd07609d2405d@linux-foundation.org>

On 10/13/2016 05:41 AM, Andrew Morton wrote:
> On Tue, 11 Oct 2016 22:00:28 +0800 zijun_hu wrote:
>
>> as shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is deduced by rounding up the number of CPUs within the group to
>> the @upa boundary; therefore, the number of CPUs normally isn't equal to
>> the number of units if it isn't aligned to @upa. however,
>> pcpu_page_first_chunk() uses BUG_ON() to assert that the two numbers are
>> equal, so a panic may be falsely triggered by the BUG_ON().
>>
>> in order to fix this issue, the number of CPUs is rounded up and then
>> compared with the number of units, and the BUG_ON() is replaced by a
>> warning and a returned error code to keep the system alive as much as
>> possible.
>
> Under what circumstances is the BUG_ON() triggered?  In other words, what
> are the end-user visible effects of the fix?
>

the BUG_ON() fires when the number of CPUs isn't aligned to @upa, but it
should not be triggered under this normal circumstance. the aim of this fix
is to prevent the BUG_ON() from being triggered in that case. see the
original code segments below for the reason.

pcpu_build_alloc_info()
{
	...
	for_each_possible_cpu(cpu)
		if (group_map[cpu] == group)
			gi->cpu_map[gi->nr_units++] = cpu;
	gi->nr_units = roundup(gi->nr_units, upa);

count the CPUs belonging to a group into the relevant @gi->nr_units, then
round @gi->nr_units up to a multiple of @upa

	unit += gi->nr_units;
	...
}

pcpu_page_first_chunk()
{
	...
	ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
	if (IS_ERR(ai))
		return PTR_ERR(ai);
	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());

there is only one group and all CPUs belong to it, but the number of CPUs
is compared with the number of units directly. as shown by the comments in
the function above, ai->groups[0].nr_units should equal
roundup(num_possible_cpus(), @upa) rather than num_possible_cpus() itself.

	...
}

> I mean, this is pretty old code (isn't it?) so what are you doing that
> triggers this?
>
>

i am studying the memory management source code and found the
inconsistency, and i think the BUG_ON() may be triggered in this special
but normal and possible case
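
to make the mismatch concrete, here is a minimal userspace sketch, not kernel code. the helper names roundup_units(), old_check_passes() and new_check_passes() are made up for illustration, and the values used in the note after it (6 CPUs, @upa of 4) are hypothetical; the sketch only shows the arithmetic and does not show which @upa values can actually occur on the pcpu_page_first_chunk() path.

```c
/* Userspace sketch only: all names and values are illustrative assumptions. */

/* Same arithmetic as the kernel's roundup() macro for positive integers. */
static unsigned int roundup_units(unsigned int n, unsigned int upa)
{
	return ((n + upa - 1) / upa) * upa;
}

/* Mirrors the existing assertion: nr_units compared with the raw CPU count. */
static int old_check_passes(unsigned int nr_units, unsigned int nr_cpus)
{
	return nr_units == nr_cpus;
}

/* Mirrors the proposed comparison: round the CPU count up to @upa first. */
static int new_check_passes(unsigned int nr_units, unsigned int nr_cpus,
			    unsigned int upa)
{
	return nr_units == roundup_units(nr_cpus, upa);
}
```

with 6 CPUs and @upa of 4, roundup_units(6, 4) gives 8 units, so old_check_passes(8, 6) returns 0 (the case where the BUG_ON() would fire) while new_check_passes(8, 6, 4) returns 1. when the CPU count is already a multiple of @upa, both checks agree.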