From: zijun_hu
Subject: Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely
To: Tejun Heo
Cc: zijun_hu@htc.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, cl@linux.com
Date: Fri, 14 Oct 2016 08:15:56 +0800
Message-ID: <10d149b0-e436-730d-2050-f9e1a6fed39e@zoho.com>
In-Reply-To: <20161013232902.GD32534@mtj.duckdns.org>
References: <57FCF07C.2020103@zoho.com> <20161013232902.GD32534@mtj.duckdns.org>

On 2016/10/14 7:29, Tejun Heo wrote:
> On Tue, Oct 11, 2016 at 10:00:28PM +0800, zijun_hu wrote:
>> From: zijun_hu
>>
>> as shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is educed by rounding up the number of CPUs within the group to
>> @upa boundary, therefore, the number of CPUs isn't equal to the units's
>> if it isn't aligned to @upa normally. however, pcpu_page_first_chunk()
>> uses BUG_ON() to assert one number is equal the other roughly, so a panic
>> is maybe triggered by the BUG_ON() falsely.
>>
>> in order to fix this issue, the number of CPUs is rounded up then compared
>> with units's, the BUG_ON() is replaced by warning and returning error code
>> as well to keep system alive as much as possible.
>
> I really can't decode what the actual issue is here.  Can you please
> give an example of a concrete case?
>
The correct relationship between the number of CPUs @nr_cpus within a percpu
group and the number of units @nr_units within the same group is:

	@nr_units == roundup(@nr_cpus, @upa)

My reasoning went as follows:

1) The current code:

	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());

2) Changed to reflect the correct relationship between the number of CPUs
   and the number of units:

	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa));

3) Replace BUG_ON() with a warning and an error return, since BUG_ON()
   seems to be discouraged, as shown by Linus's recent LKML mail:

	BUG_ON(ai->nr_groups != 1);
	if (ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa))
		return -EINVAL;

So 3) is my final change. For the relationship between the two numbers,
see my reply to Andrew.

>> @@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>>
>>  	/* allocate pages */
>>  	j = 0;
>> -	for (unit = 0; unit < num_possible_cpus(); unit++)
>> +	for (unit = 0; unit < num_possible_cpus(); unit++) {
>> +		unsigned int cpu = ai->groups[0].cpu_map[unit];
>>  		for (i = 0; i < unit_pages; i++) {
>> -			unsigned int cpu = ai->groups[0].cpu_map[unit];
>>  			void *ptr;
>>
>>  			ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
>>  			if (!ptr) {
>>  				pr_warn("failed to allocate %s page for cpu%u\n",
>> -					psize_str, cpu);
>> +					psize_str, cpu);
>
> And stop making gratuitous changes?
>
This change is just meant to make the code read more naturally: @cpu depends
only on @unit, so it can be determined once, in the outer loop.

> Thanks.
>
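
To make the relationship claimed above concrete, here is a minimal userspace sketch, not kernel code. It assumes hypothetical values (6 CPUs, @upa of 4), and the helper names roundup_units(), old_check_ok() and new_check() are made up for this illustration; none of them exist in mm/percpu.c. The rounding mirrors what the kernel's roundup() macro computes.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for the arithmetic done by the kernel's roundup(). */
static unsigned int roundup_units(unsigned int nr_cpus, unsigned int upa)
{
	assert(upa != 0);	/* a zero upa would divide by zero */
	return ((nr_cpus + upa - 1) / upa) * upa;
}

/* Step 1) above: the old assertion demands exact equality. */
static int old_check_ok(unsigned int nr_units, unsigned int nr_cpus)
{
	return nr_units == nr_cpus;
}

/*
 * Step 3) above: compare against the rounded-up CPU count and report
 * a mismatch as an error code instead of panicking.
 */
static int new_check(unsigned int nr_units, unsigned int nr_cpus,
		     unsigned int upa)
{
	if (nr_units != roundup_units(nr_cpus, upa))
		return -EINVAL;
	return 0;
}
```

With these assumed numbers, 6 CPUs round up to 8 units, so old_check_ok(8, 6) is false (the case where BUG_ON() would fire), while new_check(8, 6, 4) returns 0. When the CPU count is already a multiple of @upa, both checks agree.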