Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755606Ab3H3WOu (ORCPT ); Fri, 30 Aug 2013 18:14:50 -0400 Received: from mail-ee0-f50.google.com ([74.125.83.50]:40218 "EHLO mail-ee0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752599Ab3H3WOt (ORCPT ); Fri, 30 Aug 2013 18:14:49 -0400 MIME-Version: 1.0 In-Reply-To: <52210D34.5030807@infradead.org> References: <52210D34.5030807@infradead.org> Date: Fri, 30 Aug 2013 15:14:47 -0700 Message-ID: Subject: Re: do_div() silently truncates "base" to 32bit From: Anatol Pomozov To: Randy Dunlap Cc: LKML , Tejun Heo , bernie@develer.com Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2408 Lines: 76 Hi On Fri, Aug 30, 2013 at 2:23 PM, Randy Dunlap wrote: > On 08/30/13 10:21, Anatol Pomozov wrote: >> Hi, >> >> >> I was debugging weird "zero divide" problem in CFQ code below >> >> >> static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf, >> struct blkg_policy_data *pd, int off) >> { >> struct cfq_group *cfqg = pd_to_cfqg(pd); >> u64 samples = blkg_stat_read(&cfqg->stats.avg_queue_size_samples); >> u64 v = 0; >> >> if (samples) { >> v = blkg_stat_read(&cfqg->stats.avg_queue_size_sum); >> do_div(v, samples); >> } >> __blkg_prfill_u64(sf, pd, v); >> return 0; >> } >> >> >> do_div() crashes says "zero divide". It is weird because just a few >> lines above we check divider for zero. >> >> >> The problem comes from include/asm-generic/div64.h file that >> implements do_div() as macros: >> >> # define do_div(n,base) ({ \ >> uint32_t __base = (base); \ >> uint32_t __rem; \ >> __rem = ((uint64_t)(n)) % __base; \ >> (n) = ((uint64_t)(n)) / __base; \ >> __rem; \ >> }) >> >> >> Do you see the problem? >> >> The problem here is that "base" argument is truncated to 32bit, but in >> the function above "sample" is 64bit variable. If sample's 32 low bits >> are zero - we have a crash. in fact we have incorrect behavior any >> time when high 32bits are non-zero. >> >> >> My question is why the base is 32bit? Why not to use 64bit arguments? > > Maybe performance related? > > If you want 64-bit values, don't use do_div() from asm-generic/div64.h. > > Instead look at linux/math64.h and use div_u64_rem() et al > or the recently posted div64_u64_rem(). > [posted by Mike Snitzer on Aug. 21 2013] > > I.e., use exactly the function(s) that you need to use. > > Does that fix the problem? It definitely fixes the crash. I've already sent a patch to CFQ maillist http://news.gmane.org/gmane.linux.kernel.cgroups But another question still remains: why compiler does not warn that size truncation happens? How to prevent bugs like CFQ one in the future? Should we add a compile-time assert to do_div() to prevent passing 64 numbers in "base" macro parameter? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/