Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934121AbbLPK2k (ORCPT ); Wed, 16 Dec 2015 05:28:40 -0500 Received: from mga02.intel.com ([134.134.136.20]:16353 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933866AbbLPK2i (ORCPT ); Wed, 16 Dec 2015 05:28:38 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,436,1444719600"; d="scan'208";a="872604000" Message-ID: <56713CD3.80006@linux.intel.com> Date: Wed, 16 Dec 2015 18:28:35 +0800 From: "Zhang, Yanmin" User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Steven Rostedt CC: "Qiu, PeiyangX" , linux-kernel@vger.kernel.org, mingo@redhat.com, Rusty Russell Subject: Re: [PATCH] ftrace: fix race between ftrace and insmod References: <566E3076.5080708@intel.com> <566E3482.1090008@intel.com> <20151214105128.780b8bac@gandalf.local.home> <566F675C.9000904@linux.intel.com> <566F8871.7070408@linux.intel.com> <20151215123719.183879fc@gandalf.local.home> In-Reply-To: <20151215123719.183879fc@gandalf.local.home> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10624 Lines: 296 On 2015/12/16 1:37, Steven Rostedt wrote: > On Tue, 15 Dec 2015 11:26:41 +0800 > "Zhang, Yanmin" wrote: > >>> This seems very hackish, although I can't think of a better way at the >>> moment. But I would like not to add more code into module.c if >>> possible, and just use a notifier unless there's a real reason we can't >>> (as there was when we added the hook in the first place). >>> We would double-check it again. It's not a good idea to change common >>> module codes. >> I double-checked it. If using notifier, we have to add 2 new module notification >> events in enum module_state. >> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. >> At MODULE_STATE_INITING, call ftrace_module_init(mod); >> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function); >> At MODULE_STATE_COMING, call a new function which links the new start_pg >> to ftrace_pages. >> Such design can also fix another bug in current kernel that ftrace start_pg >> of the module is not cleaned up if complete_formation fails. >> However, we change module common codes. The new events only serve ftrace. >> >> If not holding ftrace_mutex across start_pg setup and complete_formation, >> some other variables are not protected well, such like ftrace_pages, >> ftrace_start_up, and so on. > OK, that was basically the reason the hook was added in the first place. > > The reason the modules notifier doesn't work here is that there's too > many exits from the module code that may leave the mutex held. > > I thought about this some more. So the issue is that we add the module > functions to the ftrace records. Then another task enables function > tracing before the module is fully set up. As ftrace is enabling > function tracing, the module sets its text to read-only, outside the > scope of ftrace setting all text to read-write. When ftrace gets to the > module code, it fails to do the modifications and you get the > ftrace_bug() splat. > > Sounds right? > > Now, I really dislike the taking of the ftrace mutex and returning back > to module handling. That is very subtle, fragile and prone to bugs. I > took a step back to find another solution and I think I found one. > > Take the below patch and add to it (I'll keep this patch as mine, and > you can submit another patch to do the following). You don't need to > resend this patch, just send me a patch that does this: > > > Add the module notifier that calls ftrace_module_enable(mod), removing > it from ftrace_init_module(). Feel free to monkey with that function to > be able to be called directly if it can't already. > > What my patch does is to create a new ftrace record flag DISABLED, > which is set to all functions in a module as it is added. Then later on > (in the module notifier that you will add), the flag is cleared. In > between, the module functions will be ignored. > > > If the module errors out and the notifier is not called, we don't care. > The records will simply be removed, and the flag will be meaningless. > > I needed to move the code around to still let the module code get > enabled if tracing was already enabled. I even cleaned that code up as > I found it has gotten a bit messy. > > -- Steve > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index 4736a826baf5..660e7c698f3b 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr); > * REGS - the record wants the function to save regs > * REGS_EN - the function is set up to save regs. > * IPMODIFY - the record allows for the IP address to be changed. > + * DISABLED - the record is not ready to be touched yet > * > * When a new ftrace_ops is registered and wants a function to save > * pt_regs, the rec->flag REGS is set. When the function has been > @@ -371,10 +372,11 @@ enum { > FTRACE_FL_TRAMP = (1UL << 28), > FTRACE_FL_TRAMP_EN = (1UL << 27), > FTRACE_FL_IPMODIFY = (1UL << 26), > + FTRACE_FL_DISABLED = (1UL << 25), > }; > > -#define FTRACE_REF_MAX_SHIFT 26 > -#define FTRACE_FL_BITS 6 > +#define FTRACE_REF_MAX_SHIFT 25 > +#define FTRACE_FL_BITS 7 > #define FTRACE_FL_MASKED_BITS ((1UL << FTRACE_FL_BITS) - 1) > #define FTRACE_FL_MASK (FTRACE_FL_MASKED_BITS << FTRACE_REF_MAX_SHIFT) > #define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1) > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index e290a30f2d0b..e7e15874fb7c 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -2023,6 +2023,9 @@ static int ftrace_check_record(struct dyn_ftrace *rec, int enable, int update) > > ftrace_bug_type = FTRACE_BUG_UNKNOWN; > > + if (rec->flags & FTRACE_FL_DISABLED) > + return FTRACE_UPDATE_IGNORE; > + > /* > * If we are updating calls: > * > @@ -2849,64 +2852,41 @@ ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) > return 1; > } > > -static int referenced_filters(struct dyn_ftrace *rec) > -{ > - struct ftrace_ops *ops; > - int cnt = 0; > - > - for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) { > - if (ops_references_rec(ops, rec)) > - cnt++; > - } > - > - return cnt; > -} > - > static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs) > { > struct ftrace_page *pg; > struct dyn_ftrace *p; > cycle_t start, stop; > unsigned long update_cnt = 0; > - unsigned long ref = 0; > - bool test = false; > + unsigned long rec_flags = 0; > int i; > > + start = ftrace_now(raw_smp_processor_id()); > + > /* > - * When adding a module, we need to check if tracers are > - * currently enabled and if they are set to trace all functions. > - * If they are, we need to enable the module functions as well > - * as update the reference counts for those function records. > + * When a module is loaded, this function is called to convert > + * the calls to mcount in its text to nops, and also to create > + * an entry in the ftrace data. Now, if ftrace is activated > + * after this call, but before the module sets its text to > + * read-only, the modification of enabling ftrace can fail if > + * the read-only is done while ftrace is converting the calls. > + * To prevent this, the module's records are set as disabled > + * and will be enabled after the call to set the module's text > + * to read-only. > */ > - if (mod) { > - struct ftrace_ops *ops; > - > - for (ops = ftrace_ops_list; > - ops != &ftrace_list_end; ops = ops->next) { > - if (ops->flags & FTRACE_OPS_FL_ENABLED) { > - if (ops_traces_mod(ops)) > - ref++; > - else > - test = true; > - } > - } > - } > - > - start = ftrace_now(raw_smp_processor_id()); > + if (mod) > + rec_flags |= FTRACE_FL_DISABLED; > > for (pg = new_pgs; pg; pg = pg->next) { > > for (i = 0; i < pg->index; i++) { > - int cnt = ref; > > /* If something went wrong, bail without enabling anything */ > if (unlikely(ftrace_disabled)) > return -1; > > p = &pg->records[i]; > - if (test) > - cnt += referenced_filters(p); > - p->flags = cnt; > + p->flags = rec_flags; > > /* > * Do the initial record conversion from mcount jump > @@ -2916,21 +2896,6 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs) > break; > > update_cnt++; > - > - /* > - * If the tracing is enabled, go ahead and enable the record. > - * > - * The reason not to enable the record immediatelly is the > - * inherent check of ftrace_make_nop/ftrace_make_call for > - * correct previous instructions. Making first the NOP > - * conversion puts the module to the correct state, thus > - * passing the ftrace_make_call check. > - */ > - if (ftrace_start_up && cnt) { > - int failed = __ftrace_replace_code(p, 1); > - if (failed) > - ftrace_bug(failed, p); > - } > } > } > > @@ -4938,6 +4903,19 @@ static int ftrace_process_locs(struct module *mod, > > #define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next) > > +static int referenced_filters(struct dyn_ftrace *rec) > +{ > + struct ftrace_ops *ops; > + int cnt = 0; > + > + for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) { > + if (ops_references_rec(ops, rec)) > + cnt++; > + } > + > + return cnt; > +} > + > void ftrace_release_mod(struct module *mod) > { > struct dyn_ftrace *rec; > @@ -4980,12 +4958,82 @@ void ftrace_release_mod(struct module *mod) > mutex_unlock(&ftrace_lock); > } > > +static void ftrace_module_enable(struct module *mod) > +{ > + struct dyn_ftrace *rec; > + struct ftrace_page *pg; > + > + mutex_lock(&ftrace_lock); > + > + if (ftrace_disabled) > + goto out_unlock; > + > + /* > + * If the tracing is enabled, go ahead and enable the record. > + * > + * The reason not to enable the record immediatelly is the > + * inherent check of ftrace_make_nop/ftrace_make_call for > + * correct previous instructions. Making first the NOP > + * conversion puts the module to the correct state, thus > + * passing the ftrace_make_call check. > + * > + * We also delay this to after the module code already set the > + * text to read-only, as we now need to set it back to read-write > + * so that we can modify the text. > + */ > + if (ftrace_start_up) > + ftrace_arch_code_modify_prepare(); > + > + do_for_each_ftrace_rec(pg, rec) { > + int cnt; > + /* > + * do_for_each_ftrace_rec() is a double loop. > + * module text shares the pg. If a record is > + * not part of this module, then skip this pg, > + * which the "break" will do. > + */ > + if (!within_module_core(rec->ip, mod)) > + break; > + > + cnt = 0; > + > + /* > + * When adding a module, we need to check if tracers are > + * currently enabled and if they are, and can trace this record, > + * we need to enable the module functions as well as update the > + * reference counts for those function records. > + */ > + if (ftrace_start_up) > + cnt += referenced_filters(rec); > + > + /* This clears FTRACE_FL_DISABLED */ > + rec->flags = cnt; > + > + if (ftrace_start_up && cnt) { > + int failed = __ftrace_replace_code(rec, 1); If we choose to call ftrace_module_enable when receiving module notification MODULE_STATE_COMING, TEXT section of the module is already changed to RO. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/