Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E1A7C61DA4 for ; Sat, 18 Feb 2023 06:31:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229638AbjBRGbz (ORCPT ); Sat, 18 Feb 2023 01:31:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229510AbjBRGbx (ORCPT ); Sat, 18 Feb 2023 01:31:53 -0500 Received: from loongson.cn (mail.loongson.cn [114.242.206.163]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D67FF5D3DE for ; Fri, 17 Feb 2023 22:31:49 -0800 (PST) Received: from loongson.cn (unknown [113.200.148.30]) by gateway (Coremail) with SMTP id _____8DxmdnUcPBjSCQCAA--.4900S3; Sat, 18 Feb 2023 14:31:48 +0800 (CST) Received: from [10.130.0.102] (unknown [113.200.148.30]) by localhost.localdomain (Coremail) with SMTP id AQAAf8AxX+TRcPBjeew1AA--.64979S3; Sat, 18 Feb 2023 14:31:46 +0800 (CST) Subject: Re: [PATCH v4 2/3] LoongArch: Add ptrace single step support To: WANG Xuerui , Huacai Chen , Oleg Nesterov Cc: Jiaxun Yang , loongarch@lists.linux.dev, linux-kernel@vger.kernel.org References: <20230217023745.20800-1-zhangqing@loongson.cn> <20230217023745.20800-3-zhangqing@loongson.cn> From: Qing Zhang Message-ID: Date: Sat, 18 Feb 2023 14:31:45 +0800 User-Agent: Mozilla/5.0 (X11; Linux mips64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-CM-TRANSID: AQAAf8AxX+TRcPBjeew1AA--.64979S3 X-CM-SenderInfo: x2kd0wptlqwqxorr0wxvrqhubq/ X-Coremail-Antispam: 1Uk129KBjvAXoW3Cw4DXF4xXryUKrWxAr1DZFb_yoW8Xw4UGo W7KF1rtr4rJr4jgr1UJr1DJF13Xr1UGrZFyry7G3srX3W8A3WUW3yUGrW5tay7Jr1kGr4U Ga4UXry0yFyxArnxn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7KY7ZEXasCq-sGcSsGvf J3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU0xBIdaVrnRJU UUvYb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2IYs7xG6rWj6s 0DM7CIcVAFz4kK6r106r15M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Jr0_Gr1l84 ACjcxK6I8E87Iv67AKxVWxJVW8Jr1l84ACjcxK6I8E87Iv6xkF7I0E14v26r4j6r4UJwAS 0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMc02F40EFcxC0V AKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Jr0_Gr1l Ox8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcVAKI48JMxk0xIA0c2IEe2xFo4CEbIxvr21l42 xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWU GwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI4 8JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4U MIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I 8E87Iv6xkF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07j1YL9UUUUU= Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Xuerui On 2023/2/17 下午6:16, WANG Xuerui wrote: > On 2023/2/17 10:37, Qing Zhang wrote: >> Use the generic ptrace_resume code for PTRACE_SYSCALL, PTRACE_CONT, >> PTRACE_KILL and PTRACE_SINGLESTEP. This implies defining >> arch_has_single_step in  and implementing the >> user_enable_single_step and user_disable_single_step functions. >> >> LongArch has no hardware single-step register. the hardware single-step > > "LoongArch cannot do hardware single-stepping per se" could be more > accurate a description. ("Hardware single-step register" could be > ambiguous in implying the only way to achieve single-stepping would be > via software, but in fact you explained in the next sentence that it's > actually doable, just in an alternate way.) > >> function multiplex fetch instruction watchpoint(FWPS) and specifies that >> the next instruction must trigger the watch exception by setting the >> mask bit. > > I assume the "multiplex" is translated from "复用", in which case it's > not used in the exact same way as in Chinese. Plus you have the subject > wrong, "hardware single-step function" is the topic but not the actor > doing the "multiplexing". > > My take (continuing from the previous sentence I suggested): > > "Instead it is achieved by configuring the instruction fetch watchpoints > so that WPEF is triggered on the next instruction." > >> Some scenarios use CSR.FWPS.Skip to ignore the next hit result, not to >> trigger >> the watchpoint exception, and proceed to the next instruction. > > Again, wrong topicalization ("scenarios" is not the actor/subject). "In > some scenarios CSR.FWPS.Skip is used to ..." sounds better. > > And the multiple "next" usages sound unclear, you could probably reword > this a bit. (I can't immediately understand this part so I'm unable to > provide a suggestion.) > >> >> Signed-off-by: Qing Zhang >> --- >>   arch/loongarch/include/asm/inst.h      | 39 +++++++++++++++ >>   arch/loongarch/include/asm/loongarch.h |  3 ++ >>   arch/loongarch/include/asm/processor.h |  3 ++ >>   arch/loongarch/include/asm/ptrace.h    |  2 + >>   arch/loongarch/kernel/hw_breakpoint.c  | 35 +++++++++++-- >>   arch/loongarch/kernel/ptrace.c         | 68 ++++++++++++++++++++++++++ >>   arch/loongarch/kernel/traps.c          | 34 +++++++++++-- >>   7 files changed, 176 insertions(+), 8 deletions(-) >> >> diff --git a/arch/loongarch/include/asm/inst.h >> b/arch/loongarch/include/asm/inst.h >> index ba18ce8fbdf2..00c6f261d9a2 100644 >> --- a/arch/loongarch/include/asm/inst.h >> +++ b/arch/loongarch/include/asm/inst.h >> @@ -368,6 +368,45 @@ static inline bool is_stack_alloc_ins(union >> loongarch_instruction *ip) >>           is_imm12_negative(ip->reg2i12_format.immediate); >>   } >> +static inline bool branch_ins_target_pc(union loongarch_instruction *ip) > > This name is not faithfully reflecting its purpose, which is to check if > the given instruction will change PC; and the reason this has to be done > in the first place being that, single stepping would never stop due to > current hardware erratum if the PC would stay the same after the insn's > execution. > > You may want to add some comments regarding the erratum too if you're in > a position to disclose more details. > >> +{ >> +    switch (ip->reg0i26_format.opcode) { >> +    case b_op: >> +    case bl_op: >> +        if (ip->reg0i26_format.immediate_l == 0 >> +           && ip->reg0i26_format.immediate_h == 0) >> +            return false; >> +    } >> + >> +    switch (ip->reg1i21_format.opcode) { >> +    case beqz_op: >> +    case bnez_op: >> +    case bceqz_op: >> +        if (ip->reg1i21_format.immediate_l == 0 >> +           && ip->reg1i21_format.immediate_h == 0) >> +            return false; >> +    } >> + >> +    switch (ip->reg2i16_format.opcode) { >> +    case jirl_op: >> +        if (ip->reg2i16_format.rj == 0x1 >> +           && ip->reg2i16_format.rd == 0x1 > Given the appearance of the other cases, I feel like the triggering > condition of the nonterminating singlestepping is that "new PC == > current PC" always hold. So, isn't every `rd == rj && imm == 0` case > problematic? > We just have to think about dest_pc == current_pc, Not every case where 'rd == rj &&imm == 0' is a problem. current_pc = csr_era (The user mode of the Exception pc) dest_pc = rj + offset eg: jirl ra,ra 0 1. When rj == ra, current_pc = dest_pc. jirl t0,t0 0 2. When rj is another register, if the value in it is equal to ra, current_pc = dest_pc. The value of rj is recovered by pt_reg. Therefore, rd does not need comparison, I modified jirl's judgment and others in v5. Thanks, -Qing >> +           && ip->reg2i16_format.immediate == 0) >> +            return false; >> +        break; >> +    case beq_op: >> +    case bne_op: >> +    case blt_op: >> +    case bge_op: >> +    case bltu_op: >> +    case bgeu_op: >> +        if (ip->reg2i16_format.immediate == 0) >> +            return false; >> +    } >> + >> +    return true; >> +} >> + >>   void simu_pc(struct pt_regs *regs, union loongarch_instruction insn); >>   void simu_branch(struct pt_regs *regs, union loongarch_instruction >> insn); >> diff --git a/arch/loongarch/include/asm/loongarch.h >> b/arch/loongarch/include/asm/loongarch.h >> index e9aed583a064..65b7dcdea16d 100644 >> --- a/arch/loongarch/include/asm/loongarch.h >> +++ b/arch/loongarch/include/asm/loongarch.h >> @@ -1055,6 +1055,9 @@ static __always_inline void iocsr_write64(u64 >> val, u32 reg) >>   #define LOONGARCH_CSR_DERA        0x501    /* debug era */ >>   #define LOONGARCH_CSR_DESAVE        0x502    /* debug save */ >> +#define CSR_FWPC_SKIP_SHIFT        16 >> +#define CSR_FWPC_SKIP            (_ULCAST_(1) << CSR_FWPC_SKIP_SHIFT) >> + >>   /* >>    * CSR_ECFG IM >>    */ >> diff --git a/arch/loongarch/include/asm/processor.h >> b/arch/loongarch/include/asm/processor.h >> index db060c5a976f..3ea0f1910c23 100644 >> --- a/arch/loongarch/include/asm/processor.h >> +++ b/arch/loongarch/include/asm/processor.h >> @@ -131,6 +131,9 @@ struct thread_struct { >>       struct perf_event    *hbp_break[LOONGARCH_MAX_BRP]; >>       struct perf_event    *hbp_watch[LOONGARCH_MAX_WRP]; >> +    /* Used by ptrace single_step */ > > "PTRACE_SINGLESTEP" if you want to refer to the technical name of the > feature, or "ptrace single stepping" which is more natural language. > >> +    unsigned long single_step; >> + >>       /* >>        * FPU & vector registers, must be at last because >>        * they are conditionally copied at fork(). >> diff --git a/arch/loongarch/include/asm/ptrace.h >> b/arch/loongarch/include/asm/ptrace.h >> index 58596c4f8a0f..66a0e6c480a3 100644 >> --- a/arch/loongarch/include/asm/ptrace.h >> +++ b/arch/loongarch/include/asm/ptrace.h >> @@ -150,4 +150,6 @@ static inline void user_stack_pointer_set(struct >> pt_regs *regs, >>       regs->regs[3] = val; >>   } >> +#define arch_has_single_step()        (1) >> + >>   #endif /* _ASM_PTRACE_H */ >> diff --git a/arch/loongarch/kernel/hw_breakpoint.c >> b/arch/loongarch/kernel/hw_breakpoint.c >> index 6431cd319c32..75d3652fbe00 100644 >> --- a/arch/loongarch/kernel/hw_breakpoint.c >> +++ b/arch/loongarch/kernel/hw_breakpoint.c >> @@ -153,6 +153,22 @@ static int hw_breakpoint_slot_setup(struct >> perf_event **slots, int max_slots, >>    */ >>   void flush_ptrace_hw_breakpoint(struct task_struct *tsk) >>   { >> +    int i; >> +    struct thread_struct *t = &tsk->thread; >> + >> +    for (i = 0; i < LOONGARCH_MAX_BRP; i++) { >> +        if (t->hbp_break[i]) { >> +            unregister_hw_breakpoint(t->hbp_break[i]); >> +            t->hbp_break[i] = NULL; >> +        } >> +    } >> + >> +    for (i = 0; i < LOONGARCH_MAX_WRP; i++) { >> +        if (t->hbp_watch[i]) { >> +            unregister_hw_breakpoint(t->hbp_watch[i]); >> +            t->hbp_watch[i] = NULL; >> +        } >> +    } >>   } >>   void ptrace_hw_copy_thread(struct task_struct *tsk) >> @@ -498,11 +514,20 @@ arch_initcall(arch_hw_breakpoint_init); >>   void hw_breakpoint_thread_switch(struct task_struct *next) >>   { >>       struct pt_regs *regs = task_pt_regs(next); >> - >> -    /* Update breakpoints */ >> -    update_bp_registers(regs, 1, 0); >> -    /* Update watchpoints */ >> -    update_bp_registers(regs, 1, 1); >> +    u64 addr, mask; >> + >> +    if (test_bit(TIF_SINGLESTEP, &task_thread_info(next)->flags)) { >> +        addr = read_wb_reg(CSR_CFG_ADDR, 0, 0); >> +        mask = read_wb_reg(CSR_CFG_MASK, 0, 0); >> +        if ((task_pt_regs(next)->csr_era & ~mask) == (addr & ~mask)) >> +            csr_write32(CSR_FWPC_SKIP, LOONGARCH_CSR_FWPS); >> +        regs->csr_prmd |= CSR_PRMD_PWE; >> +    } else { >> +        /* Update breakpoints */ >> +        update_bp_registers(regs, 1, 0); >> +        /* Update watchpoints */ >> +        update_bp_registers(regs, 1, 1); >> +    } >>   } >>   void hw_breakpoint_pmu_read(struct perf_event *bp) >> diff --git a/arch/loongarch/kernel/ptrace.c >> b/arch/loongarch/kernel/ptrace.c >> index bee4194177fd..52a3ee4366f4 100644 >> --- a/arch/loongarch/kernel/ptrace.c >> +++ b/arch/loongarch/kernel/ptrace.c >> @@ -20,6 +20,7 @@ >>   #include >>   #include >>   #include >> +#include >>   #include >>   #include >>   #include >> @@ -30,6 +31,7 @@ >>   #include >>   #include >>   #include >> +#include >>   #include >>   #include >> @@ -39,6 +41,7 @@ >>   #include >>   #include >>   #include >> +#include >>   #include >>   #include >> @@ -541,3 +544,68 @@ long arch_ptrace(struct task_struct *child, long >> request, >>       return ret; >>   } >> + >> +void ptrace_triggered(struct perf_event *bp, >> +              struct perf_sample_data *data, struct pt_regs *regs) >> +{ >> +    struct perf_event_attr attr; >> + >> +    attr = bp->attr; >> +    attr.disabled = true; >> +    modify_user_hw_breakpoint(bp, &attr); >> +} >> + >> +static int set_single_step(struct task_struct *tsk, unsigned long addr) >> +{ >> +    struct thread_struct *thread = &tsk->thread; >> +    struct perf_event *bp; >> +    struct perf_event_attr attr; >> +    struct arch_hw_breakpoint *info; >> + >> +    bp = thread->hbp_break[0]; >> +    if (!bp) { >> +        ptrace_breakpoint_init(&attr); >> + >> +        attr.bp_addr = addr; >> +        attr.bp_len = HW_BREAKPOINT_LEN_8; >> +        attr.bp_type = HW_BREAKPOINT_X; >> + >> +        bp = register_user_hw_breakpoint(&attr, ptrace_triggered, >> +                         NULL, tsk); >> +        if (IS_ERR(bp)) >> +            return PTR_ERR(bp); >> + >> +        thread->hbp_break[0] = bp; >> +    } else { >> +        int err; >> + >> +        attr = bp->attr; >> +        attr.bp_addr = addr; >> +        /* reenable breakpoint */ >> +        attr.disabled = false; >> +        err = modify_user_hw_breakpoint(bp, &attr); >> +        if (unlikely(err)) >> +            return err; >> + >> +        csr_write64(attr.bp_addr, LOONGARCH_CSR_IB0ADDR); >> +    } >> +    info = counter_arch_bp(bp); >> +    info->mask = 0xffffffffffff; >> + >> +    return 0; >> +} >> + >> +/* ptrace API */ >> +void user_enable_single_step(struct task_struct *task) >> +{ >> +    struct thread_info *ti = task_thread_info(task); >> + >> +    set_single_step(task, task_pt_regs(task)->csr_era); >> +    task->thread.single_step = task_pt_regs(task)->csr_era; >> +    set_ti_thread_flag(ti, TIF_SINGLESTEP); >> +} >> + >> +void user_disable_single_step(struct task_struct *task) >> +{ >> +    clear_tsk_thread_flag(task, TIF_SINGLESTEP); >> +} >> diff --git a/arch/loongarch/kernel/traps.c >> b/arch/loongarch/kernel/traps.c >> index 2b133079e0f3..a59275a43bd1 100644 >> --- a/arch/loongarch/kernel/traps.c >> +++ b/arch/loongarch/kernel/traps.c >> @@ -511,9 +511,37 @@ asmlinkage void noinstr do_watch(struct pt_regs >> *regs) >>   #ifdef CONFIG_HAVE_HW_BREAKPOINT >>       irqentry_state_t state = irqentry_enter(regs); >> -    breakpoint_handler(regs); >> -    watchpoint_handler(regs); >> -    force_sig(SIGTRAP); >> +    if (test_tsk_thread_flag(current, TIF_SINGLESTEP)) { >> +        int llbit = (csr_read32(LOONGARCH_CSR_LLBCTL) & 0x1); >> +        unsigned long pc = regs->csr_era; >> +        union loongarch_instruction *ip = (union >> loongarch_instruction *)pc; >> + >> +        if (llbit) { >> +        /* >> +         * When the ll-sc combo is encountered, it is regarded as an >> single >> +         * instruction. So don't clear llbit and reset CSR.FWPS.Skip >> until >> +         * the llsc execution is completed. >> +         */ >> +            csr_write32(CSR_FWPC_SKIP, LOONGARCH_CSR_FWPS); >> +            csr_write32(CSR_LLBCTL_KLO, LOONGARCH_CSR_LLBCTL); >> +        } else if (pc == current->thread.single_step) { >> +        /* >> +         * Maybe some hardware Bug caused that certain insns are >> occasionally >> +         * not skipped when CSR.FWPS.Skip is set, such as >> fld.d/fst.d. So Singlestep >> +         * needs to compare whether the csr_era is equal to the value >> of singlestep >> +         * which last time set. >> +         */ >> +            if (branch_ins_target_pc(ip)) >> +            /* Prevent rd == pc from causing single step to fail to >> stop */ >> +                csr_write32(CSR_FWPC_SKIP, LOONGARCH_CSR_FWPS); >> +        } else { >> +            force_sig(SIGTRAP); >> +        } >> +    } else { >> +        breakpoint_handler(regs); >> +        watchpoint_handler(regs); >> +        force_sig(SIGTRAP); >> +    } >>       irqentry_exit(regs, state); >>   #endif >