Received: by 2002:a05:7412:d8a:b0:e2:908c:2ebd with SMTP id b10csp768559rdg; Wed, 11 Oct 2023 05:12:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGvONLlH3Kk6na4hQ48/Dl19S8E22rBKvfoxN5zRQ1F1LCLUaj6YieJi5/1+iZb/XAl0ZEZ X-Received: by 2002:a05:6a20:da9d:b0:16c:b5be:5f6c with SMTP id iy29-20020a056a20da9d00b0016cb5be5f6cmr14358854pzb.54.1697026364565; Wed, 11 Oct 2023 05:12:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697026364; cv=none; d=google.com; s=arc-20160816; b=vh1h24tzimQj79YsAa8qquof4mO6yXpm+CkAIQMrlaufL4rY5eOGhHL9REvKYyv2cI ZIQS7xHtYut1VnM/GXZwpMy/MNseRVfxQ09TcCq4A23tMpfHQ606hxDJKqWtbPhcIj6p 5Bgw5jw+DtKSF6eFMTSaA2uEBNocjd0Q/LN212WICQSMEgsy3VPEHb46xCwKp+8W2guR nXYBw/Cshrz4C4Nk1IlLgTZ2UI4RZy4gWEb1NZEa6N21vVszmsdID++4CTUEDQ3CY5F/ PdAZ1JwbWkYDc/c/5LTFUW2efX6z+y9m3jyz17Mdc7XmZe2MmUsYYl9VOGJWFvwisrwT jjVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=spK2veKjGKPWlL695PFbHPuUiEqdtYJXw3T2DQ3u9LI=; fh=DR7g1EcWKOXTEoooUPBSJXUaklSrDEYzv6YDdhz1CwE=; b=kQPbNZXARvzl5vM9xJS1K8GqIUSY/SANIF+/lYhNZAw8NgtKJx7A8JcoE4lzxxWOax lb1w73FcCbmlDnUHk4Mxjiv7Wlbq70T68OBhzhn2XAa/6fL/W2Sd5kiDccrZPYooxvSm BJF1XOW6MLr9qVI8ifCpsLpeaM4bti9au5QY0W6caeuHPf4JQfqdep55BQN4OPczjl0R 2OU619mLPm7EYM8lcTrGye9qzjKMPup1SIs6RbEv4kmo3wekw58Xmj3uhtnFMru9NYbx 761c7OVi3eKLw8fVoWM37WjriJ6HxaVw++5MgjscSYVFQMJIvrXf83kNIfnjfK95nIPN lFcQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=HNYpW+UB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Return-Path: Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id a29-20020a637f1d000000b00578e7a37c45si14636807pgd.10.2023.10.11.05.12.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Oct 2023 05:12:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=HNYpW+UB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 7B3C28135618; Wed, 11 Oct 2023 05:12:40 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346771AbjJKML5 (ORCPT + 99 others); Wed, 11 Oct 2023 08:11:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346854AbjJKMKp (ORCPT ); Wed, 11 Oct 2023 08:10:45 -0400 Received: from mail-pf1-x42a.google.com (mail-pf1-x42a.google.com [IPv6:2607:f8b0:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8554D58 for ; Wed, 11 Oct 2023 05:09:43 -0700 (PDT) Received: by mail-pf1-x42a.google.com with SMTP id d2e1a72fcca58-694ed847889so5203357b3a.2 for ; Wed, 11 Oct 2023 05:09:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1697026172; x=1697630972; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=spK2veKjGKPWlL695PFbHPuUiEqdtYJXw3T2DQ3u9LI=; b=HNYpW+UBrqt1EsgOIjp8OlgFt+mMZYusVSuqLV+l/QNLFNSyFc8wDu52pcMSXvk8aN NTmBSuhCYHw1X6T7563JgBBENauR1pagGauCpBDUSWy5DcN6RR0aDrw36Lub/A3yU5jD wToRVraUczoB+pltLxsW56ixxIdV4uskoeyaIYSxmXCglA3hK0/TLBWcZHgNY9+P2tTX ODstqAamlpifAG4ZTg1TyS3WCouQytiQQg4UlBjsPQp3XA3fx3ncHvUSZXxybeTYlXD0 JaSwiC6P8aB09giMMPD64moeHD+fROMg1vMnNCwDVG4Dgis5ff/x1hhndpkqaE7XQcoC b18A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697026172; x=1697630972; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=spK2veKjGKPWlL695PFbHPuUiEqdtYJXw3T2DQ3u9LI=; b=nu5yy6QcjSK3XJICQM1jStejUzBtXG0PLYJOfOPiC/xXXuz89aa5rxcL1zoM3KwQX0 2ts8g4Xv6FK/aE3I50GIpF7VKSRN0a5smJK4OaYqCWcJ10cDwb93OHrNDbP7eVgU4W57 BVv+jcgRVn+PJN8YgOfYhEtqVN2so+Z7hC4ZNr33hydMZEarykpM7pvEd7DircGLOhJ1 fFAy/oSXR+8GJwls8jn5RdbyxagGTvBkuADNFthhRC497LvbqsdM/f9mtv4kPx+uttwS Q0QzkWMLVKN/XS14jhOHQ/pDFyy3unM4Vke7bWQ8pzd41NiMYLw0P2bZdy/N65D3lfT0 P9Ng== X-Gm-Message-State: AOJu0Yy76vnIvt+E71QAENQoWjLy1gtbUxPsQ+0wGDytfSudwYJhuVKW tAwmj0q6K9DX1MHOXkwqb/B5Og== X-Received: by 2002:a05:6a00:1943:b0:690:d4fa:d43d with SMTP id s3-20020a056a00194300b00690d4fad43dmr18270745pfk.6.1697026171738; Wed, 11 Oct 2023 05:09:31 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.142]) by smtp.gmail.com with ESMTPSA id u13-20020a62ed0d000000b006930db1e6cfsm9962769pfh.62.2023.10.11.05.09.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Oct 2023 05:09:31 -0700 (PDT) From: Chuyi Zhou To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, Chuyi Zhou Subject: [PATCH bpf-next v5 5/8] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Date: Wed, 11 Oct 2023 20:08:54 +0800 Message-Id: <20231011120857.251943-6-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231011120857.251943-1-zhouchuyi@bytedance.com> References: <20231011120857.251943-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=2.7 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Wed, 11 Oct 2023 05:12:40 -0700 (PDT) X-Spam-Level: ** css_iter and task_iter should be used in rcu section. Specifically, in sleepable progs explicit bpf_rcu_read_lock() is needed before use these iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to use them directly. This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check whether we are in rcu cs before we want to invoke this kfunc. If the rcu protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU. Once user do rcu_unlock during the iteration, state MEM_RCU of regs would be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED. It is worth noting that currently, bpf_rcu_read_unlock does not clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers STACK_SPILL. This patch also let bpf_for_each_spilled_reg search STACK_ITER. Signed-off-by: Chuyi Zhou --- include/linux/bpf_verifier.h | 19 ++++++++------ include/linux/btf.h | 1 + kernel/bpf/helpers.c | 4 +-- kernel/bpf/verifier.c | 50 ++++++++++++++++++++++++++++-------- 4 files changed, 53 insertions(+), 21 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 94ec766432f5..e67cd45a85be 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -386,19 +386,18 @@ struct bpf_verifier_state { u32 jmp_history_cnt; }; -#define bpf_get_spilled_reg(slot, frame) \ +#define bpf_get_spilled_reg(slot, frame, mask) \ (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \ + ((1 << frame->stack[slot].slot_type[0]) & (mask))) \ ? &frame->stack[slot].spilled_ptr : NULL) /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */ -#define bpf_for_each_spilled_reg(iter, frame, reg) \ - for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \ +#define bpf_for_each_spilled_reg(iter, frame, reg, mask) \ + for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask); \ iter < frame->allocated_stack / BPF_REG_SIZE; \ - iter++, reg = bpf_get_spilled_reg(iter, frame)) + iter++, reg = bpf_get_spilled_reg(iter, frame, mask)) -/* Invoke __expr over regsiters in __vst, setting __state and __reg */ -#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ +#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \ ({ \ struct bpf_verifier_state *___vstate = __vst; \ int ___i, ___j; \ @@ -410,7 +409,7 @@ struct bpf_verifier_state { __reg = &___regs[___j]; \ (void)(__expr); \ } \ - bpf_for_each_spilled_reg(___j, __state, __reg) { \ + bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \ if (!__reg) \ continue; \ (void)(__expr); \ @@ -418,6 +417,10 @@ struct bpf_verifier_state { } \ }) +/* Invoke __expr over regsiters in __vst, setting __state and __reg */ +#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ + bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr) + /* linked list of verifier states used to prune search */ struct bpf_verifier_state_list { struct bpf_verifier_state state; diff --git a/include/linux/btf.h b/include/linux/btf.h index 928113a80a95..c2231c64d60b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -74,6 +74,7 @@ #define KF_ITER_NEW (1 << 8) /* kfunc implements BPF iter constructor */ #define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */ #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */ +#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */ /* * Tag marking a kernel function as a kfunc. This is meant to minimize the diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 6330e37ca8d5..5081a4ac5b6d 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2555,10 +2555,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 528d375c17ee..3a60cc87520e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1173,7 +1173,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg static void __mark_reg_known_zero(struct bpf_reg_state *reg); +static bool in_rcu_cs(struct bpf_verifier_env *env); + +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta); + static int mark_stack_slots_iter(struct bpf_verifier_env *env, + struct bpf_kfunc_call_arg_meta *meta, struct bpf_reg_state *reg, int insn_idx, struct btf *btf, u32 btf_id, int nr_slots) { @@ -1194,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env, __mark_reg_known_zero(st); st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ + if (is_kfunc_rcu_protected(meta)) { + if (in_rcu_cs(env)) + st->type |= MEM_RCU; + else + st->type |= PTR_UNTRUSTED; + } st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = i == 0 ? id : 0; st->iter.btf = btf; @@ -1268,7 +1279,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env, return true; } -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, struct btf *btf, u32 btf_id, int nr_slots) { struct bpf_func_state *state = func(env, reg); @@ -1276,26 +1287,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_ spi = iter_get_spi(env, reg, nr_slots); if (spi < 0) - return false; + return -EINVAL; for (i = 0; i < nr_slots; i++) { struct bpf_stack_state *slot = &state->stack[spi - i]; struct bpf_reg_state *st = &slot->spilled_ptr; + if (st->type & PTR_UNTRUSTED) + return -EPROTO; /* only main (first) slot has ref_obj_id set */ if (i == 0 && !st->ref_obj_id) - return false; + return -EINVAL; if (i != 0 && st->ref_obj_id) - return false; + return -EINVAL; if (st->iter.btf != btf || st->iter.btf_id != btf_id) - return false; + return -EINVAL; for (j = 0; j < BPF_REG_SIZE; j++) if (slot->slot_type[j] != STACK_ITER) - return false; + return -EINVAL; } - return true; + return 0; } /* Check if given stack slot is "special": @@ -7618,15 +7631,24 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id return err; } - err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots); + err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots); if (err) return err; } else { /* iter_next() or iter_destroy() expect initialized iter state*/ - if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) { + err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots); + switch (err) { + case 0: + break; + case -EINVAL: verbose(env, "expected an initialized iter_%s as arg #%d\n", iter_type_str(meta->btf, btf_id), regno); - return -EINVAL; + return err; + case -EPROTO: + verbose(env, "expected an RCU CS when using %s\n", meta->func_name); + return err; + default: + return err; } spi = iter_get_spi(env, reg, nr_slots); @@ -10209,6 +10231,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_RCU; } +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta) +{ + return meta->kfunc_flags & KF_RCU_PROTECTED; +} + static bool __kfunc_param_match_suffix(const struct btf *btf, const struct btf_param *arg, const char *suffix) @@ -11560,6 +11587,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (env->cur_state->active_rcu_lock) { struct bpf_func_state *state; struct bpf_reg_state *reg; + u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER); if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) { verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n"); @@ -11570,7 +11598,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, verbose(env, "nested rcu read lock (kernel function %s)\n", func_name); return -EINVAL; } else if (rcu_unlock) { - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ + bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({ if (reg->type & MEM_RCU) { reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL); reg->type |= PTR_UNTRUSTED; -- 2.20.1