Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp155995iob; Tue, 3 May 2022 13:56:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwp5qJi7BrLA8sy25RSLPmJ6OQBDmwAehO+D6unw1/yoxQwALIOyEbtV6sR6l/wigp4BoUu X-Received: by 2002:a17:907:1b1b:b0:6e4:7fac:6ce0 with SMTP id mp27-20020a1709071b1b00b006e47fac6ce0mr17314982ejc.617.1651611367426; Tue, 03 May 2022 13:56:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651611367; cv=none; d=google.com; s=arc-20160816; b=TomIYAjNvKxhJsNmZ/aAo2qMePlh864WhsRpa2ea4Cy0bpipRaj/Wv3FUbRYms+0x6 maqysN3wZjPEL0D+0ycqGDW3d8LehUISUdAIb7dILj67u13UcYD6Bb+qg4hdn9bs1YNw mGfs0c1dYAGL8o1X6Jlnchs7PCzHZVCPUr9q1e8j00YYAIYTH7RJFpuyCwBZldqRmnSL E1tvmegCDZ/CmpbRBoazk4k2hYa26t8VOnIWpsf+gLalu2PhrMTRGnEP4Ug+Dxkal+g/ yn+guHugcEhz7eqU+79KBanUnYkoIcSpyUO4KFhg4GsZhixYM7DmxUT3OegxLjhZ9unj k4SA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=mOzLpToeNi7cqdxC84EYcmPTEFP0phVGBwbe66P0P1U=; b=t+toVKW3oeRnuceXfegyj5jsnueBJTGVZGuCAppnQs3VuG0dy+mJRprbz+1d/cG4K1 LQLAsJsIQR1uYaZyNkzpRBn2nSqnNDWFzwjyeA3m+nU0zeIAYx+Fb/Zu/0jvX1JsZjn2 yslUqQON4TKK86KlAoc/BufzKzaMJ32zWDhzLiRbG9TlKbJQWU3cCdwYfoTAV0C5TiHj WpGXOiJKzW4W6n56xaehMdiejo/Ov5DiJLRuU/d+hgCxC8HmhUP2xWSVDZO5aiYu8wWp K8EtD7we4iPcIXbq9dfLqFyiTzA5kGh1Y1n8Ger6s1vqbmWJpOg0MH9WT+f3YWmYPui7 uYgw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=dnkDAWnT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v2-20020a056402348200b00427f49e7142si1449874edc.16.2022.05.03.13.55.12; Tue, 03 May 2022 13:56:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=dnkDAWnT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241005AbiECSLB (ORCPT + 99 others); Tue, 3 May 2022 14:11:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238979AbiECSK4 (ORCPT ); Tue, 3 May 2022 14:10:56 -0400 Received: from mail-io1-xd2c.google.com (mail-io1-xd2c.google.com [IPv6:2607:f8b0:4864:20::d2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 249193EAAA for ; Tue, 3 May 2022 11:07:22 -0700 (PDT) Received: by mail-io1-xd2c.google.com with SMTP id e3so16085502ios.6 for ; Tue, 03 May 2022 11:07:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=mOzLpToeNi7cqdxC84EYcmPTEFP0phVGBwbe66P0P1U=; b=dnkDAWnTkeTj61Zet+oK6f7ii3U29sPVoBa36qZn/k4Z7cGL0eMVzu6Dg9jmKT87FC 0YQxOe3ZEy7XVvtzMKmGQP4KTHmTX4yhcWh+nZNzJ5Od8CqCWt2acfk0K9jexvJl5P+g 1PYiCdrAHAlsEReKwAqQ1Jx2rCZ8IddDVizM4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=mOzLpToeNi7cqdxC84EYcmPTEFP0phVGBwbe66P0P1U=; b=YxbLJUhuz9Xj97kw7IAIQa4VrTpQqOLWcEL3Y8d7MZbb8CieIDZHs9FHu9cH+OjcWC kVSAaMfwdMQOuCrhtDz6hQvdX/MWOBkGinAcIjRoxMpdTRBVfW4Y5DngHF8iI3ff8lPV kVhk67udQFrZxx7RIglIdSi+ED2mwbvCW7WImiZmD72btwZhH0WxGLzAQtxZXaTyRwYG L5noMCfSIuQ0FM306d/0pv5mQ2Pk9PfjKUeomJ1/7DHnsTiJIDLthCH/ysh4JC++qzqg DV3u4+XwcyDkTJ/B/z7BI4w2ZPIoVaWV17MK8XEND50BG8E0oGoYBiHqFaVtjyc51sex kFDQ== X-Gm-Message-State: AOAM532E1UHh/IACAScyVlJCYsA5e6cCJartUhrr9NyQaWx6E9BdrDGI ZchHNolDz2bTBx+m+4ue5hS9dNSiXJHyx4RHWqa5kA== X-Received: by 2002:a05:6638:250b:b0:32b:6adc:d9a with SMTP id v11-20020a056638250b00b0032b6adc0d9amr4596159jat.31.1651601239878; Tue, 03 May 2022 11:07:19 -0700 (PDT) MIME-Version: 1.0 References: <20220418231746.2464800-1-grundler@chromium.org> <23cbe4be-7ced-62da-8fdb-366b726fe10f@marvell.com> In-Reply-To: From: Grant Grundler Date: Tue, 3 May 2022 11:07:08 -0700 Message-ID: Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes To: Dmitrii Bezrukov Cc: Grant Grundler , Igor Russkikh , Jakub Kicinski , Paolo Abeni , netdev , "David S . Miller" , LKML , Aashay Shringarpure , Yi Chou , Shervin Oloumi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Dmitrii! On Tue, May 3, 2022 at 4:15 AM Dmitrii Bezrukov wro= te: > > Hi Grants, > > >[1/5] net: atlantic: limit buff_ring index value > > >diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >index d875ce3ec759..e72b9d86f6ad 100644 > >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >@@ -981,7 +981,9 @@ int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s > >*self, struct aq_ring_s *ring) > > > > if (buff->is_lro) { > > /* LRO */ > >- buff->next =3D rxd_wb->next_desc_ptr; > >+ buff->next =3D > >+ (rxd_wb->next_desc_ptr < ring->si= ze) ? > >+ rxd_wb->next_desc_ptr : 0U; > > ++ring->stats.rx.lro_packets; > > } else { > > /* jumbo */ > > I don=E2=80=99t find this way correct. At least in this functions there i= s no access to buffers by this index "next". Did I misunderstand what this code (line 378) is doing in aq_ring.c? 342 #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) 343 int aq_ring_rx_clean(struct aq_ring_s *self, 344 struct napi_struct *napi, 345 int *work_done, 346 int budget) 347 { ... 371 if (buff_->next >=3D self->size) { 372 err =3D -EIO; 373 goto err_exit; 374 } 375 376 frag_cnt++; 377 next_ =3D buff_->next, 378 buff_ =3D &self->buff_ring[next_]; My change is redundant with Lines 371-374 - they essentially do the same thing and were added on 2021-12-26 by 5f50153288452 (Zekun Shen) The original fuzzing work was done on chromeos-v5.4 kernel and didn't include this change. I'll backport 5f50153288452t to chromeos-v5.4 and drop my proposed change. > Following this fix, if it happens then during assembling of LRO session i= t could be that this buffer (you suggesting to use buffer with index 0) bec= omes a part of current LRO session and will be also processed as a single p= acket or as a part of other LRO huge packet. > To be honest there are lot of possibilities depends on current values of = head and tail which may cause either memory leak or double free or somethin= g else. *nod* > There is a code which calls this function aq_ring.c: aq_ring_rx_clean. Exactly. > From my POV it's better to check that indexes don't point to outside of r= ing in code which collects LRO session. Sounds good to me. I don't have a preference. I'm ok with dropping/ignoring [1/5] patch. > There is expectation that "next" is in range "cleaned" descriptors, which= means that HW put data there wrote descriptor and buffers are ready to be = process by assembling code. > So in case if "next" points to something outside of ring then guess it wo= uld be better just to stop collecting these buffers to one huge packet and = treat this LRO session as completed. > Then rest of buffers (which should be it that chain) will be collected ag= ain without beginning and just dropped by stack later. That makes sense to me. And apologies for not noticing Zekun Shen's 2021-12-26 change earlier. I've been working on this off and on for several months. > > [4/5] net: atlantic: add check for MAX_SKB_FRAGS > > > >diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c > >b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c > >index bc1952131799..8201ce7adb77 100644 > >--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c > >+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c > >@@ -363,6 +363,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self, > > continue; > > > > if (!buff->is_eop) { > >+ unsigned int frag_cnt =3D 0U; > > buff_ =3D buff; > > do { > > bool is_rsc_completed =3D true; > >@@ -371,6 +372,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self, > > err =3D -EIO; > > goto err_exit; > > } > >+ > >+ frag_cnt++; > > next_ =3D buff_->next, > > buff_ =3D &self->buff_ring[next_]; > > is_rsc_completed =3D > >@@ -378,7 +381,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self, > > next_, > > self->hw_head= ); > > > >- if (unlikely(!is_rsc_completed)) { > >+ if (unlikely(!is_rsc_completed) || > >+ frag_cnt > MAX_SKB_FRAGS) { > > err =3D 0; > > goto err_exit; > > } > > Number of fragments are limited by HW configuration: hw_atl_b0_internal.h= : #define HW_ATL_B0_LRO_RXD_MAX 16U. Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead? > Let's imagine if it happens: driver stucks at this point it will wait for= session completion and session will not be completed due to too much fragm= ents. > Guess in case if number of buffers exceeds this limit then it is required= to close session and continue (submit packet to stack and finalize clearin= g if processed descriptors/buffers). Sorry, I'm not understanding your conclusion. Is the "goto err_exit" in this case doing what you described? Does this patch have the right idea (even if it should test against a different constant)? My main concern is the CPU gets stuck in this loop for a very long (infinite?) time. > > > [5/5] net: atlantic: verify hw_head_ is reasonable diff --git > >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >index e72b9d86f6ad..9b6b93bb3e86 100644 > >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c > >@@ -889,6 +889,27 @@ int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_= s *self, > > err =3D -ENXIO; > > goto err_exit; > > } > >+ > >+ /* Validate that the new hw_head_ is reasonable. */ > >+ if (hw_head_ >=3D ring->size) { > >+ err =3D -ENXIO; > >+ goto err_exit; > >+ } > >+ > >+ if (ring->sw_head >=3D ring->sw_tail) { > >+ /* Head index hasn't wrapped around to below tail index. = */ > >+ if (hw_head_ < ring->sw_head && hw_head_ >=3D ring->sw_ta= il) { > >+ err =3D -ENXIO; > >+ goto err_exit; > >+ } > >+ } else { > >+ /* Head index has wrapped around and is below tail index.= */ > >+ if (hw_head_ < ring->sw_head || hw_head_ >=3D ring->sw_ta= il) { > >+ err =3D -ENXIO; > >+ goto err_exit; > >+ } > >+ } > >+ > > ring->hw_head =3D hw_head_; > > err =3D aq_hw_err_from_flags(self); > > Simple example. One packet with one buffer was sent. Sw_tail =3D 1, sw_he= ad=3D0. From interrupt this function is called and hw_head_ is 1. > Code will follow "else" branch of second "if" that you add and condition = "if (hw_head_ < ring->sw_head || hw_head_ >=3D ring->sw_tail) {" will be tr= ue which seems to be not expected. Correct - I've got this wrong (head/tail swapped). Even if I had it right, As Igor observed, debatable if it's necessary. Please drop/ignore this patch as well. Aashay and I need to discuss this more. thank you again! cheers, grant > > Best regards, > Dmitrii Bezrukov > > -----Original Message----- > From: Grant Grundler > Sent: Tuesday, April 26, 2022 7:21 PM > To: Igor Russkikh > Cc: Grant Grundler ; Dmitrii Bezrukov ; Jakub Kicinski ; Paolo Abeni ; netdev ; David S . Miller ; LKML ; Aashay Shringarpure ; Yi Chou ; Shervin Oloumi > Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes > > [reply-all again since I forgot to tell gmail to post this as "plain text= "...grrh... so much for AI figuring this stuff out.] > > > On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh wro= te: > > > > Hi Grant, > > > > Sorry for the delay, I was on vacation. > > Thanks for working on this. > > Hi Igor! > Very welcome! And yes, I was starting to wonder... but I'm now glad that = you didn't review them before you got back. These patches are no reason to = ruin a perfectly good vacation. :) > > > I'm adding here Dmitrii, to help me review the patches. > > Dmitrii, here is a full series: > > > > https://urldefense.proofpoint.com/v2/url?u=3Dhttps-3A__patchwork.kernel= . > > org_project_netdevbpf_cover_20220418231746.2464800-2D1-2Dgrundler-40ch > > romium.org_&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ&r=3DAliKLBUTg9lFc5sIM= TzJt8 > > MdPiAgKbsC8IpLIHmdX9w&m=3D1LeNSCJMgZ7UkGBm56FcvL_Oza8VOX45LEtQf31qPE2KL= Q > > cr5Q36aMIUR2DzLhi7&s=3DfVFxLPRO8K2DYFpGUOggf38nbDFaHKg8aATsjB1TuB0&e=3D > > > > Grant, I've reviewed and also quite OK with patches 1-4. > > Excellent! \o/ > > > > For patch 5 - why do you think we need these extra comparisons with sof= tware head/tail? > > The ChromeOS security team (CC'd) believes the driver needs to verify "ex= pected behavior". In other words, the driver expects the device to provide = new values of tail index which are between [tail,head) ("available to fill"= ). > > Your question makes me chuckle because I asked exactly the same question.= :D Everyone agrees it is a minimum requirement to verify the index was "in= bounds". And I agree it's prudent to verify the device is "well behaved" w= here we can. I haven't looked at the code enough to know what could go wron= g if, for example, the tail index is decremented instead of incremented or = a "next fragment" index falls in the "available to fill" range. > > However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS se= curity team's word that this check is needed. If you (or Dmitrii) feel stro= ngly the driver can handle malicious or firmware bugs in other ways, I'm no= t offended if you decline this patch. However, I would be curious what thos= e other mechanisms are. > > > From what I see in logic, only the size limiting check is enough there.= . > > > > Other extra checks are tricky and non intuitive.. > > Yes, somewhat tricky in the code but conceptually simple: For the RX buff= er ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is "availab= le to fill". New tail values should always be in the latter range. > > The trickiness comes in because this is a ring buffer and [tail, head) it= is equally likely that head =3D< tail or head > tail numerically. > > If you like, feel free to add comments explaining the ring behavior or as= k me to add such a comment (and repost #5). I'm a big fan of documenting no= n-intuitive things in the code. That way the next person to look at the cod= e can verify the code and the IO device do what the comment claims. > > On the RX buffer ring, I'm also wondering if there is a race condition su= ch that the driver uses stale values of the tail pointer when walking the R= X fragment lists and validating index values. Aashay assures me this race c= ondition is not possible and I am convinced this is true for the TX buffer = ring where the driver is the "producer" > (tells the device what is in the TX ring). I still have to review the RX = buffer handling code more and will continue the conversation with him until= we agree. > > cheers, > grant > > > > > Regards, > > Igor > > > > On 4/21/2022 9:53 PM, Grant Grundler wrote: > > > External Email > > > > > > -------------------------------------------------------------------- > > > -- > > > Igor, > > > Will you have a chance to comment on this in the near future? > > > Should someone else review/integrate these patches? > > > > > > I'm asking since I've seen no comments in the past three days. > > > > > > cheers, > > > grant > > > > > > > > > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler > > > > > > wrote: > > >> > > >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic > > >> driver in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters > > >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version): > > >> > > >> https://urldefense.proofpoint.com/v2/url?u=3Dhttps-3A__docs.google.c= o > > >> m_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOp > > >> OlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=3DDwIBaQ&c=3DnKjWec2b= 6R0 > > >> mOyPaz7xtfQ&r=3D3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=3DQoxR= 8Wo > > >> QQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=3D620jqe= S > > >> vQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=3D > > >> > > >> It essentially describes four problems: > > >> 1) validate rxd_wb->next_desc_ptr before populating buff->next > > >> 2) "frag[0] not initialized" case in aq_ring_rx_clean() > > >> 3) limit iterations handling fragments in aq_ring_rx_clean() > > >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update() > > >> > > >> I've added one "clean up" contribution: > > >> "net: atlantic: reduce scope of is_rsc_complete" > > >> > > >> I tested the "original" patches using chromeos-v5.4 kernel branch: > > >> > > >> https://urldefense.proofpoint.com/v2/url?u=3Dhttps-3A__chromium-2Dre= v > > >> iew.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28st > > >> atus-3Aopen-2520OR-2520status-3Amerged-29&d=3DDwIBaQ&c=3DnKjWec2b6R0= mOy > > >> Paz7xtfQ&r=3D3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=3DQoxR8Wo= QQ- > > >> hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=3D1a1YwJqrY= - > > >> be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=3D > > >> > > >> The fuzzing team will retest using the chromeos-v5.4 patches and > > >> the b0 HW. > > >> > > >> I've forward ported those patches to 5.18-rc2 and compiled them but > > >> am currently unable to test them on 5.18-rc2 kernel (logistics probl= ems). > > >> > > >> I'm confident in all but the last patch: > > >> "net: atlantic: verify hw_head_ is reasonable" > > >> > > >> Please verify I'm not confusing how ring->sw_head and ring->sw_tail > > >> are used in hw_atl_b0_hw_ring_tx_head_update(). > > >> > > >> Credit largely goes to Chrome OS Fuzzing team members: > > >> Aashay Shringarpure, Yi Chou, Shervin Oloumi > > >> > > >> cheers, > > >> grant