From: Alex Bennée
To: Erik Schilling
Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
    virtio-fs@redhat.com, Richard Henderson, Weiwei Li, Vivek Goyal,
    Stefan Hajnoczi, Miklos Szeredi, Manos Pitsidianakis, Viresh Kumar,
    linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, German Maglione,
    Hanna Czenczek, Philippe Mathieu-Daudé
Subject: Re: [BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs
Date: Wed, 13 Sep 2023 16:38:12 +0100
Message-ID: <87msxqrrwr.fsf@linaro.org>

"Erik Schilling" writes:

> CCing a few more people as suggested by stefanha on #qemu.

(Adding philmd, who is tracking the MIPS failure.)

> On Wed Sep 13, 2023 at 8:18 AM CEST, Erik Schilling wrote:
>> On Fri Sep 1, 2023 at 12:37 PM CEST, Erik Schilling wrote:
>> > On Wed Aug 30, 2023 at 10:20 AM CEST, Erik Schilling wrote:
>> > > Hi all!
>> > >
>> > > Some days ago I posted to #virtiofs:matrix.org, describing that I am
>> > > observing what looks like a corruption when executing programs from a
>> > > virtiofs-based filesystem.
>> > >
>> > > Over the last few days I spent more time drilling into the problem.
>> > > This is an attempt at summarizing my findings in order to see what
>> > > other people think about this.
>> > >
>> > > When running binaries mounted from virtiofs they may either: fail
>> > > with a segfault, fail with badaddr, get stuck or - sometimes - succeed.
>> > >
>> > > Environment:
>> > >   Host: Fedora 38 running 6.4.11-200.fc38.x86_64
>> > >   Guest: Yocto-based image: 6.4.9-yocto-standard, aarch64
>> > >   virtiofsd: latest main + some debug prints [1]
>> > >   QEMU: built from recent git [2]
>> > >
>> > > virtiofsd invocation:
>> > >   RUST_LOG="debug" ./virtiofsd --seccomp=none --sandbox=none \
>> > >     --socket-path "fs.sock0" --shared-dir $PWD/share-dir/ --cache=never
>> > >
>> > > QEMU invocation:
>> > >   ~/projects/qemu/build/qemu-system-aarch64 -kernel Image -machine virt \
>> > >     -cpu cortex-a57 \
>> > >     -serial mon:stdio \
>> > >     -device virtio-net-pci,netdev=net0 \
>> > >     -netdev user,id=net0,hostfwd=tcp::2223-:22 \
>> > >     -display none -m 2048 -smp 4 \
>> > >     -object memory-backend-memfd,id=mem,size=2048M,share=on \
>> > >     -numa node,memdev=mem \
>> > >     -hda trs-overlay-guest.qcow2 \
>> > >     -chardev socket,id=char0,path="fs.sock0" \
>> > >     -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=/dev/root \
>> > >     -append 'root=/dev/vda2 ro log_buf_len=8M'
>> > >
>> > > I figured that launching virtiofsd with --cache=always masks the
>> > > problem. Therefore, I set --cache=never, but I think I observed no
>> > > difference compared to the default setting (auto).
>> > >
>> > > Adding logging to virtiofsd and the kernel _felt_ like it made the
>> > > problem harder to reproduce - leaving me with the impression that
>> > > some race is happening somewhere.
>> > >
>> > > Trying to rule out that virtiofsd is returning corrupted data, I added
>> > > some logging and hashsum calculation hacks to it [1]. The hashes check
>> > > out across multiple accesses, and the order and kind of queued messages
>> > > are exactly the same in both the error case and the crash case. fio was
>> > > also unable to find any errors with a naive job description [3].
>> > >
>> > > Next, I tried to capture info on the guest side. This became a bit
>> > > tricky since the crashes became pretty rare once I followed a fixed
>> > > pattern of starting log capture, running perf and trying to reproduce
>> > > the problem. Ultimately, I had the most consistent results with
>> > > immediately running a program twice:
>> > >
>> > >   /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /; \
>> > >     /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /
>> > >
>> > > (/mnt being the virtiofs mount)
>> > >
>> > > For collecting logs, I made a hack to the guest kernel in order to dump
>> > > the page content after receiving the virtiofs responses [4]. Reproducing
>> > > the problem with this leaves me with logs that seem to suggest that
>> > > virtiofsd is returning identical content, but the guest kernel seems to
>> > > receive differing pages:
>> > >
>> > > good-kernel [5]:
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x312 nodeid 0x1 in.len 56 out.len 104
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x314 nodeid 0x1 in.len 53 out.len 128
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x316 nodeid 0x29 in.len 56 out.len 104
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x318 nodeid 0x29 in.len 48 out.len 16
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x31a nodeid 0x29 in.len 80 out.len 832
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs: page: 000000006996d520
>> > >   kernel: virtio_fs: to: 00000000de590c14
>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   [...]
>> > >
>> > > bad-kernel [6]:
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x162 nodeid 0x1 in.len 56 out.len 104
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x164 nodeid 0x1 in.len 53 out.len 128
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x166 nodeid 0x16 in.len 56 out.len 104
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x168 nodeid 0x16 in.len 48 out.len 16
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x16a nodeid 0x16 in.len 80 out.len 832
>> > >   kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
>> > >   kernel: virtio_fs: page: 000000006ce9a559
>> > >   kernel: virtio_fs: to: 000000007ae8b946

Are these the copies from the virtio buffers into the buffer cache? I
would assume they trigger -d trace:memory_notdirty_write_access if so.

>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> > >   kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00  .@.......+b.....
>> > >   kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00  .N..............
>> > >   [...]
>> > >
>> > > When looking at the corresponding output from virtiofsd, it claims to
>> > > have returned identical data:
>> > >
>> > > good-virtiofsd [7]:
>> > >   [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=41, unique=794, pid=481
>> > >   [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
>> > >       fh: 31,
>> > >       offset: 0,
>> > >       size: 832,
>> > >       read_flags: 2,
>> > >       lock_owner: 6838554705639967244,
>> > >       flags: 131072,
>> > >       padding: 0,
>> > >   }
>> > >   [src/file_traits.rs:161] hash = 2308490450751364994
>> > >   [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 794 }
>> > >
>> > > bad-virtiofsd [8]:
>> > >   [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=22, unique=362, pid=406
>> > >   [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
>> > >       fh: 12,
>> > >       offset: 0,
>> > >       size: 832,
>> > >       read_flags: 2,
>> > >       lock_owner: 6181120926258395554,
>> > >       flags: 131072,
>> > >       padding: 0,
>> > >   }
>> > >   [src/file_traits.rs:161] hash = 2308490450751364994
>> > >   [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 362 }
>> > >
>> > > The "corruption" only seems to happen in this one page; all other pages
>> > > are identical between runs (except that the bad run terminates earlier).
>> > >
>> > > What do the experts think here? To me it feels a bit like some kind of
>> > > corruption is going on. Or am I misinterpreting things here?
>> > >
>> > > Which further analysis steps would you suggest?
>> > >
>> > > Further notes:
>> > >
>> > > After collecting the above results, I realized that running the guest
>> > > with -smp 1 makes the problems a lot worse. So maybe that is a better
>> > > choice when trying to reproduce it.
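
For anyone following along without opening [4]: I assume Erik's dump hook
boils down to something like the following in the virtiofs request
completion path - a rough, untested sketch rather than the actual patch,
with the helper name and call site purely illustrative:

  /* Hypothetical debug helper, e.g. called once the device has filled
   * in a read reply page somewhere in fs/fuse/virtio_fs.c. */
  #include <linux/kernel.h>
  #include <linux/printk.h>
  #include <linux/highmem.h>

  static void virtio_fs_dump_reply_page(struct page *page, unsigned int len)
  {
          void *buf = kmap_local_page(page);

          pr_debug("virtio_fs: page: %p\n", page);
          print_hex_dump(KERN_DEBUG, "virtio_fs rsp:", DUMP_PREFIX_NONE,
                         16, 1, buf, min_t(unsigned int, len, PAGE_SIZE),
                         true);
          kunmap_local(buf);
  }

which would give the kind of 16-bytes-per-row dumps quoted above.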
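
To expand on my tracepoint question above: if that copy into the page
cache is done by the guest CPU (i.e. TCG stores into a page we may hold
translations for), then adding something like

  -d trace:memory_notdirty_write_access -D notdirty.log

to the existing QEMU command line should - I would expect, untested -
log the vaddr/ram_addr/size of each store that takes the notdirty slow
path. A write done directly by virtiofsd into the shared memfd never
goes through QEMU at all, so it would not show up there.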
>> > >
>> > > Repo with my scripts is available at:
>> > > https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/
>> > >
>> > > The scripts are just quick and dirty implementations and are not
>> > > particularly portable.
>> >
>> > Summary of my testing during the last few days:
>> >
>> > Testing with KCSAN revealed a few cases that look like missing READ_ONCE
>> > annotations (will send patches separately). But nothing of that was
>> > related to the immediate problem. I tested instrument_read() and another
>> > round of logging with a delay to virtio_fs_request_complete. It looks
>> > like the buffer gets corrupted before entering that function. Neither
>> > KCSAN nor manual sleeps + prints showed any corruption while in that
>> > function.
>> >
>> > KASAN did not report any issues.
>> >
>> > Patching virtiofsd to do an additional copy and going through rust-vmm's
>> > .copy_to() function did not change the behaviour.
>> >
>> > I will mostly be off next week, will continue analysis afterwards. Happy
>> > to hear about suggestions of other things to try :).
>>
>> Back from a week of vacation...
>>
>> Summary of what was discussed on #virtiofs:matrix.org:
>>
>> The issue only seems to happen in QEMU TCG scenarios (I tested aarch64
>> and x86_64 on x86_64, wizzard on Matrix tested arm32).
>>
>> CCing qemu-devel. Maybe someone has some hints on where to focus the
>> debugging efforts?
>>
>> I am trying to build a complex monster script of tracing the relevant
>> addresses in order to figure out whether the guest or the host does the
>> writes. But I am happy to hear about more clever ideas :).
>
> After hearing about investigations of bugs in other virtio scenarios
> that seem to be caused by QEMU [9], I tested some older QEMU versions.
>
> Indeed, a882b5712373171d3bd53cd82ddab4453ddef468 did not show the buggy
> behaviour.
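
On the guest-or-host question: rather than a monster tracing script, a
small TCG plugin could log every guest store that lands in the suspect
physical page - if the corrupting write never shows up there, that points
at something on the host side (virtiofsd scribbling on the shared memfd)
rather than the guest. A rough, untested sketch (the plugin and its
"page=" argument are made up for illustration, and it assumes 4K target
pages):

  /* log-page-writes.c - log guest stores hitting one physical page.
   * Build (roughly): gcc -shared -fPIC -I$QEMU_SRC/include/qemu -o liblogpage.so log-page-writes.c
   * Run with: -plugin ./liblogpage.so,page=0x<suspect-phys-addr>
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <inttypes.h>
  #include <qemu-plugin.h>

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

  static uint64_t watch_page;   /* physical page of interest */

  /* called for every instrumented guest store */
  static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                       uint64_t vaddr, void *udata)
  {
      struct qemu_plugin_hwaddr *hw = qemu_plugin_get_hwaddr(info, vaddr);
      char msg[128];
      uint64_t pa;

      if (!hw) {
          return;
      }
      pa = qemu_plugin_hwaddr_phys_addr(hw);
      if ((pa & ~0xfffULL) != watch_page) {
          return;
      }
      snprintf(msg, sizeof(msg),
               "cpu %u store to pa 0x%" PRIx64 " (va 0x%" PRIx64 ")\n",
               cpu_index, pa, vaddr);
      qemu_plugin_outs(msg);
  }

  /* instrument the stores of every translated block */
  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
  {
      size_t n = qemu_plugin_tb_n_insns(tb);

      for (size_t i = 0; i < n; i++) {
          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
          qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
                                           QEMU_PLUGIN_CB_NO_REGS,
                                           QEMU_PLUGIN_MEM_W, NULL);
      }
  }

  QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                             const qemu_info_t *info,
                                             int argc, char **argv)
  {
      for (int i = 0; i < argc; i++) {
          if (!strncmp(argv[i], "page=", 5)) {
              watch_page = strtoull(argv[i] + 5, NULL, 0) & ~0xfffULL;
          }
      }
      qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
      return 0;
  }

The awkward part is getting the guest physical address of the page first
(e.g. via /proc/self/pagemap in the guest), and writes done by QEMU itself
(rather than by translated guest code) would not be caught either.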
>
> So I did a bisect:
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [a882b5712373171d3bd53cd82ddab4453ddef468] Revert "virtio: introduce macro VIRTIO_CONFIG_IRQ_IDX"
> git bisect good a882b5712373171d3bd53cd82ddab4453ddef468
> # status: waiting for bad commit, 1 good commit known
> # bad: [9ef497755afc252fb8e060c9ea6b0987abfd20b6] Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into staging
> git bisect bad 9ef497755afc252fb8e060c9ea6b0987abfd20b6
> # skip: [3ba5fe46ea4456a16e2f47ab8e75943b54879c4e] Merge tag 'mips-20221108' of https://github.com/philmd/qemu into staging
> git bisect skip 3ba5fe46ea4456a16e2f47ab8e75943b54879c4e
> # skip: [ade760a2f63804b7ab1839fbc3e5ddbf30538718] Merge tag 'pull-request-2022-11-08' of https://gitlab.com/thuth/qemu into staging
> git bisect skip ade760a2f63804b7ab1839fbc3e5ddbf30538718
> # good: [ad2ca2e3f762b0cb98eb976002569795b270aef1] target/xtensa: Drop tcg_temp_free
> git bisect good ad2ca2e3f762b0cb98eb976002569795b270aef1
> # bad: [19a720b74fde7e859d19f12c66a72e545947a657] Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into staging
> git bisect bad 19a720b74fde7e859d19f12c66a72e545947a657
> # bad: [29d9efca16080211f107b540f04d1ed3c12c63b0] arm/Kconfig: Do not build TCG-only boards on a KVM-only build
> git bisect bad 29d9efca16080211f107b540f04d1ed3c12c63b0
> # good: [9636e513255362c4a329e3e5fb2c97dab3c5ce47] Merge tag 'misc-next-pull-request' of https://gitlab.com/berrange/qemu into staging
> git bisect good 9636e513255362c4a329e3e5fb2c97dab3c5ce47
> # bad: [45608654aa63ca2b311d6cb761e1522f2128e00e] Merge tag 'pull-tpm-2023-04-20-1' of https://github.com/stefanberger/qemu-tpm into staging
> git bisect bad 45608654aa63ca2b311d6cb761e1522f2128e00e
> # good: [1ff4a81bd3efb207992f1da267886fe0c4df764f] tcg: use QTree instead of GTree
> git bisect good 1ff4a81bd3efb207992f1da267886fe0c4df764f
> # bad: [9ed98cae151368cc89c4bb77c9f325f7185e8f09] block-backend: ignore inserted state in blk_co_nb_sectors
> git bisect bad 9ed98cae151368cc89c4bb77c9f325f7185e8f09
> # good: [c8cb603293fd329f2a62ade76ec9de3f462fc5c3] tests/avocado: Test Xen guest support under KVM
> git bisect good c8cb603293fd329f2a62ade76ec9de3f462fc5c3
> # bad: [64f1c63d87208e28e8e38c4ab514ada1728960ef] Merge tag 'pull_error_handle_fix_use_after_free.v1' of https://github.com/stefanberger/qemu-tpm into staging
> git bisect bad 64f1c63d87208e28e8e38c4ab514ada1728960ef
> # good: [8a712df4d4d736b7fe6441626677bfd271d95b15] Merge tag 'pull-for-8.0-040423-2' of https://gitlab.com/stsquad/qemu into staging
> git bisect good 8a712df4d4d736b7fe6441626677bfd271d95b15
> # bad: [7d0334e49111787ae19fbc8d29ff6e7347f0605e] Merge tag 'pull-tcg-20230404' of https://gitlab.com/rth7680/qemu into staging
> git bisect bad 7d0334e49111787ae19fbc8d29ff6e7347f0605e
> # bad: [3371802fba3f7be4465f8a5e5777d43d556676ef] accel/tcg: Fix jump cache set in cpu_exec_loop
> git bisect bad 3371802fba3f7be4465f8a5e5777d43d556676ef
> # good: [6cda41daa2162b8e1048124655ba02a8c2b762b4] Revert "linux-user/arm: Take more care allocating commpage"
> git bisect good 6cda41daa2162b8e1048124655ba02a8c2b762b4
> # skip: [c83574392e0af108a643347712564f6749906413] accel/tcg: Fix overwrite problems of tcg_cflags
> git bisect skip c83574392e0af108a643347712564f6749906413
> # only skipped commits left to test
> # possible first bad commit: [3371802fba3f7be4465f8a5e5777d43d556676ef] accel/tcg: Fix jump cache set in cpu_exec_loop

It should be possible to rule out the jump cache in HEAD by patching out
the two:

  if (likely(tb &&
             tb->pc == pc &&
             tb->cs_base == cs_base &&
             tb->flags == flags &&
             tb_cflags(tb) == cflags)) {
      return tb;
  }

checks in tb_lookup() to:

  if (0) { return tb; }

That would rule out a corrupted jump cache being responsible (at the cost
of running a bit slower).

Apropos of the earlier discussion, tb_jmp_cache_inval_tb() is responsible
for removing entries for TBs that are invalidated, for whatever reason,
from the cache.

> # possible first bad commit: [c83574392e0af108a643347712564f6749906413] accel/tcg: Fix overwrite problems of tcg_cflags
>
> I had an inconclusive test in the end where c83574392e left me unable to
> start the VM.

It's interesting that the PCREL stuff is involved. This is a relatively
recent optimisation that allows us to re-use translations for physical
pages that are mapped into virtual memory multiple times. To do that the
target translator has to support CF_PCREL and translate only knowing the
PC as an offset into the page.

> Whether one of these contains a bug or whether only new behaviour of
> QEMU revealed a bug somewhere else is of course still to be figured out.
>
> [9] https://gitlab.com/qemu-project/qemu/-/issues/1866

This is a follow-up to #1826, #1834 and #1846, which are all broadly to do
with how QEMU deals with updates to its memory maps (e.g. when PCI devices
are remapped/reconfigured). We are unsure if the MIPS architecture should
be executing some sort of barrier around reconfiguration that can ensure
we exit the block, or whether we have to detect mid-block IO accesses
icount-style.

>
> - Erik
>
>>
>> - Erik
>>
>> >
>> > Good weekend,
>> >
>> > - Erik
>> >
>> > >
>> > > - Erik
>> > >
>> > > [1] https://gitlab.com/ablu/virtiofsd/-/commit/18fd0c1849e15bc55fbdd6e1f169801b2b03da1f
>> > > [2] https://gitlab.com/qemu-project/qemu/-/commit/50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4
>> > > [3] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/397a6310dea35973025e3d61f46090bf0c092762/share-dir/write-and-verify-mmap.fio
>> > > [4] https://github.com/Ablu/linux/commit/3880b9f8affb01aeabb0a04fe76ad7701dc0bb95
>> > > [5] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-drop-bad-1.txt
>> > > [6] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-bad-1.txt
>> > > [7] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L2538-2549
>> > > [8] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L1052-1063

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro