My zpool ended up in the DEGRADED state.
# zpool status -v
  pool: hoge
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 09:48:45 with 0 errors on Wed Jun 8 12:50:31 2022
config:
        NAME        STATE     READ WRITE CKSUM
        hoge        DEGRADED     0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
        logs
          nvd0p1    REMOVED      0     0     0
        cache
          nvd0p2    REMOVED      0     0     0
errors: Permanent errors have been detected in the following files:
        hoge/media/tmp:<0x0>
        hoge/var/mail:<0x0>
        hoge/var:<0x0>
        hoge/var/log:<0x0>
        hoge/var/db:<0x0>
        hoge/var/db/pkg:<0x0>
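With a longer device list it is easy to miss which entries are actually unhealthy. The following is a minimal sketch, not part of what I ran at the time, that reads `zpool status` output on stdin and prints every config entry whose STATE column is not ONLINE. The function name `unhealthy_vdevs` is made up for this example, and it assumes the usual five-column layout (NAME STATE READ WRITE CKSUM) shown above.

```shell
#!/bin/sh
# Hypothetical helper: list config entries whose STATE is not ONLINE.
# Reads `zpool status` output on stdin and prints "<name> <state>" per entry.
unhealthy_vdevs() {
    awk '
        $1 == "config:" { in_config = 1; next }   # device table starts here
        $1 == "errors:" { in_config = 0 }          # device table ends here
        # Device rows have at least 5 columns; skip the header row and
        # section labels such as "logs" and "cache" (single column).
        in_config && NF >= 5 && $1 != "NAME" && $2 != "ONLINE" { print $1, $2 }
    '
}
```

Usage would be `zpool status hoge | unhealthy_vdevs`. For the output above it should print the pool itself (DEGRADED) plus the two REMOVED partitions, nvd0p1 and nvd0p2.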
Looking at /var/log/messages, I found the following:
Jun 22 15:19:29 potato kernel: nvme0: RECOVERY_START TTTTTTTTTTTTTTTTT vs UUUUUUUUUUUUUUUUU
Jun 22 15:19:29 potato kernel: nvme0: Controller in fatal status, resetting
Jun 22 15:19:29 potato kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Jun 22 15:19:29 potato kernel: nvme0: RECOVERY_WAITING
Jun 22 15:19:29 potato kernel: nvme0: resetting controller
Jun 22 15:19:29 potato kernel: nvme0: waiting
Jun 22 15:19:29 potato kernel: nvme0: failing outstanding i/o
Jun 22 15:19:29 potato kernel: nvme0: READ sqid:1 cid:121 nsid:1 lba:VVVVVVV len:4
Jun 22 15:19:29 potato kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:121 cdw0:0
Jun 22 15:19:29 potato kernel: nvme0: failing outstanding i/o
Jun 22 15:19:29 potato kernel: nvme0: WRITE sqid:2 cid:117 nsid:1 lba:WWWWWWW len:1
Jun 22 15:19:29 potato kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:117 cdw0:0
Jun 22 15:19:29 potato kernel: nvd0: detached
Jun 22 15:19:29 potato kernel: nvme0: READ sqid:2 cid:0 nsid:1 lba:8389176 len:16
Jun 22 15:19:29 potato kernel: nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:0 cdw0:0
Jun 22 15:19:30 potato kernel: nvme0: waiting
Jun 22 15:19:30 potato syslogd: last message repeated 1 times
Jun 22 15:19:30 potato ZFS[37628]: pool I/O failure, zpool=potato error=6
Jun 22 15:19:30 potato ZFS[37632]: vdev state changed, pool_guid=XXXXXXXXXXXXXXXXXXX vdev_guid=YYYYYYYYYYYYYYYYYYY
Jun 22 15:19:30 potato ZFS[37636]: vdev is removed, pool_guid=XXXXXXXXXXXXXXXXXXX vdev_guid=YYYYYYYYYYYYYYYYYYY
Jun 22 15:19:30 potato ZFS[37640]: vdev state changed, pool_guid=XXXXXXXXXXXXXXXXXXX vdev_guid=ZZZZZZZZZZZZZZZZZZZ
Jun 22 15:19:30 potato ZFS[37644]: vdev is removed, pool_guid=XXXXXXXXXXXXXXXXXXX vdev_guid=ZZZZZZZZZZZZZZZZZZZ
Jun 22 15:19:30 potato ZFS[37648]: pool I/O failure, zpool=hoge error=6
Jun 22 15:22:00 potato ZFS[37661]: pool I/O failure, zpool=hoge error=6
Jun 22 16:02:44 potato ZFS[37724]: pool I/O failure, zpool=hoge error=6
Jun 22 16:04:34 potato ZFS[37785]: pool I/O failure, zpool=hoge error=6
Jun 22 16:04:36 potato ZFS[37792]: pool I/O failure, zpool=hoge error=6
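Most of that excerpt is noise from aborted I/O. As a rough sketch, a filter like the one below could pull out only the lines that tell the story: the controller reset, the nvd device detaching, and ZFS removing the vdevs. The function name `nvme_zfs_events` is invented for this example, and the patterns are copied from the messages in this particular log, so other driver or ZFS versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: keep only the key NVMe/ZFS events from syslog lines.
# Reads log lines on stdin; patterns are taken from the excerpt above.
nvme_zfs_events() {
    grep -E \
        -e 'nvme[0-9]+: (Controller in fatal status|Resetting controller)' \
        -e 'nvd[0-9]+: detached' \
        -e 'ZFS\[[0-9]+\]: vdev is removed'
}
```

It would be run as `nvme_zfs_events < /var/log/messages`. Against the excerpt above it should leave five lines: two about the controller reset, one `nvd0: detached`, and two `vdev is removed` messages (one per partition on the SSD).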
In short, for some reason the NVMe SSD seems to have dropped out of the OS's view. After a reboot, the status looked like this:
# zpool status -v
  pool: hoge
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 09:48:45 with 0 errors on Wed Jun 8 12:50:31 2022
config:
        NAME        STATE     READ WRITE CKSUM
        hoge        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
        logs
          nvd0p1    ONLINE       0     0     0
        cache
          nvd0p2    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        hoge/media/tmp:<0x0>
        hoge/var/mail:<0x0>
        hoge/var:<0x0>
        hoge/var/log:<0x0>
        hoge/var/db:<0x0>
        hoge/var/db/pkg:<0x0>
        hoge/home/potato:<0x0>
So the pool came back ONLINE, but the errors list is still there and it is annoying. Just to be safe, I ran zpool scrub.
# zpool scrub hoge
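The scrub on this pool takes close to ten hours (the scan line reports 09:43:39 for this run), so it is worth having something that waits for it to finish. Here is a minimal polling sketch; the function names `scrub_running` and `wait_for_scrub` are made up for this example, and it assumes `zpool status` prints "scrub in progress" on its scan line while a scrub is running.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) while pool $1 reports a running scrub.
scrub_running() {
    zpool status "$1" | grep -q 'scrub in progress'
}

# Hypothetical helper: block until the scrub on pool $1 has finished.
# $2 is the polling interval in seconds (defaults to 60).
wait_for_scrub() {
    while scrub_running "$1"; do
        sleep "${2:-60}"
    done
}
```

For example, `wait_for_scrub hoge 300 && zpool status -v hoge` would check every five minutes and print the status once the scrub is done. Recent OpenZFS releases also have `zpool wait -t scrub hoge`, which should make the loop unnecessary where it is available.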
To get rid of the errors, I ran zpool clear hoge.
# zpool clear hoge
With that, the errors finally went away.
# zpool status -v
  pool: hoge
 state: ONLINE
  scan: scrub repaired 0B in 09:43:39 with 0 errors on Fri Jun 24 19:27:30 2022
config:
        NAME        STATE     READ WRITE CKSUM
        hoge        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
        logs
          nvd0p1    ONLINE       0     0     0
        cache
          nvd0p2    ONLINE       0     0     0
errors: No known data errors
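Rather than eyeballing the last line every time, the final check could be scripted. This is only a sketch under the assumption that a clean pool always ends its status with the exact line "errors: No known data errors", as it does above; `pool_is_clean` is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if pool $1 reports no data errors.
# Relies on the "errors: No known data errors" line seen in the output above.
pool_is_clean() {
    zpool status -v "$1" | grep -q '^errors: No known data errors'
}
```

Something like `pool_is_clean hoge || echo "hoge still reports errors"` could then go into a periodic job, so that a lingering errors list does not go unnoticed.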