Server Fault Asked by FlorentR on November 7, 2021
I upgraded my server (SuperMicro X11-SSM-F, LSI SAS 9211-8i) from Ubuntu 18.04 to 20.04. The server had two zpools: one composed of a single WD Red 10 TB (downloadpool), and the other composed of 8 WD Red 10 TB and 2 Seagate IronWolf 8 TB drives arranged as five 2-disk mirrors (masterpool). The pools were created using /dev/disk/by-id references so as to be stable across reboots. The pools are scrubbed regularly; the last scrub was a couple of weeks old and didn't show any errors.
When I rebooted after upgrading to Ubuntu 20.04, the second pool (masterpool) was gone. Running zpool import brought it back, but using sdX references for most of the disks (the WD Reds, but not the Seagates). The pool with the single lone WD Red was fine and still referenced its disk by-id.
The output of zpool status for masterpool looked something like this (from memory):
NAME STATE READ WRITE CKSUM
masterpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdk ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdf ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdh ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0
This is not ideal, because sdX identifiers are not stable across reboots, so after a bit of searching online I exported the pool again and ran zpool import -d /dev/disk/by-id masterpool.
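As commands, the re-export/re-import sequence looks like this (a sketch of what I ran; do this only while nothing is using the pool's datasets):

```shell
# Export the pool so its member disks can be re-scanned
sudo zpool export masterpool

# Re-import, telling ZFS to look up the disks in /dev/disk/by-id
sudo zpool import -d /dev/disk/by-id masterpool

# Confirm the vdevs are now listed by their stable identifiers
zpool status masterpool
```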
But now zpool status is reporting checksum errors:
NAME STATE READ WRITE CKSUM
masterpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-0x5000cca26af27d8b ONLINE 0 0 2
wwn-0x5000cca273ee8907 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
wwn-0x5000cca26aeb9280 ONLINE 0 0 8
wwn-0x5000cca273eeaed7 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
wwn-0x5000cca273c21a05 ONLINE 0 0 0
wwn-0x5000cca267eaa17a ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
wwn-0x5000cca26af7e655 ONLINE 0 0 0
wwn-0x5000cca273c099dd ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0
So I'm running a scrub, and ZFS has found a few more checksum errors:
pool: masterpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Fri May 22 21:47:34 2020
27.1T scanned at 600M/s, 27.0T issued at 597M/s, 31.1T total
112K repaired, 86.73% done, 0 days 02:00:45 to go
config:
NAME STATE READ WRITE CKSUM
masterpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
wwn-0x5000cca26af27d8b DEGRADED 0 0 15 too many errors (repairing)
wwn-0x5000cca273ee8907 ONLINE 0 0 0
mirror-1 DEGRADED 0 0 0
wwn-0x5000cca26aeb9280 DEGRADED 0 0 18 too many errors (repairing)
wwn-0x5000cca273eeaed7 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
wwn-0x5000cca273c21a05 ONLINE 0 0 0
wwn-0x5000cca267eaa17a ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
wwn-0x5000cca26af7e655 ONLINE 0 0 0
wwn-0x5000cca273c099dd ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0
Weirdly, smartctl does not show anything amiss in the SMART data (the output is similar for both affected disks; showing just one):
$ sudo smartctl /dev/disk/by-id/wwn-0x5000cca26aeb9280 -a
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0004 129 129 054 Old_age Offline - 112
3 Spin_Up_Time 0x0007 153 153 024 Pre-fail Always - 431 (Average 430)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 31
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000a 100 100 067 Old_age Always - 0
8 Seek_Time_Performance 0x0004 128 128 020 Old_age Offline - 18
9 Power_On_Hours 0x0012 098 098 000 Old_age Always - 15474
10 Spin_Retry_Count 0x0012 100 100 060 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 31
22 Helium_Level 0x0023 100 100 025 Pre-fail Always - 100
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 664
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 664
194 Temperature_Celsius 0x0002 158 158 000 Old_age Always - 41 (Min/Max 16/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 19 -
# 2 Short offline Completed without error 00% 0 -
...
Also, I notice that many of the aliases in /dev/disk/by-id are gone (all the ata-* links for the WD Reds, except the lone one in downloadpool):
# ls /dev/disk/by-id/ -l
total 0
lrwxrwxrwx 1 root root 9 May 22 23:19 ata-Samsung_SSD_850_EVO_500GB_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 ata-Samsung_SSD_850_EVO_500GB_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 22 23:19 scsi-0ATA_Samsung_SSD_850_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-0ATA_Samsung_SSD_850_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 22 23:19 scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 23 01:28 scsi-35000c500a2e631c6 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-35000c500a2e631c6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-35000c500a2e631c6-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 scsi-35000c500a2edebe0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-35000c500a2edebe0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-35000c500a2edebe0-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 23 00:38 scsi-35000cca267eaa17a -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-35000cca267eaa17a-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-35000cca267eaa17a-part9 -> ../../sdg9
lrwxrwxrwx 1 root root 9 May 23 01:20 scsi-35000cca26aeb9280 -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26aeb9280-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26aeb9280-part9 -> ../../sdl9
lrwxrwxrwx 1 root root 9 May 23 01:20 scsi-35000cca26af27d8b -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26af27d8b-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26af27d8b-part9 -> ../../sdk9
lrwxrwxrwx 1 root root 9 May 23 02:35 scsi-35000cca26af7e655 -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-35000cca26af7e655-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-35000cca26af7e655-part9 -> ../../sdi9
lrwxrwxrwx 1 root root 9 May 23 00:35 scsi-35000cca273c099dd -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273c099dd-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273c099dd-part9 -> ../../sdf9
lrwxrwxrwx 1 root root 9 May 22 23:21 scsi-35000cca273c0c7e3 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-35000cca273c0c7e3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-35000cca273c0c7e3-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 23 03:01 scsi-35000cca273c21a05 -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-35000cca273c21a05-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-35000cca273c21a05-part9 -> ../../sdj9
lrwxrwxrwx 1 root root 9 May 23 00:35 scsi-35000cca273ee8907 -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273ee8907-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273ee8907-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 May 23 00:04 scsi-35000cca273eeaed7 -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-35000cca273eeaed7-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-35000cca273eeaed7-part9 -> ../../sdh9
lrwxrwxrwx 1 root root 9 May 22 23:19 scsi-35002538d40f8ba4c -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-35002538d40f8ba4c-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 22 23:19 scsi-SATA_Samsung_SSD_850_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-SATA_Samsung_SSD_850_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD-part9 -> ../../sdl9
lrwxrwxrwx 1 root root 9 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD-part9 -> ../../sdk9
lrwxrwxrwx 1 root root 9 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND-part9 -> ../../sdi9
lrwxrwxrwx 1 root root 9 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD-part9 -> ../../sdf9
lrwxrwxrwx 1 root root 9 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D-part9 -> ../../sdj9
lrwxrwxrwx 1 root root 9 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD-part9 -> ../../sdh9
lrwxrwxrwx 1 root root 9 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z-part9 -> ../../sdg9
lrwxrwxrwx 1 root root 9 May 23 01:28 wwn-0x5000c500a2e631c6 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 wwn-0x5000c500a2e631c6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 wwn-0x5000c500a2e631c6-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 May 23 01:16 wwn-0x5000c500a2edebe0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 wwn-0x5000c500a2edebe0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 wwn-0x5000c500a2edebe0-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 May 23 00:38 wwn-0x5000cca267eaa17a -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 wwn-0x5000cca267eaa17a-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 wwn-0x5000cca267eaa17a-part9 -> ../../sdg9
lrwxrwxrwx 1 root root 9 May 23 01:20 wwn-0x5000cca26aeb9280 -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26aeb9280-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26aeb9280-part9 -> ../../sdl9
lrwxrwxrwx 1 root root 9 May 23 01:20 wwn-0x5000cca26af27d8b -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26af27d8b-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26af27d8b-part9 -> ../../sdk9
lrwxrwxrwx 1 root root 9 May 23 02:35 wwn-0x5000cca26af7e655 -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 wwn-0x5000cca26af7e655-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 wwn-0x5000cca26af7e655-part9 -> ../../sdi9
lrwxrwxrwx 1 root root 9 May 23 00:35 wwn-0x5000cca273c099dd -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273c099dd-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273c099dd-part9 -> ../../sdf9
lrwxrwxrwx 1 root root 9 May 22 23:21 wwn-0x5000cca273c0c7e3 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 wwn-0x5000cca273c0c7e3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 wwn-0x5000cca273c0c7e3-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 May 23 03:01 wwn-0x5000cca273c21a05 -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 wwn-0x5000cca273c21a05-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 wwn-0x5000cca273c21a05-part9 -> ../../sdj9
lrwxrwxrwx 1 root root 9 May 23 00:35 wwn-0x5000cca273ee8907 -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273ee8907-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273ee8907-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 May 23 00:04 wwn-0x5000cca273eeaed7 -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 wwn-0x5000cca273eeaed7-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 wwn-0x5000cca273eeaed7-part9 -> ../../sdh9
lrwxrwxrwx 1 root root 9 May 22 23:19 wwn-0x5002538d40f8ba4c -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 wwn-0x5002538d40f8ba4c-part1 -> ../../sda1
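For reference, the whole-disk symlinks above can be inverted to map each kernel name back to its most stable alias. A minimal sketch, demonstrated on a throwaway directory so it runs anywhere; point BYID at /dev/disk/by-id on the real system:

```shell
# Build a tiny stand-in for /dev/disk/by-id (two aliases for one disk)
BYID=$(mktemp -d)
ln -s ../../sdl "$BYID/wwn-0x5000cca26aeb9280"
ln -s ../../sdl "$BYID/scsi-35000cca26aeb9280"

# Print kernel-name -> alias pairs, preferring the wwn-* links and
# skipping partition symlinks
for link in "$BYID"/wwn-*; do
  [ -L "$link" ] || continue
  case $link in *-part*) continue ;; esac
  printf '%s -> %s\n' "$(basename "$(readlink "$link")")" "$(basename "$link")"
done
# prints: sdl -> wwn-0x5000cca26aeb9280
```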
This raises several questions:
1) Why did my pool disappear? Is it because the symlinks in /dev/disk/by-id/ disappeared, and ZFS couldn't locate most of the disks?
2) Are the checksum errors worrisome? The disks seem perfectly healthy. I only looked at a couple of directories and files while the pool was imported with the sdX references; could that have caused checksums to be written incorrectly if ZFS imported the disks in the wrong order?
3) How do I get the missing /dev/disk/by-id/ata-* symlinks back? Has something changed in Ubuntu 20.04 that would have caused them to disappear?
4) I thought referring to my disks through /dev/disk/by-id/ was a good idea, because those names would be stable. Is that not the best way to go about it?
5) I don't like the wwn-* names because they are not descriptive. I'd much rather have names that reflect the serial number of the disk, so I can easily identify a disk if I need to replace it. I've gone ahead and set up aliases in /dev/disk/by-vdev/ (aliased to the wwn-* names), following the advice in http://kbdone.com/zfs-basics/#Consistent_device_IDs_via_vdev_idconf_file:
$ cat /etc/zfs/vdev_id.conf
alias ST8000VN0022-2EL_ZA17H5D3 /dev/disk/by-id/wwn-0x5000c500a2edebe0
alias ST8000VN0022-2EL_ZA17FZXF /dev/disk/by-id/wwn-0x5000c500a2e631c6
alias WD100EFAX-68_2YG1R7PD /dev/disk/by-id/wwn-0x5000cca273c0c7e3
alias WD100EFAX-68_2YK9BHKD /dev/disk/by-id/wwn-0x5000cca273ee8907
...
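After editing /etc/zfs/vdev_id.conf, the aliases still have to be created by udev before the pool can use them. A sketch of applying them, assuming the vdev_id udev rules that ship with zfsutils-linux are present:

```shell
# Ask udev to re-evaluate its rules so /dev/disk/by-vdev gets populated
sudo udevadm trigger
ls /dev/disk/by-vdev/

# Then export and re-import the pool against the new alias directory
sudo zpool export masterpool
sudo zpool import -d /dev/disk/by-vdev masterpool
```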
Thoughts?
Thanks!
Edit: output of zpool status after the scrub completed:
root@cloud:~# zpool status
pool: downloadpool
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub repaired 0B in 0 days 11:33:18 with 0 errors on Sun May 10 11:57:19 2020
config:
NAME STATE READ WRITE CKSUM
downloadpool ONLINE 0 0 0
ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD ONLINE 0 0 0
errors: No known data errors
pool: masterpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 112K in 0 days 15:06:09 with 0 errors on Sat May 23 12:53:43 2020
config:
NAME STATE READ WRITE CKSUM
masterpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
wwn-0x5000cca26af27d8b DEGRADED 0 0 15 too many errors
wwn-0x5000cca273ee8907 ONLINE 0 0 0
mirror-1 DEGRADED 0 0 0
wwn-0x5000cca26aeb9280 DEGRADED 0 0 18 too many errors
wwn-0x5000cca273eeaed7 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
wwn-0x5000cca273c21a05 ONLINE 0 0 0
wwn-0x5000cca267eaa17a ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
wwn-0x5000cca26af7e655 ONLINE 0 0 0
wwn-0x5000cca273c099dd ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17FZXF ONLINE 0 0 0
ata-ST8000VN0022-2EL112_ZA17H5D3 ONLINE 0 0 0
errors: No known data errors
I had the exact same problem, and your post pointed me in the right direction. So here are my thoughts.
I have 6 drives: 2 in ZFS pool 'A' attached to the motherboard's SATA controller, and 4 in ZFS pool 'B' attached to my LSI SAS 9211 controller. The pools were set up to look for devices in /dev/disk/by-id.
After upgrading from Ubuntu 18.04 to 20.04, the device IDs of all disks attached to the SAS controller changed, from ata-* to scsi-SATA*. After rebooting the server, pool B was missing, because ZFS could no longer find the device IDs during import. The device IDs of the drives connected to the motherboard's SATA controller stayed the same, so the pool using those drives imported normally and wasn't missing after the release upgrade.
This is how I fixed the missing 'B' pool:
First I listed all pools that were available for import:
sudo zpool import
This listed my missing pool 'B' with all the correct drives, but named as plain /dev devices. So I imported the pool using the device IDs listed in /dev/disk/by-id. I got a warning that the pool appeared to be potentially active, so I had to force the import with -f, like this:
sudo zpool import -f -d /dev/disk/by-id B
And everything was fine again: pool B was available. Unlike you, I never exported the pool, and I never imported it without first telling ZFS to use the device IDs. The device IDs in use are different now: wwn-*.
I ran a scrub on the pool; it found no errors.
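Put together, the recovery was just this sequence (pool name 'B' as in my setup):

```shell
# 1. List importable pools and see how their member disks are named
sudo zpool import

# 2. Force-import the missing pool, resolving disks via /dev/disk/by-id
sudo zpool import -f -d /dev/disk/by-id B

# 3. Scrub afterwards to verify the data
sudo zpool scrub B
zpool status B
```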
To answer your questions:
I think the release upgrade from Ubuntu 18.04 to 20.04 caused the links in /dev/disk/by-id to change.
I didn't import the pool with the /dev references, and I imported using the option -f. That would be the difference with what you did and what I did. But I can't imagine this would be a problem, unless the wrong drives where used.
I didn't get the old disk by id links back. But by importing the pool using the directive to use the disk id's, it's using new disk id's that's good enough for me. I don't need the old ones back.
I still think it's a good idea to refer to disks through /dev/disk/by-id/. These names are stable across reboots and when disks physically move around in the server (I tested this). I'm a bit disappointed that a release upgrade would break the disk ID naming, but I'm glad it could be solved in my case by importing the pool again.
Same reason for me. Thanks for the tip about using aliases! Perhaps I will use them.
Answered by Crestop on November 7, 2021