How to recover a sheepdog cluster
After a complete power outage, listing the VDIs fails as follows:
# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag Block Size Shift
Failed to read object 8009a36700000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 800c11d500000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80188a3200000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 8021d3c800000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 8041786300000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 808bb12700000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80b643c600000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80be351000000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80c66f7400000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80d19f4b00000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80dcbbc300000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80ddd9d500000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80e5b6b900000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80ec092100000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80efa76700000000 Waiting for other nodes to join cluster
Failed to read inode header
At the same time, all nodes were up and "dog node list" worked normally, returning the full list of cluster members.
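To gauge how many objects are affected, the error lines can simply be counted. The sketch below is only an illustration: the helper name count_waiting_objects is hypothetical, and it assumes the error text matches the output shown above.

```shell
# Hypothetical helper: counts lines in "dog vdi list" output that report
# the "Waiting for other nodes to join cluster" condition.
# Reads the command output on stdin and prints the number of matches.
count_waiting_objects() {
    grep -c 'Waiting for other nodes to join cluster'
}

# Typical use (errors may go to stderr, so both streams are merged):
#   dog vdi list 2>&1 | count_waiting_objects
```

A non-zero count while "dog node list" still shows every member matches the situation described here.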
To fix this, make sure that all cluster members are running, then issue the cluster recovery command.
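The post does not show the command itself. As a sketch of this step, the wrapper below assumes the upstream sheepdog subcommand "dog cluster recover force". Both that choice and the function name recover_cluster are assumptions, not taken from the original text.

```shell
# Assumption: forced recovery is done with upstream sheepdog's
# "dog cluster recover force"; the original command is not shown in the post.
recover_cluster() {
    # Stop if the membership list cannot be read at all.
    dog node list || return 1
    # Force recovery; dog may ask for confirmation before proceeding.
    dog cluster recover force
}
```

Forced recovery should only be run once every node really is back, since it makes the cluster proceed with the current membership.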
Then run the cluster check.
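The check command is also not shown in the post. The sketch below assumes it is the upstream subcommand "dog cluster check", followed by a VDI listing to confirm the result. The function name check_cluster is hypothetical.

```shell
# Assumption: the consistency check is upstream sheepdog's "dog cluster check".
check_cluster() {
    # Check (and, where possible, repair) object consistency across the cluster.
    dog cluster check || return 1
    # Confirm that the VDI list can be read again.
    dog vdi list
}
```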
After these steps everything was back to normal.
# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag Block Size Shift
volume-a87bfb14-44c3-4777-82c8-de1d435f3496 0 2.0 GB 68 MB 0.0 MB 2019-09-02 23:21 5ce99 3 22
b4b7409b-8934-437f-9a12-b2c1f87d5eea 0 700 MB 704 MB 0.0 MB 2019-09-02 17:02 c11d5 3 22
52d793b4-ee5f-45fb-84f0-f2400979cfde 0 607 MB 608 MB 0.0 MB 2019-09-04 16:23 188a32 3 22
b0731b77-ce3f-458f-b0b1-b88a64e7b4c6 0 283 MB 284 MB 0.0 MB 2019-09-02 14:09 d19f4b 3 22
fcc5b88c-fc8f-4683-8d32-a9fedac80b02 0 1.7 GB 1.7 GB 0.0 MB 2019-09-09 15:08 ec0921 3 22
....
The log now contains messages confirming that all objects have been recovered.
# tail /var/lib/evstorage/evs.log
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 91%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 92%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 93%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 94%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 95%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 96%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 97%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 98%
Sep 16 14:44:34 INFO [main] recover_object_main(1004) object recovery progress 99%
Sep 16 14:44:36 NOTICE [main] cluster_recovery_completion(744) all nodes are recovered, epoch 210
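Rather than reading the tail by eye, the completion message can be searched for. This is a minimal sketch with a hypothetical helper name, recovery_done. The default path is the log file used above, and the pattern assumes the message text stays as shown.

```shell
# Hypothetical helper: succeeds (exit 0) if the log contains the
# "all nodes are recovered" notice, fails otherwise.
# $1: log file path (defaults to the path used in this post).
recovery_done() {
    grep -q 'cluster_recovery_completion.*all nodes are recovered' \
        "${1:-/var/lib/evstorage/evs.log}"
}

# Typical use:
#   recovery_done && echo "recovery finished"
```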
Published: 10.09.2019