How to recover a sheepdog cluster

After a complete power outage, “dog vdi list” fails with the following output:

# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
Failed to read object 8009a36700000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 800c11d500000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80188a3200000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 8021d3c800000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 8041786300000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 808bb12700000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80b643c600000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80be351000000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80c66f7400000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80d19f4b00000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80dcbbc300000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80ddd9d500000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80e5b6b900000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80ec092100000000 Waiting for other nodes to join cluster
Failed to read inode header
Failed to read object 80efa76700000000 Waiting for other nodes to join cluster
Failed to read inode header

At the same time, all nodes were running, and “dog node list” worked normally, returning the full list of cluster members.
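
When the output is long, it can help to summarize it before deciding what to do. The following is a minimal sketch, assuming the error lines look exactly like the ones shown above; the function names count_failed_objects and failed_object_ids are hypothetical helpers, not part of dog.

```shell
#!/bin/sh
# Hypothetical helpers that summarize "dog vdi list" output read from stdin.
# They only parse text and never call dog themselves.

# Print the number of "Failed to read object" lines (prints 0 if there are none).
count_failed_objects() {
    # grep -c exits non-zero when nothing matches, so mask that status.
    grep -c 'Failed to read object' || true
}

# Print the ID of every unreadable object, one per line.
# Assumes the line format "Failed to read object <id> <message>".
failed_object_ids() {
    awk '/^Failed to read object/ { print $5 }'
}
```

A possible use is "dog vdi list 2>&1 | count_failed_objects"; the 2>&1 is there because it is an assumption, not a confirmed fact, which stream the error lines are written to.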

To fix this, make sure that all cluster members are running, then issue the cluster recovery command:

dog cluster recover force

Then run a cluster check:

dog cluster check
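
The two steps can be wrapped so that the check only runs if the recovery command succeeded. This is a sketch under assumptions: the DOG variable and the recover_cluster function are hypothetical names introduced here, and the forced recovery may ask for confirmation, which the sketch leaves interactive.

```shell
#!/bin/sh
# DOG is a hypothetical override so the sequence can be dry-run with a stub
# (for example DOG=echo); by default the real dog binary is used.
DOG="${DOG:-dog}"

# Run the two commands from this note in order.
# Returns 1 if the recovery step fails, 2 if the check step fails.
recover_cluster() {
    # Step 1: force cluster recovery; stop here on failure.
    $DOG cluster recover force || return 1
    # Step 2: check the cluster.
    $DOG cluster check || return 2
}
```

Running it with DOG=echo first prints the commands without touching the cluster, which is a cheap way to review the sequence before running it for real.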

After these steps everything was fine again:

# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag   Block Size Shift
  volume-a87bfb14-44c3-4777-82c8-de1d435f3496     0  2.0 GB   68 MB  0.0 MB 2019-09-02 23:21    5ce99      3                22
  b4b7409b-8934-437f-9a12-b2c1f87d5eea     0  700 MB  704 MB  0.0 MB 2019-09-02 17:02    c11d5      3                22
  52d793b4-ee5f-45fb-84f0-f2400979cfde     0  607 MB  608 MB  0.0 MB 2019-09-04 16:23   188a32      3                22
  b0731b77-ce3f-458f-b0b1-b88a64e7b4c6     0  283 MB  284 MB  0.0 MB 2019-09-02 14:09   d19f4b      3                22
  fcc5b88c-fc8f-4683-8d32-a9fedac80b02     0  1.7 GB  1.7 GB  0.0 MB 2019-09-09 15:08   ec0921      3                22
  ....

The log now contains messages saying that all objects have been recovered:

# tail /var/lib/evstorage/evs.log
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  91%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  92%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  93%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  94%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  95%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  96%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  97%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  98%
Sep 16 14:44:34   INFO [main] recover_object_main(1004) object recovery progress  99%
Sep 16 14:44:36 NOTICE [main] cluster_recovery_completion(744) all nodes are recovered, epoch 210
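
Instead of re-running tail by hand, the log can be polled for the completion message. The following is a minimal sketch, assuming the log lines have the format shown above; the function names and the POLL_SECONDS variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that inspect the log file passed as the first argument.

# Succeed if the log contains the "all nodes are recovered" notice.
recovery_completed() {
    grep -q 'all nodes are recovered' "$1"
}

# Print the most recent "object recovery progress" value, e.g. "99%".
# Prints nothing if the log has no progress lines.
last_progress() {
    awk '/object recovery progress/ { p = $NF } END { if (p != "") print p }' "$1"
}

# Poll the log until recovery completes or the attempts run out.
# $1: log path, $2: number of attempts (default 60).
# POLL_SECONDS sets the delay between attempts (default 5).
wait_for_recovery() {
    log="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        recovery_completed "$log" && return 0
        sleep "${POLL_SECONDS:-5}"
        tries=$((tries - 1))
    done
    return 1
}
```

For example, "wait_for_recovery /var/lib/evstorage/evs.log && echo done". Note that the check matches any completion notice in the file, including one left over from an earlier recovery, so it is worth comparing the epoch number in the matched line.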

Published: 10.09.2019