Following the instructions in kuutamod/run.md.
Running a localnet setup consists of:
- consul as the RAFT consensus layer
- 3 separate near localnet nodes to start the network (starting consul and these nodes is sketched after this list)
- validator, with metrics available at `curl localhost:2233/metrics`:
  ```bash
  screen -S validator ./target/debug/kuutamod --neard-home .data/near/localnet/kuutamod0/ \
    --voter-node-key .data/near/localnet/kuutamod0/voter_node_key.json \
    --validator-node-key .data/near/localnet/node3/node_key.json \
    --validator-key .data/near/localnet/node3/validator_key.json \
    --near-boot-nodes $(jq -r .public_key < .data/near/localnet/node0/node_key.json)@127.0.0.1:33301
  ```
- failover, with metrics available at `curl localhost:2234/metrics`:
  ```bash
  screen -S failover ./target/debug/kuutamod \
    --exporter-address 127.0.0.1:2234 \
    --validator-network-addr 0.0.0.0:24569 \
    --voter-network-addr 0.0.0.0:24570 \
    --neard-home .data/near/localnet/kuutamod1/ \
    --voter-node-key .data/near/localnet/kuutamod1/voter_node_key.json \
    --validator-node-key .data/near/localnet/node3/node_key.json \
    --validator-key .data/near/localnet/node3/validator_key.json \
    --near-boot-nodes $(jq -r .public_key < .data/near/localnet/node0/node_key.json)@127.0.0.1:33301
  ```
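For completeness, here is a sketch of how the supporting services listed above might be started. The `consul agent -dev` invocation and the `neard --home <dir> run` form are standard, but the screen session names and the node home directories are assumptions mirroring the paths used in the commands above; kuutamod/run.md remains the authoritative reference for generating and starting the localnet.

```bash
# consul in dev mode provides the consensus/leader-election backend that
# kuutamod registers with (assumed single-node dev setup).
screen -dmS consul consul agent -dev

# The three plain localnet nodes (node0..node2) that bootstrap the network;
# their home directories are assumed to have been generated per kuutamod/run.md.
for i in 0 1 2; do
  screen -dmS "node$i" neard --home ".data/near/localnet/node$i" run
done
```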
Initial check of the validator and failover metrics:
Validator: `kuutamod_state{type="Validating"} 1`
Failover: `kuutamod_state{type="Voting"} 1`
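A quick way to make this check from the shell, using the exporter ports configured above, is to scrape only the `kuutamod_state` gauges; the active role is the line whose value is 1:

```bash
# Validator instance (default exporter port 2233) and failover instance (2234).
curl -s localhost:2233/metrics | grep '^kuutamod_state'
curl -s localhost:2234/metrics | grep '^kuutamod_state'
```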
Press Ctrl+C to send a graceful shutdown signal to the main validator.
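If the validator session is running detached, the same effect can be had by injecting a literal Ctrl+C into its screen window via screen's `stuff` command; the session name is the one chosen above:

```bash
# Send ^C (SIGINT) to the foreground process of the "validator" screen
# session, triggering kuutamod's graceful shutdown path.
screen -S validator -X stuff $'\003'
```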
Check the validator and failover metrics again:
Validator: (no metrics; the kuutamod process has exited)
Failover: `kuutamod_state{type="Validating"} 1`
The failover has taken over the validating responsibilities of the initial validator.
Once the problems with the initial validator are fixed, it can be restarted; it will come back in a voting role and only take over validation again if the failover dies.
With everything running in screen, `screen -X -S <session_name> kill` will forcefully kill the process.
Running this against the validator session kills it, and the failover properly takes over validation (although with a considerable ~1-2 min delay, especially compared to the quick handover on a graceful shutdown). The problem arises when trying to restart the validator process.
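Applied to the sessions started above, the forceful kill of the validator looks like this:

```bash
# Kill the validator's screen session (and the kuutamod process inside it)
# without going through the graceful shutdown path.
screen -X -S validator kill
```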
Restarting the validator process with the command above results in the following errors, which eventually kill the process:
```
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
level=warn pid=174131 message="Neard finished unexpectly with signal: 6 (core dumped)" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Voting -> Startup" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Startup -> Syncing" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Syncing -> Registering" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Registering -> Voting" target="kuutamod::supervisor" node_id=node
2022-07-18T18:58:04.693039Z INFO neard: version="1.27.0" build="nix:1.27.0" latest_protocol=54
2022-07-18T18:58:04.693659Z INFO near: Opening store database at ".data/near/localnet/kuutamod0/data"
2022-07-18T18:58:04.767130Z INFO db: Created a new RocksDB instance. num_instances=1
2022-07-18T18:58:04.768723Z INFO db: Dropped a RocksDB instance. num_instances=0
thread 'main' panicked at 'Failed to open the database: DBError("IO error: While lock file: .data/near/localnet/kuutamod0/data/LOCK: Resource temporarily unavailable")', core/store/src/lib.rs:340:49
```
The errors point to an IO error regarding a LOCK file in the node's data directory. Presumably, when the neard service is shut down gracefully it releases this LOCK, but when it is shut down forcefully it does not, and the restarted process cannot acquire it.
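One way to test this is to check what, if anything, still holds the lock file when the restart fails; the "Resource temporarily unavailable" wording suggests the lock is actively held, for example by a neard child process that survived the forced kill. The path below is the one from the panic message:

```bash
# List any process still holding the RocksDB lock file; a leftover neard
# process would show up here with its PID.
fuser -v .data/near/localnet/kuutamod0/data/LOCK
lsof .data/near/localnet/kuutamod0/data/LOCK
```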