What are you guys’ disaster recovery plans?

I followed guides to get running on testnet and then “trashed” everything to start again from scratch. As preparation I built a script based on the guides that handles most of the setup, e.g. I can now probably be back up in less than an hour after a full reinstall.

However, it doesn’t solve the issue of having to sync again. For Geth an Infura failover will do, but one still needs to sync the beacon chain. How long does that actually take on mainnet? On testnet it takes me about 1 day (24 hrs), so one still loses a full day (not accounting for possibly having to wait for hardware to arrive).
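
(For anyone scripting this part: below is a minimal sketch of how one might check how far along a resync is. It assumes Geth’s HTTP-RPC on the default localhost:8545 and a standard Beacon API on localhost:5052, the Lighthouse default; adjust the endpoints for your own setup.)

    # Minimal sync-progress check. Endpoints and ports are assumptions for a
    # default local setup; adjust them for your own machine.
    import requests

    GETH_RPC = "http://localhost:8545"      # assumed Geth HTTP-RPC endpoint
    BEACON_API = "http://localhost:5052"    # assumed Beacon API endpoint

    def geth_sync_status():
        # eth_syncing returns False when synced, otherwise a progress object.
        resp = requests.post(GETH_RPC, json={
            "jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []
        }, timeout=10)
        result = resp.json()["result"]
        if result is False:
            return "Geth: fully synced"
        current = int(result["currentBlock"], 16)
        highest = int(result["highestBlock"], 16)
        return f"Geth: syncing, block {current:,} of {highest:,}"

    def beacon_sync_status():
        # The standard Beacon API reports sync distance in slots.
        data = requests.get(f"{BEACON_API}/eth/v1/node/syncing", timeout=10).json()["data"]
        if not data["is_syncing"]:
            return "Beacon node: fully synced"
        return f"Beacon node: syncing, {data['sync_distance']} slots behind head"

    if __name__ == "__main__":
        print(geth_sync_status())
        print(beacon_sync_status())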

Do you guys create a chain backup now and then? (With Clonezilla this will lead to some downtime as well.) Or do you use some other means of backing it up that doesn’t need downtime?

What do you think?

10 Comments

  1. Ansible scripts to rebuild the node and just chill out until I get resynced (the loss will be negligible). Could use infura as a backup if you just can’t wait…

  2. If you’re worried about the potential losses from 24 hours, that would mean you have a ton of validators or you’re being a tad bit neurotic.

    If it makes financial sense (i.e. you have a ton of validators and 24 hours means 4-5+ figure losses), get the exact same hardware, sync an eth1 node and a beacon chain node on it, unplug it and put it somewhere safe; then you have a machine that’s good to go if for some reason your validator machine blows up. The only things you would have to do are run an update on everything, sync up from the last time you synced and configure your validators. Or you could even keep that separate machine running eth1 and beacon chain nodes constantly so it’s always up to date.

    I would say power loss or internet outages of 24 hours are far more likely than a machine getting trashed, so unless you’re prepared for those too, you’ve only mitigated a lower-probability risk.

  3. Breathe deep.

    Take it slow.

    Start again from first principles and don’t rush anything. Ignore the chaos in your inbox from missed attestations.

    5-10 days is no dramas at all.

  4. I use Lighthouse, and if it were to go down I could import my keys on new hardware and use the Infura (Teku-based) failover beacon node until the Lighthouse beacon chain syncs. I use an Infura backup for Geth too, so if either the beacon chain or Geth fails it will still work.

  5. Yeah, use Infura or Alchemy for failover. Sync your Geth client in the background, then switch back over to it once it’s ready. I think you can also do Geth backups locally to reduce sync time?

    Also, worst case scenario, you’re down for a day or two, and the losses from that are totally insubstantial. There are literally validators on mainnet that have been offline since like day 1 and are still going strong (offline penalties are tiny!).

    Edit: I misread and thought your issue was the eth1 node. Infura should provide beacon node failover too, though (see the failover sketch after the comments).

  6. What everyone here is saying is right. The downtime is negligible.

    Geth took me like 3 days to do a full sync, I think.

    Teku took 4-5 days, I think.

    If you’re worried about it, you can buy another hard drive and take snapshots of the clients’ DBs, though you’ll have to shut down the clients to do that, I think, since they lock the DB while it’s in use (there’s a rough snapshot sketch after the comments).
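
A rough failover sketch in the spirit of comments 4 and 5: prefer the local beacon node and only fall back to a hosted endpoint (Infura/Alchemy) while the local one is down or still syncing. The fallback URL is just a placeholder, and the health check simply reuses the standard /eth/v1/node/syncing endpoint.

    # Pick a beacon endpoint: local node if healthy, otherwise a hosted fallback.
    # URLs are placeholders/assumptions; substitute your own endpoints.
    import requests

    LOCAL_BEACON = "http://localhost:5052"
    FALLBACK_BEACON = "https://your-hosted-beacon-endpoint.example"  # placeholder

    def is_healthy(base_url):
        """True if the node answers and reports that it is not syncing."""
        try:
            resp = requests.get(f"{base_url}/eth/v1/node/syncing", timeout=5)
            resp.raise_for_status()
            return not resp.json()["data"]["is_syncing"]
        except requests.RequestException:
            return False

    def pick_beacon_endpoint():
        return LOCAL_BEACON if is_healthy(LOCAL_BEACON) else FALLBACK_BEACON

    if __name__ == "__main__":
        print("Using beacon endpoint:", pick_beacon_endpoint())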
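
And a hedged sketch of the “stop, snapshot, restart” idea from comment 6. The systemd unit names and data directories are assumptions (e.g. “geth” and “teku” units, Geth’s default ~/.ethereum datadir on Linux, a placeholder Teku path); substitute your own, and expect some downtime while the copy runs.

    # Stop the clients, copy their data directories to another drive, restart.
    # Service names and paths are assumptions; adjust for your own setup.
    import shutil
    import subprocess
    from datetime import datetime
    from pathlib import Path

    SERVICES = ["geth", "teku"]                     # assumed systemd unit names
    DATA_DIRS = {
        "geth": Path.home() / ".ethereum",          # Geth's default datadir on Linux
        "teku": Path.home() / "teku-data",          # placeholder Teku data path
    }
    BACKUP_ROOT = Path("/mnt/backup")               # the extra hard drive

    def run(cmd):
        subprocess.run(cmd, check=True)

    def snapshot():
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        # Stop the clients first so the databases are not locked mid-copy.
        for svc in SERVICES:
            run(["sudo", "systemctl", "stop", svc])
        try:
            for name, src in DATA_DIRS.items():
                dest = BACKUP_ROOT / f"{name}-{stamp}"
                shutil.copytree(src, dest)
                print(f"copied {src} -> {dest}")
        finally:
            # Restart even if a copy fails, to keep downtime short.
            for svc in SERVICES:
                run(["sudo", "systemctl", "start", svc])

    if __name__ == "__main__":
        snapshot()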
