Running Erigon Ethereum nodes on GCP and AWS

So I set up some pretty monster servers for running Erigon clients on both AWS and GCP. Here are their specs and how they're currently chugging along. I'm expecting the sync to complete at around 1.3TB of chain data.

GCP Erigon Sync

Summary

I would have expected this sync to happen a lot faster considering the hardware being used. It's well above the recommended hardware from the Erigon README:

  • Baseline (ext4 SSD): 16Gb RAM sync takes 5 days, 32Gb - 4 days, 64Gb - 3 days

At first I didn't believe the bottleneck was IO, because an IO test showed very fast write speeds. However, this chart has the potential hallmarks of throttling, though I don't know whether that pattern is just characteristic of the execution stage the node is currently in.

In GCP, I don't seem to be able to set up provisioned IOPS like I can in AWS. Instead, the recommendation is to create a large volume, since IO performance scales with disk size; that's why I created a large 7.6TB volume for the sync.
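For reference, the instance setup looks roughly like this (a sketch, not the exact command I ran; image flags are assumed):

# pd-ssd throughput and IOPS scale with volume size, hence the 7.6TB
gcloud compute instances create erigon-node-0 \
  --machine-type=e2-highcpu-32 \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-type=pd-ssd \
  --boot-disk-size=7600GB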

Running Time: ~7 days (still ongoing)

Specs

OS                  Instance Type   vCPUs   RAM    HD
Ubuntu 20.04.3 LTS  e2-highcpu-32   32      32GB   7.6TB SSD (more TB = better IO)

CPU Info

# lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:              0
CPU MHz:               2200.218
BogoMIPS:              4400.43
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              56320K
NUMA node0 CPU(s):     0-31

df

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       7.6T 1006G  6.6T  13% /

Observability Metrics


Current Erigon log output:

INFO[09-04|09:24:11.597] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|09:24:25.838] [7/18 Execution] Executed blocks         number=12176587 blk/s=8.622 tx/s=1634.578 Mgas/s=104.981 batch="486.70 MiB" alloc="1.78 GiB" sys="3.80 GiB"
INFO[09-04|09:24:55.869] [7/18 Execution] Executed blocks         number=12176775 blk/s=6.260 tx/s=1064.993 Mgas/s=78.093 batch="489.64 MiB" alloc="2.83 GiB" sys="3.80 GiB"
INFO[09-04|09:25:25.899] [7/18 Execution] Executed blocks         number=12176997 blk/s=7.393 tx/s=1391.806 Mgas/s=90.522 batch="492.46 MiB" alloc="2.43 GiB" sys="3.80 GiB"
INFO[09-04|09:25:55.872] [7/18 Execution] Executed blocks         number=12177241 blk/s=8.141 tx/s=1758.558 Mgas/s=100.308 batch="495.29 MiB" alloc="2.05 GiB" sys="3.80 GiB"
INFO[09-04|09:26:11.597] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|09:26:25.789] [7/18 Execution] Executed blocks         number=12177467 blk/s=7.554 tx/s=1406.694 Mgas/s=93.320 batch="498.24 MiB" alloc="3.26 GiB" sys="3.80 GiB"
INFO[09-04|09:26:55.856] [7/18 Execution] Executed blocks         number=12177713 blk/s=8.182 tx/s=1642.290 Mgas/s=101.222 batch="501.10 MiB" alloc="2.98 GiB" sys="3.80 GiB"
INFO[09-04|09:27:25.799] [7/18 Execution] Executed blocks         number=12177985 blk/s=9.084 tx/s=1759.820 Mgas/s=110.718 batch="503.97 MiB" alloc="2.64 GiB" sys="3.80 GiB"
INFO[09-04|09:27:55.849] [7/18 Execution] Executed blocks         number=12178255 blk/s=8.985 tx/s=1754.048 Mgas/s=110.811 batch="506.95 MiB" alloc="2.50 GiB" sys="3.80 GiB"
INFO[09-04|09:28:11.597] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|09:28:25.836] [7/18 Execution] Executed blocks         number=12178529 blk/s=9.137 tx/s=1922.201 Mgas/s=111.785 batch="509.94 MiB" alloc="2.17 GiB" sys="3.80 GiB"
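Back-of-envelope on those numbers: assuming the chain tip is around block 13.15M right now (an approximation on my part), the execution stage alone still has roughly a million blocks to go at ~8 blk/s:

# (13,150,000 - 12,178,529) blocks / 8 blk/s / 3600 s/h ≈ 33 hours,
# and that's just the remainder of the execution stage
echo $(( (13150000 - 12178529) / 8 / 3600 ))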

htop

iostat

# iostat
Linux 4.15.0-1098-gcp (erigon-node-0) 	09/04/2021 	_x86_64_	(32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.30    0.00    0.81    1.09    0.10   94.70

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.00         0.00         0.00          8          0
sda            1005.88      2960.22      3809.41 1230423174 1583391421
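Note these are averages since boot, so they smooth over any throttling bursts. To catch throttling in the act, extended per-interval stats are more useful (%util pinned near 100 or a climbing await would be the giveaway):

# Extended stats for the root disk every 5 seconds
iostat -x sda 5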

io test (dd)

# sudo dd if=/dev/zero of=/home/ubuntu/erigon/test1.img bs=2G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 4.47902 s, 479 MB/s
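One caveat with dd: a single 2G sequential write says little about the small random reads and writes Erigon's database actually does. If fio is installed, a 4k random-read test would be a more representative probe, something like:

# 4k random reads against the same filesystem, 60s at queue depth 32
fio --name=randread --directory=/home/ubuntu/erigon \
    --rw=randread --bs=4k --size=4G --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based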

AWS Erigon Sync

Summary

Again, I would have expected this sync to happen a lot faster considering the hardware being used. It's well above the recommended hardware from the repository:

  • Baseline (ext4 SSD): 16Gb RAM sync takes 5 days, 32Gb - 4 days, 64Gb - 3 days

One thing to note about this configuration is that the Erigon data sits on its own volume, which has 32,000 provisioned IOPS (way overkill judging by the current throughput). This sync is executing a lot faster than the GCP sync, though I still expect it to complete after 6 days or so, which is 50% longer than the benchmark.

Running Time: ~5 days (still ongoing)

Specs

OS                  Instance Type   vCPUs   RAM    HD
Ubuntu 20.04.3 LTS  c5.4xlarge      16      32GB   2TB SSD (32,000 provisioned IOPS)
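For reference, a provisioned-IOPS data volume like this is created along these lines (a sketch; io2 assumed, availability zone illustrative):

aws ec2 create-volume \
  --volume-type io2 \
  --iops 32000 \
  --size 2000 \
  --availability-zone us-east-1a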

CPU Info

# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:                        7
CPU MHz:                         3605.818
BogoMIPS:                        6000.00
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       256 KiB
L1i cache:                       256 KiB
L2 cache:                        8 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0-15

Uptime

# uptime
 16:58:18 up 4 days, 21:42,  1 user,  load average: 1.11, 1.11, 1.17

df

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    2.0T  943G  970G  50% /erigon
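For completeness, the data volume was formatted ext4 (matching the README baseline) and mounted at /erigon. The setup is roughly this sketch (device name per lsblk):

sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /erigon
sudo mount /dev/nvme1n1 /erigon
# Persist the mount across reboots
echo '/dev/nvme1n1 /erigon ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab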

Observability Metrics


Current Erigon log output:

INFO[09-04|08:55:31.180] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|08:55:59.703] [7/18 Execution] Executed blocks         number=11755478 blk/s=11.800 tx/s=1940.575 Mgas/s=146.031 batch="177.60 MiB" alloc="2.26 GiB" sys="7.69 GiB"
INFO[09-04|08:56:29.660] [7/18 Execution] Executed blocks         number=11755864 blk/s=12.885 tx/s=2260.906 Mgas/s=156.571 batch="182.81 MiB" alloc="1.76 GiB" sys="7.69 GiB"
INFO[09-04|08:56:59.664] [7/18 Execution] Executed blocks         number=11756331 blk/s=15.565 tx/s=3425.167 Mgas/s=189.931 batch="187.68 MiB" alloc="1.68 GiB" sys="7.69 GiB"
INFO[09-04|08:57:29.700] [7/18 Execution] Executed blocks         number=11756763 blk/s=14.383 tx/s=2838.241 Mgas/s=176.091 batch="192.68 MiB" alloc="1.55 GiB" sys="7.69 GiB"
INFO[09-04|08:57:31.180] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|08:57:59.686] [7/18 Execution] Executed blocks         number=11757216 blk/s=15.107 tx/s=2813.534 Mgas/s=185.650 batch="198.09 MiB" alloc="2.43 GiB" sys="7.69 GiB"
INFO[09-04|08:58:29.706] [7/18 Execution] Executed blocks         number=11757685 blk/s=15.623 tx/s=2938.736 Mgas/s=191.732 batch="203.29 MiB" alloc="2.11 GiB" sys="7.69 GiB"
INFO[09-04|08:58:59.662] [7/18 Execution] Executed blocks         number=11758130 blk/s=14.855 tx/s=2985.672 Mgas/s=180.122 batch="207.57 MiB" alloc="1.78 GiB" sys="7.69 GiB"
INFO[09-04|08:59:29.700] [7/18 Execution] Executed blocks         number=11758580 blk/s=14.981 tx/s=3132.589 Mgas/s=183.131 batch="212.19 MiB" alloc="1.51 GiB" sys="7.69 GiB"
INFO[09-04|08:59:31.180] [p2p] GoodPeers                          eth66=97 eth65=98
INFO[09-04|08:59:59.793] [7/18 Execution] Executed blocks         number=11759041 blk/s=15.319 tx/s=2896.544 Mgas/s=186.064 batch="216.98 MiB" alloc="2.50 GiB" sys="7.69 GiB"

htop

iostat

# iostat
Linux 5.4.0-1045-aws (erigon-node-1) 	09/04/21 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.18    0.00    0.45    1.42    0.00   92.95

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
nvme0n1           0.95        11.91        25.64         0.00    5065701   10905088          0
nvme1n1        1079.23      3191.00      5060.82         0.00 1357122680 2152352480          0

io test (dd)

# sudo dd if=/dev/zero of=/erigon/test1.img bs=2G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 4.26688 s, 503 MB/s

Update: a core contributor to Erigon said this about GCP:

in order to match the benchmarks, you’d have to attach local SSD devices (NVMe), because “generic” SSD would probably be throttled
Normally, if your drive’s name is /dev/sda1, that would be a generic SSD
you need to do special steps to attach Local SSD to a VM instance when you create it. They come in pieces of 350Gb each, and you then need to combine them with mdadm or with LVM
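Following that advice, here's a sketch of what attaching and striping Local SSDs would look like (device names illustrative; note that Local SSD isn't available on e2 machine types, so an n2 is assumed here):

# Create the VM with 4 Local SSD (NVMe) devices attached
gcloud compute instances create erigon-node-2 \
  --machine-type=n2-highcpu-32 \
  --local-ssd interface=nvme --local-ssd interface=nvme \
  --local-ssd interface=nvme --local-ssd interface=nvme

# Stripe them into a single RAID0 array with mdadm, then format and mount
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
  /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /erigon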