
sdnog hosting project: infrastructure updates

Here, we share regular updates on the development, upgrades, and operational status of the infrastructure that supports the sdnog community. This includes deployments of core services, network enhancements, performance improvements, and collaborative efforts with local and international partners.

12 May 2025: RAM Upgrade and Faulty DIMM Replacement on Server

The suspect DIMM in slot A2 has been removed and replaced with a 32GB module from INX. The server has now been upgraded to 32GB of RAM. Slots A2 and B2 are populated identically to maintain dual-channel memory support.

19 May 2025: New backup system

INX has implemented a new backup system: we’re now using Proxmox Backup Server instead of a remote NFS mount from a NAS. Yesterday I cut the sdnog hosting infrastructure over to it. I also took the chance to upgrade the backup infrastructure to 10 Gb/s, as it was previously limited to 1 Gb/s.

Backups are now much faster because they are incremental and the backend is quicker: the Proxmox server has 32x Gold cores at 3.6 GHz, caching disks, more spindles, etc. You can log in to any of the servers and hit “Run now” for any of the servers to see this for yourself. The backup results are still being emailed to sysadmin@sdnog (one is attached here).

I changed the backup retention from “keep 4 copies” to “keep 14 copies” because the new system uses less disk space.

Later this week, I’ll deploy another server in a different location so that there are off-site backups. My plan is to run backups at 12-hour intervals and alternate them so that morning backups (7am) stay on-site while evening backups (7pm) go off-site. If you have better ideas, let me know.
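The alternating plan above can be sketched as a small routing function. This is only an illustration under stated assumptions: the labels `ONSITE` and `OFFSITE` and the functions `target_for` and `days_covered` are hypothetical names, and the real schedule would be configured in Proxmox rather than in Python.

```python
from datetime import datetime

# Hypothetical labels for the two backup destinations; not real hostnames.
ONSITE = "onsite"
OFFSITE = "offsite"


def target_for(run_time: datetime) -> str:
    """Route a backup run by hour: 07:00 stays on-site, 19:00 goes off-site."""
    if run_time.hour == 7:
        return ONSITE
    if run_time.hour == 19:
        return OFFSITE
    raise ValueError(f"no backup is scheduled at hour {run_time.hour}")


def days_covered(keep_copies: int, runs_per_day: int) -> float:
    """Number of days of history that a keep-last retention setting spans."""
    return keep_copies / runs_per_day
```

If a single destination received both daily runs, `days_covered(14, 2)` shows that 14 copies would span 7 days. If each destination keeps its own 14 copies of one daily run, `days_covered(14, 1)` gives 14 days. Which case applies depends on how retention is set per datastore.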

28 May 2025: VM1 Stability Issue Due to Faulty DIMM

An issue has been identified with one of the two 32GB DIMMs installed in the VM1 server a few weeks ago. The server has experienced two crashes since the installation, with system logs consistently indicating a fault in DIMM B1. A replacement DIMM is being sourced to resolve the issue.

22 June 2025: Backup Sync Issue with Xneelo (SMR) – Fixed

There was some confusion with the backup to the Xneelo site (SMR). I recently upgraded the server and changed the Let’s Encrypt key for management on the backup server. As a result, the backups to SMR failed because the SSL fingerprint no longer matched. This issue has now been fixed, and I’ve re-initiated the backups. Just a reminder — we run two backup setups:

  • Morning (SMR)

  • Evening (PKL)

The two go to different locations, so backups from yesterday are still available. Backup notifications are sent to sysadmin@sdnog.sd.
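The failure described above, a pinned fingerprint that stops matching once the certificate changes, can be illustrated with a short sketch. It assumes the fingerprint is the SHA-256 digest of the DER-encoded certificate written as colon-separated hex. The function names are hypothetical, and this is not a description of how PBS performs the check internally.

```python
import hashlib


def sha256_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 digest of a DER certificate as colon-separated hex."""
    digest = hashlib.sha256(der_cert).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(der_cert: bytes, pinned: str) -> bool:
    """Compare a certificate against a previously pinned fingerprint."""
    return sha256_fingerprint(der_cert) == pinned.strip().lower()
```

In practice the DER bytes could be obtained with the standard library calls `ssl.get_server_certificate((host, port))` followed by `ssl.PEM_cert_to_DER_cert(...)`. After a certificate renewal the bytes differ, so the comparison fails until the pinned value on the remote side is updated.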

30 June 2025: VM1 Stability Issue Due to Faulty DIMM

VM1 is currently offline due to a fault reported on DIMM B7 (the previous incident involved B1). The team is investigating. There is no estimated time of restoration yet; updates will follow once confirmed.

12 July 2025: Network and Backup Setup Updates

  • The management network uplink (hosting vpn.sdnog, the hypervisor iDRAC, and the vmbr0 interface) has been upgraded to 10 Gb/s.

  • The connection to the backup server is now a dedicated 20 Gb/s LACP bundle (previously 10 Gb/s). Backups now run in parallel instead of serially.

  • Backup verification has been enabled, improving data integrity with additional compute resources on the backup server side.

  • The backup architecture has been restructured. All backups now target pbs1.pkl only; remote servers synchronize with this main backup server. This new setup offers better efficiency under PBS. Daily logs should be monitored for any failures in case pbs1.pkl becomes unavailable.

  • For any related concerns or updates, contact: sysadmin(at)sdnog(dot)sd
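As a rough illustration of the move from serial to parallel backups mentioned in the list above, the sketch below runs the same set of jobs both ways. The `backup` function is a stand-in that sleeps instead of moving data, and the VM names and worker count are hypothetical; the actual parallelism is handled by the backup tooling, not by code like this.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def backup(vm: str, seconds: float = 0.05) -> str:
    """Stand-in for one backup job; sleeps instead of transferring data."""
    time.sleep(seconds)
    return f"{vm}: ok"


def run_serial(vms: list[str]) -> list[str]:
    """Run the jobs one after another."""
    return [backup(vm) for vm in vms]


def run_parallel(vms: list[str], workers: int = 4) -> list[str]:
    """Run up to `workers` jobs at once; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(backup, vms))
```

Both functions return the same results. The parallel version finishes sooner only when the link and the backup server can absorb several streams at once, which is what a wider uplink is meant to allow.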
