The outage was caused by new monitoring software we deployed on kvm01-tor. The software caused the system's SMBus to hang, which in turn hung the host OS and forced us to perform an emergency reboot.
We resolved the issue by unloading the w83795 kernel module, which prevents both the monitoring software and the system from querying the affected sensors.
We apologize for any inconvenience this may have caused.