Gist: thebream/a9ae1e54f92db58cfafd3c0bfb83dc3f

Remediation script for e1000e network driver hang on Proxmox VE

Checks recent system journal entries for "Detected Hardware Unit Hang" messages and, if any are found, resets the network interface.

This is a workaround until the driver or kernel is fixed.

TL;DR: Quick start

  1. Copy the attached script somewhere, e.g. /root/cron/hangcheck2.sh, and make it executable (chmod +x)
  2. If your interface name is not eno1, update the ifdown and ifup commands accordingly
  3. Review the code for your own peace of mind (after all, this script is going to run as root!)
  4. Give it a test run:
    tim@tim-nuc10:~$ /root/cron/hangcheck2.sh -vv
    Verbosity is: 2
    2025-11-20 10:52:48: No network hang detected, exiting
    
  5. Add an entry to root's crontab, e.g. to run every 10 minutes, starting at 2 minutes past the hour:
    2,12,22,32,42,52 * * * * /root/cron/hangcheck2.sh >> /var/log/hangcheck2.log || echo "CRON JOB FAILED"
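Step 2 can also be scripted rather than done by hand. The sketch below is a hypothetical helper, set_iface, that uses GNU sed to rewrite only the interface argument of the /usr/sbin/ifdown and /usr/sbin/ifup calls in your copy of the script; enp0s31f6 is just an example interface name, not something from the tested system.

```bash
#!/bin/bash
# Sketch: replace the hard-coded eno1 in the ifdown/ifup line of a copy of
# hangcheck2.sh. set_iface is a hypothetical helper; requires GNU sed (-i, -E, \b).
set -euo pipefail

set_iface() {
    local script="$1" iface="$2"
    # Only touch "/usr/sbin/ifdown eno1" and "/usr/sbin/ifup eno1"; the \b stops
    # a longer name such as eno10 from being partially rewritten.
    sed -i -E "s#(/usr/sbin/if(down|up)) eno1\b#\1 ${iface}#g" "$script"
}

# Example (hypothetical interface name):
#   set_iface /root/cron/hangcheck2.sh enp0s31f6
#   grep -n "/usr/sbin/if" /root/cron/hangcheck2.sh
```

Re-run the test from step 4 afterwards to confirm the edited copy still works.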
    

Reference

https://forum.proxmox.com/threads/e1000-driver-hang.58284/

Background

System tested on

Intel NUC 10 (NUC10i5FNH) with I219-V integrated onboard LAN, using the e1000e driver:

root@tim-nuc10:~# lspci | grep Ethernet  
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (10) I219-V

root@tim-nuc10:~# ethtool -i eno1  
driver: e1000e  
version: 6.8.12-14-pve  
firmware-version: 0.6-4  
expansion-rom-version:  
bus-info: 0000:00:1f.6  
supports-statistics: yes  
supports-test: yes  
supports-eeprom-access: yes  
supports-register-dump: yes  
supports-priv-flags: yes
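To check whether one of your own interfaces is driven by e1000e before adopting this script, the driver name can also be read from sysfs via the /sys/class/net/<iface>/device/driver symlink. This is a sketch: iface_driver and the SYSFS_ROOT override are hypothetical names, and SYSFS_ROOT exists only so the function can be tried against a fake directory tree.

```bash
#!/bin/bash
# Sketch: print the kernel driver behind a network interface by reading the
# sysfs "driver" symlink. SYSFS_ROOT is a hypothetical override for testing;
# leave it unset on a real system so /sys is used.
set -euo pipefail

iface_driver() {
    local iface="$1"
    local link="${SYSFS_ROOT:-/sys}/class/net/${iface}/device/driver"
    # Virtual interfaces (bridges such as vmbr0, loopback) have no device/driver link.
    if [[ ! -L "$link" ]]; then
        return 1
    fi
    # The link points at .../bus/pci/drivers/<driver>; the last component is the name.
    basename "$(readlink "$link")"
}

# Example: iface_driver eno1 || echo "no driver link for eno1"
```

On the system above, iface_driver eno1 should print e1000e, matching the ethtool -i output. For a Proxmox bridge such as vmbr0 it returns 1, since a bridge has no underlying PCI device.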

Example error from system journal

When the network has hung, these errors occur about every 2 seconds. Verbose output is shown here:

root@tim-nuc10:~# journalctl --since "2025-11-19 05:00:03" --until "2025-11-19 05:00:04" -o verbose
Wed 2025-11-19 05:00:03.609642 AEDT [s=357c571eab6d4965951303c26dfdd3c1;i=1cadd5;b=d3f13914711347d9b09a19db19ba1dd2;m=4>
    _BOOT_ID=d3f13914711347d9b09a19db19ba1dd2
    _MACHINE_ID=4dd372840ba9458ea51b7ea0e1828448
    _HOSTNAME=tim-nuc10
    _RUNTIME_SCOPE=system
    _TRANSPORT=kernel
    SYSLOG_FACILITY=0
    SYSLOG_IDENTIFIER=kernel
    PRIORITY=3
    _KERNEL_SUBSYSTEM=pci
    _KERNEL_DEVICE=+pci:0000:00:1f.6
    _UDEV_SYSNAME=0000:00:1f.6
    _SOURCE_MONOTONIC_TIMESTAMP=4956951916911
    MESSAGE=e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
              TDH                  <e9>
              TDT                  <9>
              next_to_use          <9>
              next_to_clean        <e9>
            buffer_info[next_to_clean]:
              time_stamp           <2273268d5>
              next_to_watch        <ea>
              jiffies              <227716f00>
              next_to_watch.status <0>
            MAC Status             <40080083>
            PHY Status             <796d>
            PHY 1000BASE-T Status  <3800>
            PHY Extended Status    <3000>
            PCI Status             <10>

(Note that the network interface name, "eno1", appears in the first line of the MESSAGE field above. The script could parse that line to get the name instead of hard-coding it.)
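A sketch of that parsing, assuming the message text always has the form "e1000e <PCI address> <interface>: Detected Hardware Unit Hang:" as in the sample above. hung_ifaces is a hypothetical function name; it reads journal text on stdin and prints each distinct interface name that reported a hang.

```bash
#!/bin/bash
# Sketch: derive the hung interface name(s) from journal text instead of
# hard-coding eno1. Assumes the e1000e message format shown above.
set -euo pipefail

hung_ifaces() {
    # Capture the token between the PCI address and ": Detected Hardware Unit Hang:".
    # The pattern is not anchored, so journalctl's default prefix
    # (timestamp, hostname, "kernel:") is skipped too.
    # sort -u collapses the repeats (the message recurs about every 2 seconds).
    sed -n 's/.*e1000e [^ ]* \([^ :]*\): Detected Hardware Unit Hang:.*/\1/p' | sort -u
}

# Example:
#   journalctl --since "2 minutes ago" _TRANSPORT=kernel \
#       _KERNEL_SUBSYSTEM=pci --priority=3 -o cat | hung_ifaces
```

The -o cat output mode prints just the message text. An empty result means no hang was reported, and the name(s) printed could be passed to ifdown/ifup in place of eno1.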

Based on that, the hang check looks at the journal for the last two minutes and filters on these fields:

_TRANSPORT=kernel
_KERNEL_SUBSYSTEM=pci
PRIORITY=3 (passed as --priority=3, which also matches the more severe levels 0 to 2)

The output is then piped through grep to count the number of hangs:

grep -c "Detected Hardware Unit Hang:"

The complete command looks like this:

if ! hangcount=$(journalctl \
                    --since "2 minutes ago" _TRANSPORT=kernel \
                    _KERNEL_SUBSYSTEM=pci --priority=3 | \
                 grep -c "Detected Hardware Unit Hang:")
then
    log_msg 1 "No network hang detected, exiting"
    exit 0
fi
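The if ! test works because grep -c always prints a count but exits with status 1 when that count is zero, so "no matches" takes the then branch. One side effect under set -o pipefail is that if journalctl itself exits non-zero, the pipeline fails regardless of grep and is also reported as "No network hang detected". The sketch below, using hypothetical count_hangs and check functions, feeds sample text to the same grep so both outcomes can be seen without waiting for a real hang.

```bash
#!/bin/bash
# Sketch: the exit-status logic of the hang check, driven by sample text
# instead of live journal output. count_hangs and check are hypothetical.
set -euo pipefail

count_hangs() {
    grep -c "Detected Hardware Unit Hang:"
}

check() {
    local hangcount
    # grep -c prints "0" and exits 1 when nothing matches, so the ! branch runs.
    if ! hangcount=$(printf '%s\n' "$@" | count_hangs); then
        echo "no hang (count printed was ${hangcount})"
        return 0
    fi
    echo "hang detected, count is: ${hangcount}"
}

check "some other kernel message"
check "e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:" \
      "e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:"
```

This prints "no hang (count printed was 0)" followed by "hang detected, count is: 2". Inside a function, hangcount is declared with local on its own line: writing local hangcount=$(...) would make the if see the exit status of local (always 0) rather than grep's.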
#!/bin/bash
# Reset network interface if system journal shows it has hung
# Refer https://forum.proxmox.com/threads/e1000-driver-hang.58284
# 19/11/2025 v1.0 Created simplified version - Tim.
set -euo pipefail

usage() {
    cat <<EOF
Reset network if hang detected in PVE networking, must be run as root.
Options:
-h this help
-v verbose output, specify multiple times to increase verbosity
EOF
}

log_msg() {
    local loglevel="$1"
    local logmsg="$2"
    local timestamp
    timestamp=$(date +"%Y-%m-%d %H:%M:%S")
    if [[ $loglevel -le $VERBOSITY ]]; then
        echo "$timestamp: $logmsg"
    fi
}

# defaults
VERBOSITY=0

while getopts "hv" OPTION
do
    case $OPTION in
        h)
            usage
            exit 0
            ;;
        v)
            VERBOSITY=$((VERBOSITY + 1))
            ;;
        \?)
            usage
            exit 3
            ;;
    esac
done
shift $((OPTIND - 1))

[[ $VERBOSITY -ge 1 ]] && echo "Verbosity is: $VERBOSITY"

# check system journal for recent hang
if ! hangcount=$(journalctl \
                    --since "2 minutes ago" _TRANSPORT=kernel \
                    _KERNEL_SUBSYSTEM=pci --priority=3 | \
                 grep -c "Detected Hardware Unit Hang:")
then
    log_msg 1 "No network hang detected, exiting"
    exit 0
fi

log_msg 0 "Hang detected, count is: $hangcount, restarting network"
# need full path, root cron PATH does not include /usr/sbin
/usr/sbin/ifdown eno1; sleep 10; /usr/sbin/ifup eno1
log_msg 2 "Sleeping 10 seconds for good luck"
sleep 10
# problem has been detected, so exit non-zero to get notification from cron
exit 1
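The usage text says the script must be run as root, but nothing enforces it. ifdown and ifup need root, and a user outside the groups allowed to read the system journal (such as systemd-journal) typically does not see the kernel entries at all, so the check could quietly report no hang. A guard such as this hypothetical require_root function could be called near the top of the script; exit code 3 simply reuses the script's existing bad-option code and is an arbitrary choice.

```bash
#!/bin/bash
# Sketch: root guard for hangcheck2.sh. require_root is a hypothetical helper;
# the return code 3 is an assumption borrowed from the script's bad-option exit.
set -euo pipefail

require_root() {
    # EUID is set by bash itself; 0 means the effective user is root.
    if [[ $EUID -ne 0 ]]; then
        echo "$(basename "$0"): must be run as root" >&2
        return 3
    fi
}

# In hangcheck2.sh this could go straight after "set -euo pipefail":
#   require_root || exit $?
```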