If the switch shows as disconnected in the UI but is online and reachable locally, use the following steps to troubleshoot further. You will need console or SSH access to the switch to run these checks:
1. Make sure the EX switch is on the supported version
Minimum Junos OS versions supported for ZTP:
- EX2300, EX3400: 18.2R3-S2
- EX4300: 18.4R2-S2
- EX4600, EX4650: 20.4R3
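The version comparison can be scripted when many switches need checking. The following is a minimal Python sketch, assuming version strings follow the major.minor, R release, optional -S service release pattern used in the list above. The function names and the model keys (family names such as EX2300 rather than full SKUs) are illustrative, and the running version would come from the switch itself, for example from "show version".

```python
import re

# Minimum versions copied from the list above, keyed by model family.
MIN_VERSIONS = {
    "EX2300": "18.2R3-S2",
    "EX3400": "18.2R3-S2",
    "EX4300": "18.4R2-S2",
    "EX4600": "20.4R3",
    "EX4650": "20.4R3",
}

# Assumed pattern: <major>.<minor>R<release>[-S<service release>].
# Anything after that (for example a build suffix) is ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+))?")


def parse_version(text):
    """Return (major, minor, release, service) as integers."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized Junos version: {text!r}")
    major, minor, release, service = match.groups()
    # A missing service release counts as 0, so 18.4R2 < 18.4R2-S2.
    return int(major), int(minor), int(release), int(service or 0)


def meets_minimum(model, running_version):
    """True if running_version is at or above the minimum for the model."""
    minimum = MIN_VERSIONS[model.upper()]
    return parse_version(running_version) >= parse_version(minimum)


if __name__ == "__main__":
    print(meets_minimum("EX2300", "18.2R3-S1"))
    print(meets_minimum("EX4300", "20.4R3"))
```

Versions that do not match the assumed pattern raise an error instead of being guessed at, so unusual release strings still need a manual check.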
2. Ensure the switch has a valid IP address
Run “show interfaces terse”. The irb.0 interface should have an IP address. You might see multiple irb interfaces depending on the switch model (or in the case of a Virtual Chassis). At least one irb interface needs to have a valid IP address.
The switch can also connect using the management IP address, which appears on the me0 interface. Ensure that either the irb.0 or the me0 interface has a valid IP address and is administratively up, with link status also up.
3. Ensure that the device can reach the gateway
4. Ensure the switch can reach the internet
mist@OFFICE_GF_SWITCH> ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=117 time=22.996 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=24.747 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=16.528 ms
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 16.528/21.424/24.747/3.535 ms
5. Check if the switch can resolve oc-term.mistsys.net
njohnny@NJ_EX_2300_C_Test> ping oc-term.mistsys.net
PING ab847c3d0fcd311e9b3ae02d80612151-659eb20beaaa3ea3.elb.us-west-1.amazonaws.com (13.56.90.212): 56 data bytes
If the switch cannot resolve the hostname, check which DNS servers are configured on it:
mist@OFFICE_GF_SWITCH> show configuration | display set | grep name-server
set system name-server 202.56.230.2
set system name-server 202.56.230.7
set system name-server 8.8.8.8
If no name server is configured, add one in configuration mode (for example, set system name-server 8.8.8.8) and commit.
6. Ensure firewall ports are open (TCP port 2200 for oc-term.mistsys.net)
The required ports and hosts depend on your cloud environment, so check which ones apply to yours. An established session looks like this:
njohnny@NJ_EX_2300_C_Test> show system connections | grep 2200
tcp4 0 0 192.168.3.24.64647 13.56.90.212.2200 ESTABLISHED
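If this output is collected from several switches, the connection state can be checked programmatically. The sketch below is illustrative only: it assumes the six-column layout shown in the sample line (protocol, two queue counters, local address, remote address, state) and that addresses are printed as IP address, dot, port. The function names are hypothetical.

```python
def parse_connection(line):
    """Parse one TCP line of 'show system connections' output."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"unexpected connection line: {line!r}")
    proto, local, remote, state = fields[0], fields[3], fields[4], fields[5]
    # Addresses are printed as <ip>.<port>, so split on the last dot.
    remote_ip, remote_port = remote.rsplit(".", 1)
    return {
        "proto": proto,
        "local": local,
        "remote_ip": remote_ip,
        "remote_port": int(remote_port),
        "state": state,
    }


def cloud_session_up(line, port=2200):
    """True if the line shows an ESTABLISHED session to the given port."""
    conn = parse_connection(line)
    return conn["remote_port"] == port and conn["state"] == "ESTABLISHED"


if __name__ == "__main__":
    sample = "tcp4 0 0 192.168.3.24.64647 13.56.90.212.2200 ESTABLISHED"
    print(parse_connection(sample))
    print(cloud_session_up(sample))
```

Any state other than ESTABLISHED on port 2200 is reported as down, which is the condition the later steps investigate.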
7. Check if the switch system time is correct
njohnny@NJ_EX_2300_C_Test> show system uptime
fpc0:
--------------------------------------------------------------------------
Current time: 2020-09-01 21:49:05 UTC
Time Source: LOCAL CLOCK
System booted: 2020-08-27 06:57:04 UTC (5d 14:52 ago)
Protocols started: 2020-08-27 07:01:35 UTC (5d 14:47 ago)
Last configured: 2020-09-01 17:21:59 UTC (04:27:06 ago) by mist
9:49PM up 5 days, 14:52, 2 users, load averages: 0.79, 0.65, 0.58
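To judge whether the reported time is correct, compare the "Current time" line against a trusted clock. The following Python sketch shows one way to compute the difference. It assumes the timestamp format shown in the sample output and handles UTC only; the function names are hypothetical, and how much skew is acceptable is left to the reader.

```python
from datetime import datetime, timezone

PREFIX = "Current time:"


def parse_current_time(uptime_output):
    """Extract the 'Current time' line as an aware UTC datetime."""
    for line in uptime_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            stamp = line[len(PREFIX):].strip()
            # Assumption: the switch reports UTC, as in the sample output.
            if not stamp.endswith(" UTC"):
                raise ValueError(f"only UTC timestamps are handled: {stamp!r}")
            naive = datetime.strptime(stamp[:-4], "%Y-%m-%d %H:%M:%S")
            return naive.replace(tzinfo=timezone.utc)
    raise ValueError("no 'Current time:' line found")


def clock_skew_seconds(uptime_output, reference=None):
    """Absolute difference between switch time and a reference clock."""
    if reference is None:
        reference = datetime.now(timezone.utc)
    delta = parse_current_time(uptime_output) - reference
    return abs(delta.total_seconds())


if __name__ == "__main__":
    sample = "fpc0:\nCurrent time: 2020-09-01 21:49:05 UTC\n"
    print(clock_skew_seconds(sample))
```

When no reference is passed, the local system clock of the machine running the script is used, so that machine's own time needs to be trustworthy.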
8. Check that ‘device-id’ has the format ‘<org_id>.<mac_addr>’ in the output of the CLI command below
njohnny@NJ_EX_2300_C_Test# show system services outbound-ssh
traceoptions {
    file outbound-ssh.log size 64k files 5;
    flag all;
}
client mist {
    device-id ca01ea19-afde-49a4-ad33-2d9902f14a7e.e8a2453e672e;
    secret "$9$L7i7-wgoJUDkg49Ap0IRrevW-VYgoDHqWLGDkqQzRhcreWLX-Vs2XxGDHkPfn/Cp0IcSeMLxn/LxN-ws5Qz6tuRhSv8Xrl87dVY2TzF/uOEcyKWLleUjikPfIEhSrvxNdbYgRhK8x7Vbk.mf5F9CuOBEtp0IcSMWoJZjmfFn/CA05TIEhSeK4aJUjqP5Q9tu4an/CtOB7-dboJZUjHmfaJn/ApREevW8X-YgoiqmxNb2gaUD69Cp1RSyKMLxCtORSrvM7-VboJDjqPTzNdmfzF/9vW8LdbY2aZGisY4ZDif5z3690BylKWX7KvZUHkTQlKvW-VJGDiqmGU/CtuEhKM87wYaJDkqfoaQFn6At1RhrM8xNd"; ## SECRET-DATA
    keep-alive {
        retry 3;
        timeout 5;
    }
    services netconf;
    oc-term.mistsys.net {
        port 2200;
        retry 1000;
        timeout 60;
    }
}
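The device-id format can also be validated with a short script. This is a sketch under two assumptions drawn from the sample above rather than from a formal specification: the org_id is a UUID in 8-4-4-4-12 hexadecimal form, and the MAC address is 12 hexadecimal digits with no separators. The function names are illustrative.

```python
import re

# Assumed shape, based on the sample device-id above:
#   <uuid 8-4-4-4-12>.<12 hex digits>
_DEVICE_ID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"\.[0-9a-f]{12}"
)


def is_valid_device_id(device_id):
    """True if device_id looks like <org_id>.<mac_addr>."""
    return _DEVICE_ID_RE.fullmatch(device_id.strip().lower()) is not None


def split_device_id(device_id):
    """Return (org_id, mac_addr), or raise ValueError if malformed."""
    if not is_valid_device_id(device_id):
        raise ValueError(f"malformed device-id: {device_id!r}")
    org_id, mac_addr = device_id.strip().lower().split(".")
    return org_id, mac_addr


if __name__ == "__main__":
    sample = "ca01ea19-afde-49a4-ad33-2d9902f14a7e.e8a2453e672e"
    print(split_device_id(sample))
```

Splitting the value makes it easy to compare the org_id with the organization ID shown in the UI and the MAC address with the one on the switch.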
You can also check log messages on the switch.
njohnny@NJ_EX_2300_C_Test> show log messages | last 20
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/6 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/7 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/8 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/9 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/10 status 27
Sep 1 21:54:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/11 status 27
Sep 1 21:55:24 NJ_EX_2300_C_Test mgd[93246]: UI_DBASE_LOGIN_EVENT: User 'njohnny' entering configuration mode
Sep 1 21:57:18 NJ_EX_2300_C_Test mgd[93246]: UI_DBASE_LOGOUT_EVENT: User 'njohnny' exiting configuration mode
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/0 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/1 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/2 status 1
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/3 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/4 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/5 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/6 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/7 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/8 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/9 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/10 status 27
Sep 1 21:57:24 NJ_EX_2300_C_Test chassisd[4937]: PoE port name ge-0/0/11 status 27
9. If you are adding the switch for the first time
Try releasing the switch from the UI (delete the switch entry), and then claim or re-adopt it:
- Delete the existing Mist configuration from the switch.
- Claim the switch again using the claim code, or adopt it using the CLI commands.
- Verify the outbound SSH service and its session using the commands “show system services outbound-ssh” and “show system connections | grep 2200”.
If the switch is still stuck in the disconnected state and:
- sessions are stuck in FIN_WAIT
- the switch is able to resolve DNS
- the internet is reachable
then check for MTU issues on the nodes along the path. The easiest way to validate this is to ping any public server (for example, 8.8.8.8) with a larger packet size.
Alternatively, if you have packet captures from the switch uplink, a transaction failing because of an MTU issue shows the packets of size 1514 being retried.
Run a ping test from the affected switch as follows. Successful case:
mist@ACC2-A6-IDF1-IAD10> ping size 1450 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1450 data bytes
76 bytes from 8.8.8.8: icmp_seq=0 ttl=59 time=12.444 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 12.318/12.381/12.444/0.063 ms
Failed case:
mist@ACC2-A6-IDF1-IAD10> ping size 1480 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1480 data bytes
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
Based on the packet size at which pings start to time out, adjust the MTU on the uplink accordingly.
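The ping size is the ICMP payload, so the IP packet is 28 bytes larger (20 bytes of IPv4 header plus 8 bytes of ICMP header). In the example above, size 1450 produces a 1478-byte packet that passes, while size 1480 produces a 1508-byte packet that fails. The 1514-byte frames seen in a capture correspond to a 1500-byte IP packet plus a 14-byte Ethernet header. The Python sketch below illustrates this arithmetic and a binary search for the largest payload that still passes. The probe callable is a hypothetical stand-in for however you run the ping (for example, manually or over SSH); it is not part of any Junos or Mist tooling.

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def packet_size(payload):
    """IP packet size produced by a ping with the given payload size."""
    return payload + IP_HEADER + ICMP_HEADER


def largest_passing_payload(probe, low, high):
    """Binary-search the largest payload in [low, high] that passes.

    probe(payload) must return True when a ping of that size succeeds.
    Assumes success is monotonic: once a size fails, all larger sizes fail.
    """
    if not probe(low):
        raise ValueError("even the smallest payload fails")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated path that drops IP packets larger than 1478 bytes.
    def fake_probe(payload):
        return packet_size(payload) <= 1478

    best = largest_passing_payload(fake_probe, 1400, 1480)
    print(best, packet_size(best))
```

The packet size for the largest passing payload is an estimate of the usable path MTU, which can guide the MTU value set on the uplink.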
10. Deactivate outbound SSH and reactivate it
In configuration mode, deactivate the outbound SSH client using the commands below:
deactivate system services outbound-ssh client mist
commit
Then reactivate it and commit again:
activate system services outbound-ssh client mist
commit