This section provides procedures for replacing an AMC590 HDD.
Assumptions
No USB keys are used for installation of the AMC590 HDD replacement.
Workflow
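Step | Action |
---|---|
1 | Optional. Using the unaffected MGMT card, create a tar file backup of the cpu_ss7gw and backups directories. Refer to Create a Backup Using the Unaffected MGMT Card. |
2 | On the MGMT card with the failed HDD, comment out the LOGS and STATS entries in the /etc/fstab file and then shut down the card. Refer to Shut down the Failed HDD. |
3 | Replace the AMC590 HDD, format the new drive, and restore the backed-up files. Refer to Replace an AMC590 HDD. |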
Useful Information
Execute the following commands at any time, as necessary.
This will fix most cursor/page navigation issues in editors such as vim:
export TERM=xterm
These will fix many display issues. For example, if logged in through the console and the terminal does not scroll properly:
reset
stty sane
If the emergency mode prompt appears after boot, enter the root password to log in and obtain shell access:
[ OK ] Started Update UTMP about System Boot/Shutdown.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started Update UTMP about System Runlevel Changes.
[ OK ] Started Crash recovery kernel arming.
Welcome to emergency mode! After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to
try again to boot into default mode.
Give root password for maintenance
(or type control-D to continue): <enter root password here>
Create a Backup Using the Unaffected MGMT Card
Use this optional procedure to create a backup of the cpu_ss7gw and backups directories on the MGMT card that did not suffer an HDD failure.
Start
Execute the following commands from the bash shell prompt. Because this is not the affected MGMT card, an SSH session can be used and console access is not required.
cd /var/log/
tar -cvf files.tar cpu_ss7gw/ backups/
Example Output
[root@OTT42slot14 ~]# cd /var/log
[root@OTT42slot14 log]# tar -cvf files.tar cpu_ss7gw/ backups/
cpu_ss7gw/
cpu_ss7gw/previous
cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/
cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/usb/
cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/usb/stp_sp2k_19_0_0_nb20210429_upgrade.sh
cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/load/
cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/load/stp_sp2k_19_0_0_nb20210429.sh
cpu_ss7gw/upgrade_data/
cpu_ss7gw/upgrade_data/PRE_UPGRADE_DATA.stp_sp2k_19_0_0_nb20210429.Thu_Feb__9_16_18_09_EST_2023
cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/
cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/usb/
cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/usb/stp_sp2k_21_0_1_nb20220425_upgrade.sh
cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/load/
cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/load/stp_sp2k_21_0_1_nb20220425.sh
cpu_ss7gw/current
backups/
backups/OTT42slot24-bak-09-11-21-141718-Complete-SP2K.19.0.tar.gz
backups/OTT42slot14-bak-09-02-23-162651-Complete-SP2K.19.0.tar.gz
backups/usb/
backups/OTT42slot14-bak-04-06-21-203147-Complete-SP2K.19.0.tar.gz
backups/OTT42slot24-bak-09-02-23-201844-Complete-SP2K.21.0.tar.gz
[root@OTT42slot14 log]#
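For extra assurance that the backup is usable, the archive can be read back immediately after it is created. The following is a minimal sketch, not part of the product tooling. create_log_backup is a hypothetical helper name, and its default /var/log path and files.tar name are taken from the commands above.

```shell
# Sketch only: create_log_backup is a hypothetical helper, not a product command.
# Archives cpu_ss7gw/ and backups/ from src_dir, then reads the archive back.
create_log_backup() {
    local src_dir="${1:-/var/log}"   # directory containing cpu_ss7gw/ and backups/
    local archive="${2:-files.tar}"  # archive name used elsewhere in this procedure
    (
        cd "$src_dir" || exit 1
        tar -cf "$archive" cpu_ss7gw/ backups/ || exit 1
        # Listing the archive reads it end to end; a truncated or corrupted
        # file normally makes tar exit with a non-zero status.
        tar -tf "$archive" > /dev/null || exit 1
        echo "Created $src_dir/$archive with $(tar -tf "$archive" | wc -l) entries"
    )
}
```

For example, running create_log_backup /var/log files.tar on the unaffected card produces /var/log/files.tar, which is the file that the restore step later copies with scp.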
Shut down the Failed HDD
On the MGMT card with the HDD failure, comment out the LOGS and STATS entries in the /etc/fstab file and then shut down the affected CPU.
Perform the following procedures from the bash shell prompt.
All procedures on the affected MGMT card (the one associated with the HDD failure) are to be performed via a console login session.
Start
Edit the /etc/fstab file by executing the following commands:
export TERM=xterm
vim /etc/fstab
Comment out the LABEL=LOGS and LABEL=STATS entries by adding a ‘#’ at the beginning of each line. When finished, save and exit vim by entering the write-and-quit command in command mode:
:wq
From the bash shell, issue the following command to verify the changes:
cat /etc/fstab
The output should look similar to the following. Note that the two lines are now commented out.
Example Output
[root@sp2k3ulab68slot14 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Sun Nov 17 22:28:32 2019
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
LABEL=FS2 / ext4 noatime,nodiratime 1 1
LABEL=BOOT /boot ext4 noatime,nodiratime 1 2
LABEL=FS1 /upgrade ext4 noatime,nodiratime 1 2
none /tmp tmpfs size=256M 0 0
#LABEL=LOGS /shared/log ext4 noatime,nodiratime 1 2
#LABEL=STATS /shared/stats ext4 noatime,nodiratime 1 2
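As an alternative to editing the file in vim, the same change can be made non-interactively with sed. This sketch assumes the LOGS and STATS entries begin at the start of the line, exactly as in the example output. comment_out_shared_mounts is a hypothetical helper, and the .bak copy it leaves behind is an addition that this procedure does not require.

```shell
# Sketch only: comment_out_shared_mounts is a hypothetical helper.
# Prefixes '#' to the LABEL=LOGS and LABEL=STATS lines and keeps a .bak copy.
comment_out_shared_mounts() {
    local fstab="${1:-/etc/fstab}"
    # Only lines that start with LABEL=LOGS or LABEL=STATS are changed;
    # lines that are already commented out do not match and stay as they are.
    sed -i.bak -E 's/^(LABEL=(LOGS|STATS)[[:space:]])/#\1/' "$fstab" || return 1
    # Expect exactly two commented-out entries afterwards.
    local n
    n=$(grep -cE '^#LABEL=(LOGS|STATS)[[:space:]]' "$fstab")
    if [ "$n" -ne 2 ]; then
        echo "Expected 2 commented entries in $fstab, found $n" >&2
        return 1
    fi
    # Show the two affected lines for a visual check.
    grep -E '^#?LABEL=(LOGS|STATS)[[:space:]]' "$fstab"
}
```

Run it as comment_out_shared_mounts /etc/fstab on the affected card, then confirm the result with cat /etc/fstab.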
Halt the affected CPU by executing the following command from the bash shell:
shutdown now
Wait for the shutdown procedures to complete. The last couple of lines printed on the console should be similar to the following:
Example Output
[ OK ] Reached target Shutdown.
[2836.447156] Power down.
Replace an AMC590 HDD
Use this procedure to replace an AMC590 disk.
Before you remove an AMC, refer to Extracting and Inserting an AMC for guidance to avoid breaking the fragile ejector handle.
In this example, the Carrier blade slot is 7 and the AMC590 HDD to be replaced is in slot 1.
Start
- Log in to the NDM as mtc and deactivate the faulty AMC590 HDD from the VSE Genius CLI.
At the CLI prompt, enter the following command:
cli> hardware app-blade deactivate <frame> <shelf> <VSE slot> <carrier blade subslot>
Example Output
cli> ha app-blade deactivate 0 0 7 1
%Warning: You are about to deactivate the hardware. This will power-off the hardware. The hardware will not recover unless the activate command is used to power it on.
Do you want to continue? confirm (y/n)>y
%Warning: This is an Open-slot blade. To avoid any service impact you must ensure the application is locked before continuing.
Do you want to continue? confirm (y/n)>y
%Progress: Command initiated successfully.
%Progress: Command completed successfully.
Extract the existing AMC590 HDD and insert the new AMC590 HDD into the appropriate Carrier blade in bay 1.
Activate the new (replacement) AMC590 HDD from the VSE Genius CLI:
cli> hardware app-blade activate <frame> <shelf> <VSE slot> <carrier blade subslot>
Example Output
cli> ha app-blade activate 0 0 7 1
%Progress: Command initiated successfully.
%Progress: Command completed successfully.
The affected CPU, which was shut down earlier, can now be powered up again by extracting and re-inserting the AMC121 card. Refer to the procedure Extracting and Inserting an AMC.
Once the login prompt is displayed, wait several minutes to allow any background processes to complete their startup procedures.
After logging in, compare the output of the wd command on the affected and unaffected CPUs. The watchdog process counts should match.
- Log in to the system as root to access the bash shell.
Execute the following commands, one at a time, from the bash shell.
systemctl stop crond
systemctl stop sonus-application
systemctl stop logmonitor
systemctl stop snmpdmonitor
systemctl stop tuned
systemctl stop pacemaker
systemctl stop corosync
systemctl stop blkmonitor
systemctl stop pcsd
systemctl stop idled
systemctl stop tspmonitor
systemctl stop ntpmonitor
killall httpd
killall -KILL httpd
Note that one or both of the two killall commands may display an error such as:
Example Error
httpd: no process found
This can safely be ignored.
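The stop commands above can also be collected into one function that stops the services in the same order and reports any that fail. This is a hypothetical sketch: stop_card_services and the DRY_RUN variable are not product commands. Setting DRY_RUN=1 only prints the commands, which is a convenient way to review the list before running it for real.

```shell
# Sketch only: stop_card_services is a hypothetical helper.
# Stops the services in the same order as the manual list above.
# With DRY_RUN=1 it prints the commands instead of running them.
stop_card_services() {
    local svc
    for svc in crond sonus-application logmonitor snmpdmonitor tuned \
               pacemaker corosync blkmonitor pcsd idled tspmonitor ntpmonitor; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "systemctl stop $svc"
        else
            systemctl stop "$svc" || echo "warning: failed to stop $svc" >&2
        fi
    done
    # Send SIGTERM to httpd, then SIGKILL to any survivors.
    # "httpd: no process found" from either command is harmless.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "killall httpd"
        echo "killall -KILL httpd"
    else
        killall httpd
        killall -KILL httpd
    fi
}
```

For example, DRY_RUN=1 stop_card_services lists the fourteen commands without changing anything on the card.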
Execute the following commands at the bash shell:
rm -rf /var/log/*
rm -rf /var/stats/*
umount /var/log
umount /var/stats
Note that one or both of the two umount commands may display an error such as:
Example Error
umount: /var/log: not mounted
umount: /var/stats: not mounted
This can safely be ignored.
Note: If umount reports that a filesystem cannot be unmounted because it is busy, use the fuser command to find the processes that are still using it (for example, for /var/log):
fuser -m /var/log
Then stop those processes one at a time with the kill command, using the process IDs that fuser reports.
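The unmount step and the busy-filesystem check can be combined so that umount runs only on a path that is actually mounted, and fuser runs automatically if umount fails. This is a sketch that assumes a Linux /proc/mounts file. is_mounted and unmount_if_mounted are hypothetical helper names.

```shell
# Sketch only: is_mounted and unmount_if_mounted are hypothetical helpers.

# Succeeds if $1 appears as a mount point (field 2) in /proc/mounts.
is_mounted() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/mounts
}

# Unmounts $1 only if it is mounted; on failure, lists the processes using it.
unmount_if_mounted() {
    local mnt="$1"
    if ! is_mounted "$mnt"; then
        echo "$mnt: not mounted, skipping"
        return 0
    fi
    if ! umount "$mnt"; then
        echo "$mnt is busy; processes using it:" >&2
        fuser -vm "$mnt" >&2
        return 1
    fi
}
```

For example, run unmount_if_mounted /var/log and then unmount_if_mounted /var/stats. The sketch only reports the processes. Deciding which ones to kill remains a manual step.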
Continue on the affected CPU (the card associated with the HDD failure):
Execute the format_stats_logs_hard_drive script by issuing the following command at the bash shell prompt:
format_stats_logs_hard_drive
Answer ‘y’ when prompted to proceed. The output should be similar to the following:
Example Output
=========================================
TARGET FOR PARTITIONING/FORMATTING:
Board: AMC121
Hard Drive: /dev/sda
=========================================
If you choose to partition/format the hard drive, all data on that drive will be erased.
Are you sure you want to partition/format the hard drive? [y/n] y
Erasing existing partitions and filesystems.
Existing partitions and filesystems successfully erased.
Partitioning the hard drive.
Hard drive successfully partitioned.
Formatting filesystem(s) on the hard drive.
Filesystem(s) successfully formatted on the hard drive.
Mounting /dev/sda1 on /shared/stats.
/dev/sda1 successfully mounted on /shared/stats.
Mounting /dev/sda2 on /shared/log.
/dev/sda2 successfully mounted on /shared/log.
Creating required file/directory structure on the new filesytem(s).
File/directory structure successfully created.
All done.
If the optional backup was created, issue the following commands at the bash prompt once the script has completed. Replace <unaffected slot> with the slot number of the unaffected MGMT card (the one that did not suffer an HDD failure). For example, when replacing the HDD on slot 24, the unaffected MGMT card is slot 14, so the scp command would be: scp slot14_0:/var/log/files.tar .
cd /var/log/
scp slot<unaffected slot>_0:/var/log/files.tar .
Example Output
files.tar 100% 1139MB 29.2MB/s 00:39
Then extract the archive and remove it:
tar -xvf files.tar
rm -f files.tar
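The extract-and-remove step can be guarded so that files.tar is deleted only after it has been read and unpacked successfully. This leaves the copy in place for another attempt if the transfer was truncated. restore_log_backup is a hypothetical helper, and the sketch assumes the archive has already been copied into /var/log with scp.

```shell
# Sketch only: restore_log_backup is a hypothetical helper.
# Unpacks the archive in dest and removes it only if every step succeeded.
restore_log_backup() {
    local dest="${1:-/var/log}"
    local archive="${2:-files.tar}"
    (
        cd "$dest" || exit 1
        # Reading the archive first catches a damaged copy before anything is written.
        if ! tar -tf "$archive" > /dev/null; then
            echo "$archive is unreadable; keeping it" >&2
            exit 1
        fi
        tar -xvf "$archive" || exit 1
        rm -f "$archive"
    )
}
```

For example, run restore_log_backup /var/log files.tar on the affected card after the scp copy completes.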
Edit the /etc/fstab file by executing the following commands:
export TERM=xterm
vim /etc/fstab
Uncomment the LABEL=LOGS and LABEL=STATS entries by removing the ‘#’ from the beginning of each line. When finished, save and exit vim by entering the write-and-quit command in command mode:
:wq
From the bash shell, issue the following command to verify the changes:
cat /etc/fstab
The output should look similar to the following. Note that the two lines are no longer commented out.
Example Output
#
# /etc/fstab
# Created by anaconda on Sun Nov 17 22:28:32 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
LABEL=FS2 / ext4 noatime,nodiratime 1 1
LABEL=BOOT /boot ext4 noatime,nodiratime 1 2
LABEL=FS1 /upgrade ext4 noatime,nodiratime 1 2
none /tmp tmpfs size=256M 0 0
LABEL=LOGS /shared/log ext4 noatime,nodiratime 1 2
LABEL=STATS /shared/stats ext4 noatime,nodiratime 1 2
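The reverse edit can also be made with sed instead of vim. This sketch assumes the entries look like the example output, each with a single leading '#'. uncomment_shared_mounts is a hypothetical helper that keeps a .bak copy of the original file.

```shell
# Sketch only: uncomment_shared_mounts is a hypothetical helper.
# Removes a leading '#' (and any spaces after it) from the LOGS and STATS lines.
uncomment_shared_mounts() {
    local fstab="${1:-/etc/fstab}"
    sed -i.bak -E 's/^#[[:space:]]*(LABEL=(LOGS|STATS)[[:space:]])/\1/' "$fstab" || return 1
    # Expect exactly two active (uncommented) entries afterwards.
    local n
    n=$(grep -cE '^LABEL=(LOGS|STATS)[[:space:]]' "$fstab")
    if [ "$n" -ne 2 ]; then
        echo "Expected 2 active entries in $fstab, found $n" >&2
        return 1
    fi
    # Show the two affected lines for a visual check.
    grep -E '^LABEL=(LOGS|STATS)[[:space:]]' "$fstab"
}
```

Run it as uncomment_shared_mounts /etc/fstab, then confirm the result with cat /etc/fstab before rebooting.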
Complete the HDD replacement by issuing the following command at the bash shell prompt, which reboots the card:
rebootd
Once the card has finished rebooting, wait several minutes to allow any background processes to complete their startup procedures. Then log in as root and verify that the card is healthy. For example, run the wd command to confirm that all processes are running as expected, and run the df command to confirm that the /shared/stats and /shared/log filesystems on the new HDD are mounted. The output should include entries such as the following:
df -hT
Example Output
[root@sp2k13Uslot24 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda3 ext3 1.6G 1.2G 283M 82% /
devtmpfs devtmpfs 914M 8.0K 914M 1% /dev
tmpfs tmpfs 936M 39M 898M 5% /dev/shm
tmpfs tmpfs 936M 17M 919M 2% /run
tmpfs tmpfs 936M 0 936M 0% /sys/fs/cgroup
none tmpfs 256M 12K 256M 1% /tmp
/dev/sdb1 ext4 58G 53M 55G 1% /shared/stats
/dev/sdb2 ext4 172G 814M 163G 1% /shared/log
/dev/sda2 ext4 1.6G 625M 869M 42% /upgrade
/dev/sda1 ext4 486M 125M 332M 28% /boot
tmpfs tmpfs 188M 0 188M 0% /run/user/0
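Instead of scanning the df output by eye, the two mount points can be checked with a short awk filter. This sketch assumes the seven-column layout of df -hT shown above. check_shared_mounts is a hypothetical helper that reads df output on standard input, so it can also check saved output.

```shell
# Sketch only: check_shared_mounts is a hypothetical helper.
# Reads `df -hT` output on stdin; column 2 is the type, column 7 the mount point.
# Exits non-zero unless both /shared/stats and /shared/log are mounted as ext4.
check_shared_mounts() {
    awk '
        $7 == "/shared/stats" && $2 == "ext4" { stats = $1 }
        $7 == "/shared/log"   && $2 == "ext4" { log_dev = $1 }
        END {
            if (stats == "" || log_dev == "") { print "missing /shared mounts"; exit 1 }
            print "/shared/stats on " stats ", /shared/log on " log_dev
        }'
}
```

On the card, run df -hT | check_shared_mounts. A non-zero exit status means that at least one of the two filesystems is missing or is not ext4.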
Check the directory listings of /var/log and /var/stats to confirm that the usual log files have been created and are actively being written to.
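One rough way to confirm that logging has resumed is to look for files modified within the last few minutes. This sketch uses find -mmin. recent_activity is a hypothetical helper, and its 10-minute default window is an arbitrary assumption rather than a product requirement.

```shell
# Sketch only: recent_activity is a hypothetical helper.
# Prints regular files under $1 modified within the last $2 minutes (default 10);
# returns non-zero if there are none.
recent_activity() {
    local dir="$1" minutes="${2:-10}"
    local files
    files=$(find "$dir" -type f -mmin "-$minutes" 2>/dev/null)
    if [ -z "$files" ]; then
        echo "no files in $dir modified in the last $minutes minutes" >&2
        return 1
    fi
    printf '%s\n' "$files"
}
```

For example, run recent_activity /var/log 10 and recent_activity /var/stats 10 on the card after it has rebooted.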