This section provides procedures for replacing an AMC590 HDD.

Assumptions

No USB keys are used for installation of the AMC590 HDD replacement.

Workflow
Step  Action
1     Optional. Using the unaffected HDD, create a tar file backup of the cpu_ss7gw and backups directories. Refer to Create a Backup Using the Unaffected MGMT Card.
2     On the failed HDD, update the /etc/fstab file to preserve the mount point and then shut down the failed HDD. Refer to Shut down the Failed HDD.
3     Replace an AMC590 HDD.

Useful Information
Note

Execute the following commands at any time, as necessary.

This will fix most cursor/page navigation issues in editors such as vim:

export TERM=xterm

These will fix many display issues. For example, if logged in through the console and the terminal does not scroll properly:

reset

stty sane

Note

If the emergency mode prompt is encountered after boot, enter the root password to log in and obtain shell access:

[  OK  ] Started Update UTMP about System Boot/Shutdown.

         Starting Update UTMP about System Runlevel Changes...

[  OK  ] Started Update UTMP about System Runlevel Changes.

[  OK  ] Started Crash recovery kernel arming.

Welcome to emergency mode! After logging in,type "journalctl -xb" to view

system logs, "systemctl reboot" to reboot, "systemctl default" or ^D

to try again to boot into default mode.

Give root password for maintenance

(or type control-D to continue): <enter root password here>
Create a Backup Using the Unaffected MGMT Card

Use this procedure to create a backup of the cpu_ss7gw and backups directories using the MGMT card that did not suffer an HDD failure. This procedure is optional.

Start
  1. Execute the following commands from the bash shell prompt. Because this is not the affected MGMT card, the console is not required and an SSH session can be used:

    cd /var/log/
    tar -cvf files.tar cpu_ss7gw/ backups/
    Example Output
    [root@OTT42slot14 ~]# cd /var/log
    [root@OTT42slot14 log]# tar -cvf files.tar cpu_ss7gw/ backups/
    cpu_ss7gw/
    cpu_ss7gw/previous
    cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/
    cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/usb/
    cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/usb/stp_sp2k_19_0_0_nb20210429_upgrade.sh
    cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/load/
    cpu_ss7gw/stp_sp2k_19_0_0_nb20210429/load/stp_sp2k_19_0_0_nb20210429.sh
    cpu_ss7gw/upgrade_data/
    cpu_ss7gw/upgrade_data/PRE_UPGRADE_DATA.stp_sp2k_19_0_0_nb20210429.Thu_Feb__9_16_18_09_EST_2023
    cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/
    cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/usb/
    cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/usb/stp_sp2k_21_0_1_nb20220425_upgrade.sh
    cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/load/
    cpu_ss7gw/stp_sp2k_21_0_1_nb20220425/load/stp_sp2k_21_0_1_nb20220425.sh
    cpu_ss7gw/current
    backups/
    backups/OTT42slot24-bak-09-11-21-141718-Complete-SP2K.19.0.tar.gz
    backups/OTT42slot14-bak-09-02-23-162651-Complete-SP2K.19.0.tar.gz
    backups/usb/
    backups/OTT42slot14-bak-04-06-21-203147-Complete-SP2K.19.0.tar.gz
    backups/OTT42slot24-bak-09-02-23-201844-Complete-SP2K.21.0.tar.gz
    [root@OTT42slot14 log]# 
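
Before relying on the archive later in the replacement procedure, it may be worth confirming that it is readable and contains both directories. The following is a minimal sketch, not part of the product tooling: verify_backup is a hypothetical helper name, and the check only looks for the expected top-level directory names in the archive listing.

```shell
# Sketch: confirm that a tar archive is readable and contains the expected
# top-level directories. verify_backup is a hypothetical helper name.
verify_backup() {
    local archive="$1"
    shift
    local listing dir
    # tar -tf lists the members without extracting; failure means the
    # archive is missing or unreadable.
    listing="$(tar -tf "$archive")" || return 1
    for dir in "$@"; do
        # Each expected directory should appear as a path prefix in the listing.
        if ! printf '%s\n' "$listing" | grep "^${dir}/" >/dev/null; then
            echo "missing: ${dir}/" >&2
            return 1
        fi
    done
    echo "ok"
}
```

For the archive created above, the call would be "verify_backup /var/log/files.tar cpu_ss7gw backups".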

Shut down the Failed HDD 

On the MGMT card with the HDD failure, update the /etc/fstab file to preserve the mount point and then shut down the disk.

Perform the following procedures from the bash shell prompt.

Note

All procedures on the affected MGMT card (the one associated with the HDD failure) are to be performed via a console login session.

Start
  1. Edit the /etc/fstab file by executing the following commands:

    export TERM=xterm
    vim /etc/fstab
  2. Comment out the LABEL=LOGS and LABEL=STATS entries by adding a ‘#’ at the beginning of these lines.  When finished, exit vim by issuing the standard write+quit action from vim’s command mode:

    :wq
  3. From the bash shell, issue the following command to verify the file changes:

    cat /etc/fstab

    The output should look similar to the following.  Note the two lines which have now been commented out.

    Example Output
    [root@sp2k3ulab68slot14 ~]# cat /etc/fstab
    #
    # /etc/fstab
    # Created by anaconda on Sun Nov 17 22:28:32 2019
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    LABEL=FS2   /                       ext4    noatime,nodiratime        1 1
    LABEL=BOOT  /boot                   ext4    noatime,nodiratime        1 2
    LABEL=FS1   /upgrade                ext4    noatime,nodiratime        1 2
    none        /tmp                    tmpfs   size=256M                 0 0
    #LABEL=LOGS  /shared/log             ext4    noatime,nodiratime        1 2
    #LABEL=STATS /shared/stats           ext4    noatime,nodiratime        1 2
  4. Halt the affected CPU by executing the following command from the bash shell:

    shutdown now
  5. Wait for the shutdown procedures to complete.  The last couple of lines printed on the console should be similar to the following:

    Example Output
    [  OK  ] Reached target Shutdown.
    
    [2836.447156] Power down.
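
The manual vim edit of /etc/fstab in steps 1 to 3 above could also be expressed as a non-interactive edit. The following is a sketch only, assuming GNU sed (for the -i and -E options); comment_out_labels is a hypothetical helper name and the .bak suffix is an arbitrary choice. It is safest to try it on a copy of the file first.

```shell
# Sketch: comment out the LABEL=LOGS and LABEL=STATS entries in an
# fstab-style file. comment_out_labels is a hypothetical helper name.
comment_out_labels() {
    local file="$1"
    # Keep one backup copy next to the file; do not overwrite an existing one.
    if [ ! -e "${file}.bak" ]; then
        cp -p "$file" "${file}.bak" || return 1
    fi
    # Prefix the matching lines with '#'. Lines that are already commented
    # out do not match the ^LABEL anchor, so re-running is harmless.
    sed -i -E 's/^(LABEL=(LOGS|STATS)[[:space:]])/#\1/' "$file"
}
```

A trial run might look like "cp /etc/fstab /tmp/fstab.test" followed by "comment_out_labels /tmp/fstab.test" and "cat /tmp/fstab.test", where /tmp/fstab.test is an illustrative path.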

Replace an AMC590 HDD

Use this procedure to replace an AMC590 disk.

Note

Before you remove an AMC, refer to Extracting and Inserting an AMC for guidance to avoid breaking the fragile ejector handle.

Tip

In this example, the Carrier blade slot is 7 and the AMC590 HDD to be replaced is in slot 1.

Start
  1. Deactivate the faulty AMC590 HDD from the VSE Genius CLI by logging into the NDM as mtc.
  2. At the CLI prompt, enter the following command:

    cli> hardware app-blade deactivate <frame> <shelf> <VSE slot> <carrier blade subslot>
    Example Output
    cli> ha app-blade deactivate 0 0 7 1
    
    %Warning: You are about to deactivate the hardware. This will power-off the hardware. The hardware will not recover unless the activate command is used to power it on. Do you want to continue?
    
    confirm (y/n)>y
    
    %Warning: This is an Open-slot blade. To avoid any service impact you must ensure the application is locked before continuing. Do you want to continue?
    
    confirm (y/n)>y
    
    %Progress: Command initiated successfully.
    
    %Progress: Command completed successfully.
  3. Extract the existing AMC590 HDD and insert the new AMC590 HDD into the appropriate Carrier blade in bay 1.

  4. Activate the new (replacement) AMC590 HDD from the VSE Genius CLI:

    cli> hardware app-blade activate <frame> <shelf> <VSE slot> <carrier blade subslot>
    Example Output
    cli> ha app-blade activate 0 0 7 1
    %Progress: Command initiated successfully.
    %Progress: Command completed successfully.

    At this point, the affected CPU (that was previously shut down) can now be powered up again.  This can be achieved by extracting and inserting the AMC121 card. Refer to the procedure Extracting and Inserting an AMC.

  5. Once the login prompt is displayed, wait several minutes to allow any background processes to complete their startup procedures.

    After logging in, compare the output of the "wd" command on the affected CPU with the output on the unaffected CPU.  The number of watchdog processes should match.

    Example Output
    [root@cvm17sp2klab18slot14 ~]# wd
    WID  PID   STA  RPT  RHT  HTV  HTT  COM
    1    9088  S+   0    0    10   10   slotmon --platform SP2K -s 14 -g 0 -p /op...
    2    11179 S+   0    0    10   10   pxbr -p /opt/cpu_ss7gw/current/data --pla...
    3    11187 S+   0    0    10   10   upp -p /opt/cpu_ss7gw/current/data/ --pla...
    4    11328 S+   0    0    10   10   sca -p /opt/cpu_ss7gw/current/data/
    5    11502 S+   0    0    10   10   sysmon.default -p /opt/cpu_ss7gw/current/...
    6    11681 S+   0    0    10   10   sysmon.cpucheck -p /opt/cpu_ss7gw/current...
    7    11911 S+   0    0    10   10   sysmon.diskmon -p /opt/cpu_ss7gw/current/...
    8    12078 S+   0    0    10   10   sysmon.logcpu -p /opt/cpu_ss7gw/current/s...
    9    12300 S+   0    0    10   10   sysmon.logmem -p /opt/cpu_ss7gw/current/s...
    10   12526 S+   0    0    10   10   sysmon.usbcheck -p /opt/cpu_ss7gw/current...
    11   12747 S+   0    0    10   10   sysmon.slotinventory -p /opt/cpu_ss7gw/cu...
    12   12956 S+   0    0    10   10   sysmon.configuremtu -p /opt/cpu_ss7gw/cur...
    13   13167 S+   0    0    10   10   sysmon.configaudit -p /opt/cpu_ss7gw/curr...
    14   32754 S+   8    1    10   10   hwmon --platform SP2K -c 10
    15   13816 S+   0    0    10   10   licensed -r -f /opt/cpu_ss7gw/current/dat...
    16   26780 S+   1    0    10   10   snm --platform SP2K -p /opt/cpu_ss7gw/cur...
    17   13859 S+   0    0    10   10   gws --platform SP2K -p /opt/cpu_ss7gw/cur...
    18   14547 S+   1    0    10   10   dre --platform SP2K -p /opt/cpu_ss7gw/cur...
    19   13960 S+   0    0    10   10   sccp --platform SP2K -p /opt/cpu_ss7gw/cu...
    20   16158 S+   0    1    10   10   dinamo --platform SP2K -p /opt/cpu_ss7gw/...
    21   14071 S+   0    0    10   10   l4cvtr --platform SP2K -p /opt/cpu_ss7gw/...
    
    [root@cvm17sp2klab18slot24 ~]# wd
    WID  PID   STA  RPT  RHT  HTV  HTT  COM
    1    30809 S+   0    0    10   10   slotmon --platform SP2K -s 24 -g 0 -p /op...
    2    3620  S+   0    0    10   10   pxbr -p /opt/cpu_ss7gw/current/data --pla...
    3    3634  S+   0    0    10   10   upp -p /opt/cpu_ss7gw/current/data/ --pla...
    4    3664  S+   0    0    10   10   sca -p /opt/cpu_ss7gw/current/data/
    5    3758  S+   0    0    10   10   sysmon.default -p /opt/cpu_ss7gw/current/...
    6    3925  S+   0    0    10   10   sysmon.cpucheck -p /opt/cpu_ss7gw/current...
    7    4049  S+   0    0    10   10   sysmon.diskmon -p /opt/cpu_ss7gw/current/...
    8    4264  S+   0    0    10   10   sysmon.logcpu -p /opt/cpu_ss7gw/current/s...
    9    4469  S+   0    0    10   10   sysmon.logmem -p /opt/cpu_ss7gw/current/s...
    10   4684  S+   0    0    10   10   sysmon.usbcheck -p /opt/cpu_ss7gw/current...
    11   4804  S+   0    0    10   10   sysmon.slotinventory -p /opt/cpu_ss7gw/cu...
    12   4904  S+   0    0    10   10   sysmon.configuremtu -p /opt/cpu_ss7gw/cur...
    13   5140  S+   0    0    10   10   sysmon.configaudit -p /opt/cpu_ss7gw/curr...
    14   5250  S+   0    0    10   10   hwmon --platform SP2K -c 20
    15   7118  S+   0    0    10   10   licensed -r -f /opt/cpu_ss7gw/current/dat...
    16   7136  S+   0    0    10   10   snm --platform SP2K -p /opt/cpu_ss7gw/cur...
    17   7184  S+   0    0    10   10   gws --platform SP2K -p /opt/cpu_ss7gw/cur...
    18   7330  S+   0    0    10   10   dre --platform SP2K -p /opt/cpu_ss7gw/cur...
    19   7362  S+   0    0    10   10   sccp --platform SP2K -p /opt/cpu_ss7gw/cu...
    20   7453  S+   0    0    10   10   dinamo --platform SP2K -p /opt/cpu_ss7gw/...
    21   7477  S+   0    0    10   10   l4cvtr --platform SP2K -p /opt/cpu_ss7gw/...
  6. Log in to the system as root to access the bash shell.
  7. Execute the following commands, one at a time, from the bash shell.

    systemctl stop crond
    
    systemctl stop sonus-application
    
    systemctl stop logmonitor
    
    systemctl stop snmpdmonitor
    
    systemctl stop tuned
    
    systemctl stop pacemaker
    
    systemctl stop corosync
    
    systemctl stop blkmonitor
    
    systemctl stop pcsd
    
    systemctl stop idled
    
    systemctl stop tspmonitor
    
    systemctl stop ntpmonitor
    
    killall httpd
    
    killall -KILL httpd
    
  8. Note that one or both of the last two commands (killall) may display an error such as:

    Example Error
    httpd: no process found

    This can safely be ignored.

  9. Execute the following commands at the bash shell:

    rm -rf /var/log/*
    rm -rf /var/stats/*
    umount /var/log
    umount /var/stats

    Note that one or both of these last two commands may display an error such as:

    Example Errors
    umount: /var/log: not mounted
    umount: /var/stats: not mounted

    This can safely be ignored.

    Note

    If the umount command reports that the filesystem cannot be unmounted because it is busy, use the fuser command to determine which processes are still using it (for example, for /var/log):

    fuser -m /var/log

    The process IDs that fuser reports can then be terminated one by one with the kill command, as necessary.

    Continue on the affected CPU (the card associated with the HDD failure).

  10. Execute the format_stats_logs_hard_drive script by issuing the following command at the bash shell prompt:

    format_stats_logs_hard_drive
  11. Answer ‘y’ when prompted to proceed.  The output should be similar to the following:

    Example Output
    =========================================
       TARGET FOR PARTITIONING/FORMATTING:
       Board:               AMC121
       Hard Drive:          /dev/sda
    =========================================
    
    If you choose to partition/format the hard drive, all data on that drive will be erased.  Are you sure you want to partition/format the hard drive? [y/n] y
    
    Erasing existing partitions and filesystems.
    
    Existing partitions and filesystems successfully erased.
    
    Partitioning the hard drive.
    
    Hard drive successfully partitioned.
    
    Formatting filesystem(s) on the hard drive.
    
    Filesystem(s) successfully formatted on the hard drive.
    
    Mounting /dev/sda1 on /shared/stats.
    
    /dev/sda1 successfully mounted on /shared/stats.
    
    Mounting /dev/sda2 on /shared/log.
    
    /dev/sda2 successfully mounted on /shared/log.
    
    Creating required file/directory structure on the new filesytem(s).
    
    File/directory structure successfully created.
    
    All done.
  12. Once the script has completed, issue the following commands at the bash prompt to restore the backup. These commands apply only if the optional files.tar backup was created on the unaffected MGMT card.  Replace “<unaffected slot>” with the slot number of the unaffected MGMT card (the one that did not suffer an HDD failure).  For example, if the HDD is being replaced on slot 24, the unaffected MGMT card is slot 14 and the scp command is “scp slot14_0:/var/log/files.tar .”

    cd /var/log/
    scp slot<unaffected slot>_0:/var/log/files.tar .
    tar -xvf files.tar
    rm -f files.tar

    The scp command should report progress similar to the following:

    Example Output
    files.tar                                     100% 1139MB  29.2MB/s   00:39
  13. Edit the /etc/fstab file by executing the following commands:

    export TERM=xterm
    vim /etc/fstab
  14. Uncomment the LABEL=LOGS and LABEL=STATS entries by removing the ‘#’ from the beginning of these lines.  When finished, be sure to exit vim by issuing the standard write+quit command from vim’s command mode:

    :wq
  15. From the bash shell, issue the following command to verify the file changes:

    cat /etc/fstab
  16. The output should look similar to the following.  Note the two lines are no longer commented out.

    Example Output
    #
    # /etc/fstab
    # Created by anaconda on Sun Nov 17 22:28:32 2019
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    LABEL=FS2   /                       ext4    noatime,nodiratime        1 1
    LABEL=BOOT  /boot                   ext4    noatime,nodiratime        1 2
    LABEL=FS1   /upgrade                ext4    noatime,nodiratime        1 2
    none        /tmp                    tmpfs   size=256M                 0 0
    LABEL=LOGS  /shared/log             ext4    noatime,nodiratime        1 2
    LABEL=STATS /shared/stats           ext4    noatime,nodiratime        1 2
  17. Complete the HDD replacement by issuing the following command at the bash shell prompt, which will reboot the card in question:

    rebootd
  18. Once the card has finished rebooting, wait several minutes to allow any background processes to complete their startup procedures, then log in as root and verify that the system is healthy.  For example, execute the ‘wd’ command to ensure all processes are running as expected, and issue the ‘df’ command to ensure that the proper filesystems on the new HDD are mounted.  The output should include entries for /shared/stats and /shared/log, as in the following example:

    df -hT


    Example Output
    [root@sp2k13Uslot24 ~]# df -hT
    Filesystem     Type      Size  Used Avail Use% Mounted on
    /dev/sda3      ext3      1.6G  1.2G  283M  82% /
    devtmpfs       devtmpfs  914M  8.0K  914M   1% /dev
    tmpfs          tmpfs     936M   39M  898M   5% /dev/shm
    tmpfs          tmpfs     936M   17M  919M   2% /run
    tmpfs          tmpfs     936M     0  936M   0% /sys/fs/cgroup
    none           tmpfs     256M   12K  256M   1% /tmp
    /dev/sdb1      ext4       58G   53M   55G   1% /shared/stats
    /dev/sdb2      ext4      172G  814M  163G   1% /shared/log
    /dev/sda2      ext4      1.6G  625M  869M  42% /upgrade
    /dev/sda1      ext4      486M  125M  332M  28% /boot
    tmpfs          tmpfs     188M     0  188M   0% /run/user/0
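
The visual check of the df output in step 18 can also be scripted. The following is a minimal sketch under the assumption that the mount point is the last column of the df output, as in the example above; has_mounts is a hypothetical helper name, not a product command.

```shell
# Sketch: confirm that the expected mount points appear in `df -hT`-style
# output read from standard input. has_mounts is a hypothetical helper name.
has_mounts() {
    # The last field of each df line is the "Mounted on" column.
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, want, " "); missing = 0 }
        { seen[$NF] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) {
                    print "not mounted: " want[i]
                    missing = 1
                }
            exit missing
        }'
}
```

A typical call would be "df -hT | has_mounts /shared/stats /shared/log". The function prints nothing and returns 0 when every listed mount point is present.
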
Note

Check the directory listings for /var/log and /var/stats to ensure that the usual logging files have been set up and are being actively logged to.
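
One way to see whether files are being actively written is to list those modified in the last few minutes. The following is a sketch only; recent_files is a hypothetical helper name, and the five-minute default is an arbitrary assumption rather than a documented logging interval.

```shell
# Sketch: list regular files under a directory that were modified within
# the last N minutes (default 5). recent_files is a hypothetical helper name.
recent_files() {
    local dir="$1"
    local minutes="${2:-5}"
    # -mmin -N matches files whose data was modified less than N minutes ago.
    find "$dir" -type f -mmin "-${minutes}" -print
}
```

For example, "recent_files /var/log" and "recent_files /var/stats 10" list the recently updated files; an empty result after the card has been running for a while suggests that logging has not resumed.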