Hardware alarms are generated on the DSC 8000 chassis to indicate the status of system components. System components include the chassis, Fan Trays, and Power Supplies.
The alarms are accessed from the alarm dashboard in the Web UI. The alarm dashboard indicates system status by showing the number of critical, major, and minor alarms generated by system events. Open an individual alarm in the dashboard to determine its origin. To access the alarms, see section Alarm Dashboard Web User Interface in the Alarms Guide. To view details of individual alarms, see section View Alarms.
The hardware alarms are generated by hardware sensors monitoring system components in the chassis. There are two types of hardware sensors on the DSC 8000:
- Threshold sensors
- Discrete sensors
All system component sensors and alarms are designated as chassis alarms and are monitored by the AMC671 (MCH) card in each chassis. As many as four chassis are present in a multi-shelf DSC 8000 system. Each MCH monitors its own local sensors as well as system component sensors in each chassis. System component sensors and alarms are accessed through the AMC671 (MCH) slot interface in the Web UI. For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.
Threshold sensors and the corresponding SNMP threshold alarms are described in the following sections:
Chassis Threshold Sensors and Alarms
Discrete sensors and the corresponding SNMP alarms are described in the following sections:
Chassis Discrete Sensors and Alarms
In the event an alarm is raised for a system component, each MCH card in the chassis reports the alarm. The slot number for the alarm is set to the reporting MCH's slot number. Each instance of the hwmon application receives notification from both MCH cards in the chassis, and both applications respond with an alarm. Therefore, you will see up to four alarms and traps associated with sensor events for system components: one alarm from each instance of hwmon; and one alarm from each MCH card in the chassis. For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.
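Because one sensor event can appear as up to four alarms, it can help to group the duplicates before investigating. The following is a minimal sketch only: the record fields (`chassis`, `slot`, `source`, `sensor`) are hypothetical names chosen for illustration and are not the actual DSC 8000 alarm or trap format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_duplicate_alarms(alarms: Iterable[dict]) -> Dict[Tuple[int, str], List[dict]]:
    """Group alarm records that describe the same system component sensor.

    Records are keyed by (chassis, sensor name). The reporting slot is left
    out of the key on purpose, because each MCH reports the alarm under its
    own slot number. The field names are assumptions for this example.
    """
    groups: Dict[Tuple[int, str], List[dict]] = defaultdict(list)
    for alarm in alarms:
        groups[(alarm["chassis"], alarm["sensor"])].append(alarm)
    return dict(groups)


# Hypothetical records: the same sensor event reported four times.
example_alarms = [
    {"chassis": 1, "slot": 1, "source": "hwmon-a", "sensor": "CU1 Temp"},
    {"chassis": 1, "slot": 2, "source": "hwmon-a", "sensor": "CU1 Temp"},
    {"chassis": 1, "slot": 1, "source": "hwmon-b", "sensor": "CU1 Temp"},
    {"chassis": 1, "slot": 2, "source": "hwmon-b", "sensor": "CU1 Temp"},
]
grouped = group_duplicate_alarms(example_alarms)
```

In this example all four records fall into a single group, so they can be treated as one underlying event.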
System components follow a consistent naming convention in the Web UI. For example, system Fan Trays are labelled as Cooling Units (CUs) in the Web UI. Similarly, system Power Supplies are labelled as Power Modules (PMs).
Alarm details provided in the Web UI include the hardware sensor's Portal ID and IPMI number, which are dynamically assigned. The ID assignments change when a card is removed or when a new card is inserted in the DSC 8000; therefore, do not use them to identify a hardware alarm.
Chassis Threshold Sensors and Alarms
All threshold sensors available on DSC 8000 cooling units and power modules generate alarms. An alarm is generated by a threshold sensor event. Threshold sensor events consist of voltage or temperature values crossing a pre-defined threshold level.
The following table describes the threshold hardware sensors available on the DSC 8000 chassis.
Some versions of DSC software generate minor temperature alarms when the monitored temperature crosses the Upper Non-Critical (UNC) temperature threshold. Minor temperature threshold crossing events are required for stable operation of the cooling sub-system, and the resulting alarms can be ignored.
As of DSC software Release 15.0, temperature alarms are generated only when the monitored temperature crosses the Upper-Critical (UC) or Upper-Non-Recoverable (UNR) temperature threshold, which generate major and critical alarms, respectively.
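The relationship between temperature thresholds and alarm severities can be summarized in a small lookup. This is an illustrative sketch, not DSC software: the function name and the `release_15_or_later` flag are hypothetical, and it assumes that releases before 15.0 raise the same major and critical alarms in addition to the minor UNC alarm.

```python
from typing import Optional


def temperature_alarm_severity(threshold: str,
                               release_15_or_later: bool = True) -> Optional[str]:
    """Return the alarm severity for a crossed temperature threshold.

    threshold is one of "UNC", "UC", or "UNR". Returns None when no alarm
    is generated for that threshold.
    """
    severities = {"UC": "major", "UNR": "critical"}
    if not release_15_or_later:
        # Assumption: older releases additionally raise a minor alarm on UNC.
        severities["UNC"] = "minor"
    return severities.get(threshold)
```

For example, `temperature_alarm_severity("UNC")` returns `None` under the Release 15.0 behavior, while `temperature_alarm_severity("UNC", release_15_or_later=False)` returns `"minor"`.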
Threshold Sensor Events
Threshold sensor event severity levels are defined as follows:
- Noncritical: This is a warning that one or more operating specifications are somewhat out of normal range, but there is not yet a problem to be addressed. Noncritical events are for information only, and they do not indicate that the DSC 8000 chassis is outside of operating limits. In general, no action is required. However, in certain contexts, system/shelf management software may initiate preventive action. For example, if several cards in a shelf report upper noncritical temperature events, the shelf manager may decide to increase fan speed.
- Critical: The DSC 8000 chassis is operating within specified tolerances, but one or more specifications are getting close to the critical thresholds. Critical events indicate that the card is still within its operating limits, but it is close to exceeding one of those limits. Possible action in this case is to closely monitor the alarming sensor and take more aggressive action if it approaches the nonrecoverable threshold.
- Nonrecoverable: The DSC 8000 chassis is no longer operating within specified tolerances. Nonrecoverable events indicate that the card may no longer be functioning because it is now outside of its operating limits. Action is likely required or has already been taken by the local hardware/firmware. For example, a processor may shut itself down because its maximum die temperature was exceeded, or a shelf manager may deactivate the card because the processor is too hot.
Chassis Voltage Sensor Threshold Levels
The following table shows the voltage threshold levels for the voltage sensors on the DSC 8000 chassis. The threshold sensors trigger an SNMP alarm when a pre-defined sensor threshold level is crossed.
SNMP Threshold Sensor Alarms
This section describes the SNMP alarms generated by threshold sensor events on the DSC 8000 chassis.
The following SNMP alarms are registered when a threshold sensor event occurs on the DSC 8000 chassis.
Chassis Component FRU Data Records
DSC 8000 chassis system components include fan trays and power supplies. System components are monitored by the AMC671 (MCH) cards located in the same chassis. Each power supply and fan tray has a unique ID in the chassis. An FRU data record is available for system components managed by the MCH card.
FRU data records contain information on manufacturing data associated with each component, including:
- Manufacturing date
- Serial number
- Product part number
To access a system component FRU data record, access the Web UI and select the Hardware Monitor application (hwmon) from the System menu. The MCH slots in the chassis are presented. Click on one of the MCH slots to access FRU data on system components managed by the selected MCH card. FRU data records for power supplies are listed in the table, 'Power Supply Selection'. FRU data records for fan trays are listed in the table, 'Fan Tray Selection'.
The following table describes each FRU data record available on the DSC 8000 chassis.
Chassis Discrete Sensors and Alarms
Cooling units and power modules are defined as system components in the DSC 8000 chassis. An alarm is generated when a cooling unit or power module is inserted into or extracted from the chassis.
The following table describes the discrete hardware sensors on the DSC 8000 chassis. Discrete sensors return values of 'on' and 'off' or 'true' and 'false'. Each entity in the system has a 'Version Change' sensor that reports the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification Second Generation (v2.0).
1 Each entity in the system has a hot-swap sensor that reports the entity's FRU state. These states are described in the PICMG® 3.0 AdvancedTCA® Base Specification. The sensor returns a one-bit value for each of the eight states, M0 - M7, as defined in the specification. For example, if bit 0 is set, the FRU is in state M0. Similarly, if bit 4 is set, the sensor returns a value of 16 (00010000b), which is the Normal (Active) state, M4.
The state values include:
[7] – 1b: FRU Operational State M7 = Communication Lost
[6] – 1b: FRU Operational State M6 = FRU Deactivation In Progress
[5] – 1b: FRU Operational State M5 = FRU Deactivation Request
[4] – 1b: FRU Operational State M4 = FRU Active
[3] – 1b: FRU Operational State M3 = FRU Activation in Progress
[2] – 1b: FRU Operational State M2 = FRU Activation Request
[1] – 1b: FRU Operational State M1 = FRU Inactive
[0] – 1b: FRU Operational State M0 = FRU Not Installed
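The bit-to-state mapping above can be expressed as a short decoder. The sketch below is illustrative only: the function and table names are hypothetical, and it assumes the sensor value is available as an 8-bit integer.

```python
from typing import List

# Bit n set -> FRU operational state Mn, following the list above.
HOT_SWAP_STATES = {
    0: "M0 FRU Not Installed",
    1: "M1 FRU Inactive",
    2: "M2 FRU Activation Request",
    3: "M3 FRU Activation in Progress",
    4: "M4 FRU Active",
    5: "M5 FRU Deactivation Request",
    6: "M6 FRU Deactivation In Progress",
    7: "M7 Communication Lost",
}


def decode_hot_swap(value: int) -> List[str]:
    """Return the names of all states whose bit is set in the sensor value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("hot-swap sensor value must fit in 8 bits")
    return [name for bit, name in HOT_SWAP_STATES.items() if value & (1 << bit)]
```

For example, `decode_hot_swap(16)` returns `["M4 FRU Active"]`, matching the Normal (Active) case described above.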
2 Each entity in the system has a 'Version Change' sensor that reports a change in the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification Second Generation (v2.0). The sensor returns a one-bit value for each of eight possible FRU state changes. For example, if bit 0 is set, the condition defined by offset 00h is present. The eight conditions include the following:
00h: hardware change detected (informational). This offset does not indicate whether the hardware change was successful or not, only that a change occurred.
01h: firmware or software change detected (informational).
02h: hardware incompatibility detected
03h: firmware or software incompatibility detected
04h: entity has an invalid or unsupported hardware version
05h: entity contains an invalid or unsupported firmware or software version
06h: hardware change detected on entity was successful (de-assertion event = unsuccessful)
07h: software or firmware change detected on entity was successful (de-assertion event = unsuccessful)
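The 'Version Change' sensor value can be interpreted the same way, one bit per offset. The following sketch assumes, as described in the text above, that bit n corresponds to offset 0nh; the names used are hypothetical and the descriptions are shortened from the list above.

```python
from typing import List, Tuple

# Offset -> condition, shortened from the list above.
VERSION_CHANGE_OFFSETS = {
    0x00: "hardware change detected (informational)",
    0x01: "firmware or software change detected (informational)",
    0x02: "hardware incompatibility detected",
    0x03: "firmware or software incompatibility detected",
    0x04: "invalid or unsupported hardware version",
    0x05: "invalid or unsupported firmware or software version",
    0x06: "hardware change was successful",
    0x07: "software or firmware change was successful",
}


def decode_version_change(value: int) -> List[Tuple[int, str]]:
    """Return (offset, description) pairs for every bit set in the value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("version change sensor value must fit in 8 bits")
    return [
        (offset, text)
        for offset, text in VERSION_CHANGE_OFFSETS.items()
        if value & (1 << offset)
    ]
```

A value with only bit 2 set, for instance, decodes to the single pair `(2, "hardware incompatibility detected")`.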
SNMP Discrete Sensor Alarms
This section describes the SNMP alarms generated by discrete sensor events on the DSC 8000 chassis.
The following SNMP alarms are registered when a discrete sensor event occurs on the DSC 8000 chassis.