Hardware alarms are generated on the DSC 8000 chassis to indicate the status of system components. System components include the chassis, Fan Trays, and Power Supplies.
The alarms are accessed from the alarm dashboard in the Web UI. The alarm dashboard indicates system status by showing the number of critical, major, and minor alarms generated by system events. You can open individual alarms in the dashboard to determine their origin. To access the alarms, see section Alarm Dashboard Web User Interface in the Alarms Guide. To view details of individual alarms, see section View Alarms.
The hardware alarms are generated by hardware sensors monitoring system components in the chassis. There are two types of hardware sensors on the DSC 8000:
- Threshold sensors
- Discrete sensors
All system component sensors and alarms are designated as chassis alarms and are monitored by the AMC671 (MCH) cards in each chassis. A multi-shelf DSC 8000 system can contain up to four chassis. Each MCH monitors its own local sensors as well as the system component sensors in its chassis. System component sensors and alarms are accessed through the AMC671 (MCH) slot interface in the Web UI. For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.
Threshold sensors and the corresponding SNMP threshold alarms are described in the following sections:
Chassis Threshold Sensors and Alarms
Discrete sensors and the corresponding SNMP alarms are described in the following sections:
Chassis Discrete Sensors and Alarms
When an alarm is raised for a system component, each MCH card in the chassis reports the alarm, and the slot number of the alarm is set to the slot number of the reporting MCH. Each instance of the hwmon application receives notifications from both MCH cards in the chassis, and both hwmon instances respond with an alarm. Therefore, a single sensor event on a system component can produce up to four alarms and traps: one from each hwmon instance for each MCH card in the chassis (two hwmon instances × two MCH cards). For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.
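The fan-out of one sensor event into several alarms can be sketched in Python. This is a minimal illustration, assuming alarms are collected as records that carry the reporting hwmon instance, the MCH slot, the HWMON alias, and the SNMP alarm number; the `AlarmRecord` type, the helper functions, and the instance names and slot numbers in the example are hypothetical and not part of the DSC software. Duplicates are grouped by HWMON alias rather than by Portal ID or IPMI number, because those IDs are reassigned when cards are inserted or removed.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AlarmRecord:
    # Hypothetical record shape; field names are illustrative, not a DSC API.
    hwmon_instance: str   # hwmon instance that raised the alarm
    mch_slot: int         # slot number of the reporting MCH card
    alias: str            # HWMON alias, for example "CU1-Fan 1"
    alarm_number: int     # SNMP alarm number, for example 6346


def expected_copies(hwmon_instances, mch_slots, alias, alarm_number):
    """One sensor event yields one alarm per (hwmon instance, MCH card) pair."""
    return [
        AlarmRecord(h, slot, alias, alarm_number)
        for h, slot in product(hwmon_instances, mch_slots)
    ]


def collapse(records):
    """Group duplicate reports by (alias, alarm number).

    The alias is used as the key because Portal IDs and IPMI numbers are
    dynamically reassigned and cannot identify a hardware alarm.
    """
    groups = {}
    for r in records:
        groups.setdefault((r.alias, r.alarm_number), []).append(r)
    return groups


if __name__ == "__main__":
    # Hypothetical instance names and MCH slot numbers.
    copies = expected_copies(["hwmon-a", "hwmon-b"], [7, 8], "CU1-Fan 1", 6346)
    print(len(copies))            # 4
    print(len(collapse(copies)))  # 1
```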
System components follow a consistent naming convention in the Web UI. For example, system Fan Trays are labelled as Cooling Units (CUs) in the Web UI. Similarly, system Power Supplies are labelled as Power Modules (PMs).
Alarm details provided in the Web UI include the hardware sensor's Portal ID and IPMI number, which are dynamically assigned. Because these IDs change when a card is removed or a new card is inserted in the DSC 8000, they should not be used to identify a hardware alarm.
Chassis Threshold Sensors and Alarms
All threshold sensors available on DSC 8000 cooling units and power modules generate alarms. An alarm is generated by a threshold sensor event, which occurs when a monitored temperature, voltage, current, or fan speed value crosses a pre-defined threshold level.
The following table describes the threshold hardware sensors available on the DSC 8000 chassis.
Chassis threshold hardware sensors available in HWMON
IPMI Sensor Name | Alias in HWMON | Description | Units | SNMP Alarms | Alarm Event |
---|---|---|---|---|---|
EXHAUST TEMP | CU1-EXHAUST TEMP | Cooling unit exhaust temperature on Cooling Unit 1. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
EXHAUST TEMP | CU2-EXHAUST TEMP | Cooling unit exhaust temperature on Cooling Unit 2. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Exhaust Temp (lower case) | PM1-Exhaust Temp | Power module exhaust temperature on Power Module 1. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Exhaust Temp (lower case) | PM2-Exhaust Temp | Power module exhaust temperature on Power Module 2. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Exhaust Temp (lower case) | PM4-Exhaust Temp | Power module exhaust temperature on Power Module 4. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
INLET TEMP | CU1-INLET TEMP | Cooling unit air inlet temperature on Cooling Unit 1. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
INLET TEMP | CU2-INLET TEMP | Cooling unit air inlet temperature on Cooling Unit 2. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Intake Temp | PM1-Intake Temp | Power module air intake temperature on Power Module 1. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Intake Temp | PM2-Intake Temp | Power module air intake temperature on Power Module 2. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
Intake Temp | PM4-Intake Temp | Power module air intake temperature on Power Module 4. | °C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not raised starting in Release 15.0). |
SMP | PM1-SMP | Shared Management Power (SMP) voltage on Power Module 1. Each power module generates 5V, which is diode-ORed on the backplane; the shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
SMP | PM2-SMP | Shared Management Power (SMP) voltage on Power Module 2. Each power module generates 5V, which is diode-ORed on the backplane; the shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
SMP | PM4-SMP | Shared Management Power (SMP) voltage on Power Module 4. Each power module generates 5V, which is diode-ORed on the backplane; the shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
12V | CU1-12V | 12V voltage sensor on Cooling Unit 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
12V | CU2-12V | 12V voltage sensor on Cooling Unit 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
12V | PM1-12V | 12V voltage sensor on Power Module 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
12V | PM2-12V | 12V voltage sensor on Power Module 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
12V | PM4-12V | 12V voltage sensor on Power Module 4. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
3.3V | PM1-3.3V | 3.3V voltage sensor on Power Module 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
3.3V | PM2-3.3V | 3.3V voltage sensor on Power Module 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
3.3V | PM4-3.3V | 3.3V voltage sensor on Power Module 4. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
3.3V MGMT | CU1-3.3V MGMT | Cooling unit 3.3V management (IPMI) power sensor on Cooling Unit 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
3.3V MGMT | CU2-3.3V MGMT | Cooling unit 3.3V management (IPMI) power sensor on Cooling Unit 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
Iout | PM1-Iout 0, PM2-Iout 0, PM4-Iout 0 | Output current of power brick_0 on each power module. Each power module has three power bricks that supply current to the system modules. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
Iout | PM1-Iout 1, PM2-Iout 1, PM4-Iout 1 | Output current of power brick_1 on each power module. Each power module has three power bricks that supply current to the system modules. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
Iout | PM1-Iout 2, PM2-Iout 2, PM4-Iout 2 | Output current of power brick_2 on each power module. Each power module has three power bricks that supply current to the system modules. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
Fan | CU1-Fan 1, CU2-Fan 1 | Speed of fan 1 on each cooling unit. Each cooling unit has five cooling fans. | RPM | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
Fan | CU1-Fan 2 | Speed of fan 2 on each cooling unit. Each cooling unit has five cooling fans. | RPM | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
Fan | CU1-Fan 3 | Speed of fan 3 on each cooling unit. Each cooling unit has five cooling fans. | RPM | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
Fan | CU1-Fan 4 | Speed of fan 4 on each cooling unit. Each cooling unit has five cooling fans. | RPM | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
Fan | CU1-Fan 5 | Speed of fan 5 on each cooling unit. Each cooling unit has five cooling fans. | RPM | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
VDD_EMMC | PM1-VDD_EMMC | Power module EMMC (IPMI controller) voltage on Power Module 1. | Volts | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
VDD_EMMC | PM2-VDD_EMMC | Power module EMMC (IPMI controller) voltage on Power Module 2. | Volts | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
VDD_EMMC | PM4-VDD_EMMC | Power module EMMC (IPMI controller) voltage on Power Module 4. | Volts | 6346 - 6351 | Minor, Major, and Critical alarms on lower threshold crossings. |
Some DSC software releases earlier than Release 15.0 generate minor temperature alarms when the monitored temperature crosses the Upper Non-Critical (UNC) temperature threshold. Minor temperature threshold crossing events are required for stable operation of the cooling sub-system, and the resulting alarms can be ignored.
Starting with DSC software Release 15.0, temperature alarms are generated only when the monitored temperature crosses the Upper Critical (UC) or Upper Non-Recoverable (UNR) temperature threshold, which raise major and critical alarms, respectively.
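The release-dependent behavior of temperature alarms can be summarized as a small lookup. This sketch assumes the release is available as a (major, minor) tuple and that every release before 15.0 raises minor alarms on UNC crossings, although the text above only says that some earlier releases do; the function name is hypothetical.

```python
from typing import Optional, Tuple

# Severity raised when a temperature sensor crosses an upper threshold.
_TEMP_SEVERITY = {"UNC": "minor", "UC": "major", "UNR": "critical"}

RELEASE_15_0: Tuple[int, int] = (15, 0)


def temperature_alarm_severity(threshold: str, release: Tuple[int, int]) -> Optional[str]:
    """Return the alarm severity for an upper temperature threshold crossing,
    or None when no alarm is expected.

    threshold: "UNC", "UC", or "UNR" (IPMI upper threshold names).
    release:   DSC software release as a (major, minor) tuple, e.g. (15, 0).
    """
    if threshold not in _TEMP_SEVERITY:
        raise ValueError(f"unknown upper threshold: {threshold!r}")
    if threshold == "UNC" and release >= RELEASE_15_0:
        # Starting in Release 15.0, UNC crossings no longer raise minor alarms.
        return None
    # Assumption: all earlier releases raise a minor alarm on UNC crossings.
    return _TEMP_SEVERITY[threshold]
```

For example, `temperature_alarm_severity("UNC", (14, 2))` returns `"minor"`, while the same crossing on Release 15.0 returns `None`.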
Threshold Sensor Events
Threshold sensor event severity levels are defined as follows:
- Noncritical: This is a warning that one or more operating specifications are somewhat out of normal range, but there is not yet a problem to be addressed. Noncritical events are for information only, and they do not indicate that the DSC 8000 chassis is outside of operating limits. In general, no action is required. However, in certain contexts, system/shelf management software may initiate preventive action. For example, if several cards in a shelf report upper noncritical temperature events, the shelf manager may decide to increase fan speed.
- Critical: The DSC 8000 chassis is operating within specified tolerances, but one or more specifications are getting close to the critical thresholds. Critical events indicate that the card is still within its operating limits but is close to exceeding one of those limits. A possible action in this case is to closely monitor the alarming sensor and take more aggressive action if it approaches the nonrecoverable threshold.
- Nonrecoverable: The DSC 8000 chassis is no longer operating within specified tolerances. Nonrecoverable events indicate that the card may no longer be functioning because it is now outside of its operating limits. Action is likely required or has already been taken by the local hardware/firmware. For example, a processor may shut itself down because its maximum die temperature was exceeded, or a shelf manager may deactivate the card because the processor is too hot.
Chassis Voltage Sensor Threshold Levels
The following table shows the threshold levels for the voltage sensors on the DSC 8000 chassis. A threshold sensor triggers an SNMP alarm when the monitored value crosses a pre-defined threshold level.
Chassis Voltage Threshold Sensor Levels
Sensor Name | Units | Lower Non-Recoverable (Alarm 6350) | Lower Critical (Alarm 6348) | Lower Non-Critical (Alarm 6346) | Upper Non-Critical (Alarm 6344) | Upper Critical (Alarm 6342) | Upper Non-Recoverable (Alarm 6340) |
---|---|---|---|---|---|---|---|
12V | Volts | 9.06 | 10.02 | 10.86 | 13.26 | 14.10 | 15.06 |
3.3V | Volts | 3.313 | 3.372 | 3.387 | 3.71 | 3.725 | 3.754 |
3.3V MGMT | Volts | 0.001 | 3.019 | 3.078 | 3.549 | 3.607 | 3.803 |
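To illustrate how the levels in this table relate to the event severities described under Threshold Sensor Events, the following sketch classifies a reading against the table values. It is a simplification: it assumes a reading equal to a threshold counts as a crossing and ignores any hysteresis the sensor firmware may apply, and `classify_voltage` is a hypothetical helper rather than part of hwmon.

```python
from typing import Optional, Tuple

# Values copied from the chassis voltage threshold table, in the order
# LNR, LC, LNC, UNC, UC, UNR (Volts).
VOLTAGE_THRESHOLDS = {
    "12V": (9.06, 10.02, 10.86, 13.26, 14.10, 15.06),
    "3.3V": (3.313, 3.372, 3.387, 3.71, 3.725, 3.754),
    "3.3V MGMT": (0.001, 3.019, 3.078, 3.549, 3.607, 3.803),
}

# IPMI threshold level -> event severity class.
EVENT_CLASS = {
    "LNC": "Noncritical", "UNC": "Noncritical",
    "LC": "Critical", "UC": "Critical",
    "LNR": "Nonrecoverable", "UNR": "Nonrecoverable",
}


def classify_voltage(sensor: str, volts: float) -> Optional[Tuple[str, str]]:
    """Return (most severe threshold crossed, event class), or None if the
    reading lies strictly between the lower and upper non-critical levels."""
    lnr, lc, lnc, unc, uc, unr = VOLTAGE_THRESHOLDS[sensor]
    # Test the most severe level first on each side of the normal range.
    for level, crossed in (
        ("LNR", volts <= lnr),
        ("LC", volts <= lc),
        ("LNC", volts <= lnc),
        ("UNR", volts >= unr),
        ("UC", volts >= uc),
        ("UNC", volts >= unc),
    ):
        if crossed:
            return level, EVENT_CLASS[level]
    return None
```

For example, a 12V reading of 14.5 V falls between the Upper Critical and Upper Non-Recoverable levels, so `classify_voltage("12V", 14.5)` returns `("UC", "Critical")`.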
SNMP Threshold Sensor Alarms
This section describes the SNMP alarms generated by threshold sensor events on the DSC 8000 chassis.
The following SNMP alarms are registered when a threshold sensor event occurs on the DSC 8000 chassis.
DSC 8000 SNMP Threshold Sensor Alarms
SNMP Alarm Number | Alarm Name | Clearing Alarm |
---|---|---|
6340 | | 6341 |
6341 | | N/A |
6342 | | 6343 |
6343 | | N/A |
6344 | | 6345 |
6345 | | N/A |
6346 | | 6347 |
6347 | | N/A |
6348 | | 6349 |
6349 | | N/A |
6350 | | 6351 |
6351 | | N/A |
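Because each raising alarm in this table has a matching clearing alarm, the set of outstanding threshold alarms can be reconstructed by replaying alarms in arrival order. This is a hypothetical sketch: it assumes each alarm arrives as a (HWMON alias, alarm number) pair and that a clearing alarm applies to the raising alarm with the same alias; `active_alarms` is not a DSC function.

```python
from typing import Iterable, Set, Tuple

# Raising alarm -> clearing alarm, from the SNMP threshold sensor alarm table.
CLEARED_BY = {6340: 6341, 6342: 6343, 6344: 6345,
              6346: 6347, 6348: 6349, 6350: 6351}
# Reverse map: clearing alarm -> the raising alarm it clears.
CLEARS = {clear: raise_ for raise_, clear in CLEARED_BY.items()}


def active_alarms(events: Iterable[Tuple[str, int]]) -> Set[Tuple[str, int]]:
    """Replay (sensor alias, alarm number) events in arrival order and return
    the raising alarms that have not yet been cleared."""
    active: Set[Tuple[str, int]] = set()
    for alias, number in events:
        if number in CLEARED_BY:
            active.add((alias, number))
        elif number in CLEARS:
            active.discard((alias, CLEARS[number]))
        else:
            raise ValueError(f"not a threshold sensor alarm: {number}")
    return active
```

For example, replaying `[("CU1-12V", 6346), ("PM1-12V", 6348), ("CU1-12V", 6347)]` leaves only `("PM1-12V", 6348)` active.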
Chassis Component FRU Data Records
DSC 8000 chassis system components include fan trays and power supplies. System components are monitored by the AMC671 (MCH) cards located in the same chassis. Each power supply and fan tray has a unique ID in the chassis. An FRU data record is available for system components managed by the MCH card.
FRU data records contain information on manufacturing data associated with each component, including:
- Manufacturing date
- Serial number
- Product part number
To view a system component FRU data record, open the Web UI and select the Hardware Monitor application (hwmon) from the System menu. The MCH slots in the chassis are displayed. Click an MCH slot to view FRU data for the system components managed by that MCH card. FRU data records for power supplies are listed in the 'Power Supply Selection' table, and FRU data records for fan trays are listed in the 'Fan Tray Selection' table.
The following table describes each FRU data record available on the DSC 8000 chassis.
DSC 8000 Chassis FRU Data Records
System Component | Component ID | Product Name | Description |
---|---|---|---|
Fan Tray | Fan Tray ID_0 | Altamont | FRU Data Record for the Fan Tray in slot CU1 |
Fan Tray | Fan Tray ID_1 | Altamont | FRU Data Record for the Fan Tray in slot CU2 |
Power Supply | Power Supply ID_0 | MTC6213 | FRU Data Record for the Power Supply in slot PM1 |
Power Supply | Power Supply ID_1 | MTC6213 | FRU Data Record for the Power Supply in slot PM2 |
Power Supply | Power Supply ID_2 | MTC6213 | FRU Data Record for the Power Supply in slot PM4 |
Chassis Discrete Sensors and Alarms
Cooling units and power modules are defined as system components in the DSC 8000 chassis. An alarm is generated when a cooling unit or power module is inserted into or extracted from the chassis.
The following table describes the discrete hardware sensors on the DSC 8000 chassis. Discrete sensors return values of 'on' and 'off' or 'true' and 'false'. Each entity in the system has a 'Version Change' sensor that reports the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification, Second Generation (v2.0).
Chassis discrete hardware sensors available in HWMON
IPMI Sensor Name | Alias in HWMON | Description | SNMP Alarms | Alarm Event |
---|---|---|---|---|
PM Status | PM1-PM Status | Each power module has a status sensor. PM1-PM Status is the status sensor on Power Module 1. The sensor returns the following values: 1 = PM OK | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm (6320) is raised. On transition to 'PM OK', a powerFaultDeassert alarm (6321, clearing event) is raised. |
PM Status | PM2-PM Status | Each power module has a status sensor. PM2-PM Status is the status sensor on Power Module 2. The sensor returns the following values: 1 = PM OK | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm (6320) is raised. On transition to 'PM OK', a powerFaultDeassert alarm (6321, clearing event) is raised. |
PM Status | PM4-PM Status | Each power module has a status sensor. PM4-PM Status is the status sensor on Power Module 4. The sensor returns the following values: 1 = PM OK | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm (6320) is raised. On transition to 'PM OK', a powerFaultDeassert alarm (6321, clearing event) is raised. |
Telco Alarm | Telco Alarm | Indicates the state of the Cut-off switch. | N/A | No alarms are associated with this entity. |
Hot-swap | Hot-swap CU 1¹ | Hot-swap sensor for Cooling Unit 1. | 6328, 6329 sysComponentInserted | 6328 is raised when a system-level component has been deactivated or extracted. 6329 (sysComponentInserted) is raised when a system-level component has been activated or inserted; this is a clearing alarm. |
Hot-swap | Hot-swap CU 2¹ | Hot-swap sensor for Cooling Unit 2. | 6328, 6329 sysComponentInserted | 6328 is raised when a system-level component has been deactivated or extracted. 6329 (sysComponentInserted) is raised when a system-level component has been activated or inserted; this is a clearing alarm. |
Hot-swap | Hot-swap PM 1¹ | Hot-swap sensor for Power Module 1. | 6328, 6329 sysComponentInserted | 6328 is raised when a system-level component has been deactivated or extracted. 6329 (sysComponentInserted) is raised when a system-level component has been activated or inserted; this is a clearing alarm. |
Hot-swap | Hot-swap PM 2¹ | Hot-swap sensor for Power Module 2. | 6328, 6329 sysComponentInserted | 6328 is raised when a system-level component has been deactivated or extracted. 6329 (sysComponentInserted) is raised when a system-level component has been activated or inserted; this is a clearing alarm. |
Hot-swap | Hot-swap PM 4¹ | Hot-swap sensor for Power Module 4. | 6328, 6329 sysComponentInserted | 6328 is raised when a system-level component has been deactivated or extracted. 6329 (sysComponentInserted) is raised when a system-level component has been activated or inserted; this is a clearing alarm. |
Hot-swap | Hot-swap Telco¹ | Hot-swap sensor for the Telco Alarm Panel software entity. The Telco entity that this sensor belongs to is a software construct; it does not refer to the physical alarm panel interface. No alarms are raised for this sensor. | N/A | No alarms are associated with this entity. |
Version Change | CU1-Version Change² | This sensor reports the FRU state on Cooling Unit 1. The sensor sets one bit for each of eight possible FRU conditions; for example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | CU2-Version Change² | This sensor reports the FRU state on Cooling Unit 2. The sensor sets one bit for each of eight possible FRU conditions; for example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM1-Version Change² | This sensor reports the FRU state on Power Module 1. The sensor sets one bit for each of eight possible FRU conditions; for example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM2-Version Change² | This sensor reports the FRU state on Power Module 2. The sensor sets one bit for each of eight possible FRU conditions; for example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM4-Version Change² | This sensor reports the FRU state on Power Module 4. The sensor sets one bit for each of eight possible FRU conditions; for example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
¹ Each entity in the system has a hot-swap sensor that reports the entity's FRU state. These states are described in the PICMG® 3.0 AdvancedTCA® Base Specification. The sensor sets one bit for each of the eight states, M0 - M7, as defined in the specification. For example, if bit 0 is set, the FRU is in state M0. Similarly, if bit 4 is set, the sensor returns a value of 16 (00010000b), which is the Normal (Active) state, M4.
The state values include:
[7] – 1b: FRU Operational State M7 = Communication Lost
[6] – 1b: FRU Operational State M6 = FRU Deactivation In Progress
[5] – 1b: FRU Operational State M5 = FRU Deactivation Request
[4] – 1b: FRU Operational State M4 = FRU Active
[3] – 1b: FRU Operational State M3 = FRU Activation in Progress
[2] – 1b: FRU Operational State M2 = FRU Activation Request
[1] – 1b: FRU Operational State M1 = FRU Inactive
[0] – 1b: FRU Operational State M0 = FRU Not Installed
² Each entity in the system has a 'Version Change' sensor that reports a change in the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification, Second Generation (v2.0). The sensor sets one bit for each of eight possible FRU state changes. For example, if bit 0 is set, the condition defined by offset 00h is present. The eight conditions include the following:
00h: hardware change detected (informational). This offset does not indicate whether the hardware change was successful or not, only that a change occurred.
01h: firmware or software change detected (informational).
02h: hardware incompatibility detected
03h: firmware or software incompatibility detected
04h: entity has an invalid or unsupported hardware version
05h: entity contains an invalid or unsupported firmware or software version
06h: hardware change detected on entity was successful (de-assertion event = unsuccessful)
07h: software or firmware change detected on entity was successful (de-assertion event = unsuccessful)
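The bit encodings in notes 1 and 2 can be decoded with one helper, because both sensors set one bit per state or condition in an 8-bit value. This is a minimal sketch, assuming the raw sensor reading is available as an integer; the table and function names are hypothetical, and the labels are abbreviated from the lists above.

```python
from typing import Dict, List

# Hot-swap sensor bits (PICMG 3.0 M-states), from note 1.
HOT_SWAP_STATES: Dict[int, str] = {
    0: "M0 FRU Not Installed",
    1: "M1 FRU Inactive",
    2: "M2 FRU Activation Request",
    3: "M3 FRU Activation In Progress",
    4: "M4 FRU Active",
    5: "M5 FRU Deactivation Request",
    6: "M6 FRU Deactivation In Progress",
    7: "M7 Communication Lost",
}

# Version Change sensor offsets (IPMI v2.0), from note 2.
VERSION_CHANGE_OFFSETS: Dict[int, str] = {
    0: "00h hardware change detected",
    1: "01h firmware or software change detected",
    2: "02h hardware incompatibility detected",
    3: "03h firmware or software incompatibility detected",
    4: "04h invalid or unsupported hardware version",
    5: "05h invalid or unsupported firmware or software version",
    6: "06h hardware change successful",
    7: "07h software or firmware change successful",
}


def decode_bits(value: int, table: Dict[int, str]) -> List[str]:
    """Return the table entries whose bit is set in an 8-bit sensor reading."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"expected an 8-bit value, got {value}")
    return [table[bit] for bit in range(8) if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_bits(16, HOT_SWAP_STATES))  # ['M4 FRU Active']
    print(decode_bits(0b11, VERSION_CHANGE_OFFSETS))
```

The second call prints the 00h and 01h conditions, since bits 0 and 1 are both set.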
SNMP Discrete Sensor Alarms
This section describes the SNMP alarms generated by discrete sensor events on the DSC 8000 chassis.
The following SNMP alarms are registered when a discrete sensor event occurs on the DSC 8000 chassis.
DSC 8000 SNMP Discrete Sensor Alarms
SNMP Alarm Number | Alarm Name | Clearing Alarm |
---|---|---|
6320 | powerFaultAssert | 6321 |
6321 | powerFaultDeassert | N/A |
6328 | | 6329 |
6329 | sysComponentInserted | N/A |
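The transitions described in the discrete sensor table can be expressed as a lookup from sensor type and transition to SNMP alarm. In this sketch the transition labels and the `alarm_for` helper are hypothetical. Alarm 6328 is assumed to be the extraction alarm because it is cleared by 6329 sysComponentInserted, but its name is not given in this section, so it is left as `None`.

```python
from typing import NamedTuple, Optional


class DiscreteAlarm(NamedTuple):
    number: int
    name: Optional[str]    # None where the alarm name is not documented here
    clears: Optional[int]  # raising alarm that this alarm clears, if any


# Hypothetical (sensor type, transition) keys; alarm numbers and names come
# from the discrete sensor tables in this section.
TRANSITION_ALARMS = {
    ("PM Status", "Failure Detected"): DiscreteAlarm(6320, "powerFaultAssert", None),
    ("PM Status", "PM OK"): DiscreteAlarm(6321, "powerFaultDeassert", 6320),
    ("Hot-swap", "extracted"): DiscreteAlarm(6328, None, None),
    ("Hot-swap", "inserted"): DiscreteAlarm(6329, "sysComponentInserted", 6328),
}


def alarm_for(sensor_type: str, transition: str) -> Optional[DiscreteAlarm]:
    """Return the SNMP alarm raised for a discrete sensor transition, or None
    for sensors that raise no alarms (for example Telco Alarm, Version Change)."""
    return TRANSITION_ALARMS.get((sensor_type, transition))
```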