Hardware alarms are generated on the AMC671 (MCH) to indicate the status of system components in the chassis and local components residing on the card.
The alarms are accessed from the alarm dashboard in the Web UI. The alarm dashboard indicates system status by showing the number of critical, major, and minor alarms generated by system events. You can open individual alarms in the dashboard to determine where each alarm originated. To access the alarms, see section Alarm Dashboard Web User Interface in the Alarms Guide. To view details of individual alarms, see section View Alarms.
The hardware alarms are generated by hardware sensors on the AMC671 (MCH) card. In addition to the local sensors, the MCH card reports chassis alarms generated by system components. There are two types of hardware sensors on the DSC 8000: threshold sensors and discrete sensors.
If an alarm is raised for a system component, each MCH card in the chassis reports the alarm, and the slot number for the alarm is set to the reporting MCH's slot number. Each instance of the hwmon application receives notifications from both MCH cards in the chassis, and each instance responds with an alarm. Therefore, you will see up to four alarms and traps for each sensor event on a system component: one alarm from each instance of hwmon for each MCH card in the chassis (two hwmon instances × two MCH cards).
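The arithmetic behind the four duplicate alarms can be sketched in Python. This is a minimal illustration, not part of the DSC software: the `AlarmReport` record, its field names, and the hwmon instance labels are hypothetical. It generates one report per (hwmon instance, reporting MCH card) pair, then groups the reports by the component that caused them, which is how the duplicates collapse into a single underlying event.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AlarmReport:
    """Hypothetical record for one received alarm or trap (field names are illustrative)."""
    hwmon_instance: str  # which hwmon instance raised the alarm
    component: str       # system component that caused the alarm
    mch_slot: int        # slot number of the MCH card that reported the event


def expected_reports(component: str, hwmon_instances: Iterable[str],
                     mch_slots: Iterable[int]) -> List[AlarmReport]:
    """Each hwmon instance raises one alarm per MCH card that reports the event."""
    return [AlarmReport(h, component, s)
            for h, s in product(hwmon_instances, mch_slots)]


def group_by_component(reports: Iterable[AlarmReport]) -> Dict[str, List[int]]:
    """Collapse duplicate reports into one entry per component, listing the reporting MCH slots."""
    slots: Dict[str, set] = {}
    for r in reports:
        slots.setdefault(r.component, set()).add(r.mch_slot)
    return {name: sorted(found) for name, found in slots.items()}


# Two hwmon instances (hypothetical names) and the two MCH cards of the control shelf.
reports = expected_reports("CU1", ["hwmon-a", "hwmon-b"], [1, 14])
print(len(reports))                 # 4
print(group_by_component(reports))  # {'CU1': [1, 14]}
```

Grouping on the component rather than on the reporting slot keeps the information about which MCH cards saw the event while counting it once.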
When viewing alarm details, the Event Context Name identifies the system component that caused the alarm, and the Event Context Slot identifies the MCH card reporting the event. Use the MCH slot number to determine which chassis is reporting the system component alarm event. For a multi-shelf DSC 8000 system, the MCH cards are identified by the slot number: MCH1 (slot 1) and MCH14 (slot 14) reside in the control shelf; MCH15 (slot 15) and MCH28 (slot 28) reside in the first expansion shelf; MCH29 (slot 29) and MCH42 (slot 42) reside in the second expansion shelf; and finally, MCH43 (slot 43) and MCH56 (slot 56) reside in the third expansion shelf.
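Because the MCH slot numbers follow a regular pattern, the slot-to-shelf lookup can be expressed as a small Python function. This is a sketch under an assumption the text does not state directly: that each shelf spans 14 slot numbers, with the MCH cards in the first and last slot of each shelf, which matches the slot pairs listed above. The function name and shelf labels are illustrative.

```python
SHELF_NAMES = [
    "control shelf",
    "first expansion shelf",
    "second expansion shelf",
    "third expansion shelf",
]
# Assumption: 14 slots per shelf, inferred from the MCH slot pairs 1/14, 15/28, 29/42, 43/56.
SLOTS_PER_SHELF = 14


def shelf_for_mch_slot(slot: int) -> str:
    """Return the shelf holding the MCH card in the given Event Context Slot (hypothetical helper)."""
    if slot < 1:
        raise ValueError(f"slot {slot} is not an MCH slot")
    index, position = divmod(slot - 1, SLOTS_PER_SHELF)
    # MCH cards occupy the first and last slot of each shelf.
    if index >= len(SHELF_NAMES) or position not in (0, SLOTS_PER_SHELF - 1):
        raise ValueError(f"slot {slot} is not an MCH slot")
    return SHELF_NAMES[index]


for slot in (1, 14, 15, 28, 29, 42, 43, 56):
    print(f"MCH{slot}: {shelf_for_mch_slot(slot)}")
```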
Threshold sensors and the corresponding SNMP threshold alarms are described in the following sections:
MCH Threshold Sensors and Alarms
Discrete sensors and the corresponding SNMP alarms are described in the following sections:
MCH Discrete Sensors and Alarms
Alarm details provided in the Web UI include the hardware sensor's Portal ID and IPMI number, which are dynamically assigned. These IDs change when a card is removed from or inserted into the DSC 8000 chassis; therefore, do not use them to identify a hardware alarm.
All threshold sensors on the AMC671 (MCH) card generate alarms. An alarm is generated by a threshold sensor event, which occurs when a monitored voltage or temperature value crosses a pre-defined threshold level.
The following table lists and describes the threshold hardware sensors on AMC671 (MCH) cards.
IPMI Sensor Name | Alias in HWMON | Description | Units | SNMP Alarms | Alarm Event |
---|---|---|---|---|---|
Mezz Temp | Mezz Temp | Temperature sensor on the 10G Ethernet switch fabric of the MCH. | °C | 6348 - 6351 | Critical and major alarms on threshold crossings (no minor alarms starting in Release 15.0) |
Temp | Temp 1 | One of four temperature sensors on the MCH. The Temp 1 sensor monitors the outlet temperature of the card. | °C | 6348 - 6351 | Critical and major alarms on threshold crossings (no minor alarms starting in Release 15.0) |
Temp | Temp 2 | One of four temperature sensors on the MCH. The Temp 2 sensor monitors the inlet temperature of the card. | °C | 6348 - 6351 | Critical and major alarms on threshold crossings (no minor alarms starting in Release 15.0) |
Temp | Temp 3 | One of four temperature sensors on the MCH. The Temp 3 sensor monitors the temperature of the 1G Ethernet switch on the card. | °C | 6348 - 6351 | Critical and major alarms on threshold crossings (no minor alarms starting in Release 15.0) |
Temp | Temp 4 | One of four temperature sensors on the MCH. The Temp 4 sensor monitors the temperature of the CPU device on the card. | °C | 6348 - 6351 | Critical and major alarms on threshold crossings (no minor alarms starting in Release 15.0) |
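Threshold sensor event severity levels are defined as follows:
Upper Non-Critical (UNC) threshold crossing: minor alarm (some DSC software versions earlier than Release 15.0 only)
Upper Critical (UC) threshold crossing: major alarm
Upper Non-Recoverable (UNR) threshold crossing: critical alarm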
A threshold sensor triggers an SNMP alarm when a monitored temperature crosses a pre-defined threshold level. The following table shows the temperature threshold levels for temperature sensors on AMC671 (MCH) cards.
Some versions of DSC software generate minor temperature alarms when the monitored temperature crosses the Upper Non-Critical (UNC) threshold. Minor threshold crossing events are required for stable operation of the cooling sub-system, so the resulting minor alarms can be ignored.
Starting with DSC software Release 15.0, temperature alarms are generated only when the monitored temperature crosses the Upper Critical (UC) or Upper Non-Recoverable (UNR) threshold, which generate major and critical alarms, respectively.
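The release-dependent behavior described above can be summarized as a lookup. The Python sketch below is illustrative only: the function name and the (major, minor) release tuple are assumptions, and it conservatively treats every release before 15.0 as one that may raise minor alarms, although the text says only some versions do.

```python
from typing import Optional, Tuple

# Severity per upper temperature threshold, as described in the text.
SEVERITY_BY_THRESHOLD = {
    "UC": "major",      # Upper Critical
    "UNR": "critical",  # Upper Non-Recoverable
}


def temperature_alarm_severity(threshold: str, release: Tuple[int, int]) -> Optional[str]:
    """Return the alarm severity for a temperature crossing `threshold`, or None if no alarm.

    `release` is a hypothetical (major, minor) DSC software release number, e.g. (15, 0).
    """
    if threshold == "UNC":
        # Only some pre-15.0 versions raise minor alarms here; this sketch treats
        # every pre-15.0 release as one that may do so.
        return "minor" if release < (15, 0) else None
    if threshold in SEVERITY_BY_THRESHOLD:
        return SEVERITY_BY_THRESHOLD[threshold]
    raise ValueError(f"unknown threshold: {threshold!r}")


print(temperature_alarm_severity("UNC", (15, 0)))  # None
print(temperature_alarm_severity("UNR", (15, 0)))  # critical
```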
The following table lists the SNMP alarms registered when a threshold sensor event occurs on the AMC671 (MCH) card.
SNMP Alarm Number | Alarm Name | Clearing Alarm |
---|---|---|
6340 | | 6341 |
6341 | | N/A |
6342 | | 6343 |
6343 | | N/A |
6344 | | 6345 |
6345 | | N/A |
6346 | | 6347 |
6347 | | N/A |
6348 | | 6349 |
6349 | | N/A |
6350 | | 6351 |
6351 | | N/A |
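The table pairs each raising alarm with the next odd-numbered alarm that clears it. A monitoring script could use that pairing to track which threshold alarms are still outstanding; the following Python sketch shows one way to do so. The names `CLEARING_ALARM`, `CLEARS`, and `apply_alarm` are hypothetical, and the sketch does not model how alarms are received (for example, as SNMP traps).

```python
from typing import Set

# Raising alarm -> clearing alarm, transcribed from the threshold alarm table.
CLEARING_ALARM = {6340: 6341, 6342: 6343, 6344: 6345,
                  6346: 6347, 6348: 6349, 6350: 6351}
# Reverse lookup: clearing alarm -> the alarm it clears.
CLEARS = {clear: raised for raised, clear in CLEARING_ALARM.items()}


def apply_alarm(active: Set[int], number: int) -> None:
    """Update the set of outstanding threshold alarms with one received alarm (hypothetical helper)."""
    if number in CLEARING_ALARM:
        active.add(number)
    elif number in CLEARS:
        active.discard(CLEARS[number])  # a clear without a prior raise is a no-op
    else:
        raise ValueError(f"{number} is not a threshold alarm")


active: Set[int] = set()
for number in (6348, 6350, 6349):
    apply_alarm(active, number)
print(active)  # {6350}
```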
Discrete sensors return values of 'on' and 'off' or 'true' and 'false' to the system software. Each entity in the system has a 'Version Change' sensor that reports the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification, Second Generation, v2.0.
The following table lists and describes the discrete hardware sensors on AMC671 (MCH) cards.
IPMI Sensor Name | Alias in HWMON | Description | SNMP Alarms | Alarm Event |
---|---|---|---|---|
Hot-swap | Hot-swap MCH 1 (note 1) | Hot-swap sensor for MCH 1. The hot-swap sensor reports the MCH card's FRU state. These states are described in PICMG® 3.0 AdvancedTCA® Base Specification. | | Alarm raised on detection of state transitions for: Alarm is cleared on detection of state transition for: |
Hot-swap | Hot-swap MCH 2 (note 1) | Hot-swap sensor for MCH 2. The hot-swap sensor reports the MCH card's FRU state. These states are described in PICMG® 3.0 AdvancedTCA® Base Specification. | | Alarm raised on detection of state transitions for: Alarm is cleared on detection of state transition for: |
POWER GOOD | Mezz POWER GOOD | PICMG boolean sensor (false = 1, true = 2) that indicates whether the power sub-system on the 10G Ethernet switch portion of the MCH card is functional. | 6320 - 6321 | On a state transition to 'false', a powerFaultAssert alarm is raised. On a state transition to 'true', a powerFaultDeassert (clearing) alarm is raised. |
IPMB Physical | N/A | No alarms are associated with this entity. | N/A | N/A |
Version Change | CU1-Version Change (note 2) | This sensor reports the FRU state on Cooling Unit 1. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | CU2-Version Change (note 2) | This sensor reports the FRU state on Cooling Unit 2. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM1-Version Change (note 2) | This sensor reports the FRU state on Power Module 1. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM2-Version Change (note 2) | This sensor reports the FRU state on Power Module 2. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
Version Change | PM4-Version Change (note 2) | This sensor reports the FRU state on Power Module 4. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by offset 00h is present. | N/A | Informational sensor only. No alarms are generated. |
1 Each entity in the system has a hot-swap sensor that reports the entity's FRU state. These states are described in PICMG® 3.0 AdvancedTCA® Base Specification. The sensor returns one bit for each of the eight states, M0 - M7, as defined in the specification. For example, if bit 0 is set, the FRU is in state M0. Similarly, if bit 4 is set, the sensor returns a value of 16 (00010000b), which is the Normal (Active) state, M4.
The state values include:
[7] – 1b: FRU Operational State M7 = Communication Lost
[6] – 1b: FRU Operational State M6 = FRU Deactivation In Progress
[5] – 1b: FRU Operational State M5 = FRU Deactivation Request
[4] – 1b: FRU Operational State M4 = FRU Active
[3] – 1b: FRU Operational State M3 = FRU Activation In Progress
[2] – 1b: FRU Operational State M2 = FRU Activation Request
[1] – 1b: FRU Operational State M1 = FRU Inactive
[0] – 1b: FRU Operational State M0 = FRU Not Installed
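Decoding a hot-swap reading amounts to finding which single bit is set. The Python sketch below assumes, as note 1 describes, that exactly one of bits 0 to 7 is set; the function name `decode_hot_swap` and its error handling are illustrative, not part of the DSC software.

```python
# Bit position -> FRU operational state, from the list above.
HOT_SWAP_STATES = {
    0: "M0 = FRU Not Installed",
    1: "M1 = FRU Inactive",
    2: "M2 = FRU Activation Request",
    3: "M3 = FRU Activation In Progress",
    4: "M4 = FRU Active",
    5: "M5 = FRU Deactivation Request",
    6: "M6 = FRU Deactivation In Progress",
    7: "M7 = Communication Lost",
}


def decode_hot_swap(value: int) -> str:
    """Decode a hot-swap reading in which exactly one of bits 0-7 is set (hypothetical helper)."""
    # value & (value - 1) is non-zero whenever more than one bit is set.
    if (not 0 < value <= 0xFF) or (value & (value - 1)):
        raise ValueError(f"expected exactly one state bit, got {value:#04x}")
    return HOT_SWAP_STATES[value.bit_length() - 1]


print(decode_hot_swap(16))  # M4 = FRU Active
```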
2 Each entity in the DSC 8000 system has a version change sensor that reports the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification Second Generation, v2.0. There are six entities in the DSC 8000 system that are managed by the MCH. The sensor named 'Version Change' monitors the MCH card (MCH 1 or MCH 2). The other entities monitored by version change sensors are: two cooling units (CU1 and CU2) and three power modules (PM1, PM2, and PM4).
Each entity in the system has a 'Version Change' sensor that reports a change in the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification, Second Generation, v2.0. The sensor returns one bit for each of eight possible FRU state changes. For example, if bit 0 is set, the condition defined by offset 00h is present. The eight conditions are the following:
00h: hardware change detected (informational). This offset does not indicate whether the hardware change was successful or not, only that a change occurred.
01h: firmware or software change detected (informational).
02h: hardware incompatibility detected
03h: firmware or software incompatibility detected
04h: entity has an invalid or unsupported hardware version
05h: entity contains an invalid or unsupported firmware or software version
06h: hardware change detected on entity was successful (de-assertion event = unsuccessful)
07h: software or firmware change detected on entity was successful (de-assertion event = unsuccessful)
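The same bit-to-condition mapping can be applied to a version change reading. This Python sketch does not assume that only one bit is set; it reports every condition whose bit is present. The function name `decode_version_change` is hypothetical, and the sketch ignores the de-assertion semantics of offsets 06h and 07h.

```python
from typing import List, Tuple

# Offset -> condition, from the list above; 00h and 01h are marked informational.
VERSION_CHANGE_CONDITIONS = {
    0x00: "hardware change detected",
    0x01: "firmware or software change detected",
    0x02: "hardware incompatibility detected",
    0x03: "firmware or software incompatibility detected",
    0x04: "entity has an invalid or unsupported hardware version",
    0x05: "entity contains an invalid or unsupported firmware or software version",
    0x06: "hardware change detected on entity was successful",
    0x07: "software or firmware change detected on entity was successful",
}
INFORMATIONAL_OFFSETS = {0x00, 0x01}


def decode_version_change(value: int) -> List[Tuple[int, str, bool]]:
    """List (offset, condition, informational) for every bit set in a reading (hypothetical helper)."""
    return [
        (offset, text, offset in INFORMATIONAL_OFFSETS)
        for offset, text in sorted(VERSION_CHANGE_CONDITIONS.items())
        if value & (1 << offset)
    ]


for offset, text, info in decode_version_change(0b00000101):
    print(f"{offset:02X}h: {text}{' (informational)' if info else ''}")
```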
The following table lists the SNMP alarms that are registered when a discrete sensor event occurs on the AMC671 (MCH) card.
SNMP Alarm Number | Alarm Name | Clearing Alarm |
---|---|---|
6310 | | 6311 |
6311 | | N/A |
6320 | powerFaultAssert | 6321 |
6321 | powerFaultDeassert | N/A |
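The POWER GOOD sensor (Mezz POWER GOOD in hwmon) raises powerFaultAssert on a transition to 'false' and powerFaultDeassert on a transition to 'true', which are alarms 6320 and 6321 in this table. A Python sketch of that mapping follows, combining the sensor encoding (false = 1, true = 2) with the alarm numbers. The function name `power_good_alarm` and the choice to compare a previous and a current reading are assumptions for illustration.

```python
from typing import Optional, Tuple

# PICMG boolean encoding used by the POWER GOOD sensor.
POWER_GOOD_FALSE = 1
POWER_GOOD_TRUE = 2


def power_good_alarm(previous: int, current: int) -> Optional[Tuple[int, str]]:
    """Return (SNMP alarm number, alarm name) for a POWER GOOD reading change, or None.

    Hypothetical helper: compares two successive raw sensor readings.
    """
    for reading in (previous, current):
        if reading not in (POWER_GOOD_FALSE, POWER_GOOD_TRUE):
            raise ValueError(f"unexpected POWER GOOD reading: {reading}")
    if previous == current:
        return None  # no state transition, no alarm
    if current == POWER_GOOD_FALSE:
        return (6320, "powerFaultAssert")
    return (6321, "powerFaultDeassert")  # clearing alarm


print(power_good_alarm(POWER_GOOD_TRUE, POWER_GOOD_FALSE))  # (6320, 'powerFaultAssert')
print(power_good_alarm(POWER_GOOD_FALSE, POWER_GOOD_TRUE))  # (6321, 'powerFaultDeassert')
```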