
Hardware alarms are generated on the DSC 8000 chassis to indicate the status of system components. System components include the chassis, Fan Trays, and Power Supplies.

The alarms are accessed from the alarm dashboard in the Web UI. The alarm dashboard indicates system status by showing the number of critical, major, and minor alarms generated by system events. Open an individual alarm in the dashboard to determine its origin. To access the alarms, see section Alarm Dashboard Web User Interface in the Alarms Guide. To view details of individual alarms, see section View Alarms.

The hardware alarms are generated by hardware sensors monitoring system components in the chassis. There are two types of hardware sensors on the DSC 8000:

  • Threshold sensors
  • Discrete sensors
Note

All system component sensors and alarms are designated as chassis alarms and are monitored by the AMC671 (MCH) card in each chassis. As many as four chassis can be present in a multi-shelf DSC 8000 system. Each MCH monitors its own local sensors as well as system component sensors in each chassis. System component sensors and alarms are accessed through the AMC671 (MCH) slot interface in the Web UI. For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.

Threshold sensors and the corresponding SNMP threshold alarms are described in the following sections:

  • Chassis Threshold Sensors and Alarms
  • SNMP Threshold Sensor Alarms

Discrete sensors and the corresponding SNMP alarms are described in the following sections:

  • Chassis Discrete Sensors and Alarms
  • SNMP Discrete Sensor Alarms

Note

When an alarm is raised for a system component, each MCH card in the chassis reports the alarm. The slot number for the alarm is set to the reporting MCH's slot number. Each instance of the hwmon application receives notification from both MCH cards in the chassis, and both applications respond with an alarm. Therefore, you will see up to four alarms and traps associated with sensor events for system components: one alarm from each instance of hwmon and one alarm from each MCH card in the chassis. For more information on viewing chassis alarms, refer to AMC671 (MCH) Alarms.

Note

System components follow a consistent naming convention in the Web UI. For example, system Fan Trays are labelled as Cooling Units (CUs) in the Web UI. Similarly, system Power Supplies are labelled as Power Modules (PMs).

Note

Alarm details provided in the Web UI include the hardware sensor's Portal ID and IPMI number, which are dynamically assigned. The ID assignments change when a card is removed from or a new card is inserted in the DSC 8000; therefore, do not use them to identify a hardware alarm.
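
The notes above imply that a single sensor event can surface as several alarm records, and that dynamically assigned IDs are not stable keys. The following is a minimal sketch of how a monitoring script might group such records by event. The `AlarmRecord` type, its field names, the slot numbers, and the hwmon instance labels are all hypothetical and are not taken from the DSC 8000 software.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    # Hypothetical record layout, for illustration only.
    alarm_number: int    # SNMP alarm number, for example 6328
    sensor_alias: str    # HWMON alias, for example "Hot-swap CU 1"
    reporting_slot: int  # slot of the MCH card that reported the alarm
    hwmon_instance: str  # label of the hwmon instance that raised it


def group_by_event(records):
    """Group alarm records that describe the same sensor event.

    The key omits the reporting slot and the hwmon instance, which differ
    between duplicates, and never uses a Portal ID or IPMI number, which
    change when cards are inserted or removed.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record.alarm_number, record.sensor_alias)].append(record)
    return dict(groups)


# Four reports of one extraction event plus one unrelated alarm.
records = [
    AlarmRecord(6328, "Hot-swap CU 1", 1, "hwmon-a"),
    AlarmRecord(6328, "Hot-swap CU 1", 2, "hwmon-a"),
    AlarmRecord(6328, "Hot-swap CU 1", 1, "hwmon-b"),
    AlarmRecord(6328, "Hot-swap CU 1", 2, "hwmon-b"),
    AlarmRecord(6320, "PM1-PM Status", 1, "hwmon-a"),
]
events = group_by_event(records)
```

Keying on the alarm number and the sensor alias keeps the grouping stable across card insertions and removals.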

Chassis Threshold Sensors and Alarms

All threshold sensors available on DSC 8000 cooling units and power modules generate alarms. An alarm is generated by a threshold sensor event. Threshold sensor events consist of voltage or temperature values crossing a pre-defined threshold level.

The following table describes the threshold hardware sensors available on the DSC 8000 chassis.

Chassis threshold hardware sensors available in HWMON

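| IPMI Sensor Name | Alias in HWMON | Description | Units | SNMP Alarms | Alarm Event |
| --- | --- | --- | --- | --- | --- |
| EXHAUST TEMP | CU1-EXHAUST TEMP | Cooling unit exhaust temperature on Cooling Unit 1. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| EXHAUST TEMP | CU2-EXHAUST TEMP | Cooling unit exhaust temperature on Cooling Unit 2. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Exhaust Temp (lower case) | PM1-Exhaust Temp | Power module exhaust temperature on Power Module 1. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Exhaust Temp (lower case) | PM2-Exhaust Temp | Power module exhaust temperature on Power Module 2. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Exhaust Temp (lower case) | PM4-Exhaust Temp | Power module exhaust temperature on Power Module 4. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| INLET TEMP | CU1-INLET TEMP | Cooling unit air inlet temperature on Cooling Unit 1. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| INLET TEMP | CU2-INLET TEMP | Cooling unit air inlet temperature on Cooling Unit 2. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Intake Temp | PM1-Intake Temp | Power module air intake temperature on Power Module 1. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Intake Temp | PM2-Intake Temp | Power module air intake temperature on Power Module 2. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| Intake Temp | PM4-Intake Temp | Power module air intake temperature on Power Module 4. | Degrees C | 6348 - 6351 | Critical and Major alarms on threshold crossings (Minor alarms are not generated starting in Release 15.0) |
| SMP | PM1-SMP | Power Module Shared Management Power (SMP) voltage for Power Module 1. Each power module generates 5V, which is diode-ORed on the backplane. The shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| SMP | PM2-SMP | Power Module Shared Management Power (SMP) voltage for Power Module 2. Each power module generates 5V, which is diode-ORed on the backplane. The shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| SMP | PM4-SMP | Power Module Shared Management Power (SMP) voltage for Power Module 4. Each power module generates 5V, which is diode-ORed on the backplane. The shared voltage is used to generate the VDD_EMMC voltage on each power module. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 12V | CU1-12V | 12V voltage sensor on Cooling Unit 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 12V | CU2-12V | 12V voltage sensor on Cooling Unit 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 12V | PM1-12V | 12V voltage sensor on Power Module 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 12V | PM2-12V | 12V voltage sensor on Power Module 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 12V | PM4-12V | 12V voltage sensor on Power Module 4. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 3.3V | PM1-3.3V | 3.3V voltage sensor on Power Module 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 3.3V | PM2-3.3V | 3.3V voltage sensor on Power Module 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 3.3V | PM4-3.3V | 3.3V voltage sensor on Power Module 4. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 3.3V MGMT | CU1-3.3V MGMT | Cooling unit 3.3V management (IPMI) power sensor on Cooling Unit 1. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| 3.3V MGMT | CU2-3.3V MGMT | Cooling unit 3.3V management (IPMI) power sensor on Cooling Unit 2. | Volts | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| Iout | PM1-Iout 0, PM2-Iout 0, PM4-Iout 0 | Each PM has 3 power bricks that supply current to the system modules. Sensor Iout 0 monitors output current for power brick 0 on each Power Module. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| Iout | PM1-Iout 1, PM2-Iout 1, PM4-Iout 1 | Each PM has 3 power bricks that supply current to the system modules. Sensor Iout 1 monitors output current for power brick 1 on each Power Module. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| Iout | PM1-Iout 2, PM2-Iout 2, PM4-Iout 2 | Each PM has 3 power bricks that supply current to the system modules. Sensor Iout 2 monitors output current for power brick 2 on each Power Module. | Amperes | 6346 - 6351 | Critical, Major, and Minor alarms on threshold crossings. |
| Fan | CU1-Fan 1, CU2-Fan 1 | Each cooling unit has 5 cooling fans. The sensor Fan 1 monitors the speed of fan 1 on each cooling unit. | RPM | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| Fan | CU1-Fan 2, CU2-Fan 2 | Each cooling unit has 5 cooling fans. The sensor Fan 2 monitors the speed of fan 2 on each cooling unit. | RPM | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| Fan | CU1-Fan 3, CU2-Fan 3 | Each cooling unit has 5 cooling fans. The sensor Fan 3 monitors the speed of fan 3 on each cooling unit. | RPM | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| Fan | CU1-Fan 4, CU2-Fan 4 | Each cooling unit has 5 cooling fans. The sensor Fan 4 monitors the speed of fan 4 on each cooling unit. | RPM | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| Fan | CU1-Fan 5, CU2-Fan 5 | Each cooling unit has 5 cooling fans. The sensor Fan 5 monitors the speed of fan 5 on each cooling unit. | RPM | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| VDD_EMMC | PM1-VDD_EMMC | Power Module EMMC (IPMI controller) voltage on Power Module 1. | Volts | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| VDD_EMMC | PM2-VDD_EMMC | Power Module EMMC (IPMI controller) voltage on Power Module 2. | Volts | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
| VDD_EMMC | PM4-VDD_EMMC | Power Module EMMC (IPMI controller) voltage on Power Module 4. | Volts | 6346 - 6351 | Alarms on lower minor, major, and critical threshold crossings. |
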
Caution

Some versions of DSC software generate minor temperature alarms when the monitored temperature crosses the Upper Non-Critical (UNC) temperature threshold. Minor temperature threshold crossing events are required for stable operation of the cooling sub-system, and the resulting alarms can be ignored.

Starting in DSC software Release 15.0, temperature alarms are generated only when the monitored temperature crosses the Upper Critical (UC) or Upper Non-Recoverable (UNR) temperature threshold, which generate major and critical alarms, respectively.
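
The caution above can be summarized as a small lookup. The following is a minimal sketch, not product code: the function name and the release tuple format are illustrative, and it assumes that every release earlier than 15.0 raises the minor (UNC) alarm, which the text states only for "some versions".

```python
# Threshold-to-severity mapping described in the caution above.
SEVERITY_BY_THRESHOLD = {
    "UNC": "minor",     # Upper Non-Critical
    "UC": "major",      # Upper Critical
    "UNR": "critical",  # Upper Non-Recoverable
}


def temperature_alarm_severity(threshold, release):
    """Return the alarm severity for a crossed upper temperature threshold.

    threshold: "UNC", "UC", or "UNR".
    release: (major, minor) tuple, for example (15, 0).
    Returns None when no alarm is generated.
    Assumption: releases earlier than 15.0 raise the minor (UNC) alarm.
    """
    if threshold == "UNC" and release >= (15, 0):
        return None
    return SEVERITY_BY_THRESHOLD[threshold]
```

For example, `temperature_alarm_severity("UNC", (15, 0))` returns `None`, while the same crossing on an assumed earlier release such as `(14, 2)` returns `"minor"`.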

Threshold Sensor Events

Threshold sensor event severity levels are defined as follows:

  • Noncritical: This is a warning that one or more operating specifications are somewhat out of normal range, but there is not yet a problem to be addressed. Noncritical events are for information only, and they do not indicate that the DSC 8000 chassis is outside of operating limits. In general, no action is required. However, in certain contexts, system/shelf management software may initiate preventive action. For example, if several cards in a shelf report upper noncritical temperature events, the shelf manager may decide to increase fan speed.

  • Critical: The DSC 8000 chassis is operating within specified tolerances, but one or more specifications are getting close to the critical thresholds. Critical events indicate that the card is still within its operating limits, but it is close to exceeding one of those limits. Possible action in this case is to closely monitor the alarming sensor and take more aggressive action if it approaches the nonrecoverable threshold.

  • Nonrecoverable: The DSC 8000 chassis is no longer operating within specified tolerances. Nonrecoverable events indicate that the card may no longer be functioning because it is now outside of its operating limits. Action is likely required or has already been taken by the local hardware/firmware. For example, a processor may shut itself down because its maximum die temperature was exceeded, or a shelf manager may deactivate the card because the processor is too hot.

Chassis Voltage Sensor Threshold Levels

The following table shows the voltage threshold levels for the voltage sensors on the DSC 8000 chassis. The threshold sensors trigger an SNMP alarm when a pre-defined sensor threshold level is crossed.

Chassis Voltage Threshold Sensor Levels

| Sensor Name | Units | Lower Non-Recoverable Threshold (Alarm 6350) | Lower Critical Threshold (Alarm 6348) | Lower Non-Critical Threshold (Alarm 6346) | Upper Non-Critical Threshold (Alarm 6344) | Upper Critical Threshold (Alarm 6342) | Upper Non-Recoverable Threshold (Alarm 6340) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 12V | Volts | 9.06 | 10.02 | 10.86 | 13.26 | 14.10 | 15.06 |
| 3.3V | Volts | 3.313 | 3.372 | 3.387 | 3.71 | 3.725 | 3.754 |
| 3.3V MGMT | Volts | 0.001 | 3.019 | 3.078 | 3.549 | 3.607 | 3.803 |
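
As an illustration of how a reading relates to these levels, the following sketch classifies a voltage against the 12V and 3.3V MGMT rows of the table above. It is not the hwmon implementation. The function name is hypothetical, and the comparison rule (a reading at or beyond a level counts as a crossing, with the most severe level reported) is an assumption.

```python
# Threshold levels in volts, copied from the table above, in the order:
# lower non-recoverable, lower critical, lower non-critical,
# upper non-critical, upper critical, upper non-recoverable.
THRESHOLDS = {
    "12V": (9.06, 10.02, 10.86, 13.26, 14.10, 15.06),
    "3.3V MGMT": (0.001, 3.019, 3.078, 3.549, 3.607, 3.803),
}

# Threshold names and alarm numbers from the table header, most severe first.
LOWER = (("Lower Non-Recoverable", 6350),
         ("Lower Critical", 6348),
         ("Lower Non-Critical", 6346))
UPPER = (("Upper Non-Recoverable", 6340),
         ("Upper Critical", 6342),
         ("Upper Non-Critical", 6344))


def crossed_threshold(sensor, volts):
    """Return (threshold name, alarm number) for the most severe level
    the reading has crossed, or None if the reading is in range.

    Assumption: a reading equal to a level counts as a crossing.
    """
    lnr, lc, lnc, unc, uc, unr = THRESHOLDS[sensor]
    for limit, result in zip((lnr, lc, lnc), LOWER):
        if volts <= limit:
            return result
    for limit, result in zip((unr, uc, unc), UPPER):
        if volts >= limit:
            return result
    return None
```

With these values, a 12V reading of 10.5 falls between the lower critical and lower non-critical levels, so the sketch reports the Lower Non-Critical threshold and alarm 6346.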

 

SNMP Threshold Sensor Alarms

This section describes the SNMP alarms generated by threshold sensor events on the DSC 8000 chassis.

The following SNMP alarms are registered when a threshold sensor event occurs on the DSC 8000 chassis.

DSC 8000 SNMP Threshold Sensor Alarms

 

Chassis Component FRU Data Records

DSC 8000 chassis system components include fan trays and power supplies. System components are monitored by the AMC671 (MCH) cards located in the same chassis. Each power supply and fan tray has a unique ID in the chassis. An FRU data record is available for system components managed by the MCH card. 

FRU data records contain information on manufacturing data associated with each component, including:

  • Manufacturing date
  • Serial number
  • Product part number

To access a system component FRU data record, access the Web UI and select the Hardware Monitor application (hwmon) from the System menu. The MCH slots in the chassis are presented. Click on one of the MCH slots to access FRU data on system components managed by the selected MCH card. FRU data records for power supplies are listed in the table, 'Power Supply Selection'. FRU data records for fan trays are listed in the table, 'Fan Tray Selection'.

The following table describes each FRU data record available on the DSC 8000 chassis.

DSC 8000 Chassis FRU Data Records

| System Component | Component ID | Product Name | Description |
| --- | --- | --- | --- |
| Fan Tray | Fan Tray ID_0 | Altamont | FRU Data Record for the Fan Tray in slot CU1 |
| Fan Tray | Fan Tray ID_1 | Altamont | FRU Data Record for the Fan Tray in slot CU2 |
| Power Supply | Power Supply ID_0 | MTC6213 | FRU Data Record for the Power Supply in slot PM1 |
| Power Supply | Power Supply ID_1 | MTC6213 | FRU Data Record for the Power Supply in slot PM2 |
| Power Supply | Power Supply ID_2 | MTC6213 | FRU Data Record for the Power Supply in slot PM4 |

 

Chassis Discrete Sensors and Alarms

Cooling units and power modules are defined as system components in the DSC 8000 chassis. An alarm is generated when a cooling unit or power module is inserted into or extracted from the chassis.

The following table describes the discrete hardware sensors on the DSC 8000 chassis. Discrete sensors return values of 'on' and 'off' or 'true' and 'false'. Each entity in the system has a 'Version Change' sensor that reports the entity's FRU state. These states are described in the Intelligent Platform Management Interface Specification, Second Generation (v2.0).

Chassis discrete hardware sensors available in HWMON

| IPMI Sensor Name | Alias in HWMON | Description | SNMP Alarms | Alarm Event |
| --- | --- | --- | --- | --- |
| PM Status | PM1-PM Status | Each power module has a status sensor. PM1-PM Status is the status sensor on Power Module 1. The sensor returns the following values: 1 = PM OK, 2 = Failure Detected. | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm is raised. On transition to 'PM OK', a powerFaultDeassert alarm (clearing event) is raised. |
| PM Status | PM2-PM Status | Each power module has a status sensor. PM2-PM Status is the status sensor on Power Module 2. The sensor returns the following values: 1 = PM OK, 2 = Failure Detected. | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm is raised. On transition to 'PM OK', a powerFaultDeassert alarm (clearing event) is raised. |
| PM Status | PM4-PM Status | Each power module has a status sensor. PM4-PM Status is the status sensor on Power Module 4. The sensor returns the following values: 1 = PM OK, 2 = Failure Detected. | 6320 powerFaultAssert, 6321 powerFaultDeassert | On transition to 'Failure Detected', a powerFaultAssert alarm is raised. On transition to 'PM OK', a powerFaultDeassert alarm (clearing event) is raised. |
| Telco Alarm | Telco Alarm | Indicates the state of the Cut-off switch. | N/A | No alarms are associated with this entity. |
| Hot-swap | Hot-swap CU 1 (note 1) | Hot-swap sensor for Cooling Unit 1. | 6328 sysComponentExtracted, 6329 sysComponentInserted | 6328: A system-level component has been deactivated or extracted. The component that was extracted is identified in the alarm. 6329: A system-level component has been activated or inserted. This is a clearing alarm. |
| Hot-swap | Hot-swap CU 2 (note 1) | Hot-swap sensor for Cooling Unit 2. | 6328 sysComponentExtracted, 6329 sysComponentInserted | 6328: A system-level component has been deactivated or extracted. The component that was extracted is identified in the alarm. 6329: A system-level component has been activated or inserted. This is a clearing alarm. |
| Hot-swap | Hot-swap PM 1 (note 1) | Hot-swap sensor for Power Module 1. | 6328 sysComponentExtracted, 6329 sysComponentInserted | 6328: A system-level component has been deactivated or extracted. The component that was extracted is identified in the alarm. 6329: A system-level component has been activated or inserted. This is a clearing alarm. |
| Hot-swap | Hot-swap PM 2 (note 1) | Hot-swap sensor for Power Module 2. | 6328 sysComponentExtracted, 6329 sysComponentInserted | 6328: A system-level component has been deactivated or extracted. The component that was extracted is identified in the alarm. 6329: A system-level component has been activated or inserted. This is a clearing alarm. |
| Hot-swap | Hot-swap PM 4 (note 1) | Hot-swap sensor for Power Module 4. | 6328 sysComponentExtracted, 6329 sysComponentInserted | 6328: A system-level component has been deactivated or extracted. The component that was extracted is identified in the alarm. 6329: A system-level component has been activated or inserted. This is a clearing alarm. |
| Hot-swap | Hot-swap Telco (note 1) | Hot-swap sensor for the Telco Alarm Panel software entity. The Telco entity this sensor belongs to is a software construct. It does not refer to the physical alarm panel interface. No alarms are raised for this sensor. | N/A | No alarms are associated with this entity. |
| Version Change | CU1-Version Change (note 2) | This sensor reports the FRU state on Cooling Unit 1. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by the value of 00h is present. | N/A | Informational sensor only. No alarms are generated. |
| Version Change | CU2-Version Change (note 2) | This sensor reports the FRU state on Cooling Unit 2. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by the value of 00h is present. | N/A | Informational sensor only. No alarms are generated. |
| Version Change | PM1-Version Change (note 2) | This sensor reports the FRU state on Power Module 1. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by the value of 00h is present. | N/A | Informational sensor only. No alarms are generated. |
| Version Change | PM2-Version Change (note 2) | This sensor reports the FRU state on Power Module 2. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by the value of 00h is present. | N/A | Informational sensor only. No alarms are generated. |
| Version Change | PM4-Version Change (note 2) | This sensor reports the FRU state on Power Module 4. The sensor returns one bit for each of eight possible FRU conditions. For example, if bit 0 is set, the condition defined by the value of 00h is present. | N/A | Informational sensor only. No alarms are generated. |

1 Each entity in the system has a hot-swap sensor that reports the entity's FRU state. These states are described in the PICMG® 3.0 AdvancedTCA® Base Specification. The sensor returns one bit for each of the eight states, M0 - M7, as defined in the specification. For example, if bit 0 is set, the FRU is in state M0. Similarly, if bit 4 is set, the sensor returns a value of 16 (00010000b), which is the Normal (Active) state, M4.

The state values include:

[7] – 1b: FRU Operational State M7 = Communication Lost

[6] – 1b: FRU Operational State M6 = FRU Deactivation In Progress

[5] – 1b: FRU Operational State M5 = FRU Deactivation Request

[4] – 1b: FRU Operational State M4 = FRU Active

[3] – 1b: FRU Operational State M3 = FRU Activation in Progress

[2] – 1b: FRU Operational State M2 = FRU Activation Request

[1] – 1b: FRU Operational State M1 = FRU Inactive

[0] – 1b: FRU Operational State M0 = FRU Not Installed
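
The bit assignments listed above can be decoded with a simple mask test. The following is a minimal sketch; the function name is illustrative and the code is not part of the DSC 8000 software.

```python
# Bit position to FRU operational state, as listed above.
FRU_STATES = {
    0: "M0 = FRU Not Installed",
    1: "M1 = FRU Inactive",
    2: "M2 = FRU Activation Request",
    3: "M3 = FRU Activation in Progress",
    4: "M4 = FRU Active",
    5: "M5 = FRU Deactivation Request",
    6: "M6 = FRU Deactivation In Progress",
    7: "M7 = Communication Lost",
}


def decode_hot_swap(value):
    """Return the FRU states whose bits are set in an 8-bit sensor value."""
    return [name for bit, name in FRU_STATES.items() if value & (1 << bit)]
```

A sensor value of 16 has only bit 4 set, so `decode_hot_swap(16)` returns the single entry for state M4.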

2 Each entity in the system has a 'Version Change' sensor that reports a change in the entity's FRU state. These conditions are described in the Intelligent Platform Management Interface Specification, Second Generation (v2.0). The sensor returns one bit for each of eight possible conditions. For example, if bit 0 is set, then the condition defined by the value of 00h is present. The eight conditions include the following:

00h: hardware change detected (informational). This offset does not indicate whether the hardware change was successful or not, only that a change occurred.

01h: firmware or software change detected (informational).

02h: hardware incompatibility detected

03h: firmware or software incompatibility detected

04h: entity has an invalid or unsupported hardware version

05h: entity contains an invalid or unsupported firmware or software version

06h: hardware change detected on entity was successful (de-assertion event = unsuccessful)

07h: software or firmware change detected on entity was successful (de-assertion event = unsuccessful)
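
The conditions above can be read from the sensor value in the same bitwise way. The sketch below assumes that bit n of the value corresponds to offset n (the text states this only for bit 0 and offset 00h). The function names and the shortened descriptions are illustrative.

```python
# Offset to condition, shortened from the list above.
VERSION_CHANGE_CONDITIONS = {
    0x00: "hardware change detected",
    0x01: "firmware or software change detected",
    0x02: "hardware incompatibility detected",
    0x03: "firmware or software incompatibility detected",
    0x04: "invalid or unsupported hardware version",
    0x05: "invalid or unsupported firmware or software version",
    0x06: "hardware change successful",
    0x07: "software or firmware change successful",
}

# Bits 2 to 5: the incompatible, invalid, or unsupported conditions.
PROBLEM_MASK = 0b00111100


def decode_version_change(value):
    """Return (offset, condition) pairs for each bit set in the value.

    Assumption: bit n of the value corresponds to offset n.
    """
    return [(offset, text)
            for offset, text in VERSION_CHANGE_CONDITIONS.items()
            if value & (1 << offset)]


def reports_problem(value):
    """True if any of offsets 02h to 05h is present in the value."""
    return bool(value & PROBLEM_MASK)
```

Because the sensor is informational and raises no alarms, a check such as `reports_problem` is only a convenience for spotting offsets 02h to 05h when reviewing sensor values.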

 

SNMP Discrete Sensor Alarms

This section describes the SNMP alarms generated by discrete sensor events on the DSC 8000 chassis.

The following SNMP alarms are registered when a discrete sensor event occurs on the DSC 8000 chassis.

DSC 8000 SNMP Discrete Sensor Alarms

| SNMP Alarm Number | Alarm Name | Clearing Alarm |
| --- | --- | --- |
| 6320 | powerFaultAssert | 6321 |
| 6321 | powerFaultDeassert | N/A |
| 6328 | sysComponentExtracted | 6329 |
| 6329 | sysComponentInserted | N/A |
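
The raise and clear pairing in the table above can be tracked with a small amount of state. The following is a minimal sketch, assuming that a clearing alarm is matched to the raised alarm by the same sensor alias. The function name and the set-based bookkeeping are illustrative, not the product's behavior.

```python
# Raised alarm number to its clearing alarm number, from the table above.
CLEARED_BY = {6320: 6321, 6328: 6329}
# Reverse lookup: clearing alarm number to the alarm it clears.
CLEARS = {clearing: raised for raised, clearing in CLEARED_BY.items()}


def apply_alarm(active, alarm_number, sensor_alias):
    """Update the set of active (alarm_number, sensor_alias) pairs in place.

    Assumption: a clearing alarm clears the raised alarm that carries the
    same sensor alias. Unknown alarm numbers are ignored.
    """
    if alarm_number in CLEARED_BY:
        active.add((alarm_number, sensor_alias))
    elif alarm_number in CLEARS:
        active.discard((CLEARS[alarm_number], sensor_alias))
    return active


active = set()
apply_alarm(active, 6328, "Hot-swap CU 1")  # cooling unit extracted
apply_alarm(active, 6320, "PM1-PM Status")  # power fault asserted
apply_alarm(active, 6329, "Hot-swap CU 1")  # cooling unit inserted again
```

After these three events, only the power fault on PM1 remains active.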

    
