Cisco UCS
How to recover the only administrator account for Cisco UCS Manager
If you lose or forget the password for the only administrator account, you cannot retrieve the original password. You can, however, recover the account by setting a new password, which requires power cycling all fabric interconnects (FIs) in the Cisco UCS domain.
You can reset the password for all other local accounts through Cisco UCS Manager. However, you must log in to Cisco UCS Manager with an account that has aaa or admin privileges. If you do not have access to such an account, read on:
Here is how to do it, step by step:
Prerequisite 1: Determining the Leadership Role of a Fabric Interconnect
- In the Navigation pane, click the Equipment tab.
- In the Equipment tab, expand Equipment > Fabric Interconnects.
- Click the fabric interconnect for which you want to identify the role.
- In the Work pane, click the General tab.
- In the General tab, click the down arrows on the High Availability Details bar to expand that area.
- View the Leadership field to determine whether the fabric interconnect is the primary or subordinate.
Prerequisite 2: Verifying the Firmware Versions on a Fabric Interconnect
You can use the following procedure to verify the firmware versions on all fabric interconnects in a Cisco UCS domain. You can verify the firmware for a single fabric interconnect through the Installed Firmware tab for that fabric interconnect.
- In the Navigation pane, click the Equipment tab.
- In the Equipment tab, click the Equipment node.
- In the Work pane, click the Firmware Management tab.
- In the Installed Firmware tab, verify that the following firmware versions for each fabric interconnect match the version to which you updated the firmware:
Kernel version
System version
Scenario 1: Recovering the Admin Account Password in a Standalone Configuration
This procedure will help you to recover the password that you set for the admin account when you performed an initial system setup on the fabric interconnect. The admin account is the system administrator or superuser account.
Before You Begin:
- Physically connect the console port on the fabric interconnect to a computer terminal or console server
- Determine the running versions of the following firmware:
The firmware kernel version on the fabric interconnect
The firmware system version
- Connect to the console port.
- Power cycle the fabric interconnect:
Turn off the power to the fabric interconnect.
Turn on the power to the fabric interconnect.
- In the console, press one of the following key combinations as it boots to get the loader prompt:
Ctrl+l
Ctrl+Shift+r
You may need to press the selected key combination multiple times before your screen displays the loader prompt.
- Boot the kernel firmware version on the fabric interconnect.
loader > boot /installables/switch/kernel_firmware_version
Example:
loader > boot /installables/switch/ucs-6100-k9-kickstart.4.1.3.N2.1.0.11.gbin
- Enter config terminal mode.
Fabric(boot)# config terminal
- Reset the admin password.
Fabric(boot)(config)# admin-password password
Choose a strong password that includes at least one capital letter and one number. The password cannot be blank. The new password displays in clear text mode.
- Exit config terminal mode and return to the boot prompt.
- Boot the system firmware version on the fabric interconnect.
Fabric(boot)# load /installables/switch/system_firmware_version
Example:
Fabric(boot)# load /installables/switch/ucs-6100-k9-system.4.1.3.N2.1.0.211.bin
- After the system image loads, log in to Cisco UCS Manager.
Scenario 2: Recovering the Admin Account Password in a Cluster Configuration
This procedure will help you to recover the password that you set for the admin account when you performed an initial system setup on the fabric interconnects. The admin account is the system administrator or superuser account.
Before You Begin
- Physically connect a console port on one of the fabric interconnects to a computer terminal or console server.
- Obtain the following information:
The firmware kernel version on the fabric interconnect
The firmware system version
Which fabric interconnect has the primary leadership role and which is the subordinate
- Connect to the console port.
- Power cycle the subordinate fabric interconnect:
- Turn off the power to the fabric interconnect.
- Turn on the power to the fabric interconnect.
- In the console, press one of the following key combinations as it boots to get the loader prompt:
Ctrl+l
Ctrl+Shift+r
You may need to press the selected key combination multiple times before your screen displays the loader prompt.
- Power cycle the primary fabric interconnect:
- Turn off the power to the fabric interconnect.
- Turn on the power to the fabric interconnect.
- In the console, press one of the following key combinations as it boots to get the loader prompt:
Ctrl+l
Ctrl+Shift+r
You may need to press the selected key combination multiple times before your screen displays the loader prompt.
- Boot the kernel firmware version on the primary fabric interconnect.
loader > boot /installables/switch/kernel_firmware_version
Example:
loader > boot /installables/switch/ucs-6100-k9-kickstart.4.1.3.N2.1.0.11.gbin
- Enter config terminal mode.
Fabric(boot)# config terminal
- Reset the admin password.
Fabric(boot)(config)# admin-password password
Choose a strong password that includes at least one capital letter and one number. The password cannot be blank. The new password displays in clear text mode.
- Exit config terminal mode and return to the boot prompt.
- Boot the system firmware version on the primary fabric interconnect.
Fabric(boot)# load /installables/switch/system_firmware_version
Example:
Fabric(boot)# load /installables/switch/ucs-6100-k9-system.4.1.3.N2.1.0.211.bin
- After the system image loads, log in to Cisco UCS Manager.
- In the console for the subordinate fabric interconnect, do the following to bring it up:
- Boot the kernel firmware version on the subordinate fabric interconnect.
loader > boot /installables/switch/kernel_firmware_version
- Boot the system firmware version on the subordinate fabric interconnect.
Fabric(boot)# load /installables/switch/system_firmware_version
How to install memory in a Cisco UCS B200 M3
To install a DIMM into the blade server, follow these steps:
Procedure
Step 1: Open both DIMM connector latches.

Step 2: Press the DIMM into its slot evenly on both ends until it clicks into place.
DIMMs are keyed; if gentle force is not sufficient, make sure the notch on the DIMM is correctly aligned.
Note: Be sure that the notch in the DIMM aligns with the slot. If the notch is misaligned you may damage the DIMM, the slot, or both.
Step 3: Press the DIMM connector latches inward slightly to seat them fully.
Supported DIMMs
The DIMMs supported in this blade server are constantly being updated. A list of currently supported and available DIMMs is in the specification sheets at:
http://www.cisco.com/en/US/products/ps10280/products_data_sheets_list.html
Cisco does not support third-party memory DIMMs, and in some cases their use may irreparably damage the server and require an RMA and downtime.
Memory Arrangement
The blade server contains 24 DIMM slots—12 for each CPU. Each set of 12 DIMM slots is arranged into four channels, where each channel has three DIMMs.

Figure legend: 1 = Channels A–D for CPU 1; 2 = Channels E–H for CPU 2
DIMMs and Channels
Each channel is identified by a letter—A, B, C, D for CPU 1, and E, F, G, H for CPU 2. Each DIMM slot is numbered 0, 1, or 2. Note that each DIMM slot 0 is blue, each slot 1 is black, and each slot 2 is off-white or beige.
The figure below shows how DIMMs and channels are physically laid out on the blade server. The DIMM slots in the upper and lower right are associated with the second CPU (CPU shown on right in the diagram), while the DIMM slots in the upper and lower left are associated with the first CPU (CPU shown on left).
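The channel-to-CPU mapping described above can be captured in a small lookup table. Here is a sketch in Python, using only the channel letters, slot numbers, and slot colors stated in the text:

```python
# DIMM channel layout for the UCS B200 M3, as described above:
# channels A-D belong to CPU 1, channels E-H to CPU 2; each channel
# has three slots, colored by slot number.
SLOT_COLORS = {0: "blue", 1: "black", 2: "off-white"}

def cpu_for_channel(channel: str) -> int:
    """Return which CPU (1 or 2) a DIMM channel letter belongs to."""
    ch = channel.upper()
    if len(ch) == 1 and ch in "ABCD":
        return 1
    if len(ch) == 1 and ch in "EFGH":
        return 2
    raise ValueError(f"unknown channel: {channel!r}")

print(cpu_for_channel("B"), SLOT_COLORS[1])  # -> 1 black
```

A table like this is handy when scripting population checks, since DIMMs should be installed in matched sets across channels.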

Courtesy: Cisco
What is FlexPod?
FlexPod is an integrated computing, networking, and storage solution developed by Cisco and NetApp. Its configurations and workloads are published as Cisco Validated Designs. FlexPod is categorized by established and emerging client needs:
- FlexPod Data Center was developed for large enterprises.
- FlexPod Express serves small and medium-sized enterprises.
- FlexPod Select focuses on high capacity and performance for specialized workloads.
Cisco and NetApp support FlexPod deployments through the Cooperative Support Model.
FlexPod components include Cisco Unified Computing System (Cisco UCS) servers, Cisco Nexus switches, and NetApp unified storage systems.
The FlexPod architecture can scale up or out, and it can be optimized for a variety of mixed workloads in both virtualized and nonvirtualized environments.
Large enterprise data centers need agile platforms with high availability and scalable storage. Along with reducing operating costs, chief information officers want to use a converged infrastructure to support hybrid cloud computing.
Small and medium-sized enterprises need a simplified setup and easy use, access to public cloud services, and greater value for their data center budgets.
Many enterprises also need purpose-built, high-capacity platforms for specialized workloads. Large-scale, real-time data analytics place unique demands on computing stacks. Video surveillance, in-memory databases, and public cloud infrastructures have similar capacity needs.
FlexPod’s architecture can be configured for the growing needs of all these clients. FlexPod has been deployed by more than 4100 customers and is available in more than 100 countries. For the future, FlexPod customers and partners want configuration guidance, easy ordering, and validation for the configuration that is most aligned with their needs.
To meet these emerging trends, FlexPod delivers three named configurations:
- FlexPod Data Center
- FlexPod Express
- FlexPod Select
Cisco and NetApp support FlexPod through a Cooperative Support Model: customers receive a best-in-class experience from NetApp, Cisco, and ecosystem partners, delivered through collaborative and coordinated support services for the FlexPod integrated infrastructure.
FlexPod benefits from integrated management in the form of Cisco UCS Director. UCS Director supports cohesive, flexible data centers, built on FlexPod, that increase IT and business agility while reducing operational processes and expenses.
An overview of the FlexPod solution is available through an iPad app.
FlexPod’s architectural flexibility is underpinned by a series of Cisco Validated Designs. These guides cover the important areas of the FlexPod infrastructure, applications on FlexPod, and security.
Featured Validated Designs
- Infrastructure
- FlexPod Datacenter with VMware vSphere 5.5 and Cisco UCS Director (PDF – 12.1 MB)
- FlexPod with Cisco UCS Mini Design Guide (PDF – 2.1 MB)
- FlexPod Datacenter with VMware vSphere 5.5 Update 2 and Cisco Nexus 9000 Application Centric Infrastructure (ACI) Design Guide (PDF – 7.7 MB)
- FlexPod Datacenter with VMware vSphere 5.5 U1 and Cisco Nexus 9000 Series Switches Design Guide (PDF – 2.8 MB)
- FlexPod Datacenter with VMware vSphere 5.5 Update 1 Design Guide (PDF – 3.9 MB)
- FlexPod Datacenter with VMware vSphere 5.5 Update 1 with 7-Mode (PDF – 7.8 MB)
- FlexPod Data Center with VMware vSphere 5.1U1 and Cisco Nexus 9000 Series Switches Design Guide (PDF – 2.6 MB)
- FlexPod Data Center with VMware vSphere 5.1U1 and Cisco Nexus 6000 Series Switches Design Guide
- FlexPod Data Center with VMware vSphere 5.1 and Nexus 7000 Using FCoE Design Guide
- Security
- FlexPod Data Center with Cisco Secure Enclaves (PDF – 7.0 MB)
- Microsoft
- FlexPod Datacenter with Microsoft Exchange 2013 and Cisco Application Centric Infrastructure (PDF – 9.2 MB)
- FlexPod Data Center with Microsoft SharePoint 2013 and Cisco ACI Design Guide (PDF – 11.2 MB)
- FlexPod Data Center with Microsoft Private Cloud 4.0 (PDF – 1.3 MB)
- FlexPod Data Center with Microsoft Private Cloud FT 3.0 with 7-Mode Design Guide
- SharePoint 2010 for FlexPod on VMware for 100,000 Users
- Oracle
- Oracle RAC on FlexPod (PDF – 11.0 MB)
- Oracle JD Edwards on FlexPod with Oracle Linux
- SAP
- FlexPod Datacenter for SAP Solution (PDF – 12.7 MB)
Videos
- Secure Multitenancy and FlexPod (13:29 min)
- FlexPod Management and Automation (5:36 min)
- Introduction to FlexPod Express (2:14 min)
Solution Briefs
- FlexPod with Cisco UCS Mini (PDF – 363 KB)
- FlexPod Datacenter with VMware vSphere 5.5 Update 1
- FlexPod Datacenter with VMware vSphere 5.1 Update 1 and Cisco Nexus 9000 Series Switches
- FlexPod Datacenter with VMware vSphere 5.1U1 and Cisco ACI (PDF – 144 KB)
- FlexPod Data Center with VMware vSphere 5.1, Cisco Nexus 7000 Series Switches, and NetApp MetroCluster for Multisite Deployment
- FlexPod Data Center with VMware vSphere 5.1 Update 1 and Cisco Nexus 6000 Series Switches
- FlexPod Data Center with Citrix XenDesktop (PDF – 193 KB)
- FlexPod Data Center with VMware vSphere 5.1 Update 1
- FlexPod IP Shared Storage Solution for Small and Medium-Size Businesses (PDF – 580 KB)
- FlexPod Data Center with VMware vSphere 5.1 and Cisco Nexus 7000 Series Switches
- FlexPod Data Center with VMware vSphere 5.1, Cisco Nexus 7000 Series Switches, and IP-Based Storage
- FlexPod Express VMware vSphere (PDF – 1.24 MB)
- FlexPod Express with Microsoft Windows Server 2012 Hyper-V (PDF – 1.56 MB)
Featured Case Studies
- Americas
- Katz, Sapper & Miller (PDF – 558 KB)
- King County (PDF – 563 KB)
- Photobucket (PDF – 457 KB)
- ActioNet (PDF – 394 KB)
- Asia-Pacific
- County Fire Authority (PDF – 454 KB)
- Energia Communications, Inc. (PDF – 169 KB)
- Duzon Bizon (PDF – 689 KB)
- Swinburne University of Technology (PDF – 382 KB)
- EMEAR
- Toyota Tsusho Africa (PDF – 317 KB)
- Groupe Mutuel (PDF – 226 KB)
- Suttons Group (PDF – 273 KB)
- Steria (PDF – 209 KB)
A more complete listing is available on the Data Center Case Studies page.
Courtesy: Cisco
How to install NIC Teaming Driver and configure NIC Teaming in a Cisco UCS B200-M3
The Cisco NIC Teaming Driver is contained in the UCS-related Windows Utilities ISO, which you can download from http://www.cisco.com. Depending on your platform, choose either Cisco UCS B-Series Blade Server Software or Cisco UCS C-Series Rack-Mount UCS-Managed Server Software. Once you have installed Windows on the blade, you can proceed to install the teaming software.
First let us see how to install the NIC teaming driver on the server.
Once the driver is installed, you need to configure the teaming on the desired NICs.
How to fix UCSM login problems with the Java 7 Update 45
This thread was brought to my attention – https://supportforums.cisco.com/thread/2246189
After updating Java to Update 45, you can no longer log in to UCSM (UCS Manager).
You may see one of two errors:
Login Error: java.io.IOException: Invalid Http response
Login Error: java.io.IOException: Server returned HTTP response code: 400 for URL: http://x.x.x.x:443/nuova
Cisco Bug ID: CSCuj84421
This is due to a change introduced in Java Update 45.
The solution posted is to roll back to Update 25. Rolling back to Update 40 also works.
DCICN Exam – Cisco Data Center Networking (640-911) details and Study Guide
The 640-911 DCICN “Introducing Cisco Data Center Networking” exam is one of the exams associated with the CCNA® Data Center certification. This 90-minute, 65–75 question exam tests a candidate’s knowledge of networking concepts for the data center environment, based on NX-OS. You will learn fundamental information on how a data center network works, and how to configure virtualization in the network, addressing schemes, troubleshooting, and configuration skills. Candidates can prepare for this exam by taking the course 640-911 DCICN, “Introducing Cisco Data Center Networking”.
The following topics are general guidelines for the content likely to be included on the exam. However, other related topics may also appear on any specific delivery of the exam. In order to better reflect the contents of the exam and for clarity purposes, the guidelines below may change at any time without notice.
Download Complete List of Topics in PDF format
The following resources come from the Study/Learn tabs of the 640-911 DCICN Exam page.
DCICN Exam Topics (title – length – access)

Describe how a network works:
- Preparing for your CCNA Data Center Certification Studies – 00:43:00 – Watch Now
- Internetworking Basics – 29 pages – View Now
- Understanding the TCP/IP Internet Layer – 00:25:00 – Watch Now
- (untitled) – 01:29:42 – Subscribe Now
- (untitled) – 01:13:40 – Subscribe Now
- NX-OS and Cisco Nexus Switching: Next-Generation Data Center Architectures – 480 pages – Buy Now

Configure, verify and troubleshoot a switch with VLANs and interswitch communication:
- Ethernet Technologies – 46 pages – View Now
- Introduction to LAN Protocols – 9 pages – View Now
- Understanding VLANs by Understanding MAC Table Operation – 4 pages – View Now
- Calculating an 802.1d Spanning-Tree Topology – 20 pages – View Now

Implement an IP addressing scheme and IP Services to meet network requirements in a medium-size Enterprise branch office network:
- IP Addressing Guide – 1 page – View Now
- IP Addressing and Subnetting for New Users – 12 pages – View Now
- Binary Game – Varies – Play Now
- Subnet Game – Varies – Play Now
- Subnet Troubleshooting Game – Varies – Play Now
- IP Routing – Introduction – Varies – Watch Now

Configure, verify, and troubleshoot basic router operation and routing on Cisco devices:
- Routing Basics – 11 pages – View Now
- IP Routing – Introduction – Varies – Watch Now
- Introducing EIGRP – 00:22:00 – Watch Now
- Introducing the OSPF Protocol – 00:23:00 – Watch Now
- Nexus 7000 Series Data Sheet – 8 pages – View Now
- Nexus Licensing – 44 pages – View Now
Demystifying Monitoring for Cisco UCS Manager and Standalone C-Series Servers
The Cisco UCS Monitoring Resource Handbook is a monitoring reference guide that was developed to supplement this session.
Participants:
Eric Williams, Moderator, Technical Marketing Engineer, Cisco
Jeff Foster, Technical Marketing Engineer, Cisco
Jason Shaw, Technical Marketing Engineer, Cisco
Links Relevant to this session:
UCS Manager MIB Reference Guide: http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/mib/b-series/b_UCS_MIBRef.html
UCS Manager Fault Reference Guide: http://www.cisco.com/en/US/docs/unified_computing/ucs/ts/faults/reference/UCSFaultsRef.pdf
C-Series MIB Reference Guide: http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/mib/c-series/b_UCS_Standalone_C-Series_MIBRef.pdf
C-Series Fault Reference Guide: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/fault/reference/guide/CIMC_Fault_codes.pdf
Monitoring UCS Manager with Syslog:
To learn more about Cisco UCS Manager and Standalone C-Series:
Cisco UCS Communities: http://communities.cisco.com/ucs
Cisco Developed Integrations Communities: http://communities.cisco.com/ucsintegrations
Cisco UCS Manager: http://www.cisco.com/en/US/products/ps10281/index.html
Cisco UCS Central: http://www.cisco.com/en/US/products/ps12502/index.html
Cisco UCS Management (Blog): http://blogs.cisco.com/datacenter/cisco-ucs-management/
Cisco UCS Monitoring Resource Handbook
‘Demystifying Monitoring for UCS Manager & C-Series’ Tech Talk available here:
https://communities.cisco.com/docs/DOC-37138
Additional Cisco Monitoring Resources: (Cited within this document)
- UCS Manager MIB Reference Guide
- UCS Manager Fault Reference Guide
- C-Series MIB Reference Guide
- C-Series Fault Reference Guide
- Monitoring UCS Manager with Syslog
UCSM and Standalone C-Series Monitoring Overview:
UCS Manager Monitoring Background:
The core of UCS Manager is made up of three elements: the Data Management Engine (DME), the Application Gateways (AGs), and the user-accessible northbound interfaces (SNMP, syslog, XML API, and the UCS CLI). With UCS Manager there are three main ways of monitoring UCS servers: the XML API, SNMP, and syslog. SNMP and syslog are read-only interfaces used only for monitoring; they do not allow an end user to change the configuration. The UCS XML API, by contrast, is a read-write monitoring interface that allows an end user both to monitor UCS and to change the configuration if needed.
Data Management Engine (DME) – The DME is the center of the UCS Manager universe, or the “queen bee” of the entire system. It maintains the UCS XML database, which houses the inventory of all physical elements (blade and rack-mount servers, chassis, IO modules, fabric interconnects, etc.), the logical configuration data for profiles, policies, pools, and vNIC/vHBA templates, and the various networking-related configuration details (VLANs, VSANs, port channels, network uplinks, server downlinks, etc.). It maintains the current health and state of all physical and logical elements in a UCS domain, and tracks the transition information of all Finite State Machine (FSM) tasks in progress. The inventory, health, and configuration data of managed endpoints stored in the UCS XML database always reflect current data, delivered in near real time. As fault conditions are raised and mitigated on endpoints, the DME creates, clears, and removes the corresponding faults in the UCS XML database. Only the faults that are actively occurring are stored there; by default, the DME does not keep a historical log of all faults that have occurred on a UCS domain.
Application Gateway (AG) – The AGs are the software agents, or “worker bees”, that communicate directly with endpoints to report their health and state to the DME. AGs manage configuration changes from the current state to the desired state during FSM transitions when changes are made to the UCS XML database. AG-managed endpoints include servers, chassis, IO modules, fabric extenders, fabric interconnects, and NX-OS. The server AGs actively monitor the server through IPMI and the SEL logs via the Cisco Integrated Management Controller (CIMC) to provide the DME with the health, state, configuration, and potential fault conditions of a device. The IO module AG and chassis AG communicate with the Chassis Management Controller (CMC) to get information about the health, state, configuration, and fault conditions visible to the CMC. The fabric interconnect/NX-OS AG communicates directly with NX-OS to get information about the health, state, configuration, statistics, and fault conditions visible to NX-OS on the fabric interconnects. All AGs provide inventory details about their endpoints to the DME during the various discovery processes, perform the state changes necessary to configure an endpoint during FSM-triggered transitions, monitor the health and state of the endpoints, and notify the DME of any faults or conditions.
Northbound interfaces – The northbound interfaces include SNMP, syslog, the CLI, and the XML API. The XML API is presented by the Apache web server layer and is used to send login, logout, query, and configuration requests via HTTP or HTTPS. SNMP and syslog are both consumers of data from the DME. SNMP informs and traps are translated directly from the fault information stored in the UCS XML database. Conversely, SNMP GET requests are sent through the same object translation engine in reverse: the DME receives the request from the object translation engine, and the XML data from the DME is translated into an SNMP response. Syslog messages use the same object translation engine as SNMP; the source data (faults, events, audit logs) is translated from XML into a UCS Manager-formatted syslog message.
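As a concrete illustration of the XML API, a login is an HTTP(S) POST of a small XML document to the /nuova endpoint (the same endpoint visible in the Java login error earlier in this post). The sketch below only builds the request payload; aaaLogin is the documented XML API login method, but the credentials shown are placeholders and no request is actually sent:

```python
# Sketch: build the XML payload for a UCS Manager XML API login.
# aaaLogin is the documented login method; the credentials below are
# placeholders, and no request is sent here.
from xml.sax.saxutils import quoteattr

def build_aaa_login(username: str, password: str) -> str:
    """Payload POSTed to http(s)://<ucsm>/nuova to obtain a session cookie."""
    return (f"<aaaLogin inName={quoteattr(username)} "
            f"inPassword={quoteattr(password)} />")

payload = build_aaa_login("admin", "example-password")
print(payload)
```

The response carries an outCookie attribute, which authenticates subsequent query and configuration requests for the life of the session.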
Standalone C-Series Monitoring Background:
Monitoring support for our Standalone C-Series Servers has evolved with each release. The current CIMC release, v1.5, supports our M3 platforms, including the C220 M3, C240 M3, C22 M3, C24 M3, and C420 M3, as well as our C260 M2 and C460 M2. While earlier versions of the CIMC supported syslog and SNMP, the Fault Engine added support for SNMP v3 in CIMC v1.5. We have documented the internals of our monitoring subsystem in the graphic included below.
Fault Engine Overview:
While Cisco Standalone C-Series Servers do not support the DME/AG architecture described above in the UCS Manager section, many of the same concepts apply to the monitoring subsystem for standalone servers. The Fault Engine is the central repository and clearinghouse for fault data as it is passed along to monitoring endpoints. It acts as a master repository for events within the system, initiating alerts (SNMP traps, syslog messages, XML API events, etc.), but it can also be queried via SNMP GETs or the XML API. This durability of fault information gives customers a mechanism not only to receive fault data, but also to query system health data through these interfaces.
Within the system, the Fault Engine regularly polls component health status in the form of sensor data, using IPMI and the storage daemon, and these values are compared against threshold reference points. If a sensor value falls outside one of the thresholds, an entry is created in the Fault Engine and notifications are sent as appropriate. As discussed earlier, multiple notification types are supported, including SNMP (traps and informs), syslog messages, and XML API event subscriptions, and fault queries are supported through SNMP GETs and XML API queries. Cisco has developed a number of integrations for third-party management solutions that query the Fault Engine data to drive notifications in those tools. The Fault Engine retains faults until they are mitigated or until the CIMC is rebooted.
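The polling loop described above (compare a sensor value against its thresholds, then raise or clear a fault entry) can be sketched as follows. This is an illustrative model only; the sensor name and limits are hypothetical, not actual CIMC internals:

```python
# Illustrative model of the Fault Engine threshold check described
# above: a sensor reading outside its threshold range creates a fault
# entry, and a reading back in range clears it. The sensor name and
# limits are hypothetical, not real CIMC internals.
faults = {}  # sensor name -> fault description

def check_sensor(name, value, low, high):
    if value < low or value > high:
        faults[name] = f"{name}={value} outside [{low}, {high}]"
        # a real engine would also emit SNMP traps / syslog here
    else:
        faults.pop(name, None)  # condition mitigated: clear the fault

check_sensor("inlet_temp", 47, low=10, high=40)  # raises a fault
check_sensor("inlet_temp", 28, low=10, high=40)  # clears it
print(faults)  # -> {}
```

Note how the fault clears as soon as the condition is mitigated, matching the retain-until-mitigated behavior described above.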
UCS Manager Best Practices:
The recommendation for monitoring a UCS Manager environment is to monitor all faults of severity critical or major that are not of type “FSM”. FSM-related faults are transient in nature, as they are triggered while an FSM transition is occurring in UCS Manager. Generally speaking, FSM-related faults resolve themselves automatically: most are triggered after a task fails the first time but succeeds on a subsequent try. An example of an FSM task failure is an FSM task waiting for a server to finish BIOS POST failing during a service profile association. This condition can happen when a server with many memory DIMMs takes longer to finish POST than the default timeout of the FSM task. The timeout raises an FSM fault on the task, but by default the task keeps retrying up to the defined FSM retry limit. If a subsequent retry is successful, the FSM task fault is cleared and removed. However, if subsequent retries are unsuccessful and the retry limit is hit, the FSM task is faulted and another fault is raised against the affected object. In this example, a configuration failure would be raised against the service profile, because the association process failed when the server did not complete a successful BIOS POST.
If you are looking for a list of the most critical fault codes to monitor, refer to the “Syslog Messages to Monitor” section in Chapter 3 of the “Monitoring UCS Manager with Syslog” guide below. The fault codes listed are the same for all interfaces (SNMP, syslog, or XML API).
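The best practice above (alert on critical or major faults, ignore FSM-type faults) translates into a very simple filter. A sketch, using made-up fault records rather than real fault codes:

```python
# Sketch of the recommended UCS Manager alerting filter: keep faults
# of severity critical or major whose type is not "fsm". The fault
# records below are made-up examples, not real fault codes.
ALERT_SEVERITIES = {"critical", "major"}

def should_alert(fault):
    return (fault.get("severity") in ALERT_SEVERITIES
            and fault.get("type") != "fsm")

faults = [
    {"code": "F-EX-1", "severity": "major", "type": "network"},
    {"code": "F-EX-2", "severity": "major", "type": "fsm"},        # transient
    {"code": "F-EX-3", "severity": "warning", "type": "equipment"},
]
alerts = [f["code"] for f in faults if should_alert(f)]
print(alerts)  # -> ['F-EX-1']
```

The same severity-and-type predicate applies whether the faults arrive via SNMP, syslog, or the XML API, since the codes are shared across interfaces.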
C-Series Standalone Best Practices:
Filtering: As referenced above, the faults for our Standalone C-Series Servers are consistent with the faults for UCS Manager. Because the concept of FSM (Finite State Machine) does not exist for Standalone C-Series, there is no reason to filter out FSM state changes when monitoring these systems. The recommendation is that no filters be applied to Standalone C-Series Servers, as all raised faults are relevant to customers interested in monitoring and alerting. At present, approximately 85 faults are included in the fault database for our Standalone C-Series Servers with CIMC 1.5(3).
SNMP vs. Platform Event Filters (PEF): As monitoring has evolved in these systems, support has been extended to a number of notification mechanisms, and Cisco is planning to deprecate Platform Event Filters (PEF) and Platform Event Traps (PET) in a future CIMC release. Platform Event Traps are sent as IPMI-defined traps, and filters (PEF) can be applied so that only certain subsystem traps are sent to the NMS. The variable bindings that are consistent across UCS Manager and Standalone C-Series servers do not apply to Platform Event Traps, which have their own nomenclature defined and maintained by Intel.
XML API Usage: As a more robust XML API has been implemented in Standalone C-Series Servers, this is the preferred mechanism for capturing faults sent by the system. The XML API supports Event Subscription which provides proactive alerting. The XML API also supports queries which can be used to collect data in the fault table on a regular basis.
Cisco UCS MIB Files:
Cisco MIBs are available at the following download site:
http://www.cisco.com/public/sw-center/netmgmt/cmtk/mibs.shtml
All Cisco UCS Manager and Standalone C-Series faults are available via SNMP using the cucsFaultTable table in the CISCO-UNIFIED-COMPUTING-FAULT-MIB. The table contains one entry for every fault instance. Each entry has variables to indicate the nature of a problem, such as its severity and type. The same object is used to model all Cisco UCS fault types, including equipment problems, FSM failures, configuration or environmental issues, and connectivity issues. The cucsFaultTable table includes all active faults (those that have been raised and need user attention) and all faults that have been cleared but not yet deleted because of the retention interval.
Important OIDs (Object Identifier):
In UCS Manager version 1.3 and later, Cisco UCS Manager sends a cucsFaultActiveNotif event notification whenever a fault is raised. There is one exception to this rule: Cisco UCS Manager does not send event notifications for FSM faults. The trap variables indicate the nature of the problem, including the fault type. Cisco UCS Manager sends a cucsFaultClearNotif event notification whenever a fault has been cleared. A fault is cleared when the underlying issue has been resolved.
In UCS Manager version 1.4 and later, the cucsFaultActiveNotif and cucsFaultClearNotif traps are defined in the CISCO-UNIFIED-COMPUTING-NOTIFS-MIB. All faults can be polled using SNMP GET operations on the cucsFaultTable, which is defined in the CISCO-UNIFIED-COMPUTING-FAULT-MIB.
Fault Attributes (Variable Bindings):
MIB Loading Order & Statistics Collection Details:
More details on MIB load ordering and statistics collection including a comprehensive list of Statistics OID and their corresponding Statistics tables are located in the following MIB Reference Guides:
MIB Reference for Cisco UCS Manager:
http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/mib/b-series/b_UCS_MIBRef.pdf
MIB Reference for Cisco UCS Standalone C-Series Servers:
UCS Manager and Standalone C-Series Faults:
In the Cisco UCS, a fault is a mutable object that is managed by the Cisco UCS Manager. Each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. During the lifecycle of a fault, it can change from one state or severity to another.
Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state. A fault remains in the Cisco UCS Manager until the fault is cleared and deleted according to the settings in the fault collection policy.
You can view all faults in the Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. You can also configure the fault collection policy to determine how a Cisco UCS instance collects and retains faults.
Fault Severities for UCS Manager and Standalone C-Series Servers include:
Types of faults for UCS Manager and Standalone C-Series Servers include:
Fault Overview:
The faults in Cisco UCS are stateful, and a fault raised in a Cisco UCS instance transitions through more than one state during its lifecycle. In addition, only one instance of a given fault can exist on each object. If the same fault occurs a second time, the Cisco UCS increases the number of occurrences by one.
A fault has the following lifecycle:
- A condition occurs in the system and the Cisco UCS raises a fault in the active state.
- If the fault is alleviated within a short period of time known as the flap interval, the fault severity remains at its original active value but the fault enters the soaking state. The soaking state indicates that the condition that raised the fault has cleared, but the system is waiting to see whether the fault condition reoccurs.
- If the condition reoccurs during the flap interval, the fault enters the flapping state. Flapping occurs when a fault is raised and cleared several times in rapid succession. If the condition does not reoccur during the flap interval, the fault is cleared.
- Once cleared, the fault enters the retention interval. This interval ensures that the fault reaches the attention of an administrator even if the condition that caused the fault has been alleviated, and that the fault is not deleted prematurely. The retention interval retains the cleared fault for the length of time specified in the fault collection policy.
- If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
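The lifecycle above can be modeled as a small state machine. The sketch below is a simplified model of the states described (active, soaking, flapping, cleared, deleted), not Cisco's implementation; the flap and retention intervals are represented as explicit expiry events rather than timers:

```python
# Simplified model of the UCS fault lifecycle described above.
# raise -> active; clear within the flap interval -> soaking;
# recurrence while soaking -> flapping; cleared faults are retained
# for the retention interval, then deleted (or reactivated on recurrence).
class Fault:
    def __init__(self):
        self.state = "active"
        self.occurrences = 1

    def raise_again(self):
        self.occurrences += 1
        if self.state == "soaking":
            self.state = "flapping"      # raised and cleared in rapid succession
        elif self.state == "cleared":
            self.state = "active"        # recurrence during retention

    def condition_cleared(self):
        if self.state == "active":
            self.state = "soaking"       # wait out the flap interval
        elif self.state == "flapping":
            self.state = "cleared"

    def flap_interval_expired(self):
        if self.state == "soaking":
            self.state = "cleared"       # condition did not recur

    def retention_interval_expired(self):
        if self.state == "cleared":
            self.state = "deleted"       # per the fault collection policy

f = Fault()
f.condition_cleared()   # active -> soaking
f.raise_again()         # soaking -> flapping, occurrences = 2
print(f.state, f.occurrences)
```

Walking a fault object through these transitions is a quick way to reason about why a cleared fault can still appear in the fault table during its retention interval.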