Troubleshooting an Exadata Environment on the Command Line

This article describes how to troubleshoot an Exadata system effectively from the command line and shows some tools you can use.

What are the major components of an Exadata system? (There are other components, such as KVM switches and PDUs, but they rarely cause problems.)

  • Compute Nodes
  • Storage Cells
  • InfiniBand Switches

Troubleshooting Exadata Compute Nodes

There are essentially three components on a compute node:

  • Clusterware, ASM
  • Oracle RDBMS
  • Config files related to the Storage Cells

Clusterware, ASM, and the RDBMS software are the same as on any RAC node; there is no difference. Because the compute nodes are connected directly to the Storage Cells, you also need to check the cell config files.
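
As an illustration of checking the cell config files, here is a minimal sketch that lists the Storage Cell addresses a compute node knows about. It assumes the usual location /etc/oracle/cell/network-config/cellip.ora and entries of the form cell="ip" or cell="ip1;ip2"; the function name list_cell_ips and the CELLIP_ORA variable are made up for this example. Adjust both assumptions to your system.

```shell
# Sketch: list the Storage Cell IP addresses configured on a compute node.
# Assumption: cellip.ora is in the usual place and contains lines such as
#   cell="192.168.10.3;192.168.10.4"
CELLIP_ORA="${CELLIP_ORA:-/etc/oracle/cell/network-config/cellip.ora}"

list_cell_ips() {
    # $1: path to a cellip.ora file (defaults to $CELLIP_ORA)
    file="${1:-$CELLIP_ORA}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Keep only the cell="..." lines, strip the wrapper, split on ';'
    sed -n 's/^[[:space:]]*cell="\(.*\)"[[:space:]]*$/\1/p' "$file" | tr ';' '\n'
}
```

Calling list_cell_ips without arguments prints one address per line, which is handy as input for ping or ssh checks against the cells.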

Troubleshooting the Compute Nodes

Type                       | Tool            | Description
OS                         | Syslogs         | /var/log/messages
OS & Clusterware           | Exawatcher      | Logs: ..oracle.oswatcher/osw/archive/
Oracle RDBMS               | Exachk script   | Comprehensive, very detailed report
ASM / Storage Cell related | KFOD / V$ views | ASM tools used by Support: KFOD, KFED, AMDU (Doc ID 1485597.1); Storage Cell config files
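
For the syslog entry, a small filter saves a lot of scrolling. The following is only a sketch: the function name scan_syslog and the keyword list are assumptions for illustration, not an Exadata-specific rule set, so extend the pattern with whatever matters in your environment.

```shell
# Sketch: show the most recent suspicious lines from a syslog file.
scan_syslog() {
    # $1: log file (default /var/log/messages)
    # $2: maximum number of lines to show (default 20)
    log="${1:-/var/log/messages}"
    max="${2:-20}"
    # The keyword list is an assumption; adapt it to your needs
    grep -iE 'error|fail|panic|out of memory' "$log" | tail -n "$max"
}
```

The same function works on a Storage Cell, because the syslog location is the same there.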

Troubleshooting Exadata Storage Cells

First, an overview from the documentation:

[Figure: cell_architect (Storage Cell architecture overview)]

What are the components of a Storage Cell?

  • Restart Server (RS)
  • Management Server (MS)
  • Cell Server (cellsrv)

Troubleshooting the Storage Cell

Type        | Tool    | Description
OS          | Syslogs | /var/log/messages
Cell Server | adrci   | Use the adrci tool just as you would for the RDBMS software
            | cellcli | list alerthistory
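
Besides the alert history, cellcli can report the state of the three cell services. The sketch below assumes the output of cellcli -e "list cell attributes name,cellsrvStatus,msStatus,rsStatus" is one whitespace-separated line per cell; the helper name report_stopped_services is hypothetical.

```shell
# Sketch: read lines of "name cellsrvStatus msStatus rsStatus" on stdin
# and print one line for every service that is not "running".
report_stopped_services() {
    awk '
        NF >= 4 {
            if ($2 != "running") print $1 ": cellsrv is " $2
            if ($3 != "running") print $1 ": MS is " $3
            if ($4 != "running") print $1 ": RS is " $4
        }'
}
```

On a cell you would pipe the cellcli command above into report_stopped_services; no output means all three services report running.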

 

Troubleshooting the InfiniBand Switches

Type           | Tool                        | Description
OS             | Exawatcher                  | The log files from Exawatcher
Topology check | ibdiagtools/verify-topology | Checks all cables, links, and speeds. For more information, use the "-h" option
               | ibdiagtools/infinicheck     | Validates the configuration and performance
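
To compare the topology before and after a change, it helps to keep the verify-topology output in a timestamped file. This is a sketch under assumptions: the directory /opt/oracle.SupportTools/ibdiagtools is where I would expect the scripts, and save_topology_report and IBDIAG_DIR are names invented here. infinicheck is left out on purpose, since a performance validation is something you may prefer to start by hand.

```shell
# Assumption: the ibdiagtools scripts live in this directory on your system.
IBDIAG_DIR="${IBDIAG_DIR:-/opt/oracle.SupportTools/ibdiagtools}"

save_topology_report() {
    # $1: directory for the report files (default: current directory)
    outdir="${1:-.}"
    if [ ! -x "$IBDIAG_DIR/verify-topology" ]; then
        echo "verify-topology not found in $IBDIAG_DIR" >&2
        return 1
    fi
    report="$outdir/verify-topology_$(date +%Y%m%d_%H%M%S).log"
    # Run from the tool directory and capture stdout and stderr
    ( cd "$IBDIAG_DIR" && ./verify-topology ) > "$report" 2>&1
    rc=$?
    # Print the report path so the caller can diff it against an older one
    echo "$report"
    return "$rc"
}
```

Two such reports can then be compared with a plain diff.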

 

Conclusion

Troubleshooting an Exadata system is very similar to troubleshooting an Oracle RAC environment with ASM and Clusterware. The major additional components are the Storage Cells and the InfiniBand switches.

If you want to troubleshoot an Exadata environment on the command line, here are some additional tips:

  • Run exachk every month, and before and after every upgrade. Exachk is a very powerful tool and a must for daily operations.
    • Keep in mind that the new version has a diff option:
    • -diff <old_report> <new_report>
      • This makes it very easy to find the differences between system changes.
  • Automate your work with small shell scripts. Here is an example that reads the alerthistory of the Storage Cells:

# Show critical entries from the alert history of each Storage Cell
for i in 1 2 3 4 5 6 7
do
    ssh root@cell${i} "cellcli -e list alerthistory | grep -i critical"
done

With this kind of simple script you can check your environment quickly and effectively.

Stay tuned: I will show the benefits of an exachk report.
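
If the number or names of the cells differ between systems, the same loop can be wrapped in a function that takes the cell names as arguments. This is a sketch, not a finished tool: critical_alerts and SSH_CMD are hypothetical names, and the fallback message cannot tell a clean cell from an unreachable one.

```shell
# SSH_CMD can be overridden, for example to test the loop without real cells.
SSH_CMD="${SSH_CMD:-ssh}"

critical_alerts() {
    # $@: cell host names, e.g. cell1 cell2 cell3
    for cell in "$@"; do
        echo "== $cell =="
        # grep runs locally here; if it finds nothing, print a note instead
        $SSH_CMD "root@$cell" "cellcli -e list alerthistory" | grep -i critical \
            || echo "no critical alerts found (or cell not reachable)"
    done
}
```

A call such as critical_alerts cell1 cell2 cell3 prints a header per cell followed by its critical entries.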


About spa

Oracle and Unix professional with a main focus on Oracle HA systems, and an Exadata enthusiast.
This post was filed under Exadata.
