ASM "corrupted metadata block" check via amdu / kfed (Part 1)


Last week we had a crash on our Exadata ASM instance. We were not amused about this, but we restarted the instance and resumed work as usual.

About the environment: the GRID software is release 12.1, but the diskgroups are at compatible 11.2.0.4.

To be safe, we started a check on the DATA diskgroup:


ALTER DISKGROUP DATA CHECK all NOREPAIR;

The check ran online, but took nearly 25 hours.
In the meantime we saw lots of errors in the ASM alert.log:

Tue Jun 21 15:47:15 2016
NOTE: disk DATA_CD_10_srv1CD13, used AU total mismatch: DD={514269, 0} AT={514270, 0}
Tue Jun 21 15:47:15 2016
GMON querying group 1 at 567 for pid 52, osid 138892
GMON checking disk 143 for group 1 at 568 for pid 52, osid 138892
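To get a feel for how widespread these mismatches are, the NOTE lines can be counted per disk. A minimal sketch (the helper name is ours, not an Oracle tool; pipe your alert.log into it):

```shell
# count_au_mismatches: read an ASM alert.log on stdin and print, per disk,
# how many "used AU total mismatch" NOTEs were logged.
count_au_mismatches() {
  grep 'used AU total mismatch' \
    | sed 's/^NOTE: disk \([^,]*\),.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Example (the path is hypothetical, adjust to your diagnostic destination):
#   count_au_mismatches < /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
```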

A MOS note said this should not be a problem, but is that correct?

Once the "CHECK ALL NOREPAIR" has finished, the analysis is done via an amdu dump of the diskgroup:


amdu -diskstring 'o/*/*' -dump 'DATA'

We started the dump in a directory with enough free space, because amdu creates many 2 GB extract files, depending on the size of the diskgroup. One small file, report.txt, is also created during the dump.

The report.txt contains information about the system, OS, version, and all scanned disks, including a list of the scanned disks that have corrupted metadata blocks.

Here is an example:

---------------------------- SCANNING DISK N0002 -----------------------------

Disk N0002: 'o/192.168.10.10/DATA_CD_01_srv1cd2'

AMDU-00209: Corrupt block found: Disk N0002 AU [454272] block [0] type [0]

AMDU-00201: Disk N0002: 'o/192.168.10.10/DATA_CD_01_srv1cd2'

AMDU-00217: Message 217 not found;  product=RDBMS; facility=AMDU; arguments: [0] [1024] [blk_kfbl]

           Allocated AU's: 507621

                Free AU's: 57627

       AU's read for dump: 194

       Block images saved: 12457

        Map lines written: 194

          Heartbeats seen: 0

  Corrupt metadata blocks: 1

        Corrupt AT blocks: 0
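With a large diskgroup the report.txt gets long, so it helps to pull out just the AMDU-00209 lines. A small sketch (the helper name is ours) that prints the disk name, AU number, and block number for each corruption found:

```shell
# list_corrupt_blocks: read an amdu report.txt on stdin and print one line
# per corruption found (disk name, AU number, block number).
list_corrupt_blocks() {
  grep 'AMDU-00209' \
    | sed 's/.*Disk \([^ ]*\) AU \[\([0-9]*\)\] block \[\([0-9]*\)\].*/disk=\1 au=\2 blk=\3/'
}

# Example: list_corrupt_blocks < report.txt
```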

The next question was: "How can we check whether this metadata block is really corrupt?"

The answer: you need the kfed tool and the "Allocated AU's: 507621" value from the report.txt file.


[oracle0@srv1db1]$ kfed read aun=507621 aus=4194304 blkn=0 dev=o/192.168.10.10/DATA_CD_00_srv1cd2

kfbh.endian:                         58 ; 0x000: 0x3a

kfbh.hard:                          162 ; 0x001: 0xa2

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:              1477423104 ; 0x004: blk=1477423104

kfbh.block.obj:              3200986444 ; 0x008: disk=732492

kfbh.check:                    67174540 ; 0x00c: 0x0401008c

kfbh.fcn.base:                    51826 ; 0x010: 0x0000ca72

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

1EFB9400000 0000A23A 580FB000 BECB2D4C 0401008C  [:......XL-......]

1EFB9400010 0000CA72 00000000 00000000 00000000  [r...............]

1EFB9400020 00000000 00000000 00000000 00000000  [................]

  Repeat 253 times

As you can see in the example, "kfbh.type = KFBTYP_INVALID", which means the metadata block is corrupt.
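Eyeballing kfed output for every suspect block gets tedious, so the kfbh.type field can be extracted instead. A minimal sketch (the helper name is ours; pipe kfed's output into it):

```shell
# kfed_block_type: extract the ASM metadata block type from "kfed read"
# output; KFBTYP_INVALID indicates a corrupt metadata block.
kfed_block_type() {
  grep '^kfbh.type:' | sed 's/.*; 0x002: *//'
}

# Example, using the AU number from report.txt:
#   kfed read aun=507621 aus=4194304 blkn=0 dev=o/... | kfed_block_type
```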

So how can we fix this?

In our situation the diskgroup is compatible 11.2.0.4, so we have to run:


ALTER DISKGROUP DATA CHECK ALL REPAIR;

Yes, and this could be very dangerous:

If the "CHECK ALL REPAIR" finds a corruption and tries to repair it, the diskgroup will be dismounted.

This means that all databases which are up and running on it will crash.

And keep in mind that a "CHECK ALL REPAIR" will also run for about 25 hours.

Is there another solution?
Yes, but it also requires a dismount of the diskgroup.

Then run the amdu tool again, this time "offline", and check the report.txt again for corrupted metadata blocks.

More details will be discussed in Part 2 about ASM kfed and amdu.

So stay tuned.

 


 


Exadata ASM Disk overview via kfod


That's the question:

How can I easily create an overview of all configured ASM disks from the OS command line?

Exadata ASM Disks

ASM disks in an Exadata machine are part of the storage cells and are presented to the compute nodes via the proprietary iDB protocol.

The ASM instance runs on the compute nodes.

Each storage cell has 12 hard disks plus flash disks. During the Exadata setup, the grid disks were created on the hard disks.
Grid disks are not visible to the operating system; they are visible only to ASM, the database instances, and related utilities, via the iDB protocol.

To get an overview of the grid disks via the command line, use the kfod tool.
Here is the output from kfod discovering a Full Rack:


Log in and set your GRID environment:
[ora1120@s1s2db2 asm_grash]$ $ORACLE_HOME/bin/kfod disks=all
--------------------------------------------------------------------------------
 Disk Size Path User Group 
================================================================================
 1: 2260992 Mb o/192.168.10.10/DATA_CD_00_s1s2cd2 
 2: 2260992 Mb o/192.168.10.10/DATA_CD_01_s1s2cd2 
 3: 2260992 Mb o/192.168.10.10/DATA_CD_02_s1s2cd2 
 4: 2260992 Mb o/192.168.10.10/DATA_CD_03_s1s2cd2 
 5: 2260992 Mb o/192.168.10.10/DATA_CD_04_s1s2cd2 
 6: 2260992 Mb o/192.168.10.10/DATA_CD_05_s1s2cd2 
 7: 2260992 Mb o/192.168.10.10/DATA_CD_06_s1s2cd2 
 8: 2260992 Mb o/192.168.10.10/DATA_CD_07_s1s2cd2 
 9: 2260992 Mb o/192.168.10.10/DATA_CD_08_s1s2cd2 
 10: 2260992 Mb o/192.168.10.10/DATA_CD_09_s1s2cd2 
 11: 2260992 Mb o/192.168.10.10/DATA_CD_10_s1s2cd2 
 12: 2260992 Mb o/192.168.10.10/DATA_CD_11_s1s2cd2 
 13: 34608 Mb o/192.168.10.10/DBFS_DG_CD_02_s1s2cd2 
 14: 34608 Mb o/192.168.10.10/DBFS_DG_CD_03_s1s2cd2 
 15: 34608 Mb o/192.168.10.10/DBFS_DG_CD_04_s1s2cd2 
 16: 34608 Mb o/192.168.10.10/DBFS_DG_CD_05_s1s2cd2 
 17: 34608 Mb o/192.168.10.10/DBFS_DG_CD_06_s1s2cd2 
 18: 34608 Mb o/192.168.10.10/DBFS_DG_CD_07_s1s2cd2 
 19: 34608 Mb o/192.168.10.10/DBFS_DG_CD_08_s1s2cd2 
 20: 34608 Mb o/192.168.10.10/DBFS_DG_CD_09_s1s2cd2 
 21: 34608 Mb o/192.168.10.10/DBFS_DG_CD_10_s1s2cd2 
 22: 34608 Mb o/192.168.10.10/DBFS_DG_CD_11_s1s2cd2 
 23: 565360 Mb o/192.168.10.10/RECO_CD_00_s1s2cd2 
 24: 565360 Mb o/192.168.10.10/RECO_CD_01_s1s2cd2 
 25: 565360 Mb o/192.168.10.10/RECO_CD_02_s1s2cd2 
 26: 565360 Mb o/192.168.10.10/RECO_CD_03_s1s2cd2 
 27: 565360 Mb o/192.168.10.10/RECO_CD_04_s1s2cd2 
 28: 565360 Mb o/192.168.10.10/RECO_CD_05_s1s2cd2 
 29: 565360 Mb o/192.168.10.10/RECO_CD_06_s1s2cd2 
 30: 565360 Mb o/192.168.10.10/RECO_CD_07_s1s2cd2 
 31: 565360 Mb o/192.168.10.10/RECO_CD_08_s1s2cd2 
 32: 565360 Mb o/192.168.10.10/RECO_CD_09_s1s2cd2 
 33: 565360 Mb o/192.168.10.10/RECO_CD_10_s1s2cd2 
 34: 565360 Mb o/192.168.10.10/RECO_CD_11_s1s2cd2 
 35: 2260992 Mb o/192.168.10.11/DATA_CD_00_s1s2cd3 
 36: 2260992 Mb o/192.168.10.11/DATA_CD_01_s1s2cd3 
 37: 2260992 Mb o/192.168.10.11/DATA_CD_02_s1s2cd3 
 38: 2260992 Mb o/192.168.10.11/DATA_CD_03_s1s2cd3 
 39: 2260992 Mb o/192.168.10.11/DATA_CD_04_s1s2cd3 
 40: 2260992 Mb o/192.168.10.11/DATA_CD_05_s1s2cd3 
 41: 2260992 Mb o/192.168.10.11/DATA_CD_06_s1s2cd3 
 42: 2260992 Mb o/192.168.10.11/DATA_CD_07_s1s2cd3 
 43: 2260992 Mb o/192.168.10.11/DATA_CD_08_s1s2cd3 
 44: 2260992 Mb o/192.168.10.11/DATA_CD_09_s1s2cd3 
 45: 2260992 Mb o/192.168.10.11/DATA_CD_10_s1s2cd3 
 46: 2260992 Mb o/192.168.10.11/DATA_CD_11_s1s2cd3 
 47: 34608 Mb o/192.168.10.11/DBFS_DG_CD_02_s1s2cd3 
 48: 34608 Mb o/192.168.10.11/DBFS_DG_CD_03_s1s2cd3 
 49: 34608 Mb o/192.168.10.11/DBFS_DG_CD_04_s1s2cd3 
 50: 34608 Mb o/192.168.10.11/DBFS_DG_CD_05_s1s2cd3 
 51: 34608 Mb o/192.168.10.11/DBFS_DG_CD_06_s1s2cd3 
 52: 34608 Mb o/192.168.10.11/DBFS_DG_CD_07_s1s2cd3 
 53: 34608 Mb o/192.168.10.11/DBFS_DG_CD_08_s1s2cd3 
 54: 34608 Mb o/192.168.10.11/DBFS_DG_CD_09_s1s2cd3 
 55: 34608 Mb o/192.168.10.11/DBFS_DG_CD_10_s1s2cd3 
 56: 34608 Mb o/192.168.10.11/DBFS_DG_CD_11_s1s2cd3 
 57: 565360 Mb o/192.168.10.11/RECO_CD_00_s1s2cd3 
 58: 565360 Mb o/192.168.10.11/RECO_CD_01_s1s2cd3 
 59: 565360 Mb o/192.168.10.11/RECO_CD_02_s1s2cd3 
 60: 565360 Mb o/192.168.10.11/RECO_CD_03_s1s2cd3 
 61: 565360 Mb o/192.168.10.11/RECO_CD_04_s1s2cd3 
 62: 565360 Mb o/192.168.10.11/RECO_CD_05_s1s2cd3 
 63: 565360 Mb o/192.168.10.11/RECO_CD_06_s1s2cd3 
 64: 565360 Mb o/192.168.10.11/RECO_CD_07_s1s2cd3 
 65: 565360 Mb o/192.168.10.11/RECO_CD_08_s1s2cd3 
 66: 565360 Mb o/192.168.10.11/RECO_CD_09_s1s2cd3 
 67: 565360 Mb o/192.168.10.11/RECO_CD_10_s1s2cd3 
 68: 565360 Mb o/192.168.10.11/RECO_CD_11_s1s2cd3 
 69: 2260992 Mb o/192.168.10.12/DATA_CD_00_s1s2cd4 
 70: 2260992 Mb o/192.168.10.12/DATA_CD_01_s1s2cd4 
 71: 2260992 Mb o/192.168.10.12/DATA_CD_02_s1s2cd4 
 72: 2260992 Mb o/192.168.10.12/DATA_CD_03_s1s2cd4 
 73: 2260992 Mb o/192.168.10.12/DATA_CD_04_s1s2cd4 
 74: 2260992 Mb o/192.168.10.12/DATA_CD_05_s1s2cd4 
 75: 2260992 Mb o/192.168.10.12/DATA_CD_06_s1s2cd4 
 76: 2260992 Mb o/192.168.10.12/DATA_CD_07_s1s2cd4 
 77: 2260992 Mb o/192.168.10.12/DATA_CD_08_s1s2cd4 
 78: 2260992 Mb o/192.168.10.12/DATA_CD_09_s1s2cd4 
 79: 2260992 Mb o/192.168.10.12/DATA_CD_10_s1s2cd4 
 80: 2260992 Mb o/192.168.10.12/DATA_CD_11_s1s2cd4 
 81: 34608 Mb o/192.168.10.12/DBFS_DG_CD_02_s1s2cd4 
 82: 34608 Mb o/192.168.10.12/DBFS_DG_CD_03_s1s2cd4 
 83: 34608 Mb o/192.168.10.12/DBFS_DG_CD_04_s1s2cd4 
 84: 34608 Mb o/192.168.10.12/DBFS_DG_CD_05_s1s2cd4 
 85: 34608 Mb o/192.168.10.12/DBFS_DG_CD_06_s1s2cd4 
 86: 34608 Mb o/192.168.10.12/DBFS_DG_CD_07_s1s2cd4 
 87: 34608 Mb o/192.168.10.12/DBFS_DG_CD_08_s1s2cd4 
 88: 34608 Mb o/192.168.10.12/DBFS_DG_CD_09_s1s2cd4 
 89: 34608 Mb o/192.168.10.12/DBFS_DG_CD_10_s1s2cd4 
 90: 34608 Mb o/192.168.10.12/DBFS_DG_CD_11_s1s2cd4 
 91: 565360 Mb o/192.168.10.12/RECO_CD_00_s1s2cd4 
 92: 565360 Mb o/192.168.10.12/RECO_CD_01_s1s2cd4 
 93: 565360 Mb o/192.168.10.12/RECO_CD_02_s1s2cd4 
 94: 565360 Mb o/192.168.10.12/RECO_CD_03_s1s2cd4 
 95: 565360 Mb o/192.168.10.12/RECO_CD_04_s1s2cd4 
 96: 565360 Mb o/192.168.10.12/RECO_CD_05_s1s2cd4 
 97: 565360 Mb o/192.168.10.12/RECO_CD_06_s1s2cd4 
 98: 565360 Mb o/192.168.10.12/RECO_CD_07_s1s2cd4 
 99: 565360 Mb o/192.168.10.12/RECO_CD_08_s1s2cd4 
 .....
 
.... 
 448: 2260992 Mb o/192.168.10.9/DATA_CD_05_s1s2cd1 
 449: 2260992 Mb o/192.168.10.9/DATA_CD_06_s1s2cd1 
 450: 2260992 Mb o/192.168.10.9/DATA_CD_07_s1s2cd1 
 451: 2260992 Mb o/192.168.10.9/DATA_CD_08_s1s2cd1 
 452: 2260992 Mb o/192.168.10.9/DATA_CD_09_s1s2cd1 
 453: 2260992 Mb o/192.168.10.9/DATA_CD_10_s1s2cd1 
 454: 2260992 Mb o/192.168.10.9/DATA_CD_11_s1s2cd1 
 455: 34608 Mb o/192.168.10.9/DBFS_DG_CD_02_s1s2cd1 
 456: 34608 Mb o/192.168.10.9/DBFS_DG_CD_03_s1s2cd1 
 457: 34608 Mb o/192.168.10.9/DBFS_DG_CD_04_s1s2cd1 
 458: 34608 Mb o/192.168.10.9/DBFS_DG_CD_05_s1s2cd1 
 459: 34608 Mb o/192.168.10.9/DBFS_DG_CD_06_s1s2cd1 
 460: 34608 Mb o/192.168.10.9/DBFS_DG_CD_07_s1s2cd1 
 461: 34608 Mb o/192.168.10.9/DBFS_DG_CD_08_s1s2cd1 
 462: 34608 Mb o/192.168.10.9/DBFS_DG_CD_09_s1s2cd1 
 463: 34608 Mb o/192.168.10.9/DBFS_DG_CD_10_s1s2cd1 
 464: 34608 Mb o/192.168.10.9/DBFS_DG_CD_11_s1s2cd1 
 465: 565360 Mb o/192.168.10.9/RECO_CD_00_s1s2cd1 
 466: 565360 Mb o/192.168.10.9/RECO_CD_01_s1s2cd1 
 467: 565360 Mb o/192.168.10.9/RECO_CD_02_s1s2cd1 
 468: 565360 Mb o/192.168.10.9/RECO_CD_03_s1s2cd1 
 469: 565360 Mb o/192.168.10.9/RECO_CD_04_s1s2cd1 
 470: 565360 Mb o/192.168.10.9/RECO_CD_05_s1s2cd1 
 471: 565360 Mb o/192.168.10.9/RECO_CD_06_s1s2cd1 
 472: 565360 Mb o/192.168.10.9/RECO_CD_07_s1s2cd1 
 473: 565360 Mb o/192.168.10.9/RECO_CD_08_s1s2cd1 
 474: 565360 Mb o/192.168.10.9/RECO_CD_09_s1s2cd1 
 475: 565360 Mb o/192.168.10.9/RECO_CD_10_s1s2cd1 
 476: 565360 Mb o/192.168.10.9/RECO_CD_11_s1s2cd1
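With 476 grid disks, the raw kfod listing is hard to digest, so it can be summarized per diskgroup prefix (DATA, DBFS_DG, RECO). A sketch (the helper name is ours) using awk:

```shell
# summarize_kfod: read "kfod disks=all" output on stdin and print the number
# of grid disks and the total size per diskgroup prefix.
summarize_kfod() {
  awk '
    # data lines look like: " 1: 2260992 Mb o/192.168.10.10/DATA_CD_00_s1s2cd2"
    $3 == "Mb" {
      split($4, p, "/")           # p[3] is the grid disk name
      g = p[3]
      sub(/_CD_.*/, "", g)        # keep only the diskgroup prefix
      cnt[g]++
      mb[g] += $2
    }
    END {
      for (g in cnt) printf "%s: %d disks, %d Mb\n", g, cnt[g], mb[g]
    }
  ' | sort
}

# Example: $ORACLE_HOME/bin/kfod disks=all | summarize_kfod
```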