Oracle数据库 ORA-600 [13013]故障处理

接到电话,客户的核心系统Oracle数据库实例宕机,远程过去在告警日志中看到大量的如下报错,报错很频繁:

......

Fri Jul 25 13:20:14 2014

Errors in file /u01/app/oracle/diag/rdbms/d012band/d012band/trace/d012band_smon_5964354.trc  (incident=43361):

ORA-00600: internal error code, arguments: [13013], [5001], [268], [8452274], [7], [8452274], [17], [], [], [], [], []

Incident details in: /u01/app/oracle/diag/rdbms/d012band/d012band/incident/incdir_43361/d012band_smon_5964354_i43361.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Non-fatal internal error happenned while SMON was doing logging scn->time mapping.

SMON encountered 1 out of maximum 100 non-fatal internal errors.

Fri Jul 25 13:20:17 2014

Dumping diagnostic data in directory=[cdmp_20140725132017], requested by (instance=1, osid=5964354 (SMON)), summary=[incident=43361].

Starting background process SMCO

......

一段时间后,最后实例被PMON进程终止,导致核心系统宕机。

客户环境:Oracle Database 11.2.0.3.0 for IBM AIX 6.1,单机,IBM HACMP主备模式保护。

    参考《 New and Improved: ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 1438920.1)》和《ORA-600 [13013] "Unable to get a Stable set of Records" (文档 ID 28185.1)》文章;

该报错是由于对某个表执行DML操作,该表对应的某个索引损坏导致的,解决的办法是找出操作的表和受损的索引,重建索引即可。

下面是对ORA-600的参数说明:

 Argument format

  =================

  This format relates to Oracle Server 8.0.3 and above

    Arg [a] Passcount

    Arg [b] Data Object number

    Arg [c] Tablespace Decimal Relative DBA (RDBA) of block containing the row to be updated

    Arg [d] Row Slot number

    Arg [e] Decimal RDBA of block being updated (Typically same as [c])

    Arg [f] Code

由此执行下面的SQL语句查询找出DML操作的表,以及表对应的索引:

SQL> SELECT OWNER,OBJECT_NAME,OBJECT_TYPE FROM DBA_OBJECTS WHERE DATA_OBJECT_ID=268;

发现问题对象是SYS用户下面的SMON_SCN_TIME表,SYS.SMON_SCN_TIME是Oracle更新非常频繁的一张表,该数据字典用于维护SCN和时间的关系。

SQL> SELECT OWNER,INDEX_NAME FROM DBA_INDEXES WHERE TABLE_NAME='SMON_SCN_TIME';

发现这个表对应有SMON_SCN_TIME_TIM_IDX和SMON_SCN_TIME_SCN_IDX两个索引。

对表进行分析,找出有问题的索引:

SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;

Table analyzed.

SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;

table/Index Cross Reference Failure - see trace file

能够正常的分析表,但是不能正常的分析表上的索引。

参考《OERR: ORA-1499 table/Index Cross Reference Failure - see trace file (文档 ID 1499.1)》文章,

尝试找出受影响的索引,查看trace file:

          BH (0x70000078fe2df98) file#: 2 rdba: 0x0081483e (2/84030) class: 1 ba: 0x70000078ce52000

            set: 174 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2272,19

            dbwrid: 5 obj: 272 objn: 272 tsn: 1 afn: 2 hint: f

            hash: [0x700000801ff36a0,0x700000801ff36a0] lru: [0x70000074bf81f40,0x70000074bf81ce0]

            ckptq: [NULL] fileq: [NULL] objq: [0x7000007f59b4138,0x7000007f59b4138] objaq: [0x7000007f59b4128,0x7000007f59b4128]

            use: [NULL] wait: [NULL] fast-cr-pins: 2

            st: XCURRENT md: NULL fpin: 'kdiwh15: kdifxs' tch: 12

            flags:

            LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]

            buffer tsn: 1 rdba: 0x0081483e (2/84030)

            scn: 0x0000.2f4752d8 seq: 0x01 flg: 0x06 tail: 0x52d80601

            frmt: 0x02 chkval: 0x0778 type: 0x06=trans data

0x0081483e要去掉0x,0x表示十六进制。

执行下面的SQL语句:

SELECT owner, segment_name, segment_type, partition_name

FROM   DBA_SEGMENTS

WHERE  header_file = (SELECT file# 

                      FROM   v$datafile 

                      WHERE  rfile# = dbms_utility.data_block_address_file(to_number('0080f8b2','XXXXXXXX'))

                        AND  ts#= 1)

  AND header_block = dbms_utility.data_block_address_block(to_number('0080f8b2','XXXXXXXX'));

可惜没有任何的查询结果。

继续参考《Instance Terminated With Error ORA-00474: SMON Process Terminated With Error (文档 ID 1361872.1)》文章,文章有下面一段描述:

CAUSE

ORA600 [13011] is raised due to indexes corruption. To verify the corruption run the following statements:

SQL> conn / as sysdba

SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE;

Table analyzed.

-- It should come out clean giving message table analyzed.

SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE; 

-- it should fail with Ora-1499 if at least one index is corrupted

ORA-1499就是我们前面遇到的报错:

Error: ORA 1499 

Text: table/Index Cross Reference Failure - see trace file 

-------------------------------------------------------------------------------

Cause:  An error occurred when validating an index or a table using the 

        ANALYZE command.

        One or more entries does not point to the appropriate cross-reference.

Action: Check the trace file for more descriptive messages about the problem.

        Correct these errors.

如果在分析表的时候收到ORA-1499的错误,表示至少有一个索引是损坏的。

SOLUTION

Rebuild corrupted indexes:

SQL> conn as sysdba

SQL> ALTER INDEX SMON_SCN_TIME_TIM_IDX REBUILD ONLINE;

SQL> ALTER INDEX SMON_SCN_TIME_SCN_IDX REBUILD ONLINE;

then re-run

SQL> ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE;

Table analyzed.

Note: The last statement should not report any errors.

两个索引重建之后,ANALYZE TABLE smon_scn_time VALIDATE STRUCTURE CASCADE ONLINE语句能顺利的分析,告警日志不再有报错产生。