zhangle633 posted on 2016-5-30 16:47:17

Two-node RAC cluster problem: the CRS service on the failed node will not start.

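Hi everyone. The Oracle cluster in our production environment has a problem. It has two nodes, and the CRS service on one of them cannot be started.
Below is my troubleshooting process in detail. Please help me find where the problem is and how to fix it. Thanks.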


SQL> select name,state,total_mb from gv$asm_diskgroup;

NAME
------------------------------------------------------------
STATE                  TOTAL_MB
---------------------- ----------
OCRVT
DISMOUNTED                      0

DATA
DISMOUNTED                      0


SQL> alter diskgroup ocrvt mount;
alter diskgroup ocrvt mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCRVT" cannot be mounted
ORA-15003: diskgroup "OCRVT" already mounted in another lock name space

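I have not been able to manually mount either of these two DISMOUNTED disk groups.
Based on the errors above, I searched online for solutions and then ran the following on the failed node, in this order:
roothas.pl, rootcrs.pl, root.sh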

# ./roothas.pl -deconfig -force -verbose
Using configuration parameter file: ./crsconfig_params
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Delete failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
You must kill ohasd processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
ACFS-9313: No ADVM/ACFS installation detected.
Either /etc/oracle/olr.loc does not exist or is not readable
Make sure the file exists and it has read and execute access
Failure in execution (rc=-1, 512, A file or directory in the path name does not exist.) for command /etc/ohasd deinstall
Successfully deconfigured Oracle Restart stack


# ./rootcrs.pl -deconfig -force -verbose
Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd

CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'racdba3'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'racdba3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'racdba3'
CRS-2673: Attempting to stop 'ora.asm' on 'racdba3'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'racdba3'
CRS-2677: Stop of 'ora.mdnsd' on 'racdba3' succeeded
CRS-2677: Stop of 'ora.asm' on 'racdba3' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'racdba3'
CRS-2677: Stop of 'ora.drivers.acfs' on 'racdba3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'racdba3' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'racdba3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'racdba3'
CRS-2677: Stop of 'ora.cssd' on 'racdba3' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'racdba3'
CRS-2677: Stop of 'ora.gipcd' on 'racdba3' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'racdba3'
CRS-2677: Stop of 'ora.gpnpd' on 'racdba3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'racdba3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
This may take several minutes. Please wait ...
0518-307 odmdelete: 1 objects deleted.
0518-307 odmdelete: 1 objects deleted.
0518-307 odmdelete: 1 objects deleted.
Successfully deconfigured Oracle clusterware stack on this node


# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=/oracle/app/grid_home

Enter the full pathname of the local bin directory: :
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /oracle/app/grid_home/crs/install/crsconfig_params
User ignored Prerequisites during installation
Failed to get parameter value(s) from a "/oracle/app/grid_home/gpnp/racdba3/profiles/peer/profile.xml" profile.
Failed to get parameter value(s) from a "/oracle/app/grid_home/gpnp/profiles/peer/profile.xml" profile.
User grid has the required capabilities to run CSSD in realtime mode
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'racdba3'
CRS-2676: Start of 'ora.mdnsd' on 'racdba3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racdba3'
CRS-2676: Start of 'ora.gpnpd' on 'racdba3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racdba3'
CRS-2672: Attempting to start 'ora.gipcd' on 'racdba3'
CRS-2676: Start of 'ora.cssdmonitor' on 'racdba3' succeeded
CRS-2676: Start of 'ora.gipcd' on 'racdba3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racdba3'
CRS-2672: Attempting to start 'ora.diskmon' on 'racdba3'
CRS-2676: Start of 'ora.diskmon' on 'racdba3' succeeded
CRS-2676: Start of 'ora.cssd' on 'racdba3' succeeded

Mounting Disk Group OCRVT failed with the following message:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCRVT" cannot be mounted
ORA-15003: diskgroup "OCRVT" already mounted in another lock name space


Configuration of ASM ... failed
see asmca logs at /oracle/app/grid_base/cfgtoollogs/asmca for details
Did not succssfully configure and start ASM at /oracle/app/grid_home/crs/install/crsconfig_lib.pm line 6763.
/oracle/app/grid_home/perl/bin/perl -I/oracle/app/grid_home/perl/lib -I/oracle/app/grid_home/crs/install /oracle/app/grid_home/crs/install/rootcrs.pl execution failed
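
The root.sh output above points to the asmca logs for details. A minimal sketch of how the relevant lines could be pulled from the newest log in that directory is shown here. The function name `latest_log_errors` is made up for illustration, and it assumes the directory holds plain-text log files; it is not an Oracle tool.

```shell
# Hypothetical helper: print the ORA- lines from the most recently
# modified regular file in a log directory.
latest_log_errors() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # ls -t sorts newest first; -p marks directories with a trailing slash
  # so they can be filtered out before picking the first entry.
  newest=$(ls -tp "$dir" | grep -v '/$' | head -n 1)
  if [ -z "$newest" ]; then
    echo "no log files in $dir" >&2
    return 1
  fi
  echo "== $dir/$newest"
  # -n adds line numbers so the context is easy to find in the full log.
  grep -n 'ORA-[0-9]' "$dir/$newest"
}
```

On the failed node it could be called as `latest_log_errors /oracle/app/grid_base/cfgtoollogs/asmca`, using the path printed by root.sh.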

So the main problem now is that the ASM disk groups cannot be mounted on the failed node.
The key errors are:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCRVT" cannot be mounted
ORA-15003: diskgroup "OCRVT" already mounted in another lock name space
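
The same three-error stack shows up both in the manual mount attempt and in the root.sh run. A small sketch for reducing any pasted output to its distinct ORA codes, so two runs can be compared quickly, might look like this. The name `ora_codes` is hypothetical, and the sketch assumes at most one ORA code per line, as in the output above.

```shell
# Hypothetical helper: read text on stdin and print each distinct
# ORA-nnnnn code once, in order of first appearance.
ora_codes() {
  # sed keeps only the ORA code from lines that contain one;
  # awk drops repeats while preserving the original order.
  sed -n 's/.*\(ORA-[0-9][0-9]*\).*/\1/p' | awk '!seen[$0]++'
}
```

For example, `ora_codes < root_sh_output.txt` (a file name assumed here for a saved copy of the output) would list ORA-15032, ORA-15017 and ORA-15003 once each, with the last one being the most specific message in the stack.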

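A friend suggested rebooting the other node's server, or stopping the database on it, and running root.sh there as well.
But that node is currently the only one still serving the application, so it cannot be taken down casually.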

Does anyone have good suggestions or a solution?
If logs or anything else are needed, I will post them.