Monday, April 11, 2016

How To Clean Up Grid Infrastructure After an Installation Failure

This blog post is about cleaning up a failed Grid Infrastructure (GI) installation. A GI installation can fail for various reasons, most often while the root.sh script is running. In my case I had allocated very little memory to the VM, so root.sh hung. I waited patiently for 3 hours, nothing moved, and I had to shut the VM down abruptly and go home.

C:\>"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" list hdds
UUID:           3326c82d-4a75-4f5a-a306-5762ec45db86
Parent UUID:    base
State:          locked write
Type:           normal (base)
Location:       C:\RAC11g\ajithn1\ajithn1.vdi
Storage format: VDI
Capacity:       30720 MBytes
Encryption:     disabled

1)      When I checked, things were in a half-baked state. On node “ajithn1” the ASM instance was up and running and all CRS services were online.

Using username "oracle".
Last login: Sun Apr 10 16:57:19 2016
[oracle@ajithn1 ~]$ ps -ef|grep pmon
oracle    3697     1  0 08:21 ?        00:00:00 asm_pmon_+ASM1
oracle    4423  4346  0 08:32 pts/1    00:00:00 grep pmon
[oracle@ajithn1 ~]$ su -
Password:
[root@ajithn1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@ajithn1 ~]#

2)      On node “ajithn2” the ASM instance was also up, but Cluster Ready Services could not be reached.
Using username "oracle".
Last login: Sun Apr 10 16:58:58 2016
[oracle@ajithn2 ~]$ ps -ef|grep pmon
oracle    3733     1  0 08:32 ?        00:00:00 asm_pmon_+ASM2
oracle    3846  3607  0 08:32 pts/1    00:00:00 grep pmon
[oracle@ajithn2 ~]$ crsctl check crs
-bash: crsctl: command not found
[oracle@ajithn2 ~]$ su -
Password:
[root@ajithn2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@ajithn2 ~]#
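
The two checks above can be condensed into a couple of small filters. This is only a minimal sketch: the function names crs_problems and asm_instances are my own and not part of any Oracle tool, and the sample text is copied from the ajithn2 transcript so the snippet runs without a cluster.

```shell
# Print every line of `crsctl check crs` output that does not end in "is online".
crs_problems() {
    grep '^CRS-' | grep -v 'is online$'
}

# Extract ASM instance names from `ps -ef` output.
# Only the last field is inspected, so the "grep pmon" process itself is ignored.
asm_instances() {
    awk '$NF ~ /^asm_pmon_/ { sub(/^asm_pmon_/, "", $NF); print $NF }'
}

# Sample input copied from the ajithn2 session.
sample_crs='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online'

sample_ps='oracle    3733     1  0 08:32 ?        00:00:00 asm_pmon_+ASM2
oracle    3846  3607  0 08:32 pts/1    00:00:00 grep pmon'

printf '%s\n' "$sample_crs" | crs_problems
printf '%s\n' "$sample_ps" | asm_instances
```

On a live node the same filters would be fed from the real commands, for example `crsctl check crs | crs_problems` as root and `ps -ef | asm_instances` as any user.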


3)      Now, how do you clean up the GI installation? Run the perl command below as root on every node except the last one. (I have only 2 nodes, so the second node gets a slightly different command.)

[oracle@ajithn1 ~]$ perl /u01/grid/oracle/product/11.2.0/grid_1/crs/install/rootcrs.pl -verbose -deconfig -force
2016-04-11 08:35:03: Parsing the host name
2016-04-11 08:35:03: Checking for super user privileges
You must be logged in as root to run this script.
2016-04-11 08:35:03: ###### Begin Error Stack Trace ######
2016-04-11 08:35:03:     Package         File                 Line Calling
2016-04-11 08:35:03:     --------------- -------------------- ---- ----------
2016-04-11 08:35:03:  1: crsconfig_lib   s_crsconfig_lib.pm    121 crsconfig_lib::error
2016-04-11 08:35:03:  2: crsconfig_lib   crsconfig_lib.pm      856 crsconfig_lib::s_check_SuperUser
2016-04-11 08:35:03:  3: main            rootcrs.pl            311 crsconfig_lib::check_SuperUser
2016-04-11 08:35:03: ####### End Error Stack Trace #######

Log in as root and rerun this script.
2016-04-11 08:35:03: ###### Begin Error Stack Trace ######
2016-04-11 08:35:03:     Package         File                 Line Calling
2016-04-11 08:35:03:     --------------- -------------------- ---- ----------
2016-04-11 08:35:03:  1: crsconfig_lib   s_crsconfig_lib.pm    122 crsconfig_lib::error
2016-04-11 08:35:03:  2: crsconfig_lib   crsconfig_lib.pm      856 crsconfig_lib::s_check_SuperUser
2016-04-11 08:35:03:  3: main            rootcrs.pl            311 crsconfig_lib::check_SuperUser
2016-04-11 08:35:03: ####### End Error Stack Trace #######

2016-04-11 08:35:03: Not running as authorized user
Insufficient privileges to execute this script
2016-04-11 08:35:03: ###### Begin Error Stack Trace ######
2016-04-11 08:35:03:     Package         File                 Line Calling
2016-04-11 08:35:03:     --------------- -------------------- ---- ----------
2016-04-11 08:35:03:  1: main            rootcrs.pl            313 crsconfig_lib
2016-04-11 08:35:03: ####### End Error Stack Trace #######

[oracle@ajithn1 ~]$ su -
Password:
[root@ajithn1 ~]# perl /u01/grid/oracle/product/11.2.0/grid_1/crs/install/rootcr
2016-04-11 08:35:23: Parsing the host name
2016-04-11 08:35:23: Checking for super user privileges
2016-04-11 08:35:23: User has super user privileges
Using configuration parameter file: /u01/grid/oracle/product/11.2.0/grid_1/crs/i
VIP exists.:ajithn1
VIP exists.: /ajithn1-vip/192.168.78.61/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 17212, multicast IP address 234.96.173.129, l
ACFS-9200: Supported
CRS-2613: Could not find resource 'ora.registry.acfs'.
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resourc
CRS-2673: Attempting to stop 'ora.crsd' on 'ajithn1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'ajit
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'ajithn1'
CRS-2677: Stop of 'ora.DATA.dg' on 'ajithn1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'ajithn1'
CRS-2677: Stop of 'ora.asm' on 'ajithn1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'ajithn1' has completed
CRS-2677: Stop of 'ora.crsd' on 'ajithn1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ajithn1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'ajithn1'
CRS-2673: Attempting to stop 'ora.evmd' on 'ajithn1'
CRS-2673: Attempting to stop 'ora.asm' on 'ajithn1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ajithn1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'ajithn1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ajithn1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ajithn1' succeeded
CRS-2677: Stop of 'ora.asm' on 'ajithn1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'ajithn1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'ajithn1'
CRS-2677: Stop of 'ora.cssd' on 'ajithn1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ajithn1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'ajithn1'
CRS-2677: Stop of 'ora.gpnpd' on 'ajithn1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'ajithn1'
CRS-2677: Stop of 'ora.diskmon' on 'ajithn1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ajithn1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ajithn1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
[root@ajithn1 ~]#
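
As the first attempt above shows, rootcrs.pl refuses to run for the oracle user. If you wrap the cleanup in a script, a small guard can catch that before anything starts. This is a rough sketch only; require_root is a hypothetical helper name of mine, and it takes the uid as an argument so it can be tried out without being root.

```shell
# Succeed only when the given numeric uid is 0 (root).
# $1 = numeric user id, normally "$(id -u)"
require_root() {
    if [ "$1" -ne 0 ]; then
        echo "must be run as root (current uid: $1)" >&2
        return 1
    fi
}

if require_root "$(id -u)"; then
    echo "running as root, rootcrs.pl can be called"
else
    echo "switch to root first (su -)"
fi
```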


4)      Now run the perl command below, with the extra -lastnode flag, on the last node of the cluster. In my case that is node “ajithn2”, since I have only 2 nodes.

[oracle@ajithn2 ~]$ perl /u01/grid/oracle/product/11.2.0/grid_1/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode
2016-04-11 08:37:29: Parsing the host name
2016-04-11 08:37:29: Checking for super user privileges
You must be logged in as root to run this script.
2016-04-11 08:37:29: ###### Begin Error Stack Trace ######
2016-04-11 08:37:29:     Package         File                 Line Calling
2016-04-11 08:37:29:     --------------- -------------------- ---- ----------
2016-04-11 08:37:29:  1: crsconfig_lib   s_crsconfig_lib.pm    121 crsconfig_lib::error
2016-04-11 08:37:29:  2: crsconfig_lib   crsconfig_lib.pm      856 crsconfig_lib::s_check_SuperUser
2016-04-11 08:37:29:  3: main            rootcrs.pl            311 crsconfig_lib::check_SuperUser
2016-04-11 08:37:29: ####### End Error Stack Trace #######

Log in as root and rerun this script.
2016-04-11 08:37:29: ###### Begin Error Stack Trace ######
2016-04-11 08:37:29:     Package         File                 Line Calling
2016-04-11 08:37:29:     --------------- -------------------- ---- ----------
2016-04-11 08:37:29:  1: crsconfig_lib   s_crsconfig_lib.pm    122 crsconfig_lib::error
2016-04-11 08:37:29:  2: crsconfig_lib   crsconfig_lib.pm      856 crsconfig_lib::s_check_SuperUser
2016-04-11 08:37:29:  3: main            rootcrs.pl            311 crsconfig_lib::check_SuperUser
2016-04-11 08:37:29: ####### End Error Stack Trace #######

2016-04-11 08:37:29: Not running as authorized user
Insufficient privileges to execute this script
2016-04-11 08:37:29: ###### Begin Error Stack Trace ######
2016-04-11 08:37:29:     Package         File                 Line Calling
2016-04-11 08:37:29:     --------------- -------------------- ---- ----------
2016-04-11 08:37:29:  1: main            rootcrs.pl            313 crsconfig_lib::error
2016-04-11 08:37:29: ####### End Error Stack Trace #######

[oracle@ajithn2 ~]$ su -
Password:
[root@ajithn2 ~]# perl /u01/grid/oracle/product/11.2.0/grid_1/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode
2016-04-11 08:37:44: Parsing the host name
2016-04-11 08:37:44: Checking for super user privileges
2016-04-11 08:37:44: User has super user privileges
Using configuration parameter file: /u01/grid/oracle/product/11.2.0/grid_1/crs/install/crsconfig_params
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 17212, multicast IP address 234.96.173.129, listening port 2016
PRKO-2439 : VIP does not exist.

ACFS-9200: Supported
CRS-2613: Could not find resource 'ora.registry.acfs'.
CRS-4000: Command Stop failed, or completed with errors.
CRS-2613: Could not find resource 'ora.registry.acfs'.
CRS-4000: Command Delete failed, or completed with errors.
CRS-2673: Attempting to stop 'ora.crsd' on 'ajithn2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'ajithn2'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'ajithn2'
CRS-2677: Stop of 'ora.DATA.dg' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'ajithn2'
CRS-2677: Stop of 'ora.asm' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.eons' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.ons' on 'ajithn2'
CRS-2677: Stop of 'ora.ons' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.eons' on 'ajithn2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'ajithn2' has completed
CRS-2677: Stop of 'ora.crsd' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.evmd' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.asm' on 'ajithn2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.asm' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'ajithn2'
CRS-2677: Stop of 'ora.cssd' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'ajithn2'
CRS-2677: Stop of 'ora.diskmon' on 'ajithn2' succeeded
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-4000: Command Modify failed, or completed with errors.
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'ajithn2'
CRS-2676: Start of 'ora.cssdmonitor' on 'ajithn2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'ajithn2'
CRS-2672: Attempting to start 'ora.diskmon' on 'ajithn2'
CRS-2676: Start of 'ora.diskmon' on 'ajithn2' succeeded
CRS-2676: Start of 'ora.cssd' on 'ajithn2' succeeded
CRS-4611: Successful deletion of voting disk +DATA.
CRS-2672: Attempting to start 'ora.ctssd' on 'ajithn2'
CRS-2676: Start of 'ora.ctssd' on 'ajithn2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'ajithn2'
CRS-2676: Start of 'ora.asm' on 'ajithn2' succeeded
ASM de-configuration trace file location: /u01/app/oracle/cfgtoollogs/asmca/asmcadc_clean4904428430281726239.log
ASM Clean Configuration START
ASM Clean Configuration END

ASM with SID +ASM1 deleted successfully. Check /u01/app/oracle/cfgtoollogs/asmca/asmcadc_clean4904428430281726239.log for details.

CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-4000: Command Delete failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ajithn2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.asm' on 'ajithn2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.asm' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'ajithn2'
CRS-2677: Stop of 'ora.cssd' on 'ajithn2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'ajithn2'
CRS-2673: Attempting to stop 'ora.gipcd' on 'ajithn2'
CRS-2677: Stop of 'ora.gipcd' on 'ajithn2' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'ajithn2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ajithn2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@ajithn2 ~]#
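
The only difference between steps 3 and 4 is the -lastnode flag. For a cluster with more nodes, the per-node command could be generated along these lines. Treat it as a sketch under assumptions: deconfig_cmd is a hypothetical helper of mine, the Grid home and node names are the ones from my setup, and the commands are only printed, not executed.

```shell
# Print the deconfig command for one node.
# $1 = node name, $2 = name of the last node, $3 = Grid home
deconfig_cmd() {
    node=$1
    last=$2
    grid_home=$3
    cmd="perl $grid_home/crs/install/rootcrs.pl -verbose -deconfig -force"
    # Only the last node of the cluster gets the extra flag.
    if [ "$node" = "$last" ]; then
        cmd="$cmd -lastnode"
    fi
    printf '%s\n' "$cmd"
}

GRID_HOME=/u01/grid/oracle/product/11.2.0/grid_1
nodes="ajithn1 ajithn2"
last_node=ajithn2

for n in $nodes; do
    printf '%s: ' "$n"
    deconfig_cmd "$n" "$last_node" "$GRID_HOME"
done
```

Each printed command still has to be run as root on the node it is meant for.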


5)      My ASM instance had also been up, so when I try to reinstall GI I will not be able to reuse the ASM devices (sdb1, sdc1) that already carry disk group metadata. So let’s scribble over the device headers with the dd command to erase the ASM disk group details and make the devices reusable.

[root@ajithn1 ~]# /etc/init.d/oracleasm listdisks
BACKUP
DATA
[root@ajithn1 ~]# dd if=/dev/zero of=/dev/sdb1 bs=1024 count=100
100+0 records in
100+0 records out
102400 bytes (102 kB) copied, 0.267837 seconds, 382 kB/s
[root@ajithn1 ~]# dd if=/dev/zero of=/dev/sdc1 bs=1024 count=100
100+0 records in
100+0 records out
102400 bytes (102 kB) copied, 0.357379 seconds, 287 kB/s
[root@ajithn1 ~]#
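
With bs=1024 and count=100, dd overwrites the first 100 x 1024 = 102400 bytes of the device and leaves everything after that alone. To see the effect without touching a real disk, here is a small sketch that does the same thing to a scratch file. The wipe_header function and the 200 KiB file of 'A' bytes are my own stand-ins, not part of the original procedure.

```shell
# Zero the first N KiB of a target (default 100, as in the dd commands above).
# $1 = target path, $2 = number of 1 KiB blocks to zero
wipe_header() {
    target=$1
    blocks=${2:-100}
    # conv=notrunc keeps the rest of a regular file intact;
    # the commands above did not need it because they wrote to block devices.
    dd if=/dev/zero of="$target" bs=1024 count="$blocks" conv=notrunc 2>/dev/null
}

# Scratch file: 200 KiB of 'A' bytes standing in for a labelled disk.
scratch=$(mktemp)
head -c 204800 /dev/zero | tr '\0' 'A' > "$scratch"

wipe_header "$scratch" 100

# Count the bytes that are still non-zero (everything past the wiped header).
remaining=$(tr -d '\0' < "$scratch" | wc -c | tr -d ' ')
echo "non-zero bytes left: $remaining"

rm -f "$scratch"
```

Half of the scratch file (102400 bytes) survives, which is why a short dd like this is quick even on a large device: only the header area is rewritten.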


6)      Removing the ASM disks

[root@ajithn1 ~]# /etc/init.d/oracleasm deletedisk DATA /dev/sdb1
Removing ASM disk "DATA":                                  [  OK  ]
[root@ajithn1 ~]# /etc/init.d/oracleasm deletedisk BACKUP /dev/sdc1
Removing ASM disk "BACKUP":                                [  OK  ]
[root@ajithn1 ~]# /etc/init.d/oracleasm listdisks
[root@ajithn1 ~]#

7)      Recreating the ASM disks.

[root@ajithn1 ~]# /etc/init.d/oracleasm createdisk data /dev/sdb1
Marking disk "data" as an ASM disk:                        [  OK  ]
[root@ajithn1 ~]# /etc/init.d/oracleasm createdisk backup /dev/sdc1
Marking disk "backup" as an ASM disk:                      [  OK  ]
[root@ajithn1 ~]# /etc/init.d/oracleasm listdisks
BACKUP
DATA
[root@ajithn1 ~]#
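
If you have more than a couple of devices, steps 6 and 7 get repetitive. Below is a dry-run sketch that prints the oracleasm commands in the same order I ran them, one LABEL:DEVICE pair per argument. relabel_plan is a hypothetical helper of mine and it only prints, so nothing is changed until you run the printed lines yourself as root.

```shell
# Print (not execute) the ASMLib commands for each LABEL:DEVICE pair.
relabel_plan() {
    # First drop every existing label ...
    for pair in "$@"; do
        printf '/etc/init.d/oracleasm deletedisk %s %s\n' "${pair%%:*}" "${pair#*:}"
    done
    # ... then stamp each device again and list the result.
    for pair in "$@"; do
        printf '/etc/init.d/oracleasm createdisk %s %s\n' "${pair%%:*}" "${pair#*:}"
    done
    printf '/etc/init.d/oracleasm listdisks\n'
}

relabel_plan DATA:/dev/sdb1 BACKUP:/dev/sdc1
```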


Once the ASM disks are recreated, we can restart the GI installation and it should succeed. Mine did, after I increased the VM memory.



HAPPY LEARNING!
