1.02.2018

Wrong Cell Server name on X6-2 Elastic Rack - Bug 25317550

Two X6-2 Elastic Full capacity Exadata systems were deployed recently. Because of the following bug, the cell names were not updated with the client-provided names after applyElasticConfig.sh was executed.

Bug 25317550 : OEDA FAILS TO SET CELL NAME RESULTING IN GRID DISK NAMES NOT HAVING RIGHT SUFFIX

Although this doesn't impact operations, it will certainly create confusion when multiple Exadata systems are deployed in the same data center, because the cells, cell disks, and grid disks end up with identical names.

Note: It's highly recommended to validate the cell names after executing applyElasticConfig.sh and before running onecommand. If you encounter a similar problem, simply change the cell name with alter cell name=[correctname] and proceed with the onecommand execution to avoid the bug.

The default names look like this:

# dcli -g cell_group -l root 'cellcli -e list cell attributes name'
                        celadm1: ru02
                        celadm2: ru04
                        celadm3: ru06
                        celadm4: ru08
                        celadm5: ru10
                        celadm6: ru12
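
The validation recommended in the note above can be scripted. The following is a minimal sketch, not part of the original procedure: `find_misnamed_cells` is a hypothetical helper name, and it assumes the intended cell name is the short host name that dcli prints in front of each line.

```shell
# Print every cell whose CellCLI name differs from the host name dcli reports.
# Input: dcli output lines of the form "<host>: <cell name>" on stdin.
# Assumption: the intended cell name equals the host name shown by dcli.
find_misnamed_cells() {
    awk -F':' '
        NF >= 2 {
            host = $1; name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", host)   # trim padding around the host
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
            if (host != name) print host, name
        }'
}
```

It could be fed directly from the command shown above, for example `dcli -g cell_group -l root 'cellcli -e list cell attributes name' | find_misnamed_cells`; an empty result would mean every cell already carries its host name.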



To change the cell name and have the cell disk and grid disk names reflect it, follow the procedure below.

The procedure must be performed on each cell separately and sequentially (to avoid full downtime):

1) Change the cell name:
    cellcli> alter cell name=celadm5

 
2) Confirm griddisks can be taken offline.

    cellcli> list griddisk attributes name, ASMDeactivationOutcome, ASMModeStatus
            ASMDeactivationOutcome - Should be YES for all griddisks
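
The check in step 2 can be automated rather than read by eye. This is a sketch under assumptions: `griddisks_safe_to_offline` is a hypothetical helper, and it expects the three columns in the order requested by the list command above (name, ASMDeactivationOutcome, ASMModeStatus).

```shell
# Succeeds (exit 0) only if every grid disk reports ASMDeactivationOutcome = Yes.
# Input: "name  asmdeactivationoutcome  asmmodestatus" rows on stdin.
# Grid disks that are not safe to take offline are printed one per line.
griddisks_safe_to_offline() {
    awk '
        NF >= 2 && tolower($2) != "yes" { print $1; unsafe = 1 }
        END { exit unsafe }'
}
```

A possible use on the cell is `cellcli -e "list griddisk attributes name, asmdeactivationoutcome, asmmodestatus" | griddisks_safe_to_offline && echo "safe to inactivate"`.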


3) Inactivate griddisk on that cell
    cellcli> alter griddisk all inactive

 
            Observation - If any voting disks are on this storage server, they will relocate to one of the surviving storage servers.


4) Change cell disk name
            alter celldisk CD_00_ru10 name=CD_00_celadm5;
            alter celldisk CD_01_ru10 name=CD_01_celadm5;
            alter celldisk CD_02_ru10 name=CD_02_celadm5;
            alter celldisk CD_03_ru10 name=CD_03_celadm5;
            alter celldisk CD_04_ru10 name=CD_04_celadm5;
            alter celldisk CD_05_ru10 name=CD_05_celadm5;
            alter celldisk CD_06_ru10 name=CD_06_celadm5;
            alter celldisk CD_07_ru10 name=CD_07_celadm5;
            alter celldisk CD_08_ru10 name=CD_08_celadm5;
            alter celldisk CD_09_ru10 name=CD_09_celadm5;
            alter celldisk CD_10_ru10 name=CD_10_celadm5;
            alter celldisk CD_11_ru10 name=CD_11_celadm5;
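
Typing twelve near-identical commands invites typos, so the list can be generated instead. The sketch below only prints the commands for review; `gen_celldisk_renames` is a hypothetical helper, and the default of 12 cell disks (CD_00 to CD_11) is taken from the listing above.

```shell
# Print the CellCLI rename command for each cell disk CD_00 .. CD_<count-1>.
# Arguments: old suffix, new suffix, number of cell disks (default 12).
# The body runs in a subshell so the variables do not leak into the caller.
gen_celldisk_renames() (
    old=$1; new=$2; count=${3:-12}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'alter celldisk CD_%02d_%s name=CD_%02d_%s\n' "$i" "$old" "$i" "$new"
        i=$((i + 1))
    done
)
```

For the cell in this example, `gen_celldisk_renames ru10 celadm5` prints the twelve commands, which can then be checked and pasted into cellcli.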

5) Change the griddisk names as in the examples below (do it for all grid disks: DATAC1, DBFS_DG and RECOC1)
            alter GRIDDISK DATAC1_CD_00_ru10  name=DATAC1_CD_00_celadm5;
            alter GRIDDISK DBFS_DG_CD_02_ru10 name=DBFS_DG_CD_02_celadm5;
            alter GRIDDISK RECOC1_CD_11_ru10  name=RECOC1_CD_11_celadm5;
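
Because the set of grid disks differs per disk group, one way to cover all of them is to derive the rename commands from the cell's own grid disk listing. This is a sketch only: `gen_griddisk_renames` is a hypothetical helper that reads one grid disk name per line and prints commands without executing them.

```shell
# Turn a list of grid disk names (one per line on stdin) into rename commands,
# replacing the trailing _<old> suffix with _<new>. Names that do not end in
# the old suffix are skipped.
gen_griddisk_renames() (
    old=$1; new=$2
    awk -v old="_$old" -v new="_$new" '
        {
            name = $1
            n = length(name) - length(old)      # length of the part to keep
            if (n > 0 && substr(name, n + 1) == old)
                print "alter griddisk " name " name=" substr(name, 1, n) new
        }'
)
```

A possible invocation on the cell is `cellcli -e "list griddisk attributes name" | gen_griddisk_renames ru10 celadm5`.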

6) Activate griddisk on that cell
            cellcli> ALTER GRIDDISK ALL ACTIVE;
       
        There are some important points to note after activating the griddisks:


      a) ASM disk path and name
       * The griddisk name change is automatically reflected in the ASM disk path.
       * The ASM logical disk name still refers to the old name.
      b) Failgroup
       * The failgroup name is not changed and still uses the old name.
7) Changing ASM logical name and failgroup name.

    * This can be achieved by dropping the ASM disks and adding them back with the correct names. The observation is that the failgroup name is changed automatically when the ASM disks are added back with the correct names.
    * ASMCA is the best tool to drop and add back ASM disks with a rebalance power limit of 250+.
   
    a) Drop ASM disks and observations.
        * Make sure the ASM disks can be dropped:
             cellcli> list griddisk attributes name, ASMDeactivationOutcome, ASMModeStatus
                ASMDeactivationOutcome - Should be YES for all griddisks
        * Drop the ASM disks using asmca or alter diskgroup.


        The ASM disk state changes to DROPPING and a rebalance operation starts.

        * ASM rebalance operation.
            The ongoing ASM rebalance operation can be monitored with the commands below; raise the power to finish it faster.


                sqlplus / as sysasm


                sql> select * from v$asm_operation;
                sql> alter diskgroup DATAC1 rebalance power 256;


        * Once the rebalance operation has completed, the ASM disk state changes to NORMAL, the disk name becomes empty, and the failgroup also changes to the correct name.
       
b) Add back ASM disks and observations.

Adding the disks back can also be done with asmca or with alter diskgroup commands.
Make sure the disks are added back with the correct names; in this case, DATAC1_CD_00_RU10 should be added back as DATAC1_CD_00_arb02celadm19.

The ongoing ASM rebalance operation can be monitored with the command below, and the power can be raised to finish it faster.


        sqlplus / as sysasm
        sql> select * from v$asm_operation;
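
For the alter diskgroup route, the add-back statement can be generated from the renamed grid disk list so that the ASM names match the new grid disk names. This is a hedged sketch, not the method used in the original work: `gen_asm_add_disks` is a hypothetical helper, the cell address `192.168.10.9` in the usage line is a made-up value (copy the real one from the existing paths in v$asm_disk), and putting all disks into a single statement with power 256 is my own choice. Review the generated SQL before running it as sysasm.

```shell
# Build one "alter diskgroup ... add disk" statement for a disk group.
# Arguments: disk group, cell address as used in ASM disk paths,
#            new cell suffix, rebalance power (default 256).
# Input: grid disk names, one per line on stdin; only names that start with
#        "<diskgroup>_" and end with "_<suffix>" are used.
gen_asm_add_disks() (
    dg=$1; cell_addr=$2; suffix=$3; power=${4:-256}
    awk -v dg="$dg" -v addr="$cell_addr" -v suffix="_$suffix" \
        -v power="$power" -v q="'" '
        index($1, dg "_") == 1 &&
        substr($1, length($1) - length(suffix) + 1) == suffix {
            # Exadata disk path is o/<cell address>/<grid disk name>
            clause = "  " q "o/" addr "/" $1 q " name " toupper($1)
            list = (list == "" ? clause : list ",\n" clause)
        }
        END {
            if (list != "")
                printf "alter diskgroup %s add disk\n%s\n  rebalance power %s;\n", dg, list, power
        }'
)
```

For example, `cellcli -e "list griddisk attributes name" | gen_asm_add_disks DATAC1 192.168.10.9 celadm5` would print a single statement covering every DATAC1 grid disk on that cell.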
   
8) Remaining cells

The same operation can be repeated for the remaining cells, and the entire operation can be completed without any downtime at the database level.
Once it is complete, all voting disks will also have been relocated or renamed with the new names.


References:
Bug 25317550 : OEDA FAILS TO SET CELL NAME RESULTING IN GRID DISK NAMES NOT HAVING RIGHT SUFFIX

I appreciate and thank my team member Khalid Kizhakkethil for doing this wonderful job and preparing the documentation.


 


