Srinivasarao Kandimalla's Blog: 2012

Friday, 7 September 2012

Ocr and voting disks Administration

Voting disk:
As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time. So if you want to be able to tolerate a failure of n
voting disks, you must have at least 2n+1 configured. (n=1 means 3 voting disks). Oracle Clusterware supports a maximum of 32 voting disks.It is recommended to use an odd number as 4 disks will not be any more highly available than 3 disks, 1/2 of 3 is 1.5...rounded to 2, 1/2 of 4 is 2, once we lose 2 disks, our
cluster will fail with both 4 voting disks or 3 voting disks.

Friday, 31 August 2012

RMAN

1. checksyntax

>rman checksyntax

Recovery Manager: Release 10.2.0.5.0 - Production on Fri Aug 31 02:31:28 2012
Copyright (c) 1982, 2007, Oracle. All rights reserved.
RMAN> @hot_d2d.rcv
RMAN> #
2> #   File:         NLPR1.rcv
3> #   Description: RMAN script to backup the NLPR1 database
4> #   Author:       Paul Gray , I.B.M.
5> #
6> #   Modification History
7> #   --------------------
8> #   P. Gray           01/09/03 Created
9> #   N. Sampson        20/07/05 Amended for R&D
10> #########################################################################
11>
12> run {
13> allocate channel dsk1 type disk format '${DESTINATION}/%d_inc0_%U.rbck' ;
14> backup incremental level 0 database ;
15> release channel dsk1;
16> }
The cmdfile has no syntax errors
RMAN>
RMAN> sql 'alter system archive log current';
The cmdfile has no syntax errors
RMAN>
RMAN> change archivelog all crosscheck;
The cmdfile has no syntax errors
RMAN>
RMAN> run {
2> allocate channel arch1 type disk format '${DESTINATION}/%d_arch_%U.rbck';
3> backup archivelog all ;
4> backup archivelog all archivelog until time 'sysdate-2' delete input;
5> release channel arch1;
6> }
The cmdfile has no syntax errors
RMAN> exit
Recovery Manager complete.

Tuesday, 31 July 2012

Local inventory vs Global inventory

oraInventory

The inventory is a very important part of the Oracle Universal Installer. oraInventory is repository (directory) which store/records oracle software products & their oracle_homes location on a machine.

Binary OraInventory

Before OUI 2.X the inventory was binary, the binary orainvenory maintains in inventory in binary format

XML Inventory

Starting from OUI 2.X and higher information in the inventory is stored in the Extensible Markup Language (XML) format
The XML format allows easier diagnostic of the problem and faster loading of data.
There are two inventories with the newer releases of OUI (2.x and higher):

1.Local Inventory
There is one Local inventory per ORACLE_HOME.
Inventory inside each Oracle Home is called as local Inventory or oracle_home Inventory. This Inventory holds information to that oracle_home only.
2.Global Inventory
The central inventory directory outside the ORACLE_HOME (Global Inventory)

This inventory stores information about.

All the Oracle software products installed on all ORACLE_HOMES on a machine,These products can be various oracle components like database, oracle application server, collaboration suite, soa suite, forms & reports or discoverer server.
Other non-oracle products such as Java Runtime env's (JRE)

At startup, the Oracle Universal Installer first looks for the key that specifies where the global inventory is located at (this key varies by platform).
/var/opt/oracle/oraInst.loc (typical)
/etc/oraInst.loc (AIX and Linux)
HKEY_LOCAL_MACHINE -> Software -> Oracle -> INST_LOC (Windows platforms)

If this key is found, the directory within it will be used as the global inventory location.
If the key is not found, the inventory path will default created as follows:

UNIX : ORACLE_BASE\oraInventory
WINDOWS : c:\program files\oracle\Inventory

If the ORACLE_BASE environment variable is not defined, the inventory is created at the same level as the first Oracle home. For
example, if your first Oracle home is at /private/ORACLEHome1, then, the inventory is at /private/oraInventory.

If you want to see list of oracle products on machine check for file inventory.xml under ContentsXML in oraInventory
multiple Global Inventory on a machine
YES you can have multiple global Inventory
Global Inventory is corrupted ?
No need to worry if your global Inventory is corrupted, you can recreate global Inventory on machine using Universal Installer and attach already Installed oracle home by option
-attachHome
./runInstaller -silent -attachHome -invPtrLoc $location_to_oraInst.loc
ORACLE_HOME="Oracle_Home_Location" ORACLE_HOME_NAME="Oracle_Home_Name"
CLUSTER_NODES="{}"

Wednesday, 6 June 2012

New finding in architecture

Just now i find out very interesting concept in Architecture.

When instance starts, all background process related to read/write with database (DBWR & LGWR) will read redolog files and data files while opening the database. actually this used to happen in oracle 6.0 in which there is no CKPT process and LGWR used to perform checkpoint at that time. after that even though oracle changed and introduced CKPT, still in 10g also LGWR reads datafiles while opening the database.

For more information read following link.
http://forums.oracle.com/forums/thread.jspa?messageID=2631627&#2631627

Saturday, 2 June 2012

Recovery scenario during Loss of all controlfiles

LOSS of all controlfiles (Complete recovery and No need to open the database with resetlogs after recreating the controlfile manually)

Assumptions:

- We are tracing the backup controlfile as part of backup activity or we are having controlfile recreation script.

- We have lossed all the controlfiles including mirrored controlfiles also.

1. I have deleted all the controlfiles manually at OS level.

ls -ltr *.ctl

-rw-r----- 1 oracle dba 6602752 Oct 26 14:01 SID_control03.ctl

-rw-r----- 1 oracle dba 6602752 Oct 26 14:01 SID_control02.ctl

-rw-r----- 1 oracle dba 6602752 Oct 26 14:01 SID_control01.ctl

rm *.ctl

ls -ltr *.ctl

ls: 0653-341 The file *.ctl does not exist.

2. After deleting the controlfiles I have connected to the database and try to access the some information from controlfile views, while viewing the information from v$controlfile_record_section view got the below error. Below are the details.

sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.2.0 - Production on Mon Oct 26 14:02:32 2009

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production

With the Partitioning and Data Mining options

SQL> select * from V$CONTROLFILE_RECORD_SECTION;

select * from V$CONTROLFILE_RECORD_SECTION

ERROR at line 1:

ORA-00210: cannot open the specified control file

ORA-00202: control file:

'/restore1/oracle/SID/data/SID_control01.ctl'

ORA-27041: unable to open file

IBM AIX RISC System/6000 Error: 2: No such file or directory

Additional information: 3

Note:

After above step I tried to reproduce the same error in different ways,after lossing all the controlfiles also, I'm able to do manual check point and swich logfile successfully. While accessing most of the v$ views I got the controlfile error. Below are the details.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL> show parameter control

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

control_file_record_keep_time integer 7

control_files string /restore1/oracle/SID/data

/SID_control01.ctl, /rest

ore1/oracle/SID/data/ATOP

RD01_control02.ctl, /restore1/

oracle/SID/data/SID_

control03.ctl

SQL> archive log list

ORA-00210: cannot open the specified control file

ORA-00202: control file: '/restore1/oracle/SID/data/SID_control01.ctl'

ORA-27041: unable to open file

IBM AIX RISC System/6000 Error: 2: No such file or directory

Additional information: 3

SQL> select * from v$log;

select * from v$log

ERROR at line 1:

ORA-00210: cannot open the specified control file

ORA-00202: control file:

'/restore1/oracle/SID/data/SID_control01.ctl'

ORA-27041: unable to open file

IBM AIX RISC System/6000 Error: 2: No such file or directory

Additional information: 3

SQL> select * from v$logfile;

select * from v$logfile

ERROR at line 1:

ORA-00210: cannot open the specified control file

ORA-00202: control file:

'/restore1/oracle/SID/data/SID_control01.ctl'

ORA-27041: unable to open file

IBM AIX RISC System/6000 Error: 2: No such file or directory

Additional information: 3

SQL> alter system backup controlfile to trace;

alter system backup controlfile to trace

ERROR at line 1:

ORA-02065: illegal option for ALTER SYSTEM

SQL> select * from v$database;

select * from v$database

ERROR at line 1:

ORA-00210: cannot open the specified control file

ORA-00202: control file:

'/restore1/oracle/SID/data/SID_control01.ctl'

ORA-27041: unable to open file

IBM AIX RISC System/6000 Error: 2: No such file or directory

Additional information: 3

3. After that shut down database using shut abort option.

SQL> shut abort

ORACLE instance shut down.

4. Below is the control file recreation script make sure change the resetlogs option to noresetlogs

(Because we are having all archives and on line redo logs for complete recovery).

cat cr8contrl.sql

CREATE CONTROLFILE REUSE DATABASE "SID" NORESETLOGS NOARCHIVELOG

MAXLOGFILES 16

MAXLOGMEMBERS 3

MAXDATAFILES 100

MAXINSTANCES 8

MAXLOGHISTORY 584

LOGFILE

GROUP 1 (

'/restore1/oracle/SID/data/SID_redo01a.log',

'/restore1/oracle/SID/data/SID_redo01b.log'

) SIZE 10M,

GROUP 2 (

'/restore1/oracle/SID/data/SID_redo02a.log',

'/restore1/oracle/SID/data/SID_redo02b.log'

) SIZE 10M

-- STANDBY LOGFILE

DATAFILE

'/restore1/oracle/SID/data/SID_system01.dbf',

'/restore1/oracle/SID/data/SID_undotbs01.dbf',

'/restore1/oracle/SID/data/SID_sysaux01.dbf',

'/restore1/oracle/SID/data/SID_users01.dbf',

'/restore1/oracle/SID/data/SID_autodata01.dbf',

'/restore1/oracle/SID/data/SID_autoindexes01.dbf'

CHARACTER SET US7ASCII

;

5. start the instance in nomount( this step is not required because startup nomount is part of controlfile recreation script).

SQL> startup nomount

ORACLE instance started.

Total System Global Area 838860800 bytes

Fixed Size 2074928 bytes

Variable Size 503318224 bytes

Database Buffers 327155712 bytes

Redo Buffers 6311936 bytes

6. run the controlfile recreatin script as below.

SQL>@cr8contrl.sql

Control file created.

7. Try to open the database without recovery will get the error message like datafile 1 is inconsistent.

SQL>alter database open;

alter database open

ERROR at line 1:

ORA-01113: file 1 needs media recovery

ORA-01110: data file 1: '/restore1/oracle/SID/data/SID_system01.dbf'

8. Recovery the database without using backup controlfile option.

SQL>recover database;

Media recovery completed.

9. Open the database without resetlogs;

SQL> alter database open;

Database altered.

Saturday, 26 May 2012

How to pin nodes in RAC

The following steps can be used to pin nodes in RAC.
Pinning a node means that the association of a node name with a node number is fixed. If a node is not pinned,node number may change if the lease expires while it is down. The lease of a pinned node never expires.
To find whether a node in a cluster is pinned,use the olsnodes command.
[grid@raclinux1 stage10g]$ olsnodes -t
raclinux1 Unpinned
raclinux2 Unpinned

Use the pin command to pin the nodes
[root@raclinux1 bin]# ./crsctl pin css -n raclinux1
CRS-4664: Node raclinux1 successfully pinned.
[root@raclinux1 bin]#

[root@raclinux2 bin]# ./crsctl pin css -n raclinux2
CRS-4664: Node raclinux2 successfully pinned.
[root@raclinux2 bin]#

[grid@raclinux1 stage10g]$ olsnodes -t
raclinux1 Pinned
raclinux2 Pinned

Thursday, 24 May 2012

Loss of redolog group

Recovering after loss of members of the INACTIVE redo log group

You’re attempting to open your database and receive this message

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo02.log’
ORA-00312: online log 2 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo02a.log’
The message indicates that two members of an online redo log group in your database have experienced a media failure.
To recover when you’ve lost all members of an inactive redo log group, perform the following steps

Verify that the log group status is INACTIVE.
SQL> select group#,members,status from v$log;
GROUP#    MEMBERS STATUS
———- ———- —————-
1                      2             CURRENT
2                      2             INACTIVE
3                      2             ACTIVE

Recreate the log group with the clear logfile command.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
(Here I manually deleted redolog files members of group 2)
SQL> startup
ORACLE instance started.
Total System Global Area 285212672 bytes
Fixed Size                  1218992 bytes
Variable Size              92276304 bytes
Database Buffers          188743680 bytes
Redo Buffers                2973696 bytes
Database mounted.

ORA-00313: open failed for members of log group 2 of thread 1

ORA-00312: online log 2 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo02.log’
ORA-00312: online log 2 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo02a.log’
If all members of an online redo log group are damaged, you won’t be able to open your database, oracle will allow you to only mount your database.
SQL> select status from v$instance;
STATUS
————
MOUNTED
SQL> select group#,status,archived,thread#,sequence# from v$log;
GROUP# STATUS           ARC    THREAD# SEQUENCE#
———- —————- — ———- ———-
1           CURRENT          NO           1         11
3           INACTIVE         YES          1         10
2           INACTIVE         YES          1          9
If the status is INACTIVE and ARCHIVED=YES,then this log group is no longer needed for crash recovery , therefore ,you can use the clear logfile command to recreate all members of a log group.
SQL> alter database clear logfile group 2;
Database altered.
if STATUS = INACTIVE and ARCHIVED = NO

sql> ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;

Recovering After loss of members of the CURRENT redo log group

All of the members of a current online redo log group in your database have experience media failure.
You lose all members of a current online redo log group. Here are possible reasons,

Perform an incomplete recovery up to the last good SCN.
If flashback is enabled, flash your database back to the last good SCN.
If you’re using Oracle data guard, fail over to your physical or logical standby database.

SQL> select group#,status,archived,thread#,sequence# from v$log;
GROUP# STATUS           ARC    THREAD# SEQUENCE#
———- —————- — ———- ———-
1               CURRENT          NO           1         17
2                INACTIVE         YES          1         15
3                INACTIVE         YES          1         16
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 285212672 bytes
Fixed Size                  1218992 bytes
Variable Size              92276304 bytes
Database Buffers          188743680 bytes
Redo Buffers                2973696 bytes
Database mounted.

ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo01.log’
ORA-00312: online log 1 thread 1:
‘/home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/redo01a.log’

SQL> select group#,status,archived,thread#,sequence# from v$log;

GROUP# STATUS           ARC    THREAD# SEQUENCE#
———- —————- — ———- ———-
1                CURRENT          NO           1         17
3                INACTIVE         YES          1         16
2                INACTIVE         YES          1         15
SQL> alter system checkpoint;
alter system checkpoint
*
ERROR at line 1:
ORA-01109: database not open
First determine the last good SCN by querying the first_change# column from v$log, you’re missing only the current online redo logs, therefore you can perform an incomplete recovery up to but not including,
SQL> select group#,status,archived,thread#,sequence#,first_change# from v$log;
GROUP# STATUS           ARC    THREAD# SEQUENCE# FIRST_CHANGE#
———- —————- — ———- ———- ————-
1             CURRENT          NO           1         17        574131
3 INACTIVE         YES          1         16        574125
2             INACTIVE         YES          1         15        574103
SQL>
In this case, you can restore and recover up to but not including, 574131
Operation :-
Losing all members of your current online redo log group is arguably the worst thing taht can happen to your database. if you experience media failure with all members of the current online redo group, then you will lose any transactions contained in those logs. in this case, you will have to perform incomplete recovery before you can open your database.
[oracle@localhost root]$ rlwrap rman target sys/sys
Recovery Manager: Release 10.2.0.1.0 – Production on Mon May 31 10:51:10 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: TEST (DBID=2015564632, not open)
RMAN> restore database until scn 574131;
Starting restore at 31-MAY-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=150 devtype=DISK
channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/system01.dbf
restoring datafile 00002 to /home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/undotbs01.dbf
restoring datafile 00003 to /home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/sysaux01.dbf
restoring datafile 00004 to /home/oracle/oracle/product/10.2.0/db_1/oradata/TEST/users01.dbf
channel ORA_DISK_1: reading from backup piece /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/backupset/2010_05_25/o1_mf_nnndf_TAG20100525T103307_5zpz9nft_.bkp
channel ORA_DISK_1: restored backup piece 1
piece handle=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/backupset/2010_05_25/o1_mf_nnndf_TAG20100525T103307_5zpz9nft_.bkp tag=TAG20100525T103307
channel ORA_DISK_1: restore complete, elapsed time: 00:01:37
Finished restore at 31-MAY-10
RMAN> recover database until scn 574131;
Starting recover at 31-MAY-10
using channel ORA_DISK_1
starting media recovery
archive log thread 1 sequence 7 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_25/o1_mf_1_7_5zpzh95q_.arc
archive log thread 1 sequence 8 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_8_606k8n7x_.arc
archive log thread 1 sequence 9 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_9_606lkvgy_.arc
archive log thread 1 sequence 10 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_10_606mfskg_.arc
archive log thread 1 sequence 11 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_11_606ndqvj_.arc
archive log thread 1 sequence 12 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_12_606szwxo_.arc
archive log thread 1 sequence 13 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_13_606t3zs8_.arc
archive log thread 1 sequence 14 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_14_606t777d_.arc
archive log thread 1 sequence 15 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_15_606t908r_.arc
archive log thread 1 sequence 16 is already on disk as file /home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_16_606t9jd0_.arc
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_25/o1_mf_1_7_5zpzh95q_.arc thread=1 sequence=7
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_8_606k8n7x_.arc thread=1 sequence=8
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_9_606lkvgy_.arc thread=1 sequence=9
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_10_606mfskg_.arc thread=1 sequence=10
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_11_606ndqvj_.arc thread=1 sequence=11
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_12_606szwxo_.arc thread=1 sequence=12
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_13_606t3zs8_.arc thread=1 sequence=13
archive log filename=/home/oracle/oracle/product/10.2.0/db_1/flash_recovery_area/TEST/archivelog/2010_05_31/o1_mf_1_14_606t777d_.arc thread=1 sequence=14
media recovery complete, elapsed time: 00:00:06
Finished recover at 31-MAY-10
RMAN> alter database open resetlogs;
database opened.
SQL> select status from v$instance;
STATUS
————
OPEN
SQL> select group#,members,status from v$log;
GROUP#    MEMBERS STATUS
———- ———- —————-
1                       2           INACTIVE
2           2          CURRENT
3         2          UNUSED

Recover from loss of active redo log file.
- shutdown the database

SHUTDOWN IMMEDIATE;

- restore the datafiles and proceed for an incomplete recovery until the missing log sequence
- mount the database

STARTUP MOUNT;

- check the status of the missing log file group from v$log view

select group#, sequence#, bytes, first_change#, to_char(first_time,’DD-MM-YY hh24:mi:ss’) tim, status from v$log;

- if STATUS = ACTIVE and ARCHIVED = NO

RECOVER DATABASE UNTIL TIME ‘yyyy-mm-dd:hh24:mi:ss’;

‘ss’ in the seconds should be 1 second less than the first_time of the missing redo log file. - open the database with resetlogs option

`1`	`ALTER` `DATABASE` `OPEN` `RESETLOGS;`

- take a solid backup

Monday, 21 May 2012

What happens during HOT BACKUP?

During hot backup, a script or begin backup command puts a tablespace into backup mode, then copies the datafiles to disk or tape, then takes the tablespace out of backup mode.

Many people are having a misconception regarding hot backup that, during hot backup DBWR process will stop writing into datafiles. Also, they say that while the datafiles are not writable, changes are stored somewhere in the SGA, the redologs, the rollback segments etc places and will be written back to datafiles when the tablespace is taken out of backup mode.

In fact, Oracle’s tablespace hot backup does not work this way at all. It absolutely does not stop writing the datafiles, and actually allows continued operation of the database almost exactly as during normal operation.

1. The tablespace is checkpointed 2. The checkpoint SCN in the datafile header freeze to increment with checkpoints 3. Full images of changed DB blocks are written to the redologs

The above three actions are required to guarantee consistency once the file is restored and recovery is applied.
By freezing the checkpoint SCN in the file headers, any subsequent recovery on that backup copy of the file will know that it must begins at that SCN. Having an old SCN in the file header tells recovery that the file is an old one, and that it should look for the redolog file containing that SCN, and apply recovery starting there.
Note that checkpoints to datafiles in hot backup mode are not stopped during the backup, only the incrementing of the main checkpoint SCN flag will stop increasing.

Also we will observe that huge redo will be generated during hot backup. This is the result of the logging of full images of changed blocks in these tablespaces to the redologs.Normally, Oracle logs an entry in the redologs for every change in the database, but it does not log the whole image of the database block. By logging full images of changed DB blocks to the redologs during backup mode, Oracle eliminates the possibility of the backup containing fractured blocks.

Typically, Oracle database blocks are a multiple of O/S blocks. For instance, most Unix filesystems have a default block size of 4k, while Oracle’s default block size is 8k. This means that the filesystem stores data in 4k chunks, while Oracle performs reads and writes in 8k chunks, or multiples. While backing up a datafile, your backup script makes a copy of the datafile from the filesystem, using O/S utilities such as copy, dd etc. As it is making this copy, it is reading in O/S-block sized increments. If the database writer happens to be writing a DB block into the datafile at the same time when backup is reading that block, your backup copy of the DB block could contain some O/S blocks before and after writing by DBWR. This kind of blocks which are having mismatched halves are called fractured blocks.

By logging the full block image of the changed block to the redologs, it guarantees that in the event of a recovery, any fractured that might be in the backup copy of the datafile will be resolved by replacing them with the full image of the block from the redologs.

ALTER SYSTEM SWITCH LOGFILE vs ALTER SYSTEM ARCHIVELOG CURRENT

 Both ALTER SYSTEM SWITCH LOGFILE(Asynchronous) and ALTER SYSTEM ARCHIVELOG

 CURRENT(Synchronous) will force a log switch, but they do it in different

 ways!

The 'ALTER SYSTEM ARCHIVELOG CURRENT' is the preferred one that should be
used for backup scripts.

The reason being,when you do an 'ALTER SYSTEM SWITCH
LOGFILE', you get the prompt back immediately. This command performs a
database checkpoint, switches to the next log, In background signals the

archiver to copy the logfile to the archive destination, and returns the

prompt. Mind you,this command does not wait for the archive to complete.

As the command indicates, it only performs a log switch. Hence, you might

come across backup scripts which does an 'ALTER SYSTEM SWITCH LOGFILE' and

then does a SLEEP for some time, thus allocating some time for the Archiver

to finish archiving.

Whereas, an 'ALTER SYSTEM ARCHIVE LOG CURRENT' does all of the
above, but does not return back to the prompt until the archive is
complete.This command is safer because it waits for the OS to acknowledge(ACK)

that the redo log has been successfully written.

Also, 'ALTER SYSTEM SWITCH LOGFILE' archives only the current thread.
Whereas the 'ALTER SYSTEM ARCHIVELOG CURRENT' needs a thread to be
specified and If you miss the thread parameter, Oracle archives all redo
log file groups from all enabled threads, including logs previous to
current logs.

Friday, 11 May 2012

Adding a voting disk and ocr file online

In 10.2 RAC it is now possible to add a voting disk online in addition to specifying more than one voting disk during the ClusterWare install.In 10G R1 there was no software level mirroring of the voting disk.

I tried this out for the first time today and things didn't go as I expected.

[root@dbrac1 ~]# /home/oracle/oracle/product/10.2.0/crs/bin/crsctl add css votedisk /dev/raw/raw4
Cluster is not in a ready state for online disk addition
You have to use the force option in the above command due to
Bug 4898020 ADDING VOTING DISK ONLINE CRASH THE CRS
which is fixed in the 10.2.0.4 patchset. This is again another case where clearly documented functionality in 10.2 does not work.

Before using the force option make sure the CRS stack is not up on any of the nodes.

The Cluster Ready Services Stack

The list in this section describes the processes that comprise CRS. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.

Cluster Ready Services (CRS): The primary program for managing high availability operations in a cluster.
The CRS daemon (crsd) manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes. When you have Oracle RAC installed, the crsd process monitors the Oracle database instance, listener, and so on, and automatically restarts these components when a failure occurs.
Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interface with your clusterware to manage node membership information.
The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure may result in Oracle Clusterware restarting the node.
Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.
Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle Clusterware.
Event Management (EVM): A background process that publishes events that Oracle Clusterware creates.
Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application Notification (FAN) events.
Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g release 1 (11.1).
Oracle Root Agent (orarootagent): A specialized oraagent process that helps crsd manage resources owned by root, such as the network, and the Grid virtual IP address.

The Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Notification Services (ONS) components communicate with other cluster component layers on other nodes in the same cluster database environment. These components are also the main communication links between Oracle Database, applications, and the Oracle Clusterware high availability components. In addition, these background processes monitor and manage database operations.

[root@dbrac1 init.d]# ps -ef | grep d.bin | grep -v grep
should return no rows.

[root@dbrac1 init.d]# /home/oracle/oracle/product/10.2.0/crs/bin/crsctl add css votedisk /dev/raw/raw4 -force
Now formatting voting disk: /dev/raw/raw4
successful addition of votedisk /dev/raw/raw4.
[root@dbrac1 init.d]#

Check that the newly added disk can be seen from both nodes.

[root@dbrac1 init.d]# /home/oracle/oracle/product/10.2.0/crs/bin/crsctl query css votedisk
0. 0 /dev/raw/raw2
1. 0 /dev/raw/raw4

located 2 votedisk(s).

[root@dbrac2 init.d]# /home/oracle/oracle/product/10.2.0/crs/bin/crsctl query css votedisk
0. 0 /dev/raw/raw2
1. 0 /dev/raw/raw4

located 2 votedisk(s).

The same error is encountered should you wish to delete the newly added disk

[root@dbrac1 bin]# ./crsctl delete css votedisk /dev/raw/raw4
Cluster is not in a ready state for online disk removal
[root@dbrac1 bin]# ./crsctl delete css votedisk /dev/raw/raw4 -force
successful deletion of votedisk /dev/raw/raw4.

Again please note that the force option should be used only after the CRS stack
is shutdown on all nodes. Failure to do so could result in OCR corruption.

Adding OCR

From 10g R2 we can multiplex the ocr

Adding OCR device again (CRS was online)
[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1 or /dev/sda1
                                                                                (destination_file or disk)
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       : 744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

You can have at most 2 OCR devices (OCR itself and its single Mirror) in a cluster. Adding extra Mirror gives you below error message

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter

Changing the location of an existing OCR

1. [root@node1-pub ~]# ocrconfig -replace ocr destination_file
2.[root@node1-pub ~]# ocrconfig -replace ocr disk

Saturday, 5 May 2012

How to restore the lost archivelogs before it is shipped to the Standby

Today ,the day following the weekend we realised all the archivelogs are deleted before being shipped to the standby
We usually keep only 2 days archivelogs so that to avoid asm diskgroup from getting exhausted.
The script which deletes the archivelog is
ORACLE_HOME=/oracle/app/oracle/product/10.2.0;export ORACLE_HOME
ORACLE_SID=;export ORACLE_SID
/oracle/app/oracle/product/10.2.0/bin/rman target / << EOF
delete archivelog until time ‘trunc(sysdate)-2′;
y
EOF
today on monday i realised that archivelogs genareated were not shipped to physical standby because we had not enabled the dataguard at weekends.So we detected the archive log gap from V$archive_gaps
select * from gv$archive_gaps;
Inst_id thread# sequence# gap
1 5630-5645
2 4512-4536
The above command on standby database gives us the missing archive log sequence from the nodes
the archive logs are restored on the primary using rman as follows
rman target /
restore archivelog from logseq 5630 until logseq 5645 thread 1;
restore archivelog from logseq 4512 until logseq 4536 thread 2;
after restoring the archive logs start the shipping of archive logs as follows:
alter system set log_archive_dest_state_2=enable;
& it was observed things were back to normal. also the alert log of the standby gives the indication of missing archivelogs
from rman u can verify the archivelogs are present at the primary or not?
RMAN> list archivelog from logseq 5630

Monday, 16 April 2012

Rman format specifications

Syntax Element	Description
`%a`	Specifies the activation ID of the database.
`%c`	Specifies the copy number of the backup piece within a set of duplexed backup pieces. If you did not duplex a backup, then this variable is 1 for backup sets and 0 for proxy copies. If one of these commands is enabled, then the variable shows the copy number. The maximum value for `%c` is 256.
`%d`	Specifies the name of the database.
`%D`	Specifies the current day of the month from the Gregorian calendar in format `DD`.
`%e`	Specifies the archived log sequence number.
`%f`	Specifies the absolute file number.
`%F`	Combines the DBID, day, month, year, and sequence into a unique and repeatable generated name. This variable translates into `c-IIIIIIIIII-YYYYMMDD-QQ`, where: `IIIIIIIIII` stands for the DBID. The DBID is printed in decimal so that it can be easily associated with the target database. `YYYYMMDD` is a time stamp in the Gregorian calendar of the day the backup is generated `QQ` is the sequence in hexadecimal number that starts with 00 and has a maximum of 'FF' (256)
`%h`	Specifies the archived redo log thread number.
`%I`	Specifies the DBID.
`%M`	Specifies the month in the Gregorian calendar in format `MM`.
`%N`	Specifies the tablespace name.
`%n`	Specifies the name of the database, padded on the right with `x` characters to a total length of eight characters. For example, if the `prod1` is the database name, then the padded name is `prod1xxx`.
`%p`	Specifies the piece number within the backup set. This value starts at `1` for each backup set and is incremented by 1 as each backup piece is created. Note: If you specify `PROXY`, then the `%p` variable must be included in the `FORMAT` string either explicitly or implicitly within `%U`.
`%s`	Specifies the backup set number. This number is a counter in the control file that is incremented for each backup set. The counter value starts at `1` and is unique for the lifetime of the control file. If you restore a backup control file, then duplicate values can result. Also, `CREATE` `CONTROLFILE` initializes the counter back to `1`.
`%t`	Specifies the backup set time stamp, which is a 4-byte value derived as the number of seconds elapsed since a fixed reference time. The combination of `%s` and `%t` can be used to form a unique name for the backup set.
`%T`	Specifies the year, month, and day in the Gregorian calendar in this format: `YYYYMMDD`.
`%u`	Specifies an 8-character name constituted by compressed representations of the backup set or image copy number and the time the backup set or image copy was created.
`%U`	Specifies a system-generated unique filename (default). The meaning of `%U` is different for image copies and backup pieces. For a backup piece, `%U` specifies a convenient shorthand for `%u_%p_%c` that guarantees uniqueness in generated backup filenames. If you do not specify a format when making a backup, then RMAN uses `%U` by default. For an image copy of a datafile, `%U` means the following: `data-D-%d_id-%I_TS-%N_FNO-%f_%u` For an image copy of an archived redo velog, %U means the following: `arch-D_%d-id-%I_S-%e_T-%h_A-%a_%u` For an image copy of a control file, %U means the following: `cf-D_%d-id-%I_%u`
`%Y`	Specifies the year in this format: `YYYY`.
`%%`	Specifies the literal `'%'` character. For example, `%%Y` translates to the string `%Y`.

Comparing of RMAN Backup Compression Levels

There are different compression levels with Oracle 11g R2. BASIC, LOW, MEDIUM and HIGH are the four different compression levels. We must have “Advanced Compression” option license to use LOW, MEDIUM and HIGH levels of compression. In this article, I will do the tests at 4 compression levels. We will compare compression levels for backup duration and backup size.
So let’s just our test.
I wrote a shell script like the following. Thus, you can watch your spare time.

# vi rman_compression_test.sh

Add following lines to rman_compression_test.sh script and save it.

echo “RMAN Backup Start Date :” `date ‘+%d.%m.%Y %H:%M:%S’`
StartTime=$(date +%s)
export NLS_LANG=AMERICAN export NLS_DATE_FORMAT=’DD-MON-YYYY HH24:MI:SS’
rman target / << EOSQL
backup as compressed backupset database;
EOSQL
EndTime=$(date +%s)
DiffTime=$(( $EndTime – $StartTime ))
echo “RMAN Backup Finished.”
echo “Backup End Date :” `date ‘+%d.%m.%Y %H:%M:%S’`
echo “RMAN Backup Duration :” $DiffTime

Let us set the location of our backup files.

# rman target /
RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT ‘/oracle/yedek/bck_test11g/%U’;

1- BASIC compression level test. let’s see our recent compression level.

# rman target /
RMAN> SHOW COMPRESSION ALGORITHM ;
RMAN configuration parameters for database with db_unique_name DBARGE are:
CONFIGURE COMPRESSION ALGORITHM ‘BASIC’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;

We are using BASIC level. Now let’s back up of our 10GB database as compressed.

# . rman_compression_test.sh
RMAN Backup Start Date : 13.03.2012 16:19:33
…..
Recovery Manager complete.
RMAN Backup Finished.
Backup End Date : 13.03.2012 16:26:32
RMAN Backup Duration : 419

The load average was during the backup:
# top load average: 1.12, 0.82, 0.75
Backup Size: 636M
2- LOW compression level test.

# rman target /
RMAN> CONFIGURE COMPRESSION ALGORITHM ‘LOW’;
old RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘BASIC’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘LOW’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters are successfully stored

Now we are using the LOW level. let’s start backup.

# . rman_compression_test.sh
RMAN Backup Start Date : 13.03.2012 16:30:36
…..
Recovery Manager complete.
RMAN Backup Finished.
Backup End Date : 13.03.2012 16:33:45 RMAN
Backup Duration : 189

The load average was during the backup:
# top load average: 1.34, 0.85, 0.74
Backup Size: 797M
3- MEDIUM compression level test.

# rman target /
RMAN> CONFIGURE COMPRESSION ALGORITHM ‘MEDIUM’;
old RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘LOW’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘MEDIUM’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters are successfully stored

Now we are using the MEDIUM level. let’s start backup.

# . rman_compression_test.sh
RMAN Backup Start Date : 13.03.2012 16:36:21
…..
Recovery Manager complete.
RMAN Backup Finished.
Backup End Date : 13.03.2012 16:40:19
RMAN Backup Duration : 238

The load average was during the backup:
# top load average: 1.38, 0.93, 0.77
Backup Size: 674M
4- HIGH compression level test.

# rman target /
RMAN> CONFIGURE COMPRESSION ALGORITHM ‘HIGH’;
old RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘MEDIUM’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM ‘HIGH’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters are successfully stored

Now we are using the HIGH level. let’s start backup.

# . rman_compression_test.sh
RMAN Backup Start Date : 13.03.2012 16:42:21
…..
Recovery Manager complete.
RMAN Backup Finished.
Backup End Date : 13.03.2012 17:34:30
RMAN Backup Duration : 3129

The load average was during the backup:
# top load average: 1.20, 1.07, 0.88
Backup Size: 485M
5- Normal backup test. Change content of our rman_compression_test.sh script with following lines

# vi rman_compression_test.sh
echo “RMAN Backup Start Date :” `date ‘+%d.%m.%Y %H:%M:%S’`
StartTime=$(date +%s)
export NLS_LANG=AMERICAN export NLS_DATE_FORMAT=’DD-MON-YYYY HH24:MI:SS’
rman target / << EOSQL
backup as backupset database;
EOSQL
EndTime=$(date +%s)
DiffTime=$(( $EndTime – $StartTime ))
echo “RMAN Backup Finished.”
echo “Backup End Date :” `date ‘+%d.%m.%Y %H:%M:%S’`
echo “RMAN Backup Duration :” $DiffTime

Execute script.

#. rman_compression_test.sh
RMAN Backup Start Date : 13.03.2012 17:37:51
…..
Recovery Manager complete.
RMAN Backup Finished.
Backup End Date : 13.03.2012 17:42:30
RMAN Backup Duration : 279

The load average was during the backup:
# top load average: 2.42, 1.56, 1.17
Backup Size: 4.0G

Compression Level	Backup Size	Backup Duration
NORMAL	4.0G	279 sec
BASIC	636M	419 sec
LOW	797M	189 sec
MEDIUM	674M	238 sec
HIGH	485M	3129 sec

Thursday, 12 April 2012

Suspending and Resuming a Database

The ALTER SYSTEM SUSPEND statement suspends a database by halting all input and output (I/O) to datafiles (file header and file data) and control files, thus allowing a database to be backed up without I/O interference. When the database is suspended all preexisting I/O operations are allowed to complete and any new database accesses are placed in a queued state.
The suspend command suspends the database, and is not specific to an instance. Therefore, in an Oracle Real Application Clusters environment, if the suspend command is entered on one system, then internal locking mechanisms will propagate the halt request across instances, thereby quiescing all active instances in a given cluster. However, do not start a new instance while you suspend another instance, since the new instance will not be suspended.
Use the ALTER SYSTEM RESUME statement to resume normal database operations. You can specify the SUSPEND and RESUME from different instances. For example, if instances 1, 2, and 3 are running, and you issue an ALTER SYSTEM SUSPEND statement from instance 1, then you can issue a RESUME from instance 1, 2, or 3 with the same effect.
The suspend/resume feature is useful in systems that allow you to mirror a disk or file and then split the mirror, providing an alternative backup and restore solution. If you use a system that is unable to split a mirrored disk from an existing database while writes are occurring, then you can use the suspend/resume feature to facilitate the split.
The suspend/resume feature is not a suitable substitute for normal shutdown operations, however, since copies of a suspended database can contain uncommitted updates.
Do not use the ALTER SYSTEM SUSPEND statement as a substitute for placing a tablespace in hot backup mode. Precede any database suspend operation by an ALTER TABLESPACE BEGIN BACKUP statement.
The following statements illustrate ALTER SYSTEM SUSPEND/RESUME usage. The V$INSTANCE view is queried to confirm database status.
SQL> ALTER SYSTEM SUSPEND;
System altered
SQL> SELECT DATABASE_STATUS FROM V$INSTANCE;
DATABASE_STATUS
———
SUSPENDED
SQL> ALTER SYSTEM RESUME;
System altered
SQL> SELECT DATABASE_STATUS FROM V$INSTANCE;
DATABASE_STATUS
———
ACTIVE

Quiescing a Database

Quiesce is used to describe pausing or altering the state of running processes on a computer, particularly those that might modify information stored on disk during a backup, in order to guarantee a consistent and usable backup.
There are times when a DBA needs to perform work on the database that requires no other ACTIVE sessions. This is very common in development databases, where a DBA tries to run a package or a modify a table when the application is till running.
There are also frequent occurrences where too much IT support folks time is spent in bringing lot of application servers down to perform a deployment or some database maintenance. That is where Quiescing comes in.
To put a database(run on one instance and it affects all the instances of the RAC database).

ALTER SYSTEM QUIESCE RESTRICTED;

Quiescing puts the database in mode that does not allow any further connections to become ACTIVE state. It would however wait for all ACTIVE connections to become INACTIVE or terminate. Gladly Oracle provides a query to find sessions which are stopping it from enter into QUIESCE mode.

select bq.sid, ss.user, ss.osuser, ss.type, ss.program
from v$blocking_quiesce bq, v$session ss
where bq.sid = ss.sid;

Note:

In this discussion of quiesce database, a DBA is defined as user SYS or SYSTEM. Other users, including those with the DBA role, are not allowed to issue the ALTER SYSTEM QUIESCE DATABASE statement or proceed after the database is quiesced.

Once in QUIESCE mode, DBA connections are the only ones that are allowed to be in ACTIVE state. The DBA can then easily lock any object to perform his tasks.

Once done to release the QUIESCE

ALTER SYSTEM UNQUIESCE;
You can query V$INSTANCE view to see active state .

SQL> select active_state from v$instance;

You can see 3 situation. NORMAL – Database is normal state (unquiescing). QUIESCING – Being quiesced, but some non-DBA sessions are still active. There are active sessions QUIESCED – Quiesced. There is no non-DBA sessions are active.

Total Pageviews

Friday, 7 September 2012

Friday, 31 August 2012

Tuesday, 31 July 2012

Wednesday, 6 June 2012

Saturday, 2 June 2012

LOSS of all controlfiles (Complete recovery and No need to open the database with resetlogs after recreating the controlfile manually)

Saturday, 26 May 2012

Thursday, 24 May 2012

Recovering after loss of members of the INACTIVE redo log group

Recovering After loss of members of the CURRENT redo log group

Monday, 21 May 2012

Friday, 11 May 2012

The Cluster Ready Services Stack

Saturday, 5 May 2012

Monday, 16 April 2012

Thursday, 12 April 2012