Tuesday, September 23, 2008

Troubleshooting ORA-16191 : Primary log shipping client not logged on standby

Tonight, while I was on call, I received an email alert for error ORA-16191. I searched the internet right away and found this:

---
ORA-16191: Primary log shipping client not logged on standby
Cause: An attempt to ship redo to standby without logging on to standby or with invalid user credentials.
Action: Check that primary and standby are using password files and that both primary and standby have the same SYS password. Restart primary and/or standby after ensuring that password file is accessible and REMOTE_LOGIN_PASSWORDFILE initialization parameter is set to SHARED or EXCLUSIVE
---

I checked the alert log of the primary database and found that the errors started at 01:18:




Tue Sep 23 01:18:15 2008
Error 1017 received logging on to the standby
------------------------------------------------------------
ORA-16191: Primary log shipping client not logged on standby
PING[ARC3]: Heartbeat failed to connect to standby 'PS4008A.world'. Error is 16191.
Tue Sep 23 01:23:16 2008
Error 1031 received logging on to the standby
Tue Sep 23 01:23:16 2008
Errors in file /logs/ORACLE/MYDBNAME/bdump/pphi08a_arc3_2846.trc:
ORA-01031: insufficient privileges
PING[ARC3]: Heartbeat failed to connect to standby 'PS4008A.world'. Error is 1031.

Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
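
On a busy primary the alert log can be long, so a small script makes it easier to see when these errors began. The sketch below is only an illustration: the function name scan_alert is my own, and it assumes the alert log layout shown above, where a bare timestamp line precedes the messages it applies to. It prints each distinct timestamp under which ORA-16191 or the related logon errors (1017, 1031) appear.

```shell
#!/bin/sh
# Hypothetical helper: list the alert-log timestamps under which
# ORA-16191 or the "Error 1017/1031 received logging on to the standby"
# messages appear. Assumes the layout shown above: a bare timestamp line
# such as "Tue Sep 23 01:18:15 2008" precedes the messages it covers.
scan_alert() {
    awk '
        # Remember the most recent timestamp line (Dow Mon DD HH:MM:SS YYYY).
        /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9] *$/ {
            ts = $0
            next
        }
        # Print that timestamp once for each block containing a matching error.
        /ORA-16191|Error 1017|Error 1031/ {
            if (ts != last) {
                print ts
                last = ts
            }
        }
    ' "$1"
}
```

Run it as, for example, scan_alert /path/to/bdump/alert_SID.log (substitute your own alert log path); the first line printed is the time the problem started.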



I also checked the alert log of the standby database and found that someone had apparently shut down and restarted the standby database:



-----
Tue Sep 23 02:00:33 2008
Physical Standby Database mounted.
Completed: alter database mount standby database
Tue Sep 23 02:01:26 2008
alter database recover managed standby database parallel 16 disconnect
---



The start time of the standby's PMON process also showed that the instance had just been started tonight:



$ ps -ef | grep pmon
oracle 23059 1 0 01:59:26 ? 0:00 ora_pmon_PS4008A



Another team member explained that we change the SYS password every two or three months, but when we change it on the primary database, we do not change it on the standby database. So after the standby database was bounced, the primary tried to connect to the standby using the new password, while the password file on the standby site still contained the old one. That is why I saw the error.

Based on what he said and what I observed, it appears that the primary keeps a "connection" to the standby (used to ship redo) that authenticates as SYS with the password stored in the password files, so the two password files must be in sync for that connection to succeed. When we changed the SYS password on the primary only, the password files fell out of sync. The existing connection kept working, but when the standby was bounced, the connection had to be re-established, and it failed because the passwords no longer matched.

I therefore used the orapwd utility to create a new password file on the standby site with the current SYS password. That resolved the problem, and no further alerts were received.
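
For reference, here is a minimal sketch of that step. It assumes a Unix standby whose password file sits in the default location, $ORACLE_HOME/dbs/orapw followed by the SID. The function name recreate_pwfile and the ORAPWD override variable are hypothetical, added only so the command can be dry-run with ORAPWD=echo. The force=y argument tells orapwd to overwrite the existing file, and the old file is kept as a .bak copy.

```shell
#!/bin/sh
# Hypothetical helper: recreate the standby password file so that it holds
# the same SYS password as the primary.
# Assumes ORACLE_HOME and ORACLE_SID are set for the standby instance and
# the password file lives in the default Unix location.
recreate_pwfile() {
    new_sys_pw=$1
    : "${ORACLE_HOME:?ORACLE_HOME is not set}"
    : "${ORACLE_SID:?ORACLE_SID is not set}"
    pwfile="$ORACLE_HOME/dbs/orapw$ORACLE_SID"

    # Keep the old file so the change can be rolled back.
    if [ -f "$pwfile" ]; then
        cp -p "$pwfile" "$pwfile.bak"
    fi

    # force=y lets orapwd overwrite the existing password file.
    # ORAPWD is a hypothetical override (e.g. ORAPWD=echo for a dry run).
    "${ORAPWD:-orapwd}" file="$pwfile" password="$new_sys_pw" force=y
}
```

The password passed in must be the SYS password currently in use on the primary. Per the Action text quoted at the top, the standby (and possibly the primary) may need a restart once the new file is in place. Note also that a password given on the command line is visible in the process list and shell history.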
