Scenario is this, a physical DB server is showing intermittent controller errors, our server admin is nervous (as am I) as this server is home to one of our more critical databases. Prepped another server and installed OS and 10.2.0.4, exact same platform as current production (I wanted to use the opportunity to go to 11G but vendor is not ready)
Switch over is this Saturday, I've already test restored a (user managed) online backup to the new server,rebuilt the controlfile, and applied available redo to it. I also took the opportunity to move some datafiles and the redo logs to different locations and multiplex the controlfiles better.
My plan is this, the day before the switch over, restore latest online backup from prod to new server, go through the usual controlfile rebuild and put in recovery mode and apply all available redo, then leaving in recovery mode. On switch day, bounce the prod db, startup restrict, switch logfile and shutdown. Copy latest archivelogs to new machine and apply, then open resetlogs. I'm thinking this is the fastest way with minimal downtime.
Just curious how you (or others) would approach this. I actually have time to do a direct datafile copy from server to server, but want to see how fast I can make this happen.
Actually, I think that’s pretty much the textbook scenario. You could glorify it up a little: create standby database, ship redo, perform transition from primary-to-secondary, and from secondary-to-primary, but that is practically what you’re doing anyway. (Just check that if you aren’t using a specially-constructed standby controlfile that it will let you perform multiple recoveries: something lurking at the back of my brain tells me that a non-standby controlfile won’t resume recovery after an initial recovery has completed, even though you didn’t say ‘open resetlogs’ at the end of that initial recovery… I could be wrong, though. My memory isn’t what it used to be!! Maybe you’ll have to re-copy the production controlfile to trigger further redo application?)
But in principle, a standby scenario would definitely be the fastest way I’d know how to do it, anyway, short of using things like mirrored LUNs on a SAN, for example.
I will be rebuilding the control file to reflect the new datafile and redo log locations, then opening with resetlogs. The old prod database will be shut down at this point, and should I need ever to go back to an earlier time than the cutover I would be accessing the older production db.
I was thinking of copying the controlfiles over as well and moving the datafile locations from within the database in mount stage rather than rebuild controlfile, but I thought this as a 6 of one / 1/2 doz of the other type of scenario, and decided to just rebuild the controlfile.
basically this is how it will play out
day before cutover:
shutdown new prod database (currently in place to test app against)
copy datafiles from latest online backup of old prod database to new server (database)
issue startup nomount of new prod database
issue @control.sql (my script to rebuild the control file with the new datafile & redo log locations)
issue recover database until cancel using backup controlfile;
let recover apply all available logs and leave in recovery state
day of cutover:
shutdown old prod database and startup restrict
issue alter system switch logfile;
shutdown old production database and copy all archived logs since the last one that was applied on day before
finish applying the rest of the archived logs
issue alter database open resetlogs;
take offline (or online, depending on time restriction) backup of new database
change tns entry in OID to reflect new location
startup new database
Only thing I was wondering, if one alter system switch logfile is sufficient before I shut the old one down- if memory serves it should be, I’ll create a test table and last thing I do before issuing logfile switch is insert a row in it then query it on the new one, just to be sure.
Hmm. I’m not sure about this. Have you tested it?
My concern is that when you use the ‘create controlfile’ script to rebuild a controlfile, it cheats: it goes out to the datafiles, checks their header SCNs and says “right, I’ll build the controlfile to agree with the highest SCN found’. If all datafiles agree on that SCN, then no recovery is necessary… and you end up with a consistent database when what you want is for it to look inconsistent so that redo can be applied.
Now, I think that if you were to copy the oldprod online redo logs to the new location before you issued the ‘create controlfile’ command, that would be sufficient to trigger the application of the last bit of online redo (because then the highest SCN would be that recorded in the oldprod’s current redo log). Obviously, you can only do the copy of the online redo logs once the oldprod’s been shut down.
I am still a bit concerned that your plan seems to involve applying redo from the archived redo logs to a database that’s already been recovered, too. Maybe that’s fine… but I am still concerned that once a ‘recover database’ is finished, that’s it, even if more archives are subsequently produced from the source database. Only in standby databases do you get a chance to apply the redo in the archives generated after that initial ‘recover database’, I think. But I could be wrong on that.
Sorry: I am picking at your plan without having good answers to your question. If I get the chance to cook up a test database, I’ll try this out for you -but since your switchover is Saturday, I suspect you’ll know the answers before I get a chance to work them out!
Oh, incidentally: recover using backup controlfile would be wrong. What you’re proposing involves creating a nice, minty-fresh new controlfile. It’s definitely not a backup controlfile (which would imply a binary image of a controlfile restored from backup).
My gut feel is that you have datafiles restored from backup (let’s call that time B). You need to copy all archives from oldprod to newprod generated after time B. You shutdown oldprod at, let us say, time S. You copy across the controlfile and online redo logs as at time S to newprod. You then mount the newprod database and issue a simple ‘recover database’: it should prompt for the application of redo from time B until time S, prompting for archives around time B+1, B+2 and so on, and then automatically switching to the online logs for times around S-1 and S.
I can’t see that you need a resetlogs at all (you’re presumably hoping for a complete recovery, not an incomplete one?). You just need to force the application of redo from the time you did your last oldprod backup until the time of the last bit of redo contained in oldprod’s online logs -and you’d do that by copying across controlfiles+online redo, so that they agree on the SCN to be reached, which will be later than the SCN contained in the headers of your datafiles.
Hmm, you have me thinking now, which is never good this late at night when I’m away on course (intro to SQLServer 2008 for the Oracle DBA) and just finished a meal of Guiness meatloaf washed down by a couple of pints of same.
Anyway, what I am hoping to do here is just do what I always do when creating a clone database for dev or test from a production database, only instead of stopping at whatever archivelogs I have copied over, this time I’ll apply every last archivelog from the source database until the time it was shut down. I will not take the new database out of recovery mode until I have copied the very last archivelog from the old (now shut down) database. Only then will I issue an alter database open resetlogs command.
The process I always do (for cloning) is this, copy over data files (except temp) from latest user managed online backup to new location and/or server. Because it is from an online backup, online redo logs are not copied across. I then do a startup nomount on the new machine (already pre created the service, pfile and file locations etc) run a script that I built from a “backup controlfile to trace” command on the source database, (change a few things like file locations, set database , resetlogs etc.) Once the new controlfile is created I then issue my “recover database until cancel using backup controlfile” command. I have to use the “using backup controlfile”, if I don’t I get a scolding message back saying I have to. Then I am prompted to apply archived logs, by default it is looking in the path/location from the original prod database, so I either make sure I have created the same drive/folder path on the new machine or enter in new location where I have copied the archived logs to from the prod machine. After I have applied all the logs I copied, I then issue alter database open resetlogs, which of course makes this a brand new database starting at square 1 (no recovery to an earlier point in time possible), which is ok because it’s a test/dev database, or in this case, the new prod database.
I usually do this in one fell swoop, but in this case to save time I’ll start the recovery process the day before so I’m not waiting for the datafile copy, then leave it in recovery mode waiting to apply the rest of the archivelogs, then all I need to copy over are the archived logs starting from where I left off.
I used to refresh dev/test databases with schema imports, but found this method much faster.
I was thinking of this time just copying cold datafiles, redo logs and controlfiles across to an identical directory structure, but in the end doing it the above way has the same result, and exact replica of the prod database in a new server.
The one thing in the back of my mind is, how many log switches I need to do to make sure ALL of the latest data in the old prod database resides in the archived logs I’m going to be copying to the new machine.
Using backup controlfile is never necessary unless the SCN of the controlfile is to be ignored by the recovery process (because it’s wrong, because it’s been restored). A freshly-created controlfile (‘create controlfile…’) will never require it.
Resetlogs is only ever required if incomplete recovery is performed. In your scenario, there is no actual need to perform an incomplete recovery, though if that’s what you’re comfortable with as a result of performing this procedure many times, fair enough. I’m just saying, I’d be doing a complete recovery and no resetlogs would be required (which might help, for example, with the RMAN stuff).
If your only remaining doubt is how many log switches you require to ensure all online redo is in the archives, the answer is either ’1′ or ‘infinite’: the point being that a database which is not open read-only is, theoretically, going to write something into the current online log, and a log switch only means that it’s a different current log from the one you were writing in before. But something is always current… and you can’t have an archive that contains the contents of the current log.
However, if you are in restricted session mode, as you propose, then it is conceivable that one log switch will suffice. That is, the contents of the new current log won’t be of significance -provided you are prepared to do an incomplete recovery elsewhere, which you seem to be.
I see what you are getting at, when I get back to my workplace tomorrow I will have time to test a couple of these scenarios before the cutover. Since my regular cloning process has not necessitated a complete recovery, as I just wanted a “close to current” copy of the database, incomplete recovery’s have sufficed.
So, instead of my first line of my control.sql reading like this:
CREATE CONTROLFILE set DATABASE “IVARPROD” RESETLOGS ARCHIVELOG
I should be changing the RESETLOGS to NORESETLOGS, then issuing a simple “recover database” command after making sure all my archived redo is in place. I’ll test that as soon as I get a chance.
Yeah, I think that’d be the way I’d go, anyway. Let’s hope it works out for you, I guess!
Best Answer
I’ve now tested this, and this approach will work.
I had a database running on one server, inserting records every 20 seconds, log switching and archiving every 10 minutes. I put each tablespace in manual backup mode (I could have used RMAN for this, but either approach will achieve identical outcomes), and copied the datafiles across to a second server (on which Oracle had been installed but no database created). I spent a couple of hours copying those files across before the entire set was complete. I then didn’t do anything for several hours (this is the equivalent of copying online backups of datafiles across to a new server on Monday and Tuesday but not then doing anything else until the end of the week!)
I then copied across the 500+ archive logs (the database was still up and running on ServerA at this point, of course). This meant creating the correct sub-directories in the flash_recovery_area over on ServerB beforehand. I also copied over my spfile and password file at this point.
So now I had an inconsistent set of datafiles and all archives sitting on ServerB.
I then did a alter database backup controlfile to trace as ‘/home/oracle/controlfile.trc’ noresetlogs over on ServerA, and I copied that trace file over to ServerB. I editied the file there so that the only commands it contained were the ones about ‘create controlfile … noresetlogs archivelog’. All the stuff about recovering the database and creating a TEMP tablespace, I removed.
Now I moved all my datafiles on ServerB so that they lived in equivalent directories as they did originally on ServerA. I then tried to run my ‘create controlfile’ script… but this failed: the error message explained that it couldn’t identify the online redo logs (which are mentioned in the create controlfile syntax), so it wasn’t going to try creating a controlfile after all.
So, I went back to ServerA (the database is still up at this time) and copied across the *online* redo logs to ServerB. This is, of course, a complete no-no, because a hot-copied online redo log will be internally corrupt: but I had no intention of using the online logs for anything except just ‘being there’, so this wasn’t a problem.
With the online log copies in place over on ServerB, I re-ran my ‘create controlfile’ script, and this time it was fine: we got to the MOUNT state without a drama… at which point, I shut the database down and deleted the online redo logs (because, as hot-copies, they are unreliable anyway).
That done, I again mounted my database on ServerB and issued a ‘recover database’ command, followed by ‘auto’: the recovery process churned through the 500+ archives and then bombed out with this error:
This was as expected: the recovery process has applied all the archived redo and now wants to switch to using the stuff in the online logs. Since they’re not there, the process blows up. No worries!
Right: over to ServerA and a nice, clean shutdown of the database. We are now in planned outage territory -and this is the first time the database has been inaccessible to ordinary users. Once it’s closed, I copy the online redo logs to ServerB and any archive logs which have been produced since the bulk of them were transported across earlier. This takes just a few minutes.
Over on ServerB, I delete the controlfile: it won’t agree with the fresh online redo logs I’ve just copied across and is therefore useless. But I again run my ‘create controlfile’ script to create a brand new controlfile. Once that’s done, it’s time to issue the ‘recover database’ command once more, followed by ‘auto’: it runs through the last few archivelogs, switches invisibly to applying redo from the new online redo log copies… and then says ‘media recovery complete’.
One ‘alter database open’ (and a ‘create temporary tablespace…’) and the database is open for business on its new host. A check of the records in my test table indicate that the row I inserted in there just before I shut the database down on ServerA is indeed present and correct, so no data has been lost… and no resetlogs have been issued, so all prior backups etc. remain usable (not a concern in 11g, of course, but definitely a consideration in 10g and earlier).
The only “naughty” bit required in any of that is where I took a hot copy of the production online redo logs in order to persuade the ‘create controlfile’ command to succeed. As soon as it does, delete them from the file system, because you do not (and must not) apply any of the redo they contain. You can do a ‘recover database’ at that point safe in the knowledge that only redo from the archives will be applied, because when it gets to the bit where it thinks it needs the online logs, it will discover they’re not there. That sets you up for doing a clean shutdown and copy of the ‘real’ online logs later on, for a second bout of recovery, through to completion.
Thanks for testing that Howard – that’s pretty well how I have it planned to play out, in a round about way. One difference is I had planned to relocate my redo logs and all the datafiles on the new server, and the redo logs would be in new groups, instead of groups 5 through 9 on c: and d: drive, they are now groups 1 through 5 on drive d: and e: , and instead of datafiles being scattered across d: and e: drives, all are going to reside on e: drive in the new home. This is all reflected in my create controlfile script, but if I am reading this right, in order to do a complete recovery I’ll need to leave the original names and locations of the redo logs in the rebuild controlfile script (as well as replicate the names on the new server), copy the logs over from the (now shut down) old database for the recovery process, and change my locations/names/groups manually later once the database is up and running.
The way I had originally planned it would have been an incomplete recovery, although I would have had the same data in the database. My first test run I had the redo renamed and relocated in the controlfile rebuild, but I can see that would prohibit using the redo logs from the old database for the rest of the recovery.
You’ll be able to change the location of your online logs just fine.
Where I did my ‘naughty hot copy’, for example: if I’d edited my script so that it said ‘expect all logs on e: drive’, that would have been fine, so long as I’d actually hot-copied the online logs from oldserver’s d: and e: drive to the new locations on the newserver.
Don’t worry about whether logs are numbered 1, 2, 3, 4, 5 or 106, 107, 108, 109, 110 either. The internal numbering is irrelevant, and the script will cope with it under all circumstances, automatically.
If you were planning to alter the number of log groups, however (e.g., oldserver has 5 groups and you want newserver to run with 10), that would be a different matter: you wouldn’t be able to do a ‘naughty hot copy’ of a redo log that doesn’t actually exist, after all. Mucking about with the number of groups is best left to post-move ‘alter database’ commands, which are all online things, after all.
But no: if you’ve got 5 groups on d: and e: drives, and you want them to become 5 groups on just e: drive on the new server, that’s alll just fine: the naughty hot copy can be to newserver’s e: drive, you edit the create controlfile script to match, and you’ll be able to create the new controlfile without drama.
Done and it went faster than expected even. I had extra time so this morning I was able to copy over last nights online backup, and it turns out activity was quiet on the weekend so there was not even any archived logs since the backup.
After I “hot copied” the online logs over, and created the new controlfile, shut the database down, deleted the redo logs, did a startup mount and then issued a recover command it complained about not seeing the redo logs – then I realized there had been no redo archived since the backup, meaning there was no archived log to look for, so I just waited until the server admin was ready for me to shut down the old database, did that then copied over the cold redo logs to their respective new locations and issued recover database again and in a couple of seconds it was back up and running and I could open it and re add the tempfile to the temp tablespace.
One thing I did have to change though, was the locations of the redo logs to match the old locations in my create controlfile script, it complained about not being able to see the file when I had it in a different location. No biggie, on the old server when I was creating the groups I inadvertently created one folder too many and ended up with the files in e:\oracle\oradata\ivarprod\ivarprod\ rather than just e:\oracle\oradata\ivarprod\ – I thought I could fix this with the controlfile re-creation but ended up putting the original path in and creating the extra sub folder on the drive.
All in all, easily the fastest transition/move I’ve done, although my blazing fast efficiency is moot in this case because we have to rename the server and assign the ip address and name from the old server to the new one, which involves several reboots etc, then take the old server off the domain. All because of some (poorly coded I think) application logic that looks for update files on one of the DB server drives, and the application admin insisted we keep the same server name and IP address for the new server. When I upgrade this database next time I’m going to be more insistent that only the database and nothing else application related will reside on it.
Really glad it went so well!
Hope you got something out of the exercise here, too!
Regards,
HJR
sure did, better understanding of how the rebuild controlfile works, and how the controlfile responds to redologs etc.. next task is to get more comfortable with rman and try the same thing on a test server, although for simplicity one really can’t get much simpler than the above..
thanks for the pointers & testing!