SAP cross-cluster move with minimized downtime

Many customers run SAP ABAP or Java application servers in a system architecture based on the Windows Server Failover Cluster framework. This high-availability architecture is well known and has proven itself for years. Unplanned hardware failures and maintenance activities can be covered with minimal impact on system availability.
But in some scenarios, it is necessary to transfer an SAP system from one failover cluster to another. That could be the case in these situations:

  • New active directory domain name or new DNS domain name, e.g. after the acquisition of a company
  • A Windows OS upgrade, e.g. from Windows Server 2012 or lower to Windows Server 2016 (starting from Windows Server 2012 R2, it is also possible to upgrade the nodes in place with a rolling upgrade)
  • A (partly) destroyed cluster configuration
  • Redesign of the datacenter architecture
  • Move to an IaaS partner like Microsoft Azure or to the Google Cloud Platform

As far as I know, there are many guides and best practices (like the official SAP installation guide or this blog) on how to install SAP in a Windows Server Failover Cluster. But I don’t know of any “best practice” documentation on how to move an SAP system from one cluster environment to a new one. I will try to change that with this blog.

Prerequisites

As a starting point, I will assume that we have an already clustered SAP (A)SCS instance up and running. Of course, this method works in a similar way for every clustered SAP ABAP or Java system. In this example, I am using an SAP ABAP system called “RIX”. In the beginning, RIX is running on a Windows Server 2012 R2 cluster consisting of the nodes wsiv8050-1 and wsiv8050-2. It’s a very typical setup, as described in the SAP ABAP Installation Guide (find an overview of SAP NetWeaver Installation Guides here[Link]), and it looks like this:

[…] merge them into \\obelix\sapmnt\RIX\SYS\profile.

Prepare the database move

From an SAP cluster perspective, everything is now prepared to start the move. In my environment, however, it is advisable to start transferring the database to the new SQL Server instance now, because I plan to move both components, SAP and database, from the existing environment to the new cluster in one short downtime. To keep the necessary downtime as short as possible, I will use the standard backup and restore capabilities of the SQL Server DBMS to build up a “hot stand-by database”. There are many conceivable approaches; read the documentation for your DBMS to find the best way for your setup. For SQL Server, a manually initiated “transaction log shipping”, as described here, is sufficient to achieve the desired goal.

--execute the Full Database Backup on the existing instance
:connect WSIV8050-SQLCLU
GO
BACKUP DATABASE [RIX] TO DISK = N'\parkplatzSamuel_BackupRIXFullRIX.bak'
WITH NOFORMAT, NOINIT, NAME = N'RIX-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 5
GO

--now we can connect to the target instance to restore the database fullbackup in the new Environment
:connect WSIV8051-SQLC51
USE master
GO
RESTORE DATABASE RIX FROM DISK = N'\parkplatzSamuel_BackupRIXFullRIX.bak'
WITH FILE = 1, NOUNLOAD, REPLACE, STATS = 10, NORECOVERY
GO

--do the TLOG backup on the source database
:connect WSIV8050-SQLCLU
GO
BACKUP LOG [RIX] TO DISK = N'\parkplatzSamuel_BackupRIX1-RIX.trn'
WITH NOFORMAT, NOINIT, NAME = N'RIX Database Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

--and restore the TLOG on the target instance
:connect WSIV8051-SQLC51
GO
RESTORE LOG RIX FROM DISK = N'\parkplatzSamuel_BackupRIX1-RIX.trn'
WITH FILE = 1, NOUNLOAD, STATS = 10, NORECOVERY
GO

Pay attention to the “NORECOVERY” option, which I used twice. The result on the target instance is a database in ‘Restoring’ mode. This makes it possible to restore additional transaction log backups and bring the source and the target database in sync during the downtime, before opening the RIX database on the target instance WSIV8051-SQLC51.
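Every further log shipping round repeats the same two statements, so the script can be generated instead of edited by hand. The following Python sketch is only an illustration under assumptions: the function name log_shipping_round and the backup file path in the usage example are made up, and each round is assumed to write to a new backup file (because the restore reads FILE = 1). It only builds the SQLCMD-mode text; it does not connect to SQL Server.

```python
def log_shipping_round(source, target, database, backup_file, recover=False):
    """Return a SQLCMD-mode script for one log backup/restore round.

    Hypothetical helper: backup_file must be a new file for every round,
    because the restore always reads the first backup set (FILE = 1).
    """
    lines = [
        "--back up the transaction log on the source instance",
        f":connect {source}",
        "GO",
        f"BACKUP LOG [{database}] TO DISK = N'{backup_file}'",
        "WITH NOFORMAT, NOINIT, SKIP, NOREWIND, NOUNLOAD, STATS = 10",
        "GO",
        "--restore it on the target and stay in 'Restoring' mode",
        f":connect {target}",
        "GO",
        f"RESTORE LOG [{database}] FROM DISK = N'{backup_file}'",
        "WITH FILE = 1, NOUNLOAD, STATS = 10, NORECOVERY",
        "GO",
    ]
    if recover:
        # Only for the very last round: open the database on the target.
        lines += [
            "--final round: open the database on the target instance",
            f"RESTORE DATABASE [{database}] WITH RECOVERY",
            "GO",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The share path below is a placeholder, not the one used in this blog.
    print(log_shipping_round("WSIV8050-SQLCLU", "WSIV8051-SQLC51", "RIX",
                             r"\\backupshare\RIX\3-RIX.trn"))
```

Calling it with recover=True appends the final RESTORE DATABASE ... WITH RECOVERY, which should only happen once, after the source has stopped receiving changes.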

Do the magic: Move the ASCS and DB instance in a short system downtime

As always, a full system backup is mandatory before doing maintenance on a productive SAP system. If everything is well prepared, the risk of having to roll back is quite low, but rolling back and postponing the maintenance activity always remains an option.

At this point, you have a clustered ASCS instance, a clustered SQL Server database instance, and local ERS instances running on both cluster environments. Your ABAP application instances are still installed on the old/existing cluster.

When your downtime window begins, stop all interfaces and batch jobs, as always, and log off all dialog users.

  1. Stop the SAP application servers.
  2. Take the ASCS & ERS instances offline. It is recommended to “disable” the Windows services to avoid an unintended restart later.

  3. Using the Windows Server Failover Cluster Management, remove (!) the IP address and the network name “asterix” from your current cluster node wsiv8050-1.

  4. Remove the cluster object “asterix” in Active Directory.

  5. Modify the IP address and network name from “obelix” to “asterix” in the SAP cluster role in the new environment wsiv8051-c.

  6. Before we can start the new ASCS, we need to reconfigure the profile files if not already prepared at this point in time. That means:
    1. Rename the ASCS profile, e.g.
      rename D:\usr\sap\RIX\SYS\profile\RIX_ASCS00_obelix RIX_ASCS00_asterix
    2. Adjust all the parameters inside DEFAULT.PFL and the instance profile RIX_ASCS00_asterix that point to obelix, which no longer exists. To do so, you could simply copy the profiles from the former cluster. But check every single profile parameter and, most importantly, adjust the database-related settings. For SQL Server, these are the parameters
      SAPDBHOST = WSIV8051-SQL
      dbms/type = mss
      dbs/mss/server = WSIV8051-SQLC51
      dbs/mss/dbname = RIX
      dbs/mss/schema = rix
    3. Modify the local profiles of the ERS instances. In my environment, they are local profile files in C:\usr\sap\RIX\ERS10\profile on each node. Change all “obelix” entries to “asterix” in these profile files on every cluster node. And remember to restart the ERS instances after the changes have been made!
    4. If there are <sid>adm-specific user environment variables like SAPLOCALHOST=obelix or MSSQL_SERVER=wsiv8050-sqlCLU, you must change them to the new values, e.g. using regedit: HKEY_USERS -> <SID of the user> -> Environment.
    5. Modify the Windows service SAPRIX_00 on both (= all!) new cluster nodes so that asterix instead of obelix is used to determine the sapstartsrv.exe parameters. You can either use sc.exe config or regedit.exe to adjust the “ImagePath” value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SAPRIX_00.

       Then take the last transaction log backup on the source, restore it on the target, and open the database there:

       --do the last TLOG backup on the source database before switching to offline
       :connect WSIV8050-SQLCLU
       GO
       BACKUP LOG [RIX] TO DISK = N'\parkplatzSamuel_BackupRIX2-RIX.trn'
       WITH NOFORMAT, NOINIT, NAME = N'RIX Database Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
       ALTER DATABASE [RIX] SET OFFLINE WITH ROLLBACK IMMEDIATE
       GO

       --and restore the TLOG on the target instance
       :connect WSIV8051-SQLC51
       GO
       RESTORE LOG RIX FROM DISK = N'\parkplatzSamuel_BackupRIX2-RIX.trn'
       WITH FILE = 1, NOUNLOAD, STATS = 10, NORECOVERY
       GO

       :connect WSIV8051-SQLC51
       GO
       --finally open the database on the target SQL Server instance
       RESTORE DATABASE RIX WITH RECOVERY
    6. Re-configure the SQL Server logins and security. There are several ways to do this. One is to follow SAP Note 1294762 – SCHEMA4SAP.VBS and create a schema repair script, which would transfer all the logins. In addition, remember to transfer all other database-related objects such as SQL Server Agent jobs, additional logins or maintenance plans.
    7. Start the ASCS instance using the cluster role “SAP RIX” on the new cluster node. If everything is properly configured, the ASCS should now run smoothly.
    8. Start the old application server instances, which are still installed on at least two Windows hosts, wsiv8050-1 and wsiv8050-2. All former application instances should successfully load their profile from the “new” \\asterix\sapmnt\RIX\SYS\profile without any modification. Verify that the application server instances connect to the database that was moved to the new server.
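The profile adjustments in step 6 are plain text edits, so they can be rehearsed on copies of the profiles before the downtime. The following Python sketch is a minimal illustration, not an SAP tool: the function names rename_host and set_parameters are hypothetical, and the sample profile text is made up for the example. It assumes profile lines of the form "name = value" and comments starting with "#".

```python
import re


def rename_host(text, old, new):
    """Replace host name `old` with `new` where it appears as a whole name.

    Case-insensitive; names that merely contain `old` (e.g. "obelix2")
    are left untouched.
    """
    pattern = re.compile(
        rf"(?<![A-Za-z0-9-]){re.escape(old)}(?![A-Za-z0-9-])", re.IGNORECASE
    )
    # A function as replacement avoids backslash interpretation in `new`.
    return pattern.sub(lambda match: new, text)


def set_parameters(profile_text, values):
    """Set `name = value` lines; append parameters that are not present yet."""
    remaining = dict(values)
    out = []
    for line in profile_text.splitlines():
        name = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if "=" in line and not is_comment and name in remaining:
            out.append(f"{name} = {remaining.pop(name)}")
        else:
            out.append(line)
    out.extend(f"{name} = {value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up sample content, not a complete SAP profile.
    sample = "SAPGLOBALHOST = obelix\ndbs/mss/server = WSIV8050-SQLCLU\n"
    sample = rename_host(sample, "obelix", "asterix")
    print(set_parameters(sample, {
        "SAPDBHOST": "WSIV8051-SQL",
        "dbs/mss/server": "WSIV8051-SQLC51",
    }))
```

The same rename_host call can be applied to a copy of the service “ImagePath” string or to the ERS profiles. Review the result by eye in any case, since a blind search-and-replace can miss parameters that encode the host in another form.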

    At this point you have successfully switched to the new ASCS instance and database on the new cluster!
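Before declaring the switch complete, it can help to check that no profile still mentions one of the old names. The following Python sketch is an assumption-laden illustration: find_stale_references is a hypothetical helper, and it only scans text files below a directory that you pass in (for example, a copy of the profile directory). It does not look at the registry or at environment variables.

```python
from pathlib import Path


def find_stale_references(root, old_names, pattern="*"):
    """Yield (path, line_number, line) for lines that mention an old name.

    Matching is a case-insensitive substring search, so it errs on the
    side of reporting too much rather than too little.
    """
    lowered = [name.lower() for name in old_names]
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(name in line.lower() for name in lowered):
                yield path, number, line.strip()
```

A call such as list(find_stale_references(profile_dir, ["obelix", "wsiv8050-sqlclu"])), where profile_dir is the directory to scan, returns an empty list when nothing points to the old environment anymore.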

    Follow-up activities

    You can now:

    • install additional application server instances on the new cluster nodes
    • remove the old SAP instances from the old Windows hosts, or simply delete the old environment

    After everything is finished, you can install two additional SAP application server instances for the existing system RIX on the two new Windows Server 2016 cluster nodes. Of course, these SAP instances are installed on local disks. Thus, your final system architecture could look like this:

    The difference from the initial architecture overview is marginal: only the two node host names have changed (from wsiv8050-1/wsiv8050-2 to wsiv8051-1/wsiv8051-2). And of course, the new cluster nodes are completely independent of the former ones; they could be running a different Windows Server version or be in a different server location (e.g. even operated in an IaaS environment).

    Disclaimer

    Please consider this blog as a source of inspiration. Following guidance like this can be much better than learning by trial and error. But I strongly recommend testing this procedure in your environment before each production downtime. After you have finished the “obelix” installation, you could perhaps move this environment to an unused dummy name, for example “idefix”. That will give you the certainty you need to carry out all of the steps systematically during the (short?!) downtime of your productive system.
