Pythian has always been serious about reducing human mistakes. Our consultants have always been required to log all terminal outputs for the work we execute so that the information is available to our clients, and to make sure the same work is done the same way the next time. Later,
FIT-ACER was invented - a set of checks to be followed by the DBA working on the maintenance to make sure the person is ready to perform the work, and is actively aware of what, why, where and when exactly it needs to be executed. Most recently, we have started adopting the automation approach; for example, by building
Ansible playbooks that perform the maintenance work, which almost completely eliminates any human mistakes. Automation is the ideal, but the path to get there is neither simple nor quick. I was looking for a simple, intuitive process that could be used before the automation was implemented. This blog post will present my solution: a template-based Action Plan Generator using Google Docs that I created to reliably advance the maintenance through all testing cycles and into production.
The current approach
Of course, with our organizational emphasis on reliable, repeatable work, we have been creating action plans for our maintenance activities for a long time. Usually, this is a multi-step process:
- The preliminary action plan for the maintenance is created by a DBA who takes into consideration the current state of the system and the change that needs to be applied. The DBA studies documentation and relevant My Oracle Support notes to determine the right sequence of actions.
- The action plan is executed in a test/dev system. At this step, the DBA records the exact commands that need to be run, addresses issues, etc.
- The final action plan is created by traversing the terminal logs and reviewing the notes from the previous step. Some steps may need to be reordered in this case, and if any issues were observed, the fixes may need to be added to avoid the issues in the next iterations. The outcome of this step is a document that lists the exact commands that need to be executed to get from the current state of the system to the patched state (upgraded, migrated, etc.) successfully.
This approach has worked really well for us and it has been one of the best ways to make sure the action plan is reliably repeatable and no steps are missed. However, it has a potential issue - hard-coded values. It's clear that some things differ between different systems - port values, host names, usernames, paths and so on, which made it a necessity to add one more step:
- Run a find/replace to adjust the hard-coded values in the action plan, and generate separate action plans for every other environment. Review the action plan to spot errors on text replacements.
I personally have had many problems with this step. I'm an Oracle Apps DBA, and when dealing with Oracle e-Business Suite maintenance, it's not uncommon to have an action plan of 50+ pages in front of you. The largest action plans I've worked with were longer than 200 pages. Imagine how difficult it is to review it all and make sure the replace didn't break anything. Is the find/replace/review process error-prone? Absolutely! We've had situations where some values are mis-replaced, or the search pattern matched too many strings in the document, resulting in an undesirable change.
Template-based action plans
The idea is very simple:
- The final action plan is created as a template (during Step 3 above), so that the outcome won't contain any hard-coded values. For example, we may use a "
##DB_PORT##
" placeholder instead of port 1521. We also include "toggles" to be able to enable/disable complete sections of the document.
- All the values for placeholders for every environment that needs to be maintained are defined in a separate spreadsheet that is compact and easily manageable.
- The final action plans are generated by a simple macro script by taking the template and updating the placeholders to the correct values.
Creating the action plan template like this saves time and reduces risk. It removes the necessity for manual find/replace, and eliminates the possibility of unintentionally replacing something that shouldn't be replaced - errors that are not simple to spot when reviewing the result manually. It doesn't eliminate the need for a review of the generated plan, but it should simplify the process.
Here, in a publicly shared google drive folder, is a working example for you to copy, try out and use for your own benefit. It consists of two documents explained below.
Action plan generator - template
Action Plan Generator - Template is an example of a very simple template. Here's a small section of it: [caption id="attachment_105789" align="aligncenter" width="1361"]
Action Plan Generator - Template[/caption] Notice the following dynamic pieces in the template:
- Variables - strings like
##ORA_USER##
, ##DB_HOST_S##
, and ##JIRA_ISSUE##
. These placeholder variables are replaced with real values from the "Action Plan Generator - Variables" document when the final action plan is generated.
- Toggles -
##EM13C>##
(the beginning tag) and ##<EM13C##
(the end tag). Toggles allow removing the section of the document between the beginning and end tags. If the EM13C
is set to "OFF
" the content will not show up in the final plan.
Action plan generator - variables
Action Plan Generator - Template is a spreadsheet that defines the values for the placeholder variables and toggles. Here's how simple it looks: [caption id="attachment_105790" align="aligncenter" width="2077"]
Action Plan Generator - Variables[/caption] There are a few things to note:
- You'll need to update the "Document Template ID" to point it to the template document you actually want to use (open the template Google Doc in a browser, and the ID is part of the URL)
- You'll need to update the "Target Folder ID" to point it to the folder where you want to create the generated action plans.
- The table needs to be updated to define the variable and toggle names used in the template document. More columns can be added to allow more environments, and more variables can be added, as well.
- Clicking on the "Generate the Document" button will initiate the macro (it will ask for permissions to run; if you're not sure it's safe you can examine the code by navigating to Tools -> Script editor). The macro will:
- Look up the template
- Prompt for the environment name for which to generate the action plan
- Back up the previously generated action plan (copy + rename it to add timestamp to its name)
- Copy the template and rename it by replacing " - Template" to " - <environment name>"
- Replace the variables and remove the sections between
OFF
-toggles
- Display a message with the URL to the document
The result
Here's the action plan section (the same we displayed above for the Template file) after generating the plan for DEV environment (notice that the 13c EM blackout section was removed as the toggle is set to "
OFF
", and the variables have been replaced with the actual values): [caption id="attachment_105788" align="aligncenter" width="1377"]
Action Plan Generator - DEV[/caption] You can also take a look at complete files as they were generated:
Action Plan Generator - DEV,
Action Plan Generator - QA and
Action Plan Generator - PROD. I've used this tool a number of times already, as have several other members of my team. Initially, it takes more time to produce the action plan template instead of just putting down all the commands one after another because one needs to consider all environments at the same time. But it's already much quicker for the next template, as large parts of the previous action plan can be reused. I also can confirm this really simplifies modification of the action plan, as the work happens in a single document - the template - and there's no need to edit different documents for different environments. I believe this method really helps to reduce the mistakes that DBAs allow in the maintenance activities of their systems.
What's next?
Automation is still the ultimate goal. But once you start working with the action plan templates, you'll notice that it requires a generalization of the action plan in a way that it applies to all environments that you maintain. That is actually a halfway towards automation! You're not investing your time in something that you'll throw out as soon as your automation framework is in place and you've gone through the learning curve for using it, you're actually creating something that can act as a foundation for your playbooks, recipes or configuration files. You'll get real automation results faster, and if you're also avoiding a few human errors on the way - it's a win-win!