By Jason Parms
When thinking about security for your small business, remember that you can minimize business downtime by performing ongoing, preventive maintenance on your physical equipment. Regular maintenance also keeps your server software and security patches up to date, reducing inefficiencies caused by software version conflicts and downtime caused by mechanical failure.
Preventive system maintenance means performing regular inspections, applying updates on a schedule, and heading off potential failures before they occur. Having a plan in place and executing it consistently will help you avoid unplanned downtime due to battery failure, clogged air filters, outdated firmware, and other physical causes.
Create a Maintenance Plan
Define the objectives of your maintenance program and create a plan that meets them. Potential goals include reducing unplanned downtime and the risk of downtime, improving safety and security, and improving your capability to provide services. If you know what you are trying to accomplish, a few simple metrics can tell you whether you are achieving your goals. This is a critical point if you're trying to fund an initiative of this sort.
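As one illustration of a "simple metric," the sketch below computes availability (the share of a period during which a service was up) and counts unplanned outages from a list of downtime records. The `Outage` record format and its `planned` flag are assumptions for illustration, not part of any particular monitoring tool, and the sample numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    minutes: float  # duration of the outage
    planned: bool   # True if it happened during a scheduled maintenance window

def availability(outages: list[Outage], period_minutes: float) -> float:
    """Percentage of the period the service was up, counting only unplanned outages."""
    unplanned = sum(o.minutes for o in outages if not o.planned)
    return 100.0 * (period_minutes - unplanned) / period_minutes

def unplanned_count(outages: list[Outage]) -> int:
    """Number of outages that were not part of planned maintenance."""
    return sum(1 for o in outages if not o.planned)

# Hypothetical month: one planned maintenance window and two unplanned failures.
month = 30 * 24 * 60
outages_this_month = [Outage(120, True), Outage(45, False), Outage(15, False)]
print(f"{availability(outages_this_month, month):.3f}% available, "
      f"{unplanned_count(outages_this_month)} unplanned outages")
```

Whether planned windows should count against availability is a policy decision; this sketch excludes them so that well-scheduled maintenance does not look like a failure. Tracking these two numbers month over month shows whether the program is reducing unplanned downtime.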
Plans should include certain standard features:
Create your own maintenance routine. Having a routine, or stated procedure to follow, is an important part of the plan. A checklist to guide you is provided later in this article, but your data center should develop its own checklist, tuned to your specific needs. Check your equipment manuals to see what intervals and routines are recommended by the manufacturers; some OEMs even provide standardized checklists detailing the preventive maintenance that is optimal for your equipment.
Make it regular. If you don't explicitly make time for maintenance, there will never seem to be a good time to take the server down. Not all maintenance requires downtime, but you will need to take servers offline periodically to keep them healthy and, for instance, to update software. Pay attention to which windows of time (when the fewest people are logged in and using the system) are best for maintenance activities.
Don't wait for an actual failure! Plan your routine maintenance in advance. If you have older machines running in a small, poorly ventilated server room, you will probably want to perform inspections and cleaning more often than if you have new equipment in a well-ventilated room.
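If your monitoring tools can export how many sessions were active in each hour, finding the quietest window is a small calculation. This sketch assumes you have hourly `(hour, active_sessions)` samples; the function name `quietest_window` and the sample data are hypothetical. It averages the samples by hour of day and returns the starting hour of the lowest-load window of a given length, wrapping past midnight.

```python
def quietest_window(samples: list[tuple[int, int]], length: int = 2) -> int:
    """Return the starting hour (0-23) of the `length`-hour window with the
    lowest average number of active sessions. Windows may wrap past midnight."""
    totals = [0] * 24
    counts = [0] * 24
    for hour, sessions in samples:
        totals[hour] += sessions
        counts[hour] += 1
    # Hours with no samples get infinite load so they are never chosen blindly.
    avg = [totals[h] / counts[h] if counts[h] else float("inf") for h in range(24)]
    return min(range(24), key=lambda start: sum(avg[(start + i) % 24] for i in range(length)))

# Invented usage pattern: busy business hours, light evenings, near-idle overnight.
def sessions_at(hour: int) -> int:
    if 8 <= hour < 18:
        return 40
    if 18 <= hour < 23:
        return 10
    return 1 if hour in (2, 3) else 3

samples = [(h, sessions_at(h)) for _day in range(7) for h in range(24)]
print(f"Quietest 2-hour window starts at {quietest_window(samples):02d}:00")
```

Averaging over several weeks, rather than a single day, keeps one unusual night from deciding your maintenance schedule.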
Address physical and mechanical maintenance. Your IT people might already be maintaining software and patches as part of your data center's security plan, but make sure they are also attending to physical, mechanical, and environmental issues. For instance, UPS units and their batteries benefit greatly from regular maintenance that keeps them healthy and ready in case of a power outage. Other systems that should be maintained include HVAC, generators, and physical plant items such as doors, emergency exits, and cabling. Good maintenance in these areas can also reduce power usage, helping your data center run more cleanly and efficiently.
Keep good documentation. If you document your procedures, your maintenance history, and the outcome of each procedure, you can confirm that maintenance is being carried out according to plan and assess the effectiveness of your overall plan. This information can also be invaluable in case of an actual system failure: it can help you identify problems, or at least rule out certain issues. Also, review your maintenance history to identify chronic equipment problems and trouble spots.
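Even a simple structured log makes that last review practical. The sketch below is hypothetical: the `MaintenanceRecord` fields, the outcome labels ("ok", "fixed", "failed"), the asset tags, and the three-issue cutoff are assumptions chosen for illustration. It counts non-routine outcomes per asset to surface chronic trouble spots.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class MaintenanceRecord:
    asset: str       # inventory tag, e.g. "UPS-01" (hypothetical naming scheme)
    performed: date  # when the work was done
    task: str        # what was done
    outcome: str     # "ok", "fixed", or "failed" in this sketch

def chronic_trouble_spots(history: list[MaintenanceRecord], min_issues: int = 3) -> list[str]:
    """Return assets with at least `min_issues` non-'ok' outcomes, sorted by tag."""
    issues = Counter(r.asset for r in history if r.outcome != "ok")
    return sorted(asset for asset, n in issues.items() if n >= min_issues)

# Invented history for illustration.
history = [
    MaintenanceRecord("SRV-01", date(2024, 1, 10), "clean fans", "ok"),
    MaintenanceRecord("UPS-01", date(2024, 1, 10), "battery test", "failed"),
    MaintenanceRecord("UPS-01", date(2024, 2, 10), "battery test", "fixed"),
    MaintenanceRecord("UPS-01", date(2024, 3, 10), "battery test", "failed"),
    MaintenanceRecord("SRV-02", date(2024, 3, 10), "disk check", "fixed"),
]
print(chronic_trouble_spots(history))
```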
Consider hiring outside help. Sometimes the best maintenance is knowing when to call in the experts. Not all data center managers will be experts on changing air filters for the HVAC system, for example, nor should they be. Your maintenance plan and checklist should list resources for each task. Trying to solve certain problems yourself may make them worse.
Consider a CMMS. Computerized Maintenance Management Systems (CMMS) or other preventive maintenance software can help you track the status, history, and associated costs of maintenance on your equipment. Such software can also ensure maintenance schedules are known and met, make established procedures available, provide maintenance history, track effectiveness, and generally make tasks more manageable.
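To illustrate the core bookkeeping a CMMS automates, here is a minimal sketch that computes each task's next due date from its recurrence interval and lists the tasks that are overdue. It is not modeled on any particular CMMS product; the `ScheduledTask` type, task names, intervals, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ScheduledTask:
    name: str
    interval_days: int  # how often the task should recur (illustrative values below)
    last_done: date

    def next_due(self) -> date:
        return self.last_done + timedelta(days=self.interval_days)

def overdue(tasks: list[ScheduledTask], today: date) -> list[ScheduledTask]:
    """Return tasks whose next due date has already passed, most overdue first."""
    late = [t for t in tasks if t.next_due() < today]
    return sorted(late, key=lambda t: t.next_due())

# Hypothetical schedule; real intervals should come from equipment manuals and your own plan.
tasks = [
    ScheduledTask("Replace HVAC filters", 90, date(2024, 1, 1)),
    ScheduledTask("Test UPS", 30, date(2024, 3, 20)),
    ScheduledTask("Review event logs", 7, date(2024, 3, 1)),
]
for t in overdue(tasks, today=date(2024, 4, 5)):
    print(f"OVERDUE: {t.name} (was due {t.next_due()})")
```

A real CMMS adds assignments, cost tracking, and procedure documents on top of this, but the due-date calculation is the piece that keeps schedules from silently slipping.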
Assign tasks. Assigning maintenance tasks ahead of time with everyone knowing what their roles are will keep your maintenance routine on track.
Prioritize maintenance activities. Recognize that some maintenance activities are more important than others. Understand your own data center, such as which applications are the most critical and which equipment is most vulnerable, to help you assign a priority rating to each task. If there is a reason a full maintenance routine can't be run, you can perform the most-needed tasks, while allowing the less critical items to wait.
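Prioritization can be as simple as a ranked task list and a time budget. The sketch below assumes a hypothetical 1-to-3 priority scale (1 is most critical) and time estimates in minutes; the task names and numbers are invented. It fills the available window in priority order and defers whatever does not fit.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int  # 1 = most critical (scale assumed for this sketch)
    minutes: int   # estimated time to complete

def plan_window(tasks: list[Task], available_minutes: int) -> list[str]:
    """Pick tasks in priority order, skipping any that no longer fit in the window."""
    chosen: list[str] = []
    remaining = available_minutes
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.minutes <= remaining:
            chosen.append(task.name)
            remaining -= task.minutes
    return chosen

tasks = [
    Task("Defragment disks", 3, 60),
    Task("Install security patches", 1, 45),
    Task("Test UPS", 1, 30),
    Task("Clean server room", 2, 40),
]
print(plan_window(tasks, available_minutes=90))
```

The greedy approach is deliberate: a lower-priority task only gets time left over after every higher-priority task that fits has been scheduled, so it never displaces critical work. It does not try to pack the window optimally.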
Preventive Maintenance Sample Routine
It can't be stressed enough that your preventive maintenance plan is dependent to a large degree on your specific situation: hardware, software, environment, criticality, and other factors. The following should only be used as a guideline for developing a plan of your own:
Server Maintenance
Maintain the health of your server. Ensure there is good airflow and that vents are not blocked. Take the server offline and visually inspect its intake and outflow paths. Dust the machine itself as well as the area around it. Remove any accumulations around the vents. In a clean, antistatic environment, open up the box and inspect the system chassis, paying attention to the CPU heat sink and RAM modules, the fan assemblies, and air pathways. Use dry compressed air to blow away dust, which can act as an insulator or heat trap. Excessive heat can cause equipment failure, but even before that, it can cause the unit to consume much more power and to work much harder than it needs to.
Check your local hard disks and media. Run a physical check for bad sectors, and defragment spinning hard disks on a regular basis (solid-state drives do not benefit from traditional defragmentation). Even if you have terrific redundancy and failover, equipment failure will at least mean repairing or buying new equipment, which costs time and effort that could have been prevented.
Check the event logs for system, malware, and other events. Even though your security plan probably includes a directive to review system logs regularly, the people reading them are looking for critical events. Hundreds of sub-critical events can tell you a lot about how your system functions and where its weak spots are, potentially alerting you to imminent failures. For instance, recoverable errors may not raise a critical alarm, but they may signal an impending module failure, giving you an opportunity to act proactively.
Also, while you're looking at the logs, ensure the setup is correct, that the reporting threshold is adequate, and that the correct people are notified in case of an event. Double-check that the contact list is up to date.
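A short script can surface the sub-critical patterns that reviewers tend to skip. The sketch below assumes a hypothetical plain-text log with one event per line in the form `LEVEL source message`; real Windows Event Log or syslog exports use different formats and would need their own parsing. The level names and the threshold of three events are also illustrative assumptions.

```python
from collections import Counter

def noisy_sources(lines: list[str], levels: tuple[str, ...] = ("WARNING", "CORRECTED"),
                  threshold: int = 3) -> dict[str, int]:
    """Count sub-critical events per source; return sources at or above `threshold`.

    Assumes a hypothetical line format: "<LEVEL> <source> <message...>".
    """
    counts: Counter[str] = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[0] in levels:
            counts[parts[1]] += 1
    return {src: n for src, n in counts.items() if n >= threshold}

# Invented log excerpt: repeated corrected memory errors hint at a failing module.
log_lines = [
    "INFO kernel boot complete",
    "CORRECTED memory ECC error corrected on DIMM_A2",
    "CORRECTED memory ECC error corrected on DIMM_A2",
    "WARNING disk0 read retry succeeded",
    "CORRECTED memory ECC error corrected on DIMM_A2",
    "CRITICAL psu1 power supply failure",
]
print(noisy_sources(log_lines))
```

Critical events such as the power supply failure above should already trigger your alerting; this kind of report is for the recurring low-level noise that never crosses that threshold on its own.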
Keep your patches and software up to date. The server has a lot of software that has to interact successfully, and one way to ease that process is to ensure that software versions are updated in a timely fashion, and that security patches are applied. It's virtually impossible to release perfect software, so expect that you will have to update to fix problems, improve security, improve interoperability, and streamline performance.
However, you do not want to set up your production servers to update automatically: you need control over timing, and even over which updates and patches to apply. Most system administrators wait until new releases have been tested before deploying them. When you decide that a deployment is appropriate, be sure you can return to your original configuration if the update or patch causes unexpected problems.
Among the software to keep updated is your antivirus and anti-malware software. This can mean more than downloading the latest definition files. If your data center manager is keeping up with the security landscape, it may mean periodically changing your antivirus software or adding malware protection.
Location and Environment
When planning your routine, be sure you include physical equipment checks for switches and routers, circuit breakers, power supplies, cabling, HVAC systems, and fire detection and prevention systems.
Ensure you have up-to-date inventory. Know what you have, how old it is, and where it is. This will help you execute a preventive maintenance plan efficiently.
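An inventory only needs a few fields to support planning: what the item is, where it is, and when it was installed. This hypothetical sketch flags assets older than an illustrative five-year cutoff (not an industry standard), oldest first, so they can be inspected more often or budgeted for replacement. The tags, locations, and dates are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Asset:
    tag: str
    kind: str
    location: str
    installed: date

def age_years(asset: Asset, today: date) -> float:
    """Approximate age in years, using an average year length of 365.25 days."""
    return (today - asset.installed).days / 365.25

def aging_assets(inventory: list[Asset], today: date, max_years: float = 5.0) -> list[Asset]:
    """Return assets older than `max_years`, oldest first."""
    old = [a for a in inventory if age_years(a, today) > max_years]
    return sorted(old, key=lambda a: a.installed)

inventory = [
    Asset("SRV-01", "server", "Rack A", date(2017, 6, 1)),
    Asset("SW-01", "switch", "Rack A", date(2021, 3, 15)),
    Asset("UPS-01", "UPS", "Room 1", date(2018, 9, 1)),
]
for a in aging_assets(inventory, today=date(2024, 6, 1)):
    print(f"{a.tag} ({a.kind}, {a.location}): {age_years(a, date(2024, 6, 1)):.1f} years old")
```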
Make safety part of the routine. Safety is expressed in a couple of different ways: making sure your conditions are safe for work and for equipment, and ensuring your maintenance plan is safe for the people who need to execute it. Data centers have many hazards, especially electrical hazards. Ensure anyone who needs to perform maintenance knows how to do so safely; likewise, ensure a safe environment for people who work there.
Ongoing education. Provide ongoing education so data center technicians know what hazards to look for, and how to avoid them.
Keep it clean. As we've already discussed, dust can create a hazard by blocking airflow, and also by limiting the movement of physical parts. Cleaning shouldn't wait for your preventive maintenance routine, but it should be a regular part of it.
IR scans. If your business has the funds, or if you can hire a vendor, consider having an IR (infrared) scan, which can help identify physical problems. The IR scan specifically looks for unusually high temperatures, which can signal deteriorating equipment due to vibrations, blocked vents, and other problems.
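IR scan results are most useful when compared against earlier scans of the same components. This sketch assumes you record a baseline temperature for each component and flags readings that exceed it by more than a chosen margin. The component names, temperatures, and the 10 °C margin are hypothetical, not industry thresholds.

```python
def hot_spots(readings: dict[str, float], baseline: dict[str, float],
              delta_c: float = 10.0) -> list[str]:
    """Return components whose temperature (°C) exceeds their baseline by more
    than `delta_c`, largest rise first. Components without a baseline are skipped."""
    flagged = [name for name, temp in readings.items()
               if name in baseline and temp - baseline[name] > delta_c]
    return sorted(flagged, key=lambda n: readings[n] - baseline[n], reverse=True)

# Invented values from a previous scan (baseline) and the current scan (readings).
baseline = {"PDU-A breaker 3": 35.0, "UPS-01 terminal": 30.0, "SW-01 PSU": 40.0}
readings = {"PDU-A breaker 3": 58.0, "UPS-01 terminal": 33.5, "SW-01 PSU": 52.0}
print(hot_spots(readings, baseline))
```

Skipping components with no baseline is a design choice: a first-time reading has nothing to compare against, so it becomes the baseline for the next scan rather than an alarm.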
Sample Checklist
Follow this sample checklist for your data center maintenance:
1. Server maintenance
- Physical inspection of servers.
- Check disk sectors (Check Disk).
- Defragment (Defrag).
- Update virus and malware software.
- Install security patches.
- Install application updates as appropriate.
- Review event logs.
- Confirm notification lists.
- Remove dust and any obstructions from server intake and vents.
2. Location and environment
- Check batteries.
- Test UPS systems.
- Test power generators.
- Inspect HVAC infrastructure.
- Test and inspect switches, routers, etc.
- Perform IR scans to find potential weak spots.
- Visually examine cabling and connections.
- Clean physical area.
3. Schedule next maintenance
Plan Ahead
An ounce of prevention has always been worth at least a pound of cure. Preventive maintenance can save your company an enormous amount of money and work, and help your data center run more efficiently.
The post Protect Your Data Center With Basic Preventive Maintenance: A Checklist for Businesses appeared first on AllBusiness.com