z/OS Security Compliance

Security auditing methods have not changed markedly from those first developed for the standalone computer environments of the 1960s. These methods were adequate for their time, but modern information system technology has made auditing computer security a much more imposing problem.

There are numerous reasons for this. Personal computers have placed powerful tools for exploration and hacking onto everyone’s desk. Networks have revolutionized the exchange of information, but they have also provided a direct path for hackers to attack and compromise critical computer assets. Even more threatening, employees and contractors can often readily gain unrestricted access to even the most sensitive information simply because standards for protection have not been designed or implemented. In this environment, bookkeeping based auditing methods not only fall short, but can create a misleading impression that security is under control.

In 1997 Ford Motor Company and SunTrust Service Corp began a project to replace ad hoc audit methods with a disciplined, standardized approach. The goal of the new approach was to be able to perform a complete security audit daily. The key to this new audit approach was the application of the principles of Total Quality Management (TQM) to information security. TQM has raised automobile quality at Ford to world class levels; certainly similar methodologies could resolve the problems of system and security integrity. This article describes the implementation of this new approach.

Total Quality Management for Information Security

The starting point for the project was an iterative quality approach in which specifications for a secure system are defined and adherence to those specifications is measured. When errors or discrepancies are detected, corrections are tested and applied through repeated cycles of improvement. This is the iterative quality approach espoused by Dr. W. Edwards Deming and adopted by his followers in the 1980s. Once in place, this approach formed the basis for a continual self-audit and led to greater confidence that the computer environment was secure.

Continuous improvement requires that the organization adopt strategies for defect elimination. Defect prevention requires designs and methods that support quality, plus commitments to quality at the organization and process level. This means that there must be clear standards for protection of information resources, and that these standards must be enforced in a direct and certain manner. For example, if a system administrator places a file on the system without any security protection, perhaps by granting universal access to the file, there must be an automated way to detect the occurrence. Once detected, there must be a swift way, whether automated, manual, or a combination, to assure that the universal access is replaced by more restrictive access.
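The detection half of this requirement can be sketched in a few lines. The following Python fragment is illustrative only: the profile records and field names are invented for the example, not the actual layout of a security database extract.

```python
# Hypothetical sketch: flag data sets whose profiles grant universal access.
# The records below are invented; a real implementation would read them
# from a daily extract of the security database.

def find_universal_access(profiles):
    """Return profiles whose universal access (UACC) is broader than NONE."""
    return [p for p in profiles if p.get("uacc", "NONE") != "NONE"]

profiles = [
    {"dataset": "PAYROLL.MASTER", "uacc": "NONE"},
    {"dataset": "ENG.RELEASE.DATA", "uacc": "READ"},  # defect: universal read
]

for p in find_universal_access(profiles):
    print(f"DISCREPANCY: {p['dataset']} has UACC({p['uacc']})")
```

Each flagged item would then feed the correction step, whether that step is automated or routed to an administrator.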

Also, any continuous improvement process must obey W. Ross Ashby's concept of Requisite Variety. Ashby's concept is that the solution cannot be more complex than the problem. Thus improvements in quality generally require that unnecessary or redundant steps be sought out and eliminated. By minimizing the steps in a process, both control and monitoring become easier.

Finally, quality requires simplification. In the case of data security there are many tools that can be used to report on problems. However, these tools generally require a great deal of systems and security knowledge to be of any use. In fact, the security audit landscape is littered with tools that have proven too complex or cumbersome to use. In this case, the goal was to provide a tool simple enough for a clerk to perform the audit.

Application of TQM to Data Security

The goal of security administration is to assure that users have access only to the information they need and that data can be counted upon to have integrity. Data security administrators do three things to accomplish this. They issue commands to computer subsystems that control the operation of the security subsystem, they establish security structures such as user groups, and they grant users access to information. For the purpose of TQM, these security administrator actions can be viewed as a discrete manufacturing process where the product being manufactured is resource access. 

Pursuing this analogy, each item coming off the assembly line can be measured for quality. For example, assume that a security administrator is asked to grant several users access to an engineering release system. In this example, assume that the users already have userids and passwords on the systems in question and that their managers authorized the users as needing access (i.e., they are approved users). Then one of four things might happen:
      
a. No Defect: Correctly granting permission with no side effects (nothing else was erroneously changed).

The administrator verifies or sets up a user group that has access to the required resources (dataset profiles and general resource profiles). The administrator issues commands that connect the approved users to the group.

b. No Failure but with Defective Side Effect: Permission granted but with some other erroneous change.

The administrator verifies or sets up a user group that has access to the required resources plus a number of other sensitive resources that are not required. The administrator issues commands that connect approved users to the group. The result is that the approved users get the access they need. The defective side effect is that they also get access to information they should not have.

c. Failure with Defective Side Effect: Failure to correctly grant permission with some other erroneous change.

The administrator sets up a new user group but issues commands that permit the group access to an incorrect set of resources. The administrator issues commands that grant the approved users plus a number of unapproved users access to the resource group. The result is that users do not get access to information they need. The side effect is that a number of users get access to information they should not have.

d. Failure without Side Effect: Failure to correctly grant permission with no other erroneous changes.

The administrator sets up a new resource group but fails to issue the commands to permit the group access to the required resources. The result is that users do not get access to information they need.
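For concreteness, scenario a might correspond to a command sequence along the following lines. This is a sketch assuming RACF TSO commands, with invented group, profile, and user names:

```
ADDGROUP ENGREL DATA('Engineering release access')
PERMIT 'ENG.RELEASE.**' ID(ENGREL) ACCESS(READ) GENERIC
CONNECT (USERA USERB) GROUP(ENGREL)
```

Scenario b would arise if ENGREL already held permissions on unrelated sensitive profiles; scenario d, if the PERMIT step were omitted.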

Three of the four administration scenarios result in some error. In scenarios c and d, the users will not have been given the required privileges; one or more of these users will almost certainly report the condition. These may be considered self-reporting errors. In contrast, in scenario b the users have no reason to know of or report the error. This is a non-reporting error. Furthermore, actions taken to correct scenario c may easily lead to a scenario b situation; thus scenario c should be regarded as only a partially self-reporting error. These scenarios are summarized in the following figure.

In considering the process of detecting both errors and side effects, the view of data security administration as a manufacturing process is especially enlightening. While it may seem that defects in material and workmanship are obvious to those who manufacture physical objects such as automobiles, in truth the unmonitored assembly line deteriorates rapidly to the point where the product is shoddy. Likewise, while it may appear that self-reporting security administration errors would be readily detected and corrected, thereby assuring at least some baseline level of quality in security administration, such is not the case. In fact, where self-reporting is relied upon, the number of errors generally increases over time, and problems in the administration process are rarely, if ever, addressed.

Moreover, self-reporting simply cannot be relied upon to detect the errors caused by side effects. This is troublesome because side effects open trap doors through which unscrupulous individuals may easily gain access to sensitive applications and data. In an environment where side effects are not detected, they are likely to increase in frequency over time, just as the unmonitored assembly line produces products of less and less quality.

However, it is not enough to state that automobiles be manufactured to world-class levels of quality, or even that there will be fewer than ten defects per automobile coming off the assembly line. These are really only goals which, if not achieved, offer little information on the causes of failure. Instead, quality measures for automobiles must be stated in terms of details that are measurable, such as allowable tolerances for the fit and finish of body panels or the pressure produced by a fuel pump.

Likewise, the foundation of an effective data security TQM program is the adoption of a thorough and specific set of standards. Policy statements will not do the job. For instance, on an IBM mainframe, the policy might be: “only security officers and selected system programmers shall be granted system security privileges”. However, the standard would have to list exactly which users are qualified to have these privileges based on the policy. Then the state of the security system can be evaluated by determining discrepancies from this list.
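Measured this way, evaluation reduces to a comparison between the standard's list and the observed state of the system. A minimal sketch in Python, with invented user IDs:

```python
# The "authorized" set is the standard's list of users qualified to hold a
# given system privilege; the "actual" set comes from a daily extract of
# the security database. All names here are invented for illustration.

authorized = {"SECOFF1", "SECOFF2", "SYSPRG1"}
actual = {"SECOFF1", "SECOFF2", "SYSPRG1", "TEMP9"}

violations = actual - authorized  # privileged users outside the standard
stale = authorized - actual       # authorized users who no longer hold it

print("Discrepancies:", sorted(violations))
```

Every discrepancy is then a concrete, reportable item rather than a judgment call against a vaguely worded policy.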

The audit software provided a quick way to develop baseline standards that were very specific and contained thousands of items. Items in the standards included, for example, data sets that were considered sensitive, users that should have system privileges, and modules that were permitted to have the security bypass parameters. These were all reviewed in a series of meetings over several months. Once validated, the customized standards were stored in a secure database for use in the quality measurement process.

Tools for Achieving a Secure System

As with most organizations that take an ad hoc approach to security quality, Ford and SunTrust used MVS and RACF tools to report on a variety of security parameters. These reports were run on a regular basis; many were long, and some contained complex information that had to be interpreted. Given the limited time that analysts could devote to reviewing these reports, many simply could not be analyzed. Furthermore, there were no specific or formal standards for judging whether the reports reflected an increase or a decrease in security quality.

To move beyond the ad hoc quality management stage, Ford determined that it had to automate the audit. This resulted in the development of MASE, a computer program that supports the continual self-audit process. MASE performs the following functions:

MASE automates the process of developing and managing security standards. A system extract is used to populate an initial load of the AuditStar standards. These can be edited within AuditStar and supplemented by manually entered standards.

Once the standards are certified, MASE gathers daily extracts of all required security information from each monitored system. These extracts are gathered on the mainframe and then converted and loaded into the MASE database. As each daily extract is loaded into MASE, a deviation analysis is automatically performed. The result is a series of summary and detailed discrepancy reports. These reports list all items that do not meet standards and all discrepancies that have been resolved.
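The heart of the deviation analysis can be pictured as a set difference between consecutive daily discrepancy sets. The sketch below is an assumption about the general approach, not MASE's actual schema; the item names are invented.

```python
def deviation_analysis(yesterday, today):
    """Diff two daily discrepancy sets: items newly failing the standard,
    and discrepancies resolved since the previous run."""
    new = today - yesterday
    resolved = yesterday - today
    return new, resolved

# Invented discrepancies as (item, detail) pairs.
yesterday = {("USER7", "system privilege not in standard")}
today = {("USER7", "system privilege not in standard"),
         ("SYS1.LINKLIB", "universal access is UPDATE")}

new, resolved = deviation_analysis(yesterday, today)
```

Running this diff as each extract is loaded yields exactly the two report types described above: items that do not meet standards, and discrepancies that have been resolved.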

Finally, MASE manages the process of resolving discrepancies. The security administrator uses MASE to acknowledge or close discrepancies, and to check on historical trends.
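The discrepancy lifecycle might be modeled along the following lines; the states and fields here are hypothetical, not MASE's actual design.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Discrepancy:
    """One item from the daily deviation report, tracked until closure."""
    item: str
    detail: str
    opened: date
    status: str = "OPEN"  # OPEN -> ACKNOWLEDGED -> CLOSED

    def acknowledge(self):
        # An administrator has seen the item and accepted ownership of it.
        if self.status == "OPEN":
            self.status = "ACKNOWLEDGED"

    def close(self):
        # The underlying condition was corrected; the record is retained
        # so that historical trends can be charted.
        self.status = "CLOSED"

d = Discrepancy("SYS1.PARMLIB", "universal access is UPDATE", date(1998, 3, 2))
d.acknowledge()
d.close()
```

Retaining closed records is what makes the historical trending possible: the count of open versus closed items per day is itself a quality measure.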

By installing MASE, both Ford and SunTrust were able to automate the entire TQM lifecycle for IBM/MVS RACF, auditing security through repeatable, defined, and managed processes. The result is better security and cleaner audits with much lower costs of managing compliance.