By Mark McCulley, Business Owner, Experienced ISO Auditor, Trainer, and Presenter
The Root Cause
When weeds grow in a lawn or garden, it’s not enough to cut the top off so you no longer see the weed. The only way to permanently eradicate the weed is by pulling up or destroying the entire root. The same applies in many disciplines: if we do not find the true cause/s for a problem, we will meet it again, and will be forced to spend even more time and effort to solve it.
- Medical: A patient in surgery has a perfectly healthy kidney removed instead of the diseased one; this is called “wrong-site” surgery.
- Manufacturing: A complex sheet-metal part was bent backward, that is, with the ‘good side’ facing the wrong way. The first article had been approved, yet several hundred pieces were then manufactured backward, and this was the third time this nonconformance had happened.
Stopping repetitive errors is the goal of Root Cause Analysis, or RCA. This process is used in many industries, including manufacturing, food production and distribution, medical science, and various types of service; and many disciplines, including sales, customer service, engineering, and management.
Root Cause: the underlying fault or weakness in a process or system that triggers an undesired event or result (“Nonconformance”). More than one cause is usually sought, thus “cause/s” is used herein.
Root Cause Analysis: a process, using a set of defined tools, used in investigating and categorizing cause/s of a nonconformance. Results of RCA are used in testing and validation of cause/s — known in ISO standards as “verification of effectiveness” of Corrective Actions.
Fault: a wrong instruction or directive in documentation of a process or procedure.
Weakness: an unclear or missing instruction or directive, resulting in more than one possible decision or action, some of which will result in a non-conformance.
Nonconformance: the non-fulfillment of a requirement; can be a product (tangible), service (customer-facing) or policy violation (internal issue such as in health and safety).
Corrective Action: (CA) an action to correct the Root Cause/s of a nonconformance, thus preventing a recurrence.
Correction: To replace the nonconforming product or service. This treats the effect only.
Containment: actions to ensure that nonconforming products or services do no harm, or do not reach the customer. This treats the effect only.
Establish the Team
Step 1 is to establish the team:
Choosing the most appropriate team members is crucial to finding the root cause/s. The person or team responsible for the process in which the nonconformance occurred or was detected (not always the same process) should form the core of the team, but should not be tasked with the solution on their own. In medical practice, there are times when those persons are not allowed to be on the team.
Decision-makers whose work contributes to or is affected by the product or service at hand may be added.
A solely top-management committee is ineffective; the team must have direct input from the residents, nurses or other front-line workers who are the users of current procedures and will have to implement any changes.
Keep the team lean and nimble, and remember the old management adage: “A camel is a horse designed by a committee.”
With internet connectivity, the team is not required to be in the same room. A few online collaborative solutions are listed in the Appendix.
“There are three kinds of leaders. Those that tell you what to do. Those that allow you to do what you want. And lean leaders who come down to the work and help you figure it out” – John Shook
Define the Problem
Step 2 is to define the problem:
A good problem definition, or “problem statement,” is best written as a few clear and concise sentences stating the occurrence as it was observed and its immediate effects.
The problem statement should include:
- What happened
- Where it happened
- The date and time it happened
- When it happened (in the sequence of events or the step in the process)
- How it was observed, measured, or calculated
- What was affected
- Who was involved
- What standard was affected or failed to be met
Let’s look closer at each one…
Sample Problem Statement
A fictional example from St. Hope Medical Clinic in Athabasca, Manitoba: “An overdose of Heparin was given to a patient in NICU, at 9:18 AM, shortly before shift change. After the error was discovered, a corrective dose of protamine sulfate was administered by drip. Nurse C. Osborne and PA E. Smith were the staff on duty. They admitted they had failed to cross-check the prescribed Heparin dosage before administering it to the patient.”
Based on this example, would you be able to begin investigating a root cause?
Containment vs. Corrective Action: Immediate containment actions for the incident can contribute to a sound RCA but do not solve the problem! In the above example, how soon a staff member noticed the problem, how long it took to administer the corrective dose, and how soon the patient recovered, are useful at this stage only in ensuring that everyone practiced the types of observation and data recording required by the written procedure.
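The elements of a problem statement listed earlier in this step can also be kept as a simple checklist, so the team can see at a glance what a draft still lacks. The following is a minimal sketch, not a prescribed form: the class name, field names, and the sample draft are all hypothetical, chosen only to mirror the list of elements.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProblemStatement:
    """One field per element of the problem statement (names are illustrative)."""
    what_happened: Optional[str] = None
    where: Optional[str] = None
    date_and_time: Optional[str] = None
    process_step: Optional[str] = None       # when it happened in the sequence of events
    how_observed: Optional[str] = None       # observed, measured, or calculated
    what_was_affected: Optional[str] = None
    who_was_involved: Optional[str] = None
    standard_not_met: Optional[str] = None


def missing_elements(statement: ProblemStatement) -> List[str]:
    """Return the names of the elements that are still empty."""
    return [f.name for f in fields(statement) if not getattr(statement, f.name)]


# Hypothetical draft: only two of the eight elements have been filled in.
draft = ProblemStatement(
    what_happened="Sheet-metal part bent with the good side facing the wrong way",
    where="Press brake cell",
)
print(missing_elements(draft))
```

A draft with a long list of missing elements is usually a sign that more facts need to be gathered before the investigation of cause/s begins.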
“If you can’t explain it simply, you don’t understand it well enough” – Albert Einstein
Rules and Tools for Defining Cause
Like any discipline, this one has rules and tools that save a lot of mistakes and lost time. Here are a few.
5 Causal Rules: For team discussions and all forms of RCA:
- Clearly show the cause-and-effect relationship; that is, show that the proposed cause in fact caused, or could have caused, the nonconformance. If the proposed cause is repeated, does the same effect recur, or is it highly likely to recur?
- Use specific, accurate, factual descriptions of what occurred; opinions, statements in the negative, and vague pronouncements don’t lead to solutions.
- Identify a system cause, not a human cause; beginning with blame on a person does not lead to finding the error or breakdown within the system.
- Identify the preceding cause of a procedure violation; what earlier steps contributed? A procedural error has a preceding cause. (For example, data was not entered on time, so it was not available for the next operation, where the nonconformance occurred.)
- Failure to act is only causal when there is a pre-existing duty to act. Lack of a procedure, not ignorance of a procedure, creates a system fault leading to a nonconformance.
Discussing possible causes with others in a cross-functional group helps gain an overview of the problem and aids in a more complete solution.
In brainstorming, every possible idea is proposed, continuing until nobody can add any more ideas. Categorizing, analyzing and weighting the different possibilities is the next step, before beginning to eliminate those that do not seem related.
Professor Kaoru Ishikawa began using the fishbone diagram to break down problems into cause and effect, and include contributing causes, in categories for clearer examination of relationships and relative importance. This tool can be used in brainstorming possible causes with your team, categorizing them and homing in on most-likely causes.
The diagram looks like a fish skeleton, with a box on the right holding the summarized problem statement (visually, the causes flow into the event). Along the spine, the bones branch out at intervals, labeled with the categories (listed below). Along each bone is space to write specific potential causes. A sample is included in the Appendix. If you prefer not thinking about fish skeletons, the same effect can be created from the outline of a tree, with each major branch as a category and sub-branches listing the possible causes.
In service industries, 6 S’s are used: Surroundings, Suppliers, Systems, Standard Documentation, Skills, and Scope of work.
In manufacturing, six “bones” are used: Machine, Material, Method, Person, Measurement, and Environment (which includes available information and the cultural environment).
In marketing, 8 P’s are used: Product/Service, Price, Place, Promotion, People, Process, Physical evidence, and Packaging.
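For teams working remotely, the fishbone can be captured as plain data and printed as an outline, much like the tree alternative described above. The following is a minimal sketch using the six manufacturing categories; the function names and the two sample causes are hypothetical and only illustrate the structure.

```python
# The six manufacturing "bones" named in the text.
MANUFACTURING_BONES = [
    "Machine", "Material", "Method", "Person", "Measurement", "Environment",
]


def new_fishbone(problem, categories):
    """Start an empty diagram: one problem statement, one empty bone per category."""
    return {"problem": problem, "bones": {category: [] for category in categories}}


def add_cause(fishbone, category, cause):
    """Write a potential cause along the bone for the given category."""
    if category not in fishbone["bones"]:
        raise KeyError(f"Unknown category: {category}")
    fishbone["bones"][category].append(cause)


def as_outline(fishbone):
    """Render the diagram as an indented outline (the 'tree' form of the fishbone)."""
    lines = [f"Problem: {fishbone['problem']}"]
    for category, causes in fishbone["bones"].items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


# Hypothetical brainstorming results for the sheet-metal example.
diagram = new_fishbone("Part bent with good side facing the wrong way", MANUFACTURING_BONES)
add_cause(diagram, "Method", "Drawing does not mark the good side")
add_cause(diagram, "Measurement", "First-article check does not verify bend direction")
print(as_outline(diagram))
```

Keeping every category visible, even when it has no causes yet, reminds the team which areas have not been examined.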
Another common tool is the 5-why causal analysis. For each possible cause uncovered in brainstorming or on the Ishikawa diagram, ask your team 3 to 5 (or more) “Why” questions. Each question should dig deeper into the source of the nonconformance, in order to differentiate between the initial cause and a mere effect.
The Titanic Syndrome
“The Titanic sank and 1,500 people died” states two issues, not one. The second may be seen as an effect of the sinking, but was the loss of life caused by the sinking itself, or by something else? How are the two related? The loss of life deserves its own RCA. In your problem statement and in your RCA, you may find there is more than one effect, thus more than one problem, and thus more than one RCA required.
“Where leaders spend their time determines what is important to their organization!” – Mike Stickler
Use of Data Tools
What if a perceived problem is not a single incident but a trend? For instance, a hospital observes a 25% reduction in patient satisfaction, as defined by post-release surveys, in the past six months. RCA necessarily involves some degree of analysis of data gathered by the system either within a database or by a check sheet (a form used to collect data in real time). Here are some examples of data tools:
- Scatter graph: Data patterns will sometimes appear to be random. A scatter graph (or plot or chart) plots system data on an XY graph with one variable on the horizontal axis and one on the vertical axis. Control parameters can be used to isolate one of the variables, and to isolate causation. If no control parameter exists (or is possible), the chart shows correlation, not causation, but it can help in guiding toward a Root Cause.
- Control chart: a simple graph derived from system data showing upper and lower acceptable limits, compared to the design standard, and how outputs over time measure against those limits. Variations in outputs – for instance, time to respond to patient requests, or size ranges in manufacturing – show up as on, above or below the desired level and limits.
- Standard Deviation: this measures the amount of variation in a set of values pulled from system data. It is expressed as a single number: the square root of the variance, that is, of the average squared deviation of the values from their statistical mean. This tool can show whether the variability within the data is acceptable compared to the expected result, although it will not, by itself, point to a Root Cause.
- Stratification: this tool helps analyze data by first sorting it into ‘subpopulations’ or subgroups, which can then be examined separately. An example in medicine would be sorting patient populations over time by age, by presenting condition, or by date first seen; in manufacturing, by type of part, material, or customer; and there are many more for any discipline. However, each data point must belong to only one subgroup.
- Pareto principle and chart: this could be stated as “80% of improvement is achieved by correcting 20% of the possible errors.” A Pareto Chart is often used to summarize data visually, with bars sorted from the most frequent category to the least. If your analysis of data during RCA uncovers one or two most-likely causes, don’t hesitate to start on those instead of waiting for a possible future full set of causes. (And yes, the split can be 60/40, 70/30, 80/20, or 90/10.) A similar tool is a Histogram, which groups data by ranges and charts how many values fall in each range.
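Several of the tools above come down to a few lines of arithmetic. The following is a minimal sketch in Python using only the standard library. All data values and function names are hypothetical, and the control limits here are assumed to sit at the baseline mean plus or minus three standard deviations, which is one common convention; your own limits may instead come from the design standard.

```python
from collections import Counter, defaultdict


def std_dev(values):
    """Population standard deviation: square root of the average squared deviation."""
    center = sum(values) / len(values)
    variance = sum((v - center) ** 2 for v in values) / len(values)
    return variance ** 0.5


def control_limits(baseline, sigmas=3):
    """Assumed convention: limits at the baseline mean +/- `sigmas` standard deviations."""
    center = sum(baseline) / len(baseline)
    spread = std_dev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def outside_limits(values, lower, upper):
    """Return (position, value) pairs that fall below or above the limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


def stratify(records, key):
    """Sort records into subgroups; each record lands in exactly one subgroup."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def pareto(causes):
    """Rows of (cause, count, cumulative percent), most frequent cause first."""
    counts = Counter(causes).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for cause, count in counts:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical data: minutes taken to respond to patient requests.
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = control_limits(baseline)
print(std_dev(baseline))                          # 2.0
print(outside_limits([5, 12, 3], lower, upper))   # [(1, 12)]

# Hypothetical check-sheet records, stratified by ward.
records = [{"ward": "A", "minutes": 4}, {"ward": "B", "minutes": 12}, {"ward": "A", "minutes": 6}]
print(sorted(stratify(records, "ward")))          # ['A', 'B']

# Hypothetical tally of cause categories from past nonconformance reports.
tally = ["unclear label"] * 5 + ["missing cross-check"] * 2 + ["handoff gap"]
print(pareto(tally))
```

In the last example the first category alone accounts for 62.5% of the tallied reports, which is the kind of signal that justifies starting on one or two most-likely causes first.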
Testing for true Root Cause
One way to test for a true Root Cause is to think of it as “the first domino to topple” and to show that other causes followed as effects. A functional test traces directly from the proposed Root Cause, step by step, through the actual occurrences in the event (uncovered by the 5-Whys tool during your investigation), leading directly to the reported nonconformance. If the effects branch off to some other result, examination of assumptions and data is indicated.
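The domino test can be sketched as a walk along recorded cause-and-effect links. This is a minimal illustration, assuming each cause has a single direct effect; the function names and the sample chain are hypothetical.

```python
def trace(chain, start, max_steps=50):
    """Follow cause -> effect links from `start` until no further effect is recorded.

    `chain` maps each cause to its direct effect. `max_steps` guards against
    a circular chain, which would otherwise never end.
    """
    path = [start]
    while path[-1] in chain and len(path) <= max_steps:
        path.append(chain[path[-1]])
    return path


def leads_to(chain, proposed_root_cause, nonconformance):
    """True if the dominoes fall from the proposed cause to the reported nonconformance."""
    return trace(chain, proposed_root_cause)[-1] == nonconformance


# Hypothetical chain assembled from 5-Whys answers.
chain = {
    "No rule for when data must be entered": "Data entered late",
    "Data entered late": "Data unavailable at the next operation",
    "Data unavailable at the next operation": "Operator used an outdated value",
    "Operator used an outdated value": "Nonconforming part shipped",
}

print(trace(chain, "No rule for when data must be entered"))
print(leads_to(chain, "No rule for when data must be entered", "Nonconforming part shipped"))
```

If `leads_to` returns False, the chain ends somewhere other than the reported nonconformance, which is the signal to re-examine assumptions and data.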
In the legal world, the Root Cause roughly corresponds to the “proximate cause” or, less formally, the “but-for” cause. According to the old proverb, “for lack of a nail, a shoe was lost; for lack of a shoe a horse was lost” and so on until the entire war and kingdom were lost, “all for the lack of a nail.” The proximate cause is the factor ‘but for’ which the incident would not have occurred. If that statement is shown to be true, it indicates a Root Cause.
Is the Root Cause Actionable? The goal is to stop repetition of the error, so implementing a solution that changes what led to the event is mandatory. Therefore, a Root Cause must be actionable: rather than “human error,” which cannot be prevented, a Root Cause must describe some flaw in the system. The Corrective Action must directly address the Root Cause and, when tested, must be shown to remedy the original problem and prevent its recurrence.
Questionable Root Causes
These tend to be stated in RCA when “why” questions, or data analysis, were lacking.
Employee failed to follow procedure: we could assume that happens frequently. But why is that the case? Laziness, lack of caring, illness, drug use? How would we know?
Training not effective: The nonconformance happened due to an employee not correctly following a procedure, so the assumption is their training was not effective and needs to be repeated. But why was the training ineffective, and why was that not discovered sooner?
Employee assumed: a better word is ‘guessed’ which is a cause; but what is the cause behind that cause? Are employees making other assumptions that will result in other nonconformances? Why is that happening?
Procedure out of date: why was that? Multiple causes could exist such as lack of internal audit to detect the issue, or a deeper cause of failure to keep procedures up to date as practice changes. Merely changing one procedure will not prevent other nonconformances stemming from lack of maintenance of procedures and work instructions.
“Water seeks the lowest level” is true in business as well as nature. Culture, which is either created and guided by top management or allowed to be created by every department, is a key element in avoiding nonconformances or, put another way, in continually improving. Looking at the above list of “Questionable Root Causes,” each of these could be traced back to a culture of getting by with the minimum. Correcting errors by “FISI,” or “Fix It and Ship It,” which is commonly seen in manufacturing, is like trying to treat a migraine with a bandage.
Top management must set the standard of devoting enough time and energy to create and maintain a company culture of “only the best” whether in patient care, making burritos, or creating parts for aerospace. Even a good RCA investigation team cannot do its job well if being hurried along or sidelined to solve the latest emergency.
In many ways, RCA could always lead back to a lack of top-management commitment. Yet, that in itself is a simplistic answer. Why is that the case? What additional data does top management need to see and understand from within the system, to see the results of a FISI culture? How would that problem be resolved, permanently? And who will be responsible for solving it?
Good questions, these and many more. Perhaps you will be someone who can answer them.
“I took the one less traveled by, and that has made all the difference.” – Robert Frost