The production line can be a fast-moving, complex environment. When one problem arises, it can quickly cascade, causing bottlenecks, product defects, or full-on manufacturing stoppages. While it’s critical to address the issue in the moment, smart manufacturers want to identify the true source of the problem so that it doesn’t keep happening. A root cause analysis (RCA) helps them do exactly that.

This article details the necessary steps involved in RCA, the techniques and tools that can be applied, the commonly encountered challenges, and best practices for performing this critical detective work. Equipped with this knowledge, manufacturers can develop RCA as a valuable core competency.

What Is a Root Cause Analysis?

An RCA is a systematic process for identifying the fundamental reason for a particular problem. In the context of manufacturing, such an investigation is used to identify the true origin of product defects, machine failures, or other issues in production.

Rather than applying a Band-Aid, the ultimate goal of an RCA is to develop and roll out a solution to the problem’s underlying cause to stop it at its source and prevent recurrence. By doing so, a manufacturer can improve product quality, increase production reliability, reduce waste, and preserve customer satisfaction.

Key Takeaways

  • Manufacturers perform RCAs to identify and address the core reason for a production or product issue.
  • Implementing an RCA helps companies get to the bottom of manufacturing issues to prevent recurrence.
  • RCA benefits include reductions in cost and downtime, increases in safety and productivity, and improved processes.
  • Manufacturing teams can adopt a variety of techniques and best practices that underpin a data-driven analysis of failures.
  • They must also be aware of common obstacles that can impede corrective action.

Root Cause Analysis in Manufacturing Explained

RCA is an analytical process in which companies deal with inevitable production disruptions as they occur, then trace them back to a particular base issue (or issues) and resolve it to improve operational effectiveness going forward. Manufacturers typically initiate an RCA when they run into failures or issues related to their products, equipment, or processes, including:

  • Shop floor disruptions: Unanticipated bottlenecks, slowdowns, or stoppages on the production line that hamper workflow efficiency or output.
  • Product defects or failures: Failed quality checks or recurring product defects that result in customer complaints, regulatory concerns, or recalls.
  • Machinery breakdowns: Failures in equipment that lead to unplanned maintenance issues or downtime.
  • Safety issues: Workplace accidents or other shop floor concerns that place employee safety or regulatory compliance at risk.
  • Recurring problems: Repeated or systemic issues that suggest previous solutions did not address the foundational cause.

A standardized RCA approach can illuminate the causes of any number of adverse events and flaws in the system that precipitated them. It also involves developing prevention strategies for the future in the form of, for example, process improvements, improved training, or new machinery.

RCA is recognized as a critical component of many manufacturing process improvement approaches, such as Six Sigma and lean manufacturing. When developed as a core competency, RCA provides a clear structure that minimizes the time needed to trace a problem back to its source, preventing issues from being swept under the proverbial rug only to pop back up again.

How to Perform a Root Cause Analysis

Any manufacturing firm that wants to reduce downtime and costs, improve product quality, increase safety, and foster an environment of continuous improvement should understand how to perform an RCA. The specific tools, approach, and degree of detail will vary by industry, company, or type of problem; however, the core RCA process will be consistent across most scenarios.

1. Describe the Problem

The first step is to clearly and objectively define the evident issue. This description should include what happened, when, where, and the impact. Specifics and facts are important for accuracy; assumptions, generalizations, and ambiguity could skew analysis. Ask clarifying questions to make sure that those involved fully understand the problem and its scope.

2. Assemble Your Data

Next, gather relevant information, such as maintenance records, process data, operator logs, and statements on environmental conditions. It’s critical to be thorough, which will likely require multiple sources of data, including digital records, employee interviews, and physical evidence. Remember: Garbage in, garbage out—in other words, inaccurate or incomplete data will yield incorrect conclusions.

3. Narrow Down a Few Potential Causes

At this point, investigators can winnow the possible foundational issues that led to the problem or incident. Cross-functional teams with diverse perspectives and skills can think through possible contributing factors, using structured techniques (explained in the next section of this article). Be careful not to jump to a sole conclusion without clear evidence.

4. Investigate Each Potential Cause

The RCA team can now analyze each possible root cause and estimate the likelihood that it sparked the issue. Again, the team should use established tools and logic to rule potential causes in or out and avoid traps such as confirmation bias (focusing on causes or evidence that align with one’s preexisting beliefs) or ignoring data that is inconvenient or takes extra time to review.

5. Identify the Root Cause

Having tested the possibilities and gathered evidence, the team is now ready to pinpoint the primary root cause or causes of the issue. Applying the techniques explained in the next section can help investigators push through any intermediate tangles to uncover what’s fundamentally wrong and prevent recurrence of the problem.

6. Create a Plan to Correct the Issue

Now it’s time to develop a corrective action plan. This strategy should include specific actions that will rectify the core issue, not just its symptom(s). Include all key stakeholders in developing the plan to ensure viability and sustainability. Overly complex or expensive corrections are unlikely to be implemented.

7. Correct the Issue

With the plan in hand, attention turns to implementation. This may include changing processes, buying or fixing equipment, updating procedures, or providing training. Any changes should be communicated clearly and assigned to a named owner. Lack of resources can result in ineffective implementation of even the best-laid plans.

8. Monitor the Results

Follow-up is essential to confirm that the corrections have been made and are having the desired results. Moving on or ignoring new data that comes in can result in the original issue reappearing. Instead, the team should track results over time through the use of key performance indicators (KPIs) and audits, making changes as required. Leading manufacturers take this a step further to measure the impact of the RCA program overall—for example, the number of RCA investigations conducted, the percentage that led to successful corrective actions, and any reduction in equipment downtime, quality defects, or cost savings achieved as a result.

Root Cause Analysis Techniques and Methods

Most manufacturers standardize RCA by establishing structured techniques and methods that identify, analyze, and address the underlying causes of product or process problems or failures. By adopting the most appropriate methodology, manufacturers can develop a thorough, evidence-based RCA practice that yields more effective and efficient problem-solving and ongoing production improvements.

Five Whys

This aptly named approach involves an investigation team asking “why?” at least five times, beginning with a question about the problem’s surface symptoms and then challenging subsequent responses. This method pushes teams to move beyond superficial explanations or contributing factors to zero in on the real source of an issue. For example, if a company’s laser-cutting machine keeps failing, the five questions might be:

  1. Why did the laser-cutting machine stop working? (Answer: An overload blew a fuse.)
  2. Why was there an overload? (Answer: The bearing was not adequately lubricated.)
  3. Why was the bearing not lubricated? (Answer: The lubrication pump was not working sufficiently.)
  4. Why was the lubrication pump not working sufficiently? (Answer: The shaft of the pump was worn and rattling.)
  5. Why was the shaft worn and rattling? (Answer: The strainer was not attached, allowing metal scraps to get in.)

The Five Whys method typically works best for simple problems, where the cause-and-effect relationship is more straightforward, or later on in the RCA process after many other root causes have been ruled out. The selected why questions should be based on evidence gathered during the investigation, not on hunches or previous experiences.
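As a rough illustration, the laser-cutter chain above can be recorded as an ordered list of question/answer pairs, with the answer to the last “why” treated as the candidate root cause. The list structure and the `candidate_root_cause` helper are assumptions made for this sketch, not part of any standard tool.

```python
# The Five Whys chain from the laser-cutter example, stored as (question, answer) pairs.
# The data structure is an illustrative assumption.
five_whys = [
    ("Why did the laser-cutting machine stop working?",
     "An overload blew a fuse."),
    ("Why was there an overload?",
     "The bearing was not adequately lubricated."),
    ("Why was the bearing not lubricated?",
     "The lubrication pump was not working sufficiently."),
    ("Why was the lubrication pump not working sufficiently?",
     "The shaft of the pump was worn and rattling."),
    ("Why was the shaft worn and rattling?",
     "The strainer was not attached, allowing metal scraps to get in."),
]


def candidate_root_cause(chain):
    """Return the answer to the last 'why', the deepest cause reached so far."""
    if not chain:
        raise ValueError("the chain needs at least one question/answer pair")
    return chain[-1][1]


for depth, (question, answer) in enumerate(five_whys, start=1):
    print(f"{depth}. {question} -> {answer}")
print("Candidate root cause:", candidate_root_cause(five_whys))
```

Keeping the chain as data makes it easy to attach the supporting evidence for each answer later, which is what keeps the exercise from drifting into hunches.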

Fishbone Diagram

Also known as an Ishikawa diagram, a fishbone diagram categorizes and maps out the possible causes of a problem, such as equipment failure. Categories might include “method,” “machines,” “materials,” “manpower,” and “environment.”

This fishbone diagram maps the potential root causes of equipment failure. Each “bone” represents one of six categories—machine, method, worker, material, environment, and measurement—with specific contributing factors branching off.

A fishbone diagram works well for complex manufacturing problems that may have many potential causes; it encourages teams to delve thoroughly into multiple contributing factors to get to the source. Building the diagram is itself a useful cross-functional brainstorming exercise.

Fault Tree Analysis

Another visual tool is the fault tree, which RCA teams can use to analyze and deduce the source of an issue by working from the top down. This process begins with describing the adverse event (for example, the aforementioned equipment failure), then creating a series of statements to be proved true or false using a tree-like logic diagram. These statements or events are arranged in a sequence of series relationships (x “or” y) and parallel relationships (x “and” y), using logic symbols to illustrate dependencies among events.

This fault tree analysis breaks down the top-level issue, equipment failure, into four contributing causes: mechanical failure, electrical failure, operator error, and environmental conditions. The analysis also maps how these failures interact. An “and” gate means that all listed events must occur to cause the failure. An “or” gate means that any single contributing event can trigger the next-level failure.

Fault trees can be particularly beneficial when analyzing complex manufacturing environments where one failure creates potential ripple effects throughout the system. Some manufacturers prefer to take a proactive approach, using fault tree analysis early in product development and design to anticipate and address problems that may come up during production.
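The gate logic can be sketched in a few lines of code. The example below is a minimal illustration, assuming a tree made of basic events (true or false findings from the investigation) combined through “and” and “or” gates. The four second-level causes come from the fault tree described above; the leaf events and their true/false values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A basic event (leaf) that the investigation has proved true or false."""
    name: str
    occurred: bool

    def evaluate(self) -> bool:
        return self.occurred


@dataclass
class Gate:
    """A higher-level event fed by child events through 'and' / 'or' logic."""
    name: str
    kind: str  # "and" or "or"
    children: List[object] = field(default_factory=list)

    def evaluate(self) -> bool:
        results = [child.evaluate() for child in self.children]
        if self.kind == "and":
            return all(results)  # every child event must occur
        if self.kind == "or":
            return any(results)  # a single child event is enough
        raise ValueError(f"unknown gate kind: {self.kind}")


# Hypothetical leaf events under two of the four contributing causes.
mechanical = Gate("mechanical failure", "and", [
    Event("bearing worn", True),
    Event("lubrication missed", False),
])
electrical = Gate("electrical failure", "or", [
    Event("fuse blown", True),
    Event("wiring damaged", False),
])
top = Gate("equipment failure", "or", [
    mechanical,
    electrical,
    Event("operator error", False),
    Event("adverse environmental conditions", False),
])

print("Mechanical failure path active:", mechanical.evaluate())
print("Electrical failure path active:", electrical.evaluate())
print("Top event explained:", top.evaluate())
```

In this made-up case, the mechanical branch is ruled out because only one of its two required events occurred, while the electrical branch alone is enough to explain the top-level failure.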

Is/Is Not Analysis

An “is/is not analysis” is a coordinated approach to eliminating irrelevant issues that narrows down the options in a root cause investigation. Especially useful when the production problem is unclear or has blurry boundaries, this approach helps the team define a problem (what it is and what it is not), as well as other details, such as where and when it occurs (and where and when it does not).

The is/is not method involves creating a clear problem statement, noting what is or is not part of the issue. Then the team creates two columns: an “is” column for all factors that are definitely part of the issue and an “is not” column for all factors that are definitely not part of the issue. By analyzing the two columns, the team can begin to identify patterns, anomalies, and possible root causes of the problem.

For example, if a plant has been experiencing equipment failure, the “is” column may include the involved machine models, the time of day or location at which the problem occurs, and the work being done at the time. Conversely, the “is not” column might note uninvolved machine models, the times of day or locations not experiencing the issue, and the types of work processes that are not involved.

Effective application of is/is not analyses demands specificity and objectivity when defining the “is” and “is not” parameters. Don’t skip past any key distinctions.
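One way to picture the comparison of the two columns is as a set operation: a factor is a useful distinction when the values seen where the problem occurs never show up where it does not. The sketch below assumes that reading; the factor names, values, and the `distinguishing_factors` helper are all hypothetical.

```python
# Hypothetical observations for an equipment-failure investigation.
# "is": conditions where the problem occurs.
is_observations = {
    "machine model": {"Model A"},
    "shift": {"night"},
    "location": {"Line 2"},
    "work type": {"cutting"},
}
# "is not": conditions where the problem has not been observed.
is_not_observations = {
    "machine model": {"Model B", "Model C"},
    "shift": {"day", "evening"},
    "location": {"Line 1", "Line 2"},
    "work type": {"cutting", "drilling"},
}


def distinguishing_factors(is_obs, is_not_obs):
    """Return factors whose 'is' values never appear in the 'is not' column."""
    distinctions = {}
    for factor, is_values in is_obs.items():
        is_not_values = is_not_obs.get(factor, set())
        if is_values and is_values.isdisjoint(is_not_values):
            distinctions[factor] = is_values
    return distinctions


for factor, values in distinguishing_factors(is_observations, is_not_observations).items():
    print(f"{factor}: {sorted(values)}")
```

Here the machine model and the shift separate the two columns cleanly, so they are the first places to look. Location and work type appear on both sides and are less likely to explain the failure on their own.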

Pareto Analysis

Pareto analysis (or a Pareto chart) helps manufacturing teams identify the most likely “vital few” causes that are contributing to the majority of a production issue. Based on the 80/20 rule—aka the Pareto Principle—the idea is that 80% of a problem is likely caused by 20% of the causes. By zeroing in on the latter, a production team can focus its efforts on maximizing improvements. Pareto analysis, like most of the other methods outlined here, begins by clearly defining the production problem and then gathering relevant data.

For example, a manufacturer that discovers variability in the weight of produced smartphones will gather related data, such as batch records and quality control data. Next, it will identify potential root causes related to, for example, raw materials, equipment, human factors, process issues, or environmental conditions. The company can then determine the number of manufacturing errors associated with each possible cause, ranking the causes by frequency and calculating the cumulative percentage of occurrences for each one. Human factors may be responsible 40% of the time, equipment problems 20% of the time, process issues 5% of the time, and environmental issues 2% of the time.

On a Pareto chart, the possible causes appear on the x-axis, the frequency of each is noted on the y-axis, and bars are drawn in descending order of frequency. In addition, a line plotted against a secondary y-axis shows the cumulative percentage for each cause. The steepest part of the cumulative line points to the causes with the most significant impact on the problem at hand.

Pareto Chart Example

Cause                   Frequency (%)   Cumulative (%)
Human factors           40              40
Equipment problems      20              60
Process issues          5               65
Environmental issues    2               67

This Pareto chart analyzes four root causes of smartphone weight variability—human error, equipment problems, process issues, and environmental issues—and ranks them by frequency. The cumulative line shows their combined impact. Following the 80/20 rule, the chart helps manufacturers focus corrective actions on the few key factors that drive most weight variability.

Although it might seem logical to attack all possible causes, prioritizing the most impactful ones over those that rarely impact production or product is especially useful when resources are limited. Ranking is best achieved by gathering accurate data over an appropriate period of time (or number of batches).
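The ranking and cumulative columns of a Pareto table are simple to compute. The sketch below uses the four frequencies from the smartphone example; note that these sum to 67%, so the remainder is unattributed in the example. The `pareto_table` helper is a name chosen for illustration.

```python
# Share of manufacturing errors per cause, in percent, from the smartphone example.
frequencies = {
    "Human factors": 40,
    "Equipment problems": 20,
    "Process issues": 5,
    "Environmental issues": 2,
}


def pareto_table(freq):
    """Rank causes by frequency (descending) and attach a running cumulative total."""
    ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for cause, value in ranked:
        running += value
        rows.append((cause, value, running))
    return rows


print(f"{'Cause':<24}{'Freq (%)':>10}{'Cum (%)':>10}")
for cause, value, cumulative in pareto_table(frequencies):
    print(f"{cause:<24}{value:>10}{cumulative:>10}")
```

With real data, the input would normally be raw defect counts per cause, converted to percentages of the total before ranking.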

Failure Mode and Effects Analysis (FMEA)

Based on severity, occurrence, and detectability, FMEA is another method that RCA teams can use to assess and prioritize the effects of potential issues. Unlike some of the other techniques described above, FMEA is often used proactively during product or process design to develop plans for mitigating the most critical risks, as well as to install controls for detecting failure modes. Thus, it’s important to revisit FMEA when processes or products change or new data becomes available.

Though FMEA is most often considered a risk management approach, it can prove particularly useful in the context of an RCA because it produces a predetermined list of likely failure points (though teams may still want to use another methodology to dig deeper into the root causes). If a manufacturer encounters product or process issues for which it had performed FMEA, it may want to reassess its preventive actions.
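A common way to turn severity, occurrence, and detectability into a priority order is the risk priority number (RPN), the product of the three scores. The sketch below assumes each score is rated on a 1-10 scale, with a higher detection score meaning the failure is harder to detect. The failure modes and their scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 = negligible effect, 10 = most severe effect
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost certain to be detected, 10 = very unlikely to be detected

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores are assumed to be on a 1-10 scale")
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores, for illustration only.
failure_modes = [
    FailureMode("Lubrication pump shaft wear", severity=7, occurrence=4, detection=6),
    FailureMode("Strainer left unattached", severity=8, occurrence=3, detection=2),
    FailureMode("Blown fuse", severity=5, occurrence=5, detection=2),
]

# Highest RPN first: these are the failure modes to mitigate or monitor first.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn(), reverse=True)
for fm in ranked:
    print(f"{fm.name:<30} RPN = {fm.rpn()}")
```

Many teams also review any failure mode with a very high severity score regardless of its RPN, since a low occurrence score can mask a serious risk.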

Benefits of Conducting a Root Cause Analysis in Manufacturing

Performing an RCA is crucial for manufacturers if they are to isolate and tackle the underlying causes of production issues or product problems. Doing so can ultimately result in greater operational reliability, better product quality, and more satisfied customers.

But those aren’t the only benefits RCA confers. Some of the key reasons for developing a solid RCA practice include:

  • Optimized costs: Nearly three-quarters of manufacturers said they had experienced a product recall in the previous five years, according to a Hexagon/ETQ 2024 survey, costing them millions of dollars. With numbers like these, it’s clear that defective products, inefficient processes, recurring errors, and wasteful practices can easily—and severely—cut into manufacturers’ often thin margins. Add in downtime, bottlenecks, or safety issues, and cost management becomes all the more critical. Performing an RCA and implementing targeted solutions helps cut down on unnecessary expenses and improves profitability.
  • Improved processes: Process improvement is another high priority for manufacturers. By identifying and addressing root causes, companies can improve their production workflows and boost efficiency, consistency, and product quality.
  • Decreased downtime: RCA can reduce unplanned downtime on production lines by fixing the core reasons behind equipment failures, process bottlenecks, or work stoppages. Sometimes it’s a workforce issue: According to a 2024 survey conducted by L2L, 81% of manufacturers said they experienced disruptions in their plant operations due to high employee turnover.
  • Increased worksite safety: Manufacturers that employ RCA to pinpoint and prevent the underlying reasons for accidents or close calls on the shop floor or in warehouses are more likely to have safer worksites and fewer incidents or injuries. The application of RCA can also help promote a culture that values safety, which helps employees feel valued and makes the manufacturing operation a more attractive place to work.
  • Enhanced productivity: Integrating RCA into the production tool kit increases the likelihood that a manufacturer will be able to eradicate the kinds of recurring issues and inefficiencies that slow things down. That means both machinery and shop floor staff can operate at their peak capacity, elevating production efficiency.
  • A proactive culture: RCA is about getting to the bottom of issues so they don’t fester. Thus, its use can help create a culture of proactive problem-solving. Team members who see issues being addressed before they escalate recognize that continuous improvement is valued.

Challenges in Conducting a Root Cause Analysis in Manufacturing

Given the benefits of RCA, it might seem that performing these investigations should be a slam dunk. However, some critical obstacles can impede effective analyses and undermine their productivity and accuracy, including:

  • Confirmation bias: Ruling out information or data that doesn’t support preexisting assumptions or beliefs about the cause of a production issue is known as confirmation bias. Confirmation bias can cause an RCA team to overlook certain evidence or alternative sources of the issue, resulting in incomplete or entirely wrong results and perpetuation of the production or product issue.
  • Misunderstanding contributing factors vs. root causes: A root cause is the underlying reason why a production or product problem happened. A contributing factor is a condition or situation that influenced the problem, but didn’t cause it. A successful RCA is clear on the difference and aims at finding the former. If a CNC machine routinely breaks down, inadequate operator training, poorly managed maintenance schedules, or raw material quality issues could all be contributing factors. However, the root cause may be insufficient budgeting for maintenance and training, lack of a comprehensive maintenance plan, or insufficient raw material quality control.
  • Data gaps: As mentioned earlier, an RCA investigation is only as good as the data that feeds into it. Inaccurate or incomplete data can slow down or invalidate the RCA process. Missing data can also lead to diagnoses based on hunches or anecdotal evidence, which undermines the process, its results, and the resulting solutions.
  • Assigning individual blame: RCAs have two goals—problem-solving and continuous improvement. Blaming individuals or groups rather than finding causes in the system or process can have a deleterious effect not only on the investigation but also on the broader culture. Subsequent negative impacts include fostering fear, discouraging open communication, and preventing identification of the primary issues that need to be addressed.
  • Failure to follow up: An effective investigation does not end with the implementation of a solution. It’s essential to monitor the effectiveness of corrective actions and reassess the situation as needed, lest the problem recur or even reappear in new ways. The best RCA incorporates ongoing monitoring to make sure that solutions achieve the intended results and are adjusted when they do not.

Root Cause Analysis Best Practices

A haphazard or ad hoc approach to RCA is unlikely to yield desired outcomes; in fact, it may make matters worse if it results in inaccurate conclusions or ineffective corrective actions.

Adopting RCA best practices will put manufacturers in a better position to systematically identify and eradicate the underlying reasons for production or product issues or failures. Companies seeking to reap the full benefits of RCA and sidestep the common obstacles enumerated above would be wise to integrate the following best practices into their approaches.

Focus on Improving Systems

The goal is to create an open culture where analysis is a welcome exercise in improving processes and products rather than place blame. The aim should be to get at the “how” and “why” of the issue, not the “who.” Human factors may be at play, but punitive actions will stifle the process.

Companies can take a variety of actions to underscore this emphasis on systems improvement. These include instituting cross-functional teams to handle RCA, establishing a culture in which employees feel safe when reporting problems, and encouraging them to contribute to improving systems. RCA results must guide corrective actions, whether that’s changing processes, providing training, or investing in new equipment, not chastising employees.

Look for the Cause, Not the Symptoms

It’s often much easier to address symptoms than to delve deeply enough to find and fix the core issues. Quick fixes are temporary, at best.

Manufacturers should instead push their investigations beyond surface-level issues to discover the originating causes and prevent problems from resurfacing. Implementing the techniques described earlier in this article will help operationalize this digging in the dirt. Collecting and analyzing as much relevant, complete, and accurate data as possible will also help distinguish between causes and symptoms.

Keep Your Solutions Realistic

Corrective actions that are overly costly or implausible are unlikely to be successful. Solutions for addressing root causes should be practical, feasible, and sustainable, and take into account the organization’s resources and constraints.

To develop realistic solutions, companies should involve all key stakeholders in generating and evaluating corrective actions. They should also consider the cost, time, and impact of any proposed corrective action before attempting to implement it. Another option is to pilot the solution on a small scale before rolling it out more broadly.

Validate That Corrective Actions Have Been Taken

As noted earlier, a good RCA process does not end once a solution has been implemented. Manufacturers should monitor all corrective actions post-implementation to make certain they were applied as intended and resolved the core issue.

Establishing KPIs or quality metrics to measure the effectiveness of the RCA and ensuing solution(s) is a best practice. It’s a good idea to assign this follow-up and verification work to an individual or team so it doesn’t fall through the cracks. Some companies opt to conduct regular audits or reviews to confirm that any desired changes become standard operating procedure.

Take Steps to Prevent a Similar Incident

Corrective actions to address the root cause aren’t the only after-RCA best practice worth adopting. There’s also significant value in sharing lessons learned and standardizing improvements more broadly across the organization.

This can begin by documenting all findings, corrective actions, and outcomes. This information should be shared with other teams and departments so they also can benefit from the lessons learned and any new best practices developed. Relevant training materials, standard operating procedures, and maintenance schedules require updating as necessary.

Leverage Your Technologies

The best analyses involve a significant amount of data collection, analytics, and KPI management to be sure that true root causes are identified and corrective actions produce the desired results. Enterprise technology platforms and software, such as ERP and supply chain management systems, can automate many tasks, reduce the likelihood of errors, and keep the RCA and related efforts on track.

When RCA processes are integrated with these systems, companies can look forward to having access to centralized, accurate, and often real-time data. Built-in functionality, such as embedded AI, real-time dashboards, reporting tools, and automated alerts, can also help production teams identify patterns, proactively address production problems, and monitor the upshot of their RCAs and related action plans.

Facilitate Smarter Problem-Solving With NetSuite for Manufacturing

NetSuite for Manufacturing is a cloud-based ERP platform purpose-built to support companies with complex operations. By offering real-time shop floor control capabilities, NetSuite gives managers the ability to monitor performance for issues or anomalies that may require intervention and, ultimately, RCA. In addition, NetSuite for Manufacturing is a unified solution, connecting manufacturing data and processes to other relevant enterprise information (finance, HR, CRM, etc.) for an expedited RCA process.

Constantly fighting fires is no way to run a complex manufacturing operation. Although some production issues and product defects may be inevitable, a structured, data-driven approach that pinpoints underlying sources will go far in preventing their recurrence while driving continuous improvement across manufacturing operations. A well-thought-out RCA practice, backed by the right data, investigative methodologies, technology platforms, and processes, can help manufacturers address the root causes and deeper issues lurking in their production environments, rather than simply fixing the symptoms that bubble up to the surface.

Root Cause Analysis in Manufacturing FAQs

Is a root cause analysis considered a lean technique or a Six Sigma technique?

Root cause analysis (RCA) is not associated exclusively with either lean manufacturing or Six Sigma methodologies. The practice, which concentrates on identifying and addressing the foundational factors leading to production or product problems, is actually central to both of these continuous improvement methodologies.

What is the difference between a root cause and a contributing factor?

A root cause is the fundamental reason why a production or product problem happened, while a contributing factor is a condition or situation that made a problem more likely to occur. If a piece of manufacturing equipment keeps unexpectedly shutting down, contributing factors could be poor operator training or raw material quality issues, while the root cause may be insufficient training budgets, outdated training programs, or poor raw material quality control.

Which KPIs are used in a root cause analysis?

The KPIs used to measure the effectiveness of the corrective actions taken after a root cause analysis (RCA) vary by problem and intended outcomes. If a certain production line has been experiencing stoppages, for example, an important KPI to track might be uptime or downtime.

Companies can also use KPIs to measure the effectiveness of their overall RCA programs. These metrics include RCA completion rate (percentage of identified issues for which a root cause analysis was successfully completed), RCA cycle time (average time to complete an RCA from initiation to closure), recurrence rate (frequency at which the same or similar issues recur after an RCA), and RCA cost savings (money saved as a result of implementing corrective actions).
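The program-level metrics above can be computed from a simple log of RCA investigations. The sketch below is one possible interpretation: it measures recurrence as the share of completed RCAs whose issue came back, and cycle time in calendar days. The `RcaRecord` fields and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RcaRecord:
    issue_id: str
    opened: date
    closed: Optional[date]  # None while the RCA is still open
    recurred: bool          # True if the same or a similar issue came back after closure
    savings: float          # estimated savings from the corrective action


def program_kpis(records: List[RcaRecord]) -> dict:
    """Compute completion rate, cycle time, recurrence rate, and cost savings."""
    completed = [r for r in records if r.closed is not None]
    if not completed:
        raise ValueError("at least one completed RCA is needed")
    return {
        "completion_rate_pct": 100 * len(completed) / len(records),
        "avg_cycle_time_days": sum((r.closed - r.opened).days for r in completed) / len(completed),
        "recurrence_rate_pct": 100 * sum(r.recurred for r in completed) / len(completed),
        "cost_savings": sum(r.savings for r in completed),
    }


# Hypothetical RCA log, for illustration only.
records = [
    RcaRecord("RCA-001", date(2024, 1, 8), date(2024, 1, 22), False, 12000.0),
    RcaRecord("RCA-002", date(2024, 2, 5), date(2024, 2, 15), True, 3000.0),
    RcaRecord("RCA-003", date(2024, 3, 4), date(2024, 3, 22), False, 8000.0),
    RcaRecord("RCA-004", date(2024, 4, 1), None, False, 0.0),
]

kpis = program_kpis(records)
for name, value in kpis.items():
    print(f"{name}: {value:.1f}")
```

Tracking these figures over successive quarters, rather than as one-off snapshots, is what shows whether the RCA program itself is improving.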

When should you perform a root cause analysis?

A number of scenarios warrant a root cause analysis. These include adverse events such as breakdowns or failures, which can create numerous issues: manufacturing slowdowns or stoppages, product defects, quality issues, safety issues, or regulatory noncompliance. A recurring problem or repeated complaints suggest the problem is not an isolated incident but the result of a more systemic issue that requires an RCA. In addition, a regulatory requirement or industry standard may call for investigating a significant event.