Torqued: Air Asia Crash Highlights Risks of In-flight Troubleshooting

February 1, 2016, 3:00 AM

When an Air Asia Airbus A320-216 on a routine flight from Surabaya, Indonesia, to Singapore crashed on Dec. 28, 2014, killing 162 people, many experts suspected that bad weather at the time might have played a role. Media interest was intense at first, because the accident came so soon after the mysterious disappearance of Malaysia Airlines Flight 370. But media attention flagged once the wreckage was found in the waters off the coast of Indonesia. A crash in a part of the world that Americans consider remote, with no Americans on board, doesn't usually hold the media's interest here for very long. MH370, of course, was different. The disappearance of a modern jetliner was as riveting for the general public as for the aviation world.

But even in the aviation community, the crash of Air Asia Flight 8501 seemed to be quickly forgotten. That's a mistake. Every accident holds lessons. Precisely because accidents are now so rare, each one needs to be examined thoroughly for any information that can help prevent the next. When human factors are involved, the review is especially worthwhile: regardless of where in the world an accident occurs, the human element remains the same.

The recently released accident report on Air Asia Flight 8501 reveals that weather was not a factor. But several disturbing factors were, involving both the flight crews and the maintenance department. What happened here could happen anywhere. And it could happen again. The lessons to be learned from this accident are many, and just because this involved an airline in Asia doesn’t mean the lessons don’t apply to U.S. airlines. Or corporate flight departments. Or even individual GA pilots and mechanics. 

Repeated Problem and Improper Response

Twenty-five minutes into the flight, at 2300 UTC, the A320 was cruising at 32,000 feet when the crew got their first amber caution on the electronic centralized aircraft monitoring (Ecam) system: auto flt rud trv lim 1, indicating a problem with the rudder travel limiter. This automated system prevents the pilots or the autopilot from moving the rudder too far at high speed, which could overload the vertical fin and, in the extreme, cause the rudder and fin to separate from the aircraft.

Based on the flight data recorder (FDR), the accident report then lays out the following sequence of events involving the rudder travel limiter system:

  • At 2301 UTC, the FDR recorded a failure of both rudder travel limiter units, which triggered a chime and the master caution light. The Ecam message showed auto flt rud trv lim sys (auto flight rudder travel limiter system). The PIC read and performed the Ecam action for auto flt rud trv lim sys, which calls for cycling the flight augmentation computer (FAC) 1 and 2 push buttons on the overhead panel off and then on, one at a time. Both rudder travel limiter units returned to normal function.
  • At 2309 UTC, the FDR recorded a second failure of both rudder travel limiter units, which again triggered a chime and the master caution light. The pilots repeated the Ecam action, and again both rudder travel limiter units returned to normal function.
  • At 2313:41 UTC, a single chime sounded and the amber Ecam message auto flt rud trv lim sys was displayed. This was the third failure of both rudder travel limiter units on this flight. The pilots performed the Ecam actions and the system returned to normal function.
  • At 2315:36 UTC, a fourth failure of both rudder travel limiter units triggered the Ecam message auto flt rud trv lim sys, a chime and the master caution light.

According to the FDR, the crew properly performed the Ecam actions after each of the first three cautions. After the fourth, however, the crew did not follow the Ecam procedure but instead appears to have pulled the FAC circuit breaker. This caused the autopilot and auto-thrust to disengage, removing several stall protections. The aircraft ultimately entered a stall from which the crew was unable to recover and the aircraft began its uncontrolled descent into the ocean.

Fly the Airplane

The first lesson from this accident is one that every crew should already know from accidents in the U.S.: troubleshooting in flight, especially when flight controls are suspect, is hazardous because of the risk of unintended consequences.

The first significant risk is distraction from the primary job of flying the airplane. One of the U.S. accidents that drew attention to this was the 1972 crash of Eastern Air Lines Flight 401 into the Everglades. In that accident, as the Lockheed L-1011 was approaching Miami International Airport, the crew became so focused on why the nose-gear position indicator light had not illuminated that they stopped flying the airplane. With the autopilot's altitude hold inadvertently disengaged, the crew failed to notice that the airplane was slowly descending until it flew into the Everglades, killing 101 of the 176 people on board and injuring the other 75.

The second significant risk is taking an action whose consequences the crew does not anticipate. For example, a circuit breaker might not be labeled with every system it powers, so pulling it could cut power to a number of electronic components without the crew knowing which ones. In the Air Asia crash, it's likely that the crew did not realize the significance of pulling that particular circuit breaker and were very much surprised by the consequences. Not only were systems unexpectedly shut down, but the resulting airplane gyrations were likely unanticipated as well. (As a side note, circuit breakers should never be pulled or reset in flight unless the flight operations manual requires it.)

The Jan. 31, 2000, crash of an MD-83 operating Alaska Airlines Flight 261 highlights the third significant risk: troubleshooting can make a problem worse. In that case, while attempting to troubleshoot an apparently jammed stabilizer trim system, the crew overflew a number of airports where the aircraft could have landed safely. Their troubleshooting, which involved repeatedly operating the electric trim controls to drive the horizontal stabilizer up and down, ended with the stabilizer separating from the jackscrew that moves it. The aircraft became uncontrollable and crashed into the ocean off the coast of Los Angeles, killing all 88 people aboard.

Every crew, whether flying an airliner, a corporate jet or a GA airplane, needs to remember that troubleshooting should be done only on the ground. The only exception is a system that is absolutely needed to land the aircraft, and even then, special precautions need to be taken.

The other significant lesson from this crash is related to maintenance. There were repeated write-ups and deferrals of the rudder travel limiter system. The accident report does not delve into why this was allowed to continue or why the aircraft was not taken out of service until the problem could be found. Repeat items have to be resolved promptly rather than deferred again and again, so that crews are not left to cope with known problems in flight.

Comments

Article misses the point: '...This caused the autopilot and auto-thrust to disengage, removing several stall protections. The aircraft ultimately entered a stall from which the crew was unable to recover and the aircraft began its uncontrolled descent into the ocean.'
This betrays a lack of knowledge of Airbus stall protections. Pulling the circuit breaker was not advisable at all, but it was the beginning of the chain because it caused the aircraft to switch to alternate law. Airbus pilots are taught to fly full-back stick in most eventualities, which is the very last thing you normally want to do in any other aircraft because you will induce a stall. Because Airbuses have AoA protection, their pilots are taught not to fear, or even recognise, a stall, because they are taught to believe it is not possible. Upset recovery isn't even in their vocabulary because theoretically you cannot upset an Airbus, except in alternate law. AF296 (crashed into trees), AF447 (crashed into the Atlantic) and now Air Asia 8501 show that the envelope protection philosophy does not work, and the general Airbus pilot community needs to be taught that 'alternate law' or 'direct law' means NO FULL BACK STICK.

Methinks "Bloke" is missing the bigger point.

"Alternate Law" questions aside, the basic issue is that the crew did not fully understand which systems would be affected if one or both CBs were pulled. Which doesn't surprise me. Most mechs or engineers wouldn't either, unless they sat down with the wiring manuals and the maintenance manual for a few hours.

As I understand it from reading the report, the bigger issue is that a flight crew member was trying to clear an EICAS message by using a Maintenance Manual procedure IN FLIGHT, without fully understanding what was disabled when the CBs were pulled.

According to the report, the crew had the EICAS message the previous day, while sitting at the gate. The message was cleared (per the M.M.) by the maintenance staff by pulling the CBs. The pilot observed this, and asked the mech if pulling the CBs was a legit procedure, which the mech confirmed, and which it was............BY A MECH, WITH THE AIRPLANE ON THE GROUND.

IOW, a "little bit of knowledge is dangerous" issue.

The bigger question in my mind is who decides when a "nuisance" becomes a "discrepancy". We've all dealt with intermittent caution/warning lights and EICAS messages. From what I gather, the aircraft was legal to remain in service for months or even years, as long as the message could be cleared on the ground. Which is what was happening, for many flights over many months. It looks like a fundamental flaw in the airline's maintenance plan (especially when the work is contracted out, and/or the same guys don't see the airplane every day). Who "owns" the problem? At some point, someone needed to put on his Big Boy pants and say "This problem has gone on long enough.......it's AOG until we figure out what's wrong".
