
IT Operations Management Insights From the United Airlines, NYSE and WSJ Outages

In rapid-fire succession on 8 July 2015, United Airlines, the New York Stock Exchange (NYSE), and the Wall Street Journal (WSJ) website experienced extremely high-profile outages.

The culprits? For United Airlines and the NYSE, it was the network (routers and gateways, respectively); for the WSJ, it was the combination of insufficient capacity for unanticipated demand and a failure of server connections (a 504 error).

Key factors in these events were network devices, outages, downtime, service impact, configurations, and capacity – the realm of monitoring and management of IT infrastructure used to deliver applications and services. Yet most of what I’ve read sensationalized the nonexistent cybersecurity angle. So I’m highlighting some key IT operations management insights to take away from incidents such as these:

Any assumption that could impact ongoing business and IT operations should be considered wrong. Felix Unger said it best: Never assume.

The United Airlines router outage impacted applications and services that ultimately grounded all planes. An IT operator easily could have assumed in such a scenario that he or she would see a relevant alert generated by a faulty device. But what if the bad router effectively imploded before getting a chance to deliver its “suicide note”? Then the operator has to know to look for events that should have been generated but weren’t. In my experience, the kind of out-of-the-box thinking needed to immediately deduce this type of problem is extremely rare – especially in a major-outage scenario, in which significant pressure exists to solve the problem ASAP, and everyone presumes all relevant data is there, but it just hasn’t (yet) been properly analyzed into something insightful.
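To make the "events that should have been generated but weren't" idea concrete, here is a minimal sketch of a silence check: rather than waiting for a device to raise an alert, it flags any device that has gone quiet for longer than expected. Everything in it is an assumption for illustration only, including the device names, the five-minute threshold, and the `find_silent_devices` helper; it does not describe how any of the affected organizations or any particular monitoring product works.

```python
from datetime import datetime, timedelta


def find_silent_devices(last_seen, now, max_silence):
    """Return the devices whose latest event is older than max_silence.

    last_seen maps a (hypothetical) device name to the timestamp of the
    most recent event received from it. A value of None means the device
    has never reported, so it is always flagged.
    """
    silent = []
    for device, seen_at in last_seen.items():
        if seen_at is None or now - seen_at > max_silence:
            silent.append(device)
    return sorted(silent)


# Illustrative data: one healthy device, one that went quiet, one never heard from.
now = datetime(2015, 7, 8, 9, 30)
last_seen = {
    "router-a": now - timedelta(seconds=20),
    "router-b": now - timedelta(minutes=12),
    "gateway-c": None,
}

silent = find_silent_devices(last_seen, now, timedelta(minutes=5))
print(silent)  # ['gateway-c', 'router-b']
```

The design choice worth noting is that absence of data is treated as a signal in its own right, so a router that fails before it can send anything still shows up on the operator's list.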

Three Key Observations From the Gartner Data Center, Infrastructure and Operations Management Conference

I was fortunate enough to be part of the team that supported the EMC presence at the Gartner Data Center, Infrastructure and Operations Management Conference in Las Vegas earlier this month. It was lots of hard work (briefings, meetings, staffing the expo booth) but also a great opportunity to speak with users and customers, as well as garner some interesting insights from the Gartner analyst-presented sessions.

So what were some of the key themes I observed? First, the software-defined data center is moving a lot closer to reality for a lot of attendees. Key technologies such as software-defined storage and software-defined networking have moved for most from the “I’ll keep my eyes on it” bucket in 2014 into the “I’ve got to do something about this in 2015” bucket. That’s no surprise to our team; we’ve been observing a lot of the same behavior in our interactions with customers at places like executive briefings and user-group meetings. And it helped drive a lot of the insights we presented in our event-sponsor session on “Making the Software-Defined Data Center a Reality for Your Business,” in which the need for automation, especially at the management and monitoring level, was emphasized as a critical requirement to delivering on the promise of the software-defined data center.

Another key theme that had almost everyone talking was a notion of “bi-modal IT,” in which IT operations would simultaneously support both an agile, devops-like model for rapid iterations and deployment of newer applications and services, while also maintaining a “traditional” IT operations model for more traditional, less business-differentiating applications and services. In some ways, analysts had been alluding to this for years – devops was coming; it would be a major influential force; prepare for it. What was lacking was the “how,” and that confused and even scared people. But now at this event we learned that analysts are saying to support both models (hence “bi-modal” IT), and, more importantly, deploy supporting systems and tools for each – and absolutely don’t try to use one system for both models (because nothing is out there that can do that effectively). Folks I spoke to had an almost collective sense of relief: Two modes, each with its own tools and systems, makes sense to everyone, and eliminates that angst associated with potentially trying to make the round peg fit in a square hole. And since it came from this event, it has the inherent “validation” that many in upper management want.

Building on this, the third theme I noticed (more from my interactions with other conference attendees, especially at the EMC expo booth) was a strong interest in continuous availability of applications and systems, rather than in backing up and being able to recover these same environments. People were asking the right questions: For example, what kinds of storage architectures make sense in a continuous-availability model, and can those be aligned with changing data needs? (Yes, and EMC has a lot to offer on this front.) What are the key elements of a monitoring system that focuses on continuous availability? (One answer: automated root-cause and impact analysis, which radically shrinks time needed to identify problems, and is a key capability in the EMC Service Assurance Suite.) And can a server-based SAN play a role in continuous availability architecture? (Absolutely – as long as you’re managing it with EMC ScaleIO.)

And this event also had its share of the unexpected (the Las Vegas strip was fogged in – yes, that’s not a typo – for almost two full days), as well as lighter fun-filled moments (EMC’s arcade-themed hospitality suite for conference attendees, complete with a customized Pac-Man-like game called “ViPR Strike”). And as always, it’s the discussions and interactions that I cherish and remember the most.

Which brings it back to you: Were you at the conference too? If so, what do you think of these higher-level observations of mine? What else do you have to add or share? Even if you didn’t go, what are your thoughts and opinions on what you’ve read here?

What “Field of Dreams” Can Teach You About IT Projects and IT Operations Management

Would you think that the 1989 movie “Field of Dreams” has just as much to do with IT operations as it does with baseball? Remember the backstory?

The movie’s protagonist, Ray Kinsella, is an average and unsuccessful farmer, with regrets about his past. Like many on an IT team who get struck with a bolt of inspiration – an idea for an IT-related project (a new application or service, probably along the lines of “what if we had a way to…” or “what if we did this:”) – Ray listens to a voice in the night telling him “If you build it, he will come.” He proceeds to plow under his crop and construct a baseball field in the middle of his Iowa cornfield.

Then Ray’s “project” develops its own momentum – the seemingly now-corporeal ghost of Chicago White Sox outfielder Shoeless Joe Jackson walks out of the cornfield bordering the newly built baseball field, admires everything, thanks Ray for what he’s done, and asks to come back – with “friends.”

Now Ray on an IT team would have had a similar experience: Someone gets wind he’s been working on a Skunk Works project that, although radical, could be something amazing. Shoeless Joe is like that first test user that becomes the unintentional evangelist, and quickly starts to build a critical mass among users.

At this stage in the IT project, things are going well: The user base has grown, the old guard (shown in the movie as 1960s anti-establishment author Terence Mann) at first grudgingly agrees to take a look at the project, then likes what it sees, becomes a strong advocate, and things evolve quickly (maybe even moving to formal alpha testing). In the movie, Shoeless Joe has brought a throng of other now-corporeal ghosts to play baseball once again on Ray’s field (more users, all of whom love the work Ray’s done). And Ray’s wife Annie stands by her man, despite a wave of criticism coming from her brother Mark, the financial advisor and antagonist who absolutely cannot see or understand Ray’s vision, and what he’s done.

Mark personifies the non-IT finance person who absolutely cannot see any value in an IT project. The premise makes no sense. It’s not about economic optimization. Losses need to be cut and risk averted. (Sound familiar?) Even the testimonies of others can’t budge this person from his position. And without the ability to secure funding, that IT project isn’t ever going to go anywhere and become something big.

Getting past this immovable object requires an “aha” moment, where the clouds part and insight illuminates the closed-minded. For Mark, it was seeing the “project” in a context that mattered to him. (In the movie, Mark could finally see the ballplayers after one of them, Moonlight Graham, willingly chose to step off the safe environment of the field to help Ray’s daughter Karen, who had fallen from the bleachers.)

So now Ray’s IT project has its financing. It’s gone from a germ of an idea to being incubated, grown, proven, and spread among users, overcoming hurdles along the way. It’s been promoted. (Terence confidently predicts that people will come.) In IT project terms, it’s ready to go live.

In the movie, that go-live moment is represented by Ray finally getting his reward for all his efforts: The newest player to arrive at this field is none other than his father. As a soliloquy early in the movie makes clear, Ray carries a significant, gnawing regret over their long, unresolved estrangement. The movie delivers its biggest tear-jerker moment when Ray, who, as a rebellious teen, steadfastly refused to throw a baseball around with his father, is now able to fix that by asking his father to have a catch.

And they did. And the credits to the movie rolled, showing a fade-to scene of miles and miles of cars driving toward Ray’s field. And you might think that would be the end of the IT analogies here as well. But it’s not.

Although the biggest IT lesson of all came from the end of the movie, it had nothing to do with Ray and his father playing catch. Just before that sequence, the most level-headed character in the movie, Ray’s wife Annie, simply and succinctly proved to be the voice of reason in one sentence: “If all these people are going to come, we’ve got a lot of work to do.”

Annie represents IT operations. She supported Ray’s idea (the IT project) from the beginning. (Many I’ve spoken to in IT operations say “Do we ever really have a choice of not supporting it?”) As IT operations does, she kept things running smoothly despite the chaos unfolding. And, as the voice of reason to any IT project, IT operations provides the view that starts to answer the question “What happens now that we’ve gone live?”

And Annie’s insight provides the biggest unstated IT lesson from the movie: Although the IT project and the way it unfolds is important (and makes for a good story, in this case), you can’t neglect the IT operations view of the world: Things need to happen after the go-live date to ensure the IT service delivery environment (i.e., infrastructure) keeps running smoothly and performs as expected. And consider the long-term impact of what’s now changed in the environment. In IT terms, the project’s success has created a performance bottleneck in the environment (the traffic jam of people trying to get to the field). And that’s just for starters: Where will all these people eat, sleep, and bathe? And is there a wi-fi hotspot nearby?

Good IT projects can be a tremendous experience. They can overcome obstacles to create new value, change things for the better, and get people to see things in a whole new light. But a key takeaway to keep in mind is that they, like a movie, tend to have a story arc: a perceived beginning, middle, and end.

And it’s really that “end” that needs to be thought of as the beginning – the start of the usage lifecycle of that application or service. That’s when you have to address everything that needs to be done, from an IT operations monitoring and management perspective, to keep that new application or service available and performing the way it should to meet (or exceed) user expectations. And that’s exactly what EMC Service Assurance Suite and EMC ViPR SRM do – provide IT operations teams with the insights necessary to ensure that the IT service delivery environment functions the way that it should, as well as the ability to easily absorb changes in the IT environment.

If I were producing this as a short movie, I’d now call out “fade to black, cut, and roll credits.” But IT operations would still keep a light on, behind the scenes, to help keep an eye on things.


