Why You Can’t Manage What You Don’t Measure: The NOC – MDI Approach
I’m writing this here at my desk, third coffee in hand, with three decades of hard-won experience in the network and security wilds, and I’m here to tell you that … you can’t manage what you don’t measure.
The NOC – MDI (Monitoring, Diagnostics and Intelligence)
The NOC is the nucleus of any enterprise network, yet it rarely gets its proper place because its work is either shrouded in technobabble or simply forgotten. That stops now. Allow me to introduce the most important NOC KPIs we manage at PJ Networks: measurements that have guided us from the days when PSTN multiplexers were a thing, through the madness of the Slammer worm, to the reality of the zero trust upgrades for the banks we have recently secured.
Why Use Metrics in Your NOC Setup
Metrics allow you to make sense of chaos. They surface the stories buried in event torrents and alarm storms. They also let you show that things are better today than yesterday. Without that, you don’t know whether your team is actually improving or just running around.
From my earliest days as a network admin in 1993 (when “downtime” meant losing your voice calls on the PSTN mux) to leading PJ Networks today, I’ve seen everything from badly delayed detection to grossly inefficient resolution, all because of bad or missing data. If you run NOC management, in India or elsewhere, getting your KPIs right is the secret sauce.
NOC Key Performance Indicators We Swear By at PJ Networks
Let’s get tactical. Here’s what we fixate on, the numbers that actually move the needle:
- MTTD & MTTR: Mean Time to Detect & Mean Time to Resolve
  - MTTD is the amount of time it takes your team to notice a problem once it is occurring. The faster you detect, the smaller the consequences.
  - MTTR is how long it takes until the problem is fixed.
I recall the Slammer worm in 2003: even with my old PSTN gear in the picture, MTTD and MTTR were the difference between a molten mess and containment.
Our goals at PJ Networks are an MTTD of less than five minutes for critical events and an MTTR of less than 30 minutes. Are these tough? You bet. But with good monitoring and intelligent automation, they are achievable.
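To make the two definitions concrete, here is a minimal Python sketch that computes both numbers from incident timestamps. The record layout (`started`, `detected`, `resolved`) and the sample data are hypothetical, and it assumes MTTR is measured from detection to resolution; if your shop counts from the onset of the fault, swap `detected` for `started` in `mttr_minutes`.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: when the fault began, when the NOC
# noticed it, and when service was restored.
INCIDENTS = [
    {"started": datetime(2024, 1, 5, 9, 0),
     "detected": datetime(2024, 1, 5, 9, 4),
     "resolved": datetime(2024, 1, 5, 9, 30)},
    {"started": datetime(2024, 1, 9, 14, 0),
     "detected": datetime(2024, 1, 9, 14, 3),
     "resolved": datetime(2024, 1, 9, 14, 35)},
    {"started": datetime(2024, 1, 20, 22, 10),
     "detected": datetime(2024, 1, 20, 22, 15),
     "resolved": datetime(2024, 1, 20, 22, 37)},
]


def _mean_minutes(durations):
    """Average a collection of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mttd_minutes(incidents):
    """Mean time from fault onset to detection."""
    return _mean_minutes([i["detected"] - i["started"] for i in incidents])


def mttr_minutes(incidents):
    """Mean time from detection to resolution (assumed definition)."""
    return _mean_minutes([i["resolved"] - i["detected"] for i in incidents])


if __name__ == "__main__":
    print(f"MTTD: {mttd_minutes(INCIDENTS):.1f} min (target < 5)")
    print(f"MTTR: {mttr_minutes(INCIDENTS):.1f} min (target < 30)")
```

In practice the timestamps would come from your monitoring and ticketing exports rather than a hard-coded list.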
- Incident Volume & Severity
It’s not only about the volume of alerts coming in. Is your team caught up in false alarms, or is it digging into real issues?
We classify these incidents according to their severity:
- Severity 1: Take action now. The network is down or major banking transactions are being denied.
- Severity 2: High impact but no complete outage.
- Severity 3: Minor or info alerts.
Tracking these over time helps you direct time and budget more effectively. Remember: a handful of high-severity incidents can be worse than a flood of low-severity ones.
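As a rough illustration of tracking severity alongside volume, the sketch below tallies incidents per month and reports the share that were Severity 1. The `(month, severity)` log format and the sample numbers are made up for the example.

```python
from collections import Counter

# Hypothetical incident log: (month, severity) pairs.
INCIDENT_LOG = (
    [("2024-01", 1)] + [("2024-01", 2)] * 2 + [("2024-01", 3)] * 5
    + [("2024-02", 1)] * 2 + [("2024-02", 3)] * 2
)


def severity_breakdown(log):
    """Return {month: Counter({severity: count})}."""
    breakdown = {}
    for month, severity in log:
        breakdown.setdefault(month, Counter())[severity] += 1
    return breakdown


def sev1_share(counts):
    """Fraction of a month's incidents that were Severity 1."""
    total = sum(counts.values())
    return counts[1] / total if total else 0.0


if __name__ == "__main__":
    for month, counts in sorted(severity_breakdown(INCIDENT_LOG).items()):
        total = sum(counts.values())
        print(f"{month}: {total} incidents, Sev 1 share {sev1_share(counts):.0%}")
```

With this sample data, February has half as many incidents as January but a much larger Severity 1 share, which is exactly the situation a volume-only chart would hide.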
- False Positives and False Negatives
Ah, the bane of any NOC.
- A false positive squanders your team’s time chasing noise.
- A false negative lets a threat slip by unnoticed.
At PJ Networks, we’ve spent years fine-tuning detection thresholds, because here’s the deal: overly aggressive alerting is the IT equivalent of a car alarm going off every 5 minutes. No one listens anymore.
Pushing false positive rates below 10% and keeping false negatives as close to zero as we can are challenges we face daily.
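Here is one way these two rates could be computed from triaged alert counts. The definitions are an assumption on my part: the false positive figure is taken as the share of all alerts that turned out to be noise, and the false negative rate as the share of real incidents that raised no alert. The counts are invented for the example.

```python
def alert_quality(true_positives, false_positives, false_negatives):
    """Summarise alert quality from triaged counts.

    false_positive_share: noise alerts / all alerts raised
    false_negative_rate:  missed incidents / all real incidents
    """
    alerts_raised = true_positives + false_positives
    real_incidents = true_positives + false_negatives
    return {
        "false_positive_share": (
            false_positives / alerts_raised if alerts_raised else 0.0
        ),
        "false_negative_rate": (
            false_negatives / real_incidents if real_incidents else 0.0
        ),
    }


if __name__ == "__main__":
    # Hypothetical month: 200 alerts raised, 24 of them noise,
    # plus 2 real incidents that never triggered an alert.
    quality = alert_quality(true_positives=176, false_positives=24,
                            false_negatives=2)
    print(f"False positive share: {quality['false_positive_share']:.1%}")
    print(f"False negative rate:  {quality['false_negative_rate']:.1%}")
```

Counting false negatives honestly is the hard part: they only show up after the fact, usually through post-incident reviews or customer reports.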
- Capacity / Resource Utilisation
Networks are like cooking: everything has a sweet load band. If you have too little utilization, you’re wasting resources; too much and the network chokes.
Capacity monitoring means watching:
- Bandwidth use
- CPU Utilization on routers/firewalls
- Memory and I/O on servers
Tuning utilization is cheap, often free, and it pays off. At PJ Networks we target somewhere in the range of 70-80% utilization: that leaves enough headroom for bursts without running the network close to the red line.
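A small sketch of checking readings against that 70-80% band is shown below. The device names and percentages are hypothetical, and the band edges are parameters so you can adjust them for your own environment.

```python
def classify_utilization(percent, low=70.0, high=80.0):
    """Place a utilization percentage relative to the target band."""
    if not 0 <= percent <= 100:
        raise ValueError(f"utilization must be 0-100, got {percent}")
    if percent < low:
        return "below band"
    if percent > high:
        return "above band"
    return "in band"


# Hypothetical readings gathered by a capacity management tool.
READINGS = {
    "core-rtr-1 bandwidth": 76.0,
    "edge-fw-2 cpu": 91.5,
    "app-srv-3 memory": 42.0,
}

if __name__ == "__main__":
    for name, percent in READINGS.items():
        print(f"{name}: {percent:.1f}% -> {classify_utilization(percent)}")
```

A single reading says little on its own; the classification becomes useful when a resource sits above or below the band week after week.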
How We Set Realistic Benchmarks And Targets (PJ Networks’ Approach…)
Benchmarks are not magical golden numbers pulled out of thin air. They should mirror your environment, tech stack, team abilities and business requirements.
Here’s how we do it:
- Begin with historical data from your running NOC system.
- Look to industry averages, but don’t follow them slavishly.
- Consider your business clients’ criticality (e.g., banking vs. retail).
- Stagger the goals: short (3 months out), medium (6-12 months ahead) and long (over a year out).
- Go back and look at the benchmarks once a quarter. Networks evolve. Metrics must, too.
PS: If you think a 1-minute MTTD is doable, slow down and first consider your network complexity, data volume, and team size. Perfection is the enemy of good in this case.
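The steps above can be sketched in code. The example below takes a historical baseline (the median of past monthly MTTD values) and spreads the gap to a long-term goal across short, medium, and long horizons. The even three-step split, the function name `staged_targets`, and the sample history are all assumptions for illustration, not a formula we claim is standard.

```python
from statistics import median


def staged_targets(history, long_term_goal, steps=(1 / 3, 2 / 3, 1.0)):
    """Derive short/medium/long targets from historical values.

    The baseline is the median of the history; each stage closes a
    fraction of the gap between that baseline and the long-term goal.
    """
    baseline = median(history)
    gap = baseline - long_term_goal
    short, medium, long_term = (round(baseline - gap * s, 2) for s in steps)
    return {
        "baseline": baseline,
        "short": short,      # roughly 3 months out
        "medium": medium,    # roughly 6-12 months out
        "long": long_term,   # over a year out
    }


if __name__ == "__main__":
    # Hypothetical monthly MTTD values in minutes.
    mttd_history = [9.0, 8.0, 11.0, 7.0, 10.0]
    print(staged_targets(mttd_history, long_term_goal=6.0))
```

Rerunning this each quarter with fresh history is one simple way to keep the targets tied to what the network actually does.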
Data Collection Methods & Tools – Not So Secret Ingredients
Without solid data, your KPIs are garbage (guesses).
These are our go-tos:
- Network monitoring solutions such as Nagios, SolarWinds, PRTG for the real-time state of a device.
- SIEM platforms (Security Information and Event Management) for security alerts.
- Ticketing systems (Jira Service Desk, ServiceNow) to track follow-ups and resolution times.
- Capacity management tools that report hardware use.
It’s also tempting to rush into the latest AI-driven app. Been there. I’m skeptical: most of the AI hype I heard at DefCon’s hardware hacking village sounded like just that, hype. Good old analytics with a touch of intelligent automation beats most shiny buzzwords.
Wherever possible, automate data pulls to reduce the demands of error-prone manual logging.
And don’t discount your team’s manual contribution: hand-scraped logs and manual overrides are part of real life, and they belong in the data too.
Techniques for Trend Analysis and Reporting
Reporting is more than dashboards. It’s storytelling.
Over weeks and months, track KPI trends, and look for:
- Deviations from baselines.
- Seasonality (weekend decline, patch day increase).
- Correlations (for example, high incident volume going hand in hand with many false positives).
We recommend:
- Employ visual charts: line graphs for trends, pie charts for incident types.
- Weekly snapshots + monthly deep dives.
- Isolate outliers and analyse their causes.
This is where Python or Excel can be your best friends. My team automated KPI trend charts that refresh from the raw data every day. That helps us catch the little festering problems before they erupt.
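As one possible take on spotting deviations from a baseline, the sketch below flags any day whose value sits more than a chosen number of standard deviations away from the mean of the preceding window. The 7-day window, the threshold of 2, and the sample series are assumptions; this is a simple trailing-window check, not the specific automation my team runs.

```python
from statistics import mean, stdev


def flag_deviations(series, window=7, threshold=2.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical daily MTTD values in minutes.
    daily_mttd = [4.1, 4.3, 3.9, 4.0, 4.2, 4.4, 4.1, 4.0, 6.8, 4.2]
    for i in flag_deviations(daily_mttd):
        print(f"Day {i}: MTTD {daily_mttd[i]} min is off baseline")
```

A fixed window will misread seasonal patterns such as weekend dips, so compare like with like (weekdays against weekdays) before acting on a flag.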
The Art of Communicating KPIs to the C-Suite
Here’s a pro tip I had to learn the hard way:
Executives do not want your network engineering terminology. They want impact.
- Translate MTTD and MTTR into business risk avoided.
- Use simple dashboards with explicit color coding (green, yellow, red).
- Provide commentary on status vs. targets.
- Explain the why behind sudden jumps or declines in specific metrics.
At PJ Networks, we create a one-page KPI summary sheet for execs that strips away the technicalities and actually helps them make decisions. It keeps leadership involved, and supportive of necessary investments.
Continuous Learning: How to Use Metrics to Achieve Process and Product Excellence
KPIs are worthless unless they drive action.
Focus on:
- Identifying the root causes of repeat incidents.
- Tuning the alerting thresholds to minimize the noise.
- Training informed by delays in detection and resolution.
- Infrastructure investment where capacity regularly hits thresholds.
Think of my PSTN days: had we not tightened how quickly we could identify and resolve problems, the Slammer worm would have brought down entire networks far more often.
At PJ Networks, we have a bi-monthly review process: the team looks at KPIs, we talk about things that have gone well, and – importantly – name and shame things that haven’t. No sugarcoating. Only growth.
Quick Take: NOC KPIs India—The Metrics That Matter Most
- MTTD < 5 mins, MTTR < 30 mins: PJ Networks’ benchmarks for business-critical incidents
- Focus on severity of incidents and not just volume in order to prioritize efforts.
- Keep false positives to a minimum (<10%) so as not to suffer from alert fatigue
- You want to operate at 70-80 percent of network capacity for ideal performance
- Combine automated tools and manual insights for accurate data collection
- Frame metrics as business impact to executives, not tech talk
Conclusion & A Simple KPI Dashboard Template
Your NOC is sitting on a treasure trove of data today, provided you measure, track, and act on the most relevant KPIs.
PJ Networks has thrown decades of experience into defining and using these metrics – not only technology benchmarks, but business enablers.
Here’s a simple KPI dashboard template we use ourselves:
| KPI | Target | Actual | Trend (Previous 3 Months) | Status |
|---|---|---|---|---|
| MTTD (Major Incidents) | < 5 min | 4.2 min | ↓ | Green |
| MTTR (Critical Incidents) | < 30 min | 28 min | → | Green |
| False Positives | < 10% | 12% | ↑ | Yellow |
| Network Capacity Utilization | 70-80% | 75% | → | Green |
| Incident Volume (per Month) | N/A | 150 (10% high severity) | ↓ | Green |
You can safely start from there and modify depending on your team and your tech stack.
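If you would rather derive the green/yellow/red column than fill it in by hand, a sketch like the following could do it. The cut-offs are assumptions: a "lower is better" KPI turns yellow when it overshoots its limit by up to 25% and red beyond that, and a band KPI turns yellow within 5 points of the band. Tune both to your own risk appetite.

```python
def status_upper_limit(actual, limit, tolerance=0.25):
    """Status for a 'lower is better' KPI with a strict upper limit."""
    if actual < limit:
        return "Green"
    if actual <= limit * (1 + tolerance):
        return "Yellow"
    return "Red"


def status_band(actual, low, high, margin=5.0):
    """Status for a KPI that should sit inside a target band."""
    if low <= actual <= high:
        return "Green"
    if low - margin <= actual <= high + margin:
        return "Yellow"
    return "Red"


if __name__ == "__main__":
    rows = [
        ("MTTD (min)", status_upper_limit(4.2, 5)),
        ("MTTR (min)", status_upper_limit(28, 30)),
        ("False positives (%)", status_upper_limit(12, 10)),
        ("Capacity utilization (%)", status_band(75, 70, 80)),
    ]
    for name, status in rows:
        print(f"{name}: {status}")
```

Keeping the rules in code means the status column cannot quietly drift away from the targets printed next to it.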
Final rant? Stop depending on fancy AI marketed as next-gen. Trust your proven metrics first. Your network is only as good as your analysis and the people behind your reports.
Okay, coffee number four. Until next time, keep your metrics tight and your networks tighter.
Cheers,
Sanjay Seth
P J Networks Pvt Ltd
