Feb 19, 2023
Engineering as a Science: Stop Guessing, Start Measuring Your Keystroke Capital

Hugo

Imagine a patient walking into a doctor’s office: "Doc, I need surgery. I feel like my appendix is about to burst." "Say no more," replies the surgeon. "I had a hunch you’d say that. Let's operate."

It sounds absurd. Yet, in software engineering, we witness this behavior daily:

  • Massive refactors launched because the code "smelled bad."
  • Infrastructure overhauls based on vague fears of scalability rather than capacity planning.
  • Micro-optimizations applied without a single benchmark.

Intuition vs. Measurement

Don't get me wrong: I am not arguing for analysis paralysis. For seed-stage startups, intuition is a survival mechanism. When your codebase is small and speed is everything, you don't need a dashboard to know what's broken.

But once a team exceeds a certain size (say, 20 engineers) or when the product finds traction, relying solely on intuition becomes negligent. You are no longer dealing with infinite possibilities; you are managing finite resources.

The "Keystroke Capital" Theory

Let’s perform a theoretical exercise. Regardless of your company's bank balance, your engineering team has a hard limit: Keystroke Capital.

Imagine your team has a collective budget of 10,000 keystrokes per week. If a minor cleanup costs 10 keystrokes, fine. But what if a "readability refactor" triggers a ripple effect consuming 1,000 keystrokes—10% of your weekly capital—without any measurable gain in performance or velocity?

You have just spent 10% of your budget based on a feeling. Without measurement, you cannot prove the investment was worth the withdrawal.
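To make the budget arithmetic above concrete, here is a toy sketch of the idea; the "Keystroke Capital" unit is a metaphor, and all numbers simply mirror the example in the text.

```python
# A toy model of "Keystroke Capital": a fixed weekly budget of team effort.
# All figures are illustrative, matching the example in the article.

WEEKLY_BUDGET = 10_000  # keystrokes per week for the whole team

def spend_fraction(cost: int, budget: int = WEEKLY_BUDGET) -> float:
    """Return the fraction of the weekly budget a task consumes."""
    return cost / budget

# A minor cleanup barely registers...
print(f"{spend_fraction(10):.1%}")     # 0.1%
# ...but a rippling "readability refactor" eats 10% of the week.
print(f"{spend_fraction(1_000):.1%}")  # 10.0%
```

The point is not the model itself but the habit: before an unprompted refactor, estimate what share of the team's finite capacity it will consume.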

No Measurement, No Alignment

Without measurements, success is indistinguishable from luck.

As a Senior Software Engineer, your job involves constant negotiation. You are negotiating for time to pay down debt, for resources to upgrade infrastructure, or for the delay of a feature to ensure stability. In these negotiations, data is your leverage. Without it, you are just another opinion in the room.

  • Without measurements, you cannot set objectives.
  • Without objectives, you cannot define success.
  • Without success criteria, you cannot create alignment.

A product team that cannot quantify its impact is a team in danger of underinvestment. If you can't prove that your refactoring saved $50k in compute costs or reduced onboarding time by 30%, you are relying on management's goodwill. That is a precarious place to be.

Our Job Is a Scientific Job

Engineering is a scientific discipline, and one of the main pillars of scientific culture is measurement.

We often hear about 10x developers, and I have personally worked with people who are indeed dramatically more productive than others. But it's not their typing speed that makes them so productive; it's their ability to apply a systematic approach to solving a problem.

It’s a learnable loop: Observe → Hypothesize → Measure → Act.

A Concrete Example: The "Slow" Payment Page

  • Listen: Users complain the payment page is "slow."
  • Measure: Don't guess. Your APM shows a bimodal distribution: 80% of users load in <1s, but 20% take >10s. This 20% correlates with high cart abandonment (lost revenue).
  • Hypothesis: It’s not the code efficiency; it’s a resource bottleneck under specific concurrency loads.
  • Action: Simulate traffic. Use flame graphs to identify the locking mechanism causing the queue.
  • Validation: Deploy the fix and watch the P95 latency drop on the dashboard.
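The "Measure" step above can be sketched in a few lines. This is a hypothetical illustration: the latency samples are fabricated to reproduce the bimodal shape described (80% under 1s, 20% over 10s), and in practice these numbers would come from your APM, not a Python list.

```python
# Hypothetical sketch of the "Measure" step: given raw page-load latencies
# (in seconds), confirm the bimodal pattern and compute the P95 latency.
# The sample data is fabricated for illustration.
import statistics

latencies = [0.6] * 80 + [12.0] * 20  # 80% fast, 20% pathological

def p95(samples):
    """95th percentile via the 'inclusive' quantile method."""
    return statistics.quantiles(samples, n=100, method="inclusive")[94]

fast_share = sum(1 for t in latencies if t < 1.0) / len(latencies)
slow_share = sum(1 for t in latencies if t > 10.0) / len(latencies)

print(f"P95: {p95(latencies):.1f}s")  # dominated by the slow mode: 12.0s
print(f"<1s: {fast_share:.0%}, >10s: {slow_share:.0%}")
```

Note how the mean would hide the problem entirely (around 2.9s here), while the percentile view exposes the 20% of users you are actually losing.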

This seems obvious, yet many teams skip the "Measure" step and jump straight to "Action" based on intuition.

Of course, the speed of resolution will depend on your initial knowledge and know-how. Mastering certain diagnostic tools will speed you up; needing assistance will slow you down. But it is the method that must guide you.

Case Study: Taming a 2.3M Line Monolith

Let’s apply this "Keystroke Capital" and scientific mindset to a massive architectural challenge.

Context:

At Malt (where I work), we are a scale-up with ~100 engineers. We operate a massive monolith with over 2.3 million lines of code.

The Symptom: CI/CD latency was destroying our velocity. Feedback loops were getting longer.

The "Intuitive" Solution: "The build is slow, let's optimize the compiler or buy bigger instances."

The Scientific Approach:

Staff Engineer Nicolas Grisey Demengel didn't trust the intuition. He dug into the data.

  1. Observation: The build time for a single app wasn't the main issue. The issue was the queue depth.
  2. Data Point: When Team A deployed App A, it triggered builds for App B, C, and D.
  3. Root Cause: We had a "coupling" problem. Shared libraries were too entangled. A minor change in a utility library triggered a "rebuild the world" event.

The Metric that Mattered:

Nicolas didn't just start "decoupling" at random (which would have cost millions of keystrokes). He established a specific metric:

Impact Radius = frequency of library modification × number of dependent apps triggered.

This data visualization highlighted the "Hotspots": the specific libraries causing 80% of the CI congestion.
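A hedged sketch of how such a metric could be computed: the library names and figures below are invented, and only the formula comes from the article.

```python
# Hypothetical Impact Radius computation:
#   impact_radius = modification frequency × dependent apps triggered
# Library names and numbers are invented for illustration.

libraries = {
    # name: (commits touching it per week, downstream apps rebuilt per change)
    "commons-util":  (25, 40),
    "payment-core":  (10, 6),
    "search-client": (3, 12),
}

def impact_radius(freq: int, dependents: int) -> int:
    return freq * dependents

# Rank libraries by impact radius to find the CI "hotspots".
hotspots = sorted(
    ((name, impact_radius(f, d)) for name, (f, d) in libraries.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in hotspots:
    print(f"{name:14} impact radius = {score}")
```

The ranking makes the Pareto effect visible: one entangled utility library can dwarf everything else, which tells you exactly where to spend your decoupling budget.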

The Result:

Instead of a multi-year "rewrite everything" project, the team focused their limited "Keystroke Capital" solely on these hotspots. They set quantitative thresholds (e.g., "Max 5 downstream builds per commit").
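A quantitative threshold like "max 5 downstream builds per commit" is most useful when enforced automatically. Here is a minimal, hypothetical CI gate; the function name and the failing-commit mechanics are assumptions, not Malt's actual tooling.

```python
# Toy CI gate for the coupling threshold quoted above.
# The function name and usage are hypothetical, not Malt's real pipeline.

MAX_DOWNSTREAM_BUILDS = 5

def check_coupling_budget(downstream_apps: list[str]) -> None:
    """Fail the pipeline when a commit fans out past the agreed threshold."""
    if len(downstream_apps) > MAX_DOWNSTREAM_BUILDS:
        raise SystemExit(
            f"Commit triggers {len(downstream_apps)} downstream builds "
            f"(budget: {MAX_DOWNSTREAM_BUILDS}). Decouple before merging."
        )

check_coupling_budget(["app-a", "app-b", "app-c"])  # within budget, passes
```

Turning the metric into a hard gate prevents the coupling from silently regressing once the initial cleanup is done.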

This is the difference between an endless refactoring crusade and a targeted engineering strike.

(You can read the full technical breakdown on the Malt Engineering Blog)

Where Do You Stand? The Data Maturity Check

To bridge the gap between "Intuition Engineering" and "Scientific Engineering," you need to honestly assess your team's current data culture. We can simplify the common 5-level maturity models into four essential stages defined by their symptoms.

Stage 1: Flying Blind (Unaware & Aware Symptoms)

This stage is defined by a complete lack of, or a highly chaotic, data infrastructure. You can't prove anything.

  • Engineering:
    • No standardized monitoring or APM (Application Performance Monitoring).
    • Performance testing is manual and sporadic.
    • Logs are present but not aggregated (you must SSH to check them).
    • No recognized Engineering metrics are tracked (e.g., DORA metrics, Cycle Time, etc.).
  • Product:
    • The product is uninstrumented; you don't track usage analytics.
    • Objectives are all qualitative ("Ship this feature").
    • Core KPIs (conversion, churn, activation) are unknown or estimated.

The actions here are obvious: train the whole team on the importance of data, and put tools in place to collect measurements.

Stage 2: Reactive (Reactive Symptoms)

Data tools are deployed, but the team uses them only when something is already broken. The quality and reliability of the data are low.

  • Data Quality Issues:
    • Log levels are inconsistent (e.g., no clear policy on what constitutes a WARN vs. ERROR).
    • Data sources are often contradictory.
    • Tracking (analytics) is in place but often broken or not backed by a tracking plan.
  • Alert Fatigue:
    • Alerting is poorly configured, leading to high false positives (a "noisy pager").
    • Teams ignore alerts or temporarily mask them to restore silence.
  • Inconsistent Usage:
    • KPI definitions change frequently between departments.
    • Post-mortems rely on guesswork because the pre-incident data was not being actively monitored.

Actionable Step: A dedicated owner (or team) must be appointed to enforce data standards and quality.

Stage 3: Predictive & Managed (Internal Strategy)

The engineering and product teams act scientifically, and data is critical, but its value is mostly confined to the tech organization. Data drives internal decision-making.

  • Engineering Focus:
    • SLA/SLO/SLI are clearly defined, understood, and actively managed.
    • Data is central to project kick-offs (e.g., "If we refactor this, the goal is P95 latency improvement from X to Y").
    • You know the cost-per-user of your infrastructure (nominal cost in machine resources).
    • Engineering metrics (DORA, deployment frequency) are communicated to the Executive level (ComEx).
  • Product Focus:
    • Every new project has success metrics defined upfront.
    • Product operating costs are linked to the business for budget calculations.
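As one concrete illustration of the Stage 3 mindset, here is a minimal error-budget calculation for an availability SLO. The target and request counts are invented for illustration; the error-budget concept itself is standard SRE practice.

```python
# Minimal error-budget arithmetic for an availability SLO.
# All figures are illustrative, not Malt's actual numbers.

SLO_TARGET = 0.999  # 99.9% of requests must succeed this period

def error_budget_remaining(total: int, failed: int) -> float:
    """Fraction of the period's error budget left (can go negative)."""
    allowed_failures = total * (1 - SLO_TARGET)
    return 1 - failed / allowed_failures

# 1M requests this month, 400 failures: the budget allows 1,000 failures.
print(f"{error_budget_remaining(1_000_000, 400):.0%}")  # 60% of budget left
```

A team at this stage uses the remaining budget, not gut feeling, to decide whether to ship features or invest in stability this sprint.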

Stage 4: Effective & Strategic (Company-Wide Impact)

Data is recognized as a critical competitive asset across the entire company. Its quality and governance are formalized and leveraged externally.

  • Company Alignment:
    • The importance of data and its quality is recognized throughout the entire company, not just in tech.
    • All teams (Sales, Marketing, Finance) use the same definitions and central data sources for KPIs.
  • External Impact:
    • You are confident enough in your stability to publish public SLAs (Service Level Agreements).
    • You can provide detailed status pages with stability breakdowns by service (e.g., "Failure in progress on payment component, search engine is OK").
    • Data can provide a competitive advantage, potentially through publishing data in open-data formats.

Conclusion

At Malt, we push for Stage 3 (EDIT: in 2023). We store business events in BigQuery, aggregate monitoring in Datadog, and use analytics to drive documentation updates. It’s not just about tools; it’s about respect for our own resources.

Questions for you

  • How do you quantify "tech debt" payoff in your teams?
  • Do you use quantitative objectives? If not, could you add some?
  • What would be your maturity level on the above model?
  • Do you know your cost per user, and how it has evolved over the last two years?
  • Are your engineering metrics known outside your team?
