The global digital transformation and e-commerce boom continue to drive up expectations for IT quality, especially regarding performance and user experience. Yet the current reality is sobering:
77% of consumers have stopped using certain digital services or uninstalled apps due to performance issues.
This underscores the growing need for strategies and solutions to achieve effective Application Performance Management (APM). Operations teams, developers, and management all require end-to-end visibility into their IT architectures and reliable tools to maintain business continuity with minimal resource expenditure. Given today’s complexity, traditional threshold-based monitoring is no longer sufficient.
In complex environments such as online retail, cloud ecosystems, microservices, and diverse applications, Observability is now essential for maintaining performance, particularly for business-critical processes.
IT Budgets Shrink, Yet Digitalization Continues
Despite continued investments in industrial and business-process digitalization, IT budgets are tightening. In 2023, the average IT budget accounted for just 3.6% of total company revenue, down from 4.2% in 2022.
This share varies by industry: finance and electrical engineering sectors typically spend above the average, while construction and mechanical engineering fall below.
Security remains the top priority for about 80% of companies, due to increasingly sophisticated cyberattacks. Yet 59% aim to invest more in IT operations to better meet evolving customer demands and boost competitiveness, particularly through improved e-commerce performance, stability, flexibility, and scalability for enhanced customer experience.
To achieve this, many decision-makers are renewing their IT infrastructure. MACH architectures (Microservices, API-first, Cloud-native, Headless) are becoming the standard over all-in-one solutions. Their scalability and adaptability make them ideal for dynamic cloud environments and fast customer journeys.
Observability as a Prerequisite for Stable IT
When infrastructures are consolidated or expanded, effective monitoring tools become essential. Without them, IT stability and user experience are nearly impossible to maintain.
Modern architectures such as clouds, microservices, serverless apps, and hybrid or on-prem environments generate large volumes of anomalies. On top of this, the rise of AI adds a new layer of complexity.
New Challenge: AI and Shadow AI
According to Forrester, 2024 marks the start of an era of “Intentional AI”, moving beyond the hype toward concrete strategic use. 67% of companies plan to integrate Generative AI into their overall AI strategies.
Yet another trend is emerging: Shadow AI—the unauthorized use of AI tools by employees. Around 60% of employees globally are expected to use AI tools at work without approval.
A Salesforce study reveals:
- 52% of German employees have used unauthorized GenAI tools
- 34% brought their own, officially banned AI tools
This “Bring Your Own AI” (BYOAI) trend is poised to explode. Shadow AI introduces not just security risks but new challenges in performance monitoring. While usage can be limited through internal policies, strategic AI deployments (via MLOps) require robust performance management and monitoring frameworks to ensure business continuity.
Traditional Monitoring Captures Only 1% of Anomalies
Operating containers, microservices, clouds, and AI systems dramatically increases complexity, and data volumes can reach terabytes.
This overwhelms traditional monitoring, which typically detects only about 1% of anomalies—the “known knowns”.
The real issue lies in the “unknown unknowns”: unexpected, unfamiliar anomalies that defy standard analysis.
Traditional monitoring tools might detect that “something is wrong” but often can’t explain what or why. Troubleshooting these unknowns is slow and costly, often leading to hours or even days of poor performance and critical user experience issues.
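To make that limitation concrete, here is a minimal sketch of the kind of static rule traditional monitoring relies on; the metric names and limits are illustrative assumptions, not taken from any specific tool. A rule like this only fires on conditions someone anticipated in advance, which is exactly why the unknown unknowns slip through.

```python
# Minimal sketch of a static, threshold-based check (the "known knowns").
# Metric names and limits are illustrative assumptions. The rule only fires
# on conditions someone anticipated; an unfamiliar failure mode that stays
# below every limit goes unnoticed.

THRESHOLDS = {
    "cpu_usage_percent": 90.0,
    "response_time_ms": 500.0,
}

def check_thresholds(sample: dict) -> list[str]:
    """Return an alert message for every metric that crosses its fixed limit."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds limit {limit}")
    return alerts

# A degraded checkout flow that stays just below every limit raises nothing.
print(check_thresholds({"cpu_usage_percent": 55.0, "response_time_ms": 480.0}))  # -> []
```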
In digital business, especially e-commerce, such delays are unacceptable. As Greg Linden, creator of Amazon’s recommendation engine, once said:
“100 milliseconds of latency costs Amazon 1% in revenue.”
Hence, businesses must implement Observability with real-time monitoring, automated root cause analysis, and a low mean time to resolution (MTTR), even for complex incidents.
Optimal APM with Observability
Observability takes performance monitoring to the next level by integrating and analyzing logs, metrics, and traces:
- Logs: Show events, errors, user and device info
- Metrics: Show system behavior (e.g., CPU usage, transaction volumes)
- Traces: Show how long requests take and where bottlenecks occur
Combined, these signals provide deep insight and transparency. For example, metrics might show slow response times, while logs reveal the cause, such as unusually complex transactions being processed at that moment.
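As a rough illustration of how these three signals are emitted from application code, the following sketch uses the OpenTelemetry Python API; the article does not prescribe a specific toolkit, and the service names, attribute keys, and omitted exporter setup are assumptions.

```python
# Minimal sketch of instrumenting one request path with logs, metrics, and
# traces via the OpenTelemetry Python API (an assumption; no toolkit is named
# in the article). Provider/exporter setup is omitted, so this runs as a
# no-op until an SDK backend is configured.
import logging
import time

from opentelemetry import trace, metrics

logger = logging.getLogger("shop.checkout")
tracer = trace.get_tracer("shop.checkout")             # traces: where time is spent
meter = metrics.get_meter("shop.checkout")             # metrics: system behavior
order_counter = meter.create_counter("orders_processed")
latency_ms = meter.create_histogram("checkout_latency_ms")

def process_order(order_id: str) -> None:
    start = time.perf_counter()
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)        # trace context for this request
        logger.info("processing order %s", order_id)    # log: the event and its details
        # ... business logic would run here ...
        order_counter.add(1)                            # metric: transaction volume
    latency_ms.record((time.perf_counter() - start) * 1000)
```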
To implement Observability, entire systems must be instrumented to collect data. Due to the data volumes, AI and ML are key for real-time analysis and insight generation.
AI-powered systems (see the sketch after this list):
- Understand historical system states
- Compare them to real-time data
- Identify anomalies
- Assign known issues to auto-remediation
- Escalate unknowns to specialists
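A minimal sketch of this detect-and-route loop is shown below, using a simple rolling baseline (mean and standard deviation) as a stand-in for the ML models a real Observability platform would apply; the runbook mapping and thresholds are illustrative assumptions.

```python
# Minimal sketch of the detect-and-route loop: learn a baseline from history,
# compare real-time values against it, auto-remediate known issues, and
# escalate unknowns. The known-issue runbook and z-score limit are assumptions.
from collections import deque
from statistics import mean, stdev

KNOWN_ISSUES = {"queue_backlog": "auto-scale workers"}   # known issue -> remediation

class AnomalyRouter:
    def __init__(self, window: int = 100, z_limit: float = 3.0):
        self.history = deque(maxlen=window)              # historical system state
        self.z_limit = z_limit

    def observe(self, value: float, label: str = "") -> str:
        decision = "ok"
        if len(self.history) >= 10:
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:  # compare to baseline
                if label in KNOWN_ISSUES:
                    decision = f"auto-remediate: {KNOWN_ISSUES[label]}"
                else:
                    decision = "escalate to specialist"               # unknown unknown
        self.history.append(value)
        return decision
```

In production the baseline would be far richer (seasonality, multivariate signals), but the routing decision stays the same: remediate known issues automatically and hand the rest to specialists.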
Observability dramatically improves the detection, understanding, and resolution of unexpected anomalies (the “unknown unknowns”), thus ensuring seamless operations and a top-tier user experience.
Business Benefits of Observability
Beyond performance, Observability offers significant business value:
- Customer behavior data can be linked to technical data to measure IT’s direct contribution to business outcomes
- Predictive analytics can forecast system loads, enabling proactive scaling (see the sketch below)
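As a sketch of the predictive-scaling idea, the following example fits a simple linear trend to recent load samples and derives an instance count; the sample data, capacity figure, and forecast horizon are illustrative assumptions rather than a recommended model.

```python
# Minimal sketch of predictive scaling: extrapolate recent load with a
# least-squares trend and provision capacity before the forecast is reached.
# All numbers are illustrative assumptions.
import numpy as np

def forecast_load(samples: list[float], steps_ahead: int = 12) -> float:
    """Extrapolate requests/minute with a linear trend fitted to recent samples."""
    x = np.arange(len(samples))
    slope, intercept = np.polyfit(x, samples, deg=1)
    return slope * (len(samples) + steps_ahead) + intercept

recent = [1200, 1260, 1335, 1410, 1490, 1570]   # requests per minute
capacity_per_instance = 800

predicted = forecast_load(recent)
instances_needed = int(np.ceil(predicted / capacity_per_instance))
print(f"forecast {predicted:.0f} req/min -> provision {instances_needed} instances")
```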
Observability also frees up DevOps teams by streamlining issue resolution and shortening release cycles. Logs, metrics, and traces can be integrated with CI/CD pipelines to test the performance impact of changes early, before issues reach production.
This gives developers more time for innovation and value-creation.
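One possible shape of such a pipeline gate is sketched below: the build fails if the release candidate’s p95 latency regresses beyond a budget compared with a stored baseline. The file names, the chosen metric, and the 10% budget are illustrative assumptions.

```python
# Minimal sketch of a CI/CD performance gate: compare the candidate build's
# p95 latency against a stored baseline and fail the build on regression.
# File names and the 10% budget are illustrative assumptions.
import json
import sys

REGRESSION_BUDGET = 0.10   # allow at most a 10% slowdown of p95 latency

def check_performance(baseline_file: str, candidate_file: str) -> int:
    with open(baseline_file) as f:
        baseline = json.load(f)["p95_latency_ms"]
    with open(candidate_file) as f:
        candidate = json.load(f)["p95_latency_ms"]
    regression = (candidate - baseline) / baseline
    if regression > REGRESSION_BUDGET:
        print(f"FAIL: p95 latency up {regression:.0%} ({baseline} -> {candidate} ms)")
        return 1
    print(f"OK: p95 latency change {regression:+.0%}")
    return 0

if __name__ == "__main__":
    sys.exit(check_performance("baseline_metrics.json", "candidate_metrics.json"))
```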
Observability and Generative AI: The Road Ahead
Generative AI brings added complexity to APM. It demands handling broader data diversity and volume. Full observability into AI systems is still evolving.
As Cory Minton, Head of Observability Strategy at Splunk, puts it:
“We need to figure out how to extract metrics, logs, and traces from AI. Does it behave as expected? If not, how do we build and fine-tune MLOps pipelines to keep functionality intact?”
Modern observability systems must rise to this challenge – with or without integrated GenAI.
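As a hedged sketch of what “metrics, logs, and traces from AI” could mean in practice, the example below wraps a model invocation and records latency and rough token counts; call_model is a hypothetical stand-in for whichever model client is used, and no specific vendor API is implied.

```python
# Minimal sketch of extracting telemetry from a generative-AI call: wrap the
# invocation, measure latency, approximate token counts, and log the outcome.
# `call_model` is a hypothetical model client, not a specific vendor API.
import logging
import time

logger = logging.getLogger("genai")

def observed_completion(call_model, prompt: str) -> str:
    """Invoke the model and emit basic metrics/logs for the request."""
    start = time.perf_counter()
    try:
        response = call_model(prompt)                    # hypothetical model client
        latency = (time.perf_counter() - start) * 1000
        # Whitespace splitting is only a rough proxy for real token counts.
        logger.info(
            "genai request ok: latency_ms=%.0f prompt_tokens=%d completion_tokens=%d",
            latency, len(prompt.split()), len(response.split()),
        )
        return response
    except Exception:
        logger.exception("genai request failed after %.0f ms",
                         (time.perf_counter() - start) * 1000)
        raise
```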
Strong Return on Investment
We are entering a new era of performance engineering. While not all anomaly resolution can be automated, Observability is ideal for performance management:
- It identifies and classifies unknown issues
- Routes them directly to the right specialist
- Frees up expert resources from manual data triage
Companies are increasingly investing in Observability to achieve business goals and boost profitability. Surveys show:
On average, every €1 invested in Observability returns €2 – it pays off.
References
1 – https://survey.zohopublic.eu/zs/HETsVe
2 – https://www.forrester.com/blogs/prognosen-2024-generative-ki-de/
3 – https://www.salesforce.com/news/stories/ai-at-work-research/
4 – https://www.splunk.com/de_de/form/it-predictions.html
5 – https://www.splunk.com/de_de/form/it-predictions.html