Application Performance Monitoring: Beyond the Health Check Endpoint
Real application performance monitoring — distributed tracing, Core Web Vitals, database query analysis, and building performance dashboards that surface actionable insights.

James Ross Jr.
Strategic Systems Architect & Enterprise Software Developer
A health check endpoint that returns 200 tells you your application process is running. It tells you nothing about whether your application is performing well, why users might be experiencing slow response times, which database queries are responsible for 40% of your API latency, or how your application's performance has changed since the last deployment.
Performance monitoring is about answering those questions with data. Here is how I set it up properly.
The Performance Metrics That Matter
Performance monitoring starts with defining what "performance" means for your specific application. For a web API, the relevant metrics are:
Response time by endpoint — not just average, but p50, p90, p99, and p99.9. The average latency for your API might be 45ms. The p99 might be 1,200ms. Those are wildly different user experiences, and the average tells you almost nothing about the tail.
Database query time — which queries are slow, how frequently they run, and whether query performance is consistent or variable (variable indicates table scan behavior that degrades as data grows).
External API call latency — every call to a third-party service is a latency source you do not control. You need to know which external calls are the slowest and what happens when they time out.
Error rate by endpoint — percentage of requests that return 4xx or 5xx responses, broken down by endpoint.
Throughput — requests per second, showing your load patterns and helping you correlate performance changes with traffic changes.
Distributed Tracing: Following a Request Through Your System
For applications that span multiple services — an API that calls other APIs, reads from a database, puts items on a queue — you need distributed tracing to understand where time is spent within a request.
OpenTelemetry is the standard. It provides vendor-neutral instrumentation libraries that export trace data to any compatible backend (Jaeger, Zipkin, Datadog, Honeycomb, Grafana Tempo).
Instrument your Node.js application:
// instrumentation.ts — must be loaded before anything else
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { ExpressInstrumentation } from "@opentelemetry/instrumentation-express";
import { PgInstrumentation } from "@opentelemetry/instrumentation-pg";

const sdk = new NodeSDK({
  serviceName: "payment-api",
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
    new PgInstrumentation(),
  ],
});

sdk.start();
Load this before your application code. Since the instrumentation file is TypeScript, compile it to JavaScript first; node --require preloads the compiled file before your entry point:
{
  "scripts": {
    "start": "node --require ./instrumentation.js src/index.js"
  }
}
With this in place, every HTTP request creates a trace with spans for each operation: the incoming HTTP request, each database query, each outbound HTTP call. You can see the waterfall of operations for any request — total time, time per operation, where the bottleneck is.
A trace that shows a 1,200ms API response might reveal: 5ms routing overhead, 800ms for a single database query, 350ms for an external API call, 45ms for everything else. The database query is the bottleneck. That is actionable.
Database Query Performance
Database performance degrades silently. A query that takes 10ms with 10,000 rows takes 1,200ms with 1 million rows if it is doing a sequential scan. Unless you are watching query times over time, you will not notice until users start complaining.
Enable Postgres slow query logging:
# In postgresql.conf, or via ALTER SYSTEM SET in psql
log_min_duration_statement = 100   # log queries taking over 100ms
log_statement = 'none'
This logs every query that takes over 100ms. Review these logs weekly. Any query appearing regularly in slow query logs needs an index or query optimization.
For more sophisticated analysis, use pg_stat_statements, which tracks execution statistics for all queries. Note that the module must be listed in shared_preload_libraries (requiring a server restart) before the extension can be created:
CREATE EXTENSION pg_stat_statements;
-- Find the slowest queries by total time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
The total_exec_time column shows you which queries are consuming the most cumulative time — even if individual calls are fast, a query called 10,000 times at 50ms each totals 500 seconds of database time. These high-call-count queries are worth optimizing even if the individual execution time seems acceptable.
Use EXPLAIN ANALYZE on slow queries to see the query plan:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
ORDER BY created_at DESC
LIMIT 10;
If the plan shows "Seq Scan" on a large table, you need an index; for this query, a composite index such as CREATE INDEX ON orders (user_id, created_at DESC); lets Postgres read the ten newest rows for a user directly. If it shows "Index Scan" but is still slow, the index might not be selective enough, or the query might be returning too many rows.
Frontend Performance with Core Web Vitals
Backend latency is only part of user-perceived performance. The frontend rendering pipeline — JavaScript execution, CSS parsing, image loading, layout calculation — contributes significantly to what users actually experience.
Core Web Vitals are Google's standardized metrics for user experience:
Largest Contentful Paint (LCP) — when does the main content load? Target under 2.5 seconds.
Interaction to Next Paint (INP) — how quickly does the page respond to user interaction? Target under 200ms.
Cumulative Layout Shift (CLS) — how much does the layout jump around as content loads? Target under 0.1.
Measure these from real user sessions, not from synthetic Lighthouse tests. Lighthouse on a fast developer laptop with a fast internet connection is not representative of your users' experience. Use the Chrome User Experience Report, or install a Real User Monitoring (RUM) tool.
For Nuxt and Next.js applications, Vercel Analytics and Netlify Analytics provide Core Web Vitals data from real users. For custom deployment targets, integrate the web-vitals library:
// web-vitals v3+ renamed the getters to on* and replaced FID with INP
import { onCLS, onINP, onFCP, onLCP, onTTFB } from "web-vitals";

function sendToAnalytics(metric: { name: string; value: number; delta: number }) {
  // keepalive lets the request complete even if the page is unloading
  fetch("/api/metrics", {
    method: "POST",
    body: JSON.stringify(metric),
    headers: { "Content-Type": "application/json" },
    keepalive: true,
  });
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onFCP(sendToAnalytics);
onLCP(sendToAnalytics);
onTTFB(sendToAnalytics);
Collect these metrics in your analytics database and build a dashboard showing p75 values for each metric over time. The target for Core Web Vitals is the 75th percentile — you want 75% of your users to have a good experience.
Performance Regression Detection
The most valuable performance monitoring is detecting regressions immediately after deployment. Set up a performance comparison between your last production deployment and the current one.
After every deployment, run a synthetic load test against your staging environment and compare key endpoint latencies to the baseline:
# Using k6 for a basic load test
k6 run --env BASE_URL=https://staging.myapp.com scripts/load-test.js
If p99 latency for your critical endpoints increased by more than 20% compared to the previous deployment, that is a regression. Catch it in staging before it reaches production.
Set up a deployment annotation in your monitoring dashboards. Every time a deployment happens, mark it on your performance graphs. This makes correlating performance changes with deployments trivial — you can see exactly when a latency spike started and match it to the deployment that caused it.
Building the Performance Dashboard
A single dashboard with six charts covers the performance visibility I want for most applications:
- API p50/p90/p99 latency (last 24 hours, rolling)
- Error rate by endpoint (last 24 hours)
- Requests per second (last 24 hours)
- Top 10 slowest average database queries (last hour)
- LCP p75 from real users (last 7 days)
- External API call latency by provider (last 24 hours)
These six panels answer "how is my application performing" across the full stack. Any significant degradation shows up in at least one of these panels.
If you want help setting up performance monitoring for your application or have a specific performance problem you are trying to diagnose, book a session at https://calendly.com/jamesrossjr.