Display Health Monitoring
Real-time monitoring and analytics for display performance, uptime, and health status to ensure reliable digital signage operations.
Overview
Display Health Monitoring provides comprehensive insights into display performance, connection status, and operational health. Track uptime, diagnose issues, and maintain optimal display performance across your entire network.
Key Features:
- Real-time health status
- Uptime and downtime tracking
- Connection monitoring
- Error and warning alerts
- Performance metrics
- Historical health data
- Automated health checks
Display Health Status
Status Indicators
Online (Green):
- Display actively connected
- Content rendering normally
- Recent heartbeat received (< 2 minutes)
- No critical errors
Degraded (Yellow):
- Intermittent connectivity
- Performance issues detected
- Minor errors present
- Slow response times
Offline (Red):
- No connection established
- Heartbeat timeout (> 5 minutes)
- Display unreachable
- Critical errors
Unknown (Gray):
- Newly registered
- No health data yet
- Awaiting first connection
Health Check Frequency
Automatic Checks:
- Heartbeat: Every 60 seconds
- Connection status: Every 30 seconds
- Performance metrics: Every 5 minutes
- Error logs: Real-time
- Daily summary: 00:00 UTC
Health Events
Event Types
CONNECTED
- Display establishes connection
- Session created
- Initial handshake successful
- Configuration synced
DISCONNECTED
- Display loses connection
- Session ended
- Network interruption
- Intentional disconnect
ERROR
- Application error occurred
- Content load failed
- Configuration error
- Runtime exception
WARNING
- Minor issue detected
- Performance degradation
- Resource constraint
- Configuration mismatch
PERFORMANCE_DEGRADED
- Slow response times
- High CPU/memory usage
- Network latency detected
- Frame rate drops
NETWORK_ISSUE
- Intermittent connectivity
- Packet loss detected
- DNS resolution failure
- Bandwidth limitation
CONTENT_LOAD_FAILED
- Media file unreachable
- Integration timeout
- Invalid content format
- Storage access error
LAYOUT_RENDER_ERROR
- Layout parsing failed
- Invalid configuration
- CSS/styling error
- Rendering exception
Event Severity Levels
INFO
- Normal operations
- Routine events
- Status updates
- Configuration changes
WARNING
- Non-critical issues
- Performance concerns
- Resource warnings
- Configuration notices
ERROR
- Recoverable errors
- Content failures
- Temporary issues
- Retry-able problems
CRITICAL
- System failures
- Unrecoverable errors
- Security issues
- Data corruption
Health Metrics
Uptime Metrics
Uptime Percentage:
- Calculated per day/week/month
- Formula:
(uptime seconds / total seconds) × 100 - Target: 99%+ for production displays
- Excludes scheduled maintenance
Uptime Seconds:
- Total time display online
- Continuous connection duration
- Aggregated daily
Downtime Seconds:
- Total time display offline
- Disconnection duration
- Excludes scheduled maintenance
Example Calculation:
Day: 24 hours = 86,400 seconds
Uptime: 85,500 seconds
Downtime: 900 seconds (15 minutes)
Uptime %: (85,500 / 86,400) × 100 = 98.96%
Connection Metrics
Connection Count:
- Total connections established
- Session creation events
- Reconnections included
Disconnection Count:
- Total disconnections
- Normal and abnormal
- Network interruptions
Average Connection Duration:
- Mean time between connects/disconnects
- Measured in seconds
- Indicates stability
Connection Quality:
- Excellent: < 5 disconnections/day
- Good: 5-20 disconnections/day
- Fair: 20-50 disconnections/day
- Poor: > 50 disconnections/day
Error Metrics
Error Count:
- Total ERROR severity events
- Excludes warnings
- Per day aggregation
Warning Count:
- Total WARNING severity events
- Non-critical issues
- Per day aggregation
Critical Count:
- Total CRITICAL severity events
- Urgent attention required
- Immediate alerts sent
Error Rate:
- Errors per hour of uptime
- Target: < 1 error/hour
- Calculated:
total errors / uptime hours
Performance Metrics
Average Response Time:
- Mean API response latency
- Heartbeat round-trip time
- Measured in milliseconds
- Target: < 200ms
Peak Response Time:
- Maximum latency observed
- Slowest request in period
- Indicates worst-case performance
Frame Rate:
- Content rendering FPS
- Target: 30 FPS minimum
- Drops indicate performance issues
Resource Usage:
- CPU utilization percentage
- Memory consumption
- Network bandwidth usage
- Storage space available
Monitoring Dashboard
Display Health Overview
At-a-Glance Status:
- Total displays count
- Online displays count
- Offline displays count
- Degraded displays count
Health Distribution:
Healthy: 25 displays (83%)
Warning: 4 displays (13%)
Critical: 1 display (3%)
Unknown: 0 displays (0%)
Recent Events:
- Last 24 hours event timeline
- Critical alerts highlighted
- Filterable by severity
- Quick issue identification
Individual Display Health
Status Card:
- Display name and location
- Current status indicator
- Last heartbeat timestamp
- Uptime today/this week
Quick Stats:
Uptime Today: 23h 45m (98.96%)
Connection Quality: Excellent
Error Count: 2 warnings
Last Error: 4 hours ago
Health Timeline:
- 24-hour visual timeline
- Color-coded status periods
- Event markers
- Zoom and filter options
Historical Data
Date Range Selection:
- Last 24 hours
- Last 7 days
- Last 30 days
- Custom date range
Trend Analysis:
- Uptime trends over time
- Error frequency patterns
- Performance degradation
- Connection stability
Comparative Analysis:
- Compare multiple displays
- Store-level aggregates
- Account-wide health
- Benchmark against averages
Alerts and Notifications
Alert Triggers
Display Offline:
- Trigger: No heartbeat for 5 minutes
- Severity: CRITICAL
- Actions: Email, SMS, Slack
Uptime Below Threshold:
- Trigger: < 95% uptime over 24 hours
- Severity: WARNING
- Actions: Email notification
High Error Rate:
- Trigger: > 10 errors per hour
- Severity: ERROR
- Actions: Email, dashboard alert
Performance Degraded:
- Trigger: Response time > 1000ms
- Severity: WARNING
- Actions: Dashboard notification
Connection Instability:
- Trigger: > 10 disconnects per hour
- Severity: WARNING
- Actions: Email notification
Alert Configuration
Per-Display Settings:
Display: Front Window Display
Alerts Enabled: Yes
Offline Alert:
Threshold: 5 minutes
Notify: [email protected], (555) 123-4567
Uptime Alert:
Threshold: 95%
Period: 24 hours
Notify: [email protected]
Error Rate Alert:
Threshold: 10 errors/hour
Notify: [email protected]
Alert Frequency:
- Immediate: Critical events
- Digest: Warnings (hourly summary)
- Daily Report: Health summary
- Weekly Report: Trend analysis
Notification Channels
Email:
- Detailed alert information
- Links to dashboard
- Historical context
- Recommended actions
SMS:
- Critical alerts only
- Brief summary
- Direct dashboard link
- Opt-in required
Slack/Discord:
- Real-time notifications
- Threaded conversations
- Status updates
- Team collaboration
Webhook:
- Custom integrations
- Third-party monitoring
- Automation triggers
- JSON payload
Troubleshooting with Health Data
Display Won't Connect
Check Health Events:
- View event log for display
- Look for NETWORK_ISSUE events
- Check DISCONNECTED event details
- Review error messages
Common Causes:
- Network connectivity issues
- Firewall blocking connection
- Invalid session token
- Display device offline
Resolution Steps:
- Verify network connection
- Check firewall rules
- Revoke and regenerate session
- Restart display device
Frequent Disconnections
Analyze Connection Metrics:
- Connection count vs. disconnection count
- Average connection duration
- Time of day pattern
- Network environment
Possible Causes:
- Unstable network
- Power interruptions
- Display device issues
- Router/switch problems
Solutions:
- Wired vs. wireless comparison
- Network diagnostics
- Display device upgrade
- Network infrastructure review
Poor Performance
Review Performance Metrics:
- Average response time trends
- Peak response time patterns
- Error frequency correlation
- Resource usage levels
Performance Issues:
- High latency (> 500ms)
- Frequent content load failures
- Slow rendering
- Memory leaks
Optimization Steps:
- Reduce content complexity
- Optimize media file sizes
- Check network bandwidth
- Update display software
- Review layout configuration
Content Not Displaying
Check Error Events:
- CONTENT_LOAD_FAILED events
- LAYOUT_RENDER_ERROR events
- Error details and timestamps
- Retry attempts
Common Errors:
Error: CONTENT_LOAD_FAILED
Message: Failed to load image from URL
Details: HTTP 404 Not Found
URL: https://example.com/image.jpg
Timestamp: 2025-10-14 14:32:15 UTC
Resolution:
- Verify content source URL
- Check content source permissions
- Test content in browser
- Review content configuration
- Check integration status
API Endpoints
Get Display Health Summary
GET /api/analytics/display-health/:displayId
Query Parameters:
startDate(required): ISO 8601 dateendDate(required): ISO 8601 date
Response:
{
"displayId": "disp_abc123",
"displayName": "Front Window Display",
"period": {
"startDate": "2025-10-01T00:00:00Z",
"endDate": "2025-10-14T23:59:59Z"
},
"uptime": {
"uptimeSeconds": 1196400,
"downtimeSeconds": 2400,
"uptimePercentage": 99.80,
"target": 99.00
},
"connections": {
"connectionCount": 42,
"disconnectionCount": 41,
"avgConnectionDuration": 28600
},
"errors": {
"totalErrors": 12,
"warningCount": 10,
"errorCount": 2,
"criticalCount": 0,
"errorRate": 0.036
},
"performance": {
"avgResponseTime": 145,
"peakResponseTime": 850
},
"status": "healthy",
"lastHeartbeat": "2025-10-14T14:35:22Z"
}
Get Recent Health Events
GET /api/displays/:displayId/health-events
Query Parameters:
limit(optional): Max events to return (default: 100)severity(optional): Filter by severity (INFO, WARNING, ERROR, CRITICAL)eventType(optional): Filter by event typestartDate(optional): ISO 8601 dateendDate(optional): ISO 8601 date
Response:
{
"displayId": "disp_abc123",
"events": [
{
"id": "evt_xyz789",
"eventType": "CONTENT_LOAD_FAILED",
"severity": "ERROR",
"message": "Failed to load video content",
"metadata": {
"contentSourceId": "cs_123",
"url": "https://example.com/video.mp4",
"errorCode": "NETWORK_TIMEOUT"
},
"timestamp": "2025-10-14T14:32:15Z"
}
],
"total": 247,
"page": 1,
"limit": 100
}
Best Practices
Proactive Monitoring
Daily Checks:
- Review overnight health summary
- Check critical alerts
- Verify all displays online
- Address warnings promptly
Weekly Reviews:
- Uptime trend analysis
- Error pattern identification
- Performance benchmarking
- Capacity planning
Monthly Audits:
- Health metrics review
- Alert configuration tuning
- Performance optimization
- Hardware replacement planning
Alert Configuration
Start Conservative:
- Enable critical alerts first
- Add warnings gradually
- Tune thresholds based on data
- Avoid alert fatigue
Threshold Recommendations:
- Offline alert: 5-10 minutes
- Uptime threshold: 95-99%
- Error rate: 5-10 errors/hour
- Response time: 500-1000ms
Escalation Policy:
1. Critical Alert → Immediate notification (email + SMS)
2. No Response (15 min) → Escalate to manager
3. No Response (30 min) → Escalate to on-call
4. No Resolution (1 hour) → Executive notification
Performance Optimization
Regular Maintenance:
- Weekly: Review error logs
- Monthly: Performance analysis
- Quarterly: Hardware assessment
- Annually: Full system audit
Optimization Opportunities:
- Displays with > 2% downtime
- Error rate > 1 per hour
- Response time > 300ms
- Connection instability
Resource Planning:
- Monitor uptime trends
- Plan for peak loads
- Schedule maintenance windows
- Capacity expansion timing
Integrations
Third-Party Monitoring
Datadog Integration:
- Export health metrics
- Custom dashboards
- Alert routing
- Log aggregation
PagerDuty Integration:
- Incident management
- On-call rotation
- Escalation policies
- Incident timeline
Slack Integration:
- Real-time notifications
- Team collaboration
- Status updates
- Command shortcuts
Webhooks
Configure Webhook:
{
"url": "https://your-system.com/webhook",
"events": [
"display.offline",
"display.online",
"display.error.critical"
],
"secret": "your-webhook-secret"
}
Webhook Payload:
{
"event": "display.offline",
"timestamp": "2025-10-14T14:35:22Z",
"display": {
"id": "disp_abc123",
"name": "Front Window Display",
"location": "Main Street Store"
},
"details": {
"lastHeartbeat": "2025-10-14T14:30:15Z",
"downtime": 307
}
}
Advanced Features
Predictive Analytics
Failure Prediction:
- ML models analyze health patterns
- Predict potential failures
- Proactive maintenance alerts
- Reduce unplanned downtime
Trend Detection:
- Gradual performance degradation
- Increasing error rates
- Connection instability patterns
- Early warning indicators
Custom Health Checks
Define Custom Metrics:
{
"name": "Content Refresh Success Rate",
"query": "content_refreshes_succeeded / total_content_refreshes",
"threshold": 0.95,
"alert": "WARNING"
}
Schedule Custom Checks:
- HTTP endpoint monitoring
- Database query performance
- Integration health checks
- Custom script execution
Health Score
Composite Health Score:
Health Score = (Uptime × 0.40) +
(Connection Stability × 0.30) +
(Error Rate × 0.20) +
(Performance × 0.10)
Scale: 0-100
90-100: Excellent
80-89: Good
70-79: Fair
<70: Poor
Use Cases:
- Display comparison
- Store performance ranking
- Account health summary
- SLA compliance tracking
Troubleshooting Guide
Health Data Not Updating
Symptoms:
- Last heartbeat timestamp frozen
- Metrics not changing
- Events not appearing
- Status stuck
Check:
- Display actually connected?
- Heartbeat service running?
- Database connectivity?
- API endpoint accessible?
Inaccurate Metrics
Symptoms:
- Uptime percentages incorrect
- Connection counts wrong
- Performance metrics unusual
- Data inconsistencies
Validate:
- Clock synchronization (NTP)
- Timezone configuration
- Data aggregation logic
- Database integrity
Missing Health Events
Symptoms:
- Events not logged
- Gaps in event timeline
- Missing critical alerts
- Incomplete history
Troubleshoot:
- Event logging enabled?
- Storage space available?
- Log rotation configured?
- Database write permissions?
Next Steps
- Display Registration - Register and configure displays
- Troubleshooting Displays - Fix display issues
- Analytics Dashboard - View performance analytics
- Alert Configuration - Configure health alerts