An incident is any unplanned disruption or event that requires immediate attention or action
Replace chaos with calm
Anyone can trigger the incident response process at any time
On-call schedules for primary and backup subject matter experts (one primary and one backup for each team)
Gain consensus by asking, "Are there any strong objections?"
Handoffs are encouraged
Failure to Notify Stakeholders
Red Herrings
Practice running major incidents as a team
Don't neglect the postmortem
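
The on-call arrangement listed above (one primary and one backup subject matter expert for each team) could be represented as a small data structure. What follows is a minimal Python sketch under stated assumptions, not a prescribed implementation: the class OnCallRotation, the helpers find_rotation and page_order, and every team and person name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OnCallRotation:
    """One team's on-call pair (hypothetical structure)."""
    team: str
    primary: str
    backup: str

    def __post_init__(self) -> None:
        # A backup only helps if it is a different person from the primary.
        if self.primary == self.backup:
            raise ValueError(f"{self.team}: primary and backup must differ")


def find_rotation(rotations: List[OnCallRotation], team: str) -> OnCallRotation:
    """Return the rotation for a team, or raise KeyError if none exists."""
    for rotation in rotations:
        if rotation.team == team:
            return rotation
    raise KeyError(f"no on-call rotation defined for team {team!r}")


def page_order(rotation: OnCallRotation) -> List[str]:
    """Return who to page, in order: the primary first, then the backup."""
    return [rotation.primary, rotation.backup]


# Illustrative data only; real schedules would come from a scheduling tool.
ROTATIONS = [
    OnCallRotation(team="database", primary="alice", backup="bob"),
    OnCallRotation(team="network", primary="carol", backup="dave"),
]
```

Calling page_order(find_rotation(ROTATIONS, "database")) returns the primary followed by the backup for that team. Rejecting a rotation whose primary and backup are the same person is a design choice in this sketch, made so that every team always has a distinct second responder.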
How do you differentiate observability from what is traditionally known as monitoring in IT systems?
Which data points do you believe are most relevant to understanding the inner workings of a complex system?
How does testing in production fit in here?
Which tools best give users the information they need to identify problems before they affect end users?
What advice would you give someone implementing observability in their production environment?