The Incident and Problem Management Specialist will govern the (ITIL-based) incident and problem management process. The Specialist is accountable for coordinating the right resources to ensure the restoration of normal service (e.g., back to agreed service levels) following an event that disrupts service.
The incumbent ensures continuous improvement of the processes required to deliver incident and problem management services effectively. The individual will take on the major incident role as part of a 24/7 on-call support rotation. Shift work may include evening, night, and early morning shifts.
The role includes Problem Management functions, for which the incumbent will be responsible for gathering RCA artifacts and tracking the progress of related tasks. The incumbent will also manage after-hours emergency change meetings.
What can you expect in this role?
- Lead and manage the end-to-end lifecycle of major or managed incidents, ensuring effective coordination, escalation, and resolution.
- Take command and control of major or managed incident bridges; provide timely and clear updates to stakeholders; participate in on-call rotations.
- Coordinate problem review meetings to clearly establish root cause; define actions and accountability.
- During business hours, collaborate with Change Management teams when emergency changes are required to restore service; after hours, take responsibility for performing emergency change activities.
- Document incident learnings, contribute to the knowledge base, and support process improvements.
- Govern Service Level Agreements (SLAs) and Key Performance Indicators (KPIs). Provide accurate reporting to key stakeholders.
- Work closely with the ITSM Change and Release Management team to ensure seamless alignment across Service Management.
- Train new and existing staff members on the incident and problem management process.
- Review metrics and reports with leadership and technology groups; measure the effectiveness of incident and problem management activities.
What do you bring to the role?
- University degree or college diploma in computer science, information systems, or a related discipline.
- 8+ years of IT industry experience, including exposure to cloud and high-availability production environments; 5+ years of Major Incident and Problem Management experience.
- ITSM Toolsets: Experience using ServiceNow and Jira Service Management would be an asset.
- Technology Proficiency: Networking, database, cloud, SaaS, monitoring, or cyber security knowledge would be an asset.
- Software Proficiency: MS Windows, MS Teams, MS Office, Confluence, Jira, or SharePoint would be an asset.
- Confident, disciplined, and assertive.
- Self-motivated, accountable, and able to navigate challenges with a sense of calm and determination.
- Analytical and troubleshooting mindset; able to grasp complex systems quickly.
- Organized, detail-oriented, and able to manage multiple tasks under pressure.
- Strong communication skills (written and verbal); adept at conveying technical issues clearly to all stakeholders.
- ITIL certification would be considered an asset.