ITIL OSA Event Management – Flashcards
question
Effective service operation depends on
answer
1. knowing the status of infrastructure and services 2. ability to detect deviations from normal or expected operation
question
event
answer
a change of state that has significance for the management of an IT service or other CI; events can require IT operations personnel to take action and often result in incidents being logged
question
event management
answer
the process responsible for managing events through their lifecycle; one of the main activities of IT operations
question
configuration item (CI)
answer
any component or other service asset that needs to be managed in order to deliver an IT service; information about each is recorded in a CMS (configuration management system) and maintained throughout its lifecycle; they are under the control of change management and can include services, hardware, software, buildings, people, and formal documentation such as processes, procedures, and SLAs (service level agreements)
question
active monitoring
answer
monitoring of a CI or an IT service that uses automated regular checks to discover its current status; tools that poll CIs at a predetermined frequency to determine their current status; exceptions generate an alert that is communicated to the appropriate tool or team for follow-up actions
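Active monitoring can be sketched in Python as a polling loop. This is a minimal illustration that assumes a hypothetical `check` callable returning `True` when a CI is healthy. The function names, CI names, and interval are illustrative only.

```python
import time
from typing import Callable, List

# Hypothetical health check supplied by a monitoring tool: True means "healthy".
StatusCheck = Callable[[str], bool]


def poll_once(cis: List[str], check: StatusCheck) -> List[str]:
    """Run one polling cycle and return an alert for every CI that fails its check."""
    return [f"ALERT: {ci} failed status check" for ci in cis if not check(ci)]


def poll(cis: List[str], check: StatusCheck, interval_s: float, cycles: int) -> List[str]:
    """Poll all CIs at a predetermined frequency for a fixed number of cycles."""
    alerts: List[str] = []
    for cycle in range(cycles):
        alerts.extend(poll_once(cis, check))
        if cycle < cycles - 1:
            time.sleep(interval_s)  # wait until the next scheduled poll
    return alerts
```

For example, `poll_once(["web01", "db01"], lambda ci: ci != "db01")` returns a single alert for `db01`. A real tool would hand that alert to the appropriate tool or team instead of returning a string.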
question
passive monitoring tools
answer
monitoring of a CI or an IT service or process that relies on an alert or notification to discover its current status; tools that detect and correlate operational alerts that are generated by the CIs themselves
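Passive monitoring inverts the flow. The CI pushes a notification and the tool only consumes it. Here is a minimal Python sketch that uses a standard-library queue as a stand-in for the notification channel. `Notification`, `emit`, and `drain` are hypothetical names.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass
class Notification:
    ci: str
    message: str


def emit(q: "queue.Queue[Notification]", ci: str, message: str) -> None:
    """Called by the CI itself when something notable happens (push model)."""
    q.put(Notification(ci, message))


def drain(q: "queue.Queue[Notification]") -> List[Notification]:
    """The passive tool collects whatever the CIs have sent; it never polls them."""
    received: List[Notification] = []
    while True:
        try:
            received.append(q.get_nowait())
        except queue.Empty:
            return received
```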
question
purpose of event management
answer
1. manages events through their lifecycle 2. includes activities that detect events, make sense of them, and determine the appropriate response 3. provides the basis for operational monitoring and control 4. automates normal operations and detects early warnings and failures of CIs
question
objectives of event management
answer
1. detect all significant changes of state to CIs 2. determine the appropriate control action (response) 3. provide a trigger to initiate other operational processes 4. provide a means to compare actual performance against designs 5. provide a basis for service assurance and reporting
question
scope of event management
answer
1. supports any service management aspect that needs to be controlled and can be automated including: -CIs (monitoring and updating of status) -environmental conditions -software license monitoring -security monitoring -normal activities 2. monitoring and event management are related but different
question
business value of event management
answer
1. early detection of incidents 2. monitoring automated activities by exception 3. information for other service management processes 4. basis for automation 5. early notifications, which can prevent service disruptions
question
policies of event management
answer
1. events should only be sent to those responsible for action 2. event management should be centralized as much as possible 3. events should utilize common messaging and logging standards 4. event handling should be automated where possible 5. events should have standard classification schemes and escalation procedures 6. all recognized events should be captured and logged
question
informational events
answer
1. indicates normal operation 2. conveys data for decision making; indicates information that can be used for trending and analysis to inform the service provider's decision-making process
question
informational events examples
answer
-data for decision making -scheduled work completed -user accessed an application -e-mail was received
question
warning events
answer
1. indicates unusual operation 2. conveys predictive information or early warning 3. additional monitoring or response may be required; indicates early warning information that can often be leveraged to minimize or prevent any user or business impact
question
warning events examples
answer
- transaction completion time 10% higher than normal - CPU utilization within 5% of highest tolerance
question
exception events
answer
-indicates operation outside of acceptable range -conveys abnormal situations that require follow-up actions; indicates abnormal situations or failures that require additional follow-up actions
question
exception events examples
answer
-services or functionality unavailable -CPU utilization above acceptable levels -incorrect password attempts -unauthorized software
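The three event types can be illustrated with the CPU utilization examples. This Python sketch assumes a highest tolerance of 90%. It reads "within 5% of highest tolerance" as within 5% (relative) below that limit. Both values are assumptions and are not ITIL-defined thresholds.

```python
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify_cpu(utilization_pct: float,
                 tolerance_pct: float = 90.0,
                 warning_margin: float = 0.05) -> EventType:
    """Classify a CPU utilization reading against an assumed tolerance.

    Assumptions: 90% is the highest tolerated utilization, and the warning
    band is the 5% (relative) just below that limit.
    """
    if utilization_pct > tolerance_pct:
        return EventType.EXCEPTION      # outside the acceptable range
    if utilization_pct >= tolerance_pct * (1 - warning_margin):
        return EventType.WARNING        # approaching the limit: early warning
    return EventType.INFORMATIONAL      # normal operation
```

With these defaults, `classify_cpu(50)` is informational, `classify_cpu(87)` is a warning, and `classify_cpu(95)` is an exception.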
question
Filtering (events) Strategy: integration
answer
event management integrated into all service management processes
question
Filtering (events) Strategy: design
answer
services designed with event management in mind
question
Filtering (events) Strategy: trial and error
answer
perfection is elusive; rely on formal reviews and evaluation to refine filtering over time
question
Filtering (events) Strategy: planning
answer
-approach from an enterprise perspective -manage as a project -ensure realistic timelines and resources
question
Name the event types.
answer
1. informational 2. warning 3. exception
question
Name event filtering types.
answer
1. integration 2. design 3. trial and error 4. planning
question
Event management should be:
answer
-designed within the service design stage with availability and capacity management involvement -tested and validated as part of service transition -supported, managed, and refined by service operation -support and be supported by continual service improvement
question
key questions for event management design
answer
1. What needs to be monitored? 2. What type of monitoring is required? 3. When should an event be generated? 4. What information needs to be communicated with the event? 5. Who will the messages be delivered to? 6. Who will be responsible for communicating and taking necessary follow-up actions?
question
Instrumentation
answer
defining and designing how IT components and services will be monitored and controlled
question
considerations for effective instrumentation
answer
1. event generation, classification, communication, and escalation 2. availability and adequacy of a CI's event generation capabilities 3. What data will be captured in the record? 4. Will active or passive monitoring be used? 5. Where will events be logged and stored? 6. How will supplementary data be gathered?
question
Error messaging
answer
1. services should be designed and tested to support event management -meaningful error messaging -adequate supporting detail to facilitate analysis 2. service management tools can provide enterprise wide monitoring -centralized monitoring across complex distributed environments -standardized messaging across multiple platforms
question
event detection and alerting mechanism
answer
-event management design +configuration and population of tools for event detection +establishing rule sets and criteria for correlation -thorough design requires the following knowledge +relationship of services to business processes +service level requirements +resources supporting each CI +normal and abnormal operations of each CI +information that needs to be captured for each event +incident categorization and prioritization codes +CI dependencies and significance of multiple events
question
event records should include:
answer
1. device 2. component 3. type of failure 4. date and time 5. parameters 6. unique identifier 7. value
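An event record with the seven listed fields can be sketched as a Python dataclass. The field names, the UTC timestamp, and the use of a random UUID as the unique identifier are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class EventRecord:
    device: str                      # 1. device that raised the event
    component: str                   # 2. component within the device
    failure_type: str                # 3. type of failure
    parameters: Dict[str, Any]       # 5. parameters describing the condition
    value: float                     # 7. measured value
    timestamp: datetime = field(     # 4. date and time (UTC)
        default_factory=lambda: datetime.now(timezone.utc))
    event_id: str = field(           # 6. unique identifier
        default_factory=lambda: str(uuid.uuid4()))
```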
question
Activities of event management
answer
1. event occurs 2. event notification 3. event detection 4. event logging 5. first-level correlation and filtering 6. significance of events 7. second-level correlation 8. further action required 9. response selection 10. review actions 11. close event
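The numbered activities describe a flow that branches on event type. The following Python sketch simplifies that flow. The branching is an interpretation of the common ITIL event management flow, not a definitive model: informational events close after logging, warnings go through second-level correlation, and exceptions go straight to response selection. `path_for` is a hypothetical helper.

```python
from typing import List

ACTIVITIES = [
    "event occurs", "event notification", "event detection", "event logging",
    "first-level correlation and filtering", "significance of events",
    "second-level correlation", "further action required",
    "response selection", "review actions", "close event",
]


def path_for(event_type: str, needs_action: bool = True) -> List[str]:
    """Return the activities a single event passes through (simplified reading)."""
    common = ACTIVITIES[:6]  # every event is detected, logged and filtered
    if event_type == "informational":
        return common + ["close event"]
    if event_type == "warning":
        path = common + ["second-level correlation", "further action required"]
        if not needs_action:
            return path + ["close event"]
        return path + ["response selection", "review actions", "close event"]
    if event_type == "exception":
        return common + ["response selection", "review actions", "close event"]
    raise ValueError(f"unknown event type: {event_type}")
```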
question
Activities of event management: event occurs
answer
-everyone involved in designing and supporting services should be involved + defining the types of events that need to be detected + fine-tuning event filtering levels and correlation rules
question
Activities of event management: event notification
answer
- CIs can communicate events in two ways: +polled by a service management tool (active monitoring) +CI generates a notification when certain thresholds are met (passive monitoring) -Event notifications: +proprietary or standards based +meaningful data to targeted audience +clearly defined roles and responsibilities
question
Activities of event management: event detection
answer
- notifications will result in event detection -detection can happen with agent software or centralized management tool
question
Activities of event management: event logging
answer
-events should be logged -logging can be centralized or left on the device -clear instructions should be defined on how and when to check the logs if left on device -standards should be defined for how long events are kept before deletion
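A retention standard can be enforced with a simple purge job. This Python sketch assumes the log is held as (timestamp, message) pairs. The retention period is passed in as a parameter, and `purge_expired` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Each logged event is (timestamp, message); a real log would hold full event records.
LogEntry = Tuple[datetime, str]


def purge_expired(log: List[LogEntry], retention: timedelta, now: datetime) -> List[LogEntry]:
    """Keep only events that are still inside the defined retention period."""
    cutoff = now - retention
    return [entry for entry in log if entry[0] >= cutoff]
```

For example, with a 30-day retention evaluated on 31 January, an event logged on 1 December is dropped and one logged on 30 January is kept.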
question
Activities of event management: first-level correlation and filtering/significance of events
answer
-determines the event type and whether to communicate it +informational: does not require action; logged for a predetermined period of time; used to generate statistics +warning: service or device has reached a threshold that requires action; actions can prevent an exception from occurring; failures should be treated as exceptions, even when the service is not impacted +exception: abnormal operation, often an SLA or OLA breach; can be total failures or degraded performance; includes events such as unauthorized devices detected
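The first-level decision of whether to communicate an event can be sketched in Python as a simple split by event type. The tuple-based event shape and the `first_level_filter` name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (event_type, description)
Event = Tuple[str, str]


def first_level_filter(events: List[Event]) -> Dict[str, List[Event]]:
    """Split events into those that are only logged and those communicated onward."""
    result: Dict[str, List[Event]] = {"log_only": [], "communicate": []}
    for event in events:
        event_type = event[0]
        if event_type == "informational":
            result["log_only"].append(event)       # kept for statistics, no action
        elif event_type in ("warning", "exception"):
            result["communicate"].append(event)    # needs correlation or a response
        else:
            raise ValueError(f"unknown event type: {event_type}")
    return result
```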
question
Activities of event management: second-level correlation
answer
-warning events require additional correlation: +what is the significance and what action needs to be taken? +management tools can compare performance against standards +correlation engines can apply additional business rules
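Second-level correlation usually applies rules across several events. The Python sketch below uses one hypothetical business rule: escalate a CI that raises at least `threshold` warnings within a sliding time window. The rule and its parameters are assumptions, not a prescribed ITIL rule.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WarningEvent = Tuple[str, float]  # (CI name, timestamp in seconds)


def correlate_warnings(events: List[WarningEvent], window_s: float, threshold: int) -> Set[str]:
    """Return the CIs that raised `threshold` or more warnings within
    any sliding window of `window_s` seconds."""
    by_ci: Dict[str, List[float]] = defaultdict(list)
    for ci, ts in events:
        by_ci[ci].append(ts)

    escalate: Set[str] = set()
    for ci, times in by_ci.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most window_s.
            while times[end] - times[start] > window_s:
                start += 1
            if end - start + 1 >= threshold:
                escalate.add(ci)
                break
    return escalate
```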
question
Activities of event management: further action required
answer
-generating an incident record -generating an RFC -escalation to change management related to an authorized RFC -automated scripts -automated paging or notification systems -database actions
question
Activities of event management: response selection
answer
-auto response -alerts and human intervention -incident, problem, or change? -open an RFC: can be initiated when the event occurs or through correlation -open an incident record -open or link to a problem record -special types of incident - incidents with no business impact: +generate and escalate the incident to the appropriate team +record that no business impact occurred and ensure it is not calculated as downtime or reported as a business-impact incident +can be used to demonstrate proactive service provider capabilities
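The response-selection options can be sketched as a small decision function. The rule order and the `Event` fields below are illustrative assumptions. Problem-record linkage is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    ci: str
    event_type: str                  # "warning" or "exception"
    business_impact: bool
    auto_fix: Optional[str] = None   # name of a hypothetical remediation script
    needs_change: bool = False


def select_response(event: Event) -> str:
    """Pick one response for a significant event (illustrative rules only)."""
    if event.auto_fix:
        return f"auto response: run {event.auto_fix}"
    if event.needs_change:
        return "open RFC"
    if event.event_type == "exception" and not event.business_impact:
        # Incident with no business impact: record it, but do not count it as downtime.
        return "open incident (no business impact, excluded from downtime)"
    if event.event_type == "exception":
        return "open incident"
    return "alert for human intervention"
```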
question
Activities of event management: review actions
answer
-warning and exception events should be reviewed: +ensure events were handled appropriately +can be automated (open/close or down/up events) + should not duplicate incident, problem, or change closure steps +provides input into evaluation and improvement of event management
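Automated review of open/close (down/up) event pairs can be sketched in Python as follows. It assumes events arrive in order as (CI, state) pairs. The `auto_close` helper is hypothetical.

```python
from typing import List, Set, Tuple

# (ci, state) in arrival order; state is "down" or "up"
StateEvent = Tuple[str, str]


def auto_close(events: List[StateEvent]) -> Tuple[List[str], List[str]]:
    """Pair each 'down' event with a later 'up' event for the same CI.

    Returns (closed, still_open): CIs whose down event was cleared by an up
    event, and CIs that are still down and need human review.
    """
    open_down: Set[str] = set()
    closed: List[str] = []
    for ci, state in events:
        if state == "down":
            open_down.add(ci)
        elif state == "up" and ci in open_down:
            open_down.remove(ci)
            closed.append(ci)
    return closed, sorted(open_down)
```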
question
Activities of event management: close event
answer
-events that generate an incident, problem, or change should be formally closed and linked to associated records
question
Triggers of event management
answer
-exceptions to acceptable CI performance or state -exceptions to automated processes or procedures -exceptions to business processes being monitored -completion of tasks or jobs -CI status changes -application or data access
question
inputs of event management
answer
-operational and service level requirements -alarms, alerts, and defined thresholds -event correlation rules -automated responses -defined roles and responsibilities -operational procedures for event response
question
outputs of event management
answer
-communicated and escalated events -event logs -events initiating incident management -events indicating SLA or OLA breaches -events indicating completion of operational activities -SKMS event information and history
question
Service design with event management
answer
-service level management -information security management -availability management -capacity management
question
Service transition with event management
answer
-service asset and configuration management -knowledge management -change management
question
service operation with event management
answer
-incident management -problem management -access management
question
key information involved in event management
answer
-event messages -database holding CI state and performance information -monitoring tools and agent software -correlation engines and rule sets
question
challenges of event management
answer
-obtaining initial funding for tools and effort required -establishing the correct level of filtering -deploying monitoring tools and agents across the enterprise -monitoring and automation activities affecting capacity utilization -acquiring or developing the necessary skills -deploying tools without processes to define and operate them
question
risks of event management
answer
-failure to obtain adequate funding or resources -incorrect levels of filtering -failure to maintain deployment and monitoring across the enterprise
question
event management process owner
answer
1. carrying out the generic process owner role for event management 2. planning and managing support for event management tools and processes 3. coordinating interfaces between event management and other ITSM processes
question
Other event management roles: service desk
answer
-typically not involved unless event requires a response within the scope of the service desk -initial investigation of events identified as incidents -escalating incidents and related events to appropriate resources as needed -communication about the status of events as appropriate to all stakeholder groups
question
other event management roles: technical and application management
answer
-support event management across the service lifecycle -service design: defining events, detection and correlation mechanisms and responses -service transition: testing to ensure events are detected and responses are appropriate -service operation: performing event management for systems and applications under their control; responding to incidents and problems related to events in their areas; ensuring IT operations and service desk staff are trained appropriately for their level of involvement in event management
question
other event management roles: IT operations management
answer
-event monitoring commonly delegated to IT operations where it exists: event monitoring and first-line response; following SOPs related to each event; ensuring incidents are created and escalated as appropriate
question
CSF: Detecting all changes of state that have significance for the management of CIs and IT services
answer
KPI: Number and ratio of events compared with the number of incidents KPI: Number and percentage of each type of event per platform or application versus total number of platforms and applications underpinning live IT services (looking to identify IT services that may be at risk for lack of capability to detect their events)
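Both KPIs reduce to simple arithmetic over event and incident counts. This Python sketch assumes events are available as (platform, event type) pairs. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def events_to_incidents_ratio(event_count: int, incident_count: int) -> float:
    """First KPI: how many events were recorded for every incident."""
    if incident_count == 0:
        raise ValueError("no incidents recorded; ratio is undefined")
    return event_count / incident_count


def silent_platforms(events: Iterable[Tuple[str, str]], live_platforms: Set[str]) -> List[str]:
    """Second KPI helper: live platforms that produced no events at all.

    A platform with zero events may lack the capability to detect its own
    events, so the IT services it underpins may be at risk.
    """
    counts = Counter(platform for platform, _ in events)
    return sorted(p for p in live_platforms if counts[p] == 0)
```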
question
CSF: Ensuring all events are communicated to the appropriate functions that need to be informed or take further control actions
answer
KPI: Number and percentage of events that required human intervention and whether this was performed KPI: Number of incidents that occurred and percentage of these that were triggered without a corresponding event
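The second KPI can be computed once each incident is linked to the event that triggered it, if there is one. This Python sketch assumes a hypothetical mapping from incident ID to event ID, with `None` for incidents that no event preceded.

```python
from typing import Dict, Optional


def pct_incidents_without_event(incident_to_event: Dict[str, Optional[str]]) -> float:
    """Percentage of incidents with no triggering event (None in the mapping)."""
    if not incident_to_event:
        return 0.0
    missed = sum(1 for event_id in incident_to_event.values() if event_id is None)
    return 100.0 * missed / len(incident_to_event)
```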
question
CSF: Providing the trigger, or entry point, for the execution of many service operation processes and operations management activities
answer
KPI: Number and percentage of events that required human intervention and whether this was performed
question
CSF: Provide the means to compare actual operating performance and behavior against design standards and SLAs
answer
KPI: Number and percentage of incidents that were resolved without impact to the business (indicates the overall effectiveness of the event management process and underpinning solutions) KPI: Number and percentage of events that resulted in incidents or changes KPI: Number and percentage of events caused by existing problems or known errors (this may result in a change to the priority of work on that problem or known error) KPI: Number and percentage of events indicating performance issues (for example, growth in the number of times an application exceeded its transaction thresholds over the past six months) KPI: Number and percentage of events indicating potential availability issues (for example, failovers to alternative devices, or excessive workload swapping)
question
CSF: Providing a basis for service assurance, reporting, and service improvement
answer
KPI: Number and percentage of repeated or duplicated events (this will help in the tuning of the correlation engine to eliminate unnecessary event generation and can also be used to assist in the design of better event generation functionality in new services) KPI: Number of events/alerts generated without actual degradation of service/functionality (false positives: an indication of the accuracy of the instrumentation parameters, important for CSI)
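Both KPIs are percentages over the generated alerts. This Python sketch assumes each alert is recorded as (CI, condition, caused degradation). It also treats a repeat of the same CI and condition as a duplicate. Both choices are simplifying assumptions.

```python
from collections import Counter
from typing import List, Tuple

# (ci, condition, caused_degradation) for each generated event/alert
Alert = Tuple[str, str, bool]


def duplicate_pct(alerts: List[Alert]) -> float:
    """Percentage of alerts that repeat an earlier (ci, condition) pair."""
    if not alerts:
        return 0.0
    counts = Counter((ci, cond) for ci, cond, _ in alerts)
    duplicates = sum(n - 1 for n in counts.values())
    return 100.0 * duplicates / len(alerts)


def false_positive_pct(alerts: List[Alert]) -> float:
    """Percentage of alerts raised without any actual degradation of service."""
    if not alerts:
        return 0.0
    false_pos = sum(1 for _, _, degraded in alerts if not degraded)
    return 100.0 * false_pos / len(alerts)
```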