This ZenPack is developed and supported by Zenoss Inc. Commercial ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to request more information regarding this or any other ZenPacks.
Version 5.1.0 - Download
This ZenPack provides support for monitoring Amazon Web Services (AWS). Monitoring for the following EC2, VPC, RDS, CloudFormation, ECS and S3 entities is provided through a combination of the AWS EC2, RDS, CloudFormation, ECS and CloudWatch APIs.
The features added by this ZenPack can be summarized as follows. They are each detailed further below.
The following entities will be automatically discovered through an account name, access key and secret key you provide. The attributes, tags and collections will be updated on Zenoss’ normal remodeling interval which defaults to every 12 hours.
Attributes: Name, Region, ECS Cluster, ECS Service, ECS Container Instance, Availability Zone, Last Status, Desired Status
Starting with AWS ZP 5.0, modeling works differently. The first modeling run starts right after the AWS device has been added. After that, the AWS device is modeled every 12 hours by default.
Note: Only one modeler plugin, `aws.Base`, is now used in 'Modeler Plugins', and a new `zAWSEnabledPlugins` property was added to 'Configuration Properties' to control the list of AWS modeler sub-plugins.
If you model the device directly by pressing the 'Model Device' button, changes will not be applied instantly when modeling finishes. It takes some time to apply all new data to your device.
There are two datasources in the EstimatedCharges template. AWSModelRunPlugin starts a special server on the zenpython side that listens for requests from zenmodeler while modeling is performed. AWSQueueProcessingPlugin periodically checks an internal queue on the zenpython side and sends the datamaps it contains to zenhub.
`zAWSEnabledPlugins` controls a list of AWS modeler sub-plugins, such as:
The following metrics will be collected every 5 minutes by default. Any other CloudWatch metrics can also be collected by adding them to the appropriate monitoring template. The Average statistic is collected, and the graphed value is per second for anything that resembles a rate.
Metrics: EstimatedCharges, EC2EstimatedCharges, S3EstimatedCharges, RDSEstimatedCharges, DynamoDBEstimatedCharges, LightsailEstimatedCharges, RedshiftEstimatedCharges, SESEstimatedCharges, SNSEstimatedCharges, CloudTrailEstimatedCharges, DataTransferEstimatedCharges, QueueServiceEstimatedCharges, KmsEstimatedCharges, ECSEstimatedCharges
Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut
Note: These metrics are aggregated only for EC2 Instances with detailed monitoring enabled.
Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut, StatusCheckFailed_Instance, StatusCheckFailed_System, CheckReserved
Metrics: VolumeReadBytes, VolumeWriteBytes, VolumeReadOps, VolumeWriteOps, VolumeTotalReadTime, VolumeTotalWriteTime, VolumeIdleTime, VolumeQueueLength
Provisioned IOPS Metrics: VolumeThroughputPercentage, VolumeReadWriteOps
Metrics: BucketTotalSize, BucketKeysCount
Metrics: CPUUtilization, FreeableMemory, FreeStorageSpace, SwapUsage, ReadIOPS, WriteIOPS, DatabaseConnections, DiskQueueDepth, ReplicaLag
Metrics: NumberOfMessagesSent, NumberOfMessagesDeleted, ApproximateNumberOfMessagesVisible
Auto Scaling Groups
Metrics: GroupInServiceInstances, GroupPendingInstances, GroupStandbyInstances, GroupTerminatingInstances, GroupTotalInstances
Application Load Balancers:
Metrics: ActiveConnectionCount, NewConnectionCount, RejectedConnectionCount, TargetConnectionErrorCount, ELBAuthError, ELBAuthFailure, TargetResponseTime, TargetTLSNegotiationErrorCount, ClientTLSNegotiationErrorCount, HTTPCode_ELB_3XX_Count, HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count, HTTPCode_ELB_500_Count, HTTPCode_ELB_502_Count, HTTPCode_ELB_503_Count, HTTPCode_ELB_504_Count, HTTPCode_Target_2XX_Count, HTTPCode_Target_3XX_Count, HTTPCode_Target_4XX_Count, HTTPCode_Target_5XX_Count, ProcessedBytes, ConsumedLCUs, RequestCount, RuleEvaluations
Note: Depending on the Load Balancer configuration, some metrics might be unavailable.
The Amazon CloudWatch datasource type also allows for the collection of any other CloudWatch metric.
Besides CloudWatch metrics, the following metrics will also be collected every 5 minutes by default.
Monitoring a large cloud may require contacting AWS support to request an increase of the CloudWatch API request limit. An event is created in Zenoss if the limit for CloudWatch requests is exceeded.
CloudWatch datasources use multithreading for better performance. Collection speed can be increased by raising the twistedthreadpoolsize value in the configuration of the zenpython daemon. Note that a higher value also increases memory usage.
The collection interval may be changed using the zAWSCloudWatchCollectionInterval property. By default it is set to 300 seconds. This affects most Amazon CloudWatch datasources and may help reduce monitoring costs. It doesn’t change the interval of datapoints on the graphs; it only changes how often Zenoss performs API calls to CloudWatch.
Note: By default, the `CloudFormation` and `ECS` modeler sub-plugins are not enabled. Users need to add them manually and initiate modeling for those components to be modeled and monitored. For AutoScalingGroup monitoring to work, the user needs to enable monitoring in the AWS console for the specific AutoScalingGroups.
Users may configure Zenoss to consume specific SQS Queues, and to parse and convert their messages to Zenoss events.
SQS events might be delayed in their creation due to Amazon's use of short polling by default.
If configured not to delete messages (listen), events are sent only for messages created after the previous monitoring cycle. This prevents flooding the Zenoss Events console with historical SQS messages.
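The listen-mode behaviour can be pictured as a timestamp filter. The sketch below is illustrative only and is not the ZenPack's own code: `new_messages` and `last_cycle_ms` are hypothetical names, and it assumes each message carries the standard SQS `SentTimestamp` attribute (epoch milliseconds, stored as a string).

```python
def new_messages(messages, last_cycle_ms):
    """Return only the messages sent after the previous monitoring cycle.

    `messages` is a list of dicts shaped like SQS ReceiveMessage results,
    each with an "Attributes" dict holding "SentTimestamp".
    `last_cycle_ms` is a hypothetical bookmark kept between cycles.
    """
    fresh = []
    for msg in messages:
        sent_ms = int(msg["Attributes"]["SentTimestamp"])
        if sent_ms > last_cycle_ms:
            fresh.append(msg)
    return fresh


messages = [
    {"MessageId": "a", "Attributes": {"SentTimestamp": "1700000000000"}},
    {"MessageId": "b", "Attributes": {"SentTimestamp": "1700000300000"}},
]
recent = new_messages(messages, last_cycle_ms=1700000100000)
print([m["MessageId"] for m in recent])
```

Because messages are left in the queue in this mode, a bookmark of this kind is what keeps older messages from being turned into events again on every cycle.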
The monitoring plugin collects CloudFormation Events for each CF Stack and shows them as Zenoss Events with the same timestamp. It also updates the status of the CF Stack or CF Resource component that the event belongs to.
Standard Zenoss Event Fields
CREATE_FAILED and DELETE_FAILED events have CRITICAL severity; all others have INFO severity.
By default all generated events are mapped to /AWS/CloudFormation event class.
Once an event is sent, it will not be sent again. If the user clears the event, it will not reappear.
If the zAWSCloudFormationEventsAutoClear zProperty is set to True, a corresponding auto-clear event is generated for each CREATE_COMPLETE and DELETE_COMPLETE to clear the previous CRITICAL ones.
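The severity rules above can be summarised in a few lines. This is a minimal sketch rather than the ZenPack's implementation: `event_severities` is a hypothetical helper, and it assumes that a `*_COMPLETE` status yields its normal INFO event plus a separate clear event when auto-clear is on. The numbers are the standard Zenoss severities (0 = Clear, 2 = Info, 5 = Critical).

```python
SEVERITY_CLEAR = 0
SEVERITY_INFO = 2
SEVERITY_CRITICAL = 5

FAILED_STATUSES = {"CREATE_FAILED", "DELETE_FAILED"}
COMPLETE_STATUSES = {"CREATE_COMPLETE", "DELETE_COMPLETE"}


def event_severities(status, auto_clear=False):
    """Return the severities of the events produced for one CF status.

    `auto_clear` mirrors the zAWSCloudFormationEventsAutoClear zProperty.
    """
    if status in FAILED_STATUSES:
        severities = [SEVERITY_CRITICAL]
    else:
        severities = [SEVERITY_INFO]
    # Assumption: the clear event is sent in addition to the INFO event.
    if auto_clear and status in COMPLETE_STATUSES:
        severities.append(SEVERITY_CLEAR)
    return severities


print(event_severities("CREATE_FAILED"))
print(event_severities("CREATE_COMPLETE", auto_clear=True))
```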
Notifications for events now have the option to be sent with email using Amazon SES.
In addition to the standard email notification fields you will need to fill out the following additional fields.
The sender's email address and the subscribers' email addresses must be verified within SES for the target region.
The following resource counts subject to the soft-limits will be collected every 5 minutes and when any of these metrics approaches a soft limit threshold, a Zenoss event will be triggered.
The thresholds are set to the default limit values. If you changed this limit for your account, you should manually change the Max threshold value using the following steps:
You can optionally configure each monitored AWS account to attempt to discover and monitor the guest Linux or Windows operating systems running within each EC2 instance, when specific Tags are present. This requires that your Zenoss system has the network and server access it needs to monitor the guest operating system. VPC and non-VPC modes are supported.
The life cycle of guest operating system devices is managed along with the instance. For example, the guest operating system device is set to a decommissioned production state when the EC2 instance is stopped, and the guest operating system device is deleted when the EC2 instance is destroyed.
When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact capability for services running on AWS. The following service impact relationships are automatically added. These will be included in any services that contain one or more of the explicitly mentioned entities.
Service Impact Relationships
The ZenPack now provides a way to group and collect AWS components on an account based on AWS Tags. You can define a tag filter by navigating to your AWS account device and selecting “Add AWS Tag Filter” from the “+” menu in the lower left corner of the screen. On the dialog that pops up, give your Tag Filter a name, and select the tag you want to track. You can combine multiple tags with the AND and OR operators. You can also generate a Component Group based on the Tag Filter. Click Submit when finished.
The Tag Filter will be visible in the navigation bar area and in the “AWS Tag Filters” section. This will allow you to view all components of any type matched by the filter, along with their graphs.
In addition, you can use this Tag Filter to view billing information for the group of components in the Expenses Analysis section (see Expense Analysis).
The AWS Tag Filters use a special monitoring template, TagFilter, which is not visible in the device-level monitoring template section, but is visible if you go to Advanced > Monitoring Templates. From here, you can modify the template, should you need to do so.
To turn on monitoring of charges for Amazon services, enable the EstimatedCharges monitoring template for the AWS device. This adds graphs with billing information to the device overview page and to the Expenses Analysis page.
To control the spending limit, use the zAWSBillingCostThreshold zProperty. It is set to 1000 by default. This property sets the threshold at which the bullet-style billing graph turns red, and it is also used in the “Billing Cost” threshold. An event is generated if spending goes over its value.
Billing graphs show estimated charges for the whole account and detailed charges per service. The top 10 services are displayed on a pie chart.
This ZenPack uses linear interpolation to predict total charges per month. This information is displayed on the device overview page and, as a new datapoint, on the `Total Estimated Charge` graph in `Expense Analysis`.
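One simple reading of this prediction is to scale the month-to-date charge by the fraction of the month that has elapsed. The sketch below shows that arithmetic only; `project_month_total` is a hypothetical helper, and the exact formula the ZenPack uses is not documented here.

```python
import calendar
from datetime import datetime


def project_month_total(charges_so_far, now):
    """Linearly project month-to-date charges to a full-month total."""
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    month_start = datetime(now.year, now.month, 1)
    elapsed_days = (now - month_start).total_seconds() / 86400.0
    if elapsed_days <= 0:
        # Nothing to project from at the very start of the month.
        return charges_so_far
    return charges_so_far * days_in_month / elapsed_days


# 150 in charges after the first 10 days of a 30-day month.
estimate = project_month_total(150.0, datetime(2023, 6, 11))
print(round(estimate, 2))
```

A projection of this kind is noisy early in the month, when a few days of charges are scaled up to a full month.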
You can track AWS usage charges for a given tag or tag group, grouped by specific services. To set this up, you must create a Tag Filter to match the tag or tags in which you are interested, and then configure detailed billing reports in your AWS account. See Configuring Charges Per Tags Monitoring for details.
This ZenPack uses the Amazon CloudWatch API to collect metric data. The first 1,000,000 calls to this API each month are free; additional calls are charged at a rate of $0.01 per 1,000 calls. For specific pricing questions, see AWS CloudWatch Pricing.
A report is provided (Reports -> AWS Reports -> Monitoring Costs) to provide a detailed breakdown of API calls and estimated cost per monitoring template on each monitored EC2 Account.
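For a rough estimate before consulting the report, the pricing quoted above can be applied directly. This is a back-of-the-envelope sketch: `monthly_api_cost` is a hypothetical helper, and it assumes one API call per datapoint per collection cycle, which may not match how the ZenPack batches its requests.

```python
FREE_CALLS_PER_MONTH = 1000000   # free tier quoted above
PRICE_PER_1000_CALLS = 0.01      # USD, rate quoted above


def monthly_api_cost(datapoints, interval_seconds=300, days=30):
    """Return (calls, cost_usd) for one month of CloudWatch collection.

    `interval_seconds` corresponds to zAWSCloudWatchCollectionInterval.
    """
    cycles = days * 24 * 3600 // interval_seconds
    calls = datapoints * cycles
    billable = max(0, calls - FREE_CALLS_PER_MONTH)
    return calls, billable / 1000.0 * PRICE_PER_1000_CALLS


for interval in (300, 600):
    calls, cost = monthly_api_cost(500, interval_seconds=interval)
    print("%ds interval: %d calls, about $%.2f" % (interval, calls, cost))
```

Doubling the collection interval halves the number of calls, which is why raising zAWSCloudWatchCollectionInterval can reduce monitoring costs.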
CloudFormation Stacks Blueprints provide a graphical representation of all stack templates, in the same way as the AWS Console.
Initially only stacks are shown. Double-clicking a node expands the stack and shows its resources. Buttons for quickly expanding and collapsing all visible stacks are also available.
The set of visible stacks can be narrowed down with region and stack name filters. The stack name filter sets a fragment that must be present in the stack’s name. After setting filters, press the Refresh button to apply the changes.
Each node in a stack is a resource defined in the template. The first row of text gives the name of the resource as defined in the template, the second is the resource type, and the last is the ID of the deployed AWS entity.
By default the diagram shows only resources that were deployed. To show all resources, use the Show Undeployed Resources checkbox.
Links represent dependencies between resources (e.g. EC2 Instances refer to Security Groups).
There are also separate blueprints for each CF Stack component.
'zAWSECSEventBuses' contains the names of the AWS EventBridge Event Buses used for ECS Schedule Rules discovery. By default the Event Bus named 'default' is used. If you use a custom Event Bus for ECS Schedule Rules, you need to add its name to 'zAWSECSEventBuses' manually.
Use the following steps to start monitoring EC2 using the Zenoss web interface.
Alternatively, you can use zenbatchload to add accounts from the command line. To do this, create a file with contents similar to the following. Replace all values in angle brackets with your values, minus the brackets. Multiple accounts can be added under the same /Devices/AWS/EC2 section.
/Devices/AWS/EC2 loader='ec2account', loader_arg_keys=['accountid', 'devicename', 'accesskey', 'secretkey', 'devicePath', 'collector']
<devicename> accountid='<accountid>', devicename='<devicename>', accesskey='<accesskey>', secretkey='<secretkey>', devicePath='/Devices/AWS/EC2', collector='localhost'
You can then load the account(s) with the following command:
$ zenbatchload <filename>
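When many accounts need to be added, the batch file can be generated rather than written by hand. The following sketch reproduces the file layout shown above; `batchload_lines` and the sample account values are hypothetical, and values containing single quotes are not escaped.

```python
ARG_KEYS = ["accountid", "devicename", "accesskey", "secretkey",
            "devicePath", "collector"]


def batchload_lines(accounts, device_class="/Devices/AWS/EC2"):
    """Build zenbatchload file lines for a list of account dicts."""
    lines = ["%s loader='ec2account', loader_arg_keys=%r"
             % (device_class, ARG_KEYS)]
    for account in accounts:
        values = dict(account, devicePath=device_class)
        values.setdefault("collector", "localhost")
        args = ", ".join("%s='%s'" % (key, values[key]) for key in ARG_KEYS)
        lines.append("%s %s" % (values["devicename"], args))
    return lines


accounts = [
    # Placeholder credentials for illustration only.
    {"accountid": "123456789012", "devicename": "prod-aws",
     "accesskey": "AKIAEXAMPLE", "secretkey": "example-secret"},
]
print("\n".join(batchload_lines(accounts)))
```

The printed text can be saved to a file and passed to zenbatchload as shown above.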
Use the zAWSRegionToModel property to narrow the set of components modeled. By default it is empty, so all EC2 regions and their child components are discovered. Specify an EC2 region name, or multiple names separated by commas. The value is used as a filter and may help with large AWS accounts.
Some regions (such as 'ap-east-1' (Hong Kong)) may be disabled by default in the AWS console. In this case a message about skipping the region is shown during modeling.
Use the following steps to configure instance guest device discovery. Guest device discovery must be configured individually for each EC2 account.
Please note: zAWSDiscover defines a filter for guest device discovery and is applied before zAWSGuestDeviceClassTags. The list of tags in zAWSGuestDeviceClassTags is evaluated from top to bottom, so the first matching key=value applies. The zAWSGuestDeviceClassTags property also takes precedence over the classes configured in Device Overview.
`zAWSDiscover` also supports complex tags in the format `<some:long:key:value>;`. Multiple colons are allowed in the `key` name only (`key:key:value`).
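Since colons may appear only in the key, the value is whatever follows the last colon. The sketch below illustrates that rule; `parse_discover_tags` is a hypothetical helper, not part of the ZenPack, and it assumes entries are separated by semicolons as the format above suggests.

```python
def parse_discover_tags(spec):
    """Split 'key:value;key:value' into (key, value) pairs.

    The text after the last colon of each entry is taken as the value,
    so keys such as 'aws:cloudformation:stack-name' stay intact.
    """
    pairs = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.rpartition(":")
        if not sep:
            raise ValueError("expected key:value, got %r" % item)
        pairs.append((key, value))
    return pairs


print(parse_discover_tags("aws:cloudformation:stack-name:web;env:prod;"))
```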
If your instances are VPC instances, and are in a different VPC than the Zenoss server that’s monitoring the EC2 account, you must add a Collector tag to the containing VPC with the value set to the name of the Zenoss collector to which discovered guest devices should be assigned.
You can optionally configure an alternate remote collector for the devices created from AWS Instances with the following configuration properties:
A new zAWSCloseGuestEvents property was added. If set to true, all open events for a guest device are closed when the guest device is deleted.
Guest devices should be discovered automatically during modeling. However, if an error occurs during modeling, or some other unexpected event, it is possible for guest devices to be skipped. If some guest devices appear to be missing, you can force the discovery process to be repeated.
In the Zenoss UI, navigate to your AWS EC2 Account device, and find the gear icon menu in the bottom left corner of the window. Under this menu, click the option labeled “Find Missing Guest Devices.” This will schedule a job for immediate execution, which will clear the guest ID cache and run the discovery process for each instance. Existing guest devices will remain, but any devices previously missed will be detected. You can monitor the progress of this job in the Jobs section of the UI, under the Advanced Tab.
Several criteria must be met in order for a guest device to be discovered by the AWS ZenPack. Those requirements are as follows:
If all the criteria above are met by the EC2 Instance, and an existing device with an ID or title matching the EC2 Instance’s ID exists, or an existing device has a matching IP address, the EC2 Instance will be linked to that existing device.
If no existing device matches the EC2 Instance, but the criteria above are met, a new device will be created in the Linux or Windows device class configured for the account.
Note that guest device creation is triggered during modeling, but is queued as a job to be run later. Thus a guest device will not show up until after modeling has completed, and the corresponding scheduled job has completed.
If a device link appears to be missing, double check the criteria above, and run the Find Missing Guest Devices task described in the preceding section.
When creating guest devices a job should be scheduled for each guest device to be created. If a job was created for the guest device, but the guest device was not created, you can check the job output in the Jobs section of Zenoss.
If a job was not created, you can try running the modeler in debug mode to see why guest device creation was skipped.
You can optionally configure your monitored AWS account, so that the newly added or recently dropped instances are automatically reflected on Zenoss UI during monitoring:
If zAWSRemodelEnabled is false, only the instance state will be updated on existing instances. If set to true, then all instance properties will be updated on existing instances, and new instances will be added to the model.
To disable automatic changes of the production state for EC2 Instances:
By default, the production state is changed to ‘Production’ (1000) for running EC2 instances, and to ‘Decommissioned’ (-1) for stopped ones. These states may be customized by specifying the desired production state IDs (numbers) in zAWSAutoChangeProdStateRunning and zAWSAutoChangeProdStateStopped.
If a user changes the production state of a guest device manually, that state will be used for the guest device when the EC2 instance switches to the running state.
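The rules above amount to a small lookup. The sketch below is only an illustration: `guest_production_state` is a hypothetical helper, its keyword arguments stand in for zAWSAutoChangeProdStateRunning and zAWSAutoChangeProdStateStopped, and leaving other instance states unchanged is an assumption.

```python
PRODUCTION = 1000
DECOMMISSIONED = -1


def guest_production_state(instance_state, running_state=PRODUCTION,
                           stopped_state=DECOMMISSIONED, manual_state=None):
    """Pick the production state for a guest device.

    `manual_state` is a state the user set by hand, if any; it is reused
    when the instance returns to the running state.
    """
    if instance_state == "running":
        return manual_state if manual_state is not None else running_state
    if instance_state == "stopped":
        return stopped_state
    return None  # assumption: other states leave the device untouched


print(guest_production_state("running"))
print(guest_production_state("stopped"))
print(guest_production_state("running", manual_state=300))
```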
Use the following steps to specify the PEM file for each region, for use in auto-discovering instance guest operating systems:
In some cases, you may have a large quantity of AWS Snapshots in your environment, which can slow down performance of the modeler. If you do not need to model them, you can disable collection of snapshots by setting the zAWSEnableSnapshotCollection property to false. This will prevent the modeler from collecting and modeling snapshots in future. It will also cause current snapshot components to be removed from Zenoss the next time the model is updated.
If you have already modeled your AWS snapshots, and the count is high, removing them can cause the modeler to time out. If this occurs, you can remove them manually by running the included dmd script delete_all_snapshot_components from the zope container.
Note: The delete_all_snapshot_component script will delete all AWS snapshot components from all AWS devices without prompting for confirmation. If you have multiple AWS devices and only want to delete snapshots from some devices, use zendmd.
If you use tag filters to organize your modeled AWS components, you may also want to enable monitoring charges per tag filter added to Zenoss. This will require configuration on both AWS and Zenoss sides.
The AWS Athena service is used to process Cost and Usage reports, so expect some extra costs for its usage.
Configuration on AWS side (you may use a different account to collect billing data from the account being used for monitoring, by using zAWSBillingAccessKey and zAWSBillingSecretKey zProperties):
Note: It can take up to 24 hours for AWS to start delivering reports to your S3 bucket.
For configuration on the Zenoss side, set the following zProperties:
If Cost and Usage reports are stored in a separate account, the zAWSBillingAccessKey and zAWSBillingSecretKey zProperties should be set to the access and secret keys of that account. If these properties are empty, the access and secret keys from the device will be used.
If a tag is used in a Tag Filter but is missing from the Cost and Usage reports, billing data will not be collected for that Tag Filter, and a corresponding Info event with the list of missing tags will be generated.
If necessary, this ZenPack can query AWS through an HTTP proxy. This is configured in the usual way, by setting the *_proxy environment variables. Because of this, the setting is global for a particular Zenoss process. It is therefore important to be aware that, for instance, enabling proxying for zenpython may cause the proxy to be used for other service monitoring beyond just AWS.
To configure these environment variables, edit the service definitions (via ‘serviced service edit’ or the Control Center UI) for the zenpython, zenmodeler, and zenjobs containers as follows:
"http_proxy=http://[proxy host]:[proxy port]",
"https_proxy=http://[proxy host]:[proxy port]",
Note that both the http_proxy and https_proxy values must begin with http://. The no_proxy variable is required so that communication with other Zenoss services is not impacted.
Note: Do not add this to the zope container.
To control SQS Queue message monitoring, use the zAWSSQSQueues zProperty. It defines a list of queue name regular expression patterns and the Zenoss event generation configuration.
For each SQS queue, the list of patterns is checked from top to bottom, and the first pattern that matches the queue name defines the configuration. If no pattern matches, the queue is not listened to.
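First-match-wins evaluation can be sketched as follows. The `(pattern, config)` pairs and the `parser` key are hypothetical stand-ins for the real zAWSSQSQueues format, and the use of `re.match` (anchored at the start of the name) is an assumption about how patterns are applied.

```python
import re


def match_queue_config(queue_name, patterns):
    """Return the config of the first pattern matching `queue_name`.

    `patterns` is an ordered list of (regex, config) pairs. Returns None
    when nothing matches, meaning the queue is not listened to.
    """
    for pattern, config in patterns:
        if re.match(pattern, queue_name):
            return config
    return None


patterns = [
    (r"prod-.*", {"parser": "sns"}),
    (r".*-events", {"parser": "raw"}),
]
for name in ("prod-alerts", "billing-events", "scratch"):
    print(name, match_queue_config(name, patterns))
```

Because evaluation stops at the first match, more specific patterns belong above broader ones.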
Messages from the queue will be consumed, and parsed with one of the following message types:
Uses the message body as the subject/message for Zenoss events. Severity will be Info.
SQS Queues can be attached to SNS Topics. The parser decodes the message and uses the “Subject” and “Message” fields to set the corresponding Zenoss event fields. Event severity will be Info.
Zenoss expects the message body to be in JSON format, decodes it, and treats it as an already formed Zenoss event. The following fields are extracted:
For all parsers, generated events will have the following fields:
If the event class or event class key is not defined in the event, the AWSSQSMessages event class key (mapped to /AWS/SQSMessage) is used.
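As an illustration of the SNS case, the sketch below decodes an SNS notification delivered through SQS and builds an event dictionary. `sns_message_to_event` is a hypothetical helper, and mapping “Subject” to `summary` and “Message” to `message` is an assumption about which Zenoss event fields are meant.

```python
import json

DEFAULT_EVENT_CLASS_KEY = "AWSSQSMessages"  # fallback named above
SEVERITY_INFO = 2                           # Zenoss Info severity


def sns_message_to_event(body):
    """Convert an SQS message body holding an SNS notification."""
    envelope = json.loads(body)
    return {
        "summary": envelope.get("Subject") or "",
        "message": envelope.get("Message") or "",
        "severity": SEVERITY_INFO,
        # No event class information is present, so use the fallback key.
        "eventClassKey": DEFAULT_EVENT_CLASS_KEY,
    }


body = json.dumps({
    "Type": "Notification",
    "Subject": "Disk almost full",
    "Message": "Volume vol-example is at 95%",
})
print(sns_message_to_event(body))
```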
You can get the namespaces and metrics available on a selected device as follows:
1. In the CloudWatch datasource edit window, click the dropdown button in the Namespace or Metric Name input field.
2. In the new window, select the Device and then the Region; the list of available Namespaces and Metrics for that region will appear.
Before testing the datasource (using the Test button in the datasource edit window) you may set a valid device name in the Device Name input field.
Sometimes you may need to monitor AWS CloudWatch metrics that are not defined in AWS ZP. If AWS ZP already models the component you need the metric for, you can update the provided template (or create a new one with the -addition suffix to avoid the template being overridden during an AWS ZP upgrade) and add a new datasource with corresponding graphs. You will need an Amazon CloudWatch datasource with the Namespace and Metric Name fields populated.
If there is no component for what you want to monitor, you may create a graph at the device level. However, you will need a separate datasource for each component instance.
Here is an example of how to add monitoring of AWS ECS:
To enable incremental modeling for ECS Components, select any Region component, switch to Templates in the drop-down menu, and enable the ECSWatch datasource. This datasource is disabled by default. The collection interval can be changed using the Cycletime property of the ECSWatch datasource.
'zAWSMaxCallRetries', with a default value of 4, sets the maximum number of retries per call to the AWS API. You can increase it if you have throttling issues with the AWS API. Setting this property to a high value may cause long calls and delays.
This property is applied to AutoScalingGroups, LambdaFunctions, LoadBalancers and TargetGroups. After changing this property and remodeling, the device's historical data for those components will be lost. By default, this property is enabled on a fresh install of AWS ZP 5.1.0 and disabled on an upgrade to AWS ZP 5.1.0.
This property uses the AWS ARN as the `id` for AutoScalingGroups, LambdaFunctions, LoadBalancers and TargetGroups to handle the case where a component was created on AWS, deleted, and recreated with the same name. On the AWS side these are two different components; on the Zenoss side they were treated as the same component. Setting `zAWSUseNewIds` to true and remodeling fixes this issue.
Sometimes the throttling error `Rate exceeded` may occur while modeling Lambda Functions, when fetching tag information. `zAWSLambdaFunctionsRetriesDelay` sets the base delay, in seconds, for retries.
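To show how a retry limit and a base delay typically interact, here is a generic retry loop with exponential backoff. It is an assumption that the delay doubles on each attempt; the ZenPack's actual retry schedule is not documented here. `call_with_retries` is a hypothetical helper and `RuntimeError` stands in for a throttling error.

```python
import time


def call_with_retries(func, base_delay, max_retries=4, sleep=time.sleep):
    """Call `func`, retrying on failure with exponentially growing delays.

    `max_retries` plays the role of zAWSMaxCallRetries and `base_delay`
    the role of zAWSLambdaFunctionsRetriesDelay.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RuntimeError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt))


attempts = {"count": 0}


def flaky_call():
    """Fail twice with a throttling-style error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Rate exceeded")
    return "ok"


delays = []
result = call_with_retries(flaky_call, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

With this schedule the worst-case wait grows quickly as the retry limit is raised, which is consistent with the warning above that high values may cause long calls and delays.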
If you have more than 1000 SQS Queues in one region, only 1000 will be discovered. To handle this case you can use the `zAWSSQSQueuesPrefix` property. Add prefixes to specify the SQS Queues you want to model additionally. Only plain strings should be used; regexes are not supported. SQS Queues collected using `zAWSSQSQueuesPrefix` are also limited to 1000 Queues per prefix.
Installing this ZenPack will add the following items to your Zenoss system.
Event Class Mapping
"ec2:Describe*", "ecs:Describe*", "ecs:List*", "events:ListRules", "events:ListTargetsByRule", "elasticloadbalancing:Describe*",
If modeling is not working and the following message is present in the modeling log:
No suitable AWS Modeler Server instance found.
Please check in zenpython log if it's running.
Check whether the monitoring template EstimatedCharges is available at the device level with the AWSModelRunPlugin and AWSQueueProcessingPlugin datasources inside it. If not, add them from Advanced - Monitoring Templates - EstimatedCharges. Then restart zenpython and look in zenpython.log for the message "AWS Modeler Server is up." It should appear before the regular collection process starts.
AWS Modeler Server is up.
While monitoring an AWS Account, error events such as the following might be created:
During processing ... datapoints in ... an error occured: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing
An error occurred (SignatureDoesNotMatch) when calling the GetQueueUrl operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for deta
An error occurred (AuthFailure) when calling the DescribeReservedInstances operation: AWS was not able to validate the provided access credentials
These errors mean that Zenoss could not connect to the AWS API due to an invalid access token. Possible causes:
1. Wrong AWS credentials. Check the EC2 Access and Secret Keys.
2. Wrong time on the collector host. Adjust the system clock on the collector hosts. Consider using an NTP daemon to adjust the host’s clock automatically.
The following error occurs while gathering billing data for tags and means that zAWSBillingAccessKey and zAWSBillingSecretKey need to be checked:
Could not fetch billing data. Check your zAWSBilling* properties: An error occurred (SignatureDoesNotMatch) when calling the GetObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing meth
AWS ZenPack versions 2.0.0 and 2.1.0 can be upgraded. To upgrade the ZenPack, install the latest version over the existing one. No user action is needed to migrate the data. The performance data and events of the old ZenPack are retained according to the retention policy settings.
During an upgrade from version 2.x to 3.0.0 or above, all performance data for S3 Buckets will be lost.
When upgrading from 3.x to 4.x, tags are structured differently. Devices must be remodeled to handle tags properly.
If a local copy of the EstimatedCharges template is used on an AWS device, then after upgrading to 5.x or above the bound local template needs to be synced manually with the ZenPack's template. The new datasources AWSModelRunPlugin and AWSQueueProcessingPlugin need to be copied to the local template.
In version 5.0.0, modeling of CloudFormation components was enabled by default. This was fixed in version 5.0.1. If a new device was added in version 5.0.0, recheck zAWSEnabledPlugins to enable or disable CloudFormation modeling.
After upgrading to 4.1.1, performance data for the LoadBalancers and Target Groups components (which were introduced in 4.1.0) will be lost.
Starting from version 4.1.0, the AWS ZenPack is no longer supported on Zenoss 4.x.
When upgrading from version 4.0.2, you may see errors regarding default_value. These errors are cleared automatically within the next monitoring cycles. If the events are not cleared, close them manually.
In the current version of the ZenPack, monitoring a large AWS account (e.g. >1000 EC2 instances and volumes) may cause performance issues:
It is possible to reduce the number of datapoints collected by disabling monitoring templates you don’t need.