Our previous blog post, Designing and Deploying Cisco AI Spoofing Detection, Part 1: From Device to Behavioral Model, introduced a hybrid cloud/on-premises service that detects spoofing attacks using behavioral traffic models of endpoints. In that post, we discussed the motivation and the need for this service and the scope of its operation. We then provided an overview of our Machine Learning development and maintenance process. This post will detail the global architecture of Cisco AISD, the mode of operation, and how IT incorporates the results into its security workflow.
Because Cisco AISD is a security product, minimizing detection delay is critical, and several infrastructure choices follow from that goal. Most Cisco AI Analytics services use Spark as a processing engine. Cisco AISD instead uses an AWS Lambda function, because the warmup time of a Lambda function is typically shorter, which enables quicker generation of results and therefore a shorter detection delay. While this design choice reduces the computational capacity of the process, that has not been a problem thanks to a custom caching strategy that limits processing to only the new data on each Lambda execution.
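The caching strategy itself is not described in this post. As a rough illustration of the general idea of processing only new data on each execution, here is a minimal sketch that assumes a per-customer watermark timestamp and a running per-endpoint aggregate. The names `CustomerCache` and `process_new_records` are hypothetical and are not part of Cisco AISD.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerCache:
    """Hypothetical per-customer state kept between function executions."""
    watermark: int = 0  # timestamp of the newest record already processed
    flow_counts: dict = field(default_factory=dict)  # running count per endpoint


def process_new_records(cache, records):
    """Fold only records newer than the watermark into the cached aggregates.

    `records` is an iterable of (timestamp, endpoint_id) tuples.
    Returns the number of records actually processed in this run.
    """
    processed = 0
    newest = cache.watermark
    for timestamp, endpoint_id in records:
        if timestamp <= cache.watermark:
            continue  # already accounted for in an earlier execution
        cache.flow_counts[endpoint_id] = cache.flow_counts.get(endpoint_id, 0) + 1
        newest = max(newest, timestamp)
        processed += 1
    cache.watermark = newest
    return processed


cache = CustomerCache()
first_run = process_new_records(cache, [(1, "ep-a"), (2, "ep-b")])
# The second run sees the same two records again plus one new record.
second_run = process_new_records(cache, [(1, "ep-a"), (2, "ep-b"), (3, "ep-a")])
```

In this sketch the second execution touches a single record, so the work per run scales with the amount of new telemetry rather than with the full history.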
Global AI Spoofing Detection Architecture Overview
Cisco AISD is deployed on a Cisco DNA Center network controller using a hybrid architecture of an on-premises controller tethered to a cloud service. The service consists of on-premises processes as well as cloud-based components.
The on-premises components on the Cisco DNA Center controller perform several vital functions. On the outbound data path, the service continually receives and processes raw data captured from network devices, anonymizes customer PII, and exports it to cloud processes over a secure channel. On the inbound data path, it receives any new endpoint spoofing alerts generated by the Machine Learning algorithms in the cloud, deanonymizes any relevant customer PII, and triggers any Changes of Authorization (CoA) via Cisco Identity Services Engine (ISE) on affected endpoints.
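The post does not say how PII is anonymized on the outbound path or restored on the inbound path. One common way to get this round trip is keyed pseudonymization with a lookup table that stays on the controller. The sketch below assumes that approach; the `Pseudonymizer` class, the key, and the record fields are all hypothetical.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Hypothetical on-premises helper that swaps identifiers for keyed tokens."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}  # token -> original value; never exported to the cloud

    def anonymize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = digest.hexdigest()[:16]
        self._reverse[token] = value
        return token

    def deanonymize(self, token: str) -> str:
        return self._reverse[token]


pseudonymizer = Pseudonymizer(secret_key=b"example-key-kept-on-prem")

# Outbound path: the exported record carries a token instead of the MAC address.
outbound = {"endpoint": pseudonymizer.anonymize("00:11:22:33:44:55"), "bytes": 1200}

# Inbound path: a hypothetical alert from the cloud refers to the token only.
alert = {"endpoint": outbound["endpoint"], "reason": "behavior mismatch"}
restored = pseudonymizer.deanonymize(alert["endpoint"])
```

Because the token is derived with a key held on-premises, the cloud side can correlate records for the same endpoint without being able to recover the original identifier.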
The cloud components perform several key functions focused primarily on processing the high-volume data flowing from all on-premises deployments and running Machine Learning inference. In particular, the evaluation and detection mechanism has three steps:
- Apache Airflow is the underlying orchestrator and scheduler to initiate compute functions. An Airflow DAG frequently enqueues computation requests for each active customer to a queuing service.
- As each computation request is dequeued, a corresponding serverless compute function is invoked. Using serverless functions enables us to control compute costs at scale. The function is short-running but compute-intensive and works in several steps: it performs an ETL step that reads raw anonymized customer data from data buckets and transforms it into a set of input feature vectors, which our Machine Learning models then use for spoof detection inference. This compute function is built on the Function as a Service (FaaS) offerings common to cloud providers.
- The same function then performs the model inference step on the feature vectors produced in the previous step, which detects spoofing attempts if any are present. If a spoof attempt is detected, the details of the finding are pushed to a database that is queried by the on-premises components of Cisco DNA Center and finally presented to administrators for action.
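The three steps above can be sketched end to end in a few lines. This is only an illustration of the control flow: an in-memory `deque` stands in for the queuing service, dictionaries stand in for the data buckets and the alert database, and `toy_model` is a placeholder threshold rule rather than the actual Machine Learning model. Every name and value here is hypothetical.

```python
from collections import deque

# Hypothetical stand-ins for the data buckets and the alert database.
RAW_DATA = {
    "customer-1": [
        {"endpoint": "tok-a", "bytes": 100, "packets": 10},
        {"endpoint": "tok-b", "bytes": 90000, "packets": 30},
    ],
    "customer-2": [{"endpoint": "tok-c", "bytes": 200, "packets": 20}],
}
ALERTS = []


def schedule(active_customers, request_queue):
    """Step 1: enqueue one computation request per active customer."""
    for customer in active_customers:
        request_queue.append({"customer": customer})


def extract_features(records):
    """Step 2 (ETL): turn raw records into (endpoint, feature vector) pairs."""
    return [(r["endpoint"], [r["bytes"] / r["packets"]]) for r in records]


def toy_model(features):
    """Placeholder for the real model: flags a large average packet size."""
    return features[0] > 1000


def handle_request(request):
    """Steps 2 and 3: ETL, inference, and storage of any findings."""
    customer = request["customer"]
    for endpoint, features in extract_features(RAW_DATA[customer]):
        if toy_model(features):
            ALERTS.append({"customer": customer, "endpoint": endpoint})


request_queue = deque()
schedule(["customer-1", "customer-2"], request_queue)
while request_queue:
    handle_request(request_queue.popleft())
```

After the loop drains the queue, `ALERTS` holds one finding, for endpoint `tok-b` of `customer-1`, which is the record the on-premises side would later query.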
Figure 1 captures a high-level view of the Cisco AISD components. Two components, in particular, are central to the cloud inferencing functionality: the Scheduler and the serverless functions.
The Scheduler is an Airflow Directed Acyclic Graph (DAG) responsible for triggering the serverless function executions on active Cisco AISD customer data. The DAG runs at high-frequency intervals, pushing events into a queue and triggering the inference function executions. Each DAG execution prepares all the metadata for the compute function, which includes determining the customers with active flows, grouping compute batches based on telemetry volume, and optimizing the compute process.
The inferencing function performs ETL operations, model inference, detection, and storage of spoofing alerts, if any. This compute-intensive process implements much of the intelligence for spoof detection. Because our ML models are retrained regularly, this architecture also enables the quick rollout of updated models, or a rollback if needed, without any change to or impact on the service.
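The post does not specify how compute batches are grouped by telemetry volume. As one plausible illustration, the sketch below uses a greedy first-fit-decreasing heuristic with a per-batch volume cap. The function name `group_batches`, the cap, and the sample volumes are hypothetical.

```python
def group_batches(volumes, max_batch_volume):
    """Greedy first-fit-decreasing grouping of customers into compute batches.

    `volumes` maps a customer id to its telemetry volume (any consistent unit).
    A customer whose volume alone exceeds the cap gets a batch of its own.
    """
    batches = []  # each entry: {"customers": [...], "volume": total}
    # Largest customers first; ties broken by id so the result is deterministic.
    ordered = sorted(volumes.items(), key=lambda item: (-item[1], item[0]))
    for customer, volume in ordered:
        for batch in batches:
            if batch["volume"] + volume <= max_batch_volume:
                batch["customers"].append(customer)
                batch["volume"] += volume
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append({"customers": [customer], "volume": volume})
    return batches


batches = group_batches(
    {"a": 70, "b": 40, "c": 30, "d": 20, "e": 120},
    max_batch_volume=100,
)
```

With these sample volumes, customer `e` ends up alone, `a` is paired with `c`, and `b` with `d`, so each function invocation receives a roughly bounded amount of work.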
The inference function executions have a stable average runtime of approximately 9 seconds, as shown in Figure 2, which, as stipulated in the design, does not introduce any significant delay in detecting spoofing attempts.
Cisco AI Spoofing Detection in Action
In this blog post series, we described the internal design principles and processes of the Cisco AI Spoofing Detection service. However, from a network operator’s point of view, all these internals are entirely transparent. To start using the hybrid on-premises/cloud-based spoofing detection system, Cisco DNA Center Admins need to enable the corresponding service and cloud data export in Cisco DNA Center System Settings for AI Analytics, as shown in Figure 3.
Once enabled, the on-premises component in Cisco DNA Center starts to export relevant data to the cloud that hosts the spoof detection service. The cloud components automatically begin scheduling the model inference function runs, evaluating the ML spoofing detection models against incoming traffic, and raising alerts when spoofing attempts on a customer endpoint are detected. When the system detects spoofing, Cisco DNA Center in the customer’s network receives an alert with the details of the finding. An example of such a detection is shown in Figure 4. In the Cisco DNA Center console, the network operator can set options to execute pre-defined containment actions for the endpoints marked as spoofed: shut down the port, flap the port, or re-authenticate the port from memory.
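To make the containment options concrete, here is a small sketch of how an operator's pre-defined choice could be paired with endpoints marked as spoofed. It models only the decision, not the actual Cisco DNA Center or ISE calls, and the `ContainmentAction` enum, the `plan_containment` function, and the alert fields are hypothetical.

```python
from enum import Enum


class ContainmentAction(Enum):
    """The three containment options named above, as hypothetical constants."""
    SHUTDOWN_PORT = "shut down the port"
    FLAP_PORT = "flap the port"
    REAUTHENTICATE_PORT = "re-authenticate the port"


def plan_containment(alerts, configured_action):
    """Pair each spoofed endpoint with the operator's pre-defined action.

    `configured_action` is None when the operator has not enabled containment.
    """
    if configured_action is None:
        return []
    return [
        (alert["endpoint"], configured_action)
        for alert in alerts
        if alert["spoofed"]
    ]


alerts = [
    {"endpoint": "00:11:22:33:44:55", "spoofed": True},
    {"endpoint": "66:77:88:99:aa:bb", "spoofed": False},
]
plan = plan_containment(alerts, ContainmentAction.FLAP_PORT)
```

Only the endpoint marked as spoofed appears in `plan`, and nothing is planned at all when no action has been configured.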