Cloud Field Day 18 is here and Juniper is kicking off a packed couple of days of exciting presentations. Juniper Networks has been making some strong, innovative moves over recent years. The purchase of Apstra in 2020 has become a core story for its data center and cloud networking portfolio.
Cloud Automation is Not a Goal. It’s a Necessity
It’s unlikely that anything needs automation as much as a network. The volume of policy, the constant change, and the combination of data movement and data security are intense. That makes the risk of misconfiguration, or of external actors getting at your data, extremely high.
Call it digital transformation or whatever you want. What matters is that the speed at which your team needs to work is not achievable without some level of automation. Apstra was founded to bring the idea of intent-based networking to the forefront. Can your network be trustworthy, consistent, and resilient? It can when automation as a practice is core to your operations.
CFD18: Private Data Center as Easy as Cloud – Mansour Karam
Starting with how the data center has evolved was a fun throwback. Here we are in 2023, all talking about AI and ML, while the networks of the world are struggling to catch up to use-cases that are already very real.
Being able to interact with AI and ML using continuous path optimization is critical. Especially in the data center where we have interaction with public and private cloud networks.
To really reach deterministic control in dynamic networks, you need to have three fundamental source capabilities.
- Intent-based – signals that give application and traffic intent. This is core to being able to do dynamic policy.
- AI-driven – use probabilistic insights to discover patterns and drive decisions (and ideally, actions).
- App-aware – the application is the key asset for the business and being app-aware is required to really build a true intent-based networking solution.
I was curious at the start about how human interaction with the platform can influence how application operations and network operations are tuned. The application awareness is key IMHO because so much of the information that shapes policy and intent-based networking is floating up at Layers 4-7.
There’s a fourth bonus, which is being open. Juniper has taken a good approach by building reference architectures with their JVDs (Juniper Validated Designs).
So much of what we need to do is about frameworks, and functional automation.
Cloud Style On Premises Network – Chris Marget
Chris is a seasoned Tech Field Day presenter and fantastic technologist. He opened with a great discussion of where systems automation, and the language you choose for it, needs to be in order to work for on-premises and hybrid environments.
It may seem funny to talk about data center automation as part of “Cloud Field Day”. Think about why it matters: cloud is about a methodology of operations and consumption of resources. That methodology still applies on premises, even though we don’t have an “infinite” supply of resources like a public cloud.
Demo time!!
Chris launched into a great demo of the dynamic lab. We watched an automated deployment of a data center configuration on physical gear that was already wired up. The Apstra platform is used to define a topology, and the configuration is then applied as code using Terraform.
This spurred the conversation around the differences between this type of automated, policy-driven application versus “traditional” data center networking. Lots of good chat on the challenges of Terraform and device management came up. The main reason we ask about how Apstra manages Terraform resources is for understanding the failure domains and drift management.
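To make the drift question concrete, here is a minimal Python sketch of what "drift" means: the intended state (what the code says) is compared against the actual state (what is on the device). This is not how Apstra or Terraform implement it. The function name `find_drift` and the sample settings are made up for illustration.

```python
def find_drift(intended: dict, actual: dict) -> dict:
    """Return every setting whose actual value differs from the intended one."""
    drift = {}
    # Walk the union of keys so settings added by hand on the device show up too.
    for key in intended.keys() | actual.keys():
        want = intended.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = {"intended": want, "actual": have}
    return drift


# Hypothetical intended vs. observed settings for one switch port.
intended = {"vlan": 100, "mtu": 9000, "bgp_asn": 65001}
actual = {"vlan": 100, "mtu": 1500, "bgp_asn": 65001, "description": "hand edit"}

drift = find_drift(intended, actual)
for setting in sorted(drift):
    print(setting, drift[setting])
```

In this toy case the MTU has been changed and a description was added by hand, so both show up as drift. The failure-domain question is what the tool does next: overwrite, alert, or refuse to apply.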
Apstra is able to manage all of the configurations without needing to code in the native networking tools.
The demo is a very cool view of deploying a web application and a web proxy, with all the routes dynamically built for the multiple back end web servers. Very cool. Terraform does a lot of heavy lifting, which makes the approach popular since many folks are already familiar with it. Terraform does the initial blueprint creation, and lots of object-level control is available.
It’s going to require some initial template configuration and care and feeding. There is no “easy button” for initial configuration of data center networks, but you also don’t need to write that code often.
There is also an interesting option to do rollbacks and a time machine type of capability. This is helpful for both auditing and the actual recovery to a known good configuration in case of an issue.
What is important to understand is the differentiation that Apstra brings. The platform does not replace the fundamentals of networking. We sometimes think that we can replace the network operator or automate the initial physical configuration, and that is not what this is about.
Apstra does really slick stuff for detecting anomalies and identifying the issues. This greatly cuts down the troubleshooting time.
Automating AI Cluster Design with Apstra – James Kelly
You had me at AI cluster networking.
Training models on an AI cluster requires an intense amount of data movement. While GPU and processing power are the top resources, data transfer optimization is becoming the new bottleneck. Datasets keep growing, and so does the number of processors running. This all adds up to a new challenge that comes with the advent of parallel processing, but at significant scale.
Rail-optimized design is a method that NVIDIA uses with NVLink to reduce the amount of tromboning that is happening with data getting in and out of parallel GPUs. Juniper is able to understand all the way down to the physical cabling in the GPU cluster to be able to do network path optimization.
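A minimal Python sketch of the cabling idea, assuming "rail-optimized" means that GPU number k in every server is cabled to the same leaf switch (rail k). That is my reading of the design, not Juniper's tooling. The function names and the leaf naming scheme are made up for illustration.

```python
def rail_cabling(num_servers: int, gpus_per_server: int) -> dict:
    """Map each (server, gpu) pair to a leaf: GPU k always lands on rail leaf k."""
    return {
        (server, gpu): f"leaf{gpu}"
        for server in range(num_servers)
        for gpu in range(gpus_per_server)
    }


def same_rail(cabling: dict, a: tuple, b: tuple) -> bool:
    """True when two GPUs hang off the same leaf, so traffic stays on one switch."""
    return cabling[a] == cabling[b]


# Hypothetical cluster: 4 servers with 8 GPUs each, giving 8 rails (leaves).
cabling = rail_cabling(num_servers=4, gpus_per_server=8)

print(cabling[(0, 3)], cabling[(2, 3)])      # GPU 3 on two different servers
print(same_rail(cabling, (0, 3), (2, 3)))    # same rail
print(same_rail(cabling, (0, 3), (0, 4)))    # different rails on one server
```

Even in this toy version the number of links grows as servers times GPUs, which is why getting the physical cabling plan right by hand gets painful quickly.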
DEMO TIME AGAIN!
The configuration for the Terraform Apstra example we saw is held on GitHub here: https://github.com/Juniper/terraform-apstra-examples
Seeing the implementation of an 8-way leaf configuration, and the amount of variability you could have, tells you that automation and data-driven optimization are an absolute necessity.
One thing you learn in this demo and discussion is the amount of protocol, software, and hardware interplay. There is literally no way to keep up with the dynamics using human operations.
The amount of analytics pouring through the environment is humbling. This is why it’s a great lead in to the final presentation.
How to be a Data Center Analytics Superhero – Kyle Baxter & Rajeev Menon Kadekuzhi
Operations depends greatly on visibility into, and awareness of, the state of the environment. Gathering telemetry and analytics from the environment is something the Juniper Apstra team is clearly investing in. We saw one nifty new feature, and more is coming with flow data in the near future.
IBA probes are used to gather data from the live environment. Collectors run to aggregate telemetry. The dashboards then give you a simple, intuitive way to visualize and also interact with the environment, using built-in CLI integration for troubleshooting.
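The collect, aggregate, and flag flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Apstra's probe format or API. The sample records, field names, and threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_interface(samples: list) -> dict:
    """Aggregation stage: average the error counter per (device, interface)."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[(sample["device"], sample["interface"])].append(sample["rx_errors"])
    return {key: mean(values) for key, values in grouped.items()}


def raise_anomalies(averages: dict, threshold: float) -> list:
    """Check stage: flag every interface whose average exceeds the threshold."""
    return sorted(key for key, value in averages.items() if value > threshold)


# Hypothetical telemetry as a collector might hand it over.
samples = [
    {"device": "leaf1", "interface": "et-0/0/1", "rx_errors": 0},
    {"device": "leaf1", "interface": "et-0/0/1", "rx_errors": 2},
    {"device": "leaf2", "interface": "et-0/0/5", "rx_errors": 40},
    {"device": "leaf2", "interface": "et-0/0/5", "rx_errors": 60},
]

averages = average_by_interface(samples)
anomalies = raise_anomalies(averages, threshold=10)
print(anomalies)
```

Each stage takes the previous stage's output as its input, which is the part that makes chaining more processing steps onto the same telemetry straightforward.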
I highly suggest you watch the troubleshooting flow for the final session because of how well the use-case is illustrated.
More Processors for Discovering Valuable Insights
The standout hit for me here is the number of analytics processors that are available to combine data and then slice it and dice it to find real insights into what’s happening.
The UX is pretty intuitive even if you don’t have a strong grasp of graph databases and visualization. You can do lots of additional processing on the data to help with logging effectively.
Anyone making the first jump to graph queries may find it a little daunting at first. Juniper did a great job by including lots of out-of-the-box queries and you can see the code, examples, and then build out your own queries using the UI or CLI.
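If graph queries are new to you, the core idea is matching a pattern of nodes and relationships instead of joining tables. Here is a toy Python version over an in-memory graph. It is not Apstra's query language. The node IDs, the `hosted_interfaces` relationship label, and the `match` helper are invented for illustration.

```python
# A tiny graph: nodes carry properties, edges are (source, relationship, target).
nodes = {
    "sys1": {"type": "system", "role": "leaf"},
    "sys2": {"type": "system", "role": "spine"},
    "if1": {"type": "interface", "name": "et-0/0/1"},
    "if2": {"type": "interface", "name": "et-0/0/2"},
    "if3": {"type": "interface", "name": "et-0/0/1"},
}
edges = [
    ("sys1", "hosted_interfaces", "if1"),
    ("sys1", "hosted_interfaces", "if2"),
    ("sys2", "hosted_interfaces", "if3"),
]


def match(nodes, edges, source_filter, relationship, target_type):
    """Return (source, target) pairs that fit the pattern source -[rel]-> target."""
    results = []
    for src, rel, dst in edges:
        if rel != relationship:
            continue
        if nodes[dst]["type"] != target_type:
            continue
        # Every property in the filter must match on the source node.
        if all(nodes[src].get(k) == v for k, v in source_filter.items()):
            results.append((src, dst))
    return results


# "Find every interface hosted on a leaf system."
leaf_interfaces = match(
    nodes, edges, {"type": "system", "role": "leaf"}, "hosted_interfaces", "interface"
)
print(leaf_interfaces)
```

The out-of-the-box queries play the same role as the last call here: a ready-made pattern you can read, run, and then tweak.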
It was also great to see the upcoming feature called Apstra Flow Data. This is a new capability that focuses on enriching the data, adding context and visualization on top of it. This will be something to watch for sure!
Lots of very intuitive visualizations and being able to see dynamic flows and metrics is super cool.
Thoughts on Juniper and What’s Next
What matters is the methodology. Juniper and Apstra give you an automated, adaptive way to manage the network. Data center networking products have the capability to be managed, but rarely offer a good “all in one” solution for controlling multiple types of devices/networks/applications.
The new features around AI clusters, including rail-optimized configuration for your GPU environment, are a good example. I was very happy to see that Juniper is pragmatic in the features they are adding.
The telemetry and graph features are very interesting. I’m curious to see how much time operations will spend in here versus doing data exports. Regardless, the Juniper Apstra team did a great job covering questions and the use-cases.
Make sure to check out Apstra and keep track of updates at Juniper Apstra on LinkedIn as well!
Bonus Reading
Good reading to help show more of what we have talked about during the sessions:
Automating AI Training Clusters with Juniper Apstra
Don’t Let Your AI Get Caught in Traffic
GitHub repo from the demo: https://github.com/chrismarget-j/cfd18
Check out Cloud Labs
Check out the Juniper page on Tech Field Day for more from previous presentations as well.
DISCLOSURE: My travel expenses were covered by Tech Field Day (GestaltIT) for the event. All analysis and content is my opinion from the presentation, discussion, and independent research.