Scale & manage a million containers in production
Today, more than ever, IT organizations face the challenge of adapting to fast-evolving business needs with agility, while interfacing with legacy systems as effectively as possible. This also requires them to connect different worlds, each with its own standards and velocity.
A key part of the solution lies in efficient schedulers with good scheduling algorithms, fast service registration and discovery, and the pooling of resources across the organization into a single cluster.
With the stay-at-home economy, more and more people are using streaming services for entertainment at home. To meet the demand while keeping costs under control, platforms are embracing containerization along with orchestrators or schedulers to deliver services to their customers.
The goal of container orchestrators or schedulers such as Kubernetes or Nomad is to allow organizations to deploy more applications at faster rates while increasing resource density. Instead of running just one application per host, a scheduler separates applications from infrastructure so that multiple applications run on each host. This increases the resource utilization across a cluster and saves on infrastructure costs.
“For us, we found that many problems with our services were not caused by Kubernetes. They were there all along and Kubernetes just made them more visible,”
Illya Chekrykin – 2017 KubeCon
Choosing a scheduler that works for you
An efficient scheduler has three qualities. The first is the ability to service multiple development teams requesting to deploy applications at the same time. The second is the ability to rapidly place those applications across a global infrastructure fleet. The third is the ability to reschedule applications from failed nodes to healthy ones, providing cluster-wide application availability management and self-healing capabilities.
Parallelism allows a scheduler to scale both in finding placements for large numbers of tasks (containers or applications) and in servicing many requestors. An optimistically concurrent, shared-state scheduler can scale scheduling throughput linearly and safely while ensuring resources are not oversubscribed.
Bin packing algorithm
Utilize all of a node’s resources before placing tasks on a different node. This allows for a larger variety of workloads and maximizes resource utilization and density, which ultimately minimizes infrastructure costs.
Spread algorithm
Keep utilization even across all nodes. This distributes risk and reduces the response and recovery effort if a machine dies.
Make it easy for clients to submit jobs at scale.
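The difference between the two placement strategies can be illustrated as a scoring function over candidate nodes. This is a deliberately simplified sketch (one resource dimension, hypothetical node names); real schedulers score across CPU, memory, and other constraints:

```python
# Toy illustration of bin-packing vs. spread placement scoring.
# Each node reports free CPU (MHz); a task requests some amount.

def pick_node(nodes, task_cpu, strategy="binpack"):
    """Return the name of the best node for the task, or None if none fit."""
    candidates = {name: free for name, free in nodes.items() if free >= task_cpu}
    if not candidates:
        return None
    if strategy == "binpack":
        # Prefer the fullest node that still fits: use nodes up before spilling over.
        return min(candidates, key=candidates.get)
    # "spread": prefer the emptiest node, keeping utilization even.
    return max(candidates, key=candidates.get)

nodes = {"node-a": 500, "node-b": 2000, "node-c": 1200}
print(pick_node(nodes, 400, "binpack"))  # node-a: fullest node that fits
print(pick_node(nodes, 400, "spread"))   # node-b: emptiest node
```

Bin packing drives up density on fewer machines; spread limits the blast radius of a single machine failure.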
Manually updating service configurations at this scale would be logistically impossible, so an automated solution is needed. Service discovery is essential when scheduling containers at scale. Tools such as HashiCorp Consul or Apache ZooKeeper allow services to register themselves and their location with a central registry. Other services in the cluster then use this central registry to discover the services they need to connect to.
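The register-then-discover pattern these tools implement can be sketched as a minimal in-memory registry (the service name and addresses here are hypothetical; production registries like Consul add health checking, TTLs, and replication):

```python
# Minimal in-memory sketch of a service registry -- the core pattern that
# Consul or ZooKeeper provide, minus health checks and distribution.

class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> list of (host, port)

    def register(self, name, host, port):
        """A service instance announces itself and its location."""
        self._services.setdefault(name, []).append((host, port))

    def discover(self, name):
        """Return all known instances of a service."""
        return list(self._services.get(name, []))

registry = ServiceRegistry()
registry.register("billing", "10.0.1.5", 8080)
registry.register("billing", "10.0.1.6", 8080)
print(registry.discover("billing"))  # both registered instances
```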
Pooling Resources across a single cluster
Rapid scheduling across infrastructure is helpful beyond just scaling cluster size. Most companies have a number of different teams or organizations, often competing for resources. Traditionally this problem has been solved by creating unique, segmented clusters for each group. While this does provide isolation and prevents overlap, it usually leads to underutilization of each team’s cluster. With rapid, cluster-wide deployment of resources, a company can unify and simplify its infrastructure while also allowing for more flexibility between teams.
Drive Infrastructure-as-Code practice for Hybrid Cloud in highly regulated industries
Highly regulated industries like healthcare, finance, and government have so far been slow to adopt DevOps practices due to regulatory requirements such as data privacy and data residency, as well as security concerns. But these issues can be addressed using hybrid and multi-cloud architectures, spanning both private datacenters and public clouds, that leverage IaC (Infrastructure as Code) concepts along with the required security principles and configuration policies. We have actively worked with a number of clients in these regulated industries to streamline their IaC practice and deploy infrastructure in hybrid clouds.
Hybrid and multi-cloud environment solutions used by regulated industries go beyond point solutions within each specific cloud to tools like HashiCorp Terraform or configuration management tools like Ansible. These solutions follow a declarative approach, defining what infrastructure should exist, rather than an imperative approach that defines each step needed to create it.
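The declarative/imperative distinction can be sketched in a few lines. The sketch below uses a fake cloud API and hypothetical resource names; the point is that the declarative reconciler creates only what is missing, so applying it repeatedly is safe and idempotent, the property Terraform-style tools rely on:

```python
# Declarative provisioning sketch: state what should exist, let a
# reconciler work out the steps. (FakeCloud and all names are hypothetical.)

class FakeCloud:
    def __init__(self):
        self.resources = {}  # kind -> set of resource names

    def existing(self, kind):
        return self.resources.get(kind, set())

    def create(self, kind, name):
        self.resources.setdefault(kind, set()).add(name)

# Desired state: a declaration, not a sequence of steps.
DESIRED = {"network": {"net-1"}, "vm": {"web-1", "web-2"}}

def reconcile(cloud, desired):
    """Create only what is missing; running this twice changes nothing."""
    for kind, names in desired.items():
        for name in sorted(names - cloud.existing(kind)):
            cloud.create(kind, name)

cloud = FakeCloud()
cloud.create("vm", "web-1")   # one VM already exists
reconcile(cloud, DESIRED)
print(cloud.resources["vm"])  # now matches the desired state
```

An imperative script, by contrast, would hard-code each create call in order, and re-running it would fail or duplicate work.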
IaC is integral to DevOps principles, including continuous integration and continuous delivery/deployment.
Benefits of a robust IaC practice
Immutable instances (uniform infrastructure that avoids configuration drift, which helps regulated industries with compliance and security)
Deployment speed (DevOps and IaC tools, along with cloud and virtualized infrastructure, greatly increase environment provisioning and configuration speed)
Change management (versioned deployment manifests help deploy and manage code changes, speeding up bug fixes while ensuring consistent quality and security, especially for regulated industries)
High scalability (provisioning infrastructure with code makes it quick and easy to increase or decrease virtual cloud resources to meet elastic demand)
Shorter feedback loops (quick deployment and testing with IaC help developers build and release new features faster, gather customer feedback sooner, and improve customer satisfaction)
Scale Security as Code Practice and Transform to Zero Trust Security as Code
Often conflated with Security Automation, Security as Code ensures that security policy, that is, the goals of your organization, is enforced by code, rather than by written procedures or manual approvals.
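What "policy enforced by code" means in practice can be sketched with a hypothetical pipeline check. The policy ("no storage bucket may be public") and the resource format below are invented for illustration; real implementations would evaluate a Terraform plan or an admission request:

```python
# Hypothetical sketch of Security as Code: a written rule ("no storage
# bucket may be public") becomes a check that runs in CI, not a manual review.

def violations(resources):
    """Return the names of resources that break the policy."""
    return [r["name"] for r in resources
            if r["type"] == "bucket" and r.get("public", False)]

plan = [
    {"type": "bucket", "name": "logs", "public": False},
    {"type": "bucket", "name": "assets", "public": True},
]
bad = violations(plan)
print(bad)  # ['assets'] -- a CI pipeline would fail the build here
```

Because the policy lives in code, it is versioned, reviewable, and applied identically to every deployment.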
Zero Trust Security
As opposed to high-trust security, this is a term from network security: rather than trusting a given network, set of networks, or other infrastructure identity model, the identity model becomes more “logical”. This means trusting a given credential, set of credentials, or other logical identity, instead of trusting an IP address or VM.
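The shift from network trust to logical identity can be sketched with a signed credential check. Everything here is illustrative (the key, the identity names, the token format); real systems would use short-lived certificates or tokens issued by a secrets manager, but the principle is the same, verify the credential, not the caller's network location:

```python
import hmac, hashlib

# Sketch: trust a verifiable credential instead of an IP allowlist.
SECRET = b"demo-signing-key"  # hypothetical; in practice issued and rotated

def issue_token(identity: str) -> str:
    sig = hmac.new(SECRET, identity.encode(), hashlib.sha256).hexdigest()
    return f"{identity}:{sig}"

def verify_token(token: str) -> bool:
    identity, _, sig = token.partition(":")
    expected = hmac.new(SECRET, identity.encode(), hashlib.sha256).hexdigest()
    # The decision depends only on the credential, never on the source address.
    return hmac.compare_digest(sig, expected)

token = issue_token("svc-billing")
print(verify_token(token))             # True: valid credential
print(verify_token("svc-x:deadbeef"))  # False: forged credential
```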
The larger an organization is, and the greater the complexity and interdependence of its various technical, compliance, and business requirements, the more obvious the need for an adaptive, enablement-focused approach to security. Bottlenecks in the form of manual procedures will begin to cripple an organization that needs to both scale and meet increasingly uncertain threat and compliance needs.
Security as Code, and Zero Trust Security reduce the need for manual intervention, and rely on more tenable assumptions about large, highly regulated environments. Give your business the armor it needs to survive in markets where others fail, and force competitors to take on too much risk to compete effectively.
That raises the question: where to start, and how to avoid potential pitfalls?
It is important to recognize that the gatekeeping function of SIEM and other traditional information security departments has its place, but it isn’t Security Enablement. Shift the mindset from removing or patching security leaks to adding armor that enables your developers and products to innovate faster and respond to changing needs before the competition.
The need for Security Enablement is greatest among organizations in the throes of adopting DevOps and SRE practices. A critical step is to combine the security enforcement and gatekeeping responsibilities that already exist with the need to move quickly on cloud adoption.
The skill set for rapid adoption of new security techniques and models is not the same as traditional SIEM. People with experience in automation and SRE will be invaluable; this can sometimes even warrant a separate initiative and “charter” from the SIEM department, working in tandem with it. SIEM teams may not be best suited to develop Security as Code solutions, but they can and should be part of the conversation about security policy and “red-team” the results of Zero Trust and Security as Code.
Figure out the tools that will work best with your development processes, and work backwards from there. The benefit is that developers get involved in their application security in a self-service way that’s a lot closer to their code.
Investigate the implications of making CI/CD integration, developer-friendly APIs, and other automation-focused feature sets a more important criterion for choosing enterprise cloud security software: this will reduce deployment risk and accelerate safe cloud migration to meet deadlines. KPIs for security systems should include an upper bound, median, and floor for the percentage they add to cycle time.
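The suggested KPI can be computed directly from per-deploy measurements. The sample values below are invented for illustration; the point is that a single average would hide the outlier that the upper bound exposes:

```python
import statistics

# Sketch of the KPI above: the percentage security controls add to cycle
# time per deployment, reported as floor, median, and upper bound.
overhead_pct = [1.5, 2.0, 2.5, 3.0, 4.0, 9.0, 2.2]  # hypothetical sample

floor = min(overhead_pct)
median = statistics.median(overhead_pct)
upper = max(overhead_pct)
print(f"floor={floor}% median={median}% upper={upper}%")
```

A healthy median with a runaway upper bound signals a specific bottleneck (for example, one manual approval step) rather than a systemic cost.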