Concepts
Overview
Elasticsearch is an open-source, distributed search and analytics engine. With PDS, you can leverage core capabilities of Kubernetes to deploy and manage your Elasticsearch clusters using a cloud-native, container-based model.
PDS follows the release lifecycle of the upstream Elasticsearch project, meaning new releases are available on PDS shortly after they have become GA. Likewise, older versions of Elasticsearch are removed once they have reached end-of-life. See the list of currently supported Elasticsearch versions.
Like all data services in PDS, Elasticsearch deployments run within Kubernetes. PDS includes a component called the PDS Deployments Operator that manages the deployment of all PDS data services, including Elasticsearch. The operator extends the functionality of Kubernetes by implementing a custom resource called elasticsearch. This resource type represents an Elasticsearch cluster, allowing standard Kubernetes tools to be used to manage Elasticsearch clusters, including scaling, monitoring, and upgrading.
You can learn more about the PDS architecture here.
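To give a sense of what the operator reconciles, the manifest below is a hypothetical sketch of an elasticsearch custom resource. The API group, field names, and values are assumptions for illustration only, not the actual PDS schema:

```yaml
# Hypothetical sketch of an elasticsearch custom resource.
# API group and field names are illustrative assumptions,
# not the actual PDS schema.
apiVersion: deployments.pds.io/v1   # assumed API group
kind: elasticsearch
metadata:
  name: es-cluster-1
  namespace: pds-demo
spec:
  nodes: 3          # cluster size; three or more for high availability
  version: "8.x"    # a currently supported Elasticsearch version
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
```

Because the cluster is represented as a Kubernetes resource, standard tooling (for example, kubectl) can be used to create, inspect, and modify it like any other object.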
Clustering
Elasticsearch is a distributed system meant to be run as a cluster of individual nodes. This allows Elasticsearch to be deployed in a fault-tolerant, highly available manner.
Although PDS allows Elasticsearch to be deployed with any number of nodes, high availability can only be achieved when running with three or more nodes. Smaller clusters should only be considered in development environments. In clusters with three or fewer nodes, all nodes are configured with all node roles (master, data, ingest, transform, ml, and remote_cluster_client). In clusters with more than three nodes, only three of the nodes will be configured to include the master role.
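The role-assignment rule above can be sketched as a small Python function. This is a hypothetical helper, not PDS code; it only restates the rule that clusters of three or fewer nodes give every node every role, while larger clusters limit the master role to three nodes:

```python
# Illustrative sketch of the role-assignment rule described above.
# Not PDS code: the function and names are assumptions for illustration.

ALL_ROLES = ["master", "data", "ingest", "transform", "ml", "remote_cluster_client"]

def roles_for_node(node_index, cluster_size):
    """Return the roles for the node at node_index (0-based) in a cluster."""
    if cluster_size <= 3:
        # Three or fewer nodes: every node carries every role.
        return list(ALL_ROLES)
    if node_index < 3:
        # Larger clusters: only the first three nodes keep the master role.
        return list(ALL_ROLES)
    return [r for r in ALL_ROLES if r != "master"]
```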
When deployed as a multi-node cluster, individual nodes, deployed as pods within a statefulSet, automatically discover each other to form a cluster. Node discovery within an Elasticsearch cluster is also automatic when pods are deleted and recreated or when additional nodes are added to the cluster via horizontal scaling.
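Discovery of this kind typically relies on the stable DNS names that a statefulSet's headless service gives each pod. The fragment below is a hedged sketch of what such seed-host configuration can look like; PDS generates these settings automatically, and all names here are assumptions:

```yaml
# Illustrative elasticsearch.yml fragment. PDS generates discovery
# settings automatically; the service and pod names below are assumptions.
cluster.name: es-cluster-1
discovery.seed_hosts:
  - es-cluster-1-0.es-cluster-1-headless   # pod-0 via headless-service DNS
  - es-cluster-1-1.es-cluster-1-headless
  - es-cluster-1-2.es-cluster-1-headless
cluster.initial_master_nodes:
  - es-cluster-1-0
  - es-cluster-1-1
  - es-cluster-1-2
```

Because statefulSet pods keep their names across restarts, a recreated pod rejoins the cluster under the same DNS identity, which is why discovery survives pod deletion and horizontal scaling.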
PDS leverages the partition tolerance of Elasticsearch by spreading Elasticsearch servers across Kubernetes worker nodes when possible. PDS utilizes Stork, in combination with Kubernetes storageClasses, to intelligently schedule pods. By provisioning storage from different worker nodes, and then scheduling pods to be hyper-converged with the volumes, PDS deploys Elasticsearch clusters in a way that maximizes fault tolerance, even if entire worker nodes or availability zones are impacted.
Refer to the PDS architecture to learn more about Stork and scheduling.