Portworx Enterprise Release Notes
3.3.0
June 23, 2025
To install or upgrade Portworx Enterprise to version 3.3.0, ensure that you are running one of the supported kernels and all system requirements are met.
New Features
- ActiveCluster on FlashArray Direct Access volumes
  Portworx now supports ActiveCluster on FlashArray Direct Access volumes with PX-StoreV2, allowing synchronous replication and automatic failover across multiple FlashArrays. For more information, see Install Portworx with Pure Storage FlashArray Direct Access volumes with ActiveCluster setup.
- Application I/O Control leveraging Control Group v2
  Portworx now supports Application I/O Control on hosts that use cgroup v2, in addition to cgroup v1. Portworx automatically detects the available cgroup version and applies I/O throttling accordingly to ensure seamless operation across supported Linux distributions. For more information, see Application I/O Control.
- Vault for storing vSphere credentials
  Portworx Enterprise now supports storing vSphere credentials in Vault when using Vault as a secret provider, providing a more secure and centralized way to manage vSphere credentials. Previously, vSphere credentials were stored in Kubernetes secrets. For more information, see Secrets Management with Vault.
- Enhanced Cluster-wide Diagnostics Collection and Upload
  Portworx now supports cluster-level diagnostics collection through the PortworxDiag custom resource. When the resource is created, the Portworx Operator launches temporary diagnostic pods that collect node-level data and Portworx pod logs, store the results in the /var/cores directory, and then automatically delete the diagnostic pods. For more information, see On-demand diagnostics using the PortworxDiag custom resource. A minimal sketch of the resource follows this list.
- Volume-granular Checksum Verification tool for PX-StoreV1
  Portworx now supports block-level checksum verification across volume replicas using the pxctl volume verify-checksum command for PX-StoreV1. This feature ensures data integrity by comparing checksums across all replicas and supports pause/resume functionality with configurable I/O controls. For more information, see pxctl volume. A usage sketch also follows this list.
- TLS Encryption for Internal KVDB Communication
  Portworx now supports enabling Transport Layer Security (TLS) for internal KVDB communication on Google Anthos. Subsequent releases will include support for additional platforms. This feature secures communication between the internal KVDB and all Portworx nodes using TLS certificates managed by cert-manager. For more information, see Enable TLS for Internal KVDB.
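Below are two hedged sketches for the features above. The first shows a minimal PortworxDiag custom resource; the apiVersion, namespace, and empty spec are assumptions, and any node- or volume-selection fields depend on the Operator release, so confirm the schema in the Operator documentation.

```sh
# Minimal PortworxDiag sketch (apiVersion, namespace, and spec fields are assumptions).
cat <<'EOF' | kubectl apply -f -
apiVersion: portworx.io/v1
kind: PortworxDiag
metadata:
  name: px-diag-example      # hypothetical name
  namespace: portworx        # assumes Portworx is installed in the "portworx" namespace
spec: {}                     # node/volume selectors, if any, are Operator-release specific
EOF
```

The second is a usage sketch for the checksum verification command on PX-StoreV1; the volume name is a placeholder, and any pause/resume or I/O-control flags should be taken from the command's own help output rather than from this example.

```sh
# Verify block-level checksums across all replicas of a volume (placeholder volume name).
pxctl volume verify-checksum pvc-0a1b2c3d-example
```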
Early Access Features
- Portworx Shared RWX Block volumes for KubeVirt VMs
  Portworx now supports ReadWriteMany (RWX) raw block volumes for KubeVirt virtual machines (VMs), enabling high-performance, shared storage configurations that support live migration of VMs in OpenShift environments. For more information, see Manage Shared Block Device (RWX Block) for KubeVirt VMs. A PVC sketch follows this list.
- Enhanced capacity management by provisioning custom storage pools
  Portworx now enables provisioning of storage pools during and after Portworx installation, enhancing the management of storage capacity. For more information, see Provision storage pool.
- Journal IO support for PX-StoreV2
  Portworx now supports journal device setup and journal IO profile volumes for PX-StoreV2. For more information, see Add a journal device.
- Support for multiple connections on the same NIC interface or bonded NIC using LACP
  Portworx enables the use of multiple connections on the same NIC interface or bonded NIC interfaces using LACP to enhance performance, as data traffic can be distributed across multiple links. For more information, see Configure multiple NICs with LACP NIC Bonding.
- Pool drain
  Portworx now supports moving volume replicas between storage pools using the pool drain operation. For more information, see Move volumes using pool drain.
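A minimal sketch of a shared raw block PVC for a KubeVirt VM disk, assuming a Portworx-backed StorageClass named px-rwx-block exists; the claim name, StorageClass name, and size are placeholders, and the exact StorageClass parameters come from the linked documentation.

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubevirt-shared-disk        # hypothetical name
spec:
  storageClassName: px-rwx-block    # placeholder Portworx StorageClass
  accessModes:
    - ReadWriteMany                 # RWX, so the disk stays accessible during live migration
  volumeMode: Block                 # raw block device, no filesystem layer
  resources:
    requests:
      storage: 50Gi
EOF
```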
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-43808 | Pure FA Cloud Drives attach backend volumes to hosts on FlashArray using the hostnames retrieved from the NODE_NAME container environment variable, which is specified by the Portworx spec.nodeName field. This might lead to hostname collisions across clusters. User Impact: Backend volumes might mount to hosts of other clusters. Resolution: The Pure FlashArray CloudDrive feature now sets the Purity hostname using a combination of the hostname and NODE_UID , limited to a maximum of 63 characters. This prevents hostname collisions and ensures that backend volumes are mounted only on the correct hosts. This also allows easier mapping back to the original host in the FlashArray user interface (UI) and logs.Components: Drive & Pool Management Affected Versions: 3.2.2 and earlier | Minor |
PWX-43472 | When a storage node fails, storageless nodes would repeatedly attempt cloud drive failover. Each attempt opened a new connection to kvdb /etcd but did not close it.User Impact: Open connections might eventually exhaust available file descriptors, making etcd non-responsive to new connections. Resolution: Connections opened for kvdb health checks during failover attempts are properly closed, preventing resource exhaustion and maintaining etcd responsiveness. Components: Control Plane, KVDB Affected Versions: 3.2.2.1 | Minor |
PWX-41940 | Portworx telemetry did not collect kubelet logs from cluster nodes. Only Portworx logs were available for troubleshooting. User Impact: Without kubelet logs, diagnosing cluster-level Kubernetes issues (such as pod crashes, evictions, or node failures) was slower and less effective, impeding root cause analysis and consistent monitoring across environments. Resolution: Telemetry-enabled clusters now periodically send filtered kubelet logs, which provides more complete telemetry for debugging and alerting. Components: Telemetry and Monitoring Affected Versions: 3.3.0 | Minor |
PWX-36280 | Portworx did not display kube-scheduler, kube-controller-manager, and pause image details in the /version endpoint output.User Impact: Without image details, it is difficult to obtain complete component version information when querying the manifest or automating image checks using the curl command. Resolution: The /version endpoint now includes kube-scheduler, kube-controller-manager, pause, and other relevant images in its version manifest, addressing the need for comprehensive version reporting via standard API calls.Components: Install & Uninstall, Operator Affected Versions: All | Minor |
PWX-32328 | Sometimes, Portworx propagated volume metrics to Prometheus from the wrong node, and in some cases, metrics for deleted volumes were reported as active. User Impact: A single volume appears as attached to two different nodes, resulting in false alerts about the actual state of storage volumes in Prometheus. Resolution: Volume-related metrics are now emitted only by the node where the volume is actually attached. Components: Telemetry and Monitoring Affected Versions: All | Minor |
PWX-27968 | Volume replicas might be placed incorrectly when a volume's VPS volume-affinity or volume anti-affinity rule contains multiple match expressions. The provisioner could treat a partial match as a full match, mistakenly selecting or deselecting certain pools during volume provisioning (a sketch of such a rule follows this table). User Impact: When users create new volumes or add replicas to an existing volume using a VPS rule with multiple match expressions, replicas might be placed on unwanted nodes (volume-affinity scenario) or provisioning might fail (volume anti-affinity scenario). Resolution: The provisioning algorithm now always evaluates VPS volume rules on a per-volume basis, avoiding confusion between partial and full matches. Recommendation: There is no need to modify VPS rules or storage classes. With the new version, new volumes are placed correctly according to the VPS rules; however, incorrectly placed existing volumes still require a manual fix (move replicas using the pxctl command). Components: Volume Placement & Balancing Affected Versions: 3.3.x | Minor |
PWX-39098 | Abrupt pod shutdown or deletion in rare scenarios might leave behind (retain) device mappings. User Impact: New pods attempting to use the volume become stuck in the ContainerCreating phase due to incomplete cleanup of the device mappings.Resolution: The fix adds additional methods to remove any retained device mappings and attempts to clean them up. If the cleanup is unsuccessful, an appropriate message is returned. Components: Shared volumes Affected Versions: 3.2.0 | Minor |
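The PWX-27968 fix above concerns VPS rules with multiple match expressions. A hedged sketch of such a rule is shown below; the apiVersion, label keys, and values are assumptions for illustration, so check the Volume Placement Strategies documentation for the exact schema.

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: portworx.io/v1beta2           # assumed CRD version
kind: VolumePlacementStrategy
metadata:
  name: db-volume-affinity                # hypothetical name
spec:
  volumeAffinity:
    - matchExpressions:                   # with this fix, all expressions are evaluated per volume
        - key: app                        # placeholder label key
          operator: In
          values:
            - postgres
        - key: tier                       # placeholder label key
          operator: In
          values:
            - database
EOF
```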
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PWX-43720 | In Portworx Enterprise version 3.3.0 or later, when used with operator version 25.2.0 or later, the
Components: Telemetry | Minor |
PWX-43275 | When using FlashArray Direct Access (FADA) volumes in ActiveCluster mode, if a volume is deleted while one of the backend arrays is unavailable, orphan volumes may remain on the FlashArray that was down. This issue does not affect application workloads directly, but manual cleanup might be required to identify and remove orphaned volumes.
Components: Volume Management | Minor |
PWX-44473 | KubeVirt virtual machines using Portworx on IPv6 clusters with SharedV4 volumes may enter a Workaround: Configure the
| Minor |
PWX-44223 | A storage-less node may fail to pick up the drive-set from a deleted storage node if the API token used for FlashArray authentication has expired and the node does not automatically retrieve the updated token from the Kubernetes secret. As a result, the storage-less node is unable to log in to the FlashArray and fails to initiate drive-set failover after the storage node is deleted.
Components: Drive & Pool Management | Minor |
PWX-44623 | When provisioning virtual machines (VMs) on FlashArray using KubeVirt, the VM might remain in the Provisioning state if the underlying PersistentVolumeClaim (PVC) fails with the error
The VM will attempt to provision a new PVC automatically. Components: Volume Management | Minor |
PWX-43060 | If a FlashArray Direct Access (FADA) multipath device becomes unavailable after the volume has already been marked as
Components: Volume Management | Minor |
PWX-43212 | Some VMs might remain stuck in the Shutting Down state after a FlashArray (FA) failover, especially when nodes are overpopulated with VMs. This is a known occurrence related to VM density and node resource allocation. Resolution: Monitor the number of VMs assigned to each node and plan resource allocation across the cluster to reduce the risk. Components: Volume Management | Minor |
PWX-44486 | If the coordinator node of an RWX volume (i.e. the node where the volume is currently attached) is placed into Maintenance mode, application pods using the volume might temporarily experience I/O disruption and encounter Input/Output errors. Resolution:
To prevent this issue, before putting a node into Maintenance mode, check if any volumes (especially RWX volumes) are attached to it. If they are attached, restart Portworx on the node first by running Components: Storage | Minor |
3.2.3
May 13, 2025
To install or upgrade Portworx Enterprise to version 3.2.3, ensure that you are running one of the supported kernels and all prerequisites are met.
New Features
- FlashArray Direct Access shared raw block (RWX) volumes
Portworx now supports FADA shared raw block (RWX) volumes, enabling live migration of KubeVirt VMs with high-performance raw block storage. This eliminates filesystem overhead, improves I/O performance, and ensures seamless migration by allowing simultaneous volume access on source and destination nodes. For more information, see Run KubeVirt VMs with FlashArray Direct Access shared raw block (RWX) volumes.
Note: This release also addresses security vulnerabilities.
Improvements
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-42785 | FlashArray Fibre Channel integration now filters out WWNs from uncabled ports when creating hosts. This enhancement reduces manual intervention and prevents errors during volume attachment in environments with partially connected FC ports. | Volume Management |
PWX-43645 | Portworx now supports setting the sticky bit for FlashArray Direct Access (FADA) volumes. You can set the sticky bit using the --sticky flag with the pxctl volume update command (see the example after this table). | Volume Management |
PWX-42482 | A backoff mechanism now limits repeated calls, reducing kube-apiserver load. Unnecessary LIST /api/v1/nodes API calls from nodes in the NotReady state or with px/enabled=false are reduced, which improves efficiency. | API |
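A hedged example of the sticky-bit flag mentioned in PWX-43645; the volume name is a placeholder, and the on/off argument form is an assumption, so verify the exact syntax with pxctl volume update --help.

```sh
# Mark a FADA volume as sticky so it cannot be deleted until the bit is cleared (placeholder name).
pxctl volume update --sticky on fada-vol-1
```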
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-42489 | Portworx made repeated GET requests to /api/v1/namespaces/portworx/configmaps/px-telemetry-phonehome from nodes that were either NotReady or had px/enabled=false .User Impact: These API calls were unnecessary and added load to the Kubernetes API server, particularly in clusters with many storageless or inactive nodes. Resolution: Portworx startup previously made unconditional API calls to fetch telemetry configuration. This has been fixed by updating the sequence to first check local sources before querying the Kubernetes API. Components: Telemetry and Monitoring Affected Versions: 3.2.2.2 or earlier | Minor |
PWX-43598 | Repl-add could select a replica node that doesn't comply with the volume's volume-affinity VPS rule when no valid pools are available. If a KubeVirt VPS fixer job is running, it may enter a loop of repeated repl-add and repl-remove operations on the same volume without resolving the placement issue. User Impact: This may lead to incorrect replica placement and violation of affinity rules. The VPS fixer job can create unnecessary load by repeatedly attempting to correct the placement. Resolution: Portworx now evaluates additional conditions before allowing fallback to relaxed volume-affinity placement. Relaxed-mode is applied only when no nodes are available that meet the required affinity criteria, ensuring more consistent replica alignment. Components: Volume Placement and Balancing Affected Versions: 3.2.2.2 and earlier | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PWX-43463 | In OpenShift virtualization, after you restart the entire Kubernetes cluster, virtual machines remain in the Workaround:
| Minor |
PWX-43486 | When using FlashArray Direct Access (FADA) shared block volumes, virtual machines (VMs) might temporarily stop during live migration if the primary FlashArray (FA) controller reboots while the secondary controller is unavailable. This occurs because I/O paths are unavailable, causing I/O errors that pause the VM. Workaround:
| Minor |
PWX-42358 | On RHEL 8.10 systems, running Linux kernel 4.18, Workaround: To resolve this issue, restart the Portworx service or manually recreate the missing cgroup directories by running the following commands:
| Minor |
PWX-43849 | Portworx does not support Debian 11 with kernel version 5.10.0-34-amd64 for PX-StoreV1 due to a known issue, and we recommend using Debian 12 with kernel version | Minor |
3.2.2.2
April 17, 2025
Portworx now supports IPv6 clusters for OpenShift with KubeVirt in dual-stack networking mode. To install or upgrade Portworx Enterprise to version 3.2.2.2, ensure that you are running one of the supported kernels and all prerequisites are met.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-42915 | In clusters where IPv6 is preferred (PX_PREFER_IPV6_NETWORK_IP=true ), Sharedv4 volume mounts may fail if Portworx selects an incorrect IPv6 address. This causes pods to remain in the ContainerCreating state with a "permission denied" error from the server.User Impact: Pods using Sharedv4 volumes may fail to start in IPv6-preferred or dual-stack clusters. This does not affect clusters using IPv4 by default. Resolution: Portworx now uses a consistent strategy to select the most appropriate IPv6 address:
Components: Shared Volumes Affected Versions: 3.2.2.1 or earlier | Minor |
PWX-42843 | When Portworx was deployed in a dual-stack (IPv4 and IPv6) Kubernetes cluster, it created a sharedv4 Kubernetes Service without explicitly specifying the ipFamily field. If ipFamily wasn't set, Kubernetes created an IPv4 address by default, while Portworx was listening on an IPv6 address.User Impact: Pods using sharedv4 service volumes failed to start because sharedv4 volume mounts couldn't complete using the IPv4-based Kubernetes Service IP address. Resolution: Portworx now explicitly sets the ipFamily field on the sharedv4 Kubernetes Service based on the IP address it uses in a dual-stack Kubernetes cluster.Components: Shared Volumes Affected Versions: 3.2.2.1 or earlier | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PWX-43355 | KubeVirt virtual machines using Portworx on IPv6 clusters with SharedV4 volumes may enter a Workaround: Configure the
| Minor |
3.2.2.1
March 26, 2025
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-42778 | Processing unaligned write requests on volumes might cause partial data transfer issues. User Impact: When processing large write requests (e.g., 1MB), unaligned blocks at the start of a request might lead to partial data transfers. This occurs when the available space in the user iovecs runs out before the last portion of the data is copied.Note: This issue occurs only when using the virtio driver in a KubeVirt deployment.Resolution: Improved handling of unaligned requests prevents premature exhaustion of user iovecs and ensures that all data is copied for large write operations. Components: Storage Affected Versions: 3.2.2 | Minor |
3.2.2
March 10, 2025
The Portworx Essentials license was discontinued on February 10, 2025. Starting with version 3.2.2, no images are released for Portworx Essentials.
To install or upgrade Portworx Enterprise to version 3.2.2, ensure that you are running one of the supported kernels and that the prerequisites are met.
New Features
- Encryption support for FlashArray Direct Access (FADA)
  Portworx now supports FADA volume encryption, providing seamless data protection by encrypting information both in transit and at rest on FlashArray storage. Encryption keys are used consistently across the cluster, even with multiple FlashArrays. This feature ensures that data remains secure throughout the process, with encryption handled at the storage level. For more information, see Create encrypted PVCs in FlashArray. A StorageClass sketch follows this list.
- NVMe-oF/TCP support for FlashArray Direct Access (FADA)
  Portworx now supports the NVMe-oF/TCP protocol, providing high-performance, low-latency storage access for Kubernetes applications using FlashArray LUNs. By leveraging standard TCP/IP, this feature eliminates the need for specialized networking hardware like RoCEv2, making deployment more flexible and cost-effective while maintaining optimal performance. For more information, see Set up NVMe-oF TCP protocol with FlashArray.
- PX-StoreV2 support on additional platforms
  Portworx now supports installation with PX-StoreV2 on the following platforms:
- Portworx Enterprise now supports Kubernetes version 1.31, starting from version 1.31.6. Before upgrading Kubernetes to 1.31.6 or later, update the Portworx Operator to version 24.2.3. For more details, refer to the Portworx Operator 24.2.3 release notes.
- The logUploader utility is now hosted in the portworx/log-upload repository. Please update your image repository mirrors to pull from this new location.
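A hedged StorageClass sketch for an encrypted FlashArray Direct Access volume, referenced from the encryption item above; it assumes the pure_block backend parameter and the generic Portworx secure parameter apply to FADA encryption, so confirm the exact parameters in the Create encrypted PVCs in FlashArray documentation.

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-fada-encrypted       # hypothetical name
provisioner: pxd.portworx.com
parameters:
  backend: "pure_block"         # assumed FlashArray Direct Access backend selector
  secure: "true"                # assumed encryption parameter; keys come from the configured secret provider
EOF
```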
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-41668 | In environments with slow container runtimes, the Portworx pod could report READY=0/1 (Not Ready) even when the backend was fully operational. This occurred due to internal readiness checks failing to update in rare cases. User Impact: The pod might appear as Not Ready, causing confusion in monitoring. Resolution: The readiness check logic has been fixed, ensuring the pod transitions correctly to READY=1/1 when the backend is operational. Components: Monitoring Affected Versions: 3.2.1.2 or earlier | Minor |
PWX-40755 | When Portworx is configured with separate data and management interfaces, some KubeVirt VMs may enter a paused state during platform or Portworx upgrades. User Impact: During upgrades, certain KubeVirt VMs may pause unexpectedly and require manual intervention to restart. Resolution: The issue has been fixed, ensuring KubeVirt VMs remain operational during Portworx and platform upgrades, without requiring manual restarts. Components: Upgrade Affected Versions: 3.2.0, 3.2.1, 3.2.1.1, and 3.2.1.2 | Minor |
PWX-40564 | Pool expansion could fail if the backend FA volume was expanded but the updated size was not reflected on the node. User Impact: If a drive was expanded only in the backend and a pool expansion was attempted with a size smaller than the backend size, the operation would fail. Resolution: Pool expansion now correctly retrieves and updates the drive size from the backend, preventing failures caused by size mismatches. Components: Drive and Pool Management Affected Versions: All | Minor |
PWX-39322 | Cloud drive lock contention during the startup of an affected node could cause inconsistencies in the internal KVDB, potentially triggering a panic in other PX nodes. User Impact: In large clusters, where lock contention is more likely, this issue could significantly extend the Portworx cluster restore process. Resolution: If an inconsistency is detected when the affected node starts, it now performs a cleanup to resolve the issue, preventing other nodes from panicking. Components: Drive and Pool Management Affected Versions: All | Minor |
PWX-40423 | Decommissioning a non-KVDB storage node did not automatically delete the associated drives from the FA backend. User Impact: Users had to manually remove drives from the FA backend after decommissioning a node. Resolution: The decommission process has been updated to ensure that backend devices are deleted automatically when the node wipe is completed. Components: Drive and Pool Management Affected Versions: All | Minor |
PWX-41685 | The PVC label template in VPS did not recognize incoming label keys containing multiple segments (dots). As a result, the template was not replaced with the label value, leading to unintended VPS behavior. User Impact: Users utilizing PVC label templates with multi-segment PVC labels experienced incorrect VPS functionality. Resolution: Updated the pattern matching for PVC label templates to support multi-segment label keys, ensuring correct label value replacement. Components: Volume Placement and Balancing Affected Versions: All | Minor |
PWX-40364 | When volume IDs had varying lengths (as expected), the defrag schedule occasionally failed to resume from the correct position after pausing. Instead, it restarted from the beginning, preventing the completion of a full iteration. User Impact: The built-in defrag schedule was unable to iterate through all volumes, rendering it ineffective in addressing performance issues. Users had to revert to using a defrag script. Resolution: The built-in defrag schedule now correctly resumes from the last stopped position and iterates through all volumes as expected. Components: KVDB Affected Versions: 3.2.0 and 3.2.1 | Minor |
PWX-37613 | If a pool expansion failed after cloud drives were expanded but before the pool was updated, attempting a subsequent expansion with a smaller size resulted in an error. User Impact: Users could experience a pool expansion failure if a previous expansion was interrupted and left unfinished, and they attempted another expansion of a smaller size. Resolution: The second pool expansion request now detects and completes the previously interrupted expansion instead of failing. Components: Drive and Pool Management Affected Versions: 3.1.2 to 3.2.1 | Minor |
PWX-38702 | In certain failover scenarios, mounting a shared file system could fail with an "Exists" or "file exists" error. This issue occurs due to an unclean unmount when the file system was last mounted on the same node. User Impact: This might result in user pods remaining in “Container Creating” state. Resolution: The fix addresses multiple underlying causes that lead to unclean unmounts. Additionally, since this issue can also arise due to a race condition in the Linux kernel, the fix now detects such scenarios, aborts the mount process, and provides a clear error message. Components: Shared Volumes Affected Versions: 3.2.0 to 3.2.1.2 | Minor |
PWX-42043 | The CLI command pxctl cred list [-j] returns an error and fails to list credentials.User Impact: If the cluster contains non-S3 credentials, the pxctl cred list [-j] command will not display the credentials.Resolution: The command now correctly lists all credentials, including non-S3 credentials, without errors. Components: CLI and API Affected Versions: 3.1.8, 3.2.1.2 | Minor |
Known issues (Errata)
- PWX-42379: On PX-Security enabled clusters running Kubernetes 1.31 or later, expanding an in-tree PersistentVolumeClaim (PVC) fails due to compatibility issues. This prevents users from increasing storage capacity through standard PVC expansion methods, potentially impacting workloads that require additional storage.
  Workaround: Until this issue is resolved in a future external-resizer sidecar release from the upstream Kubernetes community, users can manually expand the volume using pxctl volume update --size <new-size> <volume-name> instead of updating the PVC size.
  Components: Volume Management
  Affected Versions: 3.2.1.1 or later
  Severity: Minor
- PWX-42513: When you deploy more than 100 apps with FlashArray Direct Access (FADA) PVCs using NVMe-oF/TCP at the same time, volumes are created in the backend. However, the attempt to attach hosts to the volume in the Portworx layer sometimes fails, leaving device mappers on the hosts with no available paths. Because the mapper device is created, Portworx attempts to create a filesystem but hangs due to the missing paths. Additionally, PVC creations can get stuck in the ContainerCreating state. The large number of multipath FADA volumes increases the time required for newer FADA volumes' multipath devices to appear, causing Portworx to enter an error state.
  Note: We recommend creating FADA volumes in batches with a significant interval between each batch.
  Workaround: To recover from this state, perform the following steps:
  1. Identify the affected device:
     multipath -ll
     eui.00806e28521374ac24a9371800023155 dm-34 ##,##
     size=50G features='1 queue_if_no_path' hwhandler='0' wp=rw
  2. Disable queueing for the affected device:
     multipathd disablequeueing map eui.00806e28521374ac24a9371800023155
  3. Flush the multipath device:
     multipath -f eui.00806e28521374ac24a9371800023155
  4. Verify that the device has been removed:
     multipath -ll eui.00806e28521374ac24a9371800023155
  5. Reattach the volume manually from the FA controller to the host (worker node).
  6. Confirm that the device is correctly reattached and that paths are available:
     multipath -ll eui.00806e28521374ac24a9371800023155
     eui.00806e28521374ac24a9371800023155 dm-34 NVME,Pure Storage FlashArray
     size=50G features='4 queue_if_no_path retain_attached_hw_handler queue_mode bio' hwhandler='0' wp=rw
     `-+- policy='queue-length 0' prio=50 status=active
       |- 1:245:24:544 nvme1n24 259:68 active ready running
       `- 0:1008:24:544 nvme0n24 259:71 active ready running
  7. Confirm that no Portworx processes are in an uninterruptible sleep state (D state) using the following command:
     ps aux | grep " D "
  Components: Volume Management
  Affected Versions: 3.2.2
  Severity: Minor
3.2.1.2
February 04, 2025
New Features
Portworx Enterprise now supports the following:
- Installation of Portworx with PX-StoreV2 on Rancher clusters running on Ubuntu or SUSE Linux Micro. For hardware and software requirements, see Prerequisites.
- Rancher clusters on SUSE Linux Micro. For a list of supported distributions and kernel versions, see Qualified Distros and Kernel Versions.
Note: This release also addresses security vulnerabilities.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-41663 | If Kubernetes clusters contain FlashBlade volumes migrated from Pure Storage Orchestrator (PSO) clusters, the Portworx process on these systems enters a continuous crash loop, preventing normal volume operations. User Impact: Portworx repeatedly crashes and restarts, preventing normal cluster operation. Resolution: This issue has been resolved. Portworx no longer crashes in environments with FlashBlade volumes migrated from PSO clusters. Components: Upgrade Affected Versions: 3.2.1, 3.2.1.1 | Major |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-3880 | On systems with automatic updates enabled, the system may upgrade to a kernel version that is not listed on the supported kernel page. This can prevent the Portworx kernel module from loading, resulting in a failed Portworx installation. Workaround: Disable automatic updates and verify that you are using a supported kernel version. Components: Installation Affected Versions: 3.2.x, 3.1.x, 3.0.x | Major |
3.2.1.1
December 17, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-40233 | The volume snapshot count with Portworx CSI for FlashArray and FlashBlade license has been increased from 5 to 64 | Licensing & Metering |
PWX-37757 | The Pure export rules for accessing FlashBlade are now defined by the specified accessModes in the PVC specification.
| Volume Management |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38838 | During asynchronous disaster recovery, delete requests for objects in the object-store were being rejected when the schedule contained more than 16 volumes. User Impact: Due to this issue, users saw some of the asynchronous disaster recovery relayed objects were not being cleaned up from object-store. Resolution: The system has been updated to accept up to 64 delete requests. This change prevents objects from being retained in the object-store when the schedule includes more than 16 volumes but fewer than 64. Components: Migration Affected Versions: 3.2.x, 3.1.x, 3.0.x | Major |
PWX-40477 | Portworx cluster failed to migrate from using external KVDB to internal KVDB. User Impact: Due to this issue, users were unable to migrate their Portworx clusters from external KVDB to internal KVDB, disrupting operations that rely on the internal KVDB for managing the cluster's state and configuration. Resolution: Portworx clusters can now be successfully migrated from external KVDB to internal KVDB. For instructions, contact the Portworx support team. Components: KVDB Affected Versions: 3.2.0, 2.13.12 | Major |
3.2.1
December 2, 2024
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx now supports the PX-StoreV2 backend on the following platforms
3.2.0
October 31, 2024
Visit these pages to see if you're ready to upgrade to this version:
Portworx 3.2.0 requires Portworx Operator 24.1.3 or newer.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Secure multi-tenancy with Pure FlashArray
  When a single FlashArray is shared among multiple users, administrators can use realms to allocate storage resources to each tenant within isolated environments. Realms set boundaries, allowing administrators to define custom policies for each tenant. When a realm is specified, the user must provide a FlashArray pod name where Portworx will create all volumes (direct access or cloud drives) within that realm. This ensures that each tenant can only see their own storage volumes when logged into the array. A StorageClass sketch follows this list.
- Support for VMware Storage vMotion
  Portworx now supports the Storage vMotion feature of VMware, enabling vSphere cloud drives to be moved from one datastore to another without any downtime.
- Defragmentation schedules
  Users can now set up defragmentation schedules using pxctl commands during periods of low workload to improve the performance of Portworx.
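A hedged StorageClass sketch for the multi-tenancy feature above; pure_fa_pod_name is the parameter referenced elsewhere in these notes (PD-3496), while the backend parameter, provisioner, and pod name shown are assumptions to be confirmed against the FlashArray Direct Access documentation.

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fada-tenant-a                # hypothetical name
provisioner: pxd.portworx.com
parameters:
  backend: "pure_block"              # assumed FlashArray Direct Access backend selector
  pure_fa_pod_name: "tenant-a-pod"   # FlashArray pod inside the tenant's realm (placeholder)
EOF
```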
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-35876 | For IBM customers, Portworx now supports the StorageClass with the encryption flag set to true. | Marketplaces |
PWX-38395 | Previously, all storageless nodes would restart to claim a driveset when a storage node went down and its driveset was detached in the same zone. With this improvement, only one storageless node will claim ownership of the driveset and restart, while the other storageless nodes remain unaffected and do not restart. | Drive & Pool Management |
PWX-33561 | For partially attached drivesets, Portworx now detaches the driveset only when cloud drives are not mounted, avoiding unnecessary detachment when a mount is present. | Drive & Pool Management |
PWX-37403 | FlashArray now allows specifying multiple management ports for the same FlashArray. If customers are on a VLAN connection to FlashArray, the virtual IP address might encounter issues. Customers can specify the management IPs of the controllers directly in the secret as comma-separated values. | Drive & Pool Management |
PWX-38597 | For FlashArray Cloud Drives, on Portworx restart, any stale entries of the driveset are cleaned, and the locally attached driveset is prioritized for mounting volumes rather than checking all other drives. | Drive & Pool Management |
PWX-39131 | The total number of GET API calls has been reduced significantly. | Drive & Pool Management |
PWX-38551 | The latency of any operation on FlashArray due to multiple API calls has been reduced. Portworx now uses the FlashArray IDs stored in the cloud drive config map to limit API calls only to the FlashArray where the drive resides. | Drive & Pool Management |
PWX-37864 | When you add a drive using the pool expand add-drive operation, the config map is now automatically updated with the pool ID of the newly added drive, preventing the need for a Portworx restart. | Drive & Pool Management |
PWX-38630 | Portworx now supports adding a cloud drive to a storageless node when the cloud drive specification for the journal device in the StorageCluster spec is explicitly set to a value other than auto . | Drive & Pool Management |
PWX-38074 | Improved the startup timing of Portworx nodes in multi-FlashArray setups by handling metrics timeouts more effectively. When volume creation on a FlashArray takes too long, Portworx now avoids sending further requests to that FlashArray for 15 minutes, allowing other nodes to continue the startup process without delays. | Drive & Pool Management |
PWX-38644 | For FlashArray Cloud Drives, pool expansion failure messages are no longer overridden by maintenance mode messages, providing more useful error information for users to debug their environment. | Drive & Pool Management |
PWX-33042 | In disaggregated environments, users cannot add drives to a storageless node labeled as portworx.io/node-type=storageless. To add drives, users need to change the node label to portworx.io/node-type=storage and restart Portworx (see the example after this table). | Drive & Pool Management |
PWX-38169 | During pool expansion, Portworx now checks the specific driveset that the node records, rather than iterating through all drivesets in the cluster randomly. This change significantly reduces the number of API calls made to the backend, thereby decreasing the time required for pool expansion and minimizing the risk of failure, particularly in large clusters. | Drive & Pool Management |
PWX-38691 | Portworx now raises an alert called ArrayLoginFailed when it fails to log into a FlashArray using the provided credentials. The alert includes a message listing the arrays where the login is failing. | Drive & Pool Management |
PWX-37672 | The pxctl cd i --<node-ID> command now displays the IOPS set during disk creation | Drive & Pool Management |
PWX-37439 | Azure users can now specify IOPS and throughput parameters for Ultra Disk and Premium v2 disks. These parameters can only be set during the installation process. | Drive & Pool Management |
PWX-38397 | Portworx now exposes NFS proc FS pool stats as Prometheus metrics. Metrics to track the number of Packets Arrived , Sockets Enqueued , Threads Woken , and Threads Timedout have been added. | Shared Volumes |
PWX-35278 | A cache for the NFS and Mountd ports has been added, so the system no longer needs to look up the ports every time. The GetPort function is only called the first time during the creation or update of the port, and the cache updates if accessed 15 minutes after the previous call. | Shared Volumes |
PWX-33580 | The NFS unmount process has been improved by adding a timeout for the stat command, preventing it from getting stuck when the NFS server is offline and allowing retries without hanging. | Shared volumes |
PWX-38180 | Users can now set the QPS and Burst rate to configure the rate at which API requests are made to the Kubernetes API server. This ensures that the failover of the sharedv4 service in a scaled setup is successful, even if another operation causes an error and restarts some application pods. To do this, add the following environment variables:
| Shared Volumes |
PWX-39035 | Portworx will no longer print the Last Attached field in the CLI's volume inspect output if the volume has never been attached. | Volume Management |
PWX-39373 | For FlashArray Direct Access volumes, the token timeout has been increased from 15 minutes to 5 hours, which provides enough time for Portworx to process a large number of API token requests. | Volume Management |
PWX-39302 | For Portworx CSI volumes, calls to the Kubernetes API to inspect a PVC have been significantly reduced, improving performance. | Volume Management |
PWX-37798 | Users can now remove labels from a Portworx volume using the pxctl volume update -l command, allowing them to manually assign pre-provisioned Portworx volumes to a pod. | Volume Management |
PWX-38585 | FlashArray Direct Access users can now clone volumes using pxctl . | Volume Management |
PWX-35300 | Improved FlashBlade Direct Access volume creation performance by removing an internal lock, which previously caused delays during parallel creation processes. | Volume Management |
PWX-37910 | Cloudsnaps are now initialized using a snapshot of the KVDB, avoiding failure errors. | Storage |
PWX-35130 | Portworx now sends an error message and exits the retry loop when a volume is stuck in a pending state, preventing continuous creation attempts. | Storage |
PWX-35769 | Storageless nodes now remain in maintenance mode without being decommissioned, even if they exceed the auto-decommission timeout. This prevents failure for user-triggered operations when the storageless node is in maintenance mode. | Control Plane |
PWX-39540 | Portworx now ensures that the correct information for a Pure volume is returned, even if the FlashArray returns faulty data, preventing node crashes. | Control Plane |
PWX-37765 | The pxctl volume list command has been improved to allow the use of the --pool-uid flag alongside the --trashcan flag, enabling the filtering of trashcan volumes based on the specified Pool UUID. | CLI & API |
PWX-37722 | Added a new --pool-uid flag to the pxctl clouddrive inspect command, allowing users to filter the inspect output based on the specified Pool UUID. | CLI & API |
PWX-30622 | The output of the pxctl volume inspect <volume-id> command now displays the labels alphabetically, making it easier to track any changes made to labels. | CLI & API |
PWX-39146 | The pxctl status output also includes a timestamp indicating when the information was collected. | CLI & API |
PWX-36245 | PX-StoreV2 pools now support a maximum capacity of 480TB by choosing appropriate chunk size during pool creation. | PX-StoreV2 |
PWX-39059 | Portworx now installs successfully on cGroupsV2 and Docker Container runtime environments. | Install & Uninstall |
PWX-37195 | Portworx now automatically detects SELinux-related issues during installation and attempts to resolve them, ensuring a smoother installation process on SELinux-enabled platforms. | Install & Uninstall |
PWX-38848 | Portworx now properly handles the floating license-lease updates, when cloud-drives move between the nodes. | Licensing & Metering |
PWX-38694 | Improved the time to bring up a large cluster by removing a short-lived cluster lock used in cloud drive deployments. | KVDB |
PWX-38577 | The logic for handling KVDB nodes when out of quorum has been improved in Portworx. Now, Portworx processes do not restart when KVDB nodes are down. | KVDB |
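A hedged example of the relabeling step described in PWX-33042 above; the node name is a placeholder, and the namespace and pod label used to restart Portworx are assumptions that depend on how Portworx was installed.

```sh
# Change the node's role label from storageless to storage (placeholder node name).
kubectl label node worker-node-1 portworx.io/node-type=storage --overwrite

# Restart Portworx on that node, for example by deleting its Portworx pod
# (namespace and label selector are assumptions for a typical Operator install).
kubectl -n portworx delete pod -l name=portworx --field-selector spec.nodeName=worker-node-1
```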
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38609 | Portworx sometimes lost the driveset lock for FlashArray cloud drives when the KVDB drive was removed in situations such as KVDB failover. User Impact: Loss of the driveset lock resulted in other nodes attempting to attach a drive already attached to the current node. Resolution: Portworx now uses a temporary driveset to safely remove the KVDB drive. Components: KVDB Affected Versions: 3.1.5 | Critical |
PWX-38721 | Portworx attempted to mount FlashBlade Direct Access volumes using the NFS IP. However, if an existing mount point used an FQDN, Portworx defaulted to the FQDN after a restart. If a Kubernetes mount request timed out, but Portworx completed it successfully, Kubernetes retried the request. Portworx then returned an error due to the FQDN, leading to repeated mount attempts. User Impact: Application pods with a timed-out initial mount request were stuck in the ContainerCreating state. Resolution: Portworx now performs IP resolution on the existing mount entry. If they match, it confirms the mount paths are already created, and Portworx returns a success. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-38618 | In a cluster where multiple applications used the same FlashBlade Direct Access volume, some applications used FQDNs while others used IP addresses. The NFS server recognized only the FQDN, causing a mismatch in the mount source paths tracked by Portworx. User Impact: Application pods using IPs to mount the FlashBlade Direct Access volume were stuck in the terminating state. Resolution: When a request is received from CSI to unmount a target path for FlashBlade Direct Access, Portworx unconditionally unmounts it, even if the source path differs from the one recognized by it. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-38376 | During node initialization in the boot-up process, FlashArray properties are required for all the dev mapper paths already present on the node. This call is made to all arrays configured in pure.json configuration file, which sometimes failed, causing the initialization to fail.User Impact: Users saw node initialization failures due to errors from arrays that had no volumes for the current node. Additionally, unintended extra API calls were made to the arrays, contributing to the overall API load. Resolution: Portworx now uses the FlashArray volume serial to determine which array the volume belongs to. The array ID is then passed as a label selector to DeviceMappings, ensuring that only the relevant array is queried. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-36693 | When a storageless node transitioned to a storage node, the node's identity changed as it took over the storage node identity. The old identity corresponding to the storageless node was removed from the Portworx cluster. All volumes attached to the removed node were marked as detached, even if pods were currently running on the node. User Impact: Volumes incorrectly appeared as detached, even while pods were running and consuming the volumes. Resolution: Portworx now decommissions cloud drives only after the AutoDecommissionTimeout expires, ensuring that volumes remain attached to the node and are not incorrectly displayed as detached. Components: Volume Management Affected Versions: 3.1.1 | Critical |
PWX-38173 | When the storage node attempted to restart, it could not attach the previous driveset, as it was already claimed by another node, and could not start as a new node because the drives were still mounted. User Impact: The storage node attempting to come back online repeatedly restarted due to unmounted drive mount points. Resolution: Portworx now automatically unmounts FlashArray drive mount points if it detects that the previous driveset is unavailable but its mount points still exist. Component: Drive and Pool Management Affected Versions: 3.0.x, 3.1.x | Critical |
PWX-38862 | During Portworx upgrades, a sync call was triggered and became stuck on nodes when the underlying mounts were unhealthy. User Impact: Portworx upgrades were unsuccessful on nodes with unhealthy shared volume mounts. Resolution: Portworx has removed the sync call, ensuring that upgrades now complete successfully. Components: Drive & Pool Management Affected Versions: 3.1.x | Critical |
PWX-38936 | When a storage node restarted, it could restart several times before booting successfully because its driveset was locked and would not become available for a few minutes. User Impact: Users saw the Failed to take the lock on drive set error message, and the node took longer to restart. Resolution: Portworx now tells the restarting node that its driveset is not locked, so the node can claim the driveset without waiting for the lock to expire. During this time, other nodes still see the driveset as locked and unavailable. Components: Drive & Pool Management Affected Versions: 3.1.x | Major |
PWX-39627 | In large Portworx clusters with many storage nodes using FlashArray or FlashBlade as the backend, multiple nodes might simultaneously attempt to update the lock configmap, resulting in conflict errors from Kubernetes. User Impact: Although the nodes eventually resolved the conflicts, this issue spammed the logs and slowed down boot times, especially in large clusters. Resolution: The refresh interval has been changed from 20 seconds to 1 minute. In case of a conflict error, Portworx now delays the retry by a random interval between 1 and 2 seconds, reducing the likelihood of simultaneous updates. Additionally, the conflict is logged only after 10 consecutive occurrences, indicating a real issue. Components: Drive & Pool Management Affected Versions: 3.1.x, 3.0.x | Major |
PWX-36318 | In IBM Cloud, the node name is the same as the node IP. If the selected subnet had very few available IPs and Portworx replaced a worker node, the new node would take the same IP. User Impact: When Portworx started on the replaced node with the same IP, it incorrectly assumed that it had locally attached drives due to the volume attachments. This assumption led to an attempt to access the non-attached device path on the new node, causing Portworx to fail to start. Resolution: With the new provider-id annotation added to the volume attachment, Portworx now correctly identifies the replaced node as a new one without local attachments.Component: Drive and Pool Management Affected Versions: 3.1.x | Major |
PWX-38114 | In IBM Cloud, the node name is the same as the node IP. If the selected subnet had very few available IPs and a worker node was replaced, the new node had the same IP. User Impact: When Portworx started on the replaced node with the same IP, it incorrectly assumed it had locally attached drives due to existing volume attachments, leading to a stat call on the non-attached device path and causing Portworx to fail to start. Resolution: The volume attachment now includes a new annotation, provider-id , which is the unique provider ID of the node, allowing Portworx to recognize that the replaced node is new and has no local attachments.Component: Drive and Pool Management Affected Versions: 3.0.x, 3.1.x | Major |
PWX-37283 | A storageless node did not transition into a storage node after a restart if it initially became storageless due to infrastructure errors unrelated to Portworx. User Impact: These errors caused the node to have attached drives that it was unaware of, preventing the node from recognizing that it could use these drives during the transition process. Resolution: When a storageless node attempts to become a storage node, it checks for any attached drives that it previously did not recognize. Using this information, the storageless node can now correctly decide whether to restart and transition into a storage node. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x | Major |
PWX-38760 | On a node with existing FlashBlade volumes mounted via NFS using a DNS/FQDN endpoint, if Portworx received repeated requests to mount the same FlashBlade volume on the same mount path but using an IP address instead of the FQDN, Portworx returned an error for the repeated requests. User Impact: Pods were stuck in the ContainerCreating state. Resolution: Portworx has been updated to recognize and return success for such repeated requests when existing mount points are present. Components: Volume Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37614 | When a Portworx volume with volumeMode=Block was created from a StorageClass that also had fs or fsType specified, Portworx incorrectly attempted to format the raw block volume with the specified file system.User Impact: Users were unable to use a common StorageClass for creating both block and file volumes. Resolution: Portworx now allows the creation of raw block PVCs even if fs or fsType parameters are specified in the StorageClass.Components: Volume Management Affected Versions: 3.1.2 | Major |
PWX-37282 | HA-Add and HA-level recovery failed on volumes with volume-affinity VPS, as the volume-affinity VPS restricted pool provisioning to certain nodes. User Impact: Users experienced issues such as volumes losing HA after node decommission or HA-Add operations failing. Resolution: The restriction of volume-affinity VPS has been relaxed. Portworx now prioritizes pools that match VPS labels but will select secondary candidate pools under specific conditions, such as during HA increases and when the volume carries the specified VPS labels. This change does not affect VPS validity. Components: Storage Affected Versions: 3.1.x, 3.0.x | Major |
PWX-38539 | The Autopilot config triggered multiple rebalance audit operations for Portworx processes, which overloaded Portworx and resulted in process restarts. User Impact: Users saw alerts indicating Portworx process restarts. Resolution: Portworx now combines multiple rebalance audit triggers into a single execution, minimizing the load on Portworx processes and reducing the likelihood of restarts. Components: Storage Affected Versions: 3.1.2.1 | Major |
PWX-38681 | If there were any bad mounts on the host, volume inspect calls for FlashArray Direct Access volumes would take a long time, as df -h calls would hang.User Impact: Users experienced slowness when running pxctl volume inspect <volId> .Resolution: Portworx now extracts the FlashArray Direct Access volume dev mapper path and runs df -h only on that specific path.Components: CLI and API Affected Versions: 3.1.x, 3.0.x | Major |
PWX-37799 | Portworx restarted when creating a cloud backup due to a KVDB failure. User Impact: If a cloud backup occurred during a KVDB failure, Portworx would unexpectedly restart. Resolution: The nil pointer error causing the restart has been fixed. Now, Portworx raises an alert for backup failure instead of unexpectedly restarting. Components: Cloudsnaps Affected Versions: 3.1.x, 3.0.x | Major |
PWX-39080 | When the Kubernetes API server throttled Portworx requests, in certain scenarios, a background worker thread would hold a lock for an extended period, causing Portworx to assert and restart. User Impact: Portworx asserted and restarted unexpectedly. Resolution: The Kubernetes API calls from the background worker thread have been moved outside the lock's context to prevent the assert. Components: KVDB Affected Versions: 3.2.0 | Major |
PWX-37589 | When Azure users attempted to resize their drives, Portworx performed an online expansion for Azure drives, which did not align with Azure's recommendation to detach drives of 4 TB or smaller from the VM before expanding them. User Impact: Azure drives failed to resize and returned the following error: Message: failed to resize cloud drive to: 6144 due to: compute.DisksClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Disk of size 4096 GB (<=4096 GB) cannot be resized to 6144 GB (>4096 GB) while it is attached to a running VM. Please stop your VM or detach the disk and retry the operation Resolution: Portworx now detaches drives of 4 TB or smaller before performing pool expansion, instead of attempting online expansion. Components: Drive & Pool Management Affected Versions: 3.0.x,3.1.x | Minor |
PWX-36683 | Portworx failed to resolve the correct management IP of the cluster and contacted the Telemetry system using an incorrect IP/port combination. This issue caused the pxctl status command output to erroneously report Telemetry as Disabled or Degraded .User Impact: Telemetry would sometimes appear to be unhealthy even when it was functioning correctly. This could lead to confusion and misinterpretation of the system's health status. Resolution: The issue was resolved by fixing the logic that chooses the management IP, ensuring that Portworx correctly resolves the management IP of the cluster. This change prevents the system from using the wrong IP/port combination to contact the Telemetry system, thereby ensuring accurate reporting of Telemetry status. Components: Telemetry & Monitoring Affected Versions: 3.0.x, 3.1.x | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-3505 | EKS users may encounter issues installing Portworx on EKS version 1.30. This version requires the Amazon Linux 2023 (AL2023) kernel, which, in turn, enforces IMDSv2 by default. Workaround:
Affected Versions: 3.0.x, 3.1.x | Critical |
PD-3329 | Provisioning of KubeVirt VM fails if the bootOrder is not specified for the VM disks and the first disk is not a PVC or a DataVolume. Workaround: Specify the bootOrder in the VM spec or ensure that the first disk is a PVC or a DataVolume. Components: KVDB Affected Versions: 3.1.3 | Major |
PD-3324 | Portworx upgrades may fail with Unauthorized errors due to the service account token expiring when the Portworx pod terminates in certain Kubernetes versions. This causes API calls to fail, potentially leading to stuck Kubernetes upgrades. Workaround: Upgrade the Portworx Operator to version 24.2.0 or higher, which automatically issues a new token for Portworx. Components: Install & Uninstall Affected Versions: 3.1.1, 3.2.0 | Major |
PD-3412 | A Kubernetes pod can get stuck in the ContainerCreating state with the error message: MountVolume.SetUp failed for volume "<PV_NAME>" : rpc error: code = Unavailable desc = failed to attach volume: Volume: <VOL_ID> is attached on: <NODE_ID> , where NODE_ID is the Portworx NODE ID of the same node where the pod is trying to be created.Workaround: Restart the Portworx service on the impacted node. Components: Volume Management Affected Versions: 3.2.0 | Major |
PD-3408 | If you have configured IOPS and bandwidth for a FlashArray Direct Access volume, and that volume is snapshotted and later restored into a new volume, the original IOPS and bandwidth settings are not honored. Workaround: Manually set the IOPS and bandwidth directly on the FlashArray for the restored volume. Components: Volume Management Affected Versions: 3.1.4, 3.2.0 | Major |
PD-3434 | During node decommission, if a node is rebooted, it can enter a state where the node spec has been deleted, but the associated cloud drive has not been cleaned up. If this node is recommissioned, the Portworx reboot fails because both the previous and current drivesets are attached to the node. Workaround:
Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3409 | When a user creates a journal device as a dedicated cloud drive and creates the storage pool using the pxctl sv add-drive command, the cloud drives are not automatically deleted when the storage pool is deleted. Workaround: Manually remove the drives after deleting the pool. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3416 | When you change the zone or any labels on an existing Portworx storage node with cloud drives, Portworx may fail to start on that node. If the labels are changed, the driveset associated with the old zone might become orphaned, and a new storage driveset may be created. Workaround: To change topology labels on existing storage nodes, contact Portworx support for assistance. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3496 | For Portworx installations using FlashArray Direct Access without a Realm specified, if a user clones a volume that is inside a FlashArray pod to a new volume that is not in a FlashArray pod, the cloned volume appears to be bound but might not be attachable. Workaround: Include the parameter pure_fa_pod_name: "" in the StorageClass of the cloned volumes (see the example StorageClass after this table). Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3494 | In a vSphere local mode installation environment, users may encounter incorrect alerts stating that cloud drives were moved to a datastore lacking the expected prefix (for example, local-i) when performing Storage vMotion of VMDKs associated with specific VMs. Workaround: This alert can be safely ignored. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3365 | Running the drop_cache service on Portworx nodes can cause Portworx to fail to start due to known issues in the kernel. Workaround: Avoid running the drop_cache service on Portworx nodes. Components: Storage Affected Versions: 3.1.4, 3.2.0 | Minor |
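For PD-3496 above, the workaround can be captured in the StorageClass used for the cloned volumes. The sketch below is illustrative only; the StorageClass name and the backend parameter are assumptions for a typical FlashArray Direct Access setup, and only the pure_fa_pod_name parameter is the workaround itself.

```yaml
# Illustrative StorageClass for cloned FlashArray Direct Access volumes
# (PD-3496 workaround). The name and backend parameter are examples;
# only pure_fa_pod_name: "" is the workaround.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fada-clone-no-pod
provisioner: pxd.portworx.com
parameters:
  backend: "pure_block"
  pure_fa_pod_name: ""   # explicitly clear the FlashArray pod for cloned volumes
```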
3.1.8
January 28, 2025
Visit these pages to see if you're ready to upgrade to this version:
Note
This version addresses security vulnerabilities.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-41332 | When running Portworx in debug mode with FlashBlade, certain log entries displayed extraneous information under rare conditions. User Impact: Unwanted information appeared in the log entries. Resolution: Portworx has been enhanced to ensure that only relevant information is displayed. Affected Versions: 3.1.x, 3.0.x, 2.13.x Component: Volume Management | Major |
PWX-41329 PWX-41480 | When executing a few commands, extraneous information was displayed in their output. User Impact: Unwanted information appeared in the output of certain commands. Resolution: Portworx has been enhanced to ensure that only relevant information is displayed. Affected Versions: 3.1.x, 3.0.x, 2.13.x Component: CLI & API | Major |
3.1.7
December 3, 2024
Visit these pages to see if you're ready to upgrade to this version:
Note
This version addresses security vulnerabilities.
3.1.6.1
November 13, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-39990 | As part of node statistics collection, Portworx read the timestamp data stats while its storage component was updating them at the same time, leading to data conflicts. User Impact: The Portworx storage component restarted due to an invalid memory access issue. Resolution: A lock mechanism has been added to manage concurrent reads and writes to the timestamp data, preventing conflicts. Affected Versions: 3.1.0 Component: Storage | Critical |
3.1.6
October 02, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38930 | For PX-StoreV2 deployments with volumes that had a replica factor greater than 1 and were either remotely attached or not accessed through PX-Fast PVCs, if a power loss, kernel panic, or ungraceful node reboot occurred, the data was incorrectly marked as stable due to buffering in the underlying layers, despite being unstable. User Impact: In these rare situations, PVC data could be marked as stable even though it was not. Resolution: Portworx now correctly tracks data stability, preventing this problem. Components: PX-StoreV2 Affected Versions: 2.13.x, 3.0.x, 3.1.x | Critical |
3.1.5
September 19, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-38849 | For Sharedv4 volumes, users can now apply the disable_others=true label to limit the mountpoint and export path permissions to 0770, effectively removing access for other users and enhancing the security of the volumes (see the example StorageClass after this table). | Volume Management |
PWX-38791 | The FlashArray Cloud Drive volume driveset lock logic has been improved to ensure the driveset remains locked to its original node, which can otherwise detach due to a connection loss to the FlashArray during a reboot, preventing other nodes from claiming it:
| Drive & Pool Management |
PWX-38714 | During the DriveSet check, if device mapper devices are detected, Portworx cleans them before mounting FlashArray Cloud Drive volumes. This prevents mounting issues during failover operations on a FlashArray Cloud Drive volume. | Drive & Pool Management |
PWX-37642 | The logic for the sharedv4 mount option has been improved:
| Sharedv4 Volumes |
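For PWX-38849, one way the disable_others=true label might be applied is at provisioning time through a Portworx StorageClass, as sketched below. Whether you set the label in the StorageClass or directly on an existing volume is a deployment choice; the StorageClass name, repl value, and the use of the labels parameter here are assumptions to verify for your environment.

```yaml
# Hypothetical sketch: labeling sharedv4 volumes with disable_others=true at
# provisioning time. Verify the exact labeling mechanism for your setup.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sharedv4-restricted
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  sharedv4: "true"
  labels: "disable_others=true"   # limits mountpoint/export path permissions to 0770
```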
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36679 | Portworx could not perform read or write operations on Sharedv4 volumes if NFSD version 3 was disabled in /etc/nfs.conf .User Impact: Read or write operations failed on Sharedv4 volumes. Resolution: Portworx no longer depends on the specific enabled NFSD version and now only checks if the service is running. Components: Shared Volumes Affected Versions: 3.1.0 | Major |
PWX-38888 | In some cases, when a FlashArray Direct Access volume failed over between nodes, Portworx version 3.1.4 did not properly clean up the mount path for these volumes. User Impact: Application pods using FlashArray Direct Access volumes were stuck in the Terminating state.Resolution: Portworx now properly handles the cleanup of FlashArray Direct Access volume mount points during failover between nodes. Components: Volume Management Affected Versions: 3.1.4 | Minor |
3.1.4
August 15, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37590 | Users running on environments with multipath version 0.8.8 and using FlashArray devices, either as Direct Access Volumes or Cloud Drive Volumes, may have experienced issues with the multipath device not appearing in time. User Impact: Users saw Portworx installations or Volume creation operations fail. Resolution: Portworx is now capable of running on multipath version 0.8.8. Components: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
3.1.3
July 16, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-37576 | Portworx has significantly reduced the number of vSphere API calls during the booting process and pool expansion. | Drive & Pool Management |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37870 | When PX-Security is enabled on a cluster that is also using Vault for storing secrets, the in-tree provisioner (kubernetes.io/portworx-volume) fails to provision a volume. User Impact: PVCs became stuck in a Pending state with the following error: failed to get token: No Secret Data found for Secret ID . Resolution: Use the CSI provisioner (pxd.portworx.com) to provision the volumes on clusters that have PX-Security enabled. Components: Volume Management Affected Versions: 3.0.3, 3.1.2 | Major |
PWX-37799 | A KVDB failure sometimes caused Portworx to restart when creating cloud backups. User Impact: Users saw Portworx restart unexpectedly. Resolution: Portworx now raises an alert, notifying users of a backup failure instead of unexpectedly restarting. Components: Cloudsnaps Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37661 | If the credentials provided in px-vsphere-secret were invalid, Portworx failed to create a Kubernetes client, and the process would restart every few seconds, leading to continuous login failures. User Impact: Users saw a large number of client creation attempts, which may have led to the credentials being blocked or too many API calls. Resolution: If the credentials are invalid, Portworx now waits for the secret to be changed before trying to log in again. Components: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37339 | Sharedv4 service failover did not work correctly when a node had a link-local IP from the subnet 169.254.0.0/16. In clusters running OpenShift 4.15 or later, Kubernetes nodes may have a link-local IP from this subnet by default. User Impact: Users saw disruptions in applications utilizing sharedv4-service volumes when the NFS server node went down. Resolution: Portworx has been improved to prevent VM outages in such situations. Components: Sharedv4 Affected Versions: 3.1.0.2 | Major |
3.1.2.1
July 8, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37753 | Portworx reloaded and reconfigured VMs on every boot, which is a costly activity in vSphere. User Impact: Users saw a significant number of VM reload and reconfigure activities during Portworx restarts, which sometimes overwhelmed vCenter. Resolution: Portworx has been optimized to minimize unnecessary reload and reconfigure actions for VMs. Now, these actions are mostly triggered only once during the VM's lifespan. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-35217 | Portworx maintained two vSphere sessions at all times. These sessions would become idle after Portworx restarts, and vSphere would eventually clean them up. vSphere counts idle sessions toward its session limits, which caused an issue if all nodes restarted simultaneously in a large cluster. User Impact: In large clusters, users encountered the 503 Service Unavailable error if all nodes restarted simultaneously.Resolution: Portworx now actively terminates sessions after completing activities like boot and pool expansion. Note that in rare situations where Portworx might not close the sessions, users may still see idle sessions. These sessions are cleaned by vSphere based on the timeout settings of the user's environment. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36727 | When a user decommissioned a node, Portworx would process the node deletion in the background. For every volume delete or update operation, it checked whether all nodes marked as decommissioned had no references to these volumes, which made node deletion take a long time. User Impact: The Portworx cluster went down as the KVDB node timed out. Resolution: The logic for decommissioning nodes has been improved to prevent such situations. Component: KVDB Affected Versions: 3.1.x, 3.0.x, 2.13.x | Minor |
3.1.2
June 19, 2024
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Customers can now migrate legacy shared volumes to sharedv4 service volumes.
- For FlashBlade Direct Access volumes, users can provide multiple NFS endpoints using the `pure_nfs_endpoint` parameter. This is useful when the same FlashBlade is shared across different zones in a cluster (see the example StorageClass after this list).
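A sketch of a FlashBlade Direct Access StorageClass using the pure_nfs_endpoint parameter is shown below. The comma-separated multi-endpoint value and the example IPs are assumptions; confirm the exact syntax in the FlashBlade Direct Access documentation.

```yaml
# Illustrative FlashBlade Direct Access StorageClass. The multi-endpoint
# format shown for pure_nfs_endpoint is an assumption; example IPs only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fb-direct-access
provisioner: pxd.portworx.com
parameters:
  backend: "pure_file"
  pure_nfs_endpoint: "10.0.1.10,10.0.2.10"   # one endpoint per zone
```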
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-33044 | Portworx will perform additional live VM migrations to ensure a KubeVirt VM always uses the block device directly by running the VM on the volume coordinator node. | Sharedv4 |
PWX-23390 | Stork will now raise events on a pod or VM object if it fails to schedule them in a hyperconverged fashion. | Stork and DR |
PWX-37113 | In KubeVirt environments, Portworx no longer triggers RebalanceJobStarted and RebalanceJobFinished alarms every 15 minutes due to the KubeVirt fix-vps job. Alarms are now raised only when the background job is moving replicas. | Storage |
PWX-36600 | The output of the rebalance HA-update process has been improved to display the state of each action during the process. | Storage |
PWX-36854 | The output of the pxctl volume inspect command has been improved. The Kind field can now be left empty inside the claimRef , allowing the output to include application pods that are using the volumes. | Storage |
PWX-33812 | Portworx now supports Azure PremiumV2_LRS and UltraSSD_LRS disk types (see the example device spec after this table). | Drive and Pool Management |
PWX-36484 | A new query parameter ce=azure has been added for Azure users to identify the cloud environment being used. The parameter ensures that the right settings and optimizations are applied based on the cloud environment. | Install |
PWX-36714 | The timeout for switching licenses from floating to Portworx Enterprise has been increased, avoiding timeout failures. | Licensing |
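For PWX-33812, the new Azure disk types are requested through the StorageCluster cloud storage device specs. The snippet below is a minimal sketch; the cluster name, namespace, and disk size are placeholders, and the exact type strings should be confirmed against the Azure cloud drive documentation.

```yaml
# Sketch: requesting the newly supported Azure PremiumV2_LRS disk type via
# cloud storage device specs. Name, namespace, and size are placeholders.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster
  namespace: portworx
spec:
  cloudStorage:
    deviceSpecs:
    - type=PremiumV2_LRS,size=150
```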
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36869 | When using a FlashArray on Purity 6.6.6 with NVMe-RoCE, a change in the REST API resulted in a deadlock in Portworx. User Impact: FlashArray Direct Access attachment operations never completed, and FlashArray Cloud Drive nodes failed to start. Resolution: Portworx now properly handles the changed API for NVMe and does not enter a deadlock. Component: FA-FB Affected Versions: 3.1.x, 3.0.x, 2.13.x | Critical |
PWX-37059 | In disaggregated mode, storageless nodes restarted every few minutes in unsuccessful attempts to claim the storage driveset. User Impact: Due to storageless node restarts, some customer applications experienced IO disruption. Resolution: When a storage node goes down, Portworx now stops storageless nodes from restarting in disaggregated mode, preventing them from claiming the storage driveset. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37351 | If the drive paths changed due to a node restart or a Portworx upgrade, it led to a storage down state on the node. User Impact: Portworx failed to restart because of the storage down state. Components: Drive & Pool Management Affected Versions: 3.1.0.3, 3.1.1.1 | Major |
PWX-36786 | An offline storageless node was auto-decommissioned under certain race conditions, making the cloud-drive driveset orphaned. User Impact: When Portworx started as a storageless node using this orphaned cloud-drive driveset, it failed to start since the node's state was decommissioned. Resolution: Portworx now auto-cleans such orphaned storageless cloud-drive drivesets and starts successfully. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36887 | When one of the internal KVDB nodes was down for several minutes, Portworx added another node to the KVDB cluster. Portworx initially added the new KVDB member as a learner. If, for some reason, KVDB connectivity was lost for more than a couple of minutes after adding the learner, the learner stayed in the cluster and prevented a failover to a different KVDB node. User Impact: The third node was not able to join the KVDB cluster with the error Peer URLs already exists. KVDB continued to run with only two members.Resolution: When Portworx encounters the above error, it removes the failed learner from the cluster, thereby allowing the third node to join. Component: Internal KVDB Affected Versions: 3.0.x, 3.1.1 | Major |
PWX-36873 | When Portworx was using HashiCorp's Vault configured with Kubernetes or AppRole authentication, it attempted to automatically refresh the access tokens when they expired. If the Kubernetes Service Account was removed or the AppRole expired, the token-refresh kept failing, and excessive attempts to refresh it caused a crash of the Vault service on large clusters. User Impact: The excessive attempts to refresh the tokens caused a crash of the Vault service on large clusters. Resolution: Portworx nodes now detect excessive errors from the Vault service and will avoid accessing Vault for the next 5 minutes. Component: Volume Management Affected Versions: 3.0.5, 3.0.3 | Major |
PWX-36601 | Previously, the default timeout for rebalance HA-update actions was 30 minutes. This duration was insufficient for some very slow setups, resulting in HA-update failures. User Impact: The rebalance job for HA-update failed to complete. In some cases, the volume's HA-level changed unexpectedly. Resolution: The default rebalance HA-update timeout has been increased to 5 hours. Components: Storage Affected Versions: 2.13.x, 3.0.x, 3.1.x | Major |
PWX-35312 | In version 3.1.0, a periodic job that fetched drive properties caused an increase in the number of API calls across all platforms. User Impact: The API rate limits approached their maximum capacity more quickly, stressing the backend. Resolution: Portworx improved the system to significantly reduce the number of API calls on all platforms. Component: Cloud Drives Affected Versions: 3.1.0 | Major |
PWX-30441 | For AWS users, Portworx did not update the drive properties for gp2 drives that were converted to gp3 drives. User Impact: Because the IOPS of such drives changed but were not updated, pool expansion failed on these drives. Resolution: During the maintenance cycle that is required for converting gp2 drives to gp3, Portworx now refreshes the disk properties of these drives. Component: Cloud Drives Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36139 | During pool expansion with the add-drive operation using the CSI provider on a KVDB node, the new drive could get the StorageClass of the KVDB drive instead of the data drive, if they were different. User Impact: In such a case, a drive might have been added but the pool expansion operation failed, causing some inconsistencies. Resolution: Portworx now uses the StorageClass of only the data drives present on the node. Component: Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-3031 | For an Azure cluster with storage and storageless nodes using Premium LRS or SSD drive types, when a user updates the Portworx StorageClass to use PremiumV2 LRS or Ultra SSD drive types, the changes might not be reflected on the existing nodes. Workaround: StorageClass changes apply only to new nodes added to the cluster. For existing nodes, perform the following steps:
Affected versions: 3.1.2 | Major |
PD-3012 | If maxStorageNodesPerZone is set to a value greater than the current number of worker nodes in an AKS cluster, additional storage nodes in an offline state may appear post-upgrade due to surge nodes. Workaround: Manually delete any extra storage node entries created during the Kubernetes cluster upgrade by following the node decommission process. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x, 3.1.x | Major |
PD-3013 | Pool expansion may fail if a node is rebooted before the expansion process is completed, displaying errors such as drives in the same pool not of the same type . Workaround: Retry the pool expansion on the impacted node. Components: Drive and Pool Management Affected versions: 3.1.2 | Major |
PD-3035 | Users may encounter issues with migrations of legacy shared volumes to sharedv4 service volumes appearing stuck if performed on a decommissioned node. Workaround: If a node is decommissioned during a migration, the pods running on that node must be forcefully terminated to allow the migration to continue. Component: Sharedv4 Volumes Affected version: 3.1.2 | Major |
PD-3030 | In environments where multipath is used to provision storage disks for Portworx, incorrect shutdown ordering may occur, causing multipath to shut down before Portworx. This can lead to situations where outstanding IOs from applications, still pending in Portworx, may fail to reach the storage disk. Workaround:
Affected Versions: 3.1.2 | Major |
3.1.1
April 03, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-35939 | For DR clusters, the cluster domain of the nodes is exposed in the node inspect and node enumerate SDK responses. This information is used by the operator to create the pod disruption budget, preventing loss during Kubernetes upgrades. | DR and Migration |
PWX-35395 | When Portworx encounters errors such as a checksum mismatch or bad disk sectors while reading data from the backend disk, the IOOperationWarning alert is raised. This alert is tracked by the metric px_alerts_iooperationwarning_total. | Storage |
PWX-35738 | Portworx now queries an optimized subset of VMs to determine the driveset to attach, avoiding potential errors during an upgrade where a transient state of a VM could have resulted in an error during boot. | Cloud Drives |
PWX-35397 | The start time for Portworx on both Kubernetes and vSphere platforms has been significantly reduced by eliminating repeated calls to the Kubernetes API and vSphere servers. | Cloud Drives |
PWX-35042 | The Portworx CLI has been enhanced with the following improvements:
| Cloud Drives |
PWX-33493 | For pool expansion operations with the pxctl sv pool expand command, the add-disk and resize-disk flags have been renamed to add-drive and resize-drive , respectively. The command will continue to support the old flags for compatibility. | Cloud Drives |
PWX-35351 | The OpenShift Console now displays the Used Space for CSI sharedV4 volumes. | Sharedv4 |
PWX-35187 | Customers can now obtain the list of Portworx images from the spec generator. | Spec Generator |
PWX-36543 | If the current license is set to expire within the next 60 days, Portworx now automatically updates the IBM Marketplace license to a newer one upon the restart of the Portworx service. | Licensing |
PWX-36496 | The error messages for pxctl license activate have been improved to return a more appropriate error message in case of double activation. | Licensing |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36416 | When a PX-StoreV2 pool reached its full capacity and could not be expanded further using the resize-drive option, it went offline due to a pool full condition. User Impact: If pool capacity reached a certain threshold, the pool went offline. Resolution: Because PX-StoreV2 pools cannot be expanded using the add-drive operation, you can instead increase the capacity on a node by adding new pools to it:
Affected Versions: 3.0.0 | Critical |
PWX-36344 | A deadlock in the Kubernetes Config lock led to failed pool expansion. User Impact: Customers needed to restart Portworx if pool expansion became stuck. Resolution: An unbuffered channel that resulted in a deadlock when written to in a very specific window is now changed to have a buffer, breaking the deadlock. Components: Pool Management Affected Versions: 2.13.x, 3.0.x | Major |
PWX-36393 | Occasionally, Portworx CLI binaries were installed incorrectly due to issues (e.g., read/write errors) that the installation process failed to detect, causing the Portworx service to not start. User Impact: Portworx upgrade process failed. Resolution: Portworx has improved the installation process by ensuring the correct installation of CLI commands and detecting these errors during the installation. Components: Install Affected Versions: 2.13.x, 3.0.x | Major |
PWX-36339 | For a sharedv4 service pod, there was a race condition where the cached mount table failed to reflect the unmounting of the path. User Impact: Pod deletion got stuck in the Terminating state, waiting for the underlying mount point to be deleted. Resolution: Force refresh of cache for an NFS mount point if it is not attached and is already unmounted. This will ensure that the underlying mount path gets removed and the pod terminates cleanly. Components: Sharedv4 Affected versions: 2.13.x, 3.0.x | Major |
PWX-36522 | When FlashArray Direct Access volumes and FlashArray Cloud Drive volumes were used together, the system couldn't mount the PVC due to an Invalid arguments for mount entry error, causing the related pods to not start. User Impact: Application pods failed to start. Resolution: The mechanism to populate the mount table on restart has been changed to ensure an exact device match rather than a prefix-based search, addressing the root cause of the incorrect mount entries and subsequent failures. Components: Volume Management Affected version: 3.1.0 | Major |
PWX-36247 | The field portworx.io/misc-args had an incorrect value of -T dmthin instead of -T px-storev2 to select the backend type. User Impact: Customers had to manually change this argument to -T px-storev2 after generating the spec from the spec generator. Resolution: The value for the field has been changed to -T px-storev2 (see the example StorageCluster annotation after this table). Components: FA-FB Affected version: 3.1.0 | Major |
PWX-35925 | When downloading air-gapped bootstrap specific for OEM release (e.g. px-essentials ), the script used an incorrect URL for the Portworx images.User Impact: The air-gapped bootstrap script fetched the incorrect Portworx image, particularly for Portworx Essentials. Resolution: The air-gapped bootstrap has been fixed, and now efficiently handles the OEM release images. Components: Install Affected version: 2.13.x, 3.0.x | Major |
PWX-35782 | In a synchronous DR setup, a node repeatedly crashed during a network partition because Portworx attempted to operate on a node from another domain that was offline and unavailable. User Impact: In the event of a network partition between the two domains, temporary node crashes could occur. Resolution: Portworx now avoids nodes from the other domain that are offline or unavailable. Components: DR and Migration Affected version: 3.1.0 | Major |
PWX-36500 | Older versions of Portworx installations with FlashArray Cloud Drive displayed an incorrect warning message in the pxctl status output on RHEL 8.8 and above OS versions, even though the issue had been fixed in the multipathd package that comes with these OS versions.User Impact: With Portworx version 2.13.0 or above, users on RHEL 8.8 or higher who were using FlashArray Cloud Drives saw the following warning in the pxctl status output: WARNING: multipath version 0.8.7 (between 0.7.7 and 0.9.3) is known to have issues with crashing and/or high CPU usage. If possible, please upgrade multipathd to version 0.9.4 or higher to avoid this issue .Resolution: The output of pxctl status has been improved to display the warning message for the correct RHEL versions.Components: FA-FB Affected version: 2.13.x, 3.0.x, 3.1.0 | Major |
PWX-33030 | For FlashArray Cloud Drives, when the skip_kpartx flag was set in the multipath config, the partition mappings for device mapper devices did not load, preventing Portworx from starting correctly. User Impact: This resulted in a random device (either a child or a parent/dm device) with the UUID label being selected and attempted to be mounted. If a child device was chosen, the mount would fail with a Device is busy error. Resolution: Portworx now avoids such a situation by modifying the specific unbuffered channel to include a buffer, thus preventing the deadlock. Components: FA-FB Affected version: 2.13.x, 3.0.x | Minor |
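For PWX-36247, the corrected backend-type selection appears on the StorageCluster as shown in the sketch below; the cluster name and namespace are placeholders, and only the annotation value is the point of the example.

```yaml
# Sketch of the corrected PX-StoreV2 backend selection (PWX-36247).
# Only the portworx.io/misc-args annotation value matters here.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster
  namespace: portworx
  annotations:
    portworx.io/misc-args: "-T px-storev2"
```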
3.1.0.1
March 20, 2024
Visit these pages to see if you're ready to upgrade to this version:
This is a hotfix release intended for IBM Cloud customers. Please contact the Portworx support team for more information.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36260 | When installing Portworx version 3.1.0 from the IBM Marketplace catalog, the PX-Enterprise IBM Cloud license for a fresh installation is valid until November 30, 2026. However, for existing clusters that were running older versions of Portworx and upgraded to 3.1.0, the license did not automatically update to reflect the new expiry date of November 30, 2026.User Impact: With the old license expiring on April 2, 2024, Portworx operations could be affected after this date. Resolution: To extend the license until November 30, 2026, follow the instructions on the Upgrading Portworx on IBM Cloud via Helm page to update to version 3.1.0.1. Components: Licensing Affected versions: 2.13.x, 3.0.x, 3.1.0 | Critical |
3.1.0
January 31, 2024
Visit these pages to see if you're ready to upgrade to this version:
Starting with version 3.1.0:
- Portworx CSI for FlashArray and FlashBlade license SKU will only support Direct Access volumes and no Portworx volumes. If you are using Portworx volumes, reach out to the support team before upgrading Portworx.
- Portworx Enterprise will exclusively support kernel versions 4.18 and above.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- The auto_journal profile is now available to detect the IO pattern and determine whether the `journal` IO profile is beneficial for an application. This detector analyzes the incoming write IO pattern to ascertain whether the `journal` IO profile would improve the application's performance. It continuously analyzes the write IO pattern and toggles between the `none` and `journal` IO profiles as needed.
- A dynamic labeling feature is now available, allowing Portworx users to label Volume Placement Strategies (VPS) flexibly and dynamically. Portworx now supports the use of dynamic labeling through the inclusion of `${pvc.labels.labelkey}` in values (see the example VolumePlacementStrategy after this list).
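A minimal sketch of the dynamic labeling feature is shown below. It assumes a replicaAffinity rule that matches a label against the value of the provisioning PVC's own label; the rule type and the label key app are illustrative choices, not taken from the release note.

```yaml
# Illustrative VolumePlacementStrategy using a dynamic label value taken
# from the provisioning PVC. The rule type and label key are examples.
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: app-affinity
spec:
  replicaAffinity:
  - enforcement: required
    matchExpressions:
    - key: app
      operator: In
      values:
      - "${pvc.labels.app}"   # resolved from the PVC's labels at provisioning time
```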
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-31558 | Google Anthos users can now generate the correct Portworx spec from Portworx Central, even when storage device formats are incorrect. | Spec Generation |
PWX-28654 | Added the NonQuorumMember flag to the node inspect and Enumerate SDK API responses. This flag provides an accurate value depending on whether a node contributes to cluster quorum. | SDK/gRPC |
PWX-31945 | Portworx now provides an internal API for listing all storage options on the cluster. | SDK/gRPC |
PWX-29706 | Portworx now supports a new streaming Watch API that provides updates on volume information that has been created, modified, or deleted. | SDK/gRPC |
PWX-35071 | Portworx now distinguishes between FlashArray and FlashBlade calls, routing them to appropriate backends based on the current volume type (file or block), thereby reducing the load on FlashArray or FlashBlade backends. | FA-FB |
PWX-34033 | For FlashArray and FlashBlade integrations, many optimizations have been made in caching and information sharing, resulting in a significant reduction in number of REST calls made to the backing FlashArray and FlashBlade. | FA-FB |
PWX-35167 | The default timeout for the FlashBlade Network Storage Manager (NSM) lock has been increased to prevent Portworx restarts. | FA-FB |
PWX-30083 | Portworx now manages the TTL for alerts instead of relying on etcd's key expiry mechanism. | KVDB |
PWX-33430 | The error message displayed when a KVDB lock times out has been made more verbose to provide a better explanation. | KVDB |
PWX-34248 | The sharedv4 parameter in a StorageClass enables users to choose between sharedv4 and non-shared volumes (see the example StorageClass after this table):
| Sharedv4 |
PWX-35113 | Users can now enable the forward-nfs-attach-enable storage option for applications using sharedv4 volumes. This allows Portworx to attach a volume to the most suitable available nodes. | Sharedv4 |
PWX-32278 | On the destination cluster, all snapshots are now deleted during migration when the parent volume is deleted. | Stork |
PWX-32260 | The resize-disk option for pool expansion is now also available on TKGS clusters. | Cloud Drives |
PWX-32259 | Portworx now uses cloud provider identification by reusing the provider's singleton instance, avoiding repetitive checks if the provider type is already specified in the cluster spec. | Cloud Drives |
PWX-35428 | In environments with slow vCenter API responses, Portworx now caches specific vSphere API responses, reducing the impact of these delays. | Cloud Drives |
PWX-33561 | When using the PX-StoreV2 backend, Portworx now detaches partially attached drivesets for cloud-drives only when the cloud-drives are not mounted. | Cloud Drives |
PWX-33042 | In a disaggregated deployment, storageless nodes can be converted to storage nodes by changing the node label to portworx.io/node-type=storage | Cloud Drives |
PWX-28191 | AWS credentials for Drive Management can now be provided through a Kubernetes secret px-aws in the same namespace where Portworx is deployed. | Cloud Drives |
PWX-34253 | Azure users will now see accurate storage type displays: Premium_LRS is identified as SSD, and NVME storage is also correctly represented. | Cloud Drives |
PWX-31808 | Pool deletion is now allowed for vSphere cloud drives. | Cloud Drives |
PWX-32920 | vSphere drives can now be resized up to a maximum of 62 TB per drive. | Pool Management |
PWX-32462 | Portworx now permits most overlapping mounts and will only reject overlapping mounts if a bidirectional (i.e., shared) parent directory mount is present. | px-runc |
PWX-32905 | Portworx now properly detects the NFS service on OpenShift platforms. | px-runc |
PWX-35292 | To reduce log volume in customer clusters, logs generated when a volume is not found during CSI mounting have been moved to the TRACE level. | CSI |
PWX-34995 | Portworx CSI for FlashArray and FlashBlade license SKU now counts Portworx and FA/FB drives separately based on the drive type. | Licensing |
PWX-35452 | The mount mapping's lock mechanism has been improved to prevent conflicts between unmount and mount processes, ensuring more reliable pod start-ups. | Volume Management |
PWX-33577 | The fstrim operation has been improved for efficiency:
| Storage |
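For PWX-34248, a minimal StorageClass sketch that opts volumes into sharedv4 behavior is shown below; the name and repl value are examples.

```yaml
# Illustrative StorageClass opting into sharedv4 volumes; set sharedv4 to
# "false" (or omit it) for non-shared volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  sharedv4: "true"
```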
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-31652 | Portworx was unable to identify the medium for the vSphere cloud drives. User Impact: Portworx deployment failed on vSphere with cloud drives. Resolution: Portworx now correctly identifies the drive medium type and can be deployed on a cluster with vSphere cloud drives. Components: Drive & Pool Management Affected Versions: 2.13.x | Critical |
PWX-35430 | Requests for asynchronous DR migration operations were previously load balanced to nodes that were not in the same cluster domain. User Impact: In hybrid DR setups, such as one where cluster A is synchronously paired with cluster B, and cluster B is asynchronously paired with cluster C, any attempts to migrate from Cluster B to Cluster C would result in failure, showing an error that indicates a BackupLocation not found .Resolution: Portworx now ensures that migration requests are load balanced within nodes in the same cluster domain as the initial request. Components: DR and Migration Affected Versions: 3.0.4 | Critical |
PWX-35277 | In an asynchronous DR deployment, if security/auth is enabled in a Portworx cluster, migrations involving multiple volumes would fail with authentication errors. User Impact: Migrations in asynchronous DR setups involving multiple volumes failed with authentication errors. Resolution: Authentication logic has been modified to handle migrations involving multiple volumes on the auth enabled clusters. Components: DR and Migrations Affected versions: 3.0.0 | Critical |
PWX-34369 | When using HTTPS endpoints for cluster pairing, Portworx incorrectly parsed the HTTPS URL scheme. User Impact: Cluster pairing would fail when using an HTTPS endpoint. Resolution: Portworx has now corrected the HTTPS URL parsing logic. Components: DR and Migration Affected versions: 3.0.0 | Critical |
PWX-35466 | Cloudsnaps or asynchronous DR operations failed when attempted from a metro cluster due to inaccessible credentials. This issue specifically occurred if the credential was not available from both domains of the metro cluster. User Impact: Cloudsnap operations or asynchronous DR from metro clusters could fail if the required credentials were not accessible in both domains. Resolution: Portworx now detects a coordinator node that has access to the necessary credentials for executing cloudsnaps or asynchronous DR operations. Components: DR and Migration Affected versions: 3.0.4 | Critical |
PWX-35324 | FlashArray Direct Access volumes are formatted upon attachment. All newly created volumes remain in a pending state until they are formatted. If Portworx was restarted before a volume had been formatted, it would delete the volume that was still in the pending state. User Impact: The newly created FlashArray Direct Access volumes were deleted. Resolution: Portworx now avoids deleting volumes that are in the pending state. Components: FA-FB Affected versions: 3.0.x | Critical |
PWX-35279 | Upon Portworx startup, if there were volumes attached from a FlashArray that was not registered in the px-pure-secret , Portworx would detach them as part of a cleanup routine.User Impact: Non-Portworx disks, including boot drives and other FlashArray volumes, were mistakenly detached from the node and required reconnection. Resolution: Portworx no longer cleans up healthy FlashArray volumes on startup. Components: FA-FB Affected versions: 2.13.11, 3.0.0, 3.0.4 | Critical |
PWX-34377 | Portworx was incorrectly marking FlashBlade Direct Attach volumes as having transitioned to read-only status. This incorrect identification led to a restart of all pods associated with these volumes. User Impact: The restart of running pods resulted in application restarts or failures. Resolution: Checks within Portworx that were leading to false identification of Read-Only transitions for FlashBlade volumes have been fixed. Components: FA-FB Affected versions: 3.0.4 | Critical |
PWX-32881 | The CSI driver failed to register after the Anthos storage validation test suite was removed and a node was re-added to the cluster. User Impact: The CSI server was unable to restart if the Unix domain socket had been deleted. Resolution: The CSI server now successfully restarts and restores the Unix domain socket, even if the socket has been deleted. Update to this version if your workload involves deleting the kubelet directory during node decommissioning.Components: CSI Affected versions: 3.0.0 | Critical |
PWX-31551 | The latest OpenShift installs have stricter SELinux policies, which prevent non-privileged pods from accessing the csi.sock CSI interface file. User Impact: Portworx install failed. Resolution: All Portworx CSI pods are now configured as privileged pods. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Critical |
PWX-31842 | On TKGI clusters, if Portworx service and pods were restarted, it led to excessive mounts (mount-leaks). User Impact: The IO operations on the node would progressively slow down, until the host would completely hang. Resolution: The mountpoints that are used by Portworx have been changed. Components: oci-monitor Affected versions: 2.1.1 | Critical |
PWX-35603 | When running Portworx on older Linux systems (specifically those using GLIBC 2.31 or older) in conjunction with newer versions of Kubernetes, Portworx previously failed to detect dynamic updates of pod credentials and tokens, which led to Unauthorized errors when utilizing Kubernetes client APIs. User Impact: Users could encounter Unauthorized errors when using Kubernetes client APIs. Resolution: Dynamic token updates are now processed correctly by Portworx. Components: oci-monitor Affected versions: 3.0.1 | Critical |
PWX-34250 | If encryption was applied on both the client side (using an encryption passphrase) and the server side (using Server-Side Encryption, SSE) for creating credential commands, this approach failed to configure S3 storage in Portworx to use both encryption methods. User Impact: Configuration of S3 storage would fail in the above mentioned condition. Resolution: Users can now simultaneously use both server-side and client-side encryption when creating credentials for S3 or S3-compatible object stores. Components: Cloudsnaps Affected versions: 3.0.2, 3.0.3, 3.0.4 | Critical |
PWX-22870 | Portworx installations would by default automatically attempt to install NFS packages on the host system. However, since NFS packages add new users/groups, they were often blocked on Red Hat Enterprise Linux / CentOS platforms with SELinux enabled. User Impact: Sharedv4 volumes failed to attach on platforms with SELinux enabled. Resolution: Portworx installation is now more persistent on Red Hat Enterprise Linux / CentOS platforms with SELinux enabled. Components: IPV6 Affected versions: 2.5.4 | Major |
PWX-35332 | Concurrent access to an internal data structure containing NFS export entries resulted in a Portworx node crashing with the fatal error: concurrent map read and map write in knfs.HasExports error.User Impact: This issue triggered a restart of Portworx on that node. Resolution: A lock mechanism has been implemented to prevent this issue. Components: Sharedv4 Affected versions: 2.10.0 | Major |
PWX-34865 | When upgrading Portworx from version 2.13 (or older) to version 3.0 or newer, the internal KVDB version was also updated. If there was a KVDB membership change during the upgrade, the internal KVDB lost quorum in some corner cases. User Impact: The internal KVDB lost quorum, enforcing Portworx upgrade of a KVDB node that was still on an older Portworx version. Resolution: In some cases, Portworx now chooses a different mechanism for the KVDB membership change. Components: KVDB Affected versions: 3.0.0 | Major |
PWX-35527 | When a Portworx KVDB node went down and subsequently came back online with the same node ID but a new IP address, Portworx nodes on the other servers continued to use the stale IP address for connecting to KVDB. User Impact: Portworx nodes faced connection issues while connecting to the internal KVDB, as they attempted to use the outdated IP address. Resolution: Portworx now updates the correct IP address on such nodes. Component: KVDB Affected versions: 2.13.x, 3.0.x | Major |
PWX-33592 | Portworx incorrectly applied the time set by the execution_timeout_sec option. User Impact: Some operations timed out before the time set through the execution_timeout_sec option. Resolution: The behavior of this runtime option is now fixed. Components: KVDB Affected versions: 2.13.x, 3.0.x | Major |
PWX-35353 | Portworx installations (version 3.0.0 or newer) failed on Kubernetes systems using Docker container runtime versions older than 20.10.0. User Impact: Portworx installation failed on Docker container runtimes older than 20.10.0. Resolution: Portworx can now be installed on older Docker container runtimes. Components: oci-monitor Affected versions: 3.0.0 | Major |
PWX-33800 | In Operator version 23.5.1, Portworx was configured so that a restart of the Portworx pod would also trigger a restart of the portworx.service backend.User Impact: This configuration caused disruptions in storage operations. Resolution: Now pod restarts do not trigger a restart of the portworx.service backend.Components: oci-monitor Affected versions: 2.6.0 | Major |
PWX-32378 | During the OpenShift upgrade process, the finalizer service, which ran when Portworx was not processing IOs, experienced a hang and subsequently timed out. User Impact: This caused the OpenShift upgrade to fail. Resolution: The Portworx service now runs to stop Portworx and sets the PXD_timeout during OpenShift upgrades. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Major |
PWX-35366 | When the underlying nodes of an OKE cluster were replaced multiple times (due to upgrades or other reasons), Portworx failed to start, displaying the error Volume cannot be attached, because one of the volume attachments is not configured as shareable .User Impact: Portworx became unusable on nodes that were created to replace the original OKE worker nodes. Resolution: Portworx now successfully starts on such nodes. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33413 | After an upgrade, when a zone name case was changed, Portworx considered this to be a new zone. User Impact: The calculation of the total storage in the cluster by Portworx became inaccurate. Resolution: Portworx now considers a zone name with the same spelling, regardless of case, to be the same zone. For example, Zone1, zone1, and ZONE1 are all considered the same zone. Components: Cloud Drives Affected versions: 2.12.1 | Major |
PWX-33040 | For Portworx users using cloud drives on the IBM platform, when the IBM CSI block storage plugin was unable to successfully bind Portworx cloud-drive PVCs (for any reason), these PVCs remained in a pending state. As a retry mechanism, Portworx created new PVCs. Once the IBM CSI block storage plugin was again able to successfully provision drives, all these PVCs got into a bound state.User Impact: A large number of unwanted block devices were created in users' IBM accounts. Resolution: Portworx now cleans up unwanted PVC objects during every restart and KVDB failover. Components: Cloud Drives Affected versions: 2.13.0 | Major |
PWX-35114 | The storageless node could not come online after Portworx was deployed and showed the failed to find any available datastores or datastore clusters error.User Impact: Portworx failed to start on the storageless node which had no access to a datastore. Resolution: Storageless nodes can now be deployed without any access to a datastore. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33444 | If a disk that was attached to a node became unavailable, Portworx continuously attempted to find the missing drive-set. User Impact: Portworx failed to restart. Resolution: Portworx now ignores errors related to missing disks and attempts to start by attaching to the available driveset, or it creates a new driveset if suitable drives are available on the node. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33076 | When more than one container mounted the same Docker volume, all of them mounted to the same path, because the mount path was derived only from the volume name and was not unique. User Impact: When one container went offline, the volume was unmounted for the other containers mounted to the same volume. Resolution: The volume mount HTTP request ID is now appended to the path, which makes the path unique for every mount of the same volume. Components: Volume Management Affected versions: 2.13.x, 3.0.x | Major |
PWX-35394 | Host detach operation on the volume failed with the error HostDetach: Failed to detach volume .User Impact: A detach or unmount operation on a volume would get stuck if attach and detach operations were performed in quick succession, leading to incomplete unmount operations. Resolution: Portworx now reliably handles detach or unmount operations on a volume, even when attach and detach operations are performed in quick succession. Components: Volume Management Affected Versions: 2.13.x, 3.0.x | Major |
PWX-32369 | In a synchronous DR setup, cloudsnaps with different objectstores for each domain failed to back up and clean up the expired cloudsnaps. User Impact: The issue occurred because a single node, which did not have access to both objectstores, was performing cleanup of the expired cloudsnaps. Resolution: Portworx now designates two nodes, one in each domain, to perform the cleanup of the expired cloudsnaps. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-35136 | During cloudsnap deletions, some objects were not removed because the deletion requests exceeded the S3 API's limit for the number of objects that could be deleted at once. User Impact: This would leave objects on S3 for deleted cloudsnaps, thereby consuming S3 capacity. Resolution: Portworx has been updated to ensure that deletion requests do not exceed the S3 API's limit for the number of objects that can be deleted. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-34654 | Cloudsnap status returned empty results without any error for a taskID that was no longer in the KVDB. User Impact: No information was provided for users to take corrective actions. Resolution: Portworx now returns an error instead of empty status values. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-31078 | When backups were restored to a namespace different from the original volume's, the restored volumes retained labels indicating the original namespace, not the new one. User Impact: The functionality of sharedv4 volumes could be impacted because the labels did not accurately reflect the new namespace in which the volumes were located. Resolution: Labels for the restored volume have been fixed to reflect the correct namespace in which the volume resides. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-32278 | During migration, in certain error scenarios an orphan snapshot was left behind on the destination cluster even though the parent volume was not present. User Impact: This could lead to an increase in capacity usage. Resolution: Such orphan cloudsnaps are now deleted when the parent volume is deleted. Components: Asynchronous DR Affected versions: 2.13.x, 3.0.x | Major |
PWX-35084 | Portworx incorrectly determined the number of CPU cores when running on hosts enabled with cGroupsV2. User Impact: This created issues when limiting the CPU resources, or pinning the Portworx service to certain CPU cores. Resolution: Portworx now properly determines the number of available CPU cores. Components: px-runc Affected versions: 3.0.2 | Major |
PWX-32792 | On OpenShift 4.13, Portworx did not proxy portworx-service logs. It kept journal logs from multiple machine IDs, which caused the Portworx pod to stop proxying the logs from portworx.service .User Impact: In OpenShift 4.13, the generation of journal logs from multiple machine IDs led to the Portworx pod ceasing to proxy the logs from portworx.service .Resolution: Portworx log proxy has been fixed to locate the correct journal log using the current machine ID. Components: Monitoring Affected versions: 2.13.x, 3.0.x | Major |
PWX-34652 | During the ha-update process, all existing volume labels were removed and could not be recovered.User Impact: This resulted in the loss of all volume labels, significantly impacting volume management and identification. Resolution: Volume labels now do not change during the ha-update process.Components: Storage Affected versions: 2.13.x, 3.0.x | Major |
PWX-34710 | A large amount of log data was generated during storage rebalance jobs or dry runs. User Impact: This led to log files occupying a large amount of space. Resolution: The volume of logging data has been reduced by 10%. Components: Storage Affected versions: 2.13.x, | Major |
PWX-34821 | In scenarios where the system is heavily loaded and imbalanced, elevated syncfs latencies were observed. This situation led to the fs_freeze call, responsible for synchronizing all dirty data, timing out before completion.User Impact: Users experienced timeouts during the fs_freeze call, impacting the normal operation of the system.Resolution: Restart the system and retry the snapshot operation. Components: Storage Affected versions: 3.0.x | Major |
PWX-33647 | When the Portworx process is restarted, it verifies the existing mounts on the system for sanity. If one of the mounts was an NFS mount of a Portworx volume, the mount point verification would hang because Portworx was still in the process of starting up. User Impact: The Portworx process would not come up and would enter an infinite wait, waiting for the mount point verification to return. Resolution: When Portworx is starting up, it now skips the verification of Portworx-backed mount points to allow the startup process to continue. Components: Storage Affected versions: 3.0.2 | Major |
PWX-33631 | Portworx applied locking mechanisms to synchronize requests across different worker nodes during the provisioning of CSI volumes in order to distribute workloads evenly, causing a decrease in performance for CSI volume creation. User Impact: This synchronization approach led to a decrease in performance for CSI volume creation in heavily loaded clusters. Resolution: If you are experiencing slow CSI volume creation, upgrade to this version. Components: CSI Affected versions: 2.13.x, 3.0.x | Major |
PWX-34355 | On certain occasions, while mounting FlashArray cloud drive disks backing a storage pool, Portworx used the single-path device instead of the multipath device. User Impact: Portworx entered the StorageDown state. Resolution: Portworx now identifies the multipath device associated with a given device name and uses this multipath device for mounting operations. Components: FA-FB Affected versions: 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.13.11, 3.0.0 | Major |
PWX-34925 | Creating a large number of FlashBlade Direct Access volumes concurrently could lead to a restart of Portworx with the fatal error: sync: unlock of unlocked mutex. User Impact: When trying to create a large number of FlashBlade volumes concurrently, the Portworx process might get restarted due to contention on the lock. Resolution: The locking mechanism has been improved to avoid this error. Components: FA-FB Affected versions: 3.0.4 | Major |
PWX-35680 | The Portworx spec generator was incorrectly defaulting telemetry to be disabled when the StorageCluster spec was generated outside of the Portworx Central UI. This does not affect customers who applied a StorageCluster with an empty telemetry spec or generated their spec through the UI. User Impact: Telemetry was disabled by default. Resolution: To enable telemetry, users should explicitly specify it if intended (see the example StorageCluster snippet after this table). Components: Spec-Gen Affected versions: 2.12.0, 2.13.0, 3.0.0 | Major |
PWX-34325 | When operating Kubernetes with the containerd runtime and a custom root directory set in the containerd configuration, the installation of Portworx would fail.User Impact: Portworx install would fail, resulting in unusual error messages due to a bug in containerd. Resolution: The installation will now intercept the error message and replace it with a clearer message that includes suggestions on how to fix the Portworx configuration. Components: Installation Affected versions: 3.0.0 | Minor |
PWX-33557 | The CallHome functionality sometimes unconditionally attempted to send the data to the local telemetry service. User Impact: This caused errors, if the telemetry was disabled. Resolution: The CallHome now sends data only if the Telemetry has been enabled. Components: Monitoring Affected versions: 3.0.0 | Minor |
PWX-32536 | Portworx installation failed on certain Linux systems using cGroupsV2 and containerd container runtimes, as it was unable to properly locate container identifiers. User Impact: Portworx installation failed. Resolution: The container scanning process has been improved to ensure successful Portworx installation on such platforms. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Minor |
PWX-30967 | During volume provisioning, snapshot volume labels are included in the count. The nodes were disqualified for provisioning when volume_anti_affinity or volume_affinity VPS was configured, resulting in volume creation failures.User Impact: When stale snapshots existed, the creation of volumes using the VPS with either volume_anti_affinity or volume_affinity setting would fail.Resolution: Upgrade to this version and retry previously failed volume creation request. Components: Stork Affected versions: 2.13.2 | Minor |
PWX-33999 | During the installation of NFS packages, Portworx incorrectly interpreted any issues or errors that occurred as timeout errors. User Impact: Portworx misrepresented and masked the original issues. Resolution: Portworx now accurately processes NFS installation errors during its installation. Components: px-runc Affected versions: 2.7.0 | Minor |
PWX-33008 | Creation of a proxy volume with CSI enabled and RWX access mode failed due to the default use of sharedv4 for all RWX volumes in CSI. User Impact: Users could not create proxy volumes with CSI enabled and RWX access mode. Resolution: To successfully create proxy volumes with CSI and RWX access mode, upgrade to this version. Components: Sharedv4 Affected versions: 3.0.0 | Minor |
PWX-34326 | The Portworx CSI Driver GetPluginInfo API returned an incorrect CSI version. User Impact: This resulted in confusion when the CSI version was retrieved by the Nomad CLI. Resolution: The Portworx CSI Driver GetPluginInfo API now returns the correct CSI version. Components: CSI Affected versions: 2.13.x,3.0.x | Minor |
PWX-31577 | Occasionally, when a user requested that a cloudsnap be stopped, it led to an incorrect increase in the available resources. User Impact: More cloudsnaps were started and became stuck in the NotStarted state because resources were unavailable. Resolution: Stopping cloudsnaps no longer incorrectly increases the available resources, thus avoiding the issue. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-2673 | KubeVirt VM or container workloads may remain in the Starting state due to the remounting of volumes failing with a device busy error. Workaround:
Affected versions: 2.13.x, 3.0.x | Critical |
PD-2546 | In a synchronous DR deployment, telemetry registrations might fail on the destination cluster. Workaround:
Affected versions: 3.0.4 | Critical |
PD-2574 | If a disk is removed from an online pool using the PX-StoreV2 backend, it may cause a kernel panic. Workaround: To avoid kernel panic, do not remove disks from an online pool or node. Components: Storage Affected versions: NA | Critical |
PD-2387 | In OpenShift Container Platform (OCP) version 4.13 or newer, application pods using Portworx sharedv4 volumes can get stuck in the Terminating state. This is because kubelet is unable to stop the application container when an application namespace is deleted. Workaround: Find the nodes on which the sharedv4 volume(s) used by the affected pods are attached, then restart the NFS server on those nodes with the systemctl restart nfs-server command. Wait for a couple of minutes. If the pod is still stuck in the Terminating state, reboot the node on which the pod is running. Note that after rebooting, it might take several minutes for the pod to transition out of the Terminating state. Components: Sharedv4 Affected versions: 3.0.0 | Major |
PD-2621 | Occasionally, deleting a TKGi cluster with Portworx fails with the Warning: Executing errand on multiple instances in parallel. error.Workaround: Before deleting your cluster, perform the following steps:
Components: Kubernetes Integration Affected versions: | Major |
PD-2631 | After resizing a FlashArray Direct Access volume with a filesystem (such as ext4, xfs, or others) by a significant amount, you might not be able to detach the volume, or delete the pod using this volume. Workaround: Allow time for the filesystem resizing process to finish. After the resize is complete, retry the operations. Components: FA-FB Affected versions: 2.13.x, 3.0.x, 3.1.0 | Major |
PD-2597 | Online pool expansion with the add-disk operation might fail when using the PX-StoreV2 backend. Workaround: Put the pool into maintenance mode, then expand your pool capacity. Components: Storage Affected versions: 3.0.0, 3.1.0 | Major |
PD-2585 | The node wipe operation might fail with the Node wipe did not cleanup all PX signatures. A manual cleanup maybe required. error on a system where user-defined device names contain Portworx reserved keywords (such as pwx). Workaround: Rename or delete devices that use Portworx reserved keywords in their names before retrying the node wipe operation. Furthermore, it is recommended not to use Portworx reserved keywords such as px, pwx, pxmd, px-metadata, pxd, or pxd-enc while setting up devices or volumes, to avoid encountering such issues. Components: Storage Affected versions: 3.0.0 | Major |
PD-2665 | During a pool expansion operation, if a cloud-based storage disk drive provisioned on a node is detached before the completion of the pool resizing or rebalancing, you can see the show drives: context deadline exceeded error in the output of the pxctl sv pool show command.Workaround: Ensure that cloud-based storage disk drives involved in pool expansion operations remain attached until the resizing and rebalancing processes are fully completed. In cases where a drive becomes detached during this process, hard reboot the node to restore normal operations. Component: PX-StoreV2 Affected versions: 3.0.0, 3.1.0 | Major |
PD-2833 | With Portworx 3.1.0, migrations might fail between two clusters if one of the clusters is running a version of Portworx older than 3.1.0, resulting in a key not found error.Workaround: Ensure that both the source and destination clusters are upgraded to version 3.1.0 or newer. Components: DR & Migration Affected Versions: 3.1.0 | Minor |
PD-2644 | If an application volume contains a large number of files (e.g., 100,000) in a directory, changing the ownership of these files can take a long time, causing delays in the mount process. Workaround: If the ownership change is taking a long time, Portworx by Pure Storage recommends setting fsGroupChangePolicy to OnRootMismatch (see the example after this table). For more information, see the Kubernetes documentation. Components: Storage Affected versions: 2.13.x, 3.0.x | Minor |
PD-2359 | When a virtual machine is transferred from one hypervisor to another and Portworx is restarted, the CSI container might fail to start properly and shows the CrashLoopBackoff error.Workaround: Remove the topology.portworx.io/hypervisor label from the affected node.Components: CSI Affected versions: 2.13.x, 3.0.x | Minor |
PD-2579 | When the Portworx pod (oci-mon ) cannot determine the management IP used by the Portworx container, the pxctl status command output on this pod shows a Disabled or Unhealthy status.Workaround: This issue is related to display only. To view the correct information, run the following command directly on the host machine: kubectl exec -it <oci-mon pod> -- nsenter --mount=/host_proc/1/ns/mnt -- pxctl status .Components: oci-monitor Affected versions: 2.13.0 | Minor |
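For reference, the following is a minimal sketch of where the fsGroupChangePolicy setting mentioned in PD-2644 fits in a pod spec. This is the standard Kubernetes securityContext field rather than a Portworx-specific option, and the pod, image, and PVC names are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-many-files        # illustrative name
spec:
  securityContext:
    fsGroup: 2000
    # Skip the recursive ownership change when the volume root already matches,
    # avoiding long mount delays on volumes containing many files.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
  - name: app
    image: nginx                   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: px-data-pvc       # illustrative PVC backed by a Portworx volume
```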
3.0.5
April 17, 2024
Visit these pages to see if you're ready to upgrade to this version:
For users currently on Portworx versions 2.11.x, 2.12.x, or 2.13.x, Portworx by Pure Storage recommends upgrading to Portworx 3.0.5 instead of moving to the next major version.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36858 | When using the HashiCorp Vault integration with misconfigured authentication, Portworx nodes kept attempting to connect to the Vault service. User Impact: The excessive login attempts caused the Vault service to crash. Resolution: Portworx now uses exponential back-off to reduce the frequency of login attempts to the Vault service. Components: Secret Store Affected Versions: 3.0.4 | Critical |
PWX-36873 | When Portworx uses HashiCorp Vault configured with Kubernetes or AppRole authentication, it automatically refreshes expired access tokens. However, if the Kubernetes Service Account was removed or the AppRole expired, the token refresh failed. User Impact: Excessive attempts to refresh the access tokens caused the Vault service to crash, especially in large clusters. Resolution: The Portworx node now identifies excessive errors from the Vault service and avoids accessing Vault for a cooling-off period of 5 minutes. Components: Secret Store Affected Versions: 3.0.3 | Major |
PWX-36847 | When a Kubernetes API call failed, Portworx incorrectly assumed the node was in the default empty zone and tried to attach drives belonging to that zone. Because no drives existed in the default zone, Portworx created a new set of drives, assuming the node was in a different zone. User Impact: This led to duplicate entries and the cluster went out of quorum. Resolution: Portworx no longer treats the default zone as a special zone. This allows Portworx to check for any existing drives that are already attached or available to be attached from any zone before trying to create new ones. Components: Cloud Drives Affected Versions: 3.0.3 | Major |
PWX-36786 | An offline, storageless node was incorrectly auto-decommissioned due to specific race conditions, resulting in the clouddrive DriveSet being left orphaned. User Impact: Portworx failed to start when attempting to operate as a storageless node using this orphaned clouddrive DriveSet, due to the node being in a decommissioned state. Resolution: Portworx now automatically cleans up such orphaned storageless clouddrive DriveSets, allowing it to start successfully. Components: Cloud Drive Affected Versions: 2.13.x, 3.0.x, and 3.1.x | Major |
3.0.4
November 15, 2023
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-34315 | Improved how Portworx identifies pods with volumes in the Read-Only state before restarting them. | Storage |
PWX-34153 | CSI sidecar images are updated to the latest open source versions. | CSI |
PWX-34029 | Portworx now removes stale FlashArray multipath devices upon startup, which may result from pod failovers (for FlashArray Direct Access) or drive set failovers (for FlashArray Cloud Drives) while Portworx was not running. These stale devices had no direct impact but could have led to slow operations if many were present. | FA-FB |
PWX-34974 | Users can now configure the duration (set to 15 minutes by default) after which the logs are refreshed to provide the most up-to-date statistics for FlashBlade volumes, using the following command: pxctl cluster options update --fb-stats-expiry-duration <time-in-minutes> The minimum refresh duration is one minute. See the example after this table. | FA-FB |
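As a usage sketch of the option above, the following command sets the FlashBlade statistics refresh duration to 30 minutes; the value is illustrative and must be at least one minute.

```shell
# Refresh FlashBlade volume statistics every 30 minutes (illustrative value)
pxctl cluster options update --fb-stats-expiry-duration 30
```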
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-34334 | Cloudsnaps of an aggregated volume with a replication level of 2 or more uploaded incorrect data if one of the replica nodes from which a previous cloudsnap operation had been executed was down. User Impact: The most recent snapshots were lost. Resolution: Portworx now forces a full backup in scenarios where the previous cloudsnap node is down. Components: Cloudsnaps Affected versions: 3.0.x | Critical |
PWX-33632 | If an attach request remained in the processing queue for a long time, it would lead to a panic. User Impact: Portworx would restart on the node. This was because an FA attach operation involved making REST API calls to FA, as well as running iSCSI rescans, which consumed more time. When Portworx received a high volume of requests to attach FA DirectAccess volumes, the queue for these attach requests gradually grew over time, leading to a panic in Portworx. Resolution: The timeout for queued attach requests has been increased to 15 minutes for FA DirectAccess volumes. Components: FA-FB Affected versions: 2.13.x, 3.0.x | Critical |
PWX-34885 | When NFS proxy volumes were created, it resulted in the restart of the Portworx service. User Impact: Although NFS proxy volumes were created, the service restart affected user applications. Resolution: Portworx now creates NFS proxy volumes successfully without restarting the Portworx service. Components: Storage Affected versions: 3.0.2 | Critical |
PWX-34277 | When an application pod using an FA Direct Access volume was failed over to another node, and Portworx was restarted on the original node, the pod on the original node became stuck in the Terminating state. User Impact: Portworx didn't clean up the mountpaths where the volume had previously been attached, as it couldn't locate the application on the local node. Resolution: Portworx now cleans up the mountpath even when the application is not found on the node. Components: FA-FB Affected versions: 2.13.x, 3.0.x | Major |
PWX-30297 | Portworx failed to restart when a multipath device was specified for the internal KVDB. Several devices with the kvdbvol label were found for the multipath device. Portworx selected the first device on the list, which might not have been the correct one.User Impact: Portworx failed to start because it selected the incorrect device path for KVDB. Resolution: When a multipath device is specified for the internal KVDB, Portworx now selects the correct device path. Components: KVDB Affected versions: 2.11.x | Major |
PWX-33935 | When the --sources option was used in the pxctl volume ha-update command for the aggregated volume, it caused the Portworx service processes to abort with an assertion.User Impact: The Portworx service on all nodes in the cluster continuously kept restarting. Resolution: Contact the Portworx support team to restore your cluster. Components: Storage Affected versions: 2.13.x, 3.0.x | Major |
PWX-33898 | When two pods, both using the same RWO FA Direct Access volume, were started on two different nodes, Portworx would move the FA Direct Access volume attachment to the node where the most recent pod was running, rather than rejecting the setup request for the second pod. User Impact: A stale FA Direct Access multipath device remained on the original node where the first pod was started, causing subsequent attach or mount requests on that node to fail. Resolution: A second pod request for the same RWO FA Direct Access volume on a different node will now be rejected if such a FA Direct Access volume is already attached and in use on another node. Components: FA-FB Affected versions: 2.13.11 | Major |
PWX-33828 | If you deleted a FA Direct Access PVC attached to an offline Portworx node, Portworx removed the associated volume from its KVDB. However, the FlashArray did not delete its associated volume because it remained connected to the offline node on the FlashArray. User Impact: This created orphaned volumes on the FlashArray. Resolution: Portworx now detects a volume that is attached to an offline Portworx node and will disconnect it from all the nodes in the FlashArray and avoid orphaned volumes. If there are any existing orphaned volumes, clean them manually. Components: FA-FB Affected versions: 2.13.8 | Major |
3.0.3
October 11, 2023
Notes
- This version addresses security vulnerabilities.
- Starting with version 3.0.3, aggregated volumes with PX-StoreV2 are not supported.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-32255 | Now the runtime option fast_node_down_detection is enabled by default. This option allows quick detection of when the Portworx service goes offline. | Storage |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33113 | Portworx reduced the pricing for GCP Marketplace from 55 cents/node/hour to 33 cents/node/hour, but this change was not being reflected for existing users who were still reporting billing to the old endpoint. User Impact: Existing GCP Marketplace users were being incorrectly billed at the previous rate of 55 cents/node/hour. Resolution: Upgrade Portworx to version 3.0.3 to reflect the new pricing rate. Components: Billing Affected versions: 2.13.8 | Critical |
PWX-34025 | In certain cases, increasing the replication level of a volume on a PX-StoreV2 cluster created new replicas with non-zero blocks that had been overwritten with zeros on the existing replicas. User Impact: The Ext4 filesystem reported a mismatch and delayed allocation failures when a user application attempted to write data to the volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the volume.Components: PX-StoreV2 Affected versions: 3.0.2 | Major |
3.0.2
September 28, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-32226 | AWS users can now choose to enable server-side encryption for s3 credentials, assuming the s3 object-store provider supports it. Use the --s3-sse flag with either the AES256 or aws:kms value. | Cloudsnaps |
PWX-33229 | Previously, a Portworx license would expire if Portworx could not reach its billing server within 72 hours. Now users can continue to use Portworx for up to 30 days if the billing servers are not reachable. | Licensing |
PWX-31233 | Portworx has removed volume size enforcement for FlashArray and FlashBlade Direct Access volumes. This will allow users to create volumes greater than 40TiB for all license types. | Licensing |
PWX-33551 | Users can now configure the REST API call timeout (in seconds) for FA/FB by adding the new environment variable PURE_REST_TIMEOUT to the StorageCluster. When updating this value, the execution timeout should also be updated accordingly using the following command: pxctl cluster options update --runtime-options execution_timeout_sec=<sec> By default, PURE_REST_TIMEOUT is set to 8 seconds and execution_timeout_sec to 60 seconds. Contact Portworx support to find the right values for your cluster. See the example after this table. | FA-FB |
PWX-33364 | As part of FlashArray integration, Portworx has now reduced the number of API calls it makes to the arrays endpoint on FA. | FA-FB |
PWX-33593 | Portworx now caches certain FlashArray attachment system calls, improving the performance of mount operations for FA backed volumes on nodes with large numbers of attached devices, or many redundant paths to the array. | FA-FB |
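The following is a minimal sketch of the PURE_REST_TIMEOUT configuration described in PWX-33551. It assumes an Operator-managed StorageCluster (core.libopenstorage.org/v1); the cluster name, namespace, and the 30-second and 120-second values are illustrative, so confirm the right values for your environment with Portworx support.

```yaml
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster               # illustrative name
  namespace: portworx
spec:
  env:
  - name: PURE_REST_TIMEOUT      # FA/FB REST API call timeout, in seconds
    value: "30"
```

```shell
# Update the matching execution timeout (illustrative value)
pxctl cluster options update --runtime-options execution_timeout_sec=120
```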
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33451 | In certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data was expected. User Impact: Ext4 filesystem complained about a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume.Components: Storage Affected versions: 3.0.0, 2.12.x, 2.13.x | Critical |
PWX-33258 | Sometimes, Portworx timed out FlashBlade Direct Access volume creation when it took over 30 seconds. User Impact: Volume creation stayed in a pending state. Resolution: The timeout for FB volume creation has been increased to 180 seconds (3 minutes) to allow more time for FBDA volume creation. Users can now use the --fb-lock-timeout cluster option to increase the timeout for FB volume creation beyond 180 seconds (3 minutes). Components: FA-FB Affected versions: 2.13.6 | Critical |
PWX-32428 | In the PKS environment, sharedv4 mounts failed on the remote client node with the error No such file or directory . User Impact: Restarts of the Portworx pods and service led to excessive mounts (mount leaks) on PKS platforms, progressively slowing down IO operations on the node. Resolution: Portworx now uses different mountpoints on the PKS platform. If you are experiencing slowdowns on a PKS node, upgrade the Operator to the latest version, and reboot the affected PKS nodes. Components: Sharedv4 Affected versions: 2.12.x, 2.13.x | Critical |
PWX-33388 | The standalone SaaS metering agent crashed the Portworx container with a nil panic error. User Impact: This caused the Portworx container on one node to crash continuously. Resolution: Upgrade to 3.0.2 if you are using a SaaS license to avoid this issue. Components: Control Plane Affected versions: 3.0.1, 3.0.0 | Critical |
PWX-32074 | The CPU core numbers were wrongly detected by the px-runc command. User Impact: Portworx did not start on the requested cores. Resolution: The behavior of the --cpuset-cpus argument of the px-runc install command has been fixed. Users can now specify the CPUs on which Portworx execution should be allowed. Components: px-runc Affected versions: 2.x.x | Critical |
PWX-33112 | Timestamps were incorrectly recorded in the write-ahead log. User Impact: The write operations were stuck due to a lack of log reservation space. Resolution: Portworx now consistently flushes timestamp references into the log. Components: Storage Affected versions: 2.12.x, 2.13.x | Critical |
PWX-31605 | The pool expansion failed because the serial number from the WWID could not be extracted. User Impact: FlashArray devices (both cloud drives and direct access) encountered expansion or attachment failures when multipath devices from other vendors (such as HPE or NetApp) were attached. Resolution: This issue has been fixed. Components: Pool Management Affected versions: 2.13.2 | Critical |
PWX-33120 | Too many unnecessary vSphere API calls were made by Portworx. User Impact: An excess of API calls and vSphere events could have caused confusion and distraction for users of vSphere Cloud Drives. Resolution: If you are seeing many vSphere VM Reconfigure events at a regular interval in the clusters configured with Portworx Cloud Drives, upgrade Portworx to the latest version. Components: Metering & Billing Affected versions: 3.0.0 | Major |
PWX-33299 | When using a custom image registry, OCI-Monitor was unable to locate the Kubernetes namespaces from which to pull secrets. User Impact: Portworx installation failed with the error Failed retrieving default/tcr-pull-cpaas-5000 . Resolution: Portworx now consults the container runtime and Kubernetes to determine the correct Kubernetes namespace for Portworx installation. Components: OCI Monitor Affected versions: 3.0.0, 2.13.x, 2.12.x | Major |
PWX-31840 | When resizing a volume, the --provisioning-commit-labels cluster option was not honored, resulting in unlimited thin provisioning. User Impact: Portworx volumes were resized to large sizes without rejections, exceeding pool provisioning limits. Resolution: Now the --provisioning-commit-labels cluster option is honored during resizing volumes and prevents unexpected large volumes.Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-32572 | When using the older Containerd versions (v1.4.x or 1.5.x), Portworx kept opening connections to Containerd, eventually depleting all the file-descriptors available on the system. User Impact: Portworx nodes crashed with the too many open files error. Resolution: Portworx no longer leaks the file-descriptors when working with older Containerd versions. Components: OCI Monitor Affected versions: 2.13.6, 3.0.0 | Minor |
PWX-30781 | The Kubernetes version parameter (?kbver ) in the air-gapped script did not process the version extension. User Impact: The script generated the wrong image URLs for the Kubernetes-dependent images. Resolution: Parsing of the kbver parameter has been fixed. Components: Spec Generator Affected versions: 3.0.0 | Minor |
3.0.1
September 3, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33389 | The Portworx CSI license validation for FA/FB failed when Purity was upgraded to version 6.4.2 or newer, causing the Portworx license status to appear expired. User Impact: Users could not create new volumes. Resolution: The auth token is no longer used by Portworx when making API or api_version calls to FA during license validation. Components: FA-FB Affected versions: 3.0.0 | Critical |
PWX-33223 | Portworx was hitting a panic when a value was set for an uninitialized object. User Impact: This caused the Portworx container to crash and restart. Resolution: Upgrade to Portworx version 3.0.1 if using Pure cloud drives. Components: FA-FB Affected versions: 3.0.0 | Major |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. . This issue can be caused by an orphaned storageless node. Workaround: Verify if the node which has this error is a storageless node. If it is, delete the orphaned storageless node using the command: pxctl clouddrive delete --node <> to progress the upgrade. |
3.0.0
July 11, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
Portworx 3.0.0 requires Portworx Operator 23.5.1 or newer.
New features
Portworx by Pure Storage is proud to introduce the following new features:
-
AWS users can now deploy Portworx with the PX-StoreV2 datastore. In order to have PX-StoreV2 as your default datastore, your cluster should pass the preflight check, which verifies your cluster's compatibility with the PX-StoreV2 datastore.
-
You can now provision and use cloud drives on FlashArrays that are in the same zone using the CSI topology for FlashArray Cloud Drives feature. This improves fault tolerance for replicas, performance, and manageability for large clusters.
-
For environments such as GCP and Anthos that follow blue-green upgrade model, Portworx allows temporary license extension to minimize downtime during upgrades. Once you start the license expansion, the Portworx cluster's license will temporarily be extended to accommodate up to double the number of licensed nodes. While the existing nodes (called blue nodes) serve production traffic, Portworx will expand the cluster by adding new nodes (called green nodes) that have upgraded Linux OS or new hardware.
-
Portworx now offers the capability to utilize user-managed keys for encrypting cloud drives on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE). By leveraging powerful encryption algorithms, the Oracle disk encryption feature converts data into an encrypted format, ensuring that unauthorized individuals cannot access it. You can specify the encryption key in the StorageCluster using the following cloud-drive volume specifications:
type=pv-<number-of-vpus>,size=<size-of-disk>,kms=<ocid-of-vault-key>
-
Portworx now enables you to define custom tags for cloud drives provisioned across various platforms such as AWS, Azure, GCP, and Oracle cloud. While installing Portworx, you can specify the custom tags in the StorageCluster spec:
type=<device-type>,size=<volume-size>,tags=<custom-tags>
This user-defined metadata enhances flexibility and organization, and provides additional contextual information for objects stored in the cloud, giving users improved data management, better search capabilities, and greater control over their cloud-based data. A combined StorageCluster example covering both of the cloud-drive specifications above follows this list.
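The following is a minimal sketch of how the two cloud-drive specifications above fit into a StorageCluster, assuming the Operator-managed spec.cloudStorage.deviceSpecs field; the cluster name and namespace are illustrative, and the placeholders are the same as in the formats above and must be replaced with values appropriate for your cloud provider.

```yaml
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster               # illustrative name
  namespace: portworx
spec:
  cloudStorage:
    deviceSpecs:
    # Oracle OKE drive encrypted with a user-managed Vault key
    - type=pv-<number-of-vpus>,size=<size-of-disk>,kms=<ocid-of-vault-key>
    # Cloud drive with user-defined custom tags
    - type=<device-type>,size=<volume-size>,tags=<custom-tags>
```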
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-29486 | Portworx now supports online expansion of storage pools containing auto journal devices with disk-resize operation. | Pool Management |
PWX-29435 | When you run the pxctl sv pool expand -o add-disk command, the common disk tags from existing drives will be attached to the newly added cloud-drive disk. | Pool Management |
PWX-28904 | Storage pool expansion now supports online pool resizing on Azure with no downtime, provided that the requirements in Microsoft's documentation are met. | Pool Management |
PWX-30876 | Pool expansion with add-disk operation is now supported for repl1 volumes. | Pool Management |
PWX-29863 | The pool expansion completion message is improved to Pool resize requested successfully. Please check resize operation status with pxctl sv pool show . | Pool Management |
PWX-28665 | The pxctl cd list command now lists cloud-drives on nodes with local drives. | Cloud Drives |
PWX-28697 | FlashArray cloud drives now show information about the array they are located on. Use pxctl cd inspect to view this information. | Cloud Drives |
PWX-29348 | Added 3 new fields to the CloudBackupSize API to reflect the correct backup size:
| Cloudsnaps |
PWX-27610 | Portworx will now periodically defragment the KVDB database. KVDB will be defragmented every 2 weeks by default, if the DB size is greater than 100 MiB. You can also configure the defragment schedule using the following options with the pxctl cluster options update command:
| KVDB |
PWX-31403 | For AWS clusters, Portworx now defaults the following configurations for dedicated KVDB disk:
| KVDB |
PWX-31055 | The alert message for VolumeSpaceLow is improved to show clear information. | Storage |
PWX-29785 | Improved the implementation to restrict the nodiscard and autofstrim flags on XFS volumes. These two flags are disabled for volumes formatted with XFS. | PX-StoreV1 |
PWX-30557 | Portworx checks pool size and drive count limits before resizing the storage pool. It will abort with a proper error message if the resolved pool expansion plan exceeds limits. | PX-StoreV2 |
PWX-30820 | Portworx now redistributes cloud migration requests received from Stork between all the nodes in the cluster using a round-robin mechanism. This helps evenly distribute the migration workload across all the nodes in the cluster and avoids hot spots. | DR & Migration |
PWX-29428 | Portworx CSI images now use the registry.k8s.io registry. | CSI |
PWX-28035 | Portworx now supports distributing FlashArray Cloud Drive volumes among topologically distributed FlashArrays. | FA-FB |
PWX-31500 | The pxctl cluster provision-status command will now show more states of a pool. | CLI |
PWX-31257 | The pxctl alerts show command with the --start-time and --end-time options can now be used independently. | Monitoring |
PWX-30754 | Added support for leases permission to the PVC controller ClusterRole. | Spec Generation |
PWX-29202 | pxctl cluster provision-status will now show the host name for nodes. The host name helps you to correlate that command's output with the node list provided by pxctl status . | CLI |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-30030 | Some volumes incorrectly showed Not in quorum status. User Impact: Portworx volumes were out of quorum after a network split even though all the nodes and pools for the volume's replicas were online and healthy. This happened when the node could communicate over the network with KVDB but not with the rest of the nodes. Resolution: Restart the Portworx service on the node where the volume is currently attached. Components: Storage Affected versions: 2.12.2 | Critical |
PWX-30511 | When autofstrim was disabled, internal autofstrim volume information was not removed completely. User Impact: An error occurred while running manual fstrim. Resolution: This issue has been fixed. Components: Storage Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30294 | The pvc-controller pods failed to start in the DaemonSet deployment. User Impact: The pvc-controller failed due to the deprecated values of the --leader-elect-resource-lock flag.Resolution: These values have been removed to use the default leases value.Components: Spec Generator Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30930 | The KVDB cluster could not form a quorum after KVDB was down on one node. User Impact: On a loaded cluster, or when the underlying KVDB disk had latency issues, KVDB nodes failed to elect a leader among themselves. Resolution: Increase the heartbeat interval using the runtime option kvdb-heartbeat-interval=1000 (see the example after this table). Components: KVDB Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30985 | Concurrent pool expansion operations using add-disk and auto resulted in pool expansion failure, with the error mountpoint is busy .User Impact: Pool resize requests were rejected. Resolution: Portworx now serializes pool expansion operations. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30685 | In clusters running with cloud drives and auto-journal partitions, pool deletion resulted in deleting the data drive with an auto-journal partition. User Impact: Portworx had issues restarting after the pool deletion operation. Resolution: Upgrade to the current Portworx version. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30628 | The pool expansion would result in a deadlock when it had a volume in a re-sync state and the pool was already full. User Impact: Pool expansion would get stuck if a volume in the pool was in a re-sync state and the pool was full. No new pool expansions can be issued on such a pool. Resolution: Pool expansion will now be aborted immediately if it detects an unclean volume in the pool. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30551 | If a diagnostics package collection was triggered during a node initialization, it caused the node initialization to fail and the node to restart. User Impact: The node restarted when node initialization and diagnostics package collection occurred at the same time. Resolution: Now diagnostics package collection will not restart the node. Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-29976 | Cloud drive creation failed when a vSphere 8.0 datastore cluster was used for Portworx installation. User Impact: Portworx failed to install on vSphere 8 with datastore clusters. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.13.1 | Major |
PWX-29889 | Portworx installation with local install mode failed when both a journal device and a KVDB device were configured simultaneously. User Impact: Portworx would not allow creating multiple disks in a local mode install. Resolution: This issue has been fixed. Components: KVDB Affected versions: 2.12.x, 2.13.x | Major |
PWX-29512 | In certain cases, a KVDB node failover resulted in inconsistent KVDB membership, causing an orphaned entry in the cluster. User Impact: The cluster operated with one less KVDB node. Resolution: Every time Portworx performs a KVDB failover, if it detects an orphaned node, Portworx removes it before continuing the failover operation. Components: KVDB Affected versions: 2.13.x | Major |
PWX-29511 | Portworx would remove an offline internal KVDB node as part of its failover process, even when it was not part of quorum. User Impact: The KVDB cluster would lose quorum and required manual intervention to restore its functionality. Resolution: Portworx will not remove a node from the internal KVDB cluster if it is out of quorum. Components: KVDB Affected versions: 2.13.x | Major |
PWX-28287 | Pool expansion on an EKS cluster failed while optimization of the associated volume(s) was in progress. User Impact: Pool expansion was unsuccessful. Resolution: Portworx now catches these scenarios early in the pool expansion process and provides a clear and readable error message to the user. Components: Cloud Drives Affected versions: 2.12.x, 2.13.x | Major |
PWX-28590 | In vSphere local mode install, storageless nodes (disaggregated mode) would claim storage ownership of a hypervisor if it was the first to boot up. This meant that a node capable of creating storage might not be able to get ownership. User Impact: In vSphere local mode, Portworx installed in degraded mode. It occurred during a fresh install or when an existing storage node was terminated. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.12.1 | Major |
PWX-30831 | On EKS, if the cloud drives were in different zones or removed, Portworx failed to boot up in certain situations. User Impact: Portworx did not start on an EKS cluster with removed drives. Resolution: Portworx now ignores zone mismatches and sends alerts for deleted drives. It no longer aborts the boot-up process and continues to the next step. Components: Cloud Drives Affected versions: 2.12.x, 2.13.x | Major |
PWX-31349 | Sometimes Portworx processes on the destination or DR cluster would restart frequently due to a deadlock between the node responsible for distributing the restore processing and the code attempting to attach volumes internally. User Impact: Restore operations failed. Resolution: This issue has been fixed. Components: DR and Migration Affected versions: 2.12.x, 2.13.x | Major |
PWX-31019 | During cloudsnap backup/restore, a crash was occasionally caused by an array index out of range error in the preferredNodeForCloudsnap function. User Impact: Cloudsnap restore failed. Resolution: This issue has been fixed. Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-30246 | Portworx NFS package installation failed due to a lock held by the unattended-upgrade service running on the system. User Impact: Sharedv4 volume mounts failed. Resolution: Portworx NFS package installation now waits for the lock, then installs the required packages. This issue is resolved after upgrading to the current version and restarting the Portworx container. Components: Sharedv4 Affected versions: 2.11.2, 2.12.1 | Major |
PWX-30338 | VPS pod labels were not populated in the Portworx volume spec. User Impact: VPS using the podMatchExpressions field in a StatefulSet sometimes failed to function correctly because volume provisioning and pod creation occurred at the same time. Resolution: Portworx now ensures that volume provisioning collects the pod name before provisioning. Components: Volume Placement Strategies Affected versions: 2.12.x, 2.13.x | Minor |
PWX-28317 | A replica set was incorrectly created for proxy volumes. User Impact: When a node was decommissioned, it got stuck if a proxy volume’s replica set was on that node. Resolution: Now replica sets are not created for proxy volumes. Components: Proxy Volumes Affected versions: 2.11.4 | Minor |
PWX-29411 | In vSphere, when creating a new cluster, KVDB disk creation failed for a selected KVDB node. User Impact: In the local mode install, the KVDB disk creation failures resulted in wrongly giving up ownership of a hypervisor. This created two storage nodes on the same hypervisor. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.12.1, 2.13.x | Minor |
PWX-28302 | The pool expand command failed to expand an existing pool size when it was increased by 4 GB or less. User Impact: If the user expanded the pool by 4 GB or less, the pxctl sv pool expand command failed with an invalid parameter error.Resolution: Increase the pool size by at least 4 GB. Components: PX-StoreV2 Affected versions: 2.12.x, 2.13.x | Minor |
PWX-30632 | NFS backupLocation for cloudBackups failed with the error validating credential: Empty name string for nfs . The NFS name used by Portworx to mount the NFS server was not passed to the required function. User Impact: Using BackupLocations for NFS targets failed. Resolution: Portworx now passes the credential name to the function that uses the name to mount the NFS server. Components: Cloudsnaps Affected versions: 2.13.x | Minor |
PWX-25792 | During the volume mount of FA/FB DA volumes, Portworx did not honor the nosuid mount option specified in the storage class.User Impact: Post migration from PSO to Portworx, volumes with the nosuid mount option failed to mount on the host.Resolution: Portworx now explicitly sets the nosuid mount option in the mount flag before invoking the mount system call.Components: FA-FB Affected versions: 2.11.0 | Minor |
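As a sketch of the workaround in PWX-30930 above, and assuming the runtime option is applied through the same --runtime-options cluster option used elsewhere in these notes, the heartbeat interval could be raised as follows; check with Portworx support before changing KVDB tuning on a production cluster.

```shell
# Increase the internal KVDB heartbeat interval (value from PWX-30930);
# the --runtime-options delivery mechanism shown here is an assumption.
pxctl cluster options update --runtime-options kvdb-heartbeat-interval=1000
```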
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2149 | Portworx 3.0.0 cannot be installed using the Rancher catalog chart. You should use PX-Central to generate the Portworx spec. |
PD-2107 | If there is a ha-update operation while the volume is in a detached state, a different node might start publishing the volume metrics, but the old node won’t stop publishing the volume metrics. This results in duplicate metrics, and only one will have the correct currhalevel.Workaround: For detached volumes, before doing a ha-update , attach the volume manually through pxctl . |
PD-2086 | Portworx does not support Oracle API signing keys with a passphrase. Workaround: Use API signing keys without a passphrase. |
PD-2122 | The add-drive operation fails when a drive is added to an existing cloud-based pool.Workaround: Use the pxctl service pool expand -operation add-disk -uid <pool-ID> -size <new-storage-pool-size-in-GiB> command to add a new drive to such pools. |
PD-2170 | The pool expansion can fail on Google Cloud when using the pxctl service pool expand -operation add-disk command with the error Cause: ProviderInternal Error: googleapi: Error 503: Internal error. Please try again or contact Google Support. Workaround: Rerun the command. |
PD-2188 | In OCP 4.13 or newer, when the application namespace or pod is deleted, application pods that use Portworx sharedv4 volumes can get stuck in the Terminating state. The output of the ps -ef --forest command for the stuck pod showed that the conmon process had one or more defunct child processes. Workaround: Find the nodes on which the sharedv4 volume(s) used by the affected pods are attached, then restart the NFS server on those nodes with the systemctl restart nfs-server command. Wait for a couple of minutes. If the pod is still stuck in the Terminating state, reboot the node on which the pod is running. The pod might take several minutes to release after a reboot. |
PD-2209 | When Portworx is upgraded to version 3.0.0 without upgrading Portworx Operator to version 23.5.1, telemetry is disabled. This is because the port is not updated for the telemetry pod. Workaround: Upgrade Portworx Operator to the latest version and bounce the Portworx pods manually. |
PD-2615 | Migrations triggered as part of Async DR will fail in the "Volume stage" when Portworx is configured with PX-Security on the source and destination clusters. Workaround: Please contact support if you encounter this issue. |
Known issues (Errata) with PX-StoreV2 datastore
Issue Number | Issue Description |
---|---|
PD-2138 | Scaling down the node groups in AWS results in node termination. After a node is terminated, the drives are moved to an available storageless node. However, in some cases, after migration the associated pools remain in an error state. Workaround: Restart the Portworx service, then run a maintenance cycle using the pxctl sv maintenance --cycle command. |
PD-2116 | In some cases, re-initialization of a node fails after it is decommissioned and wiped with the error Failed in initializing drives on the node x.x.x.x : failed to vgcreate . Workaround: Reboot the node and retry initializing it. |
PD-2141 | When cloud drives are detached and reattached manually, the associated pool can go down and remain in an error state. Workaround: Restart the Portworx service, then run a maintenance cycle using the pxctl sv maintenance --cycle command. |
PD-2153 | If the add-drive operation is interrupted by a drive detach, scale down or any other operation, the pool expansion can get stuck.Workaround: Reboot the node. |
PD-2174 | When you add unsupported drives to the StorageCluster spec of a running cluster, Portworx goes down. Workaround: Remove the unsupported drive from the StorageCluster spec. The Portworx Operator will recreate the failed pod and Portworx will be up and running again on that node. |
PD-2208 | Portworx on-premises with PX-StoreV2 fails to upgrade to version 3.0.0. Workaround: Replace -T dmthin with -T px-storev2 in your StorageCluster, as the dmthin flag is deprecated. After updating the StorageCluster spec, restart the Portworx nodes. |
2.13.12
March 05, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-35603 | When running Portworx on older Linux systems (specifically those using GLIBC 2.31 or older) with newer versions of Kubernetes, Portworx failed to detect dynamic updates of pod credentials and tokens. This led to Unauthorized errors when using Kubernetes client APIs.Resolution: Portworx now correctly processes dynamic token updates. |
PWX-29750 | In certain cases, the cloudsnaps that were using S3 object-stores were not completely deleted because S3 object-stores did not support bulk deletes or were unable to handle large cloudsnaps. This resulted in undeleted cloudsnap objects, leading to unnecessary capacity consumption on S3. Resolution: Portworx now addresses and resolves such cloudsnaps deletion issues. |
PWX-35136 | During cloudsnap deletions, some objects were not removed because the deletion requests exceeded the S3 API's limit for the number of objects that could be deleted at once. This would leave objects on S3 for deleted cloudsnaps, thereby consuming S3 capacity. Resolution: Portworx now ensures that deletion requests do not exceed the S3 API's limit. |
PWX-31019 | An array index out of range error in the preferredNodeForCloudsnap function occasionally caused crashes during cloudsnap backup/restore operations.Resolution: This issue has been fixed, and Portworx now prevents such crashes during cloudsnap backup or restore operations. |
PWX-30030 | Some Portworx volumes incorrectly showed Not in quorum status after a network split, even though all the nodes and pools for the volume's replicas were online and healthy. This happened when the node could communicate over the network with KVDB but not with the rest of the nodes. Resolution: Portworx volumes now accurately reflect their current state in such situations. |
PWX-33647 | When the Portworx process is restarted, it verifies the existing mounts on the system for sanity. If one of the mounts was an NFS mount of a Portworx volume, the mount point verification would hang because Portworx was still in the process of starting up. Resolution: When Portworx is starting up, it now skips the verification of Portworx-backed mount points to allow the startup process to continue. |
PWX-29511 | Portworx would remove an offline internal KVDB node as part of its failover process, even when it was not part of quorum. The KVDB cluster would lose quorum and required manual intervention to restore its functionality. Resolution: Portworx does not remove a node from the internal KVDB cluster if it is out of quorum. |
PWX-29533 | During node initialization with cloud drives, a race condition occasionally occurred between the Linux device manager (udevd) and Portworx initialization, causing node initialization failures. This was because drives were not fully available for Portworx's use, preventing users from adding new nodes to an existing cluster. Resolution: Portworx has increased the number of retries for accessing the drives during initialization to mitigate this failure. |
PWX-35650 | GKE customers encountered a nil panic exception when the provided GKE credentials were invalid. Resolution: Portworx now properly shuts down and logs the error, aiding in the diagnosis of credential-related issues. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2768 | When cloning or capturing a snapshot of a FlashArray Direct Access PVC that is either currently resizing or has encountered a resizing failure, the clone or snapshot creation might fail. Workaround: Initiate the resize operation again on the original volume, followed by the deletion and recreation of the clone or snapshot, or allow for an automatic retry. |
2.13.11
October 25, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
- It is recommended that you upgrade to the latest version of Portworx when upgrading from version 2.13.11.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-34029 | Portworx now removes stale FlashArray multipath devices upon startup, which may result from pod failovers (for FlashArray Direct Access) or drive set failovers (for FlashArray Cloud Drives) while Portworx was not running. These stale devices had no direct impact but could have led to slow operations if many were present. |
PWX-33551 | You can now configure the REST API call timeout (in seconds) for FA/FB by adding the new environment variable PURE_REST_TIMEOUT to the StorageCluster. When updating this value, you should also update the execution timeout using the following command:pxctl cluster options update --runtime-options execution_timeout_sec=<sec> PURE_REST_TIMEOUT is set to 8 seconds and execution_timeout_sec to 60 seconds by default. Contact Portworx support to find the right values for your cluster. This improvement was included in Portworx version 3.0.2 and now is backported to 2.13.11. |
PWX-33229 | Previously, a Portworx license would expire if Portworx could not reach its billing server within 72 hours. Users can now continue to use Portworx for up to 30 days if the billing servers are not reachable. This improvement was included in Portworx version 3.0.2 and now is backported to 2.13.11. |
PWX-33364 | As part of FlashArray integration, Portworx has now reduced the number of API calls it makes to the arrays endpoint on FA. This improvement was included in Portworx version 3.0.2 and now is backported to 2.13.11. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-33828 | If you deleted a FA Direct Access PVC attached to an offline Portworx node, Portworx removed the associated volume from its KVDB. However, the FlashArray did not delete its associated volume because it remained connected to the offline node on the FlashArray. This created orphaned volumes on the FlashArray. Resolution: Portworx now detects a volume that is attached to an offline Portworx node and will disconnect it from all the nodes in the FlashArray and avoid orphaned volumes. |
PWX-33632 | If an attach request remained in the processing queue for a long time, it would lead to a panic, causing Portworx to restart on a node. This was because an FA attach operation involved making REST API calls to FA, as well as running iSCSI rescans, which consumed more time. When Portworx received a high volume of requests to attach FA DirectAccess volumes, the queue for these attach requests gradually grew over time, leading to a panic in Portworx. Resolution: The timeout for queued attach requests has been increased to 15 minutes for FA DirectAccess volumes. |
PWX-33898 | When two pods, both using the same RWO FA Direct Access volume, were started on two different nodes, Portworx would move the FADA volume attachment to the node where the most recent pod was running, rather than rejecting the setup request for the second pod. This resulted in a stale FADA multipath device remaining on the original node where the first pod was started, causing subsequent attach or mount requests on that node to fail. Resolution: A second pod request for the same RWO FA Direct Access volume on a different node will now be rejected if such a FA Direct Access volume is already attached and in use on another node. |
PWX-33631 | CSI volume creation could be slow because Portworx obtains locks to synchronize provisioning requests across worker nodes and distribute workloads evenly. Resolution: If CSI volume creation is slow, upgrade to this version. |
PWX-34277 | When an application pod using an FA Direct Access volume was failed over to another node, and Portworx was restarted on the original node, the pod on the original node became stuck in the Terminating state. This occurred because Portworx didn't clean up the mountpaths where the volume had previously been attached, as it couldn't locate the application on the local node. Resolution: Portworx now cleans up the mountpath even when the application is not found on the node. |
PWX-34334 | Cloudsnaps of an aggregated volume with a replication level of 2 or more uploaded incorrect data if one of the replica nodes from which a previous cloudsnap operation had been executed was down. Resolution: Portworx now forces a full backup in scenarios where the previous cloudsnap node is down. |
PWX-33935 | When the --sources option was used in the pxctl volume ha-update command for the aggregated volume, it caused the Portworx service processes to abort with an assertion. As a result, the Portworx service on all nodes in the cluster continuously kept restarting.Resolution: Contact the Portworx support team to restore your cluster. |
PWX-34025 | In certain cases, increasing the replication level of a volume on a PX-StoreV2 cluster created new replicas with non-zero blocks that were overwritten with zeros on the existing replicas. This caused the ext4 filesystem to report a mismatch and delayed allocation failures when a user application attempted to write data to the volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the volume. This issue has been fixed in Portworx version 3.0.3 and now backported to 2.13.11. |
PWX-33451 | In certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data was expected. This caused the ext4 filesystem to report a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume. This issue has been fixed in Portworx version 3.0.2 and is now backported to 2.13.11. |
PWX-32572 | When using the older Containerd versions (v1.4.x or 1.5.x), Portworx kept opening connections to Containerd, eventually depleting all the file-descriptors available on the system. This caused the Portworx nodes to crash with the too many open files error. Resolution: Portworx no longer leaks the file-descriptors when working with older Containerd versions. This issue has been fixed in Portworx version 3.0.2 and is now backported to 2.13.11. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2474 | In certain scenarios, you might encounter the alert Failed to delete FlashArray Direct Access volume on FlashArray when deleting an FA Direct Access PVC. This occurs when the Portworx volume and the Kubernetes PVC are deleted, but the deletion fails on the FlashArray due to one of the following reasons:
|
PD-2474 | When a Portworx volume is created, it remains in the down - pending state. This occurs due to a race condition when Portworx is restarted while it is performing an FA API call to create a volume, and the volume creation is not completed on the FA side.Workaround: Delete the volume in the down - pending state using the pxctl volume delete command. |
PD-2477 | During FA Direct Access volume resizing, if the network between FlashArray and Portworx is disconnected, the PVC and the Portworx volume reflect the updated size, but the actual size on the FA backend remains unchanged. Workaround: Once the network is connected again, trigger another PVC resize operation to update the size on the FlashArray backend. |
2.13.10
September 3, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33389 | The Portworx CSI license validation for FA/FB failed when Purity was upgraded to version 6.4.2 or newer. This caused the Portworx license status to appear expired, and users were not able to create new volumes. Resolution: This issue has been fixed in Portworx version 3.0.1 and is now backported to 2.13.10. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. . This issue can be caused by an orphaned storageless node. Workaround: Verify if the node which has this error is a storageless node. If it is, delete the orphaned storageless node using the command: pxctl clouddrive delete --node <> to progress the upgrade. |
2.13.9
August 28, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33258 | This issue impacted only users of FlashBlade Direct Access volumes. When FlashBlade Direct Access volume creation took more than 30 seconds, Portworx sometimes timed out on volume creation, leaving volumes in a pending state. Resolution: With this fix, the default timeout for FB volume creation has been increased from 30 seconds to 180 seconds (3 minutes). You can also set this timeout to a higher value using the new cluster option called --fb-lock-timeout . You can tune this as required based on the volume creation times on FlashBlade, as it depends on your performance and network bandwidth. You must set this time in minutes; for example, if you want to set the timeout to 6 minutes: pxctl cluster options update --fb-lock-timeout 6 |
2.13.8
August 24, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-33014 | With Portworx Operator 23.7.0, Portworx can dynamically load telemetry port values specified by the operator. |
PWX-30798 | Users can now schedule fstrim operations. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-33006 | The FlashArray Direct Access PVCs were deleted upon a Portworx restart if they were newly created, not yet attached, and in a "Pending" state. There is no data loss since these were unpopulated volumes. Resolution: Portworx has enhanced the code to no longer delete "Pending" FADA volumes on PX startup. |
PWX-30511 | When auto fstrim was disabled, internal state data did not clear and caused manual fstrim to enter an error state. Resolution: This issue has been fixed. |
2.13.7
July 11, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-31855 | When mounting a large number of PVCs that use FADA volumes, PVC creation took a long time and crashed Portworx. Resolution: The heavyweight list of all devices API has been removed from the attach call, reducing the time taken to attach volumes. |
PWX-30551 | The node restarted when node initialization and diagnostics package collection happened at the same time. Resolution: The diagnostics package collection will not restart the node. |
PWX-21105 | Volume operations such as Attach/Detach/Mount/Unmount would get stuck if a large number of requests were sent for the same volume. Portworx would accept all requests and add them to its API queue. All requests for a specific volume are processed serially. This would cause newer requests to be queued for a longer duration. Resolution: When a request does not get processed within 30s because it is sitting behind other requests in the API queue for the same volume, Portworx will return an error to the client requesting it to try again. |
PWX-29067 | The application pods using FADA volumes were not automatically remounted in read-write mode when one of the multiple configured network interfaces went down. Resolution: Portworx now enables multiple iSCSI interfaces for FlashArray connections. These interfaces must be registered with the iscsiadm -m iface command. Use the --flasharray-iscsi-allowed-ifaces cluster option to restrict the interfaces used by FADA connections. This ensures that if one of the interfaces goes down, the FADA volume stays mounted as read-write. For more details about the flasharray-iscsi-allowed-ifaces flag, see FlashArray and FlashBlade environment variables. |
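The following is a minimal sketch of how the interface registration and cluster option from PWX-29067 might be applied together; the iSCSI iface record name, NIC name, and the value format passed to the cluster option are assumptions rather than part of the release note.

```shell
# Sketch only: register the iSCSI interface that FADA connections should use
# (iface0 and eth1 are placeholder names for your environment).
iscsiadm -m iface -I iface0 -o new
iscsiadm -m iface -I iface0 -o update -n iface.net_ifacename -v eth1

# Restrict FlashArray Direct Access connections to the registered interface(s);
# check the FlashArray and FlashBlade environment variables page for the exact
# expected value format.
pxctl cluster options update --flasharray-iscsi-allowed-ifaces iface0
```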
2.13.6
June 16, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-30569 | Portworx now supports OpenShift version 4.13.0 with kernel version 5.14.0-284.13.1.el9_2.x86_64. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-31647 | If any read-write volume changed to a read-only state, pods using these volumes had to be manually restarted to return the mounts to read-write. Resolution: A background task is now implemented to run periodically (by default every 30 seconds), which checks for read-only volumes and terminates managed pods using them. You can customize this time interval with the --ro-vol-pod-bounce-interval cluster option. This background task is enabled for FA DirectAccess volumes by default. To enable this for all Portworx volumes, use the --ro-vol-pod-bounce all cluster option. |
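A minimal sketch of the cluster options described in PWX-31647; the interval value is only an example, and the unit (seconds) is an assumption based on the 30-second default mentioned above.

```shell
# Sketch only: extend read-only volume detection to all Portworx volumes and
# check every 60 seconds (interval unit assumed to be seconds).
pxctl cluster options update --ro-vol-pod-bounce all
pxctl cluster options update --ro-vol-pod-bounce-interval 60
```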
2.13.5
May 16, 2023
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx can now be deployed from Azure Marketplace with a pay-as-you-go subscription.
2.13.4
May 09, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx by Pure Storage recommends upgrading to Portworx 2.13.4 if you are using Portworx 2.12.0 with Azure managed identity to avoid the PWX-30675 issue, which is explained below.
Fixes
Issue Number | Issue Description |
---|---|
PWX-30675 | During installation of Portworx 2.12.0 on AKS, Portworx checked for the AZURE_CLIENT_SECRET , AZURE_TENANT_ID , and AZURE_CLIENT_ID environment variables. However, users of Azure managed identity had set only the AZURE_CLIENT_ID , resulting in a failed installation. Resolution: This issue has been fixed, and Portworx now checks only for the AZURE_CLIENT_ID environment variable. |
2.13.3
April 24, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
If you are currently using any Portworx 2.12 version, Portworx by Pure Storage recommends upgrading to version 2.13.3 due to the PWX-29074 issue, which is explained below.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-30420 | In Portworx version 2.13.0, a prerequisite check was implemented to detect the versions of the multipath tool with known issues (0.7.x to 0.9.3) during installation or upgrade of Portworx. If a faulty version was detected, it was not possible to install or upgrade Portworx. However, this prerequisite check has now been removed, and Portworx installs or upgrades are not blocked on these faulty versions. Instead, a warning message is displayed, advising customers to upgrade their multipath package. |
PWX-29992 | In Async DR migration, a snapshot was previously created at the start of restores as a fallback in case of errors, but it added extra load with creation and deletion operations. This has been improved: Portworx no longer creates a fallback snapshot, and in error cases users can create clones from the last successfully migrated snapshot if necessary. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-29640 | Incorrect allocation of floating license and insertion of excess data into the Portworx key-value database caused new nodes to repeatedly fail to join the Portworx cluster. Resolution: Cluster-join failures now perform thorough cleanup to remove all temporary resources created during the failed cluster-join attempts. |
PWX-30056 | During migration, if a PVC had the sticky bit set (which prevents volumes from being deleted), the internal snapshots created for the asynchronous DR deployment accumulated, consuming extra storage space. Resolution: The internal snapshots are now created without the sticky bit. |
PWX-30484 | The SaaS license key was not activated when installing Portworx version 2.13.0 or later. Resolution: This issue has been fixed. |
PWX-26437 | Due to a rare corner-case condition, node decommissioning could leave orphaned keys in the KVDB. Resolution: The forced node-decommission command has been modified to perform the node-decommission more thoroughly, and to clean up the orphaned data from the KVDB. |
PWX-29074 | Portworx incorrectly pinged the customer DNS server. At regular intervals, when the /etc/hosts file from the node periodically rsynced with the Portworx runc container, it temporarily removed the mappings for KVDB domain names. As a result, internal KVDB name resolution queries were incorrectly forwarded to the customer's DNS servers. Resolution: This issue has been fixed. |
PWX-29325 | The local snapshot schedule could not be changed using the pxctl CLI. An update to a previously created snapshot failed with the error Update Snapshot Interval: Failed to update volume: This IO profile has been deprecated . Resolution: You can now disable snapshot schedules with the --periodic parameter, as shown in the following command: pxctl volume snap-interval-update --periodic 0 <vol-id> |
PWX-30255 | Log messages are improved to include extra metadata in node mark-down updates. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2063 | In an Async DR deployment, issues can occur if the --sticky flag is set to on for Portworx volumes. Workaround: Turn off the sticky bit flag on the Portworx volumes on the source cluster: PX_POD=$(kubectl get pods -l name=portworx -n <px-namespace> -o jsonpath='{.items[0].metadata.name}') kubectl exec $PX_POD -n <px-namespace> -- /opt/pwx/bin/pxctl volume update <vol-id> --sticky off |
2.13.2
April 7, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27957 | The volume replica level in an Asynchronous DR deployment now matches the source volume's replica level at the end of each migration cycle. |
PWX-29017 | Storage stats are periodically collected and stored to improve Portworx cluster debugging. |
PWX-29976 | Portworx now supports vSphere version 8.0. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23651 | Certain workloads involving file truncates or deletes from large to very small sizes caused the volume to enter an internal error state. The issue is specific to the Ext4 filesystem because of the way it handles file truncates/deletes. As a result, PVC resize/expand operations failed. Resolution: Portworx now recognizes these specific categories of errors as fixable and automatically fixes them during the next mount. |
PWX-29353 | If multiple NFS credentials were created with the same NFS server and export paths, cloudsnaps did not work correctly. Resolution: If the export paths are different with the same NFS server, they now get mounted at different mount points, avoiding this issue. |
PWX-28898 | Heavy snapshot loads caused delays in snapshot completion. This caused replicas to lag and the backend storage pool to keep consuming space. Resolution: You can increase the time Portworx waits for storage to complete the snapshot. This will cause the replicas to remain in the pool until the next Portworx service restart, which performs garbage collection of such replicas. |
PWX-28882 | Upgrades or installations of Portworx on Nomad with cloud drives failed at bootup. Impacted versions: 2.10.0 and later. Resolution: Portworx version 2.13.2 can successfully boot up on Nomad with cloud drives. |
PWX-29600 | The VPS Exists operator did not work when the value of the key parameter was empty. Resolution: The VPS Exists operator now allows empty values for the key parameter without failing. |
PWX-29719 | On FlashArray cloud drive setup, if some iSCSI interfaces could log in successfully while others failed, the FlashArray connection sometimes failed with the failed to log in to all paths error. This prevented Portworx from restarting successfully in clusters with known network issues. |
PWX-29756 | If FlashArray iSCSI attempted to log in several times, it timed out, creating extra orphaned volumes on the FlashArray. Resolution: The number of retries has been limited to 3. |
PWX-28713 | Kubernetes nodes with Fully Qualified Domain Names (FQDNs) detected FlashArray cloud drives as partially attached. This prevented Portworx from restarting successfully if the FlashArray host name did not match the name of the node, such as with FQDNs. |
PWX-30003 | A race condition when updating volume usage in auto fstrim resulted in a Portworx restart. |
2.13.1
April 4, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx can now be deployed from the GCP Marketplace with the following new offerings. You can also change between these offerings after deploying Portworx by changing the value of the PRODUCT_PLAN_ID environment variable within your StorageCluster spec:
  - PX-ENTERPRISE
  - PX-ENTERPRISE-DR
  - PX-ENTERPRISE-BAREMETAL
  - PX-ENTERPRISE-DR-BAREMETAL
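A sketch of switching offerings after deployment; the namespace (portworx), StorageCluster name (px-cluster), and the chosen plan value are placeholders, not values prescribed by this release note.

```shell
# Sketch only: edit the StorageCluster and set PRODUCT_PLAN_ID under spec.env
# (namespace and cluster name are placeholders).
kubectl -n portworx edit storagecluster px-cluster
# In the editor, add or update:
#   spec:
#     env:
#     - name: PRODUCT_PLAN_ID
#       value: PX-ENTERPRISE-DR
```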
Fixes
Issue Number | Issue Description |
---|---|
PWX-29572 | In Portworx 2.13.0, the PSO2PX migration tool would fail with the error pre-create filter failed: CSI PVC Name/Namespace not provided to this request due to a change made in the Portworx CSI Driver. Resolution: For migrating from PSO to Portworx, you should use Portworx 2.13.1. The migration tool will fail with Portworx 2.13.0. |
2.13.0
February 23, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
A known issue with multipath tool versions 0.7.x to 0.9.3 causes high CPU usage and/or multipath crashes that disrupt IO operations. To prevent this, starting with version 2.13.0, Portworx performs a prerequisite check to detect these faulty multipath versions. If this check fails, it will not be possible to install or upgrade Portworx. Portworx by Pure Storage recommends upgrading the multipath tool version to 0.9.4 before upgrading to any Portworx 2.13 version.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- You can now install Portworx on Oracle Container Engine for Kubernetes.
- You can now use Portworx on FlashArray NVMe/RoCE.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27200 | Added new pxctl commands for managing auto fstrim. |
PWX-28351 | You can now enable pay-as-you-go billing for Docker Swarm. |
PWX-27523 | CSI sidecar images are updated to the latest open source versions. |
PWX-27920 | Batching is now enabled in the metrics collector to reduce memory usage on large scale clusters. |
PWX-28137 | The Portworx maintained fork for CSI external-provisioner has been removed in favor of the open source version. |
PWX-28149 | The Portworx CSI Driver now distributes volume deletion across the entire cluster for faster burst-deleting of many volumes. |
PWX-28131 | Pool expansion for repl1 volumes is now supported on all cloud environments, except in certain scenarios. |
PWX-28277 | Updated stork-scheduler deployment and stork-config map in the spec generator to use Kube Scheduler Configuration for Kubernetes version 1.23 or newer. |
PWX-28363 | Reduced the number of vSphere API calls made during Portworx bootup on vSphere. This significantly improves Portworx upgrade times in environments where vSphere servers are overloaded. |
PWX-10054 | Portworx can now monitor the health of an internal KVDB, and when it is detected as unhealthy, Portworx can initiate KVDB failover. |
PWX-27521 | The Portworx CSI driver now supports version 1.7 of the CSI spec. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23203 | In some cases, migration or Asynchronous DR failed when the source volume was being resized. Resolution: On the destination cluster, Portworx now resizes the volume before migration operations. |
PWX-26061 | Deleting cloudsnaps failed with the curl command on a gRPC port. Resolution: A separate field has been added for providing the bucket ID. |
PWX-26928 | Portworx installation would fail when unattended-upgr was running on the system, or when Portworx was unable to lock the packages necessary for installation. Resolution: Re-attempt installation after waiting for the lock to be released. |
PWX-27506 | When a node was down for a long time, cloudsnap restores were taking longer to start. Resolution: Portworx now allows other nodes in the cluster to process such restore requests. |
PWX-28305 | Portworx hit a lock hold timeout assert while detaching sharedv4 service volumes if the Kubernetes API calls were being rate limited. Resolution: To avoid this assert, the Kubernetes API calls are now made outside the context of a lock. |
PWX-28422 | Snapshot and cloudsnapshot requests were failing if a volume was in the detached state and its coordinator had changed IP address. Resolution: Portworx now reattaches the volume with the correct IP address on snapshot and cloudsnapshot requests for detached volumes. |
PWX-28224 | The pxctl cd list command was failing to fetch the cloud drives when run from hot nodes (nodes with local storage). Resolution: This issue has been fixed. |
PWX-28225 | The summary of the command pxctl cd list was showing all nodes as cloud drive nodes.Resolution: The output of the command is reframed. |
PWX-28321 | The output of pxctl cd list was showing storageless nodes even though there were no storageless nodes present in the cluster.Resolution: Wait for the Portworx cleanup job to be completed, which runs every 30 minutes. |
PWX-28341 | In the NodeStart phase, if a gRPC request for getting node stats was invoked before completion of the pxdriver bootstrap, Portworx would abruptly stop. Resolution: Now Portworx returns an error instead of stopping abruptly. |
PWX-28285 | The high frequency of sharedv4 volume operations (such as create, attach, mount, unmount, detach, or delete) requires frequent changes to NFS exports. This was causing the NFS server to stop responding and a potential node restart. Resolution: When applying changes to NFS exports, Portworx now combines multiple changes together and sends a single batch update to the NFS server. Portworx also limits the frequency of NFS server restarts to prevent such issues. |
PWX-28529 | Fixed an issue where volumes with replicas on a node in pool maintenance were temporarily marked as out of quorum when the replica node exited pool maintenance. |
PWX-28551 | In Portworx version 2.12.1, one of the sanitizing operations changed upper case letters to lower case letters. This caused CSI pod registration issues during the upgrade. Resolution: This issue is fixed as Portworx now adheres to the regular expression for topology label values. |
PWX-28539 | During the attachment of FlashArray (FA) NVMe volumes, Portworx performs stale device cleanup. However, this cleanup process sometimes failed when the device was busy, causing the volume attachment to fail. Resolution: The FA NVMe volumes can now be attached, even if the stale cleanup fails. |
PWX-28614 | Fixed a bug where pool expansion of pools with repl1 volumes did not abort. |
PWX-28910 | In a Synchronous DR deployment, if the domains were imbalanced and a new volume was over-provisioned in one domain, all the replicas of the volume would land in the same domain. Resolution: Replicas are now forced to spread across the failure domains during volume creation in a Synchronous DR deployment. If such provisioning is not possible, the volume creation operation fails. You can use the pxctl cluster options update --metro-dr-domain-protection off command to disable this protection. |
PWX-28909 | When an error occurred during CSI snapshots, the Portworx CSI driver was incorrectly marking the snapshot ready for consumption. This resulted in a failure to restore PVCs from a snapshot in this case. Resolution: Create a snapshot and immediately hydrate a new PVC with the snapshot contents. |
PWX-29186 | Fields required for Volume Placement Strategy were missing from the CSI volume spec.VolumeLabels . As a result, Volume Placement Strategies that matched on namespace failed to place volumes correctly. Resolution: While some simple volume placement strategies may work without this fix, users of CSI should upgrade to Portworx version 2.13.0 if they use Volume Placement Strategies. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1859 | When storage is full, a repl 1 volume will be in the NOT IN QUORUM state and a deadlock occurs, so you cannot expand the pool. Workaround: To expand the pool, pass the --dont-wait-for-clean-volumes option as part of the expand command (see the sketch after this table). |
PD-1866 | When using FlashArray Cloud Drives and FlashArray Direct Access volumes, Portworx version 2.13.0 does not support Ubuntu versions 20.04 and 22.x with the default multipath package (version 0.8x). Workaround: Portworx requires version 0.9.4 of the multipath-tools package. Reach out to the support team if you need help building the package. |
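A hedged sketch of the PD-1859 workaround above; the pool UID, target size, and operation type are placeholders, and the exact flag set may differ by Portworx version.

```shell
# Sketch only: expand a pool that holds a repl 1 volume stuck in NOT IN QUORUM
# by skipping the wait for clean volumes (UID and size are placeholders).
pxctl service pool expand --uid <pool-uuid> --size <new-size> \
  --operation add-disk --dont-wait-for-clean-volumes
```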
2.12.6
September 3, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33389 | Validation of the Portworx CSI license for FA/FB failed when Purity was upgraded to version 6.4.2 or newer. This caused the Portworx license status to appear expired, and users were not able to create new volumes. Resolution: This issue has been fixed in Portworx version 3.0.1 and is now backported to 2.12.6. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. This issue can be caused by an orphaned storageless node. Workaround: Verify whether the node reporting this error is a storageless node. If it is, delete the orphaned storageless node using the command pxctl clouddrive delete --node <> so that the upgrade can proceed. |
2.12.5
May 09, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-30003 | Portworx restarted due to an internal race condition caused by high-frequency metadata updates overloading Portworx nodes. Resolution: This issue has been fixed in Portworx version 2.13.2 and now backported to 2.12.5. |
2.12.4
April 26, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx by Pure Storage recommends upgrading to version 2.12.4 as it fixes a regression introduced in 2.12.0, which is explained in the PWX-28551 issue below.
Fixes
Issue Number | Issue Description |
---|---|
PWX-28551 | In Portworx version 2.12.1, one of the sanitizing operations changed upper case letters to lower case letters. This caused CSI pod registration issues during the upgrade. Resolution: This issue has been fixed. |
2.12.3
April 17, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-28285 | The high frequency of sharedv4 volume operations (such as create, attach, mount, unmount, detach, or delete) requires frequent changes to NFS exports. This caused the NFS server to stop responding and a potential node restart. Resolution: This issue has been fixed in Portworx version 2.13.0 and now backported to 2.12.3. |
PWX-29074 | Portworx incorrectly pinged the customer DNS server. At regular intervals, when the /etc/hosts file from the node periodically rsynced with the Portworx runc container, it temporarily removed the mappings for KVDB domain names. As a result, internal KVDB name resolution queries were incorrectly forwarded to the customer's DNS servers. Resolution: This issue has been fixed. Portworx by Pure Storage recommends upgrading to version 2.12.3, if running Portworx 2.12.0, 2.12.1, or 2.12.2. |
2.12.2
January 28, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-28339 | CSI Volumes restored from snapshots were missing PVC name and namespace metadata. This caused failures when using sharedv4 service volumes. Resolution: Portworx now adds the PVC name and namespace to the volume during restore. |
PWX-22828 | If automatic filesystem trim was disabled and then enabled within one minute, then the pxctl volume autofstrim status command incorrectly reported the status: Filesystem Trim Initializing. Please wait .Resolution: This issue has been fixed. |
PWX-28406 | Automatic filesystem trim would skip a volume if any of the replicas for the volume were hosted on a node where there was no pool with ID 0. Resolution: This issue has been fixed. |
2.12.1.4
September 22, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
This is a hotfix release intended for select customers. Please contact the Portworx support team for more information.
Fixes
Issue Number | Issue Description |
---|---|
PWX-33451 | Ext4 filesystem complained about a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. This occurred because, in certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data is expected. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume. |
2.12.1
December 14, 2022
Visit these pages to see if you're ready to upgrade to the latest version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
-
Google Cloud users can now encrypt GCP cloud drives using customer managed encryption keys.
-
You can now use Vault Transit to manage key generation for encrypting in-transit data.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26232 | The Portworx node's IP addresses are now included in the license server's "long client usage" output (lsctl client ls -l ). |
PWX-26304 | Storageless nodes will become storage nodes when max_storage_nodes_per_zone is increased. |
PWX-27769 | vSphere and IBM cloud platforms can now recognize the zone label topology.portworx.io/zone . This helps Portworx honor zone-related settings like maxStorageNodesPerZone . |
PWX-27174 | pxctl cluster provision-status will now show IP addresses for nodes. The IP addresses help you to correlate that command's output with the node list provided by pxctl status . |
Fixes
Issue Number | Issue Description |
---|---|
PWX-27748 | Some incremental fixes in version 2.12.0 introduced issues with DaemonSet YAML generation for K3s and RKE2 Kubernetes platforms. Resolution: These issues have been fixed. |
PWX-27849 | Kubernetes versions 1.25 and later do not support [PodSecurityPolicy](https://kubernetes.io/docs/concepts/security/pod-security-policy/). Resolution: PX-Central does not include PodSecurityPolicy in the YAML install specs when the Kubernetes version is 1.25 or later. |
PWX-27267 | Cloudsnaps for aggregated volumes were failing certain checks when part of the aggregated volume did not have any differential data to upload. Resolution: This version fixes those checks and prevents failure due to empty differential data. |
PWX-27246 | During a new installation of Portworx on a vSphere environment (installation in local mode), several VMs appeared as storage nodes on a single ESXi host. This was because of a race condition in the vSphere environment, which increased the number of nodes forming the cluster and affected quorum decisions. Resolution: You can choose not to upgrade if you take care of the race condition during installation. You can overcome the race condition by allowing only 1 VM to come up on an ESXi host during installation. Once the installation is complete, you can bring up as many VMs as you want simultaneously. The problem exists only until the first storage VM comes up. |
PWX-26021 | For sharedv4 apps, multiple mount/unmount requests on the same path could become stuck in Uninterruptible Sleep (D) State. Resolution: If a client tries to mount while a previous request is still in progress, a Kubernetes event now states that the previous request is still in progress. |
PWX-27732 | vSphere cloud drive labels previously contained a space, which is not compatible with Kubernetes standards and caused an error in CSI. Resolution: Portworx now replaces the space character with a dash ( - ). All other special characters will be replaced by a period (. ). |
PWX-27227 | If a pool rebalancing was issued with the --dry-run option, then Portworx created unnecessary rebalance audit keys in the KVDB. As it was not possible to delete these keys, the disk size of the KVDB increased. Resolution: Portworx no longer creates audit keys when a pool rebalancing is issued with the --dry-run option, and Portworx deletes orphaned keys that have already been created. |
PWX-27407 | The KVDB contained inconsistent node entries because of a race condition in the auto decommission of storageless nodes. This was causing Portworx to restart. Resolution: The race condition is now handled, and Portworx ensures that no inconsistent entries are left behind in the KVDB during the decommission process. |
PWX-27917 | Portworx ignored the value of MaxStorageNodesPerZone if an uneven number of nodes are labeled as portworx.io/node-type=storage .Resolution: This issue has been fixed. |
PWX-24088 | For cloud drives provisioned on FlashArray (FA), the default mount was nodiscard . When Portworx deleted contents of a volume or the volume itself, the space was not reclaimed by FA. This caused a discrepancy in available space displayed on a Pure1 dashboard vs available space displayed through the pxctl status usage command.Resolution: Storage pools for FA are now mounted with discard . This allows space to be reclaimed whenever volumes are deleted or files are removed. |
Deprecations
The following feature has been deprecated:
- Internal objectstore support
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1684 | If the sharedv4_svc_type parameter is not specified during ReadWriteMany volume creation, Portworx defaults to a sharedv4 service volume unless Metro DR is enabled, in which case Portworx defaults to sharedv4 (non-service) volumes. You can explicitly set the sharedv4_svc_type parameter in the StorageClass. If it is set to an empty string, a sharedv4 (non-service) volume is created. |
PD-1729 | On some recent Linux kernels, back to back online resize operations of Ext4 volumes can fail. This is because of a bug in the kernel which has been fixed in the latest kernel release. Workaround: Upgrade to a more recent kernel version, or restart the application pod that is using the volume. This remounts the volumes and completes the resize operation. |
2.12.0
October 24, 2022
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx 2.12.0 requires Operator 1.10.0 and Stork 2.12.0.
New features
Portworx by Pure Storage is proud to introduce the following new features:
-
On-prem users can now enable PX-Fast functionality that utilizes the new PX-StoreV2 datastore. PX-Fast enables a new accelerated IO path for volumes and is optimized for workloads requiring consistent low latencies. PX-StoreV2 is the new Portworx datastore optimized for supporting IO intensive workloads for configurations utilizing high performance NVMe class devices.
-
Early access support for Portworx Object Service. This feature allows storage admins to provision object storage buckets with various backing providers using custom Kubernetes objects.
-
You can now use Vault AppRole's Role ID and Secret ID to authenticate with Vault. Portworx will auto-generate Vault tokens to store encryption secrets and cloud credentials in Vault.
-
Metro and asynchronous disaster recovery (DR) involves migrating Kubernetes resources from a source cluster to a destination cluster. To ensure that the applications can come up correctly on the destination clusters, you may need to modify resources to work as intended on your destination cluster. The ResourceTransformation feature allows you to define a set of rules that modify the Kubernetes resources before they are migrated to the destination cluster.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26700 | A pod that was using a sharedv4 volume might have taken a few minutes to terminate when Portworx running on one node was waiting for a response from Portworx running on another node which was down. This has been fixed by deferring the remote call when the remote node is down. |
PWX-26631 | In one of the error paths, Portworx was listing pods in all the namespaces in the Kubernetes cluster, which caused the Portworx process to consume a large amount of memory temporarily. The pod listing is now limited to a single namespace. |
PWX-24862 | At times, users create sharedv4 volumes unintentionally by using a sharedv4 storageClass with a ReadWriteOnce PVC. This is because previously Portworx created a sharedv4 volume when either the storageClass had sharedv4: true or the PVC access mode was ReadWriteMany/ReadOnlyMany . To avoid these unintentional sharedv4 volumes, a sharedv4 volume is now created only if the PVC access mode is ReadWriteMany/ReadOnlyMany , and the sharedv4 setting in the storageClass does not matter. This may require modification to the specs of some existing apps.If an app expects a sharedv4 volume while using a ReadWriteOnce PVC, some of the pods may fail to start. The PVC will have to be modified to use ReadWriteMany or ReadOnlyMany access mode. |
PWX-23285 | Adds uniform support for PX_HTTP_PROXY , PX_HTTPS_PROXY , and NO_PROXY environment variables (which are equivalent to commonly used Linux HTTP_PROXY , HTTPS_PROXY and NO_PROXY environment vars.)Specifying the HTTP proxy via the cluster options is now deprecated. Also adds support for authenticated HTTP proxy, where you specify the username and password to authenticate with an HTTP proxy. For example: PX_HTTP_PROXY=http://user:password@myproxy.acme.org . |
PWX-22292 | Previously, when you updated an AKS cluster principal's password and refreshed your Kubernetes secret, you would sometimes have to restart Portworx pods to propagate the changes to Portworx. Now, Portworx automatically refreshes the AKS secrets. |
PWX-22927 | You can now use Vault AppRole authentication for Vault integration. You need to provide VAULT_ADDR, VAULT_APPROLE_ROLE_ID, VAULT_APPROLE_SECRET_ID, and VAULT_AUTH_METHOD (approle) via Kubernetes Secrets or as environment variables (see the sketch after this table). |
PWX-26421 | Portworx logs now display Vault's authentication method if login is successful. |
PWX-24437 | Added discard stats to Grafana performance dashboard. |
PWX-18687 | Portworx now instantly detects that a replication node is down as a result of socket events, preventing high IO latencies. |
PD-1628 | Portworx now supports the FlashBlade and FlashArray SafeMode feature. |
PD-1634 | Added support for live migration of virtual machines between nodes in the OpenShift environment. This feature works only with Stork version 2.12. |
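The following sketch shows one way to supply the Vault AppRole parameters from PWX-22927 as a Kubernetes Secret; the secret name, namespace, and Vault address are placeholders and are not prescribed by this release note.

```shell
# Sketch only: provide the AppRole settings as a Kubernetes Secret
# (secret name, namespace, and Vault address are placeholders).
kubectl -n portworx create secret generic px-vault \
  --from-literal=VAULT_ADDR=https://vault.example.com:8200 \
  --from-literal=VAULT_AUTH_METHOD=approle \
  --from-literal=VAULT_APPROLE_ROLE_ID=<role-id> \
  --from-literal=VAULT_APPROLE_SECRET_ID=<secret-id>
```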
Fixes
Issue Number | Issue Description |
---|---|
PWX-20808 | When Portworx was configured with external etcd v3 as its key-value database, there were delays and timeouts when running the pxctl service kvdb members command because the etcd endpoints provided to Portworx did not match those used internally. Resolution: Portworx now consults both configured KVDB endpoints and configuration before displaying the KVDB information. |
PWX-24649 | Occasionally, a race condition in the initial setup of Portworx was leading to an invalid topology-zone set on cloud-drives, resulting in Portworx allocating and using more cloud-drives than configured. Resolution: The startup issue has been fixed. |
PWX-26406 | Repeated CSI NodePublish/NodeUnpublish API calls were resulting in Portworx using more resources because these APIs did a deep Inspect on the volume.Resolution: CSI NodePublish/NodeUnpublish now uses fewer resources because it avoids a deep Inspect and any extra API calls to the Kubernetes API server. |
PWX-24872 | The pxctl cloudsnap list -x 5 command was reporting an error.Resolution: The issue has been fixed. |
PWX-26935 | When a pod with sharedv4 volume is terminated, Portworx unmounts the nfs-mounted path on the local node. When the remote NFS server node was powered off, unmount was delaying the pod termination. Resolution: This issue has been fixed. |
PWX-26578 | Portworx was taking a long time to run background tasks. On some clusters, this was causing long delays due to a large backlog. Resolution: This issue has been fixed. |
PWX-23454 | Pool and node labels that are the same as volumes were ignored when applying VolumePlacementStrategies. Resolution: This issue has been fixed. |
PWX-26445 | PVC creation could fail with an unauthorized error message if the service account token used by Portworx had expired. Resolution: For Kubernetes version 1.21 and later, the ServiceAccountToken for the Portworx service is refreshed to prevent unauthorized errors (after 1 year by default, or 3 months for an EKS cluster). |
PWX-24785 | When installing Portworx on Rancher via helm charts, some permissions on Secret objects were missing in the Portworx spec.Resolution: px-role is added to both Portworx Enterprise and Portworx Essentials with the Kubernetes secrets permissions. |
PWX-19220 | The output of pxctl clouddrive commands did not provide cloud-storage details making it difficult to troubleshoot issues using vSphere GUI.Resolution: The outputs of the commands pxctl clouddrive list and pxctl clouddrive inspect now include vSphere datastore information and drive-labels, respectively. |
PWX-27170 | Cloudsnap restores were not forward compatible, meaning an older version of Portworx could not restore a newer version of a cloudsnap. In such cases, the cloudsnap restore completed without an error, but without data. Resolution: Starting with version 2.12.0, Portworx has a check that fails such restore operations. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1619 | As a result of deleting application pods that are using sharedv4 or sharedv4 service volumes in Kubernetes, part of the pod's state may not be properly cleaned up. Later, if the pod's namespace is deleted, the namespace may be stuck in the Terminating state.Workaround: Contact Portworx by Pure Storage support team to clean up such namespaces. |
PD-1611 | When using the PX-StoreV2 datastore, running multiple concurrent resize and clone operations on the same fastpath volume may cause either resize or clone operation to fail. Workaround: Retry the failed operation. |
PD-1595 | When using the PX-StoreV2 datastore, a pool may not automatically transition into Online state after completing a drive add operation. Workaround: Perform a maintenance cycle on the node. |
PD-1592 | When using the PX-StoreV2 datastore, a pool maintenance enter or exit operation may get stuck if there are encrypted PVCs attached on the node with outstanding IOs. Workaround: Reboot the node where the encrypted PVC is attached. |
PD-1650 | When using the PX-StoreV2 datastore, a PX-Fast volume may get attached in an inactive fastpath state because of internal sanity check failure. Workaround: Restart the application pod consuming the volume so that it goes through a detach and attach cycle which will reattempt fastpath activation. |
PD-1651 | When using the PX-StoreV2 datastore, in case of failure during installation, Portworx may get into a restart loop with an error message: PX deployment failed with an error "failed to create MD-array:" . Workaround: Clean up the failed install using node-wiper and retry installation. |
PD-1655 | You experience telemetry pod crashing issues due to port conflicts. Workaround: Adjust the Portworx start port by adding the following to your StorageCluster spec: startPort: <starting-port> |
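A sketch of the PD-1655 workaround above; the namespace, StorageCluster name, and port value are placeholders.

```shell
# Sketch only: set a custom start port in the StorageCluster spec to avoid
# telemetry port conflicts (name, namespace, and port value are placeholders).
kubectl -n portworx edit storagecluster px-cluster
# In the editor, set:
#   spec:
#     startPort: 17001
```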
2.11.5
January 12, 2023
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PD-1769 | It takes less time to upgrade Anthos because Portworx now makes fewer vSphere API calls. |
2.11.4
October 4, 2022
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27053 | Added runtime knobs to modify the amount of work that a resync workflow does at any instant of time. These knobs will facilitate throttling of resync operation. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-27028 | While performing writes on the target, the resync operation pinned down resources even when the write operation had no priority to execute. Resolution: In version 2.11.4, this issue is fixed. |
2.11.3
September 13, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-26745 | In Portworx versions 2.11.0 through 2.11.2, Async DR restore takes longer than in previous versions. Resolution: In version 2.11.3, this issue has been resolved. |
2.11.2
August 11, 2022
New features
Portworx by Pure Storage is proud to introduce the following new feature:
- You can now enable encryption on the Azure cloud drives using your own key stored in Azure Key Vault.
Notes
Starting with Portworx version 2.12.0, internal objectstore will be deprecated.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26047 | pxctl status now shows a deprecation warning when internal objectstore is running on a cluster. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23465 | Backups were not encrypted if BackupLocation in Kubernetes had an encryption key set for cloudsnaps. (Note that this should not be confused with encrypted volumes. This encryption key, if set, is applied only to cloudsnaps irrespective of encrypted volumes.)Resolution: Backups are now encrypted in this case. |
PWX-24731 | The Grafana image was not included into the list of images for the air-gapped bootstrap script. Customers using Prometheus monitoring needed to manually copy the Grafana container image into their environments. Resolution: The air-gapped bootstrap script has been updated and now includes the Grafana image. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1390 | The billing agent might try to reach outside the network portal in air-gapped environments. Workaround: Disable the call home service on Portworx nodes by running pxctl sv call-home disable . |
2.11.1
July 19, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-24519 | The mount path was not erased if you restarted Portworx at the wrong time during an unmount operation when using CSI. This caused pods to be stuck in the terminating state. Resolution: When you restart Portworx now, it ensures that the mount path is deleted. |
PWX-24514 | When a cluster is configured with PX-Security and using Floating license, it was not possible to add new nodes to the Portworx cluster. Resolution: You can now add new nodes to the cluster. |
PWX-23487 | On certain kernel versions (5.4.x and later) during startup, volume attach sometimes got stuck, preventing Portworx from starting. This is because a system-generated IO can occur on the volume while the volume attach is in progress, causing the volume attach to wait for IO completion, which in turn waits for startup to complete, resulting in a deadlock. Resolution: Portworx now avoids the deadlock by preventing access to the volume until attach is complete. This functionality is only enabled after a system reboot. |
2.11.0
July 11, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- On-premises users who want to use Pure Storage FlashArray with Portworx on Kubernetes can provision and attach FlashArray LUNs as a Direct Access volume.
- The CSI topology feature allows users of FlashArray Direct Access volumes and FlashBlade Direct Access filesystems to direct their applications to provision storage on a FlashArray Direct Access volume or FlashBlade Direct Access filesystem that is in the same set of Kubernetes nodes where the application pod is located.
- You can now use Portworx with IBM cloud drives on VPC Gen2 infrastructure. Portworx will use the IBM CSI provider to automatically provision and manage its own storage disks.
- You can enable pay-as-you-go billing for an air-gapped cluster with no outbound connectivity by acquiring a pay-as-you-go account key from Portworx. This key can be used on any cluster to activate the license, provided you can report usage collected by the metering module.
- You can now deploy Portworx in IPv6 networking enabled environments.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-24195 | Portworx supports using BackupLocation CR with IAM policy as an AWS s3 target for cloudsnaps triggered through Stork using ApplicationBackup CR. When Portworx detects that a BackupLocation is provided as a target, it uses the IAM role of the instance, where it is running, for authentication with s3. |
PWX-23392 | Updated the Portworx CSI driver to CSI 1.6 spec. |
PWX-23326 | Updated CSI Provisioner, Snapshotter, Snapshot Controller, Node Driver Registrar, and Volume Health controller to the latest releases. |
PWX-24188 | A warning log is removed, which was printed when Docker inquired about a volume name that Portworx could not find. |
PWX-23103 | Added a detailed warning message for volume provision issues when there are conflicting volumes in the trashcan. Volume provisioning with volume placement rules can be blocked by matching volumes in the trashcan. The new warning message informs you which trashcan volumes are causing a conflict. |
PWX-22045 | Portworx now starts faster on high-scale clusters. During the Portworx start-up process, you will see a reduction in API calls to cloud providers. In particular, AWS API calls related to EBS volumes will be reduced. |
PWX-20012 | Enabled support for pd-balanced disk type on Google Cloud environment. You can now specify pd-balanced as one of the disk types (such as, type=pd-balanced, size=700 ) in the device spec file. |
PWX-24054 | The pxctl service pool delete --help command is enhanced to show a note and examples in addition to usage information. For example: Note: This operation is supported only on on-prem local disks and AWS cloud-drive. Examples: pxctl service pool delete [flags] poolID |
PWX-23196 | You can now configure the number of retries for an error from the object store. Each of these retries involves a 10-second backoff delay, followed by progressively longer delays (incrementing by 10-second intervals) between each attempt. If the object store has multiple IP addresses as the endpoints, then for a given request, the retries are done on each of these endpoints. For more details, refer to Configure retry limit on an error |
PWX-23408 | Added support to migrate PSO volumes into Portworx through the PSO2PX migration tool. |
PWX-24332 | The Portworx diags bundle now includes the output of the pxctl clouddrive list command when available. |
PWX-23523 | CSI volume provisioning is now distributed across all Portworx nodes in a cluster providing higher performance for burst CSI volume provisioning. |
PWX-22993 | Portworx can now be activated in PAYG (pay-as-you-go)/SAAS mode using the command pxctl license activate saas --key <pay_key> . |
PWX-23172 | Added support for cgroups-v2 host configurations, running with docker and cri-o container runtimes. |
PWX-23576 | Renamed the PX-Essentials FA/FB license SKU to Portworx CSI for FA/FB SKU. |
PWX-23678 | Added support for px-els (Portworx Embedded License Server) to install and operate in IPv6 network configurations. |
PWX-23179 | Provides a way to do an in-place restore from a SkinnySnap. Now you can create a clone from a SkinnySnap and use the clone to restore the parent volume. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14944 | Removed invalid tokens from PX-Security audit logs. When a non-JWT or invalid token was passed to the PX Security authentication layer, it was being logged. There is no impact to the user with this change. |
PWX-22891 | For on-premises Portworx installations not using cloud drives, a KVDB failover could fail because Portworx on the node could not find the configured KVDB device if its name had changed after the initial install. Resolution: Portworx now fingerprints all the KVDB drives provided on all nodes at install time, regardless of whether they will run KVDB. This ensures a KVDB failover happens even if device names change on certain nodes. |
PWX-22914 | When undergoing an Anthos/EKS/GKE upgrade, Portworx could experience excessive delays due to internal KVDB failovers. Resolution: The KVDB failover rules now no longer consider storageless nodes as a KVDB candidate unless they have a dedicated kvdb-storage or are labeled with px/metadata-node=true . |
PWX-23018 | In a DR setup, destination clusters sometimes ended up with volumes in the trashcan if the trashcan feature was enabled. Users had to either delete the volumes from the trashcan in the destination cluster or disable the trashcan feature in the destination cluster. |
PWX-23185 | Cloudsnap restore operations sometimes slowed if a rebalance started on the same pool. Resolution: Portworx now avoids rebalancing volumes being restored from cloudsnaps. |
PWX-23229 | If the IP address of a node changed after reboot with an active cloudsnap, the cloudsnap operation failed with "Failed to read/write extents" error. This resulted in a failed backup, and users needed to reissue the cloudsnap. Resolution: Reattach the snapshot after node restart to avoid a snapshot being in an incorrect attachment state. |
PWX-23384 | When using a sharedv4 service volume with NFSv4 (default), users had to configure the mountd service to run on a single port even though NFSv4 does not use mountd . Resolution: Portworx now skips the check for the mountd port when using NFSv4 and omits the mountd port from the Kubernetes service and endpoints objects. |
PWX-23457 | Portworx showed Pure volumes separately in the pxctl license list output even when there were no explicit limits for Pure volumes, which was confusing. Resolution: This output has been improved. |
PWX-23490 | Previously, upgrading from Portworx versions 2.9.0 up to anything before 2.11.0 did not update the decision matrix. Resolution: Starting with 2.11.0, Portworx will now update the decision matrix at boot-time. |
PWX-23546 | The px-storage process restarted if the write complete message was received after a node restarted. |
PWX-23623 | Portworx logged benign warnings that the container runtime was not initialized. Resolution: Portworx no longer logs these warnings. |
PWX-23710 | When Kubernetes was installed on top of the containerd container runtime, the Portworx installation may not have properly cleaned up the containerd-shim process and container directories. As a consequence, nodes may have needed to be rebooted for the Portworx upgrade process to complete. |
PWX-23979 | KVDB entries for deleted volumes were not removed from the KVDB. As a result, KVDB sizes might have increased in cases where volumes were constantly being deleted and scheduled snapshots and cloudsnaps were configured. |
PWX-24047 | Portworx will no longer use Kubernetes DNS (originally introduced with PWX-22491). In several configurations, Kubernetes DNS did not work properly. Resolution: Portworx now relies on a more stable host's DNS config instead. |
PWX-24105 | In rare cases, Portworx on a node may have repeatedly restarted because of a panic due to nil pointer dereference when deleting a pod for a sharedv4 volume. Resolution: Portworx will not come up until the relevant pod is deleted manually from Kubernetes by scaling down the application. |
PWX-24112 | IBM csi-resizer would sometimes crash when resizing volumes. Resolution: This issue is addressed in the IBM Block CSI Driver version 4.4. Check the version on your cluster using the command ibmcloud ks cluster addon ls --cluster <cluster-id> . |
PWX-24187 | The PAYG (pay-as-you-go) license disables the license when there are issues with reporting/billing Portworx usage. After reporting/billing gets reestablished, the license is automatically enabled. Portworx did not add the default license features, requiring a restart of the portworx.service to properly re-establish the license. |
PWX-24297 | Portworx updated the decision matrix in the config map, causing nil pointer exceptions to appear in non-Kubernetes environments. Resolution: Portworx now checks that the config map exists before updating it. |
PWX-24433 | When Docker or CRI-O is not initialized on a cluster, Portworx would periodically print the following log line: Unable to list containers. err scheduler not initialized .Resolution: The log line is now suppressed when Portworx detects that there is runtime like Docker or CRI-O that is not initialized. |
PWX-24410 | DaemonSet YAML installs using private container registry server were using invalid image-paths (incompatible with air-gapped, or PX-Operator), thus resulting in a failure to load the required images. Resolution: Fixed regression introduced with Portworx version 2.10.0, when custom container registry server was in use. |
PWX-22481 | A pod could take upwards of 10 mins to terminate if a Sharedv4 service failover and a namespace deletion happens at the same time. Resolution: Scale down the pods to 0 before deleting the namespace. |
PWX-22128 | Async DR creates new volume (clone from its previously downloaded snapshot) on the destination cluster every time the volume is migrated. If cloudbackups are configured for the volumes from destination cluster, then every backup for a volume results in being full, as these volumes are newly created on every migration for the same volume. This change fixes this issue and allows the backups from destination cluster to be incremental. Resolution: When a volume is migrated, the volume is in-place restored to its previously downloaded snapshot and the incremental diff is downloaded to the volume without creating any new volume. Since there is no new volume being created, any backups for this volume can now be incremental. |
Deprecations
The following features have been deprecated:
- Legacy shared volumes
- Volume groups
- Hashicorp Consul support
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1325 | On IBM cloud drive, if pxctl sv pool expand with a resize-disk operation fails due to an underlying IBM issue, you will see the following error signature: Error: timed out waiting for the condition . This indicates that the IBM provider failed and could not perform the operation within 3 minutes.Workaround: If the underlying disk on the host has expanded, then issue the command pxctl sv pool update --resize --uid <> to complete the pool expand operation. If the underlying disk on the host have not expanded, check the IBM csi-controller pods for any potential errors reported by IBM. |
PD-1327 | On IBM clouddrive, if a pxctl service pool expand -s <target-size> with resize-disk for a target size X fails, you cannot issue another pool expand operation with a target size lower than the value X . On IBM clouddrive, a resize of the underlying disk is issued by changing the size on the associated PVC object. If IBM csi-driver fails to act upon this PVC size change, the pool expand operation will fail, but the PVC size cannot be reduced back to older value. You will see the following error: spec.resources.requests.storage: Forbidden: field can not be less than previous value .Workaround: When a pool expand operation and a subsequent IBM PVC resize is triggered, it is expected by the IBM CSI resizer pod to eventually reconcile and complete the resize operation. Once the underlying disk on the host has expanded, then issue the command pxctl sv pool update --resize --uid <> to complete the pool expand operation. If the underlying disk on the host has not expanded, check IBM csi-resizer pods for any potential errors reported by IBM. |
PD-1339 | When a Portworx storage pool contains a repl 1 volume replica, pool expansion operations report following error: service pool expand: resize for pool <pool-uuid> is already in progress. found attached volumes: [vol3] that have it's only replica on this pool.Will not proceed with pool expansion. Stop applications using these volumes or increase replicas to proceed. resizeType=RESIZE_TYPE_ADD_DISK,skipWaitForCleanVolumes=false,newSize=150 .The actual reason of failure is not resize for pool <pool-uuid> is already in progress ; the correct reason of failure is found attached volumes: [vol3] that have it's only replica on this pool.Will not proceed with pool expansion. Stop applications using these volumes or increase replicas to proceed. resizeType=RESIZE_TYPE_ADD_DISK,skipWaitForCleanVolumes=false,newSize=150 .Workaround: The command pxctl sv pool show displays the correct error message. |
PD-1354 | When a PVC for a FlashArray DirectAccess volume is being provisioned, Portworx makes a call to the backend FlashArray to provision the volume. If Portworx is killed or crashes while this call is in progress or just before this call is invoked, the PVC will stay in a Pending state forever.Workaround: For a PVC which is stuck in Pending state, check the events for an error signature indicating that calls to the Portworx service have timed out. If such a case arises, clean up the PVC and retry PVC creation. |
PD-1374 | For FlashArray volumes, resizing might hang when there is a management connection failure. Workaround: Manually bring the volume out of maintenance mode. |
PD-1360 | When a snapshot volume is detached, you see the Error in stats : Volume does not have a coordinator error message. Workaround: This message appears because the volume is created, but not attached or formatted. A coordinator node is not created until a volume is attached. |
PD-1388 | The Prometheus Operator pulls the wrong Prometheus image. In air-gapped environments, Prometheus pod deployment will fail with an ImagePullBackOff error. Workaround: Before installing Portworx, upload a Prometheus image with the latest tag to your private registry. |
2.10.4
Nov 8, 2022
- This version addresses security vulnerabilities.
2.10.3
June 30, 2022
Improvements
Improvement Number | Improvement Description |
---|---|
PWX-23523 | CSI volume provisioning is now distributed across all Portworx nodes in a cluster. Provisioning performance improves noticeably when a large number of CSI volumes are provisioned in a burst; there is no other user-facing impact. |
2.10.2
June 1, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-23364 | Fixed a CSI volume provisioning issue where orphaned Portworx volumes were left behind if a PVC deletion was issued before volume creation finished. Resolution: Users should upgrade their Portworx version if they are seeing orphaned Portworx CSI volumes with no associated PVC/PV. |
2.10.1
May 5, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-19815 | pxctl credentials create commands were failing with an RSA error when using Google Cloud KMS as the secret provider and trying to store credentials that were too long for the RSA key to handle. Resolution: This patch fixes the issue without changing the existing workflow for adding credentials. |
2.10.0
April 7, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- The Portworx Application Control feature provides a method for controlling an individual Portworx volume’s IO or bandwidth usage of backend pool resources. Portworx volumes are created from a common backend pool and share their available IOPS and bandwidth amongst all other provisioned Portworx volumes.
- The volume trash can feature provides protection against accidental or inadvertent volume deletions which could result in loss of data. In a clustered environment such as Kubernetes, unintended deletion of a PV or a namespace will cause volumes to be lost. This feature is recommended in any environment which is prone to such inadvertent deletions, as it can help to prevent data loss.
- You can enable automatic filesystem trimming (auto fstrim) at the volume, node, or cluster level. When you enable auto fstrim at the cluster or node level and enable nodiscard on your volumes, auto fstrim monitors the unused space in all filesystems mounted on Portworx volumes and automatically triggers a trim job to return unused space back to the pool, so you do not have to manually issue trim jobs (see the example following this list).
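The sketch below illustrates only the per-volume prerequisite: a StorageClass whose volumes are created with nodiscard set, which auto fstrim expects before it manages trim jobs for those volumes. The class name and replication factor are hypothetical, and enabling auto fstrim itself at the cluster or node level is done separately through cluster options.

```yaml
# Hypothetical StorageClass: volumes provisioned from it carry the nodiscard
# setting, the per-volume prerequisite for auto fstrim to manage trim jobs.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-autofstrim-example     # assumed name
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  nodiscard: "true"
```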
Notes
Portworx 2.10 is the last release where Ubuntu 16.04 will be supported.
Improvements
Improvement Number | Improvement Description |
---|---|
PWX-20674 | In an Async DR setup, before creating a cluster pair, the DR license is checked on both clusters. The cluster pair request will error out if only one of the clusters has the DR license set. |
PWX-21318 | Users can set the frequency of full backups using pxctl cluster options update -b <number> . The default value is 7, and you can check the value that is set with pxctl cluster options list . The number controls the number of incremental backups made before a full backup is done. |
PWX-21024 | This fix adds the --secret_key and --secret_options flags, which allow users to propagate Kubernetes secret configuration to the CLI and the backend during volume import. |
PWX-19780 | Local volumes that are pending due to ha-increase will now appear when using pxctl volume list --node <node> . |
PWX-20210 | Added support for specifying throughput parameter for gp3 drive types in AWS. The throughput parameter can only be specified at install time. Portworx currently does not allow a way to change the throughput parameter once installed. One can still change the throughput of any drives directly from the AWS console. |
PWX-22977 | Sometimes cloudsnaps can fail with InternalServerError , resulting in cloudsnap backup failures and the need for user intervention to reissue the cloudsnap command for the same volume. This fix increased the aws-sdk retries and also added back-off retries. |
PWX-21122 | VPS users now have control at the pool level with the new built-in topologyKey portworx.io/pool to allow volume affinity and anti-affinity to work for individual pools. Users can now control volume placement topology at the narrower level of pool. This allows finer control than the default topology of nodes. |
PWX-20938 | Added a VolumePlacementStrategy template for use with StatefulSets that allows volume affinity and anti-affinity with volumes belonging to the same StatefulSet pod. Use the key px/statefulset-pod with the value ${pvc.statefulset-pod} in a matchExpression . With this, you can ensure volumes in the same StatefulSet pod do or do not land on the same node (see the example following this table). |
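The following is a minimal, illustrative sketch that combines the pool-level topologyKey from PWX-21122 with the StatefulSet template from PWX-20938. The API version, rule name, and exact field layout are assumptions; treat it as approximate and check the VolumePlacementStrategy documentation for the precise schema.

```yaml
# Illustrative only: spread volumes that belong to the same StatefulSet pod
# across different storage pools, using the built-in portworx.io/pool
# topology key and the ${pvc.statefulset-pod} template.
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: statefulset-pod-spread      # assumed name
spec:
  volumeAntiAffinity:
    - topologyKey: portworx.io/pool
      matchExpressions:
        - key: px/statefulset-pod
          operator: In
          values:
            - "${pvc.statefulset-pod}"
```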
Fixes
Issue Number | Issue Description |
---|---|
PWX-9712 | Some applications could starve Portworx of Async IO event reservations, resulting in a panic loop. Resolution: The absence of an Async IO reservation is now a soft failure. |
PWX-20675 | ClusterPair can also be set up using a backup location Kubernetes object instead of creating credentials on both clusters. There was an issue where, after destination credentials were reset, the system ignored the backup location object and used the internal object store, causing migration failures. User impact: Migrations started failing for cluster pairs configured using a backup location. Resolution: Resets of credentials on DR sites have been fixed for cluster pairs configured using a backup location. |
PWX-21143 | Portworx POD (oci-monitor container) was using a broad privileged:true security privilege, enabling too many security attributes.Resolution: We have replaced the broad privileged:true security setting with fine-grained security privileges. |
PWX-21358 | When a Portworx cluster was created in a vSphere environment, Portworx disks (vmdks) were unevenly placed among the datastores in a vSphere storage cluster. In extreme cases, all vmdks would end up in the same datastore. We have taken a best-effort approach of distributing vmdks as evenly as possible among all the datastores in a storage cluster, to the extent that vSphere APIs allow. User impact: Users had to deal with an uneven distribution of vmdks because vmdk movement across datastores is not supported. To work around this issue, users would bring up nodes one at a time. Resolution: This best-effort approach is available for a more even distribution of vmdks among datastores of a storage pod. |
PWX-21389 | When an external etcd v3 cluster configured with user AuthN (authentication) was used as the KVDB, Portworx did not install correctly. Resolution: When user-AuthN is in use, KVDB clients are now properly initialized and set up. |
PWX-21514 | In an internal kvdb cluster operating at maximum kvdb cluster size, if one of the kvdb member nodes goes down, it will be replaced by an available non-kvdb node in the Portworx cluster. When the previous member recovers from the problem and comes up, its kvdb disk will be deleted. In a vSphere environment, this deletion used to fail and users would see an additional kvdb drive when they list all available drives in the Portworx cluster. User impact: vSphere environments could see unused kvdb disks lingering around in the cluster until they are deleted from outside of the Portworx environment. Recommendation (optional): If users choose not to upgrade, they will have to manually delete those extra lingering disks. |
PWX-21551 | When a Portworx volume switched to read-only mode, Portworx restarted docker-containers that use px-volumes, but it did not restart containerd/cri-o containers. Resolution: Portworx now also restarts containerd/cri-o containers. |
PWX-22001 | Volume placement strategy rules with affected_replica rules will now be applied when increasing the HA level of a volume. User impact: affected_replica volume placement rules were not applied when increasing the HA level, even though they were applied when initially provisioning a volume at the same HA level. |
PWX-22035 | If a node restarted while Portworx was creating a snapshot after deleting a volume, Portworx sometimes restarted. User impact: When these events coincided, Portworx could restart unexpectedly. Resolution: This release addresses this issue and Portworx no longer restarts under this circumstance. |
PWX-22478 | The Portworx node-wipe operation did not clean up all the old node identifiers, which caused issues with the telemetry container after the node was wiped or recycled. Resolution: The Portworx node-wipe procedure was fixed, so all node identities are properly recycled. |
PWX-22491 | Portworx installations were using the default dnsPolicy , which did not include the Kubernetes internal DNS server.Resolution: We changed the default dnsPolicy to ClusterFirstWithHostNet , so now the Kubernetes DNS is also used in hostname resolution. |
PWX-22787 | Portworx generated a core and then restarted on certain nodes where application pods were trying to set up. User impact: Portworx generates a core and restarts itself when an application pod tries to attach a volume on a node while the volume is already attached and busy on another node in the cluster. The Portworx service auto-recovers from this after the restart. Only Portworx 2.9.1 was impacted by this issue. Resolution: The issue causing Portworx to restart has been fixed. |
PWX-22791 | There was no update API for credentials. User impact: When keys were rotated for cloudsnap credentials, there was no way to update the credentials with the new keys. The only way was to delete and recreate the credentials with the new keys, which required the Stork schedule for cloudsnaps to be updated with the new credential ID to avoid failures due to a credential ID mismatch. With Portworx version 2.10, an update API has been added for credentials that allows users to update most of the parameters. Exercise caution when changing parameters such as the bucket or the endpoint, because doing so causes previous cloudsnaps to no longer be visible through the modified credentials. |
PWX-22887 | After a node is decommissioned, backups may fail for volumes which were attached on the decommissioned node. Resolution: Detaching the volume using pxctl host detach fixes the issue. |
PWX-22941 | While performing an internal kvdb node failover, a failure in setting up the internal kvdb could result in an orphaned, unstarted node entry in the internal kvdb cluster. User impact: Internal kvdb clusters would keep running at a reduced cluster size. Resolution: A Portworx node which failed to perform internal kvdb failover will detect its own orphaned node entry in the kvdb cluster and clean it up. |
PWX-22942 | In a stretch cluster, cloudsnaps can sometimes fail with a Not authenticated with secrets error. This is due to a missing check, which scheduled cloudsnaps on a node in a Kubernetes cluster without access to credentials (BackupLocation). Resolution: The check has been fixed so that cloudsnaps are not scheduled on nodes without access to Kubernetes secrets. |
PWX-23060 | Previously, FlashArray and FlashBlade had a limit of 200 Portworx or FA/FB volumes in a cluster. Resolution: The limits are now 200 Portworx and 100,000 FA/FB volumes in a cluster. |
PWX-23085 | A Portworx upgrade on a storageless node in the cloud drive configuration can get stuck with the message DriveAttachedOnDifferentNode Error when pool ExpandInProgress on another node in the cluster .Resolution: The issue has been fixed. Portworx now skips drives where expansion is in progress until the expansion is complete. |
PWX-23096 | Cloudsnaps could stay in the active state forever when the executing node was decommissioned. User impact: Cloudsnaps could be stuck in the active state indefinitely, and further requests for the same volume could remain in the queued state. Resolution: These cloudsnaps are now marked as stopped, allowing newer requests to run on a different replica. |
PWX-23099 | Restoring a cloudsnap with the intention of doing an in-place restore failed with the error Not enough space available on the pool , even though the used size of the volume being restored was less than the available space on the pool. User impact: Failure to restore the cloudsnap to the same pool as the parent volume. Resolution: Portworx now checks pool space against the used size of the volume being restored rather than the full volume size. |
PWX-23119 | Panic was occurring in one of the Portworx processes when creating a local snapshot for replicated volumes having more than 1 replica and which have the skinnysnap feature enabled. User impact: When the skinnysnap feature is enabled and the number of skinnysnaps is set to greater than 1, the Portworx process on one or more nodes may panic while creating a local snapshot. This happens if not all replicas are online on a volume with a replication level greater than 1. Resolution: Fix the out of bounds access that caused the panic in skinnysnap creation path with replicated volumes having more than 1 replica. |
PWX-23151 | On OpenShift platform, the Portworx service could not use Kubernetes client APIs if the Portworx POD was stopped. Resolution: The Portworx service has been isolated from Portworx POD, so stopping the POD on OpenShift platform no longer prevents Portworx service from using Kubernetes client APIs. |
PWX-23155 | In a cluster with more than 512 nodes, Portworx failed to start after 511 nodes. Portworx had a limit on open client connections that was exceeded when adding nodes beyond 511. User impact: More than 511 nodes could not be added to the cluster, or Portworx would enter a crash loop. |
PWX-23174 | For fragmented large volumes, repl-add kept restarting from scratch. User impact: Because repl-add was stuck in a loop, ha-add would not finish, and a new replica was not added to the replica set. |
PWX-23189 | On Tanzu vSphere with the Kubernetes platform, the worker VMs have a 16GiB root partition. Due to the small size of the root partition, this can lead to disk pressure as the life of the cluster increases. Resolution: We recommend that you monitor the free disk space on these workers continuously and garbage collect space as needed. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1104 | When installed in vSphere local mode, if a VM which is running a storageless Portworx node is migrated to another ESXi host which does not have any storage Portworx nodes, this storageless Portworx node will fail to transition into a storage node. Workaround: Restarting Portworx on the storageless Portworx node will transition it into a storage node. Portworx can be restarted by applying the px/service=restart label on the Kubernetes node or by issuing systemctl restart portworx on the node. |
PD-1117 | When trash can is enabled in a disaster recovery setup (by setting a value greater than 0 for VolumeExpiration in cluster settings) on the destination cluster, users will see a large number of volumes accumulate. If the expiration is set to a very large number, these snapshots might take up significant capacity as well. This is a known issue and will be addressed in a future release. Workaround: Do not enable VolumeExpiration in cluster settings. |
PD-1125 | If a Portworx storage pool is in an Error state (seen in pxctl service pool show ), do not submit new pool expand operations on the pool.Workaround: Before submitting new pool expand operations, fix the pool state by entering and then exiting the pool in maintenance mode using the pxctl service pool maintenance command. |
PD-1127 | Portworx pool expand operation has the status failed to update drive set state: etcdserver: leader changed .Workaround: This error indicates that the actual pool expansion is complete in the background. The message occurs when Portworx tries to update the status of the drives in the pool. |
PD-1130 | A storage pool expand using add-disk can be stuck in progress with the error Pool is still not up. add drive status: Drive add: No pending operation pool status: StorageFull .Workaround: Restart Portworx on the node to resolve the issue. This can be done by issuing systemctl restart portworx or labeling the Kubernetes node with the px/service=restart label. |
PD-1165 | Due to an incomplete container image, Portworx installation or upgrade operations can get stuck with the message: could not create container: parent snapshot <> does not exist: not found .Workaround: Identify the px-enterprise image and remove it. The following sample commands do this:ctr -n k8s.io i ls | grep docker.io/portworx/px-enterprise:2.10.0 ctr -n k8s.io i rm docker.io/portworx/px-enterprise:2.10.0 |
2.9.1.4
Apr 1, 2022
Notes
- This version addresses security vulnerabilities.
2.9.1.3
Mar 15, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22943 | Portworx with FA cloud drives was erroneously able to start with the user_friendly_names setting enabled. User impact: Portworx installs successfully initially, but on restart it won't be able to identify its own drives. This could cause Portworx to create new drives ignoring the already created ones. Resolution: Portworx no longer starts if the multipath user_friendly_names setting is enabled. If after installing this version you receive this error, update your multipath configuration. |
2.9.1.1
Feb 3, 2022
ADVISORY: Pure Storage strongly recommends users of the 2.9.1 release upgrade to 2.9.1.1.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22787 | Under a certain race condition, Portworx could generate a core and restart itself. This could happen when an application pod tries to attach a volume on a node while the volume is already attached on another node in the cluster. User impact: Portworx on the node where the application pod is trying to attach the volume would generate a core and restart. The Portworx service auto-recovered from this after the restart. Only Portworx 2.9.1 was impacted by this issue. Resolution: The issue causing Portworx to restart has been fixed. |
2.9.1
Jan 27, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Support for Pure FlashBlade as a Direct Access filesystem has graduated from early access to Generally Available! With this feature, Portworx directly provisions FlashBlade NFS filesystems, maps them to a user PVC, and mounts them to pods. Reach out to your account team to enable this feature.
- Support for Pure FlashArray cloud drives has graduated from early access to Generally Available! Use FlashArrays as a cloud storage provider. Reach out to your account team to enable this feature.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-22105 | Portworx now supports PKS distributions based on "containerd" container runtimes. |
PWX-21721 | The pxctl status command's response time is now reduced when telemetry is enabled. This was done by running telemetry status asynchronously and caching its status. |
PWX-20642 | Portworx no longer requires global permissions on all datastores. Users can now specify which datastores to give Portworx access to. |
PWX-22195 | Improved Portworx logs by adding a correlation ID to every API request; the ID is logged at all levels. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22197 | CSI provided drives may have incorrect classification of media type: Disks based on flash media getting classified as Magnetic disks. User impact: Portworx may incorrectly classify other storage attributes which are derived from the storage media like IO Priority, etc. Resolution: CSI provided drives are now correctly classified. If you’ve had your drives incorrectly classified, you can manually change the io_priority of the pool using the pxctl sv pool update command. |
PWX-22079 | A golang panic (stacktrace) occurred when there was an error initializing the storage layer. User impact: When there was an error while initializing the storage-layer, golang sometimes panicked (stacktrace output), and the real error was masked. Resolution: Portworx now properly handles errors from the storage layer, and no longer causes golang panics. |
PWX-21605 | Portworx keeps track of the number of NFS threads configured on a node. If the number of threads drops below 80% of the configured value, Portworx resets it to the configured thread count. However, a variance of 80% was too large, and on an overloaded system it could cause the system to run with fewer threads than desired. User impact: On certain overloaded systems, sharedV4 application pods could see NFS timeouts because the NFS server had fewer threads than configured. Resolution: Portworx will now keep the NFS thread count within 95% of the configured value. |
PWX-22313 | Transitioning a storageless node into a storage node caused other nodes in the cluster to receive a NodeDown event (for the storageless node) with an IP which matches with the new storage node. This caused the sharedV4 server to assume that there were no sharedV4 clients active on that IP. User impact: A sharedV4 app running on such a node which transitions could see I/O errors. Resolution: Portworx on peer nodes will now detect these transitions, ignore NodeDown events for the same duplicate IP, and avoid removing the client for the sharedV4 volumes. |
PWX-22244 | When fsGroup is set on the volume, the kubelet has to perform a recursive permissions change on the mount path. This can take time and delay pod creation when there are a large number of files in the volume. In Kubernetes 1.20 or later, the setting fsGroupChangePolicy: OnRootMismatch tells the kubelet to skip the recursive permissions change if the permissions on the root (mount path) are correct. This prevented the delay in pod creation, but was rendered ineffective when Portworx reset a permissions value. User impact: Specifying fsGroupChangePolicy: OnRootMismatch did not alleviate the pod creation delay caused by the fsGroup setting (see the example following this table). |
PWX-22237 | A race condition in Portworx during storageless node initialization sometimes created an orphaned entry in pxctl clouddrive list where that node's entry was not present in pxctl status . User impact: The node list between pxctl status and pxctl clouddrive list was not in sync. Resolution: Portworx now handles this race condition and ensures that such orphaned entries are removed. |
PWX-22218 | Portworx failed to mount volumes into asymmetrical shared mounts. User impact: When using asymmetrical shared mounts (e.g. mounting different directories between host/container), it was not possible to mount Portworx volumes into these directories. Resolution: After the fix, asymmetrical shared mounts work properly (i.e. you can mount volumes into such directories). |
PWX-22178 | On Kubernetes installations that use the CRI-O container runtime, setting up a custom bidirectional (shared) mount for the Portworx pod did not propagate to portworx.service . Instead, it would be set up as a regular bind-mount, that could not be used to mount the PXD devices. Resolution: The bidirectional mounts are now properly propagated to portworx.service . |
PWX-21544 | When coming out of run-flat mode after more than 10 minutes have elapsed, a Portworx quorum node sometimes failed to start a watch on the internal KVDB because the required KVDB revision had already been compacted. User impact: When using an internal KVDB, there may have been a brief outage when Portworx exits run-flat mode. |
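For reference, the snippet below shows the standard Kubernetes way to opt into the behavior discussed in PWX-22244; the pod, container, and PVC names are hypothetical placeholders.

```yaml
# Illustrative pod spec: with fsGroupChangePolicy set to OnRootMismatch,
# the kubelet skips the recursive ownership/permission change on the mounted
# volume when the root of the mount already has the expected permissions.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example           # assumed name
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: px-data-pvc    # assumed PVC backed by a Portworx volume
```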
Known issues (Errata)
Portworx is aware of the following issues; check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PD-1076 | When using Portworx with FlashArray, if new drives are added while paths are down, it may not have all connections established and may result in failures when only a certain subset of paths go down, even if others are live. Workaround: This can be recovered after all paths are present with pxctl sv m --cycle , which will detach and reattach the drives, hopefully ensuring all paths are added back. |
PD-1068 | When using Portworx with FlashArray, expanding a pool by resizing when some paths are down (even if some are still up) may result in issues, as the single paths may not pick up the new path size and fail the multipath resize operation. Workaround: Run the resize again when all paths are restored to resolve the issue and complete the expansion. |
PD-1038 | Portworx pool expand operations fail to resize when some multipath connection paths of FA are down. Workaround: After the network is restored, you can run iscsiadm -m session --rescan and expand the pool again. |
PD-1067 | Disabling a port on the FlashArray will also remove it from the list of ports in the REST API, and thus Portworx will not attach it. This can cause some multipath paths to remain offline even after the port is reattached, especially if Portworx had a restart while the port was down. Workaround: You can recover the faulty paths by running the pxctl sv m --cycle command to reattach them and bring back the missing paths. Note that unless all paths are down, Portworx will still function fine, just with reduced iSCSI/FC-layer redundancy. |
PD-1062 | px-pure-secret contains FlashArray and FlashBlade connection information, specifically management endpoint and token. The secret is loaded when Portworx starts, therefore it needs to be present before Portworx is deployed. Also, any changes to the secret after Portworx is already started will not be detected. Workaround: If you need to change array backends or renew the token, you must restart Portworx. This also applies to FlashArray disk provisioning, and impacts changes to the FA/FB essentials licenses. |
PD-1045 | In the cloud drive mode of deployment with a FlashArray, a restart of the primary controller or a network outage to the FlashArray could cause a storage node to transition into a storageless node. This transition happens since another storageless node in the cluster picks up the disks and starts as a storage node. The original storage node however still ends up having the signature of the old disks and starts up as a storageless node with StorageInit failure. This happens only if Portworx on this node is unable to cleanly detach its disks due to the primary controller on FlashArray being down. Similarly, after the primary controller restarts or a network outage occurs, a storage node could see errors from the backend disk, or the internal KVDB could see errors from the KVDB disk and cause Portworx or the internal KVDB on that node to enter an error state. Workaround: Once the primary controller is back, restart Portworx on the impacted node to recover. |
PD-1063 | If the Kubernetes etcd is unstable, Portworx may experience intermittent access issues to the Kubernetes API. Workaround: If a pool expand operation fails with the error message could not retrieve portworx-storage-decision-matrix config map: etcdserver: leader changed , retry the pool expand operation. |
PD-1093 | Application pods can get stuck in the ContainerCreating state when volume attachment has failed. |
PD-1071 | If you manually disconnect any connected volumes from FlashArray, the Portworx node may become stuck attempting to reconnect to the original volume if there are pending I/Os. Workaround: Reconnecting the volume will resolve this issue at the next Portworx restart and the node will return to a healthy state. |
PD-1095 | If you uninstall Portworx with deleteStrategy set to Uninstall (and not UninstallAndWipe ), then you reinstall Portworx, the telemetry service and metrics collector will be unable to push metrics and may run into a CrashLoopBack state. This is for certificate security reasons.Workaround: Contact Support to reissue the certificate. |
2.9.0
Nov 22, 2021
Notes
- If you're using Kubernetes 1.22, you should use Stork 2.7.0 with Portworx 2.9.0.
- After upgrading to Portworx 2.9.0, the existing sharedv4 service volumes will be switched to using NFS version 4.0.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18037 | Improved pxctl status bootstrap issue reporting when KVDB connectivity is blocked. |
PWX-18038 | Clarified error message when using an incorrect network interface. |
PWX-18362 | Using the pxctl cloudsnap list -d command, you can now list cloudsnaps of volumes that are no longer present in the cluster, but belonged to it. |
PWX-20670 | Portworx will attempt to enable persistent journaling when installing . |
PWX-21373 | The following template can now be used in a VolumePlacementStrategy for the volumeAntiAffinity or volumeAffinity to automatically constrain the MatchExpressions to the PVC namespace.- key: "namespace" values: - "${pvc.namespace}" You can now separate interaction between different namespaces when using volume (anti-)affinity in VPS. |
PWX-21506 | One of the folders used by legacy shared (fuse) volumes will not be created unless shared volumes are created and mounted. This change prevents the internal mount path, specifically /opt/pwx/oci/rootfs/pxmounts , from being created when there are no shared volumes being used. |
PWX-21994 | Added support for cgroup v2-configured hosts. |
PWX-21662 | Portworx now supports OpenShift version 4.9. |
PWX-21341 | Added the sharedv4_failover_strategy storageClass parameter, whose value can be either aggressive or normal . The aggressive strategy uses a shorter failover grace period than the normal strategy. If sharedv4_failover_strategy is unspecified, the default for sharedv4 service volumes is aggressive and the default for sharedv4 volumes is normal . The value of this parameter can also be changed using the pxctl volume update command; an empty value clears the setting (see the StorageClass example following this table). |
PWX-21684 | Telemetry is now disabled by default in the spec generator. Enable telemetry under advanced settings. |
PWX-21895 | KVDB metrics are now disabled by default, lowering the amount of metrics generated. You can re-enable them by adding kvdb_metrics_enable=1 as a runtime option. |
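As a rough sketch of how the sharedv4_failover_strategy parameter from PWX-21341 might be set on new volumes, the StorageClass below assumes the pxd.portworx.com CSI provisioner and illustrative parameter values; check the sharedv4 service volume documentation for the authoritative parameter list.

```yaml
# Hypothetical StorageClass: provisions sharedv4 service volumes with the
# normal failover strategy (a longer grace period than aggressive).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-svc-example        # assumed name
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  sharedv4: "true"
  sharedv4_svc_type: "ClusterIP"       # makes this a sharedv4 service volume
  sharedv4_failover_strategy: "normal" # or "aggressive" for a shorter grace period
```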
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21710 | If a KVDB node encounters IO errors and restarts, it will fail to unmount the previous mountpoint. User Impact: The KVDB node did not start, resulting in reduced KVDB availability if there were no other nodes available to take over for the failed replica. Resolution: The Portworx container now looks for unhealthy mount points and unmounts them. |
PWX-21590 | Portworx nodes would not transition into run-flat mode when the etcd cluster was unreachable and lost cluster quorum. Each node detected the etcd cluster as unreachable at different times, rendering the cluster unable to reach consensus on whether etcd quorum is lost. User Impact: Cluster KVDB quorum as well as Portworx cluster quorum would be lost and Portworx would not transition into run-flat mode. Resolution: All Portworx nodes will now detect etcd is not reachable within 1 minute and enter run-flat mode. |
PWX-21506 | Mount paths used by legacy shared (fuse) volumes had wider permissions than desired, because one of the internal folders was created even when no shared volumes were in use. Resolution: This change prevents the internal mount path, specifically ( /opt/pwx/oci/rootfs/pxmounts ), from being created when there are no shared volumes being used. |
PWX-21168 | Added support for the RKE2 Kubernetes distribution. User Impact: The RKE2 has switched to K3s Kubernetes distribution baseline, which broke the Portworx deployment. Resolution: The YAML-generator at Portworx Central had been fixed to recognize the RKE2-based Kubernetes version, and automatically apply the customization required to install Portworx. |
PWX-20780 | Pods using encrypted sharedv4 volumes got stuck in the terminating state. User Impact: If a node that was hosting the replica of a sharedv4 encrypted volume (server node) was rebooted, it was possible for application pods accessing that volume to get stuck in the terminating state. Resolution: Portworx will detect restart of a sharedv4 encrypted volume server node and will automatically restart application pods using that volume which will recover them to functional state. |
PWX-21965 | Portworx .stack files were accumulating in /var/cores and not being deleted. User Impact: Users experienced a growing number of files with the .stack extension in /var/cores on the worker nodes, which they had to delete manually. Resolution: The new .stack files generated by Portworx are now deleted automatically. |
PWX-21823 | In cloud setups, if a cluster was scaled down, resulting in internal KVDB nodes being shutdown, then there was a possibility that more than the quorum number of nodes were removed from the bootstrap configuration map where the internal KVDB nodes list is maintained. User Impact: If internal KVDB nodes were scaled down, users had to manually recover the KVDB after scale up. |
PWX-21807 | Expanding a pool using add drive beyond 6 drives caused creation of new pools. User Impact: If you had six drives in a single pool, and you tried the pxctl service pool expand command to add drives, the command created a brand new pool instead of expanding the existing one.Resolution: The pxctl service pool command will now fail the operation with an error message indicating that the pool has reached the maximum number of drives. |
PWX-21664 | When Kubernetes was installed on top of the containerd container environment and using an older (non-default) runtime version io.containerd.runtime.v1.linux , Portworx installation sometimes did not properly clean up the containerd-shim process and container directories. User Impact: The node may have required a reboot for Portworx upgrade process to complete. Resolution: Portworx now cleans up processes and directories when running on this older containerd runtime. |
PWX-21659 | In certain failed scenarios where cloud backup would fail even before creating a cloud backup ID, the migration status would not be updated with the appropriate error message. User Impact: Migrations triggered as a part of Async DR would fail with an empty error message. Resolution: Portworx now sets the correct message in all cloud backup error scenarios. |
PWX-21558 | You can specify discard or nodiscard filesystem mount options using two different volume spec fields (volumespec.nodiscard and volumespec.mount_options ). User Impact: This resulted in ambiguous volumespec settings in the following cases: volumespec.nodiscard = true and volumespec.mount_options="discard=" volumespec.nodiscard = false and volumespec.mount_options="nodiscard=" Resolution: When ambiguous volumespec settings for discard or nodiscard are detected, Portworx uses volumespec.nodiscard to derive the final value, and generates an alert to notify that the volumespec needs to be fixed. |
PWX-21380 | One of the internal data structures was not properly protected for access from multiple threads. This caused Portworx to encounter an error and be restarted. User Impact: In a scaled-up setup with a large number of nodes, Portworx restarted intermittently. Resolution: Portworx now does not share the data structure between multiple threads. |
PWX-21328 | Applications using file locks do not work with sharedv4 service volumes while using NFS version 3. Any attempt to acquire a file lock hangs indefinitely. User Impact: Containers stuck in this state cannot be terminated using normal methods. Resolution: This no longer occurs with sharedV4 service volumes. |
PWX-21297 | The Portworx pod did not encode stored credentials correctly when authenticating with the container-registry to download the px-enterprise container. User Impact: Depending on the characters used in password for the container-registry, authentication continuously failed, and the Portworx pod was unable to pull and install Portworx on the cluster. Resolution: Portworx now properly encodes credentials. |
PWX-21004 | Locks are not transferred on failover. User Impact: Users saw unpredictable behavior for applications relying on NFSv4 locking. Resolution: Portworx disallows NFSv4 for sharedv4 service volumes. |
PWX-20643 | Pods that used several sharedv4 volumes sometimes became stuck in the terminating state. User Impact: When a pod that was using several sharedv4 volumes was deleted, it sometimes became stuck in the terminating state. Users had to reboot the node to escape this state. Resolution: Pods using sharedv4 volumes no longer get blocked indefinitely. |
PWX-19144 | During migration, Portworx volume labels were not copied. User Impact: Users were forced to manually copy volume labels after migration to reach the desired state. Resolution: Volume labels are now copied automatically during migration. |
PWX-21951 | The Operator created more storage nodes than maxStorageNodePerZone specified in STC.User Impact: The Portworx cluster came up with a different number of storage nodes than the number specified using the maxStorageNodePerZone parameter in the STC.Resolution: Portworx now comes up with the exact number of storage nodes specified in the maxStorageNodePerZone parameter. |
PWX-21799 | Added support for PX_HTTPS_PROXY with RKE2 installs. User Impact: When installing Portworx in air-gapped environments, you can include the PX_HTTPS_PROXY environment variable to use an http proxy with the install. However, this variable was not used when pulling the px-enterprise image during installs on Kubernetes clusters with the containerd container runtime. Resolution: Portworx now uses the PX_HTTPS_PROXY environment variable when installing Portworx on Kubernetes clusters with the containerd container runtime. |
PWX-21542 | When you installed Portworx on an air-gapped cluster, the node start up time was delayed. This happened because the metering agent ran to report the health of Portworx cluster. User Impact: Portworx tried to run the metering agent, resulting in a delayed start. Resolution: The metering agent is now disabled to avoid the delayed start. |
PWX-21498 | The drain volume attachment job timeout was too long. User Impact: The drain volume attachment job had an upper time limit of 30 minutes. In certain error scenarios, the job would stay pending for 30 minutes and then time out and fail. Resolution: Changed the drain volume attachment job timeout from 30 minutes to 10 minutes. Any volume attachment drain operation is expected to complete within 10 minutes. |
PWX-21411 | When a Portworx node went out of quorum and then rejoined the cluster, the volume mountpoint sometimes became read-only. User Impact: Some application pods became stuck in the container creating or crashloopbackoff state. Resolution: Portworx now detects the pods with ReadOnly PVC and proactively bounces the application pods after Portworx startup completes on a node. |
PWX-21224 | Setting cloudsnap threads to four or less resulted in cloudsnap backups in hung state. User Impact: With cloudsnap threads (Cloudsnap maximum threads field in cluster options) set to four or below and doing more than 10 cloudsnaps at the same time, cloudsnaps became stuck and made no progress. Resolution: Incorrect check for thread count no longer results in a deadlock. |
PWX-21197 | The runtime option limit_drives_per_pool did not work in Portworx version 2.8.0. User Impact: limit_drives_per_pool is a runtime option to control the number of drives in a pool. The last pool could have more drives than the limit if the drive count was not an exact multiple of the limit and creating another pool would have left it with too few drives. Internally, a new pool is created only if at least 50% of the limit is available in the last pool. Resolution: During a later drive add from maintenance mode, these limits are now honored more strictly. A drive add will fall into a pool only if the drive count stays within the limit; if not, a new pool is formed. |
PWX-21163 | Prometheus was unable to access the Portworx internal etcd on a multi-network-interface setup, so etcd alerts did not appear. User Impact: Prometheus alerts and monitoring did not work properly for alerts related to the internal etcd on multi-network-interface setups. Resolution: Portworx now allows internal etcd access from all network devices so that Prometheus can access and scrape alerts. |
PWX-21057 | Pods failed to come up for restored PVCs that were encrypted with Vault namespace secrets. User Impact: If you used a PVC that was cloned from a snapshot, and it was encrypted by Vault namespace secrets, then pods using that PVC were stuck in the container creating state. Resolution: Portworx now checks for the Vault namespace value at the correct place in the volume spec for restored volumes. This allows pods to finish setup and not get stuck. |
PWX-21037 | Unable to set the mount_options in volume spec. The nodiscard or discard volume fields were not in sync with mount_options .User Impact: This resulted in unpredictable behavior wherein after mounting a pxd volume , volumes appeared mounted with discard option even when volumespec.nodiscard was set to true .Resolution: Portworx now allows setting mount options. The volumespec.nodiscard and mount options are now made to be in sync, and this provides predictable behavior for pxd volume mounts. |
PWX-20962 | Portworx experienced a startup issue with volatile mounts of files in Kubernetes environments. User Impact: If files were manually removed from the /opt/pwx/oci/mounts/ directory, followed by restarting Portworx service before Portworx pod, this sometimes resulted in a looping failure to start the Portworx service.Resolution: Synchronization of volatile mounts into /opt/pwx/oci/mounts/ is fixed, so the restart of the Portworx pod properly restores files/directories in this directory. |
PWX-20949 | Cloudsnap restore operations sometimes failed with the error message Restore failed due to stall . This happened when the restore operation incorrectly evaluates the node to be in maintenance mode. User Impact: User restores may not have completed, and users may have been required to restart the node where the restore was stalled, or reissue the restore command. Resolution: Portworx cloudsnap now does not interpret node status. An upper layer module handles it, preventing this issue. |
PWX-20942 | Synchronous DR on Tanzu cloud drives did not work. User Impact: In Tanzu, the cloud drive backing Portworx is a PVC, which is a cluster-scoped resource. In order to help clusters distinguish between their drivesets, the drivesets should have been labeled accordingly. Resolution: Portworx now supports Tanzu PVCs. |
PWX-20903 | The affected_replicas in VolumePlacementStrategy ReplicaPlacementSpec were not applied correctly when multiple were used. User Impact: Volume provisioning sometimes failed when you had a VPS replicaAffinity with multiple rules using affected_replicas that should work. Resolution: Using multiple affected_replicas rules now work in Portworx as expected. |
PWX-20893 | The pxctl v check --mode fix_safe <volname/volid> command failed when Deleted inode <ino> has zero dtime was found by fsck . User Impact: If a file was in use at the same time it was deleted, it was never properly closed, and the filesystem was remounted, users may have seen an error from fsck .Resolution: The pxctl v check --mode fix_safe <volname/volid> command now recovers the volume to a clean state. |
PWX-20546 | Portworx sometimes displayed a rebalance job status as DONE while rebalance actions were still pending. User Impact: You may have seen a rebalance job status as DONE with some pending jobs belonging to deleted volumes.Resolution: Portworx now does not show the rebalance job for deleted volumes. |
PWX-21279 | In some cases, when multiple matchExpressions were specified in a volume placement strategy, a pool which satisfied only some expressions got selected. User impact: Instead of multiple matchExpressions, users could use multiple rules in versions affected by this issue to get the expected result. Resolution: Portworx now checks for all match expressions. |
PWX-20029 | If an NFS client node was restarted, the client reload process would restart all the pods assigned to the node. If it detected that an NFS server is down, it would start the timer for that event. When the timer expired, the process invoked the recovery routine, and the pods that use the volumes with that NFS were restarted again. This restart was sometimes unnecessary. User impact: Users saw pods reset unnecessarily. Resolution: Portworx now gets the latest list of attached volumes in the recovery routine, and only restarts the pods that are still using the stale volumes. |
Known issues (Errata)
Portworx is aware of the following issues; check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PD-1015 | If the Portworx pod is deleted and recreated while a pool expand of a Portworx Pool is in progress, the pool expand will fail with the error message: could not retrieve portworx-storage-decision-matrix config map: Unauthorized .Workaround: Wait for the new pod to come up and then resubmit the pool expand request. The subsequent request will go through. |
PD-1007 | During a Portworx cluster upgrade, if the cluster is using ContainerD as the runtime, a Portworx node may get stuck during upgrade.Workaround: Check if the px-oci-installer process is stuck on the node using the`ps -ef |
PD-1005 | In certain scenarios, a PVC resize request issued from Kubernetes can error out and the API will fail, but Portworx will eventually complete the resize operation. Kubernetes retries the operation but Portworx then returns the error: No change is requested . This causes the PVC size to not match with the actual Portworx volume size. |
PD-990 | Application containers running with non-Docker container runtimes will not get automatically restarted when Portworx detects issues with the volume mounts for those containers. Examples of non-Docker container runtimes are containerd, CRI-O, rkt. The most common scenario where volume mount issues are detected is when the mount becomes read-only when Portworx is down on a node for more than 10 minutes. Workaround: Restart application containers to workaround this issue. In Kubernetes, this means deleting application pods so that they get recreated. |
PD-1020 | A Portworx install where the backing drives are provisioned by Portworx can fail with the following fingerprint:Failed to format [-f --nodiscard /dev/<device-name>] . The issue can happen when the corresponding device is not attached completely on the node and the format cannot detect it.Workaround: Restart Portworx on the node. To restart Portworx, either label the corresponding Kubernetes node with label px/service=restart or if you're already on the node, systemctl restart portworx . |
PD-1017 | The application that uses sharedv4 service encrypted volume may have some pods stuck in the ContainerCreating state after there is a sharedv4 service failover and then failback. The failover happens when the Portworx service is restarted on the node where the volume is attached. The failback happens when the Portworx service is subsequently stopped on the node where the volume attachment was moved during failover.Workaround: Restart the Portworx service on the node where the volume was attached before the failover. Use a normal failover strategy for the sharedv4 service volumes. The default is aggressive . |
PD-1023 | Failures such as iSCSI disconnects, HBA reset, etc. on Portworx pool drives can sometimes cause the pool to enter an error state and remain in that state if there’s an outstanding operation in the kernel. Workaround: To recover from this, reboot the node. |
PWX-21956 | Pool expansions by resizing volumes will fail if some paths to the FlashArray are down. Pool expansions may fail in reduced connectivity scenarios, such as Purity upgrades, data network issues, or others. Workaround: Restore all paths to the array before attempting the resize again. |
PWX-21276 | If some valid and some invalid FlashArray or FlashBlade endpoints are provided, Portworx will fail to start. If invalid credentials or endpoints are entered (or an API token expires), Portworx will fail to start. Workaround: Correct the credentials in the secret and restart Portworx. |
PD-1024 | When using Portworx with FlashBlade (FB), users store login credentials, such as FA/FB IPs and access tokens, in the px-pure-secret Kubernetes secret. In the event that an access token expires or is otherwise invalidated, Portworx automatically provisions workloads onto the next accessible FB to avoid interruptions. As a result, users may not be alerted when FlashBlades become inaccessible, and workloads can concentrate on the remaining FlashBlades, impacting performance. Workaround: To avoid this issue, ensure the credentials stored in px-pure-secret are valid. If you find invalid credentials, correct them and restart Portworx to restore full use. |
2.8.1.6
May 20, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-23997 | A kernel panic occurs if any application tools try to perform a grep operation or user file system level commands on the pxd-control device. User Impact: The affected node will experience a kernel panic due to the Portworx kernel module being unable to handle the filesystem user level commands. Resolution: Portworx now handles these kinds of commands on pxd-control devices by denying access, preventing kernel panic. |
2.8.1.5
Mar 1, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22968 | In vSphere environments, unused virtual disks were left lingering around. User impact: Users may have seen multiple KVDB disks lingering around on worker nodes without getting cleaned up. Resolution: Portworx now detects these lingering disks and deletes them. |
2.8.1.4
Feb 15, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21514 | In vSphere environments, Portworx sometimes failed to remove KVDB drives. User impact: Users saw an additional KVDB drive when they listed all available drives in the Portworx cluster. |
2.8.1.3
Jan 24, 2022
Notes
- Portworx now includes kernel module support for 4.15.0-163-generic.
2.8.1.2
Nov 2, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21485 | The NFS rpc-statd service sometimes failed to start on Portworx nodes, preventing sharedv4 volumes from mounting. User Impact: Application pods using sharedv4 volumes sometimes became stuck in the ContainerCreating state, with volume mount operations failing with the error: mount.nfs: rpc.statd is not running but is required for remote locking.mount.nfs: Either use '-o nolock' to keep locks local, or start statd.\nmount.nfs: an incorrect mount option was specified Resolution: Portworx now detects when rpc-statd is either not running or is in an inconsistent state, and restarts it to ensure that sharedv4/NFS mounts proceed. |
2.8.1.1
Oct 13, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21506 | One of the Internal folders used for mounting the legacy shared (FUSE) volumes was always created, even if shared (FUSE) volumes were already present on the system. User impact: Mount paths used by shared (FUSE) volumes had wider permissions than desired. Resolution: This change prevents the internal mount path, specifically /opt/pwx/oci/rootfs/pxmounts , from being created when there are no shared volumes being used. |
2.8.1
Sept 20, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-21172 | In a Metro DR configuration, there can be multiple cluster domains within the same cluster. When a sharedv4 volume is created, its replicas are placed across these cluster domains. If an app requests a sharedv4 volume, then there is no guarantee which replica node will act as the sharedv4 NFS server. This improvement ensures that whenever a sharedv4 application is started in any of the domains, the volume is attached to a node in the same domain where the application is running, guaranteeing minimum latency. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21227 | While trying to run IOs on multiple volumes with an overlapped overwrite pattern, an sm abort error sometimes occurred on one of the nodes. User impact: This error caused Portworx to restart. Resolution: This rare deadlock during resync no longer occurs. |
PWX-21002 | A deadlock in the NFSWatchdog path caused lock contention and an inspect command to hang. User impact: Users experienced a mounting issue for the affected volume and saw the pod stuck in the ContainerCreating state. Resolution: This deadlock has been eliminated. |
PWX-21224 | Setting cloudsnap threads to 4 or less resulted in cloudsnap backups in hung state. User Impact: With cloudsnap threads (Cloudsnap maximum threads field in cluster options) set to 4 or below and doing more than 10 cloudsnaps at the same time, cloudsnaps would be in hung/stuck state without making any progress. Resolution: Incorrect check for thread count resulted in deadlock causing the above scenario, which has been addressed in this release. |
PWX-20949 | Issue" Cloudsnap restore may fail with error message "Restore failed due to stall". This happens if restore incorrectly thinks that the node is in maintenance mode. User Impact: These restores may never complete and user may with need to restart the node where the restore is stalled or reissue the restore command Resolution: This change fixes the issue where cloudsnap does not interpret node status. An upper layer module handles it preventing the issue. |
PWX-19533 | Fixed an issue where a node may accumulate over-writes for a volume causing the px-storage process to restart.User impact: None |
PWX-21163 | Prometheus was unable to access the Portworx internal etcd on a multi network interface setup, hence etcd alerts are not seen. User impact: Prometheus alerts and monitoring did not work properly for the alerts related to internal etcd on a multi-network interface setup. Resolution: Portworx is now allowed internal etcd access from all network devices, giving Prometheus access to scrape alerts. |
PWX-21197 | There was a regression in using limit_drives_per_pool in 2.8.0. User impact: limit_drives_per_pool is a runtime option to control the number of drives in a pool. The last pool could have more drives than the limit if the drive count was not an exact multiple of the limit, and if creating another pool would have resulted in too few drives within it. Internally, a new pool is created if the drive count is at least 50% of the limit available in the last pool. Resolution: These limits are now honored more strictly when a drive is added from maintenance mode. Any drive add operation will fall into a pool only if the drive count is within the limit. If not, a new pool will be formed (see the sketch following this table). |
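Runtime options such as limit_drives_per_pool are typically supplied to Portworx at install or configuration time. Purely as an illustration, the fragment below assumes an Operator-managed install where the StorageCluster spec exposes a runtimeOptions map; the cluster name, image tag, and value shown are hypothetical.

```yaml
# Hypothetical StorageCluster fragment: passes limit_drives_per_pool so that
# each storage pool holds at most 6 drives (value shown is illustrative).
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster-example      # assumed name
  namespace: kube-system
spec:
  image: portworx/oci-monitor:2.8.1
  runtimeOptions:
    limit_drives_per_pool: "6"
```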
2.8.0
July 30, 2021
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Early access support for Pure FlashBlade as a Direct Access filesystem. With this feature, Portworx directly provisions FlashBlade NFS filesystems, maps them to a user PVC, and mounts them to pods. Reach out to your account team to enable this feature.
- Early access support for Pure FlashArray cloud drives. Use FlashArrays as a cloud storage provider. Reach out to your account team to enable this feature.
- Snapshot optimization using extent metadata: Reduce the amount of data sent to your cloud storage provider when taking cloud snapshots.
- SkinnySnaps: improve the performance of your storage pools when taking volume snapshots.
- Sharedv4 service volumes: improve fault tolerance by associating sharedv4 volumes with a Kubernetes service.
- You can now install Portworx on Nomad with CSI enabled
- Install and scale a Portworx cluster on VMware Tanzu with CSI.
- With Pure1 integration, Portworx can now automatically upload its diags to Pure Storage's call home service called Pure1.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20720 | Portworx now supports SharedV4 volumes on VMware Photon hosts |
PWX-20131 | You can now resize disk pools on Azure with disks of up to 32 TiB. |
PWX-20060 | The Portworx spec generator now creates the GA/v1 API version of the CSI VolumeSnapshot CRDs. |
PWX-18845 | Portworx now supports Amazon General Purpose SSD volumes (gp3). |
PWX-10281 | The Portworx CSI driver now supports Raw Block volumes for RWO PVCs. |
PWX-20102 | Licensing improvement: Autopilot licenses are now automatically included with all Portworx Enterprise licenses, including floating licenses. |
PWX-19553 | Alerts now bypass the API queue. As a result, pxctl will still show alerts even when the API queue is full. |
PWX-19496 | Kubernetes PVCs are now created with the CopyOnWrite on demand setting enabled by default. |
PWX-19320 | Portworx CSI Driver volumes will now be rounded up to the nearest GiB to match the Portworx in-tree volume plugin. This change only occurs for new volumes and volume size updates. |
PWX-20803 | Added Photon support for pool caching. If pool caching is enabled, Portworx tries to install the required packages; if that installation fails, the Portworx installation fails. Pool caching requires the following available packages:
|
PWX-20527 | The maximum number of cloud drives per node (not per pool) has increased from 12 to 32. Note that specific cloud providers may impose their own limits (remaining at 12), and that there are still limits per pool that may come into effect sooner. |
PWX-20423 | Sharedv4 Export Options Improvements: The storage class option export_options can now take any NFS export option as a comma-separated list of strings. Portworx will apply those export options on the node where the volume is attached and exported over NFS. See the sketch after this table for an example storage class. |
PWX-20204 | For cloud drives, the requirement to specify skip_deprecation has been removed; users can now use the pxctl sv drive add command without it. |
PWX-18529 | Portworx now reports back home on trial license installations. |
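The following is a minimal sketch of a sharedv4 StorageClass that passes custom NFS export options through export_options (PWX-20423 above). The StorageClass name and the specific NFS export options shown are illustrative assumptions, not recommendations; use values appropriate for your environment.

```
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-custom-exports   # hypothetical name
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  sharedv4: "true"
  # Comma-separated NFS export options; the values below are illustrative only.
  export_options: "no_root_squash,sync"
EOF
```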
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20780 | Pods using encrypted sharedv4 volumes sometimes got stuck in the terminating state. User impact: If a node hosting a replica of a sharedv4 encrypted volume (server node) was rebooted, the application pods accessing that volume sometimes got stuck in the terminating state. Resolution: Portworx now detects when a sharedv4 encrypted volume server node restarts, and will automatically restart application pods using that volume to recover them to a functional state. |
PWX-20417 | On Kubernetes v1.20.1 with the containerd v1.4.4 container runtime, the Portworx installer used an invalid cgroups-path. User impact: Portworx failed to install. Resolution: Portworx now properly installs on this Kubernetes version and runtime. |
PWX-20789 | A pool expand status message was incorrect. User impact: The status message for the pool expand operation that appears in the pxctl service pool show command output was inaccurate when the operation type was add-disk . Users saw an incorrect size amount by which the pool was being expanded, but the operation functioned properly. Resolution: The status message now correctly states the size of the pool that is expanding. |
PWX-20327 | When running in the cloud at scale (more than 200 nodes), certain nodes flapping or restarting in a loop caused healthy Portworx nodes to slow down processing KVDB watch updates. User impact: Certain volume operations (for example, create or attach) took a long time or failed. |
PWX-20085 | Portworx failed to install when both -metadata and -kvdbDevice options were passed. User impact: Users saw their installations fail if they provided both options. Resolution: If users provide both options, Portworx installation will no longer fail. It now defaults to using the -metadata device for KVDB to maintain backward compatibility. |
PWX-19771 | Communication between nodes could block forever. User impact: Users sometimes saw volume access or management operations hang indefinitely without timing out. Resolution: All communications in the Portworx cluster now time out. A new pxctl cluster option allows you to configure a default RPC timeout. |
PWX-20733 | If a custom container registry was specified and the cluster was using containerd, the oci-monitor attempted to pull the Portworx Enterprise image from the wrong registry. User impact: Users saw installation fail. Resolution: oci-monitor now pulls the Portworx Enterprise image from the correct custom registry. |
PWX-20614 | When multiple oci-monitor container images were present (using the same image-hash but a different name), Kubernetes sometimes started the wrong container image. User impact: Users potentially saw the wrong Portworx Enterprise image loaded on the nodes. Resolution: The oci-monitor now consults multiple configuration entries when deciding which px-enterprise image needs to be loaded. |
PWX-20456 | Fixes an installation problem with the containerd-v1.4.6 container runtime running in "systemd cgroups" mode. User impact: If you ran a Kubernetes cluster configured with the containerd-v1.4.6 (or higher) container runtime and the systemd cgroups driver, the Portworx service failed to start. Resolution: Portworx startup issues are now resolved when using this container runtime/configuration. |
PWX-20423 | The security_label export option was removed after a node reboot. User Impact: When using sharedv4 with SELinux enabled, an app using sharedv4 volumes sometimes saw permission issues if the node where the volume was attached was rebooted. Resolution: Improvements to the sharedv4 volumes resolved this issue. |
PWX-20319 | Fixes an issue with Portworx service restarts hanging indefinitely, when DBus service is not responding. User impact: On host-systems that have a non-responsive DBus service, Portworx startup used to hang indefinitely. Resolution: Portworx startup no longer hangs if it cannot connect to the DBus service. |
PWX-20236 | In some scenarios, volume unmount may fail due to EIO errors on mount path, which could be due to prolonged downtime on the volume. User Impact: Pods may fail to terminate and reboot. Resolution: Continue to unmount the volume even when readlink fails with EIO on mount path. This allows pods to continue with remount of the volume. |
PWX-20187 | When passing a Kubernetes secret for etcd username/password using environment variables, they were taken and used "as is", rather than being "expanded" and replaced with the actual values from Kubernetes secret. User impact: When specifying the etcd username/password, the environment variables (e.g. populated by Kubernetes secrets) were not being "expanded" before configuring etcd connection. Resolution: You can now specify the etcd username/password via the environment variables. |
PWX-20092 | Queued backups may fail if the volume replica is reduced in such a way that the replica on the node assigned for queued backup gets removed. User Impact: Queued backups failed. Resolution: Users can wait for queued backups to complete before running the ha-reduce command, or re-issue the backup command once the HA reduce is complete. |
PWX-19805 | Portworx couldn't unmask the rpcbind service. User impact: Portworx could not properly integrate with NFS services, if NFS services were masked on the host. As a consequence, sharedV4 volumes could not be served from that host. Resolution: The Portworx service startup/restart now checks for masked NFS services, and automatically unmasks them. |
PWX-19802 | Migrations were failing with error - "Too many cloudsnap requests please try again" User Impact: If a cluster migration was triggered which migrated more than 200 volumes at once, the migration would fail since Portworx would rate limit the cloudsnap requests. Resolution: Cluster Migration will not fail if the internal cloud snap requests are rate limited. Portworx will gracefully handle those "busy" errors and retry the operation until it succeeds. |
PWX-19568 | When an NVMe partition is specified as a storage device, the device state is shown as "offline". User impact: Users sometimes saw their partitioned NVMe drive state as "offline". Resolution: NVMe partitions given as storage devices now correctly show as "online". |
PWX-19250 | Talisman px-wipe did not support etcd with a username and password. User impact: The automated cluster-wipe procedure did not purge the cluster's data from the key-value database when an external username/password-enabled etcd was configured as the cluster's KVDB. Resolution: The cluster-wipe procedure now also purges the data from username/password-enabled etcd. |
PWX-19209 | OCI-Mon/containerd occasional install/upgrade glitches. User impact: During Portworx upgrades on containerd container runtimes, there was a race condition with the cleanup of the install container, which could block the upgrade until the node was rebooted. Resolution: The cleanup procedure was improved, so the race condition no longer occurs. |
PWX-18704 | Drive add did not respect limit_drives_per_pool. User impact: When the runtime option limit_drives_per_pool was set to 2, a pool delete/initialize/drive add operation sometimes resulted in pools of more than 2 drives. Resolution: The px-runc install-time configuration limit_drives_per_pool is now honored during drive add operations. |
PWX-18370 | A tracker file from the previous Portworx version was not deleted during wipe, causing sharedv4 volume mount issues. User impact: If you deleted a Portworx version and reinstalled a new version, the wipe process did not remove a tracker file used for sharedv4 volumes, causing new sharedv4 volume mounts to fail after reinstallation. Resolution: The tracker file is now removed. |
PWX-20485 | In versions earlier than Portworx 2.7.0, pxctl volume check on an ext4-formatted secure pxd volume returned the "Background Volume Service not supported on Encrypted volumes" error. User Impact: Users couldn't use pxctl to run volume checks on ext4-formatted secure pxd volumes. A workaround was to run fsck directly from within the Portworx container. Resolution: Portworx now supports volume check on ext4-formatted secure pxd volumes using pxctl. |
PWX-20303 | Fixed an issue where KVDB updates were not pulled in from other nodes when a node was unable to get the updates from the KVDB. |
PWX-20398 | Fixed an issue where request processing started before the px-storage process was initialized, causing it to restart. User impact: These restarts may have created core files; users may have seen the process restart. |
PWX-20690 | The px-storage process incorrectly finished processing internal timestamps, causing it to restart. User impact: These restarts may have created core files; users may have seen the process restart. |
PWX-19005 | Cloudsnaps may be stuck irrespective of whether you issued stop on the cloudsnap taskID. This sometimes happened when the local snapshot was deleted while cloudsnap was still active. In these conditions, snap detach operations failed and caused cloudsnaps to get stuck. User Impact: Users saw stuck cloudsnaps, which could only be fixed by restarting the node where the stuck cloudsnap was active. Resolution: Detach failures are not retried forever, minimizing this scenario. |
PWX-20708 | Cloudsnap size was not tracked correctly for incremental cloudsnaps in cloudsnap metadata. User Impact: Users may not see correct cloudsnap size because of this issue. Resolution: Now the incremental cloudsnap size is tracked correctly. |
PWX-20364 | An incorrect check required all nodes in ReplicaSets to be online. User impact: Cloudsnaps could fail on restart if some of the replica nodes were offline. Resolution: Removed the check for online nodes. |
PWX-20237 | Cloudsnap operations did not choose the node where the previous cloudsnap was executed, specifically for volumes whose HA level was reduced. User impact: Some cloudsnaps could be shown as full, even though they could have been incremental. Resolution: If the volume replicas differ between the previous and current cloudsnap, Portworx now chooses the same replica node where the previous backup ran. |
Known issues (Errata)
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PD-896 | With 2.8.0, the --sharedv4_service_type option was added to the pxctl volume create and pxctl volume update commands. This new option is not applicable to non-Kubernetes environments, such as Nomad. This option works only in Kubernetes environments. |
PWX-20666 | The issue occurs during the first boot of a newly created cluster, if the node runs Ubuntu 20.04/Fedora CoreOS 33.20210301.3.1 as the host and you use the -j auto option. Any Linux distribution using systemd version 245 and above appears to have this problem; Ubuntu 18.04 containers use systemd version 237. User impact: Portworx takes additional time to come up. Recommendation: Do not use the -j auto option. If you do use the option, consider installing the parted command on the host. If that is not possible, wait for Portworx to stabilize. This wait is only needed on the first boot after new cluster creation. |
PWX-20836 | When using Portworx cloud drives and using a node selector in the StorageCluster CRD specification, a node may come up with a drive configuration that is different from what is specified in the StorageCluster CRD under the node selector. This occurs when one Portworx node creates a drive based on the node selector configuration. If that drive is not currently being used by that node, another node that is trying to initialize may pick up this available drive (even if that newly created drive does not match with the user specified configuration for that node). |
OPERATOR-410 | A workaround for deploying Operator on OCP (IBM) with nodes labeled as worker/master: Remove the nodeAffinity node-role.kubernetes.io/master from the live StorageCluster spec to deploy OCI pods on control plane nodes. |
PD-891 | Currently, Photon OS versions 3.0 and 4.0 do not allow a user to install mdadm using the default yum and tdnf repositories. As this package is required for px-cache deployment, Pure Storage does not support caching on Photon OS in version 2.8.0. |
PWX-20839 | Portworx 2.8.0 does not support the Photon distro for px-cache. |
PWX-20586 | When creating a burst of PVCs (a large number of PVCs within a few seconds) with the Portworx CSI driver or a separate Portworx PVC controller, the Portworx volume creations can get stuck in a pending state. Under this condition, pxctl volume list will show these volumes as "down - attached". The volume creations will eventually converge and complete, but this can take more than 1 hour. Workaround: Stagger PVC creations in smaller batches if using the CSI driver or the Portworx PVC controller. |
PWX-13190 | The pxctl credentials delete <uid> command is currently not supported for the Kubernetes secret provider. To delete the credentials, use kubectl delete secret <> -n <namespace> , where <namespace> is where Portworx is installed. See the sketch after this table for an example. |
PWX-9487 | In a metro DR setup where Portworx spans across 2 data centers, a node can be started with the argument --cluster_domain and the value set to witness . This node will act as a witness node between the two data centers, and it will also contribute to quorum even if it is started as a storageless node. |
PWX-20766 | The Portworx CSI Driver on OpenShift 3.11 may have issues starting up when a node is rebooted. Customers are advised to upgrade to OpenShift 4.0+ or Kubernetes 1.13.12+. |
PWX-20766 | The Portworx CSI Driver will not be enabled by default in the PX-Installer for Kubernetes 1.12 and earlier unless a flag csiAlpha=true is provided. |
PWX-20423 | Portworx uses a set of default export options in storage classes which cannot currently be overridden. The export_options parameter only allows extending the current default options. |
PD-915 | If a storage node is removed or replaced from the cluster and that node was the NFS server for a sharedv4 volume, client application pods running on other nodes can get stuck in pod Terminating state when deleted. User impact: Application pods using sharedv4 volumes remotely will get stuck in Terminating state. Recommendation:
|
PD-914 | When a pod that is using a sharedv4 service volume is scheduled on the same node where the volume is attached, Portworx sets up the pod to access the volume locally via a bind mount. If the pod is scheduled on a different node, the pod uses an NFS mount to access the volume remotely. If there is a sharedv4 service failover, the volume gets attached to a different node. After the failover, pods that were accessing the volume remotely over NFS continue to have access to the volume. But the pods that were accessing the volume locally via bind-mount lose access to the volume, even after Portworx is ready on the node, since the volume is no longer attached to that node. Such pods need to be deleted and recreated so that they start accessing the volume remotely over NFS. If Stork is enabled in the Kubernetes cluster, it automatically deletes such pods, but manual intervention may be required if Stork is either not installed or fails to restart such pods. Recommendation: Enable Stork to reduce the likelihood of running into this issue. If you do run into this issue, use the command kubectl delete pod -n <namespace> <pod> to delete the pods which can no longer access the sharedv4 service volume because the sharedv4 service failed over to another node. This problem does not apply to pods that are using sharedv4 volumes without the service feature. It also does not apply to pods that are accessing sharedv4 service volumes remotely, i.e., from a node other than the one where the volume is attached. |
PD-926 | For information about Sharedv4 service known issues, see the notes in the Provision a Sharedv4 Volume section of the documentation. |
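For PWX-13190 above, a minimal sketch of the suggested workaround, assuming Portworx is installed in the kube-system namespace; the secret name is a placeholder, so first list the secrets to find the one that corresponds to the credentials you want to remove.

```
# List secrets in the namespace where Portworx is installed (kube-system assumed here).
kubectl get secrets -n kube-system

# Delete the secret that backs the credentials (placeholder name shown).
kubectl delete secret <credentials-secret-name> -n kube-system
```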
2.7.4
August 27, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21057 | Pods failed to come up with restored PVCs that were encrypted with Vault namespace secrets. User impact: Pods using a PVC cloned from a snapshot that was encrypted using Vault namespace secrets remained stuck in the ContainerCreating state. Resolution: Portworx now copies over the values required for encrypted volumes to the cloned PVC, so pods no longer remain stuck in ContainerCreating. |
PWX-21139 | In a DR setup, during the failover and failback of an encrypted volume, some labels that Portworx used for encrypting the volumes got removed. User impact: Failback or restore of encrypted volumes using per-volume secrets would fail as restore of the volume on the source cluster would fail. Resolution: The cloud backups and restores done as a part of failover and failback of encrypted volumes ensure that the encryption related labels are not removed. |
2.7.3
July 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20323 | Portworx now tries to reconnect to the KVDB at least 3 times before restarting the Portworx process. |
PWX-19994 | Added two new runtime options: quorum_timeout_in_seconds : Sets the maximum time, in seconds, for which nodes will wait to reach quorum. After this timeout, Portworx will restart. kv_snap_lock_duration_in_mins : Sets the maximum timeout for which Portworx will wait for a KVDB snapshot operation to complete. After this timeout, Portworx will panic and restart if the snapshot does not complete. See the sketch after this table for one way to pass runtime options. |
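A sketch of one way to pass the runtime options introduced in PWX-19994, assuming an Operator-managed install where miscellaneous Portworx arguments are supplied through the portworx.io/misc-args annotation on the StorageCluster; the annotation mechanism, the stc short name, the namespace, and the specific timeout values are assumptions to adapt to your install.

```
# Assumption: Operator-based install; adjust the namespace and StorageCluster name.
# The timeout values below are illustrative only.
kubectl -n kube-system annotate stc <storagecluster-name> --overwrite \
  'portworx.io/misc-args=-rt_opts quorum_timeout_in_seconds=600,kv_snap_lock_duration_in_mins=30'
```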
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20641 | RedHat OpenShift 4.7.16 write-locks partitions Portworx uses during upgrades. Portworx versions 2.5.x through 2.7.2.1 generated chattr-protected immutable /etc/pwx/.private.json files. Lastly, OpenShift 4.7 started protecting the root partition (using read-only mountpoints). User impact: These files were sometimes collected into OpenShift's historical snapshots and interfered with OpenShift upgrades. Resolution: The px-runc startup service now scans and fixes any immutable .private.json files found in CoreOS file-system snapshots and OpenShift snapshots. This works for both read-only and read-write partitions. |
PWX-15391 | When using an internal KVDB, Portworx encountered an error if one of the KVDB nodes went down. User impact: Users saw impacted filesystems enter read-only mode. Resolution: The run-flat feature keeps the Portworx volumes online, even if the KVDB is down. New create/attach/mount operations are not allowed, but existing volumes don't see an I/O interruption as long as all the replicas of the volume are online. Note the following implications for the external and internal KVDB: External KVDB: as long as all the Portworx nodes continue to stay up, all the volume replicas should stay online and with no I/O interruption. However, if any node goes down, volumes with a replica on the down nodes will see an outage. Internal KVDB: if the KVDB is down, it almost certainly implies that at least 2 Portworx nodes are down. Any volumes with a replica on the down nodes will see an I/O interruption. |
PWX-20075 | CSI VolumeSnapshotContent objects incorrectly displayed a restore size of 0. User impact: External backup systems that depend on the CSI VolumeSnapshotContent restore size sometimes failed. Resolution: The Portworx CSI driver now correctly adds the restore size to new CSI volume snapshots, and snapshot contents will have the correct RestoreSize. |
PWX-19518 | Overwriting a cluster wide secret when using Vault Namespaces failed with the NotFound error. User impact: Users were unable to use Vault as their secret management store. Resolution: The issue has been fixed and Portworx now uses the correct vault namespace while resetting the cluster wide secret. |
PWX-20149 | Portworx encrypted volume creation failed due to lock expiration. User impact: In certain scenarios, Portworx encrypted volume creation took longer than expected and eventually timed out. Resolution: Portworx no longer times out when creating encrypted devices. |
PWX-20519 | On vSphere environments experiencing high I/O latency, Portworx cluster installation failed while setting up the internal KVDB. User impact: Users saw the internal KVDB fail to initialize the disks within the allocated time. Resolution: Portworx now initializes a "thin" disk, rather than a "zeroedThick" disk, by default; this option can be overridden. |
2.7.2.1
June 25, 2021
Notes
- Portworx 2.7.2.1 once again supports installations on Ubuntu 16.04 with the 4.15.0-142-generic and 4.15.0-144-generic kernels. See the list of supported kernels for information.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20594 | Portworx erroneously allowed the creation of large replication sets, causing the px-storage process to create a core file. User impact: Users required help from support to fix the volume definition. Resolution: Portworx no longer allows the creation of replication sets larger than 3. |
PWX-20629 | I/O did not progress for Portworx volumes and mount points until the px-storage process was restarted. User impact: I/O may not have progressed and users may have had to restart the px-storage process. Resolution: Portworx no longer requires a restart of the px-storage process. |
PWX-20619 | I/O to the sharedv4 volumes was blocked while the NFS server reloaded the exports. When there were a large number of sharedv4 volumes being exported from the node, I/O was blocked for a prolonged period of time. User impact: Apps using the df command saw it take a long time to complete, causing pods to reset unnecessarily when df was used as a health-check. Removing one of the export options fixed this issue. Resolution: The df command will no longer slow down under these circumstances. |
PWX-20640 | When uploading diagnostics, Portworx made its configuration file immutable. User impact: When made immutable, configuration files may interfere with OpenShift upgrades. Resolution: Portworx no longer makes its configuration file immutable when uploading diagnostics. |
2.7.2
May 18, 2021
Notes
- Portworx 2.7.2 no longer supports installations on Ubuntu 16.04 with the 4.15.0-142-generic kernel. Upgrade to Ubuntu 18.04. See the list of supported kernels for information.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20106 | Portworx now supports various kernels hosted on IBM Cloud in air-gapped environments. Specifically, Portworx supports the 4.15.0-142-generic kernel on Ubuntu 18.04. |
PWX-20072 | Users can now install Portworx from the IBM catalog onto a private cluster and enable the integrated license and billing feature for these clusters. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14559 | The read and write throughput values in Grafana and Prometheus were erroneously transposed. User impact: Users saw read throughput values where they expected to see write throughput values, and vice versa. Resolution: These values now reflect the correct metric in Grafana and Prometheus. |
2.7.1
April 29, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18530 | Storage pool caching now supports caching on SSD pools, which will be cached with NVMe drives if they're available. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-19368 | If Floating licenses were used on a cluster, wiping a node or the cluster didn't return the license leases back to the license server. User impact: Users had to wait for the license leases to expire in order to reuse them. Resolution: Wiping a node or cluster now correctly releases the license leases back to the license server. |
PWX-19200 | Portworx was unable to attach AWS cloud drives when running on AWS Outpost. User Impact: When running on AWS Outpost, Portworx failed to attach the backing drives, causing the cluster initialization to fail. Resolution: Portworx now includes the Outpost ARN in the EBS volume creation, which allows the volume to be attached to the instance. |
PWX-19167 | In very rare cases, the px-storage process sometimes aborted and restarted due to a race condition when releasing resources. User impact: Users experienced no interruptions, but may have seen Portworx restart. Resolution: This race condition no longer occurs. |
PWX-19022 | Pods sometimes failed to mount a volume and may not have started if that volume was first attached for background work and then later attached for mounting purposes. User impact: Users saw their pods fail to start. To correct this, they had to detach and reattach the volume, usually by stopping and restarting the affected application. Resolution: Pods no longer fail to mount volumes under these circumstances. |
PWX-18983 | An issue with security and non-CSI deployments during volume deletion caused Portworx to return incorrect information when it detected an error during a request to inspect a volume. User impact: Users saw incorrect information when inspecting a deleted volume. Resolution: Portworx now displays the correct information when inspecting a volume under these circumstances. |
PWX-18957 | After recovering an offline cluster from a KVDB backup file, that cluster's license entered an invalid state. This was caused by the ClusterUUID in the restored KVDB having extra quotes. User impact: Users attempting to recover clusters in this manner saw their licenses fail to restore correctly. Resolution: During the recovery process, Portworx now ensures that no extra quotes are added to the ClusterUUID once the recovery is done. |
PWX-18641 | Portworx displayed an incorrect alert for snapshots when the parent volume's HA level was decreased. User impact: Users may have seen this incorrect alert. Resolution: Portworx will no longer attempt to run HA level reduce operations on snapshots which have an HA level of 1 (which fails and triggers incorrect alert) when the HA level is reduced on the snapshot's parent volume. |
PWX-18447 | Portworx enabled the sharedv4/NFS watchdog even if the --disable-sharedv4 flag was set. User impact: Despite disabling the sharedv4 volume feature, users may have seen errors about NFS/sharedv4 being unhealthy. Resolution: Portworx no longer enables the NFS watchdog if sharedv4/NFS is disabled. |
PWX-17697 | Users couldn't remove storage pool labels. User impact: When users attempted to remove storage pool labels, they saw the command return Pool properties updated , but Portworx didn't remove the label. Resolution: Storage pools now have the same behavior as volumes: you can remove a previously added label by passing --labels <key=> without a value. |
PWX-7505 | Unsecured nodes could be added to a secured cluster, specifically if those nodes were part of a different Kubernetes cluster with a different configuration manifest. User impact: This allowed any of the unsecured nodes to join a secured cluster. As a result, the API endpoints for the unsecured nodes would be unsecured and allow anyone to execute any pxctl or RPC request. Resolution: Portworx can now be configured using the PORTWORX_FEATUREGATE_CHECK_NODE_SECURITY feature gate to prevent unsecured nodes from joining a cluster if at least one node is secured; see the sketch after this table. |
PWX-19503 | The px-storage process initialization got stuck if the "num_cpu_threads" or "num_io_threads" rt_opts value did not equal the "num_threads" rt_opt value. User impact: Portworx didn't come up, and users needed to remove the "num_threads" rt_opts for the px-storage process to finish initialization. Resolution: Portworx initialization no longer gets stuck. |
PWX-19383 | A Kubernetes RBAC issue with the CSI resizer installation caused CSI PVC resizing to fail. User impact: Users saw CSI PVC resize failures. Resolution: The Portworx spec generator now correctly adds the necessary RBAC for the CSI Resizer to function properly. |
PWX-19173 | CSI VolumeSnapshotContent objects incorrectly displayed a restore size of 0. User impact: External backup systems that depend on the CSI VolumeSnapshotContent restore size sometimes failed. Resolution: The Portworx CSI driver now correctly adds the restore size to a VolumeSnapshotContent object. |
PWX-18640 | The Portworx alert VolumeHAUpdateFailure has been updated to VolumeHAUpdateNotify for cases where the update is not failing. User impact: Users saw misleading VolumeHAUpdateFailure alerts when an update succeeded. Resolution: The Portworx alerting system now sends the correct alert event for this case. |
PWX-19277 | Cloudsnaps sometimes failed to attach the internal snap for an aggregated volume if the node containing the aggregated replica was down. User impact: While the cloudsnap operation was marked as failed, the error description did not display the correct error message. Solution: Cloudsnaps no longer fail to attach, and error messages now correctly indicate that the node is down. |
PWX-19797 | With 2.7.0, cloudsnap imposed restrictions on active cloudsnap commands being processed. User impact: Async DR sometimes failed for some volumes. Solution: 2.7.1 increases the number of commands being processed to a much higher value, thereby avoiding async DR failures. |
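A sketch of how the PORTWORX_FEATUREGATE_CHECK_NODE_SECURITY feature gate from PWX-7505 could be enabled, assuming an Operator-managed install where environment variables are set under spec.env of the StorageCluster; the namespace, StorageCluster name, and the "true" value are assumptions.

```
# Assumption: Operator-based install; patch the StorageCluster to add the feature gate env var.
# Note: a merge patch replaces the whole spec.env list, so include any existing entries too.
kubectl -n kube-system patch stc <storagecluster-name> --type merge -p \
  '{"spec":{"env":[{"name":"PORTWORX_FEATUREGATE_CHECK_NODE_SECURITY","value":"true"}]}}'
```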
2.7.0
March 23, 2021
New features
- Announcing the auto IO profile, which applies an IO profile to optimize volume performance based on the workload data patterns it sees.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16376 | Multiple KVDBs will now run within the same failure domain if there aren't enough failure domains available to place each KVDB on its own. |
PWX-11345 | Introducing a new Prometheus metric to track latencies Portworx sees from the KVDB. This metric tracks the total time it takes for a KVDB put operation to result in a corresponding watch update from the KVDB. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18789 | Expanding a pool using the lazyzeroedthick disk type in vSphere failed with the error: "found no candidates which have current drive type: lazyzeroedthick". User impact: If a user expanded a pool using the lazyzeroedthick disk type in vSphere, it failed with a message like: "could not find a suitable storage distribution candidate due to: found no candidates which have current drive type: lazyzeroedthick". Resolution: This happened because a section for the lazyzeroedthick disk type was missing in the storage decision matrix config map that ships with Portworx; this has now been added. |
PWX-17578 | Updating volumes with pxctl commands sometimes reset other values that were previously set by a spec. User impact: If users updated the queue-depth value using the pxctl volume update --queue-depth command, the directIO value for that volume was reset. Resolution: Portworx no longer resets volume-specific fields when other fields are updated using pxctl commands. |
PWX-18724 | When using the runtime option rt_opts_conf_high under heavy load, the Portworx storage process sometimes ran out of internal resources and had to restart due to an assertion failure. User impact: Upon restart, the Portworx process may have gotten stuck in a restart loop, resulting in application downtime. Resolution: The resources are now sized correctly when the rt_opts_conf_high runtime option is in use. |
PWX-18632 | Portworx displayed expiration dates for permanent licenses. User impact: Users saw a distinct expiration date for their permanent licenses. Despite this reporting error, permanent licenses would not actually expire. Resolution: Portworx now correctly reports permanent licenses as never expiring. |
PWX-18513 | If a volume with an io_profile set to db_remote and replication level of 1 was backed-up using a cloudsnap, attempting to restore that cloudsnap would fail through Stork or in PX-Backup. User impact: Users attempting to restore this kind of cloudsnap without providing an additional parameter to force the replication level to 2 encountered an error. Resolution: The cloudsnap restore operation now resets the io_profile to sequential if it finds a volume with a replication level of 1 and io_profile set to db_remote . |
PWX-18388 | Pool expand operations using the resize-disk method failed on vSphere cloud drive setups. User impact: Users with storage pools powered by vSphere cloud drives could not expand them using the resize-disk method. Resolution: The resize-disk method failed because the rescan-scsi-bus.sh script was missing from the Portworx container. This script has been replaced, and users can once again expand vSphere cloud drive storage pools using resize-disk . |
PWX-18365 | Portworx overrode the cluster option for optimized restores if a different runtime option for optimized restores was provided. User impact: Because Portworx prefers cluster options over runtime options as a standard, users may have been confused when this runtime option behaved differently. Resolution: Portworx no longer honors runtime options for optimized restores; you must use cluster options to enable optimized restores. |
PWX-18210 | The px/service node-label is used to control the state of the Portworx service. However, the px/service=remove label did not properly remove Portworx. User impact: When users attempted to remove a node, Portworx became stuck in an uninstall loop on that node. Resolution: The px/service=remove label now behaves as it previously did, and uninstalls Portworx on the node as expected (see the sketch after this table). |
PWX-17282 | Previously, every Portworx deployment using the Operator included Stork, regardless of whether or not you enabled the Stork component on the spec generator. User impact: Users' deployments always included Stork, even if they did not want to enable it. Resolution: The spec generator now correctly excludes Stork if you don't enable it. |
PWX-19118 | A resize operation performed while a storage node was in the StorageDown state caused the px-storage process to restart if the node had replica for the volume being resized. User impact: Users experienced no interruptions, but may have seen Portworx restart. Resolution: the px-storage process no longer restarts under these circumstances. |
PWX-19055 | Portworx clusters did not auto-recover when running on vSphere cloud drives in local mode. User Impact: When users installed Portworx using vSphere cloud drives on local, non-shared datastores and used an internal KVDB, Portworx did not automatically recover if a storage node went down. Resolution: Portworx will no longer incorrectly mark its internal KVDB drives as storage drives, allowing the internal KVDB to recover as expected. |
PWX-17217 | Portworx failed to exit maintenance mode after a drive add operation was shown as done . User impact: Portworx stayed in maintenance mode and users could not exit it. Resolution: Portworx now properly exits maintenance mode after a drive add operation. |
PWX-19060 | When Portworx was configured to use email, an entry was printed to the log that contained hashed email credentials. User impact: Hashed, potentially sensitive information may have been written to the logs. Resolution: Portworx no longer prints this hashed information into the log. |
PWX-19028 | Portworx sometimes hung when evaluating multipart licenses where one of the licenses had expired. User impact: Users saw Portworx hang and had to reset the Portworx node. Resolution: Portworx no longer hangs in this scenario. |
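For PWX-18210 above, a minimal sketch of using the px/service node label to remove Portworx from a node; the node name is a placeholder, and other px/service values may exist, so check your version's documentation for the full list.

```
# Label the node so Portworx is uninstalled from it (placeholder node name).
kubectl label node <node-name> px/service=remove --overwrite

# Verify the label was applied.
kubectl get node <node-name> --show-labels | grep px/service
```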
Notes
- Portworx 2.7.0 is not currently supported on Fedora 33
Known issues (Errata)
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PWX-19022 | Attaching/Mounting a volume on the same host where it was attached internally for background work (such as cloudsnaps) fails to create the virtual kernel device. User impact: The pod may fail to mount the volume and may not start. Recommendation: Detach and re-attach the volume to fix this issue. |
2.6.5
March 6, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18967 | Portworx occasionally locked volume creation for a prolonged period when taking more than one cloudsnap for the same volume. User Impact: Users experienced longer response times when creating new volumes. Resolution: Cloudsnaps now lock only the volume they're being taken on, and no longer interfere with volume creation. |
2.6.4.1
February 18, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18625 | In certain corner cases, when a volume being restored on a destination cluster was deleted before a restore completed, async DR or migration sometimes got stuck. User Impact: Some of the nodes in the destination cluster may have become slow, with logs showing prints similar to this: time="2021-02-16T21:51:23Z" level=error msg="Failed to attach cloudsnap internally :774477562361631177 err:Volume with ID: 774477562361631177 not found" Resolution: Portworx now fails the restore operation if a volume is deleted before a restore completes. |
2.6.4
February 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18549 | Storage pools with an auto journal partition can now be expanded with a drive resize operation. Use pxctl service pool expand -o resize-disk for this operation; see the sketch after this table. |
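A sketch of the resize-disk pool expansion mentioned in PWX-18549, run from a Portworx node; the pool UID and target size are placeholders, and the --uid and --size flag names are assumptions, so confirm the exact syntax with pxctl service pool expand --help.

```
# Find the pool UID on the node.
pxctl service pool show

# Expand the pool by resizing its backing drives (placeholder UID; size in GiB).
pxctl service pool expand --uid <pool-uid> -o resize-disk --size <new-size-in-GiB>
```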
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18403 | When running vSphere cloud drives, Portworx initialization sometimes failed due to a timeout in looking up the disk paths. User impact: Users with VMs containing 2 or more disks that don't show up in the /dev/disk/by-id path saw Portworx initialization time out. Portworx looked for the /dev/disk/by-id path for each disk for 2 minutes before timing out. Resolution: Portworx will now perform a udevadm trigger if it cannot find the disks; the timeout has been removed. |
2.6.3
January 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-17546 | Portworx deployments no longer use network ports 6060 and 6061. |
PWX-16412 | Added support for proxy via the PX_HTTP_PROXY environment variable for usage-based reporting APIs. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-17663 | In certain scenarios where the PV to PVC mapping was changed out of band, Portworx showed incorrect "Volume Consumers" under the volume inspect command output. User impact: Users saw incorrect values in the volume inspect command output. Resolution: Portworx now correctly shows values for "Volume Consumers" in the inspect command output. |
PWX-17155 | Sharedv4 volume consumers were tracked based on nodeID, which changes when a storageless node becomes a storage node. User impact: Sharedv4 pods running on a storageless node lost access to the mounted sharedv4 volume when that node was restarted to assume the role of a storage node. Resolution: Portworx now uses nodeIP instead of nodeID to track sharedv4 clients, as the node IP remains the same when the role changes from storageless to storage node. |
PWX-17699 | Portworx created the incorrect type of vSphere disks when using Portworx disk provisioning. User impact: Portworx incorrectly parsed the lazyzeroedthick disk type provided by users in the vSphere cloud drives spec, and instead created the default eagerzeroedthick disks. Resolution: Portworx now correctly parses the spec and creates the correct disk type. |
PWX-17450 | Volume mount operations inside pods on destination clusters sometimes failed after async DR/Migration. User impact: Users sometimes needed to restart their pods after DR/Migration to correct failed volume mounts. Resolution: These volume mount operations no longer fail. |
PWX-18100 | Expanding pools with the add-drive option created new pools instead of expanding an existing pool. User impact: Users saw new pools created when they were expecting pools to expand in size. Resolution: Portworx now correctly expands pools when the add-drive option is presented. |
2.6.2.1
January 7, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-17725 | Migration cloud snapshots sometimes failed due to overlapping extents when being transferred to the cloud. User impact: Users saw migrations partially pass. Resolution: Cloud snapshots now handle the transfer of overlapping extents more gracefully to achieve successful cloud snapshots and migrations. |
2.6.2
December 7, 2020
New features
- Announcing a new command for transferring a Portworx cloud driveset from one storage node to a storageless node. This command is currently supported only for Google Cloud Platform, and is not supported when Portworx is installed using an internal KVDB.
- Portworx now allows you to drain/remove volume attachments from a node through the pxctl service node drain-attachments command; see the sketch after this list.
- Portworx now supports IBM Hyper Protect Crypto Services (IBM HPCS) as a key management store.
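A sketch of how the drain-attachments command might be used. Only pxctl service node drain-attachments itself is named in this release note; the submit and status subcommands and the --node flag shown here are assumptions patterned on other job-style pxctl commands, so check pxctl service node drain-attachments --help for the exact syntax on your version.

```
# Assumed syntax; verify with: pxctl service node drain-attachments --help
# Start draining volume attachments from a node (placeholder node ID).
pxctl service node drain-attachments submit --node <node-id>

# Check the status of the drain operation.
pxctl service node drain-attachments status --node <node-id>
```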
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-17154 | Portworx licensing now supports a NO_PROXY environment variable that defines which hosts will not use an HTTP proxy for licensing actions. In addition, the PX_HTTP_PROXY and PX_HTTPS_PROXY environment variables will be ignored when using license servers and floating licenses, unless you specify the PX_FORCE_HTTP_PROXY=1 environment variable (that is, Portworx will assume local access when working with floating licenses). See the sketch after this table for one way to set these variables. |
PWX-16410 | Portworx now supports K3s v1.19. |
PWX-15234 | You can now delete pending references to credentials from the KVDB. |
PWX-16602 | Portworx now allows for a runtime option to disable zero detection for converting zero-filled buffer to discard. |
PWX-14705 | You can now set the "Sender" for the email alerts as follows: sv email set --recipient="email1@portworx.com;email2@portworx.com" . |
PWX-16678 | Portworx now displays alerts when it overrides the user-provided value for the maxStorageNodesPerZone parameter. |
PWX-16655 | Portworx now supports the pxctl service pool cache status <pool> command in all operational modes. The following CLI pxctl commands have been moved to only be supported in "pool maintenance" mode and deprecated from "maintenance" mode:
|
PWX-13377 | Applied the latest OS updates to resolve most of the vulnerabilities detected. |
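A sketch of how the proxy-related variables from PWX-17154 could be supplied, assuming an Operator-managed install where they are set under spec.env of the StorageCluster; the proxy endpoints and host list are placeholders, and whether you need PX_FORCE_HTTP_PROXY depends on your licensing setup.

```
# Assumption: Operator-based install. Open the StorageCluster for editing:
kubectl -n kube-system edit stc <storagecluster-name>

# Then, under spec.env, add entries like the following (placeholder values):
#   - name: PX_HTTP_PROXY
#     value: "http://proxy.example.com:3128"
#   - name: NO_PROXY
#     value: "license.internal.example.com"
#   - name: PX_FORCE_HTTP_PROXY   # only if the proxy should also be used for floating licenses
#     value: "1"
```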
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16137 | A race-condition between Portworx and Docker occurred at startup. User impact: If the node cannot install NFS-packages (i.e. air-gapped host environments), and runs Portworx with the Docker container-runtime, users might encounter a race-condition on node reboot that leads to Docker creating new local volumes instead of using existing Portworx volumes. Resolution: Portworx now detects and prevents Docker from creating new local volumes on node reboot. |
PWX-16136 | Portworx pods hung when the dbus service became unresponsive. User impact: In situations where the dbus service running on the host became unresponsive, the Portworx pod wasn't able to upgrade Portworx or propagate configuration changes. Resolution: Portworx now detects when the dbus service is unresponsive and fails over to different methods to run the Portworx upgrades or change configuration. |
PWX-16413 | A harmless warning was sometimes displayed during Portworx installation. User impact: When installed into the staging area, Portworx displayed a warning when setting up the services; these warnings were safe to ignore, but caused unnecessary confusion. Resolution: These warnings are no longer displayed. |
PWX-16420 | Portworx pods running on Ubuntu/20.04 nodes did not proxy the portworx-output.service logs to the OCI-Monitor display. User impact: On Ubuntu/20.04, the Portworx pods did not proxy the Portworx service logs. This made troubleshooting difficult on Kubernetes deployments that do not provide SSH access to the hosts. Resolution: The Portworx pods now correctly proxy the Portworx service logs. |
PWX-16418 | When running on Kubernetes nodes using the ContainerD container runtime, Portworx installs and upgrades sometimes failed. User impact: If a Portworx installation or upgrade was interrupted on the nodes using the ContainerD container runtime, further attempts to install or upgrade Portworx sometimes failed until the node was rebooted. Resolution: Portworx now performs a more thorough cleanup before each install and upgrade operation, solving this issue. |
PWX-16853 | Multipart license expiration was not clear. User impact: When running Portworx on a multipart license (i.e. a license composed out of several parts/ActivationIDs), different parts of the license could expire at different times. The pxctl status and pxctl license list commands didn't provide any indication that some parts of the license would expire sooner. Resolution: The pxctl status and pxctl license list commands now display a notice if a part of the license will expire sooner than the overall license. |
PWX-16769 | License updates sometimes failed to propagate across all nodes. User impact: For users with ETCD as the KVDB and aggressive ETCD compactions, some of the nodes in Portworx cluster skipped the automatic application of updated licenses. Users had to restart the Portworx service to pick up the changes. Resolution: Portworx no longer skips license updates when aggressive ETCD compactions are used. |
PWX-16793 | Portworx installation sometimes timed out while installing NFS packages. User impact: Some host environments have a large number of package repositories or a slow internet connection, which could lead to timeouts while installing NFS services during Portworx installation or upgrade, resulting in the inability to use sharedv4 volumes that depend on NFS. Resolution: Portworx installs are now faster and avoid timeouts during NFS services installation. |
PWX-17057 | With Linux kernel 4.20.x and above, NFS pool stats increment at a much slower pace (more slowly than once an hour), or may not increment at all if there are no exports. User impact: With Portworx versions prior to 2.6.2, there may have been false alarms about all NFS threads being busy when there were no exports. Resolution: Portworx no longer processes the NFS pool stats when there are no NFS exports. |
PWX-16775 | The Google object store did not have pagination for object enumeration, which caused any list call to list everything in the bucket. User impact: Cloudsnap backups and restores failed to start and the request timed out. Listing cloudsnaps through pxctl also timed out. Resolution: Added pagination to object enumeration with the Google object store. |
PWX-16796 | The proxy username and password were ignored as part of PX_HTTP_PROXY on Portworx Essentials, causing license renewal to fail. User impact: Portworx Essentials clusters went into "license expired" mode with PX_HTTP_PROXY set. Resolution: Portworx Essentials now honors the Username and Password fields given as part of PX_HTTP_PROXY to successfully make connections with the proxy. |
PWX-16072 | When Portworx was installed in PAYG mode, the cluster license expired when Portworx was unable to connect to the billing server, instead of the node going into maintenance mode. User Impact: When a PAYG node license expired, users had to recommission the node. Resolution: Portworx PAYG nodes now enter maintenance mode correctly when the billing server is unreachable. |
PWX-16429 | Adding a new drive using the pxctl service drive add command was failing due to an issue with applying labels on the new pool. User impact: If users wanted to add a new Portworx pool to the node, their command to add a new drive using pxctl service drive add failed with an error message about labels being too long. This prevented users from creating new pools. Resolution: Kubernetes labels are now skipped and not applied to the backend storage pools. |
PWX-16206 | Portworx failed to correctly detect the value of the maxStorageNodesPerZone parameter when running on GKE. User impact: When running on GKE with autoscaling enabled, Portworx did not detect the preferred value to use for the maxStorageNodesPerZone parameter for its cloud drives. As a result, Portworx would run without a value, or with an incorrect value, for the maxStorageNodesPerZone parameter. This resulted in issues when the cluster size was scaled and unintended nodes became storage nodes. Resolution: The calculation of the maxStorageNodesPerZone parameter based on the GKE autoscaling minimum pool size has been fixed. In addition, if the minimum number of nodes in the cluster is lower than the total number of zones, at least one Portworx node will now be made a storage node in a given zone. |
PWX-16554 | An incorrect check prevented HA level restoration for volumes with HA level 3 under some conditions during decommission operation. User impact: Decommission operation failed to restore HA level for affected volumes if the volume had HA level 3 under some conditions. Resolution: When you decommission a node, Portworx now properly restores the HA level. |
PWX-16407 | DR migration sometimes became stuck in the "Active" state when Portworx restarted while migration was beginning. User impact: Any additional DRs also became stuck and did not finish. Resolution: Portworx now handles this situation better. |
PWX-16495 | Currently, the KVDB lock is held over the Inspect API. If the remote node didn't respond, Portworx held the KVDB lock for more than 3 minutes and the node where the Attach was issued asserted. User impact: The Portworx service could assert and restart if it tried to remotely detach a volume from a peer node but the request to do that took more than 3 minutes. Resolution: Portworx will not assert and restart if such a request gets stuck or does not return within a given timeout. |
PWX-13527 | Internal KVDB startup would fail. This would usually require a wipe of the node and re-install. User impact: Users saw the following error: "Operation cannot be fulfilled on configmaps". Resolution: Portworx will now detect such errors received from Kubernetes and will retry the operation instead of exiting. |
PWX-16384 | When a node with a sharedv4 encrypted volume attached was rebooted, the volume was not re-exported over NFS for other nodes to consume. User impact: Since it's an encrypted volume, it couldn't be attached without a passphrase. Resolution: Portworx now triggers a restart of the app, which will reattach the encrypted sharedv4 volume. |
PWX-16729 | A particular race condition caused an unmount of a sharedv4 volume to succeed without actually removing the underlying NFS mountpoint. User impact: This caused the pod using the sharedv4 volume to be stuck in the "Terminating" state. Resolution: Portworx no longer experiences this race condition. |
PWX-16715 | In certain cloud deployments, API calls to the instance metadata service or the cloud management portals are blocked or routed through a proxy. User impact: In these cases, Portworx calls to the cloud were blocked indefinitely, causing Portworx to fail to initialize. Resolution: Portworx now invokes all the cloud and instance metadata APIs with a timeout and will avoid getting blocked indefinitely. |
PWX-14101 | For certain providers like vSphere, when a cloud drive created by Portworx was deleted out of band, Portworx ignored it, created a new disk, and started as a brand new node. User impact: This caused an issue if there were no additional licenses in the cluster. Resolution: If a disk is deleted out of band or is moved to another datastore in vSphere, Portworx now errors out and does not create new drives. |
PWX-16465 | Portworx held a lock while performing operations in the dm-crypt layer, or while making an external KMS API call for encrypted volumes. If either of them took longer than the expected amount of time, Portworx asserted. User impact: In certain scenarios, Portworx restarted while attaching encrypted volumes. Resolution: Portworx will no longer assert if any calls get stuck in the "dm-crypt" layer or if the HTTPS calls to the KMS providers timeout. |
PWX-15043 | The pxctl volume inspect command output did not show the "Mount Options" field for NFS proxy volumes. User Impact: You could not see the "Mount Options" field for NFS proxy volumes, even if you explicitly provided the mount options while creating such a volume. Resolution: The pxctl volume inspect command output now shows the "Mount Options" field for NFS proxy volumes. |
PWX-16386 | On certain slower systems, a sharedv4 volume wasn't mounted over NFS as soon as it was exported on the server. User impact: Portworx showed the access denied by server error. Resolution: Portworx now detects this error scenario and retries the NFS mount. |
PWX-17477 | In clusters that have seen more than 3000 unsuccessful node add attempts, Portworx, on addition of another node running the 2.6.x release, encountered a node index overflow. User Impact: Other nodes in the cluster could dump a core. Resolution: This patch fixes the node index allocation workflow and prevents the new node from joining the cluster. |
PWX-17206 | A part-probe inside the container took a long time to finish. User impact: Portworx took a long time to reach the "Ready" state after a node restart. Resolution: Portworx now uses a host-based part-probe to resolve this issue. |
PWX-14925 | Portworx drives showed as offline in the pxctl status and pxctl service pool show commands. User impact: When a drive was added to Portworx, the pxctl status and pxctl service pool show commands showed the drive as offline when the commands were run from the Portworx oci-monitor pod. Resolution: In the oci-monitor pod, pxctl now gets the updated information about the newly added drives from the Portworx container. |
PWX-15984 | A large number of cloudsnap delete requests were stuck pending in the KVDB. User impact: Cloudsnaps were not deleted from the cloud, or cloudsnap delete requests did not make much progress. Resolution: Improvements to cloudsnap delete operations reduce processing times. |
PWX-16681 | Restore failed when the incremental chain was broken due to either deleted cloudsnaps or deleted local snaps in the source cluster. User impact: Async migration failed continuously. Resolution: To automatically resume async DR, migration now deletes local cloudsnaps if a restore fails, triggering a full backup that fixes the issue. |
OC-196 | An issue with Portworx upgrades from v2.4 or v2.5 to v2.6 on Kubernetes with floating licenses caused an excess number of licenses to be consumed. User impact: When upgrading from v2.4 or v2.5 to v2.6 on Kubernetes, Portworx temporarily consumed double the number of license leases. Resolution: Portworx now properly recycles license leases during the upgrade procedure and no longer consumes more licenses than it should. |
Known issues (Errata)
Portworx is aware of the following issues; check future release notes for fixes to these issues:
Issue Number | Issue Description |
---|---|
PWX-17217 | Portworx fails to exit maintenance mode after a drive add operation shows as "done". User impact: If a user restarts Portworx, the drive add status command may return "done" while the md reshape operation is still in progress. Even when the reshape finishes, the in-core status of the mountpoint won't change, and users can't exit maintenance mode. Recommendations: If you're stuck in maintenance mode as a result of this issue, you can restart Portworx to clear it. |
PWX-17531 | A port conflict between containerd and the secure port used by the PVC controller causes the controller to enter the "CrashLoopBackOff" state. User Impact: Users running on Kubernetes clusters that use containerd saw their Portworx PVC controller pods enter the "CrashLoopBackOff" state. Recommendations: You can fix this issue by adding the --secure-port=9031 flag to the portworx-pvc-controller deployment, which can be found in the namespace where you installed Portworx (kube-system by default). If you are using a custom start port for the Portworx installation, add 30 to the configured start port and use that number for the --secure-port parameter (for example, if using 10000, use 10030), as shown in the example below. |
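The workaround for PWX-17531 can be applied by editing the PVC controller deployment directly. The following is a minimal sketch, assuming the default kube-system namespace and the default start port; verify the deployment name and namespace in your own cluster before editing:

```shell
# Open the PVC controller deployment for editing (the namespace is the one
# where Portworx is installed; kube-system is the default).
kubectl -n kube-system edit deployment portworx-pvc-controller

# In the editor, add the secure-port flag to the controller container's args:
#   - --secure-port=9031
#
# If Portworx uses a custom start port, add 30 to that port instead
# (for example, start port 10000 -> --secure-port=10030).
```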
2.6.1.6
November 20, 2020
Notes
- Portworx licenses for DR are now enabled to work on IBM Cloud.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16429 | An issue with applying labels on new pools caused new drive add operations using the pxctl service drive add command to fail. User impact: If users tried to add a new Portworx pool to the node, their pxctl service drive add command failed with an error message about labels being too long. This prevented users from creating new pools. Resolution: Portworx no longer applies the Kubernetes labels to the backend storage pools. |
PWX-16941 | Portworx installation failed when users set the VSPHERE_INSTALL_MODE=local environment variable to enable the vSphere cloud drive provisioning feature. User impact: Portworx failed to initialize in this mode. Resolution: Portworx now properly initializes when vSphere cloud drive provisioning is enabled. |
2.6.1.5
November 16, 2020
Notes
- Added support for OCP 4.6.1
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16796 | For Portworx Essentials users, Portworx ignored the proxy username and password set as part of PX_HTTP_PROXY, causing license renewal to fail. User impact: Portworx Essentials clusters entered 'license expired' mode when PX_HTTP_PROXY was set. Resolution: Portworx Essentials now honors the username and password fields given as part of PX_HTTP_PROXY to successfully connect to the proxy (see the example after this table). |
PWX-16775 | The Google object store did not paginate object enumeration, which caused any list call to list everything in the bucket. User impact: Cloudsnap backups and restores failed to start, and the requests timed out. Listing cloudsnaps through pxctl also timed out. Resolution: Added pagination to object enumeration with the Google object store. |
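As an illustration of the PWX-16796 fix above, the proxy credentials are embedded in the standard proxy URL format. This is a sketch only; the DaemonSet name, namespace, and proxy address are assumptions to be adjusted for your installation:

```shell
# Set PX_HTTP_PROXY on the Portworx DaemonSet with a username and password
# embedded in the proxy URL (object names and address shown are illustrative).
kubectl -n kube-system set env daemonset/portworx \
  PX_HTTP_PROXY="http://username:password@proxy.example.com:3128"
```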
2.6.1.4
October 30, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16432 | Multipathd configuration files were not correctly set up for blacklisting Portworx devices. User impact: Incorrect entries in the multipathd configuration file caused other devices to be handled incorrectly on the host. Resolution: This fix disables updates to the multipathd configuration file by default and adds an option to enable the updates through the runc install argument -enable-mpcfg-update (see the example below). |
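How the -enable-mpcfg-update argument is passed depends on your install method. The sketch below assumes a DaemonSet-based install where additional install arguments are appended to the Portworx container's args; the DaemonSet name and namespace are assumptions:

```shell
# Re-enable multipathd configuration updates (disabled by default after this
# fix) by passing the install argument to Portworx.
kubectl -n kube-system edit daemonset portworx
# In the editor, append the argument to the Portworx container's args list:
#   - "-enable-mpcfg-update"
```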
2.6.1.3
October 14, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10434 | When upgrading to the 2.6.x releases on certain platforms where the CPU does not support the SSE4.2 instruction set, Portworx encountered a checksum mismatch on the log file. User impact: The node would go into storageless mode after the upgrade. Resolution: This patch fixes the log replay so that it does the CPU capability check and uses the right checksum type to verify the log checksum. |
2.6.1.2
October 9, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16417 | Portworx did not recognize multipath devices. User impact: Portworx nodes came up as storageless nodes. Resolution: Portworx now properly opens /etc/multipath.conf and recognizes multipath devices. |
2.6.1.1
October 7, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16013 | On certain kernel versions, such as variants of the AKS Ubuntu 4.15.0 kernel, and under certain conditions, the filesystem IO on the backing storage pool sometimes hung due to a kernel bug. User impact: Impacted Portworx pods displayed a 'healthy' node as 'unhealthy', causing downtime for affected users. Resolution: This fix patches the filesystem kernel module in variants of the AKS Ubuntu 4.15.0 kernel and reinserts the patched kernel module, fixing the issue for users on this kernel. |
2.6.1
October 2, 2020
New features
- Introducing Portworx on the AWS Marketplace: deploy Portworx from the AWS Marketplace and pay through the AWS Marketplace Metering Service.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16005 | Added support for fetching tokens per vault namespace. |
PWX-14307 | Users can instruct Portworx to delete the local snaps created for cloudsnaps after the backup is complete through the pxctl --delete-local option. This causes the subsequent backups to be full. |
PWX-15987 | The pool expand alert now includes additional information about the cluster ID in the event metrics. |
PWX-15427 | Volume resize operation status alerts export metrics to Prometheus with additional context, such as: volumeid, clusterid, pvc name, namespace. |
PWX-13524 | You can now use a network interface for cloudsnap endpoints. This is a cluster-level setting. |
PWX-16063 | Introducing a new px-cache configuration parameter to control the cache block size for advanced users: px-runc arg: -cache_blocksize <size> . |
PWX-15897 | Improved Portworx NFS handling in multiple ways. |
PWX-15036 | By default, Portworx installed via pay-as-you-go or marketplace mode goes into maintenance mode if it is unable to report usage within 72 hours. You can now configure a longer time period, up to 7 days (168 hours), by passing the billing_timeout_hours rtOpts value to the Portworx DaemonSet. If you set an invalid rtOpts value, Portworx falls back to 72 hours. |
PWX-11884 | Portworx now supports per volume encryption with scaled volumes. DCOS users who use Portworx scaled volumes can provide a volume spec in the following manner to create a scaled volume which uses "mysecret" secret key for encryption: secret_key=mysecret,secure=true,name=myscaledvolume,scale=3 |
PWX-14950 | Portworx now supports the ability to read vSphere usernames and passwords from a Kubernetes secret directly instead of mounting them inside the Portworx spec. |
PWX-15698 | Added a new command, pxctl service node-usage <node-id>, that displays all volumes/snapshots with their storage usage and exclusive usage bytes for a given node. Because it traverses the filesystem, this is an expensive command and should be used with caution and infrequently (see the sketch after this table). This change also removes support for capacity usage of a single volume: pxctl volume usage is no longer supported. |
PWX-16246 | px-runc now features the -cache_blocksize <value> option, which configures the cache block size for px-cache. This option supports values of 1MB and above that are a power of 2. |
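The usage-reporting and cache-sizing options called out above can be exercised as in the following sketch. The node ID and block size are placeholders, and the px-runc line assumes it is combined with your existing install arguments:

```shell
# Show per-volume and per-snapshot usage, including exclusive bytes, for one
# node. This walks the filesystem, so run it sparingly on busy clusters.
pxctl service node-usage <node-id>

# Configure the px-cache block size at install time; the value must be at
# least 1MB and a power of 2. Combine with your existing install arguments.
px-runc install -cache_blocksize <size>
```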
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15130 | OCI-Monitor could have left zombie processes when installing Portworx for the first time. User impact: In most cases these zombies were harmless, but they had the potential to lock up the yum package system on CentOS hosts in rare circumstances. Resolution: OCI-Monitor now properly cleans up zombie processes. |
PWX-16240 | The ETCD_PASSWORD environment variable was shown in plaintext on px-runc and the OCI-Monitor's logs. User impact: The ETCD_PASSWORD environment variable was shown in plaintext in the Portworx/Kubernetes logs. Resolution: The ETCD_PASSWORD is no longer shown in plaintext in the logs. |
PWX-15806 | KVDB backups are stored under /var/lib/osd/kvdb_backup and in one of the internal Portworx directories where storage is mounted. On storageless nodes, KVDB backup files were not getting rotated from the internal directories because there is no storage. User Impact: The backup files could end up filling the root filesystem of the node. Resolution: Portworx now dumps the KVDB backup files only under /var/lib/osd/kvdb_backup, where they are rotated periodically. |
PWX-15705 | Application backups could not work with the newer security model of 2.6.0. User impact: Application backups failed after upgrading to Portworx 2.6.0. Resolution: The auth model now works with the older style of auth annotations. |
PWX-16006 | Under certain circumstances, Portworx did not apply all Kubernetes node labels to storage pools. User impact: PVCs using replica affinity on those labels were stuck in the Pending state. Resolution: Portworx now performs the Kubernetes node update later in the initialization process. |
PWX-15961 | If you reconfigured a network device and attempted to restore a cloud backup on a volume from a snapshot, Portworx tried to use the IP of the previous network device in the restore, and the cloud backup failed. User impact: Users saw the following error message: "Failed to create backup: Volume attached on unknown () node", and had to manually attach and detach the volume. Resolution: Portworx now updates the network device when it is reconfigured. |
PWX-15770 | Portworx sometimes couldn't complete volume export/attach/mount operations for NFS pods before timing out. User impact: The affected pods failed to deploy. Resolution: Portworx no longer retries NFS mount operations in a loop on failure; because the NFS unmount command starts with a 2-minute timeout, retrying in a loop could stretch a single API mount request to more than 4 minutes. |
PWX-15622 | sharedv4 volume mounts timed out. User Impact: On a slower network or on overloaded nodes, sharedv4 (NFS) volume mounts can time out and attempt multiple retries. The affected pod never becomes operational and repeatedly shows the signal: killed error. Resolution: sharedv4 volume mount operations now wait 2 minutes before timing out. You can also configure a larger timeout if required with pxctl cluster options update --sharedv4-mount-timeout-sec <value>, as shown in the example below. |
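The sharedv4 mount timeout option from PWX-15622 can be raised cluster-wide. A minimal example follows; the 300-second value is illustrative only:

```shell
# Increase the sharedv4 (NFS) mount timeout from the 2-minute default.
# The value is in seconds; 300 here is only an example.
pxctl cluster options update --sharedv4-mount-timeout-sec 300
```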
2.6.0.2
September 25, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16160 | Environment variables were not anonymized. User Impact: Sensitive information regarding secrets may have been printed in the logs. Resolution: Portworx now anonymizes all environment variables. |
2.6.0.1
September 22, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16005 | Added support for fetching tokens per vault namespace |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15705 | Application backups could not work with the newer security model of 2.6.0. User impact: Application backups failed after upgrading to Portworx 2.6.0. Resolution: The auth model now works with the older style of auth annotations. |
2.6.0
August 25, 2020
Notes
- If you're upgrading an auth-enabled Portworx cluster to Portworx 2.6.0, you must upgrade Stork to version 2.4.5.
- Operator versions prior to 1.4 and Autopilot currently do not support auth-enabled clusters running Portworx 2.6.0. Support for this is planned for a future release