Portworx Enterprise Release Notes
3.2.1
December 2, 2024
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx now supports the PX-StoreV2 backend on the following platforms
3.2.0
October 31, 2024
Visit these pages to see if you're ready to upgrade to this version:
Portworx 3.2.0 requires Portworx Operator 24.1.3 or newer.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Secure multi-tenancy with Pure FlashArray When a single FlashArray is shared among multiple users, administrators can use realms to allocate storage resources to each tenant within isolated environments. Realms set boundaries, allowing administrators to define custom policies for each tenant. When a realm is specified, the user must provide a FlashArray pod name where Portworx will create all volumes (direct access or cloud drives) within that realm. This ensures that each tenant can only see their own storage volumes when logged into the array.
- Support for VMware Storage vMotion Portworx now supports the Storage vMotion feature of VMware, enabling vSphere cloud drives to be moved from one datastore to another without any downtime.
- Defragmentation schedules Users can now set up defragmentation schedules using pxctl commands during periods of low workload to improve the performance of Portworx.
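The secure multi-tenancy feature above relies on a FlashArray pod name being supplied at provisioning time. The following is a minimal StorageClass sketch, assuming the pure_fa_pod_name parameter also referenced in erratum PD-3496 later in these notes; the StorageClass name, the backend parameter value, and the pod name tenant-a-pod are illustrative assumptions, not values taken from these notes.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fa-da-tenant-a                  # hypothetical name
provisioner: pxd.portworx.com           # Portworx CSI provisioner
parameters:
  backend: "pure_block"                 # assumption: selects FlashArray Direct Access volumes
  pure_fa_pod_name: "tenant-a-pod"      # FlashArray pod (scoped to the tenant's realm) where volumes are created
allowVolumeExpansion: true
```

Volumes provisioned from such a StorageClass would be created inside the named FlashArray pod, so a tenant logged into the array sees only its own volumes.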
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-35876 | For IBM customers, Portworx now supports the StorageClass with the encryption flag set to true. | Marketplaces |
PWX-38395 | Previously, all storageless nodes would restart to claim a driveset when a storage node went down and its driveset was detached in the same zone. With this improvement, only one storageless node will claim ownership of the driveset and restart, while the other storageless nodes remain unaffected and do not restart. | Drive & Pool Management |
PWX-33561 | For partially attached drivesets, Portworx now detaches the driveset only when cloud drives are not mounted, avoiding unnecessary detachment when a mount is present. | Drive & Pool Management |
PWX-37403 | FlashArray now allows specifying multiple management ports for the same FlashArray. If customers are on a VLAN connection to FlashArray, the virtual IP address might encounter issues. Customers can specify the management IPs of the controllers directly in the secret as comma-separated values. | Drive & Pool Management |
PWX-38597 | For FlashArray Cloud Drives, on Portworx restart, any stale entries of the driveset are cleaned, and the locally attached driveset is prioritized for mounting volumes rather than checking all other drives. | Drive & Pool Management |
PWX-39131 | The total number of GET API calls has been reduced significantly. | Drive & Pool Management |
PWX-38551 | The latency of any operation on FlashArray due to multiple API calls has been reduced. Portworx now uses the FlashArray IDs stored in the cloud drive config map to limit API calls only to the FlashArray where the drive resides. | Drive & Pool Management |
PWX-37864 | When you add a drive using the pool expand add-drive operation, the config map is now automatically updated with the pool ID of the newly added drive, preventing the need for a Portworx restart. | Drive & Pool Management |
PWX-38630 | Portworx now supports adding a cloud drive to a storageless node when the cloud drive specification for the journal device in the StorageCluster spec is explicitly set to a value other than auto . | Drive & Pool Management |
PWX-38074 | Improved the startup timing of Portworx nodes in multi-FlashArray setups by handling metrics timeouts more effectively. When volume creation on a FlashArray takes too long, Portworx now avoids sending further requests to that FlashArray for 15 minutes, allowing other nodes to continue the startup process without delays. | Drive & Pool Management |
PWX-38644 | For FlashArray Cloud Drives, pool expansion failure messages are no longer overridden by maintenance mode messages, providing more useful error information for users to debug their environment. | Drive & Pool Management |
PWX-33042 | In disaggregated environments, users cannot add drives to a storageless node labeled as portworx.io/node-type=storageless . To add drives, users need to change the node label to portworx.io/node-type=storage and restart Portworx. | Drive & Pool Management |
PWX-38169 | During pool expansion, Portworx now checks the specific driveset that the node records, rather than iterating through all drivesets in the cluster randomly. This change significantly reduces the number of API calls made to the backend, thereby decreasing the time required for pool expansion and minimizing the risk of failure, particularly in large clusters. | Drive & Pool Management |
PWX-38691 | Portworx now raises an alert called ArrayLoginFailed when it fails to log into a FlashArray using the provided credentials. The alert includes a message listing the arrays where the login is failing. | Drive & Pool Management |
PWX-37672 | The pxctl cd i --<node-ID> command now displays the IOPS set during disk creation. | Drive & Pool Management |
PWX-37439 | Azure users can now specify IOPS and throughput parameters for Ultra Disk and Premium v2 disks. These parameters can only be set during the installation process. | Drive & Pool Management |
PWX-38397 | Portworx now exposes NFS proc FS pool stats as Prometheus metrics. Metrics to track the number of Packets Arrived , Sockets Enqueued , Threads Woken , and Threads Timedout have been added. | Shared Volumes |
PWX-35278 | A cache for the NFS and Mountd ports has been added, so the system no longer needs to look up the ports every time. The GetPort function is only called the first time during the creation or update of the port, and the cache updates if accessed 15 minutes after the previous call. | Shared Volumes |
PWX-33580 | The NFS unmount process has been improved by adding a timeout for the stat command, preventing it from getting stuck when the NFS server is offline and allowing retries without hanging. | Shared Volumes |
PWX-38180 | Users can now set the QPS and Burst rate to configure the rate at which API requests are made to the Kubernetes API server. This ensures that the failover of the sharedv4 service in a scaled setup is successful, even if another operation causes an error and restarts some application pods. To do this, add the following environment variables:
| Shared Volumes |
PWX-39035 | Portworx will no longer print the Last Attached field in the CLI's volume inspect output if the volume has never been attached. | Volume Management |
PWX-39373 | For FlashArray Direct Access volumes, the token timeout has been increased from 15 minutes to 5 hours, which provides enough time for Portworx to process a large number of API token requests. | Volume Management |
PWX-39302 | For Portworx CSI volumes, calls to the Kubernetes API to inspect a PVC have been significantly reduced, improving performance. | Volume Management |
PWX-37798 | Users can now remove labels from a Portworx volume using the pxctl volume update -l command, allowing them to manually assign pre-provisioned Portworx volumes to a pod. | Volume Management |
PWX-38585 | FlashArray Direct Access users can now clone volumes using pxctl . | Volume Management |
PWX-35300 | Improved FlashBlade Direct Access volume creation performance by removing an internal lock, which previously caused delays during parallel creation processes. | Volume Management |
PWX-37910 | Cloudsnaps are now initialized using a snapshot of KVDB, avoiding failures. | Storage |
PWX-35130 | Portworx now sends an error message and exits the retry loop when a volume is stuck in a pending state, preventing continuous creation attempts. | Storage |
PWX-35769 | Storageless nodes now remain in maintenance mode without being decommissioned, even if they exceed the auto-decommission timeout. This prevents failure for user-triggered operations when the storageless node is in maintenance mode. | Control Plane |
PWX-39540 | Portworx now ensures the correct information for a pure volume is returned, even if the FlashArray is buggy, preventing node crashes. | Control Plane |
PWX-37765 | The pxctl volume list command has been improved to allow the use of the --pool-uid flag alongside the --trashcan flag, enabling the filtering of trashcan volumes based on the specified Pool UUID. | CLI & API |
PWX-37722 | Added a new --pool-uid flag to the pxctl clouddrive inspect command, allowing users to filter the inspect output based on the specified Pool UUID. | CLI & API |
PWX-30622 | The output of the pxctl volume inspect <volume-id> command now displays the labels alphabetically, making it easier to track any changes made to labels. | CLI & API |
PWX-39146 | The pxctl status output now also includes a timestamp indicating when the information was collected. | CLI & API |
PWX-36245 | PX-StoreV2 pools now support a maximum capacity of 480 TB by choosing an appropriate chunk size during pool creation. | PX-StoreV2 |
PWX-39059 | Portworx now installs successfully on cgroup v2 and Docker container runtime environments. | Install & Uninstall |
PWX-37195 | Portworx now automatically detects SELinux-related issues during installation and attempts to resolve them, ensuring a smoother installation process on SELinux-enabled platforms. | Install & Uninstall |
PWX-38848 | Portworx now properly handles floating license-lease updates when cloud drives move between nodes. | Licensing & Metering |
PWX-38694 | Improved the time to bring up a large cluster by removing a short-lived cluster lock used in cloud drive deployments. | KVDB |
PWX-38577 | The logic for handling KVDB nodes when out of quorum has been improved in Portworx. Now, Portworx processes do not restart when KVDB nodes are down. | KVDB |
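PWX-37403 in the table above allows multiple management IPs for the same FlashArray to be specified in the Pure credentials secret as comma-separated values. Below is a hedged sketch of such a secret; the secret and namespace names, the JSON key names, and the placement of the comma-separated IPs inside MgmtEndPoint are assumptions based on the pure.json format mentioned elsewhere in these notes.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: px-pure-secret        # assumption: the secret Portworx reads FlashArray credentials from
  namespace: portworx         # assumption: the namespace where Portworx is deployed
type: Opaque
stringData:
  # MgmtEndPoint lists the controller management IPs as comma-separated values
  pure.json: |
    {
      "FlashArrays": [
        {
          "MgmtEndPoint": "10.0.0.11,10.0.0.12",
          "APIToken": "<api-token>"
        }
      ]
    }
```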
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38609 | Portworx sometimes lost the driveset lock for FlashArray cloud drives when the KVDB drive was removed in situations such as KVDB failover. User Impact: Loss of the driveset lock resulted in other nodes attempting to attach a drive already attached to the current node. Resolution: Portworx now uses a temporary driveset to safely remove the KVDB drive. Components: KVDB Affected Versions: 3.1.5 | Critical |
PWX-38721 | Portworx attempted to mount FlashBlade Direct Access volumes using the NFS IP. However, if an existing mount point used an FQDN, Portworx defaulted to the FQDN after a restart. If a Kubernetes mount request timed out, but Portworx completed it successfully, Kubernetes retried the request. Portworx then returned an error due to the FQDN, leading to repeated mount attempts. User Impact: Application pods with a timed-out initial mount request were stuck in the ContainerCreating state. Resolution: Portworx now performs IP resolution on the existing mount entry. If they match, it confirms the mount paths are already created, and Portworx returns a success. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-38618 | In a cluster where multiple applications used the same FlashBlade Direct Access volume, some applications used FQDNs while others used IP addresses. The NFS server recognized only the FQDN, causing a mismatch in the mount source paths tracked by Portworx. User Impact: Application pods using IPs to mount the FlashBlade Direct Access volume were stuck in the terminating state. Resolution: When a request is received from CSI to unmount a target path for FlashBlade Direct Access, Portworx unconditionally unmounts it, even if the source path differs from the one recognized by it. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-38376 | During node initialization in the boot-up process, FlashArray properties are required for all the dev mapper paths already present on the node. This call is made to all arrays configured in the pure.json configuration file, which sometimes failed, causing the initialization to fail. User Impact: Users saw node initialization failures due to errors from arrays that had no volumes for the current node. Additionally, unintended extra API calls were made to the arrays, contributing to the overall API load. Resolution: Portworx now uses the FlashArray volume serial to determine which array the volume belongs to. The array ID is then passed as a label selector to DeviceMappings, ensuring that only the relevant array is queried. Components: Volume Management Affected Versions: 3.1.x, 3.0.x | Critical |
PWX-36693 | When a storageless node transitioned to a storage node, the node's identity changed as it took over the storage node identity. The old identity corresponding to the storageless node was removed from the Portworx cluster. All volumes attached to the removed node were marked as detached, even if pods were currently running on the node. User Impact: Volumes incorrectly appeared as detached, even while pods were running and consuming the volumes. Resolution: Portworx now decommissions cloud drives only after the AutoDecommissionTimeout expires, ensuring that volumes remain attached to the node and are not incorrectly displayed as detached. Components: Volume Management Affected Versions: 3.1.1 | Critical |
PWX-38173 | When the storage node attempted to restart, it could not attach the previous driveset, as it was already claimed by another node, and could not start as a new node because the drives were still mounted. User Impact: The storage node attempting to come back online repeatedly restarted due to unmounted drive mount points. Resolution: Portworx now automatically unmounts FlashArray drive mount points if it detects that the previous driveset is unavailable but its mount points still exist. Component: Drive and Pool Management Affected Versions: 3.0.x, 3.1.x | Critical |
PWX-38862 | During Portworx upgrades, a sync call was triggered and became stuck on nodes when the underlying mounts were unhealthy. User Impact: Portworx upgrades were unsuccessful on nodes with unhealthy shared volume mounts. Resolution: Portworx has removed the sync call, ensuring that upgrades now complete successfully. Components: Drive & Pool Management Affected Versions: 3.1.x | Critical |
PWX-38936 | When a storage node restarted, it restarted several times before it could successfully boot, because its driveset was locked and would not be available for a few minutes. User Impact: Users saw the Failed to take the lock on drive set error message, and the node took longer to restart. Resolution: In such cases, Portworx tells the restarting node that the driveset is not locked, allowing it to claim the driveset without waiting for the lock to expire. During this time, other nodes still see this driveset as locked and unavailable. Components: Drive & Pool Management Affected Versions: 3.1.x | Major |
PWX-39627 | In large Portworx clusters with many storage nodes using FlashArray or FlashBlade as the backend, multiple nodes might simultaneously attempt to update the lock configmap, resulting in conflict errors from Kubernetes. User Impact: Although the nodes eventually resolved the conflicts, this issue spammed the logs and slowed down boot times, especially in large clusters. Resolution: The refresh interval has been changed from 20 seconds to 1 minute. In case of a conflict error, Portworx now delays the retry by a random interval between 1 and 2 seconds, reducing the likelihood of simultaneous updates. Additionally, the conflict is logged only after 10 consecutive occurrences, indicating a real issue. Components: Drive & Pool Management Affected Versions: 3.1.x, 3.0.x | Major |
PWX-36318 | In IBM Cloud, the node name is the same as the node IP. If the selected subnet had very few available IPs and Portworx replaced a worker node, the new node would take the same IP. User Impact: When Portworx started on the replaced node with the same IP, it incorrectly assumed that it had locally attached drives due to the volume attachments. This assumption led to an attempt to access the non-attached device path on the new node, causing Portworx to fail to start. Resolution: With the new provider-id annotation added to the volume attachment, Portworx now correctly identifies the replaced node as a new one without local attachments.Component: Drive and Pool Management Affected Versions: 3.1.x | Major |
PWX-38114 | In IBM Cloud, the node name is the same as the node IP. If the selected subnet had very few available IPs and a worker node was replaced, the new node had the same IP. User Impact: When Portworx started on the replaced node with the same IP, it incorrectly assumed it had locally attached drives due to existing volume attachments, leading to a stat call on the non-attached device path and causing Portworx to fail to start. Resolution: The volume attachment now includes a new annotation, provider-id , which is the unique provider ID of the node, allowing Portworx to recognize that the replaced node is new and has no local attachments.Component: Drive and Pool Management Affected Versions: 3.0.x, 3.1.x | Major |
PWX-37283 | A storageless node did not transition into a storage node after a restart if it initially became storageless due to infrastructure errors unrelated to Portworx. User Impact: These errors caused the node to have attached drives that it was unaware of, preventing the node from recognizing that it could use these drives during the transition process. Resolution: When a storageless node attempts to become a storage node, it checks for any attached drives that it previously did not recognize. Using this information, the storageless node can now correctly decide whether to restart and transition into a storage node. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x | Major |
PWX-38760 | On a node with existing FlashBlade volumes mounted via NFS using a DNS/FQDN endpoint, if Portworx received repeated requests to mount the same FlashBlade volume on the same mount path but using an IP address instead of the FQDN, Portworx returned an error for the repeated requests. User Impact: Pods were stuck in the ContainerCreating state. Resolution: Portworx has been updated to recognize and return success for such repeated requests when existing mount points are present. Components: Volume Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37614 | When a Portworx volume with volumeMode=Block was created from a StorageClass that also had fs or fsType specified, Portworx incorrectly attempted to format the raw block volume with the specified file system.User Impact: Users were unable to use a common StorageClass for creating both block and file volumes. Resolution: Portworx now allows the creation of raw block PVCs even if fs or fsType parameters are specified in the StorageClass.Components: Volume Management Affected Versions: 3.1.2 | Major |
PWX-37282 | HA-Add and HA-level recovery failed on volumes with volume-affinity VPS, as the volume-affinity VPS restricted pool provisioning to certain nodes. User Impact: Users experienced issues such as volumes losing HA after node decommission or HA-Add operations failing. Resolution: The restriction of volume-affinity VPS has been relaxed. Portworx now prioritizes pools that match VPS labels but will select secondary candidate pools under specific conditions, such as during HA increases and when the volume carries the specified VPS labels. This change does not affect VPS validity. Components: Storage Affected Versions: 3.1.x, 3.0.x | Major |
PWX-38539 | The Autopilot config triggered multiple rebalance audit operations for Portworx processes, which overloaded Portworx and resulted in process restarts. User Impact: Users saw alerts indicating Portworx process restarts. Resolution: Portworx now combines multiple rebalance audit triggers into a single execution, minimizing the load on Portworx processes and reducing the likelihood of restarts. Components: Storage Affected Versions: 3.1.2.1 | Major |
PWX-38681 | If there were any bad mounts on the host, volume inspect calls for FlashArray Direct Access volumes would take a long time, as df -h calls would hang.User Impact: Users experienced slowness when running pxctl volume inspect <volId> .Resolution: Portworx now extracts the FlashArray Direct Access volume dev mapper path and runs df -h only on that specific path.Components: CLI and API Affected Versions: 3.1.x, 3.0.x | Major |
PWX-37799 | Portworx restarted when creating a cloud backup due to a KVDB failure. User Impact: If a cloud backup occurred during a KVDB failure, Portworx would unexpectedly restart. Resolution: The nil pointer error causing the restart has been fixed. Now, Portworx raises an alert for backup failure instead of unexpectedly restarting. Components: Cloudsnaps Affected Versions: 3.1.x, 3.0.x | Major |
PWX-39080 | When the Kubernetes API server throttled Portworx requests, in certain scenarios, a background worker thread would hold a lock for an extended period, causing Portworx to assert and restart. User Impact: Portworx asserted and restarted unexpectedly. Resolution: The Kubernetes API calls from the background worker thread have been moved outside the lock's context to prevent the assert. Components: KVDB Affected Versions: 3.2.0 | Major |
PWX-37589 | When Azure users attempted to resize their drives, Portworx performed an online expansion for Azure drives, which did not align with Azure's recommendation to detach drives of 4 TB or smaller from the VM before expanding them. User Impact: Azure drives failed to resize and returned the following error: Message: failed to resize cloud drive to: 6144 due to: compute.DisksClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Disk of size 4096 GB (<=4096 GB) cannot be resized to 6144 GB (>4096 GB) while it is attached to a running VM. Please stop your VM or detach the disk and retry the operation Resolution: Portworx now detaches drives of 4 TB or smaller before performing pool expansion, instead of attempting online expansion. Components: Drive & Pool Management Affected Versions: 3.0.x,3.1.x | Minor |
PWX-36683 | Portworx failed to resolve the correct management IP of the cluster and contacted the Telemetry system using an incorrect IP/port combination. This issue caused the pxctl status command output to result in Telemetry erroneously reporting as Disabled or Degraded .User Impact: Telemetry would sometimes appear to be unhealthy even when it was functioning correctly. This could lead to confusion and misinterpretation of the system's health status. Resolution: The issue was resolved by fixing the logic that chooses the management IP, ensuring that Portworx correctly resolves the management IP of the cluster. This change prevents the system from using the wrong IP/port combination to contact the Telemetry system, thereby ensuring accurate reporting of Telemetry status. Components: Telemetry & Monitoring Affected Versions: 3.0.x,3.1.x | Minor |
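PWX-37614 in the table above concerns raw block PVCs provisioned from a StorageClass that also sets fs or fsType. The following is a minimal raw block PVC sketch; the PVC and StorageClass names are hypothetical, and the point is only that volumeMode: Block can now safely share a StorageClass with filesystem volumes.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc                  # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                    # raw block volume; no filesystem is formatted on it
  storageClassName: px-db              # assumption: a Portworx StorageClass that may also specify fs/fsType
  resources:
    requests:
      storage: 10Gi
```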
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-3505 | EKS users may encounter issues installing Portworx on EKS version 1.30. This version requires the Amazon Linux 2023 (AL2023) kernel, which, in turn, enforces IMDSv2 by default. Workaround:
Affected Versions: 3.0.x, 3.1.x | Critical |
PD-3329 | Provisioning of KubeVirt VM fails if the bootOrder is not specified for the VM disks and the first disk is not a PVC or a DataVolume. Workaround: Specify the bootOrder in the VM spec or ensure that the first disk is a PVC or a DataVolume. Components: KVDB Affected Versions: 3.1.3 | Major |
PD-3324 | Portworx upgrades may fail with Unauthorized errors due to the service account token expiring when the Portworx pod terminates in certain Kubernetes versions. This causes API calls to fail, potentially leading to stuck Kubernetes upgrades. Workaround: Upgrade the Portworx Operator to version 24.2.0 or higher, which automatically issues a new token for Portworx. Components: Install & Uninstall Affected Versions: 3.1.1, 3.2.0 | Major |
PD-3412 | A Kubernetes pod can get stuck in the ContainerCreating state with the error message: MountVolume.SetUp failed for volume "<PV_NAME>" : rpc error: code = Unavailable desc = failed to attach volume: Volume: <VOL_ID> is attached on: <NODE_ID> , where NODE_ID is the Portworx NODE ID of the same node where the pod is trying to be created.Workaround: Restart the Portworx service on the impacted node. Components: Volume Management Affected Versions: 3.2.0 | Major |
PD-3408 | If you have configured IOPS and bandwidth for a FlashArray Direct Access volume, and that volume is snapshotted and later restored into a new volume, the original IOPS and bandwidth settings are not honored. Workaround: Manually set the IOPS and bandwidth directly on the FlashArray for the restored volume. Components: Volume Management Affected Versions: 3.1.4, 3.2.0 | Major |
PD-3434 | During node decommission, if a node is rebooted, it can enter a state where the node spec has been deleted, but the associated cloud drive has not been cleaned up. If this node is recommissioned, the Portworx reboot fails because both the previous and current drivesets are attached to the node. Workaround:
Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3409 | When a user creates a journal device as a dedicated cloud drive and creates the storage pool using the pxctl sv add-drive command, the cloud drives are not automatically deleted when the storage pool is deleted. Workaround: Manually remove the drives after deleting the pool. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3416 | When you change the zone or any labels on an existing Portworx storage node with cloud drives, Portworx may fail to start on that node. If the labels are changed, the driveset associated with the old zone might become orphaned, and a new storage driveset may be created. Workaround: To change topology labels on existing storage nodes, contact Portworx support for assistance. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3496 | For Portworx installations using FlashArray Direct Access without a realm specified: if the user clones a volume that is inside a FlashArray pod to a new volume that is not in a FlashArray pod, the cloned volume appears to be bound but might not be attachable. Workaround: Include the parameter pure_fa_pod_name: "" in the StorageClass of the cloned volumes. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3494 | In a vSphere local mode installation environment, users may encounter incorrect alerts stating that cloud drives were moved to a datastore lacking the expected prefix (for example, local-i) when performing Storage vMotion of VMDKs associated with specific VMs. Workaround: This alert can be safely ignored. Components: Drive & Pool Management Affected Versions: 3.2.0 | Major |
PD-3365 | When you run the drop_cache service on Portworx nodes, it can cause Portworx to fail to start due to known issues in the kernel. Workaround: Avoid running the drop_cache service on Portworx nodes. Components: Storage Affected Versions: 3.1.4, 3.2.0 | Minor |
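PD-3329 above recommends setting bootOrder on KubeVirt VM disks (or making the first disk a PVC or DataVolume). The snippet below is a hedged sketch of the relevant part of a KubeVirt VirtualMachine spec; the VM name, DataVolume name, memory size, and cloud-init content are illustrative assumptions.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm                       # hypothetical name
spec:
  running: false
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 2Gi
        devices:
          disks:
            - name: rootdisk
              bootOrder: 1               # explicit boot order avoids the PD-3329 provisioning failure
              disk:
                bus: virtio
            - name: cloudinitdisk
              bootOrder: 2
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          dataVolume:
            name: example-vm-rootdisk    # DataVolume backed by a Portworx-provisioned PVC
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |
              #cloud-config
```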
3.1.7
December 3, 2024
Visit these pages to see if you're ready to upgrade to this version:
Note
This version addresses security vulnerabilities.
3.1.6.1
November 13, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-39990 | As part of node statistics collection, Portworx read the timestamp data stats while its storage component was updating them at the same time, leading to data conflicts. User Impact: The Portworx storage component restarted due to an invalid memory access issue. Resolution: A lock mechanism has been added to manage concurrent reads and writes to the timestamp data, preventing conflicts. Affected Versions: 3.1.0 Component: Storage | Critical |
3.1.6
October 02, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-38930 | For PX-StoreV2 deployments with volumes that had a replica factor greater than 1 and were either remotely attached or not accessed through PX-Fast PVCs, if a power loss, kernel panic, or ungraceful node reboot occurred, the data was incorrectly marked as stable due to buffering in the underlying layers, despite being unstable. User Impact: In these rare situations, PVC data could be left unstable even though it was reported as stable. Resolution: Portworx now marks the data as stable only when it actually is, preventing this problem. Components: PX-StoreV2 Affected Versions: 2.13.x, 3.0.x, 3.1.x | Critical |
3.1.5
September 19, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-38849 | For Sharedv4 volumes, users can now apply the disable_others=true label to limit the mountpoint and export path permissions to 0770 , effectively removing access for other users and enhancing the security of the volumes. | Volume Management |
PWX-38791 | The FlashArray Cloud Drive volume driveset lock logic has been improved to ensure the driveset remains locked to its original node, which can otherwise detach due to a connection loss to the FlashArray during a reboot, preventing other nodes from claiming it:
| Drive & Pool Management |
PWX-38714 | During the DriveSet check, if device mapper devices are detected, Portworx cleans them before mounting FlashArray Cloud Drive volumes. This prevents mounting issues during failover operations on a FlashArray Cloud Drive volume. | Drive & Pool Management |
PWX-37642 | The logic for the sharedv4 mount option has been improved:
| Sharedv4 Volumes |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36679 | Portworx could not perform read or write operations on Sharedv4 volumes if NFSD version 3 was disabled in /etc/nfs.conf .User Impact: Read or write operations failed on Sharedv4 volumes. Resolution: Portworx no longer depends on the specific enabled NFSD version and now only checks if the service is running. Components: Shared Volumes Affected Versions: 3.1.0 | Major |
PWX-38888 | In some cases, when a FlashArray Direct Access volume failed over between nodes, Portworx version 3.1.4 did not properly clean up the mount path for these volumes. User Impact: Application pods using FlashArray Direct Access volumes were stuck in the Terminating state.Resolution: Portworx now properly handles the cleanup of FlashArray Direct Access volume mount points during failover between nodes. Components: Volume Management Affected Versions: 3.1.4 | Minor |
3.1.4
August 15, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37590 | Users running on environments with multipath version 0.8.8 and using FlashArray devices, either as Direct Access Volumes or Cloud Drive Volumes, may have experienced issues with the multipath device not appearing in time. User Impact: Users saw Portworx installations or Volume creation operations fail. Resolution: Portworx is now capable of running on multipath version 0.8.8. Components: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
3.1.3
July 16, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-37576 | Portworx has significantly reduced the number of vSphere API calls during the booting process and pool expansion. | Drive & Pool Management |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37870 | When PX-Security is enabled on a cluster that is also using Vault for storing secrets, the in-tree provisioner (kubernetes.io/portworx-volume) fails to provision a volume. User Impact: PVCs became stuck in a Pending state with the following error: failed to get token: No Secret Data found for Secret ID . Resolution: Use the CSI provisioner (pxd.portworx.com) to provision the volumes on clusters that have PX-Security enabled. Components: Volume Management Affected Versions: 3.0.3, 3.1.2 | Major |
PWX-37799 | A KVDB failure sometimes caused Portworx to restart when creating cloud backups. User Impact: Users saw Portworx restart unexpectedly. Resolution: Portworx now raises an alert, notifying users of a backup failure instead of unexpectedly restarting. Components: Cloudsnaps Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37661 | If the credentials provided in px-vsphere-secret were invalid, Portworx failed to create a Kubernetes client, and the process restarted every few seconds, leading to continuous login failures. User Impact: Users saw a large number of client creation attempts, which may have led to the credentials being blocked or too many API calls. Resolution: If the credentials are invalid, Portworx now waits for the secret to be changed before trying to log in again. Components: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37339 | Sharedv4 service failover did not work correctly when a node had a link-local IP from the subnet 169.254.0.0/16. In clusters running OpenShift 4.15 or later, Kubernetes nodes may have a link-local IP from this subnet by default. User Impact: Users saw disruptions in applications utilizing sharedv4-service volumes when the NFS server node went down. Resolution: Portworx has been improved to prevent VM outages in such situations. Components: Sharedv4 Affected Versions: 3.1.0.2 | Major |
3.1.2.1
July 8, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-37753 | Portworx reloaded and reconfigured VMs on every boot, which is a costly activity in vSphere. User Impact: Users saw a significant number of VM reload and reconfigure activities during Portworx restarts, which sometimes overwhelmed vCenter. Resolution: Portworx has been optimized to minimize unnecessary reload and reconfigure actions for VMs. Now, these actions are mostly triggered only once during the VM's lifespan. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-35217 | Portworx maintained two vSphere sessions at all times. These sessions would become idle after Portworx restarts, and vSphere would eventually clean them up. vSphere counts idle sessions toward its session limits, which caused an issue if all nodes restarted simultaneously in a large cluster. User Impact: In large clusters, users encountered the 503 Service Unavailable error if all nodes restarted simultaneously.Resolution: Portworx now actively terminates sessions after completing activities like boot and pool expansion. Note that in rare situations where Portworx might not close the sessions, users may still see idle sessions. These sessions are cleaned by vSphere based on the timeout settings of the user's environment. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36727 | When a user decommissioned a node, Portworx processed the node deletion in the background. For every volume delete or update operation, it checked whether any node marked as decommissioned still held references to those volumes, which made node deletion take a long time. User Impact: The Portworx cluster went down as the KVDB node timed out. Resolution: The logic for decommissioning nodes has been improved to prevent such situations. Component: KVDB Affected Versions: 3.1.x, 3.0.x, 2.13.x | Minor |
3.1.2
June 19, 2024
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Customers can now migrate legacy shared volumes to sharedv4 service volumes.
- For FlashBlade Direct Access volumes, users can provide multiple NFS endpoints using the pure_nfs_endpoint parameter (see the sketch after this list). This is useful when the same FlashBlade is shared across different zones in a cluster.
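The following is a hedged sketch of a FlashBlade Direct Access StorageClass using pure_nfs_endpoint. The StorageClass name, the backend parameter value, and the comma-separated form of the two endpoints are assumptions; only the pure_nfs_endpoint parameter name comes from the note above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fb-direct-access                     # hypothetical name
provisioner: pxd.portworx.com                # Portworx CSI provisioner
parameters:
  backend: "pure_file"                       # assumption: selects FlashBlade Direct Access volumes
  pure_nfs_endpoint: "10.10.1.5,10.10.2.5"   # assumption: one endpoint per zone, comma-separated
allowVolumeExpansion: true
```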
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-33044 | Portworx will perform additional live VM migrations to ensure a KubeVirt VM always uses the block device directly by running the VM on the volume coordinator node. | Sharedv4 |
PWX-23390 | Stork will now raise events on a pod or VM object if it fails to schedule them in a hyperconverged fashion. | Stork and DR |
PWX-37113 | In KubeVirt environments, Portworx no longer triggers RebalanceJobStarted and RebalanceJobFinished alarms every 15 minutes due to the KubeVirt fix-vps job. Alarms are now raised only when the background job is moving replicas. | Storage |
PWX-36600 | The output of the rebalance HA-update process has been improved to display the state of each action during the process. | Storage |
PWX-36854 | The output of the pxctl volume inspect command has been improved. The Kind field can now be left empty inside the claimRef , allowing the output to include application pods that are using the volumes. | Storage |
PWX-33812 | Portworx now supports Azure PremiumV2_LRS and UltraSSD_LRS disk types. | Drive and Pool Management |
PWX-36484 | A new query parameter ce=azure has been added for Azure users to identify the cloud environment being used. The parameter ensures that the right settings and optimizations are applied based on the cloud environment. | Install |
PWX-36714 | The timeout for switching licenses from floating to Portworx Enterprise has been increased, avoiding timeout failures. | Licensing |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36869 | When using a FlashArray on Purity 6.6.6 with NVMe-RoCE, a change in the REST API resulted in a deadlock in Portworx. User Impact: FlashArray Direct Access attachment operations never completed, and FlashArray Cloud Drive nodes failed to start. Resolution: Portworx now properly handles the changed API for NVMe and does not enter a deadlock. Component: FA-FB Affected Versions: 3.1.x, 3.0.x, 2.13.x | Critical |
PWX-37059 | In disaggregated mode, storageless nodes restarted every few minutes attempting to claim the storage driveset and ended up being unsuccessful. User Impact: Due to storageless node restarts, some customer applications experienced IO disruption. Resolution: When a storage node goes down, Portworx now stops storageless nodes in a disaggregated deployment from restarting to claim the storage driveset. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-37351 | If the drive paths changed due to a node restart or a Portworx upgrade, it led to a storage down state on the node. User Impact: Portworx failed to restart because of the storage down state. Components: Drive & Pool Management Affected Versions: 3.1.0.3, 3.1.1.1 | Major |
PWX-36786 | An offline storageless node was auto-decommissioned under certain race conditions, making the cloud-drive driveset orphaned. User Impact: When Portworx started as a storageless node using this orphaned cloud-drive driveset, it failed to start since the node's state was decommissioned. Resolution: Portworx now auto-cleans such orphaned storageless cloud-drive drivesets and starts successfully. Component: Drive and Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36887 | When one of the internal KVDB nodes was down for several minutes, Portworx added another node to the KVDB cluster. Portworx initially added the new KVDB member as a learner. If, for some reason, KVDB connectivity was lost for more than a couple of minutes after adding the learner, the learner stayed in the cluster and prevented a failover to a different KVDB node. User Impact: The third node was not able to join the KVDB cluster with the error Peer URLs already exists. KVDB continued to run with only two members.Resolution: When Portworx encounters the above error, it removes the failed learner from the cluster, thereby allowing the third node to join. Component: Internal KVDB Affected Versions: 3.0.x, 3.1.1 | Major |
PWX-36873 | When Portworx was using HashiCorp's Vault configured with Kubernetes or AppRole authentication, it attempted to automatically refresh the access tokens when they expired. If the Kubernetes Service Account was removed or the AppRole expired, the token-refresh kept failing, and excessive attempts to refresh it caused a crash of the Vault service on large clusters. User Impact: The excessive attempts to refresh the tokens caused a crash of the Vault service on large clusters. Resolution: Portworx nodes now detect excessive errors from the Vault service and will avoid accessing Vault for the next 5 minutes. Component: Volume Management Affected Versions: 3.0.5, 3.0.3 | Major |
PWX-36601 | Previously, the default timeout for rebalance HA-update actions was 30 minutes. This duration was insufficient for some very slow setups, resulting in HA-update failures. User Impact: The rebalance job for HA-update failed to complete. In some cases, the volume's HA-level changed unexpectedly. Resolution: The default rebalance HA-update timeout has been increased to 5 hours. Components: Storage Affected Versions: 2.13.x, 3.0.x, 3.1.x | Major |
PWX-35312 | In version 3.1.0, a periodic job that fetched drive properties caused an increase in the number of API calls across all platforms. User Impact: The API rate limits approached their maximum capacity more quickly, stressing the backend. Resolution: Portworx improved the system to significantly reduce the number of API calls on all platforms. Component: Cloud Drives Affected Versions: 3.1.0 | Major |
PWX-30441 | For AWS users, Portworx did not update the drive properties for gp2 drives that were converted to gp3 drives. User Impact: Because the IOPS of such drives changed but were not updated, pool expansion failed on these drives. Resolution: During the maintenance cycle that is required for converting gp2 drives to gp3, Portworx now refreshes the disk properties of these drives. Component: Cloud Drives Affected Versions: 3.1.x, 3.0.x, 2.13.x | Major |
PWX-36139 | During pool expansion with the add-drive operation using the CSI provider on a KVDB node, there is a possibility of the new drive getting the StorageClass of the KVDB drive instead of the data drive, if they are different. User Impact: In such a case, a drive might have been added but the pool expansion operation failed, causing some inconsistencies. Resolution: Portworx now takes the StorageClass of only the data drives present in the node. Component: Pool Management Affected Versions: 3.1.x, 3.0.x, 2.13.x | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-3031 | For an Azure cluster with storage and storageless nodes using Premium LRS or SSD drive types, when a user updates the Portworx StorageClass to use PremiumV2 LRS or Ultra SSD drive types, the changes might not reflect on the existing nodes. Workaround: StorageClass changes will apply only to the new nodes added to the cluster. For existing nodes, perform the following steps:
Affected versions: 3.1.2 | Major |
PD-3012 | If maxStorageNodesPerZone is set to a value greater than the current number of worker nodes in an AKS cluster, additional storage nodes in an offline state may appear post-upgrade due to surge nodes. Workaround: Manually delete any extra storage node entries created during the Kubernetes cluster upgrade by following the node decommission process. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x, 3.1.x | Major |
PD-3013 | Pool expansion may fail if a node is rebooted before the expansion process is completed, displaying errors such as drives in the same pool not of the same type . Workaround: Retry the pool expansion on the impacted node. Components: Drive and Pool Management Affected versions: 3.1.2 | Major |
PD-3035 | Users may encounter issues with migrations of legacy shared volumes to sharedv4 service volumes appearing stuck if performed on a decommissioned node. Workaround: If a node is decommissioned during a migration, the pods running on that node must be forcefully terminated to allow the migration to continue. Component: Sharedv4 Volumes Affected version: 3.1.2 | Major |
PD-3030 | In environments where multipath is used to provision storage disks for Portworx, incorrect shutdown ordering may occur, causing multipath to shut down before Portworx. This can lead to situations where outstanding IOs from applications, still pending in Portworx, may fail to reach the storage disk. Workaround:
Affected Versions: 3.1.2 | Major |
3.1.1
April 03, 2024
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-35939 | For DR clusters, the cluster domain of the nodes is exposed in the node inspect and node enumerate SDK responses. This information is used by the operator to create the pod disruption budget, preventing loss during Kubernetes upgrades. | DR and Migration |
PWX-35395 | When Portworx encounters errors like checksum mismatch or bad disk sectors while reading data from the backend disk, the IOOperationWarning alert is raised. This alert is tracked by the metric px_alerts_iooperationwarning_total (see the example after this table). | Storage |
PWX-35738 | Portworx now queries an optimized subset of VMs to determine the driveset to attach, avoiding potential errors during an upgrade where a transient state of a VM could have resulted in an error during boot. | Cloud Drives |
PWX-35397 | The start time for Portworx on both Kubernetes and vSphere platforms has been significantly reduced by eliminating repeated calls to the Kubernetes API and vSphere servers. | Cloud Drives |
PWX-35042 | The Portworx CLI has been enhanced with the following improvements:
| Cloud Drives |
PWX-33493 | For pool expansion operations with the pxctl sv pool expand command, the add-disk and resize-disk flags have been renamed to add-drive and resize-drive , respectively. The command will continue to support the old flags for compatibility. | Cloud Drives |
PWX-35351 | The OpenShift Console now displays the Used Space for CSI sharedV4 volumes. | Sharedv4 |
PWX-35187 | Customers can now obtain the list of Portworx images from the spec generator. | Spec Generator |
PWX-36543 | If the current license is set to expire within the next 60 days, Portworx now automatically updates the IBM Marketplace license to a newer one upon the restart of the Portworx service. | Licensing |
PWX-36496 | The error messages for pxctl license activate have been improved to return a more appropriate error message in case of double activation. | Licensing |
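The following is a minimal sketch of a prometheus-operator PrometheusRule that fires when the px_alerts_iooperationwarning_total counter from PWX-35395 increases; the rule name, namespace, evaluation window, and severity label are illustrative assumptions, not values prescribed by Portworx.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: portworx-io-operation-warning        # hypothetical name
  namespace: portworx                        # assumption: a namespace your Prometheus watches
spec:
  groups:
    - name: portworx.io-warnings
      rules:
        - alert: PortworxIOOperationWarning
          expr: increase(px_alerts_iooperationwarning_total[10m]) > 0
          labels:
            severity: warning
          annotations:
            summary: "Portworx reported read errors (checksum mismatch or bad sectors) on a backend disk"
```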
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36416 | When a PX-StoreV2 pool reached its full capacity and could not be expanded further using the resize-drive option, it went offline due to a pool full condition. User Impact: If pool capacity reached a certain threshold, the pool went offline. Resolution: Since PX-StoreV2 pools cannot be expanded using the add-drive operation, you can increase the capacity on a node by adding new pools to it:
Affected Versions: 3.0.0 | Critical |
PWX-36344 | A deadlock in the Kubernetes Config lock led to failed pool expansion. User Impact: Customers needed to restart Portworx if pool expansion became stuck. Resolution: An unbuffered channel that resulted in a deadlock when written to in a very specific window is now changed to have a buffer, breaking the deadlock. Components: Pool Management Affected Versions: 2.13.x, 3.0.x | Major |
PWX-36393 | Occasionally, Portworx CLI binaries were installed incorrectly due to issues (e.g., read/write errors) that the installation process failed to detect, causing the Portworx service to not start. User Impact: Portworx upgrade process failed. Resolution: Portworx has improved the installation process by ensuring the correct installation of CLI commands and detecting these errors during the installation. Components: Install Affected Versions: 2.13.x, 3.0.x | Major |
PWX-36339 | For a sharedv4 service pod, there was a race condition where the cached mount table failed to reflect the unmounting of the path. User Impact: Pod deletion got stuck in the Terminating state, waiting for the underlying mount point to be deleted. Resolution: Force refresh of cache for an NFS mount point if it is not attached and is already unmounted. This will ensure that the underlying mount path gets removed and the pod terminates cleanly. Components: Sharedv4 Affected versions: 2.13.x, 3.0.x | Major |
PWX-36522 | When FlashArray Direct Access volumes and FlashArray Cloud Drive volumes were used together, the system couldn't mount the PVC due to an Invalid arguments for mount entry error, causing the related pods to not start. User Impact: Application pods failed to start. Resolution: The mechanism to populate the mount table on restart has been changed to ensure an exact device match rather than a prefix-based search, addressing the root cause of the incorrect mount entries and subsequent failures. Components: Volume Management Affected version: 3.1.0 | Major |
PWX-36247 | The field portworx.io/misc-args had an incorrect value of -T dmthin instead of -T px-storev2 to select the backend type. User Impact: Customers had to manually change this argument to -T px-storev2 after generating the spec from the spec generator. Resolution: The value for the field has been changed to -T px-storev2 . Components: FA-FB Affected version: 3.1.0 | Major |
PWX-35925 | When downloading the air-gapped bootstrap script for an OEM release (e.g., px-essentials ), the script used an incorrect URL for the Portworx images. User Impact: The air-gapped bootstrap script fetched the incorrect Portworx image, particularly for Portworx Essentials. Resolution: The air-gapped bootstrap script has been fixed and now correctly handles the OEM release images. Components: Install Affected version: 2.13.x, 3.0.x | Major |
PWX-35782 | In a synchronous DR setup, a node repeatedly crashed during a network partition because Portworx attempted to operate on a node from another domain that was offline and unavailable. User Impact: In the event of a network partition between the two domains, temporary node crashes could occur. Resolution: Portworx now avoids nodes from the other domain that are offline or unavailable. Components: DR and Migration Affected version: 3.1.0 | Major |
PWX-36500 | Older versions of Portworx installations with FlashArray Cloud Drive displayed an incorrect warning message in the pxctl status output on RHEL 8.8 and above OS versions, even though the issue had been fixed in the multipathd package that comes with these OS versions.User Impact: With Portworx version 2.13.0 or above, users on RHEL 8.8 or higher who were using FlashArray Cloud Drives saw the following warning in the pxctl status output: WARNING: multipath version 0.8.7 (between 0.7.7 and 0.9.3) is known to have issues with crashing and/or high CPU usage. If possible, please upgrade multipathd to version 0.9.4 or higher to avoid this issue .Resolution: The output of pxctl status has been improved to display the warning message for the correct RHEL versions.Components: FA-FB Affected version: 2.13.x, 3.0.x, 3.1.0 | Major |
PWX-33030 | For FlashArray Cloud Drives, when the skip_kpartx flag was set in the multipath config, the partition mappings for device mapper devices did not load, preventing Portworx from starting correctly. User Impact: This resulted in a random device (either a child or a parent/dm device) with the UUID label being selected and attempted to be mounted. If a child device was chosen, the mount would fail with a Device is busy error. Resolution: Portworx now avoids such a situation by modifying the specific unbuffered channel to include a buffer, thus preventing the deadlock. Components: FA-FB Affected version: 2.13.x, 3.0.x | Minor |
3.1.0.1
March 20, 2024
Visit these pages to see if you're ready to upgrade to this version:
This is a hotfix release intended for IBM Cloud customers. Please contact the Portworx support team for more information.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36260 | When installing Portworx version 3.1.0 from the IBM Marketplace catalog, the PX-Enterprise IBM Cloud license for a fresh installation is valid until November 30, 2026. However, for existing clusters that were running older versions of Portworx and upgraded to 3.1.0, the license did not automatically update to reflect the new expiry date of November 30, 2026.User Impact: With the old license expiring on April 2, 2024, Portworx operations could be affected after this date. Resolution: To extend the license until November 30, 2026, follow the instructions on the Upgrading Portworx on IBM Cloud via Helm page to update to version 3.1.0.1. Components: Licensing Affected versions: 2.13.x, 3.0.x, 3.1.0 | Critical |
3.1.0
January 31, 2024
Visit these pages to see if you're ready to upgrade to this version:
Starting with version 3.1.0:
- Portworx CSI for FlashArray and FlashBlade license SKU will only support Direct Access volumes and no Portworx volumes. If you are using Portworx volumes, reach out to the support team before upgrading Portworx.
- Portworx Enterprise will exclusively support kernel versions 4.18 and above.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- The auto_journal profile is now available to detect the IO pattern and determine whether the journal IO profile is beneficial for an application. This detector analyzes the incoming write IO pattern to ascertain whether the journal IO profile would improve the application's performance. It continuously analyzes the write IO pattern and toggles between the none and journal IO profiles as needed.
- A dynamic labeling feature is now available, allowing Portworx users to label Volume Placement Strategies (VPS) flexibly and dynamically. Portworx now supports the use of dynamic labeling through the inclusion of ${pvc.labels.labelkey} in values (see the sketch after this list).
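Below is a hedged sketch of a VolumePlacementStrategy using the dynamic label syntax above; the CRD version, the enforcement setting, and the app label key are assumptions, while the ${pvc.labels.labelkey} substitution form comes from the note above.

```yaml
apiVersion: portworx.io/v1beta2              # assumption: VolumePlacementStrategy CRD version
kind: VolumePlacementStrategy
metadata:
  name: replica-affinity-by-pvc-label        # hypothetical name
spec:
  replicaAffinity:
    - enforcement: required
      matchExpressions:
        - key: app
          operator: In
          values:
            - "${pvc.labels.app}"            # resolved at provisioning time from the PVC's "app" label
```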
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-31558 | Google Anthos users can now generate the correct Portworx spec from Portworx Central, even when storage device formats are incorrect. | Spec Generation |
PWX-28654 | Added the NonQuorumMember flag to the node inspect and Enumerate SDK API responses. This flag provides an accurate value depending on whether a node contributes to cluster quorum. | SDK/gRPC |
PWX-31945 | Portworx now provides an internal API for listing all storage options on the cluster. | SDK/gRPC |
PWX-29706 | Portworx now supports a new streaming Watch API that provides updates on volume information that has been created, modified, or deleted. | SDK/gRPC |
PWX-35071 | Portworx now distinguishes between FlashArray and FlashBlade calls, routing them to appropriate backends based on the current volume type (file or block), thereby reducing the load on FlashArray or FlashBlade backends. | FA-FB |
PWX-34033 | For FlashArray and FlashBlade integrations, many optimizations have been made in caching and information sharing, resulting in a significant reduction in the number of REST calls made to the backing FlashArray and FlashBlade. | FA-FB |
PWX-35167 | The default timeout for the FlashBlade Network Storage Manager (NSM) lock has been increased to prevent Portworx restarts. | FA-FB |
PWX-30083 | Portworx now manages the TTL for alerts instead of relying on etcd's key expiry mechanism. | KVDB |
PWX-33430 | The error message displayed when a KVDB lock times out has been made more verbose to provide a better explanation. | KVDB |
PWX-34248 | The sharedv4 parameter in a StorageClass enables users to choose between sharedv4 and non-shared volumes:
| Sharedv4 |
PWX-35113 | Users can now enable the forward-nfs-attach-enable storage option for applications using sharedv4 volumes. This allows Portworx to attach a volume to the most suitable available nodes. | Sharedv4 |
PWX-32278 | On the destination cluster, all snapshots are now deleted during migration when the parent volume is deleted. | Stork |
PWX-32260 | The resize-disk option for pool expansion is now also available on TKGS clusters. | Cloud Drives |
PWX-32259 | Portworx now uses cloud provider identification by reusing the provider's singleton instance, avoiding repetitive checks if the provider type is already specified in the cluster spec. | Cloud Drives |
PWX-35428 | In environments with slow vCenter API responses, Portworx now caches specific vSphere API responses, reducing the impact of these delays. | Cloud Drives |
PWX-33561 | When using the PX-StoreV2 backend, Portworx now detaches partially attached drivesets for cloud drives only when the cloud drives are not mounted. | Cloud Drives |
PWX-33042 | In a disaggregated deployment, storageless nodes can be converted to storage nodes by changing the node label to portworx.io/node-type=storage, as shown in the command after this table. | Cloud Drives |
PWX-28191 | AWS credentials for Drive Management can now be provided through a Kubernetes secret px-aws in the same namespace where Portworx is deployed. | Cloud Drives |
PWX-34253 | Azure users will now see accurate storage type displays: Premium_LRS is identified as SSD, and NVME storage is also correctly represented. | Cloud Drives |
PWX-31808 | Pool deletion is now allowed for vSphere cloud drives. | Cloud Drives |
PWX-32920 | vSphere drives can now be resized up to a maximum of 62 TB per drive. | Pool Management |
PWX-32462 | Portworx now permits most overlapping mounts and will only reject overlapping mounts if a bidirectional (i.e., shared) parent directory mount is present. | px-runc |
PWX-32905 | Portworx now properly detects the NFS service on OpenShift platforms. | px-runc |
PWX-35292 | To reduce log volume in customer clusters, logs generated when a volume is not found during CSI mounting have been moved to the TRACE level. | CSI |
PWX-34995 | Portworx CSI for FlashArray and FlashBlade license SKU now counts Portworx and FA/FB drives separately based on the drive type. | Licensing |
PWX-35452 | The mount mapping's lock mechanism has been improved to prevent conflicts between unmount and mount processes, ensuring more reliable pod start-ups. | Volume Management |
PWX-33577 | The fstrim operation has been improved for efficiency. | Storage |
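For PWX-34248, the following is a minimal sketch of a StorageClass that opts into a sharedv4 volume. The class name and repl value are assumptions; setting sharedv4 to "false" (or omitting the parameter) would request a non-shared volume instead.

```shell
# Hypothetical StorageClass: sharedv4: "true" requests a sharedv4 (RWX-capable) volume.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-sc
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  sharedv4: "true"
EOF
```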
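For PWX-33042, converting a storageless node in a disaggregated deployment is a matter of relabeling the node; the node name below is a placeholder.

```shell
# Relabel a storageless node so that it becomes a storage node (PWX-33042).
kubectl label node <node-name> portworx.io/node-type=storage --overwrite
```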
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-31652 | Portworx was unable to identify the medium for the vSphere cloud drives. User Impact: Portworx deployment failed on vSphere with cloud drives. Resolution: Portworx now identifies the drive medium type correctly and can be deployed on a cluster with vSphere cloud drives. Components: Drive & Pool Management Affected Versions: 2.13.x | Critical |
PWX-35430 | Requests for asynchronous DR migration operations were previously load balanced to nodes that were not in the same cluster domain. User Impact: In hybrid DR setups, such as one where cluster A is synchronously paired with cluster B, and cluster B is asynchronously paired with cluster C, any attempts to migrate from Cluster B to Cluster C would result in failure, showing an error that indicates a BackupLocation not found .Resolution: Portworx now ensures that migration requests are load balanced within nodes in the same cluster domain as the initial request. Components: DR and Migration Affected Versions: 3.0.4 | Critical |
PWX-35277 | In an asynchronous DR deployment, if security/auth is enabled in a Portworx cluster, migrations involving multiple volumes would fail with authentication errors. User Impact: Migrations in asynchronous DR setups involving multiple volumes failed with authentication errors. Resolution: Authentication logic has been modified to handle migrations involving multiple volumes on the auth enabled clusters. Components: DR and Migrations Affected versions: 3.0.0 | Critical |
PWX-34369 | When using HTTPS endpoints for cluster pairing, Portworx incorrectly parsed the HTTPS URL scheme. User Impact: Cluster pairing would fail when using an HTTPS endpoint. Resolution: Portworx has now corrected the HTTPS URL parsing logic. Components: DR and Migration Affected versions: 3.0.0 | Critical |
PWX-35466 | Cloudsnaps or asynchronous DR operations failed when attempted from a metro cluster due to inaccessible credentials. This issue specifically occurred if the credential was not available from both domains of the metro cluster. User Impact: Cloudsnap operations or asynchronous DR from metro clusters could fail if the required credentials were not accessible in both domains. Resolution: Portworx now detects a coordinator node that has access to the necessary credentials for executing cloudsnaps or asynchronous DR operations. Components: DR and Migration Affected versions: 3.0.4 | Critical |
PWX-35324 | FlashArray Direct Access volumes are formatted upon attachment. All newly created volumes remain in a pending state until they are formatted. If Portworx was restarted before a volume had been formatted, it would delete the volume that was still in the pending state. User Impact: The newly created FlashArray Direct Access volumes were deleted. Resolution: Portworx now avoids deleting volumes that are in the pending state. Components: FA-FB Affected versions: 3.0.x | Critical |
PWX-35279 | Upon Portworx startup, if there were volumes attached from a FlashArray that was not registered in the px-pure-secret , Portworx would detach them as part of a cleanup routine.User Impact: Non-Portworx disks, including boot drives and other FlashArray volumes, were mistakenly detached from the node and required reconnection. Resolution: Portworx no longer cleans up healthy FlashArray volumes on startup. Components: FA-FB Affected versions: 2.13.11, 3.0.0, 3.0.4 | Critical |
PWX-34377 | Portworx was incorrectly marking FlashBlade Direct Attach volumes as having transitioned to read-only status. This incorrect identification led to a restart of all pods associated with these volumes. User Impact: The restart of running pods resulted in application restarts or failures. Resolution: Checks within Portworx that were leading to false identification of read-only transitions for FlashBlade volumes have been fixed. Components: FA-FB Affected versions: 3.0.4 | Critical |
PWX-32881 | The CSI driver failed to register after the Anthos storage validation test suite was removed and a node was re-added to the cluster. User Impact: The CSI server was unable to restart if the Unix domain socket had been deleted. Resolution: The CSI server now successfully restarts and restores the Unix domain socket, even if the socket has been deleted. Update to this version if your workload involves deleting the kubelet directory during node decommissioning.Components: CSI Affected versions: 3.0.0 | Critical |
PWX-31551 | The latest OpenShift installs have stricter SELinux policies, which prevent non-privileged pods from accessing the csi.sock CSI interface file. User Impact: Portworx install failed. Resolution: All Portworx CSI pods are now configured as privileged pods. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Critical |
PWX-31842 | On TKGI clusters, if Portworx service and pods were restarted, it led to excessive mounts (mount-leaks). User Impact: The IO operations on the node would progressively slow down, until the host would completely hang. Resolution: The mountpoints that are used by Portworx have been changed. Components: oci-monitor Affected versions: 2.1.1 | Critical |
PWX-35603 | When running Portworx on older Linux systems (specifically those using GLIBC 2.31 or older) in conjunction with newer versions of Kubernetes, Portworx previously failed to detect dynamic updates of pod credentials and tokens, which led to Unauthorized errors when utilizing Kubernetes client APIs. User Impact: Users could encounter Unauthorized errors when using Kubernetes client APIs. Resolution: Dynamic token updates are now processed correctly by Portworx. Components: oci-monitor Affected versions: 3.0.1 | Critical |
PWX-34250 | If encryption was applied on both the client side (using an encryption passphrase) and the server side (using Server-Side Encryption, SSE) for creating credential commands, this approach failed to configure S3 storage in Portworx to use both encryption methods. User Impact: Configuration of S3 storage would fail in the above mentioned condition. Resolution: Users can now simultaneously use both server-side and client-side encryption when creating credentials for S3 or S3-compatible object stores. Components: Cloudsnaps Affected versions: 3.0.2, 3.0.3, 3.0.4 | Critical |
PWX-22870 | Portworx installations would by default automatically attempt to install NFS packages on the host system. However, since NFS packages add new users/groups, they were often blocked on Red Hat Enterprise Linux / CentOS platforms with SELinux enabled. User Impact: Sharedv4 volumes failed to attach on platforms with SELinux enabled. Resolution: Portworx installation is now more persistent on Red Hat Enterprise Linux / CentOS platforms with SELinux enabled. Components: IPV6 Affected versions: 2.5.4 | Major |
PWX-35332 | Concurrent access to an internal data structure containing NFS export entries resulted in a Portworx node crashing with the fatal error: concurrent map read and map write in knfs.HasExports error.User Impact: This issue triggered a restart of Portworx on that node. Resolution: A lock mechanism has been implemented to prevent this issue. Components: Sharedv4 Affected versions: 2.10.0 | Major |
PWX-34865 | When upgrading Portworx from version 2.13 (or older) to version 3.0 or newer, the internal KVDB version was also updated. If there was a KVDB membership change during the upgrade, the internal KVDB lost quorum in some corner cases. User Impact: The internal KVDB lost quorum, enforcing Portworx upgrade of a KVDB node that was still on an older Portworx version. Resolution: In some cases, Portworx now chooses a different mechanism for the KVDB membership change. Components: KVDB Affected versions: 3.0.0 | Major |
PWX-35527 | When a Portworx KVDB node went down and subsequently came back online with the same node ID but a new IP address, Portworx nodes on the other servers continued to use the stale IP address for connecting to KVDB. User Impact: Portworx nodes faced connection issues while connecting to the internal KVDB, as they attempted to use the outdated IP address. Resolution: Portworx now updates the correct IP address on such nodes. Component: KVDB Affected versions: 2.13.x, 3.0.x | Major |
PWX-33592 | Portworx incorrectly applied the time set by the execution_timeout_sec option. User Impact: Some operations timed out before the time set through the execution_timeout_sec option. Resolution: The behavior of this runtime option is now fixed. Components: KVDB Affected versions: 2.13.x, 3.0.x | Major |
PWX-35353 | Portworx installations (version 3.0.0 or newer) failed on Kubernetes systems using Docker container runtime versions older than 20.10.0. User Impact: Portworx installation failed on Docker container runtimes older than 20.10.0. Resolution: Portworx can now be installed on older Docker container runtimes. Components: oci-monitor Affected versions: 3.0.0 | Major |
PWX-33800 | In Operator version 23.5.1, Portworx was configured so that a restart of the Portworx pod would also trigger a restart of the portworx.service backend.User Impact: This configuration caused disruptions in storage operations. Resolution: Now pod restarts do not trigger a restart of the portworx.service backend.Components: oci-monitor Affected versions: 2.6.0 | Major |
PWX-32378 | During the OpenShift upgrade process, the finalizer service, which ran when Portworx was not processing IOs, experienced a hang and subsequently timed out. User Impact: This caused the OpenShift upgrade to fail. Resolution: The Portworx service now runs to stop Portworx and sets the PXD_timeout during OpenShift upgrades. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Major |
PWX-35366 | When the underlying nodes of an OKE cluster were replaced multiple times (due to upgrades or other reasons), Portworx failed to start, displaying the error Volume cannot be attached, because one of the volume attachments is not configured as shareable .User Impact: Portworx became unusable on nodes that were created to replace the original OKE worker nodes. Resolution: Portworx now successfully starts on such nodes. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33413 | After an upgrade, when a zone name case was changed, Portworx considered this to be a new zone. User Impact: The calculation of the total storage in the cluster by Portworx became inaccurate. Resolution: Portworx now considers a zone name with the same spelling, regardless of case, to be the same zone. For example, Zone1, zone1, and ZONE1 are all considered the same zone. Components: Cloud Drives Affected versions: 2.12.1 | Major |
PWX-33040 | For Portworx users using cloud drives on the IBM platform, when the IBM CSI block storage plugin was unable to successfully bind Portworx cloud-drive PVCs (for any reason), these PVCs remained in a pending state. As a retry mechanism, Portworx created new PVCs. Once the IBM CSI block storage plugin was again able to successfully provision drives, all these PVCs got into a bound state.User Impact: A large number of unwanted block devices were created in users' IBM accounts. Resolution: Portworx now cleans up unwanted PVC objects during every restart and KVDB failover. Components: Cloud Drives Affected versions: 2.13.0 | Major |
PWX-35114 | The storageless node could not come online after Portworx was deployed and showed the failed to find any available datastores or datastore clusters error.User Impact: Portworx failed to start on the storageless node which had no access to a datastore. Resolution: Storageless nodes can now be deployed without any access to a datastore. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33444 | If a disk that was attached to a node became unavailable, Portworx continuously attempted to find the missing drive-set. User Impact: Portworx failed to restart. Resolution: Portworx now ignores errors related to missing disks and attempts to start by attaching to the available driveset, or it creates a new driveset if suitable drives are available on the node. Components: Cloud Drives Affected versions: 2.13.x, 3.0.x | Major |
PWX-33076 | When more than one container mounted the same Docker volume, all of them mounted to the same path, because the mount path was derived only from the volume name and was therefore not unique. User Impact: When one container went offline, the volume would be unmounted for the other containers mounted to the same volume. Resolution: The volume mount HTTP request ID is now appended to the path, which makes the path unique for every mount of the same volume. Components: Volume Management Affected versions: 2.13.x, 3.0.x | Major |
PWX-35394 | Host detach operation on the volume failed with the error HostDetach: Failed to detach volume .User Impact: A detach or unmount operation on a volume would get stuck if attach and detach operations were performed in quick succession, leading to incomplete unmount operations. Resolution: Portworx now reliably handles detach or unmount operations on a volume, even when attach and detach operations are performed in quick succession. Components: Volume Management Affected Versions: 2.13.x, 3.0.x | Major |
PWX-32369 | In a synchronous DR setup, cloudsnaps with different objectstores for each domain failed to back up and clean up the expired cloudsnaps. User Impact: The issue occurred because a single node, which did not have access to both objectstores, was performing cleanup of the expired cloudsnaps. Resolution: Portworx now designates two nodes, one in each domain, to perform the cleanup of the expired cloudsnaps. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-35136 | During cloudsnap deletions, some objects were not removed because the deletion requests exceeded the S3 API's limit for the number of objects that could be deleted at once. User Impact: This would leave objects on S3 for deleted cloudsnaps, thereby consuming S3 capacity. Resolution: Portworx has been updated to ensure that deletion requests do not exceed the S3 API's limit for the number of objects that can be deleted. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-34654 | Cloudsnap status returned empty results without any error for a taskID that was no longer in the KVDB. User Impact: No information was provided for users to take corrective actions. Resolution: Portworx now returns an error instead of empty status values. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-31078 | When backups were restored to a namespace different from the original volume's, the restored volumes retained labels indicating the original namespace, not the new one. User Impact: The functionality of sharedv4 volumes was impacted because the labels did not accurately reflect the new namespace in which the volumes were located. Resolution: Labels for the restored volume have been fixed to reflect the correct namespace in which the volume resides. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Major |
PWX-32278 | During migration, in certain error scenarios an orphaned snapshot was left behind on the destination cluster even though the parent volume was not present. User Impact: This could lead to an increase in capacity usage. Resolution: Such orphaned cloudsnaps are now deleted when the parent volume is deleted. Components: Asynchronous DR Affected versions: 2.13.x, 3.0.x | Major |
PWX-35084 | Portworx incorrectly determined the number of CPU cores when running on hosts enabled with cGroupsV2. User Impact: This created issues when limiting the CPU resources, or pinning the Portworx service to certain CPU cores. Resolution: Portworx now properly determines the number of available CPU cores. Components: px-runc Affected versions: 3.0.2 | Major |
PWX-32792 | On OpenShift 4.13, Portworx did not proxy portworx-service logs. It kept journal logs from multiple machine IDs, which caused the Portworx pod to stop proxying the logs from portworx.service .User Impact: In OpenShift 4.13, the generation of journal logs from multiple machine IDs led to the Portworx pod ceasing to proxy the logs from portworx.service .Resolution: Portworx log proxy has been fixed to locate the correct journal log using the current machine ID. Components: Monitoring Affected versions: 2.13.x, 3.0.x | Major |
PWX-34652 | During the ha-update process, all existing volume labels were removed and could not be recovered.User Impact: This resulted in the loss of all volume labels, significantly impacting volume management and identification. Resolution: Volume labels now do not change during the ha-update process.Components: Storage Affected versions: 2.13.x, 3.0.x | Major |
PWX-34710 | A large amount of log data was generated during storage rebalance jobs or dry runs. User Impact: This led to log files occupying a large amount of space. Resolution: The volume of logging data has been reduced by 10%. Components: Storage Affected versions: 2.13.x | Major |
PWX-34821 | In scenarios where the system is heavily loaded and imbalanced, elevated syncfs latencies were observed. This situation led to the fs_freeze call, responsible for synchronizing all dirty data, timing out before completion.User Impact: Users experienced timeouts during the fs_freeze call, impacting the normal operation of the system.Resolution: Restart the system and retry the snapshot operation. Components: Storage Affected versions: 3.0.x | Major |
PWX-33647 | When the Portworx process is restarted, it verifies the existing mounts on the system for sanity. If one of the mounts was an NFS mount of a Portworx volume, the mount point verification would hang because Portworx was still in the process of starting up. User Impact: The Portworx process would not come up and would enter an infinite wait, waiting for the mount point verification to return. Resolution: When Portworx is starting up, it now skips the verification of Portworx-backed mount points to allow the startup process to continue. Components: Storage Affected versions: 3.0.2 | Major |
PWX-33631 | Portworx applied locking mechanisms to synchronize CSI volume provisioning requests across different worker nodes in order to distribute workloads evenly, which caused a decrease in performance for CSI volume creation. User Impact: This synchronization approach led to a decrease in performance for CSI volume creation in heavily loaded clusters. Resolution: If experiencing slow CSI volume creation, upgrade to this version. Components: CSI Affected versions: 2.13.x, 3.0.x | Major |
PWX-34355 | On certain occasions, while mounting FlashArray cloud drive disks backing a storage pool, Portworx used the single-path device instead of the multipath device. User Impact: Portworx entered the StorageDown state. Resolution: Portworx now identifies the multipath device associated with a given device name and uses this multipath device for mounting operations. Components: FA-FB Affected versions: 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.13.11, 3.0.0 | Major |
PWX-34925 | Creating a large number of FlashBlade Direct Access volumes in quick succession could lead to a restart of Portworx with the fatal error: sync: unlock of unlocked mutex error. User Impact: When trying to create a large number of FlashBlade volumes concurrently, the Portworx process might get restarted due to contention on the lock. Resolution: The locking mechanism has been improved to avoid this error. Components: FA-FB Affected versions: 3.0.4 | Major |
PWX-35680 | The Portworx spec generator was incorrectly defaulting telemetry to be disabled when the StorageCluster spec was generated outside of the Portworx Central UI. This does not affect customers who applied a storagecluster with an empty telemetry spec or generated their spec through the UI. User Impact: Telemetry was disabled by default. Resolution: To enable telemetry, users should explicitly specify it if intended. Components: Spec-Gen Affected versions: 2.12.0, 2.13.0, 3.0.0 | Major |
PWX-34325 | When operating Kubernetes with the containerd runtime and a custom root directory set in the containerd configuration, the installation of Portworx would fail.User Impact: Portworx install would fail, resulting in unusual error messages due to a bug in containerd. Resolution: The installation will now intercept the error message and replace it with a clearer message that includes suggestions on how to fix the Portworx configuration. Components: Installation Affected versions: 3.0.0 | Minor |
PWX-33557 | The CallHome functionality sometimes unconditionally attempted to send the data to the local telemetry service. User Impact: This caused errors, if the telemetry was disabled. Resolution: The CallHome now sends data only if the Telemetry has been enabled. Components: Monitoring Affected versions: 3.0.0 | Minor |
PWX-32536 | Portworx installation failed on certain Linux systems using cGroupsV2 and containerd container runtimes, as it was unable to properly locate container identifiers. User Impact: Portworx installation failed. Resolution: The container scanning process has been improved to ensure successful Portworx installation on such platforms. Components: oci-monitor Affected versions: 2.13.x, 3.0.x | Minor |
PWX-30967 | During volume provisioning, snapshot volume labels were included in the count, so nodes were disqualified for provisioning when a volume_anti_affinity or volume_affinity VPS was configured, resulting in volume creation failures. User Impact: When stale snapshots existed, the creation of volumes using a VPS with either the volume_anti_affinity or volume_affinity setting would fail. Resolution: Upgrade to this version and retry the previously failed volume creation request. Components: Stork Affected versions: 2.13.2 | Minor |
PWX-33999 | During the installation of NFS packages, Portworx incorrectly interpreted any issues or errors that occurred as timeout errors. User Impact: Portworx misrepresented and masked the original issues. Resolution: Portworx now accurately processes NFS installation errors during its installation. Components: px-runc Affected versions: 2.7.0 | Minor |
PWX-33008 | Creation of a proxy volume with CSI enabled and RWX access mode failed due to the default use of sharedv4 for all RWX volumes in CSI. User Impact: Users could not create proxy volumes with CSI enabled and RWX access mode. Resolution: To successfully create proxy volumes with CSI and RWX access mode, upgrade to this version. Components: Sharedv4 Affected versions: 3.0.0 | Minor |
PWX-34326 | The Portworx CSI Driver GetPluginInfo API returned an incorrect CSI version. User Impact: This resulted in confusion when the CSI version was retrieved by the Nomad CLI. Resolution: The Portworx CSI Driver GetPluginInfo API now returns the correct CSI version. Components: CSI Affected versions: 2.13.x,3.0.x | Minor |
PWX-31577 | Occasionally, when a user requested a cloudsnap to stop, it would lead to an incorrect increase in the available resources. User Impact: More cloudsnaps were started and they were stuck in the NotStarted state as resources were unavailable. Resolution: Stopping cloudsnaps no longer incorrectly increases the available resources, thus avoiding the issue. Components: Cloudsnaps Affected versions: 2.13.x, 3.0.x | Minor |
Known issues (Errata)
Issue Number | Issue Description | Severity |
---|---|---|
PD-2673 | KubeVirt VM or container workloads may remain in the Starting state due to the remounting of volumes failing with a device busy error. Workaround: Affected versions: 2.13.x, 3.0.x | Critical |
PD-2546 | In a synchronous DR deployment, telemetry registrations might fail on the destination cluster. Workaround: Affected versions: 3.0.4 | Critical |
PD-2574 | If a disk is removed from an online pool using the PX-StoreV2 backend, it may cause a kernel panic. Workaround: To avoid kernel panic, do not remove disks from an online pool or node. Components: Storage Affected versions: NA | Critical |
PD-2387 | In OpenShift Container Platform (OCP) version 4.13 or newer, application pods using Portworx sharedv4 volumes can get stuck in the Terminating state. This is because kubelet is unable to stop the application container when an application namespace is deleted. Workaround: If a pod is stuck in the Terminating state, reboot the node on which the pod is running. Note that after rebooting, it might take several minutes for the pod to transition out of the Terminating state. Components: Sharedv4 Affected versions: 3.0.0 | Major |
PD-2621 | Occasionally, deleting a TKGi cluster with Portworx fails with the Warning: Executing errand on multiple instances in parallel. error. Workaround: Before deleting your cluster, perform the documented preparation steps. Components: Kubernetes Integration Affected versions: | Major |
PD-2631 | After resizing a FlashArray Direct Access volume with a filesystem (such as ext4, xfs, or others) by a significant amount, you might not be able to detach the volume, or delete the pod using this volume. Workaround: Allow time for the filesystem resizing process to finish. After the resize is complete, retry the operations. Components: FA-FB Affected versions: 2.13.x, 3.0.x, 3.1.0 | Major |
PD-2597 | Online pool expansion with the add-disk operation might fail when using the PX-StoreV2 backend.Workaround: Enter the pool into maintenance mode, then expand your pool capacity. Components: Storage Affected versions: 3.0.0, 3.1.0 | Major |
PD-2585 | The node wipe operation might fail with the Node wipe did not cleanup all PX signatures. A manual cleanup maybe required. error on a system where user-defined device names contain Portworx reserved keywords (such as pwx). Workaround: You need to rename or delete devices that use Portworx reserved keywords in their device names before retrying the node wipe operation. Furthermore, it is recommended not to use Portworx reserved keywords such as px, pwx, pxmd, px-metadata, pxd, or pxd-enc while setting up devices or volumes, to avoid encountering such issues. Components: Storage Affected versions: 3.0.0 | Major |
PD-2665 | During a pool expansion operation, if a cloud-based storage disk drive provisioned on a node is detached before the completion of the pool resizing or rebalancing, you can see the show drives: context deadline exceeded error in the output of the pxctl sv pool show command.Workaround: Ensure that cloud-based storage disk drives involved in pool expansion operations remain attached until the resizing and rebalancing processes are fully completed. In cases where a drive becomes detached during this process, hard reboot the node to restore normal operations. Component: PX-StoreV2 Affected versions: 3.0.0, 3.1.0 | Major |
PD-2833 | With Portworx 3.1.0, migrations might fail between two clusters if one of the clusters is running a version of Portworx older than 3.1.0, resulting in a key not found error.Workaround: Ensure that both the source and destination clusters are upgraded to version 3.1.0 or newer. Components: DR & Migration Affected Versions: 3.1.0 | Minor |
PD-2644 | If an application volume contains a large number of files (e.g., 100,000) in a directory, changing the ownership of these files can take a long time, causing delays in the mount process. Workaround: If the ownership change is taking a long time, Portworx by Pure Storage recommends setting fsGroupChangePolicy to OnRootMismatch, as shown in the example after this table. For more information, see the Kubernetes documentation. Components: Storage Affected versions: 2.13.x, 3.0.x | Minor |
PD-2359 | When a virtual machine is transferred from one hypervisor to another and Portworx is restarted, the CSI container might fail to start properly and shows the CrashLoopBackoff error.Workaround: Remove the topology.portworx.io/hypervisor label from the affected node.Components: CSI Affected versions: 2.13.x, 3.0.x | Minor |
PD-2579 | When the Portworx pod (oci-mon ) cannot determine the management IP used by the Portworx container, the pxctl status command output on this pod shows a Disabled or Unhealthy status.Workaround: This issue is related to display only. To view the correct information, run the following command directly on the host machine: kubectl exec -it <oci-mon pod> -- nsenter --mount=/host_proc/1/ns/mnt -- pxctl status .Components: oci-monitor Affected versions: 2.13.0 | Minor |
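For PD-2644, the following is a minimal sketch of a pod that sets fsGroupChangePolicy to OnRootMismatch, so Kubernetes skips the recursive ownership change when the volume root already has the expected group. The pod, image, and PVC names are placeholders.

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app-with-large-volume
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch   # skip chown/chgrp when root ownership already matches
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: px-data-pvc   # placeholder PVC backed by a Portworx volume
EOF
```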
3.0.5
April 17, 2024
Visit these pages to see if you're ready to upgrade to this version:
For users currently on Portworx versions 2.11.x, 2.12.x, or 2.13.x, Portworx by Pure Storage recommends upgrading to Portworx 3.0.5 instead of moving to the next major version.
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-36858 | When using Hashicorp Vault integration, Portworx nodes kept attempting to connect to the Vault service. In the case of misconfigured authentication, the excessive attempts to log in to Vault crashed the Vault service. User Impact: Excessive attempts led to crashing of Vault services. Resolution: Portworx has implemented exponential back-off to reduce the frequency of login attempts to the Vault service. Components: Secret Store Affected Versions: 3.0.4 | Critical |
PWX-36873 | When Portworx uses HashiCorp Vault configured with Kubernetes or AppRole authentication, it automatically refreshes expired access tokens. However, if the Kubernetes Service Account was removed or the AppRole expired, the token refresh failed. User Impact: Excessive attempts to refresh the access tokens caused the Vault service to crash, especially in large clusters. Resolution: The Portworx node now identifies excessive errors from the Vault service and avoids accessing Vault for a cooling-off period of 5 minutes. Components: Secret Store Affected Versions: 3.0.3 | Major |
PWX-36847 | In case of a Kubernetes API call failure, Portworx used to incorrectly assume the zone of the node to be the default empty zone. Due to this, it tried to attach drives that belonged to that default zone. As there were no drives created in this default zone, Portworx went ahead and created a new set of drives, assuming this node to be in a different zone. User Impact: This led to duplicate entries and the cluster went out of quorum. Resolution: Portworx no longer treats the default zone as a special zone. This allows Portworx to check for any existing drives that are already attached or available to be attached from any zone before trying to create new ones. Components: Cloud Drives Affected Versions: 3.0.3 | Major |
PWX-36786 | An offline, storageless node was incorrectly auto-decommissioned due to specific race conditions, resulting in the clouddrive DriveSet being left orphaned. User Impact: Portworx failed to start when attempting to operate as a storageless node using this orphaned clouddrive DriveSet, due to the node being in a decommissioned state. Resolution: Portworx now automatically cleans up such orphaned storageless clouddrive DriveSets, allowing it to start successfully. Components: Cloud Drive Affected Versions: 2.13.x, 3.0.x, and 3.1.x | Major |
3.0.4
November 15, 2023
Visit these pages to see if you're ready to upgrade to this version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-34315 | Improved how Portworx identifies pods with volumes in the Read-Only state before restarting them. | Storage |
PWX-34153 | CSI sidecar images are updated to the latest open source versions. | CSI |
PWX-34029 | Portworx now removes stale FlashArray multipath devices upon startup, which may result from pod failovers (for FlashArray Direct Access) or drive set failovers (for FlashArray Cloud Drives) while Portworx was not running. These stale devices had no direct impact but could have led to slow operations if many were present. | FA-FB |
PWX-34974 | Users can now configure the default duration, which is set to 15 minutes, after which the logs should be refreshed to get the most up-to-date statistics for FlashBlade volumes, using the following command: pxctl cluster options update --fb-stats-expiry-duration <time-in-minutes>. The minimum duration for refresh is one minute. | FA-FB |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-34334 | Cloudsnaps of an aggregated volume with a replication level of 2 or more uploaded incorrect data if one of the replica nodes from which a previous cloudsnap operation had been executed was down. User Impact: The most recent snapshots were lost. Resolution: Portworx now forces a full backup in scenarios where the previous cloudsnap node is down. Components: Cloudsnaps Affected versions: 3.0.x | Critical |
PWX-33632 | If an attach request remained in the processing queue for a long time, it would lead to a panic. User Impact: Portworx would restart on the node. This was because an FA attach operation involved making REST API calls to FA, as well as running iSCSI rescans, which consumed more time. When Portworx received a high volume of requests to attach FA DirectAccess volumes, the queue for these attach requests gradually grew over time, leading to a panic in Portworx. Resolution: The timeout for queued attach requests has been increased to 15 minutes for FA DirectAccess volumes. Components: FA-FB Affected versions: 2.13.x, 3.0.x | Critical |
PWX-34885 | When NFS proxy volumes were created, it resulted in the restart of the Portworx service. User Impact: Although NFS proxy volumes were created, the service restart affected user applications. Resolution: Portworx now creates NFS proxy volumes successfully without restarting the Portworx service. Components: Storage Affected versions: 3.0.2 | Critical |
PWX-34277 | When an application pod using an FA Direct Access volume was failed over to another node, and Portworx was restarted on the original node, the pod on the original node became stuck in the Terminating state. User Impact: Portworx didn't clean up the mountpaths where the volume had previously been attached, as it couldn't locate the application on the local node. Resolution: Portworx now cleans up the mountpath even when the application is not found on the node. Components: FA-FB Affected versions: 2.13.x, 3.0.x | Major |
PWX-30297 | Portworx failed to restart when a multipath device was specified for the internal KVDB. Several devices with the kvdbvol label were found for the multipath device. Portworx selected the first device on the list, which might not have been the correct one.User Impact: Portworx failed to start because it selected the incorrect device path for KVDB. Resolution: When a multipath device is specified for the internal KVDB, Portworx now selects the correct device path. Components: KVDB Affected versions: 2.11.x | Major |
PWX-33935 | When the --sources option was used in the pxctl volume ha-update command for the aggregated volume, it caused the Portworx service processes to abort with an assertion.User Impact: The Portworx service on all nodes in the cluster continuously kept restarting. Resolution: Contact the Portworx support team to restore your cluster. Components: Storage Affected versions: 2.13.x, 3.0.x | Major |
PWX-33898 | When two pods, both using the same RWO FA Direct Access volume, were started on two different nodes, Portworx would move the FA Direct Access volume attachment to the node where the most recent pod was running, rather than rejecting the setup request for the second pod. User Impact: A stale FA Direct Access multipath device remained on the original node where the first pod was started, causing subsequent attach or mount requests on that node to fail. Resolution: A second pod request for the same RWO FA Direct Access volume on a different node will now be rejected if such a FA Direct Access volume is already attached and in use on another node. Components: FA-FB Affected versions: 2.13.11 | Major |
PWX-33828 | If you deleted a FA Direct Access PVC attached to an offline Portworx node, Portworx removed the associated volume from its KVDB. However, the FlashArray did not delete its associated volume because it remained connected to the offline node on the FlashArray. User Impact: This created orphaned volumes on the FlashArray. Resolution: Portworx now detects a volume that is attached to an offline Portworx node and will disconnect it from all the nodes in the FlashArray and avoid orphaned volumes. If there are any existing orphaned volumes, clean them manually. Components: FA-FB Affected versions: 2.13.8 | Major |
3.0.3
October 11, 2023
Notes
- This version addresses security vulnerabilities.
- Starting with version 3.0.3, aggregated volumes with PX-StoreV2 are not supported.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-32255 | Now the runtime option fast_node_down_detection is enabled by default. This option allows quick detection of when the Portworx service goes offline. | Storage |
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33113 | Portworx reduced the pricing for GCP Marketplace from 55 cents/node/hour to 33 cents/node/hour, but this change was not being reflected for existing users who were still reporting billing to the old endpoint. User Impact: Existing GCP Marketplace users were being incorrectly billed at the previous rate of 55 cents/node/hour. Resolution: Upgrade Portworx to version 3.0.3 to reflect the new pricing rate. Components: Billing Affected versions: 2.13.8 | Critical |
PWX-34025 | In certain cases, increasing the replication level of a volume on a PX-StoreV2 cluster created new replicas with non-zero blocks that had been overwritten with zeros on the existing replicas. User Impact: The Ext4 filesystem reported a mismatch and delayed allocation failures when a user application attempted to write data to the volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the volume.Components: PX-StoreV2 Affected versions: 3.0.2 | Major |
3.0.2
September 28, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-32226 | AWS users can now choose to enable server-side encryption for S3 credentials, assuming the S3 object-store provider supports it. Use the --s3-sse flag with either the AES256 or aws:kms value; see the example after this table. | Cloudsnaps |
PWX-33229 | Previously, a Portworx license would expire if Portworx could not reach its billing server within 72 hours. Now users can continue to use Portworx for up to 30 days if the billing servers are not reachable. | Licensing |
PWX-31233 | Portworx has removed volume size enforcement for FlashArray and FlashBlade Direct Access volumes. This will allow users to create volumes greater than 40TiB for all license types. | Licensing |
PWX-33551 | Users can now configure the REST API call timeout (in seconds) for FA/FB by adding the new environment variable PURE_REST_TIMEOUT to the StorageCluster. When updating this value, the execution timeout should also be updated accordingly using the following command: pxctl cluster options update --runtime-options execution_timeout_sec=<sec>. By default, PURE_REST_TIMEOUT is set to 8 seconds and execution_timeout_sec to 60 seconds. Contact Portworx support to find the right values for your cluster; see the sketch after this table. | FA-FB |
PWX-33364 | As part of FlashArray integration, Portworx has now reduced the number of API calls it makes to the arrays endpoint on FA. | FA-FB |
PWX-33593 | Portworx now caches certain FlashArray attachment system calls, improving the performance of mount operations for FA backed volumes on nodes with large numbers of attached devices, or many redundant paths to the array. | FA-FB |
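For PWX-32226, a hedged example of creating S3 credentials with server-side encryption follows. Only the --s3-sse flag and its AES256/aws:kms values come from this note; the remaining flags are the usual S3 credential flags and should be verified with pxctl credentials create --help.

```shell
# Sketch: enable SSE for an S3 credential. Placeholders are shown in angle brackets.
pxctl credentials create \
  --provider s3 \
  --s3-access-key <ACCESS_KEY> \
  --s3-secret-key <SECRET_KEY> \
  --s3-region <region> \
  --s3-endpoint <endpoint> \
  --s3-sse AES256 \
  my-s3-credentials
```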
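For PWX-33551, a sketch of setting PURE_REST_TIMEOUT through the StorageCluster and aligning execution_timeout_sec follows. The StorageCluster name px-cluster and the portworx namespace are assumptions, and a merge patch replaces the entire spec.env list, so include any entries you already define.

```shell
# Add the FA/FB REST timeout to the StorageCluster environment (name/namespace are assumptions).
kubectl -n portworx patch storagecluster px-cluster --type merge \
  -p '{"spec":{"env":[{"name":"PURE_REST_TIMEOUT","value":"<seconds>"}]}}'

# Then, from any Portworx node, update the execution timeout (command from PWX-33551).
pxctl cluster options update --runtime-options execution_timeout_sec=<sec>
```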
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33451 | In certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data was expected. User Impact: Ext4 filesystem complained about a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume.Components: Storage Affected versions: 3.0.0, 2.12.x, 2.13.x | Critical |
PWX-33258 | Sometimes, Portworx timed out FlashBlade direct access volume creation when it took over 30 seconds. User Impact: Volume creation stayed in a pending state. Resolution: The timeout for FB volume creation has been increased to 180 seconds (3 minutes) to allow more time for FBDA volume creation. Users can now use the --fb-lock-timeout cluster option to increase the timeout for FB volume creation beyond 180 seconds (3 minutes). Components: FA-FB Affected versions: 2.13.6 | Critical |
PWX-32428 | In the PKS environment, the sharedv4 mount failed on the remote client node with the error No such file or directory. User Impact: Restarts of the Portworx pods and service led to excessive mounts (mount-leaks) on PKS platforms, progressively slowing down IO operations on the node. Resolution: The mountpoints that Portworx uses on the PKS platform have been changed. If you are experiencing slowdowns on a PKS node, upgrade the Operator to the latest version, and reboot the affected PKS nodes. Components: Sharedv4 Affected versions: 2.12.x, 2.13.x | Critical |
PWX-33388 | The standalone SaaS metering agent crashed the Portworx container with a nil panic error. User Impact: This caused the Portworx container on one node to crash continuously. Resolution: Upgrade to 3.0.2 if you are using a SaaS license to avoid this issue. Components: Control Plane Affected versions: 3.0.1, 3.0.0 | Critical |
PWX-32074 | The CPU core numbers were wrongly detected by the px-runc command. User Impact: Portworx did not start on the requested cores. Resolution: The behavior of the --cpuset-cpus argument of the px-runc install command has been fixed. Users can now specify the CPUs on which Portworx execution should be allowed. Components: px-runc Affected versions: 2.x.x | Critical |
PWX-33112 | Timestamps were incorrectly recorded in the write-ahead log. User Impact: The write operations were stuck due to a lack of log reservation space. Resolution: Portworx now consistently flushes timestamp references into the log. Components: Storage Affected versions: 2.12.x, 2.13.x | Critical |
PWX-31605 | The pool expansion failed because the serial number from the WWID could not be extracted. User Impact: FlashArray devices (both cloud drives and direct access) encountered expansion or attachment failures when multipath devices from other vendors (such as HPE or NetApp) were attached. Resolution: This issue has been fixed. Components: Pool Management Affected versions: 2.13.2 | Critical |
PWX-33120 | Too many unnecessary vSphere API calls were made by Portworx. User Impact: An excess of API calls and vSphere events could have caused confusion and distraction for users of vSphere Cloud Drives. Resolution: If you are seeing many vSphere VM Reconfigure events at a regular interval in the clusters configured with Portworx Cloud Drives, upgrade Portworx to the latest version. Components: Metering & Billing Affected versions: 3.0.0 | Major |
PWX-33299 | When using a custom image registry, OCI-Monitor was unable to locate the Kubernetes namespaces needed to pull secrets. User Impact: Portworx installation failed with the error Failed retrieving default/tcr-pull-cpaas-5000. Resolution: Portworx now consults the container runtime and Kubernetes to determine the correct Kubernetes namespace for Portworx installation. Components: OCI Monitor Affected versions: 3.0.0, 2.13.x, 2.12.x | Major |
PWX-31840 | When resizing a volume, the --provisioning-commit-labels cluster option was not honored, resulting in unlimited thin provisioning. User Impact: Portworx volumes were resized to large sizes without rejections, exceeding pool provisioning limits. Resolution: Now the --provisioning-commit-labels cluster option is honored during resizing volumes and prevents unexpected large volumes.Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-32572 | When using the older Containerd versions (v1.4.x or 1.5.x), Portworx kept opening connections to Containerd, eventually depleting all the file-descriptors available on the system. User Impact: Portworx nodes crashed with the too many open files error. Resolution: Portworx no longer leaks the file-descriptors when working with older Containerd versions. Components: OCI Monitor Affected versions: 2.13.6, 3.0.0 | Minor |
PWX-30781 | The Kubernetes version parameter (?kbver) in the air-gapped script did not process the version extension. User Impact: The script generated the wrong image URLs for the Kubernetes-dependent images. Resolution: Parsing of the kbver parameter has been fixed; see the example after this table. Components: Spec Generator Affected versions: 3.0.0 | Minor |
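For PWX-30781, an example of passing a Kubernetes version that carries an extension suffix to the air-gapped script follows. The Portworx version, the Kubernetes version, and the %2B URL encoding of the + sign are illustrative assumptions.

```shell
# Fetch the air-gapped bootstrap script for a Kubernetes build with a version extension
# (for example 1.26.5+k3s1, URL-encoded below).
curl -fsSL -o px-ag-install.sh \
  "https://install.portworx.com/3.0.2/air-gapped?kbver=1.26.5%2Bk3s1"
```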
3.0.1
September 3, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-33389 | Validation of the Portworx CSI license for FA/FB failed when Purity was upgraded to version 6.4.2 or newer, causing the Portworx license status to appear expired. User Impact: Users could not create new volumes. Resolution: The auth token is no longer used by Portworx when making API or api_version calls to FA during license validation. Components: FA-FB Affected versions: 3.0.0 | Critical |
PWX-33223 | Portworx was hitting a panic when a value was set for an uninitialized object. User Impact: This caused the Portworx container to crash and restart. Resolution: Upgrade to Portworx version 3.0.1 if using Pure cloud drives. Components: FA-FB Affected versions: 3.0.0 | Major |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. . This issue can be caused by an orphaned storageless node. Workaround: Verify if the node which has this error is a storageless node. If it is, delete the orphaned storageless node using the command: pxctl clouddrive delete --node <> to progress the upgrade. |
3.0.0
July 11, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
Portworx 3.0.0 requires Portworx Operator 23.5.1 or newer.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- AWS users can now deploy Portworx with the PX-StoreV2 datastore. In order to have PX-StoreV2 as your default datastore, your cluster should pass the preflight check, which verifies your cluster's compatibility with the PX-StoreV2 datastore.
- You can now provision and use cloud drives on FlashArrays that are in the same zone using the CSI topology for FlashArray Cloud Drives feature. This improves fault tolerance for replicas, performance, and manageability for large clusters.
- For environments such as GCP and Anthos that follow a blue-green upgrade model, Portworx allows a temporary license extension to minimize downtime during upgrades. Once you start the license expansion, the Portworx cluster's license is temporarily extended to accommodate up to double the number of licensed nodes. While the existing nodes (called blue nodes) serve production traffic, Portworx expands the cluster by adding new nodes (called green nodes) that have an upgraded Linux OS or new hardware.
- Portworx now offers the capability to utilize user-managed keys for encrypting cloud drives on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE). By leveraging powerful encryption algorithms, the Oracle disk encryption feature converts data into an encrypted format, ensuring that unauthorized individuals cannot access it. You can specify the encryption key in the StorageCluster using the following cloud-drive volume specification: type=pv-<number-of-vpus>,size=<size-of-disk>,kms=<ocid-of-vault-key> (see the first sketch after this list).
- Portworx now enables you to define custom tags for cloud drives provisioned across various platforms such as AWS, Azure, GCP, and Oracle Cloud. While installing Portworx, you can specify the custom tags in the StorageCluster spec: type=<device-type>,size=<volume-size>,tags=<custom-tags> (see the second sketch after this list). This user-defined metadata enhances flexibility and organization, and provides additional contextual information for objects stored in the cloud. It gives users improved data management, search capabilities, and greater control over their cloud-based data.
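A minimal fragment showing the OKE encryption-key specification is below. The VPU count, size, and the Vault key OCID placeholder are illustrative; only the type=pv-...,size=...,kms=... format comes from this note.

```shell
# Write a StorageCluster spec fragment for an OKE cloud drive encrypted with a user-managed key.
cat <<'EOF' > oke-clouddrive-fragment.yaml
# Merge into the spec of your StorageCluster
spec:
  cloudStorage:
    deviceSpecs:
    - type=pv-10,size=100,kms=<ocid-of-vault-key>
EOF
```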
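A similar fragment for custom cloud drive tags is below. The AWS gp3 type and size are example values, and the tags placeholder is kept as written in this note because the exact tag syntax is platform-specific.

```shell
# Write a StorageCluster spec fragment that attaches custom tags to a provisioned cloud drive.
cat <<'EOF' > custom-tags-fragment.yaml
# Merge into the spec of your StorageCluster
spec:
  cloudStorage:
    deviceSpecs:
    - type=gp3,size=150,tags=<custom-tags>
EOF
```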
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description | Component |
---|---|---|
PWX-29486 | Portworx now supports online expansion of storage pools containing auto journal devices with disk-resize operation. | Pool Management |
PWX-29435 | When you run the pxctl sv pool expand -o add-disk command (see the example after this table), the common disk tags from existing drives will be attached to the newly added cloud-drive disk. | Pool Management |
PWX-28904 | Storage pool expansion now supports online pool resizing on Azure, with no downtime, provided the requirements in Microsoft's documentation are met. | Pool Management |
PWX-30876 | Pool expansion with add-disk operation is now supported for repl1 volumes. | Pool Management |
PWX-29863 | The pool expansion completion message is improved to Pool resize requested successfully. Please check resize operation status with pxctl sv pool show . | Pool Management |
PWX-28665 | The pxctl cd list command now lists cloud-drives on nodes with local drives. | Cloud Drives |
PWX-28697 | FlashArray cloud drives now show information about the array they are located on. Use pxctl cd inspect to view this information. | Cloud Drives |
PWX-29348 | Added 3 new fields to the CloudBackupSize API to reflect the correct backup size. | Cloudsnaps |
PWX-27610 | Portworx will now periodically defragment the KVDB database. By default, KVDB is defragmented every 2 weeks if the DB size is greater than 100 MiB. You can also configure the defragmentation schedule using the pxctl cluster options update command. | KVDB |
PWX-31403 | For AWS clusters, Portworx now applies default configurations for the dedicated KVDB disk. | KVDB |
PWX-31055 | The alert message for VolumeSpaceLow is improved to show clear information. | Storage |
PWX-29785 | Improved the implementation to restrict the nodiscard and autofstrim flags on XFS volumes. These two flags are disabled for volumes formatted with XFS. | PX-StoreV1 |
PWX-30557 | Portworx checks pool size and drive count limits before resizing the storage pool. It will abort with a proper error message if the resolved pool expansion plan exceeds limits. | PX-StoreV2 |
PWX-30820 | Portworx now redistributes cloud migration requests received from Stork among all the nodes in the cluster using a round-robin mechanism. This helps evenly distribute the migration workload across all the nodes in the cluster and avoids hot spots. | DR & Migration |
PWX-29428 | Portworx CSI images now use the registry.k8s.io registry. | CSI |
PWX-28035 | Portworx now supports distributing FlashArray Cloud Drive volumes among topologically distributed FlashArrays. | FA-FB |
PWX-31500 | The pxctl cluster provision-status command will now show more states of a pool. | CLI |
PWX-31257 | The pxctl alerts show command with the --start-time and --end-time options can now be used independently. | Monitoring |
PWX-30754 | Added support for leases permission to the PVC controller ClusterRole. | Spec Generation |
PWX-29202 | pxctl cluster provision-status will now show the host name for nodes. The host name helps you to correlate that command's output with the node list provided by pxctl status . | CLI |
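For PWX-29435 and PWX-29863, a hedged example of the add-disk pool expansion flow follows. The -u and -s short flags are assumptions, so confirm them with pxctl sv pool expand --help.

```shell
# List pools to find the target pool UUID, then expand it with the add-disk operation.
pxctl sv pool show
pxctl sv pool expand -u <pool-UUID> -s <new-size-in-GiB> -o add-disk

# Check the resize status afterwards, as suggested by the expansion completion message.
pxctl sv pool show
```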
Fixes
Issue Number | Issue Description | Severity |
---|---|---|
PWX-30030 | Some volumes incorrectly showed Not in quorum status. User Impact: Portworx volumes were out of quorum after a network split even though all the nodes and pools for the volume's replicas were online and healthy. This happened when the node could communicate with KVDB over the network but not with the rest of the nodes. Resolution: Restart the Portworx service on the node where the volume is currently attached. Components: Storage Affected versions: 2.12.2 | Critical |
PWX-30511 | When autofstrim was disabled, internal autofstrim volume information was not removed completely. User Impact: An error occurred while running manual fstrim. Resolution: This issue has been fixed. Components: Storage Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30294 | The pvc-controller pods failed to start in the DaemonSet deployment. User Impact: The pvc-controller failed due to the deprecated values of the --leader-elect-resource-lock flag.Resolution: These values have been removed to use the default leases value.Components: Spec Generator Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30930 | The KVDB cluster could not form a quorum after KVDB was down on one node. User Impact: On a loaded cluster or when the underlying KVDB disk had latency issues, KVDB nodes failed to elect leaders among themselves. Resolution: Increase the heartbeat interval using the runtime option kvdb-heartbeat-interval=1000; see the example after this table. Components: KVDB Affected versions: 2.12.x, 2.13.x | Critical |
PWX-30985 | Concurrent pool expansion operations using add-disk and auto resulted in pool expansion failure, with the error mountpoint is busy .User Impact: Pool resize requests were rejected. Resolution: Portworx now serializes pool expansion operations. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30685 | In clusters running with cloud drives and auto-journal partitions, pool deletion resulted in deleting the data drive with an auto-journal partition. User Impact: Portworx had issues restarting after the pool deletion operation. Resolution: Upgrade to the current Portworx version. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30628 | A pool expansion would result in a deadlock when the pool had a volume in a re-sync state and the pool was already full. User Impact: Pool expansion would get stuck if a volume in the pool was in a re-sync state and the pool was full, and no new pool expansions could be issued on such a pool. Resolution: Pool expansion is now aborted immediately if an unclean volume is detected in the pool. Components: Pool Management Affected versions: 2.12.x, 2.13.x | Major |
PWX-30551 | If a diagnostics package collection was triggered during a node initialization, it caused the node initialization to fail and the node to restart. User Impact: The node restarted when node initialization and diagnostics package collection occurred at the same time. Resolution: Now diagnostics package collection will not restart the node. Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-29976 | Cloud drive creation failed when a vSphere 8.0 datastore cluster was used for Portworx installation. User Impact: Portworx failed to install on vSphere 8 with datastore clusters. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.13.1 | Major |
PWX-29889 | Portworx installation with local install mode failed when both a journal device and a KVDB device were configured simultaneously. User Impact: Portworx would not allow creating multiple disks in a local mode install. Resolution: This issue has been fixed. Components: KVDB Affected versions: 2.12.x, 2.13.x | Major |
PWX-29512 | In certain cases, a KVDB node failover resulted in inconsistent KVDB membership, causing an orphaned entry in the cluster. User Impact: The cluster operated with one less KVDB node. Resolution: Every time Portworx performs a KVDB failover, if it detects an orphaned node, Portworx removes it before continuing the failover operation. Components: KVDB Affected versions: 2.13.x | Major |
PWX-29511 | Portworx would remove an offline internal KVDB node as part of its failover process, even when it was not part of quorum. User Impact: The KVDB cluster would lose quorum and required manual intervention to restore its functionality. Resolution: Portworx will not remove a node from the internal KVDB cluster if it is out of quorum. Components: KVDB Affected versions: 2.13.x | Major |
PWX-28287 | Pool expansion on an EKS cluster failed while optimization of the associated volume(s) was in progress. User Impact: Pool expansion was unsuccessful. Resolution: Portworx now catches these scenarios early in the pool expansion process and provides a clear and readable error message to the user. Components: Cloud Drives Affected versions: 2.12.x, 2.13.x | Major |
PWX-28590 | In vSphere local mode install, storageless nodes (disaggregated mode) would claim storage ownership of a hypervisor if it was the first to boot up. This meant that a node capable of creating storage might not be able to get ownership. User Impact: In vSphere local mode, Portworx installed in degraded mode. It occurred during a fresh install or when an existing storage node was terminated. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.12.1 | Major |
PWX-30831 | On EKS, if the cloud drives were in different zones or removed, Portworx failed to boot up in certain situations. User Impact: Portworx did not start on an EKS cluster with removed drives. Resolution: Portworx now ignores zone mismatches and sends alerts for deleted drives. It will now not abort the boot up process and continue to the next step. Components: Cloud Drives Affected versions: 2.12.x, 2.13.x | Major |
PWX-31349 | Sometimes Portworx processes on the destination or DR cluster would restart frequently due to a deadlock between the node responsible for distributing the restore processing and the code attempting to attach volumes internally. User Impact: Restore operations failed. Resolution: This issue has been fixed. Components: DR and Migration Affected versions: 2.12.x, 2.13.x | Major |
PWX-31019 | During cloudsnap backup/restore, there was a crash occasionally caused by the array index out of range of the preferredNodeForCloudsnap function. User Impact: Cloudsnap restore failed. Resolution: This issue has been fixed. Components: Storage Affected versions: 2.12.x, 2.13.x | Major |
PWX-30246 | Portworx NFS package installation failed due to a lock held by the unattended-upgrade service running on the system. User Impact: Sharedv4 volume mounts failed. Resolution: The Portworx NFS package install now waits for the lock, then installs the required packages. This issue is resolved after upgrading to the current version and restarting the Portworx container. Components: Sharedv4 Affected versions: 2.11.2, 2.12.1 | Major |
PWX-30338 | VPS pod labels were not populated in the Portworx volume spec. User Impact: VPS using the podMatchExpressions field in a StatefulSet sometimes failed to function correctly because volume provisioning and pod creation occurred at the same time. Resolution: Portworx now ensures that volume provisioning collects the pod name before provisioning. Components: Volume Placement Strategies Affected versions: 2.12.x, 2.13.x | Minor |
PWX-28317 | A replica set was incorrectly created for proxy volumes. User Impact: When a node was decommissioned, it got stuck if a proxy volume’s replica set was on that node. Resolution: Now replica sets are not created for proxy volumes. Components: Proxy Volumes Affected versions: 2.11.4 | Minor |
PWX-29411 | In vSphere, when creating a new cluster, KVDB disk creation failed for a selected KVDB node. User Impact: In the local mode install, the KVDB disk creation failures wrongly resulted in giving up ownership of a hypervisor. This created two storage nodes on the same hypervisor. Resolution: This issue has been fixed. Components: Cloud Drives Affected versions: 2.12.1, 2.13.x | Minor |
PWX-28302 | The pool expand command failed to expand an existing pool size when it was increased by 4 GB or less. User Impact: If the user expanded the pool by 4 GB or less, the pxctl sv pool expand command failed with an invalid parameter error.Resolution: Increase the pool size by at least 4 GB. Components: PX-StoreV2 Affected versions: 2.12.x, 2.13.x | Minor |
PWX-30632 | NFS backupLocation for cloudBackups failed with the error validating credential: Empty name string for nfs . The NFS credential name used by Portworx to mount the NFS server was not passed to the required function. User Impact: Using BackupLocations for NFS targets failed. Resolution: Portworx now passes the credential name to the function that uses it to mount the NFS server. Components: Cloudsnaps Affected versions: 2.13.x | Minor |
PWX-25792 | During the volume mount of FA/FB DA volumes, Portworx did not honor the nosuid mount option specified in the storage class. User Impact: Post migration from PSO to Portworx, volumes with the nosuid mount option failed to mount on the host. Resolution: Portworx now explicitly sets the nosuid mount option in the mount flag before invoking the mount system call. Components: FA-FB Affected versions: 2.11.0 | Minor |
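A minimal sketch of a StorageClass that requests the nosuid mount option described in PWX-25792; the class name is a placeholder, and backend: "pure_block" is assumed to be the FlashArray Direct Access selector in your environment:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fa-da-nosuid              # placeholder name
provisioner: pxd.portworx.com
parameters:
  backend: "pure_block"           # assumed FA Direct Access backend parameter
mountOptions:
  - nosuid                        # honored by Portworx per PWX-25792
```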
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2149 | Portworx 3.0.0 cannot be installed using the Rancher catalog chart. You should use PX-Central to generate the Portworx spec. |
PD-2107 | If there is a ha-update operation while the volume is in a detached state, a different node might start publishing the volume metrics, but the old node won't stop publishing them. This results in duplicate metrics, and only one will have the correct currhalevel. Workaround: For detached volumes, before doing a ha-update , attach the volume manually through pxctl (see the sketch after this table). |
PD-2086 | Portworx does not support Oracle API signing keys with a passphrase. Workaround: Use API signing keys without a passphrase. |
PD-2122 | The add-drive operation fails when a drive is added to an existing cloud-based pool.Workaround: Use the pxctl service pool expand -operation add-disk -uid <pool-ID> -size <new-storage-pool-size-in-GiB> command to add a new drive to such pools. |
PD-2170 | The pool expansion can fail on Google Cloud when using the pxctl service pool expand -operation add-disk command with the error Cause: ProviderInternal Error: googleapi: Error 503: Internal error. Please try again or contact Google Support. Workaround: Rerun the command. |
PD-2188 | In OCP 4.13 or newer, when the application namespace or pod is deleted, application pods that use Portworx sharedv4 volumes can get stuck in the Terminating state. The output of the ps -ef --forest command for the stuck pod showed that the conmon process had one or more defunct child processes. Workaround: Find the nodes on which the sharedv4 volume(s) used by the affected pods are attached, then restart the NFS server on those nodes with the systemctl restart nfs-server command. Wait for a couple of minutes. If the pod is still stuck in the Terminating state, reboot the node on which the pod is running. The pod might take several minutes to release after a reboot. |
PD-2209 | When Portworx is upgraded to version 3.0.0 without upgrading Portworx Operator to version 23.5.1, telemetry is disabled. This is because the port is not updated for the telemetry pod. Workaround: Upgrade Portworx Operator to the latest version and bounce the Portworx pods manually. |
PD-2615 | Migrations triggered as part of Async DR will fail in the "Volume stage" when Portworx is configured with PX-Security on the source and destination clusters. Workaround: Please contact support if you encounter this issue. |
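A sketch of the PD-2107 workaround above; the volume name and replication level are placeholders:

```shell
# Manually attach the detached volume before changing its HA level
pxctl host attach <volume-name>

# Then run the HA update
pxctl volume ha-update --repl 3 <volume-name>
```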
Known issues (Errata) with PX-StoreV2 datastore
Issue Number | Issue Description |
---|---|
PD-2138 | Scaling down the node groups in AWS results in node termination. After a node is terminated, the drives are moved to an available storageless node. However, in some cases, after migration the associated pools remain in an error state. Workaround: Restart the Portworx service, then run a maintenance cycle using the pxctl sv maintenance --cycle command. |
PD-2116 | In some cases, re-initialization of a node fails after it is decommissioned and wiped with the error Failed in initializing drives on the node x.x.x.x : failed to vgcreate . Workaround: Reboot the node and retry initializing it. |
PD-2141 | When cloud drives are detached and reattached manually, the associated pool can go down and remain in an error state. Workaround: Restart the Portworx service, then run a maintenance cycle using the pxctl sv maintenance --cycle command. |
PD-2153 | If the add-drive operation is interrupted by a drive detach, scale down or any other operation, the pool expansion can get stuck.Workaround: Reboot the node. |
PD-2174 | When you add unsupported drives to the StorageCluster spec of a running cluster, Portworx goes down. Workaround: Remove the unsupported drive from the StorageCluster spec. The Portworx Operator will recreate the failed pod and Portworx will be up and running again on that node. |
PD-2208 | Portworx on-premises with PX-StoreV2 fails to upgrade to version 3.0.0. Workaround: Replace -T dmthin with -T px-storev2 in your StorageCluster, as the dmthin flag is deprecated. After updating the StorageCluster spec, restart the Portworx nodes. |
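A hedged sketch of the PD-2208 change, assuming the -T flag is passed through the Operator's portworx.io/misc-args annotation; adjust to wherever the flag appears in your StorageCluster:

```yaml
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster                          # placeholder name
  annotations:
    # was: portworx.io/misc-args: "-T dmthin"
    portworx.io/misc-args: "-T px-storev2"  # the dmthin flag is deprecated
```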
2.13.12
March 05, 2024
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-35603 | When running Portworx on older Linux systems (specifically those using GLIBC 2.31 or older) with newer versions of Kubernetes, Portworx failed to detect dynamic updates of pod credentials and tokens. This led to Unauthorized errors when using Kubernetes client APIs.Resolution: Portworx now correctly processes dynamic token updates. |
PWX-29750 | In certain cases, the cloudsnaps that were using S3 object-stores were not completely deleted because S3 object-stores did not support bulk deletes or were unable to handle large cloudsnaps. This resulted in undeleted cloudsnap objects, leading to unnecessary capacity consumption on S3. Resolution: Portworx now addresses and resolves such cloudsnaps deletion issues. |
PWX-35136 | During cloudsnap deletions, some objects were not removed because the deletion requests exceeded the S3 API's limit for the number of objects that could be deleted at once. This would leave objects on S3 for deleted cloudsnaps, thereby consuming S3 capacity. Resolution: Portworx now ensures that deletion requests do not exceed the S3 API's limit. |
PWX-31019 | An array index out of range error in the preferredNodeForCloudsnap function occasionally caused crashes during cloudsnap backup/restore operations. Resolution: This issue has been fixed, and Portworx now prevents such crashes during cloudsnap backup or restore operations. |
PWX-30030 | Some Portworx volumes incorrectly showed Not in quorum status after a network split, even though all the nodes and pools for the volume's replicas were online and healthy. This happened when a node could communicate with KVDB over the network but not with the rest of the nodes. Resolution: Portworx volumes now accurately reflect their current state in such situations. |
PWX-33647 | When the Portworx process is restarted, it verifies the existing mounts on the system for sanity. If one of the mounts was an NFS mount of a Portworx volume, the mount point verification would hang because Portworx was still in the process of starting up. Resolution: When Portworx is starting up, it now skips the verification of Portworx-backed mount points to allow the startup process to continue. |
PWX-29511 | Portworx would remove an offline internal KVDB node as part of its failover process, even when it was not part of quorum. The KVDB cluster would lose quorum and required manual intervention to restore its functionality. Resolution: Portworx does not remove a node from the internal KVDB cluster if it is out of quorum. |
PWX-29533 | During node initialization with cloud drives, a race condition occasionally occurred between the Linux device manager (udevd) and Portworx initialization, causing node initialization failures. This was because drives were not fully available for Portworx's use, preventing users from adding new nodes to an existing cluster. Resolution: Portworx has increased the number of retries for accessing the drives during initialization to mitigate this failure. |
PWX-35650 | GKE customers encountered a nil panic exception when the provided GKE credentials were invalid. Resolution: Portworx now properly shuts down and logs the error, aiding in the diagnosis of credential-related issues. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2768 | When cloning or capturing a snapshot of a FlashArray Direct Access PVC that is either currently resizing or has encountered a resizing failure, the clone or snapshot creation might fail. Workaround: Initiate the resize operation again on the original volume, followed by the deletion and recreation of the clone or snapshot, or allow for an automatic retry. |
2.13.11
October 25, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
- It is recommended that you upgrade to the latest version of Portworx when upgrading from version 2.13.11.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-34029 | Portworx now removes stale FlashArray multipath devices upon startup, which may result from pod failovers (for FlashArray Direct Access) or drive set failovers (for FlashArray Cloud Drives) while Portworx was not running. These stale devices had no direct impact but could have led to slow operations if many were present. |
PWX-33551 | You can now configure the REST API call timeout (in seconds) for FA/FB by adding the new environment variable PURE_REST_TIMEOUT to the StorageCluster. When updating this value, you should also update the execution timeout using the following command: pxctl cluster options update --runtime-options execution_timeout_sec=<sec> . By default, PURE_REST_TIMEOUT is set to 8 seconds and execution_timeout_sec to 60 seconds. Contact Portworx support to find the right values for your cluster (see the example after this table). This improvement was included in Portworx version 3.0.2 and is now backported to 2.13.11. |
PWX-33229 | Previously, a Portworx license would expire if Portworx could not reach its billing server within 72 hours. Users can now continue to use Portworx for up to 30 days if the billing servers are not reachable. This improvement was included in Portworx version 3.0.2 and now is backported to 2.13.11. |
PWX-33364 | As part of FlashArray integration, Portworx has now reduced the number of API calls it makes to the arrays endpoint on FA. This improvement was included in Portworx version 3.0.2 and now is backported to 2.13.11. |
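A sketch of the PWX-33551 settings described above; the values shown are the stated defaults, and the right numbers for your cluster should come from Portworx support:

```yaml
# StorageCluster excerpt: REST API call timeout for FA/FB, in seconds
spec:
  env:
  - name: PURE_REST_TIMEOUT
    value: "8"
```

```shell
# Keep the execution timeout in step with the REST timeout
pxctl cluster options update --runtime-options execution_timeout_sec=60
```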
Fixes
Issue Number | Issue Description |
---|---|
PWX-33828 | If you deleted a FA Direct Access PVC attached to an offline Portworx node, Portworx removed the associated volume from its KVDB. However, the FlashArray did not delete its associated volume because it remained connected to the offline node on the FlashArray. This created orphaned volumes on the FlashArray. Resolution: Portworx now detects a volume that is attached to an offline Portworx node and will disconnect it from all the nodes in the FlashArray and avoid orphaned volumes. |
PWX-33632 | If an attach request remained in the processing queue for a long time, it would lead to a panic, causing Portworx to restart on a node. This was because an FA attach operation involved making REST API calls to FA, as well as running iSCSI rescans, which consumed more time. When Portworx received a high volume of requests to attach FA DirectAccess volumes, the queue for these attach requests gradually grew over time, leading to a panic in Portworx. Resolution: The timeout for queued attach requests has been increased to 15 minutes for FA DirectAccess volumes. |
PWX-33898 | When two pods, both using the same RWO FA Direct Access volume, were started on two different nodes, Portworx would move the FADA volume attachment to the node where the most recent pod was running, rather than rejecting the setup request for the second pod. This resulted in a stale FADA multipath device remaining on the original node where the first pod was started, causing subsequent attach or mount requests on that node to fail. Resolution: A second pod request for the same RWO FA Direct Access volume on a different node will now be rejected if such a FA Direct Access volume is already attached and in use on another node. |
PWX-33631 | During the provisioning of CSI volumes, Portworx obtains locks to synchronize requests across worker nodes and distribute the workload; contention on these locks could slow down volume creation. Resolution: If CSI volume creation is slow, upgrade to this version. |
PWX-34277 | When an application pod using an FA Direct Access volume was failed over to another node, and Portworx was restarted on the original node, the pod on the original node became stuck in the Terminating state. This occurred because Portworx didn't clean up the mountpaths where the volume had previously been attached, as it couldn't locate the application on the local node. Resolution: Portworx now cleans up the mountpath even when the application is not found on the node. |
PWX-34334 | Cloudsnaps of an aggregated volume with a replication level of 2 or more uploaded incorrect data if one of the replica nodes from which a previous cloudsnap operation had been executed was down. Resolution: Portworx now forces a full backup in scenarios where the previous cloudsnap node is down. |
PWX-33935 | When the --sources option was used in the pxctl volume ha-update command for the aggregated volume, it caused the Portworx service processes to abort with an assertion. As a result, the Portworx service on all nodes in the cluster continuously kept restarting.Resolution: Contact the Portworx support team to restore your cluster. |
PWX-34025 | In certain cases, increasing the replication level of a volume on a PX-StoreV2 cluster created new replicas with non-zero blocks that were overwritten with zeros on the existing replicas. This caused the ext4 filesystem to report a mismatch and delayed allocation failures when a user application attempted to write data to the volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the volume. This issue has been fixed in Portworx version 3.0.3 and now backported to 2.13.11. |
PWX-33451 | In certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data was expected. This caused the ext4 filesystem to report a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume. This issue has been fixed in Portworx version 3.0.2 and is now backported to 2.13.11. |
PWX-32572 | When using the older Containerd versions (v1.4.x or 1.5.x), Portworx kept opening connections to Containerd, eventually depleting all the file-descriptors available on the system. This caused the Portworx nodes to crash with the too many open files error. Resolution: Portworx no longer leaks the file-descriptors when working with older Containerd versions. This issue has been fixed in Portworx version 3.0.2 and is now backported to 2.13.11. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2474 | In certain scenarios, you might encounter the alert Failed to delete FlashArray Direct Access volume on FlashArray when deleting an FA Direct Access PVC. This occurs when the Portworx volume and the Kubernetes PVC are deleted, but the deletion fails on the FlashArray. |
PD-2474 | When a Portworx volume is created, it remains in the down - pending state. This occurs due to a race condition when Portworx is restarted while it is performing an FA API call to create a volume, and the volume creation is not completed on the FA side.Workaround: Delete the volume in the down - pending state using the pxctl volume delete command. |
PD-2477 | During FA Direct Access volume resizing, if the network between FlashArray and Portworx is disconnected, the PVC and the Portworx volume reflect the updated size, but the actual size on the FA backend remains unchanged. Workaround: Once the network is connected again, trigger another PVC resize operation to update the size on the FlashArray backend. |
2.13.10
September 3, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33389 | The Portworx CSI license validation for FA/FB failed when Purity was upgraded to version 6.4.2 or newer. This caused the Portworx license status to appear expired, and users were not able to create new volumes. Resolution: This issue has been fixed in Portworx version 3.0.1 and is now backported to 2.13.10. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. This issue can be caused by an orphaned storageless node. Workaround: Verify whether the node that reports this error is a storageless node. If it is, delete the orphaned storageless node using the pxctl clouddrive delete --node <> command so that the upgrade can proceed. |
2.13.9
August 28, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33258 | This issue impacted only users of FlashBlade Direct Access volumes. When FlashBlade Direct Access volume creation took more than 30 seconds, Portworx sometimes timed out on volume creation, leaving volumes in a pending state. Resolution: With this fix, the default timeout for FB volume creation has been increased from 30 seconds to 180 seconds (3 minutes). You can also set this timeout to a higher value using the new cluster option --fb-lock-timeout . You can tune this as required based on volume creation times on FlashBlade, as they depend on your performance and network bandwidth. You must set this time in minutes; for example, to set the timeout to 6 minutes: pxctl cluster options update --fb-lock-timeout 6 |
2.13.8
August 24, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-33014 | With Portworx Operator 23.7.0, Portworx can dynamically load telemetry port values specified by the operator. |
PWX-30798 | Users can now schedule fstrim operations. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-33006 | The FlashArray Direct Access PVCs were deleted upon a Portworx restart if they were newly created, not yet attached, and in a "Pending" state. There is no data loss since these were unpopulated volumes. Resolution: Portworx has enhanced the code to no longer delete "Pending" FADA volumes on PX startup. |
PWX-30511 | When auto fstrim was disabled, internal state data did not clear and caused manual fstrim to enter an error state. Resolution: This issue has been fixed. |
2.13.7
July 11, 2023
Visit these pages to see if you're ready to upgrade to this version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-31855 | When mounting a large number of PVCs that use FADA volumes, PVC creation took a long time and crashed Portworx. Resolution: The heavyweight list of all devices API has been removed from the attach call, reducing the time taken to attach volumes. |
PWX-30551 | The node restarted when node initialization and diagnostics package collection happened at the same time. Resolution: The diagnostics package collection will not restart the node. |
PWX-21105 | Volume operations such as Attach/Detach/Mount/Unmount would get stuck if a large number of requests were sent for the same volume. Portworx would accept all requests and add them to its API queue. All requests for a specific volume are processed serially. This would cause newer requests to be queued for a longer duration. Resolution: When a request does not get processed within 30s because it is sitting behind other requests in the API queue for the same volume, Portworx will return an error to the client requesting it to try again. |
PWX-29067 | The application pods using FADA volumes were not automatically remounted in read-write mode when one of the multiple configured network interfaces went down. Resolution: Portworx now enables multiple iSCSI interfaces for FlashArray connections. These interfaces must be registered with the iscsiadm -m iface command. Use the --flasharray-iscsi-allowed-ifaces cluster option to restrict the interfaces used by FADA connections. This ensures that if one of the interfaces goes down, the FADA volume stays mounted as read-write. For more details about the flasharray-iscsi-allowed-ifaces flag, see FlashArray and FlashBlade environment variables. |
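A sketch of the PWX-29067 setup above; the interface names are placeholders and the comma-separated value format is an assumption:

```shell
# List the iSCSI interfaces registered on the node
iscsiadm -m iface

# Restrict FlashArray connections to specific interfaces (placeholder names)
pxctl cluster options update --flasharray-iscsi-allowed-ifaces iface0,iface1
```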
2.13.6
June 16, 2023
Visit these pages to see if you're ready to upgrade to this version:
Notes
- This version addresses security vulnerabilities.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-30569 | Portworx now supports OpenShift version 4.13.0 with kernel version 5.14.0-284.13.1.el9_2.x86_64. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-31647 | If any read-write volume changed to a read-only state, pods using these volumes had to be manually restarted to remount them as read-write. Resolution: A background task is now implemented to run periodically (by default every 30 seconds), which checks for read-only volumes and terminates managed pods using them. You can customize this time interval with the --ro-vol-pod-bounce-interval cluster option. This background task is enabled for FA DirectAccess volumes by default. To enable it for all Portworx volumes, use the --ro-vol-pod-bounce all cluster option. |
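A sketch of the PWX-31647 cluster options described above; the 30-second value is the stated default, and the unit of the interval flag is assumed to be seconds:

```shell
# Enable the read-only volume check for all Portworx volumes
# (it is enabled for FA DirectAccess volumes by default)
pxctl cluster options update --ro-vol-pod-bounce all

# Optionally adjust how often the background check runs
pxctl cluster options update --ro-vol-pod-bounce-interval 30
```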
2.13.5
May 16, 2023
Visit these pages to see if you're ready to upgrade to this version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx can now be deployed from Azure Marketplace with a pay-as-you-go subscription.
2.13.4
May 09, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx by Pure Storage recommends upgrading to Portworx 2.13.4 if you are using Portworx 2.12.0 with Azure managed identity to avoid the PWX-30675 issue, which is explained below.
Fixes
Issue Number | Issue Description |
---|---|
PWX-30675 | During installation of Portworx 2.12.0 on AKS, Portworx checked for the AZURE_CLIENT_SECRET , AZURE_TENANT_ID and AZURE_CLIENT_ID environment variables. However, users of Azure managed identity had only set the AZURE_CLIENT_ID , resulting in a failed installation.Resolution: This issue has been fixed and now Portworx checks only for the AZURE_CLIENT_ID environment variable. |
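For Azure managed identity users, a minimal StorageCluster excerpt matching the PWX-30675 fix above; the client ID value is a placeholder:

```yaml
# StorageCluster excerpt: only AZURE_CLIENT_ID is required with managed identity
spec:
  env:
  - name: AZURE_CLIENT_ID
    value: "<managed-identity-client-id>"   # placeholder
```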
2.13.3
April 24, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
If you are currently using any Portworx 2.12 version, Portworx by Pure Storage recommends upgrading to version 2.13.3 due to the PWX-29074 issue, which is explained below.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-30420 | In Portworx version 2.13.0, a prerequisite check was implemented to detect the versions of the multipath tool with known issues (0.7.x to 0.9.3) during installation or upgrade of Portworx. If a faulty version was detected, it was not possible to install or upgrade Portworx. However, this prerequisite check has now been removed, and Portworx installs or upgrades are not blocked on these faulty versions. Instead, a warning message is displayed, advising customers to upgrade their multipath package. |
PWX-29992 | In Async DR migration, a snapshot was previously created at the start of restores as a fallback in case of errors, but it added extra load with creation and deletion operations. This has been improved: Portworx no longer creates a fallback snapshot, and users can create clones from the last successfully migrated snapshot if necessary for error cases. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-29640 | Incorrect allocation of floating licenses and insertion of excess data into the Portworx key-value database caused new nodes to repeatedly fail to join the Portworx cluster. Resolution: Cluster-join failures now perform a thorough cleanup to remove all temporary resources created during the failed cluster-join attempts. |
PWX-30056 | During migration, if a PVC has the sticky bit set (which prevents volumes from being deleted), it accumulated the internal snapshot that was created for the asynchronous DR deployment, thus consuming extra storage space. Resolution: The internal snapshots are now created without the sticky bit. |
PWX-30484 | The SaaS license key was not activated when installing Portworx version 2.13.0 or later. Resolution: This issue has been fixed. |
PWX-26437 | Due to a rare corner-case condition, node decommissioning could leave orphaned keys in the KVDB. Resolution: The forced node-decommission command has been modified to perform the node-decommission more thoroughly, and to clean up the orphaned data from the KVDB. |
PWX-29074 | Portworx incorrectly pinged the customer DNS server. At regular intervals, when the /etc/hosts file from the node periodically rsynced with the Portworx runc container, it temporarily removed the mappings for KVDB domain names. As a result, internal KVDB name resolution queries were incorrectly forwarded to the customer's DNS servers. Resolution: This issue has been fixed. |
PWX-29325 | The local snapshot schedule could not be changed using the pxctl CLI. An update to a previously created snapshot failed with the error Update Snapshot Interval: Failed to update volume: This IO profile has been deprecated . Resolution: You can now disable snapshot schedules with the --periodic parameter, as shown in the following command: pxctl volume snap-interval-update --periodic 0 <vol-id> |
PWX-30255 | Log messages have been improved to include extra metadata in node mark-down updates. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2063 | In an Async DR deployment, if the --sticky flag is set to on for Portworx volumes, migration-related issues can occur. Workaround: Turn off the sticky bit flag on the Portworx volumes on the source cluster: PX_POD=$(kubectl get pods -l name=portworx -n <px-namespace> -o jsonpath='{.items[0].metadata.name}') kubectl exec $PX_POD -n <px-namespace> -- /opt/pwx/bin/pxctl volume update <vol-id> --sticky off |
2.13.2
April 7, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27957 | The volume replica level in an Asynchronous DR deployment now matches the source volume's replica level at the end of each migration cycle. |
PWX-29017 | Storage stats are periodically collected and stored to improve Portworx cluster debugging. |
PWX-29976 | Portworx now supports vSphere version 8.0 |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23651 | Certain workloads involving file truncates or deletes from large to very small sizes caused the volume to enter an internal error state. The issue is specific to the Ext4 filesystem because of the way it handles file truncates/deletes. As a result, PVC resize/expand operations failed.Resolution: Portworx now recognizes these specific categories of errors as fixable and automatically fixes them during the next mount. |
PWX-29353 | If multiple NFS credentials were created with the same NFS server and export paths, cloudsnaps did not work correctly. Resolution: If the export paths are different with the same NFS server, they now get mounted at different mount points, avoiding this issue. |
PWX-28898 | Heavy snapshot loads caused delays in snapshot completion. This caused replicas to lag and the backend storage pool to keep consuming space. Resolution: You can increase the time Portworx waits for storage to complete the snapshot. This will cause the replicas to remain in the pool until the next Portworx service restart, which performs garbage collection of such replicas. |
PWX-28882 | Upgrades or installations of Portworx on Nomad with cloud drives failed at bootup. Impacted versions: 2.10.0 and later. Resolution: Portworx version 2.13.2 can successfully boot up on Nomad with cloud drives. |
PWX-29600 | The VPS Exists operator did not work when the value of key parameter was empty.Resolution: The VPS Exists operator now allows empty values for the key parameter without failing. |
PWX-29719 | On FlashArray cloud drive setup, if some iSCSI interfaces could log in successfully while others failed, the FlashArray connection sometimes failed with the failed to log in to all paths error. This prevented Portworx from restarting successfully in clusters with known network issues. |
PWX-29756 | If FlashArray iSCSI attempted to log in several times, it timed out, creating extra orphaned volumes on the FlashArray. Resolution: The number of retries has been limited to 3. |
PWX-28713 | Kubernetes nodes with Fully Qualified Domain Names (FQDNs) detected FlashArray cloud drives as partially attached. This prevented Portworx from restarting successfully if the FlashArray host name did not match the name of the node, such as with FQDNs. |
PWX-30003 | A race condition when updating volume usage in auto fstrim resulted in Portworx restart. |
2.13.1
April 4, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Portworx can now be deployed from the GCP Marketplace with the following new offerings. You can also change between these offerings after deploying Portworx by changing the value of the PRODUCT_PLAN_ID environment variable within your StorageCluster spec (see the sketch after this list):
  - PX-ENTERPRISE
  - PX-ENTERPRISE-DR
  - PX-ENTERPRISE-BAREMETAL
  - PX-ENTERPRISE-DR-BAREMETAL
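A minimal StorageCluster excerpt for switching plans; the plan shown is one of the four offerings listed above:

```yaml
spec:
  env:
  - name: PRODUCT_PLAN_ID
    value: "PX-ENTERPRISE-DR"   # any of the four offerings listed above
```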
Fixes
Issue Number | Issue Description |
---|---|
PWX-29572 | In Portworx 2.13.0, the PSO2PX migration tool would fail with the error pre-create filter failed: CSI PVC Name/Namespace not provided to this request due to a change made in the Portworx CSI Driver.Resolution: For migrating from PSO to Portworx, you should use Portworx 2.13.1. The migration tool will fail with Portworx 2.13.0. |
2.13.0
February 23, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
A known issue with multipath tool versions 0.7.x to 0.9.3 causes high CPU usage and/or multipath crashes that disrupt IO operations. To prevent this, Portworx now performs a prerequisite check to detect these faulty multipath versions, starting with version 2.13.0. If this check fails, it will not be possible to install or upgrade Portworx. Portworx by Pure Storage recommends upgrading the multipath tool to version 0.9.4 before upgrading to any Portworx 2.13 version.
New features
Portworx by Pure Storage is proud to introduce the following new features:
- You can now install Portworx on Oracle Container Engine for Kubernetes.
- You can now use Portworx on FlashArray NVMe/RoCE.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27200 | Added new pxctl commands for managing auto fstrim. |
PWX-28351 | You can now enable pay-as-you-go billing for Docker Swarm. |
PWX-27523 | CSI sidecar images are updated to the latest open source versions. |
PWX-27920 | Batching is now enabled in the metrics collector to reduce memory usage on large scale clusters. |
PWX-28137 | The Portworx maintained fork for CSI external-provisioner has been removed in favor of the open source version. |
PWX-28149 | The Portworx CSI Driver now distributes volume deletion across the entire cluster for faster burst-deleting of many volumes. |
PWX-28131 | Pool expansion for repl1 volumes is now supported on all cloud environments, except in certain scenarios. |
PWX-28277 | Updated stork-scheduler deployment and stork-config map in the spec generator to use Kube Scheduler Configuration for Kubernetes version 1.23 or newer. |
PWX-28363 | Reduced the number of vSphere API calls made during Portworx bootup on vSphere. This significantly improves Portworx upgrade times in environments where vSphere servers are overloaded. |
PWX-10054 | Portworx can now monitor the health of an internal KVDB, and when it is detected as unhealthy, Portworx can initiate KVDB failover. |
PWX-27521 | The Portworx CSI driver now supports version 1.7 of the CSI spec. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23203 | In some cases, migration or Asynchronous DR failed when the source volume was being resized. Resolution: On the destination cluster, Portworx now resizes the volume before migration operations. |
PWX-26061 | Deleting cloudsnaps failed with the curl command on a gRPC port. Resolution: Added a separate field for providing the bucket ID. |
PWX-26928 | Portworx installation would fail when unattended-upgr was running on the system or when Portworx was unable to lock the necessary packages for installation. Resolution: Re-attempt the installation after waiting for the lock to be released. |
PWX-27506 | When a node was down for a long time, cloudsnap restores took longer to start. Resolution: Portworx now allows other nodes in the cluster to process such restore requests. |
PWX-28305 | Portworx hit a lock hold timeout assertion while detaching sharedv4 service volumes if the Kubernetes API calls were being rate limited. Resolution: To avoid this assertion, the Kubernetes API calls are now made outside the context of a lock. |
PWX-28422 | Snapshot and cloudsnapshot requests failed if a volume was in the detached state and one of its coordinators had changed IP address. Resolution: Portworx now reattaches the volume with the correct IP address on snapshot and cloudsnapshot requests for detached volumes. |
PWX-28224 | The pxctl cd list command failed to fetch the cloud drives when run from hot nodes (nodes with local storage). Resolution: This issue has been fixed. |
PWX-28225 | The summary of the pxctl cd list command showed all nodes as cloud drive nodes. Resolution: The output of the command has been revised. |
PWX-28321 | The output of pxctl cd list was showing storageless nodes even though there were no storageless nodes present in the cluster.Resolution: Wait for the Portworx cleanup job to be completed, which runs every 30 minutes. |
PWX-28341 | In the NodeStart phase, if a gRPC request for getting node stats was invoked before completion of the pxdriver bootstrap, Portworx would abruptly stop. Resolution: Now Portworx returns an error instead of stopping abruptly. |
PWX-28285 | The high frequency of sharedv4 volume operations (such as create, attach, mount, unmount, detach, or delete) requires frequent changes to NFS exports. This was causing the NFS server to stop responding and a potential node restart. Resolution: When applying changes to NFS exports, Portworx now combines multiple changes together and sends a single batch update to the NFS server. Portworx also limits the frequency of NFS server restarts to prevent such issues. |
PWX-28529 | Fixed an issue where volumes with replicas on a node in pool maintenance were temporarily marked as out of quorum when the replica node exited pool maintenance. |
PWX-28551 | In Portworx version 2.12.1, one of the sanitizing operations changed upper case letters to lower case letters. This caused CSI pod registration issues during the upgrade. Resolution: This issue is fixed as Portworx now adheres to the regular expression for topology label values. |
PWX-28539 | During the attachment of FlashArray (FA) NVMe volumes, Portworx performs stale device cleanup. However, this cleanup process sometimes failed when the device was busy, causing the volume attachment to fail. Resolution: The FA NVMe volumes can now be attached, even if the stale cleanup fails. |
PWX-28614 | Fixed a bug where pool expansion of pools with repl1 volumes did not abort. |
PWX-28910 | In a Synchronous DR deployment, if the domains were imbalanced and one domain had over-provisioned a new volume, all the replicas of the volume would land in the same domain. Resolution: Now the replicas are forced to spread across the failure domains during the volume creation operation in the Synchronous DR deployment. If provisioning is not possible, then the volume creation operation will fail. You can use the pxctl cluster options update -metro-dr-domain-protection off command to disable this protection. |
PWX-28909 | When an error occurred during CSI snapshots, the Portworx CSI driver was incorrectly marking the snapshot ready for consumption. This resulted in a failure to restore PVCs from a snapshot in this case. Resolution: Create a snapshot and immediately hydrate a new PVC with the snapshot contents. |
PWX-29186 | Fields required for Volume Placement Strategy were missing from the CSI volume spec.VolumeLabels . As a result, Volume Placement Strategies that relied on these fields failed to place volumes correctly. Resolution: While some simple volume placement strategies may work without this fix, users of CSI should upgrade to Portworx version 2.13.0 if they use Volume Placement Strategies. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1859 | When storage is full, a repl 1 volume will be in the NOT IN QUORUM state and a deadlock occurs, so you cannot expand the pool. Workaround: To expand the pool, pass the --dont-wait-for-clean-volumes option as part of the expand command (see the sketch after this table). |
PD-1866 | When using FlashArray Cloud Drives and FlashArray Direct Access volumes, Portworx version 2.13.0 does not support Ubuntu versions 20.04 and 22.x with the default multipath package (version 0.8x). Workaround: Portworx requires version 0.9.4 of the multipath-tools package. Reach out to the support team if you need help building the package. |
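A sketch of the PD-1859 workaround above, reusing the pool expand syntax shown earlier in these notes; the pool ID and size are placeholders, and combining the flags this way is an assumption:

```shell
# Expand a full pool that holds a NOT IN QUORUM repl 1 volume
pxctl service pool expand -operation add-disk -uid <pool-ID> -size <new-size-in-GiB> --dont-wait-for-clean-volumes
```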
2.12.6
September 3, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-33389 | The Portworx CSI license validation for FA/FB failed when Purity was upgraded to version 6.4.2 or newer. This caused the Portworx license status to appear expired, and users were not able to create new volumes. Resolution: This issue has been fixed in Portworx version 3.0.1 and is now backported to 2.12.6. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-2349 | When you upgrade Portworx to a higher version, the upgrade is successful, but the Portworx CSI license renewal could take a long time. Workaround: Run the pxctl license reset command to reflect the correct license status. |
PD-2350 | Upgrades on some nodes may become stuck with the following message: This node is already initialized but could not be found in the cluster map. This issue can be caused by an orphaned storageless node. Workaround: Verify whether the node that reports this error is a storageless node. If it is, delete the orphaned storageless node using the pxctl clouddrive delete --node <> command so that the upgrade can proceed. |
2.12.5
May 09, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-30003 | Portworx restarted due to an internal race condition caused by high-frequency metadata updates overloading Portworx nodes. Resolution: This issue has been fixed in Portworx version 2.13.2 and now backported to 2.12.5. |
2.12.4
April 26, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx by Pure Storage recommends upgrading to version 2.12.4 as it fixes a regression introduced in 2.12.0, which is explained in the PWX-28551 issue below.
Fixes
Issue Number | Issue Description |
---|---|
PWX-28551 | In Portworx version 2.12.1, one of the sanitizing operations changed upper case letters to lower case letters. This caused CSI pod registration issues during the upgrade. Resolution: This issue has been fixed. |
2.12.3
April 17, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-28285 | The high frequency of sharedv4 volume operations (such as create, attach, mount, unmount, detach, or delete) requires frequent changes to NFS exports. This caused the NFS server to stop responding and a potential node restart. Resolution: This issue has been fixed in Portworx version 2.13.0 and now backported to 2.12.3. |
PWX-29074 | Portworx incorrectly pinged the customer DNS server. At regular intervals, when the /etc/hosts file from the node periodically rsynced with the Portworx runc container, it temporarily removed the mappings for KVDB domain names. As a result, internal KVDB name resolution queries were incorrectly forwarded to the customer's DNS servers. Resolution: This issue has been fixed. Portworx by Pure Storage recommends upgrading to version 2.12.3, if running Portworx 2.12.0, 2.12.1, or 2.12.2. |
2.12.2
January 28, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
Fixes
Issue Number | Issue Description |
---|---|
PWX-28339 | CSI Volumes restored from snapshots were missing PVC name and namespace metadata. This caused failures when using sharedv4 service volumes. Resolution: Portworx now adds the PVC name and namespace to the volume during restore. |
PWX-22828 | If automatic filesystem trim was disabled and then enabled within one minute, then the pxctl volume autofstrim status command incorrectly reported the status: Filesystem Trim Initializing. Please wait .Resolution: This issue has been fixed. |
PWX-28406 | Automatic filesystem trim would skip a volume if any of the replicas for the volume were hosted on a node where there was no pool with ID 0. Resolution: This issue has been fixed. |
2.12.1.4
September 22, 2023
Visit these pages to see if you're ready to upgrade to the latest version:
This is a hotfix release intended for select customers. Please contact the Portworx support team for more information.
Fixes
Issue Number | Issue Description |
---|---|
PWX-33451 | Ext4 filesystem complained about a mismatch and delayed allocation failures when a user application tried to write data to an aggregated Portworx volume. This occurred because, in certain cases, increasing the replication level of an aggregated volume failed to zero out specific blocks associated with stripes belonging to replication set 1 or higher, where zero data is expected. Resolution: Users can now run the fsck operation to rectify the failures or remove the added replicas from the aggregated volume. |
2.12.1
December 14, 2022
Visit these pages to see if you're ready to upgrade to the latest version:
New features
Portworx by Pure Storage is proud to introduce the following new features:
-
Google Cloud users can now encrypt GCP cloud drives using customer managed encryption keys.
-
You can now use Vault Transit to manage key generation for encrypting in-transit data.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26232 | The Portworx node's IP addresses are now included in the license server's "long client usage" output (lsctl client ls -l ). |
PWX-26304 | Storageless nodes will become storage nodes when max_storage_nodes_per_zone is increased. |
PWX-27769 | vSphere and IBM Cloud platforms can now recognize the zone label topology.portworx.io/zone . This helps Portworx honor zone-related settings such as maxStorageNodesPerZone . |
PWX-27174 | pxctl cluster provision-status will now show IP addresses for nodes. The IP addresses help you to correlate that command's output with the node list provided by pxctl status . |
Fixes
Issue Number | Issue Description |
---|---|
PWX-27748 | Some incremental fixes in version 2.12.0 introduced issues with DaemonSet YAML generation for K3s and RKE2 Kubernetes platforms. Resolution: These issues have been fixed. |
PWX-27849 | Kubernetes versions 1.25 and later do not support [PodSecurityPolicy](https://kubernetes.io/docs/concepts/security/pod-security-policy/). Resolution: PX-Central does not include PodSecurityPolicy in the YAML install specs when the Kubernetes version is 1.25 or later. |
PWX-27267 | Cloudsnaps for aggregated volumes were failing certain checks when part of the aggregated volume did not have any differential data to upload. Resolution: This version fixes those checks and prevents failure due to empty differential data. |
PWX-27246 | During a new installation of Portworx on a vSphere environment (installation in local mode), several VMs appeared as storage nodes on a single ESXi host. This was because of a race condition in the vSphere environment, which increased the number of nodes forming the cluster and affected quorum decisions. Resolution: You can choose not to upgrade if you take care of the race condition during installation. You can overcome the race condition by allowing only 1 VM to come up on an ESXi host during installation. Once the installation is complete, you can bring up as many VMs as you want simultaneously. The problem exists only until the first storage VM comes up. |
PWX-26021 | For sharedv4 apps, multiple mount/unmount requests on the same path could become stuck in Uninterruptible Sleep (D) State .Resolution: In the case that the client tries to mount but the previous request still exists, there will be a Kubernetes event stating that the previous request is still in progress. |
PWX-27732 | vSphere cloud drive labels previously contained a space, which is not compatible with Kubernetes standards and caused an error in CSI. Resolution: Portworx now replaces the space character with a dash ( - ). All other special characters will be replaced by a period (. ). |
PWX-27227 | If a pool rebalancing was issued with the --dry-run option, then Portworx created unnecessary rebalance audit keys in the KVDB. As it was not possible to delete these keys, the disk size of the KVDB increased. Resolution : Portworx no longer creates audit keys when a pool rebalancing is issued with the --dry-run option, and Portworx deletes orphaned keys that have already been created. |
PWX-27407 | The KVDB contained inconsistent node entries because of a race condition in the auto decommission of storageless nodes. This was causing Portworx to restart. Resolution: The race condition is now handled, and Portworx ensures that no inconsistent entries are left behind in the KVDB during the decommission process. |
PWX-27917 | Portworx ignored the value of MaxStorageNodesPerZone if an uneven number of nodes were labeled as portworx.io/node-type=storage . Resolution: This issue has been fixed. |
PWX-24088 | For cloud drives provisioned on FlashArray (FA), the default mount was nodiscard . When Portworx deleted contents of a volume or the volume itself, the space was not reclaimed by FA. This caused a discrepancy in available space displayed on a Pure1 dashboard vs available space displayed through the pxctl status usage command.Resolution: Storage pools for FA are now mounted with discard . This allows space to be reclaimed whenever volumes are deleted or files are removed. |
Deprecations
The following feature has been deprecated:
- Internal objectstore support
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1684 | If the sharedv4_svc_type parameter is not specified during ReadWriteMany volume creation, Portworx defaults to a sharedv4 service volume unless Metro DR is enabled, in which case Portworx defaults to sharedv4 (non-service) volumes. You can explicitly set the sharedv4_svc_type parameter in the StorageClass. If it is set to an empty string, a sharedv4 (non-service) volume is created (see the sketch after this table). |
PD-1729 | On some recent Linux kernels, back to back online resize operations of Ext4 volumes can fail. This is because of a bug in the kernel which has been fixed in the latest kernel release. Workaround: Upgrade to a more recent kernel version, or restart the application pod that is using the volume. This remounts the volumes and completes the resize operation. |
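A sketch of the PD-1684 behavior above: setting sharedv4_svc_type to an empty string in the StorageClass yields a sharedv4 (non-service) volume. The class name is a placeholder and the repl value is illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-nonservice   # placeholder name
provisioner: pxd.portworx.com
parameters:
  repl: "2"                      # illustrative replication level
  sharedv4: "true"
  sharedv4_svc_type: ""          # empty string -> sharedv4 (non-service) volume
```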
2.12.0
October 24, 2022
Visit these pages to see if you're ready to upgrade to the latest version:
Notes
Portworx 2.12.0 requires Operator 1.10.0 and Stork 2.12.0.
New features
Portworx by Pure Storage is proud to introduce the following new features:
-
On-prem users can now enable PX-Fast functionality that utilizes the new PX-StoreV2 datastore. PX-Fast enables a new accelerated IO path for volumes and is optimized for workloads requiring consistent low latencies. PX-StoreV2 is the new Portworx datastore optimized for supporting IO intensive workloads for configurations utilizing high performance NVMe class devices.
-
Early access support for Portworx Object Service. This feature allows storage admins to provision object storage buckets with various backing providers using custom Kubernetes objects.
-
You can now use Vault AppRole's Role ID and Secret ID to authenticate with Vault. Portworx will auto-generate Vault tokens to store encryption secrets and cloud credentials in Vault.
-
Metro and asynchronous disaster recovery (DR) involves migrating Kubernetes resources from a source cluster to a destination cluster. To ensure that the applications can come up correctly on the destination clusters, you may need to modify resources to work as intended on your destination cluster. The ResourceTransformation feature allows you to define a set of rules that modify the Kubernetes resources before they are migrated to the destination cluster.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26700 | A pod that was using a sharedv4 volume might have taken a few minutes to terminate when Portworx running on one node was waiting for a response from Portworx running on another node which was down. This has been fixed by deferring the remote call when the remote node is down. |
PWX-26631 | In one of the error paths, Portworx was listing pods in all the namespaces in the Kubernetes cluster, which caused the Portworx process to consume a large amount of memory temporarily. The pod listing is now limited to a single namespace. |
PWX-24862 | At times, users create sharedv4 volumes unintentionally by using a sharedv4 storageClass with a ReadWriteOnce PVC. This is because previously Portworx created a sharedv4 volume when either the storageClass had sharedv4: true or the PVC access mode was ReadWriteMany/ReadOnlyMany . To avoid these unintentional sharedv4 volumes, a sharedv4 volume is now created only if the PVC access mode is ReadWriteMany/ReadOnlyMany , and the sharedv4 setting in the storageClass does not matter. This may require modification to the specs of some existing apps.If an app expects a sharedv4 volume while using a ReadWriteOnce PVC, some of the pods may fail to start. The PVC will have to be modified to use ReadWriteMany or ReadOnlyMany access mode. |
PWX-23285 | Adds uniform support for PX_HTTP_PROXY , PX_HTTPS_PROXY , and NO_PROXY environment variables (which are equivalent to commonly used Linux HTTP_PROXY , HTTPS_PROXY and NO_PROXY environment vars.)Specifying the HTTP proxy via the cluster options is now deprecated. Also adds support for authenticated HTTP proxy, where you specify the username and password to authenticate with an HTTP proxy. For example: PX_HTTP_PROXY=http://user:password@myproxy.acme.org . |
PWX-22292 | Previously, when you updated an AKS cluster principal's password and refreshed your Kubernetes secret, you would sometimes have to restart Portworx pods to propagate the changes to Portworx. Now, Portworx automatically refreshes the AKS secrets. |
PWX-22927 | You can now use Vault AppRole authentication for Vault integration. You need to provide VAULT_ADDR, VAULT_APPROLE_ROLE_ID, VAULT_APPROLE_SECRET_ID, and VAULT_AUTH_METHOD (approle) via Kubernetes Secrets or as environment variables. A sample Secret follows this table. |
PWX-26421 | Portworx logs now display Vault's authentication method if login is successful. |
PWX-24437 | Added discard stats to Grafana performance dashboard. |
PWX-18687 | Portworx now instantly detects that a replication node is down as a result of socket events, preventing high IO latencies. |
PD-1628 | Portworx now supports the FlashBlade and FlashArray SafeMode feature. |
PD-1634 | Added support for live migration of virtual machines between nodes in the OpenShift environment. This feature works only with Stork version 2.12. |
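A minimal sketch of supplying the Vault AppRole settings from PWX-22927 as a Kubernetes Secret. The secret name, namespace, and address are illustrative assumptions; only the key names come from the entry above.

```yaml
# Hypothetical Secret carrying the Vault AppRole settings named in PWX-22927.
# The name and namespace are examples; adjust them to your deployment.
apiVersion: v1
kind: Secret
metadata:
  name: px-vault          # example name
  namespace: portworx     # namespace where Portworx runs (assumption)
type: Opaque
stringData:
  VAULT_ADDR: "https://vault.example.com:8200"
  VAULT_AUTH_METHOD: "approle"
  VAULT_APPROLE_ROLE_ID: "<role-id>"
  VAULT_APPROLE_SECRET_ID: "<secret-id>"
```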
Fixes
Issue Number | Issue Description |
---|---|
PWX-20808 | When Portworx was configured with external etcd v3 as its key-value database, the pxctl service kvdb members command experienced delays and timeouts because the etcd endpoints provided to Portworx did not match those used internally. Resolution: Portworx now consults both the configured KVDB endpoints and its configuration before displaying the KVDB information. |
PWX-24649 | Occasionally, a race condition in the initial setup of Portworx was leading to an invalid topology-zone set on cloud-drives, resulting in Portworx allocating and using more cloud-drives than configured. Resolution: The startup issue has been fixed. |
PWX-26406 | Repeated CSI NodePublish/NodeUnpublish API calls were resulting in Portworx using more resources because these APIs did a deep Inspect on the volume.Resolution: CSI NodePublish/NodeUnpublish now uses fewer resources because it avoids a deep Inspect and any extra API calls to the Kubernetes API server. |
PWX-24872 | The pxctl cloudsnap list -x 5 command was reporting an error.Resolution: The issue has been fixed. |
PWX-26935 | When a pod with sharedv4 volume is terminated, Portworx unmounts the nfs-mounted path on the local node. When the remote NFS server node was powered off, unmount was delaying the pod termination. Resolution: This issue has been fixed. |
PWX-26578 | Portworx was taking a long time to run background tasks. On some clusters, this was causing long delays due to a large backlog. Resolution: This issue has been fixed. |
PWX-23454 | Pool and node labels that are the same as volumes were ignored when applying VolumePlacementStrategies. Resolution: This issue has been fixed. |
PWX-26445 | PVC creation could fail with an unauthorized error message if the service account token used by Portworx had expired. Resolution: For Kubernetes version 1.21 and later, the ServiceAccountToken for the Portworx service is refreshed to prevent unauthorized errors (tokens expire after 1 year by default, or 3 months for an EKS cluster). |
PWX-24785 | When installing Portworx on Rancher via helm charts, some permissions on Secret objects were missing in the Portworx spec.Resolution: px-role is added to both Portworx Enterprise and Portworx Essentials with the Kubernetes secrets permissions. |
PWX-19220 | The output of pxctl clouddrive commands did not provide cloud-storage details making it difficult to troubleshoot issues using vSphere GUI.Resolution: The outputs of the commands pxctl clouddrive list and pxctl clouddrive inspect now include vSphere datastore information and drive-labels, respectively. |
PWX-27170 | Cloudsnap restores were not forward compatible, meaning an older version of Portworx could not restore a cloudsnap created by a newer version. In such cases, the cloudsnap restore completed without an error, but without data. Resolution: Starting with version 2.12.0, Portworx has a check that fails such restore operations. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1619 | As a result of deleting application pods that are using sharedv4 or sharedv4 service volumes in Kubernetes, part of the pod's state may not be properly cleaned up. Later, if the pod's namespace is deleted, the namespace may be stuck in the Terminating state.Workaround: Contact Portworx by Pure Storage support team to clean up such namespaces. |
PD-1611 | When using the PX-StoreV2 datastore, running multiple concurrent resize and clone operations on the same fastpath volume may cause either resize or clone operation to fail. Workaround: Retry the failed operation. |
PD-1595 | When using the PX-StoreV2 datastore, a pool may not automatically transition into Online state after completing a drive add operation. Workaround: Perform a maintenance cycle on the node. |
PD-1592 | When using the PX-StoreV2 datastore, pool maintenance enter or exit operation may get stuck if there are encrypted PVCs attached on the node with an outstanding IOs. Workaround: Reboot the node where encrypted PVC is attached. |
PD-1650 | When using the PX-StoreV2 datastore, a PX-Fast volume may get attached in an inactive fastpath state because of internal sanity check failure. Workaround: Restart the application pod consuming the volume so that it goes through a detach and attach cycle which will reattempt fastpath activation. |
PD-1651 | When using the PX-StoreV2 datastore, if installation fails, Portworx may get into a restart loop with an error message: PX deployment failed with an error "failed to create MD-array:" . Workaround: Clean up the failed install using node-wiper and retry the installation. |
PD-1655 | Telemetry pods may crash due to port conflicts. Workaround: Adjust the Portworx start port by adding the following to your StorageCluster spec: startPort: <starting-port> . See the example fragment after this table. |
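A sketch of the PD-1655 workaround as a StorageCluster fragment. The startPort field is taken from the workaround above; the cluster name, namespace, and port value are placeholders.

```yaml
# Hypothetical StorageCluster fragment: moves the Portworx starting port
# to avoid the telemetry port conflict described in PD-1655.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster        # example name
  namespace: portworx     # assumption
spec:
  startPort: 17001        # any free starting port
```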
2.11.5
January 12, 2023
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PD-1769 | It takes less time to upgrade Anthos because Portworx now makes fewer vSphere API calls. |
2.11.4
October 4, 2022
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-27053 | Added runtime knobs to modify the amount of work that a resync workflow does at any instant. These knobs facilitate throttling of the resync operation. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-27028 | While performing writes on the target, the resync operation pinned down resources even when the write operation had no priority to execute. Resolution: In version 2.11.4, this issue is fixed. |
2.11.3
September 13, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-26745 | In Portworx versions 2.11.0 through 2.11.2, Async DR restore takes longer than in previous versions. Resolution: In version 2.11.3, this issue has been resolved. |
2.11.2
August 11, 2022
New features
Portworx by Pure Storage is proud to introduce the following new feature:
- You can now enable encryption on the Azure cloud drives using your own key stored in Azure Key Vault.
Notes
Starting with Portworx version 2.12.0, internal objectstore will be deprecated.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-26047 | pxctl status now shows a deprecation warning when internal objectstore is running on a cluster. |
Fixes
Issue Number | Issue Description |
---|---|
PWX-23465 | Backups were not encrypted if BackupLocation in Kubernetes had an encryption key set for cloudsnaps. (Note that this should not be confused with encrypted volumes. This encryption key, if set, is applied only to cloudsnaps irrespective of encrypted volumes.)Resolution: Backups are now encrypted in this case. |
PWX-24731 | The Grafana image was not included in the list of images for the air-gapped bootstrap script. Customers using Prometheus monitoring needed to manually copy the Grafana container image into their environments. Resolution: The air-gapped bootstrap script has been updated and now includes the Grafana image. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1390 | The billing agent might try to reach outside the network portal in air-gapped environments. Workaround: Disable the call home service on Portworx nodes by running pxctl sv call-home disable . |
2.11.1
July 19, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-24519 | The mount path was not erased if you restarted Portworx at the wrong time during an unmount operation when using CSI. This caused pods to be stuck in the terminating state. Resolution: When you restart Portworx now, it ensures that the mount path is deleted. |
PWX-24514 | When a cluster was configured with PX-Security and used a Floating license, it was not possible to add new nodes to the Portworx cluster. Resolution: You can now add new nodes to the cluster. |
PWX-23487 | On certain kernel versions (5.4.x and later) during startup, volume attach sometimes got stuck, preventing Portworx from starting. This is because a system-generated IO can occur on the volume while the volume attach is in progress, causing the volume attach to wait for IO completion, which in turn waits for startup to complete, resulting in a deadlock. Resolution: Portworx now avoids the deadlock by preventing access to the volume until attach is complete. This functionality is only enabled after a system reboot. |
2.11.0
July 11, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- On-premises users who want to use Pure Storage FlashArray with Portworx on Kubernetes can provision and attach FlashArray LUNs as Direct Access volumes. See the sample StorageClass after this list.
- The CSI topology feature allows users of FlashArray Direct Access volumes and FlashBlade Direct Access filesystems to direct their applications to provision storage on a FlashArray Direct Access volume or FlashBlade Direct Access filesystem that is in the same set of Kubernetes nodes where the application pod is located.
- You can now use Portworx with IBM cloud drives on VPC Gen2 infrastructure. Portworx will use the IBM CSI provider to automatically provision and manage its own storage disks.
- You can enable pay-as-you-go billing for an air-gapped cluster with no outbound connectivity by acquiring a pay-as-you-go account key from Portworx. This key can be used on any cluster to activate the license, provided you can report usage collected by the metering module.
- You can now deploy Portworx in IPv6 networking enabled environments.
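A minimal StorageClass sketch for the FlashArray Direct Access feature above, assuming the commonly used pure_block backend parameter and the Portworx CSI provisioner; the class name is hypothetical, and values should be checked against the documentation for your version.

```yaml
# Hypothetical StorageClass for a FlashArray Direct Access volume.
# Using backend: "pure_file" instead would target a FlashBlade Direct Access filesystem.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-fa-direct-access   # example name
provisioner: pxd.portworx.com  # Portworx CSI provisioner
parameters:
  backend: "pure_block"
allowVolumeExpansion: true
```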
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-24195 | Portworx supports using BackupLocation CR with IAM policy as an AWS s3 target for cloudsnaps triggered through Stork using ApplicationBackup CR. When Portworx detects that a BackupLocation is provided as a target, it uses the IAM role of the instance, where it is running, for authentication with s3. |
PWX-23392 | Updated the Portworx CSI driver to CSI 1.6 spec. |
PWX-23326 | Updated CSI Provisioner, Snapshotter, Snapshot Controller, Node Driver Registrar, and Volume Health controller to the latest releases. |
PWX-24188 | A warning log is removed, which was printed when Docker inquired about a volume name that Portworx could not find. |
PWX-23103 | Added a detailed warning message for volume provision issues when there are conflicting volumes in the trashcan. Volume provisioning with volume placement rules can be blocked by matching volumes in the trashcan. The new warning message informs you which trashcan volumes are causing a conflict. |
PWX-22045 | Portworx now starts faster on high-scale clusters. During the Portworx start-up process, you will see a reduction in API calls to cloud providers. In particular, AWS API calls related to EBS volumes will be reduced. |
PWX-20012 | Enabled support for the pd-balanced disk type in Google Cloud environments. You can now specify pd-balanced as one of the disk types (such as type=pd-balanced,size=700 ) in the device spec file. See the sample cloud drive specification after this table. |
PWX-24054 | pxctl service pool delete --help command is enhanced to show Note and Example in addition to usage information. For example,Note: This operation is supported only on on-prem local disks and AWS cloud-drive Examples: pxctl service pool delete [flags] poolID |
PWX-23196 | You can now configure the number of retries for an error from the object store. Each of these retries involves a 10-second backoff delay, followed by progressively longer delays (incrementing by 10-second intervals) between each attempt. If the object store has multiple IP addresses as the endpoints, then for a given request, the retries are done on each of these endpoints. For more details, refer to Configure retry limit on an error |
PWX-23408 | Added support to migrate PSO volumes into Portworx through the PSO2PX migration tool. |
PWX-24332 | The Portworx diags bundle now includes the output of the pxctl clouddrive list command when available. |
PWX-23523 | CSI volume provisioning is now distributed across all Portworx nodes in a cluster providing higher performance for burst CSI volume provisioning. |
PWX-22993 | Portworx can now be activated in PAYG (pay-as-you-go)/SAAS mode using the command pxctl license activate saas --key <pay_key> . |
PWX-23172 | Added support for cgroups-v2 host configurations, running with docker and cri-o container runtimes. |
PWX-23576 | Renamed the PX-Essentials FA/FB license SKU to Portworx CSI for FA/FB SKU. |
PWX-23678 | Added support for px-els (Portworx Embedded License Server) to install and operate in IPv6 network configurations. |
PWX-23179 | Provides a way to do an in-place restore from a SkinnySnap. Now you can create a clone from a SkinnySnap and use the clone to restore the parent volume. |
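A sketch of how the pd-balanced device spec from PWX-20012 might appear in an operator-based install. Only the type=pd-balanced,size=700 string comes from the entry above; the surrounding StorageCluster fields are assumptions.

```yaml
# Hypothetical StorageCluster fragment: Google Cloud drives using the
# pd-balanced disk type mentioned in PWX-20012.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster        # example name
  namespace: portworx     # assumption
spec:
  cloudStorage:
    deviceSpecs:
      - type=pd-balanced,size=700
```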
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14944 | Removed invalid tokens from PX-Security audit logs. When a non-JWT or invalid token was passed to the PX Security authentication layer, it was being logged. There is no impact to the user with this change. |
PWX-22891 | For on-premises Portworx installations not using cloud drives, a KVDB failover could fail because Portworx on the node could not find the configured KVDB device if its name had changed after the initial install. Resolution: Portworx now fingerprints all the KVDB drives provided on all nodes at install time, regardless of whether they will run KVDB. This ensures a KVDB failover happens even if the device name changes on certain nodes. |
PWX-22914 | When undergoing an Anthos/EKS/GKE upgrade, Portworx could experience excessive delays due to internal KVDB failovers. Resolution: The KVDB failover rules now no longer consider storageless nodes as a KVDB candidate unless they have a dedicated kvdb-storage or are labeled with px/metadata-node=true . |
PWX-23018 | In a DR setup, destination clusters sometimes ended up with volumes in the trashcan if the trashcan feature was enabled. Users had to either delete the volumes from the trashcan in the destination cluster or disable the trashcan feature in the destination cluster. |
PWX-23185 | Cloudsnap restore operations sometimes slowed if a rebalance started on the same pool. Resolution: Portworx now avoids rebalancing volumes being restored from cloudsnaps. |
PWX-23229 | If the IP address of a node changed after reboot with an active cloudsnap, the cloudsnap operation failed with "Failed to read/write extents" error. This resulted in a failed backup, and users needed to reissue the cloudsnap. Resolution: Reattach the snapshot after node restart to avoid a snapshot being in an incorrect attachment state. |
PWX-23384 | When using a sharedv4 service volume with NFSv4 (default), users had to configure the mountd service to run on a single port even though NFSv4 does not use mountd . Resolution: Portworx now skips the check for the mountd port when using NFSv4 and omits the mountd port from the Kubernetes service and endpoints objects. |
PWX-23457 | Portworx showed Pure volumes separately in the pxctl license list output even when there were no explicit limits for Pure volumes, which was confusing. Resolution: This output has been improved. |
PWX-23490 | Previously, upgrading from Portworx versions 2.9.0 up to anything before 2.11.0 did not update the decision matrix. Resolution: Starting with 2.11.0, Portworx will now update the decision matrix at boot-time. |
PWX-23546 | The px-storage process restarted if the write complete message was received after a node restarted. |
PWX-23623 | Portworx logged benign warnings that the container runtime was not initialized. Resolution: Portworx no longer logs these warnings. |
PWX-23710 | When Kubernetes was installed on top of the containerd container runtime, the Portworx installation may not have properly cleaned up the containerd-shim process and container directories. As a consequence, nodes may have needed to be rebooted for the Portworx upgrade process to complete. |
PWX-23979 | KVDB entries for deleted volumes were not removed from KVDB. As a result, KVDB sizes might have increased in cases where volumes were constantly being deleted and scheduled snapshots and cloudsnaps were configured. |
PWX-24047 | Portworx will no longer use Kubernetes DNS (originally introduced with PWX-22491). In several configurations, Kubernetes DNS did not work properly. Resolution: Portworx now relies on a more stable host's DNS config instead. |
PWX-24105 | In rare cases, Portworx on a node may have repeatedly restarted because of a panic due to nil pointer dereference when deleting a pod for a sharedv4 volume. Resolution: Portworx will not come up until the relevant pod is deleted manually from Kubernetes by scaling down the application. |
PWX-24112 | IBM csi-resizer would sometimes crash when resizing volumes. Resolution: This issue is addressed in the IBM Block CSI Driver version 4.4. Check the version on your cluster using the command ibmcloud ks cluster addon ls --cluster <cluster-id> . |
PWX-24187 | With a PAYG (pay-as-you-go) license, the license is disabled when there are issues with reporting/billing Portworx usage, and it is automatically re-enabled after reporting/billing is reestablished. However, Portworx did not add back the default license features, requiring a restart of the portworx.service to properly re-establish the license. |
PWX-24297 | Portworx updated the decision matrix in the config map, causing nil pointer exceptions to appear in non-Kubernetes environments. Resolution: Portworx now checks that the config map exists before updating it. |
PWX-24433 | When Docker or CRI-O is not initialized on a cluster, Portworx would periodically print the following log line: Unable to list containers. err scheduler not initialized . Resolution: The log line is now suppressed when Portworx detects that a runtime like Docker or CRI-O is not initialized. |
PWX-24410 | DaemonSet YAML installs using private container registry server were using invalid image-paths (incompatible with air-gapped, or PX-Operator), thus resulting in a failure to load the required images. Resolution: Fixed regression introduced with Portworx version 2.10.0, when custom container registry server was in use. |
PWX-22481 | A pod could take upwards of 10 minutes to terminate if a sharedv4 service failover and a namespace deletion happened at the same time. Resolution: Scale down the pods to 0 before deleting the namespace. |
PWX-22128 | Async DR created a new volume (a clone from its previously downloaded snapshot) on the destination cluster every time a volume was migrated. If cloud backups were configured for volumes from the destination cluster, then every backup for a volume was a full backup, because a new volume was created on every migration. This change fixes the issue and allows backups from the destination cluster to be incremental. Resolution: When a volume is migrated, the volume is in-place restored to its previously downloaded snapshot and the incremental diff is downloaded to the volume without creating a new volume. Since no new volume is created, backups for this volume can now be incremental. |
Deprecations
The following features have been deprecated:
- Legacy shared volumes
- Volume groups
- Hashicorp Consul support
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1325 | On IBM cloud drive, if pxctl sv pool expand with a resize-disk operation fails due to an underlying IBM issue, you will see the following error signature: Error: timed out waiting for the condition . This indicates that the IBM provider failed and could not perform the operation within 3 minutes. Workaround: If the underlying disk on the host has expanded, then issue the command pxctl sv pool update --resize --uid <> to complete the pool expand operation. If the underlying disk on the host has not expanded, check the IBM csi-controller pods for any potential errors reported by IBM. |
PD-1327 | On IBM clouddrive, if a pxctl service pool expand -s <target-size> with resize-disk for a target size X fails, you cannot issue another pool expand operation with a target size lower than the value X . On IBM clouddrive, a resize of the underlying disk is issued by changing the size on the associated PVC object. If IBM csi-driver fails to act upon this PVC size change, the pool expand operation will fail, but the PVC size cannot be reduced back to older value. You will see the following error: spec.resources.requests.storage: Forbidden: field can not be less than previous value .Workaround: When a pool expand operation and a subsequent IBM PVC resize is triggered, it is expected by the IBM CSI resizer pod to eventually reconcile and complete the resize operation. Once the underlying disk on the host has expanded, then issue the command pxctl sv pool update --resize --uid <> to complete the pool expand operation. If the underlying disk on the host has not expanded, check IBM csi-resizer pods for any potential errors reported by IBM. |
PD-1339 | When a Portworx storage pool contains a repl 1 volume replica, pool expansion operations report the following error: service pool expand: resize for pool <pool-uuid> is already in progress. found attached volumes: [vol3] that have it's only replica on this pool.Will not proceed with pool expansion. Stop applications using these volumes or increase replicas to proceed. resizeType=RESIZE_TYPE_ADD_DISK,skipWaitForCleanVolumes=false,newSize=150 . The actual reason for the failure is not resize for pool <pool-uuid> is already in progress ; the correct reason is found attached volumes: [vol3] that have it's only replica on this pool.Will not proceed with pool expansion. Stop applications using these volumes or increase replicas to proceed. resizeType=RESIZE_TYPE_ADD_DISK,skipWaitForCleanVolumes=false,newSize=150 . Workaround: The command pxctl sv pool show displays the correct error message. |
PD-1354 | When a PVC for a FlashArray DirectAccess volume is being provisioned, Portworx makes a call to the backend FlashArray to provision the volume. If Portworx is killed or crashes while this call is in progress or just before this call is invoked, the PVC will stay in a Pending state forever.Workaround: For a PVC which is stuck in Pending state, check the events for an error signature indicating that calls to the Portworx service have timed out. If such a case arises, clean up the PVC and retry PVC creation. |
PD-1374 | For FlashArray volumes, resizing might hang when there is a management connection failure. Workaround: Manually bring out the volume from the maintenance mode. |
PD-1360 | When a snapshot volume is detached, you see the Error in stats : Volume does not have a coordinator error message. Workaround: This message appears because the volume is created, but not attached or formatted. A coordinator node is not created until a volume is attached. |
PD-1388 | The Prometheus Operator pulls the wrong Prometheus image. In air-gapped environments, Prometheus pod deployment will fail with an ImagePullBackOff error. Workaround: Before installing Portworx, upload a Prometheus image with the latest tag to your private registry. |
2.10.4
Nov 8, 2022
- This version addresses security vulnerabilities.
2.10.3
June 30, 2022
Improvements
Improvement Number | Improvement Description |
---|---|
PWX-23523 | CSI Volume Provisioning is now distributed across all Portworx nodes in a cluster. Large volume provisioning performance increases should be seen for large enough volumes. There is no user impact for customers other than higher performance for burst CSI volume provisioning. |
2.10.2
June 1, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-23364 | Fixed a CSI volume provisioning issue where orphaned Portworx volumes were left behind if a PVC deletion was issued before volume creation finished. Resolution: Users should upgrade their Portworx version if they are seeing orphaned Portworx CSI volumes with no associated PVC/PV. |
2.10.1
May 5, 2022
Fixes
Issue Number | Issue Description |
---|---|
PWX-19815 | pxctl credentials create commands were failing due to an RSA error when using Google Cloud KMS as the secret provider and trying to store credentials which were too long for the RSA key to handle.Resolution: This patch adds the fix for the issue without changing the existing operation to add credentials. |
2.10.0
April 7, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- The Portworx Application Control feature provides a method for controlling an individual Portworx volume’s IO or bandwidth usage of backend pool resources. Portworx volumes are created from a common backend pool and share their available IOPS and bandwidth amongst all other provisioned Portworx volumes.
- The volume trash can feature provides protection against accidental or inadvertent volume deletions which could result in loss of data. In a clustered environment such as Kubernetes, unintended deletion of a PV or a namespace will cause volumes to be lost. This feature is recommended in any environment which is prone to such inadvertent deletions, as it can help to prevent data loss.
- You can enable automatic filesystem trimming (auto fstrim) at the volume, node, or cluster level. When you enable auto fstrim at the cluster or node level and enable
nodiscard
on your volumes, auto fstrim monitors the unused space in all filesystems mounted on Portworx volumes and automatically triggers a trim job to return unused space back to the pool, so you do not have to manually issue trim jobs. See the sample StorageClass after this list.
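A minimal StorageClass sketch for the auto fstrim feature above, assuming the nodiscard StorageClass parameter; the class name is hypothetical, and auto fstrim itself must still be enabled at the cluster or node level.

```yaml
# Hypothetical StorageClass: volumes are mounted with nodiscard so that
# auto fstrim (enabled separately at the cluster or node level) reclaims unused space.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-autofstrim     # example name
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  nodiscard: "true"
```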
Notes
Portworx 2.10 is the last release where Ubuntu 16.04 will be supported.
Improvements
Improvement Number | Improvement Description |
---|---|
PWX-20674 | In an Async DR setup, before creating a cluster pair, the DR license is checked on both clusters. The cluster pair request will error out if only one of the clusters has a DR license set. |
PWX-21318 | Users can set the frequency of full backups using pxctl cluster options update -b <number> . The default value is 7, and you can check the value that is set with pxctl cluster options list . The number controls the number of incremental backups made before a full backup is done. |
PWX-21024 | This fix adds the --secret_key and --secret_options flags, which allow users to propagate Kubernetes secret config information to the CLI and the backend during volume import. |
PWX-19780 | Local volumes that are pending due to ha-increase will now appear when using pxctl volume list --node <node> . |
PWX-20210 | Added support for specifying throughput parameter for gp3 drive types in AWS. The throughput parameter can only be specified at install time. Portworx currently does not allow a way to change the throughput parameter once installed. One can still change the throughput of any drives directly from the AWS console. |
PWX-22977 | Sometimes cloudsnaps can fail with InternalServerError , resulting in cloudsnap backup failures and the need for user intervention to reissue the cloudsnap command for the same volume. This fix increased the aws-sdk retries and also added back-off retries. |
PWX-21122 | VPS users now have control at the pool level with the new built-in topologyKey portworx.io/pool to allow volume affinity and anti-affinity to work for individual pools. Users can now control volume placement topology at the narrower level of pool. This allows finer control than the default topology of nodes. |
PWX-20938 | Added a VolumePlacementStrategy template for use with StatefulSets that allows volume affinity and anti-affinity with volumes belonging to the same StatefulSet pod. Use the key px/statefulset-pod with the value ${pvc.statefulset-pod} in a matchExpression . With this, you can ensure volumes in the same StatefulSet pod do or do not land on the same node. See the sketch after this table. |
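A sketch of the PWX-20938 StatefulSet template in a VolumePlacementStrategy. The key and template value come from the entry above; the apiVersion, rule structure, and operator reflect common VolumePlacementStrategy usage and are assumptions.

```yaml
# Hypothetical VolumePlacementStrategy: co-locates volumes that belong to the
# same StatefulSet pod (use volumeAntiAffinity instead to keep them apart).
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: statefulset-pod-affinity   # example name
spec:
  volumeAffinity:
    - matchExpressions:
        - key: px/statefulset-pod
          operator: In
          values:
            - "${pvc.statefulset-pod}"
```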
Fixes
Issue Number | Issue Description |
---|---|
PWX-9712 | Some applications might starve Portworx of Async IO event reservations. This resulted in a panic loop. Resolution: The absence of Async IO reservation is now a soft failure. |
PWX-20675 | ClusterPair can also be set up using a backup location Kubernetes object instead of creating credentials on both clusters. There was an issue where destination credentials were reset, and the system then ignored the backup location object and used the internal object store. This caused migration failures. User impact: Migrations started failing for cluster pairs configured using a backup location. Resolution: Resets of credentials on DR sites have been fixed for cluster pairs configured using a backup location. |
PWX-21143 | Portworx POD (oci-monitor container) was using a broad privileged:true security privilege, enabling too many security attributes.Resolution: We have replaced the broad privileged:true security setting with fine-grained security privileges. |
PWX-21358 | When a Portworx cluster is created in a vSphere environment, Portworx disks (vmdks) were unevenly placed among the datastores in a vSphere storage cluster. In extreme cases, all vmdks would land up in the same datastore. We have taken a best effort approach of distributing vmdks as evenly as possible among all the datastores in a Storage Cluster to the extent that vSphere apis allow. User impact: Users had to deal with an uneven distribution of vmdks because vmdk movement across datastores is not supported. To work around this issue, users would bring up nodes one at a time. Resolution: This best-effort approach is available for a more even distribution of vmdks among datastores of a storage pod. |
PWX-21389 | When external EtcD v3 configured with user AuthN (authentication) as a KVDB, Portworx was not installing correctly. Resolution: When user-AuthN is in use, KVDB clients are now properly initialized and set up. |
PWX-21514 | In an internal kvdb cluster operating at maximum kvdb cluster size, if one of the kvdb member nodes goes down, it will be replaced by an available non-kvdb node in the Portworx cluster. When the previous member recovers from the problem and comes up, its kvdb disk will be deleted. In a vSphere environment, this deletion used to fail and users would see an additional kvdb drive when they list all available drives in the Portworx cluster. User impact: vSphere environments could see unused kvdb disks lingering around in the cluster until they are deleted from outside of the Portworx environment. Recommendation (optional): If users choose not to upgrade, they will have to manually delete those extra lingering disks. |
PWX-21551 | When a Portworx volume switched to read-only mode, Portworx restarted docker-containers that use px-volumes, but it did not restart containerd/cri-o containers. Resolution: Portworx now also restarts containerd/cri-o containers. |
PWX-22001 | Volume placement strategy rules with affected_replica rules will now be applied when increasing the HA level of a volume.User impact: Rules with affected_replica volume placement rules were not correctly applied when increasing HA level as they were when initially provisioning a volume for the same HA level. |
PWX-22035 | If a node restarted while Portworx was creating a snapshot after deleting a volume, Portworx sometimes restarted. User impact: When these events happened together, Portworx could restart unexpectedly. Resolution: This release addresses the issue and Portworx no longer restarts under this circumstance. |
PWX-22478 | The Portworx node-wipe operation did not clean up all the old node identifiers, which caused issues with the telemetry container after the node was wiped or recycled. Resolution: The Portworx node-wipe procedure was fixed, so all node identities are properly recycled. |
PWX-22491 | Portworx installations were using the default dnsPolicy , which did not include the Kubernetes internal DNS server.Resolution: We changed the default dnsPolicy to ClusterFirstWithHostNet , so now the Kubernetes DNS is also used in hostname resolution. |
PWX-22787 | Portworx generates a core then restarts on certain nodes where application pods are trying to setup. User impact: Portworx will generate a core and restart itself in the scenario where an application pod is trying to attach a volume on a node and the volume is already attached and busy in another node in the cluster. The Portworx service will auto-recover from this after the restart. Only Portworx 2.9.1 was impacted by this issue. Resolution: The issue causing Portworx to restart has been fixed. |
PWX-22791 | There is no update API for credentials. User impact: When keys are rotated for cloudsnap credentials, there was no way to update the credentials with the new keys. The only way was to delete and recreate the credentials with the new keys, which required the Stork schedule for cloudsnaps to be updated with the new credential ID to avoid failures due to a credential ID mismatch. With Portworx version 2.10, an update API has been added for credentials that allows users to update most of the parameters. Caution must be exercised while changing parameters such as the bucket or the endpoint, which causes previous cloudsnaps to no longer be visible through the modified credentials. |
PWX-22887 | After a node is decommissioned, backups may fail for volumes which were attached on the decommissioned node. Resolution: Detaching the volume using pxctl host detach fixes the issue. |
PWX-22941 | While performing an internal KVDB node failover, a failure in setting up the internal KVDB could result in an orphaned, unstarted node entry in the internal KVDB cluster. User impact: Internal KVDB clusters would keep running at a reduced cluster size. Resolution: A Portworx node which failed to perform internal KVDB failover will detect its own orphaned node entry in the KVDB cluster and clean it up. |
PWX-22942 | In a stretch cluster, cloudsnaps can sometimes fail with a Not authenticated with secrets error. This was due to a missing check, which allowed cloudsnaps to be scheduled on a node in a Kubernetes cluster without access to credentials (BackupLocation). Resolution: Fixed the check so that cloudsnaps are not scheduled on nodes without access to Kubernetes secrets. |
PWX-23060 | Previously, FlashArray and FlashBlade had a limit of 200 Portworx or FA/FB volumes in a cluster. Resolution: The limits are now 200 Portworx and 100,000 FA/FB volumes in a cluster. |
PWX-23085 | A Portworx upgrade on a storageless node in the cloud drive configuration can get stuck with the message DriveAttachedOnDifferentNode Error when pool ExpandInProgress on another node in the cluster .Resolution: The issue has been fixed. Portworx now skips drives where expansion is in progress until the expansion is complete. |
PWX-23096 | Cloudsnaps could stay in the active state forever when the executing node was decommissioned. User impact: Cloudsnaps could be stuck in the active state forever, and further requests to the same volume could remain queued. Resolution: Such cloudsnaps are now marked as stopped, allowing newer requests to run on a different replica. |
PWX-23099 | Restoring a cloudsnap with the intention of doing an in-place restore failed with the error Not enough space available on the pool , even though the used size of the volume being restored was less than the available space on the pool. User impact: Failure to restore the cloudsnap to the same pool as the parent volume. Resolution: Portworx now checks pool space against the used size of the volume being restored rather than the actual volume size. |
PWX-23119 | A panic occurred in one of the Portworx processes when creating a local snapshot for replicated volumes having more than 1 replica with the skinnysnap feature enabled. User impact: When the skinnysnap feature is enabled and the number of skinnysnaps is set to greater than 1, the Portworx process on one or more nodes could panic while creating a local snapshot. This happened if not all replicas were online on a volume with a replication level greater than 1. Resolution: Fixed the out-of-bounds access that caused the panic in the skinnysnap creation path for replicated volumes having more than 1 replica. |
PWX-23151 | On OpenShift platform, the Portworx service could not use Kubernetes client APIs if the Portworx POD was stopped. Resolution: The Portworx service has been isolated from Portworx POD, so stopping the POD on OpenShift platform no longer prevents Portworx service from using Kubernetes client APIs. |
PWX-23155 | In a cluster with more than 512 nodes, Portworx failed to start after 511 nodes because it had a limit on open client connections that was exceeded when adding more nodes. User impact: More than 511 nodes could not be added to the cluster, or Portworx would enter a crash loop. |
PWX-23174 | For fragmented large volumes, repl-add kept restarting from scratch. User impact: Because repl-add was stuck in a loop, ha-add would not finish, and a new replica was not added to the replica set. |
PWX-23189 | On Tanzu vSphere with the Kubernetes platform, the worker VMs have a 16GiB root partition. Due to the small size of the root partition, this can lead to disk pressure as the life of the cluster increases. Resolution: We recommend that you monitor the free disk space on these workers continuously and garbage collect space as needed. |
Known issues (Errata)
Issue Number | Issue Description |
---|---|
PD-1104 | When installed in vSphere local mode, if a VM which is running a storageless Portworx node is migrated to another ESXi which does not have any storage Portworx nodes, this storageless Portworx node will fail to transition into a storage node. Workaround: Restarting Portworx on the storageless Portworx node will transition it into a storage node. Portworx can be restarted by applying the px/service=restart label on the Kubernetes node or by issuing systemctl restart portworx on the node. |
PD-1117 | When the trash can is enabled in a disaster recovery setup (by setting a value greater than 0 for VolumeExpiration in cluster settings) on the destination cluster, users will see many volumes. If the expiration is set to a very large number, these snapshots might take up significant capacity as well. This is a known issue and will be addressed in a future release. Workaround: Do not enable VolumeExpiration in cluster settings. |
PD-1125 | If a Portworx storage pool is in an Error state (seen in pxctl service pool show ), do not submit new pool expand operations on the pool.Workaround: Before submitting new pool expand operations, fix the pool state by entering and then exiting the pool in maintenance mode using the pxctl service pool maintenance command. |
PD-1127 | Portworx pool expand operation has the status failed to update drive set state: etcdserver: leader changed .Workaround: This error indicates that the actual pool expansion is complete in the background. The message occurs when Portworx tries to update the status of the drives in the pool. |
PD-1130 | A storage pool expand using add-disk can be stuck in progress with the error Pool is still not up. add drive status: Drive add: No pending operation pool status: StorageFull .Workaround: Restart Portworx on the node to resolve the issue. This can be done by issuing systemctl restart portworx or labeling the Kubernetes node with the px/service=restart label. |
PD-1165 | Due to an incomplete container image, Portworx installation or upgrade operations can get stuck with the message: could not create container: parent snapshot <> does not exist: not found .Workaround: Identify the px-enterprise image and remove it. The following sample commands do this:ctr -n k8s.io i ls | grep docker.io/portworx/px-enterprise:2.10.0 ctr -n k8s.io i rm docker.io/portworx/px-enterprise:2.10.0 |
2.9.1.4
Apr 1, 2022
Notes
- This version addresses security vulnerabilities.
2.9.1.3
Mar 15, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22943 | Portworx with FA cloud drives was erroneously able to start with the user_friendly_names setting enabled. User impact: Portworx installs successfully initially, but on restart it won't be able to identify its own drives. This could cause Portworx to create new drives ignoring the already created ones. Resolution: Portworx no longer starts if the multipath user_friendly_names setting is enabled. If after installing this version you receive this error, update your multipath configuration. |
2.9.1.1
Feb 3, 2022
ADVISORY: Pure Storage strongly recommends users of the 2.9.1 release upgrade to 2.9.1.1.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22787 | Under a certain race condition, Portworx could generate a core and restart itself. This could happen when an application pod tries to attach a volume on a node while the volume is already attached on another node in the cluster. User impact: Portworx on the node where the application pod is trying to attach the volume would generate a core and restart. The Portworx service auto-recovered from this after the restart. Only Portworx 2.9.1 was impacted by this issue. Resolution: The issue causing Portworx to restart has been fixed. |
2.9.1
Jan 27, 2022
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Support for Pure FlashBlade as a Direct Access filesystem has graduated from early access to Generally Available! With this feature, Portworx directly provisions FlashBlade NFS filesystems, maps them to a user PVC, and mounts them to pods. Reach out to your account team to enable this feature.
- Support for Pure FlashArray cloud drives has graduated from early access to Generally Available! Use FlashArrays as a cloud storage provider. Reach out to your account team to enable this feature.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-22105 | Portworx now supports PKS distributions based on "containerd" container runtimes. |
PWX-21721 | The pxctl status command's response time is now reduced when telemetry is enabled. This was done by running telemetry status asynchronously and caching its status. |
PWX-20642 | Portworx no longer requires global permissions on all datastores. Users can now specify which datastores to give Portworx access to. |
PWX-22195 | Improved Portworx logs by adding a correlation ID to every API request; the ID is logged at all levels. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22197 | CSI-provided drives could have an incorrect media type classification: disks based on flash media were classified as magnetic disks. User impact: Portworx may incorrectly classify other storage attributes which are derived from the storage media, such as IO Priority. Resolution: CSI-provided drives are now correctly classified. If your drives were incorrectly classified, you can manually change the io_priority of the pool using the pxctl sv pool update command. |
PWX-22079 | A golang panic (stacktrace) occurred when there was an error initializing the storage layer. User impact: When there was an error while initializing the storage-layer, golang sometimes panicked (stacktrace output), and the real error was masked. Resolution: Portworx now properly handles errors from the storage layer, and no longer causes golang panics. |
PWX-21605 | Portworx keeps track of the number of NFS threads configured on a node. If the number of threads drops below 80% of the configured value, it resets it to the configured thread count. However, a variance of 80% was too large, and on an overloaded system this could cause the system to run with fewer threads than desired. User impact: On certain overloaded systems, sharedv4 application pods could see NFS timeouts because the NFS server had fewer threads than the configured amount. Resolution: Portworx will now keep the NFS thread count within 95% of the configured value. |
PWX-22313 | Transitioning a storageless node into a storage node caused other nodes in the cluster to receive a NodeDown event (for the storageless node) with an IP which matches with the new storage node. This caused the sharedV4 server to assume that there were no sharedV4 clients active on that IP. User impact: A sharedV4 app running on such a node which transitions could see I/O errors. Resolution: Portworx on peer nodes will now detect these transitions, ignore NodeDown events for the same duplicate IP, and avoid removing the client for the sharedV4 volumes. |
PWX-22244 | When fsGroup is set on the volume, the kubelet has to perform a recursive permissions change on the mount path. This can take time and delay the pod creation when there are a large number of files in the volume. In Kubernetes 1.20 or later, there is a setting fsGroupChangePolicy: OnRootMismatch which tells kubelet to skip the recursive permissions change if the permissions on the root (mount path) are correct. This prevented the delay in pod creation, but was rendered ineffective when Portworx reset a permissions value. User impact: Specifying fsGroupChangePolicy: OnRootMismatch did not alleviate the pod creation delay caused by the fsGroup setting. |
PWX-22237 | A race condition in Portworx during storageless node initialization sometimes created an orphaned entry in pxctl clouddrive list where that node's entry was not present in pxctl status . User impact: The node list between pxctl status and pxctl clouddrive list was not in sync. Resolution: Portworx now handles this race condition and ensures that such orphaned entries are removed. |
PWX-22218 | Portworx failed to mount volumes into asymmetrical shared mounts. User impact: When using asymmetrical shared mounts (e.g. mounting different directories between host/container), it was not possible to mount Portworx volumes into these directories. Resolution: After the fix, asymmetrical shared mounts work properly (i.e. you can mount volumes into such directories). |
PWX-22178 | On Kubernetes installations that use the CRI-O container runtime, setting up a custom bidirectional (shared) mount for the Portworx pod did not propagate to portworx.service . Instead, it would be set up as a regular bind-mount, that could not be used to mount the PXD devices. Resolution: The bidirectional mounts are now properly propagated to portworx.service . |
PWX-21544 | When coming out of run-flat mode after more than 10 minutes have elapsed, a Portworx quorum node sometimes failed to start a watch on the internal KVDB because the required KVDB revision had already been compacted. User impact: When using an internal KVDB, there may have been a brief outage when Portworx exits run-flat mode. |
Known issues (Errata)
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PD-1076 | When using Portworx with FlashArray, if new drives are added while paths are down, it may not have all connections established and may result in failures when only a certain subset of paths go down, even if others are live. Workaround: This can be recovered after all paths are present with pxctl sv m --cycle , which will detach and reattach the drives, hopefully ensuring all paths are added back. |
PD-1068 | When using Portworx with FlashArray, expanding a pool by resizing when some paths are down (even if some are still up) may result in issues, as the single paths may not pick up the new path size and fail the multipath resize operation. Workaround: Run the resize again when all paths are restored to resolve the issue and complete the expansion. |
PD-1038 | Portworx pool expand operations fail to resize when some multipath connection paths of FA are down. Workaround: After the network is restored, you can run iscsiadm -m session --rescan and expand the pool again. |
PD-1067 | Disabling a port on the FlashArray will also remove it from the list of ports in the REST API, and thus Portworx will not attach it. This can cause some multipath paths to remain offline even after the port is reattached, especially if Portworx had a restart while the port was down. Workaround: You can recover the faulty paths by running the pxctl sv m --cycle command to reattach them and bring back the missing paths. Note that unless all paths are down, Portworx will still function fine, just with reduced iSCSI/FC-layer redundancy. |
PD-1062 | px-pure-secret contains FlashArray and FlashBlade connection information, specifically management endpoint and token. The secret is loaded when Portworx starts, therefore it needs to be present before Portworx is deployed. Also, any changes to the secret after Portworx is already started will not be detected. Workaround: If you need to change array backends or renew the token, you must restart Portworx. This also applies to FlashArray disk provisioning, and impacts changes to the FA/FB essentials licenses. |
PD-1045 | In the cloud drive mode of deployment with a FlashArray, a restart of the primary controller or a network outage to the FlashArray could cause a storage node to transition into a storageless node. This transition happens since another storageless node in the cluster picks up the disks and starts as a storage node. The original storage node however still ends up having the signature of the old disks and starts up as a storageless node with StorageInit failure. This happens only if Portworx on this node is unable to cleanly detach its disks due to the primary controller on FlashArray being down. Similarly, after the primary controller restarts or a network outage occurs, a storage node could see errors from the backend disk, or the internal KVDB could see errors from the KVDB disk and cause Portworx or the internal KVDB on that node to enter an error state. Workaround: Once the primary controller is back, restart Portworx on the impacted node to recover. |
PD-1063 | If the Kubernetes ETCD is unstable, Portworx may experience intermittent access issues to the Kubernetes API. Workaround: If a pool expand operation fails with the error message: could not retrieve portworx-storage-decision-matrix config map: etcdserver: leader changed" , retry the pool expand operation. |
PD-1093 | Application pods can get stuck in the ContainerCreating state. This can indicate that volume attachment has failed. |
PD-1071 | If you manually disconnect any connected volumes from FlashArray, the Portworx node may become stuck attempting to reconnect to the original volume if there are pending I/Os. Workaround: Reconnecting the volume will resolve this issue at the next Portworx restart and the node will return to a healthy state. |
PD-1095 | If you uninstall Portworx with deleteStrategy set to Uninstall (and not UninstallAndWipe ), then you reinstall Portworx, the telemetry service and metrics collector will be unable to push metrics and may run into a CrashLoopBack state. This is for certificate security reasons.Workaround: Contact Support to reissue the certificate. |
2.9.0
Nov 22, 2021
Notes
- If you're using Kubernetes 1.22, you should use Stork 2.7.0 with Portworx 2.9.0.
- After upgrading to Portworx 2.9.0, the existing
sharedv4 service
volumes will be switched to using NFS version 4.0.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18037 | Improved pxctl status bootstrap issue reporting when KVDB connectivity is blocked. |
PWX-18038 | Clarified error message when using an incorrect network interface. |
PWX-18362 | Using the pxctl cloudsnap list -d command, you can now list cloudsnaps of volumes that are no longer present in the cluster, but belonged to it. |
PWX-20670 | Portworx will attempt to enable persistent journaling when installing. |
PWX-21373 | The following template can now be used in a VolumePlacementStrategy for the volumeAntiAffinity or volumeAffinity to automatically constrain the MatchExpressions to the PVC namespace: - key: "namespace" values: - "${pvc.namespace}" You can now separate interaction between different namespaces when using volume (anti-)affinity in VPS. See the sketch after this table. |
PWX-21506 | One of the folders used by legacy shared (fuse) volumes will not be created unless shared volumes are created and mounted. This change prevents the internal mount path, specifically /opt/pwx/oci/rootfs/pxmounts , from being created when there are no shared volumes in use. |
PWX-21994 | Added support for the cgroup V2 -configured hosts. |
PWX-21662 | Portworx now supports OpenShift version 4.9. |
PWX-21341 | Added the sharedv4_failover_strategy storageClass parameter whose value can be either aggressive or normal . The aggressive strategy uses a shorter failover grace period than the one used by the normal strategy. If sharedv4_failover_strategy is unspecified, then the default for sharedv4 service volumes is aggressive and that for sharedv4 volumes is normal . The value for this parameter can be changed using the pxctl volume update command as well. An empty value clears the setting. |
PWX-21684 | Telemetry is now disabled by default in the spec generator. Enable telemetry under advanced settings. |
PWX-21895 | KVDB metrics are now disabled by default, lowering the amount of metrics generated. You can re-enable them by adding kvdb_metrics_enable=1 as a runtime option. |
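A sketch of the PWX-21373 namespace template in a volumeAntiAffinity rule. The key and template value come from the entry above; the apiVersion, operator, and surrounding structure reflect common VolumePlacementStrategy usage and are assumptions.

```yaml
# Hypothetical VolumePlacementStrategy: limits anti-affinity to volumes whose
# PVCs are in the same namespace, using the ${pvc.namespace} template from PWX-21373.
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: per-namespace-anti-affinity   # example name
spec:
  volumeAntiAffinity:
    - matchExpressions:
        - key: namespace
          operator: In
          values:
            - "${pvc.namespace}"
```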
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21710 | If a KVDB node encountered IO errors and restarted, it failed to unmount the previous mountpoint. User Impact: The KVDB node did not start, resulting in reduced KVDB availability if there were no other nodes available to take over for the failed replica. Resolution: The Portworx container now looks for unhealthy mount points and unmounts them. |
PWX-21590 | Portworx nodes would not transition into run-flat mode when the etcd cluster was unreachable and lost cluster quorum. Each node detected the etcd cluster as unreachable at different times, rendering the cluster unable to reach consensus on whether etcd quorum is lost. User Impact: Cluster KVDB quorum as well as Portworx cluster quorum would be lost and Portworx would not transition into run-flat mode. Resolution: All Portworx nodes will now detect etcd is not reachable within 1 minute and enter run-flat mode. |
PWX-21506 | Mount paths used by shared (fuse) volumes have wider permissions than desired. User Impact: One of the folders used by legacy shared (fuse) volumes were not created unless shared volumes were created and mounted. Resolution: This change prevents the internal mount path, specifically ( /opt/pwx/oci/rootfs/pxmounts ), from being created when there are no shared volumes being used. |
PWX-21168 | Added support for the RKE2 Kubernetes distribution. User Impact: RKE2 switched to a K3s-based Kubernetes distribution baseline, which broke the Portworx deployment. Resolution: The YAML generator at Portworx Central has been fixed to recognize the RKE2-based Kubernetes version and automatically apply the customization required to install Portworx. |
PWX-20780 | Pods using encrypted sharedv4 volumes got stuck in the terminating state. User Impact: If a node that was hosting the replica of a sharedv4 encrypted volume (server node) was rebooted, it was possible for application pods accessing that volume to get stuck in the terminating state. Resolution: Portworx will detect restart of a sharedv4 encrypted volume server node and will automatically restart application pods using that volume which will recover them to functional state. |
PWX-21965 | Portworx .stack files were accumulating in /var/cores and not being deleted. User Impact: Users experienced a growing number of files with the .stack extension in /var/cores on the worker nodes, which they had to delete manually. Resolution: New .stack files generated by Portworx are now deleted automatically. |
PWX-21823 | In cloud setups, if a cluster was scaled down, resulting in internal KVDB nodes being shutdown, then there was a possibility that more than the quorum number of nodes were removed from the bootstrap configuration map where the internal KVDB nodes list is maintained. User Impact: If internal KVDB nodes were scaled down, users had to manually recover the KVDB after scale up. |
PWX-21807 | Expanding a pool using add drive beyond 6 drives caused creation of new pools. User Impact: If you had six drives in a single pool, and you tried the pxctl service pool expand command to add drives, the command created a brand new pool instead of expanding the existing one.Resolution: The pxctl service pool command will now fail the operation with an error message indicating that the pool has reached the maximum number of drives. |
PWX-21664 | When Kubernetes was installed on top of the containerd container environment and using an older (non-default) runtime version io.containerd.runtime.v1.linux , Portworx installation sometimes did not properly clean up the containerd-shim process and container directories. User Impact: The node may have required a reboot for Portworx upgrade process to complete. Resolution: Portworx now cleans up processes and directories when running on this older containerd runtime. |
PWX-21659 | In certain failure scenarios where a cloud backup failed even before a cloud backup ID was created, the migration status was not updated with the appropriate error message. User Impact: Migrations triggered as a part of Async DR failed with an empty error message. Resolution: Portworx now sets the correct message in all cloud backup error scenarios. |
PWX-21558 | You can specify discard or nodiscard filesystem mount options using two different volume spec fields (volumespec.nodiscard and volumespec.mount_options ). User Impact: This resulted in ambiguous volumespec settings in the following cases: volumespec.nodiscard = true and volumespec.mount_options="discard=" volumespec.nodiscard = false and volumespec.mount_options="nodiscard=" Resolution: When ambiguous volumespec settings for discard or nodiscard are detected, Portworx uses volumespec.nodiscard to derive the final value and generates an alert to notify that the volumespec needs to be fixed. |
PWX-21380 | One of the internal data structures was not properly protected for access from multiple threads. This caused Portworx to encounter an error and be restarted. User Impact: In a scaled-up setup with a large number of nodes, Portworx restarted intermittently. Resolution: Portworx now does not share the data structure between multiple threads. |
PWX-21328 | Applications using file locks do not work with sharedv4 service volumes while using NFS version 3. Any attempt to acquire a file lock hangs indefinitely. User Impact: Containers stuck in this state cannot be terminated using normal methods. Resolution: This no longer occurs with sharedV4 service volumes. |
PWX-21297 | The Portworx pod did not encode stored credentials correctly when authenticating with the container-registry to download the px-enterprise container. User Impact: Depending on the characters used in password for the container-registry, authentication continuously failed, and the Portworx pod was unable to pull and install Portworx on the cluster. Resolution: Portworx now properly encodes credentials. |
PWX-21004 | Locks are not transferred on failover. User Impact: Users saw unpredictable behavior for applications relying on NFSv4 locking. Resolution: Portworx disallows NFSv4 for sharedv4 service volumes. |
PWX-20643 | Pods that used several sharedv4 volumes sometimes became stuck in the terminating state. User Impact: When a pod that was using several sharedv4 volumes was deleted, it sometimes became stuck in the terminating state. Users had to reboot the node to escape this state. Resolution: Pods using sharedv4 volumes no longer get blocked indefinitely. |
PWX-19144 | During migration, Portworx volume labels were not copied. User Impact: Users were forced to manually copy volume labels after migration to reach the desired state. Resolution: Volume labels are now copied automatically during migration. |
PWX-21951 | The Operator created more storage nodes than maxStorageNodePerZone specified in STC.User Impact: The Portworx cluster came up with a different number of storage nodes than the number specified using the maxStorageNodePerZone parameter in the STC.Resolution: Portworx now comes up with the exact number of storage nodes specified in the maxStorageNodePerZone parameter. |
PWX-21799 | Added support for PX_HTTPS_PROXY with RKE2 installs. User Impact: When installing Portworx in air-gapped environments, you can include the PX_HTTPS_PROXY environment variable to use an HTTP proxy with the install. However, this variable was not used when pulling the px-enterprise image during installs on Kubernetes clusters with the containerd container runtime. Resolution: Portworx now uses the PX_HTTPS_PROXY environment variable when installing Portworx on Kubernetes clusters with the containerd container runtime. |
PWX-21542 | When you installed Portworx on an air-gapped cluster, the node start up time was delayed. This happened because the metering agent ran to report the health of Portworx cluster. User Impact: Portworx tried to run the metering agent, resulting in a delayed start. Resolution: The metering agent is now disabled to avoid the delayed start. |
PWX-21498 | The drain volume attachment job timeout was too long. User Impact: The drain volume attachment job had an upper time limit of 30 minutes. In certain error scenarios, the job stayed pending for 30 minutes and then timed out and failed. Resolution: Changed the drain volume attachment job timeout from 30 minutes to 10 minutes. Any volume attachment drain operation is expected to complete within 10 minutes. |
PWX-21411 | When a Portworx node went out of quorum and then rejoined the cluster, the volume mountpoint sometimes became read-only. User Impact: Some application pods became stuck in the container creating or crashloopbackoff state. Resolution: Portworx now detects the pods with ReadOnly PVC and proactively bounces the application pods after Portworx startup completes on a node. |
PWX-21224 | Setting cloudsnap threads to four or less resulted in cloudsnap backups in hung state. User Impact: With cloudsnap threads (Cloudsnap maximum threads field in cluster options) set to four or below and doing more than 10 cloudsnaps at the same time, cloudsnaps became stuck and made no progress. Resolution: Incorrect check for thread count no longer results in a deadlock. |
PWX-21197 | The runtime option limit_drives_per_pool did not work in Portworx version 2.8.0. User Impact: The limit_drives_per_pool runtime option controls the number of drives in a pool. The last pool could end up with more drives than the limit if the drive count was not an exact multiple of the limit and creating another pool would have left it with too few drives. Internally, a new pool is created only if the drive count available for the last pool is at least 50% of the limit. Resolution: During subsequent drive add operations from maintenance mode, these limits are now honored more strictly. A drive add will fall into a pool only if the drive count stays within the limit; otherwise, a new pool is formed. |
PWX-21163 | Prometheus was unable to access the Portworx internal etcd on a multi-network-interface setup, causing etcd alerts not to appear. User Impact: Prometheus alerts and monitoring did not work properly for alerts related to the internal etcd on multi-network-interface setups. Resolution: Portworx now allows internal etcd access from all network devices so that Prometheus can access and scrape alerts. |
PWX-21057 | Pods failed to come up for restored PVCs that were encrypted with Vault namespace secrets. User Impact: If you used a PVC that was cloned from a snapshot, and it was encrypted by Vault namespace secrets, then pods using that PVC were stuck in the container creating state. Resolution: Portworx now checks for the Vault namespace value at the correct place in the volume spec for restored volumes. This allows pods to finish setup and not get stuck. |
PWX-21037 | Unable to set the mount_options in volume spec. The nodiscard or discard volume fields were not in sync with mount_options .User Impact: This resulted in unpredictable behavior wherein after mounting a pxd volume , volumes appeared mounted with discard option even when volumespec.nodiscard was set to true .Resolution: Portworx now allows setting mount options. The volumespec.nodiscard and mount options are now made to be in sync, and this provides predictable behavior for pxd volume mounts. |
PWX-20962 | Portworx experienced a startup issue with volatile mounts of files in Kubernetes environments. User Impact: If files were manually removed from the /opt/pwx/oci/mounts/ directory, followed by restarting Portworx service before Portworx pod, this sometimes resulted in a looping failure to start the Portworx service.Resolution: Synchronization of volatile mounts into /opt/pwx/oci/mounts/ is fixed, so the restart of the Portworx pod properly restores files/directories in this directory. |
PWX-20949 | Cloudsnap restore operations sometimes failed with the error message Restore failed due to stall . This happened when the restore operation incorrectly evaluates the node to be in maintenance mode. User Impact: User restores may not have completed, and users may have been required to restart the node where the restore was stalled, or reissue the restore command. Resolution: Portworx cloudsnap now does not interpret node status. An upper layer module handles it, preventing this issue. |
PWX-20942 | Synchronous DR on Tanzu cloud drives did not work. User Impact: In Tanzu, the cloud drive backing Portworx is a PVC, which is a cluster-scoped resource. In order to help clusters distinguish between their drivesets, the drivesets should have been labeled accordingly. Resolution: Portworx now supports Tanzu PVCs. |
PWX-20903 | The affected_replicas in VolumePlacementStrategy ReplicaPlacementSpec were not applied correctly when multiple rules were used. User Impact: Volume provisioning sometimes failed when a VPS replicaAffinity had multiple rules using affected_replicas that should have worked. Resolution: Using multiple affected_replicas rules now works in Portworx as expected. |
PWX-20893 | The pxctl v check --mode fix_safe <volname/volid> command failed when Deleted inode <ino> has zero dtime was found by fsck . User Impact: If a file was in use at the same time it was deleted, it was never properly closed, and the filesystem was remounted, users may have seen an error from fsck .Resolution: The pxctl v check --mode fix_safe <volname/volid> command now recovers the volume to a clean state. |
PWX-20546 | Portworx sometimes displayed a rebalance job status as DONE while rebalance actions were still pending. User Impact: You may have seen a rebalance job status as DONE with some pending jobs belonging to deleted volumes.Resolution: Portworx now does not show the rebalance job for deleted volumes. |
PWX-21279 | In some cases, when multiple matchExpressions were specified in a volume placement strategy, a pool which satisfied only some expressions got selected. User impact: Instead of multiple matchExpressions, users could use multiple rules in versions affected by this issue to get the expected result. Resolution: Portworx now checks for all match expressions. |
PWX-20029 | If an NFS client node was restarted, the client reload process would restart all the pods assigned to the node. If it detected that an NFS server is down, it would start the timer for that event. When the timer expired, the process invoked the recovery routine, and the pods that use the volumes with that NFS were restarted again. This restart was sometimes unnecessary. User impact: Users saw pods reset unnecessarily. Resolution: Portworx now gets the latest list of attached volumes in the recovery routine, and only restarts the pods that are still using the stale volumes. |
Known issues (Errata)
Portworx is aware of the following issues; check future release notes for fixes to these issues:
Issue Number | Issue Description |
---|---|
PD-1015 | If the Portworx pod is deleted and recreated while a pool expand of a Portworx Pool is in progress, the pool expand will fail with the error message: could not retrieve portworx-storage-decision-matrix config map: Unauthorized .Workaround: Wait for the new pod to come up and then resubmit the pool expand request. The subsequent request will go through. |
PD-1007 | During a Portworx cluster upgrade, if the cluster is using containerd as the runtime, a Portworx node may get stuck during the upgrade. Workaround: Check whether the px-oci-installer process is stuck on the node (for example, using ps -ef ). |
PD-1005 | In certain scenarios, a PVC resize request issued from Kubernetes can error out and the API will fail, but Portworx will eventually complete the resize operation. Kubernetes retries the operation but Portworx then returns the error: No change is requested . This causes the PVC size to not match with the actual Portworx volume size. |
PD-990 | Application containers running with non-Docker container runtimes will not get automatically restarted when Portworx detects issues with the volume mounts for those containers. Examples of non-Docker container runtimes are containerd, CRI-O, and rkt. The most common scenario where volume mount issues are detected is when the mount becomes read-only because Portworx has been down on a node for more than 10 minutes. Workaround: Restart the application containers to work around this issue. In Kubernetes, this means deleting the application pods so that they get recreated. |
PD-1020 | A Portworx install where the backing drives are provisioned by Portworx can fail with the following fingerprint: Failed to format [-f --nodiscard /dev/<device-name>] . The issue can happen when the corresponding device is not completely attached on the node and the format cannot detect it. Workaround: Restart Portworx on the node. To restart Portworx, either label the corresponding Kubernetes node with the label px/service=restart or, if you're already on the node, run systemctl restart portworx ; see the example commands after this table. |
PD-1017 | The application that uses sharedv4 service encrypted volume may have some pods stuck in the ContainerCreating state after there is a sharedv4 service failover and then failback. The failover happens when the Portworx service is restarted on the node where the volume is attached. The failback happens when the Portworx service is subsequently stopped on the node where the volume attachment was moved during failover.Workaround: Restart the Portworx service on the node where the volume was attached before the failover. Use a normal failover strategy for the sharedv4 service volumes. The default is aggressive . |
PD-1023 | Failures such as iSCSI disconnects, HBA reset, etc. on Portworx pool drives can sometimes cause the pool to enter an error state and remain in that state if there’s an outstanding operation in the kernel. Workaround: To recover from this, reboot the node. |
PWX-21956 | Pool expansions by resizing volumes will fail if some paths to the FlashArray are down. Pool expansions may fail in reduced connectivity scenarios, such as Purity upgrades, data network issues, or others. Workaround: Restore all paths to the array before attempting the resize again. |
PWX-21276 | If some valid and some invalid FlashArray or FlashBlade endpoints are provided, Portworx will fail to start. If invalid credentials or endpoints are entered (or an API token expires), Portworx will fail to start. Workaround: Correct the credentials in the secret and restart Portworx. |
PD-1024 | When using Portworx with FlashBlade (FB), users store login credentials, such as FA/FB IPs and access tokens, in the px-pure-secret Kubernetes secret. In the event that an access token expires or is otherwise invalidated, Portworx automatically provisions workloads onto the next accessible FB to avoid interruptions. As a result, users may not be alerted when FlashBlades become inaccessible, and workloads can concentrate on the remaining FlashBlades, impacting performance. Workaround: To avoid this issue, ensure the credentials stored in px-pure-secret are valid. If you find invalid credentials, correct them and restart Portworx to restore full use. |
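The two restart methods named in the PD-1020 workaround above could be run as follows; the node name is a placeholder.

```bash
# Option 1: label the affected Kubernetes node so that Portworx restarts on it
# (px/service=restart is the label named in the PD-1020 workaround).
kubectl label node <node-name> px/service=restart

# Option 2: if you are already on the affected node, restart the service directly.
systemctl restart portworx
```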
2.8.1.6
May 20, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-23997 | A kernel panic occurs if any application tools try to perform a grep operation or user file system level commands on the pxd-control device. User Impact: The affected node will experience a kernel panic due to the Portworx kernel module being unable to handle the filesystem user level commands. Resolution: Portworx now handles these kinds of commands on pxd-control devices by denying access, preventing kernel panic. |
2.8.1.5
Mar 1, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-22968 | In vSphere environments, unused virtual disks were left lingering around. User impact: Users may have seen multiple KVDB disks lingering around on worker nodes without getting cleaned up. Resolution: Portworx now detects these lingering disks and deletes them. |
2.8.1.4
Feb 15, 2022
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21514 | In vSphere environments, Portworx sometimes failed to remove KVDB drives. User impact: Users saw an additional KVDB drive when they listed all available drives in the Portworx cluster. |
2.8.1.3
Jan 24, 2022
Notes
Portworx now includes kernel module support for 4.15.0-163-generic
.
2.8.1.2
Nov 2, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21485 | The NFS rpc-statd service sometimes failed to start on Portworx nodes, preventing sharedv4 volumes from mounting. User Impact: Application pods using sharedv4 volumes sometimes became stuck in the ContainerCreating state, with volume mount operations failing with the error: mount.nfs: rpc.statd is not running but is required for remote locking.mount.nfs: Either use '-o nolock' to keep locks local, or start statd.\nmount.nfs: an incorrect mount option was specified Resolution: Portworx now detects when rpc-statd is either not running or is in an inconsistent state, and restarts it to ensure that sharedv4/NFS mounts proceed. |
2.8.1.1
Oct 13, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21506 | One of the internal folders used for mounting legacy shared (FUSE) volumes was always created, even if no shared (FUSE) volumes were present on the system. User impact: Mount paths used by shared (FUSE) volumes had wider permissions than desired. Resolution: This change prevents the internal mount path, specifically /opt/pwx/oci/rootfs/pxmounts , from being created when there are no shared volumes being used. |
2.8.1
Sept 20, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-21172 | In a Metro DR configuration, there can be multiple cluster domains within the same cluster. When a sharedv4 volume is created, its replicas are placed across these cluster domains. If an app requests a sharedv4 volume, then there is no guarantee which replica node will act as the sharedv4 NFS server. This improvement ensures that whenever a sharedv4 application is started in any of the domains, the volume is attached to a node in the same domain where the application is running, guaranteeing minimum latency. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21227 | While trying to run IOs on multiple volumes with an overlapped overwrite pattern, an sm abort error sometimes occurred on one of the nodes. User impact: This error caused Portworx to restart. Resolution: This rare deadlock during resync no longer occurs. |
PWX-21002 | A deadlock in the NFSWatchdog path caused lock contention and an inspect command to hang. User impact: Users experienced a mounting issue for the affected volume and saw the pod stuck in the ContainerCreating state. Resolution: This deadlock has been eliminated. |
PWX-21224 | Setting cloudsnap threads to 4 or less resulted in cloudsnap backups in hung state. User Impact: With cloudsnap threads (Cloudsnap maximum threads field in cluster options) set to 4 or below and doing more than 10 cloudsnaps at the same time, cloudsnaps would be in hung/stuck state without making any progress. Resolution: Incorrect check for thread count resulted in deadlock causing the above scenario, which has been addressed in this release. |
PWX-20949 | Cloudsnap restore sometimes failed with the error message "Restore failed due to stall". This happened if the restore incorrectly determined that the node was in maintenance mode. User Impact: These restores might never complete, and the user may need to restart the node where the restore stalled or reissue the restore command. Resolution: Cloudsnap no longer interprets node status; an upper-layer module handles it, preventing the issue. |
PWX-19533 | Fixed an issue where a node may accumulate over-writes for a volume causing the px-storage process to restart.User impact: None |
PWX-21163 | Prometheus was unable to access the Portworx internal etcd on a multi-network-interface setup, so etcd alerts were not seen. User impact: Prometheus alerts and monitoring did not work properly for alerts related to the internal etcd on a multi-network-interface setup. Resolution: Portworx now allows internal etcd access from all network devices, giving Prometheus access to scrape alerts. |
PWX-21197 | There was a regression in using limit_drives_per_pool in 2.8.0. User impact: limit_drives_per_pool is a runtime option to control the number of drives in a pool. The last pool could have more drives than the limit if the drive count was not an exact multiple of the limit, and if creating another pool would have resulted in too few drives within it. Internally, a new pool is created only if the drive count available for the last pool is at least 50% of the limit. Resolution: These limits are now honored more strictly when a drive is added from maintenance mode. Any drive add operation will fall into a pool only if the drive count is within the limit. If not, a new pool will be formed. |
2.8.0
July 30, 2021
New features
Portworx by Pure Storage is proud to introduce the following new features:
- Early access support for Pure FlashBlade as a Direct Access filesystem. With this feature, Portworx directly provisions FlashBlade NFS filesystems, maps them to a user PVC, and mounts them to pods. Reach out to your account team to enable this feature.
- Early access support for Pure FlashArray cloud drives. Use FlashArrays as a cloud storage provider. Reach out to your account team to enable this feature.
- Snapshot optimization using extent metadata: Reduce the amount of data sent to your cloud storage provider when taking cloud snapshots.
- SkinnySnaps: improve the performance of your storage pools when taking volume snapshots.
- Sharedv4 service volumes: improve fault tolerance by associating sharedv4 volumes with a Kubernetes service.
- You can now install Portworx on Nomad with CSI enabled
- Install and scale a Portworx cluster on VMware Tanzu with CSI.
- With Pure1 integration, Portworx can now automatically upload its diags to Pure Storage's call home service called Pure1.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20720 | Portworx now supports SharedV4 volumes on VMware Photon hosts |
PWX-20131 | On Azure, you can now resize disk pools with disks of up to 32 TiB in size. |
PWX-20060 | The Portworx spec generator now creates the GA/v1 API version of the CSI VolumeSnapshot CRDs. |
PWX-18845 | Portworx now supports Amazon General Purpose SSD volumes (gp3). |
PWX-10281 | The Portworx CSI driver now supports Raw Block volumes for RWO PVCs. |
PWX-20102 | Licensing improvement: Autopilot licenses are now automatically included with all Portworx Enterprise licenses, including floating licenses. |
PWX-19553 | Alerts now bypass the API queue. As a result, pxctl will still show alerts even when the API queue is full. |
PWX-19496 | Kubernetes PVCs will now get created by the CopyOnWrite on demand setting by default. |
PWX-19320 | Portworx CSI Driver volumes will now be rounded up to the nearest GiB to match the Portworx in-tree volume plugin. This change only occurs for new volumes and volume size updates. |
PWX-20803 | Added photon support for pool caching. Portworx will try to install the required packages if enabled. If they fail, the Portworx installation will fail. Pool caching requires the following available packages:
|
PWX-20527 | The maximum number of cloud drives per node (not per pool) has increased from 12 to 32. Note that specific cloud providers may impose their own limits (remaining at 12), and that there are still limits per pool that may come into effect sooner. |
PWX-20423 | Sharedv4 export options improvements: The storage class parameter export_options can now take any NFS export option as a comma-separated list of strings. Portworx applies those export options on the node where the volume is attached and exported over NFS; see the StorageClass sketch after this table. |
PWX-20204 | For cloud drive, the requirement to specify skip_deprecation has been removed and users can now use the pxctl sv drive add command without that. |
PWX-18529 | Portworx will now report back home in trial license installations |
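To make the export_options improvement (PWX-20423 above) concrete, here is a hedged StorageClass sketch. The class name, provisioner, and the specific NFS options shown (no_root_squash, sync) are illustrative assumptions; only the export_options parameter and its comma-separated format come from the note.

```bash
# Sketch: export_options accepts NFS export options as a comma-separated list,
# applied on the node where the volume is attached and exported over NFS.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-sharedv4-custom-exports        # hypothetical name
provisioner: pxd.portworx.com             # assumed Portworx CSI provisioner
parameters:
  repl: "2"
  sharedv4: "true"
  export_options: "no_root_squash,sync"   # example standard NFS export options
EOF
```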
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20780 | Pods using encrypted sharedv4 volumes sometimes got stuck in the terminating state. User impact: If a node hosting a replica of a sharedv4 encrypted volume (server node) was rebooted, the application pods accessing that volume sometimes got stuck in the terminating state. Resolution: Portworx now detects when a sharedv4 encrypted volume server node restarts, and will automatically restart application pods using that volume to recover them to a functional state. |
PWX-20417 | On Kubernetes v1.20.1 with the containerd v1.4.4 container runtime, the Portworx installer used an invalid cgroups-path. User impact: Portworx failed to install. Resolution: Portworx now properly installs on this Kubernetes version and runtime. |
PWX-20789 | A pool expand status message was incorrect. User impact: The status message for the pool expand operation that appears in the pxctl service pool show command output was inaccurate when the operation type was add-disk . Users saw an incorrect size for the amount by which the pool was being expanded, but the operation functioned properly. Resolution: The status message now correctly states the size by which the pool is expanding. |
PWX-20327 | When running in the cloud at scale (more than 200 nodes), if certain nodes were flapping or restarting in a loop, healthy Portworx nodes slowed down while processing KVDB watch updates. User impact: Certain volume operations (for example, create or attach) took a long time or failed. |
PWX-20085 | Portworx failed to install when both -metadata and -kvdbDevice options were passed. User impact: Users saw their installations fail if they provided both options. Resolution: If users provide both options, Portworx installation will no longer fail. It now defaults to using the -metadata device for KVDB to maintain backward compatibility. |
PWX-19771 | Communication between nodes could block forever. User impact: Users sometimes saw volume access or management operations pend forever without timing out. Resolution: All communications in the Portworx cluster now time out. A new pxctl cluster option allows you to configure a default RPC timeout. |
PWX-20733 | If a custom container registry was specified and the cluster was using containerd, the oci-monitor attempted to pull the Portworx Enterprise image from the wrong registry. User impact: Users saw installation fail. Resolution: oci-monitor now pulls the Portworx Enterprise image from the correct custom registry. |
PWX-20614 | When multiple oci-monitor container images were present (using the same image hash but a different name), Kubernetes sometimes started the wrong container image. User impact: Users potentially saw the wrong Portworx Enterprise image loaded on the nodes. Resolution: The oci-monitor now consults multiple configuration entries when deciding which px-enterprise image needs to be loaded. |
PWX-20456 | Fixes the installation problem with containerd-v1.4.6 container runtime, running in "systemd cgroups" mode. User impact: If one is running a Kubernetes cluster configured with containerd-v1.4.6 (or higher) container runtime with systemd cgroups driver, Portworx service would fail to start. Resolution: Portworx startup issues are now resolved when using this container runtime/configuration. |
PWX-20423 | The security_label export option was removed after a node reboot. User Impact: When using sharedv4 with SELinux enabled, an app using sharedv4 volumes sometimes saw permission issues if the node where the volume was attached was rebooted. Resolution: Improvements to the sharedv4 volumes resolved this issue. |
PWX-20319 | Fixes an issue with Portworx service restarts hanging indefinitely, when DBus service is not responding. User impact: On host-systems that have a non-responsive DBus service, Portworx startup used to hang indefinitely. Resolution: Portworx startup no longer hangs if it cannot connect to the DBus service. |
PWX-20236 | In some scenarios, volume unmount may fail due to EIO errors on mount path, which could be due to prolonged downtime on the volume. User Impact: Pods may fail to terminate and reboot. Resolution: Continue to unmount the volume even when readlink fails with EIO on mount path. This allows pods to continue with remount of the volume. |
PWX-20187 | When passing a Kubernetes secret for etcd username/password using environment variables, they were taken and used "as is", rather than being "expanded" and replaced with the actual values from Kubernetes secret. User impact: When specifying the etcd username/password, the environment variables (e.g. populated by Kubernetes secrets) were not being "expanded" before configuring etcd connection. Resolution: You can now specify the etcd username/password via the environment variables. |
PWX-20092 | Queued backups may fail if the volume replica is reduced in such a way that the replica on the node assigned for queued backup gets removed. User Impact: Queued backups failed. Resolution: Users can wait for queued backups to complete before running the ha-reduce command, or re-issue the backup command once the HA reduce is complete. |
PWX-19805 | Portworx couldn't unmask the rpcbind service. User impact: Portworx could not properly integrate with NFS services, if NFS services were masked on the host. As a consequence, sharedV4 volumes could not be served from that host. Resolution: The Portworx service startup/restart now checks for masked NFS services, and automatically unmasks them. |
PWX-19802 | Migrations were failing with error - "Too many cloudsnap requests please try again" User Impact: If a cluster migration was triggered which migrated more than 200 volumes at once, the migration would fail since Portworx would rate limit the cloudsnap requests. Resolution: Cluster Migration will not fail if the internal cloud snap requests are rate limited. Portworx will gracefully handle those "busy" errors and retry the operation until it succeeds. |
PWX-19568 | When an NVMe partition is specified as a storage device, the device state is shown as "offline". User impact: Users sometimes saw their partitioned NVMe drive state as "offline". Resolution: NVMe partitions given as storage devices now correctly show as "online". |
PWX-19250 | Talisman px-wipe did not support etcd with a username and password. User impact: The automated cluster-wipe procedure would not purge the cluster's data from the key-value database when an external username/password-enabled etcd was configured as the cluster's KVDB. Resolution: The cluster-wipe procedure now purges the data from username/password-enabled etcd as well. |
PWX-19209 | OCI-Mon/containerd occasional install/upgrade glitches. User impact: During Portworx upgrades on containerd container runtimes, there was a race condition with the cleanup of the install container, which could block the upgrade until the node was rebooted. Resolution: The cleanup procedure was improved, so the race condition no longer occurs. |
PWX-18704 | Drive add did not respect limit_drives_per_pool . User impact: When the runtime option limit_drives_per_pool was set to 2, a pool delete/initialize/drive add operation sometimes resulted in pools of more than 2 drives. Resolution: The px-runc install-time configuration limit_drives_per_pool is now honored during drive add operations. |
PWX-18370 | A tracker file from the previous Portworx version was not deleted during a wipe, causing sharedv4 volume mount issues. User impact: If you deleted a Portworx installation and reinstalled a new version, the wipe process did not remove a tracker file used for sharedv4 volumes, causing new sharedv4 volume mounts to fail after reinstallation. Resolution: The tracker file is now removed. |
PWX-20485 | In versions earlier than Portworx 2.7.0, pxctl volume check on an ext4-formatted secure pxd volume returned the "Background Volume Service not supported on Encrypted volumes" error. User Impact: Users couldn't use pxctl to run volume checks on ext4-formatted secure pxd volumes. A workaround was to run fsck directly from within the Portworx container. Resolution: Portworx now supports volume check on ext4-formatted secure pxd volumes using pxctl. |
PWX-20303 | Fixed an issue where KVDB updates were not pulled in from other nodes when this node is unable to get the updates from KVDB. |
PWX-20398 | Fixed an issue where request processing started before the px-storage process was initialized causing it to restart. User impact: These restarts may have created core files, users may have seen the process restart. |
PWX-20690 | The px-storage process incorrectly finished processing internal timestamps causing it to restart. User impact: These restarts may have created core files, users may have seen the process restart. |
PWX-19005 | Cloudsnaps may be stuck irrespective of whether you issued stop on the cloudsnap taskID. This sometimes happened when the local snapshot was deleted while cloudsnap was still active. In these conditions, snap detach operations failed and caused cloudsnaps to get stuck. User Impact: Users saw stuck cloudsnaps, which could only be fixed by restarting the node where the stuck cloudsnap was active. Resolution: Detach failures are not retried forever, minimizing this scenario. |
PWX-20708 | Cloudsnap size was not tracked correctly for incremental cloudsnaps in cloudsnap metadata. User Impact: Users may not see correct cloudsnap size because of this issue. Resolution: Now the incremental cloudsnap size is tracked correctly. |
PWX-20364 | An incorrect check required all nodes in a ReplicaSet to be online. User impact: Cloudsnaps could fail on restart even when some of the replica nodes were online. Resolution: Removed the check requiring all replica nodes to be online. |
PWX-20237 | Cloudsnap operations did not choose the node where the previous cloudsnap was executed, specifically after reducing the HA level of a volume. User impact: Some cloudsnaps could be shown as full even though they could have been incremental. Resolution: If the volume replicas differ between the previous cloudsnap and the current cloudsnap, Portworx now chooses the same replica node where the previous backup ran. |
Known issues (Errata)
Portworx is aware of the following issues; check future release notes for fixes to these issues:
Issue Number | Issue Description |
---|---|
PD-896 | With 2.8.0, the --sharedv4_service_type option was added to the pxctl volume create and pxctl volume update commands. This new option is not applicable to non-Kubernetes environments, such as Nomad . This option works only in the Kubernetes environment. |
PWX-20666 | The issue occurs during the first boot of a newly created cluster if the node runs Ubuntu 20.04/Fedora CoreOS 33.20210301.3.1 as the host and you use the -j auto option. Any Linux distribution using systemd version 245 and above appears to have this problem; Ubuntu 18.04 containers use systemd version 237. User impact: Portworx takes additional time to come up. Recommendation: Do not use the -j auto option. If you do use the option, consider installing the parted command on the host. If that is not possible, wait for Portworx to stabilize. This wait is only needed on the first boot after a new cluster is created. |
PWX-20836 | When using Portworx cloud drives and using a node selector in the StorageCluster CRD specification, a node may come up with a drive configuration that is different from what is specified in the StorageCluster CRD under the node selector. This occurs when one Portworx node creates a drive based on the node selector configuration. If that drive is not currently being used by that node, another node that is trying to initialize may pick up this available drive (even if that newly created drive does not match with the user specified configuration for that node). |
OPERATOR-410 | A workaround for deploying Operator on OCP (IBM) with nodes labeled as worker/master: Remove the nodeAffinity node-role.kubernetes.io/master from the live StorageCluster spec to deploy OCI pods on control plane nodes. |
PD-891 | Currently, Photon OS versions 3.0 and 4.0 do not allow a user to install mdadm using the default yum and tdnf repositories. As this package is required for px-cache deployment, Pure Storage does not support caching on Photon OS in version 2.8.0. |
PWX-20839 | Portworx 2.8.0 does not support the Photon distro for px-cache. |
PWX-20586 | When creating a burst of PVCs (a large number of PVCs within few seconds) with the Portworx CSI driver or a separate Portworx PVC controller, the Portworx volume creations can get stuck in a pending state. Under this condition, pxctl volume list will show these volumes as “down - attached”. The volume creates will eventually converge and complete, but this can take more than 1 hour.Workaround: Stagger PVC creations in smaller batches if using the CSI driver or the Portworx PVC controller. |
PWX-13190 | The pxctl credentials delete <uid> command is currently not supported for Kubernetes secret provider. To delete the credentials use kubectl delete secret <> -n <namespace> , where <namespace> is where Portworx is installed. |
PWX-9487 | In a metro DR setup where Portworx spans across 2 data centers, a node can be started with an argument --cluster_domain and value set to witness . This node will act as a witness node between the two data centers and it will also contribute to quorum even if it is started as storageless node. |
PWX-20766 | The Portworx CSI Driver on Openshift 3.11 may have issues starting up when a node is rebooted. Customers are advised to upgrade to Openshift 4.0+ or Kubernetes 1.13.12+. |
PWX-20766 | The Portworx CSI Driver will not be enabled by default in the PX-Installer for Kubernetes 1.12 and earlier unless a flag csiAlpha=true is provided. |
PWX-20423 | Portworx uses a set of default export options in storage classes which cannot currently be overridden. The export_options parameter only allows extending the current default options. |
PD-915 | If a storage node is removed or replaced from the cluster and that node was the NFS server for a sharedv4 volume, client application pods running on other nodes can get stuck in pod Terminating state when deleted. User impact: Application pods using sharedv4 volumes remotely will get stuck in Terminating state. Recommendation:
|
PD-914 | When a pod that is using a sharedv4 service volume is scheduled on the same node where the volume is attached, Portworx sets up the pod to access the volume locally via a bind mount. If the pod is scheduled on a different node, the pod uses an NFS mount to access the volume remotely. If there is a sharedv4 service failover, the volume gets attached to a different node. After the failover, pods that were accessing the volume remotely over NFS continue to have access to the volume. But pods that were accessing the volume locally via bind mount lose access to the volume even after Portworx is ready on the node, since the volume is no longer attached to that node. Such pods need to be deleted and recreated so that they start accessing the volume remotely over NFS. If Stork is enabled in the Kubernetes cluster, it automatically deletes such pods, but manual intervention may sometimes be required if Stork is either not installed or fails to restart such pods. Recommendation: Enable Stork to reduce the likelihood of running into this issue. If you do run into this issue, use the command kubectl delete pod -n <namespace> <pod> to delete the pods that can no longer access the sharedv4 service volume because the sharedv4 service failed over to another node. This problem does not apply to pods that are using sharedv4 volumes without the service feature, nor to pods that are accessing sharedv4 service volumes remotely, i.e., from a node other than the one where the volume is attached. |
PD-926 | For information about Sharedv4 service known issues, see the notes in the Provision a Sharedv4 Volume section of the documentation. |
2.7.4
August 27, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-21057 | Pods failed to come up with restored PVCs that were encrypted with Vault namespace secrets. User impact: Pods using a PVC cloned from a snapshot that was encrypted using Vault namespace secrets remained stuck in the ContainerCreating state. Resolution: Portworx now copies over the required values for encrypted volumes to the cloned PVC, so pods no longer remain stuck in ContainerCreating because of this issue. |
PWX-21139 | In a DR setup, during the failover and failback of an encrypted volume, some labels that Portworx used for encrypting the volumes got removed. User impact: Failback or restore of encrypted volumes using per-volume secrets would fail as restore of the volume on the source cluster would fail. Resolution: The cloud backups and restores done as a part of failover and failback of encrypted volumes ensure that the encryption related labels are not removed. |
2.7.3
July 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20323 | Portworx now tries to reconnect to the KVDB at least 3 times before restarting the Portworx process. |
PWX-19994 | Added two new runtime options: quorum_timeout_in_seconds : Sets the maximum time, in seconds, for which nodes will wait to reach quorum. After this timeout, Portworx will restart. kv_snap_lock_duration_in_mins : Sets the maximum timeout for which Portworx will wait for a KVDB snapshot operation to complete. After this timeout, Portworx will panic and restart if the snapshot does not complete. See the sketch after this table. |
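A sketch of one way the two runtime options above might be set. Supplying them through the StorageCluster spec's runtimeOptions map is an assumption about an Operator-managed install, the cluster name and namespace are placeholders, and the values shown are arbitrary examples; only the option names and their semantics come from PWX-19994.

```bash
# Hedged sketch: assumes an Operator-managed install where runtime options are
# carried in the StorageCluster spec; name, namespace, and values are illustrative.
kubectl -n kube-system patch storagecluster px-cluster --type merge -p \
  '{"spec":{"runtimeOptions":{"quorum_timeout_in_seconds":"120","kv_snap_lock_duration_in_mins":"30"}}}'
```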
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20641 | Red Hat OpenShift 4.7.16 write-locks partitions Portworx uses during upgrades. Portworx versions 2.5.x through 2.7.2.1 generated chattr-protected immutable /etc/pwx/.private.json files. Lastly, OpenShift 4.7 started protecting the root partition (using read-only mountpoints). User impact: These files were sometimes collected into OpenShift's historical snapshots and interfered with OpenShift upgrades. Resolution: The px-runc startup service now scans and fixes any immutable .private.json files found in CoreOS file-system snapshots and OpenShift snapshots. This works for both read-only and read-write partitions. |
PWX-15391 | When using an internal KVDB, Portworx encountered an error if one of the KVDB nodes went down. User impact: Users saw impacted filesystems enter read-only mode. Resolution: The run-flat feature keeps the Portworx volumes online, even if the KVDB is down. New create/attach/mount operations are not allowed, but existing volumes don't see an I/O interruption as long as all the replicas of the volume are online. Note the following implications for the external and internal KVDB: External KVDB: as long as all the Portworx nodes continue to stay up, all the volume replicas should stay online and with no I/O interruption. However, if any node goes down, volumes with a replica on the down nodes will see an outage. Internal KVDB: if the KVDB is down, it almost certainly implies that at least 2 Portworx nodes are down. Any volumes with a replica on the down nodes will see an I/O interruption. |
PWX-20075 | CSI VolumeSnapshotContent objects incorrectly displayed a restore size of 0. User impact: External backup systems that depend on the CSI VolumeSnapshotContent restore size sometimes failed. Resolution: The Portworx CSI driver now correctly adds the restore size to new CSI volume snapshots, and snapshot contents will have the correct RestoreSize. |
PWX-19518 | Overwriting a cluster wide secret when using Vault Namespaces failed with the NotFound error. User impact: Users were unable to use Vault as their secret management store. Resolution: The issue has been fixed and Portworx now uses the correct vault namespace while resetting the cluster wide secret. |
PWX-20149 | Portworx encrypted volume creation failed due to lock expiration. User impact: In certain scenarios, Portworx encrypted volume creation took longer than expected and eventually timed out. Resolution: Portworx no longer times-out when creating encrypted devices. |
PWX-20519 | On vSphere environments experiencing high I/O latency, Portworx cluster installation failed while setting-up the internal KVDB. User impact: Users saw the internal KVDB fail to initialize the disks within the allocated time. Resolution: Portworx now initializes a "thin" disk, rather than a "zeroedThick" disk by default; this option can be overridden. |
2.7.2.1
June 25, 2021
Notes
- Portworx 2.7.2.1 once again supports installations on Ubuntu 16.04 with the 4.15.0-142-generic and 4.15.0-144-generic kernels. See the list of supported kernels for information.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-20594 | Portworx erroneously allowed the creation of large replication sets, causing the px-storage process to create a core file.User impact: Users required help from support to fix the volume definition. Resolution: Portworx no longer allows the creation of replication sets larger than 3. |
PWX-20629 | I/O did not progress for Portworx volumes and mount points until the px-storage process was restarted. User impact: I/O may not have progressed and users may have had to restart the px-storage process.Resolution: Portworx no longer requires a restart of the px-storage process. |
PWX-20619 | I/O to the sharedv4 volumes was blocked while the NFS server reloaded the exports. When there were a large number of sharedv4 volumes being exported from the node, I/O was blocked for a prolonged period of time. User impact: Apps using the df command saw it take a long time to complete, causing pods to reset unnecessarily when df was used as a health-check. Removing one of the export options fixed this issue.Resolution: The df command will no longer slow down under these circumstances. |
PWX-20640 | When uploading diagnostics, Portworx made its configuration file immutable. User impact: When made immutable, configuration files may interfere with OpenShift upgrades. Resolution: Portworx no longer makes its configuration file immutable when uploading diagnostics. |
2.7.2
May 18, 2021
Notes
- Portworx 2.7.2 no longer supports installations on Ubuntu 16.04 with the 4.15.0-142-generic kernel. Upgrade to Ubuntu 18.04. See the list of supported kernels for information.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-20106 | Portworx now supports various kernels hosted on IBM Cloud in air-gapped environments. Specifically, Portworx supports the 4.15.0-142-generic kernel on Ubuntu 18.04. |
PWX-20072 | Users can now install Portworx from the IBM catalog onto a private cluster and enable the integrated license and billing feature for these clusters. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14559 | The Read and write throughput values in Grafana and Prometheus were erroneously transposed. User impact: Users saw read throughput values where they expected to see write throughput values, and vice versa. Resolution: These values now reflect the correct metric in Grafana and Prometheus. |
2.7.1
April 29, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18530 | Storage pool caching now supports caching on SSD pools, which will be cached with NVME drives if they're available. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-19368 | If Floating licenses were used on a cluster, wiping a node or the cluster didn't return the license leases back to the license server. User impact: Users had to wait for the license leases to expire in order to reuse them. Resolution: Wiping a node or cluster now correctly releases the license leases back to the license server. |
PWX-19200 | Portworx was unable to attach AWS cloud drives when running on AWS Outpost. User Impact: When running on AWS Outpost, Portworx failed to attach the backing drives, causing the cluster initialization to fail. Resolution: Portworx now includes the Outpost ARN in the EBS volume creation, which allows the volume to be attached in the instance |
PWX-19167 | In very rare cases, the px-storage process sometimes aborted and restarted due to a race condition when releasing resources.User impact: Users experienced no interruptions, but may have seen Portworx restart. Resolution: This race condition no longer occurs. |
PWX-19022 | Pods sometimes failed to mount a volume and may not have started if that volume was first attached for background work and then later attached for mounting purposes. User impact: Users saw their pods fail to start. To correct this, they had to detach and reattach the volume, usually by stopping and restarting the affected application. Resolution: Pods no longer fail to mount volumes under these circumstances. |
PWX-18983 | An issue with security and non-CSI deployments during volume deletion caused Portworx to return incorrect information when it detected an error during a request to inspect a volume. User impact: Users saw incorrect information when inspecting a deleted volume. Resolution: Portworx now displays the correct information when inspecting a volume under these circumstances. |
PWX-18957 | After recovering an offline cluster from a KVDB backup file, that cluster's license entered an invalid state. This was caused by the ClusterUUID in the restored KVDB having extra quotes. User impact: Users attempting to recover clusters in this manner saw their licenses fail to restore correctly. Resolution: During the recovery process, Portworx now ensures that no extra quotes are added to the ClusterUUID once the recovery is done. |
PWX-18641 | Portworx displayed an incorrect alert for snapshots when the parent volume's HA level was decreased. User impact: Users may have seen this incorrect alert. Resolution: Portworx will no longer attempt to run HA level reduce operations on snapshots which have an HA level of 1 (which fails and triggers incorrect alert) when the HA level is reduced on the snapshot's parent volume. |
PWX-18447 | Portworx enabled the sharedv4/NFS watchdog even if the --disable-sharedv4 flag was set. User impact: Despite disabling the sharedv4 volume feature, users may have seen errors about NFS/sharedv4 being unhealthy. Resolution: Portworx no longer enables the NFS watchdog if sharedv4/NFS is disabled. |
PWX-17697 | Users couldn't remove storage pool labels. User impact: When users attempted to remove storage pool labels, the command returned Pool properties updated , but Portworx didn't remove the label. Resolution: Storage pools now have the same behavior as volumes: you can remove a previously added label by passing --labels <key>= without a value; see the example after this table. |
PWX-7505 | Unsecured nodes could be added to a secured cluster, specifically if those nodes are part of a different Kubernetes cluster with a different configuration manifest. User impact: This allowed any of the unsecured nodes to join a secured cluster. As a result, the API endpoints for the unsecured nodes would be unsecured and allow anyone to execute any pxctl or RPC request. Resolution: Portworx can now be configured using the PORTWORX_FEATUREGATE_CHECK_NODE_SECURITY feature gate to prevent unsecured nodes from joining a cluster if at least one node is secured. |
PWX-19503 | The px-storage process initialization got stuck if the "num_cpu_threads" or "num_io_threads" rt_opts value did not equal the "num_threads" rt_opt value.User impact: Portworx didn't come up, and users needed to remove the "num_threads" rt_opts for the px-storage process to finish initialization.Resolution: Portworx initialization no longer gets stuck. |
PWX-19383 | A Kubernetes RBAC issue with the CSI resizer installation caused CSI PVC resizing to fail. User impact: Users saw CSI PVC resize failures. Resolution: The Portworx spec generator now correctly adds the necessary RBAC for the CSI Resizer to function properly. |
PWX-19173 | CSI VolumeSnapshotContent objects incorrectly displayed a restore size of 0. User impact: External backup systems that depend on the CSI VolumeSnapshotContent restore size sometimes failed. Resolution: The Portworx CSI driver now correctly adds the restore size to a VolumeSnapshotContent object. |
PWX-18640 | The Portworx alert VolumeHAUpdateFailure has been updated to VolumeHAUpdateNotify for cases where the update is not failing. User impact: Users saw misleading VolumeHAUpdateFailure alerts when an update succeeded. Resolution: Portworx alerting system sends the correct alarm event for this case. |
PWX-19277 | Cloudsnaps sometimes failed to attach the internal snap for an aggregated volume if the node containing the aggregated replica was down. User impact: While the cloudsnap operation was marked as failed, the error description did not display the correct error message. Solution: Cloudsnaps no longer fail to attach, and error messages now correctly indicate that the node is down. |
PWX-19797 | With 2.7.0, cloudsnap imposed restrictions on active cloudsnap commands being processed. User impact: Async DR sometimes failed for some volumes. Solution: 2.7.1 increases the number of commands being processed to a much higher value, thereby avoiding async DR failures. |
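As a follow-up to PWX-17697 above, removing a previously added pool label could look like the following. The pxctl service pool update subcommand, the pool ID, and the iotype label are assumptions for illustration; only the empty-value --labels <key>= form comes from the fix description.

```bash
# Assumed subcommand for editing pool properties; only the empty-value
# "--labels <key>=" form for removing a label comes from PWX-17697.
pxctl service pool update 0 --labels iotype=   # clears the previously added "iotype" label on pool 0
pxctl service pool show                        # verify that the label was removed
```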
2.7.0
March 23, 2021
New features
- Announcing the
auto
IO profile, which applies an IO profile to optimize volume performance based on the workload data patterns it sees.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16376 | Multiple KVDBs will now run within the same failure domain if there aren't enough failure domains available to place each KVDB on its own. |
PWX-11345 | Introducing a new Prometheus metric to track latencies Portworx sees from the KVDB. This metric tracks the total time it takes for a KVDB put operation to result in a corresponding watch update from the KVDB. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18789 | Expanding a pool using the lazyzeroedthick disk type in vSphere failed with the error: "found no candidates which have current drive type: lazyzeroedthick". User impact: If a user expanded a pool using the lazyzeroedthick disk type in vSphere, it failed with a message like: "could not find a suitable storage distribution candidate due to: found no candidates which have current drive type: lazyzeroedthick". Resolution: This happened because a section for the lazyzeroedthick disk type was missing in the storage decision matrix config map that ships with Portworx; this has now been added. |
PWX-17578 | Updating volumes with pxctl commands sometimes reset other values that were previously set by a spec. User impact: If users updated the queue-depth value using the pxctl volume update --queue-depth command, the directIO value for that volume was reset.Resolution: Portworx no longer resets volume-specific fields when other fields are updated using pxctl commands. |
PWX-18724 | When using the runtime option rt_opts_conf_high under heavy load, the Portworx storage process sometimes ran out of internal resources and had to restart due to an assertion failure. User impact: Upon restart, the Portworx process may have gotten stuck in a restart loop, resulting in application downtime. Resolution: The resources are now sized correctly when the rt_opts_conf_high runtime option is in use. |
PWX-18632 | Portworx displayed expiration dates for permanent licenses. User impact: Users saw a distinct expiration date for their permanent licenses. Despite this reporting error, permanent licenses would not actually expire. Resolution: Portworx now correctly reports permanent licenses as never expiring. |
PWX-18513 | If a volume with an io_profile set to db_remote and replication level of 1 was backed-up using a cloudsnap, attempting to restore that cloudsnap would fail through Stork or in PX-Backup. User impact: Users attempting to restore this kind of cloudsnap without providing an additional parameter to force the replication level to 2 encountered an error. Resolution: The cloudsnap restore operation now resets the io_profile to sequential if it finds a volume with a replication level of 1 and io_profile set to db_remote . |
PWX-18388 | Pool expand operations using the resize-disk method failed on vSphere cloud drive setups. User impact: Users with storage pools backed by vSphere cloud drives could not expand them using the resize-disk method. Resolution: The resize-disk method failed because the rescan-scsi-bus.sh script was missing from the Portworx container. This script has been restored, and users can once again expand vSphere cloud drive storage pools using resize-disk. |
PWX-18365 | Portworx overrode the cluster option for optimized restores if a different runtime option for optimized restores was provided. User impact: Because Portworx prefers cluster options over runtime options as a standard, users may have been confused when this runtime option behaved differently. Resolution: Portworx no longer honors runtime options for optimized restores; you must use cluster options to enable optimized restores. |
PWX-18210 | The px/service node-label is used to control the state of the Portworx service. However, the px/service=remove label did not properly remove Portworx. User impact: When users attempted to remove a node, Portworx became stuck in an uninstall loop on that node. Resolution: The px/service=remove label now behaves as it previously did and uninstalls Portworx on the node as expected (see the example after this table). |
PWX-17282 | Previously, every Portworx deployment using the Operator included Stork, regardless of whether or not you enabled the Stork component on the spec generator. User impact: Users' deployments always included Stork, even if they did not want to enable it. Resolution: The spec generator now correctly excludes Stork if you don't enable it. |
PWX-19118 | A resize operation performed while a storage node was in the StorageDown state caused the px-storage process to restart if the node had a replica of the volume being resized. User impact: Users experienced no interruptions, but may have seen Portworx restart. Resolution: The px-storage process no longer restarts under these circumstances. |
PWX-19055 | Portworx clusters did not auto-recover when running on vSphere cloud drives in local mode. User Impact: When users installed Portworx using vSphere cloud drives on local, non-shared datastores and used an internal KVDB, Portworx did not automatically recover if a storage node went down. Resolution: Portworx will no longer incorrectly mark its internal KVDB drives as storage drives, allowing the internal KVDB to recover as expected. |
PWX-17217 | Portworx failed to exit maintenance mode after a drive add operation was shown as done. User impact: Portworx stayed in maintenance mode and users could not exit it. Resolution: Portworx now properly exits maintenance mode after a drive add operation. |
PWX-19060 | When Portworx was configured to use email, an entry was printed to the log that contained hashed email credentials. User impact: Hashed, potentially sensitive information may have been written to the logs. Resolution: Portworx no longer prints this hashed information into the log. |
PWX-19028 | Portworx sometimes hung when evaluating multipart licenses where one of the licenses had expired. User impact: Users saw Portworx hang and had to reset the Portworx node. Resolution: Portworx no longer hangs in this scenario. |
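For reference, the node label mentioned in PWX-18210 above is applied with kubectl. The following is a minimal sketch; the node name is a placeholder, and the remove value triggers the uninstall behavior described in that fix.

```
# Uninstall Portworx from a specific node via the px/service node label.
kubectl label nodes <node-name> px/service=remove --overwrite
```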
Notes
- Portworx 2.7.0 is not currently supported on Fedora 33
Known issues (Errata)
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PWX-19022 | Attaching/Mounting a volume on the same host where it was attached internally for background work (such as cloudsnaps) fails to create the virtual kernel device. User impact: The pod may fail to mount the volume and may not start. Recommendation: Detach and re-attach the volume to fix this issue. |
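The workaround recommended for PWX-19022 above can be performed with pxctl. A minimal sketch follows, assuming the affected volume name as a placeholder; run it from a node in the cluster.

```
# Detach the volume from its current node, then re-attach it.
pxctl host detach <volume-name>
pxctl host attach <volume-name>
```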
2.6.5
March 6, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18967 | Portworx occasionally locked volume creation for a prolonged period when taking more than one cloudsnap for the same volume. User Impact: Users experienced longer response times when creating new volumes. Resolution: Cloudsnaps now lock only the volume they're being taken on, and no longer interfere with volume creation. |
2.6.4.1
February 18, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18625 | In certain corner cases, when a volume being restored on a destination cluster was deleted before a restore completed, async DR or migration sometimes got stuck. User Impact: Some of the nodes in the destination cluster may have become slow, with logs showing prints similar to this: time="2021-02-16T21:51:23Z" level=error msg="Failed to attach cloudsnap internally :774477562361631177 err:Volume with ID: 774477562361631177 not found" Resolution: Portworx now fails the restore operation if a volume is deleted before a restore completes. |
2.6.4
February 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-18549 | Storage pools with an auto journal partition can now be expanded with a drive resize operation. Use pxctl service pool expand -o resize-disk for this operation. |
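A hedged sketch of the expand operation referenced in PWX-18549 follows; the pool UID and target size are placeholders, and the exact flags should be confirmed with pxctl service pool expand --help on your version.

```
# List pools to find the pool UID, then expand the pool by resizing its drives.
pxctl service pool show
pxctl service pool expand --uid <pool-uid> --size 500 --operation resize-disk
```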
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-18403 | When running vSphere cloud drives, Portworx initialization sometimes failed due to a time out in looking up the disk paths. User impact: Users with VMs containing 2 or more disks that don't show up in the /dev/disk/by-id path saw Portworx initialization time out. Portworx looked for the /dev/disk/by-id path for each disk for 2 minutes before timing out.Resolution: Portworx will now perform a udevadm trigger if it cannot find the disks; the timeout has been removed. |
2.6.3
January 15, 2021
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-17546 | Portworx deployments no longer use network ports 6060 and 6061. |
PWX-16412 | Added support for proxy via the PX_HTTP_PROXY environment variable for usage-based reporting APIs. |
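For DaemonSet-based installs, one hedged way to supply the proxy variable from PWX-16412 is with kubectl set env; the namespace, DaemonSet name, and proxy URL below are placeholders, and Operator-managed clusters would instead set the variable in the StorageCluster spec.

```
# Set the HTTP proxy used for usage-based reporting API calls.
kubectl -n kube-system set env daemonset/portworx PX_HTTP_PROXY=http://proxy.example.com:3128
```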
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-17663 | In certain scenarios where the PV to PVC mapping was changed out of band, Portworx showed incorrect "Volume Consumers" under the volume inspect command output. User impact: Users saw incorrect values in the volume inspect command output. Resolution: Portworx now correctly shows values for "Volume Consumers" in the inspect command output. |
PWX-17155 | Sharedv4 volume consumers were tracked based on node ID, which changes when a storageless node becomes a storage node. User impact: Sharedv4 pods running on a storageless node lost access to the mounted sharedv4 volume when that node was restarted to assume the role of a storage node. Resolution: Portworx now uses the node IP instead of the node ID to track sharedv4 clients, as the node IP remains the same when the role changes from storageless to storage node. |
PWX-17699 | Portworx created the incorrect type of vSphere disks when using Portworx disk provisioning. User impact: Portworx incorrectly parsed the lazyzeroedthick disk type provided by users in the vSphere cloud drives spec, and instead created the default eagerzeroedthick disks. Resolution: Portworx now correctly parses the spec and creates the correct disk type. |
PWX-17450 | Volume mount operations inside pods on destination clusters sometimes failed after async DR/Migration. User impact: Users sometimes needed to restart their pods after DR/Migration to correct failed volume mounts. Resolution: These volume mount operations no longer fail. |
PWX-18100 | Expanding pools with the add-drive option created new pools instead of expanding an existing pool. User impact: Users saw new pools created when they were expecting pools to expand in size. Resolution: Portworx now correctly expands pools when the add-drive option is provided. |
2.6.2.1
January 7, 2021
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-17725 | Migration cloud snapshots sometimes failed due to overlapping extents when being transferred to the cloud. User impact: Users saw migrations partially pass. Resolution: Cloud snapshots now handle the transfer of overlapping extents more gracefully to achieve successful cloud snapshots and migrations. |
2.6.2
December 7, 2020
New features
- Announcing a new command for transferring a Portworx cloud driveset from one storage node to a storageless node. This command is currently supported only for Google Cloud Platform, and is not supported when Portworx is installed using an internal KVDB.
- Portworx now allows you to drain/remove volume attachments from a node through the pxctl service node drain-attachments command (see the sketch after this list).
- Portworx now supports IBM Hyper Protect Crypto Services (IBM HPCS) as a key management store.
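A hedged sketch of the drain command named above follows; the subcommand and flag names here are assumptions based on typical pxctl layouts, so confirm them with pxctl service node drain-attachments --help before use.

```
# Submit a drain job for a node's volume attachments, then check its progress.
# The node ID is a placeholder.
pxctl service node drain-attachments submit --node <node-id>
pxctl service node drain-attachments status --node <node-id>
```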
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-17154 | Portworx now supports a NO_PROXY environment variable with a licensing code that defines which hosts will not use an HTTP proxy for licensing actions. In addition, PX_HTTP_PROXY and PX_HTTPS_PROXY environment variables will be ignored when using license-servers and floating licenses, unless you specify the PX_FORCE_HTTP_PROXY=1 environment variable (i.e. Portworx will assume local-access when working with Floating licenses). |
PWX-16410 | Portworx now supports K3s v1.19. |
PWX-15234 | You can now delete pending references to credentials from the KVDB. |
PWX-16602 | Portworx now allows for a runtime option to disable zero detection for converting zero-filled buffer to discard. |
PWX-14705 | You can now set the "Sender" for the email alerts as follows: sv email set --recipient="email1@portworx.com;email2@portworx.com" . |
PWX-16678 | Portworx now displays alerts when it overrides the user-provided value for the maxStorageNodesPerZone parameter. |
PWX-16655 | Portworx now supports the pxctl service pool cache status <pool> command in all operational modes. A number of pxctl CLI commands have been moved to be supported only in "pool maintenance" mode and are deprecated from "maintenance" mode. |
PWX-13377 | Applied the latest OS updates to resolve most of the vulnerabilities detected. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16137 | A race-condition between Portworx and Docker occurred at startup. User impact: If the node cannot install NFS-packages (i.e. air-gapped host environments), and runs Portworx with the Docker container-runtime, users might encounter a race-condition on node reboot that leads to Docker creating new local volumes instead of using existing Portworx volumes. Resolution: Portworx now detects and prevents Docker from creating new local volumes on node reboot. |
PWX-16136 | Portworx pods hung when the dbus service became unresponsive.User impact: In situations where the dbus service running on the host became unresponsive, the Portworx pod wasn't able to upgrade Portworx or propagate configuration changes. Resolution: Portworx now detects when the dbus service is unresponsive and fails over to different methods to run the Portworx upgrades or change configuration. |
PWX-16413 | A harmless warning was sometimes displayed during Portworx installation. User impact: When installed into the staging area, Portworx displayed a warning when setting up the services; these warnings were safe to ignore, but caused unnecessary confusion. Resolution: These warnings are no longer displayed. |
PWX-16420 | Portworx pods running on Ubuntu/20.04 nodes did not proxy the portworx-output.service logs to the OCI-Monitor display. User impact: On Ubuntu/20.04, the Portworx pods did not proxy the Portworx service logs. This made troubleshooting difficult on Kubernetes deployments that do not provide SSH access to the hosts. Resolution: The Portworx pods now correctly proxy the Portworx service logs. |
PWX-16418 | When running on Kubernetes nodes using the ContainerD container runtime, Portworx installs and upgrades sometimes failed. User impact: If a Portworx installation or upgrade was interrupted on the nodes using the ContainerD container runtime, further attempts to install or upgrade Portworx sometimes failed until the node was rebooted. Resolution: Portworx now performs a more thorough cleanup before each install and upgrade operation, solving this issue. |
PWX-16853 | Multipart license expiration was not clear. User impact: When running Portworx on a multipart license (i.e. a license composed out of several parts/ActivationIDs), different parts of the license could expire at different times. The pxctl status and pxctl license list commands didn't provide any indication that some parts of the license would expire sooner. Resolution: The pxctl status and pxctl license list commands now display a notice if a part of the license will expire sooner than the overall license. |
PWX-16769 | License updates sometimes failed to propagate across all nodes. User impact: For users with ETCD as the KVDB and aggressive ETCD compactions, some of the nodes in Portworx cluster skipped the automatic application of updated licenses. Users had to restart the Portworx service to pick up the changes. Resolution: Portworx no longer skips license updates when aggressive ETCD compactions are used. |
PWX-16793 | Sometimes Portworx installation timed out while installing NFS packages. User impact: Some host environments have a large number of package repositories or a slow internet connection, which could lead to timeouts while installing NFS services during Portworx installation or upgrade, ultimately resulting in an inability to use sharedv4 volumes that depend on NFS. Resolution: Portworx installs have been sped up and avoid timeouts during NFS services installation. |
PWX-17057 | With Linux kernel 4.20.x and above, NFS pool stats increment at a much slower pace (less often than once an hour), or may not increment at all if there are no exports. User impact: With Portworx versions prior to 2.6.2, there may have been false alarms about all NFS threads being busy when there were no exports. Resolution: Portworx no longer processes the NFS pool stats when there are no NFS exports. |
PWX-16775 | The Google object store did not have pagination for object enumeration, which caused any list call to list everything in the bucket. User impact: Cloudsnap backups and restores failed to start and the request timed out. Listing cloudsnaps through pxctl also timed-out.Resolution: Added pagination to object enumeration with the Google object store. |
PWX-16796 | The proxy username and password were ignored as part of PX_HTTP_PROXY on Portworx Essentials, causing license renewal to fail. User impact: Portworx Essentials clusters went into "license expired" mode when PX_HTTP_PROXY was set. Resolution: Portworx Essentials now honors the Username and Password fields given as part of PX_HTTP_PROXY to successfully make connections with the proxy. |
PWX-16072 | When Portworx was installed in PAYG mode, the cluster license expired after being unable to connect to the billing server instead of going into maintenance mode. User Impact: When a PAYG node license expired, users had to recommission the node. Resolution: Portworx PAYG nodes now enter maintenance mode correctly when the billing server is unreachable. |
PWX-16429 | Adding a new drive using the pxctl service drive add command was failing due to an issue with applying labels on the new pool. User impact: If users wanted to add a new Portworx pool to the node, their command to add a new drive using pxctl service drive add failed with an error message about labels being too long. This prevented users from creating new pools. Resolution: Portworx no longer applies Kubernetes labels to the backend storage pools. |
PWX-16206 | Portworx failed to correctly detect the value of the maxStorageNodesPerZone parameter when running on GKE. User impact: When running on GKE with autoscaling enabled, Portworx did not detect the preferred value to use for the maxStorageNodesPerZone parameter for its cloud drives. As a result, Portworx would run without a value, or with an incorrect value, for the maxStorageNodesPerZone parameter. This resulted in issues when the cluster size was scaled and unintended nodes became storage nodes. Resolution: The calculation of the maxStorageNodesPerZone parameter based on the GKE autoscaling minimum pool size has been fixed. In addition, if the minimum number of nodes in the cluster is lower than the total number of zones, at least one Portworx node will now be made a storage node in a given zone. |
PWX-16554 | An incorrect check prevented HA level restoration for volumes with HA level 3 under some conditions during decommission operation. User impact: Decommission operation failed to restore HA level for affected volumes if the volume had HA level 3 under some conditions. Resolution: When you decommission a node, Portworx now properly restores the HA level. |
PWX-16407 | DR migration sometimes became stuck in the "Active" state when Portworx restarted while migration was beginning . User impact: Any additional DRs also became stuck and did not finish. Resolution: Portworx now handles this situation better. |
PWX-16495 | Currently, the KVDB lock is held over the Inspect API. If the remote node didn't respond, Portworx held the KVDB lock for more than 3 minutes and the node where the Attach was issued asserted. User impact: The Portworx service could assert and restart if it tried to remotely detach a volume from a peer node but the request to do that took more than 3 minutes. Resolution: Portworx will not assert and restart if such a request gets stuck or does not return within a given timeout. |
PWX-13527 | Internal KVDB startup would fail. This would usually require a wipe of the node and re-install. User impact: Users saw the following error: "Operation cannot be fulfilled on configmaps". Resolution: Portworx will now detect such errors received from Kubernetes and will retry the operation instead of exiting. |
PWX-16384 | When a node with a sharedv4 encrypted volume attached was rebooted, the volume was not re-exported over NFS for other nodes to consume. User impact: Since it's an encrypted volume, it couldn't be attached without a passphrase. Resolution: Portworx now triggers a restart of the app, which will reattach the encrypted sharedv4 volume. |
PWX-16729 | A particular race condition caused an unmount of a sharedv4 volume to succeed without actually removing the underlying NFS mountpoint. User impact: This caused the pod using the sharedv4 volume to be stuck in the "Terminating" state. Resolution: Portworx no longer experiences this race condition. |
PWX-16715 | In certain cloud deployments, API calls to the instance metadata service or the cloud management portals are blocked or routed through a proxy. User impact: In these cases, Portworx calls to the cloud were blocked indefinitely, causing Portworx to fail to initialize. Resolution: Portworx now invokes all the cloud and instance metadata APIs with a timeout and will avoid getting blocked indefinitely. |
PWX-14101 | For certain providers like vSphere, when a cloud drive created by Portworx was deleted out of band, Portworx ignored it, created a new disk, and started as a brand new node. User impact: This caused an issue if there were no additional licenses in the cluster. Resolution: If a disk is deleted out of band or is moved to another datastore in vSphere, Portworx now errors out and does not create new drives. |
PWX-16465 | Portworx held a lock while performing operations in the dm-crypt layer, or while making an external KMS API call for encrypted volumes. If either of them took longer than the expected amount of time, Portworx asserted. User impact: In certain scenarios, Portworx restarted while attaching encrypted volumes. Resolution: Portworx will no longer assert if any calls get stuck in the "dm-crypt" layer or if the HTTPS calls to the KMS providers timeout. |
PWX-15043 | The pxctl volume inspect command output did not show the "Mount Options" field for NFS proxy volumes. User Impact: You could not see the "Mount Options" field for NFS proxy volumes, even if you explicitly provided the mount options while creating such a volume. Resolution: The pxctl volume inspect command output now shows the "Mount Options" field for NFS proxy volumes. |
PWX-16386 | On certain slower systems, a sharedv4 volume wasn't mounted over NFS as soon as it was exported on the server. User impact: Portworx showed the access denied by server error. Resolution: Portworx now detects this error scenario and retries the NFS mount. |
PWX-17477 | In clusters that have seen more than 3000 unsuccessful node add attempts, Portworx, on addition of another node running the 2.6.x release, encountered a node index overflow. User Impact: Other nodes in the cluster could dump a core. Resolution: This patch fixes the node index allocation workflow and prevents the new node from joining the cluster. |
PWX-17206 | A partprobe run inside the container took a long time to finish. User impact: Portworx took a long time to reach the "Ready" state after a node restart. Resolution: Portworx now uses a host-based partprobe to resolve this issue. |
PWX-14925 | Portworx drives showed as offline in the pxctl status and pxctl service pool show commands. User impact: When a drive was added to Portworx, the pxctl status and pxctl service pool show commands showed the drive as offline when the commands were run from the Portworx oci-monitor pod. Resolution: In the oci-monitor pod, pxctl now gets the updated information about the newly added drives from the Portworx container. |
PWX-15984 | A large number of cloudsnap delete requests were stuck pending in the KVDB. User impact: Cloudsnaps were not deleted from the cloud, or cloudsnap delete requests did not make much progress. Resolution: Improvements to cloudsnap delete operations reduce processing times. |
PWX-16681 | Restores failed when the incremental chain was broken due to either deleted cloudsnaps or deleted local snaps in the source cluster. User impact: Async migration continuously failed. Resolution: To automatically resume async DR, migration now deletes local cloudsnaps if a restore fails, triggering a full backup to fix the issue. |
OC-196 | An issue with Portworx upgrades from v2.4 or v2.5 to v2.6 on Kubernetes with floating licenses caused an excess number of licenses to be consumed. User impact: When upgrading from v2.4 or v2.5 to v2.6 on Kubernetes, Portworx temporarily consumed double the number of license leases. Resolution: The upgrade now properly recycles license leases during the upgrade procedure and no longer consumes more licenses than it should. |
Known issues (Errata)
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description |
---|---|
PWX-17217 | Portworx fails to exit maintenance mode after drive add operations showed as "done". User impact: If a user restarts Portworx, the add drive status command may return "done" while the md reshape operation is still in progress. Even when the reshape is done, the in-core status of the mountpoint won't change, and users can't exit maintenance mode. Recommendations: If you're stuck in maintenance mode as a result of this issue, you can restart Portworx to clear it. |
PWX-17531 | A port conflict between containerd and the secure port used by the PVC controller causes the controller to enter the CrashLoopBackOff state. User Impact: Users running on Kubernetes clusters using containerd saw their Portworx PVC controller pods enter the CrashLoopBackOff state. Recommendations: You can fix this issue by adding the --secure-port=9031 flag to the portworx-pvc-controller deployment, which can be found in the namespace in which you installed Portworx (kube-system by default), as sketched below. If you are using a custom start port for the Portworx installation, add 30 to the configured start port and use that number for the --secure-port parameter (e.g. if using 10000, use 10030). |
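A hedged sketch of the workaround above follows; the namespace is a placeholder, and the port value should be adjusted as described if you use a custom start port.

```
# Open the PVC controller deployment and add --secure-port=9031 to the container args.
kubectl -n kube-system edit deployment portworx-pvc-controller
```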
2.6.1.6
November 20, 2020
Notes
- Portworx licenses for DR are now enabled to work on IBM Cloud.
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16429 | An issue with applying labels on new pools caused new drive add operations using the pxctl service drive add command to fail. User impact: If users tried to add a new Portworx pool to the node, their pxctl service drive add command failed with an error message about labels being too long. This prevented users from creating new pools.Resolution: Portworx no longer applies the Kubernetes labels to the backend storage pools. |
PWX-16941 | Portworx installation failed when users set the VSPHERE_INSTALL_MODE=local flag to enable the vSphere cloud drive provisioning feature.User impact: Portworx failed to initialize when used in the mode above. Resolution: Portworx now properly initializes when vSphere cloud drive provisioning is enabled. |
2.6.1.5
November 16, 2020
Notes
- Added support for OCP 4.6.1
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16796 | For Portworx Essentials users, Portworx ignored the proxy username and password set as part of PX_HTTP_PROXY , causing license renewal to fail.User impact: Portworx Essentials clusters entered 'license expired' mode when PX_HTTP_PROXY was set. Resolution: Portworx Essentials now honors the Username and Password fields given as part of PX_HTTP_PROXY to successfully make connections with the proxy. |
PWX-16775 | The Google object store did not have pagination for object enumeration, which caused any list call to list everything in the bucket. User impact: Cloudsnap backups and restores failed to start and the request timed out. Listing cloudsnaps through pxctl also timed-out.Resolution: Added pagination to object enumeration with the Google object store. |
2.6.1.4
October 30, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16432 | Multipathd configuration files were not correctly set up for blacklisting Portworx devices. User impact: Incorrect entries in the Multipathd configuration file caused other devices to be handled incorrectly on the host. Resolution: This fix disables the updates to the Multipathd configuration file by default, and adds an option to enable the updates through the runc install argument -enable-mpcfg-update . |
2.6.1.3
October 14, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10434 | When upgrading to the 2.6.x releases on certain platforms where the CPU does not support the SSE4.2 instruction set, Portworx encountered a checksum mismatch on the log file. User impact: The node would go into storageless mode after the upgrade. Resolution: This patch fixes the log replay so that it does the CPU capability check and uses the right checksum type to verify the log checksum. |
2.6.1.2
October 9, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16417 | Portworx would not recognize the multipath devices. User impact: Portworx nodes came up as a storageless node. Resolution: Portworx now properly opens /etc/multipath.conf and recognizes Multipath devices. |
2.6.1.1
October 7, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16013 | On certain kernel versions, such as variants of AKS ubuntu kernel 4.15.0, and under certain conditions, the filesystem IO on the backing storage pool sometimes hung due to a kernel bug. User impact: Impacted Portworx pods displayed a 'healthy' node as 'unhealthy', causing downtime for affected users. Resolution: This fix patches the filesystem kernel module in variants of AKS ubuntu kernel 4.15.0 and reinserts the patched kernel module, fixing the issue for users on this kernel. |
2.6.1
October 2, 2020
New features
- Introducing Portworx on the AWS Marketplace: deploy Portworx from the AWS Marketplace and pay through the AWS Marketplace Metering Service.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16005 | Added support for fetching tokens per vault namespace. |
PWX-14307 | Users can instruct Portworx to delete the local snaps created for cloudsnaps after the backup is complete through the pxctl --delete-local option. This causes the subsequent backups to be full. |
PWX-15987 | The pool expand alert now includes additional information about the cluster ID in the event metrics. |
PWX-15427 | Volume resize operation status alerts export metrics to Prometheus with additional context, such as: volumeid, clusterid, pvc name, namespace. |
PWX-13524 | You can now use a network interface for cloudsnap endpoints. This is a cluster-level setting. |
PWX-16063 | Introducing a new px-cache configuration parameter to control the cache block size for advanced users: px-runc arg: -cache_blocksize <size> . |
PWX-15897 | Improved Portworx NFS handling in multiple ways. |
PWX-15036 | Traditionally, Portworx installed via pay-as-you-go or marketplace mode goes into maintenance mode if it is unable to report usage within 72 hours. You can now configure a longer time period, up to 7 days (168 hours), by passing the rtOpts billing_timeout_hours to the Portworx DaemonSet. If you set an invalid rtOpts value, Portworx defaults back to 72 hours. |
PWX-11884 | Portworx now supports per volume encryption with scaled volumes. DCOS users who use Portworx scaled volumes can provide a volume spec in the following manner to create a scaled volume which uses "mysecret" secret key for encryption: secret_key=mysecret,secure=true,name=myscaledvolume,scale=3 |
PWX-14950 | Portworx now supports the ability to read vSphere usernames and passwords from a Kubernetes secret directly instead of mounting them inside the Portworx spec. |
PWX-15698 | Added a new command, pxctl service node-usage <node-id>, that displays all volumes/snapshots with their storage usage and exclusive usage bytes for a given node (see the example after this table). Since it traverses the filesystem, this is an expensive command and should be used with caution and infrequently. This change also deprecates capacity usage reporting for a single volume: pxctl volume usage is no longer supported. |
PWX-16246 | px-runc now features the -cache_blocksize <value> option, which configures the cache block size for px-cache. This option supports values of 1 MB and above that are a power of 2. |
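A minimal sketch of the node-usage command from PWX-15698 above; the node ID is a placeholder, and because the command walks the filesystem it should be run sparingly.

```
# Report per-volume and per-snapshot usage (including exclusive bytes) for one node.
pxctl service node-usage <node-id>
```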
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15130 | OCI-Monitor could have left zombie processes when installing Portworx for the first time. User impact: In most cases these zombies were harmless, but they had the potential to lock up the yum package system on CentOS hosts in rare circumstances.Resolution: OCI-Monitor now properly cleans up zombie processes. |
PWX-16240 | The ETCD_PASSWORD environment variable was shown in plaintext on px-runc and the OCI-Monitor's logs. User impact: The ETCD_PASSWORD environment variable was shown in plaintext in the Portworx/Kubernetes logs. Resolution: The ETCD_PASSWORD is no longer shown in plaintext in the logs. |
PWX-15806 | KVDB backups are stored under /var/lib/osd/kvdb_backup and in one of the internal directories of Portworx where storage is mounted. On storageless nodes, KVDB backup files were not rotated from the internal directories since there is no storage. User Impact: The backup files could end up filling the root filesystem of the node. Resolution: Portworx now dumps the KVDB backup files only under /var/lib/osd/kvdb_backup, which is rotated periodically. |
PWX-15705 | Application backups didn't work with the newer security model of 2.6.0. User impact: Application backups failed after upgrading to Portworx 2.6.0. Resolution: The auth model now works with the older style of auth annotations. |
PWX-16006 | Under certain circumstances, Portworx didn't apply all Kubernetes node labels to storage pools. User impact: PVCs using replica affinity on those labels were stuck in the pending state. Resolution: Portworx now performs the Kubernetes node update later in the initialization process. |
PWX-15961 | If you reconfigured a network device and attempted to restore a cloud backup on a volume from a snapshot, Portworx tried to use the IP of the previous network device in the restore and the cloud backup failed. User impact: Users saw the following error message: "Failed to create backup: Volume attached on unknown () node", and had to manually attach and detach the volume. Resolution: Portworx now updates the network device when they're reconfigured. |
PWX-15770 | Portworx sometimes couldn't complete volume export/attach/mount operations for NFS pods before timing-out. User impact: The affected pods failed to deploy. Resolution: Portworx no longer retries NFS mount operations in a loop on failure. The timeout for the NFS unmount command starts at 2 minutes, and if retried in a loop, an API Mount request can scale up to more than 4 minutes. |
PWX-15622 | sharedv4 volume mounts timed-out. User Impact: In a slower network or on overloaded nodes, sharedv4 (NFS) volume mounts can timeout and attempt multiple retries. The affected pod never becomes operational and repeatedly shows the signal: killed error. Resolution: sharedv4 volume mount operations now wait 2 minutes before timing-out. You can also specify an option to configure the timeout to larger values if required: pxctl cluster options update --sharedv4-mount-timeout-sec <value> |
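A minimal sketch of the cluster option mentioned in PWX-15622 above; the timeout value is a placeholder and should be tuned for your environment.

```
# Increase the sharedv4 (NFS) mount timeout cluster-wide, for example to 3 minutes.
pxctl cluster options update --sharedv4-mount-timeout-sec 180
```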
2.6.0.2
September 25, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-16160 | Environment variables were not anonymized. User Impact: Sensitive information regarding secrets may potentially have been printed in the logs. Resolution: Portworx now anonymizes all environment variables. |
2.6.0.1
September 22, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-16005 | Added support for fetching tokens per vault namespace |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15705 | Application backups didn't work with the newer security model of 2.6.0. User impact: Application backups failed after upgrading to Portworx 2.6.0. Resolution: The auth model now works with the older style of auth annotations. |
2.6.0
August 25, 2020
Notes
- If you're upgrading an auth-enabled Portworx cluster to Portworx 2.6.0, you must upgrade Stork to version 2.4.5.
- Operator versions prior to 1.4 and Autopilot currently do not support auth-enabled clusters running Portworx 2.6.0. Support for this is planned for a future release
New features
- Announcing guest (public) role access: the guest role allows your users to access and manipulate public volumes.
- Portworx now features K3s support: deploy Portworx on the K3s distribution.
- Check out proxy volumes (NFS): use this feature to proxy an external NFS share onto your volumes.
- Introducing automatic cluster-wide capacity distribution and balancing. You can also run rebalance operations manually.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-12620 | Added support for the K3s Kubernetes distribution. Limitation: you must use CSI integration to generate / use PVCs. |
PWX-12587 | Renamed two Prometheus metrics for the floating license server. |
PWX-12477 | Added a new pxctl command for rebalancing storage pools (see the sketch after this table). To see the syntax, use pxctl service pool rebalance --help. |
PWX-14465 | When a KVDB node is offline, if there are both storageless and storage nodes available to failover and no nodes are marked as metadata-nodes, Portworx prefers storage nodes to start the KVDB. |
PWX-12466 | The --all option for the pxctl cloudmigrate start command has been deprecated. |
PWX-10590 | Portworx now reports Prometheus metrics related to its KVDB activity. These metrics start with the following prefix: px_kvdb_ . |
PWX-13628 | Improved dynamic updates to the Portworx service in Kubernetes environments. Changing Kubernetes configuration directives, such as dnsPolicy , previously did not propagate to the Portworx service (Portworx service needed to be restarted to apply the changes). After the fix, changing the Kubernetes network configuration now automatically propagates to the Portworx service with no restarts required. |
PWX-14780 | Changed the threshold of repeated alerts from 30 seconds to 2 minutes. |
PWX-13806 | Added support for cloudsnaps on the Azure government cloud. To enable this, set the environment variable AZURE_ENVIRONMENT to AzureUsGovernmentCloud . |
PWX-13437 | Cloudsnaps now allow for STANDARD_IA as the storage class for cloudsnap objects with AWS -s3 . You can specify the storage class while creating credentials. |
PWX-12151 | You can now add labels to a sharedv4 volume so that the remote mount/client uses NFSv4. For more information, refer to the Updating volumes using pxctl article. |
PWX-11543 | On node startup, each node will raise an alert if it is unable to connect to any of the endpoints of an external etcd. |
PWX-15226 | Older Vault clients on older Go versions could leak stale connections to the Vault server if authentication with Vault failed. Portworx now cleans up such stale connections. |
PWX-12274 | The pxctl service pool delete command now prints a list of volumes on the pool. |
PWX-13398 | Added the dashboards showing the replication status to the spec generator. |
PWX-15047 | Portworx now raises a detailed error event if installed on nodes without disks. |
PWX-14666 | Improved the recovery time when a storage node goes down and there is another storageless node running in the same failure domain. The recovery will happen only if the machine (VM) for the downed storage node is also deleted, since the storageless node performing the recovery needs to attach the downed node's drives. |
PWX-12646 | Entering the pxctl status command without a token no longer provides confusing information. |
PWX-14206 | You can now specify runtime MountOptions for Portworx volumes. Added two new fields to Portworx volumes. |
PWX-14102 | You can now export a sharedV4 volume outside of a Portworx cluster using specific IP addresses. |
PWX-10207 | You can now override a volume's group field using pxctl : pxctl volume update --group <GROUP> <VOL_NAME> |
PWX-11884 | Portworx now supports per-volume encryption with scaled volumes. DCOS users who use Portworx scaled volumes can provide a volume spec in the following manner:secret_key=mysecret,secure=true,name=myscaledvolume,scale=3 This example creates a scaled volume that uses the "mysecret" secret key for encryption. |
PWX-15276 | You can now configure the frequency as well as the memory limit at which the Portworx process's memory heap and stack will be dumped. |
PWX-11967 | The pxctl status command now displays which device is being used for the internal KVDB. This will be shown only on the nodes where the internal KVDB is running. |
PWX-13718 | Portworx now returns a user-friendly error when a sharedv4 mount command times-out instead of signal: killed . |
PWX-12367 | Cloudsnap metrics fields have changed with this release. Previously, the fields were px_backup_stats_status , px_backup_stats_size , and px_backup_stats_duration_seconds . These are replaced with px_backup_stats_backup_status , px_backup_stats_backup_size , and px_backup_stats_backup_duration_seconds . |
PWX-12271 | Portworx now supports VMware Photon OS 3.0 |
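A hedged sketch of the rebalance command from PWX-12477 above; the subcommand names shown here are assumptions based on typical pxctl layouts, so confirm them with pxctl service pool rebalance --help.

```
# Submit a rebalance job for the node's storage pools, then check its progress.
pxctl service pool rebalance submit
pxctl service pool rebalance status
```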
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-12672 | Due to an accounting issue, I/O was delayed when a node went down under a specific condition. User impact: I/O may have been delayed for a while, but recovered on its own. Resolution: Fixed the accounting to allow I/O to continue when a node goes down under this condition. |
PWX-14333 | When both the multipathd device and the backing device contained the KVDB volume label, Portworx sometimes mistakenly picked up the backing device and failed to mount it. User impact: Portworx failed to come up on the nodes where the internal KVDB was running. Resolution: Portworx now chooses the multipathd device when both multipathd and the backing device contain the KVDB volume label. |
PWX-14937 | Added a fix to allow pool expand using the add-disk operation on storage pools that have a journal partition. User impact: If users installed Portworx with cloud drive provisioning using the auto-journal ( -j auto ), expanding the pool with the add-disk operation type failed with a "not supported" error message. Users also saw the same error when using Autopilot to expand the pools. Resolution: If Portworx was installed using the auto-journal, it can now expand pools using the add-disk operation type. |
PWX-11545 | Portworx installations with storage pool caching failed to come up after upgrading from 2.3.x to 2.4 or 2.5. User impact: Portworx failed to become operational after upgrading from 2.3.0 to 2.5.0. Resolution: This change fixes the upgrade path in 2.6.0, allowing you to upgrade from 2.3.0 to 2.6.0 without experiencing a failure. |
PWX-13777 | Kubernetes makes the information about its services available to pods via environment variables. However, each update to environment variables set for a Portworx pod would result in the restart of the Portworx systemd service.User Impact: Kubernetes clusters that frequently update services could experience storage interruptions. Resolution: OCI-Monitor now manages the changes to its pod environment variables, so the updates made on the unrelated Kubernetes services will no longer cause restarts of the Portworx systemd service. |
PWX-14160 | The Portworx SDK did not return the list of disks and pools in its response. User impact: Users of the Portworx SDK or of the pxctl -j cluster list command would see empty Disks and Pools fields in the response. Resolution: The Portworx SDK handler now populates these 2 missing fields. |
PWX-13655 | When Portworx went down or restarted, it created detailed logs. On some systems, this operation could take a long time and potentially hang. User impact: Log collection could become unresponsive as a result of this log dump. Resolution: Unnecessary logs were removed, eliminating the possibility for log collection to hang. |
PWX-13086 | When a VM running a Kubernetes worker node is deleted, the Portworx drives attached to it get deleted. User impact: When a VM running a Kubernetes worker node is deleted while the Portworx disks are still attached to it, the default vSphere behavior is to delete those disks as well. This causes Portworx to lose its data disk, and users end up losing the Portworx node. Resolution: For vSphere 6.7.3 and above, Portworx now creates its disks (vmdks) with the keepAfterDeleteVm flag so that they don't get deleted on VM deletion. For lower vSphere versions, the issue persists. |
PWX-11352 | Mounting both <dir> and <dir>/<subdir> into the Portworx container could lead to excessive accumulation of disk-mounts.User impact: Mounting both <dir> and <dir/subdir> in Portworx could cause mountpoints to keep accumulating with every service restart, resulting in host slowdown and (eventually) an unresponsive/hung host system. Resolution: Portworx now checks for overlapping mount directives, and rejects such mount configurations. |
PWX-12528 | When upgrading Portworx on OpenShift, upgrade operations were blocked. User impact: OpenShift upgrades could get blocked. Resolution: Portworx no longer sets the immutable flag on the /etc/pwx/.private.json configuration file on the host. |
PWX-12952 | etcd compaction sometimes broke the propagation of license updates. User impact: If the etcd database was configured to automatically compact, license updates may not propagate to all the nodes in the cluster. Resolution: etcd compaction no longer breaks the license updates. |
PWX-12466 | The --all option for the pxctl cloudmigrate start command is deprecated. User Impact: Cloud migration with the -a option is no longer supported, as it would try to migrate internally created snapshots and fail. Resolution: Users can invoke cloud migration at the volume or namespace level to avoid this failure. |
PWX-14648 | Portworx did not honor the storage class parameter export_options during volume creation. User impact: Any export options provided to the storage class were not included when the storage class was used. Resolution: Portworx now parses export_options when they are set as storage class parameters. |
PWX-14941 | On volume creation, Portworx formats the volume and then detaches the block device. This detach can sometimes return an EBUSY since the format I/O is yet to be completed. Previously, Portworx waited 10 seconds before retrying the detach and did this up to 10 times. User impact: Volume creation sometimes took more than 10 seconds. Resolution: Portworx now immediately tries to detach and then backs off exponentially if it fails. |
PWX-13528 | Fixed an issue where vSphere 6.5 deletes disks associated with VM on VM Delete. User impact: When Portworx is installed using cloud drives on vSphere 6.5, if users delete the VM directly from vSphere without detaching all the disks, the disks will get deleted causing Portworx to lose quorum. Resolution: To ensure the disk is not deleted on VM deletion, you must now provide the DISK_EXTENSION environment variable with the disk type as .pxd to create a vmdk disk as .pxd types. This option currently works with fresh Portworx installations on VMware infrastructure. You cannot upgrade from previous versions to 2.6.0 with DISK_EXTENSION .You do not need to use this on vSphere 6.7u3 and above. |
PWX-12601 | Restarting the Portworx service while using Objectstore could hang. User impact: Restarting the Portworx service while uploads to the local Objectstore were active could hang the restart operation for over 5 minutes and result in the Portworx service being marked as "failed". Resolution: The Portworx service now restarts gracefully and in a timely fashion, stopping active Objectstore uploads. |
PWX-13542 | Portworx running on vSphere using cloud drives failed to come up when it could not find the path of the attached disk. User impact: Portworx failed to initialize on the node when it could not find the device path of the attached disk. Resolution: Portworx now retries for up to 2 minutes to find the path of the attached disk. |
PWX-15060 | Portworx could generate cores when a user tried to restore an incremental backup that was taken under a folder within a bucket. Portworx does not support uploading backups to a folder within a bucket. User impact: Portworx could keep generating cores until the restore was stopped. Resolution: This change fixes the above issue and returns an error to the user rather than generating cores. |
PWX-13178 | In Azure, a disk can be attached anywhere in an availability set. When using availability sets with the auto-disk provisioning feature in Portworx, storageless nodes did not pick up disks left on terminated storage nodes. Ideally, they should have picked up the disks as they can be attached by any VM in the availability set. User impact: When a storage node is terminated in Azure, any online storageless node should pick up the drives left by the storage node and keep the cluster in quorum. Due to incorrect handling of availability sets, storageless nodes did not pick up the drives. Resolution: Portworx now correctly detects that nodes are part of an availability set and attaches drives left by a storage node in the availability set. |
PSG-172 | When using CSI in a PKS environment, volume provisioning was not working because an incorrect kubelet host path was used by the CSI sidecar containers. User Impact: CSI PVCs became stuck as "pending" in PKS. Resolution: Fixed the Portworx specs to use the correct kubelet host path when using CSI on PKS. |
PWX-15341 | Portworx installs on AKS started failing on August 21, 2020 due to an Azure backend change which causes VM disk attach APIs to fail when using VMSS. Until Azure fixes this problem, Portworx calls the Azure disk attach API differently as a workaround. User impact: When using auto disk provisioning in an AKS environment with VMSS, Portworx will not be able to attach Azure Managed Disks and, therefore, not come up at all on the cluster. Resolution: Changed how Portworx invokes the Azure VM Disk attach API. This will ensure that Azure will not throw an error and let the disk attach go through. |
PWX-12530 | When a node in ASG terminates, it takes a specific amount of time to release the attached drives, and a new replacement node may fail to start due to a license limitation. This caused Portworx to fail correctly, but it did not retry to restart. Users had to manually restart Portworx. User impact: When a user is using all node licenses in the cluster and wants to replace a Portworx node with a different one (this can happen during Kubernetes or VM upgrades), the new Portworx node tries to join the cluster, but as the old node is not fully terminated yet, Portworx used to exit with a license limit error. As a result, the user has to manually bounce the new node when the old node is terminated so it can join the cluster. Resolution: On license limit error, Portworx now waits for the old node to fully terminate, and release the cloud disks. Once the old node is fully terminated, the new node will restart, and use the drives from the old node without failing with license limit error. |
PWX-12611 | When using the auto-disk provisioning feature with auto-scaling groups, if the minimum size of an ASG group is zero, and the user specifies a value for the maxStorageNodesPerZone parameter, then Portworx overwrites the user-provided value with zero .User impact: Portworx was overriding the user-provided value for the maxStorageNodesPerZone parameter with zero, based on the ASG group value. This was causing all new nodes to come up as storage nodes, ignoring the user-provided max storage nodes value.Resolution: When the ASG group minimum value is set to zero, and the user provides a different value, Portworx now does not overwrite the value, honoring the user-provided maxStorageNodesPerZone value. |
PWX-11338 | With the PX-Cache feature enabled, if any single caching drive went offline, the entire storage node went down. User impact: Portworx pools became inaccessible due to single drive failures. Resolution: Portworx now identifies the pool which should be taken offline to sustain the failure of a single drive. |
PWX-12670 | If an auto journal was used with px-cache, node wipe could occasionally fail to clear the md device that was built internally; a manual md stop, followed by a device wipefs, was needed to reinitialize Portworx successfully. User Impact: pxctl sv nw failed to wipe the Portworx signature needed when recommissioning a node. Resolution: There is a timing dependency within the kernel for releasing access to used drives even with Portworx stopped, which caused node wipe to fail. Portworx now handles the failure internally, cleans up the drive with a retry, and throws an appropriate failure if the drive is still in use. |
PWX-11750 | If you specify a caching device using the -cache flag, and the specified device does not exist, then the -cache flag will be ignored, and pxctl will not display any cache devices.User Impact: Caching may not be enabled even if the user has provided the -cache option at startup.Resolution: Portworx now starts if the cache device is not found, and displays an error message. Note that Portworx will display the error message only if the device is a local /dev/... device and cannot be found. The following error messages will be displayed in the log: Using storage device as cache: Device does not exist: Warning: ignoring non-existing storage device. |
PWX-15115 | Fixed an issue where pod termination can get stuck if the user deletes a volume. User Impact: If users delete the entire application stack at once, including the volumes, Portworx sometimes deletes the volume before the pod terminates. In such a situation, the pod will not terminate properly, because the Portworx volume unmount operation will fail for not being able to find the volume. Resolution: Portworx now correctly handles the situation in which the unmount request is sent for a volume that is no longer present. The pod now terminates properly. |
PWX-9401 | A bug in Kubernetes 1.13.5 and lower caused the Portworx volume driver to occasionally save annotations from one PVC into the parameters for another. User Impact: Portworx may have created a PVC with a different group ID than the one set in its annotations. Resolution: Portworx now uses the group value from the PVC annotation that's fetched at runtime from the Kubernetes API during volume creation to ensure the group ID doesn't change. |
PWX-13362 | Pods may get stuck in the terminating state due to a race condition where the volume gets attached and detached. User Impact: Users may see pods stuck in the terminating state. Resolution: You can solve this issue by restarting the pods. |
PWX-12288 | A race condition in the node decommission workflow could leave the node in an inconsistent state, where it stayed partially decommissioned and never completed the decommission operation. User impact: The decommissioned node continued to show in the nodes list. Resolution: Portworx now handles this race condition so that node decommission runs to completion. |
PWX-15094 | A nil panic in Portworx sometimes occurred when Portworx successfully created a Vault client while Vault was in an error state. User impact: Portworx returned an error: "too many open file handles" for subsequent vault operations. Resolution: Portworx no longer panics when Vault is experiencing an error. |
PWX-12088 | Portworx used a version of IBM KeyProtect that caused a kernel panic if multiple threads tried to use it. User Impact: Portworx nodes experiencing a kernel panic restarted, and some apps did not come back online after recovery. Resolution: Portworx now uses IBM KeyProtect library v0.3.5, which solves this problem. |
PWX-13459 | When using Sharedv4 volumes, if the node where the volume was attached was powered down, daemonset pods on surviving nodes became stuck in the terminating state. User impact: Users saw stuck terminating pods that Kubernetes was unable to recover. Resolution: Pods now terminate properly. |
PWX-15250 | Sometimes, when running Portworx with the internal KVdb, the Portworx service doesn't restart on the node where the internal KVdb runs, because the KVdb devices are still mounted. User impact: If you force-terminate the Portworx service, it will not start on the nodes running the internal KVdb. Resolution: Portworx now successfully restarts. |
PWX-10674 | In certain cases, node decommission kept the cloud drives around and did not delete them. User impact: Cloud drives lingered in the cloud after the owning node was decommissioned. Resolution: Node decommission now removes the cloud drives that were previously left undeleted in these cases. |
PWX-8385 | Portworx, Inc. discovered a security issue in the method used to provide the secret location to Portworx from a PVC. User Impact: Users could point to a secret in a namespace they could not access in the annotations of a PVC. Resolution: In 2.6, Portworx no longer supports using annotations in the PVC to create volumes. Instead, it uses the same secure method used by CSI and has the StorageClass reference the secret and the secret namespace. |
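For PWX-13362 above, the documented workaround is to restart the pods that are stuck terminating. A minimal kubectl sketch of that workaround; the pod name and namespace are placeholders:

```
# Find pods stuck in the Terminating state
kubectl get pods --all-namespaces | grep Terminating

# Force-delete a stuck pod so its controller can recreate it
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```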
2.5.8
September 1, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15622 | sharedv4 volume mounts timed out. User Impact: On a slower network or on overloaded nodes, sharedv4 (NFS) volume mounts can time out and attempt multiple retries. The affected pod never becomes operational and repeatedly shows the signal: killed error. Resolution: sharedv4 volume mount operations now wait 2 minutes before timing out. You can also configure a larger timeout if required: pxctl cluster options update --sharedv4-mount-timeout-sec <value> (see the example below) |
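For example, to raise the sharedv4 mount timeout to 5 minutes (the value 300 is illustrative; the flag is the one shown in the table above):

```
pxctl cluster options update --sharedv4-mount-timeout-sec 300
```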
2.5.7
August 26, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-15341 | Portworx installs on AKS started failing on August 21, 2020 due to an Azure backend change which causes VM disk attach APIs to fail when using VMSS. Until Azure fixes this problem, Portworx calls the Azure disk attach API differently as a workaround. User impact: When using auto disk provisioning in an AKS environment with VMSS, Portworx will not be able to attach Azure Managed Disks and, therefore, not come up at all on the cluster. Resolution: Changed how Portworx invokes the Azure VM Disk attach API. This will ensure that Azure will not throw an error and let the disk attach go through. |
2.5.6
August 17, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-14845 | Added a new Portworx install argument (VAULT_NAMESPACE), allowing you to set a global Vault namespace for Portworx. Provide this argument as part of the px-vault Kubernetes secret (see the sketch after this table). |
PWX-13173 | Added support for providing the vault-namespace argument in pxctl commands. |
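For PWX-14845, a minimal sketch of adding the VAULT_NAMESPACE key to the px-vault Kubernetes secret. The kube-system namespace and the tenant1 value are assumptions for illustration; adjust them for your deployment:

```
# Add (or update) the global Vault namespace key in the existing px-vault secret
kubectl -n kube-system patch secret px-vault \
  --type merge \
  -p '{"stringData":{"VAULT_NAMESPACE":"tenant1"}}'
```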
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14777 | If credentials were deleted from Kubernetes secrets while there were multiple pending references to them in a Portworx cluster, Portworx generated an alert for every pending reference. User impact: Users received an excessive number of alerts. Resolution: Portworx now generates only one alert every hour. Portworx, Inc. advises against deleting credentials if there are schedules managed by Stork. |
PWX-14034 | Portworx did not read all the vault credentials for Kubernetes authentication methods provided in a Kubernetes secret. User impact: Portworx Vault authentication failed. Resolution: With 2.5.6, Portworx now reads all the Kubernetes authentication related input arguments from the Kubernetes secret and Portworx Vault authentication will succeed. |
PWX-14937 | Added a fix to allow pool expansion using the add-disk operation on storage pools that have a journal partition. User impact: If users installed Portworx with cloud drive provisioning using the auto-journal ( -j auto ), expanding the pool with the add-disk operation type failed with a "not supported" error message. Users also saw the same error when using Autopilot to expand the pools. Resolution: If Portworx was installed using the auto-journal, it can now expand pools using the add-disk operation type (see the example after this table). |
PWX-15094 | A nil panic in Portworx sometimes occurred when Portworx successfully created a Vault client while Vault was in an error state. User impact: Portworx returned an error: "too many open file handles" for subsequent vault operations. Resolution: Portworx no longer panics when Vault is experiencing an error. |
PWX-14924 | A driver bug caused Portworx with PX-Security enabled to send unauthorized requests to the volumes API, which were rejected. User impact: Users were unable to create PVCs, which appeared stuck in the Pending state.Resolution: Portworx with PX-Security can once again create PVCs. |
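For PWX-14937, a hedged example of expanding a pool with the add-disk operation. The pool UUID and target size are placeholders, and the exact flag spellings (--uid, --size, --operation) are assumptions based on the pxctl service pool expand syntax referenced elsewhere in these notes:

```
# Expand the pool to 500 GiB by adding a disk
pxctl service pool expand --uid <pool-uuid> --size 500 --operation add-disk

# Check pool status after the expansion
pxctl service pool show
```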
2.5.5
July 29, 2020
Notes
- Portworx 2.5.5 supports OpenShift 4.5.3
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-14102 | You can now export a sharedV4 volume outside of a Portworx cluster using specific IP addresses. |
PWX-12151 | You can now add labels to a sharedv4 volume so that the remote mount/client uses NFSv4. For more information, refer to the Updating volumes using pxctl article. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-14333 | When both the multipathd device and the backing device contain the KVDB volume label, Portworx sometimes mistakenly picked up the backing device and failed to mount it. User impact: Portworx failed to come up on the nodes where internal KVDB was running. Resolution: Portworx now chooses the multipathd device when both multipathd and the backing device contain the KVDB volume label. |
PWX-14399 | A bug introduced in Portworx version 2.5.0 broke the documented procedure to perform a KVDB recovery from dump files. User Impact: Users had to manually update the config maps and change the labels on the disk to perform the recovery. Resolution: You can now recover your KVDB from dump files with a single command using the documented procedure. |
PWX-14164 | Prior to this version, the pxctl cloudsnap list -a command sometimes failed to return all cloudsnaps in an objectstore. Resolution: Portworx now returns all cloudsnaps in an objectstore. |
PWX-14160 | The Portworx SDK did not return disks and pools in the response. User impact: Users of the Portworx SDK or pxctl -j cluster list would see empty Disks and Pools fields in the response. Resolution: The Portworx SDK handler now populates these 2 missing fields. |
PWX-14648 | Portworx did not honor the storage class parameter export_options during volume creation. User impact: Any export options provided to the storage class were not applied when the storage class was used. Resolution: Portworx now parses export_options when they are set as storage class parameters. |
PWX-13149 | Under larger workloads, systems with many overwrites on volumes while a node is down sometimes caused Portworx to hit an assertion failure while updating the in-core metadata and restart. User impact: Portworx containers restarted and continued to work. Resolution: With this fix, Portworx no longer asserts or restarts in this scenario. |
PWX-14389 | On vSphere 6.7 and above, Portworx created thin provisioned vmdks even if the user provided zeroedthick as the type. User impact: Users couldn't create thick provisioned vmdks on Portworx. Resolution: Portworx now correctly creates thick provisioned vmdks when zeroedthick is specified. |
2.5.4.1
July 27, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-13175 | Portworx now supports sharedv4 volumes on hosts running Flatcar OS. |
2.5.4
July 2, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-14106 | Added support for a new environment variable: PX_HTTP_PROXY . This allows you to specify an HTTP proxy during Portworx installation, permitting sharedv4 volumes and package installation to work properly on air-gapped environments that lacked their own system-wide HTTP proxy. |
PWX-14186 | Optimized the timeouts on misconfigured air-gapped systems, so after the first failure to install NFS services, Portworx will refrain from attempting to install NFS services again for a period of 10 minutes. |
PWX-13011 | Reduced Portworx install/upgrade times on air-gapped environments that drop the outbound network traffic (i.e. where "curl portworx.com" command would hang). |
2.5.3
June 24, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-13437 | Cloudsnaps now allow STANDARD_IA as the storage class for cloudsnap objects stored on AWS S3. The storage class can be specified while creating credentials (see the example after this table). |
PWX-13806 | Added support for cloud snaps on Azure government cloud. To enable this, set the environment variable AZURE_ENVIRONMENT to AzureUsGovernmentCloud . |
PWX-10207 | You can now override a volume's group field using pxctl: pxctl volume update --group <GROUP> <VOL_NAME> |
PWX-13375 | Portworx now accepts a new install argument --node_pool_label to determine which Kubernetes node labels to apply to CloudDrive sets. Portworx will only attach those DriveSets to a node if the node label passed through --node_pool_label matches the label on the CloudDrive set. |
PWX-13510 | Added a new runtime option rt_opts kvdb_failover_timeout_in_mins to configure kvdb offline node failover timeout. The default value is set to 3 minutes. |
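For PWX-13437, a hedged example of creating an S3 credential that stores cloudsnap objects in the STANDARD_IA storage class. The access key, secret key, region, endpoint, and credential name are placeholders, and the --s3-storage-class flag spelling is an assumption based on the pxctl credentials create help:

```
pxctl credentials create \
  --provider s3 \
  --s3-access-key <ACCESS_KEY> \
  --s3-secret-key <SECRET_KEY> \
  --s3-region us-east-1 \
  --s3-endpoint s3.amazonaws.com \
  --s3-storage-class STANDARD_IA \
  cloudsnap-ia-credential
```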
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13542 | Portworx running on vSphere using cloud drives fails to come up when it cannot find the path of the attached disk. User impact: Portworx will fail to initialize on the node when it fails to find the device path of the attached disk. Resolution: Portworx will now retry for up to 2 minutes to find the path of the attached disk. |
PWX-9401 | A bug in Kubernetes 1.13.5 and lower caused the Portworx volume driver to occasionally save annotations from one PVC into the parameters for another. User Impact: Portworx may have created a PVC with a different group ID than the one set in its annotations. Resolution: Portworx now uses the group value from the PVC annotation that's fetched at runtime from the Kubernetes API during volume creation to ensure the group ID doesn't change. |
PWX-13459 | When using Sharedv4 volumes, if the node where the volume was attached was powered down, daemonset pods on surviving nodes became stuck in the terminating state. User impact: Users saw stuck terminating pods that Kubernetes was unable to recover. Resolution: Pods now terminate properly. |
2.5.2.1
June 19, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13655 | When Portworx went down or restarted, it created detailed logs. On some systems, this operation could take a long time and potentially hang. User impact: Log collection could become unresponsive as a result of this log dump. Resolution: Unnecessary logs were removed, eliminating the possibility for log collection to hang. |
PWX-13777 | The Portworx pod previously passed unnecessary environment variables to the Portworx service, which sometimes caused it to restart more than it needed to. User impact: When the Kubernetes environment changed, users may have experienced storage interruptions due to these restarts. Resolution: The Portworx pod now passes fewer environment variables to the Portworx service, greatly reducing restarts. |
2.3.1.3
June 13, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13086 | When a VM running a Kubernetes worker node gets deleted, the Portworx drives attached to it also get deleted. User impact: When a VM running a Kubernetes worker node is deleted while the Portworx disks are still attached, the default vSphere behavior is to delete those disks as well. Portworx loses its data disk, and users end up losing that Portworx node. Resolution: For vSphere 6.7.3 and above, Portworx now creates its disks (vmdks) with the keepAfterDeleteVm flag so that they are not deleted on VM deletion. For lower vSphere versions, the issue still persists. |
PWX-13542 | Portworx running on vSphere using cloud drives fails to come up when it cannot find the path of the attached disk. User impact: Portworx will fail to initialize on the node when it fails to find the device path of the attached disk. Resolution: Portworx will now retry for up to 2 minutes to find the path of the attached disk. |
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-13510 | Added a new runtime option rt_opts kvdb_failover_timeout_in_mins to configure kvdb offline node failover timeout. Default value is set to 3 minutes. |
2.5.0.3
June 12, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13655 | When Portworx went down or restarted, it created detailed logs. On some systems, this operation could take a long time and potentially hang. User impact: Log collection could become unresponsive as a result of this log dump. Resolution: Unnecessary logs were removed, eliminating the possibility for log collection to hang. |
2.5.0.2
June 12, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-9401 | A bug in Kubernetes 1.13.5 and lower caused the Portworx volume driver to occasionally save annotations from one PVC into the parameters for another. User Impact: Portworx may have created a PVC with a different group ID than the one set in its annotations. Resolution: Portworx now uses the group value from the PVC annotation that's fetched at runtime from the Kubernetes API during volume creation to ensure the group ID doesn't change. |
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10207 | You can now override a volume's group field using pxctl: pxctl volume update --group <GROUP> <VOL_NAME> |
2.5.1.3
June 05, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13086 | For vSphere 6.7.3 and above, Portworx now creates its disks (vmdks) such that they don't get deleted on VM deletion. |
PWX-13542 | Fixed an issue where Portworx running on vSphere with cloud drives would fail to come up when it could not find the path of the attached disk. |
PWX-13510 | Added a new runtime option rt_opts kvdb_failover_timeout_in_mins to configure the kvdb offline node failover timeout. The default value is set to 3 minutes. |
2.5.2
May 29, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-12737 | Added support for automatic cloud disk management for non-public Azure clouds like US Government, Germany and China clouds. |
PWX-13082 | You can now configure the frequency with which Portworx takes its KVDB backups using the kvdb_dump_frequency_minutes runtime option. |
PWX-10216 | Added support for Vault Namespaces. |
PWX-12603 | When sharedv4 volumes are in use, Portworx uses 16 NFS threads to process them by default. You can change the total number of threads Portworx uses by running the pxctl cluster options update --sharedv4-threads <num> command (see the example after this table). |
PWX-12512 | PX-Essential can now refresh licenses through an HTTP or HTTPS proxy. |
PWX-13116 | Users can now set the maximum number of backups to enumerate to any value greater than 0 and less than 5000. |
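For PWX-12603, an example of raising the NFS thread count from the default of 16 (the value 32 is illustrative):

```
pxctl cluster options update --sharedv4-threads 32
```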
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-13189 | A recursive chmod operation in Kubernetes 1.18 and lower caused mounts with large block volumes to hang when users set a security context for a pod using fsGroup . User Impact: Users with hung mounts would see mount timeouts when creating a pod referencing large block volumes, and pod creation would fail. Resolution: Portworx, Inc. added the allow_others storage class label that, when set to true, will apply a permission change to the mount path. This label should only be used for Kubernetes versions lower than 1.18. Users on newer Kubernetes versions can return to using fsGroup over the Portworx allow_others=true label. |
PWX-12655 | When Portworx images were uploaded to nodes via docker load command, Docker may not have set the image digest properly. User Impact: When the image digest was not available, OCI-Monitor would not detect manually uploaded Portworx images, and would attempt to pull the Portworx image, potentially failing in air-gapped environments. Resolution: Portworx now has improved image detection, even in cases where image digests are not available. |
PWX-13171 | The px_volume_readthroughput , px_volume_writethroughput and px_volume_iops metrics did not update. User Impact: Users may have seen values for these metrics reported as zero. Resolution: Portworx once again updates these metrics. |
PWX-12088 | Portworx used a version of IBM KeyProtect that caused a kernel panic if multiple threads tried to use it. User Impact: Portworx nodes experiencing a kernel panic restarted, and some apps did not come back online after recovery. Resolution: Portworx now uses IBM KeyProtect library v0.3.5, which solves this problem. |
PWX-12466 | The --all option for the pxctl cloudmigrate start CLI command has been deprecated. |
2.5.1.2
May 28, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-13175 | Portworx now supports sharedv4 volumes on hosts running Flatcar OS. |
2.5.1
April 24, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-11638 | Starting with 2.5.1, credentials can be configured without providing a secret key or access key, using the instance's IAM capabilities to access the cloud provider's object store. In 2.5.1, support is limited to AWS EC2 instances. |
PWX-12314 | For PX-Essentials, an improvement to the pxctl status command now provides the reason why a license is expired. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-11602 | When Portworx detected an issue with container volumes, for example, if a drive was removed, OCI-Monitor resulted in Portworx pods being stuck in a CrashLoopBackOff state. User Impact: Portworx pods in users' clusters would not recover. Resolution: When Portworx (OCI-Monitor) detects an issue with container mounts, it sends a request to Kubernetes to reset/reinitialize the Portworx pod, which fixes the issue. |
PWX-12289 | For the CRI-O container runtime, when OCI-Monitor is set to ImagePullPolicy:IfNotPresent , it should pull the Portworx Enterprise image only when the image is not present on the system. The OCI-Monitor incorrectly identified the image as present when it wasn't. User Impact: Portworx failed to pull the required image and OCI-Monitor failed. Resolution: The OCI-Monitor ImagePullPolicy handling now properly pulls images. |
PWX-12292 | When using OCI-Monitor, Portworx failed to drain its pods when required. User Impact: OCI-Monitor failed to start and upgrade operations failed. Resolution: OCI-Monitor now starts properly and Portworx upgrades succeed. |
PWX-12252 | For CRI-O integrations, the OCI-Monitor did not copy the install logs into its own output. As a consequence, the OCI-Monitor did not parse/retrieve the INFO: Module version check: Success install log line, and always triggered the cordoning/draining of the nodes. User Impact: Upgrades to version 2.5.0 stalled on OpenShift and/or CRI-O container-runtime Kubernetes clusters. Resolution: Portworx application cordoning and draining during the upgrade process now works properly, allowing upgrades. |
PWX-12180 | Portworx didn't send license server alerts for errors packaged into the response body of a valid REST call. User Impact: Users did not see license server alerts for these kinds of errors. Resolution: Portworx now treats these kinds of errors in the same manner as REST errors, and raises alerts accordingly. |
PWX-11595 | When a Portworx node's storage is down or full, it reports Not Ready to Kubernetes to notify the users. In this case, the Portworx node is still available and serves storage in read-only mode if it's full, or proxies the storage from other nodes if local storage is not available. |
2.5.0.1
April 21, 2020
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-12322 | Portworx failed to start if NFS was in an errored state. User Impact: Users could not start Portworx if NFS was in an errored state. Resolution: Portworx now starts even when NFS is in an errored state and raises an alert instead. |
2.5
April 3, 2020
New features
- Introducing PX-Central on-premises: Deploy your own PX-Central dashboard with Portworx on your own cluster.
- Introducing PX-Essentials: A free Portworx license for prototyping and small production clusters.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-11777 | Added support to the pxctl volume list command, allowing you to list snapshots by node IDs: pxctl volume list -s --node <node-id> . |
PWX-11515 | CLI command flags now use --node instead of --node_id or --node-id . |
PWX-11464 | Portworx now allows non-root user access to sharedv4 volumes by default. To restrict sharedv4 volume access to the root user, set the allow_others label to false :allow_others=false |
PWX-11418 | All cluster- and node-level alerts, including NODE alerts, will now be raised as Kubernetes events. |
PWX-10783 | When Portworx restarts, storageless Portworx nodes will automatically detect any new available storage devices and transition themselves into a storage node. |
PWX-10756 | When running an internal KVDB without a dedicated drive, pxctl status now reports a warning saying that such a configuration is not recommended for a production cluster. |
PWX-10724 | In cloud, you can now add drives to storageless nodes using the pxctl CLI. |
PWX-10371 | pxctl status now reports last known failures if Portworx fails to startup on the node. |
PWX-9834 | An internal KVDB can now run on storageless nodes. To run the internal KVDB on a storageless node in an on-prem cluster, you must provide a -kvdb_dev ; in cloud deployments, Portworx will provision a drive to be used by the KVDB. |
PWX-11774 | A new pxctl clouddrive listdrives command allows you to list all the drives in cloud drivesets. On VSphere, this command also lists the datastore for a VMDK in the labels column. |
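For PWX-11774, the new command takes no required arguments; on vSphere, the Labels column also shows the datastore backing each VMDK:

```
# List all drives across the cluster's cloud drivesets
pxctl clouddrive listdrives
```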
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10400 | In some situations, a busy volume remained attached even after a pod is terminated in Kubernetes. User impact: Upgrades or other operations relying on the kubectl drain command got stuck on a node with these attached volumes.Resolution: Portworx now detaches these busy volumes from terminated Kubernetes pods. |
PWX-11753 | Portworx sent the nodeID with cluster-level license alerts: LicenseExpiring and LicenseExpired . User impact: Customers saw the nodeID associated with license alerts when the clusterID would have been more helpful. Resolution: Portworx now reports the ClusterID instead of NodeID with the LicenseExpiring and LicenseExpired license alerts. |
PWX-11722 | Portworx raised alerts at the CLUSTER level that are more appropriately raised at the NODE level. User impact: Users may have seen these alerts at a level they did not expect. Resolution: Portworx now raises the affected alerts at the NODE level instead of the CLUSTER level. |
PWX-11637 | Cloudsnaps did not work with some AWS S3 endpoints when the bucket name being uploaded to contained uppercase letters. Customer impact: Snapshot restore operations involving buckets with uppercase letters failed. Resolution: Portworx now supports uppercase letters in bucket names when used with S3 endpoints. |
PWX-11365 | Portworx only checked the health of the NFS service on the default port: 2049. However, as this port is configurable, these checks failed if the NFS port changed. User impact: Users who configured their NFS to use a port other than the default encountered errors. Resolution: Portworx now checks the health of the NFS service regardless of the port it's running on, and will raise an alert if it determines the NFS server is unhealthy. |
PWX-11280 | Portworx did not update the cloud drive in-progress state after performing a pool expand operation using resize-disk . User impact: Users could have seen a misleading output indicating pool expansion was still in-progress from the pxctl cloud list command output, even though the operation completed.Resolution: Portworx now correctly reports the cloud drive status after a resize-disk operation. |
PWX-10749 | If all nodes in a cluster were storageless, Portworx failed to properly install. User impact: Users attempting to install Portworx on a cluster with only storageless nodes would be left with an out-of-quorum cluster, and would have to wipe the whole installation and redo it with storage nodes. Resolution: Portworx will no longer form a cluster if it cannot find any storage nodes, and will keep reporting an error until a storage node is added to the cluster. |
PWX-10711 | On slower systems, Portworx occasionally received an access denied error from the NFS, and failed to mount sharedv4 volumes. User impact: Users experiencing this issue had to manually retry the sharedv4 mount operation. Resolution: Portworx now retries mounting a sharedv4 volume if it gets an access denied error. |
PWX-11643 | Intermittent vCenter API failures occasionally caused Portworx to fail to find its already attached cloud drives. User Impact: Disk operations reliant on the vCenter API would fail. Resolution: Portworx now automatically retries operations involving the vCenter API before reporting an error. |
2.4
March 3, 2020
New Features
- Introducing the Portworx license server: Add and manage licenses for multiple Portworx clusters from an on-premises license server. UI integration coming soon.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10852 | Improved Prometheus metrics for Portworx. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10939 | A difference between how Portworx calculates license expiration dates using time-zones and how Flexera calculates expiration dates without time-zones caused Portworx to occasionally consider new licenses "expired" on their first day. User Impact: Users with multi-part licenses may have been unable to use their multi-part licenses on the first day they activated them. Resolution: Licenses once again work on the first day they're applied and license expiration dates for multi-part licenses display accurately. |
2.3.6
February 29, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
11377 | Added support for the 4.18.0-147.5.1.el8_1.x86_64 kernel. |
2.3.5
February 19, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-11000 | Portworx now features a Disaster Recovery plan in the IBM Cloud Marketplace. |
PWX-11122 | Portworx now supports changing its port range dynamically. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-11101 | If nodes were decommissioned with pending snapshots, those snapshots contained references to the decommissioned node. User impact: New nodes sometimes failed to come up. Resolution: When a node is now decommissioned, Portworx removes any pending snapshots which had references on the decommissioned node. |
PWX-10441 | Due to a race condition in the logic which handles volume attachments during a Portworx restart, sharedv4 volumes could be tagged as attached when they were not. User impact: This caused stale entries in /etc/exports , which led NFS to error out.Resolution: Portworx no longer experiences a race condition at restart, and no longer creates stale entries in /etc/exports . |
2.3.4
February 3, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10726 | Portworx can now be installed on OpenShift 4.3 when coupled with Portworx Operator 1.1.1. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10974 | In version 2.3.3, Portworx erroneously showed multi-part licenses as "expired" when the license with the earliest expiration date expired. Despite this incorrect reporting, multi-part licenses did not expire and the cluster continued functioning normally. User Impact: Users may have seen multi-part licenses erroneously marked "expired". Resolution: Portworx now correctly displays multi-part license expiration dates. |
PWX-10967 | PX-Migrate erroneously indicated success when migrating volumes between clusters with the same internal IP addresses. User Impact: Migrations under these circumstances failed, but users saw Portworx indicate success. Resolution: Px-Migrate now successfully migrates volumes between clusters with the same internal IP addresses. |
2.3.3
January 23, 2020
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10819 | Replica anti-affinity rules have been deprecated. User impact: Volume creation may fail if using a replica anti-affinity volume placement strategy or when restoring a volume using a cloud backup configured with such a policy. Recommendation: Remove anti-affinity rules and use affinity rules with the NotIn and NotEqual operators to achieve the same effect. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10400 | In some situations, a busy volume remained attached even after a pod is terminated in Kubernetes. User impact: Upgrades or other operations relying on the kubectl drain command got stuck on a node with these attached volumes.Resolution: Portworx now detaches these busy volumes from terminated Kubernetes pods. |
PWX-10809 | Portworx ignored the max_drive_set_count field when deployed in disaggregated mode on cloud deployments.User Impact: If an existing node was terminated and replaced without releasing its storage devices, Portworx sometimes brought a new node online as a storage node, exceeding the max_drive_set_count field value.Resolution: Portworx now correctly enforces the max_drive_set_count field values. |
PWX-10627 | Portworx processes license expiration dates based on a combination of Portworx Enterprise and AddOn licenses. If these licenses expired at different times, Portworx would not accurately report when they would expire. User Impact: Users with these licenses may have had their cluster's node capacity reduced unexpectedly and may not have been able to start their cluster if it exceeded the remaining available license capacity. Resolution: Portworx now aligns the license expiration dates and accurately reports when they expire. |
2.3.2
December 18, 2019
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10095 | Portworx now restricts access to SharedV4 volumes to nodes requesting a mount on that volume. |
PWX-10499 | The storage pool expansion operation now supports the auto option. |
PWX-10570 | Portworx now accepts the VAULT_BACKEND input argument. When VAULT_BACKEND is provided, Portworx uses that version of Vault Backend instead of querying Vault's sys/mounts/* directory to fetch all the backends. |
PWX-10535 | If a SharedV4 Portworx volume must be accessed over NFS outside the Portworx cluster, set the label "allow_all_ips=true" on the volume. This will export the volume on 0.0.0.0/0.0.0.0, which allows you to mount this volume on any node accessible in the network. |
PWX-10380 | SharedV4 volumes are now enabled when SELinux is enabled on Portworx nodes. If you expect SELinux labels to be propagated from an NFS client to a server, set the ExportOptions on a volume to security_label . You can use the following command to update the volume option: pxctl volume update --export_options security_label <vol-name> (see the example after this table) |
PWX-10690 | Portworx now uses hourly usage billing. At end of billing cycle, customers are charged by the number of hours Portworx ran rather than the maximum number of nodes used in a given billing cycle. |
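For PWX-10380, the command from the row above with a placeholder volume name, plus a verification step; the inspect step is an assumption about how you might confirm the change:

```
# Propagate SELinux labels from NFS clients to the server for this sharedv4 volume
pxctl volume update --export_options security_label <vol-name>

# Verify the volume's export options were updated
pxctl volume inspect <vol-name>
```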
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10366 | Portworx did not delete node region or zone values when instructed. User Impact: Portworx continued to show deleted node region and zone labels after users deleted them. This issue persisted over restarts. Resolution: Portworx now properly deletes these labels and replaces them with the default value. |
PWX-10381 | Portworx enabled a license feature intended only for cloud deployments in on-prem clusters. User Impact: This feature transferred licenses of offline storageless nodes to available nodes when running on an on-prem cluster. Resolution: Portworx now only enables this feature when deployed in cloud environments. |
PWX-10468 | On nodes which were also volume servers and had attached SharedV4 volumes, Portworx did not restart application pods when it entered maintenance mode or was decommissioned. User Impact: Users experienced I/O errors caused by missing application pods. Resolution: Portworx nodes now delete the application pods to recover the application. |
PWX-10575 | On AKS clusters with VM scale sets, if a Portworx node with cloud drives failed to bootstrap, detach operations also failed. User Impact: Cloud drives remained attached to non-existent Portworx nodes. Resolution: Portworx now properly detaches cloud drives if it fails to bootstrap a node. |
PWX-10525 | Portworx frequently queried etcd to retrieve the storage spec and check storage pool status and pending drive operations. User Impact: These frequent queries placed an unnecessary load on etcd, resulting in higher than expected resource usage. Resolution: This fix limits the periodic calls and makes them only when necessary: on a version update. |
PWX-10455 | A failure during a volume create operation can result in a partially formatted volume. A subsequent attach on this volume will retry the formatting operation. In the case of xfs volumes, this formatting operation can fail if the new operation finds an xfs signature on the volume from the previous incomplete operation. User Impact: Partially formatted xfs volumes could not be attached. Resolution: Portworx now uses the force flag when retrying the format operation. |
PWX-10657 | The etcdv3 client Portworx uses currently contains the following critical bug: https://github.com/etcd-io/etcd/pull/10911. When connected to a secure etcd cluster, if the first endpoint goes offline, the etcd client does not failover and fails to create a new connection. User Impact: Portworx restarts and does not reconnect to the etcd cluster. Resolution: After restarting, Portworx now reshuffles the list of endpoints so that the etcd client reconnects to the cluster. |
PWX-10701 | In the pxctl cluster options update command, Portworx did not use the configured value associated with the SnapReservePercent field for overcommit rules if no label selectors were specified. User Impact: Users could not change from the default SnapReservePercent value. Resolution: The SnapReservePercent value can now be properly configured. |
PWX-10685 | Portworx accepted invalid inputs to the pxctl sv drive add --operation status command. User Impact: Users adding cloud drives were unable to see the status of their add operations. Resolution: Portworx now allows only device paths in pxctl service drive add --operation status command. |
PWX-10632 | With the new BlueStore backend, Ceph no longer uses an ext4 formatted backend. As a result, Ceph doesn't mount the drives and Bluestore opens the devices without the o_excl flag.User Impact: When installing with the -a option, Portworx saw the device as "not in use" and picked it up as its storage device. Resolution: Portworx now uses additional filters based on the device name and on-disk signature to prevent this. |
2.3.1.2
December 12, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10681 | For Portworx deployments using an internal KVDB with kubelet running as a docker container, a crash or other interruption which downs both Docker and Portworx can leave an outdated socket file on the node. On restart, Docker attempts to reconnect to Portworx, but Portworx waits on kubelet causing a cyclic dependency. User Impact: Crashes downing both Portworx and Docker without the chance for cleanup rendered both services unable to recover. Resolution: This fix attempts to address this cyclic dependency. Portworx responds to the outdated socket requests as not available , allowing Docker to progress through startup. |
2.3.1
November 18, 2019
New Features
- The pxctl service pool expand command is now available, allowing you to expand storage pools by adding drives and consuming unused drive capacity. See the Expand your storage pool size section of the documentation for more information.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10148 | If you're using Portworx on Microsoft Azure, Portworx can now expand storage pools by resizing disks. |
PWX-10332 | Portworx now provides more descriptive error messages for pool expansion failures. |
PWX-10442 | Added a new flag to the volume list command, allowing you to list your volumes per pool UUID: pxctl volume list --pool-uid <uuid> |
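For PWX-10442, a short example; the pool UUID is a placeholder (you can find it in the output of pxctl service pool show):

```
# List the volumes provisioned on a specific storage pool
pxctl volume list --pool-uid <pool-uuid>
```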
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10414 | Storage pools sometimes fail to come back online after a disk is added as part of a pool expand operation. User Impact: Impacted storage pools may remain down, impacting apps. |
2.3
November 12, 2019
New Features
- Introducing new ways to control volume provisioning: customize provisioning ratios, disable thin provisioning, or disable provisioning entirely.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10275 | The pxctl service pool update <pool_ID> --labels command functionality has improved. Previously, entering the command would overwrite any existing labels, so users wishing to add a label had to keep track of and repeat the existing labels they wanted to persist. With the improved functionality, users no longer need to repeat existing labels when adding new ones. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10155 | Portworx storage pool labels do not inherit Kubernetes node labels. User Impact: PVC creation relying on Kubernetes node labels fails. |
PWX-10239 | Entering the pxctl service drive add -o status command with a --spec flag included causes Portworx to incorrectly add drives. User Impact: Users entering a status command with the conflicting --spec flag can erroneously add new drives. Resolution: With 2.3, Portworx no longer accepts these malformed commands as drive add operations. |
PSP-1978 | Portworx occasionally causes a read/write operation to wait indefinitely on workloads with a large number of overlapping writes. User Impact: Impacted volumes enter a read-only state or become unresponsive. |
2.2.0.5
December 19, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10657 | The etcdv3 client Portworx uses currently contains the following critical bug: https://github.com/etcd-io/etcd/pull/10911. When connected to a secure etcd cluster, if the first endpoint goes offline, the etcd client does not failover and fails to create a new connection. User Impact: Portworx restarts and does not reconnect to the etcd cluster. Resolution: After restarting, Portworx now reshuffles the list of endpoints so that the etcd client reconnects to the cluster. |
PWX-10456 | Portworx Inc. currently packages filesystem dependencies required for Linux kernels into an archive in the Portworx container. Under this current scheme, Portworx does not contain new versions of Linux kernels released after it in the archive. User Impact: Portworx fails to install on clusters using newer versions of RHEL 8 kernels. Resolution: During installation, Portworx now checks mirrors.portworx.com for the latest filesystem dependencies required for running Linux kernels if it cannot find them locally. |
2.2.0.4
December 12, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10681 | For Portworx deployments using an internal KVDB with kubelet running as a docker container, a crash or other interruption which downs both Docker and Portworx can leave an outdated socket file on the node. On restart, Docker attempts to reconnect to Portworx, but Portworx waits on kubelet causing a cyclic dependency. User Impact: Crashes downing both Portworx and Docker without the chance for cleanup rendered both services unable to recover. Resolution: This fix attempts to address this cyclic dependency. Portworx responds to the outdated socket requests as not available , allowing Docker to progress through startup. |
2.2.0.3
December 10, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10661 | Redacted VMware vSphere environment variable values from Portworx logs. |
2.2.0.2
November 27, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10525 | Portworx periodically queries etcd to retrieve the storage spec and check storage pool status and pending drive operations. This fix limits the periodic calls and makes them only when necessary: on a version update. |
2.2.0.1
October 25, 2019
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-10125 | The Portworx pxctl service CLI command now supports pool deletion. |
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10204 | For Portworx version 2.2.0 on IBM Cloud: If users install Portworx outside of the catalog, Portworx incorrectly starts the metering agent and cannot report usage to the billing server. User Impact: After 72 hours, users' clusters enter maintenance mode |
2.2
September 30, 2019
New Features
- Introducing Storage pool caching; this feature is available on new clusters only.
- Portworx now features stateful application backup and cloning, allowing you new ways to manage your stateful applications.
- Visit PX-Central, a place where you can learn all about getting started with Portworx.
- New jq filtering documentation demonstrates how you can filter pxctl output.
- The Portworx CSI driver is now generally available for Kubernetes 1.13 and higher.
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-9026 | Previously, to enable sharedv4 volumes while installing Portworx, users were asked to provide the ENABLE_SHARED_AND_SHARED_v4 environment variable. With 2.2, this environment variable is no longer required and sharedv4 volumes are enabled by default. |
PWX-8165 | The pxctl cluster provision-status command with the --show-labels flag now displays storage pool labels. |
PWX-9956 | When encrypting volumes with CSI, users can pass the secret information used for volume encryption through storage class templatized parameters. |
PWX-9888 | When a Portworx volume is resized, the volume usage metrics are now immediately updated in Prometheus. |
PWX-9831 | Users can use custom labels to designate nodes as KVdb nodes through the PX_METADATA_NODE_LABEL environment variable. |
PWX-9769 | Users can specify the base path for the Vault secret store using the VAULT_BASE_PATH environment variable. |
PWX-9727 | With 2.2, Portworx raises an alert every time a pool is resized. |
PWX-9481 | Additional State column for the pxctl clouddrive list and pxctl clouddrive inspect commands makes it easier to see the state of a particular drive. |
PWX-8951 | The pxctl command-line utility now allows users to update the credentials and the cloudsnap schedule for a volume. |
PWX-9976 | With 2.2, users can update a node's CloudDriveSet labels by running the pxctl clouddrive update-labels --nodeid <node-id> command. This is useful for when the px/metadata-node label must be set to true on a node which is part of an operational cluster (see the sketch after this table). |
PWX-8534 | The pxctl cloudsnap list command provides pagination and the users can now specify filters for listing only certain types of backups. By default, migration-related backups are not displayed. |
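For PWX-9976, a minimal sketch of marking an existing node as KVDB-capable and refreshing its CloudDriveSet labels. The node name and node ID are placeholders, and the kubectl step assumes the px/metadata-node node label is the one in use, as described elsewhere in these notes:

```
# Label the Kubernetes node so it is eligible to run the internal KVDB
kubectl label node <node-name> px/metadata-node=true

# Refresh the CloudDriveSet labels for the corresponding Portworx node
pxctl clouddrive update-labels --nodeid <node-id>
```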
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-9991 | Using Talisman to upgrade Portworx from version 2.0.x to 2.1 may create a DaemonSet called portworx-api with a wrong IP address. As a result, the DaemonSet is not present on the host network. User Impact: This prevented Stork from finding the Portworx service. In turn, users experienced issues while trying to create cluster pairs. Resolution: With 2.2, the upgrade process creates the DaemonSet with the correct IP address. |
PWX-9777 | If the size of the timestamp file exceeds 10G, Portworx may unexpectedly restart and become stuck in a restart loop. User Impact: Users could not access volumes. |
PWX-9765 | A recent update to the IBM IAM service requires every client to set the "Content-Length" header in their HTTP requests. User Impact: This caused users to see a BXNIM0109E: Property missing or empty error. Resolution: With 2.2, Portworx sets the "Content-Length" header in every request. |
PWX-9842 | Openshift on IKS: Pods may get stuck in the terminating state during teardown. User Impact: This left stale paths for the volumes, causing the pod teardown to get stuck. |
PWX-9964 | Under certain circumstances, the volume delete operation may fail. User Impact: Users may see a Client.Timeout exceeded while awaiting headers error, and the delete operation may get stuck for approximately 5 minutes. |
PWX-9889 | Under certain circumstances, Portworx exposes empty volume usage metrics to Prometheus. User Impact: Users are unable to see the correct volume usage metrics in Prometheus. |
PWX-9883 | When running Portworx with the internal KVdb, the Portworx service may restart on the online nodes during a KVdb failover. User Impact: Users see an Etcd cluster not reachable error. |
PWX-9855 | Rebooting the internal KVdb nodes within a short interval resulted in the configmap entries getting cleaned up. User Impact: The internal KVdb doesn't start, and the cluster goes out of quorum. Resolution: Portworx now checks the node labels before performing a KVdb node failover; if the node labels don't allow the node to act as a KVdb node, the offline KVdb node is not removed from the existing cluster. |
PWX-9826 | If all the nodes are rebooted at the same time, an application node may try to start the internal KVdb, even though it doesn't have the px/metadata-node=true label. User Impact: A storage cloud-drive may get attached to a KVdb node or vice versa. Resolution: Portworx now makes sure that the KVdb drives are only attached to the designated KVdb nodes, and that the application nodes don't attach a DriveSet belonging to a KVdb node. |
PWX-9334 | A bug in the Grafana dashboard caused storageless nodes to display as down in Grafana. |
PWX-10031 | If a migration is performed while only the source cluster is licensed for Disaster recovery, Portworx marks the migration as successful even if the resources are not migrated correctly. User Impact: The migrated volume is empty |
PWX-10000 | The pxctl volume usage command errors out on nodes with more than 100 volumes attached. User Impact: Users see the following error message: Error: Get http://localhost:9001/v1/osd-volumes/usage/<VOL_ID>: net/http: request canceled (Client.Timeout exceeded while awaiting headers) . |
PWX-6894 | For sharedv4 volumes, Portworx provides multi-RW access to the volumes by maintaining a single server and multiple clients. If this type of server goes down, the clients receive an Unmount request for the volume. This results in a server with a dangling client reference. User Impact: The sharedv4 volumes might end up attached even if there is no consumer. |
PWX-9974 | Fixed an issue in which the OpenStorage SDK sends the redirect flag for every request to detach a volume. User Impact: Under certain circumstances, the operation of deleting or detaching a volume could timeout after 5 minutes. |
PWX-9625 | Installing Portworx using a Helm chart on Kubernetes 1.14.3 fails. User Impact: The user sees several helper pods failing because they try to pull an image which doesn't exist. Resolution: Fixed by replacing the kubectl repo with https://hub.docker.com/r/bitnami/kubectl . |
PWX-10061 | When running Portworx on Kubernetes, the upgrade process can reset the custom port settings in the portworx-service spec to their default values. User Impact: Users see an HTTP error 404 error in kubelet while trying to mount a volume. |
Known Issues
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description | Workaround |
---|---|---|
PWX-10049 | CSI: Due to an issue in Kubernetes 1.13, if the Kubelet or Portworx goes offline unexpectedly on a node where a volume is attached, the Kubelet will leave orphaned pod directories under /var/lib/kubelet/pods/* . The kubelet logs will report these errors every 2 seconds unless this directory is manually cleaned up. | Move or delete the orphaned pod's directory to stop these logs from showing up. |
PWX-10057 | CSI: In Kubernetes 1.14 with the Portworx CSI driver, unmount may fail intermittently if a volume is attached to a node where Portworx is down. | If unmount fails, retry once Portworx is back online. |
PWX-10056 | PX-Security: With security enabled, the pxctl cloudmigrate status command returns a blank result even when there is a cloud migration going on. | Use the pxctl cloudmigrate status --task_id <your_cloud_migration_task_ID> command to view the migration status. |
PWX-8421 | Setting collaborator access on a snapshot using pxctl may return an error. | pxctl properly updates collaborator access, despite returning an error. |
2.1.7
December 12, 2019
Fixes
The following issues have been fixed:
Issue Number | Issue Description |
---|---|
PWX-10681 | For Portworx deployments using an internal KVDB with kubelet running as a docker container, a crash or other interruption which downs both Docker and Portworx can leave an outdated socket file on the node. On restart, Docker attempts to reconnect to Portworx, but Portworx waits on kubelet causing a cyclic dependency. User Impact: Crashes downing both Portworx and Docker without the chance for cleanup rendered both services unable to recover. Resolution: This fix attempts to address this cyclic dependency. Portworx responds to the outdated socket requests as not available , allowing Docker to progress through startup. |
2.1.5
September 13, 2019
Fixes
The following issues have been fixed in the 2.1.5 release:
Issue Number | Issue Description |
---|---|
PWX-9911 | When running PX-DR, old cloudsnaps might not have been deleted from the objectstore since deletions were triggered only on full backups. User impact: Over time, the objectstore on the PX-DR destination cluster may run out of space. Resolution: With 2.1.5, the next DR cleans up old cloudsnaps and frees up space in the objectstore. |
PWX-9892 | Asynchronous DR causes a large number of alerts composed of long strings, which resulted in high memory usage. User impact: ETCD disk usage was unnecessarily high. |
PWX-9873 | The DC/OS ACS token used to communicate with DC/OS secrets APIs expires every 5 days, and the auth workflow does not refresh this token when it expires. User impact: This caused users to see an Unauthorized error, which required them to restart Portworx. |
PWX-9811 | In PKS, unset regions impact volume provisioning. User impact: Volumes would be improperly provisioned into a single region. Resolution: With 2.1.5, an unset region will be set to "default". |
2.1.4
August 26, 2019
Fixes
The following issues have been fixed in the 2.1.4 release:
Issue Number | Issue Description |
---|---|
PWX-9781 | Cloudsnap backup operations may fail during catalog collection. User impact: User operations relying on Cloudsnap may fail. Resolution: With 2.1.4, catalog collection has been disabled. |
2.1.3
August 8, 2019
Improvements
Portworx has upgraded or enhanced functionality in the following areas:
Improvement Number | Improvement Description |
---|---|
PWX-8793 | In order to migrate volumes encrypted with an AWS KMS cluster-wide secret between clusters, both clusters must have the same cluster-wide secret. With 2.1.3, Portworx introduces new CLI commands. These commands allow you to dump the cluster-wide secret from one cluster in order to upload the same cluster-wide secret to the destination cluster where encrypted volumes will be migrated. For more information, visit the Dump and upload cluster-wide secrets article. |
Fixes
The following issues have been fixed in the 2.1.3 release:
Issue Number | Issue Description |
---|---|
PWX-8902 | On older versions of Kubernetes configured to use the CRI-O container runtime on CoreOS/RHEL nodes, volume mount operations failed with the following error message: selinux is enabled on docker. Disable selinux by removing --selinux-enabled from dockerd arguments User impact: Kubernetes applications running on this particular configuration attempting to use a shared volume never receive their volume and fail to fully start. Resolution: With 2.1.3, Kubernetes application start as expected. |
PWX-9610 | Portworx used invalid characters for Prometheus metric labels. User impact: Customers using Prometheus experienced errors when attempting to view metrics. Resolution: With 2.1.3, Portworx replaces invalid '/' characters with '_' characters when serving metrics to Prometheus. |
PWX-9632 | Portworx occasionally detected a public network interface, causing the internal ETCD to attempt to use a public IP address to communicate over blocked ports. User impact: Nodes were unable to form a quorum. Resolution: With 2.1.3, Portworx no longer detects public network interfaces. |
Known Issues
Portworx is aware of the following issues, check future release notes for fixes on these issues:
Issue Number | Issue Description | Workaround |
---|---|---|
PWX-9607 | The pxctl volume usage command may fail, causing the storage layer to become unresponsive and freezing storage I/O on the nodes where the volume is provisioned. | With 2.1.3, this command has been hidden. If you're still on 2.1.2, avoid entering this command. If storage does become unresponsive as a result of pxctl volume usage , reboot the nodes on which your volume has been provisioned. |
2.1.2
July 24, 2019
Key Features
- Cloud drive support for Microsoft Azure
- Enhanced Volume placement strategies for advanced volume provisioning rules
- Support for Red Hat Enterprise Linux 8 with CRI-O
Enhancements
PWX-8635 - Add support for the CRI-O container runtime.
User Impact: Portworx has added support for the CRI-O container runtime, with the following limitations:
- The progress bar while downloading images is not available.
- Progress information while installing Portworx binaries is not available.
PWX-8665 - Support for optimized restores as a cluster option.
User Impact: Users can now enable optimized restores as a cluster level setting using the CLI.
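A minimal sketch of what enabling this cluster-level setting could look like, assuming the `pxctl cluster options update` command with an `--optimized-restores` flag (the exact flag name is not spelled out in this note and may differ by release):

```
# Hedged example: turn on optimized restores for the whole cluster.
# Verify the exact flag with `pxctl cluster options update --help` on your version.
pxctl cluster options update --optimized-restores on

# Confirm the current cluster options
pxctl cluster options list
```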
PWX-9061 - Add ability to remove path-style enforcement for AWS S3.
User Impact: With 2.1.2, Portworx now supports disabling path-style enforcement for S3 with the `--disable-path-style` parameter.
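A hedged example of where such a parameter would typically be used, on credential creation for an S3-compatible objectstore; the `--disable-path-style` flag name is taken from this note, and the other flags and placeholder values are illustrative only:

```
# Hedged sketch: create an S3 credential without path-style enforcement.
# Access keys, region, endpoint, and the credential name are placeholders.
pxctl credentials create \
  --provider s3 \
  --s3-access-key <ACCESS_KEY> \
  --s3-secret-key <SECRET_KEY> \
  --s3-region us-east-1 \
  --s3-endpoint s3.amazonaws.com \
  --disable-path-style \
  my-s3-credential
```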
Key Fixes
PWX-9352 - Upgrading from 2.0.3.7 to 2.1.1 fails.
User Impact: If you have internal KVDB clusters, upgrading from 2.0.3.7 to 2.1.1 is not supported. Portworx by Pure Storage recommends upgrading from 2.0.3.7 to 2.1.2.
PWX-8730 - Allow storageless nodes to join when licenses have been exhausted if offline storageless nodes remain.
User Impact: Previously, rolling upgrades of customer environments in cloud auto-scaling groups may have exceeded licensing quota if storageless nodes were created before offline nodes were removed from the cluster. With 2.1.2, offline nodes no longer count against licensing count.
Recommendations: Upgrade to 2.1.2 to support rolling upgrades in cloud auto-scaling environments. If you are unable to upgrade to 2.1.2, you can work with Portworx support to get temporary licenses that increase the node count until an upgrade to 2.1.2 can be planned.
PWX-9042 - Secrets can be overwritten with certain commands.
User Impact: With 2.1.2, it is no longer possible to unintentionally overwrite secrets using a combination of the `pxctl secrets set-cluster-key` and `pxctl generate secret` commands.
PWX-8953 - (Consul) KVDB does not pass the CA certificate file to the Consul client.
User Impact: With 2.1.2, CAFile is now properly sent to the Consul client.
PWX-8966 - (ASG) Limit the maximum number of drives you can attach to a node.
User Impact: Previously, attempting to attach too many drives resulted in errors. With 2.1.2, Portworx introduces a maximum limit of 12 drives per node, and will respond to attempts to add more than 12 drives with the following error: cannot provide more than 12 number of drives per node.
PWX-7374 - AWS SC1 and ST1 EBS volumes are unsupported.
User Impact: It was previously not possible to add SC1 or ST1 EBS volumes using the `pxctl service drive add` command. With 2.1.2, Portworx now supports SC1 and ST1 EBS drive types.
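As a hedged illustration, adding such a drive on a cloud-drive-enabled node might look like the following; the `--spec` format mirrors the "type=<EBS type>,size=<GiB>" convention mentioned elsewhere in these notes, and the exact syntax may vary by version:

```
# Hedged sketch: add an st1 EBS drive to the current node.
# On some versions an --operation or -d argument may also be required;
# check `pxctl service drive add --help`.
pxctl service drive add --spec "type=st1,size=500"

# Inspect the resulting pools and drives
pxctl service pool show
```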
PWX-8792 - Add backoffs to AWS CloudDrive API calls.
User Impact: With 2.1.2, Portworx now slows calls to the AWS Cloud Drive API at increasingly long intervals if it encounters resource limit errors.
PWX-8701 - The internal KVDB should use DNS names for peer urls.
User Impact: The internal KVDB tracks peer URLs as potentially ephemeral IP addresses. If the entire cluster becomes unavailable in an outage, Portworx may be unable to reconnect. With 2.1.2, internal KVDB now keeps track of nodes using a DNS, and can therefore reconnect in the event of a total cluster outage.
PWX-8904 - Introduce timeout on storage requests to avoid possible hung situations when Portworx starts.
User Impact: Previously, Portworx may have failed to start, displaying no active I/O operations. With 2.1.2, storage requests now time out to avoid possible hung situations on node start.
PWX-8606 - PVC creation fails if no enforcement type is specified in `VolumePlacementStrategy`.
User Impact: Previously, if you did not specify an enforcement type in `VolumePlacementStrategy`, PVC creation failed. With 2.1.2, Portworx will default to `enforcement: required` if you do not specify an enforcement type.
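For context, a minimal VolumePlacementStrategy that states the enforcement mode explicitly is sketched below; the apiVersion and label values reflect later documentation and are assumptions, and with this fix omitting `enforcement` behaves the same as `required`:

```
# Hedged sketch: apply a VolumePlacementStrategy with an explicit enforcement mode.
cat <<'EOF' | kubectl apply -f -
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: ssd-replica-affinity
spec:
  replicaAffinity:
  - enforcement: required        # default when unspecified, per this fix
    matchExpressions:
    - key: media_type
      operator: In
      values:
      - SSD
EOF
```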
PWX-8959 - The Snapshot API does not return a typed error if a snapshot already exists.
User Impact: With 2.1.2, this error message is improved. If you try to create a new snapshot where one already exists, a clearly typed error is now returned.
PWX-9126 - The `nodiscard` option is impacted by volume resize.
User Impact: Previously, resizing a volume sometimes reset a volume’s `nodiscard` option configuration. With 2.1.2, this has been fixed.
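A hedged way to check this behavior on a test volume is sketched below; the `--nodiscard` and `--size` flags follow current `pxctl` conventions and may differ slightly in 2.1.x, and the volume name is a placeholder:

```
# Hedged sketch: create a volume with discards disabled, resize it,
# then confirm the nodiscard setting was retained.
pxctl volume create --size 100 --nodiscard demo-vol
pxctl volume update demo-vol --size 200
pxctl volume inspect demo-vol | grep -i discard
```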
PWX-8690 - Wipe and upgrade scripts fail on Kubernetes 1.14.
User Impact: Due to a feature deprecation with Kubernetes 1.14, wipe and upgrade scripts did not work. With 2.1.2, the `px-wipe` command now correctly removes Portworx pods on Kubernetes 1.14.
PWX-9175 - Portworx fails to detect the device path on certain types of AWS ec2 instances.
User Impact: With 2.1.2, Portworx no longer fails to detect the device path on certain types of AWS ec2 instances and operating system combinations.
PWX-7851 - Storage nodes could be removed by the Kubernetes autoscaler.
User Impact: With 2.1.2, Portworx pods are annotated to prevent node removal by the Kubernetes autoscaler.
PWX-7493 - Portworx selects undesired network interfaces during autodetection.
User Impact: With 2.1.2, Portworx will avoid selecting the undesired network interfaces during configuration.
PWX-9053 - The `pxctl service node-wipe` command fails to wipe MDRAID devices.
User Impact: With 2.1.2, MDRAID devices are now correctly wiped with the node wipe command.
PWX-9046 - Portworx doesn’t recognize MDRAID partitions as journal devices.
User Impact: With 2.1.2, Portworx now recognizes MDRAID partitions as journal devices.
PWX-9054 - Portworx sometimes fails to detect MDRAID partitions.
User Impact: With 2.1.2, Portworx now detects MDRAID partitions when installed with the `-A` option.
PWX-8938 - Add “sync” and “noac” sharedv4 mount options.
User Impact: With 2.1.2, “sync” and “noac” sharedv4 mount options are now available.
PWX-8893 - Add support for `max_storage_nodes_per_zone` on AKS clusters on Microsoft Azure.
User Impact: With 2.1.2, Portworx supports the max_storage_nodes_per_zone parameter, allowing it to limit storage nodes on AKS availability sets.
PWX-9263 - Volumes can now be created with the XFS filesystem without needing to provide the "force_unsupported_fs_type" option.
User Impact: Previously, provisioning an XFS volume erroneously created an ext4-formatted volume. With 2.1.2, provisioning an XFS volume now creates a properly formatted XFS volume.
PWX-8712 - Verify completion of a backup before marking it as complete.
User Impact: With 2.1.2, Portworx now verifies that a backup operation has completed before recording a backup as completed.
PWX-8733 - During upgrade, pods using volume subpaths produce errors.
User Impact: With 2.1.2, pods using volume subpaths are correctly detected and bounced.
PWX-9476 - OCI-Monitor should not restart the Portworx service unless required.
User Impact: Previously, the OCI-Monitor restarted Portworx any time it detected a change in the configuration file. With 2.1.2, the OCI-Monitor will only restart Portworx if it’s necessary.
Errata
PWX-9473 - Portworx fails to attach a cloud drive to an Azure AKS scale set VM.
User Impact: Portworx fails to attach a cloud drive with the following error messages: Failed to detach disk... and Failed to attach drive...
Recommendations: Portworx is working with Microsoft to resolve this issue. In the meantime, Portworx by Pure Storage recommends deleting the impacted VM manually in Azure and allowing it to redeploy.
2.1.1
May 4, 2019
Key Fixes
- PWX-8668 - Kubernetes: Pods get stuck again on "Terminating 0/1", failing to unmount shared volumes because they are not found
- PWX-8652 - AWS: Allow specifying custom tags for EBS volumes in spec
- PWX-8643 - Storage pool does not recover out of storage full condition after deleting volumes
- PWX-8529 - Cloudsnap restore very slow for backups that have dependent incremental with large data sets
2.1.0.1
May 22, 2019
Key Fixes
PWX-8870 - Portworx Metro DR didn't mark a node as offline
2.1.0
April 19, 2019
Key Features and Enhancements
- PX-Security
- PX-DR
- Automated application level scheduled snaps and cloudsnaps
- Automated app-consistent cluster to cluster migration
- Optimized incremental cloudsnap restores
Key Fixes
- PWX-7160 - Fix issues with cloudsnaps to IBM Objectstore
- PWX-7481 - Shared volume failed to detach with an error that the volume is mounted, even though the volume was not mounted
- PWX-7650 - Portworx install errors w/ "tar: .: file changed as we read it"
- PWX-7794 - License: Aggressive node-decommissions and "Cluster max capacity" error handling
- PWX-7869 - Cloudsnaps: Handle cloud backup deletes in background
- PWX-7891 - pxctl service nw --all failed to delete multi-path devices
- PWX-7951 - In Kubernetes OCI-Mon restarts during updates may leave Portworx down
- PWX-7963 - Cloudsnap restore for xfs volumes has wrong fs type
- PWX-7968 - Cloudsnaps cleanup fails because of missing start timestamp in metadata
- PWX-7989 - Skip cleanup when a pre-existing node fails to join due to licensing
- PWX-8025 - Alert for storage full is deleted even though the condition still exists
- PWX-8033 - DC/OS portworx-mongo now works with nested folders
- PWX-8055 - pxctl import: copy links after all regular files have been copied
- PWX-8060 - Cloudsnap restores will fail when the volume is heavily fragmented
- PWX-8064 - Reducing the HA level of an aggregated volume may cause any active cloud backups on the volume to fail. New cloud backup can be restarted once the HA level has been reduced on the volume
- PWX-8261 - Startup issue with Debian 9 (4.19.0-0.bpo.2-cloud-amd64)
- PWX-8297 - K8S: OCI-Mon must force-pull px-enterprise before reinstalling incomplete Portworx
- PWX-8311 - IKS: OCI Monitor not starting Portworx when both Docker and ContainerD services running
- PWX-8334 - Fixed install progress-bar
- PWX-8335 - Handle mpath device partitions in nodewipe
- PWX-8403 - Error when trying to mount a sharedv4 volume with encryption. The first pod comes up okay, but second and subsequent pods mount results in failure
- PWX-8472 - OpenShift: Portworx mounts leak. Each portworx-service restart will increase the number of mounts
- PWX-8504 - Cloudsnaps: No incremental backups created for cloud backups that have user tags (created by external schedulers)
Errata
- PWX-8470: ASG: CLI does not update the metadata device name if the device name changes after a restart
2.0.3.8
Key Fixes
PWX-9486 - Changes to Portworx runtime configuration.
User Impact: This fix ensures consistent sync times on the backend.
2.0.3.7
June 19, 2019
Portworx 2.0.3.7 works with Stork 2.1.2. Here are the release notes for Stork 2.1.2.
Key Features and Enhancements
Feature: PWX-7549 - Provide the ability to add multiple drives to a Portworx node in a single command.
Customer Impact: None. This is an enhancement, as previously Portworx allowed only one drive to be added at a time.
Recommendations: Add only one drive at a time, or upgrade to 2.0.3.7 to be able to add multiple drives.
Key Fixes
Issue: PWX-9203 - Connections get reset in networks with long idle time with bursty traffic
Customer Impact: In very rare cases, in specific network setups, the connections between the nodes were seen to be disrupted after many hours of idle time.
Recommendations: 2.0.3.7 implements improved connection keep-alives to keep node-to-node connections active in case of long idle times.
Issue: PWX-9126 - If a volume has the 'nodiscard' option set, the option is not retained after the volume gets resized.
Customer Impact: For customers who have used this option to increase performance, the performance may drop after a volume gets resized.
Recommendations: There is no workaround; upgrade to 2.0.3.7.
Issue: PWX-9118 - Adding an LVM drive partition as journal device failed.
Customer Impact: Adding a journal device which is an LVM drive partition will fail, but the error is not clear in the logs. In general, it is not recommended to use LVM partitions as journal devices because of the performance limitations of such devices.
Recommendations: Don't use LVM partitions as a journal device. Upgrade to 2.0.3.7 for better handling.
Issue: PWX-9055 - Installing Portworx with an LVM volume that has a similar name to another LVM volume will deactivate the other LVM volume (not used by Portworx). For example, if Portworx is asked to use /dev/mapper/volone and there is another LVM volume with the name /dev/mapper/volone1, volone1 would get deactivated.
Customer Impact: In this case, if the customer was using the raw LVM volume for another application, that volume will be deactivated.
Recommendations: Portworx by Pure Storage recommends inspecting the system beforehand and using volume names for LVM volumes that do not overlap with the existing volumes. To completely avoid the issue, Portworx recommends upgrading to 2.0.3.7.
Issue: PWX-8717 - Adding a journal device to a storage-less node right after a storage device was added results in Portworx crashing and restarting.
Customer Impact: When a user tries to add a journal device to a storage-less node right after adding a storage device, Portworx might crash and restart. In general, there is no application error as Portworx restarts within the 10 min device timeout interval, but there may be a very brief slow down of the I/Os.
Recommendations: The workaround is to restart Portworx after a storage device was added and then add a journal device. Upgrading to 2.0.3.7 will eliminate the need for a restart.
Issue: PWX-9054 - Using the `-A` option during install does not recognize mdraid partitions.
Customer Impact: If the storage disks provided to Portworx during install are from mdraid partitions, Portworx will not recognize them and will come up in storageless mode.
Recommendations: There is no workaround. Upgrading to 2.0.3.7 will enable adding mdraid partitions
Issue: PWX-8904 - Restarting Portworx node could result in not being able to complete the startup sequence.
Customer Impact: This was seen in environments where there was heavy I/O with shared and non-shared volumes. Also, there were lots of replicas on the node to which a lot of traffic was being routed. This resulted in a scenario where the internal credits were exhausted. 2.0.3.7 increased the credits and resource allocation.
Recommendations: Upgrading to 2.0.3.7 resolves this issue.
Issue: PWX-9017 - De-couple OCI-mon constraints from Portworx systemd service limits
Customer Impact: With previous releases, if the user changed the oci-mon system constraints/limits, they would get passed on to the Portworx systemd service.
Recommendations: This is addressed in 2.0.3.7, where the oci-mon limits are no longer passed on to the Portworx systemd service.
Issue: PWX-8846 - Storage-less node ended up with stuck I/Os and needed to be restarted.
Customer Impact: A storage-less node stopped processing I/Os and had to be restarted to have the resources released.
Recommendations: The storage-less node must be restarted to release the resources so application I/O can begin to run. Resources used by volumes which were detached while the node was down were not released when the node came back online. This has been resolved so these resources are now properly released to the global internal resource pool in Portworx.
Issue: PWX-9042 - Do not allow overwriting of secrets
Customer Impact: Users can potentially overwrite their encrypted volume secrets if they end up using the same name for the new secret.
Recommendations: Users must double-check whether a secret with the same name already exists before adding new secrets to the same cluster. 2.0.3.7 implemented a check that looks for existing names before accepting a new secret. This prevents the secret from being overwritten.
2.0.3.6
May 30, 2019
Key Fixes
- PWX-8740 - Cloudsnaps: Do not create multiple grpc clients to px-storage
- PWX-8299 - Core with 'concurrent map writes' when object store on the remote cluster was unreachable
2.0.3.5
May 28, 2019
Key Fixes
- PWX-5885 - px-runc install is missing option to specify raidlevel
- PWX-7851 - Add pod annotation to prevent scale down of storage nodes in Kubernetes when using autoscaler
- PWX-8701 - Internal Kvdb: Use DNS names for peer URLs instead of IPs for internal kvdb
- PWX-8712 - Cloudsnaps: Verify uploaded objects before marking backup as done
- PWX-8715 - Node index generation fix to avoid same node index generation
- PWX-8733 - Post upgrade to 2.0.3.4: shared volumes errored out because the server endpoint was no longer there
- PWX-8917 - OCI-Mon: Portworx service not restarted after cpu/mem limits updated
2.0.3.4
April 24, 2019
Key Fixes
- PWX-8472 - OpenShift: Portworx mounts leak. Each portworx-service restart will increase the number of mounts
- PWX-8529 - Fix Cloudsnap restore performance for backups that have dependent incremental with large data sets
- PWX-8652 - AWS: Allow specifying custom tags for EBS volumes in spec
2.0.3.3
April 5, 2019
Key Fixes
- Portworx available on the Google Cloud Platform Marketplace: https://cloud.google.com/marketplace/
- PWX-8451 - Block adding metadata device when running with internal kvdb
- PWX-8345 - Node wipe and upgrade doesn't work if Portworx is installed in a namespace other than kube-system
- PWX-8045 - Cloudmigrate fails if credentials use a custom bucket name
- PWX-7891 - pxctl service node-wipe --all failed to delete multi-path devices
- PWX-8261 - Allow fresh install of Portworx on Linux Kernel version 4.9.0-7-amd64 and 4.9.0-8-amd64
2.0.3.2
March 22, 2019
Key Fixes
- PWX-8062 - Portworx cluster running on Kubernetes does not report volumes metrics
- PWX-8136 - Disable kvproxy audit as it causes the etcd client to trigger unnecessary API requests
- PWX-8098 - Portworx fails to start after reboot on a system with LVM drives and auto-configured journal device
Errata
- PWX-8161 - If an LVM partition is added as journal device after node initialization, any subsequent system reboot will need the LVM partition to be made visible before starting Portworx. This can be done by running "partprobe"
2.0.3.1
March 19, 2019
Key Fixes
- PWX-8063 - Startup issue with 4.19.0-0.bpo.2-cloud-amd64
- PWX-8060 - Cloud backup restore fails with JSON unmarshalling error
- PWX-7989 - Fix licensing issue which was leading to reducing the number of nodes allowed in the cluster. Differentiate between NEW and PRE-EXISTING node failing to join the cluster, and do not clean up if PRE-EXISTING nodes were the ones causing the failures.
- PWX-7980 - Do not cleanup CloudDrives when the drives are initialized and have labels
- PWX-7968 - Cloudsnaps cleanup fails because of missing start timestamp in metadata
- PWX-7963 - Cloudsnap restore for xfs volumes has wrong fs type
- PWX-7794 - LIC: Aggressive node-decommissions and "Cluster max capacity" error handling
2.0.3
March 8, 2019
Key Features and Enhancements
- SharedV4 support for DC/OS
- SharedV4 volume encryption support
- Support DC/OS Zookeeper for discovery service when using internal kvdb in DC/OS configurations
- Volume Policy Management Support
- Support Azure Key Vault for Secret Store
- Fix Kubernetes CVE for RUNC
Key Fixes
- PWX-5657 - Fix a corner case where increasing the replication factor of a volume can take much longer when there are multiple levels of volume clones
- PWX-5762 - Add support for Azure Key Vault
- PWX-6868 - Prometheus framework update to add Portworx support
- PWX-7448 - Show proper error for incorrect pxctl commands
- PWX-7468 - node-wiper script to wipe the namespace created by Kubernetes secrets
- PWX-7481 - Shared volume failed to detach with an error that the Volume is mounted while Volume was not mounted
- PWX-7485 - Display an appropriate message when cluster-wide diags can not be collected
- PWX-7491 - Drive provisioning fixes for issues where more drives were created than what was specified in the spec.
- PWX-7512 - Speed up Portworx install in DC/OS clusters by installing in each node in parallel.
- PWX-7513 - In DC/OS, Portworx tasks should restart if they go in LOST state
- PWX-7516 - The portworx-prometheus framework version needed to be corrected.
- PWX-7571 - CloudSnap : Restore fails sometimes with "failed to get metadata of the backup from cloud"
- PWX-7595 - Handle spurious storage pool Full/offline condition
- PWX-7596 - Portworx creates node labels for every PV, causing Prometheus federation scraping issues
- PWX-7604 - Anonymize the secrets for Key Management Systems
- PWX-7605 - DCOS Portworx-Prometheus pod replace does not work as expected
- PWX-7619 - Make KVDB URLs optional
- PWX-7628 - Alertmanager does not run after installing Portworx 2.0.2
- PWX-7639 - DCOS Portworx framework should install with default options from config.json.
- PWX-7650 - Portworx install errors w/ "tar: .: file changed as we read it"
- PWX-7656 - Shared v4 failover operation fails if the management and data interfaces of the Portworx service are different
- PWX-7661 - [stork] Snapshot status not being updated for all cloudsnaps in groupsnapshot
- PWX-7686 - Enable Portworx to install in AWS instances when auto journaling is enabled.
- PWX-7743 - Prevent Portworx install if only the journal disk is given in the install script and no data disks were given.
- PWX-7766 - When a groupsnapshot request times out, allow for the snapshot scheduler to retry in the next interval or ask the user to retry if it is a manual request
- PWX-7773 - runc CVE-2019-5736 fix #3169
2.0.2.3
March 3, 2019
Key Fixes
- PWX-7919 - Geography updates loading etcd causing high CPU usage
2.0.2.2
February 23, 2019
Key Fixes
- PWX-7664 - When a node running 1.7 with empty journal log gets upgraded to 2.0 and the upgraded node is restarted, the node doesn't fully restart on next boot
2.0.2.1
February 8, 2019
Key Fixes
- PWX-7510 - Remove any secrets info from the diags
- PWX-7214 - licensing engine config update improvements
2.0.2
January 26, 2019
Key Features and Enhancements
- PWX-7207 - Allow docker with SELinux for newer Kubernetes versions
- PWX-7208 - Google Cloud KMS integration
Key Fixes
- PWX-6770 - Restart docker apps using shared volumes on DCOS
- PWX-7006 - Cloud migration cancel didn't cancel all the volume migrations
- PWX-7007 - Add an alert when Cloud migration task is canceled
- PWX-7179 - Pool io priority for KOPS io1 volume should be correctly displayed
- PWX-7199 - Enable capacity usage command for centos kernel >= 3.10.0-862
- PWX-7226 - DCOS Portworx: Manually updated values in /etc/pwx/config.json does not persist
- PWX-7267 - Hide unknown/non-handled licenses
- PWX-7271 - 'pxctl secrets gcloud list-secrets' shows unnecessary line in the console output
- PWX-7280 - Logs getting flooded with "18 is not 14len(values)" after upgrading the kernel to 4.20.0-1
- PWX-7304 - Handle journal device "read-only" cases
- PWX-7348 - Handle journal device "offline" cases
- PWX-7364 - Portworx boot stuck at ns mount
- PWX-7366 - Portworx service restart issues including "missing mountpoint", or "cannot open file/directory"
- PWX-7407 - OCI Monitor: Initiates cordoning even when px.ko was not loaded
- PWX-7466 - Kubernetes/Upgrade: Talisman does not support CRI/Containerd
2.0.1.1
January 19, 2019
Key Fixes
- PWX-7431 - Strip the labels on a config map to fit 63 characters.
- PWX-7411 - Portworx does not come up after upgrade to 2.0.1 when auto-detecting a network interface
2.0.1
December 20, 2018
Key Fixes
- PWX-7159 - Persist kvdb backups outside of the host filesystem
- PWX-7225 - AMI based ASG install does not pick up user config
- PWX-7097 - `pxctl service kvdb` should display correct cluster status after nodes are decommissioned
- PWX-7124 - Volume migration fails when the volume has an attached snapshot policy
- PWX-7101 - Enable task ID-based sorting for `pxctl cloudmigrate` commands
- PWX-7121 - Creating a paired cluster results in core files in the destination cluster
- PWX-7110 - Delete paired cluster credentials when the cluster pair is deleted
- PWX-7031 - Cluster migration restore status does not reflect the cloudsnap status when cloudsnap has failed
- PWX-7090 - Core files generated when a node is decommissioned with replicas on the node
- PWX-7211 - Fix daemonset affinity in openshift specs
- PWX-6836 - Don't allow deletion of the Portworx configuration data when the Portworx services are still running in the system
- PWX-7134 - SSD/NVME drives are displayed as STORAGE_MEDIUM_MAGNETIC
- PWX-7089 - Intermittent failures in `pxctl cloudsnap list`
- PWX-6852 - If Portworx starts before Docker is started, the `SchedulerName` field in pxctl CLI shows as N/A
- PWX-7129 - Add an option to improve filesystem space utilization in case of SSD/NVMe drives
- PWX-7011 - Cluster pairing for cluster migration fails when one of the nodes in the destination cluster is down
- PWX-7120 - Cloudsnap restore failures cannot be viewed through `pxctl cloudsnap status`
2.0.0.1
December 7, 2018
Key Fixes
- PWX-7131 - Fix an issue with some of the alerts IDs mismatching with the description as part of the upgrade from the 1.x version to 2.0.
- PWX-7122 - Volume restores would occasionally fail when restoring from backups that were done with Portworx 1.x versions.
2.0.0
December 4, 2018
Key Features and Enhancements
- PX-Motion - Migration of applications and data between clusters. Application migration is Kubernetes only.
- PX-Central - Single pane of glass for management, monitoring, and metadata services across multiple Portworx clusters on Kubernetes
- Lighthouse 2.0 - Supports PX-motion with connection to Kubernetes cluster for application and namespace migration.
- Shared volumes (v4) for Kubernetes
- Support Cloudsnaps for Aggregated volumes
- ‘Extent’ based cloudsnaps - Restartable Cloudsnaps if a large volume cloudsnap gets interrupted
- Support Journal device for Repl=1 volumes
- PX-kvdb (etcd) supported internally with Portworx cluster deployment
Key Fixes
- PWX-6458: When decreasing HA of a volume, recover snapshot space unused.
- PWX-5686: Implement accounting and display of space utilized by snapshots and clones.
- PWX-6949: Decommissioned node getting listed from one node in the cluster and not from the other
- PWX-6617: PDM: Dump the cloud drive keys when Portworx loses kvdb connectivity.
- PWX-5876: Volume should get detached when out of quorum or pool down.
Errata
- PWX-7011: Cluster pair creation fails because the destination Portworx node is marked down. Workaround: Restart the Portworx node and attempt the cluster pairing again.
- PWX-7041: Cloudsnap backup failed for pause/resume by Portworx restart - all replicas are down. Workaround: This is a variant of the previous erratum. For a volume with replication factor set to 1, cloudsnap backup does not resume after the node with the replica goes down.
1.7.6
Release Notes: February 7, 2019
Key Fixes
- PWX-7304 - Portworx keeps restarting if journal device made read-only
- PWX-7348 - Portworx keeps restarting, VM reboot after journal device made “offline”
- PWX-7453 - cloudsnap cleanup didn't complete properly in cases where errors were encountered when transmitting the diffs
- PWX-7481 - Shared volume mounts fail when client connections are abruptly lost and not cleaned up properly
- PWX-7600 - Volume mount status might be incorrectly displayed when the node where the volume is attached hits a storage full condition and replicas on that node are moved to a new node
1.7.5
January 15, 2019
Key Fixes
- PWX-7364 Namespace stuck volume issue
- PWX-7299 export pool_status as a stat for prometheus
- PWX-7267 LIC: Hide unknown/non-handled licenses
- PWX-7212 Cloudsnap-Restore: Increase restore verbose level for error cases
- PWX-7179 io1 volume added to KOPS cluster gets displayed as STORAGE_MEDIUM_MAGNETIC
- PWX-7033 Objectstore endpoint failover not happening
1.7.4
January 7, 2019
Key Fixes
- PWX-7292 For all storage errors retry 3 times before making pool offline
- PWX-7291 Detect SSD based pools and mount with nossd if kernel version is less than 4.15
- PWX-7214 LIC: Goroutine leak at license watch re-subscription
- PWX-7143 LIC: Should hard-code "absolute maximums" into License evaluations
- PWX-7142 LIC: SuperMicro misinterpreted as VM [roblox]
1.7.3
December 13, 2018
Key Features and Enhancements
- Provide a runtime option to enable more compact data out of flash media to avoid disk fragmentation
Key Fixes
- Fix an issue with NVMe/SSD disks being shown as Magnetic disks
1.7.2
December 5, 2018
Key Features and Enhancements
- Default queue depth for all volumes (new, coming from older release) set to 128
- Advanced runtime options for write amplification reduction
Key Fixes
- PWX-6928: Store bucket name in cloudsnap object
- PWX-6904: Fix bucket name for cloudsnap ID while reporting status
- PWX-7071: Do not use GFP_ATOMIC allocation
1.7.1.1
November 7, 2018
Key Fixes
- Fix to add/remove node labels in Kubernetes to indicate where volume replicas are placed
1.7.1
November 7, 2018
Key Features and Enhancements
- Restart docker containers using shared volumes for DC/OS to enable automatic re-attach of the containers on Portworx upgrades
- Preserve Kubernetes agent node ids across agent restarts when Kubernetes agents are running statelessly in auto-scaling based environments
1.7.0
November 3, 2018
Key Features and Enhancements
- IBM Kubernetes Service (IKS) Support
- IBM Key Protect Support for Encrypted Volumes
- Containerd runtime Interface (CRI) support
- Automatic VM Datastore provisioning for CentOS ESXi VMs
- Tiered Snapshots for storing volume snapshots on only lower-cost media
- Encryption support for shared volumes
Key Fixes
- PWX-6616 - Fix shared volume mounts going read-only in Kubernetes in a few corner cases
- PWX-6551 - px_volume_read_bytes and px_volume_written_bytes are not available in 1.6.2
- PWX-6479 - Debian 8: Portworx fails to come up if sharedv4 is enabled
- PWX-6560 - PVC creation fails with "Already exists" perpetually
- PWX-6527 - Clean up orphaned volume paths as PVC are attached and detached over a period of time
- PWX-6425 - Cloudsnap schedule option to do full backup always.
- PWX-6408 - Node alerts: Include hostname/IP in addition to node id
- PWX-5963 - Report volumes with no snapshots
1.6.1.4
October 19, 2018
This is a minor patch release with the following fixes/enhancements.
- PWX-6655 - Fix to allow storageless nodes to reuse their node ids in Kubernetes
- PWX-6410 - Fix a bug where Portworx may detach unused loopback devices that are not owned by Portworx on restarts.
- PWX-6713 - Allow update of per volume queue depth
1.6.1.3
October 26, 2018
This is a minor patch release with the following fixes/enhancements.
- PWX-6697: Add support for automatic provisioning of disks on VMware virtual machines on non-Kubernetes clusters and Kubernetes clusters without vSphere Cloud Provider
1.6.1.2
October 23, 2018
This is a minor patch release with the following fixes/enhancements.
- PWX-6567 - Provide a parameter to disable discards during volume create
- PWX-6559 - Provide the ability to map services listening on port 9001 to another port
1.6.1.1
October 11, 2018
This is a minor patch release with fixes for issues around volume unmounts as well as pending commands to Docker.
- PWX-6494 - Fix rare spurious volume unmounts of attached volumes in case of Portworx service restart under heavy load
- PWX-6559 - Add a timeout for all commands to docker so they timeout if docker hangs or crashes.
1.6.1
October 2, 2018
Key Features and Enhancements
- Per volume queue depth to ensure volume level quality of service
- Large discard sizes up to 10MB support faster file deletes. NOTE: You will need a px-fuse driver update to use this setting. Portworx 1.6.1 will continue to work with old discard size of 1MB if no driver update was done. This is a backward compatible change
- Enable option to always perform a full clone back up for Cloudsnap
- Reduce scheduled snapshot intervals to support snapping every 15 mins from the current limit of 1 hour
Key Fixes
- Fix replica provisioning across availability zones for clusters running on DC/OS in a public cloud
1.6.0
September 20, 2018
Key Features and Enhancements
- OpenStorage SDK support. Link to SDK
- Dynamic VM datastore provisioning support Kubernetes on vSphere/ESX environment
- Pivotal Kubernetes Service (PKS) support with automated storage management for PKS
Errata
- PWX-6198 - SDK Cloud backup and credentials services is still undergoing tests
- PWX-6159 - Intermittent detach volume error seen when calling the SDK Detach call
- PWX-6056 - Expected error not found when using Stats on a non-existent volume
1.5.1
September 14, 2018
Key Fixes
- PWX-6115 - Consul integration fixes to reduce CPU utilization
- PWX-6049 - Improved detection and handling cloud instance store drives in AWS
- PWX-6197 - Fix issues with max drive per zone in GCP
- When a storageless node loses connectivity to the remaining nodes, it should bring itself down.
- PWX-6208 - Fix GCP provider issues for dynamic disk provisioning in GCP/GKE
- PWX-5815 - Enable running `pxctl` from oci-monitor pods in Kubernetes
- PWX-6295 - Fix LocalNode provisioning pattern when provisioning volumes with greater than 1 replication factor
- PWX-6277 - Portworx fails to run sharedv4 volume support for Fedora
- PWX-6268 - Portworx does not come up in Amazon Linux V2 AMIs
- PWX-6229 - Portworx does not initialize fully in a GKE multi-zone cluster during a fresh install
1.5.0
August 21, 2018
Important note: Consul integration with 1.5.0 has a bug which results in Portworx querying a Consul Cluster too often for a non-existent key. We will be pushing out a 1.5.1 release with a fix by 08/31/2018
Key Features and Enhancements
- Eliminate private.json for stateless installs
- Handle consul leader failures when running with consul as the preferred k/v store
- When a node is offline for longer than the user-configured timeout, move the replicas in that node out to other nodes with free space
- Improvements to AWS Auto-scaling Group handling with KOPS
- Lighthouse Volume Analyzer View Support
- Enable volume resize for volumes that are not attached
- Periodic, light-weight pool rebalance for proactive capacity management
Key Fixes
- PWX-5800 - In AWS Autoscaling mode, Portworx nodes with no storage should always try to attach available drives on restart
- PWX-5827 - Allow adding cloud drives using pxctl service drive add commands
- PWX-5915 - Add PX-DO-NOT-DELETE prefix to all cloud drive names
- PWX-6117 - Fix `pxctl cloudsnap status --local` command failing to execute
- PWX-5919 - Improve node decommission handling for volumes that are not in quorum
- PWX-5824 - Improve geo variable handling for kubernetes and DC/OS
- PWX-5902 - Support SuSE CaaS platform
- PWX-5815 - Enable diags collection via oci-monitor when shell access to the minions is not allowed
- PWX-5816 - Incorrect bucket names will force a full backup instead of incremental backup
- PWX-5904 - Remove db_remote and random profiles from io_profile help
- PWX-5821 - Fix panics seen when zone and rack labels are supplied on volume create
1.4.2.2
August 11, 2018
This is a patch release that adds the capability to switch from shared to sharedv4 one volume at a time. Please contact Portworx support before switching the volume types.
1.4.2
July 21, 2018
Key Features and Enhancements
- Use PX-Central for Kubernetes spec generation.
Key Fixes
- PWX-5681 - Portworx service to handle journald restarts
- PWX-5814 - Fix automatic diag uploads
- PWX-5818 - Fix diag uploads via `pxctl service diags` when running under Kubernetes environments
1.4.0
July 4, 2018
If you are on any of the 1.4 RC builds, you will need to do a fresh install. Please reach out to us at support@portworx.com or on the slack to help assess upgrade options from 1.4 RC builds.
All customers on 1.3.x release will be able to upgrade to 1.4
All customers on 1.2.x release will be able to upgrade to 1.4 but in a few specific cases might need a node reboot after the upgrade. Please reach out to support for help with an upgrade or if there are any questions if you are running 1.2.x in production.
Key Features and Enhancements
- 3DSnaps - Ability to take application-consistent snapshots cluster wide (Available in 05/14 GA version)
- Volume Group snapshots - Ability to take crash-consistent snapshots on group of volumes based on a user-defined label
- GCP/GKE automated disk management based on disk templates
- Kubernetes per volume secret support to enable volume encryption keys per Kubernetes PVC and using the Kubernetes secrets for key storage
- DC/OS vault integration - Use Vault integrated with DC/OS
- Support Pool Resize - Available in Maintenance Mode only
- Container Storage Interface (CSI) Tech Preview
- Support port mapping used by Portworx from 9001-9015 to a custom port number range by passing the starting port number in install arguments
- Provide ability to do a license transfer from one cluster to another cluster
- Add support for cloudsnap deletes
Key Fixes
- PWX-5360 - Handle disk partitions in node wipe command
- PWX-5351 - Reduce the `pxctl volume list` time taken when a large number of volumes are present
- PWX-5365 - Fix cases where cloudsnap progress appears stopped because of time synchronization
- PWX-5271 - Set default journal device size to 2GB
- PWX-5341 - Prune out trailing `/` in storage device name before using it
- PWX-5214 - Use device UUID when checking for valid mounts when using device-mapper devices instead of the device names
- PWX-5242 - Provide facility to add metadata journal devices to an existing cluster
- PWX-5287 - Clean up px_env variables as well when using node wipe command
- PWX-5322 - Unmount shared volume on shared volume source mount only on Portworx restarts
- PWX-5319 - Use excl open for open device checks
- PWX-4897 - Allow more time for the resync to complete before changing the replication status
- PWX-5295 - Fix a nil pointer access during cloudsnap credential delete
- PWX-5006 - Tune data written between successive syncs depending on ingress write speed
- PWX-5203 - Cancel any in-progress ha increase operations that are pending on the node if the node is decommissioned
- PWX-5138 - Add startup options for air-gapped deployments
- PWX-4816 - Check for and add lvm devices when handling -a option for device list
- PWX-4609 - Allow canceling of replication increase operations for attached volumes
- PWX-4765 - Fix resource contention issues when running heavy load on multiple shared volumes on many nodes
- PWX-5039 - Fix Portworx OCI uninstall when shared volumes are in use
- PWX-5153 - In Rancher, automatically manage container volume mounts if one of the cluster node restarts
1.3.1.4
May 9, 2018
This is a minor update that improves degraded cluster performance when one or more nodes are down for a long time and brought back online, which starts the resync process.
1.3.1.2
May 2, 2018
This is a minor update to fix install issues with RHEL Atomic and other fixes.
- RHEL Atomic install fixes
- Clean up any existing diag files before running the diags command again
- `pxctl upgrade` fixes to pull the latest image information from install.portworx.com
- Improvements in attached device detection logic in some cloud environments
1.3.1.1
April 16, 2018
This is a minor update to the previous 1.3.1 release
- Fix to make node resync process yield better to the application I/O when some of the nodes are down for a longer period of time and brought back up thereby triggering the resync process.
1.3.1
April 13, 2018
This is a patch release with shared volume performance and stability fixes
Key Fixes
- Fix namespace client crashes when the client list is generated while a few client nodes are down.
- Allow read/write snapshots in Kubernetes annotations
- Make adding and removing Kubernetes node labels asynchronous to help with a large number of volume creations in parallel
- Fix Portworx crash when a snapshot is taken at the same time as a node being marked down because of network failures
- Fix nodes option in docker inline volume create and supply nodes value as semicolon-separated values
1.3.0.1
April 6, 2018
This is a patch update with the following fix
- PWX-5115 - Fix `nodes` option in docker inline volume create and supply nodes value as semicolon-separated values
1.3.0
March 6, 2018
Upgrade Note 1: Upgrade to 1.3 requires a node restart in non-Kubernetes environments. In Kubernetes environments, the cluster does a rolling upgrade
Upgrade Note 2: Ensure all nodes in Portworx cluster are running 1.3 version before increasing replication factor for the volumes
Upgrade Note 3: Container information parsing code has been disabled and hence the PX-Lighthouse up to 1.1.7 version will not show the container information page. This feature will be back in future releases and with the new lighthouse
Key Features and Enhancements
- Volume create command additions to include volume clone command and integrate snap commands
- Improved snapshot workflows
- Clones - full volume copy created from a snapshot
- Changes to snapshot CLI.
- Creating scheduled snapshots policies per volume
- Important: From 1.3 onwards, all snapshots are read-only. If the user wishes to create a read/write snapshot, a volume clone can be created from the snapshot
- Improved resync performance when a node is down for a long time and restarted with accumulated data in the surviving nodes
- Improved performance for database workloads by separating transaction logs to a separate journal device
- Added Portworx signature to drives so drives cannot be accidentally re-used even if the cluster has been deleted.
- Per volume cache attributes for shared volumes
- https support for API end-points
- Portworx Open-Storage scaling groups support for AWS ASG - Workflow improvements
- Allow specifying input EBS volumes in the format “type=gp2,size=100”
- Instead of adding labels to EBS volumes, Portworx now stores all the information related to them in kvdb. All the EBS volumes it creates and attaches are listed in kvdb and this information is then used to find out EBS volumes being used by Portworx nodes
- Added command `pxctl cloud list` to list all the drives created via ASG
- Integrated kvdb - Early Access - Limited Release for small clusters less than 10 nodes
New CLI Additions and changes to existing ones
- Added `pxctl service node-wipe` to wipe Portworx metadata from a decommissioned node in the cluster
- Changed the `snap_interval` parameter to `periodic` in `pxctl volume` commands
- Added scheduler information to the `pxctl status` display
- Added info about cloud volumes in the CLI (Kubernetes, others)
- `pxctl service add --journal -d <device>` to add journal device support
Key Fixes
- PWX-4518 - Add a confirmation prompt for `pxctl volume delete` operations
- PWX-4655 - Improve the “PX Cluster Not In Quorum” message in `pxctl status` to give additional information.
- PWX-4475 - Parse io_profile in inline volume spec
- PWX-4479 - Fix io_priority versions when labeling cloudsnaps
- PWX-4378 - Add read/write latency stats to the volume statistics
- PWX-4923 - Add vol_ prefix to read/write volume latency statistics
- PWX-4288 - Handle app container restarts attached to a shared volume if the mount path was unmounted via unmount command
- PWX-4372 - Gracefully handle trial license expiry and Portworx cluster reinstall
- PWX-4544 - Portworx OCI install is unable to proceed with aquasec container installed
- PWX-4531 - Add OS Distribution and Kernel version display in `pxctl status`
- PWX-4547 - cloudsnap display catalog with volume name hits “runtime error: index out of range”
- PWX-4585 - handle kvdb server timeouts with an improved retry mechanism
- PWX-4665 - Do not allow drive add to a pool if a rebalance operation is already in progress
- PWX-4691 - Do not allow snapshots on down nodes or if the node is in maintenance mode
- PWX-4397 - Set the correct zone information for all replica-sets
- PWX-4375 - Add `pxctl upgrade` support for OCI containers
- PWX-4733 - Remove Swarm Node ID check dependencies for Portworx bring up
- PWX-4484 - Limit replication factor increases to a limit of three at a time within a cluster and one per node
- PWX-4090 - Reserve space in each pool to handle rebalance operations
- PWX-4544 - Handle ./aquasec file during OCI-Install so Portworx can be installed in environments with aquasec
- PWX-4497 - Enable minio to mount shared volumes
- PWX-4551 - Improve `pxctl volume inspect` to show pools on which volumes are allocated, replica nodes, and replication add
- PWX-4884 - Prevent replication factor increases if all the nodes in the cluster are not running 1.3.0
- PWX-4504 - Show all the volumes present on a node in CLI with a `--node` option
- PWX-4824 - `pxctl volume inspect` doesn’t show replication set information properly when one node is out of quorum
- PWX-4812 - Handle Kernel upgrades correctly
- PWX-4814 - Synchronize snapshot operations per node
- PWX-4471 - Enhancements to OCI Mount propagation to automount relevant scheduler dirs
- PWX-4721 - When a large number of volumes are cloud snapped at the same time, Portworx container hits a panic
- PWX-4789 - Handle cloudsnaps errors when the schedule has been moved or deleted
- PWX-4709 - Support for adding CloudDrive (EBS volume) to an existing node in a cluster
- PWX-4777 - Fix issues with `pxctl volume inspect` on shared volumes hanging when a large number of volume inspects are done
- PWX-4525 - `pxctl status` shows an invalid cluster summary in some nodes when performing an upgrade from 1.2 to 1.3
- PWX-4772 - Handle storage full conditions more gracefully when the backing store for a Portworx volume gets full
- PWX-4757 - Improve Portworx initialization during boot to handle out of quorum volumes gracefully.
- PWX-4747 - Improve handling of a large number of simultaneous volume creates and volume attach/detach operations on multiple nodes
- PWX-4467 - Fix hangs when successive volume inspects come to the same volume with cloudsnap in progress
- PWX-4420 - Fix race between POD delete and volume unmounts
- PWX-4206 - Under certain conditions, creating a snap using Kubernetes PVC creates a new volume instead of a snapshot
- PWX-4207 - Fix nil pointer dereferences when creating snapshots via Kubernetes
Errata
- PWX-3982 After putting a node into maintenance mode, adding drives, and then running “pxctl service m --exit”, the message “Maintenance operation is in progress, cancel the operation or wait for completion” doesn’t specify which operation hasn’t completed. Workaround: Use pxctl to query the status of all three drive operations (add, replace, rebalance). pxctl then reports which drive operations are in progress and allows exiting from maintenance mode if all maintenance operations are completed.
- PWX-4016 When running under Kubernetes, adding a node label for a scheduled cloudsnap fails with the error “Failed to update k8s node”. A node label isn’t needed for cloudsnaps because they are read-only and used only for backup to the cloud.
- PWX-4021 In case of a failure while a read-only snapshot create operation is in progress, Portworx might fail to come back up. This can happen if the failure coincides with snapshot creation’s file system freeze step, which is required to fence incoming IOs during the operation. To recover from this issue, reboot the node.
- PWX-4027 Canceling a service drive replace operation fails with the message “Replace cancel failed - Not in progress”. However, if you try to exit maintenance mode, the status message indicates that a maintenance operation is in progress. Workaround: Wait for the drive replace operation to finish. The replace operation might be in a state where it can’t be canceled. Cancel operations are performed when possible.
- PWX-4039 When running Ubuntu on Azure, an XFS volume format fails. Do not use XFS volumes when running Ubuntu on Azure.
- PWX-4043 When a Portworx POD gets deleted in Kubernetes, no alerts are generated to indicate the POD deletion via kubectl.
- PWX-4050 For a Portworx cluster that’s about 100 nodes or greater: If the entire cluster goes down with all the nodes offline, as nodes come online a few nodes get restarted because they are marked offline. A short while after, the system converges and the entire cluster becomes operational. No user intervention required.
- Key Management with AWS KMS doesn’t work anymore because of API changes in the AWS side. Will be fixed in an upcoming release. Refer to this link for additional details. https://github.com/aws/aws-cli/issues/1043
- When shared volumes are configured with io_profile=cms, it results in the px-ns process restarting occasionally.
1.2.23
April 20, 2018
This is a minor update that fixes a panic seen in some Kubernetes environments when the user upgraded from an older version of Portworx to 1.2.22
PWX-5107 - Check if node spec is present before adding the node for volume state change events
1.2.22
February 28, 2018
Key Features and Enhancements
- Support enabling SELinux in kernels 4.12.x and above
- Support automatic kernel upgrades. If you expect your environment to upgrade kernels automatically, Portworx by Pure Storage recommends upgrading to 1.2.22.0
1.2.20.0
February 15, 2018
- Minor update to enhance write performance for remote mounts with shared volumes
- 4.15.3 Linux kernel support
Errata (Errata remains the same from 1.2.11.0 release)
- PWX-3982 After putting a node into maintenance mode, adding drives, and then running “pxctl service m --exit”, the message “Maintenance operation is in progress, cancel the operation or wait for completion” doesn’t specify which operation hasn’t completed. Workaround: Use pxctl to query the status of all three drive operations (add, replace, rebalance). pxctl then reports which drive operations are in progress and allows exiting from maintenance mode if all maintenance operations are completed.
- PWX-4014 The pxctl cloudsnap schedule command creates multiple backups for the scheduled time. This issue has no functional impact and will be resolved in the upcoming release.
- PWX-4016 When running under Kubernetes, adding a node label for a scheduled cloudsnap fails with the error “Failed to update k8s node”. A node label isn’t needed for cloudsnaps because they are read-only and used only for backup to the cloud.
- PWX-4017 An incremental cloudsnap backup command fails with the message “Failed to open snap for backup”. Logs indicate that the backup wasn’t found on at least one of the nodes where the volume was provisioned. Workaround: Trigger another backup manually on the nodes that failed.
- PWX-4021 In case of a failure while a read-only snapshot create operation is in progress, Portworx might fail to come back up. This can happen if the failure coincides with snapshot creation’s file system freeze step, which is required to fence incoming IOs during the operation. To recover from this issue, reboot the node.
- PWX-4027 Canceling a service drive replace operation fails with the message “Replace cancel failed - Not in progress”. However, if you try to exit maintenance mode, the status message indicates that a maintenance operation is in progress. Workaround: Wait for the drive replace operation to finish. The replace operation might be in a state where it can’t be canceled. Cancel operations are performed when possible.
- PWX-4039 When running Ubuntu on Azure, an XFS volume format fails. Do not use XFS volumes when running Ubuntu on Azure.
- PWX-4043 When a Portworx POD gets deleted in Kubernetes, no alerts are generated to indicate the POD deletion via kubectl.
- PWX-4050 For a Portworx cluster that’s about 100 nodes or greater: If the entire cluster goes down with all the nodes offline, as nodes come online a few nodes get restarted because they are marked offline. A short while after, the system converges and the entire cluster becomes operational. No user intervention required.
- Key Management with AWS KMS doesn’t work anymore because of API changes in the AWS side. Will be fixed in an upcoming release. Refer to this link for additional details. https://github.com/aws/aws-cli/issues/1043
- PWX-4721 - When cloud-snap is performed on a large number of volumes, it results in a Portworx container restart. A workaround is to run cloudsnaps on up to 10 volumes concurrently.
1.2.18.0
February 13, 2018
Key Features and Enhancements
- Improve file import and untar performance when shared volumes are used by Wordpress and tune for WordPress plugin behavior
Errata (Errata remains the same from 1.2.11.0 release)
- PWX-3982 After putting a node into maintenance mode, adding drives, and then running “pxctl service m --exit”, the message “Maintenance operation is in progress, cancel the operation or wait for completion” doesn’t specify which operation hasn’t completed. Workaround: Use pxctl to query the status of all three drive operations (add, replace, rebalance). pxctl then reports which drive operations are in progress and allows exiting from maintenance mode if all maintenance operations are completed.
- PWX-4014 The pxctl cloudsnap schedule command creates multiple backups for the scheduled time. This issue has no functional impact and will be resolved in the upcoming release.
- PWX-4016 When running under Kubernetes, adding a node label for a scheduled cloudsnap fails with the error “Failed to update k8s node”. A node label isn’t needed for cloudsnaps because they are read-only and used only for backup to the cloud.
- PWX-4017 An incremental cloudsnap backup command fails with the message “Failed to open snap for backup”. Logs indicate that the backup wasn’t found on at least one of the nodes where the volume was provisioned. Workaround: Trigger another backup manually on the nodes that failed.
- PWX-4021 In case of a failure while a read-only snapshot create operation is in progress, Portworx might fail to come back up. This can happen if the failure coincides with snapshot creation’s file system freeze step, which is required to fence incoming IOs during the operation. To recover from this issue, reboot the node.
- PWX-4027 Canceling a service drive replace operation fails with the message “Replace cancel failed - Not in progress”. However, if you try to exit maintenance mode, the status message indicates that a maintenance operation is in progress. Workaround: Wait for the drive replace operation to finish. The replace operation might be in a state where it can’t be canceled. Cancel operations are performed when possible.
- PWX-4039 When running Ubuntu on Azure, an XFS volume format fails. Do not use XFS volumes when running Ubuntu on Azure.
- PWX-4043 When a Portworx POD gets deleted in Kubernetes, no alerts are generated to indicate the POD deletion via kubectl.
- PWX-4050 For a Portworx cluster that’s about 100 nodes or greater: If the entire cluster goes down with all the nodes offline, as nodes come online a few nodes get restarted because they are marked offline. A short while after, the system converges and the entire cluster becomes operational. No user intervention required.
- Key Management with AWS KMS doesn’t work anymore because of API changes in the AWS side. Will be fixed in an upcoming release. Refer to this link for additional details. https://github.com/aws/aws-cli/issues/1043
1.2.16.2
March 19, 2018
- This is a minor update that fixes volume size not updating whenever the content of the encrypted volume is deleted
1.2.16.1
March 2, 2018
This is a minor update which adds a new flag to limit or disable the generation of core files (`-e PXCORESIZE=<size>`). A value of 0 will disable cores.
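A hedged sketch of passing this flag on a runC-based install follows; the cluster name, KVDB endpoint, and drive are placeholders, and on DaemonSet installs the same variable would instead be set in the Portworx container's environment:

```
# Hedged example: disable core file generation by setting PXCORESIZE=0
# as an environment variable during px-runc install.
sudo /opt/pwx/bin/px-runc install \
    -c my-cluster \
    -k etcd://my-etcd:2379 \
    -s /dev/sdb \
    -e PXCORESIZE=0
```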
1.2.16.0
February 5, 2018
This is a minor update with performance enhancements for shared volumes to support a large number of directories and files.
Key Fixes
- Shared volume access latency improvements when managing filesystems with a large number of directories and files
Errata (Errata remains the same from 1.2.11.0 release)
- PWX-3982 After putting a node into maintenance mode, adding drives, and then running “pxctl service m --exit”, the message “Maintenance operation is in progress, cancel the operation or wait for completion” doesn’t specify which operation hasn’t completed. Workaround: Use pxctl to query the status of all three drive operations (add, replace, rebalance). pxctl then reports which drive operations are in progress and allows exiting from maintenance mode if all maintenance operations are completed.
- PWX-4014 The pxctl cloudsnap schedule command creates multiple backups for the scheduled time. This issue has no functional impact and will be resolved in the upcoming release.
- PWX-4016 When running under Kubernetes, adding a node label for a scheduled cloudsnap fails with the error “Failed to update k8s node”. A node label isn’t needed for cloudsnaps because they are read-only and used only for backup to the cloud.
- PWX-4017 An incremental cloudsnap backup command fails with the message “Failed to open snap for backup”. Logs indicate that the backup wasn’t found on at least one of the nodes where the volume was provisioned. Workaround: Trigger another backup manually on the nodes that failed.
- PWX-4021 In case of a failure while a read-only snapshot create operation is in progress, Portworx might fail to come back up. This can happen if the failure coincides with snapshot creation’s file system freeze step, which is required to fence incoming IOs during the operation. To recover from this issue, reboot the node.
- PWX-4027 Canceling a service drive replace operation fails with the message “Replace cancel failed - Not in progress”. However, if you try to exit maintenance mode, the status message indicates that a maintenance operation is in progress. Workaround: Wait for the drive replace operation to finish. The replace operation might be in a state where it can’t be canceled; cancel operations are performed when possible.
- PWX-4039 When running Ubuntu on Azure, an XFS volume format fails. Do not use XFS volumes when running Ubuntu on Azure.
- PWX-4043 When a Portworx POD gets deleted in Kubernetes, no alerts are generated to indicate the POD deletion via kubectl.
- PWX-4050 For a Portworx cluster with about 100 nodes or more: If the entire cluster goes down with all the nodes offline, a few nodes get restarted as they come back online because they are marked offline. A short while later, the system converges and the entire cluster becomes operational. No user intervention is required.
1.2.14
January 17, 2018
This is a minor update to support the older Linux kernel versions (4.4.0.x) that ship with Ubuntu distributions.
1.2.12.1
January 8, 2018
This is a minor update to support OpenShift with SELinux enabled and to verify Portworx against the SPECTRE/Meltdown kernel patches.
- Verified with the latest SPECTRE/Meltdown kernel patches for all major Linux distros
1.2.12.0
December 22, 2017
This is a minor update to enhance metadata performance on a shared namespace volume.
Key Fixes
- Readdir performance for directories with a large number of files (greater than 128K files in a single directory)
- Portworx running on an AWS Auto Scaling Group now handles existing devices attached with names such as `/dev/xvdcw`, which have an extra letter at the end.
- Occasionally, containers that use shared volumes could get a “transport endpoint disconnected” error when Portworx restarts. This has been resolved.
- Fixed an issue where Portworx failed to resolve Kubernetes services by their DNS names if the user sets the Portworx DaemonSet DNS policy to `ClusterFirstWithHostNet` (see the example after this list).
- PWX-4078 When Portworx runs on hundreds of nodes, a few nodes show high memory usage.
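As context for the DNS-policy fix above, here is a hedged sketch of setting that policy on a Portworx DaemonSet with `kubectl`; the DaemonSet name (`portworx`) and namespace (`kube-system`) are assumptions and may differ in your cluster:

```shell
# Hypothetical example: apply the ClusterFirstWithHostNet DNS policy to the
# Portworx DaemonSet so Kubernetes services resolve while host networking is in use.
# Adjust the DaemonSet name and namespace to match your deployment.
kubectl -n kube-system patch daemonset portworx --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
```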
1.2.11.10
December 19, 2017
This is a minor update to address an issue with installing a reboot service while upgrading a runC container.
Key Fixes
- When upgrading a runC container, the new version will correctly install a reboot service. A reboot service (systemd service) is needed to reduce the wait time before a Portworx device returns with a timeout when the Portworx service is down. Without this reboot service, a node can take 10 minutes to reboot.
1.2.11.9
December 18, 2017
Key Fixes
- Pass the volume name as part of the metrics endpoint so Prometheus/Grafana can display the volume name
- Add the current HA level and io_priority of the volumes to the metrics endpoint
- Abort all pending I/Os on the pxd device during a reboot to speed up reboots
- Move the px-ns internal port from 7000 to 9013
- Remove the unnecessary warning string “Data is not local to the node”
- Add the px_ prefix to all volume labels
Errata
- Do not manually unmount a volume by using the Linux `umount` command for shared volume mounts. This errata applies to previous versions of Portworx as well.
1.2.11.8
December 11, 2017
Key Fixes
- Fix resync mechanism for read-only snapshots
- Improve log space utilization by removing old log files based on space usage
Errata
- Do not manually unmount a volume by using the Linux `umount` command for shared volume mounts. This errata applies to previous versions of Portworx as well.
1.2.11.7
December 7, 2017
Key Fixes
- Suppress unnecessary log prints about cache flushes
- PWX-4272 Handle remote host shutdowns gracefully for shared volumes. In the past, this could leave stray TCP connections.
Errata
- Do not manually unmount a volume by using the Linux `umount` command for shared volume mounts. This errata applies to previous versions of Portworx as well.
1.2.11.6
November 28, 2017
Key Fixes
- Provide the capability to drop the system cache on demand (for select workloads and large-memory systems); this is turned off by default
1.2.11.5
November 22, 2017
Key Features and Enhancements
- PWX-4178 Perform snapshots in Kubernetes via annotations
1.2.11.4
November 20, 2017
Key Features and Enhancements
- The Portworx Enterprise container is now available in OCI format
- Enhancements for database workloads to handle slow media
Key Fixes
- PWX-4224 Ignore the `sticky` flag when purging old snapshots after a cloudsnap is completed.
- PWX-4220 `pxctl status` shows the first interface IP address instead of the management IP.
1.2.11.3
November 16, 2017
Key Fixes
- Shared volume performance improvements
- Do not take an inline snap in Kubernetes when no valid candidate PVCs are found
1.2.11.2
November 11, 2017
Key Fixes
- Increase file descriptors to support a large number of shared volumes
1.2.11.1
November 7, 2017
Key Fixes
- Fix file descriptors not being released after reporting containers attached to a shared volume
1.2.11
October 31, 2017
Key Features and Enhancements
-
You can now update volume labels. The `pxctl volume update` command has a new option, `--label <pairs>`. Specify a list of comma-separated name=value pairs. For example, if the current labels are x1=v1,x2=v2:
The option “--label x1=v4” results in the labels x1=v4,x2=v2.
The option “--label x1=” results in the labels x2=v2 (removes a label).
-
Improvements to alerts:
- Additional alerts indicate the cluster status in much finer detail.
- Rate limiting for alerts so that an alert isn’t repeatedly posted within a short timeframe.
-
You can now update the `io_profile` field by using the `pxctl volume update` command so the parameter can be enabled for existing volumes (see the sketch after this list).
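The following sketch illustrates the label and io_profile updates described above; the volume name `myvol` and the `db` profile value are placeholders, and flag spellings follow the descriptions in these notes:

```shell
# Starting labels on the volume: x1=v1,x2=v2 (as in the example above).
pxctl volume update --label x1=v4 myvol     # labels become x1=v4,x2=v2
pxctl volume update --label x1= myvol       # removes x1, leaving x2=v2
# Enable an io_profile on an existing volume (profile value is illustrative).
pxctl volume update --io_profile db myvol
```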
Key Fixes
-
PWX-3146 Portworx module dependencies fail to load for openSUSE Leap 42.2, Kernel 4.4.57-18.3-default.
-
PWX-3362 If a node is in maintenance mode because of disk errors, the node isn’t switched to a storage-less node. As a result, other resources on the node (such as CPU and memory) aren’t usable.
-
PWX-3448 When Portworx statistics are exported, they include the volume ID instead of the volume name.
-
PWX-3472 When snapshots are triggered on a large number of volumes at the same time, the snap operation fails.
-
PWX-3528 Volume create option parsing isn’t unified across Kubernetes, Docker, and pxctl.
-
PWX-3544 Improvements to Portworx diagnostics - a REST API to retrieve and upload diagnostics for a node or cluster. Diagnostics run using the REST API include vmstat output and the output of pxctl cluster list and pxctl --json volume list. The diagnostics also include netstat -s output from before the node went down.
-
PWX-3558 px-storage dumps core while running an HA increase on multiple volumes during stress.
-
PWX-3577 When Portworx is running in a container environment, it should allow mounts on only those directories which are bind-mounted. Otherwise, Portworx hangs during a docker stop.
-
PWX-3585 If Portworx stops before a container that’s using its volume stops, the container might get stuck in the D state (I/O in kernel). As a result, ‘systemctl stop docker’ takes 10 minutes as does system shutdown. The default PXD_TIMEOUT to error out IOs is 10 minutes, but should be configurable.
-
PWX-3591 Storage isn’t rebalanced after a drive add operation and before exiting maintenance mode.
-
PWX-3600 Volume HA update operations on snapshots cannot be canceled.
-
PWX-3602 Removing a node from a cluster fails with the message “Could not find any volumes that match ID(s)”.
-
PWX-3606 Portworx metrics now include the following: Disk read and write latency stats, volume read and write latency stats, and per-process stats for CPU and virtual/resident memory.
-
PWX-3612 When creating or updating a volume, disallow the ability to set both the “shared” and “scale” options.
-
PWX-3614 A volume inspect returns the wrong error message when one node in the cluster is down: Could not find any volumes that match ID(s).
-
PWX-3620 The volume inspect command doesn’t show the replication set status, such as whether the replication set has down members or is in a clean or resync state.
-
PWX-3632 After a Kubernetes pod terminates and the Portworx volume unmount/cleanup fails, the kubelet logs include “Orphaned pod <name> found, but volume paths are still present on disk.”
-
PWX-3648 After all nodes in a cluster go offline: If a node doesn’t restart when the other nodes restart, the other restarting nodes don’t mark that node as offline.
-
PWX-3665 The Portworx live core collection hangs sometimes.
-
PWX-3666 The pxctl service diags command doesn’t store all diagnostics for all nodes in the same location. All diagnostics should appear in /var/cores.
-
PWX-3672 The watch function stops after a large time change, such as 7 hours, on the cluster.
-
PWX-3678 The pxctl volume update command interprets the -s option as -shared instead of -size and displays the message “invalid shared flag”.
-
PWX-3700 Multiple alerts appear after a drive add succeeds.
-
PWX-3701 The alert raised when a node enters maintenance mode specifies the node index instead of the node ID.
-
PWX-3704 After backing up a volume that’s in maintenance mode to the cloud, restoring the volume to any online node fails.
-
PWX-3709 High CPU usage occurs while detaching a volume with MySQL in Docker Swarm mode.
-
PWX-3743 In the service alerts output in the CLI, the Description items aren’t aligned.
-
PWX-3746 When a Portworx upgrade requires a node reboot, the message “Upgrade done” shouldn’t print.
-
PWX-3747 When a node exits from maintenance mode, it doesn’t generate an alert.
-
PWX-3764 The px-runc install command on a core node fails to configure the Portworx OCI service and generates the error “invalid cross-device link”.
-
PWX-3777 When running under Kubernetes, pods using a shared volume aren’t available after the volume becomes read-only.
-
PWX-3778 After adding a drive to a storage-less node fails: A second attempt succeeds but there is no message that the drive add succeeded.
-
PWX-3793 When running in Kubernetes, if an unmount fails for a shared volume with the error “volume not mounted”, the volume is stuck in a terminating state.
-
PWX-3817 When running under Kubernetes, a WordPress pod is stuck in terminating for almost ten minutes.
-
PWX-3820 When running Portworx as a Docker V2 plugin: After a service create --replicas command, a local volume is mounted on the MySQL container instead of the Portworx volume. The Swarm service fails with the error “404 Failed to locate volume: Cannot locate volume”. To avoid this issue, you can now specify the volume-driver with the service create command (see the example after this list).
-
PWX-3825 When a node is in storage down state because the pool is out of capacity: A drive add fails with the error “Drive add start failed. drive size <size> too big” during an attempt to add the same size disk.
-
PWX-3829 Container status in the Portworx Lighthouse GUI isn’t updated properly from Portworx nodes.
-
PWX-3843 Portworx stats include metrics for utilized and available bytes, but not for total bytes (px_cluster_disk_total_bytes). As a result, alerts can’t be generated in Prometheus for storage utilization.
-
PWX-3844 When you add a snapshot schedule to a volume, the alert type is “Snapshot Interval update failure” instead of “Snapshot interval update success”.
-
PWX-3850 If the allocated io_priority differs from the requested io_priority, no associated alert is generated.
-
PWX-3851 When two Postgres pods attempted to use the same volume, one of the Postgres pods mounted a local volume instead of a Portworx volume.
-
PWX-3859 After adding a volume template to an Auto Scaling Group and Portworx adds tags to the volume: If you stop that cluster and then start a new cluster with the same volume, without removing the tags, a message indicates that the cluster is already initialized. The message should indicate that it failed to attach template volumes because the tag is already used. You can then manually remove the tags from the stopped cluster.
-
PWX-3862 A volume is stuck in the detaching state indefinitely due to an issue in etcd.
-
PWX-3867 When running under Kubernetes, a pod using namespace volumes generates the messages “Orphaned pod <pod> found, but volume paths are still present on disk”.
-
PWX-3868 A Portworx cluster shows an extra node when running with ASG templates enabled if the AWS API returns an error when the Portworx container is booting up.
-
PWX-3871 Added support for dot and hyphen in source and destination names in Kubernetes inline spec for snapshots.
-
PWX-3873 When running under Kubernetes, a volume detach fails on a regular volume, with the message “Failed to detach volume: Failed with status -16”, and px-storage dumps core.
-
PWX-3875 After volume unmount and mount commands are issued in quick succession, sometimes the volume mount fails.
-
PWX-3878 When running under Kubernetes, a Postgres pod gets stuck in a terminating state when the POD gets deleted.
-
PWX-3879 During volume creation on Kubernetes, node labels aren’t applied on Kubernetes nodes.
-
PWX-3888 An HA increase doesn’t use the node value specified in the command if the node is from a different region.
-
PWX-3895 The pxctl volume list command shows a volume but volume inspect cannot find it.
-
PWX-3902 If a Portworx container is started with the API_SERVER pointing to Lighthouse and etcd servers are also provided, the Portworx container doesn’t send statistics to Lighthouse.
-
PWX-3906 Orphaned pod volume directories can remain in a Kubernetes cluster after an unmount.
-
PWX-3912 During a container umount, namespace volumes might show the error “Device or resource busy”.
-
PWX-3916 Portworx rack information isn’t updated when labels are applied to a Kubernetes node.
-
PWX-3933 The size of a volume created by using a REST API call isn’t rounded to the 4K boundary.
-
PWX-3935 Lighthouse doesn’t show container information when Portworx is run as a Docker V2 plugin.
-
PWX-3936 A volume create doesn’t ignore storage-less nodes in a cluster and thus fails because it doesn’t allocate the storage to the available storage nodes.
-
PWX-3946 On a node where a cloudsnap schedule is configured: If the node gets decommissioned, the schedule isn’t configured for the new replica set.
-
PWX-3947 Simultaneous mount and unmount likely causes a race in teardown and setup.
-
PWX-3968 If Portworx can’t find a volume template in an Auto Scaling Group, it dumps core and keeps restarting.
-
PWX-3971 Portworx doesn’t install on an Azure Ubuntu 14 Distro with the 3.13.0-32-generic kernel.
-
PWX-3972 When you start a multi-node, multi-zone Auto Scaling Group with a max-count specified, Portworx doesn’t start on all nodes.
-
PWX-3974 When running under Kubernetes, a WordPress app writes data to the local filesystem after a shared volume remount failure (due to RPC timeout errors) during node start.
-
PWX-3997 When running under Kubernetes, deleting Wordpress pods results in orphaned directories.
-
PWX-4000 A drive add or replace fails when Portworx is in storage full/pool offline state.
-
PWX-4012 When using shared volumes: During a WordPress plugin installation, the WordPress pod prompts for FTP site permissions. Portworx now passes the correct GID and UUID to WordPress.
-
PWX-4049 Adding and removing Kubernetes node labels can fail during node updates.
-
PWX-4051 Previous versions of Portworx logged too many “Etcd did not return any transaction responses” messages. That error is now rate-limited to log only a few times.
-
PWX-4083 When a volume is in a down state due to a create failure but is still attached without a shared volume export, the detach fails with the error “Mountpath is not mounted”.
-
PWX-4085 When running under Kubernetes, too many instances of this message get generated: “Kubernetes node watch channel closed. Restarting the watch..”
-
PWX-4131 Specifying -a or -A to provide disks to Portworx now also handles mpath and RAID drives/partitions
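Related to PWX-3820 above, this is a hypothetical Swarm example that names the volume driver explicitly in the `--mount` option so the Portworx (`pxd`) driver is used; the service, volume, and image names are placeholders:

```shell
# Hedged sketch: pin the volume driver for a Swarm service so the Portworx
# volume is used rather than a locally created volume. Names are placeholders.
docker service create --name mysql --replicas 3 \
  --mount type=volume,source=px-mysql-vol,target=/var/lib/mysql,volume-driver=pxd \
  mysql:5.7
```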
Errata
-
PWX-3982 After putting a node into maintenance mode, adding drives, and then running “pxctl service m --exit”, the message “Maintenance operation is in progress, cancel the operation or wait for completion” doesn’t specify which operation hasn’t completed. Workaround: Use pxctl to query the status of all three drive operations (add, replace, rebalance). pxctl then reports which drive operations are in progress and allows exiting from maintenance mode if all maintenance operations are completed.
-
PWX-4014 The pxctl cloudsnap schedule command creates multiple backups for the scheduled time. This issue has no functional impact and will be resolved in the upcoming release.
-
PWX-4016 When running under Kubernetes, adding a node label for a scheduled cloudsnap fails with the error “Failed to update k8s node”. A node label isn’t needed for cloudsnaps because they are read-only and used only for backup to the cloud.
-
PWX-4017 An incremental cloudsnap backup command fails with the message “Failed to open snap for backup”. Logs indicate that the backup wasn’t found on at least one of the nodes where the volume was provisioned. Workaround: Trigger another backup manually on the nodes that failed.
-
PWX-4021 In case of a failure while a read-only snapshot create operation is in progress, Portworx might fail to come back up. This can happen if the failure coincides with snapshot creation’s file system freeze step, which is required to fence incoming IOs during the operation. To recover from this issue, reboot the node.
-
PWX-4027 Canceling a service drive replace operation fails with the message “Replace cancel failed - Not in progress”. However, if you try to exit maintenance mode, the status message indicates that a maintenance operation is in progress. Workaround: Wait for the drive replace operation to finish. The replace operation might be in a state where it can’t be canceled; cancel operations are performed when possible.
-
PWX-4039 When running Ubuntu on Azure, an XFS volume format fails. Do not use XFS volumes when running Ubuntu on Azure.
-
PWX-4043 When a Portworx POD gets deleted in Kubernetes, no alerts are generated to indicate the POD deletion via kubectl.
-
PWX-4050 For a Portworx cluster with about 100 nodes or more: If the entire cluster goes down with all the nodes offline, a few nodes get restarted as they come back online because they are marked offline. A short while later, the system converges and the entire cluster becomes operational. No user intervention is required.
1.2.10.2
October 6, 2017
Key Fixes
- Fix boot issues with Amazon Linux
- Fix issues with shared volume mount and unmount with multiple containers under Kubernetes
1.2.10
September 18, 2017
Key Fixes
- Fix an issue where a node running Portworx that goes down never gets marked down in the kvdb by other nodes.
- Fix an issue where a container in the Lighthouse UI always shows as running even after it has exited
- Auto re-attach containers mounting shared volumes when the Portworx container is restarted.
- Add the Linux immutable capability (CAP_LINUX_IMMUTABLE) when Portworx is running as a Docker V2 plugin
- Set the autocache parameter for shared volumes
- On volume mount, make the path read-only if an unmount comes in because the POD gets deleted or Portworx is restarted during POD creation. On unmount, delete the mount path.
- Remove the volume quorum check during volume mounts so the mount can be retried until quorum is achieved
- Allow the snapshot volume source to be provided as another Portworx volume ID and snapshot ID
- Allow inline snapshot creation in the Portworx Kubernetes volume driver using the Portworx Kubernetes volume spec
- Post log messages indicating when the logging URL is changed
- Handle volume delete requests gracefully when the Portworx container is starting up
- Handle service account access when Portworx is running as a container instead of a DaemonSet under Kubernetes
- Implement a global lock for the Kubernetes filter so that all cluster-wide Kubernetes filter operations are coordinated through the lock
- Improvements in unmount/detach handling in Kubernetes to handle different POD cleanup behaviors for Deployments and StatefulSets
Errata
- If two containers using the same shared volume are run on the same node using Docker, when one container exits, the other container’s connection to the volume is disrupted as well. The workaround is to run the containers that use the shared volume on two different Portworx nodes.
1.2.9
August 23, 2017
Important: If you are upgrading from an older version of Portworx (1.2.8 or older) and have Portworx volumes in the attached state, you will need to reboot the node after the upgrade for the new version to take effect properly.
Key Features and Enhancements
- Provide the ability to cancel a replication add or HA increase operation
- Automatically decommission a storageless node in the cluster if it has been offline for longer than 48 hours
- Kubernetes snapshots driver for {{< pxEnterprise >}}
- Improve Kubernetes mount/unmount handling with POD failovers and moves
Key Fixes
- Correct mountpath retrieval for encrypted volumes
- Fix cleanup path maintenance mode exit issue and clean up alerts
- Fix S3 provider for compatibility issues with legacy object storage providers not supporting ListObjectsV2 API correctly.
- Add more cloudsnap related alerts to indicate cloudsnap status and any cloudsnap operation failures.
- Fix config.json for Docker Plugin installs
- Read topology parameters on Portworx restart so RACK topology information is read correctly on restarts
- Retain environment variables when Portworx is upgraded via the `pxctl upgrade` command
- Improve handling for encrypted scale volumes
Errata
- When {{< pxEnterprise >}} is run on a large number of nodes, there is a potential memory leak and a few nodes show high memory usage. This issue is resolved in 1.2.12.0 and later. The workaround is to restart the {{< pxEnterprise >}} container
1.2.8
June 27, 2017
Key Features and Enhancements
- License Tiers for {{< pxEnterprise >}}
1.2.5
June 16, 2017
Key Features and Enhancements
- Increase volume limit to 16K volumes
Key Fixes
- Fix issues with the volume CLI hitting a panic when the underlying devices are LVM devices
- Fix Portworx bootstrap issues with pre-existing snapshot schedules
- Remove alerts posted when volumes are mounted and unmounted
- Remove duplicate updates to kvdb
1.2.4
June 8, 2017
Key Features and Enhancements
- Support for --racks and --zones options when creating replicated volumes (see the example after this list)
- Improved replication node add speeds
- Node labels and scheduler convergence for docker swarm
- Linux Kernel 4.11 support
- A unique, cluster-specific bucket for each cluster’s cloudsnaps
- Load balanced cloudsnap backups for replicated Portworx volumes
- One-time backup schedules for Cloudsnap
- Removed the requirement to have /etc/pwx/kubernetes.yaml in all Kubernetes nodes
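A hedged sketch of the new topology options when creating a replicated volume; the volume names, size, and rack/zone labels are placeholders:

```shell
# Illustrative only: spread replicas across racks or zones at create time.
pxctl volume create --size 10 --repl 2 --racks rack1,rack2 px-rack-vol
pxctl volume create --size 10 --repl 2 --zones zone-a,zone-b px-zone-vol
```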
Key Fixes
- The `pxctl cloudsnap credentials` command has been moved under `pxctl credentials`
- Docker inline volume creation support for setting volume aggregation level
- --nodes support for the Docker inline volume spec
- Volume attach issues after a node restart when a container attaching to a volume failed
- Portworx display issues in Prometheus
- Cloudsnap scheduler display issues where the existing schedules were not seen by some users.
- Removed snapshots from being counted in the total volume count
- Removed non-Portworx related metrics being pushed to Prometheus
- Added CLI feedback and success/failure alerts for the `pxctl volume update` command
- Fixed issues with Cloudsnap backup status updates for container restarts
1.2.3
May 30, 2017
Key Features and Enhancements
No new features in 1.2.3. This is a patch release.
Key Fixes
- Performance improvements for database workloads
1.2.2
May 24, 2017
Key Features and Enhancements
No new features in 1.2.2. This is a patch release.
Key Fixes
- Fix device detection in AWS authenticated instances
1.2.1
May 9, 2017
Key Features and Enhancements
No new features in 1.2.1. This is a patch release.
Key Fixes
- Fix issues with pod failovers with encrypted volumes
- Improve performance with remote volume mounts
- Add compatibility for Linux 4.10+ kernels
1.2.0
April 27, 2017
Key Features and Enhancements
- AWS Auto-scaling integration with Portworx managing EBS volumes for EC2 instances in AWS ASG
- Multi-cloud Backup and Restore of Portworx volumes
- Encrypted Volumes with Data-at-rest and Data-in-flight encryption
- Docker V2 Plugin Support
- Prometheus Integration
- HashiCorp Vault, AWS KMS, and Docker Secrets integration
- Dynamically resize Portworx volumes with no application downtime
- Improved the security of the Portworx container
Key Fixes
- Issues with volume auto-attach
- Improved network diagnostics on Portworx container start
- Added an alert when volume state transitions to read-only due to loss of quorum
- Display multiple attached hosts on shared volumes
- Improve shared volume container attach when the volume is in resync state
- Allow pxctl to run as a normal user
- Improved pxctl help text for commands like pxctl service