Polyaxon monthly updates
As many of you know, at Polyaxon, we maintain a fast pace of development and ship several features on a weekly or bi-weekly basis. Polyaxon Cloud is released almost daily, and sometimes multiple times per day. While we try to keep our constantly growing user base up to date on our latest and greatest features, many have requested a better changelog location to look for updates. We currently keep up-to-date release notes, but they lack visual elements that give more context to UI and UX features, and they do not highlight the most important changes.
From this release on, we will use the announcements tag to go over, on a monthly basis, interesting new features or enhancements that we think might go unnoticed in the release notes.
New hyperparameter tuning capabilities
Polyaxon v1 comes with a new and different approach to managing and monitoring hyperparameter tuning:
- It manages concurrency on-pull instead of on-push.
- It uses the same pipeline abstraction used by DAGs and Schedules to follow dependent runs.
- It provides a fully customizable interface.
- It creates suggestions in a serverless way, which means a long-running service is no longer required.
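To make the on-pull idea concrete, here is a minimal sketch under our own assumptions, not Polyaxon's actual scheduler: submitted runs wait in a local queue and are started only when a slot is free, rather than being pushed to the cluster all at once. The `PullScheduler` class and its methods are hypothetical.

```python
from collections import deque


class PullScheduler:
    """Hypothetical sketch: pending runs are pulled only while free slots exist."""

    def __init__(self, concurrency: int) -> None:
        self.concurrency = concurrency
        self.pending: deque[str] = deque()
        self.running: set[str] = set()

    def submit(self, run_id: str) -> None:
        # Submitting only queues the run; nothing is pushed to the cluster yet.
        self.pending.append(run_id)

    def pull(self) -> list[str]:
        # Start as many pending runs as the current concurrency limit allows.
        started = []
        while self.pending and len(self.running) < self.concurrency:
            run_id = self.pending.popleft()
            self.running.add(run_id)
            started.append(run_id)
        return started

    def finish(self, run_id: str) -> list[str]:
        # A finished run frees a slot, which triggers another pull.
        self.running.discard(run_id)
        return self.pull()


scheduler = PullScheduler(concurrency=2)
for i in range(4):
    scheduler.submit(f"run-{i}")
print(scheduler.pull())  # starts run-0 and run-1
print(scheduler.finish("run-0"))  # frees a slot and starts run-2
```

Because this sketch reads `concurrency` at pull time, changing it on a live scheduler takes effect on the next pull without touching runs that are already running.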
In Polyaxon v1.2, we introduced some new features that improve other aspects of running hyperparameter tuning:
- Search and comparison.
The optimization algorithms provided by our platform are iterative. One of the issues we had with Polyaxon v0 was the lack of an easy way to debug or improve an optimization algorithm.
In Polyaxon v1.2 we introduced a new lineage artifact called
The artifacts lineage tab shows both iteration inputs and outputs.
For instance, this Bayesian optimization has several iterations; the last iteration creates suggestions based on 49 observations:
It then generates one new suggestion:
Using these lineage artifacts, users can copy the observation configs and metrics and iterate on them offline to improve the suggestion algorithm. The artifacts are also useful for reproducing an error on a local machine without requiring a full rerun of the optimization process.
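As a sketch of that offline workflow, the snippet below assumes the copied observations were pasted into a JSON list of objects with `params` and `metric` keys; that layout is an assumption for illustration, not the actual lineage format. It picks the best observation so far and proposes a new candidate by randomly perturbing its parameters, a deliberately naive stand-in for a real suggestion algorithm such as Bayesian optimization. The helper names are hypothetical.

```python
import json
import random

# Hypothetical shape of observations copied from an iteration's lineage.
COPIED_OBSERVATIONS = """
[
  {"params": {"lr": 0.01, "dropout": 0.2}, "metric": 0.81},
  {"params": {"lr": 0.001, "dropout": 0.5}, "metric": 0.86},
  {"params": {"lr": 0.1, "dropout": 0.3}, "metric": 0.64}
]
"""


def best_observation(observations, maximize=True):
    """Return the observation with the best metric."""
    pick = max if maximize else min
    return pick(observations, key=lambda obs: obs["metric"])


def suggest_near(params, scale=0.1, seed=None):
    """Naive suggestion: scale each numeric param by a random factor in [1 - scale, 1 + scale]."""
    rng = random.Random(seed)
    return {
        name: value * (1 + rng.uniform(-scale, scale))
        for name, value in params.items()
    }


observations = json.loads(COPIED_OBSERVATIONS)
best = best_observation(observations)
suggestion = suggest_near(best["params"], seed=0)
print(best["metric"], suggestion)
```

Swapping `suggest_near` for a different strategy and re-running against the same copied observations is a cheap way to compare algorithms without touching the cluster.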
Polyaxon now stores the logs coming from the tuning containers, which comes in handy in case of an error:
Since it’s possible to run the optimization outside of a Polyaxon cluster based on the iteration lineage, users can improve the tuning process or try different algorithms. In addition to the generic iterative optimization process that Polyaxon exposes, it’s now possible to customize the tuners that Polyaxon uses; in other words, users can replace the default Bayesian optimization, hyperband, … implementations with their own logic simply by overriding the container section:
```yaml
version: 1.1
kind: operation
matrix:
  kind: bayes
  container: NEW_CONTAINER_DEFINITION
  ...
```
Mid-run concurrency update
In v0, it was impossible to change the concurrency of a hyperparameter tuning job once it was scheduled. In v1, we introduced queues, which users can use to schedule their operations; users can update the concurrency and priority of those queues, which impacts all runs and pipelines scheduled on them.
In v1.2, users can update the concurrency not only on queues but also on any pipeline, including hyperparameter tuning pipelines. This means users no longer need to change the queue concurrency, which would impact other runs and pipelines; instead, they can increase or decrease the concurrency per pipeline or nested pipeline (note that the concurrency of any pipeline is limited by the concurrency of the queue it runs on).
Currently, this feature is available via the client/API, but we will expose a UI for it very soon.
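The note that a pipeline's concurrency is limited by the concurrency of its queue amounts to taking the tightest limit along the chain. This is a sketch under assumptions: that a nested pipeline is likewise bounded by its parent, and that `None` means no limit is set at a level; the function name is hypothetical.

```python
from typing import Optional, Sequence


def effective_concurrency(limits: Sequence[Optional[int]]) -> Optional[int]:
    """Return the tightest limit along a chain such as [queue, pipeline, nested pipeline].

    Assumption for this sketch: None means "no limit set at this level".
    """
    set_limits = [limit for limit in limits if limit is not None]
    return min(set_limits) if set_limits else None


# Raising a pipeline to 20 on a queue capped at 10 still yields 10.
print(effective_concurrency([10, 20]))
```

Under this model, lowering a single pipeline's limit throttles only that pipeline, while the queue's limit stays unchanged for every other run scheduled on it.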
Reproducing and rerunning
Since it’s possible to copy an iteration’s observations, metrics, and suggestions, users can also paste the suggestions of any iteration into a mapping and run them in parallel without rerunning the full optimization process:
```yaml
...
matrix:
  kind: mapping
  values: [...]  # values copied from the lineage
# Ref to the component to run
...
```
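If the copied suggestions are a JSON list of parameter dictionaries (an assumed format), a small helper can render the matrix section above. Because JSON flow syntax is valid YAML, `json.dumps` can produce the `values` list directly. The helper name is hypothetical, and the check that every suggestion defines the same parameters is our own sanity check, not a documented Polyaxon rule.

```python
import json


def mapping_section(suggestions):
    """Render a `matrix` section of kind `mapping` from copied suggestions.

    Each suggestion is assumed to be a dict of parameter names to values.
    """
    if not suggestions:
        raise ValueError("no suggestions to map")
    expected = set(suggestions[0])
    for suggestion in suggestions:
        # Sanity check used by this sketch: all suggestions share the same keys.
        if set(suggestion) != expected:
            raise ValueError(f"inconsistent parameters: {sorted(suggestion)}")
    return "matrix:\n  kind: mapping\n  values: " + json.dumps(suggestions) + "\n"


print(mapping_section([{"lr": 0.01, "dropout": 0.2}, {"lr": 0.001, "dropout": 0.5}]))
```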
Iteration filters and comparison
Finally, it’s possible to filter runs per iteration and compare them or plot visualizations. It’s also possible to compare one iteration against another.
Run info and metadata
Polyaxon API and UI now have a proper schedule_at field, which is set when a run is scheduled to run in the future or is attached to a schedule.
This change reduces confusion about runs in the created status that carry no extra information about their state. It’s also possible to filter all upcoming runs, in addition to using the schedule view, where runs are grouped per scheduling pipeline.
Agent and Queue
In Polyaxon v1.1.9 we introduced the possibility to filter runs by queue and/or agent in the comparison table:
In v1.2.1, we added this meta-information to the run overview page:
New search and visualization interface
This is one of the most exciting features of the last couple of releases: we finally shipped a better experience for filtering and comparing runs.
Polyaxon UI comes with several quick filters to avoid reconstructing queries for usual use cases:
Refresh on update and share filters
Sometimes it makes sense to apply filters automatically, without clicking the refresh/apply button, or to share a search query without saving it to the database. To share a query, users can use the new share filter button:
The comparison table and visualization view also give users more control by letting them resize the containers:
More fields exposed for querying and sorting
Fields that you can use for filtering runs include, non-exhaustively, commit, docker image, pipeline IDs, kinds, runtimes, cloning behavior, flags, metadata, …, in addition to inputs, outputs, and metrics. We are also working on allowing users to filter directly by logged artifacts, which will bring a more powerful data-mining interface to all Polyaxon clients: API, UI, and CLI.
Better cache hit compilation and UX
In v1.2, we refactored the compiler to introduce better cache-hit detection and better heuristics for the default behavior. The change improves compilation performance by over 80% and avoids searching large branches (leaves with nested workflows or hyperparameter tuning).
For example, when rerunning a hyperparameter tuning operation, Polyaxon quickly flags all cache hits and only schedules operations that were not visited before.
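A minimal sketch of that idea, assuming a cache key derived from a hash of each operation's component reference and resolved inputs; this is an illustrative stand-in, not Polyaxon's compiler, and the helper names are hypothetical.

```python
import hashlib
import json


def cache_key(component_ref: str, inputs: dict) -> str:
    """Hash the component reference and its resolved inputs into a stable key."""
    # sort_keys makes the key independent of the order in which inputs were given.
    payload = json.dumps({"component": component_ref, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_rerun(operations, cache):
    """Split operations into cache hits (reuse an earlier run) and runs to schedule.

    `operations` is a list of (component_ref, inputs) pairs and `cache` maps
    cache keys to the IDs of runs that already produced those results.
    """
    hits, to_schedule = {}, []
    for component_ref, inputs in operations:
        key = cache_key(component_ref, inputs)
        if key in cache:
            hits[key] = cache[key]
        else:
            to_schedule.append((component_ref, inputs))
    return hits, to_schedule


cache = {cache_key("train", {"lr": 0.01}): "run-a"}
hits, to_schedule = plan_rerun([("train", {"lr": 0.01}), ("train", {"lr": 0.1})], cache)
print(list(hits.values()), to_schedule)
```

Rerunning a tuning pipeline in this model only schedules the combinations whose keys are absent from the cache; everything else resolves to an existing run.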
Of course, users can define how the cache should behave and whether it should invalidate any state based on the
The Polyaxon UI now handles cache hits much better: it displays a banner and proxies all logs, lineage data, and metrics from the original run.
In Polyaxon v0 and the first release of Polyaxon v1, we only displayed the raw content of the Polyaxonfile used for creating an operation and starting a run. We later added the compiled specification to give users a better idea of the compiled version of their Polyaxonfile and of how the parameters and the context were resolved and injected into their template. In v1.2, we added another tab with a split view that makes it easier to view the component and the operation.
Updating the last status from the UI
It was already possible to use the Python client, the API, or one of the language SDKs to update or override the last status of a run. In Polyaxon v1.3, we added the ability to do the same directly from the UI.
New pricing and plans structure
Polyaxon Cloud delivers all the key features of the Polyaxon suite in a completely managed, cloud-native environment. Users only need to deploy our fault-tolerant agent on their Kubernetes cluster; the Polyaxon Control Plane delivers all orchestration and scheduling abstractions and lets you seamlessly connect your compute resources in minutes.
We refactored the pricing for Polyaxon Cloud entry plans to enable more features for all plans. We are also displaying transparent pricing for additional agents, and we now offer a 14-day free trial on all Cloud plans, no credit card required.
While Polyaxon has always delivered the power needed to do data science at scale, teams also needed strong system administration skills to manage a stateful service such as Polyaxon running on Kubernetes in production and to keep it healthy. Polyaxon needs monitoring and management, backup, redundancy, and capacity and upgrade planning, to name just a few. That represents a real challenge for some organizations. In addition to the complexity of Kubernetes, users running Polyaxon in production need to keep up with the updates and migrations that we ship continuously.
With Cloud, we’ve removed the barriers to entry and decoupled the control plane from the workload, code, datasets, and artifacts, which stay on the users’ cluster. Only a small subset of our services runs on the user’s cloud or on-prem cluster, which makes it possible for smaller teams to start doing data science faster.
Polyaxon Cloud includes all of the features that were challenging for teams to set up: scaling over multiple namespaces and clusters, security and isolation, backup, and automation.
With our new offering, we want to deliver on our ultimate vision of a fully collaborative, shareable, scalable, portable, privacy-preserving, and reproducible data science platform.
Over the next couple of releases, we will work on more features to achieve that vision. Some of these features include:
Component Hub is coming out of beta in the next release and will be enabled for all accounts according to their quotas.
Model Registry is still in beta, but we will soon start enabling some of the features that are becoming stable, such as experiment promotion, locking, and versioning in the registry. More complex features, such as feedback metrics and monitoring, will take a bit more time, until we finalize the events interface, to avoid breaking changes for a large number of users.
Control plane connections management
Currently, updating a connection in an agent requires redeploying the agent with the updated configuration. We are working on a new feature that will allow organizations using Polyaxon Cloud or Polyaxon EE to opt in to managing connections via the control plane UI, which should eliminate the need for a redeploy when a connection definition changes. This should:
- Improve testing new connections
- Reduce friction
- Improve adoption of more granular connections for datasets, git repos, …
This change does not impact in any way the isolation, security, and privacy we strive to provide to our customers, both Cloud and EE, since all secrets and actual logic are still hosted and run on the users’ clusters.
More lineage information
Currently, Polyaxon UI shows lineage information related to artifacts. In the upcoming releases, Polyaxon’s lineage page will show more information about:
- Related runs (restarts and copies).
- Upstream and downstream runs when an operation is running in the context of a DAG or if an operation requires artifacts or inputs from previous runs.
- Connections lineage information.
This is one of our most frequent requests, especially from users of Polyaxon v0: we will provide a simple way to upload code, as well as artifacts that need to be provisioned before a run starts. We are also thinking about making the build-and-run recipe as simple as it was in v0.
Adding the schedule_at information to the compiler’s context
Polyaxon schedules are fault-tolerant and highly available. Although they provide an interface similar to cron, they run within our scheduler and are decoupled from the cron system to handle several use cases related to time zones, availability, and conditioning. When there’s an issue with the user’s agent deployment and the agent is not picking up queued operations, a scheduled run might not start at the time it was supposed to. As soon as the agent is healthy again and can connect to the control plane, Polyaxon will not only schedule the upcoming runs on time but also schedule all runs that were supposed to run before. This is both a feature and a bug, and in the upcoming releases we would like to:
- Give users the option to specify whether the scheduler should start those runs or not.
- Pass the schedule_at information to the compiler, so that user code can use it to handle extra logic.
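The catch-up behavior and the planned option to skip missed runs can be sketched for a simple fixed-interval schedule. This is an illustration, not Polyaxon's scheduler: the `due_slots` function, its `catch_up` flag, and the fixed-interval simplification (real schedules can also follow cron-like expressions) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def due_slots(start, interval, last_run, now, catch_up=True):
    """Return the schedule_at datetimes to trigger when the scheduler reconnects at `now`.

    Slots fall at start, start + interval, ... and `last_run` is the last slot that
    actually ran (None if nothing has run yet). With catch_up=True, every slot missed
    since last_run is returned; with catch_up=False, only the most recent due slot is.
    """
    slot = start if last_run is None else last_run + interval
    due = []
    while slot <= now:
        due.append(slot)
        slot += interval
    return due if catch_up else due[-1:]


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
hour = timedelta(hours=1)
# Suppose the agent was unhealthy between 00:00 and 03:30.
now = datetime(2021, 1, 1, 3, 30, tzinfo=timezone.utc)
for slot in due_slots(start, hour, last_run=start, now=now):
    # Hypothetically, this is the value user code would receive as schedule_at.
    print(slot.isoformat())
```

Passing each slot to the run, rather than the wall-clock time at which it actually starts, is what would let user code tell a late catch-up run apart from an on-time one.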
Add graph visualization for DAG and Matrix runs
Currently, the Polyaxon UI shows a grouped view for DAGs, hyperparameter tuning, and mappings using the comparison table.
In future releases, we would like to bring a graph view, using dagre-d3, to show the dependencies between runs in the context of a pipeline.
Learn More about Polyaxon
This blog post covers only a couple of the features we shipped since our last product update; there are several other features and fixes worth checking. To learn more about all the features, fixes, and enhancements, please visit the release notes.
Polyaxon continues to grow quickly and keeps improving and providing the simplest machine learning layer on Kubernetes. We hope that these updates will improve your workflows and increase your productivity, and again, thank you for your continued feedback and support.