Google adds no-code computer vision platform to Vertex AI

At the Google Cloud Next conference, Google showcased a new computer vision platform, Vertex AI Vision, which simplifies the process of creating analytics based on camera feeds and live video. Currently in preview, Vertex AI Vision is an extension of AutoML Vision, which can train models to perform image classification and object detection.

Vertex AI Vision provides a canvas for building end-to-end machine learning pipelines spanning the full spectrum of computer vision inference and analysis. It targets decision makers and business analysts who want to create computer vision-based analytics without dealing with complex code. Vertex AI Vision also has an SDK allowing developers to extend functionality and integrate output into web and mobile applications.

Companies have already invested in dozens of surveillance cameras and CCTVs that continuously generate video feeds. At the same time, several pre-trained models can perform sophisticated image classification, object recognition, and image segmentation. But connecting the dots between the data sources (cameras) and the ML models to derive intelligent insights and analytics requires advanced skills. Customers typically have to hire skilled ML engineers to build the inference pipelines that turn raw footage into actionable insights.

Vertex AI Vision addresses this challenge by providing a no-code environment that does the heavy lifting. Users can easily connect streaming inputs from existing cameras to ML models to perform inference. The output of the video streams and models is stored in a Vision Warehouse, from which metadata can be extracted. The same results can be stored in a BigQuery table, making the data easier to query and analyze. It is also possible to watch the output of a stream in real time to validate and monitor the accuracy of the inference pipeline.

Vertex AI Vision ships with several pre-trained models that can be quickly integrated into a pipeline. The occupancy analysis model lets users count people or vehicles within specific areas marked on the video frames. The person blur model protects the privacy of people appearing in input videos by masking or blurring them in the output videos. The person/vehicle detector model detects and counts people or vehicles in video frames. The motion filter model reduces computation time by breaking long videos into smaller segments that contain motion events.
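The motion filter idea, collapsing a long video into only the spans where something moves, can be sketched in a few lines of plain Python. The function below is illustrative only, operating on a hypothetical per-frame motion mask rather than the service's actual implementation.

```python
from typing import List, Tuple

def motion_segments(motion_flags: List[bool], fps: int = 30) -> List[Tuple[float, float]]:
    """Collapse a per-frame motion mask into (start_sec, end_sec) segments.
    Illustrates the motion-filter concept; not the service's algorithm."""
    segments, start = [], None
    for i, moving in enumerate(motion_flags):
        if moving and start is None:
            start = i                       # a motion event begins
        elif not moving and start is not None:
            segments.append((start / fps, i / fps))  # event ends
            start = None
    if start is not None:                   # motion runs to end of video
        segments.append((start / fps, len(motion_flags) / fps))
    return segments

# 10 frames at 2 fps, with motion in frames 2-4 and 7-8.
flags = [False, False, True, True, True, False, False, True, True, False]
print(motion_segments(flags, fps=2))  # → [(1.0, 2.5), (3.5, 4.5)]
```

Downstream models then process only these segments instead of the full stream, which is where the computation savings come from.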

In addition to the pre-trained models, customers can import existing models trained on the Vertex AI platform, extending functionality by mixing and matching custom and pre-trained models.

The new platform is based on Google's AI principles of fairness, safety, privacy and security, inclusiveness, and transparency. Google claims that Vertex AI Vision will cost only a tenth of current offerings. Since the service is in preview, detailed pricing has yet to be disclosed. The service is currently available only in the us-central1 region.

In its current form, Vertex AI Vision is not integrated with Anthos and cannot run in hybrid mode in the data center or at the edge. Customers must ingest video streams into Google Cloud to run the inference pipeline. Verticals such as healthcare and automotive, which require high throughput and low latency, cannot take advantage of Vertex AI Vision. Google should consider supporting deployment of Vision AI applications at the edge, with the output stored in a local warehouse.

Google’s Vertex AI Vision competes with low-code/no-code platforms such as Amazon SageMaker JumpStart and Azure ML designer. With the rise of large language models and advances in transformer-based natural language processing, expect to see no-code development platforms expanded to support conversational AI.
