GCP Data Analytics Services

Collect, store, process, and analyze large amounts of data.

| No. | Product | Description | Usage | Reference |
| --- | --- | --- | --- | --- |
| 1 | BigQuery | BigQuery is Google Cloud's fully managed, petabyte-scale, cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time. There is no infrastructure to set up or manage, so you can focus on finding meaningful insights using standard SQL and take advantage of flexible pricing models across on-demand and flat-rate options. | Data warehouse (OLAP) | BigQuery Reference |
| 2 | Dataflow | Dataflow is a managed service for executing a wide variety of data processing patterns, including batch and streaming pipelines. The Apache Beam SDK is an open source programming model for developing both batch and streaming pipelines. You create your pipelines with an Apache Beam program and then run them on the Dataflow service. | Apache Beam batch and stream data processing | Dataflow Reference |
| 3 | Dataproc | Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data. | Apache Spark and Hadoop clustered big data processing | Dataproc Reference |
| 4 | Cloud Composer | Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows. Cloud Composer automation helps you create Airflow environments quickly and use Airflow-native tools, such as the Airflow web interface and command-line tools, so you can focus on your workflows and not your infrastructure. | Data workflow orchestration | Composer Reference |
| 5 | Cloud Data Fusion | Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines. Business users, developers, and data scientists can build scalable data integration solutions to cleanse, prepare, blend, transfer, and transform data without having to manage infrastructure. | Developing code-free data pipelines | Data Fusion Reference |
| 6 | Dataprep | Explore, clean, and prepare data for analysis. | Data cleaning and preparation | Dataprep Reference |
| 7 | Data Catalog | Data Catalog is a fully managed and scalable metadata management service within Dataplex. It allows organizations to quickly discover, manage, and understand all their data in Google Cloud. It offers a simple, easy-to-use search interface for data discovery, powered by the same Google search technology that supports Gmail and Drive; a flexible and powerful cataloging system for capturing technical and business metadata; and an auto-tagging mechanism for sensitive data through DLP API integration. | Data discovery and metadata management | Data Catalog Reference |
| 8 | Dataplex | Dataplex allows you to logically organize your data stored in Cloud Storage and BigQuery into lakes and zones, and to automate data management and governance across that data to power analytics at scale. Data Catalog is a metadata management service within Dataplex. | Organizing data | Dataplex Reference |
| 9 | Datastream | Datastream is a serverless, easy-to-use change data capture (CDC) and replication service. It replicates data from relational database sources such as Oracle, MySQL, and PostgreSQL (Preview) directly into BigQuery (Preview), reliably and with minimal latency and downtime. Datastream also supports streaming changes from Oracle, MySQL, and PostgreSQL (Preview) databases into Cloud Storage. In addition to these destinations, the service integrates with Dataflow templates so that you can build custom workflows that replicate your databases into Cloud SQL or Cloud Spanner for database synchronization, or use the event stream directly from Cloud Storage to build event-driven architectures. | Database change data capture (CDC) and replication | Datastream Reference |
| 10 | Pub/Sub | Pub/Sub is a fully managed real-time messaging service that lets you ingest event streams from anywhere, at any scale, and send and receive messages between independent applications. | Messaging service | Pub/Sub Reference |
| 11 | Looker | Looker is a tool that helps you explore, share, and visualize your company's data so that you can make better business decisions. | BI tool | Looker Reference |
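
The Cloud Composer entry above describes creating and scheduling workflows. As a rough illustration of what workflow orchestration means in practice (a task runs only after the tasks it depends on), here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). It does not use Airflow or Cloud Composer, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this kind of dependency ordering.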
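
The Datastream entry above centers on change data capture (CDC). The idea can be sketched as replaying an ordered stream of change events against a replica. This is a conceptual sketch only: the event shape (`op`, `key`, `row`) is an assumption made for illustration and is not Datastream's actual event format.

```python
def apply_changes(replica, change_events):
    """Apply ordered change events to a replica keyed by primary key."""
    for event in change_events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: the event carries the full new row image.
            replica[key] = event["row"]
        elif op == "delete":
            # Ignore deletes for rows the replica never saw.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


# Hypothetical change stream captured from a source table.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]
replica = apply_changes({}, events)
print(replica)
```

Because events are applied in order, the replica ends up matching the source's latest state without re-copying the whole table.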
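
The Pub/Sub entry above mentions sending and receiving messages between independent applications. The in-memory sketch below illustrates that decoupling: a publisher writes to a topic, and each subscription receives its own copy of the stream. `InMemoryTopic` is a hypothetical toy class, not the `google-cloud-pubsub` client API, and it omits acknowledgements, retries, and persistence.

```python
from collections import deque


class InMemoryTopic:
    """Toy stand-in for a topic; not the real Pub/Sub client."""

    def __init__(self):
        # One queue per subscription, so subscribers never compete
        # for each other's messages.
        self._queues = {}

    def subscribe(self, subscription_name):
        # Create an empty queue unless the subscription already exists.
        self._queues.setdefault(subscription_name, deque())

    def publish(self, message):
        # Fan out: deliver the message to every subscription's queue.
        for queue in self._queues.values():
            queue.append(message)

    def pull(self, subscription_name):
        # Return the oldest undelivered message, or None if there is none.
        queue = self._queues.get(subscription_name)
        return queue.popleft() if queue else None


topic = InMemoryTopic()
topic.subscribe("billing")
topic.subscribe("audit")
topic.publish({"event": "order_created", "order_id": 1})

billing_msg = topic.pull("billing")
audit_msg = topic.pull("audit")
print(billing_msg, audit_msg)
```

The publisher never references the subscribers directly, which is what lets the applications on either side evolve independently.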