GCP Professional Data Engineer Exam With
Verified Solutions
1. You are building an application that will detect and label only certain business-to-business product logos within an image. You do not have an in-depth background in machine learning, but you need to get your application up and running. What is the current best method to accomplish this task?
a. Use the AutoML Vision service to train a custom model.
i. The AutoML services let you train custom image models, among other model types, using Google's pretrained models as a base. Training a custom model would also work on AI Platform, but AutoML requires less manual model overhead.
2. Your company streams telemetry data into BigQuery for analysis and for long-term storage of two years. Data comes in at a rate of close to 100 million records per day. The company wants to run queries against certain time periods of data without incurring the cost of querying all available records. What is the preferred method to do this?
a. Partition a single table by day, and run queries against individual partitions.
i. Partitioning a single table by date lets you keep only one table while querying just a small subset of it. Even though it is technically valid to use many tables - one per day - together with wildcards, best practice is to partition a single table.
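As a rough illustration of why partitioning cuts query cost: a query that filters on the partition column only scans (and bills) the partitions it touches. The sketch below just builds such a query as a string; the table name `myproject.telemetry.events` and the date range are hypothetical, and `_PARTITIONDATE` is the pseudo-column BigQuery exposes on ingestion-time-partitioned tables.

```python
def partition_pruned_query(table: str, start_date: str, end_date: str) -> str:
    """Build a query that scans only the partitions in [start_date, end_date].

    Filtering on _PARTITIONDATE (the pseudo-column of an
    ingestion-time-partitioned table) lets BigQuery prune all other
    daily partitions, so only the selected days are read and billed.
    """
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE _PARTITIONDATE BETWEEN '{start_date}' AND '{end_date}'"
    )

# Hypothetical table and one-week range:
query = partition_pruned_query(
    "myproject.telemetry.events", "2023-01-01", "2023-01-07"
)
```

At ~100 million records per day, scanning one week of partitions instead of two years of history is the difference between roughly 0.7 billion and 70+ billion rows read.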
3. You are an administrator for a few organizations within the same company. Each organization has data in its own BigQuery table within a single project. For reasons related to application access, all of the tables must remain in the same project. You believe each organization should be able to view and execute queries against its own data without revealing data from other organizations to unauthorized viewers. What would you recommend?
a. In that project, create one dataset per organization and put each organization's table into its own dataset. Grant each organization access only to its own dataset. Each organization can now see its own table but nobody else's.
i. You can only assign roles at the dataset level, so putting the tables into different datasets lets you control access per organization.
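A minimal sketch of the resulting access layout, using plain dictionaries that mirror the shape of the `access` list in the BigQuery datasets REST resource (`role` plus `groupByEmail`). The dataset names and group addresses are made up for illustration.

```python
def reader_access_for(group_email: str) -> list:
    # One access entry per dataset, mirroring the "access" list of the
    # BigQuery datasets REST resource: only the named group gets READER
    # on this dataset, so other organizations' groups see nothing.
    return [{"role": "READER", "groupByEmail": group_email}]

# Hypothetical organizations and their Google group addresses:
orgs = {
    "sales_dataset": "sales-team@example.com",
    "ops_dataset": "ops-team@example.com",
}
dataset_access = {ds: reader_access_for(email) for ds, email in orgs.items()}
```

Because each dataset's access list names exactly one group, a member of the sales group can query tables in `sales_dataset` but cannot even list the tables in `ops_dataset`, while both datasets stay in the same project.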
4. Your company is making the move to Google Cloud and has decided to go with a managed database service to reduce overhead. Your current database supports a product catalog that does real-time inventory tracking for a retailer. Your database is 500 GB in size. The data is semi-structured but doesn't require full atomicity. You want a truly no-ops/serverless solution. Which of the following should you use for your storage?
a. Cloud Datastore
i. Datastore is ideal for semi-structured data less than 1 TB in size, and product catalogs are a recommended use case.
5. How would you configure your Dataproc environment to use BigQuery as an input and output source?
a. Install the BigQuery connector on your Dataproc cluster.
i. You can install the BigQuery connector on your cluster for direct programmatic read/write access to BigQuery. Note that a Cloud Storage bucket is used as a staging area between the two services, but your code interacts directly with BigQuery from Dataproc.
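As a sketch of what this looks like in practice, one common way to attach the connector is to pass the spark-bigquery connector jar when submitting a job; the cluster name, region, and job file below are placeholders, and the jar path is the publicly hosted connector location documented for Dataproc.

```shell
# Sketch only - cluster name, region, and job file are placeholders.
# Submitting a PySpark job with the spark-bigquery connector jar attached
# lets the job read from and write to BigQuery directly (staging through
# a Cloud Storage bucket behind the scenes).
gcloud dataproc jobs submit pyspark my_job.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar
```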
6. In AI Platform, what does the CUSTOM tier allow you to configure? Choose the best answer.
a. A custom number of workers and parameter servers, and the machine type of the master server.
i. Correct. You can customize the number of workers and parameter servers, but the number of masters is fixed at one.
7. You are creating a data pipeline in Google Cloud. You need to preprocess source data for a machine learning model. In particular, you need to quickly remove duplicate rows from three input tables, and you need to remove outliers from columns of data for which you don't know the distribution. What do you do?
a. Use Cloud Dataprep to review the range of values in the sample source data table columns and add the necessary transformations to the job. For each column, click the column name, click each appropriate suggested transformation, and then click Add to add the transformation to the Cloud Dataprep job.
i. Dataprep is the correct choice since the requirement is to prepare/clean the source data. For deduplication, using the suggested transformations is easier and faster than building a recipe from scratch, which is more work than necessary.
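Dataprep applies these steps through its UI, but the underlying cleaning logic is easy to see in code. The sketch below, using only the Python standard library, deduplicates rows and drops outliers with the interquartile-range (IQR) rule, a common distribution-free heuristic that fits the "unknown distribution" requirement; the sample data is made up.

```python
import statistics

def remove_duplicates(rows):
    """Keep the first occurrence of each distinct row (rows are dicts)."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def iqr_bounds(values):
    """Distribution-free outlier fences: 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up sample data:
rows = [{"id": 1, "v": 10}, {"id": 1, "v": 10}, {"id": 2, "v": 12}]
deduped = remove_duplicates(rows)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
low, high = iqr_bounds(values)
kept = [v for v in values if low <= v <= high]
```

The IQR rule needs no assumption about the shape of the data, which is why it (rather than, say, a z-score cutoff that assumes normality) matches the scenario's "don't know the distribution" constraint.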
8. You keep regular snapshots of the boot disks of running Compute Engine instances as part of a backup and restore plan. You need to restore from these snapshots to replacement instances in the fewest number of steps. What do you do?
a. Use the snapshots to create replacement instances as needed.
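One way to sketch this with gcloud (all names, the snapshot, and the zone below are placeholders): create a new boot disk from the snapshot, then boot a replacement instance from that disk, with no intermediate image creation.

```shell
# Sketch only - disk, snapshot, instance names, and zone are placeholders.
# 1) Create a new boot disk directly from the saved snapshot.
gcloud compute disks create restored-boot-disk \
    --source-snapshot=nightly-boot-snapshot \
    --zone=us-central1-a

# 2) Boot the replacement instance from that disk.
gcloud compute instances create replacement-vm \
    --disk=name=restored-boot-disk,boot=yes \
    --zone=us-central1-a
```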