[May 01, 2026] Get Up-To-Date Real Exam Questions for DEA-C01 with New Materials [Q166-Q188]

Updated DEA-C01 Certification Exam Sample Questions

NEW QUESTION # 166
When would a data engineer use TABLE with the FLATTEN function instead of the LATERAL FLATTEN combination?

  • A. When the LATERAL FLATTEN combination requires no other source in the FROM clause to refer to
  • B. When TABLE with FLATTEN requires no additional source in the FROM clause to refer to
  • C. When TABLE with FLATTEN requires another source in the FROM clause to refer to
  • D. When TABLE with FLATTEN is acting like a sub-query executed for each returned row

Answer: C

Explanation:
The TABLE function with the FLATTEN function is used to flatten semi-structured data, such as JSON or XML, into a relational format. The TABLE function returns a table expression that can be used in the FROM clause of a query. The TABLE function with the FLATTEN function requires another source in the FROM clause to refer to, such as a table, view, or subquery that contains the semi-structured data. For example:
SELECT t.value:city::string AS city, f.value AS population FROM cities t, TABLE(FLATTEN(input => t.value:population)) f;
In this example, the TABLE function with the FLATTEN function refers to the cities table in the FROM clause, which contains JSON data in a variant column named value. The FLATTEN function flattens the population array within each JSON object and returns a table expression whose columns include KEY and VALUE.
The query then selects the city and population values from the table expression.
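The row-by-row expansion described in this explanation can be pictured with a small Python sketch. It is an analogy only, not Snowflake code: the `cities` rows, the `lateral_flatten` helper, and the output field names are hypothetical, and the sketch assumes the default behaviour in which a row with an empty array produces no output.

```python
# Hypothetical rows standing in for a table with a VARIANT-like column.
cities = [
    {"city": "Springfield", "population": [30000, 31000]},
    {"city": "Shelbyville", "population": [12000]},
    {"city": "Ogdenville", "population": []},
]


def lateral_flatten(rows, array_field):
    """Yield one output row per array element, paired with its source row.

    Rows whose array is empty yield nothing, which mirrors a
    non-OUTER flatten.
    """
    for row in rows:
        for index, value in enumerate(row[array_field]):
            yield {"city": row["city"], "index": index, "value": value}


flattened = list(lateral_flatten(cities, "population"))
for out in flattened:
    print(out)
```

Each source row is visited once and its array is expanded against it, which is why the flatten step needs another source in the FROM clause to refer to.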


NEW QUESTION # 167
A company uses an organization in AWS Organizations to manage multiple AWS accounts. The company uses an enhanced fanout data stream in Amazon Kinesis Data Streams to receive streaming data from multiple producers. The company runs the data stream in an account named Account A. The company wants to use an AWS Lambda function in an account named Account B to process the data from the data stream. The company creates a Lambda execution role in Account B that has permissions to access data from the data stream in Account A.
What additional step must the company take to meet this requirement?

  • A. Add a resource-based policy to the data stream to allow read access for the cross-account Lambda execution role.
  • B. Create a service control policy (SCP) to grant the data stream read access to the cross-account Lambda execution role. Attach the SCP to Account A.
  • C. Create a service control policy (SCP) to grant the data stream read access to the cross-account Lambda execution role. Attach the SCP to Account B.
  • D. Add a resource-based policy to the cross-account Lambda function to grant the data stream read access to the function.

Answer: A

Explanation:
To enable cross-account Lambda processing of Kinesis Data Streams, the stream in Account A must explicitly allow the Lambda execution role from Account B. This is done by adding a resource-based policy on the Kinesis data stream to grant kinesis:SubscribeToShard and related read permissions to the cross-account role. Without this resource-based policy, the Lambda in Account B cannot consume the data.
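As a rough illustration, a resource-based policy of this kind could be assembled as below. This is a minimal sketch: the account IDs, ARNs, statement ID, and the exact action list are assumptions made for the example, and an enhanced fan-out consumer may need its own policy covering kinesis:SubscribeToShard on the registered consumer.

```python
import json

# Hypothetical identifiers, for illustration only.
STREAM_ARN = "arn:aws:kinesis:us-east-1:111111111111:stream/example-stream"
LAMBDA_ROLE_ARN = "arn:aws:iam::222222222222:role/example-lambda-execution-role"


def build_stream_read_policy(stream_arn, role_arn):
    """Return a policy document that lets role_arn read from stream_arn."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",  # assumed statement ID
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": [
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": stream_arn,
            }
        ],
    }


policy = build_stream_read_policy(STREAM_ARN, LAMBDA_ROLE_ARN)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The key point is that the principal is the execution role in Account B while the policy itself lives on the stream in Account A.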


NEW QUESTION # 168
A company receives call logs as Amazon S3 objects that contain sensitive customer information.
The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access.
Which solution will meet these requirements with the LEAST effort?

  • A. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
  • B. Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
  • C. Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
  • D. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.

Answer: D


NEW QUESTION # 169
As a Data Engineer, you need to query the most recent data from a large dataset that resides in external cloud storage. How would you design your data pipelines, keeping in mind the fastest time to delivery?

  • A. External tables with Materialized views can be created in Snowflake.
  • B. Unload data into Snowflake internal data storage using the PUT command.
  • C. Direct Querying External tables on top of existing data stored in external cloud storage for analysis without first loading it into Snowflake.
  • D. Data pipelines would be created to first load data into internal stages & then into Permanent table with SCD Type 2 transformation.
  • E. Snowpipe can be leveraged with streams to load data in micro batch fashion with CDC streams that capture most recent data only.

Answer: A

Explanation:
In a typical table, the data is stored in the database; however, in an external table, the data is stored in files in an external stage. External tables store file-level metadata about the data files, such as the filename, a version identifier and related properties. This enables querying data stored in files in an external stage as if it were inside a database. External tables can access data stored in any format supported by COPY INTO <table> statements.
External tables are read-only, therefore no DML operations can be performed on them; however, external tables can be used for query and join operations. Views can be created against external tables.
Querying data stored external to the database is likely to be slower than querying native database tables; however, materialized views based on external tables can improve query performance.
Creating external tables enables users to query existing data stored in external cloud storage for analysis without first loading it into Snowflake. The source of truth for the data remains in the external cloud storage.
Data sets materialized in Snowflake via materialized views are read-only.
This solution is especially beneficial to accounts that have a large amount of data stored in external cloud storage and only want to query a portion of the data; for example, the most recent data. Users can create materialized views on subsets of this data for improved query performance.


NEW QUESTION # 170
A company runs an Apache Spark application every night in an Amazon EMR cluster. The company uses Amazon EC2 instances to supply compute capacity for the EMR cluster. The company deployed the Spark application in cluster mode.
An error occurs in the Spark application. A log for the error is stored in the application's Spark driver standard error logs. A data engineer needs to investigate the error.
Where can the data engineer find this error log?

  • A. The engineer can connect to the persistent application UI to see the first YARN container log in the Spark UI.
  • B. The engineer can connect to the web UI on the live cluster to see the YARN ResourceManager logs.
  • C. The engineer can connect to the Amazon EMR console to see the Amazon EMR step logs that are archived in Amazon S3.
  • D. The engineer can connect to the primary node of the cluster by using SSH to see the Spark history server logs.

Answer: C

Explanation:
In EMR cluster mode, the Spark driver runs inside the cluster and its standard error logs are captured as step logs. These logs are automatically archived to Amazon S3 and accessible from the Amazon EMR console under step logs. This is the correct location for investigating Spark driver errors.


NEW QUESTION # 171
A data engineer must maintain and monitor a data pipeline on AWS that processes streaming data from Internet of Things (IoT) devices. The pipeline uses Amazon Kinesis Data Streams to ingest data and Amazon Data Firehose to deliver data to an Amazon S3 bucket. The data engineer needs to monitor the health of the pipeline. Which solution will meet these requirements with the LEAST operational effort?

  • A. Configure Amazon CloudWatch alarms to monitor key metrics such as IncomingBytes, OutgoingBytes, and DeliveryToS3.Success for Kinesis Data Streams and Firehose.
  • B. Use Amazon Managed Service for Apache Flink to perform near real-time anomaly detection on the streaming data and to invoke alerts if unusual patterns are detected.
  • C. Use an AWS Lambda function to run daily checks on the status of the Kinesis Data Streams and Firehose. Configure the Lambda function to use Amazon Simple Notification Service (Amazon SNS) to send notifications.
  • D. Use Amazon CloudWatch Logs to manually review logs that are generated by Kinesis Data Streams and Firehose.

Answer: A

Explanation:
CloudWatch metrics for Kinesis Data Streams and Firehose are available automatically and provide direct visibility into ingestion, throughput, and delivery success. Configuring CloudWatch alarms on key metrics (such as stream input/output and Firehose delivery success) enables proactive, automated health monitoring and alerting with minimal setup and no custom code or additional services to operate.
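One such alarm could be described with parameters like the following. This is a hedged sketch: the alarm name, threshold, period, and SNS topic ARN are assumptions chosen for illustration, and the dictionary is only built here, not sent to AWS. In practice it could be passed as keyword arguments to the CloudWatch PutMetricAlarm API.

```python
def firehose_delivery_alarm(delivery_stream_name, sns_topic_arn):
    """Build PutMetricAlarm parameters for failing Firehose S3 deliveries."""
    return {
        "AlarmName": f"{delivery_stream_name}-s3-delivery-failing",  # assumed name
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToS3.Success",
        "Dimensions": [
            {"Name": "DeliveryStreamName", "Value": delivery_stream_name}
        ],
        "Statistic": "Average",
        "Period": 300,               # seconds; assumed evaluation window
        "EvaluationPeriods": 3,      # assumed number of periods
        "Threshold": 1.0,            # alarm when the success ratio drops below 1
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical stream name and topic ARN.
alarm = firehose_delivery_alarm(
    "iot-delivery-stream",
    "arn:aws:sns:us-east-1:111111111111:pipeline-alerts",
)
print(alarm["AlarmName"])
```

A similar parameter set with the AWS/Kinesis namespace and a StreamName dimension would cover the ingestion side of the pipeline.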


NEW QUESTION # 172
A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker.
The company wants to make real-time inventory recommendations. The company also wants to make predictions about future inventory needs.
Which solutions will meet these requirements? (Choose two.)

  • A. Use SageMaker Autopilot to create inventory management dashboards in Amazon Redshift.
  • B. Use Amazon Redshift ML to schedule regular data exports for offline model training.
  • C. Use SQL to invoke a remote SageMaker endpoint for prediction.
  • D. Use Amazon Redshift as a file storage system to archive old inventory management reports.
  • E. Use Amazon Redshift ML to generate inventory recommendations.

Answer: C,E

Explanation:
Amazon Redshift ML integrates machine learning (ML) directly into the Redshift environment, allowing you to build and use ML models with SQL commands. By leveraging Redshift ML, the company can make real-time inventory recommendations based on historical and current data directly within Redshift.
Redshift can invoke external services, such as a SageMaker real-time endpoint, using SQL queries. This allows the company to send real-time data from Redshift to SageMaker and receive predictions (e.g., inventory forecasting) in real time, meeting the need for real-time predictions.


NEW QUESTION # 173
Which Snowflake feature facilitates access to external API services such as geocoders, data transformation, machine learning models, and other custom code?

  • A. External functions
  • B. External tables
  • C. Java User-Defined Functions (UDFs)
  • D. Security integration

Answer: A

Explanation:
External functions are Snowflake functions that facilitate access to external API services such as geocoders, data transformation, machine learning models and other custom code. External functions allow users to invoke external services from within SQL queries and pass arguments and receive results as JSON values. External functions require creating an API integration object and an external function object in Snowflake, as well as deploying an external service endpoint that can communicate with Snowflake via HTTPS.


NEW QUESTION # 174
Which are the two ways to access elements in a JSON object?

  • A. Use dot notation to traverse a path in a JSON object:
    <column>:<level1_element>.<level2_element>.<level3_element>.
  • B. Use semicolon notation to traverse a path in a JSON object:
    <column>:<level1_element>;<level2_element>;<level3_element>.
  • C. Use curly bracket notation to traverse the path in an object:
    <column>{'<level1_element>'}{'<level2_element>'}.
  • D. Use bracket notation to traverse the path in an object:
    <column>['<level1_element>']['<level2_element>'].

Answer: A,D
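The two path styles can be compared with a short Python analogy. This is a sketch only: the document, the `get_dot` and `get_bracket` helpers, and the path strings are hypothetical and do not reproduce Snowflake's parser. They simply show that both notations walk the same nested keys.

```python
import json
import re

# Hypothetical nested JSON document.
doc = json.loads('{"customer": {"address": {"city": "Berlin"}}}')


def get_dot(value, path):
    """Follow a dot-separated path such as 'customer.address.city'."""
    for key in path.split("."):
        value = value[key]
    return value


def get_bracket(value, path):
    """Follow a bracket path such as "['customer']['address']['city']"."""
    for key in re.findall(r"\['([^']+)'\]", path):
        value = value[key]
    return value


dot_result = get_dot(doc, "customer.address.city")
bracket_result = get_bracket(doc, "['customer']['address']['city']")
print(dot_result, bracket_result)
```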


NEW QUESTION # 175
A company has several new datasets in CSV and JSON formats. A data engineer needs to make the data available to a team of data analysts who will analyze the data by using SQL queries.
Which solution will meet these requirements in the MOST cost-effective way?

  • A. Create an Amazon RDS MySQL cluster. Use AWS Glue to transform and load the CSV and JSON files into database tables. Provide the data analysts access to the MySQL cluster.
  • B. Create an AWS Glue DataBrew project that contains the new data. Make the DataBrew project available to the data analysts.
  • C. Store the data in an Amazon S3 bucket. Use an AWS Glue crawler to catalog the S3 bucket as tables. Create an Amazon Athena workgroup that has a data usage threshold. Grant the data analysts access to the Athena workgroup.
  • D. Load the data into Super-fast, Parallel, In-memory Calculation Engine (SPICE) in Amazon QuickSight. Allow the data analysts to create analyses and dashboards in QuickSight.

Answer: C

Explanation:
By storing the CSV and JSON files in Amazon S3 and running an AWS Glue crawler, you automatically catalog them as tables without upfront ETL.
Amazon Athena then lets analysts run SQL queries directly against those tables on a pay-per-query basis. Using an Athena workgroup with a usage threshold keeps costs under control, making this the most cost-effective, low-operational-overhead solution.


NEW QUESTION # 176
An application consumes messages from an Amazon Simple Queue Service (Amazon SQS) queue. The application experiences occasional downtime. As a result of the downtime, messages within the queue expire and are deleted after 1 day. The message deletions cause data loss for the application.
Which solutions will minimize data loss for the application? (Choose two.)

  • A. Reduce message processing time.
  • B. Increase the message retention period.
  • C. Use a delay queue to delay message delivery.
  • D. Increase the visibility timeout.
  • E. Attach a dead-letter queue (DLQ) to the SQS queue.

Answer: B,E
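The two selected changes map onto queue attributes roughly as follows. This is a minimal sketch: the DLQ ARN and the maxReceiveCount value are assumptions for illustration, and the attribute dictionary is only built here rather than applied to a real queue.

```python
import json

# SQS accepts retention periods from 60 seconds up to 14 days.
MIN_RETENTION_SECONDS = 60
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # 1,209,600 seconds


def build_queue_attributes(dlq_arn, retention_seconds=MAX_RETENTION_SECONDS,
                           max_receive_count=5):
    """Build queue attributes for a longer retention period plus a DLQ."""
    if not MIN_RETENTION_SECONDS <= retention_seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention_seconds is outside the range SQS accepts")
    return {
        "MessageRetentionPeriod": str(retention_seconds),
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": max_receive_count,  # assumed value
            }
        ),
    }


# Hypothetical dead-letter queue ARN.
attributes = build_queue_attributes(
    "arn:aws:sqs:us-east-1:111111111111:example-dlq"
)
print(attributes)
```

Raising the retention period from 1 day gives the consumer more time to recover before messages expire, and the redrive policy moves repeatedly failing messages aside instead of losing them.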


NEW QUESTION # 177
An ecommerce company operates a complex order fulfilment process that spans several operational systems hosted in AWS. Each of the operational systems has a Java Database Connectivity (JDBC)-compliant relational database where the latest processing state is captured.
The company needs to give an operations team the ability to track orders on an hourly basis across the entire fulfillment process.
Which solution will meet these requirements with the LEAST development overhead?

  • A. Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Amazon QuickSight dashboards that track the orders.
  • B. Use AWS Glue to build ingestion pipelines from the operational systems into Amazon Redshift. Build dashboards in Amazon QuickSight that track the orders.
  • C. Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Grafana dashboards that track the orders.
  • D. Use AWS Glue to build ingestion pipelines from the operational systems into Amazon DynamoDB. Build dashboards in Amazon QuickSight that track the orders.

Answer: B


NEW QUESTION # 178
A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original company modifies a query that aggregates sales revenue data to join sales tables from both companies.
The sales table for the first company is named Table S1. The sales table for the second company is named Table S2. Table S1 contains 10 billion records. Table S2 contains 900 million records.
The query becomes slow after the modification. A data engineer must improve the query performance.
Which solutions will meet these requirements? (Choose two.)

  • A. Use the KEY distribution style for both sales tables. Select a low cardinality column to use for the join.
  • B. Use Amazon Redshift Advisor to review and select optimizations to implement.
  • C. Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.
  • D. Use the Amazon Redshift query optimizer to review and select optimizations to implement.
  • E. Use the KEY distribution style for both sales tables. Select a high cardinality column to use for the join.

Answer: B,E

Explanation:
Choosing KEY distribution on both tables with a high-cardinality join column colocates matching rows across nodes and avoids data skew, improving join performance.
Redshift Advisor provides automated, actionable recommendations (e.g., distribution and sort keys, stats) to further optimize the slow query with minimal effort.
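The effect of join-column cardinality on skew can be illustrated with a toy hash-distribution model. This is a sketch under stated assumptions: the slice count, the SHA-256 based placement, and the row counts are all invented for the example and do not reproduce Redshift's internal hashing.

```python
import hashlib
from collections import Counter

NUM_SLICES = 8  # hypothetical number of slices


def slice_for(key):
    """Map a distribution-key value to a slice using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLICES


def rows_per_slice(keys):
    """Count how many rows land on each slice for the given key values."""
    counts = Counter(slice_for(k) for k in keys)
    return [counts.get(s, 0) for s in range(NUM_SLICES)]


ROW_COUNT = 8000
low_cardinality = rows_per_slice(i % 3 for i in range(ROW_COUNT))  # 3 distinct keys
high_cardinality = rows_per_slice(range(ROW_COUNT))                # 8000 distinct keys

print("low cardinality :", low_cardinality)
print("high cardinality:", high_cardinality)
```

With only three distinct key values, at most three slices receive any rows, while many distinct values spread the rows across every slice.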


NEW QUESTION # 179
Snowflake does not treat the inner transaction as nested; instead, the inner transaction is a separate transaction.
What term is used for these transactions?

  • A. Atomic Transaction
  • B. Scoped transactions
  • C. Nested Scope Transaction
  • D. Inner Transaction
  • E. Enclosed Transaction

Answer: B


NEW QUESTION # 180
A stream called TRANSACTIONS_STM is created on top of a TRANSACTIONS table in a continuous pipeline running in Snowflake. After a couple of months, the TRANSACTIONS table is renamed TRANSACTIONS_RAW to comply with new naming standards. What will happen to the TRANSACTIONS_STM object?

  • A. TRANSACTIONS_STM will keep working as expected.
  • B. TRANSACTIONS_STM will be stale and will need to be re-created.
  • C. Reading from the TRANSACTIONS_STM stream will succeed for some time after the expected STALE_TIME.
  • D. TRANSACTIONS_STM will be automatically renamed TRANSACTIONS_RAW_STM.

Answer: B

Explanation:
A stream is a Snowflake object that records the history of changes made to a table. A stream is associated with a specific table at the time of creation, and it cannot be altered to point to a different table later. Therefore, if the source table is renamed, the stream will become stale and will need to be re-created with the new table name. The other options are not correct because:
TRANSACTIONS_STM will not keep working as expected, as it will lose track of the changes made to the renamed table.
TRANSACTIONS_STM will not be automatically renamed TRANSACTIONS_RAW_STM, as streams do not inherit the name changes of their source tables.
Reading from the TRANSACTIONS_STM stream will not succeed for some time after the expected STALE_TIME, as streams do not have a STALE_TIME property.


NEW QUESTION # 181
A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:

Which solution will meet this requirement with the LEAST coding effort?

  • A. Use AWS Glue DataBrew to read the files. Use the NEST_TO_MAP transformation to create the new column.
  • B. Use AWS Glue DataBrew to read the files. Use the NEST_TO_ARRAY transformation to create the new column.
  • C. Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
  • D. Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.

Answer: A


NEW QUESTION # 182
A company uses an Amazon S3 Standard bucket to maintain a self-managed transactional data lake that uses Apache Iceberg tables. The data lake ingests data both in real time and in batches.
Users report slow performance for real-time tables. A data engineer reviews the real-time tables and notices that the tables are made up of many small data files. The data engineer must improve the performance of the real-time tables.
Which solution will meet this requirement?

  • A. Archive historic snapshots.
  • B. Apply compaction.
  • C. Delete S3 objects that are not linked from the Iceberg table.
  • D. Expire historic snapshots.

Answer: B

Explanation:
Compaction merges many small data files into fewer, larger files, which reduces file count and metadata overhead in Apache Iceberg tables, directly improving query performance for real-time workloads.
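The idea of merging small files toward a target size can be sketched with a simple greedy grouping. This is an illustration only: the 128 MB target, the file sizes, and the `plan_compaction` helper are assumptions, and the sketch does not reproduce how Iceberg itself plans file rewrites.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group files, in order, into batches of at most target_mb."""
    groups, current, current_size = [], [], 0
    for size in file_sizes_mb:
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


small_files = [4] * 100  # one hundred hypothetical 4 MB files
plan = plan_compaction(small_files)
print(f"{len(small_files)} input files -> {len(plan)} output files")
```

Each batch would be rewritten as one larger file, so a query planner has far fewer files to open and track.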


NEW QUESTION # 183
A company plans to use Amazon Kinesis Data Firehose to store data in Amazon S3. The source data consists of 2 MB .csv files. The company must convert the .csv files to JSON format. The company must store the files in Apache Parquet format.
Which solution will meet these requirements with the LEAST development effort?

  • A. Use Kinesis Data Firehose to convert the .csv files to JSON. Use an AWS Lambda function to store the files in Parquet format.
  • B. Use Kinesis Data Firehose to convert the .csv files to JSON and to store the files in Parquet format.
  • C. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON. Use Kinesis Data Firehose to store the files in Parquet format.
  • D. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON and stores the files in Parquet format.

Answer: B

Explanation:
By using the built-in transformation and format conversion features of Kinesis Data Firehose, you achieve the desired result with minimal custom development, thereby meeting the requirements efficiently and cost-effectively.


NEW QUESTION # 184
A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance.
The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet.
Which combination of steps will meet this requirement with the LEAST operational overhead?
(Choose two.)

  • A. Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.
  • B. Turn on the public access setting for the DB instance.
  • C. Update the security group of the DB instance to allow only Lambda function invocations on the database port.
  • D. Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.
  • E. Configure the Lambda function to run in the same subnet that the DB instance uses.

Answer: C,E


NEW QUESTION # 185
A company has multiple applications that use datasets that are stored in an Amazon S3 bucket.
The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII.
To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset.
Which solution will meet the requirements with the LEAST operational overhead?

  • A. Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
  • B. Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
  • C. Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.
  • D. Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.

Answer: D

Explanation:
Amazon S3 Object Lambda allows you to add your own code to S3 GET requests to modify and process data as it is returned to an application. For example, you could use an S3 Object Lambda to dynamically redact personally identifiable information (PII) from data retrieved from S3. This would allow you to control access to sensitive information based on the needs of different applications, without having to create and manage multiple copies of your data.
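The redaction step inside such a function might look like the sketch below. Everything here is hypothetical: the field names, the per-application rules, and the JSON Lines layout are assumptions, and the code that fetches the original object and returns the transformed result to S3 is deliberately left out.

```python
import json

# Hypothetical PII fields and per-application allowances.
PII_FIELDS = {"name", "email", "phone"}
ALLOWED_PII = {
    "ecommerce": PII_FIELDS,  # may see all PII
    "analytics": set(),       # may see none
}


def redact_record(record, application):
    """Replace PII values the application is not allowed to see."""
    allowed = ALLOWED_PII.get(application, set())  # unknown apps see no PII
    return {
        key: "[REDACTED]" if key in PII_FIELDS and key not in allowed else value
        for key, value in record.items()
    }


def redact_object(body, application):
    """Redact every record in a JSON Lines payload."""
    lines = [
        json.dumps(redact_record(json.loads(line), application))
        for line in body.splitlines()
        if line.strip()
    ]
    return "\n".join(lines)


sample = '{"order_id": 1, "email": "a@example.com", "total": 42.5}'
print(redact_object(sample, "analytics"))
print(redact_object(sample, "ecommerce"))
```

Because the rules are applied at read time, a single stored copy of the dataset can serve both applications.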


NEW QUESTION # 186
If you need to connect to Snowflake using a BI tool or technology, which of the following BI tools and technologies are known to provide native connectivity to Snowflake?

  • A. PROTEGRITY
  • B. SISENSE
  • C. SELECT STAR
  • D. ALATION

Answer: B

Explanation:
SISENSE is a BI tool that is known to provide native connectivity to Snowflake. The rest of the options are security and governance tools supported by Snowflake.
Business intelligence (BI) tools enable analyzing, discovering, and reporting on data to help executives and managers make more informed business decisions. A key component of any BI tool is the ability to deliver data visualization through dashboards, charts, and other graphical output.
For more details on the BI tools supported in the Snowflake ecosystem, refer to the link below:
https://docs.snowflake.com/en/user-guide/ecosystem-bi


NEW QUESTION # 187
A retail company uses Amazon Aurora PostgreSQL to process and store live transactional data.
The company uses an Amazon Redshift cluster for a data warehouse.
An extract, transform, and load (ETL) job runs every morning to update the Redshift cluster with new data from the PostgreSQL database. The company has grown rapidly and needs to cost optimize the Redshift cluster.
A data engineer needs to create a solution to archive historical data. The data engineer must be able to run analytics queries that effectively combine data from live transactional data in PostgreSQL, current data in Redshift, and archived historical data. The solution must keep only the most recent 15 months of data in Amazon Redshift to reduce costs.
Which combination of steps will meet these requirements? (Choose two.)

  • A. Create a materialized view in Amazon Redshift that combines live, current, and historical data from different sources.
  • B. Configure the Amazon Redshift Federated Query feature to query live transactional data that is in the PostgreSQL database.
  • C. Schedule a monthly job to copy data that is older than 15 months to Amazon S3 by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Amazon Redshift Spectrum to access historical data in Amazon S3.
  • D. Configure Amazon Redshift Spectrum to query live transactional data that is in the PostgreSQL database.
  • E. Schedule a monthly job to copy data that is older than 15 months to Amazon S3 Glacier Flexible Retrieval by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Redshift Spectrum to access historical data from S3 Glacier Flexible Retrieval.

Answer: B,C

Explanation:
Choice B ensures that live transactional data from PostgreSQL can be accessed directly within Redshift queries by using the Federated Query feature.
Choice C archives historical data in Amazon S3, reducing storage costs in Redshift while still making the data accessible via Redshift Spectrum.
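The 15-month retention boundary that a monthly archive job would use can be computed as in the sketch below. The `archive_cutoff` helper and the choice to align the cutoff to the first day of a month are assumptions made for illustration, not part of the question.

```python
from datetime import date


def archive_cutoff(today, months_to_keep=15):
    """Return the first day of the month that lies months_to_keep months
    before the month of `today`. Rows dated before this would be archived."""
    total_months = today.year * 12 + (today.month - 1) - months_to_keep
    return date(total_months // 12, total_months % 12 + 1, 1)


cutoff = archive_cutoff(date(2026, 5, 1))
print(cutoff)
```

Rows older than the cutoff would be unloaded to Amazon S3 and then deleted from the cluster, leaving only the most recent 15 months in Redshift.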


NEW QUESTION # 188
......


Snowflake DEA-C01 Exam Syllabus Topics:

Topic / Details
Topic 1
  • Performance Optimization: This topic assesses the ability to optimize and troubleshoot underperforming queries in Snowflake. Candidates must demonstrate knowledge in configuring optimal solutions, utilizing caching, and monitoring data pipelines. It focuses on ensuring engineers can enhance performance based on specific scenarios, crucial for Snowflake Data Engineers and Software Engineers.
Topic 2
  • Data Movement: Snowflake Data Engineers and Software Engineers are assessed on their proficiency to load, ingest, and troubleshoot data in Snowflake. It evaluates skills in building continuous data pipelines, configuring connectors, and designing data sharing solutions.
Topic 3
  • Data Transformation: The SnowPro Advanced: Data Engineer exam evaluates skills in using User-Defined Functions (UDFs), external functions, and stored procedures. It assesses the ability to handle semi-structured data and utilize Snowpark for transformations. This section ensures Snowflake engineers can effectively transform data within Snowflake environments, critical for data manipulation tasks.
Topic 4
  • Storage and Data Protection: The topic tests the implementation of data recovery features and the understanding of Snowflake's Time Travel and micro-partitions. Engineers are evaluated on their ability to create new environments through cloning and ensure data protection, highlighting essential skills for maintaining Snowflake data integrity and accessibility.
Topic 5
  • Security: The Security topic of the DEA-C01 test covers the principles of Snowflake security, including the management of system roles and data governance. It measures the ability to secure data and ensure compliance with policies, crucial for maintaining secure data environments for Snowflake Data Engineers and Software Engineers.

 

DEA-C01 Study Guide Cover to Cover as Literally: https://exams4sure.pdftorrent.com/DEA-C01-latest-dumps.html