I recently passed the DP-200 exam, one of the two exams required to become an Azure Data Engineer. Below I'm sharing some key points to consider in your studies. These are my personal notes and THEY DO NOT COVER EVERYTHING ON THE EXAM, so keep following the official recommendations!

Distributions in Synapse:

· Round-robin → Distributes rows evenly; the default, mostly used for staging tables

· Replicated → Dimension tables smaller than 2 GB; a full copy on every compute node makes joins faster

· Hash → For large tables (fact tables); speeds up joins and aggregations on the distribution column
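To make the difference between round-robin and hash concrete, here is a small simulation sketch. It assumes the 60 distributions of a dedicated SQL pool; the md5 hash is only a stand-in for the engine's internal hash function, the strict turn-taking is a simplification of how round-robin placement works, and the row layout (`customer_id`, `amount`) is made up for illustration.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads every table over 60 distributions


def round_robin(rows):
    """Place rows on distributions in turn, ignoring their content (simplified)."""
    return [i % DISTRIBUTIONS for i in range(len(rows))]


def hash_distributed(rows, key):
    """Place each row by a hash of its distribution column (md5 is a stand-in)."""
    placements = []
    for row in rows:
        digest = hashlib.md5(str(row[key]).encode()).hexdigest()
        placements.append(int(digest, 16) % DISTRIBUTIONS)
    return placements


# Hypothetical fact rows: 6000 sales spread over 100 customers.
rows = [{"customer_id": i % 100, "amount": i} for i in range(6000)]

rr_counts = Counter(round_robin(rows))
hash_placement = hash_distributed(rows, "customer_id")

# For every customer, collect the distributions its rows ended up on.
homes = {}
for row, dist in zip(rows, hash_placement):
    homes.setdefault(row["customer_id"], set()).add(dist)

print("round-robin rows per distribution:", set(rr_counts.values()))
print("each customer on one distribution:", all(len(d) == 1 for d in homes.values()))
```

Round-robin gives a perfectly even spread but scatters each customer's rows everywhere, while hash keeps all rows with the same key together, which is why joins and aggregations on that column need no data movement.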

Indexes in Synapse:

· Clustered columnstore index → The default; best for read-heavy tables, good for aggregations over large volumes

· Clustered index → Rows stored sorted by the index key; good for lookups that return few rows

· Heap → No natural Order
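As a rough sketch of how these table options are written, the helper below builds a `CREATE TABLE` statement for a dedicated SQL pool. The helper name `create_table_ddl` and all table and column names are hypothetical; only the `WITH (DISTRIBUTION = ..., <index option>)` clause reflects the T-SQL syntax.

```python
def create_table_ddl(name, columns, distribution="ROUND_ROBIN",
                     index="CLUSTERED COLUMNSTORE INDEX"):
    """Return a CREATE TABLE statement with a distribution and an index option."""
    cols = ",\n    ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE TABLE {name} (\n    {cols}\n)\n"
        f"WITH (DISTRIBUTION = {distribution}, {index});"
    )


# Staging table: round-robin heap, fastest to load.
staging = create_table_ddl(
    "stg.Sales", [("SaleKey", "INT"), ("Amount", "DECIMAL(18,2)")],
    distribution="ROUND_ROBIN", index="HEAP")

# Small dimension: replicated, clustered index on its key.
dimension = create_table_ddl(
    "dbo.DimProduct", [("ProductKey", "INT"), ("Name", "NVARCHAR(100)")],
    distribution="REPLICATE", index="CLUSTERED INDEX (ProductKey)")

# Large fact table: hash-distributed, clustered columnstore.
fact = create_table_ddl(
    "dbo.FactSales", [("SaleKey", "INT"), ("ProductKey", "INT")],
    distribution="HASH(ProductKey)")

print(fact)
```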

Delete stale data from partitions in Synapse:

· Create an empty copy of the table with CTAS (same distribution, index and partition scheme)

· Switch the partition holding the old data into the copy

· Drop the copy
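The three steps above can be sketched as the T-SQL statements generated below. The helper `purge_partition_statements`, the `_stale` suffix and the example table are hypothetical, and the sketch assumes a hash-distributed, clustered columnstore table partitioned with `RANGE RIGHT`.

```python
def purge_partition_statements(table, dist_col, part_col, boundaries, partition_no):
    """Return the CTAS, SWITCH and DROP statements that remove one partition."""
    stale = f"{table}_stale"  # hypothetical name for the throwaway table
    values = ", ".join(str(b) for b in boundaries)
    ctas = (
        f"CREATE TABLE {stale} "
        f"WITH (DISTRIBUTION = HASH({dist_col}), CLUSTERED COLUMNSTORE INDEX, "
        f"PARTITION ({part_col} RANGE RIGHT FOR VALUES ({values}))) "
        # WHERE 1 = 2 copies the structure but no rows
        f"AS SELECT * FROM {table} WHERE 1 = 2;"
    )
    switch = (
        f"ALTER TABLE {table} SWITCH PARTITION {partition_no} "
        f"TO {stale} PARTITION {partition_no};"
    )
    drop = f"DROP TABLE {stale};"
    return [ctas, switch, drop]


statements = purge_partition_statements(
    "dbo.FactSales", "ProductKey", "OrderDateKey",
    boundaries=[20190101, 20200101], partition_no=1)

for stmt in statements:
    print(stmt)
```

Switching is a metadata operation, so the old rows leave the main table without a long-running `DELETE`.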

PolyBase, what you basically need:

· Database scoped credential

· External data source (usually blob)

· External file format
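A minimal sketch of the setup order, written as generated T-SQL. The object names (`BlobCredential`, `BlobSource`, `CsvFormat`) and the helper `polybase_setup` are hypothetical, and the leading `CREATE MASTER KEY` is an addition to the list above: the database needs a master key before a scoped credential can be created. An external table that points at the data source and file format would normally follow.

```python
def polybase_setup(account, container, storage_key):
    """Return the T-SQL statements for a basic PolyBase setup over Blob storage."""
    location = f"wasbs://{container}@{account}.blob.core.windows.net"
    return [
        # Assumption: no master key exists yet in the database.
        "CREATE MASTER KEY;",
        # With a storage account key, IDENTITY can be any string.
        "CREATE DATABASE SCOPED CREDENTIAL BlobCredential "
        f"WITH IDENTITY = 'user', SECRET = '{storage_key}';",
        "CREATE EXTERNAL DATA SOURCE BlobSource "
        f"WITH (TYPE = HADOOP, LOCATION = '{location}', CREDENTIAL = BlobCredential);",
        "CREATE EXTERNAL FILE FORMAT CsvFormat "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
    ]


setup = polybase_setup("mystorageaccount", "rawdata", "<storage-account-key>")
for stmt in setup:
    print(stmt)
```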


Storage account services:

· Table → Entities made of properties; max 1 MB per entity

· Blob → Containers of blobs; Data Lake Storage Gen2 adds a hierarchical namespace and is built for Big Data and MPP

· Queue → Messaging

· File → Managed file shares (SMB)
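Each of these services is reached through its own endpoint on the storage account. The sketch below maps a service name to its public-cloud endpoint; the helper `endpoint` and the account name are made up for illustration, and `dfs` is the endpoint used by Data Lake Storage Gen2.

```python
# Public-cloud endpoint suffix for each storage service.
ENDPOINT_SUFFIX = {
    "blob": "blob.core.windows.net",
    "dfs": "dfs.core.windows.net",    # Data Lake Storage Gen2 (hierarchical namespace)
    "table": "table.core.windows.net",
    "queue": "queue.core.windows.net",
    "file": "file.core.windows.net",
}


def endpoint(account, service):
    """Return the HTTPS endpoint of one service in a storage account."""
    return f"https://{account}.{ENDPOINT_SUFFIX[service]}"


print(endpoint("mystorageaccount", "blob"))
print(endpoint("mystorageaccount", "dfs"))
```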

Cosmos APIs:

· SQL API → Container of documents

· MongoDB → Collection of documents

· Table → Table of key-value items

· Cassandra → Table of rows (wide-column)

· Gremlin → Graph of vertices and edges

· How to create a container for each API?
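The list above boils down to a naming table: every API has the same two building blocks, a container and the items in it, under different names. A small lookup sketch of that mapping (the dictionary and the helper `describe` are illustrative only, and it does not show how to create the containers):

```python
# API -> (what the container is called, what an item is called)
COSMOS_TERMS = {
    "SQL": ("container", "document"),
    "MongoDB": ("collection", "document"),
    "Table": ("table", "key-value item"),
    "Cassandra": ("table", "row"),
    "Gremlin": ("graph", "vertex or edge"),
}


def describe(api):
    """Return a one-line description of how an API names its resources."""
    container, item = COSMOS_TERMS[api]
    return f"{api} API: a {container} holds {item}s"


for api in COSMOS_TERMS:
    print(describe(api))
```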

Consistency Level Cosmos:

