A thorough and systematic breakdown of Snowflake’s core architecture that effectively demystifies the decoupling of storage and compute. It serves as a solid, no-nonsense foundation for anyone looking to master the mechanics of modern cloud data warehousing.
Deep Dive
SNOWFLAKE tutorials || Demo - 3 || by Mr. Shekhar On 21-05-2026 @7AM (IST)
So today, I am sharing my screen now. Are you able to see my screen?
>> Yes.
>> Yeah. So we have three layers in the Snowflake architecture. The first is the database storage layer, the second is the query processing layer, that is, the virtual compute warehouses, and the third is the cloud services layer. We will see what each layer means and what it does.
So the first one is the storage layer. In the storage layer, all the data lies in cloud object storage, completely separate from the compute machines. Snowflake automatically converts raw data into columnar micro-partitions. You never manage files. Any number of virtual warehouses can read the same data simultaneously with zero interference, because they read from the shared storage, not from each other.
What this means is that whenever we load data into Snowflake tables, the data is automatically written to the cloud provider's storage. We discussed that Snowflake is built on top of three cloud providers: AWS, Azure, and GCP. So whenever we load data into Snowflake tables, it is automatically stored in the cloud provider's storage in the form of multiple micro-partitions.
What is a micro-partition? A micro-partition is the storage container in Snowflake, the place where your data is stored. One micro-partition holds up to about 500 MB of data before compression (roughly 50 MB to 500 MB), and it is stored compressed. So the number of micro-partitions depends on how much data you store. If you load 2 GB of data, it takes at least four micro-partitions; if you load 4 GB, at least eight, and often more. What does a micro-partition look like? That we will discuss now.
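The sizing arithmetic above can be sketched in a few lines of Python. This is only a rough lower-bound estimate, assuming the 500 MB upper limit per micro-partition mentioned in the session and 1 GB = 1000 MB; the function name `min_micro_partitions` is made up for illustration. Real tables usually end up with more micro-partitions than this, because partitions are not always filled to the limit.

```python
import math

# Assumption for illustration: a micro-partition holds at most 500 MB of data.
MAX_PARTITION_MB = 500


def min_micro_partitions(data_mb: float, max_partition_mb: float = MAX_PARTITION_MB) -> int:
    """Return a lower bound on the number of micro-partitions for data_mb of data."""
    if data_mb <= 0:
        return 0
    return math.ceil(data_mb / max_partition_mb)


if __name__ == "__main__":
    for size_gb in (2, 4):
        count = min_micro_partitions(size_gb * 1000)
        print(f"{size_gb} GB needs at least {count} micro-partitions")
```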
See, when you load data into Snowflake, it automatically splits the data into small chunks called micro-partitions. Each micro-partition is stored as a compressed columnar file in cloud storage. Whenever we load data into a Snowflake target table, the data is stored in columnar format inside the micro-partitions. See here.
This is the table, and these are the micro-partitions. The data is stored by column, not by row. Why? A data warehouse holds very large volumes of data, and a table may have a lot of columns, maybe around 500. Suppose you want to do an aggregation on one particular column, say the revenue column. If the data is stored in the traditional row format, then while retrieving data or doing the aggregation the engine has to read every column of every row, even though it needs only one.
Instead of that, Snowflake stores the data in columnar format. Now if you want to sum the revenue column, it touches only that one column. Say the columns are type, name, country, and date: if you aggregate on any one of them, it reads only that column. Micro-partitions are automatically managed, so you do nothing. They are also immutable: once written, they are never modified. So what happens if you modify data in the table? Suppose you change the value five in a column to some other value. In Snowflake, the storage location holding the old value becomes inactive.
For the updated values, new space is allocated. The old values are not touched; they become inactive. After a certain period of time, that space is automatically freed. That is the meaning of immutable: if you modify a value, instead of overwriting it in the micro-partition, Snowflake writes the new value to a new place, and the old value becomes inactive until its storage is freed automatically.
Why does it keep the old values as inactive for a while? There is a reason, which we will discuss in later sessions because it is an advanced topic for now. So that is the meaning of immutable.
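The two properties discussed above, columnar layout and immutability, can be illustrated with a small in-memory sketch. The `MicroPartition` class and the `update_value` helper are hypothetical names invented for this example; they are not Snowflake's internal format or API. The sketch only shows the idea: an aggregation reads a single column, and an update produces a new partition while the old one is marked inactive instead of being overwritten.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MicroPartition:
    """Hypothetical stand-in for a columnar, immutable micro-partition."""
    columns: Dict[str, Tuple]  # column name -> values, one entry per row
    active: bool = True

    def column_sum(self, name: str):
        # An aggregation on one column reads only that column's values.
        return sum(self.columns[name])


def update_value(part: MicroPartition, column: str, row: int, new_value):
    """Return (old_inactive, new_active) rather than modifying part in place."""
    values = list(part.columns[column])
    values[row] = new_value
    new_columns = {**part.columns, column: tuple(values)}
    old = MicroPartition(part.columns, active=False)  # kept, but inactive
    new = MicroPartition(new_columns, active=True)    # written to a new place
    return old, new


if __name__ == "__main__":
    p1 = MicroPartition({"name": ("A", "B", "C"), "revenue": (5, 10, 20)})
    old, new = update_value(p1, "revenue", 0, 7)
    print(old.active, old.columns["revenue"])
    print(new.active, new.columns["revenue"], new.column_sum("revenue"))
```

Using tuples and a frozen dataclass makes the "never modified once written" rule visible in the code: the only way to change a value is to build a new partition.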
Okay. And one more concept we have is partition pruning. What does partition pruning mean in Snowflake?
Suppose you run a select from table t1 where name equals Y. There are two records with the name Y, and they sit in micro-partition 3 and micro-partition 4. To retrieve those two records, instead of scanning all four micro-partitions, Snowflake scans only two micro-partitions.
By doing this it saves execution time for your SQL. This is called partition pruning. Metadata about every micro-partition is stored in metadata tables. Once we execute the SQL, based on the condition, Snowflake uses that metadata to find out which micro-partitions contain the required records, then goes directly to those micro-partitions and brings the data for you.
It does not scan all the records; it scans only the required micro-partitions. So with the help of partition pruning you save execution time for your SQL. That is how Snowflake gets very good performance in all operations. Okay, so this is about the storage layer. Is it clear? Any doubts?
>> Yes, I have a doubt.
>> Yeah, yeah.
>> How will Snowflake decide that it should go to partition three or partition four?
>> Okay, yeah. I will explain. One second, I'm connecting my laptop charger.
>> Okay.
>> So whenever you load data into a table in the Snowflake environment, let's see how that data is stored. Suppose the columns are order ID, customer ID, order date, amount, and status. The data is stored in columnar format: order ID 1, 2, 3, 4; customer ID 102; amount 455; and so on.
So whenever you aggregate the amount, Snowflake reads only the amount column. It does not read the other columns, and that is why you save time. That is one part. Now for your question, how Snowflake selects only the required micro-partitions: metadata is stored per partition. For every micro-partition and every column, Snowflake automatically stores the minimum value, maximum value, null count, and distinct count. This metadata is the key to Snowflake performance.
See here. Suppose T1 is our table. When you load this data into the table, it is stored in these micro-partitions: rows 1 to 6 in micro-partition 1, rows 7 to 12 in micro-partition 2, rows 13 to 18 in micro-partition 3, and rows 19 to 24 in micro-partition 4. That is one piece of basic information stored in the metadata table. Next, consider the type column. For micro-partition 1, rows 1 to 6, the metadata records that the minimum value of type is two and the maximum value is four. It stores this minimum and maximum along with the row range 1 to 6.
Whenever you query the table, based on the information in the metadata tables, Snowflake looks directly into the particular micro-partitions and brings the data. Suppose we query where date equals March 15, 2024. In micro-partition 1 the minimum is January 1 and the maximum is January 31. In micro-partition 2 you have February 1 to February 28. In micro-partition 3 you have March 1 to March 31. In micro-partition 4 you have April 1 to April 30. You have queried for a date in March, and micro-partition 3 has a minimum of March 1 and a maximum of March 31. These values are written in the Snowflake metadata tables, so with their help Snowflake goes directly to micro-partition 3 and reads the data. Instead of scanning all the micro-partitions, it reads only the required one. That is called partition pruning. Is that clear?
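The date example above can be written as a minimal Python sketch of min/max pruning. The `PartitionMeta` class, the `prune` function, and the `METADATA` list are illustrative names, not Snowflake's metadata tables. The year 2024 is taken from the query date in the example, and the date ranges copy the ones described in the session.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class PartitionMeta:
    """Hypothetical per-partition metadata for one date column (min and max only)."""
    partition_id: int
    min_date: date
    max_date: date


def prune(partitions: List[PartitionMeta], target: date) -> List[int]:
    """Return the ids of partitions whose [min, max] range can contain target."""
    return [p.partition_id for p in partitions if p.min_date <= target <= p.max_date]


# Ranges as described in the session: one month per micro-partition.
METADATA = [
    PartitionMeta(1, date(2024, 1, 1), date(2024, 1, 31)),
    PartitionMeta(2, date(2024, 2, 1), date(2024, 2, 28)),
    PartitionMeta(3, date(2024, 3, 1), date(2024, 3, 31)),
    PartitionMeta(4, date(2024, 4, 1), date(2024, 4, 30)),
]

if __name__ == "__main__":
    # Only micro-partition 3 has a range that covers March 15.
    print(prune(METADATA, date(2024, 3, 15)))  # [3]
```

The check never opens the data itself. It compares the query value with the stored minimum and maximum, which is why the other three partitions can be skipped without being read.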
>> Suppose, in the logical structure, is the data sorted according to the date or the ID of the table?
>> No, by default it will not sort the data.
>> So suppose we are extracting data from a database into Snowflake. How will the data be stored in the logical structure?
>> See, whenever you load the data, it is stored in the form of micro-partitions. That is the storage component in Snowflake.
>> So there is no order? Like, this is the first element, this is the last element?
>> That is what I am saying. A clustering key is a separate concept. If you create a clustering key on a particular column, then Snowflake sorts the data by it, and that ordering is also reflected in the micro-partitions. But if you don't create any clustering key, the data is stored as it arrives: roughly the first 500 MB goes into micro-partition 1, the next 500 MB into micro-partition 2, the third 500 MB into micro-partition 3, and so on for micro-partition 4.
Clear?
>> Okay. Got it, clear.
>> Here, if you see, micro-partition 2 and micro-partition 3 overlap: the dates 11/2, 11/3, and 11/4 are also there in micro-partition 3. When we want the data for the date 11/2, it needs to scan all three, right? 1, 2, 3?
>> Obviously it will scan all three, because the same value is present in all three. There is no shortcut unless the data is sorted. In this image, the data is sorted on the name column, in the order A, B, C, D, X, Y, Z. Now if you query where name equals A, it scans only one micro-partition.
>> Okay.
>> Right, like that. So that is micro-partition pruning. All of this information is stored in the metadata tables in Snowflake. Okay?
>> Okay.
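The effect of sorted versus unsorted data on pruning, as discussed in the exchange above, can be shown with a small sketch. The sample names, the partition size of three rows, and the helper functions `build_ranges` and `partitions_to_scan` are assumptions made for illustration. The point is only that overlapping min/max ranges force more partitions to be scanned.

```python
from typing import List, Tuple


def build_ranges(values: List[str], rows_per_partition: int) -> List[Tuple[str, str]]:
    """Split values into fixed-size partitions and record (min, max) for each."""
    ranges = []
    for start in range(0, len(values), rows_per_partition):
        chunk = values[start:start + rows_per_partition]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def partitions_to_scan(ranges: List[Tuple[str, str]], target: str) -> int:
    """Count the partitions whose [min, max] range can contain target."""
    return sum(1 for low, high in ranges if low <= target <= high)


# Hypothetical name column, loaded in arrival order (no clustering key).
unsorted_names = ["C", "A", "Y", "B", "A", "Z", "X", "A", "D"]

if __name__ == "__main__":
    unsorted_ranges = build_ranges(unsorted_names, 3)
    sorted_ranges = build_ranges(sorted(unsorted_names), 3)
    print("unsorted:", unsorted_ranges, partitions_to_scan(unsorted_ranges, "A"))
    print("sorted:  ", sorted_ranges, partitions_to_scan(sorted_ranges, "A"))
```

With the unsorted load, every partition contains an "A", so all three ranges overlap the query value. After sorting, all the "A" rows sit in the first partition and the other two are pruned.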
>> So this is about the storage layer. Is this clear or not?
>> Okay, sir.
>> The next one is the query processing layer. The query processing layer is the virtual compute warehouse. As we discussed the other day, this is the compute layer: hardware resources are provided in the form of virtual compute warehouses, so there are no physical hardware components that you manage for compute in Snowflake. Each virtual warehouse is a fully independent cluster with its own CPUs, memory, and local cache. Warehouses share nothing with each other. If there are multiple teams, you can create multiple virtual warehouses and assign one to each team.
So this is not a single shared resource. What is the advantage of creating multiple virtual warehouses and assigning them to individual teams?
As we discussed last time, if multiple teams access the same hardware resources, it becomes a problem. Some teams may get late responses, and some may get connectivity issues, because all the teams are hitting the same server and the same shared hardware. To avoid that, Snowflake introduced the virtual warehouse concept. You can create multiple virtual warehouses and assign them to independent teams. By doing this you get very good performance, and you don't see these issues while working in the environment. So ETL runs on one warehouse, BI on a second, data science on a third, and none of them slows the others down. See here, you can create virtual warehouses like this. We will see this when we create virtual warehouses in later sessions. You have options here: the auto-suspend option and the auto-resume option. What is the meaning of the auto-suspend option?
Auto-suspend means that if you are not using the virtual warehouse continuously for a particular period of time, it automatically goes into suspended mode. Once it is suspended, it stops generating charges on your Snowflake account. Suppose you log in for work today at 9:00 a.m. and log out at 6:00 p.m., and out of that you actually used the warehouse for only 3 hours. Then you are charged for only those 3 hours, not for the 9 hours. If you leave your account unused for a continuous 5 or 10 minutes, depending on the setting, Snowflake tracks that internally, and after, say, 10 minutes the warehouse automatically goes into suspended mode.
Whenever it goes into suspended mode, it stops generating charges.
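The billing effect of auto-suspend described above can be approximated with a simplified model. This sketch assumes the warehouse keeps running for up to `auto_suspend` idle minutes after each burst of queries and then stops charging until the next query. The function `billable_minutes` and the sample workday are invented for illustration, and the model ignores details of Snowflake's actual billing granularity.

```python
from typing import List, Tuple


def billable_minutes(busy: List[Tuple[int, int]], auto_suspend: int) -> int:
    """Estimate the minutes a warehouse is running (and therefore charged).

    busy: sorted, non-overlapping (start, end) minutes during which queries run.
    auto_suspend: idle minutes after which the warehouse suspends itself.
    """
    total = 0
    for i, (start, end) in enumerate(busy):
        total += end - start  # time spent running queries
        if i + 1 < len(busy):
            gap = busy[i + 1][0] - end
            total += min(gap, auto_suspend)  # idle time before suspending
        else:
            total += auto_suspend  # idle tail before the final suspend
    return total


if __name__ == "__main__":
    # A 9-hour workday (540 minutes) with three one-hour bursts of queries.
    workday = [(0, 60), (120, 180), (400, 460)]
    print(billable_minutes(workday, auto_suspend=10), "of 540 minutes are charged")
```

In this model a shorter auto-suspend setting reduces the idle minutes that are still charged after each burst of work.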
In Snowflake, pricing is based on two things: utilization of the virtual warehouse, that is, your compute resources, and utilization of storage space.
While creating a virtual warehouse you can choose different sizes: X-Small, Small, Medium, Large, and so on. Based on the warehouse size, using the warehouse consumes a certain number of credits per hour, and you pay Snowflake based on those credits. It is like electricity in your house: your usage generates a number of units per month, and you pay for those units. In the same way, using your virtual warehouse consumes a number of credits per hour, and based on the number of credits you pay Snowflake. That is why, whenever you are not using the warehouse continuously for a particular period, it auto-suspends to save cost.
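The credit model described above can be sketched as a small calculation. The credits-per-hour figures below are the standard rates for the four sizes named in the session, which double with each size step. The price per credit is a placeholder assumption, since the real price depends on your Snowflake edition, region, and contract.

```python
# Credits consumed per running hour for the sizes mentioned in the session.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def credits_used(size: str, running_hours: float) -> float:
    """Credits consumed by a warehouse of the given size while it is running."""
    return CREDITS_PER_HOUR[size.upper()] * running_hours


def compute_cost(size: str, running_hours: float, price_per_credit: float) -> float:
    """Cost of compute; price_per_credit is a hypothetical, account-specific value."""
    return credits_used(size, running_hours) * price_per_credit


if __name__ == "__main__":
    # 3 running hours on a Medium warehouse at an assumed price of 2.0 per credit.
    print(credits_used("MEDIUM", 3), "credits")
    print(compute_cost("MEDIUM", 3, price_per_credit=2.0), "currency units")
```

As with the electricity analogy, the bill depends on units consumed: only the hours in which the warehouse is actually running count, and a larger size burns credits faster.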
Whenever it auto-suspends, the credits stop the moment it suspends. And when the credits stop, that saves money for your organization. What is the meaning of auto-resume? Your virtual warehouse was auto-suspended because you were not using it. Whenever you come back to your Snowflake account and submit a query, the warehouse automatically resumes, becomes active, and is ready to use. So that is the virtual compute warehouse. Is that clear or not clear?
>> Yes.
>> Clear? Any doubts?
>> No, no.
>> Yeah. The next one is the cloud services layer, which is an important layer. It does a lot of things. Without the cloud services layer, Snowflake will not work.
The very first point: whenever you submit a request, that is, any SQL statement, to Snowflake, the cloud services layer accepts your request first. It will parse it, optimize it, and check the metadata. It checks whether your SQL code is syntactically correct and whether the objects mentioned in it are available.
If everything parses correctly, the cloud services layer submits the request to the query processing layer. In the query processing layer you have cache memory. Cache memory is the space that holds recently executed SQL results. So when the cloud services layer passes the request to the query processing layer, it checks for the result in cache memory. If a suitable result is available there, the cache returns it to the query processing layer, the query processing layer returns it to the cloud services layer, and the cloud services layer returns the result to the end user. What happens if the result is not available in cache memory? Then the request passes to the database storage layer, where the actual data lives. The data is brought from the database storage layer, and the result goes back through the cloud services layer to the end user. That is the first thing.
The second thing: whenever a user tries to log in to a Snowflake account, the cloud services layer first verifies the user's credentials. Only if the user has provided the right credentials does the cloud services layer allow the login. Say I am Shekhar. If I try to log in to the Snowflake account, the moment I provide my credentials the cloud services layer internally cross-verifies them. Only if the credentials are correct does it allow the user to log in. That is the second thing.
The third thing: whenever a user loads data into Snowflake, the cloud services layer takes care of the end-to-end data loading process into the Snowflake target tables.
What will it do? Say I am Shekhar, I have initiated a data load into the Snowflake account, and I have millions of records to load into the table. If the data load request completes successfully without any interruption, the data is permanently committed to the database storage layer. But suppose that, for some XYZ reason, the load does not complete successfully: part of the data is loaded into the table and the remaining part is not. In that case the data is invalid, because you have loaded only partial data, and partial data is not meaningful. In that scenario the cloud services layer automatically rolls back whatever was partially loaded. If your session is closed in the middle of the load for some reason, or you get errors, it internally rolls back all the records that were already processed.
Only if the whole load completes and everything is loaded successfully does it commit all the records to the table.
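The all-or-nothing behavior described above can be sketched with a staging list that is committed only when every record has been processed. The `atomic_load` function, the `LoadError` exception, and the rule that a record needs an `amount` value are all hypothetical choices for this example. Snowflake's own transaction handling is far more involved.

```python
from typing import Iterable, List


class LoadError(Exception):
    """Raised when a record cannot be loaded (hypothetical failure condition)."""


def atomic_load(table: List[dict], records: Iterable[dict]) -> int:
    """Append all records to table, or none of them if any record fails."""
    staged = []
    for position, record in enumerate(records):
        if record.get("amount") is None:  # hypothetical validation rule
            # Nothing has touched the table yet, so discarding the staged
            # rows acts as the rollback.
            raise LoadError(f"record {position} is invalid; load rolled back")
        staged.append(record)
    table.extend(staged)  # commit: only reached when every record succeeded
    return len(staged)


if __name__ == "__main__":
    target_table: List[dict] = []
    atomic_load(target_table, [{"amount": 455}, {"amount": 120}])
    try:
        atomic_load(target_table, [{"amount": 300}, {"amount": None}])
    except LoadError as error:
        print("load failed:", error)
    print(len(target_table), "rows committed")
```

The failed second load leaves the table exactly as it was after the first one, which is the behavior the session describes as rolling back a partial load.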
The next point is that it also takes care of security. All the security-related tasks are handled by the cloud services layer. Say I am Shekhar and I have limited access in the Snowflake account; I don't have all the privileges. It will allow me to perform activities only according to my privileges. That responsibility is taken care of by the cloud services layer. If the user is a super user with all the admin privileges, the cloud services layer allows him to perform all activities according to those admin privileges. According to my privileges, it allows me to perform only particular activities.
That is why it handles all the security-related tasks, and why we can call the cloud services layer the topmost layer in the Snowflake architecture. So this is all about the architecture. Any doubts?
>> No. No.
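The request path walked through in this part of the session (verify credentials, check privileges, look for a cached result, fall back to storage) can be summarized in a toy model. Everything here is an assumption for illustration: the `CloudServicesSketch` class, its fields, the plain-text passwords, and the per-table privilege sets are not Snowflake's API and greatly simplify what the real service does.

```python
from typing import Dict, List, Set


class CloudServicesSketch:
    """Toy model of the request path described in the session."""

    def __init__(self, passwords: Dict[str, str],
                 privileges: Dict[str, Set[str]],
                 storage: Dict[str, List[dict]]):
        self.passwords = passwords      # user -> password (illustration only)
        self.privileges = privileges    # user -> tables the user may read
        self.storage = storage          # table -> rows (the "storage layer")
        self.result_cache: Dict[str, List[dict]] = {}
        self.storage_reads = 0

    def login(self, user: str, password: str) -> bool:
        return self.passwords.get(user) == password

    def read_table(self, user: str, password: str, table: str) -> List[dict]:
        if not self.login(user, password):
            raise PermissionError("authentication failed")
        if table not in self.privileges.get(user, set()):
            raise PermissionError(f"{user} has no privilege on {table}")
        if table in self.result_cache:   # recently executed: reuse the result
            return self.result_cache[table]
        self.storage_reads += 1          # cache miss: go to storage
        rows = list(self.storage[table])
        self.result_cache[table] = rows
        return rows


if __name__ == "__main__":
    service = CloudServicesSketch(
        passwords={"shekhar": "secret"},
        privileges={"shekhar": {"t1"}},
        storage={"t1": [{"name": "Y"}], "t2": [{"name": "Z"}]},
    )
    service.read_table("shekhar", "secret", "t1")
    service.read_table("shekhar", "secret", "t1")
    print("storage reads:", service.storage_reads)
```

Running the same read twice touches storage only once, and a wrong password or a table outside the user's privileges is rejected before any data is read.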
>> So this is completely theoretical. I am going to share all this material with you; go ahead and go through it. We have completed a couple of demos. I got a message from the institute people: they allow only three free demos. So from next time they may change the link. You need to enroll in the course, and only then will you get the new link. Okay?
>> Okay. Will you join the session tomorrow?
>> I may have a travel plan; I'm thinking of traveling. If possible I will join, or else we will connect on Saturday morning. Okay?
>> Okay, sir.
>> Is that fine?
>> Yeah, fine, fine.
>> You need to enroll with the institute so that you get the new link, and then join the session tomorrow.
>> Okay.
>> So if I don't join tomorrow, don't worry. We will connect on Saturday.
>> Thank you. Bye-bye. Have a nice day.
>> I have some doubt.
>> Yeah, yeah. Tell me.
>> Actually, I have been going through Snowflake, and I see it has so many roles. With this course, which roles can we target? Data engineer, analyst, or developer?
>> Data engineer is whatever we are discussing here. Data analytics means that, in addition, you need experience with a reporting tool.
>> So along with this course we need one more tool, a reporting tool, for data analytics?
>> Yes. With whatever we are going to discuss, you can apply for data engineering roles.
>> Okay. So the other requirements are Python and SQL?
>> Yeah, Python and SQL. You need to focus on Python and SQL to become a data engineer. For data analytics, along with this you additionally need to learn a reporting tool.
>> Okay.
>> Will the institute provide job assistance?
>> The institute will not provide job assistance. But if I come to know of any openings around the time the course completes, I will pass that information on.
>> Okay.
>> Fine?
>> Okay, sir.
>> Then we will connect tomorrow. If I don't join tomorrow, we will connect on Saturday. You join tomorrow and wait for 5 to 10 minutes. If I don't join, you can close the session, and we will connect again later, because I will be traveling and I'm not sure if I can join.
>> Okay. Thank you. Bye-bye. Have a nice day.
>> Bye.