Hello everyone, welcome to the Alibaba Cloud SA developer certification exam preparation online course. This is chapter three, data storage planning and replication. In this chapter we mainly talk about how you can plan to store your data across Alibaba Cloud's different storage products. Talking about application data storage, you may consider using object storage, a relational database, or even a NoSQL database. So I divide the whole topic into four sections: first I give you a quick overview of the different products we provide, then we talk about object storage, relational databases, and NoSQL databases separately.

If you look at this page, I divide our storage products into three major categories. The first one is object storage, that is OSS. The second one is relational databases, where we provide RDS and PolarDB. NoSQL databases are a big family, so I just pick some very classic ones like MongoDB, Redis, Cassandra, and HBase. I put DTS here because DTS, our Data Transmission Service, can be used to migrate either NoSQL or relational databases from other supported vendors to Alibaba Cloud.

First, let's look into object storage. Alibaba Cloud's object storage service is OSS, and you can consider it your unstructured data storage. The applicable scenarios for OSS mainly fall into two. Firstly, if you store a lot of image or media files in OSS, you can consider it not only central storage: OSS also has some natively embedded features to process images and media files right in the cloud, which means you do not need to download those files to local space and do the processing locally. You just define some predefined styles so that OSS can operate on the target pictures or media files automatically, and you only need to fetch the result once the processing is done.
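As a rough illustration of this "predefined style" idea: OSS processes an image when you append an `x-oss-process` query string to the object URL. The helper below is a hypothetical sketch, and the bucket and object names are made up; only the `x-oss-process=image/...` parameter convention follows OSS's public documentation.

```python
# Sketch: building an OSS image-processing request URL.
# OSS processes images server-side when an "x-oss-process"
# query string is appended to the object URL, so there is no
# local download/process round trip.
# Bucket and object names below are invented for illustration.

def image_process_url(base_url: str, actions: list[str]) -> str:
    """Join one or more processing actions into an OSS-style
    x-oss-process query parameter."""
    return base_url + "?x-oss-process=image/" + "/".join(actions)

url = image_process_url(
    "https://my-bucket.oss-cn-hangzhou.aliyuncs.com/photo.jpg",
    ["resize,w_200", "format,webp"],  # resize to 200px wide, then convert to WebP
)
print(url)
```

Fetching that URL would return the processed image directly, which is exactly the "process in the cloud, download only the result" workflow described above.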
Another scenario where you can consider OSS is putting your image, audio, and video files there for the flexibility to integrate with video services. On top of OSS we can do things like media processing automatically, and with the help of ECS you can set up your whole video processing and transmission lifecycle. Right after processing, these audio and video files can be shared through CDN to customers worldwide. CDN can also work directly with the OSS bucket as its origin, so you do not need an intermediate proxy to accelerate your content. These are just some of the major scenarios you could consider for OSS.

Overall, consider OSS your one-stop unstructured data storage: with OSS you can not only process data online but also get a very cheap and very flexible storage service. OSS supports object versioning, and you can define policies to replicate your data to a different region. All of this gives you strong durability and persistence for your data storage.

Now, let's look into one question for OSS. A developer is creating a Function Compute function. This function requires around one gigabyte of temporary storage for files while executing, and these files will not be needed after the function completes, beyond a fixed period. How can this developer most efficiently handle the temporary files? In other words, if some temporary files are generated during the function's execution, what is the most efficient way to handle them? The options: A, store the files on a cloud disk; B, copy the files to NAS (NFS); C, store them on a temporary ECS instance; D, copy the files to an OSS bucket and enable a lifecycle policy to delete the files. Honestly speaking, all of these options are workable.
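Option D relies on OSS lifecycle rules. As a minimal sketch of what such a rule expresses — modeled here as a plain dict plus a helper, not the real OSS SDK schema; the field names and prefix are illustrative assumptions:

```python
# Sketch of an OSS-style lifecycle rule: delete objects under a
# prefix once they are older than N days. Field names are
# illustrative, not the exact oss2 SDK schema.
lifecycle_rule = {
    "id": "expire-temp-files",
    "prefix": "tmp/",        # only objects under tmp/ are affected
    "status": "Enabled",
    "expiration_days": 7,    # delete 7 days after last modification
}

def should_expire(rule: dict, key: str, age_days: int) -> bool:
    """Decide whether an object would be deleted by this rule."""
    return (
        rule["status"] == "Enabled"
        and key.startswith(rule["prefix"])
        and age_days >= rule["expiration_days"]
    )

print(should_expire(lifecycle_rule, "tmp/build-123.log", 8))   # old enough, matches prefix
print(should_expire(lifecycle_rule, "images/logo.png", 30))    # wrong prefix, kept
```

The point is that once the rule is attached to the bucket, OSS evaluates it automatically, so the developer never has to write their own cleanup job.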
But since it is asking about the most efficient way, the one that saves the developer time and effort, we prefer option D: leveraging the OSS bucket feature called lifecycle policy, you can define a rule that automatically deletes those temporary files once the time condition is met. So the correct answer is D.

Okay, now let's look into the next section, relational databases. The first relational database I want to introduce is RDS, short for Relational Database Service on Alibaba Cloud. RDS is a stable, reliable, and flexible online database service built on our distributed file system and high-performance SSD storage. RDS supports MySQL, SQL Server, PostgreSQL, and MariaDB, and it provides a portfolio of solutions for disaster recovery, backup, restoration, monitoring, and migration.

Also in this chapter I want to show you that for RDS you can choose not only the general-purpose instance; if you really care about performance, you can choose a dedicated instance or even a dedicated host. The difference between the general-purpose instance and the dedicated instance lies in what gets shared. For the general-purpose instance, the CPU and storage are shared; only the memory and I/O channel are isolated. For the dedicated instance, besides memory and I/O, the CPU and storage space are fully reserved for that particular instance, which gives the customer better performance.

Now let's look at one simple question on RDS. When you use a SQL statement to access a table without an index configured, a full table scan will be performed. If the table contains a large amount of data, this kind of scan reads a huge amount of data and consumes a lot of database resources. Which of the following options are best practices when creating database indexes? This is a multiple-choice question.
Option A: add an index to a field that is frequently queried but does not frequently undergo add, delete, or modify operations. Option B: apply indexes to fields that contain many duplicate values. Option C: a table should not contain more than six indexed fields. Option D: apply indexes to fields of short or fixed length. Option E: adhere to the right-most prefix principle when using composite indexes. This question tests basic understanding of index concepts in a relational database. The answers are A, C, and D. Why are B and E not good? For B, if a field contains many duplicate values, we do not suggest indexing it. And for composite indexes, the principle we should follow is the left-most prefix principle, not the right-most. That explains why the answer is A, C, and D.

Another relational database choice for you is PolarDB. PolarDB is our next-generation relational database developed by Alibaba Cloud. PolarDB has three independent engines: it is fully compatible with MySQL and PostgreSQL, and it is also highly compatible with Oracle syntax. A PolarDB cluster supports a maximum storage space of 100 terabytes and can contain a maximum of 16 nodes, so it is very suitable for diverse enterprise-level database scenarios. PolarDB uses an architecture that decouples computing from storage: all computing nodes share the same physical storage. PolarDB allows you to upgrade or downgrade instance specifications within minutes and to perform failover within seconds. PolarDB can ensure global data consistency, and it provides data backup and disaster recovery free of charge. So using PolarDB you gain the benefits of both commercial databases and open-source cloud-native databases. Commercial databases are stable and scalable with high performance.
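A quick aside before we go deeper into PolarDB — coming back to the left-most prefix principle from the index question, here is a small sketch of which queries a composite index can serve. This is plain Python with no real database; the helper and the column names are made up purely to illustrate the rule.

```python
# Sketch of the left-most prefix rule for a composite index.
# An index on (a, b, c) helps a query only for the leading run
# of index columns that the query actually filters on.

def usable_prefix_len(index_cols: tuple, query_cols: set) -> int:
    """How many leading index columns the query can use."""
    n = 0
    for col in index_cols:
        if col in query_cols:
            n += 1
        else:
            break  # the prefix is broken; later columns cannot help
    return n

idx = ("user_id", "order_date", "status")
print(usable_prefix_len(idx, {"user_id", "order_date"}))  # uses 2 leading columns
print(usable_prefix_len(idx, {"order_date", "status"}))   # 0: no left-most match
```

So a query filtering on `user_id` and `order_date` benefits from the index, while one filtering only on `order_date` and `status` falls back to a full scan — exactly why option E (right-most prefix) is wrong.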
Open-source cloud databases are easy to use and feature rapid iteration. Talking about compute-storage decoupling: the shared storage is what decouples computing from storage, and this makes auto scaling for your business requirements very easy. Because all computing nodes share the same underlying storage through a distributed file system, this significantly reduces storage costs. A PolarDB cluster has one primary node together with multiple read-only nodes. With the proxy layer, requests from the application are forwarded to the database nodes, and you can use the proxy to implement some complex features like authentication, data protection, and read/write splitting. The proxy parses each SQL statement, sends write requests to the back-end primary node, and evenly distributes read requests across the read-only nodes. The proxy also allows the application to access PolarDB through a single endpoint, the same way it accessed the database before. So all in all, PolarDB is our cloud-native database solution on Alibaba Cloud; compared with RDS, it has some very special benefits.

Let's look into one simple question for PolarDB: which of the following statements about PolarDB's endpoints is not correct? An endpoint is what a client uses to access the PolarDB service. Option A: the endpoints of a PolarDB MySQL database include the cluster endpoint and the primary endpoint. Option B: through the cluster endpoint, an application can connect to the data nodes. Option C: the primary endpoint supports read/write splitting. Option D: the cluster endpoint supports read/write splitting. It looks like we need to choose between options C and D. The cluster endpoint is definitely the central place you access from the client side, so it definitely supports read/write splitting, which leaves option C.
The primary endpoint statement is the incorrect one, because the primary endpoint gives you direct access to the primary node and does not support read/write splitting. So the answer is C.

I am using this table to give you an even more detailed comparison between PolarDB and RDS, highlighting some features PolarDB has but RDS does not. Firstly, the maximum instance storage size of PolarDB is much bigger. Regarding read-only nodes, PolarDB can reach up to 15, while RDS typically supports only around 3 to 5. Because PolarDB uses shared storage, it can really bring the RPO, the recovery point objective, down to 0, whereas RDS has some latency syncing data from the master to the replica. Also, because PolarDB leverages shared distributed storage, it can use file-system-level snapshots for backup, which is much faster; RDS has to rely on logical copies for data backup. So all in all, PolarDB is our recommended enterprise-level solution for relational database service on Alibaba Cloud.

Now let's look at some NoSQL database choices one by one. The first choice you can make is Redis. I have listed several scenarios which are suitable for Redis on Alibaba Cloud, including gaming, finance, e-commerce, online video, transportation, and social media, and I am going to introduce some of them. Firstly, ApsaraDB for Redis can serve as an important architectural component in the gaming industry. Using Redis as the storage service, a gaming application can be deployed in a simple architecture: the main programs run on an ECS instance, the business data is stored in ApsaraDB for Redis, and Redis can be used for persistent storage. It uses a master-replica deployment model to implement redundancy. You can also use ApsaraDB for Redis as a caching service to accelerate access to the back-end database, so the authoritative data is only stored in the back-end database,
which is usually an RDS instance. Also, the high availability of Redis is essential to your business: if your Redis service becomes unavailable, the RDS instance may be overwhelmed by the requests sent from your application. So ApsaraDB for Redis adopts the master-replica architecture to ensure high availability. The master is responsible for handling requests; when the master fails, the replica takes over the workload, and the failover is completely transparent to the end user.

The second scenario: Redis is widely used in the e-commerce industry for business such as product display and recommendation. For an online shopping system, it is very important to cope with being overwhelmed by traffic during large promotion events. Most databases are incapable of handling that heavy load; to resolve this issue, you can choose ApsaraDB for Redis for persistent storage. Also consider an inventory management system that supports stock deduction: Redis can be used to count the inventory, while RDS stores the information about the items. ApsaraDB for Redis instances are deployed on physical servers that use SSD disks, which hugely improves access speed.

The third scenario I want to talk about is live streaming. Online video live streaming relies strongly on Redis, which is used to store user data and chat history. Redis can be deployed in the master-replica architecture to significantly improve service availability, and cluster instances can eliminate performance bottlenecks and handle the traffic spikes during live streaming to meet the highest performance requirements.

Let's take a look at one simple question regarding Redis. A developer is asked to implement a caching layer in front of RDS. The cached content is expensive to regenerate in case of service failure. Which implementation below would work while maintaining maximum uptime?
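Before looking at the options, here is what the cache-in-front-of-RDS setup in this question — the classic cache-aside pattern — looks like as a minimal sketch. Plain dicts stand in for ApsaraDB for Redis and RDS, and all keys and names are invented for illustration.

```python
# Cache-aside sketch: read from the cache first; on a miss,
# load from the database and populate the cache for next time.
# Plain dicts stand in for ApsaraDB for Redis and RDS here.

db = {"product:1": {"name": "laptop", "stock": 5}}  # the "RDS" backing store
cache = {}                                          # the "Redis" cache layer
stats = {"hits": 0, "misses": 0}

def get_product(key: str) -> dict:
    if key in cache:
        stats["hits"] += 1
        return cache[key]            # fast path: served from cache
    stats["misses"] += 1
    value = db[key]                  # expensive query against the backing store
    cache[key] = value               # populate the cache for next time
    return value

get_product("product:1")             # first read: cache miss, goes to the DB
get_product("product:1")             # second read: served from the cache
print(stats)
```

If the cache layer goes down, every read becomes a miss and hammers the database — which is why the question cares so much about the availability of the caching layer itself.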
Looking through the options: A, implement ApsaraDB for Redis in cluster mode; B, run a self-managed Redis on an ECS instance; C, implement ApsaraDB for Redis in standard mode; D, migrate the RDS database to HBase to serve as the cache layer. Based on what is stated in the question, the data is very expensive to regenerate, so we want to avoid service failure and make sure we have maximum uptime. It is all about service availability, so I think cluster mode is the one that solves this problem. The other options may work, but not in the most efficient way. So the correct answer is A.

The second NoSQL database I want to recommend to our developers is MongoDB. We all know MongoDB is a so-called document database, because what is stored in MongoDB is organized in a JSON-like style, so people can update and change the structure of their data easily. MongoDB is applicable to several different scenarios. For a gaming application, you can use ApsaraDB for MongoDB as the game server's database to store user information; not only user information but also users' gaming equipment and credits can be stored directly in the document, to facilitate future queries and updates. Regarding logistics applications, you can use MongoDB to store order information: the order status is constantly updated during the shipping process and is stored in the form of an embedded array in MongoDB. You can read all the changes to an order with a single query, which is quick and clear. Another scenario is social networking applications: MongoDB can store user information and people's chat history, and you can use a geographical location index to search for nearby people and places.
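The logistics scenario mentioned above — keeping an order's status history as an embedded array so one query returns all the changes — can be sketched like this. Plain dicts stand in for real MongoDB documents, and the field names are made up.

```python
# Sketch: an order document with its status history embedded,
# MongoDB document-style. Appending to the embedded array is one
# update, and reading the whole history is one query, no joins.
# Plain dicts stand in for real MongoDB documents here.

order = {
    "_id": "order-1001",
    "item": "keyboard",
    "status_history": [
        {"status": "created", "at": "2024-01-01"},
    ],
}

def update_status(doc: dict, status: str, at: str) -> None:
    """Append a new entry to the embedded status array."""
    doc["status_history"].append({"status": status, "at": at})

update_status(order, "shipped",   "2024-01-02")
update_status(order, "delivered", "2024-01-04")

# One read returns the full shipping history of the order.
print([e["status"] for e in order["status_history"]])
```

Contrast this with a relational design, where the same read would typically join an orders table against a status-events table.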
Additionally, MongoDB is suitable for storing chat history; it gives you rich query capability and is very fast not only for reading but also for writing. For live video streaming, MongoDB can be used to store user information and also gift information. It is also suitable for big data, because when using MongoDB you are actually storing document-level information, and that information can be extracted and analyzed at any time by MaxCompute or some other big data engine.

Let's look into a simple question on MongoDB: which of these are suitable use cases for the Alibaba Cloud MongoDB service? Just as I introduced on the previous slides: option A, applications that require high QPS, such as logging applications; option B, scenarios where complex transaction support is required; option C, applications where JSON is a good choice for storing data; option D, applications that require geographical location queries, such as logistics order status. Exactly as I mentioned before, option B is not a proper one, because I don't think NoSQL databases have very good support for transaction-heavy jobs. The others, I believe, are very suitable for MongoDB, so since it is a multiple-choice question the correct answer is A, C, and D.

The third NoSQL database I want to introduce to you is HBase. Alibaba Cloud ApsaraDB for HBase is an end-to-end NoSQL service which is 100% compatible with open-source HBase, deeply optimized and extended; it supports gigabyte up to petabyte levels of data. From the picture you can see HBase is heavily used inside Alibaba Group itself: it supports many core services of the group, such as Taobao recommendation, risk control, advertising data dashboards, and Cainiao logistics. It provides enterprise capabilities such as low latency, low cost, hosted security, open-source standards, and global distribution.
HBase comprehensively provides capabilities such as real-time storage, high-concurrency throughput, SQL analysis, full-text index, time series queries, and even spatio-temporal queries for massive semi-structured or structured data. Combined with comprehensive tools and services and rich ecosystem integration, ApsaraDB for HBase provides an efficient solution to meet storage, retrieval, and analysis needs, so I would say ApsaraDB for HBase is a recommended database.

Let's take a quick look at one simple question on HBase, and I think this one is very easy to answer. To meet data analytics requirements on large datasets, which of these databases can meet the demand for structured and semi-structured storage of almost all types of data, albeit with limited support for strong transactions? The options: HBase, OceanBase, RDS MariaDB, Time Series Database. Since we are talking about large datasets supporting structured and semi-structured data at the same time, I believe HBase is the most suitable one.

Okay, now let's look into the last NoSQL database recommendation, Cassandra. Alibaba Cloud ApsaraDB for Cassandra is a distributed NoSQL database. It is based on open-source Apache Cassandra and integrated with Alibaba Cloud's database services. ApsaraDB for Cassandra features the following outstanding advantages: it has a distributed, decentralized, multi-active architecture. Cassandra was actually developed for Internet business and has powered services for Internet companies for many years; it is the most popular wide-table database. Cassandra supports access requests that require high concurrency and low latency, so it is applicable to scenarios such as logs, messages, feeds and streams, orders, billing, websites, and other online Internet scenarios that need to process a large amount of data. The slide lists several of these application scenarios in detail.
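To make the "wide table" idea concrete: each partition key groups many rows ordered by a clustering key, so reading one user's recent feed is a single contiguous scan within one partition. This is a rough sketch with plain Python structures, not a real Cassandra driver; the table layout and names are assumptions for illustration.

```python
# Wide-table sketch, Cassandra-style: a partition key maps to many
# clustered rows, so reading one user's recent events is a single
# partition scan. Plain dicts stand in for real Cassandra tables.

from collections import defaultdict

# partition key (user) -> list of (clustering key = timestamp, value)
feed = defaultdict(list)

def append_event(user_id: str, ts: int, event: str) -> None:
    feed[user_id].append((ts, event))

def latest(user_id: str, n: int) -> list[str]:
    """Newest-first read within one partition."""
    rows = sorted(feed[user_id], reverse=True)
    return [event for _, event in rows[:n]]

append_event("u1", 100, "posted photo")
append_event("u1", 105, "liked a post")
append_event("u2", 101, "joined group")

print(latest("u1", 2))
```

This partition-plus-clustering layout is what makes feeds, logs, and message streams natural fits for a wide-table database like Cassandra.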
One question you may get on Cassandra: which of the following is not a feature of Alibaba Cloud's Cassandra database service? Option A: a center-less, distributed architecture. Option B: support for both narrow and wide tables, providing flexible schema functions. Option C: clustered storage with strong scaling of read and write capabilities. Option D: enterprise-level capabilities such as disaster tolerance, backup and recovery, and security. I believe the keywords here are "narrow and wide tables": Cassandra focuses on wide-table support, so I think option B is not correct.

Now we have seen the different choices for object storage, relational database storage, and NoSQL databases. But here comes the question: I have so many choices on the cloud, but how can I migrate my existing data to the cloud? For that I want to introduce this service: DTS, the Data Transmission Service. DTS supports the most widely used commercial and open-source databases, including relational and NoSQL databases. You can find a lot of famous brands among the supported source databases; DTS can handle most of them and migrate them to the proper targets on the cloud, whether RDS, PolarDB, MongoDB, or Redis, which I introduced before. DTS can not only do this, it also supports a functionality called subscription. Subscription means that whatever changes on the source database side can be captured and made visible to other data processing engines, for example big data processing services. Talking about migration itself, we support migration of the metadata, also called the schema; full data migration and incremental data migration are all supported too. We even support migrating an Oracle database to either RDS or PolarDB, and we definitely recommend you consider migrating Oracle to a PolarDB-O instance.

So, which of the following statements about DTS is correct? Option A: DTS supports migration between self-built Oracle databases.
Option B: DTS supports full data migration and incremental migration but does not support structure (schema) migration — this statement is false, since schema migration is supported. Option C: DTS does not support migrating Oracle to RDS — actually, even though it is not the recommended path, we do support it, so this is false too. Option D: DTS does not support migrating Oracle to PolarDB — this is definitely false as well. So the answer is A: we can use DTS to migrate between self-built Oracle databases.

Okay, now let me use this page to close this chapter. In this chapter, I introduced the data storage choices a developer can pick to store the different types of data of your application. From the storage point of view, you can consider OSS for so-called unstructured data. For relational databases, we recommend RDS and PolarDB. For NoSQL databases, I talked about Redis, MongoDB, Cassandra, and HBase as the choices. At the bottom of the picture, different applications and services feed data into these different storage types. Once you have data stored here and there, you can begin to consider using our big data or other data processing engines to process and manage the data, and finally present it in different ways, for example with Quick BI. With this big picture, I just wanted to give you a quick overview of the storage choices for your application. Hopefully you enjoyed the content, and we look forward to seeing you in the next chapter.