March 3, 2026

Continuous Operational Improvements

This is a guide on continuous operational improvements.


Design Concepts

When we talk about continuous operational improvement, we mean ongoing improvement of your products, your services, and your processes. This improvement does not happen in one go; it is a series of ongoing, incremental breakthroughs and improvements. In other words, when you build something, you have to keep improving it. Let's say you have built a piece of software. You do not stop there and say, "Okay, I have built the software, that's enough."

Instead, you continue to improve that software: you add more features, fix bugs, and make new releases. Every time you release, you have followed the continuous operational improvement process, because you are continuously improving the product. What we are trying to achieve is operational excellence, which means bringing out the best in your product, your service, or any other offering you have. It could also be a process that you run within your organization; for instance, how do you continue to improve your operations?

The only way to understand and measure whether your process is good enough is to look at the results: the outcome of a process determines its operational excellence. As the name suggests, continuous operational improvement is incremental, which means you keep improving bit by bit. The improvements can be small or large, but they need to happen on a continuous basis. That incremental improvement can follow a process you have defined yourself, or it can use specific frameworks such as Six Sigma, Lean, Total Quality Management, or even ISO 27001.

So you have to keep improving your products, services, and processes. This is where best practices come in. The models I just described (Six Sigma, Lean, Total Quality Management, ISO 27001, and of course many more) are designed around best practices from the industry, and most of them can be used across various industries, be it automotive, IT, or manufacturing. By adopting one of these frameworks, you are automatically following its best practices.

Continuous improvement is also continuous learning. Remember, learning never stops; the moment you stop learning, you stop improving, because learning is how you find out whether your product has any scope for improvement. For example, say you bought a car five years ago. At that time, it was the best car in the market. Five years later, you will find another car with more and better features, and its overall look, feel, and design are much better than your existing car's. What has the car company done? Based on their studies, user feedback, and continuous learning, they have improved their existing car into something much better.

That improvement comes from continuous learning, which can happen through user feedback, market studies, or competitor analysis, depending on how you want to learn and what you are trying to achieve. Either way, continuous learning has to be integrated into your continuous operational improvement process. Then there is continuous breakthrough performance, which has to come from something you build or improve; otherwise, that breakthrough performance will never happen. You can outperform yourself or your competitors only if you improve.

Now let's look at the need for continuous operational improvement. In a cloud environment, one of the biggest factors is that the environment is always changing. Consider the comparison: in your on-premises data center, things may not change for, say, five years, because you are not changing your infrastructure or physical devices, and you are not upgrading the operating systems or applications. Your on-premises infrastructure may be static. That does not happen in the cloud, where the cloud service provider is continuously improving.

Why are they improving? Because they have to stay ahead of the competition and offer the best in the business. If they do not, somebody else will overtake them and wipe out their business, and they cannot afford to let that happen. So they continuously upgrade their applications, physical infrastructure, servers, and operating systems. For example, if you go to AWS today, you will find the latest operating systems and the latest application versions already available; something released last week is probably also available in the cloud. Integrating the same thing into your on-premises infrastructure would probably take a couple of weeks or months, because you have to get approval, place an order, and then get the application up and running in the data center.

In the cloud, that is not the case. The cloud environment is always changing, and the cloud service provider has to cater to many different cloud consumers, from small customers to large ones, each with unique demands and requirements. If a cloud service provider cannot meet most of these requirements, it may lose customers or even go out of business. To meet these demands, providers need to keep improving their infrastructure and keep offering new technologies, operating systems, and applications. It also means there will never be one best way to run operations.

You cannot say that a particular method is the best and only way to do a task. Whatever you are doing, you will find somebody else doing the same thing in a much shorter time or in a much smarter way. So can you say your method was the best? No, because somebody else has done it better. Can you say that person's method is the best? No, because a third person might do it in even less time. There can never be a single best way to run operations; you can always keep improving.

You can keep finding smarter ways to shorten the time it takes to run operations, for example by automating them. Automation tools themselves keep maturing, so something you could not do with them yesterday, you can do today. Everything is improving, and it will continue to improve as long as people and organizations want to stay ahead of the competition and achieve excellence in their operations. If you want continuous operational improvement, you also need to empower your team. Your team needs to realize that there is always a better way to do something, whether in the cloud or in an on-premises data center.

Team members need to think from an improvement perspective and understand what improvement achieves. One of the biggest reasons you need continuous operational improvement is to survive in the market. If you do not keep up with market trends and with the technologies your competitors use, you will not survive, and soon you will run out of business. If you want to stay ahead and be one of the leaders in the market, you need to build a continuous operational improvement cycle within your own team and your infrastructure.

Continuous improvement is typically built around the PDCA model: plan, do, check, and act. That model needs to be built into your processes, into your team's mindset, and into your complete set of operations. Let's now talk about CloudOps and continuous operations. You may already know DevOps, a set of practices that applies Agile methodologies to develop applications quickly. CloudOps, which is our focus here, relies on continuous operations. So, what is CloudOps?

That is the first question. CloudOps is about running cloud operations of any kind, be it software development, monitoring, or managing virtual machines; all of it comes under CloudOps. CloudOps cannot exist without these operations, so they need to be built into it. Say you have a service that cannot afford any downtime: CloudOps ensures, through continuous operations, that the service keeps running. The objective is that anything placed in production runs without interruption. In other words, continuous operations means putting mechanisms in place so that any application, service, or other operation in the cloud runs with zero downtime.
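The PDCA cycle mentioned earlier can be sketched as a loop. This is a minimal Python illustration, not a prescribed implementation: the metric (a hypothetical average deployment time in minutes), the candidate changes, and the `simulated_measure` function are made-up stand-ins for a real measurement. Each pass plans a change, tries it, checks the outcome, and acts by keeping the change only if the outcome improved.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    """A candidate improvement (hypothetical example data)."""
    name: str
    effect_minutes: float  # how the change shifts the average deployment time


def pdca(baseline: float, candidates: List[Change],
         measure: Callable[[float, Change], float]) -> Tuple[float, List[str]]:
    """Run one plan-do-check-act pass per candidate change.

    Lower metric values are better. Returns the final metric and the kept changes.
    """
    current = baseline
    kept: List[str] = []
    for change in candidates:             # Plan: pick the next candidate change
        trial = measure(current, change)  # Do: try it out on a small scale
        if trial < current:               # Check: did the outcome improve?
            current = trial               # Act: adopt the change as the new standard
            kept.append(change.name)
        # Act (otherwise): drop the change and keep the current standard
    return current, kept


def simulated_measure(current: float, change: Change) -> float:
    """Stand-in for a real measurement: simply applies the change's effect."""
    return current + change.effect_minutes


if __name__ == "__main__":
    changes = [Change("automate build", -10.0),
               Change("extra manual approval", 5.0),
               Change("cache dependencies", -3.0)]
    final, kept = pdca(45.0, changes, simulated_measure)
    print(final, kept)  # 32.0 ['automate build', 'cache dependencies']
```

Changes that make the outcome worse are dropped, so the standard only moves in the improving direction, which mirrors the bit-by-bit improvement described in this section.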

Let's now talk about the logical design of the cloud infrastructure. Every cloud infrastructure has two aspects: the logical design and the physical design. Let's start with the logical design. Everything that runs in the cloud is part of the logical design: your virtual machines, software-based firewalls, how virtual machines communicate with each other, how data flows from one end to the other, and how the public interfaces of virtual machines or routers are configured. Basically, the logical design is how operations will run within your cloud.

Start with multitenancy. Some companies and organizations are very concerned about the security of their data, and to safeguard it they spend a lot of money on a single-tenant model with a dedicated server. The majority of the cloud, however, runs on a multitenancy model, in which many virtual machines are consolidated on a single physical server. That server is typically replicated somewhere else as well. Within a single server, resources are shared: tenants share the hard drives and the applications. That is the multitenancy model.

Of course, when your data sits on the same drive as many other customers' data, problems can arise; the multitenancy model carries risks. Even so, it is the model that runs most cloud operations, cloud infrastructure, and cloud computing, because the majority of users use it. Next comes the cloud management plane, which handles monitoring and administration of the cloud network platform. Then there are automation, monitoring, and auditing. Most cloud infrastructure needs to run in an automated mode, and it also needs to be monitored and audited.

For instance, auditing tells you who has access to your data and to your applications, and monitoring needs to run 24x7. In a nutshell, everything in the cloud should be automated to cut down effort and errors, and it should be continuously monitored and audited. Next are the service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). Logical design also covers data separation, which can be incorporated at different levels.
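Logical data separation in a multitenant model, together with the auditing described above, can be sketched as a shared store in which every read and write is scoped to a tenant ID and every access attempt is logged. This is a minimal Python illustration under stated assumptions: the class name and tenant IDs are hypothetical, and real clouds enforce isolation with identity, encryption, and network controls rather than a single in-memory object.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

AuditEntry = Tuple[str, str, str, str, str, bool]  # time, caller, owner, key, action, allowed


class TenantScopedStore:
    """One shared store in which every record is partitioned by tenant ID."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}  # (tenant, key) -> value
        self.audit_log: List[AuditEntry] = []

    def _audit(self, caller: str, owner: str, key: str, action: str, allowed: bool) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, caller, owner, key, action, allowed))

    def put(self, tenant: str, key: str, value: str) -> None:
        """A tenant can only write into its own partition."""
        self._data[(tenant, key)] = value
        self._audit(tenant, tenant, key, "write", True)

    def get(self, caller: str, owner: str, key: str) -> str:
        """Reads are allowed only when the caller owns the data; every attempt is audited."""
        allowed = caller == owner
        self._audit(caller, owner, key, "read", allowed)
        if not allowed:
            raise PermissionError(f"{caller} may not read data owned by {owner}")
        return self._data[(owner, key)]


if __name__ == "__main__":
    store = TenantScopedStore()
    store.put("tenant-a", "orders", "42 open orders")
    print(store.get("tenant-a", "tenant-a", "orders"))  # 42 open orders
    try:
        store.get("tenant-b", "tenant-a", "orders")
    except PermissionError as err:
        print("denied:", err)
    for entry in store.audit_log:
        print(entry[1:])  # who tried what, and whether it was allowed
```

The denied read still lands in the audit log, which is what lets you answer "who tried to access this data?" later.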

Data can be segregated at several levels: compute nodes, the management plane, storage nodes, the control plane, and the network. Virtualization technology helps you overcome the bottlenecks of partitioning and storing data, and it allows you to run multiple virtual machines on a single host. When you use virtualization, choose a hypervisor that meets your system requirements; the hypervisor should be selected carefully. For the three service models (IaaS, PaaS, and SaaS), you also have to understand that a shared responsibility model applies to each of them.
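The shared responsibility split for the three service models can be written down as a small lookup table. In this sketch the layer names are a common textbook breakdown of the stack (an assumption, not taken from any provider's documentation), and the consumer-managed layers follow the split this section describes: PaaS consumers manage the application and data, IaaS consumers manage everything from the operating system up, and SaaS consumers manage only some data and configuration.

```python
from typing import Dict, List

# Stack layers from the top (closest to the user) to the bottom (physical).
LAYERS: List[str] = [
    "data", "application", "runtime/middleware", "operating system",
    "virtualization", "servers", "storage", "networking",
]

# Layers the cloud consumer manages in each service model.
CONSUMER_MANAGED: Dict[str, List[str]] = {
    "IaaS": ["data", "application", "runtime/middleware", "operating system"],
    "PaaS": ["data", "application"],
    "SaaS": ["data"],  # plus some application configuration
}


def responsibility(model: str) -> Dict[str, str]:
    """Map each layer to 'consumer' or 'provider' for the given service model."""
    consumer = set(CONSUMER_MANAGED[model])
    return {layer: "consumer" if layer in consumer else "provider" for layer in LAYERS}


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        owned = [layer for layer, who in responsibility(model).items() if who == "consumer"]
        print(f"{model}: consumer manages {', '.join(owned)}")
```

Everything not listed for the consumer falls to the cloud service provider, so moving from IaaS to SaaS shifts more of the stack to the provider.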

With platform as a service, the cloud consumer is responsible only for the application and the data. With infrastructure as a service, the cloud consumer is responsible for everything from the operating system up to the application and data. With software as a service, the cloud consumer has almost no role beyond maintaining some of the data and some software configuration; the entire stack is controlled by the cloud service provider. Let's now look at physical considerations. The first thing to understand is that no two data centers are alike.

Nor should they be, because it is the organization's business that drives the requirements for IT and the data center. You do not set up a data center a certain way just because you want to; you set it up to meet your business requirements. For instance, you deploy applications in the data center because your business requires them. You set up five servers because the business needs five servers; you would not set up 20 servers on an ad hoc basis, because 15 of them would sit idle. Whether it is your on-premises data center or your cloud environment, both are designed to serve a specific business need.

Without that need, you would not build something that serves no purpose. Let's look at some of the physical considerations. The first is cable management, which is specific to the on-premises data center; in the cloud, cable management is the cloud service provider's responsibility. Cable management is a critical component of physical design because you do not want anything obstructing the airflow. If you lay out cables randomly, they will obstruct the airflow, and your data center will have heating problems.

Therefore, lay out your cables so that they do not cross the airflow paths, and that includes rack-level IT equipment. Focus on the racks as well: servers are connected there, and there will be cables at the back of each rack. Make sure those cables are properly laid out so that they do not obstruct the airflow within the rack. Next come HVAC considerations. HVAC stands for heating, ventilation, and air conditioning, and you want these systems to respond to the air temperature; for example, you might switch on the air conditioning if the data center is getting too hot.

In most cases, though, air conditioning should run continuously at the optimal temperature so that equipment does not overheat. Next is air management, another critical factor in the physical design of a data center, which typically uses a hot aisle/cold aisle configuration. In this configuration, the equipment is laid out in rows of racks with alternating cold and hot aisles between them: the fronts of the racks face a cold aisle and their backs face a hot aisle. The design minimizes or eliminates mixing between hot and cold air, so that cool air is supplied to the equipment and hot air is rejected from it.
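The alternating layout can be sketched as a small planning function. This is a simplified model under stated assumptions (every rack takes air in at the front and exhausts at the back, rows run parallel, and containment and cooling-unit placement are ignored); the function name is hypothetical. It shows why every other row is turned around: that is what makes neighboring fronts share a cold aisle and neighboring backs share a hot aisle.

```python
from typing import List, Tuple


def aisle_layout(rows: int) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Plan `rows` rack rows with alternating cold and hot aisles between them.

    Returns (aisles, facing): aisles[k] is "cold" or "hot", and facing[i] is the
    (front_aisle, back_aisle) index pair for rack row i. Row i sits between
    aisle i and aisle i + 1.
    """
    if rows < 1:
        raise ValueError("need at least one row of racks")
    aisles = ["cold" if k % 2 == 0 else "hot" for k in range(rows + 1)]
    facing: List[Tuple[int, int]] = []
    for i in range(rows):
        # Equipment draws air in at the front and exhausts it at the back, so
        # every other row is turned around: fronts meet in cold aisles,
        # backs meet in hot aisles.
        front, back = (i, i + 1) if i % 2 == 0 else (i + 1, i)
        facing.append((front, back))
    return aisles, facing


if __name__ == "__main__":
    aisles, facing = aisle_layout(4)
    print(aisles)  # ['cold', 'hot', 'cold', 'hot', 'cold']
    for row, (front, back) in enumerate(facing, start=1):
        print(f"row {row}: intake from {aisles[front]} aisle {front}, "
              f"exhaust into {aisles[back]} aisle {back}")
```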

The hot air is then removed from the data center using exhaust fans or whatever other method you choose. That is how it has to work: cool air stays within the data center, while hot air is rejected and moved out. Related to this is aisle separation and containment, which also works with hot aisle and cold aisle equipment racks, laying out rows of racks so that cold and hot aisles alternate. Let's now look at some physical security considerations. First, there need to be access control points at the building gate.

There also need to be access control points at the entrance to the data center and to any room within it; every such entry point needs to be protected. Your server racks should always be under lock and key. The building should be surrounded by fences that are high enough to stop somebody from climbing over. You also need security guards, not only outside the building but inside it as well, for example at reception and near the data center. And you need electronic surveillance, such as CCTV cameras, motion sensors, or biometric controls. To recap, in this video we looked at the need for continuous improvement, the need for logical design, and the need for physical design.

Enterprise Operations

Let's head over to Azure and see how isolation works. We are now on the Azure services page, where many options are available. Click the Resource groups icon, and notice that two resource groups have already been created.

[Video description begins] The Resource groups screen displays the following buttons at the top: Add, Manage view, Export to CSV, and so on. Below these buttons are the "Filter by name" field, "Subscription == all", Add filter, and so on. The table below includes three columns labeled: Name, Subscription, and Location. [Video description ends]

We can add more resource groups, and we can also export the list of resource groups to CSV. Now click the Add button.

[Video description begins] The "Create a resource group" screen contains three buttons at the top labeled: Basics, Tags, and Review + create. The "Basics" section includes a description of resource groups, followed by "Project details", which contains the "Subscription" and "Resource group" fields, and finally the "Resource details" section, which displays the Region. [Video description ends]

Notice that the Create a resource group page is displayed. The subscription defaults to Pay-As-You-Go; enter the resource group name as Test01. In the Region drop-down list, scroll through the list of regions and select (Asia Pacific) West India. With that done, we need to specify the tags.

[Video description begins] The "Tags" section includes three columns labelled as: Name, Value, and Resource. [Video description ends]

For this example, enter Name as the tag name and RSG01 as its value. Now review the configuration and create the resource group.

[Video description begins] The "Review + create" section displays the defined: Subscription, Resource group, and Region. [Video description ends]

The resource group is now created. Refresh the page to see it in the list.

[Video description begins] The host selects the Refresh button on the top left of the screen. [Video description ends]

Once we refresh the page, the resource group we created appears in the list. All three resource groups are in three different regions. Click the Test01 resource group, and a middle pane appears

[Video description begins] When Test01 is clicked, a middle pane and a right pane are displayed. The middle pane contains options such as Overview, Activity log, Access control (IAM), Tags, and so on, as well as the "Settings" subsection, which includes Quickstart, Deployments, Locks, and so on. The right pane includes buttons labeled Add, Edit columns, Move, and so on at the top. Below these buttons, the Subscription details appear along with Deployments, Tags, and so on. [Video description ends]

with many options available. Click Access control (IAM), where you can assign a role.

[Video description begins] The right pane now displays options on top such as: Check access, Role assignments, Roles, and so on. The "Add a role assignment" and "View role assignments" sections appear on the right of this pane. [Video description ends]

Here we can view the role assignments for this resource group. Next, click Deployments; notice that there have not been any deployments in this resource group yet. Scrolling down, we can also look at cost and logs, among various other options. Now click Home to go back, and then click the Virtual networks icon.

[Video description begins] The Virtual networks window displays the following buttons at the top: Add, Manage view, Export to CSV, and so on. Below these buttons are the "Filter by name" field, "Subscription == all", Add filter, and so on. The table below includes columns labeled: Name, Resource group, Location, Subscription, and so on. [Video description ends]

One virtual network has already been created. Click Add to create another virtual network.

[Video description begins] The "Create virtual network" screen contains five buttons at the top labeled: Basics, IP Addresses, Security, Tags, and Review + create. The "Basics" section includes a description of Azure Virtual Network, a "Project details" section containing the "Subscription" and "Resource group" fields, and finally an "Instance details" section displaying the Name and Region. [Video description ends]

Keep the default subscription in the drop-down list. First, select the resource group, Test01. Next, name the virtual network Test01Net01, which indicates that it was created within the Test01 resource group. From the Region drop-down list, select (Asia Pacific) West India. With this configuration done, move on to specify the IP addresses.

[Video description begins] The IP Addresses section displays the following fields: the virtual network's address space, Subnet name, and Subnet address range. [Video description ends]

For now, we will accept one of the IP address ranges that is already there. On the Security tab, notice that DDoS Protection is set to Basic; setting it to Standard costs extra. Firewall is set to Disabled; enabling it would also cost a little extra, so for now we keep it disabled. Next comes the Tags tab, which we will skip for now. On the Review + create screen,

[Video description begins] The Review + create section displays the defined Subscription, Resource group, Region, IP addresses, Security, and so on. [Video description ends]

review the configuration before moving ahead. Here you can verify everything that will be created; for instance, the resource group name is Test01. Once the review is complete and we have verified that all the parameters are correct, create the virtual network. To recap, in this video we learned how to create a resource group and then how to create a virtual network within that resource group.
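The same two resources can also be created from code. This is a minimal sketch using the Azure SDK for Python (`azure-identity`, `azure-mgmt-resource`, `azure-mgmt-network`), assuming you are already signed in so that `DefaultAzureCredential` can find credentials. The subscription ID is a placeholder, and the 10.0.0.0/16 address space with a 10.0.0.0/24 subnet named "default" is an assumption standing in for whatever range the portal suggests. The helper `vnet_parameters` is hypothetical and uses only the standard library, so it can be checked offline before anything is created.

```python
import ipaddress
from typing import Any, Dict

SUBSCRIPTION_ID = "<your-subscription-id>"  # hypothetical placeholder
RESOURCE_GROUP = "Test01"
LOCATION = "westindia"                      # (Asia Pacific) West India
VNET_NAME = "Test01Net01"


def vnet_parameters(address_space: str, subnet_name: str, subnet_prefix: str) -> Dict[str, Any]:
    """Build virtual network parameters, checking that the subnet fits the address space."""
    space = ipaddress.ip_network(address_space)
    subnet = ipaddress.ip_network(subnet_prefix)
    if not subnet.subnet_of(space):
        raise ValueError(f"{subnet} is not inside {space}")
    return {
        "location": LOCATION,
        "address_space": {"address_prefixes": [str(space)]},
        "subnets": [{"name": subnet_name, "address_prefix": str(subnet)}],
    }


def create_resources() -> None:
    """Create the resource group (tagged Name=RSG01) and the virtual network in it."""
    # Imported here so the parameter helper above works without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient
    from azure.mgmt.resource import ResourceManagementClient

    credential = DefaultAzureCredential()
    resources = ResourceManagementClient(credential, SUBSCRIPTION_ID)
    resources.resource_groups.create_or_update(
        RESOURCE_GROUP, {"location": LOCATION, "tags": {"Name": "RSG01"}})

    network = NetworkManagementClient(credential, SUBSCRIPTION_ID)
    poller = network.virtual_networks.begin_create_or_update(
        RESOURCE_GROUP, VNET_NAME,
        vnet_parameters("10.0.0.0/16", "default", "10.0.0.0/24"))
    print("Created virtual network", poller.result().name)
```

Call `create_resources()` once your subscription ID and credentials are in place. Validating the address ranges locally first catches the most common mistake (a subnet outside the virtual network's address space) before any request reaches Azure.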

Securing Network Operations

In this video, we will look at how to secure network operations. There are a variety of network operations you can perform, such as connecting to a VPN server or communicating over the internet between two endpoints, and you need that communication to be secured in every case. There are different protocols you can use: TLS (Transport Layer Security), IPsec (Internet Protocol Security), and DNSSEC (Domain Name System Security Extensions). Depending on the operation you are performing, you will use a particular protocol to secure the communication. Let's look at AWS and see what options it provides for securing network operations.

[Video description begins] An AWS window displays two options at the top, labeled: Services and Resource Groups. The left pane includes subsections such as Security, Virtual Private Network (VPN), Transit Gateways, and so on. The right pane includes three buttons at the top, labeled: Create Client VPN Endpoint, Download Client Configuration, and Actions. Just below these is a search bar labeled "Filter by tags and attributes or search by keyword". [Video description ends]

Here we have Client VPN Endpoints, which you can create in AWS. Click Create Client VPN Endpoint.

[Video description begins] The "Create Client VPN Endpoint" screen displays fields labelled: Name Tag, Description, Client IPv4 CIDR, and Authentication Information and options within that section. [Video description ends]

When you configure a Client VPN endpoint, it uses a TLS VPN connection. TLS provides end-to-end security for communication over the internet, protecting data in transit between the client and the server.
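AWS Client VPN handles TLS for you, so the following is not its configuration. To show what TLS security means from a client's point of view (the server's certificate is verified against trusted certificate authorities and its hostname is checked before any data flows), here is a small sketch using Python's standard `ssl` module. Requiring TLS 1.2 as the minimum version is an assumption chosen for the example.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS client settings: verify the server certificate and hostname, TLS 1.2+."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def tls_handshake(host: str, port: int = 443) -> str:
    """Open a TLS connection and return the negotiated protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # wrap_socket performs the handshake; it raises ssl.SSLError (for example
        # ssl.SSLCertVerificationError) if the certificate or hostname is not valid.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, `tls_handshake("example.com")` returns a version string such as `'TLSv1.3'`, depending on what the server supports; the host is only a placeholder.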

[Video description begins] The host selects "Virtual Private Gateways" from the left pane. [Video description ends]

Then there are Virtual Private Gateways, which use IPsec to secure communication between two endpoints. IPsec provides authentication, integrity, and confidentiality for the information transmitted from one endpoint to the other.
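To illustrate just the integrity and authentication part: IPsec ESP can attach an integrity check value (ICV) to each packet using a key shared by both endpoints, for example HMAC-SHA-256 truncated to 128 bits. This sketch reproduces only that idea with Python's `hmac` module; it is not IPsec itself (no encryption, no security associations, no IKE key exchange), and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ICV_BYTES = 16  # HMAC-SHA-256-128: the 32-byte MAC truncated to 16 bytes


def protect(key: bytes, payload: bytes) -> bytes:
    """Sender: append a truncated HMAC so the receiver can check integrity."""
    icv = hmac.new(key, payload, hashlib.sha256).digest()[:ICV_BYTES]
    return payload + icv


def verify(key: bytes, packet: bytes) -> bytes:
    """Receiver: recompute the ICV and reject the packet if it does not match."""
    payload, icv = packet[:-ICV_BYTES], packet[-ICV_BYTES:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:ICV_BYTES]
    if not hmac.compare_digest(icv, expected):  # constant-time comparison
        raise ValueError("integrity check failed: packet altered or wrong key")
    return payload


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # in real IPsec, IKE negotiates keys like this
    packet = protect(key, b"payroll batch 17")
    print(verify(key, packet))     # b'payroll batch 17'
```

Because only the holder of the shared key can produce a matching ICV, a successful check shows both that the packet was not altered and that it came from the other endpoint.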

[Video description begins] The host accesses the Cloudflare website, which includes various options at the top, labeled: Products, Solutions, Resources, and so on. The bottom pane displays: How Does DNSSEC Work?, Root-Signing Ceremony, and ECDSA and DNSSEC options. The host scrolls up. [Video description ends]

Now let's move over to Cloudflare, which provides many security solutions. One of them is DNSSEC, a protocol that secures the information used by DNS. The authoritative zone data is signed with the zone's private key, and resolvers verify those signatures with the matching public key. This means any tampering with the authoritative zone data can be detected. This matters whenever you own a domain name: you do not want anyone to tamper with your DNS records.

You do not want anybody to change your authoritative zone settings and take control of your domain. DNSSEC is one of the services Cloudflare provides, and its site also explains how DNSSEC works, what the Root Signing Ceremony is, and how ECDSA is used with DNSSEC. Now let's head over to Verisign Labs, where we have entered google.com.

[Video description begins] The host accesses the Verisign Labs website, which includes: Domain Name, Detail, Time, and the table for Analyzing DNSSEC problems for google.com. [Video description ends]

We are checking whether there are any DNSSEC problems for google.com, and most of the configuration seems to be okay; no particular DNSSEC problems are reported. When you enter a domain name, Verisign's tool gives you the DNSSEC details for that domain, and if it finds anything wrong with the DNS records, it flags it right away. A lot of information is displayed in the middle of the page. You can verify this information, and everything here seems fine because each item is marked with a green check mark. You can run the same check for your own domain name at Verisign Labs. To recap, in this video we looked at the options for securing network operations: TLS, IPsec, and DNSSEC.
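Trust in a zone's public key comes from its parent zone, which publishes a DS record containing a hash of the child's DNSKEY; checking that this DS-to-DNSKEY link holds is typically one of the results a DNSSEC analyzer reports. This minimal standard-library sketch computes the key tag (RFC 4034, Appendix B) and a SHA-256 DS digest (digest type 2) for a hypothetical example.com key-signing key. The 32-byte "public key" is made-up data, and algorithm 15 (Ed25519) is an assumption for the example.

```python
import hashlib


def name_to_wire(name: str) -> bytes:
    """Canonical DNS wire format: lowercase, length-prefixed labels, root byte."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def dnskey_rdata(flags: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (2 bytes), protocol (always 3), algorithm, key bytes."""
    return flags.to_bytes(2, "big") + bytes([3, algorithm]) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag from RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over the owner name (wire format) + DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


if __name__ == "__main__":
    fake_key = bytes(range(32))  # hypothetical 32-byte Ed25519 public key
    rdata = dnskey_rdata(257, 15, fake_key)  # 257 = key-signing key, 15 = Ed25519
    print("example.com. IN DS", key_tag(rdata), 15, 2, ds_digest("example.com.", rdata))
```

Changing even one byte of the DNSKEY produces a completely different digest, so a DNSKEY that an attacker swapped in would no longer match the DS record held by the parent zone.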

Dynamic Operations Accounting

In this video, we will learn how to configure Auto Scaling in the AWS environment. Auto Scaling is the cloud controller dynamically allocating resources to maximize their use; it relies on elasticity. Suppose you are running a server and demand grows so that it needs more resources. Auto Scaling automatically adds capacity in the form of additional EC2 instances, and it can remove instances again when demand drops. Let's see how to configure Auto Scaling in the AWS environment.
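How many instances should Auto Scaling run? AWS target tracking policies aim to keep a metric, such as average CPU utilization, near a target value. AWS's exact algorithm is its own, so the following is only a simplified proportional sketch with hypothetical numbers: if the current instances carry a load that puts the metric above the target, size the group up in proportion, and clamp the result to the group's minimum and maximum size.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Simplified target tracking: size the group so the metric moves toward target.

    If 2 instances average 90% CPU and the target is 50%, the same load needs
    about 2 * 90 / 50 = 3.6, rounded up to 4 instances.
    """
    if current < 1 or target <= 0:
        raise ValueError("need at least one running instance and a positive target")
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(maximum, wanted))  # respect the group's size limits


if __name__ == "__main__":
    print(desired_capacity(2, 90.0, 50.0, minimum=1, maximum=5))  # 4 (scale out)
    print(desired_capacity(4, 20.0, 50.0, minimum=1, maximum=5))  # 2 (scale in)
```

Rounding up errs on the side of having slightly too much capacity, and the minimum and maximum sizes keep the group from scaling to zero or growing without bound.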

[Video description begins] An AWS window includes the left and right panes. The left pane contains various sub-sections and options such as: EC2 Dashboard, Events, Tags, Instances, Instance Types, Reserved Instances, Images, and so on. The right pane displays three buttons on the top, labelled: Launch Instance, Connect, and Actions. The table below these buttons has columns such as: Name, Instance ID, Availability Zone, and so on. The bottom pane which is titled: Instance contains the tabs such as: Description, Status Checks, Monitoring, and Tags. [Video description ends]

We go into the EC2 dashboard and

[Video description begins] An EC2 Dashboard displays the "Resources" section on the right pane. This includes options such as: Running Instances, Dedicated Hosts, Volumes, Snapshots, Key pairs, Load balancers, and so on. The right-most area displays the: Account attributes and Additional information. [Video description ends]

here we click the Running Instances link. Once that loads, we click Launch Instance. Once this page loads, we scroll down and

[Video description begins] The Step 1: Choose an Amazon Machine Image (AMI) screen displays the 7 steps on the top, labelled as: Choose Instance Type, Configure Instance, Add Storage, and so on. The left pane includes the options under Quick Start such as: My AMIs, AWS Marketplace, and so on. A list of AMIs appears on the right pane. [Video description ends]

select Microsoft Windows Server 2019 Base and click on Select button.

[Video description begins] The Step 2: Choose an Instance Type screen displays a table with columns labelled as: Family, Type, Memory (GiB), Internal Storage (GB), and so on. [Video description ends]

Notice that a general purpose instance type is already selected by default, and it is the free tier eligible option. So I will just keep this instance type and move ahead.

[Video description begins] The Step 3: Configure Instance Details screen includes fields such as: Number of Instances, Purchasing option, Network, Subnet, and so on. [Video description ends]

Now we are on step 3, Configure Instance Details. Notice that a lot of options, including the network for this instance, are already defined. We go ahead and click Launch into Auto Scaling Group.

[Video description begins] The: Launch into Auto Scaling Group pop up window displays, which includes the "Use a Launch Configuration" link, along with buttons labelled as: Cancel and Continue. [Video description ends]

The pop-up says that we do not have a launch configuration yet, so we first create one by clicking the Continue button. A new page loads where we create the launch configuration, starting with a name in the Name text field.

[Video description begins] The: Create Launch Configuration screen displays various fields such as: Name, Purchasing option, IAM role, and Monitoring, along with Advanced Details. [Video description ends]

Let's call it server 01. Then we move ahead, keep the default 30 GB General Purpose SSD for storage, and move ahead again.

[Video description begins] The: Create Launch Configuration screen now displays different columns labelled as: Volume Type, Device, Snapshot, Throughput, and so on. The Step 4: Add Storage is highlighted on the top of the screen. [Video description ends]

We have to assign a security group.

[Video description begins] The Create Launch Configuration screen displays different fields labelled: Assign a security group, Security group name, and Description. The table below these includes columns such as: Type, Protocol, Port Range, and Source, with the Add Rule button below. [Video description ends]

Here we choose to create a new security group, which is the default option. Many protocols are available in the Type drop-down, but we keep the default, RDP. In the Source drop-down, notice that Anywhere is selected by default. You could instead restrict access to a particular IP address from which you want to connect. For now, we leave it as Anywhere.
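The Source setting is worth checking automatically, because a rule that opens RDP to Anywhere exposes the server to every address on the Internet. The following is a minimal Python sketch, assuming rules are represented by a hypothetical `IngressRule` record (not an AWS SDK type). It flags TCP rules whose port range covers 3389, the standard RDP port, with a source of `0.0.0.0/0` or `::/0`.

```python
import ipaddress
from dataclasses import dataclass

RDP_PORT = 3389  # standard RDP port

@dataclass
class IngressRule:
    protocol: str     # e.g. "tcp"
    from_port: int
    to_port: int
    source_cidr: str  # "0.0.0.0/0" is what the console calls "Anywhere"

def is_open_to_anywhere(rule: IngressRule) -> bool:
    # A prefix length of 0 (0.0.0.0/0 or ::/0) matches every address.
    return ipaddress.ip_network(rule.source_cidr, strict=False).prefixlen == 0

def risky_rdp_rules(rules):
    """Return the rules that expose RDP to every address."""
    return [
        r for r in rules
        if r.protocol == "tcp"
        and r.from_port <= RDP_PORT <= r.to_port
        and is_open_to_anywhere(r)
    ]

rules = [
    IngressRule("tcp", 3389, 3389, "0.0.0.0/0"),        # "Anywhere"
    IngressRule("tcp", 3389, 3389, "203.0.113.10/32"),  # one admin address
]
print([r.source_cidr for r in risky_rdp_rules(rules)])  # ['0.0.0.0/0']
```

Restricting the source to a single `/32` address, as in the second rule, is the tighter alternative the narration mentions.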

[Video description begins] The Create Launch Configuration screen now displays different sub sections such as: AMI Details, Instance Type, and Launch configuration details. The host scrolls down. The Step 6: Review is highlighted on the top of the screen. [Video description ends]

Then we review the configuration and create the launch configuration.

[Video description begins] A pop up appears which displays: Select an existing key pair or create a new key pair as the heading. There are two fields in this window, labelled as: Choose an existing key pair and Select a key pair. An acknowledgment check box and the Cancel and Create launch configuration buttons appear at the bottom. [Video description ends]

Now we have to assign a key pair. We select the Create New Key Pair option, give the key pair a name, and click Download Key Pair. Then we move ahead and save. Now we are on Create Auto Scaling Group.

[Video description begins] The: Create Auto Scaling Group screen displays fields such as: Group name, Launch Configuration, Group size, and Network. The "Advanced Details" section appears at the bottom of the screen. [Video description ends]

We name this group AS01. Notice that it starts with one instance, which is the bare minimum you need to start an Auto Scaling group, so we keep the default of one. Then you choose a network and, along with it, a subnet. We select one subnet and move to the next step, Configure scaling policies. On the Configure scaling policies page, we select Use scaling policies to adjust the capacity of this group.

[Video description begins] The: Configure scaling policies screen now displays the "Scale Group Size" section at the bottom of the screen. This includes fields such as: Name, Metric type, Target value, and so on. [Video description ends]

Notice that by default it scales between 1 and 1 instances. We change the range to 1 and 3, which means the group starts with one instance and can grow to a maximum of three. We leave the default policy name. Next is the metric type, which is what triggers the Auto Scaling group to scale. By default it is Average CPU Utilization. If you have an Application Load Balancer, you can instead scale on Application Load Balancer Request Count Per Target, and you can also scale on average network bytes. With Average CPU Utilization, you can specify that when utilization reaches, let's say, 70% or above, the group launches additional instances. So in the Target value text box, we specify 70. Next, how much time does a new instance need to boot up? Here we use something like 300 seconds; this is the warm-up time for the instance. We can also create a notification.
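To build intuition for the target value of 70 and the range of 1 to 3 instances, here is a simplified Python model of target tracking. It assumes the total load stays constant and spreads evenly across instances, so the group size needed to bring average CPU back to the target scales proportionally. This is an illustrative approximation, not the exact algorithm AWS uses, and it ignores the 300-second warm-up period.

```python
import math

def desired_capacity(current: int, avg_cpu: float, target: float = 70.0,
                     min_size: int = 1, max_size: int = 3) -> int:
    """Estimate the group size that would bring average CPU near the target.

    Assumes the total load (current * avg_cpu) stays the same and is spread
    evenly over the new number of instances.
    """
    if current <= 0:
        return min_size
    needed = math.ceil(current * avg_cpu / target)
    # Never go outside the group's configured minimum and maximum.
    return max(min_size, min(max_size, needed))

print(desired_capacity(1, 95.0))   # 2: one instance at 95% needs a second
print(desired_capacity(2, 90.0))   # 3
print(desired_capacity(3, 100.0))  # 3: capped at the maximum of 3
print(desired_capacity(3, 20.0))   # 1: scale back in to the minimum
```

The clamp to the minimum and maximum is why the 1-to-3 range matters: even under heavy load, this group never grows past three instances.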

[Video description begins] The: Create Auto Scaling Group screen now displays fields such as: Send a notification to, With these recipients, and Whenever instances. The Add notification button appears at the bottom of the screen. [Video description ends]

In Send a notification to, we specify the notification topic, and in Whenever instances we choose the instance events, such as launch or terminate, that trigger the notification. Then we add the recipients in the With these recipients text box; everyone listed in this text box receives the email notification. Then we move ahead and define tags. Finally, we review and finalize the configuration. This is how you configure an Auto Scaling group. Just to recap, in this video we looked at how to configure Auto Scaling in the AWS environment.
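As a minimal sketch of the routing that these notification settings describe, the hypothetical helper below decides who should receive a message for a given instance event. The four event-type strings are the notification types EC2 Auto Scaling uses; the `NotificationRule` structure, the function name, and the email addresses are illustrative assumptions, and actual delivery is done by the notification service, not by this code.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Instance event types used by EC2 Auto Scaling notifications.
LAUNCH = "autoscaling:EC2_INSTANCE_LAUNCH"
TERMINATE = "autoscaling:EC2_INSTANCE_TERMINATE"
LAUNCH_ERROR = "autoscaling:EC2_INSTANCE_LAUNCH_ERROR"
TERMINATE_ERROR = "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"

@dataclass
class NotificationRule:
    recipients: List[str]
    events: Set[str] = field(default_factory=set)

def recipients_for(event: str, rules: List[NotificationRule]) -> List[str]:
    """Everyone subscribed to this event, without duplicates, in rule order."""
    out: List[str] = []
    for rule in rules:
        if event in rule.events:
            for person in rule.recipients:
                if person not in out:
                    out.append(person)
    return out

rules = [
    NotificationRule(["ops@example.com"], {LAUNCH, TERMINATE}),
    NotificationRule(["oncall@example.com", "ops@example.com"], {LAUNCH_ERROR}),
]
print(recipients_for(LAUNCH_ERROR, rules))  # ['oncall@example.com', 'ops@example.com']
```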

Access Control Operations

Let's click the VPC link, which loads the VPC page.

[Video description begins] The VPC link is present in the "Resources by Region" section. [Video description ends]

In the left pane, notice the Virtual Private Network section, which contains Customer Gateways, Virtual Private Gateways, Site-to-Site VPN Connections, and Client VPN Endpoints.

[Video description begins] The right pane displays two buttons on the top: Create VPC and Actions. The table below displays columns labelled as: Name, VPC ID, State, and so on. The pane below this contains the: Description, CIDR Blocks, Flow Logs, and Tags of the VPC. [Video description ends]

Let's click Virtual Private Gateways. Notice that a virtual private gateway had already been created but was later deleted. Click the Create Virtual Private Gateway button. In the Name tag field, we enter VPG02 as the gateway name. For now, we use the Amazon default ASN, or Autonomous System Number, which identifies the gateway's side of the BGP session. We keep this default ASN and create the virtual private gateway.
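The Amazon default ASN is 64512, which falls in the private-use range. If you supply your own ASN instead, it would normally be a private one. Below is a small Python check for the private-use ranges defined in RFC 6996; it is only a sketch of input validation, and AWS applies its own checks when you create the gateway.

```python
AMAZON_DEFAULT_ASN = 64512  # used when you keep the "Amazon default ASN" option

# Private-use ASN ranges from RFC 6996 (inclusive bounds): 16-bit and 32-bit.
PRIVATE_ASN_RANGES = [(64512, 65534), (4_200_000_000, 4_294_967_294)]

def is_private_asn(asn: int) -> bool:
    return any(lo <= asn <= hi for lo, hi in PRIVATE_ASN_RANGES)

print(is_private_asn(AMAZON_DEFAULT_ASN))  # True
print(is_private_asn(65535))               # False: reserved, not private use
```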

[Video description begins] The screen displays a field titled "Virtual Private Gateway ID." A button, that is labelled as "Close" displays at the bottom right of the screen. [Video description ends]

The virtual private gateway has been created and has a Virtual Private Gateway ID. Click the Close button to close this page. VPG02 now appears in the list, but its State column shows detached. It uses IPsec for encryption, and an ASN has been assigned to it. Now click the Actions button and choose Attach to VPC.

[Video description begins] As he clicks the "Actions" button, a drop down with the following options displays, namely: Delete Virtual Private Gateway, Attach to VPC, Detach from VPC, and Add/Edit Tags. [Video description ends]

[Video description begins] A page titled: Attach to VPC displays. It contains two fields, titled: Virtual Private Gateway ID and VPC. The: Cancel and Yes, Attach buttons display at the bottom right. [Video description ends]

We select a VPC and click the Yes, Attach button. Notice that the state of the virtual private gateway changes.

[Video description begins] The "State" column in the right pane shows the status as "attaching." The host switches to a window titled: Your Customer Gateway Device. [Video description ends]

Meanwhile, let's switch to a diagram to see how this works. In the AWS environment, you have a VPC that contains two subnets. At the edge of the VPC is a virtual private gateway, which connects to a customer gateway; it can connect to multiple customer gateways. Between these two endpoints, the virtual private gateway and the customer gateway, a VPN connection carries the communication.

The diagram shows how a virtual private gateway connects to a customer gateway. Now let's switch back to AWS and check whether the state is still attaching or has changed to attached. It is still attaching, so we refresh the page to see whether the state has changed.

[Video description begins] The host selects the Refresh button on the top left of the window. [Video description ends]

We wait for the page to load. Once it loads, we see that the state has changed to attached. Just to recap, in this video we learned how to create a virtual private gateway and attach it to a VPC.
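Instead of refreshing the console by hand until the state changes from attaching to attached, you can poll for it. This Python sketch assumes a hypothetical `get_state` callable that returns the current attachment state; in practice you might implement it with the EC2 DescribeVpnGateways API, whose attachment states include attaching, attached, detaching, and detached. The `sleep` parameter is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable

def wait_for_state(get_state: Callable[[], str], wanted: str = "attached",
                   interval_s: float = 5.0, timeout_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `wanted`; False if the timeout expires."""
    waited = 0.0
    while True:
        if get_state() == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s

# Simulated console: two refreshes show "attaching", the third shows "attached".
states = iter(["attaching", "attaching", "attached"])
print(wait_for_state(lambda: next(states), sleep=lambda s: None))  # True
```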

Patch Management Plan

We use many types of software and applications throughout our daily lives, and none of them can be called perfect. Here, perfect means free of bugs or other problems embedded in them. So if there are bugs, issues, or problems in an application or software, how do we fix them? We fix them with a process known as patch management. Patch management is a method of delivering updates to a system, software, or application. An update may fix a bug, add new functionality, or correct functionality that works incorrectly, either by fixing it or by removing it, depending on the case.

Getting updates from the vendor to your system or application is the patch management process. An update can be at the hardware level or at the software level. A hardware-level update could simply be updating the firmware embedded in your router, switch, or system, much like a BIOS that needs to be updated from time to time. For instance, if you go to a hardware vendor's site, let's say Dell or HP, you will find updates for the system firmware. That is a hardware update. A software update could add new functionality or fix a bug.

When you add new functionality to an application or software, the update is added either as a module or as a major chunk of code embedded in the application that adds new features. Let me give you a simple example. You have an application that you work with daily, and it has five menus. You apply a new update, and when you open the application, you now have six menu options. The new menu option, let's say, exports documents into multiple formats. That is the new functionality. An update could also simply fix a functionality or a vulnerability.

Let's say a zero-day vulnerability has been discovered in an application that you're using. Microsoft, or whichever software vendor owns the application, will release an update as quickly as it can. When you apply the update, you close that issue in the functionality or close that vulnerability in the application. A patch management process can be manual or automated. A manual process typically covers one or two systems where you apply the updates. But when you have hundreds of systems, is it feasible to update them manually?

Yes, you can do it, but it would probably take you a month or two to close the vulnerability or add the new functionality. The better approach in this scenario is an automated method. It could be done through scripts or through dedicated patch management applications; in the Windows environment, for example, you have Windows Server Update Services (WSUS), which can deploy updates for the Microsoft products in your environment. You have to select an appropriate method based on the number of systems and the number of updates you have. But why is there a need for patch management at all? First, you want to reduce the attack surface and protect the cloud infrastructure.

You also want to protect the on-premises infrastructure, where you need to apply the same patches and updates. Remember that an application is the same whether it runs on premises or in the cloud: if it has a vulnerability, the vulnerability exists in both places, and you want to reduce that attack surface by closing it. Otherwise, if an attacker figures out that you are running the application and that it still has the vulnerability, he or she could very well exploit it. Second, there are compliance needs. Whenever you use a regulatory framework or any other compliance framework, it requires you to keep your systems updated with the latest patches.

So you need to apply these updates on time to meet your compliance needs. Third, you want to increase employee productivity by reducing technical bugs. Say you have bought an application, and users work with it daily, but it crashes the moment they try to print or export a document. This is just a simple example. That bug needs to be fixed, because without exporting or printing the documents, your employees cannot proceed any further; the exported or printed document is their final outcome.

If they are unable to achieve that final outcome, they are losing their time and the organization's money, because they spend that time struggling to print or export the document. You want to eliminate that technical bug so employees can stay productive. And, as I said earlier, not all updates fix vulnerabilities or technical issues; some also add new features and functions. Finally, this goes back to the first point, which is reducing the attack surface.

You want to protect all systems, not only the servers but also other hardware devices like routers, switches, and wireless access points. Everything needs to be kept updated with the latest patches, updates, and firmware, for both applications and hardware appliances or devices. Whether your patch management process is manual or automated, there is a general process that you need to follow. The first step is to identify the missing patches, which is typically done with automated tools. Such a tool retrieves the list of available patches from the vendor site and then checks your system to see whether those patches are installed.

It compares the two lists and generates a list of missing patches. Next, you need to acquire these patches. Again, for one or two systems you can download and apply them manually; otherwise you will have to use an automated method or application that downloads and deploys them for you. But before you deploy the patches or updates you have downloaded, you need to test them, ideally on a separate staging server or staging desktop. Once the testing is successful, you can go ahead and apply the patches in the production environment.
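The identification step boils down to comparing the vendor's list with what is installed. Here is a minimal Python sketch, assuming a hypothetical `Patch` record and made-up patch identifiers; a real tool would read the vendor catalog and each system's installed-patch inventory instead of hard-coded values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    patch_id: str  # hypothetical identifiers such as "PATCH-102"
    product: str   # product the patch applies to

def missing_patches(catalog, installed_products, installed_ids):
    """Vendor patches that apply to installed products but are not yet installed."""
    return sorted(
        (p for p in catalog
         if p.product in installed_products and p.patch_id not in installed_ids),
        key=lambda p: p.patch_id,
    )

catalog = [Patch("PATCH-101", "webserver"), Patch("PATCH-102", "webserver"),
           Patch("PATCH-201", "database")]
print([p.patch_id for p in missing_patches(catalog, {"webserver"}, {"PATCH-101"})])
# ['PATCH-102']
```

Filtering by installed product keeps patches for software the system doesn't run (here the database patch) off the list.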

Testing is a critical component here. Think about what can happen if you deploy patches without any testing. In the production environment, you have a web server on which you have just deployed an update, and this update does not work well with the existing web server configuration. You could end up crashing either the web server itself or the entire system. You may lose customers who connect to this web server, and employees will sit there waiting for you to fix it. You lose time, employees lose productive time, and customers lose faith in your application and your systems. So you want to be very cautious at this step.

Finally, after you have tested the patches or updates and the results are positive, meaning no issue was encountered during testing, you can go ahead and install them. Once you have installed them, you need to verify the installation: ensure that all the patches or updates were installed properly, that they have not caused any issue, and that everything is working fine. Just to recap, in this video we looked at the definition of patch management and what patches do. Finally, we discussed the patch management process.
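The test, install, and verify steps can be expressed as a gate: a patch only reaches production if it passed testing in staging. The Python sketch below assumes the patches have already been identified and acquired, and takes hypothetical `test`, `install`, and `verify` callables that you would back with your own staging environment and deployment tooling.

```python
from typing import Callable, Dict, Iterable

def run_patch_cycle(patches: Iterable[str],
                    test: Callable[[str], bool],
                    install: Callable[[str], None],
                    verify: Callable[[str], bool]) -> Dict[str, str]:
    """Install only patches that pass staging tests, then verify each install."""
    status: Dict[str, str] = {}
    for patch in patches:
        if not test(patch):
            status[patch] = "failed-test"  # never deployed to production
            continue
        install(patch)
        status[patch] = "verified" if verify(patch) else "verify-failed"
    return status

installed = []
result = run_patch_cycle(
    ["PATCH-102", "PATCH-103"],
    test=lambda p: p != "PATCH-103",  # pretend PATCH-103 breaks the staging web server
    install=installed.append,
    verify=lambda p: True,
)
print(result)     # {'PATCH-102': 'verified', 'PATCH-103': 'failed-test'}
print(installed)  # ['PATCH-102']
```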

ITSM and Operations Management

We all know IT serves as the backbone of almost every business running across the world today. So why do we need IT, and what is the role of ITSM, or IT Service Management, in our businesses? ITSM helps us deliver business value through IT solutions: we implement technology using processes and deliver it to people. So ITSM has three key components: technology, processes, and people. Let's talk about people first. These are the end users: your customers, employees, management, vendors, and internal and external users.

With the help of ITSM, you can define the correct role for each of the entities in this group called people. Once you define a role, each entity can help contribute and deliver the correct solution, or receive it, depending on that role. Then we move on to processes. With ITSM, we need to build the correct processes and identify any gaps in the existing ones. Let's forget about ITSM for a minute. Say you are running your IT infrastructure with a lot of processes: a backup process, a retrieval process, a process to create users, and a process to delete users. You don't know whether these processes follow best practices or not.

When you integrate ITSM with the processes you have, you can build the correct processes or improve upon your existing ones. For instance, ITSM provides processes for the service desk, incident management, IT asset management, change management, problem management, and knowledge management. Across the complete IT landscape, you get a lot of processes that you can use to tweak your own processes or build new ones. What if you don't have any IT asset management process running in your organization? Then you cannot track your assets. So you take the ITSM asset management process and build your own process around it.

You track your assets and allocate them, and the whole job of asset allocation and asset management becomes much easier. The third component is technology. It is about providing the right services to the right people by using the correct tools for ITSM. ITSM can be delivered through various tools, many of which build a lot of functionality on ITSM processes and help you deliver that value to your customers, whether internal or external. For instance, you can define a ticket management process and implement it using a tool; there are many open source and commercial ticket management products available in the market.

What happens after somebody logs a ticket? That entire process is built into the tool and automated. The value gets delivered to the customer, who knows where the ticket is at any given point in time and whether the issue they logged has been resolved. With technology, you can also automate your processes. Once you build automation into a process, it reduces human errors and human dependency. Let me give you an example. Say a small organization does not have a ticket management process, so it logs tickets in a register or an Excel sheet. Sooner or later, somebody will forget that a ticket was logged.

And there will not be any escalation process built into this kind of ticketing process. Once you have a tool, however, a user can log a ticket, and you, as the service provider, assign it to an individual in your team. The tool has an SLA, or service level agreement, built into it. Let's say the ticket has to be closed within four hours. What happens if your team member does not attend to the ticket within four hours? The tool can have a built-in mechanism that generates an escalation mail. These kinds of processes can be automated using technology. So why do you need ITSM in the cloud environment in the first place?
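The four-hour escalation rule can be sketched in a few lines of Python. The `Ticket` fields, names, and function are hypothetical and not taken from any particular ticketing product; a real tool would also send the escalation mail, which is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SLA = timedelta(hours=4)  # the four-hour example from the narration

@dataclass
class Ticket:
    ticket_id: str
    assignee: str
    logged_at: datetime
    attended: bool = False

def tickets_to_escalate(tickets, now: datetime, sla: timedelta = SLA):
    """IDs of tickets nobody has attended to within the SLA window."""
    return [t.ticket_id for t in tickets
            if not t.attended and now - t.logged_at >= sla]

tickets = [
    Ticket("T-1", "alice", datetime(2026, 3, 3, 9, 0)),
    Ticket("T-2", "bob", datetime(2026, 3, 3, 9, 0), attended=True),
    Ticket("T-3", "alice", datetime(2026, 3, 3, 12, 0)),
]
print(tickets_to_escalate(tickets, now=datetime(2026, 3, 3, 13, 30)))  # ['T-1']
```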

We understand that you need ITSM in your local environment, which is your on-premises data center. But why in the cloud environment? First, you need a consistent change management process. Look at it from the cloud service provider's perspective: any change made to the cloud may end up impacting thousands of customers. So before making any change in the cloud environment, you should go through the change management process, which means an approval body approves the change based on what you have tested and what kind of change you are proposing.

Nothing gets implemented on an ad hoc basis; every change goes through the process. Second, there is a need for responsive issue resolution. Now switch to the other side and act as the customer, which is the cloud consumer. You have logged an issue, and the cloud service provider needs to be responsive enough to get back to you with a resolution. SLAs built in according to the severity of the problem determine that you receive a resolution within a certain number of hours. For instance, a severity one ticket might have to be resolved within one hour, whereas for a severity three ticket you might get an answer from the cloud service provider within 24 hours.
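The severity-based response times can be captured as a lookup from severity to resolution target. The Python sketch below uses only the two example values from the narration (one hour for severity one, 24 hours for severity three); any other levels, and the real numbers, would come from your SLA with the provider, so undefined levels raise an error rather than being guessed.

```python
from datetime import datetime, timedelta

# Example targets from the narration; real values come from your SLA.
RESOLUTION_TARGET = {
    1: timedelta(hours=1),
    3: timedelta(hours=24),
}

def resolution_deadline(severity: int, logged_at: datetime) -> datetime:
    """When a ticket of this severity must be resolved under the SLA."""
    try:
        return logged_at + RESOLUTION_TARGET[severity]
    except KeyError:
        raise ValueError(f"no SLA target defined for severity {severity}") from None

print(resolution_deadline(1, datetime(2026, 3, 3, 9, 0)))  # 2026-03-03 10:00:00
print(resolution_deadline(3, datetime(2026, 3, 3, 9, 0)))  # 2026-03-04 09:00:00
```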

So it depends on how the process has been framed, on the severity of the issue, and on the service level agreement, or SLA, you have with the cloud service provider. Third, there is a need for problem management collaboration. Problem management deals with minimizing the impact of a problem. Not only do we identify the impact of the problem, we also identify its root cause, so that if a similar problem comes up in the future, you already know the root cause. This is where ITSM plays a big role in the cloud environment, because it includes a complete problem management process. Fourth, you need SLAs for availability and performance.

You pay the cloud service provider a lot of money, depending on the services you have opted for, and you expect those services to be available and to perform well. The performance has to be optimal, and there has to be a threshold against which you can monitor the performance of these services and of the applications or systems you are using. So how do you benchmark the availability and performance of services and applications? You need service level agreements that define the parameters against which you can judge availability and performance. Finally, the cloud service provider has to deliver some level of transparency and quality, and that is only possible if you have something like ITSM in place.
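Availability SLAs are usually stated as a percentage of time over a period. As a worked example, the Python helpers below convert between a percentage and a downtime budget. The 99.9% target and the 30-day month are illustrative assumptions, not figures from any particular provider's SLA.

```python
def availability_pct(total_minutes: float, downtime_minutes: float) -> float:
    """Measured availability over a period, as a percentage."""
    if total_minutes <= 0:
        raise ValueError("period must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

def downtime_budget_minutes(target_pct: float, total_minutes: float) -> float:
    """Maximum downtime that still meets the availability target."""
    return total_minutes * (100.0 - target_pct) / 100.0

MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month

print(round(downtime_budget_minutes(99.9, MONTH_MINUTES), 1))  # 43.2
print(round(availability_pct(MONTH_MINUTES, 21.6), 2))         # 99.95
```

In other words, a 99.9% monthly target leaves roughly 43 minutes of allowable downtime, which is the kind of concrete parameter an SLA should spell out.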

Otherwise, how would you know what transparency means? Transparency largely depends on the service level agreements: if an issue has to be resolved within a few hours, you know exactly what to expect from the service provider. Moving on, let's talk about ITSM and operations management. ITSM is a complete set of activities that you perform to manage IT services within an organization or in the cloud environment. These activities can include strategic planning, design, building, and delivering services to end users, as well as continual service improvement. ITSM best practices can be implemented through the ITIL framework.

IT operations management is about the administration of technology components and application requirements. It also includes provisioning IT infrastructure, capacity management, cost control activities, the performance of applications and services, and security management of the complete IT infrastructure. In a nutshell, ITSM focuses on service delivery: how you deliver a certain set of services based on a certain set of processes, such as security management, problem management, incident management, and capacity management. ITOM, or IT operations management, on the other hand, focuses mainly on the IT processes we have just talked about.

It is about delivering certain services to the customers, administering all the technology components in an application environment, and managing operations across the complete IT infrastructure. IT operations management includes IT security management, which is about a written and implemented security management plan that covers the security policy, asset management, HR security, and access control. Then there is configuration management, which is about maintaining information about configuration items, or what in the ITSM world we call CIs. Then there is change management, which is about managing and controlling changes in the IT infrastructure.

Anything you want to implement in the IT infrastructure has to follow a certain process. Then there is incident management, which is about identifying issues and restoring normal service operations. Then there is problem management, which minimizes the impact of problems by identifying their root causes and helps prevent the same issues from recurring. So not only do you identify a problem, you also identify its root cause, so that if the same problem recurs in the future, you know what the root cause is.
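Recording a root cause so that you recognize a problem when it recurs is often implemented as a known-error record. Here is a minimal, hypothetical Python sketch that maps a symptom to its root cause and workaround; the class, method names, and sample entry are illustrative and not part of ITIL or any specific tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class KnownErrorDB:
    """Maps a normalized symptom to (root cause, workaround)."""
    entries: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    @staticmethod
    def _key(symptom: str) -> str:
        return " ".join(symptom.lower().split())  # ignore case and extra spaces

    def record(self, symptom: str, root_cause: str, workaround: str) -> None:
        self.entries[self._key(symptom)] = (root_cause, workaround)

    def lookup(self, symptom: str) -> Optional[Tuple[str, str]]:
        return self.entries.get(self._key(symptom))

kedb = KnownErrorDB()
kedb.record("Application crashes on export",
            "bug in the export module", "export to a different format until patched")
print(kedb.lookup("application  crashes on EXPORT"))
# ('bug in the export module', 'export to a different format until patched')
```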

Moving on, there is release and deployment management, which is about planning, scheduling, and controlling the movement of releases to test and live environments; in other words, managing both how you release a component and how you deploy it. Then there are service level agreements, which are negotiated between two or more parties: you agree on what kind of services you want to receive and what kind of services a third party or vendor can deliver to you. Then there is availability management, which is about the availability of all IT services.

You have to ensure that these services, components, and devices remain available in the IT infrastructure. Then there is capacity management, which ensures the IT infrastructure is adequately provisioned to meet business requirements and service level agreements. Then there is business continuity management, or BCM, which is about keeping the business running when a disaster takes place: how do you continue to deliver services with as little downtime as possible? That is what you define in business continuity management. Just to recap, in this video we looked at ITSM and why we need it in the cloud environment. Then we looked at IT operations management and its various components.

Risk Management Process

In the simplest terms, you can define risk as the potential or probability of losing something that has value. It is a potential or probability; it has not happened yet. So when we talk about a risk, we are talking about a potential future event that could cause harm. It is never a present event: something may happen, but it has not happened as of now. For example, suppose there is a server that has a single disk, and it holds critical data. There is a risk that the disk may fail. This is a possibility, a probability; the disk has not failed yet. Again, it is a potential future event.

When we say the disk may fail, we are talking about the future, not the present. Risk is also not always avoidable. There will be cases where you have to accept the risk, avoid it, transfer it, or reduce it, depending on the nature of the risk. In reality, risk cannot always be avoided, which means you will end up either dealing with it or accepting it. Let's look at some examples of risk. One is non-compliance with a policy. Let's say you have a security policy within your organization. The risk is that users may not adhere to it.

There is always a chance that some users will not abide by the security policy, so this risk will always exist. Loss of information or data is also a risk. For example, if you have a server with a single disk or hard drive and that drive fails, you lose all the information stored on it. Another risk is a denial of service (DoS) attack. Suppose you have a web server on the Internet, which means it faces the public Internet.

There will always be some probability of a DoS attack. You can reduce it by putting various security controls in place, but the possibility never disappears entirely. A breach of your organization's data is also a risk. And if your data center is situated near a river, there is always a risk of floods. Risk-based frameworks can help you decide what is important to protect: what needs to be protected, what does not, and where your protection efforts should focus. In short, risk-based frameworks help you decide what is critical and what needs to be protected.

They also help you determine how to protect what is critical for you, and which approach to take. Let me give you an example. Say there is a project for which you do not have skilled staff; that is a risk. How do you handle it? One method is risk transfer: you could outsource that piece of the application to an external vendor who can do the development for you. You have taken a particular approach, and risk-based frameworks help you determine which approach fits which kind of scenario.

They also help you monitor and improve your controls. So not only do you have to determine what is important to protect, you also have to determine how to protect it and which approach works best for protecting these critical assets. Then you move ahead and monitor and improve the controls. Remember, as technology changes, your controls will have to change too. For instance, if a new version of your firewall adds a feature that prevents denial of service attacks, you would want to upgrade. You also have to keep monitoring your risks: they change over time, including as a result of improvements to your controls.

Now let's look at the risk response types. The first one is reduction, where you try to reduce the risk. For example, installing a badge system or a biometric system reduces the risk of unauthorized entry. Installing a firewall can reduce the risk of attacks on the network, including denial of service attacks. Next is avoidance, where you avoid a particular risk rather than dealing with it directly. For example, you could change the scope of a project to avoid getting into a complex project or something you cannot handle.

Another example is buying a commercial product rather than building a custom product yourself. Here you are avoiding the risk of relying on an unsupported open source product, or on one you built yourself, that may not receive regular updates or upgrades. Then there is risk acceptance, where you accept the risk. For instance, when there is an approved deviation from the security policy, you are accepting the risk that comes with it. You are allowing someone to do something that is not in line with the security policy, but because an exception has been approved, you accept that particular risk. Finally, there is risk transfer. Sometimes it is better to transfer a risk to a third party.

For instance, when you purchase insurance for your data center, you transfer the risk to a third party: if something happens to the data center, such as a flood, an earthquake, or a fire, the financial loss is covered by the insurance company. Another example is outsourcing a complex task, which transfers the risk to a third party because you cannot handle or reduce it on your own. Which response you choose will vary from situation to situation. The examples I have given are not an exhaustive list; the types of risks you face will depend on your situation.
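As a minimal sketch, the four response types could be recorded as an explicit enumeration in a risk register, so every entry states which response was chosen. The names `RiskResponse`, `EXAMPLES`, and `examples_for` are hypothetical; the example actions are the ones from the discussion above.

```python
from enum import Enum


class RiskResponse(Enum):
    REDUCE = "reduce"
    AVOID = "avoid"
    ACCEPT = "accept"
    TRANSFER = "transfer"


# Example actions from the discussion, tagged with the response they illustrate.
EXAMPLES = {
    "Install a badge or biometric entry system": RiskResponse.REDUCE,
    "Install a firewall in front of the network": RiskResponse.REDUCE,
    "Change the scope of the project": RiskResponse.AVOID,
    "Buy a commercial product instead of building one": RiskResponse.AVOID,
    "Approve a deviation from the security policy": RiskResponse.ACCEPT,
    "Purchase insurance for the data center": RiskResponse.TRANSFER,
    "Outsource a complex task to a vendor": RiskResponse.TRANSFER,
}


def examples_for(response: RiskResponse) -> list:
    """Return the example actions that illustrate a given response type."""
    return [action for action, kind in EXAMPLES.items() if kind is response]


if __name__ == "__main__":
    for response in RiskResponse:
        print(response.value, "->", examples_for(response))
```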

You will have to choose the appropriate response type for each risk. Let's now look at the risk management process. First, you have to identify the risks. If you do not know what risks exist, you cannot go beyond this step, because whatever you want to mitigate, handle, or accept, you need to know what it is. After identifying a risk, you measure it, that is, you determine its criticality level. Then you examine the possible solutions: what solutions exist for this risk, and which risk response types can you use? Based on what the risk is and how critical it is, you evaluate the options. Then you go ahead and implement the solution.

Let's say you decide to accept a risk. You then go ahead with that solution and apply appropriate controls to make sure the accepted risk is properly handled. Finally, you have to monitor the risks. You have made a list of risks, and over time they will change: some will become more critical, some less, and some may go away entirely. For instance, suppose you had a web server behind a firewall, but attacks still got through. You replace the firewall and apply better, more stringent rules. You have now reduced the risk of that web server being attacked or compromised.
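The five steps above can be sketched as a small risk register. This is a minimal illustration under stated assumptions: criticality is scored as likelihood times impact on 1 to 5 scales (a common convention, not something the process above prescribes), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Response(Enum):
    REDUCE = "reduce"
    AVOID = "avoid"
    ACCEPT = "accept"
    TRANSFER = "transfer"


def _check_scale(*values: int) -> None:
    # Assumed 1-5 scale for both likelihood and impact.
    for value in values:
        if not 1 <= value <= 5:
            raise ValueError("likelihood and impact must be between 1 and 5")


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 = rare ... 5 = almost certain (assumed scale)
    impact: int      # 1 = negligible ... 5 = severe (assumed scale)
    response: Optional[Response] = None
    history: list = field(default_factory=list)  # past criticality scores

    @property
    def score(self) -> int:
        # Criticality as likelihood x impact (assumed convention).
        return self.likelihood * self.impact


class RiskRegister:
    def __init__(self) -> None:
        self.risks = {}  # name -> Risk

    def identify(self, name: str, likelihood: int, impact: int) -> Risk:
        """Steps 1 and 2: record a risk and measure its criticality."""
        _check_scale(likelihood, impact)
        risk = Risk(name, likelihood, impact)
        risk.history.append(risk.score)
        self.risks[name] = risk
        return risk

    def ranked(self) -> list:
        """Risks ordered from most to least critical."""
        return sorted(self.risks.values(), key=lambda r: r.score, reverse=True)

    def respond(self, name: str, response: Response) -> None:
        """Steps 3 and 4: record the response chosen and implemented."""
        self.risks[name].response = response

    def reassess(self, name: str, likelihood: int, impact: int) -> int:
        """Step 5: monitor by re-scoring after controls or conditions change."""
        _check_scale(likelihood, impact)
        risk = self.risks[name]
        risk.likelihood, risk.impact = likelihood, impact
        risk.history.append(risk.score)
        return risk.score


if __name__ == "__main__":
    register = RiskRegister()
    register.identify("Web server DoS attack", likelihood=4, impact=4)
    register.identify("Single-disk server failure", likelihood=2, impact=5)
    register.respond("Web server DoS attack", Response.REDUCE)
    # The firewall is replaced and stricter rules applied, so likelihood drops.
    register.reassess("Web server DoS attack", likelihood=2, impact=4)
    for risk in register.ranked():
        print(risk.name, risk.score, risk.history, risk.response)
```

In this run, the firewall change lowers the DoS risk from 16 to 8, so the single-disk failure (score 10) moves to the top of the ranking, which is the kind of shift the monitoring step is meant to catch.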

Let's now look at some broad categories of cloud computing risks. Organizational risks arise when a cloud consumer chooses to go with a particular cloud service provider. These risks are natural, because you are outsourcing your IT infrastructure to a third party. One risk is provider lock-in: you could get locked in with a specific cloud service provider. Another is loss of governance. With an on-premise data center, you can physically access it, monitor it, control it, and do whatever you want with that infrastructure. You lose much of this governance and monitoring when you move to a cloud environment. Then there is compliance risk. Cloud consumers often have to meet significant compliance obligations.

For instance, if you as the cloud consumer handle credit card information, health data, or any kind of personally identifiable information, you have to make sure the relevant compliance regulations are still met after you move to the cloud. A cloud service provider may not be able to fulfill these obligations, in which case they remain your liability and your responsibility. Another organizational risk is the cloud service provider shutting down or going out of business. What would you do then? This risk always exists, so when selecting a cloud service provider, you have to weigh it carefully. There are also infrastructure risks. For instance, consolidation of IT infrastructure can itself lead to risk.

You now have multiple virtual machines from multiple cloud consumers running on the same physical host. What happens if someone manages to break out of a virtual machine and get into the host system? That person could effectively control every virtual machine on that host, which is a serious problem. There can also be a single point of failure if you have not configured replication in the cloud environment. Another infrastructure risk: if you move to a cloud service provider with a very large infrastructure, you do not know whether the provider has the skills to manage infrastructure at that scale. Yet another risk is the shift of technical control to the cloud service provider, which means most of the technical control lies with the provider.
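To see why replication removes a single point of failure, here is a rough sketch of the availability of a service that stays up as long as at least one replica is up. It assumes replicas fail independently, which real deployments only approximate (a shared power, network, or region outage breaks that assumption); the function name is illustrative.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes independent replica failures, so the service is down only
    when every replica is down at the same time.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    probability_all_down = (1.0 - replica_availability) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s) at 99%: {combined_availability(0.99, n):.6f}")
```

With a single replica at 99%, the service is only as available as that one replica; adding a second independent replica raises it to 99.99%.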

Take software as a service (SaaS) as an example: you have virtually no control over the technical environment. All you can do is make minor configuration changes within the application you are using, or manage some of the data you store in it. Beyond that, you have no control over the application or the cloud infrastructure. Next come virtualization risks, and there can be many. The first is guest breakout, in which a guest operating system breaks out to the host operating system and takes control over the other virtual machines running on it. Then there is snapshot security. When you run virtual machine images in the cloud environment, you tend to take snapshots. Sometimes you delete the main virtual image, but its snapshots are left behind. Similarly, if you have created multiple images of a virtual machine, what happens to those images?

Those are the risks with snapshots and images. Then, of course, there is VM sprawl, another major issue. You may create a lot of virtual machines in the cloud environment and then risk losing control over them, because you do not know how many you have created or how many you are actually managing. Some virtual machines may be lying idle, effectively dead and doing no useful work, but they still exist in the cloud environment. Let's now talk about legal risks. There are risks around data protection, and risks around jurisdiction: where your data is stored, and which jurisdiction that location falls under. If there is a legal complication, in which jurisdiction do you fight your case?
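VM sprawl and leftover snapshots are easier to control with a regular inventory audit. The sketch below assumes you have already exported your VMs and snapshots from the provider into simple records; the record fields, the 30-day idle threshold, and the function names are hypothetical and do not correspond to any specific provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VirtualMachine:
    vm_id: str
    last_active: datetime  # hypothetical field: last observed activity


@dataclass
class Snapshot:
    snapshot_id: str
    source_vm_id: str  # the VM the snapshot was taken from


def find_idle_vms(vms, now, idle_after=timedelta(days=30)):
    """Return IDs of VMs with no activity within `idle_after` (assumed threshold)."""
    return [vm.vm_id for vm in vms if now - vm.last_active > idle_after]


def find_orphaned_snapshots(snapshots, vms):
    """Return IDs of snapshots whose source VM is no longer in the inventory."""
    live_ids = {vm.vm_id for vm in vms}
    return [s.snapshot_id for s in snapshots if s.source_vm_id not in live_ids]


if __name__ == "__main__":
    now = datetime(2026, 3, 3)
    vms = [
        VirtualMachine("web-1", now - timedelta(days=2)),
        VirtualMachine("test-7", now - timedelta(days=90)),
    ]
    snapshots = [Snapshot("snap-a", "web-1"), Snapshot("snap-b", "old-db")]
    print("Idle VMs:", find_idle_vms(vms, now))
    print("Orphaned snapshots:", find_orphaned_snapshots(snapshots, vms))
```

Here `test-7` is flagged as idle and `snap-b` as orphaned, because its source VM `old-db` was deleted while the snapshot was left behind.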

Then there are non-cloud-specific risks. These include natural disasters, unauthorized facility access, social engineering, network attacks, and, of course, default passwords. Finally, there are cloud-specific risks. One is a management plane breach. Remember, the management plane is what you use to manage and monitor your cloud infrastructure. If it is breached, you could lose everything you run in that infrastructure. Another is resource exhaustion, which can be caused by a denial of service (DoS) attack. A DoS attack on a web server aims to make it run out of resources.

That is resource exhaustion on the web server. These are some of the cloud computing risk categories. You can certainly think of more categories, and of more examples within these existing ones, but these are the broad categories we have discussed. Just to recap, in this video we discussed what risk is, the risk response types, the risk management process, and cloud computing risks.

Communication with Stakeholders

In on-premise data centers, we have many stakeholders: users, senior management, the software development team, and external vendors. The cloud environment also has various stakeholders. They can be your customers, partners, business leaders, senior management, and the users within your organization. You could also have investors who are funding your move to the cloud or your cloud infrastructure, or a regulator who ensures you are following compliance requirements. And of course you will have vendors, because you need certain work done by them or you consume some of their services.

Or you may need to buy something from these vendors for the infrastructure you are running. These stakeholders play a part in cloud operations, each in a different role. For instance, your customers are the people who log on to your web application, purchase something through it, or download software from it. An investor might fund your business, for example by providing $10 million for your entire cloud setup so that you can move from your on-premise data center to the cloud, or might buy a stake in your company. It varies, but there will certainly be stakeholders, and each one plays a different and unique role.

You have to decide how you communicate with each of these stakeholders. Let's see how they get involved. First, you need them for making cloud-related decisions when you move your infrastructure from on-premise to the cloud, and there will be a lot of back and forth with them. Not all stakeholders get involved at this stage; your customers, for instance, will not. It will be your investors, your senior management, or the vendors who help you plan how to move your infrastructure from the on-premise data center to the cloud.

They will also help you identify business processes that can be moved to the cloud. You cannot simply pick up everything from the on-premise data center and move it to the cloud; that is not the way to do it. You have to meet your stakeholders and find out which business processes you can move now, which you can move later, and which you should not move to the cloud at all. For instance, your senior management may decide that finance operations should stay in the on-premise data center, with the finance data stored locally rather than in the cloud, which is fine.

It depends on how critical your finance data is. Every organization's finance data is critical, but when you are dealing in millions or billions of dollars, it becomes even more so. So you sit down with the stakeholders and identify the business processes to move to the cloud. You also have to determine the impact on current services. If everything is running fine in the on-premise data center, it may be difficult to convince your stakeholders to move to the cloud in the first place. Even if you do convince them, you have to evaluate the impact on current services. Is there going to be downtime?

How do you ensure no downtime, or the least possible downtime, for current services while you move to the cloud? You also have to help your stakeholders understand the future state of business operations. For example, you can point out that with an on-premise data center it is very difficult to scale up for the holiday season, say Christmas, when sales peak, and then scale back down afterward. In the cloud environment this is not a problem, because you can add resources and remove them as required.
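The scaling argument can be shown with a toy calculation of how many instances a service needs at a given load. This is a minimal sketch, not how any particular provider's autoscaler works: the per-instance capacity, the minimum of 2 and maximum of 20 instances, and the traffic figures are all hypothetical.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Instances needed for the current load, clamped to assumed limits."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Hypothetical load at 100 requests/second per instance.
    for label, load in (("normal day", 300), ("Christmas peak", 1500), ("overnight", 50)):
        print(f"{label}: {desired_instances(load, 100)} instances")
```

On premises, the peak of 15 instances would have to be bought up front and left idle for most of the year; in the cloud, the count can follow the load back down to the minimum.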

Let's now look at some stakeholder identification challenges. When it comes to taking ownership of reviewing and defining the enterprise architecture, the biggest question is: who owns it, and who approves it? For instance, you go to senior management and tell them that your cloud architecture is ready and how much it will cost. The problem is that they are probably the least technical people you will come across; they talk business. If they do not understand the architecture, how can they approve it? So you have to work out who owns the enterprise architecture. Is it someone like the chief technology officer, who takes ownership and talks to and convinces senior management, while the approval still comes from the stakeholders?

You also have to decide whether you want a public cloud, a private cloud, or a hybrid cloud that combines on-premise infrastructure with the public cloud. Which solution are you looking at? Who is going to implement it, when, and where? Then comes the question of how to select the right cloud service provider. Among your stakeholders, who will help you identify the right provider? How do you benchmark good versus bad? You have to think this through and do some analysis: what services are you looking for, and what do different cloud service providers offer in the market?

Based on that, you go back to your stakeholders and present the benchmark you have done: this is what we need, and this is what is available in the market. Your senior management and investors will naturally worry about spending money on the cloud when your on-premise infrastructure is running fine. So you have to explain both the advantages and the disadvantages of moving to the cloud, along with the costs involved. When deciding whether to move all or only some services, remember that you do not want to shut down your operations and move them to the cloud in one go.

You could cripple your business if you did. The best approach is to move piece by piece, which lets you migrate with minimal or no interruption to the existing services that serve your customers and internal stakeholders such as users. You also need to keep your key stakeholders in the loop. For instance, you should loop in users so they know how the move to the cloud affects them. Say you have moved one service, a web application, to the cloud and want to check its response time. No one is better placed to evaluate that than the users.

You will also need to identify the direct and indirect costs, which make a big difference to the decision-makers. If the cost runs to millions of dollars and they see no direct or indirect benefit from spending that much, you will have difficulty convincing them. So identify your costs carefully before you talk to your stakeholders, and do not forget to present the benefits and the return on investment (ROI) alongside the cost. As for the risk management plan, you should already have one for your on-premise data center; now you have to extend it to the cloud.
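A worked ROI calculation shows why both cost types matter. This sketch uses the standard definition ROI = (benefits - total cost) / total cost; all dollar figures and the cost breakdowns in the comments are hypothetical.

```python
def roi(benefits: float, direct_costs: float, indirect_costs: float) -> float:
    """Return on investment as a fraction: (benefits - total cost) / total cost."""
    total_cost = direct_costs + indirect_costs
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (benefits - total_cost) / total_cost


if __name__ == "__main__":
    # Hypothetical figures, in dollars, for a migration project.
    benefits = 3_000_000
    direct = 1_500_000    # e.g. cloud subscription and migration services
    indirect = 500_000    # e.g. staff retraining and temporary productivity loss
    print(f"ROI with indirect costs:     {roi(benefits, direct, indirect):.0%}")
    print(f"ROI ignoring indirect costs: {roi(benefits, direct, 0):.0%}")
```

With these figures the ROI is 50% once indirect costs are counted, but leaving them out would report 100%, an overstatement that stakeholders are likely to challenge.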

In other words, you extend your existing risk management plan to the cloud and add the risks specific to the cloud environment. Moving to the cloud brings new risks, so you have to make your stakeholders aware of them to get their buy-in. The biggest question when moving to the cloud is: who owns what? This is one of the key questions your stakeholders will ask. Who is responsible for the physical infrastructure and the logical infrastructure? Who is responsible for keeping services up and for the reliability of the infrastructure?

Who is responsible for the confidentiality of the data? You will have to answer all of this. First, decide what you want to use in the cloud environment: infrastructure as a service (IaaS), platform as a service (PaaS), or software as a service (SaaS). Needs differ from organization to organization. Some will simply adopt SaaS, such as Google Docs, Salesforce CRM, or Office 365. Others may want a development environment, or a small network set up online, for which they would use platform as a service or infrastructure as a service.

So your responsibilities differ depending on the type of cloud service and deployment you use, and you need to be very clear about them. First, decide which service model and deployment model you are going for. Second, have a clear, detailed breakdown of your responsibilities and the cloud service provider's responsibilities. Just to recap, in this video we talked about who the stakeholders are, what kinds of stakeholders there are, and why you need their buy-in before you move to the cloud environment.
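The "who owns what" question can be sketched as a simple responsibility matrix per service model. This reflects a commonly used, simplified split of the stack; the layer names and the exact boundaries are assumptions, and the actual split is defined by each provider's terms, so treat it as a starting point for the conversation with stakeholders rather than a contract.

```python
# Layers of a typical cloud stack, from the customer's data down to the hardware.
LAYERS = [
    "data",
    "application",
    "runtime",
    "operating system",
    "virtualization",
    "physical infrastructure",
]

# Simplified, assumed split: index of the first layer the provider manages.
# Layers before that index stay with the customer.
PROVIDER_MANAGES_FROM = {"IaaS": 4, "PaaS": 2, "SaaS": 1}


def responsibilities(model: str) -> dict:
    """Map each layer to 'customer' or 'provider' for the given service model."""
    if model not in PROVIDER_MANAGES_FROM:
        raise ValueError(f"unknown service model: {model}")
    start = PROVIDER_MANAGES_FROM[model]
    return {
        layer: "provider" if index >= start else "customer"
        for index, layer in enumerate(LAYERS)
    }


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        owned = [layer for layer, owner in responsibilities(model).items()
                 if owner == "customer"]
        print(f"{model}: customer manages {', '.join(owned)}")
```

Note that in every model the data stays with the customer, which is why compliance obligations for that data cannot simply be handed to the provider.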